Editorial policy: The Journal of Econometrics is designed to serve as an outlet for important new research in both theoretical and applied econometrics. Papers dealing with estimation and other methodological aspects of the application of statistical inference to economic data as well as papers dealing with the application of econometric techniques to substantive areas of economics fall within the scope of the Journal. Econometric research in the traditional divisions of the discipline or in the newly developing areas of social experimentation is decidedly within the range of the Journal’s interests. The Annals of Econometrics form an integral part of the Journal of Econometrics. Each issue of the Annals includes a collection of refereed papers on an important topic in econometrics. Editors: T. AMEMIYA, Department of Economics, Encina Hall, Stanford University, Stanford, CA 94305-6072, USA. A.R. GALLANT, Duke University, Fuqua School of Business, Durham, NC 27708-0120, USA. J.F. GEWEKE, Department of Economics, University of Iowa, Iowa City, IA 52240-1000, USA. C. HSIAO, Department of Economics, University of Southern California, Los Angeles, CA 90089, USA. P. ROBINSON, Department of Economics, London School of Economics, London WC2A 2AE, UK. A. ZELLNER, Graduate School of Business, University of Chicago, Chicago, IL 60637, USA. Executive Council: D.J. AIGNER, Paul Merage School of Business, University of California, Irvine CA 92697; T. AMEMIYA, Stanford University; R. BLUNDELL, University College, London; P. DHRYMES, Columbia University; D. JORGENSON, Harvard University; A. ZELLNER, University of Chicago. Associate Editors: Y. AÏT-SAHALIA, Princeton University, Princeton, USA; B.H. BALTAGI, Syracuse University, Syracuse, USA; R. BANSAL, Duke University, Durham, NC, USA; M.J.
CHAMBERS, University of Essex, Colchester, UK; SONGNIAN CHEN, Hong Kong University of Science and Technology, Kowloon, Hong Kong; XIAOHONG CHEN, Department of Economics, Yale University, 30 Hillhouse Avenue, P.O. Box 208281, New Haven, CT 06520-8281, USA; MIKHAIL CHERNOV (LSE), London Business School, Sussex Place, Regent’s Park, London, NW1 4SA, UK; V. CHERNOZHUKOV, MIT, Massachusetts, USA; M. DEISTLER, Technical University of Vienna, Vienna, Austria; M.A. DELGADO, Universidad Carlos III de Madrid, Madrid, Spain; YANQIN FAN, Department of Economics, Vanderbilt University, VU Station B #351819, 2301 Vanderbilt Place, Nashville, TN 37235-1819, USA; S. FRUHWIRTH-SCHNATTER, Johannes Kepler University, Linz, Austria; E. GHYSELS, University of North Carolina at Chapel Hill, NC, USA; J.C. HAM, University of Southern California, Los Angeles, CA, USA; J. HIDALGO, London School of Economics, London, UK; H. HONG, Stanford University, Stanford, USA; Y. HONG, Cornell University, Ithaca, NY, USA; B.E. HONORÉ, Princeton University, Princeton, NJ, USA; M.L. KING, Monash University, Clayton, Vict., Australia; MICHAEL KEANE, University of Technology Sydney, P.O. Box 123 Broadway, NSW 2007, Australia; Y. KITAMURA, Yale University, New Haven, USA; G.M. KOOP, University of Strathclyde, Glasgow, UK; N. KUNITOMO, University of Tokyo, Tokyo, Japan; K. LAHIRI, State University of New York, Albany, NY, USA; Q. LI, Texas A&M University, College Station, USA; T. LI, Vanderbilt University, Nashville, TN, USA; R.L. MATZKIN, Northwestern University, Evanston, IL, USA; FRANCESCA MOLINARI (CORNELL), Department of Economics, 492 Uris Hall, Ithaca, New York 14853-7601, USA; H.R. MOON, University of Southern California, Los Angeles, USA; F.C. PALM, Rijksuniversiteit Limburg, Maastricht, The Netherlands; D.J. POIRIER, University of California, Irvine, USA; B.M. PÖTSCHER, University of Vienna, Vienna, Austria; I. PRUCHA, University of Maryland, College Park, USA; P.C.
REISS, Stanford Business School, Stanford, USA; E. RENAULT, University of North Carolina, Chapel Hill, NC; F. SCHORFHEIDE, University of Pennsylvania, USA; R. SICKLES, Rice University, Houston, USA; F. SOWELL, Carnegie Mellon University, Pittsburgh, PA, USA; MARK STEEL (WARWICK), Department of Statistics, University of Warwick, Coventry CV4 7AL, UK; DAG BJARNE TJOESTHEIM, Department of Mathematics, University of Bergen, Bergen, Norway; HERMAN VAN DIJK, Erasmus University, Rotterdam, The Netherlands; Q.H. VUONG, Pennsylvania State University, University Park, PA, USA; E. VYTLACIL, Columbia University, New York, USA; T. WANSBEEK, Rijksuniversiteit Groningen, Groningen, Netherlands; T. ZHA, Federal Reserve Bank of Atlanta, Atlanta, USA and Emory University, Atlanta, USA. Submission fee: Unsolicited manuscripts must be accompanied by a submission fee of US$50 for authors who currently do not subscribe to the Journal of Econometrics; subscribers are exempt. Personal cheques or money orders accompanying the manuscripts should be made payable to the Journal of Econometrics. Publication information: Journal of Econometrics (ISSN 0304-4076). For 2010, Volumes 160–165 (12 issues) are scheduled for publication. Subscription prices are available upon request from the Publisher, from the Elsevier Customer Service Department nearest you, or from this journal’s website (http://www.elsevier.com/locate/jeconom). Further information is available on this journal and other Elsevier products through Elsevier’s website (http://www.elsevier.com). Subscriptions are accepted on a prepaid basis only and are entered on a calendar year basis. Issues are sent by standard mail (surface within Europe, air delivery outside Europe). Priority rates are available upon request. Claims for missing issues should be made within six months of the date of dispatch. USA mailing notice: Journal of Econometrics (ISSN 0304-4076) is published monthly by Elsevier B.V. 
(Radarweg 29, 1043 NX Amsterdam, The Netherlands). Periodicals postage paid at Rahway, NJ 07065-9998, USA, and at additional mailing offices. USA POSTMASTER: Send change of address to Journal of Econometrics, Elsevier Customer Service Department, 3251 Riverport Lane, Maryland Heights, MO 63043, USA. AIRFREIGHT AND MAILING in the USA by Mercury International Limited, 365 Blair Road, Avenel, NJ 07001-2231, USA. Orders, claims, and journal inquiries: Please contact the Elsevier Customer Service Department nearest you. St. Louis: Elsevier Customer Service Department, 3251 Riverport Lane, Maryland Heights, MO 63043, USA; phone: (877) 8397126 [toll free within the USA]; (+1) (314) 4478878 [outside the USA]; fax: (+1) (314) 4478077; e-mail:
[email protected]. Oxford: Elsevier Customer Service Department, The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK; phone: (+44) (1865) 843434; fax: (+44) (1865) 843970; e-mail:
[email protected]. Tokyo: Elsevier Customer Service Department, 4F Higashi-Azabu, 1-Chome Bldg., 1-9-15 Higashi-Azabu, Minato-ku, Tokyo 106-0044, Japan; phone: (+81) (3) 5561 5037; fax: (+81) (3) 5561 5047; e-mail:
[email protected]. Singapore: Elsevier Customer Service Department, 3 Killiney Road, #08-01 Winsland House I, Singapore 239519; phone: (+65) 63490222; fax: (+65) 67331510; e-mail:
[email protected]. Printed by Henry Ling Ltd., Dorchester, United Kingdom. The paper used in this publication meets the requirements of ANSI/NISO Z39.48-1992 (Permanence of Paper).
Journal of Econometrics 160 (2011) 1
Contents lists available at ScienceDirect
Journal of Econometrics journal homepage: www.elsevier.com/locate/jeconom
Editorial
Realized Volatility
The study of high-frequency financial data has been one of the most rapidly evolving areas of research over the last decade. We have seen an explosive growth in the availability of such data, which has gone hand in hand with the development of theory for how to analyze the data. Realized Volatility is emblematic of this development, in that it was the earliest estimator which took advantage of the data in a non-parametric fashion. As both the data and the theoretical insights have grown, it has been possible to ask more complex questions of the data, to the point where ‘‘Realized Volatility’’ is now as much the name of a paradigm as the name of an estimator. The wide variety of topics is reflected in the articles in this volume. Further estimators of quadratic variation are introduced, multivariate situations are considered, as are jumps, microstructure, asynchronicity, and causality effects. Among the applications are several forms of forecasting, and the relationship between high-frequency estimates and measurements based on financial derivatives. Methodology is developed for improving on asymptotic approximation. For many of the questions asked, there is more than one paper on the subject, showing several ways in which the same problem can be handled. This volume, as well as the field of research, also represents a coming together of areas that have traditionally been separate, ranging from econometrics and statistics to empirical finance, mathematical finance, and to computer-intensive finance. The ability of high-frequency data to help unify the field of finance in its many forms is a meta-level side effect of this research, of which we are presumably only seeing the beginning. To a great extent, this is happening because the new econometric paradigm permits statistical inference about the same quantities that have previously been studied theoretically by other branches of finance. 
The power to measure from high-frequency data also creates greater transparency, both of markets and of theoretical constructions. The recent financial crisis has underlined the need both for such transparency, and for finance to be understood in an integrated way. Hence, the study of high-frequency financial data is not only an intellectual innovation, but also an endeavour of the greatest social importance.
0304-4076/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2010.07.005
Finally, high-frequency data are not only a phenomenon in finance, but also in neural science, turbulence and even in Greenlandic ice cores. Most of the papers appearing in this volume were presented at the CIRANO/CIREQ conference on ‘‘Realized Volatility’’, April 2006, organized by Nour Meddahi. The program committee consisted of Torben Andersen, Tim Bollerslev, Ronald Gallant, René Garcia, Nour Meddahi, Per Mykland, Neil Shephard, and George Tauchen. For financial support, we thank the main sponsors CIRANO and CIREQ, as well as the Bank of Canada and Hydro Quebec (Chairs held by René Garcia), Canada Research Chairs (chairs held by Russell Davidson and Jean-Marie Dufour), the Department of Economics of the Université de Montréal, and the Journal of Applied Econometrics. We also thank CIREQ’s staff for helping us organize the conference. We thank the referees for their invaluable help in improving the quality of the submitted manuscripts. We would also like to thank Ron Gallant and George Tauchen for invaluable help and guidance in putting this Annals issue together. At this point, as Gussie Fink-Nottle would suggest, on this auspicious occasion, we shall not detain you any longer.

Nour Meddahi
Toulouse School of Economics, France

Per Mykland
Oxford-Man Institute, University of Oxford, United Kingdom
Department of Statistics, University of Chicago, United States

Neil Shephard ∗
Oxford-Man Institute and Department of Economics, University of Oxford, United Kingdom
E-mail address:
[email protected]. Available online 13 July 2010 ∗ Corresponding editor.
Journal of Econometrics 160 (2011) 2–11
Estimating quadratic variation when quoted prices change by a constant increment
Jeremy Large ∗
Oxford-Man Institute of Quantitative Finance, Oxford, OX1 4EH, UK
AHL, Man Investments, Oxford, OX1 4EH, UK
Article info
Article history: Available online 6 March 2010
JEL classification: C10; C22; C80
Keywords: Realized volatility; Realized variance; Quadratic variation; Market microstructure; High-frequency data; Pure jump process
abstract For financial assets whose best quotes almost always change by jumping by the market’s price tick size (one cent, five cents, etc.), this paper proposes an estimator of Quadratic Variation which controls for microstructure effects. It measures the prevalence of alternations, where quotes jump back to their just-previous price. It defines a simple property called ‘‘uncorrelated alternation’’, which under conditions implies that the estimator is consistent in an asymptotic limit theory, where jumps become very frequent and small. Feasible limit theory is developed, and in simulations works well. © 2010 Elsevier B.V. All rights reserved.
1. Introduction There is widespread evidence of persistence in financial assets’ volatility. Therefore, estimating their ex post volatility furthers the desirable goal of forecasting volatility. Recent research has advocated measuring for this purpose empirical Quadratic Variation (QV), or Realized Volatility, as a statistic of elapsed volatility — see for example Barndorff-Nielsen and Shephard (2002) and Andersen et al. (2004). The availability of second-by-second price data has encouraged high-frequency sampling when estimating QV. However, consistent estimation is significantly complicated at the highest frequencies by market microstructure effects. This paper points out features in many markets’ microstructure which can be used as structural restrictions to control for this interference. This then leads to an estimator of QV. These features arise mainly from price discreteness. Harris (1994) points out that discreteness leads some markets to ‘trade on a penny’, so that their bid-ask spread is bid down to its regulatory minimum, the price tick size (a cent, five cents, etc.), practically all the time. Empirically on such a market, the best bid and ask change through sporadic jumps by the price tick size: so, they are pure jump processes of constant jump magnitude. They may also exhibit
∗ Corresponding address: Oxford-Man Institute of Quantitative Finance, Oxford, OX1 4EH, UK. E-mail address:
[email protected]. 0304-4076/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2010.03.007
a lack of autocorrelation in reversals, herein termed ‘‘uncorrelated alternation’’. The paper reports both these features in quote data. It focuses on Vodafone on the London Stock Exchange (LSE), which was its busiest equity by volume in 2004, with an ancillary study of GlaxoSmithKline (GSK). When these testable features are present, QV may be estimated either from the best bid, or from the best ask, with the statistic

(c/a) nk²,   (1)

where n ∈ N is the number of jumps in the quote, the constant k > 0 is the size of the price tick, and a ≤ n is the number of alternations, i.e. jumps whose direction is a reversal of the last jump. Engle and Russell (2005) call these ‘reversals’. Jumps which do not alternate are continuations, and number c = (n − a). Under some further technical assumptions, which do not rule out leverage effects, the statistic in (1) is consistent for the underlying price’s QV. The term nk² is the QV of the observed price. This is an inconsistent, and normally an upwardly biased, estimate of underlying QV because of microstructure effects. However the upwards bias implies more alternation than continuation, which indicates that returns in tick time have negative first-order autocorrelation (even when alternation itself is uncorrelated). In fact, multiplying by the ratio c/a compensates consistently. Consistency is under a double asymptotic limit theory reflecting both the high frequency and the small scale of the market microstructure: in it the intensity of jumping grows without limit,
and the squared magnitude of each jump diminishes at the same rate. Delattre and Jacod (1997) have used such an approach. This differs from the limit theories of Aït-Sahalia et al. (2005), Bandi and Russell (2006b), Barndorff-Nielsen et al. (2008), Curci and Corsi (unpublished), Zhang et al. (2005) and Zhou (1996), which present consistent estimators of QV even in cases where microstructure is not of small scale. Stochastic volatility, leverage effects and drift are introduced through a time-change, drawing on results in Monroe (1978). Trading is in fact on a penny on important financial markets for interest rate futures, currency futures, and equities. Examples include: BNP Paribas equity (on Euronext), Vodafone equity (on the LSE), US 10-year Treasury Bond Futures (on CBOT), EURIBOR, Short Sterling and Euro-Swiss Franc Futures (all three on Euronext.liffe). However, it is not the norm, and is seldom observed for example on AMEX, Nasdaq or the NYSE, where tick sizes have fallen in recent years. To widen the range of applicable markets to include some of these cases, rounding techniques are proposed for the bid, ask or mid-quote. Using GSK data, I find empirically that the statistic can then be valid even though microstructure is more prominent. Asset price paths, while normally nearly continuous, experience sporadic large jumps, due for example to public announcements — see for example Aït-Sahalia (2002) and Barndorff-Nielsen and Shephard (2006). These jumps are typically far greater than k. The estimator is shown to be consistent only over periods without such egregious jumps. Transaction data are less suited than quotes to this technique. A large proportion of trades in many assets occur off-exchange (about half of volume for Vodafone), at prices which do not respect the price tick. Meanwhile, on-exchange trade price data suffer from the bid-ask bounce, an unnecessary extra disturbance compared with the best bid or ask.
The paper proceeds as follows: Section 2 presents the model, the main theorem, and the asymptotic limit theory. Section 3 then outlines the proof of the central theorem (details are left to the Appendix). Section 4 assesses the estimator and asymptotic design in a simulation study. Section 5 applies and evaluates the estimator on LSE equity data. Section 6 concludes.
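As a concrete illustration (not part of the paper), the statistic in (1) is mechanical to compute from a quote series. The sketch below assumes a clean array `prices` of best-bid observations in which every change equals one tick `k`; the function name and the toy path are invented for illustration.

```python
import numpy as np

def alternation_estimator(prices, k):
    """Estimate QV by (c/a) * n * k^2, where n counts tick-size jumps,
    a counts alternations (reversals) and c = n - a counts continuations.
    A sketch only: it assumes every price change is exactly +/- k."""
    prices = np.asarray(prices, dtype=float)
    changes = np.diff(prices)
    jumps = changes[changes != 0.0]          # jump sizes, each +/- k
    n = len(jumps)
    if n < 2:
        return 0.0
    signs = np.sign(jumps)
    # an alternation is a jump whose direction reverses the previous jump
    a = int(np.sum(signs[1:] != signs[:-1]))
    c = n - a                                 # continuations (first jump unassigned)
    if a == 0:
        return 0.0                            # convention mirrors "0 if A_T = 0"
    return (c / a) * n * k**2

# toy path with n = 6 one-tick jumps, half of them alternations
path = [10.00, 10.01, 10.00, 10.01, 10.01, 10.02, 10.01, 10.00]
qv_hat = alternation_estimator(path, k=0.01)
```

The `a == 0` convention follows the paper's treatment of the degenerate case in its limit theory; counting the unassigned first jump as a continuation matches the definition c = n − a.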
2. The model and main result

This section first prepares the ground for the main result, given in Section 2.4. The probability space {Ω, F, P} is generated by three stochastic processes on R₊: W, a standard Brownian motion, V, a finite activity pure jump process, and volatility σ ≥ 0. The focus of the paper will be on X, an underlying price, and Y, an observed price (e.g. bid or ask) defined thus:

X = W_[X],   Y = V_[X],   (2)

where [X] is a stochastic process defined by

[X]_t = ∫₀ᵗ σ_u² du.   (3)

So W and V are subordinated by the same process, [X]. Monroe (1978) shows that this specification of X includes all continuous semimartingales. Hence it is consistent with canonical models in work on QV estimation — see for example Barndorff-Nielsen and Shephard (2006). In particular, X may have leverage effects and drift, for W and σ may be dependent. The continuity of [X] imposes continuity on X, and leads exactly to a spot volatility of σ_t. With some loss of generality, assume σ is uniformly bounded above and away from zero. For introductions to stochastic volatility, see reviews in Ghysels et al. (1996) and Shephard (2005, Ch. 1). Clark (1973) argued earlier for this specification of X, so that the stochastic time-change might reflect how ‘‘[t]he number of individual effects added together to give the price change during a day is... random’’. Following in Clark (1973)’s direction, the observed price Y, while different to X, is modelled as arising from the same time-change. This will mean that the frequency, not the magnitude, of quote updates is increasing in the volatility, σ.

Definition 1. Processes such as W and V, which are subordinated by the time-change [X], will be said to evolve in ‘‘business time’’, while X and Y evolve in ‘‘calendar time’’.

See Oomen (2006) for more on this terminology. For some T > 0, only {Y_t : 0 ≤ t ≤ T} is observed. Y has a random initial value. The quantity to be estimated is the QV of X over the period that Y is observed, namely [X]_T. As X is not observed, nor is [X]_T. Y deviates from X by a microstructure effect, the process ϵ, which is defined in calendar time by

ϵ = Y − X.   (4)

Hence, ϵ = (V − W)_[X], and so (V − W) is the microstructure effect viewed in business time. The following two conditions recur throughout the paper.

Definition 2. The microstructure is stationary in business time if (V − W) is stationary.

Definition 3. The microstructure has no leverage effects if (where ⊥⊥ indicates independence)

V | W ⊥⊥ σ | W.   (5)

While allowing leverage effects in X, this means that in business time the microstructure effect is conditionally independent of current volatility.

2.1. Constant observed jump magnitude

Assume that V, the observed price viewed in business time, is a pure jump process which only jumps by ±k. So the true observed price, Y, is also. Then

Y_t = Y₀ + ∫₀ᵗ G_u dN_u,   (6)

where N is a simple counting process and G is an adapted process that only takes values ±k for some k > 0. This idea, that the observed price is a pure jump process which deviates from a fundamental price, is already present in inter alia Ball (1988), Gottlieb and Kalay (1985), Li and Mykland (2006), Oomen (2006) and Zeng (2003).¹ The QV of Y, [Y]_t, is ∫₀ᵗ G_u² dN_u = k² N_t, a stochastic process. Decompose the process N by N = A + C, where A and C are counting processes. The alternation process, A, counts the jumps in Y which have opposite sign to the one before, and the continuation process C counts jumps that continue in the same direction as the one before. Both are adapted to Y. The first jump in Y is unassigned. For all i ∈ N let t_i be the time of the ith jump in Y. Define the random sequence Q = {dA_{t_i} − dC_{t_i} : i ∈ N}. So Q records +1 for an alternation and −1 for a continuation.

Definition 4. Y has Uncorrelated Alternation if Q has zero first-order autocorrelation.
1 It explains two related effects. First, if prices are pure jump processes, then it is clear that Bipower Variation, the statistic introduced in Barndorff-Nielsen and Shephard (2004), converges to zero with finer sampling. Second, studying quotes data, Hansen and Lunde (2006) find that RV can be downwards-biased for QV. They show this implies negative covariation between efficient returns and noise. A pure jump process accounts for this mechanically, as pointed out in Bandi and Russell (2006a).
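Assumption (A) is directly testable, and the sequence Q just defined is the object to examine. The following sketch, my own and not from the paper, builds Q from a tick-level series and computes its sample first-order autocorrelation; under Uncorrelated Alternation the estimate should be near zero. (The paper later notes this can equivalently be cast as a regression of Q on its lag.)

```python
import numpy as np

def q_sequence(prices):
    """Q records +1 for an alternation and -1 for a continuation.
    The first jump is unassigned, so Q has one fewer entry than the jumps."""
    changes = np.diff(np.asarray(prices, dtype=float))
    signs = np.sign(changes[changes != 0.0])
    return np.where(signs[1:] != signs[:-1], 1.0, -1.0)

def first_order_autocorr(q):
    """Sample first-order autocorrelation of the sequence q."""
    q = np.asarray(q, dtype=float)
    qc = q - q.mean()
    denom = np.sum(qc**2)
    return float(np.sum(qc[1:] * qc[:-1]) / denom) if denom > 0 else 0.0

# under iid symmetric tick moves, alternations are serially uncorrelated,
# so the estimated autocorrelation of Q should be close to zero
rng = np.random.default_rng(0)
directions = rng.choice([-1.0, 1.0], size=20000)
prices = 10.0 + 0.01 * np.cumsum(directions)
rho = first_order_autocorr(q_sequence(prices))
```

A markedly non-zero `rho` on real quote data would argue against applying the estimator without the rounding adjustments discussed later in the paper.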
2.2. Technical properties

Identification assumption. Given two events observable before any jumping time t_i, H₁ ∈ F_{t_i−} and H₂ ∈ F_{t_i−} such that H₂ ⊂ H₁,

{E(Y_{t_i} | H₁) = E(Y_{t_i} | H₂)} ↔ {E(X_{t_i} | H₁) = E(X_{t_i} | H₂)}.   (7)

Thus, if H₂ adds (no) new information to H₁ concerning the likely direction of Y’s next jump, it adds something (nothing) new about the level of X.

Definition 5 (Buy–Sell Symmetry). If (V − V₀, W) and −(V − V₀, W) have identical distribution, then the microstructure is buy–sell symmetric.

Even if buying and selling had identical dynamics, the behavior of a single quote, say the best bid, might differ when moving upwards when compared to the spread-widening downwards direction. But when trading is on a penny, no quote change widens the spread (other than perhaps very briefly), and buy–sell symmetry is more acceptable.

Definition 6. Let the sequence Π be given by

Π = { ( ([X]_{t_i} − [X]_{t_{i−1}}) / E([X]_{t_i} − [X]_{t_{i−1}}),  (Q_i + 1)/2 ) : i ∈ N }.   (8)

The left hand term here is the elapsed QV in X between the (i − 1)th and ith jumps in Y, once de-averaged.

Fig. 1 [figure; axes: price (basis points) against time (seconds)]. A simulation of this paper’s proposed model. It shows an asset’s observed price, which jumps, and its continuous underlying price, here a scaled Brownian motion. The observed price has a propensity to alternate, and its returns therefore have negative first-order autocorrelation in tick time.
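A path like the one in Fig. 1 can be generated directly from the model in (2). The sketch below is my own, with invented parameters, and is one simple mechanism rather than the paper's simulation design: the observed price moves in ±k ticks toward a Brownian underlying price whenever the gap exceeds half a tick, which respects the "Y always jumps towards X" idea used later as Assumption (D).

```python
import numpy as np

# Sketch: underlying X is a scaled Brownian motion; observed Y moves in
# +/- k ticks toward X whenever it lags X by more than half a tick.
rng = np.random.default_rng(7)
k = 0.01        # price tick (illustrative)
sigma = 0.05    # constant spot volatility (illustrative)
m = 5000        # number of discretization steps
dt = 1.0 / m

x = 10.0 + np.cumsum(sigma * np.sqrt(dt) * rng.standard_normal(m))
y = np.empty(m)
y[0] = round(x[0] / k) * k
for i in range(1, m):
    y[i] = y[i - 1]
    # jump toward X, one tick at a time, while the gap exceeds k/2
    while x[i] - y[i] > k / 2:
        y[i] += k
    while y[i] - x[i] > k / 2:
        y[i] -= k

gap = np.max(np.abs(y - x))   # by construction, at most half a tick at each step
```

Because Y only moves to close a gap that has just exceeded k/2, it tends to overshoot and then revert, producing the propensity to alternate that the figure describes.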
2.3. Asymptotic limit theory

A long sample leads the time series econometrician to a thought experiment where the sample is of ‘‘infinite’’ length. Of course, in practice the data is finite and so this provides an approximation. Similarly, high frequency market microstructure data invites the double asymptotic theory that, given an underlying price, the microstructure had evolved ‘‘infinitely’’ fast, with ‘‘infinitely’’ small jumps. For example, in Delattre and Jacod (1997), a diffusion process is observed very frequently with a very small rounding error. The current asymptotic theory is closely related, with the key difference that Y is fully and continuously observed. So – as explored in Hansen et al. (2008) – the full DGP, not only the extent and quality of observation, changes in the limit. Suppose that the microstructure is stationary and has no leverage effects. So the probability measure admits the following factorization:

P = P_{V|W} × P_{σ|W} × P_W,   (9)

where P_{V|W} and P_{σ|W} are the conditional distributions, and P_W is the marginal. An element of the state space, ω ∈ Ω, may be written ω = (v, w, σ*), where v, w and σ* are the realizations in ω of V, W and σ respectively. I use a scaling parameter, α ∈ R₊, and let α ↓ 0. Define w^[α] by

w_t^[α] := (1/α) w_{α²t},   (10)

and define v^[α] similarly. So, for α < 1, the functional w → w^[α] slows but normalizes w so that W^[α] is also standard Brownian motion. Define a new measure P^α_{V|W} by:

P^α_{V|W}(v, w) := P_{V|W}(v^[α], w^[α]).   (11)

For fixed v^[α] in the support of P, the size of jumps in v, as well as the intervening durations, decline indefinitely as α ↓ 0. The asymptotic theory approximates P in (9) by lim_{α↓0} {P^α}, where P^α = (P^α_{V|W} × P_{σ|W} × P_W).

2.4. The main result

Theorem 2.1. Consider the model {Ω, F, P^α}. Suppose that (A) Y has Uncorrelated Alternation, (B) Y always jumps by a constant ±k, (C) ϵ has no leverage effects, is stationary in business time, and is ergodic, (D) Y always jumps towards X, and (E) the Identification Assumption and Buy–Sell Symmetry hold. Condition on [X]_T, so that T is a random time. Then the following limit theory applies:

lim_{α→0} √(N_T) ( [X̂]_T / [X]_T − 1 ) ∼ N(0, UMU′),   (12)

where

[X̂]_T = k² N_T (C_T / A_T)   (or 0 if A_T = 0).

Here U is (1, (1 + R)²); M is the long-run variance matrix of Π; and R is the ratio [X]_T / E[Y]_T.

Proof. Section 3 provides the proof of this Theorem.

Fig. 1 shows a process satisfying the Theorem’s assumptions.

A feasible limit theory when volatility is constant. Proposition 3.4 will show that R may be estimated by C_T/A_T. However, Theorem 2.1’s asymptotic limit theory is still infeasible because Π is not observed. Nevertheless, if one is willing to assume that the spot variance does not change much within the day, the following approach provides a useful approximate way to characterize the limiting standard errors: The elapsed QV in X between jumps at t_i and t_{i−1} is given by

[X]_{t_i} − [X]_{t_{i−1}} = σ² (t_i − t_{i−1}),   (13)

and, de-averaged,

([X]_{t_i} − [X]_{t_{i−1}}) / E([X]_{t_i} − [X]_{t_{i−1}}) = ((t_i − t_{i−1}) / T) E(N_T).   (14)

Substituting N_T for E(N_T) gives an estimate Π̂, on which the Newey and West (1987) method, and other long-run variance estimation techniques, can be used to estimate M.

Discussion of the result. The result is semi-parametric because it does not refer to the dynamics or the intensity of N. The proposed estimator is easy to calculate. It can be viewed as arising from applying a scaling correction, C_T/A_T (which may be more or less than 1), to naive Realized Volatility at high frequency, [Y]_T. So it has a close statistical relationship to Realized Volatility. Many jumps are indicative of high volatility unless most of them are alternations, a possibility which the observed proportion of
J. Large / Journal of Econometrics 160 (2011) 2–11 price
Definition 8. For each of Y ’s jumping times, ti , define Zti by Zti = E [Xti |Yti , Gti , R].
Y0+ 2k Rk
k/2 (R+1)
k/2 (R+1)
Note that Z is not observed because R is not observed. The evolution of Z is described in Fig. 2, which also illustrates the following lemma.
Y0
Y0 – k
time c
a
c
Fig. 2. The solid line shows Y , while the dashed line shows Z . The letters on the time axis indicate if the jump is an alternation or a continuation. The diagram illustrates the relative contribution to the QV of Z by alternations and continuations.
alternations to continuations provides a means to account for. Since it has no fixed observation frequency, the statistic does not encounter systematic biases due to intraday seasonality. Discussion of the assumptions and theory. Assumptions (A) and (B) can be tested empirically. (A) states that the likelihood that a jump is an alternation does not depend on whether the last jump was. It may be tested via a regression of Q on itself lagged. Section 5 goes on to treat cases where (B) and (A) do not hold directly using rounding techniques. Assumptions (C), (D) and (E) cannot be tested. (C) does not preclude leverage effects in X . (D) rules out transitory increases to |ϵ| through noise — or other trading. Unless the bid and ask simultaneously jump away from their underlying diffusions this would involve a change in the bid-ask spread: but the bid-ask spread is almost always constant when trading is on a penny. (E) rules out observed dynamics that do not reflect the underlying price at all. Related asymptotic theories have conditioned on σ , equivalently on the process [X ]. Here, however, only [X ]T , the elapsed QV in X over the period [0, T ], is given. 3. Proof of Theorem 2.1
Throughout the proof, the model will be conditioned on the object of scientific interest, [X]_T. This implies that T is a random time. Furthermore, without loss of generality assume that E[ϵ_t] = 0, so that the observed price Y has undergone a vertical shift leaving its increments unchanged.

Definition 7. Let R be the ratio

  R = [X]_T / E[Y]_T.   (15)

Under Assumption (C), R is invariant to [X]_T.

Proposition 3.1. Suppose that Assumptions (B), (C) and (D) of Theorem 2.1 hold. The error just before the ith jump is ϵ_{t_i−}. Taking the ergodic expectation, for all i,

  E[|ϵ_{t_i−}|] = (k/2)[R + 1].   (16)

Proof. See Appendix A.

Discussion. Proposition 3.1 provides an unbiased estimate of |ϵ_{t_i−}|, while under Assumption (D) the direction of the jump at t_i gives the sign of ϵ_{t_i−}. Combining these, an unbiased estimate of ϵ_{t_i−} itself is available. Equally, X_{t_i} can be estimated without bias by adding or subtracting E[|ϵ_{t_i−}|] to/from Y_{t_i−}, depending on the direction Y jumps at t_i. The next definition gives the name Z to this conditional estimation process, which is illustrated in Fig. 2. In line with intuition, if R ≈ 0, such as if X is almost constant, then Y jumps between X ± k/2.

Definition 8. Define the sequence {Z_{t_i} : i ∈ N} by

  Z_{t_i} = Y_{t_i} + [(R − 1)/2] G_{t_i}.   (17)

(Recall that G_{t_i} = ±k is the jump in Y at t_i.) Extend the sequence {Z_{t_i} : i ∈ N} rightwards to a càdlàg pure jump process Z.

Lemma 3.2. The Quadratic Variation process for Z, denoted [Z], is a linear combination of the processes A and C given by [Z] = k²(C + R²A).

Proof. When Y jumps by continuing in the same direction as the last jump, Z jumps by k. When Y jumps by alternating in direction, Z jumps by Rk. This follows from simple calculation, and is shown in Fig. 2. The QV of Z is the sum of its squared jumps.

Definition 9. A process S has Ideal Error if E[[S]_T] = [X]_T.

Proposition 3.3. Suppose that Assumptions (B), (C), (D) and (E) of Theorem 2.1 hold. Uncorrelated Alternation then implies that Z has Ideal Error.

Proof. See Appendix B.

Uncorrelated Alternation may be tested simply by regressing Q linearly on its own lagged value, and testing whether the coefficient on the lag is significant.

Proposition 3.4. Suppose that Assumptions (B), (C) and (D) of Theorem 2.1 hold. Suppose that Z has Ideal Error. Then, conditional on [X]_T,

  E[A_T R − C_T] = 0,   (18)

and R has the Method of Moments estimator

  R̂ = C_T / A_T.   (19)

(Define R̂ = 0 if C_T = A_T = 0.)

Proof. See Appendix C.

So, recalling that [X]_T = R E[Y]_T, the proposed estimator of [X]_T is

  [X̂]_T = R̂ [Y]_T.   (20)

Definition 10. Denote by Ẑ the estimate of the process Z constructed by replacing R with R̂ in (17).² Straightforwardly,

  R̂ [Y]_T = [Ẑ]_T.   (21)

The final proposition in this section provides the asymptotic limit theory for this estimator, proving its consistency.
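As a concrete illustration, once the signed jump sequence of Y is in hand, the estimator in (19) and (20) reduces to a few lines. The sketch below (in Python; the function name and interface are illustrative, not from the paper) counts continuations and alternations and scales the observed QV of Y by R̂ = C_T/A_T:

```python
import numpy as np

def alternation_estimator(jumps, k):
    """Sketch of the Alternation Estimator: 'jumps' is the signed jump
    sequence of the observed price Y (each entry +k or -k). A jump is a
    continuation if it has the same sign as the previous jump, and an
    alternation otherwise; the first jump is neither, so C_T + A_T equals
    len(jumps) - 1. The observed QV of the pure jump process Y is
    N_T * k**2, which is scaled down by R_hat = C_T / A_T."""
    jumps = np.asarray(jumps, dtype=float)
    same = jumps[1:] * jumps[:-1] > 0        # True where Y continued
    C_T = int(same.sum())                    # continuations
    A_T = int((~same).sum())                 # alternations
    # The paper defines R_hat = 0 when C_T = A_T = 0; A_T = 0 with
    # C_T > 0 is treated the same way here for simplicity.
    R_hat = C_T / A_T if A_T > 0 else 0.0
    qv_Y = len(jumps) * k**2                 # observed QV, [Y]_T
    return R_hat * qv_Y, R_hat, C_T, A_T
```

For a perfectly alternating sequence the estimate is 0, matching the intuition that a price bouncing inside one tick carries no information about underlying volatility.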
2 Under the assumptions of Theorem 2.1, first constructing Zˆ provides a better estimate of X , the underlying price, than using Y directly. In this sense, Zˆ thus constructed is a useful ‘‘filter’’ for Y . The QV of Zˆ , [Zˆ ]T , is identical to the Alternation Estimator in the model. However, if in practice the data contains some jumps greater than the price tick size, they normally diverge. For some purposes [Zˆ ]T may be preferable: a relevant case is when large jumps in price are expected.
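The test of Uncorrelated Alternation described above can likewise be sketched in code. Here a simple asymptotic z-test on the lag-1 sample autocorrelation of the alternation indicator Q stands in for the regression and LR tests used in the paper; under the i.i.d. null that autocorrelation is approximately N(0, 1/n):

```python
import numpy as np

def alternation_dependence_test(Q):
    """Sketch of a specification test for Uncorrelated Alternation:
    Q_i = 1 if jump i is an alternation, 0 if a continuation. Under the
    i.i.d. null, the lag-1 sample autocorrelation of Q is approximately
    N(0, 1/n), so |z| > 1.96 rejects at the 5% level. Assumes Q is not
    constant (otherwise the variance in the denominator is zero)."""
    Q = np.asarray(Q, dtype=float)
    n = len(Q) - 1                            # number of lag-1 pairs
    q0, q1 = Q[:-1], Q[1:]
    num = np.mean((q0 - Q.mean()) * (q1 - Q.mean()))
    rho1 = num / np.var(Q)                    # lag-1 autocorrelation
    z = rho1 * np.sqrt(n)
    return rho1, z, abs(z) > 1.96
```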
J. Large / Journal of Econometrics 160 (2011) 2–11
Proposition 3.5. Suppose that Assumptions (B), (C) and (D) of Theorem 2.1 hold. Suppose that Z has Ideal Error. Then the following limit theory applies:

  lim_{α→0} √N_T ( R̂ [Y]_T / [X]_T − 1 ) ∼ N(0, UMU′).   (22)
Proof. See Appendix D.

Note that Π is stationary since the microstructure is stationary in business time.

Proof of Theorem 2.1. Suppose that Assumptions (B), (C) and (D) of Theorem 2.1 hold. (Then Z may be constructed as in Fig. 2.) If in addition Assumptions (A) and (E) hold, then Proposition 3.3 shows that Z has Ideal Error. Therefore Propositions 3.4 and 3.5 apply and the Theorem follows.

4. Simulation assessment

This section assesses the estimator and its asymptotic theory in simulations. Two widely employed DGPs are considered, adapted so the observed price practically always jumps by k. In both, the underlying price X is specified as in Section 2. They are (1) Independent Noise with Rounding, where as in Li and Mykland (2006) X is observed with error and then rounded down to a multiple of k,

  Y = k⌊(X + u)/k⌋,   (23)

with u ⊥⊥ X and u ∼ NID(0, χ²); and (2) Rounding, which is the case of (1) with χ = 0. These differ from this paper's DGP, and so help to evaluate the robustness of the estimator. Both DGPs have [Y]_T of unbounded expectation, so are unrealistic in this setting of full continuous observation. As was pointed out in Gottlieb and Kalay (1985), even without other noise, a rounded-off Itô process has an unbounded QV whenever it crosses a rounding threshold, for with probability 1 it crosses back and forth infinitely many times in the next instant. This is remedied by sampling a finite m ∈ N times over the day. It is convenient to sample evenly in business time. A third DGP is simulated, where Y has finite QV and so may be sampled continuously: called here (3) Sluggish Rounding.³
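For intuition, DGP (1) is easy to simulate. The sketch below (a minimal Python version in which the underlying price has constant volatility, an assumption made purely for brevity) generates X on an Euler grid, adds N(0, χ²) noise, and rounds down to the tick grid as in (23):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_rounded_price(m=23400, k=0.05, chi=0.01, qv_target=1.0):
    """Sketch of DGP (1), Independent Noise with Rounding: X is a driftless
    martingale with [X]_T = qv_target observed m times, contaminated with
    independent N(0, chi**2) noise, and rounded down to a multiple of the
    tick k:  Y = k * floor((X + u)/k). DGP (2), pure Rounding, is chi = 0.
    Parameter values are illustrative."""
    dX = rng.normal(0.0, np.sqrt(qv_target / m), size=m)
    X = 100.0 + np.cumsum(dX)            # underlying efficient price
    u = rng.normal(0.0, chi, size=m)     # independent microstructure noise
    Y = k * np.floor((X + u) / k)        # observed, tick-rounded price
    return X, Y

def realized_variance(prices):
    """Sum of squared increments of a sampled price path."""
    d = np.diff(np.asarray(prices, dtype=float))
    return float(np.sum(d * d))
```

Computing `realized_variance(Y)` at ever finer sampling reproduces the unbounded-QV problem discussed above: the rounded price's RV grows far above the target [X]_T = 1.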
The three models are calibrated in various ways following the simulation study of some moving-average, kernel and time-scale based estimators in Hansen et al. (2008). As there, [X]_T is set to 1, and T is normalized to average 6.5 h, a typical financial market 'trading day'. Their smaller values of k and χ² are excluded. All reported statistics are invariant to the distribution of σ and to any leverage effects, which were therefore left unspecified. An Euler discretization broke up each day, i.e. run, into 6.5 × 60 × 60 × 4 intervals (averaging 4 per second). There were 20,000 runs.

Definition 11 (Sluggish Rounding). For ρ > k/2 > 0, define SR^{ρ,k}_X to be the process which jumps towards (possibly 'beyond') X by amount k whenever |SR^{ρ,k}_X − X| ≥ ρ. The initial condition SR^{ρ,k}_{X,0} is distributed such that (SR^{ρ,k}_X − X) is stationary. The constraint ρ > k/2 is required so that SR^{ρ,k}_X has finite expected QV. In the case where ρ = k, SR^{ρ,k}_X jumps to exactly the value of X whenever X reaches SR^{ρ,k}_X ± k.

Proposition 4.1. Suppose that Y = SR^{ρ,k}_X. Then Assumptions (A), (B), (C), (D) and (E) of Theorem 2.1 hold. By Proposition 3.1, ρ = (k/2)[R + 1]. Furthermore, Π is an i.i.d. sequence and

  UMU′ = (2/(3R))(1 + 4R + 2R²).   (24)

Therefore UMU′ may be estimated consistently by replacing R in (24) with R̂ = C_T/A_T.

Proof. See Appendix E.

The results are presented in Table 1. The upper panel shows the proposed estimator's average value across all runs. The lower panel in Table 1 describes specification testing and inference. It reports rejection frequencies of first-order autocorrelation tests on Q, at 5% and 1%, alongside the proportion of jumps in Y exceeding k. Not surprisingly, the Sluggish Rounding models are the best specified (these simulations fixed SR^{ρ,k}_{X,0} := X_0 = 0). In addition, models where rounding is large (k = 0.1), and χ is zero or small, when sampled every 10 s, are quite well specified. By way of contrast, Li and Mykland (2006) find that estimators based on models of additive Independent Noise are most effective when χ is large relative to k. When calculated using Proposition 4.1, the expression √(UMU′/N_T) · [X̂]_T estimates the standard deviation of [X̂]_T. 'Coverage of 1' gives the proportion of times that 1, the truth, lies within the resulting 95% and 99% confidence intervals. All the aforementioned models have good such coverage. However, models with k = 0.05 sampled every 10 s have less desirable coverage, arising perhaps from their upwards bias of 4% to 5%, but are not often rejected by the proposed specification tests.

5. Empirical implementation

This part implements the proposed estimator for Vodafone stock traded on the LSE's electronic limit order book, SETS. Vodafone was the LSE's most heavily traded stock (in £) in 2006. The data spans a period of 7 months from August 2004 to the end of February 2005. This period comprised 147 trading days running from 8:00 am to 4:30 pm, except for 24 December and 31 December, when markets closed at 12:30 pm. Vodafone's best bid (ask) was revised 17,060 (17,167) times over the sampled period, on average 116 times per day.

³ I am grateful to Peter Hansen for suggesting this term.
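Definition 11 can be sketched directly for a discretely sampled path. In the code below (illustrative names; it operates on a sampled path rather than in continuous time), the process jumps by k towards X, possibly several times between samples, whenever the gap reaches ρ; a helper also evaluates the asymptotic variance formula (24):

```python
import numpy as np

def sluggish_round(X, rho, k, start=None):
    """Sketch of Definition 11: the sluggishly rounded process jumps
    towards (possibly beyond) X by k whenever |SR - X| >= rho. Requires
    rho > k/2, which also guarantees the inner loop terminates: each jump
    shrinks the gap by k, or overshoots to a gap below k/2 < rho."""
    sr = float(X[0] if start is None else start)
    out = np.empty(len(X))
    for i, x in enumerate(X):
        while abs(sr - x) >= rho:          # jump (maybe repeatedly) towards X
            sr += k if x > sr else -k
        out[i] = sr
    return out

def umu_prime(R):
    """Asymptotic variance of Proposition 4.1 (sluggish rounding case):
    UMU' = (2/(3R)) * (1 + 4R + 2R**2), estimated by plugging in R_hat."""
    return 2.0 * (1.0 + 4.0 * R + 2.0 * R**2) / (3.0 * R)
```

Given an estimate and a jump count, a standard error along the lines discussed in the text would be `np.sqrt(umu_prime(R_hat) / N_T)` times the estimate.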
5.1. Specification testing

The prices at both the bid and the ask were first tested for uncorrelated alternation in a first-order autoregression of the sequence Q. Over a long sample, fluctuation in the marginal propensity to alternate may introduce spurious dependence into this autoregression. For testing, the data was therefore viewed as a succession of independent trading days, over each of which parameter stability can reasonably be expected. The trading days were prepared by excising their first 15 min. After-effects of the opening auction are known to produce distinctive microstructure at this time. The null hypothesis tested was that Q is i.i.d., a stricter null than is needed, but simpler to test for. For the best ask (bid), 14.2 (14.9)% of days failed an LR test for i.i.d. alternation at 5%. While ideally these numbers would be close to 5%, in reality a minority of days experienced episodes of abnormal market microstructure due to large price jumps, news announcements, option expiry dates, etc. To study this further, days were broken at 12 pm into the morning and the afternoon, producing 294 periods. For the best ask (bid), 6.8 (8.5)% of periods now failed the LR test at 5%. The test's rejection frequencies are much improved, suggesting that an abnormal episode in the microstructure is typically brief: it does not cause both halves of a
trading day to be rejected separately. As a result of its tight spread, Vodafone lacks resiliency dynamics which can induce lagged autocorrelation in quoted prices; see Degryse et al. (2005) and Large (2007). This helps account for this finding of uncorrelated alternation. Finally, only 0.5% of jumps in the best bid or ask exceeded the price tick size. In conclusion, the model is found to be mis-specified mainly during infrequent brief interludes. The results of the next section suggest that these interludes do not unduly prejudice the procedure.

5.2. Results

The estimator was calculated for each day of the sample. To study its bias, Fig. 3 shows, for the current data, volatility signature plots (see Andersen et al. (2000)) of Y and of the process Ẑ as given in Definition 10. Six days (among the 14.2% failing the last part's test) were excluded, since they contained large jumps in price. These were Christmas Eve, New Year's Eve, and the third Fridays in November '04, December '04, January '05 and February '05. Under the assumptions of Proposition 3.3, passing a test for i.i.d. alternation implies that Z has Ideal Error. Hence, loosely, this test can be interpreted as sufficient for the hypothesis that the volatility signature plot of Z is flat. Inspection of Fig. 3 suggests this may at least be so of Ẑ.

Fig. 3. Volatility signature plots for Vodafone's best bid price from August 2004 to February 2005 (vertical axis: pence squared; horizontal axis: sampling interval in minutes). The diamonds and squares show RV^i_ζ, respectively for the bid and for the transformed bid, Ẑ, against sampling interval, 1/ζ.

Definition 12. Let RV^i_ζ be the Realized Variance (or, RV) of the bid sampled at a frequency ζ on the ith observed day. This is the sum of squared changes in that price on day i between successive times in the sequence {0, ζ, 2ζ, ..., ζ⌊T/ζ⌋}.

We will be interested in the series {RV^i_{30 min} : i = 1, 2, ...}, {RV^i_{15 min}}, {RV^i_{5 min}} and {RV^i_{1 min}}. In this notation the observed QV of the best bid is {RV^i_{0+}}. Fig. 3 illustrates these quantities' upwards bias when viewed as estimators of underlying QV. Only at sampling intervals ≥ 30 min does the upwards bias due to market microstructure effects become moderate.

Table 1
Reports of simulations of the proposed estimator.

Upper panel: quantities averaged across all runs.

                                 k       χ²       ρ     Mean sampling  'Naïve' realized    N_T   C_T   A_T   C_T/A_T  Alternation
                                 (tick)  (noise)        interval, 1/m  variance, RV_{1/m}                             estimator
Rounding                         0.10    –        –     10 s           3.9                  386    80   306  0.26     1.016
                                 0.10    –        –     1 min          2.6                  153    63    91  0.70     1.073
                                 0.05    –        –     10 s           2.0                  766   270   497  0.54     1.042
                                 0.05    –        –     1 min          2.2                  246   214    33  n/a      n/a
Independent noise with rounding  0.10    0.00001  –     10 s           4.0                  395    81   314  0.26     1.021
                                 0.10    0.00001  –     1 min          2.6                  154    63    91  0.70     1.074
                                 0.05    0.00001  –     10 s           2.0                  783   273   510  0.54     1.050
                                 0.05    0.00001  –     1 min          2.2                  247   214    33  n/a      n/a
                                 0.10    0.0001   –     10 s           4.7                  467    87   381  0.23     1.067
                                 0.10    0.0001   –     1 min          2.7                  158    64    94  0.69     1.091
                                 0.05    0.0001   –     10 s           2.4                  914   305   609  0.50     1.144
                                 0.05    0.0001   –     1 min          2.2                  251   221    30  n/a      n/a
Sluggish rounding                0.10    –        0.08  –              1.6                  156    61    96  0.64     0.997
                                 0.05    –        0.04  –              1.5                  591   238   353  0.68     0.999

Lower panel: specification testing and inference.

                                 k       χ²       ρ     Mean sampling  √(UMU′/N_T),      Coverage of 1   Test of Unc. Alt.  Proportion of
                                 (tick)  (noise)        interval, 1/m  mean of all runs  0.95    0.99    0.05    0.01      jumps > k (%)
Rounding                         0.10    –        –     10 s           0.12              0.953   0.989   0.053   0.010     0.0
                                 0.10    –        –     1 min          0.17              0.900   0.959   0.053   0.012     2.3
                                 0.05    –        –     10 s           0.08              0.908   0.970   0.059   0.014     0.6
                                 0.05    –        –     1 min          n/a               0.004   0.009   0.261   0.111     24.3
Independent noise with rounding  0.10    0.00001  –     10 s           0.12              0.949   0.986   0.053   0.011     0.0
                                 0.10    0.00001  –     1 min          0.17              0.902   0.957   0.055   0.011     2.4
                                 0.05    0.00001  –     10 s           0.08              0.893   0.962   0.066   0.015     0.8
                                 0.05    0.00001  –     1 min          n/a               0.005   0.009   0.259   0.106     24.5
                                 0.10    0.0001   –     10 s           0.11              0.895   0.963   0.090   0.019     0.0
                                 0.10    0.0001   –     1 min          0.17              0.881   0.950   0.052   0.011     2.8
                                 0.05    0.0001   –     10 s           0.07              0.490   0.693   0.137   0.042     2.2
                                 0.05    0.0001   –     1 min          n/a               0.002   0.005   0.243   0.096     26.1
Sluggish rounding                0.10    –        0.08  –              0.17              0.956   0.989   0.053   0.012     0.0
                                 0.05    –        0.04  –              0.09              0.953   0.992   0.051   0.010     0.0
5.3. Forecasting assessment

To distinguish it from alternatives, the proposed estimator will now be referred to as the Alternation Estimator. On the ith day it is written Alt^i. This section uses the Vodafone best bid data (excluding the six aforementioned days) in a simple forecasting assessment of Alt^i. I follow Andersen et al. (2007) in turning to the HAR-RV model of Corsi (2009). Volatility is proxied both by {RV^i_{30 min}} and by Alt^i. The results suggest, first, that the
Table 2 Regressions of two proxies for Vodafone’s QV against lagged values of the Alternation Estimator and RVζi . Standard errors are reported below estimates. No Newey-West-type corrections were made, but specification tests were performed on the residuals. Bold type indicates significance at 5%. All p-values are given in italics. Explained variable
[Table 2 body: for each explained variable (Alt^i and RV^i_{30 min}, with ζ set to 30, 15, 5 and 1 min) the table reports the constant and the lagged daily, weekly and monthly Alternation Estimator and RV_ζ terms, with standard errors beneath each estimate; p-values for the joint significance of the lagged Alternation terms, of the lagged RV terms, and of all regressors; and p-values of specification tests on the residuals (normality, autocorrelation, heteroskedasticity).]
Alternation Estimator is better forecast than {RV^i_{30 min}}. Second, forecasts of volatility proxies are improved by using lagged values either (1) of the Alternation Estimator or (2) of the biased high-frequency Realized Volatilities, {RV^i_{5 min}} and {RV^i_{1 min}}. In-sample OLS regressions were used, which provide useful comparisons even in this short sample.

The dependence of RV^i_{30 min} and Alt^i on lagged variables is assessed in the models

  RV^i_{30 min} or Alt^i = β_0 + β_D RV^{i−1}_ζ + β_W Σ_{j=1..5} RV^{i−j}_ζ + β_M Σ_{j=1..22} RV^{i−j}_ζ
                           + χ_D Alt^{i−1} + χ_W Σ_{j=1..5} Alt^{i−j} + χ_M Σ_{j=1..22} Alt^{i−j} + ε^i,
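The HAR design just described can be sketched as follows (Python; plain 5-day and 22-day sums without the paper's holiday adjustments, and an ordinary least-squares fit via `numpy.linalg.lstsq`):

```python
import numpy as np

def har_design(v, weekly=5, monthly=22):
    """Sketch of a HAR-RV regression: for a daily volatility series v,
    build a constant plus lagged daily, weekly-sum and monthly-sum
    regressors, and fit the one-day-ahead value by OLS. Here the weekly
    and monthly terms are plain sums over 5 and 22 days (an assumption;
    the paper adjusts linearly for public holidays)."""
    n = len(v)
    rows, y = [], []
    for i in range(monthly, n):
        rows.append([1.0,
                     v[i - 1],                  # last day
                     sum(v[i - weekly:i]),      # last week
                     sum(v[i - monthly:i])])    # last month
        y.append(v[i])
    X = np.array(rows)
    y = np.array(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS coefficients
    return beta, X, y
```

In the paper's application the same design is built twice, once in RV_ζ and once in the Alternation Estimator, and the two blocks of regressors are combined.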
where the regressors are independent of ε^i, and the ε^i are identically distributed and homoskedastic of mean zero. The frequency, ζ, is set to four values: 30, 15, 5 and 1 min. The coefficients β_D, β_W and β_M describe the effect of the last day's, week's and month's RV respectively. The coefficients χ_D, χ_W and χ_M do likewise for Alt^i. In the construction of weekly and monthly quantities, appropriate linear adjustments were made to allow for public holidays. There were 119 daily observations. Repeats of all regressions excluded lagged values of Alt^i.

The results are reported in Table 2. Except when ζ = 5 min, Alt^i shows significant dependence on lagged terms (at 5%), while RV^i_{30 min} shows none throughout. Forecasting Alt^i, when ζ = 30 min or 15 min, lagged values of RV_ζ are jointly insignificant. However, adding lagged values of Alt^i gives significant regressions.

5.4. Extension to markets that do not trade on a penny

Vodafone is one of the few equities on the LSE which trades on a penny. Elsewhere, the model is mis-specified. GlaxoSmithKline (GSK) provides an example of this: over the same 147 days as Vodafone, the mean bid-ask spread was 1.15 pence, but the tick size was 1 penny. Jumps by more than the price tick are correspondingly more prevalent than for Vodafone, representing 4.9 (4.6)% of changes in the best bid (ask). To make this
data applicable, initial preparation is required. I propose two techniques.
• First, the quote may be rounded down (or up) to the nearest even (or odd) multiple of the price tick.
• Second, the quote may be ''sluggishly rounded'': specifically, it is practical to study SR^{ρ,2k}_Y, with ρ set to 2k, in place of the observed data, Y.

Both of these may result in processes that mostly contain jumps of 2k, making them amenable to the current method. The midquote, whose price increment is half the price tick size, can also be (sluggishly) rounded. A larger multiple of k than 2 can be used.

Specification testing and results. For each observed day, GSK's bid, ask and mid-quote were separately prepared using rounding and sluggish rounding. The results of specification testing and estimation are in Table 3. With the same provisos as for the Vodafone data, all the methods of preparation produce fairly well specified models. As documented in Table 3, although the six methods result in different numbers of jumps per day, and substantially differing propensities to alternate, they imply very similar estimates of underlying QV. In applications, it would be advisable to average all six.

6. Conclusion

This paper views the observed price as a pure jump process whose deviations from an underlying stochastic process are stationary in business time. Noting that on many markets the amount by which quotes jump is constant, it proposes an estimator of the underlying price's QV which scales down the quoted price's observed QV by a factor that takes into account its propensity to alternate. Under conditions, the estimator is consistent in an appropriate asymptotic theory, which is confirmed in calibrated simulations. Simple rounding techniques widen the range of applicable markets. The estimator is shown to be valid for two UK equities. Analysis of its bias and use in forecasting produces favorable results. Future research could extend the results in three important ways: to account for varying jump sizes in the market microstructure, to treat large discontinuities in the underlying price, and to estimate the covariation between assets.
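The two preparation techniques can be sketched as follows (illustrative Python; `round_to_grid` implements rounding down to even multiples of the tick, and `sluggish_prepare` applies Definition 11 to the quote itself with jump size 2k and trigger ρ = 2k):

```python
import numpy as np

def round_to_grid(prices, k, multiple=2):
    """First technique (sketch): round each quote down to the nearest even
    multiple of the tick k, so surviving jumps are mostly of size 2k.
    (Rounding up to odd multiples is analogous.)"""
    step = multiple * k
    return step * np.floor(np.asarray(prices, dtype=float) / step)

def sluggish_prepare(prices, k):
    """Second technique (sketch): sluggishly round the observed quote with
    jump size 2k and trigger rho = 2k. Each pass of the inner loop moves
    the prepared price 2k towards the quote, so it terminates."""
    sr = float(prices[0])
    out = []
    for p in prices:
        while abs(sr - p) >= 2 * k:
            sr += 2 * k if p > sr else -2 * k
        out.append(sr)
    return np.array(out)
```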
Table 3
Results of specification testing and estimation for GSK quotes, rounded.

                                           Per cent of half-days      Jumps     Cont./Alt.   Estimated daily
                                           failing spec. test at 5%   per day                QV (pence²)
Bid        rounded to the nearest 2p       3.7                        210       0.19         156
           sluggishly rounded to 2p        7.8                         61       0.62         151
Ask        rounded to the nearest 2p       6.1                        211       0.19         158
           sluggishly rounded to 2p        6.5                         60       0.65         155
Mid-quote  rounded to the nearest 1.5p     7.8                        278       0.25         156
           sluggishly rounded to 1.5p      8.2                         77       0.89         155
Acknowledgements

I thank Yacine Aït-Sahalia, Julio Cacho-Diaz, Peter Hansen, Mike Ludkovski, Nour Meddahi, Per Mykland, Neil Shephard and two anonymous referees for their help and encouragement. I also thank for very helpful comments conference and seminar participants at Stanford University, at the Frontiers in Time Series Analysis Conference in Olbia, Italy, at the Princeton-Chicago Conference on the Econometrics of High Frequency Financial Data at Bonita Springs, Florida, and at the European Winter Meeting of the Econometric Society 2005, Istanbul. I am very grateful to the Bendheim Center for Finance for accommodating me at Princeton University during part of the writing of this paper, and to the US-UK Fulbright Commission for their support.

Appendix A. Proof of Proposition 3.1

Let u = V − W be the microstructure effect in business time. The proof equates the variance of u at (i.e. just after) two consecutive jumps, say the second and third, at (random) times t_2 and t_3. Define λ̄ as the average intensity of jumping in business time. Define {w_i} and {v_i} as the sequences of random increments in W and V:

  w_i = W_{t_i} − W_{t_{i−1}},   v_i = V_{t_i} − V_{t_{i−1}},   (25)

so that v_3 is equal to the jump at t_3, i.e. ±k, and {w_i} are i.i.d. of known variance: w_i ∼ N(0, t_i − t_{i−1}). It then follows that u_{t_i} = u_{t_{i−1}} + v_i − w_i. So,

  E[u²_{t_3}] = E[(u_{t_2} + v_3 − w_3)²]   (26)
             = E[u²_{t_2} + v_3² + w_3² − 2w_3 u_{t_2} + 2v_3 u_{t_2} − 2v_3 w_3]   (27)
             = E[u²_{t_2} + v_3² + w_3² − 2w_3 u_{t_2} + 2v_3 (u_{t_2} − w_3)].   (28)

But by Assumptions (B) and (D), v_3 is −k sign(u_{t_2} − w_3). Furthermore, E(w_i²) is 1/λ̄. Further, as W is a martingale, E[w_3 | u_{t_2}] = 0, and so E[w_3 u_{t_2}] = 0. So, (28) is

  E[u²_{t_2}] + k² + 1/λ̄ − 2k E[|u_{t_2} − w_3|].   (29)

Moreover, (u_{t_2} − w_3) is u_{t_3−}, the limit of u just before the jump at t_3. As u is stationary, we may equate E[u²_{t_2}] and E[u²_{t_3}] to obtain the equality E[|u_{t_3−}|] = (k/2)[1/(k²λ̄) + 1]. But, conditional on [X]_T, E[Y]_T = k²λ̄[X]_T, so that 1/(k²λ̄) = R. As one could equally have looked at any two successive jumps, the proposition follows.

Appendix B. Proof of Proposition 3.3

First suppose that Y has Ideal Error. Then Z = Y has Ideal Error trivially. Now, and for the rest of the proof, assume that Y does not have Ideal Error. If Q has a first lag autocorrelation of zero then it is easily checked that E(G_{t_{i+1}} | G_{t_i}) = E(G_{t_{i+1}} | G_{t_i}, G_{t_{i−1}}). Therefore, by the Identification Assumption, E(ϵ_{t_{i+1}−} | G_{t_i}) = E(ϵ_{t_{i+1}−} | G_{t_i}, G_{t_{i−1}}). But then, as no jumps occurred between t_i and t_{i+1}, E(ϵ_{t_i} | G_{t_i}) = E(ϵ_{t_i} | G_{t_i}, G_{t_{i−1}}). The Proposition now follows from Corollary B.1.

Corollary B.1. Assume Assumptions (B), (C) and (D) of Theorem 2.1, and that Y does not have Ideal Error. Then Z has Ideal Error iff at each jump, timed t_i, i > 1,

  E(ϵ_{t_i} | G_{t_i}) = E(ϵ_{t_i} | G_{t_i}, G_{t_{i−1}}).   (30)

Proof. If Z has Ideal Error, then by Lemma B.2, for all t,

  E(Z_t − X_t | the last two jumps in Y went up, then down) = 0.   (31)

So, conditional on the two jumps in Y before t going up, then down, E(Y_t − X_t) = Y_t − Z_t. So, E(ϵ_t | last 2 jumps in Y went up, then down) = E(ϵ_t | last jump in Y went down). The corollary now follows by the buy-sell symmetry of the model, considering exhaustively the four cases where, prior to t, the last 2 jumps in Y went up, then down; up, then up; down, then up; and down, then down.

So Z has Ideal Error when conditioning not only on the last jump, but also on the one before, leaves unchanged the best estimate of X_t given Y_t.

Lemma B.2. Assume Assumptions (B), (C) and (D) of Theorem 2.1. Then for any t,

  E[Z]_T − [X]_T = 2(R − 1) E[Y]_T p_A E[(Z_t − X_t)/k | the last two jumps in Y went up, then down],   (32)

where p_A is the probability that a jump is an alternation.

Proof. See Appendix F.

Appendix C. Proof of Proposition 3.4

Case where Y has Ideal Error. Then R = 1. By Proposition 3.1, the expected value of |ϵ_t| just before a jump is k. Therefore, the expected value of ϵ_t conditional on Y just after an upwards jump is 0. By (E), Y then has equal probability of jumping up as down. As Q is uncorrelated, the probability that any given jump is an alternation is 0.5. Hence E[A_T − C_T] = 0.

Case where Y does not have Ideal Error. This case contains an important argument. Condition on [X]_T. Z has Ideal Error if

  [X]_T = E([Z]_T) = E(k²(C_T + A_T R²)).   (33)

Also, R is defined by [X]_T = R E([Y]_T) = E(k²R(C_T + A_T)). Subtracting and dividing by k², we therefore have the moment condition

  E[(C_T + A_T R²) − R(C_T + A_T)] = 0.   (34)

Or, factorizing, (R − 1) E[A_T R − C_T] = 0. Since Y does not have Ideal Error, R ≠ 1. Divide through by (R − 1) to obtain E[A_T R − C_T] = 0.
Appendix D. Proof of Proposition 3.5

Condition on [X]_T, and define a business time, S, by S = [X]_T. Let {s_1, s_2, s_3, ...} be the business times of the observed jumps in Y, i.e. the times of the jumps in V. Then Π reduces to {(λ̄(s_i − s_{i−1}), (1 + Q_i)/2) : i ∈ N}. The second component is 1 when Y alternates, and 0 when Y continues. We take the limit as α → 0 of P_{V|W}(v^[α], w^[α]). Consider (V^[α], W^[α]). This pair's distribution is unchanged as α ↓ 0, but V^[α] is observed for a longer time, until time S/α². For given α, N_T is the number of jumps in V^[α] before time S/α². As α ↓ 0, N_T → ∞ with probability 1. As S/α² is the sum of the durations between the observed jumps (in business time), plus the time after the last jump in the sample, by a standard CLT,

  lim_{α→0} √N_T [ (λ̄S/(α²N_T), A_T/N_T) − (1, p_A) ] ∼ N(0, M),   (35)

where p_A is the probability that a jump is an alternation. Let f : (x, y) → (1 − y)/(xy). Then f has positive derivative in R⁺ × R⁺, so by the Delta Method,

  lim_{α→0} √N_T [ N_T C_T α² / (λ̄ A_T S) − (1 − p_A)/p_A ] ∼ N(0, df′ M df),   (36)

where df is evaluated at (1, p_A)′. By Proposition 3.4, (1 − p_A)/p_A = R, so df|_{(1, p_A)′} = −RU (after algebra) and

  lim_{α→0} √N_T [ N_T (αk)² C_T / (R A_T S k² λ̄) − 1 ] ∼ N(0, UMU′).   (37)

Finally, recall: N_T(αk)² = [Y]_T; k²λ̄R = 1; S = [X]_T; and substitute into (37).

Appendix E. Proof of Proposition 4.1

Condition on [X]_T. In business time, the duration between jumps is the time taken for a standard Brownian motion to exit the interval (−k, Rk). Hence Q and Π are i.i.d. The probability that the level Rk is reached before the level −k is known to be k/(k + Rk), or 1/(1 + R). This also equals p_A. The expected time to the first exit is Rk², which is therefore also E[s_i − s_{i−1}]. The following formulae are derived from results in Borodin and Salminen (1996): the variance of the time between jumps is (1/3)Rk⁴(1 + R²), so var[λ̄(s_i − s_{i−1})] = (1 + R²)/(3R). And E[s_i − s_{i−1} | Q_i = 1] = Rk²(R + 2)/3. Therefore cov[Q_i, λ̄(s_i − s_{i−1})] = −(1 − R)/(3(1 + R)). Also var[Q_i] = R/(1 + R)². These give all entries of M. UMU′ follows.

Appendix F. Proof of Lemma B.2

Let S = [X]_T be known. Let Z̃ be Z ∘ [X]⁻¹, i.e. Z̃ is Z as it evolves in business time. Define η_s = Z̃_s − W_s. So, η is the error in Z, as it evolves in business time. As V − W is stationary, η is too. Let it follow the differential equation dη_s = H_s dN′_s − dW_s, so that N′ is the driving counting process of V. Say that the adapted intensity process of this counting process is λ. H is a process which takes value ±k, and ±Rk, depending on whether V is alternating or continuing, up or down. Then, E((η_s + dη_s)²) = E(η_s²). Therefore, E(η_s² + 2η_s dη_s + dη_s²) = E(η_s²), and so −2E(η_s dη_s) = E(dη_s²). Or,

  −2E(η_s (H_s dN′_s − dW_s)) = E((H_s dN′_s − dW_s)²).   (38)

So −2E(η_s H_s λ_s) ds = E(H_s² λ_s) ds + ds. Multiplying by −S/ds and adding on a constant,

  2S E((η_s + H_s) H_s λ_s) = S E(H_s² λ_s) − S.

But the left hand side of this is 2Sλ̄ E(η_s H_s | jump at s), while the right hand side is E[Z]_T − [X]_T. Putting this together, given the buy-sell symmetry of η,

  E[Z]_T − [X]_T = 2Sλ̄ E(η_s H_s | Y jumped up at s).   (39)

But, E[Y]_T = Sλ̄k², so E[Z]_T − [X]_T = (2/k²) E[Y]_T E(η_s H_s | Y jumped up at s). So, distinguishing alternation from continuation in order to extract |H| from this,

  E[Z]_T − [X]_T = (2/k²) E[Y]_T {p_A Rk E(η_s | Y alternated up at s) + (1 − p_A) k E(η_s | Y continued up at s)}
                 = (2/k) E[Y]_T E(η_s | Y jumped up at s) + (2/k) E[Y]_T p_A (R − 1) E(η_s | Y alternated up at s)
                 = 0 + (2/k) E[Y]_T p_A (R − 1) E(η_s | Y alternated up at s).   (40)

Therefore, E[Z]_T − [X]_T = −(2/k) E[Y]_T (1 − R) p_A E(η_s | Y alternated up at s). Or,

  E[Z]_T − [X]_T = (2/k) E[Y]_T (1 − R) p_A E(η_s | Y alternated down at s).   (41)
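The exit-time facts invoked in Appendix E are easy to verify numerically. The sketch below (an illustrative Euler-discretized Monte Carlo, not part of the paper) estimates the probability of exiting (−k, Rk) through the upper barrier and the mean exit time, which should lie close to 1/(1 + R) and Rk² respectively:

```python
import numpy as np

def exit_simulation(k=1.0, R=2.0, n_paths=2000, dt=0.01, seed=7):
    """Monte Carlo check (a sketch, not from the paper) of two facts used
    in Appendix E: a standard Brownian motion started at 0 exits (-k, R*k)
    through the upper level with probability 1/(1+R), and its expected
    exit time is R*k**2. The Euler step dt introduces a small bias."""
    rng = np.random.default_rng(seed)
    upper_hits, total_time = 0, 0.0
    step_sd = np.sqrt(dt)
    for _ in range(n_paths):
        w, t = 0.0, 0.0
        while -k < w < R * k:                 # diffuse until the interval is left
            w += step_sd * rng.standard_normal()
            t += dt
        upper_hits += w >= R * k              # exit through the upper barrier
        total_time += t
    return upper_hits / n_paths, total_time / n_paths
```

With R = 2 the targets are p_A = 1/3 and E[s_i − s_{i−1}] = 2, up to Monte Carlo and discretization error.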
References

Aït-Sahalia, Y., 2002. Telling from discrete data whether the underlying continuous-time model is a diffusion. Journal of Finance 57, 2075–2112. Aït-Sahalia, Y., Mykland, P.A., Zhang, L., 2005. How often to sample a continuous-time process in the presence of market microstructure noise. Review of Financial Studies 18, 351–416. Andersen, T.G., Bollerslev, T., Diebold, F., 2007. Roughing it up: including jump components in the measurement, modeling and forecasting of return volatility. Review of Economics and Statistics 89, 707–720. Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P., 2000. Great realizations. Risk 13, 105–108. Andersen, T.G., Bollerslev, T., Meddahi, N., 2004. Analytic evaluation of volatility forecasts. International Economic Review 45, 1079–1110. Ball, C.A., 1988. Estimation bias induced by discrete security prices. Journal of Finance 43, 845–865. Bandi, F.M., Russell, J.R., 2006a. Comment on 'realized variance and market microstructure noise' by Hansen and Lunde. Journal of Business and Economic Statistics 24, 167–173. Bandi, F.M., Russell, J.R., 2006b. Separating market microstructure noise from volatility. Journal of Financial Economics 79, 655–692. Barndorff-Nielsen, O., Hansen, P., Lunde, A., Shephard, N., 2008. Designing realized kernels to measure the ex-post variation of equity prices in the presence of noise. Econometrica 76, 1481–1536. Barndorff-Nielsen, O., Shephard, N., 2002. Econometric analysis of realised volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society B 64, 253–280. Barndorff-Nielsen, O., Shephard, N., 2004. Power and bipower variation with stochastic volatility and jumps (with discussion). Journal of Financial Econometrics 2, 1–48. Barndorff-Nielsen, O., Shephard, N., 2006. Econometrics of testing for jumps in financial economics using bipower variation. Journal of Financial Econometrics 4, 1–30. Borodin, A.N., Salminen, P., 1996.
Handbook of Brownian Motion - Facts and Formulae. Birkhäuser, Basel. Clark, P.K., 1973. A subordinated stochastic process model with finite variance for speculative prices. Econometrica 41, 135–155. Corsi, F., 2009. A simple long memory model of realized volatility. Journal of Financial Econometrics 7, 174–196. Curci, G., Corsi, F., 2005. Discrete sine transform approach for realized volatility measurement. University of Southern Switzerland (unpublished).
Degryse, H., de Jong, F., van Ravenswaaij, M., Wuyts, G., 2005. Aggressive orders and the resiliency of a limit order market. Review of Finance 9, 201–242. Delattre, S., Jacod, J., 1997. A central limit theorem for normalized functions of the increments of a diffusion process, in the presence of round-off errors. Bernoulli 3, 1–28. Engle, R.F., Russell, J.R., 2005. A discrete-state continuous-time model of financial transactions prices: the ACM-ACD model. Journal of Business and Economic Statistics 23, 166–180. Ghysels, E., Harvey, A., Renault, E., 1996. Stochastic Volatility. In: Rao, C.R., Maddala, G.S. (Eds.), Statistical Methods in Finance. North-Holland, Amsterdam, pp. 119–191. Gottlieb, G., Kalay, A., 1985. Implications of discreteness of observed stock prices. Journal of Finance 40, 135–153. Hansen, P., Large, J., Lunde, A., 2008. Moving average-based estimators of integrated variance. Econometric Reviews 27, 79–111. Hansen, P., Lunde, A., 2006. Realized variance and market microstructure noise. Journal of Business and Economic Statistics 24, 127–281. The 2005 Invited Address with Comments and Rejoinder. Harris, L., 1994. Minimum price variations, discrete bid-ask spreads, and quotation sizes. Review of Financial Studies 7, 149–178.
Large, J., 2007. Measuring the resiliency of an electronic limit order book. Journal of Financial Markets 10, 1–25. Li, Y., Mykland, P.A., 2006. Are volatility estimators robust with respect to modeling assumptions? Bernoulli 13, 601–622. Monroe, I., 1978. Processes that can be embedded in Brownian motion. The Annals of Probability 6, 42–56. Newey, W.K., West, K., 1987. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703–708. Oomen, R.A.C., 2006. Properties of bias-corrected realized variance under alternative sampling schemes. Journal of Financial Econometrics 3, 555–577. Shephard, N., 2005. Stochastic Volatility: Selected Readings. Oxford University Press, Oxford. Zeng, Y., 2003. A partially observed model for micromovement of asset prices with Bayes estimation via filtering. Mathematical Finance 13, 411–444. Zhang, L., Mykland, P., Aït-Sahalia, Y., 2005. A tale of two timescales: determining integrated volatility with noisy high-frequency data. Journal of the American Statistical Association 100, 1394–1411. Zhou, B., 1996. High-frequency data and volatility in foreign-exchange rates. Journal of Business and Economic Statistics 14, 45–52.
Journal of Econometrics 160 (2011) 12–21
Econometric analysis of jump-driven stochastic volatility models Viktor Todorov Department of Finance, Kellogg School of Management, Northwestern University, Evanston, IL 60208, United States
Article history: Available online 6 March 2010

JEL classification: G12; C51; C52

Keywords: Lévy process; Method-of-moments; Power variation; Quadratic variation; Realized variance; Stochastic volatility
Abstract: This paper introduces and studies the econometric properties of a general new class of models, which I refer to as jump-driven stochastic volatility models, in which the volatility is a moving average of past jumps. I focus attention on two particular semiparametric classes of jump-driven stochastic volatility models. In the first, the price has a continuous component with time-varying volatility and time-homogeneous jumps. The second jump-driven stochastic volatility model analyzed here has only jumps in the price, which have time-varying size. In the empirical application I model the memory of the stochastic variance with a CARMA(2,1) kernel and set the jumps in the variance to be proportional to the squared price jumps. The estimation, which is based on matching moments of certain realized power variation statistics calculated from high-frequency foreign exchange data, shows that the jump-driven stochastic volatility model containing a continuous component in the price performs best. It outperforms not only a standard two-factor affine jump–diffusion model but also the pure-jump jump-driven stochastic volatility model for the particular jump specification. © 2010 Elsevier B.V. All rights reserved.
1. Introduction

Continuous-time stochastic volatility models have been used for a long time in the theoretical and empirical finance literature. The typical way of modeling the volatility process in these models is with (a superposition of) non-negative diffusion processes (e.g. square-root processes). However, empirical results in the studies of Eraker et al. (2003) and Pan (2002), among others, suggest that including jumps in the stochastic volatility is necessary in order to account for sudden big changes in volatility. The main goal of this paper is to introduce and analyze some of the econometric properties of a general class of continuous-time models, referred to as jump-driven stochastic volatility (hereafter JDSV) models, and to check empirically whether models in this class can provide a good fit to high-frequency financial data. The distinctive characteristic of the JDSV models is that the stochastic variance is purely jump-driven. It is modeled as a general moving average of past positive jumps. This way of modeling the variance has several attractive features. First, it allows for flexible and at the same time parsimonious modeling of the persistence in the variance by an appropriate choice of the weights which past jumps receive in the current value of the variance. Second, since the variance is driven by jumps, it has the natural ability to change quickly. Third, the linearity of the stochastic variance with respect to the jumps makes the JDSV models analytically tractable.
E-mail address: [email protected]. doi:10.1016/j.jeconom.2010.03.009
The paper analyzes two classes of JDSV models. In the first class, the price is comprised of a continuous component with stochastic volatility plus a time-homogeneous jump component. I refer to these models as jump–diffusion JDSV models. In the second class of models, referred to as pure-jump JDSV models, the price contains only jumps. The price jumps in this second class of models exhibit stochastic volatility. I derive moments of the return process in closed form for both classes of JDSV models. In the empirical part of the paper I estimate semiparametric specifications of the JDSV models and compare their performance with a standard two-factor affine jump–diffusion model. The estimation of the models is based on matching moments of daily realized power variation statistics constructed from high-frequency FX data, and a Monte Carlo study documents satisfactory performance of the estimator. In the estimated JDSV models the memory of the stochastic variance is modeled with a CARMA(2,1) kernel^{1} and the jumps in the variance are set proportional to the squared price jumps without specifying the jump measure itself. The jump dependence in the estimated JDSV models is quite appealing — it bears analogy with GARCH models in discrete time, and at the same time it illustrates nicely the flexibility of the jump modeling in the JDSV models. My empirical results show that the model providing the best fit to the data is the jump–diffusion JDSV model for the particular
1 CARMA stands for continuous-time autoregressive moving average. As later explained in the paper, the choice of CARMA(2,1) kernel in the current setting produces dynamics of the stochastic variance analogous to that of the traditional two-factor stochastic volatility models.
jump specification. This model is found to outperform the standard two-factor affine jump–diffusion model. On the other hand, the estimation results show that the pure-jump JDSV model in which variance jumps are proportional to the squared price jumps does not fit the high-frequency data. Intuitively, the two-factor affine jump–diffusion model cannot generate enough volatility in the stochastic variance. On the other hand, the pure-jump JDSV model generates too much volatility of the stochastic variance to be consistent with the value of the fourth power variation observed in the data. Finally, the estimation results provide empirical support for the CARMA Lévy-driven modeling of the volatility, which was recently analyzed theoretically in Brockwell (2001). I proceed with a short comparison of the current paper with the related literature. Barndorff-Nielsen and Shephard (2001) were the first to propose a model in which the stochastic variance is purely jump-driven. Their Non-Gaussian OU-SV model is nested in the jump–diffusion JDSV class defined here. In the Non-Gaussian OU-SV model the memory function is exponential and the jumps in the price are proportional to the jumps in the variance. Brockwell (2001) extended the Non-Gaussian OU-SV model by considering CARMA memory functions.^{2} The jump-driven JDSV class that I propose here improves on and generalizes these models in the following directions: (i) the memory function is allowed to be completely general (subject to integrability conditions), (ii) the dependence between the jumps in the price and the variance encompasses all possible cases (and not just the perfect linear dependence that was previously considered), (iii) the activity of the price jumps is unrestricted (it can even be of infinite variation). While the first class of JDSV models generalizes a particular model that was already available in the literature, the second class, i.e. the pure-jump JDSV class, has no known precedents.
The closest model is the pure-jump COGARCH of Klüppelberg et al. (2004) — it is comparable with a pure-jump JDSV model in which the jumps in the variance are proportional to the squared price jumps. Common to both models is that the time-variation in the price jumps is introduced through time-variation in the jump size.^{3} The difference is that in the COGARCH model the stochastic variance is driven by the price jumps, which are time-varying, while in the jump-driven JDSV models the driving jumps in the variance specification are time-homogeneous. Turning to the estimation, the current work is naturally related to previous studies which derive in closed form moments of returns associated with continuous-time stochastic volatility models. Das and Sundaram (1999) derive unconditional moments of Heston's model. Pan (2002) derives joint moments of returns and spot volatility in a one-factor affine jump–diffusion model. She uses these moments for joint inference in an implied-state GMM estimation based on spot and option data. Meddahi (2002) derives both conditional and unconditional moments of general eigenfunction stochastic volatility models, which nest most of the diffusive-volatility continuous-time models used in finance. All these studies consider estimation at low frequency. In contrast, in this paper the estimation is based on aggregating high-frequency data into daily statistics and matching moments of these statistics. This estimation method has been analyzed theoretically in a time-homogeneous context in Ait-Sahalia (2004). Empirical applications so far have restricted attention to matching
2 To generate richer autocorrelation structure of the stochastic variance, Barndorff-Nielsen and Shephard (2001) propose also superpositions of Lévy-driven OU processes for modeling the stochastic variance. 3 An alternative way of introducing stochastic volatility in pure-jump models is to introduce time-variation in the compensator of the price jumps. Examples of such models in the literature are the time-changed Lévy processes studied in Carr et al. (2003); Carr and Wu (2004) and Barndorff-Nielsen and Shephard (2006).
moments only of the realized variance. Bollerslev and Zhou (2002) treat the realized variance as the unobservable integrated variance in a GMM estimation of affine jump–diffusion models.^{4} Barndorff-Nielsen and Shephard (2002, 2006) have used the realized variance to estimate via QMLE part of the parameters of Non-Gaussian OU-SV models and time-changed Lévy models.^{5} But because Barndorff-Nielsen and Shephard (2002, 2006) use only the realized variance, they are unable to separate the continuous and discontinuous price components. In this paper I use not only the realized variance in the estimation, but also the realized fourth power variation. That enables me to separate the two price components. As the results show, including the fourth power variation in the estimation penalizes strongly for the omission of price jumps from the estimated model. The rest of the paper is organized as follows. In Section 2 I introduce the JDSV models in their general form, and in Sections 2.1 and 2.2 I analyze two classes of these models — the jump–diffusion and the pure-jump JDSV models respectively. Section 3 contains the empirical part of the paper. Section 3.1 gives details on the method of estimation that is used. In Sections 3.2 and 3.3 I specify the memory function and the jump dependence in the estimated JDSV models. Section 3.4 specifies the two-factor affine jump–diffusion model that is used to compare with the JDSV models. Section 3.5 contains a Monte Carlo study to assess the performance of the estimator. Finally, in Section 3.6 I discuss the estimation results. Section 4 concludes the paper and points out directions for future research. The proofs of all results in the paper are contained in an Appendix available upon request.

2. Jump-driven stochastic volatility model

The JDSV model for the logarithmic asset price p(t) is specified with the following two equations:

p(t) = p(0) + \alpha t + \int_0^t \sigma_1(s-)\,dW(s) + \int_0^t\!\int_{\mathbb{R}^n_0} \sigma_2(s-)\,g(x)\,\tilde{\mu}(ds,dx),  (1)

\sigma_i^2(t) = \bar{\sigma}_{i0} + \int_{-\infty}^t\!\int_{\mathbb{R}^n_0} f_i(t-s)\,k_i(x)\,\mu(ds,dx) \quad \text{for } i = 1, 2,  (2)
where W(t) is a standard Brownian motion; \mu denotes a homogeneous Poisson random measure on \mathbb{R} \times \mathbb{R}^n_0 with compensator \nu(ds,dx) = ds\,G(dx) for some positive \sigma-finite measure G(\cdot), and \tilde{\mu} is the compensated version of \mu, i.e. \tilde{\mu} = \mu - \nu; g: \mathbb{R}^n_0 \to \mathbb{R}_0; k_i: \mathbb{R}^n_0 \to \mathbb{R}_+, f_i: \mathbb{R} \to \mathbb{R}_+ and \bar{\sigma}_{i0} \ge 0 for i = 1, 2. Since f_i(\cdot) and k_i(\cdot) take only non-negative values, the integral in (2) is non-negative. The assumption of a constant drift term in Eq. (1) is for simplicity. The main distinctive feature of this class of models is that the state variables (when time-varying) are modeled as moving averages of past jumps. \sigma_i^2(t) can be written equivalently as (with the normalization f_i(0) = 1)
\sigma_i^2(t) = \bar{\sigma}_{i0} + \sum_{s \le t} f_i(t-s)\,\Delta\sigma_i^2(s).
From this representation it is clear that the function fi (·) determines the weight which past jumps have in the current value
4 A similar approach is adopted by Garcia et al. (2006) in the estimation of objective and risk-neutral distributions. 5 Roberts et al. (2004) and Griffin and Steel (2006) use MCMC techniques to estimate, based on daily data, parametric specifications of the Non-Gaussian OU-SV model.
of the state variable \sigma_i^2(t). Therefore the function f_i(\cdot) is referred to as the memory function or kernel. Thus, the memory functions f_1(\cdot) and f_2(\cdot) determine the pattern of the persistence in the stochastic variance. The specification of \sigma_i^2(t) in (2) is to be contrasted with the traditional way of modeling the stochastic variance, which is done by a superposition of independent variance factors.^{6} The advantage of modeling \sigma_i^2(t) with a moving average of jumps is that it allows for a succinct way of generating flexible dependence in the stochastic variance without the need to introduce many factors, which in general are hard to identify and price. Drawing on an analogy with time series modeling in discrete time, a natural choice for f_i(\cdot) is a CARMA kernel studied in Brockwell (2001). Intuitively, with a CARMA kernel for f_i(\cdot), \sigma_i^2(t) is a continuous-time analogue of the discrete-time ARMA process with non-Gaussian innovations. However, other kernels which satisfy certain integrability conditions (given below) could also be used. In addition to the convenient way of generating persistence in the volatility, the modeling of \sigma_i^2(t) as a moving average of positive jumps has the natural ability of generating sudden big moves in the variance. As already mentioned in the introduction, this is found empirically to be an important characteristic of the variance. Thus, these two important features of the variance, persistence and the ability to change quickly, can be easily captured by the JDSV models. Importantly, this is achieved without nonlinear transformations of underlying stochastic processes (as in the standard log-normal stochastic volatility model, for example). This makes it possible to derive a moving average type representation for the integrated state variables (given in Eq. (4) below), which is key to many of the subsequent results.
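To make the moving-average mechanism in (2) concrete, here is a minimal simulation sketch, assuming an exponential memory kernel and compound-Poisson variance jumps; both choices and all parameter values are illustrative rather than taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal sketch of Eq. (2)-style dynamics, assuming an exponential memory
# kernel f(u) = exp(-lam*u) and compound-Poisson positive variance jumps.
# All parameter values are illustrative, not estimates from the paper.
lam = 5.0                      # decay rate of the weight on past jumps
dt = 1.0/288                   # 5-minute grid
n = 2*288                      # two simulated days

counts = rng.poisson(50.0*dt, n)                               # jump arrivals per step
jumps = np.array([rng.exponential(0.02, c).sum() for c in counts])  # positive jump sizes

sigma2 = np.empty(n)
sigma2[0] = 0.04               # initial spot variance
for i in range(1, n):
    # exponential kernel => the moving average of past jumps obeys an exact
    # OU-type recursion: decay the old weighted sum, then add the new jumps
    sigma2[i] = np.exp(-lam*dt)*sigma2[i-1] + jumps[i]
```

The path stays non-negative by construction, mean-reverts between arrivals, and moves up abruptly at each jump, which is exactly the combination of persistence and sudden moves described above.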
The modeling of the jumps in the JDSV models provides a very convenient framework for describing the dependence between the jumps in the price and in the state variables. All jumps in the JDSV models are written as different functions of the jumps associated with a common Poisson measure, which is of arbitrarily big dimension. Thus, for example, independence between the jumps in the price and in the state variables can be generated by specifying x to be multidimensional with independent marginals and letting the jump functions k_i(x) and g(x) depend on different elements of the vector x. At the other extreme, perfect linear dependence between the jumps can be generated by making k_i(x) \propto g(x). The empirical section further illustrates the convenience of this way of modeling the jumps. It should be noted that the dependence between the price jumps and the jumps in the state variables is the only way through which the so-called ''leverage effect'' can be generated in the JDSV models. Therefore, in these models the modeling of the jump dependence is at the same time the modeling of the ''leverage effect''. I continue by fixing some notation associated with the JDSV model, which will be used throughout. I denote the continuously compounded return over the period (t, t+a] with r_a(t) = p(t+a) - p(t). The quadratic variation of the price process over the period (t, t+a] is given by
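The two polar cases of jump dependence described above can be sketched as follows; the mark distribution, the constants and the sample size are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Marks x = (x1, x2) with independent components; sizes and constants hypothetical.
x = rng.normal(size=(10000, 2))

# Independence: price jumps g(x) and variance jumps k(x) read different
# coordinates of the mark vector.
g_indep, k_indep = x[:, 0], x[:, 1]**2

# Dependence: k(x) proportional to g(x)^2, the GARCH-style case used later (H5/S6).
g_dep = x[:, 0]
k_dep = 0.5 * g_dep**2

corr_indep = np.corrcoef(g_indep**2, k_indep)[0, 1]   # close to zero
corr_dep = np.corrcoef(g_dep**2, k_dep)[0, 1]         # exactly one (linear link)
```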
[p,p]_{(t,t+a]} = \int_t^{t+a} \sigma_1^2(s)\,ds + \int_t^{t+a}\!\int_{\mathbb{R}^n_0} \sigma_2^2(s)\,g^2(x)\,\mu(ds,dx)
= \int_t^{t+a} \sigma_1^2(s)\,ds + \int_t^{t+a} \sigma_2^2(s)\,ds \int_{\mathbb{R}^n_0} g^2(x)\,G(dx) + \int_t^{t+a}\!\int_{\mathbb{R}^n_0} \sigma_2^2(s)\,g^2(x)\,\tilde{\mu}(ds,dx).  (3)

An easy application of Fubini's theorem gives the following representation for the integrated state variables over the period (t, t+a]:^{7}

IV_i^a(t) := \int_t^{t+a} \sigma_i^2(s)\,ds = a\bar{\sigma}_{i0} + \int_{-\infty}^{t+a}\!\int_{\mathbb{R}^n_0} H_i^a(t,s)\,k_i(x)\,\mu(ds,dx),  (4)

where

H_i^a(t,s) = \begin{cases} \int_t^{t+a} f_i(z-s)\,dz & \text{if } s < t \\ \int_s^{t+a} f_i(z-s)\,dz & \text{if } t \le s < t+a. \end{cases}  (5)
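The Fubini step behind the representation in Eq. (4) can be checked numerically: integrating the weight H^a(t,s) of Eq. (5) over all jump times s recovers a times the total kernel mass. The exponential kernel below is an illustrative stand-in for a generic memory function satisfying the integrability conditions:

```python
import numpy as np

# Numerical check: integral over s in (-inf, t+a] of H^a(t,s) equals
# a * integral_0^inf f(u)du. Exponential kernel f(u) = exp(-lam*u) is
# an illustrative choice; lam, t, a are hypothetical values.
lam, t, a = 3.0, 1.0, 0.1

s = np.linspace(t - 20.0, t + a, 400001)   # truncate -infinity at t - 20
lo = np.maximum(s, t)                      # lower integration limit in Eq. (5)
H = (np.exp(-lam*(lo - s)) - np.exp(-lam*(t + a - s))) / lam

ds = s[1] - s[0]
lhs = H.sum()*ds - 0.5*(H[0] + H[-1])*ds   # trapezoid rule over s
rhs = a/lam                                # a * integral_0^inf exp(-lam*u)du
```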
Eq. (4) shows that, like \sigma_i^2(t), the integrated quantity IV_i^a(t) is a weighted sum of past jumps. The only difference between \sigma_i^2(t) and IV_i^a(t) is in the weights which past jumps receive. In the next two subsections I study two classes of the JDSV models in which either \sigma_1^2(t) or \sigma_2^2(t) is equal to a constant. Therefore, to simplify notation, I remove the index i of the non-constant state variable \sigma_i^2(t) in these models^{8} and refer to \sigma^2(t) and \int_t^{t+a} \sigma^2(s)\,ds as the stochastic variance and integrated variance respectively.

2.1. Jump–diffusion JDSV model
The jump–diffusion JDSV model is a special case of the JDSV model in which the price jumps do not exhibit time variation, i.e. \sigma_2^2(t) = 1,^{9} and thus its dynamics is given by

p(t) = p(0) + \alpha t + \int_0^t \sigma(s-)\,dW(s) + \int_0^t\!\int_{\mathbb{R}^n_0} g(x)\,\tilde{\mu}(ds,dx), \quad \text{with } \sigma^2(t) = \int_{-\infty}^t\!\int_{\mathbb{R}^n_0} f(t-s)\,k(x)\,\mu(ds,dx).
This way of modeling the price is ''traditional'' in terms of the role which price jumps play. The continuous component of the price is expected to generate most of the small moves of the price, while the jump component of the price should account for the sudden relatively bigger changes in the price. However, note that unlike most jump–diffusion models in the financial literature, here the price jumps are left unrestricted with respect to their activity — they can even be of infinite variation. This is due to the modeling of the price jumps as integrals with respect to the compensated jump measure, \tilde{\mu}, which is adopted in this paper. For the existence of moments of the return process we need integrability conditions, which are given in assumptions H1–H4^{10} below:

H1. \int_{\mathbb{R}^n_0} g^2(x)G(dx) < \infty,
H2. \int_0^\infty f(s)\,ds < \infty and \int_{\mathbb{R}^n_0} k(x)G(dx) < \infty,
H3. \int_0^\infty f^2(s)\,ds < \infty and \int_{\mathbb{R}^n_0} k^2(x)G(dx) < \infty,
H4. \int_{\mathbb{R}^n_0} g^4(x)G(dx) < \infty.
In the next theorem I derive moments of the return process which will be used in the estimation.
6 Superposition of processes can be nested in the JDSV models by making the functions f_i(\cdot) and k_i(\cdot) conformable vectors. 7 Note that \sigma_i^2(t) is defined as a pathwise integral, which is almost surely finite.
Theorem 1 (Moments of the Jump–Diffusion JDSV Model). For the jump–diffusion JDSV model assume that conditions H1–H4 are satisfied. Then we have
8 This applies also to all quantities associated with the process \sigma_i^2(t), like f_i(\cdot) and k_i(\cdot). 9 And, as mentioned above, to ease notation I set \sigma^2(t) = \sigma_1^2(t) and drop the index 1 from all other quantities associated with the process \sigma_1^2(t). 10 Of course, these conditions imply that the integrals used in defining p(t) and \sigma^2(t) are square-integrable.
Var(r_a(t)) = a\int_0^\infty f(s)\,ds \int_{\mathbb{R}^n_0} k(x)G(dx) + a\int_{\mathbb{R}^n_0} g^2(x)G(dx),  (6)

E(r_a(t) - E(r_a(t)))^3 = 3\int_0^a H^a(0,u)\,du \int_{\mathbb{R}^n_0} k(x)g(x)G(dx) + a\int_{\mathbb{R}^n_0} g^3(x)G(dx),  (7)

E(r_a(t) - E(r_a(t)))^4 = 3\int_{-\infty}^a (H^a(0,u))^2\,du \int_{\mathbb{R}^n_0} k^2(x)G(dx) + 3a^2\Big(\int_0^\infty f(s)\,ds \int_{\mathbb{R}^n_0} k(x)G(dx)\Big)^2 + 6a^2\int_0^\infty f(s)\,ds \int_{\mathbb{R}^n_0} k(x)G(dx) \int_{\mathbb{R}^n_0} g^2(x)G(dx) + 6\int_0^a H^a(0,u)\,du \int_{\mathbb{R}^n_0} g^2(x)k(x)G(dx) + a\int_{\mathbb{R}^n_0} g^4(x)G(dx) + 3a^2\Big(\int_{\mathbb{R}^n_0} g^2(x)G(dx)\Big)^2,  (8)

and for h = a, 2a, 3a, \ldots we also have

Cov((r_a(0) - E(r_a(0)))^2, (r_a(h) - E(r_a(h)))^2) = \int_0^a H^a(h,u)\,du \int_{\mathbb{R}^n_0} g^2(x)k(x)G(dx) + \int_{-\infty}^a H^a(h,u)H^a(0,u)\,du \int_{\mathbb{R}^n_0} k^2(x)G(dx).  (9)

Eq. (7) reveals the source of the skewness of the returns in this model. The first term in (7) comes from the presence of the leverage effect, which in this model reduces to a negative linear relationship between the jumps in the price and the jumps in the stochastic variance \sigma^2(t). In addition, skewness in the returns could be generated through skewness in the jump component of the price, which is the second component of Eq. (7). Turning to the covariance between the demeaned squared returns, we see from Eq. (9) that it consists of two terms. The first term is due to the link between jumps in the price and those in the variance. If the jumps in the price and the variance are independent, then this term will disappear. The second term in Eq. (9) is due to the time-variation in \sigma^2(t).

2.2. Pure-jump JDSV model

The pure-jump JDSV model is a special case of the JDSV model in which the price does not contain a continuous component, i.e. in which \sigma_1^2(t) = 0,^{11} and therefore its dynamics is

p(t) = p(0) + \alpha t + \int_0^t\!\int_{\mathbb{R}^n_0} \sigma(s-)\,g(x)\,\tilde{\mu}(ds,dx), \quad \text{with } \sigma^2(t) = \int_{-\infty}^t\!\int_{\mathbb{R}^n_0} f(t-s)\,k(x)\,\mu(ds,dx).

11 Again, to simplify notation, I set \sigma^2(t) = \sigma_2^2(t) and drop the index 2 from all quantities associated with the process \sigma_2^2(t).

In contrast to the jump–diffusion JDSV model, in this model the jumps should account for big as well as very small moves in the price. Therefore, it is expected that the price jumps will exhibit paths of infinite variation, so that enough small moves in the price can be generated. The modeling of the asset price as solely driven by jumps was recently used in finance by Carr et al. (2003) and Barndorff-Nielsen and Shephard (2006). In these papers the stochastic volatility is generated by time-changing time-homogeneous (i.e. Lévy) jumps. In contrast, in the pure-jump JDSV model stochastic volatility is generated by introducing time-variation in the size of the price jumps. Conditions similar to H1–H4 are assumed to hold here as well:

S1. \int_{\mathbb{R}^n_0} g^2(x)G(dx) < \infty,
S2. \int_0^\infty f(s)\,ds < \infty and \int_{\mathbb{R}^n_0} k(x)G(dx) < \infty,
S3. \int_0^\infty f^2(s)\,ds < \infty and \int_{\mathbb{R}^n_0} k^2(x)G(dx) < \infty,
S4. \int_{\mathbb{R}^n_0} g^4(x)G(dx) < \infty.

As in the jump–diffusion JDSV model, these conditions guarantee that the returns and \sigma^2(t) are weakly stationary. Condition S4 is needed for deriving the fourth moment of the returns as well as the covariance of the squared returns, and it guarantees the finiteness of these moments. In addition, for deriving some of the moments of the return process we need the following additional condition:

S5. \int_{\mathbb{R}^n_0} g(x)k(x)G(dx) = 0 and \int_{\mathbb{R}^n_0} g^3(x)G(dx) = 0.

Assumptions S1–S4 are not restrictive in the sense that they are the minimal assumptions needed to estimate the model by a method-of-moments type estimator. Condition S5, on the other hand, rules out linear dependence between the jumps and the variance, and it is used for deriving closed-form analytical expressions for the fourth moment of the returns and the covariance of the squared returns. Therefore, condition S5 is restrictive, as it rules out the leverage effect in the model. However, note that condition S5 does not preclude dependence between the jumps in p(t) and \sigma^2(t). It only rules out linear dependence. I finish this section with a theorem which gives the moments of the return process for the pure-jump JDSV model to be used in the empirical application in Section 3.

Theorem 2 (Moments of the Pure-Jump JDSV Model). For the pure-jump JDSV model assume conditions S1–S5 hold. Then we have

Var(r_a(t)) = a\int_0^\infty f(s)\,ds \int_{\mathbb{R}^n_0} g^2(x)G(dx) \int_{\mathbb{R}^n_0} k(x)G(dx),  (10)

E(r_a(t) - E(r_a(t)))^4 = a\int_{\mathbb{R}^n_0} g^4(x)G(dx)\Big[\Big(\int_0^\infty f(s)\,ds \int_{\mathbb{R}^n_0} k(x)G(dx)\Big)^2 + \int_0^\infty f^2(s)\,ds \int_{\mathbb{R}^n_0} k^2(x)G(dx)\Big] + 3a^2\Big(\int_0^\infty f(s)\,ds\Big)^2 \Big(\int_{\mathbb{R}^n_0} g^2(x)G(dx)\Big)^2 \Big(\int_{\mathbb{R}^n_0} k(x)G(dx)\Big)^2 + 6\Big(\int_{\mathbb{R}^n_0} g^2(x)G(dx)\Big)^2 \int_{\mathbb{R}^n_0} k^2(x)G(dx) \int_0^a\!\int_{-\infty}^u H^u(0,s)f(u-s)\,ds\,du + 6\int_0^\infty f(s)\,ds \int_{\mathbb{R}^n_0} k(x)G(dx) \int_{\mathbb{R}^n_0} g^2(x)G(dx) \int_{\mathbb{R}^n_0} g^2(x)k(x)G(dx) \int_0^a\!\int_0^s f(s-u)\,du\,ds,  (11)

and for h = a, 2a, \ldots

Cov(r_a^2(t), r_a^2(t+h)) = \int_{-\infty}^a H^a(h,u)H^a(0,u)\,du \int_{\mathbb{R}^n_0} k^2(x)G(dx) \Big(\int_{\mathbb{R}^n_0} g^2(x)G(dx)\Big)^2 + \int_0^\infty f(s)\,ds \int_{\mathbb{R}^n_0} k(x)G(dx) \int_{\mathbb{R}^n_0} g^2(x)G(dx) \int_0^a H^a(h,u)\,du \int_{\mathbb{R}^n_0} g^2(x)k(x)G(dx).  (12)
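The lag dependence in Eqs. (9) and (12) is carried by the kernel integral of H^a(h,u)H^a(0,u). For an exponential memory kernel (an illustrative stand-in, not the estimated specification) this term decays geometrically in the lag h, which the autocovariance of the squared returns inherits:

```python
import numpy as np

# H^a(t,s) of Eq. (5) for an exponential kernel f(u) = exp(-lam*u);
# lam and a are illustrative values.
lam, a = 3.0, 1.0

def H(t0, s):
    lo = np.maximum(s, t0)
    return (np.exp(-lam*(lo - s)) - np.exp(-lam*(t0 + a - s))) / lam

def cross(h, n=200001):
    # integral over u in (-inf, a] of H^a(h,u)*H^a(0,u), truncated at -20
    u = np.linspace(-20.0, a, n)
    du = u[1] - u[0]
    v = H(h, u) * H(0.0, u)
    return v.sum()*du - 0.5*(v[0] + v[-1])*du

h1, h2, h3 = cross(a), cross(2*a), cross(3*a)   # lags a, 2a, 3a
```

For this kernel the ratio of consecutive lags equals exp(-lam*a) exactly, i.e. the realized-variance autocorrelation decays at the kernel's own rate.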
3. Empirical application
The goal of this section is to estimate models in the two classes of JDSV models that were analyzed in the previous section and to compare their performance with the affine jump–diffusion models widely used in the finance literature. I start this section with a description of the method of estimation.

3.1. Realized power variation and estimation

The estimation in this paper is based on matching moments of daily returns and realized (daily) power variation. The realized p-power variation over a day t computed from high-frequency observations with length \delta is defined as

RPV_\delta^p(t) = \sum_{i=1}^{M} |r_\delta(t+(i-1)\delta)|^p, \quad p > 0,\ M = \lfloor 1/\delta \rfloor.  (13)
In this paper the realized power variation statistics that are used are the Realized Variance (RV_\delta(t) := \sum_{i=1}^{M} |r_\delta(t+(i-1)\delta)|^2, hereafter RV) and the Realized Fourth Power Variation (FV_\delta(t) := \sum_{i=1}^{M} |r_\delta(t+(i-1)\delta)|^4, hereafter FV). As \delta \downarrow 0, RV_\delta(t) converges in probability to the quadratic variation over day t, while FV_\delta(t) converges to the sum of the price jumps raised to the power four (over day t). Therefore, using RV and FV in the estimation we can identify the parameters controlling the variance of the continuous and discontinuous components of the price. Of course, to improve efficiency in the estimation we potentially need to consider also other realized power variation statistics. The choice of RV and FV for the estimation here is driven by the fact that these statistics can be used to disentangle the continuous and discontinuous components of the price and, importantly, moments of RV and FV can be computed in closed form for the JDSV models, which makes the estimation easy to apply. Moreover, the method of estimation used here has the advantage that it can provide an answer whether a whole class of models is appropriate for modeling asset prices without being fully parametric. The particular moments used in the estimation of all models in this paper are the following: mean, variance and autocorrelation of RV; mean of FV and mean of the fourth moment of daily returns. These moments can be computed in closed form for the JDSV models introduced in Sections 2.1 and 2.2 using Theorems 1 and 2 respectively. For the autocorrelation of RV I use lags one, four, seven, ten as well as the sum of the autocorrelations from lag twenty till lag forty. Thus, altogether I end up with nine conditions. Adding more conditions could increase the asymptotic efficiency of the estimator, but in a finite sample this is also associated with a lower precision in estimating the optimal weighting matrix (see the Monte Carlo evidence in Andersen and Sørensen (1996)).
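A minimal sketch of RV and FV on one simulated day, assuming constant diffusive volatility and a single jump (all values illustrative), shows the division of labor between the two statistics:

```python
import numpy as np

rng = np.random.default_rng(2)

# One simulated day: constant diffusive volatility plus a single price jump.
# Vol level, jump size and 5-minute sampling are illustrative.
M = 288                          # M = floor(1/delta), 5-minute returns
delta = 1.0 / M
sigma = 0.01                     # daily diffusive volatility
r = sigma*np.sqrt(delta)*rng.normal(size=M)
r[100] += 0.02                   # one jump of size 0.02

RV = np.sum(r**2)                # Eq. (13) with p = 2 -> quadratic variation
FV = np.sum(r**4)                # Eq. (13) with p = 4 -> dominated by jumps^4
```

RV picks up both the integrated variance and the squared jump, while the diffusive contribution to FV is of order delta and therefore negligible next to the jump's fourth power; this is what lets the pair identify the two components separately.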
I proceed with further details about the estimation. In the estimation of all models I set the drift term to zero and use demeaned stock market return data. For the optimal weighting matrix I use a HAC estimator of the covariance matrix with a Bartlett kernel and a lag length of eighty. The estimation is performed using the MCMC approach of Chernozhukov and Hong (2003) of treating the Laplace transform of the objective function as an unnormalized likelihood function and applying MCMC to the pseudo posterior. The parameter estimates are then the mode of the resulting pseudo posterior. In the estimation I impose all restrictions on the parameters that guarantee non-negativity and stationarity of the state variables in the estimated models.^{12} To estimate the JDSV models with the method-of-moments type estimator proposed here we need to impose more structure on these models. In particular, we need to specify the memory function f(\cdot), and in addition we need to model the jumps in the price and the variance. This is done in the next two subsections.
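The weighting-matrix step can be sketched as follows; the Bartlett-kernel HAC estimator with lag length eighty matches the description above, while the moment series itself is a simulated AR(1) stand-in for the actual daily moment conditions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Bartlett-kernel HAC estimate of the long-run covariance of a moment series,
# lag length L = 80 as in the text; the series is a simulated AR(1) stand-in.
T, L = 2000, 80
u = np.empty(T)
u[0] = rng.normal()
for t in range(1, T):
    u[t] = 0.5*u[t-1] + rng.normal()

g = (u - u.mean()).reshape(T, 1)       # demeaned moment condition(s)
S = g.T @ g / T                        # lag-0 autocovariance
for l in range(1, L + 1):
    w = 1.0 - l/(L + 1.0)              # Bartlett weight, keeps S psd
    G = g[l:].T @ g[:-l] / T           # lag-l autocovariance
    S += w*(G + G.T)

W = np.linalg.inv(S)                   # optimal weighting matrix in the GMM step
```

With several moment conditions, g simply becomes a T-by-q matrix and the same loop produces a q-by-q long-run covariance.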
3.2. Modeling the memory of the stochastic variance

For the memory function f(\cdot) in the estimation I use a CARMA(2,1) kernel with two distinct real autoregressive roots. The CARMA(2,1) kernel generates the same autocorrelation in \sigma^2(t) as the two-factor stochastic volatility models, which in turn are found to be successful in fitting financial asset prices. Based on previous estimation of the two-factor models, one of the autoregressive roots is expected to be slowly mean reverting, corresponding to a persistent factor in the variance. The second autoregressive root is expected to be fast mean reverting, which corresponds to a less persistent factor in the variance. The (normalized) CARMA(2,1) kernel with two distinct negative autoregressive roots is given by (see Brockwell, 2001 for details)

f(u) = \frac{b_0+\rho_1}{\rho_1-\rho_2}e^{\rho_1 u} + \frac{b_0+\rho_2}{\rho_2-\rho_1}e^{\rho_2 u}, \quad u \ge 0.  (14)

For this choice of f(\cdot) the kernel of the integrated variance H^a(t,s) in (5) becomes

H^a(t,s) = \begin{cases} \dfrac{b_0+\rho_1}{\rho_1-\rho_2}\,\dfrac{e^{\rho_1 a}-1}{\rho_1}\,e^{\rho_1(t-s)} + \dfrac{b_0+\rho_2}{\rho_2-\rho_1}\,\dfrac{e^{\rho_2 a}-1}{\rho_2}\,e^{\rho_2(t-s)} & \text{if } s < t \\[1ex] \dfrac{b_0+\rho_1}{\rho_1-\rho_2}\,\dfrac{e^{\rho_1(t-s+a)}-1}{\rho_1} + \dfrac{b_0+\rho_2}{\rho_2-\rho_1}\,\dfrac{e^{\rho_2(t-s+a)}-1}{\rho_2} & \text{if } t \le s < t+a. \end{cases}  (15)
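A quick numerical check, with illustrative parameter values satisfying the non-negativity condition given below, confirms the closed form (15) against direct quadrature of the kernel (14), together with the identity that the total kernel mass equals b_0/(\rho_1\rho_2), which is used later to interpret the cumulants:

```python
import numpy as np

# Check of the closed-form kernel, Eq. (15), against direct quadrature of the
# CARMA(2,1) kernel in Eq. (14), plus integral_0^inf f(u)du = b0/(rho1*rho2).
# Parameter values are illustrative and satisfy b0 >= -max(rho1, rho2) > 0.
b0, r1, r2 = 1.5, -0.1, -2.0
a, t, s = 1.0, 5.0, 3.0                       # s < t branch of Eq. (15)

def f(u):
    return (b0 + r1)/(r1 - r2)*np.exp(r1*u) + (b0 + r2)/(r2 - r1)*np.exp(r2*u)

H_closed = ((b0 + r1)/(r1 - r2)*(np.exp(r1*a) - 1.0)/r1*np.exp(r1*(t - s))
            + (b0 + r2)/(r2 - r1)*(np.exp(r2*a) - 1.0)/r2*np.exp(r2*(t - s)))

z = np.linspace(t, t + a, 200001)             # H^a(t,s) = int_t^{t+a} f(z-s)dz
dz = z[1] - z[0]
fz = f(z - s)
H_num = fz.sum()*dz - 0.5*(fz[0] + fz[-1])*dz

u = np.linspace(0.0, 200.0, 2000001)          # truncate the infinite integral
du = u[1] - u[0]
fu = f(u)
I = fu.sum()*du - 0.5*(fu[0] + fu[-1])*du     # should equal b0/(r1*r2)
```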
The necessary and sufficient condition for the CARMA(2,1) kernel to be non-negative is b_0 \ge -\max\{\rho_1, \rho_2\} > 0 (see Todorov and Tauchen, 2006). The CARMA(2,1) kernel in Eq. (14) reduces to a CARMA(1,0) kernel when b_0 = -\min\{\rho_1, \rho_2\}. Therefore, the results for the CARMA(2,1) kernel could be specialized to the CARMA(1,0) case. In the empirical part I estimate both JDSV models with CARMA(2,1) and CARMA(1,0) kernels. Note that the CARMA(1,0) choice for the kernel f(\cdot) corresponds to the case where the stochastic variance follows a Lévy-driven OU process as in Barndorff-Nielsen and Shephard (2001).

3.3. Jump specification

In this subsection I model the jumps in both JDSV models. The approach adopted in this paper is to specify the functional form of g(\cdot) and k(\cdot) (and thus the jump dependence), but to leave the Poisson measure \mu unspecified. Instead, in each of the models I estimate (parameterize) only cumulants associated with \mu. This way the difference in the performance of the different models will not be influenced by a potentially wrong parametric choice for the distribution of the jumps, which could happen if a fully parametric approach were adopted instead.

3.3.1. The jump–diffusion JDSV model

I make the following assumption for g(\cdot) and k(\cdot) in the jump–diffusion JDSV model:

H5. g(x) = \text{const}_1 \times x and k(x) = \text{const}_2 \times x^2.
12 The restrictions guaranteeing stationarity are given in Sections 3.2 and 3.4 below.
This assumption means that the jumps in the variance are proportional to the squared price jumps. This jump specification bears analogy with GARCH modeling in discrete time, where the conditional variance is determined by the past squared returns. It is important also to note that this specification of the jumps does not rule out jumps of infinite variation in the price (which would be the case, for example, if the jumps in the price were proportional to the jumps in the variance, as in the Non-Gaussian OU-SV model). In the estimation I treat as parameters the following cumulants associated with the Poisson measure \mu:

m_c := \frac{b_0}{\rho_1\rho_2}\int_{\mathbb{R}^n_0} k(x)G(dx), \quad m_d := \int_{\mathbb{R}^n_0} g^2(x)G(dx) \quad \text{and} \quad v := \int_{\mathbb{R}^n_0} k^2(x)G(dx).

The factor b_0/(\rho_1\rho_2) in the expression for m_c is associated with the memory function, but makes m_c equal to the variance of the continuous component of the price and thus easier to interpret. The expression m_d is the variance of the discontinuous component. The cumulants m_c, m_d and v are all we need to know about the measure \mu in order to compute the moments used in the estimation of the jump–diffusion JDSV model (under the jump specification H5, of course).
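The GARCH analogy can be sketched directly: with g(x) = const_1 x and k(x) = const_2 x^2, every variance jump is a scaled squared price jump; the kernel, jump intensity and constants below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

# H5-style dependence: price jumps g(x) = c1*x, variance jumps k(x) = c2*x^2.
# Kernel, jump intensity and constants are illustrative.
c1, c2, lam = 1.0, 0.5, 2.0
n, dt = 5000, 0.01
marks = rng.normal(0.0, 0.02, n)*(rng.random(n) < 0.05)   # occasional marks x
price_jumps = c1*marks                                    # g(x) = c1 * x
var_jumps = c2*marks**2                                   # k(x) = c2 * x^2

sigma2 = np.empty(n)
sigma2[0] = 0.0
for i in range(1, n):
    # each variance jump is the scaled square of the matching price jump,
    # fed into an exponentially weighted moving average, as in GARCH
    sigma2[i] = np.exp(-lam*dt)*sigma2[i-1] + var_jumps[i]
```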
3.3.2. The pure-jump JDSV model

The specification of the jumps in the pure-jump JDSV model that is used in the estimation is analogous to assumption H5:

S6. g(x) = \text{const}_1 \times x and k(x) = \text{const}_2 \times x^2.

Assumption S6 does not rule out jumps of infinite variation, which as already mentioned is particularly important for the pure-jump models. Note also that combining S5 and S6 we have the implication \int_{\mathbb{R}_0} x^3 G(dx) = 0.
Under assumption S6 we can see that the jumps in σ 2 (t ) are proportional to the quadratic variation of the Lévy process driving the price. This is very similar to the COGARCH modeling (Klüppelberg et al., 2004; Brockwell et al., 2006) where the jumps in σ 2 (t ) are proportional to the quadratic variation of the discontinuous component of the price. What makes the pure-jump JDSV model analyzed here different from the COGARCH intuitively is the fact that σ 2 (t ) is an infinite moving average of the past squared Lévy jumps driving the price (i.e. under S6 we have k(x) ∝ g 2 (x)), while in the COGARCH model σ 2 (t ) is a moving average of the past squared price jumps. For the estimation of the pure-jump JDSV model under the assumptions S6 and S7, I parameterize the following two expressions related with the Poisson random measure µ
∫ m := Rn0
g 2 (x)G(dx)
∫ Rn0
k(x)G(dx)
b0
ρ1 ρ2
and
v :=
17
autocorrelation structure as in the JDSV models analyzed here for the choice of the CARMA(2,1) kernel. The two-factor affine jump–diffusion stochastic volatility model is given by dp(t ) = α dt +
V (t )dW (t ) +
∫ Rn0
g (x)µ( ˜ ds, dx),
V (t ) = V1 (t ) + V2 (t ), dVi (t ) = κi (θi − Vi (t ))dt + σi
Vi (t )dBi (t ),
Rn0
g 4 (x)G(dx)
∫ Rn0
k2 (x)G(dx).
Similar to the jump–diffusion JDSV model, the scaling of m by the b factor ρ 0ρ makes it equal to the variance of the return over a 1 2 unit interval and thus much easier to interpret. Given S6, m and v synthesize all information about µ needed in the estimation (with the method-of-moment type estimator specified in Section 3.1). 3.4. A two-factor affine jump–diffusion model To compare the performance of the JDSV models I estimate also a standard one and two-factor affine jump–diffusion models.13 The realized variance in the two-factor case has the same
13 The estimation method is the same as the one used for the JDSV models and was discussed in Section 3.1.
i = 1, 2,
(17)
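The dynamics in (16)-(17) can be sketched with a simple Euler scheme. The price-jump integral is omitted for brevity, the full-truncation device used to keep the discretized square-root factors non-negative is my choice here (the paper does not prescribe a scheme), and all parameter values are illustrative assumptions rather than the estimates reported below.

```python
import math
import random

# Euler discretization of the two-factor affine SV model (16)-(17), with the
# price-jump term dropped. All parameter values are assumptions. "Full
# truncation" feeds max(V, 0) into drift and diffusion so a factor that dips
# below zero in discrete time is pulled back rather than fed a negative
# variance.
random.seed(0)
alpha = 0.0
kappa, theta, sigma = (0.01, 1.5), (0.2, 0.2), (0.05, 0.5)
dt = 1.0 / 288                       # five-minute steps, 288 per day
V = [theta[0], theta[1]]
p, path = 0.0, []
for _ in range(288 * 100):           # 100 simulated days
    Vt = max(V[0], 0.0) + max(V[1], 0.0)
    p += alpha * dt + math.sqrt(Vt * dt) * random.gauss(0.0, 1.0)
    for i in (0, 1):
        Vpos = max(V[i], 0.0)
        V[i] += kappa[i] * (theta[i] - Vpos) * dt \
                + sigma[i] * math.sqrt(Vpos * dt) * random.gauss(0.0, 1.0)
    path.append(Vt)
mean_V = sum(path) / len(path)
```

The two factors are given deliberately different mean-reversion speeds, mimicking the slow/fast decomposition that the estimation results below point to.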
where $W(t)$, $B_1(t)$ and $B_2(t)$ are independent standard Brownian motions14; the function $g(\cdot)$ and the compensated random measure $\tilde\mu$ are as introduced in the JDSV models. As in the jump–diffusion JDSV model, the jumps in the affine jump–diffusion model (16)–(17) are time-homogeneous15: $\Delta p(t) = g(x)$. The two variance factors follow square-root diffusion processes and take non-negative values. Note that, unlike the JDSV models, the stochastic volatility model (16)–(17) does not have jumps in the variance, i.e. $\Delta V(t)=0$.

For the estimation of the affine jump–diffusion models, I parameterize the fourth cumulant of the jumps in the price. That is, I estimate $v := \int_{\mathbb{R}^n_0} g^4(x)\,G(dx)$. In addition, instead of working with $\sigma_1$ and $\sigma_2$, I estimate $\sigma_{1v} := \sigma_1\sqrt{\theta_1/(2\kappa_1)}$ and $\sigma_{2v} := \sigma_2\sqrt{\theta_2/(2\kappa_2)}$, which are the standard deviations of the two variance factors. Further, I set $\theta := \theta_1+\theta_2+\int_{\mathbb{R}^n_0}g^2(x)\,G(dx)$ and estimate $\theta$, since $\theta_1$, $\theta_2$ and $\int_{\mathbb{R}^n_0}g^2(x)\,G(dx)$ are not separately identified (with the moment conditions used here). Finally, to guarantee that the estimated parameters correspond to stationary variance factors $V_1$ and $V_2$, I impose the following restriction on $\sigma_{1v}$ and $\sigma_{2v}$:16
$$\sigma_{1v}+\sigma_{2v} < \theta. \qquad (18)$$
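The reparameterization and the restriction can be checked numerically. The key identity is that $\sigma_{iv}=\sigma_i\sqrt{\theta_i/(2\kappa_i)}$ is the stationary standard deviation of a square-root factor, and the Feller condition $\sigma_i^2 \le 2\kappa_i\theta_i$ is equivalent to $\sigma_{iv}\le\theta_i$, which is how (18) relates to stationarity. All numbers below are illustrative assumptions.

```python
import math

# sigma_iv = sigma_i * sqrt(theta_i / (2 * kappa_i)) is the stationary std of
# a square-root factor; Feller's condition sigma^2 <= 2*kappa*theta rewrites
# as sigma_v <= theta. Parameter values are assumptions for illustration.
def sigma_v(sigma, kappa, theta):
    return sigma * math.sqrt(theta / (2.0 * kappa))

kappa, theta, sigma = (0.2, 1.5), (0.15, 0.25), (0.2, 0.6)
s1v = sigma_v(sigma[0], kappa[0], theta[0])
s2v = sigma_v(sigma[1], kappa[1], theta[1])
theta_total = theta[0] + theta[1]   # plus the price-jump variance, omitted here

feller_ok = all(sigma[i] ** 2 <= 2 * kappa[i] * theta[i] for i in range(2))
```

Since $\theta$ also contains the non-negative price-jump variance, factor-by-factor Feller conditions imply $\sigma_{1v}+\sigma_{2v}\le\theta_1+\theta_2\le\theta$, i.e. restriction (18).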
3.5. Monte Carlo study

Before estimating the different models on real data, I conduct a Monte Carlo study to assess the finite-sample properties of the estimation method. The model estimated in the Monte Carlo is the jump–diffusion JDSV model with CARMA(1,0) kernel and jump specification H5.17 Table 1 contains details on the particular parametric distribution for the jumps used in the Monte Carlo, as well as the parameter values in the different cases. Since the estimated model is a ''one-factor'' type model, I drop from the moment vector (given in Section 3.1) two of the moment conditions which identify the memory of the stochastic variance. These moments are the autocorrelation of RV at lag ten and the sum of the autocorrelations from lag twenty to lag forty. The estimated parameters are $\rho_1$, $-\int_{\mathbb{R}^n_0}k(x)\,G(dx)/\rho_1$, $\int_{\mathbb{R}^n_0}g^2(x)\,G(dx)$ and $-0.5\int_{\mathbb{R}^n_0}k^2(x)\,G(dx)/\rho_1$.18 In the Monte Carlo I analyze

14 The assumption of independence of the Brownian motions allows for closed-form expressions for the moments needed in the estimation in the general two-factor case.
15 Of course, in general affine jump–diffusion models the jumps could have time-varying intensity (which should be affine in the factors); see e.g. Duffie et al. (2003) for details.
16 Condition (18) follows from the restriction guaranteeing stationarity of a square-root process.
17 The reason for using a model in the jump–diffusion JDSV class in the Monte Carlo is that models in this class are more challenging to estimate, as the estimation involves separating the continuous and discontinuous components. Moreover, the estimation results later in this section show that the jump–diffusion JDSV models outperform the pure-jump JDSV models when variance jumps are proportional to squared price jumps.
18 The last parameter is the volatility of $\sigma^2(t)$, but only for the CARMA(1,0) case, and this is why this transformation of $\int_{\mathbb{R}^n_0}k^2(x)\,G(dx)$ is not used when reporting the estimation results for the JDSV models in Section 3.6.
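Under the CARMA(1,0) kernel, the simulated variance $\sigma^2(t)=\int_{-\infty}^t\int e^{\rho_1(t-s)}k(x)\,\mu(ds,dx)$ is an Ornstein-Uhlenbeck process driven by positive jumps: it decays at rate $|\rho_1|$ between jumps, and each price jump of size $x$ raises it by $k(x)=k_2x^2$. The sketch below illustrates this with compound-Poisson jumps with Gaussian sizes; that choice and all parameter values are assumptions for illustration, whereas the paper's Monte Carlo uses a tempered stable (CGMY) measure simulated by its series representation.

```python
import math
import random

# sigma^2(t) as an exponential moving average of past squared price jumps
# (CARMA(1,0) kernel). Compound-Poisson jumps with Gaussian sizes are an
# illustrative assumption only.
random.seed(1)
rho1, k2 = 0.1, 0.05          # mean reversion and variance-jump loading (assumed)
lam, s = 2.0, 0.3             # daily jump intensity and jump-size std (assumed)
dt = 1.0 / 288
sig2 = lam * k2 * s * s / rho1     # start at the stationary mean of sigma^2
path = []
for _ in range(288 * 250):         # 250 simulated days
    sig2 *= math.exp(-rho1 * dt)            # exponential decay of past jumps
    if random.random() < lam * dt:          # a price jump of size x arrives...
        x = random.gauss(0.0, s)
        sig2 += k2 * x * x                  # ...and feeds the variance by k2*x^2
    path.append(sig2)
mean_sig2 = sum(path) / len(path)
```

Because every increment to the variance is a non-negative squared jump, the simulated $\sigma^2(t)$ path is positive by construction, which is the distinctive feature of the jump-driven specification.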
V. Todorov / Journal of Econometrics 160 (2011) 12–21
Table 1
Details on Monte Carlo. $dp(t) = \sigma(t-)\,dW(t) + \int_{\mathbb{R}_0} g(x)\,\tilde\mu(dt,dx)$, $\sigma^2(t) = \int_{-\infty}^{t}\int_{\mathbb{R}_0} e^{\rho(t-s)}k(x)\,\mu(ds,dx)$, $g(x)=k_1x$, $k(x)=k_2x^2$, $G(dx) = c\,e^{-\lambda|x|}/|x|^{1+\alpha}\,dx$.

Case                              | α   | λ      | c      | k1     | k2     | ρ     | E[p,p](t,t+1] | CV[p,p](t,t+1]
Low persistence, low volatility   | 0.1 | 0.5785 | 1.0283 | 0.2177 | 0.0426 | −0.10 | 0.5           | 0.5347
Low persistence, high volatility  | 0.1 | 0.1928 | 1.0283 | 0.3770 | 0.1279 | −0.10 | 0.5           | 0.9261
High persistence, low volatility  | 0.1 | 0.2000 | 0.2487 | 0.0961 | 0.0025 | −0.03 | 0.5           | 0.6377
High persistence, high volatility | 0.1 | 0.0667 | 0.2487 | 0.1665 | 0.0075 | −0.03 | 0.5           | 1.1045

(Columns α through ρ are parameter values; the last two columns are implied moments.)
Note: The Poisson measure used in the Monte Carlo is that of a symmetric tilted stable process (also known as a CGMY process). The simulation is done using the series representation method (see Rosiński, 2007 and Todorov and Tauchen, 2006 for details). CV[p,p](t,t+1] stands for the coefficient of variation of the daily quadratic variation.

Table 2
Monte Carlo results.

Panel A. Low persistence, low volatility
Parameter                                      | True value | Mean   | RMSE   | 5th percentile | Median | 95th percentile
−ρ1                                            | 0.1000     | 0.0998 | 0.0120 | 0.0810         | 0.0996 | 0.1191
−∫_{ℝⁿ₀} k(x)G(dx)/ρ1                          | 0.4500     | 0.4433 | 0.0200 | 0.4127         | 0.4430 | 0.4759
∫_{ℝⁿ₀} g²(x)G(dx)                             | 0.0500     | 0.0499 | 0.0052 | 0.0414         | 0.0497 | 0.0590
−0.5 ∫_{ℝⁿ₀} k²(x)G(dx)/ρ1                     | 0.2236     | 0.2105 | 0.0252 | 0.1773         | 0.2092 | 0.2482

Panel B. Low persistence, high volatility
−ρ1                                            | 0.1000     | 0.0981 | 0.0127 | 0.0776         | 0.0982 | 0.1182
−∫_{ℝⁿ₀} k(x)G(dx)/ρ1                          | 0.4500     | 0.4348 | 0.0358 | 0.3825         | 0.4338 | 0.4910
∫_{ℝⁿ₀} g²(x)G(dx)                             | 0.0500     | 0.0480 | 0.0065 | 0.0385         | 0.0479 | 0.0582
−0.5 ∫_{ℝⁿ₀} k²(x)G(dx)/ρ1                     | 0.3873     | 0.3401 | 0.0715 | 0.2568         | 0.3379 | 0.4304

Panel C. High persistence, low volatility
−ρ1                                            | 0.0300     | 0.0279 | 0.0088 | 0.0140         | 0.0279 | 0.0419
−∫_{ℝⁿ₀} k(x)G(dx)/ρ1                          | 0.4500     | 0.4387 | 0.0354 | 0.3870         | 0.4377 | 0.4970
∫_{ℝⁿ₀} g²(x)G(dx)                             | 0.0500     | 0.0472 | 0.0115 | 0.0299         | 0.0471 | 0.0654
−0.5 ∫_{ℝⁿ₀} k²(x)G(dx)/ρ1                     | 0.2236     | 0.1984 | 0.0411 | 0.1514         | 0.1955 | 0.2558

Panel D. High persistence, high volatility
−ρ1                                            | 0.0300     | 0.0275 | 0.0085 | 0.0142         | 0.0277 | 0.0409
−∫_{ℝⁿ₀} k(x)G(dx)/ρ1                          | 0.4500     | 0.4271 | 0.0594 | 0.3410         | 0.4227 | 0.5238
∫_{ℝⁿ₀} g²(x)G(dx)                             | 0.0500     | 0.0453 | 0.0126 | 0.0262         | 0.0456 | 0.0640
−0.5 ∫_{ℝⁿ₀} k²(x)G(dx)/ρ1                     | 0.3873     | 0.3150 | 0.1055 | 0.2095         | 0.3072 | 0.4495

Note: The table reports the results of the Monte Carlo experiment. The number of Monte Carlo replications is 1000 and each simulation contains 3000 days of 288 high-frequency observations.
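The summary statistics reported in Table 2 (mean, RMSE and percentiles of each estimator across replications) are straightforward to compute; the sketch below applies them to mock estimates drawn around a true value, purely for illustration.

```python
import math
import random

# Mean, RMSE and empirical percentiles across Monte Carlo replications, as in
# Table 2. The "estimates" are mock Gaussian draws around the true value and
# stand in for the 1000 GMM replications.
random.seed(2)
true_value = 0.45
estimates = [true_value + random.gauss(0.0, 0.02) for _ in range(1000)]

def summarize(est, true):
    n = len(est)
    mean = sum(est) / n
    rmse = math.sqrt(sum((e - true) ** 2 for e in est) / n)
    srt = sorted(est)
    pct = lambda q: srt[min(n - 1, int(q * n))]   # simple empirical quantile
    return mean, rmse, pct(0.05), pct(0.50), pct(0.95)

mean, rmse, p5, med, p95 = summarize(estimates, true_value)
```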
scenarios with high and low volatility of the stochastic variance and with high and low persistence of the stochastic variance, for a total of four different cases. In all cases I set the variance of the continuous price component to 0.45 and that of the jump price component to 0.05, following the nonparametric empirical findings in Andersen et al. (2007) and Huang and Tauchen (2005). Finally, to match the FX high-frequency data used in the empirical application, I set the sample size to 3000 days and the number of intraday observations to 288.

The results from the Monte Carlo study are summarized in Table 2. Below I outline the key findings. (1) The moments of the realized power variation statistics used in the estimation can efficiently separate the total variance into two parts: one due to the continuous price component and the other due to the discontinuous price component. Both estimates, $-\int_{\mathbb{R}^n_0}k(x)\,G(dx)/\rho_1$ and $\int_{\mathbb{R}^n_0}g^2(x)\,G(dx)$, are slightly downward biased, more so the variance of the continuous price component. This is consistent with the downward bias in the mean of the stochastic volatility found in the Monte Carlo studies of Andersen and Sørensen (1996) and Bollerslev and Zhou (2002). (2) The autocorrelations of the realized variance used in the estimation are able to identify $\rho_1$. This parameter estimate is again slightly downward biased. (3) The volatility of the stochastic variance, $-0.5\int_{\mathbb{R}^n_0}k^2(x)\,G(dx)/\rho_1$, is the hardest parameter to estimate. This is reflected in its relatively larger bias and RMSE compared with the other parameters. A similar conclusion is reached in Andersen and Sørensen (1996). (4) Increasing either the persistence or the volatility of the stochastic variance worsens the results, i.e. the biases and RMSEs of all parameters increase. Overall, the Monte Carlo study shows satisfactory performance of the estimator in empirically realistic situations.

3.6. Estimation and discussion of the results

I finally proceed with the estimation. For the empirical application I use continuously compounded five-minute returns on the Deutsche Mark/US Dollar (DM/$) spot exchange rate. The series spans the period from December 1, 1986 until June 30, 1999. Missing data, weekends, fixed holidays and similar calendar effects were removed from the data set, with the details explained in Andersen et al. (2001). The total number of days left in the data set is 3045, each consisting of 288 five-minute continuously compounded returns. The results from the estimation of the different models are reported in Tables 3–4. The first important question, which I try to answer using the estimation results, is whether price jumps matter for the ability of the models to fit the data. Looking at the results for the jump–diffusion JDSV model in Table 3 and the affine
Table 3
Parameter estimates for the JDSV models.

Panel A: Jump–diffusion JDSV models
Parameter | CARMA(1,0) no price jumps | CARMA(1,0) with price jumps | CARMA(2,1) no price jumps | CARMA(2,1) with price jumps
b0        | —                         | —                           | 0.0609 (0.0402)           | 0.1533 (0.0634)
−ρ1       | 0.2736 (0.0458)           | 0.1838 (0.0383)             | 0.0079 (0.0097)           | 0.0221 (0.0162)
−ρ2       | —                         | —                           | 1.9516 (0.2297)           | 1.4960 (0.1845)
mc        | 0.3881 (0.0143)           | 0.2752 (0.0131)             | 0.4147 (0.0244)           | 0.4285 (0.0268)
md        | —                         | 0.1229 (0.0141)             | —                         | 0.0435 (0.0155)
v         | 0.0007 (0.0002)           | 0.0047 (0.0011)             | 0.2080 (0.1068)           | 0.1564 (0.0485)
J test    | 46.3400 (6) [0.0000]      | 32.4440 (5) [0.0000]        | 21.2280 (4) [0.0003]      | 5.1458 (3) [0.1614]

Panel B: Pure-jump JDSV models
Parameter | CARMA(1,0)           | CARMA(2,1)
b0        | —                    | 241.4600 (53609.3)
−ρ1       | 0.5680 (0.1320)      | 0.5266 (0.4335)
−ρ2       | —                    | 3.0439 (17.0097)
m         | 0.4296 (0.0129)      | 0.4102 (0.0164)
v         | 0.0447 (0.0097)      | 0.0004 (0.0929)
J test    | 51.2000 (6) [0.0000] | 45.958 (4) [0.0000]
Note: The table reports the parameter estimates for the JDSV models with two choices for the memory function f(·), CARMA(1,0) and CARMA(2,1), and jump specifications given in assumptions H5 and S6 respectively. The data used in the estimation is the DM/$ exchange rate over the period from December 1, 1986 until June 30, 1999, for a total of 3045 daily observations, each of which consists of 288 five-minute returns. The power variation statistics were computed using the intraday five-minute returns. The asymptotic variance–covariance matrix is calculated using Bartlett weights with a lag length of eighty. The numbers in parentheses after the J tests are the degrees of freedom and those in square brackets are the corresponding p-values.

Table 4
Parameter estimates for the affine jump–diffusion SV models.

Parameter | One-factor no price jumps | One-factor with price jumps | Two-factor no price jumps | Two-factor with price jumps
θ         | 0.3881 (0.0223)           | 0.3980 (0.0136)             | 0.4140 (0.0269)           | 0.4429 (0.0280)
κ1        | 0.2707 (5.7096)           | 0.1872 (0.0389)             | 0.0084 (0.7818)           | 0.0122 (0.0119)
σ1v       | 0.0360 (0.8432)           | 0.1561 (0.0151)             | 0.1118 (4.3350)           | 0.1696 (0.0344)
κ2        | —                         | —                           | 1.9496 (0.1807)           | 1.4957 (0.2052)
σ2v       | —                         | —                           | 0.2299 (0.0872)           | 0.2733 (0.0626)
v         | —                         | 0.0278 (0.0072)             | —                         | 0.0272 (0.0085)
J test    | 46.2800 (6) [0.0000]      | 32.4400 (5) [0.0000]        | 21.1680 (4) [0.0003]      | 6.9470 (3) [0.0736]

Note: The table reports the parameter estimates for the affine jump–diffusion stochastic volatility models given in (16)–(17), applied to the DM/$ exchange rate. $\theta = \theta_1+\theta_2+\int_{\mathbb{R}^n_0}g^2(x)\,G(dx)$, $\sigma_{1v} = \sigma_1\sqrt{\theta_1/(2\kappa_1)}$ and $\sigma_{2v} = \sigma_2\sqrt{\theta_2/(2\kappa_2)}$. The data spans the period from December 1, 1986 until June 30, 1999, for a total of 3045 daily observations, each of which consists of 288 five-minute returns. The power variation statistics were computed using the intraday five-minute returns. The asymptotic variance–covariance matrix is calculated using Bartlett weights with a lag length of eighty. The numbers in parentheses after the J tests are the degrees of freedom and those in square brackets are the corresponding p-values.
jump–diffusion model in Table 4, one can see that the inclusion of jumps in the price significantly improves the fit of the model. This holds true regardless of the choice of the memory function for the jump–diffusion JDSV model and of the number of factors in the affine jump–diffusion model. This conclusion about the importance of price jumps is also robust to the different ways of modeling the stochastic variance (as a sum of square-root processes or as a moving average of positive jumps). The big difference in performance between the models with and without price jumps indicates that the moments used in the estimation significantly penalize their omission. On the other hand, both estimated pure-jump JDSV models provide a very poor fit to the data. Therefore, a pure-jump JDSV model in which the jumps in the variance are proportional to the squares of the jumps of the Lévy process driving the price does not seem to provide a good description of the data.

The second question of interest is how well the persistence of the realized variance is described by a CARMA(2,1) kernel. Comparing the performance of the JDSV models with a CARMA(1,0) kernel to those with a CARMA(2,1) kernel, one can see that the CARMA(2,1) kernel provides a much better fit to the autocorrelation structure of RV. The same pattern emerges when comparing the one-factor affine jump–diffusion model with the two-factor one. This finding is in line with the results of most empirical studies of multi-factor stochastic volatility models.

What is important to note here is that the CARMA(2,1) kernel does as good a job as a two-factor stochastic volatility model in capturing the persistence of the realized variance, without the need to introduce multiple factors. In Fig. 1 I compare the autocorrelation of the realized variance with that implied by the parameter estimates of the jump–diffusion JDSV model with CARMA(2,1) kernel and price jumps. In the estimation, as already discussed in Section 3.1, I use only the first forty autocorrelations of the realized variance. Fig. 1 shows that the CARMA(2,1) kernel provides a good fit and matches the observed autocorrelations well even beyond lag forty.

The model that provides the best fit to the moments used in the estimation is the CARMA(2,1) jump–diffusion JDSV model containing price jumps. The level of its J statistic is well below commonly accepted critical values. In fact, the only other model which cannot be rejected at the 5% significance level is the two-factor affine jump–diffusion model with price jumps. I compare these two models more closely.19 Both of them generate the
19 Note that these two models are non-nested. They both have the same number of parameters and are estimated with the same number of moment conditions. Therefore the difference in their J statistics can be used to compare their performance, see Andrews (1999) for example.
Table 5
Moment condition tests.

Moment                                | CARMA(2,1) jump–diffusion JDSV model | Two-factor affine jump–diffusion model
Autocorrelation at lag 1              | 2.1654                               | 2.5694
Autocorrelation at lag 4              | 2.3476                               | 2.1448
Autocorrelation at lag 7              | 1.7215                               | 1.2226
Autocorrelation at lag 10             | 1.3912                               | 0.8910
Aver. autocorrelation for lags 20–40  | 0.0161                               | −1.2865
E(RVδ(t))                             | 1.8370                               | 3.5623
E(RVδ²(t))                            | 1.8791                               | 3.8200
E(FVδ(t))                             | −1.7197                              | 7.1224
E(r1⁴(t))                             | −0.1585                              | 5.0830

Note: The table reports the diagnostic t-statistics for each of the moment conditions underlying the estimation results for the CARMA(2,1) jump–diffusion JDSV model and the two-factor affine jump–diffusion model, both containing price jumps, reported in the last columns of Tables 3 and 4 respectively.
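The daily statistics entering these moment conditions are simple functions of the intraday returns. The sketch below computes a realized variance RV (sum of squared five-minute returns), a realized fourth-power variation FV (shown here simply as the sum of fourth powers, without any scaling constants the paper may apply), and the fourth power of the daily return; the i.i.d. Gaussian returns are an illustrative assumption only.

```python
import math
import random

# Daily realized statistics from 288 five-minute returns per day. Simulated
# i.i.d. Gaussian returns are a stand-in for the DM/$ data, for illustration.
random.seed(3)
n_intraday, n_days = 288, 500
daily_var = 0.45
r_std = math.sqrt(daily_var / n_intraday)

rv, fv, r4 = [], [], []
for _ in range(n_days):
    r = [random.gauss(0.0, r_std) for _ in range(n_intraday)]
    rv.append(sum(x * x for x in r))     # realized variance
    fv.append(sum(x ** 4 for x in r))    # fourth-power variation (unscaled)
    r4.append(sum(r) ** 4)               # fourth power of the daily return
mean_rv = sum(rv) / n_days               # close to the daily variance
```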
[Fig. 1 here: empirical and model-implied autocorrelations (vertical axis, 0 to 0.7) plotted against lags in days (horizontal axis, 0 to 100).]
Fig. 1. The figure shows the fit of the CARMA(2,1) kernel. The empirical autocorrelation of the Realized Variance is marked with +. The solid line is the autocorrelation implied from the jump–diffusion JDSV model given in (1)–(2) with CARMA(2,1) memory function and jump specification given in assumption H5. The parameters were set at the estimated values reported in the last column of Table 3.
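The empirical autocorrelations plotted in Fig. 1 are ordinary sample autocorrelations of the daily RV series. A minimal sketch, illustrated on a simulated AR(1) series rather than RV data (the AR(1) coefficient is an assumption):

```python
import random

# Sample autocorrelation function, of the kind plotted in Fig. 1 for the
# realized variance. Demonstrated on a long simulated AR(1) path.
random.seed(4)
phi, n = 0.9, 20000
x = [0.0]
for _ in range(n - 1):
    x.append(phi * x[-1] + random.gauss(0.0, 1.0))

def acf(series, lag):
    m = len(series)
    mu = sum(series) / m
    c0 = sum((v - mu) ** 2 for v in series) / m
    ck = sum((series[t] - mu) * (series[t + lag] - mu) for t in range(m - lag)) / m
    return ck / c0

acf1 = acf(x, 1)   # close to phi for a long AR(1) sample
```

For an AR(1) the population autocorrelation at lag $k$ is $\phi^k$, a single exponential; the CARMA(2,1) kernel instead implies a mixture of two exponentials, which is what lets it track both the fast initial decay and the slow tail of the RV autocorrelations.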
same autocorrelation structure. Therefore the difference in their performance comes from the jump specification and from the fact that in the JDSV model the variance is driven solely by jumps, while in the affine jump–diffusion model it is driven by a sum of square-root diffusion processes. Following Tauchen (1985), in Table 5 I report, for each of the two models, the t-statistics associated with the moments used in the estimation. These statistics indicate which moments the models have difficulty matching.20 First, as expected, both models produce a comparable fit to the autocorrelations used in the estimation. The difference comes in the fit to the rest of the moments. As seen from Table 5, the CARMA(2,1) jump–diffusion JDSV model has no difficulty fitting the variance of RV, the mean of FV and the fourth moment of the daily returns. In contrast, the affine jump–diffusion model does not provide a good fit to those moments. In fact, looking at the parameter estimates of the affine jump–diffusion model, reported in the last column of Table 4, it can be seen that the volatility parameters of the two square-root processes driving the stochastic variance are on the boundary of the stationarity condition (18) (which I impose in the estimation). That is, the square-root diffusion processes used for modeling the stochastic variance cannot produce enough volatility in the stochastic variance. This is in contrast with the JDSV models, where volatility in the stochastic variance is very easy to generate.
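The diagnostic t-statistics just described can be sketched in a stripped-down form: the gap between a sample moment and its model-implied value, studentized by the sample standard error. The paper's version additionally uses Bartlett (HAC) weights and accounts for parameter estimation error; both refinements are ignored in this illustration, and the data are mock Gaussian draws.

```python
import math
import random

# Simplified moment diagnostic in the spirit of Tauchen (1985). No HAC
# correction and no adjustment for estimated parameters; illustration only.
random.seed(5)
data = [random.gauss(0.5, 1.0) for _ in range(3000)]

def moment_tstat(sample, model_moment):
    n = len(sample)
    m = sum(sample) / n
    s2 = sum((v - m) ** 2 for v in sample) / (n - 1)
    return (m - model_moment) / math.sqrt(s2 / n)

t_matched = moment_tstat(data, 0.5)   # small when the model moment is right
t_wrong = moment_tstat(data, 0.0)     # large when the model moment is off
```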
20 Of course, these statistics should be interpreted with care, because inconsistency of the parameter estimates would in general contaminate even correctly specified moment conditions.
I end this section with a short discussion of the parameter estimates for the best performing model among the ones compared here, i.e., the CARMA(2,1) jump–diffusion JDSV model with jump specification H5. The parameter estimates of this model are reported in the last column of Panel A of Table 3. The estimation results confirm our expectations regarding the autoregressive coefficients of the CARMA(2,1) kernel. The first autoregressive root has a half-life of approximately thirty days and therefore mean-reverts relatively slowly. The second autoregressive root, on the other hand, has a half-life of approximately half a day and thus mean-reverts much faster. Fig. 1 illustrates that this kernel provides a good fit to the empirical autocorrelation function. Turning to the parameters controlling the jumps, the jump component in the price accounts for around 9% of the total variance of the return process. This level is similar to the proportion of jumps found in financial asset prices in the studies of Andersen et al. (2007) and Huang and Tauchen (2005). These studies use the non-parametric jump detection tests developed in Barndorff-Nielsen and Shephard (2004) to disentangle the jumps from the continuous price component. Overall, I conclude that the jump-driven JDSV model with CARMA(2,1) kernel, containing jumps in the price with specification H5, fits the moments used in the estimation well and captures the main empirical features observed in the high-frequency data.

4. Conclusion

In this paper a general semiparametric class of jump-driven stochastic volatility models is introduced and its econometric properties are analyzed. These models have the distinctive feature that the state variables determining the time variation of the continuous and discontinuous components of the price, when time-varying, are representable as moving averages of positive jumps.
This allows the integrated variance itself to be written as another moving average of the same positive jumps. Using this, I derive moments of the return process and use them to estimate the models in the JDSV class via GMM. The empirical results in the paper confirm the ability of models in the JDSV class to capture salient features of high-frequency financial data better than traditionally used stochastic volatility models. Finally, there are two directions in which the analysis of the current paper should be extended. First, in the estimated models the leverage effect is either ruled out or linked in a one-to-one relationship with the skewness of the returns, which is potentially rather restrictive. Therefore, more general specifications of the jumps in the price and the variance (and consequently of their dependence) need to be considered in order to capture this feature of the data. The modeling of the jumps proposed here provides a convenient framework for this. For example, following an analogy with asymmetric GARCH models in discrete time, one possibility is to set the variance jumps proportional to the squared price jumps but
with the coefficient of proportionality differing for negative and positive price jumps. However, identifying such more general jump structures in estimation requires moments of other realized (multi)power variation statistics, and this is the second direction in which the current work should be extended.

Acknowledgements

I am grateful to the editors and two anonymous referees for their helpful comments. I would also like to thank Tim Bollerslev, Ron Gallant, Han Hong and George Tauchen for many discussions and encouragement along the way. In addition, I am indebted to Tim Bollerslev for providing me with the high-frequency FX data for the empirical application and for helpful comments on the second draft of the paper. I have benefited from suggestions made by seminar participants at the Duke Econometrics and Finance Lunch Group, the Opening Workshop of the SAMSI Program on Financial Mathematics, Statistics and Econometrics, Durham, September 2005, the Conference on Stochastics in Science in Honor of Ole Barndorff-Nielsen, Guanajuato, Mexico, March 2006 and the Conference on Realized Volatility, Montreal, April 2006.

References

Ait-Sahalia, Y., 2004. Disentangling diffusion from jumps. Journal of Financial Economics 74, 487–528.
Andersen, T., Bollerslev, T., Diebold, F., 2007. Roughing it up: Disentangling continuous and jump components in measuring, modeling and forecasting asset return volatility. Review of Economics and Statistics 89, 701–720.
Andersen, T., Bollerslev, T., Diebold, F., Labys, P., 2001. The distribution of exchange rate volatility. Journal of the American Statistical Association 96, 42–55.
Andersen, T., Sørensen, B., 1996. GMM estimation of a stochastic volatility model: a Monte Carlo study. Journal of Business and Economic Statistics 14, 328–352.
Andrews, D., 1999. Consistent moment selection procedures for generalized method of moments estimation. Econometrica 67, 543–564.
Barndorff-Nielsen, O.E., Shephard, N., 2001. Non-Gaussian Ornstein–Uhlenbeck-based models and some of their applications in financial economics. Journal of the Royal Statistical Society: Series B 63, 167–241.
Barndorff-Nielsen, O.E., Shephard, N., 2002. Econometric analysis of realised volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society: Series B 64, 253–280.
Barndorff-Nielsen, O.E., Shephard, N., 2004. Power and bipower variation with stochastic volatility and jumps. Journal of Financial Econometrics 2, 1–37.
Barndorff-Nielsen, O.E., Shephard, N., 2006. Impact of jumps on returns and realised variances: econometric analysis of time-deformed Lévy processes. Journal of Econometrics 131, 217–252.
Bollerslev, T., Zhou, H., 2002. Estimating stochastic volatility diffusion using conditional moments of integrated volatility. Journal of Econometrics 109, 33–65.
Brockwell, P., 2001. Lévy-driven CARMA processes. Annals of the Institute of Statistical Mathematics 53, 113–124.
Brockwell, P., Chadraa, E., Lindner, A., 2006. Continuous time GARCH processes. Annals of Applied Probability 16, 790–826.
Carr, P., Geman, H., Madan, D., Yor, M., 2003. Stochastic volatility for Lévy processes. Mathematical Finance 13, 345–382.
Carr, P., Wu, L., 2004. Time-changed Lévy processes and option pricing. Journal of Financial Economics 71, 113–141.
Chernozhukov, V., Hong, H., 2003. An MCMC approach to classical estimation. Journal of Econometrics 115, 293–346.
Das, S., Sundaram, R., 1999. Of smiles and smirks: a term structure perspective. Journal of Financial and Quantitative Analysis 34, 211–239.
Duffie, D., Filipović, D., Schachermayer, W., 2003. Affine processes and applications in finance. Annals of Applied Probability 13 (3), 984–1053.
Eraker, B., Johannes, M., Polson, N., 2003. The impact of jumps in volatility and returns. Journal of Finance 58, 1269–1300.
Garcia, R., Lewis, M., Pastorello, S., Renault, E., 2006. Estimation of objective and risk-neutral distributions based on moments of integrated volatility. Working Paper, Université de Montréal.
Griffin, J., Steel, M., 2006. Inference with Ornstein–Uhlenbeck processes for stochastic volatility. Journal of Econometrics 134, 605–644.
Huang, X., Tauchen, G., 2005. The relative contributions of jumps to total variance. Journal of Financial Econometrics 3, 456–499.
Klüppelberg, C., Lindner, A., Maller, R., 2004. A continuous time GARCH process driven by a Lévy process: stationarity and second order behavior. Journal of Applied Probability 41, 601–622.
Meddahi, N., 2002. Theoretical comparison between integrated and realized volatility. Journal of Applied Econometrics 17, 479–508.
Pan, J., 2002. The jump-risk premia implicit in options: evidence from an integrated time-series study. Journal of Financial Economics 63, 3–50.
Roberts, G., Papaspiliopoulos, O., Dellaportas, P., 2004. Bayesian inference for non-Gaussian Ornstein–Uhlenbeck stochastic volatility processes. Journal of the Royal Statistical Society: Series B 66, 369–393.
Rosiński, J., 2007. Tempering stable processes. Stochastic Processes and Their Applications 117, 677–707.
Tauchen, G., 1985. Diagnostic testing and evaluation of maximum likelihood models. Journal of Econometrics 30, 415–443.
Todorov, V., Tauchen, G., 2006. Simulation methods for Lévy-driven CARMA stochastic volatility models. Journal of Business and Economic Statistics 24, 450–469.
Journal of Econometrics 160 (2011) 22–32
Estimation of objective and risk-neutral distributions based on moments of integrated volatility✩

René Garcia a,∗, Marc-André Lewis b, Sergio Pastorello c, Éric Renault d

a Edhec Business School, France
b Banque Nationale du Canada, Canada
c Dipartimento di Scienze Economiche, Università di Bologna, Italy
d University of North Carolina at Chapel Hill, CIRANO and CIREQ, United States
article info

Article history: Available online 6 March 2010

JEL classification: C1, C5, G1

Keywords: Realized volatility; Implied volatility; Volatility risk premium; Moments of integrated volatility; Objective distribution; Risk-neutral distribution
abstract In this paper, we present an estimation procedure which uses both option prices and high-frequency spot price feeds to estimate jointly the objective and risk-neutral parameters of stochastic volatility models. The procedure is based on a method of moments that uses analytical expressions for the moments of the integrated volatility and series expansions of option prices and implied volatilities. This results in an easily implementable and rapid estimation technique. An extensive Monte Carlo study compares various procedures and shows the efficiency of our approach. Empirical applications to the Deutsche mark–US dollar exchange rate futures and the S&P 500 index provide evidence that the method delivers results that are in line with the ones obtained in previous studies where much more involved estimation procedures were used. © 2010 Elsevier B.V. All rights reserved.
1. Introduction In continuous-time modeling in finance, stochastic processes for asset prices are combined with an absence of arbitrage argument to obtain the prices of derivative assets. Therefore, statistical inference on continuous-time models of asset prices can and should combine two sources of information, namely the price history of the underlying assets on which derivative contracts are written and the price history of the derivative securities themselves. However, statistical modeling poses a challenge. A joint
✩ This paper was started while the second author was a post-doctoral fellow at CIRANO. We would like to thank two anonymous referees and the editors for their very helpful comments and suggestions. We also benefited from comments by Nour Meddahi and conference participants at the Canadian Econometric Study Group (CESG) in Waterloo (2001), the Econometric Society Winter Meeting in Atlanta (2002), and the International Statistical Symposium and Bernoulli Society EAPR Conference in Taipei (2002). The first author gratefully acknowledges financial support from the Fonds Québécois de la recherche sur la société et la culture (FQRSC), the Social Sciences and Humanities Research Council of Canada (SSHRC), the Network of Centres of Excellence MITACS, Hydro-Québec and the Bank of Canada. The third author gratefully acknowledges financial support from the MIUR Prin 2005 program. ∗ Corresponding address: Département de Sciences Économiques, Université de Montréal, C.P. 6128, Succ. Centre-Ville, Montréal, Québec, H3C 3J7, Canada. E-mail address:
[email protected] (R. Garcia).
0304-4076/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2010.03.011
model needs to be specified, not only for the objective probability distribution which governs the random shocks observed in the economy, but also for the risk-neutral probability distribution, which allows us to compute derivative asset prices as expectations of discounted payoffs. Since the two distributions have to be equivalent, there exists a link between the two through an integral martingale representation which includes the innovations associated with the primitive asset price processes and the risk premia associated with these sources of uncertainty. Moreover, state variables, observable or latent, may affect the drift and diffusion coefficients of the primitive assets and the corresponding risk premia. The main contribution of this paper is to propose a new methodology for an integrated analysis of spot and option prices. It is based on simple generalized method-of-moments (GMM) estimators of both the parameters of the asset price and state variable processes and the corresponding risk premia. To focus on the issue of the joint specification of an objective probability distribution and a risk-neutral one, we will restrict ourselves to the case of one state variable which will capture the stochastic feature of the volatility process of the underlying asset. We will adopt a popular affine diffusion model where volatility is parameterized as follows:
$$dV_t = k(\theta - V_t)\,dt + \gamma\sqrt{V_t}\,dW_t^\sigma,$$
where $V_t$ is a latent state variable with an innovation governed by a Brownian motion $W_t^\sigma$. This innovation can be correlated (with a
coefficient ρ) with the innovation of the primitive asset price process governed by W_t^S:

\frac{dS_t}{S_t} = \mu_t\,dt + \sigma_t\,dW_t^S.
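To fix ideas, these bivariate dynamics can be simulated with a simple Euler scheme. The sketch below is illustrative only (hypothetical parameter values, full-truncation handling of the square root so that √V stays real); it is not the simulation design used later in the paper.

```python
import math
import random

def simulate_affine_sv(s0, v0, kappa, theta, gamma, rho, mu, dt, n_steps, seed=0):
    """Euler discretization of dS/S = mu dt + sqrt(V) dW^S and
    dV = kappa*(theta - V) dt + gamma*sqrt(V) dW^sigma,
    with corr(dW^S, dW^sigma) = rho. Illustrative sketch only."""
    rng = random.Random(seed)
    s, v = s0, v0
    path = [(s, v)]
    for _ in range(n_steps):
        z1 = rng.gauss(0.0, 1.0)
        # Correlated second innovation via Cholesky decomposition
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        v_pos = max(v, 0.0)  # full truncation keeps the square root real
        s *= math.exp((mu - 0.5 * v_pos) * dt + math.sqrt(v_pos * dt) * z1)
        v += kappa * (theta - v_pos) * dt + gamma * math.sqrt(v_pos * dt) * z2
        path.append((s, v))
    return path

# Hypothetical daily-scale parameter values, 960 "days"
path = simulate_affine_sv(s0=100.0, v0=0.25, kappa=0.1, theta=0.25,
                          gamma=0.05, rho=-0.5, mu=0.0, dt=1.0, n_steps=960)
```

The log-Euler update for S keeps simulated prices strictly positive, which matters below when option prices are computed from simulated trajectories.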
In a seminal paper, Hull and White (1987) have shown that, in the particular case where ρ = 0, the arbitrage-free option price is a conditional expectation of the Black and Scholes (1973) (BS) price, where the constant volatility parameter σ² is replaced by the so-called mean integrated volatility

\bar V_{t,T} = \frac{1}{T-t}\int_t^T \sigma_s^2\, ds,

and where the conditional expectation is computed with respect to the risk-neutral conditional probability distribution of \bar V_{t,T} given σ_t. Heston (1993) has extended the analytical treatment of this option pricing formula to the case where ρ is different from zero, allowing for leverage effects and the presence of risk premia. However, with or without correlation, the option pricing formula involves the computation of a conditional expectation of a highly nonlinear integral function of the volatility process. To simplify this computation, we propose to use an expansion of the option pricing formula in the neighborhood of γ = 0, as in Lewis (2000), which corresponds to the Black–Scholes deterministic volatility case. The coefficients of this expansion are well-defined functions of the conditional moments of the joint distribution of the underlying asset returns and integrated volatilities, which we also derive analytically. These analytical expansions allow us to compute very quickly implied volatilities which are functions of the parameters of the processes and of the risk premia. An integrated GMM approach using intraday returns for computing approximate integrated volatilities (see the pioneering papers of Andersen and Bollerslev, 1998; Barndorff-Nielsen and Shephard, 2001, 2004) and option prices for computing implied volatilities allows us to estimate jointly the parameters of the processes and the volatility risk premium λ. The main attractive feature of our method is its simplicity once analytical expressions for the various conditional moments of interest are available.
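The Hull and White (1987) result for ρ = 0 reduces option pricing to averaging the Black–Scholes price over the distribution of the mean integrated variance. A minimal self-contained numerical sketch (illustrative parameter values and hypothetical helper names):

```python
import math
import random

def bs_call(s, k, r, tau, mean_var):
    """Black-Scholes call with total variance v = mean_var * tau."""
    v = mean_var * tau
    if v <= 0:
        return max(s - k * math.exp(-r * tau), 0.0)
    x = math.log(s / (k * math.exp(-r * tau)))
    d1 = x / math.sqrt(v) + math.sqrt(v) / 2
    d2 = d1 - math.sqrt(v)
    n = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return s * n(d1) - k * math.exp(-r * tau) * n(d2)

def hull_white_price(s, k, r, tau, v0, kappa, theta, gamma,
                     n_paths=2000, n_steps=50, seed=1):
    """Average the BS price over simulated mean integrated variance (rho = 0)."""
    rng = random.Random(seed)
    dt = tau / n_steps
    total = 0.0
    for _ in range(n_paths):
        v, integral = v0, 0.0
        for _ in range(n_steps):
            integral += max(v, 0.0) * dt
            v += kappa * (theta - v) * dt \
                 + gamma * math.sqrt(max(v, 0.0) * dt) * rng.gauss(0, 1)
        total += bs_call(s, k, r, tau, integral / tau)
    return total / n_paths

# Sanity check: with gamma = 0 the variance path is deterministic, so the
# mixture price collapses to a single Black-Scholes price.
det = hull_white_price(100, 100, 0.0, 1.0, 0.04, 2.0, 0.04, 0.0, n_paths=1)
```

The γ = 0 check is the same degenerate case around which the series expansion of Section 4 is built.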
These expressions were derived by Lewis (2001) using a recursive method, and also by Bollerslev and Zhou (2002) for the first two moments. The great advantage of the affine diffusion model is precisely to allow an analytical treatment of the conditional moments of interest. Bollerslev and Zhou (2002) have developed such a GMM approach based on the first two moments of integrated volatility to estimate the objective parameters of stochastic volatility and jump-diffusion models. In our estimation, we add moment conditions based on the third moment of integrated volatility. We hope that using these moments will help to better identify certain coefficients, and in particular to effectively estimate the asymmetry coefficient ρ.1 Recently, Bollerslev et al. (2011) adopted a very similar approach to ours, but considered a so-called model-free approach to recover implied volatilities. There is clearly a trade-off between model-free and model-based approaches to recover implied volatilities. While a model-free approach is robust to misspecification, it requires, in theory, a continuum of strikes for option prices or, in practice, a very liquid market like the S&P 500 option market.2
1 Meddahi (2002) also derives explicit formulas for conditional and unconditional moments of the continuous-time Eigenfunction Stochastic Volatility (ESV) models of Meddahi (2001), which include as special cases the log-normal, affine and GARCH diffusion models.
2 For model-free approaches see in particular Britten-Jones and Neuberger (2000), Lynch and Panigirtzoglou (2003) and Jiang and Tian (2005). The latter study shows how to implement the model-free implied volatility using observed option prices. They characterize the truncation errors when only a finite range of strike prices is available in practice. To calculate the model-free implied volatility, they use a curve-fitting method and extrapolation from endpoint implied volatilities.
Model-based approaches like ours may be sensitive to misspecification, but they require only a few option prices. A model-based approach may be the only way to proceed for options on individual stocks or less liquid option markets in general. It should also be emphasized that model-free implied volatilities are ultimately used to estimate the parameters of a volatility model and the corresponding risk premium. For efficiency reasons, it may make sense to use the less noisy model-based implied volatilities, given of course that the model is well-specified. Only a few studies have estimated jointly the risk-neutral and objective parameters, and the estimation methods used are generally much more involved. Pastorello et al. (2000) proposed an iterative estimation procedure that used option and returns information to provide an estimate of the objective parameters in the absence of risk premia. Poteshman (1998) extends their methodology to include correlation between returns and volatility, a non-zero price of volatility risk, and flexible nonparametric specifications for this price of risk as well as the drift and diffusion functions of the volatility process. Chernov and Ghysels (2000) use the Efficient Method of Moments (EMM), a procedure that estimates the parameters of the structural model through a seminonparametric auxiliary density. Pan (2002) uses the Fourier transform to derive a set of moment conditions pertaining to implied states and jointly estimates jump-diffusion models using option and spot prices.3 Pastorello et al. (2003) propose a general methodology of iterative and recursive estimation in structural non-adaptive models which nests all the previous implied state approaches. Finally, Aït-Sahalia and Kimmel (2007) propose a maximum likelihood approach, using closed-form approximations to the likelihood function of the joint observations on the underlying asset and option prices.
Compared to all these methods, the main advantage of our method is its simplicity and computational efficiency. We show through an extensive Monte Carlo study that the estimation procedure works well for both the no-leverage model (ρ = 0) and the leverage model (ρ ≠ 0). Of course, the selected moment conditions differ between the two models. Due to the presence of a correlation parameter in the leverage model, we include moment conditions involving the cross-product of returns with either integrated volatility or implied volatility. In the no-leverage case, the moment conditions are based only on moments of integrated volatility and of implied volatility. Finally, we provide an empirical illustration of our method for each model. The no-leverage model is applied to the Deutsche mark–US dollar exchange rate futures market. We use 5-min returns on the exchange rate futures and daily option prices on the same futures to compute the moments and implement our methodology. For the leverage model, we use 5-min returns on the S&P 500 index and daily option prices on the same index. Results show the presence of a significant volatility risk premium. The rest of the paper is organized as follows. In Section 2, we present the general methodology and show how to construct two blocks of moment conditions for the estimation of the models, one based on the high-frequency return measures, the other on the implied volatility obtained as a power series in the volatility of volatility parameter γ. Section 3 describes the first block of moment conditions, while Section 4 explains how to use option price expansions to define model-specific
3 Duffie et al. (2000) have extended the moment computations to the case of affine jump-diffusion models (where jumps are captured by Poisson components), while Barndorff-Nielsen and Shephard (2001) have put forward the so-called Ornstein–Uhlenbeck-like processes with a general Lévy innovation. The general statistical methodology that we develop in this paper could be extended to these more general settings if a specification is chosen for the risk premia of the various jump components.
implied volatilities, and how these implied volatilities can lead to the estimation of the parameters. Section 5 presents a Monte Carlo study for two stochastic volatility models, with and without leverage, for several sets of parameter values. In Section 6, we provide two empirical illustrations of the methodology. Appendices A and B contain the expressions for the moments of integrated volatility and for the cross-moments in the leverage model.

2. A general outline of the method
As stated in the introduction, two different but equivalent sets of bivariate stochastic processes are to be considered here. The objective process is taken to be the affine stochastic volatility process

d\begin{bmatrix} S_t \\ V_t \end{bmatrix} = \begin{bmatrix} \mu_t S_t \\ \kappa(\theta - V_t) \end{bmatrix} dt + \sqrt{V_t}\, \begin{bmatrix} S_t & 0 \\ \gamma\rho & \gamma\sqrt{1-\rho^2} \end{bmatrix} \begin{bmatrix} dW_t^1 \\ dW_t^2 \end{bmatrix},   (1)

where S_t and V_t are the price and volatility processes. The affine model for the volatility appearing in the returns process was studied by Heston (1993), Duffie et al. (2000) and Meddahi and Renault (2004). The risk-neutral process is taken to be

d\begin{bmatrix} S_t \\ V_t \end{bmatrix} = \begin{bmatrix} r_t S_t \\ \kappa^*(\theta^* - V_t) \end{bmatrix} dt + \sqrt{V_t}\, \begin{bmatrix} S_t & 0 \\ \gamma\rho & \gamma\sqrt{1-\rho^2} \end{bmatrix} \begin{bmatrix} d\tilde W_t^1 \\ d\tilde W_t^2 \end{bmatrix},   (2)

where, by virtue of the Girsanov theorem, only the parameters κ and θ are modified in the passage from one measure to the other. We follow Heston (1993) and specify the risk premium structure as κ* = κ − λ and κ*θ* = κθ, the volatility risk premium being parameterized by λ. For such models, the objective parameters to be estimated are4 β = (κ, θ, γ, ρ). In order to define the risk-neutral set of parameters β* = (κ*, θ*, γ, ρ), one must have the additional parameter λ, since we shall assume the short rate r_t to be observed. We will denote by ψ the vector of parameters comprised of β and λ.

In the next sections, we will show that high-frequency measures of returns can be used to measure the integrated volatility V_{t,T}. Lewis (2001) proposes a method to compute conditional moments of the integrated volatility in affine stochastic volatility models. Using these, it is possible to construct a set of moment conditions f_1(V_{t,T}, β) such that E[f_1(V_{t,T}, β)] = 0. We will denote by β̂ the estimator based on the set of moment conditions f_1. Moreover, it is possible to define model-specific implied volatilities V_t^{imp}(β, λ, {c^{obs}}), with {c^{obs}} being the set of observed option prices. These implied volatilities, which are not to be confounded with Black–Scholes implied volatilities, are defined as the point-in-time volatility which gives, for given values of the risk-neutral parameters β*, the observed option price. We use these implied volatilities to construct a second set of moment conditions f_2(V_t^{imp}(β, λ, {c^{obs}}), β), which depends in a very nonlinear way on the parameters β and λ. It is thus possible to construct a joint set of moment conditions

E\begin{bmatrix} f_1(V_{t,T}, \beta) \\ f_2(V_t^{imp}(\beta, \lambda, \{c^{obs}\}), \beta) \end{bmatrix} = 0,   (3)

which we use to estimate by GMM the objective parameters β and the risk premium λ. We will call ψ̂ the estimator based on the joint set of moment conditions f_1 and f_2.

4 Since the drift term µ_t does not matter for option pricing purposes, we do not specify it explicitly. Moreover, the inference method we will use for the objective parameters is robust to its specification.

3. Estimating objective parameters from high-frequency returns

From the seminal works of Andersen and Bollerslev (1998) and Barndorff-Nielsen and Shephard (2001), we know that high-frequency intraday data on returns can be used to obtain indirect information on the otherwise unobservable volatility process. The logarithmic price of an asset is assumed to obey the stochastic differential equation

dp_t = \mu(p_t, V_t, t)\,dt + \sqrt{V_t}\,dW_t,

where V_t is the squared-volatility process (which could be stochastic, in particular of the affine type discussed above) and W_t is a standard Brownian motion. If the drift and diffusion coefficients are sufficiently regular to guarantee the existence of a unique strong solution to the SDE, then, by the theory of quadratic variation, we have

\operatorname{plim}_{N \to \infty} \sum_{i=1}^{N} \left( p_{t + \frac{i}{N}(T-t)} - p_{t + \frac{i-1}{N}(T-t)} \right)^2 = \int_t^T V_s \, ds \equiv V_{t,T},

and V_{t,T} is referred to as the integrated volatility of the process V_t from time t to T. Andersen et al. (2001a,b, 2003) offer a characterization of the distributional features of daily realized return volatilities constructed from high-frequency five-minute returns for foreign exchange and individual stocks. The finiteness of the number of measures induces a systematic error in the integrated volatility measure; in fact, the quadratic variation estimator is a biased estimator of the integrated volatility if the drift term is not zero, this bias falling as the number of measures increases. Bollerslev and Zhou (2002) use such an aggregation of returns to obtain integrated volatility time series from which they estimate by GMM the parameters of Heston's (1993) stochastic volatility model. They base their estimation on a set of conditional moments of the integrated volatility, where they add to the basic conditional mean and second moment various lag-one and lag-one squared counterparts. In constructing estimates of the objective parameters of the stochastic volatility process, we follow their basic approach but introduce a new set of moment conditions involving higher moments of the integrated volatility, in particular its skewness. Lewis (2001) derives analytically all conditional moments of the integrated volatility for the class of affine stochastic volatility models (which includes the Heston (1993) and the Hull and White (1987) models).5

Some attention has to be devoted to information sets. Following the notation of Bollerslev and Zhou (2002), we shall define the filtration F_t = σ{V_s, s ≤ t}, that is, the sigma algebra generated by the instantaneous volatility process. Our moment conditions for the integrated volatility are originally conditional on this filtration. Since only the integrated volatility is observable, we need to introduce the discrete filtration G_t = σ{V_{s-1,s}, s = 0, 1, 2, . . . , t}, which is the sigma algebra of integrated volatilities. Integrated volatilities are not observable, but realized volatilities are. As Bollerslev and Zhou (2002), we ignore the discretization noise. Corradi and Distaso (2006) provide a theoretical foundation for the approach that ignores the noise in a double asymptotic setting. Evidently, the filtration G_t is nested in the finer F_t. This enables one to rewrite moment conditions in terms of the coarser filtration using the law of iterated expectations: E[E(·|F_t)|G_t] = E(·|G_t).
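In practice the quadratic-variation limit is approximated by summing squared intraday log returns. A small sketch of the realized-variance computation (the 80 five-minute observations per "day" match the grid used later in the paper; the data here are just simulated constant-volatility noise, and the function name is hypothetical):

```python
import math
import random

def realized_variance(log_prices):
    """Sum of squared log-price increments: the quadratic-variation estimator."""
    return sum((log_prices[i] - log_prices[i - 1]) ** 2
               for i in range(1, len(log_prices)))

# Simulate one "day" of a driftless log price with constant daily variance V = 0.25
rng = random.Random(42)
V, n = 0.25, 80
p = [0.0]
for _ in range(n):
    p.append(p[-1] + math.sqrt(V / n) * rng.gauss(0, 1))
rv = realized_variance(p)
```

For a constant-volatility path, rv/V is distributed as a chi-square with 80 degrees of freedom divided by 80, so the estimator is centered on the integrated variance with sampling noise that shrinks as the grid is refined.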
5 Duffie et al. (2000) provide analytical expressions for the instantaneous volatility process for such models. Bollerslev and Zhou (2002) derived analytical expressions for the mean and variance of the integrated volatility in Feller-type volatility models. To our knowledge, higher moments of the integrated volatility were not previously computed. Zhou (2003) characterized the Itô conditional moment generator for affine jump-diffusion models, and other nonlinear quadratic variance and semiparametric flexible jump models.
3.1. The no-leverage model

In the case where there is no correlation between returns and volatility (ρ = 0), we use the following set of moment conditions:

m_{1t}(\beta) = V_{t+1,t+2}^k - E\left[ V_{t+1,t+2}^k \mid \mathcal{G}_t \right], \quad k = 1, 2, 3.   (4)

The three conditional moment restrictions (4) can be expressed in terms of observed integrated volatilities because we have (see Appendix A) closed-form formulas for E[V_{t+1,t+2}^k | G_t] in terms of E[V_{t,t+1}^k | G_t], for k = 1, 2, 3. We use each of the resulting three orthogonality conditions with two instruments, a constant and V_{t-1,t}^k, which results in six unconditional moment restrictions:6

f_{1t}(\beta) = \begin{bmatrix} V_{t+1,t+2}^k - E\left( V_{t+1,t+2}^k \mid \mathcal{G}_t \right) \\ \left( V_{t+1,t+2}^k - E\left( V_{t+1,t+2}^k \mid \mathcal{G}_t \right) \right) V_{t-1,t}^k \end{bmatrix}, \quad k = 1, 2, 3.   (5)
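For k = 1 and a unit daily observation interval, the conditional mean link takes the well-known form E[V_{t+1,t+2} | G_t] = e^{-κ} E[V_{t,t+1} | G_t] + θ(1 − e^{-κ}), which is the relation exploited by Bollerslev and Zhou (2002). The sketch below assembles only this k = 1 row of the unconditional conditions with the two instruments (function name hypothetical; the data are constructed to satisfy the recursion exactly, so both sample moments vanish):

```python
import math

def bz_moments(rv, kappa, theta):
    """Unconditional k = 1 moment conditions: the forecast error
    e_t = RV[t+2] - alpha*RV[t+1] - beta, interacted with the
    instruments (1, RV[t])."""
    alpha = math.exp(-kappa)
    beta = theta * (1.0 - alpha)
    g1 = g2 = 0.0
    n = 0
    for t in range(len(rv) - 2):
        e = rv[t + 2] - alpha * rv[t + 1] - beta
        g1 += e            # instrument: constant
        g2 += e * rv[t]    # instrument: lagged realized variance
        n += 1
    return (g1 / n, g2 / n)

# Data satisfying the conditional-mean recursion exactly (no forecast noise)
kappa, theta = 0.1, 0.25
alpha, beta = math.exp(-kappa), theta * (1 - math.exp(-kappa))
rv = [0.3]
for _ in range(200):
    rv.append(alpha * rv[-1] + beta)
g = bz_moments(rv, kappa, theta)
```

With real data the forecast errors are nonzero but orthogonal to G_t-measurable instruments, which is exactly what the GMM criterion exploits; the k = 2, 3 rows follow the same pattern with the higher-moment formulas of Appendix A.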
3.2. The leverage model

In the leverage model, the correlation coefficient between the two Brownian motions, ρ, appears as an additional parameter. In principle, it could be identified using only the marginal moments of the integrated and the spot volatility (because the implied spot volatilities depend on both λ and ρ). In practice, these moment expressions are not able to accurately identify the parameter ρ. We therefore add some cross-moments between the log returns7 and the integrated volatility to the moment conditions in (4). The set of twelve moment conditions used in the estimation of the objective parameters is given below. To derive the closed-form expression of the cross-moments, we used the recurrence formula provided in Appendix A of Lewis (2001). These expressions are given in Appendix B.

f_{1t}(\beta) = \begin{bmatrix}
V_{t+1,t+2}^k - E\left( V_{t+1,t+2}^k \mid \mathcal{G}_t \right), \quad k = 1, 2, 3 \\
\left( V_{t+1,t+2}^k - E\left( V_{t+1,t+2}^k \mid \mathcal{G}_t \right) \right) V_{t-1,t}^k, \quad k = 1, 2, 3 \\
(p_{t+1} - p_t) V_{t+1,t+2} - E\left[ (p_{t+1} - p_t) V_{t+1,t+2} \mid \mathcal{G}_t \right] \\
\left( (p_{t+1} - p_t) V_{t+1,t+2} - E\left[ (p_{t+1} - p_t) V_{t+1,t+2} \mid \mathcal{G}_t \right] \right) (p_{t-1} - p_{t-2}) V_{t-1,t} \\
(p_{t+1} - p_t)^2 V_{t+1,t+2} - E\left[ (p_{t+1} - p_t)^2 V_{t+1,t+2} \mid \mathcal{G}_t \right] \\
\left( (p_{t+1} - p_t)^2 V_{t+1,t+2} - E\left[ (p_{t+1} - p_t)^2 V_{t+1,t+2} \mid \mathcal{G}_t \right] \right) (p_{t-1} - p_{t-2})^2 V_{t-1,t} \\
(p_{t+1} - p_t) V_{t+1,t+2}^2 - E\left[ (p_{t+1} - p_t) V_{t+1,t+2}^2 \mid \mathcal{G}_t \right] \\
\left( (p_{t+1} - p_t) V_{t+1,t+2}^2 - E\left[ (p_{t+1} - p_t) V_{t+1,t+2}^2 \mid \mathcal{G}_t \right] \right) (p_{t-1} - p_{t-2}) V_{t-1,t}^2
\end{bmatrix}.   (6)
6 Due to the MA(1) structure of the error terms in (5), the optimal weighting matrix for GMM estimation entails the estimation of the variance and only the first-order autocorrelations of the moment conditions.
7 Bollerslev and Zhou (2002) express these moment conditions in terms of the log price of the asset, which is nonstationary (see Bollerslev and Zhou (2002), footnote 17 on p. 61). Attempts to run the GMM estimation using the conditions based on the price instead of the returns tended to exhibit erratic behavior: estimates on the boundary of the admissible region, many error flags raised by the optimization software, and so on.

4. Using implied volatilities to link objective and risk-neutral parameters

Implied volatilities are usually computed by inverting the Black–Scholes formula, but they can also be defined for more elaborate pricing models involving additional parameters. Inversion of the pricing formula in the volatility parameter can then only be done for given values of these parameters. The value of implied volatility will therefore depend both on the option price and the parameter values. However, semi-closed-form option pricing formulas are generally difficult to invert, and one has to use numerical procedures which are computationally intensive and whose precision has to be controlled. Moreover, implementing integral solutions such as Heston's formula can be very delicate due to divergences of the integrand in regions of the parameter space. One way to avoid both problems is to rewrite option pricing formulas as power series around values of the parameters for which the model can be solved analytically (i.e., it has an explicit form in terms of elementary and special functions, not an integral one). This avenue is followed by Lewis (2000).8

8 Medvedev and Scaillet (2007) develop similar Taylor expansions for short-term implied volatilities for jump-diffusion stochastic volatility models.

4.1. Series expansions and inversion of option pricing formulas

Since option prices are continuously differentiable at any order in the volatility of volatility parameter γ, one can expand the pricing formula around a fixed γ, which we will set to zero, as it corresponds to a deterministic volatility model that we can solve analytically. Generally, at date t options will have prices c(S_t, V_t, K, t, T, r, β*), where S_t and V_t are the underlying asset's price and volatility, K the strike price, T the expiration date, and β* are the parameters of the risk-neutral distribution. The Taylor expansion of c around γ = 0 has the general form

c(S_t, V_t, K, t, T, r, \beta^*) = \sum_{j=0}^{\infty} \mu_j(S_t, V_t, K, t, T, r, \beta^*_{-\gamma}) \gamma^j,   (7)

where β*_{-γ} denotes the vector of risk-neutral parameters without the volatility of volatility coefficient γ. A series expansion like (7) represents a rapid and simple option price computation tool, and it is also usually straightforward to invert, so that one can define implied volatilities and compute them very efficiently. Option prices being strictly increasing functions of the volatility, the inversion is always possible. Assume that, given an observed option price c_t^{obs}, the volatility V^{imp} admits the expansion

V^{imp}(S_t, c_t^{obs}, K, t, T, r, \beta^*) = \sum_{j=0}^{\infty} \nu_j(S_t, c_t^{obs}, K, t, T, r, \beta^*_{-\gamma}) \gamma^j.

Such a development makes sense because, if γ goes to zero, the leverage model becomes a deterministic volatility model (in fact, if κ* goes to zero, we recover the Black–Scholes model). If the observed option price is correctly priced by (7) with V_t set to the implied volatility V^{imp}(S_t, c_t^{obs}, K, t, T, r, β*), then

c_t^{obs} = c\left[ S_t, V^{imp}(S_t, c_t^{obs}, K, t, T, r, \beta^*), K, t, T, r, \beta^* \right] = \sum_{j=0}^{\infty} \mu_j\left( S_t, \sum_{k=0}^{\infty} \nu_k(S_t, c_t^{obs}, K, t, T, r, \beta^*_{-\gamma}) \gamma^k, K, t, T, r, \beta^*_{-\gamma} \right) \gamma^j.

Since we have explicit expressions for the µ_j's, we can Taylor expand the right-hand side and collect terms by powers of γ. By doing so, we can define the coefficients ν̃_j(S_t, c_t^{obs}, K, t, T, r, β*_{-γ}, {ν_k}) such that

c_t^{obs} = \sum_{j=0}^{\infty} \tilde\nu_j(S_t, c_t^{obs}, K, t, T, r, \beta^*_{-\gamma}, \{\nu_k\}) \gamma^j.

This equation is solved by imposing the conditions

\tilde\nu_0 = c_t^{obs}, \qquad \tilde\nu_j = 0 \quad \forall j \geq 1,

which form a triangular system of polynomial equations for the ν_k. This system is easily solved order by order in k. Notice that a similar result could be obtained by starting from a Taylor expansion of
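The triangular, order-by-order structure of the inversion can be illustrated on a toy second-order series with leading term µ_0(V) = V (an assumption made only to keep the algebra short; in the paper the µ_j come from the option pricing expansion). Collecting powers of γ gives ν_0 = c^obs, ν_1 = −µ_1(ν_0), and ν_2 = −(µ_1'(ν_0)ν_1 + µ_2(ν_0)):

```python
def invert_series(c_obs, mu1, mu2, h=1e-6):
    """Order-by-order inversion of c = V + mu1(V)*gamma + mu2(V)*gamma^2 + O(gamma^3).
    Returns (nu0, nu1, nu2) such that V_imp = nu0 + nu1*gamma + nu2*gamma^2.
    Toy leading term mu0(V) = V; the triangular structure is the general point."""
    nu0 = c_obs                                       # order gamma^0
    nu1 = -mu1(nu0)                                   # order gamma^1
    dmu1 = (mu1(nu0 + h) - mu1(nu0 - h)) / (2 * h)    # numerical mu1'(nu0)
    nu2 = -(dmu1 * nu1 + mu2(nu0))                    # order gamma^2
    return nu0, nu1, nu2

# Round trip: price a known V through the toy series, then invert it back
mu1 = lambda v: 0.3 * v * v
mu2 = lambda v: -0.1 * v
v_true, gamma = 2.0, 0.05
c_obs = v_true + mu1(v_true) * gamma + mu2(v_true) * gamma ** 2
nu0, nu1, nu2 = invert_series(c_obs, mu1, mu2)
v_imp = nu0 + nu1 * gamma + nu2 * gamma ** 2
```

The round trip recovers the true volatility up to the O(γ³) truncation error, which is the accuracy claimed for the model-specific implied volatilities in the text.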
the usual Black–Scholes implied volatility, denoted by V_BS, instead of (7).

We now provide an illustration of the inversion methodology with the leverage model, which encompasses the no-leverage one as a special case. The starting point is to derive the Taylor expansion of the call price, or equivalently of the corresponding Black–Scholes implied volatility. To this end, we follow Lewis (2000, chap. 3). To simplify the notation, we will keep the underlying asset price and volatility as the only explicit arguments in the formulas of option prices and Black–Scholes volatilities; also, let us denote by E_t(·) the conditional expectation operator E(·|S_t, V_t), taken with respect to the risk-neutral distribution. Romano and Touzi (1997) have shown that, under the stochastic volatility process (1), the call option price c(S_t, V_t) can be written as

c(S_t, V_t) = E_t\left[ c_{BS}\left( S_t \Omega_{t,T}, (1 - \rho^2) \bar V_{t,T} \right) \right],   (8)

where

\Omega_{t,T} = \exp\left( \rho \int_t^T \sqrt{V_s}\, dW_s^2 - \frac{1}{2} \rho^2 \int_t^T V_s\, ds \right) \quad\text{and}\quad \bar V_{t,T} = \frac{1}{T-t} \int_t^T V_s\, ds.

Using the Girsanov theorem and some simple stochastic calculus, it is easy to show that

E_t \Omega_{t,T} = 1 \quad\text{and}\quad E_t \bar V_{t,T} = \theta^* + (V_t - \theta^*)\, \frac{1 - e^{-\kappa^*(T-t)}}{\kappa^*(T-t)}.

We can now expand the option pricing formula around these mean values9:

c(S_t, V_t) = \sum_{p,q=0}^{\infty} \frac{1}{p!\,q!}\, \frac{\partial^{p+q} c_{BS}(S_t, E_t \bar V_{t,T})}{\partial S^q \partial V^p}\, S_t^q (1 - \rho^2)^p\, E_t\left[ (\Omega_{t,T} - 1)^q (\bar V_{t,T} - E_t \bar V_{t,T})^p \right].

Defining

R_{pq}(S, V, T) \equiv S^q\, \frac{\partial^{p+q} c_{BS}(S, V)}{\partial V^p \partial S^q} \left[ \frac{\partial c_{BS}(S, V)}{\partial V} \right]^{-1},

this can be rewritten as

c(S_t, V_t) = \frac{\partial c_{BS}(S_t, E_t \bar V_{t,T})}{\partial V}\, E_t \sum_{p,q=0}^{\infty} \frac{(1 - \rho^2)^p}{p!\,q!}\, R_{pq}(S_t, E_t \bar V_{t,T})\, (\Omega_{t,T} - 1)^q (\bar V_{t,T} - E_t \bar V_{t,T})^p.

Let τ = T − t denote the time to maturity of the option. Lewis (2000) shows that the first values of R_pq can be expressed in terms of the moneyness X = log[S/(K e^{-r\tau})] and the expected integrated volatility E_t V_{t,T} as follows:

R_{20} = \tau \left[ \frac{X^2}{2 (E_t V_{t,T})^2} - \frac{1}{2 E_t V_{t,T}} - \frac{1}{8} \right]

R_{11} = -\frac{X}{E_t V_{t,T}} + \frac{1}{2}

R_{12} = \frac{X^2}{(E_t V_{t,T})^2} - \frac{1}{E_t V_{t,T}} - \frac{1}{4}

R_{22} = \tau \left[ \frac{X^4}{2 (E_t V_{t,T})^4} - \frac{X^3 + 6X^2}{2 (E_t V_{t,T})^3} + \frac{3(X + 1)}{2 (E_t V_{t,T})^2} + \frac{X}{8 E_t V_{t,T}} - \frac{1}{32} \right].

It is also convenient to define the following quantities:

J_1 = \rho\, \frac{[\kappa^*\tau - 2]\theta^* + V_t + [\theta^* + (1 + \kappa^*\tau)(\theta^* - V_t)]\, e^{-\kappa^*\tau}}{(\kappa^*)^2}

J_3 = \frac{\theta^* - 2V_t + e^{2\kappa^*\tau}[(-5 + 2\kappa^*\tau)\theta^* + 2V_t] + 4 e^{\kappa^*\tau}[\theta^* + \kappa^*\tau\theta^* - \kappa^*\tau V_t]}{4 e^{2\kappa^*\tau} (\kappa^*)^3}

J_4 = \rho^2\, \frac{[6 + 2 e^{\kappa^*\tau}(-3 + \kappa^*\tau) + \kappa^*\tau(4 + \kappa^*\tau)]\theta^* + [-2 + 2 e^{\kappa^*\tau} - \kappa^*\tau(2 + \kappa^*\tau)] V_t}{2 e^{\kappa^*\tau} (\kappa^*)^3}

if κ* ≠ 0, and

J_1 = \frac{\rho V_t \tau^2}{2}, \qquad J_3 = \frac{V_t \tau^3}{6}, \qquad J_4 = \frac{\rho^2 V_t \tau^3}{6}

if κ* = 0. Using these expressions, one can approximate the call option price by truncating the expansion at order two:

c(S_t, V_t) = c_{BS}(S_t, E_t \bar V_{t,T}) + \frac{J_1 R_{11}}{\tau}\, \frac{\partial c_{BS}(S_t, E_t \bar V_{t,T})}{\partial V}\, \gamma + \left[ \frac{J_4 R_{12}}{\tau} + \frac{2 J_3 R_{20} + J_1^2 R_{22}}{2\tau^2} \right] \frac{\partial c_{BS}(S_t, E_t \bar V_{t,T})}{\partial V}\, \gamma^2.

Lewis (2000) shows that the neglected terms are O(γ³). Alternatively, by starting from a series expansion of the Black–Scholes implied volatility, evaluating the relevant coefficients and truncating at order two, one arrives at the following approximation:

V_{BS}(S_t, V_t) = E_t \bar V_{t,T} + \frac{J_1 R_{11}}{\tau}\, \gamma + \left[ \frac{J_2 + J_4 R_{12}}{\tau} + \frac{2 J_3 R_{20} + J_1^2 (R_{22} - R_{11}^2 R_{20})}{2\tau^2} \right] \gamma^2,

whose approximation error is still O(γ³). Lewis (2000), however, shows that for plausible values of the arguments the latter approximation is more accurate than the former, in particular for options far out-of-the-money and far in-the-money. Following his suggestion, in our empirical analysis we use the Black–Scholes implied volatility series expansion.

We now turn to the extraction of an implied volatility coherent with the leverage model from the observed Black–Scholes implied volatility. Specifically, we want to identify the coefficients ν_i of the approximation

V^{imp}(S_t, V_{BS,t}, \beta^*) = \nu_0(S_t, V_{BS,t}, \beta^*_{-\gamma}) + \nu_1(S_t, V_{BS,t}, \beta^*_{-\gamma})\, \gamma + \nu_2(S_t, V_{BS,t}, \beta^*_{-\gamma})\, \gamma^2,

whose errors are O(γ³), and where once again we suppress for simplicity the dependence of V^{imp} and ν_i on the option's characteristics, while keeping explicit only the observed underlying price S_t, the option's Black–Scholes volatility V_{BS,t}, and the risk-neutral parameters β*. Computing the coefficients ν_0, ν_1 and ν_2 is a straightforward but tedious exercise. As pointed out earlier, all we need to do is to insert this series expansion for V_t in the series expansion for V_BS, further expand with respect to γ around 0, and recursively compute the values of the coefficients that ensure equality between the truncated series above and the observed Black–Scholes volatility.10

9 Hull and White (1987) propose a similar series expansion for the case in which the stochastic volatility is independent of the stock price and follows a geometric Brownian motion.
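The R_pq coefficients are ratios of Black–Scholes partial derivatives, so they can be checked numerically. The sketch below verifies the R_20 formula in its total-variance form, X²/(2v²) − 1/(2v) − 1/8 with v the expected total variance (the τ prefactor in the text comes from differentiating with respect to average rather than total variance); parameter values are illustrative and the helper name is hypothetical.

```python
import math

def bs_call_totalvar(s, k, r, tau, v):
    """Black-Scholes call written in terms of total variance v = sigma^2 * tau."""
    x = math.log(s / (k * math.exp(-r * tau)))
    d1 = x / math.sqrt(v) + math.sqrt(v) / 2
    d2 = d1 - math.sqrt(v)
    n = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return s * n(d1) - k * math.exp(-r * tau) * n(d2)

s, k, r, tau, v = 100.0, 105.0, 0.02, 0.5, 0.04
h = 1e-4
c0 = bs_call_totalvar(s, k, r, tau, v)
c_up = bs_call_totalvar(s, k, r, tau, v + h)
c_dn = bs_call_totalvar(s, k, r, tau, v - h)
# Finite-difference estimate of (d2c/dv2) / (dc/dv)
r20_fd = ((c_up - 2 * c0 + c_dn) / h ** 2) / ((c_up - c_dn) / (2 * h))
x = math.log(s / (k * math.exp(-r * tau)))
r20_formula = x ** 2 / (2 * v ** 2) - 1 / (2 * v) - 1 / 8
```

The same finite-difference approach can be used to sanity-check the higher-order ratios R_11, R_12 and R_22 before plugging them into the truncated expansions.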
4.2. Moment conditions for implied volatilities

Given daily option prices and some initial values for the objective parameters β_0 and the risk premium λ_0, we are able to construct an implied volatility time series. If the risk premium were suitably chosen and the objective parameters were the true ones, every option on a given day would have the same implied volatility, so that the implied volatility surface would be flat, with a value
10 The coefficients ν_0, ν_1, ν_2 of the inverse series for the spot volatility V_t = ν_0 + ν_1 γ + ν_2 γ² are provided in a document available on the following Web site: http://www.sceco.umontreal.ca/renegarcia/articles.htm.
equal to the point-in-time volatility V_t. Since this will not usually be the case, one could use some daily mean value of the observed implied volatilities V_t^{imp}(β_0, λ_0) to generate the implied volatility time series.11 For the no-leverage model, we then consider the daily volatility series and use the moments of the instantaneous volatility in the moment conditions f_{2t}(β). Each of these conditional moment conditions is used with two instrumental variables (a constant and V_t^k for the moment condition involving E[V_{t+1}^k | F_t]), resulting in six unconditional moment conditions:

f_{2t}(V, \beta) = \begin{bmatrix} V_{t+1}^k - E\left( V_{t+1}^k \mid \mathcal{F}_t \right) \\ \left( V_{t+1}^k - E\left( V_{t+1}^k \mid \mathcal{F}_t \right) \right) V_t^k \end{bmatrix}, \quad k = 1, 2, 3.   (9)
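Schematically, the joint system (3) simply stacks the return-based block f_1 and the option-based block f_2 into one vector of sample moments before minimizing a quadratic form. A minimal identity-weighted sketch, with hypothetical toy moment blocks standing in for the actual f_1 and f_2:

```python
def gmm_objective(params, moment_blocks):
    """Identity-weighted GMM criterion: stack every block's sample moments
    into one vector g and return g'g."""
    g = []
    for block in moment_blocks:
        g.extend(block(params))
    return sum(x * x for x in g)

# Hypothetical toy blocks: in the paper, the first depends only on the
# objective parameters beta, the second also on the risk premium lambda.
f1_block = lambda p: [p[0] - 1.0]
f2_block = lambda p: [p[1] - 2.0, (p[0] - 1.0) * (p[1] - 2.0)]
blocks = [f1_block, f2_block]
crit_at_truth = gmm_objective((1.0, 2.0), blocks)
```

In practice the identity weighting is replaced by an estimate of the optimal weighting matrix (here with Newey–West corrections, given the MA(1) structure of the errors), but the stacking logic is unchanged.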
These moment conditions are used in (3) to obtain estimates of the β and λ parameters with a joint GMM procedure. We proceed in the same way for the leverage model and use a set of moment conditions similar to (6), with the instantaneous volatility V replacing the integrated volatility and F replacing G. Expressions for the cross-moments are provided in Appendix B.

5. A Monte Carlo study

In order to assess how the GMM estimation method described in the previous sections performs, we conducted a Monte Carlo study for both the leverage and no-leverage models. It should be mentioned that any affine model that admits a formulation of the option pricing formula in terms of a power series could be considered with the same methodology.

5.1. The no-leverage model

We study the same three sets of parameters chosen by Bollerslev and Zhou (2002), in order to be able to compare our respective results for the estimation of β̂. We combine them with different values of the volatility risk premium λ12 to obtain four parameter sets A, B, C and D. Note that these are daily parameters. As in Bollerslev and Zhou, we normalized θ so that the yearly volatility is √(240 × θ), with 240 being the number of days per year we chose. This means that a yearly volatility of 7.74% is associated with θ = 0.25. Each day is further subdivided into 80 five-minute periods. The quadratic variations V_{t,t+1} are aggregated over these 80 periods, giving daily integrated volatilities, whereas the option prices are computed from the midday price and spot volatility of the underlying asset (i.e. the 40th observation in each day). In turn, each 5-min interval is actually subdivided into ten 30-s subintervals, and the SDE is simulated using the finest grid.

Under the no-leverage model, option prices can be expressed as mean values of the Black–Scholes price over the integrated volatility distribution.13 Option prices are obtained by simulating volatility trajectories to approximate the integrated volatility distribution. We wanted to avoid using our expansion for the
11 Whether deep out-of-the-money or in-the-money options should be included in this mean value computation can be debated. Nevertheless, we chose to do so both for the Monte Carlo generated option prices and for the empirical applications. In the latter case, we also compared the results to an approach where we chose only one option per day, either the nearest to the money or the one with the highest volume.
12 Under CRRA preferences, where a representative investor has a power utility over wealth, a zero correlation between returns and volatility innovations will imply a zero risk premium. However, volatility could be correlated with aggregate consumption and carry a non-zero price of risk. Also, a non-zero premium could be rationalized in an international model in which the volatility price of risk is linked to the exchange risk.
13 Explicit expressions for the option prices and implied volatility series up to the sixth order in γ can be found in the document posted on the above-mentioned web site.
option pricing formula in order to validate its use. However, if enough trajectories are simulated, the simulated and series expansion price are almost indistinguishable (at least in the region where the expansion is valid and has enough precision).14 The risk premium structure is chosen as in the Heston’s (1993) paper, that is, risk-neutral (denoted by stars) and objective parameters are related by κ ∗ = κ − λ, θ ∗ κ ∗ = θ κ and γ ∗ = γ . Usually, one would expect λ to be positive, so that the asymptotic volatility is higher, meaning that option prices will also be. The GMM estimation procedure was conducted with a NeweyWest kernel with a lag length of two.15 Results of the estimations are provided in Table 1. Statistics were obtained by estimating parameters over 5000 independent sets of 4-year data (960 observations). A comparison of the βˆ estimates with the GMM estimates of Bollerslev and Zhou (2002) reveals that the RMSE are not universally smaller with the moment conditions that we specified. The difference between the two sets is the introduction in our estimators of the third-moment conditions, while in their selection of instruments they included the lagged squared integrated volatility. It appears that the root mean-square error (RMSE) with our moment conditions is lower for the mean-reversion parameter κ , while it is higher for θ . The evidence is mixed for γ . The third-moment condition seems to perform better for higher volatility of volatility but worse for highly persistent processes. A closer look at the third-moment formula of integrated volatility in Appendices A and B shows that when κ is close to zero a number of terms involving θ disappear in NT , hence a potential loss of identification for this parameter. Moreover, a γ close to zero will potentially reduce the information about the other parameters that appear inside the brackets since higher powers of γ pre-multiply the whole expressions in the higher-order moments. ˆ . 
The RMSE of γ̂ generally deteriorates with the estimator ψ̂. The added moment conditions, where implied volatility is recovered through a Taylor expansion around γ = 0, may make this parameter harder to recover. Finally, the volatility risk premium λ is also nicely recovered. Its RMSE for ψ̂ remains quite small, except maybe for the last configuration of parameters (Panel D). This may be due to the fact that in this case the process is close to violating the condition making the zero boundary inaccessible (2κθ ≥ γ²), with a high volatility of volatility parameter. The error is also relatively large for the other parameters in this case.

5.2. The leverage model

The leverage model requires prices and spot volatilities to be observed at the beginning of the period over which the quadratic variation is computed. For this reason, we modified the sampling scheme to use opening prices instead of mid-day prices. Moreover, the previous Monte Carlo strategy does not work in the leverage model.16 Therefore, we evaluate option prices by adding a random
14 Lewis (2000), on page 80, provides a graph showing that the approximation error associated with the expansion increases with the moneyness.

15 We checked that the results were quite insensitive to this choice.

16 Basically, the problem is that the in-the-money option prices simulated in the way we proceeded often violate the lower bound C ≥ S − Ke^{−rτ}, which makes it impossible to compute the associated Black and Scholes implied volatility. In the no-leverage model this problem was absent because C could be computed as the expectation of C_BS(S, V_{t,t+τ}) with respect to the distribution of the integrated volatility. A similar expression also exists for the leverage model (see formula (8)), but in that case C_BS is evaluated not at S but at S rescaled by a function of the volatility trajectory. Of course, increasing the number of trajectories may attenuate the issue, but it does not provide a complete solution, and it significantly increases the computational burden.
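To make the footnote concrete, here is a minimal sketch (our own illustration, not the authors' code) of why the bound matters: a Black–Scholes implied volatility can only be extracted by bisection when the quoted call price respects C ≥ S − Ke^{−rτ}:

```python
import math

def bs_call(S, K, r, tau, sigma):
    """Black-Scholes call price."""
    if sigma <= 0:
        return max(S - K * math.exp(-r * tau), 0.0)
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S * N(d1) - K * math.exp(-r * tau) * N(d2)

def implied_vol(C, S, K, r, tau, lo=1e-8, hi=5.0, tol=1e-10):
    """Invert the BS formula by bisection; fail if C violates the lower bound."""
    if C < S - K * math.exp(-r * tau):
        raise ValueError("price below the no-arbitrage lower bound: no implied vol")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, r, tau, mid) < C:
            lo = mid  # price too low: volatility must be higher
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

A round trip (price a call at σ = 0.2, then invert) recovers the volatility, while a simulated price below the bound raises an error instead of producing a spurious volatility.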
R. Garcia et al. / Journal of Econometrics 160 (2011) 22–32
Table 1
Parameter estimates of the no-leverage model.

                  β̂                               ψ̂
                  κ        θ        γ        κ        θ        γ        λ
Parameter set A
True value        0.1000   0.2500   0.1000   0.1000   0.2500   0.1000   0.0500
Mean              0.0990   0.2502   0.1007   0.1027   0.2463   0.1011   0.0537
Median            0.0977   0.2493   0.1008   0.1026   0.2458   0.0999   0.0537
Std. error        0.0186   0.0166   0.0064   0.0096   0.0167   0.0078   0.0070
RMSE              0.0186   0.0166   0.0064   0.0099   0.0171   0.0078   0.0080
Parameter set B
True value        0.1000   0.2500   0.1000   0.1000   0.2500   0.1000   0.0200
Mean              0.0996   0.2493   0.1008   0.1038   0.2444   0.0994   0.0244
Median            0.0983   0.2487   0.1009   0.1040   0.2439   0.0994   0.0242
Std. error        0.0187   0.0164   0.0062   0.0097   0.0163   0.0060   0.0070
RMSE              0.0187   0.0164   0.0063   0.0104   0.0172   0.0060   0.0083
Parameter set C
True value        0.0300   0.2500   0.1000   0.0300   0.2500   0.1000   0.0100
Mean              0.0344   0.2537   0.1003   0.0350   0.2355   0.0953   0.0163
Median            0.0343   0.2388   0.1005   0.0365   0.2362   0.0973   0.0152
Std. error        0.0125   0.0848   0.0076   0.0095   0.0354   0.0157   0.0071
RMSE              0.0133   0.0849   0.0076   0.0107   0.0382   0.0164   0.0095
Parameter set D
True value        0.1000   0.2500   0.2000   0.1000   0.2500   0.2000   0.0500
Mean              0.1069   0.2463   0.1966   0.0999   0.2383   0.1876   0.0575
Median            0.1062   0.2430   0.1966   0.1029   0.2368   0.1906   0.0571
Std. error        0.0225   0.0406   0.0104   0.0259   0.0365   0.0304   0.0160
RMSE              0.0235   0.0408   0.0109   0.0259   0.0383   0.0329   0.0177

Note: The estimator β̂ is based on the set of moment conditions f1t defined in (5). For the estimator ψ̂, we use in addition the set of moment conditions f2t defined in (9) for a joint estimation as defined in (3).
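Both estimators are computed by two-step GMM with a Newey–West (Bartlett-kernel) weighting matrix with two lags. A generic sketch of that weighting step, with hypothetical moment data of our own (not the authors' code):

```python
import numpy as np

def newey_west_weight(g, lags=2):
    """Inverse of the Newey-West HAC estimate of the long-run covariance
    of the moment conditions g (T x m array, one row per observation)."""
    T, m = g.shape
    g = g - g.mean(axis=0)          # center the sample moments
    S = g.T @ g / T                 # lag-0 term
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1.0)  # Bartlett kernel weight
        G = g[j:].T @ g[:-j] / T
        S += w * (G + G.T)
    return np.linalg.inv(S)

# Hypothetical example: i.i.d. standard normal moments, so the optimal
# weighting matrix should be close to the identity.
rng = np.random.default_rng(0)
g = rng.standard_normal((5000, 2))
W = newey_west_weight(g, lags=2)
```

In the second GMM step, W replaces the identity matrix in the quadratic form defining the objective function.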
error term to the series expansion formula in implied volatility. We studied the characteristics of the error terms implicitly introduced in the option prices by the MC strategy used to evaluate them in the no-leverage model, and found that they looked fairly similar to N(0, ω²) noise added to the implied B&S volatilities, with ω = 4 × 10⁻⁷. This is the distribution from which the random noises in option prices (via the implied B&S volatilities) are drawn in the experiments concerning the leverage model. Optimal weighting matrices were computed in a second step using Newey–West with two lags for both estimators β̂ and ψ̂. We considered the same four sets of values for β and λ as in the no-leverage experiments, but only one value, −0.5, for the parameter ρ. The results are provided in Table 2. For each configuration of the parameters, the MC experiment consists of 5000 replications of 960 observations each. In order to identify the correlation parameter ρ, these experiments use six cross-moment conditions in addition to those used in the no-leverage experiments for the β̂ estimator. An additional set of six moment conditions defined in (9) is used for the estimator ψ̂. For both estimators, ρ is clearly the most difficult parameter to estimate, with the uniformly highest – by a very significant margin – RMSE among all the parameters. In terms of bias (both mean and median), the estimator β̂ tends to underestimate the leverage, roughly by 20% of the true value. This bias disappears for the estimator ψ̂.17 The performance seems to worsen a bit when γ increases (parameter
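The option-price perturbation used in the leverage experiments can be sketched in a few lines, using the ω stated above. This is a minimal illustration of ours; the variable names and the flat 10% volatility level are hypothetical:

```python
import numpy as np

# Draw Gaussian noise N(0, omega^2) and add it to implied B&S volatilities,
# mimicking the error the MC pricing strategy introduces in the
# no-leverage case.
omega = 4e-7
rng = np.random.default_rng(42)
implied_vols = np.full(10_000, 0.10)   # hypothetical implied volatilities
noisy_vols = implied_vols + rng.normal(0.0, omega, size=implied_vols.shape)
```

The perturbation is tiny relative to the volatility level, consistent with the near-indistinguishability of simulated and series-expansion prices reported earlier.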
17 One might think that this behavior is due to the fact that the quadratic variations are only estimates of the true integrated volatilities, and that the measurement errors they embed may lead to the underestimation highlighted above. To check this hypothesis, we also estimated β using the true integrated volatilities. It turns out that the underestimation of ρ is roughly the same as before; hence, it is not due to the measurement error in the quadratic variations. The estimates of β based on the true spot volatilities, β̂_V, also exhibit a downward bias with respect to ρ, albeit less pronounced. The persistence in the volatility process may cause a finite sample
set 4a). For the parameters κ, θ and γ, the RMSEs are generally slightly higher than in the no-leverage case.

An important issue for the inversion procedure recovering the implied volatility is the positivity of the extracted volatility. Since the inversion rests on a truncated expansion, it is legitimate to ask whether the recovered volatility is positive. Although it is impossible to ensure this positivity when such a truncation takes place, we verified in our Monte Carlo experiments that even for a second-order expansion none of the extracted volatilities were negative. However, in our experiments there are no misspecification issues, since the data are simulated under the true model. With actual data, this will most likely not be the case. To investigate this aspect of the problem, we set up Monte Carlo experiments identical to those illustrated above, but in which errors were added to the simulated option prices through the corresponding Black–Scholes volatilities.18 These experiments showed that for reasonable values of the variance of the measurement errors, only a small percentage of recovered volatilities were negative. Moreover, these negative volatilities did not have any significant impact on the bias of the estimators. This suggests that in most applications negative recovered volatilities are not a crucial issue and can be neglected without harm. Alternatively, we could impose the positivity of the extracted volatilities by suitably constraining the parameter space during the optimization of the GMM objective function. In practice, this amounts to imposing one nonlinear constraint on the parameters for each observed option. It should be noted that this problem is fre-
bias. The absence of bias when adding information through option prices seems to confirm this interpretation.

18 For each option, the random errors added to the simulated Black–Scholes volatility were drawn from a zero-mean Gaussian distribution with a standard deviation set at some percentage α of the Black–Scholes volatility of the same option evaluated at the unconditional instantaneous volatility. We considered values of α ranging from 5% to 25%.
Table 2
Parameter estimates of the leverage model.

Estimator  Parameter  Parameter set 1a                        Parameter set 2a
                      True value  Mean    Median  RMSE        True value  Mean    Median  RMSE
β̂          κ          0.100       0.103   0.100   0.023       0.100       0.104   0.101   0.024
           θ          0.250       0.247   0.247   0.016       0.250       0.247   0.246   0.017
           γ          0.100       0.101   0.101   0.007       0.100       0.101   0.102   0.007
           ρ          −0.500      −0.455  −0.453  0.135       −0.500      −0.451  −0.449  0.135
ψ̂          κ          0.100       0.099   0.100   0.011       0.100       0.099   0.100   0.010
           θ          0.250       0.247   0.246   0.020       0.250       0.246   0.246   0.019
           γ          0.100       0.097   0.098   0.008       0.100       0.097   0.098   0.008
           ρ          −0.500      −0.498  −0.500  0.086       −0.500      −0.498  −0.500  0.082
           λ          0.050       0.051   0.051   0.007       0.020       0.021   0.021   0.007

Estimator  Parameter  Parameter set 3a                        Parameter set 4a
                      True value  Mean    Median  RMSE        True value  Mean    Median  RMSE
β̂          κ          0.030       0.036   0.034   0.015       0.100       0.112   0.109   0.027
           θ          0.250       0.241   0.238   0.051       0.250       0.241   0.239   0.032
           γ          0.100       0.101   0.101   0.007       0.200       0.198   0.197   0.011
           ρ          −0.500      −0.424  −0.425  0.223       −0.500      −0.445  −0.443  0.155
ψ̂          κ          0.030       0.029   0.030   0.008       0.100       0.105   0.109   0.026
           θ          0.250       0.239   0.249   0.059       0.250       0.228   0.224   0.057
           γ          0.100       0.091   0.095   0.017       0.200       0.184   0.190   0.030
           ρ          −0.500      −0.498  −0.500  0.129       −0.500      −0.488  −0.493  0.126
           λ          0.010       0.019   0.013   0.028       0.050       0.065   0.063   0.027

Note: The estimator β̂ is based on the set of moment conditions f1t defined in (6). For the estimator ψ̂, we use a similar set of moment conditions as in (6), with V^F replacing G. We use in addition the set of moment conditions f2t defined in (9) for a joint estimation as defined in (3).
quently observed when the model to be estimated features an observable variable that is a function of some latent variable and the parameters, as is typically the case in derivative pricing and term structure modeling if one ignores the presence of measurement errors. When the parameters are estimated by maximum likelihood, this hugely complicates the estimation because the objective function may become undefined when the extracted latent variable violates some boundary condition. In our framework, the problem is easier to solve because the moment conditions are well-defined even for negative volatilities. Nonlinear constraints can be imposed directly on the objective function through a smooth penalty. We investigated this avenue at length, but the results were disappointing: the parameter estimates were much less reasonable, with mostly negative estimates of the volatility risk premium parameter λ and too low values of the volatility of volatility coefficient γ. Therefore, we decided to ignore the negativity of the recovered volatilities, since our Monte Carlo experiments showed that it is essentially without consequences. We will see that our empirical results are fairly reasonable and compare well to estimates obtained in other studies with different estimation methodologies.

6. Empirical illustrations

This section provides two applications of the proposed estimators. For the no-leverage model, we use a sample of high-frequency data on the Deutsche mark–US dollar (DM/$) futures contract and of daily call options on the same contracts. The spot price of the S&P 500 index and daily call prices on the index are used to estimate the parameters of the leverage model.

6.1. Estimation of the no-leverage model

Exchange rate data are often used to estimate a no-leverage model.
In Bollerslev and Zhou (2002) and Bates (1996), estimates of the parameter ρ for the DM/$ exchange rate data, whether based on the exchange rates themselves or on the corresponding currency options, are close to zero, confirming the appropriateness of a no-leverage specification.
Our data are for the DM/$ futures contract and were obtained from Tickdata.com. We form five-minute log returns on the DM/$ futures contract. The sample begins in January 1984 and ends in December 1998, for a total of 3753 daily observations. For each date, 80 five-minute intervals were observed, which were used to compute the quadratic variation measures. The option data set consists of daily data on options on DM/$ futures from the Chicago Mercantile Exchange (CME). At each date, a number of options were observed on the DM/$ futures, corresponding to different strike prices and maturity dates. Options with less than 15 days to maturity and with low transaction volume (less than 10 contracts) were dropped to avoid the inclusion of outliers. The number of options observed at each date is not constant – rather, it initially increases and subsequently declines towards the end of the sample – but it is never smaller than five. Notice that since our observations consist of options on a futures contract, the basic Hull and White formula and the volatility of volatility series expansions for the spot volatility have to be adjusted accordingly. In practice, as shown by Black (1976), if we denote by t the current date and by T the maturity date of the option, it is sufficient to use the standard pricing formula with the discounted price exp[−r_t(T − t)]S_t of the underlying asset instead of the price S_t. As the risk-free interest rate, we used the three-month Treasury bill rate on the secondary market. The top panel of Table 3 reports the GMM estimates of the parameters of the no-leverage model on the sample of returns and option prices on the DM/$ futures.
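As an illustration of the two ingredients just described, the sketch below computes a daily quadratic variation measure from five-minute log returns and prices a futures option by plugging the discounted futures price into the standard Black–Scholes routine, as in Black (1976). The code and the numerical inputs are our own, not the authors':

```python
import math

def realized_variance(log_prices):
    """Daily quadratic variation: sum of squared intraday log returns."""
    return sum((log_prices[i + 1] - log_prices[i]) ** 2
               for i in range(len(log_prices) - 1))

def black76_call(F, K, r, tau, sigma):
    """Black (1976): a call on a futures price F equals a BS call with the
    discounted underlying exp(-r*tau)*F, i.e. exp(-r*tau)*(F*N(d1)-K*N(d2))."""
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    d1 = (math.log(F / K) + 0.5 * sigma**2 * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return math.exp(-r * tau) * (F * N(d1) - K * N(d2))

# 80 five-minute intervals per trading day, as in the DM/$ sample.
log_p = [0.0002 * i for i in range(81)]   # hypothetical log prices
rv = realized_variance(log_p)
price = black76_call(F=0.60, K=0.60, r=0.05, tau=0.25, sigma=0.12)
```

The realized-variance sum over the 80 intervals is the daily quadratic variation measure used in the moment conditions; the Black-76 call shows the discounting adjustment mentioned in the text.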
First, we report the estimates obtained with β̂, which exploits 6 moment conditions for the integrated volatility and hence does not allow us to estimate the risk premium parameter λ. We then report the estimates with ψ̂, which is the joint estimator of all the parameters of the model and is based on 12 moment conditions: 6 for the integrated volatility and 6 for the filtered spot volatilities computed using the inverse series outlined in the previous sections. For ψ̂, we chose three ways to exploit the information available in option prices. First, we included all acceptable options according to the criteria set above, for a total of 96,599. Second, as in Pan (2002), we selected each day the option closest to the money. Finally, we included the option with the highest volume each day.
Table 3
Estimation results with financial data.

DM/US$ futures data and the no-leverage model

Est.  #opt.        κ              θ              γ              λ              pv J
β̂     –            0.094 (0.043)  0.292 (0.047)  0.222 (0.054)  –              0.158
ψ̂     96,599       0.098 (0.003)  0.293 (0.018)  0.233 (0.012)  0.088 (0.003)  0.005
ψ̂     3753 (Pan)   0.112 (0.031)  0.295 (0.036)  0.246 (0.012)  0.073 (0.021)  0.256
ψ̂     3753 (Vol.)  0.066 (0.002)  0.313 (0.031)  0.219 (0.020)  0.050 (0.006)  0.080

S&P 500 index data and the leverage model

Est.  #opt.        κ              θ              γ              ρ               λ              pv J
β̂     –            0.173 (0.053)  0.809 (0.081)  0.713 (0.133)  −0.165 (0.100)  –              0.001
ψ̂     178,916      0.027 (0.050)  0.844 (0.073)  0.186 (0.049)  −0.215 (0.103)  0.017 (0.012)  0.000
ψ̂     2517 (Pan)   0.070 (0.011)  0.800 (0.075)  0.414 (0.049)  −0.319 (0.103)  0.049 (0.012)  0.000
ψ̂     2517 (Vol.)  0.032 (0.006)  0.718 (0.024)  0.336 (0.093)  −0.090 (0.010)  0.022 (0.005)  0.000

Bates (1996) used short-maturity at- and out-of-the-money Deutsche mark foreign currency options to estimate by maximum likelihood the parameters of several jump-diffusion models, among which the stochastic volatility model we estimated. The estimates are relatively close: for the parameter κθ, our value is 0.0274, to compare with 0.031 for their corresponding parameter α; for γ, 0.233 to compare with 0.284 for their corresponding parameter σv; finally, for κ∗ annualized, 2.4 to compare with 1.30 for their corresponding parameter β∗. Bates (1996) concludes that there is substantial qualitative agreement between implicit and time-series-based distributions, most notably with regard to implicit volatilities as forecasts of future volatility. Our estimates are in fact very close whether or not we include the information about option prices in the estimation. Although the one-factor volatility model is not rejected by the J statistic, the estimates suggest a very persistent volatility process with too high a volatility of volatility parameter γ, which may be justified by the omission of a second volatility factor or a jump component.
In the last two cases, we had a total of 3753 options over the sample. When all options are used for estimation, we extract an average implied volatility, without regard to the moneyness or the liquidity of the options. Along with the number of options used in the estimation, the table reports the point estimates of the parameters, the associated asymptotic standard errors (computed using a Newey–West weighting scheme with lag two) in parentheses, and the p-value of the corresponding Hansen test of overidentifying restrictions. A first observation is that, for all parameters, the estimates based on futures returns only and those based on futures and option prices are very close to each other when all options are included. Estimates are still close with the at-the-money options, but differ more for the highest-volume options, especially for the κ parameter. The estimates of the volatility risk premium are of the right sign and of reasonable magnitude. Moreover, the estimated risk premium is higher when all options are considered, which is intuitive since illiquid options are then included along with the more liquid ones. The same holds for the at-the-money estimate, which is higher than the premium estimated with the highest-volume option. The model fares well in terms of the overidentifying statistic J: the p-value is always greater than 0.05 except when all options are included. This is a bit surprising, since the one-factor stochastic volatility model has not been successful in describing exchange rate data in previous studies. Using high-frequency exchange rate data, Bollerslev and Zhou (2002) find that the one-factor stochastic volatility model does not fully capture the dynamics of the daily DM/$ volatility. The condition (2κθ ≥ γ²) making the zero boundary inaccessible is violated by their β̂ parameter estimates, but it is not by ours. Moreover, their estimates of κ, θ and γ are considerably higher than ours. This could be justified on several grounds.
First, they use data on the spot exchange rate, whereas our application uses observations on a futures contract; second, the sample interval is different (we consider a longer interval, beginning slightly less than two years earlier and ending slightly more than two years later); and third, and most importantly, their quadratic variation measures are computed from 288 five-minute returns over the 24-hour cycle, while ours are based on just 80 five-minute intervals, and hence tend to be significantly smaller than theirs. Overall, they find that a two-factor volatility structure is better supported by the data.
6.2. Estimation of the leverage model

While a one-factor stochastic volatility model may have some empirical support for exchange rate data, it is overwhelmingly rejected in the empirical literature that estimated this model for equity index returns. There is a consensus that single volatility factor models do not fit the data (see Andersen et al. (2002), Chernov et al. (2003), Eraker et al. (2002), Pan (2002), among others). Several authors augmented affine SV diffusions with jumps.19 Other authors20 have shown, however, that SV models with jumps in returns are not able to capture all the empirical features of observed option prices and returns. The p-values of the J statistic confirm the strong rejection of the stochastic volatility model. However, to assess our methodology, it is important to compare our results (reported in the second panel of Table 3) with the estimates produced in several of the above-mentioned studies. The GMM estimates are based on the moment conditions described in Section 4. We follow exactly the same format to construct the integrated volatility and implied volatility measures as in the exchange rate futures case. The high-frequency data for the spot S&P 500 index (used to build 80 five-minute returns per day) were obtained from Tickdata.com for the period from January 4, 1996 to December 30, 2005, that is, 2517 daily observations. The number of call options per day varies from a minimum of 30 to a maximum of 130, for a total of 179,176 options. The estimates of ψ̂ with all options included are close to the posterior mean values obtained with a Bayesian method by Eraker et al. (2003), with time-series data on the index only, over the 1980–1999 period: κ in our study is estimated at 0.027, compared to 0.0231 in theirs; θ at 0.844 versus 0.9052; γ at 0.186 versus 0.1434. The largest difference is for the estimate of ρ (−0.215 instead of −0.3974).
Our point estimate is low in absolute value with respect to most studies, especially those using option price data. Pan (2002) and Bakshi et al. (1997) obtain estimates around −0.5. However, in the latter study, the authors also compute a sample time-series correlation between daily S&P 500 index returns and daily changes in the implied volatility of the stochastic volatility model over the period 1988–1991. They obtain an estimate of −0.28, which is closer to our estimate also based on moments of the implied volatility for the leverage model. When we select one option per day (whether near the money or highest volume), the estimates are robust for κ and θ , but much
19 See in particular Andersen et al. (2001), Bates (1996), Chernov et al. (2003), Eraker et al. (2003), Pan (2002), among others. 20 Bakshi et al. (1997), Bates (2000), Chernov et al. (2003) and Pan (2002).
more variable for γ and ρ. The estimated values for γ are much higher, while the absolute values for ρ are respectively on the high and the low side. To gauge the estimated value for the volatility risk premium, we can compare it to Pan (2002), who also used return and option data to estimate a jump-diffusion model (which includes both a volatility risk premium and a jump risk premium). The volatility risk premium there is estimated at 7.6, which translates into a daily parameter of 0.03, compared to 0.049 for our estimate.
7. Concluding remarks

In this paper, we proposed a joint estimation procedure for objective and risk-neutral parameters of stochastic volatility models. This approach uses both the high-frequency return information available on an underlying asset and the information on options written on this underlying. We applied this procedure to actual return and option data on exchange rate futures on the Deutsche mark–US Dollar and on the S&P 500 index. Analytical expressions for the moments of integrated volatility in affine stochastic volatility models enabled us to obtain explicit expansions of the implied volatility, a crucial feature of our procedure. The method is computationally simple, since no simulations or numerical function inversions are involved. Many extensions of this work can be envisioned. A better specification for stock returns should incorporate jumps. Hence, developing an estimation procedure for jump-diffusion models appears to be a natural extension. Introducing other measures such as bi-power variation (see Barndorff-Nielsen and Shephard, 2004) and accounting for microstructure noise are also avenues to be explored.

Appendix A. Computation method and expressions for moments of integrated volatility

The first moment, or the conditional expectation of the integrated volatility, is:

E[V_{t,T} | V_t] = E[∫_t^T V_u du | V_t] = ∫_t^T E[V_u | V_t] du
               = ∫_t^T [θ + e^{−κ(u−t)}(V_t − θ)] du
               ≡ a_{T−t} V_t + b_{T−t},

with a_{T−t} ≡ ∫_t^T α_{u−t} du and b_{T−t} ≡ ∫_t^T β_{u−t} du. In order to compute higher moments, let us consider the V_t-dependent random variable E[V_{t,T} | V_t]:

E[V_{t,T} | V_t] = ∫_t^T E_t[V_u] du.

If one defines G(u, t) = E_t[V_u], it is clear from the law of iterated expectations that G(u, t) is a martingale in t. Thus, Itô's lemma implies that

dE[V_{t,T} | V_t] = −V_t dt + a_{T−t} γ √V_t dW_t.

Taking integer powers and expectations on each side, we obtain

E_t[(V_{t,T} − E[V_{t,T}])^n] = E_t[(∫_t^T a_{T−s} γ √V_s dW_s)^n],

where by E_t[·] we mean E[·|V_t]. This formula gives us a way to construct all central moments. The computation of this integral is, however, far from trivial. The interested reader will find in Lewis (2001) details of the computation for the third and fourth central moments. We will content ourselves with giving the explicit form of the second, third and fourth central moments for Feller-like stochastic volatility processes (both models considered here are in that class). The variance has the form21

E[(V_T − E(V_T))²] = A_T V + B_T,

where

A_T = 2γ²(sinh(Tκ) − Tκ) / (e^{Tκ} κ³),
B_T = γ²[θ + 4e^{Tκ}θ(1 + Tκ) + e^{2Tκ}(−5θ + 2Tθκ)] / (2e^{2Tκ} κ³).

The third central moment has the form E[(V_T − E(V_T))³] = M_T V + N_T, with:

M_T = 3γ⁴[−1 + 2e^{3Tκ} − 2e^{Tκ}(1 + 2Tκ) + e^{2Tκ}(1 − 2Tκ(1 + Tκ))] / (2e^{3Tκ} κ⁵),
N_T = γ⁴[θ + 6e^{Tκ}(θ + Tθκ) + 2e^{3Tκ}(−11θ + 3Tθκ) + 3e^{2Tκ}θ(5 + 2Tκ(3 + Tκ))] / (2e^{3Tκ} κ⁵).

Finally, the fourth moment can be shown to be E[(V_T − E(V_T))⁴] = Q_T V² + R_T V + S_T, with:

Q_T = 12γ⁴(sinh(Tκ) − Tκ)² / (e^{2Tκ} κ⁶),
R_T = (γ⁴ / (e^{4Tκ} κ⁷)) [3θκ(−1 + e^{2Tκ} − 2e^{Tκ}Tκ)(1 + 4e^{Tκ}(1 + Tκ) + e^{2Tκ}(−5 + 2Tκ)) + γ²(−3 + 15e^{4Tκ} − 12e^{2Tκ}(1 + Tκ)(1 + 2Tκ) − 6e^{Tκ}(2 + 3Tκ) − 2e^{3Tκ}(−6 + Tκ(3 + 2Tκ(3 + Tκ))))],
S_T = (γ⁴θ / (4e^{4Tκ} κ⁷)) [3θκ(1 + 4e^{Tκ}(1 + Tκ) + e^{2Tκ}(−5 + 2Tκ))² + γ²(3 + 24e^{Tκ}(1 + Tκ) + 3e^{4Tκ}(−93 + 20Tκ) + 12e^{2Tκ}(7 + 2Tκ(5 + 2Tκ)) + 8e^{3Tκ}(21 + Tκ(27 + 2Tκ(6 + Tκ))))].

The explicit expressions for these moments are the ones used in the GMM estimation and option pricing expansions.

21 For simplicity of notation, we write the moments at time t = 0, and let V = V_0.

Appendix B. Expressions for the cross-moments in the leverage model

For the cross-moments, we use the expressions in Box I. These moments – along with the first three marginal moments of V_t – are used twice: once with the spot volatilities filtered from the option prices, and once with the observed integrated volatilities. In the latter case, we need the expressions of V_t and V_t² in terms of E(V_{t,t+1} | F_t) and E(V²_{t,t+1} | F_t). The solutions are given by:

V_t = [E(V_{t,t+1} | F_t) − b] / a,
V_t² = [Ab + ab² − aB − (A + 2ab)E(V_{t,t+1} | F_t) + aE(V²_{t,t+1} | F_t)] / a³,

where a, b, A and B are the coefficients appearing in the conditional expectation and variance of the integrated volatility. Substituting
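The variance coefficients can be checked numerically: A_T and B_T should match direct quadrature of Var(V_{0,T} | V_0) = γ² ∫_0^T a_{T−s}² [θ + e^{−κs}(V_0 − θ)] ds, with a_τ = (1 − e^{−κτ})/κ. The following sketch is our own consistency check, not part of the paper:

```python
import math

def A_B(kappa, gamma, theta, T):
    """Closed-form coefficients of Var(integrated vol) = A_T * V0 + B_T."""
    eT = math.exp(kappa * T)
    A = 2 * gamma**2 * (math.sinh(T * kappa) - T * kappa) / (eT * kappa**3)
    B = gamma**2 * (theta + 4 * eT * theta * (1 + T * kappa)
                    + eT**2 * (-5 * theta + 2 * T * theta * kappa)) / (2 * eT**2 * kappa**3)
    return A, B

def var_quadrature(kappa, gamma, theta, T, V0, n=200_000):
    """Brute-force midpoint quadrature of gamma^2 * int a(T-s)^2 E[V_s] ds."""
    a = lambda tau: (1 - math.exp(-kappa * tau)) / kappa
    h = T / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        total += a(T - s) ** 2 * (theta + math.exp(-kappa * s) * (V0 - theta))
    return gamma**2 * total * h

# Illustrative parameter values of our own choosing.
kappa, gamma, theta, T, V0 = 0.1, 0.1, 0.25, 1.0, 0.3
A, B = A_B(kappa, gamma, theta, T)
closed = A * V0 + B
numeric = var_quadrature(kappa, gamma, theta, T, V0)
```

The two numbers agree to quadrature accuracy, confirming that the closed-form A_T and B_T reproduce the integral representation of the variance of integrated volatility.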
E[(p_{t+1} − p_t) V_{t+1} | F_t] = γ(V_t κ + θ(−1 + e^κ − κ))ρ / (e^κ κ)

E[(p_{t+1} − p_t)² V_{t+1} | F_t] = (1 / (2e^{2κ} κ²)) {2(−1 + e^κ)V_t² κ + 2V_t[θκ(2 + e^{2κ} + e^κ(−3 + κ)) + γ²(1 + e^κ(−1 + κ + κ²ρ²))] + θ[2(−1 + e^κ)θκ(1 + e^κ(−1 + κ)) + γ²(−1 + e^{2κ}(1 + 4ρ²) − 2e^κ(κ + 2ρ² + 2κρ² + κ²ρ²))]}

E[(p_{t+1} − p_t) V²_{t+1} | F_t] = γ(2V_t²κ² + (−1 + e^κ)θ(−1 + e^κ − κ)(γ² + 2θκ) + V_t(γ² + 2θκ)(−1 − 2κ + e^κ(1 + κ)))ρ / (e^{2κ} κ²)

Box I.
the expressions of V_t and V_{t+1} in the first cross-moment above, we get:

E[(p_{t+1} − p_t) (E(V_{t+1,t+2} | F_{t+1}) − b)/a | F_t]
  = (1 / (e^κ κ)) γ [κ (E(V_{t,t+1} | F_t) − b)/a + θ(−1 + e^κ − κ)] ρ.

By the law of iterated expectations,

E[(p_{t+1} − p_t) (V_{t+1,t+2} − b)/a | F_t]
  = (1 / (e^κ κ)) γ [κ (E(V_{t,t+1} | F_t) − b)/a + θ(−1 + e^κ − κ)] ρ.

Finally, the moment conditions have to be computed by conditioning on the ‘‘discrete-time’’ filtration G_t = {(p_s, V_{s−1,s}), s = t, t − 1, ...} ⊂ F_t. Iterating the expectations, we get:

E[(p_{t+1} − p_t) (V_{t+1,t+2} − b)/a | G_t]
  = (1 / (e^κ κ)) γ [κ (E(V_{t,t+1} | G_t) − b)/a + θ(−1 + e^κ − κ)] ρ.
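The inversion of V_t and V_t² from the conditional moments of integrated volatility can be verified with a small self-contained check. This is our own illustration; the coefficient values are arbitrary, and the formula is derived from E(V_{t,t+1}|F_t) = aV + b and Var(V_{t,t+1}|F_t) = AV + B:

```python
def invert_moments(m1, m2, a, b, A, B):
    """Recover V and V^2 from m1 = E(V_{t,t+1}|F_t) and m2 = E(V^2_{t,t+1}|F_t),
    given E(V_{t,t+1}|F_t) = a*V + b and Var(V_{t,t+1}|F_t) = A*V + B."""
    V = (m1 - b) / a
    V2 = (A * b + a * b**2 - a * B - (A + 2 * a * b) * m1 + a * m2) / a**3
    return V, V2

# Forward construction: pick V, build the two conditional moments, then invert.
a, b, A, B = 0.9, 0.02, 0.05, 0.001   # hypothetical coefficients
V = 0.3
m1 = a * V + b
m2 = (A * V + B) + m1**2              # E[X^2] = Var + mean^2
V_rec, V2_rec = invert_moments(m1, m2, a, b, A, B)
```

The round trip recovers V and V² exactly, which is the property needed for the moment conditions based on the observed integrated volatilities.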
An analogous procedure leads to implementable moment conditions derived from the expressions of E[(p_{t+1} − p_t)² V_{t+1} | F_t] and E[(p_{t+1} − p_t) V²_{t+1} | F_t] above.

References

Aït-Sahalia, Y., Kimmel, R., 2007. Maximum likelihood estimation of stochastic volatility models. Journal of Financial Economics 83, 413–452.
Andersen, T.G., Benzoni, L., Lund, J., 2001. Towards an empirical foundation of continuous-time equity return models. Journal of Finance 57, 1239–1284.
Andersen, T.G., Benzoni, L., Lund, J., 2002. An empirical investigation of continuous-time return models. Journal of Finance 57 (3), 1239–1284.
Andersen, T., Bollerslev, T., 1998. Answering the skeptics: yes, standard volatility models do provide accurate forecasts. International Economic Review 39, 885–905.
Andersen, T., Bollerslev, T., Diebold, F.X., Ebens, H., 2001a. The distribution of stock return volatility. Journal of Financial Economics 61, 43–76.
Andersen, T., Bollerslev, T., Diebold, F.X., Labys, P., 2001b. The distribution of exchange rate volatility. Journal of the American Statistical Association 96, 42–55.
Andersen, T., Bollerslev, T., Diebold, F.X., Labys, P., 2003. Modeling and forecasting realized volatility. Econometrica 71, 579–625.
Bakshi, G., Cao, C., Chen, Z., 1997. Empirical performance of alternative option pricing models. Journal of Finance 52, 2003–2049.
Barndorff-Nielsen, O.E., Shephard, N., 2001. Non-Gaussian Ornstein–Uhlenbeck-based models and some of their uses in financial economics. Journal of the Royal Statistical Society, Series B 63, 167–241.
Barndorff-Nielsen, O.E., Shephard, N., 2004. Power and bipower variation with stochastic volatility and jumps. Journal of Financial Econometrics 2, 1–48.
Bates, D., 1996. Jumps and stochastic volatility: exchange rate processes implicit in deutsche mark options. Review of Financial Studies 9, 69–107.
Bates, D., 2000. Post-'87 crash fears in the S&P 500 futures option market. Journal of Econometrics 94, 181–238.
Black, F., 1976. The pricing of commodity contracts. Journal of Financial Economics 3, 167–179.
Black, F., Scholes, M., 1973. The pricing of options and corporate liabilities. Journal of Political Economy 81, 637–659.
Bollerslev, T., Zhou, H., 2002. Estimating stochastic volatility diffusion using conditional moments of the integrated volatility. Journal of Econometrics 109, 33–65.
Bollerslev, T., Gibson, M., Zhou, H., 2011. Dynamic estimation of volatility risk premia and investor risk aversion from option-implied and realized volatilities. Journal of Econometrics 160 (1), 235–245.
Britten-Jones, M., Neuberger, A., 2000. Option prices, implied price processes, and stochastic volatility. Journal of Finance 55, 839–866.
Chernov, M., Ghysels, E., 2000. A study towards a unified approach to the joint estimation of objective and risk neutral measures for the purpose of options valuation. Journal of Financial Economics 56, 407–458.
Chernov, M., Gallant, R., Ghysels, E., Tauchen, G., 2003. Alternative models of stock price dynamics. Journal of Econometrics 116, 225–257.
Corradi, V., Distaso, W., 2006. Semiparametric comparison of stochastic volatility models using realized measures. Review of Economic Studies 73, 635–667.
Duffie, D., Pan, J., Singleton, K., 2000. Transform analysis and asset pricing for affine jump-diffusions. Econometrica 68, 1343–1376.
Eraker, B., Johannes, M.S., Polson, N.G., 2003. The impact of jumps in volatility and returns. Journal of Finance 58, 1269–1300.
Heston, S.L., 1993. A closed-form solution for options with stochastic volatility with applications to bond and currency options. The Review of Financial Studies 6, 327–343.
Hull, J., White, A., 1987. The pricing of options on assets with stochastic volatility. The Journal of Finance 42, 281–300.
Jiang, G., Tian, Y., 2005. Model-free implied volatility and its information content. Review of Financial Studies 18, 1305–1342.
Lynch, D., Panigirtzoglou, N., 2003. Option implied and realized measures of variance. Working Paper, Monetary Instruments and Markets Division, Bank of England.
Lewis, A.L., 2000. Option Valuation Under Stochastic Volatility. Finance Press.
Lewis, M.-A., 2001. Analytical expressions for the moments of the integrated volatility in affine stochastic volatility models. Working Paper, CIRANO.
Meddahi, N., 2001. An eigenfunction approach for volatility modeling. Working Paper 29-2001, CRDE, Université de Montréal.
Meddahi, N., 2002. Moments of continuous time stochastic volatility models. Working Paper, Université de Montréal.
Meddahi, N., Renault, E., 2004. Temporal aggregation of volatility models. Journal of Econometrics 119, 355–379.
Medvedev, A., Scaillet, O., 2007. Approximation and calibration of short-term implied volatilities under jump-diffusion stochastic volatility. Review of Financial Studies 20 (2), 427–459.
Pan, J., 2002. The jump-risk premia implicit in options: evidence from an integrated time-series study. Journal of Financial Economics 63, 3–50.
Pastorello, S., Renault, E., Touzi, N., 2000. Statistical inference for random variance option pricing. Journal of Business and Economic Statistics 18, 358–367.
Pastorello, S., Patilea, V., Renault, E., 2003. Iterative and recursive estimation in structural nonadaptive models. Journal of Business and Economic Statistics 21 (4), 449–482.
Poteshman, A., 1998. Estimating a general stochastic variance model from option prices. Manuscript, University of Chicago.
Romano, M., Touzi, N., 1997. Contingent claim and market completeness in a stochastic volatility model. Mathematical Finance 7 (4), 399–410.
Zhou, H., 2003. Itô conditional moment generator and the estimation of short rate processes. Journal of Financial Econometrics 1, 250–271.
Journal of Econometrics 160 (2011) 33–47
Estimating covariation: Epps effect, microstructure noise✩

Lan Zhang
Department of Finance, University of Illinois at Chicago, Chicago, IL 60607, United States

Article history: Available online 6 March 2010

JEL classification: C01; C13; C14; C46; G11

Keywords: Bias–variance tradeoff; Epps effect; High frequency data; Measurement error; Market microstructure; Martingale; Nonsynchronous trading; Realized covariance; Realized variance; Two scales estimation
Abstract: This paper is about how to estimate the integrated covariance ⟨X, Y⟩_T of two assets over a fixed time horizon [0, T], when the observations of X and Y are "contaminated" and when such noisy observations are at discrete, but not synchronized, times. We show that the usual previous-tick covariance estimator is biased, and the size of the bias is more pronounced for less liquid assets. This is an analytic characterization of the Epps effect. We also provide the optimal sampling frequency which balances the tradeoff between the bias and various sources of stochastic error terms, including nonsynchronous trading, microstructure noise, and time discretization. Finally, a two scales covariance estimator is provided which simultaneously cancels (to first order) the Epps effect and the effect of microstructure noise. The gain is demonstrated in data. © 2010 Elsevier B.V. All rights reserved.
1. Introduction

This paper is about how to estimate the integrated covariance ⟨X, Y⟩_T over a fixed time horizon [0, T], when the observations of X and Y are "contaminated" and when such noisy observations of X and of Y are at discrete, but not synchronized, times.

Consider the price processes of two assets, {X_t} and {Y_t}, both in logarithmic scale. Suppose both {X_t} and {Y_t} follow an Itô process, namely,

dX_t = μ_t^X dt + σ_t^X dB_t^X,  (1)
dY_t = μ_t^Y dt + σ_t^Y dB_t^Y,  (2)

where B^X and B^Y are standard Brownian motions, with correlation corr(B_t^X, B_t^Y) = ρ_t. The drift coefficient μ_t and the instantaneous variance σ_t² of the returns process X_t will be stochastic processes, which are assumed to be locally bounded. Our interest is to estimate the integrated covariation ⟨X, Y⟩_T,

⟨X, Y⟩_T = ∫_0^T σ_t^X σ_t^Y d⟨B^X, B^Y⟩_t,  (3)

using the ultra-high frequency observations of X and Y within the fixed time horizon [0, T].

✩ The author would like to thank the editors and referees for their helpful and constructive comments. E-mail address: [email protected]. © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2010.03.012

Inference for (3) is a well-understood problem if X and Y are observed simultaneously and without contamination (say, in the form of microstructure noise). A limit theorem in stochastic processes states that ∑_{i: τ_i ∈ [0,T]} (X_{τ_i} − X_{τ_{i−1}})(Y_{τ_i} − Y_{τ_{i−1}}), commonly called realized covariance, is a consistent estimator for ⟨X, Y⟩_T as the observation intervals get closer; furthermore, its estimation error follows a mixed normal distribution; see, for example, Jacod and Protter (1998), Barndorff-Nielsen and Shephard (2002a), Zhang (2001), and Mykland and Zhang (2006). For a glimpse of the econometric literature on this inference problem when X = Y, one can read Andersen and Bollerslev (1998), Andersen et al. (2001), Barndorff-Nielsen and Shephard (2002b), and Gençay et al. (2002), among others.

In ultra-high frequency data, the exact observation times of X and Y are rarely simultaneous, and estimating ⟨X, Y⟩_T in this asynchronous case becomes a relevant and pressing problem. This lack of synchronicity often causes some undesirable features in the inference. In particular, as documented by Epps (1979), correlation estimates tend to decrease when sampling is done at high frequencies. Even in daily data, asynchronicity can cause difficulties (Scholes and Williams, 1977). Lo and MacKinlay (1990) propose a solution based on a stochastic model of censoring. In practice, most nonparametric estimation procedures
for ⟨X, Y⟩_T start with creating an approximately synchronized pair (X, Y) by either previous-tick interpolation or linear interpolation, and then construct the estimator on the basis of the synchronized approximations. These interpolation-based estimators are often biased, as witnessed in empirical studies (Dacorogna et al., 2001).

A different issue when one deals with high frequency data is the existence of microstructure noise. Early papers (Aït-Sahalia et al., 2005; Zhang et al., 2005) found that when microstructure noise is present in the observed prices, the realized variance estimator for ⟨X, X⟩_T – a special case of realized covariance – is biased, and this bias can get progressively worse as more high frequency data are employed.¹ However, it is not well understood how an estimator for the covariation ⟨X, Y⟩_T behaves when the estimation uses ultra-high frequency noisy data.

In this paper, we are concerned with the behavior of the previous-tick approach to estimation of ⟨X, Y⟩_T when the observation times of X and Y are not synchronized and when microstructure noise is present in the observed price processes. We show that asynchronicity leads to a bias in the previous-tick estimator for ⟨X, Y⟩_T, thus giving an analytic form of the Epps effect. The variance of the estimator, meanwhile, comes from three sources: discrete observation/transaction times, nonsynchronization, and the microstructure noise. We provide the optimal sampling frequency to balance the tradeoff among the different error sources, and present explicit expressions for the asymptotic bias and variance when the observation times of X and Y follow Poisson processes.

A further advantage of the previous-tick estimator is that it permits easy analysis of microstructure noise. It is here shown that in the presence of noise, one can create two- and multiscale versions of the previous-tick estimator.
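The noise-induced bias in realized variance mentioned above is easy to reproduce by simulation. In the sketch below (all parameter values are illustrative assumptions, not taken from the paper), a discretized Brownian log-price is contaminated with i.i.d. noise; sampling every tick inflates realized variance by roughly 2n E[ϵ²], while sparse sampling largely avoids the bias:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) parameters: n observations on [0, T]; sigma and the
# noise scale are made up for the sketch, not taken from the paper.
T, n = 1.0, 23400
sigma, noise_sd = 0.2, 0.0005

dt = T / n
X = np.cumsum(sigma * np.sqrt(dt) * rng.standard_normal(n))   # latent log-price
X_obs = X + noise_sd * rng.standard_normal(n)                 # observed = latent + noise

def realized_var(x, k):
    """Realized variance using every k-th observation."""
    r = np.diff(x[::k])
    return float(np.sum(r**2))

rv_dense = realized_var(X_obs, 1)     # every tick; noise bias ~ 2*n*noise_sd**2
rv_sparse = realized_var(X_obs, 78)   # every 78th tick; much smaller noise bias
print(rv_dense, rv_sparse)
```

With these assumed values, the dense estimate carries a noise bias of about 2·23400·(0.0005)² ≈ 0.0117 on top of the true integrated variance σ²T = 0.04, while the sparse estimate stays close to 0.04.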
As we shall see in Section 8, the bias due to asynchronicity cancels in the same way as the bias due to microstructure noise, while the variance asymptotically behaves as if there is no asynchronicity (in the subsample of previous ticks). Thus, while the previous-tick approach does throw away data, it can retain rate efficiency. In terms of microstructure noise, this paper provides a two- and multiscale alternative to the multivariate autocovariance-based estimator of Barndorff-Nielsen et al. (2008b). Other work investigating the combination of asynchronicity and microstructure noise includes Lunde and Voev (2007) and Griffin and Oomen (2007).

The paper is organized as follows: we introduce the concepts and notations in Section 2.1, and give a preview of the main findings in Section 2.2. Sections 3 and 4 provide the asymptotic stochastic bias and variance of the previous-tick estimator, assuming the absence of microstructure noise in the price processes. Section 5 deals with the case when the trading times are random. An application where the transaction times follow Poisson processes is provided in Section 6. Section 7 focuses on the inference when microstructure noise is present. Two scales estimation is presented in Section 8. Finally, Section 9 concludes.

2. Setting, and some main findings

2.1. Setup and notations

Our interest is to estimate the covariation ⟨X, Y⟩_T between two returns in a fixed time period [0, T], when X and Y are observed asynchronously. Let the observation/transaction times of X be recorded in T_n, and those of Y in S_m. At the moment we assume X and Y are free of
microstructure noise (in short, noise). Later, in Section 7, we study the cases when these two price processes are observed with noise ϵ^X and ϵ^Y, respectively.

We denote the elements in T_n by τ_{n,i}, and the elements in S_m by θ_{m,i}. Specifically, 0 = τ_{n,0} ≤ τ_{n,1} ≤ ··· ≤ τ_{n,n} = T, and 0 = θ_{m,0} ≤ θ_{m,1} ≤ ··· ≤ θ_{m,m} = T. For ease of notation, we often suppress the subscripts n and m from the τ's and θ's unless the context is misleading. The τ and θ sequences may be irregular and random but independent of the price process, so long as the spacings are not allowed to be too large. An extension to more general random times is considered in Section 5.

We focus on a particular type of covariance estimator called the previous-tick estimator. Intuitively, it is a sample covariance estimator based on the prices that immediately precede (or are at) the pre-specified sampling points. One can view this previous-tick approach as a special way to subsample the raw data.

To formulate the previous-tick covariance estimator, we introduce the concepts related to sampling points. Let N = n + m, and write n and m as n_N and m_N from here on. We consider a subset of [0, T] which satisfies the following:

V_N ⊂ [0, T]; 0, T ∈ V_N; also, V_N is finite for each N.  (4)

We use v_i to denote the elements in V_N, V_N = {v_0, v_1, ..., v_{M_N}}, with v_0 = 0 and v_{M_N} = T, where M_N is the sampling frequency. A simple case of V would be a regular grid, where the elements are equally spaced out in time, that is, v_i − v_{i−1} = Δv for all i. This sampling scheme is the most common one in analyzing time-dependent data; for example, typical sampling intervals in high-frequency financial applications include every 5 min, 15 min, 30 min, and hourly. An alternative way of setting the grid V_N is to let the v_i's depend on the observation times, for example by setting v_i to be the maximum of min{τ ∈ T_n : τ > v_{i−1}} and min{θ ∈ S_m : θ > v_{i−1}}. This is the concept of refresh time, as introduced by Barndorff-Nielsen et al. (2008b). One can also implement this for more than two stocks.

We assume the following regarding the relation between the v_i's, τ_i's, and θ_i's:

Condition C1. There is at least one pair of (τ, θ) in between the consecutive v_i's.

Under Condition C1, the previous ticks are then defined as

t_i = max{τ ∈ T_n : τ ≤ v_i}  and  s_i = max{θ ∈ S_m : θ ≤ v_i},  (5)

so that the t_i's and the s_i's are the sampling points in X and Y, respectively, according to the previous-tick sampling scheme. We note that Condition C1 holds so long as there are sufficiently many data in both X and Y within the time window [0, T]. A sufficient criterion for C1 is provided by Conditions C2 and C3 below. C1 is also valid (without C3) when the v's are the refresh times mentioned above.

We need more assumptions to pursue the analysis for the covariance estimator. We assume that the transaction times of X and Y satisfy:

Condition C2. sup_i |θ_{m,i} − θ_{m,i−1}| = O(1/N), and sup_i |τ_{n,i} − τ_{n,i−1}| = O(1/N).

Note that Condition C2 implies that, on the one hand, lim inf_{N→∞} m_N/N > 0 and lim inf_{N→∞} n_N/N > 0; on the other hand, it is obvious that m_N/N ≤ 1 and n_N/N ≤ 1. In particular, n_N = O(m_N) and m_N = O(n_N). We sometimes assume that the sampling frequency M_N satisfies:

¹ Recent developments on volatility estimation include multiscale estimation (Zhang, 2006; Aït-Sahalia et al., 2011), kernel methods (Barndorff-Nielsen et al., 2008a, 2011), and pre-averaging (Podolskij and Vetter, 2009; Jacod et al., 2009).
Condition C3. sup_i |v_{N,i} − v_{N,i−1}| = O(1/M_N), and M_N = o(N).

Conditions C2 and C3 imply Condition C1 when N is large enough. There are two reasons for imposing C3. One is technical: it arises naturally in connection with both two scales estimation (Section 8) and bias–variance tradeoffs (Sections 4.3, 6.2 and 7.1). The other is more conceptual: the observation times are often not known exactly, or are incorrectly recorded. If one assumes that the times are known up to, say, order O(N^{−α}), having the distance between consecutive grid points in V, v_i − v_{i−1}, bigger than O(N^{−α}) ensures that the previous-tick estimator is consistent.
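In code, the previous-tick construction in (5) reduces to a couple of sorted-array lookups. A minimal sketch, with made-up arrival times standing in for T_n and S_m (`numpy.searchsorted` performs the "last tick at or before v_i" lookup):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 1.0

# Hypothetical arrival times for X and Y on [0, T] (0 and T included); the
# counts are made up for the sketch.
tau = np.sort(np.concatenate(([0.0, T], rng.uniform(0, T, 1000))))    # times of X
theta = np.sort(np.concatenate(([0.0, T], rng.uniform(0, T, 600))))   # times of Y

v = np.linspace(0.0, T, 51)      # regular grid V_N with M_N = 50

def previous_tick(times, grid):
    """t_i = max{tau in times : tau <= v_i}, as in (5)."""
    return times[np.searchsorted(times, grid, side="right") - 1]

t = previous_tick(tau, v)        # previous ticks of X
s = previous_tick(theta, v)      # previous ticks of Y

# Under Condition C1 there is at least one observation of each asset between
# consecutive grid points, so the previous ticks are strictly increasing.
print(np.all(np.diff(t) > 0), np.all(np.diff(s) > 0))
```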
Definition 1. At last, the previous-tick estimator for the covariation is defined as

[X, Y]_T = ∑_{i=1}^{M_N} (X_{t_i} − X_{t_{i−1}})(Y_{s_i} − Y_{s_{i−1}}),  (6)

where the t_i's and s_i's are the previous ticks in (5).

2.2. Some main findings

We here summarize the most important results from the practitioner's point of view. First of all, the bias in the estimator (6) is given by

−∫_0^T ⟨X, Y⟩′_u dF_N(u) + O_p(1/N),  (7)

where F_N(t) = ∑_{i: max(t_i, s_i) ≤ t} |t_i − s_i|. Typically, F_N(t) and the bias are of order O_p(M_N/N). See Theorem 1 for precise statements.

Second, when Condition C3 is in place, then

[X, Y]_T = [X̃, Ỹ]_T − ∫_0^T ⟨X, Y⟩′_u dF_N(u) + O_p(N^{−1/2}),  (8)

where [X̃, Ỹ]_T is the unobserved value of the synchronized estimator

[X̃, Ỹ]_T = ∑_{i=1}^{M_N} (X_{v_i} − X_{v_{i−1}})(Y_{v_i} − Y_{v_{i−1}}).  (9)

See Theorem 4 (in Section 4.1) for details. Under Condition C3, therefore, one can behave as if observations were synchronously obtained at times v_i, provided that one can deal with the bias.

This has important consequences. On the one hand, it provides an analytic characterization of the Epps (1979) effect. As described further in Section 3.2, the Epps effect is essentially the bias (7), and it is typically negative for positively associated processes (X, Y). Also, from (8), the Epps effect is only a matter of bias; except at the highest sampling frequencies, it does not substantially affect the variance of the estimator. On the other hand, (8) suggests that when suitably adapted, existing theory for the synchronized case can be applied to the asynchronous case. We shall show two types of applications. In Sections 4.3, 6.2 and 7.1, we carry out a bias–variance tradeoff to remove the effect of asynchronicity. In Section 8 we show that both asynchronicity and microstructure noise can be removed with the help of two scales estimation, along the lines of Zhang et al. (2005).

3. Previous-tick covariance estimator under zero noise

We start with an idealized world, where the mechanics of the trading process are perfect, so that there is no microstructure noise in either X or Y. We shall see that [X, Y]_T can be decomposed based on the impact of the different data structures.

3.1. Decomposition for the estimator [X, Y]_T

Let X and Y be Itô processes satisfying (1)–(2). Let [X, Y]_T be the previous-tick covariance estimator in (6). From the Kunita–Watanabe inequality, ⟨X, Y⟩_t is absolutely continuous in t. Assuming Condition C1, we can therefore decompose [X, Y]_T into

[X, Y]_T = ∑_i ∫_{max(t_{i−1}, s_{i−1})}^{min(t_i, s_i)} ⟨X, Y⟩′_u du + drift term
  + ∑_i ∫_{max(t_{i−1}, s_{i−1})}^{min(t_i, s_i)} (X_u − X_{max(t_{i−1}, s_{i−1})}) dY_u [2]  (= L_N, the discretization error)
  + R_N  (the asynchronicity error),  (10)

where the t_i's and s_i's are the previous ticks defined in (5); see Lemma 1 in the Appendix for the exact form of R_N. We have used the following symbol (cf. McCullagh, 1987):

Notation 1. The symbol "[2]" is used as follows: if (a, b) is an expression in a and b, then (a, b)[2] means (a, b) + (b, a), so that (X_u − X_{max(t_{i−1}, s_{i−1})}) dY_u [2] means (X_u − X_{max(t_{i−1}, s_{i−1})}) dY_u + (Y_u − Y_{max(t_{i−1}, s_{i−1})}) dX_u.

Each component in (10) plays a different role in the distribution of [X, Y]_T. To proceed with the discussion, we first need to define the stochastic bias and stochastic variance of an estimator.

Definition 2. Consider a semimartingale Z. Let Ẑ be an estimator for Z. Suppose that Ẑ_t − Z_t has the following Doob–Meyer decomposition, for t ∈ [0, T]: Ẑ_t − Z_t = A_t + M_t, where {M_t} is a martingale and {A_t} is a predictable process. Then, for fixed t ∈ [0, T], we call A_t the stochastic bias of Ẑ_t, denoted SBias(Ẑ_t); we call the quadratic variation ⟨M, M⟩_t of M_t the stochastic variance of Ẑ_t. Note that if A_t is nonrandom, it is also the exact bias; if ⟨M, M⟩_t is nonrandom, it gives the exact variance.

In light of Definition 2 and the decomposition equation (10), the stochastic bias of [X, Y]_T is

∑_i ∫_{max(t_{i−1}, s_{i−1})}^{min(t_i, s_i)} ⟨X, Y⟩′_u du − ⟨X, Y⟩_T;

meanwhile, both the discretization error L_N and the asynchronicity error R_N contribute to the stochastic variance of [X, Y]_T. It is apparent that, in the situation when X and Y are traded simultaneously, the asynchronicity error R_N becomes zero. When the trading is not synchronous, however, it is not obvious what the relative impact of, and the tradeoff between, L_N and R_N are. We pursue these next. First, we need the following concept.

Definition 3. A sequence of càdlàg processes G_N(t), 0 ≤ t ≤ T, is said to be relatively compact in probability (RCP) if every subsequence has a further subsequence G_{N_k} so that there is a process G(t) such that G_{N_k}(t) converges in probability to G(t) at every continuity point t ∈ [0, T] of G(t).
For applied purposes, if the sequence GN is RCP, one can act as if the limit exists, cf. the discussion on p. 1411 of Zhang et al. (2005).
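As a concrete sketch of Definition 1, the estimator (6) can be computed with the same previous-tick lookup, here applied to simulated correlated Brownian prices observed asynchronously. All constants below are illustrative assumptions; for these values the target is ⟨X, Y⟩_T = ρ·σ_X·σ_Y·T = 0.036:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed constants for the sketch (not from the paper)
T, n_lat = 1.0, 20000
rho, sx, sy = 0.6, 0.2, 0.3
dt = T / n_lat

# Correlated Brownian increments on a fine latent grid
dBx = rng.standard_normal(n_lat)
dBy = rho * dBx + np.sqrt(1 - rho**2) * rng.standard_normal(n_lat)
X = np.concatenate(([0.0], np.cumsum(sx * np.sqrt(dt) * dBx)))
Y = np.concatenate(([0.0], np.cumsum(sy * np.sqrt(dt) * dBy)))

def sample_times(p):
    """Keep each latent time with probability p (0 and T always kept)."""
    keep = rng.random(n_lat + 1) < p
    keep[0] = keep[-1] = True
    return np.flatnonzero(keep)

ix, iy = sample_times(0.3), sample_times(0.2)   # asynchronous observation indices

def previous_tick_cov(v_idx, ix, iy, X, Y):
    """Previous-tick covariance estimator (6) on the grid v_idx."""
    t = ix[np.searchsorted(ix, v_idx, side="right") - 1]
    s = iy[np.searchsorted(iy, v_idx, side="right") - 1]
    return float(np.sum(np.diff(X[t]) * np.diff(Y[s])))

v_idx = np.arange(0, n_lat + 1, 100)   # regular grid, M_N = 200
est = previous_tick_cov(v_idx, ix, iy, X, Y)
print(est)   # should land near 0.036, with a small negative Epps-type bias
```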
3.2. Stochastic bias: The Epps effect

Theorem 1. Let X and Y be Itô processes satisfying (1)–(2), with μ_t and σ_t locally bounded. Let [X, Y]_T be the previous-tick covariance estimator. Let V_N = {0 = v_0, v_1, ..., v_{M_N}} be a collection of sampling points which span [0, T], and let t_i and s_i be the transaction times of X and Y, respectively, that immediately precede v_i. Then, under Conditions C1 and C2, the stochastic bias of [X, Y]_T is

−∫_0^T ⟨X, Y⟩′_u dF_N(u) + O_p(1/N),  (11)

where

F_N(t) = ∑_{i: max(t_i, s_i) ≤ t} |t_i − s_i|.

Furthermore, the sequences (N/M_N) F_N(t) are RCP in the sense of Definition 3.

The function F_N takes non-negative values, and it will play a central rôle in our narrative. To see an example of a limit of (N/M_N) F_N(t), we refer to the Poisson example in Corollary 4 (Section 6). From Theorem 1, one should note that F_N(t) = 0 – thus the previous-tick estimator is unbiased – when the two processes X and Y are traded simultaneously, or more generally if the selected subsample has synchronized observation times. If the two assets X and Y are not traded simultaneously, the stochastic bias typically has order M_N/N; the previous-tick estimator [X, Y] is then asymptotically unbiased under Condition C3. However, there is a finite sample effect in (11), and (11) is an analytic representation of the Epps effect in cases where the subsampling is moderate (see the discussion in Section 2.2). Also, Theorem 1 implies that the magnitude of the bias −∫_0^T ⟨X, Y⟩′_u dF_N(u) is greater for less liquid assets (large |t_i − s_i| on average).

Remark 1. When the previous-tick estimator is used for all of [X, Y]_T, [X, X]_T, and [Y, Y]_T, the correlation estimator is no larger than one in absolute value. If one uses a different type of estimator for [X, X]_T and [Y, Y]_T, the estimated correlation should just be truncated at 1 or −1 as appropriate. Similar comments apply when a covariation matrix ⟨X, X⟩_T is estimated for a vector process X. If a different estimator is used to compute the diagonal elements, one can take the estimated matrix and write it as ΓΛΓ*, where Γ is orthogonal and Λ is a diagonal matrix. If one sets Λ_+ as the matrix Λ with the negative elements replaced by zero, a nonnegative definite estimator of ⟨X, X⟩_T is given by ΓΛ_+Γ*. Asymptotically, all these procedures have the same properties when the true ⟨X, X⟩_T is positive definite, since then Λ is eventually positive with probability one as N → ∞.

3.3. Stochastic variance

Theorem 2. Under the same conditions and setup as in Theorem 1, the following processes

U^(dis)_{N,u} = ∑_{i: t_i, s_i ≤ u} (min(t_i, s_i) − max(t_{i−1}, s_{i−1}))² / (T/M_N),

U^(nonsyn)_{N,u} = ∑_{i: s_i, t_i ≤ u} [(s_i − s_{i−1})(t_i − t_{i−1}) − (max(t_{i−1}, s_{i−1}) − min(t_i, s_i))²] / (T/N)

are RCP in the sense of Definition 3, and the process

M_N^{3/2} Q_{N,u}[2] = 2 ∑_{i: t_i, s_i ≤ u} ⟨X, X⟩′_{t_i} ∫_{max(t_{i−1}, s_{i−1})}^{min(t_i, s_i)} (min(t_i, s_i) − u)(Y_u − Y_{max(t_{i−1}, s_{i−1})}) dY_u [2]
  + 2 ∑_{i: t_i, s_i ≤ u} ⟨X, Y⟩′_{t_i} ∫_{max(t_{i−1}, s_{i−1})}^{min(t_i, s_i)} (min(t_i, s_i) − u)(X_u − X_{max(t_{i−1}, s_{i−1})}) dY_u [2]

is tight. Also, the leading terms in the stochastic variance of [X, Y]_T − ⟨X, Y⟩_T are

(T/M_N) ∫_0^T [⟨X, X⟩′_u ⟨Y, Y⟩′_u + (⟨X, Y⟩′_u)²] dU^(dis)_{N,u} + Q_{N,T}[2] + (T/N) ∫_0^T ⟨X, X⟩′_u ⟨Y, Y⟩′_u dU^(nonsyn)_{N,u}.  (12)

Finally,

U^(dis)_{N,u} = u − 2F_N(u) + O(1/M_N) + O((M_N/N)²).  (13)

As we shall see from the proof of Theorem 2 (in the Appendix), the 1/M_N term and the 1/M_N^{3/2} term (i.e. Q_{N,T}) in (12) correspond to the first- and the second-order effects from the quadratic variation of the discretization error in (10), whereas the 1/N term comes from the quadratic variation of the asynchronicity error. We can call U^(dis) the quadratic covariation of time due to discretization, and call U^(nonsyn) the quadratic covariation of time due to nonsynchronization.

Remark 2. In the special case where X and Y are traded simultaneously, U^(nonsyn) becomes zero, and the total asymptotic variance in (12) reduces to

(T/M_N) ∫_0^T [⟨X, X⟩′_u ⟨Y, Y⟩′_u + (⟨X, Y⟩′_u)²] dH(u),  (14)

where H(u) = U(u) = lim ∑_{τ_{M_N,i} ≤ u} (τ_{M_N,i} − τ_{M_N,i−1})² / (T/M_N) is the quadratic variation of time. A further specialization is when X = Y, in which case (12) becomes

(T/M_N) ∫_0^T 2(⟨X, X⟩′_u)² dH(u);  (15)

both (14) and (15) are consistent with the results in Mykland and Zhang (2006).

Note that in (12), the relevant component in Q_T[2] is the endpoint of a martingale with quadratic variation as follows:

⟨Q_N[2], Q_N[2]⟩ = (2/3) ∑_{i=1}^{M_N} (min(t_i, s_i) − max(t_{i−1}, s_{i−1}))⁴ × {(⟨X, X⟩′_{t_{i−1}})²(⟨Y, Y⟩′_{t_{i−1}})² + 6(⟨X, X⟩′_{t_{i−1}})(⟨Y, Y⟩′_{t_{i−1}})(⟨X, Y⟩′_{t_{i−1}})² + (⟨X, Y⟩′_{t_{i−1}})⁴} × (1 + o_p(1)).  (16)

When taking expectations (which is relevant when the trading times are random), the Q_T[2] term yields zero; thus it disappears in the final expression for the variance.
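The attenuation that Theorem 1 quantifies is easy to see in simulation: the previous-tick correlation of two asynchronously observed, positively correlated Brownian motions shrinks toward zero as the sampling grid gets finer. A minimal sketch, with illustrative arrival rates and grid sizes (not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)

# Latent paths are correlated Brownian motions with rho = 0.5; the thinning
# probability 0.1 and the grid sizes below are illustrative assumptions.
T, n_lat, rho = 1.0, 50000, 0.5

def one_path():
    dt = T / n_lat
    dBx = rng.standard_normal(n_lat)
    dBy = rho * dBx + np.sqrt(1 - rho**2) * rng.standard_normal(n_lat)
    X = np.concatenate(([0.0], np.cumsum(np.sqrt(dt) * dBx)))
    Y = np.concatenate(([0.0], np.cumsum(np.sqrt(dt) * dBy)))
    ix = np.flatnonzero(rng.random(n_lat + 1) < 0.1)   # Poisson-like arrivals of X
    iy = np.flatnonzero(rng.random(n_lat + 1) < 0.1)   # Poisson-like arrivals of Y
    ix[0] = iy[0] = 0                                  # make sure time 0 is observed
    return X, Y, ix, iy

def pt_corr(X, Y, ix, iy, M):
    """Previous-tick correlation on a regular grid with M sampling points."""
    v = np.arange(0, n_lat + 1, n_lat // M)
    t = ix[np.searchsorted(ix, v, side="right") - 1]
    s = iy[np.searchsorted(iy, v, side="right") - 1]
    dx, dy = np.diff(X[t]), np.diff(Y[s])
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

fast, slow = [], []
for _ in range(20):
    X, Y, ix, iy = one_path()
    fast.append(pt_corr(X, Y, ix, iy, M=2000))   # fine grid: strong attenuation
    slow.append(pt_corr(X, Y, ix, iy, M=100))    # coarse grid: mild attenuation
print(np.mean(fast), np.mean(slow))              # Epps effect: the fast mean is smaller
```

On the coarse grid the estimates sit near ρ = 0.5, while on the fine grid the downward bias is substantial, consistent with a bias fraction of roughly (M_N/N)(ℓ^X/ℓ^Y + ℓ^Y/ℓ^X).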
4. The case when M_N = o(N)

We shall see in this section that under C3, the 1/M_N term (i.e. the discretization effect) in (12) is the sole leading term in the asymptotic variance of the previous-tick estimator. The source of the second-order term in the asymptotic variance depends on the exact order of M_N. An interesting case is when M_N = O(N^{2/3}). This choice is optimal in the sense of minimizing the mean squared error of [X, Y]_T, when the stochastic bias of [X, Y]_T is O_p(M_N/N) (Theorem 1) and the stochastic variance is O_p(1/M_N) (Theorem 2). We can see that in this scenario the 1/N term and the 1/M_N^{3/2} term in (12) share the second-order effects. We shall elaborate on the higher-order behavior in this section. We emphasize that regardless of the order of M_N, the interaction between the discretization and the asynchronicity effects is at most a third-order effect, of order 1/(M_N √N) (see the proof of Theorem 2).

4.1. First order behavior

This is an immediate conclusion from Theorem 2:
Corollary 1. Assume C2–C3. Then U^(dis)_u exists and equals the scaled quadratic variation of the grid points V. In the case of equispaced grid points, U^(dis)_u = u. The total variance term of the previous-tick estimator is, to first order,

(T/M_N) ∫_0^T [⟨X, X⟩′_u ⟨Y, Y⟩′_u + (⟨X, Y⟩′_u)²] dU^(dis)_u.  (17)

Corollary 1 says that when the data (X, Y) arrive faster than the sampling frequency, the asynchronization effect disappears and only the discretization effect U^(dis) remains in the variance term.

We can also assert something about the asymptotic distribution of the estimator. Let L_N be the discretization term in (10). Then, in view of Theorem 1 and Lemma 1 (in the Appendix),

[X, Y]_T − ⟨X, Y⟩_T = −∫_0^T ⟨X, Y⟩′_u dF_N(u) + L_N + O_p(N^{−1/2}).  (18)

The quantity in (17) is simply the asymptotic version of ⟨L_N, L_N⟩_T. By extending the arguments above to all time points t ∈ [0, T] and using the theory in Chapters VI and IX of Jacod and Shiryaev (2003), we thus obtain:

Theorem 3. Assume the conditions of Theorem 1. Under Conditions C2–C3, M_N^{1/2} L_N converges in law to

Z (T ∫_0^T [⟨X, X⟩′_u ⟨Y, Y⟩′_u + (⟨X, Y⟩′_u)²] dU^(dis)_u)^{1/2},  (19)

where Z is standard normal, and independent of X and Y.

Note that the convergence is stable, in the sense of Rényi (1963), Aldous and Eagleson (1978), Chapter 3 (p. 56) of Hall and Heyde (1980), Rootzén (1980), and Section 2 (pp. 169–170) of Jacod and Protter (1998). For the connection to this type of high frequency data problem, see Zhang et al. (2005) and Zhang (2006).

By using the same methods, we relate the estimator to the hypothetical unobserved "gold standard" (9):

Theorem 4. Assume the conditions of Theorem 1. Under Conditions C2–C3,

[X, Y]_T = [X̃, Ỹ]_T − ∫_0^T ⟨X, Y⟩′_u dF_N(u) + O_p(N^{−1/2}).  (20)

The result also holds if [X̃, Ỹ]_T is defined with w_i = max(t_i, s_i) replacing v_i.

One can, in fact, deduce Theorem 3 from this result using the standard theorems for synchronous observation in Barndorff-Nielsen and Shephard (2002a), Jacod and Protter (1998), Mykland and Zhang (2006) and Zhang (2001).

4.2. Higher order behavior

We can also say something about the higher order terms in the variance. First, the non-martingale part.

Corollary 2. Assume Conditions C2–C3. In addition to the conclusions of Corollary 1, we also have that

U^(nonsyn)_{N,u} = 2 (N/M_N) F_N(u) + o(1).  (21)

If we for the moment ignore the martingale Q_N[2], we can therefore assert that the effect of nonsynchronization is, to high order, fully characterized by the function F_N(u), since this is the quantity one encounters in the bias as well as in the U^(nonsyn)_{N,u} and U^(dis)_{N,u} terms. Theorem 2 put together with Corollary 2 then yields

stochastic variance = (T/M_N) ∫_0^T [⟨X, X⟩′_u ⟨Y, Y⟩′_u + (⟨X, Y⟩′_u)²] du − 2 (T/M_N) ∫_0^T (⟨X, Y⟩′_u)² dF_N(u) + Q_N[2] + o_p(N^{−1}) + o_p(M_N^{−3/2}) + o_p(M_N/N²).  (22)

Putting this in turn together with Theorem 1, one obtains that the stochastic MSE (bias² + variance) of [X, Y]_T − ⟨X, Y⟩_T is

stoch MSE = (∫_0^T ⟨X, Y⟩′_u dF_N(u))² + (T/M_N) ∫_0^T [⟨X, X⟩′_u ⟨Y, Y⟩′_u + (⟨X, Y⟩′_u)²] du − 2 (T/M_N) ∫_0^T (⟨X, Y⟩′_u)² dF_N(u) + Q_N[2] + o_p(N^{−1}) + o_p(M_N^{−3/2}) + o_p(M_N/N²).  (23)

Recall that Q_N[2] = O_p(M_N^{−3/2}). What about the term due to Q_N[2]? First order answers can be provided by considering the case when the quadratic variations ⟨X, X⟩, ⟨Y, Y⟩ and ⟨X, Y⟩ are nonrandom. In this case, by taking the expected MSE, the martingale term Q_N disappears. One can behave as if the MSE is the first three elements of (23). We return to the question of the meaning of Q_N[2] in Section 4.4.

4.3. Bias–variance tradeoff

In view of Theorem 1, "typical" behavior is that

(N/M_N) F_N(u) → F(u)  as N → ∞,  (24)

where F is a nondecreasing function. (In particular, every subsequence will have a further subsequence displaying this behavior.) In this case, we obtain

stoch MSE = (M_N/N)² (∫_0^T ⟨X, Y⟩′_u dF(u))² + (T/M_N) ∫_0^T [⟨X, X⟩′_u ⟨Y, Y⟩′_u + (⟨X, Y⟩′_u)²] du − 2 (T/N) ∫_0^T (⟨X, Y⟩′_u)² dF(u) + Q_N[2] + o_p(N^{−1}) + o_p(M_N^{−3/2}) + o_p(M_N²/N²).  (25)
A tradeoff between the bias² (the first term) and the variance (the later terms) is therefore obtained by setting (M_N/N)² = O(M_N^{−1}), yielding M_N = O(N^{2/3}). Thus the first order terms in the MSE are given by the first two terms in (23) or (25).

4.4. The meaning of the martingale Q_N

Let K_N be the martingale (non-drift) term in (10). In other words, [X, Y]_T − ⟨X, Y⟩_T = −∫_0^T ⟨X, Y⟩′_u dF_N(u) + K_N + O_p(N^{−1}). By the same methods as before, we obtain:
Corollary 3. Assume Conditions C2–C3. Then, in probability,

M_N³ ⟨Q_N[2], Q_N[2]⟩ → (2/3) T³ ∫_0^T [(⟨X, X⟩′_u)²(⟨Y, Y⟩′_u)² + 6(⟨X, X⟩′_u)(⟨Y, Y⟩′_u)(⟨X, Y⟩′_u)² + (⟨X, Y⟩′_u)⁴] du  (26)

and

⟨M_N^{1/2} K_N, M_N^{3/2} Q_N[2]⟩ → (1/3) T² ∫_0^T [5 ⟨X, X⟩′_u ⟨Y, Y⟩′_u ⟨X, Y⟩′_u + (⟨X, Y⟩′_u)³] du.  (27)

In fact, the corollary asserts that [X, Y]_T − ⟨X, Y⟩_T is correlated with its own stochastic variance! What could this possibly imply? Again, to get a first order answer, consider what would happen if the quadratic variations ⟨X, X⟩, ⟨Y, Y⟩ and ⟨X, Y⟩ are nonrandom. We then obtain that the third cumulant of K_N is given by

cum₃(K_N) = 3 cov(K_N, ⟨K_N, K_N⟩)
  = 3 M_N^{−2} cov(M_N^{1/2} K_N, M_N^{3/2} Q_N[2]) + o(M_N^{−2})
  = 3 M_N^{−2} E⟨M_N^{1/2} K_N, M_N^{3/2} Q_N[2]⟩ + o(M_N^{−2})
  = M_N^{−2} T² ∫_0^T {5 ⟨X, X⟩′_u ⟨Y, Y⟩′_u ⟨X, Y⟩′_u + (⟨X, Y⟩′_u)³} du  (28)

(for the first transition, cf. Eq. (2.14) (p. 23) of Mykland (1994)). Similar methods can be used to compute the fourth cumulant. Thus, Q_N is more of a contribution to the Edgeworth expansions of our estimator than an adjustment to variance.

5. When trading times are random

It is often natural to assume that the trading times τ and θ can be described as the event times of a counting process. Let the arrival times τ have intensity λ^X(t) and the θ have intensity λ^Y(t). For the moment we assume that both these intensities can be random (but predictable) processes.

This type of model requires some modification of the earlier development. For one thing, the counts m and n are random, and so is N = m + n. Also, and more seriously, Conditions C1 and/or C2 may not be satisfied. We consider these issues in turn.

First of all, to get an asymptotic framework, we assume the following.

Condition C4. There is a sequence of experiments indexed by nonrandom α, α > 0, so that λ^X = λ^X_α and λ^Y = λ^Y_α. In general, the intensities can be any function of α, but we suppose that there are constants c̄ and c, independent of α, with 0 < c ≤ 1 ≤ c̄ < ∞, so that for all t ∈ [0, T],

αc ≤ λ^X_α(t) ≤ αc̄  and  αc ≤ λ^Y_α(t) ≤ αc̄.  (29)

Remark 3 (Asymptotic Framework). We do asymptotics as α → ∞. Note that since N/α = O_p(1) but not o_p(1), this is the same as supposing that N → ∞. The same argument yields the same orders for m and n.

The assumption that will run into trouble is Condition C2. This is a natural assumption for developing analytical results when the trading/sampling times are nonrandom, but Condition C2 is neither true nor necessary if the sampling times are random. In fact, if the intensities λ^X_α and λ^Y_α are independent of time t, then conditionally on m and n, the sampling times for the X and Y processes are like the order statistics from a uniform distribution on [0, T] (see, for example, Theorem 2.3.1 (p. 67) of Ross (1996)). Thus sup_i |θ_{m,i} − θ_{m,i−1}| = O(log N/N), but not O(1/N), and similarly for the τ's (see Devroye (1981, 1982), Aldous (1989), and Shorack and Wellner (1986), for example). By the subsampling argument used in the proof of Theorem 5 below, this extends to all sampling schemes covered by Condition C4. Fortunately, this problem does not affect us, in view of the upcoming Theorem 5. A restriction that ensures Condition C1 is satisfied (eventually, as N → ∞) is sufficient. For this, we require that the size of the regular grid satisfies

M_α = o_p(α / log α).  (30)

Theorem 5. Let X and Y be Itô processes satisfying (1)–(2), and let μ_t and σ_t be locally bounded. Let [X, Y]_T be the previous-tick covariance estimator. Let V_α = {0 = v_0, v_1, ..., v_{M_α}} be a collection of nonrandom time points which span [0, T]. Let t_i and s_i be the transaction times of X and Y, respectively, that immediately precede v_i. Assume Conditions C3, C4 and (30). Then the conclusions of Theorems 1, 2 and 4 remain valid (with M_α replacing M_N).

6. Application: trading times follow a Poisson process

We now consider an application where the transaction times for assets X and Y follow two independent Poisson processes with (constant) intensities λ^X_α and λ^Y_α, respectively. The meaning of Condition C4 is now simply that λ^X_α and λ^Y_α have the same order as the index α → ∞.

6.1. Stochastic bias and variance in the case of Poisson arrivals

Corollary 4. In the setting of Theorem 5, suppose that the consecutive sampling points v_i are evenly spaced. Also, suppose that the transaction times for assets X and Y follow two independent Poisson processes with intensities λ^X_α and λ^Y_α (constant for each α), respectively. Also suppose that λ^X_α/α → ℓ^X and λ^Y_α/α → ℓ^Y as α → ∞. Then
FN (t ) → t
ℓX ℓY + ℓX ℓY
(31)
in probability. Since the stochastic bias is given by SBias([X , Y ]T ) = − ⟨X , Y ⟩′t dFN (t ) + Op (N −1 ), we obtain that N Mα
SBias([X , Y ]T ) → −⟨X , Y ⟩T
ℓY ℓX + ℓX ℓY
.
T 0
(32)
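To illustrate the negative bias in (32), the following Monte Carlo sketch (our own illustration, with arbitrarily chosen parameters rho, lam_x, lam_y, M) simulates two correlated Brownian motions observed at independent Poisson times and applies the previous-tick estimator on an even grid. The average estimate falls short of the true value ⟨X,Y⟩_T = ρT, broadly consistent with (32):

```python
import bisect
import math
import random

def poisson_times(rng, lam, T):
    """Arrival times of a Poisson(lam) process on (0, T)."""
    t, out = 0.0, []
    while True:
        t += rng.expovariate(lam)
        if t >= T:
            return out
        out.append(t)

def previous_tick_cov(rho=0.8, lam_x=300.0, lam_y=300.0, M=100, T=1.0,
                      n_paths=300, seed=7):
    """Average previous-tick covariance estimate; the target is rho*T."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        tx = poisson_times(rng, lam_x, T)
        ty = poisson_times(rng, lam_y, T)
        # correlated Brownian motions on the merged event grid
        grid = sorted(set([0.0] + tx + ty))
        X, Y, x, y = {0.0: 0.0}, {0.0: 0.0}, 0.0, 0.0
        for a, b in zip(grid, grid[1:]):
            dt = b - a
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            x += math.sqrt(dt) * z1
            y += math.sqrt(dt) * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
            X[b], Y[b] = x, y
        # previous ticks t_i, s_i relative to the even grid v_i = i*T/M
        txg, tyg = [0.0] + tx, [0.0] + ty
        est, prev_t, prev_s = 0.0, 0.0, 0.0
        for i in range(1, M + 1):
            v = i * T / M
            t_i = txg[bisect.bisect_right(txg, v) - 1]
            s_i = tyg[bisect.bisect_right(tyg, v) - 1]
            est += (X[t_i] - X[prev_t]) * (Y[s_i] - Y[prev_s])
            prev_t, prev_s = t_i, s_i
        total += est
    return total / n_paths
```

With these (assumed) parameters the heuristic bias (M_α/N)·ρT·(ℓ^Y/ℓ^X + ℓ^X/ℓ^Y) is sizable, so the shortfall is visible even with a moderate number of paths.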
It is obvious that the bias has the opposite sign to the covariation between X and Y, and its magnitude is minimized when ℓ^X = ℓ^Y (for a given value of ℓ = ℓ^X + ℓ^Y). We now move on to the asymptotics of the stochastic variance in the case of Poisson processes. In analogy with the result in
L. Zhang / Journal of Econometrics 160 (2011) 33–47
Section 4.2, we obtain:

Corollary 5. In the setting of Corollary 4, the asymptotic stochastic variance of the previous-tick estimator becomes (leaving out the term that is due to Q_N in Theorem 2)

(T/M_α) ∫_0^T [(⟨X,Y⟩'_u)² + ⟨X,X⟩'_u ⟨Y,Y⟩'_u] du − (2T/N)(ℓ^Y/ℓ^X + ℓ^X/ℓ^Y) ∫_0^T (⟨X,Y⟩'_u)² du.   (33)
The Q_N term is excluded for the reasons discussed in Sections 4.2 and 4.4.

6.2. Bias–variance tradeoff

Assuming that the observed (X, Y) are the true (efficient) logarithmic prices, we have demonstrated that the previous-tick estimator has an asymptotically bounded bias. This bias is induced by the asynchronous trading of the two assets. Naturally, the variance estimator in this case is unbiased, as a price series is inherently synchronized with itself. In analogy with the development in Section 4.3, we can now find an optimal sampling frequency. In this Poisson application, we obtain very straightforward expressions. Recall from Corollary 5 that the variance of the previous-tick estimator consists of two terms, the 1/M_α term and the 1/N term. Under Condition (30), the latter is of smaller order. So the main terms in the mean squared error (MSE) are:
[(M_α/N) ⟨X,Y⟩_T (ℓ^Y/ℓ^X + ℓ^X/ℓ^Y)]² + (T/M_α) ∫_0^T [(⟨X,Y⟩'_u)² + ⟨X,X⟩'_u ⟨Y,Y⟩'_u] du.   (34)
Setting ∂MSE/∂M_α = 0 gives M_α = O(N^{2/3}). In particular,

M_α* = [ T ∫_0^T ((⟨X,Y⟩'_u)² + ⟨X,X⟩'_u ⟨Y,Y⟩'_u) du / ( 2 (⟨X,Y⟩_T)² (ℓ^Y/ℓ^X + ℓ^X/ℓ^Y)² ) ]^{1/3} N^{2/3}.

With this choice of sampling frequency, the MSE becomes

3 · 2^{−2/3} [ ⟨X,Y⟩_T (ℓ^Y/ℓ^X + ℓ^X/ℓ^Y) ]^{2/3} [ T ∫_0^T ((⟨X,Y⟩'_u)² + ⟨X,X⟩'_u ⟨Y,Y⟩'_u) du ]^{2/3} N^{−2/3}.
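The calculus behind M_α* can be sanity-checked numerically. In the sketch below (ours, not the paper's), A and B are arbitrary stand-ins for ⟨X,Y⟩_T (ℓ^Y/ℓ^X + ℓ^X/ℓ^Y) and T∫_0^T((⟨X,Y⟩'_u)² + ⟨X,X⟩'_u⟨Y,Y⟩'_u)du, so the MSE has the generic form (A·M/N)² + B/M:

```python
def mse(M, A, B, N):
    # squared bias (A*M/N)^2 plus leading variance term B/M, as in (34)
    return (A * M / N) ** 2 + B / M

A, B, N = 1.6, 0.5, 10_000                      # arbitrary stand-in constants
M_star = (B * N ** 2 / (2 * A ** 2)) ** (1 / 3)  # closed-form minimizer
mse_star = 3 * 2 ** (-2 / 3) * (A ** 2 * B ** 2) ** (1 / 3) * N ** (-2 / 3)
```

A grid search over integer M confirms that the closed-form M* ∝ N^{2/3} minimizes the MSE and that the minimum value scales as N^{−2/3}.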
7. Effect of microstructure noise

Like many other kinds of data, financial data usually are noisy. In the finance literature, this noise is commonly referred to as microstructure noise. One can also view microstructure noise as observation or measurement error caused by ''imperfect trading''. A simple yet natural way to view high frequency transaction data is to use a hidden semimartingale argument. One model is to write the logarithmic price process of the observables as the sum of a latent process (say, the efficient price), which follows a semimartingale model, and a microstructure noise process. That is,

X^o_{τ_{n,i}} = X_{τ_{n,i}} + ϵ^X_{τ_{n,i}}   and   Y^o_{θ_{m,i}} = Y_{θ_{m,i}} + ϵ^Y_{θ_{m,i}},   (35)

where X^o and Y^o are the observed transaction prices in logarithmic scale, and X and Y are the latent efficient (log) prices which satisfy the Itô-process models (1) and (2), respectively. Following the same notation as in Section 2.1, we suppose X^o is observed at grid T_n, T_n = {0 = τ_{n,0} ≤ τ_{n,1} ≤ ··· ≤ τ_{n,n} = T}, and Y^o is sampled at grid S_m, S_m = {0 = θ_{m,0} ≤ θ_{m,1} ≤ ··· ≤ θ_{m,m} = T}. In the following, we present two approaches to handling microstructure noise. One is the classical bias–variance tradeoff. We then turn to two scales estimation in the next section. It should be emphasized that the main recommendation is to use two- or multi-scale estimation. The purpose of carrying out the tradeoff below is mainly to show that the effect of microstructure can be integrated into the same scheme as the Epps effect, also for the purpose of choosing the sampling frequency.

7.1. Tradeoff between discretization, asynchronization, and microstructure noise

To demonstrate the idea without delving into the mathematical details, we let the noise be independent of the latent processes, that is, ϵ^X ⊥ X and ϵ^Y ⊥ Y; also, ϵ^X and ϵ^Y are independent of each other. A simple structure for the ϵ's is white noise. We note that this model structure can be extended to incorporate correlation between the latent prices and the noises, as well as between the noises of the two securities, but we shall not consider this here. (We shall use more relaxed assumptions in Section 8 below.) As was argued in Section 2 of Zhang et al. (2005), to rigorously implement a bias–variance tradeoff in the presence of microstructure noise, one needs to work with a shrinking noise asymptotics: E(ϵ^X)² and E(ϵ^Y)² will be taken to be of order o(1) as N → ∞. See also Zhang et al. (2011). A similar approach was used in Delattre and Jacod (1997). Similar to the definition and notation in (6), the previous-tick estimator of covariation now becomes the cross product of X^o and Y^o:

[X^o, Y^o]_T = Σ_{i=1}^{M_N} (X^o_{t_i} − X^o_{t_{i−1}})(Y^o_{s_i} − Y^o_{s_{i−1}}),   (36)

where the t_i's and the s_i's are the corresponding time ticks immediately preceding the sampling point v_i. (Because of the law of large numbers, we shall in this section identify M_α and M_N.) Our question in this section is: given the observations X^o and Y^o at the nonsynchronized discrete grids, and assuming the model (35), how close is [X^o, Y^o]_T to the latent quantity [X, Y]_T? How well can [X^o, Y^o]_T estimate the target ⟨X, Y⟩_T? We next study [X^o, Y^o]_T − [X, Y]_T, termed the error due to noise.

Note that our assumption is that M_α is nonrandom while N is random. We are using N for simplicity of notation only, to stand in for (λ^X_α + λ^Y_α)T. We can do this since N/(λ^X_α + λ^Y_α)T → 1 in probability as α → ∞.

7.1.1. Signal-noise decomposition

When microstructure noise is present in the observed price processes, we can decompose the covariation estimator into terms induced by the latent prices and terms related to the noise. From (36), we get

[X^o, Y^o]_T = [X, Y]_T + [X, ϵ^Y]_T[2] + [ϵ^X, ϵ^Y]_T,

where [X, Y]_T is the same as (6), [ϵ^X, ϵ^Y]_T = Σ_{i=1}^{M_N} (ϵ^X_{t_i} − ϵ^X_{t_{i−1}})(ϵ^Y_{s_i} − ϵ^Y_{s_{i−1}}), and

[X, ϵ^Y]_T[2] = Σ_{i=1}^{M_N} (X_{t_i} − X_{t_{i−1}})(ϵ^Y_{s_i} − ϵ^Y_{s_{i−1}}) + Σ_{i=1}^{M_N} (Y_{s_i} − Y_{s_{i−1}})(ϵ^X_{t_i} − ϵ^X_{t_{i−1}}).
We shall see that the main-order terms in the above decomposition are the noise covariation [ϵ^X, ϵ^Y]_T and the signal-noise interaction [X, ϵ^Y]_T[2]. To see this, write

[X, ϵ^Y]_T = Σ_{i=1}^{M_N} (X_{t_i} − X_{t_{i−1}}) ϵ^Y_{s_i} − Σ_{i=1}^{M_N} (X_{t_i} − X_{t_{i−1}}) ϵ^Y_{s_{i−1}}.

Because of the white-noise property of ϵ^Y, we obtain E[([X, ϵ^Y]_T)² | X] = 2[X,X]_T E(ϵ^Y)² + O_p(E(ϵ^Y)² M_N^{−1}), where the order o_p(E(ϵ^Y)²) is from the cross term. To find the exact formula for the cross term, we refer to the method in Zhang et al. (2005). So far we have [X, ϵ^Y]_T = O_p([E(ϵ^Y)²]^{1/2}); similarly, [Y, ϵ^X]_T = O_p([E(ϵ^X)²]^{1/2}). For the noise variation, notice that

[ϵ^X, ϵ^Y]_T = 2 Σ_{i=1}^{M_N} ϵ^X_{t_i} ϵ^Y_{s_i} − Σ_{i=1}^{M_N} ϵ^X_{t_i} ϵ^Y_{s_{i−1}}[2] + ϵ^X_{t_0} ϵ^Y_{s_0} − ϵ^X_{t_M} ϵ^Y_{s_M},   (37)

where we recall that ϵ^X_{t_i} ϵ^Y_{s_{i−1}}[2] = ϵ^X_{t_i} ϵ^Y_{s_{i−1}} + ϵ^X_{t_{i−1}} ϵ^Y_{s_i}. Because ϵ^X and ϵ^Y are both white noise with mean zero, and uncorrelated with each other, we have var(ϵ^X_{t_i} ϵ^Y_{s_i}) = var(ϵ^X_{t_i} ϵ^Y_{s_{i−1}}) = var(ϵ^X_{t_{i−1}} ϵ^Y_{s_i}) = E(ϵ^X)² E(ϵ^Y)². Hence, var([ϵ^X, ϵ^Y]_T) = 6 M_N E(ϵ^X)² E(ϵ^Y)². Therefore, the pure noise variation [ϵ^X, ϵ^Y]_T has order [M_N E(ϵ^X)² E(ϵ^Y)²]^{1/2}. In summary,

[X^o, Y^o]_T = [X, Y]_T + [X, ϵ^Y]_T[2] + [ϵ^X, ϵ^Y]_T,

where [X, Y]_T = O_p(1), [X, ϵ^Y]_T[2] = O_p([E(ϵ^X)²]^{1/2}) + O_p([E(ϵ^Y)²]^{1/2}), and [ϵ^X, ϵ^Y]_T = O_p([M_N E(ϵ^X)² E(ϵ^Y)²]^{1/2}).

7.1.2. The tradeoff

We study the tradeoff when the observation times of X and of Y follow Poisson processes with intensities λ^X_α and λ^Y_α, respectively. As in Section 6.2, we shall for ease of exposition identify N and (λ^X_α + λ^Y_α)T (by the law of large numbers, these two quantities are interchangeable in our formulas in this section). We note that the previous-tick estimator is asymptotically unbiased, with bias of order M_N/N. As far as its variance is concerned, the part due to asynchronicity and discretization decreases with the sampling frequency (Theorem 2), whereas the part due to microstructure noise increases (Section 7.1.1). It would be desirable to balance the bias and variance terms in the sense that minimizes the MSE. We do so in the following. We let ν_N² represent the order of the variance of ϵ^X and ϵ^Y, so that O(E(ϵ^X)²) = O(E(ϵ^Y)²) = O(ν_N²). For µ^X = µ^Y = 0, [X, ϵ^Y]_T[2] is the end point of a martingale and it has zero covariation with the pure noise. Then, the leading terms in the MSE of [X^o, Y^o]_T are

MSE = (M_N²/N²) ⟨X,Y⟩_T² (ℓ^Y/ℓ^X + ℓ^X/ℓ^Y)² + (T/M_N) ∫_0^T [(⟨X,Y⟩'_u)² + ⟨X,X⟩'_u ⟨Y,Y⟩'_u] du + 2⟨X,X⟩_T E(ϵ^Y)² + 2⟨Y,Y⟩_T E(ϵ^X)² + 6 M_N E(ϵ^X)² E(ϵ^Y)².

In order to capture all three effects — microstructure noise, discretization variance and nonsynchronization bias — we have elected to let the size of the microstructure noise go to zero as N → ∞ in such a way that the variance due to noise has the same size as the discretization and nonsynchronization MSE. To this effect, we can select M_N = O(N^{2/3}) and ν_N² = O(N^{−2/3}). To be specific, suppose that r, r_X and r_Y are nonnegative constants such that

M_N = r N^{2/3},  E(ϵ^X)² = r_X N^{−2/3},  and  E(ϵ^Y)² = r_Y N^{−2/3}.

Note that r, r_X and r_Y regulate the triangular array type of asymptotics described just before Section 7.1.1. Only r is assumed to be controllable by the econometrician; r_X and r_Y are given by nature. Setting ∂MSE/∂M_N = 0, we get

2 (M_N/N²) ⟨X,Y⟩_T² (ℓ^Y/ℓ^X + ℓ^X/ℓ^Y)² − (T/M_N²) ∫_0^T [(⟨X,Y⟩'_u)² + ⟨X,X⟩'_u ⟨Y,Y⟩'_u] du + 6 E(ϵ^X)² E(ϵ^Y)² = 0,

yielding an optimal choice of r satisfying

r_X r_Y = (1/6) { r^{−2} T ∫_0^T [(⟨X,Y⟩'_u)² + ⟨X,X⟩'_u ⟨Y,Y⟩'_u] du − 2r [⟨X,Y⟩_T (ℓ^X/ℓ^Y + ℓ^Y/ℓ^X)]² }.

(This equation uniquely defines r.) Hence, the optimal mean squared error is

N^{−2/3} ( 2 r^{−1} T ∫_0^T [(⟨X,Y⟩'_u)² + ⟨X,X⟩'_u ⟨Y,Y⟩'_u] du − r² [⟨X,Y⟩_T (ℓ^X/ℓ^Y + ℓ^Y/ℓ^X)]² + 2⟨X,X⟩_T r_Y + 2⟨Y,Y⟩_T r_X ).

8. Two scales estimation with previous-tick covariances

We here continue to look at the case (35) where there is microstructure noise on top of X and Y. In this section, we do not make the assumptions from Section 7.1 above. (In particular, there is no need to take the noise to be ''shrinking''.)

8.1. Definition and analysis of two scales estimation

First, the average lag K previous-tick realized covariance is defined by

[X^o, Y^o]_T^{(K)} = (1/K) Σ_{i=K}^{M_N} (X^o_{t_i} − X^o_{t_{i−K}})(Y^o_{s_i} − Y^o_{s_{i−K}}).

In analogy with the development in Zhang et al. (2005), we can now define a previous-tick two scales realized covariance (TSCV) by

\hat{⟨X,Y⟩}_T = c_N ( [X^o, Y^o]_T^{(K)} − (n̄_K/n̄_J) [X^o, Y^o]_T^{(J)} ),

where c_N is a constant that can be tuned for small sample precision and for our purposes must satisfy c_N = 1 + o_p(M_N^{−1/6}) (see, in particular, Section 4.2 of Zhang et al. (2005)), n̄_K = (M_N − K + 1)/K, and similarly for n̄_J. The two scales are chosen such that 1 ≤ J ≪ K. Specifically, for the asymptotics, we require K = K_N = O(N^{2/3}). J can be fixed or go to infinity with N. In the classical two scales setting, J = 1.

If we assume that the sequence (ϵ^X_{t_i}, ϵ^Y_{s_i}) is independent of the latent X and Y processes, has bounded fourth moments, and is
exponentially α-mixing, it follows from the proof of Lemma A.2 in Zhang et al. (2005) that

\hat{⟨X,Y⟩}_T = [X,Y]_T^{(K)} − (n̄_K/n̄_J)[X,Y]_T^{(J)} + (1/K) Σ_{i=J}^{M_N} ϵ^X_{t_i} ϵ^Y_{s_{i−J}}[2] − (1/K) Σ_{i=K}^{M_N} ϵ^X_{t_i} ϵ^Y_{s_{i−K}}[2] + o_p(M_N^{−1/6}),   (38)

where ''ϵ^X_{t_i} ϵ^Y_{s_{i−J}}[2]'' means ϵ^X_{t_i} ϵ^Y_{s_{i−J}} + ϵ^Y_{s_i} ϵ^X_{t_{i−J}} (see Notation 1 in Section 3). We now use the results in this paper to analyze the first term in (38). If we assume Conditions C1 and C2, and if also J → ∞, it follows that Condition C3 is satisfied for the subsamples, and so from (8),

[X,Y]_T^{(K)} − (n̄_K/n̄_J)[X,Y]_T^{(J)} = [X̄,Ȳ]_T^{(K)} − (n̄_K/n̄_J)[X̄,Ȳ]_T^{(J)} + o_p(M_N^{−1/6}),

where [X̄,Ȳ]_T^{(K)} is the averaged subsampled version of the (unobserved) synchronized estimator [X̄,Ȳ]_T. This is because the bias cancels to the relevant order! Specifically, the bias in [X,Y]_T^{(K)} is the expression in (7), multiplied by 1/K, and similarly for [X,Y]_T^{(J)}. Hence, in the end,

\hat{⟨X,Y⟩}_T = [X̄,Ȳ]_T^{(K)} − (n̄_K/n̄_J)[X̄,Ȳ]_T^{(J)} + (1/K) Σ_{i=J}^{M_N} ϵ^X_{t_i} ϵ^Y_{s_{i−J}}[2] − (1/K) Σ_{i=K}^{M_N} ϵ^X_{t_i} ϵ^Y_{s_{i−K}}[2] + o_p(M_N^{−1/6}).   (39)

Fig. 1. Average estimator [X^o,Y^o]_T^{(K)} as a function of K (aggregated [X^o,Y^o]_T^{(K)} vs. K for MSFT & GOOG, Oct. 03, 2005–Oct. 31, 2005).
It can be verified directly that this is also true when J is fixed as n → ∞, using the more complicated Theorems 1 and 2. Two things have been achieved in this development. On the one hand, we have shown that in this case, previous-tick estimation reduces, for purposes of analysis, to using synchronized observations. On the other, we do not need to be overly concerned with the precise dependence structure between ϵ^X_{t_i} and ϵ^Y_{s_i}.
From (39), we can now obtain the asymptotic mixed normality of M_N^{1/6}(\hat{⟨X,Y⟩}_T − ⟨X,Y⟩_T) by just recycling the results in Zhang et al. (2005). We again stress that Condition C3 is not required on the original grid V, so that one can take M_N = O(N). A concrete limit theorem is as follows:

Theorem 6. Assume that (X_t) and (Y_t) are Itô processes given by (1)–(2), with σ^X_t and σ^Y_t continuous, and µ^X_t and µ^Y_t locally bounded. The observables X^o_{τ_{n,i}} and Y^o_{θ_{m,i}} are given by (35), and the grid V satisfies C1–C2, with M_N/N → c_1 > 0 as N → ∞. The scales J = J_N and K = K_N satisfy K_N/N^{2/3} → c_2 and J_N/N^{2/3} → 0 as N → ∞. Assume that J = lim_{N→∞} J_N is either infinity, or exists and is finite. Also assume that the noise processes are independent of (X_t) and (Y_t), and that the process (ϵ^X_{t_i}, ϵ^Y_{s_i}) is stationary and exponentially α-mixing, with Eϵ^X = Eϵ^Y = 0. Also suppose that ϵ^X_{t_i} and ϵ^Y_{s_i} have finite (4+δ)th moment for some δ > 0. Finally, define h_i as in Eq. (43) in Zhang et al. (2005), with v_i replacing t_i, and set G_n(t) = Σ_{v_{i+1}≤t} h_i Δv_i. Assume that G_n converges pointwise to G. Then N^{1/6}(\hat{⟨X,Y⟩}_T − ⟨X,Y⟩_T) converges stably in law to ωZ, where Z is standard normal (independent of X and Y), and

ω² = (1/2) c_1^{−1} c_2 T ∫_0^T [⟨X,X⟩'_u ⟨Y,Y⟩'_u + (⟨X,Y⟩'_u)²] dG(u) + c_2^{−2} c_1 [ γ_{0,J} + γ_{0,∞} + 2 Σ_{i=1}^{∞} (γ_{i,J} + γ_{i,∞}) ],

where γ_{i,j} = Cov(ϵ^X_{t_0} ϵ^Y_{s_{−j}}[2], ϵ^X_{t_i} ϵ^Y_{s_{i−j}}[2]) (the precise form is rather tedious and is given in (52)) and γ_{i,∞} = lim_{j→∞} γ_{i,j}, given by

γ_{i,∞} = 2 Cov(ϵ^X_{t_0}, ϵ^X_{t_i}) Cov(ϵ^Y_{s_0}, ϵ^Y_{s_i}) + 2 Cov(ϵ^X_{t_0}, ϵ^Y_{s_i}) Cov(ϵ^Y_{s_0}, ϵ^X_{t_i}).   (40)
Note that the functions G_n(t) and G(t) are exactly as in Zhang et al. (2005), and as argued on p. 1411 in that paper, G(t) always exists under Condition C2, if necessary by using subsequences. If the v_i's are equidistant, we have G'(t) ≡ 4/3. The assumption of stationarity is one of convenience, and is not required, at the cost of more complicated expressions for the asymptotic variance. The deeper result is (39), which does not depend on stationarity.

8.2. An illustration of behavior in data

We here provide an instance of how the estimators behave in data. The daily covariance of Microsoft (MSFT) and Google (GOOG) was estimated by the previous-tick method from the transactions reported in the TAQ database. The grid points V were based on the refresh time method (Barndorff-Nielsen et al. (2008b); see Section 2.1 above for a description). Figs. 1 and 2 give the average of the daily estimates from the trading days of October 2005. In Fig. 1, subsampling and averaging were used; Fig. 2 is based on the two scales estimator. From Fig. 1, one can see that the Epps effect does kick in at the highest frequencies (at very small K, the estimator drops sharply from 4 × 10⁻⁵ to 1.1 × 10⁻⁵), while at moderately small K, there is an upward bias which is presumably due to microstructure. The Epps effect is substantially removed in the two scales covariance estimator. In Fig. 2, the TSCV is stable around 4 × 10⁻⁵ for large enough K. Regardless of the choice of K and J, TSCV fluctuates in a much narrower range (between 4 × 10⁻⁵ and 5.4 × 10⁻⁵).

9. Conclusion: the Epps effect and its remedies

This paper is about how to estimate ⟨X,Y⟩_T when the observation times of X and Y are not synchronized and when microstructure noise is present in the observed price processes.
Using the previous-tick estimator for ⟨X , Y ⟩T , we show in Theorem 1 that for positively associated assets X and Y , nonsynchronization induces a negative bias in the estimator. The magnitude of this bias increases in sampling frequency, up to a
point; on the other hand, it decreases for more liquid assets. This is an analytic characterization of the Epps effect (Epps (1979)). To cope with this effect, the paper offers two approaches. On the one hand, the effect can be controlled through a bias–variance tradeoff. This tradeoff provides an optimal scheme for subsampling observations. The scheme can incorporate microstructure noise. A more satisfying approach is two or multiscale estimation. Section 8 shows that this approach eliminates, at the same time, the biases due to asynchronicity and microstructure noise. The rate of convergence is the same as that achieved in the scalar process case, where there is no asynchronicity. The principles outlined can be applied similarly to multiscale estimation (Zhang, 2006), thus achieving rate efficiency. A full development of this approach is deferred to later work.

Fig. 2. Two scales estimator \hat{⟨X,Y⟩}_T as a function of J and K (aggregated TSCV vs. K and J for MSFT & GOOG, Oct. 03, 2005–Oct. 31, 2005).

Appendix. Proofs
Proof of Theorem 1. Assume first that µ^X = 0 and µ^Y = 0. We know that the stochastic bias of [X,Y]_T is

Σ_{i=1}^{M_N} ∫_{max(t_{i−1},s_{i−1})}^{min(t_i,s_i)} ⟨X,Y⟩'_u du − ⟨X,Y⟩_T = ∫_0^T ⟨X,Y⟩'_u d[G_N(u) − u],

where

G_N(t) = ∫_0^t Σ_{i=1}^{M_N} I(max(t_{i−1},s_{i−1}) < v < min(t_i,s_i)) dv = Σ_{i: min(t_i,s_i) ≤ t} (min(t_i,s_i) − max(t_{i−1},s_{i−1})) + (t − t̲),

with t̲ = max{l ∈ T ∪ S : l ≤ t}. Hence (11) follows, since

G_N(t) − t = − Σ_{i: max(t_i,s_i) ≤ t} (max(t_i,s_i) − min(t_i,s_i)) + O_p(1/N) = −F_N(t) + O_p(1/N),

where

F_N(t) = Σ_{i: v_i ≤ t} (max(t_i,s_i) − min(t_i,s_i)) = (1/N) Σ_{i: v_i ≤ t} [max(δ_i^t, δ_i^s) − min(δ_i^t, δ_i^s)],

with 0 ≤ δ_i^t = N(v_i − t_i) ≤ c and 0 ≤ δ_i^s = N(v_i − s_i) ≤ c for some constant c. To see why ∫_0^T ⟨X,Y⟩'_t dF_N(t) is RCP, note that under C2, 0 ≤ v_i − t_i ≤ inf{τ > v_i : τ ∈ T_n} − t_i ≤ c_1/N for some positive constant c_1; similarly, v_i − s_i ≤ c_2/N for some positive constant c_2. Then

(N/M_N) F_N(T) = (1/M_N) Σ_{i=1}^{M_N} [max(δ_i^t, δ_i^s) − min(δ_i^t, δ_i^s)] + o_p(1) = O_p(1),

and F_N(t) is RCP by Helly's Theorem (Ash, 1972, p. 329). The same result for the stochastic bias follows since ⟨X,Y⟩'_t is continuous. If we do not assume that µ^X = 0 and µ^Y = 0, it is easy to see that the contribution to the bias from such terms is asymptotically negligible. To see this, we refer to Girsanov's Theorem and the device used at the beginning of the proof of Theorem 2 in Zhang et al. (2005) (Section A.3, p. 1410). This works unless the instantaneous correlation between X and Y is one; in this latter case, one should use the methods of Mykland and Zhang (2006).

Lemma 1. Let X and Y be Itô processes satisfying (1)–(2). Let v_i, t_i, and s_i be the i-th sampling point, and the previous ticks in X and in Y, respectively, as defined in Section 2.1. Let R_N = Σ_i (R_{1,i} + R_{2,i} + R_{3,i}), where

R_{1,i} = (X_{t_i} − X_{min(t_i,s_i)})(Y_{s_i} − Y_{s_{i−1}}),
R_{2,i} = (X_{max(t_{i−1},s_{i−1})} − X_{t_{i−1}})(Y_{s_i} − Y_{s_{i−1}}),
R_{3,i} = (X_{min(t_i,s_i)} − X_{max(t_{i−1},s_{i−1})}) [(Y_{s_i} − Y_{min(t_i,s_i)}) + (Y_{max(t_{i−1},s_{i−1})} − Y_{s_{i−1}})].

Then, under Conditions C1 and C2, U_{N,u}^{(nonsyn)} is RCP in the sense of Definition 3, and

R_N = O_p(1/√N);   (41)

in particular, its quadratic variation satisfies

⟨R_N, R_N⟩ = (1/N) ∫_0^T ⟨Y,Y⟩'_u ⟨X,X⟩'_u dU_u^{(nonsyn)} + o_p(1/N)   (42)

through any subsequence for which U_u^{(nonsyn)} (the limit of U_{N,u}^{(nonsyn)}) exists.

Proof of Lemma 1. U_{N,u}^{(nonsyn)} is RCP by the same methods as in the proof of Theorem 1. Note that the leading terms in R_{1,i}–R_{3,i} are martingale increments. For the first sum,

⟨Σ_i R_{1,i}, Σ_i R_{1,i}⟩ = Σ_i (Y_{s_i} − Y_{s_{i−1}})² ∫_{min(s_i,t_i)}^{t_i} d⟨X,X⟩_u
= Σ_i (⟨Y,Y⟩_{s_i} − ⟨Y,Y⟩_{s_{i−1}}) ∫_{min(s_i,t_i)}^{t_i} d⟨X,X⟩_u + o_p(1/N)
= Σ_i ⟨Y,Y⟩'_{s_i} ⟨X,X⟩'_{s_i} (s_i − s_{i−1})(t_i − min(s_i,t_i)) + o_p(1/N).

Similarly,

⟨Σ_i R_{2,i}, Σ_i R_{2,i}⟩ = Σ_i ⟨Y,Y⟩'_{s_{i−1}} ⟨X,X⟩'_{s_{i−1}} (max(t_{i−1},s_{i−1}) − t_{i−1})(s_i − s_{i−1}) + o_p(1/N).
At last, for Σ_i R_{3,i}:

⟨Σ_i R_{3,i}, Σ_i R_{3,i}⟩ = Σ_i ∫_{min(t_i,s_i)}^{s_i} (X_{min(t_i,s_i)} − X_{max(t_{i−1},s_{i−1})})² d⟨Y,Y⟩_u + Σ_i ∫_{max(t_{i−1},s_{i−1})}^{min(t_i,s_i)} (Y_{max(t_{i−1},s_{i−1})} − Y_{s_{i−1}})² d⟨X,X⟩_u
= Σ_i ⟨X,X⟩'_{t_i} ⟨Y,Y⟩'_{t_i} (min(t_i,s_i) − max(t_{i−1},s_{i−1})) [(s_i − min(s_i,t_i)) + (max(t_{i−1},s_{i−1}) − s_{i−1})] + o_p(1/N).

We are now left to compute the covariations between the different terms. As it turns out, these covariations are either zero or of order o_p(1/N). In particular, it follows directly from the definitions of R_{1,i}, R_{2,i} and R_{3,i} that

⟨Σ_i R_{1,i}, Σ_i R_{2,i}⟩ = 0  and  ⟨Σ_i R_{1,i}, Σ_i R_{3,i}⟩ = ⟨Σ_i R_{2,i}, Σ_i R_{3,i}⟩ = o_p(1/N).

Hence

⟨R_N, R_N⟩ = Σ_i (⟨R_{1,i}, R_{1,i}⟩ + ⟨R_{2,i}, R_{2,i}⟩ + ⟨R_{3,i}, R_{3,i}⟩) + o_p(1/N);

taking the limit (for convergent subsequences), Lemma 1 follows from the theory in Chapter VI of Jacod and Shiryaev (2003).

Proof of Theorem 2. Note that

[X,Y]_T = Σ_i (X_{min(t_i,s_i)} − X_{max(t_{i−1},s_{i−1})})(Y_{min(t_i,s_i)} − Y_{max(t_{i−1},s_{i−1})}) + Σ_i (R_{1,i} + R_{2,i} + R_{3,i}).

Invoking Itô's formula on the first term, we obtain

Σ_i (X_{min(t_i,s_i)} − X_{max(t_{i−1},s_{i−1})})(Y_{min(t_i,s_i)} − Y_{max(t_{i−1},s_{i−1})}) = Σ_i (⟨X,Y⟩_{min(t_i,s_i)} − ⟨X,Y⟩_{max(t_{i−1},s_{i−1})}) + Σ_i ∫_{max(t_{i−1},s_{i−1})}^{min(t_i,s_i)} (X_u − X_{max(t_{i−1},s_{i−1})}) dY_u [2].

The asymptotic variance of [X,Y]_T thus has two components: one — Σ_i (R_{1,i} + R_{2,i} + R_{3,i}) — comes from the nonsynchronization, while the other — Σ_i ∫_{max(t_{i−1},s_{i−1})}^{min(t_i,s_i)} (X_u − X_{max(t_{i−1},s_{i−1})}) dY_u [2] — is because of the discrete trading (or recording) times. The former is analyzed in Lemma 1. We are left to show the result for the latter term, and for the interaction between the two terms.

We start with the quadratic variation of Σ_i ∫_{max(t_{i−1},s_{i−1})}^{min(t_i,s_i)} (X_u − X_{max(t_{i−1},s_{i−1})}) dY_u [2], which equals (all integrals being over (max(t_{i−1},s_{i−1}), min(t_i,s_i)))

Σ_i ∫ (X_u − X_{max(t_{i−1},s_{i−1})})² d⟨Y,Y⟩_u + Σ_i ∫ (Y_u − Y_{max(t_{i−1},s_{i−1})})² d⟨X,X⟩_u + 2 Σ_i ∫ (X_u − X_{max(t_{i−1},s_{i−1})})(Y_u − Y_{max(t_{i−1},s_{i−1})}) d⟨X,Y⟩_u
= Σ_i ⟨X,X⟩'_{t_i} ⟨Y,Y⟩'_{t_i} (min(t_i,s_i) − max(t_{i−1},s_{i−1}))² + Σ_i (⟨X,Y⟩'_{t_i})² (min(t_i,s_i) − max(t_{i−1},s_{i−1}))² + Q_{N,T}[2] + o_p(M_N^{−3/2}),
where

Q_{N,T}[2] = 2 Σ_i ⟨X,X⟩'_{t_i} ∫_{max(t_{i−1},s_{i−1})}^{min(t_i,s_i)} (min(t_i,s_i) − u)(Y_u − Y_{max(t_{i−1},s_{i−1})}) dY_u [2] + 2 Σ_i ⟨X,Y⟩'_{t_i} ∫_{max(t_{i−1},s_{i−1})}^{min(t_i,s_i)} (min(t_i,s_i) − u)(X_u − X_{max(t_{i−1},s_{i−1})}) dX_u × (1 + o_p(1)),

whose quadratic variation is bounded by

(1/3) Σ_i [4(⟨X,X⟩'_{t_{i−1}})²(⟨Y,Y⟩'_{t_{i−1}})² + 6 ⟨X,X⟩'_{t_{i−1}} ⟨Y,Y⟩'_{t_{i−1}} (⟨X,Y⟩'_{t_{i−1}})² + (⟨X,Y⟩'_{t_{i−1}})⁴] (min(t_i,s_i) − max(t_{i−1},s_{i−1}))⁴ × (1 + o_p(1)).

Under Condition C2, we conclude that Q_{N,T}[2] = O_p(M_N^{−3/2}). By continuity, and using (13), the quadratic variation above equals

(1/(N M_N)) ∫_0^T [⟨X,X⟩'_u ⟨Y,Y⟩'_u + (⟨X,Y⟩'_u)²] dU^{(dis)}_{N,u}.

For a rigorous proof of similar statements under lesser regularity conditions, see Propositions 1 and 3 in Mykland and Zhang (2006). To see Eq. (13), let δ_i^s and δ_i^t be as defined in the proof of Theorem 1, and set Δv = T/M_N. We then get that

min(t_i,s_i) − max(t_{i−1},s_{i−1}) = Δv − N^{−1}(max(δ_i^t, δ_i^s) − min(δ_{i−1}^t, δ_{i−1}^s)).

Hence

U^{(dis)}_{N,u} = (N/T) Σ_{i: t_i,s_i ≤ u} [(Δv)² − 2(Δv/N)(max(δ_i^t, δ_i^s) − min(δ_{i−1}^t, δ_{i−1}^s)) + O(N^{−2})] = u − 2F_N(u) + O(M_N^{−1}) + O((M_N/N)²),

proving (13). Next we study the interaction term. Since Σ_i ∫_{max(t_{i−1},s_{i−1})}^{min(t_i,s_i)} (X_u − X_{max(t_{i−1},s_{i−1})}) dY_u [2] is symmetric, we only need to show the interaction between Σ_i ∫_{max(t_{i−1},s_{i−1})}^{min(t_i,s_i)} (X_u − X_{max(t_{i−1},s_{i−1})}) dY_u and Σ_i (R_{1,i} + R_{2,i} + R_{3,i}). First, it is obvious that

⟨Σ_i ∫_{max(t_{i−1},s_{i−1})}^{min(t_i,s_i)} (X_u − X_{max(t_{i−1},s_{i−1})}) dY_u, Σ_i R_{1,i}⟩ = 0.

The remaining covariations are handled by the same method. The orders follow because, under Conditions C1 and C2, 0 ≤ v_i − s_i ≤ inf{τ > v_i : τ ∈ T_n} − s_i ≤ c_1/N for some c_1, and v_i − t_i ≤ c_2/N for some c_2; hence |s_i − min(t_i,s_i)| = O(1/N), |max(t_{i−1},s_{i−1}) − t_{i−1}| = O(1/N), and |min(t_i,s_i) − max(t_{i−1},s_{i−1})| ≤ |min(t_i,s_i) − v_i| + |v_i − v_{i−1}| + |v_{i−1} − max(t_{i−1},s_{i−1})| = O(1/M). Therefore

⟨Σ_i ∫ (X_u − X_{max(t_{i−1},s_{i−1})}) dY_u, Σ_i R_{2,i}⟩, ⟨Σ_i ∫ (X_u − X_{max(t_{i−1},s_{i−1})}) dY_u, Σ_i R_{3,i}⟩ = O_p(N^{−3/2}) = o_p(N^{−1}).

In sum, the interaction term is negligible. Theorem 2 is proved.

For the next Lemma, we use the notation

Q_N = 2 Σ_{i=1}^{M_N} ⟨Y,Y⟩'_{α_i} ∫_{α_i}^{β_i} (β_i − u)(X_u − X_{α_i}) dX_u,
R_N = Σ_{i=1}^{M_N} ⟨X,Y⟩'_{α_i} ∫_{α_i}^{β_i} (β_i − u)(X_u − X_{α_i}) dY_u [2].

Lemma 2. Let X and Y be Itô processes satisfying (1)–(2). Let N and M_N be as defined in Section 2.1. Assuming Condition C1,

Σ_{i=1}^{M_N} ∫_{α_i}^{β_i} (X_u − X_{α_i})² d⟨Y,Y⟩_u = (1/2) Σ_{i=1}^{M_N} ⟨X,X⟩'_{α_i} ⟨Y,Y⟩'_{α_i} (β_i − α_i)² + Q_N + o_p(M_N^{−3/2}),

where Q_N has quadratic variation

⟨Q_N, Q_N⟩ = (1/3) Σ_{i=1}^{M_N} (⟨X,X⟩'_{α_i})² (⟨Y,Y⟩'_{α_i})² (β_i − α_i)⁴ × (1 + o_p(1)).

Similarly,

Σ_{i=1}^{M_N} ∫_{α_i}^{β_i} (X_u − X_{α_i})(Y_u − Y_{α_i}) d⟨X,Y⟩_u = (1/2) Σ_{i=1}^{M_N} (⟨X,Y⟩'_{α_i})² (β_i − α_i)² + R_N + o_p(M_N^{−3/2}),

where R_N has quadratic variation

⟨R_N, R_N⟩ = (1/6) Σ_{i=1}^{M_N} (⟨X,Y⟩'_{α_i})² ⟨X,X⟩'_{α_i} ⟨Y,Y⟩'_{α_i} (β_i − α_i)⁴ × (1 + o_p(1)).
Proof of Lemma 2. We first show that

∫_{α_i}^{β_i} ∫_{α_i}^{u} (X_v − X_{α_i}) dX_v du = ∫_{α_i}^{β_i} (β_i − u)(X_u − X_{α_i}) dX_u.   (43)

Using integration by parts on the outer integration, we get

∫_{α_i}^{β_i} ∫_{α_i}^{u} (X_v − X_{α_i}) dX_v du = (β_i − α_i) ∫_{α_i}^{β_i} (X_u − X_{α_i}) dX_u − ∫_{α_i}^{β_i} (u − α_i)(X_u − X_{α_i}) dX_u = ∫_{α_i}^{β_i} (β_i − u)(X_u − X_{α_i}) dX_u

(by telescoping sum). Next, invoke Itô's formula:

Σ_{i=1}^{M_N} ∫_{α_i}^{β_i} (X_u − X_{α_i})² d⟨Y,Y⟩_u = Σ_{i=1}^{M_N} ⟨Y,Y⟩'_{α_i} ∫_{α_i}^{β_i} (X_u − X_{α_i})² du + o_p(M_N^{−3/2})
= Σ_{i=1}^{M_N} ⟨Y,Y⟩'_{α_i} ∫_{α_i}^{β_i} (⟨X,X⟩_u − ⟨X,X⟩_{α_i}) du + 2 Σ_{i=1}^{M_N} ⟨Y,Y⟩'_{α_i} ∫_{α_i}^{β_i} ∫_{α_i}^{u} (X_v − X_{α_i}) dX_v du + o_p(M_N^{−3/2})   (44)
= (1/2) Σ_{i=1}^{M_N} ⟨X,X⟩'_{α_i} ⟨Y,Y⟩'_{α_i} (β_i − α_i)² + Q_N + o_p(M_N^{−3/2})

by (43). The quadratic variation of Q_N is

⟨Q_N, Q_N⟩ = 4 Σ_{i=1}^{M_N} (⟨Y,Y⟩'_{α_i})² ∫_{α_i}^{β_i} (β_i − u)² (X_u − X_{α_i})² d⟨X,X⟩_u = 4 Σ_{i=1}^{M_N} (⟨Y,Y⟩'_{α_i})² (⟨X,X⟩'_{α_i})² ∫_{α_i}^{β_i} (β_i − u)² (u − α_i) du × (1 + o_p(1)) = (1/3) Σ_{i=1}^{M_N} (⟨X,X⟩'_{α_i})² (⟨Y,Y⟩'_{α_i})² (β_i − α_i)⁴ × (1 + o_p(1)),   (45)

since ∫_{α_i}^{β_i} (β_i − u)²(u − α_i) du = (β_i − α_i)⁴/12. Similarly,

Σ_{i=1}^{M_N} ∫_{α_i}^{β_i} (X_u − X_{α_i})(Y_u − Y_{α_i}) d⟨X,Y⟩_u = Σ_{i=1}^{M_N} ⟨X,Y⟩'_{α_i} ∫_{α_i}^{β_i} (X_u − X_{α_i})(Y_u − Y_{α_i}) du + o_p(M_N^{−3/2})
= Σ_{i=1}^{M_N} ⟨X,Y⟩'_{α_i} ∫_{α_i}^{β_i} (⟨X,Y⟩_u − ⟨X,Y⟩_{α_i}) du + Σ_{i=1}^{M_N} ⟨X,Y⟩'_{α_i} ∫_{α_i}^{β_i} ∫_{α_i}^{u} (X_v − X_{α_i}) dY_v [2] du + o_p(M_N^{−3/2})
= (1/2) Σ_{i=1}^{M_N} (⟨X,Y⟩'_{α_i})² (β_i − α_i)² + R_N + o_p(M_N^{−3/2})

by (43).

Proof of Theorem 4. Since in this case U^{(nonsyn)}_{N,t} → 0, it follows from the proof of Theorem 2 that we can take L_N in Theorem 3 to be

L_N = Σ_i ∫_{max(t_{i−1},s_{i−1})}^{min(t_i,s_i)} (X_u − X_{max(t_{i−1},s_{i−1})}) dY_u [2].

Now set L̃_N = Σ_i ∫_{max(t_{i−1},s_{i−1})}^{max(t_i,s_i)} (X_u − X_{max(t_{i−1},s_{i−1})}) dY_u [2]. To assess L̃_N − L_N = Σ_i ∫_{min(t_i,s_i)}^{max(t_i,s_i)} (X_u − X_{max(t_{i−1},s_{i−1})}) dY_u [2], note that

⟨Σ_i ∫_{min(t_i,s_i)}^{max(t_i,s_i)} (X_u − X_{max(t_{i−1},s_{i−1})}) dY_u, Σ_i ∫_{min(t_i,s_i)}^{max(t_i,s_i)} (X_u − X_{max(t_{i−1},s_{i−1})}) dY_u⟩ = Σ_i ∫_{min(t_i,s_i)}^{max(t_i,s_i)} (X_u − X_{max(t_{i−1},s_{i−1})})² d⟨Y,Y⟩_u
≤ Σ_i ⟨X,X⟩'_{max(t_{i−1},s_{i−1})} ⟨Y,Y⟩'_{min(t_i,s_i)} ∫_{min(t_i,s_i)}^{max(t_i,s_i)} (u − max(t_{i−1},s_{i−1})) du × (1 + o_p(1)) = O_p(1/(N M_N)) = o_p(1/N),

and similarly for the second term in [2]. This shows the result for w_i = max(t_i,s_i); the result for v_i follows similarly.

Proof of Corollary 2. To show Eq. (21), define δ_i^t and δ_i^s as in the proof of Theorem 1. We then obtain (with Δv = T/M_N)

(N/T) Σ_{i: s_i,t_i ≤ u} (s_i − s_{i−1})(t_i − t_{i−1}) = (N/T) Σ_{i: s_i,t_i ≤ u} [(Δv)² − N^{−1}Δv(δ_i^s − δ_{i−1}^s) − N^{−1}Δv(δ_i^t − δ_{i−1}^t) + N^{−2}(δ_i^s − δ_{i−1}^s)(δ_i^t − δ_{i−1}^t)] = (N/T) Σ_{i: s_i,t_i ≤ u} (Δv)² + o(1)

(by telescoping sum), and

(N/T) Σ_{i: s_i,t_i ≤ u} (min(t_i,s_i) − max(t_{i−1},s_{i−1}))² = (N/T) Σ_{i: s_i,t_i ≤ u} [(Δv)² − 2N^{−1}Δv(max(δ_i^s, δ_i^t) − min(δ_{i−1}^s, δ_{i−1}^t))] + o(1) = (N/T) Σ_{i: s_i,t_i ≤ u} (Δv)² − 2(N/T) F_N(u) + o(1).

Since U^{(nonsyn)}_{N,u} is defined as the difference between the two above terms, the result (21) follows.

Proof of Theorem 5. The proof is similar to that of the earlier results; the main difference lies in verifying that the relevant sequences are RCP in the sense of Definition 3. We here provide
the argument in the case of the bias; entirely similar considerations apply to the variance. Consider first the term

(α/M_α) Σ_i (v_i − max(s_i,t_i)) ≤ (α/M_α) Σ_i (v_i − s_i) + (α/M_α) Σ_i (v_i − t_i).   (46)

Suppose one considers the following subsampling scheme: every time τ_i occurs, it is sampled with probability cα/λ^X_α(τ_i). By standard considerations, the subsampled times τ'_i are derived from a Poisson process with intensity cα. Suppose that the number of such τ'_i is n'. If t'_i = max{τ'_j ≤ v_i}, one obtains that

(α/M_α) Σ_i (v_i − t_i) ≤ (α/M_α) Σ_i (v_i − t'_i).   (47)

Note that by the Poisson property of the τ'_i, the expectation of the right-hand side of (47) is bounded by 1/c; hence (47) is O_p(1). By using the same argument on the s_i's, one thus obtains that (46) is O_p(1). Finally, if N' = m' + n', then by the law of large numbers, N'/α → 2cT as α → ∞. Hence (N'/M_α) F_N(T) is O_p(1). Again, by Helly's Theorem (Ash (1972, p. 329)), the rest follows similarly.
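The boundedness claim can be visualized with a small simulation (ours, not the paper's): the scaled average lag (α/M_α) Σ_i (v_i − t_i) stays O_p(1) as the intensity α grows, since each backward waiting time is roughly exponential with mean 1/α:

```python
import bisect
import random

def poisson_times(rng, lam, T):
    """Arrival times of a Poisson(lam) process on [0, T), with a tick at 0."""
    t, out = 0.0, [0.0]
    while True:
        t += rng.expovariate(lam)
        if t >= T:
            return out
        out.append(t)

def mean_scaled_lag(alpha, M=50, T=1.0, seed=3):
    """(alpha/M) * sum_i (v_i - t_i), where t_i is the previous tick
    of a Poisson(alpha) arrival sequence and v_i = i*T/M."""
    rng = random.Random(seed)
    ticks = poisson_times(rng, alpha, T)
    total = 0.0
    for i in range(1, M + 1):
        v = i * T / M
        total += v - ticks[bisect.bisect_right(ticks, v) - 1]
    return alpha * total / M

for a in (200, 2000, 20000):
    # stays of order one as the intensity grows
    print(a, mean_scaled_lag(a))
```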
N MN
vi − max(ti , si ) = min ((vi − ti ), (vi − si )) ∼ exp(λα + λα ).
2
2
+
(λYα )2
2
(λα + λYα )2 X
2
2
+
(λXα + λYα )2 X λα λYα + λYα λXα
−
2
λXα λYα (51)
λ E [(vi − si ) min(vi − ti , vi − si )|vi ] = − Xα Y
1
λα (λXα + λYα )2
+
1
λXα λYα
.
Thus, by independent increment and by (50)–(51), we get E [(min(ti , si ) − max(ti−1 , si−1 ))2 |vi ]
= E (min(ti , si ) − vi ) + (vi − vi−1 ) + (vi−1 − max(ti−1 , si−1 ))2 |vi 2 Mα 2 1 1 1 T 1+2 − − + O . = Mα T λXα + λYα λXα λYα α2 Therefore,
[
T
(dis)
EUN ,vk = k
Y
Mα
1+2
Mα
2
λXα + λYα
T
−
1
λXα
−
1
+O
λYα
1
]
α2
.
Similarly, E [(si −si−1 )] = E [(si −vi )+(vi −vi−1 )+(vi−1 −si−1 )|vi ] = T = E [(ti − ti−1 )|vi ], thus, M
Also, since
α
vi − min(ti , si ) = max (vi − ti , vi − si ) = (vi − ti ) + (vi − si ) − min (vi − ti , vi − si ) ∼ exp(λα ) + exp(λα ) − exp(λα + λα ) X
Y
X
Y
k −
(nonsyn)
E UN ,vk
[ ×
(48)
then, [−(vi − min(ti , si )) + (vi − max(ti , si ))] 1
N
.
(49)
Mα
Under our assumptions,
E [FN (vk )] = −k −
1
λXα
−
1
λYα
+
2
λXα + λYα
+O
1
α
Proof of Corollary 5. This is a direct consequence of Theorem 2 and Corollaries 2 and 4. We here provide an independent proof as an addition. Again use the relation (48), we obtain 1
λα X
+
1
λα Y
−
1
λα + λYα X
,
= 2(λXα + λYα )k
−
2
λXα + λYα
+
1
λXα
+
1
+O
λYα
1
α2
]
.
T
(⟨X , Y ⟩′u )2 + ⟨X , X ⟩′u ⟨Y , Y ⟩′u du ∫ T T ℓY ℓX (⟨X , Y ⟩′u )2 + ⟨X , X ⟩′u ⟨Y , Y ⟩′u du, −2 + N ℓX ℓY 0 0
whereas the asymptotic stochastic variance due to nonsynchronization is
.
Also note that N /(λXα T + λYα T ) → 1 in probability. By appropriate normalization, it follows that (31) holds in expectation. By observing that (49) is an independent sum, it also follows that (31) holds in probability. Thus, (32) yields accordingly.
E [vi − min(ti , si )|vi ] =
Mα
∫
T
T
By the same argument as in the proof of Corollary 4 and the results in Theorem 2, the asymptotic stochastic variance due to discretization is
i=1
+ Op
2
where E [(vi − ti ) min(vi − ti , vi − si )|vi ][2] = E [((vi − ti ) + (vi − si )) min(vi − ti , vi − si )|vi ]. The first step is due to the independence between X and Y , and the second step is because
N
X
2
(λXα )2 +
FN (t ) is RCP. The rest of the proof
Proof of Corollary 4. Let vi , si , and ti be the same as in Section 2.1. Since X and Y are Poisson with intensities λXα and λYα , respectively, we get vi − ti ∼ exp(λXα ),2 and vi − si ∼ exp(λYα ), thus
FN (vk ) = −
=
i
2
+ Y 2+ X + X Y (λXα )2 (λα ) (λα + λYα )2 λα λα − 2E [(vi − ti ) min(vi − ti , vi − si )|vi ][2]
(50)
2
T
∫
T N
⟨X , X ⟩′u ⟨Y , Y ⟩′u du 0
ℓX ℓY + Y X ℓ ℓ
.
Adding up, (33) follows by the law of large numbers.
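The exponential order-statistic identities (48)–(51) above lend themselves to a quick Monte Carlo sanity check. A minimal sketch, assuming NumPy only; the intensities `lam_x` and `lam_y` stand in for λ^X_α and λ^Y_α and are arbitrary illustrative values:

```python
import numpy as np

# Monte Carlo check of the exponential order-statistic facts behind (48)-(51).
# lam_x, lam_y play the roles of lambda^X_alpha, lambda^Y_alpha (illustrative).
rng = np.random.default_rng(0)
lam_x, lam_y, n = 2.0, 3.0, 500_000

a = rng.exponential(1 / lam_x, n)   # v_i - t_i ~ exp(lam_x)
b = rng.exponential(1 / lam_y, n)   # v_i - s_i ~ exp(lam_y)
mn = np.minimum(a, b)               # v_i - max(t_i, s_i), cf. (48)
mx = np.maximum(a, b)               # v_i - min(t_i, s_i), cf. (49)

# (48): the minimum of independent exponentials is exponential with
# intensity lam_x + lam_y.
assert abs(mn.mean() - 1 / (lam_x + lam_y)) < 5e-3
# E[v_i - min(t_i, s_i)] = 1/lam_x + 1/lam_y - 1/(lam_x + lam_y).
assert abs(mx.mean() - (1 / lam_x + 1 / lam_y - 1 / (lam_x + lam_y))) < 5e-3
# (51): E[(v_i - s_i) min(...)] = 1/(lam_x lam_y) - lam_y/(lam_x (lam_x+lam_y)^2).
target = 1 / (lam_x * lam_y) - lam_y / (lam_x * (lam_x + lam_y) ** 2)
assert abs((b * mn).mean() - target) < 5e-3
```

These are exactly the conditional moments that drive the bias terms in the proofs of Corollaries 4 and 5.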
² X ∼ exp(λ) means that X follows an exponential distribution with intensity λ.

Proof of Theorem 6. Consider separately the signal and noise terms in (39). This is legitimate since the two terms are independent. It is easy to see that the term involving the semimartingales X and Y is handled exactly in analogy with the similar development (Theorem 3) in Zhang et al. (2005), integrating the methodology from Theorem 2 in the current paper. The constant appears as follows: the constant c from the earlier paper is here c ∼ K_N/M_N^{2/3} ∼ c_2 c_1^{−2/3}. For the noise term, replace the normalization N^{1/6}/K_N by M_N^{−1/2} (thus creating a constant of c_1^{1/2} c_2^{−1}, which is squared in the variance). We now have to deal with two suitably normalized mixing
sums. The asymptotic normality follows as in Chapter 5 of Hall and Heyde (1980). It is easy to verify that the two sums are asymptotically uncorrelated. If one sets γ_{i,j} = Cov((ϵ^X_{t_0} ϵ^Y_{s_{−j}})^{[2]}, (ϵ^X_{t_i} ϵ^Y_{s_{i−j}})^{[2]}), the asymptotic variance of the ''J'' term thus gets the form γ_{0,J} + 2 ∑_{i=1}^{∞} γ_{i,J}, and similarly for the ''K'' term (let J → ∞ if it isn't already there). To see the expression for γ_{i,j}, note that
γ_{i,j} = Cov(ϵ^X_{t_0} ϵ^Y_{s_{−j}} + ϵ^Y_{s_0} ϵ^X_{t_{−j}}, ϵ^X_{t_i} ϵ^Y_{s_{i−j}} + ϵ^Y_{s_i} ϵ^X_{t_{i−j}})
    = 2 Cov(ϵ^X_{t_0}, ϵ^X_{t_i}) Cov(ϵ^Y_{s_0}, ϵ^Y_{s_i}) + 2 Cov(ϵ^X_{t_0}, ϵ^Y_{s_i}) Cov(ϵ^Y_{s_0}, ϵ^X_{t_i})
    + Cov(ϵ^X_{t_0}, ϵ^Y_{s_{i−j}}) Cov(ϵ^Y_{s_{−j}}, ϵ^X_{t_i}) + Cov(ϵ^Y_{s_0}, ϵ^X_{t_{i−j}}) Cov(ϵ^X_{t_{−j}}, ϵ^Y_{s_i})
    + Cov(ϵ^X_{t_0}, ϵ^X_{t_{i−j}}) Cov(ϵ^Y_{s_{−j}}, ϵ^Y_{s_i}) + Cov(ϵ^Y_{s_0}, ϵ^Y_{s_{i−j}}) Cov(ϵ^X_{t_i}, ϵ^X_{t_{−j}})
    + cum(ϵ^X_{t_0}, ϵ^X_{t_i}, ϵ^Y_{s_{−j}}, ϵ^Y_{s_{i−j}}) + cum(ϵ^Y_{s_0}, ϵ^Y_{s_i}, ϵ^X_{t_{−j}}, ϵ^X_{t_{i−j}})
    + cum(ϵ^X_{t_0}, ϵ^X_{t_{i−j}}, ϵ^Y_{s_{−j}}, ϵ^Y_{s_i}) + cum(ϵ^X_{t_i}, ϵ^X_{t_{−j}}, ϵ^Y_{s_0}, ϵ^Y_{s_{i−j}}).   (52)

Obviously, as j → ∞, γ_{i,j} tends to the expression in (40).
References

Aït-Sahalia, Y., Mykland, P.A., Zhang, L., 2005. How often to sample a continuous-time process in the presence of market microstructure noise. Review of Financial Studies 18, 351–416.
Aït-Sahalia, Y., Mykland, P.A., Zhang, L., 2011. Ultra high frequency volatility estimation with dependent microstructure noise. Journal of Econometrics 160 (1), 160–175.
Aldous, D., 1989. Probability Approximations via the Poisson Clumping Heuristic. Springer-Verlag.
Aldous, D.J., Eagleson, G.K., 1978. On mixing and stability of limit theorems. Annals of Probability 6, 325–331.
Andersen, T.G., Bollerslev, T., 1998. Answering the skeptics: yes, standard volatility models do provide accurate forecasts. International Economic Review 39, 885–905.
Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P., 2001. The distribution of realized exchange rate volatility. Journal of the American Statistical Association 96, 42–55.
Ash, R.B., 1972. Real Analysis and Probability. Academic Press.
Barndorff-Nielsen, O.E., Hansen, P.R., Lunde, A., Shephard, N., 2008a. Designing realized kernels to measure ex-post variation of equity prices in the presence of noise. Econometrica 76, 1481–1536.
Barndorff-Nielsen, O.E., Hansen, P.R., Lunde, A., Shephard, N., 2008b. Multivariate realised kernels: consistent positive semi-definite estimators of the covariation of equity prices with noise and non-synchronous trading. Tech. Rep., University of Oxford.
Barndorff-Nielsen, O.E., Hansen, P.R., Lunde, A., Shephard, N., 2011. Subsampling realised kernels. Journal of Econometrics 160 (1), 204–219.
Barndorff-Nielsen, O.E., Shephard, N., 2002a. Econometric analysis of realized volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society. Series B 64, 253–280.
Barndorff-Nielsen, O.E., Shephard, N., 2002b. Estimating quadratic variation using realised variance. Journal of Applied Econometrics 17, 457–477.
Dacorogna, M.M., Gençay, R., Müller, U., Olsen, R.B., Pictet, O.V., 2001. An Introduction to High-Frequency Finance. Academic Press, San Diego.
Delattre, S., Jacod, J., 1997. A central limit theorem for normalized functions of the increments of a diffusion process, in the presence of round-off errors. Bernoulli 3, 1–28.
Devroye, L., 1981. Law of the iterated logarithm for order statistics for uniform spacing. Annals of Probability 9, 860–867.
Devroye, L., 1982. A log log law for maximal uniform spacings. Annals of Probability 10, 863–868.
Epps, T.W., 1979. Comovements in stock prices in the very short run. Journal of the American Statistical Association 74, 291–298.
Gençay, R., Ballocchi, G., Dacorogna, M., Olsen, R., Pictet, O., 2002. Real-time trading models and the statistical properties of foreign exchange rates. International Economic Review 43, 463–491.
Griffin, J., Oomen, R., 2007. Covariance measurement in the presence of nonsynchronous trading and market microstructure noise. Preprint.
Hall, P., Heyde, C.C., 1980. Martingale Limit Theory and its Application. Academic Press, Boston.
Jacod, J., Li, Y., Mykland, P.A., Podolskij, M., Vetter, M., 2009. Microstructure noise in the continuous case: the pre-averaging approach. Stochastic Processes and their Applications 119, 2249–2276.
Jacod, J., Protter, P., 1998. Asymptotic error distributions for the Euler method for stochastic differential equations. Annals of Probability 26, 267–307.
Jacod, J., Shiryaev, A.N., 2003. Limit Theorems for Stochastic Processes, 2nd ed. Springer-Verlag, New York.
Lo, A.W., MacKinlay, A.C., 1990. An econometric analysis of nonsynchronous trading. Journal of Econometrics 45, 181–211.
Lunde, A., Voev, V., 2007. Integrated covariance estimation using high-frequency data in the presence of noise. Journal of Financial Econometrics 5, 68–104.
McCullagh, P., 1987. Tensor Methods in Statistics. Chapman and Hall, London, UK.
Mykland, P.A., 1994. Bartlett type identities for martingales. Annals of Statistics 22, 21–38.
Mykland, P.A., Zhang, L., 2006. ANOVA for diffusions and Itô processes. Annals of Statistics 34, 1931–1963.
Podolskij, M., Vetter, M., 2009. Estimation of volatility functionals in the simultaneous presence of microstructure noise and jumps. Bernoulli 15 (3), 634–658.
Rényi, A., 1963. On stable sequences of events. Sankhya Series A 25, 293–302.
Rootzén, H., 1980. Limit distributions for the error in approximations of stochastic integrals. Annals of Probability 8, 241–251.
Ross, S., 1996. Stochastic Processes, 2nd ed. Wiley, New York, NY.
Scholes, M., Williams, J., 1977. Estimating betas from nonsynchronous data. Journal of Financial Economics 5, 309–327.
Shorack, G.R., Wellner, J.A., 1986. Empirical Processes with Application to Statistics. Wiley.
Zhang, L., 2001. From martingales to ANOVA: implied and realized volatility. Ph.D. Thesis, The University of Chicago, Department of Statistics.
Zhang, L., 2006. Efficient estimation of stochastic volatility using noisy observations: a multi-scale approach. Bernoulli 12, 1019–1043.
Zhang, L., Mykland, P.A., Aït-Sahalia, Y., 2005. A tale of two time scales: determining integrated volatility with noisy high-frequency data. Journal of the American Statistical Association 100, 1394–1411.
Zhang, L., Mykland, P.A., Aït-Sahalia, Y., 2011. Edgeworth expansions for realized volatility and related estimators. Journal of Econometrics 160 (1), 190–203.
Journal of Econometrics 160 (2011) 48–57
The role of implied volatility in forecasting future realized volatility and jumps in foreign exchange, stock, and bond markets

Thomas Busch (a,d), Bent Jesper Christensen (b,d), Morten Ørregaard Nielsen (c,d,∗)

(a) Danske Bank, Denmark
(b) Aarhus University, Denmark
(c) Queen's University, Canada
(d) CREATES, Denmark
Article history: Available online 6 March 2010
JEL classification: C22, C32, F31, G1
Keywords: Bipower variation; HAR (Heterogeneous autoregressive model); Implied volatility; Jumps; Options; Realized volatility; VecHAR; Volatility forecasting
Abstract

We study the forecasting of future realized volatility in the foreign exchange, stock, and bond markets from variables in our information set, including implied volatility backed out from option prices. Realized volatility is separated into its continuous and jump components, and the heterogeneous autoregressive (HAR) model is applied with implied volatility as an additional forecasting variable. A vector HAR (VecHAR) model for the resulting simultaneous system is introduced, controlling for possible endogeneity issues. We find that implied volatility contains incremental information about future volatility in all three markets, relative to past continuous and jump components, and it is an unbiased forecast in the foreign exchange and stock markets. Out-of-sample forecasting experiments confirm that implied volatility is important in forecasting future realized volatility components in all three markets. Perhaps surprisingly, the jump component is, to some extent, predictable, and options appear calibrated to incorporate information about future jumps in all three markets.

© 2010 Elsevier B.V. All rights reserved.
1. Introduction In both the theoretical and empirical finance literature, volatility is generally recognized as one of the most important determinants of risky asset values, such as exchange rates, stock and bond prices, and hence interest rates. Since any valuation procedure involves assessing the level and riskiness of future payoffs, it is particularly the forecasting of future volatility from variables in the current information set that is important for asset pricing, derivative pricing, hedging, and risk management. A number of different variables are potentially relevant for volatility forecasting. In the present paper, we include derivative prices and investigate whether implied volatilities (IV ) backed out from options on foreign currency futures, stock index futures, or Treasury bond (T-bond) futures contain incremental information when assessed against volatility forecasts based on high-frequency (5-min) current and past returns on exchange rates, stock index futures, and T-bond futures, respectively.
∗ Corresponding address: Department of Economics, Dunning Hall Room 307, 94 University Avenue, Queen’s University, Kingston, Ontario K7L 3N6, Canada. E-mail address: [email protected] (M.Ø. Nielsen). doi:10.1016/j.jeconom.2010.03.014
Andersen et al. (2003) and Andersen et al. (2004) show that simple reduced form time series models for realized volatility (RV ) outperform commonly used GARCH and related stochastic volatility models in forecasting future volatility. In recent work, Barndorff-Nielsen and Shephard (2004, 2006) derive a fully nonparametric separation of the continuous sample path (C ) and jump (J) components of RV. Applying this technique, Andersen et al. (2007) extend the results of Andersen et al. (2003) and Andersen et al. (2004) by using past C and J as separate regressors when forecasting volatility. They show that the two components play very different roles in forecasting, and that significant gains in performance are achieved by separating them. While C is strongly serially correlated, J is distinctly less persistent, and almost not forecastable, thus clearly indicating separate roles for C and J in volatility forecasting. In this paper, we study high-frequency (5-min) returns to the $/DM exchange rate, S&P 500 futures, and 30 year T-bond futures, as well as monthly prices of associated futures options. Alternative volatility measures are computed from the two separate data segments, i.e., RV and its components from high-frequency returns and IV from option prices. IV is widely perceived as a natural forecast of integrated volatility over the remaining life of the option contract under risk-neutral pricing. It is also a relevant
T. Busch et al. / Journal of Econometrics 160 (2011) 48–57
forecast in a stochastic volatility setting even if volatility risk is priced, and it should get a coefficient below (above) unity in forecasting regressions in the case of a negative (positive) volatility risk premium (Bollerslev and Zhou, 2006). Since options expire at a monthly frequency, we consider the forecasting of one-month volatility measures. The issue is whether IV retains incremental information about future integrated volatility when assessed against realized measures (RV , C , J) from the previous month. The methodological contributions of the present paper are to use high-frequency data and recent statistical techniques for the realized measures, and to allow these to have different impacts at different frequencies, when constructing the returnbased forecasts that IV is assessed against. These innovations ensure that IV is put to a harder test than in previous literature when comparing forecasting performance. The idea of allowing different impacts at different frequencies arises since realized measures covering the entire previous month very likely are not the only relevant yardsticks. Squared returns nearly one month past may not be as informative about future volatility as squared returns that are only one or a few days old. To address this issue, we apply the heterogeneous autoregressive (HAR) model proposed by Corsi (2009) for RV analysis and extended by Andersen et al. (2007) to include the separate C and J components of total realized volatility (RV = C + J) as regressors. In the HAR framework, we include IV from option prices as an additional regressor, and also consider separate forecasting of both C and J individually. As an additional contribution, we introduce a vector heterogeneous autoregressive (labeled VecHAR) model for joint modeling of IV , C , and J. 
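A HAR-type forecasting regression of this kind, with daily, weekly, and monthly averages of C plus J and IV as regressors, can be sketched in a few lines. The following is an illustrative simplification on simulated data (one-step horizon, plain OLS); the `har_design` helper and all variable names are ours, not the paper's specification:

```python
import numpy as np

def har_design(C, J, IV):
    """Hypothetical helper: build a HAR-type regressor matrix from daily
    series of the continuous component C, jump component J, and implied
    volatility IV (names and horizons are illustrative, not the paper's)."""
    n = len(C)
    rows, y = [], []
    for t in range(22, n - 1):
        c_d = C[t]                       # daily component
        c_w = C[t - 4:t + 1].mean()      # weekly average (5 days)
        c_m = C[t - 21:t + 1].mean()     # monthly average (22 days)
        rows.append([1.0, c_d, c_w, c_m, J[t], IV[t]])
        y.append(C[t + 1] + J[t + 1])    # next-period RV = C + J
    return np.array(rows), np.array(y)

rng = np.random.default_rng(1)
n = 600
C = np.abs(rng.normal(1.0, 0.2, n))                    # simulated continuous part
J = rng.exponential(0.05, n) * (rng.random(n) < 0.2)   # occasional jumps
IV = C + rng.normal(0.0, 0.1, n)                       # noisy "implied volatility"

X, y = har_design(C, J, IV)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)           # OLS coefficients
```

In the paper the horizon is monthly and the equations are estimated jointly in the VecHAR system; this sketch only illustrates how the multi-frequency regressor matrix is put together.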
Since IV is the new variable added in our study, compared to the RV literature, and since it may potentially be measured with error stemming from non-synchronicity between sampled option prices and corresponding futures prices, bid-ask spreads, model error, etc., we take special care in handling this variable. The simultaneous VecHAR analysis controls for possible endogeneity issues in the forecasting equations, and allows testing interesting cross-equation restrictions. Based on in-sample Mincer and Zarnowitz (1969) forecasting regressions, we show that IV contains incremental information relative to both C and J when forecasting subsequent RV in all three markets. Furthermore, in the foreign exchange and stock markets, IV is an unbiased forecast. Indeed, it completely subsumes the information content of the daily, weekly, and monthly highfrequency realized measures in the foreign exchange market. Moreover, out-of-sample forecasting evidence suggests that IV should be used alone when forecasting monthly RV in all three markets. The mean absolute out-of-sample forecast error increases if any RV components are included in constructing the forecast. The results are remarkable considering that IV by construction should forecast volatility over the entire interval through expiration of the option, whereas our realized measures exclude the non-trading (exchange closing) intervals overnight and during weekends and holidays in the stock and bond markets. Indeed, the results most strongly favor the IV forecast in case of foreign currency exchange rates where there is round-the-clock trading. Squared returns over non-trading intervals could be included in RV for the other two markets, but with lower weight since they are more noisy. Leaving them out produces conservative results on the role of IV and is most natural given our focus on the separation of C and J, which in practice requires high-frequency intra-day data. 
Using the HAR methodology for separate forecasting of C and J, our results show that IV has predictive power for each. Forecasting monthly C is very much like forecasting RV itself. The coefficient on IV is slightly smaller, but in-sample qualitative results on which variables to include are identical. The out-of-sample forecasting evidence suggests that IV again should be used alone in the foreign exchange and stock markets, but that it should be combined
with realized measures in the bond market. Perhaps surprisingly, even the jump component is, to some extent, predictable, and IV contains incremental information about future jumps in all three markets. The results from the VecHAR model reinforce the conclusions. In particular, when forecasting C in the foreign exchange market, IV completely subsumes the information content of all realized measures. Out-of-sample forecasting performance is about unchanged for J but improves for C in all markets by using the VecHAR model, relative to comparable univariate specifications. The VecHAR system approach allows testing cross-equation restrictions, the results of which support the finding that IV is a forecast of total realized volatility RV = C + J, indeed an unbiased forecast in the foreign exchange and stock markets. In the previous literature, a number of authors have included IV in forecasting regressions, and most have found that it contains at least some incremental information, although there is mixed evidence on its unbiasedness and efficiency.1 None of these studies has investigated whether the finding of incremental information in IV holds up when separating C and J computed from high-frequency returns, or when including both daily, weekly, and monthly realized measures in HAR-type specifications. An interesting alternative to using individual option prices might have been to use model-free implied volatilities as in Jiang and Tian (2005). However, Andersen and Bondarenko (2007) find that these are dominated by the simpler Black–Scholes implied volatilities in terms of forecasting power. The remainder of the paper is laid out as follows. In the next section we briefly describe realized volatility and the nonparametric identification of its separate continuous sample path and jump components. In Section 3 we discuss the derivative pricing model. Section 4 describes our data. In Section 5 the empirical results are presented, and Section 6 concludes. 2. 
The econometrics of jumps

We assume that the logarithm of the asset price, p(t), follows the general stochastic volatility jump diffusion model

    dp(t) = µ(t) dt + σ(t) dw(t) + κ(t) dq(t),   t ≥ 0.   (1)
The mean µ(·) is assumed continuous and locally bounded, the instantaneous volatility σ (·) > 0 is càdlàg, and w(·) is the driving standard Brownian motion. The counting process q(t ) is normalized such that dq(t ) = 1 corresponds to a jump at time t and dq(t ) = 0 otherwise. Hence, κ(t ) is the random jump size at time t if dq(t ) = 1. The intensity of the arrival process for jumps, λ(t ), is possibly time-varying, but does not allow infinite activity jump processes. Note that the leverage effect is accommodated in (1) through possible dependence between σ (·) and w(·), see Barndorff-Nielsen, Graversen, Jacod and Shephard (2006) and Barndorff-Nielsen, Shephard, and Winkel (2006). The quadratic variation [p](t ) is defined for any semimartingale by
    [p](t) = plim ∑_{j=1}^{K} (p(s_j) − p(s_{j−1}))²,   (2)

where 0 = s_0 < s_1 < ··· < s_K = t and the limit is taken for max_j |s_j − s_{j−1}| → 0 as K → ∞. In the model (1), we have in wide generality

    [p](t) = ∫_0^t σ²(s) ds + ∑_{j=1}^{q(t)} κ²(t_j),   (3)
1 See, e.g., Jorion (1995), Xu and Taylor (1995), Covrig and Low (2003), and Pong et al. (2004) on the foreign exchange market, Day and Lewis (1992), Canina and Figlewski (1993), Lamoureux and Lastrapes (1993), Christensen and Prabhala (1998), Fleming (1998), and Blair et al. (2001) on the stock market, and Amin and Morton (1994) on the bond market.
where 0 ≤ t_1 < t_2 < ··· are the jump times. In (3), quadratic variation is decomposed as integrated volatility plus the sum of squared jumps through time t. Assume that M + 1 evenly spaced intra-period observations for period t are available on the log-price p_{t,j}. The continuously compounded intra-period returns are

    r_{t,j} = p_{t,j} − p_{t,j−1},   j = 1, …, M, t = 1, …, T,   (4)

where T is the number of periods in the sample. Realized volatility for period t is given by the sum of squared intra-period returns,

    RV_t = ∑_{j=1}^{M} r²_{t,j},   t = 1, …, T.   (5)
Following Barndorff-Nielsen and Shephard (2004, 2006), the nonparametric separation of the continuous sample path and jump components of quadratic variation in (3) can be done through the related bipower and tripower variation measures. The staggered (skip-k, with k ≥ 0) realized bipower variation is defined as

    BV_t = µ_1^{−2} (M/(M − (k + 1))) ∑_{j=k+2}^{M} |r_{t,j}| |r_{t,j−k−1}|,   t = 1, …, T,   (6)

where µ_1 = √(2/π). In theory, a higher value of M improves precision of the estimators, but in practice it also makes them more susceptible to market microstructure effects, such as bid-ask bounces, stale prices, measurement errors, etc., introducing artificial (typically negative) serial correlation in returns, see, e.g., Hansen and Lunde (2006) and Barndorff-Nielsen and Shephard (2007). Huang and Tauchen (2005) show that staggering (i.e., setting k ≥ 1) mitigates the resulting bias in (6), since it avoids the multiplication of the adjacent returns r_{t,j} and r_{t,j−1} that by (4) share the log-price p_{t,j−1} in the non-staggered (i.e., k = 0) version of (6). Further, staggered realized tripower quarticity is

    TQ_t = µ_{4/3}^{−3} (M²/(M − 2(k + 1))) ∑_{j=2k+3}^{M} |r_{t,j}|^{4/3} |r_{t,j−k−1}|^{4/3} |r_{t,j−2k−2}|^{4/3},   (7)
where µ_{4/3} = 2^{2/3} Γ(7/6)/Γ(1/2). We follow Huang and Tauchen (2005) and use k = 1 in (6) and (7) in our empirical work. The choice of k has no impact on asymptotic results. Combining (2) and (5), RV_t is by definition a consistent estimator of the per-period increment [p](t) − [p](t − 1) to quadratic variation as M → ∞. At the same time, BV_t is consistent for the integrated volatility portion of the increment,
    BV_t →_p ∫_{t−1}^{t} σ²(s) ds as M → ∞,   (8)
as shown by Barndorff-Nielsen and Shephard (2004) and BarndorffNielsen, Shephard, and Winkel (2006). It follows that the difference RV t − BV t converges to the sum of squared jumps that have occurred during the period. Of course, it may be non-zero in finite samples due to sampling variation even if no jump occurred during period t, so a notion of a ‘‘significant jump component’’ is needed. Following e.g. Huang and Tauchen (2005) and Andersen et al. (2007), we apply the (ratio) test statistic
    Z_t = √M (RV_t − BV_t) RV_t^{−1} / ((µ_1^{−4} + 2µ_1^{−2} − 5) max{1, TQ_t BV_t^{−2}})^{1/2},   t = 1, …, T.   (9)
In the absence of jumps, Zt →d N (0, 1) as M → ∞, and large positive values indicate that jumps occurred during period t. Huang and Tauchen (2005) show that market microstructure noise may bias the test against finding jumps, but also that staggering alleviates the bias.
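The measures (5)–(9), together with the jump separation in (10) and (11) below, are mechanical to compute from a vector of intra-period returns. A minimal sketch with k = 1 staggering and α = 0.1% as in the empirical work; the function name and the simulated returns are illustrative:

```python
import numpy as np
from math import gamma, pi, sqrt

MU1 = sqrt(2 / pi)                                  # mu_1 = sqrt(2/pi)
MU43 = 2 ** (2 / 3) * gamma(7 / 6) / gamma(1 / 2)   # mu_{4/3}

def realized_measures(r, k=1, z_crit=3.0902):       # Phi^{-1}(0.999) ~ 3.0902
    """RV (5), staggered BV (6) and TQ (7), ratio test (9), J/C split (10)-(11)."""
    M = len(r)
    a = np.abs(r)
    rv = np.sum(r ** 2)                                                      # (5)
    bv = MU1 ** -2 * M / (M - (k + 1)) * np.sum(a[k + 1:] * a[:M - k - 1])   # (6)
    tq = (MU43 ** -3 * M ** 2 / (M - 2 * (k + 1))
          * np.sum(a[2 * k + 2:] ** (4 / 3)
                   * a[k + 1:M - k - 1] ** (4 / 3)
                   * a[:M - 2 * k - 2] ** (4 / 3)))                          # (7)
    z = (sqrt(M) * (rv - bv) / rv
         / sqrt((MU1 ** -4 + 2 * MU1 ** -2 - 5) * max(1.0, tq / bv ** 2)))   # (9)
    j = (rv - bv) if z > z_crit else 0.0                                     # (10)
    return rv, bv, j, rv - j                                                 # (11)

rng = np.random.default_rng(2)
r = rng.normal(0.0, 0.001, 2000)   # simulated diffusive 5-min returns
r[1000] += 0.02                    # add one sizeable jump
rv, bv, j, c = realized_measures(r)
```

On this simulated draw the single large return is flagged by the ratio test, so J picks up RV − BV and C falls back to BV.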
The (significant) jump component of realized volatility is now

    J_t = I_{{Z_t > Φ_{1−α}}} (RV_t − BV_t),   t = 1, …, T,   (10)

where I_{{A}} is the indicator for the event A, Φ_{1−α} the 100(1 − α)% point in the standard normal distribution, and α the significance level. When I_{{Z_t > Φ_{1−α}}} = 1, J_t is excess realized volatility above bipower variation, and hence attributable to jumps in prices. The continuous component of quadratic variation is estimated by the remainder of RV_t,

    C_t = RV_t − J_t,   t = 1, …, T.   (11)

This way, C_t equals RV_t if there is no significant jump during period t, and BV_t if there is, i.e., C_t = I_{{Z_t ≤ Φ_{1−α}}} RV_t + I_{{Z_t > Φ_{1−α}}} BV_t. For any standard significance level α < 1/2, both J_t and C_t from (10) and (11) are non-negative because Φ_{1−α} is. Consistency of each component as estimators of the corresponding components of quadratic variation, i.e.,

    C_t →_p ∫_{t−1}^{t} σ²(s) ds   and   J_t →_p ∑_{j=q(t−1)+1}^{q(t)} κ²(t_j),

may be achieved by letting α → 0 and M → ∞ simultaneously. Hence, this high-frequency data approach allows for period-by-period nonparametric consistent estimation of both components of quadratic variation in (3).

3. Derivative pricing model

For the construction of implied volatility, we let c denote the call option price, X the exercise or strike price, τ the time to expiration of the option, F the price of the underlying futures contract with delivery date ∆ periods after option expiration, and r the riskless interest rate. We use the futures option pricing formula, see Bates (1996a,b),

    c(F, X, τ, ∆, r, σ²) = e^{−r(τ+∆)} (F Φ(d) − X Φ(d − √(σ²τ))),
    d = (ln(F/X) + ½ σ²τ) / √(σ²τ),   (12)
where Φ (·) is the standard normal c.d.f. and σ the futures return volatility. The case ∆ = 0 (no delivery lag) corresponds to the wellknown Black (1976) and Garman and Kohlhagen (1983) futures option formula. For general ∆ > 0, regarding the futures contract as an asset paying a continuous dividend yield equal to the riskless rate r, the asset price in the standard Black and Scholes (1973) and Merton (1973) formula is replaced by the discounted futures price e−r (τ +∆) F . Jorion (1995) applied (12) with ∆ = 0 to the currency option market, whereas Bates (1996a,b) used delivery lags ∆ specific to the Philadelphia Exchange (PHLX) and the Chicago Mercantile Exchange (CME), respectively. We consider serial $/DM and S&P 500 futures options with monthly expiration cycle traded at the CME, and equivalent T-bond futures options traded at the Chicago Board of Trade (CBOT). The contract specifications do not uniquely identify the particular T-bond serving as underlying asset for the bond futures, requiring merely that it does not mature and is not callable for at least 15 years from the first day of the delivery month of the underlying futures. The delivery month of the underlying futures contracts follows a quarterly (March) cycle, with delivery date on the third Wednesday of the month for currency and bond futures, and the third Friday for stock index futures. The options expire two Fridays prior to the third Wednesday of each month in the currency case, on the last Friday followed by at least two business days in the month in the bond case, and on the third Friday in the stock case, except every third month where it is shifted to the preceding Thursday to avoid ‘‘triple witching hour’’ problems associated with
simultaneous maturing of the futures, futures options, and index options. Upon exercise, the holder of the option receives a position at the strike X in the futures, plus the intrinsic value F − X in cash, on the following trading day, so the delivery lag is ∆ = 3/365 (from Friday to Monday), except ∆ = 1/365 (Thursday to Friday) every third month in the stock case. Finally, following French (1984), τ is measured in trading days when used with volatilities (σ²τ in (12)) and in calendar days when concerning interest rates (in r(τ + ∆)). Given observations on the option price c and the variables F, X, τ, ∆, and r, an implied volatility (IV) estimate in variance form can be backed out from (12) by numerical inversion of the nonlinear equation c = c(F, X, τ, ∆, r, IV) with respect to IV. In our empirical work, IV measured one month prior to expiration is used as a forecast of subsequent RV and its components C and J (Section 2), measured from high-frequency returns over the remaining life of the option, i.e., the one-month interval ending at expiration. For stocks and bonds, these are returns on the futures, i.e., the underlying asset. In the currency case, we use returns to the $/DM spot exchange rate. Differences between this and the futures rate underlying the option stem mainly from the interest differential in the interest rate parity

    ln F = p + (r_$ − r_{DM}) τ   (13)
from international finance and should be slight. Condition (13) is exact for constant interest rates, since then forward and futures prices coincide (Cox et al., 1981), and an approximation for stochastic rates. Indeed, under (13), the Garman and Kohlhagen (1983) spot exchange rate option pricing formula reduces to (12). This European style formula is here applied to American style options since early exercise premia are very small for short-term, at-the-money (ATM, X = F) futures options, as noted by Jorion (1995). Although (12) is used as a common standard among practitioners and in the empirical literature, its derivation does not accommodate jumps, and hence J may not be forecast very well by IV backed out from this formula. On the other hand, it is consistent with a time-varying volatility process for a continuous sample path futures price process, suggesting that IV should have better forecasting power for C. In fact, our empirical results below show that IV can predict both C and J, although it is confirmed that J is the most difficult component to predict. The findings suggest that in practice, option prices are, at least to some extent, calibrated to incorporate the possibility of future jumps. For our analysis, this reduces the empirical need to invoke a more general option pricing formula explicitly allowing for jumps. Such an extension would entail estimation of additional parameters, including prices of volatility and jump risk, which would be a considerable complication. If anything, the results would only reveal that option prices contain even more information than that reflected in our IV measure.

4. Data description

Serial futures options with monthly expiration cycle were introduced in January 1987 (month where option price is sampled—expiration is the following month) in the $/DM market and in October 1990 for 30 year T-bonds.
Our option price data cover the period from inception through May 1999 for $/DM and through November 2002 for bonds, and from January 1990 through November 2002 for S&P 500 futures options. We use open auction closing prices of one-month ATM calls obtained from the Commodity Research Bureau, recorded on the Tuesday after the expiration date of the preceding option contract. The US Eurodollar deposit one-month middle rate from Datastream is used for the risk-free rate r. The final samples are time series of length n of
annualized IV measures from (12), covering nonoverlapping one-month periods, with n = 149 for the currency market,2 155 for the stock market, and 146 for the bond market. For RV and its separate components we use the same data as Andersen et al. (2007). These are based on five-minute observations on $/DM spot exchange rates, S&P 500 futures prices, and T-bond futures prices. There is a total of 288 high-frequency returns per day (r_{t,j} from (4)) for the currency market, 97 per day for the stock market, and 79 for the bond market. We use the nonparametric procedure from Section 2 to construct monthly realized volatility measures (in annualized terms) covering exactly the same periods as the IV estimates, so each of the n months has M of about 6336 (288 returns per day for approximately 22 trading days) for the foreign exchange market, about 2134 (22 × 97) for the stock market, and about 1738 (22 × 79) for the bond market. As suggested by Andersen et al. (2007), a significance level of α = 0.1% is used to detect jumps and construct the series for J and C from (10) and (11). We find significant jumps in 148 out of the n = 149 months in the foreign exchange market, in 120 of the 155 months in the stock market, and in 138 of the 146 months in the bond market. Thus, jumps are non-negligible in all three markets. If implied volatility were given by the conditional expectation of future realized volatility, we would expect RV and IV to have equal unconditional means, and RV to have a higher unconditional standard deviation in the time series than IV. This pattern is confirmed in the foreign exchange and stock markets, where both RV and C have higher sample standard deviations (0.007 and 0.006 in the foreign exchange market, 0.032 and 0.027 in the stock market) than IV (0.005 in the foreign exchange market and 0.024 in the stock market), and almost in the bond market, where RV, C, and IV have about the same standard deviation (0.003).
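The jump separation just described relies on equations (4)–(11) of Section 2, which are not reproduced in this section. As a hedged sketch, the following implements one standard version of the ratio-type jump test based on bipower variation and tripower quarticity (in the spirit of Barndorff-Nielsen and Shephard, 2004, and Huang and Tauchen, 2005); it may differ in small-sample details from the paper's exact (9)–(11).

```python
import math
from statistics import NormalDist

def split_rv(returns, alpha=0.001):
    """Split a period's realized variance into continuous (C) and jump (J)
    parts using bipower variation and a ratio-type Z test. A sketch of one
    standard construction, not necessarily identical to the paper's (9)-(11)."""
    M = len(returns)
    a = [abs(r) for r in returns]
    rv = sum(r * r for r in returns)
    # Bipower variation: mu_1^{-2} * sum of adjacent absolute return products.
    bv = (math.pi / 2.0) * sum(a[j] * a[j - 1] for j in range(1, M))
    # Tripower quarticity, with mu_{4/3} = E|Z|^{4/3} for standard normal Z.
    mu43 = 2.0 ** (2.0 / 3.0) * math.gamma(7.0 / 6.0) / math.gamma(0.5)
    tq = M * mu43 ** -3 * sum((a[j] * a[j - 1] * a[j - 2]) ** (4.0 / 3.0)
                              for j in range(2, M))
    # Ratio statistic, asymptotically N(0,1) under the no-jump null.
    z = ((rv - bv) / rv) / math.sqrt(
        (((math.pi / 2.0) ** 2 + math.pi - 5.0) / M) * max(1.0, tq / bv ** 2))
    jump = max(rv - bv, 0.0) if z > NormalDist().inv_cdf(1.0 - alpha) else 0.0
    return rv - jump, jump   # (C, J), with C + J = RV by construction
```

A period of 288 five-minute returns containing one large move is flagged as having a significant jump at the α = 0.1% level, while a flat period is not.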
The unconditional sample mean of IV is slightly higher than that of RV in the stock (0.032 vs 0.029) and bond (0.009 vs 0.007) markets, possibly reflecting a negative price of volatility risk (cf. Bollerslev and Zhou, 2006), an early exercise premium, or the overnight closing period in the stock and bond markets. The opposite holds in the foreign exchange market (0.012 vs 0.013), where there is round-the-clock trading. Time series plots of the four monthly volatility measures are exhibited in Fig. 1, where Panel A is for the foreign exchange market, Panel B the stock market, and Panel C the bond market. In the foreign exchange and stock markets, the continuous component C of realized volatility is close to RV itself. The new variable in our analysis, implied volatility IV, is also close to RV, but not as close as C. In the bond market (Panel C), C is below RV, and IV hovers above both. In all three markets, the jump component J appears to exhibit less serial dependence than the other volatility measures, consistent with Andersen et al. (2007). This is evidence of the importance of analyzing the continuous and jump components separately.

5. Econometric models and empirical results

In this section we study the relation between realized volatility, together with its disentangled components, and implied volatility from the associated option contract. We apply the Heterogeneous Autoregressive (HAR) model in our setting with implied volatility, and we introduce a multivariate extension. Each table reports results for the foreign exchange market in Panel A, the stock market in Panel B, and the T-bond market in Panel C.
2 Trading in $/DM options declined near the introduction of the Euro, and for January 1999 no one-month currency option price is available, even though quarterly contract prices are. An IV estimate is constructed by linear interpolation between IV for December 1998 and February 1999.
T. Busch et al. / Journal of Econometrics 160 (2011) 48–57

[Fig. 1. Time series plots of monthly volatility measures (IV, RV, C, and J). Panel A: foreign exchange data, Jan/87–Jan/99. Panel B: S&P 500 index data, Jan/90–Jan/02. Panel C: Treasury bond data, Oct/90–Oct/02.]
5.1. Heterogeneous autoregressive (HAR) model

In forecasting future realized volatility, it may be relevant to place more weight on recent squared returns than on those from the more distant past. This is done in a parsimonious fashion in the HAR model of Corsi (2009). When applying it to RV itself, we denote the model HAR-RV, following Corsi (2009). When separating the RV regressors into their C and J components, we denote the model HAR-RV-CJ, following Andersen et al. (2007). We modify the HAR-RV-CJ model in three directions. First, we include implied volatility (IV) as an additional regressor and abbreviate the resulting model HAR-RV-CJIV. Secondly, in the following subsection the HAR approach is applied to separate forecasting of C and J, rather than total realized volatility RV = C + J, and we denote the corresponding models HAR-C-CJIV and HAR-J-CJIV, respectively. Thirdly, Andersen et al. (2007) estimate HAR models with the regressand sampled at overlapping time intervals, e.g., monthly RV sampled at the daily frequency, causing serial correlation in the error term. This does not necessarily invalidate the parameter estimates, although an adjustment must be made to obtain correct standard errors. However, options expire according to a monthly cycle (Section 3), and the analysis in Christensen and Prabhala (1998) suggests that use of overlapping data may lead to erroneous inferences in a setting involving both IV and RV. Thus, in all our regression specifications, see e.g. (14) below, the regressand and regressors cover nonoverlapping time intervals. The h-day realized volatility in annualized terms is

RV_{t,t+h} = (252/h)(RV_{t+1} + RV_{t+2} + · · · + RV_{t+h}).
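A minimal sketch of this h-day aggregation, using hypothetical daily figures:

```python
def agg_rv(daily_rv, t, h):
    """Annualized h-day realized volatility RV_{t,t+h} = (252/h) times the
    sum of the h daily measures following day t; daily_rv[s] holds day s."""
    return 252.0 / h * sum(daily_rv[t + 1:t + h + 1])

# With a flat hypothetical daily series, every horizon h gives the same
# annualized level, illustrating E[RV_{t,t+h}] = 252 E[RV_{t+1}].
daily = [0.0001] * 30
levels = [agg_rv(daily, 0, h) for h in (1, 5, 22)]
```

Here `daily` is an invented placeholder series; in the paper the daily measures come from five-minute returns as in Section 2.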
Henceforth, we use t to indicate trading days. Thus, RV_t is now the daily realized volatility for day t from (5), while RV_{t,t+h} is a daily (h = 1), weekly (h = 5), or monthly (h = 22) realized volatility, and similarly for the continuous (C_{t,t+h}) and jump (J_{t,t+h}) components, where the jump test Z_t from (9) is implemented using h-day period lengths for RV, BV, TQ, and M. Note that RV_{t,t+1} = 252 RV_{t+1}, and under stationarity E[RV_{t,t+h}] = 252 E[RV_{t+1}] for all h. Each monthly realized measure is constructed using a value of h exactly matching the number of trading days covered by the associated implied volatility, but for notational convenience we continue to write h = 22 for all monthly realized measures. Finally, IV_t denotes the implied volatility backed out from the price of the relevant one-month option sampled on day t, and is in variance form. The HAR-RV-CJIV model is the Mincer and Zarnowitz (1969) type regression

RV_{t,t+22} = α + γ_m x_{t−22,t} + γ_w x_{t−5,t} + γ_d x_t + β IV_t + ε_{t,t+22},  t = 22, 44, 66, . . . , 22n,    (14)
where ε_{t,t+22} is the monthly forecasting error, and x_{t−h,t} is either RV_{t−h,t} or the vector (C_{t−h,t}, J_{t−h,t}). When a variable is not included in the specific regression, β = 0 or γ_m = γ_w = γ_d = 0 is imposed. Note that x_{t−22,t} contains lagged realized volatility measures covering a month, whereas x_{t−5,t} and x_t allow extracting information about future volatility from the more recent (one week and one day) history of past squared returns. Table 1 shows the results for the HAR-RV-CJIV model in (14). We report coefficient estimates (standard errors in parentheses) together with adjusted R² and the Breusch–Godfrey LM test for residual autocorrelation up to lag 12 (one year), denoted AR12, used here instead of the standard Durbin–Watson statistic due to the presence of lagged endogenous variables in several of the regressions. Under the null hypothesis of no serial dependence in the residuals, the AR12 statistic is asymptotically χ² with 12 degrees of freedom. The last column (MAFE) reports out-of-sample mean absolute forecast errors (×100) for 24 rolling window one-step ahead forecasts, each based on n − 24 observations. In the first line of each panel, x = RV, so this is a monthly frequency HAR-RV model (Corsi, 2009). In the foreign exchange market (Panel A), the combined impact of the monthly, weekly, and daily RV on future realized volatility is 0.22 + 0.10 + 0.17 = 0.49, strikingly close to the first order autocorrelation of monthly RV, which is 0.46. The t-statistics are 1.92, 0.68, and 2.06, respectively, indicating that the weekly variable contains only minor information about future monthly exchange rate volatility. In the stock market (Panel B), all three RV measures are significant, though the weekly measure has a negative coefficient. In contrast to the foreign exchange market, the AR12 test is significant. Panel C is for bond data, and the results in the first line show that only monthly RV is significant.
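Estimation of (14) on nonoverlapping monthly samples, along with the MAFE criterion, can be sketched as follows. The data, the helper names, and the rolling-window details below are hypothetical illustrations under the conventions stated above, not the paper's code.

```python
import numpy as np

def har_design(rv_daily, iv_daily, n):
    """Regressand RV_{t,t+22} and regressors (1, RV_{t-22,t}, RV_{t-5,t},
    RV_t, IV_t) of (14), sampled at nonoverlapping dates t = 22, 44, ..."""
    back = lambda t, h: 252.0 / h * rv_daily[t - h + 1:t + 1].sum()  # RV_{t-h,t}
    y, X = [], []
    for t in range(22, 22 * n + 1, 22):
        if t + 22 >= len(rv_daily):
            break
        y.append(252.0 / 22 * rv_daily[t + 1:t + 23].sum())          # RV_{t,t+22}
        X.append([1.0, back(t, 22), back(t, 5), 252.0 * rv_daily[t], iv_daily[t]])
    return np.asarray(y), np.asarray(X)

def ols(y, X):
    # Least-squares coefficient estimates.
    return np.linalg.lstsq(X, y, rcond=None)[0]

def mafe(y, X, n_out=24):
    """Mean absolute forecast error (x100) over n_out rolling one-step
    forecasts, each estimated on a window of len(y) - n_out observations."""
    w = len(y) - n_out
    errs = [abs(y[i] - X[i] @ ols(y[i - w:i], X[i - w:i]))
            for i in range(w, len(y))]
    return 100.0 * float(np.mean(errs))
```

Splitting x into (C, J) would simply add the corresponding lagged columns to the design matrix; the paper's other specifications restrict subsets of the coefficients to zero.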
In row two of each panel of Table 1, x = (C, J), so this is a monthly frequency HAR-RV-CJ model (Andersen et al., 2007). The conclusions for C are similar to those for RV in the first row, except that the monthly and weekly components become insignificant in the stock market. The jump components are insignificant, except daily J in the stock and bond markets. Adjusted R² improves when moving from the first to the second line of each panel of Table 1, confirming the enhanced in-sample (Mincer–Zarnowitz) forecasting performance obtained by splitting RV into its separate components, as also found by Andersen et al. (2007). Out-of-sample forecasting performance (MAFE) improves in the stock market when separating C and J, but remains unchanged in the bond market, and actually deteriorates in the currency market, hence showing the relevance of including this criterion in the analysis. Next, implied volatility is added to the information set at time t in the HAR regressions. When RV is included together with IV (fourth row), all the realized volatility coefficients turn
Table 1
Realized volatility HAR models.

[Table: columns Const., RV_{t−22,t}, RV_{t−5,t}, RV_t, C_{t−22,t}, C_{t−5,t}, C_t, J_{t−22,t}, J_{t−5,t}, J_t, IV_t, Adj. R² (%), AR12, MAFE; Panel A: foreign exchange data, Panel B: S&P 500 data, Panel C: Treasury bond data.]

Note: The table shows HAR-RV-CJIV results for the specification (14) with standard errors in parentheses. Adj. R² denotes the adjusted R² for the regression and AR12 is the LM test statistic (with 12 lags) for the null of no serial correlation in the residuals. The last column (MAFE) reports out-of-sample mean absolute forecast errors (×100) for 24 rolling window one-step ahead forecasts based on n − 24 observations. * Denotes rejection at 5% significance level. ** Denotes rejection at 1% significance level.
insignificant in the foreign exchange and bond markets, and only daily RV remains significant in the stock market. Indeed, IV gets t-statistics of 6.15, 6.84, and 4.46 in the three markets, providing clear evidence of the relevance of IV in forecasting future volatility. The last row of each panel shows the results when including C and J together with IV, i.e., the full HAR-RV-CJIV model (14). In the foreign exchange market, IV completely subsumes the information content of both C and J at all frequencies. Adjusted R2 is about equally high in the third line of the panel, where IV is the sole regressor and where also MAFE takes the best (lowest) value in the panel. In the stock market, both daily components of RV remain significant, and the adjusted R2 increases from 62% to 68% relative to having IV as the sole regressor, but again MAFE points to the specification with only IV included as the best forecast. In the bond market, IV gets the highest t-statistic, as in the other two markets. In this case, the monthly jump component Jt −22,t is also significant and adjusted R2 improves markedly, both between lines three and four (adding realized volatility) and between lines four and five (separating the RV components), but the ordering by MAFE is the reverse. The AR12 test does show mild signs of misspecification in all three markets. The finding so far is that IV as a forecast of future volatility contains incremental information relative to return-based forecasts in all three markets, even when using the new nonparametric jump separation technique for RV and exploiting potential forecasting power of the C and J components at different frequencies using the HAR methodology. Indeed, in the preferred specification for the foreign exchange market, all realized measures are insignificant or left out, showing that the conclusions of Jorion (1995), Xu and Taylor (1995), and Pong et al. (2004) hold up even to these new improvements of the return-based forecasts. 
In fact, the superiority of the implied volatility forecast of future volatility extends to all three markets based on out-of-sample forecasting, where MAFE
within each panel in Table 1 is minimized by the specification with IV as the sole forecasting variable.

5.2. Forecasting the continuous and jump components

We now split RV_{t,t+22} on the left hand side of (14) into its continuous and jump components, C_{t,t+22} and J_{t,t+22}, and forecast these separately. This is particularly interesting because Andersen et al. (2007) show that the two components exhibit very different time series properties, as is also evident from Fig. 1. Thus, forecasting should be carried out in different manners for the two, and we modify the HAR methodology accordingly. Andersen et al. (2007) did not consider separate forecasting of the components. Since our specifications include implied volatility as well, we are able to assess separately the incremental information in option prices on the future continuous and jump components of volatility. Our HAR-C-CJIV model for forecasting the future continuous component is

C_{t,t+22} = α + γ_m x_{t−22,t} + γ_w x_{t−5,t} + γ_d x_t + β IV_t + ε_{t,t+22},  t = 22, 44, 66, . . . , 22n,    (15)
where Ct ,t +22 replaces RV t ,t +22 as regressand compared to (14) and x now contains either C or the vector (C , J ). Table 2 shows the results in the same format as in Table 1. The results in Table 2 are similar to those in Table 1. This confirms that C is as amenable to forecasting as RV, hence demonstrating the value of the new approach of separate HAR modeling of C and J. The AR12 tests show only modest signs of misspecification, except in the bond market. In the foreign exchange and bond markets, adjusted R2 s are similar to comparable specifications in Table 1, and in the stock market they are higher than in Table 1. Implied volatility generally gets higher coefficients and t-statistics than the lagged C and J components,
Table 2
Continuous component HAR models.

[Table: columns Const., C_{t−22,t}, C_{t−5,t}, C_t, J_{t−22,t}, J_{t−5,t}, J_t, IV_t, Adj. R² (%), AR12, MAFE; Panel A: foreign exchange data, Panel B: S&P 500 data, Panel C: Treasury bond data.]

Note: The table shows HAR-C-CJIV results for (15), using the same definitions and layout as in Table 1.
and adjusted R² is highest when IV is included along with these. In the foreign exchange market, IV completely subsumes the information content of the realized measures, just as in Panel A of Table 1. The out-of-sample forecasting evidence based on MAFE suggests using IV as the sole forecasting variable in the foreign exchange and stock markets, and combining IV with the realized measures in the bond market. We next consider the predictability of the jump component of realized volatility. The relevant HAR-J-CJIV model takes the form

J_{t,t+22} = α + γ_m x_{t−22,t} + γ_w x_{t−5,t} + γ_d x_t + β IV_t + ε_{t,t+22},  t = 22, 44, 66, . . . , 22n,    (16)
where x now contains either J or (C , J ). Table 3 reports the results. Specifically, in line three of each panel, IV is used to predict the jump component of future volatility. It is strongly significant in all three markets and gets higher t-statistics than all other variables considered. The highest adjusted R2 s in the table are obtained in the fourth line of each panel, where all variables are included. Here, the AR12 test shows no signs of misspecification in the foreign exchange and bond markets, although it rejects in the stock market. Implied volatility remains highly significant in all three markets and turns out to be the strongest predictor of future jumps (in terms of t-statistics) even when the C and J components at all frequencies are included. The coefficient on IV ranges between 0.07 and 0.23 across markets and specifications, consistent with the mean jump component being an order of magnitude smaller than implied volatility in Fig. 1. Indeed, in the bond market, IV subsumes the information content of both C and J at all frequencies. In the foreign exchange and stock markets, the out-of-sample forecasting evidence suggests that IV should be used alone as the sole forecasting variable even when forecasting only the future jump component. In the bond market, the MAFE criterion selects the forecast using only past jump components. Comparing across Tables 1–3, the results are most similar in Tables 1 and 2, so RV and C behave similarly also in this forecasting context, and our results show that IV is important in forecasting both. The difference in results when moving to Table 3 reinforces that C and J should be treated separately. When doing so, we find that, firstly, jumps are predictable from variables in the information set, and, secondly, IV retains incremental information,
thus suggesting that option prices incorporate jump information. Finally, the out-of-sample forecasting evidence suggests using IV as the sole forecasting variable in the foreign exchange and stock markets, whether forecasting RV or either of its components.

5.3. The vector heterogeneous autoregressive (VecHAR) model

We now introduce a simultaneous system approach for joint analysis of IV, C, and J. The reason is that, firstly, the results up to this point have been obtained in different regression equations that are not independent, and some relevant joint hypotheses involve cross-equation restrictions. Secondly, IV may be measured with error stemming from non-synchronous option and futures prices, misspecification of the option pricing formula, etc. Such errors-in-variables problems generate correlation between regressors and error terms in the forecasting equations for C and J, and thus an endogeneity problem. In addition, the realized measures contain sampling error, as discussed in Section 2. Our simultaneous system approach provides an efficient method for handling these endogeneity issues.3 Thus, we consider the vector heterogeneous autoregressive (VecHAR) system

[ 1  0  −β1 ] [ C_{t,t+22} ]   [ α1 ]   [ A11m  A12m  0    ] [ C_{t−22,t} ]
[ 0  1  −β2 ] [ J_{t,t+22} ] = [ α2 ] + [ A21m  A22m  0    ] [ J_{t−22,t} ]
[ 0  0   1  ] [ IV_t       ]   [ α3 ]   [ A31m  A32m  A33m ] [ IV_{t−1}   ]

      [ A11w  A12w ]                 [ A11d  A12d ]           [ ε1_{t,t+22} ]
    + [ A21w  A22w ] [ C_{t−5,t} ] + [ A21d  A22d ] [ C_t ] + [ ε2_{t,t+22} ]    (17)
      [ A31w  A32w ] [ J_{t−5,t} ]   [ A31d  A32d ] [ J_t ]   [ ε3_{t,t+22} ]
The first two equations comprise the forecasting Eqs. (15) and (16) for C and J, and the third endogenizes IV. There are two sources
3 Engle and Gallo (2006) also consider a trivariate system, but for three realized quadratic variation measures, and without the HAR feature or jumps. Implied volatility is also not included in their system, although it is in subsequent univariate regressions.
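The third equation of (17) plays a role similar to instrumenting IV_t with IV_{t−1}. The paper estimates the full system by FIML; as a simpler single-equation sketch of how such an instrument undoes an errors-in-variables bias, consider a hand-rolled two-stage least squares estimator on simulated data (all names and numbers below are hypothetical, and this is not the paper's estimator):

```python
import numpy as np

def tsls(y, X, Z):
    """Two-stage least squares: project the regressors on the instruments,
    then regress y on the fitted values."""
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first stage
    return np.linalg.lstsq(Xhat, y, rcond=None)[0]    # second stage

# Simulated example: the regressor x carries a measurement-error component u
# that also enters the disturbance, so OLS is biased; the instrument z is
# correlated with x but not with u.
rng = np.random.default_rng(1)
T = 20000
z = rng.standard_normal(T)
u = rng.standard_normal(T)
x = z + u
y = 1.0 + 2.0 * x + u + 0.1 * rng.standard_normal(T)
ones = np.ones(T)
X = np.column_stack([ones, x])
Z = np.column_stack([ones, z])
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
b_iv = tsls(y, X, Z)
```

In this simulation b_ols[1] is pulled away from the true slope of 2 by the common error component, while b_iv[1] is consistent; the FIML system estimation in (17) generalizes this idea to all three equations jointly and is more efficient.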
Table 3
Jump component HAR models.

[Table: columns Const., C_{t−22,t}, C_{t−5,t}, C_t, J_{t−22,t}, J_{t−5,t}, J_t, IV_t, Adj. R² (%), AR12, MAFE; Panel A: foreign exchange data, Panel B: S&P 500 data, Panel C: Treasury bond data.]

Note: The table shows HAR-J-CJIV results for (16), using the same definitions and layout as in Table 1.

Table 4
VecHAR models.

[Table: columns Dep. var., Constant, C_{t−22,t}, C_{t−5,t}, C_t, J_{t−22,t}, J_{t−5,t}, J_t, IV_t, IV_{t−1}, AR12, MAFE; Panels A–C report the three equations (for C_{t,t+22}, J_{t,t+22}, and IV_t) for the foreign exchange, S&P 500, and Treasury bond data.]

Note: The table shows FIML results for the simultaneous VecHAR system (17) with robust (sandwich-formula) standard errors in parentheses. AR12 and MAFE are defined as in Table 1.
of simultaneity in the VecHAR system. Firstly, the off-diagonal terms β1 and β2 in the leading coefficient matrix accommodate dependence of C_{t,t+22} and J_{t,t+22} on the endogenous variable IV_t. Secondly, the system errors may be contemporaneously correlated. In the third equation, option prices may reflect return movements during the previous month, and via the HAR type specification more recent returns may receive higher weight. In addition, our specification allows dependence on IV_{t−1}, i.e., one-day lagged implied volatility sampled on Monday for the same option contract as IV_t, which is sampled on Tuesday. The specification of the third equation is similar to using IV_{t−1} as an additional instrument for IV_t in an instrumental variables treatment of the endogeneity problem, but the system approach in (17) is more general and efficient. In Table 4 we present the results of Gaussian full information maximum likelihood (FIML) estimation of the VecHAR system, with robust standard errors (the sandwich formula H⁻¹VH⁻¹, where H is the Hessian and V the outer-product-gradient matrix) in
parentheses. Of course, the results are asymptotically valid even in the absence of Gaussianity. The AR12 tests show only mild signs of misspecification in the foreign exchange and bond markets, although the tests are significant in two of the equations for the stock market. Implied volatility is strongly significant in the forecasting equations for both C and J in all three markets, showing that option prices contain incremental information beyond that in high-frequency realized measures. In the foreign exchange market, IV subsumes the information content of all other variables in forecasting both C and J. Out-of-sample forecasting performance actually improves for C (first equation) in the VecHAR system relative to comparable univariate specifications (last row of each panel in Table 2). For J, out-of-sample forecasting performance is similar in the VecHAR system and in the last row of each panel in Table 3, with a small improvement in the simultaneous system for the stock market and a small deterioration in the other two markets.
Table 5
LR tests in VecHAR models.

Hypothesis                          Panel A: Currency data     Panel B: S&P 500 data      Panel C: Bond data
                                    LR        d.f.  p-value    LR        d.f.  p-value    LR        d.f.  p-value
H1: A11m = 0, A12m = 0              1.8922     2    0.3883      2.9715    2    0.2263     14.5639    2    0.0007
H2: A11w = 0, A12w = 0              0.0490     2    0.9758      3.8762    2    0.1440      1.4315    2    0.4888
H3: A11d = 0, A12d = 0              2.6273     2    0.2688     32.8235    2    0.0000      3.2478    2    0.1971
H4: β1 = 1                          5.8950     1    0.0152      6.7789    1    0.0092     46.1995    1    0.0000
H5: A11m = 0, A12m = 0, β1 = 1     14.5597     3    0.0022     20.4269    3    0.0001     85.7975    3    0.0000
H6: A11w = 0, A12w = 0, β1 = 1      7.1047     3    0.0686      9.1775    3    0.0270     47.8707    3    0.0000
H7: A11d = 0, A12d = 0, β1 = 1      6.9558     3    0.0733     35.4983    3    0.0000     46.3995    3    0.0000
H8: Ām = 0, Āw = 0, Ād = 0         29.3771    12    0.0035     81.3038   12    0.0000     73.2120   12    0.0000
H9: β2 = 0                         13.2269     1    0.0003     13.4034    1    0.0003     15.0357    1    0.0001
H10: β1 + β2 = 1                    2.2915     1    0.1301      0.0512    1    0.8210     29.0539    1    0.0000

Note: The table shows LR test results for the simultaneous VecHAR system (17), where the matrix notation Āk = [A11k, A12k; A21k, A22k], k = m, w, d, is used.
Table 5 shows results of likelihood ratio (LR) tests of several hypotheses of interest in the VecHAR model. First, the hypothesis H2: A11w = 0, A12w = 0 in (17) is the relevant forecasting efficiency hypothesis in the C equation with respect to both weekly realized components. This hypothesis is not rejected in any of the markets. Indeed, in the foreign exchange market, IV subsumes the information content of C and J at all frequencies, with p-values for H1: A11m = 0, A12m = 0 and H3: A11d = 0, A12d = 0 of 39% and 27% in this market. IV subsumes the information content of the monthly measures (H1) in the stock market, and of the daily measures (H3) in the bond market. Unbiasedness of IV as a forecast of C (H4: β1 = 1) is rejected at the 5% level in all three markets. The estimated coefficient on IV is below unity in all three markets in Table 4, showing that IV is an upward biased forecast of future C. Possible reasons for this phenomenon are that volatility risk is priced (cf. Bollerslev and Zhou, 2006) or that IV reflects information about future J as well, to which we return in H9 and H10. In H5–H7, the unbiasedness hypothesis H4 is tested jointly with the efficiency hypotheses H1–H3. Consistent with previous results, H5–H7 are strongly rejected in the stock and bond markets, while H6–H7 (efficiency with respect to daily and weekly measures along with unbiasedness) are not rejected at the 5% level in the foreign exchange market. Next, we consider cross-equation restrictions, which hence require the system approach. Using the matrix notation Āk = [A11k, A12k; A21k, A22k], k = m, w, d, we examine in H8: Ām = 0, Āw = 0, Ād = 0 the hypothesis that all realized components in both the continuous and jump equations are jointly insignificant. This is rejected in all three markets. In H9: β2 = 0, we examine the hypothesis that IV carries no incremental information about future J, relative to the realized measures. This is strongly rejected in all three markets. Finally, in H10: β1 + β2 = 1, again a cross-equation restriction, we test the hypothesis that IV_t is an unbiased forecast of total realized volatility, RV_{t,t+22} = C_{t,t+22} + J_{t,t+22}. Although unbiasedness of IV as a forecast of future C, H4: β1 = 1, is rejected at the 5% level or better in all markets, H10: β1 + β2 = 1 is not rejected in the foreign exchange and stock markets. This reinforces earlier conclusions that IV does forecast more than just the continuous component, that jumps are, to some extent, predictable, and, indeed, that option prices are calibrated to incorporate information about future jumps.

6. Conclusions and discussion

This paper examines the role of implied volatility in forecasting future realized volatility and jumps in the foreign exchange, stock, and bond markets. Realized volatility is separated into its continuous sample path and jump components, since Andersen
et al. (2007) show that this leads to improved forecasting performance. We assess the incremental forecasting power of implied volatility relative to Andersen et al. (2007). On the methodological side, we apply the HAR model proposed by Corsi (2009) and applied by Andersen et al. (2007). We include implied volatility as an additional regressor, and also consider forecasting of the separate continuous and jump components of realized volatility. Furthermore, we introduce a vector HAR (VecHAR) model for simultaneous modeling of implied volatility and the separate components of realized volatility, controlling for possible endogeneity issues. On the substantive side, our empirical results using both in-sample Mincer and Zarnowitz (1969) regressions and out-of-sample forecasting show that in all three markets, option implied volatility contains incremental information about future return volatility relative to both the continuous and jump components of realized volatility. Indeed, implied volatility subsumes the information content of several realized measures in all three markets. In addition, implied volatility is an unbiased forecast of the sum of the continuous and jump components, i.e., of total realized volatility, in the foreign exchange and stock markets. The out-of-sample forecasting evidence confirms that implied volatility should be used in forecasting future realized volatility or the continuous component of this in all three markets. Finally, our results show that even the jump component of realized return volatility is, to some extent, predictable, and that option implied volatility enters significantly in the relevant jump forecasting equation for all three markets. Overall, our results are interesting and complement the burgeoning realized volatility literature.
What we show is that implied volatility generally contains additional ex ante information on volatility and its continuous sample path and jump components beyond that in realized volatility and its components. This ex ante criterion is not everything that realized volatility may be used for, and it is possibly not the most important use. For example, realized volatility and its components can be used for ex post assessments of what volatility has been, whether there have been jumps in prices or not, etc. Implied volatility (even ex post implied volatility) is not well suited for these purposes.

Acknowledgements

We are grateful to the editors, two anonymous referees, Torben G. Andersen, John Y. Campbell, Per Frederiksen, Pierre Perron, Jeremy Stein, Matti Suominen, Luis Viceira, and participants at various conferences and seminars for useful comments, and especially to Tim Bollerslev for extensive comments and for providing the realized volatility, bipower variation, and tripower quarticity data. We thank Christian Bach, Rolf Karlsen, and Christian Sønderup for research assistance and the Center for Analytical Economics
T. Busch et al. / Journal of Econometrics 160 (2011) 48–57
(CAE) at Cornell, the Center for Analytical Finance (CAF) at Aarhus, the Center for Research in Econometric Analysis of Time Series (CREATES, funded by the Danish National Research Foundation) at Aarhus, the Social Sciences and Humanities Research Council of Canada (SSHRC), and the Danish Social Science Research Council (FSE) for research support. This paper combines and further develops the material in our previous three papers Christensen and Nielsen (2005) and Busch et al. (2005, 2006).

References

Amin, K.I., Morton, A.J., 1994. Implied volatility functions in arbitrage-free term structure models. Journal of Financial Economics 35, 141–180.
Andersen, T.G., Bollerslev, T., Diebold, F.X., 2007. Roughing it up: including jump components in the measurement, modeling, and forecasting of return volatility. Review of Economics and Statistics 89, 701–720.
Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P., 2003. Modeling and forecasting realized volatility. Econometrica 71, 579–625.
Andersen, T.G., Bollerslev, T., Meddahi, N., 2004. Analytical evaluation of volatility forecasts. International Economic Review 45, 1079–1110.
Andersen, T.G., Bondarenko, O., 2007. Construction and interpretation of model-free implied volatility. In: Nelken, I. (Ed.), Volatility as an Asset Class. Risk Books, London, UK, pp. 141–181.
Barndorff-Nielsen, O.E., Graversen, S.E., Jacod, J., Shephard, N., 2006. Limit theorems for realised bipower variation in econometrics. Econometric Theory 22, 677–719.
Barndorff-Nielsen, O.E., Shephard, N., 2004. Power and bipower variation with stochastic volatility and jumps (with discussion). Journal of Financial Econometrics 2, 1–57.
Barndorff-Nielsen, O.E., Shephard, N., 2006. Econometrics of testing for jumps in financial economics using bipower variation. Journal of Financial Econometrics 4, 1–30.
Barndorff-Nielsen, O.E., Shephard, N., 2007. Variation, jumps, market frictions, and high frequency data in financial econometrics. In: Blundell, R., Persson, T., Newey, W.K. (Eds.), Advances in Economics and Econometrics: Theory and Applications, Ninth World Congress. Cambridge University Press, Cambridge, UK, pp. 328–372.
Barndorff-Nielsen, O.E., Shephard, N., Winkel, M., 2006. Limit theorems for multipower variation in the presence of jumps. Stochastic Processes and their Applications 116, 796–806.
Bates, D.S., 1996a. Dollar jump fears, 1984–1992: distributional abnormalities implicit in currency futures options. Journal of International Money and Finance 15, 65–93.
Bates, D.S., 1996b. Jumps and stochastic volatility: exchange rate processes implicit in deutsche mark options. Review of Financial Studies 9, 69–107.
Black, F., 1976. The pricing of commodity contracts. Journal of Financial Economics 3, 167–179.
Black, F., Scholes, M., 1973. The pricing of options and corporate liabilities. Journal of Political Economy 81, 637–654.
Blair, B.J., Poon, S., Taylor, S.J., 2001. Forecasting S&P 100 volatility: the incremental information content of implied volatilities and high-frequency index returns. Journal of Econometrics 105, 5–26.
Bollerslev, T., Zhou, H., 2006. Volatility puzzles: a simple framework for gauging return-volatility regressions. Journal of Econometrics 131, 123–150.
Busch, T., Christensen, B.J., Nielsen, M.Ø., 2005. Forecasting exchange rate volatility in the presence of jumps. QED Working Paper 1187. Queen’s University.
Busch, T., Christensen, B.J., Nielsen, M.Ø., 2006. The information content of treasury bond options concerning future volatility and price jumps. QED Working Paper 1188. Queen’s University.
Canina, L., Figlewski, S., 1993. The informational content of implied volatility. Review of Financial Studies 6, 659–681.
Christensen, B.J., Nielsen, M.Ø., 2005. The implied-realized volatility relation with jumps in underlying asset prices. QED Working Paper 1186. Queen’s University.
Christensen, B.J., Prabhala, N.R., 1998. The relation between implied and realized volatility. Journal of Financial Economics 50, 125–150.
Corsi, F., 2009. A simple approximate long memory model of realized volatility. Journal of Financial Econometrics 7, 174–196.
Covrig, V., Low, B.S., 2003. The quality of volatility traded on the over-the-counter market: a multiple horizons study. Journal of Futures Markets 23, 261–285.
Cox, J.C., Ingersoll, J.E., Ross, S.A., 1981. The relationship between forward prices and futures prices. Journal of Financial Economics 9, 321–346.
Day, T.E., Lewis, C.M., 1992. Stock market volatility and the information content of stock index options. Journal of Econometrics 52, 267–287.
Engle, R.F., Gallo, G.M., 2006. A multiple indicators model for volatility using intradaily data. Journal of Econometrics 131, 3–27.
Fleming, J., 1998. The quality of market volatility forecasts implied by S&P 100 index option prices. Journal of Empirical Finance 5, 317–345.
French, D.W., 1984. The weekend effect on the distribution of stock prices: implications for option pricing. Journal of Financial Economics 13, 547–559.
Garman, M.B., Kohlhagen, S., 1983. Foreign currency option values. Journal of International Money and Finance 2, 231–237.
Hansen, P.R., Lunde, A., 2006. Realized variance and market microstructure noise (with discussion). Journal of Business and Economic Statistics 24, 127–161.
Huang, X., Tauchen, G., 2005. The relative contribution of jumps to total price variance. Journal of Financial Econometrics 3, 456–499.
Jiang, G.J., Tian, Y.S., 2005. The model-free implied volatility and its information content. Review of Financial Studies 18, 1305–1342.
Jorion, P., 1995. Predicting volatility in the foreign exchange market. Journal of Finance 50, 507–528.
Lamoureux, C.G., Lastrapes, W.D., 1993. Forecasting stock-return variance: toward an understanding of stochastic implied volatilities. Review of Financial Studies 6, 293–326.
Merton, R.C., 1973. Theory of rational option pricing. Bell Journal of Economics and Management Science 4, 141–183.
Mincer, J., Zarnowitz, V., 1969. The evaluation of economic forecasts. In: Mincer, J. (Ed.), Economic Forecasts and Expectations. NBER, New York, pp. 3–46.
Pong, S., Shackleton, M.B., Taylor, S.J., Xu, X., 2004. Forecasting currency volatility: a comparison of implied volatilities and AR(FI)MA models. Journal of Banking and Finance 28, 2541–2563.
Xu, X., Taylor, S.J., 1995. Conditional volatility and the informational efficiency of the PHLX currency options market. Journal of Banking and Finance 19, 803–821.
Journal of Econometrics 160 (2011) 58–68
Covariance measurement in the presence of non-synchronous trading and market microstructure noise

Jim E. Griffin a, Roel C.A. Oomen b,c,∗

a School of Mathematics, Statistics and Actuarial Science, The University of Kent, UK
b Deutsche Bank, London, UK
c Department of Quantitative Economics, The University of Amsterdam, The Netherlands
Article history: Available online 6 March 2010

Abstract

This paper studies the problem of covariance estimation when prices are observed non-synchronously and contaminated by i.i.d. microstructure noise. We derive closed form expressions for the bias and variance of three popular covariance estimators, namely realised covariance, realised covariance plus lead and lag adjustments, and the Hayashi and Yoshida estimator, and present a comprehensive investigation into their properties and relative efficiency. Our main finding is that the ordering of the covariance estimators in terms of efficiency crucially depends on the level of microstructure noise, as well as the level of correlation. In fact, for sufficiently high levels of noise, the standard realised covariance estimator (without any corrections for non-synchronous trading) can be most efficient. We also propose a sparse sampling implementation of the Hayashi and Yoshida estimator, study the robustness of our findings using simulations with stochastic volatility and correlation, and highlight some important practical considerations. © 2010 Elsevier B.V. All rights reserved.
1. Introduction

The covariance structure of asset returns is fundamental to many issues in finance, and the importance of accurate covariance estimation can hardly be overstated. In recent years, high frequency data have become increasingly available for a wide range of securities and we have witnessed a shift in focus away from parametric conditional covariance estimation based on daily or weekly data and towards the model-free ex post measurement of realised quantities based on intra-day data (e.g. Andersen et al., 2003; Barndorff-Nielsen and Shephard, 2004). Inherent to high frequency data is microstructure noise which, in a univariate setting, makes the standard realised variance estimator biased and inconsistent. In the recent literature, a number of alternatives have been offered to restore consistency through subsampling (Zhang et al., 2005), kernel-based autocovariance adjustments (Barndorff-Nielsen et al., 2008a), or pre-averaging methods (Podolskij and Vetter, forthcoming; Jacod et al., 2009). In a multivariate setting matters do not simplify because there is the additional challenge of non-synchronous trading effects: when the arrival times of trades are random and hence non-synchronous across assets, returns sampled at regular intervals in
∗ Corresponding author at: Deutsche Bank, London, UK. E-mail addresses: [email protected] (J.E. Griffin), [email protected] (R.C.A. Oomen).
0304-4076/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2010.03.015
calendar time will correlate with preceding and successive returns on other assets, even when the underlying correlation structure is purely contemporaneous (Fisher, 1966). As a consequence, the realised covariance estimator tends to zero as the sampling frequency is increased (Epps, 1979) and to address this issue, two conceptually different approaches have been suggested in the literature. The first mitigates non-trading biases by incorporating lead and lag autocovariance terms into the realised covariance estimator based on synchronized returns (Scholes and Williams, 1977; Dimson, 1979; Cohen et al., 1983). The second more recent approach of Hayashi and Yoshida (2005) operates directly on the non-synchronous data and delivers unbiased covariance estimates by accumulating the cross-product of all fully and partially overlapping event-time returns. The contribution this paper makes is to join the above two streams of literature, and analyze the properties of the three main covariance estimators – realised covariance (RC), realised covariance plus lead–lag adjustment (RCLL), and the Hayashi–Yoshida covariance estimator (HY) – in a unified setting with non-synchronous trading and market microstructure effects. When the efficient price process follows a correlated Brownian motion, observation times are generated by Poisson sampling, and observed prices are subjected to i.i.d. microstructure noise, we can obtain closed form expressions for the bias and mean-squared error. These are then used to compare the relative performance of the three covariance estimators and our main finding is that their ordering in terms of efficiency crucially depends on the level of microstructure noise, as well as the level of correlation. Indeed, when the noise is sufficiently strong, or the correlation sufficiently low, the standard realised covariance estimator can be more efficient than RCLL and HY. In other scenarios, RCLL or HY may be most efficient. In this paper, we discuss optimal sampling and optimal order of lead–lag adjustment and we introduce a skip-sampled HY estimator that can be substantially more efficient when levels of microstructure noise are high. Based on simulations, we find that our results are robust to stochastic volatility and correlation. In an empirical illustration using high frequency trade and quote data, we expose significant lead–lag dependence between asset prices in event time due to ‘‘sluggish’’ price adjustment and we show that this renders the HY estimator downward biased in practice. We propose a lead–lag adjustment to the HY estimator and study its effectiveness in reducing this bias. There are a number of recent papers that address issues related to those investigated here. Bandi and Russell (2005) study RC with microstructure noise, Sheppard (2005) introduces the concept of ‘‘scrambling’’ to study non-trading effects, while Zhang (2006b) studies RC in a general framework with non-trading and microstructure effects. Most recently, Barndorff-Nielsen et al. (2008b) generalize the RCLL estimator to a class of multivariate realised kernels which yield consistent and positive definite covariance estimates based on refresh time sampled returns. Regarding the HY estimator, Voev and Lunde (2007) independently document lead–lag dependence patterns in high frequency data similar to those reported here and they study the effectiveness of a calendar time lead–lag adjustment in such a scenario. Most closely related to ours is the concurrent paper by Martens (2006) that derives closed form expressions for the mean and variance of RCLL in a setting with bid-ask bounce and non-trading effects.
Martens (2006) also discusses alternative sampling and interpolation schemes, a multiplicative bias correction, and studies the performance of the HY estimator. For the latter, he resorts to simulations as closed form expressions are not available. Our paper is distinguished from the above mentioned studies in that it presents a comprehensive treatment of all three covariance estimators simultaneously in a unified and tractable framework that incorporates both non-trading and microstructure effects. The important new insight this study brings is that, depending on the specific features of the observed price process, any of the three estimators can be most efficient.

2. Covariance estimation with non-synchronous and noisy data

Let $S^{(j)}(t)$ denote the time-t efficient (logarithmic) price of asset j, for $t \in [0, 1]$. It is assumed that prices of asset j are observed at a set of discrete times $\{t_m^{(j)}\}_{m=1}^{M_j}$ with $0 \le t_1^{(j)} < \cdots < t_{M_j}^{(j)} \le 1$ and are subject to observation error:

$$p_m^{(j)} = S^{(j)}(t_m^{(j)}) + u_m^{(j)} \quad \text{for } m = 1, \ldots, M_j, \qquad (1)$$

where $u_m^{(j)}$ is a ‘‘noise’’ process to be specified. In practice, the observation times typically correspond to the occurrence of transactions or quote-revisions whereas the observation noise is due to market microstructure effects such as the bid-ask spread (e.g. Hasbrouck, 2007). Thus, the efficient price process is latent and all inference about the process in general, and the variance/covariance structure in particular, is necessarily based on the discretely sampled and noisy observations p. Throughout the remainder of this paper we make the following assumptions:

Assumption 1 (Brownian Motion). The efficient price process S is a correlated Brownian motion, i.e. $S^{(j)} = \sigma_j W^{(j)}$ with $dW^{(i)} dW^{(j)} = \rho_{ij}\, dt$.
Assumption 2 (Poisson Sampling). The observation times of asset j, i.e. $\{t_m^{(j)}\}_{m=1}^{M_j}$, are generated by a Poisson process with intensity $\lambda_j$, and are independent of the efficient price and noise processes and the observation times of other assets.

Assumption 3 (I.I.D. Noise). The noise process $u^{(j)}$ is i.i.d. with zero mean and variance $\xi_j^2$, and independent of the efficient price process and the noise processes of the other assets.

The efficient price is specified as a Brownian martingale allowing for contemporaneous correlation between the different assets. The two salient features that are central to this paper, namely non-synchronously observed prices and market microstructure effects, are captured through the Poisson sampling and i.i.d. noise specification respectively. Note that Assumptions 1 and 2 constitute a special case considered by Hayashi and Yoshida (2005) whereas Assumption 3 is common in the realised variance literature (Bandi and Russell, 2006; Hansen and Lunde, 2006; Zhang et al., 2005). In independent and concurrent work, Martens (2006) models the non-synchronous trading using binomial sampling and incorporates an explicit bid-ask bounce noise process. As in our setup, both components are independent, and lead to qualitatively similar behavior. The specification of the price, noise, and sampling processes necessarily reflects a balance between generality and analytic tractability and constitutes a first order approximation of reality at best. Still, it should be pointed out that the assumptions may not be as restrictive as they appear at first sight for at least two reasons, namely (i) seemingly dependent noise may often arise as an artefact of the sampling scheme, even when the actual noise process is i.i.d. (see Griffin and Oomen, 2008), and (ii) non-homogeneity of trade arrivals and stochastic volatility can be accounted for by appropriately deforming the time scale.
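As an illustration of this setup, the data-generating process of Assumptions 1–3 is straightforward to simulate. The sketch below is our own, with purely illustrative parameter values (the volatilities, intensities, and noise levels are not calibrated to the paper's data):

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative (not calibrated) parameters on the unit interval [0, 1]
sigma1, sigma2, rho = 0.2, 0.3, 0.5   # volatilities and correlation (Assumption 1)
lam1, lam2 = 500, 800                 # Poisson observation intensities (Assumption 2)
xi1, xi2 = 0.001, 0.001               # i.i.d. noise standard deviations (Assumption 3)

# Assumption 1: correlated Brownian efficient (log) prices on a fine grid
n = 100_000
dt = 1.0 / n
z = rng.standard_normal((2, n))
dw1 = z[0]
dw2 = rho * z[0] + np.sqrt(1.0 - rho**2) * z[1]   # Cholesky construction
S1 = np.concatenate(([0.0], np.cumsum(sigma1 * np.sqrt(dt) * dw1)))
S2 = np.concatenate(([0.0], np.cumsum(sigma2 * np.sqrt(dt) * dw2)))
grid = np.linspace(0.0, 1.0, n + 1)

# Assumption 2: Poisson observation times, independent across assets
t1 = np.sort(rng.uniform(0.0, 1.0, rng.poisson(lam1)))
t2 = np.sort(rng.uniform(0.0, 1.0, rng.poisson(lam2)))

# Eq. (1): observed price = efficient price at the observation time + i.i.d. noise
p1 = np.interp(t1, grid, S1) + xi1 * rng.standard_normal(t1.size)
p2 = np.interp(t2, grid, S2) + xi2 * rng.standard_normal(t2.size)
```

The resulting pairs (t1, p1) and (t2, p2) are non-synchronous, noisy price records of the kind the estimators below operate on.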
In practice, the arrival rates of trades are often found to be cross-sectionally dependent, but because this reduces the level of non-synchronicity, our independence assumption is a conservative one. The implicit independence between the price innovations and the trade arrival process is the more restrictive assumption but, as discussed by de Jong and Nijman (1997), is difficult to relax in the current context. We now discuss the properties of some covariance estimators, categorized by the type of data they use. For ease of exposition, we focus below on two assets only, i.e. $j \in \{1, 2\}$, and use ρ as a shorthand for $\rho_{ij}$. The object of econometric interest in the covariance estimation is thus $\rho\sigma_1\sigma_2$.

2.1. Covariance estimation using synchronized observations

The first class of covariance estimators we consider is one that relies on synchronizing the raw price observations. Here we focus on the conventional ‘‘last-tick’’ method¹ where at each sampling point the most recently observed price for each asset is recorded, i.e.

$$P_t^{(j)} = p_{N_j(t)}^{(j)} \quad \text{where } N_j(t) = \sup\{n \mid t_n^{(j)} \le t\}.$$

Following this, we can construct M synchronized returns sampled at regular intervals $\Delta = 1/M$ as $r_m^{(j)} = P_{m\Delta}^{(j)} - P_{(m-1)\Delta}^{(j)}$. It is important to emphasize that sampling prices in this fashion merely ensures that returns are measured on a common time grid but it does not eliminate the non-trading problem. Indeed, under
1 For a discussion of alternative synchronization methods, see Dacorogna et al. (2001), Hansen and Lunde (2004), and Harris et al. (1995). Hansen and Lunde (2004) show that the ‘‘last-tick’’ method is desirable for realised variance estimation. Martens (2006) finds interpolating between prices improves realised covariance estimation as it reduces noise. See also Gatheral and Oomen (2010).
Assumptions 1 and 2, the autocovariance function of returns can be expressed as:

$$E(r_m^{(1)} r_{m+h}^{(2)}) = \begin{cases} \rho\sigma_1\sigma_2\,\dfrac{\lambda_1\mu_2^2}{\lambda_2(\lambda_1+\lambda_2)}\,e^{-\lambda_2(h-1)\Delta} & \text{for } h>0\\[4pt] \rho\sigma_1\sigma_2\left(\Delta-\dfrac{\lambda_1\mu_2}{\lambda_2(\lambda_1+\lambda_2)}-\dfrac{\lambda_2\mu_1}{\lambda_1(\lambda_1+\lambda_2)}\right) & \text{for } h=0\\[4pt] \rho\sigma_1\sigma_2\,\dfrac{\lambda_2\mu_1^2}{\lambda_1(\lambda_1+\lambda_2)}\,e^{-\lambda_1(-h-1)\Delta} & \text{for } h<0 \end{cases} \qquad (2)$$

where $\mu_j = 1-e^{-\lambda_j\Delta}$. From this we can see that the lead–lag dependence is strong when the sampling frequency M is high relative to the arrival rate λ, and (a)symmetric with (un)equal arrival rates. It also follows directly that the standard realised covariance estimator $RC_M=\sum_{m=1}^{M} r_m^{(1)} r_m^{(2)}$ is biased towards zero, i.e. $E(|RC_M|)<|\rho|\sigma_1\sigma_2$, and inconsistent, i.e. $\lim_{M\to\infty}E(RC_M)=0$ for fixed λ. This last property was first observed in an empirical study by Epps (1979) and provides the motivation for the estimator we study in this paper, namely the realised covariance with lead–lag adjustment (RCLL), i.e.

$$RCLL_M(L,U)=\sum_{m=1}^{M}\sum_{l=-L}^{U} r_{m+l}^{(1)} r_m^{(2)}. \qquad (3)$$

This estimator was first studied by Scholes and Williams (1977) for U = L = 1, and later extended by Dimson (1979), and Cohen et al. (1983). Recent empirical applications of the RCLL estimator can be found in Bollerslev and Zhang (2003), Bandi and Russell (2005), Liu (2009), and Oomen (forthcoming), see also Barndorff-Nielsen et al. (2008b).

Theorem 2.1. Given Assumptions 1–3, the expectation of RCLL is equal to:

$$E(RCLL_M)=\rho\sigma_1\sigma_2\left(1-M\,\dfrac{\lambda_1\mu_2 e^{-\lambda_2 L\Delta}}{\lambda_2(\lambda_1+\lambda_2)}-M\,\dfrac{\lambda_2\mu_1 e^{-\lambda_1 U\Delta}}{\lambda_1(\lambda_1+\lambda_2)}\right). \qquad (4)$$

The variance of RCLL is equal to:

$$V(RCLL_M)=M\rho^2 c_2+\dfrac{U+L+1}{M}\,\sigma_1^2\sigma_2^2+\xi_1^2\xi_2^2 c_1, \qquad (5)$$

where

$$c_1 = 2\dfrac{\sigma_1^2}{\xi_1^2}\big(1-e^{-\lambda_2\Delta(U+L+1)}\big)+2\dfrac{\sigma_2^2}{\xi_2^2}\big(1-e^{-\lambda_1\Delta(U+L+1)}\big)+4M\mu_2\big(1-e^{-\lambda_1\Delta(U+L+1)}\big)+2M\mu_2^2\big(e^{-\lambda_1\Delta(U+L+1)}-2e^{-\lambda_1\Delta}\big)\dfrac{1-e^{-(L+U)\Delta(\lambda_1+\lambda_2)}}{1-e^{-\Delta(\lambda_1+\lambda_2)}}+2M\mu_2^2\,\dfrac{1-e^{-(L+U)\Delta(\lambda_2-\lambda_1)}}{1-e^{-\Delta(\lambda_2-\lambda_1)}}\,e^{-\lambda_1\Delta(U+L)}$$

and

$$c_2 = 2F_0-D_0^2\big(1-2\min(U,L)\big)-(H_1+H_2)+2F_1\big(1-e^{-U\lambda_1\Delta}\big)-D_1^2\,\dfrac{1-e^{-2U\lambda_1\Delta}}{1-e^{-2\lambda_1\Delta}}+2\dfrac{D_0 D_1}{\mu_1}\left(2U-\dfrac{1-\mu_1}{\mu_1}\big(1-e^{-2U\lambda_1\Delta}\big)\right)+2F_2\big(1-e^{-L\lambda_2\Delta}\big)-D_2^2\,\dfrac{1-e^{-2L\lambda_2\Delta}}{1-e^{-2\lambda_2\Delta}}+2\dfrac{D_0 D_2}{\mu_2}\left(2L-\dfrac{1-\mu_2}{\mu_2}\big(1-e^{-2L\lambda_2\Delta}\big)\right)$$

and expressions for D, F, and H are given in the Appendix.

Proof. See Appendix.

The above result makes explicit the effect of the lead–lag adjustment: for M fixed, increasing U and L reduces the bias induced by non-trading at the cost of higher variance. Similarly, for U and L fixed, an increase in M reduces the variance at the cost of higher bias. Consequently, a jointly optimal choice of the triplet {M∗, U∗, L∗} can be determined via, for instance, a mean-squared error (MSE) criterion. Due to the complexity of Eq. (5) it is not possible to find an analytic solution, but numerical evaluation is of course straightforward. Qualitatively, we have the following behavior: (i) increasing U and L allows for more frequent sampling, i.e. it increases M∗, (ii) when λ1 < λ2 a lead adjustment on asset 1 is more effective than a lag adjustment (i.e. U∗ ≥ L∗) and vice versa, (iii) an increase in noise level ξ lowers the optimal sampling frequency. A more detailed discussion of RCLL and its performance will follow below.

2.2. Covariance estimation using irregularly spaced observations

As an alternative to RC(LL) which requires synchronized data, Hayashi and Yoshida (2005) propose² a covariance estimator that can be computed directly from asynchronously observed prices, i.e.

$$HY=\sum_{i=1}^{M_1}\sum_{j\in A_i} R_i^{(1)} R_j^{(2)}, \qquad (6)$$

where $R_i^{(j)}=p_i^{(j)}-p_{i-1}^{(j)}$ and $A_i=\{j\mid (t_{i-1}^{(1)},t_i^{(1)})\cap(t_{j-1}^{(2)},t_j^{(2)})\neq\emptyset\}$. In words, HY accumulates the cross-product of all fully and partially overlapping returns. Here, the returns are sampled at the highest available observation frequency, and are therefore irregularly spaced in calendar time and asynchronous across assets.

Theorem 2.2. Given Assumptions 1–3, the expectation of HY is equal to:

$$E(HY)=\rho\sigma_1\sigma_2. \qquad (7)$$

The variance of HY is equal to:

$$V(HY)=2\sigma_1^2\sigma_2^2\left(\dfrac{\rho^2}{\lambda_1+\lambda_2}\left(\dfrac{\lambda_2}{\lambda_1}+\dfrac{\lambda_1}{\lambda_2}\right)+\dfrac{\lambda_1+\lambda_2}{\lambda_1\lambda_2}\right)+2\sigma_1^2\xi_2^2+2\sigma_2^2\xi_1^2+4\xi_1^2\xi_2^2\,\dfrac{\lambda_1\lambda_2}{\lambda_1+\lambda_2}. \qquad (8)$$

Proof. See Appendix.

² The HY estimator is also studied by Hayashi and Kusuoka (2008) in a more general semi-martingale setting. Hayashi and Yoshida (2008) establish joint asymptotic normality of the HY estimator and RV. The covariance estimator of de Jong and Nijman (1997) is very similar to the one proposed by Hayashi and Yoshida, see Martens (2006) for further discussion. In independent work, Corsi and Audrino (2007) propose a ‘‘tick-by-tick realised covariance estimator’’ which coincides with the HY estimator. They show that the estimator performs well, both in simulations and in practice.

Fig. 1. Relative performance of alternative covariance estimators.

In the absence of noise, unbiasedness of the HY estimator is not surprising and has already been discussed in detail by Hayashi and Yoshida (2005). With i.i.d. noise, the HY estimator remains unbiased but is now inconsistent. Depending on the level of noise, it may not be optimal to sample prices at the highest available observation frequency because this leads to an accumulation of noise that more than offsets the gains from using more data. Specifically, consider the case where we sample every kth observation for both assets, i.e. $t_k^{(j)}, t_{2k}^{(j)}, \ldots, t_{\lfloor M_j/k\rfloor k}^{(j)}$. In the Appendix, we show that the variance of
such a ‘‘skip-k sampling’’ HY estimator has a variance that is accurately approximated as

$$V(HY_k)=\sigma_1^2\sigma_2^2\left(\dfrac{\rho^2}{\lambda_1+\lambda_2}\left(\dfrac{\lambda_2}{\lambda_1}+\dfrac{\lambda_1}{\lambda_2}\right)+\dfrac{\lambda_1+\lambda_2}{\lambda_1\lambda_2}\right)(k+1)+2\sigma_1^2\xi_2^2+2\sigma_2^2\xi_1^2+4\xi_1^2\xi_2^2\,\dfrac{\lambda_1\lambda_2}{\lambda_1+\lambda_2}\,k^{-1}. \qquad (9)$$

From this the optimal – MSE minimizing – sampling frequency can be worked out explicitly as:

$$k^* = 2\gamma_1\gamma_2\sqrt{\dfrac{\lambda_1\lambda_2}{(\lambda_1+\lambda_2)^2+\rho^2(\lambda_2^2+\lambda_1^2)}}, \qquad (10)$$

where $\gamma_i = \sqrt{\lambda_i}\,\xi_i/\sigma_i$ denotes the noise ratio as defined in Oomen
(2006). From Eq. (10) it follows that when the trade intensities and noise ratios are equal for both assets, then $k^* > 1$ when $\gamma > (1+\tfrac{1}{2}\rho^2)^{1/4} \approx 1$. Particularly for trade data, such levels of γ
are not uncommon so that in practice a skip-sampled HY estimator may well be preferred. Of course, the subsampling techniques frequently used in the realised variance literature (e.g. Zhang et al., 2005; Zhang, 2006a) can also be applied here to further improve the efficiency. See Voev and Lunde (2007) for further discussion of this.

2.3. Relative efficiency of the alternative covariance estimators

The real benefit of working under Assumptions 1–3 above is that with the closed form MSE expressions in Theorems 2.1 and 2.2 we are now in a position where we can study in a unified setting which estimator is most efficient and under what conditions. It turns out that in this comparison the key parameter is the level of noise γ because it can change the ordering of the estimators in terms of their efficiency. The correlation ρ is also important
because the behavior of the estimators markedly changes as this parameter approaches zero. For different parameter configurations, Fig. 1 plots the MSE of the baseline RC and HY estimators, the RCLL estimator with optimal number of leads and lags, and the optimally ‘‘skip-sampled’’ HY estimator. Starting with the baseline scenario in Panel A, we see that in the absence of noise HY is most efficient thanks to its superior ability to deal with non-trading effects. RC performs worst while RCLL improves over RC with the lead–lag correction allowing it to sample at higher frequencies. However, by progressively increasing the level of noise (Panels B–C), HY and RCLL deteriorate to the point where RC is superior to both. For RCLL, both the optimal order of lead–lag adjustments as well as the sampling frequency diminishes when the noise level is raised and beyond a certain level the estimator reduces to RC. The results also illustrate the benefit of the skip-sampled HY estimator: with high levels of noise as in Panel C, HY6 is substantially more efficient than HY and also marginally improves over RC. Comparing Panels A and D, we see that with asymmetric arrival rates an asymmetric lead–lag adjustment for RCLL is optimal. Further, comparing Panels A and E, we see that RCLL can outperform both RC and HY when the correlation is sufficiently low and the trading rates of the two assets differ. This reverse ordering arises partly because the bias of RC(LL) is linear in ρ. Finally, when ρ = 0 and noise levels are high as in Panel F, the MSE of RC(LL) turns non-convex in M: a locally optimal sampling frequency at moderate M appears while the globally optimal sampling frequency is M → ∞. Intuitively, starting from M = 1, an increase in sampling frequency leads to noise accumulation as well as greater efficiency of the non-noise component and this gives rise to the locally convex MSE pattern.
However, beyond some point, further increases in M actually reduce noise accumulation since the microstructure effect operates in trade time while non-trading effects do not bias the estimator. In this special case, RC is consistent and thus outperforms all alternatives.
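To make the three estimators concrete, the following is a minimal implementation sketch (our own code, not taken from the paper): last-tick synchronization followed by RC/RCLL as in Eq. (3), and the HY cross-product sum of Eq. (6):

```python
import numpy as np

def last_tick(p, t, grid):
    """Last-tick synchronization: most recent observation at or before each grid point."""
    idx = np.searchsorted(t, grid, side="right") - 1
    return p[np.clip(idx, 0, None)]  # before the first observation, hold the first price

def rcll(p1, t1, p2, t2, M, L=0, U=0):
    """Realised covariance with lead-lag adjustment, Eq. (3); L = U = 0 gives plain RC."""
    grid = np.linspace(0.0, 1.0, M + 1)
    r1 = np.diff(last_tick(p1, t1, grid))
    r2 = np.diff(last_tick(p2, t2, grid))
    total = 0.0
    for l in range(-L, U + 1):  # sum_{l=-L}^{U} r^{(1)}_{m+l} r^{(2)}_m
        if l > 0:
            total += np.sum(r1[l:] * r2[:-l])
        elif l < 0:
            total += np.sum(r1[:l] * r2[-l:])
        else:
            total += np.sum(r1 * r2)
    return total

def hy(p1, t1, p2, t2):
    """Hayashi-Yoshida estimator, Eq. (6): cross-products of all overlapping returns."""
    r1, r2 = np.diff(p1), np.diff(p2)
    total = 0.0
    for i in range(len(r1)):
        # indices j such that (t1[i], t1[i+1]) and (t2[j], t2[j+1]) overlap
        lo = max(np.searchsorted(t2, t1[i], side="right") - 1, 0)
        hi = min(np.searchsorted(t2, t1[i + 1], side="left"), len(r2))
        total += r1[i] * r2[lo:hi].sum()
    return total
```

With fully synchronized observation times the two estimators coincide, while with noisy, asynchronous data their relative MSE behaves as described above.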
Fig. 2. Relative performance of alternative covariance estimators with stochastic volatility and correlation.
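Before turning to the robustness results, note that the MSE-minimizing skip parameter of Eq. (10) is simple to evaluate numerically. The helper below is our own sketch (the function name and input values are illustrative):

```python
import numpy as np

def optimal_skip(lam1, lam2, xi1, xi2, sigma1, sigma2, rho):
    """MSE-minimizing skip parameter k* of Eq. (10), rounded to an integer >= 1."""
    # noise ratios: gamma_i = sqrt(lambda_i) * xi_i / sigma_i
    g1 = np.sqrt(lam1) * xi1 / sigma1
    g2 = np.sqrt(lam2) * xi2 / sigma2
    k = 2.0 * g1 * g2 * np.sqrt(
        lam1 * lam2 / ((lam1 + lam2) ** 2 + rho**2 * (lam1**2 + lam2**2))
    )
    return max(int(round(k)), 1)
```

For equal intensities and noise ratios this reduces to $k^* = \gamma^2/\sqrt{1+\rho^2/2}$, so skip sampling pays off (k* > 1) only once the noise ratio exceeds roughly one, as noted in the text.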
3. Further discussion

3.1. Robustness to stochastic volatility and correlation

In the above, volatility and correlation are assumed to be constant, largely for analytic tractability. As this assumption may be violated in practice, we conduct a simulation study to gauge the impact of stochastic volatility and correlation on our results. For this purpose, we adopt the same bi-variate simulation framework as in Barndorff-Nielsen and Shephard (2004): one volatility process is constructed as a superposition of two CEV processes (Barndorff-Nielsen and Shephard, 2002), the other is a GARCH diffusion (Andersen and Bollerslev, 1998), while the correlation process is specified as a transformed GARCH diffusion ensured to lie between −1 and 1. We use the model parameters of Barndorff-Nielsen and Shephard (2004) as they are informed by calibration results and found to generate very realistic sample paths with substantial variation in both volatility and correlation. We add i.i.d. Gaussian microstructure noise to the simulated price series and use Poisson sampling to generate non-synchronicity in observation times. Details of the simulation design can be found in the web appendix (Griffin and Oomen, 2009).
Based on 50,000 replications using the above simulation setup, Fig. 2 reports the MSE of the alternative covariance estimators with the theoretical MSE from Theorems 2.1 and 2.2 – evaluated at the unconditional level of volatility and correlation – superimposed for comparison purposes. Panels A–D display the results for four distinct specifications of λ and γ (recall that σ and ρ are specified by the simulation process) covering a wide range of behavior. Importantly, in all scenarios the MSE figures closely agree, suggesting that the closed-form expressions derived in this paper are reasonably robust to stochastic volatility and correlation. Consequently, they may be used to guide the choice of optimal sampling frequency and order of lead–lag adjustment in practical applications. Similarly, for the HY estimator, Eq. (10) may be used to determine the optimal sparsity of the sampling scheme.

3.2. Some empirical observations

A critical assumption underlying the HY estimator is that the correlation between two assets does not extend beyond the interval where returns fully or partially overlap. Stated differently, while observation times are random and non-synchronous, when a price update arrives it should fully incorporate all available
Fig. 3. AAPL & GOOG (Jul–Dec 2008).
information to the extent that the cross-correlation between non-overlapping returns is zero. If this assumption holds, then it directly follows that E(HYk) should not depend on k. This is a hypothesis that can be tested in practice, and for this purpose we extract from the NYSE TAQ database all trades and NBBO mid-quotes for Apple (AAPL.OQ), Google (GOOG.OQ), IBM (IBM.N), and Intel (INTC.OQ) during regular trading hours between 9:45 and 16:00 over two contiguous sample periods from July–December 2008 and January–June 2009. In this paper we only report results for the AAPL/GOOG pair and the turbulent 2008 sample. Results for the other pairs and the 2009 sample serve as a robustness check and are reported in the web appendix (Griffin and Oomen, 2009), along with some summary statistics of the data. Fig. 3 draws the skip-sampled HY ''signature plots'' for trade (in Panel A1) and quote data (in Panel B1). These show that the HY estimator severely under-estimates the correlation at high frequencies, indicating that the hypothesized assumption is strongly violated. To gain some further insight, we consider the following statistic aimed at measuring the lead–lag dependence in tick time:
$$\phi(h) = \begin{cases} \displaystyle \sum_i R_i^{(1)} R^{(2)}_{\max A_i + h} \Big/ \sum_i \sum_{j\in A_i} R_i^{(1)} R_j^{(2)}, & h > 0,\\[4pt] 1, & h = 0,\\[4pt] \displaystyle \sum_i R_i^{(1)} R^{(2)}_{\min A_i + h} \Big/ \sum_i \sum_{j\in A_i} R_i^{(1)} R_j^{(2)}, & h < 0, \end{cases} \qquad (11)$$

where $A_i$ denotes the set of asset-2 returns that fully or partially overlap with the $i$th return of asset 1. Under the hypothesized assumption, leads and lags of returns that share no overlap in physical time carry no information about the underlying correlation structure. However, in practice, cross dependence between non-overlapping returns can arise when price adjustment is not instantaneous and it takes some trading before prices fully reflect the currently available information. In such a scenario, one would of course expect HY to be biased because it misses out on some cross dependence. To illustrate that this may be a reasonable conjecture, Fig. 3 plots φ(h) averaged across all days in the sample, taking (i) AAPL as the base asset – i.e. asset 1 in Eq. (11) – and computing its correlation with leads and lags of GOOG in Panels A2/B2, and (ii) GOOG as the base asset and computing its correlation with leads and lags of AAPL in Panels A3/B3. Two important observations can be made. Firstly, there is a clear lead–lag dependence, which can be thought of as arising from ''sluggish'' price adjustment. Secondly, there is an asymmetric pattern in the lead–lag dependence: AAPL appears to ''lead'' GOOG. This effect is evident both in the trade data and in the quote data. Given these observations, it seems natural to generalize HY by adding lead–lag cross terms, yielding an estimator we denote $HY_{LL}$.
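As a companion to the statistic above, a minimal tick-time implementation can be sketched as follows. The overlap bookkeeping and the function name are our illustrative reading of Eq. (11), not code from the paper:

```python
import numpy as np

def phi(t1, r1, t2, r2, h):
    """Tick-time lead-lag statistic: cross-products of asset-1 returns with
    asset-2 returns shifted h ticks beyond (h > 0) or before (h < 0) the
    overlap set A_i, normalized by the fully overlapping cross-products.

    t1 has len(r1) + 1 interval endpoints; likewise for t2 and r2.
    """
    if h == 0:
        return 1.0
    num, den = 0.0, 0.0
    for i in range(len(r1)):
        # A_i: asset-2 return intervals overlapping asset-1 interval i
        A = [j for j in range(len(r2))
             if t2[j] < t1[i + 1] and t2[j + 1] > t1[i]]
        if not A:
            continue
        den += sum(r1[i] * r2[j] for j in A)
        j = (max(A) if h > 0 else min(A)) + h
        if 0 <= j < len(r2):
            num += r1[i] * r2[j]
    return num / den if den != 0.0 else np.nan
```

Averaging this statistic over days, as in Fig. 3, traces out the lead–lag profile of a pair of assets.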
The expression for $V(RC_{LL})$ in the absence of noise is completed by collecting sums of the appropriate terms over $m, h = 1, \ldots, M$. The properties of covariances of normal random variables and Brownian motion imply that, conditional on the observation times,
$$\mathrm{Cov}[R_{m+j}Z_m,\,R_{h+i}Z_h \mid t] = \mathrm{Cov}(R_{m+j},R_{h+i}\mid t)\,\mathrm{Cov}(Z_m,Z_h\mid t) + \rho^2\sigma_1^2\sigma_2^2\,\mathrm{Cov}\!\Big[\nu\big((t^{\star(1)}_{m+j-1},t^{\star(1)}_{m+j})\cap(t^{\star(2)}_{m-1},t^{\star(2)}_{m})\big),\;\nu\big((t^{\star(1)}_{h+i-1},t^{\star(1)}_{h+i})\cap(t^{\star(2)}_{h-1},t^{\star(2)}_{h})\big)\Big].$$
Only a few cases have non-negligible values for the values of $\lambda_1$ and $\lambda_2$ that interest us. If $m+j = h$ and $m = h+i$, then
$$\mathrm{Cov}[R_{m+j}Z_m,\,R_{h+i}Z_h] = \rho^2\sigma_1^2\sigma_2^2\,E\Big[\nu\big((t^{\star(1)}_{h-1},t^{\star(1)}_{h})\cap(t^{\star(2)}_{h-1},t^{\star(2)}_{h})\big)\,\nu\big((t^{\star(1)}_{m-1},t^{\star(1)}_{m})\cap(t^{\star(2)}_{m-1},t^{\star(2)}_{m})\big)\Big] + \rho^2\sigma_1^2\sigma_2^2\,\mathrm{Cov}\!\Big[\nu\big((t^{\star(1)}_{m-1},t^{\star(1)}_{m})\cap(t^{\star(2)}_{h-1},t^{\star(2)}_{h})\big),\;\nu\big((t^{\star(1)}_{h-1},t^{\star(1)}_{h})\cap(t^{\star(2)}_{m-1},t^{\star(2)}_{m})\big)\Big],$$
and collecting these terms gives
$$\sum_{m=1}^{M}\sum_{h=1}^{M}\mathrm{Cov}[R_m Z_h,\,R_h Z_m] = M\rho^2\sigma_1^2\sigma_2^2\big(2\min\{U,L\}\,D_0^2 - H_1 - H_2\big),$$
where
$$H_1 = 2\,\frac{\mu_1^2\mu_2^2}{(\lambda_1+\lambda_2)^2}\,\frac{1-(\mu_1\mu_2)^{\min(U,L)}}{1-\mu_1\mu_2}, \qquad H_2 = 2\,D_1 D_2\,\frac{1-e^{-\min(U,L)(\lambda_1+\lambda_2)\Delta}}{1-e^{-(\lambda_1+\lambda_2)\Delta}},$$
and $D_0$, $D_1$, $D_2$ and $F_0$ are constants that depend only on $\Delta$, $\lambda_1$, $\lambda_2$, $\mu_1$ and $\mu_2$. The effects of noise can be included by noting that if $\tilde R_i$ is the noisy version of $R_i$ and $\tilde Z_j$ is the noisy version of $Z_j$, then
$$V\Big[\sum_{m=1}^{M}\sum_{j=-L}^{U}\tilde R_{m+j}\tilde Z_m\Big] = V\Big[\sum_{m=1}^{M}\sum_{j=-L}^{U} R_{m+j}Z_m\Big] + \sum_{m=1}^{M}\sum_{j=-L}^{U}\sum_{h=1}^{M}\sum_{i=-L}^{U} E[R_{m+j}R_{h+i}]\,\mathrm{Cov}[v_m-v_{m-1},\,v_h-v_{h-1}] + \sum_{m=1}^{M}\sum_{j=-L}^{U}\sum_{h=1}^{M}\sum_{i=-L}^{U} E[Z_m Z_h]\,\mathrm{Cov}[u_{m+j}-u_{m+j-1},\,u_{h+i}-u_{h+i-1}] + \sum_{m=1}^{M}\sum_{h=1}^{M}\mathrm{Cov}[v_m-v_{m-1},\,v_h-v_{h-1}]\sum_{j=-L}^{U}\sum_{i=-L}^{U}\mathrm{Cov}[u_{m+j}-u_{m+j-1},\,u_{h+i}-u_{h+i-1}], \qquad (13)$$
where $\gamma_1(h) = \mathrm{Cov}[u_{i-h}-u_{i-h-1},\,u_i-u_{i-1}]$ and $\gamma_2(h) = \mathrm{Cov}[v_{i-h}-v_{i-h-1},\,v_i-v_{i-1}]$. The covariance of the noise takes the form:
$$\gamma_i(h) = \begin{cases} 2\xi_i^2\mu_i, & h = 0,\\ -\xi_i^2\mu_i^2\,e^{-\lambda_i\Delta(|h|-1)}, & h \neq 0. \end{cases} \qquad (14)$$
The first term in (13) has already been worked out above. For the next two terms, note that $E[R_m R_h] = 0$ unless $m = h$, and this implies that
$$\sum_{m=1}^{M}\sum_{j=-L}^{U}\sum_{h=1}^{M}\sum_{i=-L}^{U} E[R_{m+j}R_{h+i}]\,\mathrm{Cov}[v_m-v_{m-1},\,v_h-v_{h-1}] = 2\sigma_1^2\xi_2^2\big(1-e^{-\lambda_2\Delta(U+L+1)}\big)$$
and
$$\sum_{m=1}^{M}\sum_{j=-L}^{U}\sum_{h=1}^{M}\sum_{i=-L}^{U} E[Z_m Z_h]\,\mathrm{Cov}[u_{m+j}-u_{m+j-1},\,u_{h+i}-u_{h+i-1}] = 2\sigma_2^2\xi_1^2\big(1-e^{-\lambda_1\Delta(U+L+1)}\big).$$
To work out the last term in (13), we note that it is equal to $\sum_{i=1}^{M}\sum_{h=1}^{M}\gamma_2(h-i)\sum_{j=i-L}^{i+U}\sum_{m=h-L}^{h+U}\gamma_1(m-j)$ or, equivalently, in terms of $d = i-h$:
$$M\gamma_2(0)\sum_{j=-L}^{U}\sum_{m=-L}^{U}\gamma_1(m-j) + 2\sum_{d=1}^{U+L}(M-d)\,\gamma_2(d)\sum_{j=-L}^{U}\sum_{m=-L-d}^{U-d}\gamma_1(m-j) + 2\sum_{d=U+L+1}^{M-1}(M-d)\,\gamma_2(d)\sum_{j=-L}^{U}\sum_{m=-L-d}^{U-d}\gamma_1(m-j). \qquad (15)$$
The three sub-parts in Eq. (15) are ordered by magnitude. From Eq. (14), the first term is equal to:
$$M\gamma_2(0)\sum_{j=-L}^{U}\sum_{m=-L}^{U}\gamma_1(m-j) = 4M\xi_1^2\xi_2^2\mu_2\big(1-e^{-\lambda_1\Delta(U+L+1)}\big).$$
The second term, for $d = 1,\ldots,U+L$, is approximately
$$2M\xi_1^2\xi_2^2\mu_2^2\,e^{-\lambda_1\Delta}\,\frac{\big(e^{-\lambda_1\Delta(U+L+1)}-2\big)\big(1-e^{-(L+U)\Delta(\lambda_1+\lambda_2)}\big)}{1-e^{-\Delta(\lambda_1+\lambda_2)}} + 2M\xi_1^2\xi_2^2\mu_2^2\,e^{-\lambda_1\Delta(U+L)}\,\frac{1-e^{-(L+U)\Delta(\lambda_2-\lambda_1)}}{1-e^{-\Delta(\lambda_2-\lambda_1)}}.$$
For completeness, the (negligible) third term, for $d > U+L$, is equal to:
$$2\sum_{d=U+L+1}^{M-1}(M-d)\,\gamma_2(d)\sum_{j=-L}^{U}\sum_{m=-L-d}^{U-d}\gamma_1(m-j). \qquad (16)$$
Turning to the skip-sampled Hayashi–Yoshida estimator, write
$$HY_k = \sum_{i=1}^{\lfloor M_1/k\rfloor}\sum_{j\in A_i^{(k)}} R_i^{(k)} Z_j^{(k)},$$
where $A_i^{(k)} = \{\,j \mid (t^{(1)}_{k(i-1)}, t^{(1)}_{ki}) \cap (t^{(2)}_{k(j-1)}, t^{(2)}_{kj}) \neq \emptyset\,\}$, $R_i^{(k)} = p^{(1)}_{ki} - p^{(1)}_{k(i-1)}$ and $Z_i^{(k)} = p^{(2)}_{ki} - p^{(2)}_{k(i-1)}$. The variance can be expressed as:
$$V(HY_k) = \sum_{i=1}^{\lfloor M_1/k\rfloor}\sum_{j\in A_i^{(k)}}\sum_{h=1}^{\lfloor M_1/k\rfloor}\sum_{l\in A_h^{(k)}}\mathrm{Cov}\big(R_i^{(k)} Z_j^{(k)},\,R_h^{(k)} Z_l^{(k)}\big) = \sum_{i=1}^{\lfloor M_1/k\rfloor}\sum_{j\in A_i^{(k)}}\sum_{h=1}^{\lfloor M_1/k\rfloor}\sum_{l\in A_h^{(k)}}\Big[\mathrm{Cov}\big(R_i^{(k)},R_h^{(k)}\big)\,\mathrm{Cov}\big(Z_j^{(k)},Z_l^{(k)}\big) + \mathrm{Cov}\big(R_i^{(k)},Z_l^{(k)}\big)\,\mathrm{Cov}\big(Z_j^{(k)},R_h^{(k)}\big)\Big] = \sigma_1^2\sigma_2^2\,E(I_1) + \rho^2\sigma_1^2\sigma_2^2\,E(I_2),$$
where
$$I_1 = \sum_{i=1}^{\lfloor M_1/k\rfloor}\sum_{j\in A_i^{(k)}}\nu\big((t^{(1)}_{k(i-1)},t^{(1)}_{ki})\big)\,\nu\big((t^{(2)}_{k(j-1)},t^{(2)}_{kj})\big)$$
and
$$I_2 = \sum_{i=1}^{\lfloor M_1/k\rfloor}\Big(\sum_{j\in A_i^{(k)}}\nu\big((t^{(1)}_{k(i-1)},t^{(1)}_{ki})\cap(t^{(2)}_{k(j-1)},t^{(2)}_{kj})\big)\Big)^2 + 2\sum_{i=1}^{\lfloor M_1/k\rfloor}\sum_{h=1}^{i-1}\sum_{j\in A_i^{(k)}}\sum_{l\in A_h^{(k)}}\nu\big((t^{(1)}_{k(i-1)},t^{(1)}_{ki})\cap(t^{(2)}_{k(l-1)},t^{(2)}_{kl})\big)\,\nu\big((t^{(1)}_{k(h-1)},t^{(1)}_{kh})\cap(t^{(2)}_{k(j-1)},t^{(2)}_{kj})\big).$$
Throughout the proof we assume that the inter-arrival times are independent exponentially distributed random variables. Strictly speaking, when conditioning on $M_1$, the process is binomial, but this distinction will be immaterial for typical values of $\lambda_1$ and $\lambda_2$ when taking expectations of functions of the inter-arrival times. The expectation of the first term, conditional on $M_1$, is equal to:
$$E(I_1) = \sum_{i=1}^{\lfloor M_1/k\rfloor} E\Big[\nu\big((t^{(1)}_{k(i-1)},t^{(1)}_{ki})\big)\Big(\nu\big((t^{(1)}_{k(i-1)},t^{(1)}_{ki})\big) + x^{\star}\Big)\Big],$$
where $x^{\star} = \sum_{j\in A_i^{(k)}}\nu\big((t^{(2)}_{k(j-1)},t^{(2)}_{kj})\big) - \nu\big((t^{(1)}_{k(i-1)},t^{(1)}_{ki})\big)$. Next, taking expectations w.r.t. $M_1$ we have:
$$E(I_1) = (k+1)\Big(\frac{1}{\lambda_1} + \frac{1}{\lambda_2}\Big).$$
For the first part of $I_2$, conditional on $M_1$, we have:
$$E\Big[\sum_{i=1}^{\lfloor M_1/k\rfloor}\Big(\sum_{j\in A_i^{(k)}}\nu\big((t^{(1)}_{k(i-1)},t^{(1)}_{ki})\cap(t^{(2)}_{k(j-1)},t^{(2)}_{kj})\big)\Big)^2\Big] = E\Big[\sum_{i=1}^{\lfloor M_1/k\rfloor}\nu\big((t^{(1)}_{k(i-1)},t^{(1)}_{ki})\big)^2\Big] = \frac{M_1(k+1)}{\lambda_1^2}.$$
Taking expectations w.r.t. $M_1$, this term is equal to $(k+1)/\lambda_1$. If $i < h$, the expectation of $\nu\big((t^{(1)}_{k(i-1)},t^{(1)}_{ki})\cap(t^{(2)}_{k(l-1)},t^{(2)}_{kl})\big)\,\nu\big((t^{(1)}_{k(h-1)},t^{(1)}_{kh})\cap(t^{(2)}_{k(j-1)},t^{(2)}_{kj})\big)$ is non-zero only if asset 2 does not transact on the interval $(t^{(1)}_{ki}, t^{(1)}_{k(h-1)})$. Conditional on $t^{(1)}$ we then have:
$$E\Big[\sum_{j\in A_i^{(k)}}\sum_{l\in A_h^{(k)}}\nu\big((t^{(1)}_{k(i-1)},t^{(1)}_{ki})\cap(t^{(2)}_{k(l-1)},t^{(2)}_{kl})\big)\,\nu\big((t^{(1)}_{k(h-1)},t^{(1)}_{kh})\cap(t^{(2)}_{k(j-1)},t^{(2)}_{kj})\big)\Big] = \frac{1}{k}\sum_{(m,n,p)\in B_{k-1}}\mu^{(1)}_m\,p_n\,\mu^{(2)}_p,$$
where $B_j$ is the set of all partitions $(m,n,p)$ such that $m+n+p = j$ and $m \ge 0$, $n \ge 0$, $p \ge 0$, and
$$\mu^{(1)}_m = \frac{1}{\lambda_2}\,\Gamma\big(m+2,\lambda_2(t^{(1)}_{ki}-t^{(1)}_{k(i-1)})\big) + \big(t^{(1)}_{ki}-t^{(1)}_{k(i-1)}\big)\Big[\Gamma(m+1) - \Gamma\big(m+1,\lambda_2(t^{(1)}_{ki}-t^{(1)}_{k(i-1)})\big)\Big],$$
$$\mu^{(2)}_m = \frac{1}{\lambda_2}\,\Gamma\big(m+2,\lambda_2(t^{(1)}_{kh}-t^{(1)}_{k(h-1)})\big) + \big(t^{(1)}_{kh}-t^{(1)}_{k(h-1)}\big)\Big[\Gamma(m+1) - \Gamma\big(m+1,\lambda_2(t^{(1)}_{kh}-t^{(1)}_{k(h-1)})\big)\Big],$$
$$p_n = \exp\big(-\lambda_2(t^{(1)}_{k(h-1)}-t^{(1)}_{ki})\big)\,\big[\lambda_2(t^{(1)}_{k(h-1)}-t^{(1)}_{ki})\big]^n\big/\,n!\,.$$
Using that $t^{(1)}_i - t^{(1)}_{i-1}$ is Gamma distributed with shape parameter $k$ and scale parameter $\lambda_1$, that $t^{(1)}_{h-1} - t^{(1)}_i$ is Gamma distributed with parameters $(k(h-i-1), \lambda_1)$, and letting $I_x(a,b) = \int_0^x u^{a-1}(1-u)^{b-1}\,du \,\big/ \int_0^1 u^{a-1}(1-u)^{b-1}\,du$ denote the incomplete beta function, we get:
$$E\big[\Gamma\big(m+1,\lambda_2(t^{(1)}_i-t^{(1)}_{i-1})\big)\big] = \Gamma(m+2)\,I_{\lambda_2/(\lambda_1+\lambda_2)}(m+1,k+1),$$
with a similar expression for $E\big[(t^{(1)}_i-t^{(1)}_{i-1})\,\Gamma\big(m,\lambda_2(t^{(1)}_i-t^{(1)}_{i-1})\big)\big]$ in terms of $\Gamma(m)$, $I_{\lambda_2/(\lambda_1+\lambda_2)}(m,k)$ and $I_{\lambda_2/(\lambda_1+\lambda_2)}(m+1,k)$. It follows that
$$E[p_n] = \Big(\frac{\lambda_2}{\lambda_1+\lambda_2}\Big)^{n}\Big(\frac{\lambda_1}{\lambda_1+\lambda_2}\Big)^{k(h-i)}\frac{(k(h-i)+n-1)!}{(k(h-i)-1)!\,n!},$$
and that
$$E\Big[\sum_{j\in A_i^{(k)}}\sum_{l\in A_h^{(k)}}\nu\big((t^{(1)}_{k(i-1)},t^{(1)}_{ki})\cap(t^{(2)}_{k(l-1)},t^{(2)}_{kl})\big)\,\nu\big((t^{(1)}_{k(h-1)},t^{(1)}_{kh})\cap(t^{(2)}_{k(j-1)},t^{(2)}_{kj})\big)\Big] = \frac{1}{k}\sum_{n=0}^{k-1}p_n\sum_{m=0}^{k-1-n}A_m A_{k-1-n-m},$$
where
$$A_m = \frac{m+1}{\lambda_2}\,I_{\lambda_2/(\lambda_1+\lambda_2)}(m+2,k) + \frac{k}{\lambda_1}\,I_{\lambda_1/(\lambda_1+\lambda_2)}(k+1,m+1).$$
So
$$E\Big[\sum_{i=1}^{M_1}\sum_{h=1}^{i-1}\sum_{j\in A_i}\sum_{l\in A_h}\nu\big((t^{(1)}_{i-1},t^{(1)}_{i})\cap(t^{(2)}_{l-1},t^{(2)}_{l})\big)\,\nu\big((t^{(1)}_{h-1},t^{(1)}_{h})\cap(t^{(2)}_{j-1},t^{(2)}_{j})\big)\Big] = \frac{1}{k}\sum_{n=0}^{k-1}\sum_{m=0}^{k-1-n}A_m A_{k-1-n-m}\sum_{i=1}^{M_1}\sum_{h=1}^{i-1}E[p_n],$$
and noting that
$$\sum_{i=1}^{M_1}\sum_{h=1}^{i-1}E[p_n] = \sum_{i=1}^{M_1}\sum_{h=1}^{i-1}\Big(\frac{\lambda_2}{\lambda_1+\lambda_2}\Big)^{n}\Big(\frac{\lambda_1}{\lambda_1+\lambda_2}\Big)^{k(i-h)}\frac{\Gamma(k(i-h)+n)}{\Gamma(k(i-h))\,\Gamma(n+1)}.$$
From the above, we now have an exact expression for $V(HY_k)$ in terms of $k$, $\lambda_1$, $\lambda_2$ and the constants $A_m$. The expression
$$V(HY_k) = \sigma_1^2\sigma_2^2\,(k+1)\Big(\frac{1}{\lambda_1}+\frac{1}{\lambda_2}\Big) + \rho^2\sigma_1^2\sigma_2^2\,\frac{\lambda_1\lambda_2}{\lambda_1+\lambda_2}\cdot\frac{k+1}{\lambda_1\lambda_2}$$
is exact when k = 1 and well approximates the correct value when k > 1. References Andersen, T.G., Bollerslev, T., 1998. Answering the skeptics: yes, standard volatility models do provide accurate forecasts. International Economic Review 39 (4), 885–905. Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P., 2003. Modeling and forecasting realized volatility. Econometrica 71 (2), 579–625. Bandi, F.M., Russell, J.R., 2005. Realized covariation, realized beta, and microstructure noise. Manuscript, University of Chicago. Bandi, F.M., Russell, J.R., 2006. Separating microstructure noise from volatility. Journal of Financial Economics 79, 655–692. Barndorff-Nielsen, O.E., Hansen, P.R., Lunde, A., Shephard, N., 2008a. Designing realised kernels to measure the ex-post variation of equity prices in the presence of noise. Econometrica 76 (6), 1481–1536. Barndorff-Nielsen, O.E., Hansen, P.R., Lunde, A., Shephard, N., 2008b. Multivariate realised kernels: consistent positive semi-definite estimators of the covariation of equity prices with noise and non-synchronous trading. Manuscript, OxfordMan Institute. Barndorff-Nielsen, O.E., Shephard, N., 2002. Econometric analysis of realised volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society, Series B 64, 253–280. Barndorff-Nielsen, O.E., Shephard, N., 2004. Econometric analysis of realised covariation: high frequency based covariance, regression and correlation in financial economics. Econometrica 72 (3), 885–925. Bollerslev, T., Zhang, B.Y., 2003. Measuring and modeling systematic risk in factor pricing model using high-frequency data. Journal of Empirical Finance 10, 533–558. Cohen, K.J., Hawawini, G.A., Maier, S.F., Schwartz, R.A., Whitcomb, D.K., 1983. Friction in the trading process and the estimation of systematic risk. Journal of Financial Economics 12, 263–278. Corsi, F., Audrino, F., 2007. Realized correlation tick-by-tick. University of St. 
Gallen, Department of Economics, Discussion Paper No. 2007-02.
Dacorogna, M.M., Gençay, R., Müller, U., Olsen, R.B., Pictet, O.V., 2001. An Introduction to High-Frequency Finance. Academic Press, London, UK. de Jong, F., Nijman, T., 1997. High frequency analysis of lead–lag relationships between financial markets. Journal of Empirical Finance 4, 257–277. Dimson, E., 1979. Risk measurement when shares are subject to infrequent trading. Journal of Financial Economics 7, 197–226. Epps, T.W., 1979. Comovements in stock prices in the very short run. Journal of the American Statistical Association 74 (366), 291–298. Fisher, L., 1966. Some new stock-market indexes. Journal of Business 39 (1–2), 191–225. Gatheral, J., Oomen, R.C.A., 2010. Zero-intelligence realized variance estimation. Finance and Stochastics 14 (2), 249–283. Griffin, J.E., Oomen, R.C., 2008. Sampling returns for realized variance calculations: tick time or transaction time? Econometric Reviews 27 (1–3), 230–253. Griffin, J.E., Oomen, R.C., 2009. Appendix to covariance measurement in the presence of non-synchronous trading and market microstructure noise. Available at: http://ssrn.com/abstract=1438825. Hansen, P.R., Lunde, A., 2004. An unbiased measure of realized variance. Manuscript, Stanford University, Economics Department. Hansen, P.R., Lunde, A., 2006. Realized variance and market microstructure noise. Journal of Business & Economic Statistics 24 (2), 127–161. Harris, F.H.D., McInish, T.H., Shoesmith, G.L., Wood, R.A., 1995. Cointegration, error correction, and price discovery on informationally linked security markets. Journal of Financial & Quantitative Analysis 30 (4), 563–579. Hasbrouck, J., 2007. Empirical Market Microstructure. Oxford University Press. Hautsch, N., Kyj, L., Oomen, R.C., 2009. A blocking and regularization approach to high dimensional realized covariance estimation. Manuscript, Quantitative Products Laboratory, Berlin. Hayashi, T., Kusuoka, S., 2008. Consistent estimation of covariation under nonsynchronicity. 
Statistical Inference for Stochastic Processes 11 (1), 93–106. Hayashi, T., Yoshida, N., 2005. On covariance estimation of non-synchronously observed diffusion processes. Bernoulli 11 (2), 359–379. Hayashi, T., Yoshida, N., 2008. Asymptotic normality of a covariance estimator for nonsynchronously observed diffusion processes. Annals of the Institute of Statistical Mathematics 60 (2), 357–396. Jacod, J., Li, Y., Mykland, P.A., Podolskij, M., Vetter, M., 2009. Microstructure noise in the continuous case: the pre-averaging approach. Stochastic Processes and their Applications 119 (7), 2249–2276. Liu, Q., 2009. On portfolio optimization: how and when do we benefit from highfrequency data? Journal of Applied Econometrics 24 (4), 560–582. Martens, M., 2006. Estimating unbiased and precise realized covariances. Manuscript, Erasmus University Rotterdam. Oomen, R.C., 2006. Properties of realized variance under alternative sampling schemes. Journal of Business & Economic Statistics 24 (2), 219–237. Oomen, R.C., 2009. High dimensional covariance forecasting for short intra-day horizons. Quantitative Finance (forthcoming). Podolskij, M., Vetter, M., 2007. Estimation of volatility functionals in the simultaneous presence of microstructure noise and jumps. Bernoulli (forthcoming). Scholes, M., Williams, J., 1977. Estimating betas from nonsynchronous data. Journal of Financial Economics 5, 309–327. Sheppard, K., 2005. Realized covariance and scrambling. Manuscript, University of Oxford. Voev, V., Lunde, A., 2007. Integrated covariance estimation using high-frequency data in the presence of noise. Journal of Financial Econometrics 5 (1), 68–104. Zhang, L., 2006a. Efficient estimation of stochastic volatility using noisy observations: a multi-scale approach. Bernoulli 12 (6), 1019–1043. Zhang, L., 2006b. Estimating covariation: epps effect, microstructure noise. Manuscript, Carnegie Mellon University, Department of Statistics. Zhang, L., Mykland, P.A., Aït-Sahalia, Y., 2005. 
A tale of two time scales: determining integrated volatility with noisy high-frequency data. Journal of the American Statistical Association 100 (472), 1394–1411. Zhou, B., 1996. High frequency data and volatility in foreign-exchange rates. Journal of Business & Economic Statistics 14 (1), 45–52.
Journal of Econometrics 160 (2011) 69–76
Do high-frequency measures of volatility improve forecasts of return distributions?

John M. Maheu a,b, Thomas H. McCurdy c,d,∗

a Department of Economics, University of Toronto, Canada
b RCEA, Italy
c Rotman School of Management, University of Toronto, Canada
d CIRANO, Canada
Article info
Article history: Available online 6 March 2010
Keywords: Realized volatility; Multiperiod out-of-sample prediction; Term structure of density forecasts; Stochastic volatility
Abstract
Many finance questions require the predictive distribution of returns. We propose a bivariate model of returns and realized volatility (RV), and explore which features of that time-series model contribute to superior density forecasts over horizons of 1 to 60 days out of sample. This term structure of density forecasts is used to investigate the importance of: the intraday information embodied in the daily RV estimates; the functional form for log(RV) dynamics; the timing of information availability; and the assumed distributions of both return and log(RV) innovations. We find that a joint model of returns and volatility that features two components for log(RV) provides a good fit to S&P 500 and IBM data, and is a significant improvement over an EGARCH model estimated from daily returns.
© 2010 Elsevier B.V. All rights reserved.
1. Introduction

Many finance questions require a full characterization of the distribution of returns. Examples include option pricing, which uses the forecast density of the underlying spot asset, or Value-at-Risk, which focuses on a quantile of the forecasted distribution. Once we move away from the simplifying assumptions of Normally-distributed returns or quadratic utility, portfolio choice also requires a full specification of the return distribution. The purpose of this paper is to study the accuracy of forecasts of return densities produced by alternative models. Specifically, we focus on the value that high-frequency measures of volatility provide in characterizing the forecast density of returns. We propose new bivariate models of returns and realized volatility and explore which features of those time-series models contribute to superior density forecasts over multiperiod horizons out of sample. Andersen and Bollerslev (1998), Andersen et al. (2001b), Andersen et al. (2001a), Barndorff-Nielsen and Shephard (2002) and Meddahi (2002), among others,1 have established the theoretical and empirical properties of the estimation of quadratic variation for a broad class of stochastic processes in finance. Although
∗ Corresponding author at: Rotman School of Management, University of Toronto, Canada. Tel.: +1 416 978 3425; fax: +1 416 971 3048. E-mail addresses:
[email protected] (J.M. Maheu),
[email protected] (T.H. McCurdy). 1 Recent reviews include Andersen et al. (2009) and Barndorff-Nielsen and Shephard (2007). 0304-4076/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2010.03.016
theoretical advances continue to be important, part of the research in this new field has focused on the time-series properties and forecast improvements that realized volatility provides. Examples include Andersen et al. (2003), Andersen et al. (2007), Andersen et al. (2004), Ghysels and Sinko (2006), Ghysels et al. (2006), Koopman et al. (2005), Maheu and McCurdy (2002, 2007), Martens et al. (2003) and Taylor and Xu (1997). Few papers have studied the benefits of incorporating RV into the return distribution. Andersen et al. (2003) and Giot and Laurent (2004) consider the value of RV for forecasting and for Value-at-Risk. These approaches decouple the return and volatility dynamics and assume that RV is a sufficient statistic for the conditional variance of returns. Ghysels et al. (2005) find that high-frequency measures of volatility identify a risk-return tradeoff at lower frequencies. Their filtering approach to volatility measurement does not provide a law of motion for volatility, and therefore multiperiod forecasts cannot be computed in that setting. RV is an ex post measure of volatility and in general may not be equivalent to the conditional variance of returns. We propose bivariate models based on two alternative ways in which RV is linked to the conditional variance of returns. Since our system provides a law of motion for both returns and RV at the daily frequency, multiperiod forecasts of returns and RV, or of the density of returns, are available. The dynamics of the conditional distribution of RV will have a critical impact on the quality of the return density forecasts. Our benchmark model is an EGARCH model of returns. This model is univariate in the sense that it is driven by one stochastic process which directs the innovations to daily returns. It does not
allow higher-order moments of returns to be directed by a second stochastic process. Nor does it utilize any intraday information. Two types of functional forms for the bivariate models of returns and RV are proposed. The first model uses a heterogeneous autoregressive (HAR) specification (Corsi, 2009; Andersen et al., 2007) of log(RV). A second model allows different components of log(RV) to have different decay rates (Maheu and McCurdy, 2007). We also consider two ways to link RV to the variance of returns. First, we impose the cross-equation restriction that the conditional variance of daily returns is equal to the conditional expectation of daily RV. Second, motivated by Bollerslev et al. (2009) who model returns, bipower variation and realized jumps in a multivariate setting,2 we also investigate a specification of our bivariate component model for which the variance of returns is assumed to be synonymous with RV. We label this case ‘observable stochastic volatility’ and explore whether this assumption improves the term structure of density forecasts. We also compare specifications with non-Normal versus Normal innovations for both returns and log(RV). As in our benchmark EGARCH model, all of our bivariate models allow for so-called leverage or asymmetric effects of past negative versus positive return innovations. Our bivariate models allow for mean reversion in RV. This allows us to evaluate variance targeting for these specifications. Our main method of model comparison uses the predictive likelihood of returns. This is the forecast density of a model evaluated at the realized return; it provides a measure of the likelihood of the data being consistent with the model. Intuitively, better forecasting models will have higher predictive likelihood values. Therefore our focus is on the relative accuracy of the models in forecasting the return density out of sample. 
The forecast density of the models is not available in closed form; however, we discuss accurate simulation methods that can be used to evaluate the forecast density and the predictive likelihood. An important feature of our approach is that we can directly compare traditional volatility specifications, such as EGARCH, with our bivariate models of returns and RV since we focus on a common criterion: forecast densities of returns. We generate a predictive likelihood for each out-of-sample data point and for each forecast horizon. For each forecast horizon, we can compute the average predictive likelihood, where the average is computed over the fixed number of out-of-sample data points. A term structure of these average predictive likelihoods allows us to investigate the relative contributions of RV over short to long forecast horizons. Our empirical applications to S&P 500 (Spyder) and IBM returns reveal the importance of intraday return information, the timing of information availability, and non-Normal innovations to both returns and log(RV). The main features of our results are as follows. Bivariate models that use high-frequency intraday data provide a significant improvement in density forecasts relative to an EGARCH model estimated from daily data. Two-component specifications for log(RV) provide similar or better performance than HAR alternatives; both dominate the less flexible single-component version. A bivariate model of returns with Normal innovations and observable stochastic volatility directed by a two-component, exponentially decaying function of log(RV) provides good density forecasts over a range of out-of-sample horizons for both data series. We find that adding a mixture of Normals or GARCH effects to the innovations of the log(RV) part of this specification is not statistically important for our sample of S&P 500 returns, while the addition of the mixture of Normals provides a significant improvement for IBM.
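The predictive-likelihood comparison described above can be sketched as follows: simulate returns from a model's forecast distribution, smooth them into a density, and evaluate that density at the realized return. A Gaussian kernel estimate is our illustrative choice of smoother here; the paper's own computation may differ in detail:

```python
import numpy as np

def log_predictive_likelihood(sim_returns, realized_return):
    """Log forecast density at the realized return, using a Gaussian
    kernel estimate built from model-simulated returns."""
    x = np.asarray(sim_returns, dtype=float)
    h = 1.06 * x.std() * len(x) ** (-0.2)          # Silverman bandwidth
    dens = np.mean(np.exp(-0.5 * ((realized_return - x) / h) ** 2))
    dens /= h * np.sqrt(2.0 * np.pi)
    return float(np.log(max(dens, 1e-300)))        # guard against log(0)
```

Averaging these log values over the out-of-sample points, separately for each horizon, gives a term structure of average predictive likelihoods with which models can be ranked.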
2 For definition and development of bipower variation and realized jumps see, for example, Barndorff-Nielsen and Shephard (2004).
This paper is organized as follows. The next section introduces the data used to construct daily returns and daily RV. It also discusses the measurement of volatility, the adjustments to realized volatility to remove the effects of market microstructure, and a benchmark model which is based on daily return data. Our bivariate models of returns and RV, based on high-frequency intraday data, are introduced in Section 3. The calculation of density forecasts and the predictive likelihood are discussed in Section 4; results are presented in Section 5. Section 6 concludes.

2. Data and realized volatility estimation

We investigate a broadly diversified equity index (the S&P 500) and an individual stock (IBM). For the former we use the Standard & Poor's Depository Receipt (Spyder), which is a tradable security that represents ownership in the S&P 500 Index. Since this asset is actively traded, it avoids the stale price effect associated with using the S&P 500 index at high frequencies. Transaction price data associated with both the Spyder and IBM are obtained from the New York Stock Exchange's Trade and Quotes (TAQ) database. Our data samples cover the period January 2, 1996 to August 29, 2007 for the Spyder and January 4, 1993 to August 29, 2007 for IBM. The shorter sample for the Spyder data was chosen based on volume of trading (for example, there were many 5-minute periods with no transactions during the first years after the Spyder started trading in 1993) and a structural break in the Spyder log(RV) data in the mid 1990s (Liu and Maheu, 2008). The average number of transactions per day for the 1996–2007 sample of Spyder data was 32,971, but the volume of trades has increased substantially over the sample, especially from 2005 forward. In contrast, the average number of transactions per day for IBM shares has been more stable over our 1993–2007 sample, averaging 6011 transactions per day with a substantial increase from late 2006.
After removing errors from the transaction data,3 a 5-minute grid4 from 9:30 to 16:00 EST was constructed by finding the closest transaction price before or equal to each grid-point time. From this grid, 5-minute continuously compounded (log) returns were constructed. These returns were scaled by 100 and denoted as r_{t,i}, i = 1, . . . , I, where I is the number of intraday returns in day t. For our 5-minute grid, normally I = 78, although the market closed early on a few days. This procedure generated 228,394 5-minute returns corresponding to 2936 trading days for the S&P 500, and 286,988 5-minute returns corresponding to 3693 trading days for IBM. The increment of quadratic variation is a natural measure of ex post variance over a time interval. A popular estimator is realized variance or realized volatility (RV), computed as the sum of squared returns over this time interval. The asymptotic distribution of RV has been studied by Barndorff-Nielsen and Shephard (2002), who provide conditions under which RV is an unbiased estimate. Given the intraday returns, r_{t,i}, i = 1, . . . , I, an unadjusted daily RV estimator is
$$RV_{t,u} = \sum_{i=1}^{I} r_{t,i}^2. \qquad (2.1)$$
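The grid construction and the unadjusted estimator in Eq. (2.1) amount to the following (the helper names are ours, and the previous-tick rule matches the "closest transaction price before or equal to each grid point" described above):

```python
import numpy as np

def previous_tick(trade_times, trade_prices, grid_times):
    """Closest transaction price at or before each grid point."""
    idx = np.searchsorted(trade_times, grid_times, side="right") - 1
    return np.asarray(trade_prices, float)[idx]

def realized_variance(grid_prices):
    """Unadjusted daily RV (Eq. (2.1)): sum of squared intraday returns,
    with log returns scaled by 100 as in the paper."""
    r = 100.0 * np.diff(np.log(np.asarray(grid_prices, float)))
    return float(np.sum(r ** 2))
```

With a 5-minute grid from 9:30 to 16:00 this yields the I = 78 intraday returns per full trading day used in the empirical work.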
However, in the presence of market-microstructure dynamics, RV
3 Data were collected with a TAQ correction indicator of 0 (regular trade) and, when possible, 1 (trade later corrected). We also excluded any transaction with a sale condition of Z, which is a transaction reported on the tape out of time sequence, and with intervening trades between the trade time and the reported time on the tape. We also checked any price change that was larger than 3% and removed obvious errors. 4 Volatility signature plots using grids ranging from 1 min to 195 min are available on request.
J.M. Maheu, T.H. McCurdy / Journal of Econometrics 160 (2011) 69–76
Table 1
Summary statistics: daily returns and realized volatility.

           Mean     Variance  Skewness  Kurtosis  Min       Max
SPY
r_t        −0.018   0.967     0.080     6.180     −7.504    8.236
RV_u       1.210    2.640     6.932     84.936    0.055     33.217
RV_AC1     1.079    2.373     7.670     96.439    0.047     30.789
RV_AC2     1.013    2.115     7.530     88.588    0.043     25.227
RV_AC3     0.978    2.054     8.071     102.635   0.036     26.329
IBM
r_t        −0.037   2.602     0.074     3.898     −11.699   11.310
RV_u       2.825    9.161     5.145     54.879    0.150     58.270
RV_AC1     2.623    9.433     6.051     75.409    0.132     65.069
RV_AC2     2.558    9.875     6.377     82.091    0.114     66.594
RV_AC3     2.531    10.095    6.362     81.024    0.010     65.235

r_t are daily returns, RV_u are constructed from raw 5-minute returns with no adjustment, and RV_ACq, q = 1, 2, 3, are constructed as in Eq. (2.2).
can be a biased and inconsistent estimator for quadratic variation (Bandi and Russell, 2008; Zhang et al., 2005). Therefore, we consider several adjustments to our estimates and gauge their statistical performance in our model comparisons.5 Hansen and Lunde (2006) suggest the use of Bartlett weights to rule out negative values for RV. Following this approach, a corrected RV estimator is

RV_{t,ACq} = ω₀ γ̂₀ + 2 Σ_{j=1}^{q} ω_j γ̂_j,   γ̂_j = Σ_{i=1}^{I−j} r_{t,i} r_{t,i+j},   (2.2)
in which the weights follow a Bartlett scheme, ω_j = 1 − j/(q+1), j = 0, 1, . . . , q. We consider q = 1, 2, 3. Barndorff-Nielsen et al. (2008) discuss the asymptotic properties of statistics of this type.

In order to match the volatility measures, daily returns, r_t, are computed as the logarithmic difference of the closing price and the opening price. These returns are scaled by 100. Table 1 displays summary statistics for daily returns and daily RV estimates computed from the 5-minute grid. If we take the sample variance of daily returns as a benchmark estimate of volatility in which no market microstructure effects are present, and compare this to the sample mean of RV, we see a clear bias for unadjusted RV. With respect to removing bias, it appears that a Bartlett adjustment with q = 3 is necessary for the S&P 500 (Spyder) data, whereas an adjustment with q = 1 is adequate for the IBM data. This conclusion is supported by autocorrelation analyses of the 5-minute returns data, as revealed by the autocorrelation functions with associated confidence bounds in Fig. 1 for the S&P 500 and IBM respectively. For the remainder of the paper, unless otherwise stated, we use RV_t ≡ RV_{t,ACq}, with q = 3 for the S&P 500 and q = 1 for the IBM data.

One way to ascertain whether or not high-frequency (intraperiod) information contributes to improved forecasts of return distributions is to compare density forecasts from our bivariate specifications of returns and log(RV) with those from a benchmark EGARCH specification:
r_t = µ + ϵ_t,   ϵ_t = σ_t u_t,   u_t ∼ NID(0, 1),   (2.3)
log(σ_t²) = ω + β log(σ_{t−1}²) + γ u_{t−1} + α |u_{t−1}|.   (2.4)
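The variance recursion in Eqs. (2.3)-(2.4) can be filtered from a return series in a few lines. A minimal sketch, assuming NumPy; the function name and the starting value `log_sigma2_0` are ours, and estimation of (µ, ω, β, γ, α) is not shown.

```python
import numpy as np

def egarch_filter(returns, mu, omega, beta, gamma, alpha, log_sigma2_0):
    # Eq. (2.4): log(sigma_t^2) = omega + beta*log(sigma_{t-1}^2)
    #                             + gamma*u_{t-1} + alpha*|u_{t-1}|.
    log_s2 = np.empty(len(returns) + 1)
    log_s2[0] = log_sigma2_0
    for t, r in enumerate(returns):
        u = (r - mu) / np.exp(0.5 * log_s2[t])  # standardized innovation, Eq. (2.3)
        log_s2[t + 1] = omega + beta * log_s2[t] + gamma * u + alpha * abs(u)
    return log_s2
```

With zero innovations the recursion collapses to log(σ_t²) = ω + β log(σ_{t−1}²), which makes the filter easy to verify by hand.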
Fig. 1. ACF of 5-minute return data.
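The autocorrelation functions behind Fig. 1 can be reproduced with a short sketch along the following lines. This is a hypothetical helper (name ours), assuming NumPy and the usual ±1.96/√n bands for the confidence bounds; the paper does not state which bands were used.

```python
import numpy as np

def acf(x, max_lag):
    # Sample autocorrelations rho_1, ..., rho_max_lag and a +/-1.96/sqrt(n) band.
    x = np.asarray(x, dtype=float)
    n = len(x)
    xd = x - x.mean()
    denom = float(np.dot(xd, xd))
    rho = np.array([np.dot(xd[: n - j], xd[j:]) / denom for j in range(1, max_lag + 1)])
    band = 1.96 / np.sqrt(n)
    return rho, band
```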
3. Joint return-RV models

3.1. HAR-RV specifications

As discussed in the Introduction, an integrated model of returns and realized volatility is needed to deal with common questions in finance which require a forecast density of returns for multiple horizons. In this section, we introduce two alternative joint specifications of daily returns and realized volatility. These bivariate models are distinguished by alternative assumptions about RV dynamics. We also consider versions of these bivariate models with non-Normal return and log(RV) innovations, as well as a version with an alternative assumption concerning available information about RV.

In each case, cross-equation restrictions link the variance of returns and our realized volatility specification. Corollary 1 of Andersen et al. (2003) shows that, under empirically realistic conditions, the conditional expectation of quadratic variation (QV_t) is equal to the conditional variance of returns, that is, E_{t−1}(QV_t) = Var_{t−1}(r_t) ≡ σ_t². If RV is an unbiased estimator of quadratic variation,6 it follows that the conditional variance of returns can be linked to RV as σ_t² = E_{t−1}(RV_t), where the information set is defined as Φ_{t−1} ≡ {r_{t−1}, RV_{t−1}, r_{t−2}, RV_{t−2}, . . . , r_1, RV_1}. Assuming that RV has a log-Normal distribution, that restriction takes the form

σ_t² = E_{t−1}(RV_t) = exp[ E_{t−1} log(RV_t) + ½ Var_{t−1}(log(RV_t)) ].   (3.1)

We begin with a bivariate specification for daily returns and RV in which conditional returns are driven by Normal innovations
5 For alternative approaches to dealing with market microstructure dynamics see Aït-Sahalia et al. (2005), Bandi and Russell (2006), Barndorff-Nielsen et al. (2008), Oomen (2005), Zhang (2006) and Zhou (1996).
6 We assume that any stochastic component in the intraperiod conditional mean is negligible compared to the total conditional variance. It is also straightforward to estimate a bias term.
and the dynamics of log(RV_t) are captured by Heterogeneous AutoRegressive (HAR) functions of lagged log(RV_t). Corsi (2009) and Andersen et al. (2007) use HAR functions in order to parsimoniously capture long-memory dependence. Motivated by that work, we define

log(RV_{t−h,h}) ≡ (1/h) Σ_{i=0}^{h−1} log(RV_{t−h+i}),   log(RV_{t−1,1}) ≡ log(RV_{t−1}).   (3.2)
For example, log(RV_{t−22,22}) averages log(RV) over the most recent 22 days, that is, from t − 22 to t − 1; log(RV_{t−5,5}) averages over the most recent 5 days, etc. This leads to our bivariate specification for daily returns and RV with the dynamics of log(RV_t) modeled as an asymmetric HAR function of past log(RV). This bivariate system is summarized as follows:

r_t = µ + ϵ_t,   ϵ_t = σ_t u_t,   u_t ∼ NID(0, 1),   (3.3)
log(RV_t) = ω + φ₁ log(RV_{t−1}) + φ₂ log(RV_{t−5,5}) + φ₃ log(RV_{t−22,22}) + γ u_{t−1} + η v_t,   v_t ∼ NID(0, 1).   (3.4)
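Under Eqs. (3.1) and (3.4), only the η v_t term is unknown at t − 1, so Var_{t−1}(log RV_t) = η² and the implied return variance is exp(m + η²/2), with m the asymmetric HAR conditional mean. A hedged sketch of this computation (function and argument names are ours):

```python
import math

def har_sigma2(omega, phi1, phi2, phi3, gamma, eta, lrv1, lrv5, lrv22, u_prev):
    # Conditional mean of log RV_t under Eq. (3.4); eta*v_t is the only
    # term unknown at t-1, so the conditional variance of log RV_t is eta^2.
    m = omega + phi1 * lrv1 + phi2 * lrv5 + phi3 * lrv22 + gamma * u_prev
    # Log-Normal cross-equation restriction, Eq. (3.1).
    return math.exp(m + 0.5 * eta ** 2)
```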
This bivariate specification of daily returns and RV imposes the cross-equation restriction that relates the conditional variance of daily returns to the conditional expectation of daily RV, as in Eq. (3.1). Joint estimation of the bivariate system in Eqs. (3.3), (3.4) and (3.1) is by maximum likelihood.

Since our applications are to equity returns, it is important to allow for asymmetric effects in volatility. To facilitate comparisons with the benchmark EGARCH model, our parameterization in Eq. (3.4) includes an asymmetry term, γ u_{t−1}, associated with the standardized return innovation, u_{t−1}. The impact coefficient for a negative innovation to returns will be −γ, whereas the impact of a positive innovation will be γ. Typically, γ̂ < 0, which means that a negative innovation to returns implies a higher conditional variance for the next period. Unlike EGARCH, our parameterization does not propagate the asymmetry further into future volatility.

The in-sample fit of GARCH models has generally favored return innovations with tails that are fatter than those implied by a Normal distribution. Therefore, we evaluate whether or not that result obtains for our bivariate models of returns and RV. That is, we also try replacing Eq. (3.3) with

r_t = µ + ϵ_t,   ϵ_t = σ_t u_t,   u_t ∼ t_ν(0, 1),   (3.5)
in which tν denotes a t-distribution with mean 0, variance 1, and ν degrees of freedom. The remainder of the bivariate dynamic system for this case is the same as above. We compare this bivariate system with t-distributed return innovations to that with Normally-distributed innovations, not only for in-sample fit, but also for the term structure of out-of-sample density forecasts.
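A t_ν variate rescaled to mean 0 and unit variance, as required by Eq. (3.5), can be drawn as follows. This is a sketch (name ours), assuming NumPy and ν > 2 so that the variance exists.

```python
import numpy as np

def standardized_t(nu, size, rng):
    # A standard t_nu draw has variance nu/(nu-2); rescale to variance 1.
    return rng.standard_t(nu, size) * np.sqrt((nu - 2.0) / nu)
```

Usage: `standardized_t(8.0, 10_000, np.random.default_rng(0))` returns draws whose sample mean and variance are close to 0 and 1.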
3.2. Component-RV specifications

This bivariate specification for daily returns and RV has conditional returns driven by Normal innovations, but now the dynamics of log(RV_t) are captured by two components (2Comp) with different decay rates, as in Maheu and McCurdy (2007). In particular, this bivariate system can be summarized as follows:

r_t = µ + ϵ_t,   ϵ_t = σ_t u_t,   u_t ∼ NID(0, 1),   (3.6)
log(RV_t) = ω + Σ_{i=1}^{2} φ_i s_{i,t} + γ u_{t−1} + η v_t,   v_t ∼ NID(0, 1),   (3.7)
s_{i,t} = (1 − α_i) log(RV_{t−1}) + α_i s_{i,t−1},   0 < α_i < 1,   i = 1, 2.   (3.8)

Again, we impose the cross-equation restriction that relates the conditional variance of daily returns to the conditional expectation of daily RV, as in Eq. (3.1). For this specification of our bivariate model, the dynamics of daily log(RV) are parameterized as the component model specified in Eqs. (3.7) and (3.8), which replace the HAR function in Eq. (3.4). Although infinite exponential smoothing provides parsimonious estimates, it possesses several drawbacks. For instance, it does not allow for mean reversion in volatility; and, as Nelson (1990) has shown in the case of squared returns or squared innovations to returns, the model is degenerate in its asymptotic limit. To circumvent these problems, but still retain parsimony, our dynamic model for log(RV_t), given by Eq. (3.7), weights each component i by the parameter 0 < φ_i < 1 and adds an intercept, ω. Note that when the model is stationary, variance forecasts will mean revert to ω/(1 − φ₁ − φ₂). This result can be used to do variance targeting and eliminate the parameter ω from the model.7 This model implies an infinite expansion in log(RV_{t−j}) with coefficients φ₁(1 − α₁)α₁^{j−1} + φ₂(1 − α₂)α₂^{j−1}, j = 1, 2, . . . .8

In order to evaluate the potential importance of t-distributed return innovations for this bivariate specification, we replace Eq. (3.6) with Eq. (3.5) and jointly estimate with Eqs. (3.7), (3.8) and (3.1).

Motivated by Bollerslev et al. (2009), we also present results for an alternative assumption about available information, in which we replace Eq. (3.1) with σ_t² ≡ RV_t. Then

r_t = µ + ϵ_t,   ϵ_t = √(RV_t) u_t,   u_t ∼ NID(0, 1),   (3.9)
log(RV_t) = ω + Σ_{i=1}^{2} φ_i s_{i,t} + γ u_{t−1} + η v_t,   v_t ∼ NID(0, 1),   (3.10)
s_{i,t} = (1 − α_i) log(RV_{t−1}) + α_i s_{i,t−1},   0 < α_i < 1,   i = 1, 2,   (3.11)

which we label 2Comp-OSV.

7 That is, set ω = mean(log(RV)) (1 − φ₁ − φ₂).
8 Expanding (3.8) gives s_{i,t} = (1 − α_i) Σ_{n=0}^{∞} α_i^n log(RV_{t−1−n}).

3.3. Extensions

We consider two extensions to the previous model. The first sets η = 1 and replaces the innovation v_t in (3.10) with a mixture of two Normals. It has density

v_t ∼ N(0, σ²_{v,1}) with probability π, and N(0, σ²_{v,2}) with probability 1 − π,   (3.12)

and allows log(RV_t) to have a fat-tailed distribution. The second extension is to include GARCH dynamics for the conditional variance of log(RV). In this case, η in (3.10) has a time subscript and follows the GARCH(1, 1) model

η_t² = κ₀ + κ₁ [log(RV_{t−1}) − E_{t−2} log(RV_{t−1})]² + κ₂ η_{t−1}²,   (3.13)

where log(RV_{t−1}) − E_{t−2} log(RV_{t−1}) denotes the innovation to log(RV) at time (t − 1).

4. Density forecasts

Our focus is on the return distribution. A popular approach to assess the accuracy of a model's density forecasts is the predictive likelihood (Amisano and Giacomini, 2007; Bao et al., 2007; Weigend and Shi, 2000). This approach evaluates the model's density forecast at the realized return. This is generally done for a one-step-ahead forecast density, as multiperiod density forecasts are often not available in closed form. In this paper we advocate multiperiod forecasts since they provide more information to
discern among models. The details of the multiperiod predictive likelihood and how to calculate it are described below.

The average predictive likelihood over the out-of-sample observations t = τ + kmax, . . . , T − k, is

D_{M,k} = [1/(T − τ − kmax + 1)] Σ_{t=τ+kmax−k}^{T−k} log f_{M,k}(r_{t+k} | Φ_t, θ),   k ≥ 1,   (4.1)
where f_{M,k}(x|Φ_t, θ) is the k-period-ahead predictive density for model M, given Φ_t and parameter θ, evaluated at the realized return x = r_{t+k}. Intuitively, models that better account for the data produce a larger D_{M,k}. As we will see below for our application to the S&P 500, T = 2936, τ = 1200 and kmax = 60, so that τ + kmax − 1 = 1259. D_{M,k} is computed for each k using the out-of-sample returns r_{1260}, . . . , r_{2936}. That is, if k = 1, D_{M,1} is computed using out-of-sample returns r_{1260}, . . . , r_{2936}; for k = 2, D_{M,2} is computed using the same out-of-sample returns, etc. This gives us a term structure of average predictive likelihoods, D_{M,1}, . . . , D_{M,60}, to compare the performance of alternative models, M, over an identical set of out-of-sample data points.

To assess the statistical differences in D_{M,k} for two models we present Diebold and Mariano (1995) test statistics based on the work of Amisano and Giacomini (2007). Under the null hypothesis of equal performance based on predictive likelihoods of horizon k for models A and B,

t^k_{A,B} = (D_{A,k} − D_{B,k}) / (σ̂_{AB,k} / √(T − τ − kmax + 1))

is asymptotically standard Normal, where σ̂_{AB,k} is the Newey–West long-run sample variance (HAC) estimate for d_t = log f_{A,k}(r_{t+k}|Φ_t, θ̂) − log f_{B,k}(r_{t+k}|Φ_t, θ̂), and θ̂ denotes the maximum likelihood estimate for the respective model. Due to the overlapping nature of the density forecasts for k > 1, we set the lag length in the Newey–West variance estimate to the integer part of [k × 0.15].9 A large positive (negative) test statistic is a rejection of equal forecast performance and provides evidence in favor of model A (B). As with the predictive likelihoods, a term structure of associated test statistics t^k_{A,B}, k = 1, . . . , kmax, is presented in the results section.

4.1. Computations

For all k > 1 the term f_{M,k}(r_{t+k}|Φ_t, θ) will be unknown for the models we consider. However, given that we have fully specified the law of motion for daily returns and RV, we can accurately estimate this quantity by standard Monte Carlo methods. A conventional approach to estimating the forecast density would be to simulate the model out k periods a large number of times and apply a kernel density estimator to these realizations. However, a kernel density estimator ignores the fact that, in our applications, conditional on the variance we know the distribution of returns. The use of conditional analytic results has been referred to as Rao–Blackwellization and is a standard approach to reducing the variance of a Monte Carlo estimate (Robert and Casella, 1999); it is particularly useful in density estimation, which is our context. To illustrate, consider our benchmark EGARCH model in (2.3)-(2.4). Note that in this univariate case the information set, Φ_t, includes only past returns. Our estimate is

f_{M,k}(r_{t+k}|Φ_t, θ) = ∫ f(r_{t+k}|µ, σ²_{t+k}) p(σ²_{t+k}|Φ_t) dσ²_{t+k}   (4.2)
 ≈ (1/N) Σ_{i=1}^{N} f(r_{t+k}|µ, σ^{2(i)}_{t+k}),   σ^{2(i)}_{t+k} ∼ p(σ²_{t+k}|Φ_t),   (4.3)

where f(r_{t+k}|µ, σ^{2(i)}_{t+k}) is a Normal density with mean µ and variance σ^{2(i)}_{t+k} evaluated at the return r_{t+k}, and σ^{2(i)}_{t+k} is simulated out N times according to the EGARCH specification, p(σ²_{t+k}|Φ_t), which is conditional on the time-t quantities σ_t², u_t, and θ̂, the maximum likelihood estimate of the parameter vector based on Φ_t.

For the joint models of returns and RV, we do a similar exercise to compute the predictive likelihood for returns. In this case, we simulate out both the return and RV dynamics, which implicitly integrates out the unknown σ²_{t+k}. For each simulation of RV^{(i)}_{t+1}, . . . , RV^{(i)}_{t+k−1}, i = 1, . . . , N, we can compute σ^{2(i)}_{t+k} = E_{t+k−1}(RV^{(i)}_{t+k}) using (3.1).10 A numerical standard error can be used to assess the accuracy of f̂_{M,k}(r_{t+k}|Φ_t, θ) and D̂_{M,k}.11 In our application we found N = 10,000 to provide sufficient accuracy. For example, the numerical standard error is typically well below 1% of D̂_{M,k}. Note that for all of our bivariate models the dynamics of the conditional distribution of RV will have a critical impact on the quality of the return density forecasts.

5. Results

Our first results are out-of-sample density forecasts evaluated using predictive likelihoods. The S&P 500 sample starts at 1996/01/02; the first out-of-sample density forecast begins at 2000/12/26 (t = 1260) and ends at 2007/8/29 (t = 2936), for a total of 1677 density forecasts for each k. We summarize these out-of-sample forecasts by averaging the associated 1677 predictive likelihoods for each k and then plotting their term structure for the forecast horizons k = 1, . . . , 60, that is, from 1 to 60 days out of sample. Note that the IBM sample starts at 1993/01/04; the first density forecast begins at 1997/12/24 (t = 1260) and ends at 2007/8/29 (t = 3693), for a total of 2434 density forecasts for each k. Full-sample parameter estimates for the best models are discussed at the end of the section. Model estimation conditions on the first 24 observations.

Our empirical work considered many different models, including different innovation distributions for returns, the value of variance targeting for log(RV), different functional forms for log(RV), and a variety of GARCH specifications estimated using daily returns, for which EGARCH was the best specification. We note the following general results: models with variance targeting were dominated by the unrestricted version of the model; HAR and component models that link the conditional variance of returns to RV_t by (3.1) always performed better with t-innovations to returns12; 2-component models were always better than single-component versions.

In the following summary of results, we focus on the top models in different categories. Our empirical applications to S&P 500 and IBM returns reveal the importance of intraday information, the timing of information availability, and non-Normal innovations to both returns and log(RV). Figs. 2 and 3 compare the term structures of density forecasts for the best models of each type for the S&P 500 and IBM respectively; Fig. 4 evaluates the robustness of those results to a further generalization. The second plot in each figure displays a corresponding term structure of Diebold–Mariano test statistics for equal forecast performance for selected models.

10 Recall that the observable SV specification sets σ^{2(i)}_{t+k} = RV^{(i)}_{t+k}.
11 To calculate a numerical standard error for D̂_{M,k}: let v² denote the sample variance of the draws of f(r_{t+k}|µ, σ^{2(i)}_{t+k}); then the numerical standard error for f̂_{M,k}(r_{t+k}|Φ_t, θ) is v/√N. Using the delta rule to calculate V̂ar(log f̂_{M,k}(r_{t+k}|Φ_t, θ)), the numerical standard error of D̂_{M,k} is √[ Σ_{t=τ+kmax−k}^{T−k} V̂ar(log f̂_{M,k}(r_{t+k}|Φ_t, θ)) ] / (T − τ − kmax + 1).
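The Rao-Blackwellized estimator in Eqs. (4.2)-(4.3), together with the numerical standard error v/√N described in footnote 11, can be sketched as follows. The function name is ours and NumPy is assumed; simulating the variance draws σ²⁽ⁱ⁾ themselves is model-specific and not shown.

```python
import numpy as np

def rb_density(r, mu, sigma2_draws):
    # Eq. (4.3): average the Normal density N(mu, sigma2^{(i)}) evaluated at r
    # over the N simulated variances; also return the numerical SE v/sqrt(N).
    s2 = np.asarray(sigma2_draws, dtype=float)
    dens = np.exp(-0.5 * (r - mu) ** 2 / s2) / np.sqrt(2.0 * np.pi * s2)
    return float(dens.mean()), float(dens.std(ddof=1) / np.sqrt(len(s2)))
```

When every draw of σ² is identical the estimator reduces to a single Normal density and the numerical standard error is zero, which gives a quick correctness check.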
9 Our results are generally stronger (stronger rejections of the null hypothesis) for smaller lag-lengths.
12 We did consider t-innovations for returns in the observable SV models, but estimation supported a Normal distribution since the degree of freedom parameter always moved to extremely large values.
Fig. 2. S&P 500, joint models versus EGARCH.
Fig. 3. IBM, joint models versus EGARCH.
Note that all of the average predictive likelihood term structures display a negative slope. This is because the conditioning information is most useful for small k. As we forecast further and further out of sample, the value of the current information diminishes. All of our models are stationary, so multiperiod forecast densities converge to the unconditional distribution. Using the same data points to evaluate the predictive likelihood for a different k, we can see how the accuracy of forecasts deteriorates at longer horizons.

Two main conclusions can be gleaned from Fig. 2. Firstly, high-frequency intraday data provide a significant improvement in density forecasts relative to an EGARCH model estimated from daily data. The same conclusion about the value of high-frequency data can be drawn from the IBM sample, as shown in Fig. 3. Secondly, both the 2-component and the HAR specification dominate a single-component version of Eqs. (3.7) and (3.8) for the dynamics of log(RV). Note that the advantage of the more flexible functional forms (either 2-component or HAR) increases the further out we forecast.

Fig. 4. IBM, robustness to non-normal innovations to log(RV).

The three best bivariate specifications are the 2Comp-OSV, 2Comp and HAR. For the S&P 500, the latter two do equally well; for IBM forecasts the 2Comp specifications are better than HAR. The additional information assumed by the observable stochastic volatility (OSV) assumption, although very important with respect to in-sample fit as shown below, is only significant with respect to density forecasts at long horizons (beyond 45 days) for the S&P 500. The OSV assumption does not improve density forecasts in the IBM case, as shown by the Diebold–Mariano test statistics in Fig. 3 for '2Comp-OSV vs 2Comp'.

Fig. 4 evaluates the robustness of the best bivariate specification for IBM to a generalization of the distributional assumption for log(RV). In particular, as discussed in Section 3.3, we generalize Eq. (3.10) to allow either a mixture-of-Normals or a GARCH parameterization of the conditional variance of log(RV). Although neither of these generalizations significantly improves the out-of-sample density forecasts for our S&P 500 sample, Fig. 4 suggests that a mixture-of-Normals parameterization of the variance of log(RV) improves density forecasts relative to the Normally-distributed alternative for the IBM sample.
Based on the in-sample log-likelihood, the 2Comp-OSV specification dominates the 2Comp specification. However, as shown in Fig. 2, there is not a large difference with respect to out-of-sample density forecasts. This is also evident from comparing the parameter estimates in Table 2. Except for the return intercept, and the fact that the return innovations have fatter tails for the 2Comp model than for the 2Comp-OSV version, the parameter estimates are similar.
Table 2
S&P 500 model estimates.

2Comp-OSV model:
r_t = µ + ϵ_t,   ϵ_t = √(RV_t) u_t,   u_t ∼ D(0, 1),
log(RV_t) = ω + Σ_{i=1}^{2} φ_i s_{i,t} + γ u_{t−1} + η v_t,   v_t ∼ NID(0, 1),
s_{i,t} = (1 − α_i) log(RV_{t−1}) + α_i s_{i,t−1},   i = 1, 2.

2Comp model:
r_t = µ + ϵ_t,   ϵ_t = σ_t u_t,   u_t ∼ t_ν(0, 1),
σ_t² = exp[ E_{t−1} log(RV_t) + ½ Var_{t−1}(log(RV_t)) ],
log(RV_t) = ω + Σ_{i=1}^{2} φ_i s_{i,t} + γ u_{t−1} + η v_t,   v_t ∼ NID(0, 1),
s_{i,t} = (1 − α_i) log(RV_{t−1}) + α_i s_{i,t−1},   i = 1, 2.

Parameter   2Comp-OSV: u_t ∼ N(0, 1)   2Comp: u_t ∼ t_ν(0, 1)
µ           0.038 (0.011)              −0.018 (0.014)
ω           −0.026 (0.012)             −0.025 (0.013)
φ₁          0.476 (0.007)              0.402 (0.147)
φ₂          0.476                      0.543 (0.154)
α₁          0.888 (0.017)              0.911 (0.045)
α₂          0.435 (0.037)              0.508 (0.105)
γ           −0.129 (0.010)             −0.141 (0.011)
η           0.531 (0.009)              0.528 (0.009)
1/ν                                    0.089 (0.016)
lgl         −5646.725                  −5916.342
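As a check on the estimates in Table 2, the mean-reversion level ω/(1 − φ₁ − φ₂) for log(RV) noted in Section 3.2 can be computed directly. Using the 2Comp-OSV point estimates (ω = −0.026, φ₁ = φ₂ = 0.476) gives roughly −0.54; this illustration and the helper name are ours.

```python
def mean_reversion_level(omega, phi1, phi2):
    # Unconditional mean of log RV implied by the stationary component model:
    # omega / (1 - phi1 - phi2); see Section 3.2 and footnote 7.
    return omega / (1.0 - phi1 - phi2)
```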
The main features of our results are as follows. Bivariate models that use high-frequency intraday data provide a significant improvement in density forecasts relative to an EGARCH model estimated from daily data. Two-component specifications for the dynamics of log(RV) provide similar or better performance than HAR alternatives; both dominate the less flexible singlecomponent version. A bivariate model of returns with Normal innovations and observable stochastic volatility directed by a 2component, exponentially decaying function of log(RV) provides good density forecasts over a range of out-of-sample horizons for both data series. We find that adding a mixture of Normals or GARCH effects to the innovations of the log(RV) part of this specification is not statistically important for S&P 500, while the addition of the mixture of Normals provides a significant improvement for IBM. 6. Conclusion This paper proposes alternative joint specifications of daily returns and RV which link RV to the variance of returns and exploit the benefits of using intraperiod information to obtain accurate measures of volatility. Our focus is on out-of-sample forecasts of the return distribution generated by our bivariate models of return and RV. We explore which features of the time-series models contribute to superior density forecasts over horizons of 1 to 60 days out of sample. Our main method of model comparison uses the predictive likelihood of returns, the forecast density evaluated at the realized return, which provides a measure of the likelihood of the data being consistent with the model. An identical set of return observations is used to compute a term structure of test statistics over a range of forecast horizons, so that the average predictive likelihoods are not only comparable across models but also over different forecast horizons for a particular model. Two alternative joint specifications of daily returns and realized volatility were investigated. 
These two bivariate models are distinguished by alternative assumptions about RV dynamics. The first model uses a heterogeneous autoregressive (HAR) specification of log(RV). The second model allows components of log(RV) to have different decay rates. Both of these bivariate models allow for asymmetric effects of past negative versus positive return innovations. Both models are stationary and consistent with mean reversion in RV. We also investigate an observable SV assumption (OSV) for the timing of information availability.
Using the predictive likelihood, we find that high-frequency intraday data are important for density forecasts relative to using daily data as in our benchmark EGARCH specification. Secondly, a flexible functional form (either two components or HAR) is very important for the dynamics of log(RV). The OSV assumption marginally improves density forecasts at long horizons for the S&P 500 but makes essentially no difference for the IBM data. A bivariate model of returns with Normal innovations and observable stochastic volatility directed by a 2-component, exponentially decaying function of log(RV) provides good density forecasts over a range of out-of-sample horizons for both data series.

Acknowledgements

We thank the editors and two anonymous referees, as well as Zhongfang He, Lars Stentoft, participants of the April 2006 CIREQ conference on Realized Volatility, the McGill 2008 Risk Management Conference, the University of Waterloo 2009 Econometrics and Risk Management Conference, and seminar participants at the Federal Reserve Bank of Atlanta for many helpful comments. Guangyu Fu and Xiaofei Zhao provided excellent research assistance. We are also grateful to the SSHRC for financial support.

References

Aït-Sahalia, Y., Mykland, P.A., Zhang, L., 2005. Ultra high-frequency volatility estimation with dependent microstructure noise. NBER Working Paper No. W11380.
Amisano, G., Giacomini, R., 2007. Comparing density forecasts via weighted likelihood ratio tests. Journal of Business and Economic Statistics 25 (2), 177–190.
Andersen, T.G., Bollerslev, T., 1998. Answering the skeptics: yes, standard volatility models do provide accurate forecasts. International Economic Review 39 (4), 885–905.
Andersen, T.G., Bollerslev, T., Diebold, F.X., 2007. Roughing it up: including jump components in the measurement, modeling and forecasting of return volatility. Review of Economics and Statistics 89, 701–720.
Andersen, T.G., Bollerslev, T., Diebold, F.X., 2009. Parametric and nonparametric volatility measurement. In: Hansen, L., Ait-Sahalia, Y. (Eds.), Handbook of Financial Econometrics. North Holland, Elsevier, pp. 67–138.
Andersen, T.G., Bollerslev, T., Meddahi, N., 2004. Analytic evaluation of volatility forecasts. International Economic Review 45, 1079–1110.
Andersen, T.G., Bollerslev, T., Diebold, F.X., Ebens, H., 2001a. The distribution of realized stock return volatility. Journal of Financial Economics 61, 43–76.
Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P., 2001b. The distribution of exchange rate volatility. Journal of the American Statistical Association 96, 42–55.
Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P., 2003. Modeling and forecasting realized volatility. Econometrica 71, 529–626.
Bandi, F.M., Russell, J.R., 2006. Separating microstructure noise from volatility. Journal of Financial Economics 79, 655–692.
Bandi, F.M., Russell, J.R., 2008. Microstructure noise, realized volatility, and optimal sampling. Review of Economics and Statistics 75 (2), 339–364.
Bao, Y., Lee, T.-H., Saltoglu, B., 2007. Comparing density forecast models. Journal of Forecasting 26 (3), 203–225.
Barndorff-Nielsen, O.E., Shephard, N., 2002. Econometric analysis of realised volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society, Series B 64, 253–280.
Barndorff-Nielsen, O.E., Shephard, N., 2004. Power and bipower variation with stochastic volatility and jumps. Journal of Financial Econometrics 2, 1–48.
Barndorff-Nielsen, O.E., Shephard, N., 2007. Variation, jumps and high frequency data in financial econometrics. In: Blundell, R., Persson, T., Newey, W.K. (Eds.), Advances in Economics and Econometrics. Theory and Applications, Ninth World Congress. Econometric Society Monographs, Cambridge University Press, pp. 328–372.
Barndorff-Nielsen, O., Hansen, P., Lunde, A., Shephard, N., 2008. Designing realised kernels to measure the ex-post variation of equity prices in the presence of noise. Econometrica 76, 1481–1536.
Bollerslev, T., Kretschmer, U., Pigorsch, C., Tauchen, G., 2009. A discrete-time model for daily S&P500 returns and realized variations: jumps and leverage effects. Journal of Econometrics 150 (2), 151–166.
Corsi, F., 2009. A simple approximate long memory model of realized volatility. Journal of Financial Econometrics 7 (2), 174–196.
Diebold, F.X., Mariano, R.S., 1995. Comparing predictive accuracy. Journal of Business & Economic Statistics 13 (3), 252–263.
Ghysels, E., Sinko, A., 2006. Volatility forecasting and microstructure noise. Manuscript, Department of Economics, University of North Carolina.
Ghysels, E., Santa-Clara, P., Valkanov, R., 2005. There is a risk-return tradeoff after all. Journal of Financial Economics 76, 509–548.
Ghysels, E., Santa-Clara, P., Valkanov, R., 2006. Predicting volatility: getting the most out of return data sampled at different frequencies. Journal of Econometrics 131, 445–475.
Giot, P., Laurent, S., 2004. Modelling daily value-at-risk using realized volatility and ARCH models. Journal of Empirical Finance 11, 379–398.
Hansen, P.R., Lunde, A., 2006. Realized variance and market microstructure noise. Journal of Business & Economic Statistics 24 (2), 127–161.
Koopman, S.J., Jungbacker, B., Hol, E., 2005. Forecasting daily variability of the S&P 100 stock index using historical, realised, and implied volatility measurements. Journal of Empirical Finance 12, 445–475.
Liu, C., Maheu, J.M., 2008. Are there structural breaks in realized volatility? Journal of Financial Econometrics 6 (3), 326–360.
Maheu, J.M., McCurdy, T.H., 2002. Nonlinear features of FX realized volatility. Review of Economics and Statistics 84 (4), 668–681.
Maheu, J.M., McCurdy, T.H., 2007. Components of market risk and return. Journal of Financial Econometrics 5 (4), 560–590.
Martens, M., van Dijk, D., de Pooter, M., 2003. Modeling and forecasting S&P500 volatility: long memory, structural breaks and nonlinearity. Econometric Institute, Erasmus University Rotterdam.
Meddahi, N., 2002. A theoretical comparison between integrated and realized volatility. Journal of Applied Econometrics 17, 479–508.
Nelson, D.B., 1990. ARCH models as diffusion approximations. Journal of Econometrics 45, 7–39.
Oomen, R.C.A., 2005. Properties of bias-corrected realized variance under alternative sampling schemes. Journal of Financial Econometrics 3, 555–577.
Robert, C.P., Casella, G., 1999. Monte Carlo Statistical Methods. Springer, New York.
Taylor, S., Xu, X., 1997. The incremental information in one million foreign exchange quotations. Journal of Empirical Finance 4, 317–340.
Weigend, A.S., Shi, S., 2000. Predicting daily probability distributions of S&P500 returns. Journal of Forecasting 19, 375–392.
Zhang, L., 2006. Efficient estimation of stochastic volatility using noisy observations: a multi-scale approach. Bernoulli 12 (6), 1019–1043.
Zhang, L., Mykland, P.A., Aït-Sahalia, Y., 2005. A tale of two time scales: determining integrated volatility with noisy high-frequency data. Journal of the American Statistical Association 100 (472), 1394–1411.
Zhou, B., 1996. High-frequency data and volatility in foreign exchange rates. Journal of Business & Economic Statistics 14, 45–52.
Journal of Econometrics 160 (2011) 77–92
Threshold estimation of Markov models with jumps and interest rate modeling✩

Cecilia Mancini a, Roberto Renò b,∗

a Dipartimento di Matematica per le Decisioni, via C. Lombroso 6/17, Università di Firenze, Italy
b Dipartimento di Economia Politica, Piazza S. Francesco 7, Università di Siena, Italy

Article history: Available online 6 March 2010
Abstract. We reconstruct the level-dependent diffusion coefficient of a univariate semimartingale with jumps which is observed discretely. The consistency and asymptotic normality of our estimator are provided in the presence of both finite and infinite activity (finite variation) jumps. Our results rely on kernel estimation, using the properties of the local time of the data generating process, and the fact that it is possible to disentangle the discontinuous part of the state variable through those squared increments between observations not exceeding a suitable threshold function. We also reconstruct the drift and the jump intensity coefficients when they are level-dependent and jumps have finite activity, through consistent and asymptotically normal estimators. Simulated experiments show that the newly proposed estimators perform better in finite samples than alternative estimators, and this allows us to reexamine the estimation of a univariate model for the short term interest rate, for which we find fewer jumps and more variance due to the diffusion part than previous studies. © 2010 Elsevier B.V. All rights reserved.
1. Introduction

In this paper, we focus on the nonparametric estimation of univariate models with jumps. We describe the evolution of a state variable, e.g. an interest rate or a logarithmic asset price, through the following process

\[
dX_t = \mu_t\,dt + \sigma(X_t)\,dW_t + dJ_t, \qquad t \in [0,T], \tag{1.1}
\]
where W is a standard Brownian motion, and J is a pure jump semimartingale. The Brownian semimartingale part has level-dependent volatility and models continuous changes of the state variable. The jumps J = J₁ + J̃₂ can be large and infrequent (finite activity component J₁) as well as small and frequent (infinite activity component J̃₂) but with finite variation. J̃₂ is assumed to be Lévy. Our model allows one to include jump components such as, for instance, doubly stochastic compound Poisson processes, the Variance Gamma process (Madan, 2001), the CGMY model
✩ This paper is complemented by a Web Appendix, containing Monte Carlo experiments and details on threshold implementation, which is downloadable from Roberto Renò’s web page. We wish to acknowledge Michael Johannes and Angelo Melino for useful comments, Jean Jacod for important suggestions of references and advice, two anonymous referees, and participants at the CIREQ conference on realized volatility, held in Montreal in April 2006. All errors and omissions are our own. Cecilia Mancini has benefited from support by MIUR grants no. 2006132713004 for the year 2006 and no. 2004011204-002 for the year 2004. ∗ Corresponding author. E-mail addresses:
[email protected] (C. Mancini),
[email protected] (R. Renò).
doi:10.1016/j.jeconom.2010.03.019
with Y < 1 (Carr et al., 2002) and α-stable, or tempered stable, processes with α < 1. Based on n observations of the state variable Xt in the time span [0, T], the main contribution of this paper is to introduce and analyze fully non-parametric estimators of the function σ(x) with T < ∞. If we further assume J̃₂ ≡ 0 and we consider level-dependent drift µ(Xt) and jump intensity λ(Xt−), we estimate µ(x) and λ(x) as T → ∞. In the absence of the jump component, nonparametric estimation of σ(·) was first studied by Florens-Zmirou (1993). This technique has been refined and extended in Jiang and Knight (1997), Stanton (1997), Bandi and Phillips (2003) and Renò (2008); see also the review of Fan (2005). In the presence of jumps, the only nonparametric estimator of σ(·) we are aware of has been proposed by Johannes (2004). This estimator, whose limiting theory is fully provided in Bandi and Nguyen (2003), allows for a finite activity doubly stochastic compound Poisson jump part and relies on estimation of the infinitesimal moments of the state variable. We instead build on the work of Mancini (2009), who shows that, for T < ∞, when J has finite activity and the interval between two observations shrinks, it is possible to distinguish in which intervals jumps occurred. This is based on the fact that the diffusive part tends to zero at a known rate, namely the modulus of continuity of the Brownian motion paths.1 That allows one to
1 Alternative approaches to disentangle jumps from diffusion are based on power and multipower variation (Barndorff-Nielsen and Shephard, 2004). The properties
identify asymptotically the jump component and remove it from X. This methodology is robust to an enlarging time span (T → ∞) and to the presence of infinite activity jumps. The importance of our estimator stems mainly from the fact that models with jumps are used in a variety of financial applications. For interest rate modeling, Das (2002) and Piazzesi (2005) show that the role of jumps is relevant in incorporating newly released information in interest rate levels. The statistical and economic role of jumps in interest rate modeling is further discussed in Johannes (2004). Underlying jump-diffusion models are used for bond pricing, as in Eberlein and Raible (1999), and more generally for derivatives pricing, see Bakshi et al. (1997), Bates (2000), Eraker et al. (2003), Andersen et al. (2002) and Pan (2002). Infinite activity jump components have also been considered for modeling asset prices, see Carr et al. (2002) and Aït-Sahalia and Jacod (2009). Finally, disentangling the jump component is crucial for risk management, see Andersen et al. (2007) and Corsi et al. (2009). After assessing the asymptotic properties of the proposed estimators, we first show that the estimation of the variance is reliable in realistic Monte Carlo simulations of the short term interest rate (results on Monte Carlo simulations are reported in a companion Web Appendix). We then compare our results on empirical data with those in the literature, and in particular with those in Johannes (2004). We find very different results, in particular fewer jumps and more diffusive variance. Monte Carlo experiments reveal that the jump intensity function estimated in Johannes (2004) is upward biased, corroborating our findings. In our empirical analysis, we also show that the 7-day Eurodollar deposit rate should not be used as a proxy for the short rate, since it contains an inherent jump process triggered by liquidity reasons.
When we estimate and exclude the jump component, we find almost the same volatility on the 7-day rate and on the three-month T-bill rate.

This paper is organized as follows. Section 2 describes the model. The results when J has finite jump activity are presented in Section 3. In Section 4 we show that our proposed estimator of the diffusion coefficient is consistent and asymptotically Gaussian even when J has infinite jump activity with finite variation. In Section 5 we use our method to estimate jump-diffusion models on empirical data of the short rate, and we compare the obtained results with those in the recent literature. Section 6 concludes.

2. Model setup

In this Section, we set up the model and the assumptions. We model the evolution of an observable variable, a price, by a stochastic process Xt, t ∈ [0, T]. We work in a filtered probability space (Ω, (Ft)t∈[0,T], F, P) satisfying the usual conditions (Protter, 2005), where W is a standard Brownian motion, J₁ is a finite activity (FA) pure jump semimartingale (e.g. driven by a doubly stochastic compound Poisson process with jump intensity in L¹(Ω × [0, +∞[)), and J̃₂ is a pure jump Lévy process with infinite activity (see Cont and Tankov (2004)). We then assume that (Xt)t∈[0,T] is a real process following (1.1) with J = J₁ + J̃₂ and such that X₀ ∈ ℝ. We require the following assumption on µt and σ(·) throughout the paper.

Assumption 2.1. µt and σt := σ(Xt) are progressively measurable processes with càdlàg paths such that the SDE (1.1) has a unique strong solution which is adapted and right continuous with left limits on [0, T] (Gihman and Skorohod, 1972; Ikeda and Watanabe, 1981).
of such estimators and their robustness to the presence of infinite activity jumps have been further developed in Woerner (2006), Barndorff-Nielsen et al. (2006) and Jacod (2008).
For example, when J is a Lévy process, it can be decomposed as the sum of the jumps larger than one and the sum of the compensated jumps smaller than one. We allow a little bit more generality, since J₁ is any FA semimartingale, which we can write as

\[
J_{1t} = \int_0^t \int_{\mathbb{R}} x\, m(dx, du) = \sum_{\ell=1}^{N_t} \gamma_\ell,
\]

where m is the jump random measure of J₁, the jump intensity is a stochastic process, and N_T := ∫₀ᵀ ∫_ℝ 1·m(dx, dt) is a.s. finite.

We assume we have n + 1 discrete time observations {X₀, X_{t₁}, …, X_{t_{n−1}}, X_{t_n}}, with t_n = T. We assume that the step between consecutive observations is constant: t_i − t_{i−1} = δ, for all i = 1, …, n, so that t_i = iδ. The extension to the case of deterministic unevenly spaced data is straightforward for the consistency results, see e.g. Mancini (2009). For a given semimartingale Z we indicate by Δ_i Z := Z_{t_i} − Z_{t_{i−1}} and ΔZ_t := Z_t − Z_{t⁻}; [Z]ᶜ_t denotes the quadratic variation of the continuous martingale part of Z. We use classical kernel theory, so we introduce the bandwidth.

Definition 2.2. A bandwidth parameter is a sequence of real numbers h such that, as n → ∞, we have h → 0 and nh → ∞.

An example of a bandwidth parameter which is very popular in applications is the following:

\[
h = h_s\, \hat\sigma\, n^{-1/5}, \tag{2.1}
\]

where h_s is a real constant to be tuned, and σ̂ is the sample standard deviation.

Definition 2.3. A kernel K is a non-negative real function such that ∫₋∞⁺∞ K(s) ds = 1.

An example of a smooth kernel is the Gaussian density. The indicator function used by Florens-Zmirou (1993) is K(u) = ½ I_{{|u|≤1}}.

3. Threshold estimation: finite jump activity

3.1. Estimating the diffusion coefficient

Proposition 3.1. For X evolving as in (1.1), with fixed T > 0, if σ(x) is continuous and σ²(x) > 0 for all x; there exists ε ∈ ]0, 1[ such that, as n → ∞, δ ln(1/δ)/h^{2+ε} → 0; K^{(ℓ)} are bounded and absolutely integrable on ℝ for ℓ = 1, …, m − 1,
where m is the first integer such that m > 2/ε, and K and K^{(m)} are bounded; then we have

\[
\hat L^{\star}_T(x) := \frac{1}{h} \sum_{i=1}^{n} K_{i-1}\,\delta \ \xrightarrow{a.s.}\ \frac{L^{\star}_T(x)}{\sigma^2(x)},
\]

where

\[
K_{i-1} := K\Big(\frac{X_{t_{i-1}} - x}{h}\Big), \qquad
L^{\star}_T(x) = L_T(x)\int_{\mathbb{R}^+} K(u)\,du + L_T(x^-)\int_{\mathbb{R}^-} K(u)\,du,
\]

for all x visited by X, as δ and h → 0.
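The occupation-time estimator of Proposition 3.1 is straightforward to compute from a discrete record. The following is a minimal NumPy sketch; the simulated Brownian path, the seed and the tuning constants are illustrative assumptions for the example, not values from the paper:

```python
import numpy as np

def lhat_star(X, delta, x, h, kernel=None):
    """Occupation-time estimator of Proposition 3.1:
    (1/h) * sum_i K((X_{t_{i-1}} - x)/h) * delta,
    converging a.s. to L*_T(x)/sigma^2(x) as delta, h -> 0."""
    if kernel is None:
        # Gaussian kernel, which satisfies the kernel assumptions
        kernel = lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)
    weights = kernel((X[:-1] - x) / h)  # evaluated at left endpoints X_{t_{i-1}}
    return weights.sum() * delta / h

# Illustration on a simulated Brownian path (sigma = 1, T = 1).
rng = np.random.default_rng(0)
n, T = 50_000, 1.0
delta = T / n
X = np.concatenate(([0.0], np.cumsum(np.sqrt(delta) * rng.standard_normal(n))))
h = 3 * X.std() * n ** (-1 / 5)    # bandwidth (2.1) with h_s = 3
occ = lhat_star(X, delta, 0.0, h)  # occupation density of the path near x = 0
```

Since σ ≡ 1 here, integrating the estimate over a fine grid of levels x recovers the total time span T, a quick sanity check on any implementation.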
Theorem 3.2. For X evolving as dX_t = µ_t dt + σ(X_t) dW_t + dJ_{1,t}, t ∈ ]0, T], with fixed T > 0, if

1. σ(x) is continuous, σ² is strictly positive, σ′ is bounded;
2. as δ → 0 both the threshold function ϑ(δ) and δ ln(1/δ)/ϑ(δ) tend to zero;
3. there exists ε ∈ ]0, 1[ such that, as n → ∞, δ^{1−ε} ln(1/δ)/h^{2+ε} → 0 (which implies that nh^{2+ε} → ∞) and nh³ → 0;
4. K^{(ℓ)}, (K²)^{(ℓ)} are bounded and absolutely integrable on ℝ for ℓ = 0, …, m − 1, where m is the first integer such that m > 2/ε, and K^{(m)}, (K²)^{(m)} are bounded; ∫_ℝ K^{(ℓ)}(u)|u| du < ∞ for all ℓ = 0, …, m − 1;

then for all x visited by X the estimator

\[
\hat\sigma_n^2(x) := \frac{\sum_{i=1}^n K\big(\frac{X_{t_{i-1}}-x}{h}\big)\,(\Delta_i X)^2\, I_{\{(\Delta_i X)^2 \le \vartheta(\delta)\}}}{\sum_{i=1}^n K\big(\frac{X_{t_{i-1}}-x}{h}\big)\,\delta} \tag{3.1}
\]

satisfies

\[
\sqrt{nh}\,\big(\hat\sigma_n^2(x) - \sigma^2(x)\big) \ \xrightarrow{st}\ \mathcal{MN}\Big(0,\; 2\,\frac{\sigma^6(x)\,L^{\star\star}_T(x)}{(L^{\star}_T)^2(x)}\Big), \tag{3.2}
\]

where the above convergence is stable in law, MN(0, U) is a random variable having a mixed normal law with characteristic function φ(u) = E[e^{−U²u²/2}], and L^{⋆⋆}_T(x) = L_T(x) ∫_{ℝ⁺} K²(u) du + L_T(x⁻) ∫_{ℝ⁻} K²(u) du.

Remark 3.3. Stable convergence implies convergence in distribution, and it is well known that asymptotic normality implies convergence in probability of the estimator.

Remark 3.4. The crucial assumption is the required speed of convergence of the threshold ϑ(δ). Theorem 3.2 holds even if the threshold varies with time according to

\[
\tilde\vartheta_t(\delta) = c_t\,\vartheta(\delta),
\]

where ϑ̃ satisfies the threshold conditions, and c_t is an a.s. bounded stochastic process which is also a.s. bounded away from zero, after replacing I_{{(Δ_i X)² ≤ ϑ(δ)}} with I_{{(Δ_i X)² ≤ ϑ̃_{t_i}(δ)}}. This is extremely useful in financial applications, as will be shown later in the paper.

Remark 3.5. If the kernel is symmetric around zero, as in Bandi and Nguyen (2003), we have L^{⋆}_T(x) = (L_T(x) + L_T(x⁻))/2 and L^{⋆⋆}_T(x) = L_T(x) ∫_{ℝ⁺} K²(u) du + L_T(x⁻) ∫_{ℝ⁻} K²(u) du. Note that when X is continuous the asymptotic variance in (3.2) is consistent with Theorem 1′ in Florens-Zmirou (1993), given the different definitions of the local time of X. Moreover, L^{⋆⋆}_T(x)/σ²(x) is estimated by (1/h) Σ_{i=1}^n K²_{i−1} δ (the proof is the same as for Proposition 3.1, with K² in place of K).

Remark 3.6. In Mancini (2009) the integrated volatility IV_T := ∫₀ᵀ σ²_s ds is estimated using the threshold estimator Σ_{i=1}^n (Δ_i X)² I_{{(Δ_i X)² ≤ ϑ(δ)}}, with σ_s any càdlàg process. Here we estimate the spot volatility σ(X_s) through the joint combination of the localization procedure induced by kernels and the threshold technique to get rid of the jumps.

3.2. Estimating the drift and the intensity function

Here we assume that J_{1t} = Σ_{ℓ=1}^{N_t} γ_ℓ, where N is a doubly stochastic Poisson process with an intensity process λ_t. If we restrict to the case µ_t := µ(X_t) and λ_t := λ(X_{t⁻}), it is possible to estimate the drift and the jump intensity functions by letting n, T → ∞ and δ = T/n → 0. The estimator for λ(x) is devised using the estimated number of jumps.

Theorem 3.7. For a Harris recurrent process X evolving as dX_t = µ(X_t)dt + σ(X_t)dW_t + dJ_{1,t}, t ∈ ]0, +∞[, with λ_t := λ(X_{t⁻}), assume:

1. µ(x), σ(x), λ(x) are bounded; µ(x) and σ(x) are twice continuously differentiable and satisfy local Lipschitz and growth conditions (see Bandi and Nguyen, 2003, Assumption 1); λ(x) ≥ 0, σ²(x) > 0;
2. J₁ is such that ∀ε > 0, P{|γ_ℓ| < ε} ≤ cε, and the jump sizes {γ_ℓ}_ℓ are independent of N;
3. ϑ(δ) = δ^η, η ∈ ]0, 1[, with nδ^{1+η/2} → 0;
4. the bandwidth parameter is of the form h = δ^φ, with φ ∈ ]0, η/2[;
5. the kernel K is symmetric around zero, bounded, continuously differentiable with integrable first derivative; K is square integrable and such that ∫_ℝ s²K(s) ds < ∞;
6. as n, T → ∞ and h → 0, δ → 0 we have δ ln(1/δ)·L̂^{⋆}_T(x) → 0 a.s., and h L̂^{⋆}_T(x) → ∞ but h⁵ L̂^{⋆}_T(x) → 0 a.s. for all visited x, with L̂^{⋆}_T(x) defined as in Proposition 3.1.

Define:

\[
\hat\mu_n(x) = \frac{\sum_{i=1}^n K\big(\frac{X_{t_{i-1}}-x}{h}\big)\,\Delta_i X\; I_{\{(\Delta_i X)^2 \le \vartheta(\delta)\}}}{\sum_{i=1}^n K\big(\frac{X_{t_{i-1}}-x}{h}\big)\,\delta}. \tag{3.3}
\]

Then for each x visited by X we have

\[
\sqrt{h \hat L^{\star}_T(x)}\,\big(\hat\mu_n(x) - \mu(x)\big) \xrightarrow{d} \mathcal{MN}\big(0,\, K_2\,\sigma^2(x)\big),
\]

where K₂ := ∫_ℝ K²(u) du. Further, if K′ is bounded and c_{i,n} is a double array of constants with i = 1, …, n such that, for all x, √(h L̂^{⋆}_T(x)) sup_i |1 − c_{i,n}| → 0 as n → ∞, define:

\[
\hat\lambda_n(x) = \frac{\sum_{i=1}^n K\big(\frac{X_{t_{i-1}}-x}{h}\big)\, c_{i,n}\, I_{\{(\Delta_i X)^2 > \vartheta(\delta)\}}}{\sum_{i=1}^n K\big(\frac{X_{t_{i-1}}-x}{h}\big)\,\delta}. \tag{3.4}
\]

Then, for each x visited by X, we have:

\[
\sqrt{h \hat L^{\star}_T(x)}\,\big(\hat\lambda_n(x) - \lambda(x)\big) \xrightarrow{d} \mathcal{MN}\big(0,\, K_2\,\lambda(x)\big).
\]
Remark 3.8. The rate of divergence of L_T(x) is T for a stationary process, and is less than or equal to T, and generally unknown, for a recurrent process; see the discussion in Bandi and Renò (2008).
Remark 3.9. All the conditions linking δ, n, h and ϑ are compatible as soon as φ is sufficiently small (close to 0) and η is sufficiently large (close to 1). For instance, with φ = 1/8 and η = 0.9 all such conditions are fulfilled. The assumption ∀ε > 0, P{|γ_k| < ε} ≤ cε is fulfilled e.g. for Gaussian or exponential jump sizes. Moreover, it is satisfied when, for instance, each |γ_i| is bounded away from zero, or the laws γ_i(P) have a density on ℝ which is continuous in a neighborhood of zero.
The multiplying factors c_{i,n} in λ̂_n(x) can play an important role after making assumptions on the distribution of the jump sizes. Since we estimate the intensity using only those variations which are larger than the threshold, we discard all jumps whose size is below the threshold. If we assume the jump sizes to be normally distributed with mean 0 and variance σ_J², we can attenuate this problem considerably by setting

\[
c_{i,n} = \frac{1}{2\,F_N\big(-\sqrt{\vartheta}/\sigma_J\big)}, \tag{3.5}
\]

where F_N(x) is the cumulative normal distribution function. Results on the three-month rate in Section 5 indicate that the correction (3.5) delivers an unbiased estimator of the jump intensity. We remark that the time series of estimated jumps Δ_i X I_{{(Δ_i X)² > ϑ(δ)}} allows one to estimate σ_J² using, for example, a simple method of moments. In fact, if a random variable V has law N(0, σ_J²), its second moment m₂(c) conditional on |V| ≥ c is given by:

\[
m_2(c) = \sigma_J^2 + \frac{\sigma_J\, c\, \exp\big(\tfrac{-c^2}{2\sigma_J^2}\big)}{F_N(-c/\sigma_J)\,\sqrt{2\pi}}. \tag{3.6}
\]

Matching m₂(√ϑ) in (3.6) to the variance of the estimated jump sizes provides an estimator of σ_J².

4. Threshold estimation: infinite jump activity

In this section we fix T > 0 and assume that J̃₂ is a pure jump Lévy process of type

\[
\tilde J_{2s} := \int_0^s \int_{|x|\le 1} x\,[m(dt, dx) - \nu(dx)\,dt]
\]

with ν{|x| ≤ 1} = +∞, where ν is the Lévy measure of J̃₂, and X follows (1.1) with J₁ a pure jump finite activity semimartingale. J̃₂ is an infinite activity jump process: every trajectory jumps infinitely many times on every finite time interval. For any Lévy process the Blumenthal–Getoor index (Cont and Tankov, 2004),

\[
\alpha := \inf\Big\{\delta \ge 0 : \int_{|x|\le 1} |x|^{\delta}\, \nu(dx) < +\infty\Big\},
\]

measures how frenetic the jump activity is: the smaller α, the milder the activity. By definition, we have α ∈ [0, 2]. For instance, a finite activity Lévy pure jump process (i.e. a compound Poisson process) has α = 0, and the VG process is infinite activity but still has α = 0. An α-stable process has Blumenthal–Getoor index equal to α. When α < 1, J̃₂ has finite variation, namely a.s. Σ_{s≤T} |ΔJ̃_{2,s}| < ∞. In this case there exists a version of the local time L_t(x) which is continuous in t and càdlàg in x, and the occupation time formula remains true (Protter (2005), ch. 4, thms 76 and following). In what follows, we show that estimator (3.1) still works in the present framework. To our knowledge, this is the first nonparametric kernel estimator which solves the problem of identifying the diffusion coefficient σ(x) in the presence of infinite activity jumps.

Theorem 4.1. For X evolving as dX_t = µ_t dt + σ(X_t) dW_t + dJ_t, t ∈ ]0, T], fixed T > 0, J = J₁ + J̃₂, if

1. σ(x) is continuous, σ² is strictly positive, σ′ is bounded;
2. α < 1 and

\[
\int_{|x|\le\varepsilon} x^2\, \nu(dx) = O(\varepsilon^{2-\alpha}), \quad \text{as } \varepsilon \to 0; \tag{4.1}
\]

3. the bandwidth is of the form h = δ^φ, with φ ∈ ]1/3, 3/8[ (which implies that there exists ε > 0 such that φ < 2ε/(1 − 2ε), ε − 1/2 + φ(1/2 + ε) ≥ 0 and δ^{1−ε} ln(1/δ)/h^{2+ε} → 0);
4. the threshold function is of the form ϑ(δ) = δ^η, with η ∈ ]5/6, 1[ such that η/2 > φ;
5. K^{(ℓ)}, (K²)^{(ℓ)} are bounded and absolutely integrable on ℝ for ℓ = 0, …, m − 1, where m is the first integer such that m > 2/ε, and K^{(m)}, (K²)^{(m)} are bounded; ∫_ℝ K^{(ℓ)}(u)|u| du < ∞ for all ℓ = 0, …, m − 1;

then for all x visited by X we have L̂^{⋆}_T(x) →^P L^{⋆}_T(x)/σ²(x), and (3.2) still holds.

Remark 4.2. (i) Assumption 2 is satisfied if, for instance, ν has a density f(x) such that f(x) behaves as G(|x|)/|x|^{1+α} when x → 0, where G is a real function with lim_{x→0} G(x) ∈ ℝ − {0}, and α (the Blumenthal–Getoor index of J̃₂) is less than 1. In particular it holds if J̃₂ is Variance Gamma, or tempered stable with Blumenthal–Getoor index strictly less than 1 (so in particular if it is a CGMY model with Y < 1), or α-stable with α < 1. Assumption 2 is satisfied more generally if J̃₂ is a semimartingale, in which case the compensator of the jump measure is of type ν_{t,ω}(dx)dt, when the behavior of ν_{t,ω}(dx) for x around the origin is controlled as in (4.1), uniformly in t ∈ [0, T].

(ii) Condition φ > 1/3 ensures that nh³ → 0. Condition φ < 3/8 ensures that there exist a small ε > 0 and a p > 0 such that 1 − ε − φ(2 + ε) − p > 0, implying δ^{1−ε} ln(1/δ)/h^{2+ε} → 0, as was requested for Theorem 3.2. For each η ∈ ]0, 1[ the threshold constraints ϑ(δ) → 0 and δ ln(1/δ)/ϑ(δ) → 0 are satisfied. Condition η > 5/6 ensures that η(1 − α/4) > 1/2, and thus η/2 + η > 1/2 + αη/2 for all α < 1, which is used in the proof of Theorem 4.1 to show that the term S₄ in the normalized estimation error is negligible. Condition η/2 > φ ensures that ϑ(δ)^{k/2}/h^k → 0 for all k > 0, which is used a number of times, e.g. to show that the term S₃ tends to zero in probability. The choice e.g. η = 0.9, φ = 0.35, ε = 0.2407 allows one to satisfy all the conditions requested in points 3 and 4 of the above Theorem, for all α < 1.

(iii) The Gaussian kernel K(u) = e^{−u²/2}/√(2π) satisfies the assumptions of Theorems 3.2 and 4.1.
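The compatibility of the tuning exponents claimed in Remark 4.2(ii) can be checked mechanically. The sketch below uses exact rational arithmetic and sets ε to (1 − φ)/(2(1 + φ)) = 13/54, the exact value behind the rounded ε ≈ 0.2407, so that the borderline inequality ε − 1/2 + φ(1/2 + ε) ≥ 0 holds without floating point error:

```python
from fractions import Fraction

# Tuning exponents from Remark 4.2(ii): eta = 0.9, phi = 0.35, and
# eps = (1 - phi) / (2 * (1 + phi)) = 13/54 ~ 0.2407 (exact fraction).
eta = Fraction(9, 10)
phi = Fraction(35, 100)
eps = (1 - phi) / (2 * (1 + phi))

checks = {
    "phi in ]1/3, 3/8[": Fraction(1, 3) < phi < Fraction(3, 8),
    "phi < 2*eps/(1 - 2*eps)": phi < 2 * eps / (1 - 2 * eps),
    "eps - 1/2 + phi*(1/2 + eps) >= 0":
        eps - Fraction(1, 2) + phi * (Fraction(1, 2) + eps) >= 0,
    "eta in ]5/6, 1[": Fraction(5, 6) < eta < 1,
    "eta/2 > phi (Theorem 4.1, point 4)": eta / 2 > phi,
    "eta*(1 - alpha/4) > 1/2 for all alpha < 1":
        all(eta * (1 - a / 4) > Fraction(1, 2)
            for a in (Fraction(0), Fraction(1, 2), Fraction(99, 100))),
}
assert all(checks.values()), checks
```

With these values the third inequality holds with equality, which is why ε should not be rounded down before checking it.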
With respect to Bandi and Nguyen (2003), not only do we allow for infinite activity jumps, but also our alternative methodology provides (at the cost of the additional specification of the threshold function ϑ ) significant improvements in small samples as shown in the simulation studies, see the Web Appendix. 5. Estimation of short rate models In this Section, we implement the proposed estimator on interest rate time series to reexamine nonparametric estimation of jump-diffusion models of the short rate. 5.1. Proxying the short rate The short rate is inherently unobservable, and it has to be proxied using interest rates of short term zero coupon bonds, e.g. Chapman et al. (1999). This problem is similar to that of approximating the derivative of a function, which is not observed continuously, by using finite increments. The literature has focused on the use of two time series: the 7-day Eurodollar deposit time series, e.g. in Ait-Sahalia (1996), Bandi (2002) and Hong and Li
Fig. 5.1. Top: the 7-day Eurodollar deposit daily time series, from June 1973 to February 1995, for a total of 5505 observations. Bottom: the 3-month T-bill daily time series, same time span.
(2005), and the 3-month Treasury Bill time series, e.g. in Stanton (1997) and Jiang (1998). They have important differences since, while the 3-month Treasury Bill rate is a market rate, the Eurodollar is an interbank rate and is more subject to sticky prices and reserve requirements. In our first application, we consider the time series of daily observations in the period starting in June 1973 and ending in February 1995, for a total of 5505 observations, for both the 7-day Eurodollar time series and the 3-month T-Bill. The time series of the 7-day Eurodollar deposit coincides with that employed by Ait-Sahalia (1996), Bandi (2002) and by Hong and Li (2005) for their nonparametric estimates. The 3-month T-bill time series is a subset of that used by Stanton (1997). Fig. 5.1 displays the two time series. The 7-day series is slightly higher in levels. The average difference between the 7-day and the 3-month time series is 1.08%, with a standard deviation of 1.04%. Thus, there is a slight term structure effect. However, the main difference between the two is that, while the 3-month time series looks more like a continuous diffusive process, the 7-day time series displays frequent spikes, with a periodic pattern. Indeed it is well known, see e.g. Duffee (1996) and Hamilton (1996), that interest rate instruments with maturity below three months display idiosyncratic features which are mostly due to calendar and liquidity effects, e.g. reserve management, and spikes are induced by liquidity shortages which nevertheless do not affect higher maturity instruments. Implementing the classical drift and diffusion estimators (Florens-Zmirou, 1993; Jiang and Knight, 1997; Stanton, 1997):
\[
S^{n}_{K,j}(x) = \frac{\sum_{i=1}^n K\big(\frac{X_{t_{i-1}}-x}{h}\big)\,(\Delta_i X)^j}{\sum_{i=1}^n K\big(\frac{X_{t_{i-1}}-x}{h}\big)\,\delta} \tag{5.1}
\]
with j = 1 for the drift and j = 2 for the variance on the two time series, we get the estimates displayed in Figs. 5.2 and 5.3. Confidence bands are obtained via a Monte Carlo simulation as explained in the Web Appendix. For the drift function, the estimates obtained for the two interest rate time series are quite
Fig. 5.2. Estimates of the drift function µ(r ) on the two time series, obtained with the estimator proposed in Stanton (1997). The dashed lines are the (10%, 90%) confidence bands for the estimate on the 3-month time series, computed via a Monte Carlo simulation. The dotted lines are confidence bands for the estimate on the 7-day time series. The solid line is the estimate obtained in Bandi (2002) for the 7-day time series.
consistent, even if the 7-day time series is estimated to be more mean-reverting than the 3-month time series. For the diffusion coefficient, big differences are instead observed. In fact, the spikes in the 7-day time series enormously inflate the observed variance of the short rate. This is rather annoying, since the two time series are supposed to proxy for the same economic variable (the short rate). It is important to remark that the estimates on the 7-day time series are almost identical to the estimates proposed by Bandi (2002), which are overlaid2 on Figs. 5.2 and 5.3, on the very same time series.3 The observed bias points towards the presence of a jump component which distorts the variance estimate, in both time series. However, the jump component is much more relevant for the 7-day time series. Thus, if we properly estimate a jump-diffusion model, once jumps are detected and deleted, the diffusion and the drift coefficient should look similar on the two time series. Loosely speaking, the two time series represent nearly the same variable, but the 7-day rate is with jumps, and the 3-month rate is without jumps. Thus, they are a suitable battlefield for the threshold estimator proposed in this paper. It is implemented in the following way. First, a GARCH(1, 1) model is estimated4 on the 3-month time series, and it is used as an auxiliary model to filter the conditional variance. We then
2 The estimates obtained in Bandi (2002) and, later in the paper, by Johannes (2004) have been obtained by scanning the images in their papers and using digitization software. Thus, they should be regarded as indicative, even if the error of the digitization procedure is negligible.
3 Simulation results in Renò et al. (2006) show that the Bandi and Phillips (2003) estimator, used in Bandi (2002), and the Florens-Zmirou (1993) estimator, used in Stanton (1997), yield nearly identical results in estimating the diffusion function.
4 The model is h_t = ω + α·(X_{t−1} − X_{t−2})² + β·h_{t−1} and the following estimates are used: ω̂ = 0.75·10⁻⁸, α̂ = 0.091379, β̂ = 0.906878. h_t is initialized with its estimated unconditional variance.
Fig. 5.3. Estimates of the diffusion function σ 2 (r ) on the two time series, obtained with the Stanton (1997) estimator (5.1). The dashed lines are the (10%, 90%) confidence bands for the estimate on the 3-month time series, computed via a Monte Carlo simulation. The dotted lines are confidence bands for the estimate on the 7-day time series. The solid line is the estimate obtained in Bandi (2002) for the 7-day time series.
implement the estimator (3.1) with the following threshold5:

\[
\vartheta_t = 9 \cdot h_t, \tag{5.2}
\]
where (h_t)_{t=1,…,T} is the filtered variance from the GARCH(1, 1) estimation. This procedure cuts observations whose variations are three conditional standard deviations away from zero. By using a conditionally varying threshold, we are more conservative in flagging jumps when the diffusive variance is high, that is, when a large movement is more likely due to the diffusive component than to a jump. We use the same estimates of the GARCH model for the two time series, while the threshold is not the same, since it also depends on the time series of squared differences. The time series of detected jumps is shown in Fig. 5.4. As expected, we find many more jumps (607 jumps, 11.0% of observations) on the 7-day time series than on the 3-month time series (85 jumps, 1.54% of observations), and jump sizes are much larger on the 7-day time series. We then estimate the intensity function λ(r) for the two time series using the estimator (3.4) with the correction (3.5). Here and throughout the rest of the paper, the threshold estimator is implemented with a Gaussian kernel and with the bandwidth (2.1), with h_s = 3 when estimating the drift and diffusion functions, and with h_s = 5 when estimating the intensity function. Fig. 5.5 shows the result. The estimates are very different on the two samples. On the 7-day time series, we find a typical intensity of 35 jumps per year, and an intensity function which is not monotone. The estimate on the 7-day time series is close to the estimated intensity in Das (2002), who estimates a jump-diffusion
5 The behavior of the threshold function is constrained when δ → 0. With fixed δ > 0, the choice of the threshold is an empirical matter for which there is as yet no theoretical guidance. The GARCH(1, 1) model we use should be regarded as a parsimonious auxiliary model to account for heteroskedasticity. An alternative threshold is studied in the Web Appendix, providing very similar results to those reported in this section.
Fig. 5.4. Top: the time series of detected jumps on the 7-day time series. Bottom: the time series of detected jumps on the 3-month time series.
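The detection procedure just described — the auxiliary GARCH(1, 1) recursion of footnote 4 combined with the time-varying threshold (5.2) — is only a few lines of code. A minimal NumPy sketch; the simulated sample path, the seed and the injected jump positions are illustrative assumptions, while the GARCH parameters are the estimates quoted in footnote 4:

```python
import numpy as np

def detect_jumps(X, omega, alpha, beta):
    """Flag the increments exceeding the time-varying threshold (5.2),
    theta_t = 9 * h_t, where h_t follows the auxiliary GARCH(1,1) recursion
    h_t = omega + alpha * (X_{t-1} - X_{t-2})^2 + beta * h_{t-1},
    initialized at its unconditional variance (footnote 4)."""
    dX = np.diff(X)
    h = np.empty_like(dX)
    h[0] = omega / (1.0 - alpha - beta)  # unconditional variance
    for t in range(1, len(dX)):
        h[t] = omega + alpha * dX[t - 1] ** 2 + beta * h[t - 1]
    return dX**2 > 9.0 * h               # True where a jump is detected

# Illustration: diffusive daily moves plus three injected jumps.
rng = np.random.default_rng(2)
n = 5000
dX = 0.001 * rng.standard_normal(n)
jump_days = [500, 1500, 3000]
dX[jump_days] += 0.02
X = np.concatenate(([0.05], 0.05 + np.cumsum(dX)))
flags = detect_jumps(X, omega=0.75e-8, alpha=0.091379, beta=0.906878)
```

All three injected jumps (twenty diffusive standard deviations each) are flagged, while the conditionally varying threshold keeps false detections to roughly the three-standard-deviation tail of the diffusive moves.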
model on US federal fund rates, and finds an intensity of 13–17 jumps per year. Confidence bands indicate a severe downward bias6 in estimation. For the 3-month time series, we get a typical intensity of 5 jumps per year, and the intensity increases with the level of the interest rate. It is in very good agreement with the estimate of Andersen et al. (2004) of around 5 jumps per year. Confidence bands indicate that the proposed intensity estimator is unbiased. We are disentangling the idiosyncratic noise of the 7-day Eurodollar deposit rate, which is driven by liquidity reasons and shows up in the form of discontinuous variations, from the continuous variations of the short rate. The detected jumps on the 3-month time series have instead been shown to be mostly due to macroeconomic announcements (Johannes, 2004; Piazzesi, 2005). We now turn to the estimate of the drift and diffusion coefficients. Estimation results are displayed in Fig. 5.6 for the diffusion coefficient and in Fig. 5.7 for the drift coefficient. Both figures are on exactly the same scale as Figs. 5.2 and 5.3 respectively, to allow the reader to directly compare the two. The results are compelling. After filtering for the jump components, the two estimates of the diffusion function do actually look the same, as expected given the above discussion. For the 3-month time series, the difference between the estimate using the threshold estimator and using the classical estimator is not very large, with the classical estimator providing a larger estimate. For the 7-day time series the situation is completely different; with the same threshold, we detect a large number of jumps, and this makes a large difference in estimating the variance function. The same considerations apply to the estimate of the drift coefficient. The confidence bands obtained in Fig. 5.6 on the 7-day time series suggest that the estimate is upward biased.
This problem is originated by the severe bias in estimating the intensity function on the 7-day time series, see Fig. 5.5. In Monte Carlo experiments
6 As explained in the Web Appendix, when confidence bands are below the estimate, it means that the estimate is downward biased. This is a clue of misspecification. Indeed, a single factor model may be inadequate in describing the dynamics of the short rate, and the extension to a stochastic volatility model has been advocated by many authors, see e.g. Andersen and Lund (1997).
C. Mancini, R. Renò / Journal of Econometrics 160 (2011) 77–92
Fig. 5.5. Estimates of the jump intensity function λ(r ) on the two time series, obtained with the threshold estimator (3.4) proposed in this paper. The dashed lines are the (10%, 90%) confidence bands for the estimate on the 3-month time series, computed via a Monte Carlo simulation. The dotted lines are confidence bands for the estimate on the 7-day time series.
83
Fig. 5.7. Estimates of the drift function µ(r ) on the two time series, obtained with the estimator (3.3) proposed in this paper. The dashed lines are the (10%, 90%) confidence bands for the estimate on the 3-month time series, computed via Monte Carlo simulation. The dotted lines are confidence bands for the estimate on the 7day time series.
in detecting jumps, and a noise of 0.17%. On the 7-day time series the efficiency is lower: 26.70% and also the noise is lower: 0.02%. Concluding, if the purpose is proxying the market spot rate, the 7-day time series, which is not a market rate series, should not be used, while it is better to use the 3-month time series. However, when carefully implementing a threshold estimator, the estimates of the continuous part of the two time series are consistent. 5.2. A comparison with the Bandi–Nguyen–Johannes estimator
Fig. 5.6. Estimates of the diffusion function σ 2 (r ) on the two time series, obtained with the estimator (3.1) proposed in this paper. The dashed lines are the (10%, 90%) confidence bands for the estimate on the 3-month time series, computed via Monte Carlo simulation. The dotted lines are confidence bands for the estimate on the 7day time series.
with normally distributed jump sizes most of the jumps are below the threshold, and the intensity is underestimated. All the jumps which are not detected contribute spuriously to the estimated variance, which turns out to be larger. On the simulated paths of the model estimated for the 3-month time series (see the Web Appendix), we get an efficiency of 39.47%
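The drift and diffusion estimates discussed above come from kernel-weighted regressions in which increments whose square exceeds a threshold are attributed to jumps and discarded. The following is a minimal illustrative sketch of that idea, not the paper's exact estimators (3.1) and (3.3): the function name, the Gaussian kernel, and the use of a constant threshold (rather than a vanishing threshold ϑ(δ)) are our own simplifying assumptions.

```python
import numpy as np

def threshold_estimators(X, delta, x_grid, h, threshold):
    """Kernel threshold estimators of drift mu(x) and diffusion sigma^2(x).

    Increments with dX^2 > threshold are treated as containing jumps and
    are excluded from the sums. This is a sketch in the spirit of the
    paper's estimators; the real estimators use a threshold vanishing
    with the observation step delta.
    """
    dX = np.diff(X)
    keep = dX**2 <= threshold            # increments kept as "continuous"
    X0 = X[:-1]                          # левels at the start of each step
    K = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)  # Gaussian kernel
    mu_hat, sig2_hat = [], []
    for x in x_grid:
        w = K((X0 - x) / h)              # kernel weights around level x
        denom = np.sum(w) * delta
        mu_hat.append(np.sum(w * dX * keep) / denom)
        sig2_hat.append(np.sum(w * dX**2 * keep) / denom)
    return np.array(mu_hat), np.array(sig2_hat)
```

On a simulated mean-reverting path without jumps, such an estimator should roughly recover the (constant) squared diffusion coefficient at the central level of the process.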
In this section, we directly compare the results obtained with the threshold estimator with those obtained with the estimator proposed by Johannes (2004) and studied in Bandi and Nguyen (2003). To this end, we use the very same data set used in Johannes (2004), that is, the time series of daily 3-month Treasury-bill annualized rates from January 1965 to February 1999, for a total of 8522 observations. The time series of interest rate levels and first differences are plotted in Fig. 5.8. Observations range from a lowest value of 2.61% to a largest value of 17.14%. On this data set, we use the threshold estimator to estimate the squared diffusion function σ²(r), the drift function µ(r) and the intensity function λ(r). The estimate of the squared diffusion function is plotted in Fig. 5.9 and directly compared to that obtained in Johannes (2004).7 We get a large difference between the two estimates. The estimate obtained with the threshold estimator oscillates less and is considerably and significantly higher, for all interest rate levels. Substantial differences, in the opposite direction, are also observed for the intensity function. Both methods find that the intensity of jumps increases with the level of interest rates, but Johannes (2004) estimates an intensity ranging from 15 to 30 jumps per year when the short rate is less than 10%, while, with the threshold estimator, we find in the same interest rate range an intensity from 5 to 10 jumps per year. For short rates larger than 10% the intensity becomes much larger for both estimators. Moreover, assuming normally distributed jumps with zero mean and variance σJ², using the methodology explained in Section 3.2 we find a point estimate of σ̂J² = 0.00397668, more than double the value (σJ² = 0.0018) estimated in Johannes (2004). Parametric estimates on similar data are much closer to the estimate obtained with the threshold estimator: for example, Andersen et al. (2004) estimate around 5 jumps per year. Most importantly, this difference is explained by the Monte Carlo simulations reported in the Web Appendix, which show that the Johannes (2004) estimator of the intensity function is highly upward biased.8 As a consequence, it also provides spuriously lower estimates of the diffusion function. Instead, the estimate of the intensity function obtained with the threshold estimator is unbiased, as shown by the Monte Carlo confidence bands in Fig. 5.11 (see also the results in the Web Appendix). For the drift, we find that the estimate obtained with the threshold estimator is very similar to that obtained with the Bandi–Nguyen–Johannes estimator; see Fig. 5.10. Looking at confidence bands, we observe significant mean reversion only for rates less than 3% and rates larger than 14%. This is in substantial agreement with the result in Jones (2003), who reports substantial random walk behavior of the short rate in the central part of the distribution, and evidence of mean reversion for extreme values of the short rate. The presence of mean reversion in interest rate data is quite controversial; see e.g. Pritsker (1998) and Chapman and Pearson (2000). Our result confirms that it is very difficult to detect significant mean reversion in interest rate movements, except at extreme levels.
7 In Johannes (2004), the drift and diffusion functions are obtained for the logarithmic differences of the short rate. To compare his results to ours, which are obtained on short rate differences, we transformed his estimates using Itô’s Lemma.
Fig. 5.8. Top: the time series used in Johannes (2004). Bottom: the time series of first differences.
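The intensity comparison above rests on counting flagged increments locally in the level of the rate. A hedged sketch of a kernel intensity estimator in the spirit of (3.4) follows; the function name, the Gaussian kernel, and the constant threshold are our own illustrative choices.

```python
import numpy as np

def intensity_estimator(X, delta, x_grid, h, threshold):
    """Kernel threshold estimator of the jump intensity lambda(x).

    An increment is flagged as containing a jump when its square exceeds
    the threshold; the local jump frequency is then divided by delta.
    If delta is measured in years, the result is in jumps per year.
    """
    dX = np.diff(X)
    jumps = (dX**2 > threshold).astype(float)  # flagged increments
    X0 = X[:-1]
    K = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    lam = []
    for x in x_grid:
        w = K((X0 - x) / h)
        lam.append(np.sum(w * jumps) / (np.sum(w) * delta))
    return np.array(lam)
```

With a well-separated jump scale (jump sizes far above the diffusive scale σ√δ), the estimator should recover the true jump frequency per unit of time.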
Concluding, the threshold estimator and the Bandi–Nguyen–Johannes estimator provide consistent estimates of the drift term, while the two methodologies provide very different estimates of the diffusion and jump intensity functions, with the threshold estimator performing much better in Monte Carlo simulations, as well as being closer to parametric estimates of the jump intensity.
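The point estimate σ̂J² reported above assumes i.i.d. zero-mean normally distributed jump sizes. Given the flagged increments, a deliberately simplified sketch of such a point estimate is shown below; it is our own reduction of the Section 3.2 methodology and ignores the (small) diffusive contamination of flagged increments, which the paper's procedure accounts for.

```python
import numpy as np

def jump_size_variance(dX, threshold):
    """Point estimate of the jump-size variance sigma_J^2 under zero-mean
    Gaussian jumps: the empirical second moment of flagged increments.
    Simplified sketch; flagged increments also contain a small diffusive
    component, ignored here."""
    flagged = dX[dX**2 > threshold]
    return float(np.mean(flagged**2)) if flagged.size else 0.0
```

Note the truncation at the threshold discards the smallest jumps, so this simple version is mildly biased upward for the conditional second moment relative to the unconditional variance when the threshold is non-negligible.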
8 Bandi and Renò (2008) show that the bias is due to poor estimation of higher moments in finite samples, and they propose a correction for the Johannes (2004) estimator which mostly eliminates this bias.
Fig. 5.9. Estimates of the diffusion function σ²(r) with the threshold estimator (3.1), together with the estimate on the same 3-month rate time series published by Johannes (2004). Dashed lines are the (10%, 90%) Monte Carlo confidence bands for the threshold estimator. Dotted lines are the (10%, 90%) Monte Carlo confidence bands published by Johannes (2004).
Fig. 5.10. Estimates of the drift function µ(r) with the threshold estimator (3.3), together with the estimate on the same time series published by Johannes (2004). Dashed lines are the (10%, 90%) Monte Carlo confidence bands for the threshold estimator. Dotted lines are the (10%, 90%) Monte Carlo confidence bands published by Johannes (2004).
6. Conclusions
In this paper, we introduce an estimator of the level-dependent diffusion coefficient in a univariate model with possibly infinite activity jumps, as well as estimators of the drift and the intensity functions when they are level-dependent and the jumps are given by a finite activity doubly stochastic compound Poisson process.
The proposed estimator is based on the fact that, when the time interval δ between two observations tends to zero, jumps can be detected using a threshold which tends to zero more slowly than the modulus of continuity of Brownian motion paths. We use this intuition to devise nonparametric estimators of the jump-diffusion process and analyze their asymptotic properties, even in the presence of infinite activity jumps, extending the preceding literature on nonparametric estimation of continuous diffusions and of diffusions plus finite activity jump processes. Monte Carlo simulations in the Web Appendix show that, in finite samples, the threshold estimator of the diffusion coefficient of realistic interest rate processes is unbiased and performs considerably better than the alternative estimator of Bandi and Nguyen (2003) and Johannes (2004).
Data analysis helps us better understand the peculiarities of short rate dynamics. First, we show that the difference between the 3-month rate and the 7-day rate is due to the jump component, which affects the latter and is driven by liquidity. Our estimates imply that the 7-day time series, and presumably instruments which are not market rates, should not be used to proxy the spot rate. Once the jump component is detected and discarded, the two rates display the same diffusive pattern, as they should, since both are used as proxies of the short rate. Moreover, we show that the threshold estimator provides quite different estimates with respect to the findings in Johannes (2004), especially regarding the relative contributions to the total variance of the short interest rate: we find that more variance has to be ascribed to the diffusive part than to the jump part. We think that our results can be useful not only in the framework of interest rate modeling, but more generally for asset and derivative pricing.
Clearly, the univariate setting proposed in this paper may prove insufficient in dealing with the statistical properties of the short rate, as suggested in Corradi and Distaso (2007). Nonparametric estimation in the presence of more general stochastic volatility and jumps has been conducted by Bandi and Renò (2008) using moment estimators, and may benefit from the techniques proposed in this paper. Moreover, the threshold technique may in principle be used in a semiparametric setting, for example to estimate, once the jumps have been eliminated, the coefficients of a linear drift using OLS-like estimators. A simple parametric model combining threshold and maximum likelihood is explored in Mancini (2003). Further research on these topics is under development.
Fig. 5.11. Estimates of the intensity function λ(r) with the threshold estimator (3.4), together with the estimate on the same time series published by Johannes (2004). Dashed lines are the (10%, 90%) Monte Carlo confidence bands for the threshold estimator. Dotted lines are the (10%, 90%) Monte Carlo confidence bands published by Johannes (2004).
Appendix. Proofs
We recall that δ = T/n, t_i = iδ, X = Y + J. Denote
 Δ_i Z⋆ = Δ_i Z I_{(Δ_i X)² ≤ ϑ(δ)},  K_{i−1} := K((X_{t_{i−1}} − x)/h),  K_s^(ℓ) := K^(ℓ)((X_s − x)/h)
for an integer ℓ, and
 Δ_i Ŷ = Δ_i X⋆,  Δ_i Ĵ = Δ_i X I_{(Δ_i X)² > ϑ(δ)},  Δ_i N̂ = I_{(Δ_i X)² > ϑ(δ)}.
For any bounded process Z we denote Z̄ = sup_{u∈[0,T]} |Z_u|. C denotes a constant; in sequences of inequalities, even if passing from one line to the next we would have a different constant, we still indicate it by C. By σ·W we denote the stochastic integral of σ with respect to W. We denote by (τ_j)_{j∈N} the jump instants of J1 and by τ(i) the instant of the first jump in ]t_{i−1}, t_i], if Δ_i N ≥ 1.
Proof of Proposition 3.1. We set, without loss of generality, T = 1, so that δ = T/n = 1/n. Set L := L_1. Write
 L̂⋆(x) = (1/h) Σ_{i=1}^n K_{i−1} δ = (1/h) ∫_0^1 K_s ds + [(1/h) Σ_{i=1}^n K_{i−1} δ − (1/h) ∫_0^1 K_s ds],  (A.1)
and note that
 (1/h) ∫_0^1 K_s ds = (1/h) ∫_0^1 K((X_{s−} − x)/h) d[X]^c_s / σ²(X_{s−}),
so that, by the occupation time formula (Protter, 1990),
 (1/h) ∫_0^1 K_s ds = (1/h) ∫_R K((a − x)/h) (L(a)/σ²(a)) da = ∫_R K(u) (L(uh + x)/σ²(uh + x)) du,
which a.s. tends to L⋆(x)/σ²(x). For each n define the random sets
 I_{0,n} = {i ∈ {1,…,n} : Δ_i N = 0}  and  I_{1,n} = {i ∈ {1,…,n} : Δ_i N ≠ 0},  (A.2)
so that the remainder in (A.1) coincides with
 (1/h) Σ_{i∈I_{0,n}} ∫_{t_{i−1}}^{t_i} (K_{i−1} − K_s) ds + (1/h) Σ_{i∈I_{1,n}} ∫_{t_{i−1}}^{t_i} (K_{i−1} − K_s) ds.  (A.3)
The second term in (A.3) is dominated by N_1 2K̄ δ/h → 0 a.s., given the assumption on h. The first term in (A.3) can be written using a Taylor expansion up to order m, where 2/m ≤ ε, and it is a.s. dominated by
 Σ_{k=1}^{m−1} (C/h) Σ_{i∈I_{0,n}} ∫_{t_{i−1}}^{t_i} |K_s^(k)| (|X_s − X_{t_{i−1}}|^k / h^k) ds + (C/h) Σ_{i∈I_{0,n}} ∫_{t_{i−1}}^{t_i} |K^(m)((X̃_{is} − x)/h)| (|X_s − X_{t_{i−1}}|^m / h^m) ds,  (A.4)
where the X̃_{is} are suitable values in ]X_{t_{i−1}}, X_s[, for i ∈ I_{0,n}. Using (14) in Mancini (2009) (the property of uniform boundedness of the increments of the paths of X when J ≡ 0, hereafter indicated as the UBI property) for the jump-free terms X_s − X_{t_{i−1}}, the last term of (A.4) is dominated by
 (C/h) Σ_{i∈I_{0,n}} ((δ ln(1/δ))^{m/2} / h^m) ∫_{t_{i−1}}^{t_i} ds = o_{a.s.}((δ ln(1/δ) / h^{2+ε})^{m/2}) → 0.  (A.5)
Furthermore, each of the first m − 1 terms of (A.4) is dealt with using the occupation time formula and the UBI property. To use the occupation time formula we have to include all the indices i = 1,…,n; since, for each k = 1,…,m − 1,
 (C/h) Σ_{i∈I_{0,n}} ∫_{t_{i−1}}^{t_i} |K_s^(k)| (|X_s − X_{t_{i−1}}|^k / h^k) ds ≤ C ((δ ln(1/δ))^{k/2} / h^{k+1}) ∫_0^1 |K_s^(k)| ds
  = C ((δ ln(1/δ))^{k/2} / h^{k+1}) ∫_R |K^(k)((a − x)/h)| (L(a)/σ²(a)) da
  = C ((δ ln(1/δ))^{k/2} / h^k) ∫_R |K^(k)(u)| (L(uh + x)/σ²(uh + x)) du = O_{a.s.}((δ ln(1/δ)/h²)^{k/2}) → 0,
the proposition is proved.
Proof of Theorem 3.2. As in the previous proof, set T = 1 and L := L_1. Also set σ(X_s) =: σ_s, σ(X_{(i−1)δ}) =: σ_{i−1}. Using Theorem 1 in Mancini (2009), write
 √(nh) (σ̂²_n(x) − σ²(x)) = [√(n/h) Σ_{i=1}^n K_{i−1} ((Δ_i Y)² − σ²(x)δ)] / L̂⋆ − [√(n/h) Σ_{i=1}^n K_{i−1} (Δ_i Y)² I_{Δ_i N ≠ 0}] / L̂⋆.
Since the last term is O_{a.s.}(N_1 √(δ/h) ln(1/δ)) → 0, in view of Proposition 3.1, to prove the theorem it is sufficient (see Jacod, in press) to show that the numerator of the first term above converges stably in law to M_1, where M_1 is a Gaussian martingale defined on an extension (Ω̃, F̃, P̃) of our filtered probability space and having Ẽ[M_1² | F] = 2σ²(x)L⋆⋆(x). Using the Itô formula on (Δ_i Y)², we write the cited numerator as
 Σ_{i=1}^n √(n/h) K_{i−1} [2 ∫_{t_{i−1}}^{t_i} (Y_s − Y_{(i−1)δ}) μ_s ds + 2 ∫_{t_{i−1}}^{t_i} (Y_s − Y_{(i−1)δ}) σ_s dW_s] + √(n/h) Σ_{i=1}^n K_{i−1} ∫_{t_{i−1}}^{t_i} (σ_s² − σ²(x)) ds =: Σ_{i=1}^n q_i + √(n/h) Σ_{i=1}^n K_{i−1} ∫_{t_{i−1}}^{t_i} (σ_s² − σ²(x)) ds.  (A.6)
To show that the last sum above is negligible we prove what follows:
 (1) √(n/h) Σ_{i=1}^n ∫_{t_{i−1}}^{t_i} (K_{i−1} − K_{s−})(σ_s² − σ²(x)) ds → 0 a.s.;
 (2) √(n/h) ∫_0^1 K_{s−} (σ_s² − σ²(x)) ds → 0 a.s.  (A.8)
Result (2) is obtained using the occupation time formula and then Taylor expanding σ² as follows:
 √(n/h) ∫_R K((a − x)/h)(σ²(a) − σ²(x))(L(a)/σ²(a)) da = √(n/h) ∫_R K((a − x)/h)(σ²)′(ξ)(a − x)(L(a)/σ²(a)) da
  ≤ C √(n/h) h² ∫_R |K(u)| |u| (L(uh + x)/σ²(uh + x)) du = O_{a.s.}(√(nh³)) → 0.  (A.9)
To reach result (1), first note that the sum of the terms with i ∈ I_{1,n} tends a.s. to zero, so that we only have to deal with the terms with i ∈ I_{0,n}; second, we use the Taylor expansion of K_s − K_{i−1} up to order m, where m is the first integer such that 1/m < ε, so that we can bound the sum in (1) by
 √(n/h) [Σ_{k=1}^{m−1} (C/h^k) Σ_{i∈I_{0,n}} ∫_{t_{i−1}}^{t_i} |K_s^(k)| |X_s − X_{t_{i−1}}|^k |σ_s² − σ²(x)| ds + (C/h^m) Σ_{i∈I_{0,n}} ∫_{t_{i−1}}^{t_i} |K^(m)((X̃_{is} − x)/h)| |X_s − X_{t_{i−1}}|^m |σ_s² − σ²(x)| ds].  (A.10)
By the UBI property and proceeding as in (A.9), the first m − 1 terms in (A.10) have the same a.s. limit as
 √(n/h) ((δ ln(1/δ))^{k/2} / h^k) ∫_0^1 |K_s^(k)| |σ_s² − σ²(x)| ds = O_{a.s.}(√(nh³) (δ ln(1/δ)/h²)^{k/2}) → 0,
while the m-th term in (A.10) is a.s. dominated by
 C √(n/h) (δ ln(1/δ))^{m/2} / h^m = O_{a.s.}((δ^{1−1/m} ln(1/δ) / h^{2+1/m})^{m/2}) → 0.
We now deal with the term Σ_{i=1}^n q_i. We show that:
 Sa = Σ_{i=1}^n E_{i−1}[q_i] → 0 in probability;  (A.11)
 Sb = Σ_{i=1}^n (E_{i−1}[q_i²] − E²_{i−1}[q_i]) → 2σ²(x)L⋆⋆(x) in probability;
 Sc = Σ_{i=1}^n E_{i−1}[q_i⁴] → 0 in probability;
 Sd = Σ_{i=1}^n E_{i−1}[q_i Δ_i H] → 0 in probability,
where either H = W or H is any bounded martingale orthogonal (in the martingale sense) to W. Such conditions imply (Jacod, in press, Lemma 4.4) that Σ_{i=1}^n q_i → M_1 stably in law. Remark that μ is assumed to be càdlàg, therefore we know that it is locally bounded on [0, T]; however, by localizing, we can assume that μ is a.s. bounded on [0, T]. Using the Burkholder–Davis–Gundy (BDG) inequality we have
 |Sa| ≤ 2 √(n/h) Σ_{i=1}^n K_{i−1} E_{i−1}[|∫_{t_{i−1}}^{t_i} (Y_s − Y_{(i−1)δ}) μ_s ds|] ≤ √(n/h) Σ_{i=1}^n K_{i−1} δ · O_{a.s.}(√(δ ln(1/δ))) = O_{a.s.}(L̂⋆ √(h ln(1/δ))) → 0.
Next,
 Sb = 4 (n/h) Σ_{i=1}^n K²_{i−1} { E_{i−1}[(∫_{t_{i−1}}^{t_i} (Y_s − Y_{(i−1)δ}) μ_s ds)²] + 2 E_{i−1}[∫ (Y_s − Y_{(i−1)δ}) μ_s ds ∫ (Y_s − Y_{(i−1)δ}) σ_s dW_s] + E_{i−1}[(∫_{t_{i−1}}^{t_i} (Y_s − Y_{(i−1)δ}) σ_s dW_s)²] − E²_{i−1}[∫_{t_{i−1}}^{t_i} (Y_s − Y_{(i−1)δ}) μ_s ds] }.
Using the Hölder (H) and BDG inequalities, among the conditional expectations the third one has the lowest infinitesimal order, and in turn this is the sum of three terms, of which we only need to consider the one having the lowest infinitesimal order. Therefore it is sufficient to prove the convergence in probability of
 4 (n/h) Σ_{i=1}^n K²_{i−1} E_{i−1}[∫_{t_{i−1}}^{t_i} (∫_{(i−1)δ}^s σ_u dW_u)² σ_s² ds],
which we decompose as
 4 (n/h) Σ_i K²_{i−1} E_{i−1}[∫ (∫_{(i−1)δ}^s σ_u dW_u)² (σ_s² − σ²_{i−1}) ds]  (b1)
 + 4 (n/h) Σ_i K²_{i−1} σ²_{i−1} ∫ (E_{i−1}[∫_{(i−1)δ}^s σ_u² du] − σ²_{i−1}(s − t_{i−1})) ds  (b2)
 + 2 (1/h) Σ_i K²_{i−1} σ⁴_{i−1} δ.  (b3)–(b5)
For (b1), by the H inequality, using the Taylor expansion of σ² up to the first order, neglecting the terms with i ∈ I_{1,n} and bounding |X_s − X_{(i−1)δ}| by the UBI property when i ∈ I_{0,n}, we find that the given sum is O_{a.s.}(√(δ ln(1/δ)) (1/h) Σ_i K²_{i−1} δ) → 0. Result (b2) is the most important for our theorem. Let us expand
 ∫_{(i−1)δ}^s σ_u² du = σ²_{i−1}(s − t_{i−1}) + (σ²)′(X̃_{is}) O_{a.s.}((s − t_{i−1})²),
so that E_{i−1}[∫_{(i−1)δ}^s σ_u² du] − σ²_{i−1}(s − t_{i−1}) = O_{a.s.}(δ²); therefore the sum in (b2) is O_{a.s.}(δ (1/h) Σ_i K²_{i−1} δ) → 0. As for the last term: dealing with the differences σ⁴_{i−1} − σ_s⁴ ((b3), a.s. dominated by C √(δ ln(1/δ)) (1/h) Σ_i K²_{i−1} δ → 0, since a.s. the path X_s(ω) is bounded on [0, T] and so σ_s⁴ is a.s. bounded), with K²_{i−1} − K_s² ((b4), reached similarly as for the first term in (A.3) with K² in place of K), and applying the occupation time formula ((b5)), we reason as in (A.1) with K² in place of K and obtain
 2 (1/h) Σ_i K²_{i−1} σ⁴_{i−1} δ → 2σ²(x)L⋆⋆(x) a.s.,
and the result for Sb is proved. Analogously as for Sb, using the BDG and H inequalities for the second and last equalities below, we have
 Sc = O_p((n²/h²) Σ_{i=1}^n K⁴_{i−1} E_{i−1}[(∫_{t_{i−1}}^{t_i} (Y_s − Y_{(i−1)δ}) σ_s dW_s)⁴]) = O_p((n²/h²) δ Σ_{i=1}^n K⁴_{i−1} E_{i−1}[∫_{t_{i−1}}^{t_i} (∫_{(i−1)δ}^s σ_u dW_u)⁴ σ_s⁴ ds]) = O_p((δ/h) (1/h) Σ_{i=1}^n K⁴_{i−1} δ) → 0.  (A.12)
Finally, to verify that Sd → 0 in probability, denote Δ_i Z := ∫_{t_{i−1}}^{t_i} (Y_s − Y_{(i−1)δ}) dY_s. If H = W then, using the H inequality,
 √(n/h) Σ_{i=1}^n K_{i−1} E_{i−1}[Δ_i Z Δ_i H] ≤ √(n/h) Σ_{i=1}^n K_{i−1} √(E_{i−1}[(Δ_i Z)²]) √(E_{i−1}[(Δ_i W)²]) = O_{a.s.}(L̂⋆ √h) → 0.
If H is orthogonal to W then Σ_{i=1}^n E_{i−1}[q_i Δ_i H] equals
 √(n/h) Σ_{i=1}^n K_{i−1} E_{i−1}[∫_{t_{i−1}}^{t_i} (Y_s − Y_{(i−1)δ}) μ_s ds Δ_i H] = O_{a.s.}(L̂⋆ √h) → 0,
having used that a.s. Δ_i H ≤ C.
Lemma A.1. If λ(X_s) is bounded then, uniformly for all i = 1,…,n, P{Δ_i N ≥ 2} = O(δ²).
Proof. J1_t = Σ_{ℓ=1}^{N_t} γ_ℓ = ∫∫ x m(dt, dx) is written in terms of the random measure m with compensator ν_t(dx)λ(X_{t−})dt on R × [0, +∞[, where ν_t(dx) is the law of the size of the jump occurring at t. However, J1 also has a representation in terms of another random measure m̄ which is Poisson with compensator dx dt (Jacod, in press): J1_t = ∫∫ η(ω, t, x) m̄(dx, dt), where η(ω, t, x) is a predictable function on Ω × [0, +∞[ × R such that the compensator ν_t(dx)λ(X_{t−}) is the image measure of the Lebesgue measure on R − {0} through the map x → η(ω, t, x). Since m and m̄ represent the trajectories of the same jump process, indicating N̄_t := ∫∫ 1 m̄(dt, dx), we have P(Δ_i N ≥ 1) = P(Δ_i N̄ ≥ 1) = O(δ) and P(Δ_i N ≥ 2) = P(Δ_i N̄ ≥ 2) = O(δ²).
Proof of Theorem 3.7. Note that, since T → ∞, it is no longer guaranteed that (4) in Mancini (2009) holds. Write
 √(h L̂⋆_T(x)) (μ̂_n(x) − μ(x)) = (1/√(h L̂⋆_T(x))) Σ_{i=1}^n K_{i−1} (Δ_i X⋆ − μ(x)δ)
  = (1/√(h L̂⋆_T(x))) Σ_{i=1}^n K_{i−1} (Δ_i Y − μ(x)δ) + (1/√(h L̂⋆_T(x))) Σ_{i=1}^n K_{i−1} (Δ_i X⋆ − Δ_i Y)
  = √(h L̂⋆_T(x)) (α′_{n,T}(x) + β_{n,T}(x)) + (1/√(h L̂⋆_T(x))) Σ_{i=1}^n K_{i−1} (Δ_i X⋆ − Δ_i Y),  (A.13)
where α′_{n,T}(x) and β_{n,T}(x) are defined as in equation (134) in Bandi and Nguyen (2003). Using their results we have
 √(h L̂⋆_T(x)) α′_{n,T}(x) = O_{a.s.}(√(h⁵ L̂⋆_T(x))) → 0,  √(h L̂⋆_T(x)) β_{n,T}(x) → MN(0, K_2 σ²(x)) in distribution,
so it is enough to prove that the second term in (A.13) tends to zero in probability, and it is sufficient to show that its numerator does. To this purpose write
 Plim Σ_{i=1}^n K_{i−1} (Δ_i Ŷ − Δ_i Y) = Plim Σ_{i=1}^n K_{i−1} (Δ_i J − Δ_i Ĵ)
  = Plim Σ_{i=1}^n K_{i−1} [Σ_{ℓ=1}^{Δ_i N} γ_ℓ − Δ_i X I_{(Δ_i X)² > ϑ(δ)}] (I_{Δ_i N = 0} + I_{Δ_i N = 1} + I_{Δ_i N ≥ 2}).  (A.14)
We now show that each term tends to zero in probability. As for the terms multiplying I_{Δ_i N ≥ 2},
 P(Σ_{i=1}^n K_{i−1} [Σ_{ℓ=1}^{Δ_i N} γ_ℓ − Δ_i X I_{(Δ_i X)² > ϑ(δ)}] I_{Δ_i N ≥ 2} ≠ 0) ≤ Σ_{i=1}^n P{Δ_i N ≥ 2},
which, by Lemma A.1, is O(nδ²) → 0. Let us now deal with the terms multiplying I_{Δ_i N = 0}:
 |Σ_{i=1}^n K_{i−1} Δ_i X I_{(Δ_i X)² > ϑ(δ), Δ_i N = 0}| ≤ Σ_{i=1}^n |K_{i−1}| |∫_{t_{i−1}}^{t_i} μ(X_s) ds + Δ_i(σ·W)| [I_{|∫ μ(X_s) ds| > √ϑ(δ)/2, Δ_i N = 0} + I_{|Δ_i(σ·W)| > √ϑ(δ)/2, Δ_i N = 0}].
Since μ is bounded, we have a.s., for small δ and for all i = 1,…,n, I_{|∫_{t_{i−1}}^{t_i} μ(X_s) ds| > √ϑ(δ)/2}(ω) = 0, and the first term above is zero. For the second term,
 P(Σ_{i=1}^n |K_{i−1}| |∫_{t_{i−1}}^{t_i} μ(X_s) ds + Δ_i(σ·W)| I_{|Δ_i(σ·W)| > √ϑ(δ)/2, Δ_i N = 0} ≠ 0) ≤ P(∪_i {|Δ_i(σ·W)| > √ϑ(δ)/2}),
which is dominated, by Corollary 3.3 in Mancini (2004), for each T, by 2n e^{−ϑ(δ)/(4·2σ̄²δ)} → 0. Finally, we deal with the terms multiplying I_{Δ_i N = 1}: since (using the notation given in Section 2) on {Δ_i N = 1} we have Δ_i J = γ_{τ(i)}, we are left with
 Σ_{i=1}^n K_{i−1} γ_{τ(i)} I_{|Δ_i X| ≤ √ϑ(δ), Δ_i N = 1} − Σ_{i=1}^n K_{i−1} (∫_{t_{i−1}}^{t_i} μ(X_s) ds + Δ_i(σ·W)) I_{|Δ_i X| > √ϑ(δ), Δ_i N = 1}.  (A.15)
The probability that the first term on the r.h.s. differs from zero is dominated by
 Σ_i P{|Δ_i X| ≤ √ϑ(δ), Δ_i N = 1} = Σ_i P{|Δ_i X| ≤ √ϑ(δ), Δ_i N = 1, |Δ_i(σ·W)| ≤ √ϑ(δ)/2} + Σ_i P{|Δ_i X| ≤ √ϑ(δ), Δ_i N = 1, |Δ_i(σ·W)| > √ϑ(δ)/2}.  (A.16)
The second of the last terms is dominated by Σ_i P{|Δ_i(σ·W)| > √ϑ(δ)/2}, which tends to zero as before. The first term in the second line of (A.16) is dominated by
 Σ_i P{|γ_{τ(i)}| ≤ C √ϑ(δ), Δ_i N = 1},  (A.17)
with C a suitable constant, since, for all i = 1,…,n, |Δ_i X| ≥ |Δ_i J| − |∫_{t_{i−1}}^{t_i} μ(X_s) ds| − |Δ_i(σ·W)|, so that on {|Δ_i X| ≤ √ϑ(δ), Δ_i N = 1, |Δ_i(σ·W)| ≤ √ϑ(δ)/2} we have |γ_{τ(i)}| = |Δ_i J| ≤ μ̄δ + (3/2)√ϑ(δ) ≤ C √ϑ(δ), for sufficiently small δ. Now
 Σ_i P{|γ_{τ(i)}| ≤ C √ϑ(δ), Δ_i N = 1} = Σ_i P{|γ_{τ(i)}| ≤ C √ϑ(δ)} P{Δ_i N = 1} = O(n √ϑ(δ) δ) = O(nδ^{1+η/2}) → 0,  (A.18)
by the independence of each γ_k from N, Lemma A.1 and the assumption that P{|γ_{τ(i)}| ≤ C √ϑ(δ)} ≤ C √ϑ(δ). Finally, we show that the second term of (A.15) tends to zero in probability. In fact, using (14) in Mancini (2009) and that E[N_T/T] ≤ λ̄, for each T such a term is bounded by
 Σ_{i=1}^n |K_{i−1}| √(δ ln(1/δ)) I_{Δ_i N = 1} = O_{a.s.}(√(δ ln(1/δ)) (N_T/T) T) = O_P(nδ^{3/2} √(ln(1/δ))) → 0.
For the convergence of λ̂_n(x), write
 √(h L̂⋆_T(x)) (λ̂_n(x) − λ(x)) = √(h L̂⋆_T(x)) [Σ_i K_{i−1} Δ_i N̂ (c_{in} − 1) + Σ_i K_{i−1} (Δ_i N̂ − Δ_i N) + Σ_i K_{i−1} (Δ_i N − ∫_{t_{i−1}}^{t_i} λ(X_s) ds) + Σ_i K_{i−1} ∫_{t_{i−1}}^{t_i} (λ(X_s) − λ(x)) ds] / Σ_i K_{i−1} δ.  (A.20)
Analogously as in Eq. (A.14), the second term in the r.h.s. of (A.20) tends a.s. to zero. The third term is exactly γ_{n,T}(x), with c(X_{s−}, y) ≡ 1, in equation (104) of Bandi and Nguyen (2003), so
 √(h L̂⋆_T(x)) γ_{n,T}(x) → MN(0, K_2 λ(x)) in distribution,
and it is sufficient to show that this term converges in distribution, since then the first term is dominated by
 sup_i |1 − c_{in}| √(h L̂⋆_T(x)) Σ_i K_{i−1} Δ_i N̂ / Σ_i K_{i−1} δ = O_p(sup_i |1 − c_{in}| √(h L̂⋆_T(x))) → 0,
while the last term of (A.20) coincides with √(h L̂⋆_T(x)) α′_{n,T}(x) with λ replacing μ, and thus it tends a.s. to zero.
To prove Theorem 4.1 for infinite activity jumps we set again T = 1. Since ∫_0^t ∫_{|x|≤1} x ν(dx) ds = Ct, we can incorporate this term into the drift part of X. Denote X_{0,t} = ∫_0^t (μ_s + C) ds + ∫_0^t σ_s dW_s; we split X_s = X_{0,s} + J1_s + J̃_{2,s}.
Lemma A.2. Define Π^(n) = {i/n, i = 1,…,n}, the partitions of [0, 1] on which the sums in our statement are constructed. There exists a subsequence Π^(n_k) = {i/n_k, i = 1,…,n_k}, with mesh δ_k = 1/n_k, such that a.s., for sufficiently small δ_k, for all i = 1,…,n_k, on the set {(Δ_i J̃_2)² ≤ ϑ(δ_k)} we have
 Σ_{s∈]t_{i−1},t_i]} (ΔJ̃_{2,s})² ≤ 3ϑ(δ_k)  and  |ΔJ̃_{2,s}| ≤ √(3ϑ(δ_k)).  (A.21)
Proof. Set V = J̃_2 and w_n = sup_{t_i∈Π^(n)} |(Δ_i J̃_2)² − Σ_{s∈]t_{i−1},t_i]} (ΔJ̃_{2,s})²|. Theorem 25.1 in Metivier (1982) implies that there exists a subsequence w_{n_k} such that
 sup_{t_i∈Π^(n_k)} |(Δ_i J̃_2)² − Σ_{s∈]t_{i−1},t_i]} (ΔJ̃_{2,s})²| → 0 a.s.  (A.22)
Let us rename Π^(n_k) by Π^(n). Using the Itô formula and denoting Z_t^(n) := ∫_0^t (V_{s−} − V_{[sn]/n}) dV_s, we have
 (Δ_i J̃_2)² − Σ_{s∈]t_{i−1},t_i]} (ΔJ̃_{2,s})² = 2 ∫_{t_{i−1}}^{t_i} (V_{s−} − V_{t_{i−1}}) dV_s = 2 Δ_i Z^(n).
Note that our increments Δ_i Z are exactly the increments of the process W^(n) in Jacod (2004), Eq. (2.1), with both Y and Y′ = f(X) being V here. Under our assumptions (α < 1) the speed u_n is n. The proof of the fact that nW^(n) converges in distribution (Jacod, in press, Step 2 at p. 1845) can be repeated here with f(x) = x. Moreover, Lemma 2.1 in Jacod (in press) ensures that nW^(n) is uniformly tight, i.e. sup_{t≤1} |nW_t^(n)| is tight, so sup_{t≤1} |W_t^(n)|/ϑ(δ) → 0 in probability, but then
 sup_i |Δ_i Z|/ϑ(δ) ≤ 2 sup_{t≤1} |W_t^(n)|/ϑ(δ) → 0 in probability,
and thus, passing to a subsequence w_{n_k}, we have a.s., for sufficiently small δ, sup_i |Δ_i Z| ≤ ϑ(δ). Now
 sup_{t_i∈Π^(n_k)} |Σ_{s∈]t_{i−1},t_i]} (ΔJ̃_{2,s})² − (Δ_i J̃_2)²| = 2 sup_{i=1,…,n_k} |Δ_i Z^(n_k)| < 2ϑ(δ_k),
so, for all t_i ∈ Π^(n_k), on the set {(Δ_i J̃_2)² ≤ ϑ(δ_k)}, we have Σ_{s∈]t_{i−1},t_i]} (ΔJ̃_{2,s})² ≤ (Δ_i J̃_2)² + 2ϑ(δ_k) = 3ϑ(δ_k), and in particular all jump sizes |ΔJ̃_{2,s}| are bounded by √(3ϑ(δ_k)).
As a consequence of the previous lemma, on a subsequence, for all i = 1,…,n_k,
 Δ_i J̃_2 I_{(Δ_i J̃_2)² ≤ ϑ(δ)} = ∫_{t_{i−1}}^{t_i} ∫_{|x| ≤ √(3ϑ(δ))} x m̃(dx, dt) − ∫_{t_{i−1}}^{t_i} ∫_{|x| ≤ 1} x ν(dx) dt =: Δ_i J̃_2^m − Δ_i J̃_2^c.  (A.23)
Within (1/h) Σ_i ∫_{t_{i−1}}^{t_i} (K_{i−1} − K_s) ds we ignore the terms with i ∈ I_{1,n} = {i ∈ {1,…,n} : Δ_i N ≠ 0}, where N is the counting process associated to J1, since they give a negligible impact, as in the proof of Theorem 3.2; so we fix J1 and regard K_s as a function of two variables, F(a, b) := K((a + J1_s + b − x)/h), evaluated at a = X_{0,s}, b = J̃_{2,s}. Taylor expand the terms K_{i−1} − K_s = F(a, b) − F(a_0, b_0), for each i = 1,…,n, around (a_0, b_0) = (X_{0,s}, J̃_{2,s}), with a = X_{0,t_{i−1}}, b = J̃_{2,t_{i−1}}, as follows: first expand
 F(a, b) − F(a_0, b_0) = F_a(η, ξ)(a − a_0) + F_b(η, ξ)(b − b_0);
then expand the first partial derivative F_a(η, ξ) around a_0, moving η alone, up to the (m − 1)-th order:
 F_a(η, ξ) = Σ_{k=1}^{m−1} F_a^(k)(a_0, ξ) (η − a_0)^{k−1}/(k − 1)! + F_a^(m)(η̃, ξ) (η − a_0)^{m−1}/(m − 1)!,
where F_a^(k)(a, b) := ∂^k F(a, b)/∂a^k; finally, expand each F_a^(k)(a_0, ξ), moving the second component alone, around b_0:
 F_a^(k)(a_0, ξ) = F_a^(k)(a_0, b_0) + F_{ab}^(k)(a_0, ξ̃)(ξ − b_0),
where F_{ab}^(k)(a, b) := ∂^{k+1} F(a, b)/∂a^k ∂b. Putting all together, we reach
 |K_{i−1} − K_s| ≤ Σ_{k=1}^{m−1} (|K_s^(k)|/h^k) sup_{u∈]t_{i−1},t_i]} |X_{0u} − X_{0s}|^k/(k − 1)!
  + Σ_{k=1}^{m−1} (|K̃_s^(k+1)|/h^{k+1}) sup_{u∈]t_{i−1},t_i]} (|X_{0u} − X_{0s}|^k/(k − 1)!) sup_{u∈]t_{i−1},t_i]} |J̃_{2,u} − J̃_{2,s}|
  + (|K̃_s^(m)|/h^m) sup_{u∈]t_{i−1},t_i]} |X_{0u} − X_{0s}|^m/(m − 1)!
  + (|K̃_s^(m+1)|/h^{m+1}) sup_{u∈]t_{i−1},t_i]} (|X_{0u} − X_{0s}|^m/(m − 1)!) sup_{u∈]t_{i−1},t_i]} |J̃_{2,u} − J̃_{2,s}|
  + (|K̃_s′|/h) |J̃_{2,t_{i−1}} − J̃_{2,s}|,  (A.24)
where each K̃_s^(k) := K^(k)((Z_s − x)/h) is the kernel k-th derivative evaluated at a suitable point Z_s(ω), giving the Lagrange remainder of the relative expansion. We now show that, within (1/h) Σ_i ∫_{t_{i−1}}^{t_i} (K_{i−1} − K_s) ds, all the previous terms give a negligible contribution. For all the terms containing K_s^(k) we use the UBI property of X_0 and then the occupation time formula, while for the other terms we use the boundedness of K̃_s^(k) and the fact that E_{i−1}[|J̃_{2,t_{i−1}} − J̃_{2,s}|] = O(δ). For each k = 1,…,m, the term containing K_s^(k) is given by
 (1/h) Σ_i ∫_{t_{i−1}}^{t_i} (|K_s^(k)|/h^k) sup_u |X_{0u} − X_{0s}|^k/(k − 1)! ds ≤ C ((δ ln(1/δ))^{k/2}/h^{k+1}) ∫_0^1 |K_s^(k)| ds
  = C ((δ ln(1/δ))^{k/2}/h^k) ∫_R |K^(k)(u)| (L(uh + x)/σ²(uh + x)) du = O_{a.s.}((δ ln(1/δ)/h²)^{k/2}) → 0;
the last term in (A.24) contributes
 (1/h) Σ_i E_{i−1}[∫_{t_{i−1}}^{t_i} (|K̃_s′|/h) |J̃_{2,t_{i−1}} − J̃_{2,s}| ds] = O_P(δ/h²) → 0,
and the terms containing both K̃_s^(k+1) and J̃_2 are dealt with analogously.
Remark A.4. In the same way we just extended the validity of L̂⋆_T → L⋆_T(x)/σ²(x) in probability to the presence of infinite activity jumps; replacing K with K², we also get that Σ_i K²_{i−1} δ/h → L⋆⋆_T(x)/σ²(x) in probability.
Lemma A.3. As δ → 0 we have
 √(n/h) Σ_{i=1}^n K_{i−1} (Δ_i X)² (I_{(Δ_i J̃_2)² ≤ 4ϑ(δ), Δ_i N = 0} − I_{(Δ_i X)² ≤ ϑ(δ)}) → 0 in probability.
Proof. The given sum is dominated by
 √(n/h) Σ_{i=1}^n K_{i−1} (Δ_i X)² I_{(Δ_i J̃_2)² ≤ 4ϑ(δ), Δ_i N = 0, |Δ_i J̃_2| > √ϑ(δ)/2} =: Σ_i ξ_i.
Using the H inequality with 1/p + 1/q = 1,
 E_{i−1}[|ξ_i|] ≤ C E_{i−1}[((Δ_i Y)² + (Δ_i J̃_2^m − Δ_i J̃_2^c)²)^p]^{1/p} P_{i−1}{|Δ_i J̃_2| > √ϑ(δ)/2}^{1/q};
using the BDG inequality, Lemma 5.1 in Jacod et al. (2005) and (27) in Cont and Mancini (2008), this is dominated by C (δ + δ^{1/p} ϑ(δ)^{1−α/2}) δ^{(1−αη/2)/q}, so that
 Σ_i E_{i−1}[|ξ_i|] = O_P(√(n/h) n (δ + δ^{1/p} ϑ(δ)^{1−α/2}) δ^{(1−αη/2)/q}) → 0,
provided we choose q sufficiently close to 1, and this concludes the proof.
For the consistency and the CLT for σ̂²(x), we follow exactly the same steps as in the proof of Theorem 3.2; however, even here we have to check that the contribution given by J̃_2 is negligible at each step. Write
 √(nh) (σ̂²_n(x) − σ²(x)) = √(n/h) Σ_{i=1}^n K_{i−1} ((Δ_i X)²⋆ − σ²(x)δ) / L̂⋆.
It is sufficient to show that the numerator tends stably in law to the r.v. M_1 in the proof of Theorem 3.2. For that, in view of Lemma A.3, it is sufficient to show that
 √(n/h) Σ_{i=1}^n K_{i−1} [(Δ_i X)² I_{(Δ_i J̃_2)² ≤ 4ϑ(δ), Δ_i N = 0} − σ²(x)δ] → M_1 stably in law.
We write the l.h.s. above as
 √(n/h) Σ_i K_{i−1} [(Δ_i Y)² − σ²(x)δ] − √(n/h) Σ_i K_{i−1} (Δ_i Y)² I_{{(Δ_i J̃_2)² > 4ϑ(δ)} ∪ {Δ_i N ≠ 0}} + 2 √(n/h) Σ_i K_{i−1} Δ_i Y Δ_i J̃_2 I_{(Δ_i J̃_2)² ≤ 4ϑ(δ), Δ_i N = 0} + √(n/h) Σ_i K_{i−1} (Δ_i J̃_2)² I_{(Δ_i J̃_2)² ≤ 4ϑ(δ), Δ_i N = 0} =: Σ_{ℓ=1}^4 S_ℓ.  (A.25)
With the notation of the proof of Theorem 3.2, we show that S_1 converges stably to M_1, while all the other terms tend to zero in probability. As in the two lines before (A.8), S_1 is the sum of two terms as in (A.8). Point (2) still holds true, while the negligibility stated at point (1) is obtained by expanding K_{i−1} − K_s as in (A.24). To quantify the contribution to (1) of each of the terms in the first line of (A.24), we also expand σ² up to the first order and apply the occupation time formula, using that (σ²)′ is bounded, and we reach that the final global contribution is O_P(√(nh³) (δ ln(1/δ)/h²)^{k/2}) → 0. For the contribution of all the other terms of (A.24) to (1) we simply use the boundedness of σ_s² − σ²(x); choosing a φ < 2ε/(1 − 2ε) such that ε − 1/2 + φ(1/2 + ε) ≥ 0, we reach a final global contribution tending to zero.
As for Σ_i q_i, the terms Sa, Sc, Sd are dealt with exactly as in the FA jumps case. As for Sb, in (b1) we expand σ² up to the first order, with some Z_s in place of X̃_{i,s}; we use the H and BDG inequalities, and we see that the IA jump component contributes E_{i−1}[|J̃_{2,s} − J̃_{2,t_{i−1}}|²] = O_P(δ), so the conclusion is as in Theorem 3.2; (b2) and (b5) still hold true; (b3) is dealt with similarly to (b1), and (b4) as in the proof of the consistency of L̂⋆ above, with K² in place of K, using that σ is bounded.
Each term S_2, S_3, S_4 is of the type Σ_i ξ_i. For S_2, the sum of the terms with Δ_i N ≠ 0 is O_P(δ/h) → 0; for the sum of the terms with (Δ_i J̃_2)² > 4ϑ(δ), using the H inequality with q close to 1, the BDG inequality and (27) in Cont and Mancini (2008), we have Σ_i E_{i−1}[|ξ_i|] = O_P(δ^{(1−αη/2)/q − 1/2 + φ/2}) → 0. For S_3, using (A.23) on a subsequence, the Itô formula and the BDG inequality, we reach that
∫ 1−
Ki−1 (∆i Y + ∆i J˜2 )2 I{(∆i J˜2 )2 ≤4ϑ(δ),∆i N =0} − σ 2 (x)δ
h i =1
0
k/2 ∫ δ ln 1δ
91
Ei−1 [|ξi |] = Lˆ ⋆
1
φ
√ α
P
hδ (1−α/2)η/2 → 0. Finally for S4 ,
Oa.s. δ − 2 + 2 +η(1− 2 )
→ 0.
∑
i
Ei−1 [|ξi |] =
92
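The four-way splitting used in (A.25) is elementary algebra. As a worked check (notation as above, writing $I$ for the indicator of $\{(\Delta_i\tilde J_2)^2\le4\vartheta(\delta),\,\Delta_iN=0\}$ and $I^c=1-I$, the indicator of $\{(\Delta_i\tilde J_2)^2>4\vartheta(\delta)\}\cup\{\Delta_iN\ne0\}$):

```latex
\begin{aligned}
K_{i-1}\big[(\Delta_i Y+\Delta_i\tilde J_2)^2\, I - \sigma^2(x)\delta\big]
&= K_{i-1}\big[(\Delta_i Y)^2 I + 2\,\Delta_i Y\,\Delta_i\tilde J_2\, I
   + (\Delta_i\tilde J_2)^2 I - \sigma^2(x)\delta\big]\\
&= K_{i-1}\big[(\Delta_i Y)^2-\sigma^2(x)\delta\big]
 - K_{i-1}(\Delta_i Y)^2\, I^c\\
&\quad + 2\,K_{i-1}\,\Delta_i Y\,\Delta_i\tilde J_2\, I
 + K_{i-1}(\Delta_i\tilde J_2)^2\, I,
\end{aligned}
```

since $(\Delta_iY)^2 I = (\Delta_iY)^2 - (\Delta_iY)^2 I^c$; summing over $i$ and multiplying by $\sqrt{n/h}$ gives $S_1,\dots,S_4$.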
C. Mancini, R. Renò / Journal of Econometrics 160 (2011) 77–92
References

Aït-Sahalia, Y., 1996. Nonparametric pricing of interest rate derivative securities. Econometrica 64 (3), 527–560.
Aït-Sahalia, Y., Jacod, J., 2009. Estimating the degree of activity of jumps in high frequency data. Annals of Statistics 37 (5A), 2202–2244.
Andersen, T., Benzoni, L., Lund, J., 2002. An empirical investigation of continuous-time equity return models. Journal of Finance 57, 1239–1284.
Andersen, T., Benzoni, L., Lund, J., 2004. Stochastic volatility, mean drift and jumps in the short-term interest rate. Working Paper.
Andersen, T., Bollerslev, T., Diebold, F.X., 2007. Roughing it up: including jump components in the measurement, modeling and forecasting of return volatility. Review of Economics and Statistics 89, 701–720.
Andersen, T., Lund, J., 1997. Estimating continuous-time stochastic volatility models of the short-term interest rate. Journal of Econometrics 77, 343–377.
Bakshi, G., Cao, C., Chen, Z., 1997. Empirical performance of alternative option pricing models. Journal of Finance 52, 2003–2049.
Bandi, F., 2002. Short-term interest rate dynamics: a spatial approach. Journal of Financial Economics 65, 73–110.
Bandi, F., Nguyen, T., 2003. On the functional estimation of jump-diffusion models. Journal of Econometrics 116, 293–328.
Bandi, F., Phillips, P., 2003. Fully nonparametric estimation of scalar diffusion models. Econometrica 71 (1), 241–283.
Bandi, F., Renò, R., 2008. Nonparametric stochastic volatility. Working Paper.
Barndorff-Nielsen, O.E., Shephard, N., 2004. Power and bipower variation with stochastic volatility and jumps. Journal of Financial Econometrics 2, 1–48.
Barndorff-Nielsen, O.E., Shephard, N., Winkel, M., 2006. Limit theorems for multipower variation in the presence of jumps. Stochastic Processes and their Applications 116, 798–806.
Bates, D., 2000. Post-'87 crash fears in the S&P 500 futures option market. Journal of Econometrics 94, 181–238.
Carr, P., Geman, H., Madan, D., Yor, M., 2002. The fine structure of asset returns: an empirical investigation. Journal of Business 75 (2), 305–332.
Chapman, D., Long, J., Pearson, N., 1999. Using proxies for the short rate: when are three months like an instant? Review of Financial Studies 12 (4), 763–806.
Chapman, D., Pearson, N., 2000. Is the short rate drift actually nonlinear? Journal of Finance 55 (1), 355–388.
Cont, R., Mancini, C., 2008. Detecting the presence of a diffusion and the nature of jumps in asset prices. Working Paper.
Cont, R., Tankov, P., 2004. Financial Modelling with Jump Processes. Chapman & Hall–CRC.
Corradi, V., Distaso, W., 2007. Deterministic versus stochastic volatility. Working Paper.
Corsi, F., Pirino, D., Renò, R., 2009. Threshold bipower variation and the impact of jumps on volatility forecasting. Working Paper.
Das, S., 2002. The surprise element: jumps in interest rates. Journal of Econometrics 106, 27–65.
Duffee, G., 1996. Idiosyncratic variation of Treasury Bill yield spread. Journal of Finance 51, 527–552.
Eberlein, E., Raible, S., 1999. Term structure models driven by general Lévy processes. Mathematical Finance 9, 31–53.
Eraker, B., Johannes, M., Polson, N., 2003. The impact of jumps in equity index volatility and returns. Journal of Finance 58, 1269–1300.
Fan, J., 2005. A selective overview of nonparametric methods in finance. Statistical Science 20 (4), 317–337.
Florens-Zmirou, D., 1993. On estimating the diffusion coefficient from discrete observations. Journal of Applied Probability 30, 790–804.
Gihman, I., Skorohod, A., 1972. Stochastic Differential Equations. Springer Verlag.
Hamilton, J., 1996. The daily market for federal funds. Journal of Political Economy 104, 26–56.
Hong, Y., Li, H., 2005. Nonparametric specification testing for continuous-time models with applications to term structure of interest rates. Review of Financial Studies 18 (1), 37–84.
Ikeda, N., Watanabe, S., 1981. Stochastic Differential Equations and Diffusion Processes. North Holland.
Jacod, J., 2004. The Euler scheme for Lévy driven stochastic differential equations: limit theorems. The Annals of Probability 32 (3A), 1830–1872.
Jacod, J., 2007. Statistics and high-frequency data. In: Lecture Notes of SEMSTAT Course in La Manga (in press).
Jacod, J., 2008. Asymptotic properties of realized power variations and associated functionals of semimartingales. Stochastic Processes and their Applications 118, 517–559.
Jacod, J., Kurtz, T., Meleard, S., Protter, P., 2005. The approximate Euler method for Lévy driven stochastic differential equations. Annales de l'Institut Henri Poincaré/Probabilités et statistiques 41 (3), 523–558.
Jiang, G., 1998. Nonparametric modeling of US interest rate term structure dynamics and implications on the prices of derivative securities. Journal of Financial and Quantitative Analysis 33 (4), 465–497.
Jiang, G., Knight, J., 1997. A nonparametric approach to the estimation of diffusion processes, with an application to a short-term interest rate model. Econometric Theory 13, 615–645.
Johannes, M., 2004. The statistical and economic role of jumps in continuous-time interest rate models. Journal of Finance 59, 227–260.
Jones, C., 2003. Nonlinear mean reversion in the short-term interest rate. Review of Financial Studies 16, 765–791.
Madan, D., 2001. Purely discontinuous asset price processes. In: Cvitanic, J., Jouini, E., Musiela, M. (Eds.), Advances in Mathematical Finance. Cambridge University Press.
Mancini, C., 2003. Statistics of a Poisson–Gaussian process. Working Paper. Available at: http://www.dmd.unifi.it/persone/c.mancini/paper4JE.pdf.
Mancini, C., 2004. Estimation of the parameters of jump of a general Poisson-diffusion model. Scandinavian Actuarial Journal 1, 42–52.
Mancini, C., 2009. Non-parametric threshold estimation for models with stochastic diffusion coefficient and jumps. Scandinavian Journal of Statistics 36 (2), 270–296.
Metivier, M., 1982. Semimartingales: A Course on Stochastic Processes. De Gruyter.
Pan, J., 2002. The jump-risk premia implicit in options: evidence from an integrated time series study. Journal of Financial Economics 63, 3–50.
Piazzesi, M., 2005. Bond yields and the federal reserve. Journal of Political Economy 113 (2), 311–344.
Pritsker, M., 1998. Nonparametric density estimation and tests of continuous time interest rate models. Review of Financial Studies 11 (3), 449–487.
Protter, P., 2005. Stochastic Integration and Differential Equations. Springer.
Renò, R., 2008. Nonparametric estimation of the diffusion coefficient of stochastic volatility models. Econometric Theory 24 (5), 1174–1206.
Renò, R., Roma, A., Schaefer, S., 2006. A comparison of alternative nonparametric estimators of the diffusion coefficient. Economic Notes 35 (3), 227–252.
Stanton, R., 1997. A nonparametric model of term structure dynamics and the market price of interest rate risk. Journal of Finance 52, 1973–2002.
Woerner, J., 2006. Power and multipower variation: inference for high frequency data. In: Shiryaev, A.N., do Rosario Grossinho, M., Oliveira, P., Esquivel, M. (Eds.), Stochastic Finance. Springer, pp. 343–364.
Journal of Econometrics 160 (2011) 93–101

Forecasting multivariate realized stock market volatility

Gregory H. Bauer a,∗, Keith Vorkink b

a Financial Markets Department 4E, Bank of Canada, 234 Wellington, Ottawa, Ontario, Canada K1A 0G9
b 667 TNRB, Provo, UT 84602, USA

Article history: Available online 6 March 2010
JEL classification: G14; C53; C32
Keywords: HAR-RV model; Realized volatility; Covariance matrix; Factor model

Abstract: We present a new matrix-logarithm model of the realized covariance matrix of stock returns. The model uses latent factors which are functions of lagged volatility, lagged returns and other forecasting variables. The model has several advantages: it is parsimonious; it does not require imposing parameter restrictions; and, it results in a positive-definite estimated covariance matrix. We apply the model to the covariance matrix of size-sorted stock returns and find that two factors are sufficient to capture most of the dynamics. © 2010 Elsevier B.V. All rights reserved.
1. Introduction

The variances and covariances of stock returns vary over time (e.g. Andersen et al., 2005). As a result, many important financial applications require a model of the conditional covariance matrix. Three distinct categories of methods for estimating a latent conditional covariance matrix have evolved in the literature. In the first category are the various forms of the multivariate GARCH model where forecasts of future volatility depend on past volatility and shocks (e.g. Bauwens et al., 2006). In the second category, authors have modeled asset return variances and covariances as functions of a number of predetermined variables (e.g. Ferson, 1995). The third category includes multivariate stochastic volatility models (e.g. Asai et al., 2006).

In this paper, we introduce a new model of the realized covariance matrix.1 We use high-frequency data to construct estimates of the daily realized variances and covariances of five size-sorted stock portfolios. By using high-frequency data we obtain an estimate of the matrix of ‘quadratic variations and covariations’ that differs from the true conditional covariance matrix by mean zero errors (e.g. Andersen et al. (2003) and Barndorff-Nielsen and Shephard (2004a)). This provides greater power in determining the effects of alternative forecasting
variables on equity market volatility when compared to efforts based on latent volatility models.

We transform the realized covariance matrix using the matrix logarithm function to yield a series of transformed volatilities which we term the log-space volatilities. The matrix logarithm is a non-linear function of all of the elements of the covariance matrix and thus the log-space volatilities do not correspond one-to-one with their counterparts in the realized covariance matrix.2 However, modeling the time variation of the log-space volatilities is straightforward and avoids the problems that plague existing estimators of the latent volatility matrix. In particular, we do not have to impose any constraints on our estimates of the log-space volatilities.

We then model the dynamics of the log-space volatility matrix using a latent factor model. The factors consist of both past volatilities and other variables that can help forecast future volatility. We thus are able to model the conditional covariance matrix by combining a large number of forecasting variables into a relatively small number of factors. Indeed we show that two factors can capture the volatility dynamics of the size-sorted stock portfolios. The factor model is estimated by GMM yielding a series of filtered estimates. We then transform these fitted values, using the matrix exponential function, back into forecasts of the realized covariance matrix. Our estimated matrix is positive definite by construction and does not require any parameter restrictions to be imposed. The approach can thus be viewed as a multivariate version of standard stochastic volatility models, where the variance is an exponential function of the factors and the associated parameters.

In addition to introducing our new realized covariance matrix we also test the forecasting ability of alternative variables for time-varying equity market covariances. Our motivation is that researchers have examined a number of variables for forecasting returns but there is much less evidence that the variables forecast risks. The cross-section of small- and large-firm volatility has been examined in a number of earlier papers (e.g., Kroner and Ng (1998), Chan et al. (1999), and Moskowitz (2003)). However, these papers used models of latent volatility to capture the variation in the covariances. In contrast, we construct daily measures of the realized covariance matrix of small and large firms over the 1988 to 2002 period. Our precise measures of volatility allow a more detailed examination of the drivers of conditional covariances than prior work.

Naturally all of these advantages come at a cost. The main cost is that by performing our analysis on the log-space volatilities and then using the (non-linear) matrix exponential function, the estimated volatilities are not unbiased. However, as we show below, a simple bias correction is available that greatly reduces the problem. Another cost is that direct interpretation of the effects of an instrument on expected volatility is difficult due to the non-linear nature of the model. However, using our factor model estimates, we can obtain the derivatives of the realized covariance matrix with respect to the forecasting variables.

∗ Corresponding author. E-mail addresses: [email protected] (G.H. Bauer), [email protected] (K. Vorkink).
1 Andersen et al. (2001) and Barndorff-Nielsen and Shephard (2002) formalized the notion of realized volatility.
2 The matrix logarithm has been used for estimators of latent volatility by Chiu et al. (1996) and Kawakatsu (2006) and was also suggested in Asai et al. (2006).
0304-4076/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2010.03.021

G.H. Bauer, K. Vorkink / Journal of Econometrics 160 (2011) 93–101
We are able to calculate the derivatives at each point in our sample, yielding a series of conditional volatility elasticities that are functions of both the level of the volatility and the factors driving the volatility. The time series allows us to determine which variables have a large impact on time-varying expected volatility.

The paper is organized as follows. In Section 2, we present our model of matrix logarithmic realized volatility. In Section 3, we outline our method for constructing the realized volatility matrices and give the sources of the data. In Section 4, we give our results. In Section 5, we conclude.

2. Model

2.1. The matrix log transformation

In this paper, we use the matrix exponential and matrix logarithm functions to model the time-varying covariance matrix. The matrix exponential function performs a power series expansion on a square matrix $A$:
$$V=\operatorname{expm}(A)\equiv\sum_{n=0}^{\infty}\frac{1}{n!}\,A^{n}.\qquad(1)$$
The matrix exponential function has a number of useful properties (Chiu et al. (1996)). The most important of these is that $A$ is a real, symmetric matrix if and only if $V$ is a real, positive definite matrix. The matrix logarithm function is the inverse of the matrix exponential function. Taking the matrix logarithm of a real, positive definite matrix $V$ results in a real, symmetric matrix $A$: $A=\operatorname{logm}(V)$.

The matrix logarithm and matrix exponential functions are used in our three-step procedure to obtain forecasts of the conditional covariance matrix of stock returns. In the first step, for each day $t$, we use high-frequency data to construct the $P\times P$ realized conditional covariance matrix $V_t$.3 The $V_t$ matrix is positive semi-definite by construction. Applying the matrix logarithm function,
$$A_t=\operatorname{logm}(V_t),\qquad(2)$$
yields a real, symmetric $P\times P$ matrix $A_t$. In the second step, we model the dynamics of the $A_t$ matrix. To do this, we follow Chiu et al. (1996) and apply the vech operator to the matrix $A_t$,
$$a_t=\operatorname{vech}(A_t),$$
which stacks the elements on and below the diagonal of $A_t$ to obtain the $p\times1$ vector $a_t$, where $p=\frac12P(P+1)$. The $a_t$ vector forms the basis for all subsequent models. Below, we present a factor model for the $a_t$ processes which allows both lagged values of $a_t$ and other variables to forecast the volatility. In the third step, we transform the fitted values from the log-volatility space into fitted values in the actual volatility space. We use the inverse of the vech function to form a $P\times P$ symmetric matrix $\hat A_t$ of the fitted values at each time $t$ from the vector $\hat a_t$. Applying the matrix exponential function,
$$\hat V_t=\operatorname{expm}(\hat A_t),\qquad(3)$$
yields the positive semi-definite matrix $\hat V_t$, which is our estimate of the conditional covariance matrix for day $t$.

3 The details of how the matrix is constructed are presented below.

2.2. Factor models of volatility

2.2.1. Forecasting variables

We will use several different groups of variables to forecast the conditional covariance matrix. Based on the existing literature, we can separate the variables into two groups. The first are matrix-log values of realized volatility ($a_t,a_{t-1},a_{t-2},\ldots$) which are used to capture the autoregressive nature of the volatility. There are three potential problems in using these variables to forecast volatility. First, the existing literature shows that capturing volatility dynamics will likely require a long lag structure. To overcome this, we adapt the Heterogeneous Autoregressive model of realized volatility (HAR-RV) of Corsi (2009) and Andersen et al. (2007) to a multivariate setting. These authors show that the aggregate market realized volatility is forecast well by a (linear) combination of lagged daily, weekly and monthly realized volatility. The second problem is that other authors have indicated that lagged realized volatility may not be the best predictor. In particular, both Andersen et al. (2007) and Ghysels et al. (2006) find that bi-power covariation – an estimate of the continuous part of the volatility diffusion – is a good predictor of the aggregate market's realized volatility.4 We thus construct bi-power covariation matrices aggregated over the last $d=1$, 5 and 20 days. As in (2) above, we take the matrix logarithm of the bi-power covariation matrix over the past $d$ days to yield $A_{BP}(d)_t$. Taking the vech of this matrix yields the unique elements $a_{BP}(d)_t$. The third problem is the large number of correlated predictors. It is likely that the bi-power covariation series $a_{BP}(d)_t$ is driven by a smaller number of factors. We test this by estimating the principal components of $a_{BP}(d)_t$,
$$a_{BP}(d,i)_t,\qquad i=1,\ldots,pc,\qquad(4)$$
where $a_{BP}(d,i)$ is the $i$th principal component of the $d$-day log-space bi-power covariation matrix. We find that a small number of components captures the volatility of the daily, weekly and monthly log-space bi-power covariation series.5 In turn, these principal components are sufficient to model the realized covariance matrix. Our approach can thus be viewed as a multivariate approach to the HAR-RV model using the principal components of bi-power variation as predictors.

4 Barndorff-Nielsen and Shephard (2004b, 2006) develop the theory of bi-power variation, and extend their results to the multivariate case (bi-power covariation) in Barndorff-Nielsen and Shephard (2005). We construct bi-power covariation measures for our portfolios using Definition 3 of Barndorff-Nielsen and Shephard (2005).

The second group of forecasting variables, denoted $X_t$, are those variables that have been shown to forecast equity market returns. In equilibrium, expected returns should be related to risk, so it is natural to question whether these variables also forecast the components of market-wide volatility. Below, we use a number of variables that have been shown to predict equity market returns. We combine the two groups of forecasting variables as
$$Z_t=\big(a_{BP}(1,1)_t,\ldots,a_{BP}(5,1)_t,\ldots,a_{BP}(20,1)_t,\ldots,X_t\big).$$
Below, we select different subsets of $Z_t$ that correspond to existing approaches to modeling volatility.

2.2.2. Latent factors

Combining all of the forecasting variables results in the model
$$a_t=\gamma_0+\gamma_1Z_{t-1}+\varepsilon_t.\qquad(5)$$
In this model, the number of factors driving the cross section of log-space volatility $a_t$ is equal to the number of variables in $Z_t$. However, it is likely that the common variation in $a_t$ can be explained by a much smaller number of factors, as were the bi-power covariation measures above. To model the common variation in the realized volatility matrix, we use a latent factor approach where the factors that drive the time-varying volatility are not specified directly. Rather, we assume that our set of forecasting variables $Z_t$ is related to the true, but unknown, volatility factors. We thus specify the $k$-th volatility factor $\upsilon_{k,t}$ as a linear combination of the set of $N$ variables in $Z_t$:
$$\upsilon_{k,t}=\theta_kZ_{t-1},\qquad(6)$$
where the $\theta_k=\{\theta_{k,(1)},\ldots,\theta_{k,(N)}\}$ are coefficients that aggregate the forecasting variables in $Z_t$. Each of the log-space volatilities $a^i_t$ is a function of the $K$ volatility factors:
$$a^i_t=\gamma^i_0+\beta^i\theta Z_{t-1}+\varepsilon^i_t,\qquad i=1,\ldots,p,$$
where $\gamma^i_0$ is the $i$th element of the intercept vector $\gamma_0$, $\beta^i$ is the $1\times K$ vector of the loadings of log-space volatility $i$ on the $K$ factors, and the $K\times N$ matrix $\theta$ contains the coefficients on the $Z_{t-1}$ variables for the $K$ factors. Assembling the model for all $p$ log-space volatilities yields
$$a_t=\gamma_0+\beta\theta Z_{t-1}+\varepsilon_t,\qquad(7)$$
where the $p\times K$ matrix $\beta$ is the loading of the log-space volatilities on the time-varying factors.

We note that using latent factors to model covariance matrices has a number of advantages over existing methods.6 First, it allows us to combine both lagged volatility measures (the principal components in (4)) as well as the $X_t$ variables in a parsimonious manner. Previous models required each variable to be a separate factor. While the large number of variables may help forecast the covariance matrix, it is unlikely that each variable represents a specific volatility factor. Our approach can be used to weigh (via the $\theta$ coefficients) all of the variables in a way that is optimal for forecasting the covariance matrix. A second advantage to our approach is that it avoids using expected returns in modeling the volatility matrix. Aggregating squared return or bi-power covariation data over high frequencies means that the expected return variation can be ignored. As the realized covariance matrix can be estimated more precisely than can the expected returns, we should obtain more precise measures of the determinants of the covariance matrix. The third advantage is parsimony. For example, assume that we require 20 lags of daily log-space bi-power covariation plus 5 forecasting variables in $X_t$ to capture volatility dynamics in our $5\times5$ volatility matrix. The number of parameters in a system of linear regressions (5) would be 4590 while a $K=2$ factor version of (7) using the first three principal components of the log-space bi-power covariation matrices (for $d=1$, 5 and 20 days) has only 69. The small number of parameters in the factor model helps in estimating and interpreting the model in-sample and should help in out-of-sample forecasting.

5 The first three principal components capture 48.5%, 85.3% and 94.2% of the variation in the 1, 5 and 20 day bi-power covariation series, respectively.
6 Examples of previous work using factor models for volatility dynamics include Engle and Lee (1999), Diebold and Nerlove (1989), King et al. (1994), and Harvey et al. (1994).

2.3. Estimation

Our multivariate factor model is derived from the latent factor models of expected return variation that originated with Hansen and Hodrick (1980) and Gibbons and Ferson (1985). As in these papers, we estimate our factor model of volatility in (7) by GMM with the Newey and West (1987) form of the optimal weighting matrix.7 We use iterated GMM with a maximum number of 25 steps. The instruments are the same forecasting variables $Z_t$. In its present form, (7) is unidentified due to the $\beta\theta$ combination. We thus impose the standard identification that the first $K$ rows of the matrix $\beta$ are equal to an identity matrix. The cross-equation restrictions imposed on (5),
$$H_0:\gamma_1=\beta\theta,\qquad(8)$$
can then be tested using the standard $\chi^2$ test statistic from a GMM system.

Our model has a potential errors-in-variables problem as the log-space bi-power covariation matrix $A_{BP}(d)_t$ is constructed with error. Using its principal components as regressors will result in biased estimates of the coefficients. Ghysels and Jacquier (2005) have noted a similar problem with estimates of time-varying beta coefficients for portfolio selection. They advocate using lagged values of the betas in an instrumental variables regression to overcome the biases. We follow that approach here and use the twice lagged values of the principal components in the GMM instrument set. Once the coefficients have been estimated by GMM, the fitted values are reassembled into a square matrix $\hat A_t$. Applying the matrix exponential function (3) yields the prediction for the covariance matrix in period $t$,
$$\hat V_t=\operatorname{expm}\big(\hat A_t\big).\qquad(9)$$
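As an illustrative numerical sketch (dimensions, data and helper names are invented here, not the paper's, and no GMM estimation is performed), the pipeline of Eqs. (2)–(3) and (6)–(9) — matrix log of a realized covariance matrix, vech, a factor-restricted linear fit with the identification that the first $K$ rows of $\beta$ are the identity, and the matrix exponential mapping back — can be put together as follows:

```python
import numpy as np
from scipy.linalg import expm, logm

P = 5                       # portfolios, so p = P(P+1)/2 = 15
p, K, N = 15, 2, 12         # log-volatilities, factors, predictors (illustrative)
rng = np.random.default_rng(0)

def vech(A):
    """Stack the elements on and below the diagonal (row-major order here;
    any fixed ordering works as long as unvech inverts it)."""
    return A[np.tril_indices(A.shape[0])]

def unvech(a, P):
    """Inverse of vech: rebuild the symmetric P x P matrix."""
    A = np.zeros((P, P))
    A[np.tril_indices(P)] = a
    return A + np.tril(A, -1).T

# Step 1: toy positive-definite 'realized covariance' V_t from fake returns.
R = rng.standard_normal((390, P)) * 0.01
V_t = R.T @ R

# Step 2: log-space volatilities, Eq. (2).
a_t = vech(logm(V_t).real)

# Factor structure (7); restriction (8) is gamma1 = beta @ theta (rank K).
beta = np.vstack([np.eye(K), rng.standard_normal((p - K, K))])
theta = rng.standard_normal((K, N))
gamma1 = beta @ theta
Z_lag = rng.standard_normal(N)
gamma0 = a_t - gamma1 @ Z_lag            # intercept chosen so the fit is exact here
a_fit = gamma0 + beta @ (theta @ Z_lag)  # fitted log-space volatilities

# Step 3: back to the volatility space, Eqs. (3)/(9).
V_hat = expm(unvech(a_fit, P))
assert np.allclose(V_hat, V_t)                       # round trip recovers V_t
assert np.all(np.linalg.eigvalsh(V_hat) > 0)         # positive definite by construction
assert np.allclose(a_fit, gamma0 + gamma1 @ Z_lag)   # restriction (8) holds
```

Note how positive definiteness of `V_hat` needs no parameter constraints: it is automatic from exponentiating a symmetric matrix.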
We can then apply standard forecasting evaluation techniques to compare $\hat V_t$ to $V_t$.

2.4. Bias correction

Our estimator $\hat V_t$ will be biased as the estimation is done in the log-volatility space and, by Jensen's inequality, $E[\hat V_t]\ne\operatorname{expm}\big(E[\hat A_t]\big)$. An analytic bias correction exists if $A$ and $\varepsilon$ are normally distributed; however, in our data this is not the case and so we do a simple numerical bias correction on the individual volatility series. The realized volatility matrix $V_t$ can be decomposed into a matrix of standard deviations and correlations:
$$V_t=SD_t\,C_t\,SD_t',$$
where $SD_t$ is a $P\times P$ diagonal matrix of the standard deviations and $C_t$ is a $P\times P$ symmetric matrix of the correlations. A similar decomposition can be done for the fitted value $\hat V_t$ to yield the $\widehat{SD}_t$ and $\hat C_t$ matrices. We then estimate a bias correction factor as the

7 We use the Andrews (1991) test to determine the optimal lag length of 46 in the HAC covariance matrix.
ratio of the median values of the two standard deviation series: med (SDt (i, i)) , i = 1, . . . , 5. bc = t (i, i) med SD We then bias correct the standard deviations while leaving the correlations intact. This simple method works well in that the fitted values are of the approximate magnitude of the actual realized volatility series and the statistical and economic tests presented in the paper support its use. We recognize that other more sophisticated bias-correction methods could produce better results. 2.5. Interpreting expected volatility The matrix logarithmic volatility model has the disadvantage that the estimated coefficients cannot be interpreted directly as the effect of the variable on the specified element of the realized volatility. This results from the non-linear relationship between particular elements of Vt and At . However, derivatives of the estimated covariance matrix Vt with respect to the elements of the factor model can be easily obtained (Najfeld and Havel, 1995; Mathias, 1996). Let A(z ) be the P × P expected conditional covariance matrix from the third step (3) of our estimation procedure, where we consider the matrix to be a function of a particular forecasting variable, say z ∈ Z . The matrix of the element-by-element d A(z )
derivatives of A(z ) with respect to z, dz , is calculated using the estimated coefficients from our factor model (7). The P × P matrix of the derivatives of the actual volatilities with respect to z, d d V (z ) = expm A(z ) , dz dz can be extracted from the upper P × P right block of the following 2P × 2P matrix: d expm A(z ) expm A(z ) dz 0 expm A(z )
= expm
d A(z )
A(z )
dz
A(z )
0
,
(10)
where 0 is a P × P matrix of zeros. Eq. (10) allows one to interpret of the impact forecasting variables on the realized covariance matrix, even though the estimation occurs in the matrix-log space. We can calculate either the average impact across the entire sample, or the conditional impact at a point in time. For example, let At (zt ) be the estimated realized log-space volatilities for day t. To find the response of the expected covariance matrix to the forecasting variable zt , we need to calculate the derivative d t (zt ) . Given our two-factor model as defined in (7), expm A d(z ) t
d At (zt ) , is easily d(zt ) d Vt (zt ) matrix d(z ) . We can t
the matrix of element-by-element derivatives, d A (z )
obtained. Plugging d(t z )t into (10) yields the t then calculate the time-varying elasticity
below, we show that there is a significant time variation in the elasticities. 3. Data 3.1. Realized volatility We construct our realized covariance matrices from two data sets: the Institute for the Study of Securities Markets’ (ISSM) database and the Trades and Quotes (TAQ) database. Both data sets contain continuously-recorded information on stock quotes and trades for securities listed on the New York Stock Exchange (NYSE). The ISSM database provides quotes from January 1988 through December 1992 while the TAQ database provides quotes from January 1993 through December 2002.9 Value-weighted portfolio returns are created by assigning stocks to one of five size-sorted portfolios based on the prior month’s ending price and shares outstanding. Our choice of portfolios is partially motivated by an interest to see if the systematic components of conditional volatility are common across the size portfolios. We use the CRSP database to obtain shares outstanding and prior month ending prices.10 Once we have our time series of high-frequency portfolio returns, we construct our measure of realized covariance matrices using the approach of Hansen and Lunde (2006), who recommend a Newey and West (1987) type extension to the usual realized volatility construction.11 They note the potential bias in calculating variances if the serially autocorrelated nature of the data is ignored. Our data likely suffers from this problem as the portfolios of smaller stocks will include securities that are more illiquid than stocks in the larger quintiles. The illiquidity of small stocks suggests that price and volatility responses to information shocks may take more time to be incorporated, leading to time series autocorrelation in the high-frequency returns.12 The summary statistics of the log-space volatility matrix At = logm(Vt ) are shown in Table 1. Many of the skewness coefficients are close to 0 while the kurtosis statistics are close to 3.00. 
Although all of the Jarque–Bera statistics reject the null of normally distributed data, the test statistic values (not reported) have decreased quite dramatically relative to their values for the series in Vt . Thus, taking the matrix logarithm of multivariate realized volatility results in series that are much closer to being normally distributed. This parallels the univariate finding of Andersen et al. (2001). 3.2. Forecasting variables Our goal in this paper is to compare alternative models of the conditional covariance matrix. While all of the models use the latent factor form given in (7), they differ by the forecasting variables Zt used. We construct four alternative models using forecasting variables that correspond to existing approaches in the literature. The first model, labeled ‘‘MHAR-RV-BP’’, is a multivariate HAR model of daily realized volatility using the principal components
ε(i, j, z, t) ≡ (d Vt(i,j) / d(zt)) · (σ(zt) / Vt(i,j)),  (11)
which represents the per cent increase in the (i, j)th element of Vt due to a one standard deviation shock in the forecasting variable, σ (zt ), at time t.8 We can therefore examine how the elasticity of a particular equity market variance or covariance changes over time in response to changes in the forecasting variables. In our results
8 Measuring the elasticity for the covariance elements (i ≠ j) is problematic as the covariances can become arbitrarily small. For these elements, we therefore use (Vt(i,i) Vt(j,j))^{1/2} in the denominator of (11).
9 The ISSM data actually begins in January 1983; however, the first four years of the data have many missing days and the necessity of a contiguous data set for our time-series analysis precludes our use of these years. 10 We use a variety of other filters that reduces the set of securities included in our data base. For example, we exclude securities with CRSP share codes that are not 10 or 11, leading to the exclusion of preferred stocks, warrants, etc., and we only include stocks that are found in both the quotes databases (ISSM and TAQ) and CRSP. 11 The approach of Hansen and Lunde (2006) was theoretically developed in Barndorff-Nielsen et al. (2008). 12 We also use the procedure detailed in Hansen and Lunde (2005) to get an estimate of the covariance matrix that includes close to open, or overnight, price information.
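The elasticity in (11) is straightforward to evaluate once a model for Vt as a function of the forecasting variables is fixed; a minimal sketch assuming a hypothetical one-factor exponential specification V(z) = exp(a + b z) (the names a, b and the functional form are illustrative, not the authors' model):

```python
import math

def elasticity(dV_dz, sigma_z, V_ij):
    """Time-t elasticity as in Eq. (11): per cent response of the (i, j)
    element of V_t to a one-standard-deviation shock in z_t."""
    return (dV_dz / V_ij) * sigma_z

# hypothetical variance element V(z) = exp(a + b*z); a, b are illustrative
a, b = -9.0, 0.5
z, sigma_z = 0.2, 1.3
V = math.exp(a + b * z)
dV_dz = b * V                          # analytical derivative of exp(a + b*z)
eps = elasticity(dV_dz, sigma_z, V)    # equals b * sigma_z for this model
```

For this exponential specification the elasticity collapses to b·σ(z), independent of the level of z, which is why log-linear variance models imply constant elasticities unless σ(zt) itself varies over time.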
G.H. Bauer, K. Vorkink / Journal of Econometrics 160 (2011) 93–101
97
Table 1
Summary statistics of the log-space realized covariance matrix of size-sorted stock returns. The table shows summary statistics of the log-space one-day realized covariance of stock returns on the five size-sorted NYSE, AMEX and NASDAQ portfolios. For each day in the series, the matrix logarithm of the realized volatility matrix is calculated. The summary statistics of the upper triangular elements of the resulting matrix are shown here. The table shows the: mean; median; standard deviation; the first-order autoregressive coefficient; and the skewness and kurtosis statistics. Also shown is the asymptotic marginal significance level (P-value) for the Jarque–Bera test of Normality. The bottom of the table presents the QQ test statistic of multivariate normality, multivariate skewness and multivariate kurtosis test statistics as well as their marginal significance levels.

A(1)i,j    Mean (%)    Median (%)    Std. Dev. (%)    AR(1) statistic
A(1)1,1    −11.880     −11.907       1.228             0.452
A(1)1,2      0.715       0.721       0.701             0.123
A(1)1,3      0.536       0.550       0.648             0.085
A(1)1,4      0.429       0.441       0.631             0.069
A(1)1,5      0.301       0.313       0.608            −0.006
A(1)2,2    −12.793     −12.817       1.133             0.399
A(1)2,3      0.987       0.994       0.737             0.172
A(1)2,4      0.708       0.724       0.638             0.079
A(1)2,5      0.454       0.477       0.598            −0.007
A(1)3,3    −13.099     −13.139       1.157             0.450
A(1)3,4      1.160       1.191       0.684             0.111
A(1)3,5      0.752       0.778       0.615             0.029
A(1)4,4    −13.068     −13.114       1.109             0.415
A(1)4,5      1.406       1.441       0.652             0.074
A(1)5,5    −11.450     −11.511       1.187             0.420

Multivariate normality test    QQ    Skewness    Kurtosis
…β > 0 or β < 0.
We consider the following stochastic volatility model
d log St = µ dt + σt [ρ1 dW1t + ρ2 dW2t + √(1 − ρ1² − ρ2²) dW3t],
where W1t, W2t and W3t are three independent standard Brownian motions. Since we assume no drift and no leverage, µ = ρ1 = ρ2 = 0, implying that d log St = σt dW3t. We consider two different models for σt. Our first model is the GARCH(1, 1) diffusion studied by Andersen and Bollerslev (1998): dσt² = 0.035(0.636 − σt²)dt + 0.144σt² dW1t. We also consider the two-factor diffusion model analyzed by Huang and Tauchen (2005): σt = s-exp(−1.2 + 0.04σ1t + 1.5σ2t), where dσ1t = −0.00137σ1t dt + dW1t, dσ2t = −1.386σ2t dt + (1 + 0.25σ2t)dW2t, and where the function s-exp is the usual exponential function with a linear growth function splined in at high values of its argument: s-exp(x) = exp(x) if x ≤ x0 and s-exp(x) =
For β > 0, a lower one-sided (1 − α)% confidence interval for σ² based on the infeasible statistic Sβ is given by
[0, (R2^β − zα vβ)^{1/β}],
where vβ = β(σ²)^{β−1}√(hV), with V = 2σ⁴, is the scaling factor for Sβ, and where zα is such that Φ(zα) = α for any α. For β < 0, it is given
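The GARCH(1,1) diffusion in the simulation design can be discretized with an Euler scheme; a minimal sketch (step count, starting value, and the positivity clamp are our choices, not part of the design):

```python
import math
import random

def simulate_garch_diffusion(n_steps=390, dt=1.0 / 390, seed=1):
    """Euler discretization of the GARCH(1,1) diffusion
    d sigma_t^2 = 0.035(0.636 - sigma_t^2) dt + 0.144 sigma_t^2 dW1,
    with d log S = sigma_t dW3 (no drift, no leverage).
    Returns intra-day returns and the spot-variance path."""
    rng = random.Random(seed)
    v = 0.636                        # start at the unconditional mean
    rets, vols = [], []
    for _ in range(n_steps):
        dW1 = rng.gauss(0.0, math.sqrt(dt))
        dW3 = rng.gauss(0.0, math.sqrt(dt))
        rets.append(math.sqrt(v) * dW3)
        vols.append(v)
        v = v + 0.035 * (0.636 - v) * dt + 0.144 * v * dW1
        v = max(v, 1e-12)            # keep the Euler path positive
    return rets, vols

rets, vols = simulate_garch_diffusion()
R2 = sum(r * r for r in rets)        # daily realized variance
```

A finer Euler grid (or an exact scheme) would reduce discretization bias; the clamp matters only for extreme paths, since the drift pulls the variance back toward 0.636.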
S. Gonçalves, N. Meddahi / Journal of Econometrics 160 (2011) 129–144
143
Fig. 6. Coverage probabilities of confidence intervals across several values of β , two-factor diffusion.
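The feasible lower one-sided interval described in the text can be computed directly from intra-day returns; a sketch assuming the normalization R4 = m·Σ r_i⁴, so that V̂ = (2/3)R4 estimates 2∫σ⁴ (this normalization is our assumption):

```python
import math
import random
from statistics import NormalDist

def lower_ci_sigma2(returns, beta, alpha=0.05):
    """Feasible lower one-sided (1 - alpha) confidence interval for sigma^2
    based on the statistic T_beta with beta > 0:
    [0, (R2^beta - z_alpha * v_hat_beta)^(1/beta)],
    where v_hat_beta = beta * R2^(beta-1) * sqrt(h * V_hat), V_hat = (2/3) R4,
    and h = 1/m. R4 = m * sum(r_i^4) is an assumed normalization."""
    m = len(returns)
    h = 1.0 / m
    R2 = sum(r * r for r in returns)
    R4 = m * sum(r ** 4 for r in returns)
    v_hat = beta * R2 ** (beta - 1.0) * math.sqrt(h * (2.0 / 3.0) * R4)
    z_alpha = NormalDist().inv_cdf(alpha)   # z_alpha with Phi(z_alpha) = alpha
    upper = (R2 ** beta - z_alpha * v_hat) ** (1.0 / beta)
    return 0.0, upper

rng = random.Random(7)
rets = [rng.gauss(0.0, math.sqrt(1.0 / 390)) for _ in range(390)]  # sigma^2 = 1
lo, hi = lower_ci_sigma2(rets, beta=1.0)    # beta = 1 recovers the raw interval
```

Setting beta = 1 reproduces the untransformed realized-variance interval; beta = 1/3 or negative values implement the Box-Cox-type transformations studied in the paper.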
by [0, (R2^β − z_{1−α} vβ)^{1/β}]. When β > 0, a two-sided symmetric (1 − α)% confidence interval based on Sβ is given by [(R2^β − z_{1−α/2} vβ)^{1/β}, (R2^β + z_{1−α/2} vβ)^{1/β}], whereas for β < 0, it is given by [(R2^β + z_{1−α/2} vβ)^{1/β}, (R2^β − z_{1−α/2} vβ)^{1/β}]. The confidence intervals for σ² based on the feasible statistic Tβ are defined similarly with vβ replaced with v̂β = β R2^{β−1} √(h V̂), where V̂ = (2/3) R4.
Appendix B. Proofs
Proof of Proposition 3.1. We apply Lemmas S.1 and S.2 in GM (2009).
Proof of Proposition 3.2. We have
p_{gβ}(x) = −(1/√2) ((σ⁴)^{1/2}/σ²) ((β − β*)(x² − 1) + (β − 1))
 = −(1/√2) ((σ⁴)^{1/2}/σ²) ((β − β*)x² + (β* − 1))
 = (1/√2) ((σ⁴)^{1/2}/σ²) ((β* − β)x² + (1 − β*)).
When x is fixed and non-zero, the function that appears in the last equation is positive and decreasing when β varies with β ≤ β* < 1; therefore, pβ*(x) < pβ1(x) < pβ2(x). When x = 0, p_{gβ}(x) does not depend on β; hence, pβ*(0) = pβ1(0) = pβ2(0).
Proof of Proposition 3.3. See GM (2009).
Proof of Corollary 3.1. The goal is to characterize the value of β such that the leading term of κ1(Sβ) (resp. κ3(Sβ)) equals zero; given that g″(σ², β)/g′(σ², β) = (β − 1)(σ²)⁻¹, and given Proposition 3.1, the solution is β = 1 (resp. β = β*). The Cauchy–Schwarz inequality implies (σ⁴)² ≤ σ²σ⁶, which leads to β* ≤ 1/3.
Proof of Corollary 3.2. The goal is to characterize the value of β such that the leading term of κ1(Tβ) (resp. κ3(Tβ)) equals zero; given that g″(σ², β)/g′(σ², β) = (β − 1)(σ²)⁻¹, and given Proposition 3.3, the solution is β = β∗ (resp. β = β∗∗). The Cauchy–Schwarz inequality implies (σ⁴)² ≤ σ²σ⁶, which leads to β∗ ≤ −1 and β∗∗ ≤ −1/3.
Proof of Proposition 3.4. We have
q_{gβ}(x) = (√2/3) (σ⁶/(σ⁴)^{3/2}) (2x² + 1) + ((σ⁴)^{1/2}/√2) (g″β(σ²)/g′β(σ²)) x²
 = (1/√2) ((σ⁴)^{1/2}/σ²) [(β − β**)x² + (1/2)(1 − β**)].
When x is fixed and non-zero, the function that appears in the last equation is positive and increasing when β varies with β** < β; therefore, qβ**(x) < qβ1(x) < qβ2(x). When x = 0, q_{gβ}(x) does not depend on β; hence, qβ**(0) = qβ1(0) = qβ2(0).
References
Andersen, T.G., Bollerslev, T., 1998. Answering the skeptics: yes, standard volatility models do provide accurate forecasts. International Economic Review 39, 885–905.
Andersen, T.G., Bollerslev, T., Diebold, F.X., Ebens, H., 2001a. The distribution of realized stock return volatility. Journal of Financial Economics 61, 43–76.
Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P., 2001b. The distribution of realized exchange rate volatility. Journal of the American Statistical Association 96, 42–55.
Barndorff-Nielsen, O., Graversen, S.E., Jacod, J., Shephard, N., 2006. Limit theorems for bipower variation in financial econometrics. Econometric Theory 22, 677–719.
Barndorff-Nielsen, O., Shephard, N., 2002. Econometric analysis of realized volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society, Series B 64, 253–280. Barndorff-Nielsen, O., Shephard, N., 2004. Power and bipower variation with stochastic volatility and jumps. Journal of Financial Econometrics 2, 1–48. Barndorff-Nielsen, O., Shephard, N., 2005. How accurate is the asymptotic approximation to the distribution of realised volatility? In: Andrews, Donald W.K., Stock, James H. (Eds.), Identification and Inference for Econometric Models. A Festschrift for Tom Rothenberg. Cambridge University Press, pp. 306–331. Chen, W.C., Deo, R.S., 2004. Power transformation to induce normality and their applications. Journal of the Royal Statistical Society, Series B 66, 117–130. Gonçalves, S., Meddahi, N., 2009. Bootstrapping realized volatility. Econometrica 77, 283–306. Hall, P., 1992. The Bootstrap and Edgeworth Expansion. Springer-Verlag, New York. Huang, X., Tauchen, G., 2005. The relative contribution of jumps to total price variance. Journal of Financial Econometrics 3, 456–499. Jacod, J., Protter, P., 1998. Asymptotic error distributions for the Euler method for stochastic differential equations. Annals of Probability 26, 267–307. Marsh, P., 2004. Transformations for multivariate statistics. Econometric Theory 20, 963–987. Niki, N., Konishi, S., 1986. Effects of transformations in higher order asymptotic expansions. Annals of the Institute of Statistical Mathematics Part A 38, 371–383. Phillips, P.C.B., 1979. Expansions for transformations of statistics. Research Note, University of Birmingham, UK. Phillips, P.C.B, Park, J.Y., 1988. On the formulation of Wald tests of nonlinear restrictions. Econometrica 56, 1065–1083. Wilson, E.B., Hilferty, M.M., 1931. The distribution of chi square. Proceedings of the National Academy of Sciences USA 17, 684–688.
Journal of Econometrics 160 (2011) 145–159
Market microstructure noise, integrated variance estimators, and the accuracy of asymptotic approximations✩
Federico M. Bandi a,c,∗, Jeffrey R. Russell b
a Carey Business School, Johns Hopkins University, 1133, 100 N. Charles Street, Baltimore, MD 21201, United States
b Booth School of Business, University of Chicago, 452, 5807 South Woodlawn Avenue, Chicago, IL 60637, United States
c Edhec-Risk Institute, 393-400 Promenade des Anglais, BP 3116, 06202 Nice Cedex 3, France
article info
Article history: Available online 6 March 2010
JEL classification: C13; C14; C22
Keywords: Integrated variance; Realized variance; Microstructure noise; Kernel-based estimators
abstract A growing literature has been advocating consistent kernel estimation of integrated variance in the presence of financial market microstructure noise. We find that, for realistic sample sizes encountered in practice, the asymptotic results derived for the proposed estimators may provide unsatisfactory representations of their finite sample properties. In addition, the existing asymptotic results might not offer sufficient guidance for practical implementations. We show how to optimize the finite sample properties of kernel-based integrated variance estimators. Empirically, we find that their suboptimal implementation can, in some cases, lead to little or no finite sample gains when compared to the classical realized variance estimator. Significant statistical and economic gains can, however, be recovered by using our proposed finite sample methods. © 2010 Published by Elsevier B.V.
1. Introduction The asymptotic consistency of kernel-based or HAC-type variance estimators relies on a limiting condition requiring the number of autocovariances to diverge to infinity as the ratio (φ , say) of the number of autocovariances over the number of observations goes to zero. As noticed as early as Neave (1970), while this condition is ‘‘mathematically convenient’’, it might lead to inaccurate asymptotic approximations to the estimators’ finite sample properties. In effect, the ratio φ is fixed in any given sample. For a fixed φ , the magnitude of the finite sample mean squared error (MSE) of HAC-type variance estimators can differ substantially from asymptotic approximations relying on a vanishing φ . Importantly, some influential recent contributions on integrated variance estimation by virtue of noisy high-frequency asset price data rely on a similar
✩ This paper was previously circulated under the title ‘‘On the finite sample properties of kernel-based integrated variance estimators’’. We are grateful to Peter Hansen, Yixiao Sun, the Co-Editors (Nour Meddahi, Per Mykland, and Neil Shephard), two anonymous referees, and seminar participants at Chicago Booth, Yale, Rice, UC Los Angeles, UC San Diego, Singapore Management University, the CIREQ conference on realized volatility (Montreal, April 22–23, 2006), and the ‘‘International Conference on Financial Econometrics, Finance, and Risk’’ (Perth, June 29–July 1, 2006) for useful comments and discussions. The work of Federico Bandi was supported by the William S. Fishman Faculty Research Fund at the Booth School of Business of the University of Chicago. ∗ Corresponding author. E-mail addresses:
[email protected] (F.M. Bandi),
[email protected] (J.R. Russell).
0304-4076/$ – see front matter © 2010 Published by Elsevier B.V. doi:10.1016/j.jeconom.2010.03.027
asymptotic condition for ‘‘near-consistency’’ or consistency (see, e.g. Barndorff-Nielsen et al., 2005, 2008; Hansen and Lunde, 2006; Zhang, 2006; Zhang et al., 2005). These contributions are subject to the same observation: applied researchers are necessarily forced to select a value for φ . This paper shows that, for a given φ , the finite sample properties of HAC-type variance estimators might not conform closely with existing asymptotic approximations. However, the ratio φ can be chosen optimally on the basis of a finite sample MSE criterion. In other words, the finite sample properties of HAC-type integrated variance estimators can be optimized. Our approach relates to the optimal MSE approach to integrated variance estimation by virtue of realized variance of Bandi and Russell (2003, 2008). As in Bandi and Russell (2003, 2008), we focus on finite sample performance and study an MSE-based method to optimize such a performance. Bandi and Russell (2003, 2008) write the conditional (on the volatility path of the underlying price process) MSE of the classical realized variance estimator (Andersen et al., 2003; Barndorff-Nielsen and Shephard, 2002) as a function of the sampling frequency and select an optimal sampling frequency which minimizes the MSE. Here, the conditional MSEs of alternative integrated variance estimators are written as a function of φ and selection of an optimal φ is conducted for a given number of intra-daily observations. Interestingly, Kiefer and Vogelsang (2005) have also recently highlighted the importance of treating the ratio φ as fixed in deriving asymptotic approximations to the properties of HAC estimators. Differently from Kiefer and Vogelsang (2005), however, we
146
F.M. Bandi, J.R. Russell / Journal of Econometrics 160 (2011) 145–159
do not aim to derive asymptotic approximations for HAC estimators (and corresponding test statistics) for any value of φ . Rather, we study selection of φ in order to optimize the estimators’ finite sample performance as summarized by their conditional MSEs. Importantly, we can do so because, contrary to the more classical HAC literature, the existing work on integrated variance estimation in the presence of market microstructure noise has relied on a class of price formation models (discussed in Section 2) which readily lends itself to finite sample investigations. Using midpoints of bid–ask quotes for a sample of S&P100 stocks, we find that the root MSEs of HAC-type integrated variance estimators at the optimal φ value imply precise estimation of the integrated price variance over the period. In the case of biased (but even consistent) kernel estimators, estimation accuracy deteriorates quickly with suboptimal choices of φ . While the optimal finite sample MSE values of these estimators are smaller than the optimal finite sample MSE values of the classical realized variance estimator, the gains that biased estimators provide over the realized variance estimator can be either reduced or lost through suboptimal choices of φ . Importantly, asymptotic selection criteria for φ do not perform well. They imply a large finite sample bias component. We show how to choose φ in practice. Our optimal choice of φ yields considerable finite sample gains by reducing the impact of the bias term. The case of unbiased (or roughly unbiased) and consistent kernel estimators is somewhat different. Asymptotic selection criteria for φ perform, in general, better than in the biased case. Additionally, the MSEs (variances) of these estimators are fairly flat, albeit still convex, in φ . Hence, suboptimal choices of φ do not lead to drastic losses. 
Even though these estimators have the potential to be substantially more accurate than biased kernel estimators, as in the biased case, the asymptotic approximations to the estimators’ finite sample variances may overestimate their finite sample precision. We quantify the difference between their asymptotic and finite sample accuracy in practice. Finally, we provide some evidence about the economic gains yielded by our finite sample (MSE-based) procedures in the context of a classical portfolio-choice problem. We refer the interested reader to a companion paper (Bandi et al., 2008) for broader applications of our methods to variance forecasting for the purpose of option pricing and trading. Our previous work on realized variance focused on the finite sample performance of the classical realized variance estimator. If realized variance is used to identify integrated variance in the presence of market microstructure noise, the number of observations ought to be chosen optimally on the basis of finite sample criteria. Bandi and Russell (2003, 2008) provide one such (MSE-based) criterion, but other metrics (possibly dictated by economic theory) may be used. This paper looks at the finite sample performance of the recently proposed integrated variance kernel estimators. Again, if HAC-type estimators are used to identify integrated variance over a period, the number of autocovariances may be chosen optimally (given a kernel function and a certain number of intra-daily observations) to optimize a finite sample criterion. We, again, provide a statistically meaningful (MSE-based) procedure for doing this. While one could rank alternative approaches to integrated variance estimation in the presence of microstructure noise based solely on asymptotic properties, this paper (and our previous work on realized variance) shows that the resulting ranking can be misleading. 
In effect, unoptimized consistent kernel estimators can perform substantially less well than optimized realized variance (see Section 4.1.), despite the well-known inconsistency of realized variance (Bandi and Russell, 2003; Zhang et al., 2005). In addition, as a further example, consistent kernel estimators with the same asymptotic distribution can have drastically different finite sample properties (see Section 4.2.).
The analysis in this paper sheds light on the relative performance of several recent approaches to integrated variance estimation (including realized variance, the two-scale estimator of Zhang et al. (2005), in its traditional and bias-corrected form, and the class of flat-top, unbiased kernel estimators proposed by Barndorff-Nielsen et al. (2005, 2008)). Our intent is not to advocate a specific method. However, regardless of the estimator used, we recommend explicit optimization of its finite sample properties, when possible. Our goal is to facilitate this approach and provide directions for practical implementations. The paper proceeds as follows. Section 2 discusses the price formation model and the class of HAC-type estimators which are the focus of our work. Section 3 presents the finite sample MSEs (as a function of φ) of recently proposed HAC estimators of the integrated price variance and discusses the choice of φ. In Section 4 we apply the methods to three representative stocks, i.e., Goldman Sachs, SBC Communications, and Exxon Mobil Corporation. Section 5 evaluates the usefulness of our finite sample methods in the context of a classical portfolio-choice problem. Section 6 concludes. The Appendix contains the proofs. 2. The framework Following the notation in Bandi and Russell (2008), inter alia, denote a trading day by h = [0, 1]. The trading day is divided into m equispaced subperiods, ti − ti−1 = 1/m with i = 1, . . . , m, so that t0 = 0 and tm = 1. Now define p(ti) − p(ti−1) = pe(ti) − pe(ti−1) + η(ti) − η(ti−1),
that is, ri = rie + εi,  (1)
where ri is an observed continuously compounded intra-daily return, rie is an equilibrium continuously compounded intra-daily return,2 and εi is a market microstructure contamination in the intra-daily return process. As in Bandi and Russell (2006a), Barndorff-Nielsen et al. (2005, 2008), Zhang et al. (2005), and Zhang (2006), among other works, we make the following assumptions:
Assumption 1. The equilibrium price process pe is a stochastic volatility local martingale, namely,
pe(t) = ∫₀ᵗ σs dWs,  (2)
where {Wt : t ≥ 0} is a standard Brownian motion assumed to be independent of the càdlàg spot volatility process σt for all t. Furthermore,
Q(t) = ∫₀ᵗ σs⁴ ds < ∞  (3)
for all t.
Assumption 2. The logarithmic price contaminations η(ti) are i.i.d. mean zero with E(η²) = ση² and E(η⁴) = θση⁴ < ∞.3 The η(ti)'s are independent of the equilibrium price process.
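The price-formation model of Eq. (1) under Assumptions 1 and 2 is easy to simulate; the sketch below uses constant spot volatility (a special case of Assumption 1) and Gaussian noise, with all parameter values illustrative:

```python
import math
import random

def contaminated_returns(m=390, sigma=0.2 / math.sqrt(252), s_eta=1e-4, seed=3):
    """Observed intra-day returns r_i = r_i^e + eps_i as in Eq. (1):
    an equilibrium martingale increment plus the first difference of
    i.i.d. noise, eps_i = eta(t_i) - eta(t_{i-1}) (Assumption 2).
    Constant spot volatility keeps the sketch simple."""
    rng = random.Random(seed)
    h = 1.0 / m
    eta_prev = rng.gauss(0.0, s_eta)
    out = []
    for _ in range(m):
        re = sigma * rng.gauss(0.0, math.sqrt(h))   # equilibrium increment
        eta = rng.gauss(0.0, s_eta)
        out.append(re + eta - eta_prev)
        eta_prev = eta
    return out

r = contaminated_returns()
rv = sum(x * x for x in r)   # realized variance; noise biases it upward by ~2*m*s_eta^2
```

Because the noise enters in first differences, the observed returns are MA(1)-correlated, which is precisely why plain realized variance is biased and the kernel corrections below are needed.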
1 Extensions to deterministic non-equispaced arrival times are immediate. Allowing for stochastically spaced observations possibly dependent on the price process is more involved but may be empirically important. We refer the reader to the work of Renault and Werker (2008) for discussions. 2 As is customary in this literature, we are purposely unspecific about the nature of the equilibrium. 3 In what follows, for convenience, we set θ = 3 (i.e., the Gaussian case). This simplification allows us to only use estimates of ση2 , rather than estimates of
both ση2 and E (η4 ), in the empirical evaluation of the MSEs. The simplification is
These conditions are standard. They provide a uniform framework in which the asymptotic properties of the recently proposed kernel estimators have been derived. Since we compare finite sample performance to asymptotic performance for each estimator (and across estimators), we use assumptions under which limiting results have been derived for all estimators studied in this paper. The empirical validity of these assumptions depends on the market structure (centralized versus decentralized markets), the nature of the price measurements (transaction prices versus midpoints of bid–ask spreads, for instance), and the sampling method (calendar time sampling versus event time sampling). We refer the reader to Bandi and Russell (2006b) for discussions. Hansen and Lunde (2006) study the empirical features of the noise for a sample of NYSE and NASDAQ stocks. Awartani et al. (2009) propose hypothesis tests on the noise properties. The object of econometric interest is the integrated price variance over the trading day, namely V = ∫₀¹ σs² ds. To this end, consider the asymmetric kernel estimator
V̂ = w0 γ̂0 + 2 ∑_{s=1}^{q} ws γ̂s,  (4)
where γ̂s = ∑_{i=1}^{m−s} ri ri+s and the ws's are generic weights. This class of integrated variance estimators is in the tradition of zero-frequency nonparametric spectral density estimators, or HAC estimators (Andrews, 1991; Andrews and Monahan, 1992; Newey and West, 1987, among others). Hansen and Lunde (2006) study the finite sample MSE properties of V̂ for the case q = 1, w0 = 1, and w1 = m/(m − 1) (see Zhou, 1996, for an introduction to this estimator).4 Hansen and Lunde (2006) also discuss the finite sample bias properties of V̂ for the more general case of an unrestricted q with w0 = 1 and ws = m/(m − s). The limiting features of V̂ as a ''near-consistent'' estimator of the integrated variance of the equilibrium price process V are examined in Barndorff-Nielsen et al. (2005). Under Assumptions 1 and 2, Barndorff-Nielsen et al. (2005) show that, when using Bartlett-type kernel weights (i.e., when w0 = ((q − 1)/q)(m/(m − 1)) and ws = (q − s)/q for s = 1, . . . , q), the asymptotic variance of V̂ coincides with the theoretical lower bound of the limiting variance of asymmetric kernel estimators in the class represented by Eq. (4), namely 4(E(η²))². For an average stock, (E(η²))² is very small relative to V = ∫₀¹ σs² ds (see Section 4), hence the ''near-consistency'' of the Bartlett-type kernel-based estimator. Importantly, ''near-consistency'' requires q, m → ∞ with q/m → 0 and q²/m → ∞. Asymmetric kernels are inconsistent, unless appropriately modified. The two-scale estimator proposed by Zhang et al. (2005) (see Eq. (12) below) is a ''modified'' Bartlett-type kernel estimator. Its consistency and asymptotic mixed normality (at rate m^{1/6}) are derived under q = cm^{2/3} (where c is a constant to be chosen appropriately), thereby implying limiting conditions on q and m similar to those leading to the ''near-consistency'' of the Bartlett kernel estimator. The Bartlett kernel estimator and its modified version belong to the class of quadratic estimators. An interesting discussion of quadratic estimators and their use in integrated variance estimation is contained in Sun (2006). Barndorff-Nielsen et al. (2005, 2008) have recently advocated (unbiased) flat-top symmetric kernels of the type
V̂_BNHLS = γ̂0 + ∑_{s=1}^{q} ws (γ̂s + γ̂−s),  (5)
where γ̂s = ∑_{i=1}^{m} ri ri−s with s = −q, . . . , q, ws = k((s − 1)/q), and
k is a function defined on [0, 1] satisfying k(0) = 1 and k(1) = 0.5 When q = cm^{2/3}, this class of estimators has an asymptotic mixed normal distribution. This distribution is the same as that of the two-scale estimator when using the flat-top Bartlett kernel, i.e., k(x) = 1 − x. The additional requirements k′(0) = 0 and k′(1) = 0 yield a faster rate of convergence of the estimators (m^{1/4}) to their mixed normal distribution. When k(x) = 1 − 3x² + 2x³, the estimator has the same limiting distribution as the multi-scale estimator of Zhang (2006). In all cases, the number of autocovariances is assumed to diverge to infinity with the sample size at a certain rate. This condition may lead to imprecise asymptotic representations of the estimators' finite sample properties and equally inaccurate (asymptotic) choices of the number of autocovariances. In practice, q = ⌊φm⌋, where, as is customary, ⌊x⌋ denotes the largest integer smaller than or equal to x, with 0 < φ ≤ 1. Next, we show how to select φ optimally on the basis of a finite sample MSE criterion. We study (1) the asymmetric Bartlett-type kernel estimator in its traditional and (a novel) bias-corrected form, (2) the modified Bartlett-type kernel estimator, i.e., the two-scale estimator of Zhang et al. (2005), again in its traditional and (a novel) bias-corrected form, and (3) the general class of flat-top symmetric kernel estimators proposed by Barndorff-Nielsen et al. (2005). We focus on (2) since the two-scale estimator is, to the best of our knowledge, the first integrated variance estimator found to be consistent in the presence of market microstructure noise. We study (1) because the asymmetric Bartlett-type estimator has, as we show below, very similar finite sample properties to the two-scale estimator, despite being theoretically inconsistent.
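The flat-top symmetric kernel estimator of Eq. (5) can be sketched as follows; the end-point treatment here is simplified relative to Barndorff-Nielsen et al. (2008), and the truncation of γ̂s at the sample edges is our assumption:

```python
def flat_top_iv(returns, q, k):
    """Flat-top symmetric kernel estimator of Eq. (5):
    V_hat = gamma_0 + sum_{s=1..q} k((s-1)/q) * (gamma_s + gamma_{-s}),
    with gamma_s = sum_i r_i r_{i-s} restricted to valid index pairs
    (a sketch of the idea; not the exact end-point handling of
    Barndorff-Nielsen et al. (2008))."""
    m = len(returns)

    def gamma(s):
        return sum(returns[i] * returns[i - s]
                   for i in range(m) if 0 <= i - s < m)

    out = gamma(0)
    for s in range(1, q + 1):
        out += k((s - 1) / q) * (gamma(s) + gamma(-s))
    return out

cubic = lambda x: 1.0 - 3.0 * x ** 2 + 2.0 * x ** 3   # k(0)=1, k(1)=0, k'(0)=k'(1)=0
bartlett = lambda x: 1.0 - x                           # flat-top Bartlett kernel
```

With the index truncation used here, gamma(s) and gamma(-s) coincide, so the estimator effectively double-weights each realized autocovariance; swapping `cubic` for `bartlett` switches between the multi-scale-type and two-scale-type limiting behavior discussed in the text.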
Finally, we analyze the general class of flat-top symmetric kernel estimators because, contrary to (1) and (2), these estimators are unbiased by construction. In addition, this class includes estimators which, while unbiased, have the same limiting properties as the two-scale estimator and the multi-scale estimator, under appropriate assumptions. We do not explicitly consider the multi-scale estimator but expect it, when suitably bias-adjusted, to behave similarly to the bias-corrected two-scale estimator and the flat-top symmetric kernel estimators. 3. Choosing φ 3.1. The asymmetric Bartlett kernel estimator We start with the estimator in Eq. (4) computed using Bartlett-type kernel weights. In what follows, Q = ∫₀¹ σs⁴ ds. Theorem 1
conceptually unimportant. The optimal number of autocovariances (and the MSEs at the optimum) of the biased estimators (in Theorems 1 and 2) are hardly affected by the properties of the noise (see Section 4.1.). This is because the bias term of these estimators plays a fundamental role in finite samples and the bias does not depend on the properties of the noise. In order to account for θ ̸= 3 in the case of the class of unbiased estimators in Theorem 3, the variance term of these estimators can be modified by simply writing Ω2 [1, 1] = θ, Ω2 [1, 2] = Ω2 [2, 1] = −1 − θ, Ω2 [2, 2] = 4 + θ , and Ω3 [1, 1] = (−θ + 1)/2, Ω3 [1, 2] = Ω3 [2, 1] = (θ − 1)/2 + 1, Ω3 [2, 2] = (−θ + 1)/2 − 7/2. Similar modifications may be introduced in Corollary 1 through 3. 4 The estimator’s finite sample MSE properties in the context of a pure jump process of finite variation for the equilibrium price are studied by Oomen (2006).
5 Barndorff-Nielsen et al. (2008) re-express the logarithmic price end-points, p(0) and p(1), as an average of M observations in 0 ± 1/m and 1 ± 1/m. This averaging (provided M → ∞) simplifies the look of the limiting variances of the estimators in certain cases. Naturally, M is fixed in practice and recommended to be small (see Barndorff-Nielsen et al., 2008). In what follows, we set it equal to 1. This choice is designed to make the comparison between finite sample and asymptotic findings more straightforward. Because a diverging M will sometimes decrease the limiting variance of the corresponding estimator, this choice is also favorable to asymptotic approaches when compared to their finite sample counterparts.
provides the conditional (on the volatility path—as in Bandi and Russell, 2003, 2008) MSE of the estimator expressed as a function of the ratio φ = q/m. The optimal φ, φ*, is defined as the arg min of the conditional MSE. The MSE in Theorem 1 and in the subsequent theorems should be interpreted as ''nearly'' exact. We replace quantities like (m/3)E[∑_{i=1}^{m} (rie)⁴] with the integrated quarticity Q for ease of interpretation. This is justifiable in that (m/3)∑_{i=1}^{m} (rie)⁴ estimates Q consistently as m → ∞ (Barndorff-Nielsen and Shephard, 2002). For sufficiently liquid stocks, in fact, the available number of intra-daily observations is large enough (cf., Section 4) for the representation to be empirically meaningful (see, e.g., Barndorff-Nielsen and Shephard, 2005, for further discussions). Theorem 1. Consider
V̂_Bar = w0 γ̂0 + 2 ∑_{s=1}^{q} ws γ̂s,  (6)
with γ̂s = ∑_{i=1}^{m−s} ri ri+s. Assume w0 = ((q − 1)/q)(m/(m − 1)) and ws = (q − s)/q for s = 1, . . . , q. The optimal (in a conditional MSE sense) φ is defined as
φ*_Bar = arg min_{0<φ≤1} [(bias(φ))² + Var(φ)],  (7)
where Var(φ) is a convex function of φ. This result mirrors a similar result in the context of integrated variance estimation by virtue of realized variance (γ̂0). There, Bandi and Russell (2003, 2008) show that, when market microstructure noise plays a role, the variance of the classical realized variance estimator is a convex function of the number of intra-daily observations used to compute the estimator.
Corollary 1 (The Bias-corrected Bartlett Estimator). Consider
V̂_Bar_adj = [(φm² − m − φm + 1)/(φm²)]⁻¹ V̂_Bar.
Then, (bias(φ))² = 0 and
Var(V̂_Bar_adj(φ)) = [(φm² − m − φm + 1)/(φm²)]⁻² Var(φ),  (11)
where Var(φ) is defined in Eq. (9).
0s+2
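For concreteness, the Bartlett-type estimator in Eq. (6) can be computed in a few lines. The sketch below is our own illustration, not the authors' code: the function name is ours, and we use the normalization w_0 = 1, which differs from the w_0 in Theorem 1 by a factor of order 1/m.

```python
import numpy as np

def bartlett_variance_estimator(r, q):
    """Bartlett-kernel integrated variance estimator, in the spirit of Eq. (6):
    V = w0*gamma0 + 2 * sum_{s=1}^{q-1} w_s * gamma_s,
    with gamma_s = sum_{i=1}^{m-s} r_i r_{i+s} and Bartlett weights
    w_s = (q - s)/q.  We take w0 = 1 here for simplicity (our assumption)."""
    r = np.asarray(r, dtype=float)
    v = float(np.dot(r, r))                     # w0 * gamma0 with w0 = 1
    for s in range(1, q):
        gamma_s = float(np.dot(r[:-s], r[s:]))  # sum_{i=1}^{m-s} r_i r_{i+s}
        v += 2.0 * ((q - s) / q) * gamma_s
    return v
```

Given m intra-daily returns and a candidate ratio φ = q/m, one would set q = round(φ·m) and then scan φ for the minimizer of the estimated MSE, as in Eq. (7).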
The required covariances are

\[
\mathrm{cov}(\hat\gamma_0, z_1) = 10\sigma_\eta^{4} + 8\sigma_\eta^{2}\sigma_m^{2} + 2\sigma_m^{4},
\]
\[
\mathrm{cov}(\hat\gamma_0, z_2) = -4\sigma_\eta^{4} + 4\sigma_\eta^{2}\big(\sigma_{m-1}^{2} - \sigma_m^{2}\big) + 2\sigma_{m-1}^{4},
\]
\[
\mathrm{cov}(\hat\gamma_0, z_j) = 4\sigma_\eta^{2}\big(\sigma_{m-j+1}^{2} - \sigma_{m-j+2}^{2}\big) + 2\sigma_{m-j+1}^{4}, \qquad j \ge 3,
\]
\[
\mathrm{cov}(\hat\gamma_1, z_1) = -4\sigma_\eta^{4} - 2\sigma_\eta^{2}\sigma_m^{2},
\]
\[
\mathrm{cov}(\hat\gamma_j, z_j) = -2\sigma_\eta^{4} - 2\sigma_\eta^{2}\sigma_m^{2}, \qquad j \ge 2,
\]
\[
\mathrm{cov}(\hat\gamma_j, z_{j+1}) = 4\sigma_\eta^{4} + 2\sigma_\eta^{2}\sigma_m^{2} + 2\sigma_\eta^{2}\big(\sigma_{m-j}^{2} - \sigma_{m-1}^{2}\big) + 2\sigma_m^{2}\sigma_{m-j}^{2},
\]
\[
\mathrm{cov}(\hat\gamma_j, z_{j+2}) = -2\sigma_\eta^{4} + 2\sigma_\eta^{2}\big[(\sigma_{m-1}^{2} - \sigma_m^{2}) - (\sigma_{m-2}^{2} - \sigma_{m-1}^{2})\big] + 2\sigma_{m-1}^{2}\sigma_{m-j-1}^{2},
\]
\[
\mathrm{cov}(\hat\gamma_j, z_{j+i}) = 2\sigma_\eta^{2}\big[(\sigma_{m-i+1}^{2} - \sigma_{m-i+2}^{2}) - (\sigma_{m-i}^{2} - \sigma_{m-i+1}^{2})\big] + 2\sigma_{m-i+1}^{2}\sigma_{m-j-i+1}^{2}, \qquad i \ge 3.
\]

Hence

\[
A' = \frac{2}{q^{2}}\sum_{s=1}^{q-1}\sum_{k>s}(q-s)(q-k)\,\mathrm{cov}(\hat\gamma_s, z_k)
= \frac{2}{q^{2}}\sum_{s=1}^{q-4}\sum_{k>s+2}^{q-1}(q-s)(q-k)\big[2\sigma_\eta^{2}\big((\sigma_s^{2} - \sigma_{s-1}^{2}) - (\sigma_{s+1}^{2} - \sigma_s^{2})\big) + 2\sigma_s^{2}\sigma_k^{2}\big]
\]
\[
+ \frac{2}{q^{2}}\sum_{s=1}^{q-3}(q-(s+2))(q-s)\big[-2\sigma_\eta^{4} + 2\sigma_\eta^{2}\big((\sigma_2^{2} - \sigma_1^{2}) - (\sigma_3^{2} - \sigma_2^{2})\big) + 2\sigma_2^{2}\sigma_{s+2}^{2}\big]
\]
\[
+ \frac{2}{q^{2}}\sum_{s=1}^{q-2}(q-(s+1))(q-s)\big[4\sigma_\eta^{4} + 4\sigma_\eta^{2}\sigma_1^{2} + 2\sigma_\eta^{2}(\sigma_{s+1}^{2} - \sigma_2^{2}) + 2\sigma_1^{2}\sigma_{s+1}^{2}\big].
\]

As for B', write

\[
B' = \frac{2}{q^{2}}\sum_{s=1}^{q-1}(q-s)^{2}\,\mathrm{cov}(\hat\gamma_s, z_s)
= \frac{2}{q^{2}}(q-1)^{2}\big[-4\sigma_\eta^{4} - 2\sigma_\eta^{2}\sigma_1^{2}\big] + \frac{2}{q^{2}}\sum_{s=2}^{q-1}(q-s)^{2}\big[-2\sigma_\eta^{4} - 2\sigma_\eta^{2}\sigma_1^{2}\big].
\]

When computing the covariance between α and β we can treat the terms involving z̃ similarly to those involving z, and evaluating the sums above delivers A' and B' as explicit functions of σ_η², V, Q, m, and q. Putting all the elements together, and rewriting in terms of φ, we obtain

\[
\mathrm{Var}\big(\hat V^{ZMA}\big) = K - \frac{1}{3}\big(Q + V^{2}\big)\varphi^{2}
+ \Big[\frac{1}{m}\Big(\frac{1}{3}Q - \frac{2}{3}V^{2}\Big) - \frac{4V^{2}}{m^{2}}
+ \frac{8\sigma_\eta^{4} + 16\sigma_\eta^{2}V - 8Q - \frac{56}{3}V^{2}}{m^{3}}
- \frac{4\big(Q + V^{2}\big)}{m^{4}}
+ \frac{24\sigma_\eta^{2}V - \frac{10}{3}Q}{m^{5}}\Big]\varphi
\]
\[
+ \Big[8\sigma_\eta^{4} + \frac{-8\sigma_\eta^{4} + 8\sigma_\eta^{2}V}{m} + \frac{2Q}{m^{2}}\Big]\frac{1}{\varphi}
+ \Big[\frac{8\sigma_\eta^{4}}{m} + \frac{8\sigma_\eta^{4} - 8\sigma_\eta^{2}V}{m^{2}} + \frac{-4\sigma_\eta^{4} - 16\sigma_\eta^{2}V + 2Q}{m^{3}} + \frac{-4\sigma_\eta^{4} - 8\sigma_\eta^{2}V + 4Q - 8V^{2}}{m^{4}}\Big]\frac{1}{\varphi^{2}},
\]

where

\[
K = \big(-4\sigma_\eta^{4} - 8\sigma_\eta^{2}V\big)\frac{1}{m} + \big(2Q + 8V^{2}\big)\frac{1}{m^{2}} + \Big(-4\sigma_\eta^{4} - 8\sigma_\eta^{2}V + \frac{13}{3}Q + \frac{79}{3}V^{2}\Big)\frac{1}{m^{3}}.
\]
As for the bias term, write

\[
E(\vartheta_q) = \frac{1}{q}\sum_{s=1}^{q-1}(q-s)E(z_s) + \frac{1}{q}\sum_{s=1}^{q-1}(q-s)E(\tilde z_s)
= \frac{1}{q}\Big[(q-1)2\sigma_\eta^{2} + \sum_{s=1}^{q-1}(q-s)\sigma_s^{2}\Big] + \frac{1}{q}\Big[\sum_{s=1}^{q-1}(q-s)\sigma_s^{2} + (q-1)2\sigma_\eta^{2}\Big],
\]

since \(\frac{1}{q}\sum_{s=1}^{q-1}(q-s)\sigma_s^{2}\) admits a closed form. Hence,

\[
\mathrm{bias}\big(\hat V^{ZMA}\big) = \Big(\frac{mq - m + q - 1}{mq} - 1\Big)V - \frac{q}{m}V + \frac{V}{m}
= -\frac{q}{m}V + \frac{2V}{m} - \frac{V}{q} - \frac{V}{qm}.
\]

Therefore, in terms of φ,

\[
\mathrm{bias}\big(\hat V^{ZMA}\big) = -V\varphi + \frac{2V}{m} - \frac{V}{\varphi m} - \frac{V}{\varphi m^{2}},
\]

so that

\[
\big(\mathrm{bias}(\varphi)\big)^{2} = V^{2}\varphi^{2} - \frac{4V^{2}}{m}\varphi + \frac{2V^{2}}{m} + \frac{6V^{2}}{m^{2}}
+ \Big(-\frac{4V^{2}}{m^{2}} - \frac{4V^{2}}{m^{3}}\Big)\frac{1}{\varphi}
+ \Big(\frac{V^{2}}{m^{2}} + \frac{2V^{2}}{m^{3}} + \frac{V^{2}}{m^{4}}\Big)\frac{1}{\varphi^{2}}.
\]

Proof of Theorem 3. Write

\[
\hat V^{BNHLS} = \hat\gamma_0 + \sum_{s=1}^{q}k\Big(\frac{s-1}{q}\Big)\big(\hat\gamma_s + \hat\gamma_{-s}\big)
= \hat\gamma_0^{r^e,r^e} + \hat\gamma_0^{\varepsilon,\varepsilon} + \hat\gamma_0^{r^e,\varepsilon} + \hat\gamma_0^{\varepsilon,r^e}
+ \sum_{s=1}^{q}k\Big(\frac{s-1}{q}\Big)\Big(\hat\gamma_s^{r^e,r^e} + \hat\gamma_s^{\varepsilon,\varepsilon} + \hat\gamma_s^{r^e,\varepsilon} + \hat\gamma_s^{\varepsilon,r^e}
+ \hat\gamma_{-s}^{r^e,r^e} + \hat\gamma_{-s}^{\varepsilon,\varepsilon} + \hat\gamma_{-s}^{r^e,\varepsilon} + \hat\gamma_{-s}^{\varepsilon,r^e}\Big)
= w'\hat\gamma^{r^e,r^e} + w'\hat\gamma^{\varepsilon,\varepsilon} + w'\hat\gamma^{r^e,\varepsilon} + w'\hat\gamma^{\varepsilon,r^e},
\]

where \(\hat\gamma_s^{y,x} = \sum_{i=1}^{m} y_i x_{i-s}\) with s = -q, ..., q,

\[
\hat\gamma^{y,x} = \big(\hat\gamma_0^{y,x},\; \hat\gamma_1^{y,x} + \hat\gamma_{-1}^{y,x},\; \dots,\; \hat\gamma_q^{y,x} + \hat\gamma_{-q}^{y,x}\big)',
\]

and

\[
w = \big(1,\; 1,\; k(1/q),\; \dots,\; k((q-1)/q)\big)'.
\]

Now, notice that

\[
E\big(\hat V^{BNHLS}\big) = w'E\big(\hat\gamma^{r^e,r^e}\big) + w'E\big(\hat\gamma^{\varepsilon,\varepsilon}\big) + w'E\big(\hat\gamma^{r^e,\varepsilon}\big) + w'E\big(\hat\gamma^{\varepsilon,r^e}\big)
= V + 2m\sigma_\eta^{2} - m\sigma_\eta^{2} - m\sigma_\eta^{2} = V.
\]

Furthermore,

\[
\mathrm{Var}\big(\hat V^{BNHLS}\big) = w'\mathrm{Var}\big(\hat\gamma^{r^e,r^e}\big)w + w'\mathrm{Var}\big(\hat\gamma^{\varepsilon,\varepsilon}\big)w + w'\mathrm{Var}\big(\hat\gamma^{r^e,\varepsilon} + \hat\gamma^{\varepsilon,r^e}\big)w,
\]

since

\[
\mathrm{cov}\big(w'\hat\gamma^{r^e,r^e},\, w'\hat\gamma^{\varepsilon,\varepsilon}\big) = 2mV\sigma_\eta^{2} - 2mV\sigma_\eta^{2} = 0,
\qquad
\mathrm{cov}\big(w'\hat\gamma^{r^e,r^e},\, w'\hat\gamma^{r^e,\varepsilon} + w'\hat\gamma^{\varepsilon,r^e}\big) = 0,
\]

and \(\mathrm{cov}\big(w'\hat\gamma^{\varepsilon,\varepsilon},\, w'\hat\gamma^{r^e,\varepsilon} + w'\hat\gamma^{\varepsilon,r^e}\big) = 0\). The final result can be easily derived by suitably re-expressing the quadratic forms in Theorem 1 of Barndorff-Nielsen et al. (2008).

References
Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P., 2003. Modeling and forecasting realized volatility. Econometrica 71, 579-625.
Andrews, D.W.K., 1991. Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59, 817-854.
Andrews, D.W.K., Monahan, J.C., 1992. An improved heteroskedasticity and autocorrelation consistent covariance matrix estimator. Econometrica 60, 953-966.
Awartani, B., Corradi, V., Distaso, W., 2009. Assessing market microstructure effects via realized volatility measures with an application to the Dow Jones Industrial Average stocks. Journal of Business and Economic Statistics 27, 251-265.
Bandi, F.M., Russell, J.R., 2003. Microstructure noise, realized volatility, and optimal sampling. Working Paper.
Bandi, F.M., Russell, J.R., 2006a. Separating market microstructure noise from volatility. Journal of Financial Economics 79, 655-692.
Bandi, F.M., Russell, J.R., 2006b. Comment on ''Realized variance and market microstructure noise'' by Hansen and Lunde. Journal of Business and Economic Statistics 24, 167-173.
Bandi, F.M., Russell, J.R., 2008. Microstructure noise, realized variance, and optimal sampling. Review of Economic Studies 75, 339-369.
Bandi, F.M., Russell, J.R., Yang, C., 2008. Realized variance forecasting and option pricing. Journal of Econometrics 147, 34-46.
Barndorff-Nielsen, O.E., Hansen, P., Lunde, A., Shephard, N., 2005. Regular and modified kernel-based estimators of integrated variance: the case with independent noise. Working Paper.
Barndorff-Nielsen, O.E., Hansen, P., Lunde, A., Shephard, N., 2008. Designing realized kernels to measure the ex-post variation of equity prices in the presence of noise. Econometrica 76, 1481-1536.
Barndorff-Nielsen, O.E., Shephard, N., 2002. Econometric analysis of realized volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society, Series B 64, 253-280.
Barndorff-Nielsen, O.E., Shephard, N., 2005. How accurate is the asymptotic approximation to the distribution of realised volatility? In: Andrews, D.W.K., Stock, J.H. (Eds.), Identification and Inference for Econometric Models: A Festschrift in Honour of T.J. Rothenberg. Cambridge University Press, Cambridge, pp. 306-331.
Bartlett, M.S., 1950. Periodogram analysis and continuous spectra. Biometrika 37, 1-16.
Corsi, F., 2009. A simple approximate long-memory model of realized volatility. Journal of Financial Econometrics 7, 174-196.
Fleming, J., Kirby, C., Ostdiek, B., 2001. The economic value of volatility timing. Journal of Finance 56, 329-352.
Fleming, J., Kirby, C., Ostdiek, B., 2003. The economic value of volatility timing using ''realized'' volatility. Journal of Financial Economics 67, 473-509.
Geweke, J., Porter-Hudak, S., 1983. The estimation and application of long memory time series models. Journal of Time Series Analysis 4, 221-238.
Ghysels, E., Santa-Clara, P., Valkanov, R., 2006. Predicting volatility: getting the most out of return data sampled at different frequencies. Journal of Econometrics 131, 59-95.
Hansen, P.R., Lunde, A., 2005. A realized variance for the whole day based on intermittent high-frequency data. Journal of Financial Econometrics 4, 525-554.
Hansen, P.R., Lunde, A., 2006. Realized variance and market microstructure noise (with discussion). Journal of Business and Economic Statistics 24, 127-218.
Kiefer, N.M., Vogelsang, T.J., 2005. A new asymptotic theory for heteroskedasticity-autocorrelation robust tests. Econometric Theory 21, 1130-1164.
Neave, H.R., 1970. An improved formula for the asymptotic variance of spectrum estimates. Annals of Mathematical Statistics 41, 70-77.
Newey, W., West, K., 1987. A simple positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703-708.
Oomen, R.C.A., 2005. Properties of bias-corrected realized variance under alternative sampling schemes. Journal of Financial Econometrics 3, 555-577.
Oomen, R.C.A., 2006. Properties of realized variance under alternative sampling schemes. Journal of Business and Economic Statistics 24, 219-233.
Renault, E., Werker, B., 2008. Causality effects in return volatility measures with random times. Working Paper.
Sun, Y., 2006. Best quadratic unbiased estimators of integrated variance. Working Paper.
Zhang, L., 2006. Efficient estimation of stochastic volatility using noisy observations: a multi-scale approach. Bernoulli 12, 1019-1043.
Zhang, L., Mykland, P., Aït-Sahalia, Y., 2005. A tale of two time scales: determining integrated volatility with noisy high-frequency data. Journal of the American Statistical Association 100, 1394-1411.
Zhou, B., 1996. High-frequency data and volatility in foreign-exchange rates. Journal of Business and Economic Statistics 14, 45-52.
Journal of Econometrics 160 (2011) 160–175
Ultra high frequency volatility estimation with dependent microstructure noise✩

Yacine Aït-Sahalia a,∗, Per A. Mykland b, Lan Zhang c

a Princeton University and NBER, Department of Economics, Princeton, NJ 08544-1021, United States
b The University of Chicago, United States
c University of Illinois - Chicago, United States
article info

Article history: Available online 6 March 2010

Keywords: Market microstructure; Serial dependence; High frequency data; Realized volatility; Subsampling; Two scales realized volatility; Multiple scales realized volatility

abstract

We analyze the impact of time series dependence in market microstructure noise on the properties of estimators of the integrated volatility of an asset price based on data sampled at frequencies high enough for that noise to be a dominant consideration. We show that combining two time scales for that purpose will work even when the noise exhibits time series dependence, analyze in that context a refinement of this approach based on multiple time scales, and compare empirically our different estimators to the standard realized volatility.

© 2010 Elsevier B.V. All rights reserved.
1. Introduction
When studying financial data, the notion that noise plays an essential role is an accepted fact of life, whether at the high frequency typical of transactions data or at the lower frequencies more commonly used in asset pricing. The fact that this is a central issue is perhaps best demonstrated by the fact that two recent presidential addresses to the American Finance Association have been entitled ''noise'' (Black, 1986) and ''frictions'' (Stoll, 2000), respectively. So we work under the assumption that the observed log-price Y (either transaction or quoted) in high frequency financial data is the unobservable efficient log-price X, plus some noise component ϵ due to the imperfections of the trading process,

\[
Y_t = X_t + \epsilon_t. \qquad (1.1)
\]

Since X is defined implicitly (as opposed to explicitly, such as the sum of expected discounted dividends for instance) we have maintained the simple identifying assumption that ϵ is independent of the X process. It is shown in Li and Mykland (2007) that this assumption can be substantially weakened (see also Jacod (1996) and Delattre and Jacod (1997)). We are interested in the implications of such a data generating process for the estimation of the volatility of the efficient log-price process

\[
dX_t = \mu_t\,dt + \sigma_t\,dW_t \qquad (1.2)
\]

using discretely sampled data on the transaction price process at time intervals of length ∆. By ultra high frequency, we mean that we are in a situation where the data available are such that ∆ will be measured in seconds rather than minutes or hours. Under these circumstances, the drift is of course irrelevant, both economically and statistically, and so we shall focus on functionals of the σ_t process and set μ_t = 0. It is the case that transactions and quoted data series in finance are often observed at random time intervals (see Aït-Sahalia and Mykland (2003) for inference under these circumstances), but throughout this paper we will assume for simplicity that ∆ is nonrandom when studying the asymptotic properties of our estimators. We make essentially no assumptions on the σ_t process: its driving process can of course be correlated with the Brownian motion W_t in (1.2), and it need not even have continuous sample paths. The noise term ϵ summarizes a diverse array of market microstructure effects, which can be roughly divided into three groups. First, ϵ represents the frictions inherent in the trading process: bid-ask bounces, discreteness of price changes and rounding, trades occurring on different markets or networks, etc. Second, ϵ captures informational effects: differences in trade sizes or informational content of price changes, the gradual response of prices to a block trade, the strategic component of the order flow, inventory control effects, etc. Third, ϵ encompasses measurement or data recording errors such as prices entered as zero, misplaced decimal points, etc., which are surprisingly prevalent in these types of data. As is clear from the laundry list of potential sources of noise, the data generating process for ϵ is likely to be quite involved.

✩ We are grateful to Joel Hasbrouck for comments and help with the sequencing of trades. Financial support from the NSF under grants SBR-0350772 (Aït-Sahalia), DMS-0204639, DMS 06-04758, and SES 06-31605 (Mykland and Zhang) and the NIH under grant RO1 AG023141-01 (Zhang) is also gratefully acknowledged.
∗ Corresponding author. Tel.: +1 609 258 4015. E-mail address: [email protected] (Y. Aït-Sahalia).
© 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2010.03.028
Fig. 1. This figure shows the RV estimator [Y, Y]_T^{(all)} plotted against the sampling interval ∆ = T/n. The RV estimator in the figure is computed for an average of the 30 Dow Jones Industrial Average stocks, averaged again over the last ten trading days in April 2004; the objective of the double averaging is to reduce the variability of the estimator in order to display its bias. Since ∆ = T/n, the plot illustrates the divergence of RV as n → ∞ predicted by our theory: RV has bias 2nE[ϵ²].
Therefore, robustness to departures from any assumptions on ϵ is desirable. Models akin to (1.1) have been studied in the constant σ case by Zhou (1996), who proposes a bias correcting approach based on autocovariances. The behavior of this estimator has been studied by Zumbach et al. (2002). Hansen and Lunde (2006) study the Zhou estimator and extensions in the case where volatility is time varying but conditionally nonrandom. Related contributions have been made by Oomen (2006) and Bandi and Russell (2008). If σ_t is modelled parametrically, as a constant, we showed in Aït-Sahalia et al. (2005) that incorporating ϵ explicitly in the likelihood of the observed log-returns Y provides consistent and asymptotically normal estimators of the parameters. However, what distributional assumption must be used for ϵ? Surprisingly, we found that misspecifying the marginal distribution of ϵ has no adverse consequences. In the nonparametric case where σ_t is an unrestricted stochastic process, an important object of interest is the integrated volatility or quadratic variation of the process, ⟨X, X⟩_T = ∫_0^T σ_t² dt, over a fixed interval T, typically one day in empirical applications. This quantity can then be used to hedge a derivatives portfolio, forecast the next day's integrated volatility, etc. Without noise, the realized volatility (RV) estimator [Y, Y]_T^{(all)} = Σ_{i=1}^{n} (Y_{t_{i+1}} − Y_{t_i})² provides an estimate of the quantity ⟨X, X⟩_T, and asymptotic theory would lead one to sample as often as possible, or use all the data available, hence the ''all'' superscript. The sum [Y, Y]_T^{(all)} converges to the integral ⟨X, X⟩_T, with a known distribution, a result which dates back to Jacod (1994) and Jacod and Protter (1998); see also e.g., Barndorff-Nielsen and Shephard (2002) and Mykland and Zhang (2006). In Aït-Sahalia et al. (2005) and Zhang et al. (2005), we studied the corresponding problem when a relatively simple type of market microstructure noise, iid, is present.
We showed there that the situation changes radically in the presence of market microstructure noise. In particular, computing RV using all the data available (say every second) leads to an estimate of the variance of the noise, not the quadratic variation that one seeks to estimate: [Y, Y]_T^{(all)} has a bias of 2nE[ϵ²], which is an order of magnitude larger than the object we seek to estimate, ⟨X, X⟩_T. The divergence of the RV estimator as the number of observations n increases is illustrated in Fig. 1, which shows the behavior of the RV estimator as a function of the sampling interval ∆ = T/n: as predicted by our theory, the plot shows divergence proportional to 1/∆. Equivalently, since our theory predicts that RV ≈ 2nE[ϵ²] asymptotically in n, we expect that ln RV ≈ ln(2E[ϵ²]) + ln n so
Fig. 2. This figure shows a regression of ln RV against ln n, plotted in log-log scale. Each data point in the plot represents a triplet (one stock, one day, j) from the 30 DJIA stocks, the last 10 trading days in April 2004, and j = 1 or 2 depending upon whether all the observations are used or one out of every two. For ease of interpretation, the sample size n on the x-axis is translated into an average sampling interval on the basis of 1 trading day = 6.5 h = 23,400 one-second time intervals.
that a regression of ln RV on ln n should have a slope coefficient close to 1. Fig. 2 shows the result: the estimated slope coefficient is 1.02 and the null value of 1 is not rejected. In theory, an estimate of E[ϵ²] can be constructed using the intercept in that regression. In practice, the quality of the estimates derived from that regression could be adversely affected by the endogeneity of the regressor; cf. the data analysis in Hansen and Lunde (2006) and the theoretical development in Li and Mykland (2007). This difficulty seems to be of some importance for quoted data, while transaction data seem more robust in this respect. While a formal analysis of this phenomenon originated in our work cited above, the empirical message that emerges from this has long been known: do not compute RV at too high a frequency. This in fact formed the rationale for the recommendation in the literature to sample sparsely at some lower frequency. A sampling interval ∆sparse is picked in the range from 5 to 30 min: see e.g., Andersen et al. (2001), Barndorff-Nielsen and Shephard (2002) and Gençay et al. (2002). We denote the RV estimator corresponding to ∆sparse = T/nsparse as [Y, Y]_T^{(sparse)}. If one insists upon sampling sparsely, we then showed in our earlier papers how to determine the optimal sparse frequency, instead of selecting it arbitrarily. However, even if sampling sparsely at our optimally-determined frequency, one is still throwing away a large amount of data. For example, if T = 1 NYSE day and transactions occur every ∆ = 1 s, the original sample size is n = T/∆ = 23,400. Sampling sparsely even at the highest frequency used by empirical researchers (once every 5 min) entails throwing away 299 out of every 300 observations: the sample size used is only nsparse = 78.
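The divergence of RV, and the slope-one relationship between ln RV and ln n, are easy to reproduce in a small simulation. The following sketch is our own illustration; the volatility and noise parameters are arbitrary values, not estimates from the data:

```python
import numpy as np

rng = np.random.default_rng(0)
T, sigma, noise_sd = 1.0, 0.2, 0.0005   # illustrative values, not from the paper

def rv_noisy(n):
    """Realized variance of Y = X + eps sampled on n intervals of length T/n."""
    dt = T / n
    x = np.cumsum(sigma * np.sqrt(dt) * rng.standard_normal(n + 1))  # efficient log-price
    y = x + noise_sd * rng.standard_normal(n + 1)                    # observed log-price
    return float(np.sum(np.diff(y) ** 2))

# RV grows roughly linearly in n: its bias is approximately 2*n*E[eps^2]
for n in (1_000, 10_000, 100_000):
    print(n, rv_noisy(n), 2 * n * noise_sd**2)
```

For large n, the noise term 2nE[ϵ²] dominates the integrated variance σ²T, which is exactly the regime displayed in Figs. 1 and 2.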
This violates one of the most basic principles of statistics, and our objective when starting this research project was to propose a solution which made use of the full data sample, despite the fact that ultra high frequency data can be extremely noisy.1 Our approach to estimating the volatility is to use Two Scales Realized Volatility (TSRV). By evaluating the quadratic variation at two different frequencies, averaging the results over the entire sampling, and taking a suitable linear combination of the results at
1 Since the best achievable convergence rate under microstructure noise is of order Op(n^{-1/4}) rather than the standard Op(n^{-1/2}), the loss in keeping only one out of 300 observations is comparable to, in a standard statistical situation, keeping only 1 in √300 ≈ 17 data points. The loss due to subsampling is thus not quite as severe as it may at first seem, though it is still sizeable. The preceding numbers ignore, of course, all asymptotic constants.
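The rate arithmetic in the footnote can be checked directly (our own verification, not part of the paper):

```python
# At the noise-robust rate Op(n^(-1/4)), keeping a fraction 1/300 of the data
# inflates the estimation error by 300**(1/4); at the standard Op(n^(-1/2))
# rate, the same inflation corresponds to keeping a fraction 1/sqrt(300).
inflation = 300 ** 0.25                  # error inflation at rate n^(-1/4), about 4.16
equivalent_fraction = 1 / inflation**2   # equivalent fraction kept at rate n^(-1/2)
print(inflation, 1 / equivalent_fraction)  # 1/equivalent_fraction = sqrt(300), about 17.3
```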
the two frequencies, one obtains a consistent and asymptotically unbiased estimator of ⟨X, X⟩_T. We start by briefly reviewing the rationale behind the TSRV estimator in Section 2. In our earlier paper, however, we made the assumption that the noise term was iid. In Section 3, we document that dependence in the noise can be important in some empirical situations. So our main purpose in the following will be to propose a version of the TSRV estimator which can deal with such serial dependence in market microstructure noise. Both Aït-Sahalia et al. (2005) and Hansen and Lunde (2006) have considered such departures in the case of the MLE for σ constant and the Zhou estimator for σ time varying, respectively.2,3 Just like the marginal distribution of the noise is likely to be unknown, its degree of dependence is also likely to be unknown, and so our approach will be nonparametric in nature. We develop the theory for a generalized, serial-dependence-robust, TSRV estimator in Section 4. In a nutshell, we will continue combining two different time scales, but rather than starting with the fastest possible time scale as our starting point, one now needs to be somewhat more subtle and adjust how fast the fast time scale is. Next, we analyze in Section 5 the impact of serial dependence in the noise on the distribution of the RV estimators, [Y, Y]_T^{(all)} and [Y, Y]_T^{(sparse)}. We then discuss in Section 6 the Multiple Scales Realized Volatility (MSRV, Zhang (2006)), which achieves further asymptotic efficiency gains over TSRV. As we did for TSRV and RV, we analyze the impact of serial dependence in the noise on that estimator, and see that this estimator does not need to be modified because of the dependence. Finally, we provide in Section 7 an empirical study of the TSRV and MSRV estimators, and compare them to RV. We examine in particular the robustness of TSRV to the choice of the two time scales, contrast it with RV's divergence as sampling gets more frequent and with RV's variability in empirical samples, and study the dependence of the estimators on various ways of pre-processing the raw high frequency data. Section 8 concludes.
Fig. 3. This figure describes the construction of the TSRV estimator.
2. The TSRV estimator with IID noise

Before showing how to extend TSRV to account for serial dependence in market microstructure noise, we first summarize the properties of TSRV under iid noise so that we can later on discuss the effect of dependence in the noise. The TSRV estimator is based on subsampling, averaging and bias-correction. The idea is to partition the original grid of observation times, G = {t_0, ..., t_n}, into subsamples, G^(k), k = 1, ..., K, where n/K → ∞ as n → ∞. For example, for G^(1), start at the first observation and take an observation every 5 min; for G^(2), start at the second observation and take an observation every 5 min, etc. Then we average the estimators obtained on the subsamples. The idea is that the benefit of sampling sparsely, as in [Y, Y]_T^{(sparse)} vs. [Y, Y]_T^{(all)}, can now be retained, while the variation of the estimator can be lessened by the averaging and the use of the full data sample. Subsampling and averaging together give rise to the estimator

\[
[Y, Y]_T^{(avg)} = \frac{1}{K}\sum_{k=1}^{K}[Y, Y]_T^{(sparse,k)} \qquad (2.1)
\]

constructed by averaging the estimators [Y, Y]_T^{(sparse,k)} obtained by sampling sparsely on each of the K grids of average size n̄ = n/K. Unfortunately, [Y, Y]_T^{(avg)} remains a biased estimator of the quadratic variation ⟨X, X⟩_T of the true return process, although its bias 2n̄E[ϵ²] now increases with the average size n̄ of the subsamples, instead of the full sample size n as in 2nE[ϵ²]. However, E[ϵ²] can be consistently approximated by

\[
\widehat{E[\epsilon^2]} = \frac{1}{2n}[Y, Y]_T^{(all)}. \qquad (2.2)
\]

Thus a bias-adjusted estimator for ⟨X, X⟩_T can be constructed as

\[
\widehat{\langle X, X\rangle}_T^{(tsrv)} = \underbrace{[Y, Y]_T^{(avg)}}_{\text{slow time scale}} - \frac{\bar n}{n}\underbrace{[Y, Y]_T^{(all)}}_{\text{fast time scale}}, \qquad (2.3)
\]

and this is the TSRV estimator. Fig. 3 summarizes this construction. If the number of subsamples is optimally selected as K* = cn^{2/3}, then TSRV has the following distribution:

\[
\widehat{\langle X, X\rangle}_T^{(tsrv)} \overset{\mathcal{L}}{\approx} \underbrace{\langle X, X\rangle_T}_{\text{object of interest}} + \frac{1}{n^{1/6}}\Big[\underbrace{\frac{8}{c^{2}}\big(E[\epsilon^{2}]\big)^{2}}_{\text{due to noise}} + \underbrace{c\,\frac{4T}{3}\int_0^T\sigma_t^{4}\,dt}_{\text{due to discretization}}\Big]^{1/2}_{\text{total variance}} Z_{total}, \qquad (2.4)
\]

and the constant c can be set to minimize the total asymptotic variance above. Unlike all the previously considered ones, this estimator is now correctly centered, and to the best of our knowledge is the first consistent estimator for ⟨X, X⟩_T in the empirically relevant case where market microstructure noise is present and the volatility is non-constant. A small sample refinement to ⟨X, X⟩-hat_T^{(tsrv)} can be constructed as follows:

\[
\widehat{\langle X, X\rangle}_T^{(tsrv,adj)} = \Big(1 - \frac{\bar n}{n}\Big)^{-1}\widehat{\langle X, X\rangle}_T^{(tsrv)}. \qquad (2.5)
\]

The difference from the estimator in (2.3) is of order Op(n̄/n) = Op(K^{-1}), and thus the two estimators behave identically to the asymptotic order that we consider. The estimator (2.5), however, has the appeal of being unbiased to a higher order.

2 We exclude here any form of correlation between the noise ε and the efficient price X in our analysis, something which has been stressed as potentially important by Hansen and Lunde (2006). As we discuss in Aït-Sahalia et al. (2006), however, the noise can only be distinguished from the efficient price under fairly careful modelling. In most cases, the assumption that the noise is stationary, alone, is not enough to make the noise identifiable.
3 Another issue we do not address in the present paper is that of small sample corrections to the asymptotics of the estimators. Recently, Gonçalves and Meddahi (2009) have developed an Edgeworth expansion for the basic RV estimator when there is no noise. Their expansion applies to the studentized statistic based on the standard RV and it is used for assessing the accuracy of the bootstrap in comparison to the first order asymptotic approach. By contrast, we develop in Zhang et al. (2011) an Edgeworth expansion for nonstudentized statistics for the standard RV, TSRV and other estimators, but allow for the presence of microstructure noise.
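The construction in (2.1)-(2.3), together with the adjustment (2.5), maps directly into code. The sketch below is our own minimal implementation, not the authors' code; the function names are ours, and we use the approximation n̄ = n/K for the average subsample size:

```python
import numpy as np

def rv(y):
    """Realized variance: sum of squared log-price increments."""
    return float(np.sum(np.diff(y) ** 2))

def tsrv(y, K, adjust=True):
    """Two Scales Realized Volatility from observed log-prices y (length n+1).

    Slow scale: average the RVs on the K offset subgrids G^(k), eq. (2.1).
    Fast scale: RV on all the data estimates 2*n*E[eps^2], eq. (2.2).
    Bias correction as in eq. (2.3); small-sample adjustment as in eq. (2.5).
    """
    n = len(y) - 1
    rv_avg = np.mean([rv(y[k::K]) for k in range(K)])  # eq. (2.1)
    n_bar = n / K                                      # average subsample size
    est = rv_avg - (n_bar / n) * rv(y)                 # eq. (2.3)
    if adjust:
        est /= 1.0 - n_bar / n                         # eq. (2.5)
    return est

# usage on simulated noisy prices (arbitrary illustrative parameters):
rng = np.random.default_rng(0)
n, sigma, noise_sd = 23_400, 0.2, 0.0005
x = np.cumsum(sigma * np.sqrt(1.0 / n) * rng.standard_normal(n + 1))  # efficient price
y = x + noise_sd * rng.standard_normal(n + 1)                          # observed price
print(rv(y), tsrv(y, K=780))  # raw RV carries the 2nE[eps^2] bias; TSRV removes it
```

Here K ≈ n^{2/3} mimics the optimal choice K* = cn^{2/3}; the paper's exact definition of n̄ depends on the subgrid sizes, and n/K is our simplification.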
Table 1
Descriptive statistics.

                                                 3M       AIG      Intel     Microsoft
Transactions
  Average number of transactions per day         2820     3435     13,018    14,299
  Average time between transactions (s)          8.3      6.8      1.8       1.6
  Min log-return from transactions              -0.019   -0.028   -0.044    -0.083
  Max log-return                                 0.019    0.028    0.044     0.082
  Average daily first order autocorrelation     -0.41    -0.40    -0.60     -0.63
  Average daily second order autocorrelation     0.017    0.08     0.21      0.25
  Average daily third order autocorrelation      0.009   -0.01    -0.12     -0.17

Quotes
  Average number of quote revisions per day      12,824   13,507   22,275    22,661
  Average time between quote revisions (s)       1.8      1.7      1.1       1.0
  Min log-return from quote revisions           -0.031   -0.044   -0.016    -0.013
  Max log-return                                 0.034    0.044    0.016     0.013
  Average daily first order autocorrelation     -0.49    -0.49    -0.24     -0.23
  Average daily second order autocorrelation     0.001    0.001    0.07      0.02
  Average daily third order autocorrelation      0.005    0.004    0.03      0.02
For the purpose of counting transactions, only transactions leading to a price change are counted. Identical quotes are counted as a single one when reporting the number of quote revisions. Log-returns from quotes are computed using a bid-ask midpoint, weighted by the respective depth of the two sides. Autocorrelations of log-returns are reported in transaction time and quote time, respectively. Averages are computed over the last ten trading days in April 2004 (April 19–23 and 26–30). Minima and maxima are computed over the full ten day sample. All descriptive statistics for the transactions data are reported prior to any data processing, except for the removal of obvious data errors such as prices or quotes reported as zero. The estimates to be computed in the rest of the paper from transaction prices are based on data cleaned to remove any price ‘‘bounceback,’’ defined as a price jump of size greater than a cutoff of 1%, immediately followed by a jump of equal magnitude but an opposite sign (see Section 7.5 below). The raw quotes data are pre-processed to remove any sets of quotes whose bid or ask price deviate from the closest transaction price recorded by more than 5% (except in instances where the transaction price itself moves by that amount). The data are from the TAQ database.
If two scales are better than one, how about using three or more? This question has been studied in Zhang (2006), where it is shown that one can further improve efficiency by taking a weighted average of [Y, Y]_T^{(avg)} for multiple time scales. The resulting estimator, the MSRV, has a rate of convergence of n^{-1/4}, and is thus an improvement over the TSRV's rate of n^{-1/6}. This is the best possible rate even in the parametric case (when σ_t = σ, a constant, and the noise is normal), as established in Gloter and Jacod (2000). MSRV is further discussed in Section 6. Following these papers, Barndorff-Nielsen et al. (2008) have shown that the TSRV and MSRV estimators are closely related to their ''realized kernel'' class of estimators based on autocovariances. They are also closely related to the ''preaveraging'' class of estimators studied by Jacod et al. (2009) and Podolskij and Vetter (forthcoming). The three types of estimators (subsampling, realized kernel, and preaveraging) differ in their treatment of end effects. Another possible derivation and generalization of our TSRV estimator is provided by Curci and Corsi (2005). They propose realized volatility measures based on a Discrete Sine Transform (DST) of high frequency returns data. The DST can diagonalize MA processes by providing an orthonormal basis decomposition of observed returns to disentangle the volatility signal of the underlying price process from the market microstructure noise. This approach delivers an estimator close to TSRV that combines two RV estimators, and a new multi-frequency estimator that, like MSRV, combines multiple RV estimators.

3. Time series dependence in high frequency market microstructure noise

We now turn to examining empirically whether there is a need to relax the assumption that the market microstructure noise ϵ is iid.
In other words, is it the case that every time a new price is observed, one observes it with an error that is independent of the previous one, no matter how close together those two successive prices might be?

3.1. The data

Our data consist of transactions and quotes from the NYSE's TAQ database for the 30 Dow Jones Industrial Average (DJIA)
stocks, over the last ten trading days of April 2004 (April 19-23 and 26-30). To save space, we will focus on four of the thirty stocks: 3M Inc. (trading symbol: MMM), American International Group (trading symbol: AIG), Intel (trading symbol: INTC) and Microsoft (trading symbol: MSFT). Of these, the first two are traded on the NYSE while the latter two are traded on the Nasdaq. Table 1 reports the basic summary statistics on these four stocks' transactions. In our earlier paper where we introduced the TSRV estimator, we assumed that microstructure noise ϵ was iid. In that case, log-returns

\[
Y_{\tau_i} - Y_{\tau_{i-1}} = \int_{\tau_{i-1}}^{\tau_i}\sigma_t\,dW_t + \epsilon_{\tau_i} - \epsilon_{\tau_{i-1}} \qquad (3.1)
\]

follow an MA(1) process since the increments \(\int_{\tau_{i-1}}^{\tau_i}\sigma_t\,dW_t\) are uncorrelated, ϵ is independent of W, and therefore, in the simple case where σ_t is nonrandom (but possibly time varying),

\[
E\big[\big(Y_{\tau_i} - Y_{\tau_{i-1}}\big)\big(Y_{\tau_j} - Y_{\tau_{j-1}}\big)\big] =
\begin{cases}
\displaystyle\int_{\tau_{i-1}}^{\tau_i}\sigma_t^{2}\,dt + 2E[\epsilon^{2}] & \text{if } j = i\\
-E[\epsilon^{2}] & \text{if } j = i+1\\
0 & \text{if } j > i+1.
\end{cases} \qquad (3.2)
\]
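Under the iid-noise model, the autocovariances in (3.2) are easy to reproduce in a simulation. The sketch below is ours, with arbitrary illustrative parameters: observed returns should show a negative lag-1 autocovariance equal to -E[ϵ²] and essentially zero autocovariance at higher lags.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma_dt, noise_sd = 200_000, 0.01, 0.05   # arbitrary illustrative values

x = np.cumsum(sigma_dt * rng.standard_normal(n + 1))  # efficient log-price
y = x + noise_sd * rng.standard_normal(n + 1)         # observed = efficient + iid noise
ret = np.diff(y)                                      # observed log-returns

def autocov(r, lag):
    """Sample autocovariance at a given positive lag."""
    return float(np.mean((r[:-lag] - r.mean()) * (r[lag:] - r.mean())))

# MA(1) pattern of eq. (3.2): lag-1 autocovariance is -E[eps^2],
# higher-lag autocovariances are zero.
print(autocov(ret, 1), -noise_sd**2)
print(autocov(ret, 2), autocov(ret, 3))
```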
Under the simple iid noise assumption, log-returns are therefore (negatively) autocorrelated at the first order. We will examine below whether this is compatible with what we observe in the data, but for now note that this is consistent with the predictions of many simple reduced form market microstructure models. For instance, in the Roll (1984) model, ϵt = (s/2)Qt where s is the bid/ask spread and Qt , the order flow indicator, is a binomial variable that takes the values +1 and −1 with equal probability, generating first order autocorrelation in returns. French and Roll (1986) proposed to adjust variance estimates to control for such autocorrelation and Harris (1990) studied the resulting estimators. Zhou (1996) proposed a bias correcting approach based on the first order autocovariances; see also Hansen and Lunde (2006) who study the Zhou estimator. We now turn to confronting this model to the data. The top panel of Fig. 4 reports the autocorrelogram computed for the 3M and AIG transactions, respectively. That part of the plot shows a
Y. Aït-Sahalia et al. / Journal of Econometrics 160 (2011) 160–175
Fig. 4. Top and middle panels: Log-return autocorrelograms from transactions for American International Group, Inc. (AIG), 3M Co. (MMM), Intel (INTC) and Microsoft (MSFT), last ten trading days in April 2004. Bottom panel: log-return autocorrelogram from the same transactions for Intel and Microsoft, superimposed with the autocorrelogram fitted from the basic i.i.d. plus AR(1) model for the noise.
good agreement with the prediction of the iid noise model, namely the MA(1) structure in (3.2), for 3M and AIG. However, the middle panel of the same Fig. 4 shows the corresponding result for Intel and Microsoft. It is clear that the MA(1) model, and consequently the iid noise model, does not fit those data well for these two stocks. Both stocks were added to the DJIA on November 1, 1999, becoming the first two companies traded on the Nasdaq to be included in the DJIA. It is important to note, however, that the difference between the two figures does not appear to be driven by the different market structures of the NYSE (a specialist market) and the Nasdaq (a dealers' market). In fact, the autocorrelogram pattern for the other 26 DJIA stocks is closer to that of Intel and Microsoft, not that of 3M and AIG. Table 2 reports the results of a cross-sectional OLS regression of the autocorrelation coefficients of orders 2–5 on the average time between transactions, used as a measure of the liquidity of the stock, for the 30 DJIA stocks. These autocorrelation coefficients of order greater than 1 would be zero if the noise term were serially uncorrelated, as in (3.2). The table shows that the lower the time between successive transactions, the higher the observed autocorrelation in absolute value (the coefficients alternate signs because the autocorrelation coefficients do, as in the middle panel of Fig. 4). In other words, based on these data, the more liquid the stock, the more likely we are to face departures from the iid assumption. Another angle from which to understand the departure from the MA(1) autocorrelogram in the direction of an ARMA(1, 1) is the use of different sampling schemes. Griffin and Oomen (2008) provide an interesting analysis of the impact of tick vs. transaction sampling. Their results show that the nature of the sampling mechanism can generate fairly distinct autocorrelogram patterns for the resulting log-returns.
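The cross-sectional regression underlying Table 2 can be sketched as follows. The data below are synthetic placeholders generated to resemble the order-2 row of the table; only the mechanics of the regression are illustrated.

```python
# For a given autocorrelation order, regress the 30 per-stock coefficients
# on the average time between transactions (a liquidity proxy).
import numpy as np

rng = np.random.default_rng(1)
avg_trade_time = rng.uniform(1.0, 30.0, size=30)     # seconds (hypothetical)
rho2 = 0.25 - 0.015 * avg_trade_time + 0.05 * rng.standard_normal(30)

X = np.column_stack([np.ones(30), avg_trade_time])   # constant + regressor
beta = np.linalg.lstsq(X, rho2, rcond=None)[0]
resid = rho2 - X @ beta
r2 = 1 - resid.var() / rho2.var()
print(beta, r2)   # negative slope: faster trading, larger autocorrelation
```

With real data, one coefficient vector per autocorrelation order reproduces the rows of Table 2.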
Now, from a practical perspective, we can view the choice of sampling schemes as one more source of noise, this one attributable to the econometrician who is deciding
Table 2
Regressions of higher order autocorrelations on stock liquidity.

Autocorrelation order    Constant        Avg time between transactions    R²
2                        0.25 (8.0)      −0.015 (−3.9)                    0.35
3                        −0.16 (−6.5)    0.012 (4.2)                      0.39
4                        0.11 (7.1)      −0.009 (−4.9)                    0.46
5                        −0.08 (−6.7)    0.008 (5.2)                      0.49

This table reports the results of cross-sectional OLS regressions of the autocorrelation coefficients of order 2–5 on the average time between transactions used as a measure of the liquidity of the stock, for the 30 DJIA stocks. These autocorrelation coefficients would be zero if the noise term were serially uncorrelated. The autocorrelation coefficients are computed for each stock as the average of the daily autocorrelations over the last ten trading days in April 2004. t-statistics are in parentheses.
between different ways to approach the same original transactions or quotes data: should we sample in calendar time, transaction time, tick time or something else altogether? Since the sampling mechanism is not dictated by the data, this argues for working under robust departures from the basic assumptions.

3.2. Example: a simple model to capture the noise dependence

A simple model to capture the higher order dependence that we just documented in INTC and MSFT trades is
\epsilon_{t_i} = U_{t_i} + V_{t_i}    (3.3)
where U is iid, V is AR(1) with first order coefficient ρ, |ρ| < 1, and U ⊥ V. Under this model, we have

E\big[(Y_{\tau_j} - Y_{\tau_{j-1}})(Y_{\tau_i} - Y_{\tau_{i-1}})\big] =
\begin{cases}
\int_{\tau_{i-1}}^{\tau_i} \sigma_t^2 \, dt + 2E[U^2] + 2(1-\rho)E[V^2] & \text{if } j = i \\
-E[U^2] - (1-\rho)^2 E[V^2] & \text{if } j = i + 1 \\
-\rho^{\,j-i-1}(1-\rho)^2 E[V^2] & \text{if } j > i + 1.
\end{cases}    (3.4)

Fig. 5. Log-return autocorrelogram from quote revisions for Intel and Microsoft, last ten trading days in April 2004.
This model can easily be fitted to the data by the generalized method of moments. We use the first twenty autocovariances of the log-returns as moment functions, in order to estimate the three parameters E[U²], E[V²] and ρ. Their estimated values are 4.2 × 10⁻⁸, 3.5 × 10⁻⁸ and −0.68 for INTC, and 2.9 × 10⁻⁸, 4.3 × 10⁻⁸ and −0.70 for MSFT. The bottom panel of Fig. 4 shows the sample autocorrelogram and the corresponding one fitted by the model above, illustrating the generally good fit produced by this simple model. Let us stress, however, that while this simple model seems to capture fairly well the dependence in the stock data that we have examined, our theory is not tied to this particular specification of ϵ. It applies to fairly general dependence structures, as can be seen from Assumption 1 below. Finally, note also that while for consistency reasons the bottom panel of Fig. 4 reports autocorrelations, the fitting is actually done on autocovariances, as given in (3.4).

3.3. Transactions or quotes?

The model (3.3) for the microstructure noise describes well a situation where the primary source of the noise beyond order one consists of further bid-ask bounces. In such a situation, the fact that a transaction is on the bid or the ask side has little predictive power for the next transaction, or at least not enough to predict that two successive transactions are on the same side with very high probability (although Choi et al. (1988) have argued that serial correlation in the transaction type can be a component of the bid-ask spread, and extended the model of Roll (1984) to allow for it). Fig. 4 and the estimates just reported (ρ ≈ −0.7) are evidence of negative autocorrelation at horizons of up to about 15 transactions.
In trying to assess the source of the higher order dependence in the log-returns, a natural hypothesis is that it is due to trade reversals: in transactions data and an orderly, liquid market, one might expect that in most cases successive transactions of the same sign (buy or sell orders) will not move the price. The next recorded price move is then, more likely than not, going to be caused by a transaction that occurs on the other side of the bid-ask spread, so we observe these reversals when the data consist of the transactions that lead to a price change. To examine this hypothesis, we turn to quotes data, also from the TAQ database. The results are reported in Fig. 5 and suggest that an important source of the AR(1) pattern with negative autocorrelation (the term V in (3.3)) is trade reversals. The remaining autocorrelation exhibited in the quotes data can also be captured by model (3.3), but with a positive autocorrelation in the V term. This can capture effects such as the gradual adjustment of prices in response to a shock such as a large trade.
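The noise model (3.3) and its implied autocovariances can be sketched numerically. The paper fits (3.4) by GMM on the first twenty autocovariances; the sketch below uses simple moment ratios instead, on the noise process itself, with parameter values that are hypothetical (loosely in the INTC range).

```python
# eps = U + V, with U iid and V an AR(1) with coefficient rho. For the
# noise itself, Cov(eps_0, eps_l) = rho^l * E[V^2] for l >= 1, so ratios
# of sample autocovariances recover the three parameters.
import numpy as np

rng = np.random.default_rng(2)
n, rho, EU2, EV2 = 300_000, -0.68, 4.2e-8, 3.5e-8
U = np.sqrt(EU2) * rng.standard_normal(n)
V = np.empty(n)
V[0] = np.sqrt(EV2) * rng.standard_normal()
shock = np.sqrt(EV2 * (1 - rho**2)) * rng.standard_normal(n)
for t in range(1, n):                    # stationary AR(1) recursion
    V[t] = rho * V[t - 1] + shock[t]
eps = U + V

def gamma(x, l):
    """Sample autocovariance of the noise at lag l."""
    return np.mean(x[:len(x) - l] * x[l:]) if l else np.mean(x * x)

rho_hat = gamma(eps, 2) / gamma(eps, 1)  # geometric decay beyond lag 1
EV2_hat = gamma(eps, 1) / rho_hat
EU2_hat = gamma(eps, 0) - EV2_hat
print(rho_hat, EU2_hat, EV2_hat)         # close to -0.68, 4.2e-8, 3.5e-8
```

In practice one observes only the returns, not the noise, which is why the paper works with the return autocovariances (3.4).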
4. Extending the TSRV estimator for dependent noise

In the previous section, we found that there are empirical situations (such as Intel or Microsoft transactions) where the assumption of iid market microstructure noise can be problematic. We now proceed to extend the TSRV estimator suitably to make it robust to departures from the iid noise assumption. The idea is to slow down the fast time scale somewhat, in order to reduce the degree of dependence that is induced by the noise.

4.1. The setup

As above, we let Y be the logarithm of the transaction price, which is observed at times 0 = t_0, t_1, \ldots, t_n = T. We assume that at these times, Y is related to a latent true price X (also in logarithmic scale) through Eq. (1.1). The latent price X is given by (1.2).

Assumption 1. We assume that the noise process \epsilon_{t_i} is independent of the X_t process, and that it is (when viewed as a process in the index i) stationary and strong mixing with mixing coefficients decaying exponentially. We also suppose that, for some κ > 0, E|\epsilon|^{4+\kappa} < \infty.

Definitions of mixing concepts can be found, e.g., in Hall and Heyde (1980, p. 132). Note that by Theorem A.6 (p. 278) of Hall and Heyde (1980), there is a constant ρ < 1 such that, for all i and l ≥ 0,
\big|\mathrm{Cov}(\epsilon_{t_i}, \epsilon_{t_{i+l}})\big| \le \rho^l \,\mathrm{Var}(\epsilon).    (4.1)
For the moment, we focus on determining the integrated volatility of X for one time period [0, T]. This is also known as the continuous quadratic variation ⟨X, X⟩ of X. In other words,

\langle X, X\rangle_T = \int_0^T \sigma_t^2 \, dt.    (4.2)

Our volatility estimators can be described by considering subsamples of the total set of observations. A realized volatility based on every j-th observation, and starting with observation number r, is given as

[Y, Y]_T^{(j,r)} = \sum_{0 \le j(i-1) \le n - r - j} (Y_{t_{ji+r}} - Y_{t_{j(i-1)+r}})^2.

Under most assumptions, this estimator violates the sufficiency principle, whence we define the average lag j realized volatility as

[Y, Y]_T^{(J)} = \frac{1}{J}\sum_{r=0}^{J-1} [Y, Y]_T^{(J,r)} = \frac{1}{J}\sum_{i=0}^{n-J} (Y_{t_{i+J}} - Y_{t_i})^2.    (4.3)
A generalization of TSRV can be defined for 1 ≤ J < K ≤ n as

\langle\widehat{X, X}\rangle_T^{(tsrv)} = \underbrace{[Y, Y]_T^{(K)}}_{\text{slow time scale}} - \frac{\bar{n}_K}{\bar{n}_J}\underbrace{[Y, Y]_T^{(J)}}_{\text{fast time scale}},    (4.4)

thereby combining the two time scales J and K. Here \bar{n}_K = (n - K + 1)/K and similarly for \bar{n}_J.
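The constructions (4.3) and (4.4) are straightforward to code. The sketch below also includes the small-sample area adjustment that the paper develops later in Eq. (4.21); the simulated data (constant volatility plus iid noise) and all parameter values are hypothetical.

```python
# Average lag-J realized volatility (4.3) and generalized TSRV (4.4).
import numpy as np

def avg_rv(Y, J):
    """[Y, Y]_T^(J) = (1/J) * sum_{i=0}^{n-J} (Y_{i+J} - Y_i)^2, n = len(Y) - 1."""
    return np.sum((Y[J:] - Y[:-J]) ** 2) / J

def tsrv(Y, J, K, area_adjust=False):
    """Slow scale K minus (nbar_K / nbar_J) times fast scale J, as in (4.4)."""
    n = len(Y) - 1
    nbar_K, nbar_J = (n - K + 1) / K, (n - J + 1) / J
    est = avg_rv(Y, K) - (nbar_K / nbar_J) * avg_rv(Y, J)
    if area_adjust:                    # small-sample correction, Eq. (4.21)
        est *= n / ((K - J) * nbar_K)
    return est

rng = np.random.default_rng(3)
n = 23_400                             # one "1-second" trading day
sigma, s = 0.2 / np.sqrt(n), 1e-3      # per-tick vol, noise std (hypothetical)
X = np.cumsum(sigma * rng.standard_normal(n + 1))
Y = X + s * rng.standard_normal(n + 1)
print(avg_rv(Y, 1))                    # RV on all the data: noise-dominated
print(tsrv(Y, J=20, K=300, area_adjust=True))  # near integrated variance 0.04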
We will continue to call this estimator the TSRV estimator, noting that the estimator we proposed in Zhang et al. (2005) is the special case where J = 1 and K → ∞ as n → ∞. The original TSRV produces a consistent estimator in the case where the \epsilon_{t_i} are iid. For the optimal choice K = O(n^{2/3}),

\langle\widehat{X, X}\rangle_T^{(tsrv)} - \langle X, X\rangle_T = O_p(n^{-1/6}).
The problem with which we are concerned here is that these assumptions on the noise \epsilon_{t_i} may be too restrictive. We shall see that in the case where J is allowed to be larger than 1, the problem of dependence of the \epsilon_{t_i}'s will be eliminated and the generalized TSRV estimator given in (4.4) will be consistent for suitable choices of (J, K).

4.2. A signal-noise decomposition
We have the following.

Lemma 1. Under the assumptions above, let n → ∞, and let j = j_n be any sequence. Then

\sum_{i=0}^{n-j} (X_{t_{i+j}} - X_{t_i})(\epsilon_{t_{i+j}} - \epsilon_{t_i}) = O_p(j^{1/2}).

The lemma is important because it gives rise to the sum of squares decomposition

[Y, Y]_T^{(J)} = [X, X]_T^{(J)} + [\epsilon, \epsilon]_T^{(J)} + O_p(J^{-1/2}).

Thus, if we look at linear combinations of the form (4.4), one obtains

\langle\widehat{X, X}\rangle_T^{(tsrv)} = \underbrace{[X, X]_T^{(K)} - \frac{\bar{n}_K}{\bar{n}_J}[X, X]_T^{(J)}}_{\text{signal term}} + \underbrace{[\epsilon, \epsilon]_T^{(K)} - \frac{\bar{n}_K}{\bar{n}_J}[\epsilon, \epsilon]_T^{(J)}}_{\text{noise term}} + O_p(K^{-1/2}),

so long as

1 \le J \le K \quad\text{and}\quad K = o(n),    (4.5)

both of which will be assumed throughout.

4.3. Analysis of the noise term

It can be seen that when the ϵ's are independent, E[noise term] = 0, so that the linear combination used in (4.4) is exactly what is needed to remove the bias due to noise. To analyze the more general case, and to obtain the approximate distribution of the noise term, note that

[\epsilon, \epsilon]_T^{(J)} = \frac{1}{J}\sum_{i=0}^{n-J} c_i^{(J)} \epsilon_{t_i}^2 - \frac{2}{J}\sum_{i=0}^{n-J} \epsilon_{t_i}\epsilon_{t_{i+J}},    (4.6)

where c_i^{(J)} = 2 for J ≤ i ≤ n − J, and c_i^{(J)} = 1 for other i. By construction,

\sum_i c_i^{(J)} = 2J\bar{n}_J,    (4.7)

so that for J ≤ n/2,

\big|E[\text{noise term}]\big| \le 2E[\epsilon^2]\left(\frac{1}{K}(n-K+1)\rho^K + \frac{\bar{n}_K}{\bar{n}_J}\,\frac{1}{J}(n-J+1)\rho^J\right) = O\!\left(\frac{n}{K}(\rho^K + \rho^J)\right),    (4.8)

and, in the regular case where \mathrm{Cov}(\epsilon_{t_0}, \epsilon_{t_K}) = o(\mathrm{Cov}(\epsilon_{t_0}, \epsilon_{t_J})),

E[\text{noise term}] = 2\,\frac{n}{K}\,\mathrm{Cov}(\epsilon_{t_0}, \epsilon_{t_J})(1 + o(1)).

If J → ∞ at even a quite slow rate when n → ∞, the bias is negligible. Also, in the case of m-dependent ϵ's, the bias becomes zero for a finite J. We obtain:

Proposition 1. Under assumption (4.12) below,

\frac{K}{n^{1/2}}\big(\text{noise term} - E[\text{noise term}]\big) \xrightarrow{L} \xi Z_{\mathrm{noise}}    (4.9)

as n → ∞, where Z_noise is standard normal. Further, in the case where both J and K go to infinity with n, we have ξ² = ξ∞², where

\xi_\infty^2 = 8\,\mathrm{Var}(\epsilon)^2 + 16\sum_{i=1}^{\infty}\mathrm{Cov}(\epsilon_{t_0}, \epsilon_{t_i})^2.    (4.10)

In the case where J does not go to infinity (the m-dependent case, say), then

\xi^2 = \xi_\infty^2 + 4\alpha_0 + 8\sum_{i=1}^{\infty}\alpha_i,    (4.11)

where

\alpha_i = \mathrm{Cov}(\epsilon_{t_0}, \epsilon_{t_{i+J}})\,\mathrm{Cov}(\epsilon_{t_i}, \epsilon_{t_J}) + \mathrm{Cum}(\epsilon_{t_0}, \epsilon_{t_i}, \epsilon_{t_J}, \epsilon_{t_{i+J}}).

Note that even when J → ∞, one may be better off using (4.11) than ξ∞², since the former is closer to the small sample variance, and since J → ∞ quite slowly. (By contrast, K → ∞ much more quickly, as we shall see.)

4.4. Analysis of the signal term

As for the ''signal term,'' we obtain that [X, X]_T^{(K)} → ⟨X, X⟩_T in probability as n → ∞, provided K = o(n). Obviously, for the signal term in \langle\widehat{X, X}\rangle_T^{(tsrv)} - \langle X, X\rangle_T to be scalable to be consistent (see Eq. (4.16) below), we need

\limsup_{n\to\infty} \frac{\bar{n}_K}{\bar{n}_J} < 1,    (4.12)

which is easily satisfied. In fact, as we shall see, one would normally take

\limsup_{n\to\infty} \frac{J}{K} = 0.    (4.13)

Specifically, we have the following, which is proved in the same way as Theorem 2 (p. 1401) of Zhang et al. (2005):

Proposition 2. Under (4.5),

\left[\frac{K}{n}\left(1 + 2\frac{J^3}{K^3}\right)\right]^{-1/2}\left([X, X]_T^{(K)} - \frac{\bar{n}_K}{\bar{n}_J}[X, X]_T^{(J)} - \left(1 - \frac{\bar{n}_K}{\bar{n}_J}\right)\langle X, X\rangle_T\right) \xrightarrow{L} \eta\sqrt{T}\, Z_{\mathrm{discrete}},    (4.14)

where Z_discrete is standard normal, and where, in general, η is given as the limit in Theorem 3 in Zhang et al. (2005) (i.e., the discretization variance η² has the same expression as when the noise is iid). In the special case where observations are equidistant,

\eta^2 = \frac{4}{3}\int_0^T \sigma_t^4 \, dt.    (4.15)

The convergence in law is stable (see Chapter 3 of Hall and Heyde (1980)), the most important consequence of which is that Z_discrete is independent of η.
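The constant ξ∞² in (4.10) is easy to evaluate for a concrete dependence structure. As a sketch, take noise whose autocovariances decay geometrically, Cov(ϵ_0, ϵ_i) = ρ^i Var(ϵ), an AR(1)-type pattern; the parameter values are hypothetical, and setting ρ = 0 recovers the iid value 8 Var(ϵ)².

```python
# xi_inf^2 = 8 Var(eps)^2 + 16 sum_{i>=1} Cov(eps_0, eps_i)^2, from (4.10),
# evaluated for geometric autocovariances rho^i * Var(eps).
import numpy as np

var_eps, rho = 1e-6, -0.5
cov = var_eps * rho ** np.arange(1, 200)          # truncated tail
xi2 = 8 * var_eps**2 + 16 * np.sum(cov**2)
xi2_closed = var_eps**2 * (8 + 16 * rho**2 / (1 - rho**2))
print(xi2, xi2_closed)    # truncated sum agrees with the closed form
```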
4.5. The combined estimator

Consider the adjusted estimator

\langle\widehat{X, X}\rangle_T^{(tsrv,adj)} = \left(1 - \frac{\bar{n}_K}{\bar{n}_J}\right)^{-1}\langle\widehat{X, X}\rangle_T^{(tsrv)}.    (4.16)

In the iid case of Zhang et al. (2005), this adjustment was introduced from small sample considerations (Section 4.2). Here, we also see that in the case where (4.12) is satisfied but (4.13) is not, this adjustment is needed for consistency. In the following, we analyze this estimator; for the case when (4.13) holds, the same analysis applies to the original \langle\widehat{X, X}\rangle_T^{(tsrv)}.

We obtain from Eqs. (4.9) and (4.14) that

\langle\widehat{X, X}\rangle_T^{(tsrv,adj)} = \langle X, X\rangle_T + \left(1 - \frac{\bar{n}_K}{\bar{n}_J}\right)^{-1}\left[2\,\frac{n}{K}\,\mathrm{Cov}(\epsilon_{t_0}, \epsilon_{t_J}) + \frac{n^{1/2}}{K}\,\xi Z_{\mathrm{noise}} + \left(\frac{K}{n}\right)^{1/2}\left(1 + 2\frac{J^3}{K^3}\right)^{1/2}\eta\sqrt{T}\, Z_{\mathrm{discrete}}\right](1 + o_p(1)),    (4.17)

where Z_noise and Z_discrete are asymptotically standard normal, and asymptotically independent. It is easy to see that the optimal trade-off between the two variance terms results in a choice of K = O(n^{2/3}). The worst that can then happen to the bias term is that it is of the order of (n/K)ρ^J = n^{1/3}ρ^J. Thus the bias is of small order relative to the variance provided one chooses n^{1/3}ρ^J = o(n^{−1/6}), i.e., ρ^J = o(n^{−1/2}). Thus, one can safely assume that J/K ∼ 0 (i.e., (4.13)), and it follows (from the two previous propositions, the interaction term going away as in Lemma A.2 (p. 1408) of Zhang et al. (2005)):

Proposition 3. The asymptotic behavior of the estimator \langle\widehat{X, X}\rangle_T^{(tsrv,adj)} is given by

\langle\widehat{X, X}\rangle_T^{(tsrv,adj)} = \langle X, X\rangle_T + 2\,\frac{n}{K}\,\mathrm{Cov}(\epsilon_{t_0}, \epsilon_{t_J}) + \left[\frac{n^{1/2}}{K}\,\xi Z_{\mathrm{noise}} + \left(\frac{K}{n}\right)^{1/2}\eta\sqrt{T}\, Z_{\mathrm{discrete}}\right](1 + o_p(1)).    (4.18)

The optimal K is as given above, and one chooses, ultimately, J so that

\mathrm{Cov}(\epsilon_{t_0}, \epsilon_{t_J}) = o(n^{-1/2}).    (4.19)

Obviously, when ϵ is m-dependent, one can simply choose J = m + 1. In terms of asymptotic variance, there is no unique optimal J: to minimize the asymptotic variance, J should satisfy (4.13), so any J in the range from the lower bound defined by (4.19) to the upper bound defined by (4.13) will do. For further optimization of J, either small sample or asymptotic expansion arguments would need to be invoked, and this is beyond the scope of this paper. Similar results have been developed for the multivariate asynchronous case in Zhang (2011).

4.6. A further adjustment to the TSRV estimator

It should be noted that when K is large, [X, X]_T^{(K)} may be a slight underestimate of ⟨X, X⟩_T. To consider the issue, if σ_t² is constant, σ_t² = σ², one gets that ⟨X, X⟩_T = σ²T, whereas [X, X]_T^{(K)} ≈ σ²T(n − K + 1)/n (the approximation here is loose, but, for example, it is an equality in expectation when σ_t² is constant). Thus

[X, X]_T^{(K)} - \frac{\bar{n}_K}{\bar{n}_J}[X, X]_T^{(J)} \approx \sigma^2 T\left(\frac{n-K+1}{n} - \frac{\bar{n}_K}{\bar{n}_J}\,\frac{n-J+1}{n}\right) = \sigma^2 T\,\frac{(K-J)\bar{n}_K}{n}.    (4.20)

A further modification of our estimator is thus the area adjusted quantity

\langle\widehat{X, X}\rangle_T^{(tsrv,aa)} = \frac{n}{(K-J)\bar{n}_K}\,\langle\widehat{X, X}\rangle_T^{(tsrv)}.    (4.21)

Since, by (4.5), (K-J)\bar{n}_K/n \sim 1 - \bar{n}_K/\bar{n}_J, we have:

Proposition 4. The estimator \langle\widehat{X, X}\rangle_T^{(tsrv,aa)} has the same asymptotics as \langle\widehat{X, X}\rangle_T^{(tsrv,adj)} given in Proposition 3.

The further adjustment therefore does no harm asymptotically. Because of its more careful treatment of small-sample unbiasedness, the area adjusted estimator (4.21) is the one we would most often recommend, especially for moderate sample sizes. It should be emphasized, however, that the bias calculation is based on an assumption of a constant σ and on borrowing information from the middle of the interval [0, T]. In conclusion, the TSRV estimator that is robust to serial dependence in the noise behaves as follows:

\langle\widehat{X, X}\rangle_T^{(tsrv,aa)} \stackrel{L}{\approx} \langle X, X\rangle_T + \frac{1}{n^{1/6}}\underbrace{\Big[\underbrace{\tfrac{1}{c^2}\,\xi^2}_{\text{due to noise}} + \underbrace{c\,\tfrac{4T}{3}\int_0^T \sigma_t^4 \, dt}_{\text{due to discretization}}\Big]^{1/2}}_{\text{total variance}} Z_{\mathrm{total}},    (4.22)

whether it is taken in the form \langle\widehat{X, X}\rangle_T^{(tsrv,aa)} or \langle\widehat{X, X}\rangle_T^{(tsrv,adj)}. Here K ∼ cn^{2/3}, and ξ is given by (4.10) or, more generally, (4.11). Feasible implementation of the procedure depends on having an estimate of \int_0^T \sigma_t^4 \, dt. Several schemes are possible: one is discussed in Section 6 of Zhang et al. (2005); another is to use pre-averaging (see Jacod et al. (2009) and Podolskij and Vetter (forthcoming)).

5. RV under serial dependence in the noise

We now turn to an analysis of the standard RV estimator when the noise is serially dependent. First, we have that sparse sampling at a given n_sparse results in the same asymptotic distribution as when the noise is serially uncorrelated. Second, however, we find that dependence in the noise impacts both the bias and the asymptotic variance of the RV estimator when all the data (all n observations) are used. Specifically, the traditional RV estimator [Y, Y]_T^{(sparse)}, computed at a sparse sampling frequency Δ_sparse = T/n_sparse, has the following behavior:

[Y, Y]_T^{(sparse)} \stackrel{L}{\approx} \langle X, X\rangle_T + \underbrace{2 n_{\mathrm{sparse}} E[\epsilon^2]}_{\text{bias due to noise}} + \underbrace{\Big[\underbrace{4 n_{\mathrm{sparse}} E[\epsilon^4]}_{\text{due to noise}} + \underbrace{\tfrac{2T}{n_{\mathrm{sparse}}}\int_0^T \sigma_t^4 \, dt}_{\text{due to discretization}}\Big]^{1/2}}_{\text{total variance}} Z_{\mathrm{total}}.    (5.1)
The reason why this last expression is as in the iid case is as follows. Essentially, the asymptotic variance of \sum_i \epsilon_{t_i}\epsilon_{t_{i+J}} behaves as if the quantities were uncorrelated when J goes to infinity, and so, when we have n_sparse = n/J going to infinity, the log-returns involved in [Y, Y]_T^{(sparse)} are effectively separated by enough time for the dependence in ϵ not to matter. Then \sum_i \big(\epsilon_{t_i}\epsilon_{t_{i+J}} - E[\epsilon_{t_i}\epsilon_{t_{i+J}} \mid \mathcal{F}_i]\big) is a martingale, with variance \sum_i E\big[(\epsilon_{t_i}\epsilon_{t_{i+J}} - E[\epsilon_{t_i}\epsilon_{t_{i+J}} \mid \mathcal{F}_i])^2\big], which is approximately n_{\mathrm{sparse}} E[\epsilon^2]^2 under exponential mixing, while the remainder term \sum_i E[\epsilon_{t_i}\epsilon_{t_{i+J}} \mid \mathcal{F}_i] becomes negligible under exponential mixing.

By contrast, when all the observations are used, as in [Y, Y]_T^{(all)}, the asymptotic variance of RV is influenced by the dependence of the noise. The asymptotics of [Y, Y]_T^{(all)} are, to first order, like those of [ϵ, ϵ]. The mean of the latter is 2nE[ϵ²]. As for the asymptotic variance, from standard formulas for mixing sums, we have

\Omega_\infty = \frac{1}{4}\,\mathrm{AVAR}\left[\sqrt{n}\left(\frac{[\epsilon,\epsilon]}{n} - 2E[\epsilon^2]\right)\right] = \frac{1}{4}\left[\mathrm{Var}\big((\epsilon_1 - \epsilon_0)^2\big) + 2\sum_{i=1}^{\infty}\mathrm{Cov}\big((\epsilon_1 - \epsilon_0)^2, (\epsilon_{i+1} - \epsilon_i)^2\big)\right].    (5.2)

This gives, by contrast, that the RV estimator using all the data, [Y, Y]_T^{(all)}, computed at the highest sampling frequency Δ = T/n, has the following behavior:

[Y, Y]_T^{(all)} \stackrel{L}{\approx} \langle X, X\rangle_T + \underbrace{2n\big(E[\epsilon^2] - E[\epsilon_{t_0}\epsilon_{t_1}]\big)}_{\text{bias due to noise}} + \underbrace{\Big[\underbrace{4 n \Omega_\infty}_{\text{due to noise}} + \underbrace{\tfrac{2T}{n}\int_0^T \sigma_t^4 \, dt}_{\text{due to discretization}}\Big]^{1/2}}_{\text{total variance}} Z_{\mathrm{total}},    (5.3)

where Ω∞ = E[ϵ⁴] when the noise is iid; otherwise, dependence in ϵ gives rise to Ω∞ in (5.2). An alternate expression for Ω∞ can be obtained by noting that

\mathrm{Cov}\big((\epsilon_1 - \epsilon_0)^2, (\epsilon_{i+1} - \epsilon_i)^2\big) = 2\,\mathrm{Cov}(\epsilon_1 - \epsilon_0, \epsilon_{i+1} - \epsilon_i)^2 + \mathrm{Cum}(\epsilon_1 - \epsilon_0, \epsilon_1 - \epsilon_0, \epsilon_{i+1} - \epsilon_i, \epsilon_{i+1} - \epsilon_i)

(and similarly for the variance).

6. MSRV: multiple scales realized volatility

6.1. Review of the MSRV estimator

We have seen that TSRV provides the first consistent and asymptotically (mixed) normal estimator of the quadratic variation ⟨X, X⟩_T, that it can be made to work even if market microstructure noise is serially dependent, and that it has the rate of convergence n^{−1/6}. At the cost of higher complexity, it is possible to generalize TSRV to multiple time scales, by averaging not on two time scales but on multiple time scales. The resulting estimator, multiple scale realized volatility (MSRV), has the form

\langle\widehat{X, X}\rangle_T^{(msrv)} = \underbrace{\sum_{i=1}^{M} a_i [Y, Y]_T^{(K_i)}}_{\text{weighted sum of } M \text{ slow time scales}} + \underbrace{2\widehat{E[\epsilon^2]}}_{\text{fast time scale}},    (6.1)

where \widehat{E[\epsilon^2]} is given as before in (2.2). The estimator was introduced by Zhang (2006). It was shown there that for suitably selected weights a_i, \langle\widehat{X, X}\rangle_T^{(msrv)} converges to the true ⟨X, X⟩_T at a rate of n^{−1/4} when the noise is iid. We here show that a similar result holds under dependent noise. The weights are as in the earlier paper, given by

a_i = \frac{i}{M^2}\, h\!\left(\frac{i}{M}\right) + \frac{i}{2M^3}\, h'\!\left(\frac{i}{M}\right),    (6.2)

where h is a continuously differentiable real-valued function with derivative h′, satisfying the following two conditions:

\int_0^1 x\, h(x)\, dx = 1 \quad\text{and}\quad \int_0^1 h(x)\, dx = 0.    (6.3)

The key step in the asymptotic analysis is the decomposition

\langle\widehat{X, X}\rangle_T^{(msrv)} = \underbrace{\sum_{i=1}^{M} a_i [X, X]_T^{(K_i)}}_{\text{signal}} + \underbrace{\sum_{i=1}^{M} a_i U_{n,K_i}}_{\text{noise}} + \underbrace{2\sum_{i=1}^{M} a_i [X, \epsilon]_T^{(K_i)}}_{\text{signal-noise interaction}} + \underbrace{\sum_{i=1}^{M} a_i E_{n,K_i} + 2E[\epsilon^2]}_{\text{end points of noise}} + O_p(n^{-1/2}),    (6.4)

where

U_{n,K} = -\frac{2}{K}\sum_{i=K}^{n} \epsilon_{t_i}\epsilon_{t_{i-K}} \quad\text{and}\quad E_{n,K} = -\frac{1}{K}\sum_{j=0}^{K-1} \epsilon_{t_j}^2 - \frac{1}{K}\sum_{j=n-K+1}^{n} \epsilon_{t_j}^2.

Conditions (6.3) ensure that the first term in (6.4) will be asymptotically unbiased for ⟨X, X⟩_T.

6.2. The asymptotics of MSRV when the noise is serially dependent

We now study the MSRV estimator under the (dependence) Assumption 1 from Section 4.1. The only extra bias (due to dependence) in Eq. (6.4) comes from the U_{n,K_i}. Under our mixing assumption,

\big|E[U_{n,K}]\big| \le \frac{2}{K}\,\mathrm{Var}(\epsilon)\sum_{j=K}^{n}\rho^{j} \le \frac{2}{K}\,\mathrm{Var}(\epsilon)\,\frac{\rho^K}{1-\rho}.

Thus, when the a_i follow (6.2), the absolute value of the extra bias in (6.4) becomes

\left|E\Big[\sum_{i=1}^{M} a_i U_{n,K_i}\Big]\right| \le \frac{2}{M^2}\sum_{i=1}^{M}\left|h\!\left(\frac{i}{M}\right)\right|\frac{\rho^{i}}{1-\rho}\,\mathrm{Var}(\epsilon) + \text{corresponding } h' \text{ term} = O(M^{-1}).    (6.5)

To the extent that the MSRV estimator converges at the rate O_p(M^{−1/2}), the bias induced by the dependence of the ϵ's is therefore irrelevant asymptotically. An inspection of the terms in (6.4) shows that the rate of convergence does, indeed, remain of order O_p(M^{−1/2}) = O_p(n^{−1/4}) under Assumption 1. As in the TSRV case, however, the asymptotic (random) variance now changes due to the dependence of the ϵ's. To compute that variance when the market microstructure noise is serially dependent, note first that the four terms in (6.4) are asymptotically independent; this follows by the same methods as we use in the following. Also, the behavior of the signal term is, obviously, unchanged. We compute the covariances of the individual terms, and obtain:
Proposition 5. Assume the conditions of Theorem 4 in Zhang (2006), except that the noise process ϵ, instead of being iid, is now a dependent process satisfying Assumption 1, with autocovariance function γ(l) = Cov(ϵ_{t_0}, ϵ_{t_l}). Assume that M = M_n satisfies M_n/n^{1/2} → 0. In this case, the following expression has a limit:
\frac{4}{3}\, c\, T\, \eta^2 \int_0^1\!\int_0^x h(x)\, h(y)\, y^2 (3x - y)\, dy\, dx
\;+\; \langle X, X\rangle_T\, \frac{1}{M^3}\sum_{J=1}^{M}\sum_{K=1}^{M} h\!\left(\frac{J}{M}\right) h\!\left(\frac{K}{M}\right)\sum_{l=-J}^{K-1}\big(\gamma(l) + \gamma(l+J-K) - \gamma(l-K) - \gamma(l+J)\big)\big(\min(l+J, K) - \max(0, l)\big)
\;+\; c^{-1}\,\frac{4}{M^3}\sum_{J=1}^{M}\sum_{K=1}^{M} h\!\left(\frac{K}{M}\right) h\!\left(\frac{J}{M}\right)\sum_{l=-(n-K)}^{n-J}\big(\min(n, n+l) - \max(J+l, K) + 1\big)^{+}\,\mathrm{Cov}(\epsilon_{t_0}\epsilon_{t_{-J}},\, \epsilon_{t_l}\epsilon_{t_{l-K}})
\;+\; c^{-1}\,\frac{2}{M^3}\sum_{J=1}^{M}\sum_{K=1}^{M} h\!\left(\frac{K}{M}\right) h\!\left(\frac{J}{M}\right)\sum_{l=-(J-1)}^{K-1}\big(\min(J+l, K) - \max(0, l) + 1\big)^{+}\,\mathrm{Cov}(\epsilon_{t_0}^2, \epsilon_{t_l}^2).    (6.6)

Here, η² is given by Eq. (30) in Zhang (2006). Furthermore, n^{1/4}\big(\langle\widehat{X, X}\rangle_T^{(msrv)} - \langle X, X\rangle_T\big) converges stably to a mixed normal distribution with mean zero and (random) variance given as the limit of (6.6). We have stated a prelimiting expression in (6.6) since this is closer to the small sample variance. In the special case of model (3.3), we have

\gamma(l) = \begin{cases} 2E[U^2] + 2(1-\rho)E[V^2] & \text{if } l = 0 \\ -E[U^2] - (1-\rho)^2 E[V^2] & \text{if } |l| = 1 \\ -\rho^{|l|-1}(1-\rho)^2 E[V^2] & \text{if } |l| > 1, \end{cases}    (6.7)

to be inserted in (6.6). It can also be noted that if one sets ρ = 0, then the expression (6.6) reduces to the asymptotic variance of MSRV in the iid case; see Remark 3 below. We conclude our analysis of MSRV with three additional remarks:

Remark 1. Recall that for TSRV we replaced (2.3) with (4.4), thereby ''jumping'' to frequencies (J, K) over the very fastest one (1, K) at which the serial dependence in the noise manifests itself. By letting both J and K go to infinity with n, we were effectively able to eliminate the serial dependence within each subgrid. However, the asymptotic variance of TSRV is affected by the serial dependence across subgrids coming from the averaging over RVs from different subgrids in both [Y, Y]_T^{(J)} and [Y, Y]_T^{(K)} in (4.4); hence the asymptotic variance in Proposition 3, also in (4.22), differs from that in the iid noise case.

Remark 2. The asymptotic distribution of MSRV is also affected by the dependence of the noise. Unlike the TSRV case, there is no benefit to adjusting the MSRV estimator in the presence of serial dependence in the noise. This is because, under (6.3), the weights a_i in (6.2) already assign most of the mass on the interval [1, M], with M = O(n^{1/2}), to subintervals of the form [cM, M] where c is a positive constant. Therefore, the very fastest frequencies of observation (those close to m = 1) already play a small role in MSRV even under iid noise.

Remark 3. The result specializes to Theorem 4 of Zhang (2006) in the case of iid noise. As a choice of the weight function (kernel) h, one can either use the noise optimal choice (25) (p. 1027) from the earlier paper, or one can seek to optimize the overall variance (6.6) as a function of h. This would be analogous to the kernel design question which is the subject of Barndorff-Nielsen et al. (2008) (in the context of their autocovariance based estimator).

7. Empirical analysis

With these theoretical results in hand, we now turn to a comparison of the empirical performance of the RV, TSRV and MSRV estimators: we study the impact of the selection of the fast and slow time scales on the TSRV estimators, and the improvement due to MSRV relative to TSRV, in the context of transactions data for Intel and Microsoft in the last ten days of April 2004.

In practice, our estimators require that a choice be made for K in the basic TSRV estimator, and for (J, K) in the dependence-robust TSRV estimator. Our asymptotic formulae give the rate at which K needs to grow with n, and a relative rate of convergence for J, which leaves open the question of how to choose the constant c in front of those rates. This issue is of course not unique to these estimators; it is shared by essentially every estimator that has some nonparametric feature. In this particular problem, fortunately, J and K have a natural interpretation in terms of numbers of observations that translate into sampling intervals that are meaningful for the asset price series under consideration. Our view is that one should start with plausible values of the constants c (say, J corresponding to 1 min and K to 5 min for the type of liquid stocks we are studying in this paper) and then compute the estimators for values of the constants c ranging over a few minutes around those center points. This is what we will do in our empirical analysis. As we can see from the figures below, the resulting TSRV estimates are quite robust to varying J and K in those ranges.
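Before turning to the data, here is a minimal numerical sketch of the MSRV construction (6.1)–(6.3), with scales K_i = i and the weight function h(x) = 12x − 6, which satisfies (6.3). The leading-order weights a_i ∝ (i/M²)h(i/M), renormalized to sum to one, are a simplified stand-in for the exact weights (6.2); this is an illustration, not the paper's implementation, and all parameter values are hypothetical.

```python
import numpy as np

def avg_rv(Y, J):
    """Average lag-J realized volatility, as in (4.3)."""
    return np.sum((Y[J:] - Y[:-J]) ** 2) / J

def msrv(Y, M):
    n = len(Y) - 1
    i = np.arange(1, M + 1)
    h = 12 * (i / M) - 6                  # satisfies (6.3) in the limit
    a = (i / M**2) * h
    a = a / a.sum()                       # enforce sum(a_i) = 1 exactly
    noise_hat = avg_rv(Y, 1) / (2 * n)    # fast-scale estimate of E[eps^2]
    return sum(ai * avg_rv(Y, int(k)) for ai, k in zip(a, i)) + 2 * noise_hat

rng = np.random.default_rng(4)
n = 23_400
sigma, s = 0.2 / np.sqrt(n), 1e-3         # constant vol + iid noise
X = np.cumsum(sigma * rng.standard_normal(n + 1))
Y = X + s * rng.standard_normal(n + 1)
print(msrv(Y, M=150))                     # near the integrated variance 0.04
```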
7.1. Comparison of the RV and TSRV estimators

In our empirical analysis of the different estimators, we start by comparing our TSRV estimator to the traditional RV estimator. In particular, we establish that TSRV solves the two main problems associated with RV, namely the divergence of RV as the sampling interval gets small and the variability of RV. The comparison is reported for Intel in Fig. 6 and for Microsoft in Fig. 7, where we compare RV computed at different sampling frequencies with TSRV computed for a range of values of K of 2–8 min around the typical 5 min value. Besides the well-known divergence of RV as Δ → 0, the two figures also demonstrate the large difference in variability of the two estimates. Without the benefits of the double averaging in Fig. 1, what these two series of plots show is that computing RV at, say, 4 min as opposed to 5 min or 6 min can result in substantially different daily estimates; and the computation of day-by-day estimates is how RV is actually used. RV has typically been employed in the empirical literature at an arbitrary sparse frequency: in light of the variability of RV as a function of the sparse sampling interval Δ_sparse, whichever particular choice is made can matter.

7.2. Robustness of TSRV to the choice of slow time scale

Both RV and TSRV require that the econometrician make a choice. In RV, one needs to select the sparse sampling frequency at which to compute the estimator. In TSRV, one needs to select the number of subgrids K over which to average the slow time scale sums of squares. In Zhang et al. (2005), we showed how to compute an optimal value K* for the slow time scale parameter K when the noise term is assumed to be iid, but in practical applications it would be beneficial to be able to dispense with that computation. For that,
Fig. 6. Comparison of the RV (dashed line) and TSRV (solid line) estimators for Intel, computed on a daily basis, from transaction data.
Fig. 7. Comparison of the RV (dashed line) and TSRV (solid line) estimators for Microsoft, computed on a daily basis.
we would need to establish that the TSRV estimator is empirically robust to departures from the optimal K*. So, we now examine and
compare the robustness of the two estimators to the selection of their respective free parameters.
Fig. 8. Comparison of the RV and TSRV estimators for Intel and Microsoft, averaged over the last ten trading days of April 2004. The left panels demonstrate the dependence of RV as a function of the sparse sampling interval, while the right panels study the robustness of TSRV with respect to the choice of the averaging frequency as represented by the number of subgrids K . The estimators were calculated from transaction data.
The left panels in Fig. 8 show RV, computed for Intel and Microsoft as an average of the RV values over the last ten days in April 2004, for different sparse sampling frequencies (the choice parameter for RV). The right panels report the robustness of TSRV to the choice of K: they show that the estimator is numerically very robust to a range of choices of K. In other words, the value of the TSRV estimator is largely unaffected by the choice of K within a reasonable range corresponding, in sampling interval scale, to 2–10 min.
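The dependence of RV on the sparse sampling interval shown in the left panels of Fig. 8 can be mimicked on simulated data: with noise present, RV grows as the sampling interval shrinks. Parameter values are hypothetical.

```python
# Plain RV computed from every m-th observation; noise bias is
# approximately 2 * (n/m) * E[eps^2], so it shrinks as m grows.
import numpy as np

def rv_sparse(Y, m):
    return np.sum(np.diff(Y[::m]) ** 2)

rng = np.random.default_rng(5)
n = 23_400
sigma, s = 0.2 / np.sqrt(n), 1e-3
X = np.cumsum(sigma * rng.standard_normal(n + 1))
Y = X + s * rng.standard_normal(n + 1)
for m in (1, 30, 300):            # every tick, "30 s", "5 min"
    print(m, rv_sparse(Y, m))
```

At the highest frequency the estimate is dominated by the noise term; at the 5 min-type frequency it is close to the integrated variance but much more variable, which is the trade-off the figures illustrate.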
Fig. 9. Robustness of the TSRV estimator for Intel and Microsoft over the choice of the two time scales J (fast) and K (slow), averaged over the last 10 trading days of April 2004.
7.5. Robustness to data cleaning procedures

One aspect that is sometimes briefly mentioned, but rarely emphasized, in empirical papers using high frequency financial data is that the raw data are typically pre-processed to eliminate data errors, outliers, etc. In addition, empirical applications of RV can involve various types of pre-filtering of the data, but we focus here on the impact of the data cleaning procedures that typically take place before any actual RV computation is performed.
Y. Aït-Sahalia et al. / Journal of Econometrics 160 (2011) 160–175
Fig. 10. Comparison of the TSRV and MSRV estimators for Intel and Microsoft, for each of the last ten trading days of April 2004. For each stock and time period, the left bar is TSRV and the right bar is MSRV. The estimators were calculated from transaction data.
Fig. 11. Dependence of the RV and TSRV estimators on the degree of pre-processing of the raw data, for Intel and Microsoft, averaged over the last ten trading days of April 2004. In each panel, the three curves correspond respectively to the raw data (solid line), the data with immediate price bouncebacks of 1% or more eliminated (large dashes), and the data with immediate price bouncebacks of 0.1% or more eliminated (short dashes). In the case of TSRV, the results for the raw data and for the elimination of 1% bouncebacks are virtually indistinguishable.
The specific data-cleaning procedures used to pre-process the raw data can have a large impact on RV estimators. We illustrate this effect by considering different cutoffs to determine which outliers to eliminate before calculating the RV estimator. First, we eliminate the obvious data errors (such as a transaction price reported as zero, transaction times that are out of order, etc.). Second, we seek to eliminate outliers of various sizes. This is where things get trickier. For this purpose, an outlier is a ‘‘bounceback’’: a log-return from one transaction to the next that is greater in magnitude than a given cutoff, and is followed immediately by a log-return of the same magnitude but of the opposite sign, so that the price returns to its level before that particular transaction. Certainly, we do not expect such large ‘‘roundtrips’’ to represent meaningful transactions. The question is how large is large, and so we study the dependence of the RV and TSRV estimators on three cutoff choices that could conceivably be adopted: 0.1% and 1% in log-return terms, and no cutoff (no raw bounceback return exceeds 2% in our sample, so any cutoff larger than this would make no difference). The analysis reported above is all based on the intermediate cutoff of 1%. The left panels in Fig. 11 show the large impact of the cutoff on the RV estimator. As shown in the right panels, where all three curves are close together, TSRV is much less sensitive to the specific
cutoff used. This is due to the structure of TSRV as a difference of two estimators: large returns in the data enter the slow time scale calculation, but are then subtracted out in the fast time scale one. Since the cutoff level is essentially arbitrary, we can view such outliers as a form of market microstructure noise, and the robustness of TSRV to different ways of pre-processing the data is therefore a desirable property.

8. Conclusions

Market microstructure noise contained in high frequency financial data can exhibit serial correlation, as was suggested in Aït-Sahalia et al. (2005) and Hansen and Lunde (2006). We showed that combining two or more time scales for the purpose of estimating integrated volatility works even when the microstructure noise exhibits time series dependence, making the TSRV construction robust to this departure from the basic assumptions in Zhang et al. (2005). With most types of data error, one might expect technological progress to make the issue less salient over time. In this instance, however, the measurement errors we face in ultra high frequency data are compounded by the institutional evolution of the equity markets. While changes such as the transition to decimalization contribute to reducing the amount of noise
in the data, by reducing the rounding errors, the emergence of competing electronic networks means that multiple transactions can be executed (and ultimately reported in our database) on different exchanges at the same time, thereby increasing the potential for slight time reporting mismatches and other forms of data error. Indeed, the data generated by the individual market venues find their way to the public in various ways. The principal Electronic Communication Networks (ECNs), such as INET and its precursors and Archipelago, have high speed dissemination directly to their subscribers. These dissemination systems run on a telecommunications protocol known as ‘‘frame relay,’’ which is quite fast. Most other market data, however, reaches the trading public (and ultimately us econometricians) either through Nasdaq’s dissemination or CTS/CQS. The Consolidated Tape Association administers the CTS (the consolidated trade system) and CQS (the consolidated quote system). Virtually all US trades are reported to CTS, but the path may be indirect. Island (not INET) may report a trade to the National (formerly Cincinnati) Stock Exchange, which will then report it to CTS, which then broadcasts it to us. The general problem is that trading activity is fast relative to the CTS speed of collection and dissemination. Furthermore, Nasdaq has had long-standing issues with late and delayed trade reports. In principle, a Nasdaq member has up to 30 (in the past, 90) seconds to report a trade and anecdotal evidence suggests that some dealers were/are using this leeway to its greatest extent. Since this practice was not uniform across dealers and across time, the sequencing can be suspect. Also, the sequencing across exchanges may be unreliable over very short time intervals: a trade on one exchange followed (and timestamped to the same second as) a trade in the same stock on a different exchange, may not in fact have occurred in that order. 
While the consolidated tape feed (which we see on TAQ) is probably the best source of data available, we may not be seeing the trades in the order in which they occurred, and the emergence and further development of alternative networks on which to trade the same stocks makes the issue of market microstructure noise in the data an increasing, not a decreasing, one. ECNs represent over 30% of Nasdaq trading volume and are increasing their market share in NYSE-listed issues as well (see, e.g., Barclay et al. (2003)). Clearly, the decentralization of trading, combined with the increased frequency of trading, creates challenges for data collection which ultimately affect the estimation of a quantity as basic as the daily integrated volatility of the price. There are therefore reasons to believe that the issue of controlling for market microstructure noise in high frequency financial econometrics will be with us for some time.
Appendix A. Proof of Lemma 1

First, $\sum_{i=0}^{n-J}(X_{t_{i+J}}-X_{t_i})(\epsilon_{t_{i+J}}-\epsilon_{t_i})=\sum_{i=0}^{n}(-c_{i+J}+c_i)\epsilon_{t_i}$, where $c_i=X_{t_i}-X_{t_{i-J}}$ for $J\le i\le n$, and $c_i=0$ otherwise. Second,

$$E\Big[\Big(\sum_{i=0}^{n}(-c_{i+J}+c_i)\epsilon_{t_i}\Big)^2\,\Big|\,X\Big]=\operatorname{Var}\Big(\sum_{i=0}^{n}(-c_{i+J}+c_i)\epsilon_{t_i}\,\Big|\,X\Big)$$
$$\le E\epsilon^2\Big[\sum_i(-c_{i+J}+c_i)^2+2\sum_{l\ge1}\rho(l)\Big|\sum_i(-c_{i+J}+c_i)(-c_{i+J+l}+c_{i+l})\Big|\Big]$$
$$\le E\epsilon^2\sum_i(-c_{i+J}+c_i)^2\Big(1+2\sum_{l\ge1}\rho(l)\Big)\le 4J[X,X]_T\,E\epsilon^2\big(1+2\rho/(1-\rho)\big),$$

where the last two transitions follow from the Cauchy–Schwarz inequality. The lemma then follows from the Markov inequality. This finishes the proof.

Appendix B. Proof of Proposition 1

Write $a_i=c_i^{(K)}/K-(\bar n_K/(\bar n_J J))\,c_i^{(J)}$ for the coefficient of $\epsilon_{t_i}^2$. Then

$$\operatorname{Var}\Big(\frac{1}{K}\sum_{i=0}^{n}c_i^{(K)}\epsilon_{t_i}^2-\frac{\bar n_K}{\bar n_J}\frac{1}{J}\sum_{i=0}^{n}c_i^{(J)}\epsilon_{t_i}^2\Big)
\le\sum_{i=0}^{n}a_i^2\operatorname{Var}(\epsilon^2)+2\sum_{l\ge1}\sum_{i=0}^{n-l}|a_i a_{i+l}|\,\big|\operatorname{Cov}(\epsilon_{t_i}^2,\epsilon_{t_{i+l}}^2)\big|$$
$$\le\sum_{i=0}^{n}a_i^2\Big(\operatorname{Var}(\epsilon^2)+2\sum_{l\ge1}\big|\operatorname{Cov}(\epsilon_{t_0}^2,\epsilon_{t_l}^2)\big|\Big)\qquad(B.1)$$

where the second-to-last transition is due to the Cauchy–Schwarz inequality, and the final one follows from our moment and mixing assumptions, again in view of Theorem A.6 (p. 278) of Hall and Heyde (1980). Under (4.5), a tedious calculation shows that the r.h.s. of (B.1) is no larger than O(J/n). (In fact, this is the exact order under condition (4.12).) Thus

$$\text{noise term}=-\frac{2}{K}\sum_{i=0}^{n-K}\epsilon_{t_i}\epsilon_{t_{i+K}}+\frac{2\bar n_K}{\bar n_J J}\sum_{i=0}^{n-J}\epsilon_{t_i}\epsilon_{t_{i+J}}+O_p\big((J/n)^{1/2}\big),$$

whence, finally,

$$\frac{K}{n^{1/2}}\big(\text{noise term}-E[\text{noise term}]\big)=-\frac{2}{\sqrt n}\sum_{i=0}^{n-K}\epsilon_{t_i}\epsilon_{t_{i+K}}+\frac{2}{\sqrt n}\sum_{i=0}^{n-J}\epsilon_{t_i}\epsilon_{t_{i+J}}+o_p(1)\longrightarrow\xi Z_{\text{noise}}\qquad(B.2)$$

as $n\to\infty$, where $Z_{\text{noise}}$ is standard normal, by the same methods as in Chapter 5 of Hall and Heyde (1980) (we here have a triangular array of sums, but the arguments go through nonetheless). By uniform integrability, the asymptotic variance is of the form $\xi^2=4\alpha_0'+8\sum_{i=1}^{\infty}\alpha_i'$, where $\alpha_i'$ is the limit as $K\to\infty$ (and $J\to\infty$ if such is the case) of $\operatorname{Cov}(\epsilon_{t_0}(\epsilon_{t_K}-\epsilon_{t_J}),\epsilon_{t_i}(\epsilon_{t_{i+K}}-\epsilon_{t_{i+J}}))$. By (4.7),

$$\operatorname{Cov}\big(\epsilon_{t_0}(\epsilon_{t_K}-\epsilon_{t_J}),\epsilon_{t_i}(\epsilon_{t_{i+K}}-\epsilon_{t_{i+J}})\big)=\operatorname{Cov}(\epsilon_{t_0},\epsilon_{t_i})\operatorname{Cov}(\epsilon_{t_K}-\epsilon_{t_J},\epsilon_{t_{i+K}}-\epsilon_{t_{i+J}})$$
$$\quad+\operatorname{Cov}(\epsilon_{t_0},\epsilon_{t_{i+K}}-\epsilon_{t_{i+J}})\operatorname{Cov}(\epsilon_{t_K}-\epsilon_{t_J},\epsilon_{t_i})+\operatorname{Cum}(\epsilon_{t_0},\epsilon_{t_i},\epsilon_{t_K}-\epsilon_{t_J},\epsilon_{t_{i+K}}-\epsilon_{t_{i+J}}).$$

For a fixed J, this means that

$$\alpha_i'=2\operatorname{Cov}(\epsilon_{t_0},\epsilon_{t_i})^2+\operatorname{Cov}(\epsilon_{t_0},\epsilon_{t_{i+J}})\operatorname{Cov}(\epsilon_{t_i},\epsilon_{t_J})+\operatorname{Cum}(\epsilon_{t_0},\epsilon_{t_i},\epsilon_{t_J},\epsilon_{t_{i+J}}).\qquad(A.1)$$

The limit for $J\to\infty$ is $2\operatorname{Cov}(\epsilon_{t_0},\epsilon_{t_i})^2$. This shows the result.
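The cancellation established above can be illustrated numerically. The sketch below (all parameter values hypothetical) adds AR(1) noise, for which the lag-k increments contribute 2(γ(0) − γ(k)) per term to the squared-increment sums; taking both J and K well beyond the mixing time of the noise leaves a residual noise bias proportional to γ(J) − γ(K), which is negligible.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 23400
x = np.cumsum(rng.normal(0.0, 0.01 / np.sqrt(n), n + 1))   # efficient log-price, QV ~ 1e-4

# AR(1) microstructure noise: eps_i = rho*eps_{i-1} + u_i, so gamma(l) = s**2 * rho**l.
rho, s = 0.5, 5e-4
eps = np.zeros(n + 1)
u = rng.normal(0.0, s * np.sqrt(1 - rho ** 2), n + 1)
for i in range(1, n + 1):
    eps[i] = rho * eps[i - 1] + u[i]
y = x + eps                                                 # observed noisy price

def avg_rv(y, k):
    """Average lag-k realized variance."""
    return np.sum((y[k:] - y[:-k]) ** 2) / k

def tsrv(y, K, J):
    """(J, K) two-scale sketch: slow scale minus rescaled fast scale."""
    n = len(y) - 1
    return avg_rv(y, K) - ((n - K + 1) / K) / ((n - J + 1) / J) * avg_rv(y, J)

qv = np.sum(np.diff(x) ** 2)       # realized quadratic variation of the latent price
print(avg_rv(y, 1), tsrv(y, 300, 20), qv)
```

With these values the full-frequency RV is dominated by the serially correlated noise, while the (J, K) = (20, 300) two-scale estimate lands near the true quadratic variation, illustrating the robustness result.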
Appendix C. Proof of Proposition 5

Consider first the behavior of the $[X,\epsilon]^{(K)}$ term. The conditional covariance behaves as follows:

$$\operatorname{Cov}\big([X,\epsilon]^{(J)},[X,\epsilon]^{(K)}\mid X\text{ process}\big)=\frac{1}{JK}\sum_{j=J}^{n}\sum_{k=K}^{n}(X_{t_j}-X_{t_{j-J}})(X_{t_k}-X_{t_{k-K}})\operatorname{Cov}(\epsilon_{t_j}-\epsilon_{t_{j-J}},\epsilon_{t_k}-\epsilon_{t_{k-K}})$$
$$=\frac{1}{JK}\sum_{j=J}^{n}\sum_{k=K}^{n}(X_{t_j}-X_{t_{j-J}})(X_{t_k}-X_{t_{k-K}})\big(\gamma(j-k)+\gamma(j-k-(J-K))-\gamma(j-k+K)-\gamma(j-k-J)\big)$$
$$\approx\frac{1}{JK}\sum_{j=J}^{n}\sum_{k=K}^{n}\big(\langle X,X\rangle_{\min(t_j,t_k)}-\langle X,X\rangle_{\max(t_{j-J},t_{k-K})}\big)^{+}\big(\gamma(j-k)+\gamma(j-k-(J-K))-\gamma(j-k+K)-\gamma(j-k-J)\big),$$

so that

$$\operatorname{Cov}\big([X,\epsilon]^{(J)},[X,\epsilon]^{(K)}\mid X\text{ process}\big)=\frac{1}{JK}\sum_{l=-J}^{K}\big(\gamma(l)+\gamma(l+J-K)-\gamma(l-K)-\gamma(l+J)\big)\sum_{k=K}^{n}\big(\langle X,X\rangle_{\min(t_{k-l},t_k)}-\langle X,X\rangle_{\max(t_{k-l-J},t_{k-K})}\big)^{+}.\qquad(C.1)$$

For the final summation in (C.1), note that this is a telescoping sum, of the form (where a and b depend on J, K, and l, and where one can take $a\le b$)

$$\sum_{k=K}^{n}\big(\langle X,X\rangle_{t_{k-a}}-\langle X,X\rangle_{t_{k-b}}\big)=\sum_{k=K-a}^{n-a}\langle X,X\rangle_{t_k}-\sum_{k=K-b}^{n-b}\langle X,X\rangle_{t_k}=\sum_{k=n-b+1}^{n-a}\langle X,X\rangle_{t_k}-\sum_{k=K-b}^{K-a-1}\langle X,X\rangle_{t_k}\approx(b-a)\langle X,X\rangle_T\qquad(C.2)$$

since $\langle X,X\rangle_0=0$. Specifically, it is easy to see that $a=\max(0,l)$ while $b=\min(l+J,K)$. Thus

$$\operatorname{Cov}\big([X,\epsilon]^{(J)},[X,\epsilon]^{(K)}\mid X\text{ process}\big)\approx\langle X,X\rangle_T\,\frac{1}{JK}\sum_{l=-J}^{K}\big(\gamma(l)+\gamma(l+J-K)-\gamma(l-K)-\gamma(l+J)\big)\big(\min(l+J,K)-\max(0,l)\big).\qquad(C.3)$$

At the same time,

$$\operatorname{Cov}(U_{n,J},U_{n,K})=\frac{4}{JK}\operatorname{Cov}\Big(\sum_{j=J}^{n}\epsilon_{t_j}\epsilon_{t_{j-J}},\sum_{k=K}^{n}\epsilon_{t_k}\epsilon_{t_{k-K}}\Big)=\frac{4}{JK}\sum_{j=J}^{n}\sum_{k=K}^{n}\operatorname{Cov}(\epsilon_{t_j}\epsilon_{t_{j-J}},\epsilon_{t_k}\epsilon_{t_{k-K}})$$
$$=\frac{4}{JK}\sum_{j=J}^{n}\sum_{k=K}^{n}\operatorname{Cov}(\epsilon_{t_0}\epsilon_{t_{-J}},\epsilon_{t_{k-j}}\epsilon_{t_{k-j-K}})=\frac{4}{JK}\sum_{l=-(n-K)}^{n-J}\big(\min(n,n+l)-\max(J+l,K)+1\big)^{+}\operatorname{Cov}(\epsilon_{t_0}\epsilon_{t_{-J}},\epsilon_{t_l}\epsilon_{t_{l-K}}),\qquad(C.4)$$

where

$$\operatorname{Cov}(\epsilon_{t_0}\epsilon_{t_{-J}},\epsilon_{t_l}\epsilon_{t_{l-K}})=\gamma(l)\gamma(l-(J-K))+\gamma(l-K)\gamma(l+J)+\operatorname{Cum}(\epsilon_{t_0},\epsilon_{t_{-J}},\epsilon_{t_l},\epsilon_{t_{l-K}}).$$

Finally,

$$\operatorname{Cov}(E_{n,J},E_{n,K})\approx\frac{2}{JK}\operatorname{Cov}\Big(\sum_{j=0}^{J-1}\epsilon_{t_j}^2,\sum_{k=0}^{K-1}\epsilon_{t_k}^2\Big)=\frac{2}{JK}\sum_{l=-(J-1)}^{K-1}\big(\min(J+l,K)-\max(0,l)+1\big)^{+}\operatorname{Cov}(\epsilon_{t_0}^2,\epsilon_{t_l}^2).\qquad(C.5)$$
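The telescoping identity used in (C.2) is easy to verify mechanically; the following sketch (with arbitrary illustrative values of n, K, a, b satisfying a ≤ b ≤ K) checks that the sum of differences collapses to the two boundary sums:

```python
import numpy as np

rng = np.random.default_rng(2)
n, K, a, b = 200, 30, 5, 18                              # illustrative values, a <= b <= K
# A[k] stands in for <X,X>_{t_k}: nondecreasing, with A[0] = 0.
A = np.concatenate(([0.0], np.cumsum(rng.random(n))))

lhs = sum(A[k - a] - A[k - b] for k in range(K, n + 1))
# Boundary form: sum over k = n-b+1..n-a  minus  sum over k = K-b..K-a-1.
rhs = (sum(A[k] for k in range(n - b + 1, n - a + 1))
       - sum(A[k] for k in range(K - b, K - a)))
print(lhs, rhs)
```

The two printed numbers agree exactly; and since the first boundary sum has b − a terms, each near the terminal value, the total is approximately (b − a) times the terminal value, as claimed in (C.2).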
As before, $\operatorname{Cov}(\epsilon_{t_0}^2,\epsilon_{t_l}^2)=2\gamma(l)^2+\operatorname{Cum}(\epsilon_{t_0},\epsilon_{t_0},\epsilon_{t_l},\epsilon_{t_l})$. Following Eq. (31) (p. 1030) in Zhang (2006), and (C.3)–(C.5) above, we obtain the combined expression given in Eq. (6.6). Note that in the special case of model (3.3), with Gaussian U and V, the fourth cumulant above is zero, and $\gamma(l)$ is given by (6.7). The arguments involving stable convergence are as in Zhang (2006).

References
Aït-Sahalia, Y., Mykland, P.A., 2003. The effects of random and discrete sampling when estimating continuous-time diffusions. Econometrica 71, 483–549.
Aït-Sahalia, Y., Mykland, P.A., Zhang, L., 2005. How often to sample a continuous-time process in the presence of market microstructure noise. Review of Financial Studies 18, 351–416.
Aït-Sahalia, Y., Mykland, P.A., Zhang, L., 2006. Comments on ‘‘Realized variance and market microstructure noise’’. Journal of Business and Economic Statistics 24, 162–167.
Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P., 2001. The distribution of exchange rate realized volatility. Journal of the American Statistical Association 96, 42–55.
Bandi, F.M., Russell, J.R., 2008. Microstructure noise, realized volatility and optimal sampling. Review of Economic Studies 75, 339–369.
Barclay, M.J., Hendershott, T., McCormick, D.T., 2003. Competition among trading venues: information and trading on electronic communications networks. Journal of Finance 58, 2637–2665.
Barndorff-Nielsen, O.E., Hansen, P.R., Lunde, A., Shephard, N., 2008. Designing realized kernels to measure ex-post variation of equity prices in the presence of noise. Econometrica 76, 1481–1536.
Barndorff-Nielsen, O.E., Shephard, N., 2002. Econometric analysis of realized volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society, Series B 64, 253–280.
Black, F., 1986. Noise. Journal of Finance 41, 529–543.
Choi, J.Y., Salandro, D., Shastri, K., 1988. On the estimation of bid-ask spreads: theory and evidence. The Journal of Financial and Quantitative Analysis 23, 219–230.
Curci, G., Corsi, F., 2005. A discrete sine transform approach for realized volatility measurement. Tech. Rep., University of Lugano.
Delattre, S., Jacod, J., 1997. A central limit theorem for normalized functions of the increments of a diffusion process, in the presence of round-off errors. Bernoulli 3, 1–28.
French, K., Roll, R., 1986. Stock return variances: the arrival of information and the reaction of traders. Journal of Financial Economics 17, 5–26.
Gençay, R., Ballocchi, G., Dacorogna, M., Olsen, R., Pictet, O., 2002. Real-time trading models and the statistical properties of foreign exchange rates. International Economic Review 43, 463–491.
Gloter, A., Jacod, J., 2000. Diffusions with measurement errors: I—local asymptotic normality and II—optimal estimators. Tech. Rep., Université de Paris-6.
Gonçalves, S., Meddahi, N., 2009. Bootstrapping realized volatility. Econometrica 77, 283–306.
Griffin, J., Oomen, R., 2008. Sampling returns for realized variance calculations: tick time or transaction time? Econometric Reviews 27, 230–253.
Hall, P., Heyde, C.C., 1980. Martingale Limit Theory and its Application. Academic Press, Boston.
Hansen, P.R., Lunde, A., 2006. Realized variance and market microstructure noise. Journal of Business and Economic Statistics 24, 127–161.
Harris, L., 1990. Statistical properties of the Roll serial covariance bid/ask spread estimator. Journal of Finance 45, 579–590.
Jacod, J., 1994. Limit of random measures associated with the increments of a Brownian semimartingale. Tech. Rep., Université de Paris-6.
Jacod, J., 1996. La variation quadratique du Brownien en présence d'erreurs d'arrondi. Astérisque 236, 155–162.
Jacod, J., Li, Y., Mykland, P.A., Podolskij, M., Vetter, M., 2009. Microstructure noise in the continuous case: the pre-averaging approach. Stochastic Processes and their Applications 119, 2249–2276.
Jacod, J., Protter, P., 1998. Asymptotic error distributions for the Euler method for stochastic differential equations. Annals of Probability 26, 267–307.
Li, Y., Mykland, P.A., 2007. Are volatility estimators robust with respect to modeling assumptions? Bernoulli 13, 601–622.
Mykland, P.A., Zhang, L., 2006. ANOVA for diffusions and Itô processes. Annals of Statistics 34, 1931–1963.
Oomen, R.C., 2006. Properties of realized variance under alternative sampling schemes. Journal of Business and Economic Statistics 24, 219–237.
Podolskij, M., Vetter, M., 2009. Estimation of volatility functionals in the simultaneous presence of microstructure noise and jumps. Bernoulli 15 (3), 634–658.
Roll, R., 1984. A simple model of the implicit bid-ask spread in an efficient market. Journal of Finance 39, 1127–1139.
Stoll, H., 2000. Friction. Journal of Finance 55, 1479–1514.
Zhang, L., 2006. Efficient estimation of stochastic volatility using noisy observations: a multi-scale approach. Bernoulli 12, 1019–1043.
Zhang, L., 2011. Estimating covariation: Epps effect, microstructure noise. Journal of Econometrics 160 (1), 33–47.
Zhang, L., Mykland, P.A., Aït-Sahalia, Y., 2005. A tale of two time scales: determining integrated volatility with noisy high-frequency data. Journal of the American Statistical Association 100, 1394–1411.
Zhang, L., Mykland, P.A., Aït-Sahalia, Y., 2011. Edgeworth expansions for realized volatility and related estimators. Journal of Econometrics 160 (1), 190–203.
Zhou, B., 1996. High-frequency data and volatility in foreign-exchange rates. Journal of Business and Economic Statistics 14, 45–52.
Zumbach, G., Corsi, F., Trapletti, A., 2002. Efficient estimation of volatility using high frequency data. Tech. Rep., Olsen & Associates.
Journal of Econometrics 160 (2011) 176–189
A reduced form framework for modeling volatility of speculative prices based on realized variation measures✩

Torben G. Andersen a,b,c,∗, Tim Bollerslev d,b,c,1, Xin Huang e,2

a Department of Finance, Kellogg School of Management, Northwestern University, Evanston, IL 60208, United States
b CREATES, Denmark
c NBER, United States
d Department of Economics, Duke University, Box 90097, Durham, NC 27708, United States
e Department of Economics, University of Oklahoma, 729 Elm Avenue, Room 329 Hester Hall, Norman, OK 73019, United States
Article info
Article history: Available online 6 March 2010
JEL classification: C1; G1; C2
Keywords: Stochastic volatility; Realized variation; Bipower variation; Jumps; Hazard rates; Overnight volatility

Abstract
Building on realized variance and bipower variation measures constructed from high-frequency financial prices, we propose a simple reduced form framework for effectively incorporating intraday data into the modeling of daily return volatility. We decompose the total daily return variability into the continuous sample path variance, the variation arising from discontinuous jumps that occur during the trading day, as well as the overnight return variance. Our empirical results, based on long samples of high-frequency equity and bond futures returns, suggest that the dynamic dependencies in the daily continuous sample path variability are well described by an approximate long-memory HAR–GARCH model, while the overnight returns may be modeled by an augmented GARCH type structure. The dynamic dependencies in the non-parametrically identified significant jumps appear to be well described by the combination of an ACH model for the time-varying jump intensities coupled with a relatively simple log-linear structure for the jump sizes. Finally, we discuss how the resulting reduced form model structure for each of the three components may be used in the construction of out-of-sample forecasts for the total return volatility. © 2010 Elsevier B.V. All rights reserved.
1. Introduction A burgeoning literature concerned with modeling and forecasting the dynamic dependencies in financial market volatility has emerged over the past two decades. Until fairly recently, most of the empirical results in the literature were based on the use of
✩ The work of Andersen and Bollerslev was supported by a Grant from the NSF to the NBER and CREATES funded by the Danish National Research Foundation. We thank two anonymous referees, Neil Shephard, Nour Meddahi, George Tauchen, Hao Zhou, Eric Ghysels, Per Mykland, and Barbara Rossi for many insightful comments, as well as seminar participants at the 2007 AEA meeting, Financial Econometrics Session, the 2006 Conference on Realized Volatility in Montreal, Canada, the SAMSI 2005–2006 Program on Financial Mathematics, Statistics and Econometrics, the Duke Financial Econometrics Lunch Group, and economists at the Federal Reserve Board of Governors in Washington DC. We also appreciate the help of Lori Aldinger in Research & Product Development at the CME and the support team at the CBOT for supplying the CME and CBOT calendars of historical exchange trading schedules. ∗ Corresponding author at: Department of Finance, Kellogg School of Management, Northwestern University, Evanston, IL 60208, United States. Tel.: +1 847 467 1285. E-mail addresses:
E-mail addresses: [email protected] (T.G. Andersen), [email protected] (T. Bollerslev), [email protected] (X. Huang).
1 Tel.: +1 919 660 1846. 2 Tel.: +1 405 325 2643.
0304-4076 © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2010.03.029
daily, or coarser frequency, data, coupled with formulations within the GARCH or stochastic volatility class of models; for a recent survey see Andersen et al. (2006). Meanwhile, somewhat of a paradigm shift has started to occur in which high-frequency data are incorporated into longer-run volatility modeling and forecasting problems through the use of simple reduced-form time series models for non-parametric daily realized volatility measures based on the summation of intraday squared returns; see, e.g., Andersen et al. (2003) and the supportive theoretical results in Andersen et al. (2004).3 Further, decomposing the total daily return variability into its continuous and discontinuous components based on the bipower variation measures developed by Barndorff-Nielsen and Shephard (2004a, 2006), the empirical results in Andersen et al. (2007) suggest that most of the predictable variation in the volatility stems from the strong own dynamic dependencies in the continuous price path variability, while the predictability of the (squared) jumps is typically minor. The present paper takes this analysis one step further by developing, estimating and implementing separate reduced-form time series forecasting models
3 Closely related empirical findings have been reported in Anderson and Vahid (2007), Areal and Taylor (2002), Corsi (2004), Deo et al. (2006), Koopman et al. (2005), Martens et al. (2004), Pong et al. (2004) and Thomakos and Wang (2003), among others.
for each of the different components that make up the total daily price variation. Following the analysis in Andersen et al. (2007), we begin by decomposing the total return variability over the trading day into its continuous sample path variation and the variation due to jumps, based on the bipower variation measure developed by Barndorff-Nielsen and Shephard (2004a, 2006). Our empirical results, with a fifteen-year sample of high-frequency intraday S&P 500 and T-Bond futures returns, confirm earlier findings that the dynamic dependencies in the daily continuous sample path variability are well described by an approximate long-memory Heterogeneous AR (HAR) model, as originally proposed by Corsi (2004). Meanwhile, careful analysis of the non-parametrically identified jumps reveals some new and interesting dynamic dependencies vis-à-vis the results reported in the existing literature. In particular, while the time series of statistically significant squared jumps appears to be approximately serially uncorrelated, the times between jumps and the sizes of the jumps are both autocorrelated.4 We successfully model these dependencies by the combination of an Autoregressive Conditional Hazard (ACH) model, as developed by Hamilton and Jordà (2002), for the time-varying jump intensities, coupled with a log-linear model with GARCH errors for the size of the jumps.5 The two separate model structures described above effectively account for the variability over the active part of the trading day when the market is formally open. However, the opening price typically differs from the closing price of the previous day, and the corresponding overnight return often accounts for a non-trivial fraction of the total daily return.
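The HAR structure referenced above regresses next-day realized variance on daily, weekly, and monthly averages of past realized variance. A minimal sketch on simulated data (illustrative only: this is plain HAR–RV estimated by OLS, not the paper's HAR–GARCH specification, and the RV series is a placeholder):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 1000
rv = np.exp(rng.normal(-9.2, 0.5, T))            # placeholder series of daily RVs

def trailing_mean(x, w):
    """Mean of the w most recent observations ending at each date."""
    return np.array([x[t - w + 1:t + 1].mean() for t in range(w - 1, len(x))])

# HAR regressors aligned so each row predicts RV_{t+1} from info through day t.
d = rv[21:-1]                                    # daily:   RV_t
w = trailing_mean(rv, 5)[17:-1]                  # weekly:  mean of RV_{t-4..t}
m = trailing_mean(rv, 22)[:-1]                   # monthly: mean of RV_{t-21..t}
y = rv[22:]                                      # target:  RV_{t+1}
X = np.column_stack([np.ones_like(d), d, w, m])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # OLS: RV_{t+1} = b0 + bd*d + bw*w + bm*m
print(beta)
```

The cascade of three horizons is what gives HAR its approximate long-memory behavior despite being a simple linear regression.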
The most common approach for dealing with this issue when modeling and forecasting realized volatilities is to scale the intraday measures and/or model forecasts by a constant to make them unconditionally unbiased for the total daily variation; see, e.g., Martens (2002), Fleming et al. (2003) and Koopman et al. (2005). An alternative approach based on minimizing the mean square error for the realized variance over the whole day has also been advocated by Hansen and Lunde (2005). Instead, we treat the overnight returns as a time series of regularly occurring jumps. We model these by a discrete-time GARCH model in which the conditional variance explicitly depends on the continuous sample path variation over the immediately preceding active part of the trading day. We also show how the three separate models discussed above may be combined in the construction of recursive forecasts for the total daily and longer horizon return volatility.6 Comparing both in- and out-of-sample daily, weekly and monthly forecasts to those from other discrete-time volatility models, including a standard GARCH(1, 1) model and the HAR–RV model, our results suggest that the more detailed modeling approach developed here can in fact result in important forecast improvements. Our paper is most directly related to Bollerslev et al. (in this issue), who estimate a discrete-time model for the joint dynamics of daily S&P 500 returns, realized variance and bipower variation. However, in contrast to the present paper, the former paper makes no attempt at separately identifying or modeling the dynamics of the jump and the overnight return components. Closely related results have also been reported in independent
4 The occurrences of jumps in the T-Bond market also appear to be related to the releases of macroeconomic news announcements, as documented in, e.g., Andersen et al. (2007) and Johannes (2004). 5 The idea of modeling the jump process in terms of the occurrence and the size of the jumps has a natural precedent in the bin model for tick-by-tick transaction prices proposed by Rydberg and Shephard (2003). 6 The separate model estimates and accompanying forecasts reported here ignore any contemporaneous dependencies among the innovations to the continuous sample path variability and jump equations. Incorporating this into a fully efficient multivariate system estimation is severely complicated by the fact that the time series of significant jumps are effectively censored and the corresponding equations only estimated based on a subset of the sample.
work by Lanne (2006). Our paper is also related to the concurrent work of Tauchen and Zhou (2006), who document time-varying jump intensities based on the same realized variation measures and test statistics used here. Discrete-time GARCH models incorporating Poisson jump processes with time-varying jump intensities based solely on daily data have also previously been estimated by Chan and Maheu (2002) and Maheu and McCurdy (2004), while earlier work by Neely (1999) highlights the potential benefits of removing jumps when forecasting volatility using GARCH type models. At a somewhat broader level, our results also speak to the vast finance literature based on continuous-time methods and corresponding parametric models. In particular, the compound Poisson model of Merton (1976), and the many subsequent studies that rely on time-invariant jump–diffusions, are all at odds with the empirical findings reported here. On the other hand, the more recent studies by Andersen et al. (2002), Chernov et al. (2003) and Eraker et al. (2003), which explicitly allow for time-varying jump intensities, all report difficulties in precisely estimating the process from daily data. Meanwhile, consistent with the empirical results for the high-frequency realized variation measures reported here, Bates (2000), Pan (2002), Carr and Wu (2003) and Eraker (2004) all point to the existence of time-varying jump intensities when incorporating additional information from options data. The rest of the paper is organized as follows. Section 2 sets up the notation and reviews the jump detection statistic used in revealing the latent jump processes. Section 3 reports the initial empirical evidence for the distinct dynamic characteristics of the different components that make up the total daily return variation. Section 4 models the continuous sample path variance, while Sections 5 and 6 develop our models for the discrete jump contribution and the overnight return dynamics, respectively.
Section 7 discusses the construction of forecasts and compares the results to those from other procedures. Section 8 concludes.

2. Jump detection test statistics

2.1. General setup and notation

We assume that the scalar logarithmic asset price within the active part of the trading day follows a standard jump–diffusion process

$$dp(\tau)=\mu(\tau)\,d\tau+\sigma(\tau-)\,dw(\tau)+\kappa(\tau)\,dq(\tau),\qquad(1)$$

where $\tau\in\mathbb{R}^{+}$, and the time scale is normalized so that the unit interval corresponds to a trading day; $\mu(\tau)$ denotes the drift term with a continuous and locally finite variation sample path; $\sigma(\tau)>0$ is the spot volatility process, assumed to be càdlàg; $w(\tau)$ is a standard Brownian motion; and $\kappa(\tau)dq(\tau)$ refers to the pure jump part, where $dq(\tau)=1$ if there is a jump at time $\tau$ and 0 otherwise, the jumps occurring with potentially time-varying intensity $\lambda(\tau)$ and size $\kappa(\tau)$. We denote the corresponding discrete-time within-day geometric returns by

$$r_{t,j}=p(t-1+j/M)-p(t-1+(j-1)/M),\qquad j=1,2,\ldots,M,\qquad(2)$$

where $t\in\mathbb{N}^{+}$, and $M$ refers to the number of (equally spaced) return observations over the trading day. The continuous-time diffusion process above applies only to the active part of the trading day. However, the opening price on one day typically differs from the closing price recorded on the previous day. In fact, as discussed further below, it is natural to think of the overnight returns as random jumps occurring at the deterministic times $t=1,2,\ldots$. As such, the total return for day $t$ equals

$$r_t=r_{t,n}+\sum_{j=1}^{M}r_{t,j}=r_{t,n}+r_{t,d},\qquad(3)$$
where rt ,n denotes the overnight logarithmic price change from day t − 1 to day t, and we follow the convention of measuring the daily returns as close-to-close.
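A discretized simulation of (1)–(3), with constant drift and volatility and entirely hypothetical parameter values, makes the return decomposition concrete:

```python
import numpy as np

rng = np.random.default_rng(4)
M = 80                                   # five-minute returns over the trading day
mu, sigma = 0.0, 0.01                    # drift and (constant) spot vol, daily units
lam, jump_sd = 2.0, 0.005                # jump intensity per day and jump-size std

dt = 1.0 / M
dw = rng.normal(0.0, np.sqrt(dt), M)                 # Brownian increments
dq = rng.random(M) < lam * dt                        # Bernoulli approximation of dq
kappa = rng.normal(0.0, jump_sd, M) * dq             # jump sizes kappa(tau) where dq = 1

r_intraday = mu * dt + sigma * dw + kappa            # within-day returns r_{t,j}, Eq. (2)
r_overnight = rng.normal(0.0, 0.003)                 # overnight return r_{t,n} (placeholder)
r_total = r_overnight + r_intraday.sum()             # Eq. (3): r_t = r_{t,n} + r_{t,d}
print(r_total, int(dq.sum()))
```

Simulating each piece separately mirrors the paper's strategy of building separate models for the continuous variation, the within-day jumps, and the overnight return.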
2.2. Realized variation measures
The volatility over the active part of trading day $t$ is measured by the quadratic variation

$$QV_t=\int_{t-1}^{t}\sigma^2(s)\,ds+\sum_{j=1}^{N_t}\kappa_{t,j}^2.\qquad(4)$$

The first, integrated variance, term represents the contribution from the continuous price path, while $N_t$ gives the number of jumps over day $t$, and $\sum_{j=1}^{N_t}\kappa_{t,j}^2$ accounts for the corresponding contribution to the variance from the within-day jumps. The quadratic variation process and its separate components are, of course, not directly observable. Instead, we resort to recently popularized model-free non-parametric consistent measures, including the now familiar realized variance

$$RV_t(M)=\sum_{j=1}^{M}r_{t,j}^2.\qquad(5)$$
As noted in Andersen and Bollerslev (1998), Comte and Renault (1998), Andersen et al. (2001b, 2003) and Barndorff-Nielsen and Shephard (2001, 2002), among others, RV_t(M) converges uniformly in probability to QV_t as the sampling frequency goes to infinity,

RV_t(M) \overset{P}{\longrightarrow} \int_{t-1}^{t} \sigma^2(s)\,ds + \sum_{j=1}^{N_t} \kappa_{t,j}^2,  as M \to \infty,   (6)
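As a concrete illustration of Eqs. (5)–(6) (our own sketch, not from the paper; the volatility level and day length are invented), the realized variance of one simulated diffusion day can be computed as follows:

```python
import math
import random

def realized_variance(returns):
    """Eq. (5): the sum of squared intraday returns."""
    return sum(r * r for r in returns)

random.seed(42)
M = 80                    # five-minute returns per day, as for the SP contract
sigma = 0.01              # constant spot volatility (a simplifying assumption)

# One trading day of a pure diffusion: each return is N(0, sigma^2 / M),
# so the integrated variance over the day is sigma^2.
returns = [random.gauss(0.0, sigma / math.sqrt(M)) for _ in range(M)]
rv = realized_variance(returns)

# With no jumps, RV(M) estimates the integrated variance sigma^2.
ratio = rv / sigma ** 2
```

For finite M the ratio fluctuates around one, with a standard deviation of roughly \sqrt{2/M}; increasing M tightens the estimate, which is the convergence in Eq. (6).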
or equivalently, as the length of the return interval goes to zero. Meanwhile, a host of practical market microstructure complications prevents us from sampling too frequently while maintaining the fundamental semimartingale assumption underlying Eq. (1). Ways in which to best deal with these complications and the practical choice of M have been the subject of intensive recent research efforts; see, e.g., Aït-Sahalia et al. (2005), Bandi and Russell (2008), Barndorff-Nielsen et al. (2008), Hansen and Lunde (2006) and Zhang (2006). In the analysis reported on below, we simply follow most of the existing empirical literature in the use of a fixed five-minute sampling frequency, corresponding to M equal to 80 and 79 for each of the two markets that we study. In order to separately measure the jump part, we rely on the realized bipower variation measure developed by Barndorff-Nielsen and Shephard (2004a, 2006),

RBV_{1,t} = \mu_1^{-2} \frac{M}{M-2} \sum_{j=3}^{M} |r_{t,j-2}|\,|r_{t,j}|,   (7)
where \mu_a = E(|Z|^a) for Z ∼ N(0, 1). The bipower variation measure defined above involves an additional stagger relative to the measure originally considered in Barndorff-Nielsen and Shephard (2004a), which helps render it robust to certain types of market microstructure noise; see Huang and Tauchen (2005) for some initial analytical investigations and simulation-based evidence along these lines. Barndorff-Nielsen et al. (2006a,b) and Jacod (2008) show that RBV_{1,t}(M) converges in probability to the integrated variance, under the general assumption that the logarithmic price process is a Brownian semimartingale with a finite-activity jump process, or an infinite-activity α-stable jump process with Blumenthal–Getoor index α < 2. Consequently, the difference between the realized variance and the realized bipower variation consistently estimates the part of the quadratic variation due to jumps,

RV_t(M) − RBV_{1,t}(M) \overset{P}{\longrightarrow} \sum_{j=1}^{N_t} \kappa_{t,j}^2,  as M \to \infty.   (8)
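A sketch of the staggered bipower measure in Eq. (7) and the jump estimate in Eq. (8), under simulation settings of our own choosing (the jump size and volatility level are invented):

```python
import math
import random

MU1 = math.sqrt(2.0 / math.pi)   # mu_1 = E|Z| for Z ~ N(0, 1)

def staggered_bipower(returns):
    """Eq. (7): skip-one bipower variation, robust to a finite number of jumps."""
    m = len(returns)
    s = sum(abs(returns[j - 2]) * abs(returns[j]) for j in range(2, m))
    return MU1 ** -2 * (m / (m - 2)) * s

random.seed(7)
M, sigma = 80, 0.01
returns = [random.gauss(0.0, sigma / math.sqrt(M)) for _ in range(M)]
returns[40] += 0.05              # inject a single jump of size kappa = 0.05

rv = sum(r * r for r in returns)
rbv = staggered_bipower(returns)
jump_part = rv - rbv             # Eq. (8): estimates the squared jump, kappa^2
```

Because the jump enters the bipower sum only through two cross-products with ordinary-sized returns, rbv stays well below the jump contribution, while RV absorbs it in full; rv − rbv is therefore close to 0.05² = 0.0025.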
Moreover, under similar regularity conditions, except that α < 1, the test statistic

Z_t = \frac{(RV_t − RBV_{1,t})/RV_t}{\sqrt{\left(\frac{\pi^2}{4} + \pi − 5\right) \frac{1}{M} \max\left(1, \frac{RTQ_t}{RBV_{1,t}^2}\right)}},   (9)

where

RTQ_{1,t} = M \mu_{4/3}^{-3} \frac{M}{M-4} \sum_{j=5}^{M} |r_{t,j-4}|^{4/3}\,|r_{t,j-2}|^{4/3}\,|r_{t,j}|^{4/3},   (10)

is asymptotically standard normally distributed under the null hypothesis of no within-day jumps.7 Based on the above jump detection test statistic, the realized measure of the jump contribution to the quadratic variation of the logarithmic price process is then measured by

J_t(M) = I(Z_t > \Phi_\alpha) \cdot (RV_t(M) − RBV_{1,t}(M)),   (11)
where I(·) denotes the indicator function and \Phi_\alpha refers to the appropriate critical value from the standard normal distribution. Accordingly, our realized measure for the integrated variance is defined by

C_t(M) = I(Z_t \le \Phi_\alpha) \cdot RV_t(M) + I(Z_t > \Phi_\alpha) \cdot RBV_{1,t}(M).   (12)

This definition automatically ensures that the non-parametric measures for the jump and continuous components add up to RV_t(M). The same decomposition of the within-day variance has previously been explored by Andersen et al. (2007), among others. Of course, the actual implementation requires a choice of α. In the results reported on below, we use a critical value of α = 0.99, but very similar results (available upon request) were obtained for other values of α.8 For notational simplicity, we will refer to these empirical measures as RV_t, C_t and J_t in the sequel.

3. Data and summary statistics

3.1. Data

Our data consist of five-minute prices for the S&P 500 futures (SP) and 30-year US Treasury bond futures (US) contracts. The raw transaction prices for both contracts were obtained from PriceData. The sample period for both assets begins on January 2, 1990, and ends on February 4, 2005. The intraday five-minute prices for the SP contracts span the time interval from 9:35 to 16:15 (EST), resulting in M = 80 non-overlapping return observations per day. The five-minute prices for the US contracts cover the period from 8:25 to 15:00 (EST), for a total of M = 79 intraday returns. Our use of a five-minute sampling frequency parallels many previous studies in the literature and, as discussed further in Andersen et al. (2007), for the two contracts analyzed here strikes a reasonable balance between the desire for as finely sampled observations as possible on the one hand, and robustness to contaminating market microstructure influences on the other.9
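Returning to the measures of Section 2.2, Eqs. (7)–(12) combine into a daily split of RV into continuous and jump parts. The following is our own illustrative sketch: the constant \mu_{4/3} = E|Z|^{4/3} is evaluated in closed form, the 0.99 critical value 2.326 follows the text, and the simulated day and jump size are invented.

```python
import math
import random

MU1 = math.sqrt(2.0 / math.pi)                                     # E|Z|
MU43 = 2 ** (2.0 / 3.0) * math.gamma(7.0 / 6.0) / math.gamma(0.5)  # E|Z|^(4/3)

def jump_test(returns, crit=2.326):
    """Split RV into continuous (C) and jump (J) parts via Eqs. (9)-(12)."""
    m = len(returns)
    rv = sum(r * r for r in returns)
    rbv = MU1 ** -2 * (m / (m - 2)) * sum(
        abs(returns[j - 2]) * abs(returns[j]) for j in range(2, m))
    rtq = m * MU43 ** -3 * (m / (m - 4)) * sum(
        abs(returns[j - 4]) ** (4 / 3) * abs(returns[j - 2]) ** (4 / 3)
        * abs(returns[j]) ** (4 / 3) for j in range(4, m))
    z = ((rv - rbv) / rv) / math.sqrt(
        (math.pi ** 2 / 4 + math.pi - 5) / m * max(1.0, rtq / rbv ** 2))
    j_part = rv - rbv if z > crit else 0.0    # Eq. (11)
    c_part = rv - j_part                      # Eq. (12): C and J add up to RV
    return z, c_part, j_part

random.seed(3)
M, sigma = 80, 0.01
returns = [random.gauss(0.0, sigma / math.sqrt(M)) for _ in range(M)]
returns[40] += 0.1                            # one large injected jump
z, c_part, j_part = jump_test(returns)
```

By construction c_part + j_part equals RV exactly, mirroring the adding-up property noted after Eq. (12).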
7 Huang and Tauchen (2005) report extensive simulation evidence showing that this particular jump detection test statistic exhibits excellent size and power properties for a one-factor logarithmic stochastic volatility model augmented with compound Poisson jumps. Also, while the original proofs for asymptotic normality in the above cited papers relied on α < 1, Zhang (2007) has recently extended the results to allow for 1 < α < 3/2.
8 In the actual implementation we also imposed a hard lower bound of 0.001 on the daily C_t(M).
9 For further details concerning the previous-tick method used in the construction of the five-minute returns and the specific contract rollover scheme, see Wasserfallen and Zimmermann (1985), Dacorogna et al. (2001), Andersen et al. (2007) and Fleming et al. (2003), respectively. For SP around 98% of the prices occur within one minute of each five-minute mark, while for US the proportion is around 86%.
[Four-panel time-series plot, 1990–2004: the daily returns, C_t, J_t, and r_{t,n}^2 for SP.]

Fig. 1. Daily returns and variation components for SP.

Table 1
Descriptive statistics for SP.
            C_t      J_t      I_t      S_t      r_{t,n}^2
Mean        0.856    0.042    0.086    0.491    0.261
Std. dev.   1.098    0.610    0.281    2.025    0.865
Skewness    5.274    34.77    2.947    10.37    16.44
Kurtosis    47.20    1390     9.683    123.8    483.3
Min         0.004    0.000    0.000    0.006    0.000
Max         14.33    27.59    1.000    27.59    31.38
Obs.        3801     3801     3801     328      3800

Ljung–Box Q-statistics (p-values in parentheses)
Lags   C_t            J_t            I_t            S_t            r_{t,n}^2
5      6109 (0.000)   2.773 (0.735)  7.361 (0.195)  3.606 (0.607)  577.3 (0.000)
10     10039 (0.000)  24.69 (0.006)  15.50 (0.115)  6.190 (0.799)  727.3 (0.000)
15     12629 (0.000)  24.79 (0.053)  32.95 (0.005)  6.884 (0.961)  861.2 (0.000)
20     15050 (0.000)  26.75 (0.143)  37.47 (0.010)  7.077 (0.996)  962.9 (0.000)

3.2. Summary statistics

To get an idea about the properties of the different components that make up the total daily return variance for each of the two markets, we plot in Figs. 1 and 2 the daily return r_t, our measure for the continuous sample path variation C_t, the sum of the within-day squared jumps J_t, and the overnight squared returns r_{t,n}^2. The figures clearly indicate rather distinct dynamic dependencies in each of the different components, with the jump time series appearing noticeably more erratic and less predictable than the other series.

To better understand these dependencies, we further decompose the J_t series into two separate processes: one describing the occurrence of jumps, and the other the size of the squared jump(s) within the day when at least one jump occurs. We denote these two processes by I_t and S_t, respectively. More precisely, Pr(J_t = 0 | F_{t-1}) = Pr(I_t = 0 | F_{t-1}), while Pr(0 < J_t \le j | F_{t-1}) = Pr(I_t = 1 | F_{t-1}) \cdot Pr(S_t \le j | F_{t-1}, I_t = 1). The resulting summary statistics reported in Tables 1 and 2 do indeed reveal some significant dynamic dependencies in the I_t and S_t series that are largely masked in the corresponding J_t series. Of course, the Ljung–Box Q-statistics reported in the tables only capture own linear dependencies.
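The I_t/S_t decomposition described above amounts to the following toy sketch (the J_t values are made up for illustration):

```python
# Split a daily jump-variation series J_t into an occurrence indicator I_t
# and the squared-jump sizes S_t, the latter defined only on jump days.
J = [0.0, 0.0, 0.3, 0.0, 1.2, 0.0]
I = [1 if j > 0 else 0 for j in J]      # jump occurrence per day
S = [j for j in J if j > 0]             # sizes on jump days only
# I -> [0, 0, 1, 0, 1, 0]; S -> [0.3, 1.2]
```

This is why S_t has far fewer observations than J_t in Tables 1 and 2: it is only observed on the days where I_t = 1.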
[Four-panel time-series plot, 1990–2004: the daily returns, C_t, J_t, and r_{t,n}^2 for US.]

Fig. 2. Daily returns and variation components for US.

Table 2
Descriptive statistics for US.
            C_t      J_t      I_t      S_t      r_{t,n}^2
Mean        0.253    0.037    0.255    0.143    0.066
Std. dev.   0.205    0.158    0.436    0.286    0.160
Skewness    3.312    13.72    1.123    7.798    13.46
Kurtosis    23.99    281.5    2.261    88.71    340.9
Min         0.001    0.000    0.000    0.009    0.000
Max         2.742    4.519    1.000    4.519    5.271
Obs.        3781     3781     3781     965      3780

Ljung–Box Q-statistics (p-values in parentheses)
Lags   C_t           J_t            I_t            S_t            r_{t,n}^2
5      1492 (0.000)  5.511 (0.357)  38.11 (0.000)  4.648 (0.460)  72.6 (0.000)
10     2444 (0.000)  7.946 (0.634)  58.27 (0.000)  38.15 (0.000)  128.8 (0.000)
15     3192 (0.000)  10.42 (0.793)  93.10 (0.000)  44.12 (0.000)  152.2 (0.000)
20     3939 (0.000)  158.2 (0.000)  142.5 (0.000)  63.82 (0.000)  182.8 (0.000)

As discussed further in Section 5, there are also strong non-linear dynamic dependencies embedded in the series for both markets. The reduced form models for each of the different components discussed next are explicitly designed to account for these features.

4. Continuous sample path variation

We start by detailing our model for the strongly serially correlated continuous sample path variation process, C_t. The HAR–RV model first proposed by Corsi (2004), and further developed by Andersen et al. (2007), provides a particularly convenient framework for modeling these dependencies.10 The specific HAR–C model adopted here takes the form,
10 Following Müller et al. (1997), the HAR type specification is sometimes given a structural interpretation as arising from the interaction of agents with different investment horizons. We merely view the HAR–C model as providing a convenient, or ‘‘poor-man’s’’, approximation to long-memory.
Table 3
HAR–C model estimates. Standard errors in parentheses, p-values in square brackets.

        Homoskedastic                                GARCH(2, 1)
        SP                     US                    SP                     US
β0      −0.085 (0.011)[0.000]  −0.166 (0.052)[0.001] −0.097 (0.010)[0.000]  −0.249 (0.038)[0.000]
βCD     0.349 (0.024)[0.000]   0.144 (0.043)[0.001]  0.331 (0.020)[0.000]   0.109 (0.020)[0.000]
βCW     0.338 (0.034)[0.000]   0.315 (0.062)[0.000]  0.369 (0.030)[0.000]   0.327 (0.036)[0.000]
βCM     0.262 (0.029)[0.000]   0.501 (0.061)[0.000]  0.252 (0.025)[0.000]   0.444 (0.039)[0.000]
βJD     −0.109 (0.071)[0.126]  −0.342 (0.135)[0.012] −0.087 (0.078)[0.267]  −0.278 (0.116)[0.017]
βJW     −0.118 (0.077)[0.124]  0.099 (0.224)[0.657]  −0.081 (0.100)[0.418]  0.281 (0.221)[0.205]
βJM     −0.122 (0.088)[0.165]  −0.820 (0.402)[0.041] −0.175 (0.099)[0.079]  −1.460 (0.329)[0.000]
ω       –                      –                     0.049 (0.015)[0.001]   0.015 (0.003)[0.000]
α1      –                      –                     0.111 (0.027)[0.000]   0.129 (0.026)[0.000]
α2      –                      –                     −0.024 (0.032)[0.457]  −0.090 (0.026)[0.000]
β1      –                      –                     0.720 (0.076)[0.000]   0.921 (0.015)[0.000]
ν       –                      –                     7.696 (0.740)[0.000]   7.370 (0.822)[0.000]
Log L   −2805.434              −3890.313             −2671.580              −3341.914
Obs.    3779                   3759                  3779                   3759
log(C_{t+1}) = β_0 + β_{CD} log(C_t) + β_{CW} log(C_{t-5,t}) + β_{CM} log(C_{t-22,t}) + β_{JD} log(J_t + 1) + β_{JW} log(J_{t-5,t} + 1) + β_{JM} log(J_{t-22,t} + 1) + ϵ_{t+1,C},   (13)

where C_{t-h,t} ≡ h^{-1}[C_{t-h+1} + C_{t-h+2} + · · · + C_t] and J_{t-h,t} ≡ h^{-1}[J_{t-h+1} + J_{t-h+2} + · · · + J_t]. The logarithmic transform obviously prevents the implied continuous variation defined by exponentiating C_{t+1} from becoming negative.11

The left columns in Table 3 report the resulting OLS parameter estimates, together with robust standard errors in parentheses and p-values in square brackets. The estimation results confirm the strong own dynamic dependencies in C_t. Consistent with the results reported in Andersen et al. (2007), the coefficient estimates associated with the lagged squared jumps for SP are generally insignificant, albeit all negative, while there is some evidence of significant anti-persistent effects of the squared jumps for US. Meanwhile, the Ljung–Box Q-statistics for the squared and absolute residuals (available upon request) reveal clear evidence of significant conditional heteroskedasticity. Hence, we augment the basic HAR–C model above with a GARCH error structure for the time-varying volatility-of-volatility.12 Further, to allow for the possibility of fat tails, we estimate the model under the assumption of conditionally t-distributed errors, as in Bollerslev (1987). After some experimentation, we found that a GARCH(2, 1) model provided a good fit for both markets,

ϵ_{t+1,C} = σ_{t+1,C} · \sqrt{(ν − 2)/ν} · z_{t+1,C},  z_{t+1,C} ∼ t(ν),
σ_{t+1,C}^2 = ω_C + α_{1,C} ϵ_{t,C}^2 + α_{2,C} ϵ_{t-1,C}^2 + β_{1,C} σ_{t,C}^2.   (14)
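To make the conditional mean in Eq. (13) concrete, the following sketch evaluates it for a hypothetical history of C_t and J_t, plugging in the SP homoskedastic point estimates from Table 3 (the series values are invented for illustration):

```python
import math

def rolling_mean(x, h):
    """The h-day average C_{t-h,t} (or J_{t-h,t}) used by the HAR-C model."""
    return sum(x[-h:]) / h

# 22 days of hypothetical daily continuous variation and jump measures
C = [0.8 + 0.01 * i for i in range(22)]
J = [0.0] * 21 + [0.3]                 # a single jump on the most recent day

# SP homoskedastic point estimates from Table 3
b0, bCD, bCW, bCM = -0.085, 0.349, 0.338, 0.262
bJD, bJW, bJM = -0.109, -0.118, -0.122

log_C_next = (b0
              + bCD * math.log(C[-1])
              + bCW * math.log(rolling_mean(C, 5))
              + bCM * math.log(rolling_mean(C, 22))
              + bJD * math.log(J[-1] + 1)
              + bJW * math.log(rolling_mean(J, 5) + 1)
              + bJM * math.log(rolling_mean(J, 22) + 1))
C_next = math.exp(log_C_next)   # point forecast, ignoring the Jensen correction
```

With the negative β_J coefficients, the recent jump pulls the forecast below the current level of C_t, in line with the anti-persistence discussed in the text.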
The estimates from this augmented model are reported in the right columns of Table 3. The conditional mean parameters are generally close to the previously reported OLS estimates.13 The coefficient estimates associated with the past squared jumps again suggest
that, everything else equal, large jumps tend to lower the future continuous sample path volatility, and particularly so for US.14 The residual diagnostics (available upon request) also confirm that the estimated GARCH(2, 1) models adequately account for the conditional heteroskedasticity.

11 Moreover, the unconditional distributions of realized logarithmic volatilities often appear approximately normal; see, e.g., Andersen et al. (2001b,a) and Barndorff-Nielsen and Shephard (2004b), among others.
12 The presence of time-varying volatility-of-volatility is consistent with most of the continuous-time stochastic volatility models used in the asset pricing finance literature. For instance, in the square-root affine, or Heston, diffusion model, the conditional variance of the future instantaneous variance is an affine function of the current instantaneous variance and the current instantaneous variance squared; see, e.g., Bollerslev and Zhou (2002).
13 Since the model is formulated in terms of log(C_{t+1}), the form of the conditional heteroskedasticity plays a direct role in determining the expected value of C_{t+1}. Specifically, under the simplifying assumption of conditional normality, or ν = ∞, E[C_{t+1}|F_t] = exp{E[log(C_{t+1})|F_t] + 1/2 Var[log(C_{t+1})|F_t]}. We will return to a discussion of the numerical procedure that we actually use in the calculation of the expectations from the more general model in the forecasting section below.

5. Jump variation

Our model for the trading-time jump variation consists of two parts: a model for the occurrence of jumps coupled with a model for the squared jump sizes.15 We begin with a discussion of our model for jump occurrences.

5.1. The ACH model

Let {t_0, t_1, . . . , t_n, . . .} denote the random arrival times, or days, associated with significant jumps. The Autoregressive Conditional Duration (ACD) model proposed by Engle and Russell (1998) is ideally suited for modeling dynamic dependencies in the jump durations d_i = t_i − t_{i-1}, or the number of days between two adjacent significant jumps. However, the ACD model only updates the conditional expected durations on event, or jump, days. From a forecasting perspective, it is desirable to continuously incorporate new information as it becomes available. The autoregressive conditional hazard (ACH) model of Hamilton and Jordà (2002) was explicitly designed with this objective in mind.16

In order to more formally define the ACH model, let N(t) denote the counting process representing the number of jump days that have occurred up until time t. Also, define the hazard rate

h_t = Pr[N(t) \ne N(t − 1) | F_{t-1}].   (15)

The relationship between the hazard rate and the expected duration, say ψ_{N(t-1)}, if no new information arrives between jump days, is then given by

ψ_{N(t-1)} = \sum_{j=1}^{\infty} j (1 − h_t)^{j-1} h_t = \frac{1}{h_t}.   (16)
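Equation (16) is simply the mean of a geometric distribution; a quick numerical check of our own:

```python
# With a constant hazard h, the duration between jumps is geometric with
# mean 1/h; Eq. (16) truncated at a large horizon reproduces this.
h = 0.1
expected_duration = sum(j * (1 - h) ** (j - 1) * h for j in range(1, 2000))
# expected_duration is 10.0 up to a negligible truncation error
```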
14 Although some of the estimated coefficients are not significantly different from zero at the usual five percent level, we purposely maintain the same HAR–C GARCH(2, 1) specification for both markets. Also note that even though the ARCH(2) coefficients are estimated to be negative for both markets, the implied coefficients in the infinite ARCH representations are all positive, so that the models are indeed well-defined.
15 The model-free approach used here only identifies days with at least one significant jump and, in turn, the sum of the within-day squared jump(s). Further refinements along the lines of Andersen et al. (2006) for actually estimating each of the individual significant jumps could be used in the formulation of even richer reduced form models.
16 Bowsher (2007) has recently developed a more general econometric modeling framework for continuous-time conditional intensity-based multivariate processes.
Table 4
ACH model estimates. Standard errors in parentheses, p-values in square brackets.

        ACH(1, 1)                                    Augmented ACH
        SP                     US                    SP                     US
ω       0.227 (0.241)[0.346]   0.212 (0.084)[0.012]  –                      –
α1      0.038 (0.021)[0.061]   0.088 (0.020)[0.000]  0.056 (0.029)[0.053]   0.088 (0.019)[0.000]
β1      0.942 (0.036)[0.000]   0.858 (0.035)[0.000]  0.900 (0.059)[0.000]   0.859 (0.033)[0.000]
δ0      –                      –                     0.654 (2.771)[0.813]   −0.282 (0.423)[0.505]
δM      –                      –                     0.320 (1.695)[0.850]   0.412 (0.298)[0.166]
δT      –                      –                     1.117 (1.731)[0.519]   0.387 (0.271)[0.154]
δW      –                      –                     2.979 (1.962)[0.129]   0.681 (0.299)[0.023]
δTh     –                      –                     1.144 (1.666)[0.492]   0.325 (0.266)[0.222]
δES     –                      –                     0.629 (0.575)[0.274]   0.199 (0.094)[0.034]
δCPI    –                      –                     0.757 (0.629)[0.229]   0.396 (0.089)[0.000]
Log L   −1107.797              −2111.457             −1106.136              −2098.906
Obs.    3792                   3778                  3792                   3778
[Two-panel plot, 1995–2005: estimated conditional hazard rates for the SP and US markets.]

Fig. 3. Conditional hazard rates from ACH(1, 1) model.
The ACH model directly parameterizes the hazard rate, h_t, allowing it to depend on any relevant time t − 1 information. To illustrate, consider the simple ACH(1, 1) model without any information updating between jump days,

h_t = \frac{1}{ψ_{N(t)-1}},  ψ_{N(t)} = ω + α_1 d_{N(t)-1} + β_1 ψ_{N(t)-1}.   (17)
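A sketch of the ACH(1, 1) recursion in Eq. (17), using the SP point estimates from Table 4; the duration history and the starting value psi0 are our own invented inputs:

```python
# SP ACH(1, 1) point estimates from Table 4
omega, alpha1, beta1 = 0.227, 0.038, 0.942

def ach_hazards(durations, psi0=10.0):
    """Update the expected duration on each jump day; hazard = 1/psi."""
    psi, hazards = psi0, []
    for d in durations:
        psi = omega + alpha1 * d + beta1 * psi
        hazards.append(1.0 / psi)
    return hazards

hazards = ach_hazards([12, 3, 25, 8])   # hypothetical spacings between jump days
```

With α_1 + β_1 close to one, the hazard moves slowly, which is the strong persistence the estimates point to.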
Under appropriate distributional assumptions, this ACH(1, 1) model is asymptotically equivalent to the ACD(1, 1) model, which parameterizes the conditional durations as ψ_i = ω + α_1 d_{i-1} + β_1 ψ_{i-1}; see Hamilton and Jordà (2002) for further details. The parameter estimates from this ACH(1, 1) model are reported in the first part of Table 4. The estimates confirm the existence of strong persistence in the hazard rates, or equivalently the durations, in both markets. The Ljung–Box Q-statistics for any remaining own dynamic dependencies in the standardized durations implied by the model (available upon request), d_i/\hat{ψ}_i, are generally also insignificant. The second set of estimates reported in Table 4 augments the basic ACH(1, 1) model with four weekday dummies for Monday, Tuesday, Wednesday and Thursday. We explicitly exclude the Friday dummy to avoid singularity, so that the estimated coefficients represent effects relative to Friday. In addition, we include the logarithm of the number of days to the next nearest news announcements of the Employment Report (representing the real side of the economy) and the Consumer Price Index
(representing the nominal side of the economy).17 Specifically,

h_t = \frac{1}{α_1 d_{N(t)-1} + β_1 ψ_{N(t)-1} + δ′ z_{t-1}},
δ′ z_{t-1} = δ_0 + δ_M D_M + δ_T D_T + δ_W D_W + δ_{Th} D_{Th} + δ_{ES} log(n_{ES} + 1) + δ_{CPI} log(n_{CPI} + 1).   (18)
Consistent with the extant news announcement literature, the results suggest a statistically significant decreasing hazard for the occurrence of jumps in the US market as a function of the number of days until the release of one of the two announcements. Also, the corresponding coefficients for SP are both positive, albeit insignificant. The Monday through Thursday weekday dummies are all positive, but they do not indicate any statistically significant day-of-the-week effects in the jump occurrences. Nonetheless, in order to highlight the added flexibility afforded by the augmented ACH model, we maintain this as our preferred specification for both markets.18 To better illustrate the workings of the two different ACH specifications, Figs. 3 and 4 plot the resulting implied conditional hazard rates, \hat{h}_t. Comparing the two figures, the impact of the day-to-day updating for the latter set of plots is immediately evident.
17 The results in Andersen et al. (2007) suggest that these are the two most important macroeconomic news announcements.
18 We also experimented with augmenting the ACH model with C_t and J_t, but the estimated hazard rates did not appear plausible, so we decided not to include any of these variables in our final model specification.
[Two-panel plot, 1995–2005: estimated conditional hazard rates for the SP and US markets from the augmented model.]
Fig. 4. Conditional hazard rates from augmented ACH model.

Table 5
HAR–J model estimates. Standard errors in parentheses, p-values in square brackets.

        Homoskedastic                                GARCH(1, 1)
        SP                     US                    SP                     US
β0      −1.093 (0.072)[0.000]  −1.479 (0.113)[0.000] −1.299 (0.122)[0.000]  −1.535 (0.114)[0.000]
βCD     0.281 (0.114)[0.014]   −0.057 (0.053)[0.280] 0.172 (0.122)[0.158]   −0.085 (0.051)[0.095]
βCW     0.460 (0.185)[0.013]   0.269 (0.112)[0.017]  0.440 (0.172)[0.011]   0.338 (0.099)[0.001]
βCM     0.345 (0.150)[0.022]   0.489 (0.111)[0.000]  0.343 (0.147)[0.020]   0.474 (0.103)[0.000]
ω       –                      –                     0.068 (0.110)[0.540]   0.013 (0.007)[0.046]
α1      –                      –                     0.039 (0.042)[0.355]   0.035 (0.010)[0.000]
β1      –                      –                     0.866 (0.009)[0.000]   0.950 (0.010)[0.000]
ν       –                      –                     3.265 (0.659)[0.000]   6.430 (1.608)[0.000]
Log L   −387.854               −1281.353             −353.162               −1250.054
Obs.    327                    960                   327                    960

Still, the two sets of figures reveal the same general patterns in the estimated hazard rates, with jumps appearing more than twice as likely for US compared to SP over most of the sample. Again, we believe that this is partly due to the much bigger impact of macroeconomic news announcements for the fixed-income market. There is also a pronounced tendency for even fewer jumps in the equity market during the middle part of the sample, almost akin to a level shift in the estimated hazard rates. It is not clear what drives this change.

5.2. The HAR–J model

Most continuous-time parametric jump–diffusion models assume that the sizes of the jumps are i.i.d. through time. By directly observing the squared jumps, or more precisely the realized measure of the sum of the within-day squared jumps, the present framework affords us much greater flexibility in terms of modeling the jump sizes. Following the same basic idea underlying the HAR–C model, we parameterize the conditional jump sizes as a function of the past continuous sample path variations.19 In particular,

log(S_{t(i)}) = β_0 + β_{CD} log(C_{t(i)-1}) + β_{CW} log(C_{t(i)-5,t(i)}) + β_{CM} log(C_{t(i)-22,t(i)}) + ϵ_{t(i)},   (19)

where t(i) maps the jump counter i into the corresponding trading day t, so that the lagged variation measures on the right-hand side are always measured in calendar time relative to the time of the jump.

19 We also tried including various lags of log(S_t), as well as the raw and expected durations. All of these other variables turned out to be insignificant.

The estimation results from this model are reported in the left columns in Table 5. As seen from the table, the one-month lagged continuous volatility generally has the most explanatory power. Also, the sizes of the jumps for US are much more persistent than for SP. Meanwhile, the Ljung–Box Q-statistics for the squared and absolute residuals (available upon request) again clearly indicate the existence of conditional heteroskedasticity in the residuals from the model for both markets. We therefore augment the basic HAR–J model above with a GARCH(1, 1)-t error structure,

ϵ_{t(i)} = σ_{t(i)} · \sqrt{(ν − 2)/ν} · z_{t(i)},  z_{t(i)} ∼ t(ν),
σ_{t(i)}^2 = ω + α_1 ϵ_{t(i-1)}^2 + β_1 σ_{t(i-1)}^2.   (20)
The estimates from this preferred model are reported in the right columns in Table 5. The results confirm the existence of significant GARCH effects. Otherwise, the estimated dependencies in the conditional mean are directly in line with those for the homoskedastic model.

6. Overnight return variance

The realized variation measures and corresponding reduced form models developed above pertain to the return variation observed during the regular trading hours when the exchanges are open. However, as previously noted, the opening price on one day typically differs from the closing price on the previous day. Since most investors hold their portfolios over longer interdaily horizons, the corresponding overnight return variability
Table 6
Overnight GARCH model estimates. Standard errors in parentheses, p-values in square brackets.

        Unrestricted                                 Restricted
        SP                     US                    SP                     US
µ       0.015 (0.005)[0.004]   0.006 (0.004)[0.121]  0.015 (0.005)[0.003]   0.006 (0.004)[0.105]
ωn      −0.001 (0.001)[0.233]  0.002 (0.001)[0.014]  −0.001 (0.001)[0.199]  0.002 (0.001)[0.020]
α1,n    0.041 (0.012)[0.001]   0.045 (0.011)[0.000]  0.044 (0.012)[0.000]   0.046 (0.010)[0.000]
β1,n    0.817 (0.024)[0.000]   0.852 (0.027)[0.000]  0.806 (0.025)[0.000]   0.854 (0.027)[0.000]
βCP     0.040 (0.009)[0.000]   0.014 (0.006)[0.013]  –                      –
βCN     0.045 (0.007)[0.000]   0.023 (0.006)[0.000]  –                      –
βJP     −0.031 (0.012)[0.010]  −0.000 (0.011)[0.964] –                      –
βJN     0.023 (0.022)[0.305]   0.005 (0.009)[0.574]  –                      –
βC      –                      –                     0.045 (0.007)[0.000]   0.019 (0.005)[0.000]
ν       5.004 (0.434)[0.000]   7.906 (0.747)[0.000]  4.944 (0.422)[0.000]   7.847 (0.7303)[0.000]
Log L   −1863.415              −8.696                −1865.451              −10.300
Obs.    3799                   3779                  3799                   3779

will directly affect the risks of their positions. In particular, the proportion of the total daily variation due to the overnight returns, as measured by the sample means of r_{t,n}^2/(RV_t + r_{t,n}^2), equals 0.160 and 0.165 for the SP and US markets, respectively.

Two common ways of dealing with this non-trivial overnight variation have emerged in the realized volatility literature. The first approach simply scales up the daytime realized variation measures to provide an unbiased estimate of the variation over the whole day. This is the method used in, e.g., Martens (2002), Fleming et al. (2003) and Koopman et al. (2005). Alternatively, the overnight squared returns may be added to the within-day realized variation so that it covers the whole day. This approach, along with its pros and cons, is discussed further in Hansen and Lunde (2005), who also propose an improved estimator by optimally weighting, in a minimum mean-square-error sense, the daytime realized volatility and the squared overnight return. Both of these approaches implicitly assume that the overnight squared returns may somehow be viewed as part of the same process that generates the within-day realized volatility. Here we take a different approach and directly model the overnight returns, or jumps, with a separate discrete-time model.20

6.1. The GARCH–t model

The summary statistics previously reported and discussed in Section 3.2, not surprisingly, indicate the presence of serial correlation in the squared overnight returns. This naturally suggests a GARCH type approach for capturing these dependencies. Since the overnight returns are separated by the returns during the regular trading hours, we include the immediately preceding daytime realized volatility as an additional explanatory variable in the conditional variance equation. Moreover, since the continuous and discrete sample path variation over the day may affect the subsequent overnight return differently, we split the realized volatility into C_t and J_t. Furthermore, to allow for the possibility that positive and negative daytime shocks may have different effects, we condition the estimated coefficients for C_t and J_t on the sign of the daytime return, r_{t,d}. The resulting specification for the overnight returns takes the form,

r_{t+1,n} = µ + ϵ_{t+1,n},
ϵ_{t+1,n} = σ_{t+1,n} · \sqrt{(ν − 2)/ν} · z_{t+1,n},  z_{t+1,n} ∼ t(ν),
σ_{t+1,n}^2 = ω_n + α_{1,n} ϵ_{t,n}^2 + β_{1,n} σ_{t,n}^2 + β_{CP} C_t^P + β_{CN} C_t^N + β_{JP} J_t^P + β_{JN} J_t^N,   (21)

where C_t^P = C_t · I(r_{t,d} > 0), C_t^N = C_t · I(r_{t,d} < 0), and similarly for J_t^P and J_t^N.

20 The studies by Chan et al. (1991) and Martens (2002), which estimate individual discrete-time models for the trading-time and overnight returns, provide an earlier precedent.

The left columns in Table 6 report the estimation results. As expected, the estimates for α_{1,n} and β_{1,n} are both highly statistically significant and broadly in line with typical daily GARCH(1, 1) model estimates, although their sums are slightly less than what is generally found with daily returns. Of course, some of this ''lack'' of persistence is made up for by the significant positive dependence on the within-day realized continuous sample path variation, C_t^P and C_t^N. Interestingly, the jump components, J_t^P and J_t^N, are generally not significant. Furthermore, the Wald tests for the hypothesis of no volatility asymmetry, or β_{CP} = β_{CN}, equal 0.267 and 1.647 for the two markets respectively, with corresponding asymptotic p-values of 0.606 and 0.199.21 The right columns in Table 6 report the estimation results from the more parsimonious GARCH(1, 1)-t model obtained by eliminating the jump components and combining the positive and negative continuous variation components,
σ_{t+1,n}^2 = ω_n + α_{1,n} ϵ_{t,n}^2 + β_{1,n} σ_{t,n}^2 + β_C C_t.   (22)
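A one-step sketch of Eq. (22), plugging in the restricted SP point estimates from Table 6 (the lagged state values are invented inputs of our own):

```python
# Restricted SP point estimates from Table 6
omega_n, alpha1_n, beta1_n, beta_C = -0.001, 0.044, 0.806, 0.045

def overnight_variance(s2_prev, eps_prev, c_prev):
    """Eq. (22): overnight variance driven by its own past and daytime C_t."""
    return omega_n + alpha1_n * eps_prev ** 2 + beta1_n * s2_prev + beta_C * c_prev

s2_next = overnight_variance(s2_prev=0.20, eps_prev=0.3, c_prev=0.9)
```

The positive β_C transmits a volatile trading day into a riskier subsequent overnight period, which is the channel the text emphasizes.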
The estimated parameters are directly in line with those from the earlier, more general specification, and the corresponding values of the maximized log likelihood functions are also close to those for the unrestricted models. We consequently maintain this simpler model as our preferred specification for the overnight return variation.

7. Forecasting

One of the many potentially useful applications of the reduced form modeling framework developed above relates to volatility forecasting. In particular, consider the question of calculating one-day-ahead return volatility forecasts, or Var(r_{t+1}|F_t). The standard GARCH based approach directly parameterizes this conditional expectation as a function of its own past value(s) and the lagged squared return(s). This, of course, does not incorporate any high-frequency information. On the other hand, the now popular HAR–RV model parameterizes the conditional variance as a distributed lag of past realized variation measures. While this does incorporate high-frequency information into the resulting forecasts, the traditional HAR–RV model does not distinguish between the continuous sample path variation and the discontinuous jump part. However, as discussed at length above, the dynamic dependencies in these two components are very different. Moreover, the standard approaches of scaling the
21 On estimating the same GARCH models under the assumption of conditionally normal errors, the asymmetry appears significant for SP, indirectly suggesting that the effect is associated with the tails of the distribution.
Table 7
In-sample forecast statistics.

                 RMSE                      MAE
Horizon          1       5       22        1       5       22
SP   GARCH       1.519   0.995   0.869     0.626   0.506   0.504
     HAR–RV      1.477   0.943   0.827     0.556   0.453   0.466
     HAR–CJN     1.412   0.855   0.748     0.542   0.417   0.420
US   GARCH       0.325   0.170   0.119     0.176   0.112   0.087
     HAR–RV      0.325   0.169   0.118     0.175   0.112   0.086
     HAR–CJN     0.322   0.167   0.120     0.176   0.114   0.086
realized volatilities or treating the overnight return as another intraday return in order to get an unbiased measure of the full-day variance both ignore the distinct dynamic dependencies in the overnight returns. In contrast, the framework proposed here explicitly decomposes the conditional variance into three separate components,22

Var(r_{t+1}|F_t) = E(C_{t+1}|F_t) + E(J_{t+1}|F_t) + Var(r_{t+1,n}|F_t).   (23)

The last term on the right-hand side comes directly from the GARCH–t model discussed in the previous section. As for the second term, our results suggest that the occurrences of jumps and the sizes of the jumps are independent. Thus,

E[J_{t+1}|F_t] = \int_0^{\infty} j \, dP(0 < J_{t+1} \le j | F_t)
             = \int_0^{\infty} j \, dP(S_{t+1} \le j | F_t, I_{t+1} = 1) \cdot P(I_{t+1} = 1 | F_t)
             = \left[ \int_0^{\infty} j \, f(S_{t+1} \le j | F_t, I_{t+1} = 1) \, dj \right] \cdot P(I_{t+1} = 1 | F_t)
             = E(S_{t+1} | F_t, I_{t+1} = 1) \cdot h_{t+1}.

Forecasts for the hazard rate, h_t, follow directly from the estimated ACH models. Since the models for S_{t+1} and C_{t+1} are formulated in logarithmic terms, the two conditional expectations E(S_{t+1}|F_t, I_{t+1} = 1) and E(C_{t+1}|F_t) will both involve a Jensen's inequality type correction. However, numerical evaluations of these expectations are easily accomplished by means of simulations. Similarly, even though the highly non-linear dynamic dependencies among the different model components render closed-form expressions for the multi-step-ahead conditional expectations, Var(r_{t+h}|F_t) for h > 1, infeasible, these are relatively easy to compute by means of recursive simulations.23

To assess the accuracy of the HAR–CJN model forecasts, we compare the predictions to the actual realized variation measures; i.e., for the one-day horizon forecasts, RV_{t+1} + r_{t+1,n}^2. In addition to the one-day forecasts, we also calculate one-week and one-month forecasts defined by the average of the forecasts from 1 to 5, and 1 to 22, days ahead, respectively. As a benchmark comparison, we consider the forecasts from a simple GARCH(1, 1) model estimated on the daily returns, and an HAR–RV model properly scaled by the contribution from the overnight return so that the forecasts are unconditionally unbiased.24 The first subsection below discusses the results for the full sample period, labeled in-sample, while the subsequent section reports on the results from a true out-of-sample forecast comparison.
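Under Eq. (23), combining the three component forecasts amounts to the following sketch. All inputs are hypothetical; the lognormal Jensen correction mirrors the ν = ∞ case in footnote 13, which the paper's simulation step generalizes to the t-distributed errors.

```python
import math

# Hypothetical conditional moments of log(C_{t+1}) from an HAR-C type model
mu_logC, var_logC = -0.15, 0.10
E_C = math.exp(mu_logC + 0.5 * var_logC)   # Jensen correction for the log model

# Jump term: E[J_{t+1}|F_t] = E[S_{t+1}|F_t, I_{t+1}=1] * h_{t+1}
E_S, hazard = 0.45, 0.12
E_J = E_S * hazard

var_overnight = 0.20                       # from the overnight GARCH-t model
var_total = E_C + E_J + var_overnight      # Eq. (23)
```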
22 The validity of this decomposition for the conditional expectations implicitly assumes that the aforementioned convergence in probability of the realized variation measures implies convergence in mean. The assumption of a bounded return process, or a weaker uniform integrability condition, is sufficient to ensure that this holds; see, e.g., the discussion in Andersen et al. (2003).
23 In the results reported below we rely on a total of 10,000 replications in calculating the expectations. To minimize the impact of large influential outliers and stabilize the algorithm, we also trim any simulated values more than twice the largest in-sample observation. Further details are available upon request.
24 An alternative to these popular forecasting models and the HAR–CJN model developed here, suggested by one of the referees, would be to project RV_{t+1} + r²_{t+1,n}
To begin, Table 7 reports the standard Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) for the forecasts from each of the three different models based on the data over the full sample period.25 As is clear from the table, these in-sample summary statistics favor the more complicated HAR–CJN model for SP. Meanwhile, the in-sample RMSE and MAE for US are not as clear-cut. In order to further analyze the relative performance of the HAR–CJN model, we also estimate a series of Mincer–Zarnowitz style regressions. In particular, for the one-day-ahead forecasts,
(RV_{t+1} + r²_{t+1,n}) = b_0 + b_1 V_{t,GARCH} + b_2 V_{t,HAR–RV} + b_3 V_{t,HAR–CJN} + ϵ_{t+1},   (24)
where V_{t,M} refers to the time t one-day-ahead forecast from model M. In addition to the one-day-ahead forecasts, we run the same regressions for the 5- and 22-step-ahead forecasts, appropriately correcting the standard errors of the parameter estimates for the serial correlation in the residuals induced by the overlap in the data. Following the discussion in, e.g., Anderson and Vahid (2007), these regressions are naturally interpreted as volatility forecast encompassing regressions, in the sense that a coefficient significantly different from zero implies that the information in that particular model forecast is not encompassed in the forecasts by the two other models. As a further robustness check, we also report the results from the simple Mincer–Zarnowitz regressions, in which the ex-post variation measures are regressed on a constant and one of the three individual model forecasts in isolation. The results from these joint and individual Mincer–Zarnowitz regressions are all reported in Table 8. In the joint regressions for SP the forecasts from the HAR–CJN model invariably receive a weight statistically indistinguishable from unity, while the estimated coefficients for the other two model forecasts are close to zero and insignificant, indicating that the HAR–CJN forecasts encompass the forecasts from the other two models. The individual SP regressions reported in the bottom part of the table further corroborate these findings. In particular, the R²'s from the HAR–CJN models are always the highest,26 with the estimated intercept and slope coefficients very close to zero and unity, respectively. The corresponding in-sample results for US generally also favor the HAR–CJN model, although the differences among the three model forecasts are not as large.
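As a sketch of how a regression in the spirit of Eq. (24) can be estimated with serial-correlation-robust standard errors, the following uses purely illustrative simulated forecasts; `mz_encompassing` is a hypothetical helper, not code from the paper:

```python
import numpy as np

def mz_encompassing(y, forecasts, lags):
    """Mincer-Zarnowitz encompassing regression in the spirit of Eq. (24):
    regress the ex-post variation measure on a constant and the competing
    model forecasts, with Newey-West (Bartlett-kernel) standard errors to
    absorb the serial correlation induced by overlapping multi-step forecasts."""
    T = len(y)
    X = np.column_stack([np.ones(T)] + list(forecasts))
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ b                                # regression residuals
    Xu = X * u[:, None]
    S = Xu.T @ Xu / T                            # HAC long-run covariance
    for l in range(1, lags + 1):
        w = 1.0 - l / (lags + 1)                 # Bartlett weight
        G = Xu[l:].T @ Xu[:-l] / T
        S += w * (G + G.T)
    XtX_inv = np.linalg.inv(X.T @ X / T)
    cov = XtX_inv @ S @ XtX_inv / T              # sandwich covariance estimator
    return b, np.sqrt(np.diag(cov))

# Illustrative data in which the third forecast encompasses the other two
rng = np.random.default_rng(1)
T = 2000
base = rng.normal(1.0, 0.3, T)
v_garch = base + rng.normal(0, 0.1, T)
v_harrv = base + rng.normal(0, 0.1, T)
v_harcjn = base + rng.normal(0, 0.1, T)
y = v_harcjn + rng.normal(0, 0.2, T)
b, se = mz_encompassing(y, [v_garch, v_harrv, v_harcjn], lags=4)
print(np.round(b, 2))   # intercept and first two slopes should be near 0, the last near 1
```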
7.2. Out-of-sample forecasts

Even though the loss functions used in evaluating the forecasts discussed in the previous section formally differ from the likelihood functions used in estimating the models, the in-sample comparisons may seem tilted toward making the more complicated HAR–CJN model perform well. Hence, in order to more closely mimic a real-world forecast situation, we also report on the results obtained by re-estimating all of the models with data up until the end of 1999, retaining the last five years of the sample from January 2, 2000 to February 4, 2005 for out-of-sample forecast
on all of the variables in the time t information set, including information about the jumps and the overnight returns. While this procedure might perform well in a pure forecasting sense, it would obviously be completely void of any detailed information about the individual components that make up the total daily variation.
25 Patton (2006) has recently cautioned against the use of the MAE criterion with a noisy volatility proxy. The realized volatility measures that we use here effectively mitigate these concerns.
26 As discussed in Andersen et al. (2004, 2005), the reported R²'s understate the true degree of predictability due to the measurement errors in the realized volatility proxies. This does not, however, impede any cross-model comparisons.
T.G. Andersen et al. / Journal of Econometrics 160 (2011) 176–189
Table 8
In-sample Mincer–Zarnowitz regressions. Standard errors in parentheses.

                            SP                                            US
Horizon            1              5              22             1              5              22

Joint regressions
Const.        −0.030 (0.045) −0.096 (0.106) −0.103 (0.071) −0.004 (0.020) −0.015 (0.024) −0.012 (0.033)
GARCH          0.025 (0.055)  0.140 (0.083) −0.022 (0.150)  0.524 (0.110)  0.459 (0.109)  0.358 (0.128)
HAR–RV        −0.136 (0.106) −0.189 (0.123) −0.410 (0.226) −0.302 (0.148)  0.003 (0.187)  0.097 (0.156)
HAR–CJN        1.264 (0.142)  1.353 (0.161)  1.363 (0.216)  0.774 (0.097)  0.565 (0.120)  0.539 (0.132)
R²             0.399          0.601          0.564          0.123          0.310          0.408

Individual regressions
Const.         0.044 (0.046)  0.127 (0.061)  0.257 (0.073) −0.023 (0.019) −0.009 (0.025)  0.022 (0.036)
GARCH          0.939 (0.050)  0.868 (0.068)  0.750 (0.062)  1.093 (0.059)  1.053 (0.077)  0.963 (0.109)
R²             0.301          0.460          0.450          0.098          0.259          0.324

Const.        −0.066 (0.081) −0.098 (0.083) −0.083 (0.104) −0.007 (0.021) −0.018 (0.027) −0.031 (0.037)
HAR–RV         1.150 (0.089)  1.180 (0.095)  1.169 (0.111)  1.043 (0.062)  1.076 (0.080)  1.113 (0.111)
R²             0.346          0.523          0.473          0.095          0.265          0.345

Const.        −0.061 (0.050) −0.022 (0.056)  0.046 (0.066)  0.038 (0.016)  0.044 (0.019)  0.042 (0.030)
HAR–CJN        1.091 (0.058)  1.068 (0.065)  1.024 (0.072)  0.870 (0.045)  0.838 (0.052)  0.805 (0.080)
R²             0.397          0.579          0.558          0.117          0.291          0.385
Table 9
Out-of-sample forecast statistics. Diebold–Mariano p-values for the pairwise comparison with the HAR–CJN forecasts in parentheses.

                            RMSE                                          MAE
Horizon            1              5              22             1              5              22

SP
GARCH          1.929 (0.001)  1.269 (0.004)  1.130 (0.069)  0.836 (0.000)  0.669 (0.000)  0.694 (0.019)
HAR–RV         1.868 (0.002)  1.234 (0.004)  1.147 (0.009)  0.717 (0.224)  0.600 (0.002)  0.642 (0.005)
HAR–CJN        1.793          1.127          1.055          0.705          0.557          0.586

US
GARCH          0.375 (0.044)  0.199 (0.107)  0.150 (0.133)  0.193 (0.305)  0.130 (0.191)  0.105 (0.436)
HAR–RV         0.375 (0.000)  0.198 (0.010)  0.151 (0.041)  0.192 (0.251)  0.130 (0.041)  0.108 (0.135)
HAR–CJN        0.368          0.188          0.132          0.190          0.124          0.098
comparisons.27 Due to the relatively time-consuming calculations involved in the estimation of the non-linear models, we did not re-estimate the models on a rolling basis over the out-of-sample period, instead simply freezing all of the parameters at their estimates based on the full 1990–1999 in-sample period. The out-of-sample results essentially affirm the earlier in-sample findings. The RMSEs and MAEs for SP reported in Table 9 again achieve their lowest values across all horizons for the HAR–CJN models. The out-of-sample values for US are also the lowest for the HAR–CJN model, although the numerical differences are not particularly large. Interestingly, however, limiting the out-of-sample forecast comparisons for US to the last two years of the sample, which tend to exhibit both larger and more frequent jumps, results in sharper differences among the RMSE and MAE criteria. As such, this indirectly suggests that the benefits from a forecasting perspective of separately modeling the two volatility components are to some extent period-specific. Several procedures to formally test for the statistical significance of the observed differences in the RMSE and MAE criteria and the superior predictive ability of the underlying forecasting models have recently been proposed in the literature. As a simple guide we here rely on the easy-to-calculate Diebold and Mariano (1995) test involving a pairwise comparison of the forecasts from each of the two traditional models to the forecasts from the HAR–CJN model.28 The test is based on the heteroskedasticity and autocorrelation consistent t-statistic for the sample mean of L_{t,HAR–CJN} − L_{t,M}, where L_{t,M} denotes the time t squared or absolute loss from the
particular model M. Many of the corresponding p-values reported in parentheses in Table 9 do indeed indicate statistically significant superior out-of-sample performance of the HAR–CJN model. The out-of-sample Mincer–Zarnowitz regressions, adjusted for the in-sample parameter estimation error uncertainty following West and McCracken (1998) and reported in Table 10, generally also favor the HAR–CJN model. Although the high degree of co-linearity among the three forecasts renders most of the estimated coefficients for the joint encompassing regressions rather imprecise, the individual regressions all achieve their highest R²'s for the HAR–CJN model. Moreover, the estimated intercept and slope coefficients for the individual HAR–CJN regressions are all close to zero and unity, respectively. To further appreciate these results and the basic features of the different models, Figs. 5 and 6 plot the one-day-ahead out-of-sample forecasts. The overall level of the forecasts obviously matches fairly closely across the three models for both of the markets. Consistent with the results from the Mincer–Zarnowitz regressions, it also appears more difficult to discern any sharp differences in the three US forecasts. Nonetheless, the HAR–CJN based forecasts do seem to adapt more quickly to changes in the volatility than do the GARCH based and, to a lesser degree, the HAR–RV based forecasts. Not surprisingly, on comparing the forecasts to the actual realizations in Figs. 1 and 2, all of the models miss the very largest observations, which inherently must represent genuine large volatility innovations.

8. Conclusion
27 We also experimented with other out-of-sample periods, resulting in the same basic conclusions. Further details concerning these additional robustness checks are available upon request.
28 Although the Diebold and Mariano (1995) test does not explicitly account for the effect of estimation uncertainty, the out-of-sample version of the test coincides with the generally valid test for equal unconditional predictive ability recently developed by Giacomini and White (2006).
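The Diebold and Mariano (1995) comparison described above reduces to a HAC t-ratio for the mean loss differential. A minimal sketch, with illustrative simulated losses standing in for the model losses:

```python
import math

import numpy as np

def diebold_mariano(loss_m, loss_cjn, lags):
    """Diebold-Mariano (1995) statistic: HAC t-ratio for the sample mean of
    the loss differential d_t = L_{t,M} - L_{t,HAR-CJN}. Positive values
    favour the HAR-CJN forecasts; p-values follow from the standard normal."""
    d = np.asarray(loss_m) - np.asarray(loss_cjn)
    T = len(d)
    u = d - d.mean()
    lrv = u @ u / T                              # long-run variance of d_t
    for l in range(1, lags + 1):
        w = 1.0 - l / (lags + 1)                 # Bartlett weight
        lrv += 2.0 * w * (u[l:] @ u[:-l]) / T
    t_stat = d.mean() / math.sqrt(lrv / T)
    p_val = math.erfc(abs(t_stat) / math.sqrt(2.0))   # two-sided normal p-value
    return t_stat, p_val

# Illustrative comparison: model M has noisier forecast errors than HAR-CJN
rng = np.random.default_rng(3)
e_m = rng.normal(0, 1.2, 2000)
e_cjn = rng.normal(0, 1.0, 2000)
t_stat, p_val = diebold_mariano(e_m**2, e_cjn**2, lags=4)
print(t_stat, p_val)   # clearly positive statistic, small p-value
```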
We use two fifteen-year samples of high-frequency intraday data for the S&P 500 and T-Bond futures markets along with the model-free bipower variation measures and corresponding jump statistics of Barndorff-Nielsen and Shephard (2004a, 2006) to nonparametrically identify and measure the daily continuous sample path variation and squared jumps. Directly in line with earlier
Table 10
Out-of-sample Mincer–Zarnowitz regressions. Standard errors in parentheses.

                            SP                                            US
Horizon            1              5              22             1              5              22

Joint regressions
Const.         0.008 (0.117)  0.114 (0.170)  0.394 (0.250) −0.029 (0.062) −0.039 (0.072) −0.050 (0.100)
GARCH         −0.194 (0.270) −0.187 (0.262) −0.097 (0.294)  0.852 (0.301)  0.762 (0.279)  0.516 (0.329)
HAR–RV        −0.238 (0.594) −0.813 (0.779) −2.428 (1.545) −0.709 (0.379) −0.361 (0.404) −0.061 (0.446)
HAR–CJN        1.515 (0.542)  2.019 (0.716)  3.333 (1.321)  1.018 (0.247)  0.797 (0.265)  0.795 (0.402)
R²             0.375          0.568          0.518          0.124          0.321          0.413

Individual regressions
Const.         0.077 (0.124)  0.201 (0.187)  0.487 (0.259) −0.049 (0.060) −0.034 (0.079)  0.016 (0.102)
GARCH          0.989 (0.099)  0.915 (0.154)  0.746 (0.119)  1.282 (0.166)  1.250 (0.224)  1.137 (0.288)
R²             0.260          0.404          0.348          0.094          0.262          0.310

Const.        −0.128 (0.150) −0.043 (0.202)  0.182 (0.296)  0.025 (0.070)  0.002 (0.085) −0.033 (0.107)
HAR–RV         1.325 (0.144)  1.300 (0.200)  1.217 (0.194)  1.062 (0.183)  1.139 (0.234)  1.286 (0.304)
R²             0.346          0.506          0.402          0.084          0.250          0.333

Const.        −0.078 (0.144) −0.008 (0.168)  0.170 (0.232)  0.048 (0.084)  0.048 (0.064)  0.012 (0.098)
HAR–CJN        1.154 (0.124)  1.157 (0.156)  1.137 (0.150)  0.944 (0.138)  0.948 (0.162)  1.058 (0.257)
R²             0.371          0.554          0.466          0.109          0.287          0.383
Fig. 5. One-day-ahead out-of-sample forecasts for SP. [Figure: three panels plotting the data together with the GARCH, HAR–RV, and HAR–CJN forecasts, respectively, over 2001–2005.]
studies we find that the volatility associated with the continuous price movements within the day is a highly persistent process for both markets. Counter to a number of previous studies, however, we detect important dynamic dependencies in both the times between significant jumps and the sizes of the jumps. Further, the time series of overnight returns, or price jumps, associated with the change from one day's closing price to the next day's opening price exhibits strong volatility clustering. To satisfactorily account for these dependencies, we formulate and estimate a combination of several reduced form time series models. In addition, we compare and contrast the forecasting performance of the estimated models for each of the three non-parametrically identified volatility components to other commonly used volatility forecasting models. Looking ahead, our estimation results for the ACH model indicate that the occurrence of jumps in the T-Bond market is directly related to certain macroeconomic news releases. In this regard, it would be interesting to more systematically investigate the economic determinants behind the apparent discontinuities.
What is it that causes financial markets to jump? The reduced form modeling setup developed here provides a particularly convenient framework for further exploring this important question. In the model diagnostics and forecast comparisons presented in the paper, we have primarily focused on mean square error type criteria. However, separately modeling the intraday jumps and the overnight returns is likely to prove especially beneficial for better understanding the tails of the return distributions. It would be interesting to more directly analyze this issue, and the model's ability to capture the more extreme tail behavior and corresponding expected shortfalls, as would be of interest in many practical risk management situations. As previously noted, the specification and estimation of empirically realistic continuous-time jump–diffusion models have been the subject of extensive recent research efforts. In this regard, the relatively simple reduced form model structures for each of the different variation measures developed here could also be used as auxiliary models in an indirect inference setting to more effectively estimate and discriminate among some of these
Fig. 6. One-day-ahead out-of-sample forecasts for US. [Figure: three panels plotting the data together with the GARCH, HAR–RV, and HAR–CJN forecasts, respectively, over 2001–2005.]
competing continuous-time specifications, naturally extending the earlier realized variation based inferential procedures of Barndorff-Nielsen and Shephard (2002) and Bollerslev and Zhou (2002). In a related context, the recent studies by Santa-Clara and Yan (2004) and Todorov (2006) suggest that the premia required by investors in options markets to compensate for jump and continuous volatility risks differ. By easily allowing for different risk premia associated with the future risks originating from the continuous sample path price process and the harder-to-hedge intraday jump and overnight components, it is possible that our relatively simple-to-implement reduced form forecasting model may be used in the calculation of more accurate derivatives prices. We leave further work along these lines for future research.

References

Aït-Sahalia, Yacine, Mykland, Per A., Zhang, Lan, 2005. A tale of two time scales: determining integrated volatility with noisy high-frequency data. Journal of the American Statistical Association 100, 1394–1411.
Andersen, Torben G., Benzoni, Luca, Lund, Jesper, 2002. An empirical investigation of continuous-time equity return models. Journal of Finance 57, 1239–1284.
Andersen, Torben G., Bollerslev, Tim, 1998. Answering the skeptics: yes, standard volatility models do provide accurate forecasts. International Economic Review 39, 885–905.
Andersen, Torben G., Bollerslev, Tim, Christoffersen, Peter, Diebold, Francis X., 2006. Volatility forecasting. In: Elliott, Graham, Granger, Clive W.J., Timmermann, Allan (Eds.), Handbook of Economic Forecasting. Elsevier Science, New York.
Andersen, Torben G., Bollerslev, Tim, Diebold, Francis, 2007. Roughing it up: including jump components in the measurement, modeling and forecasting of return volatility. Review of Economics and Statistics 89, 701–720.
Andersen, Torben G., Bollerslev, Tim, Diebold, Francis X., Ebens, Heiko, 2001a. The distribution of realized stock return volatility.
Journal of Financial Economics 61, 43–76.
Andersen, Torben G., Bollerslev, Tim, Diebold, Francis X., Labys, Paul, 2001b. The distribution of realized exchange rate volatility. Journal of the American Statistical Association 96, 42–55.
Andersen, Torben G., Bollerslev, Tim, Diebold, Francis X., Labys, Paul, 2003. Modeling and forecasting realized volatility. Econometrica 71, 579–625.
Andersen, Torben G., Bollerslev, Tim, Diebold, Francis X., Vega, Clara, 2007. Real-time price discovery in stock, bond, and foreign exchange markets. Journal of International Economics 73, 251–277.
Andersen, Torben G., Bollerslev, Tim, Frederiksen, Per Houmann, Nielsen, Morten Ørregaard, 2006. Continuous-time models, realized volatilities, and testable distributional implications for daily stock returns. Working Paper. Northwestern, Duke, and Cornell Universities.
Andersen, Torben G., Bollerslev, Tim, Meddahi, Nour, 2004. Analytic evaluation of volatility forecasts. International Economic Review 45, 1079–1110.
Andersen, Torben G., Bollerslev, Tim, Meddahi, Nour, 2005. Correcting the errors: volatility forecast evaluation using high-frequency data and realized volatilities. Econometrica 73, 279–296.
Anderson, Heather M., Vahid, Farshid, 2007. Forecasting the volatility of Australian stock returns: do common factors help? Journal of Business and Economic Statistics 25, 76–90.
Areal, Nelson M.P.C., Taylor, Stephen J., 2002. The realized volatility of FTSE-100 futures prices. Journal of Futures Markets 22, 627–648.
Bandi, Federico M., Russell, Jeffrey R., 2008. Microstructure noise, realized volatility, and optimal sampling. Review of Economic Studies 75, 339–369.
Barndorff-Nielsen, Ole E., Graversen, Svend Erik, Jacod, Jean, Podolskij, Mark, Shephard, Neil, 2006. Limit theorems for bipower variation in financial econometrics. Econometric Theory 22, 677–719.
Barndorff-Nielsen, Ole E., Hansen, Peter Reinhard, Lunde, Asger, Shephard, Neil, 2008. Designing realised kernels to measure the ex-post variation of equity prices in the presence of noise. Econometrica 76, 1481–1536.
Barndorff-Nielsen, Ole E., Shephard, Neil, 2001. Non-Gaussian Ornstein–Uhlenbeck-based models and some of their uses in financial economics. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 63, 167–241.
Barndorff-Nielsen, Ole E., Shephard, Neil, 2002. Econometric analysis of realised volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 64, 253–280.
Barndorff-Nielsen, Ole E., Shephard, Neil, 2004a. Power and bipower variation with stochastic volatility and jumps. Journal of Financial Econometrics 2, 1–37.
Barndorff-Nielsen, Ole E., Shephard, Neil, 2004b. How accurate is the asymptotic approximation to the distribution of realized volatility. In: Andrews, Donald W.K., Powell, James L., Ruud, Paul A., Stock, James H. (Eds.), Identification and Inference for Econometric Models.
A Festschrift in Honour of Thomas Rothenberg. Cambridge University Press, Cambridge.
Barndorff-Nielsen, Ole E., Shephard, Neil, 2006. Econometrics of testing for jumps in financial economics using bipower variation. Journal of Financial Econometrics 4, 1–30.
Barndorff-Nielsen, Ole E., Shephard, Neil, Winkel, Matthias, 2006. Limit theorems for multipower variation in the presence of jumps. Stochastic Processes and their Applications 116, 796–806.
Bates, David S., 2000. Post-'87 crash fears in the S&P 500 futures option market. Journal of Econometrics 94, 181–238.
Bollerslev, Tim, 1987. A conditionally heteroskedastic time series model for speculative prices and rates of return. Review of Economics and Statistics 69, 542–547.
Bollerslev, Tim, Kretschmer, Uta, Pigorsch, Christian, Tauchen, George, 2008. A discrete-time model for daily S&P 500 returns and realized variations: jumps and leverage effects. Journal of Econometrics, in this issue, (doi:10.1016/j.jeconom.2008.12.001).
Bollerslev, Tim, Zhou, Hao, 2002. Estimating stochastic volatility diffusions using conditional moments of integrated volatility. Journal of Econometrics 109, 33–65.
Bowsher, Clive Graham, 2007. Modelling security market events in continuous time: intensity based, multivariate point process models. Journal of Econometrics 141, 876–912.
Carr, Peter, Wu, Liuren, 2003. What type of process underlies options? A simple robust test. Journal of Finance 58, 2581–2610.
Chan, Kalok, Chan, K.C., Karolyi, G. Andrew, 1991. Intraday volatility in the stock index and stock index futures markets. Review of Financial Studies 4, 657–684.
Chan, Wing H., Maheu, John M., 2002. Conditional jump dynamics in stock market returns. Journal of Business and Economic Statistics 20, 377–389.
Chernov, Mikhail, Gallant, A. Ronald, Ghysels, Eric, Tauchen, George, 2003. Alternative models for stock price dynamics. Journal of Econometrics 116, 225–258.
Comte, Fabienne, Renault, Eric, 1998. Long memory in continuous time stochastic volatility models. Mathematical Finance 8, 291–323.
Corsi, Fulvio, 2004. A simple long memory model of realized volatility. Working Paper. University of Lugano.
Dacorogna, Michael M., Gençay, Ramazan, Müller, Ulrich A., Pictet, Olivier V., Olsen, Richard B., 2001. An Introduction to High-Frequency Finance. AP Professional, Boston, MA, London.
Deo, Rohit, Hurvich, Clifford, Lu, Yi, 2006. Forecasting realized volatility using a long-memory stochastic volatility model: estimation, prediction and seasonal adjustment. Journal of Econometrics 131, 29–58.
Diebold, Francis X., Mariano, Roberto S., 1995. Comparing predictive accuracy. Journal of Business and Economic Statistics 13, 253–263.
Engle, Robert F., Russell, Jeffrey R., 1998. Autoregressive conditional duration: a new model for irregularly spaced transaction data. Econometrica 66, 1127–1162.
Eraker, Bjørn, 2004. Do stock prices and volatility jump? Reconciling evidence from spot and option prices. Journal of Finance 59, 1367–1403.
Eraker, Bjørn, Johannes, Michael S., Polson, Nicholas G., 2003. The impact of jumps in volatility. Journal of Finance 58, 1269–1300.
Fleming, Jeff, Kirby, Chris, Ostdiek, Barbara, 2003. The economic value of volatility timing using realized volatility. Journal of Financial Economics 67, 473–509.
Giacomini, Raffaella, White, Halbert, 2006. Tests of conditional predictive ability. Econometrica 74, 1545–1578.
Hamilton, James D., Jordà, Òscar, 2002. A model of the federal funds rate target. Journal of Political Economy 110, 1135–1167.
Hansen, Peter Reinhard, Lunde, Asger, 2005. A realized variance for the whole day based on intermittent high-frequency data. Journal of Financial Econometrics 3, 525–554.
Hansen, Peter Reinhard, Lunde, Asger, 2006. Realized variance and market microstructure noise. Journal of Business and Economic Statistics 24, 127–161.
Huang, Xin, Tauchen, George, 2005. The relative contribution of jumps to total price variance. Journal of Financial Econometrics 3, 456–499.
Jacod, Jean, 2008. Asymptotic properties of realized power variations and related functionals of semimartingales. Stochastic Processes and their Applications 118, 517–559.
Johannes, Michael, 2004. The statistical and economic role of jumps in continuous-time interest rate models. Journal of Finance 59, 227–260.
Koopman, Siem Jan, Jungbacker, Borus, Hol, Eugenie, 2005. Forecasting daily variability of the S&P 100 stock index using historical, realised and implied volatility measurements. Journal of Empirical Finance 12, 445–475.
Lanne, Markku, 2006. Forecasting realized volatility by decomposition. Working Paper. European University Institute.
Maheu, John M., McCurdy, Thomas H., 2004. News arrival, jump dynamics and volatility components for individual stock returns. Journal of Finance 59, 755–793.
Martens, Martin, 2002. Measuring and forecasting S&P 500 index-futures volatility using high-frequency data. Journal of Futures Markets 22, 497–518.
Martens, Martin, van Dijk, Dick, de Pooter, Michiel, 2004. Modeling and forecasting S&P 500 volatility: long memory, structural breaks and nonlinearity. Working Paper. Erasmus University Rotterdam.
Merton, Robert C., 1976. Option pricing when underlying stock returns are discontinuous. Journal of Financial Economics 3, 125–144.
Müller, Ulrich A., Dacorogna, Michel M., Davé, Rakhal D., Olsen, Richard B., Pictet, Olivier V., von Weizsäcker, Jacob E., 1997. Volatilities of different time resolution—analyzing the dynamics of market components. Journal of Empirical Finance 4, 213–239.
Neely, Christopher J., 1999. Target zones and conditional volatility: the role of realignments. Journal of Empirical Finance 6, 177–192.
Pan, Jun, 2002. The jump-risk premia implicit in options: evidence from an integrated time series study. Journal of Financial Economics 63, 3–50.
Patton, Andrew J., 2006. Volatility forecast comparison using imperfect volatility proxies. Working Paper. London School of Economics.
Pong, Shiu-yan Eddie, Shackleton, Mark B., Taylor, Stephen J., Xu, Xinzhong, 2004. Forecasting currency volatility: a comparison of implied volatilities and AR(FI)MA models. Journal of Banking and Finance 28, 2541–2563.
Rydberg, Tina Hviid, Shephard, Neil, 2003. Dynamics of trade-by-trade price movements: decomposition and models. Journal of Financial Econometrics 1, 2–25.
Santa-Clara, Pedro, Yan, Shu, 2004. Jump and volatility risk and risk premia: a new model and lessons from S&P 500 options. Working Paper. UCLA.
Tauchen, George, Zhou, Hao, 2006.
Identifying realized jumps on financial markets. Working Paper. Federal Reserve Board.
Thomakos, Dimitrios D., Wang, Tao, 2003. Realized volatility in the futures market. Journal of Empirical Finance 10, 321–353.
Todorov, Viktor, 2006. Pricing diffusive and jump risk: what can we learn from the variance risk premium? Working Paper. Duke University.
Wasserfallen, Walter, Zimmermann, Heinz, 1985. The behavior of intraday exchange rates. Journal of Banking and Finance 9, 55–72.
West, Kenneth D., McCracken, Michael W., 1998. Regression-based tests of predictive ability. International Economic Review 39, 817–840.
Zhang, Lan, 2006. Efficient estimation of stochastic volatility using noisy observations: a multi-scale approach. Bernoulli 12, 1019–1043.
Zhang, Lan, 2007. What you don't know cannot hurt you: on the detection of small jumps. Working Paper. University of Illinois at Chicago.
Journal of Econometrics 160 (2011) 190–203
Edgeworth expansions for realized volatility and related estimators✩

Lan Zhang (a), Per A. Mykland (b), Yacine Aït-Sahalia (c,∗)

(a) University of Illinois at Chicago, United States
(b) The University of Chicago, United States
(c) Princeton University and NBER, United States
Article history: Available online 6 March 2010
JEL classification: C13; C14; C15; C22
Abstract

This paper shows that the asymptotic normal approximation is often insufficiently accurate for volatility estimators based on high frequency data. To remedy this, we derive Edgeworth expansions for such estimators. The expansions are developed in the framework of small-noise asymptotics. The results have application to Cornish–Fisher inversion and help set intervals more accurately than those relying on the normal distribution. © 2010 Elsevier B.V. All rights reserved.
Keywords: Bias correction; Edgeworth expansion; Market microstructure; Martingale; Realized volatility; Two scales realized volatility
1. Introduction

Volatility estimation from high frequency data has received substantial attention in the recent literature.1 A phenomenon which has been gradually recognized, however, is that the standard estimator, realized volatility or realized variance (RV, hereafter), can be unreliable if the microstructure noise in the data is not explicitly taken into account. Market microstructure effects are surprisingly prevalent in high frequency financial data. As the sampling frequency increases, the noise becomes progressively more dominant, and in the limit swamps the signal. Empirically, sampling a typical stock price every few seconds can lead to volatility estimates that deviate from the true volatility by a factor of two or more. As a result, the usual prescription in the literature is to sample sparsely, with the recommendations ranging from five
✩ Financial support from the NSF under grants SBR-0350772 (Aït-Sahalia) and DMS-0204639, DMS 06-04758, and SES 06-31605 (Mykland and Zhang) is gratefully acknowledged. The authors would like to thank the editors and referees for helpful comments and suggestions. ∗ Corresponding address: Princeton University and NBER, Department of Economics, Princeton, NJ 08544-1021, United States. Tel.: +1 609 258 4015. E-mail address:
[email protected] (Y. Aït-Sahalia). 1 See, e.g., Dacorogna et al. (2001), Andersen et al. (2001b), Zhang (2001),
Barndorff-Nielsen and Shephard (2002), Meddahi (2002) and Mykland and Zhang (2006).
to thirty minutes, even if the data are available at much higher frequencies. More recently various RV-type estimators have been proposed to take into account the market microstructure impact. For example, in the parametric setting, Aït-Sahalia et al. (2005) proposed likelihood corrections for volatility estimation; in the nonparametric context, Zhang et al. (2005) proposed five different RV-like estimation strategies, culminating with a consistent estimator based on combining two time scales, which we called TSRV (two scales realized volatility).2 One thing in common among various RV-type estimators is that the limit theory predicts that the estimation errors of these estimators should be asymptotically mixed normal. Without noise, the asymptotic normality of RV estimation errors dates back to at least Jacod (1994) and Jacod and Protter (1998). When microstructure noise is present, the asymptotic normality of the standard RV estimator (as well as that of the subsequent refinements that are
2 A natural generalization of TSRV, based on multiple time scales, can improve the estimator’s efficiency (Zhang, 2006). Also, since the development of the two scales estimators, two other classes of estimators have been developed for this problem: realized kernels (Barndorff-Nielsen et al., 2008, 2011), and pre-averaging (Podolskij and Vetter, 2009; Jacod et al., 2009). Other strategies include Zhou (1996, 1998), Hansen and Lunde (2006), and Bandi and Russell (2008). Studying the Edgeworth expansions of these statistics is beyond the scope of this paper, instead we focus on the statistics introduced by Zhang et al. (2005).
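The two scales construction mentioned in footnote 2 can be sketched in a few lines; the sample size, volatility, and noise level below are hypothetical choices for a single trading day:

```python
import numpy as np

def rv(prices):
    """Realized variance: sum of squared log-returns."""
    r = np.diff(np.log(prices))
    return r @ r

def tsrv(prices, K):
    """Two scales realized volatility (Zhang, Mykland, Ait-Sahalia, 2005):
    average the K subsampled slow-scale RVs, then subtract a bias
    correction built from the fast-scale RV, which is dominated by noise."""
    n = len(prices) - 1                           # number of fast-scale returns
    rv_all = rv(prices)                           # fast scale: noise-dominated
    rv_avg = np.mean([rv(prices[k::K]) for k in range(K)])
    n_bar = (n - K + 1) / K                       # average slow-scale sample size
    return rv_avg - (n_bar / n) * rv_all

# Toy check: constant-volatility diffusion observed with i.i.d. noise
rng = np.random.default_rng(2)
n, sigma, noise_sd = 23400, 0.01, 0.0005          # hypothetical one-day sample
x = np.cumsum(rng.normal(0.0, sigma / np.sqrt(n), n + 1))   # latent log-price
p = np.exp(x + rng.normal(0.0, noise_sd, n + 1))            # observed price
print(rv(p), tsrv(p, K=300))   # RV is badly biased upward; TSRV is near sigma^2 = 1e-4
```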
robust to the presence of microstructure noise, such as TSRV) was established in Zhang et al. (2005). However, simulation studies do not agree well with what the asymptotic theory predicts. As we shall see in Section 5, the error distributions of various RV-type estimators (including those that account for microstructure noise) can be far from normal, even for fairly large sample sizes. In particular, they are skewed and heavy-tailed. In the case of basic RV, such non-normality appears to first have been documented in simulation experiments by Barndorff-Nielsen and Shephard (2005).3 We argue that the lack of normality can be caused by the coexistence of a small effective sample size and small noise. As a first-order remedy, we derive Edgeworth expansions for the RV-type estimators when the observations of the price process are noisy. What makes the situation unusual is that the errors (noises) ϵ are very small, and if they are taken to be of order Op(1), their impact on the Edgeworth expansion may be exaggerated. Consequently, the coefficients in the expansion may not accurately reflect which terms are important. To deal with this, we develop expansions under the hypothesis that the size of |ϵ| goes to zero, as stated precisely at the beginning of Section 4. We will document that this approach predicts the small sample behavior of the estimators better than the approach where |ϵ| is of fixed size. In this sense, we are dealing with an unusual type of Edgeworth expansion. One can argue that it is counterfactual to let the size of ϵ go to zero as the number of observations goes to infinity. We should emphasize that we do not literally mean that the noise goes down the more observations one gets. This form of asymptotics is merely a device to generate appropriately accurate distributions. Another problem where this type of device is used is for ARMA processes with near-unit roots (see, e.g. Chan and Wei, 1987), or the local-to-unity paradigm.
In our setting, the assumption that the size of ϵ goes down has produced useful results in Sections 2 and 3 of Zhang et al. (2005). For the problem discussed there, shrinking ϵ is the only known way of discussing the bias–variance trade-off rigorously in the presence of a leverage effect. Note that a similar use of triangular array asymptotics appears in Delattre and Jacod (1997) in the context of rounding, and in Gloter and Jacod (2001) in the context of additive error. Another interpretation is that of small-sigma asymptotics; cf. the discussion in Section 4.1 below. It is worth mentioning that jumps are not a likely cause of the non-normality in RV's error distributions in Section 5, as we model both the underlying returns and the volatility as continuous processes. Also, it is important to note that our analysis focuses on normalized RV-type estimators, rather than studentized RV, which has more immediate implementation in practice. In other words, our Edgeworth expansion has the limitation of conditioning on volatility processes, while hopefully it sheds some light on how an Edgeworth correction can be done for RV-type estimators while allowing for the presence of microstructure noise. For an Edgeworth expansion applicable to the studentized (basic) RV estimator when there is no noise, one can consult Gonçalves and Meddahi (2009). Their expansion is used for assessing the accuracy of the bootstrap in comparison to the first order asymptotic approach. See also Gonçalves and Meddahi (2008). Edgeworth expansions for realized volatility are also developed by Lieberman and Phillips (2006) for inference on long memory parameters. With the help of Cornish–Fisher expansions, our Edgeworth expansions can be used for the purpose of setting intervals that
3 We emphasize that the phenomenon we describe is the distribution of the estimation error of volatility measures. This is different from the well known empirical work demonstrating the non-normality of the unconditional distribution of RV estimators (see for example Zumbach et al., 1999; Andersen et al., 2001a,b), where the dominant effect is the behavior of the true volatility itself.
are more accurate than the ones based on the normal distribution. Since our expansions hold in a triangular array setting, they can also be used to analyze the behavior of bootstrapping distributions. A nice side result in our development, which may be of use in other contexts, shows how to calculate the third and fourth cumulants of integrals of Gaussian processes with respect to Brownian motion. This can be found in Proposition 4.

The paper is organized as follows. In Section 2, we briefly recall the estimators under consideration. Section 3 gives their first order asymptotic properties, and reports initial simulation results which show that the normal asymptotic distribution can be unsatisfactory. So, in Section 4, we develop Edgeworth expansions. In Section 5, we examine the behavior of our small-sample Edgeworth corrections in simulations. Section 6 concludes. Proofs are in the Appendix.

2. Data structure and estimators

Let {Y_{t_i}}, 0 = t_0 ≤ t_1 ≤ ··· ≤ t_n = T, be the observed (log) price of a security at time t_i ∈ [0, T]. The basic modelling assumption we make is that these observed prices can be decomposed into an underlying (log) price process X (the signal) and a noise term ϵ, which captures a variety of phenomena collectively known as market microstructure noise. That is, at each observation time t_i, we have

Y_{t_i} = X_{t_i} + ϵ_{t_i}.    (2.1)

Let the signal (latent) process X follow an Itô process

dX_t = µ_t dt + σ_t dB_t,    (2.2)

where B_t is a standard Brownian motion. We assume that µ_t, the drift coefficient, and σ_t^2, the instantaneous variance of the returns process X_t, are (continuous) stochastic processes. We do not, in general, assume that the volatility process, when stochastic, is orthogonal to the Brownian motion driving the price process.4 However, we will make this assumption in Section 4.3. Let the noise ϵ_{t_i} in (2.1) satisfy the following assumption:

ϵ_{t_i} i.i.d. with E(ϵ_{t_i}) = 0 and Var(ϵ_{t_i}) = Eϵ^2; also ϵ ⊥ X process,    (2.3)

where ⊥ denotes independence between two random quantities. Note that our interest in the noise is only at the observation times t_i, so model (2.1) does not require that ϵ_t exist for every t. We are interested in estimating

⟨X, X⟩_T = ∫_0^T σ_t^2 dt,    (2.4)

the integrated volatility or quadratic variation of the true price process X, assuming model (2.1), and assuming that the Y_{t_i}'s can be observed at high frequency. In particular, we focus on estimators that are nonparametric in nature and, as we will see, are extensions of RV.

Following Zhang et al. (2005), we consider five RV-type estimators. Ranked from the statistically least desirable to the most desirable, we start with (1) the ‘‘all'' estimator [Y, Y]^(all), where RV is based on the entire sample and consecutive returns are used; (2) the sparse estimator [Y, Y]^(sparse), where the RV is based on a sparsely sampled returns series, whose sampling frequency is often arbitrary or selected in an ad hoc fashion; (3) the optimal, sparse estimator [Y, Y]^(sparse,opt), which is similar to [Y, Y]^(sparse) except that the sampling frequency is pre-determined to be optimal in the
4 See the theorems in Zhang et al. (2005) for the precise assumptions.
sense of minimizing root mean squared error (MSE); (4) the averaging estimator [Y, Y]^(avg), which is constructed by averaging the sparse estimators and thus also utilizes the entire sample; and finally (5) the two scales estimator (TSRV) ⟨X, X⟩̂, which combines the RV estimators from two time scales, [Y, Y]^(avg) and [Y, Y]^(all), using the latter as a means to bias-correct the former. We showed that the combination of two time scales results in a consistent estimator; TSRV is the first estimator proposed in the literature to have this property. The first four estimators are biased, and the magnitude of their bias is typically proportional to the sampling frequency.

Specifically, our estimators have the following form. First, [Y, Y]_T^(all) uses all the observations:

[Y, Y]_T^(all) = Σ_{t_i ∈ G} (Y_{t_{i+1}} − Y_{t_i})^2,    (2.5)

where G contains all the observation times t_i in [0, T], 0 = t_0 ≤ t_1 ≤ ··· ≤ t_n = T. The sparse estimator uses a subsample of the data,

[Y, Y]_T^(sparse) = Σ_{t_j, t_{j,+} ∈ H} (Y_{t_{j,+}} − Y_{t_j})^2,    (2.6)

where H is a strict subset of G, with sample size n_sparse < n; if t_j ∈ H, then t_{j,+} denotes the following element in H. The optimal sparse estimator [Y, Y]^(sparse,opt) has the same form as (2.6), except that n_sparse is replaced by n*_sparse, where n*_sparse is determined by minimizing the MSE of the estimator (an explicit formula for doing so is given in Zhang et al. (2005)). The averaging estimator maintains a slow sampling scheme while still using all the data,

[Y, Y]_T^(avg) = (1/K) Σ_{k=1}^{K} Σ_{t_j, t_{j,+} ∈ G^(k)} (Y_{t_{j,+}} − Y_{t_j})^2,    (2.7)

where the G^(k)'s are disjoint subsets of the full set of observation times, with union G. Let n_k be the number of time points in G^(k), and n̄ = K^{-1} Σ_{k=1}^K n_k the average sample size across the different grids G^(k), k = 1, ..., K. One can also consider the optimal averaging estimator [Y, Y]^(avg,opt), obtained by substituting n̄* for n̄, where the latter is selected to balance the bias–variance trade-off in the error of the averaging estimator. (See again Zhang et al. (2005) for an explicit formula.) A special case of (2.7) arises when the sampling points are regularly allocated:

[Y, Y]_T^(avg) = (1/K) Σ_{t_j, t_{j+K} ∈ G} (Y_{t_{j+K}} − Y_{t_j})^2,

where the sum-of-squared returns are computed only from subsampling every Kth observation time, and are then averaged with equal weights. The TSRV estimator has the form

⟨X, X⟩̂_T = (1 − n̄/n)^{-1} ( [Y, Y]_T^(avg) − (n̄/n) [Y, Y]_T^(all) );    (2.8)

that is, the volatility estimator ⟨X, X⟩̂_T combines the sum-of-squares estimators from two different time scales: [Y, Y]_T^(avg) is computed from the returns on a slow time scale, whereas [Y, Y]_T^(all) is computed from the returns on a fast time scale. In (2.8), n̄ is the average sample size across the different grids. Note that this is what is called the ‘‘adjusted'' TSRV in Zhang et al. (2005).

In the model (2.1), the distributions of the various estimators can be studied by decomposing the sum of squared returns [Y, Y]:

[Y, Y]_T = [X, X]_T + 2[X, ϵ]_T + [ϵ, ϵ]_T.    (2.9)

The above decomposition applies to all the estimators in this section, with the samples suitably selected.

3. Small sample accuracy of the normal asymptotic distribution

We now briefly recall the distributional theory for each of these five estimators, which we developed in Zhang et al. (2005); the estimation errors of all five RVs have asymptotically (mixed) normal distributions. As we will see, however, this asymptotic distribution is not particularly accurate in small samples.

3.1. Asymptotic normality for the sparse estimators

For the sparse estimator, we have shown that

[Y, Y]_T^(sparse) ≈^L ⟨X, X⟩_T + 2n_sparse Eϵ^2  [bias due to noise]
    + { Var([ϵ, ϵ]_T^(sparse)) + 8[X, X]_T^(sparse) Eϵ^2  [due to noise] + (2T/n_sparse) ∫_0^T σ_t^4 dt  [due to discretization] }^{1/2} × Z_total,    (3.1)

where Var([ϵ, ϵ]_T^(sparse)) = 4n_sparse Eϵ^4 − 2 Var(ϵ^2), and Z_total is a standard normal term. The symbol ‘‘≈^L'' means that, when suitably standardized, the two sides have the same limit in law.

If the sample size n_sparse is large relative to the noise, the variance due to noise in (3.1) is dominated by Var([ϵ, ϵ]_T^(sparse)), which is of order n_sparse Eϵ^4. However, with the dual presence of small n_sparse and small noise (say, Eϵ^2), 8[X, X]_T^(sparse) Eϵ^2 is not necessarily smaller than Var([ϵ, ϵ]_T^(sparse)). One then needs to add 8[X, X]_T Eϵ^2 into the approximation. We call this correction the small-sample, small-error adjustment. This type of adjustment is often useful, since the magnitude of the microstructure noise is typically small, as documented in the empirical literature; cf. the discussion in the introduction to Zhang et al. (2005).

Of course, n_sparse is selected either arbitrarily or in some ad hoc manner. By contrast, the sampling frequency in the optimal-sparse estimator [Y, Y]^(sparse,opt) can be determined analytically, by minimizing the MSE of the estimator. Distribution-wise, the optimal-sparse estimator has the same form as in (3.1), but one replaces n_sparse by the optimal sampling frequency n*_sparse given below in (4.11). Whether or not n_sparse is selected optimally, one can see from (3.1) that, after suitable adjustment for the bias term, the sparse estimators are asymptotically normal.

3.2. Asymptotic normality for the averaging estimator

The optimal-sparse estimator only uses a fraction n*_sparse/n of the data; one also has to pick the beginning (or ending) point of the sample. The averaging estimator overcomes both shortcomings. Based on the decomposition (2.9), we have

[Y, Y]_T^(avg) ≈^L ⟨X, X⟩_T + 2n̄ Eϵ^2  [bias due to noise]
    + { Var([ϵ, ϵ]_T^(avg)) + (8/K)[X, X]_T^(avg) Eϵ^2  [due to noise] + (4T/(3n̄)) ∫_0^T σ_t^4 dt  [due to discretization] }^{1/2} × Z_total,    (3.2)

where Var([ϵ, ϵ]_T^(avg)) = 4(n̄/K) Eϵ^4 − (2/K) Var(ϵ^2),
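For concreteness, the estimators (2.5)–(2.8) can be sketched in a few lines of code. The snippet below is a minimal illustration, not the authors' implementation: it simulates the model (2.1)–(2.2) with constant volatility σ (so ⟨X, X⟩_T = σ^2 T) and i.i.d. Gaussian noise, then computes the ‘‘all'', sparse, averaging, and adjusted TSRV estimators. All numerical values (n, K, σ, the noise size) are illustrative assumptions.

```python
import numpy as np

def rv_all(y):
    """[Y,Y]^(all): realized volatility from all consecutive returns, Eq. (2.5)."""
    return np.sum(np.diff(y) ** 2)

def rv_sparse(y, k):
    """[Y,Y]^(sparse): realized volatility using every k-th observation, Eq. (2.6)."""
    return np.sum(np.diff(y[::k]) ** 2)

def rv_avg(y, K):
    """[Y,Y]^(avg): equal-weight average of the K subsampled RVs, Eq. (2.7)."""
    return np.mean([rv_sparse(y[k:], K) for k in range(K)])

def tsrv(y, K):
    """Adjusted TSRV, Eq. (2.8): slow time scale bias-corrected by the fast scale."""
    n = len(y) - 1
    nbar = (n - K + 1) / K          # average sample size across the K grids
    return (rv_avg(y, K) - (nbar / n) * rv_all(y)) / (1.0 - nbar / n)

# Illustrative simulation of model (2.1)-(2.2) with constant sigma (assumed values)
rng = np.random.default_rng(0)
n, T, sigma, noise_sd = 23400, 1.0, 0.3, 0.005   # one "day" at one-second sampling
dt = T / n
x = np.insert(np.cumsum(rng.normal(0.0, sigma * np.sqrt(dt), n)), 0, 0.0)  # latent X
y = x + rng.normal(0.0, noise_sd, n + 1)                                   # Y = X + eps

iv = sigma ** 2 * T                  # true integrated volatility <X,X>_T
print(rv_all(y) - iv)                # error dominated by the bias 2 n E(eps^2) = 1.17
print(tsrv(y, 20) - iv)              # TSRV: consistent, small error
```

The printout illustrates the point of (2.8): the full-sample RV is swamped by the noise-induced bias 2nEϵ^2, while the two-scales combination removes it.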
and Z_total is a standard normal term. The distribution of the optimal averaging estimator [Y, Y]^(avg,opt) has the same form as in (3.2), except that we substitute the optimal sub-sampling average size n̄* for n̄. To find n̄*, one determines K* from the bias–variance trade-off in (3.2) and then sets n̄* ≈ n/K*. If one removes the bias in either [Y, Y]_T^(avg) or [Y, Y]_T^(avg,opt), it follows from (3.2) that the next term is, again, asymptotically normal.

3.3. The failure of asymptotic normality

In practice, things are, unfortunately, somewhat more complicated than the story that emerges from Eqs. (3.1) and (3.2). The error distributions of the sparse estimators and the averaging estimator can, in fact, be quite far from normal. We provide an illustration of this using simulations. The simulation design is described in Section 5 below, but here we give a preview to motivate the following theoretical development of small sample corrections to these asymptotic distributions.

Fig. 1 reports the QQ plots of the standardized distribution of the five estimators before any Edgeworth correction is applied, as well as the histograms of the estimates. It is clear that the sparse, the sparse-optimal and the averaging estimators are not normally distributed; in particular, they are positively skewed and show some degree of leptokurtosis. On the other hand, the ‘‘all'' estimator and the TSRV estimator appear to be normally distributed. The apparent normality of the ‘‘all'' estimator is mainly due to the large sample size (one second sampling over 6.5 h); it is thus fairly irrelevant to talk about its small-sample behavior. Overall, we conclude from these QQ plots that the small-sample error distribution of the TSRV estimator is close to normality, while the small-sample error distributions of the other estimators depart from normality. As mentioned in Section 5, n is very large in this simulation.
It should be emphasized that bias is not the cause of the non-normality. Apart from TSRV, all the estimators have substantial bias. This bias, however, does not change the shape of the error distribution of the estimator; it only changes where the distribution is centered.

4. Edgeworth expansions for the distribution of the estimators

4.1. The form of the Edgeworth expansion in terms of cumulants

In situations where the normal approximation is only moderately accurate, improved accuracy can be obtained by appealing to Edgeworth expansions, as follows. Let θ be a quantity to be estimated, such as θ = ∫_0^T σ_t^2 dt, let θ̂_n be an estimator, say the sparse or average realized volatility, and suppose that α_n is a normalizing constant such that T_n = α_n(θ̂_n − θ) is asymptotically normal. A better approximation to the density f_n of T_n can then be obtained through the Edgeworth expansion. Typically, second order expansions are sufficient to capture skewness and kurtosis, as follows:

f_n(x) = [φ(z)/Var(T_n)^{1/2}] [ 1 + (1/6)(Cum_3(T_n)/Var(T_n)^{3/2}) h_3(z) + (1/24)(Cum_4(T_n)/Var(T_n)^2) h_4(z) + (1/72)(Cum_3(T_n)^2/Var(T_n)^3) h_6(z) + ··· ],    (4.1)

where Cum_i(T_n) is the ith order cumulant of T_n, z = (x − E(T_n))/Var(T_n)^{1/2}, and where the Hermite polynomials h_i are given by h_3(z) = z^3 − 3z, h_4(z) = z^4 − 6z^2 + 3, and h_6(z) = z^6 − 15z^4 + 45z^2 − 15. The neglected terms are typically of smaller order in n than the explicit terms. We shall refer to the explicit terms in (4.1) as the usual Edgeworth form. For broad discussions of Edgeworth expansions, and definitions of cumulants, see, e.g., Chapter XVI of Feller (1971) and Chapter 5.3 of McCullagh (1987). In some cases, Edgeworth expansions can only be found for distribution functions, in which case the form is obtained by integrating Eq. (4.1) term by term. In either situation, the Edgeworth approximations can be turned into expansions for p-values, and into Cornish–Fisher expansions for critical values; see formula (5.2) in Section 5. For more detail, we refer the reader to, e.g., Hall (1992).

Let us now apply this to the problem at hand. An Edgeworth expansion of the usual form, up to second order, can be found separately for each of the components in (2.9), by first considering expansions for n^{-1/2}([ϵ, ϵ]^(all) − 2nEϵ^2) and n^{-1/2}K([ϵ, ϵ]_T^(avg) − 2n̄Eϵ^2). Each of these can then be represented exactly as a triangular array of martingales. The remaining terms are also, to relevant order, martingales. Results deriving expansions for martingales can be found in Mykland (1993, 1995a,b). See also Bickel et al. (1986) for n^{-1/2}([ϵ, ϵ]^(all) − 2nEϵ^2).

To implement the expansions, however, one needs the form of the first four cumulants of T_n. We assume that the ‘‘size'' of the law of ϵ goes to zero, formally that ϵ for sample size n is of the form τ_n ζ, i.e., P_n(ϵ/τ_n ≤ x) = P(ζ ≤ x), where the left hand probability is for sample size n, and the right hand probability is independent of n. Here, Eζ^8 < ∞ and does not depend on sample size, and τ_n is nonrandom and goes to zero as n → ∞. Note that under our assumption, Var(ϵ) = O(τ_n^2), so the assumption is similar to the small-sigma asymptotics which go back to Kadane (1971).
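The expansion (4.1) is straightforward to evaluate once the cumulants are known. The following is a small, self-contained sketch (ours, not from the paper) of the second-order Edgeworth density with the Hermite polynomials h_3, h_4, h_6; setting the third and fourth cumulants to zero recovers the normal density.

```python
import math

def h3(z): return z**3 - 3*z
def h4(z): return z**4 - 6*z**2 + 3
def h6(z): return z**6 - 15*z**4 + 45*z**2 - 15

def edgeworth_density(x, mean, var, cum3, cum4):
    """Second-order Edgeworth approximation (4.1) to the density of T_n."""
    sd = math.sqrt(var)
    z = (x - mean) / sd
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal density
    correction = (1.0
                  + cum3 / (6.0 * var ** 1.5) * h3(z)
                  + cum4 / (24.0 * var ** 2) * h4(z)
                  + cum3 ** 2 / (72.0 * var ** 3) * h6(z))
    return phi / sd * correction
```

With a positive third cumulant, the approximate density shifts mass to the right tail, which is exactly the skewness visible in the histograms of Fig. 1.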
Finally, while in our case this is a way of setting up asymptotics, there is empirical work on whether the noise decreases with n; see, in particular, Awartani et al. (2006).

No matter what assumptions are made on the noise (on τ_n), one should not expect the cumulants in (4.1) to have standard convergence rates. The typical situation for an asymptotically normal statistic T_n is that the pth cumulant, p ≥ 2, is of order O(n^{-(p-2)/2}); see, for example, Chapters 2.3–2.4 of Hall (1992), along with Wallace (1958) and Bhattacharya and Ghosh (1978), and the discussion in Mykland (2001) and the references therein. While the typical situation does remain in effect for realized volatility in the no-noise and no-leverage case (which is, after all, simply a matter of observations that are independent but non-identically distributed), the picture changes for more complex statistics. To see that non-standard rates can occur even in the absence of microstructure noise, consult (4.28)–(4.29) in Section 4.3.2 below.

An important question which arises in connection with Edgeworth expansions is the comparison of Cornish–Fisher inversion with bootstrapping. The latter has been developed in the no-noise case by Gonçalves and Meddahi (2009). A comparison of this type is beyond the scope of this paper, but is clearly called for.

Fig. 1. Left panel: QQ plot for the five estimators based on the asymptotic Normal distribution. Right panel: Comparison of the small sample distribution of the estimator (histogram), the Edgeworth-corrected distribution (solid line) and the standard Normal distribution (dashed line).

4.2. Conditional cumulants

We start by deriving explicit expressions for the conditional cumulants of [Y, Y] and [Y, Y]^(avg), given the latent process X. All the expressions we give below about [Y, Y] hold for both [Y, Y]^(all) and [Y, Y]^(sparse); in the former case, n remains the total sample size in G, while in the latter n is replaced by n_sparse. We use a similar notation for [ϵ, ϵ] and for [X, X].

4.2.1. Third-order conditional cumulants

Denote

c_3(n) = Cum_3([ϵ, ϵ] − 2nEϵ^2),    (4.2)

where [ϵ, ϵ] = Σ_{i=0}^{n−1} (ϵ_{t_{i+1}} − ϵ_{t_i})^2. We have:

Lemma 1. c_3(n) = 8[ (n − 3/4) Cum_3(ϵ^2) − 7(n − 6/7) Cum_3(ϵ)^2 + 6(n − 1/2) Var(ϵ) Var(ϵ^2) ].    (4.3)

From that lemma, it follows that c_3(n) = O(nEϵ^6), and also, because the ϵ's from the different grids are independent,

Cum_3(K([ϵ, ϵ]^(avg) − 2n̄Eϵ^2)) = Σ_{k=1}^K Cum_3([ϵ, ϵ]^(k) − 2n_k Eϵ^2) = Kc_3(n̄).

For the conditional third cumulant of [Y, Y], we have

Cum_3([Y, Y]_T | X) = Cum_3([ϵ, ϵ]_T + 2[X, ϵ]_T | X)
= Cum_3([ϵ, ϵ]_T) + 6 Cum([ϵ, ϵ]_T, [ϵ, ϵ]_T, [X, ϵ]_T | X) + 12 Cum([ϵ, ϵ]_T, [X, ϵ]_T, [X, ϵ]_T | X) + 8 Cum_3([X, ϵ]_T | X).

From this, we obtain:

Proposition 1.

Cum_3([Y, Y]_T | X) = Cum_3([ϵ, ϵ]_T) + 48[X, X]_T Eϵ^4 + O_p(n^{-1/2} E[|ϵ|^3]),    (4.4)

where Cum_3([ϵ, ϵ]_T) is given in (4.3). Also,

Cum_3(K[Y, Y]_T^(avg) | X) = Cum_3(K[ϵ, ϵ]_T^(avg)) + 48K[X, X]_T^(avg) Eϵ^4 + O_p(Kn̄^{-1/2} E[|ϵ|^3]).
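Lemma 1 is easy to sanity-check numerically. The snippet below (ours, for illustration) evaluates c_3(n) from the moments of the noise and compares it, for n = 1 and Gaussian ϵ ~ N(0, a), with the exact third cumulant of (ϵ_{t_1} − ϵ_{t_0})^2: the difference is N(0, 2a), so its square is 2a times a χ²_1 variable, whose third cumulant is 8(2a)^3 = 64a^3.

```python
def c3(n, cum3_eps2, cum3_eps, var_eps, var_eps2):
    """Lemma 1: third cumulant of [eps,eps] - 2 n E(eps^2)."""
    return 8.0 * ((n - 3.0/4.0) * cum3_eps2
                  - 7.0 * (n - 6.0/7.0) * cum3_eps ** 2
                  + 6.0 * (n - 1.0/2.0) * var_eps * var_eps2)

# Gaussian noise eps ~ N(0, a): eps^2 is a * chi^2_1, so
# Cum3(eps^2) = 8 a^3, Cum3(eps) = 0, Var(eps) = a, Var(eps^2) = 2 a^2.
a = 2.5e-7
lemma_value = c3(1, 8 * a**3, 0.0, a, 2 * a**2)

# Direct check for n = 1: third cumulant of (eps_1 - eps_0)^2 is 8 * (2a)^3.
direct_value = 64 * a**3
```

The two values coincide, as they should; for larger n the increments of [ϵ, ϵ] are correlated, which is precisely what the fractional corrections 3/4, 6/7, 1/2 in the lemma account for.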
4.2.2. Fourth-order conditional cumulants

For the fourth-order cumulant, denote

c_4(n) = Cum_4([ϵ, ϵ]^(all) − 2nEϵ^2).    (4.5)

We have that:

Lemma 2. c_4(n) = 16(n − 7/8) Cum_4(ϵ^2) + n(Eϵ^4)^2 − 3n(Eϵ^2)^4 + 12(n − 1) Var(ϵ^2) Eϵ^4 − 32(n − 17/16) Eϵ^3 Cov(ϵ^2, ϵ^3) + 24(n − 7/4) Eϵ^2 (Eϵ^3)^2 + 12(n − 3/4) Cum_3(ϵ^2) Eϵ^2.

Also here,

Cum_4(K([ϵ, ϵ]^(avg) − 2n̄Eϵ^2)) = Σ_{k=1}^K Cum_4([ϵ, ϵ]^(k) − 2n_k Eϵ^2) = Kc_4(n̄).

For the conditional fourth-order cumulant, we know that

Cum_4([Y, Y] | X) = Cum_4([ϵ, ϵ]_T) + 24 Cum([ϵ, ϵ]_T, [ϵ, ϵ]_T, [X, ϵ]_T, [X, ϵ]_T | X) + 8 Cum([ϵ, ϵ]_T, [ϵ, ϵ]_T, [ϵ, ϵ]_T, [X, ϵ]_T | X) + 32 Cum([ϵ, ϵ]_T, [X, ϵ]_T, [X, ϵ]_T, [X, ϵ]_T | X) + 16 Cum_4([X, ϵ] | X).    (4.6)

A similar argument as in the derivation of the third cumulant shows that the latter three terms on the right hand side of (4.6) are of order O_p(n^{-1/2} E[|ϵ|^5]). Gathering terms of the appropriate order, we obtain:

Proposition 2.

Cum_4([Y, Y] | X) = Cum_4([ϵ, ϵ]_T) + 24[X, X]_T n^{-1} Cum_3([ϵ, ϵ]_T) + O_p(n^{-1/2} E[|ϵ|^5]).

Also, for the averaging estimator,

Cum_4(K[Y, Y]^(avg) | X) = Cum_4(K[ϵ, ϵ]_T^(avg)) + 24K[X, X]_T^(avg) c_3(n̄)/n̄ + O_p(Kn̄^{-1/2} E[|ϵ|^5]).

4.3. Unconditional cumulants

To get the Edgeworth expansion form as in (4.1), we need unconditional cumulants for the estimator. To pass from conditional to unconditional cumulants, we will use general formulas for this purpose (see Brillinger, 1969; Speed, 1983, and also Chapter 2 in McCullagh, 1987):

Cum_3(A) = E[Cum_3(A | F)] + 3 Cov[Var(A | F), E(A | F)] + Cum_3[E(A | F)],
Cum_4(A) = E[Cum_4(A | F)] + 4 Cov[Cum_3(A | F), E(A | F)] + 3 Var[Var(A | F)] + 6 Cum_3(Var(A | F), E(A | F), E(A | F)) + Cum_4(E(A | F)).

In what follows, we apply these formulas to derive the unconditional cumulants for our estimators. The main development (after formula (4.8)) will be for the case where there is no leverage effect. It should be noted that there are (other) cases, such as Bollerslev and Zhou (2002) and Corradi and Distaso (2006), involving the leverage effect, where (non-mixed) asymptotic normality holds. In such cases, unconditional Edgeworth expansions may also be applicable.

4.3.1. Unconditional cumulants for sparse estimators

In Zhang et al. (2005), we showed that

E([Y, Y]_T | X process) = [X, X]_T + 2nEϵ^2

and also that

Var([Y, Y]_T | X) = 4nEϵ^4 − 2 Var(ϵ^2) + 8[X, X]_T Eϵ^2 + O_p(E[|ϵ|^2] n^{-1/2}),

where the first two terms equal Var([ϵ, ϵ]_T). This allows us to obtain the unconditional cumulants as

Cum_3([Y, Y]_T − ⟨X, X⟩_T) = c_3(n) + 48E(ϵ^4) E[X, X]_T + 24 Var(ϵ) Cov([X, X]_T, [X, X]_T − ⟨X, X⟩_T) + Cum_3([X, X]_T − ⟨X, X⟩_T) + O(n^{-1/2} E[|ϵ|^3])    (4.7)

and

Cum_4([Y, Y]_T − ⟨X, X⟩_T) = c_4(n) + 24 (c_3(n)/n) E[X, X]_T + 192 Eϵ^4 Cov([X, X]_T, [X, X]_T − ⟨X, X⟩_T) + 192 (Var(ϵ))^2 Var([X, X]_T) + 48 Var(ϵ) Cum_3([X, X]_T, [X, X]_T − ⟨X, X⟩_T, [X, X]_T − ⟨X, X⟩_T) + Cum_4([X, X]_T − ⟨X, X⟩_T) + O(n^{-1/2} E[|ϵ|^5]).    (4.8)

To calculate the cumulants of [X, X]_T − ⟨X, X⟩_T, consider now the case where there is no leverage effect. For example, one can take σ_t to be conditionally nonrandom. Then

[X, X]_T = Σ_{i=1}^n χ²_{1,i} ∫_{t_{i−1}}^{t_i} σ_t^2 dt,

where the χ²_{1,i} are i.i.d. χ²_1 random variables. Hence, with implicit conditioning,

Cum_p([X, X]_T) = Cum_p(χ²_1) Σ_{i=1}^n ( ∫_{t_{i−1}}^{t_i} σ_t^2 dt )^p.

The cumulants of the χ²_1 distribution are as follows: Cum_p(χ²_1) = 1, 2, 8, 54 for p = 1, 2, 3, 4, respectively. When the sampling points are equidistant, one then obtains the approximation

Cum_p([X, X]_T) = Cum_p(χ²_1) (T/n)^{p−1} ∫_0^T σ_t^{2p} dt + O(n^{1/2−p})

under the assumption that σ_t^2 is an Itô process (often called a Brownian semimartingale). Hence, we have:
Proposition 3. In the case where there is no leverage effect, conditionally on the path of σ_t^2,

Cum_3([Y, Y]_T − ⟨X, X⟩_T) = c_3(n) + 48E(ϵ^4) ∫_0^T σ_t^2 dt + 48 Var(ϵ) n^{-1} T ∫_0^T σ_t^4 dt + 8n^{-2} T^2 ∫_0^T σ_t^6 dt + O(n^{-3/2} E[ϵ^2]) + O(n^{-1/2} E[|ϵ|^3]) + O(n^{-5/2})    (4.9)

and

Cum_4([Y, Y]_T − ⟨X, X⟩_T) = c_4(n) + 24n^{-1} c_3(n) ∫_0^T σ_t^2 dt + 384(Eϵ^4 + Var(ϵ)^2) n^{-1} T ∫_0^T σ_t^4 dt + 384 Var(ϵ) n^{-2} T^2 ∫_0^T σ_t^6 dt + 54n^{-3} T^3 ∫_0^T σ_t^8 dt + O(n^{-1/2} E[|ϵ|^5]) + O(n^{-3/2} E[ϵ^4]) + O(n^{-5/2} E[ϵ^2]) + O(n^{-7/2}).    (4.10)

If one chooses ϵ = o_p(n^{-1/2}) (i.e., τ_n = o(n^{-1/2})), then all the explicit terms in (4.9) and (4.10) are non-negligible. In this case, the error term in Eq. (4.9) is of order O(n^{-1/2}E[|ϵ|^3]) + O(n^{-5/2}), while that in Eq. (4.10) is of order O(n^{-1/2}E[|ϵ|^5]) + O(n^{-7/2}).

In the case of the optimal-sparse estimator, it is shown in Section 2.3 of Zhang et al. (2005) that the optimal sampling frequency leads to ϵ = O_p(n^{-3/4}), in particular ϵ = o_p(n^{-1/2}). For the special case of equidistant sampling times, the optimal sampling size is (Zhang et al., 2005, Eq. (31), p. 1399)

n*_sparse = ( (T/(4(Eϵ^2)^2)) ∫_0^T σ_t^4 dt )^{1/3}.    (4.11)

Also, in this case, it is easy to see that the error terms in Eqs. (4.9) and (4.10) are, respectively, O(n^{-1/2}E[|ϵ|^3]) and O(n^{-1/2}E[|ϵ|^5]). Plugging (4.11) into (4.9) and (4.10) for the choice of n, it follows that

Cum_3([Y, Y]_T^(sparse,opt) − ⟨X, X⟩_T) = 48 (T ∫_0^T σ_t^4 dt)^{2/3} 2^{2/3} (Eϵ^2)^{5/3} + 8 (T ∫_0^T σ_t^4 dt)^{-2/3} T^2 ∫_0^T σ_t^6 dt (2Eϵ^2)^{4/3} + O(E[|ϵ|^{11/3}])    (4.12)

and

Cum_4([Y, Y]_T^(sparse,opt) − ⟨X, X⟩_T) = 384 (Eϵ^4 + Var(ϵ)^2) (T ∫_0^T σ_t^4 dt)^{2/3} (2Eϵ^2)^{2/3} + 384 (T ∫_0^T σ_t^4 dt)^{-2/3} T^2 ∫_0^T σ_t^6 dt 2^{4/3} (Eϵ^2)^{7/3} + 54 (T ∫_0^T σ_t^4 dt)^{-1} T^3 ∫_0^T σ_t^8 dt (2Eϵ^2)^2 + O(E[|ϵ|^{17/3}]).    (4.13)

But under optimal sampling, we have

Var([Y, Y]_T^(sparse,opt)) = E[Var([Y, Y]_T^(sparse,opt) | X)] + Var[E([Y, Y]_T^(sparse,opt) | X)]
= 8⟨X, X⟩_T Eϵ^2 + (2T/n*_sparse) ∫_0^T σ_t^4 dt + 4n*_sparse Eϵ^4 − 2 Var(ϵ^2)
= 2 (T ∫_0^T σ_t^4 dt)^{2/3} (2Eϵ^2)^{2/3} + O(Eϵ^2);    (4.14)

hence, if s = Var([Y, Y]_T^(sparse,opt))^{1/2}, then, in the case of constant volatility σ,

Cum_3(s^{-1}([Y, Y]_T^(sparse,opt) − ⟨X, X⟩_T)) = (Eϵ^2)^{1/3} (σ^2 T)^{-1/3} 2^{5/6} + O((E|ϵ|)^{4/3}),
Cum_4(s^{-1}([Y, Y]_T^(sparse,opt) − ⟨X, X⟩_T)) = (Eϵ^2)^{2/3} (σ^2 T)^{-2/3} (27 × 2^{1/3}) + O((E|ϵ|)^2).    (4.15)

In other words, the third-order and the fourth-order cumulants indeed vanish as n → ∞ and Eϵ^2 → 0, at the rates O((Eϵ^2)^{1/3}) and O((Eϵ^2)^{2/3}), respectively.

4.3.2. Unconditional cumulants for the averaging estimator

Similarly, for the averaging estimator,

E([Y, Y]_T^(avg) | X process) = [X, X]_T^(avg) + 2n̄Eϵ^2,    (4.16)

Var([Y, Y]_T^(avg) | X) = Var([ϵ, ϵ]_T^(avg)) + (8/K)[X, X]_T^(avg) Eϵ^2 + O_p(E[|ϵ|^2](nK)^{-1/2}),    (4.17)

with

Var([ϵ, ϵ]_T^(avg)) = 4(n̄/K) Eϵ^4 − (2/K) Var(ϵ^2).    (4.18)

Also, from Zhang et al. (2005), for nonrandom σ_t, we have that

Var([X, X]_T^(avg)) = (4/3)(K/n) T ∫_0^T σ_t^4 dt + o(K/n).    (4.19)

Invoking the general relations between the conditional and the unconditional cumulants given above, we get the unconditional cumulants for the averaging estimator:

Cum_3([Y, Y]_T^(avg) − ⟨X, X⟩_T) = (1/K^2) c_3(n̄) + 48(1/K^2) E(ϵ^4) E[X, X]_T^(avg) + 24(1/K) Var(ϵ) Cov([X, X]_T^(avg), [X, X]_T^(avg) − ⟨X, X⟩_T) + Cum_3([X, X]_T^(avg) − ⟨X, X⟩_T) + O(K^{-2} n̄^{-1/2} E[|ϵ|^3])    (4.20)

and

Cum_4([Y, Y]_T^(avg) − ⟨X, X⟩_T) = (1/K^3) c_4(n̄) + 24(1/K^3)(c_3(n̄)/n̄) E[X, X]_T^(avg) + 192(1/K^2) Eϵ^4 Cov([X, X]_T^(avg), [X, X]_T^(avg) − ⟨X, X⟩_T) + 192(1/K^2)(Var(ϵ))^2 Var([X, X]_T^(avg)) + 48(1/K) Var(ϵ) Cum_3([X, X]_T^(avg), [X, X]_T^(avg) − ⟨X, X⟩_T, [X, X]_T^(avg) − ⟨X, X⟩_T) + Cum_4([X, X]_T^(avg) − ⟨X, X⟩_T) + O(K^{-3} n̄^{-1/2} E[|ϵ|^5]).    (4.21)
To calculate the cumulants of [X, X]_T^(avg) − ⟨X, X⟩_T in the case where there is no leverage effect, we shall use the following proposition, which has some independent interest. We suppose that D_t is a process, D_t = ∫_0^t Z_s dW_s. We also assume that (1) Z_s has mean zero, (2) Z_s is adapted to the filtration generated by W_t, and (3) Z_s is jointly Gaussian with W_t. The first two of these assumptions imply, by the martingale representation theorem, that one can write

Z_s = ∫_0^s f(s, u) dW_u,    (4.22)

and the third assumption yields that this f(s, u) is nonrandom, with the representation Cov(Z_s, W_t) = ∫_0^t f(s, u) du for 0 ≤ t ≤ s ≤ T. Note that, for u ≤ s,

Cov(Z_s, Z_u) = ∫_0^u f(s, t) f(u, t) dt.    (4.23)

Obviously, Var(D_T) = ∫_0^T E(Z_s^2) ds = ∫_0^T ∫_0^s f(s, u)^2 du ds. The following result provides the third and fourth cumulants of D_T.

Proposition 4. Under the assumptions above,

Cum_3(D_T) = 6 ∫_0^T ds ∫_0^s du Cov(Z_s, Z_u) f(s, u)
= 6 ∫_0^T ds ∫_0^s du ∫_0^u dt f(s, u) f(s, t) f(u, t)    (4.24)

and

Cum_4(D_T) = −12 ∫_0^T dt ∫_t^T ds [ ∫_0^t du f(s, u) f(t, u) ]^2
+ 24 ∫_0^T ds ∫_0^s dx ∫_0^x du ∫_0^u dt ( f(x, u) f(x, t) f(s, u) f(s, t) + f(x, u) f(u, t) f(s, x) f(s, t) + f(x, t) f(u, t) f(s, x) f(s, u) ).    (4.25)

The proof is in the Appendix. Note that it is possible to derive similar results in the multivariate case; see, for example, Eq. (E.3) in the Appendix. For the application to our case, note that when σ_t is (conditionally or unconditionally) nonrandom, D_T = [X, X]_T^(avg) − ⟨X, X⟩_T is of the form discussed above, with

f(s, u) = 2σ_s σ_u (1/K) (K − #{t_j between u and s})^+.    (4.26)

This provides a general form of the low order cumulants of [X, X]_T^(avg). In the equidistant case, one can, in the equations above, to first order make the approximation

f(s, u) ≈ 2σ_s σ_u ( 1 − (s − u)/(K∆t) )^+.    (4.27)

This yields, from Proposition 4,

Cum_3([X, X]_T^(avg)) = 48 (K/n)^2 T^2 ∫_0^T σ_t^6 dt ∫_0^1 dy ∫_0^1 dx (1 − y)(1 − x)(1 − (x + y))^+ + o((K/n)^2)
= (44/10) (K/n)^2 T^2 ∫_0^T σ_t^6 dt + o((K/n)^2)    (4.28)

and

Cum_4([X, X]_T^(avg)) = −192 (K/n)^3 T^3 ∫_0^T σ_t^8 dt ∫_0^1 dz ∫_0^{1−z} dv z v (z + v − 1)
+ 384 (K/n)^3 T^3 ∫_0^T σ_t^8 dt ∫_0^1 dy ∫_0^1 dw ∫_0^1 dz [ (1 − y)^+ (1 − (y + w))^+ (1 − (y + z))^+ (1 − (w + y + z))^+ + (1 − y)^+ (1 − w)^+ (1 − z)^+ (1 − (w + y + z))^+ + (1 − (w + y))^+ (1 − w)^+ (1 − z)^+ (1 − (y + z))^+ ] + o((K/n)^3)
= (1888/105) (K/n)^3 T^3 ∫_0^T σ_t^8 dt + o((K/n)^3).    (4.29)

Thus, (4.20) and (4.21) lead to the following results:

Proposition 5. In the case where there is no leverage effect, conditionally on the path of σ_t^2,

Cum_3([Y, Y]_T^(avg) − ⟨X, X⟩_T) = (8/K^2) [ (n̄ − 3/4) Cum_3(ϵ^2) − 7(n̄ − 6/7) Cum_3(ϵ)^2 + 6(n̄ − 1/2) Var(ϵ) Var(ϵ^2) ] + (48/K^2) E(ϵ^4) ∫_0^T σ_t^2 dt + 32 n^{-1} E(ϵ^2) T ∫_0^T σ_t^4 dt + (44/10) (K/n)^2 T^2 ∫_0^T σ_t^6 dt + O(K^{-2} n̄^{-1/2} E[|ϵ|^3]) + o((K/n)^2)    (4.30)

and

Cum_4([Y, Y]_T^(avg) − ⟨X, X⟩_T) = 16 (n̄/K^3) [ Cum_4(ϵ^2) + (Eϵ^4)^2 − 3(Eϵ^2)^4 + 12 Var(ϵ^2) Eϵ^4 − 32 Eϵ^3 Cov(ϵ^2, ϵ^3) + 24 Eϵ^2 (Eϵ^3)^2 + 12 Cum_3(ϵ^2) Eϵ^2 ] + O(K^{-3} E[|ϵ|^8]) + 256 (Eϵ^4 + (Var(ϵ))^2) (T/(nK)) ∫_0^T σ_t^4 dt + o(E[|ϵ|^4](nK)^{-1}) + (2112/10) (K/n^2) Var(ϵ) T^2 ∫_0^T σ_t^6 dt + (1888/105) (K/n)^3 T^3 ∫_0^T σ_t^8 dt + smaller terms.    (4.31)
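The constant 44/10 = 22/5 in (4.28) and (4.30) comes entirely from the overlap kernel (4.27): with x and y denoting the gaps scaled by K∆t, the triple product of kernels reduces to the planar integral ∫∫ (1 − x)(1 − y)(1 − (x + y))^+ dx dy = 11/120, and 48 × 11/120 = 22/5. A short midpoint-rule computation (ours, for illustration) confirms this.

```python
def kernel_constant(m=400):
    """Midpoint-rule value of 48 * integral over [0,1]^2 of (1-x)(1-y)(1-(x+y))^+,
    the scaled-gap form of the Cum_3 integral in (4.28)."""
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        x = (i + 0.5) * h
        for j in range(m):
            y = (j + 0.5) * h
            total += (1.0 - x) * (1.0 - y) * max(1.0 - (x + y), 0.0)
    return 48.0 * total * h * h

print(kernel_constant())   # close to 22/5 = 4.4
```

The same mechanism (iterated convolutions of the triangular kernel) is what produces the non-obvious fraction 1888/105 in the fourth cumulant (4.29).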
198
L. Zhang et al. / Journal of Econometrics 160 (2011) 190–203
Also, the optimal average subsampling size is

n̄* = ( T ∫_0^T σ_t^4 dt / (6 (Eϵ^2)^2) )^{1/3}.

The unconditional cumulants of the averaging estimator under the optimal sampling are

Cum3([Y, Y]_T^(avg,opt) − ⟨X, X⟩_T) = (22/5) (KT/n)^2 ∫_0^T σ_t^6 dt + o((KT/n)^2),
Cum4([Y, Y]_T^(avg,opt) − ⟨X, X⟩_T) = (1888/105) (KT/n)^3 ∫_0^T σ_t^8 dt + o((KT/n)^3).

Also, the unconditional variance of the averaging estimator, under the optimal sampling, is

Var([Y, Y]_T^(avg,opt)) = E Var([Y, Y]_T^(avg,opt) | X) + Var E([Y, Y]_T^(avg,opt) | X)
= [ (8 Eϵ^2/K) ∫_0^T σ_t^2 dt + (n̄*/K)(4 Eϵ^4 − 2 Var(ϵ^2)) ] + (4T/(3 n̄*)) ∫_0^T σ_t^4 dt
= (4/3) · 6^{1/3} (Eϵ^2)^{2/3} ( T ∫_0^T σ_t^4 dt )^{2/3} + o(E|ϵ|^{4/3});   (4.32)

hence, if we write s = Var([Y, Y]_T^(avg,opt))^{1/2}, we have that

Cum3( s^{-1}([Y, Y]_T^(avg,opt) − ⟨X, X⟩_T) ) = (Eϵ^2)^{1/3} (σ^2 T)^{-1/3} (11/5) × 2^{-11/6} × 3^{5/3} + O((Eϵ^2)^{2/3}),   (4.33)
Cum4( s^{-1}([Y, Y]_T^(avg,opt) − ⟨X, X⟩_T) ) = (Eϵ^2)^{2/3} (σ^2 T)^{-2/3} (354/35) × 6^{1/3} + O(Eϵ^2)   (4.34)

as n → ∞ and Eϵ^2 → 0. It is interesting to note that the averaging estimator is no closer to normal than the sparse estimator. In fact, by comparing the expressions for the third cumulants in (4.15) and (4.33), we find an increase in skewness of

Cum3( s^{-1}([Y, Y]_T^(avg,opt) − ⟨X, X⟩_T) ) / Cum3( s^{-1}([Y, Y]_T^(sparse,opt) − ⟨X, X⟩_T) )
= [ (Eϵ^2)^{1/3} (σ^2 T)^{-1/3} (11/5) × 2^{-11/6} × 3^{5/3} + O((E|ϵ|)^{4/3}) ] / [ (Eϵ^2)^{1/3} (σ^2 T)^{-1/3} 2^{5/6} + O((E|ϵ|)^{4/3}) ]
= (11/5) × 2^{-8/3} × 3^{5/3} + O((E|ϵ|)^{2/3}) ≈ 216%.

This number does not fully reflect the change in skewness, since it is only a first-order term and the higher-order terms also matter; cf. our simulations in the next section. (The simulations use the most precise formulas above; see Table 1 for details.)

4.4. Cumulants for the TSRV estimator

The same methods can be used to find cumulants for the two scales realized volatility (TSRV) estimator, ⟨X̂, X̂⟩_T. Since the distribution of TSRV is well approximated by its asymptotic normal distribution, we only sketch the results. When ϵ goes to zero sufficiently fast, the dominating terms in the third and fourth unconditional cumulants for TSRV are, symbolically, the same as for the average volatility, namely

Cum3(⟨X̂, X̂⟩_T − ⟨X, X⟩_T) = (22/5) (KT/n)^2 ∫_0^T σ_t^6 dt + o((KT/n)^2),
Cum4(⟨X̂, X̂⟩_T − ⟨X, X⟩_T) = (1888/105) (KT/n)^3 ∫_0^T σ_t^8 dt + o((KT/n)^3).   (4.35)

However, the value of K is quite different for TSRV than for the averaging volatility estimator. It is shown in Section 4 of Zhang et al. (2005) that for TSRV, the optimal choice of K is given by

K = (12 (Eϵ^2)^2)^{1/3} ( T ∫_0^T σ_t^4 dt )^{-1/3} n^{2/3}.   (4.36)

As is seen from Table 1, this choice of K gives radically different distributional properties than those for the average volatility. This is consistent with the behavior in simulation. Thus, as predicted, the normal approximation works well in this case.

5. Simulation results incorporating the Edgeworth correction

In this paper, we have discussed five estimators to deal with the microstructure noise in realized volatility. The five estimators, [Y, Y]_T^(all), [Y, Y]_T^(sparse), [Y, Y]_T^(sparse,opt), [Y, Y]_T^(avg) and ⟨X̂, X̂⟩_T, are defined in Section 2. In this section, we focus on the case where the sampling points are regularly allocated. We first examine the empirical distributions of the five approaches in simulation. We then apply the Edgeworth corrections as developed in Section 4, and compare the sample performance to those predicted by the asymptotic theory.

We simulate M = 50,000 sample paths from the standard Heston stochastic volatility model

dX_t = (µ − σ_t^2/2) dt + σ_t dB_t,
dσ_t^2 = κ(α − σ_t^2) dt + γ σ_t dW_t,

at a time interval Δt = 1 s, with parameter values µ = 0.05, α = 0.04, κ = 5, γ = 0.05 and ρ = d⟨B, W⟩_t/dt = −0.5. As for the market microstructure noise ϵ, we assume that it is Gaussian with mean zero and standard deviation (Eϵ^2)^{1/2} = 0.0005 (i.e., only 0.05% of the value of the asset price). On each simulated sample path, we estimate ⟨X, X⟩_T over T = 1 day (i.e., T = 1/252 using annualized values) using the five estimation strategies described above: [Y, Y]_T^(all), [Y, Y]_T^(sparse), [Y, Y]_T^(sparse,opt), [Y, Y]_T^(avg) and the TSRV estimator ⟨X̂, X̂⟩_T. We assume that a day consists of 6.5 h of open trading, as is the case on the NYSE and NASDAQ. For [Y, Y]_T^(sparse), we use sparse sampling at a frequency of once every 5 min. We shall see that even in this model, which includes leverage effect, the distributional approximation from our Edgeworth expansions is highly accurate.
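The simulation design above can be sketched in a few lines. The following is a minimal one-path illustration, not the paper's code: the Euler discretisation, the seed, and all function names are our own choices, and end effects in the subsampled grids are handled crudely.

```python
import numpy as np

# One simulated day of the Heston model with additive Gaussian microstructure
# noise, following the parameter values of Section 5 (annualized units).
rng = np.random.default_rng(0)

T = 1 / 252            # one trading day
n = 23400              # 6.5 h of one-second observations
dt = T / n
mu, alpha, kappa, gamma, rho = 0.05, 0.04, 5.0, 0.05, -0.5
noise_sd = 0.0005      # (E eps^2)^{1/2}

# Euler scheme for dX = (mu - sig2/2)dt + sig dB, dsig2 = kappa(alpha - sig2)dt + gamma sig dW
sig2 = alpha
X = np.empty(n + 1)
X[0] = 0.0
for i in range(n):
    dB, dZ = rng.normal(0.0, np.sqrt(dt), 2)
    dW = rho * dB + np.sqrt(1 - rho**2) * dZ      # corr(B, W) = rho
    X[i + 1] = X[i] + (mu - sig2 / 2) * dt + np.sqrt(sig2) * dB
    sig2 = max(sig2 + kappa * (alpha - sig2) * dt + gamma * np.sqrt(sig2) * dW, 1e-12)

Y = X + rng.normal(0.0, noise_sd, n + 1)          # observed price = X + noise

def rv(y, k=1):
    """Realized variance [Y, Y]_T computed from every k-th observation."""
    d = np.diff(y[::k])
    return float(d @ d)

def rv_avg(y, K):
    """Averaging (subsampled) estimator over the K offset grids."""
    return float(np.mean([rv(y[s:], K) for s in range(K)]))

def tsrv(y, K):
    """Two scales realized volatility: slow-scale average minus bias correction."""
    n_all = len(y) - 1
    nbar = (n_all - K + 1) / K
    return rv_avg(y, K) - (nbar / n_all) * rv(y)

K = 300                # roughly 5-min sampling on the slow scale
est = {"all": rv(Y), "sparse": rv(Y, K), "avg": rv_avg(Y, K), "tsrv": tsrv(Y, K)}
```

The noise bias 2nEϵ^2 makes the full-grid estimator far larger than the sparse one, while the TSRV bias correction removes it, as the paper's Table 1 illustrates on 50,000 such paths.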
Table 1
Monte-Carlo simulations: This table reports the sample and asymptotic moments for the five estimators. The bias of the five estimators is computed relative to the true quadratic variation. Our theory predicts that the first four estimators are biased, with only TSRV being correctly centered. The mean, standard deviation, skewness and kurtosis are computed for the standardized distributions of the five estimators. As seen in the table, incorporating the Edgeworth correction provides a clear improvement in the fit of the asymptotic distribution, compared to the asymptotics based on the Normal distribution.

                              All           Sparse          Sparse opt           Avg           TSRV
                              [Y,Y]_T^(all) [Y,Y]_T^(sparse) [Y,Y]_T^(sparse,opt) [Y,Y]_T^(avg) ⟨X̂,X̂⟩_T
Sample bias (×10^-5)          1171          3.89            2.23                 1.918         0.00001
Asymptotic bias (×10^-5)      1170          3.90            2.19                 1.923         0
Sample mean                   0.001         0.002           0.01                 0.002         0.002
Asymptotic mean               0             0               0                    0             0
Sample stdev                  0.9997        1.006           1.006                0.996         1.01
Asymptotic stdev              1             1               1                    1             1
Sample skewness               0.023         0.341           0.493                0.509         0.049
Asymp. skewness (Normal)      0             0               0                    0             0
Asymp. skewness (Edgeworth)   0.025         0.340           0.490                0.511         0.043
Formula for cumulant          (4.9)         (4.9)           (4.12)               (4.30)        (4.35)
Sample kurtosis               3.002         3.16            3.42                 3.44          3.004
Asymp. kurtosis (Normal)      3             3               3                    3             3
Asymp. kurtosis (Edgeworth)   3.001         3.17            3.41                 3.37          3.005
Formula for cumulant          (4.10)        (4.10)          (4.13)               (4.31)        (4.35)
Table 2
Monte-Carlo simulations: This table reports the coverage probability before and after the Edgeworth correction.

                              All           Sparse          Sparse opt           Avg           TSRV
                              [Y,Y]_T^(all) [Y,Y]_T^(sparse) [Y,Y]_T^(sparse,opt) [Y,Y]_T^(avg) ⟨X̂,X̂⟩_T
Theoretical coverage probability = 90%
Normal-based coverage         89.9%         89.3%           89.0%                89.5%         89.6%
Edgeworth-based coverage      90.0%         89.8%           89.7%                89.8%         89.7%
Theoretical coverage probability = 95%
Normal-based coverage         94.9%         94.0%           93.6%                93.9%         94.5%
Edgeworth-based coverage      95.0%         94.9%           94.8%                94.5%         94.6%
Theoretical coverage probability = 99%
Normal-based coverage         98.9%         98.0%           98.0%                98.0%         98.8%
Edgeworth-based coverage      99.0%         99.0%           99.0%                98.6%         98.9%
For each estimator, we report the values of the standardized quantities⁵

R = (estimator − ⟨X, X⟩_T) / [Var(estimator)]^{1/2}.   (5.1)

For example, the variances of [Y, Y]_T^(all), [Y, Y]_T^(sparse) and [Y, Y]_T^(sparse,opt) are based on Eq. (4.14) with the sample size n, n_sparse and n*_sparse, respectively. The variance of [Y, Y]_T^(avg) corresponds to (4.32), where the optimal subsampling size n̄* is adopted. The final estimator, TSRV, has variance

(1 − n̄/n)^{-2} · 2 n^{-1/3} (12 (Eϵ^2)^2)^{1/3} ( T ∫_0^T σ_t^4 dt )^{2/3}.
We now inspect how the simulation behavior of the five estimators compares to the second order Edgeworth expansion developed in the previous section. The results are in Fig. 1 and in Tables 1 and 2. Table 1 reports the simulation results for the five estimation strategies. In each estimation strategy, ''sample'' represents the sample statistic from the M simulated paths; ''Asymptotic (Normal)'' refers to the value predicted by the Normal asymptotic
5 Since we take the denominator to be known, the simulations are mainly of conceptual interest, in comparing the quality of normal distributions for different estimators. In practical estimation situations, one would need an estimated denominator, and this would lead to a different Edgeworth expansion. Relevant approaches to such estimation include those of Barndorff-Nielsen and Shephard (2002) (the quarticity), Zhang et al. (2005) (Section 6), and Jacod et al. (2009) (Eq. (3.10), p. 2255). Implementing such estimators, and developing their expansions, however, are left for future work as far as this paper is concerned.
distribution (that is, without Edgeworth correction); ''Asymptotic (Edgeworth)'' refers to the value predicted by our theory (the asymptotic cumulants are given up to the approximation in the previous section; the relevant formula number is also given in Table 1). An inspection of Table 1 suggests that asymptotic normal theory (without higher order correction) is not adequate to capture the positive skewness and the leptokurtosis in each of the five (standardized) estimators; on the other hand, our expansion theory provides a good approximation to all four moments of the small sample distribution in each estimation scheme. In Table 2, we report coverage probabilities computed as follows: for an asymptotically standard normal statistic T_n, let z_α be the normal quantile with Φ(z_α) = 1 − α, and set
w_{α,n} = z_α + (1/6) Cum3(T_n)(z_α^2 − 1) + (1/24) Cum4(T_n)(z_α^3 − 3 z_α) + (1/72) Cum3(T_n)^2 (−4 z_α^3 + 10 z_α).   (5.2)
The second order Cornish–Fisher corrected interval has actual coverage probability P(T_n ≤ w_{α,n}) (which should be close to 1 − α, but not exactly equal to it). The normal approximation gives a coverage probability P(T_n ≤ z_α). We report these values for α = 0.10, 0.05 and 0.01. The results show that the Edgeworth-based coverage probabilities provide very accurate approximations to the sample ones, compared to the Normal-based coverage probabilities. Fig. 1 confirms that the sample distributions of all five estimators conform to our Edgeworth expansion. The nonlinearity in the QQ plots (left panels) reminds us that normal asymptotic theory without Edgeworth expansion fails to describe the sample behaviors of [Y, Y]_T^(sparse), [Y, Y]_T^(sparse,opt) and [Y, Y]_T^(avg). The histograms
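The corrected quantile of Eq. (5.2) is simple to compute once the cumulants are in hand. The following sketch (function names ours) evaluates w_{α,n}; the cumulant values in the usage line are merely illustrative, of the size seen in Table 1.

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def cornish_fisher(z, cum3, cum4):
    """Second order Cornish-Fisher corrected quantile w_{alpha,n} of Eq. (5.2)."""
    return (z
            + cum3 * (z**2 - 1) / 6
            + cum4 * (z**3 - 3 * z) / 24
            + cum3**2 * (-4 * z**3 + 10 * z) / 72)

z95 = 1.6448536269514722              # z_alpha for alpha = 0.05, Phi(z95) = 0.95
w = cornish_fisher(z95, cum3=0.5, cum4=0.4)   # illustrative cumulants only
```

With all higher cumulants zero the correction vanishes and w_{α,n} = z_α; positive skewness of the size found for the sparse and averaging estimators pushes the corrected quantile noticeably to the right.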
in the right panels display the standardized distribution of the five estimators obtained from simulation results, and the superimposed solid curve corresponds to the asymptotic distribution predicted by our Edgeworth expansion. The dashed curve represents the uncorrected N (0, 1) distribution. By comparing the deviation between the dashed and solid curves, we can see how Edgeworth correction helps to capture the right skewness and leptokurtosis in the sample distribution of the (standardized) estimators.
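The solid curve just described can be reproduced from the standardized cumulants alone. The sketch below uses the textbook second order Edgeworth density (as in, e.g., Hall, 1992); the function name is ours, and the paper's exact curves use the Section 4 cumulants for each estimator.

```python
from math import exp, pi, sqrt

def edgeworth_pdf(x, cum3=0.0, cum4=0.0):
    """Second order Edgeworth density for a standardized statistic with
    third and fourth cumulants cum3 and cum4 (textbook form)."""
    phi = exp(-x * x / 2) / sqrt(2 * pi)     # N(0,1) density
    he3 = x**3 - 3 * x                       # probabilists' Hermite polynomials
    he4 = x**4 - 6 * x**2 + 3
    he6 = x**6 - 15 * x**4 + 45 * x**2 - 15
    return phi * (1 + cum3 * he3 / 6 + cum4 * he4 / 24 + cum3**2 * he6 / 72)
```

Setting cum3 = cum4 = 0 recovers the dashed N(0, 1) curve; a positive third cumulant of the size reported for the sparse and averaging estimators raises the right tail relative to the normal, which is exactly the pattern in the histograms.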
6. Conclusions

We have developed and given formulas for Edgeworth expansions of several types of realized volatility estimators. Apart from the practical interest of having access to such expansions, there is an important conceptual finding. That is, a better expansion is obtained by using an asymptotics where the noise level goes to zero when the number of observations goes to infinity. Another lesson is that the asymptotic normal distribution is a more accurate approximation for the two scales realized volatility (TSRV) than for the subsampled estimators, whose distributions definitely need to be Edgeworth-corrected in small samples. In the process of developing the expansions, we also developed a general device for computing cumulants of the integrals of Gaussian processes with respect to Brownian motion (Proposition 4), and this result should have applications to other situations. The proposition is only stated for the third and fourth cumulants, but the same technology can potentially be used for higher order cumulants.

Appendix A. Proof of Lemma 1

Let a_i be defined by

a_i = 1 if 1 ≤ i ≤ n − 1,   a_i = 1/2 if i = 0, n.   (A.1)

We can then write [ϵ, ϵ]_T = 2 ∑_{i=0}^n a_i ϵ_{t_i}^2 − 2 ∑_{i=0}^{n−1} ϵ_{t_i} ϵ_{t_{i+1}}, so that

c_3(n) = Cum3( 2 ∑_{i=0}^n a_i ϵ_{t_i}^2 − 2 ∑_{i=0}^{n−1} ϵ_{t_i} ϵ_{t_{i+1}} )
= 8 [ Cum3( ∑_i a_i ϵ_{t_i}^2 ) − 3 Cum( ∑_i a_i ϵ_{t_i}^2, ∑_j a_j ϵ_{t_j}^2, ∑_k ϵ_{t_k} ϵ_{t_{k+1}} ) + 3 Cum( ∑_i a_i ϵ_{t_i}^2, ∑_j ϵ_{t_j} ϵ_{t_{j+1}}, ∑_k ϵ_{t_k} ϵ_{t_{k+1}} ) − Cum3( ∑_i ϵ_{t_i} ϵ_{t_{i+1}} ) ].   (A.2)

For the second term,

Cum( ∑_i a_i ϵ_{t_i}^2, ∑_j a_j ϵ_{t_j}^2, ∑_k ϵ_{t_k} ϵ_{t_{k+1}} ) = 2 ∑_{k=0}^{n−1} a_k a_{k+1} Cum(ϵ_{t_k}^2, ϵ_{t_{k+1}}^2, ϵ_{t_k} ϵ_{t_{k+1}}) = 2(n − 1)(Eϵ^3)^2,   (A.3)

since ∑_{k=0}^{n−1} a_k a_{k+1} = n − 1, and the summation is non-zero only when (i = k, j = k + 1) or (i = k + 1, j = k). Also,

Cum( ∑_i a_i ϵ_{t_i}^2, ∑_j ϵ_{t_j} ϵ_{t_{j+1}}, ∑_k ϵ_{t_k} ϵ_{t_{k+1}} ) = 2 ∑_{j=0}^{n−1} a_j Cum(ϵ_{t_j}^2, ϵ_{t_j} ϵ_{t_{j+1}}, ϵ_{t_j} ϵ_{t_{j+1}}) = 2(n − 1/2) Eϵ^2 Var(ϵ^2),   (A.4)

since ∑_{j=0}^{n−1} a_j = n − 1/2, and the summation is non-zero only when j = k = i or j = k = i − 1. And finally,

Cum3( ∑_{i=0}^{n−1} ϵ_{t_i} ϵ_{t_{i+1}} ) = ∑_{i=0}^{n−1} Cum3(ϵ_{t_i} ϵ_{t_{i+1}}) = n(Eϵ^3)^2,   (A.5)

and

Cum3( ∑_{i=0}^n a_i ϵ_{t_i}^2 ) = ∑_{i=0}^n a_i^3 Cum3(ϵ^2) = (n − 3/4) Cum3(ϵ^2),   (A.6)

with ∑_{i=0}^n a_i^3 = n − 3/4. Inserting (A.3)–(A.6) in (A.2) yields (4.3).

Appendix B. Proof of Proposition 1

To proceed, define

b_i = ΔX_{t_{i−1}} − ΔX_{t_i} if 1 ≤ i ≤ n − 1,   b_n = ΔX_{t_{n−1}},   b_0 = −ΔX_{t_0}.

Note that [X, ϵ]_T = ∑_{i=0}^n b_i ϵ_{t_i}. Then it follows that

Cum([ϵ, ϵ]_T, [ϵ, ϵ]_T, [X, ϵ]_T | X) = ∑_{i=0}^n b_i Cum([ϵ, ϵ]_T, [ϵ, ϵ]_T, ϵ_{t_i})
= (b_0 + b_n)[2 Eϵ^2 Eϵ^3 − 3 Eϵ^5] = O_p(n^{-1/2} E[|ϵ|^5]),   (B.1)

because Cum([ϵ, ϵ]_T, [ϵ, ϵ]_T, ϵ_{t_i}) = Cum([ϵ, ϵ]_T, [ϵ, ϵ]_T, ϵ_{t_1}) for i = 1, . . . , n − 1. Also, recalling the definition of a_i in (A.1),

Cum([ϵ, ϵ]_T, [X, ϵ]_T, [X, ϵ]_T | X) = Cum( 2 ∑_i a_i ϵ_{t_i}^2 − 2 ∑_i ϵ_{t_i} ϵ_{t_{i+1}}, ∑_j b_j ϵ_{t_j}, ∑_k b_k ϵ_{t_k} | X )
= 2 ∑_{i=0}^n a_i b_i^2 Var(ϵ^2) − 4 ∑_{i=0}^{n−1} b_i b_{i+1} (Var(ϵ))^2
= 4 [X, X]_T Eϵ^4 + O_p(n^{-1/2} E[ϵ^4]).   (B.2)

Finally,

Cum3([X, ϵ]_T | X) = ∑_{i=0}^n b_i^3 Cum3(ϵ)
= Eϵ^3 ( −3 ∑_{i=1}^{n−1} (ΔX_{t_{i−1}})^2 (ΔX_{t_i}) + 3 ∑_{i=1}^{n−1} (ΔX_{t_{i−1}})(ΔX_{t_i})^2 )
= O_p(n^{-1/2} E[|ϵ|^3]).   (B.3)

Gathering the terms above together, one now obtains the first part of Proposition 1. The second part of the result is then obvious.
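The combinatorial identities behind (A.3), (A.4) and (A.6) are easy to confirm directly. The snippet below (our own check, not part of the paper) verifies them in exact rational arithmetic for a sample size n.

```python
from fractions import Fraction

def a(i, n):
    """Weights a_i of Eq. (A.1): 1 in the interior, 1/2 at the endpoints 0 and n."""
    return Fraction(1, 2) if i in (0, n) else Fraction(1)

n = 50
s1 = sum(a(i, n) * a(i + 1, n) for i in range(n))     # claimed to equal n - 1
s2 = sum(a(j, n) for j in range(n))                   # claimed to equal n - 1/2
s3 = sum(a(i, n)**3 for i in range(n + 1))            # claimed to equal n - 3/4
```

Each sum loses a fixed fraction at the boundary, which is exactly where the (n − 1), (n − 1/2) and (n − 3/4) factors in the lemma come from.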
Appendix C. Proof of Lemma 2

We have that:

Cum( ∑_{i=0}^n a_i ϵ_{t_i}^2, ∑_{j=0}^{n−1} ϵ_{t_j} ϵ_{t_{j+1}}, ∑_{k=0}^{n−1} ϵ_{t_k} ϵ_{t_{k+1}}, ∑_{l=0}^{n−1} ϵ_{t_l} ϵ_{t_{l+1}} )
= 2(n − 1/2) Eϵ^3 Cov(ϵ^2, ϵ^3) + 6(n − 3/2)(Eϵ^3)^2 Eϵ^2,   (C.1)

where the non-zero index configurations are 1_{l=j} 1_{i=j or j+1}, 1_{l=j+1, i=j+2} and 1_{l=i=j−1}, together with the analogous configurations obtained by permuting j, k and l, and since Cov(ϵ^2, ϵ^3) = Eϵ^5 − Eϵ^2 Eϵ^3 and Cum(ϵ^2, ϵ^2, ϵ) = Eϵ^5 − 2 Eϵ^2 Eϵ^3. Next:

Cum( ∑_i a_i ϵ_{t_i}^2, ∑_j a_j ϵ_{t_j}^2, ∑_k a_k ϵ_{t_k}^2, ∑_l ϵ_{t_l} ϵ_{t_{l+1}} )
= 6 ∑_{i=0}^{n−1} a_i^2 a_{i+1} Cum(ϵ^2, ϵ^2, ϵ) Eϵ^3 = 6(n − 5/4) Cum(ϵ^2, ϵ^2, ϵ) Eϵ^3,   (C.2)

via the configurations 1_{i=j=l, k=l+1} + 1_{i=j=l+1, k=l} and since ∑_{i=0}^{n−1} a_i^2 a_{i+1} = n − 5/4. Similarly,

Cum( ∑_i a_i ϵ_{t_i}^2, ∑_j a_j ϵ_{t_j}^2, ∑_k ϵ_{t_k} ϵ_{t_{k+1}}, ∑_l ϵ_{t_l} ϵ_{t_{l+1}} )
= 2(n − 3/2) Cum3(ϵ^2) Eϵ^2 + 4(n − 2)(Eϵ^3)^2 Eϵ^2,   (C.3)

via the configurations 1_{i=j, k=l, i=(k+1 or k)}, 1_{l=k−1, (i,j)=(k+1,k−1)[2]}, 1_{l=k+1, (i,j)=(k,k+2)[2]} and 1_{k=l, (i,j)=(k,k+1)[2]}, where the notation (i, j) = (k + 1, k − 1)[2] means that (i = k + 1, j = k − 1) or (j = k + 1, i = k − 1). The last equation holds because ∑_{i=1}^n a_i^2 = n − 3/4, ∑_{i=1}^{n−1} a_{i−1} a_{i+1} = n − 2, and ∑_{i=0}^{n−1} a_i a_{i+1} = n − 1. Also,

Cum4( ∑_{i=0}^n a_i ϵ_{t_i}^2 ) = ∑_{i=0}^n a_i^4 Cum4(ϵ^2) = (n − 7/8) Cum4(ϵ^2),   (C.4)

and, following the two identities 1_{i=j=k=l} and 1_{i=j, k=l, i=(k+1,k−1)},

Cum4( ∑_{i=0}^{n−1} ϵ_{t_i} ϵ_{t_{i+1}} ) = ∑_{i,j,k,l} Cum(ϵ_{t_i} ϵ_{t_{i+1}}, ϵ_{t_j} ϵ_{t_{j+1}}, ϵ_{t_k} ϵ_{t_{k+1}}, ϵ_{t_l} ϵ_{t_{l+1}})
= n((Eϵ^4)^2 − 3(Eϵ^2)^4) + 12(n − 1)(Eϵ^2)^2 Var(ϵ^2).   (C.5)

Putting together (C.1)–(C.5):

c_4(n) = Cum4( 2 ∑_i a_i ϵ_{t_i}^2 − 2 ∑_i ϵ_{t_i} ϵ_{t_{i+1}} )
= 16(n − 7/8) Cum4(ϵ^2) + n(Eϵ^4)^2 − 3n(Eϵ^2)^4 + 12(n − 1) Var(ϵ^2) Eϵ^4 − 32(n − 17/16) Eϵ^3 Cov(ϵ^2, ϵ^3) + 2(n − 1)(Var(ϵ^2))^2 + 24(n − 7/4) Eϵ^2 (Eϵ^3)^2 + 12(n − 3/2) Cum3(ϵ^2) Eϵ^2.   (C.6)

Appendix D. Proof of Proposition 2

It remains to deal with the second term in Eq. (4.6),

Cum([ϵ, ϵ]_T, [ϵ, ϵ]_T, [X, ϵ]_T, [X, ϵ]_T | X)
= ∑_{i,j} b_i b_j Cum([ϵ, ϵ]_T, [ϵ, ϵ]_T, ϵ_{t_i}, ϵ_{t_j})
= ∑_i b_i^2 Cum([ϵ, ϵ]_T, [ϵ, ϵ]_T, ϵ_{t_i}, ϵ_{t_i}) + 2 ∑_{i=0}^{n−1} b_i b_{i+1} Cum([ϵ, ϵ]_T, [ϵ, ϵ]_T, ϵ_{t_i}, ϵ_{t_{i+1}}).   (D.1)

Note that Cum([ϵ, ϵ]_T, [ϵ, ϵ]_T, ϵ_{t_i}, ϵ_{t_i}) and Cum([ϵ, ϵ]_T, [ϵ, ϵ]_T, ϵ_{t_i}, ϵ_{t_{i+1}}) are independent of i, except close to the edges. One can take α and β to be

α = n^{-1} ∑_i Cum([ϵ, ϵ]_T, [ϵ, ϵ]_T, ϵ_{t_i}, ϵ_{t_i}),
β = n^{-1} ∑_i Cum([ϵ, ϵ]_T, [ϵ, ϵ]_T, ϵ_{t_i}, ϵ_{t_{i+1}}).

Now, following the two identities

Cum([ϵ, ϵ]_T, [ϵ, ϵ]_T, ϵ_i, ϵ_i) = Cum([ϵ, ϵ]_T, [ϵ, ϵ]_T, ϵ_i^2) − 2(Cov([ϵ, ϵ]_T, ϵ_i))^2,
Cum([ϵ, ϵ]_T, [ϵ, ϵ]_T, ϵ_i, ϵ_{i+1}) = Cum([ϵ, ϵ]_T, [ϵ, ϵ]_T, ϵ_i ϵ_{i+1}) − 2 Cov([ϵ, ϵ]_T, ϵ_i) Cov([ϵ, ϵ]_T, ϵ_{i+1}),

and also observing that Cov([ϵ, ϵ]_T, ϵ_i) = Cov([ϵ, ϵ]_T, ϵ_{i+1}), except at the edges,

2(α − β) = n^{-1} Cum3([ϵ, ϵ]_T) + O_p(n^{-1/2} E[|ϵ|^6]).
Hence, (D.1) becomes

Cum([ϵ, ϵ]_T, [ϵ, ϵ]_T, [X, ϵ]_T, [X, ϵ]_T | X) = ∑_{i=0}^n b_i^2 α + 2 ∑_{i=0}^{n−1} b_i b_{i+1} β + O_p(n^{-1/2} E[|ϵ|^6])
= n^{-1} [X, X]_T Cum3([ϵ, ϵ]_T) + O_p(n^{-1/2} E[|ϵ|^6]),

where the last line is because

∑_i b_i^2 = 2[X, X]_T + O_p(n^{-1/2}),   ∑_i b_i b_{i+1} = −[X, X]_T + O_p(n^{-1/2}).

The proposition now follows.

Appendix E. Proof of Proposition 4

The Bartlett identities for martingales — of which we use the cumulant version, with ''cumulant variations'' — can be found in Mykland (1994). Set Z_t^(s) = ∫_0^{s∧t} f(s, u) dW_u, which is taken to be a process in t for fixed s. For the third cumulant, by the third Bartlett identity,

Cum3(D_T) = 3 Cov(D_T, ⟨D, D⟩_T) = 3 Cov( D_T, ∫_0^T Z_s^2 ds ) = 3 ∫_0^T Cov(D_T, Z_s^2) ds.   (E.1)

To compute the integrand,

Cov(D_T, Z_s^2) = Cov(D_s, Z_s^2)   since D_t is a martingale
= Cum3(D_s, Z_s, Z_s)   since ED_s = EZ_s = 0
= Cum3(D_s, Z_s^(s), Z_s^(s))
= 2 Cov(Z_s^(s), ⟨D, Z^(s)⟩_s) + Cov(D_s, ⟨Z^(s), Z^(s)⟩_s)   by the third Bartlett identity
= 2 Cov( Z_s, ∫_0^s Z_u f(s, u) du )   by (4.22), and since ⟨Z^(s), Z^(s)⟩_s is nonrandom
= 2 ∫_0^s Cov(Z_s, Z_u) f(s, u) du
= 2 ∫_0^s du ∫_0^u f(s, u) f(s, t) f(u, t) dt.   (E.2)

Combining the two last lines of (E.2) with Eq. (E.1) yields the result (4.24) in the proposition. Note that, more generally than (4.24), in the case of three different processes D_T^(i), i = 1, 2, 3, one has

Cum3(D_T^(1), D_T^(2), D_T^(3)) = 2 ∫_0^T ds ∫_0^s Cov(Z_s^(1), Z_u^(2)) f^(3)(s, u) du [3],   (E.3)

where the symbol ''[3]'' is used as in McCullagh (1987). We shall use this below. For the fourth cumulant,

Cum4(D_T) = −3 Cov(⟨D, D⟩_T, ⟨D, D⟩_T) + 6 Cum3(D_T, D_T, ⟨D, D⟩_T),   (E.4)

by the fourth Bartlett identity. For the first term,

Cov(⟨D, D⟩_T, ⟨D, D⟩_T) = Cov( ∫_0^T Z_s^2 ds, ∫_0^T Z_s^2 ds )
= ∫_0^T ∫_0^T Cov(Z_s^2, Z_t^2) ds dt
= 2 ∫_0^T ∫_0^T Cov(Z_s, Z_t)^2 ds dt
= 4 ∫_0^T ds ∫_0^s dt Cov(Z_s, Z_t)^2
= 4 ∫_0^T ds ∫_0^s dt ( ∫_0^t f(s, u) f(t, u) du )^2.   (E.5)

For the other term in (E.4),

Cum3(D_T, D_T, ⟨D, D⟩_T) = ∫_0^T Cum3(D_T, D_T, Z_s^2) ds.

To calculate this, fix s, and set D_t^(1) = D_t^(2) = D_t, and D_t^(3) = (Z_t^(s))^2 − ⟨Z^(s), Z^(s)⟩_t. Since D_t^(3) = ∫_0^t 2 Z_u^(s) f(s, u) dW_u for t ≤ s, D_t^(3) is of the form covered by the third cumulant equation (E.3), with Z(for D^(3))_u = 2 Z_u^(s) f(s, u) and f(for D^(3))(a, t) = 2 f(s, a) f(s, t) (for t ≤ a ≤ s). Then:

Cum3(D_T, D_T, ⟨D, D⟩_T)
= 2 ∫_0^T ds ∫_0^s dx ∫_0^x du ∫_0^u dt ( f(x, u) f(x, t) f(s, u) f(s, t) + f(x, u) f(u, t) f(s, x) f(s, t) + f(x, t) f(u, t) f(s, x) f(s, u) ).   (E.6)

Combining Eqs. (E.4)–(E.6) yields the result (4.25) in the proposition.
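The triple-integral formula (4.24) admits a simple sanity check, which we sketch below (our own illustration; the function name is ours). For f(s, u) ≡ 1 on [0, T], D_T = ∫_0^T W_s dW_s = (W_T^2 − T)/2, and since W_T = √T·Z with Z standard normal and the third cumulant of a chi-squared(1) variable is 8, Cum3(D_T) = (T/2)^3 · 8 = T^3. The formula gives 6 times the volume of the simplex {t ≤ u ≤ s ≤ T}, which is also T^3.

```python
from math import isclose

def cum3_DT(f, T, m=120):
    """Midpoint Riemann approximation of Eq. (4.24):
    Cum3(D_T) = 6 * int_0^T ds int_0^s du int_0^u dt f(s,u) f(s,t) f(u,t)."""
    h = T / m
    total = 0.0
    for i in range(m):
        s = (i + 0.5) * h
        for j in range(i):
            u = (j + 0.5) * h
            for k in range(j):
                t = (k + 0.5) * h
                total += f(s, u) * f(s, t) * f(u, t)
    return 6.0 * total * h ** 3

T = 1.7
approx = cum3_DT(lambda s, u: 1.0, T)   # formula (4.24) with f identically 1
exact = T ** 3                          # Cum3((W_T^2 - T)/2) computed directly
```

The Riemann approximation undercounts the simplex slightly at this grid size, but agrees with the exact value to within a few percent.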
References

Aït-Sahalia, Y., Mykland, P.A., Zhang, L., 2005. How often to sample a continuous-time process in the presence of market microstructure noise. Review of Financial Studies 18, 351–416.
Andersen, T.G., Bollerslev, T., Diebold, F.X., Ebens, H., 2001a. The distribution of realized stock return volatility. Journal of Financial Economics 61, 43–76.
Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P., 2001b. The distribution of realized exchange rate volatility. Journal of the American Statistical Association 96, 42–55.
Awartani, B., Corradi, V., Distaso, W., 2006. Testing and modelling microstructure effects with an application to the Dow Jones industrial average. Working Paper. University of Warwick.
Bandi, F.M., Russell, J.R., 2008. Microstructure noise, realized volatility and optimal sampling. Review of Economic Studies 75, 339–369.
Barndorff-Nielsen, O.E., Hansen, P.R., Lunde, A., Shephard, N., 2008. Designing realized kernels to measure ex-post variation of equity prices in the presence of noise. Econometrica 76, 1481–1536.
Barndorff-Nielsen, O.E., Hansen, P.R., Lunde, A., Shephard, N., 2011. Subsampling realised kernels. Journal of Econometrics 160 (1), 204–219.
Barndorff-Nielsen, O.E., Shephard, N., 2002. Econometric analysis of realized volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society. Series B 64, 253–280.
Barndorff-Nielsen, O.E., Shephard, N., 2005. How accurate is the asymptotic approximation to the distribution of realized variance? In: Andrews, D.W., Stock, J.H. (Eds.), Identification and Inference for Econometric Models. A Festschrift in Honour of T.J. Rothenberg. Cambridge University Press, Cambridge, UK, pp. 306–311.
Bhattacharya, R.N., Ghosh, J., 1978. On the validity of the formal Edgeworth expansion. Annals of Statistics 6, 434–451.
Bickel, P.J., Götze, F., van Zwet, W.R., 1986. The Edgeworth expansion for u-statistics of degree two. The Annals of Statistics 14, 1463–1484.
Bollerslev, T., Zhou, H., 2002. Estimating stochastic volatility diffusions using conditional moments of integrated volatility. Journal of Econometrics 109, 33–65.
Brillinger, D.R., 1969. The calculation of cumulants via conditioning. Annals of the Institute of Statistical Mathematics 21, 215–218.
Chan, N.H., Wei, C.Z., 1987. Asymptotic inference for nearly nonstationary AR(1) processes. Annals of Statistics 15, 1050–1063.
Corradi, V., Distaso, W., 2006. Semiparametric comparison of stochastic volatility models via realized measures. Review of Economic Studies 73, 635–667.
Dacorogna, M.M., Gençay, R., Müller, U., Olsen, R.B., Pictet, O.V., 2001. An Introduction to High-Frequency Finance. Academic Press, San Diego.
Delattre, S., Jacod, J., 1997. A central limit theorem for normalized functions of the increments of a diffusion process, in the presence of round-off errors. Bernoulli 3, 1–28.
Feller, W., 1971. An Introduction to Probability Theory and its Applications, Volume 2. John Wiley & Sons, New York.
Gloter, A., Jacod, J., 2001. Diffusions with measurement errors. II—optimal estimators. European Series in Applied and Industrial Mathematics (ESAIM) 5, 243–260.
Gonçalves, S., Meddahi, N., 2008. Edgeworth corrections for realized volatility. Econometric Reviews 27, 139–162.
Gonçalves, S., Meddahi, N., 2009. Bootstrapping realized volatility. Econometrica 77, 283–306.
Hall, P., 1992. The Bootstrap and Edgeworth Expansion. Springer, New York.
Hansen, P.R., Lunde, A., 2006. Realized variance and market microstructure noise. Journal of Business and Economic Statistics 24, 127–218.
Jacod, J., 1994. Limit of random measures associated with the increments of a Brownian semimartingale. Tech. Rep. Université de Paris VI.
Jacod, J., Protter, P., 1998. Asymptotic error distributions for the Euler method for stochastic differential equations. Annals of Probability 26, 267–307.
Jacod, J., Li, Y., Mykland, P.A., Podolskij, M., Vetter, M., 2009. Microstructure noise in the continuous case: the pre-averaging approach. Stochastic Processes and their Applications 119, 2249–2276.
Kadane, J.B., 1971. Comparison of k-class estimators when the disturbances are small. Econometrica 39, 723–737.
Lieberman, O., Phillips, P.C., 2006. Refined inference on long memory in realized volatility. Cowles Foundation Discussion Paper No. 1549.
McCullagh, P., 1987. Tensor Methods in Statistics. Chapman and Hall, London, UK.
Meddahi, N., 2002. A theoretical comparison between integrated and realized volatility. Journal of Applied Econometrics 17, 479–508.
Mykland, P.A., 1993. Asymptotic expansions for martingales. Annals of Probability 21, 800–818.
Mykland, P.A., 1994. Bartlett type identities for martingales. Annals of Statistics 22, 21–38.
Mykland, P.A., 1995a. Embedding and asymptotic expansions for martingales. Probability Theory and Related Fields 103, 475–492.
Mykland, P.A., 1995b. Martingale expansions and second order inference. Annals of Statistics 23, 707–731.
Mykland, P.A., 2001. Likelihood computations without Bartlett identities. Bernoulli 7, 473–485.
Mykland, P.A., Zhang, L., 2006. ANOVA for diffusions and Itô processes. Annals of Statistics 34, 1931–1963.
Podolskij, M., Vetter, M., 2009. Estimation of volatility functionals in the simultaneous presence of microstructure noise and jumps. Bernoulli 15 (3), 634–658.
Speed, T.P., 1983. Cumulants and partition lattices. The Australian Journal of Statistics 25, 378–388.
Wallace, D.L., 1958. Asymptotic approximations to distributions. Annals of Mathematical Statistics 29, 635–654.
Zhang, L., 2001. From martingales to ANOVA: implied and realized volatility. Ph.D. Thesis. The University of Chicago, Department of Statistics.
Zhang, L., 2006. Efficient estimation of stochastic volatility using noisy observations: a multi-scale approach. Bernoulli 12, 1019–1043.
Zhang, L., Mykland, P.A., Aït-Sahalia, Y., 2005. A tale of two time scales: determining integrated volatility with noisy high-frequency data. Journal of the American Statistical Association 100, 1394–1411.
Zhou, B., 1996. High-frequency data and volatility in foreign-exchange rates. Journal of Business and Economic Statistics 14, 45–52.
Zhou, B., 1998. F-consistency, de-volatization and normalization of high frequency financial data. In: Dunis, C.L., Zhou, B. (Eds.), Nonlinear Modelling of High Frequency Financial Time Series. John Wiley & Sons Ltd., New York, pp. 109–123.
Zumbach, G., Dacorogna, M., Olsen, J., Olsen, R., 1999. Introducing a scale of market shocks. Tech. Rep. Olsen & Associates.
Journal of Econometrics 160 (2011) 204–219
Subsampling realised kernels✩

Ole E. Barndorff-Nielsen a,b, Peter Reinhard Hansen c, Asger Lunde d,b,∗, Neil Shephard e,f

a The T.N. Thiele Centre for Mathematics in Natural Science, Department of Mathematical Sciences, Aarhus University, Ny Munkegade, DK-8000 Aarhus C, Denmark
b CREATES, Aarhus University, Denmark
c Department of Economics, Stanford University, Landau Economics Building, 579 Serra Mall, Stanford, CA 94305-6072, USA
d School of Economics and Management, Aarhus University, Building 1322, Bartholins Allé 10, DK-8000 Aarhus C, Denmark
e Oxford-Man Institute, University of Oxford, Eagle House, Walton Well Road, Oxford OX2 6EE, UK
f Department of Economics, University of Oxford, UK

Article history: Available online 6 March 2010
Keywords: Long run variance estimator; Market frictions; Quadratic variation; Realised kernel; Realised variance; Subsampling
abstract In a recent paper we have introduced the class of realised kernel estimators of the increments of quadratic variation in the presence of noise. We showed that this estimator is consistent and derived its limit distribution under various assumptions on the kernel weights. In this paper we extend our analysis, looking at the class of subsampled realised kernels and we derive the limit theory for this class of estimators. We find that subsampling is highly advantageous for estimators based on discontinuous kernels, such as the truncated kernel. For kinked kernels, such as the Bartlett kernel, we show that subsampling is impotent, in the sense that subsampling has no effect on the asymptotic distribution. Perhaps surprisingly, for the efficient smooth kernels, such as the Parzen kernel, we show that subsampling is harmful as it increases the asymptotic variance. We also study the performance of subsampled realised kernels in simulations and in empirical work. © 2010 Elsevier B.V. All rights reserved.
1. Introduction High frequency financial data allow us to estimate the increments to quadratic variation, the usual ex post measure of the variation of asset prices (e.g. Andersen et al. (2001) and BarndorffNielsen and Shephard (2002)). Common estimators, such as the realised variance, can be sensitive to market frictions when applied to returns recorded over shorter time intervals such as 1 min, or even more ambitiously, 1 s (e.g. Zhou (1996), Fang (1996) and Andersen et al. (2000)). In response, two non-parametric generalisations have been proposed: subsampling and realised kernels by Zhang et al. (2005) and Barndorff-Nielsen et al. (2008), respectively. Here we partially unify these approaches by studying the properties of subsampled realised kernels.
Our interest is the estimation of the increment to quadratic variation over some arbitrary fixed time period written as [0, t], which could represent a day say, using estimators of the realised kernel type. For a continuous time log-price process X and time gap δ > 0, the flat-top¹ realised kernels of Barndorff-Nielsen et al. (2008) take on the following form:

K(X_δ) = γ_0(X_δ) + ∑_{h=1}^H k((h − 1)/H) {γ_h(X_δ) + γ_{−h}(X_δ)},   H ≥ 1.
Here k(x), x ∈ [0, 1], is a weight function with k(0) = 1, k(1) = 0, while

γ_h(X_δ) = ∑_{j=1}^{n_δ} x_j x_{j−h},   x_j = X_{δj} − X_{δ(j−1)},   h = −H, . . . , −1, 0, 1, . . . , H,

✩ The second and fourth author are also affiliated with CREATES, a research centre
funded by the Danish National Research Foundation. The Ox language of Doornik (2006) was used to perform the calculations reported here. We are grateful to Joseph Romano, Per Mykland and Nour Meddahi (the editors), and two anonymous referees for their suggestions that improved this manuscript. All errors remain our responsibility. ∗ Corresponding author at: School of Economics and Management, Aarhus University, Building 1322 Bartholins Allé 10, DK-8000 Aarhus C, Denmark. E-mail addresses:
[email protected] (O.E. Barndorff-Nielsen),
[email protected] (P.R. Hansen),
[email protected] (A. Lunde),
[email protected] (N. Shephard). 0304-4076/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2010.03.031
with nδ = ⌊t /δ⌋. Think of δ as being small and so xj represents the jth high frequency return, while γ0 (Xδ ) is the realised variance of X . The above authors gave a relatively exhaustive treatment of K (Xδ ) when X is a Brownian semimartingale plus noise. It is important to distinguish three classes of kernel functions k(x): smooth, kinked, and discontinuous. Examples are the Parzen,
1 It is called a flat-top estimator as it imposes that the weight at lag 1 is 1. The motivation for this is discussed extensively in Barndorff-Nielsen et al. (2008).
O.E. Barndorff-Nielsen et al. / Journal of Econometrics 160 (2011) 204–219
the Bartlett and the truncated kernel, respectively. Barndorff-Nielsen et al. (2008) have shown that the smooth class, which satisfy k′(0) = k′(1) = 0, lead to realised kernels that converge at the efficient rate n_δ^{1/4}, whereas the kinked kernels, which do not satisfy k′(0) = k′(1) = 0, lead to realised kernels that converge at rate n_δ^{1/6}. The discontinuous kernels lead to inconsistent estimators, as we show in Section 3.4. Realised kernels use returns computed starting at t = 0. There may be efficiency gains by jittering the initial value S times, illustrated in Fig. 1, producing S sets of high frequency returns x_j^s, s = 1, 2, . . . , S. Zhang et al. (2005) made this point for realised variances. We can then average the resulting S realised kernel estimators:

K(X_δ; S) = (1/S) ∑_{s=1}^S K^s(X_δ),

where

K^s(X_δ) = γ_0^s(X_δ) + ∑_{h=1}^H k((h − 1)/H) {γ_h^s(X_δ) + γ_{−h}^s(X_δ)},
γ_h^s(X_δ) = ∑_{j=1}^{n_δ} x_j^s x_{j−h}^s,   x_j^s = X_{δ(j + (s−1)/S)} − X_{δ(j + (s−1)/S − 1)}.
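The estimators just defined can be sketched compactly. The following is a minimal illustration (function names ours): it treats γ_{−h} as equal to γ_h and ignores the end-effect conventions of the exact definition.

```python
import numpy as np

def gamma_h(x, h):
    """Realised autocovariance gamma_h of the high-frequency returns x."""
    h = abs(h)
    if h >= len(x):
        return 0.0
    return float(np.dot(x[h:], x[:len(x) - h]))

def realised_kernel(x, k, H):
    """Flat-top realised kernel: gamma_0 + sum_h k((h-1)/H)(gamma_h + gamma_{-h}),
    treating gamma_{-h} = gamma_h (end effects ignored)."""
    return gamma_h(x, 0) + sum(k((h - 1) / H) * 2.0 * gamma_h(x, h)
                               for h in range(1, H + 1))

def subsampled_realised_kernel(p, k, H, S):
    """Average of the S realised kernels on the jittered return grids.

    p holds log-prices on the fine grid of spacing delta/S; subsample s uses
    every S-th price starting at offset s."""
    return float(np.mean([realised_kernel(np.diff(p[s::S]), k, H)
                          for s in range(S)]))
```

With S = 1 the subsampled estimator reduces to the plain realised kernel on the coarse grid, which is a convenient consistency check.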
We call K(X_δ; S) the subsampled realised kernel — noting that this form of subsampling is different from the conventional form of subsampling, as we discuss below. Here we show that subsampling is very useful for the class of discontinuous kernels, because subsampling makes these estimators consistent and converge in distribution at rate n^{1/6}, where n = S n_δ is the effective sample size. Zhou (1996) used a simple discontinuous kernel and gave a brief discussion of subsampling of that kernel. We will see that his estimator can be made consistent by allowing S → ∞ as n → ∞, a result which is implicit in his paper, but one that he did not explicitly draw out. For the class of kinked kernels, we show that subsampling is impotent, in the sense that the asymptotic distribution is the same whether subsampling is used or not. Finally, we show that subsampling is harmful when applied to smooth kernels. In fact, if the number of subsamples, S, increases with the sample size, n, the best rate of convergence is reduced to less than the efficient one, n^{1/4}. The intuition for these results follows from Lemma A.1 in the Appendix. It shows that

γ_h(X_δ; S) = (1/S) ∑_{s=1}^S γ_h^s(X_δ) ≃ ∑_{s=−S+1}^{S−1} k_B(s/S) γ_{Sh+s}(X_{δ/S}),   where k_B(x) = 1 − |x|,
where the approximation is due to subtle end effects. The implication is that

K(X_δ; S) ≃ ∑_{s=−S+1}^{S−1} k_B(s/S) γ_s(X_{δ/S}) + ∑_{h=1}^H k((h − 1)/H) ∑_{s=−S+1}^{S−1} k_B(s/S) {γ_{Sh+s}(X_{δ/S}) + γ_{−Sh−s}(X_{δ/S})}
= ∑_{h=0}^{HS} k_S((h − 1)/(HS)) γ̃_h(X_{δ/S}),

with γ̃_h = γ_h + γ_{−h}.
So a subsampled realised kernel is a realised kernel simply operating on a higher (ignoring end effects). The implied h frequency kernel weights, kS HS , h = 1, . . . , SH, are convex combinations of neighboring weights of the original kernel,
kS
hs HS
=
S−s S
k
h S
s
+ k S
h+1
S
h = 0, . . . , H , s = 1, . . . , S .
, (1)
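Eq. (1) is easy to probe numerically. The sketch below (our own illustrative code, with hypothetical function names) computes the implied weights and checks the property, visible in Fig. 2, that the Bartlett kernel is left unchanged by subsampling: the convex combination of neighbouring Bartlett weights reproduces the Bartlett kernel on the finer lag grid.

```python
def implied_weight(k, H, S, h, s):
    # Eq. (1): the weight attached to the fine-grid lag S*h + s is a convex
    # combination of the neighbouring original weights k(h/H) and k((h+1)/H).
    return (S - s) / S * k(h / H) + s / S * k((h + 1) / H)

bartlett = lambda x: 1.0 - x
H, S = 8, 5
for h in range(H):            # interior lags; the last lag needs k(x) = 0 for x > 1
    for s in range(1, S + 1):
        fine_lag = (S * h + s) / (S * H)
        assert abs(implied_weight(bartlett, H, S, h, s) - bartlett(fine_lag)) < 1e-12
```

For kernels that are not piecewise linear the implied weights differ from the original kernel, which is exactly the piecewise linearity that subsampling imposes on smooth kernels.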
Fig. 1. x_j^1 are the usual returns. The bottom series are the offset returns x_j^s, s = 2, …, S.
In Fig. 2 we trace out the implied kernel weights for three subsampled realised kernels. The left panels display the original kernel functions and the right panels display the implied kernel functions. For the truncated kernel (H = 1) subsampling leads to a substantially different implied kernel function—the trapezoidal kernel of Politis and Romano (1995). For the kinked Bartlett kernel, subsampling leads to the same kernel function. For a smooth kernel function, the original and implied kernel functions are fairly similar; however, subsampling does impose some piecewise linearity, which is the reason that subsampling of smooth kernels increases the asymptotic variance. The connection between subsampled realised kernels and realised kernels is perhaps not too surprising, because Bartlett (1950) motivated his kernel with the subsampling idea. The conventional form of subsampling is based on subseries that consist of consecutive observations. This is different from the case for our subsamples, which consist of every Sth observation. These are called subgrids in Zhang et al. (2005). While the two types of subsampling are different, they can result in identical estimators. For instance, the sparsely sampled realised variance, γ_0^1(X_δ), is identical to Carlstein's subsample estimator (of the variance of a sample mean when the mean is zero); see Carlstein (1986). Carlstein's estimator is based on non-overlapping subseries, and Künsch (1989) analysed the closely related estimator on the basis of overlapping subseries. Interestingly, the (overlapping) subsample estimator of Künsch (1989) is identical to the average sparsely sampled realised variance called ''second best'' in Zhang et al. (2005), so the TSRV and MSRV estimators, by Zhang et al. (2005), Aït-Sahalia et al. (forthcoming), and Zhang (2006), can be expressed as linear combinations of two or more subsample estimators of the overlapping subseries type by Künsch (1989).
For additional details on the relation between the Bartlett kernel and various subsample estimators, see Anderson (1971, p. 512), Priestley (1981, pp. 439–440), and Politis et al. (1999, pp. 95–98). This paper has the following structure. We present the basic framework in Section 2 along with some known results. In Section 3 we present our main results. Here we derive the limit theory for subsampled realised kernels and show that subsampling cannot improve realised kernels within a very broad class of estimators. In Section 4, we give some specific recommendations on the empirical implementation of subsampled realised kernels and on how to conduct valid inference in this context. We present results from a small simulation study in Section 5 and an empirical application in Section 6. We conclude in Section 7 and present all proofs in an Appendix.

2. Notation, definitions and background

2.1. Semimartingales and quadratic variation

The fundamental theory of asset prices says that the log-price at time t, Y_t, must, in a frictionless arbitrage-free market, obey a semimartingale process (written as Y ∈ 𝒮ℳ) on some filtered
O.E. Barndorff-Nielsen et al. / Journal of Econometrics 160 (2011) 204–219
[Fig. 2 appears here: six panels showing, for the truncated (Zhou), Bartlett, and TH2 kernels, the original kernel function (left) and the implied kernel function under subsampling (right).]
Fig. 2. The effects of subsampling some kernels. The left panels display the original kernel functions and the right panels display their implied kernel functions that are induced by subsampling. For the truncated (discontinuous) kernel the two are very different. So subsampling makes an important difference in this case. For the (kinked) Bartlett kernel the two are virtually identical, which suggests that subsampling has no effect on this kernel. Finally, for the smooth kernel in the lower panels, subsampling has only a small effect by making the kernel function piecewise linear.
probability space (Ω, ℱ, (ℱ_t)_{t ≥ T*}, P), where T* ≤ 0. Crucial to semimartingales, and to the economics of financial risk, is the quadratic variation (QV) process of Y ∈ 𝒮ℳ. This can be defined as

\[
[Y]_t = \operatorname*{plim}_{N\to\infty} \sum_{j=1}^{N}\left(Y_{t_j} - Y_{t_{j-1}}\right)^2, \tag{2}
\]
(e.g. Protter (2004, pp. 66–77) and Jacod and Shiryaev (2003, p. 51)) for any sequence of deterministic partitions 0 = t_0 < t_1 < ⋯ < t_N = t with sup_j {t_{j+1} − t_j} → 0 for N → ∞. The most familiar semimartingales are of Brownian semimartingale type (Y ∈ ℬ𝒮ℳ):

\[
Y_t = \int_0^t a_u\,\mathrm{d}u + \int_0^t \sigma_u\,\mathrm{d}W_u, \tag{3}
\]
where a is a predictable locally bounded drift, σ is a càdlàg volatility process and W is a Brownian motion. If Y ∈ ℬ𝒮ℳ then [Y]_t = ∫_0^t σ_u² du. In some of our asymptotic theory we also assume, for simplicity of exposition, that

\[
\sigma_t = \sigma_0 + \int_0^t a^{\#}_u\,\mathrm{d}u + \int_0^t \sigma^{\#}_u\,\mathrm{d}W_u + \int_0^t v^{\#}_u\,\mathrm{d}V_u, \tag{4}
\]
where a^#, σ^# and v^# are adapted càdlàg processes, with a^# also being predictable and locally bounded, and V is a Brownian motion independent of W. Much of what we do here can be extended to allow for jumps in σ (cf. Barndorff-Nielsen et al. (2006)).

2.2. Assumptions about noise

We write the effects of market frictions as U, so we observe the process

X = Y + U. (5)
Our scientific interest will be in estimating [Y]_t. In the main part of our work we will assume that Y ⫫ U where, in general, A ⫫ B denotes that A and B are independent. From a market microstructure theory viewpoint this is a strong assumption, as one may expect U to be correlated with increments in Y. However, the empirical work of Hansen and Lunde (2006) suggests that this independence assumption is not too damaging statistically when we analyse data on thickly traded stocks recorded approximately every minute (see also Kalnina and Linton (2008)). We make a white noise assumption about the U process (U ∈ 𝒲𝒩):

\[
\mathrm{E}(U_t) = 0, \qquad \mathrm{Var}(U_t) = \omega^2, \qquad \mathrm{Var}(U_t^2) = \lambda^2\omega^4, \qquad U_t \mathrel{\perp\!\!\!\perp} U_s, \tag{6}
\]
for any t ≠ s, where λ ∈ ℝ₊. This white noise assumption is unsatisfactory but is a useful starting point if we think of the market frictions as operating in tick time (e.g. Bandi and Russell (2008), Zhang et al. (2005) and Hansen and Lunde (2006)). By analogy to the realised autocovariances we define

\[
\gamma_h(Y_\delta, U_\delta) = \sum_{j=1}^{n_\delta} y_j u_{j-h}, \qquad \text{where } y_j = Y_{\delta j} - Y_{\delta(j-1)} \text{ and } u_j = U_{\delta j} - U_{\delta(j-1)}.
\]

From (5) we have that γ_h(X_δ) = γ_h(Y_δ) + γ_h(Y_δ, U_δ) + γ_h(U_δ, Y_δ) + γ_h(U_δ). It will be useful to have the following notation: γ(X_δ) = {γ̃_0(X_δ), γ̃_1(X_δ), …, γ̃_H(X_δ)}′, where γ̃_h(X_δ) = γ_h(X_δ) + γ_{−h}(X_δ), and to introduce the analogous definitions of γ(Y_δ), γ(U_δ), and γ(Y_δ, U_δ).

3. The subsampled realised kernel

Here we study subsampled realised kernels on the basis of smooth and kinked kernel functions. Specifically, we require that k(x) is continuous and twice differentiable on [0, 1] and that k(0) = 1 and k(1) = 0. Naturally, the derivatives at the end points are defined by k′(0) = lim_{x↓0} {k(x) − k(0)}/x and k′(1) = lim_{x↑1} {k(1) − k(x)}/(1 − x). Without subsampling, Barndorff-Nielsen et al. (2008) showed that

k′(0) = 0 and k′(1) = 0 (7)

is a necessary condition for a realised kernel to have the best rate of convergence, and this property is also key for subsampled realised kernels—see also the work of Zhang (2006) on using subsampling of realised variance to obtain the same rate of convergence. We shall refer to continuous kernels that satisfy (7) as smooth; otherwise they are called kinked. In some of our proofs it is convenient to extend the support of the kernel functions beyond the unit interval, using the conventions k(x) = 0 for x > 1 and k(−x) = k(x). Barndorff-Nielsen et al. (2008) showed that kernel functions of this type can be used to produce consistent estimators with mixed Gaussian asymptotic distributions. It is therefore interesting to analyse whether there are any gains from subsampling realised kernels or not. Perhaps surprisingly, we find that subsampling is harmful or, at best, impotent, for realised kernels that are based on smooth or kinked kernel functions. Below we formulate limit results for subsampled realised kernels using the notation

\[
k_\bullet^{0,0} = \int_0^1 k(x)^2\,\mathrm{d}x, \qquad
k_\bullet^{1,1} = \int_0^1 k'(x)^2\,\mathrm{d}x, \qquad
k_\bullet^{2,2} = \int_0^1 k''(x)^2\,\mathrm{d}x,
\]
\[
\xi = \frac{\omega^2}{\sqrt{t\int_0^t \sigma_u^4\,\mathrm{d}u}}, \qquad
\rho = \frac{\int_0^t \sigma_u^2\,\mathrm{d}u}{\sqrt{t\int_0^t \sigma_u^4\,\mathrm{d}u}},
\]

and we define K̃(X_δ; S) = K(X_δ; S) + Δ_{H,n}^S, where

\[
\Delta_{H,n}^S = \frac{1}{S}\sum_{s=1}^{S}\sum_{h=1}^{H}\left\{k\!\left(\frac{h+1}{H}\right) - k\!\left(\frac{h-1}{H}\right)\right\} R_{h,n}^s,
\qquad
R_{h,n}^s = \tfrac{1}{2}\left(U_{t_n^s}U_{t_n^s+h\delta} + U_{t_0^s}U_{t_0^s-h\delta} - U_{t_n^s}U_{t_n^s-h\delta} - U_{t_0^s}U_{t_0^s+h\delta}\right),
\]

with t_0^s = {(s − 1)/S}δ and t_n^s = t + {(s − 1)/S}δ. So Δ_{H,n}^S is related to end effects.

Theorem 1. For large H and n the asymptotic distributions of

\[
K(Y_\delta; S) - \int_0^t \sigma_u^2\,\mathrm{d}u, \tag{8}
\]
\[
K(Y_\delta, U_\delta; S) + K(U_\delta, Y_\delta; S), \tag{9}
\]
\[
K(U_\delta; S), \tag{10}
\]

are mixed Gaussian, uncorrelated, with mean zero and asymptotic variances given by

\[
4t\,\frac{H}{n_\delta}\int_0^t \sigma_u^4\,\mathrm{d}u\; k_\bullet^{0,0}, \qquad
8\omega^2\int_0^t \sigma_u^2\,\mathrm{d}u\; k_\bullet^{1,1}H^{-1}/S, \qquad
4\omega^4 n_\delta\left\{\left(k'(0)^2 + k'(1)^2\right)H^{-2} + k_\bullet^{2,2}H^{-3}\right\}/S,
\]

respectively, and the asymptotic variance of Δ_{H,n}^S is 4ω⁴k_•^{1,1}/(HS). Furthermore, K̃(X_δ; S) − ∫_0^t σ_u² du is mixed Gaussian with a zero mean and variance

\[
4t\int_0^t \sigma_u^4\,\mathrm{d}u\left[\frac{H}{n_\delta}k_\bullet^{0,0} + \frac{2\xi\rho k_\bullet^{1,1}H^{-1} + \xi^2 n_\delta\left\{\left(k'(0)^2 + k'(1)^2\right)H^{-2} + k_\bullet^{2,2}H^{-3}\right\}}{S}\right]. \tag{11}
\]

Subsampling has no impact on the first term, (8). This is despite the fact that subsampling lowers the variance of the individual realised autocovariances, γ̃_h(Y_δ). This is because subsampling introduces positive correlation between γ̃_h(Y_δ; S) and γ̃_{h+1}(Y_δ; S) that exactly offsets the reduction in the variance of the realised autocovariances. Subsampling does reduce the variances of the terms affected by noise, (9) and (10), by a factor of S. The auxiliary quantity, K̃(X_δ; S), is introduced to simplify the exposition of our results. K̃(X_δ; S) and K(X_δ; S) are often asymptotically equivalent because their difference, Δ_{H,n}^S, vanishes at a sufficiently fast rate. This is made precise in the following lemma.

Lemma 1. If k′(0)² + k′(1)² ≠ 0 or S → ∞, then avar{K(X_δ) − K̃(X_δ)}/avar{K̃(X_δ)} = o(1). If k′(0)² + k′(1)² = 0 then

\[
\frac{\operatorname{avar}\{K(X_\delta) - \tilde{K}(X_\delta)\}}{\operatorname{avar}\{\tilde{K}(X_\delta)\}}
\le \xi\Big/\left\{2 + 2\sqrt{k_\bullet^{2,2}k_\bullet^{0,0}/(k_\bullet^{1,1})^2}\right\}.
\]

We shall state several asymptotic results for n^γ{K̃(X_δ) − ∫_0^t σ_u² du}. An implication of Lemma 1 is that K(X_δ) can be substituted for K̃(X_δ) whenever γ < 1/4. When γ = 1/4 the difference between K(X_δ) and K̃(X_δ) is not trivial in an asymptotic sense, but for all practical purposes their difference is negligible, the reason being that a realistic empirical value for ξ is ξ ≤ 0.01. With the original Tukey–Hanning kernel the relative variance in Lemma 1 is no larger than 1/{200(1 + √3)} ≈ 0.00183. The most obvious generalisation of Barndorff-Nielsen et al. (2008) is to think of the case where S is fixed and we allow H to increase with n_δ. When (7) holds, we can follow Barndorff-Nielsen et al. (2008) and set H = c(ξn_δ)^{1/2}. Then we obtain the result that, where Ls denotes convergence in law stably (e.g. Barndorff-Nielsen et al. (2006)),

\[
n_\delta^{1/4}\left\{\tilde{K}(X_\delta; S) - \int_0^t \sigma_u^2\,\mathrm{d}u\right\}
\stackrel{Ls}{\to} \mathrm{MN}\!\left(0,\; 4\omega\left(t\int_0^t \sigma_u^4\,\mathrm{d}u\right)^{3/4}\left\{c\,k_\bullet^{0,0} + \frac{2c^{-1}\rho k_\bullet^{1,1} + c^{-3}k_\bullet^{2,2}}{S}\right\}\right).
\]

Whether or not (7) holds, when we set H = c(ξn_δ)^{2/3} we have

\[
n_\delta^{1/6}\left\{\tilde{K}(X_\delta; S) - \int_0^t \sigma_u^2\,\mathrm{d}u\right\}
\stackrel{Ls}{\to} \mathrm{MN}\!\left(0,\; 4\omega^{4/3}\left(t\int_0^t \sigma_u^4\,\mathrm{d}u\right)^{2/3}\left\{c\,k_\bullet^{0,0} + \frac{k'(0)^2 + k'(1)^2}{c^2 S}\right\}\right).
\]
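The constants k_•^{0,0}, k_•^{1,1} and k_•^{2,2} are simple one-dimensional integrals and can be checked numerically. The sketch below is our own (a midpoint rule, with the derivative formulas worked out by hand for the Parzen kernel of Table 1); the values agree with the 0.269, 1.50 and 24.0 reported later in Table 2.

```python
import numpy as np

def kP(x):   # Parzen kernel
    return np.where(x <= 0.5, 1 - 6*x**2 + 6*x**3, 2*(1 - x)**3)

def kP1(x):  # first derivative k'
    return np.where(x <= 0.5, -12*x + 18*x**2, -6*(1 - x)**2)

def kP2(x):  # second derivative k''
    return np.where(x <= 0.5, -12 + 36*x, 12*(1 - x))

m = 200000
x = (np.arange(m) + 0.5) / m               # midpoint rule on [0, 1]
k00, k11, k22 = (float(np.mean(f(x)**2)) for f in (kP, kP1, kP2))
print(k00, k11, k22)                       # approx 0.269, 1.50, 24.0
```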
Here S plays a relatively simple role, reducing the impact of noise—by in effect reducing the noise variance from ω² to ω²/√S. If (7) does hold then we get

\[
n_\delta^{1/6}\left\{\tilde{K}(X_\delta; S) - \int_0^t \sigma_u^2\,\mathrm{d}u\right\}
\stackrel{Ls}{\to} \mathrm{MN}\!\left(0,\; 4c\,k_\bullet^{0,0}\,\omega^{4/3}\left(t\int_0^t \sigma_u^4\,\mathrm{d}u\right)^{2/3}\right),
\]

which implies no asymptotic gains at all from subsampling.

3.1. The effective sample size

The effectiveness of subsampling can be assessed in terms of the effective sample size, n = n_δ S. This makes explicit that a larger S reduces the sample size, n_δ, that is available for each of the realised kernels. We then ask whether it is better to increase n_δ or S for a given n. In terms of n, (11) becomes

\[
4t\int_0^t \sigma_u^4\,\mathrm{d}u\left[\frac{HS}{n}k_\bullet^{0,0} + \frac{2\xi\rho k_\bullet^{1,1}}{HS} + n\xi^2\left\{\frac{k'(0)^2 + k'(1)^2}{(HS)^2} + S\,\frac{k_\bullet^{2,2}}{(HS)^3}\right\}\right]. \tag{12}
\]

Here HS appears in the variance expression in a way that is almost identical to that for H when there is no subsampling (S = 1). The only difference is the impact on the last term. This term vanishes when k′(0) = k′(1) = 0 does not hold, because the term second to last is then O(n/(SH)²) whereas the last term is only O(H^{−1})O(n/(SH)²). This feature of the asymptotic variance holds the key to the different results that we derive for smooth and kinked kernels.

3.2. Kinked kernels: when k′(0) = k′(1) = 0 does not hold

When (7) does not hold the asymptotic variance of K̃(X_δ; S) is given by

\[
4t\int_0^t \sigma_u^4\,\mathrm{d}u\left[\frac{HS}{n}k_\bullet^{0,0} + \frac{2\xi\rho k_\bullet^{1,1}}{HS} + n\xi^2\,\frac{k'(0)^2 + k'(1)^2}{(HS)^2}\right].
\]

While this expression depends on the product HS, it is invariant to the particular values of H and S, though we do need H → ∞ to justify the terms k_•^{0,0}, k_•^{1,1}, etc. We have the following result.

Theorem 2. (i) If SH = c(ξn)^{2/3} we have

\[
n^{1/6}\left\{\tilde{K}(X_\delta; S) - \int_0^t \sigma_u^2\,\mathrm{d}u\right\}
\stackrel{Ls}{\to} \mathrm{MN}\!\left(0,\; 4\omega^{4/3}\left(t\int_0^t \sigma_u^4\,\mathrm{d}u\right)^{2/3}\left\{c\,k_\bullet^{0,0} + \frac{k'(0)^2 + k'(1)^2}{c^2}\right\}\right), \tag{13}
\]

as n → ∞, as long as H increases with n. (ii) The asymptotic variance is minimised by

\[
c = \left(2\,\frac{k'(0)^2 + k'(1)^2}{k_\bullet^{0,0}}\right)^{1/3},
\]

and

\[
6c\,k_\bullet^{0,0}\,\omega^{4/3}\left(t\int_0^t \sigma_u^4\,\mathrm{d}u\right)^{2/3}
\]

is the lower bound for the asymptotic variance. Thus (13) is not influenced by S, not even by the rate of growth in S. All that matters is that H grows and that HS grows at the right rate. The implication is that there are no gains from subsampling when k′(0)² + k′(1)² ≠ 0. So we might as well set S = 1 and use the realised kernel that does not require any subsampling. The second part of Theorem 2 shows that

\[
6c\,k_\bullet^{0,0} = 6\cdot 2^{1/3}\left(k_\bullet^{0,0}\right)^{2/3}\left\{k'(0)^2 + k'(1)^2\right\}^{1/3}
\]

controls the asymptotic efficiency of estimators in this class.

Example 1. The Bartlett kernel, k(x) = 1 − x, has k_•^{0,0} = 1/3 and k′(0)² + k′(1)² = 2, so 6ck_•^{0,0} = 2·12^{1/3} ≃ 4.58, whereas the quadratic kernel, k(x) = 1 − 2x + x², is more efficient, because it has k_•^{0,0} = 1/5 and k′(0)² + k′(1)² = 4, so 6ck_•^{0,0} = 12·5^{−2/3} ≃ 4.10.

3.3. Smooth kernels: when k′(0) = k′(1) = 0 holds

In this section we consider smooth kernel functions. Some examples of smooth kernel functions are given in Table 1, where k_{th1}(x) is the original Tukey–Hanning kernel.

Table 1
Some smooth kernel functions.

Cubic kernel:          k_C(x) = 1 − 3x² + 2x³
Parzen kernel:         k_P(x) = 1 − 6x² + 6x³ for 0 ≤ x ≤ 1/2; 2(1 − x)³ for 1/2 ≤ x ≤ 1
Tukey–Hanning_p:       k_{thp}(x) = sin²{π/2 (1 − x)^p}

We know from Barndorff-Nielsen et al. (2008) that the rate of convergence of realised kernels improves when k′(0) = k′(1) = 0. This smoothness condition will also improve the rate of convergence for subsampled realised kernels. For smooth kernel functions, the asymptotic variance is given by

\[
4t\int_0^t \sigma_u^4\,\mathrm{d}u\left[\frac{HS}{n}k_\bullet^{0,0} + \frac{2\xi\rho k_\bullet^{1,1}}{HS} + \xi^2 n S\,\frac{k_\bullet^{2,2}}{(HS)^3}\right]. \tag{14}
\]

Because the last term is multiplied by S it is evident that the asymptotic distribution will depend on whether S is constant or increases with n. This is made precise in the following theorem.

Theorem 3. (i.a) When S is fixed we set HS = c(ξn)^{1/2} and have

\[
n^{1/4}\left\{\tilde{K}(X_\delta; S) - \int_0^t \sigma_u^2\,\mathrm{d}u\right\}
\stackrel{Ls}{\to} \mathrm{MN}\!\left(0,\; 4\omega\left(t\int_0^t \sigma_u^4\,\mathrm{d}u\right)^{3/4}\left\{c\,k_\bullet^{0,0} + \frac{2\rho}{c}k_\bullet^{1,1} + S\,\frac{k_\bullet^{2,2}}{c^3}\right\}\right). \tag{15}
\]

(i.b) When S = an^α for some 0 < α < 2/3, we set HS = c(ξn)^{1/2}n^{α/4} and have

\[
n^{(1-\alpha/2)/4}\left\{\tilde{K}(X_\delta; S) - \int_0^t \sigma_u^2\,\mathrm{d}u\right\}
\stackrel{Ls}{\to} \mathrm{MN}\!\left(0,\; 4\omega\left(t\int_0^t \sigma_u^4\,\mathrm{d}u\right)^{3/4}\left\{c\,k_\bullet^{0,0} + a\,\frac{k_\bullet^{2,2}}{c^3}\right\}\right).
\]

(ii) Whether S is constant or not, the asymptotic variance is minimised by

\[
HS = (\xi n)^{1/2}\sqrt{\frac{\rho k_\bullet^{1,1}}{k_\bullet^{0,0}}\left(1 + \sqrt{1 + 3S\,\frac{k_\bullet^{0,0}k_\bullet^{2,2}}{(\rho k_\bullet^{1,1})^2}}\right)},
\]

and the lower bound is

\[
n^{-1/2}\,\omega\left(t\int_0^t \sigma_u^4\,\mathrm{d}u\right)^{3/4} g(S), \tag{16}
\]
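For the kinked case, the minimised constant of Theorem 2 can be checked directly; the small sketch below (our own code, with hypothetical function names) reproduces the two numbers of Example 1.

```python
def kinked_constant(k00, dk_sq):
    # Theorem 2(ii): c = (2 (k'(0)^2 + k'(1)^2) / k00)^{1/3}, and the lower
    # bound on the asymptotic variance is proportional to 6 c k00.
    c = (2 * dk_sq / k00) ** (1 / 3)
    return 6 * c * k00

bartlett_const  = kinked_constant(1/3, 2.0)   # k(x) = 1 - x        -> 2 * 12^{1/3}
quadratic_const = kinked_constant(1/5, 4.0)   # k(x) = 1 - 2x + x^2 -> 12 * 5^{-2/3}
print(round(bartlett_const, 2), round(quadratic_const, 2))   # 4.58 4.1
```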
where

\[
g(S) = \frac{16}{3}\sqrt{\rho k_\bullet^{1,1}k_\bullet^{0,0}}\left[\sqrt{1 + \sqrt{1 + 3S\,\frac{k_\bullet^{0,0}k_\bullet^{2,2}}{(\rho k_\bullet^{1,1})^2}}}
\;+\;\left(1 + \sqrt{1 + 3S\,\frac{k_\bullet^{0,0}k_\bullet^{2,2}}{(\rho k_\bullet^{1,1})^2}}\right)^{-1/2}\right]. \tag{17}
\]

[Fig. 3 appears here, plotting the TH1, TH16, Cubic and Parzen kernels.]

Fig. 3. Some smooth kernels, k(x/c₁), using their respective optimal values of c when S = 1.
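The efficiency measure g(S) of Eq. (17) is straightforward to evaluate. The sketch below (ours; the function names are hypothetical) reproduces the cubic-kernel row of Table 2 and confirms numerically that g(S) increases in S, as Corollary 1 below asserts.

```python
import math

def g(S, k00, k11, k22, rho=1.0):
    # Relative-efficiency measure of Eq. (17).
    D = 1 + math.sqrt(1 + 3 * S * k00 * k22 / (rho * k11) ** 2)
    return 16 / 3 * math.sqrt(rho * k11 * k00) * (math.sqrt(D) + 1 / math.sqrt(D))

def c1(k00, k11, k22, rho=1.0):
    # Optimal constant for S = 1, so that HS = c1 (xi n)^{1/2}; cf. Theorem 3(ii).
    D = 1 + math.sqrt(1 + 3 * k00 * k22 / (rho * k11) ** 2)
    return math.sqrt(rho * k11 / k00 * D)

# Cubic kernel row of Table 2: k00 = 0.371, k11 = 1.20, k22 = 12.0
print(round(g(1, 0.371, 1.20, 12.0), 2))   # 9.03
print(round(g(2, 0.371, 1.20, 12.0), 2))   # 9.81
```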
Remark. In (i.b) we impose α < 2/3. The reason is that H ∝ n^{1/2+α/4−α} = n^{(1−3α/2)/2}, and we need 1 − 3α/2 > 0 to ensure that H grows with n.
The relative efficiency in this class of estimators is given by g(S), and we have the following important result for the subsampling of smooth kernels.

Corollary 1. The asymptotic variance of K̃(X_δ; S) is strictly increasing in S.

The implication is that subsampling is always harmful for smooth kernels. Furthermore, (i.b) shows that there is an efficiency loss from allowing S to grow with n. See Table 2 for the values of g(S) for some selected kernel functions. Another implication of Theorem 3 concerns the best way to sample high frequency returns. This result is formulated in the next corollary and will require some explanation.

Corollary 2. The asymptotic variance, (16), as a function of ρ, is minimised for ρ = 1.

The corollary is interesting because

\[
\rho = \frac{\int_0^t \sigma_u^2\,\mathrm{d}u}{\sqrt{t\int_0^t \sigma_u^4\,\mathrm{d}u}}
\]
depends on the sampling scheme by which intraday returns are obtained. So ρ can be interpreted as an asymptotic measure of heteroskedasticity in the intraday returns, where ρ = 1 corresponds to homoskedastic intraday returns. Rather than equidistant sampling in calendar time we can generate the sampling times using

\[
t_j = t \times \tau\!\left(\frac{j}{n}\right), \qquad j = 0, 1, \ldots, n,
\]
where τ is a time change (τ(0) = 0, τ(1) = 1, and τ is monotonically increasing, so 0 = t_0 ≤ t_1 ≤ ⋯ ≤ t_n = t). A change of time does not affect ∫_0^t σ_u² du but does influence the integrated quarticity ∫_0^t σ_u⁴ du; see e.g. Mykland and Zhang (2006). A particularly interesting sampling scheme is business time sampling (BTS) (see e.g. Oomen (2005, 2006)), which is the sampling scheme that minimises the integrated quarticity (see Hansen and Lunde (2006, p. 135)). It is easy to see that the time change associated with BTS, τ_bts(·) say, must solve

\[
\int_0^{t\times\tau_{bts}(s)} \sigma_u^2\,\mathrm{d}u = s \times \int_0^t \sigma_u^2\,\mathrm{d}u,
\]

and by the implicit function theorem we have τ′_bts(s) ∝ 1/σ²(s̃), where s̃ = t × τ_bts(s). The implication is that returns are sampled more frequently when the volatility is high and less frequently when the volatility is low under BTS. In general we have ρ ≤ 1, and Corollary 2 shows that BTS (ρ = 1) is the ideal sampling scheme. Naturally, sampling in business time is infeasible because τ_bts depends on the unknown volatility path. Still, Corollary 2 can be used as an argument in favour of sampling schemes that result in less heteroskedastic intraday returns than CTS.
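The BTS construction can be illustrated numerically. The sketch below is our own, for a hypothetical deterministic variance path σ²(u) on t = 1: it builds τ_bts by inverting the cumulative integrated variance, so that by construction every sampling interval carries the same share of integrated variance, whereas calendar-time sampling does not.

```python
import numpy as np

sigma2 = lambda u: 1 + 0.8 * np.sin(2 * np.pi * u)   # hypothetical sigma^2_u, t = 1

grid = np.linspace(0.0, 1.0, 100001)
# cumulative integrated variance IV(u) = int_0^u sigma^2 dv (trapezoidal rule)
iv = np.concatenate(([0.0],
        np.cumsum((sigma2(grid[:-1]) + sigma2(grid[1:])) / 2 * np.diff(grid))))

n = 20
# BTS: t_j = tau_bts(j/n) solves IV(t_j) = (j/n) IV(1), i.e. invert IV by interpolation
t_bts = np.interp(np.arange(n + 1) / n * iv[-1], iv, grid)
var_bts = np.diff(np.interp(t_bts, grid, iv))        # per-interval variance shares
var_cts = np.diff(np.interp(np.linspace(0, 1, n + 1), grid, iv))

print(float(np.ptp(var_bts)), float(np.ptp(var_cts)))  # BTS spread ~ 0; CTS spread is not
```

Note that `np.interp` requires the cumulative IV to be increasing, which holds here because σ²(u) is bounded away from zero.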
Given S and ρ, the optimal H is H = c_S(ξn)^{1/2} for this class of kernels, where

\[
c_S = S^{-1}\sqrt{\frac{\rho k_\bullet^{1,1}}{k_\bullet^{0,0}}\left(1 + \sqrt{1 + 3S\,\frac{k_\bullet^{0,0}k_\bullet^{2,2}}{(\rho k_\bullet^{1,1})^2}}\right)}. \tag{18}
\]
In Table 2 we present key quantities for some smooth kernels. Perhaps the most interesting quantity is g(S) of (17), as it enables us to compare the relative efficiency across estimators. In Table 2 we have computed g(S) for the case where ρ = 1. So g(S) can be compared to 8, which is the corresponding constant for the maximum likelihood estimator in the Gaussian parametric version of the problem. We see that most kernels are only slightly less efficient than the maximum likelihood estimator, with TH16 almost reaching this lower bound. Comparing g(S) for different degrees of subsampling reminds us that S = 1 (no subsampling) yields the most efficient estimator. The larger the value of k_•^{0,0}k_•^{2,2}/(k_•^{1,1})², the more sensitive the kernel is to subsampling. Fig. 3 plots some smooth kernel functions, k(x/c₁), using their respective optimal values for c₁; see Table 2. We see that the TH1 kernel is almost identical to the cubic kernel. The TH16 kernel is somewhat flatter, putting less weight on realised autocovariances of lower order and higher weight on realised autocovariances of higher order. The Parzen kernel is typically between TH1 and TH16. While the smooth kernels improve the rate of convergence over the kinked kernels, the improvements may be modest in finite samples. The reason is the following. When the noise is small the optimal H is small, and H may actually be quite similar for kinked and smooth kernels. For instance, with ξ = 0.01 and n = 780, the Bartlett kernel has c_Bartlett(ξn)^{2/3} = 9.00 whereas the cubic kernel has c_Cubic(ξn)^{1/2} = 10.78. So in this case the two estimator types are rather similar and, despite the fact that H_Bartlett grows at the faster rate n^{2/3}, the cubic kernels include more lags in this situation.

3.4. Discontinuous kernel functions

In this section, we consider the kernel functions that we have labelled as discontinuous kernels. Such kernels lead to estimators with poor asymptotic properties.
We shall see that subsampling can substantially improve such estimators, making them consistent with mixed Gaussian distributions. So for such kernels, subsampling is a saviour.
Lemma 2. Let K_w(X_δ) = Σ_{h=0}^H w_h γ̃_h(X_δ), where H = o(n) (possibly constant). Then w_0 = 1 + o(1) and w_0 − w_1 = o(n^{−1}) are necessary conditions for E{K_w(X_δ) − ∫_0^t σ_u² du} → 0; and

\[
\sum_{h=0}^{H}\left(w_{h+1} - 2w_h + w_{h-1}\right)^2 = o(n^{-1}) \tag{19}
\]
Table 2
Key quantities for some smooth kernels.

| Kernel | k_•^{0,0} | k_•^{1,1} | k_•^{2,2} | (k_•^{0,0}k_•^{1,1})^{1/2} | k_•^{0,0}k_•^{2,2}/(k_•^{1,1})² | c₁ | g(S), S=1 | S=2 | S=3 | S=10 |
|--------|-----------|-----------|-----------|----------------------------|--------------------------------|------|-------|-------|-------|-------|
| Cubic  | 0.371     | 1.20      | 12.0      | 0.67                       | 3.09                           | 3.68 | 9.03  | 9.81  | 10.39 | 12.72 |
| Parzen | 0.269     | 1.50      | 24.0      | 0.64                       | 2.87                           | 4.77 | 8.53  | 9.25  | 9.78  | 11.94 |
| TH1    | 0.375     | 1.23      | 12.2      | 0.68                       | 3.00                           | 3.70 | 9.18  | 9.96  | 10.55 | 12.89 |
| TH2    | 0.218     | 1.71      | 41.8      | 0.61                       | 3.11                           | 5.75 | 8.27  | 8.99  | 9.51  | 11.65 |
| TH5    | 0.097     | 3.50      | 489.0     | 0.58                       | 3.85                           | 8.07 | 8.07  | 8.82  | 10.19 | 11.57 |
| TH10   | 0.050     | 6.57      | 3610.6    | 0.57                       | 4.19                           | 24.79| 8.04  | 8.80  | 10.19 | 11.59 |
| TH16   | 0.032     | 10.26     | 14374.0   | 0.57                       | 4.33                           | 39.16| 8.02  | 8.80  | 10.20 | 11.60 |

The key quantity is g(S), which measures the relative efficiency in this class of estimators, here computed for the case with constant volatility (ρ = 1) so that these numbers are comparable with the maximum likelihood estimator, which has g = 8.00. No subsampling (S = 1) produces the best estimator, and kernels with a relatively large k_•^{0,0}k_•^{2,2}/(k_•^{1,1})² tend to be more sensitive to subsampling.
is a necessary condition for Var{K_w(X_δ) − ∫_0^t σ_u² du} → 0, where we set w_{H+1} = 0 and w_{−1} = w_0.
The lemma shows that realised kernels obtained using a fixed H cannot converge to ∫_0^t σ_u² du in mean square, because such estimators will not satisfy (19). Consider the case where we construct w_h from a kernel function and let H → ∞. In this situation it is clear that any discontinuous kernel will violate (19), because

\[
n\sum_{h=0}^{H}\left(w_{h+1} - 2w_h + w_{h-1}\right)^2 \simeq n \times \sum_{x_j\in D_k}\left(\lim_{x\downarrow x_j} k(x) - \lim_{x\uparrow x_j} k(x)\right)^2.
\]
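Condition (19) separates the kernel classes sharply, and is easy to probe numerically. In the sketch below (our own code), the truncated kernel's second-difference sum equals the squared jump size 1 no matter how large H is, while for the Bartlett kernel it equals 2/H², which is o(n^{−1}) once H grows like n^{2/3}.

```python
def second_diff_sum(w):
    # sum_{h=0}^{H} (w_{h+1} - 2 w_h + w_{h-1})^2 with the Lemma 2 conventions
    # w_{-1} = w_0 and w_{H+1} = 0.
    w = [w[0]] + list(w) + [0.0]
    return sum((w[i + 1] - 2 * w[i] + w[i - 1]) ** 2 for i in range(1, len(w) - 1))

H = 100
truncated = [1.0] * (H + 1)                    # discontinuous: jump of size 1 at x = 1
bartlett = [1 - h / H for h in range(H + 1)]   # kinked: kinks only at x = 0 and x = 1
print(second_diff_sum(truncated), second_diff_sum(bartlett))  # 1.0 and 2/H^2
```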
Here D_k is the set of discontinuity points of k(x). Next, we consider the truncated kernel, which does not satisfy (19). We will see that subsampling this kernel produces an estimator that is consistent and mixed Gaussian. This is true whether H is finite or is allowed to grow with the sample size.

3.4.1. The Zhou (1996) estimator

First we will look at estimators which are thought of as having H fixed while allowing the degree of subsampling to increase. This is away from the spirit of the realised kernels of Barndorff-Nielsen et al. (2008), which need H to get large with n_δ for consistency; however, it is close to the important early work of Zhou (1996) and is strongly intellectually connected to the two-scale estimators of Zhang et al. (2005). The Zhou (1996) estimator can be written as γ_0(X_δ; S) + γ̃_1(X_δ; S), which is the subsampled realised kernel based on the truncated kernel function using H = 1. Zhou (1996) noticed that the variance of his estimator was of order

\[
O\!\left(\frac{1}{n_\delta}\right) + O\!\left(\frac{1}{S}\right) + O\!\left(\frac{n_\delta}{S}\right),
\]

but did not realise that by allowing S to increase with n_δ his estimator becomes consistent. In fact, in a subsequent paper Zhou stated that his subsampled realised kernel was inconsistent; see Zhou (1998, p. 114). The following theorem gives its asymptotic distribution.

Theorem 4. Suppose S = c³n_δ²; then as n_δ → ∞,

\[
n_\delta^{1/2}\left\{\gamma_0(X_\delta; S) + \tilde{\gamma}_1(X_\delta; S) - \int_0^t \sigma_u^2\,\mathrm{d}u\right\}
\stackrel{Ls}{\to} \mathrm{MN}\!\left(0,\; \frac{16}{3}t\int_0^t \sigma_u^4\,\mathrm{d}u + 8\omega^4/c^3\right).
\]

This asymptotics is not particularly attractive, for its seeming n_δ^{1/2} rate of convergence hides the fact that it assumes massive databases in order to allow S to increase rapidly with n_δ. A more interesting way of thinking about this estimator is in terms of the effective sample size n = S × n_δ. Again we ask whether it is better to increase n_δ or S for a given n. This leads to the following result.

Lemma 3. If S = c(ξn)^{2/3} then the Zhou estimator has

\[
n^{1/6}\left\{\gamma_0(X_\delta; S) + \tilde{\gamma}_1(X_\delta; S) - \int_0^t \sigma_u^2\,\mathrm{d}u\right\}
\stackrel{Ls}{\to} \mathrm{MN}\!\left(0,\; \omega^{4/3}\left(t\int_0^t \sigma_u^4\,\mathrm{d}u\right)^{2/3}\left\{\frac{16}{3}c + \frac{8}{c^2}\right\}\right).
\]

The minimum asymptotic variance is

\[
\underbrace{8\sqrt[3]{3}}_{\simeq 11.54}\,\omega^{4/3}\left(t\int_0^t \sigma_u^4\,\mathrm{d}u\right)^{2/3}, \qquad \text{with } c = \sqrt[3]{3}.
\]

The Zhou estimator's asymptotic variance is thus of the form obtained by the kinked non-subsampled realised kernels, i.e. the ones which do not satisfy the k′(0) = k′(1) = 0 condition.

Example 2. Suppose n corresponds to using prices every 1 s on the NYSE, so n = 23,400. If ω² = 0.001 and t∫_0^t σ_u⁴ du = 1, which is roughly right in empirical work from 2004, then for the Zhou estimator the optimal S ≃ 25, so n_δ ≃ 920. Hence the degree of subsampling is rather modest. In 2000, ω² = 0.01 and t∫_0^t σ_u⁴ du = 1 would be more reasonable, in which case S = 118 and n_δ = 198, which corresponds to returns measured roughly every 2 min.

3.4.2. The two-lag flat-top Bartlett estimator

A natural extension of Zhou (1996) is to allow H to be larger than 1 but fixed.

Lemma 4. Let w_0 = w_1 = 1 and w_2 = 1/2. With S = c(ξn)^{2/3} we have
\[
n^{1/6}\left\{\gamma_0(X_\delta; S) + \tilde{\gamma}_1(X_\delta; S) + \tfrac{1}{2}\tilde{\gamma}_2(X_\delta; S) - \int_0^t \sigma_u^2\,\mathrm{d}u\right\}
\stackrel{Ls}{\to} \mathrm{MN}\!\left(0,\; \omega^{4/3}\left(t\int_0^t \sigma_u^4\,\mathrm{d}u\right)^{2/3}\left\{\frac{20}{3}c + \frac{2}{c^2}\right\}\right),
\]

and the minimum variance is

\[
\underbrace{10\sqrt[3]{3/5}}_{\simeq 8.43}\,\omega^{4/3}\left(t\int_0^t \sigma_u^4\,\mathrm{d}u\right)^{2/3}, \qquad \text{with } c = \sqrt[3]{3/5}.
\]
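The minimum-variance constants in this subsection (roughly 11.54 for the Zhou estimator, 8.43 after adding (1/2)γ̃₂, and eventually 4.58 below) all come from minimising a function of the form Ac + B/c². A small sketch of our own reproduces them.

```python
def min_variance_constant(A, B):
    # Minimise A c + B / c^2 over c > 0: the optimum is c = (2B/A)^{1/3},
    # at which the value equals (3/2) A c.
    c = (2 * B / A) ** (1 / 3)
    return A * c + B / c ** 2, c

zhou, c_zhou = min_variance_constant(16 / 3, 8)  # Lemma 3: 8 * 3^{1/3}      ~ 11.54
flat, c_flat = min_variance_constant(20 / 3, 2)  # Lemma 4: 10 * (3/5)^{1/3} ~ 8.43
bart, c_bart = min_variance_constant(4 / 3, 8)   # Bartlett limit: 2 * 12^{1/3} ~ 4.58
```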
The constant in the asymptotic variance is here reduced from about 11.54 to 8.43. Now we proceed by adding additional realised autocovariances to Zhou's estimator, using the Bartlett weights w_h = k((h − 1)/H), h = 2, …, H. An interesting question is that of what happens as we increase H further. For moderately large H we have that n^{1/6}{K(X_δ; S) − ∫_0^t σ_u² du} has an asymptotic variance of approximately

\[
\frac{4}{3}\left\{2 + (H+1)\right\} c\, t\int_0^t \sigma_u^4\,\mathrm{d}u + \frac{8\omega^4}{c^2 H^2}.
\]

This suggests
c³ = 12ω⁴/{H³ t∫_0^t σ_u⁴ du} + o(1), so the asymptotic variance (using (4/3)12^{1/3} + 8/12^{2/3} = 2·12^{1/3}) is

\[
\underbrace{2\sqrt[3]{12}}_{\simeq 4.58}\,\omega^{4/3}\left(t\int_0^t \sigma_u^4\,\mathrm{d}u\right)^{2/3} + o(1).
\]

So we achieve an additional reduction of the asymptotic variance. Not surprisingly, this is the asymptotic variance of the Bartlett realised kernel applied to a sample of size n when H ∝ n^{2/3}; see Example 1. Here, by allowing H to grow we approach the situation with kinked kernels, so we observe the eventual impotence of subsampling—a property that we have shown holds for all kinked kernels. Hence as H gets large the optimal degree of subsampling rapidly falls, and the best thing to do is simply to run a Bartlett realised kernel on the data without subsampling, i.e. take n_δ = n.

Fig. 4 shows the implied kernel functions that are generated by subsampling Zhou's estimator (H = 1) and the two estimators that have been enhanced by adding Bartlett weights. The relative asymptotic efficiencies for these estimators are simply given by k_•^{0,0} of the implied kernel, where the implied kernel for H = 1 corresponds to the trapezoidal kernel of Politis and Romano (1995). From Fig. 4 it is evident that k_•^{0,0} is decreasing in H, which explains why the asymptotic variance of this estimator is decreasing in H.

[Fig. 4 appears here, plotting the implied kernels for H = 1, H = 2 and H = ∞.]

Fig. 4. The implied kernels that arise from subsampling some simple kernels. H = 1 corresponds to the subsampled version of Zhou's estimator; H = 2 is that for Zhou's estimator after adding (1/2)γ̃_2(X_δ); and H = ∞ (here approximated by H = 18) illustrates the implied kernel for Zhou's estimator that is enhanced by an increasing number of Bartlett-weighted realised autocovariances.

3.4.3. The relationship to a two-scale estimator

The two-scale estimator of Zhang et al. (2005) bias-corrects γ_0(X_δ; S) using ω̂² = γ_0(X_{δ/S})/2n. Their results are reproved here, exploiting our previous results to make the proofs very short. We set S = c(ξn)^{2/3}, which imposes the optimal rate for S.

Theorem 5. With S = c(ξn)^{2/3} we have

\[
n^{1/6}\left\{\gamma_0(X_\delta; S) - 2n_\delta\omega^2 - \int_0^t \sigma_u^2\,\mathrm{d}u\right\}
\stackrel{Ls}{\to} \mathrm{MN}\!\left(0,\; \omega^{4/3}\left(t\int_0^t \sigma_u^4\,\mathrm{d}u\right)^{2/3}\left\{\frac{4}{3}c + \frac{4(1+\lambda^2)}{c^2}\right\}\right).
\]

Next, consider the two noise terms

\[
n^{1/6}\left\{\frac{1}{S}\sum_{j=1}^{Sn_\delta}\left(U_{j\delta/S} - U_{(j-S)\delta/S}\right)^2 - 2n_\delta\omega^2\right\}
\quad\text{and}\quad
n^{1/6}\left\{\frac{1}{S}\sum_{j=1}^{Sn_\delta}\left(U_{j\delta/S} - U_{(j-1)\delta/S}\right)^2 - 2n_\delta\omega^2\right\},
\]

which jointly satisfy

\[
\stackrel{Ls}{\to} \mathrm{N}\!\left(0,\; \frac{4\omega^4}{c^2\xi^{4/3}}\begin{pmatrix} 1+\lambda^2 & \lambda^2 \\ \lambda^2 & 1+\lambda^2 \end{pmatrix}\right). \tag{20}
\]

This allows us to understand that replacing γ_0(U_δ; S) − 2n_δω² by γ_0(U_δ; S) − 2n_δω̂² yields a feasible estimator with a smaller variance than the infeasible estimator.

Theorem 6. With S = c(ξn)^{2/3} we have

\[
n^{1/6}\left\{\gamma_0(X_\delta; S) - 2n_\delta\hat{\omega}^2 - \int_0^t \sigma_u^2\,\mathrm{d}u\right\}
\stackrel{Ls}{\to} \mathrm{MN}\!\left(0,\; \omega^{4/3}\left(t\int_0^t \sigma_u^4\,\mathrm{d}u\right)^{2/3}\left\{\frac{4}{3}c + \frac{8}{c^2}\right\}\right).
\]

The minimum asymptotic variance is

\[
\underbrace{2\sqrt[3]{12}}_{\simeq 4.58}\,\omega^{4/3}\left(t\int_0^t \sigma_u^4\,\mathrm{d}u\right)^{2/3}, \qquad \text{with } c = \sqrt[3]{12}.
\]
Thus the two-scale estimator is significantly more efficient than the Zhou estimator and is as efficient as the Bartlett realised kernel.
Example 3 (Continued from Example 2). If ω² = 0.001 and t∫_0^t σ_u⁴ du = 1, then S ≃ 40 and n_δ ≃ 580. Hence the degree of subsampling is larger than that used by Zhou.

4. Some empirical recommendations

We have worked under the assumption that the noise is of the independent type defined in (6). This assumption seems reasonable for equity returns when prices are sampled at moderately high frequencies; e.g. for the liquid stocks on the New York Stock Exchange this assumption seems reasonable when applied to 1 min returns (Hansen and Lunde (2006)). In this context the best approach to estimation is to use a smooth realised kernel without any subsampling. A shortcoming of this approach is that the estimator does not make use of all available observations. For example, transactions on the most liquid stocks now take place every few seconds, but for U ∈ 𝒲𝒩 to be reasonable we can only sample every, say, 15th observation. In this section we discuss how to construct subsampled realised estimators that make use of all available data. We also discuss how valid inference can be made about such estimators under realistic assumptions about the noise in tick-by-tick data. Here we use a subsampled realised kernel, where S is chosen to be sufficiently large that (6) is reasonable for a sample that only consists of every Sth observation. The asymptotic variance can be estimated from the coarsely sampled data, using the methods of Barndorff-Nielsen et al. (2008), and this leads to valid inference that is robust to both time-dependent and endogenous noise in the tick-by-tick data. Specifically, we recommend the following procedure.

1. Choose S sufficiently large for (6) to be a plausible assumption for a sample that only consists of every Sth observation.
2. Construct S distinct subsamples, each having approximately n_δ = n/S observations.
3. For each of the S subsamples, obtain estimates of ω² and IQ = t∫_0^t σ_u⁴ du, and an initial estimate of IV = ∫_0^t σ_u² du.
See Barndorff-Nielsen et al. (2008) for ways to do this. Average each of these estimators to construct the subsampled estimators, ω̂² = S^{−1}Σ_{s=1}^S ω̂_s², ÎV_initial = S^{−1}Σ_{s=1}^S ÎV_initial,s and ÎQ = S^{−1}Σ_{s=1}^S ÎQ_s.
4. Obtain an estimate, Ĥ, of the optimal H by plugging the subsampled estimates into the expression for the optimal H. Use this Ĥ to compute the S realised kernels, K^s(X_skip-S), using a smooth kernel and the weights w_0 = w_1 = 1 and w_h = k((h − 1)/Ĥ), for h = 2, …, Ĥ. Form their average to obtain the actual estimator, ÎV_final = K(X_skip-S; S).
O.E. Barndorff-Nielsen et al. / Journal of Econometrics 160 (2011) 204–219
5. Finally, compute the conservative estimate of avar{K(X_skip-S; S)} using the finite sample expressions, where w = (w₀, w₁, …, w_Ĥ)′:
Var{K(X_skip-S; S)} = (w′Aw) × IQ_final/n_δ + 8ω̂² IV_final (w′Bw) + 4ω̂⁴ (w′Cw) × n_δ. (21)
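The five-step procedure above can be sketched in code. This is a minimal illustration, not the authors' implementation: the TH₂ weight function is written out explicitly, the plug-in bandwidth H is assumed to be given (the per-subsample estimation of ω², IQ and IV in step 3 is omitted), and all function names are hypothetical.

```python
import numpy as np

def th2_kernel(x):
    # Tukey-Hanning2 weight function, k(x) = sin^2((pi/2) * (1 - x)^2).
    return np.sin(np.pi / 2 * (1 - x) ** 2) ** 2

def realised_kernel(x, H, k=th2_kernel):
    """Realised kernel for one subsample of skip-S returns x_1..x_m,
    with weights w0 = w1 = 1 and w_h = k((h - 1)/H) for h = 2..H."""
    gamma = lambda h: np.sum(x[abs(h):] * x[: len(x) - abs(h)])
    rk = gamma(0)
    for h in range(1, H + 1):
        w = 1.0 if h == 1 else k((h - 1) / H)
        rk += w * (gamma(h) + gamma(-h))
    return rk

def subsampled_realised_kernel(prices, S, H):
    """Step 2 and step 4: build the S skip-S subsamples and average
    the S realised kernels computed on them."""
    estimates = []
    for s in range(S):
        sub = prices[s::S]        # every S-th observation, offset s
        x = np.diff(sub)          # skip-S returns for this subsample
        estimates.append(realised_kernel(x, H))
    return np.mean(estimates)
```

The averaging over offsets s = 0, …, S − 1 is exactly the "slightly different starts of the day" interpretation of subsampling used throughout the paper.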
The variance estimate in (21) is the sum of the finite sample versions of (8)–(10) with S = 1. So this expression completely ignores subsampling, and the expression is really an estimator of Var(K^s(X_skip-S)). The reason is that subsampling does not reduce the noise variance by a factor of S, unless the noise is uncorrelated across subsamples. This is unrealistic when the subsamples exploit all the tick-by-tick data. However, we still have avar{K(X_skip-S; S)} ≤ avar(K^s(X_skip-S)), even if U_t ⊥⊥ U_s is violated for some s ≠ t. So (21) is simply a robust estimator that is expected to yield a conservative estimate of the variance.

It is interesting to have some notion of how conservative this estimator is. Recall our result in Theorem 1 that avar{K(Y_skip-S; S)} = avar(K^s(Y_skip-S)); see (8). So subsampling cannot reduce the contribution to the asymptotic variance from this term, while the contributions from the two other terms, (9) and (10), can potentially be driven all the way to zero.

Example 4. With ρ = 1, the asymptotic variance of the realised TH₂ kernel is proportional to
c₁ + 2(k•^{1,1}/k•^{0,0})c₁⁻¹ + (k•^{2,2}/k•^{0,0})c₁⁻³ = 5.75 + 2(1.71/0.218)/5.75 + (41.8/0.218)(5.75)⁻³ ≃ 9.50.
Subsampling this estimator with S = 10, say, reduces this factor to no less than
5.75 + (1/10)·2(1.71/0.218)/5.75 + (1/10)(41.8/0.218)(5.75)⁻³ ≃ 6.12
(see (11)). So the variance reduction is less than 36%, and even with S → ∞ the reduction is less than 40%. In practice, the reduction is likely to be much smaller, because the noise is not independent across subsamples. So even though (21) is a conservative estimator, it is not perversely conservative.

5. The simulation study

5.1. The simulated model and design

In this section we analyse the finite sample properties of K(X_δ; S), using both a smooth TH₂ kernel and a kinked Bartlett kernel. We consider the following SV model:
dY_t = μdt + σ_t dW_t, dτ_t = ατ_t dt + dB_t, σ_t = exp(β₀ + β₁τ_t), corr(dW_t, dB_t) = ρ,
where ρ is a leverage parameter. This model is frequently used for simulations in this context; see e.g. Huang and Tauchen (2005) and Barndorff-Nielsen et al. (2008). In our simulated model, we set μ = 0.03, β₁ = 0.125, α = −0.025 and ρ = −0.3. Further, we set β₀ = β₁²/(2α) in order to standardise E(σ_t²) = 1. With this configuration the variance of ∫₀ᵗ σ_u² du is comparable to the empirical results found in Hansen and Lunde (2005). For the variance of market microstructure noise we set ω² = 0.1.

The process is generated using an Euler scheme based on N = 23,400 intervals, where each interval is thought to correspond to one second, so the entire interval corresponds to 6.5 h, which is the length of a typical trading day. The volatility process is restarted at its mean value σ₀ = 1 every day by setting τ₀ = 5/2. This keeps the noise-to-signal ratio, ξ = ω²/√(∫₀¹ σ_u⁴ du), comparable across simulations. In our Monte Carlo designs we let the effective sample size, n, be either 1560, 4680, or 23,400; these correspond to sampling every 15, 5, or 1 s, respectively. So a sample with 4680 observations, say, is obtained by including every fifth observation of the N = 23,401 simulated data points over the [0, t] interval.

5.2. Implementation of realised kernels and subsampled realised kernels

From the simulated data, X₀, …, X_n, we define the "skip-S returns" Δ_S X_j = X_j − X_{j−S}. The subsampled realised autocovariances are computed by using
γ̂_h^s = Σ_{j=1}^{n_δ} Δ_S X_{jS+s−1} Δ_S X_{(j−h)S+s−1}, s = 1, …, S, h = −H, …, 0, …, H,
where n_δ = n/S. The subsampled realised kernel is defined by
K(X; S) = S⁻¹ Σ_{s=1}^S K^s(X), where K^s(X) = γ̂₀^s + Σ_{h=1}^H k((h−1)/H)(γ̂_h^s + γ̂_{−h}^s).

The "noise-to-signal" parameter, ξ = ω²/√(∫₀¹ σ_u⁴ du), need not be estimated in our simulations, because ω² is known and the integrated quarticity, ∫₀¹ σ_u⁴ du ≃ N⁻¹ Σ_{j=1}^N σ⁴_{j/N}, can be computed from the simulated data. The parameter ρ = ∫₀¹ σ_u² du / √(∫₀¹ σ_u⁴ du) can be computed from the simulated volatility path. When S = 1 we use H*_{TH₂,1} = 5.75(ξn)^{1/2} for the smooth TH₂ kernel and H*_{Bartlett,1} = {12(ξn)²}^{1/3} for the kinked Bartlett kernel. When S ≥ 2 the optimal H for the Bartlett kernel is simply given by H*_{Bartlett,S} = S⁻¹{12(ξn)²}^{1/3}, and the TH₂ kernel has H*_{TH₂,S} = c_S(ξn)^{1/2}, where c_S = S⁻¹{7.84ρ(1 + √(1 + 9.33S))}^{1/2}, as defined in (18).

5.3. Simulation results

Figs. 5 and 6 show the Monte Carlo results with the number of subsamples, S, increasing along the horizontal axis and the MSE on the vertical axis. The lines represent different sample sizes. Consider first the results based on the Bartlett kernel. Our theoretical results in Theorem 2 dictate that these lines should be horizontal. This result is confirmed. Still, a small increase in the MSE as S increases is observed for the smaller sample sizes. The reason is that the lag length of the implied kernel, H_implied, can only attain values that are divisible by S. While the Bartlett kernel without subsampling has H*_{Bartlett,1} = {12(ξn)²}^{1/3}, the implied Bartlett kernel has H_implied = S × H*_{Bartlett,S}, with H*_{Bartlett,S} rounded to an integer. So as S increases the implied kernel's H_implied is more likely to deviate from H*_{Bartlett,1}, which causes an increase in the mean squared error. The smaller the sample size, n, the smaller the optimal value for H. So it is not surprising that the impact on MSE is seen earlier when n is small. In this design, the optimal lag length, H*_{Bartlett,1}, is about 67, 140, and 403, for n = 1560, n = 4680, and n = 23,400, respectively, though
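The bandwidth constant c_S and the variance factor of Example 4 can be reproduced numerically. This is an illustrative check, assuming the quoted TH₂ kernel constants k•^{0,0} = 0.218, k•^{1,1} = 1.71 and k•^{2,2} = 41.8, and the c_S expression recovered from (18):

```python
import math

# TH2 kernel constants as quoted in the text (assumed values).
K00, K11, K22 = 0.218, 1.71, 41.8

def c_S(S, rho=1.0):
    # c_S = S^{-1} {7.84 rho (1 + sqrt(1 + 9.33 S))}^{1/2}, as in (18);
    # for S = 1 and rho = 1 this gives c_1 of about 5.75.
    return math.sqrt(7.84 * rho * (1.0 + math.sqrt(1.0 + 9.33 * S))) / S

def variance_factor(c, S):
    # Proportionality factor in Example 4:
    # c + (1/S) * 2 (k11/k00) / c + (1/S) * (k22/k00) / c^3.
    return c + 2 * (K11 / K00) / (S * c) + (K22 / K00) / (S * c ** 3)

c1 = c_S(1)                    # about 5.75
f1 = variance_factor(c1, 1)    # about 9.49 (the text rounds to 9.50)
f10 = variance_factor(c1, 10)  # about 6.12
```

The computed reduction from S = 1 to S = 10 is just under 36%, and the limiting reduction (the factor falling to 5.75) is just under 40%, matching the claims in the text.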
[Figure: MSE against the number of subsamples (noise = 0.1, Bartlett kernel), for n = 1560 (15 s), n = 4680 (5 s), and n = 23,400 (1 s).]

Fig. 5. Mean square errors (MSEs) for the subsampled (kinked) Bartlett realised kernel obtained using three different sample sizes. The MSE is fairly insensitive to S. These findings are fully consistent with Theorems 2 and 3.
[Figure: MSE against the number of subsamples (noise = 0.1, TH₂ kernel), for n = 1560 (15 s), n = 4680 (5 s), and n = 23,400 (1 s).]

Fig. 6. Mean square errors (MSEs) for subsampled (smooth) TH₂ realised kernels, using three different sample sizes. The TH₂ kernel has MSEs that are slightly increasing in S. These findings are fully consistent with Theorems 2 and 3.
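The simulation design of Section 5.1 can be sketched as follows. This is a minimal Euler scheme under the stated parameter values; the discretisation details, function name and seed handling are illustrative assumptions, not the authors' code:

```python
import numpy as np

def simulate_sv(N=23400, t=1.0, mu=0.03, beta1=0.125, alpha=-0.025,
                rho=-0.3, omega2=0.1, tau0=2.5, seed=42):
    """Euler scheme for dY = mu dt + sigma dW, dtau = alpha tau dt + dB,
    sigma = exp(beta0 + beta1 tau), corr(dW, dB) = rho; observed prices
    are X = Y + U with U ~ N(0, omega2) market microstructure noise."""
    rng = np.random.default_rng(seed)
    dt = t / N
    beta0 = beta1 ** 2 / (2 * alpha)   # standardises E(sigma_t^2) = 1
    dB = rng.normal(0.0, np.sqrt(dt), N)
    dW = rho * dB + np.sqrt(1 - rho ** 2) * rng.normal(0.0, np.sqrt(dt), N)
    tau = np.empty(N + 1); tau[0] = tau0   # tau0 = 5/2 gives sigma_0 = 1
    Y = np.empty(N + 1); Y[0] = 0.0
    sigma = np.empty(N)
    for j in range(N):
        sigma[j] = np.exp(beta0 + beta1 * tau[j])
        tau[j + 1] = tau[j] + alpha * tau[j] * dt + dB[j]
        Y[j + 1] = Y[j] + mu * dt + sigma[j] * dW[j]
    X = Y + rng.normal(0.0, np.sqrt(omega2), N + 1)  # add noise
    iv = np.sum(sigma ** 2) * dt                     # integrated variance
    return X, Y, iv
```

With these parameter values the realised variance of the efficient price Y is close to the integrated variance, while the realised variance of the noisy price X is dominated by the 2nω² noise bias, which is exactly what motivates the kernel and subsampling corrections studied in this paper.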
there is some variation in the optimal H across simulations because it, through ξ, depends on the simulated volatility path.

The lower panels present the results for the smooth TH₂ kernel. Here, our theoretical results of Theorem 3 state that the MSE is increasing in S, and this phenomenon is evident for all sample sizes. The results when ω² = 0.01 and ω² = 0.001 (not reported) are similar. Here the optimal H is smaller, and this causes subsampling to have a larger impact on the MSE. Naturally, the implied kernels must have H_implied ≥ S, so H_implied = S whenever S ≥ H*. This constraint is relevant for our simulations with small levels of noise, because subsampling takes H_implied further away from its optimal value as S increases beyond the optimal H.

6. Empirical study of General Electric trades

Here we compare subsampled realised kernels with other estimators. We estimate the daily increments of [Y] for the log-price of General Electric (GE) shares in 2000 and in 2004. The reason that we analyse data from both periods is that the variance of the noise was around 10 times higher in 2000 than in 2004. A more detailed analysis of 29 other major stocks is provided in a web appendix to this paper, available from www.hha.dk/~alunde/bnhls/bnhls.htm. This appendix also describes the exact implementation of our estimators. Precise details on the cleaning that we carried out on the raw data before analysis are described in the web appendix to Barndorff-Nielsen et al. (2008).

Table 3 shows summary statistics for seven estimators. The first estimator is the realised TH₂ kernel obtained using approximate 1 min returns. The approximate 1 min returns are obtained by skipping a fixed number of transactions, such that the average time between observations is one minute. In 2000 this meant sampling every 9.7th observation on average, and in 2004 every 13.7th observation on average.
The second estimator is the subsampled realised TH₂ kernel. So in 2000 we have S ≃ 9.7 and in 2004 we have S ≃ 13.7. The third estimator is the realised TH₂ kernel that is based on tick-by-tick data (i.e. all available trades) and an H that is S times larger than that used by the first estimator.

The following three estimators are subsampled realised variances. These are based on returns that are sampled in calendar time, where each intraday return spans 20 min, 5 min, or 1 min, as indicated in the subscript of these estimators. To exhaust data sampled every second, the numbers of subsamples are S = 1200, S = 300, and S = 60, respectively. For instance, the estimator [X_5 min; 300] is the average of 300 realised variances, where each realised variance is based on 5 min intraday returns, simply changing the initial place at which prices are recorded by one second. The last estimator, TSRV(K, J), given by Zhang et al. (2005), is the two-scale estimator that is designed to be robust to deviations from i.i.d. noise. Here we use their area adjusted estimator, which involves a bias correction.

From Table 3 we see that the estimators are very tightly correlated. The two realised kernels and the subsampled realised kernel are almost perfectly correlated, and all reported statistics are quite similar for these estimators. The two-scale estimator is also quite similar to the realised kernels. Interestingly, amongst the subsampled realised variances, it is the one based on 5 min returns that is most similar to the realised kernels. This suggests that 20 min returns lead to too much sampling error, whereas 1 min returns are influenced by the bias due to market microstructure noise.

Time series for some of these estimators are drawn in Fig. 7, where we plot daily point estimates for November 2000 and November 2004. We also include the confidence intervals for K_TH₂(X_ap.1 min), using the method of Barndorff-Nielsen et al. (2008). The three estimators are almost identical. While the subsampled realised kernel may be slightly more precise than the moderately sampled realised kernel, K_TH₂(X_ap.1 min), Fig. 7 does not suggest there is a big difference between these two. The realised kernel
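A subsampled realised variance such as [X_5 min; 300] can be sketched as follows, assuming one price observation per second so that a 5 min return spans 300 observations. The function and its arguments are illustrative, not the authors' implementation:

```python
import numpy as np

def subsampled_rv(prices, span, n_shifts):
    """Subsampled realised variance, e.g. [X_{5 min}; 300]: the average of
    `n_shifts` realised variances, each built from returns spanning `span`
    observations, with the sampling grid shifted by one observation
    (one second, for second-stamped data) between estimates."""
    rvs = []
    for s in range(n_shifts):
        grid = prices[s::span]   # prices every `span` ticks, offset s
        rvs.append(np.sum(np.diff(grid) ** 2))
    return np.mean(rvs)
```

For second-stamped data, [X_5 min; 300] corresponds to `subsampled_rv(prices, 300, 300)`, [X_20 min; 1200] to `span = n_shifts = 1200`, and [X_1 min; 60] to `span = n_shifts = 60`.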
Table 3
Summary statistics for subsampled [Y] estimators.

Sample period: 2000

| Estimator | Mean | Std. (HAC) | H | Corr | acf(1) | acf(2) | acf(5) | acf(10) |
|---|---|---|---|---|---|---|---|---|
| Realised kernel (TH₂, H* = c(ξn)^{1/2}): K_TH₂(X_ap.1 min) | 4.747 | 3.216 (6.133) | 6.558 | 1.000 | 0.43 | 0.25 | 0.03 | 0.15 |
| Subsampled realised kernel (TH₂, H = c(ξn)^{1/2}): K_TH₂(X_ap.1 min; S) | 4.709 | 3.220 (6.170) | 6.558 | 0.997 | 0.43 | 0.25 | 0.03 | 0.16 |
| Realised kernel (TH₂, H = S·H*): K_TH₂(X_1 tick) | 4.702 | 2.946 (5.793) | 62.44 | 0.986 | 0.46 | 0.27 | 0.05 | 0.13 |
| Subsampled realised variance [X_20 min; 1200] | 4.417 | 3.650 (6.046) | | 0.894 | 0.26 | 0.17 | −0.01 | 0.17 |
| Subsampled realised variance [X_5 min; 300] | 4.908 | 3.018 (5.611) | | 0.984 | 0.44 | 0.23 | 0.01 | 0.14 |
| Subsampled realised variance [X_1 min; 60] | 5.545 | 2.376 (5.167) | | 0.787 | 0.55 | 0.36 | 0.10 | 0.18 |
| TSRV(K, J), Aït-Sahalia et al. (forthcoming) | 4.514 | 3.657 (6.766) | | 0.941 | 0.36 | 0.21 | 0.01 | 0.23 |

Sample period: 2004

| Estimator | Mean | Std. (HAC) | H | Corr | acf(1) | acf(2) | acf(5) | acf(10) |
|---|---|---|---|---|---|---|---|---|
| Realised kernel (TH₂, H* = c(ξn)^{1/2}): K_TH₂(X_ap.1 min) | 0.962 | 0.568 (1.195) | 5.723 | 1.000 | 0.34 | 0.32 | 0.28 | 0.08 |
| Subsampled realised kernel (TH₂, H = c(ξn)^{1/2}): K_TH₂(X_ap.1 min; S) | 0.954 | 0.561 (1.202) | 5.723 | 0.995 | 0.37 | 0.32 | 0.28 | 0.09 |
| Realised kernel (TH₂, H = S·H*): K_TH₂(X_1 tick) | 0.947 | 0.522 (1.130) | 78.27 | 0.990 | 0.37 | 0.31 | 0.30 | 0.08 |
| Subsampled realised variance [X_20 min; 1200] | 0.885 | 0.516 (1.036) | | 0.933 | 0.27 | 0.27 | 0.27 | 0.08 |
| Subsampled realised variance [X_5 min; 300] | 0.943 | 0.503 (1.088) | | 0.984 | 0.37 | 0.32 | 0.30 | 0.08 |
| Subsampled realised variance [X_1 min; 60] | 0.942 | 0.376 (0.921) | | 0.899 | 0.46 | 0.43 | 0.38 | 0.12 |
| TSRV(K, J), Aït-Sahalia et al. (forthcoming) | 0.946 | 0.560 (1.194) | | 0.944 | 0.33 | 0.35 | 0.28 | 0.11 |

Summary statistics for seven estimators: first the realised kernel obtained using approximate 1 min returns with H*, and its subsampled version, followed by the realised kernel obtained using tick-by-tick data with H = S·H*; then three subsampled realised variances based on 20, 5 and 1 min returns. For instance, [X_5 min; 300] is the average of 300 realised variances based on 5 min returns, obtained by shifting the time at which prices are recorded by 1 s. Finally, TSRV(K, J) is the two-scale estimator that is robust to deviations from i.i.d. noise. For both 2000 and 2004 we report the average of the daily estimates with standard deviations. Corr is the correlation between each of the estimators and the first realised kernel. Finally we report four sample autocorrelations.
that is based on tick-by-tick data is slightly different from the other estimators, but always inside the confidence interval for K TH2 (Xap.1 min ).
7. Conclusions

We have studied the properties of subsampled realised kernels. Subsampling is a very natural addition to realised kernels, for it can be viewed as averaging over realised kernels with slightly different starts of the day. We have provided a first asymptotic study of these statistics, allowing the degree of subsampling or the number of lags to go to infinity or be fixed. Included in our analysis is the asymptotic distribution of the estimator proposed by Zhou (1996). Subsampling leads to few gains in our analysis. In fact, we found that subsampling is harmful for the best class of realised kernel estimators. The main advantage of subsampling is that it can overcome the inefficiency that results from a poor choice of kernel weights in the first place. For example, when the truncated kernel is used to design estimators, the resulting estimator has poor asymptotic properties, whereas the subsampled estimator is consistent and converges at rate n^{1/6}. In the realistic situation where the noise is endogenous and time dependent, subsampled realised kernels do provide a simple way to make use of all the available data. We have discussed how to make valid inferences about such estimators.

Appendix. Proofs

Lemma A.1. We have
γ_h(X_δ; S) = Σ_{s=−S}^{S} ((S − |s|)/S) γ_{Sh+s}(X_{δ/S}) + R_S^x/S.

The remainder R_S^x/S is a relatively small term, due to end effects. The term is defined explicitly in the proof, and the expression shows that R_S^x can be made zero by tweaking the first S − 1 and the last S − 1 intraday returns.

Proof. Define the intraday returns x_j = X_{(δ/S)j} − X_{(δ/S)(j−1)}, and write
X_{δ(j+(s−1)/S)} − X_{δ(j−1+(s−1)/S)} = X_{(δ/S)(jS+s−1)} − X_{(δ/S)(jS−S+s−1)} = x_{jS+s−1} + ··· + x_{jS−S+s}.
So the x_j are intraday returns over short intervals, each having length δ/S. The term γ_h^1(X_δ) equals
Σ_{j=1}^{n_δ} (X_{δj} − X_{δ(j−1)})(X_{δ(j−h)} − X_{δ(j−h−1)})
= Σ_{j=1}^{n_δ} (x_{(j−1)S+1} + ··· + x_{jS})(x_{(j−h−1)S+1} + ··· + x_{(j−h)S})
= Σ_{j=1}^{n} x_j x_{j−Sh}
 + Σ_{j: j mod S ≠ 0} x_j x_{j−Sh+1} + Σ_{j: j mod S ∉ {0, S−1}} x_j x_{j−Sh+2} + ··· + Σ_{j: j mod S = 1} x_j x_{j−Sh+S−1}
 + Σ_{j: j mod S ≠ 1} x_j x_{j−Sh−1} + Σ_{j: j mod S ∉ {1, 2}} x_j x_{j−Sh−2} + ··· + Σ_{j: j mod S = 0} x_j x_{j−Sh−S+1},
where each sum runs over j = 1, …, n.
[Figure: estimates of [Y] plotted against days in November 2000 (GE) and days in November 2004 (GE).]

Fig. 7. Three estimators of the daily increments to [Y] for General Electric in November 2000 and 2004. Triangles are the estimates of the realised kernel obtained using roughly 1 min returns. Diamonds are the estimates produced by the subsampled realised kernel. Circles are the estimates of the realised kernel that uses tick-by-tick returns and an H that is S times larger than that used for the first realised kernel. The intervals are the 95% confidence intervals for K_TH₂(X_ap.1 min).
Similarly, for s > 1 we have
Σ_{j=1}^{n_δ} (X_{δ(j+(s−1)/S)} − X_{δ(j−1+(s−1)/S)})(X_{δ(j−h+(s−1)/S)} − X_{δ(j−h−1+(s−1)/S)})
= Σ_{j=s}^{n+s−1} x_j x_{j−Sh} + the analogous sums of cross products x_j x_{j−Sh±1}, …, x_j x_{j−Sh±(S−1)}, with the "j mod S" restrictions shifted by s − 1 and with the boundary terms involving j > n appearing separately.

By adding up the terms,
γ_h(X_δ; S) = Σ_{s=−S+1}^{S−1} ((S − |s|)/S) γ_{Sh+s}(X_{δ/S}) + R_S^x/S,
where R_S^x collects the boundary cross products, namely those involving the first S − 1 intraday returns, x_1, …, x_{S−1}, and the last ones, x_{n+1}, …, x_{n+S−1}. The term R_S^x is due to end effects and involves far fewer cross products, x_i x_j, than Σ_{s=1}^{S} γ_h^s(X_δ). So R_S^x/S is typically negligible. In fact, R_S^x can be made zero by setting x_1 = ··· = x_{S−1} = x_{n+1} = ··· = x_{n+S−1} = 0. □

For the non-subsampling case Barndorff-Nielsen et al. (2008) derived the following helpful results.

Theorem A.1. We study properties as δ ↓ 0. Suppose that Y ∈ BSM and (4) holds; then
n_δ^{1/2} ( γ₀(Y_δ) − ∫₀ᵗ σ_u² du, γ̃₁(Y_δ), …, γ̃_H(Y_δ) )′ →^{Ls} MN( 0, A₁ × t ∫₀ᵗ σ_u⁴ du ),
where
A₁ = diag(2, 4, 4, …, 4). (A.1)

If, in addition, for n_δ ≥ H, U ∈ WN and Y ⊥⊥ U, then γ(Y_δ, U_δ) →^{Ls} MN(0, 2ω²[Y]B), where B has blocks B₁₁, B₁₂, B₂₁, B₂₂ with
B₁₁ = ( 1  −1
        •   2 ),
B₂₂ the tridiagonal matrix with 2 on the diagonal and −1 on the first off-diagonals, and B₂₁ = B₁₂′ with a single non-zero element, −1, in its upper-right corner. Moreover,
E{γ(U_δ)} = 2ω² n_δ (1, −1, 0, 0, …, 0)′,
Cov{γ(U_δ)} = 4ω⁴ n_δ C + O(1),
where C has blocks C₁₁, C₁₂, C₂₁, C₂₂ with
C₁₁ = ( 1+λ²  −2−λ²
          •    5+λ² ),
C₁₂ = ( 1  0  0  ···
       −4  1  0  ··· ),
C₂₁ = C₁₂′, and C₂₂ the five-diagonal matrix with 6 on the diagonal, −4 on the first and 1 on the second off-diagonals.

Theorem A.2. Suppose that Y ∈ BSM and (4) holds; then as δ ↓ 0
n_δ^{1/2} ( γ₀(Y_δ; S) − ∫₀ᵗ σ_u² du, γ₁(Y_δ; S), …, γ_H(Y_δ; S) )′ →^{Ls} MN( 0, A_S × t ∫₀ᵗ σ_u⁴ du ),
where
A_S = (2/3) ( 2+S⁻²  1−S⁻²    0     ···
              1−S⁻²  4+2S⁻²  1−S⁻²   ⋱
                0    1−S⁻²   4+2S⁻²  ⋱
                ⋮      ⋱       ⋱     ⋱ )
    → (2/3) ( 2 1 0 ···
              1 4 1  ⋱
              0 1 4  ⋱
              ⋮ ⋱ ⋱  ⋱ ) = A_∞,
as S → ∞, and, as δ ↓ 0 and S → ∞,
n_δ^{1/2} ( γ₀(Y_δ; S) − ∫₀ᵗ σ_u² du, γ₁(Y_δ; S), …, γ_H(Y_δ; S) )′ →^{Ls} MN( 0, A_∞ × t ∫₀ᵗ σ_u⁴ du ).

Proof of Theorem A.2. By Lemma A.1 we have γ̃_h(Y_δ; S) ≃ Σ_{s=−S}^{S} ((S − |s|)/S) γ̃_{Sh+s}(Y_{δ/S}), and the asymptotic properties of γ_h(Y_{δ/S}), h = −SH, …, SH, using the small time gaps δ/S, follow straightforwardly from (A.1). Then for h ≥ 1 we have
Var{γ̃_h(Y_δ; S)} = Var( Σ_{s=−S}^{S} ((S − |s|)/S) γ̃_{Sh+s}(Y_{δ/S}) ) = 4V_{0,S} × n_δ⁻¹ t ∫₀ᵗ σ_u⁴ du,
where
V_{0,S} = S⁻¹ ( 1 + 2 Σ_{s=1}^{S−1} ((S − s)/S)² ) = (2 + S⁻²)/3,
and similarly for h = 0 we find Var{γ̃₀(Y_δ; S)} = 2V_{0,S} n_δ⁻¹ t ∫₀ᵗ σ_u⁴ du. For h ≥ 0 we find
Cov{γ̃_h(Y_δ; S), γ̃_{h+1}(Y_δ; S)} = Cov( Σ_{s=−S}^{S} ((S − |s|)/S) γ̃_{Sh+s}(Y_{δ/S}), Σ_{s=−S}^{S} ((S − |s|)/S) γ̃_{Sh+S+s}(Y_{δ/S}) ) = 4V_{1,S} × n_δ⁻¹ t ∫₀ᵗ σ_u⁴ du,
where
V_{1,S} = S⁻¹ Σ_{s=1}^{S} ((S − s)/S)(s/S) = (1 − S⁻²)/6.
Covariances between γ̃_h(Y_δ; S) and γ̃_i(Y_δ; S) are zero for |h − i| ≥ 2, as they do not "share" any of the realised autocovariances γ̃_{Sh+s}(Y_{δ/S}). As δ ↓ 0 and S → ∞ we have V_{0,S} → 2/3 and V_{1,S} → 1/6, so that A_S → A_∞. □

Proof of Theorem 1. For the subsampled realised kernel on Y_δ we have
V_{0,S} + 2V_{1,S} = (2 + S⁻²)/3 + (1 − S⁻²)/3 = 1,
where V_{0,S} and V_{1,S} are defined in the proof of Theorem A.2. From the structure of A_S we have
w′A_S w = 4(V_{0,S} + 2V_{1,S}) Σ_{h=0}^{H} k(h/H)² − 8V_{1,S} Σ_{h=1}^{H} k(h/H){k(h/H) − k((h−1)/H)} + O(1/H)
        = 4H ∫₀¹ k(u)² du + O(1),
which does not depend on S. For the noise terms, from Barndorff-Nielsen et al. (2008) we have
γ(Y_δ, U_δ; S) →^{Ls} MN( 0, (2ω²/S)[Y]B ), (A.3)
E{γ(U_δ; S)} = 2ω² n_δ (1, −1, 0, 0, …, 0)′. (A.4)
Furthermore, with U_j^s = U_{(j+(s−1)/S)δ} we have
γ̃_h^s(U_δ) = −½( V_{h+1,n}^s − 2V_{h,n}^s + V_{h−1,n}^s ) + R_{h+1,n}^s − R_{h−1,n}^s,
where V_{h,n}^s = Σ_{j=1}^{n} U_j^s (U_{j−h}^s + U_{j+h}^s) and R_{h,n}^s = ½( U_n^s U_{n+h}^s + U_0^s U_{−h}^s − U_n^s U_{n−h}^s − U_0^s U_h^s ). So with w₀ = w₋₁ = 1 and w_h = k((h − 1)/H), h = 1, …, H + 1, we have from Barndorff-Nielsen et al. (2008) that
K^s(U_δ) = −Σ_{h=0}^{H} (w_{h+1} − 2w_h + w_{h−1}) V_{h,n}^s − Σ_{h=1}^{H} (w_{h+1} − w_{h−1}) R_{h,n}^s,
where n^{−1/2} Σ_{h=0}^{H} (w_{h+1} − 2w_h + w_{h−1}) V_{h,n}^s →^{d} N(0, 4ω⁴) and avar{ Σ_{h=1}^{H} (w_{h+1} − w_{h−1}) R_{h,n}^s } = 4ω⁴ k•^{1,1}. Since the noise is independent across subsamples, the results for K(X_δ; S) + ∆_{H,n}^S = −Σ_{h=0}^{H} (w_{h+1} − 2w_h + w_{h−1}) S⁻¹ Σ_{s=1}^{S} V_{h,n}^s and ∆_{H,n}^S = Σ_{h=1}^{H} (w_{h+1} − w_{h−1}) S⁻¹ Σ_{s=1}^{S} R_{h,n}^s follow. □
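The quantities V_{0,S} and V_{1,S} appearing in the proof of Theorem A.2 can be checked with exact rational arithmetic. This sketch assumes the closed forms V_{0,S} = (2 + S⁻²)/3 and V_{1,S} = (1 − S⁻²)/6 and the 3 × 3 limit matrix A_∞ as given above:

```python
from fractions import Fraction

def V0(S):
    # V_{0,S} = S^{-1} [1 + 2 sum_{s=1}^{S-1} ((S-s)/S)^2]
    return Fraction(1, S) * (1 + 2 * sum(Fraction(S - s, S) ** 2
                                         for s in range(1, S)))

def V1(S):
    # V_{1,S} = S^{-1} sum_{s=1}^{S} ((S-s)/S)(s/S)
    return Fraction(1, S) * sum(Fraction(S - s, S) * Fraction(s, S)
                                for s in range(1, S + 1))

def quad_form_A_inf():
    # w' [A_inf, 3x3] w for w = (1, 1, 1/2)', with A_inf entries
    # 2 V_{0,inf} = 4/3, 4 V_{0,inf} = 8/3 and 4 V_{1,inf} = 2/3.
    V0L, V1L = Fraction(2, 3), Fraction(1, 6)
    A = [[2 * V0L, 4 * V1L, 0],
         [4 * V1L, 4 * V0L, 4 * V1L],
         [0, 4 * V1L, 4 * V0L]]
    w = [1, 1, Fraction(1, 2)]
    return sum(w[i] * A[i][j] * w[j] for i in range(3) for j in range(3))
```

The sums reproduce V_{0,S} = (2S² + 1)/(3S²), V_{1,S} = (S² − 1)/(6S²), the identity V_{0,S} + 2V_{1,S} = 1 used in the proof of Theorem 1, and the value w′[A_{∞,3×3}]w = 20/3 used in the proof of Lemma 4.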
Proof of Lemma 1. From (12) we have
avar{K̃(X_δ)} = 4t ∫₀ᵗ σ_u⁴ du { (H_S/n)k•^{0,0} + 2ξρk•^{1,1}/H_S + Snξ²k•^{2,2}/(H_S)³ },
and
avar{K(X_δ) − K̃(X_δ)} / avar{K̃(X_δ)} = [ nξ²{k′(0)² + k′(1)²}/(H_S)² ] / [ (H_S/n)k•^{0,0} + 2ξρk•^{1,1}/H_S + Snξ²k•^{2,2}/(H_S)³ ],
which vanishes when k′(0)² + k′(1)² = 0 or S → ∞. We need H_S ∝ n^{1/2} for the ratio not to vanish when k′(0)² + k′(1)² ≠ 0. With H_S = c√(ξn) the ratio is bounded by a multiple of ξ, where we used that ρ ≤ 1, S ≥ 1 and that x = √(b/a) minimises f(x) = ax + b/x for a, b > 0; the result follows.

(ii) Minimising (14) with respect to x = H_S has the first-order condition
n⁻¹k•^{0,0} − 2ξρk•^{1,1}(H_S)⁻² − 3ξ²nSk•^{2,2}(H_S)⁻⁴ = 0.
The unique positive solution is given by H_S = c_S(ξn)^{1/2}, where
c_S = [ (ρk•^{1,1}/k•^{0,0}) { 1 + √(1 + 3Sk•^{0,0}k•^{2,2}/(ρk•^{1,1})²) } ]^{1/2}
    = [ { ρk•^{1,1} + √((ρk•^{1,1})² + 3Sk•^{0,0}k•^{2,2}) } / k•^{0,0} ]^{1/2}.
Now define x = k•^{0,0}/(ρk•^{1,1}), y = ρk•^{1,1}/(Sk•^{2,2}), and z = √(1 + 3x/y). Then c_S² = (1 + z)/x and x/y = (z² − 1)/3 = (1 + z)(z − 1)/3. So the minimum of the variance factor satisfies
c_S x + 2/c_S + 1/(y c_S³) = (4/3)√x { √(1 + z) + 1/√(1 + z) }.
Now substitute z = √(1 + 3Sk•^{0,0}k•^{2,2}/(ρk•^{1,1})²) and x = k•^{0,0}/(ρk•^{1,1}), and (16) follows. □

Proof of Lemma 2. We have
K_w(U_δ) = K_w(U_δ; 1) = −Σ_{h=0}^{H} (w_{h+1} − 2w_h + w_{h−1}) V_{h,n}^1 − Σ_{h=1}^{H} (w_{h+1} − w_{h−1}) R_{h,n}^1,
where Var(V_{h,n}^1) = (4n − 2h)ω⁴. V_{h,n}^1 is entirely made up of U_j^1 U_{j−h}^1 terms, and so Cov(V_{h,n}^1, V_{k,n}^1) = 0 for h ≠ k. Hence
Var{K̃_w(U_δ)} ≥ (4n − 2H)ω⁴ Σ_{h=0}^{H} (w_{h+1} − 2w_h + w_{h−1})²,
and the result follows since H = o(n). □

Proof of Theorem 2. (i) The mixed Gaussian result follows from Theorem 1. (ii) The best value for c is found by solving the first-order condition k•^{0,0} − 2c⁻³{k′(0)² + k′(1)²} = 0, and substituting this c into (13) yields ω^{4/3}( t ∫₀ᵗ σ_u⁴ du )^{2/3} times
4c( k•^{0,0} + {k′(0)² + k′(1)²}/c³ ) = 4c k•^{0,0}(1 + 1/2) = 6c k•^{0,0}.
Finally, c k•^{0,0} = {2(k′(0)² + k′(1)²)/k•^{0,0}}^{1/3} k•^{0,0} = {2(k•^{0,0})²(k′(0)² + k′(1)²)}^{1/3}. □

Proof of Theorem 3. (i.a) The mixed Gaussian result is straightforward using Theorem 1. (i.b) Substituting H_S = ξ^{1/2}cn^{1/2+α/4} and S = an^α into (14) yields 4ω( t ∫₀ᵗ σ_u⁴ du )^{3/4} times
(cn^{1/2+α/4}/n)k•^{0,0} + ··· + n·n^α k•^{2,2}/(cn^{1/2+α/4})³ = c k•^{0,0} n^{−1/2+α/4} + c⁻³ k•^{2,2} n^{−1/2+α/4},
because the second term is of lower order than the first and third terms when α > 0.

Lemma A.2. Let g(S) be as defined in Theorem 3. Then g′(S) > 0 for all S > 0.

Proof. Consider the function
f(x) = √(1 + √(1 + ax)) + 1/√(1 + √(1 + ax)), for a > 0.
The first derivative, f′(x) = (a/4){1 + √(1 + ax)}^{−3/2}, is positive for all x > 0. □

Proof of Corollary 1. From Lemma A.2 it follows that g′(S) > 0 for all S > 0, if we set x = S and a = 3k•^{0,0}k•^{2,2}/(ρk•^{1,1})². So any increment in S will increase the asymptotic variance. □

Proof of Corollary 2. By substituting for the first ρ in g(S) we find that (16) is proportional to
ω ( t ∫₀ᵗ σ_u⁴ du )^{1/2} ( ∫₀ᵗ σ_u² du )^{1/2} × { √(1 + √(1 + 3Sk•^{0,0}k•^{2,2}/(ρk•^{1,1})²)) + 1/√(1 + √(1 + 3Sk•^{0,0}k•^{2,2}/(ρk•^{1,1})²)) }.
From Hansen and Lunde (2006, p. 135) it follows that business time sampling minimises t ∫₀ᵗ σ_u⁴ du, and by Lemma A.2 we also have that the second term is minimised for the largest possible value of ρ (set x = 1/ρ²). Since ρ ≤ 1 the solution is ρ = 1. □
Proof of Theorem 4. The asymptotic distribution of γ₀(X_δ; S) + γ₁(X_δ; S) − ∫₀ᵗ σ_u² du is mixed Gaussian, with a variance of approximately, for moderate n_δ and S,
(16/3) n_δ⁻¹ t ∫₀ᵗ σ_u⁴ du + 8ω⁴ n/S². (A.5)

Proof of Lemma 3. With S = c(ξn)^{2/3} we have n_δ⁻¹ = S/n = cξ^{2/3}n^{−1/3} and n/S² = c⁻²ξ^{−4/3}n^{−1/3}, so (A.5) in the proof of Theorem 4 becomes n^{−1/3} times
(16/3) cξ^{2/3} t ∫₀ᵗ σ_u⁴ du + 8ω⁴c⁻²ξ^{−4/3} = ω^{4/3}( t ∫₀ᵗ σ_u⁴ du )^{2/3} { (16/3)c + 8c⁻² }.
So n^{1/6}{ γ₀(X_δ; S) + γ₁(X_δ; S) − ∫₀ᵗ σ_u² du } converges to a mixed Gaussian distribution with this variance. We can now minimise this asymptotic variance by selecting c³ = 3. At this value the asymptotic variance is
ω^{4/3}( t ∫₀ᵗ σ_u⁴ du )^{2/3} { (16/3)·3^{1/3} + 8·3^{−2/3} } ≃ 11.53 ω^{4/3}( t ∫₀ᵗ σ_u⁴ du )^{2/3}. □

Proof of Lemma 4. From Theorems A.2 and 1 we obtain the following upper left 3 × 3 submatrices of A_∞ and C:
[A_{∞,3×3}] = (2/3) ( 2 1 0
                      • 4 1
                      • • 4 ),
[C_{3×3}] = ( λ²+1  −λ²−2   1
                •    λ²+5  −4
                •      •    6 ).
With w = (1, 1, ½)′ we have w′[A_{∞,3×3}]w = 20/3 and w′[C_{3×3}]w = ½. The result now follows, as the asymptotic variance is, with S = c(ξn)^{2/3},
n_δ⁻¹ (20/3) t ∫₀ᵗ σ_u⁴ du + 4ω⁴ (n/S²)(½) = n^{−1/3} ω^{4/3}( t ∫₀ᵗ σ_u⁴ du )^{2/3} { (20/3)c + 2c⁻² }. □

Proof of Theorem 5. We have
γ₀(X_δ; S) − 2n_δω² − ∫₀ᵗ σ_u² du = { γ₀(Y_δ; S) − ∫₀ᵗ σ_u² du } + 2γ₀(U_δ, Y_δ; S) + { γ₀(U_δ; S) − 2n_δω² },
which has mean zero and a variance that is the sum of the three terms given below the brackets, namely n_δ⁻¹(4/3) t ∫₀ᵗ σ_u⁴ du, S⁻¹8ω² ∫₀ᵗ σ_u² du and 4ω⁴(n_δ/S)(1 + λ²). The three terms are given from (A.2), (A.3), and Theorem A.1, respectively. For large S = c(ξn)^{2/3} (implying large n_δ = n/S = c⁻¹ξ^{−2/3}n^{1/3}) we have
n^{1/6}{ γ₀(X_δ; S) − 2n_δω² − ∫₀ᵗ σ_u² du } →^{Ls} MN( 0, ω^{4/3}( t ∫₀ᵗ σ_u⁴ du )^{2/3} { (4/3)c + 4(1 + λ²)c⁻² } ). □

Proof of Theorem 6. It follows from Theorem 5 that n^{−1/2}γ₀(X_{δ/S}) = n^{−1/2}γ₀(U_{δ/S}) + o_p(1) and that ω⁴/ξ^{4/3} = ω^{4/3}( t ∫₀ᵗ σ_u⁴ du )^{2/3}. By the approximations
S⁻¹ Σ_{j=1}^{n} ( U_{jδ/S} − U_{(j−S)δ/S} )² ≃ (2/S)( Σ_{j=1}^{n} U_{jδ/S}² − Σ_{j=1}^{n} U_{jδ/S} U_{(j−S)δ/S} ),
S⁻¹ Σ_{j=1}^{n} ( U_{jδ/S} − U_{(j−1)δ/S} )² ≃ (2/S)( Σ_{j=1}^{n} U_{jδ/S}² − Σ_{j=1}^{n} U_{jδ/S} U_{(j−1)δ/S} ),
and using 2/S = 2/(cξ^{2/3}n^{2/3}), we see that
n^{−1/2} ( S⁻¹ Σ_j (U_{jδ/S} − U_{(j−S)δ/S})² − 2n_δω², S⁻¹ Σ_j (U_{jδ/S} − U_{(j−1)δ/S})² − 2n_δω² )′
→^{L} N( 0, (4ω⁴/(c²ξ^{4/3})) ( 1+λ²   λ²
                                 λ²   1+λ² ) ). □

References
Aït-Sahalia, Y., Mykland, P.A., Zhang, L., 2010. Ultra high frequency volatility estimation with dependent microstructure noise. Journal of Econometrics, forthcoming (doi:10.1016/j.jeconom.2010.03.028).
Anderson, T.W., 1971. The Statistical Analysis of Time Series. John Wiley and Sons, New York.
Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P., 2000. Great realizations. Risk 13, 105–108.
Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P., 2001. The distribution of exchange rate volatility. Journal of the American Statistical Association 96, 42–55. (Correction published in 2003, volume 98, page 501.)
Bandi, F.M., Russell, J.R., 2008. Microstructure noise, realized variance, and optimal sampling. Review of Economic Studies 75, 339–369.
Barndorff-Nielsen, O.E., Graversen, S.E., Jacod, J., Shephard, N., 2006. Limit theorems for realised bipower variation in econometrics. Econometric Theory 22, 677–719.
Barndorff-Nielsen, O.E., Hansen, P.R., Lunde, A., Shephard, N., 2008. Designing realised kernels to measure the ex-post variation of equity prices in the presence of noise. Econometrica 76, 1481–1536.
Barndorff-Nielsen, O.E., Shephard, N., 2002. Econometric analysis of realised volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society, Series B 64, 253–280.
Bartlett, M.S., 1950. Periodogram analysis and continuous spectra. Biometrika 37, 1–16.
Carlstein, E., 1986. The use of subseries values for estimating the variance of a general statistic from a stationary time series. Annals of Statistics 14, 1171–1179.
Doornik, J.A., 2006. Ox: An Object-Orientated Matrix Programming Language, 5th ed. Timberlake Consultants Ltd., London.
Fang, Y., 1996. Volatility modeling and estimation of high-frequency data with Gaussian noise. Unpublished Ph.D. Thesis, Sloan School of Management, MIT.
Hansen, P.R., Lunde, A., 2005. A realized variance for the whole day based on intermittent high-frequency data. Journal of Financial Econometrics 3, 525–554.
Hansen, P.R., Lunde, A., 2006. Realized variance and market microstructure noise (with discussion). Journal of Business and Economic Statistics 24, 127–218.
Huang, X., Tauchen, G., 2005. The relative contribution of jumps to total price variation. Journal of Financial Econometrics 3, 456–499.
Jacod, J., Shiryaev, A.N., 2003. Limit Theorems for Stochastic Processes, 2nd ed. Springer, Berlin.
Kalnina, I., Linton, O., 2008. Estimating quadratic variation consistently in the presence of correlated measurement error. Journal of Econometrics 147, 47–59.
Künsch, H.R., 1989. The jackknife and the bootstrap for general stationary observations. Annals of Statistics 17, 1217–1241.
Mykland, P.A., Zhang, L., 2006. ANOVA for diffusions and Ito processes. Annals of Statistics 34, 1931–1963.
Oomen, R.A.A., 2005. Properties of bias corrected realized variance in calendar time and business time. Journal of Financial Econometrics 3, 555–577.
Oomen, R.A.A., 2006. Properties of realized variance under alternative sampling schemes. Journal of Business and Economic Statistics 24, 219–237.
Politis, D.N., Romano, J.P., 1995. Bias-corrected nonparametric spectral estimation. Journal of Time Series Analysis 16, 67–103.
Politis, D.N., Romano, J.P., Wolf, M., 1999. Subsampling. Springer, New York.
Priestley, M.B., 1981. Spectral Analysis and Time Series, vol. 1. Academic Press, London.
Protter, P., 2004. Stochastic Integration and Differential Equations. Springer-Verlag, New York.
Zhang, L., 2006. Efficient estimation of stochastic volatility using noisy observations: a multi-scale approach. Bernoulli 12, 1019–1043.
Zhang, L., Mykland, P.A., Aït-Sahalia, Y., 2005. A tale of two time scales: determining integrated volatility with noisy high-frequency data. Journal of the American Statistical Association 100, 1394–1411.
Zhou, B., 1996. High-frequency data and volatility in foreign-exchange rates. Journal of Business and Economic Statistics 14, 45–52.
Zhou, B., 1998. Parametric and nonparametric volatility measurement. In: Dunis, C.L., Zhou, B. (Eds.), Nonlinear Modelling of High Frequency Financial Time Series. John Wiley & Sons Ltd., New York, pp. 109–123 (Chapter 6).
Journal of Econometrics 160 (2011) 220–234
Realized volatility forecasting and market microstructure noise

Torben G. Andersen (a,b,c), Tim Bollerslev (d,b,c), Nour Meddahi (e)

(a) Department of Finance, Kellogg School of Management, Northwestern University, Evanston, IL 60208, United States
(b) NBER, Cambridge, MA, United States
(c) CREATES, Aarhus, Denmark
(d) Department of Economics, Duke University, Durham, NC 27708, United States
(e) Toulouse School of Economics (GREMAQ, IDEI), 21 Allée de Brienne, 31000 Toulouse, France
article info

Article history: Available online 6 March 2010
JEL classification: C14; C22; C52; G14
Keywords: Volatility forecasting; High-frequency data; Market microstructure noise; Integrated volatility; Realized volatility; Robust volatility measures; Eigenfunction stochastic volatility models
abstract

We extend the analytical results for reduced form realized volatility based forecasting in ABM (2004) to allow for market microstructure frictions in the observed high-frequency returns. Our results build on the eigenfunction representation of the general stochastic volatility class of models developed by Meddahi (2001). In addition to traditional realized volatility measures and the role of the underlying sampling frequencies, we also explore the forecasting performance of several alternative volatility measures designed to mitigate the impact of the microstructure noise. Our analysis is facilitated by a simple unified quadratic form representation for all these estimators. Our results suggest that the detrimental impact of the noise on forecast accuracy can be substantial. Moreover, the linear forecasts based on a simple-to-implement ‘average’ (or ‘subsampled’) estimator obtained by averaging standard sparsely sampled realized volatility measures generally perform on par with the best alternative robust measures. © 2010 Elsevier B.V. All rights reserved.
✩ We are grateful to Federico Bandi, Peter Christoffersen, Peter R. Hansen, Per Mykland, Neil Shephard, Viktor Todorov and participants at various conferences and seminars for helpful discussions. We also thank Selma Chaker and Bruno Feunou for excellent research assistance. The work of Andersen and Bollerslev was supported by a grant from the NSF to the NBER and CREATES funded by the Danish National Research Foundation. The work of Meddahi was supported by MITACS. This paper was written while the third author was visiting CREST. He thanks the CREST for its hospitality and financial support.
∗ Corresponding author. Tel.: +1 847 467 1285; fax: +1 847 491 5719.
E-mail addresses: [email protected] (T.G. Andersen), [email protected] (T. Bollerslev), [email protected] (N. Meddahi).
1 Tel.: +1 919 660 1846.
2 Tel.: +33 (0) 5 61 12 85 63.
0304-4076/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2010.03.032

1. Introduction

In recent years, the increased availability of complete transaction and quote records for financial assets has spurred a literature seeking to exploit this information in estimating the current level of return volatility. Merton (1980) notes that spot volatility may be inferred perfectly if the asset price follows a diffusion process and a continuous record of prices is available. However, practical implementation presents significant challenges. First, we only observe prices at intermittent and discrete points in time. This induces discretization errors in estimates of current volatility. Second, and more importantly, the recorded prices do not reflect direct observations of a frictionless diffusive process. Market prices are quoted on a discrete price grid with a gap between buying and selling prices, i.e., a bid-ask spread, and different prices may be quoted simultaneously by competing market makers due to heterogeneous beliefs, information and inventory positions. The latter set of complications is referred to jointly as market microstructure effects. Consequently, any observed price does not represent a unique market price but instead an underlying ideal price confounded by an error term reflecting the impact of market microstructure frictions, or ‘‘noise’’.

The early literature accommodates microstructure noise by sampling prices relatively sparsely to ensure that the intraday returns are approximately mean zero and uncorrelated. As such, the realized volatility estimator, which cumulates intraday squared returns, provides a near unbiased return variation measure; see, e.g., Andersen et al. (2000). Further, as stressed by Andersen and Bollerslev (1998), Andersen et al. (2001), and Barndorff-Nielsen and Shephard (2001), in the diffusive case, and absent microstructure frictions, this estimator is consistent for the integrated variance as the sampling frequency diverges. Importantly, this represents a paradigm shift towards ex-post estimation of (average)
volatility over a non-trivial interval, avoiding the pitfalls associated with estimation of spot volatility when prices embed microstructure distortions. However, realized volatility computed from sparsely sampled data suffers from a potentially substantial discretization error; see Barndorff-Nielsen and Shephard (2002), Jacod and Protter (1998), and Meddahi (2002a). An entire literature is devoted to improving the estimator. Important contributions include the first-order autocorrelation adjustment by Zhou (1996), the notion of an optimal sampling frequency by Bandi and Russell (2006, 2008) and Aït-Sahalia et al. (2005), the average and two-scale estimator of Zhang et al. (2005), the multi-scale estimator of Zhang (2006), as well as the realized kernel estimator of Barndorff-Nielsen et al. (2008a). Another key issue concerns the use of realized volatility measures for decision making. Real-time asset allocation, derivatives pricing and risk management are conducted given current (conditional) expectations for the return distribution over the planning horizon. Hence, the measures of current and past volatility must be converted into useful predictors of future return variation. This critical step is inevitably model-dependent, but the realized volatility based forecasting literature is less developed. A number of empirical studies compare the performance of forecasts using realized variation measures to standard stochastic volatility (SV) forecasts as well as option based predictions; see Andersen et al. (2003), Deo et al. (2006), Koopman et al. (2005), and Pong et al. (2004), among others. The realized variation forecasts generally dominate traditional SV model forecasts based on daily data and they perform roughly on par with the options based forecasts.
In terms of a more analytic assessment, existing results stem from a handful of simulation studies which, aside from being model specific, typically ignore microstructure effects.3 The model-specific nature of these studies is partially circumvented by Andersen, Bollerslev and Meddahi (henceforth ABM, 2004, 2005). They exploit the eigenfunction stochastic volatility (ESV) framework of Meddahi (2001) in developing analytic expressions for forecast performance spanning all SV diffusions commonly used in the literature. This set-up delivers expressions for the optimal linear forecasts based on the history of past realized volatility measures and allows for direct comparison as the sampling frequency of the intraday returns varies or the measurement horizon changes.4 It also facilitates analysis of the (artificial) deterioration in forecast performance due to the use of feasible realized volatility measures as ex-post benchmarks for return variation in lieu of the true integrated volatility. Nonetheless, these studies do not account for the impact of microstructure noise on practical measurement and forecast performance. In fact, there is no obvious way to assess this issue analytically for a broad class of models within the existing literature.5 In this paper, we extend the ABM studies by explicitly accounting for microstructure noise in the analytic derivation of realized volatility based forecasts. The literature on this topic is limited to concurrent work by Aït-Sahalia and Mancini (2008) and Ghysels and Sinko (2006). These papers provide complementary evidence as they resort to simulation methods or empirical assessment in order to rank the estimators while also studying data generating
processes and forecast procedures not considered here.6 For example, Aït-Sahalia and Mancini (2008) include long memory and jump diffusions among the scenarios explored, while Ghysels and Sinko (2006) consider nonlinear forecasting techniques based on the MIDAS regression approach. Moreover, a preliminary review of some results, originally derived for this project, is included in Garcia and Meddahi (2006). The remainder of the paper unfolds as follows. The next section briefly introduces the theoretical framework, including the ESV model and the definition of realized volatility, followed by an enumeration of the analytical expressions for the requisite moments underlying our main theoretical results. Section 3 presents the optimal linear forecasting rules for integrated volatility when the standard realized volatility measure is contaminated by market microstructure noise. We also quantify the impact of noise for forecast performance and explore notions of ‘‘optimal’’ sampling frequency. Moreover, we show how optimally combining intraday squared returns in constructing integrated volatility forecasts does not materially improve upon forecasts relying on realized volatilities computed from equally weighted intraday squared returns. Section 4 shows that many robust realized volatility measures, designed to mitigate the impact of microstructure noise, may be conveniently expressed as a quadratic form of intraday returns sampled at the highest possible frequency. This representation, in turn, facilitates the derivation of the corresponding optimal linear integrated volatility forecasts. We find that a simple estimator, obtained by averaging different sparsely sampled standard realized volatility measures (sometimes referred to as a subsampled estimator), is among the best forecast performers. Moreover, the differences among the competing realized volatility estimators can be substantial, highlighting the potential impact of noise for practical forecast performance. 
For example, we show that feasible realized volatility forecasting regressions based on the ‘‘wrong’’ realized volatility measure may, falsely, suggest near zero predictability, when in fact more than fifty percent of the day-to-day variation in the (latent) integrated volatility is predictable. Section 5 provides concluding remarks. All main proofs are deferred to the technical Appendix.

2. Theoretical framework

2.1. General setup and assumptions

We focus on a single asset traded in a liquid financial market. We assume that the sample-path of the corresponding (latent) price process, {S_t^*, 0 ≤ t}, is continuous and determined by the stochastic differential equation (sde)

\[ d\log(S_t^*) = \sigma_t\, dW_t, \tag{2.1} \]

where W_t denotes a standard Brownian motion, and the spot volatility process σ_t is predictable and has a continuous sample path. We assume the σ_t and W_t processes are uncorrelated and, for convenience, we refer to the unit time interval as a day.7 Our primary interest centers on forecasting the (latent) integrated volatility over daily and longer inter-daily horizons. Specifically, we define the one-period integrated volatility,

\[ IV_{t+1} \equiv \int_t^{t+1} \sigma_\tau^2\, d\tau, \tag{2.2} \]

and, for m a positive integer, the corresponding multi-period measure,

\[ IV_{t+1:t+m} = \sum_{j=1}^{m} IV_{t+j}. \tag{2.3} \]

3 For example, Andersen et al. (1999) document substantial gains from volatility forecasts based on high-frequency data over daily GARCH forecasts through simulations from a GARCH diffusion.
4 This same approach has recently been adopted by Corradi et al. (2009a,b) in analyzing the predictive inference for integrated volatility.
5 Bandi et al. (2008) find that choosing a proper sampling frequency in constructing realized volatility measures has important benefits for a dynamic portfolio choice.
6 We only became aware of these projects after initiating the current work.
7 The extension to scenarios including either return–volatility correlations or a drift term in the return process is discussed in the working paper version of this paper (ABM, 2006).
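Since the integrated volatility in Eqs. (2.2)–(2.3) is simply an integral of the spot-variance path, it can be approximated numerically by a Riemann sum over a fine grid. A minimal sketch; the sinusoidal variance path below is purely illustrative and not from the paper:

```python
import math

def integrated_variance(spot_var, t0, t1, n_steps=100_000):
    """Approximate IV = int_{t0}^{t1} sigma_tau^2 dtau by a left Riemann sum."""
    dt = (t1 - t0) / n_steps
    return sum(spot_var(t0 + i * dt) for i in range(n_steps)) * dt

# Illustrative spot-variance path: sigma_tau^2 = 0.04 + 0.01*sin(2*pi*tau).
spot_var = lambda tau: 0.04 + 0.01 * math.sin(2.0 * math.pi * tau)

# Over one full period the sine term integrates to zero, so IV is 0.04 here.
iv = integrated_variance(spot_var, 0.0, 1.0)
```

The multi-period measure in Eq. (2.3) is then just the sum of such daily integrals.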
In this context, IV_t equals the quadratic return variation which, in turn, provides a natural measure of the ex-post return variability; see, e.g., the discussion in Andersen et al. (2006, 2010).8 Integrated volatility is not directly observable but, as highlighted by Andersen and Bollerslev (1998), Andersen et al. (2001, 2003), Barndorff-Nielsen and Shephard (2001, 2002), and Meddahi (2002a), the corresponding realized volatilities provide consistent estimates of IV_t. The standard realized volatility measure is simply,

\[ RV_t^*(h) \equiv \sum_{i=1}^{1/h} r_{t-1+ih}^{*(h)2}, \tag{2.4} \]

where 1/h is assumed to be a positive integer and

\[ r_t^{*(h)} \equiv \log(S_t^*) - \log(S_{t-h}^*). \tag{2.5} \]

Formally, RV_t^*(h) is uniformly consistent for IV_t as h → 0, i.e., as the intraday sampling frequency goes to infinity. Moreover, ABM (2004) demonstrate that simple autoregressive models for RV_t^*(h) provide simple-to-implement and, for many popular SV models, remarkably close to efficient forecasts for IV_{t+1} and IV_{t+1:t+m}.9

In practice, recorded prices are invariably affected by microstructure frictions and do not adhere to the model in (2.1) at the highest frequencies. The studies above advocate using relatively sparse sampling to allow Eq. (2.1) to adequately approximate the observed price process (see, e.g., the discussion of the so-called volatility signature plot in Andersen et al., 2000, for informally selecting the value of h). More recently, however, many studies advocate explicitly including noise terms in the price process and then designing procedures to mitigate their impact on volatility measurement. We focus on the most common scenario in the literature involving i.i.d. noise. Hence, the observed price, {S_t, 0 ≤ t}, is governed by the process in (2.1) plus a noise component,

\[ \log(S_t) = \log(S_t^*) + u_t, \tag{2.6} \]

where u_t is i.i.d., independent of the frictionless price process S_t^*, with mean zero, variance V_u, and kurtosis K_u = E[u_t^4]/V_u^2. In the illustrations below we focus on K_u = 3, corresponding to a Gaussian noise term, but our results allow for any finite value of K_u.10 If S_t, but not S_t^*, is observable, the h-period returns become,

\[ r_t^{(h)} \equiv \log(S_t) - \log(S_{t-h}). \tag{2.7} \]

These noise-contaminated returns are linked to the returns in (2.5) as,

\[ r_t^{(h)} = r_t^{*(h)} + e_t^{(h)}, \tag{2.8} \]

where

\[ e_t^{(h)} \equiv u_t - u_{t-h}. \tag{2.9} \]

The noise induces an MA(1) error structure in observed returns. For very small h the variance of the noise term, e_t^{(h)}, dominates the variance of the true return, r_t^{*(h)}. In fact, as shown by Bandi and Russell (2006, 2008) and Zhang et al. (2005), the feasible realized volatility measure based on contaminated high-frequency returns,

\[ RV_t(h) \equiv \sum_{i=1}^{1/h} r_{t-1+ih}^{(h)2} \tag{2.10} \]

is inconsistent for IV_t and diverges to infinity as h → 0. Nonetheless, RV_t(h) can still be used to construct meaningful forecasts for IV_{t+1} and IV_{t+1:t+m} for moderate values of h. Indeed, as documented below, by balancing the impact of the noise and signal, accurate volatility forecasting based on simple autoregressive models for RV_t(h) is feasible. In addition, a number of alternative robust realized volatility measures, explicitly designed to account for high-frequency noise, have been proposed. We therefore also compare and contrast the performance of reduced form forecasting models for these alternative measures to those based on the traditional RV_t(h) measure.

8 The integrated volatility also figures prominently in the option pricing literature; see, e.g., Hull and White (1987) and Garcia et al. (2011).
9 These theoretical results corroborate the empirical findings of Andersen et al.
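The practical content of Eqs. (2.6)–(2.10) is that the i.i.d. noise leaves RV_t^*(h) essentially unbiased for the daily return variation while inflating RV_t(h) by roughly 2V_u/h, so the bias explodes as the grid is refined. A rough Monte Carlo sketch with constant spot volatility; all parameter values are illustrative only:

```python
import random

random.seed(42)

def simulate_rv(n_obs, sigma2=0.04, v_u=4e-5):
    """One day on a grid of n_obs returns (h = 1/n_obs): return (RV*, RV)."""
    h = 1.0 / n_obs
    u_prev = random.gauss(0.0, v_u ** 0.5)               # noise u_{t-h}, Eq. (2.6)
    rv_clean, rv_noisy = 0.0, 0.0
    for _ in range(n_obs):
        r_star = random.gauss(0.0, (sigma2 * h) ** 0.5)  # efficient return, Eq. (2.5)
        u = random.gauss(0.0, v_u ** 0.5)
        r = r_star + u - u_prev                          # contaminated return, Eqs. (2.8)-(2.9)
        rv_clean += r_star ** 2                          # RV*_t(h), Eq. (2.4)
        rv_noisy += r ** 2                               # RV_t(h),  Eq. (2.10)
        u_prev = u
    return rv_clean, rv_noisy

# Averaging over many days: E[RV(h)] is near sigma2 + 2*V_u/h, E[RV*(h)] near sigma2.
days = 200
for n_obs in (48, 288, 1440):
    clean, noisy = zip(*(simulate_rv(n_obs) for _ in range(days)))
    print(n_obs, sum(clean) / days, sum(noisy) / days)
```

With V_u chosen so that the noise-to-signal ratio is about 0.1%, the upward drift of the averages as n_obs grows mimics the divergence discussed around Eq. (2.10).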
2.2. Eigenfunction stochastic volatility models

We follow ABM (2004) in assuming that the spot volatility process belongs to the Eigenfunction Stochastic Volatility (ESV) class introduced by Meddahi (2001). This class of models includes most diffusive stochastic volatility models in the literature. For illustration, assume that volatility is driven by a single state variable.11 The ESV representation, with p denoting a positive, possibly infinite, integer, takes the generic form,

\[ \sigma_t^2 = \sum_{n=0}^{p} a_n P_n(f_t), \tag{2.11} \]

where the latent state variable f_t is governed by a diffusion process,

\[ df_t = m(f_t)\,dt + \sqrt{v(f_t)}\, dW_t^f, \tag{2.12} \]

where the Brownian motion W_t^f is independent of W_t in Eq. (2.1), the a_n coefficients are real numbers, and the P_n(f_t)'s are the eigenfunctions of the infinitesimal generator associated with f_t.12 The eigenfunctions are orthogonal and centered at zero,

\[ E[P_n(f_t)] = 0, \qquad E[P_n(f_t)P_j(f_t)] = 0, \quad n \neq j, \tag{2.13} \]

and follow first-order autoregressive processes,

\[ E[P_n(f_{t+l}) \mid f_\tau, \tau \le t] = \exp(-\lambda_n l)P_n(f_t), \quad \forall l > 0, \tag{2.14} \]

where (−λ_n) denote the corresponding eigenvalues. These simplifying features render a derivation of analytic multi-step forecasts for σ_t^2 and for the moments of the discretely sampled returns, r_t^{(h)}, from the model defined by Eqs. (2.1), (2.8), (2.11) and (2.12), feasible. The following proposition collects the relevant results for the subsequent analysis.

Proposition 2.1. Let the discrete-time noise-contaminated and ideal returns, r_t^{(h)} and r_t^{*(h)}, respectively, be determined by an ESV model and Eq. (2.8), with corresponding one- and m-period integrated volatilities, IV_t and IV_{t+1:t+m}, defined by Eqs. (2.2) and (2.3). Then for positive integers i ≥ j ≥ k ≥ l, m ≥ 1, and h > 0,

\[ E[r_{t+ih}^{(h)}] = E[r_{t+ih}^{*(h)}] = 0, \tag{2.15} \]

\[ E[r_{t+ih}^{(h)2}] = \mathrm{Var}[r_{t+ih}^{(h)}] = \mathrm{Var}[r_{t+ih}^{*(h)}] + 2V_u = a_0 h + 2V_u, \tag{2.16} \]

\[ \mathrm{Cov}[r_{t+ih}^{(h)}, r_{t+jh}^{(h)}] = -V_u, \quad \text{for } |i-j| = 1, \ \text{and } = 0 \text{ for } |i-j| > 1, \tag{2.17} \]

\[
E[r_{t+ih}^{(h)} r_{t+jh}^{(h)} r_{t+kh}^{(h)} r_{t+lh}^{(h)}] =
\begin{cases}
3a_0^2h^2 + 6\sum_{n=1}^{p} \dfrac{a_n^2}{\lambda_n^2}\big[-1+\lambda_n h+\exp(-\lambda_n h)\big] + 2V_u^2(K_u+3) + 12a_0V_uh, & \text{if } i=j=k=l,\\[4pt]
-V_u^2(K_u+3) - 3a_0V_uh, & \text{if } i=j=k=l+1 \text{ or } i=j+1=k+1=l+1,\\[4pt]
a_0^2h^2 + \sum_{n=1}^{p} \dfrac{a_n^2}{\lambda_n^2}\big[1-\exp(-\lambda_n h)\big]^2 + V_u^2(K_u+3) + 4a_0V_uh, & \text{if } i=j=k+1=l+1,\\[4pt]
a_0^2h^2 + \sum_{n=1}^{p} \dfrac{a_n^2}{\lambda_n^2}\big[1-\exp(-\lambda_n h)\big]^2\exp(-\lambda_n(i-k-1)h) + 4V_u^2 + 4a_0V_uh, & \text{if } i=j>k+1,\ k=l,\\[4pt]
2V_u^2, & \text{if } i=j+1,\ j=k=l+1,\\[4pt]
-2V_u^2 - a_0V_uh, & \text{if } i=j\ge k+1,\ k=l+1, \text{ or } i=j+1,\ j\ge k+1,\ k=l,\\[4pt]
V_u^2, & \text{if } i=j+1,\ j\ge k+1,\ k=l+1,\\[4pt]
0, & \text{otherwise},
\end{cases} \tag{2.18}
\]

\[
\mathrm{Cov}[r_{t-1+m+ih}^{(h)} r_{t-1+m+jh}^{(h)},\, r_{t-1+kh}^{(h)} r_{t-1+lh}^{(h)}] =
\begin{cases}
\sum_{n=1}^{p} \dfrac{a_n^2}{\lambda_n^2}\big(1-\exp(-\lambda_n h)\big)^2\exp(-\lambda_n(m+(i-k-1)h)), & \text{if } m\ge 2,\ i=j,\ k=l,\\[4pt]
\sum_{n=1}^{p} \dfrac{a_n^2}{\lambda_n^2}\big(1-\exp(-\lambda_n h)\big)^2\exp(-\lambda_n(1+(i-k-1)h)), & \text{if } m=1,\ i=j,\ k=l,\ (i,k)\neq(1,1/h),\\[4pt]
\sum_{n=1}^{p} \dfrac{a_n^2}{\lambda_n^2}\big(1-\exp(-\lambda_n h)\big)^2 + (K_u-1)V_u^2, & \text{if } m=1,\ i=j=1,\ k=l=1/h,\\[4pt]
0, & \text{otherwise},
\end{cases} \tag{2.19}
\]

\[ \mathrm{Cov}\big[IV_t,\, r_{t-1+ih}^{(h)} r_{t-1+jh}^{(h)}\big] = \delta_{i,j}\left\{ 2\sum_{n=1}^{p} \frac{a_n^2}{\lambda_n^2}\big[\exp(-\lambda_n h)+\lambda_n h-1\big] + \sum_{n=1}^{p} \frac{a_n^2}{\lambda_n^2}\big[2-\exp(-\lambda_n(i-1)h)-\exp(-\lambda_n(1-ih))\big]\big[1-\exp(-\lambda_n h)\big] \right\}, \tag{2.20} \]

\[ \mathrm{Cov}\big[IV_{t+1:t+m},\, r_{t-1+ih}^{(h)} r_{t-1+jh}^{(h)}\big] = \delta_{i,j}\sum_{n=1}^{p} \frac{a_n^2}{\lambda_n^2}\big[1-\exp(-\lambda_n h)\big]\big[1-\exp(-\lambda_n m)\big]\exp(-\lambda_n(1-ih)). \tag{2.21} \]

9 (cont.) (2003), Areal and Taylor (2002), Corsi (2009), Deo et al. (2006), Koopman et al. (2005), Martens (2002), Pong et al. (2004), and Thomakos and Wang (2003), among many others, involving estimation of reduced form forecasting models for various realized volatility series.
10 In addition, the case of correlated microstructure noise is also briefly discussed in ABM (2006).
11 The one-factor ESV model may be extended to allow for multiple factors while maintaining the key results discussed below; see Meddahi (2001) for further details. See also Chen et al. (2009) for a general approach to eigenfunction modeling for multivariate Markov processes.
12 For a more detailed discussion of the properties of infinitesimal generators see, e.g., Hansen and Scheinkman (1995) and Aït-Sahalia et al. (2010).

For illustration, we rely on a GARCH diffusion and a two-factor affine model which are representative of the literature. We provide their sde form below, while the corresponding ESV specifications are given in ABM (2004, 2005).

Model M1—GARCH diffusion. The instantaneous volatility is given by,

\[ d\sigma_t^2 = \kappa(\theta - \sigma_t^2)\,dt + \sigma\sigma_t^2\, dW_t^{(2)}, \tag{2.22} \]

where κ = 0.035, θ = 0.636, σ = 0.296.

Model M2—two-factor affine. The instantaneous volatility is given by,

\[ \sigma_t^2 = \sigma_{1,t}^2 + \sigma_{2,t}^2, \qquad d\sigma_{j,t}^2 = \kappa_j(\theta_j - \sigma_{j,t}^2)\,dt + \eta_j\sigma_{j,t}\, dW_t^{(j+1)}, \quad j = 1, 2, \tag{2.23} \]

where κ1 = 0.5708, θ1 = 0.3257, η1 = 0.2286, κ2 = 0.0757, θ2 = 0.1786, and η2 = 0.1096, implying a very volatile first factor and a much more slowly mean reverting second factor, reminiscent of, e.g., the Engle and Lee (1999) model.

3. Traditional realized volatility based forecasts

3.1. Optimal linear forecasts

ABM (2004) examine integrated volatility forecasts constructed from linear regressions of IV_{t+1} and IV_{t+1:t+m} on current and lagged values of RV_t^*(h). However, Eq. (2.6) provides a better description of real-world prices than (2.1) when data are sampled at the highest frequencies. Hence, for small h, analytical results based on RV_t(h) in lieu of RV_t^*(h) should help us gauge the impact of microstructure noise on forecast accuracy. In order to derive analytical expressions for linear forecasts of IV_{t+1} and IV_{t+1:t+m} based on current and lagged RV_t(h) we must compute Cov[IV_{t+1}, RV_{t−l}(h)], Var[RV_t(h)], and Cov[RV_{t+1}(h), RV_{t−l}(h)], for l ≥ 0. To do so, note from Eq. (2.8) that,

\[ RV_t(h) = RV_t^*(h) + \sum_{i=1}^{1/h} e_{t-1+ih}^{(h)2} + 2\sum_{i=1}^{1/h} r_{t-1+ih}^{*(h)} e_{t-1+ih}^{(h)}. \tag{3.1} \]

Utilizing this decomposition, the following set of results follows readily.

Proposition 3.1. Let the discrete-time noise-contaminated and ideal returns, r_t^{(h)} and r_t^{*(h)}, be given by Eqs. (2.8) and (2.5), with the corresponding realized and integrated volatilities, RV_t^*(h), RV_t(h), IV_t and IV_{t+1:t+m}, defined by Eqs. (2.4), (2.10), (2.2) and (2.3), respectively. Then for integers m ≥ 1 and l ≥ 0, and h > 0,

\[ \mathrm{Cov}[IV_{t+1:t+m}, RV_{t-l}(h)] = \mathrm{Cov}[IV_{t+1:t+m}, RV_{t-l}^*(h)] = \mathrm{Cov}[IV_{t+1:t+m}, IV_{t-l}], \tag{3.2} \]

\[ \mathrm{Var}[RV_t(h)] = \mathrm{Var}[RV_t^*(h)] + 2V_u^2\left(\frac{2K_u}{h} - K_u + 1 + \frac{4E[\sigma_t^2]}{V_u}\right), \tag{3.3} \]

\[ \mathrm{Cov}[RV_{t+1}(h), RV_t(h)] = \mathrm{Cov}[RV_{t+1}^*(h), RV_t^*(h)] + (K_u-1)V_u^2, \tag{3.4} \]

\[ \mathrm{Cov}[RV_{t+1}(h), RV_{t-l}(h)] = \mathrm{Cov}[RV_{t+1}^*(h), RV_{t-l}^*(h)], \quad l \ge 1. \tag{3.5} \]
The proposition expresses variances and covariances for RVt (h) via the counterparts for RV∗t (h) along with the noise variance
and kurtosis. The relevant expressions for terms involving RV_t^*(h) appear in ABM (2004). Adapting their notation, we have,

\[ \mathrm{Var}[IV_{t+1:t+m}] = 2\sum_{n=1}^{p} \frac{a_n^2}{\lambda_n^2}\big[\exp(-\lambda_n m) + \lambda_n m - 1\big], \tag{3.6} \]

\[ \mathrm{Cov}(IV_{t+1:t+m}, IV_{t-l}) = \sum_{n=1}^{p} \frac{a_n^2}{\lambda_n^2}\big[1 - \exp(-\lambda_n)\big]\big[1 - \exp(-\lambda_n m)\big]\exp(-\lambda_n l), \tag{3.7} \]

\[ \mathrm{Var}[RV_{t+1}^*(h)] = \mathrm{Var}[IV_{t+1}] + 2a_0^2 h + \frac{4}{h}\sum_{n=1}^{p} \frac{a_n^2}{\lambda_n^2}\big[\exp(-\lambda_n h) - 1 + \lambda_n h\big], \tag{3.8} \]

\[ \mathrm{Cov}[RV_{t+1}^*(h), RV_{t-l}^*(h)] = \mathrm{Cov}[IV_{t+1}, IV_{t-l}]. \tag{3.9} \]

Combining these expressions with Proposition 3.1, we obtain the requisite variances and covariances for the noise-contaminated realized volatility based forecasts. We cannot quantify the economic losses due to the adverse impact of microstructure noise on the precision of volatility forecasts, as the appropriate loss function depends on the application. Instead, for compatibility with the extant literature, we focus on the R2 from the regression of future integrated variance on a constant and the associated forecast variables. This is equivalent to adopting a mean-squared-error (MSE) criterion for the unconditional bias-corrected return variation forecast.13 The main limitation is that we do not consider (unknown) time-variation in the noise distribution, which would reduce the effectiveness of the (unconditional) bias-correction. As such, our analysis provides only a first step towards understanding the impact of noise on volatility forecasting. The R2 from the Mincer–Zarnowitz style regression of IV_{t+1} onto a constant and the (l + 1) × 1 vector, (RV_t(h), RV_{t−1}(h), ..., RV_{t−l}(h)), l ≥ 0, may be succinctly expressed as,

\[ R^2(IV_{t+1}, RV_t(h), l) = C(IV_{t+1}, RV_t(h), l)^\top \big(M(RV_t(h), l)\big)^{-1} C(IV_{t+1}, RV_t(h), l)\big/\mathrm{Var}[IV_t], \tag{3.10} \]

where the typical elements in the (l + 1) × 1 vector C(IV_{t+1}, RV_t(h), l) and the (l + 1) × (l + 1) matrix M(RV_t(h), l) are given by, respectively,

\[ C(IV_{t+1}, RV_t(h), l)_i = \mathrm{Cov}(IV_{t+1}, RV_{t-i+1}(h)), \tag{3.11} \]

and,

\[ M(RV_t(h), l)_{ij} = \mathrm{Cov}(RV_t(h), RV_{t+i-j}(h)). \tag{3.12} \]

The corresponding R2 for the longer-horizon integrated volatility forecasts is obtained by replacing IV_{t+1} with IV_{t+1:t+m} in the formulas immediately above.

3.2. Quantifying the impact of market microstructure noise

The impact of microstructure noise is related to the size of the noise variance relative to the daily return variation, conveniently captured by the noise-to-signal ratio, γ ≡ V_u/E[IV_t]. Hansen and Lunde (2006) estimate this factor for thirty actively traded stocks during the year 2000 and find values around 0.1%, with most slightly lower. The magnitude of the noise has declined in recent years and is now much lower for many stocks. Consequently, we use 0.1% as the benchmark for a realistic, if not inflated, value for γ.
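The quadratic form in Eqs. (3.10)–(3.12) is straightforward to evaluate once the covariances are supplied. A self-contained sketch, using a hypothetical geometric covariance structure rather than the paper's ESV-implied moments:

```python
def mz_r2(cov_iv_rv, cov_rv, var_iv):
    """Population R^2 = C' M^{-1} C / Var[IV], as in Eqs. (3.10)-(3.12).

    cov_iv_rv: list, entry i = Cov(IV_{t+1}, RV_{t-i}(h))
    cov_rv:    function, cov_rv(k) = Cov(RV_t(h), RV_{t+k}(h))
    """
    n = len(cov_iv_rv)
    m = [[cov_rv(i - j) for j in range(n)] for i in range(n)]
    # Solve M x = C by Gauss-Jordan elimination with partial pivoting.
    aug = [row[:] + [c] for row, c in zip(m, cov_iv_rv)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    x = [aug[i][n] / aug[i][i] for i in range(n)]
    return sum(ci * xi for ci, xi in zip(cov_iv_rv, x)) / var_iv

# Hypothetical moments: Cov(IV_{t+1}, RV_{t-i}) = 0.8 * 0.9**(i + 1),
# Cov(RV_t, RV_{t+k}) = 1.2 if k == 0 else 0.8 * 0.9**abs(k), Var[IV_t] = 1.0.
c = [0.8 * 0.9 ** (i + 1) for i in range(5)]
r2 = mz_r2(c, lambda k: 1.2 if k == 0 else 0.8 * 0.9 ** abs(k), 1.0)
```

Adding lags can only raise this population R2, which is the mechanism behind the lagged-regressor rows of Table 1.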
13 Patton (2011) provides an interesting discussion concerning the choice of loss function for assessing the performance of alternative volatility forecasts when the latent volatility is observed with noise.
We also explore the impact of a significantly higher noise-to-signal ratio of 0.5%. Table 1 reports the population R2 in Eq. (3.10) from the regression of future integrated volatility on various realized measures across different forecast horizons, data generating processes (models), levels of microstructure noise, and sampling frequencies. As reference we include, in row one, the R2's for the optimal (infeasible) forecasts based on the exact value of the (latent) volatility state variable(s). The next two rows concern the (infeasible) forecasts based on past daily (latent) integrated volatility and potentially an additional four lags, or a full week, of daily integrated volatilities. The next eleven rows report R2's for realized volatility based forecasts assuming no noise and sampling frequencies spanning h = 1/1440 to h = 1, representing 1-min to daily returns in a 24-h market, and we refer to them accordingly.14 Alternatively, the h = 1/1440 frequency reflects 15-s sampling over a 6-h equity trading day. Row one reveals that Model 1 implies a higher degree of predictability than Model 2. The latter embodies a second, less persistent, factor which reduces the serial correlation in volatility. In rows two and three, we find only a small loss of predictability for forecasts based on the last day's integrated volatility rather than spot volatility, while exploiting a full week of daily integrated volatilities is only marginally helpful. We now consider realized volatility based forecasts in the ideal case without noise. Rows four to fourteen reveal only a small drop in predictive power for forecasts using measures constructed from 1- and 5-min returns. At lower sampling frequencies, where the return variation measures are less precise, the addition of lagged volatility measures becomes progressively more valuable. The results for twenty daily lags (19 extra) in row fourteen mimic the performance of a well-specified GARCH model, as detailed in ABM (2004).
Finally, turning to the new results for forecasts based on realized volatilities constructed from noisy returns, we first observe only a mild degradation in performance for the realistic case with γ = 0.1%. However, for the higher noise level in the bottom part of the table, the performance deteriorates more sharply and using lagged volatility measures is now critical in boosting the predictive power. Second, as anticipated, it is not optimal to estimate the realized volatility with ultra-high frequency returns. At the moderate noise level, the performance is better for 5-min rather than 1-min sampling, and as γ grows further, sampling at the 15- and 30-min levels produces the highest coherence between forecasts and future realizations, because noise increasingly dominates sampling variability as the main source of variation in the realized measures. Further evidence of the benefit from sparse sampling in this context is obtained by comparing the decline in predictability from the γ = 0.1% to the γ = 0.5% scenario for the 5-min (h = 1/288) versus 30-min (h = 1/48) sampling frequency. One finds a drop in the R2 from moderate to large noise for h = 1/288 at the one-day forecast horizon in Model 1 from about 91% to 72% (92% to 84% if lags are exploited), compared to a drop for h = 1/48 from about 82% to 75% (88% to 85% with lags). Third, the importance of exploiting lagged realized volatility measures increases sharply with the noise level. Even for γ = 0.1%, the measures based on 30-min returns are quite competitive with those using 5-min sampling once the lagged volatility measures are exploited. In fact, for the higher noise level, the 30-min based measures dominate the 5-min based ones in all scenarios. Hence, within the class of linear realized volatility based forecast procedures, 30-min sampling appears to provide a robust and fairly efficient choice as long as past daily realized measures are also exploited.
14 The figures in the first fourteen rows of Table 1 are extracted from Tables 1 through 6 of ABM (2004).
Table 1
R2 for integrated variance forecasts.

                                       Model M1                 Model M2
γ(%)   1/h    Regressors               Horizon: 1    5     20       1      5      20

       -      R2(Best)                 0.977  0.891  0.645    0.830  0.586  0.338
       -      R2(IVt)                  0.955  0.871  0.630    0.689  0.445  0.214
       -      R2(IVt, 4)               0.957  0.874  0.632    0.698  0.446  0.227

0      1440   R2(RV*t(h))              0.950  0.867  0.627    0.679  0.439  0.211
              R2(RV*t(h), 4)           0.951  0.868  0.627    0.685  0.450  0.224
       288    R2(RV*t(h))              0.932  0.851  0.615    0.641  0.414  0.199
              R2(RV*t(h), 4)           0.934  0.852  0.616    0.642  0.429  0.216
       96     R2(RV*t(h))              0.891  0.813  0.588    0.563  0.364  0.175
              R2(RV*t(h), 4)           0.908  0.829  0.599    0.580  0.395  0.202
       48     R2(RV*t(h))              0.836  0.762  0.551    0.476  0.307  0.148
              R2(RV*t(h), 4)           0.883  0.805  0.582    0.519  0.360  0.186
       1      R2(RV*t(h))              0.122  0.111  0.081    0.031  0.020  0.010
              R2(RV*t(h), 4)           0.360  0.329  0.238    0.072  0.054  0.029
              R2(RV*t(h), 19)          0.493  0.450  0.325    0.092  0.074  0.043

0.1    1440   R2(RVt(h))               0.896  0.817  0.591    0.547  0.353  0.170
              R2(RVt(h), 4)            0.911  0.831  0.601    0.569  0.388  0.199
       288    R2(RVt(h))               0.908  0.828  0.599    0.581  0.375  0.181
              R2(RVt(h), 4)            0.917  0.837  0.605    0.594  0.402  0.205
       96     R2(RVt(h))               0.873  0.797  0.576    0.525  0.339  0.163
              R2(RVt(h), 4)            0.899  0.821  0.593    0.553  0.379  0.195
       48     R2(RVt(h))               0.821  0.749  0.541    0.450  0.291  0.140
              R2(RVt(h), 4)            0.877  0.800  0.578    0.501  0.349  0.182
       1      R2(RVt(h))               0.122  0.111  0.081    0.031  0.020  0.010
              R2(RVt(h), 4)            0.360  0.328  0.237    0.071  0.053  0.029
              R2(RVt(h), 19)           0.492  0.449  0.325    0.091  0.074  0.042

0.5    1440   R2(RVt(h))               0.446  0.407  0.294    0.123  0.080  0.038
              R2(RVt(h), 4)            0.711  0.649  0.469    0.222  0.164  0.088
       288    R2(RVt(h))               0.719  0.656  0.474    0.300  0.194  0.093
              R2(RVt(h), 4)            0.837  0.764  0.552    0.395  0.283  0.149
       96     R2(RVt(h))               0.772  0.704  0.509    0.365  0.236  0.113
              R2(RVt(h), 4)            0.858  0.782  0.566    0.443  0.313  0.165
       48     R2(RVt(h))               0.750  0.684  0.495    0.349  0.225  0.108
              R2(RVt(h), 4)            0.849  0.775  0.560    0.431  0.306  0.161
       1      R2(RVt(h))               0.121  0.110  0.080    0.031  0.020  0.010
              R2(RVt(h), 4)            0.357  0.326  0.236    0.070  0.052  0.028
              R2(RVt(h), 19)           0.491  0.448  0.324    0.090  0.073  0.042

Note: The table provides the R2 of the MZ regression related to forecasts of IV_{t+1:t+m} 1, 5 and 20 days ahead, for models M1 and M2; h indicates the size of the intraday return interval; γ is the noise-to-signal ratio, Var[noise]/E[IV_t]. The first row provides the optimal forecast from ABM (2004). The next two rows refer to cases where the explanatory variable is current (row 2) and lagged IV_t (row 3). The last three blocks correspond to cases with current and lagged (4 and 19 lags) daily realized volatility as regressors.
3.3. Optimal sampling frequency

The findings in Section 3.2 suggest exploring the notion of an optimal sampling frequency for RV_t(h), in the sense of maximizing the R² of the linear forecasting regressions or, equivalently, minimizing the MSE of the forecasts. This section considers two alternative proposals for choosing h. We focus on one-step-ahead forecasts, but the results are readily extended to longer horizons, as exemplified by our numerical calculations below.

One approach follows Bandi and Russell (2006, 2008) and Aït-Sahalia et al. (2005), who show that the optimal sampling frequency, in terms of minimizing the MSE of RV_t(h) conditional on the sample path of volatility, may be approximated by

  h_t* ≈ (IQ_t/(4V_u²))^(−1/3),    (3.13)

where the integrated quarticity is defined by

  IQ_t = ∫_{t−1}^{t} σ_τ⁴ dτ.    (3.14)
However, instead of attempting to estimate the optimal frequency on a period-by-period basis, we follow Bandi and Russell (2006) in replacing the hard-to-estimate one-period integrated quarticity by its unconditional expectation. Hence, we consider

  h_1 = (E[IQ_t]/(4V_u²))^(−1/3).    (3.15)
This unconditional counterpart to h_t* is fairly easy to estimate and implement in practice.

We also consider the frequency which minimizes the variance of RV_t(h). For motivation, note that the R² from the regression of IV_{t+1} on a constant and RV_t(h) is

  R² = Cov[IV_{t+1}, RV_t(h)]² / (Var[IV_t] Var[RV_t(h)]) = Cov[IV_{t+1}, IV_t]² / (Var[IV_t] Var[RV_t(h)]),    (3.16)

where the last equality follows from Proposition 3.1. Hence, maximizing this R² is tantamount to minimizing the unconditional variance Var[RV_t(h)], as also noted by Ghysels and Sinko (2006). To minimize this variance, we follow Barndorff-Nielsen and Shephard (2002) and Meddahi (2002a) in approximating the unconditional variance of the corresponding non-contaminated realized volatility measure by

  Var[RV_t*(h)] ≈ Var[IV_t] + 2h E[IQ_t].    (3.17)
Substituting this expression into the equation for Var[RV_t(h)] in Eq. (3.3) yields

  Var[RV_t(h)] ≈ Var[IV_t] + 2h E[IQ_t] + 2V_u² (2K_u/h − K_u + 1 + 4E[σ_t²]/V_u).    (3.18)
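Only the h-dependent terms of (3.18) matter for choosing the sampling interval. The sketch below uses hypothetical values for E[IQ_t], V_u and K_u (not the paper's calibrations), computes h_1 from (3.15) and h_2 from (3.19), and checks numerically that h_2 minimizes the h-dependent part of the variance approximation.

```python
import numpy as np

# Hypothetical moment values (not the paper's calibration).
E_IQ = 1.0   # E[IQ_t], expected integrated quarticity
V_u = 2e-4   # noise variance
K_u = 3.0    # noise kurtosis

h1 = (E_IQ / (4 * V_u**2)) ** (-1 / 3)        # Eq. (3.15): conditional-MSE rule
h2 = (E_IQ / (2 * V_u**2 * K_u)) ** (-1 / 2)  # Eq. (3.19): minimum-variance rule

def var_rv(h):
    """h-dependent part of the approximation (3.18) to Var[RV_t(h)]."""
    return 2 * h * E_IQ + 2 * V_u**2 * (2 * K_u / h)

grid = np.linspace(h2 / 10, h2 * 10, 20001)
h_min = grid[np.argmin(var_rv(grid))]  # numerical minimizer, close to h2
```

For these (assumed) moment values, h_2 < h_1, i.e. the minimum-variance rule prescribes more frequent sampling, in line with the discussion of Table 2 below.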
T.G. Andersen et al. / Journal of Econometrics 160 (2011) 220–234
Minimizing this expression with respect to h produces an alternative candidate sampling frequency,

  h_2 = (E[IQ_t]/(2V_u² K_u))^(−1/2).    (3.19)

The relative size of h_1 versus h_2 obviously depends on the magnitude and distribution of the noise term as well as the volatility-of-volatility, or E[IQ_t]. Importantly, however, both h_1 and h_2 may be estimated in a model-free fashion by using the higher order sample moments of RV_t(h) based on very finely sampled returns, or small h values, to assess V_u and K_u, along with the use of lower frequency returns to estimate E[IQ_t]; see Bandi and Russell (2006, 2008) for further discussion and empirical analysis along these lines.

Table 2 reports approximate optimal sampling frequencies, as represented by h_1 and h_2, for the scenarios in Table 1, along with the resulting population R²'s. Since h_2 directly optimizes an approximation of this quantity, we would expect the associated forecasts to outperform those based on h_1. Nonetheless, the size of the discrepancy is noteworthy. In some cases, the R² increases by over 25%, and there are always a few percent to be gained by adhering to h_2 rather than h_1. The reason is that h_2 invariably prescribes more frequent sampling than h_1. This finding reflects the pronounced right skew in the distribution of integrated quarticity. Large IQ_t values are associated with high optimal sampling frequencies to offset the increase in discretization error. Hence, averaging the optimal frequency across days, as in the derivation of h_1, ignores the disproportionate losses suffered on the most volatile days. In contrast, h_2 minimizes the average squared error and thus adjusts the sampling frequency to accommodate the more extreme days.

Of course, if the cost of a fixed sampling frequency is high, one may seek to vary the sampling frequency based on an initial estimate of the integrated quarticity. However, a comparison of the forecast performance associated with h_2 and moderate noise in Table 2 with that stemming from forecasts derived from realized volatility in the absence of noise in rows 4–7 in Table 1 shows that the loss is quite small. Hence, for these models, it seems more important to pin down a sensible sampling frequency than to vary the intraday return interval from day to day in response to the varying precision of the volatility measure. This is obviously a comforting finding from a practical perspective.15

Table 2
R² for 'optimally' sampled intraday returns.

                                   Horizon:   1      5      20
Model M1, γ = 0.1% (1/h_1 = 70.8, 1/h_2 = 487)
  R²(RV_t(h_1))                             0.854  0.779  0.563
  R²(RV_t(h_1), 4)                          0.891  0.813  0.588
  R²(RV_t(h_2))                             0.911  0.832  0.601
  R²(RV_t(h_2), 4)                          0.919  0.839  0.607
Model M1, γ = 0.5% (1/h_1 = 24.2, 1/h_2 = 97.3)
  R²(RV_t(h_1))                             0.684  0.624  0.451
  R²(RV_t(h_1), 4)                          0.824  0.752  0.544
  R²(RV_t(h_2))                             0.772  0.704  0.509
  R²(RV_t(h_2), 4)                          0.858  0.782  0.566
Model M2, γ = 0.1% (1/h_1 = 65.3, 1/h_2 = 431)
  R²(RV_t(h_1))                             0.487  0.315  0.151
  R²(RV_t(h_1), 4)                          0.527  0.364  0.188
  R²(RV_t(h_2))                             0.585  0.378  0.182
  R²(RV_t(h_2), 4)                          0.597  0.404  0.206
Model M2, γ = 0.5% (1/h_1 = 22.3, 1/h_2 = 86.2)
  R²(RV_t(h_1))                             0.285  0.184  0.089
  R²(RV_t(h_1), 4)                          0.383  0.275  0.146
  R²(RV_t(h_2))                             0.365  0.236  0.113
  R²(RV_t(h_2), 4)                          0.443  0.314  0.165

Note: The table provides the R² of the MZ regression related to forecasts of IV_{t+1:t+n}, 1, 5 and 20 days ahead, for models M1 and M2; h indicates the size of the intraday return interval; γ is the noise-to-signal ratio, Var[noise]/E[IV_t]. The regressors are current and lagged (4) realized volatility computed from return intervals of length h_1 or h_2. h_1, defined in (3.15), corresponds to the sampling interval suggested by Zhang et al. (2005) and Bandi and Russell (2008). h_2, defined in (3.19), corresponds to the frequency maximizing R²(IV_{t+1}, RV_t(h)).

3.4. Optimally combining intraday returns

The basic realized volatility estimator utilizes a flat weighting scheme in combining the information in intraday returns. This is primarily motivated by the consistency of the measure for the underlying return variation. Once noise is present, the basic measures become inconsistent, even if the sparse estimators only suffer from minor finite sample biases. Moreover, inconsistent measures can still provide a sensible basis for predicting the future return variation via forecast regressions which adjust for any systematic (unconditional) bias through the inclusion of a constant term. The main issue for forecast regressors is not their bias but their ability to capture variation in the current realized volatility, which typically translates into improved predictive performance. This suggests that we may want to loosen the link between the regressors and realized volatility measures. A natural step is to let the daily return variation proxy be a more flexible function of the intraday squared returns. To this end, we next contrast the predictive ability of optimally combined, or weighted, intraday squared returns with the usual realized volatility measure. The former may, for an optimal choice of the α(h) and β_i(h) coefficients, be represented by the regression

  IV_{t+1} = α(h) + Σ_{i=1}^{1/h} β_i(h) (r_{t−1+ih}^{(h)})² + η_{t+1}(h).    (3.20)

This regression is difficult to implement in practice due to the large number of parameters, 1 + 1/h, but we can readily compute its population counterpart within the ESV setting using Proposition 2.1. The corresponding numerical results are presented in Table 3. Comparing the results to those in the previous tables, the minor gains obtained by optimal intraday weighting are striking. Even if the improvements are slightly larger for Model 2 than Model 1, they will inevitably be negated, in practice, by the need to estimate the weighting scheme a priori. Of course, the flat weighting of the RV estimator strikes a sensible balance between efficiency and parsimony. However, the above representations only allow for linear weighting of the intraday squared returns. Many modern noise-robust estimators involve nonlinear functions of the intraday returns. We explore the forecast potential of some of these estimators below.16
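Because the flat-weight RV regression is the constrained (equal-β_i) version of the combination regression (3.20), its in-sample R² can never exceed that of the unrestricted specification. The small simulated sketch below (hypothetical volatility dynamics, no microstructure noise, not the paper's ESV calculation) illustrates the nesting.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n = 1500, 48                    # days and intraday returns per day (1/h = 48)
logiv = np.zeros(T)
for t in range(1, T):
    logiv[t] = 0.97 * logiv[t - 1] + 0.15 * rng.standard_normal()
iv = np.exp(logiv)                 # simulated daily integrated variance
r = rng.standard_normal((T, n)) * np.sqrt(iv[:, None] / n)  # intraday returns
sq = r**2                          # regressors entering Eq. (3.20)
rv = sq.sum(axis=1)                # flat-weighted realized volatility

def ols_r2(y, X):
    """In-sample R^2 of an OLS regression of y on a constant and X."""
    X = np.asarray(X).reshape(len(y), -1)
    Z = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ b
    return 1.0 - e.var() / y.var()

r2_flat = ols_r2(iv[1:], rv[:-1])   # IV_{t+1} on RV_t (restricted)
r2_comb = ols_r2(iv[1:], sq[:-1])   # IV_{t+1} on all squared returns (unrestricted)
```

In this homogeneous setting each squared return is an equally noisy proxy for the same daily variance, so the optimal weights are nearly flat and the unrestricted gain is small, consistent with the pattern in Table 3.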
4. Robust realized volatility based forecasts

This section investigates to what extent reduced-form forecast models based on noise-robust realized variation estimators improve on forecasts constructed from traditional realized volatility measures. In particular, we consider the average and two-scale estimators of Aït-Sahalia et al. (2005), the first-order autocovariance-adjusted estimator of Zhou (1996), and the realized kernels of Barndorff-Nielsen et al. (2008a).
15 As indicated previously, one major caveat is that time-variation in the noise distribution, which we do not consider, will render bias-correction less effective. This is most critical for procedures requiring frequent sampling, which tend to lower sampling variation but increase bias. Hence, this effect may work to offset some of the advantages of h_2 relative to h_1. Future work should further explore this issue.
16 The MIDAS scheme of Ghysels et al. (2006) also produces regression-based volatility forecasts using nonlinear functions of lagged intraday absolute returns, but this approach generally does not fall within the analytical ESV framework.
Table 3
R² for optimally combined intraday squared returns.

Model M1 (horizons 1, 5, 20)
γ = 0.1%
  1/h = 1440:  R²(RV_t(h)) 0.896 0.817 0.591   R²(optimal) 0.897 0.819 0.592
  1/h = 288:   R²(RV_t(h)) 0.908 0.828 0.599   R²(optimal) 0.910 0.830 0.600
  1/h = 96:    R²(RV_t(h)) 0.873 0.797 0.576   R²(optimal) 0.874 0.798 0.577
  1/h = 48:    R²(RV_t(h)) 0.821 0.749 0.541   R²(optimal) 0.821 0.749 0.542
γ = 0.5%
  1/h = 1440:  R²(RV_t(h)) 0.446 0.407 0.294   R²(optimal) 0.446 0.407 0.294
  1/h = 288:   R²(RV_t(h)) 0.719 0.656 0.474   R²(optimal) 0.719 0.656 0.475
  1/h = 96:    R²(RV_t(h)) 0.772 0.704 0.509   R²(optimal) 0.772 0.705 0.510
  1/h = 48:    R²(RV_t(h)) 0.750 0.684 0.495   R²(optimal) 0.750 0.684 0.495

Model M2 (horizons 1, 5, 20)
γ = 0.1%
  1/h = 1440:  R²(RV_t(h)) 0.547 0.353 0.170   R²(optimal) 0.562 0.359 0.171
  1/h = 288:   R²(RV_t(h)) 0.581 0.376 0.180   R²(optimal) 0.600 0.382 0.182
  1/h = 96:    R²(RV_t(h)) 0.525 0.339 0.163   R²(optimal) 0.537 0.343 0.164
  1/h = 48:    R²(RV_t(h)) 0.450 0.290 0.140   R²(optimal) 0.457 0.293 0.140
γ = 0.5%
  1/h = 1440:  R²(RV_t(h)) 0.123 0.080 0.038   R²(optimal) 0.124 0.080 0.038
  1/h = 288:   R²(RV_t(h)) 0.300 0.194 0.093   R²(optimal) 0.303 0.195 0.093
  1/h = 96:    R²(RV_t(h)) 0.365 0.235 0.113   R²(optimal) 0.369 0.237 0.114
  1/h = 48:    R²(RV_t(h)) 0.349 0.225 0.108   R²(optimal) 0.353 0.227 0.109

Note: The table provides the R² of the MZ regression related to forecasts of IV_{t+1:t+n}, 1, 5 and 20 days ahead, for models M1 and M2; h indicates the length of the intraday return interval; γ is the noise-to-signal ratio, Var[noise]/E[IV_t]. R²(RV_t(h)) is for the case where the regressor is RV_t(h). R²(optimal) corresponds to the optimal linear forecast combining (r_{t−1+ih}^{(h)})², i = 1, . . . , 1/h.
4.1. Quadratic form representation

We first develop a unified quadratic form representation for the alternative estimators.17 Let h denote the shortest practical intraday interval such that 1/h is an integer. As before, we let 1/h̄ denote the actual number of equally spaced returns used to construct a (sparsely sampled) realized volatility estimator. It is convenient to express each such measure as a quadratic function of the 1/h × 1 vector of the highest frequency returns. That is,

  RM_t(h) = Σ_{1≤i,j≤1/h} q_ij r_{t−1+ih}^{(h)} r_{t−1+jh}^{(h)} = R_t(h)⊤ Q R_t(h),    (4.1)

where the (1/h × 1) vector R_t(h) is defined by

  R_t(h) = (r_{t−1+h}^{(h)}, r_{t−1+2h}^{(h)}, . . . , r_t^{(h)})⊤.    (4.2)

17 Sun (2006) introduced the same quadratic representation independently of this paper.

In order to study the interaction between these alternative volatility measures and their relation to the underlying integrated variance, we need analytical expressions for the corresponding first and second moments. The next proposition delivers these quantities.

Proposition 4.1. Let the noise-contaminated returns be given by Eqs. (2.1) and (2.8), let RM_t(h) and R̃M_t(h) denote two realized volatility measures defined via Eq. (4.1) with corresponding quadratic form weights q_ij and q̃_ij, and let the integrated volatilities, IV_t and IV_{t+1:t+m}, be defined by Eqs. (2.2)–(2.3). Then,

  E[RM_t(h)] = Σ_{1≤i,j≤1/h} q_ij E[r_{t−1+ih}^{(h)} r_{t−1+jh}^{(h)}],    (4.3)

  E[RM_t(h)²] = Σ_{1≤i,j,k,l≤1/h} q_ij q_kl E[r_{t−1+ih}^{(h)} r_{t−1+jh}^{(h)} r_{t−1+kh}^{(h)} r_{t−1+lh}^{(h)}],    (4.4)

  E[RM_t(h) R̃M_t(h)] = Σ_{1≤i,j,k,l≤1/h} q_ij q̃_kl E[r_{t−1+ih}^{(h)} r_{t−1+jh}^{(h)} r_{t−1+kh}^{(h)} r_{t−1+lh}^{(h)}],    (4.5)

  E[IV_t RM_t(h)] = Σ_{1≤i,j≤1/h} q_ij E[IV_t r_{t−1+ih}^{(h)} r_{t−1+jh}^{(h)}],    (4.6)

  E[IV_{t+1:t+m} RM_t(h)] = Σ_{1≤i,j≤1/h} q_ij E[IV_{t+1:t+m} r_{t−1+ih}^{(h)} r_{t−1+jh}^{(h)}].    (4.7)

Proof. Follows directly from the quadratic form representation.

For the ESV model class, closed-form expressions for the right-hand side of the equations in Proposition 4.1 follow from Proposition 2.1.

4.2. Robust RV estimators

4.2.1. The ''all'' RV estimator
The ''all'' estimator equals the standard realized volatility applied to the maximal sampling frequency. The quadratic form representation is simply

  RV_t^all(h) ≡ RV_t(h) = Σ_{i=1}^{1/h} (r_{t−1+ih}^{(h)})² = R_t(h)⊤ Q^all(h) R_t(h),    (4.8)

where

  q_ij^all(h) = 1 for i = j, and q_ij^all(h) = 0 when i ≠ j.    (4.9)

This measure is not noise-robust, so it is a poor estimator of IV_t for small h. However, it plays an important role in defining some of the estimators below.

4.2.2. The sparse RV estimator
The sparse estimator equals the usual RV_t(h̄) measure, except that h̄ is a multiple of h; i.e., h̄ = h n_h̄ for n_h̄ a positive integer. The quadratic representation takes the form

  RV_t^sparse(h̄) = Σ_{i=1}^{1/h̄} (r_{t−1+ih̄}^{(h̄)})² = R_t(h)⊤ Q^sparse(h̄) R_t(h),    (4.10)

where

  q_ij^sparse(h̄) = 1 for i = j, or for i ≠ j with (s − 1)n_h̄ + 1 ≤ i, j ≤ s n_h̄, s = 1, . . . , 1/h̄; = 0 otherwise.    (4.11)

For a larger h̄, this estimator is more noise-robust but is subject to increased sampling variability. It also serves as a building block for more desirable estimators.
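The quadratic-form representation is easy to verify mechanically. A small sketch (arbitrary numbers; 1/h = 12 fine returns and n_h̄ = 3, both assumptions for illustration) builds Q^all and Q^sparse and checks that R⊤QR reproduces the direct summation formulas (4.8) and (4.10).

```python
import numpy as np

n, nh = 12, 3                      # 1/h fine returns; sparse interval = nh fine intervals
rng = np.random.default_rng(2)
r = 0.01 * rng.standard_normal(n)  # one day of fine-grid returns

# "All" estimator, Eq. (4.9): identity weights.
Q_all = np.eye(n)
rv_all = r @ Q_all @ r

# Sparse estimator, Eq. (4.11): ones in nh-by-nh diagonal blocks,
# so R'QR equals the sum of squared sparse-grid (block-sum) returns.
Q_sp = np.zeros((n, n))
for s in range(n // nh):
    Q_sp[s * nh:(s + 1) * nh, s * nh:(s + 1) * nh] = 1.0
rv_sp = r @ Q_sp @ r
rv_sp_direct = (r.reshape(-1, nh).sum(axis=1) ** 2).sum()
```

The block structure makes explicit why sparse sampling discards cross-products between returns in different blocks, which is exactly what dampens the noise accumulation.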
4.2.3. The average RV estimator
Aït-Sahalia et al. (2005) define the average (or ''subsampled'') RV estimator as the mean of several sparse estimators. In particular, define the n_h̄ distinct sparse estimators initiated respectively at 0, h, 2h, . . . , (n_h̄ − 1)h through the equation

  RV_t^sparse(h̄, k) = Σ_{i=1}^{N_k} (r_{t−1+kh+ih̄}^{(h̄)})² = R_t(h)⊤ Q^sparse(h̄, k) R_t(h),  k = 0, . . . , n_h̄ − 1,    (4.12)

where, as before, h̄ = h n_h̄, and

  N_k = 1/h̄ if k = 0, and N_k = 1/h̄ − 1 if k = 1, . . . , n_h̄ − 1.    (4.13)

In terms of the quadratic form representation we have

  q_ij^sparse(h̄, k) = 1 for k + 1 ≤ i = j ≤ N_k n_h̄ + k; = 1 for i ≠ j with (s − 1)n_h̄ + 1 + k ≤ i, j ≤ s n_h̄ + k, s = 1, . . . , N_k; = 0 otherwise.    (4.14)

The average estimator is now simply defined by the mean of these sparse estimators,

  RV_t^average(h̄) = (1/n_h̄) Σ_{k=0}^{n_h̄−1} RV_t^sparse(h̄, k) = R_t(h)⊤ Q^average(h̄) R_t(h),    (4.15)

where

  q_ij^average(h̄) = (1/n_h̄) Σ_{k=0}^{n_h̄−1} q_ij^sparse(h̄, k).    (4.16)

Whereas the sparse estimator only uses a small subset of the available data, the average estimator exploits more of the data by extending the estimator to each subgrid partition while retaining the associated robustness to noise for appropriately large values of h̄.

4.2.4. The (adjusted) two-scale RV estimator
The two-scale estimator of Zhang et al. (2005) is obtained by combining the RV_t^average(h̄) and RV_t^all(h) estimators. Specifically, let

  n̄ ≡ (1/n_h̄) Σ_{k=0}^{n_h̄−1} N_k = 1/h̄ − 1 + h/h̄.    (4.17)

The (finite-sample adjusted) two-scale estimator may then be expressed as

  RV_t^TS(h̄) = (1 − n̄h)^(−1) (RV_t^average(h̄) − n̄h RV_t^all(h)) = R_t(h)⊤ Q^TS(h̄) R_t(h),    (4.18)

where

  q_ij^TS(h̄) = (1 − n̄h)^(−1) (q_ij^average(h̄) − n̄h q_ij^all(h)).    (4.19)

The initial scaling factor provides a simple finite-sample adjustment reflecting the number of terms entering into each of the two summands defining the two-scale estimator. Unlike the previously defined estimators, the two-scale measure is consistent for IV_t as h → 0 under the noise assumptions in Section 2. Zhang (2006) analyzes related, but more elaborate, multi-scale estimators. We do not consider these extensions here.

4.2.5. Zhou's RV estimator
The estimator originally proposed by Zhou (1996) essentially involves a correction for first-order serial correlation in the high-frequency returns, leading to an unbiased but still inconsistent estimator of the integrated variance. Specifically,

  RV_t^Zhou(h̄) = Σ_{i=1}^{1/h̄} (r_{t−1+ih̄}^{(h̄)})² + Σ_{i=1}^{1/h̄−1} r_{t−1+ih̄}^{(h̄)} r_{t−1+(i+1)h̄}^{(h̄)} + Σ_{i=2}^{1/h̄} r_{t−1+ih̄}^{(h̄)} r_{t−1+(i−1)h̄}^{(h̄)} = R_t(h)⊤ Q^Zhou(h̄) R_t(h),    (4.20)

where

  q_ij^Zhou(h̄) = 1 for i = j, or for i ≠ j with (s − 1)n_h̄ + 1 ≤ i, j ≤ s n_h̄, s = 1, . . . , 1/h̄; = 1 for (s − 1)n_h̄ + 1 ≤ i, j* ≤ s n_h̄, j* = j ± n_h̄, s = 1, . . . , 1/h̄; = 0 otherwise.    (4.21)

Upon defining the estimator at the highest frequency h, the expressions simplify, as q_ij^Zhou(h) = 1 if |i − j| ≤ 1, and q_ij^Zhou(h) = 0 otherwise. This is the version of the estimator used in deriving the numerical results below. This estimator has also previously been analyzed by Zumbach et al. (2002).

4.2.6. Realized kernels
The Zhou estimator is a special case of the realized kernels developed by Barndorff-Nielsen et al. (2008a). Letting K(·) and L denote the kernel and bandwidth, the realized kernel RV estimator is given by

  RV_t^Kernel(K(·), L) = RV_t(h) + Σ_{l=1}^{L} K((l − 1)/L) CRV_t(l, h),    (4.22)

where

  CRV_t(l, h) = Σ_{i=1+l}^{1/h} r_{t−1+ih}^{(h)} r_{t−1+(i−l)h}^{(h)} + Σ_{i=1}^{1/h−l} r_{t−1+ih}^{(h)} r_{t−1+(i+l)h}^{(h)}.    (4.23)

This estimator is readily expressed in quadratic form as

  RV_t^Kernel(K(·), L) = R_t(h)⊤ Q^Kernel(K(·), L) R_t(h),    (4.24)

where

  q_ij^Kernel(K(·), L) = 1 for i = j; = K((l − 1)/L) for |i − j| = l, l ≤ L; = 0 otherwise.    (4.25)

In the specific calculations below, we use the modified Tukey–Hanning kernel of order two advocated by Barndorff-Nielsen et al. (2008a),18

  K(x) = sin²(π(1 − x)²/2).    (4.26)

18 The estimator we implement differs slightly from theirs as, in contrast to Eq. (4.23), they add returns outside the [t − 1, t] time interval to avoid certain end-effects. Since our analysis focuses on forecasting, we want to avoid including any returns beyond time t in the realized volatility measure for the [t − 1, t] interval. This renders the estimator inconsistent, although we, a priori, expect the quantitative impact to be minor.

We do not use the bandwidth selection procedure of Barndorff-Nielsen et al. (2008a) in our benchmark kernel RV estimator but fix L = h̄/h − 1. However, our framework allows for direct comparison of the estimators across bandwidth choices, and we explore the performance for a range of reasonable values in Section 4.4.
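The remaining weight matrices follow the same pattern. The sketch below (arbitrary illustrative numbers) builds the highest-frequency Zhou matrix (q_ij = 1 for |i − j| ≤ 1) and the kernel matrix (4.25) with the Tukey–Hanning kernel (4.26), and confirms that the realized kernel with bandwidth L = 1 collapses to the Zhou estimator, since K(0) = sin²(π/2) = 1.

```python
import numpy as np

n = 16
rng = np.random.default_rng(3)
r = 0.01 * rng.standard_normal(n)  # fine-grid returns for one day

def K_th2(x):
    """Modified Tukey-Hanning kernel of order two, Eq. (4.26)."""
    return np.sin(np.pi * (1 - x) ** 2 / 2) ** 2

def Q_kernel(n, L):
    """Kernel weight matrix, Eq. (4.25): 1 on the diagonal, K((l-1)/L) on band l."""
    Q = np.eye(n)
    for l in range(1, L + 1):
        w = K_th2((l - 1) / L)
        for i in range(n - l):
            Q[i, i + l] = Q[i + l, i] = w
    return Q

# Zhou estimator at the highest frequency: q_ij = 1 whenever |i - j| <= 1.
Q_zhou = np.fromfunction(lambda i, j: (abs(i - j) <= 1).astype(float), (n, n))
rv_zhou = r @ Q_zhou @ r
rv_ker1 = r @ Q_kernel(n, 1) @ r   # bandwidth L = 1 reduces to Zhou
```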
Table 4
Mean, variance and MSE of RV measures.

                 Model M1                  Model M2
                 Mean   Variance  MSE     Mean   Variance  MSE
IV_t             0.636  0.168     0.168   0.504  0.0263    0.0263
γ = 0.1%
RV_t^all         2.47   0.179     3.53    1.96   0.033     2.14
RV_t^sparse      1.002  0.177     0.311   0.795  0.031     0.116
RV_t^average     1.000  0.171     0.303   0.793  0.028     0.111
RV_t^TS          0.634  0.172     0.172   0.503  0.027     0.027
RV_t^Zhou        0.637  0.178     0.178   0.505  0.032     0.032
RV_t^Kernel      0.637  0.173     0.173   0.506  0.029     0.029
γ = 0.5%
RV_t^all         9.79   0.360     84.2    7.77   0.147     52.9
RV_t^sparse      2.47   0.223     3.58    1.96   0.060     2.17
RV_t^average     2.46   0.180     3.51    1.95   0.034     2.13
RV_t^TS          0.634  0.182     0.182   0.503  0.035     0.035
RV_t^Zhou        0.642  0.303     0.303   0.509  0.111     0.111
RV_t^Kernel      0.642  0.194     0.194   0.509  0.042     0.042

Note: The table provides summary statistics for the integrated variance and several realized measures, for models M1 and M2. γ is the noise-to-signal ratio, Var[noise]/E[IV_t]. RV_t^all = RV_t(1/1440); RV_t^sparse = RV_t(1/288); RV_t^average averages the five RV_t^sparse(1/288, k), 1 ≤ k ≤ 5, measures; RV_t^TS is the adjusted two-scale measure combining the RV_t^all and RV_t^average defined above; RV_t^Zhou = RV_t^Zhou(1/1440) from (4.21); RV_t^Kernel = RV_t^Kernel(Tukey–Hanning, 4) given by (4.25).

Table 5
Correlations of RV measures.

Model M2, γ = 0.1%
                IV_t    RV^all  RV^sparse  RV^average  RV^TS  RV^Zhou  RV^Kernel
IV_t            1.00
RV_t^all        0.891   1.00
RV_t^sparse     0.918   0.861   1.00
RV_t^average    0.965   0.905   0.951      1.00
RV_t^TS         0.954   0.851   0.945      0.994       1.00
RV_t^Zhou       0.900   0.769   0.872      0.916       0.926  1.00
RV_t^Kernel     0.953   0.843   0.934      0.981       0.987  0.940    1.00

Model M2, γ = 0.5%
                IV_t    RV^all  RV^sparse  RV^average  RV^TS  RV^Zhou  RV^Kernel
IV_t            1.00
RV_t^all        0.423   1.00
RV_t^sparse     0.660   0.460   1.00
RV_t^average    0.878   0.613   0.751      1.00
RV_t^TS         0.863   0.243   0.688      0.915       1.00
RV_t^Zhou       0.487   −0.078  0.359      0.477       0.626  1.00
RV_t^Kernel     0.792   0.153   0.590      0.787       0.889  0.653    1.00

Note: The table provides cross-correlations of the integrated variance and selected realized measures for model M2. γ is the noise-to-signal ratio, Var[noise]/E[IV_t]. RV_t^all = RV_t(1/1440); RV_t^sparse = RV_t(1/288); RV_t^average averages the five RV_t^sparse(1/288, k), 1 ≤ k ≤ 5, measures; RV_t^TS is the adjusted two-scale measure combining the RV_t^all and RV_t^average defined above; RV_t^Zhou = RV_t^Zhou(1/1440) from (4.21); RV_t^Kernel = RV_t^Kernel(Tukey–Hanning, 4) given by (4.25).

4.3. Distribution of robust RV measures

The analytical solution for relevant cross-moments within the ESV class facilitates comparison of the properties of the estimators, even in the presence of noise. Table 4 reports the mean, variance and mean squared error for alternative measures of integrated variance. In principle, the ''all'' estimator employs the highest possible frequency. We fix h = 1/1440, or 1-min (15-s) sampling in a 24-h (6-h) market, as the shortest practical return interval. As predicted, the ''all'' estimator is severely inflated by microstructure noise. Under moderate noise, the estimator is, on average, almost four times as large as the underlying integrated variance, while this factor rises further at the larger noise level, so the measure is useless as a direct estimator for the integrated variance. Moving to the sparse estimator based on h = 1/288, or 5-min sampling in a 24-h market, the upward bias remains large, although it has dropped sharply relative to the ''all'' estimator. Reducing the sampling frequency further produces an even less biased estimator, but we retain this relatively high frequency to explore more cleanly the implications of the noise-induced bias for the predictive ability of these measures compared to the more robust ones discussed below. The last estimator constructed directly from the standard realized volatility measure is the average estimator appearing in the third row. The averaging reduces the sampling variability, and in turn provides an improvement in the MSE compared to the sparse estimator.

The noise-robust estimators are all virtually unbiased for both models and noise levels, even if we have not optimized the sampling frequency or bandwidth but keep them fixed across all model designs.19 For Model 1 and moderate noise, the estimators have close to identical sampling variability, but for all other scenarios the average, two-scale and kernel measures display the lowest variability. In particular, the Zhou estimator is not competitive in this regard, even though it is designed explicitly for this type of noise structure.

19 As illustrated in Section 4.4 below, the frequency and bandwidth are close to optimal for the two-scale and kernel estimators, respectively, in the empirically relevant scenario of Model 2 and moderate noise.
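The inflation of the ''all'' estimator's mean in Table 4 is the standard iid-noise bias, E[RV_t(h)] ≈ E[IV_t] + 2V_u/h. A Monte Carlo sketch (constant volatility and an assumed noise variance, not the paper's ESV calibration) reproduces the pattern at the 1-min and 5-min frequencies.

```python
import numpy as np

rng = np.random.default_rng(4)
days = 2000
iv = 1.0          # constant daily integrated variance (simplifying assumption)
V_u = 1e-3 * iv   # noise variance, i.e. gamma = 0.1% (assumed)

def mean_rv(n):
    """Average daily RV across many days with n noisy intraday returns per day."""
    w = np.sqrt(iv / n) * rng.standard_normal((days, n))   # efficient returns
    u = np.sqrt(V_u) * rng.standard_normal((days, n + 1))  # iid noise in prices
    robs = w + np.diff(u, axis=1)                          # contaminated returns
    return (robs**2).sum(axis=1).mean()

bias_all = mean_rv(1440) - iv  # "all": 1-min sampling, bias ~ 2 * V_u * 1440
bias_sp = mean_rv(288) - iv    # sparse: 5-min sampling, bias ~ 2 * V_u * 288
```

Sampling five times less often shrinks the bias by a factor of five, matching the relative magnitudes of the RV^all and RV^sparse means in Table 4.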
Table 5 provides the correlations among the alternative estimators as well as the actual integrated variance. This provides a first impression of the potential forecast performance, as high correlation with the current volatility level, everything else equal, should translate into a good prediction. Overall, the measures separate into two distinct groups. The ''all'', sparse and Zhou estimators fail to match the performance of the remainder in terms of coherence with the ideal integrated variance measure. The average estimator performs well in spite of its sizeable bias, while the nearly unbiased Zhou estimator is handicapped by its larger sampling variability and fails dramatically in the noisier scenarios. Finally, we stress that the TS and kernel estimators are loosely calibrated to Model 2 with moderate noise, so the entries for these estimators across the other scenarios are less telling.

We have also explored the population autocorrelations of the alternative estimators. Intuitively, the less noisy estimators correlate better with the integrated variance, and they may thus be expected to inherit the strong serial dependence present in the daily integrated variance series. This is exactly what we find: the ranking in terms of high correlation with the integrated variance measure in Table 5 is preserved when ranking the estimators in terms of serial dependence. These findings are tabulated in ABM (2006).

4.4. True forecast performance of robust RV measures

We now compare the potential performance of linear forecasts constructed from the alternative return variation measures. In a direct extension of the findings for the regular RV measures in Table 1, we compute the true population R²'s by combining the results from Propositions 2.1 and 4.1. Given the wide array of alternatives, we focus on only one version of each estimator.
Hence, the estimators are not calibrated optimally for each scenario but are, at best, designed to perform well for a couple of the relevant cases. Nonetheless, the results are sufficiently impressive that further improvements are unlikely to alter the qualitative conclusions. Table 6 provides the results for daily, weekly and monthly forecast horizons. As expected, the measures most highly correlated with the true return variation also provide the best basis for forecasts. Hence, forecasts generated by the average estimator
are uniformly best, although barely distinguishable from those based on the two-scale or kernel estimators when these are well calibrated. Moreover, the fall-off for the remaining forecasts is not dramatic under the realistic moderate noise setting. In fact, compared with the feasible estimators under ideal noise-free conditions, provided in Table 1, the performance of the entire range of forecasts is quite impressive. However, the picture changes for the noisier scenario, where sizeable gains are attained by adhering to forecasts based on estimators which succeed in dampening the impact of the noise on the sampling variability.20 Overall, the evidence suggests that the comparatively simple average (or ''subsampled'') RV estimator is an excellent starting point for practical volatility forecasting.21

20 In this scenario, the sparse estimator performs better at a lower frequency, say h = 1/96, and if lagged measures are used to form the forecasts. However, a lower sampling frequency increases sampling variability, so the approach cannot match the more elaborate noise-robust procedures.
21 The superior performance of the average estimator may be an artifact of the i.i.d. noise assumption, although we expect it also to perform well for dependent noise. It is possible to accommodate more complex noise structures within a tractable ESV framework, as discussed in ABM (2006), but the lack of consensus regarding the dependence structure in the noise process is an obstacle for broadly exploring the issue.

Table 6
R² for integrated variance forecasts.

                   Model M1                Model M2
Horizon:           1      5      20       1      5      20
γ = 0.1%
R²(RV_t^all)       0.896  0.817  0.591    0.547  0.353  0.170
R²(RV_t^sparse)    0.908  0.829  0.599    0.581  0.375  0.181
R²(RV_t^average)   0.934  0.852  0.616    0.642  0.415  0.199
R²(RV_t^TS)        0.927  0.846  0.612    0.628  0.405  0.195
R²(RV_t^Zhou)      0.900  0.821  0.593    0.559  0.361  0.174
R²(RV_t^Kernel)    0.928  0.846  0.612    0.626  0.404  0.194
γ = 0.5%
R²(RV_t^all)       0.446  0.407  0.294    0.123  0.080  0.038
R²(RV_t^sparse)    0.719  0.656  0.474    0.300  0.194  0.093
R²(RV_t^average)   0.886  0.809  0.585    0.532  0.343  0.165
R²(RV_t^TS)        0.876  0.799  0.578    0.513  0.331  0.159
R²(RV_t^Zhou)      0.529  0.483  0.349    0.163  0.106  0.051
R²(RV_t^Kernel)    0.829  0.756  0.547    0.432  0.279  0.134

Note: The table provides the R² of the MZ regression related to forecasts of IV_{t+1:t+n}, 1, 5 and 20 days ahead, for models M1 and M2; γ is the noise-to-signal ratio, Var[noise]/E[IV_t]. The explanatory variable is one realized measure. RV_t^all = RV_t(1/1440); RV_t^sparse = RV_t(1/288); RV_t^average averages the five measures RV_t^sparse(1/288, k), 1 ≤ k ≤ 5; RV_t^TS is the adjusted two-scale measure combining the RV_t^all and RV_t^average defined above; RV_t^Zhou = RV_t^Zhou(1/1440) defined in (4.21); RV_t^Kernel = RV_t^Kernel(Tukey–Hanning, 4) given by (4.25).

Table 7
R² for one-step-ahead forecasts of integrated variance when the bandwidth varies.

                Model M1                  Model M2
Bandwidth       Average  TS     Kernel    Average  TS     Kernel
γ = 0.1%
1               0.929    0.899  0.900     0.442    0.143  0.559
2               0.935    0.924  0.912     0.527    0.295  0.583
3               0.935    0.928  0.925     0.587    0.531  0.615
4               0.934    0.927  0.928     0.642    0.628  0.626
6               0.930    0.924  0.928     0.635    0.623  0.629
9               0.923    0.917  0.925     0.621    0.610  0.623
11              0.918    0.912  0.922     0.611    0.601  0.618
14              0.910    0.904  0.917     0.597    0.586  0.609
γ = 0.5%
1               0.728    0.530  0.529     0.138    0.012  0.163
2               0.823    0.771  0.628     0.295    0.069  0.211
3               0.867    0.846  0.769     0.428    0.321  0.338
4               0.886    0.876  0.829     0.532    0.513  0.432
6               0.903    0.897  0.878     0.569    0.559  0.517
9               0.908    0.903  0.900     0.583    0.575  0.565
11              0.906    0.901  0.904     0.583    0.575  0.576
14              0.902    0.897  0.905     0.577    0.568  0.580

Note: The table provides the R² of the MZ regression related to one-step-ahead forecasts of integrated variance, for models M1 and M2; γ is the noise-to-signal ratio, Var[noise]/E[IV_t]. The explanatory variable is either RV_t^average, RV_t^TS or RV_t^Kernel. For a given bandwidth L, RV_t^average averages the (L + 1) measures RV_t^sparse((L + 1)/1440, k), 1 ≤ k ≤ L + 1; RV_t^TS is the adjusted two-scale measure that combines RV_t^all = RV_t(1/1440) and RV_t^average with the same bandwidth; RV_t^Kernel = RV_t^Kernel(Tukey–Hanning, L) defined in (4.25).

We have informally stressed the importance of low sampling variability for forecast performance across the alternative realized variation measures. This intuition may be formalized through an extension of the result underlying Eq. (3.16).

Proposition 4.2. Let the discrete-time noise-contaminated returns be determined by an ESV model and the relationship in Eq. (2.8). Let RM_t(h) denote a realized volatility measure as defined in Eq. (4.1) with corresponding quadratic form weights q_ij such that

  q_ii = 1 for all i, 1 ≤ i ≤ 1/h.    (4.27)

Then,

  Cov[IV_{t+1}, RM_t(h)] = Cov[IV_{t+1}, IV_t].    (4.28)

Consequently, maximizing the R² from the regression of IV_{t+1} on a constant and a RM_t(h) measure of the form (4.1) under the restriction (4.27) is tantamount to minimizing the variance of the
measure RM_t(h). The restriction (4.27) holds for the sparse and Zhou estimators, and for any kernel estimator, including the non-flat-top kernels introduced by Barndorff-Nielsen et al. (2008b). It is not satisfied for the average and two-scale estimators at the edges of the trading day, although it will be close to valid for these as well in most circumstances.

Finally, we explore the importance of calibrating the sampling frequency and bandwidth for the average, TS and kernel estimators. Table 7 reports the population R² across model designs for bandwidths spanning 1 to 14. Evidently, higher noise levels and less persistent volatility processes (Model 2) tend to increase the optimal bandwidth. Moreover, there is a distinct pattern to the degree of predictability as the bandwidth rises: performance improves, then levels off and declines. Only for the kernels in the high noise scenario do we not observe a maximum degree of predictability, as this noise level is best accommodated with a very conservative bandwidth. Note also that a bandwidth of four, as in Tables 4–6, is close to optimal for all the estimators in the more realistic scenario of Model 2 and moderate noise.

4.5. Feasible forecasting performance of robust RV measures

The integrated volatility regressand of the Mincer–Zarnowitz regressions in Section 4.4 is latent. In practice it must be replaced by some realized volatility measure, as in

  R̃M_{t+1:t+m}(h) = a + b RM_t(h) + η_{t+m},    (4.29)
where RM t (h) and RM t (h) denote possibly different realized measures. The associated regression R2 involves a covariance term which we have not directly considered previously, R2 =
(Cov[RM t +1:t +m (h), RMt (h)])2 Var[RM t +1:t +m (h)]Var[RMt (h)]
.
(4.30)
The following proposition provides closed form expressions for the requisite covariance term. Proposition 4.3. Let the discrete-time noise contaminated returns be determined by an ESV model and the relationship in Eq. (2.8). Let RM t (h) and RM t (h) denote two realized volatility measures as defined
T.G. Andersen et al. / Journal of Econometrics 160 (2011) 220–234 Table 8 R2 for one-step-ahead RV forecasts. Indp.Var.
sparse
average
RVall t
RVt
RVt
RVTS t
RVZhou t
RVKernel t
0.841 0.852 0.877 0.870 0.844 0.870
0.852 0.864 0.889 0.882 0.856 0.882
0.877 0.889 0.914 0.908 0.880 0.908
0.870 0.882 0.908 0.901 0.874 0.902
0.844 0.856 0.880 0.874 0.848 0.874
0.870 0.882 0.901 0.901 0.874 0.901
0.208 0.336 0.414 0.409 0.247 0.387
0.336 0.542 0.668 0.659 0.399 0.624
0.414 0.668 0.823 0.813 0.492 0.769
0.409 0.659 0.813 0.803 0.486 0.760
0.247 0.399 0.492 0.486 0.294 0.460
0.387 0.624 0.769 0.760 0.460 0.720
Model M1
γ = 0.1% RVall RVsp. RVav. RVTS RVZhou RVKer.
γ = 0.5% RVall RVsp. RVav. RVTS RVZhou RVKer.
231
due to the increased number of cross-comparisons.22 As before, the relative rankings are preserved over the longer horizons. Table 8 conveys the now familiar picture. The average estimator dominates uniformly both as the basis for forecasts and as the proxy for the future realized return variation. Moreover, the prior rankings are preserved everywhere across all the model designs. It is evident that using a precise ex-post estimator for the integrated variance improves the measured degree of predictability and allows the regressions to better convey the true relationship as captured in Table 7. For example, consider Model 2 and γ = 0.5%. Using the average estimator as the basis for the forecast and as the ex-post proxy for future return variation realizations, the R2 is 41% compared to the actual R2 of about 53% in Table 7. In contrast, exploiting the Zhou estimator in both capacities results in an R2 below 4%. Although the figures reflect the specific model design, it exemplifies how the issue of observed versus true underlying predictability is crucially important in properly interpreting empirical studies in this area.23
Model M2
5. Conclusion
γ = 0.1% RVall RVsp. RVav. RVTS RVZhou RVKer.
0.434 0.461 0.510 0.498 0.444 0.497
0.461 0.490 0.541 0.529 0.471 0.528
0.510 0.541 0.598 0.585 0.520 0.583
0.498 0.529 0.585 0.572 0.509 0.570
0.444 0.471 0.520 0.509 0.453 0.507
0.497 0.528 0.588 0.570 0.507 0.568
0.022 0.054 0.095 0.092 0.029 0.077
0.054 0.131 0.231 0.223 0.071 0.188
0.095 0.231 0.410 0.396 0.126 0.333
0.092 0.223 0.396 0.382 0.122 0.321
0.029 0.071 0.126 0.122 0.039 0.102
0.077 0.188 0.333 0.321 0.102 0.271
γ = 0.5% RVall RVsp. RVav. RVTS RVZhou RVKer.
Note: The table provides the R2 of the MZ regression related to forecasts for a realized measure (dependent variable) one-day ahead, for models M1 and M2; γ is the noise-to-signal ratio, Var[noise]/E [IVt ]. The explanatory variable is a realized measure. RVall = RVt (1/1444); RVsparse = RVt (1/288); RVaverage averages the t t t sparse five measures RVt (1/288, k), 1 ≤ k ≤ 5; RVTS is the adjusted two-scale average all Zhou Zhou measure that combines RVt and RVt defined above; RVt = RVt (1/1444) is defined in (4.21); RVKernel = RVKernel (Tukey–Hanning, 4) is given by (4.25). t t
in Eq. (4.1) with corresponding quadratic form weights qij and qij , respectively. Then, Cov[RM t +1 (h), RMt (h)]
=
1/h − 1/h −
qii qkk
i=1 k=1
p − 2 a2n 1 − exp(λn h) 2 λn n =1
Appendix. Technical proofs ∗(h)
Proof of Proposition 2.1. In the absence of any drift, E [rt +ih ] =
× exp(−λn (1 + (i − k − 1)h)) + q11 qh−1 h−1 (Ku − 1)Vu2 ,
∗(h) Var[rt +ih ]
(4.31)
Cov[RM t +m (h), RMt (h)]
=
i=1 k=1
qii qkk
p − 2 a2n 1 − exp(λn h) 2 λ n n =1
× exp(−λn (m + (i − k − 1)h)) .
0 and = a0 h (see, e.g. Meddahi, 2002b). Now given the i.i.d. assumption for the noise ut , (2.15) and (2.16) follows ∗(h) readily from (2.8). Likewise, the non-contaminated returns rt +ih (h)
are uncorrelated (see, e.g. Meddahi, 2002b), while et is an MA(1) (h) process. Hence, the observed returns rt +ih will also follow an MA(1) process with
and for m > 1, 1/h − 1/h −
This paper extends existing analytic methods for the construction and assessment of volatility forecasts for diffusion models to the important case of market microstructure noise. The procedures are valid within the ESV model class, which includes most popular volatility diffusions, and may be adapted to accommodate other empirically relevant features. We apply the techniques to a few representative specifications for which we compare the performance of feasible linear forecasts constructed from alternative realized variation measures in the presence of noise to those based on optimal (infeasible) forecasts. We find it feasible to construct fairly precise forecasts but many aspects of the implementation require careful examination of the underlying market structure and data availability in order to design effective procedures. Given the vast diversity in potential models, sampling frequencies, levels of microstructure noise, realized variation estimators and forecasting schemes, the costs associated with comprehensive simulation studies are formidable. Instead, the ESV analytical tools developed here enable us to study the relevant issues succinctly across alternative designs within a coherent framework, thus providing a guide for general performance and robustness. As such, we expect the approach to provide additional useful insights in future work concerning the design of alternative return variation measures and their application in the context of volatility forecasting.
∗(h)
∗(h)
(h)
(h)
Cov[rt +ih , rt +(i−1)h ] = Cov[et +ih , et +(i−1)h ] = −Var[ut ] = −Vu ,
(4.32)
Table 8 provides feasible performance measures that, ideally, may be obtained from the forecast procedures discussed in Section 4.4. We only report figures for the one-step-ahead forecasts
22 Also note that for the feasible regressions analyzed here minimizing the variance of the explanatory robust measure is not equivalent to maximizing the R2 in the Mincer–Zarnowitz regression, as the numerator in the R2 will also depend on h, and Cov[IVt +1 , RMt (h)] ̸= Cov[IVt +1 , IVt ] for qii ̸= 1. 23 As previously noted, ABM (2005) provide a technique for formally converting the observed degree of predictability into an estimate of the higher true predictability through a fairly simple procedure.
232
T.G. Andersen et al. / Journal of Econometrics 160 (2011) 220–234
i.e., (2.17). We will now prove (2.18). As a short-hand notation, let r_i, r*_i, and e_i refer to r(h)_{t+ih}, r*(h)_{t+ih}, and e(h)_{t+ih}. We then have

r_i r_j r_k r_l = (r*_i + e_i)(r*_j + e_j)(r*_k + e_k)(r*_l + e_l),

which expands into sixteen terms. The returns are independent of the noise, and the means of the noise and of the returns are zero. This implies that quantities like E[r*_i r*_j e_k r*_l] and E[r*_i e_j e_k e_l] equal zero. Therefore,

E[r_i r_j r_k r_l] = E[r*_i r*_j r*_k r*_l] + E[r*_i r*_j]E[e_k e_l] + E[r*_i r*_l]E[e_j e_k] + E[r*_i r*_k]E[e_j e_l] + E[e_i e_k]E[r*_j r*_l] + E[e_i e_l]E[r*_j r*_k] + E[e_i e_j]E[r*_k r*_l] + E[e_i e_j e_k e_l].   (A.1)

We now compute the elements that appear in (A.1), starting with the first term. Given the path of the volatility, the returns are independent, so Eqs. (3.7) and (3.10) in Meddahi (2002b) imply

E[r*_i r*_j r*_k r*_l]
= E[(r*_i)⁴] = 3a_0²h² + 6 Σ_{n=1}^p (a_n²/λ_n²)[−1 + λ_n h + exp(−λ_n h)]   if i = j = k = l,
= Cov[(r*_i)², (r*_k)²] + (E[(r*_i)²])² = a_0²h² + Σ_{n=1}^p (a_n²/λ_n²)[1 − exp(−λ_n h)]² exp(−λ_n(i − k − 1)h)   if i = j > k = l,
= 0 otherwise.   (A.2)

To compute the last term in (A.1), note that

e_i e_j e_k e_l = (u_i − u_{i−1})(u_j − u_{j−1})(u_k − u_{k−1})(u_l − u_{l−1}),

which likewise expands into sixteen products of the u's. Using the i.i.d. structure of the u_t process, it then follows that

E[e_i e_j e_k e_l]
= 2V_u²(K_u + 3)   if i = j = k = l,
= −V_u²(K_u + 3)   if i = j = k = l + 1 or i = j + 1 = k + 1 = l + 1,
= V_u²(K_u + 3)   if i = j = k + 1 = l + 1,
= 4V_u²   if i = j > k + 1, k = l,
= 2V_u²   if i = j + 1, j = k = l + 1,
= −2V_u²   if i = j ≥ k + 1, k = l + 1 or i = j + 1, j ≥ k + 1, k = l,
= V_u²   if i = j + 1, j ≥ k + 1, k = l + 1,
= 0 otherwise.   (A.3)

Denote the remaining terms that appear in (A.1),

A_{ijkl} ≡ E[r*_i r*_j]E[e_k e_l] + E[r*_i r*_l]E[e_j e_k] + E[r*_i r*_k]E[e_j e_l] + E[e_i e_k]E[r*_j r*_l] + E[e_i e_l]E[r*_j r*_k] + E[e_i e_j]E[r*_k r*_l].

From above, E[r*_i r*_j] = δ_{i,j} E[(r*_i)²] = δ_{i,j} a_0 h and E[e_i e_j] = 2δ_{i,j}V_u − δ_{|i−j|,1}V_u. Hence,

A_{ijkl}
= 6E[(r*_i)²]E[e_i²] = 12a_0 V_u h   if i = j = k = l,
= 3E[(r*_i)²]E[e_i e_{i−1}] = −3a_0 V_u h   if i = j = k = l + 1 or i = j + 1 = k + 1 = l + 1,
= 2E[(r*_i)²]E[e_k²] = 4a_0 V_u h   if i = j = k + 1 = l + 1,
= 2E[(r*_i)²]E[e_k²] = 4a_0 V_u h   if i = j > k + 1, k = l,
= 0   if i = j + 1, j = k = l + 1,
= E[(r*_i)²]E[e_k e_{k−1}] = −a_0 V_u h   if i = j ≥ k + 1, k = l + 1 or i = j + 1, j ≥ k + 1, k = l,
= 0   if i = j + 1, j ≥ k + 1, k = l + 1,
= 0 otherwise.   (A.4)

Now combining (A.2)–(A.4) results in (2.18).

The proof of (2.19) proceeds similarly to the one for (2.18). The main difference stems from the fact that t − 1 + m + jh > t − 1 + kh, so that several terms that appear in (2.18) are now zero. In particular, by using the MA(1) structure of e(h)_t, it follows that

Cov[r(h)_{t−1+m+ih} r(h)_{t−1+m+jh}, r(h)_{t−1+kh} r(h)_{t−1+lh}]
= Cov[r*(h)_{t−1+m+ih} r*(h)_{t−1+m+jh}, r*(h)_{t−1+kh} r*(h)_{t−1+lh}] + Cov[e(h)_{t−1+m+ih} e(h)_{t−1+m+jh}, e(h)_{t−1+kh} e(h)_{t−1+lh}]
= δ_{i,j} δ_{k,l} Cov[(r*(h)_{t−1+m+ih})², (r*(h)_{t−1+kh})²] + δ_{m,1} δ_{i,j} δ_{k,l} δ_{i,1} δ_{k,1/h} Cov[u_t², u_t²]
= δ_{i,j} δ_{k,l} Cov[∫_{t−1+m+(i−1)h}^{t−1+m+ih} σ_u² du, ∫_{t−1+(k−1)h}^{t−1+kh} σ_u² du] + δ_{m,1} δ_{i,j} δ_{k,l} δ_{i,1} δ_{k,1/h} (K_u − 1)V_u²
= δ_{i,j} δ_{k,l} Σ_{n=1}^p (a_n²/λ_n²)[1 − exp(−λ_n h)]² exp(−λ_n(m + (i − k − 1)h)) + δ_{m,1} δ_{i,j} δ_{k,l} δ_{i,1} δ_{k,1/h} (K_u − 1)V_u²,

where the first part of the last equality is a consequence of Lemma A.1 given below. This achieves the proof of (2.19).

Lemma A.1. Let a, b, c, d be real numbers such that a ≤ b ≤ c ≤ d. Then

Cov[∫_a^b σ_u² du, ∫_c^d σ_u² du] = Σ_{n=1}^p (a_n²/λ_n²)[1 − exp(−λ_n(b − a))][1 − exp(−λ_n(d − c))] exp(−λ_n(c − b)).

Proof of Lemma A.1. We have

Cov[∫_a^b σ_u² du, ∫_c^d σ_u² du] = Σ_{1≤n,m≤p} a_n a_m E[∫_a^b P_n(f_u) du ∫_c^d P_m(f_v) dv] ≡ Σ_{1≤n,m≤p} a_n a_m b_{n,m}.   (A.5)

Conditioning on the information up to time b and using the eigenfunction structure of the ESV model,

b_{n,m} = E[∫_a^b P_n(f_u) du ∫_c^d E[P_m(f_v) | f_τ, τ ≤ b] dv]
= ∫_a^b ∫_c^d E[P_n(f_u) P_m(f_b)] exp(−λ_m(v − b)) dv du
= ∫_a^b exp(−λ_n(b − u)) δ_{n,m} du ∫_c^d exp(−λ_m(v − b)) dv
= δ_{n,m} ([1 − exp(−λ_n(b − a))]/λ_n) ([1 − exp(−λ_m(d − c))]/λ_m) exp(−λ_m(c − b)).

Substituting this expression into (A.5) achieves the proof of Lemma A.1.

In order to prove (2.20), note that the independence of the noise from the volatility and the no-leverage assumption imply that

Cov[IV_t, r(h)_{t−1+ih} r(h)_{t−1+jh}] = Cov[IV_t, r*(h)_{t−1+ih} r*(h)_{t−1+jh}]
= E[IV_t r*(h)_{t−1+ih} r*(h)_{t−1+jh}] − E[IV_t]E[r*(h)_{t−1+ih} r*(h)_{t−1+jh}]
= E[IV_t E[r*(h)_{t−1+ih} r*(h)_{t−1+jh} | σ_τ, τ ≤ t]] − a_0 δ_{i,j} a_0 h
= δ_{i,j} Cov[IV_t, ∫_{t−1+(i−1)h}^{t−1+ih} σ_u² du]
= δ_{i,j} Cov[∫_{t−1}^{t−1+(i−1)h} σ_u² du + ∫_{t−1+(i−1)h}^{t−1+ih} σ_u² du + ∫_{t−1+ih}^{t} σ_u² du, ∫_{t−1+(i−1)h}^{t−1+ih} σ_u² du]
= δ_{i,j} { Σ_{n=1}^p (a_n²/λ_n²)[2 − exp(−λ_n(i − 1)h) − exp(−λ_n(1 − ih))][1 − exp(−λ_n h)] + Var[∫_{t−1+(i−1)h}^{t−1+ih} σ_u² du] },   (A.6)

where the last equality holds due to Lemma A.1. Eq. (15) in ABM (2004) implies

Var[∫_{t−1+(i−1)h}^{t−1+ih} σ_u² du] = 2 Σ_{n=1}^p (a_n²/λ_n²)[exp(−λ_n h) + λ_n h − 1].   (A.7)

By combining (A.6) and (A.7), one gets (2.20). Similar arguments lead to

Cov[IV_{t+1:t+m}, r(h)_{t−1+ih} r(h)_{t−1+jh}] = Cov[IV_{t+1:t+m}, r*(h)_{t−1+ih} r*(h)_{t−1+jh}]
= δ_{i,j} Cov[IV_{t+1:t+m}, ∫_{t−1+(i−1)h}^{t−1+ih} σ_u² du]
= δ_{i,j} Σ_{n=1}^p (a_n²/λ_n²)[1 − exp(−λ_n h)][1 − exp(−λ_n m)] exp(−λ_n(1 − ih)),   (A.8)

where the last equality again holds due to Lemma A.1. This achieves the proof of (2.21).

Proof of Proposition 4.2. The first line of (A.8) implies

Cov[IV_{t+1}, RM_t(h)] = Σ_{1≤i,j≤1/h} q_{ij} Cov[IV_{t+1}, r(h)_{t−1+ih} r(h)_{t−1+jh}] = Σ_{1≤i≤1/h} q_{ii} Cov[IV_{t+1}, ∫_{t−1+(i−1)h}^{t−1+ih} σ_u² du] = Cov[IV_{t+1}, IV_t]

under (4.27), which achieves the proof of (4.28) and Proposition 4.2.

Proof of Proposition 4.3. We have

Cov[RM̃_{t+m}(h), RM_t(h)] = Σ_{1≤i,j,k,l≤1/h} q̃_{ij} q_{kl} Cov[r(h)_{t+m+ih} r(h)_{t+m+jh}, r(h)_{t+kh} r(h)_{t+lh}].   (A.9)

When m > 1, (A.9) combined with the first part of (2.19) leads to (4.32). When m = 1, (A.9) combined with the second and third parts of (2.19) leads to (4.31). This achieves the proof of Proposition 4.3.
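Lemma A.1 can be checked numerically for a single eigenfunction (p = 1 with unit weight a₁ = 1): the covariance then reduces to the double integral of the autocorrelation exp(−λ(v − u)) over [a, b] × [c, d]. A sketch, with arbitrary illustrative values for λ and the interval endpoints:

```python
import numpy as np

def lemma_a1(lam, a, b, c, d):
    """Closed form of Lemma A.1 for one eigenfunction with unit weight."""
    return ((1.0 - np.exp(-lam * (b - a)))
            * (1.0 - np.exp(-lam * (d - c)))
            * np.exp(-lam * (c - b)) / lam ** 2)

def double_integral(lam, a, b, c, d, n=2000):
    """Midpoint-rule evaluation of the double integral of exp(-lam*(v - u))
    over u in [a, b] and v in [c, d]."""
    u = a + (np.arange(n) + 0.5) * (b - a) / n
    v = c + (np.arange(n) + 0.5) * (d - c) / n
    grid = np.exp(-lam * (v[None, :] - u[:, None]))   # v >= u since c >= b
    return grid.sum() * ((b - a) / n) * ((d - c) / n)

lam, a, b, c, d = 1.5, 0.0, 0.4, 0.7, 1.0
print(lemma_a1(lam, a, b, c, d), double_integral(lam, a, b, c, d))
```

The two values agree up to the quadrature error of the midpoint rule, confirming the factorization of the covariance into the three exponential terms.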
References

Aït-Sahalia, Y., Hansen, L.P., Scheinkman, J., 2010. Operator methods for continuous-time Markov models. In: Aït-Sahalia, Y., Hansen, L.P. (Eds.), Handbook of Financial Econometrics. North-Holland, pp. 1–66.
Aït-Sahalia, Y., Mancini, L., 2008. Out of sample forecasts of quadratic variation. Journal of Econometrics 147, 17–33.
Aït-Sahalia, Y., Mykland, P.A., Zhang, L., 2005. How often to sample a continuous-time process in the presence of market microstructure noise. Review of Financial Studies 18, 351–416.
Andersen, T.G., Bollerslev, T., 1998. Answering the skeptics: yes, standard volatility models do provide accurate forecasts. International Economic Review 39, 885–905.
Andersen, T.G., Bollerslev, T., Christoffersen, P.F., Diebold, F.X., 2006. Volatility forecasting. In: Elliott, G., Granger, C.W.J., Timmermann, A. (Eds.), Handbook of Economic Forecasting. North-Holland, pp. 778–878.
Andersen, T.G., Bollerslev, T., Diebold, F.X., 2010. Parametric and nonparametric volatility measurement. In: Aït-Sahalia, Y., Hansen, L.P. (Eds.), Handbook of Financial Econometrics. North-Holland, pp. 67–128.
Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P., 2000. Great realizations. RISK 13, 105–108.
Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P., 2001. The distribution of exchange rate volatility. Journal of the American Statistical Association 96, 42–55.
Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P., 2003. Modeling and forecasting realized volatility. Econometrica 71, 579–625.
Andersen, T.G., Bollerslev, T., Lange, S., 1999. Forecasting financial market volatility: sample frequency vis-à-vis forecast horizon. Journal of Empirical Finance 6, 457–477.
Andersen, T.G., Bollerslev, T., Meddahi, N., 2004. Analytic evaluation of volatility forecasts. International Economic Review 45, 1079–1110.
Andersen, T.G., Bollerslev, T., Meddahi, N., 2005. Correcting the errors: volatility forecast evaluation using high-frequency data and realized volatilities. Econometrica 73, 279–296.
Andersen, T.G., Bollerslev, T., Meddahi, N., 2006. Realized volatility forecasting and market microstructure noise. Working Paper. Université de Montréal.
Areal, N.M.P.C., Taylor, S.J., 2002. The realized volatility of FTSE-100 futures prices. Journal of Futures Markets 22, 627–648.
Bandi, F., Russell, J., 2006. Separating microstructure noise from volatility. Journal of Financial Economics 79, 655–692.
Bandi, F., Russell, J., 2008. Microstructure noise, realized volatility, and optimal sampling. Review of Economic Studies 75, 339–369.
Bandi, F., Russell, J., Zhu, Y., 2008. Using high-frequency data in dynamic portfolio choice. Econometric Reviews 27, 163–198.
Barndorff-Nielsen, O.E., Hansen, P.R., Lunde, A., Shephard, N., 2008a. Designing realised kernels to measure the ex-post variation of equity prices in the presence of noise. Econometrica 76, 1481–1536.
Barndorff-Nielsen, O.E., Hansen, P.R., Lunde, A., Shephard, N., 2008b. Multivariate realised kernels: consistent positive semi-definite estimators of the covariation of equity prices with noise and non-synchronous trading. Working Paper. Oxford University.
Barndorff-Nielsen, O.E., Shephard, N., 2001. Non-Gaussian OU based models and some of their uses in financial economics, with discussion. Journal of the Royal Statistical Society, Series B 63, 167–241.
Barndorff-Nielsen, O.E., Shephard, N., 2002. Econometric analysis of realised volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society, Series B 64, 253–280.
Chen, X., Hansen, L.P., Scheinkman, J., 2009. Nonlinear principal components and long run implications of multivariate diffusions. The Annals of Statistics 37, 4279–4312.
Corradi, V., Distaso, W., Swanson, N.R., 2009a. Predictive density estimators for daily volatility based on the use of realized measures. Journal of Econometrics 150, 119–138.
Corradi, V., Distaso, W., Swanson, N.R., 2009b. Predictive inference for integrated volatility. Unpublished Manuscript. University of London.
Corsi, F., 2009. A simple approximate long memory model of realized volatility. Journal of Financial Econometrics 7, 174–196.
Deo, R., Hurvich, C., Lu, Y., 2006. Forecasting realized volatility using a long-memory stochastic volatility model: estimation, prediction and seasonal adjustment. Journal of Econometrics 131, 29–58.
Engle, R.F., Lee, G.G.J., 1999. A permanent and transitory component model for stock return volatility. In: Engle, R.F., White, H. (Eds.), Cointegration, Causality and Forecasting: A Festschrift in Honour of Clive W.J. Granger. Oxford University Press, pp. 475–497.
Garcia, R., Lewis, M.A., Pastorello, S., Renault, E., 2011. Estimation of objective and risk-neutral distributions based on moments of integrated volatility. Journal of Econometrics 160 (1), 22–32.
Garcia, R., Meddahi, N., 2006. Comment on realized variance and market microstructure noise. Journal of Business and Economic Statistics 24, 184–191.
Ghysels, E., Santa-Clara, P., Valkanov, R., 2006. Predicting volatility: getting the most out of return data sampled at different frequencies. Journal of Econometrics 131, 59–96.
Ghysels, E., Sinko, A., 2006. Volatility forecasting and microstructure noise. Unpublished Manuscript. University of North Carolina.
Hansen, P.R., Lunde, A., 2006. Realized variance and market microstructure noise. Journal of Business and Economic Statistics 24, 127–161.
Hansen, L.P., Scheinkman, J., 1995. Back to the future: generating moment implications for continuous time Markov processes. Econometrica 63, 767–804.
Hull, J., White, A., 1987. The pricing of options on assets with stochastic volatilities. Journal of Finance 42, 281–300.
Jacod, J., Protter, P., 1998. Asymptotic error distributions for the Euler method for stochastic differential equations. The Annals of Probability 26, 267–307.
Koopman, S.J., Jungbacker, B., Hol, E., 2005. Forecasting daily variability of the S&P 100 stock index using historical, realized and implied volatility measures. Journal of Empirical Finance 12, 445–475.
Martens, M., 2002. Measuring and forecasting S&P 500 index futures volatility using high-frequency data. Journal of Futures Markets 22, 497–518.
Meddahi, N., 2001. An eigenfunction approach for volatility modeling. CIRANO Working Paper 2001s-70.
Meddahi, N., 2002a. A theoretical comparison between integrated and realized volatility. Journal of Applied Econometrics 17, 479–508.
Meddahi, N., 2002b. Moments of continuous time stochastic volatility models. Working Paper. Université de Montréal.
Merton, R.C., 1980. On estimating the expected return on the market: an exploratory investigation. Journal of Financial Economics 8, 323–361.
Patton, A.J., 2011. Volatility forecast comparison using imperfect volatility proxies. Journal of Econometrics 160 (1), 246–256.
Pong, S., Shackleton, M.B., Taylor, S.J., Xu, X., 2004. Forecasting currency volatility: a comparison of implied volatilities and AR(FI)MA models. Journal of Banking and Finance 28, 2541–2563.
Sun, X., 2006. Best quadratic unbiased estimators of integrated variance in the presence of market microstructure noise. Working Paper. University of California at San Diego.
Thomakos, D.D., Wang, T., 2003. Realized volatility in the futures market. Journal of Empirical Finance 10, 321–353.
Zhang, L., 2006. Efficient estimation of stochastic volatility using noisy observations: a multi-scale approach. Bernoulli 12, 1019–1043.
Zhang, L., Mykland, P.A., Aït-Sahalia, Y., 2005. A tale of two time scales: determining integrated volatility with noisy high-frequency data. Journal of the American Statistical Association 100, 1394–1411.
Zhou, B., 1996. High-frequency data and volatility in foreign exchange rates. Journal of Business and Economic Statistics 14, 45–52.
Zumbach, G., Corsi, F., Trapletti, A., 2002. Efficient estimation of volatility using high-frequency data. Unpublished Manuscript. Olsen & Associates, Zürich, Switzerland.
Journal of Econometrics 160 (2011) 235–245
Dynamic estimation of volatility risk premia and investor risk aversion from option-implied and realized volatilities✩

Tim Bollerslev (a,b,c,∗), Michael Gibson (d,1), Hao Zhou (d,2)

a Department of Economics, Duke University, Post Office Box 90097, Durham NC 27708, USA
b NBER, USA
c CREATES, Denmark
d Risk Analysis Section, Federal Reserve Board, Mail Stop 91, Washington DC 20551, USA

Article history: Available online 6 March 2010
JEL classification: G12; G13; C51; C52
Keywords: Stochastic volatility risk premium; Model-free implied volatility; Model-free realized volatility; Black–Scholes; GMM estimation; Return predictability

Abstract: This paper proposes a method for constructing a volatility risk premium, or investor risk aversion, index. The method is intuitive and simple to implement, relying on the sample moments of the recently popularized model-free realized and option-implied volatility measures. A small-scale Monte Carlo experiment confirms that the procedure works well in practice. Implementing the procedure with actual S&P500 option-implied volatilities and high-frequency five-minute-based realized volatilities indicates significant temporal dependencies in the estimated stochastic volatility risk premium, which we in turn relate to a set of macro-finance state variables. We also find that the extracted volatility risk premium helps predict future stock market returns. © 2010 Elsevier B.V. All rights reserved.
1. Introduction

Two new model-free volatility measures have figured prominently in the recent academic and financial market practitioner literature. "Model-free realized volatilities" are computed by summing squared returns from high-frequency data over short time intervals during the trading day. As demonstrated in the literature, these types of measures afford much more accurate ex post observations of actual volatility than the more traditional sample variances based on daily or coarser frequency data (Andersen et al., 2001; Barndorff-Nielsen and Shephard, 2002; Meddahi, 2002; Andersen et al., forthcoming, 2003; Barndorff-Nielsen and Shephard, 2004a; Andersen et al., 2004). "Model-free implied volatilities" are computed from option prices without the use of any particular option-pricing model. These measures provide ex ante risk-neutral expectations of future volatilities and do not rely on the Black–Scholes pricing formula or some variant thereof (Carr and Madan, 1998; Demeterfi et al., 1999; Britten-Jones and Neuberger, 2000; Lynch and Panigirtzoglou, 2003; Jiang and Tian, 2005; Carr and Wu, 2009).3 In this paper, we combine these two new volatility measures to improve on existing estimates of the risk premium associated with stochastic volatility risk and investor risk aversion.

Because the method we present here directly uses the model-free realized and implied volatilities to extract the stochastic volatility risk premium, it is easier to implement than other methods, which rely on the joint estimation of both the underlying asset return and the price(s) of one or more of its derivatives and therefore require complicated modeling and estimation procedures (see, e.g., Bates, 1996; Chernov and Ghysels, 2000; Jackwerth, 2000; Aït-Sahalia and Lo, 2000; Benzoni, 2002; Pan, 2002; Jones, 2003; Eraker, 2004; Aït-Sahalia and Kimmel, 2007, among many others). In contrast, the method of this paper relies on standard GMM estimation of the cross conditional moments between risk-neutral and objective expectations of integrated volatility to identify the stochastic volatility risk premium. As such, the method is simple to implement and can easily be extended to allow for a time-varying volatility risk premium. Indeed, one feature of our estimation strategy is that it allows us to capture time variation in the volatility risk premium, possibly driven by a set of economic state variables.4

The closest paper to ours is Garcia et al. (forthcoming), who estimate jointly the risk-neutral and objective dynamics using a series expansion of option-implied volatility around the Black–Scholes implied volatility. This approach has the advantage of relying on only a single option price to identify the risk premium parameter, while our approach requires a large number of option prices to construct the model-free implied volatility measure. On the other hand, the model-free implied volatility effectively aggregates out some of the pricing errors in individual options, which may adversely affect the series expansion approach. Garcia et al.

✩ The work of Bollerslev was supported by a grant from the National Science Foundation to the NBER and CREATES funded by the Danish National Research Foundation. We would also like to thank three anonymous referees, Alain Chaboud, N.K. Chidambaran, Hui Guo, George Jiang, Chris Jones, Nellie Liang, Nour Meddahi, Nagpurnanand R. Prabhala, Patricia White, and seminar participants at the Venice Conference on Time-Varying Financial Structures 2005, the Federal Reserve Conference on Financial Market Risk Premia 2005, Peking University, the Bank for International Settlements, and the AFA 2006 Annual Meeting for useful comments and suggestions. The views presented here are solely those of the authors and do not necessarily represent those of the Federal Reserve Board or its staff. Matthew Chesnes and Stephen Saroki provided excellent research assistance.
∗ Corresponding author at: Department of Economics, Duke University, Post Office Box 90097, Durham NC 27708, USA. Tel.: +1 919 660 1846; fax: +1 919 684 8974. E-mail addresses: [email protected] (T. Bollerslev), [email protected] (M. Gibson), [email protected] (H. Zhou).
1 Tel.: +1 202 452 2495; fax: +1 202 728 5887.
2 Tel.: +1 202 452 3360; fax: +1 202 728 5887.
0304-4076/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2010.03.033
3 Market participants have also recently developed several new products – realized variance futures, VIX futures, and over-the-counter (OTC) variance swaps – that are based on these two model-free volatility measures. Specifically, the Chicago Board Option Exchange (CBOE) recently changed its implied volatility index (VIX) to use the model-free implied volatility approach and the more popular S&P500 index options (CBOE Documentation, 2003), while the CBOE Futures Exchange began to trade futures on the VIX on March 26, 2004 and realized variance futures on the S&P500 on May 18, 2004. Demeterfi et al. (1999) discuss OTC variance swaps.
(forthcoming) also bring in higher order moments of the integrated volatility (skewness in particular) to identify the underlying dynamics, beyond the first two conditional moments explored by Bollerslev and Zhou (2002). To validate the performance of the new estimation strategy, we perform a small-scale Monte Carlo experiment focusing directly on our ability to precisely estimate the risk premium parameter. While the estimation strategy applies generally, the Monte Carlo study focuses on the popular Heston (1993) stochastic volatility model. The results confirm that using model-free implied volatility from options with one month to maturity and realized volatility from five-minute returns, we can estimate the volatility risk premium nearly as well as if we were using the actual (unobserved and infeasible) risk-neutral implied volatility and continuous time integrated volatility. However, the use of Black–Scholes implied volatilities and/or realized volatilities from daily returns generally results in biased and inefficient estimates of the risk premium parameter, leading to unreliable statistical inference. To illustrate the procedure empirically, we apply the method to estimate the volatility risk premium associated with the S&P500 market index. We extend the method to allow for time variation in the stochastic volatility risk premium. We allow the premium to vary over time and to depend on macro-finance state variables. We find statistically significant effects on the volatility risk premium from several macro-finance variables, including the market volatility itself, the price-earnings (P /E) ratio of the market, a measure of credit spread, industrial production, housing starts, the producer price index, and nonfarm employment.5 Our results give structure to the intuitive notion that the difference between implied and realized volatilities reflects a
volatility risk premium that responds to economic state variables. As such, our findings should be of direct interest to market participants and monetary policymakers who are concerned with the links between financial markets and the overall economy. Further broadening our results, we also find that the estimated time-varying volatility risk premium predicts future stock market returns better than several established predictor variables.

The rest of the paper is organized as follows. Section 2 outlines the basic theory behind our simple GMM estimation procedure, while Section 3 provides finite sample simulation evidence on the performance of the estimator. Section 4 applies the estimator to the S&P500 market index, explicitly linking the temporal variation in the volatility risk premium to a set of underlying macro-finance variables. This section also documents our findings related to return predictability. Section 5 concludes.

2. Identification and estimation of the volatility risk premium

Consider the general continuous-time stochastic volatility model for the logarithmic stock price process (p_t = log S_t),

dp_t = \mu_t(\cdot)\,dt + \sqrt{V_t}\,dB_{1t}, \qquad dV_t = \kappa(\theta - V_t)\,dt + \sigma_t(\cdot)\,dB_{2t},   (1)

where the instantaneous corr(dB_{1t}, dB_{2t}) = \rho denotes the familiar leverage effect, and the functions \mu_t(\cdot) and \sigma_t(\cdot) must satisfy the usual regularity conditions. Assuming no arbitrage and a linear volatility risk premium, the corresponding risk-neutral distribution then takes the form

dp_t = r^*_t\,dt + \sqrt{V_t}\,dB^*_{1t}, \qquad dV_t = \kappa^*(\theta^* - V_t)\,dt + \sigma_t(\cdot)\,dB^*_{2t},   (2)

where corr(dB^*_{1t}, dB^*_{2t}) = \rho, and r^*_t denotes the risk-free interest rate. The risk-neutral parameters in (2) are directly related to the parameters of the actual price process in Eq. (1) by the relationships \kappa^* = \kappa + \lambda and \theta^* = \kappa\theta/(\kappa + \lambda), where \lambda refers to the volatility risk premium parameter of interest. Note that the functional forms of \mu_t(\cdot) and \sigma_t(\cdot) are completely flexible as long as they avoid arbitrage.

2.1. Model-free volatility measures and moment restrictions

The point-in-time volatility V_t entering the stochastic volatility model above is latent, and its consistent estimation through filtering faces a host of market microstructure complications. Alternatively, the model-free realized volatility measures afford a simple approach for quantifying the integrated volatility over non-trivial time intervals. In our notation, let V^n_{t,t+\Delta} denote the realized volatility computed by summing the squared high-frequency returns over the [t, t+\Delta] time interval:

V^n_{t,t+\Delta} \equiv \sum_{i=1}^{n} \left[ p_{t + i\Delta/n} - p_{t + (i-1)\Delta/n} \right]^2.   (3)

It then follows by the theory of quadratic variation (see, e.g., Andersen et al. (forthcoming), for a recent survey of the realized volatility literature) that

V^n_{t,t+\Delta} \xrightarrow{a.s.} V_{t,t+\Delta} \equiv \int_t^{t+\Delta} V_s\,ds \quad (n \to \infty).   (4)

4 The general strategy developed here is also related to the literature on market-implied risk aversion (see, e.g., Jackwerth, 2000; Aït-Sahalia and Lo, 2000; Rosenberg and Engle, 2002; Brandt and Wang, 2003; Bliss and Panigirtzoglou, 2004; Gordon and St-Amour, 2004). A recent paper by Wu (2005) also uses model-free realized and implied volatilities to estimate a flexible affine jump-diffusion model for volatility under the risk-neutral and objective measures.
5 For directly traded assets like equities or bonds, empirical links between the risk premium – expected excess return – and macro-finance state variables are already well established. For example, the equity risk premium is predicted by the dividend–price ratio and short-term interest rates (see, e.g., Campbell, 1987; Fama and French, 1988; Campbell and Shiller, 1988a,b), while bond risk premia may be predicted by forward rates (see, e.g., Fama and Bliss, 1987; Cochrane and Piazzesi, 2005). However, with the notable exception of the recent study by Carr and Wu (2009), academic studies on the behavior of the volatility risk premium are rare, let alone its linkage to the overall economy.
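To fix ideas, the realized volatility estimator in Eq. (3) amounts to a few lines of code. The sketch below is our own illustration, not part of the original analysis: the simulated price path and all variable names are hypothetical, and the toy check simply verifies that the estimator approaches the integrated variance as the sampling frequency grows.

```python
import numpy as np

def realized_volatility(log_prices):
    """Realized volatility over [t, t+Delta], Eq. (3): the sum of
    squared high-frequency log-price increments."""
    returns = np.diff(log_prices)
    return float(np.sum(returns ** 2))

# Illustrative check on a toy constant-volatility path: for a pure
# diffusion, the estimate approaches the integrated variance as n grows.
rng = np.random.default_rng(0)
n = 78 * 22            # e.g. 5-min returns over a 22-day month
true_var = 0.25 / 12   # hypothetical monthly integrated variance
increments = rng.normal(0.0, np.sqrt(true_var / n), size=n)
log_p = np.cumsum(np.concatenate(([0.0], increments)))
rv = realized_volatility(log_p)
```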
T. Bollerslev et al. / Journal of Econometrics 160 (2011) 235–245
In other words, when n is large relative to ∆, the realized volatility should be a good approximation for the unobserved integrated volatility V_{t,t+∆}.6 Moments for the integrated volatility for the model in (1) have previously been derived by Bollerslev and Zhou (2002) (see also Meddahi (2002) and Andersen et al. (2004)). In particular, the first conditional moment under the physical measure satisfies

E(V_{t+\Delta,t+2\Delta} \mid \mathcal{F}_t) = \alpha_\Delta E(V_{t,t+\Delta} \mid \mathcal{F}_t) + \beta_\Delta,   (5)
where the coefficients \alpha_\Delta = e^{-\kappa\Delta} and \beta_\Delta = \theta(1 - e^{-\kappa\Delta}) are functions of the underlying parameters κ and θ of (1). Using option prices, it is also possible to construct a model-free measure of the risk-neutral expectation of the integrated volatility. In particular, let IV^*_{t,t+\Delta} denote the time-t implied volatility measure computed as a weighted average, or integral, of a continuum of ∆-maturity options,

\mathrm{IV}^*_{t,t+\Delta} = 2 \int_0^\infty \frac{C(t+\Delta, K) - C(t, K)}{K^2}\,dK,   (6)
where C(t, K) denotes the price of a European call option maturing at time t with strike price K. As formally shown by Britten-Jones and Neuberger (2000), this model-free implied volatility then equals the true risk-neutral expectation of the integrated volatility,7
\mathrm{IV}^*_{t,t+\Delta} = E^*\left( V_{t,t+\Delta} \mid \mathcal{F}_t \right),   (7)
where E^*(·) refers to the expectation under the risk-neutral measure. Although the original derivation of this important result in Britten-Jones and Neuberger (2000) assumes that the underlying price path is continuous, the same result has been extended by Jiang and Tian (2005) to the case of jump diffusions. Jiang and Tian (2005) also demonstrate that the integral in the formula for IV^*_{t,t+\Delta} may be accurately approximated from a finite number of options in empirically realistic situations. Combining these results, it now becomes possible to directly and analytically link the expectation of the integrated volatility under the risk-neutral dynamics in (2) with the objective expectation of the integrated volatility under (1). As formally shown by Bollerslev and Zhou (2006),

E(V_{t,t+\Delta} \mid \mathcal{F}_t) = A_\Delta \mathrm{IV}^*_{t,t+\Delta} + B_\Delta,   (8)

where A_\Delta = \frac{(1 - e^{-\kappa\Delta})/\kappa}{(1 - e^{-\kappa^*\Delta})/\kappa^*} and B_\Delta = \theta\left[\Delta - (1 - e^{-\kappa\Delta})/\kappa\right] - A_\Delta\,\theta^*\left[\Delta - (1 - e^{-\kappa^*\Delta})/\kappa^*\right] are functions of the underlying parameters κ, θ, and λ. This equation, in conjunction with the moment restriction in (5), provides the necessary identification of the risk premium parameter, λ.8
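As an aside, the integral in Eq. (6) is straightforward to approximate numerically. The sketch below is our own illustration: it truncates and discretizes the strike range (the 70%–143% bounds and 150-point grid follow the simulation design described in Section 3), and a Black–Scholes pricer merely stands in for whatever call prices are available — in the paper's simulations Heston prices are used instead. Under constant volatility the formula should recover σ²∆ exactly, which the toy check exploits.

```python
import math

def bs_call(S, K, T, r, sigma):
    """Black-Scholes European call price (illustrative stand-in for
    observed option prices). At T = 0 it returns intrinsic value."""
    if T <= 0:
        return max(S - K, 0.0)
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S * Phi(d1) - K * math.exp(-r * T) * Phi(d2)

def model_free_iv(S, delta, r, sigma, lo=0.70, hi=1.43, n_grid=150):
    """Truncated, discretized version of Eq. (6):
    IV* = 2 * integral over K of [C(t+delta,K) - C(t,K)] / K^2 dK,
    with the strike range cut to [lo*S, hi*S] as in the text."""
    K = [lo * S + i * (hi - lo) * S / (n_grid - 1) for i in range(n_grid)]
    f = [2.0 * (bs_call(S, k, delta, r, sigma) - bs_call(S, k, 0.0, r, sigma)) / k**2
         for k in K]
    h = K[1] - K[0]
    # trapezoidal rule over the truncated grid
    return h * (sum(f) - 0.5 * (f[0] + f[-1]))

# Under constant volatility the risk-neutral expected integrated
# variance over [t, t+delta] is sigma^2 * delta.
iv = model_free_iv(S=100.0, delta=1.0 / 12, r=0.0, sigma=0.20)
```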
6 The asymptotic distribution (for n → ∞ and ∆ fixed) of the realized volatility error has been formally characterized by Barndorff-Nielsen and Shephard (2002) and Meddahi (2002). Also, Barndorff-Nielsen and Shephard (2004b) have recently extended these asymptotic distributional results to allow for leverage effects.
7 Carr and Madan (1998) and Demeterfi et al. (1999) have previously derived a closely related expression.
8 When implementing the conditional moment restrictions (5) and (8), it is useful to distinguish between two information sets: the continuous sigma-algebra \mathcal{F}_t = \sigma\{V_s;\, s \le t\}, generated by the point-in-time volatility process, and the discrete sigma-algebra \mathcal{G}_t = \sigma\{V_{t-s-1,t-s};\, s = 0, 1, 2, \ldots\}, generated by the integrated volatility series. Obviously, the coarser filtration is nested in the finer filtration (i.e., \mathcal{G}_t \subset \mathcal{F}_t), and by the Law of Iterated Expectations, E[E(\cdot|\mathcal{F}_t)|\mathcal{G}_t] = E(\cdot|\mathcal{G}_t). The GMM estimation method implemented later is based on the coarser information set \mathcal{G}_t.

2.2. GMM estimation and statistical inference

Using the moment conditions (5) and (8), we can now construct a standard GMM type estimator. To allow for overidentifying restrictions, we augment the moment conditions with a lagged instrument of realized volatility, resulting in the following four-dimensional system of equations:

f_t(\xi) = \begin{pmatrix} V_{t+\Delta,t+2\Delta} - \alpha_\Delta V_{t,t+\Delta} - \beta_\Delta \\ (V_{t+\Delta,t+2\Delta} - \alpha_\Delta V_{t,t+\Delta} - \beta_\Delta)\, V_{t-\Delta,t} \\ V_{t,t+\Delta} - A_\Delta \mathrm{IV}^*_{t,t+\Delta} - B_\Delta \\ (V_{t,t+\Delta} - A_\Delta \mathrm{IV}^*_{t,t+\Delta} - B_\Delta)\, V_{t-\Delta,t} \end{pmatrix},   (9)

where \xi = (\kappa, \theta, \lambda)'. By construction E[f_t(\xi_0)|\mathcal{G}_t] = 0, and the corresponding GMM estimator is defined by \hat{\xi}_T = \arg\min_\xi g_T(\xi)' W g_T(\xi), where g_T(\xi) refers to the sample mean of the moment conditions, g_T(\xi) \equiv (1/T)\sum_{t=2}^{T-2} f_t(\xi), and W denotes the inverse of the asymptotic covariance matrix of g_T(\xi_0) (Hansen, 1982). Under standard regularity conditions, the minimized value of the objective function, J = \min_\xi g_T(\xi)' W g_T(\xi), multiplied by the sample size is asymptotically chi-square distributed, allowing for an omnibus test of the overidentifying restrictions. Inference concerning the individual parameters is readily available from the standard formula for the asymptotic covariance matrix, [(\partial g_T(\xi)/\partial \xi')' W (\partial g_T(\xi)/\partial \xi)]^{-1}/T. Since the lag structure in the moment conditions in Eqs. (5) and (8) entails a complex dependence, we use a heteroskedasticity and autocorrelation consistent robust covariance matrix estimator with a Bartlett kernel and a lag length of five in implementing the estimator (Newey and West, 1987).

3. Finite sample distributions

To determine the finite sample performance of the GMM estimator based on the moment conditions described above, we conducted a small-scale Monte Carlo study for the specialized Heston (1993) version of the model in (1) and (2) in which \sigma_t(\cdot) = \sigma\sqrt{V_t}. To illustrate the advantage of the new model-free volatility measures, we estimated the model using three different implied volatilities: the risk-neutral expectation of the integrated volatility (this is, of course, not observable in practice, but can be calculated within the simulations where we know both the latent volatility state V_t and the risk-neutral parameters \kappa^* and \theta^*); the model-free implied volatility computed from one-month maturity option prices using a truncated and discretized version of Eq. (6); and the Black–Scholes implied volatility from a one-month maturity, at-the-money option as a (misspecified) proxy for the risk-neutral expectation.
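Before turning to the simulation results, the estimator of Section 2.2 can be sketched in a few lines. This is our own minimal illustration, not the paper's implementation: it stacks the four sample moments of Eq. (9) using the mappings α_∆, β_∆, A_∆, B_∆ derived above, and evaluates a first-stage objective with an identity weighting matrix (the optimal HAC weighting step is omitted; all variable names are ours).

```python
import numpy as np

def moments(xi, RV, IV, delta=1.0 / 12):
    """Stacked sample moments of Eq. (9). RV[t] is realized volatility
    over [t, t+delta]; IV[t] is the model-free implied volatility over
    the same interval; the lagged RV serves as the instrument."""
    kappa, theta, lam = xi
    ks = kappa + lam                         # kappa* = kappa + lambda
    ts = kappa * theta / ks                  # theta* = kappa*theta/(kappa+lambda)
    a = np.exp(-kappa * delta)               # alpha_Delta
    b = theta * (1.0 - a)                    # beta_Delta
    A = ((1.0 - np.exp(-kappa * delta)) / kappa) / ((1.0 - np.exp(-ks * delta)) / ks)
    B = theta * (delta - (1.0 - np.exp(-kappa * delta)) / kappa) \
        - A * ts * (delta - (1.0 - np.exp(-ks * delta)) / ks)
    e1 = RV[2:] - a * RV[1:-1] - b           # Eq. (5) residual
    e2 = RV[1:-1] - A * IV[1:-1] - B         # Eq. (8) residual
    z = RV[:-2]                              # lagged realized volatility instrument
    return np.array([e1.mean(), (e1 * z).mean(), e2.mean(), (e2 * z).mean()])

def gmm_objective(xi, RV, IV, W=np.eye(4)):
    """First-stage quadratic form g'Wg; the paper instead uses the
    optimal Newey-West weighting matrix."""
    g = moments(xi, RV, IV)
    return float(g @ W @ g)
```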
We also use three different realized volatility measures to assess how the mis-measurement of realized volatility affects the estimation: the true monthly integrated volatility \int_t^{t+\Delta} V_s\,ds (again, this is not observable in practice but can be calculated inside the simulations); monthly realized volatilities computed from five-minute returns; and monthly realized volatilities computed from daily returns. The dynamics of (1) are simulated with the Euler method. We calculate the model-free implied volatility for a given level of V_t with the discrete version of (6) presented by Jiang and Tian (2005, p. 1313). We truncate the integration at lower and upper bounds of 70% and 143% of the current stock price S_t, and discretize the range of integration onto a grid of 150 points. The call option prices needed to compute the model-free implied volatility are computed with the Heston (1993) formula. The Black–Scholes implied volatility is generated by calculating the price of an at-the-money call and then inverting the Black–Scholes formula to extract the implied volatility. The accuracy of the asymptotic approximations is illustrated by contrasting the results for sample sizes of 150 and 600. The total number of Monte Carlo replications is 500. To focus on the volatility risk premium, the drift of the stock return in (1) and the risk-free rate in (2) are both set equal to zero. The benchmark scenario sets κ = 0.10, θ = 0.25, σ = 0.10, λ = −0.20, ρ = −0.50.9
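The Euler scheme for the Heston special case of (1) can be sketched as follows. This is our own hedged illustration of the simulation design: the parameter values follow the benchmark scenario above, but the time-step convention, the truncation of the variance at zero, and all names are our own choices rather than the authors' code.

```python
import numpy as np

def simulate_heston(T_months, steps_per_month=78 * 22, kappa=0.10,
                    theta=0.25, sigma=0.10, rho=-0.50, v0=0.25, seed=0):
    """Euler discretization of the Heston special case of (1):
    dp = sqrt(V) dB1,  dV = kappa*(theta - V) dt + sigma*sqrt(V) dB2,
    with corr(dB1, dB2) = rho and zero drift, as in the Monte Carlo design."""
    rng = np.random.default_rng(seed)
    n = T_months * steps_per_month
    dt = 1.0 / steps_per_month               # time measured in months
    p = np.empty(n + 1)
    v = np.empty(n + 1)
    p[0], v[0] = 0.0, v0
    for i in range(n):
        z1 = rng.standard_normal()
        z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal()
        sv = np.sqrt(max(v[i], 0.0))         # truncate variance at zero
        p[i + 1] = p[i] + sv * np.sqrt(dt) * z1
        v[i + 1] = v[i] + kappa * (theta - v[i]) * dt + sigma * sv * np.sqrt(dt) * z2
    return p, v
```

Monthly realized volatilities can then be formed by summing squared increments of the simulated p over each month, at the five-minute, daily, or any coarser sampling frequency.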
9 Three additional variations we consider are (1) high volatility persistence, or κ = 0.03; (2) high volatility-of-volatility, or σ = 0.20; and (3) pronounced leverage, or ρ = −0.80. These additional designs as well as the estimation results for the κ and θ parameters are available upon request.

Table 1
Monte Carlo simulation results. The table reports the estimation results for the volatility risk premium parameter, λ, based on 500 replications. The benchmark parameter values are κ = 0.10, θ = 0.20, σ = 0.10, λ = −0.20, and ρ = −0.50, respectively.

                                   Mean bias           Median bias         Root-MSE
                                   T = 150   T = 600   T = 150   T = 600   T = 150   T = 600
Risk-neutral implied volatility
  Integrated vol.                  −0.0046   −0.0015   −0.0041   −0.0013   0.0202    0.0091
  Realized, 5-min                  −0.0043   −0.0014   −0.0027   −0.0014   0.0201    0.0090
  Realized, 1-day                  −0.0129   −0.0036   −0.0169   −0.0040   0.0576    0.0260
Model-free implied volatility
  Integrated vol.                   0.0013    0.0044    0.0015    0.0048   0.0199    0.0101
  Realized, 5-min                   0.0017    0.0045    0.0030    0.0045   0.0199    0.0101
  Realized, 1-day                  −0.0068    0.0021   −0.0103    0.0017   0.0569    0.0258
Black–Scholes implied volatility
  Integrated vol.                   0.0089    0.0119    0.0094    0.0122   0.0209    0.0147
  Realized, 5-min                   0.0092    0.0120    0.0106    0.0121   0.0211    0.0148
  Realized, 1-day                   0.0010    0.0100   −0.0019    0.0094   0.0562    0.0276

Table 1 summarizes the parameter estimation for the volatility risk premium. The use of model-free implied volatility achieves a similar root-mean-squared error (RMSE) and convergence rate as the true infeasible risk-neutral implied volatility. On the other hand, the use of Black–Scholes implied volatilities adversely affects the estimated volatility risk premium, especially for the large sample size (T = 600). Also, the estimates based on realized volatility from five-minute returns (over a monthly horizon) have virtually the same small bias and efficiency as the estimates based on the (infeasible) integrated volatility. In contrast, the use of realized volatilities from daily returns generally results in larger biases and noticeably lower efficiency.10

4. Estimates for the market volatility risk premium

4.1. Volatility risk premium and relative risk aversion

There is an intimate link between the stochastic volatility risk premium and the coefficient of risk aversion for the representative investor within the standard intertemporal asset pricing framework. In particular, assuming a linear volatility risk premium along with an affine version of the stochastic volatility model corresponding to \sigma_t(\cdot) = \sigma\sqrt{V_t} in (1), as in Heston (1993), it follows that

-\lambda V_t = \mathrm{cov}_t\!\left( \frac{dm_t}{m_t},\, dV_t \right),   (10)

where m_t denotes the pricing kernel, or marginal utility of wealth, for the representative investor. Moreover, if we assume that the representative agent has a power utility function,

U_t = e^{-\delta t}\, \frac{W_t^{1-\gamma}}{1-\gamma},   (11)

where δ denotes a constant subjective time discount rate, and in equilibrium the agent holds the market portfolio, marginal utility equals m_t = e^{-\delta t} W_t^{-\gamma}. It follows then from Itô's formula that11

\mathrm{cov}_t\!\left( \frac{dm_t}{m_t},\, dV_t \right) = -\gamma \rho \sigma V_t.   (12)

Combining (10) and (12), the constant relative risk aversion coefficient γ is directly proportional to the volatility risk premium: γ = λ/(ρσ). Moreover, given the estimated values of ρ = −0.8 and σ = 1.2 for the S&P500 data analyzed below, −λ is approximately equal to the representative investor's risk aversion, γ. An argument along the lines of Gordon and St-Amour (2004) suggests that the above line of reasoning relating the volatility risk premium to investor risk aversion still applies in an approximate sense if investor risk aversion γ_t is time-varying and follows a diffusion process.12

4.2. Empirical approximation for the volatility risk premium

The discussion in the previous section shows how our approach can accommodate a time-varying volatility risk premium. Previous efforts to explain time-varying volatility risk premia with economic variables have been rare and challenging at best. In contrast, the model and GMM estimation procedure that we use here are quite simple to implement. Specifically, we approximate the volatility risk premium parameter as following an augmented AR(1) process,

\lambda_{t+1} = a + b\,\lambda_t + \sum_{k=1}^{K} c_k \times \mathrm{state}_{t,k},   (13)

where ''state_{t,k}'' are macro-finance state variables. To be consistent with an absence of arbitrage, the macro-finance shocks ''state_{t,k}'' must be interpreted either as fixed covariates or predetermined functions of the time-t state variables, S_t and V_t. We estimate this time-varying risk premium specification by adding lagged squared realized volatility, lagged implied volatility, and six out of the seven macro-finance covariates (without the redundant lagged realized volatility; see Section 4.5 for details) as additional instruments for the cross moment in (8), leaving the moment for the realized volatility in (5) the same as in the constant risk premium case. All in all, this results in the same χ²(1) asymptotic distribution for the GMM omnibus test as in the estimation with a constant λ.

4.3. Data sources and summary statistics

Our empirical analysis is based on monthly implied and realized volatilities for the S&P500 index from January 1990 through May 2004. For the risk-neutral implied volatility measure, we rely on the VIX index provided by the Chicago Board of Options

10 The Wald test for the risk premium parameter, as well as the GMM omnibus test, also has the correct size for the model-free risk-neutral and realized volatility measures. These additional graphs are omitted to conserve space but available upon request.
11 A similar reduced-form argument is made by Bakshi and Kapadia (2003). For a much earlier formal general equilibrium treatment, see also Bates (1988), who allows for both stochastic volatility and jumps.
12 In a somewhat different context, Bekaert et al. (2005, 2009) have recently explored the implications from a discrete-time model in which the temporal variation in the degree of risk aversion is related to the external habit of the representative agent.
Fig. 1. Model-free realized and implied volatilities.
Exchange (CBOE). The VIX index, available back to January 1990, is based on the liquid S&P500 index options, and more importantly, it is calculated based on the model-free approach discussed earlier.13 Under appropriate assumptions, the concept of CBOE’s ‘‘fair value of future variance’’ developed by Demeterfi et al. (1999) is identical to the ‘‘model-free implied variance’’ by Britten-Jones and Neuberger (2000), as well as the ‘‘risk-neutral expected value of the return variance’’ by Carr and Wu (2009) (see Jiang and Tian, 2007, for detailed justification). As shown in the Monte Carlo study, the model-free implied volatility should be a good approximation to the true (unobserved) risk-neutral expectation of the integrated volatility. Our realized volatilities are based on the summation of the five-minute squared returns on the S&P500 index within the month.14 Thus, for a typical month with 22 trading days, we have 22 × 78 = 1716 five-minute returns, where the 78 five-minute subintervals cover the normal trading hours from 9:30 am to 4:00 pm, including the close-to-open five-minute interval. Again, as indicated by the Monte Carlo simulations discussed in the
13 In September 2003, CBOE replaced the old VIX index, based on S&P100 options and Black–Scholes implied volatility, with the new VIX index based on S&P500 options and model-free implied volatilities involving a discrete approximation to the theoretical result in Carr and Madan (1998). Historical data on both the old and new VIX are directly available from the CBOE. 14 The high-frequency data for the S&P500 index is provided by the Institute of Financial Markets.
previous section, the monthly realized volatilities based on these five-minute returns should provide a very good approximation to the true (unobserved) continuous-time integrated volatility, and, in particular, a much better approximation than the one based on the sum of the daily squared returns. Fig. 1 plots realized volatility, implied volatility, and their difference.15 Both of the volatility measures were generally higher during the latter half of the sample, although they have also both decreased more recently. Summary statistics are reported in Table 2. Realized volatility is systematically lower than implied volatility, and its unconditional distribution deviates more from the normal. Both measures exhibit pronounced serial correlation with extremely slow decay in their autocorrelations. The difference between the implied and realized volatilities is sometimes used by market participants as a measure for the market-implied risk aversion.16 However, the raw difference, depicted in the bottom panel in Fig. 1, is obviously rather noisy and more or less reflects the overall level of the volatility. A more structured approach for extracting the volatility risk premium (or implied risk aversion), as discussed in the previous sections, thus holds the promise of revealing a better picture and a deeper
15 Here and throughout the paper, monthly standard deviations are ''annualized'' by multiplying by √12. 16 In support of this, Rosenberg and Engle (2002) also find that their empirical risk aversion measure is positively related to the difference between implied and objective volatility.
Table 2
Summary statistics for monthly volatilities.

Statistics    Realized volatility   Implied volatility
Mean          12.68                 20.08
Std. dev.      5.84                  6.39
Skewness       1.21                  0.84
Kurtosis       4.63                  3.87
Minimum        4.73                 10.63
5% Qntl.       5.92                 11.73
25% Qntl.      7.93                 14.79
50% Qntl.     11.56                 19.52
75% Qntl.     15.42                 24.19
95% Qntl.     24.62                 31.17
Maximum       36.61                 44.28
ρ1             0.81                  0.83
ρ2             0.68                  0.69
ρ3             0.61                  0.60
ρ4             0.54                  0.56
ρ5             0.55                  0.55
ρ6             0.55                  0.53
ρ7             0.52                  0.50
ρ8             0.53                  0.49
ρ9             0.53                  0.52
ρ10            0.53                  0.54

understanding of the way in which the volatility risk premium evolves over time, and its relationship to the macroeconomy. We next turn to a discussion of our pertinent estimation results.

4.4. Preliminary factor analysis and modeling assumptions

Our analysis relies on the stochastic volatility model (1) and its corresponding risk-neutral counterpart (2). While our approach is quite general in that the functions \mu_t(\cdot) and \sigma_t(\cdot) can be left unspecified, there are some assumptions embedded in Eqs. (1)–(2) that can be tested before going further: realized volatility should follow an ARMA(1, 1) process; model-free implied volatility should follow an AR(1) process; and one common factor drives both integrated and implied volatility. To test whether realized volatility and model-free implied volatility follow ARMA(1, 1) and AR(1) processes, respectively, we compare the ARMA(1, 1) with an ARMA(2, 2) model and the AR(1) with an AR(2) model. The results are reported in Table 3. All four models produce white-noise residuals (the portmanteau tests do not reject the hypothesis of white-noise residuals at conventional levels). Meanwhile, following a traditional time-series approach to model selection based on the minimization of Schwarz's Bayesian information criterion, the ARMA(1, 1) is preferred to the ARMA(2, 2) for realized volatility and the AR(1) is preferred to the AR(2) for model-free implied volatility.17

The third model implication listed above holds that realized and model-free implied volatility are driven by a single common factor. To informally investigate this, we performed a standard (unconditional) principal components analysis (PCA) on the two volatility series. The PCA indicates that the first principal component explains 79% of the variance of the two series. This is high enough to assure us that our (implicit) assumption of a single common factor is not obviously violated by the data. Of course, the remaining 21% of the variability could be explained, at least in part, by a time-varying volatility risk premium. We next turn our attention to the results from the more formal GMM-based estimation strategy.

17 While a standard likelihood ratio (LR) test based on the reported maximized values for the Gaussian log likelihoods results in the ARMA(1, 1) for the realized volatility being rejected in favor of the ARMA(2, 2) model, the errors from both models (the volatility-of-volatility) are heteroskedastic, so the standard critical values likely overstate the difference in fit between the two models.

4.5. GMM estimation results

Table 4 reports the GMM estimation results for two volatility risk premium specifications: (i) a constant λ; and (ii) a time-varying λ_t driven by shocks to macro-finance variables as in Eq. (13).18 As seen in the first column of the table, when we restrict the risk premium to be constant, the estimated λ is negative and statistically significant. This finding is consistent with other papers that have found a negative risk premium on stochastic volatility. However, the chi-square omnibus test of overidentifying restrictions rejects the overall specification at the 10% (although not at the 5%) level. The second column presents the results obtained by explicitly including macro-finance covariates. To select the macro-finance variables in the time-varying risk premium specification, we conducted an extensive search over 29 monthly data series.19 If part of the temporal variation in investor risk aversion reflects investors focusing on different aspects of the economy at different points in time, as seems likely, some flexibility in specifying the set of covariates seems both appropriate and unavoidable. Hence, we select the group of variables that jointly achieves the highest p-value of the GMM omnibus specification test and whose members are all significant (at the 5% level) based on their individual t-test statistics. To facilitate the subsequent discussion, the resulting seven variables have all been standardized to have mean zero and variance one, so that their marginal contributions to the time-varying risk premium are directly comparable.20 The results for the autoregressive part of the specification imply an average risk premium of a/(1 − b) = −1.82, and, without figuring in the dynamic impact of the macro state variables, an even higher degree of own persistence, b = 0.93.
As necessitated by the specification search, all of the individual parameters for the macro-finance covariates are statistically significant at the 5% level, and the overall GMM specification test is greatly improved, with a p-value of 0.68. The resulting estimate for the volatility risk premium is plotted in the top panel of Fig. 2. Both the signs and magnitudes of the macro-finance shock coefficients are important in understanding the time variation of the volatility risk premium. Sticking to the convention that (−λ) represents the risk premium, or risk aversion, realized volatility makes the biggest contribution (−0.32) and has a positive impact (i.e., when volatility is high, so is risk aversion). The impact of the AAA bond spread over Treasuries (0.19) likely reflects a business cycle effect (i.e., credit spreads tend to be high before a downturn, which usually coincides with low risk aversion). Conversely, housing starts have a positive impact on the risk premium (−0.19) (i.e., a real estate boom usually precedes higher risk aversion). The P/E ratio is the fourth most important factor (0.14) and impacts the premium negatively (i.e., everything else equal, higher P/E ratios lower the degree of risk aversion). The fifth variable in the table is industrial production growth (0.10), which also has a negative impact (i.e., higher growth leads to a lower volatility risk premium). In contrast, the sixth variable, PPI inflation, leads to higher risk aversion (−0.05). Finally, the last significant macro state variable, payroll employment, marginally raises the volatility risk premium (−0.04), possibly as a result of wage pressure.
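To see what these point estimates imply dynamically, the fitted Eq. (13) can be iterated directly. The sketch below is purely illustrative: it takes the reported estimates from Table 4, verifies the implied average premium a/(1 − b), and then drives the recursion with randomly drawn standardized shocks standing in for the actual macro-finance covariates, so the resulting path is hypothetical rather than the series plotted in Fig. 2.

```python
import numpy as np

# Point estimates from Table 4 (macro-finance specification)
a, b = -0.122, 0.933
c = np.array([-0.319, 0.194, -0.191, 0.140, 0.097, -0.047, -0.040])

# Unconditional mean implied by the autoregressive part, as in the text
mean_premium = a / (1.0 - b)             # approximately -1.82

# Iterate Eq. (13) on illustrative standardized shocks (hypothetical
# stand-ins for the seven macro-finance covariates)
rng = np.random.default_rng(1)
T = 173                                   # sample length in months
state = rng.standard_normal((T, 7))
lam = np.empty(T)
lam[0] = mean_premium
for t in range(T - 1):
    lam[t + 1] = a + b * lam[t] + state[t] @ c
```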
18 In order to conserve space, we only report the results pertaining to the parameters for the volatility risk premium. The results for the other parameters in the model are directly in line with previous results reported in the literature, and consistent with the summary statistics in Table 2, point toward a high degree of volatility persistence in the (latent) Vt process. 19 The list of these macro-finance variables is omitted to conserve space but available upon request. 20 For stationary variables the unit is the level, while for non-stationary variables the unit is the logarithmic change for the past twelve months.
Table 3
Alternative model specifications. For realized volatility, we estimate the ARMA(2, 2) model

V_{t+\Delta,t+2\Delta} = \beta + \alpha_1 V_{t,t+\Delta} + \alpha_2 V_{t-\Delta,t} + e_{t+\Delta,t+2\Delta} + \theta_1 e_{t,t+\Delta} + \theta_2 e_{t-\Delta,t},

where the ARMA(1, 1) model restricts \alpha_2 = \theta_2 = 0. For implied volatility, we estimate the AR(2) model

\mathrm{IV}^*_{t+\Delta,t+2\Delta} = \beta + \alpha_1 \mathrm{IV}^*_{t,t+\Delta} + \alpha_2 \mathrm{IV}^*_{t-\Delta,t} + e_{t+\Delta,t+2\Delta},

where the AR(1) model restricts \alpha_2 = 0. Standard errors are shown in parentheses. The columns labeled ''Q(10)'', ''BIC'' and ''log L'' show the Ljung–Box portmanteau test statistic for white-noise residuals using ten lags, Schwarz's Bayesian information criterion, and the maximized value of the Gaussian log-likelihood, respectively.

Realized volatility
             β              α1            α2            θ1             θ2             Q(10)   BIC      log L
ARMA(1, 1)   0.052 (0.050)  0.70 (0.07)   –             −0.32 (0.10)   –              14.5    2951.7   −1465.5
ARMA(2, 2)   0.052 (0.053)  0.21 (0.26)   0.45 (0.12)   0.27 (0.26)    −0.42 (0.11)   7.9     2952.4   −1460.7

Implied volatility
             β              α1            α2            θ1   θ2   Q(10)   BIC      log L
AR(1)        0.044 (0.01)   0.76 (0.04)   –             –    –    13.4    2323.0   −1153.8
AR(2)        0.044 (0.01)   0.80 (0.09)   −0.05 (0.08)  –    –    12.5    2327.8   −1153.6

Table 4
Estimation of volatility risk premium. All of the macro-finance variables are standardized to have mean zero and variance one. The growth variables (Industrial Production, Producer Price Index, and Payroll Employment) are expressed in terms of the logarithmic difference over the past twelve months. The lag length in the Newey–West weighting matrix employed in the estimation, as discussed in the main text, is set at 25.

                               Constant         Macro-finance
λ                              −1.793 (0.216)
a                                               −0.122 (0.051)
b                                                0.933 (0.030)
c1  Realized volatility                         −0.319 (0.042)
c2  Moody AAA bond spread                        0.194 (0.034)
c3  Housing start                               −0.191 (0.055)
c4  S&P500 P/E ratio                             0.140 (0.015)
c5  Industrial production                        0.097 (0.026)
c6  Producer price index                        −0.047 (0.023)
c7  Payroll employment                          −0.040 (0.019)
χ² (d.o.f. = 1) (p-value)       2.889 (0.089)    0.169 (0.681)
4.6. Robustness checks

The consistency of the realized volatility estimator hinges on the idea of ever finer sampled observations over a fixed-length time interval. Yet, it is well known that a host of market microstructure frictions, including price discreteness and staleness, invalidate the basic underlying martingale assumption at ultra high frequencies.21 In order to investigate the robustness of our findings based on the 5-min returns with respect to this issue, we re-estimate our model with realized volatilities constructed from coarser sampled 30-min returns. As seen from the first column in Table 5, the sign and significance of the parameter estimates are qualitatively and quantitatively very similar to the previous findings, and the p-values for the overall goodness-of-fit tests for the model are also remarkably close (0.697 versus 0.681).

The Monte Carlo experiment in Section 3 indicates that the use of Black–Scholes as opposed to model-free implied volatilities does not result in materially different estimates for the (constant) volatility risk premium when the sample size is relatively small (150 months). To investigate the sensitivity of our results to the
specific volatility measure, the second column in Table 5 reports the GMM estimation results obtained by using Black–Scholes implied volatilities in place of the model-free measures. Compared to the original results in the last column in Table 4, the parameter estimates are generally close, as are their standard errors. This is in line with the Monte Carlo evidence presented earlier, which indicates that the inferior performance of Black–Scholes becomes more apparent for larger sample sizes (600 months).

It has been widely argued in the literature that most major market indices contain jumps, or price discontinuities (see, e.g., Bates, 1996; Bakshi et al., 1997; Pan, 2002; Chernov et al., 2003). This suggests that it may be important to separately consider jump risk when estimating the stochastic volatility risk premium. However, the moment condition in Eq. (8) only identifies a single risk premium parameter. Thus, to check the robustness of our estimates with respect to jumps, we simply fix the jump risk premium and re-estimate the resulting volatility risk premium. To identify the jumps, we follow Barndorff-Nielsen and Shephard (2006, 2004c) in using the difference between the realized volatility,

V^n_{t,t+\Delta} \equiv \sum_{i=1}^{n} \left[ p_{t + i\Delta/n} - p_{t + (i-1)\Delta/n} \right]^2 \longrightarrow \int_t^{t+\Delta} V_s\,ds + \int_t^{t+\Delta} J_s^2\,ds,   (14)

and the so-called bi-power variation,

BV^n_{t,t+\Delta} \equiv \frac{\pi}{2} \sum_{i=2}^{n} \left| p_{t + i\Delta/n} - p_{t + (i-1)\Delta/n} \right| \left| p_{t + (i-1)\Delta/n} - p_{t + (i-2)\Delta/n} \right| \longrightarrow \int_t^{t+\Delta} V_s\,ds,   (15)

for measuring the monthly jump volatility, \int_t^{t+\Delta} J_s^2\,ds, under the objective measure (see also Huang and Tauchen, 2005; Andersen et al., 2007). Following Jiang and Tian (2005) and Carr and Wu (2009), the model-free implied volatility may be similarly decomposed under the risk-neutral expectation,

\mathrm{IV}^*_{t,t+\Delta} \approx E^*\!\left( \int_t^{t+\Delta} V_s\,ds \,\Big|\, \mathcal{F}_t \right) + E^*\!\left( \int_t^{t+\Delta} J_s^2\,ds \,\Big|\, \mathcal{F}_t \right).   (16)

21 A large, and rapidly growing, literature has sought different ways in which to best deal with these complications in the construction of improved realized volatility measures; see, e.g., Aït-Sahalia et al. (2005), Bandi and Russell (2006), and Hansen and Lunde (2006).

Since it is not possible to separately identify a volatility risk premium and a jump risk premium without additional modeling assumptions, we instead perform a counter-factual experiment and assume that the risk-neutral and objective expectations of
Fig. 2. Estimates of time-varying volatility risk premia.
Table 5
Robustness checks. All of the macro-finance variables are the same as in the previous table. 30-min refers to the use of realized volatilities based on 30-min returns. BSIV replaces the model-free implied volatility with Black–Scholes implied volatility. Jump (1), (2), and (3) represent the cases where the risk-neutral expectation of squared jumps is assumed to be the same as, double, and triple that of the objective expectation.

Macro-finance specification
                            30-min           BSIV             Jump (1)         Jump (2)         Jump (3)
a                           −0.156 (0.040)   −0.129 (0.065)   −0.158 (0.055)   −0.145 (0.048)   −0.129 (0.050)
b                            0.891 (0.028)    0.933 (0.033)    0.916 (0.032)    0.919 (0.029)    0.925 (0.030)
c1  Realized volatility     −0.262 (0.102)   −0.281 (0.033)   −0.352 (0.049)   −0.365 (0.063)   −0.386 (0.070)
c2  Moody AAA bond spread    0.107 (0.065)    0.151 (0.029)    0.225 (0.039)    0.229 (0.054)    0.236 (0.056)
c3  Housing start           −0.175 (0.054)   −0.142 (0.056)   −0.199 (0.056)   −0.203 (0.056)   −0.209 (0.057)
c4  S&P500 P/E ratio         0.136 (0.021)    0.129 (0.014)    0.144 (0.013)    0.147 (0.012)    0.152 (0.011)
c5  Industrial production    0.064 (0.054)    0.056 (0.026)    0.110 (0.028)    0.117 (0.032)    0.124 (0.033)
c6  Producer price index    −0.037 (0.019)   −0.032 (0.021)   −0.048 (0.024)   −0.046 (0.022)   −0.043 (0.021)
c7  Payroll employment      −0.019 (0.043)   −0.013 (0.020)   −0.045 (0.020)   −0.051 (0.023)   −0.059 (0.024)
χ² (d.o.f. = 1) (p-value)    0.151 (0.697)    0.373 (0.541)    0.150 (0.698)    0.059 (0.807)    0.004 (0.951)
the jump contribution differ by a constant multiple. In particular, under Jump Scenario (h),

$$E^{*}\!\left[\int_t^{t+\Delta} J_s^2\,ds \,\middle|\, \mathcal{F}_t\right] = h \cdot E\!\left[\int_t^{t+\Delta} J_s^2\,ds \,\middle|\, \mathcal{F}_t\right]. \tag{17}$$

The corresponding estimation results, reported in the last three columns of Table 5, show that the level, persistence, and macro-finance sensitivities of the volatility risk premium are all largely unaffected. Interestingly, on comparing the three jump scenarios, the overall goodness-of-fit appears to improve as the price of jump risk increases. Nonetheless, overall the results clearly confirm the robustness of our previous findings with respect to the specific jump dynamics and risk prices entertained here.22
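In terms of mechanics, Eq. (17) simply rescales the jump component that is netted out of the risk-neutral expectation. A minimal sketch of this accounting, using hypothetical monthly variance numbers (not estimates from the paper):

```python
def diffusive_variance_premium(mfiv, rv, jump_var, h):
    """Diffusive variance risk premium under Jump Scenario (h):
    the risk-neutral expectation of squared jumps is assumed to be
    h times the objective one, and each is netted out of the implied
    and realized variance measures, respectively."""
    risk_neutral_diffusive = mfiv - h * jump_var   # E*[int V ds | F_t]
    objective_diffusive = rv - jump_var            # E[int V ds | F_t]
    return risk_neutral_diffusive - objective_diffusive

# Hypothetical monthly variances: implied 0.050, realized 0.030, jumps 0.005.
premia = {h: diffusive_variance_premium(0.050, 0.030, 0.005, h)
          for h in (1, 2, 3)}
```

Raising the assumed price of jump risk (a larger h) mechanically lowers the diffusive premium, which is consistent with the modest movement of the level estimates across the three jump scenarios in Table 5.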
22 We also experimented with the alternative non-parametric jump-detection procedure developed by Jiang and Oomen (2008) as a way of identifying periods without any jumps. Even though 92.5% of the days are classified as no-jump days according to this procedure, only 40 of the total 173 months in the sample contain no jumps, rendering separate estimation and comparisons of a risk premium for those months questionable. Further details of these results are available upon request.
Table 6
Stock market return predictability. Panel A reports predictive regressions for the monthly excess return on the S&P500 index, measured in annualized percentage terms. Industrial Production (Ind. P.) and Nonfarm Payroll Employment (Nonf. P.) represent the past-year logarithmic changes in annualized percentages. The PE Ratio is based on the trailing twelve-month earnings reported for the S&P500 index. Risk Prem. refers to the volatility risk premium extracted here. Panel B reports predictive regressions for the quarterly excess return on the S&P500 index from 1990Q1 to 2003Q2. The consumption-wealth ratio, or CAY, is defined in Lettau and Ludvigson (2001), and the data are downloaded from their website. Standard errors are in parentheses.

Panel A: monthly horizon

Intercept        | Risk Prem.     | PE Ratio       | Ind. P.       | Nonf. P.       | Adj. R2
−22.283 (10.120) | 14.567 (5.045) |                |               |                | 0.044
35.939 (13.750)  |                | −1.272 (0.566) |               |                | 0.022
−0.926 (5.495)   |                |                | 2.719 (2.204) |                | 0.010
−0.311 (5.478)   |                |                |               | 1.992 (1.226)  | 0.005
32.619 (15.306)  |                | −1.259 (0.585) | 1.841 (2.245) | −3.306 (4.657) | 0.022
−9.279 (21.245)  | 11.575 (5.438) | −0.430 (0.594) | 3.643 (2.624) | −1.388 (4.721) | 0.037
4.7. Comparing alternative estimates of time-varying risk premia

Several alternative procedures for estimating the time-varying volatility risk premium have previously been implemented in the literature. One approach is to vary the risk premium parameter each time period to best match that period's market data. In the context of volatility modeling, that approach would vary the risk premium parameter to match each month's difference between realized and implied volatility (papers that have taken this approach include Rosenberg and Engle (2002, p. 363) and Tarashev et al. (2003, p. 62)). In the context of our modeling framework, such an approach produces the time-varying risk premium shown in the middle panel of Fig. 2. The general shape of Fig. 2 matches the simple difference between implied and realized volatilities shown in the bottom panel of Fig. 1. As previously noted, because this approach attributes every wiggle in the data to changes in the risk premium, it produces a very volatile time series of monthly risk premia. Economic theory argues that an asset's risk premium should depend on deep structural parameters. For example, in the consumption CAPM (C-CAPM), an asset's risk premium varies with investors' risk aversion and the asset's covariance with investors' consumption. By definition, deep structural parameters should be relatively stable over time. Yet period-by-period estimation of a time-varying risk premium forces the parameters to vary (almost independently) from one period to the next. As such, we find that monthly volatility risk premia estimated in this way are implausibly volatile. A second approach to estimating risk premium parameters comes from the consumption-based asset pricing literature. This approach typically assumes that risk premia are constant, or, if risk preferences are allowed to vary over time, they end up being implausibly smooth and possibly nonstationary.
For example, Campbell and Cochrane (1999) generate time variation in risk aversion through habit formation, in which the level of habit reacts only gradually to changes in consumption. Such a modeling strategy explicitly prevents the risk premia from being excessively variable in the short run. Risk aversion parameters estimated with this approach (Brandt and Wang, 2003, p. 1481; Gordon and St-Amour, 2004, p. 249) generally have little or no variation at a business cycle frequency. In contrast, our estimated volatility risk premium, shown in the top panel of Fig. 2, has plausible business cycle
Panel B: quarterly horizon

Intercept        | Risk Prem.     | PE Ratio       | CAY           | Adj. R2
−27.087 (12.215) | 16.477 (6.145) |                |               | 0.156
41.044 (16.289)  |                | −1.543 (0.692) |               | 0.086
2.413 (4.308)    |                |                | 5.378 (2.031) | 0.068
−6.927 (21.494)  | 13.619 (5.966) | −1.098 (1.139) |               | 0.151
−23.338 (13.413) | 14.253 (6.970) |                | 2.042 (1.946) | 0.149
29.737 (27.888)  |                | −0.609 (0.649) | 2.476 (3.352) | 0.078
−10.698 (25.008) | 13.204 (6.180) | −0.432 (0.926) | 1.146 (2.860) | 0.136
variation. Peaks and troughs in the series are typically multiple years apart, and the series does not have excessive month-by-month fluctuations. The estimated risk premium rises sharply during the two NBER-dated macroeconomic recessions (the shaded areas in the plots), as well as during the periods of slow recovery and job growth after the 1991 and 2001 recessions. Nearly all of the peaks in the series are readily identifiable with major macroeconomic or financial market developments, as labeled in the figure. The chart also suggests that the risk premium often rises sharply but declines only gradually. It is interesting to contrast these estimates with the plot in the bottom panel of the same figure, which shows the estimate of the volatility risk premium based on the lagged realized volatility as the only state variable. The resulting risk premium hardly changes over the first half of the sample and otherwise appears extremely smooth. We conclude that the macro-finance variables clearly help in identifying a reasonably informative time-varying volatility risk premium.

4.8. Stock return predictability

Because the volatility risk premium can be related to investor risk aversion, it may be informative about other risk premia in the economy. To illustrate, we compare its predictive power for aggregate stock market returns with that of other traditionally used macro-finance variables. The top panel of Table 6 reports the results of simple regressions of monthly S&P500 excess returns on the volatility risk premium and on the most significant individual variables from the pool of 29 macro-finance covariates. The extracted volatility risk premium has the highest predictive power, with an adjusted R2 of 4.4%.23 The second-best predictor is the P/E ratio, with an adjusted R2 of 2.2%. Next in order are industrial production and nonfarm payrolls, with adjusted R2's of 1.0% and 0.5%, respectively.
Dividend yield – a significant predictor according to many other studies – only explains 0.3% of the monthly return variation. These results are consistent with previous findings that macroeconomic state variables do predict returns, though the predictability measured by adjusted R2 is
23 The use of the volatility risk premium as a second-stage regressor suffers from a standard errors-in-variables type problem, resulting in too large a standard error for the estimated slope coefficient. Also, the persistence of the right-hand-side variables in predictive regressions can severely bias the coefficient estimates and hamper the inference (Stambaugh, 1999; Amihud and Hurvich, 2004).
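The regressions summarized in Table 6 are standard OLS projections of next-period excess returns on a lagged predictor, ranked by adjusted R2. A minimal sketch, using synthetic placeholder data rather than the actual S&P500 series:

```python
import numpy as np

def predictive_regression(y, x):
    """OLS of excess returns y on a single lagged predictor x.
    Returns (intercept, slope, adjusted R^2)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    tss = np.sum((y - y.mean()) ** 2)
    rss = np.sum(resid ** 2)
    T, k = X.shape
    adj_r2 = 1.0 - (rss / (T - k)) / (tss / (T - 1))
    return beta[0], beta[1], adj_r2

# Synthetic example: returns that load exactly on the lagged predictor.
x = np.linspace(-1.0, 1.0, 173)          # 173 "months", as in the sample
y = 1.0 + 2.0 * x                        # noise-free for illustration
a, b, r2 = predictive_regression(y, x)
```

In the paper's setting, the standard errors would additionally need the errors-in-variables and Stambaugh-bias corrections discussed in the footnote above.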
usually in the low single digits. Nonetheless, it is noteworthy that of all the predictor variables, the volatility risk premium results in the highest adjusted R2. Combining all of the marginally significant variables into a single multiple regression together with the volatility risk premium, none of the macro-finance variables is significant, while only the P/E ratio is significant in the regression excluding the premium. Of course, the estimate for the volatility risk premium already incorporates some of the same macroeconomic variables (see Table 4), so the finding that these variables are ''driven out'' when included together with the premium is not necessarily surprising. However, the macro-variables entering the model for λt only impact the returns indirectly through the temporal variation in the premium, and the volatility risk premium itself is also estimated from a different set of moment conditions involving only the model-free realized and option-implied volatilities. The bottom panel of Table 6 examines stock return predictability over a quarterly horizon. In addition to the volatility risk premium and the P/E ratio from the last month of the previous quarter, which were the two most important predictor variables in the monthly regressions, we add the quarterly consumption-wealth ratio. The consumption-wealth ratio, termed CAY, has previously been found by Lettau and Ludvigson (2001) to be helpful in explaining longer-horizon returns. The first three regressions in the bottom panel of Table 6 show that each of the three predictor variables is statistically significant in univariate regressions. The volatility risk premium results in the highest individual adjusted R2 of 15.6%. Adding the P/E ratio or CAY to the volatility risk premium results in lower adjusted R2's, and only the risk premium remains statistically significant in all of the predictive regressions.24

5. Conclusion

This paper develops a simple consistent approach for estimating the volatility risk premium. The approach exploits the linkage between the objective and risk-neutral expectations of the integrated volatility. The estimation is facilitated by the use of newly available model-free realized volatilities based on high-frequency intraday data, along with model-free option-implied volatilities. The approach allows us to explicitly link any temporal variation in the risk premium to underlying state variables within an internally consistent and simple-to-implement GMM estimation framework. A small-scale Monte Carlo experiment indicates that the procedure performs well in estimating the volatility risk premium in empirically realistic situations. In contrast, estimates based on Black–Scholes implied volatilities and/or monthly sample variances of daily squared returns are inefficient and statistically less reliable. Applying the methodology to the S&P500 market index, we find significant evidence for temporal variation in the volatility risk premium, which we directly link to a set of underlying macro-finance state variables. Interestingly, the extracted volatility risk premium also appears to be helpful in predicting the return on the market itself. The volatility risk premium (or risk aversion index) extracted in our paper differs sharply from other approaches in the
24 The predictability afforded by the volatility risk premium documented here, 4.4% and 15.6% at the monthly and quarterly horizons, respectively, far exceeds that afforded by other more traditional predictor variables. Of course, by estimating the volatility premium from the entire sample, the results suffer from a look-ahead bias. Also, with only 173 monthly and 54 quarterly return observations, the specific estimates may be driven by a few influential outliers. Still, the more comprehensive empirical investigation reported in Bollerslev et al. (2008) does suggest that the results hold up more generally, although the magnitude of the reported R2's probably overstates the case.
literature. In particular, earlier estimates relying on period-by-period differences in the estimated risk-neutral and objective distributions tend to produce implausibly volatile estimates. On the other hand, earlier procedures based on structural macroeconomic/consumption-type pricing models typically result in implausibly smooth estimates. In contrast, the model-free realized and implied volatility-based procedure developed here results in an estimated premium that avoids the excessive period-by-period random fluctuations, yet responds to recessions, financial crises, and other economic events in an empirically realistic fashion. It would be interesting to more closely compare and contrast the risk aversion index estimated here to other popular gauges of investor fear or market sentiment. Also, how does the estimated volatility risk premium for the S&P500 compare to that of other markets? The results in the paper show that the extracted volatility risk premium for the current month is useful in predicting next month's aggregate S&P500 return. It would be interesting to further explore the cross-sectional pricing implications of this finding. Does the volatility risk premium represent a systematic priced risk factor?25 Also, what is the link between stock and bond market volatility risk premia? Lastly, better estimates for the volatility risk premium are, of course, of direct importance for derivatives pricing. We leave further work along these lines for future research.

References

Adrian, Tobias, Rosenberg, Joshua, 2008. Stock returns and volatility: pricing the long-run and short-run components of market risk. Journal of Finance 63, 2997–3030. Aït-Sahalia, Yacine, Kimmel, Robert, 2007. Maximum likelihood estimation of stochastic volatility models. Journal of Financial Economics 83, 413–452. Aït-Sahalia, Yacine, Lo, Andrew W., 2000. Nonparametric risk management and implied risk aversion. Journal of Econometrics 94, 9–51.
Aït-Sahalia, Yacine, Mykland, Per, Zhang, Lan, 2005. How often to sample a continuous-time process in the presence of market microstructure noise. Review of Financial Studies 18, 351–416. Amihud, Yakov, Hurvich, Clifford, 2004. Predictive regressions: a reduced-bias estimation method. Journal of Financial and Quantitative Analysis 39, 813–841. Andersen, Torben G., Bollerslev, Tim, Diebold, Francis X., 2007. Roughing it up: including jump components in the measurement, modeling, and forecasting of return volatility. Review of Economics and Statistics 89, 701–720. Andersen, Torben G., Bollerslev, Tim, Diebold, Francis X., 2009. Parametric and Nonparametric Volatility Measurement. In: Handbook of Financial Econometrics. Elsevier Science B.V., Amsterdam (Chapter 2) (forthcoming). Andersen, Torben G., Bollerslev, Tim, Diebold, Francis X., Ebens, Heiko, 2001. The distribution of realized stock return volatility. Journal of Financial Economics 61, 43–76. Andersen, Torben G., Bollerslev, Tim, Diebold, Francis X., Labys, Paul, 2003. Modeling and forecasting realized volatility. Econometrica 71, 579–625. Andersen, Torben G., Bollerslev, Tim, Meddahi, Nour, 2004. Analytical evaluation of volatility forecasts. International Economic Review 45, 1079–1110. Ang, Andrew, Hodrick, Robert, Xing, Yuhang, Zhang, Xiaoyan, 2006. The crosssection of volatility and expected returns. Journal of Finance 61, 259–299. Bakshi, Gurdip, Cao, Charles, Chen, Zhiwu, 1997. Empirical performance of alternative option pricing models. Journal of Finance 52, 2003–2049. Bakshi, Gurdip, Kapadia, Nikunj, 2003. Delta-hedged gains and the negative market volatility risk premium. Review of Financial Studies 16, 527–566. Bandi, Federico, Russell, Jeffrey, 2006. Separating microstructure noise from volatility. Journal of Financial Economics 79, 655–692. Barndorff-Nielsen, Ole, Shephard, Neil, 2002. Econometric analysis of realised volatility and its use in estimating stochastic volatility models. 
Journal of the Royal Statistical Society, Series B 64. Barndorff-Nielsen, Ole, Shephard, Neil, 2004a. Econometric analysis of realised covariation: high frequency based covariance, regression and correlation. Econometrica 72, 885–925. Barndorff-Nielsen, Ole, Shephard, Neil, 2004b. A feasible central limit theory for realised volatility under leverage. Manuscript. Nuffield College, Oxford University. Barndorff-Nielsen, Ole, Shephard, Neil, 2004c. Power and bipower variation with stochastic volatility and jumps. Journal of Financial Econometrics 2, 1–48.
25 The recent results in Ang et al. (2006) and Adrian and Rosenberg (2008) suggest that volatility risk may indeed be a priced factor.
Barndorff-Nielsen, Ole, Shephard, Neil, 2006. Econometrics of testing for jumps in financial economics using bipower variation. Journal of Financial Econometrics 4, 1–30. Bates, David S., 1988. Pricing options on jump-diffusion processes. Working Paper. Rodney L. White Center, Wharton School. Bates, David S., 1996. Jumps and stochastic volatility: exchange rate processes implicit in Deutsche Mark options. The Review of Financial Studies 9, 69–107. Bekaert, Geert, Engstrom, Eric, Grenadier, Steven R., 2005. Stock and bond returns with moody investors. Working Paper. Columbia University. Bekaert, Geert, Engstrom, Eric, Xing, Yuhang, 2009. Risk, uncertainty and asset prices. Journal of Financial Economics 91, 39–82. Benzoni, Luca, 2002. Pricing options under stochastic volatility: an empirical investigation. Working Paper. University of Minnesota. Bliss, Robert R., Panigirtzoglou, Nikolaos, 2004. Option-implied risk aversion estimates. Journal of Finance 59, 407–446. Bollerslev, Tim, Tauchen, George, Zhou, Hao, 2008. Variance risk premia and expected stock returns. Working Paper. Department of Economics, Duke University. Bollerslev, Tim, Zhou, Hao, 2002. Estimating stochastic volatility diffusion using conditional moments of integrated volatility. Journal of Econometrics 109, 33–65. Bollerslev, Tim, Zhou, Hao, 2006. Volatility puzzles: a simple framework for gauging return-volatility regressions. Journal of Econometrics 131, 123–150. Brandt, Michael W., Wang, Kevin Q., 2003. Time-varying risk aversion and expected inflation. Journal of Monetary Economics 50, 1457–1498. Britten-Jones, Mark, Neuberger, Anthony, 2000. Option prices, implied price processes, and stochastic volatility. Journal of Finance 55, 839–866. Campbell, John Y., 1987. Stock returns and the term structure. Journal of Financial Economics 18, 373–399. Campbell, John Y., Cochrane, John H., 1999.
By force of habit: a consumption based explanation of aggregate stock market behavior. Journal of Political Economy 107, 205–251. Campbell, John Y., Shiller, Robert J., 1988a. The dividend-price ratio and expectations of future dividends and discount factors. Review of Financial Studies 1, 195–228. Campbell, John Y., Shiller, Robert J., 1988b. Stock prices, earnings, and expected dividends. Journal of Finance 43, 661–676. Carr, Peter, Madan, Dilip, 1998. Towards a Theory of Volatility Trading. Risk Books, pp. 417–427 (Chapter 29). Carr, Peter, Wu, Liuren, 2009. Variance risk premiums. Review of Financial Studies 22, 1311–1341. CBOE Documentation 2003. VIX: CBOE volatility index. White Paper. Chernov, Mikhail, Gallant, A. Ronald, Ghysels, Eric, Tauchen, George, 2003. Alternative models for stock price dynamics. Journal of Econometrics 116, 225–257. Chernov, Mikhail, Ghysels, Eric, 2000. A study towards a unified approach to the joint estimation of objective and risk neutral measures for the purpose of options valuation. Journal of Financial Economics 56, 407–458. Cochrane, John H., Piazzesi, Monika, 2005. Bond risk premia. American Economic Review 95, 138–160. Demeterfi, Kresimir, Derman, Emanuel, Kamal, Michael, Zou, Joseph, 1999. A guide to volatility and variance swaps. Journal of Derivatives 6, 9–32.
Eraker, Bjørn, 2004. Do stock prices and volatility jump? Reconciling evidence from spot and option prices. Journal of Finance 59, 1367–1403. Fama, Eugene F., Bliss, Robert R., 1987. The information in long-maturity forward rates. The American Economic Review 77, 680–692. Fama, Eugene F., French, Kenneth R., 1988. Dividend yields and expected stock returns. Journal of Financial Economics 22, 3–25. Garcia, René, Lewis, Marc-André, Renault, Éric, 2010. Estimation of objective and risk-neutral distributions based on moments of integrated volatility. Journal of Econometrics, forthcoming (doi:10.1016/j.jeconom.2010.03.011). Gordon, Stephen, St-Amour, Pascal, 2004. Asset returns and state-dependent risk preferences. Journal of Business and Economic Statistics 22, 241–252. Hansen, Lars Peter, 1982. Large sample properties of generalized method of moments estimators. Econometrica 50, 1029–1054. Hansen, Peter Reinhard, Lunde, Asger, 2006. Realized variance and market microstructure noise. Journal of Business and Economic Statistics 24, 127–218. Heston, Steven, 1993. A closed-form solution for options with stochastic volatility with applications to bond and currency options. Review of Financial Studies 6, 327–343. Huang, Xin, Tauchen, George, 2005. The relative contribution of jumps to total price variance. Journal of Financial Econometrics 3, 456–499. Jackwerth, Jens Carsten, 2000. Recovering risk aversion from option prices and realized returns. Review of Financial Studies 13, 433–451. Jiang, George, Oomen, Roel, 2008. Testing for jumps when asset prices are observed with noise: a ''swap variance'' approach. Journal of Econometrics 144, 352–370. Jiang, George, Tian, Yisong, 2005. Model-free implied volatility and its information content. Review of Financial Studies 18, 1305–1342. Jiang, George, Tian, Yisong, 2007. Extracting model-free volatility from option prices: an examination of the VIX index. Journal of Derivatives 14, 1–26. Jones, Christopher S., 2003.
The dynamics of stochastic volatility: evidence from underlying and options markets. Journal of Econometrics 116, 181–224. Lettau, Martin, Ludvigson, Sydney, 2001. Consumption, aggregate wealth, and expected stock returns. Journal of Finance 56, 815–849. Lynch, Damien, Panigirtzoglou, Nikolaos, 2003. Option implied and realized measures of variance. Working Paper. Monetary Instruments and Markets Division, Bank of England. Meddahi, Nour, 2002. Theoretical comparison between integrated and realized volatility. Journal of Applied Econometrics 17, 479–508. Newey, Whitney K., West, Kenneth D., 1987. A simple positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703–708. Pan, Jun, 2002. The jump-risk premia implicit in options: evidence from an integrated time-series study. Journal of Financial Economics 63, 3–50. Rosenberg, Joshua V., Engle, Robert F., 2002. Empirical pricing kernels. Journal of Financial Economics 64, 341–372. Stambaugh, Robert F., 1999. Predictive regressions. Journal of Financial Economics 54, 375–421. Tarashev, Nikola, Tsatsaronis, Kostas, Karampatos, Dimitrios, 2003. Investors' attitude toward risk: what can we learn from options? BIS Quarterly Review. Bank for International Settlements. Wu, Liuren, 2005. Variance dynamics: joint evidence from options and high-frequency returns. Working Paper. Baruch College of CUNY.
Journal of Econometrics 160 (2011) 246–256
Volatility forecast comparison using imperfect volatility proxies✩

Andrew J. Patton∗

Department of Economics, Duke University, USA
Oxford-Man Institute of Quantitative Finance, University of Oxford, UK
Article info

Article history: Available online 6 March 2010

JEL classification: C53; C52; C22

Keywords: Forecast evaluation; Forecast comparison; Loss functions; Realised variance; Range
Abstract

The use of a conditionally unbiased, but imperfect, volatility proxy can lead to undesirable outcomes in standard methods for comparing conditional variance forecasts. We motivate our study with analytical results on the distortions caused by some widely used loss functions, when used with standard volatility proxies such as squared returns, the intra-daily range or realised volatility. We then derive necessary and sufficient conditions on the functional form of the loss function for the ranking of competing volatility forecasts to be robust to the presence of noise in the volatility proxy, and derive some useful special cases of this class of ''robust'' loss functions. The methods are illustrated with an application to the volatility of returns on IBM over the period 1993 to 2003. © 2010 Published by Elsevier B.V.
1. Introduction

Many forecasting problems in economics and finance involve a variable of interest that is unobservable, even ex post. The most prominent example of such a problem is the forecasting of volatility for use in financial decision making. Other problems include forecasting the true rates of inflation, GDP growth or unemployment (not simply the announced rates); forecasting trade intensities; and forecasting default probabilities or 'crash' probabilities. While evaluating and comparing economic forecasts is a well-studied problem, dating back at least to Cowles (1933), if the variable of interest is latent then the problem of forecast evaluation and comparison becomes more complicated.1 This complication can be resolved, at least partly, if an unbiased estimator of the latent variable of interest is available. In volatility forecasting, for example, the squared return on an asset over the period t (assuming a zero mean return) can be interpreted as a conditionally unbiased estimator of the true unobserved
✩ Matlab code used in this paper is available from http://econ.duke.edu/~ap172/code.html. ∗ Corresponding address: Department of Economics, Duke University, 213 Social Sciences Building, Box 90097, Durham NC 27708-0097, USA. E-mail address:
[email protected]. 1 For recent surveys of the forecast evaluation literature see Clements (2005) and
West (2006). For recent surveys of the volatility forecasting literature, see Andersen et al. (2006), Poon and Granger (2003) and Shephard (2005). 0304-4076/$ – see front matter © 2010 Published by Elsevier B.V. doi:10.1016/j.jeconom.2010.03.034
conditional variance of the asset over the period t.2 Many of the standard methods for forecast evaluation and comparison, such as the Mincer and Zarnowitz (1969) regression and the Diebold and Mariano (1995) and West (1996) tests, can be shown to be applicable when such a conditionally unbiased proxy is used, see Hansen and Lunde (2006) for example. However, it is not true that using a conditionally unbiased proxy will always lead to the same outcome as if the true latent variable were used: Andersen and Bollerslev (1998) and Andersen et al. (2005), amongst others, study the reduction in finite-sample power of tests based on noisy volatility proxies; we focus, like Hansen and Lunde (2006), on distortions in the rankings of competing forecasts that can arise when using a noisy volatility proxy in some commonly used tests for forecast comparison. For example, in the volatility forecasting literature numerous authors have expressed concern that a few extreme observations may have an unduly large impact on the outcomes of forecast evaluation and comparison tests, see Bollerslev and Ghysels (1994), Andersen et al. (1999) and Poon and Granger (2003) amongst others. One common response to this concern is to employ forecast loss functions that are ‘‘less sensitive’’ to large observations than the usual squared forecast error loss function, such as absolute error or proportional error loss functions. In
2 The high/low range and realised volatility, see Parkinson (1980) and Andersen et al. (2003) for example, have also been used as volatility proxies. These are discussed in detail below.
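Each of these proxies can be computed from a single day of price data. A self-contained sketch, assuming an array of intraday prices (the 1/(4 ln 2) factor is the Parkinson (1980) scaling that makes the squared high-low range unbiased for the variance of a driftless Brownian motion):

```python
import numpy as np

def volatility_proxies(intraday_prices):
    """Three common daily proxies for the latent conditional variance:
    squared close-to-close return, Parkinson range estimator,
    and realised variance."""
    logp = np.log(np.asarray(intraday_prices, dtype=float))
    squared_return = (logp[-1] - logp[0]) ** 2
    parkinson_range = (logp.max() - logp.min()) ** 2 / (4.0 * np.log(2.0))
    realised_variance = float(np.sum(np.diff(logp) ** 2))
    return squared_return, parkinson_range, realised_variance

# Hypothetical intraday price path for one day.
r2, rg, rv = volatility_proxies([100.0, 101.5, 99.8, 100.9])
```

All three are (approximately) conditionally unbiased for the latent variance, but they differ greatly in their noisiness, which is what drives the distortions analysed below.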
this paper we show analytically that such approaches can lead to incorrect inferences and the selection of inferior forecasts over better forecasts. We focus on volatility forecasting as a specific case of the more general problem of latent variable forecasting. In Section 5 we discuss the extension of our results to other latent variable forecasting problems. Our research builds on work by Andersen and Bollerslev (1998), Meddahi (2001) and Hansen and Lunde (2006), who were among the first to analyse the problems introduced by the presence of noise in a volatility proxy. This paper extends the existing literature in two important directions, discussed below. Firstly, we derive explicit analytical results for the distortions that may arise when some common loss functions are employed, considering the three most commonly used volatility proxies: the daily squared return, the intra-daily range and a realised variance estimator. We show that these distortions can be large, even for favourable scenarios (such as Gaussianity). Further, we show that the distortions vary greatly with the choice of loss function, thus providing a theoretical explanation for the widespread finding of conflicting rankings of volatility forecasts when ‘‘non-robust’’ loss functions (defined precisely in Section 2) are used in applied work, see Lamoureux and Lastrapes (1993), Hamilton and Susmel (1994), Bollerslev and Ghysels (1994) and Hansen and Lunde (2005), amongst many others.3 Secondly, we provide necessary and sufficient conditions on the functional form of the loss function to ensure that the ranking of various forecasts is preserved when using a noisy volatility proxy. These conditions are related to those of Gourieroux et al. (1984) for quasi-maximum likelihood estimation. 
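The first type of distortion is easy to reproduce by simulation: with a squared-return proxy, a ''non-robust'' loss such as mean absolute error ranks a downward-biased forecast above the true conditional variance. A sketch under simplifying assumptions (constant unit variance; the competing forecasts are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
r2 = rng.standard_normal(100_000) ** 2   # squared-return proxy, E[r^2] = 1

h_true, h_low = 1.0, 0.7                 # forecast A: truth; forecast B: biased down

mse = lambda h: float(np.mean((r2 - h) ** 2))
mae = lambda h: float(np.mean(np.abs(r2 - h)))

mse_rank_ok = mse(h_true) < mse(h_low)   # "robust" loss: the truth wins
mae_rank_ok = mae(h_true) < mae(h_low)   # MAE: the biased forecast wins, since
                                         # the chi-squared(1) median (~0.455)
                                         # lies well below its mean of 1
```

Under MAE the optimal point forecast of the proxy is its conditional median rather than the conditional variance, so rankings based on a noisy proxy and MAE are systematically distorted.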
Interestingly, we find that there are an infinite number of loss functions that satisfy these conditions, and that these loss functions differ in meaningful ways (such as the penalty applied to over-prediction versus under-prediction). Thus our class of ''robust'' loss functions is not simply the quadratic loss function or minor variations thereof. The canonical problem in point forecasting is to find the forecast that minimises the expected loss, conditional on time t information. That is,

$$\hat{Y}^{*}_{t+h,t} \equiv \arg\min_{\hat{y}\in\mathcal{Y}} E\left[L(Y_{t+h},\hat{y}) \,\middle|\, \mathcal{F}_t\right] \tag{1}$$
where $Y_{t+h}$ is the variable of interest, $L$ is the forecast user's loss function, $\mathcal{Y}$ is the set of possible forecasts, and $\mathcal{F}_t$ is the time-$t$ information set. Starting with the assumption that the forecast user is interested in the conditional variance, we effectively take the solution of the optimisation problem above (the conditional variance) as given, and consider the loss functions that will generate the desired solution. This approach is unusual in the economic forecasting literature: the more common approach is to take the forecast user's loss function as given and derive the optimal forecast for that loss function; related papers here are Granger (1969), Engle (1993), Christoffersen and Diebold (1997), Christoffersen and Jacobs (2004) and Patton and Timmermann (2007), amongst others. The fact that we know the forecast user desires a variance forecast places limits on the class of loss functions that may be used for volatility comparison, ruling out
some choices previously used in the literature. However, we show that the class of ''robust'' loss functions still admits a wide variety of loss functions, allowing much flexibility in representing volatility forecast users' preferences. One practical implication of this paper is that the stated goal of forecasting the conditional variance is not consistent with the use of some loss functions when an imperfect volatility proxy is employed. However, these loss functions are not inherently invalid or inappropriate: if the forecast user's preferences are indeed described by a ''non-robust'' loss function, then this simply implies that the object of interest to that forecast user is not the conditional variance but rather some other quantity.4 In academic research the preferences of the end-user of the forecast are often unknown, and a common response is to select forecasts based on their average distance, somehow measured, to the true latent conditional variance. In such cases, the methods outlined in this paper can be applied to identify the forecast that is closest to the true conditional variance by using an imperfect volatility proxy and a ''robust'' loss function. The remainder of this paper is as follows. In Section 2 we analytically consider volatility forecast comparison tests using an imperfect volatility proxy, showing the problems that arise when using some common loss functions. We initially consider using squared daily returns as the proxy, and then consider using the range and realised variance. In Section 3 we provide necessary and sufficient conditions on the functional form of a loss function for the ranking of competing volatility forecasts to be robust to the presence of noise in the volatility proxy, and derive some useful special cases of this class of robust loss functions.
One of these special cases is a parametric family of loss functions that nests two of the most widely used loss functions in the literature, namely the MSE and QLIKE loss functions (defined in Eqs. (5) and (6) below). In Section 4 we present an empirical illustration using two widely used volatility forecasting methods, and in Section 5 we conclude and suggest extensions. All proofs and derivations are provided in the Appendix.

1.1. Notation

Let r_t be the variable whose conditional variance is of interest, usually a daily or monthly asset return in the volatility forecasting literature. The information set used in defining the conditional variance of interest is denoted F_{t−1}, which is assumed to contain σ(r_{t−j}, j ≥ 1), but may also include other variables and/or variables measured at a higher frequency than r_t (such as intra-daily returns). Denote V[r_t|F_{t−1}] ≡ V_{t−1}[r_t] ≡ σ_t². We will assume throughout that E[r_t|F_{t−1}] ≡ E_{t−1}[r_t] = 0, and so σ_t² = E_{t−1}[r_t²]. Let ε_t ≡ r_t/σ_t denote the ‘standardised return’. Let a forecast of the conditional variance of r_t be denoted h_t, or h_{i,t} if there is more than one forecast under analysis. We will take forecasts as ‘‘primitive’’, and not consider the specific models and estimators that may have generated the forecasts. The loss function of the forecast user is L : R₊ × H → R₊, where the first argument of L is σ_t² or some proxy for σ_t², denoted σ̂_t², and the second is h_t. R₊ and R₊₊ denote the non-negative and positive parts of the real line respectively, and H is a compact subset of R₊₊. Commonly used volatility proxies are the squared return, r_t², realised volatility, RV_t, and the range, RG_t. Optimal forecasts for a given loss function and proxy are denoted h_t* and are defined as:

h_t* ≡ arg min_{h∈H} E[L(σ̂_t², h)|F_{t−1}].   (2)

3 All of the results in this paper apply directly to the problem of forecasting integrated variance (IV), which Andersen et al. (2010), amongst others, argue is a more ‘‘relevant’’ notion of variability. We focus on the problem of conditional variance forecasting due to its prevalence in applied work in the past two decades. If we take expected IV rather than the conditional variance as the latent object of interest, then we only require that an unbiased realised variance estimator is available for the results to go through. In the presence of jumps in the price process, quadratic variation (QV) is a more appropriate measure of risk, and a similar extension is possible.
4 For example, the utility of realised returns on a portfolio formed using a volatility forecast, or the profits obtained from an option trading strategy based on a volatility forecast (see West et al. (1993) and Engle et al. (1993), for example), define economically meaningful loss functions, even though the optimal forecasts under those loss functions will not generally be the true conditional variance.
A.J. Patton / Journal of Econometrics 160 (2011) 246–256
2. Volatility forecast comparison using an imperfect volatility proxy

We consider volatility forecast comparisons based on expected loss, or distance to the true conditional variance. These comparisons can be implemented in finite samples using the tests of Diebold and Mariano (1995) and West (1996) (henceforth DMW). If we define u_{i,t} ≡ L(σ_t², h_{i,t}), where L is the forecast user's loss function, and let d_t = u_{1,t} − u_{2,t}, then a DMW test of equal predictive accuracy can be conducted as a simple Wald test that E[d_t] = 0.⁵ Of primary interest is whether the feasible ranking of two forecasts obtained using an imperfect volatility proxy is the same as the infeasible ranking that would be obtained using the unobservable true conditional variance. In such a case we are able to compare average forecast accuracy even though the variable of interest is unobservable. We define loss functions that yield such an equivalence as ‘‘robust’’:

Definition 1. A loss function, L, is ‘‘robust’’ if the ranking of any two (possibly imperfect) volatility forecasts, h_{1t} and h_{2t}, by expected loss is the same whether the ranking is done using the true conditional variance, σ_t², or some conditionally unbiased volatility proxy, σ̂_t². That is,
E[L(σ_t², h_{1t})] ≶ E[L(σ_t², h_{2t})] ⇔ E[L(σ̂_t², h_{1t})] ≶ E[L(σ̂_t², h_{2t})]   (3)

for any σ̂_t² s.t. E[σ̂_t²|F_{t−1}] = σ_t².

Meddahi (2001) showed that the ranking of forecasts on the basis of the R² from the Mincer–Zarnowitz regression:

σ̂_t² = β₀ + β₁h_{it} + e_{it}   (4)

is robust to noise in σ̂_t². Hansen and Lunde (2006) showed that the R² from a regression of log(σ̂_t²) on a constant and log(h_t) is not robust to noise, and showed more generally that a sufficient condition for a loss function to be robust is that ∂²L(σ², h)/∂(σ²)² does not depend on h. In Section 3 we generalise this result by providing necessary and sufficient conditions for a loss function to be robust.6,7 It is worth noting that although the ranking obtained from a robust loss function will be invariant to noise in the proxy, the actual level of expected loss obtained using a proxy will be larger than that which would be obtained when using the true conditional variance. This point was compellingly presented in Andersen and Bollerslev (1998) and Andersen et al. (2004). Andersen et al. (2005) provide a method to estimate the distortion in the level of expected loss and thereby obtain an estimator of the level of expected loss that would be obtained using the true latent variable of interest.

It follows directly from the definition of a robust loss function that the true conditional variance is the optimal forecast (we formally show this in the proof of Proposition 1), and thus a necessary condition for a loss function to be robust to noise is that the true conditional variance is the optimal forecast. In this section we determine whether this condition holds for some common loss functions, and analytically characterise the distortion for those cases where it is violated. A common response to the concern that a few extreme observations drive the results of volatility forecast comparison studies is to employ measures of forecast accuracy other than the usual MSE loss function, see Pagan and Schwert (1990), Bollerslev and Ghysels (1994), Bollerslev et al. (1994), Diebold and Lopez (1996), Andersen et al. (1999), Poon and Granger (2003) and Hansen and Lunde (2005), for example. A collection of loss functions employed in the literature on volatility forecast evaluation is presented below.8 In the next two sub-sections we will study the properties of these loss functions and show that for almost all choices of volatility proxy most of these loss functions are not robust and can lead to incorrect rankings of volatility forecasts.

MSE : L(σ̂², h) = (σ̂² − h)²   (5)
QLIKE : L(σ̂², h) = log h + σ̂²/h   (6)
MSE-LOG : L(σ̂², h) = (log σ̂² − log h)²   (7)
MSE-SD : L(σ̂², h) = (σ̂ − √h)²   (8)
MSE-prop : L(σ̂², h) = (σ̂²/h − 1)²   (9)
MAE : L(σ̂², h) = |σ̂² − h|   (10)
MAE-LOG : L(σ̂², h) = |log σ̂² − log h|   (11)
MAE-SD : L(σ̂², h) = |σ̂ − √h|   (12)
MAE-prop : L(σ̂², h) = |σ̂²/h − 1|.   (13)

5 The key difference between the approaches of Diebold and Mariano (1995) and West (1996) is that the latter explicitly allows for forecasts that are based on estimated parameters, whereas the null of equal predictive accuracy is based on population parameters, see West (2006). The problems we identify below arise even in the absence of estimation error in the forecasts, thus our treatment of the forecasts as primitive, and so for our purposes these two approaches coincide.
6 Our use of the adjective ‘‘robust’’ is related, though not equivalent, to its use in estimation theory, where it applies to estimators that are insensitive, or less sensitive, to the presence of outliers in the data, see Huber (1981) for example. A ‘‘robust’’ loss function, in the sense of Definition 1, will generally not be robust to the presence of outliers.
7 In recent work Giacomini and White (2006) propose ranking forecasts by expected loss conditional on some information set G_t, rather than by unconditional expected loss as in Definition 1. The numerical examples provided below will differ in this more general case, of course, however the theoretical results in this paper go through if G_t ⊆ F_{t−1}, which is true for all of the examples considered by Giacomini and White (2006).
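To make the comparisons above concrete, the nine loss functions in Eqs. (5)–(13) and a simple version of the DMW t-statistic can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's code; in particular it uses the naive i.i.d. variance estimator for d_t, whereas in practice a HAC estimator (e.g. Newey–West) should be used since loss differentials are typically serially correlated.

```python
import math

# Loss functions from Eqs. (5)-(13); s is the proxy (sigma-hat squared), h the forecast.
LOSSES = {
    "MSE":      lambda s, h: (s - h) ** 2,
    "QLIKE":    lambda s, h: math.log(h) + s / h,
    "MSE-LOG":  lambda s, h: (math.log(s) - math.log(h)) ** 2,
    "MSE-SD":   lambda s, h: (math.sqrt(s) - math.sqrt(h)) ** 2,
    "MSE-prop": lambda s, h: (s / h - 1) ** 2,
    "MAE":      lambda s, h: abs(s - h),
    "MAE-LOG":  lambda s, h: abs(math.log(s) - math.log(h)),
    "MAE-SD":   lambda s, h: abs(math.sqrt(s) - math.sqrt(h)),
    "MAE-prop": lambda s, h: abs(s / h - 1),
}

def dmw_tstat(proxy, h1, h2, loss):
    """DMW t-statistic for H0: E[d_t] = 0, with d_t = L(proxy_t, h1_t) - L(proxy_t, h2_t).
    Negative values favour forecast 1. Naive iid variance estimator, for illustration."""
    d = [loss(s, a) - loss(s, b) for s, a, b in zip(proxy, h1, h2)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)
    return mean / math.sqrt(var / n)
```

For example, with squared returns as the proxy and constant true variance, comparing the true variance against an upward-biased forecast under MSE loss produces a clearly negative t-statistic, favouring the true variance, as a robust loss function should.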
2.1. Using squared returns as a volatility proxy

In this section we will focus on the use of daily squared returns for volatility forecast evaluation, and in Section 2.2 we will examine the use of realised volatility and the range. We will derive our results under three assumptions for the conditional distribution of daily returns:

r_t|F_{t−1} ∼ F_t(0, σ_t²), Student's t(0, σ_t², ν), or N(0, σ_t²),

where F_t(0, σ_t²) is some unspecified distribution with mean zero and variance σ_t², and Student's t(0, σ_t², ν) is a Student's t distribution with mean zero, variance σ_t² and ν degrees of freedom.
8 Some of these loss functions are called different names by different authors: MSE-prop is also known as ‘‘heteroskedasticity-adjusted MSE (HMSE)’’; MAE-prop is also known as ‘‘mean absolute percentage error (MAPE)’’ or as ‘‘heteroskedasticity-adjusted MAE (HMAE)’’.
In all cases it is clear that E_{t−1}[r_t²] = σ_t², and so the squared daily return is a valid volatility proxy. It is trivial to show that the MSE loss function generates an optimal forecast equal to the conditional variance: h_t* = E_{t−1}[r_t²] = σ_t², and thus satisfies the necessary condition for robustness. Further, the MSE loss function also satisfies the sufficient condition of Hansen and Lunde (2006), and thus MSE is a ‘‘robust’’ loss function. Another commonly used loss function is the MSE loss function on standard deviations rather than variances, MSE-SD, see Eq. (8). The motivation for this loss function is that taking the square root of the two arguments of the squared-error loss function shrinks the larger values towards zero, reducing the impact of the most extreme values of r_t. However, it also leads to an incorrect volatility forecast being selected as optimal:
h_t* ≡ arg min_{h∈H} E_{t−1}[(|r_t| − √h)²]

FOC: 0 = E_{t−1}[∂(|r_t| − √h)²/∂h]|_{h=h_t*}

so h_t* = (E_{t−1}[|r_t|])²   (14)
        = σ_t²(E_{t−1}[|ε_t|])²
        = ((ν − 2)/π)(Γ((ν − 1)/2)/Γ(ν/2))² σ_t², if r_t|F_{t−1} ∼ Student's t(0, σ_t², ν), ν > 2
        = (2/π)σ_t² ≈ 0.64σ_t², if r_t|F_{t−1} ∼ N(0, σ_t²).   (15)
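The distortion in Eq. (15) is easy to verify by simulation. The sketch below (illustrative only; the sample size and seed are arbitrary choices of ours) estimates (E[|r_t|])² for standard normal and for standardised Student's t(6) returns with unit variance, recovering the 0.64 and 0.56 coefficients.

```python
import math, random

def msesd_optimal_coeff(draw, n=100_000, seed=42):
    """Monte Carlo estimate of h*/sigma^2 = (E|eps|)^2 under MSE-SD loss,
    where draw(rng) returns a variance-one return."""
    rng = random.Random(seed)
    mean_abs = sum(abs(draw(rng)) for _ in range(n)) / n
    return mean_abs ** 2

def std_normal(rng):
    return rng.gauss(0.0, 1.0)

def std_student_t6(rng):
    # t(6) via z / sqrt(chi2_6 / 6), then rescaled to unit variance (Var[t_6] = 6/4)
    z = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(6))
    return (z / math.sqrt(chi2 / 6)) / math.sqrt(6 / 4)
```

Under normality the estimate is close to 2/π ≈ 0.637, and under the t(6) it is close to the 0.56 coefficient reported in the text.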
This distortion is present even under Gaussianity, and excess kurtosis in asset returns exacerbates the distortion: for example, if returns follow the Student's t distribution with six degrees of freedom then the coefficient on σ_t² in the above expression is 0.56. As mentioned in the Introduction, if the forecast user's loss function truly is the square of the difference between the absolute return and the square root of the forecast, then the ‘‘distortion’’ in the optimal forecast above is desirable, as this is the forecast that minimises his/her expected loss. However, if the goal is to find the forecast that is closest to the true conditional variance, then this distortion in the optimal forecast can lead to an incorrect ranking of competing forecasts.9 Thus the MSE-SD loss function is not consistent with the goal of ranking volatility forecasts by their distance to the true conditional variance when using the squared return as the volatility proxy: either the proxy has to be re-scaled by a term that depends critically on the underlying conditional distribution of returns, or, more simply, a different loss function must be chosen. The corresponding calculations for the remaining loss functions in Eqs. (5) to (13) are provided in Patton (2006), and the results are summarised in Table 1. This table shows that the degree of distortion in the optimal forecast according to some of the loss functions used in the literature can be substantial. Under normality the optimal forecast under these loss functions ranges from about one quarter of the true conditional variance to three times the true conditional variance. If returns exhibit excess conditional kurtosis then the range of optimal forecasts from these loss functions is even wider. Table 1 provides a theoretical explanation for the widespread finding of conflicting rankings of volatility forecasts when non-robust loss functions are used in applied work.
Lamoureux and Lastrapes (1993), Hamilton and Susmel (1994), Bollerslev and Ghysels (1994) and Hansen and Lunde (2005), amongst many others, use some or all of the nine loss functions considered in Table 1 and find that the best-performing volatility model changes with the choice of loss function. Given that, for example, the MSE-prop loss function leads to an optimal forecast that is biased upwards by at least a factor of three, while the MAE loss function leads to an optimal forecast that is biased downwards by at least a factor of two, it is no surprise that different rankings of volatility forecasts are found.

9 This distortion remains if the target is instead the conditional standard deviation, as the absolute return is not an unbiased proxy for that quantity.

2.2. Using better volatility proxies

It has long been known that squared returns are a rather noisy proxy for the true conditional variance. One alternative volatility proxy that has gained much attention recently is ‘‘realised volatility’’, see Andersen et al. (2001, 2003), and Barndorff-Nielsen and Shephard (2002, 2004). Another commonly used alternative to squared returns is the intra-daily range. It is well known that if the log stock price follows a Brownian motion then both of these estimators are unbiased and more efficient than the squared return. In this section we obtain the rate at which the distortion in the ranking of alternative forecasts disappears when using realised volatility as the proxy, as the sampling frequency increases, for a simple data generating process (DGP). Assume that there are m equally-spaced observations per trade day, and let r_{i,m,t} denote the ith intra-daily return on day t. While recent work on realised volatility would enable us to consider a quite general class of DGPs, in order to obtain analytical results for problems involving the range as a volatility proxy we consider only a simple DGP: zero mean return, no jumps, and constant conditional volatility within a trade day.10 Patton and Sheppard (2009) present the corresponding results for a range of more realistic DGPs via simulation.11 Let

r_t = d log P_t = σ_t dW_t   (16)
σ_τ = σ_t  ∀ τ ∈ (t − 1, t]   (17)
r_{i,m,t} ≡ ∫_{(i−1)/m}^{i/m} r_τ dτ = σ_t ∫_{(i−1)/m}^{i/m} dW_τ   (18)
so {r_{i,m,t}}_{i=1}^{m} ∼ i.i.d. N(0, σ_t²/m).   (19)

The ‘‘realised volatility’’ or ‘‘realised variance’’ is defined as:

RV_t^(m) ≡ Σ_{i=1}^{m} r_{i,m,t}².

Realised variance, like the daily squared return (which is obtained in the above framework by setting m = 1), is a conditionally unbiased estimator of the daily conditional variance. Its main advantage is that it is a more efficient estimator than the daily squared return: for this DGP it can be shown that E_{t−1}[(r_t² − σ_t²)²] = 2σ_t⁴ while E_{t−1}[(RV_t^(m) − σ_t²)²] = 2σ_t⁴/m. Thus RV_t^(m) →_p σ_t² as m → ∞ under these assumptions, and in the limit σ_t² is effectively observable. As expected, all distortions vanish in this case.
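The variance reduction from realised variance under this simple DGP is easy to check numerically. The following sketch (σ² = 1 and the number of days are our own illustrative choices) simulates daily realised variances from m intra-daily N(0, σ²/m) returns and compares their sample mean and variance with σ² and 2σ⁴/m.

```python
import math, random

def simulate_rv(m, n_days=5000, sigma2=1.0, seed=0):
    """Simulate daily realised variances RV_t^(m): each day is the sum of m
    squared intra-daily returns, each distributed N(0, sigma2/m)."""
    rng = random.Random(seed)
    sd = math.sqrt(sigma2 / m)
    return [sum(rng.gauss(0.0, sd) ** 2 for _ in range(m)) for _ in range(n_days)]

def mean_and_var(xs):
    """Sample mean and (unbiased) sample variance."""
    n = len(xs)
    mu = sum(xs) / n
    return mu, sum((x - mu) ** 2 for x in xs) / (n - 1)
```

For m = 13 (half-hourly returns) the simulated mean is close to σ² = 1 and the simulated variance is close to 2σ⁴/13 ≈ 0.154, an order of magnitude smaller than the variance of 2 obtained from daily squared returns (m = 1).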
10 Analytical and empirical results on the range and ‘‘realised range’’ under more flexible DGPs are presented in two recent papers by Christensen and Podolskij (2007) and Martens and van Dijk (2007). 11 When the DGP is specified to be log-normal or GARCH stochastic volatility diffusions, Patton and Sheppard (2009) find results very similar to those obtained for the case below. Using the same parameterisations as those in the simulations of Gonçalves and Meddahi (2009), slightly larger biases from the non-robust loss functions are found, but they generally differ from those in Table 2 only in the second decimal place. In contrast, the biases are found to be much larger under the two-factor stochastic volatility diffusion considered by Gonçalves and Meddahi (2009).
Table 1
Optimal forecasts under various loss functions.

Loss function | F_t(0, σ_t²) | Student's t(0, σ_t², ν) | ν = 6 | ν = 10 | ν → ∞
MSE | σ_t² | σ_t² | σ_t² | σ_t² | σ_t²
QLIKE | σ_t² | σ_t² | σ_t² | σ_t² | σ_t²
MSE-LOG | exp{E_{t−1}[log ε_t²]}σ_t² | exp{Ψ(1/2) − Ψ(ν/2)}(ν − 2)σ_t² | 0.22σ_t² | 0.25σ_t² | 0.28σ_t²
MSE-SD | (E_{t−1}[|ε_t|])²σ_t² | ((ν−2)/π)(Γ((ν−1)/2)/Γ(ν/2))²σ_t² | 0.56σ_t² | 0.60σ_t² | 0.64σ_t²
MSE-prop | Kurtosis_{t−1}[r_t]σ_t² | 3((ν−2)/(ν−4))σ_t² | 6.00σ_t² | 4.00σ_t² | 3.00σ_t²
MAE | Median_{t−1}[r_t²] | ((ν−2)/ν)Median[F_{1,ν}]σ_t² | 0.34σ_t² | 0.39σ_t² | 0.45σ_t²
MAE-LOG | Median_{t−1}[r_t²] | ((ν−2)/ν)Median[F_{1,ν}]σ_t² | 0.34σ_t² | 0.39σ_t² | 0.45σ_t²
MAE-SD | Median_{t−1}[r_t²] | ((ν−2)/ν)Median[F_{1,ν}]σ_t² | 0.34σ_t² | 0.39σ_t² | 0.45σ_t²
MAE-prop(a) | n/a | (2.36 + 1.00/ν + 7.78/ν²)σ_t² | 2.73σ_t² | 2.55σ_t² | 2.36σ_t²

Notes: This table presents the forecast that minimises the conditional expected loss when the squared return is used as a volatility proxy. That is, h_t* minimises E_{t−1}[L(r_t², h)], for various loss functions L. The first column presents the solutions when returns have an arbitrary conditional distribution r_t|F_{t−1} ∼ F_t with mean zero and conditional variance σ_t²; the second, third, and fourth columns present results when returns have a Student's t distribution with mean zero, variance σ_t² and degrees of freedom ν; and the final column presents the solutions when returns are conditionally normally distributed. Γ is the gamma function and Ψ is the digamma function.
(a) The expressions given for MAE-prop are based on a numerical approximation, see Patton (2006) for details.
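Several of the Student's t entries in Table 1 can be reproduced directly from the closed forms in the table. The sketch below uses only the standard library (Ψ is not in `math`, so it is approximated here by a central difference of log Γ; this approximation is our own implementation choice) and recomputes the ν = 6 entries for MSE-LOG, MSE-SD and MSE-prop.

```python
import math

def digamma(x, h=1e-6):
    # Central-difference approximation to the digamma function Psi(x);
    # accurate enough for the moderate arguments used here.
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)

def mse_log_coeff(nu):
    # Table 1, MSE-LOG row: exp{Psi(1/2) - Psi(nu/2)} (nu - 2)
    return math.exp(digamma(0.5) - digamma(nu / 2)) * (nu - 2)

def mse_sd_coeff(nu):
    # Table 1, MSE-SD row: ((nu-2)/pi) (Gamma((nu-1)/2) / Gamma(nu/2))^2
    g = math.exp(math.lgamma((nu - 1) / 2) - math.lgamma(nu / 2))
    return (nu - 2) / math.pi * g ** 2

def mse_prop_coeff(nu):
    # Table 1, MSE-prop row: kurtosis of the Student's t, 3(nu-2)/(nu-4), nu > 4
    return 3 * (nu - 2) / (nu - 4)
```

At ν = 6 these give approximately 0.22, 0.56 and 6.00, matching the table, and the MSE-SD coefficient tends to 2/π ≈ 0.64 as ν → ∞.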
The range, or the high/low, estimator has been used in finance for many years, see Garman and Klass (1980) and Parkinson (1980). The intra-daily log range is defined as:

RG_t ≡ max_τ log P_τ − min_τ log P_τ,  t − 1 < τ ≤ t.   (20)
Under the dynamics in Eq. (16), Feller (1951) presented the density of RG_t, and Parkinson (1980) presented a formula for obtaining moments of the range, which enable us to compute:

E_{t−1}[RG_t²] = 4 log(2) · σ_t² ≈ 2.7726σ_t².   (21)
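The Parkinson constant in Eq. (21) can be checked by simulating the intra-day Brownian motion on a fine grid. This is a rough sketch (grid size, number of days and seed are arbitrary choices of ours): the discrete-time maximum slightly understates the continuous range, so the simulated value of E[RG²]/σ² falls a little below 4 log 2 ≈ 2.7726.

```python
import math, random

def simulate_range_sq(n_steps=1000, n_days=2000, sigma=1.0, seed=1):
    """Average squared log-range of a driftless Brownian motion over one 'day',
    discretised into n_steps Gaussian increments."""
    rng = random.Random(seed)
    sd = sigma / math.sqrt(n_steps)
    total = 0.0
    for _ in range(n_days):
        p = 0.0
        hi = lo = 0.0
        for _ in range(n_steps):
            p += rng.gauss(0.0, sd)
            hi = max(hi, p)   # running maximum of the log-price path
            lo = min(lo, p)   # running minimum
        total += (hi - lo) ** 2
    return total / n_days
```

Dividing RG_t by 2√(log 2), as in the adjusted range defined just below, makes the squared range approximately unbiased for σ².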
Details on the distributional properties of the range under this DGP are presented in Patton (2006). The above expression shows that the squared range is not a conditionally unbiased estimator of σ_t²; we will thus focus below on the adjusted range:

RG_t* ≡ RG_t / (2√(log 2)) ≈ 0.6006 RG_t   (22)
which, when squared, is an unbiased proxy for the conditional variance. Note that the adjustment factor depends critically on the assumed DGP, which is a potential drawback of the range as a volatility proxy. Using the results of Parkinson (1980) it is simple to determine that MSE_{t−1}[RG_t*²] ≈ 0.4073σ_t⁴, which is approximately one-fifth of the MSE of the daily squared return. We now determine the optimal forecasts obtained using the various loss functions considered above, when σ̂_t² = RV_t^(m) or σ̂_t² = RG_t*² is used as a proxy for the conditional variance rather than r_t². We initially leave m unspecified for the realised volatility proxy, and then specialise to three cases: m = 1, 13 and 78, corresponding to the use of daily, half-hourly and 5-min returns, on a stock listed on the New York Stock Exchange (NYSE).

For MSE and QLIKE the optimal forecast is simply the conditional mean of σ̂_t², which equals the conditional variance, as RV_t^(m) and RG_t*² are both conditionally unbiased. The MSE-SD loss function yields (E_{t−1}[σ̂_t])² as the optimal forecast. Under the setup above,

RV_t^(m) ≡ Σ_{i=1}^{m} r_{t,i}² = (σ_t²/m) Σ_{i=1}^{m} ε_{t,i}²

so m σ_t^{−2} RV_t^(m) ∼ χ_m²

so h_t* = (σ_t²/m)(E[√(χ_m²)])²

E[√(χ_m²)] ≈ √m − 1/(4√m) by a Taylor series approximation

so h_t* ≈ σ_t²(1 − 1/(2m) + 1/(16m²))
       ≈ 0.5625 · σ_t² for m = 1
         0.9619 · σ_t² for m = 13
         0.9936 · σ_t² for m = 78.

The results for the MSE-SD loss function using realised volatility show that reducing the noise in the volatility proxy improves the optimal forecast,12 consistent with Hansen and Lunde (2006). Using the range we find that

h_t* = (E_{t−1}[RG_t*])² = (2/(π log 2)) σ_t² ≈ 0.9184σ_t²

and so the distortion from using the range is approximately equal to that incurred when using a realised volatility constructed using 6 intra-daily observations. Calculations for the remaining loss functions are collected in Patton (2006), and the results are summarised in Table 2. The results in Table 2 confirm that as the proxy used to measure the true conditional variance gets more efficient, the degree of distortion decreases for all loss functions. Using half-hour returns (13 intra-daily observations) or the intra-daily range still leaves substantial distortions in the optimal forecasts, but using 5-min returns (78 intra-daily observations) eliminates almost all of the bias, at least in this simple framework. While high frequency data are available and reliable for some assets (the most liquid assets on well-developed exchanges), for most assets it is not possible to obtain reliable high-frequency data, and thus the impact of noise in the volatility proxy cannot be ignored.

3. A class of robust loss functions

In the previous section we showed that amongst nine loss functions commonly used to compare volatility forecasts, only the MSE and the QLIKE loss functions lead to h_t* = E_{t−1}[σ̂_t²] = σ_t², which is a necessary condition for a loss function to be robust

12 Note that the result for m = 1 is different to that obtained in Section 2, which was h_t* = (2/π)σ_t² ≈ 0.6366σ_t². This is because for m = 1 we can obtain the expression exactly, using results for the normal distribution, whereas for arbitrary m we relied on a second-order Taylor series approximation.
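The Taylor approximation used above for E[√(χ²_m)] can be compared directly with the exact value E[√(χ²_m)] = √2 Γ((m+1)/2)/Γ(m/2), using only the standard library. As footnote 12 notes, at m = 1 the exact calculation gives 2/π ≈ 0.6366 while the second-order approximation gives 0.5625; by m = 13 the two agree to about three decimal places. A minimal sketch:

```python
import math

def h_star_exact(m):
    """Exact h*/sigma^2 under MSE-SD with an RV(m) proxy:
    (E[sqrt(chi2_m)])^2 / m, with E[sqrt(chi2_m)] = sqrt(2) Gamma((m+1)/2)/Gamma(m/2)."""
    e_sqrt = math.sqrt(2) * math.exp(math.lgamma((m + 1) / 2) - math.lgamma(m / 2))
    return e_sqrt ** 2 / m

def h_star_taylor(m):
    """Second-order Taylor approximation: 1 - 1/(2m) + 1/(16 m^2)."""
    return 1 - 1 / (2 * m) + 1 / (16 * m ** 2)
```

The Taylor values reproduce the 0.5625, 0.9619 and 0.9936 coefficients quoted for m = 1, 13 and 78.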
Table 2
Optimal forecasts under various loss functions, using realised volatility and range.

Loss function | Range | RV, arbitrary m | RV, m = 1 | RV, m = 13 | RV, m = 78 | RV, m → ∞
MSE | σ_t² | σ_t² | σ_t² | σ_t² | σ_t² | σ_t²
QLIKE | σ_t² | σ_t² | σ_t² | σ_t² | σ_t² | σ_t²
MSE-LOG(a) | 0.85σ_t² | ≈ e^{−1.2741/m}σ_t² | 0.28σ_t² | 0.91σ_t² | 0.98σ_t² | σ_t²
MSE-SD | 0.92σ_t² | (1/m)(E[√(χ_m²)])²σ_t² | 0.56σ_t² | 0.96σ_t² | 0.99σ_t² | σ_t²
MSE-prop | 1.41σ_t² | (1 + 2/m)σ_t² | 3.00σ_t² | 1.15σ_t² | 1.03σ_t² | σ_t²
MAE | 0.83σ_t² | (1/m)Median[χ_m²]σ_t² | 0.45σ_t² | 0.95σ_t² | 0.99σ_t² | σ_t²
MAE-LOG | 0.83σ_t² | (1/m)Median[χ_m²]σ_t² | 0.45σ_t² | 0.95σ_t² | 0.99σ_t² | σ_t²
MAE-SD | 0.83σ_t² | (1/m)Median[χ_m²]σ_t² | 0.45σ_t² | 0.95σ_t² | 0.99σ_t² | σ_t²
MAE-prop(a) | 1.19σ_t² | ≈ (1 + 1.3624/m)σ_t² | 2.36σ_t² | 1.10σ_t² | 1.02σ_t² | σ_t²

Notes: This table presents the forecast that minimises the conditional expected loss when the range or realised volatility is used as a volatility proxy. That is, h_t* minimises E_{t−1}[L(σ̂_t², h)], for σ̂_t² = RG_t*² or σ̂_t² = RV_t^(m), for various loss functions L. In all cases returns are assumed to be generated as a zero mean Brownian motion with constant volatility within each trade day and no jumps. The cases of m = 1, 13, 78 correspond to the use of daily squared returns, realised variance with 30-min returns and realised variance with 5-min returns respectively. The case that m → ∞ corresponds to the case where the conditional variance is observable ex post without error.
(a) For the MSE-LOG and MAE-prop loss functions we used simulations, numerical integration and numerical optimisation to obtain the expressions given. Details on the computation of the figures in this table are given in Patton (2006).
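As one cross-check on Table 2, the MSE-prop column has a simple closed form: under MSE-prop loss the first-order condition gives h* = E_{t−1}[σ̂⁴]/E_{t−1}[σ̂²], and with m σ̂²/σ² ∼ χ²_m this yields h*/σ² = 1 + 2/m. A minimal sketch:

```python
def mse_prop_rv_coeff(m):
    """h*/sigma^2 under MSE-prop with an RV(m) proxy.
    h* = E[proxy^2]/E[proxy]; chi2_m has mean m and second moment m(m+2),
    so E[RV^2]/E[RV] = sigma^2 * m(m+2)/m^2 = sigma^2 * (1 + 2/m)."""
    return 1 + 2 / m

# Reproduces the MSE-prop row of Table 2 for m = 1, 13, 78:
row = {m: round(mse_prop_rv_coeff(m), 2) for m in (1, 13, 78)}
```

The resulting coefficients, 3.00, 1.15 and 1.03, match the MSE-prop entries in the table.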
to noise in the volatility proxy. The following proposition is the main theoretical contribution of the paper; it characterises the entire class of robust loss functions for volatility forecast comparison, which is related to the class of linear-exponential densities of Gourieroux et al. (1984), and to the work of Gourieroux et al. (1987). We will show below that this class contains an infinite number of loss functions, and allows for asymmetric penalties to be applied to over- versus under-predictions, as well as for a symmetric penalty. We make the following assumptions:

A1: E_{t−1}[σ̂_t²] = σ_t² for all t.
A2: σ̂_t²|F_{t−1} ∼ F_t ∈ F̃, the set of all absolutely continuous distribution functions on R₊.
A3: L is twice continuously differentiable with respect to h and σ̂², and has a unique minimum at σ̂² = h.
A4: There exists some h_t* ∈ int(H) such that h_t* = E_{t−1}[σ̂_t²], where H is a compact subset of R₊₊.
A5: L and F_t are such that: (a) E_{t−1}[L(σ̂_t², h)] < ∞ for some h ∈ H; (b) |E_{t−1}[∂L(σ̂_t², h)/∂h|_{h=σ_t²}]| < ∞; and (c) |E_{t−1}[∂²L(σ̂_t², h)/∂h²|_{h=σ_t²}]| < ∞, for all t.
Proposition 1. Let assumptions A1 to A5 hold. Then a loss function L is robust, in the sense of Definition 1, if and only if it takes the following form:

L(σ̂², h) = C̃(h) + B(σ̂²) + C(h)(σ̂² − h)   (23)
where B and C are twice continuously differentiable, C is a strictly decreasing function on H , and C˜ is the anti-derivative of C . Remark 1. If we normalise the loss function to yield zero loss when σˆ 2 = h, then B(σˆ 2 ) = −C˜ (σˆ 2 ). Remark 2. Up to additive and multiplicative constants, MSE loss is obtained by setting C (z ) = −z, C˜ (z ) = −z 2 /2 and B(z ) = z 2 /2, and QLIKE is obtained by setting C (z ) = 1/z , C˜ (z ) = log(z ) and B(z ) = 0. Given the widespread interest in economics and finance in loss functions that depend only on the forecast error or the standardised forecast error, we present below a somewhat surprising result on the subset of robust loss functions that satisfy one of these restrictions.
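Remark 2 can be verified numerically: plugging the stated C, C̃ and B into the form in Eq. (23) recovers MSE (up to a factor of 1/2) and QLIKE (up to an additive constant). A small sketch, with the grid of test points chosen arbitrarily:

```python
import math

def robust_loss(s, h, C, C_tilde, B):
    """The robust form of Eq. (23): L(s, h) = C~(h) + B(s) + C(h)(s - h),
    with s the volatility proxy and h the forecast."""
    return C_tilde(h) + B(s) + C(h) * (s - h)

def mse_form(s, h):
    # Remark 2: C(z) = -z, C~(z) = -z^2/2, B(z) = z^2/2  ->  (s - h)^2 / 2
    return robust_loss(s, h, lambda z: -z, lambda z: -z * z / 2, lambda z: z * z / 2)

def qlike_form(s, h):
    # Remark 2: C(z) = 1/z, C~(z) = log z, B(z) = 0  ->  log h + s/h - 1
    return robust_loss(s, h, lambda z: 1 / z, math.log, lambda z: 0.0)
```

Here mse_form(s, h) equals (s − h)²/2 exactly, and qlike_form(s, h) equals log h + s/h − 1, i.e. QLIKE shifted by a constant that does not affect forecast rankings.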
Proposition 2. (i) The ‘‘MSE’’ loss function is the only robust loss function satisfying assumptions A1–A5 that depends solely on the forecast error, σ̂² − h. (ii) The ‘‘QLIKE’’ loss function is the only robust loss function satisfying assumptions A1–A5 that depends solely on the standardised forecast error, σ̂²/h.

The standardised forecast error will be centred approximately around 1 (if h is somewhat accurate) and, more interestingly, the conditional variance of the standardised forecast error will be approximately 2 (under Gaussianity) regardless of the level of volatility of returns. Thus the average QLIKE loss will generally be less affected by the most extreme observations in the sample. The MSE loss, on the other hand, depends on the usual forecast error, σ̂² − h, which will be centred approximately around zero, but will have variance that is proportional to the square of the variance of returns, i.e., σ⁴. As noted by several previous authors, this implies that MSE is sensitive to extreme observations and to the level of volatility of returns.

In most economic and financial applications, the choice of units of measurement is arbitrary, e.g., measuring prices in dollars versus cents, or measuring returns in percentages versus decimals. Given this, it is useful to consider the impact of a simple change in units on the ranking of two competing forecasts by expected loss. The class of loss functions presented in Proposition 1 guarantees that the true conditional variance will be chosen (subject to sampling variation) over any other forecast regardless of the choice of units. However, it does not guarantee that the ranking of two imperfect forecasts will be invariant to the choice of units. The following proposition shows that by using a homogeneous robust loss function, the ranking of any two (possibly imperfect) forecasts is invariant to a re-scaling of the data.
It further provides an example where the ranking can be reversed simply with a rescaling of the data if a non-homogeneous robust loss function is used.

Proposition 3. Recall that a loss function L is homogeneous of order k if

L(aσ̂², ah) = a^k L(σ̂², h)  ∀ a > 0, for some k.
Then: (i) The ranking of any two (possibly imperfect) volatility forecasts by expected loss is invariant to a re-scaling of the data if the loss function is homogeneous.
(ii) The ranking of any two (possibly imperfect) volatility forecasts by expected loss may not be invariant to a re-scaling of the data if the loss function is robust but not homogeneous.

With the above motivation for homogeneous loss functions, we now derive the subset of homogeneous, robust loss functions. It turns out that this subset of functions is indexed by a single parameter, which determines both the degree of homogeneity and the shape of the loss function. Naturally, the MSE loss function is nested in this case (homogeneous of order 2), as is the QLIKE loss function (homogeneous of order zero).

Proposition 4. The following family of loss functions, indexed by the scalar parameter b, corresponds to the entire subset of robust and homogeneous loss functions. The degree of homogeneity is equal to b + 2.

L(σ̂², h; b) =
  (1/((b + 1)(b + 2)))(σ̂^{2(b+2)} − h^{b+2}) − (1/(b + 1)) h^{b+1}(σ̂² − h),  for b ∉ {−1, −2}
  h − σ̂² + σ̂² log(σ̂²/h),  for b = −1
  σ̂²/h − log(σ̂²/h) − 1,  for b = −2.   (24)
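The family in Eq. (24) is straightforward to implement; the b = −1 and b = −2 branches are the continuous limits of the general expression as b approaches those values. A sketch (the function and argument names are our own):

```python
import math

def patton_loss(s, h, b):
    """Robust, homogeneous loss family of Eq. (24); s = proxy value (sigma-hat
    squared), h = forecast, b = shape parameter."""
    if b == -1:
        return h - s + s * math.log(s / h)
    if b == -2:
        return s / h - math.log(s / h) - 1
    # general case: note s**(b+2) is (sigma-hat^2)^(b+2) = sigma-hat^(2b+4)
    return (s ** (b + 2) - h ** (b + 2)) / ((b + 1) * (b + 2)) \
        - h ** (b + 1) * (s - h) / (b + 1)
```

Setting b = 0 reproduces MSE up to a factor of 1/2, and b = −2 reproduces QLIKE up to terms involving the proxy alone; every member of the family is zero when the forecast equals the proxy.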
Fig. 1. Loss functions for various choices of b. True σ 2 = 2 in this example, with the volatility forecast ranging between 0 and 4. b = 0 and b = −2 correspond to the MSE and QLIKE loss functions respectively.
The MSE loss function is obtained when b = 0 and the QLIKE loss function is obtained when b = −2, up to additive and multiplicative constants. In Fig. 1 we present the above class of functions for various values of b, ranging from 1 to −5, and including the MSE and QLIKE cases. This figure shows that this family of loss functions can take a wide variety of shapes, ranging from symmetric (b = 0, corresponding to MSE) to asymmetric, with a heavier penalty either on under-prediction (b < 0) or over-prediction (b > 0). Fig. 2 plots the ratio of losses incurred for negative forecast errors to those incurred for positive forecast errors, to make clearer the form of asymmetries in these loss functions. Other considerations when choosing a loss function from the class in Eq. (24) include the moment conditions required for formal tests and the finite-sample power of these tests. Patton (2006) presents results on how moment and memory conditions required for DMW tests vary with the shape parameter b. It is noteworthy that the moment conditions required under MSE loss are substantially stronger than those under QLIKE loss. Related to this, Patton and Sheppard (2009) find that the power of DMW tests using QLIKE loss is higher than that of tests using MSE loss, providing further motivation for using QLIKE rather than MSE in volatility forecasting applications.
Fig. 2. Ratio of losses from negative forecast errors to positive forecast errors, for various choices of b. True σ 2 = 2 in this example, with the volatility forecast ranging between 0 and 4. b = 0 and b = −2 correspond to the MSE and QLIKE loss functions respectively.
4. Empirical application to forecasting IBM return volatility
In this section we consider the problem of forecasting the conditional variance of the daily open-to-close return on IBM, using data from the TAQ database over the period from January 1993 to December 2003. We consider two simple volatility forecasting models that are widely used in industry: a 60-day rolling window forecast, and the RiskMetrics volatility forecast based on daily returns:

Rolling window: h_{1t} = (1/60) Σ_{j=1}^{60} r²_{t−j},   (25)

RiskMetrics: h_{2t} = λh_{2,t−1} + (1 − λ)r²_{t−1},  λ = 0.94.   (26)

We use approximately the first year of observations (272 observations) to initialise the RiskMetrics forecasts, and the remaining 2500 observations to compare the forecasts. A plot of the volatility forecasts is provided in Fig. 3. Recall that the theory in the previous section requires that the volatility proxy (σ̂²_t) is conditionally unbiased, but no such assumption is required for the volatility forecasts (h_{it}): the rolling window and RiskMetrics forecasts can be biased, or inaccurate in other ways. (Indeed, Mincer–Zarnowitz tests reported in Patton (2006) indicate that both of these forecasts are biased.) We employ a variety of volatility proxies in the comparison of these forecasts: the daily squared return, and realised variance (RV) computed using 65-min, 15-min and 5-min returns.¹³ In order for the theory in the previous section to be applied, we require the proxy to be conditionally unbiased. For a liquid stock such as IBM, all of these proxies can plausibly be considered free from market microstructure effects. The same is not likely true for very high

¹³ We use 65-min returns rather than 60-min returns so that there is an even number of intervals within the NYSE trade day, which runs from 9.30 am to 4 pm.
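The two forecasting schemes in Eqs. (25) and (26) can be sketched as follows; the function names and the initialisation choice for the recursion are ours, not the paper's:

```python
import numpy as np

def rolling_window_forecast(r, window=60):
    """Eq. (25): h_{1t} is the mean of the previous `window` squared returns.
    Element t of the output uses only returns up to t-1; the first `window`
    entries are NaN because no forecast is available there."""
    r2 = np.asarray(r, float) ** 2
    h = np.full(r2.shape, np.nan)
    for t in range(window, len(r2)):
        h[t] = r2[t - window:t].mean()
    return h

def riskmetrics_forecast(r, lam=0.94, h0=None):
    """Eq. (26): h_{2t} = lam * h_{2,t-1} + (1 - lam) * r_{t-1}^2.
    The initial value h0 is a choice we make here (first squared return
    by default); the paper initialises over roughly the first year of data."""
    r2 = np.asarray(r, float) ** 2
    h = np.empty_like(r2)
    h[0] = r2[0] if h0 is None else h0
    for t in range(1, len(r2)):
        h[t] = lam * h[t - 1] + (1 - lam) * r2[t - 1]
    return h
```

Both functions map a daily return series into a series of one-step-ahead conditional variance forecasts.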
A.J. Patton / Journal of Econometrics 160 (2011) 246–256
Fig. 3. Conditional variance forecasts for IBM returns from 60-day rolling window and RiskMetrics models, January 1994 to December 2003.

Table 3
Comparison of rolling window and RiskMetrics forecasts.

Loss function     Volatility proxy
                  Daily squared return   65-min realised vol.   15-min realised vol.   5-min realised vol.
b = 1                 −1.58                  −1.66                  −1.30                  −1.35
b = 0 (MSE)           −0.59                  −0.80                  −0.03                  −0.13
b = −1                 1.30                   1.04                   1.65                  −1.55
b = −2 (QLIKE)         1.94                   2.21∗                  2.73∗                  2.41∗
b = −5                −0.17                   0.25                   1.63                   0.65
Notes: This table presents the t-statistics from Diebold–Mariano–West tests of equal predictive accuracy for a 60-day rolling window forecast and a RiskMetrics forecast, for IBM over the period January 1994 to December 2003. A t-statistic greater than 1.96 in absolute value indicates a rejection of the null of equal predictive accuracy at the 0.05 level. These statistics are marked with an asterisk. The sign of the t-statistics indicates which forecast performed better for each loss function: a positive t-statistic indicates that the rolling window forecast produced larger average loss than the RiskMetrics forecast, while a negative sign indicates the opposite.
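The t-statistics in Table 3 come from Diebold–Mariano–West tests of equal predictive accuracy. A minimal sketch of such a test, assuming a Newey–West (Bartlett) long-run variance estimate with a lag choice of our own, is:

```python
import numpy as np

def dmw_tstat(loss1, loss2, max_lag=10):
    """Diebold-Mariano-West t-statistic for equal predictive accuracy.
    A positive value means forecast 1 incurred larger average loss than
    forecast 2. The Bartlett lag truncation and names are illustrative
    choices, not the paper's exact implementation."""
    d = np.asarray(loss1, float) - np.asarray(loss2, float)
    T = d.size
    dbar = d.mean()
    u = d - dbar
    lrv = u @ u / T                        # lag-0 autocovariance
    for j in range(1, max_lag + 1):
        w = 1.0 - j / (max_lag + 1.0)      # Bartlett kernel weight
        lrv += 2.0 * w * (u[j:] @ u[:-j]) / T
    return dbar / np.sqrt(lrv / T)
```

Feeding in the two per-period loss series produced by a robust loss function (e.g. QLIKE evaluated at a realised-variance proxy) yields t-statistics comparable to those reported in the table.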
frequencies (such as 1-s or 30-s), and may not be true for 5-min RV for less liquid stocks. In comparing these forecasts we present the results of Diebold–Mariano–West tests using the loss function presented in Proposition 4, for five different choices of the loss function parameter: b = {1, 0, −1, −2, −5}. MSE loss and QLIKE loss correspond to b = 0 and b = −2 respectively. Table 3 presents tests comparing the RiskMetrics forecasts based on daily returns with the 60-day rolling window volatility forecasts. The only loss function for which the difference in forecast performance is significantly different from zero is the QLIKE loss function: the difference is significant at the 0.05 level using 65-min, 15-min and 5-min realised variances as the volatility proxy, and significant at the 0.10 level using daily squared returns as the proxy. In all of these cases the t-statistic is positive, indicating that the rolling window forecasts generated larger average loss than the RiskMetrics forecasts. Interestingly, under MSE loss, the differences in average loss favour the rolling window forecasts, though these differences are not statistically significant. Mincer–Zarnowitz tests (presented in Patton (2006)) revealed, unsurprisingly, that neither of these forecasts is optimal. Robust loss functions are designed to always select the true conditional variance over any competing forecast, but when comparing two imperfect forecasts the ranking can, as in this example, change depending on the choice of loss function. This emphasises the flexibility that remains even when we restrict attention to homogeneous, robust loss functions.

5. Conclusion

This paper analytically demonstrated some problems with volatility forecast comparison techniques used in the literature. These techniques invariably rely on a volatility proxy, which is some imperfect estimator of the true conditional variance, and the presence of noise in the volatility proxy can lead to an imperfect volatility forecast being selected over the true conditional variance for certain choices of loss function. Thus noisy volatility proxies not only reduce power, as discussed in Andersen and Bollerslev (1998) for example; they can also seriously affect the asymptotic size of commonly used tests. We showed analytically that less noisy volatility proxies, such as the intra-daily range and realised volatility, lead to less distortion, though in many cases the degree of distortion is still large. We derived necessary and sufficient conditions for the loss function to yield rankings of volatility forecasts that are robust to noise in the proxy. We also proposed a new parametric family of robust and homogeneous loss functions, which yield inference that is invariant to the choice of units of measurement. The new family of loss functions nests both the squared-error (MSE) and ''QLIKE'' loss functions, two of the most widely used in the volatility forecasting literature. A small empirical study of IBM equity volatility illustrated the new loss functions in forecast comparison tests. Whilst volatility forecasting is a prominent example of a problem in economics where the variable of interest is unobserved, there are many other such examples: forecasting the true rate of GDP growth (not simply the announced rate); forecasting default probabilities; and forecasting covariances or correlations. The derivations in this paper exploited the fact that the latent variable of interest in volatility forecasting (namely the conditional variance) is a positive random variable, and the proxy is non-negative and continuously distributed. Extending the results in this paper to handle latent variables of interest with support on the entire real line, as would be required for applications to studies of the ''true'' rates of growth in macroeconomic aggregates or to conditional covariances, should not be difficult.
Extending our results to handle proxies with discrete support, such as those that would be used in default forecasting applications, may require a different method of proof. We leave such extensions to future research.
Acknowledgements

The author would particularly like to thank Peter Hansen, Ivana Komunjer and Asger Lunde for helpful suggestions and comments. Thanks are also due to Torben Andersen, Tim Bollerslev, Peter Christoffersen, Rob Engle, Christian Gourieroux, Tony Hall, Mike McCracken, Nour Meddahi, Roel Oomen, Adrian Pagan, Neil Shephard, Kevin Sheppard, and Ken Wallis. Runquan Chen provided excellent research assistance. The author gratefully acknowledges financial support from the Leverhulme Trust under Grant F/0004/AF. Some of the work on this paper was conducted while the author was a visiting scholar at the School of Finance and Economics, University of Technology, Sydney.

Appendix

Proof of Proposition 1. We prove this proposition by showing the equivalence of the following three statements: S1: the loss function takes the form given in the statement of the proposition; S2: the loss function is robust in the sense of Definition 1; S3: the optimal forecast under the loss function is the conditional variance. We will show that S1 ⇒ S2 ⇒ S3 ⇒ S1.

That S1 ⇒ S2 follows from Hansen and Lunde (2006): their Assumption 2 is satisfied given the assumptions for the proposition and noting that ∂²L(σ̂², h)/∂(σ̂²)² = B″(σ̂²) does not depend on h.

We next show that S2 ⇒ S3: by the definition of h*_t we have

E_{t−1}[L(σ̂²_t, h*_t)] ≤ E_{t−1}[L(σ̂²_t, h̃_t)]

for any other sequence of F_{t−1}-measurable forecasts h̃_t. Then E[L(σ̂²_t, h*_t)] ≤ E[L(σ̂²_t, h̃_t)] by the LIE, and E[L(σ²_t, h*_t)] ≤ E[L(σ²_t, h̃_t)] since L is robust under S2. But L(σ̂², h) has a unique minimum at σ̂² = h, and if we set h̃_t = σ²_t then it must be the case that h*_t = σ²_t.

Proving S3 ⇒ S1 is more challenging. For this part we follow the proof of Theorem 1 of Komunjer and Vuong (2006), adapted to our problem. We seek to show that the functional form of the loss function given in the proposition is necessary for h*_t = E_{t−1}[σ̂²_t], for any F_t ∈ F̃. Notice that we can write

∂L(σ̂²_t, h_t)/∂h = c(σ̂²_t, h_t)(σ̂²_t − h_t)

where c(σ̂²_t, h_t) ≡ (σ̂²_t − h_t)⁻¹ ∂L(σ̂²_t, h_t)/∂h, since σ̂²_t ≠ h_t a.s. by assumption A2. Now decompose c(σ̂²_t, h_t) into

c(σ̂²_t, h_t) = E_{t−1}[c(σ̂²_t, h_t)] + ε_t

where E_{t−1}[ε_t] = 0. Thus

E_{t−1}[∂L(σ̂²_t, h*_t)/∂h] = E_{t−1}[c(σ̂²_t, h*_t)(σ̂²_t − h*_t)]
  = E_{t−1}[c(σ̂²_t, h_t)] E_{t−1}[σ̂²_t − h*_t] + E_{t−1}[ε_t(σ̂²_t − h*_t)].

If E_{t−1}[∂L(σ̂²_t, h*_t)/∂h] = 0 for h*_t = E_{t−1}[σ̂²_t], then it must be that E_{t−1}[σ̂²_t − h*_t] = 0 ⇒ E_{t−1}[ε_t(σ̂²_t − h*_t)] = 0 for all F_t ∈ F̃. Employing a generalised Farkas lemma (see Lemma 8.1 of Gourieroux and Monfort, 1996), this implies that there exists λ ∈ R such that λ(σ̂²_t − h*_t) = ε_t(σ̂²_t − h*_t) for every F_t ∈ F̃ and for all t. Since σ̂²_t − h*_t ≠ 0 a.s. by assumption A2, this implies that ε_t = λ a.s. for all t; since E_{t−1}[ε_t] = 0 we then have λ = 0. Thus c(σ̂²_t, h*_t) = E_{t−1}[c(σ̂²_t, h*_t)] for all t, which implies that c(σ̂²_t, h*_t) = c(h*_t), and thus that ∂L(σ̂²_t, h_t)/∂h = c(h_t)(σ̂²_t − h_t).

The remainder of the proof is straightforward: a necessary condition for h*_t to minimise E_{t−1}[L(σ̂²_t, h)] is that E_{t−1}[∂²L(σ̂²_t, h*_t)/∂h²] ≥ 0, using A5 to interchange expectation and differentiation. Using the previous result we have

E_{t−1}[∂²L(σ̂²_t, h*_t)/∂h²] = E_{t−1}[c′(h*_t)(σ̂²_t − h*_t) − c(h*_t)] = −c(h*_t)

which is non-negative iff c(h*_t) is non-positive. From assumption A4 we know that the optimum is in the interior of H, so c ≠ 0, and thus c(h) < 0 for all h ∈ H. To obtain the loss function corresponding to the given first derivative we simply integrate up:

L(σ̂², h) = σ̂² ∫ c(h) dh − ∫ c(h)h dh
  = B(σ̂²) + σ̂²C(h) − [C(h)h − ∫ C(h) dh]
  = C̃(h) + B(σ̂²) + C(h)(σ̂² − h)

where C is a strictly decreasing function (i.e. C′ ≡ c is negative), C̃ is the anti-derivative of C, and B(σ̂²) collects the constant of integration. By assumption A3 both B and C are twice continuously differentiable. Thus S3 ⇒ S1, completing the proof.

Proof of Proposition 2. Without loss of generality, we work below with loss functions that have been normalised to imply zero loss when the forecast error is zero: L(σ̂², h) = C̃(h) − C̃(σ̂²) + C(h)(σ̂² − h).

(i) We want to find the general sub-set of loss functions that satisfy L(σ̂², h) = L̃(σ̂² − h) for all (σ̂², h), for some function L̃. This condition implies
∂L(σ̂², h)/∂σ̂² = −∂L(σ̂², h)/∂h  for all (σ̂², h), i.e.
−C(σ̂²) + C(h) + C′(h)(σ̂² − h) = 0  for all (σ̂², h).

Taking the derivative of both sides w.r.t. σ̂² we obtain −C′(σ̂²) + C′(h) = 0 for all (σ̂², h), which implies C′(h) = κ₁ for all h, and since we know C is strictly decreasing we also have κ₁ < 0. So

C(h) = κ₁h + κ₂(σ̂²)
C̃(h) = ½κ₁h² + κ₂(σ̂²)h + κ₃(σ̂²)

where κ₂, κ₃ are constants of integration, and may be functions of σ̂². Thus the loss function becomes

L(σ̂², h) = ½κ₁h² + κ₂(σ̂²)h + κ₃(σ̂²) − ½κ₁σ̂⁴ − κ₂(σ̂²)σ̂² − κ₃(σ̂²) + [κ₁h + κ₂(σ̂²)](σ̂² − h)
  = −½κ₁(σ̂² − h)².

Since proportionality constants do not affect the loss function, we find that the only loss function that depends on (σ̂², h) only through the forecast error, σ̂² − h, is the MSE loss function.

(ii) We next want to find the general sub-set of loss functions that satisfy L(σ̂², h) = L̃(σ̂²/h) for all (σ̂², h), for some function L̃. Note that this condition implies that L is homogeneous of degree zero.
Using Proposition 4 below, this implies that the loss function must be of the form

L(σ̂², h) = σ̂²/h − log(σ̂²/h) − 1
which is the QLIKE loss function up to additive and multiplicative constants.

Proof of Proposition 3. (i) If L is homogeneous of degree k then E[L(aσ̂²_t, ah₁t)] ≥ E[L(aσ̂²_t, ah₂t)] ⇔ E[aᵏL(σ̂²_t, h₁t)] ≥ E[aᵏL(σ̂²_t, h₂t)] ⇔ E[L(σ̂²_t, h₁t)] ≥ E[L(σ̂²_t, h₂t)], for any a > 0.

(ii) Here we need only provide an example. Consider the following stylised case: σ²_t = 1 a.s. for all t, (h₁t, h₂t) = (γ₁, γ₂) for all t, and σ̂²_t is such that E_{t−1}[σ̂²_t] = 1 a.s. for all t. As a robust but non-homogeneous loss we will use the one generated by the following specification for C′:

C′(h) = −log(1 + h)

so C(h) = h − (1 + h)log(1 + h)
and C̃(h) = ¼[h(3h + 2) − 2(1 + h)² log(1 + h)].

For small h this loss function resembles the b = 1 loss function from Proposition 4 (up to a scaling constant), but for medium to large h this loss function does not correspond to any in Proposition 4. Given this set-up, we have

E[L(aσ̂²_t, aγᵢ)] = ¼[aγᵢ(3aγᵢ + 2) − 2(1 + aγᵢ)² log(1 + aγᵢ)] − E[C̃(aσ̂²_t)] + a[aγᵢ − (1 + aγᵢ)log(1 + aγᵢ)](1 − γᵢ).

Then define d_t(γ₁, γ₂, a) ≡ L(aσ̂²_t, aγ₁) − L(aσ̂²_t, aγ₂), so that

E[d_t(γ₁, γ₂, a)] = (a/4)(γ₁ − γ₂)(2 + 4a − a(γ₁ + γ₂)) + ½[a²(γ₁ − 1)² − (1 + a)²] log(1 + aγ₁) − ½[a²(γ₂ − 1)² − (1 + a)²] log(1 + aγ₂).

Let h₁t = γ₁ = 1/3 and let h₂t = γ₂ = 3/2. Then E[d_t(h₁t, h₂t, 1)] = −0.0087, and so the first forecast has lower expected loss than the second using the ''original'' scaling of the data. But E[d_t(h₁t, h₂t, 2)] = 0.0061, and so if all variables are multiplied by 2 then the second forecast has lower expected loss than the first.

Proof of Proposition 4. We seek the subset of robust loss functions that are homogeneous of order k: L(aσ̂², ah) = aᵏL(σ̂², h) for all a > 0. Let

λ(σ̂², h) ≡ ∂L(σ̂², h)/∂h = C′(h)(σ̂² − h)

for robust loss functions. Since L is homogeneous of order k, λ is homogeneous of order (k − 1). This implies λ(aσ̂², ah) = aᵏ⁻¹λ(σ̂², h) = aᵏ⁻¹C′(h)(σ̂² − h), while direct substitution yields λ(aσ̂², ah) = aC′(ah)(σ̂² − h). Thus C′(ah) = aᵏ⁻²C′(h) for all a > 0, that is, C′ is homogeneous of order (k − 2). Next we apply Euler's theorem to C′: C′′(h)h = (k − 2)C′(h) for all h > 0, and so

(2 − k)C′(h) + C′′(h)h = 0.

We can solve this first-order differential equation to find C′(h) = γhᵏ⁻², where γ is an unknown scalar. Since C′ < 0 we know that γ < 0, and as this is just a scaling parameter we set it to −1 without loss of generality. Then

C′(h) = −hᵏ⁻²

C(h) = −hᵏ⁻¹/(k − 1) + z₁,  k ≠ 1
C(h) = −log h + z₁,  k = 1

C̃(h) = z₁h + hᵏ/(k(1 − k)) + z₂,  k ∉ {0, 1}
C̃(h) = z₁h + h − h log h + z₂,  k = 1
C̃(h) = z₁h + log h + z₂,  k = 0

where z₁ and z₂ are constants of integration. Finally, we substitute the expressions for C and C̃ into Eq. (23), set B = −C̃, and simplify to obtain the loss functions in Eq. (24) with k = b + 2.

References
Andersen, T.G., Bollerslev, T., 1998. Answering the skeptics: yes, standard volatility models do provide accurate forecasts. International Economic Review 39, 885–905. Andersen, T.G., Bollerslev, T., Christoffersen, P.F., Diebold, F.X., 2006. Volatility and correlation forecasting. In: Elliott, G., Granger, C.W.J., Timmermann, A. (Eds.), Handbook of Economic Forecasting. North Holland Press, Amsterdam. Andersen, T.G., Bollerslev, T., Diebold, F.X., 2010. Parametric and nonparametric volatility measurement. In: Hansen, L.P., Aï t-Sahalia, Y. (Eds.), Handbook of Financial Econometrics. North-Holland Press, Amsterdam. Andersen, T.G., Bollerslev, T., Diebold, F.X., Ebens, H., 2001. The distribution of realized stock return volatility. Journal of Financial Economics 61, 43–76. Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P., 2003. Modeling and forecasting realized volatility. Econometrica 71 (2), 579–625. Andersen, T.G., Bollerslev, T., Lange, S., 1999. Forecasting financial market volatility: sample frequency vis-à-vis forecast horizon. Journal of Empirical Finance 6, 457–477. Andersen, T.G., Bollerslev, T., Meddahi, N., 2005. Correcting the errors: volatility forecast evaluation using high-frequency data and realized volatilities. Econometrica 73 (1), 279–296. Andersen, T.G., Bollerslev, T., Meddahi, N., 2004. Analytic evaluation of volatility forecasts. International Economic Review 45, 1079–1110. Barndorff-Nielsen, O.E., Shephard, N., 2002. Econometric analysis of realised volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society, Series B 64, 253–280. Barndorff-Nielsen, O.E., Shephard, N., 2004. Econometric analysis of realized covariation: high frequency based covariance, regression and correlation in financial economics. Econometrica 72 (3), 885–925. Bollerslev, T., Engle, R.F., Nelson, D.B., 1994. ARCH models. In: Engle, R.F., McFadden, D. (Eds.), Handbook of Econometrics. North Holland Press, Amsterdam. 
Bollerslev, T., Ghysels, E., 1994. Periodic autoregressive conditional heteroscedasticity. Journal of Business and Economic Statistics 14 (2), 139–151. Christensen, K., Podolskij, M., 2007. Realized range-based estimation of integrated variance. Journal of Econometrics 141, 323–349. Christoffersen, P.F., Diebold, F.X., 1997. Optimal prediction under asymmetric loss. Econometric Theory 13, 808–817. Christoffersen, P.F., Jacobs, K., 2004. The importance of the loss function in option valuation. Journal of Financial Economics 72, 291–318. Clements, M.P., 2005. Evaluating Econometric Forecasts of Economic and Financial Variables. Palgrave MacMillan, United Kingdom. Cowles, A., 1933. Can stock market forecasters forecast? Econometrica 1 (3), 309–324. Diebold, F.X., Lopez, J.A., 1996. Forecast evaluation and combination. In: Maddala, G.S., Rao, C.R. (Eds.), Handbook of Statistics. North-Holland, Amsterdam, pp. 241–268. Diebold, F.X., Mariano, R.S., 1995. Comparing predictive accuracy. Journal of Business and Economic Statistics 13 (3), 253–263. Engle, R.F., 1993. A comment on Hendry and Clements on the limitations of comparing mean square forecast errors. Journal of Forecasting 12, 642–644. Engle, R.F., Hong, C.-H., Kane, A., Noh, J., 1993. Arbitrage valuation of variance forecasts with simulated options. In: Chance, D., Tripp, R. (Eds.), Advances in Futures and Options Research. JIA Press, Greenwich, USA. Feller, W., 1951. The asymptotic distribution of the range of sums of random variables. Annals of Mathematical Statistics 22, 427–432. Garman, M.B., Klass, M.J., 1980. On the estimation of security price volatilities from historical data. Journal of Business 53 (1), 67–78. Giacomini, R., White, H., 2006. Tests of conditional predictive ability. Econometrica 74 (6), 1545–1578. Gonçalves, S., Meddahi, N., 2009. Bootstrapping realized volatility. Econometrica 77 (1), 283–306. Gourieroux, C., Monfort, A., 1996. Statistics and Econometric Models, Vol. 1. 
Cambridge University Press, Great Britain (Q. Vuong, Trans.; original in French). Gourieroux, C., Monfort, A., Renault, E., 1987. Consistent M-estimators in a semiparametric model. CEPREMAP Working Paper 8720.
Gourieroux, C., Monfort, A., Trognon, A., 1984. Pseudo maximum likelihood methods: theory. Econometrica 52 (3), 681–700. Granger, C.W.J., 1969. Prediction with a generalized cost function. Operations Research Quarterly 20, 199–207. Hamilton, J.D., Susmel, R., 1994. Autoregressive conditional heteroskedasticity and changes in regime. Journal of Econometrics 64 (1–2), 307–333. Hansen, P.R., Lunde, A., 2006. Consistent ranking of volatility models. Journal of Econometrics 131 (1–2), 97–121. Hansen, P.R., Lunde, A., 2005. A forecast comparison of volatility models: does anything beat a GARCH(1, 1)? Journal of Applied Econometrics 20 (7), 873–889. Huber, P.J., 1981. Robust Statistics. Wiley, New York, USA. Komunjer, I., Vuong, Q., 2006. Efficientt conditional quantile estimation: the time series case, Working Paper 2006–10, Department of Economics, UC-San Diego. Lamoureux, C.G., Lastrapes, W.D., 1993. Forecasting stock return variance: toward an understanding of stochastic implied volatilities. Review of Financial Studies 6 (2), 293–326. Martens, M., van Dijk, D., 2007. Measuring volatility with the realized range. Journal of Econometrics 138, 181–207. Meddahi, N., 2001. A theoretical comparison between integrated and realized volatilities, Manuscript, Université de Montréal. Mincer, J., Zarnowitz, V., 1969. The evaluation of economic forecasts. In: Zarnowitz, J. (Ed.), Economic Forecasts and Expectations. National Bureau of Economic Research, New York.
Pagan, A.R., Schwert, G.W., 1990. Alternative models for conditional volatility. Journal of Econometrics 45, 267–290. Parkinson, M., 1980. The extreme value method for estimating the variance of the rate of return. Journal of Business 53 (1), 61–65. Patton, A.J., 2006. Volatility forecast comparison using imperfect volatility proxies, Research Paper 175, Quantitative Finance Research Centre, University of Technology Sydney. Patton, A.J., Sheppard, K., 2009. Evaluating volatility and correlation forecasts. In: Andersen, T.G., Davis, R.A., Kreiss, J.-P., Mikosch, T. (Eds.), The Handbook of Financial Time Series. Springer Verlag. Patton, A.J., Timmermann, A., 2007. Properties of optimal forecasts under asymmetric loss and nonlinearity. Journal of Econometrics 140 (2), 884–918. Poon, S.-H., Granger, C.W.J., 2003. Forecasting volatility in financial markets. Journal of Economic Literature 41, 478–539. Shephard, N., 2005. Stochastic Volatility: Selected Readings. Oxford University Press, United Kingdom. West, K.D., 2006. Forecast evaluation. In: Elliott, G., Granger, C.W.J., Timmermann, A. (Eds.), Handbook of Economic Forecasting. North Holland Press, Amsterdam. West, K.D., 1996. Asymptotic inference about predictive ability. Econometrica 64, 1067–1084. West, K.D., Edison, H.J., Cho, D., 1993. A utility-based comparison of some models of exchange rate volatility. Journal of International Economics 35, 23–45.
Journal of Econometrics 160 (2011) 257–271
Volatility forecasting and microstructure noise✩

Eric Ghysels a,b,∗, Arthur Sinko c,1

a Department of Finance, Kenan-Flagler School of Business, United States
b Department of Economics, University of North Carolina, Gardner Hall CB 3305, Chapel Hill, NC 27599-3305, United States
c Economics, School of Social Sciences, Arthur Lewis Building, University of Manchester, Manchester M13 9PL, United Kingdom

Article history: Available online 6 March 2010
Abstract

It is common practice to use the sum of frequently sampled squared returns to estimate volatility, yielding the so-called realized volatility. Unfortunately, returns are contaminated by market microstructure noise. Several noise-corrected realized volatility measures have been proposed. We assess to what extent correction for microstructure noise improves forecasting future volatility using a MIxed DAta Sampling (MIDAS) regression framework. We study the population prediction properties of various realized volatility measures, assuming i.i.d. microstructure noise. Next we study optimal sampling issues theoretically, when the objective is forecasting and microstructure noise contaminates realized volatility. We distinguish between conditional and unconditional optimal sampling schemes, and find that conditional optimal sampling seems to work reasonably well in practice. © 2010 Published by Elsevier B.V.
1. Introduction

We study a regression prediction problem with volatility measures that are contaminated by microstructure noise and examine optimal sampling for the purpose of volatility prediction. The analysis is framed in the context of MIDAS regressions with regressors affected by microstructure noise. We investigate two topics: (1) the theoretical analysis and empirical performance of various volatility measures sampled at different frequencies, and (2) the performance of volatility measures corrected for independent noise compared to those that are corrected for dependent market microstructure noise. Discussions about the impact of microstructure noise have so far mostly focused on measurement, and therefore on the mean squared error and bias of various adjustments. In this paper, the focus is instead on prediction in a regression format, and therefore we can include estimators that are suboptimal in a mean squared error sense, since covariation with the predictor is what matters. Previously, the optimal sampling frequency was studied in terms of the MSE of estimators in an asymptotic setting (Zhang et al., 2005) and for
✩ We thank Yacine Aït-Sahalia, Federico Bandi, Peter Hansen, Jeffrey Russell, and Per Mykland for their helpful comments, and Peter Hansen and Asger Lunde for providing their data. We would also like to thank Neil Shephard for his detailed comments on an earlier draft of our paper, as well as the Referees. All remaining errors are ours.
∗ Corresponding author at: Department of Economics, University of North Carolina, Gardner Hall CB 3305, Chapel Hill, NC 27599-3305, United States. Tel.: +1 919 966 5325.
E-mail addresses: [email protected] (E. Ghysels), [email protected] (A. Sinko).
1 Tel.: +44 161 275 4842.
doi:10.1016/j.jeconom.2010.03.035. © 2010 Published by Elsevier B.V.
finite samples (Bandi and Russell, 2008b). In this respect, the issues discussed here differ from the existing literature. We also conduct an extensive empirical study of forecasting with microstructure noise. We use the same data as in Hansen and Lunde (2006), namely the thirty Dow Jones Industrial Average (DJIA) stocks, from January 3, 2000 to December 31, 2004. The purpose of our empirical analysis is twofold. First, we verify whether the predictions from the theory hold in actual data samples. We find that this is indeed the case. Second, we also implement optimal sampling schemes empirically and check the relevance of the theoretical derivations using real data. We distinguish between ''conditional'' and ''unconditional'' optimal sampling schemes, as in Bandi and Russell (2006). We find that ''conditional'' optimal sampling seems to work reasonably well in practice. The topic of this paper has been studied by many authors independently and simultaneously. Garcia and Meddahi (2006) and Ghysels and Sinko (2006) discussed forecasting volatility and microstructure noise. Ghysels et al. (2007) provided further empirical evidence expanding on Ghysels and Sinko (2006). Aït-Sahalia and Mancini (2008) consider a number of stochastic volatility and jump–diffusion models, including the Heston and log-volatility models, and study the relative performance of the two-scales realized volatility (henceforth TSRV) estimator versus RV estimators. They provide simulation evidence showing that TSRV largely outperforms RV. They also report an empirical application which confirms their simulation results. We derive theoretical results for the RV, TSRV, subsample-average and Zhou (1996) estimators, and study theoretically optimal sampling as well. For the most part we consider i.i.d. noise in our theoretical derivations. In addition, we also provide extensive empirical evidence on the topic.
Andersen et al. (2011) considered independently several issues covered in this paper. The paper is structured in the following way. Section 2 provides the theoretical underpinnings for our analysis. Section 3 describes the univariate MIDAS prediction models, the data and the empirical implementation of optimal sampling. Section 4 discusses the results. Section 5 concludes. 2. Volatility prediction and microstructure noise We want to compare the forecasting performance of linear regression models with various realized volatility measures as regressors. To do so we need to study the population second-order moments of the these volatility measures with future realizations, i.e. the regressands in our analysis. 2.1. Description of realized measures We use p to denote an observable, high-frequency log-price process, p∗ to refer to the unobservable efficient log-price process, and η to denote the microstructure noise component that has mean 0 and variance ση2 . Also, we assume that the noise component and the efficient price component are independent. The relation between them is given by the equation pt = pt + ηt . ∗
(2.1)
We will assume that dp∗t = σt dWt ,
(2.2)
where W_t is a standard Brownian motion and the spot volatility process σ_t is predictable and has a continuous sample path. We first focus on equally-spaced data sampling (calendar data sampling). We denote by M + 1 the number of prices associated with the finest equidistant grid (say, every second) per period (say, a day). Because of microstructure noise, we will also consider estimators that use a relatively ''sparse grid'' constructed using M̄ = M/m returns. We consider six realized volatility estimators; their technical details are reported in Appendix A.1. The first estimator, the daily RV^m_j, defined in Eq. (A.2), is consistent in the absence of microstructure noise, but biased and inconsistent when noise is present. The second estimator, RV_{AC₁}, defined in Eq. (A.3), was studied by Zhou (1996) and Hansen and Lunde (2006); under the assumption of i.i.d. microstructure noise it is unbiased, but inconsistent. The third estimator, RV_TS (Eq. (A.4)), is the two-scales estimator of Zhang et al. (2005). It consists of two parts. The first part is the average of m ''fast-scale'' RV^m_j measures. Since each measure uses only M̄ intraperiod returns, a potential improvement can be made by averaging over the m different realized volatility estimators. The second part is a properly adjusted ''slow-scale'' realized volatility, introduced to compensate for the microstructure noise bias. Although the bias correction can significantly change the estimated realized volatility, we will show in Sections 2.3 and 4 that it may not significantly change the mean square error of prediction. Nevertheless, we expect RV_TS to perform well, as subsampling reduces the measurement error. This prompts another question, namely how good is the subsample-averaging estimator without the noise correction? We write it as RV̄^m; it is defined in Eq. (A.5). The last two estimators we consider capture the fact that, in reality, microstructure noise can be serially correlated.
The first, RV_{TSd} (Eq. (A.7)), is a modification of (A.4) proposed by Aït-Sahalia et al. (2005); the two differ only in the definition of the bias-correction term. The last estimator, RV^{tick}_{AC_NW} (Eq. (A.8)), proposed by Hansen and Lunde (2006), is a Bartlett kernel-based estimator defined in tick time instead of calendar time.
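As a rough sketch (ours, not the paper's code; the paper's exact definitions are in its Appendix A.1), the first four calendar-time estimators can be written as:

```python
import numpy as np

def rv(r):
    """Plain realized variance: the sum of squared intraperiod returns (cf. Eq. (A.2))."""
    r = np.asarray(r, float)
    return float(np.sum(r ** 2))

def rv_ac1(r):
    """First-order autocovariance correction of Zhou (1996) (cf. Eq. (A.3)):
    unbiased under i.i.d. noise, but inconsistent."""
    r = np.asarray(r, float)
    return float(np.sum(r ** 2) + 2.0 * np.sum(r[1:] * r[:-1]))

def rv_subsample_avg(p, m):
    """Average of m sparse-grid RVs, each computed from every m-th log price
    (cf. Eq. (A.5))."""
    p = np.asarray(p, float)
    return float(np.mean([rv(np.diff(p[k::m])) for k in range(m)]))

def rv_two_scale(p, m):
    """Two-scales estimator in the spirit of Zhang et al. (2005) (cf. Eq. (A.4)):
    subsample average minus a noise-bias correction based on the fast scale.
    These are textbook constants; the paper's finite-sample adjustments may differ."""
    p = np.asarray(p, float)
    M = len(p) - 1                    # number of fast-scale returns
    Mbar = (M - m + 1) / m            # average number of sparse-grid returns
    return rv_subsample_avg(p, m) - (Mbar / M) * rv(np.diff(p))
```

The functions take a day's log-price (or return) grid and return a daily variance estimate; the serially-correlated-noise estimators RV_{TSd} and RV^{tick}_{AC_NW} require the bias-term and tick-time details of the paper's appendix and are not sketched here.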
We assume that the variance is a function of a state variable which is a linear combination of the eigenfunctions of the infinitesimal generator associated with the state variable in continuous time. As noted by Renault (2009), the eigenfunction approach dates back to Granger and Newbold (1976) and was applied to volatility by Taylor (1982) and further developed by Meddahi (2001). Special cases of this setting include the log-normal and the square-root processes, where the eigenfunctions are the Hermite and Laguerre polynomials, respectively. The eigenfunction approach has several advantages, including the fact that any square-integrable function may be written as a linear combination of the eigenfunctions, and the implied dynamics of the variance and squared-return processes have an ARMA representation. Therefore, one can easily compute forecasting formulas. This approach has been successfully used for such a purpose in a number of recent papers, including, in the context of forecasting with microstructure noise, the independent work of Andersen et al. (2011). We describe first some properties of the return process and microstructure noise. The observed log-price process is described by Eq. (2.1). For the microstructure noise process η_t we rely on an i.i.d. assumption.² In particular:

Assumption 2.1. The η_t process is i.i.d. with E(η_t) = 0, Var(η_t) = σ²_η, E(η³_t) = 0, E(η⁴_t)/[E(η²_t)]² = κ, and η_t is independent of p∗_t.

We assume that σ²_t is a continuous square-integrable function of a Markovian time-reversible stationary process f_t, namely σ²_t = a₀ + Σ_{i=1}^{k} a_i P_i(f_t), where the a_i are real, k ≤ ∞, and a_i = E[σ²(f_t)P_i(f_t)]; for all k > 0, E(P_i(f_{t+k}) | f_t) = e^{−λ_i k} P_i(f_t); E(P_i(f_t)) = δ_{i,1}; and E(P_i(f_t)P_j(f_t)) = δ_{i,j}. We also assume that the constituents of p∗ (Eq. (2.2)), W and σ, are independent processes (the no-leverage assumption). Given this setting and Eqs. (2.1) and (A.1), observed returns are defined as

r_{t,h} = p∗_t − p∗_{t−h} + η_t − η_{t−h} = r∗_{t,h} + e_{t,h}

and therefore, in the absence of a drift, r∗_{t,h} = ∫_{t−h}^{t} σ_s dW_s. For this setting we prove³:
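The role of the 2M̄σ²_η noise-bias term that appears in the theorem below can be checked by simulation; the constant-volatility setup and all parameter values here are toy choices of our own, not the paper's:

```python
import numpy as np

# Toy check of the i.i.d.-noise bias in realized variance: with Mbar
# intraperiod returns and noise variance sigma_eta^2, E[RV] - IV equals
# 2 * Mbar * sigma_eta^2. Constant spot volatility is assumed for simplicity.
rng = np.random.default_rng(7)
Mbar, sigma2_day, sigma_eta, n_days = 78, 1.0, 0.05, 10000
h = 1.0 / Mbar                                        # return-interval length

r_eff = rng.normal(0.0, np.sqrt(sigma2_day * h), (n_days, Mbar))  # efficient returns
eta = rng.normal(0.0, sigma_eta, (n_days, Mbar + 1))              # i.i.d. price noise
r_obs = r_eff + np.diff(eta, axis=1)                  # observed = efficient + noise increment
bias = (r_obs ** 2).sum(axis=1) - sigma2_day          # daily RV minus integrated variance

print(bias.mean(), 2 * Mbar * sigma_eta ** 2)         # both close to 0.39
```

Each observed return picks up a noise increment with variance 2σ²_η, so summing M̄ squared returns inflates expected RV by 2M̄σ²_η, which is what the simulated mean reproduces.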
Theorem 2.1. Let Assumption 2.1 hold. Then for the estimators RV^m_j, RV^m_{AC₁}, RV̄^m and RV_TS:

Bias(RV^m_j) = Bias(RV̄^m) = −(1 − M̄h)a₀ + 2M̄σ²_η,   (2.3)

Bias(RV_TS) = −(1 − M̄h − 1/M)a₀,

Var(RV^m_j) = 2 Σ_{i=1}^{p} (a²_i/λ²_i)(e^{−λ_i M̄h} − 1 + λ_i M̄h) + 2M̄a₀²h²
  + 4M̄ Σ_{i=1}^{p} (a²_i/λ²_i)(e^{−λ_i h} − 1 + λ_i h) + 8M̄a₀hσ²_η
+ (4M − 2)(κ − 1)ση4 + 4M ση4 2 Var(RVm AC 1 ) = (2.4) + 4
(2.4)
M
(κ + 2)ση4 + 4a0 hση2 + a20 h2 (M − 1) 2 p − M (M − 2) 4 a2i −λi h 2 + 1 − e + 8 ση λ2i (M − 1)2 i =1
2.2. Data generating processes
2 Note that this assumption is relevant only for a relatively low-frequency setup. For discussion of microstructure noise properties see, for example, Hansen and Lunde (2006). 3 It should be noted that our results differ slightly from the findings the
To derive population covariances of realized variance estimators we need to specify a data generating process for spot volatility.
independently written paper by Andersen et al. (2011), as we are using only intraperiod prices to compute realized volatility estimators. As a result two noisy returns that span consecutive periods have independent noise components.
E. Ghysels, A. Sinko / Journal of Econometrics 160 (2011) 257–271
− 8M (κ + 1)ση4 + 2a0 hση2 m 8a0 hM σ 2 (4M − 2)(κ − 1)ση4 + 4M ση4 η Var RV = + m m p 2 − M a k 2 2 −λk h + 2h a0 + 4 e − 1 + λk h m λ2k k =1 p m−1 1 − a2k − × 2 (1 + 2i) e−λk (1−2i/M ) − 1 2 m λ k i=0 k=1 m −1 i −1 4 −− + λk (1 − 2i/M ) + 2 m
[ p − −j a2k −λk iM −λk (1−2i/M ) 1 − e × 1 − e 2 λk k=1 m − 1 i −1 2 −− (i − j)2 a20 + 2 4(M − 1) 2
t +nh,nh , RV t ,h ) = Cov RV t +nh,nh , RV t ,h R2 (RV
From the above equation we note that the relative R -performance of different RVt ,h measures only depends on the variances of t ,h = Cov IVt +nh,nh , IVt ,h and t ,h , since Cov RV t +nh,nh , RV RV
t +nh,nh Var RV B RVt ,h
Var
p − −j a2k i−j −λk iM + e − 1 + λk M λ2k k=1 m−1 i−1 2 −− (m − i + j)2 a20 + 2 4M 2 2M
p − λ (m−i+j) a2k m−i−j − k M + e − 1 + λk M λ2k k=1 2 p − a2k −λk M Var (RVTS ) = (2.6) + 2 2 e − 1 + λk 2 M λk k =1 p − a2 2a20 k −λk /M + + 4M e − 1 + λk /M M λ2k k=1
(2.6)
M
4M ση2 /M + 4M − 2/m (κ − 1)ση4
m−1
−
p
a2k 2 k j=0 k=1
2M − − Mm
λ
× (1 − e ×
− 1 + λk Mh) −
, ) ≤
≥
B R2 RVt +nh,nh RVt ,h
(
, ). The
above cases, denoted by setting also applies to multiple regressor t +nh,nh ; RV t ,h , RV t −h,h , . . . , RV t −lh,h , which will be relevant R2 RV for the empirical analysis reported in the next sections. Defining the l + 1 × 1 vector C (yτ , zt , l) = (Cov (yτ , zt ) , Cov (yτ , zt −1 ) , . . . , Cov (yτ , zt −l ))′ and the (l + 1) × (l + 1) matrix M (zt , l) with (i, j)th component M (zt , l)[i, j] = Cov zt , zt +i−j , we can then express the R2 for a regression of RVt +nh,nh onto a constant and the set of regressors (RVt ,h , RVt −h,h , . . . , RVt −lh,h ), l ≥ 0 as:
Z
X
t +nh,nh , RV t ,h , l R2 RV
2M (M − m) M
p − a2k −λk /M 2 2 2a0 /M + 4 (e − 1 + λk /M ) . λ2k k=1
Proof. See Appendix A.2.
(
−1 (2.9)
it +a,a , RV jt −δ,b = Cov IVt +a,a , IVt −δ,b , ∀δ ≥ 0 and i, j = Cov RV {X , Y , Z } we have that:
(2 − e−λk (m−1−j)/M − e−λk j/M )
) + (e
A
Theorem 2.2. For multiple regressions and arbitrary realized volatility estimators X, Y , and Z yielding R2 ’s in (2.9), and whenever
−λk Mh
then
A R2 RVt +nh,nh RVt ,h
× C (.) /Var RVt +nh,nh where C (.) = C RVt +nh,nh , RVt ,h , l .
−λk Mh
t ,h is fixed. As a result, whenever Var RV
2
2M
R2 RVt +nh,nh , RVt ,h , l = C (.)′ M RVt ,h , l
M + 2 8a0 ση2 + (4M − 2)(κ − 1)ση4 + 4M ση4 M −
(2.8)
2
i=1 j=0
2
t ,h − 1 . t +nh,nh Var RV × Var RV
]
2M
i=1 j=0
m
we consider in this section. In reality, however, this is only an approximation. Moreover, we (1) do not take into account overnight returns, and (2), neglect a bias of the averaging over subsamples estimators, that disappears only asymptotically.5 In addition, we derive approximately optimal sampling frequencies in terms of minimization of MSE of the prediction, which is equivalent in this case to the optimal frequency maximization of the population R2 . This analysis parallels that of Andersen et al. (2004), in particular, they show that for n-period ahead forecasts,
i =1 j =0
m
(2.5)
259
(2.7)
2.3. Population prediction properties and optimal sampling We now turn to the computations of multiple correlation coefficients, or R2 , for single regressor equations projecting future integrated (realized) volatility onto various past RV measurements and a constant. By ‘‘approximately optimal4 ’’ we mean the following: to unify our analysis, we assume that t +nh,nh , RV t ,h = Cov IVt +nh,nh , IVt ,h for all estimators Cov RV
4 For convenience we will henceforth use the term ‘‘optimality’’ when ‘‘optimality in term of MSE of prediction’’ is discussed. In all other cases we will state explicitly when we mean MSE of estimation.
Z X t +nh,nh , RV Yt,h , l ⇔ Var RV t ,h ≥ R2 RV Y t ,h . ≤ Var RV
Proof. See Appendix A.2.
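Eq. (2.9) and the monotonicity in Theorem 2.2 can be illustrated numerically. The following sketch uses an assumed exponentially decaying IV autocovariance; all parameter values are illustrative, not quantities from the paper:

```python
import numpy as np

# Numerical sketch of Eq. (2.9) and the monotonicity in Theorem 2.2.
# The exponentially decaying IV autocovariance and all parameter values
# are assumptions for illustration only.
def r2(iv_acov, var_regressand, extra_var, l):
    """R^2 from regressing RV(t+1) on a constant and l+1 lagged RV measures;
    measurement error adds extra_var to the diagonal of M(z, l) only."""
    C = np.array([iv_acov(1 + k) for k in range(l + 1)])     # C(.) vector
    M = np.array([[iv_acov(abs(i - j)) for j in range(l + 1)]
                  for i in range(l + 1)], dtype=float)
    M[np.diag_indices(l + 1)] += extra_var                   # estimator variance
    return C @ np.linalg.solve(M, C) / var_regressand

acov = lambda k: 0.9 ** k            # assumed Cov(IV_t, IV_{t+k}), Var(IV) = 1
r2_low = r2(acov, 1.0, extra_var=0.1, l=5)    # lower-variance estimator
r2_high = r2(acov, 1.0, extra_var=0.5, l=5)   # higher-variance estimator
assert r2_low > r2_high              # Theorem 2.2: smaller Var => larger R^2
```

Because the covariances are pinned to the IV covariances, only the diagonal (the estimator's own variance) differs across estimators, which is exactly the channel Theorem 2.2 isolates.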
With the above result we are able to proceed with the comparison of the impact of various volatility measures on forecasting performance. To apply Theorem 2.2 to our framework, we have to assume that the covariances between daily estimators are equal to the covariances between daily integrated volatilities. Consider two groups of estimators: group A = {RV, RV_AC1} and group B = {RV_av, RV_TS}. We expect group B estimators to have smaller variance in both noise-free and noisy environments. In a noise-free environment the variances are smaller because the variance of the discretization noise averaged over subsamples (Eq. (A.21)) is smaller than the variance M Var(Z_{j/M+h,h}) (see Eq. (A.19)). When microstructure noise is present, as the sampling frequency becomes large, the variances of group A estimators diverge
5 The exact covariances (assuming daily integrated volatility can be consistently estimated without overnight returns) are shown in Ghysels and Sinko (2009). All numerical results are obtained using these formulas.
to infinity. As a result, the R²s of the regressions converge to 0. Group B estimators show a different pattern. These two estimators have asymptotically zero variance, and the two-scales correction only removes the bias from the averaging over subsamples estimator. However, if the bias is constant over time, it does not affect the predictive power of the regression, as it only changes the intercept. Thus, the performance of the group B estimators should be approximately the same, with the averaging over subsamples estimator performing better as the ''sparse'' grid M approaches the finest grid M̄. Finally, in real-data applications the variance of market microstructure noise is time-varying, which makes the two-scales estimator preferable. In particular, we have to emphasize the following. First, our analysis is valid as long as the regressand is unbiased; note, though, that in the simplest case we consider, the microstructure noise bias does not change the prediction MSE or the relative performance of different regressors. Note also, however, that such an analysis cannot be applied directly to autoregressive models: for autoregressive models there is a non-trivial bias–variance tradeoff. Second, our analysis is about picking the best regressor given some regressand; hence, we do not answer the question of which regressand is best. Finally, we operate under the assumption that the noise is i.i.d. The microstructure i.i.d. assumption impacts our analysis in two ways. First, in our derivations of variances we assume that market microstructure noise has zero autocorrelation. It is, however, well documented in the literature that this does not hold at ultra-high frequencies.6 This violation of our theoretical assumption creates a wedge between our results and the empirics. Second, in our regression analysis we assume that the variance and fourth moment of microstructure noise do not vary over time. During market crashes or severe liquidity shocks, however, the variance of microstructure noise tends to increase (see, for example, Aït-Sahalia and Yu (2009)). Violation of this assumption creates a time-varying regressor bias, which should decrease the regression explanatory power of RV_av relative to the RV_TS estimator and make predictions biased. However, our empirical analysis will show that there is no significant difference, for all frequencies we consider, in the explanatory power of the RV_av and RV_TS estimators. Thus, the i.i.d. assumption allows us to develop a simple and concise theory, and to check qualitatively for which frequencies this theory works and whether a time-varying bias should be taken into account. To focus exclusively on the relative performance of different realized volatility measures, we use the infeasible integrated variance as the regressand. To appraise the performance of the realized volatility estimators, we compare the population predictive power of the estimators using the three models M1–M3 described in Andersen et al. (2004). Model M1 is a GARCH-diffusion model, M2 is a two-factor affine model, and M3 is a log-normal diffusion (for details and parameters see Appendix A.2, Eqs. (A.24)–(A.26)). Table 1 presents the sample results for models M1–M3. The results are reported for different frequencies, numbers of lags, and zero and non-zero microstructure noise. Market microstructure noise is assumed to be i.i.d. with κ = 3 and two values of σ_η²: σ_η² = 0.001 (relevant for the frequently traded stocks we consider in the empirical part of the paper) and σ_η² = 0.03. The latter value is rather unrealistic; however, in reality not only the variance of microstructure noise but also the variance of fundamental returns varies across stocks, while in our analysis it is fixed for a given model. Aït-Sahalia and Yu (2009) show that the noise-to-signal ratio (NSR) for traded stocks varies from virtually 0 (no noise) to 1 (no signal).
Thus, the high value of σ_η² describes stocks whose price process has a high NSR. Each model, for each set of microstructure noise properties, has three sampling frequencies: five minutes, one minute and twenty seconds. These frequencies correspond, respectively, to the low-frequency scales for RV_TS and RV_av, and to the sampling frequencies for the RV and Zhou estimators. The high-frequency scale for RV_TS and RV_av is set to one second. These sampling frequencies are selected to check the suitability of the forecasting procedure and are in line with empirical studies, which commonly use the 5 min sampling frequency. The full version of the tables is available online in Ghysels and Sinko (2009). We consider three values for the number of lags: 1, 15, 50; eight values of the variance of market microstructure noise, σ² = 0, 0.001, 0.005, 0.01, 0.015, 0.02, 0.025, 0.03; and six values of the microstructure noise kurtosis, κ = 1.5, 2, 2.5, 3, 3.5, 4. The main findings are the following. In the noise-free environment the averaging over subsamples estimator performs best across all estimators. The two-scales estimator produces slightly worse results. These two estimators outperform the RV and RV_AC1 estimators. Moreover, the RV estimator performs better than RV_AC1. Infeasible linear regressions with integrated variance on the right-hand side also perform better than the feasible ones. This finding is in line with Theorem 2.2: a lower variance of the estimator implies a higher R², provided all other assumptions hold. As the sampling frequency increases, the variance of the discretization noise decreases and the R²s of all estimators converge to the R² of the infeasible regression on integrated volatility (R²_IV). For all the models, fifteen lags are sufficient, and a further increase in the number of lags does not provide any increase in the R²s. These results are not surprising given that the models have a small number of independent parameters and exponential decay of the lag coefficients. The results change in an environment with microstructure noise: namely, the results depend not only on the microstructure noise variance, but also on its kurtosis.

6 See, for example, Hansen and Lunde (2006) and Sinko (2007).
For κ ≤ 2, the plain RV estimator performs better than RV_AC1, and the averaging over subsamples estimator RV_av produces better results than RV_TS for all frequencies. Moreover, in absolute terms, when the microstructure noise variance σ² ≥ 0.015, all R²s decrease monotonically as the subsampling frequency increases, for a given set of model parameters. This finding can be explained by the fact that, for this range of microstructure noise variances, the optimal sampling frequency in terms of R² becomes lower than once per five minutes. The main factor that impacts the R² performance is the variance of the realized volatility estimators; the biases are captured by the constant terms of the regressions. Fig. 1 shows the optimal sampling frequencies that maximize the R² of the estimators, conditional on specific values of the noise variance, for κ = 3. For large values of the market microstructure noise variance, increasing the number of lags is helpful: in some extreme cases the difference between the R² of a regression with 15 lags and one with 50 lags is 10%. The explanatory power of the group A estimators decreases monotonically for all values of microstructure noise variance considered. For κ > 2 the situation changes. First, for higher frequencies, the plain RV estimator's R²_pl is smaller than R²_AC1. Second, R²_TS becomes
larger than R²_av for some range of near-optimal frequencies around the maximum of R². This result supports the findings of Aït-Sahalia et al. (2005), who show that the two-scales estimator performs better at the optimal frequency than the ''second best'' averaging over subsamples estimator. For non-optimal frequencies, however, the averaging over subsamples estimator outperforms the two-scales one. The eigenfunction approach also allows us to derive an approximately optimal frequency. However, analytical solutions can only be obtained for the simplest cases, the RV_AC1 and RV estimators. For the other two cases the solution can be found as a root of a third- or fourth-order polynomial. For these two it is more
Table 1
Sample theoretical R² comparison of the MIDAS approach for the M1–M3 models. Each entry in the table corresponds to the R² for the different models ((A.24)–(A.26)), different numbers of lags and different return sampling frequencies. The regressions are run on a weekly (5-day) data sampling scheme. The names of the variables are consistent with the section describing realized volatility estimators; RV_av denotes the averaging over subsamples estimator. Every column in a panel corresponds to the theoretical explanatory power of a different right-hand side variable for the same left-hand side variable. The first panel contains theoretical results for the ''noiseless'' case, the second contains results for the case of i.i.d. noise with κ = 3 and σ² = 0.001, and the third contains the case of i.i.d. noise with κ = 3 and σ² = 0.03. The complete results are provided in Ghysels and Sinko (2009), Table B-1.

                       |------------- 1 lag -------------|  |------------ 50 lags ------------|
           R2best  RV_IV   RV     RV_AC1  RV_av  RV_TS     RV_IV   RV     RV_AC1  RV_av  RV_TS

LHS: IV, σ² = 0.0000
M1 5 min   0.891   0.871   0.799  0.686   0.822  0.817     0.874   0.822  0.776   0.834  0.830
M1 1 min   0.891   0.871   0.856  0.827   0.861  0.832     0.874   0.856  0.836   0.861  0.839
M1 20 sec  0.891   0.871   0.866  0.856   0.867  0.783     0.874   0.866  0.856   0.868  0.814
M2 5 min   0.586   0.445   0.346  0.240   0.375  0.372     0.460   0.390  0.328   0.407  0.406
M2 1 min   0.586   0.445   0.422  0.381   0.429  0.415     0.460   0.439  0.411   0.445  0.434
M2 20 sec  0.586   0.445   0.437  0.422   0.440  0.396     0.460   0.452  0.439   0.454  0.421
M3 5 min   0.945   0.934   0.875  0.775   0.894  0.888     0.936   0.900  0.869   0.908  0.905
M3 1 min   0.945   0.934   0.922  0.898   0.926  0.895     0.936   0.923  0.910   0.926  0.908
M3 20 sec  0.945   0.934   0.930  0.922   0.932  0.841     0.936   0.930  0.923   0.932  0.888

LHS: IV, σ² = 0.0010, κ = 3.0
M1 5 min   0.891   0.871   0.774  0.668   0.822  0.816     0.874   0.810  0.770   0.833  0.830
M1 1 min   0.891   0.871   0.809  0.790   0.860  0.831     0.874   0.827  0.818   0.860  0.839
M1 20 sec  0.891   0.871   0.778  0.789   0.863  0.779     0.874   0.812  0.817   0.863  0.813
M2 5 min   0.586   0.445   0.301  0.218   0.375  0.372     0.460   0.364  0.314   0.407  0.406
M2 1 min   0.586   0.445   0.321  0.309   0.427  0.413     0.460   0.375  0.368   0.443  0.432
M2 20 sec  0.586   0.445   0.261  0.284   0.425  0.386     0.460   0.341  0.354   0.442  0.414
M3 5 min   0.945   0.934   0.852  0.757   0.894  0.888     0.936   0.892  0.865   0.908  0.905
M3 1 min   0.945   0.934   0.880  0.865   0.925  0.895     0.936   0.902  0.896   0.926  0.908
M3 20 sec  0.945   0.934   0.848  0.860   0.927  0.837     0.936   0.891  0.895   0.927  0.887

LHS: IV, σ² = 0.0300, κ = 3.0
M1 5 min   0.891   0.871   0.123  0.153   0.807  0.806     0.874   0.471  0.508   0.826  0.825
M1 1 min   0.891   0.871   0.032  0.046   0.603  0.640     0.874   0.253  0.309   0.747  0.760
M1 20 sec  0.891   0.871   0.011  0.017   0.181  0.205     0.874   0.124  0.166   0.536  0.557
M2 5 min   0.586   0.445   0.012  0.015   0.340  0.346     0.460   0.048  0.061   0.386  0.390
M2 1 min   0.586   0.445   0.003  0.004   0.117  0.145     0.460   0.013  0.018   0.234  0.260
M2 20 sec  0.586   0.445   0.001  0.001   0.018  0.021     0.460   0.004  0.007   0.068  0.078
M3 5 min   0.945   0.934   0.148  0.185   0.879  0.877     0.936   0.624  0.657   0.902  0.901
M3 1 min   0.945   0.934   0.039  0.056   0.670  0.706     0.936   0.401  0.465   0.842  0.852
M3 20 sec  0.945   0.934   0.014  0.020   0.213  0.239     0.936   0.224  0.286   0.677  0.694
convenient to express the solution in terms of φ ≡ M/M̄, φ ∈ (0, 1) (Bandi and Russell, 2008a). As a result, φ ≃ 0 corresponds to the lowest possible frequency and φ ≃ 1 to the highest. In our analysis only M changes. There are two major differences between the optimal sampling we derive and the optimal sampling derived in Bandi and Russell (2008b). First, for prediction purposes the bias of the estimator does not matter as long as it is constant over time; as a result, using Theorem 2.2, the optimal sampling frequency is the one that minimizes the variance of the estimator. Second, we use a stochastic volatility model, while Bandi and Russell (2008b) assume that the variance is a deterministic function of time. The following proposition summarizes our results:

Proposition 2.1. Let Assumption 2.1 and the conditions of Theorem 2.2 hold, with Mh ≃ 1 and m ≃ M̄/M. Also, let us define Q = ∑_{i=0}^p a_i². Then the approximate optimal sampling frequencies for the RV, RV_AC1, RV_av and RV_TS estimators are:

M_RV ≃ √(Q/(2κσ_η⁴)),    M_RV_AC1 ≃ √(3Q/(4σ_η⁴)),

M_RV_av ≃ argmin_φ { 8a_0φσ_η² + 4φ²M̄κσ_η⁴ − 2φσ_η⁴(κ − 1) + Q(1 − φ)[2M̄(2 − φ) − (1/φ + 1)]/(3M̄²φ) },

M_RV_TS ≃ argmin_φ { Q[2M̄(2 − φ) − (1/φ + 1)]/(3M̄²φ(1 − φ)) − 2φσ_η⁴(κ − 1)/(1 − φ) + [8φa_0σ_η² + 8φ²M̄σ_η⁴]/(1 − φ)² }.  (2.10)

Proof. See Appendix A.3.
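The two closed-form frequencies in the proposition are straightforward to evaluate. A minimal sketch with assumed, illustrative daily moment values:

```python
import numpy as np

# Sketch of the closed-form part of Proposition 2.1.  The daily moments
# Q, kappa and sigma_eta^2 below are assumed illustrative values.
Q, kappa, sig2_eta = 1e-7, 3.0, 1e-6

M_rv = np.sqrt(Q / (2 * kappa * sig2_eta ** 2))    # optimal M for plain RV
M_ac1 = np.sqrt(3 * Q / (4 * sig2_eta ** 2))       # optimal M for RV_AC1
assert M_ac1 > M_rv    # the AC1 correction tolerates a finer sampling grid
```

The ratio M_rv/M_ac1 equals √(2/(3κ)); with κ = 3 the RV_AC1 optimum is a bit more than twice the plain RV optimum.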
We note from the above proposition that M_RV/M_RV_AC1 ≃ √(2/(3κ)). Although the result is based on approximations, it matches the exact computations. In particular, consider the findings reported in Fig. 1, where we computed the R² as a function of the sampling frequency, the second and fourth moments of market microstructure noise, and the mean and variance of realized volatility, directly for the three models M1–M3 ((A.24)–(A.26)) using an exact formula for the variance. The optimal sampling frequency for the RV_AC1 estimator is always greater than the optimal sampling frequency for the RV estimator, and the difference increases as the kurtosis κ of market microstructure noise increases. We showed in Theorem 2.2 that the optimum in terms of MSE of prediction is equivalent to the optimum in terms of the minimum of the estimator's variance. Comparing the optimal frequency in terms of MSE of prediction with that in terms of MSE of estimation, we conclude that the two coincide whenever the estimators are unbiased. For cases where the bias is non-zero and increases linearly with the sampling frequency, the optimal sampling frequency in terms of MSE of prediction will be higher
than the optimal sampling frequency in terms of MSE of estimation, as the increase in the bias introduces an additional penalty as the sampling frequency increases. Also, note that Bias(RV) = Bias(RV_av) = 2Mσ_η².

Fig. 1. Optimal sampling, κ = 3. Optimal sampling frequency and maximal R² for the different models and different realized volatility estimators. R²_E corresponds to the prediction power of the conditional expectation, R²_IV to that of the integrated volatility. [Two rows of panels for models M1–M3: maximal R² (top) and optimal sampling frequency (bottom), each plotted against the microstructure noise variance over 0–0.02.]

3. Practical implementation issues

In this section we discuss various practical implementation issues, ranging from the choice of regression models to the data and optimal sampling schemes. We compare realized volatility estimators using the forecast performance of MIDAS regressions, proposed in the context of volatility prediction by Ghysels et al. (2006). We consider the beta polynomial specification for the parameterization.

3.1. Data

We use exactly the same prefiltered price data as in Hansen and Lunde (2006); the data were provided by the authors. The data consist of the thirty equities of the Dow Jones Industrial Average (DJIA). The sample period spans five years, from January 3, 2000 to December 31, 2004. All data are extracted from the Trade and Quote (TAQ) database; in particular, we use trade prices for our analysis. The raw data were filtered for outliers, and transactions outside the period from 9:30 am to 4:00 pm were discarded. The filtering procedure removed obvious data errors such as zero prices. For most of our estimators we use calendar-time sampling, which requires the construction of artificial prices from the original tick-by-tick irregularly spaced price data. For interpolation we use the previous-tick method, introduced by Wasserfallen and Zimmermann (1985).
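The previous-tick construction described at the end of Section 3.1 can be sketched as follows (our illustration; the tick times, prices and grid are made up):

```python
import bisect

# Sketch of previous-tick calendar-time sampling.  Tick times are seconds
# since the open, sorted; the grid is the regular calendar-time clock.
def previous_tick(times, prices, grid):
    """Last traded price at or before each calendar-time grid point."""
    out = []
    for t in grid:
        i = bisect.bisect_right(times, t) - 1    # index of last tick <= t
        out.append(prices[max(i, 0)])            # fall back to the first tick
    return out

ticks = [0.0, 2.7, 3.1, 9.8]
px = [100.0, 100.2, 100.1, 100.4]
assert previous_tick(ticks, px, [0, 5, 10]) == [100.0, 100.1, 100.4]
```

Unlike linear interpolation, this scheme never uses future information, which is why it is the standard choice for constructing artificial calendar-time prices.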
3.2. Unconditional and conditional optimal sampling

Optimal sampling frequency issues were first considered, for the homoscedastic case, by Zhou (1996). The idea was further developed by Oomen (2005), Bandi and Russell (2008a) and Hansen and Lunde (2006), among others. Following Bandi and Russell (2006), we use the term ''conditional'' to reflect the fact that the optimal sampling frequency for realized volatility estimators is computed on a daily basis using the formulas of Proposition 2.1 with daily estimates of the second and fourth moments of market microstructure noise and of quarticity. In contrast, unconditional optimal sampling fixes the sampling frequency over the whole period. In the remainder of this subsection we discuss the microstructure noise moment estimators used to compute the conditional optimal sampling frequencies. We partially adopt the Bandi and Russell approach, though our focus is on finding the optimal sampling in terms of MSE of prediction, not MSE of estimation. To apply the results of Proposition 2.1, we need to estimate the second and fourth moments of market microstructure noise as well as a measure of daily quarticity. The quarticity and the fourth moment of market microstructure noise can be estimated from the sum of fourth powers of returns, R_j^m, computed at different sampling frequencies (note that the number of observations on a grid M_j^m is M + 1, h ≃ 1/M):

E[R_j^m] ≡ E[ ∑_{t_k ∈ M_j^m} r_{t_k,h}⁴ ] ≃ 3hQ + 12a_0σ_η² + (2/h)[E(η⁴) + 3(E(η²))²],  h ≪ 1.  (3.1)

In the no-noise (relatively low-frequency) environment, R_j^m is used to compute the Barndorff-Nielsen and Shephard (2002) quarticity estimator Q̂_j^m, which can be further improved by subsample averaging. In the noisy environment (the highest-frequency case) it is used to estimate E(η⁴):

Q̂_j^m = (M/3) R_j^m;    Q̂ = (1/m) ∑_{j=1}^m Q̂_j^m;    Ê(η⁴) ≃ R¹/(2M̄) − 3σ̂_η⁴.  (3.2)
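A minimal sketch of the moment estimators in Eqs. (3.1)–(3.3), assuming i.i.d. Gaussian noise and illustrative magnitudes (not the paper's data):

```python
import numpy as np

# Sketch of the moment estimators around Eqs. (3.1)-(3.3); the simulated
# price path and all magnitudes are assumptions for illustration.
rng = np.random.default_rng(1)
Mbar = 23_400                                   # one-second returns per day
sigma_day, sigma_eta = 0.01, 0.001              # assumed daily vol and noise std
p_star = np.cumsum(rng.normal(0.0, sigma_day / np.sqrt(Mbar), Mbar + 1))
p_obs = p_star + rng.normal(0.0, sigma_eta, Mbar + 1)

# Eq. (3.3): noise variance from squared returns on the finest grid
r_fine = np.diff(p_obs)
sig2_eta_hat = np.sum(r_fine ** 2) / (2 * Mbar)

# Eq. (3.2), noisy case: fourth noise moment from R^1 on the finest grid
R1 = np.sum(r_fine ** 4)
eta4_hat = R1 / (2 * Mbar) - 3 * sig2_eta_hat ** 2

# Eq. (3.2), low-frequency case: Barndorff-Nielsen-Shephard quarticity
M = 26                                          # fifteen-minute returns
r_slow = np.diff(p_obs[:: Mbar // M])
Q_hat = (M / 3) * np.sum(r_slow ** 4)
assert abs(sig2_eta_hat - sigma_eta ** 2) / sigma_eta ** 2 < 0.1
```

These daily estimates are the inputs that conditional optimal sampling plugs into the formulas of Proposition 2.1; the 26-return quarticity estimate is noisy, which is why subsample averaging helps.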
To measure realized variance we use Eq. (A.5) at a low frequency. To measure the second moment of microstructure noise we use the estimator proposed by Bandi and Russell (2006):

σ̂_η² ≃ (1/(2M̄)) ∑_{t_k ∈ M̄} (r_{t_k})²,  (3.3)
where M̄ is the finest possible grid (every second), M̄ = 23,400 is the number of elements in the finest possible grid minus one, M_j^m is the fifteen-minute grid, and m is the number of grids.

4. Empirical results

We are armed with theoretical predictions about how various volatility measures should behave as far as prediction is concerned. How does this all play out in real data? In this section we describe both the in-sample and out-of-sample empirical forecasting performance of the different estimators. Due to space limitations we only present the details of the optimal sampling results; before we do so, we briefly describe the one-minute and five-minute frequency results that can be found in a technical appendix (Ghysels and Sinko, 2009). We find that, within the MIDAS framework, the five-minute frequency can be considered a low-noise environment in which all of our theoretical predictions hold. In contrast, for the one-minute frequency we do not find many coherent empirical results, nor results that square with the theory. In general, the out-of-sample results support our in-sample prediction findings. Averaging over subsamples estimators based on five-minute returns outperform the rest. The RV_TS and RV_av estimators perform the same. Comparing the RV estimator's performance with that of the two-scales and averaging over subsamples estimators reveals that the RV estimator produces worse results. Finally, one-minute out-of-sample and in-sample predictions share similar patterns: there is no uniformly better estimator in terms of predictive power at this sampling frequency. In the remainder of this section, we focus on optimal sampling in terms of MSE of prediction. There are at least two reasons why we want to examine empirically optimal sampling issues in the context of prediction of volatility in the presence of microstructure noise.
First, we noted that the five-minute empirical results align with the theoretical predictions regarding realized volatility measures, yet for the one-minute frequency there is no volatility measure that uniformly outperforms the others. Second, in Section 2.3 we established a number of theoretical results that we now verify empirically. Notably, we showed in Section 2.3 that the optimal sampling frequencies in terms of MSE of prediction should be higher than those in terms of MSE of the estimator. This section answers the following questions: (i) What is the unconditional optimal frequency for individual stocks? (ii) What is the conditional optimal frequency implied by estimated daily characteristics of market microstructure noise and realized volatility? (iii) What is the gain in terms of R² of the second method compared to the first? To do this, we need to estimate the daily variance and quarticity of the efficient log-price process as well as the second and fourth moments of market microstructure noise. When considering optimal sampling we need to broaden the empirical specification used for the regressions, as the sampling frequencies of the LHS and RHS variables will differ. In particular, our analysis makes use of the following regression framework:

RV_y^m(t + H, t) = μ_H + φ_H ∑_{k=0}^{k_max} b_H^{m′}(k, θ) RV_{y′}^{m′}(t − k, t − k − 1) + ϵ_{Ht},  (4.1)
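The beta polynomial parameterization named above can be sketched as follows (the parameter values are illustrative, and normalizing the weights to sum to one, with the scale carried by φ_H, is an assumption of our sketch):

```python
import numpy as np

# Sketch of two-parameter beta-polynomial MIDAS lag weights b(k; theta)
# for a regression of the form of Eq. (4.1).  Parameter values are
# illustrative, not estimated.
def beta_weights(kmax, theta1, theta2):
    """Normalized beta-polynomial weights over lags k = 0..kmax."""
    x = (np.arange(kmax + 1) + 1.0) / (kmax + 2.0)   # lags mapped into (0, 1)
    w = x ** (theta1 - 1.0) * (1.0 - x) ** (theta2 - 1.0)
    return w / w.sum()

w = beta_weights(kmax=50, theta1=1.0, theta2=5.0)    # smoothly decaying weights
assert abs(w.sum() - 1.0) < 1e-12 and w[0] > w[-1]
```

A forecast is then μ_H + φ_H · (w @ rv_lags) for a vector rv_lags holding the k_max + 1 lagged RV measures; the two θ parameters let a single pair of coefficients trace out many hump or decay shapes.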
where for the left-hand side (1) RV_y^m = RV_TS is computed using five-minute, one-minute or two-second sampling frequencies m, and (2) H is a five-day period. For the right-hand side we use (1) a set of variables RV_{y′} = {RV, RV_TSd, RV_AC1, RV_NW, RV_av, RV_TS} constructed using sampling frequencies m′ from 2 s to 10 min, and (2) the conditional optimal sampling frequency determined empirically using the daily estimators from Section 3.2. To keep the empirical analysis concise we study two stocks, IBM and AA, as representatives of a liquid and a relatively illiquid stock. The results for the unconditional and conditional sampling frequencies are presented in Figs. 2 and 3. Each stock has six subplots corresponding to the different RV_{y′}^{m′}. For the construction of each plot we used the entire 2000–2004 sample; our findings do not change if we use only the 2000–2002 or 2003–2004 subsamples. The R²'s obtained with the conditional optimal sampling, which varies every day, are plotted using colors associated with the sampling frequency of the LHS variable: red for the 2 s frequency, green for 1 min and blue for 5 min. The vertical lines correspond to ex ante unconditional sampling frequencies based on the unconditional measures of the noise and signal moments and Proposition 2.1. Before discussing the empirical findings, let us first recall what the theory tells us about the patterns of predictive power as we change the sampling frequency in the measurement of volatility. As mentioned in Section 2.3, we divide the estimators into two groups. Group A = {RV, RV_AC1} consists of estimators whose variances increase as a function of the sampling frequency, whereas Group B = {RV_av, RV_TS} consists of estimators whose variances decrease as the sampling frequency increases. Hence, one might expect from theory that Group B estimators should outperform Group A estimators at relatively high frequencies.
However, this statement is based on asymptotic arguments, which may or may not be a good description of what we see in empirical applications. Indeed, the decrease of the variance in the Group B cases is based on asymptotic arguments that may not directly apply to our empirical analysis, since the finest possible grid M̄ is constant, while the asymptotics assume that as the subsampling frequency increases, the highest frequency associated with M̄ increases too. Therefore, because at lower frequencies Group B estimators have lower discretization error, the explanatory power of Group B estimators at lower frequencies should be higher than that of Group A estimators. For the same reason, the optimal unconditional frequency for these estimators should be higher than for the RV estimator. Along the same lines, what do we expect from theory for the Group A = {RV, RV_AC1} estimators? Let us start with the RV estimator. As the sampling frequency increases, the discretization noise of the estimator decreases; at the same time, the impact of the market microstructure noise becomes more significant. Thus the R² as a function of the sampling frequency m (or the number of log-price observations per day M) can be either increasing (the sampling frequency is too low to achieve the optimum), decreasing (the sampling frequency is too high to achieve the optimum), or hump-shaped (there is an optimum within the interval considered). The same pattern should hold for the Zhou estimator RV_AC1. Moreover, as the sampling frequency increases, the probability of two consecutive non-zero returns decreases. Combined with the previous-tick method we use, this leads the RV_AC1 estimator to converge to the RV estimator at ultra-high frequencies. In addition, the maximum of the ''hump'' for RV_AC1 should be achieved at a higher frequency than for the ''plain vanilla'' estimator (see the discussion after Proposition 2.1).
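The hump-shape argument for plain RV can be illustrated with the leading terms of Theorem 2.1 (a sketch with assumed parameter values):

```python
import numpy as np

# Sketch of the hump-shape argument using the leading terms of Theorem 2.1
# with h = 1/M: the discretization part of Var(RV) behaves like 2Q/M while
# the noise part grows linearly in M, so their sum has an interior minimum.
# (The Var(IV) term is constant in M and omitted; parameters are assumed.)
Q, kappa, s2 = 1e-7, 3.0, 1e-6                 # quarticity, noise kurtosis, Var(eta)
M = np.arange(10, 2000)
var_rv = 2 * Q / M + (4 * M - 2) * (kappa - 1) * s2 ** 2 + 4 * M * s2 ** 2
M_star = int(M[np.argmin(var_rv)])
assert 10 < M_star < 1999                      # interior optimum: a "hump" in R^2
```

With these values the minimizer (about 130 returns per day) agrees with the closed form √(Q/(2κσ_η⁴)) from Proposition 2.1, since the same trade-off drives both.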
The behavior of Group B estimators depends not only on M but also on the finest possible grid over the day, M̄. It follows
Fig. 2. R² as a function of frequency. IBM stock. Full sample. The figure shows the regression R² as a function of the sampling frequency. The LHS variable is RV, constructed using 5 min, 1 min and 2 s sampling frequencies. R² is computed for the following RHS variables: RV, RV_AC1, RV_av, RV_NW, RV_TS and RV_TSd, using sampling frequencies from 2 to 600 s. The explanatory power of the conditional optimal frequency for a given estimator is plotted using the same color as the unconditional one. The results are for the five-year sample (Jan. 2000–Dec. 2004).
directly from the proof of Proposition 2.1 that when the ratio φ = M̄/M is large enough, the R̄V estimator behaves like the ''plain vanilla'' realized volatility estimator. The behavior of the variance of RV_TS resembles that of R̄V. Further, we would expect the change in the LHS sampling frequency to increase the explanatory power of the regression if the resulting error (the sum of the discretization and the market microstructure noise) decreases. That is, as long as the decrease in the discretization noise is not offset by the increase in the market microstructure noise, the explanatory power of the regression increases, and vice versa. In addition, the unconditional sampling frequency optimum (the maximum of the hump) should stay constant. How do these plots match up with the theoretical predictions? As we mentioned in Section 2.3, the i.i.d. market microstructure noise assumption is the most questionable. As a result, empirical violation of the theory is most likely due to time-varying market microstructure noise. Some patterns contradict the theory, namely: (a) the patterns of the plots should look like parallel shifts across the three colors, and this is not the case for AA, nor for the high-frequency segments of the red lines for IBM; (b) conditional optimal sampling should be at least as good as unconditional optimal sampling. Conditional optimal sampling yields predictions that are reasonably close to the maximal R² of the regressions for each fixed frequency; yet, in general, they are slightly worse than the unconditional ones; and (c) the red lines are upward trending for all estimators towards the high frequencies, which seems to imply that the variance of microstructure noise is predictable. Nevertheless, some findings square with the theory. All of these findings relate to the lower frequencies and to stocks with higher liquidity, namely: (a) Group B estimators outperform Group A estimators for the relatively low frequencies, which confirms the simulation results reported in Aït-Sahalia and Mancini (2008); (b) comparing the Zhou estimator with all other estimators reveals that it performs poorly for low frequencies, confirming earlier results reported in Ghysels and Sinko (2006) and Ghysels et al. (2007); (c) the unconditional optimal sampling frequency of the Zhou estimator is higher than that of the RV estimator; (d) the patterns of the plots look like parallel shifts across the blue and green colors for the IBM stock and for the lower-frequency segment of the AA stock; and (e) the pattern of the R̄V estimator R² is roughly identical to the prediction power of the RV_TS estimator. In the remainder of the section we take a closer look at, respectively, the contradictions and the support for the theoretical findings. It appears that the contradictions occur mostly in the cases where the LHS and RHS variables contain a significant amount of market microstructure noise. For example, for IBM stock
Fig. 3. R² as a function of frequency. AA stock. Full sample. The figure shows the regression R² as a function of sampling frequency. The LHS variable is R̄V, constructed using 5 min, 1 min and 2 s sampling frequencies. R² is computed for the following RHS variables: RV, RV_AC1, R̄V, RV_NW, RV_TS, RV_TSd, at sampling frequencies from 2 to 600 s. The explanatory power of the conditional optimal frequency for a given estimator is plotted in the same color as the unconditional one. The results are given for the five-year sample (Jan. 2000–Dec. 2004).
the theory holds for the five-minute and one-minute LHS sampling frequencies and for the entire range of RHS frequencies. The largest number of discrepancies from the theory can be observed for the two-second LHS and for RHS frequencies high enough to contain a significant amount of market microstructure noise. This seems to suggest that, particularly for an illiquid stock like AA, the variance of microstructure noise is predictable, a topic of further research covered in Sinko (2007). We noted that, in support of the theory, Group B estimators have the same predictive power patterns. These two estimators have asymptotically zero variance, and the two-scales correction only removes the bias from the R̄V estimator. However, if the bias is constant over time, it does not affect the predictive power of the regression, as it only changes the intercept. Therefore, the performance of these estimators should be approximately the same, and this is exactly what we observe in the data. Note, however, that in real-data applications with possibly time-varying variance of market microstructure noise, RV_TS should be preferred. Finally, we would like to discuss the conditional optimal sampling results. They are reasonably close to the maximal R² of the regressions for each frequency, even though they are in general worse than the unconditional ones. To estimate the optimal frequencies, we use the approximation derived in Proposition 2.1 with parameter estimates from Section 3.2. The histograms of kurtoses and of conditional optimal sampling frequencies implied by the theoretical model for the AA and IBM stocks appear in Fig. 4. The major differences in the conditional optimal sampling frequencies across the two stocks can be captured by the difference in kurtosis between them. The kurtosis histogram of the AA stock is much wider than that of the IBM stock. This is the main factor that widens the conditional sampling frequency histograms. For the Zhou estimator, whose optimal frequency does not depend on the market microstructure kurtosis, the conditional optimal sampling frequency is usually higher than the highest sampling frequency considered (2 s). The histograms of this estimator look the same for the two stocks considered. The model implies that the optimal sampling frequency for the RV_TS estimator is higher than the optimal sampling frequency of the RV estimators, even though this does not change the explanatory power of the regression in a significant way. The RV_AC1 optimal sampling frequency is much higher than the optimal frequencies of the other estimators. As a result, for the lower frequencies
Fig. 4. Estimated kurtosis and conditional sampling frequencies for AA and IBM stocks. The figure shows estimated kurtosis and approximate conditional optimal sampling frequencies constructed on a daily basis using Proposition 2.1 and Section 3.2. The results are provided for IBM and AA stocks and five-year sample (Jan. 2000–Dec. 2004).
(5 min, 1 min) it is usually outperformed by the others. The R̄V estimator has a higher optimal sampling frequency than the RV_TS and RV estimators. As mentioned before, this results from the fact that in this subsection the finest possible grid (1 s frequency) stays constant as the subsampling frequency increases.

5. Conclusions

We studied the forecasting of future volatility using past volatility measures unadjusted and adjusted for microstructure noise. We examined the population properties of a regression prediction problem involving measures of volatility contaminated by noise. The general regression framework allowed us to compare the population performance of various estimators and also to study optimal sampling issues in the context of volatility prediction with microstructure noise. We also conducted an extensive empirical study of forecasting with microstructure noise using the 30 Dow Jones stocks. Our empirical results suggest that for these data, within the class of quadratic variation measures, the subsampling and averaging approach (see Zhang et al. (2005)) constitutes the class of estimators that performs best in a prediction context. Overall, our empirical findings with five-minute sampling schemes square with the theory developed in the paper and confirm earlier findings reported in Aït-Sahalia and Mancini (2008), Ghysels and Sinko (2006)
and Ghysels et al. (2007). We also examined the empirics of optimal conditional and unconditional sampling. The optimal sampling exercise compares the explanatory power patterns implied by the theory with the ones estimated from the data. This comparison demonstrates that the theory provides a reasonable explanation for many features of the empirical data for a liquid stock like IBM. For an illiquid stock like AA, the findings do not square as well with the theory. We conjecture that what is missing is a model that can capture the more complex time-dependent characteristics of market microstructure noise. This is further explored in Sinko (2007).

Appendix. Technical details

A.1. Volatility estimators

Define two time grids. \(\mathcal{M} = \{t_0, \ldots, t_M\}\) corresponds to the largest possible number of equally-spaced observations per day, measured in seconds. \(\mathcal{T} = \{\tau_0, \ldots, \tau_T\}\) corresponds to the actual-time records of the transaction tick-by-tick price data. Any equidistant subsample grid can be represented as \(\mathcal{M}_j^m = \{t_j, t_{j+m}, t_{j+2m}, \ldots\}\) with \(j = 0, \ldots, m-1\), \(\bigcup_{j=0}^{m-1} \mathcal{M}_j^m = \mathcal{M}\), and \(\mathcal{M}_j^m \cap \mathcal{M}_i^m = \emptyset\) for all \(i \neq j\). Using Eq. (2.1), the log-return on the grid \(\mathcal{M}^m\) for two consecutive times \(t_{k-1}, t_k \in \mathcal{M}^m\) is defined as
\[
r_{t_k,m} = p_{t_k} - p_{t_{k-1}} = p^*_{t_k} - p^*_{t_{k-1}} + \eta_{t_k} - \eta_{t_{k-1}} = r^*_{t_k,m} + e_{t_k,m}. \tag{A.1}
\]
Under Assumption 2.1, \(\mathrm{Var}(e_{t_k,m}) = 2\sigma_\eta^2\). The first estimator we consider is the daily RV estimator:
\[
RV^m = \sum_{t_k, t_{k-1} \in \mathcal{M}^m} (r_{t_k,m})^2 = \sum_{t_{k,-1} \in \mathcal{M}^m} (r^*_{t_k,m})^2 + 2 \sum_{t_{k,-1} \in \mathcal{M}^m} e_{t_k,m}\, r^*_{t_k,m} + \sum_{t_{k,-1} \in \mathcal{M}^m} e_{t_k,m}^2. \tag{A.2}
\]
To simplify notation, we write \(t_{k,-1} \in \mathcal{M}^m\) instead of \(t_k, t_{k-1} \in \mathcal{M}^m\). Without market microstructure noise (\(e = 0\)) the previous equation defines a consistent estimator of integrated variance with respect to the number of observations M in the subsample (the sampling frequency). However, it is biased and inconsistent in the presence of microstructure noise. The second estimator we consider is studied by Zhou (1996) and Hansen and Lunde (2006). We adopt the notation of the authors and call it the \(RV^m_{AC_1}\) estimator:
\[
RV^m_{AC_1} = \tilde\gamma_0^m + 2\tilde\gamma_1^m, \qquad \tilde\gamma_j^m = \frac{M}{M-j} \sum_{t_{k,-1} \in \mathcal{M}^m} r^m_{t_k}\, r^m_{t_{k+j}}. \tag{A.3}
\]
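As a concrete illustration of Eqs. (A.2)–(A.3), a hypothetical sketch of ours (not code from the paper), both estimators can be computed from a vector of log-prices:

```python
import numpy as np

def rv(log_prices):
    """Plain realized variance, Eq. (A.2): sum of squared log-returns."""
    r = np.diff(log_prices)
    return float(np.sum(r ** 2))

def rv_ac1(log_prices):
    """Zhou's RV_AC1, Eq. (A.3): gamma_0 + 2*gamma_1, where gamma_j is the
    j-th return autocovariance sum scaled by M/(M - j)."""
    r = np.diff(log_prices)
    M = len(r)
    gamma0 = np.sum(r * r)                           # j = 0 term (scale M/M = 1)
    gamma1 = (M / (M - 1)) * np.sum(r[:-1] * r[1:])  # j = 1 term
    return float(gamma0 + 2.0 * gamma1)
```

Under i.i.d. noise the first-order autocovariance of observed returns is −σ_η², so in expectation the 2γ̃₁ term offsets the 2Mσ_η² noise bias carried by γ̃₀.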
Under the i.i.d. assumption on the microstructure noise, \(E(r^m_{t_{k+1}} r^m_{t_k}) = -\sigma_\eta^2\) and \(E((r^m_{t_k})^2) = \mathrm{Var}(r^{*m}_{t_k}) + 2\sigma_\eta^2\), and thus the estimator is unbiased. The third estimator is consistent. It was proposed by Aït-Sahalia et al. (2005):
\[
RV^m_{TS} = \frac{1}{m}\sum_{j=0}^{m-1} RV^{m,j} - \frac{\bar M}{M}\, RV^1, \tag{A.4}
\]
\[
\overline{RV}^m = \frac{1}{m}\sum_{j=0}^{m-1} RV^{m,j} = \frac{1}{m}\sum_{j=0}^{m-1} \sum_{t_{k,-1} \in \mathcal{M}_j^m} (r^m_{t_k})^2, \tag{A.5}
\]
where \(RV^{m,j}\) is the realized volatility associated with subgrid \(\mathcal{M}_j^m\), \(\bar M\) is the number of observations per day starting from the j-th observation, and \(RV^1\) is the realized volatility computed using all equally-spaced data available.

The last two estimators we consider capture the fact that, in reality, microstructure noise can be serially correlated. The first is a modification of the estimator defined in (A.4), introduced in Aït-Sahalia et al. (2005). Instead of zero correlation of the noise component, they use a much weaker restriction:
\[
\mathrm{Corr}(e_{\tau_i}, e_{\tau_{i+k}}) \le \rho^k \quad \text{for some } |\rho| < 1,
\]
where \(\tau_1 < \cdots < \tau_T\) is the sequence of transaction tick-by-tick times belonging to \(\mathcal{T}\). Under these conditions the \(RV_{TSd}\) estimator is unbiased. Define the minimum step corresponding to the ''near uncorrelated frequency'' as \(m'\) and the associated sample size of the subsample as \(\bar M'\). The modified estimator \(RV_{TSd}\) is
\[
RV_{TSd} = \frac{1}{m'}\sum_{j=0}^{m'-1} RV^{m',j} - \frac{\bar M'}{M}\, RV^1. \tag{A.6}
\]
This equation is the analog of Eq. (A.4); the only difference is the second term, which captures the non-zero autocorrelation of the noise.

The last estimator is based on the tick-time grid \(\mathcal{T}\) instead of the calendar-time grid \(\mathcal{M}\). It was proposed by Hansen and Lunde (2006). For T transactions occurring during the day at times \(\tau_i\) and window w, based on their data sample, they conclude that about 15 lags are enough for the noise to be ''approximately uncorrelated''. We name this estimator \(RV^{1\,tick}_{ACNW_w}\) and define it as
\[
RV^{1\,tick}_{ACNW_w} = \sum_{j=-w}^{w} \frac{w - |j|}{w}\, \tilde\gamma_j^{1\,tick}, \tag{A.7}
\]
where \(\tilde\gamma_j^{1\,tick} = \tilde\gamma_{|j|}^{1\,tick} = \frac{T}{T-|j|}\sum_{i=1+|j|}^{T} (p_{t+\tau_i} - p_{t+\tau_{i-1}})(p_{t+\tau_{i-|j|}} - p_{t+\tau_{i-|j|-1}})\). Although this estimator is inconsistent, it provides material for comparison between tick-time and calendar-time estimators for prediction purposes.

A.2. Proof of Theorems 2.1 and 2.2

Using the properties of \(f_t\) defined in Assumption 2.1, the properties of integrated variance \(IV_{t,h}\) and squared efficient returns \((r^*_{t,h})^2\) over a period h under the no-leverage assumption are:
\[
IV_{t,h} = \int_{t-h}^{t} \sigma_s^2\, ds; \qquad E\,IV_{t+h,h} = a_0 h; \qquad \mathrm{Var}\,IV_{t+h,h} = 2\sum_{i=1}^{p} \frac{a_i^2}{\lambda_i^2}\left(e^{-\lambda_i h} - 1 + \lambda_i h\right); \tag{A.8}
\]
\[
\mathrm{Cov}\left(IV_{t+h,h},\, IV_{t-s,m}\right) = \sum_{i=1}^{p} \frac{a_i^2}{\lambda_i^2}\, e^{-\lambda_i s}\left(1 - e^{-\lambda_i h}\right)\left(1 - e^{-\lambda_i m}\right);
\]
\[
(r^*_{t,h})^2 = \int_{t-h}^{t} \sigma_s^2\, ds + 2\int_{t-h}^{t}\!\!\int_{t-h}^{u} dp^*_s\, dp^*_u = IV_{t,h} + Z_{t,h};
\]
and, for the discretization error \(Z_{t,h}\) under the no-leverage assumption, \(\forall s, \delta \ge 0\):
\[
E(Z_{t,h}) = 0, \qquad \mathrm{Cov}(IV_{t,h}, Z_{t,h}) = 0, \qquad E(Z_{t,h}\, Z_{t+\delta+s,s}) = 0,
\]
\[
\mathrm{Var}\,Z_{t,h} = 4\left[\frac{a_0^2 h^2}{2} + \sum_{i=1}^{p} \frac{a_i^2}{\lambda_i^2}\left(e^{-\lambda_i h} - 1 + \lambda_i h\right)\right]. \tag{A.9}
\]

Lemma A.1. \(\forall l > 0, \delta > 0\): for \(l > \delta\), \(\mathrm{Cov}(Z_{t,s}, Z_{t+\delta,l}) = \mathrm{Var}(Z_{t,(l-\delta)\wedge s})\); for \(\delta \ge l\), \(\mathrm{Cov}(Z_{t,s}, Z_{t+\delta,l}) = 0\).

Proof. For \(l < s\):
\[
Z_{t,s} = 2\int_{t-s}^{t}\!\!\int_{t-s}^{u} \sigma_\tau\, dW_\tau\, \sigma_u\, dW_u \equiv 2\int_{t-s}^{t}\!\!\int_{t-s}^{u} dp^*_\tau\, dp^*_u
= 2\int_{t-s}^{t-l}\!\!\int_{t-s}^{u} dp^*_\tau\, dp^*_u + 2\int_{t-l}^{t}\!\!\int_{t-l}^{u} dp^*_\tau\, dp^*_u + 2\int_{t-l}^{t}\!\!\int_{t-s}^{t-l} dp^*_\tau\, dp^*_u
\]
\[
= Z_{t-l,s-l} + Z_{t,l} + 2\, r^*_{t-l,s-l}\, r^*_{t,l}, \tag{A.10}
\]
with \(\mathrm{Cov}(Z_{t-l,s-l},\, 2r^*_{t-l,s-l} r^*_{t,l}) = \mathrm{Cov}(Z_{t,l},\, 2r^*_{t-l,s-l} r^*_{t,l}) = 0\). For \(\delta \ge l\),
\[
\mathrm{Cov}(Z_{t,s}, Z_{t+\delta,l}) = 4E\left[\int_{t-s}^{t}\!\!\int_{t-s}^{u} dp^*_\tau\, dp^*_u \cdot \int_{t+\delta-l}^{t+\delta}\!\!\int_{t+\delta-l}^{u} dp^*_\tau\, dp^*_u\right]
= 4E\left[\int_{t-s}^{t}\!\!\int_{t-s}^{u} dp^*_\tau\, dp^*_u \cdot E\left(\int_{t+\delta-l}^{t+\delta}\!\!\int_{t+\delta-l}^{u} dp^*_\tau\, dp^*_u \,\Big|\, p^*_z,\, 0 \le z \le t\right)\right] = 0. \tag{A.11}
\]
For \(\delta < l\), \(l - \delta < s\), using (A.10): \(Z_{t,s} = Z_{t,l-\delta} + Z_{t-l+\delta,s+\delta-l} + 2r^*_{t,l-\delta}\, r^*_{t-l+\delta,s+\delta-l}\) and \(Z_{t+\delta,l} = Z_{t+\delta,\delta} + Z_{t,l-\delta} + 2r^*_{t+\delta,\delta}\, r^*_{t,l-\delta}\). Using (A.11), \(\mathrm{Cov}(Z_{t,s}, Z_{t+\delta,l}) = \mathrm{Var}(Z_{t,l-\delta})\). For \(\delta < l\), \(l - \delta \ge s\), by analogy, \(\mathrm{Cov}(Z_{t,s}, Z_{t+\delta,l}) = \mathrm{Var}(Z_{t,s})\). □
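The subsample-averaging and two-scales constructions in Eqs. (A.4)–(A.5) can be sketched as follows (our hypothetical illustration; `m` is the subsampling step on the finest grid, and M̄ = (M + 1 − m)/m is the per-subgrid sample size used in the bias correction):

```python
import numpy as np

def rv(log_prices):
    """Plain realized variance: sum of squared log-returns."""
    r = np.diff(log_prices)
    return float(np.sum(r ** 2))

def rv_avg(log_prices, m):
    """Subsample-averaged RV, Eq. (A.5): average of plain RV over the m
    offset subgrids M_j^m."""
    return float(np.mean([rv(log_prices[j::m]) for j in range(m)]))

def rv_ts(log_prices, m):
    """Two-scales RV, Eq. (A.4): averaged RV minus (Mbar/M) times the
    all-data RV, which removes the 2*Mbar*sigma_eta^2 noise bias."""
    M = len(log_prices) - 1      # number of returns on the finest grid
    Mbar = (M + 1 - m) / m       # per-subgrid sample size
    return rv_avg(log_prices, m) - (Mbar / M) * rv(log_prices)
```

Since E[R̄Vᵐ] ≈ IV + 2M̄σ_η² and (M̄/M)E[RV¹] ≈ (M̄/M)IV + 2M̄σ_η², the subtraction cancels the noise bias while leaving the IV signal almost intact.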
Lemma A.2. Define the finest (every second) time grid \(\mathcal{M} = \{t_0, t_1, \ldots, t_M\}\) and m subgrids \(\mathcal{M}_i^m = \{t'_0, t'_1, t'_2, \ldots, t'_{\bar M}\}\), \(t'_k \equiv t_{i+km}\), \(i \in \{0, 1, 2, \ldots, m-1\}\), with corresponding \(r_{t_k,m} \equiv p_{t_k} - p_{t_{k-m}}\) and \(e_{t_k,m} \equiv \eta_{t_k} - \eta_{t_{k-m}}\). Further define \(h \equiv m/M\). Then \(\forall \mathcal{M}_i^m, \mathcal{M}_j^m\), \(i \neq j\), \(\mathcal{M}_j^m \cap \mathcal{M}_i^m = \emptyset\) and:
\[
\mathrm{Cov}\left(\sum_{t_{k,-1} \in \mathcal{M}_i^m} e_{t_k}^2,\, \sum_{t_{k,-1} \in \mathcal{M}_j^m} e_{t_k}^2\right) = 0, \qquad E\left(\sum_{t_{k,-1} \in \mathcal{M}_i^m} e_{t_k}\, r^*_{t_k}\right) = 0; \tag{A.12}
\]
\[
\mathrm{Var}\left(\sum_{t_{k,-1} \in \mathcal{M}_j^m} e_{t_k}^2\right) = \begin{cases} (4\bar M - 1)\,\mathrm{Var}(\eta_0^2), & j \in \{0, m-1\}, \\ 4\bar M\,\mathrm{Var}(\eta_0^2), & j \in \{1, \ldots, m-2\}, \end{cases} \qquad \mathrm{Var}\left(\sum_{t_{k,-1} \in \mathcal{M}} e_{t_k}^2\right) = (4M - 2)\,\mathrm{Var}(\eta_0^2) + 4M\sigma_\eta^4; \tag{A.13}
\]
\[
\mathrm{Var}\left(\sum_{t_{k,-1} \in \mathcal{M}_j^m} e_{t_k}\, r^*_{t_k}\right) = 2\, a_0\, h\, \bar M\, \sigma_\eta^2; \tag{A.14}
\]
for \(i \ge j\),
\[
\mathrm{Cov}\left(\sum_{k=1}^{\bar M} Z_{\frac{j}{M}+kh,h},\, \sum_{k=1}^{\bar M} Z_{\frac{i}{M}+kh,h}\right) = (\bar M - 1)\,\mathrm{Var}\left(Z_{\frac{i}{M}+h,\frac{i-j}{M}}\right) + \bar M\,\mathrm{Var}\left(Z_{\frac{j}{M}+h,h-\frac{i-j}{M}}\right); \tag{A.15}
\]
and
\[
\mathrm{Cov}\left(\sum_{k=1}^{\bar M} Z_{\frac{i}{M}+kh,h},\, \sum_{k=1}^{M} Z_{\frac{k}{M},\frac{1}{M}}\right) = (M - m)\,\mathrm{Var}\left(Z_{\frac{1}{M},\frac{1}{M}}\right). \tag{A.16}
\]

Proof. Eqs. (A.12) hold since \(\mathcal{M}_j^m \cap \mathcal{M}_i^m = \emptyset\) for \(i \neq j\), \(\eta_t\) is i.i.d., and \(\eta_t\) and \(r^*_t\) are independent, so that
\[
E\left(\sum_{t_{k,-1} \in \mathcal{M}_i^m} e_{t_k}^2 \sum_{t_{k,-1} \in \mathcal{M}_j^m} e_{t_k}^2\right) = E\left(\sum_{t_{k,-1} \in \mathcal{M}_i^m} e_{t_k}^2\right) E\left(\sum_{t_{k,-1} \in \mathcal{M}_j^m} e_{t_k}^2\right), \qquad
E\left(\sum_{t_{k,-1} \in \mathcal{M}_i^m} e_{t_k}\, r^*_{t_k}\right) = \sum_{t_{k,-1} \in \mathcal{M}_i^m} E\left(e_{t_k}\right) E\left(r^*_{t_k}\right) = 0.
\]
For (A.13), expanding \(\sum_{t_{k,-1} \in \mathcal{M}} e_{t_k}^2 = \eta_0^2 + \eta_1^2 + 2\sum_{i=1}^{M-1} \eta_{i/M}^2 - 2\sum_{i=1}^{M} \eta_{i/M}\,\eta_{(i-1)/M}\), the squared-noise terms contribute \((4M-2)\,\mathrm{Var}(\eta_0^2)\), the cross products contribute \(4M\sigma_\eta^4\), and all remaining covariances vanish because \(\eta\) is i.i.d. with zero mean. Eq. (A.14) follows from \(\mathrm{Var}(e_{t_k} r^*_{t_k}) = E(e_{t_k}^2)\,E(r^{*2}_{t_k}) = 2\sigma_\eta^2\, a_0 h\), summed over the \(\bar M\) returns of the subgrid. Eqs. (A.15) and (A.16) follow from Lemma A.1: each \(Z_{\frac{i}{M}+kh,h}\) has non-zero covariance with at most two terms of the other sum, with, for \(i \ge j\),
\[
\mathrm{Cov}\left(Z_{\frac{j}{M}+kh,h},\, Z_{\frac{i}{M}+kh,h}\right) = \mathrm{Var}\left(Z_{\frac{j}{M}+h,h-\frac{i-j}{M}}\right), \qquad
\mathrm{Cov}\left(Z_{\frac{j}{M}+kh,h},\, Z_{\frac{i}{M}+(k-1)h,h}\right) = \mathrm{Var}\left(Z_{\frac{j}{M}+h,\frac{i-j}{M}}\right),
\]
and, similarly, \(\mathrm{Cov}(Z_{\frac{l}{M},\frac{1}{M}}, \sum_{k=1}^{\bar M} Z_{\frac{i}{M}+kh,h}) = \mathrm{Var}(Z_{\frac{1}{M},\frac{1}{M}})\) for \(l \in [i, M-m+i+1)\) and 0 otherwise, which yields the \((M-m)\) factor in (A.16). □

Lemma A.3. Given the conditions above,
\[
\mathrm{Var}\left(\frac{1}{m}\sum_{j=0}^{m-1}\sum_{k=1}^{\bar M} r^{*2}_{\frac{j}{M}+kh,h}\right) = \mathrm{Var}\left(\frac{1}{m}\sum_{j=0}^{m-1}\sum_{k=1}^{\bar M} IV_{\frac{j}{M}+kh,h}\right) + \mathrm{Var}\left(\frac{1}{m}\sum_{j=0}^{m-1}\sum_{k=1}^{\bar M} Z_{\frac{j}{M}+kh,h}\right). \tag{A.17}
\]
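The i.i.d.-noise moment in (A.13), Var(Σ e²_{t_k}) = (4M − 2)Var(η₀²) + 4Mσ_η⁴, is easy to confirm by Monte Carlo. This is our own sketch, assuming Gaussian noise, for which Var(η₀²) = 2σ_η⁴:

```python
import numpy as np

rng = np.random.default_rng(1)
M, sig_eta, n_rep = 10, 1.0, 400_000

# e_{t_k} = eta_{t_k} - eta_{t_{k-1}} with i.i.d. Gaussian noise eta
eta = rng.normal(0.0, sig_eta, (n_rep, M + 1))
e2_sum = np.sum(np.diff(eta, axis=1) ** 2, axis=1)   # sum_k e_{t_k}^2 per replication

var_eta2 = 2.0 * sig_eta ** 4                        # Var(eta^2) for Gaussian noise
theory = (4 * M - 2) * var_eta2 + 4 * M * sig_eta ** 4
mc = float(np.var(e2_sum))
```

With these values the formula gives 116, and the Monte Carlo variance of the summed squared noise increments matches it closely.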
Proof of Theorem 2.1. Note that, by construction, \(M+1\), the number of observations in the finest grid \(\mathcal{M}\), equals \(\bar M m + m\), where \(\bar M + 1\) is the number of observations in the grid \(\mathcal{M}_j^m\). As a result \(\bar M m = M + 1 - m\) or, given \(h = m/M\), \(\bar M h = (M + 1 - m)/M\). The sum of efficient squared returns over some period can be separated into two parts: an IV part and a discretization-error part Z:
\[
\sum_{k=1}^{\bar M} r^{*2}_{\frac{j}{M}+kh,h} = \sum_{k=1}^{\bar M} IV_{\frac{j}{M}+kh,h} + \sum_{k=1}^{\bar M} Z_{\frac{j}{M}+kh,h} = IV_{\frac{j}{M}+\frac{M-m+1}{M},\frac{M-m+1}{M}} + \sum_{k=1}^{\bar M} Z_{\frac{j}{M}+kh,h}. \tag{A.18}
\]
Then, using the results of Lemmas A.1 and A.2 and the properties of integrated variance and discretization noise in (A.8)–(A.9),
\[
\mathrm{Var}(RV^m_j) = \mathrm{Var}\left(\sum_{k=1}^{\bar M} r^{*2}_{\frac{j}{M}+kh,h}\right) + 4\,\mathrm{Var}\left(\sum_{k=1}^{\bar M} e_{\frac{j}{M}+kh,h}\, r^*_{\frac{j}{M}+kh,h}\right) + \mathrm{Var}\left(\sum_{k=1}^{\bar M} e^2_{\frac{j}{M}+kh,h}\right)
\]
\[
= \mathrm{Var}\left(IV_{\bar M h + \frac{j}{M},\,\bar M h}\right) + \bar M\,\mathrm{Var}\left(Z_{\frac{j}{M}+h,h}\right) + 8\bar M a_0 h \sigma_\eta^2 + (4\bar M - 2)(\kappa-1)\sigma_\eta^4 + 4\bar M \sigma_\eta^4
\]
\[
= 2\sum_{i=1}^{p} \frac{a_i^2}{\lambda_i^2}\left(e^{-\lambda_i \bar M h} - 1 + \lambda_i \bar M h\right) + 4\bar M\left[\frac{a_0^2 h^2}{2} + \sum_{i=1}^{p} \frac{a_i^2}{\lambda_i^2}\left(e^{-\lambda_i h} - 1 + \lambda_i h\right)\right] + 8\bar M a_0 h \sigma_\eta^2 + (4\bar M - 2)(\kappa-1)\sigma_\eta^4 + 4\bar M \sigma_\eta^4, \tag{A.19}
\]
where κ is the kurtosis of the microstructure noise, so that \(\mathrm{Var}(\eta_0^2) = (\kappa-1)\sigma_\eta^4\). For the Zhou estimator,
\[
\mathrm{Var}(RV^m_{AC_1}) = \mathrm{Var}(\hat\gamma_0) + 4\frac{\bar M^2}{(\bar M-1)^2}\,\mathrm{Var}(\hat\gamma_1) + 4\frac{\bar M}{\bar M-1}\,\mathrm{Cov}(\hat\gamma_0, \hat\gamma_1), \tag{A.20}
\]
where \(\mathrm{Var}(\hat\gamma_0)\) is (A.19), and \(\mathrm{Var}(\hat\gamma_1)\) and \(\mathrm{Cov}(\hat\gamma_0, \hat\gamma_1)\) are computed using an appropriate modification of the BHLS appendix (p. 25), i.e. \(\mathrm{Cov}(\hat\gamma_0, \hat\gamma_1) = -2(\bar M-1)\left[(\kappa+1)\sigma_\eta^4 + 2a_0 h \sigma_\eta^2\right]\) and
\[
\mathrm{Var}(\hat\gamma_1) = (\bar M-1)\left[(\kappa+2)\sigma_\eta^4 + 4a_0 h \sigma_\eta^2 + 4\left(a_0^2 h^2 + \sum_{i=1}^{p} \frac{a_i^2}{\lambda_i^2}\left(1 - e^{-\lambda_i h}\right)^2\right)\right] + 2(\bar M-2)\sigma_\eta^4.
\]
The variance of the averaging-over-subsamples estimator (A.5) is
\[
\mathrm{Var}\left(\overline{RV}^m\right) = \mathrm{Var}\left(\frac{1}{m}\sum_{j=0}^{m-1}\sum_{k=1}^{\bar M} r^{*2}_{\frac{j}{M}+kh,h}\right) + 4\,\mathrm{Var}\left(\frac{1}{m}\sum_{j=0}^{m-1}\sum_{k=1}^{\bar M} e_{\frac{j}{M}+kh,h}\, r^*_{\frac{j}{M}+kh,h}\right) + \mathrm{Var}\left(\frac{1}{m}\sum_{j=0}^{m-1}\sum_{k=1}^{\bar M} e^2_{\frac{j}{M}+kh,h}\right).
\]
Using Lemma A.2, the cross and noise terms equal \(\frac{8}{m}\, a_0 h \bar M \sigma_\eta^2 + \frac{1}{m}\left[(4\bar M - 2)(\kappa-1)\sigma_\eta^4 + 4\bar M \sigma_\eta^4\right]\), and using Lemma A.3 the efficient-price term splits into an IV part and a Z part. Collecting the Z terms with the help of (A.15)–(A.16) gives
\[
\mathrm{Var}\left(\frac{1}{m}\sum_{j=0}^{m-1}\sum_{k=1}^{\bar M} Z_{\frac{j}{M}+kh,h}\right) = \frac{8(\bar M-1)}{m}\sum_{i=1}^{m-1}\left[\frac{i^2 a_0^2}{2M^2} + \sum_{k=1}^{p} \frac{a_k^2}{\lambda_k^2}\left(e^{-\lambda_k \frac{i}{M}} - 1 + \lambda_k \frac{i}{M}\right)\right] + \frac{8}{m^2}\sum_{i=1}^{m-1} i\left[\frac{i^2 a_0^2}{2M^2} + \sum_{k=1}^{p} \frac{a_k^2}{\lambda_k^2}\left(e^{-\lambda_k \frac{i}{M}} - 1 + \lambda_k \frac{i}{M}\right)\right] + \frac{\bar M}{m}\left[2h^2 a_0^2 + 4\sum_{k=1}^{p} \frac{a_k^2}{\lambda_k^2}\left(e^{-\lambda_k h} - 1 + \lambda_k h\right)\right], \tag{A.21}
\]
with the final result
\[
\mathrm{Var}\left(\overline{RV}^m\right) = \frac{8 a_0 h \bar M \sigma_\eta^2}{m} + \frac{(4\bar M - 2)(\kappa-1)\sigma_\eta^4 + 4\bar M \sigma_\eta^4}{m} + \frac{2}{m^2}\sum_{i=0}^{m-1}(1+2i)\sum_{k=1}^{p} \frac{a_k^2}{\lambda_k^2}\left[e^{-\lambda_k \frac{M-2i}{M}} - 1 + \lambda_k \frac{M-2i}{M}\right] + \frac{4}{m^2}\sum_{i=1}^{m-1}\sum_{j=0}^{i-1}\sum_{k=1}^{p} \frac{a_k^2}{\lambda_k^2}\left(1 - e^{-\lambda_k \frac{M-2i}{M}}\right)\left(1 - e^{-\lambda_k \frac{i-j}{M}}\right) + \frac{8(\bar M-1)}{m}\sum_{i=1}^{m-1}\left[\frac{i^2 a_0^2}{2M^2} + \sum_{k=1}^{p} \frac{a_k^2}{\lambda_k^2}\left(e^{-\lambda_k \frac{i}{M}} - 1 + \lambda_k \frac{i}{M}\right)\right] + \frac{8}{m^2}\sum_{i=1}^{m-1} i\left[\frac{i^2 a_0^2}{2M^2} + \sum_{k=1}^{p} \frac{a_k^2}{\lambda_k^2}\left(e^{-\lambda_k \frac{i}{M}} - 1 + \lambda_k \frac{i}{M}\right)\right] + \frac{\bar M}{m}\left[2h^2 a_0^2 + 4\sum_{k=1}^{p} \frac{a_k^2}{\lambda_k^2}\left(e^{-\lambda_k h} - 1 + \lambda_k h\right)\right]. \tag{A.22}
\]
And finally, the variance of the two-scales estimator (A.4) with small-sample correction is
\[
\mathrm{Var}(RV_{TS}) = \left(1 - \frac{\bar M}{M}\right)^{-2}\left[\mathrm{Var}\left(\overline{RV}^m\right) + \frac{\bar M^2}{M^2}\,\mathrm{Var}\left(RV^1\right) - 2\frac{\bar M}{M}\,\mathrm{Cov}\left(\overline{RV}^m, RV^1\right)\right]. \tag{A.23}
\]
The first and the second terms of the variance are already computed. For the covariance term, write
\[
\mathrm{Cov}\left(\overline{RV}^m, RV^1\right) = \frac{1}{m}\sum_{j=0}^{m-1}\left[\mathrm{Cov}\left(\sum_{k=1}^{\bar M} r^{*2}_{\frac{j}{M}+kh,h},\, \sum_{i=1}^{M} r^{*2}_{\frac{i}{M},\frac{1}{M}}\right) + 4\,\mathrm{Cov}\left(\sum_{k=1}^{\bar M} e_{\frac{j}{M}+kh,h}\, r^*_{\frac{j}{M}+kh,h},\, \sum_{i=1}^{M} e_{\frac{i}{M},\frac{1}{M}}\, r^*_{\frac{i}{M},\frac{1}{M}}\right) + \mathrm{Cov}\left(\sum_{k=1}^{\bar M} e^2_{\frac{j}{M}+kh,h},\, \sum_{i=1}^{M} e^2_{\frac{i}{M},\frac{1}{M}}\right)\right] \equiv A + B + C.
\]
Using Lemma A.2, the noise term is \(C = (4M - 2/m)(\kappa - 1)\sigma_\eta^4\), while the efficient-price term evaluates, using Lemma A.1 and (A.8), to
\[
A = \sum_{k=1}^{p} \frac{a_k^2}{\lambda_k^2}\left[\frac{2\left(1 - e^{-\lambda_k h}\right)}{m\left(1 - e^{-\lambda_k/M}\right)}\left(1 - e^{-\lambda_k \bar M h}\right) + \lambda_k \bar M h\right] + (M - m)\left[\frac{2a_0^2}{M^2} + 4\sum_{k=1}^{p} \frac{a_k^2}{\lambda_k^2}\left(e^{-\lambda_k/M} - 1 + \lambda_k/M\right)\right],
\]
with B computed analogously from the independence of \(\eta\) and \(r^*\). □

Proof of Theorem 2.2. Given \(\mathrm{Cov}(\widehat{RV}^i_{t+a,a}, \widehat{RV}^j_{t-s,b}) = \mathrm{Cov}(IV_{t+a,a}, IV_{t-s,b})\), \(s \ge 0\), the vector \(C(\widehat{RV}^i_{t+nh,nh}, \widehat{RV}^j_{t,h}, l)\) can be rewritten as \(C(\widehat{RV}_{t+nh,nh}, \widehat{RV}_{t,h}, l) = C(IV_{t+nh,nh}, IV_{t,h}, l)\), and the matrix \(M(\widehat{RV}_{t,h}, l)\) as \(M(\widehat{RV}_{t,h}, l) = M(IV_{t,h}, l) + I\left[\mathrm{Var}(\widehat{RV}_{t,h}) - \mathrm{Var}(IV_{t,h})\right]\), where I is the \((l+1)\times(l+1)\) identity matrix. Without loss of generality we can assume that
\[
M(\widehat{RV}^X_{t,h}) = M(\widehat{RV}^Y_{t,h}) + I\delta, \qquad \delta \ge 0.
\]
Then
\[
R^2(\widehat{RV}^Z_{t+nh,nh}, \widehat{RV}^Y_{t,h}, l) \ge R^2(\widehat{RV}^Z_{t+nh,nh}, \widehat{RV}^X_{t,h}, l)
\;\Leftrightarrow\; C(.)'\left[M(\widehat{RV}^Y_{t,h})^{-1} - M(\widehat{RV}^X_{t,h})^{-1}\right]C(.) \ge 0
\;\Leftrightarrow\; M(\widehat{RV}^Y_{t,h})^{-1} - \left[M(\widehat{RV}^Y_{t,h}) + I\delta\right]^{-1} \text{ p.s.d.},
\]
which holds because
\[
M(\widehat{RV}^Y_{t,h})^{-1} - \left[M(\widehat{RV}^Y_{t,h}) + I\delta\right]^{-1} = M(\widehat{RV}^Y_{t,h})^{-1}\left[M(\widehat{RV}^Y_{t,h}) + I\delta - M(\widehat{RV}^Y_{t,h})\right]\left[M(\widehat{RV}^Y_{t,h}) + I\delta\right]^{-1} = \delta\, M(\widehat{RV}^Y_{t,h})^{-1}\left[M(\widehat{RV}^Y_{t,h}) + I\delta\right]^{-1}
\]
is p.s.d. by construction. □

For numerical computations we use models M1–M3 described in Andersen et al. (2004).

Model M1 (GARCH diffusion), popularized by Nelson (1990): \(d\sigma_t^2 = \kappa(\theta - \sigma_t^2)\,dt + \sigma \sigma_t^2\,dW_t\). The spot variance of this process can be expressed as
\[
\sigma_t^2 = \theta + \frac{\theta\sigma}{\sqrt{2\kappa - \sigma^2}}\, P_1(f_t) \tag{A.24}
\]
with \(\lambda_1 = \kappa\), \(P_1(f_t) = \frac{\sqrt{2\kappa - \sigma^2}}{\theta\sigma}(f_t - \theta)\), and \(df_t = \kappa(\theta - f_t)\,dt + \sigma f_t\,dW_t\). For simulations we use \(a_0 = 0.686\), \(a_1 = 0.412\), \(\lambda_1 = 0.035\).

Model M2 (two-factor affine): \(\sigma_t^2 = \sigma_{1,t}^2 + \sigma_{2,t}^2\), \(d\sigma_{j,t}^2 = \kappa_j(\theta_j - \sigma_{j,t}^2)\,dt + \eta_j \sigma_{j,t}\,dW_t^j\), \(j = 1, 2\). The spot variance of this process can be expressed as
\[
\sigma_t^2 = (\theta_1 + \theta_2) - \frac{\theta_1}{\sqrt{\alpha_1}} L_1^{(\alpha_1-1)}(f_{1,t}) - \frac{\theta_2}{\sqrt{\alpha_2}} L_1^{(\alpha_2-1)}(f_{2,t}) \tag{A.25}
\]
where \(L_1^{(\alpha_j-1)}(f_{j,t})\) are the Laguerre polynomials of degree 1 with corresponding eigenvalues \(\lambda_j = \kappa_j\), \(f_{j,t} = (\alpha_j/\theta_j)\,\sigma_{j,t}^2\), \(df_{j,t} = \kappa_j(\alpha_j - f_{j,t})\,dt + \sqrt{2\kappa_j f_{j,t}}\,dW_t^j\), and \(\alpha_j = 2\kappa_j\theta_j/\eta_j^2\). For simulations we use \(a_0 = 0.504\), \(a_1 = -0.122\), \(a_2 = -0.119\), \(\lambda_1 = 0.571\), \(\lambda_2 = 0.076\).

Model M3 (log-normal diffusion): \(d\log(\sigma_t^2) = \kappa[\theta - \log(\sigma_t^2)]\,dt + \sigma\,dW_t\). The spot variance of this process can be expressed as
\[
\sigma_t^2 = \sum_{i=0}^{\infty} a_i H_i(f_t) \tag{A.26}
\]
where \(H_i(f_t)\), \(i = 0, 1, \ldots\), are Hermite polynomials with corresponding eigenvalues \(\lambda_i = i\kappa\), \(a_i = \exp(\theta + \sigma^2/4\kappa)\,(\sigma/\sqrt{2\kappa})^i/\sqrt{i!}\), and \(f_t = \sqrt{2\kappa}\,(\log\sigma_t^2 - \theta)/\sigma\). For simulations we use \(a_0 = 0.551\), \(a_1 = 0.387\), \(a_n = \frac{a_1^n}{a_0^{n-1}\sqrt{n!}}\), \(\lambda_1 = 0.014\), \(\lambda_n = \lambda_1 n\).
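Model M1 can be simulated with a simple Euler scheme. The sketch below is ours, with illustrative parameter values (not the paper's), and checks the mean reversion of the variance process toward θ:

```python
import numpy as np

rng = np.random.default_rng(2)
kappa, theta, sigma = 0.1, 0.8, 0.3   # illustrative values with 2*kappa > sigma^2
dt, n_days = 0.1, 5000                # Euler step (days) and simulation horizon

# Euler scheme for the GARCH diffusion dv = kappa*(theta - v)dt + sigma*v dW
n = int(n_days / dt)
v = np.empty(n)
v[0] = theta
for k in range(1, n):
    dw = rng.normal(0.0, np.sqrt(dt))
    v[k] = v[k - 1] + kappa * (theta - v[k - 1]) * dt + sigma * v[k - 1] * dw
    v[k] = max(v[k], 1e-12)           # guard against Euler overshoot below zero

long_run_mean = float(v[n // 10:].mean())   # discard burn-in
```

The condition 2κ > σ² used here is the same one that keeps the √(2κ − σ²) factor in (A.24) real, i.e. it guarantees a stationary variance for the process.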
A.3. Proof of Proposition 2.1
Defining \(Q = \sum_{i=0}^{p} a_i^2\), \(\varphi = \bar M/M\) and assuming \(\bar M h \sim 1\), \(m \sim M/\bar M\), \(\mathrm{Var}(Z_{t,h}) \sim 2h^2 Q\), the variances of the realized volatility estimators RV, \(RV_{AC_1}\), \(RV_{TS}\), \(\overline{RV}\) as functions of \(\bar M\) can be approximated by:
\[
\mathrm{Var}\left(RV^{(m)}\right) \simeq \mathrm{Var}(IV_{1,1}) + \frac{2Q}{\bar M} + 8 a_0 \sigma_\eta^2 + 4\bar M \kappa \sigma_\eta^4 - 2\,\mathrm{Var}(\eta^2);
\]
\[
\mathrm{Var}\left(RV_{AC_1}\right) \simeq C + \frac{2Q}{\bar M} + 4\bar M \kappa \sigma_\eta^4 + \frac{4Q}{\bar M} + h^2 Q + 4\bar M(\kappa+2)\sigma_\eta^4 + 8\bar M \sigma_\eta^4 - 8\bar M(\kappa+1)\sigma_\eta^4 = C + \frac{6Q}{\bar M} + 8\bar M \sigma_\eta^4,
\]
with \(C \equiv \mathrm{Var}(IV_{1,1}) + 8 a_0 \sigma_\eta^2\);
\[
\mathrm{Var}\left(\overline{RV}\right) \simeq \mathrm{Var}(IV_{1,1}) + 8 a_0 \varphi \sigma_\eta^2 + 4\varphi^2 M \kappa \sigma_\eta^4 - 2\varphi\,\mathrm{Var}(\eta^2) + \frac{2Q}{M}(1+\varphi) + \frac{Q(1-\varphi)}{3M^2\varphi}\left[2M(2-\varphi) - \varphi\right];
\]
\[
\mathrm{Var}\left(RV_{TS}\right) \simeq (1-\varphi)^2\,\mathrm{Var}(IV_{1,1}) + \frac{2Q(1-\varphi)^2}{M}(1+\varphi) + \frac{Q(1-\varphi)}{3M^2\varphi}\left[2M(2-\varphi) - \varphi\right] - 2\varphi(1-\varphi)\,\mathrm{Var}(\eta^2) + 8\varphi a_0 \sigma_\eta^2 + 8\varphi^2 M \sigma_\eta^4 + \frac{4Q}{M^2},
\]
or
\[
\mathrm{Var}\left(RV_{TS}\right) \simeq \mathrm{Var}(IV_{1,1}) + \frac{2Q}{M} + \frac{Q\left[2M(2-\varphi) - (1/\varphi + 1)\right]}{3M^2\varphi(1-\varphi)} - \frac{2\varphi\,\mathrm{Var}(\eta^2)}{1-\varphi} + \frac{1}{(1-\varphi)^2}\left[\frac{4Q}{M^2} + 8\varphi a_0 \sigma_\eta^2 + 8\varphi^2 M \sigma_\eta^4\right]. \tag{A.27}
\]

References

Aït-Sahalia, Yacine, Mancini, Loriano, 2008. Out of sample forecasts of quadratic variation. Journal of Econometrics 147, 17–33.
Aït-Sahalia, Yacine, Mykland, Per, Zhang, Lan, 2005. How often to sample a continuous-time process in the presence of market microstructure noise. Review of Financial Studies 18, 351–416. Aït-Sahalia, Yacine, Yu, Jialin, 2009. High frequency market microstructure noise estimates and liquidity measures. Annals of Applied Statistics 3, 422–457. Andersen, Torben, Bollerslev, Tim, Meddahi, Nour, 2004. Analytic evaluation of volatility forecasts. International Economic Review 45, 1079–1110. Andersen, Torben, Bollerslev, Tim, Meddahi, Nour, 2011. Realized volatility forecasting and market microstructure noise. Journal of Econometrics 160 (1), 220–234. Bandi, Federico, Russell, Jeffrey, 2006. Separating microstructure noise from volatility. Journal of Financial Economics 79, 655–692. Bandi, Federico, Russell, Jeffrey, 2008a. Microstructure noise, realized variance, and optimal sampling. Review of Economic Studies 75, 339–369. Bandi, Frederico M., Russell, Jeffrey R., 2008b. On the finite sample properties of kernel-based integrated variance estimators. Working Paper. Barndorff-Nielsen, Ole E., Shephard, Neil, 2002. Econometric analysis of realised volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society. Series B 64, 253–280. Garcia, Rene, Meddahi, Nour, 2006. Comment. Journal of Business and Economic Statistics 24 (2), 184–192. Ghysels, Eric, Santa-Clara, Pedro, Valkanov, Rossen, 2006. Predicting volatility: getting the most out of return data sampled at different frequencies. Journal of Econometrics 131, 59–95. Ghysels, Eric, Sinko, Arthur, 2006. Comment. Journal of Business and Economic Statistics 24 (2), 192–194. Ghysels, Eric, Sinko, Arthur, 2009. Volatility forecasting and microstructure noise. Available at URL: www.unc.edu~eghysels. Ghysels, Eric, Sinko, Arthur, Valkanov, Rossen, 2007. MIDAS regressions: further results and new directions. Econometric Reviews 26, 53–90. Granger, C.W.J., Newbold, P., 1976. 
Forecasting transformed series. Journal of the Royal Statistical Society. Series B (Methodological) 189–203. Hansen, Peter R., Lunde, Asger, 2006. Realized variance and market microstructure noise. Journal of Business and Economic Statistics 24, 127–161. Meddahi, Nour, 2001. An eigenfunction approach for volatility modeling. Working Paper. University of Montreal. Nelson, Daniel B., 1990. ARCH models as diffusion approximations. Journal of Econometrics 45, 7–38. Oomen, R.A.C., 2005. Properties of bias corrected realized variance in calendar time and business time. Journal of Financial Econometrics 3 (4), 555–577. Renault, E., 2009. Moment-based estimation of stochastic volatility models. In: Andersen, T.G., Davis, R.A., Kreiss, J., Mikosch, T. (Eds.), Handbook of Financial Time Series. Springer Verlag, Berlin. Sinko, Arthur, 2007. On predictability of market microstructure noise volatility. UNC Working Paper. Taylor, S.J., 1982. Financial returns modelled by the product of two stochastic processes. A study of daily sugar prices, 1961–79. In: Shephard, N. (Ed.), Stochastic Volatility: Selected Readings. Oxford University Press, Oxford. Wasserfallen, Walter, Zimmermann, Heinz, 1985. The behavior of intraday exchange rates. Journal of Banking and Finance 9, 55–72. Zhang, Lan, Mykland, Per A., Aït-Sahalia, Yacine, 2005. A tale of two time scales: determining integrated volatility with noisy high frequency data. Journal of the American Statistical Association 100, 1394–1411. Zhou, Bin, 1996. High-frequency data and volatility in foreign-exchange rates. Journal of Business and Economic Statistics 14, 45–52.
Journal of Econometrics 160 (2011) 272–279
Causality effects in return volatility measures with random times

Eric Renault a,b,∗, Bas J.M. Werker c

a University of North Carolina at Chapel Hill, NC, United States
b CentER, Tilburg University, Tilburg, The Netherlands
c Finance and Econometrics Group, CentER, Tilburg University, Tilburg, The Netherlands
Article history: Available online 6 March 2010.

Keywords: Continuous time models; Granger causality; Instantaneous causality; Durations; Ultra-high frequency data; Volatility per trade.

Abstract: We provide a structural approach to identify instantaneous causality effects between durations and stock price volatility. So far, in the literature, instantaneous causality effects have either been excluded or cannot be identified separately from Granger type causality effects. By giving explicit moment conditions for observed returns over (random) duration intervals, we are able to identify an instantaneous causality effect. The documented causality effect has significant impact on inference for tick-by-tick data. We find that instantaneous volatility forecasts for, e.g., IBM stock returns must be decreased by as much as 40% when not having seen the next quote change before its (conditionally) median time. Also, instantaneous volatilities are found to be much higher than indicated by standard volatility assessment procedures using tick-by-tick data. For IBM, a naive assessment of spot volatility based on observed returns between quote changes would only account for 60% of the actual volatility. For less liquidly traded stocks at NYSE this effect is even stronger. © 2010 Elsevier B.V. All rights reserved.
1. Introduction Understanding the effects of trading and information flows on short-run stock price volatility has become an active area of research in financial econometrics. Following the seminal paper by French and Roll (1986), several authors (e.g., Amihud and Mendelson, 1991, or Foster and Viswanathan, 1993) have compared the behavior of volatility during exchanges’ opening hours versus closing hours. Next to this empirical evidence, several theoretical models (e.g., Admati and Pfleiderer, 1988, or Foster and Viswanathan, 1990) have been developed explaining the high (low) volatilities during exchange trading (nontrading) periods. An important contribution is also Jones et al. (1994) which extends the definition of nontrading periods to times where exchanges are open but traders endogenously choose not to trade. The present paper is in line with this endogenous definition of nontrading periods. More precisely, we consider causality relationships between volatility and trading intensity (as measured by the duration between trades or quote changes). We address this issue in the continuous time setting of diffusion models and their extension to Markov models driven by Lévy processes. Surprisingly, while these latter models are essential tools for much of the theoretical asset pricing literature, little has been done to accommodate causality effects from trading (possibly induced by new
∗ Corresponding author at: University of North Carolina at Chapel Hill, NC, United States. E-mail address: [email protected] (E. Renault).

0304-4076/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2010.03.036
information) on volatility in this framework. For instance, both Aït-Sahalia and Mykland (2003) and Meddahi et al. (2006) assume that irregular sampling dates can simply be plugged into formulas for the transition probability of Markovian price processes. This is at odds with instantaneous causality between these sampling dates and volatility, which is the main focus of the present paper. To sketch the causality relationships of interest, let us for the moment consider a log-price process (log S_t) which follows a Brownian martingale:

d log S_t = σ_t dW_t,  (1.1)

with E{∫₀ᵗ σ²_u du} < ∞ for all t. While a more general setting will be used in the remainder of this paper, it is simpler for expository reasons to restrict attention to the model (1.1). The present paper focuses on the conditional variance, at a given time t, of the continuously compounded rate of return R_{t:t+τ} := log(S_{t+τ}/S_t) over a stochastic horizon τ. The conditional variance of interest, denoted by Var_t{R_{t:t+τ}}, is determined by the joint probability distribution of the stochastic duration τ and the asset price process as of time t. This direct approach to volatility measurement over endogenous random periods is, in our opinion, an interesting alternative to other approaches proposed so far in the literature. These include the use of time deformation to capture the trading activity intensity in Ane and Geman (2000) or using point processes with time varying intensity as in Duffie and Glynn (2004). In some respects, the approach closest to ours is the GARCH model with irregularly spaced data in Engle (2000).
E. Renault, B.J.M. Werker / Journal of Econometrics 160 (2011) 272–279
Compared to these alternative approaches, a key new insight in the present paper is a distinction between Granger causality effects (from durations towards volatility or from volatility towards durations) and instantaneous causality effects. With respect to this latter concept the use of the term 'causality' may be questioned, in particular because it gives a symmetric role to the cause and the effect. However, we follow this classical terminology of Pierce and Haugh (1977), extended to continuous time in Renault et al. (1998). The basic idea of our paper is an additive decomposition of the conditional variance Var_t{R_{t:t+τ}} into two components. The first component is the conditional expectation of R²_{t:t+τ} computed as if, given the information available at time t, the random duration τ and the volatility process were independent. We stress that this conditional independence does not preclude that optimal forecasts at time t of both the future volatility σ_{t+h}, h > 0, and the future duration τ depend on common variables available at time t. In the context of an observed sequence of random times, such common dependence would amount to Granger causality relationships between durations and volatilities. For instance, Dufour and Engle (2000) document that one may expect, in general, a high (low) frequency of future trades to occur at the same time as a high (low) volatility of returns, possibly because they are both affected by a common information flow. We argue, however, that there is a significant second component in the conditional variance Var_t{R_{t:t+τ}} due to conditional dependence, still given the information available at time t, between the random duration τ and future volatilities σ_{t+h}, h > 0. We empirically document that, considering durations between quote changes, this effect may lead one to believe that the underlying spot volatility is only 60% of its actual level. The intuition is rather clear.
The conditional variance Var_t{R_{t:t+τ}} is not as large as one may believe it to be since, when future spot volatility is large, the duration τ tends to be small. In view of the remarks above, we dub this second term the instantaneous causality effect. As far as risk management is concerned, one may argue that we are not much interested in the measurement of volatility over intervals of less than a minute. Even though our empirical application is focused on such very short durations, the general framework put forward in this paper can actually be applied to much longer random durations. Just to give an example, liquidity risk issues may lead to considering the time needed to sell a given amount of shares (see Hautsch, 2003; Easley et al., 2008). The tools proposed in this paper for volatility assessment on endogenously random time intervals would also be relevant for such applications. The econometric importance of the instantaneous causality effect is obvious. Consider the example put forward in our empirical application: the duration of interest is the random time between two consecutive quote changes. Engle and Sun (2005) have considered a very similar quantity that they call conditional volatility per trade.¹ The key issue is to fit a model for forecasting volatility from tick by tick data, i.e., data that are randomly (unequally) spaced in time. As will become clear from the conditional moment restrictions we derive, overlooking the instantaneous causality effect leads to a significant bias in the estimation of the parameters of the volatility forecasting equation. Alternatively phrased, as the conditional volatility per trade may be only 60% of what it would have been ignoring the instantaneous causality effect, a naive assessment based on the conditional volatility per trade may underestimate the underlying spot volatility by a factor of almost two. Let us introduce some notation and assumptions used throughout the paper.
The information flow in the market is described by a continuous time filtration (Ft )t ≥0 that satisfies the usual
conditions. All stochastic processes in the sequel are adapted to this filtration. Note that the filtration (Ft )t ≥0 is generally not completely observed by the econometrician. We assume that only the asset price St is observed at some (random but observable) times t1 , t2 , . . . , although it is straightforward to extend our ideas to situations where the econometrician’s information set is larger. The times t1 , t2 , . . . , form an increasing sequence of stopping times with respect to the filtration (Ft )t ≥0 . To simplify terminology, we call these times transaction dates even though they may refer to other times, e.g., those at which either the best bid or the best ask quote changes as in Section 4. For notational convenience we define t0 = 0 and 1ti+1 = ti+1 − ti . In order not to exclude Granger causality effects in our analysis, we adopt, for the transaction times ti , the Autoregressive Conditional Duration (ACD) model of Engle and Russell (1998). This model is characterized by the fact that durations, divided by their conditional expectation, are serially independent and identically distributed. This assumption simplifies our causality study, while more general point processes would need more involved tools as in Florens and Fougère (1996). Assumption A. Let ψti = Eti {1ti+1 } denote the conditionally expected duration at time ti . We assume that Pr{1ti+1 ≤ u|Fti } = F (u/ψti ), where F is the distribution function of a positive random variable with unit expectation. The paper is organized as follows. In Section 2, we present a general discussion of possible specifications for the conditional variance Varti {Rti :ti+1 } when it is computed as if, given the information available at time ti , the random duration 1ti+1 and the future of the volatility process σti +h , h > 0, were conditionally independent. 
Alternatively said, in Section 2 we impose the absence of instantaneous causality effects and revisit the ACD-GARCH type model of Ghysels and Jasiak (1998), Engle (2000), Grammig and Wellner (2002), and Meddahi et al. (2006). Section 3 then provides the main contribution of this paper, namely a tool to incorporate instantaneous causality effects between durations and volatility when forecasting volatility from tick by tick data. This leads us to consider an appropriate measure of covariation between durations and volatility. Section 4 presents an empirical illustration that shows that the instantaneous causality effect is highly significant, both from a statistical and a financial point of view. Section 5 concludes and sketches the implications for likelihood inference of continuous time models of instantaneous causality between volatility and durations.

2. Volatility per trade without instantaneous causality

Consider a financial asset with time t price S_t. The evolution of the price S_t is

d log S_t = σ_t dL_t,  t ≥ 0;  S_0 = 1.  (2.2)
In our specification, the volatility process (σ_t)_{t≥0} is a predictable process and (L_t)_{t≥0} is some Lévy process. For ease of exposition, we ignore, in line with, e.g., Engle (2000), a possible drift term. We will allow for a possibly nonzero drift in the empirical illustration in Section 4. Specifications like (2.2) have also been used in Carr and Wu (2004) and Eberlein and Papapantoleon (2005). In order to derive moment conditions, we impose some further conditions.

Assumption B. The innovation process (L_t)_{t≥0} is assumed to be a zero-mean Lévy process with respect to the filtration (F_t)_{t≥0}. We assume it has unit variance per unit of time, i.e., Var{L_t} = t. The volatility process (σ_t)_{t≥0} is assumed to be predictable with respect to the filtration (F_t)_{t≥0} and square-integrable in the sense that

E{∫₀ᵀ σ²_u d[L, L]_u} < ∞,  for all T ≥ 0.

1 We follow the terminology by Engle and Sun (2005) here, although ''variance per trade'' might have been a more appropriate term.
Note that the condition Var{Lt } = t identifies σt as ‘‘the volatility’’, without explicitly distinguishing the continuous and
jump components. Assumption B implies that (log S_t)_{t≥0} is a martingale. Even though the most general framework for asset price models ruling out arbitrage uses general semimartingales (see Back, 1991; Delbaen and Schachermayer, 1999), we do not want to introduce a non-trivial drift term here, as this would give rise to additional causality effects with the random transaction times. Constant drift terms, seasonality effects, as well as microstructure noise, which makes the observed price possibly different from log S_t, will be introduced in the empirical section. Finally, a key tool in the theoretical developments below will be Doob's optional sampling theorem (see, e.g., Protter, 2003, p. 9). One way to justify its use would be to assume that all the stopping times t_i are bounded. However, it will be convenient to use the exponential distribution as a benchmark model for durations. This is the reason why we do not restrict ourselves to bounded stopping times but simply assume that the optional sampling theorem applies when needed. The following proposition relates the conditional variance of returns observed over random durations to the conditional probability distribution of the duration when no instantaneous causality is at play. All propositions are proven in Appendix A.
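Before turning to the propositions, the "modified duration distribution" that they involve can be checked numerically. The sketch below is our own illustration (not part of the paper's analysis); it takes a unit-mean Weibull law for the rescaled duration V, an arbitrary overdispersed choice, and verifies that g(v) = 1 − F(v) is a proper density precisely because E{V} = 1, with mean E⋆{V} = (Var{V} + 1)/2 as in the duration-dispersion identity (2.6) below:

```python
import numpy as np
from math import gamma

# V: rescaled duration with E{V} = 1 (Assumption A). Illustrative choice:
# a unit-mean Weibull with shape k < 1, i.e., an overdispersed duration law.
k = 0.8
lam = 1.0 / gamma(1.0 + 1.0 / k)        # scale so that E{V} = lam*Gamma(1+1/k) = 1

v = np.linspace(0.0, 100.0, 500_001)
F = 1.0 - np.exp(-np.power(v / lam, k))  # distribution function of V
g = 1.0 - F                              # density of the modified distribution G

def integral(y, x):
    # simple trapezoidal rule
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

mass = integral(g, v)                    # equals E{V} = 1, so g is a proper density
e_star = integral(v * g, v)              # equals E*{V}
var_v = lam**2 * (gamma(1 + 2 / k) - gamma(1 + 1 / k) ** 2)
print(mass, e_star, (var_v + 1) / 2)     # mass close to 1, e_star close to (Var{V}+1)/2
```

Any other unit-mean duration law would do: integration by parts gives ∫₀^∞ (1 − F(v)) dv = E{V} and ∫₀^∞ v (1 − F(v)) dv = E{V²}/2, which is exactly the identity exploited below.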
Proposition 2.1. Under Assumptions A and B and assuming that durations Δt_{i+1} and the volatility process (σ_{t_i+u})_{u≥0} are conditionally independent given F_{t_i}, we have

Var_{t_i}{R_{t_i:t_{i+1}}} / ψ_{t_i} = E⋆[E_{t_i}{σ²_{t_i+Vψ_{t_i}}}],  (2.3)

where E⋆ denotes the expectation operator concerning the variable V, which is supposed to be endowed with distribution G and density g satisfying g(v) = 1 − F(v). Note that a similar result is, of course, valid for a deterministic horizon, using a degenerate distribution G.

Corollary 2.2. Under Assumptions A and B, we have

Var_{t_i}{R_{t_i:t_i+hψ_{t_i}}} / (hψ_{t_i}) = E_{t_i}{σ²_{t_i+hψ_{t_i}}},  (2.4)

and

Var_{t_i}{R_{t_i:t_i+h}} / h = E_{t_i}{σ²_{t_i+h}}.  (2.5)

Note that moment condition (2.4), albeit more interesting than (2.5) for financial interpretation, is not of easy use for inference since it involves a measurement of returns over time varying unobserved expected durations. For simplicity, we will rather use the second one with h proportional to the unconditional average duration, see Section 4. Comparing (2.3) and (2.4) shows that duration randomness induces an additional expectation operator with respect to the duration, albeit not with the historical probability distribution. The modified duration distribution is defined by a density g(v) = 1 − F(v), decreasing on the positive real line. The expectation E⋆ is related to the duration dispersion by

E⋆{V} = (Var{V} + 1)/2 = 1/(2ϕ),  (2.6)

where ϕ = 1/E{V²}. Recall that ϕ characterizes the degree of overdispersion in the (conditional) historical distribution of durations, with ϕ < 1/2 in the typical case of overdispersion. Only in the case of exponential distributions does the modified distribution G coincide with the historical one F. As far as statistical inference is concerned, randomness in durations necessitates either a parametric model for durations or a martingale hypothesis on (squared) volatility. To see this, consider an Ornstein–Uhlenbeck like model (that is, with linear mean-reversion in volatility as in, e.g., Drost and Werker, 1996; Barndorff-Nielsen and Shephard, 2002; Meddahi and Renault, 2004). Such a model would imply

E_{t_i}{σ²_{t_i+vψ_{t_i}}} = (1 − exp(−κvψ_{t_i})) E{σ²_t} + exp(−κvψ_{t_i}) σ²_{t_i}.  (2.7)

Consequently, the calculation of Var_{t_i}{R_{t_i:t_{i+1}}} = ψ_{t_i} E⋆[E_{t_i}{σ²_{t_i+Vψ_{t_i}}}] would involve the Laplace transform of the probability distribution G. The alternative, the route we will follow below, is to assume that observed durations are sufficiently small to justify an approximation of the (squared) volatility process by a martingale, i.e., E_{t_i}{σ²_{t_i+vψ_{t_i}}} = σ²_{t_i}, independent of v.

Corollary 2.3. Under Assumptions A and B, assuming that durations Δt_{i+1} and the volatility process (σ_{t_i+u})_{u≥0} are conditionally independent given F_{t_i}, and E_{t_i}{σ²_{t_i+vψ_{t_i}}} = σ²_{t_i} for all v > 0, we have

Var_{t_i}{R_{t_i:t_{i+1}}} = ψ_{t_i} σ²_{t_i},  (2.8)
Var_{t_i}{R_{t_i:t_i+h}} = h σ²_{t_i}.  (2.9)

Conditions (2.8)–(2.9) hold exactly in the case the (squared) volatility process is a martingale, as, e.g., in IGARCH or the unit root volatility model in Hansen (1995). We assume (2.9) to hold at horizons h equal to 25 or 50 times the (unconditional) average duration, i.e., not more than a few minutes for the data considered in Section 4. The use of moment condition (2.8) in a GMM framework requires an additional assumption as neither ψ_{t_i} nor σ²_{t_i} is observed by the econometrician. In order to allow for Granger causality from durations to volatilities, we adopt the following forecasting equation

E{σ²_{t_i} | ψ_{t_i}; Δt_j, R_{t_{j−1}:t_j}, j ≤ i} = α0 + α1 ψ_{t_i} + α2 R²_{t_{i−1}:t_i} / ψ_{t_i}.  (2.10)

Note that Engle (2000) puts forward a variety of reduced form specifications for forecasting volatility from tick by tick data.² While (2.10) is only one possible specification among many one may want to use, in our empirical illustration in Section 4 we could not reject (2.10) in favor of a specification including higher-order terms. From (2.8) we deduce

E{R²_{t_i:t_{i+1}} − ψ_{t_i} σ²_{t_i} | Δt_j, R_{t_{j−1}:t_j}, j ≤ i} = 0
⇒ E{R²_{t_i:t_{i+1}} − α0 ψ_{t_i} − α1 ψ²_{t_i} − α2 R²_{t_{i−1}:t_i} | Δt_j, R_{t_{j−1}:t_j}, j ≤ i} = 0
⇒ E{R²_{t_i:t_{i+1}} − α0 Δt_{i+1} − α1 ϕ (Δt_{i+1})² − α2 R²_{t_{i−1}:t_i} | Δt_j, R_{t_{j−1}:t_j}, j ≤ i} = 0.  (2.11)
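To make the GMM use of these conditions concrete, here is a stylized sketch (our own construction with illustrative parameter values, not the authors' code). Durations follow an ACD(1,1) with unit-mean exponential innovations, so that ϕ = 1/2; the spot variance α0 + α1 ψ_{t_i} is held constant over each duration as in (2.8), with α2 = 0; and (α0, α1) are then recovered from the sample analogue of (2.11) with the just-identified instrument set (1, lagged duration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400_000

# Durations from an ACD(1,1) model (Engle-Russell):
#   psi_i = omega + a*dt_{i-1} + b*psi_{i-1},  dt_i = psi_i * eps_i,
# with eps_i i.i.d. unit-mean exponential, so phi = 1/E{eps^2} = 1/2.
omega, a, b = 0.3, 0.2, 0.5              # illustrative values, mean duration = 1
alpha0, alpha1 = 0.5, 0.5                # true coefficients of (2.10), with alpha2 = 0
phi = 0.5

eps = rng.exponential(1.0, n)
z = rng.standard_normal(n)
psi = np.empty(n); dt = np.empty(n)
psi[0] = omega / (1.0 - a - b)           # start at the unconditional mean duration
dt[0] = psi[0] * eps[0]
for i in range(1, n):
    psi[i] = omega + a * dt[i - 1] + b * psi[i - 1]
    dt[i] = psi[i] * eps[i]

# Returns with spot variance alpha0 + alpha1*psi_i, held constant over the
# duration as in (2.8) (no instantaneous causality in this DGP):
r = np.sqrt((alpha0 + alpha1 * psi) * dt) * z

# Moment condition (2.11) with alpha2 = 0:
#   E[ r_i^2 - alpha0*dt_i - alpha1*phi*dt_i^2 | past ] = 0,
# estimated by instrumental variables with instruments (1, dt_{i-1}):
y = r[1:] ** 2
X = np.column_stack([dt[1:], phi * dt[1:] ** 2])
Z = np.column_stack([np.ones(n - 1), dt[:-1]])
theta = np.linalg.solve(Z.T @ X, Z.T @ y)   # sample moment equations hold exactly
print(theta)                                 # close to (alpha0, alpha1)
```

With Granger causality in durations (a > 0), lagged durations are informative instruments and the two sample moment equations pin down (α0, α1); without duration dynamics the two columns of instrumented regressors become collinear, which is why identification here leans on the ACD structure.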
In contrast to (2.8), (2.11) can be used directly in GMM inference with lagged durations and returns as instruments.

3. Volatility per trade with instantaneous causality

Proposition 2.1 above is based on the restrictive, empirically rejected, assumption that Δt_{i+1} and the volatility process (σ_{t_i+u})_{u≥0}
2 Engle’s (2000) forecasting equations are not about a continuous time spot volatility σt2i but about a discrete time variance in the GARCH spirit. However, Meddahi et al. (2006) proves, in the case of an Ornstein–Uhlenbeck like model with volatility mean reversion, that there is an affine relationship between both quantities. In this respect, (2.10) is closely related to the equations studied in Engle (2000).
are conditionally independent given F_{t_i}. As shown in Appendix A, it is actually always possible, under the maintained validity of the optional sampling theorem, to show

Var_{t_i}{R_{t_i:t_{i+1}}} = E_{t_i}{∫₀^∞ I_{[0,Δt_{i+1}]}(u) σ²_{t_i+u} du}.  (3.1)

The assumed noncausality in Proposition 2.1 leads to a zero conditional covariance between the indicator function I_{[0,Δt_{i+1}]}(u) and σ²_{t_i+u}. In the absence of such a noncausality assumption, we rewrite the covariance by using the (conditional, i.e., given F_{t_i}) regression coefficient of σ²_{t_i+u} on I_{[0,Δt_{i+1}]}(u), that is,

Cov_{t_i}{I_{[0,Δt_{i+1}]}(u), σ²_{t_i+u}} = β_{t_i}(u) F(u/ψ_{t_i}) (1 − F(u/ψ_{t_i})).  (3.2)

This leads us to state the main contribution of the present paper.
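Before the formal statement, the decomposition behind (3.1)–(3.2) can be illustrated in a two-state toy example of our own (all numbers are illustrative): a high spot variance comes with a short duration, so the covariance term is negative and the variance of the return over the random interval falls short of the "no-causality" term:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
du = 0.005
u = np.arange(0.0, 2.0, du) + du / 2          # grid of horizons (midpoints)

# Two equally likely regimes, not known at time t_i (hence "instantaneous"):
# a high spot variance comes with a short duration, and vice versa.
high = rng.random(n) < 0.5
sig2 = np.where(high, 2.0, 0.5)               # spot variance over the interval
tau = np.where(high, 0.5, 2.0)                # duration: short when variance is high
r = np.sqrt(sig2 * tau) * rng.standard_normal(n)   # R ~ N(0, sig2*tau) given regime

# Split the right-hand side of (3.1) as in (3.2): survival probability times
# expected variance, plus the covariance of the indicator with the variance.
p = high.mean()
surv = p * (u < 0.5) + (1 - p) * (u < 2.0)                  # Pr{tau > u}
e_isig = p * 2.0 * (u < 0.5) + (1 - p) * 0.5 * (u < 2.0)    # E[1{tau>u} * sig2]
indep = float(np.sum(surv * sig2.mean()) * du)              # "no-causality" term
cov = float(np.sum(e_isig - surv * sig2.mean()) * du)       # covariance term
print(round(r.var(), 3), round(indep + cov, 3), round(indep, 3), round(cov, 3))
```

In this toy example Var{R} is about 1 while the no-causality term alone is about 1.56; the gap is exactly the integrated (negative) covariance between the survival indicator and the spot variance.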
Proposition 3.1. Under Assumptions A–B, we have

Var_{t_i}{R_{t_i:t_{i+1}}} / ψ_{t_i} = E⋆[E_{t_i}{σ²_{t_i+Vψ_{t_i}}}] + E⋆[β_{t_i}(Vψ_{t_i}) F(V)],  (3.3)

where E⋆ denotes the expectation operator concerning the variable V, which is supposed to be endowed with distribution G and density g satisfying g(v) = 1 − F(v).

It is worth noting that the relations (2.4)–(2.5) are not affected by possible instantaneous causality effects as they refer to deterministic durations. In order to interpret Proposition 3.1, note that, since any function of an indicator is necessarily affine, we deduce from (3.2)

E_{t_i}{σ²_{t_i+vψ_{t_i}} | Δt_{i+1} > vψ_{t_i}} − E_{t_i}{σ²_{t_i+vψ_{t_i}}} = β_{t_i}(vψ_{t_i}) F(v),  (3.4)
E_{t_i}{σ²_{t_i+vψ_{t_i}} | Δt_{i+1} ≤ vψ_{t_i}} − E_{t_i}{σ²_{t_i+vψ_{t_i}}} = β_{t_i}(vψ_{t_i}) [F(v) − 1].  (3.5)
From (3.4), we see that the β-function characterizes by how much an instantaneous variance assessment is influenced by the information that no completed duration is observed for some time. Proposition 3.1 shows that this information influences the volatility of returns over a random horizon as well. Generally speaking, when returns are considered over random time intervals (t_i, t_{i+1}], the duration between the two consecutive stopping times may convey (through a nonzero coefficient β_{t_i}) relevant information about the risk borne at time t_i over the horizon Δt_{i+1}. A negative beta coefficient is naturally suggested by existing theories of market microstructure and confirmed by our empirical study in Section 4, where the dates t_i refer to quote changes: the knowledge that Δt_{i+1} > vψ_{t_i} (resp. Δt_{i+1} ≤ vψ_{t_i}) leads one to update downwards (resp. upwards) the expectation about the volatility at date t_i + vψ_{t_i}. The amount of this update is given by β_{t_i}(vψ_{t_i}) F(v) (resp. β_{t_i}(vψ_{t_i}) [F(v) − 1]). To implement actual GMM inference from the moment conditions (3.3), we need to specify the function β_{t_i}(·). Both Engle (2000) and Manganelli (2005) estimate a discrete time model of the conditional variance of R_{t_i:t_{i+1}} given not only F_{t_i} but also given the current duration Δt_{i+1}. Their empirical results give us some guidance concerning the way the forecast at time t_i of σ²_{t_i+vψ_{t_i}}
could be modified by the additional knowledge that Δt_{i+1} > vψ_{t_i}. Under the working hypothesis that linear approximations give an appropriate description of these relations, it seems natural to relate the shape of β_{t_i}(vψ_{t_i}) to the (unconditional) volatility predictions at the corresponding horizon. This is formalized in the next assumption.

Assumption C. The conditional regression coefficient β_{t_i}(vψ_{t_i}) in (3.2) satisfies

β_{t_i}(vψ_{t_i}) = β(v) E_{t_i}{σ²_{t_i+vψ_{t_i}}},  (3.6)

for a given function β defined on the support of the distribution function F.

Assumption C extends the ACD specification of the durations (Assumption A) to the regression function β_{t_i}. It means that for a given level of the rescaled forecasting horizon v, the update in variance predictions given by (3.4)–(3.5) is constant in relative terms:

E_{t_i}{σ²_{t_i+vψ_{t_i}} | Δt_{i+1} > vψ_{t_i}} / E_{t_i}{σ²_{t_i+vψ_{t_i}}} − 1 = β(v) F(v),  (3.7)
E_{t_i}{σ²_{t_i+vψ_{t_i}} | Δt_{i+1} ≤ vψ_{t_i}} / E_{t_i}{σ²_{t_i+vψ_{t_i}}} − 1 = β(v) [F(v) − 1].  (3.8)
In other words, thanks to Assumption C, the relative adjustment given the hypothetical information that the next duration exceeds (or is below) its conditional median, its conditional first quartile, or any given conditional quantile, is always the same, irrespective of the other available information. It is worth stressing that we will document empirically that the prediction updates given by (3.7)–(3.8) are not negligible. For instance, using the GMM-based estimated parameters for IBM in Section 4, we can show³ that a present time prediction for the instantaneous volatility 1.5 s from now (the median duration), conditional on not having seen a quote revision by that time, is about 40% less than the unconditional prediction. Conversely, at the median duration of 1.5 s, the instantaneous volatility prediction has to be increased by about 28% if we know that a new quote is available. Assumption C allows us to translate the moment condition (3.3) in Proposition 3.1 into moment conditions that can be used for GMM inference.

Corollary 3.2. Under Assumptions A–C, we have

Var_{t_i}{R_{t_i:t_{i+1}}} / ψ_{t_i} = E⋆[E_{t_i}{σ²_{t_i+Vψ_{t_i}}}] + E⋆[β(V) F(V) E_{t_i}{σ²_{t_i+Vψ_{t_i}}}],  (3.9)

where, again, E⋆ denotes the expectation with respect to V having density g(v) = 1 − F(v). As in Section 2, the calculation of Var_{t_i}{R_{t_i:t_{i+1}}} generally needs the complete specification of (the Laplace transform of) the probability distribution G of V. Assuming integrated (i.e., martingale) squared volatility, this is no longer the case.

Corollary 3.3. Under Assumptions A–C and E_{t_i}{σ²_{t_i+vψ_{t_i}}} = σ²_{t_i} for all v > 0, we have

Var_{t_i}{R_{t_i:t_{i+1}}} = (1 + β⋆) ψ_{t_i} σ²_{t_i},  (3.10)

where β⋆ = E⋆[β(V) F(V)]. Comparing to (2.8), we see that the instantaneous causality effect introduces a multiplicative effect (1 + β⋆), where β⋆ is an average of the function β dampened by the distribution function F. This multiplicative effect 1 + β⋆ will show up as a factor in front
of the coefficients α0, α1, and α2 in the moment condition (2.10). Note also that (3.10) provides a very convenient way to correct the common rule of thumb that the current value of the spot volatility process can be inferred by just dividing the conditional volatility per trade by the expected duration. Such a rule of thumb is one way to understand the Engle and Sun (2005) observation that volatility is inversely related to expected durations, as the conditional volatility per trade appears to be nearly independent of durations. Actually, this latter observation is consistent with (3.10), although the rule of thumb is incorrect because it overlooks the additional instantaneous causality factor 1 + β⋆. To illustrate the consequences of neglecting this factor, that is, of computing the instantaneous variance per unit of time as Var_{t_i}{R_{t_i:t_{i+1}}}/ψ_{t_i}, consider that the estimated β⋆ is −65% for IBM and even more negative for some other stocks, see Section 4. As a result, the rule of thumb recovers only √(1 + β⋆) = 59% of the actual instantaneous volatility (in relative terms). Clearly, this may have important repercussions for risk management.

3 These approximate results are computed by considering the simplest model where the function β is constant and F corresponds to a unit mean exponential distribution: F(v) = 1 − exp(−v). Even though these assumptions are never maintained in the rest of the paper, we can use them to get a visual appraisal of the orders of magnitude in the volatility updates (3.7)–(3.8). See the web appendix.

4. Empirical illustration

The present paper argues that volatility measurement (be it for inference purposes or risk management) based on random durations needs to take into account instantaneous causality between durations and volatility, as measured, e.g., by the parameter β⋆ in Corollary 3.3. The purpose of our empirical illustration is to show that this causality effect is not only statistically but also economically significant. In particular, as is detailed in a previous version of this paper, incorrectly imposing β⋆ = 0 leads to biased estimates for the other model parameters. Section 4.1 discusses the NYSE stocks that we use for our illustration. It is well-known that intraday data suffer from two problems: market microstructure noise and seasonality effects. We show in Section 4.2 how we deal with both. Finally, Section 4.3 shows that, at least for the stocks and time period we study, instantaneous causality effects between durations and volatilities are both statistically and economically significant.

4.1. Data description

Our data consist of ten stocks traded at NYSE from the TAQ dataset for 61 days from January 3, 2005, until March 31, 2005. The relevant times t_i we consider here are those where either the best bid or the best ask quote at NYSE⁴ changes. An earlier version of this paper considered transaction times, leading to similar conclusions. Returns are measured as the change in a stock's midquote, defined as the geometric average of the best bid and ask at a given time. The ten stocks we use are (with ticker symbol in parentheses): Dillard's (DDS), Federated (FD), IBM (IBM), JCPenney (JCP), Mattel (MAT), May (MAY), McDonald's (MCD), Saks (SKS), Schlumberger (SLB), and Walmart (WMT). We remove zero durations. Moreover, we replace returns above 100 basis points (in absolute value) by the average return. The latter only affected three out of the ten stocks, for in total 41 observations. Summary statistics are in Table 1. The first row in Table 1 gives, for each of the ten stocks, the number of observations that are available for estimation. For a (relatively) illiquid stock like Saks (SKS), we still have almost 300,000 observations available. For the most liquid stocks (IBM and WMT), we have twice as many. The difference in liquidity also follows from the second row of Table 1, which gives the average duration between consecutive quote revisions. These average durations range from 2.1 s for WMT to 4.9 s for SKS. The standard deviation of durations is always above the average. We would like to stress that, contrary to many empirical market microstructure papers, we did not seasonally adjust our data in any way. The reason for this is that it is not clear how such an adjustment would interfere with the causality effects we are interested in. We detail in the section below one way to control for seasonality effects.

4 For cross-listed stocks, we restrict attention to quotes on NYSE.

Table 1
Summary statistics for durations and returns for ten stocks from the TAQ database, January 3, 2005, until March 31, 2005. The rows of the table present, from top to bottom, the number of observations, the average duration between quote revisions, the standard deviation of durations, the average return between quote revisions, and the standard deviation of returns. All durations are measured in seconds (s) and returns in basis points (bp).

                     DDS      FD       IBM      JCP      MAT      MAY      MCD      SKS      SLB      WMT
Observations         328,167  354,205  657,906  413,551  405,697  442,760  588,747  291,770  521,279  676,793
Average dur. (s)     4.3      4.0      2.2      3.4      3.5      3.2      2.4      4.9      2.7      2.1
Stand. dev. dur.     6.9      6.8      2.5      5.0      4.7      4.6      2.9      7.4      4.5      2.5
Average ret. (bp)    −0.0     0.0      −0.0     0.0      0.0      −0.0     0.0      0.0      0.0      −0.0
Stand. dev. ret.     2.8      1.5      0.8      1.5      2.1      2.0      1.2      3.1      1.2      0.9

4.2. Market microstructure noise and seasonality effects

Two stylized facts for intraday data complicate any empirical analysis. First, there is abundant evidence concerning the presence of microstructure noise. Following the literature (see, e.g., Bandi and Russell, 2006; Barndorff-Nielsen et al., 2006; Zhang et al., 2005; Zhou, 1996) we model microstructure noise as independent additive terms to observed prices, see also the discussion in Hansen and Lunde (2006). As a result, both moment conditions (2.9) and (3.10), combined with (2.10), will be amended with a constant σ²_{mms,h} and σ²_{mms,ψ}, respectively. We do not impose that both variances are the same in order to allow for possible serial correlation in the microstructure noise. Therefore, GMM inference is based on the following conditional moment conditions
E{R_{t_i:t_{i+1}} − μ Δt_{i+1} | Δt_j, R_{t_{j−1}:t_j}, j ≤ i} = 0,  (4.1)
E{R²_{t_i:t_{i+1}} − μ² ϕ (Δt_{i+1})² − α0 (1 + β⋆) Δt_{i+1} − α1 (1 + β⋆) ϕ (Δt_{i+1})² − 2σ²_{mms,ψ} | Δt_j, R_{t_{j−1}:t_j}, j ≤ i} = 0,  (4.2)
E{R_{t_i:t_i+h} − μ h | Δt_j, R_{t_{j−1}:t_j}, j ≤ i} = 0,  (4.3)
E{R²_{t_i:t_i+h} − μ² h² − α0 h − α1 h Δt_{i+1} − 2σ²_{mms,h} | Δt_j, R_{t_{j−1}:t_j}, j ≤ i} = 0.  (4.4)
Note that we allow for a possible non-zero constant drift µ. The moment condition (4.4) over fixed durations allows us, together with (4.2), to identify the instantaneous causality coefficient β ⋆ separately from the Granger causality coefficient α1 . Clearly, these moment conditions (4.3)–(4.4) are valid for any h and in our empirical analysis we use simultaneously h equal to 25 and 50 times the unconditional average duration for each stock. Note that the moment conditions (4.1) and (4.3) overlook the serial correlation in observed returns induced by i.i.d. microstructure noise. Formally, writing (4.1) without any return predictability
E. Renault, B.J.M. Werker / Journal of Econometrics 160 (2011) 272–279
277
Table 2 Point estimates and t-values for the expected return (µ), the relation between instantaneous volatility and expected durations (α0 and α1 ), the instantaneous causality 2 2 parameter (β ⋆ ), the duration dispersion parameter (ϕ ), and the variances of market microstructure noise (σmms ;h referring to deterministic time intervals and σmms;ψ referring to consecutive quote revisions). All estimates are precision weighted averages over 25 independent 15 min intervals per trading day. The last two lines in each panel present, respectively, the√ average p-values of the GMM J-test √ for overidentifying restrictions and the relative volatility underestimation due to not taking into account the instantaneous causality ( 1 + β ⋆ ), for instance, for IBM, 1 + β ⋆ σti is only 0.59σti . Detailed estimation results are available in the web-appendix. Parameter
est.
t-val
DDS
µ (%) α0 α1 (%) β⋆ ϕ 2 σmms ,h 2 σmms ,ψ p-value Underestimation
p-value Underestimation
t-val
est.
t-val
IBM
−0.22
−0.05
−1.03
−0.01
−0.18
0.76 −3.82 −0.94 1.06 1.21 1.73 0.38 0.24
26.07 −11.39 −5.75 1.98 1.56 9.90
0.58 −3.19 −0.93 0.71 0.99 0.60 0.32 0.26
29.82 −11.47 −9.77 2.99 1.67 7.66
0.18 −1.52 −0.65 0.46 1.00 0.19 0.70 0.59
32.42 −9.65 −2.90 1.96 10.80 5.63
MCD
SKS
−0.22
−0.05
−1.03
−0.01
−0.18
0.73 −7.76 −0.88 0.56 1.54 1.17 0.52 0.34
26.22 −15.25 −7.22 5.38 2.92 8.99
0.40 −2.69 −0.54 0.77 0.90 0.43 0.65 0.68
33.42 −9.19 −2.59 6.00 4.75 6.32
0.74 −3.50 −0.92 0.65 2.92 3.01 0.37 0.28
22.84 −11.58 −5.19 6.04 2.93 8.78
4.3. Empirical results We conclude the empirical relevance of the instantaneous causality effect we point out from the results in Table 2. These estimates have been obtained by efficient GMM using the moment restrictions (4.1)–(4.4) and as instruments: the constant, lagged durations, and squared lagged durations. Let us discuss the parameter estimates in detail. The parameter α0 determines the level of the instantaneous variance. Given the average durations in Table 1 and the estimated values for α1 , we can easily derive the average level of the instantaneous variance for each of the ten stocks, ignoring market microstructure noise. Focusing on IBM, we would find 0.18 − 0.0152 × 2.2 = 0.15 bp2 /s. The parameter
t-val
0.00 0.51 −2.78 −0.88 0.32 0.16 0.65 0.44 0.35
0.00 0.58 −1.59 −0.86 0.49 0.51 0.37 0.44 0.37
est.
t-val
MAT 0.04 27.09 −7.77 −7.13 2.00 0.37 7.25
SLB
−0.01
based on past returns would imply that this predictability is captured by the random drift term. In any case, our empirical results confirm that return predictability is only a second order effect. Generalizations of the moment conditions (4.1)–(4.4) did not improve the fit and lead to statistically insignificant estimates, see also the reported J-test below. In particular, our specification puts α2 = 0 in (2.10). A second well-documented fact is seasonality in the intraday processes. There is no a priori reason to exclude seasonality for our main parameter of interest β ⋆ . One possible approach is to hypothesize an intraday pattern for all parameters of interest. This has the advantage of a more structural interpretation of parameters and the, obvious, cost of a larger misspecification risk. We will remain nonparametric about the seasonality effect and estimate different parameters for each 15 min interval within the day. As the purpose of this illustration is to show the significance of the instantaneous causality effect, we present, in the interest of space, only average estimated parameters. These averages exclude the opening quarter and are weighted with the inverse of the estimated variances. Accordingly, standard errors are computed using the inverse of the harmonic average of the estimated variances of each quarter, divided by the number of quarters (25). Note that this neglects possible correlation between the parameter estimates for disjunct quarters. For the estimation results per quarter we refer to the web-appendix.
[Table 2. Efficient GMM estimates (est.) and t-values (t-val) per stock — including IBM, WMT, MAT, SLB, JCP, FD, and MAY — of the parameters µ (%), α0, α1 (%), β⋆, ϕ, σ²_{mms,h}, and σ²_{mms,ψ}; the flattened table entries are not reproduced here.]
The parameter α1 is estimated significantly negative in all cases (see footnote 5). Recall that α1 measures the Granger causality effect from durations to volatility: a higher (instantaneous) volatility is expected after the observation of smaller durations. Note, moreover, that the estimates are such that expected variances in (2.10) remain positive over the relevant domain of expected durations for all stocks. The parameter β⋆ measures the instantaneous causality between future volatilities and (surprises in) durations and is the focus of this paper. This parameter is estimated significantly negative for all stocks in our sample. To get an idea of the economic significance of the causality effects, consider the parameter estimates for IBM, in particular β⋆ = −0.65. Assume, for expository reasons, (conditionally) exponentially distributed durations and a constant function β(·). Using the definition of β⋆ in Corollary 3.3 and, for this exponential distribution, ∫₀^∞ F(v)[1 − F(v)] dv = 1/2, we find that the constant function β equals 2β⋆. Now consider the event that, after waiting the (conditional) median duration, we have not yet seen the next quote. Then, according to (3.7), we should update our current instantaneous variance prediction by β/2 = β⋆ = −0.65, i.e., a 65% decrease in variance and a corresponding 1 − √(1 + β⋆) ≈ 41% decrease in volatility. For each of the individual stocks, the row "underestimation" in Table 2 gives the relative underestimation of instantaneous volatility due to ignoring the documented causality effect, as discussed in Section 3. In all cases we find an economically significant effect, with some variation across the individual stocks. In line with the intuition that the causality effect disappears at higher frequencies, there is a strong positive (rank) correlation between the estimated β⋆ for each stock and liquidity, as measured by the average duration.
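The two back-of-the-envelope numbers quoted above can be checked directly; a minimal sketch, with the linear variance specification and the exponential-duration assumption taken from the text.

```python
import math

# 1) Average instantaneous variance for IBM, ignoring microstructure noise:
#    var = alpha0 + alpha1 * E[duration]  (Eq. (2.10) with alpha2 = 0).
avg_var = 0.18 + (-0.0152) * 2.2
print(round(avg_var, 2))  # 0.15 bp^2/s, as reported

# 2) Update after waiting the conditional median duration without a new quote,
#    for exponential durations and constant beta(.) = 2 * beta_star:
beta_star = -0.65                              # IBM estimate
variance_decrease = -beta_star                 # 65% decrease in variance
vol_decrease = 1 - math.sqrt(1 + beta_star)    # ~41% decrease in volatility
print(round(vol_decrease, 2))  # 0.41
```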
We stress that even for the currently most liquidly traded stocks (IBM and WMT) the causality effect is far from negligible. Our empirical results are consistent with those in Engle and Sun (2005). They specify the conditional variance per trade, given F_{t_i}, as proportional to (Δt_{i+1})^δ and find empirically δ < 1, and even smaller for the least liquid stocks. In
5 All statements about statistical significance in this paper are at the 1% level.
E. Renault, B.J.M. Werker / Journal of Econometrics 160 (2011) 272–279
other words, the variance per unit of time is decreasing in the corresponding duration (as duration to the power δ − 1), and the causality effect is even stronger for the least liquid stocks.

The rows σ²_{mms,h} and σ²_{mms,ψ} in Table 2 provide estimates of the variance of market microstructure noise. As mentioned before, we allow for the possibility that market microstructure noise for consecutively observed midquotes is correlated. As a result, we present two variance estimates: σ²_{mms,h} refers to the variance of market microstructure noise for returns measured over long intervals, i.e., the deterministic intervals we use in the estimation, while σ²_{mms,ψ} refers to market microstructure noise in quote-to-quote prices. Observe that the latter estimate is for some stocks smaller than the estimate for long-duration returns. In these cases, the results indicate a negative correlation in quote-to-quote microstructure noise. These differences are, however, not statistically significant. We apply, quarter by quarter, the standard GMM J-test for overidentifying restrictions. Given our six moment conditions and three instruments (6 × 3 = 18 orthogonality conditions) and seven estimated parameters, the relevant null distribution has eleven degrees of freedom. We present the average p-values in the table and remark that, among all 250 individual tests, only seven rejections occur (at the 5% level) in the whole sample of ten stocks. Moreover, assuming that the individual 15 min interval estimates are independent, so that the individual J-tests can be combined into a single one, our specification is rejected for none of the ten stocks under consideration. To conclude the empirical illustration, let us consider the parameter ϕ, which measures the dispersion of the rescaled (by their conditional expectation) durations. For the exponential distribution, we have ϕ = 1/2.
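The degrees-of-freedom count for the quarter-by-quarter J-test, and the combination of independent J-statistics into a single test, can be made explicit as follows; the J-statistic values are hypothetical.

```python
# Overidentifying restrictions: six moment conditions interacted with three
# instruments give 18 orthogonality conditions; subtracting the seven
# estimated parameters leaves 11 degrees of freedom, as stated in the text.
n_moments, n_instruments, n_params = 6, 3, 7
dof = n_moments * n_instruments - n_params
print(dof)  # 11

# Under independence of the 15 min interval estimates, the individual
# J-statistics (each chi-square with `dof` df) simply add up:
j_stats = [10.2, 12.5, 9.8]          # hypothetical quarter-by-quarter values
combined_stat = sum(j_stats)         # chi-square with dof * len(j_stats) df
combined_dof = dof * len(j_stats)
```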
The results for the ten stocks we study vary in this respect: some stocks exhibit overdispersion and others underdispersion of the conditional duration distribution. These effects are, however, never statistically significant. In general, we find this parameter not to be well identified from the moment conditions we use.

5. Conclusions

The present paper considers a structural continuous time model for the analysis of instantaneous causality relations between price volatility and durations, in addition to possible Granger causality. We argue that these instantaneous causality effects are significant and that failure to take them into account may lead to severely biased volatility estimates and, consequently, possibly inadequate risk management. We identify the instantaneous causality effects using appropriate moment conditions. These conditions (see Proposition 3.1) are sufficiently general to be applicable to a wide range of model specifications. The analysis does not yet take into account other relevant microstructure variables, like volume or information in other assets. Since our results for the variance of observed returns are based on a specification of volatility predictions given all current information, these could easily be included. Also, while we focus on an interpretation of t_i as quote revision times, this is not required in our main Proposition 3.1. As such, interesting empirical applications could include situations where transaction times are studied, or cross-causality effects where surprises in durations for one stock may cause volatility in another stock. Our moment conditions may also be applied to longer, random, horizons. They are volatility forecasting equations in the GARCH style, and these would be severely misspecified if instantaneous causality were overlooked.
To the best of our knowledge, the only paper in the GARCH literature that allows one to identify both Granger causality and instantaneous causality between duration and volatility is Engle (2000). However, discrete time reduced form models do not
clearly disentangle the various causality effects, in particular because observed durations may have significant coefficients in a volatility forecasting equation simply as a filtering device for a latent volatility process. By contrast, our continuous time framework with stochastic volatility avoids confusion between Granger causality, instantaneous causality, and filtering issues. As far as inference for continuous time models is concerned, our paper is a first step towards a more extensive study of diffusion processes observed at random times. By providing some empirical evidence of instantaneous causality relationships, we claim that a transition density function for the joint dynamics of durations and returns is best factorized as

p(\Delta t_{i+1}, R_{t_i:t_{i+1}} \mid \mathcal{F}_{t_i}) = p(\Delta t_{i+1} \mid \mathcal{F}_{t_i})\, p(R_{t_i:t_{i+1}} \mid \Delta t_{i+1}, \mathcal{F}_{t_i}).  (5.1)
Since our paper documents a significant impact of the current duration Δt_{i+1} on the conditional probability distribution of R_{t_i:t_{i+1}} given (Δt_{i+1}, F_{t_i}), it implies in particular that a likelihood function cannot be written down by simply plugging random times into the transition density function for returns, as done, e.g., in Aït-Sahalia and Mykland (2003). While our paper sheds some light on the causality parameters at play in a semiparametric setting, a fully parametric likelihood approach based on the decomposition (5.1) is left for future research. Alternatively, a model-free approach to instantaneous causality based on quadratic variation and high frequency data is developed in Mykland and Renault (2008).

Acknowledgements

The authors would like to thank Jeff Russell for his very useful remarks as a discussant at the 2003 winter meetings in Washington DC, Laura Spierdijk for kindly providing (well-organized) data, and three referees, Rob Engle, Asger Lunde, Nour Meddahi, Per Mykland, Neil Shephard, and Harald Uhlig for several suggestions that improved the paper significantly. The second author gratefully acknowledges support from the EU within the MICFINMA network. This paper circulated previously under the title "Stochastic volatility models with transaction time risk". A web-appendix on the last author's website contains some detailed material left out of this version of the paper.

Appendix A. Proofs

As Proposition 2.1 is a special case of Proposition 3.1 (setting β_{t_i} = 0), we only prove the latter.
Proof of Proposition 3.1. All references in this proof are to Protter (2003). We consider the conditional expectation of squared observed returns. Note that, under Assumption B, L is a square-integrable martingale and so is \int_0^t \sigma_{t_i+u}\, dL_{t_i+u}, by applying the lemma on page 171. Using Corollary 3 on page 73 and Theorem II.29, we find

E_{t_i} \left( \int_{u=0}^{t} \sigma_{t_i+u}\, dL_{t_i+u} \right)^2 = E_{t_i} \int_{u=0}^{t} \sigma_{t_i+u}^2\, d[L, L]_{t_i+u}.
The quadratic variation [L, L] is obviously increasing and, thus, of integrable variation since E[L, L]_t = t < ∞. Moreover, the compensator of this quadratic variation is time itself and, hence, Theorem III.16 implies

E_{t_i} \int_{u=0}^{t} \sigma_{t_i+u}^2\, d[L, L]_{t_i+u} = E_{t_i} \int_{u=0}^{t} \sigma_{t_i+u}^2\, du.
Using our maintained assumption that the optional sampling theorem (Theorem I.16) applies, we find that the above arguments remain valid if we stop the martingales at t = t_{i+1}, i.e.,

E_{t_i} R_{t_i:t_{i+1}}^2 = E_{t_i} \int_{0}^{\Delta t_{i+1}} \sigma_{t_i+u}^2\, du.
Consequently,

E_{t_i} R_{t_i:t_{i+1}}^2 = E_{t_i} \int_{0}^{\infty} I_{(0, \Delta t_{i+1}]}(u)\, \sigma_{t_i+u}^2\, du
= \int_{0}^{\infty} P_{t_i}\{\Delta t_{i+1} \ge u\}\, E_{t_i} \sigma_{t_i+u}^2\, du + \int_{0}^{\infty} \mathrm{Cov}_{t_i}\!\left( I_{(0, \Delta t_{i+1}]}(u),\, \sigma_{t_i+u}^2 \right) du
= \int_{0}^{\infty} \left[ 1 - F\!\left( \frac{u}{\psi_{t_i}} \right) \right] E_{t_i} \sigma_{t_i+u}^2\, du + \int_{0}^{\infty} \beta_{t_i}(u)\, F\!\left( \frac{u}{\psi_{t_i}} \right) \left[ 1 - F\!\left( \frac{u}{\psi_{t_i}} \right) \right] du,

where the Fubini exchange in the second equality is allowed as the integrand is nonnegative, and the expectation of the product is written as the product of the expectations plus the covariance. With a change of variables u = vψ_{t_i}, we deduce

\frac{\mathrm{Var}_{t_i} R_{t_i:t_{i+1}}}{\psi_{t_i}} = \int_{v=0}^{\infty} E_{t_i} \sigma_{t_i+v\psi_{t_i}}^2\, (1 - F(v))\, dv + \int_{v=0}^{\infty} \beta_{t_i}(v\psi_{t_i})\, F(v)(1 - F(v))\, dv,

which gives the desired result upon noting that \int_{v=0}^{\infty} (1 - F(v))\, dv = E\{V\} = 1.

Appendix B. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.jeconom.2010.03.036.

References

Admati, A., Pfleiderer, P., 1988. A theory of intraday patterns: volume and price variability. The Review of Financial Studies 1, 3–40.
Aït-Sahalia, Y., Mykland, P., 2003. The effects of random and discrete sampling when estimating continuous-time diffusions. Econometrica 71, 483–549.
Amihud, Y., Mendelson, H., 1991. Volatility, efficiency, and trading: evidence from the Japanese stock market. The Journal of Finance 46, 1765–1789.
Ane, T., Geman, H., 2000. Order flow, transaction clock, and normality of asset returns. The Journal of Finance 55, 2259–2284.
Back, K., 1991. Asset prices for general processes. Journal of Mathematical Economics 20, 317–395.
Bandi, F.M., Russell, J.R., 2006. Separating microstructure noise from volatility. Journal of Financial Economics 79, 655–692.
Barndorff-Nielsen, O., Shephard, N., 2002. Econometric analysis of realized volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society, Series B 64, 253–280.
Barndorff-Nielsen, O., Hansen, P., Lunde, A., Shephard, N., 2006. Designing realised kernels to measure the ex-post variation of equity prices in the presence of noise. Working Paper.
Carr, P., Wu, L., 2004. Time-changed Lévy processes and option pricing. Journal of Financial Economics 71, 113–141.
Delbaen, F., Schachermayer, W., 1999. Non-arbitrage and the fundamental theorem of asset pricing: summary of main results. In: Proceedings of Symposia in Applied Mathematics of the AMS, vol. 57, pp. 49–58.
Drost, F.C., Werker, B.J.M., 1996. Closing the GARCH gap: continuous time GARCH modelling. Journal of Econometrics 74, 31–57.
Duffie, D., Glynn, P., 2004. Estimation of continuous-time Markov processes sampled at random times. Econometrica, 1773–1808.
Dufour, A., Engle, R.F., 2000. Time and the price impact of a trade. The Journal of Finance 55, 2467–2498.
Easley, D., Engle, R.F., O'Hara, M., Wu, L., 2008. Time-varying arrival rates of informed and uninformed trades. Journal of Financial Econometrics 6, 171–207.
Eberlein, E., Papapantoleon, A., 2005. Equivalence of floating and fixed strike Asian and lookback options. Stochastic Processes and their Applications 115, 31–40.
Engle, R.F., 2000. The econometrics of ultra-high frequency data. Econometrica 68, 1–22.
Engle, R.F., Russell, J.R., 1998. Autoregressive conditional duration: a new model for irregularly spaced transaction data. Econometrica 66, 1127–1162.
Engle, R.F., Sun, Z., 2005. Forecasting volatility using tick by tick data. Working Paper.
Florens, J.-P., Fougère, D., 1996. Noncausality in continuous time. Econometrica 64, 1195–1212.
Foster, F.D., Viswanathan, S., 1990. A theory of interday variations in volumes, variances, and trading costs in securities markets. Review of Financial Studies 4, 595–624.
Foster, F.D., Viswanathan, S., 1993. Variations in trading volume, return volatility, and trading costs: evidence on recent price formation models. The Journal of Finance 48, 187–211.
French, K., Roll, R., 1986. Stock return variances: the arrival of information and the reaction of traders. Journal of Financial Economics 17, 5–26.
Ghysels, E., Jasiak, J., 1998. GARCH for irregularly spaced financial data: the ACD-GARCH model. Studies in Nonlinear Dynamics and Econometrics 2, 133–149.
Grammig, J., Wellner, M., 2002. Modelling the interdependence of volatility and inter-transaction duration processes. Journal of Econometrics 106, 369–400.
Hansen, B.E., 1995. Regression with nonstationary volatility. Econometrica 63, 1113–1132.
Hansen, P.R., Lunde, A., 2006. Realized variance and market microstructure noise, invited lecture with discussion. Journal of Business and Economic Statistics 24, 127–161.
Hautsch, N., 2003. Assessing the risk of liquidity suppliers on the basis of excess demand intensities. Journal of Financial Econometrics 1, 189–215.
Jones, C., Kaul, G., Lipson, M., 1994. Transactions, volume, and volatility. Review of Financial Studies 7, 631–651.
Manganelli, S., 2005. Duration, volume, and volatility impact of trades. Journal of Financial Markets 8, 377–399.
Meddahi, N., Renault, E., 2004. Temporal aggregation of volatility models. Journal of Econometrics 119, 355–379.
Meddahi, N., Renault, E., Werker, B.J.M., 2006. GARCH and irregularly spaced data. Economics Letters 90, 200–204.
Mykland, P.A., Renault, E., 2008. Estimation of volatility with endogenous observation times. Working Paper.
Pierce, D.A., Haugh, L.D., 1977. Causality in temporal systems: characterizations and a survey. Journal of Econometrics 5, 265–293.
Protter, P., 2003. Stochastic Integration and Differential Equations, 2nd ed. Springer-Verlag, Berlin.
Renault, E., Sekkat, K., Szafarz, A., 1998. Testing for spurious causality in exchange rates. Journal of Empirical Finance 5, 47–66.
Zhang, L., Mykland, P.A., Aït-Sahalia, Y., 2005. A tale of two time scales: determining integrated volatility with noisy high-frequency data. Journal of the American Statistical Association 100, 1394–1411.
Zhou, B., 1996. High-frequency data and volatility in foreign-exchange rates. Journal of Business and Economic Statistics 14, 45–52.
Journal of Econometrics 160 (2011) 280–287
Variance dynamics: Joint evidence from options and high-frequency returns Liuren Wu ∗ Zicklin School of Business, Baruch College, the City University of New York, United States
article info

Article history: Available online 6 March 2010.
JEL classification: C13, C51, G12, G13.
abstract

This paper analyzes the S&P 500 index return variance dynamics and the variance risk premium by combining information in variance swap rates constructed from options and quadratic variation estimators constructed from tick data on S&P 500 index futures. Estimation shows that the index return variance jumps. The jump arrival rate is not constant over time, but is proportional to the variance rate level. The variance jumps are not rare events but arrive frequently. Estimation also identifies a strongly negative variance risk premium, the absolute magnitude of which is proportional to the variance rate level.

© 2010 Elsevier B.V. All rights reserved.
Keywords: Return variance dynamics; Variance risk premium; Options; Variance swap rates; High-frequency returns; Market microstructure; Realized variance; Quadratic variation; Time-changed Lévy processes
1. Introduction

Return variance on financial securities is stochastic. Understanding return variance dynamics is imperative for derivative pricing, risk management, and asset pricing in general. Yet, how to model and estimate variance dynamics remains a challenging task, mainly because return variance is not directly observable as security prices are. Recent advances in two frontiers of finance greatly enhance the identification of return variance dynamics and its pricing. One frontier is the derivatives market. Researchers show that under very general settings, the return variance swap rate, which equals the risk-neutral expected value of return variance over a fixed horizon, can be well approximated by the value of a specific portfolio of options across different strikes at the same maturity.1 In line with this theoretical development, on September 22, 2003, the Chicago Board of Options Exchange (CBOE) launched a new volatility index, the VIX, and back-calculated this index to
∗ Corresponding address: Department of Economics and Finance, Zicklin School of Business, Baruch College, One Bernard Baruch Way, Box B10-225, New York, NY 10010-5585, United States. Tel.: +1 646 312 3509; fax: +1 646 312 3451.
1 Theoretical works on the replication of variance swap contracts include Carr and Madan (1998), Demeterfi et al. (1999), Britten-Jones and Neuberger (2000), Jiang and Tian (2005), and Carr and Wu (2009).
doi:10.1016/j.jeconom.2010.03.037
1990. This index approximates the 30-day variance swap rate on the S&P 500 index (Carr and Wu, 2006). Thus, the risk-neutral expected value of the 30-day variance becomes an approximately observable series. The other frontier is market microstructure. The increasing availability of high-frequency data spurs a rapid development of new theories on computing realized variance from high-frequency returns. If the true price of a security can be sampled frequently, the sum of the squared returns over a sample period converges in the limit to the return quadratic variation for that period. More recently, researchers realize that microstructure noise can bias the estimate of return quadratic variation. They propose various methods to account for the microstructure effects.2 Variance swap rates constructed from options, and quadratic variation estimators constructed from high-frequency returns, make the return variance an almost observable quantity, up to a risk-neutral projection in the former, and up to a random-jump related error term in the latter. Directly using these series can make inferences on variance dynamics more accurate and less reliant on the underlying return dynamics specification. In this
2 Examples include Andreou and Ghysels (2002), Aït-Sahalia et al. (2005), Andersen et al. (2005), Oomen (2005), Zhang et al. (2005), Bandi and Russell (2006), Hansen and Lunde (2006), Barndorff-Nielsen et al. (2008), and Podolskij and Vetter (2009).
paper, I exploit the recent advances in both frontiers, and estimate the variance dynamics and variance risk premium on the S&P 500 index from a joint analysis of over 15 years of daily time series on the VIX index and several quadratic variation estimators constructed from tick data on S&P 500 index futures. Starting with a stochastically time-changed Lévy process for the S&P 500 index return, this paper uses the stochastic time change to capture the index return variance dynamics. By specifying the time change via the instantaneous variance rate, the paper identifies the variance rate dynamics using the variance swap rate and the quadratic variation estimators, without ever specifying, and hence without jointly testing, the Lévy process that defines the return innovations. This paper analyzes the variance rate dynamics and pricing within the flexible affine framework of Filipović (2001). Within this framework, the paper estimates several variance dynamics specifications. Estimation shows that the variance rate jumps. The jump arrival rate is not constant over time, but is proportional to the variance level. Jumps in the variance are not rare events, but arrive frequently. Estimation also identifies a strongly negative variance risk premium, the absolute magnitude of which is proportional to the variance level. Combining realized variance estimators with variance swap rates, this paper provides insights into the discontinuous movements of the variance rate dynamics, and answers questions on whether the variance jump intensity is proportional to the variance rate level and whether the variance jumps arrive frequently or are rare events. These questions have not been effectively addressed in the option pricing literature due to identification issues (e.g., Eraker (2004)), nor in the realized variance literature due to the nonparametric nature of the proposed tests (e.g., Barndorff-Nielsen and Shephard, 2004).
In other related works, Jones (2003) studies the variance dynamics using daily series on index returns and CBOE’s old volatility index VXO. Chernov (2007) analyzes the link between high–low range volatility estimators and the VXO. In this paper, I use tick data to construct realized variance estimators, which contain much less noise than estimators from daily returns or high–low data. Furthermore, I use the new VIX index instead of the old VXO, because the new VIX index directly approximates the variance swap rate under very general settings whereas CBOE’s construction of the old VXO involves an erroneous day-counting conversion that biases the estimate upward (Carr and Wu, 2006). Also related is a working paper by Bollerslev et al. (2004), who construct a risk aversion index using realized variance and the new VIX. The rest of the paper is structured as follows. The next section establishes the theory that underlies our variance dynamics estimation. Section 3 describes the data and the estimation procedure. Section 4 discusses the estimation results. Section 5 concludes.
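The realized variance construction described in the introduction — summing squared high-frequency log returns — can be sketched as follows, on simulated one-minute prices (purely illustrative; microstructure-noise corrections are omitted).

```python
import numpy as np

def realized_variance(prices):
    """Sum of squared log returns; as the sampling interval shrinks, this
    converges in probability to the return quadratic variation."""
    log_returns = np.diff(np.log(prices))
    return float(np.sum(log_returns ** 2))

# Simulated one-minute price path over a 390-minute session (illustrative):
rng = np.random.default_rng(0)
prices = 100.0 * np.exp(np.cumsum(0.001 * rng.standard_normal(390)))
rv = realized_variance(prices)
```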
2. Theory

The objective of this paper is to estimate the instantaneous variance rate dynamics v(t) without ever specifying the return drift process θ_t or the return innovation (the standardized Lévy process L_t). The estimation is based on observations of two quantities. The first is the volatility index VIX squared, which approximates the 30-day variance swap rate on the S&P 500 index and hence the expected value of the 30-day quadratic variation (QV) under a risk-neutral measure Q,

where I approximate the integral in (5) by assuming a constant variance rate within day t, with δ denoting the length of one day. In this approximation, I include an additional scaling coefficient ϕ to further adjust scale mismatches between the quadratic variation estimators and the annualized variance rate derived from the variance swap rate in Eq. (3). The mismatches can come from a number of sources. For example, the quadratic variation estimators are computed from tick data on index futures during the day trading session from 9:30 am to 3:15 pm Chicago time. Thus, the estimators do not capture after-hours trading activity and overnight information flows that can move the index futures. Further, the instantaneous variance rate derived from the variance swap rate can differ from the variance rate level under the statistical measure if the index level jumps and the jump risk is priced. The error term e_t in Eq. (7) can come either from the random return jump realizations as defined in (6), or from noises originated
Let (Ω, F, (F_t)_{t≥0}, P) represent a stochastic basis, with P being the physical measure that governs the time series dynamics of the S&P 500 index returns,

\ln S_T / S_t = \int_t^T \theta_s\, ds + L_{\mathcal{T}_{[t,T]}},  (1)

where θ_t denotes the instantaneous drift of the index return, L_t denotes a standardized Lévy process with its variance normalized to t, and \mathcal{T}_{[t,T]} denotes a continuous and differentiable stochastic time change that captures the randomness in the integrated variance over the horizon [t, T],

\mathcal{T}_{[t,T]} \equiv \int_t^T v_{-}(s)\, ds,  (2)

where v(t) denotes the instantaneous variance rate.
VIX_{t,h}^2 = \frac{1}{h} E_t^{Q}\, QV_{[t,t+h]} = \frac{1}{h} E_t^{Q}\, \mathcal{T}_{[t,t+h]}^{Q}, \qquad h = 30/365,  (3)

where the second equality replaces the quadratic variation with its risk-neutral expected value over random jumps, i.e., the risk-neutral integrated variance \mathcal{T}_{[t,t+h]}^{Q}. The first equality is an approximation. Carr and Wu (2009) show that approximation errors can arise from discrete strikes and discontinuous movements in the S&P 500 index. Nevertheless, under commonly used models and reasonable model parameters, they use numerical analysis to show that the approximation errors from both sources are small. In this paper, I treat the approximation errors as negligible and infer the variance rate from the VIX squared under affine model specifications.

The second quantity is an estimator of the daily quadratic variation, QV_{[t,t+δ]}, with δ = 1/365 denoting the length of one day. The quadratic variation is defined as the limit in probability of the sum of squared returns as the sampling interval approaches zero,

QV_{[t,t+\delta]} \equiv \lim_{\Delta \to 0} \sum_{j=1}^{[\delta/\Delta]} \left( \ln S_{t+j\Delta} / S_{t+(j-1)\Delta} \right)^2,  (4)
where [δ/Δ] denotes the number of observations in a day given sampling interval Δ. Since the integrated variance represents an expectation of the quadratic variation over the random jump realization, we can write

QV_{[t,t+\delta]} = \mathcal{T}_{[t,t+\delta]} + e_{[t,t+\delta]},  (5)
where the zero-mean error term e_{[t,t+δ]} is induced by the random jumps in the index return. In particular, if we let µ_t(dx) denote the counting measure of jumps in the index return and let ν_t(dx) denote the corresponding compensator, which measures the arrival rate of jumps of size x, the error term becomes

e_{[t,t+\delta]} = \int_t^{t+\delta} \int_{R_0} x^2\, (\mu_s(dx) - \nu_s(dx))\, ds,  (6)
where R_0 denotes the real line excluding zero. Given daily quadratic variation estimators \widehat{QV}_t from high-frequency return data, we can rewrite Eq. (5) as

\widehat{QV}_t = v_t\, \delta\, \phi + e_t,  (7)
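A small numerical sketch of the two observables: the squared decimal VIX as the annualized variance swap rate (Eq. (3)) and the conditional mean of the daily quadratic variation estimator (Eq. (7)). All values, including the scaling coefficient and the variance rate level, are hypothetical.

```python
# Eq. (3): the squared (decimal) VIX approximates the annualized risk-neutral
# expected quadratic variation over h = 30/365.
vix_quote = 20.0                          # VIX in annualized volatility points
var_swap_rate = (vix_quote / 100.0) ** 2  # annualized variance, ~0.04
h = 30 / 365
expected_qv_30d = var_swap_rate * h       # un-annualized 30-day expected QV

# Eq. (7): conditional mean of the daily quadratic variation estimator,
# QV_hat = v * delta * phi + e, with hypothetical v and phi.
delta = 1 / 365
phi = 0.8                                 # hypothetical scaling coefficient
v_t = var_swap_rate                       # pretend v equals the swap rate
qv_mean = v_t * delta * phi
```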
in the particular quadratic variation estimator (Meddahi, 2003). I assume an AR(1) dynamics for the error term to accommodate potential serial dependence,

e_{t+1} = \phi_e e_t + \varepsilon_{t+1},  (8)

where the daily time step is normalized to one for notational clarity and ε denotes an iid zero-mean random error term. A non-zero autocorrelation ϕ_e can be induced by persistence of the return jump arrival rates and/or by microstructure noise in the quadratic variation estimator. The distribution of ε depends on both the jump structure, as defined in (6), and the distribution of the random noise in the quadratic variation estimator. If we make the simplifying assumption that ε is normally distributed with variance V_e, we can write the daily log likelihood contribution from the quadratic variation estimator as

l(e_{t+1} \mid e_t) = -\frac{1}{2} \left[ \ln(2\pi) + \ln(V_e) + (e_{t+1} - \phi_e e_t)^2 / V_e \right].  (9)

where δ refers to the daily time interval. The affine dynamics in Eq. (10) dictates that the characteristic function is exponential affine in v_t,

\phi_{v_t}(u) = \exp(-b(\delta)\, v_t - c(\delta)),
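The daily Gaussian (quasi-)log-likelihood contribution of Eq. (9) is straightforward to evaluate; a sketch with hypothetical inputs.

```python
import math

def loglik_contribution(e_next, e_curr, phi_e, V_e):
    """Daily Gaussian (quasi-)log-likelihood contribution for the AR(1)
    error term of Eq. (8), following Eq. (9)."""
    resid = e_next - phi_e * e_curr
    return -0.5 * (math.log(2 * math.pi) + math.log(V_e) + resid ** 2 / V_e)

# Hypothetical values for one day:
ll = loglik_contribution(e_next=0.02, e_curr=0.05, phi_e=0.3, V_e=0.01)
```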
When the distribution of ε deviates from the normal assumption, (9) represents a valid quasi-likelihood for inferences on the conditional mean of the quadratic variation estimator, v_t δ ϕ. In particular, the inferences on v_t δ ϕ depend neither on the distribution of ε nor on its variance V_e, and standard likelihood ratio tests are equally applicable to the quasi-likelihood (McCullagh and Nelder (1983), pp. 168–172). In estimating a constant variance from a noisy quadratic variation estimator, Aït-Sahalia et al. (2005) show that even if the noise distribution is misspecified, the variance estimator obtained by maximizing the misspecified log-likelihood function remains consistent and the asymptotic variance of the estimator is the same as that from the true likelihood. Xiu (2009) further shows that in the presence of stochastic volatility and market microstructure noise, the integrated variance estimator remains consistent, efficient, and robust as a quasi-maximum likelihood estimator under the misspecified assumptions. Hence, inferences on the variance rate dynamics remain valid whether Eq. (9) is regarded as the true log likelihood or as a quasi-likelihood on the quadratic variation estimator.

2.1. Affine variance rate dynamics and variance swap pricing

To derive the likelihood on the instantaneous variance rate, I consider a general one-factor affine specification (Filipović, 2001), under which the variance rate is governed by the stochastic differential equation

dv_t = (a - \kappa v_t)\, dt + \sqrt{\omega v_t}\, dW_t + \int_{R_0^+} x \left( \mu(dx) - [\nu_c(dx) + v_t\, \nu_p(dx)]\, dt \right),  (10)
where a, ω, and κ are all positive constants, R_0^+ denotes the positive half-line excluding zero, and ν_c(dx) and ν_p(dx) denote two nonnegative Borel measures that define the arrival rates of jumps in the variance rate, with ν_c(dx) describing a jump component with a constant arrival rate and ν_p(dx) describing a jump component with an arrival rate proportional to the variance rate level.
∫ R+ 0
(x1x
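A minimal Euler-scheme sketch of a variance rate path in the spirit of Eq. (10), keeping the square-root diffusion and only the proportional-intensity jump component, with exponential jump sizes. All parameter values are hypothetical; this is an illustration, not the paper's estimated specification.

```python
import numpy as np

def simulate_variance(v0, a, kappa, omega, jump_intensity, jump_mean,
                      n_steps, dt, seed=0):
    """Euler discretization of a square-root variance process with jumps
    whose arrival rate is proportional to the variance level (illustrative)."""
    rng = np.random.default_rng(seed)
    v = np.empty(n_steps + 1)
    v[0] = v0
    for i in range(n_steps):
        drift = (a - kappa * v[i]) * dt
        diffusion = np.sqrt(max(omega * v[i], 0.0) * dt) * rng.standard_normal()
        # jump arrival rate proportional to the current variance level
        n_jumps = rng.poisson(jump_intensity * v[i] * dt)
        jumps = rng.exponential(jump_mean, n_jumps).sum() if n_jumps else 0.0
        v[i + 1] = max(v[i] + drift + diffusion + jumps, 0.0)  # keep v >= 0
    return v

path = simulate_variance(v0=0.04, a=0.02, kappa=1.5, omega=0.1,
                         jump_intensity=50.0, jump_mean=0.01,
                         n_steps=252, dt=1 / 252)
```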