SHORT-MEMORY LINEAR PROCESSES AND ECONOMETRIC APPLICATIONS
SHORT-MEMORY LINEAR PROCESSES AND ECONOMETRIC APPLICATIONS KAIRAT T. MYNBAEV International School of Economics Kazakh-British Technical University Almaty, Kazakhstan
Copyright © 2011 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic format. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:
Mynbaev, K. T. (Kairat Turysbekovich)
Short-memory linear processes and econometric applications / Kairat T. Mynbaev.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-92419-8
1. Linear programming. 2. Econometric models. 3. Regression analysis. 4. Probabilities. I. Title.
T57.74.M98 2011
519.7'2—dc22
2010040947
Printed in the United States of America 10 9 8 7 6 5 4 3 2 1
To my teacher Mukhtarbai Otelbaev, from whom I learnt the best I know.
CONTENTS

List of Tables xi
Preface xiii
Acknowledgments xix

1 INTRODUCTION TO OPERATORS, PROBABILITIES AND THE LINEAR MODEL 1
  1.1 Linear Spaces 1
  1.2 Normed Spaces 3
  1.3 Linear Operators 6
  1.4 Hilbert Spaces 9
  1.5 Lp Spaces 13
  1.6 Conditioning on σ-fields 18
  1.7 Matrix Algebra 21
  1.8 Convergence of Random Variables 24
  1.9 The Linear Model 29
  1.10 Normalization of Regressors 32
  1.11 General Framework in the Case of K Regressors 35
  1.12 Introduction to L2-Approximability 40

2 Lp-APPROXIMABLE SEQUENCES OF VECTORS 45
  2.1 Discretization, Interpolation and Haar Projector in Lp 45
  2.2 Convergence of Bilinear Forms 49
  2.3 The Trinity and Its Boundedness in lp 54
  2.4 Convergence of the Trinity on Lp-Generated Sequences 57
  2.5 Properties of Lp-Approximable Sequences 68
  2.6 Criterion of Lp-Approximability 71
  2.7 Examples and Counterexamples 80

3 CONVERGENCE OF LINEAR AND QUADRATIC FORMS 89
  3.1 General Information 89
  3.2 Weak Laws of Large Numbers 93
  3.3 Central Limit Theorems for Martingale Differences 94
  3.4 Central Limit Theorems for Weighted Sums of Martingale Differences 95
  3.5 Central Limit Theorems for Weighted Sums of Linear Processes 102
  3.6 Lp-Approximable Sequences of Matrices 106
  3.7 Integral Operators 111
  3.8 Classes sp 117
  3.9 Convergence of Quadratic Forms of Random Variables 122

4 REGRESSIONS WITH SLOWLY VARYING REGRESSORS 131
  4.1 Slowly Varying Functions 132
  4.2 Phillips Gallery 1 133
  4.3 Slowly Varying Functions with Remainder 143
  4.4 Results Based on Lp-Approximability 149
  4.5 Phillips Gallery 2 159
  4.6 Regression with Two Slowly Varying Regressors 174

5 SPATIAL MODELS 189
  5.1 A Math Introduction to Purely Spatial Models 190
  5.2 Continuity of Nonlinear Matrix Functions 193
  5.3 Assumption on the Error Term and Implications 195
  5.4 Assumption on the Spatial Matrices and Implications 198
  5.5 Assumption on the Kernel and Implications 201
  5.6 Linear and Quadratic Forms Involving Segments of K 205
  5.7 The Roundabout Road 207
  5.8 Asymptotics of the OLS Estimator for Purely Spatial Model 213
  5.9 Method of Moments and Maximum Likelihood 217
  5.10 Two-Step Procedure 223
  5.11 Examples and Computer Simulation 230
  5.12 Mixed Spatial Model 236
  5.13 The Roundabout Road (Mixed Model) 244
  5.14 Asymptotics of the OLS Estimator for Mixed Spatial Model 254

6 CONVERGENCE ALMOST EVERYWHERE 265
  6.1 Theoretical Background 265
  6.2 Various Bounds on Martingale Transforms 270
  6.3 Marcinkiewicz–Zygmund Theorems and Related Results 278
  6.4 Strong Consistency for Multiple Regression 292
  6.5 Some Algebra Related to Vector Autoregression 300
  6.6 Preliminary Analysis 310
  6.7 Strong Consistency for Vector Autoregression and Related Results 319

7 NONLINEAR MODELS 339
  7.1 Asymptotic Normality of an Abstract Estimator 339
  7.2 Convergence of Some Deterministic and Stochastic Expressions 346
  7.3 Nonlinear Least Squares 358
  7.4 Binary Logit Models with Unbounded Explanatory Variables 373

8 TOOLS FOR VECTOR AUTOREGRESSIONS 393
  8.1 Lp-Approximable Sequences of Matrix-Valued Functions 393
  8.2 T-Operator and Trinity 397
  8.3 Matrix Operations and Lp-Approximability 400
  8.4 Resolvents 402
  8.5 Convergence and Bounds for Deterministic Trends 404

REFERENCES 417
Author Index 423
Subject Index 425
LIST OF TABLES

4.1 Basic SV Functions 134
4.2 Transition Matrix Summary 178
4.3 Type-Wise OLS Asymptotics 182
4.4 Transition Matrix Summary in Case II: |λd| < 1, β1 λd + β2 = 0, β2 ≠ 0 187
5.1 Simulations for Theorem 5.8.1 234
5.2 Simulations for Two-Step Estimator 235
5.3 Comparison of Percentage Errors 236
5.4 Asymptotic Distribution with a Constant Term and Case Matrix 262
5.5 Simulation Results for Pseudo-Case Matrices 262
5.6 Simulation Results for Case Matrices 263
6.1 Contributions to the Consistency Theory of Autoregressions 302
7.1 Properties of Bernoulli Variables 386
PREFACE
1 RED LIGHT There are no new econometric models in this book. You will not find real-life applications or tests of economic theories either.
2 GREEN LIGHT The book concentrates on the methodology of asymptotic theory in econometrics. Specifically, central limit theorems (CLTs) for weighted sums of short-memory processes are obtained. They are applied to several well-known econometric models to demonstrate how their asymptotic behavior can be studied, what kind of assumptions are (in)appropriate and how probabilistic convergence statements are applied. Currently, no monographs or textbooks are devoted specifically to econometric models with deterministic regressors. The field is considered rather narrow by some specialists because the first thing they think about is polynomial trends. Indeed, polynomial trends are not widely used in econometrics. However, some other types of regressors fall into the classes of deterministic regressors considered in the literature; for example, some spatial matrices and seasonal dummies. This makes deterministic explanatory variables more important than commonly thought. Besides, on the level of CLTs deterministic weights are of interest in themselves. There is a monograph by Taylor (1978) devoted exclusively to such theorems.
3 THE ESSENCE

By and large, the CLTs here are based on only one global idea: how sequences of discrete objects (vectors and matrices) can be approximated with functions of a continuous argument (defined on the segment [0, 1]). Stated in this general way, the idea is as old as calculus. The novelty here consists in the application of the idea to weighted sums of linear processes

  Σ_{t=1}^{n} w_{nt} u_t,    (0.1)

where w_n = (w_{n1}, ..., w_{nn}), n = 1, 2, ..., is a sequence of deterministic vector weights and

  u_t = Σ_{j=-∞}^{∞} c_{t-j} e_j,  t = 0, ±1, ±2, ...,    (0.2)

is a short-memory linear process. Anybody with a little experience in probabilities, statistics, and econometrics can confirm that statements on convergence in distribution of such weighted sums have many applications.

As it turned out, the main difficulties in proving precise CLTs lay in the theory of functions. Hence, attempts to obtain general CLTs for sums of type Eq. (0.1) by researchers with backgrounds other than the theory of functions yielded results less satisfactory than those published in my paper (Mynbaev, 2001) on Lp-approximable sequences. My interest in CLTs for Eq. (0.1) arose from the necessities of regression analysis. In the asymptotic theory of regressions with deterministic regressors, sequences of regressors can be approximated by functions of a continuous argument. The structure of the corresponding estimators allows for application of CLTs for Eq. (0.1). As I was developing applications, I needed various additional properties of Lp-approximable sequences. They are distributed throughout the book and, taken together, constitute a complete toolkit accompanying the main CLTs.

In the econometrics context, two other definitions of deterministic regressors are suggested in the literature. A purely algebraic definition (based on recursion) was proposed by Johansen (2000) and developed further by Nielsen (2005) to study strong consistency of ordinary least squares (OLS) estimators. Nielsen's result, given in Chapter 8, shows that such regressors are asymptotically polynomial functions multiplied by oscillating (trigonometric) functions. The Johansen-Nielsen approach and Lp-approximability complement each other. Phillips (2007) has defined regressors in terms of slowly varying (SV) functions (a function-theoretic construction). Slow variation is a limit property at infinity and, in general, has nothing to do with Lp-approximability, which is a limit property distributed over the segment [0, 1].
However, special sequences arising from SV functions in the regression context are all Lp -approximable.
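To make Eqs. (0.1) and (0.2) concrete, here is a small numerical sketch of mine (not from the book). I assume a one-sided process with illustrative geometric coefficients c_j = ρ^j, innovations of variance σ², and the flat weights w_nt = n^{-1/2}; with these assumptions the variance of the weighted sum can be computed exactly from the autocovariances γ(h) = σ² Σ_j c_j c_{j+h} and approaches the long-run variance σ²(Σ_j c_j)² as n grows.

```python
# Variance of the weighted sum (0.1) for a short-memory process (0.2)
# with illustrative coefficients c_j = rho**j and flat weights w_nt = n**-0.5.

rho, sigma2 = 0.6, 1.0
J = 200  # truncation point for the infinite MA representation

c = [rho**j for j in range(J)]

def gamma(h):
    """Autocovariance gamma(h) = sigma2 * sum_j c_j c_{j+h}."""
    return sigma2 * sum(c[j] * c[j + h] for j in range(J - h))

def var_weighted_sum(n):
    """Var(n^{-1/2} sum_{t=1}^n u_t) = gamma(0) + 2 sum_{h<n} (1 - h/n) gamma(h)."""
    return gamma(0) + 2 * sum((1 - h / n) * gamma(h) for h in range(1, min(n, J)))

long_run = sigma2 * sum(c)**2  # limit sigma^2 * (sum_j c_j)^2

for n in (10, 100, 1000):
    print(n, var_weighted_sum(n))
print("long-run variance:", long_run)
```

The printed values increase toward the long-run variance, which is the quantity that appears in the normalization of the corresponding CLT.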
4 STANDING PROBLEMS

About half of the results contained in the book were obtained after I started writing it. The theory has grown to the extent that no single person can embrace all the ramifications.

1. Linear processes (0.2), depending on the rate at which the numbers c_j vanish at infinity, are classified as follows. Processes for which

  Σ_j |c_j| < ∞    (0.3)

are called short-memory processes. Processes for which the series in Eq. (0.3) diverges, but

  Σ_j c_j² < ∞,

are called long-memory processes. My CLT for weighted sums Eq. (0.1) holds in the case of short-memory processes. The existing CLTs for long-memory ones, as deep as they are, leave some questions open.

2. The main advantage of representing sequences of vectors with the help of functions of a continuous argument is that the limit expressions in asymptotic distributions involve integrals of those functions. Thus, they are amenable to further analysis, which I call analysis at infinity. For this reason alone, when my definition of Lp-approximability does not fit practical situations (and there is at least one, in spatial econometrics), developing a more suitable definition may be better than relinquishing the concept altogether.

3. The name of the book reflects its coverage rather than its potential. There are two important directions in which it can be extended. One is nonparametric and nonlinear estimation, where even my CLT will suffice for the beginning. Another is the case of stochastic regressors. In this case Anderson and Kunitomo (1992) impose conditions on separate parts of the OLS estimator that allow them to prove its convergence. As an alternative, I would embed enough structure in the stochastic regressors to be able to derive convergence of separate parts of the OLS estimator. The structure entailed by Lp-approximability in the deterministic case may guide the choice for the stochastic case.
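The classification in item 1 can be checked numerically on two stock examples (my own illustration, not the book's): geometric coefficients c_j = 2^{-j} are absolutely summable (short memory), while c_j = 1/j is not absolutely summable although Σ c_j² < ∞ (long memory).

```python
# Partial sums distinguishing short memory (sum |c_j| < infinity)
# from long memory (sum |c_j| diverges but sum c_j^2 < infinity).

def partial_sums(coef, N):
    s_abs = sum(abs(coef(j)) for j in range(1, N + 1))
    s_sq = sum(coef(j)**2 for j in range(1, N + 1))
    return s_abs, s_sq

short = lambda j: 2.0**(-j)   # short memory: sum |c_j| = 1
long_ = lambda j: 1.0 / j     # long memory: the harmonic series diverges

for N in (10**2, 10**4, 10**5):
    print(N, partial_sums(short, N)[0], partial_sums(long_, N))
```

The absolute partial sums of the first sequence stabilize at 1, while those of the second keep growing (like log N); the squared partial sums of the second stay bounded by π²/6.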
5 REVIEW BY CHAPTERS

Chapter 1 is a collection of general ideas and preliminaries from probability theory and functional analysis. It also contains a discussion of L2-approximability and its advantages. The first nontrivial application is to the convergence in distribution of the fitted value for the linear regression. This convergence looks so incredible to some econometricians that an anonymous referee of Econometric Theory said that my paper was "full of mistakes." Naturally, the paper was rejected and, not so naturally, the result was not published in journals. Thus this book was written. The discussion of issues related to normalization of regressors draws from folklore and should be in the core of any course on asymptotic theory in econometrics.

Chapter 2 covers the nonstochastic part of my paper (Mynbaev, 2001). Readers with a taste for mathematical precision will find it illuminating that Lp-approximability (which relates sequences of vectors to functions defined on [0, 1]) can be characterized intrinsically (in terms of the sequences of vectors themselves). This is evidence of a well-balanced definition. On a more practical note, such results and their by-products make sure that the ensuing CLTs are the most precise and general.
The main CLTs are proved in Chapter 3, which is based on Mynbaev (2001), but I would like to acknowledge the influence of Nabeya and Tanaka (1990), who paved the way to treating convergence of quadratic forms. This is where the theory of integral operators is needed and introduced first. There are many CLTs and weak laws of large numbers (WLLNs) out there. The reader will notice that when the innovations e_j in linear processes [Eq. (0.2)] are martingale differences (m.d.'s), the McLeish CLT (McLeish, 1974) and the Chow-Davidson WLLN (Davidson, 1994) are absolutely sufficient for the purposes of Chapter 3. I dare to suggest trying these tools first in all other problems with linear processes involving m.d.'s.

Serious applications (to static models) start in Chapter 4. Phillips (2007) developed a nice scheme of investigating asymptotic properties of regressions with regressors such as log s, log(log s), their reciprocals and so on. Chapter 4 follows this scheme, while the underlying central limit results are derived from my CLT. This is possible because Phillips' specification of the weights in Eq. (0.1) is a special case of Lp-approximable sequences. One of the methodological conclusions of this chapter is that direct derivation from a CLT given in Chapter 3 is better than the recourse to Brownian motion used by Phillips.

Chapter 5 demonstrates what can be done with the help of Lp-approximability in the theory of spatial models. This research started with a joint paper (Mynbaev and Ullah, 2008) in which we showed that the OLS estimator for a purely spatial model is not asymptotically normal. In Mynbaev (2010), this result is extended to a mixed spatial model. Spatial models are peculiar in many respects, a full discussion of which would be too technical for a preface. It is worth stating here only the most general methodological conclusion: when studying the asymptotic behavior of a new model, never presume it is of a certain class. Otherwise, you will be bound to use specific techniques that will take you to a particular result, and you will not see the general picture.

In the 1980s, Lai and Wei in a series of papers (Lai and Wei, 1982, 1983a, 1983b, 1985) obtained outstanding results on strong consistency of the OLS estimator for the linear model, with and without autoregressive terms. Reading those papers is a thankless task because the solution to a large problem is divided into publishable articles, and the times of publication of the articles are not the best reflection of the logic of the solution. Chapter 6 is an attempt to expound Lai and Wei's theory coherently.

Chapter 7 contains a treatment of two nonlinear estimators: nonlinear least squares (NLS) and maximum likelihood (ML). The choice of the models is explained by the fact that in both cases the explanatory variables are deterministic. The first part of the chapter covers the Phillips (2007) result for the model y_s = βs^γ + u_s. The second part is my extension to unbounded explanatory variables of the approach to binary logit models suggested by Gouriéroux and Monfort (1981).

Finally, Chapter 8 contains a study of algebraic properties of Lp-approximable sequences of matrix-valued functions and a study of a different type of deterministic trends from Nielsen (2005). The applications to vector autoregressions (VARs) with deterministic trends are left out.
6 EXPOSITION

The book is analytical in nature, meaning that there is a lot of formula manipulation. Most calculations are detailed so they can be followed without pen and paper. To simplify the reader's job, all meaningful parts of proofs are given in separate statements. Because of this, some proofs look longer than they are. Commuters who need to do their reading in buses and trains will benefit from such exposition.

Only the core theoretical results are collected in Chapters 2 and 3. All others are given immediately before they are applied (including some CLTs). Thus, application-specific properties of Lp-approximable sequences, as well as parts of the theory of integral operators, are scattered throughout the book.

If someone were to lecture using this book, I can imagine how clumsy it would be to say, "Let us recall the function defined by Eq. (9) in Lecture 3." For this reason I have tried to give names not only to final statements, but also to auxiliary objects, such as lemmas, functions and operators. In most cases the names reflect the roles performed by such objects. Thus, you will see bad and good coefficients, a chain product, annihilation lemma, balancer, cutter and the like. However, in a couple of cases descriptive names would be too long, and the names I give reflect the look, not the role. There is a projy and proXy, an awkward aggregate, genie (because of G_n) and so on.

No subsection contains more than one statement. Therefore statements are referred to by the section they are in. Thus, Lemma 3.1.2 means the statement from subsection 3.1.2, even though the name 'Lemma' may not be there. Equation numbering follows the Wiley standard: Eq. (7.1) means equation 1 from Chapter 7. To make the book self-contained, most preliminaries are given in the book. All calculations are detailed, with extensive cross-referencing.
7 SUGGESTIONS FOR READING

The variety and depth of mathematical theories used by econometricians can be a serious obstacle for novices. Davidson (1994) has done an excellent job in gathering in one place the required minimum, from measure theory to stochastic processes. For me, this is the most important book I have read in the past 10 years, and I recommend it for preliminary or concurrent reading.

A partial excuse for the limited coverage of the existing literature is that during the four years that I was working on the book I did not receive any support, financial or otherwise, and did not have access to a good library, except when I traveled to international conferences. At the final stage, when the book was in production, I received useful references from some colleagues. Regarding weighted sums and their applications in econometrics, Jonathan B. Hill suggests reading Čížek (2008), Goldie and Smith (1987), Hahn et al. (1987), Hill (2010, 2011) and references therein. Jan Mielniczuk, who contributed a lot to the theory of long-memory processes not covered here, proposes reading Wu (2005) and Wu and Min (2005) for the most recent developments in the area. M. H. Pesaran was kind enough to provide the references Chudik et al. (2010), Holly et al. (2008) and Pesaran and Chudik (2010) for spatial models, vector autoregressions and panel data models.
Personally, I find nothing more gratifying than reading applied econometric papers because they abound in new ideas. Sometimes they also show how things should not be done. See more about this in Chapter 1. Kairat T. Mynbaev Almaty, Kazakhstan
ACKNOWLEDGMENTS
I am grateful to Carlos Martins Filho for his encouragement for this book and many other projects. The folks at John Wiley & Sons have been highly efficient in preparing the book for publication. The production process surely involved many people of whom I would like to especially thank Susanne Steitz-Filler, Christine Punzo, Jacqueline Palmieri, and Nick Barber (Books Manager, Techset Composition Ltd).
CHAPTER 1

INTRODUCTION TO OPERATORS, PROBABILITIES AND THE LINEAR MODEL

THIS CHAPTER has a little bit of everything: normed and Hilbert spaces, linear operators, probabilities (including conditional expectations and different modes of convergence), and matrix algebra. An introduction to the OLS method is given, along with a discussion of methodological issues, such as the choice of the format of the convergence statement, the choice of the conditions sufficient for convergence and the use of L2-approximability. The exposition presumes that the reader is versed more in the theory of probabilities than in functional analysis.
1.1 LINEAR SPACES

In this book basic notions of functional analysis are used more frequently than in most other econometric books. Here I explain these notions the way I understand them, omitting some formalities and emphasizing the intuition.
1.1.1 Linear Spaces

The Euclidean space R^n is a good point of departure when introducing linear spaces. An element x = (x_1, ..., x_n) ∈ R^n is called a vector. Two vectors x, y can be added coordinate by coordinate to obtain a new vector

  x + y = (x_1 + y_1, ..., x_n + y_n).    (1.1)

A vector x can be multiplied by a number a ∈ R, giving ax = (ax_1, ..., ax_n). By combining these two operations we can form expressions like ax + by or, more generally,

  a_1 x^(1) + ... + a_m x^(m),    (1.2)

where a_1, ..., a_m are numbers and x^(1), ..., x^(m) are vectors. Expression (1.2) is called a linear combination of the vectors x^(1), ..., x^(m) with coefficients a_1, ..., a_m. Generally, multiplication of vectors is not defined.

Here we observe the major difference between R and R^n. In R both summation a + b and multiplication ab can be performed. In R^n we can add two vectors, but to multiply them we use elements of another set, the set of real numbers (or scalars) R. Generalizing upon this situation, we obtain abstract linear (or vector) spaces. The elements x, y of a linear space L are called vectors. They can be added to give another vector x + y. Summation is defined axiomatically and, in general, there is no coordinate representation of type (1.1) for summation. A vector x can be multiplied by a scalar a ∈ R. As in R^n, we can form linear combinations [Eq. (1.2)].

The generalization is pretty straightforward, so what's the big deal? You see, in functional analysis complex objects, such as functions and operators, are considered vectors or points in some space. Here is an example. Denote by C[0, 1] the set of continuous functions on the segment [0, 1]. The sum of two functions F, G ∈ C[0, 1] is defined as the function F + G with values (F + G)(t) = F(t) + G(t), t ∈ [0, 1] [this is an analog of Eq. (1.1)]. Continuity of F, G implies continuity of their sum and of the product aF, for a scalar a, so C[0, 1] is a linear space.
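The pointwise operations that make C[0, 1] a linear space are easy to mimic in code (a toy illustration of mine, not the book's): functions are treated as values, and addition and scalar multiplication are defined pointwise, mirroring (F + G)(t) = F(t) + G(t).

```python
# Treating continuous functions on [0, 1] as vectors: addition and scalar
# multiplication are defined pointwise.

def add(F, G):
    return lambda t: F(t) + G(t)

def scale(a, F):
    return lambda t: a * F(t)

F = lambda t: t**2
G = lambda t: 1 - t

H = add(scale(2.0, F), G)   # the linear combination 2F + G
print(H(0.5))               # 2*(0.25) + 0.5 = 1.0
```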
1.1.2 Subspaces of Linear Spaces

A subset L_1 of a linear space L is called its linear subspace (or just a subspace, for simplicity) if all linear combinations ax + by of any elements x, y ∈ L_1 belong to L_1. Obviously, the set {0} and L itself are subspaces of L, called trivial subspaces. For example, in R^n the set L_1 = {x : c_1 x_1 + ... + c_n x_n = 0} is a subspace because if x, y ∈ L_1, then c_1(ax_1 + by_1) + ... + c_n(ax_n + by_n) = 0. Thus, in R^3 the usual straight lines and two-dimensional (2-D) planes containing the origin are subspaces. All the intuition we get from our day-to-day experience with the space we live in applies to subspaces.

Geometrically, summation x + y is performed by the parallelogram rule. Multiplying x by a number a ≠ 0, we obtain a vector ax of either the same (a > 0) or opposite (a < 0) direction. Multiplying x by all real numbers, we obtain a straight line {ax : a ∈ R} passing through the origin and parallel to x. This is a particular situation in which it may be convenient to call x a point rather than a vector. Then the previous sentence sounds like this: multiplying x by all real numbers, we get a straight line passing through the origin and the given point x.

For given x_1, ..., x_n their linear span M is, by definition, the least linear subspace of L containing those points. In the case n = 2 it can be constructed as follows. Draw a straight line L_1 = {ax_1 : a ∈ R} through the origin and x_1 and another straight line L_2 = {ax_2 : a ∈ R} through the origin and x_2. Then form M by adding elements of L_1 and L_2 using the parallelogram rule: M = {x + y : x ∈ L_1, y ∈ L_2}.
1.1.3 Linear Independence

Vectors x_1, ..., x_n are linearly independent if the linear combination c_1 x_1 + ... + c_n x_n can be null only when all coefficients are null.

EXAMPLE 1.1. Denote by e_j = (0, ..., 0, 1, 0, ..., 0) (unity in the jth place) the jth unit vector in R^n. From the definition of vector operations in R^n we see that c_1 e_1 + ... + c_n e_n = (c_1, ..., c_n). Hence, the equation c_1 e_1 + ... + c_n e_n = 0 implies equality of all coefficients to zero, and the unit vectors are linearly independent.

If in a linear space L there exist vectors x_1, ..., x_n such that

1. x_1, ..., x_n are linearly independent and
2. any other vector x ∈ L is a linear combination of x_1, ..., x_n,

then L is called n-dimensional and the system {x_1, ..., x_n} is called its basis. If, on the other hand, for any natural n, L contains n linearly independent vectors, then L is called infinite-dimensional.

EXAMPLE 1.2. The unit vectors in R^n form a basis because they are linearly independent and for any x ∈ R^n we can write x = (x_1, ..., x_n) = x_1 e_1 + ... + x_n e_n.

EXAMPLE 1.3. C[0, 1] is infinite-dimensional. Consider the monomials x_j(t) = t^j, j = 0, ..., n. By the fundamental theorem of algebra, the equation c_0 x_0(t) + ... + c_n x_n(t) = 0 with nonzero coefficients can have at most n roots. Hence, if c_0 x_0(t) + ... + c_n x_n(t) is identically zero on [0, 1], the coefficients must be zero, so these monomials are linearly independent.

Functional analysis deals mainly with infinite-dimensional spaces. Together with the desire to do without coordinate representations of vectors, this fact has led to the development of very powerful methods.
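Example 1.3 can also be probed numerically (my illustration, not the book's): evaluating the monomials t^j at n + 1 distinct points gives a Vandermonde matrix whose determinant equals the product of the pairwise node differences; since this product is nonzero for distinct nodes, the only representation of the zero function is the one with zero coefficients.

```python
# Linear independence of the monomials 1, t, ..., t^n on [0, 1]: the
# Vandermonde determinant at distinct nodes t_0 < ... < t_n equals
# prod_{i<j} (t_j - t_i), which is nonzero.

from math import prod

def vandermonde_det(nodes):
    n = len(nodes)
    return prod(nodes[j] - nodes[i] for j in range(n) for i in range(j))

nodes = [k / 5 for k in range(6)]  # 6 distinct points in [0, 1]
d = vandermonde_det(nodes)
print(d)  # nonzero => monomials up to degree 5 are linearly independent
```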
1.2 NORMED SPACES

1.2.1 Normed Spaces

The Pythagorean theorem gives rise to the Euclidean distance

  dist(x, y) = (Σ_i (x_i - y_i)²)^{1/2}    (1.3)

between points x, y ∈ R^n. In an abstract situation, we can first axiomatically define the distance dist(x, 0) from x to the origin, and then the distance between any two points will be dist(x, y) = dist(x - y, 0) (this looks like a tautology, but programmers use such definitions all the time). dist(x, 0) is denoted ‖x‖ and is called a norm.

Let X be a linear space. A real-valued function ‖·‖ defined on X is called a norm if

1. ‖x‖ ≥ 0 (nonnegativity),
2. ‖ax‖ = |a| ‖x‖ for all numbers a and vectors x (homogeneity),
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality) and
4. ‖x‖ = 0 implies x = 0 (nondegeneracy).
By homogeneity the norm of the null vector is zero: since the null vector satisfies 0 (vector) = 0 (number) · 0 (vector), we have

  ‖0‖ = |0| ‖0‖ = 0.

Nondegeneracy makes sure that the null vector is the only vector whose norm is zero. If we omit the nondegeneracy requirement, the result is the definition of a seminorm.

Distance measurement is another context in which points and vectors can be used interchangeably: ‖x‖ is the length of the vector x and the distance from the point x to the origin. In this book, the way norms are used for bounding various quantities is clear from the next two definitions. Let {X_i} be a nested sequence of normed spaces, X_1 ⊆ X_2 ⊆ .... Take one element from each of these spaces, x_i ∈ X_i. We say that {x_i} is a bounded sequence if sup_i ‖x_i‖_{X_i} < ∞ and vanishing if ‖x_i‖_{X_i} → 0.

1.2.2 Convergence in Normed Spaces

A linear space X provided with a norm ‖·‖ is denoted (X, ‖·‖). This is often simplified to X. We say that a sequence {x_n} converges to x if ‖x_n - x‖ → 0. In this case we write lim x_n = x.

Lemma.
(i) Vector operations are continuous: if lim x_n = x, lim y_n = y and lim a_n = a, then lim a_n x_n = ax and lim(x_n + y_n) = lim x_n + lim y_n.
(ii) If lim x_n = x, then lim ‖x_n‖ = ‖x‖ (a norm is continuous in the topology it induces).

Proof. (i) Applying the triangle inequality and homogeneity,

  ‖a_n x_n - ax‖ ≤ ‖(a_n - a)x‖ + ‖a_n(x_n - x)‖ = |a_n - a| ‖x‖ + |a_n| ‖x_n - x‖ → 0.

Here we remember that convergence of the sequence {a_n} implies its boundedness: sup|a_n| < ∞.

(ii) Let us prove that

  | ‖x‖ - ‖y‖ | ≤ ‖x - y‖.    (1.4)

The proof is modeled on a similar result for absolute values. By the triangle inequality, ‖x‖ ≤ ‖x - y‖ + ‖y‖ and ‖x‖ - ‖y‖ ≤ ‖x - y‖. Changing the places of x and y and using homogeneity, we get ‖y‖ - ‖x‖ ≤ ‖y - x‖ = ‖x - y‖. The latter two inequalities imply Eq. (1.4). Equation (1.4) yields continuity of the norm: | ‖x_n‖ - ‖x‖ | ≤ ‖x_n - x‖ → 0. ∎
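Inequality (1.4) is easy to sanity-check numerically (an illustration of mine, using the Euclidean norm in R^3 on random pairs of vectors):

```python
# Numerical check of | ||x|| - ||y|| | <= ||x - y|| (Eq. 1.4) in R^3
# with the Euclidean norm.

import random
from math import sqrt

def norm(x):
    return sqrt(sum(v * v for v in x))

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(3)]
    y = [random.uniform(-1, 1) for _ in range(3)]
    diff = [a - b for a, b in zip(x, y)]
    assert abs(norm(x) - norm(y)) <= norm(diff) + 1e-12
print("reverse triangle inequality verified on 1000 random pairs")
```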
We say that {x_n} is a Cauchy sequence if lim_{n,m→∞} (x_n - x_m) = 0. If {x_n} converges to x, then it is a Cauchy sequence: ‖x_n - x_m‖ ≤ ‖x_n - x‖ + ‖x - x_m‖ → 0. If the converse is true (that is, every Cauchy sequence converges), then the space is called complete. All normed spaces considered in this book are complete, which ensures the existence of limits of Cauchy sequences.
1.2.3 Spaces l_p

A norm more general than (1.3) is obtained by replacing the index 2 by an arbitrary number p ∈ [1, ∞). In other words, in R^n the function

  ‖x‖_p = (Σ_i |x_i|^p)^{1/p}    (1.5)

satisfies all the axioms of a norm. For p = ∞, definition (1.5) is completed with

  ‖x‖_∞ = sup_i |x_i|    (1.6)

because lim_{p→∞} ‖x‖_p = ‖x‖_∞. R^n provided with the norm ‖·‖_p is denoted R^n_p (1 ≤ p ≤ ∞). The most immediate generalization of R^n_p is the space l_p of infinite sequences of numbers x = (x_1, x_2, ...) that have a finite norm ‖x‖_p [defined by Eq. (1.5) or (1.6), where i runs over the set of naturals N]. More generally, the set of indices I = {i} in Eq. (1.5) or Eq. (1.6) may depend on the context. In addition to R^n_p we use M_p (the set of matrices of all sizes). The jth unit vector in l_p is the infinite sequence e_j = (0, ..., 0, 1, 0, ...) with unity in the jth place and 0 in all others. It is immediate that the unit vectors are linearly independent and l_p is infinite-dimensional.
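The limit lim_{p→∞} ‖x‖_p = ‖x‖_∞ behind definition (1.6) can be watched numerically (my illustration, not the book's):

```python
# ||x||_p = (sum |x_i|^p)^(1/p) approaches max_i |x_i| as p grows.

def lp_norm(x, p):
    return sum(abs(v)**p for v in x)**(1.0 / p)

x = [3.0, -4.0, 1.0]
sup_norm = max(abs(v) for v in x)  # ||x||_inf = 4

for p in (1, 2, 10, 100):
    print(p, lp_norm(x, p))
# the values decrease toward 4.0 as p -> infinity
```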
1.2.4 Inequalities in $l_p$

The triangle inequality in $l_p$,
$$\|x + y\|_p \le \|x\|_p + \|y\|_p,$$
is called the Minkowski inequality. Its proof can be found in many texts, which is not true of another, less known, property that it is natural to call monotonicity of $l_p$ norms:
$$\|x\|_p \le \|x\|_q \quad \text{for all } 1 \le q \le p \le \infty. \qquad (1.7)$$
If $x = 0$, there is nothing to prove. If $x \ne 0$, the general case can be reduced to the case $\|x\|_q = 1$ by considering the normalized vector $x/\|x\|_q$. Now $\|x\|_q = 1$ implies $|x_i| \le 1$ for all $i$, so that $|x_i|^p \le |x_i|^q$. Hence, if $p < \infty$, we have
$$\|x\|_p = \Bigl(\sum_i |x_i|^p\Bigr)^{1/p} \le \Bigl(\sum_i |x_i|^q\Bigr)^{1/p} = \Bigl(\sum_i |x_i|^q\Bigr)^{1/q} = \|x\|_q,$$
where the middle equality holds because $\sum_i |x_i|^q = \|x\|_q^q = 1$. If $p = \infty$, the inequality $\sup_i |x_i| \le \|x\|_q$ is obvious.
CHAPTER 1
INTRODUCTION TO OPERATORS, PROBABILITIES AND THE LINEAR MODEL
In $l_p$ there is no general inequality opposite to Eq. (1.7). In $R_p^n$ there is one. For example, in the case $n = 2$ we can write
$$\max\{|x_1|, |x_2|\} \le (|x_1|^p + |x_2|^p)^{1/p} \le 2^{1/p} \max\{|x_1|, |x_2|\}.$$
All such inequalities are easy to remember under the general heading of equivalent norms. Two norms $\|\cdot\|_1$ and $\|\cdot\|_2$ defined on the same linear space $X$ are called equivalent if there exist constants $0 < c_1 \le c_2 < \infty$ such that
$$c_1 \|x\|_1 \le \|x\|_2 \le c_2 \|x\|_1 \quad \text{for all } x.$$

Theorem. (Trenogin, 1980, Section 3.3) In a finite-dimensional space any two norms are equivalent.
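As a quick numerical illustration of the monotonicity property (1.7) and of norm equivalence in $R_p^n$ (this snippet, and the particular equivalence bounds $\|x\|_\infty \le \|x\|_p \le n^{1/p}\|x\|_\infty$ it checks, are an editorial sketch rather than part of the text; NumPy is assumed):

```python
import numpy as np

def lp_norm(x, p):
    """The l_p norm (1.5); p = np.inf gives the sup-norm (1.6)."""
    if np.isinf(p):
        return float(np.abs(x).max())
    return float((np.abs(x) ** p).sum() ** (1.0 / p))

x = np.array([0.5, -1.2, 3.0, 0.1])
n = len(x)

# Monotonicity (1.7): the norm does not increase as p grows.
norms = [lp_norm(x, p) for p in (1, 2, 4, np.inf)]
assert all(a >= b for a, b in zip(norms, norms[1:]))

# Equivalence of norms in R^n: ||x||_inf <= ||x||_p <= n^(1/p) ||x||_inf.
for p in (1, 2, 4):
    assert lp_norm(x, np.inf) <= lp_norm(x, p) <= n ** (1.0 / p) * lp_norm(x, np.inf)
```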
1.3 LINEAR OPERATORS

1.3.1 Linear Operators

A linear operator is a generalization of the mapping $A: R^m \to R^n$ induced by an $n \times m$ matrix $A$ according to $y = Ax$. Let $L_1, L_2$ be linear spaces. A mapping $A: L_1 \to L_2$ is called a linear operator if
$$A(ax + by) = aAx + bAy \qquad (1.8)$$
for all vectors $x, y \in L_1$ and numbers $a, b$.

A linear operator is a function in the first place, and the general definition of an image applies to it:
$$\operatorname{Im}(A) = \{Ax : x \in L_1\} \subseteq L_2.$$
However, because of the linearity of $A$ the image $\operatorname{Im}(A)$ is a linear subspace of $L_2$. Indeed, if we take two elements $y_1, y_2$ of the image, then there exist $x_1, x_2 \in L_1$ such that $Ax_i = y_i$. Hence, a linear combination $a_1 y_1 + a_2 y_2 = a_1 Ax_1 + a_2 Ax_2 = A(a_1 x_1 + a_2 x_2)$ belongs to the image.

With a linear operator $A$ we can associate another linear subspace
$$N(A) = \{x \in L_1 : Ax = 0\} \subseteq L_1,$$
called the null space of $A$. Its linearity easily follows from that of $A$: if $x, y$ belong to the null space of $A$, then their linear combination belongs to it too: $A(ax + by) = aAx + bAy = 0$.

The set of linear operators acting from $L_1$ to $L_2$ can be considered a linear space. A linear combination $aA + bB$ of operators $A, B$ is the operator defined by $(aA + bB)x = aAx + bBx$. It is easy to check the linearity of $aA + bB$.
If $A$ is a linear operator from $L_1$ to $L_2$ and $B$ is a linear operator from $L_2$ to $L_3$, then we can also define the product of operators $BA$ by $(BA)x = B(Ax)$. Applying Eq. (1.8) twice we see that $BA$ is linear:
$$(BA)(ax + by) = B(aAx + bAy) = a(BA)x + b(BA)y.$$
1.3.2 Bounded Linear Operators

Let $X_1, X_2$ be normed spaces and let $A: X_1 \to X_2$ be a linear operator. We can relate $\|Ax\|_2$ to $\|x\|_1$ by forming the ratio $\|Ax\|_2 / \|x\|_1$ for $x \ne 0$. $A$ is called a bounded operator if all such ratios are uniformly bounded, and the norm of an operator $A$ is defined as the supremum of those ratios:
$$\|A\| = \sup_{x \ne 0} \frac{\|Ax\|_2}{\|x\|_1}. \qquad (1.9)$$
An immediate consequence of this definition is the bound $\|Ax\|_2 \le \|A\|\,\|x\|_1$ for all $x \in X_1$, from which we see that the images $Ax$ of elements of the unit ball $b_1 = \{x \in X_1 : \|x\|_1 \le 1\}$ are uniformly bounded:
$$\|Ax\|_2 \le \|A\| \quad \text{for all } x \in b_1. \qquad (1.10)$$
To save a word, a bounded linear operator is called simply a bounded operator. Let $B(X_1, X_2)$ denote the set of bounded operators acting from $X_1$ to $X_2$.

Lemma. $B(X_1, X_2)$ with the norm (1.9) is a normed space.

Proof. We check the axioms from Section 1.2.1 one by one.

1. Nonnegativity is obvious.
2. Homogeneity of Eq. (1.9) follows from that of $\|\cdot\|_2$.
3. The inequality $\|(A + B)x\|_2 \le \|Ax\|_2 + \|Bx\|_2$ implies
$$\|A + B\| = \sup_{x \ne 0} \frac{\|(A + B)x\|_2}{\|x\|_1} \le \sup_{x \ne 0} \frac{\|Ax\|_2}{\|x\|_1} + \sup_{x \ne 0} \frac{\|Bx\|_2}{\|x\|_1} = \|A\| + \|B\|.$$
4. If $\|A\| = 0$, then $\|Ax\|_2 = 0$ for all $x$ and, consequently, $A = 0$. ∎
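Definition (1.9) can be made concrete for matrices. The sketch below (an editorial illustration, not from the text; NumPy assumed) approximates the supremum in (1.9) over many random unit vectors and compares it with the largest singular value, which equals the operator norm when both spaces carry Euclidean norms:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))            # a linear operator from R^4 to R^3

# Sample unit vectors: every ratio ||Ax|| / ||x|| is a lower bound for ||A||.
xs = rng.normal(size=(4, 100_000))
xs /= np.linalg.norm(xs, axis=0)       # each column now has unit norm
ratios = np.linalg.norm(A @ xs, axis=0)

norm_exact = np.linalg.svd(A, compute_uv=False)[0]   # largest singular value
assert ratios.max() <= norm_exact + 1e-12            # the sup is an upper bound
assert norm_exact - ratios.max() < 1e-2              # and it is nearly attained
```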
1.3.3 Isomorphism

Let $X_1, X_2$ be normed spaces. A linear operator $I: X_1 \to X_2$ is called an isomorphism if

1. $\|Ix\|_2 = \|x\|_1$ for all $x \in X_1$ (preservation of norms) and
2. $IX_1 = X_2$ ($I$ is a surjection).
Item 1 implies that $\|I\| = 1$ and that $I$ is one-to-one (if $Ix_1 = Ix_2$, then $\|x_1 - x_2\|_1 = \|I(x_1 - x_2)\|_2 = 0$ and $x_1 = x_2$). Hence, the inverse of $I$ exists and is an isomorphism from $X_2$ to $X_1$. Normed spaces $X_1$ and $X_2$ are called isomorphic if there exists an isomorphism $I: X_1 \to X_2$. Vector operations in $X_1$ are mirrored by those in $X_2$ and the norms are the same, so as normed spaces $X_1$ and $X_2$ are indistinguishable. However, a given operator in one of them may be easier to analyze than its isomorphic image in the other, because of special features. Let $A$ be a bounded operator in $X_1$. It is easy to see that $\tilde A = IAI^{-1}$ is a linear operator in $X_2$. Moreover, the norms are preserved under this mapping:
$$\|\tilde A\| = \sup_{x \ne 0} \frac{\|IAI^{-1}x\|_2}{\|x\|_2} = \sup_{y \ne 0} \frac{\|IAy\|_2}{\|Iy\|_2} = \sup_{y \ne 0} \frac{\|Ay\|_1}{\|y\|_1} = \|A\|.$$
1.3.4 Convergence of Operators

Let $A, A_1, A_2, \ldots$ be bounded operators from a normed space $X_1$ to a normed space $X_2$. The sequence $\{A_n\}$ converges to $A$ uniformly if $\|A_n - A\| \to 0$, where the norm is as defined in Eq. (1.9). This is convergence in the normed space $B(X_1, X_2)$. The word "uniform" is pertinent because, as we can see from Eq. (1.10), when $\|A_n - A\| \to 0$ we also have the convergence $\|A_n x - Ax\|_2 \to 0$ uniformly in the unit ball $b_1$. The sequence $\{A_n\}$ is said to converge to $A$ strongly, or pointwise, if for each $x \in X_1$ we have $\|A_n x - Ax\|_2 \to 0$. Of course, uniform convergence implies strong convergence.
1.3.5 Projectors

Projectors are used (or are implicitly present) in econometrics so often that it would be a sin to bypass them. Let $X$ be a normed space and let $P: X \to X$ be a bounded operator. $P$ is called a projector if
$$P^2 = P. \qquad (1.11)$$
Suppose $y$ is a projection of $x$, $y = Px$. Then $P$ doesn't change $y$: $Py = P^2 x = Px = y$. This property is the key to the intuition behind projectors. Consider on the plane two coordinate axes, $X$ and $Y$, intersecting at a positive, not necessarily right, angle. Projection of points on the plane onto the axis $X$ parallel to the axis $Y$ has the following geometrical properties:

1. The projection of the whole plane is $X$.
2. Points on $X$ stay the same.
3. Points on $Y$ are projected to the origin.
4. Any vector on the plane is uniquely represented as a sum of two vectors, one from $X$ and another from $Y$.

All these properties can be deduced from the linearity of $P$ and Eq. (1.11).
Lemma. Let $P$ be a projector and denote $Q = I - P$, where $I$ is the identity operator in $X$. Then

(i) $Q$ is also a projector.
(ii) $\operatorname{Im}(P)$ coincides with the set of fixed points of $P$: $\operatorname{Im}(P) = \{x : x = Px\}$.
(iii) $\operatorname{Im}(Q) = N(P)$, $\operatorname{Im}(P) = N(Q)$.
(iv) Any $x \in X$ can be uniquely represented as $x = y + z$ with $y \in \operatorname{Im}(P)$, $z \in \operatorname{Im}(Q)$.

Proof. (i) $Q^2 = (I - P)^2 = I - 2P + P^2 = I - P = Q$.

(ii) If $x \in \operatorname{Im}(P)$, then $x = Py$ for some $y \in X$ and $Px = P^2 y = Py = x$, so that $x$ is a fixed point of $P$. Conversely, if $x$ is a fixed point of $P$, then $x = Px \in \operatorname{Im}(P)$.

(iii) The equation $Px = 0$ is equivalent to $Qx = (I - P)x = x$, and the equation $\operatorname{Im}(Q) = N(P)$ follows. $\operatorname{Im}(P) = N(Q)$ is obtained similarly.

(iv) The desired representation is obtained by writing $x = Px + (I - P)x = y + z$, where $y = Px \in \operatorname{Im}(P)$ and $z = (I - P)x = Qx \in \operatorname{Im}(Q)$. If $x = y_1 + z_1$ is another such representation, then, subtracting one from the other, we get $y - y_1 = -(z - z_1)$. Hence, $P(y - y_1) = -P(z - z_1)$. Here the right-hand side is null because $z, z_1 \in \operatorname{Im}(Q) = N(P)$. The left-hand side is $y - y_1$ because both $y$ and $y_1$ are fixed points of $P$. Thus, $y = y_1$ and $z = z_1$. ∎
1.4 HILBERT SPACES

1.4.1 Scalar Products

A Hilbert space is another infinite-dimensional generalization of $R^n$. Everything starts with noticing how useful the scalar product
$$\langle x, y\rangle = \sum_{i=1}^n x_i y_i \qquad (1.12)$$
of two vectors $x, y \in R^n$ is. In terms of it we can define the Euclidean norm in $R^n$:
$$\|x\|_2 = \Bigl(\sum_{i=1}^n x_i^2\Bigr)^{1/2} = \langle x, x\rangle^{1/2}. \qquad (1.13)$$
Most importantly, we can find the cosine of the angle between $x$ and $y$ by the formula
$$\cos(\widehat{x, y}) = \frac{\langle x, y\rangle}{\|x\|_2\,\|y\|_2}. \qquad (1.14)$$
To do without the coordinate representation, we observe the algebraic properties of this scalar product. First of all, it is a bilinear form: it is linear with respect to one argument when the other is fixed:
$$\langle ax + by, z\rangle = a\langle x, z\rangle + b\langle y, z\rangle, \qquad \langle z, ax + by\rangle = a\langle z, x\rangle + b\langle z, y\rangle$$
for all vectors $x, y, z$ and numbers $a, b$. Further, we notice that $\langle x, x\rangle$ is always nonnegative and $\langle x, x\rangle = 0$ is true only when $x = 0$. Thus, on the abstract level, we start with the assumption that $H$ is a linear space and $\langle x, y\rangle$ is a real function of arguments $x, y \in H$ having the properties:

1. $\langle x, y\rangle$ is a bilinear form,
2. $\langle x, x\rangle \ge 0$ for all $x \in H$,
3. $\langle x, x\rangle = 0$ implies $x = 0$ and
4. $\langle x, y\rangle = \langle y, x\rangle$ for all $x, y$.

Such a function is called a scalar product. Put
$$\|x\| = \langle x, x\rangle^{1/2}. \qquad (1.15)$$

Lemma. (Cauchy–Schwarz inequality) $|\langle x, y\rangle| \le \|x\|\,\|y\|$.

Proof. The function $f(t) = \langle x + ty, x + ty\rangle$ of a real argument $t$ is nonnegative by item 2. Using items 1 and 4 we see that it is a quadratic function:
$$f(t) = \langle x, x + ty\rangle + t\langle y, x + ty\rangle = \langle x, x\rangle + 2t\langle x, y\rangle + t^2\langle y, y\rangle.$$
Its nonnegativity implies that its discriminant $\langle x, y\rangle^2 - \langle x, x\rangle\langle y, y\rangle$ is nonpositive. ∎
1.4.2 Continuity of Scalar Products

Notation (1.15) is justified by the following lemma.

Lemma

(i) Eq. (1.15) defines a norm on $H$ and the associated convergence concept: $x_n \to x$ in $H$ if $\|x_n - x\| \to 0$.
(ii) The scalar product is continuous: if $x_n \to x$ and $y_n \to y$, then $\langle x_n, y_n\rangle \to \langle x, y\rangle$.

Proof. (i) By the Cauchy–Schwarz inequality,
$$\|x + y\|^2 = \langle x + y, x + y\rangle = \|x\|^2 + 2\langle x, y\rangle + \|y\|^2 \le \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2,$$
which proves the triangle inequality from Section 1.2.1(3). The other properties of a norm (nonnegativity, homogeneity and nondegeneracy) easily follow from the scalar product axioms.

(ii) Convergence $x_n \to x$ implies boundedness of the norms $\|x_n\|$. Therefore, by the Cauchy–Schwarz inequality,
$$|\langle x_n, y_n\rangle - \langle x, y\rangle| \le |\langle x_n, y_n - y\rangle| + |\langle x_n - x, y\rangle| \le \|x_n\|\,\|y_n - y\| + \|x_n - x\|\,\|y\| \to 0. \quad ∎$$

A linear space $H$ that is endowed with a scalar product and is complete in the norm generated by that scalar product is called a Hilbert space.
1.4.3 Discrete Hölder's Inequality

An interesting generalization of the Cauchy–Schwarz inequality is in terms of the spaces $l_p$ from Section 1.2.3. Let $p$ be a number from $[1, \infty)$ or the symbol $\infty$. Its conjugate $q$ is defined from $1/p + 1/q = 1$. Explicitly,
$$q = \begin{cases} p/(p-1) \in (1, \infty), & 1 < p < \infty; \\ \infty, & p = 1; \\ 1, & p = \infty. \end{cases}$$
Hölder's inequality states that
$$\sum_{i=1}^\infty |x_i y_i| \le \|x\|_p \|y\|_q. \qquad (1.16)$$
A way to understand it is by considering the bilinear form $\langle x, y\rangle = \sum_{i=1}^\infty x_i y_i$. This form is defined on the Cartesian product $l_2 \times l_2$ and is continuous on it by Lemma 1.4.2. Hölder's inequality allows us to take the arguments from different spaces: $\langle x, y\rangle$ is defined on $l_p \times l_q$ and is continuous on this product.
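A numerical spot check of the discrete Hölder inequality (1.16) for several conjugate pairs (an editorial sketch, with finite vectors standing in for elements of $l_p$):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = rng.normal(size=50)

for p in (1.5, 2.0, 3.0):
    q = p / (p - 1.0)                       # conjugate exponent: 1/p + 1/q = 1
    lhs = np.abs(x * y).sum()               # sum of |x_i y_i|
    rhs = (np.abs(x) ** p).sum() ** (1 / p) * (np.abs(y) ** q).sum() ** (1 / q)
    assert lhs <= rhs + 1e-12               # Eq. (1.16)

# The case p = q = 2 recovers the Cauchy-Schwarz inequality of Section 1.4.1.
```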
1.4.4 Symmetric Operators

Let $A$ be a bounded operator in a Hilbert space $H$. Its adjoint $A^*$ is defined as the operator that satisfies
$$\langle Ax, y\rangle = \langle x, A^* y\rangle \quad \text{for all } x, y \in H.$$
This definition arises from the property of the transposed matrix $A'$:
$$\sum_{i=1}^n (Ax)_i y_i = \sum_{i=1}^n x_i (A'y)_i.$$
Existence of $A^*$ is proved using the so-called Riesz theorem. We do not need the general proof of existence because in all the cases we need, the adjoint will be constructed explicitly. Boundedness of $A^*$ will also be proved directly. $A$ is called symmetric if $A = A^*$. Symmetric operators stand out by having properties closest to those of real numbers.
1.4.5 Orthoprojectors

Cosines of angles between vectors from $H$ can be defined using Eq. (1.14). We don't need this definition, but we do need its special case: vectors $x, y \in H$ are called orthogonal if $\langle x, y\rangle = 0$. For orthogonal vectors we have the Pythagorean theorem:
$$\|x + y\|^2 = \langle x + y, x + y\rangle = \|x\|^2 + 2\langle x, y\rangle + \|y\|^2 = \|x\|^2 + \|y\|^2.$$
Two subspaces $X, Y \subseteq H$ are called orthogonal if every element of $X$ is orthogonal to every element of $Y$. If a projector $P$ in $H$ ($P^2 = P$) is symmetric, $P = P^*$, then it is called an orthoprojector. In the situation described in Section 1.3.5, when points on the plane are projected onto one axis parallel to another, orthoprojectors correspond to the case when the axes are orthogonal.

Lemma. Let $P$ be an orthoprojector and let $Q = I - P$. Then

(i) $\operatorname{Im}(P)$ is orthogonal to $\operatorname{Im}(Q)$.
(ii) For any $x \in H$, $\|Px\|$ is the distance from $x$ to $\operatorname{Im}(Q)$.

Proof. (i) Let $x \in \operatorname{Im}(P)$ and $y \in \operatorname{Im}(Q)$. By Lemma 1.3.5(ii), $x = Px$, $y = Qy$. Hence, $x$ and $y$ are orthogonal:
$$\langle x, y\rangle = \langle Px, Qy\rangle = \langle x, P(I - P)y\rangle = \langle x, (P - P^2)y\rangle = 0.$$
(ii) For an arbitrary element $x \in H$ and a set $A \subseteq H$ the distance from $x$ to $A$ is defined by
$$\operatorname{dist}(x, A) = \inf_{y \in A} \|x - y\|.$$
Take any $y \in \operatorname{Im}(Q)$. In the equation
$$x - y = Px + Qx - Qy = Px + Q(x - y)$$
the two terms on the right are orthogonal, so by the Pythagorean theorem
$$\|x - y\|^2 = \|Px\|^2 + \|Q(x - y)\|^2 \ge \|Px\|^2,$$
which implies the lower bound for the distance: $\operatorname{dist}(x, \operatorname{Im}(Q)) \ge \|Px\|$. This lower bound is attained at $y = Qx \in \operatorname{Im}(Q)$: $\|x - y\| = \|Px + Qx - Qx\| = \|Px\|$. Hence, $\operatorname{dist}(x, \operatorname{Im}(Q)) = \|Px\|$. ∎
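The econometric workhorse example of an orthoprojector is $P = X(X'X)^{-1}X'$, which projects onto the column space of a full-column-rank matrix $X$; this specific construction is standard regression algebra rather than something derived above. A sketch (NumPy assumed) verifying the properties of the lemma numerically:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3))                   # full-column-rank design matrix
P = X @ np.linalg.inv(X.T @ X) @ X.T           # orthoprojector onto Im(X)
Q = np.eye(20) - P

assert np.allclose(P @ P, P)                   # P is a projector, Eq. (1.11)
assert np.allclose(P, P.T)                     # and symmetric: an orthoprojector
assert np.allclose(P @ Q, np.zeros((20, 20)))  # Im(P) is orthogonal to Im(Q)

v = rng.normal(size=20)
# Pythagorean theorem: ||v||^2 = ||Pv||^2 + ||Qv||^2.
assert np.isclose(v @ v, (P @ v) @ (P @ v) + (Q @ v) @ (Q @ v))
# Lemma (ii): the distance from v to Im(Q) equals ||Pv||, attained at y = Qv.
assert np.isclose(np.linalg.norm(v - Q @ v), np.linalg.norm(P @ v))
```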
1.5 $L_p$ SPACES

1.5.1 $\sigma$-Fields

Let $\Omega$ be some set and let $\mathcal F$ be a nonempty family of its subsets. $\mathcal F$ is called a $\sigma$-field if

1. unions, intersections, differences and complements of any two elements of $\mathcal F$ belong to $\mathcal F$,
2. the union of any sequence $\{A_n : n = 1, 2, \ldots\}$ of elements of $\mathcal F$ belongs to $\mathcal F$ and
3. $\Omega$ belongs to $\mathcal F$.

This definition contains sufficiently many requirements to serve most purposes of analysis. In probabilities, $\sigma$-fields play the role of information sets. The precise meaning of this sentence at times can be pretty complex. The following existence statement is used very often.

Lemma. For any system $S$ of subsets of $\Omega$ there exists a $\sigma$-field $\mathcal F$ that contains $S$ and is contained in any other $\sigma$-field containing $S$.

Proof. The set of $\sigma$-fields containing $S$ is not empty. For example, the set of all subsets of $\Omega$ is a $\sigma$-field and contains $S$. Let $\sigma$ be the intersection of all $\sigma$-fields containing $S$. It obviously satisfies conditions 1–3 and hence is the $\sigma$-field we are looking for. ∎

The $\sigma$-field whose existence is affirmed in this lemma is called the least $\sigma$-field generated by $S$ and is denoted $\sigma(S)$.
1.5.2 Borel $\sigma$-field in $R^n$

A ball in $R^n$ centered at $x \in R^n$ of radius $\varepsilon > 0$,
$$b_\varepsilon(x) = \{y \in R^n : \|x - y\|_2 < \varepsilon\},$$
is called an $\varepsilon$-neighborhood of $x$. We say that a set $A \subseteq R^n$ is open if each point $x \in A$ belongs to $A$ together with some neighborhood $b_\varepsilon(x)$ (where $\varepsilon$ depends on $x$). The Borel $\sigma$-field $\mathcal B^n$ in $R^n$ is defined as the smallest $\sigma$-field that contains all open subsets of $R^n$. It exists by Lemma 1.5.1. In more general situations, when open subsets of $\Omega$ are not defined, $\sigma$-fields of $\Omega$ are introduced axiomatically.
1.5.3 s-Additive Measures A pair (V, F ), where V is some set and F is a s-field of its subsets, is called a measurable space. A set function m defined on elements of F with values in the extended half-line [0, 1] is called a s-additive measure if for any disjoint sets A1 , A2 , . . . [ F one has
m
1 [ j¼1
! Aj
¼
1 X
m(Aj ):
j¼1
EXAMPLE 1.4. On a plane, for any rectangle A define m(A) to be its area. The extension procedure from the measure theory then leads to the Lebesgue measure m with V ¼ R2 and F ¼ B2 (m is defined on all Borel subsets of R2 ). A probabilistic measure is a s-additive measure that satisfies an additional requirement m(V) ¼ 1: In this case, following common practice, we write P instead of m: Thus, a probability space (sometimes also called a sample space) is a triple (V, F , P) where V is a set, F is a s-field of its subsets and P is a s-additive measure on F such that P(V) ¼ 1: EXAMPLE 1.5. On a plane, take the square [0, 1]2 as V and let P be the Lebesgue measure. Then F will be the set of Borel subsets of the square.
1.5.4 Measurable Functions

Let $(\Omega_1, \mathcal F_1)$ and $(\Omega_2, \mathcal F_2)$ be two measurable spaces. A function $f: \Omega_1 \to \Omega_2$ is called measurable if $f^{-1}(A) \in \mathcal F_1$ for any $A \in \mathcal F_2$. More precisely, it is said to be $(\mathcal F_1, \mathcal F_2)$-measurable. In particular, when $(\Omega_1, \mathcal F_1) = (R^n, \mathcal B^n)$ and $(\Omega_2, \mathcal F_2) = (R^m, \mathcal B^m)$, this definition gives the definition of Borel measurability. Most of the time we deal with real-valued functions, when $\Omega_2 = R$ and $\mathcal F_2 = \mathcal B^1$ is the Borel $\sigma$-field. In this case we simply say that $f$ is $\mathcal F_1$-measurable. All analysis operations in the finite-dimensional case preserve measurability. The next theorem is often used implicitly.

Theorem. (Kolmogorov and Fomin, 1989, Chapter 5, Section 4)

1. Let $X$, $Y$ and $Z$ be arbitrary sets with systems of subsets $\sigma_X$, $\sigma_Y$ and $\sigma_Z$, respectively. Suppose the function $f: X \to Y$ is $(\sigma_X, \sigma_Y)$-measurable and $g: Y \to Z$ is $(\sigma_Y, \sigma_Z)$-measurable. Then the composition $z(x) = g(f(x))$ is $(\sigma_X, \sigma_Z)$-measurable.
2. Let $f$ and $g$ be defined on the same measurable space $(\Omega, \mathcal F)$. Then a linear combination $af + bg$ and the product $fg$ are measurable. If $g$ does not vanish, then the ratio $f/g$ is also measurable.
1.5.5 $L_p$ Spaces

Let $(\Omega, \mathcal F, \mu)$ be any space with a $\sigma$-additive measure $\mu$ and let $1 \le p < \infty$. The set of measurable functions $f: \Omega \to R$ provided with the norm
$$\|f\|_p = \Bigl(\int_\Omega |f(x)|^p\,d\mu\Bigr)^{1/p}, \quad 1 \le p < \infty,$$
is denoted $L_p = L_p(\Omega)$. In the case $p = \infty$ this definition is completed with
$$\|f\|_\infty = \operatorname{ess\,sup}_{x \in \Omega} |f(x)| = \inf_{\mu(A) = 0}\;\sup_{x \in \Omega \setminus A} |f(x)|.$$
The term in the middle is, by definition, the quantity at the right and is called the essential supremum. These definitions mean that values taken by functions on sets of measure zero don't matter. An equality $f(t) = 0$ is accompanied by the caveat "almost everywhere" (a.e.), or "almost surely" (a.s.) in the probabilistic setup, meaning that there is a set of measure zero outside of which $f(t) = 0$.
1.5.6 Inequalities in $L_p$

Apparently, $L_p$ spaces should have a lot in common with $l_p$ spaces. The triangle inequality in $L_p$,
$$\|f + g\|_p \le \|f\|_p + \|g\|_p,$$
is called the Minkowski inequality. Hölder's inequality looks like this:
$$\int_\Omega |f(x)g(x)|\,d\mu \le \|f\|_p \|g\|_q,$$
where $q$ is the conjugate of $p$. When $\mu(\Omega) < \infty$, we can use this inequality to show that for $1 \le p_1 < p_2 \le \infty$, $L_{p_2}$ is a subset of $L_{p_1}$:
$$\int_\Omega |f(x)|^{p_1}\,d\mu \le \Bigl(\int_\Omega |f(x)|^{p_1 \cdot p_2/p_1}\,d\mu\Bigr)^{p_1/p_2}\Bigl(\int_\Omega 1\,d\mu\Bigr)^{1 - p_1/p_2} = \|f\|_{p_2}^{p_1}\,[\mu(\Omega)]^{1 - p_1/p_2}.$$
In particular, when $(\Omega, \mathcal F, P)$ is a probability space, we get
$$\|f\|_{p_1} \le \|f\|_{p_2} \quad \text{if } 1 \le p_1 < p_2 \le \infty.$$
This is the opposite of the monotonicity relation (1.7).
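On a probability space the ordering $\|f\|_{p_1} \le \|f\|_{p_2}$ can be seen on data, since an empirical distribution is itself a probability measure. A small editorial sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
f = rng.normal(size=100_000)             # draws of a random variable

def Lp_norm(f, p):
    """(E|f|^p)^(1/p) under the empirical (probability) measure."""
    return float((np.abs(f) ** p).mean() ** (1.0 / p))

norms = [Lp_norm(f, p) for p in (1, 2, 3, 4)]
# ||f||_{p1} <= ||f||_{p2} for p1 < p2 -- the opposite of (1.7).
assert all(a <= b + 1e-12 for a, b in zip(norms, norms[1:]))
```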
1.5.7 Covariance as a Scalar Product

Real-valued measurable functions on a probability space $(\Omega, \mathcal F, P)$ are called random variables. Let $X, Y$ be integrable random variables (integrability is necessary for their
means to exist). Denote $x = X - EX$, $y = Y - EY$. Then the covariance of $X, Y$ is defined by
$$\operatorname{cov}(X, Y) = E(X - EX)(Y - EY) = Exy, \qquad (1.17)$$
the standard deviation of $X$ is, by definition,
$$\sigma(X) = \sqrt{\operatorname{cov}(X, X)} = \sqrt{Ex^2} = \sigma(x) \qquad (1.18)$$
and the definition of the correlation of $X, Y$ is
$$\rho(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma(X)\sigma(Y)} = \frac{Exy}{\sigma(x)\sigma(y)}. \qquad (1.19)$$
Comparison of Eqs. (1.17), (1.18) and (1.19) with Eqs. (1.12), (1.13) and (1.14) from Section 1.4.1 makes it clear that definitions (1.17), (1.18) and (1.19) originate in Euclidean geometry. In particular, $\sigma(X)$ is the distance from $X$ to $EX$ and from $x$ to 0. While this idea has been very fruitful, I often find it more useful to estimate $(EX^2)^{1/2}$, which is the distance from $X$ to 0.
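The geometric reading of (1.17)–(1.19) is easy to verify numerically: the correlation of $X$ and $Y$ is exactly the cosine (1.14) of the angle between the centered vectors $x$ and $y$. An editorial sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=10_000)
Y = 0.6 * X + rng.normal(size=10_000)

x, y = X - X.mean(), Y - Y.mean()             # centered versions
cov = (x * y).mean()                           # Exy, Eq. (1.17)
sd_x, sd_y = np.sqrt((x * x).mean()), np.sqrt((y * y).mean())  # Eq. (1.18)
rho = cov / (sd_x * sd_y)                      # Eq. (1.19)

# The same number as a cosine, Eq. (1.14), in the geometry of Section 1.4.1:
cos_angle = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
assert np.isclose(rho, cos_angle)
assert -1.0 <= rho <= 1.0                      # a consequence of Cauchy-Schwarz
```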
1.5.8 Dense Sets in $L_p$, $p < \infty$

Let us fix some space with measure $(\Omega, \mathcal F, \mu)$. A set $M \subseteq L_p$ is said to be dense in $L_p$ if any function $f \in L_p$ can be approximated by some sequence $\{f_n\} \subseteq M$: $\|f_n - f\|_p \to 0$. By $1_A$ we denote the indicator of a set $A$:
$$1_A(x) = \begin{cases} 1, & x \in A; \\ 0, & x \notin A. \end{cases}$$
A finite linear combination $\sum_i c_i 1_{A_i}$ of indicators of measurable sets $A_i \in \mathcal F$ is called a step function. We say that the measure $\mu$ is $\sigma$-finite if $\Omega$ can be represented as a union of disjoint sets $\Omega_i$,
$$\Omega = \bigcup_i \Omega_i, \qquad (1.20)$$
of finite measure $\mu(\Omega_i) < \infty$. For example, $R^n$ is a union of rectangles of finite Lebesgue measure.

Lemma. If $p < \infty$ and the measure $\mu$ is $\sigma$-finite, then the set $M$ of step functions is dense in $L_p$.

Proof. Step 1. Let $f \in L_p$. First we show that the general case of $\Omega$ of infinite measure can be reduced to the case $\mu(\Omega) < \infty$. Since for the sets from Eq. (1.20) we have
$$\int_\Omega |f(x)|^p\,d\mu = \sum_l \int_{\Omega_l} |f(x)|^p\,d\mu < \infty,$$
for any $\varepsilon > 0$ there exists $L > 0$ such that $\sum_{l > L} \int_{\Omega_l} |f(x)|^p\,d\mu < \varepsilon$. Denote $\tilde\Omega = \bigcup_{l=1}^L \Omega_l$. Whatever step function $\tilde f_1$ we find to approximate $f$ in $L_p(\tilde\Omega)$ in the sense that
$$\int_{\tilde\Omega} |f(x) - \tilde f_1(x)|^p\,d\mu < \varepsilon,$$
we can extend it by zero,
$$f_1(x) = \begin{cases} \tilde f_1(x), & x \in \tilde\Omega; \\ 0, & x \in \Omega \setminus \tilde\Omega, \end{cases}$$
to obtain an approximation to $f$ in $L_p(\Omega)$:
$$\int_\Omega |f - f_1|^p\,d\mu = \int_{\tilde\Omega} |f - \tilde f_1|^p\,d\mu + \int_{\Omega \setminus \tilde\Omega} |f|^p\,d\mu < 2\varepsilon.$$
Here $f_1$ is a step function and $\mu(\tilde\Omega) < \infty$.

Step 2. Now we show that $f$ can be considered bounded. From
$$\int_\Omega |f|^p\,d\mu = \sum_{l=1}^\infty \int_{\{l-1 \le |f(x)| < l\}} |f(x)|^p\,d\mu < \infty$$
we see that for any $\varepsilon > 0$, $L$ can be chosen so that $\int_{\{L \le |f(x)|\}} |f(x)|^p\,d\mu < \varepsilon$. Then $f$ is bounded on $\tilde\Omega = \{|f(x)| < L\}$ and, as above, we see that approximating $f$ by a step function on $\tilde\Omega$ is enough.

Step 3. Now we can assume that $\mu(\Omega) < \infty$ and $|f(x)| \le L$. Take a large $k$ and partition $[-L, L]$ into $k$ nonoverlapping (closed, semiclosed or open, it does not matter) intervals $\Delta_1, \ldots, \Delta_k$ of length $2L/k$. Let $l_1, \ldots, l_k$ denote the left ends of those intervals and put $A_m = f^{-1}(\Delta_m)$, $m = 1, \ldots, k$. Then the sets $A_m$ are disjoint,
$$|l_m - f(x)| \le \frac{2L}{k} \ \text{ for } x \in A_m, \quad \text{and} \quad \Omega = \bigcup_{m=1}^k A_m.$$
This implies
$$\int_\Omega \Bigl|\sum_m l_m 1_{A_m}(x) - f(x)\Bigr|^p\,d\mu = \sum_m \int_{A_m} |l_m - f(x)|^p\,d\mu \le \Bigl(\frac{2L}{k}\Bigr)^p \mu(\Omega) \to 0, \quad k \to \infty. \quad ∎$$
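Step 3 of the proof is easy to visualize numerically: the step function $\sum_m l_m 1_{A_m}$ approximates $f$ with $L_p$ error at most $2L/k$. An editorial sketch for $f(x) = x^2$ on $[0, 1]$ with the Lebesgue measure (the integral is approximated on a fine grid):

```python
import numpy as np

f = lambda x: x ** 2                      # |f| <= L = 1 on [0, 1]
L = 1.0
grid = np.linspace(0.0, 1.0, 100_001)     # grid approximation of the integral
vals = f(grid)

def step_error(k, p=2):
    """L_p distance between f and the step function built from k intervals."""
    edges = np.linspace(-L, L, k + 1)     # partition of [-L, L] into length-2L/k pieces
    idx = np.clip(np.searchsorted(edges, vals, side="right") - 1, 0, k - 1)
    approx = edges[idx]                   # left endpoints l_m on A_m = f^{-1}(Delta_m)
    return float(((np.abs(vals - approx) ** p).mean()) ** (1.0 / p))

errs = [step_error(k) for k in (4, 16, 64)]
assert errs[0] > errs[1] > errs[2]        # the error shrinks as k grows
assert errs[2] <= 2 * L / 64              # consistent with the (2L/k)^p mu(Omega) bound
```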
1.6 CONDITIONING ON $\sigma$-FIELDS

1.6.1 Absolute Continuity of Measures

Let $(\Omega, \mathcal F, P)$ be a probability space and let $f$ be an integrable function on $\Omega$. Then the $\sigma$-additivity of Lebesgue integrals (Kolmogorov and Fomin, 1989, Chapter 5, Section 5.4),
$$\int_{\bigcup_{m=1}^\infty A_m} f(x)\,dP = \sum_{m=1}^\infty \int_{A_m} f(x)\,dP \quad \text{for disjoint measurable } A_m,$$
means that
$$\nu(A) = \int_A f(x)\,dP \qquad (1.21)$$
is a $\sigma$-additive set function with the same domain $\mathcal F$ as that of $P$. Another property of Lebesgue integrals (see the same source) states that $\nu$ is absolutely continuous with respect to $P$: $\nu(A) = 0$ for each measurable set $A$ for which $P(A) = 0$. The Radon–Nikodym theorem affirms that the opposite is true: $\sigma$-additivity and absolute continuity are sufficient for a set function to be of the form (1.21).

Theorem. (Radon–Nikodym) (Kolmogorov and Fomin, 1989, Chapter 6, Section 5.3) If $(\Omega, \mathcal F, P)$ is a probability space and $\nu$ is a set function defined on $\mathcal F$ that is $\sigma$-additive and absolutely continuous with respect to $P$, then there exists an integrable function $f$ on $\Omega$ such that Eq. (1.21) is true. If $g$ is another such function, then $f = g$ a.s.
1.6.2 Conditional Expectation

Let $(\Omega, \mathcal F, P)$ be a probability space, $X$ an integrable random variable and $\mathcal G$ a $\sigma$-field contained in $\mathcal F$. The conditional expectation $E(X|\mathcal G)$ is defined as a $\mathcal G$-measurable function $Y$ such that
$$\int_A Y\,dP = \int_A X\,dP \quad \text{for all } A \in \mathcal G. \qquad (1.22)$$

EXAMPLE 1.6. Let $\mathcal G = \{\emptyset, \Omega\}$ be the smallest $\sigma$-field. In the case $A = \emptyset$ (or, more generally, $P(A) = 0$) Eq. (1.22) turns into an equality of two zeros. In the case $A = \Omega$ we see that the means of $Y$ and $X$ should be the same. Since a constant is the only $\mathcal G$-measurable random variable, it follows that $E(X|\mathcal G) = EX$.

EXAMPLE 1.7. Let $\mathcal G = \mathcal F$ be the largest $\sigma$-field contained in $\mathcal F$. Since $X$ is $\mathcal G$-measurable, $Y = X$ satisfies Eq. (1.22). Hence, $E(X|\mathcal G) = X$ by a.s. uniqueness.

$Y = X$ is an incorrect answer for Example 1.6 because inverse images $X^{-1}(B)$ of some Borel sets would not belong to $\{\emptyset, \Omega\}$ unless $\mathcal F = \{\emptyset, \Omega\}$. $Y = E(X|\mathcal G)$ contains precisely as much information about $X$ as is necessary to calculate the integrals in (1.22).
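For a $\sigma$-field generated by a finite partition, a case between the two extremes of Examples 1.6 and 1.7 (and an editorial illustration rather than the text's own), $E(X|\mathcal G)$ is simply the within-cell average, and Eq. (1.22) can be checked cell by cell:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=12)                # X on a 12-point sample space, equal weights
cells = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]  # partition generating G

EXG = np.empty_like(X)                 # E(X|G): constant on each partition cell
for c in cells:
    EXG[c] = X[c].mean()

for c in cells:                        # Eq. (1.22) for every cell A of the partition
    assert np.isclose(EXG[c].sum(), X[c].sum())
assert np.isclose(EXG.mean(), X.mean())  # the case A = Omega, as in Example 1.6
```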
1.6.3 Conditioning as a Projector

Lemma. Let $(\Omega, \mathcal F, P)$ be a probability space and let $\mathcal G$ be a $\sigma$-field contained in $\mathcal F$.

(i) For any integrable $X$, $E(X|\mathcal G)$ exists. Denote $P_{\mathcal G}X = E(X|\mathcal G)$ for $X \in L_1(\Omega)$.
(ii) $P_{\mathcal G}$ is linear, $P_{\mathcal G}(aX + bY) = aP_{\mathcal G}X + bP_{\mathcal G}Y$, and bounded, $\|P_{\mathcal G}X\|_1 \le \|X\|_1$.
(iii) $P_{\mathcal G}$ is a projector.

Proof. (i) $\nu(A) = \int_A X\,dP$ defines a $\sigma$-additive set function on $\mathcal G$ that is absolutely continuous with respect to $P$. By the Radon–Nikodym theorem there exists a $\mathcal G$-measurable function $Y$ such that Eq. (1.22) is true. This proves the existence of $Y = E(X|\mathcal G)$.

(ii) We can use Eq. (1.22) repeatedly to obtain
$$\int_A P_{\mathcal G}(aX + bY)\,dP = \int_A (aX + bY)\,dP = a\int_A X\,dP + b\int_A Y\,dP = a\int_A P_{\mathcal G}X\,dP + b\int_A P_{\mathcal G}Y\,dP = \int_A (aP_{\mathcal G}X + bP_{\mathcal G}Y)\,dP, \quad A \in \mathcal G.$$
Since $aP_{\mathcal G}X + bP_{\mathcal G}Y$ is $\mathcal G$-measurable, it must coincide with $P_{\mathcal G}(aX + bY)$.

For any real-valued function $f$ define its positive part by $f_+ = \max\{f, 0\}$ and its negative part by $f_- = -\min\{f, 0\}$. Then it is geometrically obvious that $f = f_+ - f_-$ and $|f| = f_+ + f_-$. Decomposing $P_{\mathcal G}X$ into its positive and negative parts, $P_{\mathcal G}X = (P_{\mathcal G}X)_+ - (P_{\mathcal G}X)_-$, and remembering that both sets $\{P_{\mathcal G}X > 0\}$ and $\{P_{\mathcal G}X < 0\}$ are $\mathcal G$-measurable, we have
$$\int_\Omega |P_{\mathcal G}X|\,dP = \int_\Omega [(P_{\mathcal G}X)_+ + (P_{\mathcal G}X)_-]\,dP = \int_{\{P_{\mathcal G}X > 0\}} P_{\mathcal G}X\,dP - \int_{\{P_{\mathcal G}X < 0\}} P_{\mathcal G}X\,dP = \int_{\{P_{\mathcal G}X > 0\}} X\,dP - \int_{\{P_{\mathcal G}X < 0\}} X\,dP \le \int_\Omega |X|\,dP.$$
This proves that $\|P_{\mathcal G}\| \le 1$.

(iii) $P_{\mathcal G}^2 X$ is defined as a $\mathcal G$-measurable function $Y$ such that $\int_A Y\,dP = \int_A P_{\mathcal G}X\,dP$ for all $A \in \mathcal G$. Since $P_{\mathcal G}X$ itself is $\mathcal G$-measurable, we have $Y = P_{\mathcal G}X$ a.s. ∎
1.6.4 The Law of Iterated Expectations

In a 3-D space, projecting first onto a plane and then onto a straight line in that plane gives the same result as projecting directly onto the straight line. This is also true of conditioning (and of projectors in general).

Lemma. Let $\mathcal H \subseteq \mathcal G \subseteq \mathcal F$ be nested $\sigma$-fields and denote by $P_{\mathcal H}$ and $P_{\mathcal G}$ the conditioning projectors on $\mathcal H$ and $\mathcal G$, respectively. Then $P_{\mathcal H}P_{\mathcal G} = P_{\mathcal G}P_{\mathcal H} = P_{\mathcal H}$. In the conditional expectation notation, this is the same as
$$E[E(X|\mathcal G)|\mathcal H] = E[E(X|\mathcal H)|\mathcal G] = E(X|\mathcal H). \qquad (1.23)$$
In particular, when $\mathcal H = \{\emptyset, \Omega\}$ is the least $\sigma$-field, we get $E[E(X|\mathcal G)] = EX$ for all integrable $X$.

Proof. $\mathcal H$-measurability of $P_{\mathcal H}X$ implies its $\mathcal G$-measurability. Hence, by Lemma 1.6.3(iii), $P_{\mathcal G}$ does not change it. This proves that $P_{\mathcal G}P_{\mathcal H} = P_{\mathcal H}$.

Next, $P_{\mathcal G}X$ is $\mathcal G$-measurable and satisfies $\int_A P_{\mathcal G}X\,dP = \int_A X\,dP$ for all $A \in \mathcal G$. In particular, this is true for $A \in \mathcal H$:
$$\int_A P_{\mathcal G}X\,dP = \int_A X\,dP, \quad A \in \mathcal H.$$
Comparing this with the definition of $P_{\mathcal H}P_{\mathcal G}X$,
$$\int_A P_{\mathcal H}P_{\mathcal G}X\,dP = \int_A P_{\mathcal G}X\,dP, \quad A \in \mathcal H,$$
we see that $\int_A P_{\mathcal H}P_{\mathcal G}X\,dP = \int_A X\,dP$ for $A \in \mathcal H$. But $P_{\mathcal H}X$ satisfies the same equation with $P_{\mathcal H}X$ in place of $P_{\mathcal H}P_{\mathcal G}X$, and both are $\mathcal H$-measurable. Hence, $P_{\mathcal H}P_{\mathcal G}X = P_{\mathcal H}X$ a.s. ∎
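Equation (1.23) can be checked numerically with two nested partition $\sigma$-fields (an editorial finite illustration, with equal probability weights):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=8)

def cond(v, cells):
    """E(v | sigma-field generated by the partition `cells`), equal weights."""
    out = np.empty_like(v)
    for c in cells:
        out[c] = v[c].mean()
    return out

G = [np.arange(0, 2), np.arange(2, 4), np.arange(4, 6), np.arange(6, 8)]  # finer
H = [np.arange(0, 4), np.arange(4, 8)]                                    # coarser

assert np.allclose(cond(cond(X, G), H), cond(X, H))      # E[E(X|G)|H] = E(X|H)
assert np.allclose(cond(cond(X, H), G), cond(X, H))      # E[E(X|H)|G] = E(X|H)
assert np.isclose(cond(X, [np.arange(8)])[0], X.mean())  # E[E(X|G)] = EX
```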
1.6.5 Extended Homogeneity

In the usual homogeneity, $P_{\mathcal G}(aX) = aP_{\mathcal G}X$, $a$ is a number. In the conditioning context, $a$ can be any $\mathcal G$-measurable function, according to the next theorem. I call this property extended homogeneity.

Theorem. If the variables $X$ and $XY$ are integrable and $Y$ is $\mathcal G$-measurable, then $P_{\mathcal G}(XY) = YP_{\mathcal G}X$.

The proof can be found, for example, in (Davidson 1994, Section 10.4).
1.6.6 Independence

$\sigma$-fields $\mathcal H$ and $\mathcal G$ are called independent if any event $A \in \mathcal H$ is independent of any event $B \in \mathcal G$: $P(A \cap B) = P(A)P(B)$. Random variables $X$ and $Y$ are said to be independent if the $\sigma$-fields $\sigma(X)$ and $\sigma(Y)$ are independent. Moreover, a family $\{X_i : i \in I\}$ of random variables is called independent if, for any two disjoint sets of indices $J, K$, the $\sigma$-fields $\sigma(X_i : i \in J)$ and $\sigma(X_i : i \in K)$ are independent.
Theorem. (Davidson 1994, Section 10.5) Suppose $X$ is integrable and $\mathcal H$-measurable. If $\mathcal G$ is independent of $\mathcal H$, then conditioning $X$ on $\mathcal G$ provides minimum information: $E(X|\mathcal G) = EX$.
1.7 MATRIX ALGEBRA

Everywhere we follow the matrix algebra convention: all matrices and vectors in the same formula are compatible. All matrices in this section are assumed to be of size $n \times n$. The determinant of $A$ is denoted $\det A$ or $|A|$.
1.7.1 Orthogonal Matrices

A matrix $T$ is called orthogonal if
$$T'T = I. \qquad (1.24)$$
Equation (1.24) means, by definition of the inverse, that $T^{-1} = T'$; since a left inverse of a square matrix is also a right inverse, Eq. (1.24) is equivalent to $TT' = I$. Geometrically, the mapping $y = Tx$ is a rotation in $R^n$. This is proved by noting that $T$ preserves scalar products:
$$\langle Tx, Ty\rangle = \langle x, T'Ty\rangle = \langle x, y\rangle.$$
Hence, it preserves vector lengths and angles between vectors; see Eqs. (1.13) and (1.14) in Section 1.4.1. Rotation around the origin is the only mapping that has these properties.
1.7.2 Diagonalization of Symmetric Matrices

A number $\lambda \in R$ is called an eigenvalue of a matrix $A$ if there exists a nonzero vector $x$ that satisfies $Ax = \lambda x$. Such a vector $x$ is called an eigenvector corresponding to $\lambda$. From this definition it follows that $A$ reduces to multiplication by $\lambda$ along the straight line $\{ax : a \in R\}$. The set $L$ of all eigenvectors corresponding to $\lambda$, completed with the null vector, is a subspace of $R^n$, because $Ax = \lambda x$ and $Ay = \lambda y$ imply $A(ax + by) = \lambda(ax + by)$. This subspace is called the characteristic subspace of $A$ corresponding to $\lambda$. The dimension of the characteristic subspace (see Section 1.1.3) is called the multiplicity of $\lambda$. $A$ reduces to multiplication by $\lambda$ on $L$.

We say that a system of vectors $x_1, \ldots, x_k$ is orthonormal if
$$\langle x_i, x_j\rangle = \begin{cases} 1, & i = j; \\ 0, & i \ne j. \end{cases}$$
The system of unit vectors in $R^n$ is an example of an orthonormal system. An orthonormal system is necessarily linearly independent, because scalar multiplication of the equation $a_1 x_1 + \cdots + a_k x_k = 0$ by each of the vectors $x_1, \ldots, x_k$ yields $a_1 = \cdots = a_k = 0$.
Theorem. (Diagonalization theorem) (Bellman 1995, Chapter 4, Section 7) If $A$ is symmetric of size $n \times n$, then it has $n$ real eigenvalues $\lambda_1, \ldots, \lambda_n$, repeated with their multiplicities. Further, there is an orthogonal matrix $T$ such that
$$A = T'\Lambda T, \qquad (1.25)$$
where $\Lambda = \operatorname{diag}[\lambda_1, \ldots, \lambda_n]$ is a diagonal matrix. Finally, the eigenvectors $x_1, \ldots, x_n$ that correspond to $\lambda_1, \ldots, \lambda_n$ can be chosen orthonormal.

Equation (1.25) embodies the following geometry. In the original coordinate system with the unit vectors $e_j$ (see Section 1.1.3) the matrix $A$ has generic elements $a_{ij}$. The first transformation $T$ in Eq. (1.25) rotates the coordinate system to a new position in which $A$ has a simple diagonal form, the new axes being eigenvectors along which applying $A$ amounts to multiplication by numbers. The final transformation by $T' = T^{-1}$ rotates the picture back to the original position.
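The decomposition (1.25) is what `numpy.linalg.eigh` computes, up to a transpose convention: `eigh` returns orthonormal eigenvectors as the columns of a matrix $V$, so $T = V'$ in the notation of the theorem. An editorial sketch:

```python
import numpy as np

rng = np.random.default_rng(7)
B = rng.normal(size=(4, 4))
A = (B + B.T) / 2.0                      # a symmetric matrix

lam, V = np.linalg.eigh(A)               # real eigenvalues, orthonormal eigenvectors
T = V.T                                  # so that A = T' Lambda T as in Eq. (1.25)

assert np.allclose(T.T @ T, np.eye(4))                 # T is orthogonal, Eq. (1.24)
assert np.allclose(A, T.T @ np.diag(lam) @ T)          # Eq. (1.25)
for i in range(4):
    assert np.allclose(A @ V[:, i], lam[i] * V[:, i])  # Ax = lambda x, columnwise
```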
1.7.3 Finding and Applying Eigenvalues

Eigenvalues are the roots of the equation $\det(A - \lambda I) = 0$. Application of this matrix algebra rule is complicated, as the left side of the equation is a polynomial of order $n$. Often it is possible to exploit the analytical structure of $A$ to find its eigenvalues using the next lemma. A subspace $L$ of $R^n$ is called an invariant subspace of a matrix $A$ if $AL \subseteq L$.

Lemma

(i) $\lambda$ is an eigenvalue of $A$ if and only if $\lambda - c$ is an eigenvalue of $A - cI$.
(ii) Let $L$ be an invariant subspace of a symmetric matrix $A$. Denote by $P$ the orthoprojector onto $L$, $Q = I - P$ and $M = \operatorname{Im}(Q)$. Then $M$ is an invariant subspace of $A$ and the analysis of $A$ reduces to the analysis of its restrictions $A|_L$ and $A|_M$.

Proof. Statement (i) is obvious because the equation $Ax = \lambda x$ is equivalent to $(A - cI)x = (\lambda - c)x$.

(ii) For any $x, y \in R^n$, by symmetry of $A$ and $P$,
$$\langle PAQx, y\rangle = \langle AQx, Py\rangle = \langle Qx, APy\rangle = 0.$$
The last equality follows from the facts that $Py \in L = \operatorname{Im}(P)$, $APy \in L$ and $\operatorname{Im}(P)$ is orthogonal to $\operatorname{Im}(Q)$ [see Lemma 1.4.5(i)]. Plugging in $y = PAQx$ we get $\|PAQx\| = 0$ and $PAQx = 0$. Since $Qx$ runs over $M$ when $x$ runs over $R^n$, we obtain $PAM = \{0\}$ or, by Lemma 1.3.5(iii), $AM \subseteq M$, so $M$ is invariant with respect to $A$.
Now premultiply the identity $I = P + Q$ by $A$ to get
$$A = AP + AQ = A|_L P + A|_M Q. \quad ∎$$
The second part of this lemma leads to the following practical rule. If you have managed to find the first eigenvalue λ and the corresponding characteristic subspace L of A, then consider the restriction A|_M to find the rest of the eigenvalues. This process of "chipping off" characteristic subspaces can be repeated. While you do that, construct the orthonormal systems of eigenvectors until their total number reaches n.

Denoting y = Tx, from Theorem 1.7.2 we have

    ⟨Ax, x⟩ = ⟨T′ΛTx, x⟩ = ⟨ΛTx, Tx⟩ = ⟨Λy, y⟩ = Σᵢ₌₁ⁿ λᵢyᵢ².
Hence, A is nonnegative (⟨Ax, x⟩ ≥ 0 for all x) if and only if all eigenvalues of A are nonnegative. Therefore we can define the square root of a nonnegative symmetric matrix by

    A^(1/2) = T′ diag[λ₁^(1/2), …, λₙ^(1/2)] T.
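As a numerical illustration (a sketch, not the book's code), the square root of a nonnegative symmetric matrix can be computed exactly along these lines:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])       # symmetric, eigenvalues 1 and 3 (both >= 0)

lam, V = np.linalg.eigh(A)       # A = V diag(lam) V'
assert np.all(lam >= 0)          # nonnegativity is required for the root

# A^{1/2} = T' diag(lam^{1/2}) T with T = V'.
root = V @ np.diag(np.sqrt(lam)) @ V.T

# Squaring the root recovers A.
assert np.allclose(root @ root, A)
```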
1.7.4 Gram Matrices

In a Hilbert space H consider vectors x₁, …, x_k. Their Gram matrix is defined by

    G = [⟨x₁, x₁⟩ … ⟨x₁, x_k⟩; …; ⟨x_k, x₁⟩ … ⟨x_k, x_k⟩].

Theorem. (Gantmacher 1959, Chapter IX, Section 5) Vectors x₁, …, x_k are linearly independent if and only if det G > 0.
1.7.5 Positive Definiteness of Gram Matrices

Lemma. If vectors x₁, …, x_k ∈ Rⁿ are linearly independent, then G is positive definite: ⟨Gx, x⟩ > 0 for all x ≠ 0.

Proof. According to the Sylvester criterion (Bellman 1995, Chapter 5, Section 3), G is positive definite if and only if all leading principal minors

    ⟨x₁, x₁⟩,  det [⟨x₁, x₁⟩ ⟨x₁, x₂⟩; ⟨x₂, x₁⟩ ⟨x₂, x₂⟩],  …,  det G    (1.26)

are positive. Linear independence of the system {x₁, …, x_k} implies that of all its subsystems {x₁}, {x₁, x₂}, …. Thus all determinants in (1.26) are positive by Theorem 1.7.4. ∎
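A quick numerical illustration (the example vectors are made up): for linearly independent vectors the Gram matrix has a positive determinant, and its Cholesky factorization — which exists only for positive definite matrices — succeeds.

```python
import numpy as np

# Two linearly independent vectors in R^3.
x1 = np.array([1.0, 0.0, 1.0])
x2 = np.array([0.0, 2.0, 1.0])

X = np.column_stack([x1, x2])
G = X.T @ X                      # Gram matrix: G[i, j] = <x_i, x_j>

# Linear independence <=> det G > 0 (Theorem 1.7.4) ...
assert np.linalg.det(G) > 0
# ... and G is positive definite: Cholesky raises LinAlgError otherwise.
np.linalg.cholesky(G)
```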
CHAPTER 1
INTRODUCTION TO OPERATORS, PROBABILITIES AND THE LINEAR MODEL
1.7.6 Partitioned Matrices: Determinant and Inverse

Lemma. (Lütkepohl 1991, Section A.10) Let the matrix A be partitioned as

    A = [A₁₁ A₁₂; A₂₁ A₂₂],

where A₁₁ and A₂₂ are square. Then

(i) If A₁₁ is nonsingular, |A| = |A₁₁| |A₂₂ − A₂₁A₁₁⁻¹A₁₂|.

(ii) If A₁₁ and A₂₂ are nonsingular,

    A⁻¹ = [D, −DA₁₂A₂₂⁻¹; −A₂₂⁻¹A₂₁D, A₂₂⁻¹ + A₂₂⁻¹A₂₁DA₁₂A₂₂⁻¹]
        = [A₁₁⁻¹ + A₁₁⁻¹A₁₂GA₂₁A₁₁⁻¹, −A₁₁⁻¹A₁₂G; −GA₂₁A₁₁⁻¹, G],

where D = (A₁₁ − A₁₂A₂₂⁻¹A₂₁)⁻¹ and G = (A₂₂ − A₂₁A₁₁⁻¹A₁₂)⁻¹.
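The lemma can be verified numerically; the block below (an illustration, with an arbitrarily chosen positive definite A) checks both the determinant identity and the first expression for the inverse.

```python
import numpy as np

# A 4x4 symmetric positive definite matrix, partitioned into 2x2 blocks.
A = np.array([[4.0, 1.0, 0.5, 0.0],
              [1.0, 3.0, 0.0, 0.5],
              [0.5, 0.0, 2.0, 1.0],
              [0.0, 0.5, 1.0, 2.0]])
A11, A12 = A[:2, :2], A[:2, 2:]
A21, A22 = A[2:, :2], A[2:, 2:]

inv = np.linalg.inv

# (i) determinant: |A| = |A11| |A22 - A21 A11^{-1} A12|.
lhs = np.linalg.det(A)
rhs = np.linalg.det(A11) * np.linalg.det(A22 - A21 @ inv(A11) @ A12)
assert np.isclose(lhs, rhs)

# (ii) block inverse via D = (A11 - A12 A22^{-1} A21)^{-1}.
D = inv(A11 - A12 @ inv(A22) @ A21)
top = np.hstack([D, -D @ A12 @ inv(A22)])
bottom = np.hstack([-inv(A22) @ A21 @ D,
                    inv(A22) + inv(A22) @ A21 @ D @ A12 @ inv(A22)])
block_inv = np.vstack([top, bottom])
assert np.allclose(block_inv, inv(A))
```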
1.8 CONVERGENCE OF RANDOM VARIABLES

A random variable is nothing but an (F, B)-measurable function X : Ω → R, where (Ω, F, P) is a probability space and B is the Borel σ-field of R. In the case of a random vector it suffices to replace R by Rⁿ and B by Bⁿ, the Borel σ-field of Rⁿ.
1.8.1 Convergence in Probability

Let X, X₁, X₂, … be random vectors defined on the same probability space and with values in the same space Rⁿ. If

    lim_{n→∞} P(‖Xₙ − X‖₂ > ε) = 0 for any ε > 0,

then {Xₙ} is said to converge in probability to X. Convergence in probability is commonly denoted Xₙ →p X or plim Xₙ = X. From the equivalent definition

    lim_{n→∞} P(‖Xₙ − X‖₂ ≤ ε) = 1 for any ε > 0

it may be easier to see that this notion is a natural generalization of convergence of numbers. A nice feature of convergence in probability is that it is preserved under arithmetic operations.
Lemma. Let {Xᵢ} and {Yᵢ} be sequences of n × 1 random vectors and let {Aᵢ} be a sequence of random matrices such that plim Xᵢ, plim Yᵢ and plim Aᵢ exist. Then

(i) plim(Xᵢ ± Yᵢ) = plim Xᵢ ± plim Yᵢ.

(ii) plim AᵢXᵢ = plim Aᵢ plim Xᵢ.

(iii) Let g : Rⁿ → R be a Borel-measurable function such that X = plim Xᵢ takes values in the continuity set C_g of g with probability 1, P(X ∈ C_g) = 1. Then plim g(Xᵢ) = g(X).

(iv) If plim Aᵢ = A and P(det A ≠ 0) = 1, then plim Aᵢ⁻¹ = A⁻¹.

Proof. Statements (i) and (ii) are from (Lütkepohl 1991, Section C.1). (iii) is proved in (Davidson 1994, Theorem 18.8).

(iv) The real-valued function 1/det A of a square matrix A of order n is continuous everywhere in the space R^(n²) of its elements except for the set det A = 0. Elements of A⁻¹ are cofactors of elements of A divided by det A. Hence, they are also continuous where det A ≠ 0. The statement follows on applying (iii) element by element. ∎

Part (iv) of this lemma does not imply invertibility of Aᵢ a.e. It merely implies that the set on which Aᵢ is not invertible has probability approaching zero.
1.8.2 Distribution Function of a Random Vector

Let X be a random vector with values in Rᵏ. Its distribution function is defined by

    F_X(x) = P(X₁ ≤ x₁, …, X_k ≤ x_k) = P(X⁻¹(∏ₙ₌₁ᵏ (−∞, xₙ])),  x ∈ Rᵏ.

It is proved that F_X induces a probability measure on Rᵏ, also denoted by F_X. We say that X has density p_X if F_X is absolutely continuous with respect to the Lebesgue measure in Rᵏ, that is, if

    F_X(A) = ∫_A p_X(t) dt

for any Borel set A. Random vectors X, Y are said to be identically distributed if their distribution functions are identical: F_X(x) = F_Y(x) for all x ∈ Rᵏ. The original pair consisting of the vector X and probability space (Ω, F, P) is distributed identically with the pair consisting of the identity mapping X(t) = t on Rᵏ and probability space (Rᵏ, Bᵏ, F_X), where Bᵏ is the Borel field of subsets of Rᵏ. Identically distributed vectors have equal moments. In particular, there are two different formulas for the mean:

    EX = ∫_Ω X(ω) dP(ω) = ∫_{Rᵏ} t dF_X(t)

(see Davidson 1994, Section 9.1).
1.8.3 Convergence in Distribution

We say that a sequence of random vectors {Xᵢ} converges in distribution to X if F_{Xᵢ}(t) → F_X(t) at all continuity points t of the limit distribution F_X. For convergence in distribution we use the notation Xᵢ →d X or dlim Xᵢ = X. In econometrics, we are interested in convergence in distribution because confidence intervals for X in the one-dimensional (1-D) case can be expressed in terms of F_X: P(a < X ≤ b) = F_X(b) − F_X(a). Here the right-hand side can be approximated by F_{Xᵢ}(b) − F_{Xᵢ}(a) if dlim Xᵢ = X and a and b are continuity points of F_X (which is always the case if X is normal).

Convergence in distribution is so weak that it is not preserved under arithmetic operations. In expressions like Xᵢ + Yᵢ or AᵢXᵢ we can pass to the limit in distribution if one sequence converges in distribution and the other in probability to a constant.

Lemma. Let {Xᵢ} and {Yᵢ} be sequences of n × 1 random vectors and let {Aᵢ} be a sequence of random matrices such that dlim Xᵢ, plim Yᵢ and plim Aᵢ exist.

(i) If c = plim Yᵢ is a constant, then dlim(Xᵢ + Yᵢ) = dlim Xᵢ + c.

(ii) If A = plim Aᵢ is constant, then dlim AᵢXᵢ = A dlim Xᵢ.

(iii) plim Xᵢ = X implies dlim Xᵢ = X. If X is a constant, then the converse is true: dlim Xᵢ = c implies plim Xᵢ = c.

(iv) (Dominance of convergence in probability to zero) If plim Aᵢ = 0, then the same is true for the product: plim AᵢXᵢ = 0.

(v) Suppose Xₙ →d X, where all random vectors take values in Rᵏ. Let h : Rᵏ → Rᵐ be measurable and denote by D_h the set of discontinuities of h. If F_X(D_h) = 0, then h(Xₙ) →d h(X).

Proof. For (i) and (ii) see (Davidson 1994, Theorem 22.14) (1-D case). The proof of (iii) can be found in (Davidson 1994, Theorems 22.4 and 22.5). Statement (iv) is proved like this: if plim Aᵢ = 0, then dlim AᵢXᵢ = 0 by (ii), which implies plim AᵢXᵢ = 0 by (iii). The proof of (v) is contained in (Billingsley 1968, Chapter 1, Section 5). ∎

The case c = 0 of statement (i) is a perturbation result: adding to {Xᵢ} a sequence {Yᵢ} such that plim Yᵢ = 0 does not change dlim Xᵢ. A continuous h (for which D_h is empty) is a very special case of (v); this case is called a continuous mapping theorem (CMT). In (ii) the assumption that plim Aᵢ is constant cannot simply be dropped; the way around is to prove convergence in distribution of the pair {Aᵢ, Xᵢ}. Then the CMT applied to h(Aᵢ, Xᵢ) = AᵢXᵢ does the job.
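A small seeded simulation can make part (i) concrete (an illustration only; the sample sizes and tolerances below are arbitrary choices, not from the book): a CLT-type sequence Xₙ plus a sequence converging in probability to the constant 2 is approximately N(2, 1).

```python
import numpy as np

rng = np.random.default_rng(0)

# X_n: standardized mean of n uniforms -- approximately N(0, 1) by the CLT.
n, reps = 100, 50_000
u = rng.uniform(0.0, 1.0, size=(reps, n))
x_n = np.sqrt(n) * (u.mean(axis=1) - 0.5) / np.sqrt(1.0 / 12.0)

# Y_n: converges in probability to the constant c = 2.
y_n = 2.0 + 1.0 / n

# By part (i) of the lemma, X_n + Y_n converges in distribution to N(2, 1).
z = x_n + y_n
assert abs(z.mean() - 2.0) < 0.05
assert abs(z.std() - 1.0) < 0.05
```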
1.8.4 Boundedness in Probability

Let {Xₙ} be a sequence of random variables. We know that a (proper) random variable X satisfies P(|X| > M) → 0 as M → ∞. Requiring this property to hold uniformly in n gives us the definition of boundedness in probability: supₙ P(|Xₙ| > M) → 0
as M → ∞. We write Xₙ = O_p(1) when {Xₙ} is bounded in probability. This notation is justified by item (i) of the next lemma.

Lemma

(i) If Xₙ = xₙ = constant, then xₙ = O(1) is equivalent to Xₙ = O_p(1).

(ii) If Xₙ = O_p(1) and Yₙ = O_p(1), then Xₙ + Yₙ = O_p(1) and XₙYₙ = O_p(1).

Proof. (i) It is easy to see that

    supₙ P(|xₙ| > M) = supₙ 1_{|xₙ|>M} = 1_{supₙ|xₙ|>M}.    (1.27)

This implies that supₙ P(|Xₙ| > M) → 0 as M → ∞ if and only if supₙ |xₙ| ≤ M for some finite M.

(ii) Let us show that

    {|Xₙ + Yₙ| > M} ⊆ {|Xₙ| > M/2} ∪ {|Yₙ| > M/2}.    (1.28)

Suppose the opposite is true. Then there exists ω ∈ Ω such that M < |Xₙ(ω) + Yₙ(ω)| ≤ |Xₙ(ω)| + |Yₙ(ω)| ≤ M, which is nonsense. Equation (1.28) implies

    supₙ P(|Xₙ + Yₙ| > M) ≤ supₙ P(|Xₙ| > M/2) + supₙ P(|Yₙ| > M/2) → 0,  M → ∞,

that is, Xₙ + Yₙ = O_p(1). Further, along the lines of Eq. (1.28), we can prove

    {|XₙYₙ| > M} ⊆ {|Xₙ| > √M} ∪ {|Yₙ| > √M}

and therefore

    supₙ P(|XₙYₙ| > M) ≤ supₙ P(|Xₙ| > √M) + supₙ P(|Yₙ| > √M) → 0,  M → ∞,    (1.29)

which proves that XₙYₙ = O_p(1). ∎
1.8.5 Convergence in Probability to Zero

The definition of Section 1.8.1 in the special case when {Xₙ} is a sequence of random variables gives the definition of convergence in probability to zero: lim_{n→∞} P(|Xₙ| > ε) = 0 for any ε > 0. In this case, instead of Xₙ →p 0 people often write Xₙ = o_p(1).

Lemma

(i) If Xₙ = xₙ = constant, then xₙ = o(1) is equivalent to Xₙ = o_p(1).

(ii) Xₙ = o_p(1) implies Xₙ = O_p(1).

(iii) If Xₙ = o_p(1) and Yₙ = o_p(1), then Xₙ ± Yₙ = o_p(1).

(iv) Suppose Xₙ = o_p(1) or Xₙ = O_p(1), and Yₙ = o_p(1). Then XₙYₙ = o_p(1).

(v) If Xₙ →d X and Yₙ = o_p(1), then XₙYₙ = o_p(1).

Proof. (i) From an equation similar to Eq. (1.27),

    lim_{n→∞} sup P(|xₙ| > ε) = lim_{n→∞} sup 1_{|xₙ|>ε} = 1_{lim supₙ→∞ |xₙ|>ε},

we see that lim_{n→∞} P(|Xₙ| > ε) = 0 is equivalent to lim supₙ→∞ |xₙ| ≤ ε, and Xₙ = o_p(1) is equivalent to xₙ = o(1).

(ii) If Xₙ = o_p(1), then, for any given δ > 0 and M > 0, there exists n₀ such that P(|Xₙ| > M) ≤ δ for n ≥ n₀. Increasing M, if necessary, we can make sure that P(|Xₙ| > M) ≤ δ for n < n₀ as well. Thus, supₙ P(|Xₙ| > M) ≤ δ. Since δ > 0 is arbitrary, this proves Xₙ = O_p(1).

(iii) This statement follows from Lemma 1.8.1(i).

(iv) By (ii), Xₙ = O_p(1); modify Eq. (1.29) to get

    sup_{n≥n₀} P(|XₙYₙ| > εM) ≤ supₙ P(|Xₙ| > M) + sup_{n≥n₀} P(|Yₙ| > ε).

Taking an arbitrary δ > 0, choose a sufficiently large M, define ε = δ/M and then select a sufficiently large n₀. The right-hand side will be small, which proves XₙYₙ = o_p(1).

(v) This is just a different way of stating Lemma 1.8.3(iv). ∎

1.8.6 Criterion of Convergence in Distribution of Normal Vectors

A normal vector is defined using its density. We don't need the formula for the density here. It suffices to know that the density of a normal vector e is completely determined by its first moments Ee = ∫_{Rⁿ} t dF_e(t) and second moments Ee_ie_j = ∫_{Rⁿ} t_it_j dF_e(t).
Lemma. Convergence in distribution of a sequence {X_k} of normal vectors takes place if and only if the limits lim EX_k and lim V(X_k) exist, where V(X) = E(X − EX)(X − EX)′.

Proof. This statement is obtained by combining two facts. The characteristic function φ_X of a random vector X is defined by

    φ_X(t) = E e^{i⟨t,X⟩},  t ∈ Rⁿ.

Here i = √−1. The first fact is that convergence in distribution dlim X_k = X is equivalent to the pointwise convergence lim φ_{X_k}(t) = φ_X(t) for all t ∈ Rⁿ (see Billingsley 1995, Theorem 26.3). The second fact is that the characteristic function of a normal vector X depends only on two parameters: its mean EX and variance V(X); see (Rao 1965, Section 8a.2). ∎
1.9 THE LINEAR MODEL

1.9.1 The Classical Linear Model

The usual assumptions about the linear regression

    y = Xβ + e    (1.30)

are the following:

1. y is an observed n-dimensional random vector,
2. the matrix of regressors (or independent variables) X of size n × k is assumed known,
3. β ∈ Rᵏ is the parameter vector to be estimated from the data (y and X),
4. e is an unobserved n-dimensional error vector with mean zero and
5. n > k and det X′X ≠ 0.

The matrix X is assumed constant (deterministic). In dynamic models, with lags of the dependent variable on the right side, those lags are listed separately. I am in favor of separating deterministic regressors from stochastic ones from the very beginning, rather than piling them up together and later trying to specify the assumptions by sorting out the exogenous regressors.
1.9.2 Ordinary Least Squares Estimator

The least squares procedure first gives rise to the normal equation X′X β̂ = X′y for the OLS estimator β̂ of β and then, subject to the condition det X′X ≠ 0, to the formula of the estimator

    β̂ = (X′X)⁻¹X′y.

This formula and model (1.30) itself lead to the representation

    β̂ − β = (X′X)⁻¹X′e    (1.31)

used to study the properties of β̂. In particular, the assumption Ee = 0 implies that β̂ is unbiased, Eβ̂ = β, and that its distribution is centered on β.
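The normal equation and representation (1.31) can be illustrated with a small deterministic sketch (the design matrix and error draw below are made up for the example):

```python
import numpy as np

# n = 4 observations, k = 2 regressors (intercept and trend).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
beta = np.array([1.0, 0.5])
e = np.array([0.1, -0.2, 0.05, 0.05])   # a fixed "error" realization
y = X @ beta + e

# Normal equation X'X b_hat = X'y solved directly.
b_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Representation (1.31): the sampling error equals (X'X)^{-1} X'e.
assert np.allclose(b_hat - beta, np.linalg.solve(X.T @ X, X.T @ e))
```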
1.9.3 Normal Errors

N(μ, Σ) denotes the class of normal vectors with mean μ and variance Σ (which in general may be singular). Errors distributed as N(0, σ²I) are assumed as the first approximation to reality. Components e₁, …, eₙ of such errors satisfy

    cov(eᵢ, eⱼ) = 0, i ≠ j;  Eeᵢ = 0;  Eeᵢ² = σ².    (1.32)

The first equation here says that e₁, …, eₙ are uncorrelated.

Lemma. If e ~ N(0, σ²I), then the components of e are independent identically distributed.

Proof. By the theorem from (Rao 1965, Section 8a.2), uncorrelatedness of the components of e plus normality of e imply independence of the components. By Eq. (1.32) the first and second moments of the components coincide; therefore their densities and distribution functions coincide. ∎
1.9.4 Independent Identically Distributed Errors

We write e ~ IID(0, σ²I) to mean that the components of e are independent identically distributed (i.i.d.), have mean zero and covariance matrix σ²I. Lemma 1.9.3 means that N(0, σ²I) ⊆ IID(0, σ²I).

Lemma. Suppose e ~ IID(0, σ²I) and put F₀ = {∅, Ω}, F_t = σ(e_j : j ≤ t), t = 1, 2, …. Then e_t is F_t-measurable and

    E(e_t|F_{t−1}) = 0,  E(e_t²|F_{t−1}) = σ²,  t = 1, …, n.

Proof. For t = 1, E(e₁|F₀) = Ee₁ = 0 (see Example 1.6 in Section 1.6.2). Let t > 1. By definition, F_{t−1} = σ(e_j : j ≤ t − 1) and σ(e_t) are independent.
By Theorem 1.6.6, E(e_t|F_{t−1}) = Ee_t = 0. Similarly, E(e_t²|F_{t−1}) = Ee_t² = σ² (see Theorem 1.5.4(i) about nonlinear transformations of measurable functions). ∎
1.9.5 Martingale Differences

Let {F_t : t = 1, 2, …} be an increasing sequence of σ-fields contained in F: F₁ ⊆ … ⊆ Fₙ ⊆ … ⊆ F. A sequence of random variables {e_t : t = 1, 2, …} is called adapted to {F_t} if e_t is F_t-measurable for t = 1, 2, …. If a sequence of integrable variables {e_t} satisfies

1. {e_t} is adapted to {F_t} and
2. E(e_t|F_{t−1}) = 0 for t = 1, 2, …, where F₀ = {∅, Ω},

then we say that {e_t, F_t} or, shorter, {e_t} is a martingale difference (m.d.) sequence.

Lemma. Square-integrable m.d. sequences are uncorrelated and have mean zero.

Proof. By the law of iterated expectations (LIE) [Eq. (1.23)] and the m.d. property (item 2) the means are zero: Ee_t = E[E(e_t|F_{t−1})|F₀] = 0, t = 1, 2, …. Let s < t. Since e_s is F_s-measurable, it is F_{t−1}-measurable. By extended homogeneity (Section 1.6.5) and the LIE,

    Ee_se_t = E[E(e_se_t|F_{t−1})] = E[e_s E(e_t|F_{t−1})] = 0.    ∎
The generality of the m.d. assumption is often reduced by the necessity to restrict the behavior of the second-order conditional moments by the condition

    E(e_t²|F_{t−1}) = σ²,  t = 1, 2, …    (1.33)

Owing to the LIE this condition implies Ee_t² = σ², t = 1, 2, …. We denote by MD(0, σ²) the square-integrable m.d.'s that satisfy Eq. (1.33). By Lemma 1.9.4, IID(0, σ²I) ⊆ MD(0, σ²) if we put F_t = σ(e_j : j ≤ t).
1.9.6 The Hierarchy of Errors

We have proved that

    N(0, σ²I) ⊆ IID(0, σ²I) ⊆ MD(0, σ²).    (1.34)

Members of any of these three classes have a mean of zero and are uncorrelated. Normal errors are at the core of all error classes considered in this book. This means that any asymptotic results should hold for normal errors, and the class of normal errors can be used as litmus paper for tentative assumptions and proofs. The
criterion of convergence in distribution of normal vectors (Section 1.8.6) facilitates verifying convergence in this class.

Some results will be proved for linear processes as errors. Let {c_j : j ∈ Z} be a double-infinite summable sequence of numbers, Σ_{j∈Z} |c_j| < ∞, and let {e_j : j ∈ Z} be a sequence of integrable zero-mean random variables, called innovations. A linear process is a sequence {v_j : j ∈ Z} defined by the convolution

    v_t = Σ_{j∈Z} c_j e_{t−j},  t ∈ Z.    (1.35)

Members of any of the above three classes may serve as the innovations. If c₀ = 1 and c_j = 0 for all j ≠ 0, we get v_t = e_t, which shows that the class of linear processes includes any of the three classes of Eq. (1.34).

Linear processes with summable {c_j} are called short-memory processes. If sup_j E|e_j| < ∞ and Σ_j |c_j| < ∞, then the v_t have uniformly bounded L₁-norms, E|v_t| ≤ sup_j E|e_j| Σ_j |c_j| < ∞, and zero means. More general processes with square-summable {c_j}, Σ_{j∈Z} c_j² < ∞, are called long-memory processes. In this case, if the innovations are uncorrelated and have uniformly bounded L₂-norms, then the v_t exist in the sense of L₂: Ev_t² ≤ sup_j Ee_j² Σ_j c_j² < ∞. There are also mixing processes, see (Davidson, 1994), which are more useful in nonlinear problems. Long-memory and mixing processes are not considered here. Long-memory processes do not fit Theorem 3.5.2, as discussed in Section 3. Conditions in terms of mixing processes do not look nice, perhaps because they are inherently complex or the theory is underdeveloped.
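For coefficient sequences with finitely many nonzero terms, the convolution (1.35) over a finite stretch of innovations can be sketched with `np.convolve` (an illustration; the coefficients and innovations below are arbitrary):

```python
import numpy as np

# Short-memory linear process v_t = sum_j c_j e_{t-j}, here with an
# MA(2)-type filter: only c_0, c_1, c_2 are nonzero (trivially summable).
c = np.array([1.0, 0.5, 0.25])
e = np.array([1.0, -1.0, 2.0, 0.0, 1.0])   # a fixed innovation stretch

# 'full' convolution; the entry with index t equals sum_j c_j e_{t-j}.
v = np.convolve(c, e)

# Check one coordinate by hand: v_2 = c_0 e_2 + c_1 e_1 + c_2 e_0.
assert np.isclose(v[2], 1.0 * 2.0 + 0.5 * (-1.0) + 0.25 * 1.0)
```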
1.10 NORMALIZATION OF REGRESSORS

1.10.1 Normal Errors as the Touchstone of the Asymptotic Theory

Suppose we have a series of regressions y = Xβ + e with the same β and n going to infinity (the dependence of y, X and e on n is not reflected in the notation). We would like to know if the sequence of corresponding OLS estimators β̂ converges in distribution to a normal vector. We shall see that, as a preliminary step, β̂ should be centered on β and properly scaled, so that convergence takes place for Dₙ(β̂ − β), where Dₙ is some matrix function of the regressors. The factor Dₙ is called a normalizer (it normalizes the variances of the components of the transformed errors in the OLS estimator formula to a constant). The choice of the normalizer is of crucial importance, as it affects the conditions imposed later on X and e.

The classes of regressors and errors should be as wide as possible. The search for these classes is complicated if both regressors and errors are allowed to vary. However, under the hierarchy of errors described above the normal errors are the core of the theory. The implication is that, whatever the conditions imposed on X, they should work for the class of normal errors. The OLS estimator, being a linear transformation
of e, is normal when e is normal. Therefore from the criterion of convergence in distribution of normal vectors (Section 1.8.6) we conclude that the choice of the normalizer and the class of regressors should satisfy the conditions

1. lim E Dₙ(β̂ − β) exists and
2. lim V(Dₙ(β̂ − β)) exists when e ~ N(0, σ²I).

For deterministic X, it is natural to stick to deterministic Dₙ, so condition 1 trivially holds because of unbiasedness of β̂. The second condition can be called a variance stabilization condition.
1.10.2 Where Does the Square Root Come From?

Consider n independent observations on a normal variable with mean β and standard deviation σ. In terms of regression, we are dealing with X = (1, …, 1)′ (n unities) and e ~ N(0, σ²I). From the representation of the OLS estimator (1.31), β̂ − β = (e₁ + ⋯ + eₙ)/n. By independence of the components of e this implies

    V(β̂ − β) = (1/n²)[V(e₁) + ⋯ + V(eₙ)] = σ²/n.

Now it is easy to see that with Dₙ = √n the variance stabilization condition is satisfied, and the criterion of convergence of normal variables gives √n(β̂ − β) →d N(0, σ²). The square root also works for stable autoregressive models (Hamilton, 1994).
1.10.3 One Nontrivial Regressor and Normal Errors

Consider a slightly more general case y = xβ + e with x ∈ Rⁿ and a scalar β. The representation of the OLS estimator reduces to β̂ − β = x′e/‖x‖₂², and we easily find that

    V(‖x‖₂(β̂ − β)) = (1/‖x‖₂²) Σᵢ₌₁ⁿ xᵢ²σ² = σ²

under the same assumption e ~ N(0, σ²I). It follows that

    ‖x‖₂(β̂ − β) →d N(0, σ²)    (1.36)

and Dₙ = ‖x‖₂ is the right normalizer.

What if instead of Dₙ we use √n? Then √n(β̂ − β) = √n x′e/‖x‖₂², and the variance stabilization condition leads to

    n/‖x‖₂² → constant.
This means that the √n-rule separates a narrow class of regressors for which ‖x‖₂ is of order √n for large n. In general, any function of n tending to ∞ as n → ∞ can be used as a normalizer for some class of regressors, and there are as many classes as there are functions with different behavior at infinity. The normalizer Dₙ = ‖x‖₂ is better because it adapts to the regressor instead of singling out some class. For example, for x = (1, …, 1)′ (n unities) it gives the classical square root, and for a linear trend x₁ = 1, x₂ = 2, …, xₙ = n it grows as n^(3/2). As Dₙ is self-adjusting, you don't need to know the rate of growth of ‖x‖₂. This is especially important in applications, where regressors don't have any particular analytical pattern. The decisive argument is that Dₙ is in some sense unique (see Section 1.11.3).
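These growth rates are easy to check numerically (an illustrative sketch; the sample sizes are arbitrary): for the constant regressor Dₙ reproduces √n exactly, while for the linear trend it grows as n^(3/2) times a constant; the same loop verifies that the maximal-element ratio maxₜ |xₜ|/‖x‖₂ vanishes for the trend.

```python
import numpy as np

for n in (10, 100, 1000):
    ones = np.ones(n)                  # constant regressor
    trend = np.arange(1.0, n + 1.0)    # linear trend 1, 2, ..., n

    # Constant regressor: ||x||_2 = sqrt(n), the classical square root.
    assert np.isclose(np.linalg.norm(ones), np.sqrt(n))

    # Trend: sum t^2 = n(n+1)(2n+1)/6, so ||x||_2 / n^{3/2} -> 1/sqrt(3).
    ratio = np.linalg.norm(trend) / n ** 1.5
    assert abs(ratio - 1.0 / np.sqrt(3.0)) < 0.1

    # The largest element becomes negligible relative to the norm.
    assert trend.max() / np.linalg.norm(trend) < 2.0 / np.sqrt(n)
```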
1.10.4 The Errors Contribution Negligibility Condition

Let us look again at y = xβ + e, where e₁, …, eₙ are now IID(0, σ²I) and not necessarily normal. Having made up our mind regarding the normalizer, we need to prove convergence in distribution of

    ‖x‖₂(β̂ − β) = (x₁/‖x‖₂)e₁ + ⋯ + (xₙ/‖x‖₂)eₙ.

Here is where CLTs step in. The CLTs we need affirm the asymptotic normality of weighted sums

    Σₜ₌₁ⁿ w_{nt}e_t

of random variables e₁, …, eₙ, which are not necessarily normal. Convergence in distribution of such sums is possible under two types of restrictions. The first type limits dependence among the random variables and is satisfied in the case under consideration because we assume independence. The second type requires the contribution of each term in the sum to vanish asymptotically, where

    contribution = (variance of a term)/(variance of the sum).

Under our assumptions this type boils down to the condition

    lim_{n→∞} max_{1≤t≤n} |x_t|/‖x‖₂ = 0,    (1.37)

often called an errors contribution negligibility condition. This condition in combination with e ~ IID(0, σ²I) is sufficient to prove Eq. (1.36).
1.11 GENERAL FRAMEWORK IN THE CASE OF K REGRESSORS

1.11.1 The Conventional Scheme

Now in the model y = Xβ + e we allow X to have more than one column and assume det X′X ≠ 0, e ~ IID(0, σ²I). The rough approach consists in generalizing upon Section 1.10.2 (with a constant regressor) by relying on the identity

    √n(β̂ − β) = (X′X/n)⁻¹ X′e/√n.    (1.38)

Suppose that here

    the limit A = lim_{n→∞} X′X/n exists and is nonsingular    (1.39)

and that

    X′e/√n →d N(0, B).    (1.40)

Then, by continuity of matrix inversion, (X′X/n)⁻¹ → A⁻¹, and the rule for convergence in distribution [Lemma 1.8.3(ii)] implies

    (X′X/n)⁻¹ X′e/√n →d A⁻¹u,  u ~ N(0, B).

As a result,

    √n(β̂ − β) →d N(0, A⁻¹BA⁻¹).    (1.41)

As in the case k = 1, the rough approach separates a narrow class of regressor matrices by virtue of conditions (1.39) and (1.40).

The refined approach is based on the variance stabilization idea. Partitioning X into columns, X = (X₁, …, X_k), we see that the vector u = X′e has components u_j = X_j′e with variances V(u_j) = σ²‖X_j‖₂². Since X′X is the Gram matrix of the system {X₁, …, X_k}, the condition det X′X ≠ 0 is equivalent to linear independence of the columns (Section 1.7.4) and implies ‖X_j‖₂ ≠ 0 for all j and large n. If we define the normalizer by

    Dₙ = diag[‖X₁‖₂, …, ‖X_k‖₂],    (1.42)
then the matrix

    H = XDₙ⁻¹ = (X₁/‖X₁‖₂, …, X_k/‖X_k‖₂) = (H₁, …, H_k)

has normalized columns, ‖H_j‖₂ = 1. This construction is simple yet so important that I would love to name it after the discoverer. Unfortunately, the historical evidence is not clear-cut, as is shown in Section 1.11.2. For this reason I call Dₙ a variance-stabilizing (VS) normalizer. The analog of Eq. (1.38) is [see Eq. (1.31)]

    Dₙ(β̂ − β) = Dₙ(X′X)⁻¹X′e = (Dₙ⁻¹X′XDₙ⁻¹)⁻¹Dₙ⁻¹X′e = (H′H)⁻¹H′e.    (1.43)

Naturally, the place of Eqs. (1.39) and (1.40) is taken by

    the limit A = lim_{n→∞} H′H exists and is nonsingular    (1.44)

and

    H′e →d N(0, B).    (1.45)

We call both the combination of Eqs. (1.38) + (1.39) + (1.40) and that of Eqs. (1.43) + (1.44) + (1.45) a conventional scheme of derivation of the OLS asymptotics. The result in Section 1.11.3 implies that, if we want to use Eq. (1.43), condition (1.44) is unavoidable. If Eq. (1.44) is not satisfied with any normalization, the conventional scheme itself should be modified (see in Chapter 4 how P.C.B. Phillips handles this issue).
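The construction of the VS normalizer is easy to sketch (an illustration with made-up regressors — an intercept and a linear trend):

```python
import numpy as np

n = 50
X = np.column_stack([np.ones(n), np.arange(1.0, n + 1.0)])

# VS normalizer (1.42): diagonal matrix of column norms.
Dn = np.diag(np.linalg.norm(X, axis=0))

# H = X Dn^{-1} has unit columns.
H = X @ np.linalg.inv(Dn)
assert np.allclose(np.linalg.norm(H, axis=0), 1.0)

# H'H is the Gram matrix of the normalized columns; the off-diagonal
# element is the cosine of the angle between the two regressors.
print(np.round(H.T @ H, 3))
```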
1.11.2 History

The probabilists became aware of the variance stabilization principle a long time ago. It is realized in one form or another in all CLTs. It took some time for the idea to penetrate econometrics. Eicker (1963) introduced the normalizer Dₙ but considered convergence of components of the OLS estimator instead of convergence of the estimator in joint distribution. Anderson (1971) proved convergence in joint distribution using Dₙ and mentioned that the result "in a slightly different form was given by Eicker". Schmidt (1976), without reference to either Eicker or Anderson, established a result similar to Anderson's. None of these three authors compare Dₙ to the classical normalizer. Moreover, Schmidt's comments imply that he thinks of Dₙ as complementary to the square root. Amemiya (1985) proved Anderson's result, without referring to the three authors just cited. Evidently, he was the first to show that Dₙ is superior to √n in
the sense that Eq. (1.44) is more general than Eq. (1.39). He also noticed that Dn -type normalization is applicable to maximum likelihood estimators. Finally, Mynbaev and Castelar (2001) established that Dn is more general than any other normalizer, as long as the conventional scheme is employed. This result is the subject of Section 1.11.3.
1.11.3 Universality of Dₙ

Definition. A diagonal matrix (actually, a sequence of matrices) D̄ₙ is called a conventional-scheme-compliant (CSC) normalizer if H̄ = XD̄ₙ⁻¹ satisfies Eqs. (1.44) and (1.45) for all errors e ~ IID(0, σ²I).

If {Mₙ} is any sequence of nonstochastic diagonal matrices satisfying the condition

    the limit M = lim Mₙ exists and is nonsingular    (1.46)

and D̄ₙ is a CSC normalizer, then it is easily checked that D̃ₙ = MₙD̄ₙ is also a CSC normalizer with H̃ = H̄Mₙ⁻¹, Ã = lim H̃′H̃ = M⁻¹ĀM⁻¹ and B̃ = M⁻¹B̄M⁻¹.

Theorem. (Mynbaev and Castelar 2001) The VS normalizer (1.42) is unique in the class of CSC normalizers up to a factor satisfying Eq. (1.46).

It follows that if the conventional scheme works with some normalizer, then Dₙ can also be used, while the converse may not be true.

Proof. Let D̄ₙ = diag[d̄ₙ₁, …, d̄ₙₖ] be some CSC normalizer, H̄ = XD̄ₙ⁻¹, and let Ā and B̄ be the corresponding elements of the conventional scheme. The diagonal of the limit relation H̄′H̄ → Ā gives

    H̄ⱼ′H̄ⱼ = ‖Xⱼ‖₂²/d̄ₙⱼ² → āⱼⱼ,  j = 1, …, k,    (1.47)

where the H̄ⱼ denote the columns of H̄, the Xⱼ the columns of X and the āᵢⱼ the elements of Ā. Recalling that Dₙ has dₙⱼ = ‖Xⱼ‖₂ on its diagonal, we deduce from Eq. (1.47) that

    dₙⱼ/d̄ₙⱼ → āⱼⱼ^(1/2),  j = 1, …, k.    (1.48)

By the Cauchy–Schwarz inequality the elements of H̄′H̄ satisfy |H̄ᵢ′H̄ⱼ| ≤ ‖H̄ᵢ‖₂‖H̄ⱼ‖₂. Letting n → ∞ here and using Eq. (1.47) we get |āᵢⱼ| ≤ (āᵢᵢāⱼⱼ)^(1/2). This tells us that none of the diagonal elements of Ā can be zero, because otherwise a whole cross in Ā would consist of zeros and Ā would be singular. Now from Eq. (1.48) we see that Mₙ = DₙD̄ₙ⁻¹ satisfies Eq. (1.46), and Dₙ = MₙD̄ₙ differs from D̄ₙ by an asymptotically constant diagonal factor. It follows that Dₙ is CSC with A = M⁻¹ĀM⁻¹ and B = M⁻¹B̄M⁻¹.
The square root is an example of a normalizer that has a narrower area of applicability than Dₙ. ∎
1.11.4 The Moore–Penrose Inverse

Suppose A is a singular square matrix. According to (Rao 1965, Section 1b.5) the Moore–Penrose inverse A⁺ of a matrix A is uniquely defined by the properties

    AA⁺A = A,    (1.49)
    A⁺AA⁺ = A⁺,    (1.50)
    AA⁺ and A⁺A are symmetric.    (1.51)

When A is symmetric, A⁺ can be constructed explicitly using its diagonal representation. Let A be of order n and diagonalized as A = PΛP′, where P is orthogonal, P′P = I, and Λ is a diagonal matrix of eigenvalues of A (see Theorem 1.7.2). Denote

    (1/λ)⁺ = 1/λ if λ ≠ 0, and 0 if λ = 0;  Λ⁺ = diag[(1/λ₁)⁺, …, (1/λₙ)⁺];  A⁺ = PΛ⁺P′.

Lemma. A⁺ is the Moore–Penrose inverse of A. It is symmetric, and the matrix Q = A⁺A is an orthoprojector: Q′ = Q, Q² = Q.

Proof. A⁺ is symmetric by construction. It is easy to see that the product D = Λ⁺Λ has zeros where Λ has zeros and unities where Λ has nonzero eigenvalues. Therefore ΛD = Λ and DΛ⁺ = Λ⁺, so that Eqs. (1.49) and (1.50) are true:

    AA⁺A = PΛDP′ = A,  A⁺AA⁺ = PDΛ⁺P′ = A⁺.

Besides, the matrices AA⁺ = PΛΛ⁺P′ and A⁺A = PΛ⁺ΛP′ = PDP′ are symmetric. By the uniqueness of the Moore–Penrose inverse, A⁺ is that inverse. The symmetry of Q = A⁺A has just been shown. Q is idempotent: Q² = (A⁺A)² = PD²P′ = PDP′ = Q. ∎

Note that A⁺ is not a continuous function of A. For example,
    Aₙ = [1 0; 0 1/n]
converges to

    A = [1 0; 0 0] = A⁺,

but

    Aₙ⁺ = [1 0; 0 n]

does not converge to A⁺.
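The explicit construction of A⁺ and properties (1.49)–(1.51) can be checked against numpy's `pinv` (a sketch with an arbitrarily chosen singular symmetric matrix):

```python
import numpy as np

# A symmetric singular matrix with eigenvalues 0 and 2.
A = np.array([[1.0, 1.0],
              [1.0, 1.0]])

lam, P = np.linalg.eigh(A)          # A = P diag(lam) P'
# Invert only the (numerically) nonzero eigenvalues, as in A+ above.
lam_plus = np.array([1.0 / l if abs(l) > 1e-12 else 0.0 for l in lam])
A_plus = P @ np.diag(lam_plus) @ P.T

# Matches numpy's Moore-Penrose inverse and satisfies (1.49)-(1.51).
assert np.allclose(A_plus, np.linalg.pinv(A))
assert np.allclose(A @ A_plus @ A, A)
Q = A_plus @ A
assert np.allclose(Q, Q.T) and np.allclose(Q @ Q, Q)
```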
1.11.5 What if the Limit of the Denominator Matrix is Singular?

Can the Moore–Penrose inverse save the situation? It is important to realize that convergence in distribution of Dₙ(β̂ − β) in the conventional scheme is obtained as a consequence of Equations (1.43)–(1.45) from Section 1.11.1. Since Moore–Penrose inversion is not continuous, the scheme does not work when the limit of the denominator matrix is singular. The next proposition shows that the Moore–Penrose inverse can be applied if outside (independent of the conventional scheme) information is available in the form

    the limit v = dlim Dₙ(β̂ − β) exists.    (1.52)

Lemma. If instead of Eq. (1.44) we assume that

    the limit A = lim_{n→∞} H′H exists and is singular    (1.53)

and if two pieces of information about convergence in distribution are available in the form of Eqs. (1.45) and (1.52), then Qv ~ N(0, A⁺BA⁺), where Q = A⁺A is an orthoprojector.

Proof. The normal equation X′X(β̂ − β) = X′e can be rewritten as H′HDₙ(β̂ − β) = H′e. Denoting by u the limit of the numerator and using Eqs. (1.53), (1.45) and (1.52), we get Av = u. Premultiply this by A⁺ to obtain Qv = A⁺u. Now the statement follows from Eq. (1.45). ∎

Thus, under the additional condition (1.52) some projection of v is normally distributed, with a degenerate variance A⁺BA⁺.
1.12 INTRODUCTION TO L2-APPROXIMABILITY

1.12.1 Asymptotic Linear Independence

By Theorem 1.7.4 the Gram matrix

    G = H′H = [H₁′H₁ … H₁′H_k; …; H_k′H₁ … H_k′H_k]

is nonsingular if and only if the columns H₁, …, H_k of H are linearly independent. Therefore condition (1.44) is termed the asymptotic linear independence condition. The question is: can the word "asymptotic" be removed from this name, that is, are there any vectors for which nonsingularity of the limit A = lim_{n→∞} H′H would mean simply linear independence? Imagine that for each j we have convergence of columns H_j → M_j, as n → ∞, in such a way that H_k′H_l → M_k′M_l. Then existence of the limit A = lim_{n→∞} H′H would be guaranteed, and det A ≠ 0 would mean linear independence of M₁, …, M_k.

Unfortunately, the sequences {H_j : n > k} do not converge. Their elements belong to Rⁿ, which can be embedded naturally into l₂(N). A necessary condition for convergence x⁽ⁿ⁾ → x in l₂(N) is the coordinate-wise convergence xᵢ⁽ⁿ⁾ → xᵢ, n → ∞, for all i = 1, 2, …. But for Eq. (1.45) to be true we have to require the errors contribution negligibility condition (1.37), which in terms of the elements of H looks like this:

    lim_{n→∞} max_{i,j} |h_{ij}| = 0.

Thus, convergence H_j → M_j, as n → ∞, would imply M_j = 0, but this is impossible because ‖H_j‖₂ = 1 for all n because of normalization.
1.12.2 Discretization

The general idea is to approximate sequences of vectors (functions of a discrete argument) by functions of a continuous argument. For any natural $n$ a function $f \in C[0,1]$ generates a vector with coordinates $f(i/n)$, $i = 1,\ldots,n$. A sequence of vectors $\{x^{(n)}\}$, with $x^{(n)} \in \mathbb{R}^n$ for all $n$, can be considered close to $f$ if

$$\max_{1\le i\le n}\left|x_i^{(n)} - f\!\left(\frac{i}{n}\right)\right| \to 0, \quad n\to\infty.$$

This kind of approximation was used by Nabeya and Tanaka (1988); see also Tanaka (1996). A better idea is to use the class $L_2(0,1)$, which is wider than $C[0,1]$. However, the members of $L_2(0,1)$ are defined only up to sets of Lebesgue measure 0, and it does not make sense to talk about values $f(i/n)$ for $f \in L_2(0,1)$. Instead of values we can use the integrals $\int_{(i-1)/n}^{i/n} f(t)\,dt$, $i = 1,\ldots,n$. For convenience, the vector of integrals is multiplied by $\sqrt{n}$, which gives the definition of the discretization operator $\delta_n$:

$$(\delta_n f)_i = \sqrt{n}\int_{(i-1)/n}^{i/n} f(t)\,dt, \quad i = 1,\ldots,n. \tag{1.54}$$
The sequence $\{\delta_n f: n \in \mathbb{N}\}$ is called $L_2$-generated by $f$. $L_2$-generated sequences were introduced by Moussatat (1976). Given the volatility of economic data, in econometrics it is unacceptable to require regressors to be $L_2$-generated or, in other words, to be exact images of some $f \in L_2(0,1)$ under the mapping $\delta_n$. To allow some deviation from exact images, in a conference presentation (Mynbaev, 1997) I defined an $L_2$-approximable sequence as a sequence $\{x^{(n)}\}$ for which there is a function $f \in L_2(0,1)$ satisfying $\|x^{(n)} - \delta_n f\|_2 \to 0$. If this is true, we also say that $\{x^{(n)}\}$ is $L_2$-close to $f \in L_2(0,1)$. It is worth emphasizing that the OLS estimator asymptotics can be proved without this condition. When the errors are independent, the asymptotic linear independence and errors contribution negligibility conditions are sufficient for this purpose; see Anderson (1971) and Amemiya (1985). In 1997 I needed this notion to find the asymptotic behavior of the fitted value, which is a more advanced problem. Note also that Pötscher and Prucha (1997) and Davidson (1994) used the term $L_p$-approximability in a different context. $L_2$-approximable sequences and, more generally, $L_p$-approximable sequences defined in Mynbaev (2000) possess some continuity properties when $p < \infty$. This is their main advantage over general sequences.
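A small numerical sketch of these notions (my own, not from the book): the code below implements the discretization operator (1.54) by a midpoint rule, generates an $L_2$-generated sequence from $f(t) = t$, and perturbs it by a vanishing amount to produce an $L_2$-approximable (but not $L_2$-generated) sequence.

```python
import numpy as np

def delta_n(f, n, sub=200):
    # Discretization (1.54): (delta_n f)_i = sqrt(n) * integral of f over ((i-1)/n, i/n),
    # here approximated by a midpoint rule on a subgrid of sub points per interval.
    h = 1.0 / (n * sub)
    mids = (np.arange(n * sub) + 0.5) * h
    return np.sqrt(n) * h * f(mids).reshape(n, sub).sum(axis=1)

f = lambda t: t                       # f is in L2(0,1); ||f||_2 = 1/sqrt(3)
for n in (10, 100, 1000):
    d = delta_n(f, n)                 # the L2-generated sequence
    x = d + 1.0 / n                   # perturbed: not an exact image of f under delta_n
    print(n, np.linalg.norm(x - d))   # = 1/sqrt(n) -> 0, so {x} is L2-close to f
```

The printed deviation is $\sqrt{n}\cdot(1/n) = n^{-1/2}$, so $\{x^{(n)}\}$ is $L_2$-approximable even though no $x^{(n)}$ equals $\delta_n f$.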
1.12.3 Ordinary Least Squares Asymptotics

Theorem. Consider a linear model $y = X\beta + u$ where

(i) the errors $u_1,\ldots,u_n$ are defined by Eq. (1.35), the innovations $\{e_j: j\in\mathbb{Z}\}$ are i.i.d.$(0,\sigma^2)$, $\sum_{j\in\mathbb{Z}}|c_j| < \infty$ and the $e_j^2$ are uniformly integrable;
(ii) for each $j = 1,\ldots,k$, the sequence of columns $\{H_j: n > k\}$ of the normalized regressor matrix $H = XD_n^{-1}$ is $L_2$-close to $M_j \in L_2(0,1)$;
(iii) the functions $M_1,\ldots,M_k$ are linearly independent.

Then the denominator matrix $H'H$ converges to the Gram matrix $G$ of the system $M_1,\ldots,M_k$ and

$$D_n(\hat\beta-\beta) \xrightarrow{d} N(0, (\sigma b_c)^2 G^{-1}). \tag{1.55}$$

Proof. By Theorem 2.5.3 $\lim_{n\to\infty} H_i'H_j = \int_0^1 M_iM_j\,dt$ and, in consequence, $\lim H'H = G$. By Theorem 3.5.2 $H'u \xrightarrow{d} N(0, (\sigma b_c)^2 G)$ (this includes the case when $H'u$ converges in distribution and in probability to a null vector). Equation (1.55) follows from the conventional scheme. ∎
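The first claim of the theorem is easy to see numerically (my own illustration, under the assumption that the columns are exact $L_2$-images $H_j = \delta_n M_j$, the simplest $L_2$-close case): the denominator matrix $H'H$ approaches the Gram matrix $G$ of $M_1,\ldots,M_k$.

```python
import numpy as np

def delta_n(f, n, sub=200):
    # Discretization operator (1.54), midpoint-rule approximation of the integrals.
    h = 1.0 / (n * sub)
    mids = (np.arange(n * sub) + 0.5) * h
    return np.sqrt(n) * h * f(mids).reshape(n, sub).sum(axis=1)

M = [lambda t: np.ones_like(t), lambda t: t]   # linearly independent in L2(0,1)
G = np.array([[1.0, 0.5],
              [0.5, 1.0 / 3.0]])               # Gram matrix: integrals of M_i * M_j

for n in (10, 100, 1000):
    H = np.column_stack([delta_n(m, n) for m in M])
    print(n, np.abs(H.T @ H - G).max())        # tends to 0
```

For these two regressors the off-diagonal entries of $H'H$ are exact for every $n$ and only $H_2'H_2 = 1/3 - 1/(12n^2)$ deviates, so the printed error decays like $n^{-2}$.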
In similar results with VS normalization (Anderson, 1971, Theorem 2.6.1; Schmidt, 1976, Section 2.7; Amemiya, 1985, Theorem 3.5.4) the errors are assumed independent. Assumptions on $H$ vary from source to source. In Theorems 2.5.3 and 3.5.2 the necessary properties of $H$ are derived from the $L_2$-approximability assumption. Instead, we could require them directly. When the errors are independent, these properties are: existence of the limit $A = \lim_{n\to\infty} H'H$, the asymptotic linear independence condition $\det A \ne 0$ and the errors contribution negligibility condition $\lim_{n\to\infty}\max_{i,j}|h_{ij}| = 0$. Thus, as far as the OLS asymptotics for the classical model is concerned, the $L_2$-approximability condition is stronger than the minimum required. It becomes indispensable when deeper properties are needed, such as convergence of the fitted value, considered next.
1.12.4 Convergence of the Fitted Value

The fitted value is defined by $\hat y = X\hat\beta$. The need for its asymptotics may arise in the following way. Suppose we have to estimate a stock $q(t)$ based on its known initial value $q(t_0)$ and flow (rate of change) $q'(t)$. By the Newton–Leibniz formula, $q(t) - q(t_0) = \int_{t_0}^t q'(s)\,ds$. If $q'(t)$ is measured at discrete points and regressed on, say, a polynomial of time, the interpolated fitted value approximates $q'$ on the whole interval $[t_0, t]$ and integrating it gives an estimate of $q(t) - q(t_0)$.

As is the case with the OLS estimator, the fitted value has to be transformed to achieve convergence in distribution. Centering on $X\beta$ results in

$$\hat y - X\beta = X(\hat\beta-\beta) = XD_n^{-1}D_n(\hat\beta-\beta) = HD_n(\hat\beta-\beta). \tag{1.56}$$

Convergence of $D_n(\hat\beta-\beta)$ is available from Theorem 1.12.3, but $H$ does not converge, as explained in Section 1.12.1. It happens, though, that interpolating $H$ leads to a convergent sequence in $L_2(0,1)$. A vector $x$ with $n$ values is interpolated by constants to obtain a step function $\Delta_n x = \sum_{t=1}^n x_t 1_{i_t}$. The interpolation operator $\Delta_n$ is applied to columns of $H$. From Eq. (1.56) we get

$$\Delta_n(\hat y - X\beta) = \Delta_n \sum_{l=1}^k H_l [D_n(\hat\beta-\beta)]_l = \sum_{l=1}^k (\Delta_n H_l)[D_n(\hat\beta-\beta)]_l.$$

Theorem. Under the assumptions of Theorem 1.12.3 the fitted value converges in distribution to a linear combination of the functions $M_1,\ldots,M_k$,

$$\Delta_n(\hat y - X\beta) \xrightarrow{d} \sum_{l=1}^k M_l c_l,$$

where the random vector $c = (c_1,\ldots,c_k)'$ is distributed as $N(0, (\sigma b_c)^2 G^{-1})$.

Proof. By Lemma 2.5.1 the $L_2$-approximability condition $\|H_l - \delta_n M_l\| \to 0$ is equivalent to $\|\Delta_n H_l - M_l\| \to 0$. Convergence of $\{\Delta_n H_l\}$ to $M_l$ in $L_2$ implies convergence in distribution of $\{\Delta_n H\}$ to the vector $M = (M_1,\ldots,M_k)'$. In the expression $\Delta_n(\hat y - X\beta) = [\Delta_n H]'[D_n(\hat\beta-\beta)]$ both factors in brackets at the right converge in distribution. Since their limits $M$ and $u = \operatorname{d-lim} D_n(\hat\beta-\beta)$ are independent and, for each $n$, $\Delta_n H$ and $D_n(\hat\beta-\beta)$ are independent, the relations $\Delta_n H \xrightarrow{d} M$ and $D_n(\hat\beta-\beta) \xrightarrow{d} u$ imply convergence of the pair $(\Delta_n H, D_n(\hat\beta-\beta)) \xrightarrow{d} (M, u)$; see Billingsley (1968, pp. 26–27). By the continuous mapping theorem then $\Delta_n(\hat y - X\beta) \xrightarrow{d} M'u$. ∎
1.12.5 Convictions and Preconceptions

In econometrics too much depends on the views of the researcher. Apparently, a set of real-world data can be looked at from different angles. Unfortunately, theoretical studies also suffer from the subjectivity of their authors. Two different sets of assumptions for the same model may lead to quite different conclusions. The choice of assumptions depends on the previous experience of the researcher, the method employed and the desired result. Assumptions made for, and views drawn from, a simple model are often taken to a higher level, where they can be called convictions if justified or preconceptions if questionable. A practitioner usually worries only about the qualitative side of the result. A highly technical paper about estimator asymptotics in his or her interpretation boils down to "under some regularity conditions the estimator is asymptotically normal". Hypothesis tests are conducted accordingly, the result is cited without proofs in expository monographs for applied specialists and, with time, becomes a part of folklore. The probability of a critical revision of the original paper declines exponentially.

Imagine that you are a security agent entrusted with the task of capturing an alien that is killing humans. If you presuppose that the beast is disguised as a human, your course of action will be quite different from what it would be if you were looking for a giant cockroach. When you see a new estimator, its asymptotics is that alien. Best of all, do not presume that it is of a particular type. Make simplified assumptions and look at the finite-sample distributions in the case of normal disturbances. If they are normal, perhaps the asymptotics is also normal. If they are not, a suitable transformation of the estimator, such as centering and scaling, may result in normal asymptotics. Alternatively, you may have to apply a CLT in conjunction with the CMT to obtain nonnormal asymptotics.

All these possibilities are illustrated in the book. By choosing the format of the result you make a commitment. Normal asymptotics is usually proved using a CLT. Let us say it comes with conditions (A), (B) and (C). To satisfy them, you impose in terms of your model conditions (A′), (B′) and (C′), respectively. These conditions determine the class of processes your result is applicable to. By selecting a different format you are bound to use different techniques and obtain a different class. In the case of the conventional scheme an easy way to go is simply to assume that $X$ and $e$ are such that either Eqs. (1.39)–(1.40) or Eqs. (1.44)–(1.45) are satisfied. I call such a "theorem" a pig-in-a-poke result. While this approach serves illustrative purposes in a university course well, its value in a research paper or monograph is doubtful. Eicker (1963) mentions that conditions should be imposed separately on the errors and regressors.
In this connection it is useful to distinguish between low-level conditions, stated directly in terms of the primary elements of the model, such as Eq. (1.44), and high-level conditions, expressed in terms of some complex combinations of the basic elements, such as Eq. (1.45). Of course, this distinction is relative. For instance, the $L_2$-approximability assumption about deterministic regressors made throughout most of this book is of a lower level than Eq. (1.44).

The parsimony principle in econometric modeling states that a model should contain as few parameters as possible, or be simple otherwise, and still describe the process in question well. A similar principle applies to the choice of conditions. If you have imposed several of them and are about to require a new one, make sure that it is not implied or contradicted by the previous conditions. My major professor, M. Otelbaev, used to say, "If I am allowed to impose many conditions, I can prove anything." Transparency, simplicity and beauty are other subjective measures of the quality of assumptions. Good taste is acquired by reading and comparing many sources. It is not a good idea to have a prospective user of your result prove a whole theorem to check whether your assumptions are satisfied. Nontransparent conditions appealing to the existence of objects with certain properties are especially dangerous. It is quite possible to use the right theorems, comply with all the rules of formal logic and still get a bad statement, because the set of objects it applies to will be empty if the conditions are contradictory or the existence requirements are infeasible. Contradictions are easy to avoid by using conditions with nonoverlapping responsibilities. In other words, beware of two different conditions governing the behavior of the same object. Generalizations do not always work, as we have seen when going from constant to variable regressors.

However, when studying a dynamic model, such as the mixed spatial model $Y = X\beta + \rho WY + e$ in Chapter 5, I choose the conditions and methods that work for its two submodels, $Y = X\beta + e$ and $Y = \rho WY + e$. In this sense, this book is not free from subjectivity. Generalizations based on the conventional scheme can be as harmful as any others. The study of the purely spatial model in Chapter 5 shows that this model violates the habitual notions in several ways:

1. the OLS asymptotics is not normal,
2. the limit of the numerator vector is not normal,
3. the limit of the denominator matrix is not constant,
4. the normalizer is identically 1 (that is, no scaling is necessary) and
5. there is no consistency.

These days the requirements for econometric papers are very high. If you suggest a new model, you have to defend it by showing its theoretical advantages and testing its practical performance, preferably in the same paper. The author of a new model can be excused if he or she studies the model under simplified assumptions and leaves the generalizations and refinements to the followers. The way of modeling deterministic regressors advocated here allows us to combine simple assumptions with rigorous proofs.
CHAPTER 2

Lp-APPROXIMABLE SEQUENCES OF VECTORS

IN THIS chapter we use some classical tools, the Haar projector and the continuity modulus in the first place. With their help we can study the properties of discretization, interpolation and convolution operators. From there we go to $L_p$-approximable sequences and their properties, among which the criterion of $L_p$-approximability is the most advanced. The chapter ends with examples. Everywhere in this chapter $L_p$ denotes the space $L_p(0,1)$. Where necessary, integration over subsets of $(0,1)$ is indicated as in $\|F\|_{p,(a,b)} = \left(\int_a^b |F(x)|^p\,dx\right)^{1/p}$.

2.1 DISCRETIZATION, INTERPOLATION AND HAAR PROJECTOR IN Lp

2.1.1 Partitions and Coverings

For each natural $n$ the set $\{t/n: t = 0,\ldots,n\}$ is called a uniform partition. The intervals $i_t = [(t-1)/n, t/n)$ form a disjoint covering of $[0,1)$ of equal length $1/n$. Denoting $[a]$ the integer part of a real number $a$, we can see that the condition $x \in i_t$ is equivalent to $t-1 \le nx < t$, which, in turn, is equivalent to $t = [nx]+1$. The function $[nx]+1$ can be called a locator because $x \in i_{[nx]+1}$ for all $x \in [0,1)$.

2.1.2 Discretization Operator Definition

For each natural $n$, we can define a discretization operator $\delta_{np}: L_p \to \mathbb{R}^n_p$ by

$$(\delta_{np}F)_t = n^{1/q}\int_{i_t} F(x)\,dx, \quad t = 1,\ldots,n, \quad F \in L_p,$$

where $q$ is the conjugate of $p$, $1/p + 1/q = 1$. Up to a scaling factor, the $t$th component of $\delta_{np}F$ is the average of $F$ over the interval $i_t$. For a given $F \in L_p$, the sequence $\{\delta_{np}F: n \in \mathbb{N}\}$ is called $L_p$-generated by $F$.
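A numerical sketch of the general-$p$ operator (my own illustration; the test function $F(x) = x^2$ and $p = 3$ are arbitrary choices): the code approximates $\delta_{np}F$ and checks the two properties proved below in Lemma 2.1.3, namely $\|\delta_{np}F\|_p \le \|F\|_p$ and $\max_t |(\delta_{np}F)_t| \to 0$.

```python
import numpy as np

def delta_np(F, n, p, sub=400):
    # (delta_np F)_t = n^{1/q} * integral of F over i_t, with 1/p + 1/q = 1,
    # approximated by a midpoint rule with sub points per interval.
    q = p / (p - 1.0)
    h = 1.0 / (n * sub)
    mids = (np.arange(n * sub) + 0.5) * h
    return n ** (1.0 / q) * h * F(mids).reshape(n, sub).sum(axis=1)

F = lambda x: x ** 2
p = 3.0
normF = (1.0 / 7.0) ** (1.0 / p)          # ||F||_p = (int_0^1 x^6 dx)^{1/3}

for n in (4, 16, 64):
    f = delta_np(F, n, p)
    lp = (np.abs(f) ** p).sum() ** (1.0 / p)
    print(n, lp <= normF + 1e-6, np.abs(f).max())   # norm bound holds; max coordinate shrinks
```

The largest coordinate decays like $n^{-1/p}$, in line with part (iii) of the lemma.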
2.1.3 Discretization Operator Properties

Lemma. If $F \in L_p$, $1 \le p \le \infty$, then

(i) $|(\delta_{np}F)_t| \le \|F\|_{p,i_t}$,
(ii) $\|\delta_{np}F\|_p \le \|F\|_p$ and
(iii) $\lim_{n\to\infty}\max_{1\le t\le n}|(\delta_{np}F)_t| = 0$ in case $p < \infty$.

Proof. Everywhere we assume $p < \infty$, the modification for $p = \infty$ being obvious.

(i) By Hölder's inequality

$$|(\delta_{np}F)_t| \le n^{1/q}\left(\int_{i_t}|F(x)|^p\,dx\right)^{1/p}\left(\int_{i_t}dx\right)^{1/q} = \|F\|_{p,i_t}.$$

(ii) If $p < \infty$, use part (i) to get by additivity of integrals

$$\left(\sum_{t=1}^n |(\delta_{np}F)_t|^p\right)^{1/p} \le \left(\sum_{t=1}^n \int_{i_t}|F(x)|^p\,dx\right)^{1/p} = \|F\|_p.$$

(iii) Since $|F(\cdot)|^p \in L_1$, we can use absolute continuity of the Lebesgue integral to find, for any $\varepsilon > 0$, a number $n(\varepsilon) > 0$ such that $n > n(\varepsilon)$ implies $\int_{i_t}|F(x)|^p\,dx < \varepsilon$ for $t = 1,\ldots,n$. Now the statement follows from (i). ∎

Statement (ii) means that the norms of $\delta_{np}$ from $L_p$ to $l_p$ do not exceed 1.
2.1.4 Continuity Modulus in Lp, p < ∞

Most properties of continuity moduli we need can be found in Zhuk and Natanson (2001). For $y \in \mathbb{R}$, let $\tau_y$ be the translation operator, $(\tau_y F)(x) = F(x+y)$. If $F$ is defined on $(a,b)$, then $\tau_y F$ is defined on $(a,b) - y = (a-y, b-y)$. As the domain of the difference $F - \tau_y F$ one takes the intersection of these intervals, denoted

$$(a,b)_y = (a,b) \cap [(a,b)-y] = (\max\{a, a-y\}, \min\{b, b-y\}).$$

In particular, with $\Omega = (0,1)$ the difference $F - \tau_y F$ is defined on $\Omega_y$. The continuity modulus of $F \in L_p$ equals, by definition,

$$\omega_p(F,\delta) = \sup_{|y|\le\delta} \|F - \tau_y F\|_{p,\Omega_y}, \quad \delta > 0.$$

The continuity modulus is designed to measure how close $\tau_y F$ is to $F$ for small $y$. This can be demonstrated using the indicator $1_A$ of a set $A$. Think of the function $F = 1_A$ as an example, where $A$ is some measurable subset of $(0,1)$. Then $\tau_y 1_A = 1$ on $A - y$ and $\tau_y 1_A = 0$ outside $A - y$. The Venn diagram shows that $1_A - \tau_y 1_A$ is zero on the intersection $A \cap (A-y)$ and outside the union $A \cup (A-y)$, and it is unity on the symmetric difference $s_y = [A\setminus(A-y)] \cup [(A-y)\setminus A]$ of $A$ and $A-y$. Therefore when $p < \infty$

$$\|1_A - \tau_y 1_A\|_{p,\Omega_y}^p = \int_{\Omega_y} |1_A - \tau_y 1_A|^p\,dx \le \int_{s_y} dx = \operatorname{mes}(s_y),$$

where mes denotes the Lebesgue measure. In the theory of the Lebesgue measure it is proved that $\operatorname{mes}(s_y) \to 0$ when $y \to 0$, which implies

$$\|1_A - \tau_y 1_A\|_{p,\Omega_y} \to 0, \quad y \to 0 \quad (p < \infty). \tag{2.1}$$

However, if $p = \infty$, then

$$\|1_A - \tau_y 1_A\|_{\infty,\Omega_y} \ge \operatorname*{ess\,sup}_{x\in s_y} 1_{s_y}(x). \tag{2.2}$$

Here the quantity at the right is 1 as long as $\operatorname{mes}(s_y) > 0$.

2.1.5 Continuity of Elements of Lp

Lemma.
(i) The continuity modulus is nondecreasing: $\delta \le \gamma$ implies $\omega_p(F,\delta) \le \omega_p(F,\gamma)$.
(ii) If $F \in L_p$, $p < \infty$, then $\lim_{\delta\to 0}\omega_p(F,\delta) = 0$.

Proof. Part (i) follows directly from the definition: a supremum over a larger set is larger.

(ii) Since linear combinations of indicators of measurable sets are dense in $L_p$ when $p < \infty$, Eq. (2.1) extends to all of $L_p$:

$$\|F - \tau_y F\|_{p,\Omega_y} \to 0, \quad y \to 0 \quad (F \in L_p,\ p < \infty). \tag{2.3}$$

This is described by saying that functions from $L_p$ with finite $p$ are continuous in mean (or translation continuous). To prove (ii), it remains to apply $\sup$ over $|y| \le \delta$ to Eq. (2.3). ∎

Equation (2.2) shows that elements of $L_\infty$ are not translation continuous. When it is important to have this property, $L_\infty$ is replaced by the set of continuous functions $C[0,1]$. The continuity modulus of a continuous function on $[0,1]$ is defined by

$$\omega_C(F,\delta) = \sup_{|x-y|\le\delta,\ x,y\in[0,1]} |F(x) - F(y)|.$$
This tends to 0 when $\delta$ tends to zero because a continuous function on $[0,1]$ is uniformly continuous.
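The indicator example above can be checked numerically (a brute-force sketch of my own; the set $A = [0.3, 0.7)$ is arbitrary): since $\operatorname{mes}(s_y) = 2|y|$ for an interval, the modulus $\omega_p(1_A,\delta)$ should be close to $(2\delta)^{1/p}$ for small $\delta$.

```python
import numpy as np

def omega_p(F, delta, p, grid=20000, ny=41):
    # omega_p(F, delta) = sup over |y| <= delta of ||F - tau_y F||_{p, Omega_y},
    # approximated by scanning ny shifts and a uniform grid on Omega_y.
    best = 0.0
    for y in np.linspace(-delta, delta, ny):
        lo, hi = max(0.0, -y), min(1.0, 1.0 - y)   # Omega_y
        if hi <= lo:
            continue
        x = np.linspace(lo, hi, grid)
        val = ((np.abs(F(x) - F(x + y)) ** p).mean() * (hi - lo)) ** (1.0 / p)
        best = max(best, val)
    return best

A = (0.3, 0.7)
F = lambda x: ((x >= A[0]) & (x < A[1])).astype(float)   # indicator 1_A
for delta in (0.1, 0.05, 0.01):
    # mes(s_y) = 2|y| here, so the modulus should be close to (2*delta)^{1/2} for p = 2.
    print(delta, omega_p(F, delta, p=2), (2 * delta) ** 0.5)
```

The two printed columns agree up to grid resolution, and both vanish as $\delta \to 0$, in agreement with Eq. (2.1).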
2.1.6 Interpolation Operator and Haar Projector Definitions

The interpolation operator $\Delta_{np}: \mathbb{R}^n_p \to L_p$ takes a vector $f \in \mathbb{R}^n_p$ to a step function

$$\Delta_{np}f = n^{1/p}\sum_{t=1}^n f_t 1_{i_t}.$$

Here, the factor $n^{1/p}$ is introduced just for convenience. Interpolation by a piecewise constant function is sufficient for our purposes. The Haar projector $P_n$ is defined on integrable functions on $[0,1]$ by

$$P_n F = n\sum_{t=1}^n \int_{i_t} F(x)\,dx\, 1_{i_t}$$

(the value of $P_n F$ on the interval $i_t$ is the average of $F$ on that interval). $P_n$ is really a projector because for $i_s$ in the above sum only one term can be different from zero and

$$P_n(P_n F) = n\sum_{s=1}^n \int_{i_s}(P_n F)(y)\,dy\, 1_{i_s} = n\sum_{s=1}^n \int_{i_s}\left(n\int_{i_s}F(x)\,dx\, 1_{i_s}(y)\right)dy\, 1_{i_s} = P_n F.$$

2.1.7 Interpolation and Haar Projector Properties

Lemma. Let $1 \le p \le \infty$.

(i) $\Delta_{np}$ preserves norms, $\|\Delta_{np}f\|_p = \|f\|_p$ for all $f \in \mathbb{R}^n_p$, and bilinear forms,

$$\int_0^1 (\Delta_{np}f)\Delta_{nq}g\,dx = f'g \quad \text{for all } f,g \in \mathbb{R}^n. \tag{2.4}$$

(ii) Discretizing and then interpolating is the same as projecting: $\Delta_{np}\delta_{np} = P_n$. Interpolation and subsequent discretization amount to applying the identity operator: $\delta_{np}\Delta_{np} = I$ on $\mathbb{R}^n_p$.

(iii) Norms of the projectors $P_n$ do not exceed 1: $\|P_n F\|_p \le \|F\|_p$.
Proof. (i) If $p < \infty$, then

$$\int_0^1 |(\Delta_{np}f)(x)|^p\,dx = \sum_{t=1}^n \int_{i_t}|(\Delta_{np}f)(x)|^p\,dx = n\sum_{t=1}^n \int_{i_t}|f_t|^p 1_{i_t}(x)\,dx = \sum_{t=1}^n |f_t|^p.$$

If $p = \infty$, then $\|\Delta_{n\infty}f\|_\infty = \max_{t=1,\ldots,n}|f_t|$. Further,

$$\int_0^1 (\Delta_{np}f)\Delta_{nq}g\,dx = \sum_{t=1}^n \int_{i_t} n^{1/p}f_t\, n^{1/q}g_t\,dx = \sum_{t=1}^n \int_{i_t} nf_tg_t\,dx = f'g.$$

(ii) For $F \in L_p$

$$\Delta_{np}(\delta_{np}F) = n^{1/p}\sum_{t=1}^n (\delta_{np}F)_t 1_{i_t} = n^{1/p+1/q}\sum_{t=1}^n \int_{i_t}F(x)\,dx\, 1_{i_t} = P_n F.$$

To see that $\delta_{np}\Delta_{np}f = f$ for all $f \in \mathbb{R}^n$ it suffices to check that $(\delta_{np}\Delta_{np}f)_t = f_t$ for all $t$. This is true because interpolating $f$ generates on $i_t$ a constant equal to $F = n^{1/p}f_t$. With this $F$, $(\delta_{np}F)_t$ gives $f_t$.

(iii) From boundedness of the discretization operator [Lemma 2.1.3(ii)] and items (i) and (ii) of this lemma

$$\|P_n F\|_p = \|\Delta_{np}\delta_{np}F\|_p = \|\delta_{np}F\|_p \le \|F\|_p. \qquad \blacksquare$$
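The identities of part (ii) are exact and easy to confirm on a grid (my own sketch, with $n = 5$, $p = 2$ and a tabulated test function; the subgrid makes interval averages exact for step functions).

```python
import numpy as np

n, p = 5, 2.0
q = p / (p - 1.0)
sub = 1000                                   # subgrid points per interval i_t
x = (np.arange(n * sub) + 0.5) / (n * sub)   # midpoints of a fine grid on (0,1)

def delta_np(vals):
    # delta_np F: n^{1/q} times the integral of F over each i_t (exact averages here).
    return n ** (1.0 / q) * vals.reshape(n, sub).mean(axis=1) / n

def interp_np(f):
    # Delta_np f: the step function with value n^{1/p} f_t on i_t, tabulated on the grid.
    return np.repeat(n ** (1.0 / p) * f, sub)

# delta_np Delta_np = I on R^n:
f = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
assert np.allclose(delta_np(interp_np(f)), f)

# Delta_np delta_np = P_n (the Haar projector: interval averages):
F = np.sin(2 * np.pi * x)
PnF = np.repeat(F.reshape(n, sub).mean(axis=1), sub)
assert np.allclose(interp_np(delta_np(F)), PnF)
print("identities verified for n =", n)
```

For $p = 2$ the scaling factors $n^{1/q}/n$ and $n^{1/p}$ cancel exactly, which is why both assertions hold to machine precision.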
2.2 CONVERGENCE OF BILINEAR FORMS

An expression of the type $\int_0^1 F(x)G(x)\,dx$ is a bilinear form: it is linear in $F$ when $G$ is fixed,

$$\int_0^1 (aF + bH)G\,dx = a\int_0^1 FG\,dx + b\int_0^1 HG\,dx,$$

and similarly it is linear in $G$ when $F$ is fixed.
2.2.1 Convergence of Haar Projectors to the Identity Operator

Lemma. If $p < \infty$, then the sequence $\{P_n\}$ converges strongly to the identity operator, with the following bound on the rate of convergence: $\|P_nF - F\|_p \le 2^{1/p}\omega_p(F, 1/n)$.

Proof. Step 1. Using additivity of integrals and the fact that the restriction of $P_nF$ to $i_t$ equals $n\int_{i_t}F\,dx$, we have

$$\|P_nF - F\|_p^p = \sum_{t=1}^n\int_{i_t}|(P_nF)(y) - F(y)|^p\,dy = \sum_{t=1}^n\int_{i_t}\left|n\int_{i_t}F(x)\,dx - n\int_{i_t}F(y)\,dx\right|^p dy = n^p\sum_{t=1}^n\int_{i_t}\left|\int_{i_t}(F(x) - F(y))\,dx\right|^p dy.$$

Now we apply Hölder's inequality to $X(x) = F(x) - F(y)$ and $Y(x) \equiv 1$ and use the identity $p - p/q = 1$:

$$\|P_nF - F\|_p^p \le n^p\sum_{t=1}^n\iint_{i_t\times i_t}|F(x) - F(y)|^p\,dx\,dy\cdot n^{-p/q} = n\sum_{t=1}^n\iint_{i_t\times i_t}|F(x) - F(y)|^p\,dx\,dy.$$

As we can see, we need to estimate the integrals over the squares $i_t \times i_t$.

Step 2. Let $F$ be defined on $D = (a, a+b)$. We want to reveal the translation operator in the integral

$$I = \iint_{D\times D}|F(x) - F(y)|^p\,dx\,dy$$

(change $x = y + z$ in the inner integral)

$$= \int_a^{a+b}\int_{a-y}^{a+b-y}|F(y+z) - F(y)|^p\,dz\,dy.$$

Figure 2.1 Change of variables.

The inner integral should be over $y$. To change the order of integration, we have to split $\{z: -b \le z \le b\}$ into $\{z: -b \le z \le 0\}$ and $\{z: 0 \le z \le b\}$ (Figure 2.1). Reading off the limits of integration from the diagram we write

$$I = \int_{-b}^0 dz\int_{a-z}^{a+b}|F(y) - (\tau_zF)(y)|^p\,dy + \int_0^b dz\int_a^{a+b-z}|F(y) - (\tau_zF)(y)|^p\,dy.$$

Step 3. Combining Steps 1 and 2 we get

$$\|P_nF - F\|_p^p \le n\sum_{t=1}^n\left[\int_{-1/n}^0 dz\int_{(i_t)_z}|F - \tau_zF|^p\,dy + \int_0^{1/n} dz\int_{(i_t)_z}|F - \tau_zF|^p\,dy\right].$$
The intervals $(i_t)_z$ conveniently satisfy … $\le m(s_y)$. (2.12)
Suppose $[a,b]$ and $[a,b]+y$ do not intersect. Then either $a+y > b$ or $b+y < a$. In both cases $b-a < |y|$. Here $s_y$ is a union of two nonoverlapping segments of length $b-a$, so $\operatorname{mes}(s_y) = 2(b-a) < 2|y|$. Suppose $[a,b]$ and $[a,b]+y$ do intersect. Then their symmetric difference $s_y$ is a union of two disjoint intervals of length $|y|$ each, so $\operatorname{mes}(s_y) = 2|y|$. In consequence, $\operatorname{mes}(s_y) \le 2|y|$ independently of $[a,b]$. Since $m$ is absolutely continuous with respect to the Lebesgue measure, for any $\varepsilon > 0$ there exists $\delta > 0$ such that $\operatorname{mes}(A) < \delta$ implies $m(A) < \varepsilon$. By choosing $|y| < \delta/2$ we satisfy $m(s_y) < \varepsilon$. Now Eqs. (2.9), (2.10) and (2.12) show that $\omega_p(fX, \delta/2) \le \omega_p(X, \delta/2) + \varepsilon$. Since $\varepsilon$ and $\delta$ are arbitrarily small, we have proved that

$$\lim_{\delta\to 0}\omega_p(fX, \delta) = 0$$

uniformly in $[a,b]$. ∎

I claimed in Mynbaev (2000) that this theorem holds for nonuniform partitions, but I failed to prove this generalization six years later, when I was writing this chapter.
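The strong convergence $P_nF \to F$ established in Lemma 2.2.1 is easy to see numerically (my own sketch; the test function $F(x) = \sqrt{x}$ is arbitrary, and the error is computed on a fine grid).

```python
import numpy as np

def haar_error(F, n, sub=500):
    # ||P_n F - F||_2, where P_n F equals the average of F on each interval i_t.
    x = (np.arange(n * sub) + 0.5) / (n * sub)
    vals = F(x)
    PnF = np.repeat(vals.reshape(n, sub).mean(axis=1), sub)
    return np.sqrt(((vals - PnF) ** 2).mean())

F = lambda x: np.sqrt(x)
errs = [haar_error(F, n) for n in (4, 16, 64, 256)]
print(errs)    # strictly decreasing: P_n F -> F in L_2 as n grows
```

By the lemma the decay is at least as fast as $2^{1/2}\omega_2(F, 1/n)$, which for this $F$ vanishes as $n \to \infty$.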
2.3 THE TRINITY AND ITS BOUNDEDNESS IN lp

2.3.1 Motivation

Consider a linear process

$$v_t = \sum_{j\in\mathbb{Z}} c_{t-j}e_j, \quad t\in\mathbb{Z},$$

where $\{c_j: j\in\mathbb{Z}\}$ is a summable sequence of real numbers and $\{e_j: j\in\mathbb{Z}\}$ is a sequence of centered ($Ee_j = 0$) integrable random variables (innovations). Suppose that for each $n\in\mathbb{N}$ we are given a vector of weights $w_n\in\mathbb{R}^n$ and the question is about convergence in distribution of the weighted sums

$$S_n = \sum_{t=1}^n w_{nt}v_t.$$

Changing the order of summation,

$$S_n = \sum_{t=1}^n w_{nt}\sum_{j\in\mathbb{Z}}c_{t-j}e_j = \sum_{j\in\mathbb{Z}}\left(\sum_{t=1}^n w_{nt}c_{t-j}\right)e_j,$$

we see that it makes sense to consider the convolution operator $T_n: \mathbb{R}^n_p \to l_p(\mathbb{Z})$ defined by

$$(T_nw)_j = \sum_{t=1}^n w_t c_{t-j}, \quad j\in\mathbb{Z}.$$

Sometimes it is convenient to represent $T_nw$ as

$$T_nw = \begin{pmatrix} T_n^-w \\ T_n^0w \\ T_n^+w \end{pmatrix},$$

where $T_n^+: \mathbb{R}^n_p \to l_p(j > n)$, $T_n^0: \mathbb{R}^n_p \to \mathbb{R}^n_p$ and $T_n^-: \mathbb{R}^n_p \to l_p(j < 1)$ are defined by

$$(T_n^+w)_j = (T_nw)_j,\ j > n; \quad (T_n^0w)_j = (T_nw)_j,\ 1\le j\le n; \quad (T_n^-w)_j = (T_nw)_j,\ j < 1.$$

As a consequence of the exceptional role of $T_n^+$, $T_n^0$ and $T_n^-$ in this theory, I call these three operators a trinity. Naturally, $T_n$ is called a T-operator.
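The convolution operator and its three-way split can be tabulated directly (my own illustration; the coefficients $c_k = 0.5^{|k|}$ and the weight vector are made-up examples, and the sum over $j$ is truncated at $|j| \approx K$ where the geometric tail is negligible).

```python
import numpy as np

def T_n(w, c, K):
    # Convolution operator: (T_n w)_j = sum_t w_t c_{t-j}, for j = 1-K, ..., n+K.
    # c(k) returns the coefficient c_k of the linear process.
    n = len(w)
    js = np.arange(1 - K, n + K + 1)
    out = np.array([sum(w[t - 1] * c(t - j) for t in range(1, n + 1)) for j in js])
    return js, out

c = lambda k: 0.5 ** abs(k)              # summable: a_c = sum_k |c_k| = 3
w = np.array([1.0, -1.0, 2.0])
js, Tw = T_n(w, c, K=30)

minus = Tw[js < 1]                       # T_n^- w
zero  = Tw[(js >= 1) & (js <= len(w))]   # T_n^0 w
plus  = Tw[js > len(w)]                  # T_n^+ w

p = 2.0
# ||T_n w||_p^p = ||T_n^- w||_p^p + ||T_n^0 w||_p^p + ||T_n^+ w||_p^p
total = (np.abs(Tw) ** p).sum()
parts = sum((np.abs(x) ** p).sum() for x in (minus, zero, plus))
assert np.isclose(total, parts)

# Boundedness, as in Lemma 2.3.2 below: ||T_n w||_p <= a_c ||w||_p.
a_c = 3.0
assert total ** (1 / p) <= a_c * np.linalg.norm(w) + 1e-9
print("||T_n w||_2 =", total ** 0.5)
```

The additivity of the $p$th powers across the three pieces is exactly the identity used at the end of the proof of Lemma 2.3.2.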
2.3.2 Boundedness of the Trinity in lp

Lemma. If $a_c < \infty$ and $1 \le p \le \infty$, then $\sup_n \|T_n\| \le a_c$ and

$$\sup_n \max\{\|T_n^+\|, \|T_n^0\|, \|T_n^-\|\} \le a_c. \tag{2.13}$$

Proof. Let $E^n$ be the embedding operator of $\mathbb{R}^n_p$ into $l_p(\mathbb{Z})$:

$$(E^nw)_t = \begin{cases} 0, & \text{if } t < 1 \text{ or } t > n; \\ w_t, & \text{if } 1 \le t \le n. \end{cases}$$

Denote $\tau_j$ the translation operator in $l_p(\mathbb{Z})$: $(\tau_jw)_t = w_{t+j}$, $j, t \in \mathbb{Z}$. Obviously, both operators preserve norms, $\|E^nw\|_p = \|w\|_p$, $\|\tau_jw\|_p = \|w\|_p$ and, as a result, $\|\tau_jE^n\| \le 1$. From the definition of $T_n$ we have

$$(T_nw)_j = \sum_{t\in\mathbb{Z}}(E^nw)_t c_{t-j} = \sum_{s\in\mathbb{Z}}(E^nw)_{j+s}c_s = \sum_{s\in\mathbb{Z}}(\tau_sE^nw)_j c_s = \left(\sum_{s\in\mathbb{Z}}c_s\tau_sE^nw\right)_j.$$

Since this is true for all $j\in\mathbb{Z}$, we have proved the representation $T_n = \sum_{s\in\mathbb{Z}}c_s\tau_sE^n$, which implies

$$\|T_n\| \le \sum_{s\in\mathbb{Z}}|c_s|\,\|\tau_sE^n\| \le a_c.$$

Now Eq. (2.13) follows from

$$\|T_nw\|_p = (\|T_n^+w\|_p^p + \|T_n^0w\|_p^p + \|T_n^-w\|_p^p)^{1/p}. \qquad \blacksquare$$
2.3.3 Matrix Representation of the Trinity

Correct me if I am wrong, but using matrix representations seems to be inevitable when studying the further properties of the trinity members contained in Theorem 2.4.9. These members can be identified with the matrices

$$T_n^- = \begin{pmatrix} & \vdots & \\ c_3 & c_4 & \ldots & c_{n+2} \\ c_2 & c_3 & \ldots & c_{n+1} \\ c_1 & c_2 & \ldots & c_n \end{pmatrix}, \quad T_n^0 = \begin{pmatrix} c_0 & c_1 & \ldots & c_{n-1} \\ c_{-1} & c_0 & \ldots & c_{n-2} \\ \vdots & & \ddots & \vdots \\ c_{1-n} & c_{2-n} & \ldots & c_0 \end{pmatrix}, \quad T_n^+ = \begin{pmatrix} c_{-n} & c_{1-n} & \ldots & c_{-1} \\ c_{-1-n} & c_{-n} & \ldots & c_{-2} \\ c_{-2-n} & c_{-1-n} & \ldots & c_{-3} \\ & \vdots & \end{pmatrix}.$$

All of these matrices have $n$ columns; $T_n^-$ has an infinite number of rows stretching upward, and $T_n^+$ has an infinite number of rows stretching downward. Their structure suggests using diagonal matrices for the analysis. Let $I_n$ and $0_n$ denote the $n\times n$ identity and null matrices. Then the matrices

$$A_0^0 = I_n; \quad A_k^0 = \begin{pmatrix} 0_{(n-k)\times k} & I_{n-k} \\ 0_k & 0_{k\times(n-k)} \end{pmatrix}, \quad k = 1,\ldots,n-1; \quad A_k^0 = \begin{pmatrix} 0_{|k|\times(n-|k|)} & 0_{|k|} \\ I_{n-|k|} & 0_{(n-|k|)\times|k|} \end{pmatrix}, \quad k = -1,\ldots,-n+1, \tag{2.14}$$

have the elements of their respective diagonals (main, sub or super) equal to 1 and all others equal to 0. The norms of all these operators do not exceed 1:

$$\|A_k^0z\|_p = \|(z_{k+1},\ldots,z_n,0,\ldots,0)'\|_p \le \|z\|_p, \quad k = 0,\ldots,n-1, \tag{2.15}$$

$$\|A_k^0z\|_p = \|(0,\ldots,0,z_1,\ldots,z_{n-|k|})'\|_p \le \|z\|_p, \quad k = -1,\ldots,-n+1. \tag{2.16}$$

From the matrix expression for $T_n^0$ it is directly seen that

$$T_n^0 = \sum_{k=-n+1}^{n-1} c_kA_k^0. \tag{2.17}$$
To obtain a similar representation for $T_n^-$ put

$$A_k^- = \begin{pmatrix} 0_{\infty\times k} & 0_{\infty\times(n-k)} \\ I_k & 0_{k\times(n-k)} \end{pmatrix}, \quad 1\le k\le n; \qquad A_k^- = \begin{pmatrix} 0_{\infty\times n} \\ I_n \\ 0_{(k-n)\times n} \end{pmatrix}, \quad k > n.$$

Then

$$\|A_k^-z\|_p = \|(\ldots,0,z_1,\ldots,z_k)'\|_p \le \|z\|_p, \quad k\le n; \tag{2.18}$$

$$\|A_k^-z\|_p = \|(\ldots,0,z_1,\ldots,z_n,0,\ldots,0)'\|_p \le \|z\|_p, \quad k>n; \tag{2.19}$$

$$T_n^- = \sum_{k=1}^{\infty} c_kA_k^-.$$

Following the same track, let

$$A_k^+ = \begin{pmatrix} 0_{k\times(n-k)} & I_k \\ 0_{\infty\times(n-k)} & 0_{\infty\times k} \end{pmatrix}, \quad 1\le k\le n; \qquad A_k^+ = \begin{pmatrix} 0_{(k-n)\times n} \\ I_n \\ 0_{\infty\times n} \end{pmatrix}, \quad k>n.$$

Quite similarly to what we have for $T_n^-$, now we have

$$T_n^+ = \sum_{k=1}^{\infty} c_{-k}A_k^+, \quad \|A_k^+z\|_p \le \|z\|_p, \quad k\ge 1. \tag{2.20}$$

By the way, the representations and bounds we have obtained allow us to improve upon Eq. (2.13):

$$\|T_n^0\| \le \sum_{k=-n+1}^{n-1}|c_k|, \quad \|T_n^-\| \le \sum_{k=1}^{\infty}|c_k|, \quad \|T_n^+\| \le \sum_{k=1}^{\infty}|c_{-k}|.$$
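The decomposition (2.17) of the central (Toeplitz) member is easy to verify (my own sketch; the coefficients $c_k = 0.6^{|k|}$ and $n = 5$ are arbitrary choices, and `np.eye(n, n, k)` supplies exactly the one-diagonal matrices $A_k^0$).

```python
import numpy as np

def A0(k, n):
    # A_k^0 from (2.14): ones on the k-th diagonal (super for k > 0, sub for k < 0).
    return np.eye(n, n, k)

n = 5
c = {k: 0.6 ** abs(k) for k in range(-n + 1, n)}

# T_n^0 built entrywise: (T_n^0)_{j,t} = c_{t-j}.
T0_direct = np.array([[c[t - j] for t in range(1, n + 1)] for j in range(1, n + 1)])

# Decomposition (2.17): T_n^0 = sum_k c_k A_k^0.
T0_sum = sum(c[k] * A0(k, n) for k in range(-n + 1, n))
assert np.allclose(T0_direct, T0_sum)

# Improved bound: the operator norm does not exceed sum |c_k|.
assert np.linalg.norm(T0_direct, 2) <= sum(abs(v) for v in c.values()) + 1e-9
print("T_n^0 decomposition verified")
```

Since each $A_k^0$ has unit norm, the triangle inequality gives the bound $\|T_n^0\| \le \sum_k |c_k|$ checked in the last assertion.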
2.4 CONVERGENCE OF THE TRINITY ON Lp-GENERATED SEQUENCES

2.4.1 Some Estimates for Images of Functions from Lp

Functions from $L_p$ have two types of properties: magnitude properties, characterized by integrals $\int_A|F(x)|^p\,dx$ over measurable sets, and continuity properties, embodied mainly in the continuity modulus $\omega_p(F,\delta)$. In this terminology, functions from $l_p$ have only magnitude properties and no continuity ones. We should expect discretizations $\delta_{np}F$ of elements of $L_p$ to be better in some way than general elements of $l_p$. Here we obtain several estimates that confirm this surmise.

Lemma. If $F \in L_p$, $p < \infty$, then $f_n = \delta_{np}F$ satisfies

(i) $\|A_k^0f_n - f_n\|_p \le (\omega_p^p(F, |k|/n) + \|F\|_{p,(0,|k|/n)}^p)^{1/p}$, $k = -1,\ldots,-n+1$,
(ii) $\|A_k^0f_n - f_n\|_p \le (\omega_p^p(F, k/n) + \|F\|_{p,(1-k/n,1)}^p)^{1/p}$, $k = 0, 1,\ldots,n-1$,
(iii) $\|A_k^0f_n\|_p \le \|F\|_p$, $|k| \le n-1$,
(iv) $\|A_k^-f_n\|_p \le \|F\|_{p,(0,k/n)}$, $k \le n$,
(v) $\|A_k^-f_n\|_p \le \|F\|_p$ for all $k$,
(vi) $\|A_k^+f_n\|_p \le \|F\|_{p,(1-k/n,1)}$, $k \le n$,
(vii) $\|A_k^+f_n\|_p \le \|F\|_p$ for all $k$.
Proof. (i) By Eq. (2.16), $A_k^0f_n = (0,\ldots,0,f_{n,1},\ldots,f_{n,n-|k|})'$ ($|k|$ zeros), so

$$\|A_k^0f_n - f_n\|_p = \left(\sum_{t=1}^{|k|}|f_{nt}|^p + \sum_{t=|k|+1}^n |f_{n,t-|k|} - f_{nt}|^p\right)^{1/p}. \tag{2.21}$$

For $t \le |k|$ we use the bound on $|(\delta_{np}F)_t|$ from Lemma 2.1.3:

$$\sum_{t=1}^{|k|}|f_{nt}|^p \le \sum_{t=1}^{|k|}\int_{i_t}|F(x)|^p\,dx = \int_0^{|k|/n}|F(x)|^p\,dx. \tag{2.22}$$

For $t > |k|$

$$|f_{n,t-|k|} - f_{nt}| = n^{1/q}\left|\int_{i_{t-|k|}}F(x)\,dx - \int_{i_t}F(x)\,dx\right| = n^{1/q}\left|\int_{i_{t-|k|}}F(x)\,dx - \int_{i_{t-|k|}}F(y+|k|/n)\,dy\right|.$$

Apply the change $x = y + |k|/n$ to map $i_t$ onto $i_{t-|k|}$:

$$|f_{n,t-|k|} - f_{nt}| = n^{1/q}\left|\int_{i_{t-|k|}}[F(x) - (\tau_{|k|/n}F)(x)]\,dx\right| \le \left(\int_{i_{t-|k|}}|F(x) - (\tau_{|k|/n}F)(x)|^p\,dx\right)^{1/p}. \tag{2.23}$$
Summarizing,

$$\|A_k^0f_n - f_n\|_p \le \left(\int_0^{|k|/n}|F|^p\,dx + \sum_{t=|k|+1}^n\int_{i_{t-|k|}}|F - \tau_{|k|/n}F|^p\,dx\right)^{1/p} = (\|F\|_{p,(0,|k|/n)}^p + \|F - \tau_{|k|/n}F\|_{p,(0,1-|k|/n)}^p)^{1/p} \le (\|F\|_{p,(0,|k|/n)}^p + \omega_p^p(F,|k|/n))^{1/p}. \tag{2.24}$$

The final inequality is by the definition of the continuity modulus.

(ii) For $k = 0$ both sides of the inequality are null. Let $k \ge 1$. By Eq. (2.15), $A_k^0f_n = (f_{n,k+1},\ldots,f_{n,n},0,\ldots,0)'$ ($k$ zeros), so instead of Eq. (2.21) we have

$$\|A_k^0f_n - f_n\|_p = \left(\sum_{t=k+1}^n|f_{n,t} - f_{n,t-k}|^p + \sum_{t=n-k+1}^n|f_{nt}|^p\right)^{1/p}.$$

The place of Eq. (2.22) is taken by

$$\sum_{t=n-k+1}^n|f_{nt}|^p \le \sum_{t=n-k+1}^n\int_{i_t}|F(x)|^p\,dx = \int_{1-k/n}^1|F(x)|^p\,dx.$$

Bound (2.23) is still applicable in the present situation. Equation (2.24) follows with $\|F\|_{p,(1-k/n,1)}$ in place of $\|F\|_{p,(0,|k|/n)}$.

Item (iii) follows from Eqs. (2.15)–(2.16) and Lemma 2.1.3. Item (iv) is a consequence of Eq. (2.22):

$$\|A_k^-f_n\|_p = \left(\sum_{t=1}^k|f_{nt}|^p\right)^{1/p} \le \|F\|_{p,(0,k/n)}, \quad k\le n.$$

Item (v) obtains from Eq. (2.18) and Lemma 2.1.3. The proofs of (vi) and (vii) mimic those of (iv) and (v). ∎
2.4.2 The Doubling Property of the Continuity Modulus

Lemma. $\omega_p(F, 2\delta) \le 2\omega_p(F, \delta)$.

Proof. Let $|y| \le 2\delta$. By the triangle inequality

$$\|F - \tau_yF\|_{p,\Omega_y} \le \|F - \tau_{y/2}F\|_{p,\Omega_y} + \|\tau_{y/2}F - \tau_yF\|_{p,\Omega_y} = \|F - \tau_{y/2}F\|_{p,\Omega_y} + \|F - \tau_{y/2}F\|_{p,(\Omega_y+y/2)}.$$

Here the end term results from the change $x + y/2 = t$. As it happens,

$$\Omega_y = (\max\{0,-y\}, \min\{1, 1-y\}) \subseteq \Omega_{y/2},$$

$$\Omega_y + y/2 = (\max\{y/2, -y/2\}, \min\{1+y/2, 1-y/2\}) \subseteq \Omega_{y/2}.$$

Hence, increasing the domains in the above inequality and applying sup gives

$$\omega_p(F, 2\delta) = \sup_{|y|\le 2\delta}\|F - \tau_yF\|_{p,\Omega_y} \le 2\sup_{|y/2|\le\delta}\|F - \tau_{y/2}F\|_{p,\Omega_{y/2}} = 2\omega_p(F, \delta). \qquad \blacksquare$$
2.4.3 The Continuity Modulus is Uniformly Continuous

Lemma. For $F \in L_p$, $p < \infty$, the continuity modulus $\omega_p(F,\delta)$ is a uniformly continuous function of $\delta > 0$.

Proof. Adapted from Zhuk and Natanson (2001).

Step 1. Let us prove that $\delta_1 < \delta_2$ implies

$$\omega_p(F,\delta_2) - \omega_p(F,\delta_1) \le \omega_p(F, \delta_2 - \delta_1). \tag{2.25}$$

By definition, for any $\varepsilon > 0$ there exists $y$, $|y| \le \delta_2$, such that

$$\omega_p(F,\delta_2) - \varepsilon \le \|F - \tau_yF\|_{p,\Omega_y}. \tag{2.26}$$

$|y|/\delta_2 \le 1$ implies $|y|\delta_1/\delta_2 \le \delta_1$, so for $h = y\delta_1/\delta_2$ by definition

$$\|F - \tau_hF\|_{p,\Omega_h} \le \omega_p(F,\delta_1). \tag{2.27}$$

Since $\delta_1/\delta_2 < 1$ by assumption, there is an inclusion

$$\Omega_h = \Omega_{y\delta_1/\delta_2} = (\max\{0, -y\delta_1/\delta_2\}, \min\{1, 1 - y\delta_1/\delta_2\}) \supseteq \Omega_y.$$

This, together with Eq. (2.27), implies

$$\omega_p(F,\delta_1) \ge \|F - \tau_hF\|_{p,\Omega_y}. \tag{2.28}$$

Adding Eqs. (2.26) and (2.28) yields

$$\omega_p(F,\delta_2) - \omega_p(F,\delta_1) - \varepsilon \le \|F - \tau_yF\|_{p,\Omega_y} - \|F - \tau_hF\|_{p,\Omega_y} \le \|\tau_hF - \tau_yF\|_{p,\Omega_y}, \tag{2.29}$$

where the final step is by the triangle inequality. Now consider

$$I = \|\tau_hF - \tau_yF\|_{p,\Omega_y}^p = \int_{\Omega_y}\left|F\!\left(x + \frac{y\delta_1}{\delta_2}\right) - F(x+y)\right|^p dx.$$

Changing $x + y\delta_1/\delta_2 = z$ and denoting $u = y(1 - \delta_1/\delta_2)$ and $\widetilde\Omega = \Omega_y + y\delta_1/\delta_2$, we have

$$I = \int_{\widetilde\Omega}|F(z) - F(z + y(1 - \delta_1/\delta_2))|^p\,dz = \|F - \tau_uF\|_{p,\widetilde\Omega}^p.$$

$\widetilde\Omega$ is a subset of $\Omega_u$:

$$\widetilde\Omega = \left(\max\left\{\frac{y\delta_1}{\delta_2}, -y\left(1 - \frac{\delta_1}{\delta_2}\right)\right\}, \min\left\{1 + \frac{y\delta_1}{\delta_2}, 1 - y\left(1 - \frac{\delta_1}{\delta_2}\right)\right\}\right) = \left(\max\left\{\frac{y\delta_1}{\delta_2}, -u\right\}, \min\left\{1 + \frac{y\delta_1}{\delta_2}, 1 - u\right\}\right) \subseteq (\max\{0, -u\}, \min\{1, 1-u\}) = \Omega_u.$$

Now Eq. (2.29), the definition and the monotonicity of the continuity modulus give

$$\omega_p(F,\delta_2) - \omega_p(F,\delta_1) - \varepsilon \le \|F - \tau_uF\|_{p,\Omega_u} \le \omega_p(F,|u|) \le \omega_p(F, \delta_2(1 - \delta_1/\delta_2)) = \omega_p(F, \delta_2 - \delta_1).$$

Since $\varepsilon$ is arbitrarily close to zero, Eq. (2.25) follows.

Step 2. By Lemma 2.1.5 the continuity modulus of $F \in L_p$, $p < \infty$, vanishes at zero. Hence, the right-hand side of Eq. (2.25) can be made arbitrarily small by choosing $\delta_2 - \delta_1$ small, regardless of where $\delta_1$ is. The left side is nonnegative by monotonicity. Thus, the continuity modulus is, indeed, uniformly continuous. ∎
2.4.4 Major

For $F \in L_p$, $p < \infty$, put

$$m(\delta) = m(F, p, \delta) = \max\{\omega_p(F,\delta),\ \|F\|_{p,(0,\delta)},\ \|F\|_{p,(1-\delta,1)}\}, \quad \delta\in(0,1].$$

Since $m$ appears as a majorant in certain estimates, I call it just a major (luckily, the majors from matrix algebra do not play any role in this book).

Lemma. Let $F \in L_p$, $p < \infty$. $m$ is continuous on $(0,1]$ and vanishes at zero:

$$\lim_{\delta\to 0}m(\delta) = 0. \tag{2.30}$$

If $\|F\|_p \ne 0$, then $m(\delta)$ is positive for positive $\delta$.

Proof. Continuity of the major follows from continuity of $\omega_p(F,\delta)$ (Lemma 2.4.3) and absolute continuity of the Lebesgue integral (both norms $\|F\|_{p,(0,\delta)}$ and $\|F\|_{p,(1-\delta,1)}$ are continuous in $\delta$). Equation (2.30) is a consequence of Lemma 2.1.5 and absolute continuity of the Lebesgue integral.

Suppose that $m(\delta) = 0$ for some $\delta\in(0,1]$. If $\delta \ge 1/2$, then the intervals $(0,\delta)$ and $(1-\delta,1)$ cover $(0,1)$ and $F = 0$ a.e. on $(0,1)$, which contradicts the assumption $\|F\|_p \ne 0$. Let us assume $\delta < 1/2$. Then

$$F = 0 \text{ a.e. on } (0,\delta). \tag{2.31}$$

$\omega_p(F,\delta) = 0$ implies, in particular, $\int_0^{1-\delta}|F(x) - F(x+\delta)|^p\,dx = 0$, which, because of Eq. (2.31), gives $\int_0^{\delta}|F(x+\delta)|^p\,dx = 0$. That is, the vanishing behavior of $F$ on $(0,\delta)$ extends to $(\delta, 2\delta)$. By the doubling property (Lemma 2.4.2) we have $\omega_p(F, 2\delta) = 0$. Hence, the above procedure of propagating the equality of $F$ to zero can be repeated a finite number of times to cover the whole interval $(0,1)$ (actually, covering $(0, 1-\delta)$ is enough). The conclusion again contradicts $\|F\|_p \ne 0$. Thus, the assumption $m(\delta) = 0$ is wrong and $m$ is positive on $(0,1]$. ∎
2.4.5 Inverting the Major

Denote

$$\zeta(\varepsilon) = \zeta(F, p, \varepsilon) = \sup\{\delta \in (0,1] : m(\delta) \le \varepsilon\},\quad \varepsilon \in (0,1],$$

the inverse of the major. This is a usual way to obtain a generalized inverse of a function. If, for example, $m(\delta) = \sqrt{\delta}$, then $\zeta(\varepsilon) = \varepsilon^2$. The definition works when the theorem on inverses of continuous strictly monotone functions does not. If the graph of the major has a flat section at $\varepsilon_0$, $m(\delta) = \varepsilon_0$ for $\delta_1 \le \delta \le \delta_2$, the definition supplies the right end of that section: $\zeta(\varepsilon_0) = \delta_2$.
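The generalized inverse defined here is easy to emulate numerically. The sketch below is purely illustrative (the function `generalized_inverse` and its grid search are assumptions of this example, not the book's construction); it reproduces the worked case $m(\delta) = \sqrt{\delta}$, $\zeta(\varepsilon) = \varepsilon^2$.

```python
import numpy as np

def generalized_inverse(m, eps, grid=None):
    """Approximate zeta(eps) = sup{delta in (0,1]: m(delta) <= eps}
    by a grid search; m must accept a NumPy array of delta values."""
    if grid is None:
        grid = np.linspace(1e-6, 1.0, 100001)
    ok = grid[m(grid) <= eps]
    return ok.max() if ok.size else 0.0

# For m(delta) = sqrt(delta) the exact inverse is zeta(eps) = eps**2.
assert abs(generalized_inverse(np.sqrt, 0.5) - 0.25) < 1e-4
```

On a flat section of $m$ the grid search returns the right end of the section, matching the convention $\zeta(\varepsilon_0) = \delta_2$ stated above.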
Lemma. Let $F \in L_p$, $p < \infty$. Then $\zeta$ is positive on $(0,1]$. For sufficiently small $\varepsilon$, $\zeta(\varepsilon)$ inverts $m$:

$$m(\zeta(\varepsilon)) = \varepsilon. \tag{2.32}$$

It also vanishes at 0 if $\|F\|_p \ne 0$:

$$\lim_{\varepsilon\to 0} \zeta(\varepsilon) = 0. \tag{2.33}$$

Proof. By Eq. (2.30), for any $\varepsilon \in (0,1]$ there is $\delta \in (0,1]$ such that $m(\delta) \le \varepsilon$. Then $\zeta(\varepsilon) \ge \delta$ and $\zeta$ is positive on $(0,1]$. Let $\{\delta_n\}$ be a sequence such that $\delta_n \to \zeta(\varepsilon)$ and $m(\delta_n) \le \varepsilon$. By continuity of $m$, then $m(\zeta(\varepsilon)) \le \varepsilon$. For sufficiently small $\varepsilon$ a strict inequality here is impossible because if $m(\zeta(\varepsilon)) < \varepsilon$, then by continuity of $m$ we would have $m(\delta) \le \varepsilon$ for some $\delta \in (\zeta(\varepsilon), 1]$, which is at variance with the definition of $\zeta(\varepsilon)$. Equation (2.33) follows from Eq. (2.30) because in the case $\|F\|_p \ne 0$, by Lemma 2.4.4 $\delta = 0$ is the only point where $m$ vanishes. ∎
2.4.6 Zero-Tail and Nonzero-Tail Sequences of Weights

In principle, the bounds from Lemma 2.4.1 are sufficient to prove convergence of the trinity if we are willing to abuse the $\varepsilon$-$\delta$ language. The definitions in Sections 2.4.4 and 2.4.5, this section and Section 2.4.7 are aimed at making things more beautiful by using just one $\varepsilon$ in the main statement. Everywhere it is assumed that

$$a_c \equiv \sum_{j\in\mathbb{Z}} |c_j| < \infty,\quad F \in L_p,\quad \varepsilon \in (0,1].$$

The objective is to study convergence of the trinity on images $\delta_{np}F$. This convergence is trivial if $a_c = 0$ and/or $\|F\|_p = 0$. In Sections 2.4.4 and 2.4.5 we saw the implications of $\|F\|_p \ne 0$. Here we combine them with those of the restriction $a_c \ne 0$.

The sequence $c = \{c_j : j \in \mathbb{Z}\}$ with $a_c \ne 0$ is called zero-tail if there exists a natural $n$ such that $\sum_{|k|\ge n} |c_k| = 0$. By rejecting this condition we obtain nonzero-tail sequences, for which $\sum_{|k|\ge n} |c_k| > 0$ for all $n > 0$. For a zero-tail nontrivial sequence the number

$$k_c = \min\left\{ n \in \mathbb{N} : \sum_{|k|\ge n} |c_k| = 0 \right\}$$

is defined.
2.4.7 Regulator Definition

The regulator $r : (0,1] \to \mathbb{N}$ is used in a statement of the type: for any $\varepsilon \in (0,1]$ there exists a natural number $r(\varepsilon)$ such that a certain quantity (depending on $n$) does not exceed $\varepsilon$ for $n \ge r(\varepsilon)$. In the trivial case $a_c\|F\|_p = 0$ we put formally $r(\varepsilon) = 1$ for all $\varepsilon$.
Consider the nontrivial case $a_c\|F\|_p \ne 0$.

1. If $c$ is zero-tail, then $k_c$ is a natural number and the set $\{n \in \mathbb{N} : \zeta(\varepsilon)n \ge k_c\}$ is nonempty because $\zeta(\varepsilon) > 0$ by Lemma 2.4.5. By definition

$$r(\varepsilon) = r(c, F, p, \varepsilon) = \min\{n \in \mathbb{N} : \zeta(\varepsilon)n \ge k_c\}.$$

2. If $c$ is not zero-tail, then by summability of $c$ for all sufficiently large $n$ the inequality $\sum_{|k|>\zeta(\varepsilon)n} |c_k| \le \varepsilon$ is true. In this case the regulator is defined by

$$r(\varepsilon) = r(c, F, p, \varepsilon) = \min\left\{ n \in \mathbb{N} : \sum_{|k|>\zeta(\varepsilon)n} |c_k| \le \varepsilon \right\}.$$

In both cases directly from the definition we see that

$$\sum_{|k|>\zeta(\varepsilon)r(\varepsilon)} |c_k| \le \varepsilon. \tag{2.34}$$

From this property and Eq. (2.33) we also see that

$$\lim_{\varepsilon\to 0} r(\varepsilon) = \infty. \tag{2.35}$$
2.4.8 Cutter

Let $a_c\|F\|_p \ne 0$. Put

$$c(n,\varepsilon) = [\zeta(\varepsilon)n],\quad n \in \mathbb{N},\ n \ge r(\varepsilon).$$

I call $c(n,\varepsilon)$ a cutter because it is used to cut sums and integrals.

Lemma. Suppose $a_c\|F\|_p \ne 0$, $\varepsilon \in (0,1]$ is sufficiently small and $n \ge r(\varepsilon)$.

(i) If $|k| \le c(n,\varepsilon)$, then $|k|/n \le \zeta(\varepsilon)$.
(ii) $\sum_{|k|>c(n,\varepsilon)} |c_k| \le \varepsilon$.
(iii) $c(n,\varepsilon) \le n - 2$.

Proof. From the definition of an integer part

$$c(n,\varepsilon) \le \zeta(\varepsilon)n < c(n,\varepsilon) + 1. \tag{2.36}$$

By the condition of the lemma $\zeta(\varepsilon)n \ge \zeta(\varepsilon)r(\varepsilon)$, which, together with the right inequality in Eq. (2.36), leads to

$$c(n,\varepsilon) + 1 > \zeta(\varepsilon)r(\varepsilon). \tag{2.37}$$

Part (i) follows from the left side of Eq. (2.36). Part (ii) results from Eqs. (2.34) and (2.37):

$$\sum_{|k|>c(n,\varepsilon)} |c_k| = \sum_{|k|\ge c(n,\varepsilon)+1} |c_k| \le \sum_{|k|>\zeta(\varepsilon)r(\varepsilon)} |c_k| \le \varepsilon.$$
(iii) By Eqs. (2.33) and (2.35) for small $\varepsilon$ the expression $2/(1-\zeta(\varepsilon))$ is close to 2 and $r(\varepsilon)$ is large, so for such $\varepsilon$ we have $2/(1-\zeta(\varepsilon)) \le r(\varepsilon) \le n$. Hence, $2 \le n - n\zeta(\varepsilon)$ and $n\zeta(\varepsilon) \le n - 2$. Combining this with the left inequality in Eq. (2.36) proves the statement. ∎
2.4.9 Convergence of the Trinity on Lp-Generated Sequences

In addition to the previous notation $a_c = \sum_{j\in\mathbb{Z}} |c_j|$ we need

$$b_c = \sum_{j\in\mathbb{Z}} c_j.$$

Theorem. (Mynbaev, 2001) If $a_c < \infty$, $F \in L_p$, $1 \le p < \infty$, then for all sufficiently small $\varepsilon$

$$\max\{\|(T_n^0 - b_c)\delta_{np}F\|_p,\ \|T_n^-\delta_{np}F\|_p,\ \|T_n^+\delta_{np}F\|_p\} \le (2^{1/p}a_c + 2\|F\|_p)\varepsilon \quad\text{for all } n \ge r(\varepsilon). \tag{2.38}$$

Proof. In the trivial case $a_c\|F\|_p = 0$ the left side of Eq. (2.38) is zero and the inequality is true for all $n \ge 1 = r(\varepsilon)$, so we can assume $a_c\|F\|_p \ne 0$. Denote $f_n = \delta_{np}F$. The cutter determines what kind of estimate to use. For $0 > k \ge -c(n,\varepsilon)$ $(\ge -(n-2))$ we use Lemma 2.4.1(i):

$$\|A_k^0 f_n - f_n\|_p \le \bigl(v_p^p(F,|k|/n) + \|F\|_{p,(0,|k|/n)}^p\bigr)^{1/p}$$

[by Lemma 2.4.8(i) and monotonicity]

$$\le \bigl(v_p^p(F,\zeta(\varepsilon)) + \|F\|_{p,(0,\zeta(\varepsilon))}^p\bigr)^{1/p}$$

[applying the major and Eq. (2.32)]

$$\le 2^{1/p} m(\zeta(\varepsilon)) = 2^{1/p}\varepsilon. \tag{2.39}$$

Similarly, using item (ii) of Lemma 2.4.1, for $0 \le k \le c(n,\varepsilon)$ we have

$$\|A_k^0 f_n - f_n\|_p \le \bigl(v_p^p(F,k/n) + \|F\|_{p,(1-k/n,1)}^p\bigr)^{1/p} \le 2^{1/p}\varepsilon. \tag{2.40}$$
After subtracting $b_c f_n$ from Eq. (2.17) we can sort the terms as in

$$(T_n^0 - b_c)f_n = \sum_{k=-n+1}^{n-1} c_k A_k^0 f_n - \sum_{k\in\mathbb{Z}} c_k f_n = \underbrace{\sum_{k=0}^{c(n,\varepsilon)} c_k[A_k^0 f_n - f_n]}_{S_1} + \underbrace{\sum_{k=-c(n,\varepsilon)}^{-1} c_k[A_k^0 f_n - f_n]}_{S_2} + \underbrace{\sum_{|k|=c(n,\varepsilon)+1}^{n-1} c_k A_k^0 f_n}_{S_3} - \underbrace{\sum_{|k|>c(n,\varepsilon)} c_k f_n}_{S_4}.$$

Now estimate $S_1$ using Eq. (2.40) and $S_2$ using Eq. (2.39); for $S_3$ apply Lemma 2.4.1(iii) [the sum is not empty by Lemma 2.4.8(iii)], and for $S_4$ apply Lemmas 2.1.3 and 2.4.8(ii). The resulting bound is

$$\|(T_n^0 - b_c)f_n\|_p \le \sum_{k=0}^{c(n,\varepsilon)} |c_k|\,2^{1/p}\varepsilon + \sum_{k=-c(n,\varepsilon)}^{-1} |c_k|\,2^{1/p}\varepsilon + \sum_{|k|=c(n,\varepsilon)+1}^{n-1} |c_k|\,\|F\|_p + \sum_{|k|>c(n,\varepsilon)} |c_k|\,\|F\|_p$$
$$\le 2^{1/p}\varepsilon \sum_{k\in\mathbb{Z}} |c_k| + 2\|F\|_p \sum_{|k|>c(n,\varepsilon)} |c_k| \le (2^{1/p}a_c + 2\|F\|_p)\varepsilon. \tag{2.41}$$
Applying the cutter in representation (2.19) we get

$$\|T_n^- f_n\|_p \le \underbrace{\sum_{k=1}^{c(n,\varepsilon)} |c_{-k}|\,\|A_{-k}^- f_n\|_p}_{S_1} + \underbrace{\sum_{k>c(n,\varepsilon)} |c_{-k}|\,\|A_{-k}^- f_n\|_p}_{S_2}.$$

For $S_1$ use Lemma 2.4.1(iv) and for $S_2$ Lemma 2.4.1(v). Next, apply parts (i) and (ii) of Lemma 2.4.8:

$$\|T_n^- f_n\|_p \le \sum_{k=1}^{c(n,\varepsilon)} |c_{-k}|\,\|F\|_{p,(0,k/n)} + \sum_{k>c(n,\varepsilon)} |c_{-k}|\,\|F\|_p \le a_c\|F\|_{p,(0,\zeta(\varepsilon))} + \varepsilon\|F\|_p \le a_c\,m(\zeta(\varepsilon)) + \varepsilon\|F\|_p \le \varepsilon(a_c + \|F\|_p). \tag{2.42}$$

The final line follows from property (2.32) of the major.
Similarly, for $T_n^+$ we use representation (2.20), Lemmas 2.4.1(vi) and 2.4.8(i) for $k \le c(n,\varepsilon)$ and Lemmas 2.4.1(vii) and 2.4.8(ii) for $k > c(n,\varepsilon)$. The result is

$$\|T_n^+ f_n\|_p \le \varepsilon(a_c + \|F\|_p). \tag{2.43}$$

Equations (2.41), (2.42) and (2.43) prove the theorem. ∎
2.4.10 Discussion

Components of $\delta_{np}F$ vanish in the limit [Lemma 2.1.3(iii)], so the terms $T_n^0\delta_{np}F$ and $b_c\delta_{np}F$ in Eq. (2.38) do not converge separately in $l_p$. To understand why their difference converges to zero, it is useful to consider the operator

$$M_n F = D_{np}T_n^0\delta_{np}F = D_{np}\left[\left(\sum_{t=1}^n (\delta_{np}F)_t c_{t-j}\right)_{j=1}^n\right] = n^{1/p}\sum_{j=1}^n\sum_{t=1}^n (\delta_{np}F)_t c_{t-j}\,1_{i_j} = n^{1/p+1/q}\sum_{j=1}^n\sum_{t=1}^n \left(\int_{i_t} F(x)\,dx\right) c_{t-j}\,1_{i_j},\quad F \in L_p.$$

Thus, the restriction of $M_n F$ to $i_j$ equals

$$M_n F|_{i_j} = n\sum_{t=1}^n \left(\int_{i_t} F(x)\,dx\right) c_{t-j}.$$

The averages of $F$ over the covering intervals are multiplied by $c_{t-j}$ and the results are summed to obtain $M_n F|_{i_j}$. As $n\to\infty$, the partition becomes finer and more and more of the numbers $c_{t-j}$ are involved in the sum. The averages tend to values of $F$ at points of $(0,1)$. In the limit every value $F(x)$ is multiplied by the sum $b_c$ of all numbers $c_j$. This intuitive explanation is substantiated by

$$\|M_n F - b_c F\|_p \le \|D_{np}(T_n^0 - b_c)\delta_{np}F\|_p + |b_c|\,\|D_{np}\delta_{np}F - F\|_p = \|(T_n^0 - b_c)\delta_{np}F\|_p + |b_c|\,\|P_n F - F\|_p \to 0,\quad n\to\infty,$$

where we use Lemma 2.1.7, Eq. (2.38) and Lemma 2.2.1. The limit $\lim_{n\to\infty} M_n$ is similar to the multiplier operator $M$ in Fourier analysis defined by

$$(MF)(x) = \sum_{k\in\mathbb{Z}} m_k c_k e^{ikx}$$
if the function $F$ on the unit circumference is decomposed as

$$F(x) = \sum_{k\in\mathbb{Z}} c_k e^{ikx}$$

and $\{m_k\}$ is a given sequence of numbers. $M$ is a composite of three mappings: first $F$ is discretized to obtain its Fourier coefficients $\{c_k\}$, second the Fourier coefficients are multiplied by $m_k$ to get $\{m_k c_k\}$ and, finally, the latter numbers are used as Fourier coefficients in the new series $MF$.
2.5 PROPERTIES OF Lp-APPROXIMABLE SEQUENCES

2.5.1 Definitions

Let $1 \le p \le \infty$ and let the sequence of vectors $\{f_n\}$ be such that $f_n \in \mathbb{R}^n$ for all $n \in \mathbb{N}$. $\{f_n\}$ is called $L_p$-approximable if there exists a function $F \in L_p$ such that

$$\|f_n - \delta_{np}F\|_p \to 0,\quad n\to\infty. \tag{2.44}$$

If such is the case, the sequence $\{f_n\}$ is said to be $L_p$-close to $F$. To make this definition work in the case $p = \infty$, it is necessary to assume additionally that $F$ is continuous on $[0,1]$, and I prefer to mention this condition each time rather than include it in the definition.

Lemma. If $p < \infty$, then Eq. (2.44) is equivalent to

$$\|D_{np}f_n - F\|_p \to 0,\quad n\to\infty. \tag{2.45}$$

Proof. If Eq. (2.44) is true, then by Lemmas 2.1.7 and 2.2.1

$$\|D_{np}f_n - F\|_p \le \|D_{np}f_n - D_{np}\delta_{np}F\|_p + \|D_{np}\delta_{np}F - F\|_p = \|D_{np}(f_n - \delta_{np}F)\|_p + \|P_n F - F\|_p \to 0,\quad n\to\infty.$$

Conversely, the same lemmas allow us to derive from Eq. (2.45) that

$$\|f_n - \delta_{np}F\|_p = \|D_{np}(f_n - \delta_{np}F)\|_p \le \|D_{np}f_n - F\|_p + \|F - P_n F\|_p \to 0,\quad n\to\infty. \qquad ∎$$
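The equivalence in this lemma is easy to see numerically. In the sketch below the discretization $(\delta_{np}F)_t \approx n^{1/q}\int_{i_t}F\,dx$ (approximated by the midpoint rule) and the interpolation $D_{np}f = n^{1/p}\sum_t f_t 1_{i_t}$ are my reading of the operators of this chapter, an assumption of the illustration; the code only checks that $D_{np}\delta_{np}F = P_n F$ is $L_p$-close to $F$.

```python
import numpy as np

def delta_np(F, n, p):
    # Discretization L_p -> l_p: (delta_np F)_t ~ n**(1/q) * integral of F over i_t,
    # with the integral approximated by the midpoint rule (an assumption here).
    q = p / (p - 1)
    mid = (np.arange(n) + 0.5) / n
    return n ** (1 / q) * F(mid) / n

def D_np(f, p, x):
    # Interpolation l_p -> L_p: step function with value n**(1/p) * f_t on i_t.
    n = len(f)
    t = np.minimum((x * n).astype(int), n - 1)
    return n ** (1 / p) * f[t]

p, n = 2.0, 1000
F = lambda s: s
fn = delta_np(F, n, p)                        # L_p-generated sequence
x = np.linspace(0, 1, 10000, endpoint=False)
err = np.mean(np.abs(D_np(fn, p, x) - F(x)) ** p) ** (1 / p)
assert err < 1e-2                             # D_np(delta_np F) = P_n F is close to F
```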
2.5.2 M-Properties The name m-properties is used for those properties of Lp -approximable sequences that stem mainly from magnitude properties of functions from Lp . Accordingly, c-properties reflect mainly the continuity characteristics of elements of Lp . They are more difficult to establish.
Lemma.
Let $\{f_n\}$ be $L_p$-approximable. Then

(i) $\sup_n \|f_n\|_p < \infty$.
(ii) If $p < \infty$, then

$$\lim_{n\to\infty}\max_{1\le t\le n} |f_{nt}| = 0.$$

Proof. (i) By Lemma 2.1.3(ii) $L_p$-approximability implies

$$\|f_n\|_p \le \|f_n - \delta_{np}F\|_p + \|\delta_{np}F\|_p \le \|f_n - \delta_{np}F\|_p + \|F\|_p \le c.$$

Part (ii) follows from Lemma 2.1.3(iii) and

$$\max_{1\le t\le n} |f_{nt}| \le \|f_n - \delta_{np}F\|_p + \max_{1\le t\le n} |(\delta_{np}F)_t|. \qquad ∎$$
2.5.3 Bilinear Forms of Lp-Approximable Sequences

This section starts a series of c-properties.

Theorem. (Mynbaev, 2001) If $1 < p < \infty$, $\{x_n\}$ is $L_p$-close to $X \in L_p$ and $\{y_n\}$ is $L_q$-close to $Y \in L_q$, then

$$\lim_{n\to\infty}\sum_{t=[na]}^{[nb]} x_{nt}y_{nt} = \int_a^b X(s)Y(s)\,ds \quad\text{for all } [a,b] \subseteq [0,1]$$

uniformly with respect to the segments $[a,b]$. Here we put $x_{n0} = y_{n0} = 0$ for the sum at the left to have meaning when $a = 0$.

Proof. Denote $\{a,b\} = \{t \in \mathbb{N} : [na] \le t \le [nb]\}$ and apply Theorem 2.2.3. The locator (Section 2.1.1) makes sure that $a \in i_{[na]+1}$, $b \in i_{[nb]+1}$. In consequence,

$$1_{[a,b]} = 1 \text{ on } \bigcup_{t=[na]+2}^{[nb]} i_t,\qquad 1_{[a,b]} = 0 \text{ on } \left(\bigcup_{t=1}^{[na]} i_t\right) \cup \left(\bigcup_{t=[nb]+2}^{n} i_t\right).$$

Therefore

$$[\delta_{np}(1_{[a,b]}X)]'\,\delta_{nq}(1_{[a,b]}Y) = \sum_{t=1}^n [\delta_{np}(1_{[a,b]}X)]_t[\delta_{nq}(1_{[a,b]}Y)]_t$$
$$= \sum_{t\in\{a,b\}} (\delta_{np}X)_t(\delta_{nq}Y)_t + \sum_{t=[na]+1,\,[nb]+1} [\delta_{np}(1_{[a,b]}X)]_t[\delta_{nq}(1_{[a,b]}Y)]_t - \sum_{t=[na],\,[na]+1} (\delta_{np}X)_t(\delta_{nq}Y)_t. \tag{2.46}$$
By $L_p$ ($L_q$)-approximability and Hölder's inequality

$$\left|\sum_{t\in\{a,b\}} (\delta_{np}X)_t(\delta_{nq}Y)_t - \sum_{t\in\{a,b\}} x_{nt}y_{nt}\right| \le \left|\sum_{t\in\{a,b\}} [(\delta_{np}X)_t - x_{nt}](\delta_{nq}Y)_t\right| + \left|\sum_{t\in\{a,b\}} x_{nt}[(\delta_{nq}Y)_t - y_{nt}]\right|$$
$$\le \|\delta_{np}X - x_n\|_p\,\|\delta_{nq}Y\|_q + \|x_n\|_p\,\|\delta_{nq}Y - y_n\|_q \to 0. \tag{2.47}$$

Here we used Lemmas 2.1.3(ii) and 2.5.2(i). By Lemma 2.1.3(i)

$$\max\{|[\delta_{np}(1_{[a,b]}X)]_t|,\ |(\delta_{np}X)_t|\} \le \max\{\|1_{[a,b]}X\|_{p,i_t},\ \|X\|_{p,i_t}\} = \|X\|_{p,i_t}.$$

A similar bound holds for $Y$. Hence, of the three sums at the right of Eq. (2.46), the last two tend to zero uniformly with respect to $a, b$. By Theorem 2.2.3, Eq. (2.46) and Eq. (2.47) we have uniformly in $a, b$

$$\lim_{n\to\infty}\sum_{t\in\{a,b\}} x_{nt}y_{nt} = \lim_{n\to\infty}\sum_{t\in\{a,b\}} (\delta_{np}X)_t(\delta_{nq}Y)_t = \lim_{n\to\infty} [\delta_{np}(1_{[a,b]}X)]'\,\delta_{nq}(1_{[a,b]}Y) = \int_a^b X(s)Y(s)\,ds. \qquad ∎$$
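As a quick numerical illustration of the theorem (not part of the book's argument), take $L_p$-generated choices $x_n = \delta_{np}X$ and $y_n = \delta_{nq}Y$; the midpoint discretization below is an assumption of this sketch.

```python
import numpy as np

p = 2.0
q = p / (p - 1)
X = lambda s: s          # X in L_p
Y = lambda s: 1 - s      # Y in L_q
a, b = 0.2, 0.7
n = 10000
mid = (np.arange(n) + 0.5) / n
xn = n ** (1 / q) * X(mid) / n            # ~ delta_np X
yn = n ** (1 / p) * Y(mid) / n            # ~ delta_nq Y
t = np.arange(1, n + 1)
mask = (t >= int(n * a)) & (t <= int(n * b))
s = np.sum(xn[mask] * yn[mask])
# exact value: integral of s(1-s) over [a, b]
exact = (b**2 / 2 - b**3 / 3) - (a**2 / 2 - a**3 / 3)
assert abs(s - exact) < 1e-3
```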
2.5.4 The Trinity and Lp-Approximable Sequences

In essence, statements for $L_p$-approximable sequences are obtained from those for $L_p$-generated ones by perturbation. Since the rate of convergence of $f_n - \delta_{np}F$ in Eq. (2.44) is not quantified, in the next theorem it is not possible to specify the rate of convergence of the trinity.

Theorem. (Mynbaev, 2001) If $p < \infty$, $a_c < \infty$ and $\{f_n\}$ is $L_p$-approximable, then

$$\lim_{n\to\infty}\max\{\|(T_n^0 - b_c)f_n\|_p,\ \|T_n^- f_n\|_p,\ \|T_n^+ f_n\|_p\} = 0.$$

Proof. By boundedness of $T_n^0$ in $l_p$ [see Eq. (2.41)], convergence of the trinity on $L_p$-generated sequences (Theorem 2.4.9) and the $L_p$-approximability definition

$$\|(T_n^0 - b_c)f_n\|_p \le \|(T_n^0 - b_c)(f_n - \delta_{np}F)\|_p + \|(T_n^0 - b_c)\delta_{np}F\|_p \le 2a_c\|f_n - \delta_{np}F\|_p + \|(T_n^0 - b_c)\delta_{np}F\|_p \to 0.$$

The rest of the proof utilizes Eqs. (2.42) and (2.43) and is equally simple. ∎
2.5.5 Bilinear Forms and T Operator Combined

Theorem. If $1 < p < \infty$, $a_c < \infty$ and $\{f_n\}$ is $L_p$-close to $F$, $\{g_n\}$ is $L_q$-close to $G$, then

$$\lim_{n\to\infty}\sum_{t\in\mathbb{Z}} (T_n f_n)_t(T_n g_n)_t = b_c^2\int_0^1 F(x)G(x)\,dx.$$

Proof. By Hölder's inequality

$$\left|\sum_{t=1}^n (T_n^0 f_n)_t(T_n^0 g_n)_t - b_c^2\sum_{t=1}^n f_{nt}g_{nt}\right| \le \left|\sum_{t=1}^n ((T_n^0 - b_c)f_n)_t(T_n^0 g_n)_t\right| + |b_c|\left|\sum_{t=1}^n f_{nt}((T_n^0 - b_c)g_n)_t\right|$$
$$\le \|(T_n^0 - b_c)f_n\|_p\,\|T_n^0 g_n\|_q + |b_c|\,\|f_n\|_p\,\|(T_n^0 - b_c)g_n\|_q \to 0. \tag{2.48}$$

The final line obtains by applying uniform boundedness of $\|T_n^0\|$, $\|f_n\|_p$ and $\|g_n\|_q$ and Theorem 2.5.4. Hölder's inequality and Theorem 2.5.4 yield

$$\left|\sum_{t<1} (T_n^- f_n)_t(T_n^- g_n)_t\right| \le \|T_n^- f_n\|_p\,\|T_n^- g_n\|_q \to 0.$$

Here $T_n^-$ can be replaced with $T_n^+$. Therefore Eq. (2.48) implies

$$\sum_{t\in\mathbb{Z}} (T_n f_n)_t(T_n g_n)_t - b_c^2\sum_{t=1}^n f_{nt}g_{nt} \to 0.$$

It remains to recall that by Theorem 2.5.3 $\sum_{t=1}^n f_{nt}g_{nt} \to \int_0^1 FG\,ds$. ∎
2.6 CRITERION OF Lp-APPROXIMABILITY

2.6.1 Statement of Problem

The definition of $L_p$-approximability appeals to the existence of a function $F \in L_p$ for which Eq. (2.44) would be true. Along with the question about what this entails, it is natural to ask when such a function exists. Different answers are possible. Sufficient conditions and counterexamples are considered in Section 2.7. All of them rely on
some external information about the sequence. Here we concentrate on what is called an intrinsic characterization, which should satisfy two conditions:

1. it should be equivalent to (that is, necessary and sufficient for) $L_p$-approximability and
2. it should be expressed in terms of just the sequence itself, without appealing to any other objects.
2.6.2 Continuity Modulus of a Step Function

Here is one of those technical calculations that look arbitrary—and therefore ugly—yet lead to a precise result. Notation (2.14) conceals the fact that the matrices $A_k^0$ depend not only on $k$ but also on $n$. Here it is more convenient to use the well-known fact that if $I_n = A_{-1}^0$ denotes the $n\times n$ matrix with the first subdiagonal filled with unities and all other cells with zeros, then all other matrices $A_k^0$ with negative $k$ are its powers:

$$A_{-k}^0 = (I_n)^k,\quad k = 1,\ldots,n-1.$$

Lemma. For a natural $n$ consider a step function

$$F = \sum_{t=1}^n c_t 1_{i_t}$$

with some real coefficients and denote $c = (c_1,\ldots,c_n)'$. If $p < \infty$ and $\delta < 1$, then

$$v_p(F,\delta) \le (2/n)^{1/p}\left(2\sup_{0<y\le\delta}\|(I_n)^{[yn]}c - c\|_p + \|I_n c - c\|_p\right).$$
Proof. Because of the symmetry

$$\|F - \tau_y F\|_{p,V_y}^p = \int_{\max\{0,-y\}}^{\min\{1,1-y\}} |F(x) - F(x+y)|^p\,dx = \int_{\max\{y,0\}}^{\min\{1+y,1\}} |F(z-y) - F(z)|^p\,dz = \|F - \tau_{-y}F\|_{p,V_{-y}}^p,$$

the sup in the definition of the continuity modulus can be taken over only positive $y$:

$$v_p(F,\delta) = \sup_{0<y\le\delta}\|F - \tau_y F\|_{p,V_y}.$$

The condition $\delta < 1$ is not really a restriction because for $y \ge 1$ the set $V_y$ is empty.
Fix $0 < y \le \delta < 1$ and denote $k = [yn] \le yn < n$. Because of the form of $F$ we have to start with

$$\|F - \tau_y F\|_{p,V_y}^p = \sum_{t=1}^n \int_{i_t\cap V_y} |F - \tau_y F|^p\,dx.$$

Let's look at one term in this sum. Let $x \in i_t\cap V_y$, that is,

$$(t-1)/n \le x < t/n,\quad 0 < x < 1-y.$$

From the definition of $k$

$$k/n \le y < (k+1)/n. \tag{2.49}$$

These inequalities imply $(t+k-1)/n \le x+y < (t+k+1)/n$, or $x+y \in i_{t+k}\cup i_{t+k+1}$. Hence $F(x+y)$ may take only the values $c_{t+k}$ (on $i_{t+k}-y$) and $c_{t+k+1}$ (on $i_{t+k+1}-y$). It follows that

$$\int_{i_t\cap V_y} |F - \tau_y F|^p\,dx = \int_{i_t\cap V_y\cap(i_{t+k}-y)} |c_t - c_{t+k}|^p\,dx + \int_{i_t\cap V_y\cap(i_{t+k+1}-y)} |c_t - c_{t+k+1}|^p\,dx$$
$$= |c_t - c_{t+k}|^p\,\mathrm{mes}(i_t\cap V_y\cap(i_{t+k}-y)) + |c_t - c_{t+k+1}|^p\,\mathrm{mes}(i_t\cap V_y\cap(i_{t+k+1}-y)). \tag{2.50}$$

Obviously, with $V_{y,t,k} = i_t\cap V_y\cap(i_{t+k}-y)$ we have

$$\mathrm{mes}(V_{y,t,k}) \le \begin{cases} \mathrm{mes}(i_{t+k}-y) = 1/n, & \text{if } V_{y,t,k}\ne\emptyset; \\ 0, & \text{if } V_{y,t,k}=\emptyset. \end{cases}$$

To avoid the headache of trying to figure out when $V_{y,t,k}$ is nonempty we replace that condition by a weaker one, that $V_y\cap(i_{t+k}-y)$ is nonempty (the upper bound may only increase). Since $i_{t+k}-y = [(t+k-1)/n - y,\ (t+k)/n - y)$, $V_y = (0,1-y)$ and by Eq. (2.49) $(t+k)/n - y \ge (k+1)/n - y > 0$, the intersection $V_y\cap(i_{t+k}-y)$ is nonempty only if $(t+k-1)/n - y < 1-y$, that is, $t+k \le n$.
Similarly, $\mathrm{mes}(i_t\cap V_y\cap(i_{t+k+1}-y)) \le 1/n$ and we can count only those $t$ that satisfy $t+k+1 \le n$. Summing Eq. (2.50) over the indicated $t$ we get

$$\|F - \tau_y F\|_{p,V_y}^p \le \frac{1}{n}\left(\sum_{t=1}^{n-k}|c_t - c_{t+k}|^p + \sum_{t=1}^{n-k-1}|c_t - c_{t+k+1}|^p\right). \tag{2.51}$$

Consider, for example, the first sum at the right of Eq. (2.51):

$$\sum_{t=1}^{n-k}|c_t - c_{t+k}|^p \le \sum_{j=k+1}^{n}|c_{j-k} - c_j|^p + \sum_{j=1}^{k}|c_j|^p.$$

From Eq. (2.16) we know that $(I_n)^k c = (0,\ldots,0,c_1,\ldots,c_{n-k})'$, so the sum at the right equals $\|(I_n)^k c - c\|_p^p$ and

$$\sum_{t=1}^{n-k}|c_t - c_{t+k}|^p \le \|(I_n)^k c - c\|_p^p;$$

similarly,

$$\sum_{t=1}^{n-k-1}|c_t - c_{t+k+1}|^p \le \|(I_n)^{k+1}c - c\|_p^p.$$

Thus, using also the elementary inequality $(a^p + b^p)^{1/p} \le 2^{1/p}(a+b)$,

$$\|F - \tau_y F\|_{p,V_y} \le n^{-1/p}\bigl(\|(I_n)^k c - c\|_p^p + \|(I_n)^{k+1}c - c\|_p^p\bigr)^{1/p} \le (2/n)^{1/p}\bigl(\|(I_n)^k c - c\|_p + \|(I_n)^{k+1}c - c\|_p\bigr).$$

Here, by boundedness (2.16),

$$\|(I_n)^{k+1}c - c\|_p \le \|(I_n)^{k+1}c - (I_n)^k c\|_p + \|(I_n)^k c - c\|_p = \|(I_n)^k(I_n c - c)\|_p + \|(I_n)^k c - c\|_p \le \|I_n c - c\|_p + \|(I_n)^k c - c\|_p. \tag{2.52}$$
Thus,

$$\|F - \tau_y F\|_{p,V_y} \le (2/n)^{1/p}\bigl(2\|(I_n)^k c - c\|_p + \|I_n c - c\|_p\bigr),$$

which proves the lemma. ∎
2.6.3 Condition X: Discretizing the Continuity Modulus

The discretization operator $\delta_{np}$ takes us from $L_p$ to $l_p$ and the interpolation operator $D_{np}$ takes us back. How does this two-way relationship extend to continuity moduli? In other words, what is, in terms of $l_p$, the equivalent of the property $\lim_{\delta\to 0} v_p(F,\delta) = 0$, $p < \infty$? This equivalent, let's call it condition X, is established in the next lemma, taken from Mynbaev (2001). Denote

$$X(\{f_n\}) = \lim_{\delta\to 0,\ m\to\infty}\ \sup_{n\ge m,\ 0<y\le\delta}\|(I_n)^{[yn]}f_n - f_n\|_p \tag{2.53}$$

for any sequence $\{f_n\} \subset l_p$. We say that $\{f_n\}$ satisfies condition X if $X(\{f_n\}) = 0$.

Lemma.
Let $p < \infty$.

(i) If $\{f_n\}$ is $L_p$-generated by $F \in L_p$, $f_n = \delta_{np}F$, then $\{f_n\}$ satisfies condition X.

(ii) Conversely, suppose that a sequence $\{f_n\}$, such that $f_n \in \mathbb{R}^n$ for all $n$, satisfies condition X. Then the step functions $F_n = D_{np}f_n$ possess the property

$$\lim_{\delta\to 0}\sup_{n\ge 1} v_p(F_n,\delta) = 0.$$

Proof. (i) Let $0 < \delta < 1$ and $n \in \mathbb{N}$. Since $[yn] \le yn$, Lemma 2.4.1(i) implies

$$\sup_{n\ge 1,\ 0<y\le\delta}\|(I_n)^{[yn]}f_n - f_n\|_p \le \bigl(v_p^p(F,\delta) + \|F\|_{p,(0,\delta)}^p\bigr)^{1/p}.$$

Therefore

$$\lim_{\delta\to 0}\ \sup_{n\ge 1,\ 0<y\le\delta}\|(I_n)^{[yn]}f_n - f_n\|_p = 0,$$

which is stronger than $X(\{f_n\}) = 0$.

(ii) As a preliminary step, let's prove that $X(\{f_n\}) = 0$ implies

$$\lim_{n\to\infty}\|(I_n)^k f_n - f_n\|_p = 0 \quad\text{for any } k \in \mathbb{N}. \tag{2.54}$$
From Eq. (2.53) we see that if $X(\{f_n\}) = 0$, then for any $\varepsilon > 0$ there exist $\delta > 0$ and $m \ge 1$ such that

$$\|(I_n)^{[yn]}f_n - f_n\|_p < \varepsilon \quad\text{for all } n \ge m \text{ and } y \in (0,\delta]. \tag{2.55}$$

For a natural $k$ consider $n \ge m_0 \equiv \max\{m, k/\delta\}$ and put $y = k/n \le \delta$. Then $[yn] = k$ and the preceding bound gives Eq. (2.54):

$$\|(I_n)^k f_n - f_n\|_p < \varepsilon \quad\text{for all } n \ge m_0. \tag{2.56}$$

Put $c = n^{1/p}f_n$ in Lemma 2.6.2. Then the function $F$ from that lemma becomes $F_n = D_{np}f_n$ and

$$v_p(F_n,\delta) \le 2^{1/p}\left(2\sup_{0<y\le\delta}\|(I_n)^{[yn]}f_n - f_n\|_p + \|I_n f_n - f_n\|_p\right).$$

Applying Eqs. (2.55) and (2.56) we get $v_p(F_n,\delta) \le 2^{1/p}\,3\varepsilon$ for all $n \ge m_0$. The proof is complete. ∎
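Condition X can be probed numerically. The two test sequences below are my own examples, assumptions of this sketch: a sequence generated by a smooth $F$ passes the small-shift test, while a normalized geometric progression (which Section 2.7 shows is not $L_p$-approximable) concentrates its mass at the last coordinates and does not.

```python
import numpy as np

def shift(f, k):
    # (I_n)^k: shift a vector down by k positions, filling with zeros
    g = np.zeros_like(f)
    if k > 0:
        g[k:] = f[:-k]
    else:
        g[:] = f
    return g

def condX_sup(make_fn, y, ns, p=2.0):
    # sup over the listed n of ||(I_n)^{[yn]} f_n - f_n||_p
    out = []
    for n in ns:
        f = make_fn(n)
        k = max(int(y * n), 1)
        out.append(np.sum(np.abs(shift(f, k) - f) ** p) ** (1 / p))
    return max(out)

smooth = lambda n: (np.arange(1, n + 1) / n) * n ** (-0.5)   # ~ delta_n2 F for F(x) = x
def rough(n):                                                # normalized geometric progression
    v = 1.05 ** np.arange(1, n + 1)
    return v / np.sum(v ** 2) ** 0.5

ns = (500, 1000, 2000)
assert condX_sup(smooth, 0.01, ns) < 0.1   # smooth: a small shift changes little
assert condX_sup(rough, 0.01, ns) > 0.3    # geometric: the shift moves the mass
```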
2.6.4 Precompactness in Lp

A set $K$ in a normed space $L$ is called precompact if every sequence $\{x_n\} \subseteq K$ contains a convergent subsequence $\{x_{n_m}\}$. Properties of precompact sets in infinite-dimensional spaces parallel those of bounded sets in finite-dimensional spaces. I give an example of how this notion works in Section 2.6.6.

Theorem (Frechet–Kolmogorov). (Iosida, 1965, Section X.1) A set $K \subset L_p$ is precompact if and only if

$$\sup_{F\in K}\|F\|_p < \infty \quad\text{(uniform boundedness)}$$

and

$$\lim_{\delta\to 0}\sup_{F\in K} v_p(F,\delta) = 0 \quad\text{(uniform equicontinuity in mean)}.$$
2.6.5 Orthogonality

Lemma. Let $1 < p < \infty$. If a function $F \in L_p$ is orthogonal to indicators of all intervals,

$$\int_0^1 F(x)1_{(a,b)}(x)\,dx = 0 \quad\text{for all } (a,b) \subseteq (0,1), \tag{2.57}$$

then $F = 0$ a.e.
Proof. By linearity, Eq. (2.57) extends to

$$\int_0^1 F(x)G(x)\,dx = 0 \quad\text{for all step functions } G. \tag{2.58}$$

If $G \in L_q$ is an arbitrary function, then the projections $P_n G$ are step functions. By Eq. (2.58), Hölder's inequality and Lemma 2.2.1

$$\left|\int_0^1 FG\,dx\right| = \left|\int_0^1 FG\,dx - \int_0^1 F\,P_n G\,dx\right| \le \|F\|_p\,\|G - P_n G\|_q \to 0.$$

This generalizes Eq. (2.58) to

$$\int_0^1 F(x)G(x)\,dx = 0 \quad\text{for all } G \in L_q. \tag{2.59}$$

It is easy to check that $H = |F|^{p-1}\,\mathrm{sgn}\,F$ belongs to $L_q$:

$$\int_0^1 |H(x)|^q\,dx = \int_0^1 |F(x)|^{(p-1)q}\,dx = \int_0^1 |F(x)|^p\,dx < \infty.$$

Then, by Eq. (2.59),

$$0 = \int_0^1 F(x)H(x)\,dx = \int_0^1 |F(x)|\,|F(x)|^{p-1}\,dx = \|F\|_p^p$$

and $F = 0$ a.e. ∎
2.6.6 Criterion of Lp-Approximability

Theorem. (Mynbaev, 2001) Let $1 < p < \infty$ and suppose $\{f_n\}$ is a sequence of vectors satisfying $f_n \in \mathbb{R}^n$ for all $n \in \mathbb{N}$. Then $\{f_n\}$ is $L_p$-approximable if and only if the following three conditions hold:

(i) $\sup_n \|f_n\|_p < \infty$ (uniform boundedness),
(ii) the limit $\lim_{n\to\infty} n^{-1/q}\sum_{t=[na]}^{[nb]} f_{nt}$ exists for any $0 \le a < b \le 1$ (here by definition $f_{n0} = 0$ for all $n$) and
(iii) $X(\{f_n\}) = 0$ (condition X).
Proof. Necessity. Let $\{f_n\}$ be $L_p$-close to $F \in L_p$. The necessity of uniform boundedness is proved in Lemma 2.5.2. In the refined convergence theorem (Theorem 2.5.3) let $\{x_n\} = \{f_n\}$, $X = F$ and let $\{y_n\}$ be $L_q$-generated by $Y \equiv 1$. Then, by the definition from Section 2.1.2, $y_{nt} = n^{1/p-1} = n^{-1/q}$, $t = 1,\ldots,n$, and Theorem 2.5.3 gives

$$\lim_{n\to\infty} n^{-1/q}\sum_{t=[na]}^{[nb]} f_{nt} = \int_a^b F(s)\,ds \quad\text{uniformly in } [a,b] \subseteq [0,1]. \tag{2.60}$$

This condition implies (ii). Later on in the proof we need a generalization of this property for subsequences: if some subsequence $\{f_{n_m}\}$ of $\{f_n\}$ is $L_p$-close to $F \in L_p$, meaning that $\|f_{n_m} - \delta_{n_m,p}F\|_p \to 0$, $m\to\infty$, then

$$\lim_{m\to\infty} n_m^{-1/q}\sum_{t=[n_m a]}^{[n_m b]} f_{n_m,t} = \int_a^b F(s)\,ds \quad\text{uniformly in } [a,b] \subseteq [0,1]. \tag{2.61}$$

This is obtained from Eq. (2.60) simply by taking $\{f_{n_m}\}$ as the original sequence.

In Lemma 2.6.3(i) the necessity of condition X is proved for $L_p$-generated sequences, so for any $\varepsilon > 0$ there exist $\delta > 0$ and $m \ge 1$ such that

$$\sup_{n\ge m,\ 0<y\le\delta}\|(I_n)^{[yn]}\delta_{np}F - \delta_{np}F\|_p < \varepsilon.$$

Due to $L_p$-approximability, the choice of $m$ can also be subjected to

$$\sup_{n\ge m}\|f_n - \delta_{np}F\|_p < \varepsilon.$$

By boundedness of $(I_n)^k$ [see Eq. (2.16)], for $n \ge m$ and $0 < y \le \delta$

$$\|(I_n)^{[yn]}f_n - f_n\|_p \le \|(I_n)^{[yn]}(f_n - \delta_{np}F)\|_p + \|(I_n)^{[yn]}\delta_{np}F - \delta_{np}F\|_p + \|\delta_{np}F - f_n\|_p \le 2\|f_n - \delta_{np}F\|_p + \|(I_n)^{[yn]}\delta_{np}F - \delta_{np}F\|_p \le 3\varepsilon.$$

This proves necessity of condition X for $L_p$-approximable sequences.

Sufficiency. Put $F_n = D_{np}f_n$. Since $D_{np}$ is an isomorphism (Lemma 2.1.7), condition (i) implies uniform boundedness of $F_n$: $\sup_n\|F_n\|_p < \infty$. By Lemma 2.6.3(ii) condition X ensures uniform equicontinuity in the mean of the functions $F_n$: $\lim_{\delta\to 0}\sup_{n\ge 1} v_p(F_n,\delta) = 0$. In virtue of the Frechet–Kolmogorov theorem, the set $K = \{F_n\}$ is precompact in $L_p$. Hence, there exist a subsequence $\{F_{n_m}\}$ and a function $F \in L_p$ such that $\|F_{n_m} - F\|_p \to 0$. Then $\{f_{n_m}\}$ is $L_p$-close to $F$ and Eq. (2.61) is true.
We need to show that the whole sequence $\{F_n\}$ converges to $F$. Suppose it does not. Then there exists another subsequence $\{F_{n_k}\}$ that is at a positive distance from $F$:

$$\|F_{n_k} - F\|_p \ge \varepsilon > 0. \tag{2.62}$$

By precompactness, $\{F_{n_k}\}$ has a convergent subsequence. Changing the notation, if necessary, we can think of $\{F_{n_k}\}$ itself as convergent to some $G \in L_p$:

$$\|F_{n_k} - G\|_p \to 0. \tag{2.63}$$

Note that $\{f_{n_k}\}$ is $L_p$-close to $G$ because by Lemmas 2.1.7 and 2.2.1

$$\|f_{n_k} - \delta_{n_k,p}G\|_p = \|D_{n_k,p}(f_{n_k} - \delta_{n_k,p}G)\|_p = \|F_{n_k} - P_{n_k}G\|_p \le \|F_{n_k} - G\|_p + \|G - P_{n_k}G\|_p \to 0.$$

This allows us to employ Eq. (2.61):

$$\lim_{k\to\infty} n_k^{-1/q}\sum_{t=[n_k a]}^{[n_k b]} f_{n_k,t} = \int_a^b G(s)\,ds \quad\text{for all } [a,b] \subseteq [0,1]. \tag{2.64}$$

By condition (ii) the limits in Eqs. (2.61) and (2.64) should be the same. We write this conclusion as

$$\int_0^1 (F - G)1_{(a,b)}\,dx = 0 \quad\text{for all } (a,b) \subseteq (0,1).$$

By the orthogonality Lemma 2.6.5, $F = G$ a.e., which contradicts Eqs. (2.62) and (2.63). Hence, the whole sequence $\{F_n\}$ converges to $F$ and $\{f_n\}$ is $L_p$-close to $F$:

$$\|f_n - \delta_{np}F\|_p = \|D_{np}(f_n - \delta_{np}F)\|_p = \|F_n - P_n F\|_p \le \|F_n - F\|_p + \|F - P_n F\|_p \to 0. \qquad ∎$$
2.6.7 Explicit Construction

Corollary. If $1 < p < \infty$ and $\{f_n\}$ is $L_p$-approximable, then

$$F(x) = \frac{d}{dx}\lim_{n\to\infty} n^{-1/q}\sum_{t=1}^{[nx]} f_{nt},\quad x \in [0,1],$$

is that function to which $\{f_n\}$ is $L_p$-close.
Proof. This follows from Eq. (2.60), where we can take $a = 0$, $b = x$ and use the Lebesgue differentiation theorem: if $F$ is integrable, then $\frac{d}{dx}\int_0^x F(s)\,ds = F(x)$ a.e. (Kolmogorov and Fomin, 1989, Chapter VI, Section 3). ∎
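The corollary suggests a simple numerical recipe: form the partial sums $n^{-1/q}\sum_{t\le nx} f_{nt}$ and differentiate. The sketch below applies it to the normalized linear trend, for which Section 2.7 gives the limit $F(x) = \sqrt{3}\,x$; the finite-difference step and grid are assumptions of the illustration.

```python
import numpy as np

p = 2.0
q = p / (p - 1)
n = 20000
t = np.arange(1, n + 1).astype(float)
fn = t / np.sum(t ** p) ** (1 / p)         # normalized linear trend x_n / ||x_n||_p
x = np.linspace(0.1, 0.9, 81)
S = np.array([n ** (-1 / q) * fn[: int(n * xi)].sum() for xi in x])
Fhat = np.gradient(S, x[1] - x[0])         # numerical d/dx of the partial sums
assert np.max(np.abs(Fhat - np.sqrt(3) * x)) < 0.05
```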
2.7 EXAMPLES AND COUNTEREXAMPLES

2.7.1 Definition of Trends

1. A polynomial trend equals, by definition, $x_n = (1^{k-1}, 2^{k-1}, \ldots, n^{k-1})'$, where $k$ is natural.
2. A logarithmic trend is defined by $x_n = (\ln^k 1, \ldots, \ln^k n)'$ for a natural $k$.
3. A geometric progression is taken to be $x_n = (a^1, a^2, \ldots, a^n)'$ with a real $a \ne 0$.
4. Finally, an exponential trend is a vector $x_n = (e^a, \ldots, e^{na})'$.

Obviously, denoting $b = e^a$ we turn the exponential trend into a geometric progression $x_n = (b, \ldots, b^n)'$. A constant is a polynomial trend ($k = 1$), a geometric progression ($a = 1$) and an exponential trend ($a = 0$). In the conventional scheme the regressors are normalized. This is why we are interested in $L_p$-approximability of the normalized trends $f_n = x_n/\|x_n\|_p$. The next theorem in the most important case $p = 2$ is proved in Mynbaev and Castelar (2001).

Theorem. Let $p < \infty$.

(i) If $\{x_n\}$ is a polynomial trend, then the normalized sequence $\{f_n\}$ is $L_p$-close to $F(x) = ((k-1)p+1)^{1/p}x^{k-1}$, $k \in \mathbb{N}$. When $p = \infty$, this statement is true with $F(x) = x^{k-1}$.
(ii) If $\{x_n\}$ is a logarithmic trend, then $\{f_n\}$ is $L_p$-close to $F \equiv 1$ for all $k \in \mathbb{N}$.
(iii) For a geometric progression, $\{f_n\}$ is not $L_p$-approximable, unless $a = 1$.
(iv) For an exponential trend, $\{f_n\}$ is not $L_p$-approximable, unless $a = 0$.

Because exponential trends are a special case of geometric progressions, part (iv) follows from item (iii). The rest of the proof is split into sections. See Theorems 4.4.1, 4.4.8 and Lemma 7.2.3 for other examples of $L_p$-approximable sequences.
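Parts (i) and (ii) can be checked numerically for $p = 2$. The midpoint evaluation of the step function $D_{n2}f_n$ below is an assumption of this sketch; note that for the logarithmic trend the convergence is only logarithmic in $n$, so the tolerance is loose.

```python
import numpy as np

def l2_dist_to(F, xn):
    # L_2 distance between the step function D_n2(xn/||xn||_2) and F,
    # evaluated at interval midpoints (an assumption of the sketch).
    n = len(xn)
    fn = xn / np.sum(xn ** 2) ** 0.5
    mid = (np.arange(n) + 0.5) / n
    return np.mean((np.sqrt(n) * fn - F(mid)) ** 2) ** 0.5

n = 100000
t = np.arange(1, n + 1).astype(float)
# (i) polynomial trend, k = 2: limit F(x) = sqrt((k-1)p + 1) x = sqrt(3) x
assert l2_dist_to(lambda x: np.sqrt(3.0) * x, t) < 0.01
# (ii) logarithmic trend, k = 1: limit F = 1 (convergence ~ 1/ln n, hence slow)
assert l2_dist_to(lambda x: np.ones_like(x), np.log(t)) < 0.2
```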
2.7.2 Simple Sufficient Conditions

Lemma. Let $p \ge 1$.

(i) Suppose that for a given $\{f_n\}$, with $f_n \in \mathbb{R}^n$ for all $n$, there exists $F \in L_\infty$ such that $\|D_{np}f_n - F\|_\infty \to 0$. Then $\{f_n\}$ is $L_p$-close to $F$.
(ii) Let $F$ be continuous on $[0,1]$ and suppose that a sequence $\{p_n\}$, with $p_n \in \mathbb{R}^n$ for all $n$, satisfies

$$\max_{1\le t\le n}|p_{nt} - F(t/n)| \to 0,\quad n\to\infty.$$

Denote $f_n = n^{-1/p}p_n$. Then $\{f_n\}$ is $L_p$-close to $F$.

Proof. (i) By Hölder's inequality the equivalent definition of $L_p$-approximability [Eq. (2.45)] is satisfied:

$$\|D_{np}f_n - F\|_p \le \|D_{np}f_n - F\|_\infty \to 0.$$

(ii) By uniform continuity of $F$

$$\max_{1\le t\le n}\max_{x\in i_t}|F(t/n) - F(x)| \to 0,\quad n\to\infty.$$

Since $D_{np}f_n = \sum_{t=1}^n p_{nt}1_{i_t}$, we see that

$$\|D_{np}f_n - F\|_\infty = \max_{1\le t\le n}\max_{x\in i_t}|p_{nt} - F(x)| \le \max_t|p_{nt} - F(t/n)| + \max_t\max_{x\in i_t}|F(t/n) - F(x)| \to 0.$$

It remains to apply part (i). ∎
Ð1 0
h(t) dt is a limit of Riemann sums:
ð1 h(t) dt ¼ o(1): 0
o(1), as usual, denotes a sequence {1n } satisfying lim 1n ¼ 0. This notation is impersonal in the sense that the sequences {1n } that appear in different places of the proof are not the same. From kxn kpp ¼
n X t¼1
t (k1)p ¼ n(k1)pþ1
n (k1)p 1X t n t¼1 n
we see that $h(t) = t^{(k-1)p}$ is the right choice to approximate $\|x_n\|_p^p$. Since

$$\int_0^1 t^{(k-1)p}\,dt = \frac{1}{(k-1)p+1}, \tag{2.65}$$

we get

$$\|x_n\|_p = n^{((k-1)p+1)/p}\left[\frac{1}{(k-1)p+1} + \left(\frac{1}{n}\sum_{t=1}^n h\!\left(\frac{t}{n}\right) - \int_0^1 h(t)\,dt\right)\right]^{1/p}.$$
$$\sum_1^n w_i d_i\left(\sum_1^n w_i\right)^{-1} \to 0 \quad\text{a.s. on } V\cap U. \tag{6.20}$$

However, if $\omega \in V\cap(\Omega\setminus U)$, then $\sum_{i=1}^\infty w_i(\omega)d_i(\omega)$ converges by Eq. (6.16), while $\sum_1^n w_i(\omega) \to \infty$ by the definition of $V$, so

$$\sum_1^n w_i d_i\left(\sum_1^n w_i\right)^{-1} \to 0 \quad\text{a.s. on } V\cap(\Omega\setminus U). \tag{6.21}$$

Equations (6.20) and (6.21) prove Eq. (6.19). ∎
6.2.6 A Lower Bound for Weighted Sums of Moduli of Martingale Differences

Lemma. If $w_i$ are nonnegative and Assumption L-W holds, then with the same $V$ as in Eq. (6.18) one has

$$\liminf_{n\to\infty}\ \sum_1^n w_i|e_i|\left(\sum_1^n w_i\right)^{-1} \ge \liminf_{n\to\infty} E(|e_n|\,|\,\mathcal{F}_{n-1}) \quad\text{a.s. on } V. \tag{6.22}$$
274
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
Proof. Obviously, with $d_i = |e_i| - E(|e_i|\,|\,\mathcal{F}_{i-1})$,

$$\frac{\sum_1^n w_i|e_i|}{\sum_1^n w_i} = \frac{\sum_1^n w_i E(|e_i|\,|\,\mathcal{F}_{i-1})}{\sum_1^n w_i} + \frac{\sum_1^n w_i d_i}{\sum_1^n w_i}.$$

By Eq. (6.19), on $V$ the last term is asymptotically negligible. Denoting $a = \liminf_{n\to\infty} E(|e_n|\,|\,\mathcal{F}_{n-1})$, for any $\varepsilon \in (0,a/2)$ we can find $N > 0$ such that

$$E(|e_n|\,|\,\mathcal{F}_{n-1}) \ge a - \varepsilon \quad\text{for } n > N.$$

In

$$\frac{\sum_1^n w_i E(|e_i|\,|\,\mathcal{F}_{i-1})}{\sum_1^n w_i} = \frac{\sum_1^N w_i E(|e_i|\,|\,\mathcal{F}_{i-1})}{\sum_1^n w_i} + \frac{\sum_{N+1}^n w_i E(|e_i|\,|\,\mathcal{F}_{i-1})}{\sum_1^n w_i} \tag{6.23}$$

the first term on the right tends to 0 as $n\to\infty$ on $V$. The second term is not less than

$$(a-\varepsilon)\,\frac{\sum_{N+1}^n w_i}{\sum_1^N w_i + \sum_{N+1}^n w_i} = \frac{a-\varepsilon}{\sum_1^N w_i\big/\sum_{N+1}^n w_i + 1} \ge a - 2\varepsilon, \tag{6.24}$$

say, for all $n$ sufficiently large. Equations (6.23) and (6.24) prove Eq. (6.22). ∎
6.2.7 A Lower Bound for Weighted Sums of Powers of Martingale Differences

Lemma. [Lai and Wei, 1983b, Lemma 1(i)] If $w_i$ are nonnegative and Assumption L-W holds, then for every $r \ge 1$

$$\liminf_{n\to\infty}\ \sum_1^n w_i|e_i|^r\left(\sum_1^n w_i\right)^{-1} > 0 \quad\text{a.s. on } V = \left\{\sum_{i=1}^\infty w_i = \infty,\ \sup_i w_i < \infty\right\}.$$

Proof. Viewing $\{w_i(\sum_1^n w_j)^{-1} : i = 1,\ldots,n\}$ as a probability density function on $\{1,\ldots,n\}$, by Hölder's inequality we have

$$\sum_1^n w_i|e_i|\left(\sum_1^n w_i\right)^{-1} \le \left[\sum_1^n w_i|e_i|^r\left(\sum_1^n w_i\right)^{-1}\right]^{1/r}.$$

Hence, the statement follows from the local M–Z condition Eq. (6.14) and Eq. (6.22). ∎
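The step in this proof is just the weighted power-mean (Jensen) inequality: with $w_i/\sum w_j$ as a probability density, the 1-mean of $|e_i|$ never exceeds the $r$-mean, so a lower bound on the former transfers to the latter. A quick numerical check on synthetic data (my own choice of weights and errors):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.uniform(0.1, 1.0, 10000)     # nonnegative weights
e = rng.standard_normal(10000)       # stand-in for the martingale differences
for r in (1.0, 1.5, 2.0, 4.0):
    mean1 = np.sum(w * np.abs(e)) / np.sum(w)
    meanr = (np.sum(w * np.abs(e) ** r) / np.sum(w)) ** (1 / r)
    # weighted 1-mean <= weighted r-mean for every r >= 1
    assert mean1 <= meanr + 1e-12
```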
6.2.8 Uniform Boundedness of Weights

Lemma. [Lai and Wei, 1983b, Lemma 1(ii)] If $w_i$ are nonnegative and Assumption L-W holds, then $w_n$ are uniformly (in $n$) bounded on the set $A = \{\sup_n w_n|e_n| < \infty\}$:

$$P\bigl(\sup_n w_n = \infty,\ \sup_n w_n|e_n| < \infty\bigr) = 0.$$

Proof. Let

$$d_n = w_n|e_n|,\quad \mu_n = E(d_n\,|\,\mathcal{F}_{n-1}),\quad v_n = [E(d_n^2\,|\,\mathcal{F}_{n-1})]^{1/2} \tag{6.25}$$
6.2 VARIOUS BOUNDS ON MARTINGALE TRANSFORMS
275
and, for a natural $K \ge 1$, define

$$A_K = \{\sup_n d_n < K\} \cap \{\mu_n \ge 3K^{-1}v_n \text{ for all large } n\}.$$

In the definition of $A_K$, "for all large $n$" means "for $n \ge N(\omega)$, where $N(\omega)$ is large enough".

Step 1. To prove boundedness of $w_n$, it suffices to prove boundedness of the variations $v_n$. Indeed, by the conditional Hölder inequality and the local M–Z condition

$$0 < \liminf_{n\to\infty} E(|e_n|\,|\,\mathcal{F}_{n-1}) \le \liminf_{n\to\infty}[E(e_n^2\,|\,\mathcal{F}_{n-1})]^{1/2}.$$

Therefore from $w_n = v_n/[E(e_n^2\,|\,\mathcal{F}_{n-1})]^{1/2}$ it follows that

$$\limsup_{n\to\infty} w_n \le \frac{\limsup_{n\to\infty} v_n}{\liminf_{n\to\infty}[E(e_n^2\,|\,\mathcal{F}_{n-1})]^{1/2}}$$

and we have the implication $\sup_n v_n < \infty \Rightarrow \sup_n w_n < \infty$.

Step 2. Now we reduce the problem further by showing that

$$\sup_n v_n < \infty \text{ on } A_K \text{ for all } K\ \Rightarrow\ \sup_n v_n < \infty \text{ on } A. \tag{6.26}$$

The local M–Z condition implies

$$E(|e_n|\,|\,\mathcal{F}_{n-1}) = \frac{E(|e_n|\,|\,\mathcal{F}_{n-1})}{[E(e_n^2\,|\,\mathcal{F}_{n-1})]^{1/2}}\,[E(e_n^2\,|\,\mathcal{F}_{n-1})]^{1/2} \ge \frac{\liminf_{n\to\infty} E(|e_n|\,|\,\mathcal{F}_{n-1})}{\sup_n[E(e_n^2\,|\,\mathcal{F}_{n-1})]^{1/2}}\,[E(e_n^2\,|\,\mathcal{F}_{n-1})]^{1/2},\quad\text{all large } n.$$

Multiplying this by $w_n$ (which is $\mathcal{F}_{n-1}$-measurable), we get $\mu_n \ge c v_n$ for all large $n$, where $c$ depends on $\omega$. Hence, for almost any $\omega$ there exists $K$ such that $\mu_n \ge 3K^{-1}v_n$ for all large $n$. As a result, $\Omega = \bigcup_{K=1}^\infty\{\mu_n \ge 3K^{-1}v_n \text{ for all large } n\}$, up to a set of probability zero. Besides, $A = \bigcup_{K=1}^\infty A_K$.

$P_A(\cdot) = P(\cdot\cap A)/P(A)$. This sequence is a m.d. sequence under $P_A$:

$$E(a_n\tilde{e}_n 1(A)\,|\,\tilde{\mathcal{F}}_{n-1}) = a_n 1(A)E(\tilde{e}_n\,|\,\tilde{\mathcal{F}}_{n-1}) = 0.$$

Denoting $E_A X = E1(A)X/P(A)$, from condition (6.44) we have

$$E_A|\tilde{e}_n 1(A)| = E_A[1(A)E(|\tilde{e}_n|\,|\,\tilde{\mathcal{F}}_{n-1})] \ge E_A K_1 = K_1,\qquad E_A(\tilde{e}_n 1(A))^2 = E_A[1(A)E(\tilde{e}_n^2\,|\,\tilde{\mathcal{F}}_{n-1})] \le K_2,$$

and the consequence is that

$$E_A|a_n\tilde{e}_n 1(A)| \ge |a_n|K_1 \ge K_1 K_2^{-1/2}\,[E_A(a_n\tilde{e}_n 1(A))^2]^{1/2},\quad n > n_{i-1}.$$

By M–Z Theorem II (Section 6.3.3) it follows that there exists a constant $d_K \in (0,1)$ depending only on $K$ and such that for $n > n_{i-1}$:

$$E_A\left|\sum_{n_{i-1}<j\le n} a_j\tilde{e}_j 1(A)\right| \ge d_K\left[E_A\left(\sum_{n_{i-1}<j\le n} a_j\tilde{e}_j 1(A)\right)^2\right]^{1/2}. \tag{6.48}$$
6.3 MARCINKIEWICZ –ZYGMUND THEOREMS AND RELATED RESULTS
283
We note that for $n > n_{i-1}$

$E_A\Big(\sum_{n_{i-1}<j\le n} a_j\tilde e_j 1(A)\Big)^2 = E\Big[\Big(\sum_{n_{i-1}<j\le n} a_j\tilde e_j\Big)^2 1(A)\Big]\Big/P(A) = \sum_{n_{i-1}<j\le n} a_j^2\,E[E(\tilde e_j^2|\tilde F_{j-1})1(A)]/P(A)$

[applying the conditional Jensen inequality (Section 6.1.6)]

$\ge \sum_{n_{i-1}<j\le n} a_j^2\,E\{[E(|\tilde e_j|\,|\tilde F_{j-1})]^2 1(A)\}/P(A)$

[using the second inequality in Eq. (6.44)]

$\ge K^{-2}\sum_{n_{i-1}<j\le n} a_j^2\,E1(A)/P(A) = K^{-2}\sum_{n_{i-1}<j\le n} a_j^2.$  (6.49)

The conclusion from Eqs. (6.48) and (6.49) is that

$E\Big[\Big|\sum_{j\in b_i} a_j\tilde e_j\Big|1(A)\Big]\Big/P(A) \ge d_K K^{-1}\Big(\sum_{j\in b_i} a_j^2\Big)^{1/2}.$

Since this inequality holds for all $A \in \tilde F_{n_{i-1}}$ with $P(A) > 0$, it follows from the definition of conditional expectation that $E(|u_i|\,|\,G_{i-1}) \ge d_K K^{-1}$ a.s. So, the second condition in Eq. (6.47) is satisfied. ∎
6.3.5 Lemma on Leading Terms

Lemma. Let $\{\tilde e_n, \tilde F_n\}$ be a m.d. sequence and let $\{a_n\}$ be a sequence of constants satisfying the condition of nontriviality of batches (6.45). Define $S_n = \sum_{i=1}^n a_i\tilde e_i$,

$A_i = \Big(\sum_{n\in b_i} a_n^2\Big)^{1/2} \ne 0$  (6.50)

and let $u_i$, $G_i$ be as in Eq. (6.46). Then the leading terms in the expression $\sum_{i=1}^k S_{n_i}^2/A_i^2$ are identified by

$\sum_{i=1}^k \frac{S_{n_i}^2}{A_i^2} = \begin{cases} \displaystyle\sum_{i=1}^k \frac{S_{n_{i-1}}^2}{A_i^2} + \sum_{i=1}^k u_i^2 + O(1) & \text{on } \Big\{\sum_{i=1}^\infty S_{n_{i-1}}^2/A_i^2 < \infty\Big\},\\ \displaystyle(1+o(1))\sum_{i=1}^k \frac{S_{n_{i-1}}^2}{A_i^2} + \sum_{i=1}^k u_i^2 & \text{on } \Big\{\sum_{i=1}^\infty S_{n_{i-1}}^2/A_i^2 = \infty\Big\}. \end{cases}$  (6.51)
284
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
Proof. Start with the identity

$\frac{S_{n_i}^2}{A_i^2} = \Big(S_{n_{i-1}} + \sum_{n\in b_i} a_n\tilde e_n\Big)^2 A_i^{-2} = \Big[S_{n_{i-1}}^2 + 2S_{n_{i-1}}\sum_{n\in b_i} a_n\tilde e_n + \Big(\sum_{n\in b_i} a_n\tilde e_n\Big)^2\Big]A_i^{-2} = S_{n_{i-1}}^2A_i^{-2} + 2(S_{n_{i-1}}A_i^{-1})u_i + u_i^2.$

Summation over $i$ gives

$\sum_{i=1}^k S_{n_i}^2/A_i^2 = \sum_{i=1}^k S_{n_{i-1}}^2A_i^{-2} + 2\sum_{i=1}^k (S_{n_{i-1}}A_i^{-1})u_i + \sum_{i=1}^k u_i^2.$  (6.52)

Since $S_{n_{i-1}}A_i^{-1}$ is $G_{i-1}$-measurable, we can put $p = 2$, $X_i = (S_{n_{i-1}}A_i^{-1})u_i$, $F_i = G_i$ in the martingale convergence theorem (Section 6.1.3) and use Eq. (6.47) to conclude that

$\sum_{i=1}^k (S_{n_{i-1}}A_i^{-1})u_i$ converges on $\Big\{\sum_{i=1}^\infty S_{n_{i-1}}^2A_i^{-2} < \infty\Big\}.$  (6.53)

In the martingale strong law (Section 6.1.4), in addition to the above choice, let us take $U_n = \sum_{i=1}^n S_{n_{i-1}}^2A_i^{-2}$. On the set $U = \{\lim_{n\to\infty} U_n = \infty\}$ we have

$\sum_{i=1}^\infty U_i^{-2}E(X_i^2|F_{i-1}) \le \sup_n E(u_n^2|F_{n-1})\sum_{i=1}^\infty (S_{n_{i-1}}A_i^{-1})^2U_i^{-2} < \infty$

[see Eq. (6.11) for the proof of convergence of the last series]. By Theorem 6.1.4 $\lim_{n\to\infty} S_n/U_n = 0$ on $U$, that is,

$\sum_{i=1}^k (S_{n_{i-1}}A_i^{-1})u_i = o\Big(\sum_{i=1}^k S_{n_{i-1}}^2A_i^{-2}\Big)$ on $\Big\{\sum_{i=1}^\infty S_{n_{i-1}}^2A_i^{-2} = \infty\Big\}.$  (6.54)

Equations (6.52), (6.53) and (6.54) prove Eq. (6.51). ∎
6.3.6 Theorem on Almost Sure Convergence of a Series with Square-Summable Coefficients (One-Dimensional Case)

Theorem. (Lai and Wei, 1983b, Corollary 2) Let $\{e_n, F_n\}$ be a m.d. sequence satisfying the local M–Z condition (6.14). Let $\{a_n\}$ be a sequence of constants such that

$\sum_{i=1}^\infty a_i^2 < \infty$ and $a_i \ne 0$ for infinitely many $i$.  (6.55)

Then the series $\sum_{i=1}^\infty a_ie_i$ converges a.s. and

$P\Big(\sum_{i=1}^\infty a_ie_i = Y\Big) = 0$  (6.56)

for every random variable $Y$ that is $F_p$-measurable for some $p \ge 1$. Hence, in particular, $\sum_{i=1}^\infty a_ie_i$ has a nonatomic distribution and $P(\sum_{i=1}^\infty a_ie_i = c) = 0$ for any constant $c$. When the m.d. sequence $\{e_n, F_n\}$ satisfies the stronger condition (6.13), the fact that $\sum_{i=1}^\infty a_ie_i$ has a nonatomic distribution for constants $a_n$ satisfying Eq. (6.55) was established by Barlow (1975).
P1 Proof. The a.s. convergence of i¼1 ai ei follows from the generalized M – Z theorem I (Section 6.3.1). To prove Eq. (6.56), assume the contrary that ! 1 X ai ei Y ¼ 0 ¼ 3h . 0 (6:57) P i¼1
for some F p -measurable Y: By Egorov’s theorem there exists an event V0 such that P(V0 ) 1 h and
1 X
ai ei (v) converges uniformly for v [ V0 :
(6:58)
i¼1
By the approximation lemma (Lemma 6.3.2), for the given h there exist positive integers m and K and a m.d. sequence { e~ n , F~ n } satisfying conditions (6.36) and (6.37) and such that F n # F~ n for all n. Moreover, m can be taken larger than p: Let ! m1 n X X ai ei Y þ ai e~ i , n m: (6:59) Sn ¼ i¼m
i¼1
As the term in the parentheses is F m1 -measurable and F n # F~ n , {Sn , F~ n : n m} is a martingale: E(Sn j F~ n1 ) ¼
m1 X
! ai ei Y þ
V1 ¼
1 X i¼1
ai e~ i þan E( e~ n j F~ n1 ) ¼ Sn1 :
i¼m
i¼1
Denote (
n1 X
) ai ei Y ¼ 0 , V2 ¼
\ nm
{en ¼ e~ n }, V3 ¼ V0 > V1 > V2 :
286
CHAPTER 6
CONVERGENCE ALMOST EVERYWHERE
From Eqs. (6.57), (6.58) and (6.59) we see that Sn ¼
n X
ai ei Y converges uniformly to 0 on V3 :
(6:60)
i¼1
By Eq. (6.37), (6.57) and (6.58) we have P( V2 ) h, P( V1 ) ¼ 1 3h and P( V0 ) h (the bars stand for complements). Hence, P(V3 ) ¼ 1 P( V3 ) ¼ 1 P( V0 < V1 < V2 ) ¼ 1 P( V0 ) P( V1 ) P( V2 ) þ P( V0 > V1 ) þ P( V0 > V2 ) þ P( V1 > V2 ) P( V0 > V1 > V2 ) 1 P( V0 ) P( V1 ) P( V2 ) h: We now define nonrandom positive integers m ¼ n0 , n1 , inductively as follows.Having defined ni1 , we can choose by Eq. (6.55) an index n . ni1 such that an = 0: By Eq. (6.60), we can then choose ni . n in such a way that supv[V3 jSni (v)j jan j=i: With the numbers Ai defined in Eq. (6.50) this choice ensures that S2ni (v) A2i =i2 for all v [ V3 and that Ai . 0 for all i, so that 1 X
S2ni A2 i
i¼1
1 X
i2 , 1 on V3 with P(V3 ) . 0:
(6:61)
1
Having defined the integers ni , we then define ui and Gi as in Eq. (6.46) and obtain by Lemma 6.3.4 that {ui , Gi } is a m.d. sequence satisfying the local M – Z condition Eq. (6.47). Taking in Lemma 6.2.7 wi ¼ 1 for all i and r ¼ 2 we see that the set V from that lemma equals V and lim inf n!1
n 1X u2 . 0 a:s: n 1 i
(6:62)
The Sn defined in Eq. (6.59) differs from the Sn defined in Lemma 6.3.5 by the term Y which is F p -measurable and does not affect the proof of Lemma 6.3.5. Hence, Eq. (6.51) is applicable in the current situation. Equations (6.51) and (6.62) lead to P B the conclusion ki¼1 S2ni =A2i ! 1 a.s., which contradicts Eq. (6.61).
6.3.7 Multivariate Local Marcinkiewicz–Zygmund Condition

The assumption to be made in the multivariate case is stronger than the local M–Z condition [Eq. (6.14)] in that in the first part of Eq. (6.14) higher powers of the m.d.'s are used:

$\sup_n E(|e_n|^\alpha|F_{n-1}) < \infty$ a.s. with some $\alpha > 2$.  (6.63)

Besides, under this condition the second part of Eq. (6.14) becomes equivalent to

$\liminf_{n\to\infty} E(e_n^2|F_{n-1}) > 0$ a.s.  (6.64)

Indeed, if $\liminf_{n\to\infty} E(|e_n|\,|F_{n-1}) > 0$ a.s., then Eq. (6.64) holds because of the inequality $E(|e_n|\,|F_{n-1}) \le [E(e_n^2|F_{n-1})]^{1/2}$. Conversely, if Eq. (6.64) is true, then by the conditional Hölder inequality with $p = (\alpha-1)/(\alpha-2)$, $q = \alpha-1$ we have $1/p + 1/q = 1$, $1/p + \alpha/q = 2$ and

$E(e_n^2|F_{n-1}) = E(|e_n|^{1/p+\alpha/q}|F_{n-1}) \le [E(|e_n|\,|F_{n-1})]^{1/p}\big[\sup_n E(|e_n|^\alpha|F_{n-1})\big]^{1/q}.$

Hence, $\liminf_{n\to\infty} E(|e_n|\,|F_{n-1}) > 0$ follows.

If $e_n = (e_{n1},\ldots,e_{nd})'$ is a column vector in $R^d$, we replace the positivity of $E(e_n^2|F_{n-1})$ in Eq. (6.64) by the positive definiteness of the covariance matrix $E(e_ne_n'|F_{n-1})$ or, equivalently, by the positivity of its least eigenvalue $\lambda_{\min}[E(e_ne_n'|F_{n-1})]$. Thus, by a multivariate local M–Z condition we mean

$\sup_n E(\|e_n\|^\alpha|F_{n-1}) < \infty$ a.s. with some $\alpha > 2$, and $\liminf_{n\to\infty}\lambda_{\min}[E(e_ne_n'|F_{n-1})] > 0.$  (6.65)
6.3.8 Generalized Marcinkiewicz–Zygmund Theorem, Multivariate Case

Theorem. (Lai and Wei, 1983b, Corollary 3) Let $\{e_n, F_n\}$ be a vector m.d. sequence satisfying Eq. (6.65). Let $\{w_n\}$ be a premeasurable sequence of vectors $w_n = (w_{n1},\ldots,w_{nd})'$ of the same dimension as $e_n$. Then, except for a null set, the following statements are equivalent:

(i) $\sum_{i=1}^\infty \|w_i\|^2 < \infty$,
(ii) $\sum_{i=1}^\infty w_i'e_i$ converges,
(iii) $\sup_n\big|\sum_{i=1}^n w_i'e_i\big| < \infty$,
(iv) $\sum_{i=1}^\infty |w_i'e_i|^2 < \infty$.

Proof. The idea is to reduce the multivariate case to the scalar case considered in Section 6.3.1. Put

$u_i = \frac{w_i'e_i}{\|w_i\|}1(w_i \ne 0) + e_{i1}1(w_i = 0).$  (6.66)

Then

$\sup_n E(|u_n|^\alpha|F_{n-1}) \le \sup_n E(\|e_n\|^\alpha|F_{n-1}) < \infty$ a.s.  (6.67)

Moreover, if $w_n \ne 0$, then

$E(u_n^2|F_{n-1}) = w_n'E(e_ne_n'|F_{n-1})w_n/\|w_n\|^2 \ge \lambda_{\min}[E(e_ne_n'|F_{n-1})].$

When $w_n = 0$, a similar inequality is true with $w_n$ replaced by $(1,0,\ldots,0)' \in R^d$. Therefore

$\liminf_{n\to\infty} E(u_n^2|F_{n-1}) \ge \liminf_{n\to\infty}\lambda_{\min}[E(e_ne_n'|F_{n-1})] > 0$ a.s.  (6.68)

By the argument in Section 6.3.7 it follows from Eqs. (6.67) and (6.68) that $\{u_n\}$ satisfies the local M–Z condition. That it is a m.d. sequence is easily seen from Eq. (6.66). Since $w_i'e_i = \|w_i\|u_i$ and $\|w_i\|$ is $F_{i-1}$-measurable, items (i)–(iv) from Section 6.3.1 rewrite as items (i)–(iv) from this section. ∎
6.3.9 Convergence of a Series with Constant Vector Coefficients

Theorem. (Lai and Wei, 1983b, Corollary 4) Let $\{e_n, F_n\}$ be a m.d. sequence satisfying the multivariate local M–Z condition Eq. (6.65). Suppose $\{A_n\}$ is a sequence of nonrandom vectors from $R^d$ such that

$\sum_{i=1}^\infty \|A_i\|^2 < \infty$ and $A_i \ne 0$ for infinitely many $i$.  (6.69)

Then the series $\sum_{i=1}^\infty A_i'e_i$ converges a.s. and $P(\sum_{i=1}^\infty A_i'e_i = Y) = 0$ for every random variable $Y$ that is $F_p$-measurable for some $p \ge 1$.

Proof. In the definitions of Section 6.3.8 replace $w_i$ by $A_i$ to see that $\{u_i\}$ is a m.d. sequence satisfying the univariate local M–Z condition. As a result of the equality $A_i'e_i = \|A_i\|u_i$ we can put $a_i = \|A_i\|$. Then the condition of Eq. (6.69) translates into Eq. (6.55) and the statement follows from Theorem 6.3.6. ∎
6.3.10 Wei’s Bound on Martingale Transforms Theorem. (Wei, 1985, Lemma 2) Let {en , F n } be a m.d. sequence such that, with some a . 2, supn E(jen ja jF n1 ) , 1 a.s. and let {wn } be a premeasurable sequence Pn 2 1=2 of random variables. Define sn ¼ : Then for any d . 1= min {a, 4} 1 wi n X
wi ei ¼ O(sn ( log sn )d ) a:s:
(6:70)
1
Furthermore, if jwn j ¼ o(scn ) for some 0 , c , 1,
(6:71)
6.3 MARCINKIEWICZ –ZYGMUND THEOREMS AND RELATED RESULTS
289
then n X
wi ei ¼ O(sn ( log log sn )1=2 ) a:s:
(6:72)
1
The proof is omitted because it uses stochastic processes in continuous time. Instead, in Section 6.3.11 we give a simpler statement Lai and (1982) witha proof. P1from Wei P 1 2 2 w ¼ 1 and Note that this theorem covers both cases 1P i 1 wi , 1 : In the n latter case Eqs. (6.70) and (6.72) become w e ¼ O(1): Eq. (6.71) becomes i i 1 P 2 wn ¼ o(1) and trivially follows from 1 w , 1: i 1
6.3.11 A Simple Bound on Martingale Transforms

The following statement is stronger than Lemma 6.2.2.

Lemma. If the m.d. sequence $\{e_n, F_n\}$ satisfies

$\sup_n E(e_n^2|F_{n-1}) < \infty$  (6.73)

and $\{w_n\}$ is premeasurable, then

(i) $\sum_{i=1}^\infty w_ie_i$ converges a.s. on $U = \{\sum_{i=1}^\infty w_i^2 < \infty\}$;
(ii) for every $\eta > 1/2$, with $s_n = \big(\sum_{i=1}^n w_i^2\big)^{1/2}$, we have

$\sum_{i=1}^n w_ie_i = o(s_n(\log s_n)^\eta)$ a.s. on $\Omega\setminus U = \Big\{\sum_{i=1}^\infty w_i^2 = \infty\Big\}.$

Proof. By the truncation argument from Section 6.2.1 $w_ne_n$ can be considered square-integrable, and Eq. (6.7) ensures that $U = \{\sum_{i=1}^\infty (w_i^*)^2 < \infty\}$. In the proof we omit the stars. Statement (i) was actually obtained in the course of the proof of Theorem 6.3.1; see implication (i) ⟹ (ii) in the proof of that theorem.

(ii) $\{w_ne_n\}$ is a m.d. sequence. $U_n = s_n(\log s_n)^\eta$ is nondecreasing, positive and $F_{n-1}$-measurable. Obviously,

$\sum_{i=1}^\infty E((w_ie_i)^2|F_{i-1})U_i^{-2} \le \sum_{i=1}^\infty w_i^2U_i^{-2}\,\sup_n E(e_n^2|F_{n-1}).$

Replace the function $f(x) = x^{-2}$ in the proof of Lemma 6.2.2 by $f(x) = [x(\log x)^{2\eta}]^{-1}$ to get, with sufficiently large $m$ and some constant $c$,

$\sum_{i=m}^\infty \frac{w_i^2}{U_i^2} \le c\sum_{i=m}^\infty \int_{s_{i-1}^2}^{s_i^2}\frac{dx}{x(\log x)^{2\eta}} = c\int_{s_{m-1}^2}^\infty \frac{dx}{x(\log x)^{2\eta}} < \infty$ on $\Omega\setminus U$

because $2\eta > 1$. Now the statement follows from Theorem 6.1.4. ∎
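The bound in part (ii) can be watched numerically. The sketch below is purely illustrative (the weight sequence, the bounded error distribution, and the sample size are my own choices, not from the book): with $w_i \equiv 1$, the normalized transform $|\sum_{i\le n} w_ie_i|/[s_n(\log s_n)^\eta]$ should stay small for $\eta > 1/2$.

```python
import numpy as np

# Illustration of Lemma 6.3.11(ii): S_n = sum w_i e_i with sum w_i^2 = infinity,
# normalized by s_n (log s_n)^eta for eta > 1/2.  All choices are illustrative.
rng = np.random.default_rng(0)
n = 100_000
e = rng.choice([-1.0, 1.0], size=n)   # bounded m.d. sequence (iid signs)
w = np.ones(n)                        # premeasurable weights, sum w_i^2 diverges
S = np.cumsum(w * e)
s = np.sqrt(np.cumsum(w ** 2))        # s_n = (sum_{i<=n} w_i^2)^{1/2}
eta = 0.6
idx = np.arange(9, n)                 # skip tiny n where log s_n is not positive
ratio = np.abs(S[idx]) / (s[idx] * np.log(s[idx]) ** eta)
print(ratio[-1])
```

With a fixed seed the final ratio is a small number, consistent with the $o(\cdot)$ statement; of course a single path cannot prove an almost-sure rate.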
6.3.12 Bounds for Weighted Sums of Squared Martingale Differences

Lemma. Assume the conditions of Lemma 6.3.11. Then

(i) $\sum_{i=1}^\infty |w_i|e_i^2 < \infty$ a.s. on $T = \{\sum_{i=1}^\infty |w_i| < \infty\}$;
(ii) for every $r > 1$

$\sum_{i=1}^n |w_i|e_i^2 = o\Big(\Big(\sum_{i=1}^n |w_i|\Big)^r\Big)$ a.s. on $\Omega\setminus T$.  (6.74)

Proof.

(i) The centered squares $d_i = e_i^2 - E(e_i^2|F_{i-1})$ are m.d.'s and satisfy $E(|d_i|\,|F_{i-1}) \le 2\sup_n E(e_n^2|F_{n-1})$. In the identity

$\sum_{i=1}^\infty |w_i|e_i^2 = \sum_{i=1}^\infty |w_i|d_i + \sum_{i=1}^\infty |w_i|E(e_i^2|F_{i-1})$

the second series on the right clearly converges on $T$. The first series converges by the martingale convergence theorem (Theorem 6.1.3) because $\{|w_i|d_i\}$ is a m.d. sequence and

$\sum_{i=1}^\infty E(|w_id_i|\,|F_{i-1}) \le \sup_n E(|d_n|\,|F_{n-1})\sum_{i=1}^\infty |w_i| < \infty$ on $T$.

(ii) As a first step we prove

$\sum_{i=1}^\infty |w_i|e_i^2c_i^{-r} < \infty$ a.s. on $\Omega\setminus T$,  (6.75)

where $c_i = \sum_{j=1}^i |w_j|$ and, by definition, $0/0 = 0$. The assumption $r > 1$ implies

$\sum_{i=1}^\infty \frac{|w_i|}{c_i^r} = \sum_{i=1}^\infty \frac{c_i - c_{i-1}}{c_i^r} \le \int_{c_1}^\infty \frac{dx}{x^r} < \infty$ a.s. on $\Omega\setminus T$.

Applying statement (i) with $|w_i|c_i^{-r}$ in place of $|w_i|$ we obtain Eq. (6.75). To apply Kronecker's lemma (Lemma 6.1.12), denote $x_i = |w_i|e_i^2$, $a_i = c_i^r$. $\sum_{i=1}^n x_i/a_i$ converges by Eq. (6.75) and $\{a_n\}$ monotonically increases to $\infty$ on $\Omega\setminus T$. Hence,

$\sum_{i=1}^n |w_i|e_i^2c_n^{-r} = \frac1{a_n}\sum_{i=1}^n x_i \to 0.$ ∎
6.3.13 Precise Order of Weighted Squares of Martingale Differences

Lemma. If the m.d. sequence $\{e_n, F_n\}$ satisfies the M–Z condition Eq. (6.65) (in the 1-D case) and $\{w_n\}$ is premeasurable, then

$0 < \liminf_{n\to\infty}\frac{\sum_{i=1}^n |w_i|e_i^2}{\sum_{i=1}^n |w_i|} \le \limsup_{n\to\infty}\frac{\sum_{i=1}^n |w_i|e_i^2}{\sum_{i=1}^n |w_i|} < \infty$ on $\Omega = \Big\{\sum_{i=1}^\infty |w_i| = \infty,\ \sup_i |w_i| < \infty\Big\}.$  (6.76)

Proof. The left inequality in Eq. (6.76) is proved in Lemma 6.2.7. Note that the right inequality strengthens Eq. (6.74) by allowing $r = 1$. Let us prove it. Take some $r \in (1, \min\{2, \alpha/2\})$ and denote $d_i = e_i^2 - E(e_i^2|F_{i-1})$. From the elementary inequality $|a-b|^r \le c_r(a^r + b^r)$, the conditional Jensen inequality (Section 6.1.6) and the M–Z condition [Eq. (6.65)] we get

$E(|d_i|^r|F_{i-1}) \le c_rE\{|e_i|^{2r} + [E(e_i^2|F_{i-1})]^r\,|F_{i-1}\} \le 2c_r\sup_n E(|e_n|^{2r}|F_{n-1}) < \infty$ a.s.  (6.77)

The products $|w_i|d_i$ are m.d.'s. Now consider two cases:

1. Suppose

$\sum_{i=1}^\infty |w_i|^r = \infty.$  (6.78)

Denoting $U_i = \sum_{j=1}^i |w_j|^r$, by Eqs. (6.77) and (6.78) we have

$\sum_{i=m}^\infty \frac{E(|w_id_i|^r|F_{i-1})}{U_i^r} \le c\sum_{i=m}^\infty \frac{|w_i|^r}{U_i^r} = c\sum_{i=m}^\infty \frac{U_i - U_{i-1}}{U_i^r} \le c\int_{U_{m-1}}^\infty \frac{dx}{x^r} < \infty$ a.s.,

where $c$ depends on $\omega$ and $U_{m-1} > 0$. By the martingale strong law (Section 6.1.4)

Eq. (6.78) ⟹ $\sum_{i=1}^n |w_i|d_i = o\Big(\sum_{i=1}^n |w_i|^r\Big)$ a.s.  (6.79)

2. Next assume the opposite of Eq. (6.78):

$\sum_{i=1}^\infty |w_i|^r < \infty.$  (6.80)

Since by Eq. (6.77) $\sum_{i=1}^\infty E(|w_id_i|^r|F_{i-1}) \le c\sum_{i=1}^\infty |w_i|^r < \infty$ in this case, the martingale convergence theorem (Theorem 6.1.3) shows that

Eq. (6.80) ⟹ $\sum_{i=1}^n |w_i|d_i = O(1)$ a.s.  (6.81)

Now we are able to prove

$\sum_{i=1}^n |w_i|d_i = o\Big(\sum_{i=1}^n |w_i|\Big)$ a.s.  (6.82)

From the bound

$\sum_{i=1}^n |w_i| = \sup_j|w_j|\sum_{i=1}^n \frac{|w_i|}{\sup_j|w_j|} \ge \Big(\sup_j|w_j|\Big)^{1-r}\sum_{i=1}^n |w_i|^r$

we see that

$\sum_{i=1}^n |w_i|^r = O\Big(\sum_{i=1}^n |w_i|\Big)$ on $\Omega$.  (6.83)

Take $\omega\in\Omega$. If Eq. (6.78) is true, then Eq. (6.82) follows from Eqs. (6.79) and (6.83). If Eq. (6.80) is true, then Eq. (6.82) is a consequence of Eq. (6.81) and $\sum_{i=1}^\infty |w_i| = \infty$. Finally, Eq. (6.82) and the M–Z condition (Section 6.2.3) imply

$\sum_{i=1}^n |w_i|e_i^2 = \sum_{i=1}^n |w_i|d_i + \sum_{i=1}^n |w_i|E(e_i^2|F_{i-1}) = O\Big(\sum_{i=1}^n |w_i|\Big)$ on $\Omega$. ∎
6.4 STRONG CONSISTENCY FOR MULTIPLE REGRESSION

6.4.1 Notation

In the multiple regression model $y_n = X_n\beta + \varepsilon_n$, $n = 1, 2, \ldots$, where $\varepsilon_n = (e_1,\ldots,e_n)'$, we assume that $\{e_n, F_n\}$ is a m.d. sequence, the parameter vector $\beta$ is $p\times 1$ and the matrix $X_n$ is of size $n\times p$. It is assumed that, as $n$ grows, new equations are appended to the system and the previous equations are not changed. Further, for each $n$ the $n$th row $(x_{n1},\ldots,x_{np})$ is $F_{n-1}$-measurable. The $n$th row is written as the transpose of $x_n = (x_{n1},\ldots,x_{np})'$ and, therefore, if $X_n'X_n$ is nonsingular, the least squares estimate $b_n$ of $\beta$ becomes

$b_n = (X_n'X_n)^{-1}X_n'y_n = \beta + \Big(\sum_{i=1}^n x_ix_i'\Big)^{-1}\sum_{i=1}^n x_ie_i.$

It is seen that the statistical properties of $b_n$ are related to the martingale transform $\sum_{i=1}^n x_ie_i$ and the random matrix

$A_n = \sum_{i=1}^n x_ix_i'.$

That this matrix is a sum of nonnegative definite matrices $x_ix_i'$ is one of the leading ideas in Section 6.4. In particular, some important properties of the sequence $\{A_n\}$ depend on the quadratic forms $x_k'A_k^{-1}x_k$, which we call increments. All statements in Section 6.4 are taken from Lai and Wei (1982).

For a $p\times p$ symmetric matrix $A$ we denote by

$\lambda_{\min}(A) = \lambda_1(A) \le \cdots \le \lambda_p(A) = \lambda_{\max}(A)$  (6.84)

its eigenvalues. $|A|$ stands for the determinant of $A$.
6.4.2 Lemma on Increments

Lemma. Let $B$ be a $p\times p$ matrix and let $w$ be a $p\times 1$ vector. If $A = B + ww'$ is nonsingular, then $w'A^{-1}w = 1 - |B|/|A|$.

Proof. By the partitioned matrix rule [Lemma 1.7.6(i)]

$|B| = |A - ww'| = \begin{vmatrix} 1 & w' \\ w & A \end{vmatrix} = |A|(1 - w'A^{-1}w),$

which gives the desired result. ∎
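The determinant identity above is easy to check numerically. The matrices below are arbitrary test data (my choice, not from the book):

```python
import numpy as np

# Numerical check of Lemma 6.4.2: if A = B + w w' is nonsingular, then
# w' A^{-1} w = 1 - |B|/|A|.
rng = np.random.default_rng(1)
p = 4
B = rng.standard_normal((p, p))
w = rng.standard_normal(p)
A = B + np.outer(w, w)

lhs = w @ np.linalg.solve(A, w)   # w' A^{-1} w without forming the inverse
rhs = 1.0 - np.linalg.det(B) / np.linalg.det(A)
print(abs(lhs - rhs))
```

Using `np.linalg.solve` instead of an explicit inverse is the standard numerically safer choice.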
6.4.3 The Link between Increments and λmax(An)

Lemma. Let $x_1, x_2, \ldots$ be $p\times 1$ vectors and let $A_n = \sum_{i=1}^n x_ix_i'$. Suppose that $A_N$ is nonsingular for some $N$. Then

(i) $\lambda_{\max}(A_n)$ is nondecreasing and $A_n$ is nonsingular for all $n \ge N$.
(ii) If $\lim_{n\to\infty}\lambda_{\max}(A_n) < \infty$, then $\sum_{k=N+1}^\infty x_k'A_k^{-1}x_k < \infty$.
(iii) If $\lim_{n\to\infty}\lambda_{\max}(A_n) = \infty$, then $\sum_{k=N+1}^n x_k'A_k^{-1}x_k = O(\log\lambda_{\max}(A_n))$.

Proof.

(i) By (Gohberg and Kreĭn, 1969, Lemma 1.1) the inequality $A_k \ge A_{k-1}$ implies

$\lambda_j(A_k) \ge \lambda_j(A_{k-1}), \quad j = 1,\ldots,p.$  (6.85)

In particular, $\lambda_{\max}(A_n)$ is nondecreasing. If $A_N$ is nonsingular, then $\lambda_{\min}(A_N)$ is positive and all $A_n$ with $n \ge N$ are nonsingular.

(ii) The equation $A_k = A_{k-1} + x_kx_k'$ and Lemma 6.4.2 imply

$\sum_{k=N+1}^n x_k'A_k^{-1}x_k = \sum_{k=N+1}^n (1 - |A_{k-1}|/|A_k|), \quad n \ge N+1.$  (6.86)

Obviously, Eq. (6.85) implies

$[\lambda_{\max}(A_n)]^p \ge |A_n| = \prod_{j=1}^p \lambda_j(A_n) \ge [\lambda_{\min}(A_n)]^p, \quad n \ge N+1.$  (6.87)

If $\lim_{n\to\infty}\lambda_{\max}(A_n) < \infty$, then Eqs. (6.86) and (6.87) give

$\sum_{k=N+1}^n x_k'A_k^{-1}x_k = \sum_{k=N+1}^n \frac{|A_k| - |A_{k-1}|}{|A_k|} \le \lambda_{\min}^{-p}(A_{N+1})\sum_{k=N+1}^n (|A_k| - |A_{k-1}|) = \lambda_{\min}^{-p}(A_{N+1})(|A_n| - |A_N|) \le [\lambda_{\max}(A_n)/\lambda_{\min}(A_{N+1})]^p \le c, \quad n \ge N+1.$

(iii) As a result of Eq. (6.85) we have $|A_k| \ge |A_{k-1}|$ and

$\frac{|A_k| - |A_{k-1}|}{|A_k|} = \int_{|A_{k-1}|}^{|A_k|}\frac{dx}{|A_k|} \le \int_{|A_{k-1}|}^{|A_k|}\frac{dx}{x} = \ln|A_k| - \ln|A_{k-1}|.$

Summing these inequalities and applying Eqs. (6.86) and (6.87) we get

$\sum_{k=N+1}^n x_k'A_k^{-1}x_k \le \sum_{k=N+1}^n (\ln|A_k| - \ln|A_{k-1}|) = \ln|A_n| - \ln|A_N| = O(\ln|A_n|) = O\{\log[\lambda_{\max}(A_n)]\}.$ ∎
6.4.4 Lemma on Recursions

Define $N = \inf\{n : X_n'X_n \text{ is nonsingular}\}$, $\inf\emptyset = \infty$, and for $n \ge N$ consider the quadratic form

$Q_n = \varepsilon_n'X_n(X_n'X_n)^{-1}X_n'\varepsilon_n = \varepsilon_n'X_nA_n^{-1}X_n'\varepsilon_n.$

Lemma. Let us partition $X_n$ into rows, $X_n = (x_1,\ldots,x_n)'$, and denote $q_k = x_k'A_{k-1}^{-1}x_k$. Then

$A_k^{-1} = A_{k-1}^{-1} - A_{k-1}^{-1}x_kx_k'A_{k-1}^{-1}/(1+q_k),$  (6.88)

$Q_k = Q_{k-1} - \frac{\big(x_k'A_{k-1}^{-1}\sum_{i=1}^{k-1}x_ie_i\big)^2}{1+q_k} + 2\frac{\big(x_k'A_{k-1}^{-1}\sum_{i=1}^{k-1}x_ie_i\big)e_k}{1+q_k} + x_k'A_k^{-1}x_ke_k^2,$  (6.89)

and for $n > N$

$Q_n - Q_N + \sum_{k=N+1}^n \frac{\big(x_k'A_{k-1}^{-1}\sum_{i=1}^{k-1}x_ie_i\big)^2}{1+q_k} = 2\sum_{k=N+1}^n \frac{\big(x_k'A_{k-1}^{-1}\sum_{i=1}^{k-1}x_ie_i\big)e_k}{1+q_k} + \sum_{k=N+1}^n x_k'A_k^{-1}x_ke_k^2.$  (6.90)

Proof. To prove Eq. (6.88), it is enough to check that multiplication of the matrix at the right of Eq. (6.88) by $A_k = A_{k-1} + x_kx_k'$ gives the identity matrix:

$\Big(A_{k-1}^{-1} - \frac{A_{k-1}^{-1}x_kx_k'A_{k-1}^{-1}}{1+q_k}\Big)(A_{k-1} + x_kx_k') = I + A_{k-1}^{-1}x_kx_k' - \frac{A_{k-1}^{-1}x_kx_k'}{1+q_k} - \frac{q_kA_{k-1}^{-1}x_kx_k'}{1+q_k} = I + \Big(1 - \frac{1+q_k}{1+q_k}\Big)A_{k-1}^{-1}x_kx_k' = I.$

Now we prove Eq. (6.89). By Eq. (6.88), for $k > N$,

$x_k'A_k^{-1} = x_k'A_{k-1}^{-1} - \frac{(x_k'A_{k-1}^{-1}x_k)x_k'A_{k-1}^{-1}}{1+q_k} = \frac{x_k'A_{k-1}^{-1}}{1+x_k'A_{k-1}^{-1}x_k}.$  (6.91)

For $k > N$ the definition of $Q_n$ and Eqs. (6.88) and (6.91) give

$Q_k = \Big(\sum_{i=1}^{k-1}x_ie_i + x_ke_k\Big)'A_k^{-1}\Big(\sum_{i=1}^{k-1}x_ie_i + x_ke_k\Big) = \Big(\sum_{i=1}^{k-1}x_ie_i\Big)'A_k^{-1}\sum_{i=1}^{k-1}x_ie_i + 2x_k'A_k^{-1}\Big(\sum_{i=1}^{k-1}x_ie_i\Big)e_k + x_k'A_k^{-1}x_ke_k^2$

$= \Big(\sum_{i=1}^{k-1}x_ie_i\Big)'\Big[A_{k-1}^{-1} - \frac{A_{k-1}^{-1}x_kx_k'A_{k-1}^{-1}}{1+q_k}\Big]\sum_{i=1}^{k-1}x_ie_i + 2\frac{x_k'A_{k-1}^{-1}\big(\sum_{i=1}^{k-1}x_ie_i\big)}{1+q_k}e_k + x_k'A_k^{-1}x_ke_k^2$

$= Q_{k-1} - \frac{\big(x_k'A_{k-1}^{-1}\sum_{i=1}^{k-1}x_ie_i\big)^2}{1+q_k} + 2\frac{\big(x_k'A_{k-1}^{-1}\sum_{i=1}^{k-1}x_ie_i\big)e_k}{1+q_k} + x_k'A_k^{-1}x_ke_k^2,$

which proves Eq. (6.89). Sending the first two terms at the right of Eq. (6.89) to the left side and summing over $k = N+1,\ldots,n$ we get Eq. (6.90). ∎
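The recursion (6.88) is the Sherman–Morrison rank-one update, which allows $A_k^{-1}$ to be maintained without ever re-inverting. The sketch below (arbitrary illustrative data) runs the recursion and compares it against a direct inverse at the end:

```python
import numpy as np

# Sketch of the recursion (6.88): with A_k = A_{k-1} + x_k x_k' and
# q_k = x_k' A_{k-1}^{-1} x_k,
#   A_k^{-1} = A_{k-1}^{-1} - A_{k-1}^{-1} x_k x_k' A_{k-1}^{-1} / (1 + q_k).
rng = np.random.default_rng(2)
p, n = 3, 50
xs = rng.standard_normal((n, p))

A = xs[:p].T @ xs[:p]                 # start once A_N is nonsingular (N = p here)
A_inv = np.linalg.inv(A)
for k in range(p, n):
    x = xs[k]
    q = x @ A_inv @ x
    A_inv = A_inv - np.outer(A_inv @ x, x @ A_inv) / (1.0 + q)
    A = A + np.outer(x, x)

err = np.abs(A_inv - np.linalg.inv(A)).max()
print(err)
```

Each update costs $O(p^2)$ instead of the $O(p^3)$ of a fresh inversion, which is exactly why recursive least squares uses this identity.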
6.4.5 Bounding Qn

Lemma. Let the m.d. sequence $\{e_n, F_n\}$ satisfy

$\sup_n E(e_n^2|F_{n-1}) < \infty$ a.s.  (6.92)

and let $\{x_n\}$ be premeasurable. Put

$L_{<\infty} = \{N < \infty,\ \lim_{n\to\infty}\lambda_{\max}(A_n) < \infty\}, \quad L_{=\infty} = \{N < \infty,\ \lim_{n\to\infty}\lambda_{\max}(A_n) = \infty\}.$

The following statements are true:

(i) $Q_n = O(1)$ a.s. on $L_{<\infty}$.
(ii) For every $\delta > 0$

$Q_n = o([\log\lambda_{\max}(A_n)]^{1+\delta})$ a.s. on $L_{=\infty}$.  (6.93)

(iii) If Eq. (6.92) is replaced by

$\sup_n E(|e_n|^\alpha|F_{n-1}) < \infty$ a.s. for some $\alpha > 2$,  (6.94)

then Eq. (6.93) can be strengthened into

$Q_n = O(\log\lambda_{\max}(A_n))$ a.s. on $L_{=\infty}$.  (6.95)

Proof.

(i) Let $w_k = x_k'A_{k-1}^{-1}\sum_{i=1}^{k-1}x_ie_i/(1+q_k)$ if $k > N$ and $w_k = 0$ if $k \le N$. Since $\{w_n\}$ is premeasurable, it follows from Lemma 6.3.11(ii) that

$\sum_{k=N+1}^n w_ke_k = o\Big[\Big(\sum_{k=N+1}^n w_k^2\Big)^{1/2}\Big(\log\sum_{k=N+1}^n w_k^2\Big)^\eta\Big] = o\Big(\sum_{k=N+1}^n w_k^2\Big)$ on $\Big\{\sum_{N+1}^\infty w_k^2 = \infty\Big\}.$

However, by Lemma 6.3.11(i)

$\sum_{k=N+1}^n w_ke_k = O(1)$ on $\Big\{\sum_{N+1}^\infty w_k^2 < \infty\Big\}.$

The two bounds above imply

$\sum_{k=N+1}^n \frac{x_k'A_{k-1}^{-1}\sum_{i=1}^{k-1}x_ie_i}{1+q_k}e_k = o\Big(\sum_{k=N+1}^n w_k^2\Big) + O(1) = o\Big(\sum_{k=N+1}^n \frac{\big(x_k'A_{k-1}^{-1}\sum_{i=1}^{k-1}x_ie_i\big)^2}{1+q_k}\Big) + O(1).$  (6.96)

By Lemma 6.4.3(ii) $\sum_{k=N+1}^\infty x_k'A_k^{-1}x_k < \infty$ on $L_{<\infty}$. Because $x_k'A_k^{-1}x_k$ is $F_{k-1}$-measurable, by Lemma 6.3.12(i)

$\sum_{k=N+1}^\infty x_k'A_k^{-1}x_ke_k^2 < \infty$ on $L_{<\infty}$.  (6.97)

Equations (6.96) and (6.97) allow us to use Eq. (6.90) to conclude that

$Q_n + \sum_{k=N+1}^n \frac{\big(x_k'A_{k-1}^{-1}\sum_{i=1}^{k-1}x_ie_i\big)^2}{1+q_k}(1+o(1)) = O(1)$ a.s. on $L_{<\infty}$.  (6.98)

This proves statement (i).

(ii) In the event $L_{=\infty}$, by Lemmas 6.4.3(iii) and 6.3.12(ii) for every $\delta > 0$

$\sum_{k=N+1}^n x_k'A_k^{-1}x_ke_k^2 = o\Big[\Big(\sum_{k=N+1}^n x_k'A_k^{-1}x_k\Big)^{1+\delta}\Big] = o[(\log\lambda_{\max}(A_n))^{1+\delta}]$ if $\sum_{k=N+1}^\infty x_k'A_k^{-1}x_k = \infty$

and by Lemma 6.3.12(i)

$\sum_{k=N+1}^n x_k'A_k^{-1}x_ke_k^2 = O(1) = o(\log\lambda_{\max}(A_n))$ if $\sum_{k=N+1}^\infty x_k'A_k^{-1}x_k < \infty$.

Consequently, for every $\delta > 0$

$\sum_{k=N+1}^n x_k'A_k^{-1}x_ke_k^2 = o[(\log\lambda_{\max}(A_n))^{1+\delta}]$ on $L_{=\infty}$.  (6.99)

As above, we apply Eqs. (6.96), (6.99) and (6.90) to derive

$Q_n + \sum_{k=N+1}^n \frac{\big(x_k'A_{k-1}^{-1}\sum_{i=1}^{k-1}x_ie_i\big)^2}{1+q_k}(1+o(1)) = o[(\log\lambda_{\max}(A_n))^{1+\delta}]$ on $L_{=\infty}$  (6.100)

for every $\delta > 0$. The proof of Eq. (6.93) is complete.

(iii) Suppose Eq. (6.94) holds. By Lemma 6.4.2 $x_k'A_k^{-1}x_k = 1 - |A_{k-1}|/|A_k| \le 1$ and therefore

$\Big\{\sum_{k=N+1}^\infty x_k'A_k^{-1}x_k = \infty\Big\} = \Big\{\sup_k x_k'A_k^{-1}x_k < \infty,\ \sum_{k=N+1}^\infty x_k'A_k^{-1}x_k = \infty\Big\}.$

By Lemmas 6.3.13 and 6.4.3(iii)

$\sum_{k=N+1}^n x_k'A_k^{-1}x_ke_k^2 = O\Big(\sum_{k=N+1}^n x_k'A_k^{-1}x_k\Big) = O(\log\lambda_{\max}(A_n)).$

Using this relationship instead of Eq. (6.99) in the proof of Eq. (6.100) we finish the proof of Eq. (6.95). ∎
6.4.6 Case of One Regressor

The beauty of Lemma 6.4.5 is that the convergence properties established in Lemma 6.3.11 are extended to the multiple regression case without deterioration of the rates of convergence. More importantly, Lemma 6.4.5 in the case of one regressor ($p = 1$) provides an improvement of Lemma 6.3.11(ii) under the assumption (6.94). This is the content of the next corollary.

Corollary. Let the m.d. sequence $\{e_n, F_n\}$ satisfy Eq. (6.94) and suppose that the sequence of random weights $\{w_n\}$ is premeasurable. Then in the event $\{\sum_{i=1}^\infty w_i^2 = \infty\}$

$\sum_{i=1}^n w_ie_i = O\Big(\Big[\sum_{i=1}^n w_i^2\log\Big(\sum_{i=1}^n w_i^2\Big)\Big]^{1/2}\Big)$ a.s.

Proof. Set $X_n = (w_1,\ldots,w_n)'$ in Lemma 6.4.5 to obtain

$A_n = X_n'X_n = \sum_{i=1}^n w_i^2, \quad \lambda_{\max}(A_n) = \sum_{i=1}^n w_i^2, \quad X_n'\varepsilon_n = \sum_{i=1}^n w_ie_i,$

$Q_n = \varepsilon_n'X_n(X_n'X_n)^{-1}X_n'\varepsilon_n = \Big(\sum_{i=1}^n w_ie_i\Big)^2\Big(\sum_{i=1}^n w_i^2\Big)^{-1}.$

Then by Eq. (6.95)

$\Big(\sum_{i=1}^n w_ie_i\Big)^2\Big(\sum_{i=1}^n w_i^2\Big)^{-1} = O\Big(\log\sum_{i=1}^n w_i^2\Big)$ on $\Big\{\sum_{i=1}^\infty w_i^2 = \infty\Big\}.$ ∎
6.4.7 Strong Consistency of the Ordinary Least Squares Estimator in Stochastic Regression Models

Consider the regression model $y_n = X_n\beta + \varepsilon_n$, where $\varepsilon_n = (e_1,\ldots,e_n)'$, and denote $A_n = X_n'X_n$. For nonrandom $x_{ij}$, Lai et al. (1978, 1979) proved that the condition $A_n^{-1} \to 0$ is necessary and sufficient for the strong consistency of the OLS estimator $b_n$.³ This condition is equivalent to $\lambda_{\min}(A_n) \to \infty$. Now suppose that the $x_{ij}$ are random and $\lambda_{\min}(A_n) \to \infty$ a.s. Anderson and Taylor (1979) established the strong consistency of $b_n$ under the condition $\lambda_{\max}(A_n) = O(\lambda_{\min}(A_n))$, while Christopeit and Helmes (1980) weakened that condition to $[\lambda_{\max}(A_n)]^r = O(\lambda_{\min}(A_n))$ a.s. for some $r > 1/2$. The following theorem by Lai and Wei (1982) is a substantial improvement of those results.

Theorem. Suppose that in the above regression model, $\{e_n, F_n\}$ is a m.d. sequence such that Eq. (6.94) holds. Further, assume that the $n$th row $x_n$ of $X_n$ is $F_{n-1}$-measurable, for all $n$, and that

$\lambda_{\min}(A_n) \to \infty$ a.s. and $\log\lambda_{\max}(A_n) = o(\lambda_{\min}(A_n))$ a.s.  (6.101)

³ It would be good to include this result in the book but I could not find the complete proof.

Then the least-squares estimator $b_n$ converges a.s. to $\beta$; in fact,

$\|b_n - \beta\| = O\Big(\Big[\frac{\log\lambda_{\max}(A_n)}{\lambda_{\min}(A_n)}\Big]^{1/2}\Big)$ a.s.  (6.102)

Proof. Since $\lambda_{\min}(A_n) \to \infty$ a.s., we have $N < \infty$ a.s. Note the identity

$\Big\|A_n^{-1/2}\sum_{i=1}^n x_ie_i\Big\|^2 = (A_n^{-1/2}X_n'\varepsilon_n)'A_n^{-1/2}X_n'\varepsilon_n = \varepsilon_n'X_nA_n^{-1}X_n'\varepsilon_n = Q_n.$

It allows us to apply Lemma 6.4.5(iii) and condition (6.101) in writing

$\|b_n - \beta\|^2 = \Big\|A_n^{-1}\sum_{i=1}^n x_ie_i\Big\|^2 \le \|A_n^{-1/2}\|^2\Big\|A_n^{-1/2}\sum_{i=1}^n x_ie_i\Big\|^2 = [\lambda_{\min}(A_n)]^{-1}Q_n = O([\log\lambda_{\max}(A_n)]/\lambda_{\min}(A_n)) = o(1).$ ∎
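A quick Monte Carlo sketch of the theorem (the parameter values, the AR(1) regressor, and the sample size are illustrative choices of mine, not from the book): with a predictable stochastic regressor and iid errors (a special case of a m.d. sequence), the OLS estimate should be close to the true $\beta$ for large $n$.

```python
import numpy as np

# Illustration of Theorem 6.4.7: OLS in y_n = x_n' beta + e_n with an
# F_{n-1}-measurable regressor row and m.d. errors.
rng = np.random.default_rng(3)
n = 20_000
beta = np.array([1.0, -2.0])          # illustrative true parameter

z = np.zeros(n)
for t in range(1, n):                 # AR(1) driver for the regressor
    z[t] = 0.5 * z[t - 1] + rng.standard_normal()
lag = np.concatenate([[0.0], z[:-1]]) # x_n uses z_{n-1}, so it is F_{n-1}-measurable
X = np.column_stack([np.ones(n), lag])

e = rng.standard_normal(n)            # iid errors form a m.d. sequence
y = X @ beta + e
b = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.linalg.norm(b - beta))
```

Here $\lambda_{\min}(A_n)$ grows linearly in $n$ while $\log\lambda_{\max}(A_n)$ grows like $\log n$, so condition (6.101) holds and the error is of order $(\log n / n)^{1/2}$.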
6.4.8 Another Version of the Strong Consistency Statement

In Chapter 7 on nonlinear models there is a situation where condition (6.94) is too tough. To this end, the theorem below is useful.

Theorem. If in Theorem 6.4.7 condition (6.94) is replaced by the weaker condition Eq. (6.92) and condition (6.101) is replaced by the stronger one

$\lambda_{\min}(A_n) \to \infty$ a.s. and $[\log\lambda_{\max}(A_n)]^{1+\delta} = o(\lambda_{\min}(A_n))$ for some $\delta > 0$,  (6.103)

then the conclusion [Eq. (6.102)] is true. This follows from Lemma 6.4.5(ii).

6.5 SOME ALGEBRA RELATED TO VECTOR AUTOREGRESSION

6.5.1 The Model and History

Consider the vector autoregressive model

$Y_n = BY_{n-1} + e_n, \quad n = 1, 2, \ldots$  (6.104)

where $B$ is a $p\times p$ nonrandom matrix with real elements and $\{e_n, F_n\}$ is a vector m.d. sequence satisfying

$\sup_n E(\|e_n\|^\alpha|F_{n-1}) < \infty$ a.s. with some $\alpha > 2$.  (6.105)

Equation (6.104) is equivalent to

$Y_n = B^nY_0 + \sum_{i=1}^n B^{n-i}e_i,$  (6.106)

where $Y_0$ is assumed to be $F_0$-measurable. We denote

$Y_n = (Y_{n1},\ldots,Y_{np})', \quad e_n = (e_{n1},\ldots,e_{np})', \quad B = (b_{ij})_{1\le i,j\le p}, \quad b_i = (b_{i1},\ldots,b_{ip})'$  (6.107)

($B$ is partitioned into rows). The least-squares estimate of $b_k$ based on observed $Y_1,\ldots,Y_{n+1}$ is

$\hat b_k(n+1) = \Big(\sum_{i=1}^n Y_iY_i'\Big)^{-1}\sum_{i=1}^n Y_iY_{i+1,k} = b_k + \Big(\sum_{i=1}^n Y_iY_i'\Big)^{-1}\sum_{i=1}^n Y_ie_{i+1,k},$  (6.108)

where the inverse sign denotes the Moore–Penrose generalized inverse. In view of Eq. (6.108) the least-squares estimate is strongly consistent if and only if

$\Big(\sum_{i=1}^n Y_iY_i'\Big)^{-1}\sum_{i=1}^n Y_ie_{i+1}' \to 0$ a.s.  (6.109)

The autoregressive AR(p) model

$y_n = \beta_1y_{n-1} + \cdots + \beta_py_{n-p} + \varepsilon_n$  (6.110)

can be written in the vector form [Eq. (6.104)] with

$Y_n = (y_n, y_{n-1},\ldots,y_{n-p+1})', \quad e_n = (\varepsilon_n, 0,\ldots,0)', \quad B = \begin{pmatrix} \beta_1 & \cdots & \beta_{p-1} & \beta_p \\ & I_{p-1} & & 0 \end{pmatrix}.$  (6.111)

This $B$ is called a companion matrix for Eq. (6.110). Everywhere we denote $z_1,\ldots,z_p$ the eigenvalues of $B$ (the roots of its characteristic polynomial) and put $M = \max|z_j|$, $m = \min|z_j|$. If $m > 1$, the process $\{Y_n\}$ is called purely explosive. If $M \le 1$, we say that it is nonexplosive.

Section 6.5 is based on the paper Lai and Wei (1985). Table 6.1 describes the contributions of various authors prior to that paper.

TABLE 6.1 Contributions to the Consistency Theory of Autoregressions

Mann and Wald (1943): $M < 1$
Rubin (1950), Anderson (1959): $m > 1$
Rao (1961): $p = 2$, $|z_1| < 1$, $|z_2| < 1$
White (1958): $p = 1$, $|z_1| = 1$
Muench (1974), Stigum (1974): roots anywhere

All these authors considered weak consistency. Lai and Wei established strong consistency for Eq. (6.110) in (Lai and Wei, 1983a) and for Eq. (6.104) in (Lai and Wei, 1985). Nielsen (2005) added a deterministic term (see the definition in Chapter 8) in Eq. (6.104). In Nielsen (2008) he discovered that the least-squares estimator for a vector autoregression (VAR) of dimension larger than 1 may be inconsistent. To achieve consistency, we have to decompose $Y_n$ into stationary, unit root and explosive processes and require the unit root component to be of dimension 1 [see (Nielsen, 2008, pp. 3-4) for the decomposition and Assumption E that ensures consistency]. He mentions that Lai and Wei (1985) and Nielsen (2005) overlooked the possibility of singularity. Giving complete proofs is not an option here, and we just require the unit root component to be of dimension 1, remembering that this makes the argument by Lai and Wei correct.
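The companion form (6.111) can be illustrated directly: the eigenvalues of the companion matrix coincide with the roots of $\lambda^p - \beta_1\lambda^{p-1} - \cdots - \beta_p$. The AR(2) coefficients below are an arbitrary stable example of mine:

```python
import numpy as np

# Companion matrix (6.111) for an AR(2): y_n = 0.5 y_{n-1} - 0.3 y_{n-2} + eps_n.
b = np.array([0.5, -0.3])             # illustrative coefficients
p = len(b)
B = np.zeros((p, p))
B[0, :] = b                           # first row: (beta_1, ..., beta_p)
B[1:, :-1] = np.eye(p - 1)            # shifted identity below

eig = np.sort_complex(np.linalg.eigvals(B))
roots = np.sort_complex(np.roots([1.0, -b[0], -b[1]]))  # lambda^2 - b1 lambda - b2
print(np.max(np.abs(eig - roots)))
```

For these coefficients both eigenvalues have modulus $\sqrt{0.3} < 1$, so in the terminology above the process is nonexplosive.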
6.5.2 Growth Rate of Natural Powers of B

For $x\in R^p$ denote $\|x\| = (x'x)^{1/2}$ and for a $p\times p$ matrix $A$ denote $\|A\| = \sup_{\|x\|=1}\|Ax\|$. Let $\lambda_1,\ldots,\lambda_q$, $q \le p$, denote all the distinct eigenvalues of $B$, and express it in its Jordan form

$B = C^{-1}DC,$  (6.112)

where $D = \mathrm{diag}[D_1,\ldots,D_q]$,

$D_j = \begin{pmatrix} \lambda_j & 1 & 0 & \cdots & 0 \\ 0 & \lambda_j & 1 & \cdots & 0 \\ 0 & 0 & \lambda_j & \ddots & \vdots \\ \vdots & \vdots & & \ddots & 1 \\ 0 & 0 & 0 & \cdots & \lambda_j \end{pmatrix}$ is an $m_j\times m_j$ matrix,

$m_j$ is the multiplicity of $\lambda_j$ $\big(\sum_{j=1}^q m_j = p\big)$, and $C$ is a nonsingular $p\times p$ matrix (over the complex field). Denote

$M = \max_j|\lambda_j|, \quad m = \max\{m_j : |\lambda_j| = M\}.$  (6.113)

$m$ is the multiplicity of the largest (in absolute value) eigenvalue of $B$.

Lemma.

(i) There exists a constant $c = c(m, M) > 0$ such that

$\|B^n\| \le cn^{m-1}M^n$ for all natural $n$.  (6.114)

(ii) $\frac1n\log\|B^n\| = (1+o(1))\log M$.
(iii) $\frac1n\log\|B^{-n}\| = -(1+o(1))\log m$.

Proof.

(i) Varga (1962, p. 65) proved that

$\|B^n\| = (1+o(1))cC_n^{m-1}[\lambda_{\max}(B)]^{n-(m-1)},$  (6.115)

where $c$ does not depend on $B$. The quantity

$C_n^{m-1}M^{n-(m-1)} = \frac{n(n-1)\cdots(n-m+2)}{(m-1)!}\frac{M^n}{M^{m-1}} = \frac{\big(1-\frac1n\big)\cdots\big(1-\frac{m-2}n\big)}{(m-1)!M^{m-1}}\,n^{m-1}M^n$  (6.116)

is of order $n^{m-1}M^n$, which proves (i).

(ii) From Eqs. (6.115) and (6.116) it follows that with $c(n) = \big(1-\frac1n\big)\cdots\big(1-\frac{m-2}n\big)[(m-1)!M^{m-1}]^{-1}$ we have

$\frac1n\log\|B^n\| = \frac1n\log[(1+o(1))c\,c(n)] + \frac{m-1}n\log n + \log M.$

This proves (ii).

(iii) is a consequence of (ii) because $\lambda_{\max}(B^{-1}) = 1/\lambda_{\min}(B) = m^{-1}$. ∎
6.5.3 The Meddling Middle Factor

This lemma is used to show that in some situations, for the estimation of $\lambda_{\max}(A_nC_nA_n')$, the factor in the middle, $C_n$, is asymptotically negligible.

Lemma. Let $A$, $C$ be $p\times p$ matrices such that $C$ is symmetric and nonnegative definite. Then

$\lambda_{\min}(C)\lambda_{\max}(AA') \le \lambda_{\max}(ACA') \le \lambda_{\max}(C)\lambda_{\max}(AA').$  (6.117)

Proof. In the proof we use the fact that for any symmetric nonnegative definite matrix $C$

$\lambda_{\max}(C) = \sup_{\|x\|=1}x'Cx = \sup_{\|x\|=1}\|Cx\|.$  (6.118)

Together with the equation $\|A\| = \|A'\|$ this leads to

$\|A\|^2 = \sup_{\|x\|=1}\|A'x\|^2 = \sup_{\|x\|=1}x'AA'x = \lambda_{\max}(AA').$  (6.119)

Using the diagonalization $C = U'\Lambda U$, where $U$ is an orthogonal matrix, $U'U = I$, and $\Lambda$ is a diagonal matrix with the eigenvalues $\lambda_j(C)$ on the main diagonal, we have a two-sided bound on the quadratic form $x'Cx$:

$\lambda_{\min}(C)\|x\|^2 \le x'Cx = (Ux)'\Lambda Ux \le \lambda_{\max}(C)\|x\|^2.$

The right inequality in Eq. (6.117) is proved as follows:

$\lambda_{\max}(ACA') = \sup_{\|x\|=1}x'ACA'x = \sup_{\|x\|=1}(A'x)'C(A'x) \le \lambda_{\max}(C)\sup_{\|x\|=1}\|A'x\|^2 = \lambda_{\max}(C)\lambda_{\max}(AA').$

The left inequality follows from

$\sup_{\|x\|=1}x'ACA'x \ge \sup_{\|x\|=1}\{\lambda_{\min}(C)\|A'x\|^2\} = \lambda_{\min}(C)\lambda_{\max}(AA').$ ∎
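The sandwich (6.117) checks out numerically on random matrices (the test data below are my own illustrative choices):

```python
import numpy as np

# Check of Eq. (6.117):
#   lambda_min(C) lambda_max(AA') <= lambda_max(ACA') <= lambda_max(C) lambda_max(AA')
rng = np.random.default_rng(4)
p = 5
A = rng.standard_normal((p, p))
G = rng.standard_normal((p, p))
C = G @ G.T                            # symmetric nonnegative definite middle factor

lam_AA = np.linalg.eigvalsh(A @ A.T).max()
lam_mid = np.linalg.eigvalsh(A @ C @ A.T).max()
lo = np.linalg.eigvalsh(C).min() * lam_AA
hi = np.linalg.eigvalsh(C).max() * lam_AA
print(lo, lam_mid, hi)
```

`eigvalsh` is used throughout because all three matrices involved are symmetric.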
6.5.4 Lemma on an Enveloping Resolvent

Let us call

$\mathcal{B}F = \sum_{i=0}^\infty B^iF(B^i)'$

an enveloping resolvent (see related definitions in Section 8.4.1). Here $F$ is a $p\times p$ matrix.

Lemma. If $M < 1$, then the equation $X - BXB' = F$ has a unique solution $X = \mathcal{B}F$ for any right-hand side $F$.

Proof. If there are two different solutions, $X_1$ and $X_2$, then the difference $X = X_1 - X_2$ satisfies $X - BXB' = 0$. Iterating this equation, $X = B^nX(B^n)'$ for every $n$, so that $\|X\| \le \|B^n\|^2\|X\|$; by Lemma 6.5.2(i), $\|B^n\| \le cn^{m-1}M^n \to 0$ since $M < 1$, whence $X = 0$. The same bound shows that the series defining $\mathcal{B}F$ converges, and $X = \mathcal{B}F$ is really a solution:

$X - BXB' = \sum_{i=0}^\infty B^iF(B^i)' - \sum_{i=0}^\infty B^{i+1}F(B^{i+1})' = F.$ ∎
6.5.5 Lemma on the Same Order

Lemma. Denote $V_n = \sum_{i=1}^n Y_iY_i'$. Then $\lambda_{\max}(V_n)$ and $\sum_{i=1}^n \|Y_i\|^2$ are of the same order.

Proof. Denoting $\lambda_j$ the eigenvalues of $V_n$ and using properties of the trace we have

$\lambda_{\max}(V_n) \le \sum_{i=1}^p \lambda_i(V_n) = \operatorname{tr}V_n = \operatorname{tr}\sum_{i=1}^n Y_iY_i' = \sum_{i=1}^n \operatorname{tr}Y_i'Y_i = \sum_{i=1}^n \|Y_i\|^2 \le p\lambda_{\max}(V_n).$  (6.120) ∎
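The trace sandwich (6.120) is immediate to verify on data (the vectors below are arbitrary illustrative draws):

```python
import numpy as np

# Check of Eq. (6.120): lambda_max(V_n) <= sum ||Y_i||^2 = tr V_n <= p lambda_max(V_n).
rng = np.random.default_rng(6)
p, n = 4, 30
Y = rng.standard_normal((n, p))        # rows play the role of Y_i'
V = Y.T @ Y                            # V_n = sum_i Y_i Y_i'
lam_max = np.linalg.eigvalsh(V).max()
total = (Y ** 2).sum()                 # sum of ||Y_i||^2, equal to tr V_n
print(lam_max, total, p * lam_max)
```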
6.5.6 Towards the Lower Bound

Let $\varphi(\lambda) = |B - \lambda I_p| = \lambda^p + a_1\lambda^{p-1} + \cdots + a_p$ be the characteristic polynomial of $B$ and let $a_0 = 1$. The variables

$Z_n = Y_n + \sum_{j=1}^p a_jY_{n-j} = \sum_{j=0}^p a_jY_{n-j}, \quad n \ge p,$  (6.121)

are instrumental in the estimation of $\lambda_{\min}(V_n)$ from below.

Lemma. $\lambda_{\min}\Big(\sum_{i=p}^n Z_iZ_i'\Big) \le (p+1)\Big(\sum_{j=0}^p a_j^2\Big)\lambda_{\min}\Big(\sum_{i=0}^n Y_iY_i'\Big).$

Proof. Let $u$ be a unit vector, $\|u\| = 1$. By Eq. (6.121) and Hölder's inequality

$u'Z_iZ_i'u = \Big(\sum_{j=0}^p a_ju'Y_{i-j}\Big)^2 \le \sum_{j=0}^p a_j^2\sum_{j=0}^p (u'Y_{i-j})^2$

and therefore

$\sum_{i=p}^n u'Z_iZ_i'u \le \Big(\sum_{j=0}^p a_j^2\Big)\sum_{i=p}^n\sum_{j=0}^p (u'Y_{i-j})^2$

(in the inner sum replace $i - j = k$)

$= \Big(\sum_{j=0}^p a_j^2\Big)\sum_{i=p}^n\sum_{k=i-p}^i (u'Y_k)^2$

(changing the summation order; each $k$ is counted at most $p+1$ times)

$\le (p+1)\Big(\sum_{j=0}^p a_j^2\Big)\sum_{k=0}^n u'Y_kY_k'u.$

The lemma follows from this inequality and the fact that for any symmetric matrix $A$ the equation $\lambda_{\min}(A) = \inf_{\|u\|=1}u'Au$ is true. ∎
CHAPTER 6 CONVERGENCE ALMOST EVERYWHERE
6.5.7 Representation in Terms of Errors

Denote

$$C_j = \sum_{i=0}^{j} a_iB^{j-i}, \quad j = 0, \ldots, p-1. \qquad (6.122)$$

Lemma

(i) Along with Eq. (6.121) there is the representation

$$Z_k = \sum_{j=0}^{p-1} C_je_{k-j}. \qquad (6.123)$$

(ii) Denoting

$$A_{k,l} = \sum_{j=k-l+1}^{p-1} e_{k-j}'C_j' = \sum_{m=k-p+1}^{l-1} e_m'C_{k-m}', \qquad (6.124)$$

$$R_n = \sum_{l=2}^{n}\sum_{k=l}^{l+p-2}\left(A_{k,l}'e_l'C_{k-l}' + C_{k-l}e_lA_{k,l}\right), \qquad (6.125)$$

we have

$$\sum_{i=p}^{n} Z_iZ_i' = \sum_{j=0}^{p-1} C_j\left(\sum_{k=p}^{n} e_{k-j}e_{k-j}'\right)C_j' + R_n. \qquad (6.126)$$
Proof. (i) The recursion Eq. (6.106), applied to $Y_{k-j}$, $j \le p$, is used to reveal its dependence on $Y_{k-p}$:

$$Y_{k-j} = B^{k-j}Y_0 + \sum_{i=1}^{k-p} B^{k-j-i}e_i + \sum_{i=k-p+1}^{k-j} B^{k-j-i}e_i = B^{p-j}\left(B^{k-p}Y_0 + \sum_{i=1}^{k-p} B^{k-p-i}e_i\right) + \sum_{i=k-p+1}^{k-j} B^{k-j-i}e_i = B^{p-j}Y_{k-p} + \sum_{i=k-p+1}^{k-j} B^{k-j-i}e_i$$

(the sum in the expression above is empty when $j = p$). Now we plug these expressions into Eq. (6.121) and use the fact that $B$ satisfies its determinantal equation $\psi(B) = 0$ by the Cayley–Hamilton theorem (see Herstein, 1975). For $p \le i \le n$

$$Z_i = \sum_{j=0}^{p} a_jY_{i-j} = \left(\sum_{j=0}^{p} a_jB^{p-j}\right)Y_{i-p} + \sum_{j=0}^{p} a_j\sum_{h=i-p+1}^{i-j} B^{i-j-h}e_h$$

[the first sum is null; in the second one change the summation order and apply notation Eq. (6.122)]

$$= \sum_{h=i-p+1}^{i}\left(\sum_{j=0}^{i-h} a_jB^{i-j-h}\right)e_h = \sum_{h=i-p+1}^{i} C_{i-h}e_h = \sum_{j=0}^{p-1} C_je_{i-j}.$$
This is Eq. (6.123).

(ii) We substitute Eq. (6.123) in $\sum_{k=p}^{n} Z_kZ_k'$, multiply through the inner sums and sort the terms into groups with $i = j$, $i < j$ and $i > j$:

$$\sum_{k=p}^{n} Z_kZ_k' = \sum_{k=p}^{n}\left(\sum_{i=0}^{p-1} C_ie_{k-i}\right)\left(\sum_{j=0}^{p-1} e_{k-j}'C_j'\right) = S_{i=j} + S_{i<j} + S_{i>j}. \qquad (6.127)$$

The terms on the right-hand side are rearranged as below:

$$S_{i=j} = \sum_{k=p}^{n}\sum_{i=0}^{p-1} C_ie_{k-i}e_{k-i}'C_i' = \sum_{i=0}^{p-1} C_i\left(\sum_{k=p}^{n} e_{k-i}e_{k-i}'\right)C_i', \qquad (6.128)$$

$$S_{i<j} = \sum_{k=p}^{n}\sum_{i=0}^{p-2}\sum_{j=i+1}^{p-1} C_ie_{k-i}e_{k-j}'C_j'$$

(replacing $i$ according to $k - i = l$)

$$= \sum_{k=p}^{n}\sum_{l=k-p+2}^{k} C_{k-l}e_l\left(\sum_{j=k-l+1}^{p-1} e_{k-j}'C_j'\right) = \sum_{l=2}^{n}\sum_{k=l}^{l+p-2} C_{k-l}e_l\left(\sum_{j=k-l+1}^{p-1} e_{k-j}'C_j'\right) \qquad (6.129)$$

and

$$S_{i>j} = \sum_{k=p}^{n}\sum_{j=0}^{p-2}\sum_{i=j+1}^{p-1} C_ie_{k-i}e_{k-j}'C_j'$$

(replacing $j$ according to $k - j = l$)

$$= \sum_{k=p}^{n}\sum_{l=k-p+2}^{k}\left(\sum_{i=k-l+1}^{p-1} C_ie_{k-i}\right)e_l'C_{k-l}' = \sum_{l=2}^{n}\sum_{k=l}^{l+p-2}\left(\sum_{j=k-l+1}^{p-1} C_je_{k-j}\right)e_l'C_{k-l}'. \qquad (6.130)$$

Collecting Eqs. (6.127)–(6.130) and using notations (6.124) and (6.125) we finish the proof of Eq. (6.126). $\blacksquare$
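Representation (6.123) can be verified numerically: with $a_0 = 1, a_1, \ldots, a_p$ the characteristic polynomial coefficients of $B$ and $C_j$ as in Eq. (6.122), the filtered variable $Z_k = \sum_{j=0}^{p} a_jY_{k-j}$ depends only on the last $p$ errors. A sketch (the particular $B$ and the simulated errors are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 3
B = rng.standard_normal((p, p)) * 0.4
a = np.poly(B)   # [1, a_1, ..., a_p]: characteristic polynomial coefficients of B
# C_j = sum_{i<=j} a_i B^{j-i}, j = 0, ..., p-1, as in Eq. (6.122)
C = [sum(a[i] * np.linalg.matrix_power(B, j - i) for i in range(j + 1)) for j in range(p)]

n = 20
e = rng.standard_normal((n + 1, p))
Y = np.zeros((n + 1, p))
for t in range(1, n + 1):
    Y[t] = B @ Y[t - 1] + e[t]    # recursion Y_n = B Y_{n-1} + e_n with Y_0 = 0

k = 10                             # any k >= p
Z_direct = sum(a[j] * Y[k - j] for j in range(p + 1))   # Eq. (6.121)
Z_repr = sum(C[j] @ e[k - j] for j in range(p))          # Eq. (6.123)
assert np.allclose(Z_direct, Z_repr)
```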
6.5.8 Spectrum-Separating Decomposition

Let $M > 1 \ge m$. From the rational canonical form (over the real field) of the real matrix $B$ (see Herstein, 1975, p. 307) it follows that there exists a nonsingular real matrix $M$ such that

$$B = M^{-1}\begin{pmatrix} B_1 & 0 \\ 0 & B_2 \end{pmatrix}M, \qquad (6.131)$$

where $B_1$ and $B_2$ are real square matrices such that

$$\min_j|\lambda_j(B_1)| > 1, \quad \max_j|\lambda_j(B_2)| \le 1. \qquad (6.132)$$

Substituting Eq. (6.131) in $Y_n = BY_{n-1} + e_n$ and premultiplying the resulting equation by $M$ we obtain

$$MY_n = \begin{pmatrix} B_1 & 0 \\ 0 & B_2 \end{pmatrix}MY_{n-1} + Me_n.$$

With the conformal partitioning

$$MY_n = \begin{pmatrix} S_n \\ T_n \end{pmatrix}, \quad Me_n = \begin{pmatrix} \xi_n \\ \zeta_n \end{pmatrix} \qquad (6.133)$$

the system breaks up into

$$S_n = B_1S_{n-1} + \xi_n, \quad T_n = B_2T_{n-1} + \zeta_n. \qquad (6.134)$$

The first of the processes in Eq. (6.134) is purely explosive and the second one is nonexplosive. As a result of the nonsingularity of $M$, estimating $V_n$ is the same as estimating

$$MV_nM' = \sum_{i=1}^{n}(MY_i)(MY_i)' = \begin{pmatrix} \sum_{i=1}^{n} S_iS_i' & \sum_{i=1}^{n} S_iT_i' \\ \sum_{i=1}^{n} T_iS_i' & \sum_{i=1}^{n} T_iT_i' \end{pmatrix}. \qquad (6.135)$$
6.5.9 Exploiting Order Properties of Symmetric Matrices

Lemma. Let $A$ be a $p\times p$ symmetric positive definite matrix.

(i) If $A^{-1} = I_p + B + C$, where $B$, $C$ are symmetric, $B$ is nonnegative definite and $\|C\| < 1$, then

$$\|A\| \le 1/(1 - \|C\|). \qquad (6.136)$$

(ii) If $A$ is partitioned as

$$A = \begin{pmatrix} P & H \\ H' & Q \end{pmatrix},$$

where $P$ and $Q$ are, respectively, $r\times r$ and $s\times s$ matrices such that $p = r + s$, then for $u \in \mathbf{R}^r$

$$\begin{pmatrix} u \\ 0 \end{pmatrix}'A^{-1}\begin{pmatrix} u \\ 0 \end{pmatrix} \le u'P^{-1}u\,(1 + \|A^{-1}\|\operatorname{tr}Q). \qquad (6.137)$$

Proof. (i) Note that

$$\lambda_{\min}(A^{-1}) = \inf_{\|x\|=1} x'A^{-1}x = \inf_{\|x\|=1}(x'x + x'Bx + x'Cx) \ge \inf_{\|x\|=1} x'x - \sup_{\|x\|=1}|x'Cx| \ge 1 - \|C\|.$$

Since $\|A\| = \lambda_{\max}(A) = 1/\lambda_{\min}(A^{-1})$, Eq. (6.136) follows.

(ii) By the partitioned matrix inversion rule [Lemma 1.7.6(ii)]

$$A^{-1} = \begin{pmatrix} P^{-1} + P^{-1}HGH'P^{-1} & -P^{-1}HG \\ -GH'P^{-1} & G \end{pmatrix}, \qquad (6.138)$$

where

$$G^{-1} = Q - H'P^{-1}H \qquad (6.139)$$

is positive definite because $G = (0\;\; I)A^{-1}\begin{pmatrix} 0 \\ I \end{pmatrix}$ is positive definite. Taking $x_2 \in \mathbf{R}^s$, $0 \in \mathbf{R}^r$ and letting $y = \begin{pmatrix} 0 \\ x_2 \end{pmatrix}$ we have from Eq. (6.138)

$$\|G\| = \sup_{\|x_2\|=1} x_2'Gx_2 = \sup_{\|x_2\|=1} y'A^{-1}y \le \sup_{x\in\mathbf{R}^p,\;\|x\|=1} x'A^{-1}x = \|A^{-1}\|. \qquad (6.140)$$

By Eq. (6.139) $\operatorname{tr}(H'P^{-1}H) = \operatorname{tr}Q - \operatorname{tr}G^{-1} \le \operatorname{tr}Q$. Therefore

$$\lambda_{\max}(P^{-1/2}HH'P^{-1/2}) \le \operatorname{tr}(P^{-1/2}HH'P^{-1/2}) = \operatorname{tr}(H'P^{-1}H) \le \operatorname{tr}Q. \qquad (6.141)$$

In the equation

$$\begin{pmatrix} u \\ 0 \end{pmatrix}'A^{-1}\begin{pmatrix} u \\ 0 \end{pmatrix} = u'P^{-1}u + u'P^{-1}HGH'P^{-1}u \qquad (6.142)$$

we need to estimate only the end term. Using the inequality $v'Cv \le \|C\|\|v\|^2$ ($C$ symmetric) twice we get

$$u'P^{-1}HGH'P^{-1}u = (H'P^{-1}u)'G(H'P^{-1}u) \le \|G\|\|H'P^{-1}u\|^2 = \|G\|u'P^{-1}HH'P^{-1}u = \|G\|(P^{-1/2}u)'P^{-1/2}HH'P^{-1/2}(P^{-1/2}u) \le \|G\|\,\lambda_{\max}(P^{-1/2}HH'P^{-1/2})\,u'P^{-1}u. \qquad (6.143)$$

Combining Eqs. (6.140)–(6.143) we obtain Eq. (6.137). $\blacksquare$
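Bound (6.137) of part (ii) can be spot-checked on a random positive definite matrix (the partition sizes and data below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
r, s = 3, 2
p = r + s
S = rng.standard_normal((p, p))
A = S @ S.T + np.eye(p)                 # symmetric positive definite, partitioned as (P H; H' Q)
P, Q = A[:r, :r], A[r:, r:]

u = rng.standard_normal(r)
x = np.concatenate([u, np.zeros(s)])
lhs = x @ np.linalg.solve(A, x)          # (u, 0)' A^{-1} (u, 0)
Ainv_norm = np.linalg.norm(np.linalg.inv(A), 2)   # spectral norm of A^{-1}
rhs = (u @ np.linalg.solve(P, u)) * (1 + Ainv_norm * np.trace(Q))
assert lhs <= rhs + 1e-9
```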
6.6 PRELIMINARY ANALYSIS

6.6.1 Bound on the Error Norm

Lemma. If the m.d. sequence $\{e_n, \mathcal{F}_n\}$ satisfies Eq. (6.105), then for any $b > 1/a$

$$\|e_n\| = o(n^b) \text{ a.s.} \qquad (6.144)$$

Proof. By the conditional Chebyshev inequality (Lemma 6.1.7) for any $\delta > 0$

$$P(\|e_i\|^a > \delta i^{ab} \mid \mathcal{F}_{i-1}) \le \frac{E(\|e_i\|^a \mid \mathcal{F}_{i-1})}{\delta i^{ab}} \le \frac{c}{\delta i^{ab}}.$$

As $ab > 1$, we get

$$\sum_{i=1}^{\infty} P(\|e_i\|^a > \delta i^{ab} \mid \mathcal{F}_{i-1}) \le \frac{c}{\delta}\sum_{i=1}^{\infty} i^{-ab} < \infty.$$

Letting $Z_i = 1(\|e_i\|^a > \delta i^{ab})$ in the conditional Borel–Cantelli lemma (Lemma 6.1.2) we see that $\sum_{i=1}^{\infty} Z_i < \infty$ a.s. This means that $\|e_i\|^a \le \delta i^{ab}$ for all large $i$. Since this is true for any $\delta > 0$, we obtain Eq. (6.144). $\blacksquare$
6.6.2 Laws of the Iterated Logarithm-Type Upper Bound

The bound (6.145) below and, more generally, Eq. (6.72) exhibit the same rate as appears in the so-called laws of the iterated logarithm; see Stout (1974).

Lemma. Suppose that the (real-valued) m.d. sequence $\{e_n, \mathcal{F}_n\}$ satisfies Eq. (6.105) and that $\lambda$ is a complex number with $|\lambda| = 1$. Denote $\Re(a)$ and $\Im(a)$ the real and imaginary parts of a complex number $a$ and put

$$R_n = \sum_{j=1}^{n}\Re(\lambda^je_{jk}), \quad I_n = \sum_{j=1}^{n}\Im(\lambda^je_{jk}), \quad k = 1, \ldots, p.$$

Then

$$R_n = O((n\log\log n)^{1/2}), \quad I_n = O((n\log\log n)^{1/2}) \text{ a.s.} \qquad (6.145)$$

Proof. By the Euler formula $\lambda = e^{i\varphi}$, $i = \sqrt{-1}$, implies $\lambda^j = e^{ij\varphi} = \cos j\varphi + i\sin j\varphi$ and $R_n = \sum_{j=1}^{n} e_{jk}\cos j\varphi$. Here $\{e_{jk}\cos j\varphi, \mathcal{F}_j\}$ is a m.d. sequence such that

$$\sup_j E(|e_{jk}\cos j\varphi|^a \mid \mathcal{F}_{j-1}) \le \sup_j E(\|e_j\|^a \mid \mathcal{F}_{j-1}) < \infty.$$

Putting $w_n = 1$ for all $n$, we satisfy all conditions required for Wei's bound (Section 6.3.10), where $s_n = n^{1/2}$ and $|w_n| = 1 = o(s_n^c) = o(n^{c/2})$ for all $0 < c < 1$. Therefore

$$R_n = O(n^{1/2}(\log\log n^{1/2})^{1/2}) = O((n\log\log n)^{1/2}).$$

Obviously, such a proof works for $I_n$ too. $\blacksquare$
6.6.3 Case of One Jordan Cell

Lemma. Let $\lambda$ be a complex number with $|\lambda| = 1$. Define a $p\times p$ matrix

$$D = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ 0 & 0 & \lambda & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda \end{pmatrix}.$$

If the m.d. sequence $\{e_n, \mathcal{F}_n\}$ satisfies Eq. (6.105), then

$$\sum_{i=1}^{n} D^{n-i}e_i = O(n^{p-1/2}(\log\log n)^{1/2}) \text{ a.s.}$$
Proof. We begin with the expression for the powers of $D$:

$$D^k = \begin{pmatrix} \lambda^k & \binom{k}{1}\lambda^{k-1} & \cdots & \binom{k}{p-1}\lambda^{k-p+1} \\ 0 & \lambda^k & \cdots & \binom{k}{p-2}\lambda^{k-p+2} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda^k \end{pmatrix}, \quad k = 0, 1, \ldots \qquad (6.146)$$

Here we set $\binom{a}{0} = 1$, $\binom{a}{b} = 0$ if $a < b$ and $\binom{a}{b} = \frac{a!}{b!(a-b)!}$ if $a \ge b$. Therefore

$$D^{n-i}e_i = \left(\sum_{\nu=0}^{p-1}\binom{n-i}{\nu}\lambda^{n-i-\nu}e_{i,\nu+1},\ \ldots,\ \sum_{\nu=0}^{0}\binom{n-i}{\nu}\lambda^{n-i-\nu}e_{i,\nu+p}\right)'.$$

Introducing for $\nu = 0, 1, \ldots, p-1$ and $k = \nu+1, \ldots, p$ the sum

$$S_n(\nu, k) = \lambda^{n-\nu}\sum_{i=1}^{n}\binom{n-i}{\nu}\lambda^{-i}e_{ik}, \qquad (6.147)$$

we have

$$\sum_{i=1}^{n} D^{n-i}e_i = \left(\sum_{\nu=0}^{p-1} S_n(\nu, \nu+1),\ \ldots,\ \sum_{\nu=0}^{0} S_n(\nu, \nu+p)\right)'. \qquad (6.148)$$

By partial summation, $\sum_{i=1}^{n} a_ib_i = a_n\sum_{i=1}^{n} b_i + \sum_{j=1}^{n-1}(a_j - a_{j+1})\sum_{i=1}^{j} b_i$, we have

$$\sum_{i=1}^{n}\binom{n-i}{\nu}\Re(\lambda^{-i}e_{ik}) = \binom{0}{\nu}R_n + \sum_{j=1}^{n-1}\left[\binom{n-j}{\nu} - \binom{n-j-1}{\nu}\right]R_j, \qquad (6.149)$$

where $R_j$ and $\Re(\lambda^{-i}e_{ik})$ are from Lemma 6.6.2 (applied with the unit-modulus number $\lambda^{-1}$ in place of $\lambda$). Note that, as $m \to \infty$,

$$\binom{m}{\nu} - \binom{m-1}{\nu} = \binom{m-1}{\nu-1} = \frac{(m-1)\cdots(m-(\nu-1))}{(\nu-1)!} \asymp \frac{m^{\nu-1}}{(\nu-1)!}. \qquad (6.150)$$

(The equivalence relation "$\asymp$" between two sequences $a_n$ and $b_n$ means $c_1a_n \le b_n \le c_2a_n$ with constants independent of $n$.) For moderate $m$ the left side is bounded. Lemma 6.6.2 and Eqs. (6.149) and (6.150) lead to the bound

$$\sum_{i=1}^{n}\binom{n-i}{\nu}\Re(\lambda^{-i}e_{ik}) = O((n\log\log n)^{1/2}) + \sum_{j=1}^{n-1} O((n-j)^{\nu-1})O((j\log\log j)^{1/2}) = O(n^{\nu+1/2}(\log\log n)^{1/2}) \text{ a.s.} \qquad (6.151)$$

Likewise,

$$\sum_{i=1}^{n}\binom{n-i}{\nu}\Im(\lambda^{-i}e_{ik}) = O(n^{\nu+1/2}(\log\log n)^{1/2}) \text{ a.s.} \qquad (6.152)$$

Since $\nu \le p - 1$ and $|\lambda| = 1$, Eqs. (6.151) and (6.152) imply, in view of Eq. (6.147), that $S_n(\nu, k) = O(n^{p-1/2}(\log\log n)^{1/2})$ a.s. Therefore the lemma follows from Eq. (6.148). $\blacksquare$
6.6.4 Order of Magnitude of $\|Y_n\|$ in the Case $M < 1$

Lemma. Let the m.d. sequence $\{e_n, \mathcal{F}_n\}$ satisfy Eq. (6.105) and let $M < 1$. Then $\|Y_n\| = o(n^b)$ a.s. for every $b > 1/a$.

Proof. As $M < 1$, we can simplify Eq. (6.114) as

$$\|B^n\| \le cn^{m-1}M^n = c(n^{m-1}M^{n/2})M^{n/2} \le c_1M^{n/2}. \qquad (6.153)$$

By Lemma 6.6.1 for any $\varepsilon > 0$ there exists $I(\varepsilon) > 0$ such that

$$\|e_i\| \le \varepsilon i^b, \quad i \ge I(\varepsilon). \qquad (6.154)$$

We can extend this bound by writing

$$\|e_i\| \le c_2, \quad i < I(\varepsilon). \qquad (6.155)$$

Apply Eq. (6.106) and Eqs. (6.153)–(6.155) to get

$$\|Y_n\| \le \|B^n\|\|Y_0\| + \sum_{i=1}^{n}\|B^{n-i}\|\|e_i\| \le c_1\|Y_0\|M^{n/2} + c_1c_2\sum_{i=1}^{I(\varepsilon)-1} M^{(n-i)/2} + c_1\varepsilon\sum_{i=I(\varepsilon)}^{n} M^{(n-i)/2}i^b$$

$$= o(n^b) + c_3M^{(n-I(\varepsilon))/2} + c_1\varepsilon n^b\sum_{i=I(\varepsilon)}^{n} M^{(n-i)/2}\left(\frac{i}{n}\right)^b \le o(n^b) + c_1\varepsilon n^b\sum_{i=0}^{\infty} M^{i/2}.$$

Since $\varepsilon > 0$ is arbitrary, the lemma is proved. $\blacksquare$
6.6.5 Order of Magnitude of $\|Y_n\|$ in the Case $M = 1$

Lemma. Let the m.d. sequence $\{e_n, \mathcal{F}_n\}$ satisfy Eq. (6.105) and let $M = 1$. Then $\|Y_n\| = O(n^{m-1/2}(\log\log n)^{1/2})$ a.s.

Proof. Plugging the Jordan representation (6.112) in the autoregression (6.104) and premultiplying the resulting equation by $C$ we obtain

$$CY_n = DCY_{n-1} + Ce_n. \qquad (6.156)$$

Partition the vectors $CY_n$ and $Ce_n$ conformably with $D$ as

$$CY_n = \begin{pmatrix} z_n^{(1)} \\ \vdots \\ z_n^{(q)} \end{pmatrix}, \quad Ce_n = \begin{pmatrix} u_n^{(1)} \\ \vdots \\ u_n^{(q)} \end{pmatrix}, \qquad (6.157)$$

where $z_n^{(j)}$ and $u_n^{(j)}$ are $m_j\times 1$ vectors, $j = 1, \ldots, q$. The properties of $e_n$ imply $\sup_{j,n} E(\|u_n^{(j)}\|^a \mid \mathcal{F}_{n-1}) < \infty$. Then, because $D$ is block-diagonal, system (6.156) breaks up into $q$ equations

$$z_n^{(j)} = D_jz_{n-1}^{(j)} + u_n^{(j)} = D_j^nz_0^{(j)} + \sum_{i=1}^{n} D_j^{n-i}u_i^{(j)}. \qquad (6.158)$$

For $j$ with $|\lambda_j| = 1$, from Eq. (6.158) and Lemmas 6.5.2 and 6.6.3 we obtain

$$\|z_n^{(j)}\| = O(n^{m_j-1}) + O(n^{m_j-1/2}(\log\log n)^{1/2}) = O(n^{m_j-1/2}(\log\log n)^{1/2}). \qquad (6.159)$$

For $j$ with $|\lambda_j| < 1$, by part (i) of Lemma 6.5.2, $\|z_n^{(j)}\| = o(n^{1/2})$. Hence, Eq. (6.159) holds for every $j$. This proves the statement because $C$ is nonsingular. $\blacksquare$
6.6.6 Order of Magnitude of $\|Y_n\|$ in the Case $M > 1$

Lemma. Let $\{e_n, \mathcal{F}_n\}$ be a m.d. sequence satisfying Eq. (6.105). If $M > 1$, then $\|Y_n\| = O(n^{m-1}M^n)$ a.s., where $m$ and $M$ are as defined in Eq. (6.113).

Proof. We use Eq. (6.156) with $CY_n$ and $Ce_n$ partitioned as in Eq. (6.157). Equations (6.154) and (6.155) imply $\|u_i^{(j)}\| \le c\max\{1, \varepsilon i^b\}$. Using this bound, Eq. (6.158) and Eq. (6.114), we estimate one component of $CY_n$ as follows:

$$\|z_n^{(j)}\| \le \|D_j^n\|\|z_0^{(j)}\| + \sum_{i=1}^{n}\|D_j^{n-i}\|\|u_i^{(j)}\| \le c_1n^{m_j-1}|\lambda_j|^n + c_2\sum_{i=1}^{n}(n-i)^{m_j-1}|\lambda_j|^{n-i}\max\{1, \varepsilon i^b\}$$

$$= c_1n^{m_j-1}|\lambda_j|^n\left[1 + c_3\sum_{i=1}^{n}\left(1 - \frac{i}{n}\right)^{m_j-1}|\lambda_j|^{-i}\max\{1, \varepsilon i^b\}\right] \le c_4n^{m_j-1}|\lambda_j|^n.$$

This implies the required bound on $Y_n$. $\blacksquare$
6.6.7 Generalization of Rubin's Theorem

For purely explosive systems (i.e. $m = \min|\lambda_j| > 1$), Rubin (1950) showed in the 1-D case that if the $e_n$ are i.i.d. random variables with $Ee_1 = 0$ and $Ee_1^2 = \sigma^2 > 0$, then $Y_n$ diverges exponentially fast, so that

$$P(B^{-n}Y_n \text{ converges to a nonzero limit}) = 1.$$

The theorem below is a multivariate generalization of Rubin's result.

Theorem. Let the m.d. sequence $\{e_n, \mathcal{F}_n\}$ satisfy Eq. (6.105) and let $Y_n$ be defined by Eq. (6.106). If $m > 1$, then $B$ is invertible and

$$B^{-n}Y_n \text{ converges a.s. to } Y = Y_0 + \sum_{i=1}^{\infty} B^{-i}e_i. \qquad (6.160)$$

If, furthermore,

$$\liminf_{n\to\infty}\lambda_{\min}\left(E\left[\sum_{k=1}^{r} B^{r-k}e_{nr+k}e_{nr+k}'(B^{r-k})'\,\middle|\,\mathcal{F}_{nr}\right]\right) > 0 \qquad (6.161)$$

for some $r \ge 1$, then the limit $Y$ in Eq. (6.160) has the property that

$$a'Y \text{ has a continuous distribution for all } a \ne 0. \qquad (6.162)$$

Proof. Let $Z_n = Y_0 + \sum_{i=1}^{n} B^{-i}e_i$ be the initial segment of Eq. (6.160). By Eq. (6.106) $B^{-n}Y_n = Z_n$. By Lemma 6.5.2 $\|B^{-n}\| \le cm^{-n}$. This bound and Eq. (6.105) imply

$$\sum_{i=1}^{\infty} E(\|B^{-i}e_i\|^2 \mid \mathcal{F}_{i-1}) \le \sum_{i=1}^{\infty}\|B^{-i}\|^2\sup_n E(\|e_n\|^2 \mid \mathcal{F}_{n-1}) < \infty \text{ a.s.}$$

By the martingale convergence theorem (Theorem 6.1.3) $Z_n \to Y$ a.s.

Now we turn to the proof of Eq. (6.162). Note that, since $B^{-i}$ is nonsingular, the vectors $(B^{-i})'a$, $i = 1, 2, \ldots$, are nonzero for $a \ne 0$. Let us write the series in Eq. (6.160) in batches of $r$ terms:

$$\sum_{i=1}^{\infty} B^{-i}e_i = \sum_{n=0}^{\infty}\sum_{k=1}^{r} B^{-nr-k}e_{nr+k} = \sum_{n=0}^{\infty} B^{-(n+1)r}\sum_{k=1}^{r} B^{r-k}e_{nr+k}.$$

Denote

$$u_n = \sum_{k=1}^{r} B^{r-k}e_{nr+k}, \quad \tilde{\mathcal{F}}_n = \mathcal{F}_{(n+1)r}.$$

The sequence $\{u_n, \tilde{\mathcal{F}}_n\}$ is a m.d. sequence,

$$E(u_n \mid \tilde{\mathcal{F}}_{n-1}) = \sum_{k=1}^{r} B^{r-k}E(e_{nr+k} \mid \mathcal{F}_{nr}) = 0,$$

with conditional second moments

$$E(u_nu_n' \mid \tilde{\mathcal{F}}_{n-1}) = E\left(\sum_{k,l=1}^{r} B^{r-k}e_{nr+k}e_{nr+l}'(B^{r-l})'\,\middle|\,\mathcal{F}_{nr}\right).$$

Here for $k < l$

$$E(e_{nr+k}e_{nr+l}' \mid \mathcal{F}_{nr}) = E[E(e_{nr+k}e_{nr+l}' \mid \mathcal{F}_{nr+l-1}) \mid \mathcal{F}_{nr}] = E[e_{nr+k}E(e_{nr+l}' \mid \mathcal{F}_{nr+l-1}) \mid \mathcal{F}_{nr}] = 0,$$

and similarly for $l < k$, $E(e_{nr+k}e_{nr+l}' \mid \mathcal{F}_{nr}) = 0$. Hence,

$$E(u_nu_n' \mid \tilde{\mathcal{F}}_{n-1}) = E\left(\sum_{k=1}^{r} B^{r-k}e_{nr+k}e_{nr+k}'(B^{r-k})'\,\middle|\,\mathcal{F}_{nr}\right)$$

and condition Eq. (6.161) rewrites as

$$\liminf_{n\to\infty}\lambda_{\min} E(u_nu_n' \mid \tilde{\mathcal{F}}_{n-1}) > 0.$$

As $\{u_n, \tilde{\mathcal{F}}_n\}$ satisfies the multivariate local M–Z condition [Eq. (6.65)] and the coefficients $A_n = (B^{-(n+1)r})'a$ satisfy Eq. (6.69), Theorem 6.3.9 yields

$$P(a'Y = c) = P\left(a'Y_0 + \sum_{n=0}^{\infty} A_n'u_n = c\right) = 0$$

for every constant $c$. $\blacksquare$
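The convergence in Eq. (6.160) is visible already in the scalar case $p = 1$, where $B^{-n}Y_n = Y_0 + \sum_{i\le n} B^{-i}e_i$ is a partial sum of an a.s. convergent series. A small simulation (the coefficient $B = 1.5$ and the Gaussian errors are my own illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
B = 1.5                       # scalar explosive case, |B| > 1
n = 60
e = rng.standard_normal(n + 1)
Y = np.zeros(n + 1)
for t in range(1, n + 1):
    Y[t] = B * Y[t - 1] + e[t]

# B^{-n} Y_n equals the partial sum Y_0 + sum_{i<=n} B^{-i} e_i, which converges
Z = B**(-np.arange(n + 1)) * Y
assert abs(Z[-1] - Z[-10]) < 1e-3    # the tail of the sequence has settled
```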
6.6.8 $\|F_n\|$ is Bounded

Lemma. If the m.d. sequence $\{e_n, \mathcal{F}_n\}$ satisfies Eq. (6.105), then the expression $F_n = n^{-1}\sum_{i=1}^{n} e_ie_i'$ satisfies $\|F_n\| = O(1)$.

Proof. Let $d_i = \|e_i\|^2 - E(\|e_i\|^2 \mid \mathcal{F}_{i-1})$. Taking some $r \in (1, \min\{2, a/2\})$, in a way similar to Eq. (6.77) we have $\sup_i E(|d_i|^r \mid \mathcal{F}_{i-1}) < \infty$. Letting $U_n = n$ in Theorem 6.1.4 we see that

$$\sum_{i=1}^{\infty} E(|d_i|^r \mid \mathcal{F}_{i-1})U_i^{-r} \le c\sum_{i=1}^{\infty} i^{-r} < \infty \text{ a.s.}$$

and therefore $\sum_{i=1}^{n} d_i = o(n)$. Hence,

$$\sum_{i=1}^{n}\|e_i\|^2 = \sum_{i=1}^{n} d_i + \sum_{i=1}^{n} E(\|e_i\|^2 \mid \mathcal{F}_{i-1}) = o(n) + O(n) = O(n) \text{ a.s.} \qquad (6.163)$$

Since $F_n$ is nonnegative definite, by Lemma 6.5.5 it is true that

$$\|F_n\| = \lambda_{\max}(F_n) \le \sum_{i=1}^{p}\lambda_i(F_n) = \operatorname{tr}\left(\frac{1}{n}\sum_{i=1}^{n} e_ie_i'\right) = \frac{1}{n}\sum_{i=1}^{n}\|e_i\|^2. \qquad (6.164)$$

Equations (6.163) and (6.164) prove the lemma. $\blacksquare$
6.6.9 $\|X_n\|$ is Bounded

Lemma. If $M < 1$ and Eq. (6.105) holds, then $X_n = \frac{1}{n}\sum_{i=1}^{n} Y_iY_i'$ satisfies

$$\|X_n\| \le \frac{1}{n}\sum_{i=1}^{n}\|Y_i\|^2 = O(1). \qquad (6.165)$$

Proof. By the recursion (6.106)

$$\left(\sum_{k=1}^{n}\|Y_k\|^2\right)^{1/2} \le \left[\sum_{k=1}^{n}\left(\|B^k\|\|Y_0\| + \sum_{i=1}^{k}\|B^{k-i}\|\|e_i\|\right)^2\right]^{1/2} \le \left[\sum_{k=1}^{n}(\|B^k\|\|Y_0\|)^2\right]^{1/2} + \left[\sum_{k=1}^{n}\left(\sum_{i=1}^{k}\|B^{k-i}\|\|e_i\|\right)^2\right]^{1/2}. \qquad (6.166)$$

The first term on the right causes no trouble as $M < 1$. The second term is bounded using Eq. (6.163):

$$\sum_{k=1}^{n}\left(\sum_{i=1}^{k}\|B^{k-i}\|\|e_i\|\right)^2 \le \sum_{k=1}^{n}\left(\sum_{i=1}^{k}\|B^{k-i}\|\right)\left(\sum_{i=1}^{k}\|B^{k-i}\|\|e_i\|^2\right) \le \left(\sum_{i=0}^{\infty}\|B^i\|\right)\sum_{i=1}^{n}\left(\sum_{k=i}^{n}\|B^{k-i}\|\right)\|e_i\|^2 \le \left(\sum_{i=0}^{\infty}\|B^i\|\right)^2\sum_{i=1}^{n}\|e_i\|^2 = O(n) \text{ a.s.} \qquad (6.167)$$

For $X_n$ a relationship of the type of Eq. (6.164) is true. Therefore Eqs. (6.166) and (6.167) give Eq. (6.165). $\blacksquare$
6.6.10 Lemma on Almost Decreasing Sequences

Lemma. (Nielsen, 2005, Lemma 8.5) If a nonnegative numerical sequence $\{a_t\}$ is almost decreasing, in the sense that there exist constants $c > 0$ and $k > 0$ such that $a_{t+1} \le a_t + ct^{-k}$ for all large $t$, and the numbers $a_1, \ldots, a_T$ are jointly bounded as

$$\sum_{t=1}^{T} a_t = o(T^{\delta}) \qquad (6.168)$$

for all $\delta > 0$, then these numbers tend to zero at the rate $a_T = o(T^{-\rho})$ for all $\rho < \min\{1, k/2\}$.

Proof. By the "almost decreasing" condition, there exists $T_0$ such that $a_{t+1} \le a_t + ct^{-k}$ for all $t \ge T_0$. Suppose $T > T_0$ and consider $T_0 \le t \le T$. Inductively,

$$a_t \ge a_{t+1} - ct^{-k} \ge a_{t+2} - ct^{-k} - c(t+1)^{-k} \ge \cdots \ge a_T - [ct^{-k} + c(t+1)^{-k} + \cdots + c(T-1)^{-k}]. \qquad (6.169)$$

Now let $0 < \rho < 1$. For large enough $T$, Eq. (6.169) is applicable to $t \in [T - T^{\rho}, T] \subseteq [T_0, T]$. The sum in the brackets on the right of Eq. (6.169) contains at most $T^{\rho}$ terms, and each of these does not exceed $c(T - T^{\rho})^{-k} \le 2cT^{-k}$ for large $T$. Therefore

$$\min_{T-T^{\rho}\le t\le T} a_t \ge a_T - 2cT^{\rho-k}.$$

Restricting $\rho$ to $\rho \in (0, \min\{1, k/2\})$ it is seen that

$$\sum_{t=1}^{T} a_t \ge \sum_{t=T-T^{\rho}}^{T} a_t \ge T^{\rho}(a_T - 2cT^{\rho-k}) = T^{\rho}a_T - 2cT^{2\rho-k} \ge T^{\rho}a_T - 2c$$

for large $T$. Combining this with Eq. (6.168) gives $a_T \le T^{-\rho}\sum_{t=1}^{T} a_t + 2cT^{-\rho} = o(T^{\delta-\rho})$. As $\delta$ can be chosen arbitrarily small, this proves the lemma. $\blacksquare$
6.7 STRONG CONSISTENCY FOR VECTOR AUTOREGRESSION AND RELATED RESULTS

6.7.1 Theorem on Relative Compactness

Recall that a sequence of vectors $\{a_k\} \subseteq \mathbf{R}^p$ is called relatively compact if any subsequence $\{b_k\}$ of it contains a further subsequence $\{c_k\}$ that is convergent. We say that a sequence of random matrices $\{X_n\}$ is relatively compact with probability 1 if for almost any $\omega \in \Omega$ the sequence $\{X_n(\omega)\}$ is relatively compact. In this situation it is convenient to denote $\operatorname{Lim}\{X_n\}$ the set of limit points of $\{X_n\}$. The purpose here is to establish the relationship between

$$X_n = \frac{1}{n}\sum_{i=1}^{n} Y_iY_i' \quad\text{and}\quad F_n = n^{-1}\sum_{i=1}^{n} e_ie_i'.$$

The relative compactness notion is required because $\{X_n\}$ and $\{F_n\}$ do not converge.

Theorem. Let the m.d. sequence $\{e_n, \mathcal{F}_n\}$ satisfy Eq. (6.105) and let $Y_n$ be defined by

$$Y_n = BY_{n-1} + e_n, \qquad (6.170)$$

where $B$ satisfies $M = \max|\lambda_j| < 1$. Then with probability 1 the matrix sequence $\{X_n\}$ is relatively compact and its set of limit points is the image under $\mathcal{B}$ of the set of limit points of $\{F_n\}$ (which is also relatively compact), where $\mathcal{B}$ is the enveloping resolvent from Lemma 6.5.4: $\operatorname{Lim}\{X_n\} = \mathcal{B}\operatorname{Lim}\{F_n\}$.

Proof. In a finite-dimensional space relative compactness is equivalent to boundedness, so the statements about relative compactness of $\{X_n\}$ and $\{F_n\}$ follow from Lemmas 6.6.8 and 6.6.9. From Eq. (6.170) we get

$$\sum_{i=1}^{n} Y_iY_i' = \sum_{i=1}^{n}(BY_{i-1} + e_i)(Y_{i-1}'B' + e_i') = B\sum_{i=1}^{n} Y_{i-1}Y_{i-1}'B' + \sum_{i=1}^{n}(BY_{i-1}e_i' + e_iY_{i-1}'B') + \sum_{i=1}^{n} e_ie_i'$$

or, using $X_n$ and $F_n$,

$$X_n = BX_nB' + \frac{1}{n}B(Y_0Y_0' - Y_nY_n')B' + \frac{1}{n}\sum_{i=1}^{n}(BY_{i-1}e_i' + e_iY_{i-1}'B') + F_n. \qquad (6.171)$$

Because $Y_{i-1}$ is $\mathcal{F}_{i-1}$-measurable, $\{BY_{i-1}e_i'\}$ is a m.d. sequence. Let us bound

$$\sum_{i=1}^{n} E(\|BY_{i-1}e_i'\|^2 \mid \mathcal{F}_{i-1})/i^2 \le \sum_{i=1}^{n}\|B\|^2\|Y_{i-1}\|^2 E(\|e_i\|^2 \mid \mathcal{F}_{i-1})/i^2 \le c\sum_{i=1}^{n}\|Y_{i-1}\|^2/i^2. \qquad (6.172)$$

Denote $S_n = \sum_{i=1}^{n}\|Y_{i-1}\|^2$, $\alpha_i = i^{-2}$. From Lemma 6.6.9 we see that

$$\sum_{i=1}^{n}\|Y_{i-1}\|^2/i^2 = \sum_{i=1}^{n}(S_i - S_{i-1})\alpha_i = S_n\alpha_n - S_0\alpha_1 + \sum_{j=1}^{n-1} S_j(\alpha_j - \alpha_{j+1}) = O(n)\frac{1}{n^2} + \sum_{j=1}^{n-1} O(j)\left(\frac{1}{j^2} - \frac{1}{(j+1)^2}\right) = O(1) + O\left(\sum_{j=1}^{n-1}\frac{1}{j^2}\right) = O(1). \qquad (6.173)$$

Now Eqs. (6.172), (6.173) and the martingale strong law (Section 6.1.4) with $p = 2$, $X_n = BY_{n-1}e_n'$ and $U_n = n$ imply

$$\frac{1}{n}\sum_{i=1}^{n}(BY_{i-1}e_i' + e_iY_{i-1}'B') = o(1) \text{ a.s.} \qquad (6.174)$$

Besides, by Lemma 6.6.4 with $b = 1/2 > 1/a$

$$\frac{1}{n}B(Y_0Y_0' - Y_nY_n')B' = o(1). \qquad (6.175)$$

The consequence of Eqs. (6.171), (6.174) and (6.175) is that $X_n - BX_nB' = F_n + o(1)$ or, in terms of the operator $\mathcal{B}$,

$$X_n = \mathcal{B}(F_n + o(1)). \qquad (6.176)$$

By Lemma 6.6.8 $\{F_n\}$ is relatively compact. If $\{F_n(\omega): n \ge 1\}$ is convergent for some $\omega \in \Omega$, then, because of the boundedness of $\mathcal{B}$, $\{X_n(\omega): n \ge 1\}$ is also convergent. The set of limit points of $\{X_n\}$ is the image under $\mathcal{B}$ of the set of limit points of $\{F_n\}$. $\blacksquare$
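Relation (6.176) says that, up to an $o(1)$ term, $X_n$ solves $X_n - BX_nB' = F_n$. For a stable $B$ this is easy to see in simulation (the coefficient matrix, sample size and tolerance below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
p, n = 2, 20_000
B = np.array([[0.4, 0.1], [0.0, 0.5]])   # stable: eigenvalues inside the unit circle
e = rng.standard_normal((n + 1, p))
Y = np.zeros((n + 1, p))
for t in range(1, n + 1):
    Y[t] = B @ Y[t - 1] + e[t]

Xn = (Y[1:].T @ Y[1:]) / n                # (1/n) sum Y_i Y_i'
Fn = (e[1:].T @ e[1:]) / n                # (1/n) sum e_i e_i'
# By Eq. (6.176), X_n - B X_n B' = F_n + o(1)
resid = Xn - B @ Xn @ B.T - Fn
assert np.linalg.norm(resid) < 0.15
```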
6.7.2 Bounding $\lambda_{\max}(V_n)$

Theorem. If the m.d. sequence $\{e_n, \mathcal{F}_n\}$ satisfies Eq. (6.105), then for $V_n = \sum_{i=1}^{n} Y_iY_i'$ the following is true:

(i) $\lambda_{\max}(V_n) = O(n^{2m-2}M^{2n})$ a.s. if $M > 1$;

(ii) $\lambda_{\max}(V_n) = O(n^{2m}\log\log n)$ a.s. if $M = 1$;

(iii) $\lambda_{\max}(V_n) = O(n)$ a.s. if $M < 1$.

Proof. (i) From Eq. (6.120) and Lemma 6.6.6 we derive

$$\lambda_{\max}(V_n) \le \sum_{i=1}^{n}\|Y_i\|^2 \le c\sum_{i=1}^{n} i^{2m-2}M^{2i} = cn^{2m-2}M^{2n}\sum_{i=1}^{n}\left(\frac{i}{n}\right)^{2m-2}(M^{-2})^{n-i} \le cn^{2m-2}M^{2n}\sum_{i=0}^{\infty}(M^{-2})^i.$$

(ii) Similarly, by Lemma 6.6.5, excluding the $i$ for which $\log\log i$ is not defined,

$$\lambda_{\max}(V_n) \le \|Y_1\|^2 + c\sum_{i=2}^{n} i^{2m-1}\log\log i \le c_1 + cn^{2m-1}(\log\log n)(n-2) = O(n^{2m}\log\log n).$$

Statement (iii) follows from Eq. (6.165). $\blacksquare$
6.7.3 Bounding $\lambda_{\min}(V_n)$ from Below in the Stable Case

The importance of the theorem on relative compactness (Section 6.7.1) is demonstrated by the following application. For the notations $\mathcal{B}$, $F_n$, $X_n$ and $V_n$ see Sections 6.5.4, 6.7.1 and 6.7.2, respectively.

Lemma. If the m.d. sequence $\{e_n, \mathcal{F}_n\}$ satisfies Eq. (6.105), then the condition

$$\liminf_{n\to\infty}\lambda_{\min}(\mathcal{B}F_n) > 0 \text{ a.s.} \qquad (6.177)$$

is necessary and sufficient for

$$\liminf_{n\to\infty}\frac{1}{n}\lambda_{\min}(V_n) = \liminf_{n\to\infty}\lambda_{\min}(X_n) > 0 \text{ a.s.} \qquad (6.178)$$

Proof. Suppose Eq. (6.177) is true and denote $\alpha = \liminf_{n\to\infty}\lambda_{\min}(X_n)$. Then there exists a subsequence $\{X_{n_k}\} \subseteq \{X_n\}$ such that $\alpha = \lim_{k\to\infty}\lambda_{\min}(X_{n_k})$. By Lemma 6.6.9 $\{X_n\}$ is relatively compact. To simplify the notation, we can assume that $\{X_{n_k}\}$ converges. Similarly, by Lemma 6.6.8 we may assume that $\{F_{n_k}\}$ converges. By Theorem 6.7.1 the respective limit points satisfy $\operatorname{Lim}\{X_n\} = \mathcal{B}\operatorname{Lim}\{F_n\}$, and the almost sure positivity of $\alpha$ follows from Eq. (6.177). The proof of (6.178) $\Rightarrow$ (6.177) is completely analogous. $\blacksquare$

To extend this result to unstable systems, we need to modify Eq. (6.177) because the operator $\mathcal{B}$ is not defined in the case $M \ge 1$. This is the subject of Sections 6.7.4 and 6.7.5, the final result being Theorem 6.7.5.
6.7.4 Conditions Equivalent to Equation (6.177)

Denote

$$H_k(\Lambda) = (\Lambda^{1/2}, B\Lambda^{1/2}, \ldots, B^{k-1}\Lambda^{1/2}), \quad k = 1, 2, \ldots$$

The matrices $H_k$, $H_p$ and $H_pH_p'$ are of sizes $p\times(kp)$, $p\times p^2$ and $p\times p$, respectively.

Lemma. Under the conditions of Lemma 6.7.3, condition (6.177) of that lemma is equivalent to each of the next two conditions:

$$\operatorname{rank}H_p(\Lambda) = p \text{ for all } \Lambda \in \operatorname{Lim}\{F_n\} \text{ a.s.} \qquad (6.179)$$

and

$$\liminf_{n\to\infty}\lambda_{\min}(H_p(F_n)[H_p(F_n)]') > 0 \text{ a.s.} \qquad (6.180)$$

Proof. Step 1. We prove that for any $\Lambda$ the following three conditions are equivalent:

$$\lambda_{\min}(H_k(\Lambda)[H_k(\Lambda)]') > 0 \text{ for some } k \ge p, \qquad (6.181)$$

$$\operatorname{rank}H_p(\Lambda) = p \qquad (6.182)$$

and

$$\lambda_{\min}(H_p(\Lambda)[H_p(\Lambda)]') > 0 \qquad (6.183)$$

(see Kushner, 1971, p. 264). With the identity and null matrices of appropriate sizes we can define $A = \begin{pmatrix} I \\ 0 \end{pmatrix}$ to obtain

$$(\Lambda^{1/2}, B\Lambda^{1/2}, \ldots, B^{k-1-p}\Lambda^{1/2}) = (\Lambda^{1/2}, B\Lambda^{1/2}, \ldots, B^{p-1}\Lambda^{1/2})A = H_p(\Lambda)A,$$

$$H_k(\Lambda) = (\Lambda^{1/2}, \ldots, B^{p-1}\Lambda^{1/2}, B^p\Lambda^{1/2}, \ldots, B^{k-1}\Lambda^{1/2}) = (H_p(\Lambda), B^p(\Lambda^{1/2}, \ldots, B^{k-1-p}\Lambda^{1/2})) = (H_p(\Lambda), B^pH_p(\Lambda)A). \qquad (6.184)$$

This implies $H_k(\Lambda)[H_k(\Lambda)]' = H_p(\Lambda)[H_p(\Lambda)]' + B^pH_p(\Lambda)AA'[H_p(\Lambda)]'(B^p)'$ and

$$\lambda_{\min}(H_k(\Lambda)[H_k(\Lambda)]') \ge \lambda_{\min}(H_p(\Lambda)[H_p(\Lambda)]'). \qquad (6.185)$$

Furthermore, with appropriately sized matrices we can define $S = (I, B^p)$ and $T = \operatorname{diag}[I, A]$ to obtain from Eq. (6.184)

$$S\begin{pmatrix} H_p(\Lambda) & 0 \\ 0 & H_p(\Lambda) \end{pmatrix}T = (H_p(\Lambda), B^pH_p(\Lambda)A) = H_k(\Lambda). \qquad (6.186)$$

If $\operatorname{rank}H_p(\Lambda) < p$, then $\operatorname{rank}H_k(\Lambda) \le \min\{\operatorname{rank}S, \operatorname{rank}H_p(\Lambda), \operatorname{rank}T\} < p$ by Eq. (6.186), and $\operatorname{rank}H_k(\Lambda)[H_k(\Lambda)]' < p$. Since $H_k(\Lambda)[H_k(\Lambda)]'$ is of size $p\times p$ and nonnegative definite, we can use

$$|H_k(\Lambda)[H_k(\Lambda)]'| = \prod_{j=1}^{p}\lambda_j(H_k(\Lambda)[H_k(\Lambda)]') \qquad (6.187)$$

to conclude that $\lambda_{\min}(H_k(\Lambda)[H_k(\Lambda)]') = 0$. Thus, (6.181) $\Rightarrow$ (6.182). By Eq. (6.187), the implication (6.182) $\Rightarrow$ (6.183) is true. And, finally, according to Eq. (6.185), (6.183) implies $\lambda_{\min}(H_k(\Lambda)[H_k(\Lambda)]') > 0$ for all $k \ge p$.

Step 2. As we know from Lemma 6.6.8, $\{F_n(\omega): n \ge 1\}$ is relatively compact for almost every $\omega$. Suppose that Eq. (6.177) holds. Then there is a subsequence $\{F_{n_k}\} \subseteq \{F_n\}$ such that $\lim_{k\to\infty}\lambda_{\min}(\mathcal{B}F_{n_k}(\omega)) > 0$. By relative compactness we can assume that $\{F_{n_k}(\omega)\}$ converges to some $\Lambda(\omega)$. Then $\lambda_{\min}(\mathcal{B}\Lambda(\omega)) > 0$ and we can choose $k = k(\omega) \ge p$ such that

$$\lambda_{\min}(H_k(\Lambda(\omega))[H_k(\Lambda(\omega))]') = \lambda_{\min}\left(\sum_{i=0}^{k-1} B^i\Lambda(\omega)(B^i)'\right) > 0.$$

From Step 1, this is equivalent to Eqs. (6.182) and (6.183) for the given $\Lambda = \Lambda(\omega)$. Because this proof applies to each limit point $\Lambda$ of $\{F_n\}$, we have proved that (6.177) $\Rightarrow$ (6.179). Since Eq. (6.183) is just an equivalent way of writing Eq. (6.180), we have also proved (6.179) $\Leftrightarrow$ (6.180). The implication (6.180) $\Rightarrow$ (6.177) is obvious because $H_p(F_n)[H_p(F_n)]' = \sum_{i=0}^{p-1} B^iF_n(B^i)'$. $\blacksquare$
6.7.5 A General Lower Bound on $\lambda_{\min}(V_n)$

The theorem here, unlike Lemma 6.7.3, does not require the assumption $M < 1$.

Theorem. Let the m.d. sequence $\{e_n, \mathcal{F}_n\}$ satisfy Eq. (6.105) and define $Y_n$ by Eq. (6.106). Suppose that condition (6.179) or, equivalently, (6.180) holds. Then $\liminf_{n\to\infty} n^{-1}\lambda_{\min}(V_n) > 0$ a.s.

Proof. Step 1. Let us show that the residual $R_n$ in Eq. (6.126) satisfies $R_n = o(n)$. Obviously, the matrices from Eq. (6.122) satisfy

$$c = \max_j\|C_j\| < \infty. \qquad (6.188)$$

Denote $X_l = \sum_{k=l}^{l+p-2} C_{k-l}e_lA_{k,l}$. As we can see from Eq. (6.124), $A_{k,l}$ is $\mathcal{F}_{l-1}$-measurable and $\{X_l\}$ is a m.d. sequence. By Eq. (6.188),

$$\|A_{k,l}\| \le c\sum_{m=k-p+1}^{l-1}\|e_m\|$$

and

$$\|X_l\| \le c\sum_{k=l}^{l+p-2}\|e_l\|\|A_{k,l}\| \le c_1\|e_l\|\sum_{k=l}^{l+p-2}\sum_{m=k-p+1}^{l-1}\|e_m\| = c_1\|e_l\|\sum_{m=l-p+1}^{l-1}\sum_{k=l}^{m+p-1}\|e_m\| \le c_1(p-1)\|e_l\|\sum_{m=l-p+1}^{l-1}\|e_m\| \le c_2\|e_l\|\left(\sum_{m=l-p+1}^{l-1}\|e_m\|^2\right)^{1/2}.$$

With this bound at hand, we can estimate

$$\sum_{l=2}^{n} E(\|X_l\|^2 \mid \mathcal{F}_{l-1})l^{-2} \le c_3\sum_{l=2}^{n}\sum_{m=l-p+1}^{l-1}\|e_m\|^2 E(\|e_l\|^2 \mid \mathcal{F}_{l-1})l^{-2} \le c_3\sup_n E(\|e_n\|^2 \mid \mathcal{F}_{n-1})\sum_{m=3-p}^{n-1}\|e_m\|^2\sum_{l=m+1}^{m+p-1} l^{-2} \le c_4\sup_n E(\|e_n\|^2 \mid \mathcal{F}_{n-1})\sum_{m=3-p}^{n-1}\|e_m\|^2 m^{-2}. \qquad (6.189)$$

As we know from Eq. (6.163), $\sum_{m=1}^{n}\|e_m\|^2 = O(n)$. Therefore the right side of Eq. (6.189) is bounded uniformly in $n$ [see a similar argument in Section 6.7.1, in particular, Eq. (6.173)]. By Theorem 6.1.4 $R_n = \sum_{l=2}^{n}(X_l + X_l') = o(n)$.

Step 2. In view of Eq. (6.126) we have proved that

$$\frac{1}{n}\sum_{i=p}^{n} Z_iZ_i' = \sum_{j=0}^{p-1} C_j\left(\frac{1}{n}\sum_{k=p}^{n} e_{k-j}e_{k-j}'\right)C_j' + o(1).$$

This equation and Lemma 6.6.8 show that the sequence $\left\{\frac{1}{n}\sum_{i=p}^{n} Z_iZ_i'\right\}$ is relatively compact with probability 1. Moreover, its set of limit points is $\{\Phi(F): F \in \operatorname{Lim}\{F_n\}\}$, where

$$\Phi(F) = \sum_{j=0}^{p-1} C_jFC_j'.$$

All terms in this expression are nonnegative definite. Therefore if $x'\Phi(F)x = 0$, then

$$F^{1/2}x = 0,\ F^{1/2}(B + a_1I_p)'x = 0,\ \ldots,\ F^{1/2}(B^{p-1} + \cdots + a_{p-1}I_p)'x = 0,$$

which, in turn, implies $F^{1/2}x = 0$, $F^{1/2}B'x = 0$, …, $F^{1/2}(B^{p-1})'x = 0$. Hence, if $\lambda_{\min}\left(\sum_{i=0}^{p-1} B^iF(B^i)'\right) > 0$, then $\Phi(F)$ is nonsingular. Therefore $\liminf_{n\to\infty}\lambda_{\min}\left(\frac{1}{n}\sum_{i=p}^{n} Z_iZ_i'\right) > 0$ a.s. by assumption (6.180). Lemma 6.5.6 and this relation prove the theorem. $\blacksquare$
6.7.6 Application to Scalar Autoregression

Consider the autoregressive model (6.110), which can be written as Eq. (6.104) if notation (6.111) is adopted.

Theorem. If the m.d. sequence $\{\varepsilon_n, \mathcal{F}_n\}$ satisfies

$$\sup_n E(|\varepsilon_n|^a \mid \mathcal{F}_{n-1}) < \infty \text{ a.s. with some } a > 2 \qquad (6.190)$$

and

$$\liminf_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n} E(\varepsilon_i^2 \mid \mathcal{F}_{i-1}) > 0 \text{ a.s.,} \qquad (6.191)$$

then $\liminf_{n\to\infty} n^{-1}\lambda_{\min}(V_n) > 0$ a.s.

Proof. Obviously, Eq. (6.190) implies Eq. (6.105). We need to verify Eq. (6.179). With $u = (1, 0, \ldots, 0)'$ we have $e_n = \varepsilon_nu$ and

$$F_n = n^{-1}\sum_{i=1}^{n} e_ie_i' = n^{-1}\sum_{i=1}^{n}\varepsilon_i^2uu', \quad \sum_{i=0}^{p-1} B^iF_n(B^i)' = \left(n^{-1}\sum_{i=1}^{n}\varepsilon_i^2\right)\left[\sum_{i=0}^{p-1} B^iuu'(B^i)'\right]. \qquad (6.192)$$

From

$$Bu = \begin{pmatrix} b_1 & \cdots & b_{p-1} & b_p \\ & I_{p-1} & & 0 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} b_1 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad B^2u = \begin{pmatrix} b_1 & \cdots & b_{p-1} & b_p \\ & I_{p-1} & & 0 \end{pmatrix}\begin{pmatrix} b_1 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} b_1^2 + b_2 \\ b_1 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$

and similar expressions for the other powers of $B$ we observe that the matrix $(u, Bu, \ldots, B^{p-1}u)$ is upper triangular with unities on the main diagonal and is therefore nonsingular. It follows that in Eq. (6.192) the matrix in the brackets is nonsingular:

$$\det\left(\sum_{i=0}^{p-1} B^iuu'(B^i)'\right) = \det\left[(u, Bu, \ldots, B^{p-1}u)\begin{pmatrix} u' \\ (Bu)' \\ \vdots \\ (B^{p-1}u)' \end{pmatrix}\right] > 0.$$

Theorem 6.7.5 is applicable. $\blacksquare$
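The key observation of the proof — that for the companion matrix of notation (6.111) the Krylov matrix $(u, Bu, \ldots, B^{p-1}u)$ is upper triangular with unit diagonal — can be checked directly (the AR coefficients below are arbitrary illustrative values):

```python
import numpy as np

p = 4
b = np.array([0.7, -0.2, 0.5, 0.1])   # AR coefficients b_1, ..., b_p (assumed)
B = np.zeros((p, p))
B[0, :] = b                            # first row of the companion matrix
B[1:, :-1] = np.eye(p - 1)             # shifted identity block

u = np.zeros(p); u[0] = 1.0
K = np.column_stack([np.linalg.matrix_power(B, i) @ u for i in range(p)])
# K = (u, Bu, ..., B^{p-1}u): upper triangular with unities on the diagonal
assert np.allclose(np.tril(K, -1), 0)
assert np.allclose(np.diag(K), 1)
assert abs(np.linalg.det(K)) > 0.5     # hence nonsingular
```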
6.7.7 Purely Explosive Case

Here we establish that in the purely explosive case ($m = \min|\lambda_j| > 1$) both $\lambda_{\min}(V_n)$ and $\lambda_{\max}(V_n)$ grow exponentially fast under assumption (6.161).

Theorem. Let $\{e_n, \mathcal{F}_n\}$ be a m.d. sequence satisfying Eqs. (6.105) and (6.161), and define $Y_n$ by Eq. (6.106). Suppose $m > 1$. Then

(i) The product $B^{-n}V_n(B^{-n})'$ converges a.s. to $G = \sum_{i=0}^{\infty} B^{-i}YY'(B^{-i})'$, where $V_n = \sum_{i=1}^{n} Y_iY_i'$ and $Y = \lim_{n\to\infty} B^{-n}Y_n$.

(ii) $G$ is positive definite with probability 1.

(iii) With probability 1

$$\lim_{n\to\infty} n^{-1}\log\lambda_{\max}(V_n) = 2\log M, \quad \lim_{n\to\infty} n^{-1}\log\lambda_{\min}(V_n) = 2\log m.$$

Proof. (i) As for the existence of the limit $Y = \lim_{n\to\infty} B^{-n}Y_n$, see Eq. (6.160). Convergence of $G$ follows from $\|B^{-i}\| \le cm^{-i}$. Let us prove that $B^{-n}V_n(B^{-n})'$ converges to $G$. Denoting $Z_n = B^{-n}Y_nY_n'(B^{-n})'$ and $Z = YY'$, by Eq. (6.160) we have $Z_n \to Z$ a.s. and therefore $c = \sup_i\|Z_i\| < \infty$ a.s. For a given $\varepsilon > 0$ let $n(\varepsilon)$ be such that $\|Z - Z_i\| \le \varepsilon$ for $i \ge n(\varepsilon)$. We need to prove that

$$B^{-n}V_n(B^{-n})' = \sum_{i=1}^{n} B^{i-n}B^{-i}Y_iY_i'(B^{-i})'(B^{i-n})' = \sum_{i=1}^{n} B^{i-n}Z_i(B^{i-n})'$$

converges to $G = \sum_{i=0}^{\infty} B^{-i}Z(B^{-i})'$. This convergence follows from the next three bounds. The first bound is

$$\left\|\sum_{i=0}^{n(\varepsilon)-1} B^{i-n}Z_i(B^{i-n})'\right\| \le c\sum_{i=0}^{n(\varepsilon)-1}\|B^{i-n}\|^2 \le c\sum_{j=n-n(\varepsilon)+1}^{\infty}\|B^{-j}\|^2 \to 0, \quad n\to\infty.$$

The second is

$$\left\|\sum_{j=n-n(\varepsilon)+1}^{\infty} B^{-j}Z(B^{-j})'\right\| \le \sum_{j=n-n(\varepsilon)+1}^{\infty}\|B^{-j}\|^2\|Z\| \to 0, \quad n\to\infty.$$

Finally,

$$\left\|\sum_{i=n(\varepsilon)}^{n} B^{i-n}Z_i(B^{i-n})' - \sum_{j=0}^{n-n(\varepsilon)} B^{-j}Z(B^{-j})'\right\| = \left\|\sum_{j=0}^{n-n(\varepsilon)} B^{-j}(Z_{n-j} - Z)(B^{-j})'\right\| \le \varepsilon\sum_{j=0}^{\infty}\|B^{-j}\|^2.$$

For these bounds to yield the desired result, $n(\varepsilon)$ is chosen first and $n$ next.

(ii) Suppose that

$$0 = x'Gx = \sum_{i=0}^{\infty} x'B^{-i}YY'(B^{-i})'x = \sum_{i=0}^{\infty}(x'B^{-i}Y)^2.$$

Then $x'B^{-i}Y = ((B^{-i})'x)'Y = 0$. By Eq. (6.162) this is an impossible event if $(B^{-i})'x \ne 0$, that is, if $x \ne 0$.

(iii) Denote $C_n = B^{-n}V_n(B^{-n})'$. Since $C_n \to G$ a.s. and $G$ is positive definite, we have

$$0 < \lim_{n\to\infty}\lambda_{\min}(C_n) \le \lim_{n\to\infty}\lambda_{\max}(C_n) < \infty \text{ a.s.}$$

This relation, the equation $V_n = B^nC_n(B^n)'$ and Lemma 6.5.3 imply

$$n^{-1}\log\lambda_{\max}(V_n) = n^{-1}\log\lambda_{\max}(B^n(B^n)') + o(1). \qquad (6.193)$$

However, by Lemma 6.5.2(ii) and Eq. (6.119)

$$n^{-1}\log\lambda_{\max}(B^n(B^n)') = n^{-1}\log\|B^n\|^2 = 2(1 + o(1))\log M. \qquad (6.194)$$

The first equation in (iii) follows from Eqs. (6.193) and (6.194). As a result of $V_n^{-1} = (B^{-n})'C_n^{-1}B^{-n}$, along with Eq. (6.193), we have

$$n^{-1}\log\lambda_{\min}(V_n) = -n^{-1}\log\lambda_{\max}(V_n^{-1}) = -n^{-1}\log\lambda_{\max}((B^{-n})'B^{-n}) + o(1).$$

Instead of Eq. (6.194) we now need

$$n^{-1}\log\lambda_{\max}((B^{-n})'B^{-n}) = n^{-1}\log\|B^{-n}\|^2 = -2(1 + o(1))\log m.$$

The above two relations prove the second equation in (iii). $\blacksquare$
6.7.8 Some Bounds Involving $V_n$

As before, we denote $V_n = \sum_{i=1}^{n} Y_iY_i'$ and let

$$N = \inf\{n: V_n \text{ is nonsingular}\}, \quad \inf\varnothing = \infty.$$

In the lemma below, we explicitly take account of the fact that the error vectors $e_n$ form an array

$$e_1 = (e_{11}, \ldots, e_{1p}),\ e_2 = (e_{21}, \ldots, e_{2p}),\ \ldots,\ e_n = (e_{n1}, \ldots, e_{np})$$

[see Eq. (6.107)].

Lemma. Let the m.d. sequence $\{e_n, \mathcal{F}_n\}$ satisfy Eq. (6.105) and suppose that Eq. (6.179) or, equivalently, Eq. (6.180) holds. If, additionally, $M \le 1$, then

(i) $N < \infty$ a.s. and $\|V_n^{-1/2}\| = O(n^{-1/2})$ a.s.

(ii) $Y_n'V_n^{-1}Y_n \le 1$ for $n \ge N$ and $\sum_{i=N}^{n} Y_i'V_i^{-1}Y_i = O(\log n)$ a.s.

(iii) We have the bound

$$\left\|V_n^{-1/2}\sum_{i=1}^{n} Y_ie_{i+1}'\right\| = O((\log n)^{1/2}) \text{ a.s.}$$

Proof. (i) By Theorem 6.7.5 $\liminf_{n\to\infty} n^{-1}\lambda_{\min}(V_n) > 0$, and so $N < \infty$ and

$$\|V_n^{-1/2}\|^2 = \sup_{\|x\|=1}\|V_n^{-1/2}x\|^2 = \sup_{\|x\|=1} x'V_n^{-1}x = \lambda_{\max}(V_n^{-1}) = 1/\lambda_{\min}(V_n) = O(n^{-1}) \text{ a.s.}$$

(ii) As $V_{n-1}$ is nonnegative definite and $V_n$ is positive definite for $n \ge N$, we have $|V_{n-1}| \ge 0$, $|V_n| > 0$ and by Lemma 6.4.2

$$Y_n'V_n^{-1}Y_n = 1 - |V_{n-1}|/|V_n| \le 1, \quad n \ge N.$$

Further, by Lemma 6.4.3

$$\sum_{i=N}^{n} Y_i'V_i^{-1}Y_i = O(\log\lambda_{\max}(V_n)).$$

It remains to recall that $\log\lambda_{\max}(V_n) = O(\log n)$, according to parts (ii) and (iii) of Theorem 6.7.2.

(iii) To apply Lemma 6.4.5(iii), we need to reveal in

$$\left\|V_n^{-1/2}\sum_{i=1}^{n} Y_ie_{i+1}'\right\|^2 = \operatorname{tr}\left[\left(\sum_{i=1}^{n} Y_ie_{i+1}'\right)'V_n^{-1}\left(\sum_{i=1}^{n} Y_ie_{i+1}'\right)\right]$$

the structure associated with the regression $y_n = X_n\beta + \varepsilon_n$. Let

$$X_n' = (Y_1, \ldots, Y_n), \quad \varepsilon_n^{(j)} = (e_{2,j}, \ldots, e_{n+1,j})', \quad Q_n^{(j)} = (\varepsilon_n^{(j)})'X_n(X_n'X_n)^{-1}X_n'\varepsilon_n^{(j)}, \quad j = 1, \ldots, p.$$

Then $V_n = \sum_{i=1}^{n} Y_iY_i' = X_n'X_n$ and the rows of $X_n$ are not changed as new rows are appended. For each $j$, $\{e_{i+1,j}, \mathcal{F}_{i+1}\}$ is a m.d. sequence and $Y_i$ is $\mathcal{F}_i$-measurable. Further,

$$\sum_{i=1}^{n} Y_ie_{i+1}' = \left(\sum_{i=1}^{n} Y_ie_{i+1,1},\ \ldots,\ \sum_{i=1}^{n} Y_ie_{i+1,p}\right) = (X_n'\varepsilon_n^{(1)}, \ldots, X_n'\varepsilon_n^{(p)}).$$

So by Lemma 6.4.5(iii)

$$\left\|V_n^{-1/2}\sum_{i=1}^{n} Y_ie_{i+1}'\right\|^2 = \sum_{j=1}^{p}(\varepsilon_n^{(j)})'X_n(X_n'X_n)^{-1}X_n'\varepsilon_n^{(j)} = Q_n^{(1)} + \cdots + Q_n^{(p)} = O(\log\lambda_{\max}(V_n)) = O(\log n) \text{ a.s.} \qquad \blacksquare$$
6.7.9 More Bounds Involving $V_n$

Lemma. Let the conditions of Lemma 6.7.8 hold and suppose that $B$ is nonsingular. Then

(i) $\|V_n^{1/2}B'V_{n+1}^{-1}BV_n^{1/2}\| \le 1 + O(n^{-1/2}(\log n)^{1/2})$ a.s.

(ii) Let $r > 1/a$, where $a$ is the integrability parameter from Eq. (6.105). Then

$$\limsup_{n\to\infty} n^{1/2-r}(Y_{n+1}'V_{n+1}^{-1}Y_{n+1} - Y_n'V_n^{-1}Y_n) \le 0 \text{ a.s.}$$

Proof.

(i) We intend to apply Lemma 6.5.9(i) to $A_n = V_n^{1/2}B'V_{n+1}^{-1}BV_n^{1/2}$. To obtain a recursion for $V_{n+1}$, consider

$$V_{n+1} = \sum_{i=1}^{n+1} Y_iY_i' = \sum_{i=1}^{n+1}(BY_{i-1} + e_i)(Y_{i-1}'B' + e_i') = \sum_{i=1}^{n+1}(BY_{i-1}Y_{i-1}'B' + BY_{i-1}e_i' + e_iY_{i-1}'B' + e_ie_i') = B(V_n + Y_0Y_0')B' + B\sum_{i=0}^{n} Y_ie_{i+1}' + \sum_{i=0}^{n} e_{i+1}Y_i'B' + \sum_{i=1}^{n+1} e_ie_i'.$$

Therefore

$$A_n^{-1} = V_n^{-1/2}B^{-1}V_{n+1}(B')^{-1}V_n^{-1/2} = I_p + B_n + \tilde{C}_n + \tilde{C}_n', \qquad (6.195)$$

where

$$B_n = V_n^{-1/2}\left[Y_0Y_0' + B^{-1}\left(\sum_{i=1}^{n+1} e_ie_i'\right)(B')^{-1}\right]V_n^{-1/2}$$

is nonnegative definite, and

$$\tilde{C}_n = V_n^{-1/2}\left(\sum_{i=0}^{n} Y_ie_{i+1}'\right)(B')^{-1}V_n^{-1/2}.$$

By Lemma 6.7.8, parts (i) and (iii),

$$\|\tilde{C}_n\| \le \left\|V_n^{-1/2}\sum_{i=0}^{n} Y_ie_{i+1}'\right\|\|(B')^{-1}\|\|V_n^{-1/2}\| = O(n^{-1/2}(\log n)^{1/2}) \text{ a.s.} \qquad (6.196)$$

Noting that $C_n = \tilde{C}_n + \tilde{C}_n'$ is symmetric, we can apply Eqs. (6.195) and (6.196) and Lemma 6.5.9(i) to conclude that

$$\|A_n\| \le 1/[1 - O(n^{-1/2}(\log n)^{1/2})] = 1 + O(n^{-1/2}(\log n)^{1/2}).$$

(ii) For $n \ge N$ we have

$$Y_{n+1}'V_{n+1}^{-1}Y_{n+1} = (BY_n + e_{n+1})'V_{n+1}^{-1}(BY_n + e_{n+1}) = Y_n'B'V_{n+1}^{-1}BY_n + Y_{n+1}'V_{n+1}^{-1}e_{n+1} + e_{n+1}'V_{n+1}^{-1}Y_{n+1} - e_{n+1}'V_{n+1}^{-1}e_{n+1}.$$

Remembering that $V_{n+1}^{-1}$ is positive definite, we continue as follows:

$$Y_{n+1}'V_{n+1}^{-1}Y_{n+1} \le (V_n^{-1/2}Y_n)'V_n^{1/2}B'V_{n+1}^{-1}BV_n^{1/2}(V_n^{-1/2}Y_n) + 2(V_{n+1}^{-1/2}Y_{n+1})'(V_{n+1}^{-1/2}e_{n+1})$$

[applying statement (i)]

$$\le [1 + O(n^{-1/2}(\log n)^{1/2})]\|V_n^{-1/2}Y_n\|^2 + 2\|V_{n+1}^{-1/2}Y_{n+1}\|\|V_{n+1}^{-1/2}\|\|e_{n+1}\|. \qquad (6.197)$$

By Lemmas 6.6.1 and 6.7.8(i,ii)

$$\|e_n\| = o(n^r), \quad \|V_n^{-1/2}\| = O(n^{-1/2}), \quad \|V_n^{-1/2}Y_n\|^2 = Y_n'V_n^{-1}Y_n \le 1.$$

These bounds and Eq. (6.197) give

$$Y_{n+1}'V_{n+1}^{-1}Y_{n+1} \le Y_n'V_n^{-1}Y_n + O(n^{-1/2}(\log n)^{1/2}) + o(n^{r-1/2}) = Y_n'V_n^{-1}Y_n + o(n^{r-1/2}). \qquad (6.198)$$

This proves statement (ii). $\blacksquare$
6.7.10 Convergence of Certain Quadratic Forms

An important difference between purely explosive (m > 1) and nonexplosive (M ≤ 1) systems is described in the theorem below.

Theorem. Let {e_n} be an m.d. sequence satisfying Eqs. (6.105) and (6.161), and define Y_n by Eq. (6.106).

(i) If m > 1, then for k = 0, ±1, ±2, ...

lim_{n→∞} Y_{n−k}' V_n^{-1} Y_{n−k} = (B^{−k} Y)' G^{-1} (B^{−k} Y) > 0 a.s.,

where G and Y are the same as in Theorem 6.7.7.

(ii) If M ≤ 1, then

lim_{n→∞} max_{1≤j≤n} Y_j' V_n^{-1} Y_j = 0 a.s.  (6.199)

Proof. Proving (i). The random variables under consideration converge by Theorem 6.7.7:

Y_{n−k}' V_n^{-1} Y_{n−k} = (B^{−k} B^{−(n−k)} Y_{n−k})' [B^{−n} V_n (B^{−n})']^{-1} (B^{−k} B^{−(n−k)} Y_{n−k}) → (B^{−k} Y)' G^{-1} (B^{−k} Y),

where G^{-1} exists.

Proving (ii). The proof is split into several steps.

Step 1. For the proof of Eq. (6.199) it suffices to show that

lim_{n→∞} Y_n' V_n^{-1} Y_n = 0 a.s.  (6.200)

Indeed, by Lemma 6.7.8(i), for 1 ≤ j < N

Y_j' V_n^{-1} Y_j ≤ max_{1≤j<N} ‖Y_j‖² O(n^{-1}) = O(n^{-1}).

For N ≤ j ≤ n we can use the fact that, according to Eq. (6.88), V_j^{-1} ≥ V_{j+1}^{-1} ≥ ..., and therefore Y_j' V_n^{-1} Y_j ≤ Y_j' V_j^{-1} Y_j.

Step 2. Case of a nonsingular B. By Lemma 6.7.9(ii), in Eq. (6.198) we can choose r ∈ (1/a, 1/2). Denoting a_t = Y_t' V_t^{-1} Y_t, by Eq. (6.198) we have a_{t+1} ≤ a_t + o(t^{−κ}) with κ = 1/2 − r > 0, whereas by Lemma 6.7.8(ii) Σ_{t=N}^{n} a_t = O(log n) = o(n^δ) for any δ > 0. Therefore Lemma 6.6.10 implies a_n = o(n^{−β}) for all β < min{1, κ/2}.

Case of a singular B. In this case 0 is a root of the characteristic polynomial f(λ) of B. Denote r its multiplicity, r ≤ p.

Subcase r < p. In the spectrum-separating decomposition of Section 6.5.8 we can assume that M is nonsingular, B_1 is nonsingular and all eigenvalues of B_2 are zero. The error vectors ξ_n, ζ_n in Eq. (6.134) satisfy the same conditions as e_n. As we proved in the nonsingular case,

S_n' (Σ_{i=1}^{n} S_i S_i')^{-1} S_n → 0 a.s.  (6.201)

Denoting A_n = M V_n M', from Eqs. (6.133) and (6.135) we get

Y_n' V_n^{-1} Y_n = (MY_n)' A_n^{-1} (MY_n)
= (S_n; 0)' A_n^{-1} (S_n; 0) + (0; T_n)' A_n^{-1} (0; T_n) + 2(S_n; 0)' A_n^{-1} (0; T_n)
= I_{n1} + I_{n2} + 2I_{n3}, say,  (6.202)

where (S_n; 0) and (0; T_n) denote the partitioned vectors. As M is nonsingular and ‖V_n^{-1}‖ = O(n^{-1}) by Lemma 6.7.8(i), we have

‖A_n^{-1}‖ ≤ ‖(M')^{-1}‖ ‖V_n^{-1}‖ ‖M^{-1}‖ = O(n^{-1}) a.s.  (6.203)

Lemma 6.6.4, applied to T_n, gives ‖T_n‖ = o(n^{1/2}). Consequently,

I_{n2} ≤ ‖A_n^{-1}‖ ‖T_n‖² = O(n^{-1}) o(n) = o(1).  (6.204)

By Lemma 6.6.9

tr(Σ_{i=1}^{n} T_i T_i') = Σ_{i=1}^{n} ‖T_i‖² = O(n).  (6.205)

As a result of the partitioning [Eq. (6.135)], Lemma 6.5.9(ii) can be applied to A_n. Using Eqs. (6.201), (6.203) and (6.205) we get

0 ≤ I_{n1} ≤ S_n' (Σ_{i=1}^{n} S_i S_i')^{-1} S_n [1 + ‖A_n^{-1}‖ tr(Σ_{i=1}^{n} T_i T_i')] = o(1)[1 + O(n^{-1}) O(n)] = o(1).  (6.206)

The bounds obtained allow us to estimate I_{n3}:

|I_{n3}| ≤ ‖A_n^{-1/2}(S_n; 0)‖ ‖A_n^{-1/2}(0; T_n)‖ = (I_{n1} I_{n2})^{1/2} = o(1).  (6.207)

The desired conclusion Eq. (6.200) follows from Eqs. (6.202), (6.204), (6.206) and (6.207).

Subcase r = p. In this case S_n is empty, Y_n = T_n and the bound is similar to Eq. (6.204):

Y_n' V_n^{-1} Y_n = T_n' V_n^{-1} T_n ≤ ‖V_n^{-1}‖ ‖T_n‖² = O(n^{-1}) o(n) = o(1). ∎
6.7.11 Another Lemma on Purely Explosive Processes

Lemma. Let {e_n} be an m.d. sequence satisfying Eqs. (6.105) and (6.161). Suppose m > 1. Then

lim_{n→∞} Σ_{i=1}^{n} ‖B^{−n} Y_i‖ = Σ_{i=0}^{∞} ‖B^{−i} Y‖ < ∞ a.s.,

where Y is from Eq. (6.160).

Proof. Obviously, for any I > 0

Σ_{i=1}^{n} ‖B^{−n} Y_i‖ = (Σ_{i=I+1}^{n} + Σ_{i=1}^{I}) ‖B^{−(n−i)} B^{−i} Y_i‖.  (6.208)

By Theorem 6.6.7 the number I can be chosen in such a way that for i ≥ I we have ‖B^{−i} Y_i − Y‖ ≤ 1. We handle the first sum in Eq. (6.208) using Lemma 6.5.2:

|Σ_{i=I+1}^{n} ‖B^{−(n−i)} B^{−i} Y_i‖ − Σ_{j=0}^{n−I−1} ‖B^{−j} Y‖|

(change j = n − i in the second sum)

= |Σ_{i=I+1}^{n} (‖B^{−(n−i)} B^{−i} Y_i‖ − ‖B^{−(n−i)} Y‖)|
≤ Σ_{i=I+1}^{n} ‖B^{−(n−i)} (B^{−i} Y_i − Y)‖ ≤ Σ_{i=I+1}^{n} ‖B^{−(n−i)}‖ ≤ Σ_{i=0}^{∞} ‖B^{−i}‖ ≤ c_1 Σ_{i=0}^{∞} m^{−i} = c_2 < ∞.  (6.209)

For the second sum in Eq. (6.208) we apply the bound c_3 = sup_i ‖B^{−i} Y_i‖ < ∞:

Σ_{i=1}^{I} ‖B^{−(n−i)} B^{−i} Y_i‖ ≤ c_4 Σ_{i=n−I}^{∞} m^{−i} = c_5 m^{−(n−I)} → 0,  n → ∞.  (6.210)

Besides,

Σ_{i=n−I}^{∞} ‖B^{−i} Y‖ ≤ c_6 Σ_{i=n−I}^{∞} m^{−i} → 0,  n → ∞.  (6.211)

Equations (6.208)–(6.211) prove the statement. ∎
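A scalar illustration of the lemma (the scalar b plays the role of both B and m, e_n is Gaussian noise; all numbers are assumptions for the demo): for an explosive AR(1), b^{-i}Y_i converges a.s. and the weights b^{-(n-i)} are summable, so the normalized absolute sums Σ_{i≤n}|b^{-n}Y_i| stabilize quickly.

```python
import numpy as np

# Explosive scalar AR(1): Y_i = b * Y_{i-1} + e_i with b = 1.5 (so m = 1.5 > 1).
rng = np.random.default_rng(0)
b, n = 1.5, 60
Y = np.empty(n)
y = 0.0
for i in range(n):
    y = b * y + rng.standard_normal()
    Y[i] = y
# partial sums  sum_{i=1}^{m} |b^{-m} Y_i|  for increasing m: they should stabilize
partial = {m: float(sum(b ** (-m) * abs(Y[i]) for i in range(m))) for m in (20, 40, 60)}
```

The successive values of `partial` differ by geometrically small amounts, which is exactly the mechanism behind Eqs. (6.209)–(6.211).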
6.7.12 Strong Consistency of the Ordinary Least Squares Estimator

Theorem. Suppose that the m.d. sequence {e_n} satisfies Eqs. (6.105) and (6.179) [or, equivalently, Eq. (6.180)]. Then for k = 1, ..., p

lim_{n→∞} β̂_k(n) = β_k a.s.  (6.212)

Proof. Setting up proper normalization. The elements of the representation

β̂_k(n) − β_k = (Σ_{i=1}^{n−1} Y_i Y_i')^{-1} Σ_{i=1}^{n−1} Y_i e_{i+1,k}  (6.213)

need to be properly normalized to converge. We are assuming that for the spectrum-separating decomposition Eq. (6.132) holds and the process {S_n} is purely explosive. Therefore, by Theorem 6.7.7(i),

lim_{n→∞} B_1^{−n} (Σ_{i=1}^{n−1} S_i S_i') (B_1^{−n})' ≡ G is positive definite a.s.  (6.214)

Denoting

V_{n−1} = Σ_{i=1}^{n−1} T_i T_i',  D_n = diag[B_1^{−n}, V_{n−1}^{-1/2}],  A_{n−1} = M (Σ_{i=1}^{n−1} Y_i Y_i') M',  (6.215)

from Eq. (6.213) we get the desired representation:

β̂_k(n) − β_k = [n^{1/2} M' D_n'] [D_n A_{n−1} D_n']^{-1} [n^{-1/2} D_n M Σ_{i=1}^{n−1} Y_i e_{i+1,k}].  (6.216)

Convergence of the denominator matrix. Here we prove that

D_n A_{n−1} D_n' → diag[G, I_r] a.s.,  (6.217)

where r is the dimension of the process {T_n}. From Eqs. (6.135) and (6.215) we obtain the following representation for the denominator matrix:

D_n A_{n−1} D_n' =
( B_1^{−n} (Σ_{i=1}^{n−1} S_i S_i') (B_1^{−n})'     B_1^{−n} (Σ_{i=1}^{n−1} S_i T_i') V_{n−1}^{-1/2} )
( V_{n−1}^{-1/2} (Σ_{i=1}^{n−1} T_i S_i') (B_1^{−n})'     I_r ).

The limit of the upper left element of this matrix is given by Eq. (6.214). To prove Eq. (6.217), it suffices to show that

B_1^{−n} Σ_{i=1}^{n−1} S_i T_i' V_{n−1}^{-1/2} = Σ_{i=1}^{n−1} (B_1^{−n} S_i)(V_{n−1}^{-1/2} T_i)' → 0.  (6.218)

Since the process {T_n} is nonexplosive, by Theorem 6.7.10(ii)

max_{1≤j≤n−1} ‖V_{n−1}^{-1/2} T_j‖² = max_{1≤j≤n−1} T_j' V_{n−1}^{-1} T_j → 0.

Besides, by Lemma 6.7.11, in the purely explosive case

sup_n Σ_{i=1}^{n−1} ‖B_1^{−n} S_i‖ < ∞.  (6.219)

Clearly, Eq. (6.218) is a consequence of the above two equations.

Bounding the first factor in Eq. (6.216). By Lemma 6.7.8(i)

‖V_{n−1}^{-1/2}‖ = O(n^{-1/2}) a.s.  (6.220)

This bound, definition (6.215) and Lemma 6.5.2 imply

‖n^{1/2} M' D_n'‖ ≤ n^{1/2} ‖M'‖ (‖(B_1^{−n})'‖ + ‖V_{n−1}^{-1/2}‖) = O(n^{1/2})[O(m^{−n}) + O(n^{-1/2})] = O(1).  (6.221)

Bounding the last factor in Eq. (6.216). Using Eqs. (6.133) and (6.215) we get

n^{-1/2} D_n M Σ_{i=1}^{n−1} Y_i e_{i+1,k} = ( n^{-1/2} B_1^{−n} Σ_{i=1}^{n−1} S_i e_{i+1,k} ; n^{-1/2} V_{n−1}^{-1/2} Σ_{i=1}^{n−1} T_i e_{i+1,k} ).  (6.222)

Since ‖e_n‖ = o(n^{1/2}) by Lemma 6.6.1, Eq. (6.219) implies

‖n^{-1/2} B_1^{−n} Σ_{i=1}^{n−1} S_i e_{i+1,k}‖ ≤ (n^{-1/2} max_{1≤i≤n−1} |e_{i+1,k}|) Σ_{i=1}^{n−1} ‖B_1^{−n} S_i‖ → 0 a.s.  (6.223)

From Lemma 6.7.8(iii) we know that

‖V_{n−1}^{-1/2} Σ_{i=1}^{n−1} T_i e_{i+1,k}‖ = O((log n)^{1/2}).  (6.224)

Combining this with Eq. (6.220) we see that

‖n^{-1/2} V_{n−1}^{-1/2} Σ_{i=1}^{n−1} T_i e_{i+1,k}‖ = O(n^{-1/2} (log n)^{1/2}).  (6.225)

Equations (6.222), (6.223) and (6.225) prove that

n^{-1/2} D_n M Σ_{i=1}^{n−1} Y_i e_{i+1,k} → 0 a.s.  (6.226)

The strong consistency (6.212) is a consequence of Eqs. (6.216), (6.217), (6.222), (6.226) and the fact that G is positive definite a.s. ∎
CHAPTER 7

NONLINEAR MODELS

IN THIS chapter we consider two types of nonlinear estimation techniques: NLS and the ML method. In the first case we give a full proof of the result by Phillips (2007) for the model y_s = βs^γ + u_s, s = 1, ..., n. This proof includes an expanded exposition of the Wooldridge (1994) approach to asymptotic normality of an abstract estimator. In the second case, we give an extension to unbounded explanatory variables of the result of Gouriéroux and Monfort (1981) for the binary selection model. Problems arising from the unboundedness assumption are explained. Some ideas of the proof, such as obtaining and analyzing the Lipschitz constant, can be used in models other than binary logit; others, like the link to the linear model, are specific to the binary logit model.
7.1 ASYMPTOTIC NORMALITY OF AN ABSTRACT ESTIMATOR

The theory of nonlinear estimation is complex, and some authors in this area "overcome" its complexities by hiding them under a pile of conditions. The result by Wooldridge (1994) stands out by being rigorous and applicable to nonlinear regressions with nonstationary, dependent time series.
7.1.1 The Framework

We begin with an objective function Q_n(ω, θ), where ω is the sample data and θ is the parameter in the parameter space Θ. Θ is assumed to be of dimension p and, correspondingly, all square matrices will be of size p × p. Most of the time the dependence of Q_n and its derivatives on ω is suppressed. The vector of first-order derivatives

S_n(θ) = ∇_θ Q_n(θ)'

is called a score and the matrix of second-order derivatives

H_n(θ) = ∇_θ S_n(θ)

is called a Hessian. By an estimator of the true value θ_0 we mean a maximizing or minimizing point θ̂_n of the objective function Q_n(ω, θ), which, under appropriate conditions, is a solution to the first-order condition

S_n(θ̂_n) = 0 a.s.  (7.1)

Usually such an estimator exists only asymptotically almost surely, in the sense that the probability of the sample points ω for which it exists approaches unity as n → ∞. We are interested in conditions sufficient for existence and consistency of such an estimator. Once we have it, a mean value expansion about θ_0,

0 = S_n(θ̂_n) = S_n(θ_0) + H_n(θ̂_n, θ_0)(θ̂_n − θ_0),  (7.2)

can be used to investigate the asymptotic normality of θ̂_n. In Eq. (7.2), H_n(θ, θ_0) denotes the Hessian with rows evaluated at mean values θ* between θ and θ_0. That is, if we denote H_{n1}, ..., H_{np} the rows of H_n, then with some Δ_1, ..., Δ_p ∈ [0, 1]

H_n(θ, θ_0) = ( H_{n1}(θ_0 + Δ_1(θ − θ_0)) ; ... ; H_{np}(θ_0 + Δ_p(θ − θ_0)) )  (7.3)

(in fact, the argument here belongs to the cube with vertices θ and θ_0). The numbers Δ_i arise from an application of the mean value theorem to the p components of S_n. It is well known that, in general, they are different (one cannot apply the mean value theorem to the whole vector S_n to produce a single Δ for all i = 1, ..., p). With some abuse of notation the argument in Eq. (7.3) is denoted θ* = θ_0 + Δ(θ − θ_0), and then H_n(θ, θ_0) can be written as H_n(θ, θ_0) = H_n(θ*).

Short-Memory Linear Processes and Econometric Applications. Kairat T. Mynbaev. © 2011 John Wiley & Sons, Inc. Published 2011 by John Wiley & Sons, Inc.
7.1.2 Wooldridge's Assumptions

The first assumption is a set of regularity conditions to ensure a proper smoothness of Q_n.

7.1.2.1 Assumption W1

1. Q_n: Ω × Θ → R is the objective function defined on the data space Ω and the parameter space Θ ⊆ R^p.

2. The true parameter θ_0 belongs to the interior int(Θ).

3. Q_n satisfies standard measurability and differentiability conditions:
   a. for each θ ∈ Θ, Q_n(·, θ) is measurable,
   b. for each ω ∈ Ω, Q_n(ω, ·) is twice continuously differentiable on int(Θ).

The second assumption is about normalization of the score and Hessian at the true value θ_0.

7.1.2.2 Assumption W2. There exists a sequence of nonstochastic positive definite diagonal matrices {D_n} such that

D_n^{-1} S_n(θ_0) →d N(0, B_0)  and  D_n^{-1} H_n(θ_0) D_n^{-1} →p A_0,  (7.4)

where A_0 and B_0 are nonrandom matrices and A_0 is positive definite.

The next assumption realizes the idea of providing a type of uniform convergence H_n(θ) → H_n(θ_0) of the Hessian, normalized by something tending to infinity at a rate slower than D_n.

7.1.2.3 Assumption W3. There is a sequence of nonstochastic positive definite diagonal matrices {C_n} such that

C_n D_n^{-1} → 0 as n → ∞  (7.5)

and

max_{θ ∈ N_n^r(θ_0)} ‖C_n^{-1} [H_n(θ) − H_n(θ_0)] C_n^{-1}‖ = o_p(1),  (7.6)

where the neighborhood N_n^r(θ_0) of θ_0 is defined by

N_n^r(θ_0) = {θ ∈ Θ: ‖C_n(θ − θ_0)‖ ≤ r},  0 < r ≤ 1.

Owing to this assumption, (i) we allow each element of the Hessian to be standardized by a different function of the sample size and (ii) the neighborhood N_n^r(θ_0), over which the convergence H_n(θ) → H_n(θ_0) must take place uniformly, shrinks to θ_0 as the sample size tends to infinity. Everywhere in Section 7.1, Assumptions W1–W3 are assumed to hold.
7.1.3 Algebraic Lemma

It is convenient to denote

S_n^0 = S_n(θ_0),  H_n^0 = H_n(θ_0),  A_n = D_n^{-1} H_n^0 D_n^{-1}.

One of Wooldridge's tricks is to use expansions about the point

θ̃_n = θ_0 − (H_n^0)^{-1} S_n^0,  (7.7)

which mimics θ̂_n [from Eq. (7.2) we see that θ̂_n = θ_0 − (H_n(θ*))^{-1} S_n^0]. This point has the properties

(θ̃_n − θ_0)' S_n^0 = −(S_n^0)' (H_n^0)^{-1} S_n^0,  (θ̃_n − θ_0)' H_n^0 (θ̃_n − θ_0) = (S_n^0)' (H_n^0)^{-1} S_n^0.  (7.8)

The purpose of the lemma below is to show that the difference Q_n(θ) − Q_n(θ̃_n) is a quadratic function.

Lemma. The objective function satisfies

Q_n(θ) − Q_n(θ̃_n) = ½(θ − θ̃_n)' H_n^0 (θ − θ̃_n) + R_n(θ, θ_0) − R_n(θ̃_n, θ_0),  (7.9)

where R_n(θ, θ_0) = ½(θ − θ_0)' [H_n(θ, θ_0) − H_n^0] (θ − θ_0).

Proof. By the second-order Taylor expansion

Q_n(θ) − Q_n(θ_0) = (θ − θ_0)' S_n^0 + ½(θ − θ_0)' H_n^0 (θ − θ_0) + R_n(θ, θ_0).  (7.10)

Replacing θ by θ̃_n in Eq. (7.10) and using Eq. (7.8) we get

Q_n(θ̃_n) − Q_n(θ_0) = (θ̃_n − θ_0)' S_n^0 + ½(θ̃_n − θ_0)' H_n^0 (θ̃_n − θ_0) + R_n(θ̃_n, θ_0)
= −½(S_n^0)' (H_n^0)^{-1} S_n^0 + R_n(θ̃_n, θ_0).  (7.11)

By Eq. (7.7) we have θ_0 = θ̃_n + (H_n^0)^{-1} S_n^0, so by Eq. (7.10)

Q_n(θ) − Q_n(θ_0) = [θ − θ̃_n − (H_n^0)^{-1} S_n^0]' S_n^0 + ½[θ − θ̃_n − (H_n^0)^{-1} S_n^0]' H_n^0 [θ − θ̃_n − (H_n^0)^{-1} S_n^0] + R_n(θ, θ_0)
= (θ − θ̃_n)' S_n^0 − (S_n^0)' (H_n^0)^{-1} S_n^0 + ½(θ − θ̃_n)' H_n^0 (θ − θ̃_n)
− ½(θ − θ̃_n)' H_n^0 (H_n^0)^{-1} S_n^0 − ½(S_n^0)' (H_n^0)^{-1} H_n^0 (θ − θ̃_n)
+ ½(S_n^0)' (H_n^0)^{-1} S_n^0 + R_n(θ, θ_0).

Here some terms cancel out, and the result is

Q_n(θ) − Q_n(θ_0) = −½(S_n^0)' (H_n^0)^{-1} S_n^0 + ½(θ − θ̃_n)' H_n^0 (θ − θ̃_n) + R_n(θ, θ_0).  (7.12)

Subtracting Eq. (7.11) from Eq. (7.12) we get Eq. (7.9). ∎
7.1.4 Lemma on Convergence in Probability

Lemma. If a sequence of random vectors {a_n} satisfies plim a_n = 0, then there exists a sequence of positive numbers {r_n} such that lim r_n = 0 and lim P(‖a_n‖ > r_n) = 0.

Proof. By definition, for any δ > 0 we have P(‖a_n‖ > δ) → 0. Hence, letting δ_1 = 1 we can find n_1 such that P(‖a_n‖ > δ_1) ≤ δ_1 for all n ≥ n_1. Similarly, for δ_2 = 2^{-1} there exists n_2 > n_1 such that P(‖a_n‖ > δ_2) ≤ δ_2 for all n ≥ n_2. On the kth step we put δ_k = 2^{-k+1} and find n_k > n_{k-1} such that

P(‖a_n‖ > δ_k) ≤ δ_k for all n ≥ n_k.  (7.13)
{ω: 5δ_n ≤ λ_n/4}. Since λ_n tends to a positive number by Assumption W2, Lemma 7.1.6(i) and Eq. (7.14) imply Eq. (7.20). By Lemmas 7.1.3 and 7.1.6, for ω ∈ Ω̃_n

min_{θ ∈ ∂Ñ_n(r_n)} f_n(θ) ≥ min_{θ ∈ ∂Ñ_n(r_n)} { ½[D_n(θ − θ̃_n)]' (D_n^{-1} H_n^0 D_n^{-1}) [D_n(θ − θ̃_n)] + R_n(θ, θ_0) − R_n(θ̃_n, θ_0) } ≥ ½λ_n r_n² − 5δ_n r_n² ≥ ¼λ_n r_n² > 0.

Thus, there are points inside Ñ_n(r_n) where f_n takes values lower than on its boundary. Since f_n is smooth by Assumption W1, it achieves its minimum inside Ñ_n(r_n) and
the point of minimum θ̂_n satisfies the first-order condition (7.1). The inequality ‖D_n(θ̂_n − θ̃_n)‖ ≤ r_n and Eq. (7.15) prove Eq. (7.21). ∎
7.1.8 Asymptotic Normality of θ̂_n

Theorem. Under Assumptions W1–W3 the estimator θ̂_n from Theorem 7.1.7 satisfies

D_n(θ̂_n − θ_0) →d N(0, A_0^{-1} B_0 A_0^{-1}).  (7.22)

Proof. By Theorem 7.1.7, for almost any ω ∈ Ω̃_n we can use Eq. (7.2). Premultiplication of that equation by D_n^{-1} yields

0 = D_n^{-1} S_n^0 + D_n^{-1} H̃_n (θ̂_n − θ_0)
= D_n^{-1} S_n^0 + A_n D_n(θ̂_n − θ_0) + D_n^{-1}(H̃_n − H_n^0) D_n^{-1} D_n(θ̂_n − θ_0),  (7.23)

where we denote H̃_n = H_n(θ̂_n, θ_0). Now we show that the end term in Eq. (7.23) is negligible. Equation (7.21) implies that the mean value θ*_n = θ_0 + Δ(θ̂_n − θ_0) satisfies D_n(θ*_n − θ_0) = O_p(1). By condition (7.5) then

C_n(θ*_n − θ_0) = o_p(1).  (7.24)

Denoting Ω̄_n = {ω ∈ Ω̃_n: ‖C_n(θ*_n − θ_0)‖ ≤ 1}, we have P(Ω̄_n) → 1 by Eqs. (7.20) and (7.24). For ω ∈ Ω̄_n it holds that θ*_n ∈ N_n(1) for all large n and therefore, by Assumption W3,

‖D_n^{-1}(H̃_n − H_n^0) D_n^{-1}‖ ≤ ‖D_n^{-1} C_n‖ ‖C_n^{-1}(H̃_n − H_n^0) C_n^{-1}‖ ‖C_n D_n^{-1}‖ = o_p(1)

(D_n^{-1} C_n = C_n D_n^{-1} because these matrices are diagonal). This bound and consistency (7.21) show that the end term in Eq. (7.23) is o_p(1). It follows that 0 = D_n^{-1} S_n^0 + A_n D_n(θ̂_n − θ_0) + o_p(1). As a result of Assumption W2 this implies D_n(θ̂_n − θ_0) = −A_n^{-1} D_n^{-1} S_n^0 + o_p(1). It remains to apply Assumption W2 again to prove Eq. (7.22). ∎
7.2 CONVERGENCE OF SOME DETERMINISTIC AND STOCHASTIC EXPRESSIONS

To make the exposition of Phillips' method in Section 7.3 clearer, in Section 7.2 we collect some technical tools arising in his approach.
7.2.1 Approximation of Integrals by Integral Sums: General Statement

Lemma. Denote

R(f) = (1/n) Σ_{t=1}^{n} f(t/n) − ∫_0^1 f(t) dt.

(i) If f is absolutely continuous on [0, 1], then |R(f)| ≤ ‖f'‖_1/n.

(ii) Suppose f is continuously differentiable on (0, 1], |f| is monotone on (0, δ_0) for some 0 < δ_0 < 1 and

sup_{δ<t<1} |f'(t)| ≤ c|f'(δ)|  for all 0 < δ ≤ δ_0/2.  (7.25)

Then

|R(f)| ≤ 2∫_0^{2δ} |f(t)| dt + c|f'(δ)|/n  for all 0 < δ ≤ δ_0/2.

Proof. (i) Using the Newton–Leibniz formula and changing the order of integration we have [with i_t = ((t − 1)/n, t/n)]

|(1/n) f(t/n) − ∫_{i_t} f(s) ds| = |∫_{i_t} [f(t/n) − f(s)] ds| ≤ ∫_{i_t} ∫_s^{t/n} |f'(u)| du ds ≤ (1/n) ∫_{i_t} |f'(u)| du.

Hence,

|R(f)| ≤ Σ_{t=1}^{n} |(1/n) f(t/n) − ∫_{i_t} f(s) ds| ≤ (1/n) Σ_{t=1}^{n} ∫_{i_t} |f'(u)| du = ‖f'‖_1/n.

(ii) By monotonicity of |f|, for small t the term (1/n)|f(t/n)| does not exceed either ∫_{i_t} |f(s)| ds or ∫_{i_{t+1}} |f(s)| ds, so

(1/n) Σ_{t=1}^{[nδ]} |f(t/n)| ≤ ∫_0^{δ+1/n} |f(t)| dt ≤ ∫_0^{2δ} |f(t)| dt.

By the finite increments formula and condition (7.25)

|Σ_{t=[nδ]}^{n} [(1/n) f(t/n) − ∫_{i_t} f(s) ds]| = |Σ_{t=[nδ]}^{n} ∫_{i_t} [f(t/n) − f(s)] ds| ≤ sup_{s≥δ} |f'(s)| Σ_{t=1}^{n} (1/n) ∫_{i_t} ds ≤ c|f'(δ)|/n.

The last two bounds result in

|R(f)| ≤ (1/n) Σ_{t=1}^{[nδ]} |f(t/n)| + ∫_0^δ |f(s)| ds + |Σ_{t=[nδ]}^{n} [(1/n) f(t/n) − ∫_{i_t} f(s) ds]| ≤ 2∫_0^{2δ} |f(s)| ds + c|f'(δ)|/n. ∎
7.2.2 Approximation of Integrals by Integral Sums: Special Cases

Lemma. Let i be a nonnegative integer and γ_0 a real number such that γ_0 > −1. Denote f_i(t) = t^γ (log t)^i, where γ belongs to a small neighborhood O_δ(γ_0) ⊆ (−1, ∞) of γ_0. Then uniformly in γ ∈ O_δ(γ_0)

R(f_i) = O(1/n)  if γ_0 > 1,  (7.26)

R(f_i) = O((log n)^i / n^{(γ+1)/2})  if 1 ≥ γ_0 > −1.  (7.27)

Proof. From f_i'(t) = γt^{γ−1}(log t)^i + it^{γ−1}(log t)^{i−1} we get

|f_i'(t)| ≤ ct^{γ−1}(1 + |log t|^i).  (7.28)

Equation (7.26) follows from this bound and Lemma 7.2.1(i). In the case γ_0 ≤ 1 we see from Eq. (7.28) that f_i satisfies the assumptions of Lemma 7.2.1(ii). Therefore

|R(f_i)| ≤ 2∫_0^{2δ} t^γ |log t|^i dt + c|f_i'(δ)|/n.  (7.29)

It is easy to see that with some constants c_{ij}

∫_0^a t^γ (log t)^i dt = a^{γ+1} Σ_{j=0}^{i} c_{ij} log^j a.  (7.30)

Indeed, for i = 0 one has ∫_0^a t^γ dt = a^{γ+1}/(γ + 1). Suppose Eq. (7.30) is true for i = k. Then for i = k + 1, integration by parts gives

∫_0^a t^γ (log t)^{k+1} dt = [t^{γ+1}(log t)^{k+1}/(γ + 1)]_0^a − [(k + 1)/(γ + 1)] ∫_0^a t^γ (log t)^k dt
= a^{γ+1}(log a)^{k+1}/(γ + 1) − [(k + 1)/(γ + 1)] a^{γ+1} Σ_{j=0}^{k} c_{kj} log^j a,

which is of the form (7.30). Equations (7.28)–(7.30) imply for small δ

|R(f_i)| ≤ c_1 δ^{γ+1} |log δ|^i + c_2 δ^{γ−1} |log δ|^i / n.

The choice δ = n^{−1/2} yields δ^{γ+1} = δ^{γ−1}/n = n^{−(γ+1)/2} and finishes the proof of Eq. (7.27). ∎
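Both regimes of the lemma can be verified numerically, using the closed form ∫_0^1 t^γ(log t)^i dt = (−1)^i i!/(γ+1)^{i+1} implied by Eq. (7.30). The parameter values below are illustrative choices for the demo, not taken from the text.

```python
import math
import numpy as np

def R(n, gam, i):
    """R(f) for f(t) = t^gam (log t)^i, using the exact integral from Eq. (7.30)."""
    t = np.arange(1, n + 1) / n
    riemann = float(np.mean(t ** gam * np.log(t) ** i))
    exact = (-1) ** i * math.factorial(i) / (gam + 1) ** (i + 1)
    return riemann - exact

r_smooth = abs(R(1000, 0.5, 0))                  # absolutely continuous f: O(1/n) regime
r3 = abs(R(10 ** 3, -0.4, 1))                    # the regime of Eq. (7.27)
r5 = abs(R(10 ** 5, -0.4, 1))
bound3 = np.log(10 ** 3) * (10 ** 3) ** (-0.3)   # (log n)^i n^{-(gam+1)/2} with gam = -0.4
bound5 = np.log(10 ** 5) * (10 ** 5) ** (-0.3)
```

The error for the singular integrand is visibly larger than 1/n but stays within a constant multiple of the (7.27) rate.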
7.2.3 L_p-Approximability of Power Sequences

Here we consider sequences x_n = (1^γ, 2^γ, ..., n^γ). For γ ≥ 0, L_p-approximability of x_n/‖x_n‖ is shown in Section 2.7.3, where the fact that γ is an integer did not play any role. Continuity, however, was important. Here we consider negative γ, when there is no continuity.

Lemma. Let 1 ≤ p < ∞ and 0 > γ > −1/p. Then

(i) ‖x_n‖_p^p = n^{γp+1}/(γp + 1) + O(n^{(γp+1)/2});

(ii) the sequence w_n = n^{−1/p}((1/n)^γ, ..., (n/n)^γ) is L_p-close to W(s) = s^γ. Moreover, ‖w_n − δ_{np}W‖_p → 0 uniformly in γ ∈ (−1/p + d, 0) for any d > 0.

Proof. (i) By Eq. (7.27)

‖x_n‖_p^p = Σ_{t=1}^{n} t^{γp} = n^{γp+1} (1/n) Σ_{t=1}^{n} (t/n)^{γp} = n^{γp+1} [∫_0^1 s^{γp} ds + O(n^{−(γp+1)/2})] = n^{γp+1}/(γp + 1) + O(n^{(γp+1)/2}).

(ii) By Minkowski's inequality

‖w_n − δ_{np}W‖_p ≤ (Σ_{t=1}^{[nδ]} |w_{nt}|^p)^{1/p} + (Σ_{t=1}^{[nδ]} |(δ_{np}W)_t|^p)^{1/p} + (Σ_{t=[nδ]}^{n} |w_{nt} − (δ_{np}W)_t|^p)^{1/p}.  (7.31)

Here, by monotonicity,

Σ_{t=1}^{[nδ]} |w_{nt}|^p = (1/n) Σ_{t=1}^{[nδ]} (t/n)^{γp} ≤ ∫_0^δ s^{γp} ds = cδ^{γp+1}.  (7.32)

Using Hölder's inequality and the definition (δ_{np}W)_t = n^{1−1/p} ∫_{i_t} W(s) ds we obtain a similar bound for the second term on the right of Eq. (7.31):

Σ_{t=1}^{[nδ]} |(δ_{np}W)_t|^p = n^{p−1} Σ_{t=1}^{[nδ]} |∫_{i_t} W(s) ds|^p ≤ n^{p−1} Σ_{t=1}^{[nδ]} ∫_{i_t} W^p(s) ds (∫_{i_t} ds)^{p−1} = Σ_{t=1}^{[nδ]} ∫_{i_t} W^p(s) ds ≤ ∫_0^δ s^{γp} ds = cδ^{γp+1}.  (7.33)

By the finite increments formula

Σ_{t=[nδ]}^{n} |w_{nt} − (δ_{np}W)_t|^p = Σ_{t=[nδ]}^{n} n^{p−1} |∫_{i_t} [(t/n)^γ − s^γ] ds|^p = Σ_{t=[nδ]}^{n} n^{p−1} |∫_{i_t} W'(u)(t/n − s) ds|^p ≤ sup_{s≥δ} |W'(s)|^p Σ_{t=[nδ]}^{n} n^{−p−1} ≤ (1/n^p) sup_{s≥δ} |W'(s)|^p.  (7.34)

Since W satisfies condition (7.25), Eq. (7.34) implies

(Σ_{t=[nδ]}^{n} |w_{nt} − (δ_{np}W)_t|^p)^{1/p} ≤ (c/n) δ^{γ−1}.  (7.35)

Equations (7.31), (7.32), (7.33), and (7.35) yield

‖w_n − δ_{np}W‖_p ≤ c(δ^{γ+1/p} + δ^{γ−1}/n).

The choice δ^{γ+1/p} = δ^{γ−1}/n or, equivalently, δ = n^{−1/(1+1/p)} finishes the proof of L_p-approximability with the bound

‖w_n − δ_{np}W‖_p = O(n^{−(pγ+1)/(p+1)}).

The fact that this convergence is uniform in γ ∈ (−1/p + d, 0) follows from the observation that the constants in Eqs. (7.32), (7.33), and (7.35) are uniformly bounded. ∎
7.2.4 Definition of Auxiliary Deterministic and Stochastic Expressions

Phillips defines the matrix C_n required in the Wooldridge framework by

C_n = n^{γ_0+1/2−d} diag[1, log n].

With this definition the neighborhood from Assumption W3 becomes

N_n^r(θ_0) = {θ ∈ Θ: ‖C_n(θ − θ_0)‖ ≤ r} = {θ: [n^{γ_0+1/2−d}(β − β_0)]² + [n^{γ_0+1/2−d}(log n)(γ − γ_0)]² ≤ r²}.  (7.36)

In particular, for θ ∈ N_n^1(θ_0)

|β − β_0| ≤ 1/n^{γ_0+1/2−d},  |γ − γ_0| ≤ 1/(n^{γ_0+1/2−d} log n).  (7.37)

We need two types of deterministic expressions:

D_{ni}^1 = Σ_{s=1}^{n} (β^i s^{2γ} − β_0^i s^{2γ_0}) log^i s,  i = 0, 1, 2,  (7.38)

D_{ni}^2 = Σ_{s=1}^{n} (βs^γ − β_0 s^{γ_0}) β^i s^γ log^{i+1} s,  i = 0, 1,  (7.39)

and two types of stochastic ones:

S_{ni}^1 = Σ_{s=1}^{n} u_s (β^i s^γ − β_0^i s^{γ_0}) log^{i+1} s,  i = 0, 1,  (7.40)

S_{ni}^2(γ) = (1/√n) Σ_{s=1}^{n} u_s (s/n)^γ log^i(s/n),  i = 0, 1, 2, 3.  (7.41)

In these definitions, (β, γ) ∈ N_n^r(θ_0). Therefore Eqs. (7.38), (7.39), and (7.40) should converge to zero in some sense.
7.2.5 Bounding Deterministic Expressions

Lemma. Let γ_0 > −1/2. With some (β*, γ*) between (β_0, γ_0) and (β, γ), uniformly in (β, γ) ∈ N_n^1(θ_0) we have

|D_{ni}^1| ≤ c n^{2γ*−γ_0+1/2+d} log^i n,  (7.42)

|D_{ni}^2| ≤ c n^{γ*+γ−γ_0+1/2+d} log^{i+1} n.  (7.43)

Proof. With some (β*, γ*) between (β_0, γ_0) and (β, γ), by the finite increments formula

β^i s^{2γ} − β_0^i s^{2γ_0} = i(β*)^{i−1} s^{2γ*}(β − β_0) + 2(β*)^i s^{2γ*}(γ − γ_0) log s.  (7.44)

Therefore Eq. (7.38) becomes

D_{ni}^1 = i(β*)^{i−1}(β − β_0) Σ_{s=1}^{n} s^{2γ*} log^i s + 2(β*)^i (γ − γ_0) Σ_{s=1}^{n} s^{2γ*} log^{i+1} s.  (7.45)

This requires estimation of

T_{ni} = Σ_{s=1}^{n} s^{2γ*} log^i s,  i = 0, ..., 4.

Substitution of log s = log(s/n) + log n yields

T_{ni} = Σ_{s=1}^{n} s^{2γ*} Σ_{j=0}^{i} C_j^i log^j(s/n) log^{i−j} n = n^{2γ*+1} Σ_{j=0}^{i} C_j^i (log^{i−j} n) (1/n) Σ_{s=1}^{n} (s/n)^{2γ*} log^j(s/n).  (7.46)

As a result of Eq. (7.37), for all large n we have 2γ* + 1 > d_1 for some d_1 > 0. Hence, by Lemma 7.2.2

|T_{ni}| ≤ c_1 n^{2γ*+1} log^i n.  (7.47)

Now Eqs. (7.37), (7.45), and (7.47) imply Eq. (7.42):

|D_{ni}^1| ≤ c_2 n^{2γ*+1} log^i n / n^{γ_0+1/2−d} + c_3 n^{2γ*+1} log^{i+1} n / (n^{γ_0+1/2−d} log n) = c_4 n^{2γ*−γ_0+1/2+d} log^i n.

In the case of D_{ni}^2, instead of Eqs. (7.44) and (7.45) we have, respectively,

βs^γ − β_0 s^{γ_0} = s^{γ*}(β − β_0) + β* s^{γ*}(γ − γ_0) log s,  (7.48)

D_{ni}^2 = β^i (β − β_0) Σ_{s=1}^{n} s^{γ*+γ} log^{i+1} s + β^i β* (γ − γ_0) Σ_{s=1}^{n} s^{γ*+γ} log^{i+2} s,  i = 0, 1.

Instead of Eq. (7.46) we need to estimate T̃_{ni} = Σ_{s=1}^{n} s^{γ*+γ} log^i s, i = 1, 2, 3. In addition to Lemma 7.2.2, in the derivation of the analog of Eq. (7.47) we have to apply the Hölder inequality

(1/n) Σ_{s=1}^{n} (s/n)^{γ*+γ} |log^j(s/n)| ≤ [(1/n) Σ_{s=1}^{n} (s/n)^{2γ*} |log^j(s/n)|]^{1/2} [(1/n) Σ_{s=1}^{n} (s/n)^{2γ} |log^j(s/n)|]^{1/2}.

The result is |T̃_{ni}| ≤ c_1 n^{γ*+γ+1} log^i n and, hence,

|D_{ni}^2| ≤ c_2 n^{γ*+γ+1} log^{i+1} n / n^{γ_0+1/2−d} + c_3 n^{γ*+γ+1} log^{i+2} n / (n^{γ_0+1/2−d} log n) = c_4 n^{γ*+γ−γ_0+1/2+d} log^{i+1} n. ∎
7.2.6 Multiplication of L_p-Approximable Sequences by Continuous Functions

Suppose sequences {w_n(γ): n = 1, 2, ...} depend on a parameter γ ∈ Γ_n. We say that {w_n(γ)} is L_p-close to W ∈ L_p(0, 1) uniformly on Γ_n if

sup_{γ ∈ Γ_n} ‖w_n(γ) − δ_{np}W‖_p → 0.

The lemma below shows that this property is preserved under multiplication by continuous functions.

Lemma. Let f ∈ C[0, 1]. Denote M(s) = W(s)f(s) and consider the product sequences

m_n(γ) = (w_{n1}(γ) f(1/n), ..., w_{nn}(γ) f(n/n)).

If {w_n(γ)} is L_p-close to W ∈ L_p(0, 1) uniformly on Γ_n, then {m_n(γ)} is L_p-close to M uniformly on Γ_n.

Proof. Denote z_n(γ) = m_n(γ) − δ_{np}M. We need to prove that

sup_{γ ∈ Γ_n} ‖z_n(γ)‖_p → 0.  (7.49)

As f is uniformly continuous, for any ε > 0 there exists δ > 0 such that |s − s'| ≤ δ implies |f(s) − f(s')| ≤ ε, and sup_{s ∈ i_t} |f(t/n) − f(s)| ≤ ε for n ≥ 1/δ. Now we bound the tth component of z_n(γ):

|[z_n(γ)]_t| = |w_{nt}(γ) f(t/n) − [δ_{np}(Wf)]_t|
≤ |f(t/n)| |w_{nt}(γ) − n^{1/q} ∫_{i_t} W(s) ds| + n^{1/q} |∫_{i_t} W(s) [f(t/n) − f(s)] ds|
≤ ‖f‖_C |w_{nt}(γ) − (δ_{np}W)_t| + ε(δ_{np}|W|)_t.

Hence, by boundedness of δ_{np},

‖z_n(γ)‖_p ≤ ‖f‖_C sup_{γ ∈ Γ_n} ‖w_n(γ) − δ_{np}W‖_p + ε‖W‖_p.

This proves Eq. (7.49). ∎
7.2.7 Convergence in Distribution of S_{ni}^2(γ_0)

7.2.7.1 Assumption P1. The errors u_t in the model y_s = βs^γ + u_s are linear processes

u_t = Σ_{j∈Z} c_j e_{t−j},  t ∈ Z,

where {e_t, F_t: t ∈ Z} is an m.d. array, Σ_j |c_j| < ∞, second conditional moments are constant, E(e_t² | F_{t−1}) = σ_e² for all t, and the squares e_t² are uniformly integrable.

Lemma. If γ_0 > −1/2 and u_t satisfy Assumption P1, then

S_{ni}^2(γ_0) = (1/√n) Σ_{s=1}^{n} u_s (s/n)^{γ_0} log^i(s/n) →d N(0, (σ_e Σ_j c_j)² ∫_0^1 s^{2γ_0} log^{2i} s ds).

Proof. Let d > 0 be such that γ_0 − d > −1/2. The sequence

{(1/√n)(s/n)^{γ_0−d}: s = 1, ..., n},  n = 1, 2, ...,

is L_2-close to W(s) = s^{γ_0−d} by Lemma 7.2.3(ii). Since the function f(s) = s^d log^i s is continuous on [0, 1], the product sequence

(1/√n)(s/n)^{γ_0−d} (s/n)^d log^i(s/n) = (1/√n)(s/n)^{γ_0} log^i(s/n)

is L_2-close to M(s) = W(s)f(s) = s^{γ_0} log^i s by Lemma 7.2.6 (where Γ_n = {γ_0}). The conclusion of this lemma follows from Theorem 3.5.2. ∎

7.2.8 A Uniform Bound on S_{ni}^2(γ)
Lemma. If γ_0 > −1/2 and u_t satisfy Assumption P1, then uniformly in γ ∈ N_n^1(θ_0) [see Eqs. (7.36) and (7.41) for the definitions]

S_{ni}^2(γ) = O_p(1).  (7.50)

Proof. To make use of Lemma 7.2.7, we approximate S_{ni}^2(γ) by S_{ni}^2(γ_0). For the approximation to work, we need to bound γ away from −1/2. Therefore we write

S_{ni}^2(γ) = (1/√n) Σ_{s=1}^{n} u_s (s/n)^γ log^i(s/n) = (1/√n) Σ_{s=1}^{n} u_s (s/n)^{γ−d} (s/n)^d log^i(s/n),

where d > 0 is small and γ − d is uniformly bounded away from −1/2. By the mean value theorem there are points γ_{s,n} between γ and γ_0 such that

(s/n)^{γ−d} − (s/n)^{γ_0−d} = (γ − γ_0)(s/n)^{γ_{s,n}−d} log(s/n).

Denoting w_n(γ) = {(1/√n)(s/n)^{γ−d}: s = 1, ..., n} we have

‖w_n(γ) − w_n(γ_0)‖_2² = (1/n) Σ_{s=1}^{n} [(s/n)^{γ−d} − (s/n)^{γ_0−d}]² = |γ − γ_0|² (1/n) Σ_{s=1}^{n} (s/n)^{2(γ_{s,n}−d)} log²(s/n).

By Lemma 7.2.2, uniformly in n and γ_{s,n} the estimate

(1/n) Σ_{s=1}^{n} (s/n)^{2(γ_{s,n}−d)} log²(s/n) ≤ c

is true. Thus, by Eq. (7.37), uniformly in γ ∈ N_n^1(θ_0),

‖w_n(γ) − w_n(γ_0)‖_2 ≤ c/(n^{γ_0+1/2−d} log n).

We know from Lemma 7.2.3 that w_n(γ_0) is L_2-close to W(s) = s^{γ_0−d}, so the above inequality implies

sup_{γ ∈ N_n^1(θ_0)} ‖w_n(γ) − δ_{n2}W‖_2 → 0.

With f(s) = s^d log^i s, then by Eq. (7.49)

sup_{γ ∈ N_n^1(θ_0)} ‖m_n(γ) − δ_{n2}M‖_2 → 0,

where M(s) = W(s)f(s) = s^{γ_0} log^i s and

m_n(γ) = {(1/√n)(s/n)^{γ−d}(s/n)^d log^i(s/n)} = {(1/√n)(s/n)^γ log^i(s/n)}.

Now we use identity (3.25), orthogonality of m.d.'s and Lemma 2.3.2 to get

‖S_{ni}^2(γ) − S_{ni}^2(γ_0)‖_2² = ‖Σ_{s=1}^{n} u_s [m_n(γ) − m_n(γ_0)]_s‖_2² = ‖Σ_{i∈Z} e_i {T_n[m_n(γ) − m_n(γ_0)]}_i‖_2²
= σ_e² ‖T_n[m_n(γ) − m_n(γ_0)]‖_2² ≤ σ_e² (Σ_{i∈Z} |c_i|)² ‖m_n(γ) − m_n(γ_0)‖_2².  (7.51)

Since m_n(γ_0) is L_2-close to M, this bound and Eq. (7.51) imply

S_{ni}^2(γ) = S_{ni}^2(γ_0) + o_p(1)  (7.52)

uniformly in γ ∈ N_n^1(θ_0). Lemma 7.2.7 and Eq. (7.52) prove Eq. (7.50). ∎
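The variance in Lemma 7.2.7, which this lemma extends uniformly over the shrinking neighborhood, can be confirmed by simulation. Below u_t is a short MA linear process (the weights c_j, sample size and replication count are illustrative assumptions), σ_e = 1, and the limit variance uses ∫_0^1 s^{2γ_0} log^{2i}s ds = (2i)!/(2γ_0+1)^{2i+1}, the closed form from Eq. (7.30).

```python
import math
import numpy as np

# Monte Carlo check of Lemma 7.2.7 for an MA(2) linear process (illustrative c_j).
rng = np.random.default_rng(1)
g0, i, n, reps = -0.1, 1, 2000, 3000
c = np.array([1.0, 0.5, 0.25])
s = np.arange(1, n + 1) / n
weights = s ** g0 * np.log(s) ** i / np.sqrt(n)        # coefficients of S2_{ni}(g0)
vals = np.empty(reps)
for r in range(reps):
    e = rng.standard_normal(n + 2)
    u = c[0] * e[2:] + c[1] * e[1:-1] + c[2] * e[:-2]  # u_t = sum_j c_j e_{t-j}
    vals[r] = float(u @ weights)
emp = float(np.var(vals))
# limit variance: (sigma_e * sum_j c_j)^2 * int_0^1 s^{2 g0} log^{2i} s ds
theory = float(c.sum()) ** 2 * math.factorial(2 * i) / (2 * g0 + 1) ** (2 * i + 1)
```

The empirical variance sits close to the theoretical value; the gap that remains is the discretization of the singular weight near s = 0, which shrinks as n grows.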
7.2.9 Bounding S_{ni}^1

Lemma. If γ_0 > −1/2 and u_t satisfy Assumption P1, then the variables (7.40) satisfy

S_{ni}^1 = O_p(n^{γ*−γ_0+d} (log n)^{i+1})  uniformly in θ ∈ N_n^1(θ_0).

Proof. Using Eq. (7.48), rewrite Eq. (7.40) as

S_{ni}^1 = i(β*)^{i−1}(β − β_0) Σ_{s=1}^{n} u_s s^{γ*} log^{i+1} s + (β*)^i (γ − γ_0) Σ_{s=1}^{n} u_s s^{γ*} log^{i+2} s,  i = 0, 1.  (7.53)

Therefore we need to bound

U_{ni} = Σ_{s=1}^{n} u_s s^{γ*} log^i s,  i = 1, 2, 3.

Substituting log s = log(s/n) + log n we get

U_{ni} = Σ_{s=1}^{n} u_s s^{γ*} Σ_{j=0}^{i} C_j^i log^j(s/n) log^{i−j} n
= Σ_{j=0}^{i} C_j^i (log^{i−j} n) n^{γ*+1/2} (1/√n) Σ_{s=1}^{n} u_s (s/n)^{γ*} log^j(s/n)
= n^{γ*+1/2} Σ_{j=0}^{i} C_j^i (log^{i−j} n) S_{nj}^2(γ*).

By Lemma 7.2.8, U_{ni} = O_p(n^{γ*+1/2} log^i n). This bound, Eqs. (7.37) and (7.53) yield

S_{ni}^1 = O_p(n^{γ*+1/2} log^{i+1} n / n^{γ_0+1/2−d}) + O_p(n^{γ*+1/2} log^{i+2} n / (n^{γ_0+1/2−d} log n)) = O_p(n^{γ*−γ_0+d} log^{i+1} n)

uniformly in θ ∈ N_n^1(θ_0). ∎
7.3 NONLINEAR LEAST SQUARES

7.3.1 The Model and History

Asymptotically collinear regressors arise in nonlinear regressions of the type

y_s = βs^γ + u_s,  s = 1, ..., n,  (7.54)

where the trend component γ > −1/2 is to be estimated along with the regression coefficient β. Let β_0 and γ_0 denote the true values of the parameters. The first-order expansion for βs^γ is

βs^γ ≈ β_0 s^{γ_0} + s^{γ_0}(β − β_0) + (β_0 s^{γ_0} log s)(γ − γ_0) = βs^{γ_0} + (γ − γ_0)β_0 s^{γ_0} log s.  (7.55)

Thus, the linearized form of Eq. (7.54) involves the regressors s^{γ_0} and s^{γ_0} log s, which are asymptotically collinear and whose second moment matrix is asymptotically singular upon appropriate (multivariate) normalization. Wu (1981, p. 509) noted that model (7.54) failed his conditions [which require a single normalizing quantity and a positive definite limit matrix for the second moment matrix of the linearized version of Eq. (7.54), y_s = βs^{γ_0} + (γ − γ_0)β_0 s^{γ_0} log s + u_s]. More precisely, Wu noted that the model (7.54) satisfies his conditions for strong consistency of the least squares estimator θ̂ = (β̂, γ̂), but not his conditions for asymptotic normality. There are two reasons for the failure:

1. the Hessian requires different standardizations for the parameters β and γ (whereas Wu's approach uses a common standardization) and

2. the Hessian is asymptotically singular because of the asymptotic collinearity of the functions s^{γ_0} and s^{γ_0} log s that appear in the score (whereas Wu's theory requires the variance matrix to have a positive definite limit).

Phillips (2007) derived the asymptotic distribution of the NLS estimator for Eq. (7.54). The theory is very instructive because of:

1. his choice of standardizing matrices,

2. the way he modified the Wooldridge approach to suit Eq. (7.54) and

3. the possibility of adapting the theory to models different from Eq. (7.54).

This section contains a full proof of his results using, where necessary, statements based on L_p-approximability instead of those based on Brownian motion. While it is in general a matter of taste which approach to use, in one case the Brownian motion methods do not seem adequate for applications: Phillips (2007, Lemma 6.1) is proved for smooth functions, while its purported application (Phillips 2007, p. 607) is to functions which may not be continuous.
7.3 NONLINEAR LEAST SQUARES
7.3.2 The Objective Function, Score and Hessian

In the NLS method, the objective function for model (7.54) is defined by
$$Q_n(\theta) = \sum_{s=1}^n (y_s - \beta s^\gamma)^2, \quad \text{where } \theta = (\beta, \gamma).$$
The estimator $\hat\theta$, by definition, solves the extremum problem
$$\hat\theta = \arg\min_\theta Q_n(\theta)$$
(we minimize the objective function over $\theta$ and take the minimizing point as the estimator). As a result of smoothness of $Q_n$, the estimator satisfies the first-order condition
$$S_n(\hat\theta) = 0. \quad (7.56)$$
Since scaling the derivatives by $-1/2$ does not change the validity of the expansion (7.2) and does not affect asymptotic conditions, we define the score by [see the derivatives of $\beta s^\gamma$ in Eq. (7.55)]
$$S_n(\theta) = -\frac{1}{2}\nabla_\theta Q_n(\theta)' = \sum_{s=1}^n \begin{pmatrix} s^\gamma\\ \beta s^\gamma\log s \end{pmatrix}(y_s - \beta s^\gamma). \quad (7.57)$$
The Hessian for Eq. (7.57) is
$$H_n(\theta) = -\nabla_\theta S_n(\theta) = \sum_{s=1}^n h_s(\theta), \quad (7.58)$$
where
$$h_s(\theta) = -\left(\frac{\partial}{\partial\beta}\begin{pmatrix} s^\gamma(y_s-\beta s^\gamma)\\ \beta s^\gamma(y_s-\beta s^\gamma)\log s \end{pmatrix},\ \frac{\partial}{\partial\gamma}\begin{pmatrix} s^\gamma(y_s-\beta s^\gamma)\\ \beta s^\gamma(y_s-\beta s^\gamma)\log s \end{pmatrix}\right).$$
Replacing $y_s$ by its expression from the true model $y_s = \beta_0 s^{\gamma_0} + u_s$ we find the elements of $h_s(\theta)$ to be
$$h_s^{11}(\theta) = -\frac{\partial}{\partial\beta}\,s^\gamma(y_s-\beta s^\gamma) = s^{2\gamma}, \quad (7.59)$$
$$h_s^{12}(\theta) = h_s^{21}(\theta) = -\frac{\partial}{\partial\beta}\,(y_s-\beta s^\gamma)\beta s^\gamma\log s = -y_s s^\gamma\log s + 2\beta s^{2\gamma}\log s$$
$$= -\beta_0 s^{\gamma_0+\gamma}\log s - u_s s^\gamma\log s + 2\beta s^{2\gamma}\log s = \beta s^{2\gamma}\log s - u_s s^\gamma\log s + (\beta s^\gamma - \beta_0 s^{\gamma_0})s^\gamma\log s, \quad (7.60)$$
CHAPTER 7
NONLINEAR MODELS
$$h_s^{22}(\theta) = -\frac{\partial}{\partial\gamma}\,(y_s-\beta s^\gamma)\beta s^\gamma\log s = \beta^2 s^{2\gamma}\log^2 s - (\beta_0 s^{\gamma_0} + u_s - \beta s^\gamma)\beta s^\gamma\log^2 s$$
$$= \beta^2 s^{2\gamma}\log^2 s - u_s\beta s^\gamma\log^2 s + (\beta s^\gamma - \beta_0 s^{\gamma_0})\beta s^\gamma\log^2 s. \quad (7.61)$$
7.3.3 Lemma on Convergence of the Score

Define the normalization matrix
$$D_n = n^{\gamma_0+1/2}\,\mathrm{diag}[1, \log n]. \quad (7.62)$$

Lemma. If $\gamma_0 > -1/2$ and $u_t$ satisfy Assumption P1, then
$$D_n^{-1}S_n(\theta_0) \xrightarrow{d} N\left(0,\ \frac{\sigma^2}{2\gamma_0+1}\begin{pmatrix}1 & \beta_0\\ \beta_0 & \beta_0^2\end{pmatrix}\right), \quad \sigma^2 \equiv \sigma_e^2\Big(\sum_j c_j\Big)^2.$$
Proof. Equations (7.57) and (7.62) imply
$$D_n^{-1}S_n(\theta_0) = n^{-\gamma_0-1/2}\begin{pmatrix}1 & 0\\ 0 & 1/\log n\end{pmatrix}\sum_{s=1}^n\begin{pmatrix}s^{\gamma_0}u_s\\ \beta_0 s^{\gamma_0}(\log s)u_s\end{pmatrix} = \begin{pmatrix}\dfrac{1}{\sqrt n}\displaystyle\sum_{s=1}^n\left(\dfrac{s}{n}\right)^{\gamma_0}u_s\\[2ex] \dfrac{\beta_0}{\sqrt n\,\log n}\displaystyle\sum_{s=1}^n\left(\dfrac{s}{n}\right)^{\gamma_0}(\log s)u_s\end{pmatrix}. \quad (7.63)$$
Replacing $\log s = \log(s/n) + \log n$ and using notation (7.41) we get, by Lemma 7.2.7,
$$D_n^{-1}S_n(\theta_0) = \begin{pmatrix} S_{n0}^2(\gamma_0)\\[0.5ex] \dfrac{\beta_0}{\log n}S_{n1}^2(\gamma_0) + \beta_0 S_{n0}^2(\gamma_0)\end{pmatrix} = \begin{pmatrix}1\\ \beta_0\end{pmatrix}S_{n0}^2(\gamma_0) + o_p(1),$$
where $S_{n0}^2(\gamma_0) \xrightarrow{d} N[0,\ \sigma^2/(2\gamma_0+1)]$. This proves the lemma. $\blacksquare$
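As a quick numerical illustration of the lemma (not part of the original text), the sketch below simulates $D_n^{-1}S_n(\theta_0)$ for iid standard normal errors — a special case of Assumption P1 with $c_0 = 1$ and $c_j = 0$ otherwise, so $\sigma^2 = 1$ — and compares the empirical covariance with $\frac{1}{2\gamma_0+1}\binom{1\ \ \beta_0}{\beta_0\ \ \beta_0^2}$. All parameter values are hypothetical.

```python
import numpy as np

def normalized_score(n, beta0, gamma0, rng):
    """One draw of D_n^{-1} S_n(theta_0) for y_s = beta0 * s^gamma0 + u_s
    with iid N(0, 1) errors (a special case of Assumption P1)."""
    s = np.arange(1, n + 1)
    u = rng.standard_normal(n)
    s1 = np.sum(s**gamma0 * u)                      # first score component
    s2 = beta0 * np.sum(s**gamma0 * np.log(s) * u)  # second score component
    d = n**(gamma0 + 0.5)                           # D_n = d * diag[1, log n]
    return np.array([s1 / d, s2 / (d * np.log(n))])

rng = np.random.default_rng(0)
beta0, gamma0, n = 2.0, 0.3, 2000
draws = np.array([normalized_score(n, beta0, gamma0, rng) for _ in range(4000)])
cov = draws.T @ draws / len(draws)                  # empirical covariance
g1 = 2 * gamma0 + 1
target = (1.0 / g1) * np.array([[1.0, beta0], [beta0, beta0**2]])
print(np.round(cov, 3))
print(np.round(target, 3))
```

The (1,1) entry matches $1/(2\gamma_0+1)$ closely, while the entries involving $\log s$ converge only at rate $1/\log n$; the near-perfect correlation between the two components is the asymptotic collinearity discussed above.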
7.3.4 Asymptotic Representation of the Normalized Hessian

Lemma. Suppose $\gamma_0 > -1/2$ and $u_t$ satisfy Assumption P1. With $H_n$ defined by Eqs. (7.58)–(7.61) from Section 7.3.2 and $D_n$ defined by Eq. (7.62), for the matrix $A_n(\theta_0) = D_n^{-1}H_n(\theta_0)D_n^{-1}$ we have asymptotically
$$A_n(\theta_0) = \frac{1}{\gamma_1}\begin{pmatrix}1 & \beta_0\left(1 - \dfrac{1}{\gamma_1\log n}\right)\\[1.5ex] \beta_0\left(1 - \dfrac{1}{\gamma_1\log n}\right) & \beta_0^2\left(1 - \dfrac{2}{\gamma_1\log n} + \dfrac{2}{\gamma_1^2\log^2 n}\right)\end{pmatrix} + O_p(n^{-\varepsilon}),$$
where $\gamma_1 = 2\gamma_0 + 1$ and $\varepsilon$ is some number from $(0, \gamma_0+1/2)$.

Proof. By Eqs. (7.58)–(7.61) from Section 7.3.2
$$H_n(\theta_0) = \sum_{s=1}^n\begin{pmatrix} s^{2\gamma_0} & \beta_0 s^{2\gamma_0}\log s - u_s s^{\gamma_0}\log s\\ \beta_0 s^{2\gamma_0}\log s - u_s s^{\gamma_0}\log s & \beta_0^2 s^{2\gamma_0}\log^2 s - u_s\beta_0 s^{\gamma_0}\log^2 s \end{pmatrix}. \quad (7.64)$$
This equation and definition (7.62) lead to the following expressions for the elements of $A_n(\theta_0)$:
$$A_n^{11}(\theta_0) = \frac{1}{n^{2\gamma_0+1}}\sum_{s=1}^n s^{2\gamma_0} = \frac{1}{n}\sum_{s=1}^n\left(\frac{s}{n}\right)^{2\gamma_0}, \quad (7.65)$$
$$A_n^{12}(\theta_0) = \frac{1}{n^{2\gamma_0+1}\log n}\sum_{s=1}^n(\beta_0 s^{2\gamma_0} - u_s s^{\gamma_0})\log s$$
$$= \beta_0\sum_{j=0}^1\frac{1}{\log^j n}\left[\frac{1}{n}\sum_{s=1}^n\left(\frac{s}{n}\right)^{2\gamma_0}\log^j\frac{s}{n}\right] - \frac{1}{n^{\gamma_0+1/2}}\sum_{j=0}^1\frac{1}{\log^j n}\left[\frac{1}{\sqrt n}\sum_{s=1}^n\left(\frac{s}{n}\right)^{\gamma_0}\left(\log^j\frac{s}{n}\right)u_s\right]. \quad (7.66)$$
Similarly, replacing $\log s$ by $\log\frac{s}{n} + \log n$, we have
$$A_n^{22}(\theta_0) = \frac{1}{n^{2\gamma_0+1}\log^2 n}\sum_{s=1}^n(\beta_0^2 s^{2\gamma_0} - u_s\beta_0 s^{\gamma_0})\log^2 s$$
$$= \beta_0^2\sum_{j=0}^2\frac{C_2^j}{\log^j n}\left[\frac{1}{n}\sum_{s=1}^n\left(\frac{s}{n}\right)^{2\gamma_0}\log^j\frac{s}{n}\right] - \beta_0\frac{1}{n^{\gamma_0+1/2}}\sum_{j=0}^2\frac{C_2^j}{\log^j n}\left[\frac{1}{\sqrt n}\sum_{s=1}^n\left(\frac{s}{n}\right)^{\gamma_0}\left(\log^j\frac{s}{n}\right)u_s\right]. \quad (7.67)$$
The bounds from Lemma 7.2.2 can be joined as
$$R(f_i) = O(n^{-\varepsilon}) \quad (7.68)$$
with some $\varepsilon \in (0, \gamma_0+1/2)$. It is easy to calculate that
$$\int_0^1 t^{2\gamma_0}\,dt = \frac{1}{\gamma_1}, \quad \int_0^1 t^{2\gamma_0}\log t\,dt = -\frac{1}{\gamma_1^2}, \quad \int_0^1 t^{2\gamma_0}\log^2 t\,dt = \frac{2}{\gamma_1^3}. \quad (7.69)$$
Hence, Eq. (7.65) asymptotically is
$$A_n^{11}(\theta_0) = \frac{1}{\gamma_1} + O(n^{-\varepsilon}). \quad (7.70)$$
For Eq. (7.66) Lemma 7.2.7 and Eqs. (7.68) and (7.69) imply
$$A_n^{12}(\theta_0) = \frac{\beta_0}{\gamma_1} - \frac{\beta_0}{\gamma_1^2\log n} - \frac{1}{n^{\gamma_0+1/2}\log n}S_{n1}^2(\gamma_0) - \frac{1}{n^{\gamma_0+1/2}}S_{n0}^2(\gamma_0) + O(n^{-\varepsilon}) = \frac{\beta_0}{\gamma_1} - \frac{\beta_0}{\gamma_1^2\log n} + O_p(n^{-\varepsilon}). \quad (7.71)$$
Finally, for Eq. (7.67), we use Eqs. (7.68) and (7.69) and Lemma 7.2.7 to obtain
$$A_n^{22}(\theta_0) = \beta_0^2\sum_{j=0}^2\frac{C_2^j}{\log^j n}\left[\int_0^1 t^{2\gamma_0}\log^j t\,dt + O(n^{-\varepsilon})\right] + O_p(n^{-\varepsilon}) = \beta_0^2\left(\frac{1}{\gamma_1} - \frac{2}{\gamma_1^2\log n} + \frac{2}{\gamma_1^3\log^2 n}\right) + O_p(n^{-\varepsilon}). \quad (7.72)$$
Equations (7.70)–(7.72) prove the lemma.
$\blacksquare$
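The three moments in Eq. (7.69) are elementary and easy to confirm numerically. The following sketch (illustrative, not from the original) checks them by quadrature for one admissible $\gamma_0$.

```python
import numpy as np
from scipy.integrate import quad

gamma0 = 0.3              # any gamma0 > -1/2 works here
g1 = 2 * gamma0 + 1       # gamma_1 = 2*gamma0 + 1
# the three integrals of Eq. (7.69)
i0, _ = quad(lambda t: t**(2 * gamma0), 0, 1)
i1, _ = quad(lambda t: t**(2 * gamma0) * np.log(t), 0, 1)
i2, _ = quad(lambda t: t**(2 * gamma0) * np.log(t)**2, 0, 1)
print(i0, 1 / g1)         # should agree:  1/gamma_1
print(i1, -1 / g1**2)     # should agree: -1/gamma_1^2
print(i2, 2 / g1**3)      # should agree:  2/gamma_1^3
```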
7.3.5 The Right Normalization of the Hessian

From the proof of Theorem 7.1.8 we can see that it is really not the convergence of the normalized Hessian $A_n(\theta_0)$ that is important [see Eq. (7.4)] but the convergence of the inverse $A_n^{-1}(\theta_0)$. The lemma below shows that if $H_n(\theta_0)$ is normalized by $D_n$ this inverse does not converge. The inverse converges if $D_n$ is replaced by
$$F_n = \frac{1}{\log n}D_n = n^{\gamma_0+1/2}\,\mathrm{diag}[1/\log n,\ 1].$$

Lemma. Suppose $\beta_0 \neq 0$, $\gamma_0 > -1/2$ and $u_t$ satisfy Assumption P1.

(i) The inverse of $A_n(\theta_0)$ is
$$A_n^{-1}(\theta_0) = \gamma_1^3\log^2 n\begin{pmatrix}1 - \dfrac{2}{\gamma_1\log n} + \dfrac{2}{\gamma_1^2\log^2 n} & -\dfrac{1}{\beta_0}\left(1-\dfrac{1}{\gamma_1\log n}\right)\\[1.5ex] -\dfrac{1}{\beta_0}\left(1-\dfrac{1}{\gamma_1\log n}\right) & \dfrac{1}{\beta_0^2}\end{pmatrix} + O_p(n^{-\varepsilon})$$
and, hence, $A_n^{-1}(\theta_0)$ diverges as $n\to\infty$.

(ii) Denote $E_n = F_n^{-1}H_n(\theta_0)F_n^{-1}$. Then
$$E_n^{-1} = \frac{\gamma_1^3}{\beta_0^2}\begin{pmatrix}\beta_0^2\left(1-\dfrac{2}{\gamma_1\log n}+\dfrac{2}{\gamma_1^2\log^2 n}\right) & -\beta_0\left(1-\dfrac{1}{\gamma_1\log n}\right)\\[1.5ex] -\beta_0\left(1-\dfrac{1}{\gamma_1\log n}\right) & 1\end{pmatrix} + O_p(n^{-\varepsilon}) = \frac{\gamma_1^3}{\beta_0^2}\begin{pmatrix}\beta_0^2 & -\beta_0\\ -\beta_0 & 1\end{pmatrix} + O_p\left(\frac{1}{\log n}\right).$$

Proof. (i) By Lemma 7.3.4
$$\det A_n(\theta_0) = \frac{\beta_0^2}{\gamma_1^2}\left[1 - \frac{2}{\gamma_1\log n} + \frac{2}{\gamma_1^2\log^2 n} - \left(1 - \frac{1}{\gamma_1\log n}\right)^2\right] + O_p(n^{-\varepsilon}) = \frac{\beta_0^2}{\gamma_1^4\log^2 n} + O_p(n^{-\varepsilon}).$$
This equation, Lemma 7.3.4 and Eq. (4.62) prove (i). Part (ii) immediately follows from (i). $\blacksquare$
7.3.6 The Order of Eigenvalues of $A_n(\theta_0)$

Lemma. If $\gamma_0 > -1/2$ and Assumption P1 is satisfied, then $A_n(\theta_0)$ has eigenvalues
$$\lambda_1 = \frac{1+\beta_0^2}{\gamma_1} + O_p\left(\frac{1}{\log n}\right), \quad \lambda_2 = \frac{\beta_0^2(1-\beta_0^2)}{\gamma_1^3(1+\beta_0^2)\log^2 n} + O_p\left(\frac{1}{\log^3 n}\right). \quad (7.73)$$

Proof. Denote $a_n = \dfrac{1}{\gamma_1\log n}$. From Lemma 7.3.4 we see that the eigenvalues $\mu_1, \mu_2$ of $\gamma_1 A_n(\theta_0)$ are the roots of the equation
$$\det(\gamma_1 A_n(\theta_0) - \mu I) = \det\begin{pmatrix}1-\mu & \beta_0(1-a_n)\\ \beta_0(1-a_n) & \beta_0^2(1-2a_n+2a_n^2)-\mu\end{pmatrix}$$
$$= \mu^2 - \mu[\beta_0^2(1-2a_n+2a_n^2)+1] + \beta_0^2(1-2a_n+2a_n^2) - \beta_0^2(1-2a_n+a_n^2)$$
$$= \mu^2 - \mu[(1+\beta_0^2) - \beta_0^2(2a_n-2a_n^2)] + \beta_0^2 a_n^2 = 0.$$
Hence
$$\mu_{1,2} = \frac{(1+\beta_0^2) - \beta_0^2(2a_n-2a_n^2) \pm \sqrt D}{2}, \quad (7.74)$$
where the discriminant $D$ is
$$D = (1+\beta_0^2)^2 - 2\beta_0^2(1+\beta_0^2)(2a_n-2a_n^2) + \beta_0^4(4a_n^2-8a_n^3+4a_n^4) - 4\beta_0^2 a_n^2$$
$$= (1+\beta_0^2)^2 - 2\beta_0^2(1+\beta_0^2)(2a_n-2a_n^2) + \beta_0^4(-8a_n^3+4a_n^4) + 4\beta_0^2 a_n^2(\beta_0^2-1).$$
Using the approximation $\sqrt{a+\Delta} \approx \sqrt a + \dfrac{\Delta}{2\sqrt a}$ we have
$$\sqrt D = (1+\beta_0^2) - \beta_0^2(2a_n-2a_n^2) + \frac{2\beta_0^2 a_n^2(\beta_0^2-1)}{1+\beta_0^2} + O_p\left(\frac{1}{\log^3 n}\right).$$
Therefore by Eq. (7.74)
$$\mu_1 = 1 + \beta_0^2 + O_p\left(\frac{1}{\log n}\right), \quad \mu_2 = -\frac{\beta_0^2 a_n^2(\beta_0^2-1)}{1+\beta_0^2} + O_p\left(\frac{1}{\log^3 n}\right) = \frac{\beta_0^2(1-\beta_0^2)}{\gamma_1^2(1+\beta_0^2)\log^2 n} + O_p\left(\frac{1}{\log^3 n}\right).$$
Upon division by $\gamma_1$ we get Eq. (7.73).
$\blacksquare$
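To see the two eigenvalue scales concretely, the sketch below (illustrative values, not from the original) builds the leading term of $A_n(\theta_0)$ from Lemma 7.3.4 for a large $n$ and checks that the larger eigenvalue is near $(1+\beta_0^2)/\gamma_1$ while the smaller one is positive and of order $1/\log^2 n$; the exact constant in front of $1/\log^2 n$ is not asserted here.

```python
import numpy as np

beta0, gamma0 = 0.6, 0.3            # |beta0| < 1, gamma0 > -1/2 (Assumption P2)
g1 = 2 * gamma0 + 1
n = 10**8
a = 1 / (g1 * np.log(n))            # a_n from the proof of Lemma 7.3.6
# leading term of A_n(theta_0) from Lemma 7.3.4
An = (1 / g1) * np.array([[1.0, beta0 * (1 - a)],
                          [beta0 * (1 - a), beta0**2 * (1 - 2*a + 2*a**2)]])
lam = np.sort(np.linalg.eigvalsh(An))[::-1]     # eigenvalues, largest first
lam1 = (1 + beta0**2) / g1                      # leading term of lambda_1
print(lam, lam1)
print(lam[1] * np.log(n)**2)                    # bounded: lambda_2 = O(1/log^2 n)
```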
7.3.7 Regulating Convergence of the Hessian

With Lemma 7.3.6 we have finished preparing the ingredients necessary for modifying Wooldridge's Assumption W2 (Section 7.1.2.2). In this and Section 7.3.8 we deal with definitions and statements required for his Assumption W3 (Section 7.1.2.3).

Taking some $\delta \in (0, (\gamma_0+1/2)/3)$ put
$$C_n = \frac{1}{n^\delta}D_n = n^{\gamma_0+1/2-\delta}\,\mathrm{diag}[1, \log n] \quad (7.75)$$
and $\Delta_n = C_n^{-1}[H_n(\theta) - H_n(\theta_0)]C_n^{-1}$.

Lemma. The elements of the matrix $\Delta_n$ are ($\gamma_1 = 2\gamma_0+1$)
$$\Delta_n^{11} = \frac{1}{n^{\gamma_1-2\delta}}\sum_{s=1}^n(s^{2\gamma} - s^{2\gamma_0}), \quad (7.76)$$
$$\Delta_n^{12} = \frac{1}{n^{\gamma_1-2\delta}\log n}\left[\sum_{s=1}^n(\beta s^{2\gamma} - \beta_0 s^{2\gamma_0})\log s - \sum_{s=1}^n u_s(s^\gamma - s^{\gamma_0})\log s + \sum_{s=1}^n(\beta s^\gamma - \beta_0 s^{\gamma_0})s^\gamma\log s\right], \quad (7.77)$$
$$\Delta_n^{22} = \frac{1}{n^{\gamma_1-2\delta}\log^2 n}\left[\sum_{s=1}^n(\beta^2 s^{2\gamma} - \beta_0^2 s^{2\gamma_0})\log^2 s - \sum_{s=1}^n u_s(\beta s^\gamma - \beta_0 s^{\gamma_0})\log^2 s + \sum_{s=1}^n(\beta s^\gamma - \beta_0 s^{\gamma_0})\beta s^\gamma\log^2 s\right]. \quad (7.78)$$
Proof. Using Eqs. (7.58)–(7.61) from Section 7.3.2 and Eq. (7.64) we find the elements of the difference $G_n = H_n(\theta) - H_n(\theta_0)$:
$$G_n^{11} = \sum_{s=1}^n(s^{2\gamma} - s^{2\gamma_0}),$$
$$G_n^{12} = \sum_{s=1}^n(\beta s^{2\gamma} - \beta_0 s^{2\gamma_0})\log s - \sum_{s=1}^n u_s(s^\gamma - s^{\gamma_0})\log s + \sum_{s=1}^n(\beta s^\gamma - \beta_0 s^{\gamma_0})s^\gamma\log s,$$
$$G_n^{22} = \sum_{s=1}^n(\beta^2 s^{2\gamma} - \beta_0^2 s^{2\gamma_0})\log^2 s - \sum_{s=1}^n u_s(\beta s^\gamma - \beta_0 s^{\gamma_0})\log^2 s + \sum_{s=1}^n(\beta s^\gamma - \beta_0 s^{\gamma_0})\beta s^\gamma\log^2 s.$$
These equations and Eq. (7.75) imply Eqs. (7.76)–(7.78). $\blacksquare$
7.3.8 Verifying Wooldridge's Assumption W3

Lemma. Provided that $\gamma_0 > -1/2$ and Assumption P1 holds, we have
$$\sup_{\theta\in N_n^1(\theta_0)}\|\Delta_n(\theta, \theta_0)\| = o_p(1).$$

Proof. To bound Eq. (7.76) we use Eq. (7.42) with $i = 0$:
$$\Delta_n^{11} = O\left(\frac{1}{n^{3\gamma_0-2\gamma+1/2-3\delta}}\right).$$
This tends to zero because $\gamma$ is close to $\gamma_0$ and $\gamma_0 + 1/2 - 3\delta > 0$. Equation (7.77) is estimated with the help of Eqs. (7.42) and (7.43) and Lemma 7.2.9:
$$\Delta_n^{12} = O_p\left(\frac{1}{n^{3\gamma_0-2\gamma+1/2-3\delta}} + \frac{1}{n^{3\gamma_0-\gamma+1-3\delta}} + \frac{1}{n^{3\gamma_0-\gamma-\bar\gamma+1/2-3\delta}}\right) = O_p\left(\frac{1}{n^\varepsilon}\right)$$
with some $\varepsilon > 0$. Here we remember that $\gamma$ and $\bar\gamma$ are close to $\gamma_0$. Finally, for (7.78) we use again Eqs. (7.42) and (7.43) and Lemma 7.2.9 to get
$$\Delta_n^{22} = O_p\left(\frac{1}{n^{3\gamma_0-2\gamma+1/2-3\delta}} + \frac{1}{n^{3\gamma_0-\gamma+1-3\delta}} + \frac{1}{n^{3\gamma_0-\gamma-\bar\gamma+1/2-3\delta}}\right) = O_p\left(\frac{1}{n^\varepsilon}\right)$$
with some $\varepsilon > 0$. $\blacksquare$
7.3.9 Summary of Phillips' Statements

Here we review Wooldridge's assumptions in the light of the Phillips propositions proved so far. Everywhere the next two assumptions are assumed to hold.

7.3.9.1 Assumption P1. The errors $u_t$ in the model $y_s = \beta s^\gamma + u_s$ are linear processes
$$u_t = \sum_{j\in Z} c_j e_{t-j}, \quad t\in Z,$$
where $\{e_t, \mathcal F_t: t\in Z\}$ is a m.d. array, $\sum_j |c_j| < \infty$, the second conditional moments are constant, $E(e_t^2\,|\,\mathcal F_{t-1}) = \sigma_e^2$ for all $t$, and the squares $e_t^2$ are uniformly integrable. This condition, introduced in Section 7.2.7, provides convergence of stochastic expressions.

7.3.9.2 Assumption P2. (i) $\gamma_0 > -1/2$, (ii) $\beta_0 \neq 0$ and (iii) $|\beta_0| < 1$.

The inequality $\gamma_0 > -1/2$ is necessary for the integrability of $f(s) = s^{2\gamma_0}$. The condition $\beta_0 \neq 0$ provides the existence of $A_n^{-1}(\theta_0)$ (see Lemma 7.3.5). The inequality $|\beta_0| < 1$ is imposed to ensure positivity of the eigenvalues of $A_n(\theta_0)$ (see Lemma 7.3.6). This latter condition is missing in the Phillips paper.

Assumption W1 obviously holds for the objective function $Q_n$ defined in Section 7.3.2. By Lemma 7.3.3 with $D_n = n^{\gamma_0+1/2}\,\mathrm{diag}[1, \log n]$ the first part of Assumption W2 is satisfied in the form
$$D_n^{-1}S_n(\theta_0) \xrightarrow{d} N(0, B_0), \quad \text{where } B_0 = \frac{\sigma^2}{2\gamma_0+1}\begin{pmatrix}1 & \beta_0\\ \beta_0 & \beta_0^2\end{pmatrix}. \quad (7.79)$$
While the convergence (7.4) is true by Lemma 7.3.4, the part of Assumption W2 concerning positive definiteness of $A_0$ is not satisfied. Phillips noticed that, in fact, it is the inverse $H_n^{-1}(\theta_0)$ that needs to be normalized. Introducing $F_n = \frac{1}{\log n}D_n$ and $E_n = F_n^{-1}H_n(\theta_0)F_n^{-1}$ he proved [see Lemma 7.3.5(ii)]
$$E_n^{-1} \xrightarrow{p} \frac{\gamma_1^3}{\beta_0^2}\begin{pmatrix}\beta_0^2 & -\beta_0\\ -\beta_0 & 1\end{pmatrix}, \quad \text{where } \gamma_1 = 2\gamma_0+1. \quad (7.80)$$
The matrices $C_n$ defined by $C_n = (1/n^\delta)D_n$ trivially satisfy the first part of Assumption W3:
$$C_n D_n^{-1} = n^{-\delta}I = o(1). \quad (7.81)$$
Finally, with the neighborhood $N_n^r(\theta_0) = \{\theta: \|C_n(\theta - \theta_0)\| \le r\}$, by Lemma 7.3.8 the second part of Assumption W3 holds:
$$\max_{\theta\in N_n^1(\theta_0)}\|C_n^{-1}[H_n(\theta) - H_n(\theta_0)]C_n^{-1}\| = o_p(1). \quad (7.82)$$
The algebraic Lemma 7.1.3 does not depend on Eq. (7.80) and continues to be true. The next lemma of the Wooldridge framework, Lemma 7.1.5, uses only Eq. (7.82) and is also true. The remaining statements, starting from Lemma 7.1.6, need a revision.
7.3.10 Bounding the Remainder in the Vicinity of $\tilde\theta_n$

$V_n$ is the same as in Section 7.1.6, $V_n = \{\omega: \|C_n(\tilde\theta_n - \theta_0)\| \le r_n\}$. In the definition of $\tilde N_n(r)$, however, $D_n$ is replaced by $F_n$:
$$\tilde N_n(r) = \{\theta: \|F_n(\theta - \tilde\theta_n)\| \le r\}. \quad (7.83)$$

Lemma. Let $\gamma_0 > -1/2$ and let Assumption P1 hold. There exists a sequence of positive numbers $\{r_n\}$ such that

(i) $P(V_n) \to 1$ as $n\to\infty$.

(ii) For all large $n$ we have
$$\sup_{\omega\in V_n,\ \theta\in\tilde N_n(r_n)} |R_n(\theta, \theta_0)| \le 4d_n r_n^2 \quad \text{a.s.}$$

(iii) For $r_n \le 1$
$$\sup_{\omega\in V_n} |R_n(\tilde\theta_n, \theta_0)| \le d_n r_n^2 \quad \text{a.s.}$$

Proof. By definition (7.7)
$$F_n(\tilde\theta_n - \theta_0) = F_n(H_n^0)^{-1}S_n^0 = E_n^{-1}F_n^{-1}S_n^0. \quad (7.84)$$
Here, by Eq. (7.79) and the definition of $F_n$ we have
$$F_n^{-1}S_n^0 = (\log n)D_n^{-1}S_n^0 = O_p(\log n). \quad (7.85)$$
Therefore Eqs. (7.80) and (7.84) imply
$$F_n(\tilde\theta_n - \theta_0) = O_p(\log n). \quad (7.86)$$
Hence, by Eq. (7.81)
$$C_n(\tilde\theta_n - \theta_0) = C_n D_n^{-1}D_n(\tilde\theta_n - \theta_0) = (C_n D_n^{-1})[(\log n)F_n(\tilde\theta_n - \theta_0)] = o_p(1). \quad (7.87)$$
Lemma 7.1.4 and Eq. (7.87) prove statement (i).

Since $\|F_n(\theta - \tilde\theta_n)\| \le r_n$ implies $\|C_n(\theta - \tilde\theta_n)\| \le \|C_n D_n^{-1}\log n\|\,r_n \le r_n$ for large $n$, we have the inclusion $\tilde N_n(r_n) \subseteq \{\theta: \|C_n(\theta - \tilde\theta_n)\| \le r_n\}$ for large $n$. This inclusion, the triangle inequality and (7.19) lead to the implication
$$\omega\in V_n,\ \theta\in\tilde N_n(r_n)\ \Rightarrow\ \theta\in N_n^{2r_n}(\theta_0).$$
Therefore statement (ii) follows from Lemma 7.1.5. As $\tilde\theta_n\in N_n^{2r_n}$ for $\omega\in V_n$, statement (iii) follows directly from Lemma 7.1.5. $\blacksquare$
7.3.11 Consistency of $\hat\theta_n$

Theorem. Let Assumptions P1 and P2 be satisfied. Then there exists a sequence of sets $\{\tilde V_n\}$ such that
$$P(\tilde V_n) \to 1 \text{ as } n\to\infty \quad (7.88)$$
and for almost any $\omega\in\tilde V_n$ there exists an estimator $\hat\theta_n\in\tilde N_n(r_n)$ such that
$$F_n(\hat\theta_n - \theta_0) = O_p(\log n). \quad (7.89)$$

Proof. By Lemma 7.1.3
$$Q_n(\theta) - Q_n(\tilde\theta_n) = [F_n(\theta - \tilde\theta_n)]'E_n[F_n(\theta - \tilde\theta_n)] + 2R_n(\theta, \theta_0) - 2R_n(\tilde\theta_n, \theta_0) \quad (7.90)$$
[the right side of Eq. (7.9) gets multiplied by two because the derivatives in Section 7.3.2 are half the usual derivatives; the notation $E_n = F_n^{-1}H_n(\theta_0)F_n^{-1}$ is from Lemma 7.3.5(ii)]. On the boundary of $\tilde N_n(r_n)$ [see Eq. (7.83)] we have
$$[F_n(\theta - \tilde\theta_n)]'E_n[F_n(\theta - \tilde\theta_n)] \ge \lambda_{\min}(E_n)r_n^2. \quad (7.91)$$
The numbers $\lambda_n \equiv \lambda_{\min}(E_n) = \lambda_{\min}[A_n(\theta_0)]\log^2 n$ by Lemma 7.3.6 and Assumption P2 are bounded away from zero, $\lambda_n \ge c > 0$. Denote $\phi_n(\theta) = Q_n(\theta) - Q_n(\tilde\theta_n)$. Obviously, the point $\tilde\theta_n\in\tilde N_n(r_n)$ satisfies $\phi_n(\tilde\theta_n) = 0$. We want to show that on the boundary $\partial\tilde N_n(r_n)$ of $\tilde N_n(r_n)$ the function $\phi_n$ is positive. Let $\tilde V_n = V_n\cap\{\omega: 5d_n \le \lambda_n/2\}$. Since $\lambda_n \ge c$, by Eq. (7.14) and Lemma 7.3.10(i) $\tilde V_n$ satisfies Eq. (7.88).

According to Lemma 7.3.10 and Eqs. (7.90) and (7.91), for $\omega\in\tilde V_n$
$$\min_{\theta\in\partial\tilde N_n(r_n)}\phi_n(\theta) \ge \lambda_n r_n^2 - 5d_n r_n^2 \ge \frac{1}{2}\lambda_n r_n^2 > 0.$$
Thus, there are points inside $\tilde N_n(r_n)$ where $\phi_n$ takes lower values than on its boundary. Since $\phi_n$ is smooth, it achieves its minimum inside $\tilde N_n(r_n)$ and the point of minimum $\hat\theta_n$ satisfies the first-order condition (7.56). The inequality $\|F_n(\hat\theta_n - \tilde\theta_n)\| \le r_n$ combined with Eq. (7.86) proves Eq. (7.89). $\blacksquare$
7.3.12 Phillips' Representation of the Standardized Estimator

Lemma. Under Assumptions P1 and P2 the estimator from Theorem 7.3.11 satisfies
$$F_n(\hat\theta_n - \theta_0) = E_n^{-1}F_n^{-1}S_n^0 + o_p(1). \quad (7.92)$$

Proof. Expanding $S_n(\hat\theta_n)$ about $\theta_0$, we have from Eq. (7.56)
$$0 = S_n^0 - H_n^0(\hat\theta_n - \theta_0) - (\bar H_n - H_n^0)(\hat\theta_n - \theta_0),$$
where the Hessian $\bar H_n$ is evaluated at mean values between $\theta_0$ and $\hat\theta_n$. Scaling this condition we get
$$0 = F_n^{-1}S_n^0 - (F_n^{-1}H_n^0 F_n^{-1})F_n(\hat\theta_n - \theta_0) - [F_n^{-1}(\bar H_n - H_n^0)F_n^{-1}]F_n(\hat\theta_n - \theta_0)$$
$$= F_n^{-1}S_n^0 - \{E_n + E_n E_n^{-1}[F_n^{-1}(\bar H_n - H_n^0)F_n^{-1}]\}F_n(\hat\theta_n - \theta_0) = F_n^{-1}S_n^0 - E_n(I + E_n^{-1}\tilde\Delta_n)F_n(\hat\theta_n - \theta_0),$$
where we have denoted $\tilde\Delta_n(\theta^*, \theta_0) = F_n^{-1}(\bar H_n - H_n^0)F_n^{-1}$. It follows that on the set $J_n = \{\omega: (I + E_n^{-1}\tilde\Delta_n)^{-1}\ \text{exists}\}$ we have the representation
$$F_n(\hat\theta_n - \theta_0) = (I + E_n^{-1}\tilde\Delta_n)^{-1}E_n^{-1}F_n^{-1}S_n^0. \quad (7.93)$$
Now we prove that the probability of $J_n$ approaches 1. By Lemma 7.3.8 (see also the definitions of $F_n$ and $C_n$ in Section 7.3.9)
$$\sup_{\theta\in N_n^1(\theta_0)}\|\tilde\Delta_n(\theta, \theta_0)\| = \frac{\log^2 n}{n^{2\delta}}\sup_{\theta\in N_n^1(\theta_0)}\|\Delta_n(\theta, \theta_0)\| = o_p\left(\frac{1}{n^\varepsilon}\right)$$
with some $\varepsilon\in(0, 2\delta)$. Using Eq. (7.89) we get
$$C_n(\hat\theta_n - \theta_0) = \frac{\log n}{n^\delta}F_n(\hat\theta_n - \theta_0) = O_p\left(\frac{\log^2 n}{n^\delta}\right) = o_p(1). \quad (7.94)$$
This implies that $\hat\theta_n$ and $\theta^*$ (which is a mean value between $\hat\theta_n$ and $\theta_0$) belong to $N_n^1(\theta_0)$ with probability approaching unity as $n\to\infty$. Therefore by Eq. (7.94) and Lemma 7.3.5(ii)
$$E_n^{-1}\tilde\Delta_n(\theta^*, \theta_0) = o_p\left(\frac{1}{n^\varepsilon}\right), \quad \lim_{n\to\infty}P(J_n) = 1,$$
and
$$(I + E_n^{-1}\tilde\Delta_n)^{-1} = \sum_{j=0}^\infty\left(-E_n^{-1}\tilde\Delta_n\right)^j = I + o_p\left(\frac{1}{n^\varepsilon}\right) \quad \text{on } J_n. \quad (7.95)$$
By Eq. (7.85) and Lemma 7.3.5(ii), $E_n^{-1}F_n^{-1}S_n^0 = O_p(\log n)$. Hence, Eqs. (7.93) and (7.95) give Eq. (7.92):
$$F_n(\hat\theta_n - \theta_0) = \left[I + o_p\left(\frac{1}{n^\varepsilon}\right)\right]E_n^{-1}F_n^{-1}S_n^0 = E_n^{-1}F_n^{-1}S_n^0 + o_p\left(\frac{1}{n^\varepsilon}\right)O_p(\log n). \quad \blacksquare$$
7.3.13 Asymptotic Normality

Theorem. Suppose that in the model (7.54) the errors satisfy Assumption P1 and the true parameter vector satisfies Assumption P2. Then the least-squares estimator $\hat\theta_n$ exists with probability approaching 1, is consistent in the sense of Eq. (7.89) and has the following limit distribution:
$$F_n(\hat\theta_n - \theta_0) \xrightarrow{d} \begin{pmatrix}1\\ -1/\beta_0\end{pmatrix}N(0,\ \sigma^2(2\gamma_0+1)^3). \quad (7.96)$$

Proof. We apply representation (7.92). The limit of $E_n^{-1}$ is singular [Lemma 7.3.5(ii)] and $F_n^{-1}S_n^0$ diverges [see Eq. (7.85)]. Therefore before letting $n\to\infty$ we need to calculate the product $E_n^{-1}F_n^{-1}S_n^0$. By Eq. (7.63) and the definition $F_n = \frac{1}{\log n}D_n$ we have
$$F_n^{-1}S_n^0 = \begin{pmatrix}\dfrac{\log n}{\sqrt n}\displaystyle\sum_{s=1}^n\left(\dfrac{s}{n}\right)^{\gamma_0}u_s\\[2ex] \dfrac{1}{\sqrt n}\displaystyle\sum_{s=1}^n\left(\dfrac{s}{n}\right)^{\gamma_0}\beta_0(\log s)u_s\end{pmatrix} = \frac{1}{\sqrt n}\sum_{s=1}^n\left(\frac{s}{n}\right)^{\gamma_0}u_s v_s, \quad (7.97)$$
where $v_s = (\log n,\ \beta_0\log s)'$. With the help of the expression from Lemma 7.3.5(ii) we calculate
$$E_n^{-1}v_s \simeq \gamma_1^3\begin{pmatrix}1 - \dfrac{2}{\gamma_1\log n} + \dfrac{2}{\gamma_1^2\log^2 n} & -\dfrac{1}{\beta_0}\left(1-\dfrac{1}{\gamma_1\log n}\right)\\[1.5ex] -\dfrac{1}{\beta_0}\left(1-\dfrac{1}{\gamma_1\log n}\right) & \dfrac{1}{\beta_0^2}\end{pmatrix}\begin{pmatrix}\log n\\ \beta_0\log s\end{pmatrix} \simeq \gamma_1^3\begin{pmatrix}\left(1 - \dfrac{2}{\gamma_1\log n} + \dfrac{2}{\gamma_1^2\log^2 n}\right)\log n - \left(1-\dfrac{1}{\gamma_1\log n}\right)\log s\\[1.5ex] \dfrac{1}{\beta_0}\left[-\left(1-\dfrac{1}{\gamma_1\log n}\right)\log n + \log s\right]\end{pmatrix}.$$
In this and the next two equations $\simeq$ means equality up to a term of order $O_p(n^{-\varepsilon})$. Next we replace $\log s = \log(s/n) + \log n$ and retain the terms of order $1/\log n$:
$$E_n^{-1}v_s \simeq \gamma_1^3\begin{pmatrix}-\left(\dfrac{1}{\gamma_1} + \log\dfrac{s}{n}\right) + \dfrac{1}{\log n}\left(\dfrac{2}{\gamma_1^2} + \dfrac{1}{\gamma_1}\log\dfrac{s}{n}\right)\\[1.5ex] \dfrac{1}{\beta_0}\left(\dfrac{1}{\gamma_1} + \log\dfrac{s}{n}\right)\end{pmatrix}. \quad (7.98)$$
From Eqs. (7.97) and (7.98) we see that
$$E_n^{-1}F_n^{-1}S_n^0 \simeq -\gamma_1^3\begin{pmatrix}1\\ -1/\beta_0\end{pmatrix}\frac{1}{\sqrt n}\sum_{s=1}^n\left(\frac{s}{n}\right)^{\gamma_0}\left(\frac{1}{\gamma_1}+\log\frac{s}{n}\right)u_s + \begin{pmatrix}1\\ 0\end{pmatrix}\frac{1}{\log n}\frac{1}{\sqrt n}\sum_{s=1}^n\left(\frac{s}{n}\right)^{\gamma_0}\left(2\gamma_1 + \gamma_1^2\log\frac{s}{n}\right)u_s. \quad (7.99)$$
By Lemma 7.2.7 the second term of Eq. (7.99) is $O_p(1/\log n)$. The proof of that lemma can be easily modified to show that with $F(s) = s^{\gamma_0}(1/\gamma_1 + \log s)$
$$\frac{1}{\sqrt n}\sum_{s=1}^n\left(\frac{s}{n}\right)^{\gamma_0}\left(\frac{1}{\gamma_1}+\log\frac{s}{n}\right)u_s \xrightarrow{d} N\left(0,\ \sigma^2\int_0^1 F^2(s)\,ds\right). \quad (7.100)$$
Here, by Eq. (7.69)
$$\int_0^1 F^2(s)\,ds = \int_0^1\left(\frac{s^{2\gamma_0}}{\gamma_1^2} + \frac{2s^{2\gamma_0}\log s}{\gamma_1} + s^{2\gamma_0}\log^2 s\right)ds = \frac{1}{\gamma_1^3}. \quad (7.101)$$
Equations (7.99)–(7.101) allow us to conclude that
$$E_n^{-1}F_n^{-1}S_n^0 \xrightarrow{d} \begin{pmatrix}1\\ -1/\beta_0\end{pmatrix}N(0,\ \sigma^2\gamma_1^3).$$
This equation and Eq. (7.92) prove Eq. (7.96). $\blacksquare$
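The two different normalizations of $\hat\beta_n$ and $\hat\gamma_n$ can be seen in a small simulation. The sketch below is not from the original; it uses scipy.optimize.least_squares as a generic NLS solver, with iid normal errors and illustrative parameter values. Consistent with the normalization $F_n = n^{\gamma_0+1/2}\,\mathrm{diag}[1/\log n, 1]$, the exponent estimate $\hat\gamma_n$ is noticeably less dispersed than $\hat\beta_n$.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)
beta0, gamma0, n = 0.6, 0.3, 5000     # |beta0| < 1, gamma0 > -1/2
s = np.arange(1.0, n + 1)

def fit_once():
    """One NLS fit of y_s = beta * s^gamma + u_s with iid N(0, 1) errors."""
    y = beta0 * s**gamma0 + rng.standard_normal(n)
    res = least_squares(lambda th: y - th[0] * s**th[1], x0=[1.0, 0.5])
    return res.x

est = np.array([fit_once() for _ in range(200)])
print(est.mean(axis=0))               # near (beta0, gamma0)
print(est.std(axis=0))                # gamma-hat is the tighter of the two
```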
7.4 BINARY LOGIT MODELS WITH UNBOUNDED EXPLANATORY VARIABLES

7.4.1 The Binary Logit Model and Log-Likelihood

Consider independent observations $(x_t, y_t)$, $t = 1, \ldots, T$, where all $y_t$ are Bernoulli variables (with values 0 and 1) and $x_t = (x_{t1}, \ldots, x_{tK})'$ are vectors of explanatory variables. The Bernoulli variable is uniquely characterized by the probability $P(y_t = 1)$. In the binary model it is assumed that
$$P(y_t = 1\,|\,x_t) = F(x_t'\beta_0), \quad (7.102)$$
where $F$ is some probability distribution function, $x_t'\beta_0 = \sum_{k=1}^K x_{tk}\beta_{0k}$ and $\beta_0\in R^K$ is an unknown parameter vector. The choice of the logistic function
$$F(x) = 1/(1 + e^{-x}), \quad x\in R, \quad (7.103)$$
makes the model a binary logit model. The density function of a single Bernoulli variable is $P(y) = p^y(1-p)^{1-y}$. As a result of Eq. (7.102) and the assumed independence of observations, the likelihood function (also known as the joint density) of the sequence of observations $(x, y) = \{(x_1, y_1), (x_2, y_2), \ldots\}$ is
$$L_T(\beta; (x, y)) = \prod_{t=1}^T[F(x_t'\beta)]^{y_t}[1 - F(x_t'\beta)]^{1-y_t}.$$
The log-likelihood is, obviously,
$$\log L_T(\beta; (x, y)) = \sum_{t=1}^T\{y_t\log F(x_t'\beta) + (1-y_t)\log[1 - F(x_t'\beta)]\} = \sum_{t=1}^T\left\{y_t\log\frac{F(x_t'\beta)}{1 - F(x_t'\beta)} + \log[1 - F(x_t'\beta)]\right\}.$$
7.4.2 The Score and Hessian

Note that the logit function (7.103) satisfies $1 - F(x) = e^{-x}/(1 + e^{-x})$ and therefore
$$\frac{F(x)}{1 - F(x)} = e^x, \quad f(x) \equiv F'(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = F(x)[1 - F(x)]. \quad (7.104)$$
Using Eq. (7.104) we find the derivatives
$$\frac{d}{d\beta}\log\frac{F(x_t'\beta)}{1 - F(x_t'\beta)} = \frac{d}{d\beta}x_t'\beta = x_t, \quad \frac{d}{d\beta}\log[1 - F(x_t'\beta)] = -\frac{F'(x_t'\beta)}{1 - F(x_t'\beta)}x_t = -F(x_t'\beta)x_t$$
and the score
$$\frac{d}{d\beta}\log L_T(\beta; (x, y)) = \sum_{t=1}^T[y_t x_t - F(x_t'\beta)x_t] = \sum_{t=1}^T[y_t - F(x_t'\beta)]x_t. \quad (7.105)$$
Consequently, using the notation
$$H_T(\beta) = \sum_{t=1}^T f(x_t'\beta)x_t x_t', \quad \beta\in R^K, \quad T = 1, 2, \ldots,$$
the Hessian can be expressed as
$$\frac{d^2\log L_T(\beta; (x, y))}{d\beta\,d\beta'} = -\sum_{t=1}^T F'(x_t'\beta)x_t x_t' = -H_T(\beta). \quad (7.106)$$
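A short numerical check (illustrative, not part of the original text) confirms the identity $f = F(1-F)$ and the score formula (7.105) against a finite-difference gradient of the log-likelihood. The array below stores the $x_t'$ as rows, whereas $X$ in Eq. (7.109) below collects the $x_t$ as columns.

```python
import numpy as np

def F(x):                      # logistic cdf, Eq. (7.103)
    return 1.0 / (1.0 + np.exp(-x))

def loglik(beta, Xr, y):       # log-likelihood of the binary logit model
    p = F(Xr @ beta)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def score(beta, Xr, y):        # Eq. (7.105): sum_t (y_t - F(x_t'beta)) x_t
    return Xr.T @ (y - F(Xr @ beta))

rng = np.random.default_rng(2)
T, K = 200, 3
Xr = rng.standard_normal((T, K))           # rows are x_t'
beta = np.array([0.5, -1.0, 0.25])         # hypothetical parameter value
y = (rng.random(T) < F(Xr @ beta)).astype(float)

# central finite-difference gradient of the log-likelihood
h = 1e-6
num = np.array([(loglik(beta + h * e, Xr, y) - loglik(beta - h * e, Xr, y)) / (2 * h)
                for e in np.eye(K)])
print(np.max(np.abs(num - score(beta, Xr, y))))   # tiny
```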
7.4.3 History

The strong consistency of the ML estimator for Eq. (7.102) was studied in the context of repeated samples by Amemiya (1976) and Morimune (1959). These authors assumed that the explanatory variables can take a finite number of values and that the number of observations goes to infinity for each set of possible values of these variables. Such assumptions are appropriate in the context of controlled experiments and do not cover most econometric applications. Gouriéroux and Monfort (1981) (henceforth referred to as G&M) made a significant step forward by allowing the explanatory variables to take an infinite number of values. They obtained necessary and sufficient conditions for strong consistency of the ML estimator and proved its asymptotic normality. They discovered an interesting link between the logit model and the OLS estimator for a linear model (see Section 7.4.12); such a link does not exist in the case of the probit model. In their argument an important role belongs to the surjection theorem from Cartan (1967), see Section 7.4.7. Strong consistency was proved with the help of Anderson and Taylor (1979), and asymptotic normality was obtained as an application of an asymptotic normality result for the OLS estimator due to Eicker (1966). Let $\lambda_{KT}$ and $\lambda_{1T}$ denote, respectively, the largest and the smallest eigenvalues of $H_T(\beta_0)$, where $\beta_0$ is the true parameter vector:
$$\lambda_{KT} = \lambda_{\max}(H_T(\beta_0)), \quad \lambda_{1T} = \lambda_{\min}(H_T(\beta_0)), \quad M_T = \sup_{t\le T}\|x_t\|, \quad (7.107)$$
where $\|x_t\| = \left(\sum_{k=1}^K x_{tk}^2\right)^{1/2}$. G&M assumed that the regressors are deterministic, bounded ($\sup_T M_T < \infty$), and that the eigenvalues of $H_T(\beta_0)$ are of the same order: $\sup_T\lambda_{KT}/\lambda_{1T} < \infty$. We relax some or all of these assumptions, depending on the situation. The case of unbounded explanatory variables requires an accurate estimation of the Lipschitz constant for the mapping [see Eqs. (7.105) and (7.106)]
$$\phi_T(\beta; (x, y)) = \beta + H_T^{-1}(\beta_0)\frac{d\log L_T(\beta; (x, y))}{d\beta} = \beta + H_T^{-1}(\beta_0)\sum_{t=1}^T[y_t - F(x_t'\beta)]x_t, \quad (7.108)$$
which is necessary to apply the Cartan theorem. Instead of the result by Anderson and Taylor (1979) we use a more general theorem (Theorem 6.4.8) by Lai and Wei (1982). The theorem by Eicker (1966) does not apply because the squares of the error terms in the linear model are not uniformly integrable (Lemma 7.4.16); instead, we apply the Lindeberg CLT. We show that our results include, as a special case, those due to G&M. Hsiao (1991) also addressed the issue of unbounded explanatory variables. However, since Hsiao's intention was to cover errors in variables, the resulting conditions are complex and are not directly comparable to ours. Besides, the approach by G&M generalized here gives necessary and sufficient conditions in some cases.

The main results are given in Sections 7.4.10, 7.4.13 and 7.4.21. Everywhere we maintain (without explicitly mentioning) the basic assumptions of the logit model: the observations are independent and satisfy Eq. (7.102) with the logit function (7.103). To distinguish the additional assumptions from those in the previous sections, here we provide their numbers with the prefix BL (for binary logit).
7.4.4 Uniqueness of the Maximum Likelihood Estimator

7.4.4.1 Assumption BL1. For all large $T$ the matrix $H_T(\beta_0)$ is positive definite: $\lambda_{1T} > 0$.

Lemma. If Assumption BL1 is satisfied and the ML estimator $\hat\beta_T(x, y)$ exists, then it is unique.

Proof. Denote
$$G(\beta) = \mathrm{diag}[f(x_1'\beta), \ldots, f(x_T'\beta)], \quad X = (x_1, \ldots, x_T). \quad (7.109)$$
Then $H_T(\beta) = XG(\beta)X'$ and
$$\mathrm{rank}\,H_T(\beta_0) \le \min[\mathrm{rank}\,X,\ \mathrm{rank}\,G(\beta_0)] = \mathrm{rank}\,X \le K \quad \text{for } T\ge K.$$
As a result of this inequality and Assumption BL1, $\mathrm{rank}\,X = K$. Among the columns of $X$ there are $K$ linearly independent ones, and the other $T-K$ are linear combinations of those $K$ columns. In the sum $\sum_{t=1}^T f(x_t'\beta)x_t x_t'$ we can relocate the terms that correspond to the linearly independent $x_t$ to the beginning and the others to the end [this does not change $H_T(\beta)$] and renumber the terms correspondingly. Then $X = (Y, YA)$, where $Y$ is $K\times K$ and nonsingular and $A$ is some $K\times(T-K)$ matrix. Partitioning $G(\beta)$ correspondingly we get
$$H_T(\beta) = (Y, YA)\begin{pmatrix}G_1(\beta) & 0\\ 0 & G_2(\beta)\end{pmatrix}\begin{pmatrix}Y'\\ A'Y'\end{pmatrix} = YG_1(\beta)Y' + YAG_2(\beta)A'Y' \ge YG_1(\beta)Y'.$$
Since $\det[YG_1(\beta)Y'] = (\det Y)^2\det G_1(\beta) \neq 0$, $H_T(\beta)$ is nonsingular and the Hessian (7.106) is negative definite. This ensures uniqueness of the ML estimator. $\blacksquare$
7.4.5 Lipschitz Condition for the Logit Density

Lemma. With the notation (7.107), for any $\beta, \beta_0\in R^K$ we have
$$|f(x_t'\beta_0) - f(x_t'\beta)| \le 4\|\beta - \beta_0\|M_T e^{\|\beta-\beta_0\|M_T}f(x_t'\beta_0). \quad (7.110)$$

Proof. It is convenient to use hyperbolic functions
$$\cosh x = \frac{e^x + e^{-x}}{2}, \quad \sinh x = \frac{e^x - e^{-x}}{2}, \quad \tanh x = \frac{\sinh x}{\cosh x}$$
and their obvious properties:
$$(\cosh x)' = \sinh x, \quad \tanh(-x) = -\tanh x, \quad |\tanh x| \le 1. \quad (7.111)$$
The density $f$ in Eq. (7.104) can be represented as $f(x) = [\cosh(x/2)]^{-2}/4$ and then Eq. (7.111) can be used to obtain
$$|f'(x)| = |f(x)\tanh(x/2)| \le f(x), \quad \frac{1}{4}e^{-|x|} \le f(x) \le e^{-|x|}. \quad (7.112)$$
By the first relation in Eq. (7.112) and the finite increments formula, for any $x, h$ with some $\theta\in(0, 1)$ we have
$$|f(x+h) - f(x)| = |f'(x + \theta h)h| \le f(x + \theta h)|h|. \quad (7.113)$$
Using the inequality $|x| - |x + \theta h| \le |x - (x + \theta h)| \le |h|$ and the second relation in Eq. (7.112) we bound
$$f(x + \theta h) \le e^{-|x+\theta h|} = e^{-|x|}e^{|x| - |x+\theta h|} \le e^{-|x|}e^{|h|} \le 4f(x)e^{|h|}. \quad (7.114)$$
Equations (7.113) and (7.114) imply $|f(x+h) - f(x)| \le 4|h|f(x)e^{|h|}$, which leads to the desired estimate
$$|f(x_t'\beta_0) - f(x_t'\beta)| \le 4|x_t'(\beta - \beta_0)|e^{|x_t'(\beta-\beta_0)|}f(x_t'\beta_0) \le 4\|\beta - \beta_0\|M_T e^{\|\beta-\beta_0\|M_T}f(x_t'\beta_0). \quad \blacksquare$$
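The bound (7.110) rests on the scalar inequality $|f(x+h) - f(x)| \le 4|h|e^{|h|}f(x)$ and the envelope $\frac14 e^{-|x|} \le f(x) \le e^{-|x|}$. Both can be checked numerically, as in this illustrative sketch.

```python
import numpy as np

def f(x):
    """Logistic density of Eq. (7.104); writing it with |x| keeps the
    formula overflow-free and is valid because f is symmetric."""
    return np.exp(-np.abs(x)) / (1.0 + np.exp(-np.abs(x)))**2

rng = np.random.default_rng(3)
x = rng.uniform(-10, 10, 100000)
h = rng.uniform(-3, 3, 100000)
lhs = np.abs(f(x + h) - f(x))
rhs = 4 * np.abs(h) * np.exp(np.abs(h)) * f(x)
print(bool(np.all(lhs <= rhs)))   # the Lipschitz bound of the proof holds
print(bool(np.all((np.exp(-np.abs(x)) / 4 <= f(x)) & (f(x) <= np.exp(-np.abs(x))))))
```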
7.4.6 Lipschitz Condition for $\phi_T$

Let $B(\beta_0, r) = \{\beta\in R^K: \|\beta - \beta_0\| < r\}$ denote an open ball in $R^K$ with center $\beta_0$ and radius $r$. The function $\phi_T$ is defined in Eq. (7.108).

Lemma. If Assumption BL1 holds, then for any $r > 0$ and $T = 1, 2, \ldots$ the function $\phi_T$ satisfies the Lipschitz condition
$$\|\phi_T(\beta; (x, y)) - \phi_T(\tilde\beta; (x, y))\| \le L(r, T)\|\beta - \tilde\beta\| \quad \text{for } \beta, \tilde\beta\in B(\beta_0, r)$$
with the Lipschitz constant
$$L(r, T) = 4rM_T e^{rM_T}\lambda_{KT}/\lambda_{1T}. \quad (7.115)$$

Proof. The finite increments formula for vector-valued functions (Kolmogorov and Fomin 1989, Ch. X, Part 1, Section 3) states that
$$\|\phi_T(\beta; (x, y)) - \phi_T(\tilde\beta; (x, y))\| \le \sup_{\beta\in B(\beta_0, r)}\left\|\frac{d\phi_T(\beta; (x, y))}{d\beta'}\right\|\,\|\beta - \tilde\beta\|$$
and the lemma will follow if we prove
$$\sup_{\beta\in B(\beta_0, r)}\left\|\frac{d\phi_T(\beta; (x, y))}{d\beta'}\right\| \le L(r, T). \quad (7.116)$$
From Eqs. (7.106) and (7.108) we get
$$\frac{d\phi_T(\beta; (x, y))}{d\beta'} = I + H_T^{-1}(\beta_0)\frac{d^2\log L_T(\beta; (x, y))}{d\beta\,d\beta'} = H_T^{-1}(\beta_0)[H_T(\beta_0) - H_T(\beta)] = H_T^{-1}(\beta_0)\sum_{t=1}^T x_t x_t'[f(x_t'\beta_0) - f(x_t'\beta)]. \quad (7.117)$$
Equation (7.117) explains the construction of $\phi_T$: when $H_T(\beta)$ is close to $H_T(\beta_0)$, the matrix $d\phi_T(\beta; (x, y))/d\beta'$ should be small. Denote
$$a_t = |f(x_t'\beta_0) - f(x_t'\beta)|, \quad A = \mathrm{diag}[a_1, \ldots, a_T],$$
$$b_t = 4\|\beta - \beta_0\|M_T e^{\|\beta-\beta_0\|M_T}f(x_t'\beta_0), \quad B = \mathrm{diag}[b_1, \ldots, b_T].$$
By Lemma 7.4.5, $a_t \le b_t$. Using the notation (7.109) we have [$(\cdot, \cdot)$ is the scalar product in $R^T$]
$$\|H_T(\beta_0) - H_T(\beta)\| = \|XG(\beta_0)X' - XG(\beta)X'\| = \sup_{\|\xi\|=1}|([G(\beta_0) - G(\beta)]X'\xi,\ X'\xi)|$$
$$\le \sup_{\|\xi\|=1}(AX'\xi, X'\xi) \le \sup_{\|\xi\|=1}(BX'\xi, X'\xi) = 4\|\beta - \beta_0\|M_T e^{\|\beta-\beta_0\|M_T}\|H_T(\beta_0)\|. \quad (7.118)$$
This bound and Eq. (7.117) imply
$$\left\|\frac{d\phi_T(\beta; (x, y))}{d\beta'}\right\| \le \|H_T^{-1}(\beta_0)\|\,\|H_T(\beta_0) - H_T(\beta)\| \le 4\|\beta - \beta_0\|M_T e^{\|\beta-\beta_0\|M_T}\frac{\lambda_{KT}}{\lambda_{1T}},$$
which proves Eq. (7.116) and the lemma. $\blacksquare$
7.4.7 Surjection Theorem

Theorem. (Cartan 1967, Theorem 4.4.1) If the function $\psi: B(\beta_0, r)\to R^K$ is such that the function $\phi(b) = b - \psi(b)$ satisfies the Lipschitz condition
$$\|\phi(b) - \phi(\tilde b)\| \le c\|b - \tilde b\| \quad \text{for } b, \tilde b\in B(\beta_0, r)$$
with a constant $c < 1$, then any element of $B[\psi(\beta_0), (1-c)r]$ is the image by $\psi$ of an element of $B(\beta_0, r)$:
$$B(\psi(\beta_0), (1-c)r) \subseteq \psi(B(\beta_0, r)). \quad (7.119)$$
7.4.8 Definition

Let $\{r_T\}$ be a sequence of positive numbers. We say that the ML estimator $\hat\beta_T(x, y)$ exists a.s. and converges a.s. to the true value $\beta_0$ at the rate $o(r_T)$ if for almost any $(x, y)$

1. there exists $T_0(x, y)$ such that $\hat\beta_T(x, y)$ exists for all $T\ge T_0(x, y)$ and
2. $\hat\beta_T(x, y) - \beta_0 = o(r_T)$ a.s.

7.4.9 Existence and Convergence of the Maximum Likelihood Estimator in Terms of the Inverse Lipschitz Function

The function $L(r, T)$ defined in Eq. (7.115) is continuous and monotone in $r$ and satisfies $L(0, T) = 0$, $L(\infty, T) = \infty$ for each $T$. Therefore for any $c\in(0, 1)$ we can define $r_T(c)$ by $L[r_T(c), T] = c$. We call $r_T(c)$ an inverse Lipschitz function. By Lemma 7.4.6, for any $\varepsilon\in(0, 1]$
$$\|\phi_T(\beta; (x, y)) - \phi_T(\tilde\beta; (x, y))\| \le c\|\beta - \tilde\beta\| \quad \text{for } \beta, \tilde\beta\in B(\beta_0, \varepsilon r_T(c)). \quad (7.120)$$
Denote [see Eq. (7.108)]
$$\psi_T(\beta; (x, y)) = \beta - \phi_T(\beta; (x, y)) = -H_T^{-1}(\beta_0)\sum_{t=1}^T[y_t - F(x_t'\beta)]x_t. \quad (7.121)$$

Lemma. Suppose Assumption BL1 is satisfied and let $c\in(0, 1)$. Then $\hat\beta_T(x, y)$ exists a.s. and converges a.s. to the true value $\beta_0$ at the rate $o[r_T(c)]$ if and only if
$$\psi_T(\beta_0; (x, y)) = o(r_T(c)) \quad \text{a.s.} \quad (7.122)$$

Proof. Sufficiency. Following G&M we apply the surjection theorem. If Eq. (7.122) is true, then for almost any $(x, y)$ and for any $\varepsilon\in(0, 1)$ there exists $T_0 = T_0(x, y, \varepsilon)$ such that
$$\|\psi_T(\beta_0; (x, y))\| < (1-c)\varepsilon r_T(c) \quad \text{for all } T\ge T_0. \quad (7.123)$$
Equation (7.120) shows that Theorem 7.4.7 is applicable with $r = \varepsilon r_T(c)$. By Eq. (7.123) the null vector belongs to the ball $B[\psi_T(\beta_0; (x, y)), (1-c)\varepsilon r_T(c)]$. Therefore Eq. (7.119) ensures the existence of
$$\hat\beta_T(x, y)\in B(\beta_0, \varepsilon r_T(c)) \quad (7.124)$$
such that $\psi_T[\hat\beta_T(x, y); (x, y)] = 0$ and, hence, $\hat\beta_T(x, y)$ is the ML estimator. (Recall that by Lemma 7.4.4 it is unique.) Since Eq. (7.124) is true for all $T\ge T_0$ and $\varepsilon$ can be arbitrarily small, we have proved that
$$\hat\beta_T(x, y) - \beta_0 = o(r_T(c)) \quad \text{a.s.} \quad (7.125)$$
Necessity. Suppose Eq. (7.122) does not hold. Then for any $(x, y)$ from a set of positive probability there exist $\varepsilon_1 = \varepsilon_1(x, y)$ and a sequence $\{T_n\}$ such that $T_n\to\infty$ as $n\to\infty$ and
$$\|\psi_{T_n}(\beta_0; (x, y))\| \ge \varepsilon_1 r_{T_n}(c) \quad \text{for all } n. \quad (7.126)$$
Letting $\varepsilon = \varepsilon_1/[2(1+c)]$ in Eq. (7.120) we get
$$\|\psi_T(\beta; (x, y)) - \psi_T(\beta_0; (x, y))\| \le \|\beta - \beta_0\| + \|\phi_T(\beta; (x, y)) - \phi_T(\beta_0; (x, y))\| \le \varepsilon r_T(c) + c\varepsilon r_T(c) = \frac{\varepsilon_1}{2}r_T(c)$$
for all $\beta\in B[\beta_0, \varepsilon r_T(c)]$. This inequality and Eq. (7.126) imply
$$\|\psi_{T_n}(\beta; (x, y))\| \ge \|\psi_{T_n}(\beta_0; (x, y))\| - \|\psi_{T_n}(\beta; (x, y)) - \psi_{T_n}(\beta_0; (x, y))\| \ge \frac{\varepsilon_1}{2}r_{T_n}(c) \quad \text{for all } \beta\in B(\beta_0, \varepsilon r_{T_n}(c)).$$
This shows that the ML estimator $\hat\beta_{T_n}(x, y)$, if it exists, cannot belong to $B[\beta_0, \varepsilon r_{T_n}(c)]$ for any $n$. The resulting inequality $\|\hat\beta_{T_n}(x, y) - \beta_0\| \ge \varepsilon(x, y)r_{T_n}(c)$, $n = 1, 2, \ldots$, is true on a set of positive probability and means that Eq. (7.125) cannot be true. $\blacksquare$
7.4.10 Existence and Convergence of the Maximum Likelihood Estimator in Terms of $r_T$

Condition (7.122) is not very convenient because the inverse Lipschitz function is difficult to find explicitly. Here we show that it can be replaced by
$$r_T = \frac{1}{M_T}\frac{\lambda_{1T}}{\lambda_{KT}}. \quad (7.127)$$
Let $\{a_T\}, \{b_T\}$ be positive sequences of constants or random variables. We write $a_T\asymp b_T$ if $c_1 a_T \le b_T \le c_2 a_T$ with constants $c_1, c_2 > 0$ independent of $T$ ($c_1, c_2$ may depend on the point in the sample space $\Omega$ if $a_T, b_T$ are random variables).

Theorem. (From my drawer) If Assumption BL1 holds, then $\hat\beta_T(x, y)$ exists a.s. and converges a.s. to $\beta_0$ at the rate $o(r_T)$ if and only if $\psi_T[\beta_0; (x, y)] = o(r_T)$ a.s.

Proof. Let us prove that for any $c\in(0, 1)$
$$r_T(c) \asymp r_T. \quad (7.128)$$
Denoting $\tilde r_T(c) = r_T(c)M_T$ and $c_T = \dfrac{c\lambda_{1T}}{4\lambda_{KT}}$, we rewrite the definition
$$L(r_T(c), T) = 4r_T(c)M_T e^{r_T(c)M_T}\frac{\lambda_{KT}}{\lambda_{1T}} = c$$
of $r_T(c)$ as
$$\tilde r_T(c)e^{\tilde r_T(c)} = c_T. \quad (7.129)$$
Now consider two cases.

1. Suppose $0 < \tilde r_T(c) \le 1$. Obviously, $r \le re^r \le er$ for any $0 < r \le 1$, so that $\tilde r_T^l(c) \le \tilde r_T(c) \le \tilde r_T^u(c)$, where the lower and upper bounds $\tilde r_T^l(c), \tilde r_T^u(c)$ are defined by $e\tilde r_T^l(c) = c_T$, $\tilde r_T^u(c) = c_T$. Thus, $\frac{1}{e}c_T \le \tilde r_T(c) \le c_T$ or
$$\frac{c\lambda_{1T}}{4eM_T\lambda_{KT}} \le r_T(c) \le \frac{c\lambda_{1T}}{4M_T\lambda_{KT}} \quad \text{if } r_T(c) \le \frac{1}{M_T}. \quad (7.130)$$

2. Let $\tilde r_T(c) > 1$. In the range $r > 1$ we have $e^r \le re^r \le e^{2r}$. It follows that $\tilde r_T^l(c) \le \tilde r_T(c) \le \tilde r_T^u(c)$, where $\tilde r_T^l(c), \tilde r_T^u(c)$ are defined from $e^{2\tilde r_T^l(c)} = c_T$, $e^{\tilde r_T^u(c)} = c_T$. Hence, $\frac{1}{2}\ln c_T \le \tilde r_T(c) \le \ln c_T$ or
$$\frac{1}{2M_T}\left(\ln\frac{c}{4} + \ln\frac{\lambda_{1T}}{\lambda_{KT}}\right) \le r_T(c) \le \frac{1}{M_T}\left(\ln\frac{c}{4} + \ln\frac{\lambda_{1T}}{\lambda_{KT}}\right) \quad \text{if } r_T(c) > \frac{1}{M_T}. \quad (7.131)$$

Equations (7.130) and (7.131) allow us to prove Eq. (7.128). If $\lambda_{KT}/\lambda_{1T}\to\infty$, then $c_T\to 0$ and by Eq. (7.129) $\tilde r_T(c)\to 0$. For all sufficiently large $T$ Eq. (7.130) is true and Eq. (7.128) follows. If, however, $\lambda_{KT}/\lambda_{1T} = O(1)$, then $r_T\asymp 1/M_T$ and Eq. (7.128) follows from either Eq. (7.130) or Eq. (7.131). Now the theorem follows from Lemma 7.4.9 and Eq. (7.128). $\blacksquare$

Equation (7.128) implies that $r_T(c')\asymp r_T(c'')$ for any $c', c''\in(0, 1)$, that is, the dependence of $r_T(c)$ on $c$ is insignificant.
7.4.11 Consistency in the Case of Bounded Explanatory Variables

7.4.11.1 Assumption BL2  The explanatory variables are bounded and the eigenvalues of $H_T(\beta_0)$ are of the same order: $\sup_T M_T < \infty$, $\sup_T \lambda_{KT}/\lambda_{1T} < \infty$.

Corollary. (Gouriéroux and Monfort, 1981, Lemma 3) Suppose Assumptions BL1 and BL2 are satisfied. Then $\hat\beta_T(x, y)$ exists a.s. and converges a.s. to $\beta_0$ if and only if $\chi_T[\beta_0; (x, y)] = o(1)$ a.s. as $T \to \infty$.
CHAPTER 7
NONLINEAR MODELS
The proof follows from the fact that under Assumption BL2 $r_T \asymp 1$. Comparison of this corollary with Theorem 7.4.10 shows that when $M_T \lambda_{KT}/\lambda_{1T}$ is allowed to grow, the theorem asserts convergence of $\hat\beta_T$ to $\beta_0$ at the rate $o(r_T)$, which is sharper than just "a.s." convergence.
7.4.12 The Relationship between the Logit and Linear Models

Denote

$$z_t = x_t'\sqrt{f(x_t'\beta_0)},\qquad u_t = \frac{y_t - F(x_t'\beta_0)}{\sqrt{f(x_t'\beta_0)}},\qquad Z = \begin{pmatrix} z_1\\ \vdots\\ z_T\end{pmatrix},\qquad u = \begin{pmatrix} u_1\\ \vdots\\ u_T\end{pmatrix} \qquad (7.132)$$

and consider the model

$$g = Z\beta + u, \qquad (7.133)$$

where $\beta$ is a $K$-dimensional parameter.

Lemma. Under the specification of the logit model the variables $u_t$ defined in (7.132) are independent and satisfy

$$Eu_t = 0,\qquad Eu_t^2 = 1. \qquad (7.134)$$

If, further, Assumption BL1 holds, then the OLS estimator for model (7.133) has the property

$$\hat\beta - \beta = (Z'Z)^{-1}Z'u = \chi_T(\beta_0; (x, y)).$$

Proof. By the binary model assumption (7.102), $E(y_t|x_t) = 1\cdot P(y_t = 1|x_t) + 0\cdot P(y_t = 0|x_t) = F(x_t'\beta_0)$, so

$$E(u_t|x_t) = \frac{E(y_t|x_t) - F(x_t'\beta_0)}{\sqrt{f(x_t'\beta_0)}} = 0$$

and, by the LIE, $Eu_t = 0$. Similarly, $E(y_t^2|x_t) = F(x_t'\beta_0)$ and Eq. (7.104) implies

$$E(u_t^2|x_t) = \frac{E(y_t^2|x_t) - 2E(y_t|x_t)F(x_t'\beta_0) + F^2(x_t'\beta_0)}{f(x_t'\beta_0)} = \frac{F(x_t'\beta_0)[1 - F(x_t'\beta_0)]}{f(x_t'\beta_0)} = 1,\qquad Eu_t^2 = 1.$$

We have proved Eq. (7.134).
Independence of the $u_t$ follows from the assumed independence of the observations $(x_t, y_t)$. Further, using Eq. (7.121) we get

$$Z'Z = H_T(\beta_0),\qquad Z'u = \sum_{t=1}^T [y_t - F(x_t'\beta_0)]x_t, \qquad (7.135)$$

$$\hat\beta - \beta = (Z'Z)^{-1}Z'u = H_T^{-1}(\beta_0)\sum_{t=1}^T [y_t - F(x_t'\beta_0)]x_t = \chi_T(\beta_0; (x, y)). \qquad\blacksquare$$
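The two identities in the lemma are easy to check by simulation. The sketch below (a simulated design; our choices of $T$, $K$, $\beta_0$ and the seed are illustrative, not from the book) verifies that $Z'Z = H_T(\beta_0)$ and that the OLS error $(Z'Z)^{-1}Z'u$ equals $H_T^{-1}(\beta_0)\sum_t [y_t - F(x_t'\beta_0)]x_t$.

```python
import numpy as np

rng = np.random.default_rng(0)

def F(s):                       # logistic cdf
    return 1.0 / (1.0 + np.exp(-s))

def f(s):                       # logistic density, F(1 - F)
    p = F(s)
    return p * (1.0 - p)

T, K = 500, 2
beta0 = np.array([0.5, -1.0])
x = rng.normal(size=(T, K))
s = x @ beta0
y = (rng.uniform(size=T) < F(s)).astype(float)

Z = x * np.sqrt(f(s))[:, None]          # rows z_t = x_t' sqrt(f(x_t' beta0))
u = (y - F(s)) / np.sqrt(f(s))          # errors of the linear model (7.132)

H = (x * f(s)[:, None]).T @ x           # H_T(beta0) = sum f(x_t'b0) x_t x_t'
score = x.T @ (y - F(s))                # sum [y_t - F(x_t'b0)] x_t

ols_err = np.linalg.solve(Z.T @ Z, Z.T @ u)   # (Z'Z)^{-1} Z'u
chi = np.linalg.solve(H, score)               # H_T^{-1}(b0) * score
```

The sample mean of `u` should also be close to zero, in line with $Eu_t = 0$.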
7.4.13 Conditions for Consistency in Terms of Eigenvalues of H_T(β₀)

Theorem 7.4.10 and Corollary 7.4.11 supply necessary and sufficient conditions for a.s. convergence of $\hat\beta_T$ to $\beta_0$ at the rate $o(r_T)$. These conditions are in terms of a relatively complex function $\chi_T[\beta_0; (x, y)]$. Using Lemma 7.4.12 and the result by Anderson and Taylor (1979) on strong consistency of the OLS estimator, G&M obtained a simpler condition for a.s. convergence of $\hat\beta_T$ to $\beta_0$. The essence of their result is that under some circumstances the condition

$$\lim_{T\to\infty}\lambda_{1T} = \infty \qquad (7.136)$$

provides a.s. convergence. The result by Anderson and Taylor (1979), as generalized by Lai and Wei (1982), allows us to prove sufficiency of Eq. (7.136) under more general conditions than in the G&M theorem (Assumption BL2 is not required in the theorem below). The next two assumptions are imposed to satisfy the conditions of the Lai and Wei theorem (Theorem 6.4.8).

7.4.13.1 Assumption BL3  The explanatory variables $x_t$ are deterministic.

7.4.13.2 Assumption BL4  With some $\delta > 0$

$$(\log\lambda_{KT})^{1+\delta} = o(\lambda_{1T}),\qquad \left(\frac{\log\lambda_{KT}}{\lambda_{1T}}\right)^{1/2} = o(r_T). \qquad (7.137)$$

Theorem. (From my drawer) If Assumptions BL3 and BL4 are satisfied and Eq. (7.136) holds, then $\hat\beta_T(x, y)$ exists a.s. and converges a.s. to $\beta_0$ at the rate $o(r_T)$.

Proof. Denote by $\mathcal F_t = \sigma(u_1, \dots, u_t)$ the $\sigma$-field generated by $u_1, \dots, u_t$ [for the notation see Eq. (7.132)]. By Lemma 7.4.12, $\{u_t, \mathcal F_t\}$ is a m.d. sequence with $\sup_t E(u_t^2|\mathcal F_{t-1}) = \sup_t Eu_t^2 < \infty$; $z_t$ is deterministic and $\mathcal F_{t-1}$-measurable.
By Theorem 6.4.8 and Eq. (7.137),

$$\|\hat\beta - \beta\| = O\!\left(\left(\frac{\log\lambda_{KT}}{\lambda_{1T}}\right)^{1/2}\right) = o(r_T).$$

It remains to recall that by Lemma 7.4.12, $\hat\beta - \beta = \chi_T[\beta_0; (x, y)]$. $\blacksquare$
7.4.14 Corollary

Lemma. (Gouriéroux and Monfort, 1981, Lemma 4, sufficiency part) Suppose Assumptions BL2 and BL3 are satisfied and condition (7.136) holds. Then the ML estimator $\hat\beta_T(x, y)$ exists a.s. and converges a.s. to $\beta_0$.

Proof. We need to verify that Eq. (7.137) follows from the conditions of this corollary. By Assumption BL2, with some constant $c > 0$,

$$\lambda_{KT} \le c\lambda_{1T}. \qquad (7.138)$$

Obviously, for any $\varepsilon > 0$ there exists $c_1 > 0$ such that $\log\lambda \le \lambda^\varepsilon$ for $\lambda \ge c_1$. Choosing $\varepsilon(1+\delta) < 1$, by Eqs. (7.136) and (7.138) we get for all large $T$

$$(\log\lambda_{KT})^{1+\delta} \le \lambda_{KT}^{\varepsilon(1+\delta)} \le (c\lambda_{1T})^{\varepsilon(1+\delta)} = o(\lambda_{1T}). \qquad (7.139)$$

This is the first part of Eq. (7.137). Assumption BL2 implies $r_T \asymp 1$. Therefore the second part of Eq. (7.137) follows from Eq. (7.139):

$$\frac{\log\lambda_{KT}}{\lambda_{1T}} \le \frac{(\log\lambda_{KT})^{1+\delta}}{\lambda_{1T}} = o(1),$$

so that $(\log\lambda_{KT}/\lambda_{1T})^{1/2} = o(1) = o(r_T)$. Thus, this corollary really follows from Theorem 7.4.13. $\blacksquare$
7.4.15 Example

Here is an example with one unbounded explanatory variable to which Corollary 7.4.14 is not applicable but Theorem 7.4.13 is. For this example G&M established just the existence of the ML estimator (see their Remark 1 on p. 88).

EXAMPLE. Let $K = 1$, $\beta_0 = 1$, $x_t = \ln t$. Then

$$\hat\beta_T - \beta_0 = o(1/\ln T) \text{ a.s.} \qquad (7.140)$$
Proof. The first several equations in the linear model asymptotically don't matter, so it suffices to find the orders of the required quantities. Thus,

$$f(x_t\beta_0) = \frac{e^{-\ln t}}{(1 + e^{-\ln t})^2} = \frac{1/t}{(1 + 1/t)^2} = \frac{1 + o(1)}{t},$$

$$z_t = \frac{\ln t}{\sqrt t}\,(1 + o(1)),\qquad Z'Z = \sum_{t=1}^T \frac{\ln^2 t}{t}\,(1 + o(1)) \asymp \ln^3 T. \qquad (7.141)$$

It follows that $\lambda_{KT} = \lambda_{1T} \asymp \ln^3 T$. Obviously, $M_T = \ln T$, $r_T \asymp 1/\ln T$, and Assumption BL4 is satisfied:

$$(\ln\ln^3 T)^{1+\delta} = o(\ln^3 T),\qquad \left(\frac{\ln\ln^3 T}{\ln^3 T}\right)^{1/2} = \left(\frac{3\ln\ln T}{\ln^3 T}\right)^{1/2} = o\!\left(\frac{1}{\ln T}\right) = o(r_T).$$

Application of Theorem 7.4.13 yields Eq. (7.140). $\blacksquare$
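The claim of the example can be watched in a small Monte Carlo run. The sketch below is ours: the sample size, the seed, and the Newton solver are illustrative assumptions, not part of the book's argument.

```python
import math
import random

def sigmoid(s):
    s = max(min(s, 35.0), -35.0)   # guard against overflow
    return 1.0 / (1.0 + math.exp(-s))

def logit_mle_1d(xs, ys, b=0.0):
    """Newton iterations for the scalar logit log-likelihood."""
    for _ in range(50):
        p = [sigmoid(b * x) for x in xs]
        g = sum((y - pi) * x for y, pi, x in zip(ys, p, xs))       # score
        h = sum(pi * (1.0 - pi) * x * x for pi, x in zip(p, xs))   # information
        step = g / h
        b += step
        if abs(step) < 1e-10:
            break
    return b

random.seed(42)
T = 5000
xs = [math.log(t) for t in range(2, T + 2)]                        # x_t = ln t
ys = [1.0 if random.random() < sigmoid(x) else 0.0 for x in xs]    # beta_0 = 1
b_hat = logit_mle_1d(xs, ys)
```

Since $H_T \asymp \ln^3 T/3$, the standard deviation of $\hat\beta_T$ here is roughly $0.07$, so $\hat\beta_T$ lands close to $\beta_0 = 1$.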
7.4.16 Lack of Uniform Integrability of Squared Errors

Our next goal is to provide conditions sufficient for asymptotic normality of the ML estimator. For this purpose G&M applied the theorem by Eicker (1966) on the OLS estimator for the linear model. Eicker's theorem requires uniform integrability of the squares $u_t^2$:

$$\sup_t Eu_t^2\,1(u_t^2 > c) \to 0 \text{ as } c \to \infty.$$

The lemma below shows that Eicker's theorem is not applicable when the $x_t$ are unbounded.

Lemma

(i) If the $x_t$ are deterministic, then for $c > 0$

$$Eu_t^2\,1(u_t^2 > c) = [1 - F(X_t)]\,1(-X_t > \ln c) + F(X_t)\,1(X_t > \ln c), \qquad (7.142)$$

where $X_t = x_t'\beta_0$.

(ii) If, additionally, $\sup_t |X_t| = \infty$, then

$$\sup_t Eu_t^2\,1(u_t^2 > c) = 1 \text{ for any } c > 0. \qquad (7.143)$$

Proof. (i) It is convenient to collect the properties of the relevant Bernoulli variables in Table 7.1 [see Eqs. (7.102), (7.104) and (7.132)]. Based on the table data and Eq. (7.103) we note that the following equivalences hold for $c > 0$: if $y_t = 1$, then

$$u_t^2 > c \iff F(X_t) < \frac{1}{1+c} \iff -X_t > \ln c,$$
TABLE 7.1 Properties of Bernoulli Variables

| Values of $y_t$ | Values of $u_t$ | Values of $u_t^2$ | Probabilities |
|---|---|---|---|
| 1 | $\dfrac{1 - F(X_t)}{\sqrt{f(X_t)}}$ | $\dfrac{1 - F(X_t)}{F(X_t)}$ | $F(X_t)$ |
| 0 | $-\dfrac{F(X_t)}{\sqrt{f(X_t)}}$ | $\dfrac{F(X_t)}{1 - F(X_t)}$ | $1 - F(X_t)$ |

and if $y_t = 0$, then

$$u_t^2 > c \iff F(X_t) > \frac{c}{1+c} \iff X_t > \ln c.$$

Together with the table data these give Eq. (7.142):

$$Eu_t^2\,1(u_t^2 > c) = P(y_t = 1)\,[u_t^2\,1(u_t^2 > c)]_{y_t=1} + P(y_t = 0)\,[u_t^2\,1(u_t^2 > c)]_{y_t=0}$$
$$= F(X_t)\,\frac{1 - F(X_t)}{F(X_t)}\,1(-X_t > \ln c) + [1 - F(X_t)]\,\frac{F(X_t)}{1 - F(X_t)}\,1(X_t > \ln c)$$
$$= [1 - F(X_t)]\,1(-X_t > \ln c) + F(X_t)\,1(X_t > \ln c).$$

(ii) Suppose $|X_{t_k}| \to \infty$ along a subsequence $\{t_k\}$. Then for any given $c > 0$ one has $|X_{t_k}| > \ln c$ for all large $k$. Since at least one of the numbers $1 - F(X_{t_k})$ or $F(X_{t_k})$ tends to 1, Eq. (7.142) implies Eq. (7.143). $\blacksquare$
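Because $u_t$ takes only two values, formula (7.142) can be verified directly. The check below (our sketch; the grid of test points is an illustrative choice) compares the closed form with exact enumeration of the two Bernoulli outcomes.

```python
import math

def F(s):
    """Logistic cdf."""
    return 1.0 / (1.0 + math.exp(-s))

def tail_second_moment(X, c):
    """Exact E[u^2 1(u^2 > c)] by enumerating the two outcomes of y."""
    p = F(X)
    u2_y1 = (1.0 - p) / p          # u_t^2 when y_t = 1
    u2_y0 = p / (1.0 - p)          # u_t^2 when y_t = 0
    return p * u2_y1 * (u2_y1 > c) + (1.0 - p) * u2_y0 * (u2_y0 > c)

def formula(X, c):
    """Closed form (7.142)."""
    return (1.0 - F(X)) * (-X > math.log(c)) + F(X) * (X > math.log(c))

pairs = [(X, c) for X in (-3.0, -0.5, 0.0, 0.7, 4.0) for c in (0.5, 1.0, 2.0, 10.0)]
```

Note also that $Eu_t^2\,1(u_t^2 > c) \le Eu_t^2 = 1$, which is why the supremum in (7.143) equals exactly 1.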
7.4.17 G&M Representation of the Bias

Lemma. We have the representation

$$H_T^{1/2}(\hat\beta_T)(\hat\beta_T - \beta_0) = [H_T^{1/2}(\hat\beta_T)A_T^{-1/2}]\,[A_T^{-1/2}H_T^{1/2}(\beta_0)]\,[(Z'Z)^{-1/2}Z'u], \qquad (7.144)$$

where $Z$ and $u$ are from Eq. (7.132),

$$A_T = \sum_{t=1}^T f(x_t' b_{tT})\,x_t x_t'$$

and $b_{tT}$ is some point belonging to the segment with extremities $\hat\beta_T$ and $\beta_0$.
Proof. The ML estimator $\hat\beta_T$ maximizes the log-likelihood function and, given its smoothness, is the root of the first-order condition [see Eq. (7.105)]

$$\sum_{t=1}^T y_t x_t = \sum_{t=1}^T F(x_t'\hat\beta_T)\,x_t.$$

Subtracting $\sum_{t=1}^T F(x_t'\beta_0)\,x_t$ from both sides and using Eq. (7.135) we get

$$Z'u = \sum_{t=1}^T [y_t - F(x_t'\beta_0)]\,x_t = \sum_{t=1}^T [F(x_t'\hat\beta_T) - F(x_t'\beta_0)]\,x_t.$$

Here, by the finite increments formula, $F(x_t'\hat\beta_T) - F(x_t'\beta_0) = f(x_t' b_{tT})\,x_t'(\hat\beta_T - \beta_0)$. Therefore $Z'u = A_T(\hat\beta_T - \beta_0)$ and

$$H_T^{1/2}(\hat\beta_T)(\hat\beta_T - \beta_0) = H_T^{1/2}(\hat\beta_T)\,A_T^{-1}\,H_T^{1/2}(\beta_0)\,H_T^{-1/2}(\beta_0)\,Z'u.$$

Application of the identity $Z'Z = H_T(\beta_0)$ [see Eq. (7.135)] finishes the proof of Eq. (7.144). $\blacksquare$
7.4.18 Kadison Theorem

Let $H$ be a Hilbert space and let $B(H)$ denote the space of all bounded operators in $H$. For a given set $S \subseteq \mathbb R$, $B(H)_S$ denotes the set of all bounded self-adjoint operators with spectrum $\sigma(A) \subseteq S$. We are interested in conditions on a real-valued function $f$ that provide strong continuity of $f(A)$, $A \in B(H)_S$: if $\{A_n x\}$ converges for each $x \in H$, then $\{f(A_n)x\}$ also converges for each $x \in H$. The theorem below is sufficient for our purposes.

Theorem. (Kadison, 1968, Corollary 3.7) If $S$ is a closed or open subset of $\mathbb R$, then a real-valued function defined on $S$ is strong-operator continuous on $B(H)_S$ if and only if it is continuous on $S$, bounded on bounded subsets of $S$, and $O(x)$ at infinity.

We need a simple case of this theorem. When $H = \mathbb R^n$, strong-operator continuity coincides with uniform continuity. In applications to symmetric nonnegative matrices we can put $S = [0, \infty)$, and then the desired continuity takes place if $f$ is real-valued, continuous on $[0, \infty)$ and satisfies $f(x) = O(x)$ as $x \to \infty$.
7.4.19 Functions of Two Matrix Sequences

Lemma. Suppose two sequences of positive, symmetric $K\times K$ matrices $\{A_T\}$, $\{B_T\}$ satisfy

$$\sup_T \|B_T\| < \infty,\qquad \|A_T - B_T\| = o(1). \qquad (7.145)$$
Let $f$ be a real-valued function, continuous on $[0, \infty)$, such that $f(x) = O(x)$ as $x \to \infty$. Then

$$\|f(A_T) - f(B_T)\| = o(1). \qquad (7.146)$$

Proof. Suppose Eq. (7.146) is wrong. Then there exist $\varepsilon > 0$ and a sequence $\{T_k\}$ such that

$$\|f(A_{T_k}) - f(B_{T_k})\| \ge \varepsilon \text{ for all } k. \qquad (7.147)$$

The bounded sequence $\{B_{T_k}\}$ contains a convergent subsequence $\{B_{T_{k_n}}\}$ such that $\|B_{T_{k_n}} - B\| \to 0$ as $n \to \infty$ for some $B$. Hence, by Eq. (7.145), $\|A_{T_{k_n}} - B\| = o(1)$. The Kadison theorem implies $\|f(A_{T_{k_n}}) - f(B)\| = o(1)$, $\|f(B_{T_{k_n}}) - f(B)\| = o(1)$ and, consequently, $\|f(A_{T_{k_n}}) - f(B_{T_{k_n}})\| = o(1)$. This contradicts Eq. (7.147) and proves the statement. $\blacksquare$
7.4.20 Convergence of the Elements of the G&M Representation

7.4.20.1 Assumption BL5  The ML estimator $\hat\beta_T$ is consistent in the sense that $\hat\beta_T - \beta_0 = o(r_T)$ a.s. If necessary, this assumption can be replaced by sufficient conditions from Theorem 7.4.13.

7.4.20.2 Assumption BL6  The largest and smallest eigenvalues of $H_T(\beta_0)$ are of the same order: $\sup_T \lambda_{KT}/\lambda_{1T} < \infty$.

Lemma. If Assumptions BL5 and BL6 hold, then

$$A_T^{-1/2}H_T^{1/2}(\beta_0) \xrightarrow{a.s.} I,\qquad H_T^{1/2}(\hat\beta_T)A_T^{-1/2} \xrightarrow{a.s.} I. \qquad (7.148)$$

Proof. Let us prove the first relation in Eq. (7.148). Denote

$$a_{tT} = |f(x_t'\beta_0) - f(x_t' b_{tT})|,\qquad \bar b_{tT} = 4\|\hat\beta_T - \beta_0\| M_T e^{\|\hat\beta_T - \beta_0\| M_T} f(x_t'\beta_0)$$

(the bar distinguishes the bound $\bar b_{tT}$ from the mean-value points $b_{tT}$ of Lemma 7.4.17). By Lemma 7.4.5 and the inequality $\|b_{tT} - \beta_0\| \le \|\hat\beta_T - \beta_0\|$ we have $a_{tT} \le \bar b_{tT}$. Then, similarly to Eq. (7.118), with $G(b_T) = \mathrm{diag}[f(x_1' b_{1T}), \dots, f(x_T' b_{TT})]$ we get

$$\|H_T(\beta_0) - A_T\| = \|X[G(\beta_0) - G(b_T)]X'\| \le 4\|\hat\beta_T - \beta_0\| M_T e^{\|\hat\beta_T - \beta_0\| M_T}\lambda_{KT}. \qquad (7.149)$$
By Assumption BL5 and definition (7.127),

$$\|\hat\beta_T - \beta_0\| M_T = o(\lambda_{1T}/\lambda_{KT}) = o(1). \qquad (7.150)$$

Equations (7.149) and (7.150) lead to

$$\|H_T(\beta_0)/\lambda_{KT} - A_T/\lambda_{KT}\| \le 4\|\hat\beta_T - \beta_0\| M_T e^{\|\hat\beta_T - \beta_0\| M_T} = o(1).$$

By Lemma 7.4.19, in which we put $B_T = H_T(\beta_0)/\lambda_{KT}$ and $f(x) = x^{1/2}$, this implies $\|(H_T(\beta_0)/\lambda_{KT})^{1/2} - (A_T/\lambda_{KT})^{1/2}\| = o(1)$, or

$$\|H_T^{1/2}(\beta_0) - A_T^{1/2}\| = o(\lambda_{KT}^{1/2}). \qquad (7.151)$$

We also need to bound $\|A_T^{-1/2}\|$. Owing to Eqs. (7.149) and (7.150), $\|H_T(\beta_0) - A_T\| = o(\lambda_{1T})$. It follows that

$$\lambda_{\min}(A_T) = \inf_{\|z\|=1}(A_T z, z) = \lambda_{\min}(H_T(\beta_0))(1 + o(1))$$

and that

$$\|A_T^{-1/2}\| \le 2\lambda_{1T}^{-1/2} \text{ for large } T. \qquad (7.152)$$

Now we combine Eqs. (7.151) and (7.152) and Assumption BL6 to get

$$\|A_T^{-1/2}H_T^{1/2}(\beta_0) - I\| \le \|A_T^{-1/2}\|\,\|H_T^{1/2}(\beta_0) - A_T^{1/2}\| = o((\lambda_{KT}/\lambda_{1T})^{1/2}) = o(1) \text{ a.s.}$$

This is the first relation in Eq. (7.148). Replacing in the definition of $a_{tT}$ the vector $\beta_0$ by $\hat\beta_T$ and making the necessary changes in the subsequent calculations, instead of Eq. (7.151) we obtain $\|H_T^{1/2}(\hat\beta_T) - A_T^{1/2}\| = o(\lambda_{KT}^{1/2})$. Combining this bound with Eq. (7.152), as above, we finish the proof of the second relation in Eq. (7.148). $\blacksquare$
7.4.21 Asymptotic Normality of the Maximum Likelihood Estimator

Denote

$$\gamma_{Tt} = \|(Z'Z)^{-1/2}z_t'\|^2 = z_t(Z'Z)^{-1}z_t' = x_t'H_T^{-1}(\beta_0)x_t\,f(x_t'\beta_0),\qquad t = 1, \dots, T,$$
and

$$\delta_T(\varepsilon) = \sum_{t:\ |x_t'\beta_0| > \log(\varepsilon^2/\gamma_{Tt})} \gamma_{Tt},\qquad \varepsilon > 0.$$

7.4.21.1 Assumption BL7  $\lim_{T\to\infty}\delta_T(\varepsilon) = 0$ for all $\varepsilon > 0$.

Theorem. (From my drawer) If Assumptions BL3, BL5, BL6, BL7 and the condition $\lim\lambda_{1T} = \infty$ are satisfied, then

$$H_T^{1/2}(\hat\beta_T)(\hat\beta_T - \beta_0) \xrightarrow{d} N(0, I),\qquad T \to \infty. \qquad (7.153)$$
Proof. In view of the representation (7.144), in which the first two factors in square brackets satisfy Eq. (7.148), we have to show that

$$(Z'Z)^{-1/2}Z'u \xrightarrow{d} N(0, I),\qquad T \to \infty. \qquad (7.154)$$

By the Cramér–Wold theorem (Theorem 3.1.53), Eq. (7.154) follows if we prove

$$a'(Z'Z)^{-1/2}Z'u \xrightarrow{d} N(0, a'a),\qquad T \to \infty, \qquad (7.155)$$

for any $a \in \mathbb R^K$, $a \ne 0$. Denote

$$X_{Tt} = \frac{1}{\|a\|}\,a'(Z'Z)^{-1/2}z_t'\,u_t,\qquad S_T = \sum_{t=1}^T X_{Tt} = \frac{1}{\|a\|}\,a'(Z'Z)^{-1/2}Z'u.$$

The $X_{Tt}$ are independent and by Eq. (7.134)

$$EX_{Tt} = \frac{1}{\|a\|}\,a'(Z'Z)^{-1/2}z_t'\,Eu_t = 0,$$

$$ES_T^2 = \frac{1}{\|a\|^2}\,E\{a'(Z'Z)^{-1/2}Z'u\,[a'(Z'Z)^{-1/2}Z'u]'\} = \frac{1}{\|a\|^2}\,a'(Z'Z)^{-1/2}Z'(Euu')Z(Z'Z)^{-1/2}a = 1.$$

By the Lindeberg theorem (Davidson, 1994, p. 369), to prove Eq. (7.155) it is enough to prove that $L_T(\varepsilon) \to 0$ as $T \to \infty$, for all $\varepsilon > 0$, where

$$L_T(\varepsilon) = \sum_{t=1}^T EX_{Tt}^2\,1(|X_{Tt}| > \varepsilon)$$
is the Lindeberg function. Denoting $\tilde\gamma_{Tt} = [a'(Z'Z)^{-1/2}z_t'/\|a\|]^2$, we have

$$\tilde\gamma_{Tt} \le \left\|\frac{a}{\|a\|}\right\|^2\|(Z'Z)^{-1/2}z_t'\|^2 = \gamma_{Tt},\qquad X_{Tt}^2 = \tilde\gamma_{Tt}\,u_t^2 \qquad (7.156)$$

and

$$L_T(\varepsilon) = \sum_{t=1}^T \tilde\gamma_{Tt}\,Eu_t^2\,1(u_t^2 > \varepsilon^2/\tilde\gamma_{Tt}). \qquad (7.157)$$

Equation (7.142) yields

$$Eu_t^2\,1(u_t^2 > c) \le [1 - F(X_t)]\,1(|X_t| > \ln c) + F(X_t)\,1(|X_t| > \ln c) = 1(|x_t'\beta_0| > \ln c). \qquad (7.158)$$

Now Eqs. (7.156)–(7.158) and Assumption BL7 lead to

$$L_T(\varepsilon) \le \sum_{t=1}^T \gamma_{Tt}\,Eu_t^2\,1(u_t^2 > \varepsilon^2/\gamma_{Tt}) \le \sum_{t=1}^T \gamma_{Tt}\,1(|x_t'\beta_0| > \ln(\varepsilon^2/\gamma_{Tt})) = \delta_T(\varepsilon) \to 0,\qquad T \to \infty,$$

for any $\varepsilon > 0$. Application of the Lindeberg theorem completes the proof of Eq. (7.155). $\blacksquare$
7.4.22 Corollary

Lemma. (Gouriéroux and Monfort, 1981, Proposition 4) If the explanatory variables satisfy Assumptions BL2 and BL3, the smallest eigenvalue of $H_T(\beta_0)$ goes to infinity and $\max_{1\le t\le T}\gamma_{Tt} \to 0$, then Eq. (7.153) is true.

Proof. Under Assumption BL2, $r_T \asymp 1$. By Corollary 7.4.14, $\hat\beta_T$ converges to $\beta_0$ a.s., so Assumption BL5 is satisfied. Assumption BL6 is a part of BL2. Thus, to apply Theorem 7.4.21 we have to verify Assumption BL7. Take any $\varepsilon > 0$ and choose $d > 0$ so small that $\log(\varepsilon^2/d) > M\|\beta_0\|$, where $M = \sup_t\|x_t\|$. Let $T(d)$ be so large that $\max_t\gamma_{Tt} \le d$ for $T \ge T(d)$. Then

$$|x_t'\beta_0| \le M\|\beta_0\| < \log\frac{\varepsilon^2}{d} \le \log\frac{\varepsilon^2}{\gamma_{Tt}},$$

so that $\delta_T(\varepsilon) = 0$ for $T \ge T(d)$. $\blacksquare$
7.4.23 Example

Here is an example that satisfies the conditions of Theorem 7.4.21 but not those of Corollary 7.4.22.
EXAMPLE.
For $K$, $\beta_0$ and $x_t$ from Example 7.4.15 one has

$$\gamma_{Tt} \asymp \frac{\ln^2 t}{t\ln^3 T},\qquad \delta_T(\varepsilon) \to 0 \text{ for all } \varepsilon > 0 \qquad (7.159)$$

and asymptotic normality (7.153) is true.

Proof. From Example 7.4.15 we know that Assumption BL4 is satisfied and condition (7.136) holds. Therefore by Theorem 7.4.13, Assumption BL5 is satisfied (BL6 is trivial). To apply Theorem 7.4.21, it remains to check Assumption BL7. The expression for $\gamma_{Tt}$ in Eq. (7.159) follows from Eq. (7.141). Since $x_t\beta_0 = \ln t$, the condition $|x_t'\beta_0| > \log(\varepsilon^2/\gamma_{Tt})$ is equivalent to $t\gamma_{Tt} > \varepsilon^2$, and $\delta_T(\varepsilon)$ can only increase if this set is enlarged using the bound $\gamma_{Tt} \le c_2\ln^2 t/(t\ln^3 T)$. Hence,

$$\delta_T(\varepsilon) \le c_1 \sum_{t:\ \ln^2 t \ge (\varepsilon^2/c_2)\ln^3 T} \frac{\ln^2 t}{t\ln^3 T}.$$

Here the summation can be restricted to $t \ge t_0$ because the first several observations don't matter. Since $\ln^2 t \le \ln^2 T = o(\varepsilon^2\ln^3 T)$ for $t \le T$, the summation set is empty for all large $T$, so $\delta_T(\varepsilon) = 0$. $\blacksquare$
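The quantity $\delta_T(\varepsilon)$ can also be computed exactly for moderate $T$ in this example. The sketch below is ours (the grid of $T$ and $\varepsilon$ values is illustrative); it shows $\delta_T(\varepsilon)$ shrinking as $T$ grows and vanishing outright for larger $\varepsilon$.

```python
import math

def logistic(s):
    return 1.0 / (1.0 + math.exp(-s))

def delta_T(T, eps, beta0=1.0):
    """delta_T(eps) of Section 7.4.21 for x_t = ln t, t = 2, ..., T + 1."""
    xs = [math.log(t) for t in range(2, T + 2)]
    fs = [logistic(x * beta0) * (1.0 - logistic(x * beta0)) for x in xs]
    H = sum(f * x * x for f, x in zip(fs, xs))      # scalar H_T(beta_0)
    total = 0.0
    for x, f in zip(xs, fs):
        g = x * x * f / H                           # gamma_Tt
        if abs(x * beta0) > math.log(eps * eps / g):
            total += g
    return total

d1 = delta_T(1_000, 0.5)
d2 = delta_T(100_000, 0.5)      # smaller: delta_T decreases in T
```

Convergence is slow (logarithmic), which is why $T$ must grow a lot before $\delta_T(0.5)$ drops visibly.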
CHAPTER 8

TOOLS FOR VECTOR AUTOREGRESSIONS

IN THIS chapter we find some technical results that proved useful in the study of VARs. The first part (Sections 8.1–8.4) is from my work. It describes algebraic properties of L_p-approximable sequences (e.g., how they can be added and multiplied). The second part (Section 8.5) is devoted to a different notion of deterministic trends that originated in Johansen (2000) and was developed further in Nielsen (2005). Unlike simpler approaches (e.g., polynomial and logarithmic trends), it allows us to consider periodic (seasonal) trends. The Nielsen paper contains very deep results on strong consistency of the OLS estimator for VARs with deterministic trends. My initial intention was to cover all of Nielsen's results, but then I realized that they would require another book.
8.1 L_p-APPROXIMABLE SEQUENCES OF MATRIX-VALUED FUNCTIONS

In this section some results from Chapter 2 concerning sequences of vectors are generalized to sequences of matrices, so as to satisfy the needs of the theory of VARs.
8.1.1 Matrix-Valued Functions

By replacing real numbers $x_1, \dots, x_n$ with matrices $X_1, \dots, X_n$ in the vector $x = (x_1, \dots, x_n)$ we arrive at the following definitions. Let $M_p$ denote the set of all matrices (of different sizes) equipped with the norm $\|\cdot\|_p$. Denote $t_n = \{1, \dots, n\}$. $l_p(t_n, M_p)$ stands for the set of matrix-valued functions $X\colon t_n \to M_p$ provided with the norm

$$\|X;\, l_p(t_n, M_p)\| = \begin{cases} \left(\sum_{t=1}^n \|X_t\|_p^p\right)^{1/p}, & p < \infty,\\[4pt] \max_{t=1,\dots,n}\|X_t\|_\infty, & p = \infty.\end{cases}$$
Short-Memory Linear Processes and Econometric Applications. Kairat T. Mynbaev. © 2011 John Wiley & Sons, Inc. Published 2011 by John Wiley & Sons, Inc.
The size s(A) of a matrix A is defined as the product of its dimensions. We consider only functions X with values Xt of the same size s(X) (which may vary with X). Usual matrix algebra conventions are followed: all vectors are column vectors and all matrices in the same formula are assumed to be compatible.
8.1.2 Definition of L_p-Approximability

Let $X \in l_p(t_n, M_p)$ be a matrix-valued function. $X_t^{ij}$ denotes the $(i,j)$th element of $X_t$. For a fixed pair $(i, j)$, the vector $T_{ij}(X) = (X_1^{ij}, \dots, X_n^{ij})'$ is called an $(i, j)$-thread of $X$, or just a thread when the pair $(i, j)$ is clear from the context. Consider a sequence $\{X_n\colon n \in \mathbb N\}$ of matrix-valued functions of equal sizes $s(X_1) = s(X_2) = \cdots$. We say that $\{X_n\}$ is $L_p$-approximable if for each $(i, j)$ the sequence of $(i, j)$-threads $\{T_{ij}(X_n)\colon n \in \mathbb N\}$ is $L_p$-approximable in the sense of Section 2.5.1. This means existence of functions $X_{ij}^c \in L_p$ satisfying $\|T_{ij}(X_n) - \delta_{np}X_{ij}^c\|_p \to 0$ for any $(i, j)$. Applying $\delta_{np}$ element-wise to the matrix $X^c$ composed of the $X_{ij}^c$, we equivalently write

$$\|X_n - \delta_{np}X^c;\, l_p(t_n, M_p)\| \to 0,\qquad n \to \infty.$$

When this is true, we say that $\{X_n\}$ is $L_p$-close to $X^c$.
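To make the definition concrete, here is a sketch of the scalar discretization operator $(\delta_{np}h)_t = n^{1-1/p}\int_{i_t}h(x)\,dx$ (integrals by a midpoint rule) together with a sequence that is $L_p$-close to $h(x) = x^2$. The particular $h$, the exponent $p$, and the error-decay check are our illustrative choices.

```python
import math

def delta_np(h, n, p, m=64):
    """(delta_np h)_t = n^(1 - 1/p) * integral of h over i_t = ((t-1)/n, t/n),
    each integral computed by a midpoint rule with m subpoints."""
    out = []
    for t in range(1, n + 1):
        a = (t - 1) / n
        integral = sum(h(a + (k + 0.5) / (m * n)) for k in range(m)) / (m * n)
        out.append(n ** (1.0 - 1.0 / p) * integral)
    return out

def lp_norm(v, p):
    return sum(abs(c) ** p for c in v) ** (1.0 / p)

h = lambda x: x * x            # a smooth candidate for X^c
p = 2.0
errs = []
for n in (10, 100, 1000):
    Xn = [n ** (-1.0 / p) * h(t / n) for t in range(1, n + 1)]   # candidate sequence
    errs.append(lp_norm([a - b for a, b in zip(Xn, delta_np(h, n, p))], p))
```

The l_p distance shrinks roughly like $1/n$ here, so $\{X_n\}$ is $L_2$-close to $h$ in the sense just defined.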
8.1.3 Convergence of Trilinear Forms

Here we generalize Lemma 2.2.2 to the case of three factors. We write $Z^c \in C[0,1]$ if all components of the matrix $Z^c$ belong to the space $C[0,1]$. The notation $X^c \in L_p$ has a similar meaning.

Theorem. Let $1 < p < \infty$. Consider sequences of matrix-valued functions $\{X_n\}$, $\{Y_n\}$, $\{Z_n\}$ such that $X_n$, $Y_n$, $Z_n$ are defined on $t_n$ for all $n \in \mathbb N$. Suppose $\{X_n\}$ is $L_p$-close to $X^c \in L_p$, $\{Y_n\}$ is $L_q$-close to $Y^c \in L_q$ and $\{Z_n\}$ is $L_\infty$-close to $Z^c \in C[0,1]$. Then

$$\lim_{n\to\infty}\sum_{t=1}^n X_{nt}Y_{nt}Z_{nt} = \int_0^1 X^c(x)Y^c(x)Z^c(x)\,dx. \qquad (8.1)$$

Proof. Step 1. Let $X^c$, $Y^c$, $Z^c$ be scalar functions. Generalizing Eq. (2.6) we get

$$\int_0^1 (P_nX^c)(P_nY^c)(P_nZ^c)\,dx = \sum_{t=1}^n (\delta_{np}X^c)_t(\delta_{nq}Y^c)_t(\delta_{n\infty}Z^c)_t. \qquad (8.2)$$
Further,

$$\left|\int_0^1 (P_nX^c)(P_nY^c)(P_nZ^c)\,dx - \int_0^1 X^cY^cZ^c\,dx\right|$$
$$= \left|\int_0^1 (P_nX^c - X^c)(P_nY^c)(P_nZ^c)\,dx + \int_0^1 X^c(P_nY^c - Y^c)(P_nZ^c)\,dx + \int_0^1 X^cY^c(P_nZ^c - Z^c)\,dx\right|$$
$$\le \|P_nX^c - X^c\|_p\|P_nY^c\|_q\|P_nZ^c\|_\infty + \|X^c\|_p\|P_nY^c - Y^c\|_q\|P_nZ^c\|_\infty + \|X^c\|_p\|Y^c\|_q\|P_nZ^c - Z^c\|_\infty. \qquad (8.3)$$

In the case of $X^c$ and $Y^c$ we can apply boundedness of Haar projectors (Lemma 2.1.7) and their convergence to the identity operator in $L_p$, $p < \infty$ (Lemma 2.2.1). For $Z^c$ we use uniform continuity:

$$\|P_nZ^c - Z^c\|_\infty = \max_{1\le t\le n}\max_{y\in i_t}\left|n\int_{i_t} Z^c(x)\,dx - Z^c(y)\right| = \max_{1\le t\le n}\max_{y\in i_t}\left|n\int_{i_t}(Z^c(x) - Z^c(y))\,dx\right| \le \sup_{|x-y|\le 1/n}|Z^c(x) - Z^c(y)| \to 0,\qquad n \to \infty.$$

Now Eqs. (8.2) and (8.3) lead to

$$\lim_{n\to\infty}\sum_{t=1}^n(\delta_{np}X^c)_t(\delta_{nq}Y^c)_t(\delta_{n\infty}Z^c)_t = \int_0^1 X^c(x)Y^c(x)Z^c(x)\,dx.$$

The limit on the left equals $\lim_{n\to\infty}\sum_{t=1}^n X_{nt}Y_{nt}Z_{nt}$ because

$$\left|\sum_{t=1}^n X_{nt}Y_{nt}Z_{nt} - \sum_{t=1}^n(\delta_{np}X^c)_t(\delta_{nq}Y^c)_t(\delta_{n\infty}Z^c)_t\right|$$
$$\le \|X_n - \delta_{np}X^c\|_p\|Y_n\|_q\|Z_n\|_\infty + \|\delta_{np}X^c\|_p\|Y_n - \delta_{nq}Y^c\|_q\|Z_n\|_\infty + \|\delta_{np}X^c\|_p\|\delta_{nq}Y^c\|_q\|Z_n - \delta_{n\infty}Z^c\|_\infty \to 0.$$
Here we have used the $L_p$-approximability assumption of the theorem and boundedness of the discretization operator (Lemma 2.1.3). The statement in the case under consideration has been proved.

Step 2. In the matrix case the matrix products on the left and right of Eq. (8.1) have elements

$$\sum_{u,v}(X_{nt})_{iu}(Y_{nt})_{uv}(Z_{nt})_{vj}\qquad\text{and}\qquad \sum_{u,v}X_{iu}^c Y_{uv}^c Z_{vj}^c. \qquad (8.4)$$

This means that the matrix version of Eq. (8.1) can be obtained by applying its scalar sibling to triplets of threads [the $(i,u)$-thread of $X_n$, $(u,v)$-thread of $Y_n$ and $(v,j)$-thread of $Z_n$] and summing the resulting equations over all pairs $(u, v)$. $\blacksquare$
8.1.4 Refined Convergence of Trilinear Forms

Theorem. Under the conditions of Theorem 8.1.3,

$$\lim_{n\to\infty}\sum_{t=[na]}^{[nb]}X_{nt}Y_{nt}Z_{nt} = \int_a^b X^c(x)Y^c(x)Z^c(x)\,dx$$

uniformly with respect to the intervals $[a, b] \subseteq [0, 1]$.

Proof. Since the result is not used in this book, only the main steps of the proof are indicated. Similarly to Eq. (2.7) we can obtain

$$\left|\int_0^1 P_nF\,P_nG\,P_nH\,dx - \int_0^1 FGH\,dx\right| \le c\,[\omega_p(F, 1/n)\|G\|_q\|H\|_\infty + \|F\|_p\,\omega_q(G, 1/n)\|H\|_\infty + \|F\|_p\|G\|_q\|P_nH - H\|_\infty].$$

Letting $F = 1_{[a,b]}X^c$, $G = 1_{[a,b]}Y^c$ and $H = Z^c$ and arguing as in the proof of Theorem 2.2.3, it is possible to get

$$\lim_{n\to\infty}\sum_{t=1}^n[\delta_{np}(1_{[a,b]}X^c)]_t\,[\delta_{nq}(1_{[a,b]}Y^c)]_t\,(\delta_{n\infty}Z^c)_t = \int_a^b X^c(x)Y^c(x)Z^c(x)\,dx.$$

The proof in the scalar case is completed by making obvious changes in the proof of Theorem 2.5.3. The generalization to the matrix case is straightforward. $\blacksquare$
8.2 T-OPERATOR AND TRINITY

8.2.1 T-Operator Definition (Matrix Case)

The definition of the convolution operator from Section 2.3.1 needs to be modified to take into account the possibility of pre- and postmultiplication. $l_p(\mathbb Z, M_p)$ is obtained from $l_p(t_n, M_p)$ by way of replacing $t_n$ with $\mathbb Z$. Let $\{c_s^l\colon s \in \mathbb Z\}$ and $\{c_s^r\colon s \in \mathbb Z\}$ be two sequences of matrices intended for multiplication from the left and right, respectively. The T-operator ${}^lT_n^r\colon l_p(t_n, M_p) \to l_p(\mathbb Z, M_p)$ is defined by

$$({}^lT_n^r X)_j = \sum_{t=1}^n c_{t-j}^l X_t c_{t-j}^r,\qquad j \in \mathbb Z.$$

8.2.2 The Adjoint of ${}^lT_n^r$

Define $({}^lT_n^r)^*$ by

$$[({}^lT_n^r)^* X]_j = \sum_{t=1}^n c_{j-t}^r X_t c_{j-t}^l,\qquad j \in \mathbb Z.$$
The formula $\langle X, Y\rangle = \mathrm{tr}\sum_{j=1}^n X_jY_j$ defines a bilinear form with the argument $(X, Y)$ from the product $l_p(t_n, M_p)\times l_q(t_n, M_q)$ because

$$\left|\mathrm{tr}\sum_{j=1}^n X_jY_j\right| = \left|\sum_i\sum_{j=1}^n (X_jY_j)_{ii}\right| = \left|\sum_i\sum_{j=1}^n\sum_u (X_j)_{iu}(Y_j)_{ui}\right| \le \sum_{j=1}^n \|X_j\|_p\|Y_j\|_q \le \|X;\, l_p(t_n, M_p)\|\,\|Y;\, l_q(t_n, M_q)\|.$$

Lemma. The operator $({}^lT_n^r)^*$ is the adjoint of ${}^lT_n^r$ in the sense that

$$\langle {}^lT_n^r X, Y\rangle = \langle X, ({}^lT_n^r)^* Y\rangle.$$

Proof. Whenever the products $AB$ and $BA$ are square, the equation

$$\mathrm{tr}\,AB = \mathrm{tr}\,BA \qquad (8.5)$$

is true (see Lütkepohl, 1991, Section A.7). Change the summation order:

$$\mathrm{tr}\sum_{j=1}^n({}^lT_n^rX)_jY_j = \mathrm{tr}\sum_{j=1}^n\sum_{t=1}^n c_{t-j}^lX_t c_{t-j}^rY_j = \mathrm{tr}\sum_{t=1}^n X_t\left(\sum_{j=1}^n c_{t-j}^rY_j c_{t-j}^l\right) = \mathrm{tr}\sum_{t=1}^n X_t\,[({}^lT_n^r)^*Y]_t. \qquad\blacksquare$$
8.2.3 Boundedness of ${}^lT_n^r$

Denote

$$\alpha_c = \sum_{s\in\mathbb Z}\|c_s^l\|_\infty\|c_s^r\|_\infty.$$

Lemma. If $\alpha_c < \infty$, then

$$\|{}^lT_n^rX;\, l_p(\mathbb Z, M_p)\| \le c\,\alpha_c\,\|X;\, l_p(t_n, M_p)\|,$$

where $c$ depends only on the sizes of the matrices involved.

Proof. Denoting $x_t = \|X_t\|_p$, $x = (x_1, \dots, x_n)$, $\bar c_s = \|c_s^l\|_\infty\|c_s^r\|_\infty$, we have

$$\|({}^lT_n^rX)_j\|_p \le c\sum_{t=1}^n \|c_{t-j}^l\|_\infty\|X_t\|_p\|c_{t-j}^r\|_\infty = c\sum_{t=1}^n x_t\bar c_{t-j} = c\,(T_nx)_j.$$

Here $c$ depends only on the sizes of the matrices involved. Hence, by Lemma 2.3.2,

$$\left(\sum_{j\in\mathbb Z}\|({}^lT_n^rX)_j\|_p^p\right)^{1/p} \le c\left(\sum_{j\in\mathbb Z}|(T_nx)_j|^p\right)^{1/p} = c\,\|T_nx\|_{l_p(\mathbb Z)} \le c\,\alpha_c\,\|x\|_p = c\,\alpha_c\left(\sum_{t=1}^n\|X_t\|_p^p\right)^{1/p}. \qquad\blacksquare$$
8.2.4 The Trinity and L_p-Approximable Sequences (Matrix Case)

Here, to conserve notation, we omit the superscripts $l$ and $r$ and denote

$$(T_n^+X)_j = (T_nX)_j,\ j > n;\qquad (T_n^0X)_j = (T_nX)_j,\ 1\le j\le n;\qquad (T_n^-X)_j = (T_nX)_j,\ j < 1;$$
$$b_cX = \sum_{s\in\mathbb Z}c_s^l X c_s^r.$$

Besides, the norms in $l_p(t_n, M_p)$, $l_p(\{j < 1\}, M_p)$ and $l_p(\{j > n\}, M_p)$ are denoted simply $\|\cdot\|_p$.

Theorem. If $\alpha_c < \infty$, $p < \infty$ and $\{X_n\}$ is $L_p$-close to $X^c \in L_p$, then

$$\lim_{n\to\infty}\max\{\|T_n^0X_n - b_cX_n\|_p,\ \|T_n^-X_n\|_p,\ \|T_n^+X_n\|_p\} = 0. \qquad (8.6)$$

Moreover, $\{T_n^0X_n\}$ is $L_p$-close to $b_cX^c \in L_p$.
Proof. From the generic expression (8.4) for the $(i,j)$th element of a product of three matrices we see that the $t$th coordinate of the $(i,j)$-thread of $T_n^0X_n - b_cX_n$ equals

$$\sum_{s=1}^n\sum_{u,v}(c_{s-t}^l)_{iu}(X_{ns})_{uv}(c_{s-t}^r)_{vj} - \sum_{s\in\mathbb Z}\sum_{u,v}(c_s^l)_{iu}(X_{nt})_{uv}(c_s^r)_{vj} = \sum_{u,v}\left[\sum_{s=1}^n(c_{s-t}^l)_{iu}(c_{s-t}^r)_{vj}(X_{ns})_{uv} - \sum_{s\in\mathbb Z}(c_s^l)_{iu}(c_s^r)_{vj}(X_{nt})_{uv}\right].$$

The expression in the brackets here is of the type considered in Section 2.5.4 with

$$\bar c_t = (c_t^l)_{iu}(c_t^r)_{vj},\qquad b_c = \sum_{s\in\mathbb Z}\bar c_s.$$

Since the summation over $(u, v)$ is finite, the $L_p$-norm of the $(i,j)$-thread of $T_n^0X_n - b_cX_n$ tends to zero. As the number of threads is finite, we have $\|T_n^0X_n - b_cX_n\|_p \to 0$. The other two relations in Eq. (8.6) are proved similarly. The final statement of the theorem follows from $L_p$-approximability of $\{X_n\}$:

$$\|T_n^0X_n - \delta_{np}b_cX^c\|_p \le \|T_n^0X_n - b_cX_n\|_p + \|b_c(X_n - \delta_{np}X^c)\|_p \to 0. \qquad\blacksquare$$
8.2.5 Shift Operators

The backward shift (or lag) operator $L^-$ and forward shift operator $L^+$ are defined by

$$(L^-X)_t = X_{t-1},\ 2\le t\le n,\quad (L^-X)_1 = 0;\qquad (L^+X)_t = X_{t+1},\ 1\le t\le n-1,\quad (L^+X)_n = 0,$$

where $X \in l_p(t_n, M_p)$.

Lemma. If $\{X_n\}$ is $L_p$-close to $X^c \in L_p$ and $p < \infty$, then $\{L^\pm X_n\}$ are $L_p$-close to $X^c$.

Proof. The lag operator obtains from ${}^lT_n^r$ if we choose $c_{-1}^l = I$, $c_s^l = 0$ for $s \ne -1$, and $c_s^r = I$ for all $s$. A similar choice works for $L^+$. Thus the statement follows from the previous theorem. $\blacksquare$
8.3 MATRIX OPERATIONS AND L_p-APPROXIMABILITY

8.3.1 Transposition and Summation of L_p-Approximable Sequences

By definition, the transposed matrix-valued function $X'$ has values $X_t'$, $t = 1, \dots, n$. A sum of two functions $X$, $Y$ is defined as the function with values $X_t + Y_t$.

Theorem. Let $\{X_n\}$ be $L_p$-close to $X^c \in L_p$. Then

(i) $\{X_n'\}$ is $L_p$-close to $(X^c)'$;

(ii) if $\{Y_n\}$ is $L_p$-close to $Y^c \in L_p$, then $\{X_n + Y_n\}$ is $L_p$-close to $X^c + Y^c$.

Proof. Statement (i) is obvious because transposition does not change $\|\cdot\|_p$-norms of matrices: $\|X_n' - \delta_{np}(X^c)'\|_p = \|X_n - \delta_{np}X^c\|_p \to 0$; (ii) follows from the triangle inequality:

$$\|(X_n + Y_n) - \delta_{np}(X^c + Y^c)\|_p \le \|X_n - \delta_{np}X^c\|_p + \|Y_n - \delta_{np}Y^c\|_p \to 0. \qquad\blacksquare$$
8.3.2 Multiplication of L_p-Approximable Sequences

A product of two functions $X$, $Y$ is defined as the function with values $X_tY_t$. The argument here is similar to that in Section 7.2.6.

Theorem. If $\{X_n\}$ is $L_p$-close to $X^c \in L_p$ and $\{Y_n\}$ is $L_\infty$-close to $Y^c \in C[0, 1]$, then $\{X_nY_n\}$ is $L_p$-close to $X^cY^c$.

Proof. Step 1. Consider the scalar case. The fact that $\{Y_n\}$ is $L_\infty$-close to $Y^c$ means that for any $\varepsilon > 0$ there is $n_0 > 0$ such that for $n > n_0$

$$\max_t|Y_{nt} - (\delta_{n\infty}Y^c)_t| < \varepsilon.$$

Since $Y^c$ is uniformly continuous, we can also assert that

$$\max_t\max_{x,y\in i_t}|Y^c(x) - Y^c(y)| < \varepsilon,$$
increasing, if necessary, $n_0$. Therefore for all $t$

$$\left|Y_{nt} - Y^c\!\left(\frac tn\right)\right| \le |Y_{nt} - (\delta_{n\infty}Y^c)_t| + \left|n\int_{i_t}\left(Y^c(x) - Y^c\!\left(\frac tn\right)\right)dx\right| \le 2\varepsilon. \qquad (8.7)$$

It follows that

$$|X_{nt}Y_{nt} - (\delta_{np}X^cY^c)_t| \le \left|X_{nt}\left[Y_{nt} - Y^c\!\left(\frac tn\right)\right]\right| + \left|[X_{nt} - (\delta_{np}X^c)_t]\,Y^c\!\left(\frac tn\right)\right| + \left|\left\{\delta_{np}\!\left[X^c(\cdot)\left(Y^c(\cdot) - Y^c\!\left(\frac tn\right)\right)\right]\right\}_t\right|$$
$$\le 2\varepsilon|X_{nt}| + \max_{x\in[0,1]}|Y^c(x)|\,|X_{nt} - (\delta_{np}X^c)_t| + \varepsilon\,(\delta_{np}|X^c|)_t.$$

Taking $l_p$-norms on both sides we get

$$\|X_nY_n - \delta_{np}X^cY^c\|_p \le 2\varepsilon\|X_n\|_p + \max_{x\in[0,1]}|Y^c(x)|\,\|X_n - \delta_{np}X^c\|_p + \varepsilon\,\|\delta_{np}|X^c|\|_p.$$

Recalling that $\delta_{np}$ is bounded (Section 2.1.3) and $\|X_n - \delta_{np}X^c\|_p \to 0$, we see that the right-hand side here can be made arbitrarily small by increasing $n$.

Step 2. In the matrix case we note that the $(i,j)$-thread of $X_nY_n - \delta_{np}X^cY^c$ has the $t$th component equal to

$$(X_nY_n)_t^{ij} - (\delta_{np}X^cY^c)_t^{ij} = \sum_u\{(X_{nt})_{iu}(Y_{nt})_{uj} - [\delta_{np}(X^c)_{iu}(Y^c)_{uj}]_t\}.$$

Here $\{(X_{nt})_{iu}\colon t \in t_n\}$ is $L_p$-close to $(X^c)_{iu}$ and $\{(Y_{nt})_{uj}\colon t \in t_n\}$ is $L_\infty$-close to $(Y^c)_{uj}$, so the expression in the curly brackets tends to 0 in $l_p$-norm. Since the summation over $u$ is finite, the theorem follows. $\blacksquare$
8.3.3 Functions with Constant Matrix Values

We say that a matrix-valued function $Y\colon t_n \to M_p$ is constant if $Y_t = A$, $t = 1, \dots, n$.
Theorem

(i) If $\{X_n\}$ is $L_\infty$-close to $X^c \in L_\infty$, then $\{n^{-1/p}X_n\}$ is $L_p$-close to $X^c$.

(ii) If $\{X_n\}$ is $L_p$-close to $X^c \in L_p$ and $\{Y_n\}$ is a sequence of constant functions with values $A_n \to A$, then $\{X_n + n^{-1/p}Y_n\}$ is $L_p$-close to $X^c + A$ and $\{X_nY_n\}$ is $L_p$-close to $X^cA$.

Proof. (i) Since

$$\|n^{-1/p}X_n - \delta_{np}X^c\|_p = \left(\sum_{i,j}\sum_{t=1}^n|n^{-1/p}(X_{nt})_{ij} - (\delta_{np}X^c)_t^{ij}|^p\right)^{1/p},$$

it suffices to prove the statement for threads. Obviously,

$$\sum_{t=1}^n|n^{-1/p}(X_{nt})_{ij} - (\delta_{np}X^c)_t^{ij}|^p = \sum_{t=1}^n\frac1n\left|(X_{nt})_{ij} - n\int_{i_t}X^c(x)\,dx\right|^p \le \max_t\left|(X_{nt})_{ij} - n\int_{i_t}X^c(x)\,dx\right|^p \le \|X_n - \delta_{n\infty}X^c;\, l_\infty(t_n, M_\infty)\|^p \to 0.$$

(ii) This statement follows from part (i) of this theorem and Theorems 8.3.1(ii) and 8.3.2. $\blacksquare$
8.4 RESOLVENTS

8.4.1 Definition of Resolvents

As time series autoregressions are difference equations, resolvents of difference equations should play a special role in the theory of autoregressions. Here we look at examples of three resolvents. With a square matrix $B$ we can associate an operator

$$(l_BX)_t = \begin{cases}\sum_{s=1}^{t-1}B^{\,t-1-s}X_s, & 2\le t\le n,\\[2pt] 0, & t = 1,\end{cases}\qquad X \in l_p(t_n, M_p).$$

It is easy to check that $l_BX$ satisfies the difference equation

$$(l_BX)_t - B(l_BX)_{t-1} = X_{t-1},\qquad 2\le t\le n.$$
In the definition of $l_B$ the values of $X$ are premultiplied by powers of $B$, and therefore $l_B$ can be termed a left resolvent. Instead of premultiplying by powers of $B$ and/or summing over initial values $s = 1, \dots, t-1$, we can postmultiply by powers of $B'$ and/or sum over terminal values $s = t+1, \dots, n$, as in

$$(r_BX)_t = \begin{cases}\sum_{s=t+1}^n X_sB'^{\,s-t-1}, & 1\le t\le n-1,\\[2pt] 0, & t = n,\end{cases}\qquad X \in l_p(t_n, M_p).$$

Now the difference equation is

$$(r_BX)_{t-1} - (r_BX)_tB' = X_t,\qquad 2\le t\le n,$$

and $r_B$ is called a right resolvent. Since properties of operators obtained by different combinations of pre- and/or postmultiplication and summation sets are similar, it is enough to study one example of each type. Besides, a statement on boundedness of a resolvent generates a statement on boundedness of its adjoint (Section 8.2.2). This point is not elaborated here. The enveloping resolvent is defined by

$$(e_BX)_t = \begin{cases}\sum_{s=1}^{t-1}B^{\,t-1-s}X_sB'^{\,t-1-s}, & 2\le t\le n,\\[2pt] 0, & t = 1,\end{cases}\qquad X \in l_p(t_n, M_p).$$

It satisfies the equation

$$(e_BX)_t - B(e_BX)_{t-1}B' = X_{t-1},\qquad 2\le t\le n.$$
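The defining property of the left resolvent is easy to verify numerically. The sketch below is ours (the matrix $B$, the random $X$, and the series length are illustrative choices); it computes $l_BX$ from the sum defining it and then checks the difference equation $(l_BX)_t - B(l_BX)_{t-1} = X_{t-1}$.

```python
import numpy as np

def left_resolvent(B, X):
    """(l_B X)_t = sum_{s=1}^{t-1} B^(t-1-s) X_s, with (l_B X)_1 = 0."""
    out = [np.zeros_like(X[0])]
    for t in range(2, len(X) + 1):
        acc = np.zeros_like(X[0])
        for s in range(1, t):
            acc = acc + np.linalg.matrix_power(B, t - 1 - s) @ X[s - 1]
        out.append(acc)
    return out

rng = np.random.default_rng(1)
B = np.array([[0.5, 0.2],
              [0.0, 0.3]])                 # eigenvalues inside the unit circle
X = [rng.normal(size=(2, 2)) for _ in range(6)]
Y = left_resolvent(B, X)                   # Y[t-1] is (l_B X)_t in 1-based notation
```

The right and enveloping resolvents can be checked in exactly the same way by post-multiplying with powers of `B.T`.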
8.4.2 Convergence of Resolvents

Theorem. Suppose that all eigenvalues of $B$ lie inside the unit circle $|\lambda| < 1$ and that $\{X_n\}$ is $L_p$-close to $X^c \in L_p$, $p < \infty$. Then

(i) $\{l_BX_n\}$ is $L_p$-close to $(I - B)^{-1}X^c$, $\{r_BX_n\}$ is $L_p$-close to $X^c(I - B')^{-1}$, and $\{e_BX_n\}$ is $L_p$-close to

$$\sum_{s=0}^\infty B^sX^cB'^{\,s}. \qquad (8.8)$$

(ii) If $\{Y_n\}$ is $L_q$-close to $Y^c \in L_q$, $q < \infty$, $1/p + 1/q = 1$, then

$$\lim_{n\to\infty}\sum_{t=1}^n Y_{nt}(e_BX_n)_t = \int_0^1 Y^c(x)\sum_{s=0}^\infty B^sX^c(x)B'^{\,s}\,dx. \qquad (8.9)$$
Proof. (i) Comparison of the definitions from Sections 8.2.1 and 8.2.4 shows that the resolvents can be obtained from $T_n^0$ with the following choices of the matrices $c_s^l$ and $c_s^r$:

(a) choice for $l_B$: $c_s^l = \begin{cases}0, & s\ge 0,\\ B^{-s-1}, & s<0,\end{cases}$  $c_s^r = I$ for all $s$;

(b) choice for $r_B$: $c_s^l = I$ for all $s$,  $c_s^r = \begin{cases}0, & s\le 0,\\ B'^{\,s-1}, & s>0;\end{cases}$

(c) choice for $e_B$: $c_s^l = \begin{cases}0, & s\ge 0,\\ B^{-s-1}, & s<0,\end{cases}$  $c_s^r = \begin{cases}0, & s\ge 0,\\ B'^{-s-1}, & s<0.\end{cases}$

By Eq. (6.114) the assumption about the spectrum of $B$ ensures convergence of all series involving $B$, including $\alpha_c$. By Theorem 8.2.4:

(a) $\{l_BX_n\}$ is $L_p$-close to $b_cX^c = \sum_{s=-\infty}^{-1}B^{-s-1}X^c = (I - B)^{-1}X^c$,

(b) $\{r_BX_n\}$ is $L_p$-close to $b_cX^c = \sum_{s=1}^{\infty}X^cB'^{\,s-1} = X^c(I - B')^{-1}$,

(c) $\{e_BX_n\}$ is $L_p$-close to $b_cX^c = \sum_{s=0}^\infty B^sX^cB'^{\,s}$.

(ii) Equation (8.8) and Theorem 8.1.3 imply Eq. (8.9). $\blacksquare$
8.5 CONVERGENCE AND BOUNDS FOR DETERMINISTIC TRENDS

8.5.1 Definition and Examples

Suppose a sequence of deterministic vectors $\{d_t\colon t = 0, 1, \dots\} \subseteq \mathbb R^p$ satisfies the recurrent equation

$$d_t = Dd_{t-1},\qquad t = 1, 2, \dots, \qquad (8.10)$$

where $D$ is a $p\times p$ matrix. If $D$ has all its eigenvalues on the unit circle and the vectors $d_1, \dots, d_p$ are linearly independent,

$$|\lambda_j(D)| = 1,\ j = 1, \dots, p;\qquad \mathrm{rank}(d_1, \dots, d_p) = p, \qquad (8.11)$$

then $\{d_t\}$ is called a deterministic trend. Obviously, Eq. (8.10) is equivalent to

$$d_t = D^td_0,\qquad t = 1, 2, \dots \qquad (8.12)$$

EXAMPLE 8.1. When $p = 1$, assuming that $D = e^{i\varphi}$, from the Euler formula and Eq. (8.12) we get $d_t = e^{it\varphi}d_0 = (\cos t\varphi + i\sin t\varphi)\,d_0$. This shows that in the 1-D case a monomial $d_t = t^k$ is not a deterministic trend, unless $k = 0$.
EXAMPLE 8.2. (Nielsen, 2005, p. 535.) Let

$$D = \begin{pmatrix} 1 & 0 \\ 1 & -1 \end{pmatrix}, \qquad d_0 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$

Then

$$d_1 = \begin{pmatrix} 1 & 0 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad d_2 = \begin{pmatrix} 1 & 0 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$

If the data are biannual, then the first component $d_{t1}$ generates a constant in a regression model and the second component $d_{t2}$ is a dummy for even-numbered years. The eigenvalues of $D$ are $\pm 1$.

EXAMPLE 8.3. In the 2-D case a linear trend is obtained as follows. Put

$$D = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \qquad d_t = \begin{pmatrix} 1 \\ t \end{pmatrix}.$$

Then

$$D d_{t-1} = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ t-1 \end{pmatrix} = \begin{pmatrix} 1 \\ t \end{pmatrix} = d_t; \qquad \mathrm{rank}(d_1, d_2) = \mathrm{rank}\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} = 2 = \mathrm{rank}\, D;$$

$\lambda(D) = 1$ is an eigenvalue of multiplicity two.

EXAMPLE 8.4. (Adapted from Johansen, 2000, p. 744.) A periodic trend (biannual dummy) from Example 8.2 can be combined with a linear trend from Example 8.3. Let

$$s_1(t) = \begin{cases} 1, & t \text{ is odd}, \\ 0, & t \text{ is even}, \end{cases} \qquad s_2(t) = \begin{cases} 1, & t \text{ is even}, \\ 0, & t \text{ is odd}. \end{cases}$$

Then, obviously, $s_1(t+1) = s_2(t) = 1 - s_1(t)$ and with

$$D = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & -1 \end{pmatrix}, \qquad d_t = \begin{pmatrix} 1 \\ t \\ s_1(t) \end{pmatrix},$$
we have

$$D d_{t-1} = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ t-1 \\ s_1(t-1) \end{pmatrix} = \begin{pmatrix} 1 \\ t \\ 1 - s_1(t-1) \end{pmatrix} = \begin{pmatrix} 1 \\ t \\ s_1(t) \end{pmatrix} = d_t;$$

$$\mathrm{rank}(d_1, d_2, d_3) = \mathrm{rank}\begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 0 & 1 \end{pmatrix} = 3 = \mathrm{rank}\, D,$$

and the eigenvalues of $D$ are $1$ and $-1$.
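Examples 8.2–8.4 can be verified mechanically. A minimal check of Example 8.4 (NumPy assumed; `s1` and `d` are our throwaway names for the dummy and the trend):

```python
import numpy as np

# Example 8.4: constant + linear trend + biannual dummy
D = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 0, -1]])
s1 = lambda t: t % 2                      # 1 if t is odd, 0 if t is even
d = lambda t: np.array([1, t, s1(t)])

# the recursion d_t = D d_{t-1} of Eq. (8.10)
for t in range(1, 10):
    assert np.array_equal(D @ d(t - 1), d(t))

# eigenvalues on the unit circle and the rank condition (8.11)
assert np.allclose(np.sort(np.abs(np.linalg.eigvals(D))), [1, 1, 1])
assert np.linalg.matrix_rank(np.column_stack([d(1), d(2), d(3)])) == 3
```

Both conditions of (8.11) hold, so $\{d_t\}$ is a deterministic trend in the sense of Section 8.5.1.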
8.5.2 The Jordan Representation of D

By assumption, $D$ has its eigenvalues on the unit circle. From now on we suppose that these occur at $l$ distinct complex pairs $e^{i\theta_j}$ and $e^{-i\theta_j}$ with $0 \le \theta_j \le \pi$, which, of course, reduce to a single value of $1$ or $-1$ if $\theta_j$ equals $0$ or $\pi$. By a theorem of Herstein (1975, p. 308) there exists a regular, real matrix $P$ that block-diagonalizes $D$ as

$$P D P^{-1} = \mathrm{diag}[D_1, \ldots, D_l], \qquad (8.13)$$

where the $D_j$ are real Jordan matrices of the form

$$D_j = \begin{pmatrix} \Lambda_j & E_j & & 0 \\ 0 & \Lambda_j & \ddots & \\ & & \ddots & E_j \\ 0 & & 0 & \Lambda_j \end{pmatrix} \qquad (8.14)$$

and the pair $(\Lambda_j, E_j)$ is one of the pairs

$$(1, 1), \quad (-1, 1) \quad \text{or} \quad \left( \begin{pmatrix} \cos\theta_j & -\sin\theta_j \\ \sin\theta_j & \cos\theta_j \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right) \quad \text{for } 0 < \theta_j < \pi. \qquad (8.15)$$

The numbers

$$d_j = \dim D_j / \dim \Lambda_j, \qquad d = \max_j d_j \qquad (8.16)$$

are the multiplicity of the eigenvalue $\lambda_j(D)$ and the largest multiplicity of the eigenvalues of $D$, respectively.
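For the matrix $D$ of Example 8.4 the structure (8.13)–(8.16) can be read off numerically without constructing $P$: the eigenvalue $1$ has algebraic multiplicity two but only one eigenvector, so its real Jordan block has $d_j = 2$, while $-1$ contributes $d_j = 1$, hence $d = 2$. A sketch (NumPy assumed; the rank computations below are our substitute for an explicit Jordan decomposition):

```python
import numpy as np

D = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 0, -1]])
I = np.eye(3)

# geometric multiplicity of the eigenvalue 1 is 3 - rank(D - I) = 1,
# although its algebraic multiplicity is 2: the block for 1 is defective
assert np.linalg.matrix_rank(D - I) == 2
assert np.linalg.matrix_rank((D - I) @ (D - I)) == 1

# (x - 1)^2 (x + 1) annihilates D, consistent with
# P D P^{-1} = diag[J, -1], J the 2x2 Jordan block for 1, so d = max d_j = 2
assert np.allclose((D - I) @ (D - I) @ (D + I), 0)
```

The value $d = 2$ matches the growth $\|d_t\| = O(T^{d-1}) = O(T)$ of the trend in Example 8.4 (its second component is $t$).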
8.5.3 Normalization of d_t

The block structure (8.13) induces the block structure of the process, $d_t = (d_{t,1}', \ldots, d_{t,l}')'$, and of the initial vector, $d_0 = (d_{0,1}', \ldots, d_{0,l}')'$. For the $j$th block we have the equation $d_{t,j} = D_j^t d_{0,j}$, $j = 1, \ldots, l$. The partial initial vector $d_{0,j}$ itself consists of $d_j$ blocks that correspond to the diagonal blocks of Eq. (8.14):

$$d_{0,j} = (d_{0,j,1}', \ldots, d_{0,j,d_j}')'. \qquad (8.17)$$

The block $d_{t,j}$ is normalized by

$$N_{T,j} = \mathrm{diag}[(\Lambda_j/T)^{d_j-1}, \ldots, (\Lambda_j/T)^0] \qquad (8.18)$$

and, correspondingly, $d_t$ is normalized by

$$N_T = \mathrm{diag}[N_{T,1}, \ldots, N_{T,l}]. \qquad (8.19)$$

Definitions (8.18) and (8.19) imply

$$\|N_T\| = O(1), \qquad \|N_T^{-1}\| = O(T^{d-1}). \qquad (8.20)$$

Denote by $f(n, u)$ the vector that consists of the first $n$ terms of the Taylor series:

$$f(n, u) = \left( \frac{u^{n-1}}{(n-1)!}, \ldots, \frac{u^0}{0!} \right)'.$$

Lemma. Uniformly in $t = 0, \ldots, T$,

$$N_{T,j} d_{t,j} = (1 + o(1)) f(d_j, t/T) \otimes (\Lambda_j^t d_{0,j,d_j}) + O\!\left(\frac{1}{T}\right), \quad \text{as } T \to \infty. \qquad (8.21)$$
Proof. Using Eq. (8.14), where the pair $(\Lambda_j, E_j)$ is one of those listed in Eq. (8.15), we write the powers of $D_j$ as

$$D_j^t = \begin{pmatrix} \Lambda_j^t & \binom{t}{1}\Lambda_j^{t-1} & \cdots & \binom{t}{d_j-1}\Lambda_j^{t-d_j+1} \\ 0 & \Lambda_j^t & \cdots & \binom{t}{d_j-2}\Lambda_j^{t-d_j+2} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \Lambda_j^t \end{pmatrix}. \qquad (8.22)$$

This is quite similar to the expression for $D^k$ from Section 6.6.3, where the notation has been introduced. Upon premultiplication of Eq. (8.22) by Eq. (8.18) we get

$$N_{T,j} D_j^t = \begin{pmatrix} T^{1-d_j}\Lambda_j^{t+d_j-1} & T^{1-d_j}\binom{t}{1}\Lambda_j^{t+d_j-2} & \cdots & T^{1-d_j}\binom{t}{d_j-1}\Lambda_j^t \\ 0 & T^{2-d_j}\Lambda_j^{t+d_j-2} & \cdots & T^{2-d_j}\binom{t}{d_j-2}\Lambda_j^t \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \Lambda_j^t \end{pmatrix}.$$

Postmultiplying this by Eq. (8.17) we see that the first block of $N_{T,j} D_j^t d_{0,j} = N_{T,j} d_{t,j}$ is

$$(N_{T,j} d_{t,j})_1 = \frac{1}{T^{d_j-1}} \sum_{k=0}^{d_j-1} \binom{t}{d_j-1-k} \Lambda_j^{t+k} d_{0,j,d_j-k} \qquad (8.23)$$

(the blocks of $d_{0,j}$ are counted from the end). For nonzero values of the binomial coefficients, from

$$\binom{t}{k} = \frac{t!}{k!(t-k)!} = \frac{t(t-1)\cdots(t-k+1)}{k!} = \frac{t^k}{k!}\left(1 - \frac{1}{t}\right)\cdots\left(1 - \frac{k-1}{t}\right)$$

we have

$$\binom{t}{k} = (1 + o(1))\frac{t^k}{k!} \quad \text{as } t \to \infty, \quad \text{for } 0 \le k \le d_j - 1. \qquad (8.24)$$

Equations (8.23) and (8.24) imply

$$(N_{T,j} d_{t,j})_1 = (1 + o(1)) \sum_{k=0}^{d_j-1} \frac{(t/T)^{d_j-1-k}}{(d_j-1-k)!\,T^k} \Lambda_j^{t+k} d_{0,j,d_j-k} = (1 + o(1)) \frac{(t/T)^{d_j-1}}{(d_j-1)!} \Lambda_j^t d_{0,j,d_j} + O\!\left(\frac{1}{T}\right) \qquad (8.25)$$

(the term with $k = 0$ leads; all others do not exceed $c/T$). Equation (8.25) is true for $t$ large, $t \ge t_0$, and $T \ge t_0$. To extend Eq. (8.25) to values $t < t_0$, we simply bound $\binom{t}{k} \le c$ and then from Eq. (8.23), in the case $d_j \ge 2$,

$$(N_{T,j} d_{t,j})_1 = O\!\left(\frac{1}{T^{d_j-1}}\right) = O\!\left(\frac{1}{T}\right). \qquad (8.26)$$
Since also

$$\frac{(t/T)^{d_j-1}}{(d_j-1)!} \Lambda_j^t d_{0,j,d_j} = O\!\left(\frac{1}{T^{d_j-1}}\right), \qquad (8.27)$$

Equation (8.25) follows from Eqs. (8.26) and (8.27) for $t < t_0$ in the case $d_j \ge 2$. Finally, if $d_j = 1$, then $D_j^t$ consists of just one block, $N_{T,j} = I$ and

$$N_{T,j} D_j^t d_{0,j} = \Lambda_j^t d_{0,j,d_j} = \frac{(t/T)^{d_j-1}}{(d_j-1)!} \Lambda_j^t d_{0,j,d_j}.$$

Equation (8.25) has been proved for all $d_j \ge 1$ and $0 \le t \le T$. Replacing in the above argument $d_j - 1$ by $d_j - 2, \ldots, 0$ we obtain analogs of Eq. (8.25) for the other blocks of $N_{T,j} d_{t,j}$. The resulting equations are collected as

$$N_{T,j} d_{t,j} = \begin{pmatrix} (1 + o(1)) \dfrac{(t/T)^{d_j-1}}{(d_j-1)!} \Lambda_j^t d_{0,j,d_j} + O(1/T) \\ (1 + o(1)) \dfrac{(t/T)^{d_j-2}}{(d_j-2)!} \Lambda_j^t d_{0,j,d_j} + O(1/T) \\ \vdots \\ (1 + o(1)) \dfrac{(t/T)^0}{0!} \Lambda_j^t d_{0,j,d_j} + O(1/T) \end{pmatrix} = (1 + o(1)) f(d_j, t/T) \otimes (\Lambda_j^t d_{0,j,d_j}) + O\!\left(\frac{1}{T}\right), \quad \text{as } T \to \infty,$$

and this relationship is uniform in $t$, $0 \le t \le T$. ∎

A slight change of this argument shows that

$$\max_{0 \le t \le T} \|N_T D^t\| = O(1).$$
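The lemma, together with the bounds (8.20), can be checked on a single rotation block. In the sketch below (NumPy assumed; all names are ours) $\Lambda_j$ is a rotation, $d_j = 2$, and the uniform error in (8.21) is seen to shrink like $1/T$:

```python
import math
import numpy as np

def rot(th):
    c, s = np.cos(th), np.sin(th)
    return np.array([[c, -s], [s, c]])

th, dj = 0.7, 2
Lam = rot(th)
# real Jordan block (8.14): Lam on the diagonal, E = I_2 above it
Dj = np.block([[Lam, np.eye(2)], [np.zeros((2, 2)), Lam]])
d0 = np.array([0.3, -1.1, 0.8, 0.5])          # d_{0,j}; last 2 entries = d_{0,j,d_j}

def f(n, u):
    # f(n, u) = (u^{n-1}/(n-1)!, ..., u^0/0!)'
    return np.array([u ** (n - 1 - i) / math.factorial(n - 1 - i) for i in range(n)])

def N_Tj(T):
    # N_{T,j} = diag[(Lam/T)^{dj-1}, ..., (Lam/T)^0], Eq. (8.18)
    N = np.zeros((2 * dj, 2 * dj))
    for i in range(dj):
        N[2*i:2*i+2, 2*i:2*i+2] = np.linalg.matrix_power(Lam / T, dj - 1 - i)
    return N

errs = []
for T in (50, 500):
    N = N_Tj(T)
    # (8.20): Lam is orthogonal, so ||N_{T,j}|| = 1 and ||N_{T,j}^{-1}|| = T^(dj-1)
    assert np.isclose(np.linalg.norm(N, 2), 1.0)
    assert np.isclose(np.linalg.norm(np.linalg.inv(N), 2), float(T) ** (dj - 1))
    err = max(
        np.linalg.norm(
            N @ (np.linalg.matrix_power(Dj, t) @ d0)
            - np.kron(f(dj, t / T), np.linalg.matrix_power(Lam, t) @ d0[2:])
        )
        for t in range(T + 1)
    )
    errs.append(err)
assert errs[1] < errs[0] / 5      # the O(1/T) remainder in (8.21) shrinks with T
```

For this block the remainder is exactly $\|d_{0,j,1}\|/T$, so multiplying $T$ by 10 divides the uniform error by 10.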
8.5.4 Trigonometric Lemma

Lemma.

(i) Let

$$L = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}, \qquad x = \begin{pmatrix} a \\ b \end{pmatrix}.$$

Then

$$L^t x (L^t x)' = \frac{a^2 + b^2}{2}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + A\cos 2t\theta + B\sin 2t\theta, \quad t = 1, 2, \ldots,$$

where $A$ and $B$ are constant $2 \times 2$ matrices with elements depending only on $a, b$.
(ii) Let

$$L_j = \begin{pmatrix} \cos\theta_j & -\sin\theta_j \\ \sin\theta_j & \cos\theta_j \end{pmatrix}, \qquad x_j = \begin{pmatrix} a_j \\ b_j \end{pmatrix}, \quad j = 1, 2.$$

Then

$$L_1^t x_1 (L_2^t x_2)' = \sum_{\pm} \left[ A_{\pm} \cos(t(\theta_1 \pm \theta_2)) + B_{\pm} \sin(t(\theta_1 \pm \theta_2)) \right],$$

where $A_{\pm}, B_{\pm}$ are $2 \times 2$ matrices with elements depending only on $x_1, x_2$.

Proof. (i) Since $L$ is rotation by the angle $\theta$, $L^t$ is rotation by the angle $t\theta$ and

$$L^t x = \begin{pmatrix} \cos t\theta & -\sin t\theta \\ \sin t\theta & \cos t\theta \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} a\cos t\theta - b\sin t\theta \\ a\sin t\theta + b\cos t\theta \end{pmatrix}. \qquad (8.28)$$

Therefore

$$L^t x (L^t x)' = \begin{pmatrix} a\cos t\theta - b\sin t\theta \\ a\sin t\theta + b\cos t\theta \end{pmatrix} (a\cos t\theta - b\sin t\theta,\ a\sin t\theta + b\cos t\theta) = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix},$$

where

$$c_{11} = a^2\cos^2 t\theta - 2ab\sin t\theta\cos t\theta + b^2\sin^2 t\theta,$$
$$c_{12} = c_{21} = a^2\sin t\theta\cos t\theta + ab\cos^2 t\theta - ab\sin^2 t\theta - b^2\sin t\theta\cos t\theta,$$
$$c_{22} = a^2\sin^2 t\theta + 2ab\sin t\theta\cos t\theta + b^2\cos^2 t\theta.$$

Using the equations $2\sin\alpha\cos\alpha = \sin 2\alpha$, $\cos^2\alpha = \frac{1}{2}(\cos 2\alpha + 1)$ and $\sin^2\alpha = \frac{1}{2}(1 - \cos 2\alpha)$ this can be rewritten as

$$c_{11} = \frac{a^2}{2}(\cos 2t\theta + 1) + \frac{b^2}{2}(1 - \cos 2t\theta) - ab\sin 2t\theta,$$
$$c_{12} = c_{21} = \frac{a^2 - b^2}{2}\sin 2t\theta + ab\cos 2t\theta,$$
$$c_{22} = \frac{a^2}{2}(1 - \cos 2t\theta) + \frac{b^2}{2}(\cos 2t\theta + 1) + ab\sin 2t\theta.$$
The result is

$$L^t x (L^t x)' = \frac{a^2 + b^2}{2}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \begin{pmatrix} \dfrac{a^2 - b^2}{2} & ab \\ ab & -\dfrac{a^2 - b^2}{2} \end{pmatrix} \cos 2t\theta + \begin{pmatrix} -ab & \dfrac{a^2 - b^2}{2} \\ \dfrac{a^2 - b^2}{2} & ab \end{pmatrix} \sin 2t\theta.$$

(ii) This time using Eq. (8.28) we have

$$L_1^t x_1 (L_2^t x_2)' = \begin{pmatrix} a_1\cos t\theta_1 - b_1\sin t\theta_1 \\ a_1\sin t\theta_1 + b_1\cos t\theta_1 \end{pmatrix} (a_2\cos t\theta_2 - b_2\sin t\theta_2,\ a_2\sin t\theta_2 + b_2\cos t\theta_2).$$

The terms in the above expression are linear combinations of the products $\cos t\theta_1 \cos t\theta_2$, $\cos t\theta_1 \sin t\theta_2$, $\sin t\theta_1 \cos t\theta_2$, and $\sin t\theta_1 \sin t\theta_2$. Using the formulas

$$\cos\alpha\cos\beta = \frac{\cos(\alpha + \beta) + \cos(\alpha - \beta)}{2}, \qquad \sin\alpha\sin\beta = \frac{\cos(\alpha - \beta) - \cos(\alpha + \beta)}{2}, \qquad \sin\alpha\cos\beta = \frac{\sin(\alpha + \beta) + \sin(\alpha - \beta)}{2},$$

these terms can be rewritten as linear combinations of $\cos(t(\theta_1 \pm \theta_2))$ and $\sin(t(\theta_1 \pm \theta_2))$. ∎
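Part (i) of the lemma is easy to confirm numerically; the closed forms for $A$ and $B$ used below follow from the double-angle identities in the proof (NumPy assumed; `Bm` is our name for the matrix $B$ of the lemma, to avoid a symbol clash):

```python
import numpy as np

th = 0.9
L = np.array([[np.cos(th), -np.sin(th)],
              [np.sin(th),  np.cos(th)]])
a, b = 1.3, -0.7
x = np.array([a, b])

# constant matrices of Lemma 8.5.4(i), depending only on a, b
A  = np.array([[(a*a - b*b)/2,  a*b],
               [a*b,           -(a*a - b*b)/2]])
Bm = np.array([[-a*b,           (a*a - b*b)/2],
               [(a*a - b*b)/2,  a*b]])

for t in range(1, 20):
    Lx = np.linalg.matrix_power(L, t) @ x
    lhs = np.outer(Lx, Lx)
    rhs = (a*a + b*b)/2 * np.eye(2) + A*np.cos(2*t*th) + Bm*np.sin(2*t*th)
    assert np.allclose(lhs, rhs)
```

The identity is the reason sample covariances of rotated vectors have a constant part plus a purely oscillatory part, which is what Section 8.5.5 averages away.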
8.5.5 Sample Covariance of the Deterministic Process

Lemma.

(i) With an appropriate pair $(\Lambda_j, E_j)$ from Eq. (8.15) we have

$$\frac{1}{T}\sum_{t=1}^{T} (N_{T,j} d_{t-1,j})(N_{T,j} d_{t-1,j})' = (1 + o(1)) \frac{\|d_{0,j,d_j}\|^2}{\dim\Lambda_j} \int_0^1 f(d_j, u) f'(d_j, u)\,du \otimes E_j + O\!\left(\frac{1}{T}\right).$$

(ii) If $\|d_{0,j,d_j}\|^2 > 0$ then the limiting matrix in (i) is positive definite.

(iii) $\dfrac{1}{T}\displaystyle\sum_{t=1}^{T} (N_{T,i} d_{t-1,i})(N_{T,j} d_{t-1,j})' = O\!\left(\dfrac{1}{T}\right)$ for $i \ne j$.
Proof. (i) By Lemma 8.5.3,

$$\frac{1}{T}\sum_{t=1}^{T} (N_{T,j} d_{t-1,j})(N_{T,j} d_{t-1,j})' = (1 + o(1)) \frac{1}{T}\sum_{t=1}^{T} f\!\left(d_j, \frac{t-1}{T}\right) f'\!\left(d_j, \frac{t-1}{T}\right) \otimes \left( \Lambda_j^{t-1} d_{0,j,d_j} d_{0,j,d_j}' (\Lambda_j^{t-1})' \right) + O\!\left(\frac{1}{T}\right) \qquad (8.29)$$

(the terms corresponding to $t = 1$ are of order $\frac{1}{T}$ and can be included/excluded without affecting the result). Let $0 \le p, q \le d_j - 1$ be integer numbers. One block of the expression at the right of Eq. (8.29) equals

$$\frac{1}{T}\sum_{t=1}^{T} \frac{1}{p!\,q!} \left(\frac{t-1}{T}\right)^{p+q} \Lambda_j^{t-1} d_{0,j,d_j} d_{0,j,d_j}' (\Lambda_j^{t-1})' + O\!\left(\frac{1}{T}\right). \qquad (8.30)$$

Consider two cases.

(a) Suppose $\dim\Lambda_j = 1$. In this case $\Lambda_j = \pm 1$, $d_{0,j,d_j}$ is a real number and

$$\Lambda_j^{t-1} d_{0,j,d_j} d_{0,j,d_j}' (\Lambda_j^{t-1})' = |d_{0,j,d_j}|^2 = E_j \frac{\|d_{0,j,d_j}\|^2}{\dim\Lambda_j}.$$

The limit of Eq. (8.30), therefore, is

$$\frac{1}{p!\,q!} \int_0^1 u^{p+q}\,du\; E_j \frac{\|d_{0,j,d_j}\|^2}{\dim\Lambda_j},$$

which, in combination with Eq. (8.29), proves (i).

(b) Let $\dim\Lambda_j = 2$. By Lemma 8.5.4(i), Eq. (8.30) equals

$$\frac{1}{p!\,q!} \frac{1}{T}\sum_{t=1}^{T} \left(\frac{t-1}{T}\right)^{p+q} E_j \frac{\|d_{0,j,d_j}\|^2}{\dim\Lambda_j} + \frac{1}{p!\,q!} \frac{1}{T}\sum_{t=1}^{T} \left(\frac{t-1}{T}\right)^{p+q} (A\cos 2t\theta_j + B\sin 2t\theta_j) + O\!\left(\frac{1}{T}\right).$$

The desired result follows if we prove that

$$\frac{1}{T}\sum_{t=1}^{T} \left(\frac{t-1}{T}\right)^{p+q} (A\cos 2t\theta_j + B\sin 2t\theta_j) = O\!\left(\frac{1}{T}\right).$$
To this end, it is sufficient to prove

$$\frac{1}{T^{l+1}} \sum_{t=0}^{T-1} t^l \cos(2t\theta + a) = O\!\left(\frac{1}{T}\right) \qquad (8.31)$$

for any nonnegative integer $l$ and a constant $a = 0$ or $a = \pi/2$. Direct calculation gives

$$t^{j+4k}\cos(2t\theta + a) = \pm\, 2^{-(j+4k)} \frac{\partial^{j+4k}}{\partial\theta^{j+4k}} \sin\!\left(2t\theta + a + \frac{\pi}{2}\right), \quad j = 0, 2,$$

$$t^{j+4k}\cos(2t\theta + a) = \pm\, 2^{-(j+4k)} \frac{\partial^{j+4k}}{\partial\theta^{j+4k}} \sin(2t\theta + a), \quad j = 1, 3,$$

for $k = 0, 1, \ldots$ These identities show that $\sum_{t=0}^{T-1} t^l \cos(2t\theta + a)$ is, up to a constant factor $c_l$, $\frac{\partial^l}{\partial\theta^l} \sum_{t=0}^{T-1} \sin(2t\theta + x)$, where $x = a$ or $x = a + \pi/2$. By Gradshteyn and Ryzhik (2007, Formula 1.341.1), $\sum_{t=0}^{T-1} \sin(2t\theta + x) = f(x, T, \theta)$, where $f(x, T, \theta) = \sin[x + (T-1)\theta]\,\sin(T\theta)/\sin\theta$ [by definition, $f(x, T, \theta) = 0$ when $\sin\theta = 0$]. Thus,

$$\frac{1}{T^{l+1}} \sum_{t=0}^{T-1} t^l \cos(2t\theta + a) = \frac{c_l}{T^{l+1}} \frac{\partial^l}{\partial\theta^l} f(x, T, \theta).$$

It is easy to see that $\frac{\partial^l}{\partial\theta^l} f(x, T, \theta) = \sum_{m=0}^{l} a_m(x, T, \theta) T^m$, where the $a_m(x, T, \theta)$ are bounded in $T$. Therefore Eq. (8.31) follows.

(ii) The matrix $\int_0^1 f(d_j, u) f'(d_j, u)\,du$ is positive definite as a Gram matrix of a linearly independent system (Section 1.7.5).

(iii) By Lemma 8.5.3,

$$\frac{1}{T}\sum_{t=1}^{T} (N_{T,i} d_{t-1,i})(N_{T,j} d_{t-1,j})' = (1 + o(1)) \frac{1}{T}\sum_{t=1}^{T} f\!\left(d_i, \frac{t-1}{T}\right) f'\!\left(d_j, \frac{t-1}{T}\right) \otimes \left( \Lambda_i^{t-1} d_{0,i,d_i} d_{0,j,d_j}' (\Lambda_j^{t-1})' \right) + O\!\left(\frac{1}{T}\right).$$

Therefore the result follows from Lemma 8.5.4(ii) and Eq. (8.31). ∎
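Part (i) of the lemma can be illustrated in the simplest case $d_j = 1$, $\dim\Lambda_j = 2$ (so $N_{T,j} = I$ and $f \equiv 1$): the sample covariance of a rotated vector tends to $\tfrac{1}{2}\|d_0\|^2 I_2$, the oscillatory part averaging out at rate $O(1/T)$. A sketch (NumPy assumed; names are ours):

```python
import numpy as np

th = 0.9
L = np.array([[np.cos(th), -np.sin(th)],
              [np.sin(th),  np.cos(th)]])
d0 = np.array([1.3, -0.7])

T = 20000
S = np.zeros((2, 2))
v = d0.copy()                      # v = L^{t-1} d0
for t in range(T):
    S += np.outer(v, v)
    v = L @ v
S /= T

# limiting matrix of Lemma 8.5.5(i): ||d0||^2 / dim(Lam) * E = ||d0||^2 / 2 * I
target = (d0 @ d0) / 2 * np.eye(2)
assert np.linalg.norm(S - target) < 1e-3
```

The residual here is the averaged $A\cos 2t\theta + B\sin 2t\theta$ term of Lemma 8.5.4(i), whose partial sums are bounded by $1/|\sin\theta|$, hence the $O(1/T)$ rate.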
8.5.6 Asymptotic Behavior of Normalized Deterministic Trends

Theorem. (Nielsen, 2005, Theorem 4.1) Suppose $d_t$ satisfies Eqs. (8.10) and (8.11). Then

(i) $\max_{0 \le t \le T} \|N_T d_t\| = O(1)$ and, in particular, $\max_{t \le T} \|d_t\| = O(T^{d-1})$, where $d = \max_j d_j$ is the largest multiplicity of the eigenvalues of $D$.

(ii) $\lim_{T \to \infty} \dfrac{1}{T}\displaystyle\sum_{t=1}^{T} (N_T d_{t-1})(N_T d_{t-1})'$ is positive definite.

(iii) $\max_{t \le T}\, d_t' \left( \displaystyle\sum_{s=1}^{T} d_{s-1} d_{s-1}' \right)^{-1} d_t = O\!\left(\dfrac{1}{T}\right)$.

Proof. (i) The process $N_T d_t$ is obtained by stacking the processes $N_{T,j} d_{t,j}$:

$$N_T d_t = \begin{pmatrix} N_{T,1} & & 0 \\ & \ddots & \\ 0 & & N_{T,l} \end{pmatrix} \begin{pmatrix} d_{t,1} \\ \vdots \\ d_{t,l} \end{pmatrix} = \begin{pmatrix} N_{T,1} d_{t,1} \\ \vdots \\ N_{T,l} d_{t,l} \end{pmatrix}.$$

Therefore the first statement follows from Lemma 8.5.3 and the inequality $\|N_T d_t\| \le l \max_j \|N_{T,j} d_{t,j}\|$. Further,

$$d_{t,j} = N_{T,j}^{-1} N_{T,j} d_{t,j} = \mathrm{diag}[(\Lambda_j/T)^{1-d_j}, \ldots, (\Lambda_j/T)^0]\, N_{T,j} d_{t,j} = O(T^{d-1}),$$

which proves the statement.

(ii) The equation

$$(N_T d_{t-1})(N_T d_{t-1})' = \begin{pmatrix} N_{T,1} d_{t-1,1} \\ \vdots \\ N_{T,l} d_{t-1,l} \end{pmatrix} ((N_{T,1} d_{t-1,1})', \ldots, (N_{T,l} d_{t-1,l})') = \left( N_{T,i} d_{t-1,i} (N_{T,j} d_{t-1,j})' \right)_{i,j=1}^{l}$$

implies that the matrix

$$M \equiv \lim_{T \to \infty} \frac{1}{T}\sum_{t=1}^{T} (N_T d_{t-1})(N_T d_{t-1})' = \lim_{T \to \infty} \left( \frac{1}{T}\sum_{t=1}^{T} N_{T,i} d_{t-1,i} (N_{T,j} d_{t-1,j})' \right)_{i,j=1}^{l}$$

has null blocks outside the main diagonal by Lemma 8.5.5(iii) and positive definite diagonal blocks by Lemma 8.5.5(ii). Thus, $M$ itself is positive definite.
(iii) In the identity

$$d_t' \left( \sum_{s=1}^{T} d_{s-1} d_{s-1}' \right)^{-1} d_t = \frac{1}{T} (N_T d_t)' \left[ \frac{1}{T}\sum_{s=1}^{T} (N_T d_{s-1})(N_T d_{s-1})' \right]^{-1} (N_T d_t)$$

the vector $N_T d_t$ is bounded by part (i). The matrix in the brackets is positive definite by part (ii). ∎
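Part (iii) can be watched directly on the trend of Example 8.4: the quantity $T \cdot \max_{t \le T} d_t' \left(\sum_s d_{s-1} d_{s-1}'\right)^{-1} d_t$ stays bounded as $T$ grows. A sketch (NumPy assumed; the bound 2 in the final assertion is a loose illustrative tolerance, not a claim from the text):

```python
import numpy as np

s1 = lambda t: t % 2
d = lambda t: np.array([1.0, t, s1(t)])   # Example 8.4 trend

vals = []
for T in (100, 1000):
    M = sum(np.outer(d(s - 1), d(s - 1)) for s in range(1, T + 1))
    Minv = np.linalg.inv(M)
    q = max(d(t) @ Minv @ d(t) for t in range(T + 1))
    vals.append(T * q)                     # T * max_t d_t' M^{-1} d_t

# boundedness of T * q, i.e. the O(1/T) rate of Theorem 8.5.6(iii)
assert vals[1] < 2 * vals[0]
```

This is the deterministic-trend analog of a leverage bound: no single observation $d_t$ dominates the Gram matrix of the regressors.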
8.5.7 Corollary

Lemma. Under the conditions of Theorem 8.5.6,

$$\max_{t \le T} \left\| d_t' \left( \sum_{s=1}^{T} d_{s-1} d_{s-1}' \right)^{-1/2} \right\| = O(T^{-1/2}).$$

Proof. For a vector $x$ and a symmetric matrix $A$ we have

$$\|x'A\| = \|A'x\| = ((A'x)' A'x)^{1/2} = (x' A^2 x)^{1/2},$$

which implies

$$\left\| d_t' \left( \sum_{s=1}^{T} d_{s-1} d_{s-1}' \right)^{-1/2} \right\| = \left[ d_t' \left( \sum_{s=1}^{T} d_{s-1} d_{s-1}' \right)^{-1} d_t \right]^{1/2}.$$

Now the statement follows from part (iii) of Theorem 8.5.6. ∎
REFERENCES

Aljančić, S., Bojanić, R., Tomić, M. 1955. Deux théorèmes relatifs au comportement asymptotique des séries trigonométriques. Srpska Akad. Nauka. Zb. Rad. Mat. Inst., 43(4), 15–26. Amemiya, T. 1976. The maximum likelihood, the minimum chi-square and the nonlinear weighted least-squares estimator in the general qualitative response model. J. Amer. Statist. Assoc., 71(354), 347–351. Amemiya, T. 1985. Advanced Econometrics. Oxford: Blackwell. Anderson, T. W. 1959. On asymptotic distributions of estimates of parameters of stochastic difference equations. Ann. Math. Stat., 30, 676–687. Anderson, T. W. 1971. The Statistical Analysis of Time Series. New York: John Wiley & Sons Inc. Anderson, T. W., Kunitomo, N. 1992. Asymptotic distributions of regression and autoregression coefficients with martingale difference disturbances. J. Multivar. Anal., 40(2), 221–243. Anderson, T. W., Taylor, J. B. 1979. Strong consistency of least squares estimates in dynamic models. Ann. Stat., 7(3), 484–489. Anselin, L. 1988. Spatial Econometrics: Methods and Models. Boston: Kluwer Academic Publishers. Anselin, L., Bera, A. K. 1998. Spatial dependence in linear regression models with an introduction to spatial econometrics. In: Ullah, A., Giles, D.E.A. (eds), Handbook of Applied Economic Statistics. New York: Marcel Dekker. Barlow, W. J. 1975. Coefficient properties of random variable sequences. Ann. Probab., 3(5), 840–848. Barro, R. J., Sala-i-Martin, X. 2003. Economic Growth. 2nd edn. Cambridge: MIT Press. Bellman, R. 1995. Introduction to Matrix Analysis. Classics in Applied Mathematics, Vol. 12. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM). Reprint of the 1960 original. Beveridge, S., Nelson, C. R. 1981. A new approach to decomposition of economic time series into permanent and transitory components with particular attention to measurement of the "business cycle". J. Monetary Econ., 7, 151–174. Billingsley, P. 1968. Convergence of Probability Measures. New York: John Wiley & Sons Inc. Billingsley, P. 1995. Probability and Measure. 3rd edn. New York: John Wiley & Sons Inc. Brown, B. M. 1971. Martingale central limit theorems. Ann. Math. Stat., 42, 59–66. Burkholder, D. L. 1968. Independent sequences with the Stein property. Ann. Math. Stat., 39, 1282–1288. Burkholder, D. L., Gundy, R. F. 1970. Extrapolation and interpolation of quasi-linear operators on martingales. Acta Math., 124, 249–304. Cartan, H. 1967. Calcul Différentiel. Paris: Hermann. Case, A. C. 1991. Spatial patterns in household demand. Econometrica, 59, 953–965. Chow, Y. S. 1965. Local convergence of martingales and the law of large numbers. Ann. Math. Stat., 36, 552–558. Chow, Y. S. 1969. Martingale extensions of a theorem of Marcinkiewicz and Zygmund. Ann. Math. Stat., 40, 427–433. Chow, Y. S. 1971. On the Lp-convergence for n^(-1/p) S_n, 0 < p < 2. Ann. Math. Stat., 42, 393–394. Christopeit, N., Helmes, K. 1980. Strong consistency of least squares estimators in linear regression models. Ann. Stat., 8(4), 778–788. Chudik, A., Pesaran, M. H., Tosetti, E. 2010. Weak and strong cross section dependence and estimation of large panels, 45 pp. http://ideas.repec.org/p/ces/ceswps/_2689.html Čížek, P. 2008. Robust and efficient adaptive estimation of binary-choice regression models. J. Amer. Statist. Assoc., 103, 687–696. Cliff, A. D., Ord, K. 1981. Spatial Processes: Models and Applications. London: Pion Ltd.
Short-Memory Linear Processes and Econometric Applications. Kairat T. Mynbaev. © 2011 John Wiley & Sons, Inc. Published 2011 by John Wiley & Sons, Inc.
Cressie, N. A. C. 1993. Statistics for Spatial Data. New York: John Wiley & Sons Inc. Davidson, J. 1994. Stochastic Limit Theory. New York: Oxford University Press. An introduction for econometricians. de Haan, L., Resnick, S. 1996. Second-order regular variation and rates of convergence in extreme-value theory. Ann. Prob., 24(1), 97–124. Dvoretzky, A. 1972. Asymptotic normality for sums of dependent random variables. Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (Univ. California, Berkeley, Calif., 1970/1971), Vol. II: Probability theory. pp. 513– 535, Berkeley: Univ. California Press. Eicker, F. 1963. Asymptotic normality and consistency of the least squares estimators for families of linear regressions. Ann. Math. Stat., 34, 447–456. Eicker, F. 1966. A multivariate central limit theorem for random linear vector forms. Ann. Math. Stat., 37, 1825– 1828. Gantmacher, F. R. 1959. The Theory of Matrices. Vols. 1, 2. New York: Chelsea Publishing Co. Gohberg, I. C., Kreı˘n, M. G. 1969. Introduction to the Theory of Linear Nonselfadjoint Operators. Translations of Mathematical Monographs, Vol. 18. Providence: American Mathematical Society. Goldie, C. M., Smith, R. L. 1987. Slow variation with remainder: Theory and applications. Q. J. Math., 38, 45– 71. Gourie´roux, C., Monfort, A. 1981. Asymptotic properties of the maximum likelihood estimator in dichotomous logit models. J. Econometrics, 17(1), 83– 97. Gradshteyn, I. S., Ryzhik, I. M. 2007. Table of Integrals, Series, and Products. 7th edn. Amsterdam Elsevier/Academic Press. Translated from the Russian, Translation edited and with a preface by Alan Jeffrey and Daniel Zwillinger, with one CD-ROM (Windows, Macintosh and UNIX). Gundy, R. F. 1967. The martingale version of a theorem of Marcinkiewicz and Zygmund. Ann. Math. Stat., 38, 725– 734. Hahn, M. G., Kuelbs, J., Samur, J. D. 1987. Asymptotic normality of trimmed sums of mixing random variables. Ann. Probab., 15, 1395–1418. 
Hall, P., Heyde, C. C. 1980. Martingale Limit Theory and Its Application. New York: Academic Press Inc. Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press. Hannan, E. J. 1979. The central limit theorem for time series regression. Stoch. Proc. Appl., 9(3), 281–289. Herstein, I. N. 1975. Topics in Algebra. 2nd edn. Lexington: Xerox College Publishing. Hill, J. B. 2010. Least tail-trimmed squares for infinite variance autoregressions, under review at Journal of the Royal Statistical Society Series B. http://www.unc.edu/~jbhill/working%20papers.htm Hill, J. B. 2011. Central limit theory for kernel self-normalized tail-trimmed sums of dependent data with applications. Working paper, 30 pp. http://www.unc.edu/~jbhill/clt_tail_trim.pdf Holly, S., Pesaran, M. H., Yamagata, T. 2008. A spatio-temporal model of house prices in the US, 30 pp. http://ideas.repec.org/p/cam/camdae/0654.html Hoque, A. 1985. The exact moments of forecast error in the general dynamic model. Sankhyā Ser. B, 47(1), 128–143. Hsiao, C. 1991. Identification and estimation of dichotomous latent variables models using panel data. Rev. Econ. Stud., 58(4), 717–731. Hurvich, C. M., Deo, R., Brodsky, J. 1998. The mean squared error of Geweke and Porter–Hudak's estimator of the memory parameter of a long-memory time series. J. Time Ser. Anal., 19(1), 19–46. Iosida, K. 1965. Functional Analysis. Berlin: Springer-Verlag. Johansen, S. 2000. A Bartlett correction factor for tests on the cointegrating relations. Economet. Theor., 16(5), 740–778. Jones, M. C. 1986. Expressions for inverse moments of positive quadratic forms in normal variables. Austral. J. Stat., 28(2), 242–250. Kadison, R. V. 1968. Strong continuity of operator functions. Pacific J. Math., 26, 121–129. Kelejian, H. H., Prucha, I. R. 1998. A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. J. Real Estate Finance, 17, 99–121.
Kelejian, H. H., Prucha, I. R. 1999. A generalized moments estimator for the autoregressive parameter in a spatial model. Int. Econ. Rev., 40, 509– 533. Kelejian, H. H., Prucha, I. R. 2001. On the asymptotic distribution of the Moran I test statistic with applications. J. Econometrics, 104, 219 –257.
Kelejian, H. H., Prucha, I. R. 2002. 2SLS and OLS in a spatial autoregressive model with equal spatial weights. Reg. Sci. Urban Econ., 32, 691–707. Kelejian, H. H., Prucha, I. R., Yuzefovich, Y. 2004. Instrumental variable estimation of a spatial autoregressive model with autoregressive disturbances: large and small sample results. Adv. Econometrics, 18, 163–198. Kolmogorov, A. N., Fomin, S. V. 1989. Elementy Teorii Funktsii i Funktsional’nogo Analiza. 6th edn. Moscow: “Nauka”. With a supplement, “Banach algebras”, by V. M. Tikhomirov. Kushner, H. 1971. Introduction to Stochastic Control. New York: Holt, Rinehart and Winston, Inc. Lai, T. L., Wei, C. Z. 1982. Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. Ann. Stat., 10(1), 154–166. Lai, T. L., Wei, C. Z. 1983a. Asymptotic properties of general autoregressive models and strong consistency of least-squares estimates of their parameters. J. Multivar. Anal., 13(1), 1– 23. Lai, T. L., Wei, C. Z. 1983b. A note on martingale difference sequences satisfying the local Marcinkiewicz – Zygmund condition. Bull. Inst. Math. Acad. Sinica, 11(1), 1 –13. Lai, T. L., Wei, C. Z. 1985. Asymptotic properties of multivariate weighted sums with applications to stochastic regression in linear dynamic systems. In: Multivariate Analysis VI (Pittsburgh, Pa., 1983). Amsterdam: North-Holland., pp. 375– 393. Lai, T. L., Robbins, H., Wei, C. Z. 1978. Strong consistency of least squares estimates in multiple regression. Proc. Nat. Acad. Sci. USA, 75(7), 3034–3036. Lai, T. L., Robbins, H., Wei, C. Z. 1979. Strong consistency of least squares estimates in multiple regression. II. J. Multivar. Anal., 9(3), 343– 361. Lee, LungFei. 2001. Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models I: spatial autoregressive processes. Ohio State University. Lee, LungFei. 2002. 
Consistency and efficiency of least squares estimation for mixed regressive, spatial autoregressive models. Economet. Theor., 18(2), 252– 277. Lee, LungFei. 2003. Best spatial two-stage least squares estimators for a spatial autoregressive model with autoregressive disturbances. Economet. Rev., 22(4), 307–335. Lee, LungFei. 2004a. Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica, 72(6), 1899–1925. Lee, LungFei. 2004b. A supplement to “Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models”. http://economics.sbs.ohio-state.edu/lee/wp/sar-qml-rappen-04feb.pdf. Long, R. L. 1993. Martingale Spaces and Inequalities. Beijing: Peking University Press. Lu¨tkepohl, H. 1991. Introduction to Multiple Time Series Analysis. Berlin: Springer-Verlag. Mann, H. B., Wald, A. 1943. On the statistical treatment of linear stochastic difference equations. Econometrica, 11, 173– 220. Marcinkiewicz, J., Zygmund, A. 1937. Sur les fonctions inde´pendantes. Fund. Math., 29, 60–90. Mathai, A. M., Provost, S. B. 1992. Quadratic Forms in Random Variables. Statistics: Textbooks and Monographs, Vol. 126. New York: Marcel Dekker Inc. McLeish, D. L. 1974. Dependent central limit theorems and invariance principles. Ann. Prob., 2, 620 –628. Morimune, K. 1959. Comparison of normal and logistic models in the bivariate dichotomous analysis. J. Econometrics, 47, 957– 976. Moussatat, M. W. 1976. On the asymptotic theory of statistical experiments and some of its applications. PhD thesis, Univ. of California, Berkeley. Muench, T. J. 1974. Consistency of least squares estimates of coefficients of stochastic differential equations. Univ. of Minnesota Economic Department Technical Report. Mynbaev, K. T. 1997. Linear models with regressors generated by square-integrable functions. In: Programa e Resumos. 7a Escola de Se´ries Temporais e Econometria. Porto Alegre: ABE e SBE. 6 a 8 de agosto, pp. 80–82. 
Mynbaev, K. T. 2000. Limits of weighted sums of random variables. Discussion text No. 218/2000, Economics Department, Federal University of Ceara´, Brazil. Mynbaev, K. T. 2001. Lp-approximable sequences of vectors and limit distribution of quadratic forms of random variables. Adv. Appl. Math., 26(4), 302– 329. Mynbaev, K. T. 2006a. Asymptotic properties of OLS estimates in autoregressions with bounded or slowly growing deterministic trends. Commun. Stat. Theor. Methods, 35(1-3), 499– 520.
Mynbaev, K. T. 2006b. OLS Estimator for a Mixed Regressive, Spatial Autoregressive Model: Extended Version. http://mpra.ub.uni-muenchen.de/15153/. Mynbaev, K. T. 2009. Central limit theorems for weighted sums of linear processes: Lp-approximability versus Brownian motion. Economet. Theor., 25(3), 748–763. Mynbaev, K. T. 2010. Asymptotic distribution of the OLS estimator for a mixed regressive, spatial autoregressive model. J. Multivar. Anal., 10(3), 733–748. Mynbaev, K. T. 2011. Regressions with asymptotically collinear regressors. Economet. Journal (forthcoming). Mynbaev, K. T., Castelar, I. 2001. The strengths and weaknesses of L2-approximable regressors. Two Essays on Econometrics. Vol. 1, Fortaleza, Brazil: Expressa˜o Gra´fica. http://mpra.ub.uni-muenchen. de/9056/. Mynbaev, K. T., Ullah, A. 2008. Asymptotic distribution of the OLS estimator for a purely autoregressive spatial model. J. Multivar. Anal., 99, 245–277. Nabeya, S., Tanaka, K. 1988. Asymptotic theory of a test for the constancy of regression coefficients against the random walk alternative. Ann. Stat., 16(1), 218–235. Nabeya, S., Tanaka, K. 1990. A general approach to the limiting distribution for estimators in time series regression with nonstable autoregressive errors. Econometrica, 58(1), 145–163. Nielsen, B. 2005. Strong consistency results for least squares estimators in general vector autoregressions with deterministic terms. Economet. Theor., 21(3), 534– 561. Nielsen, B. 2008. Singular vector autoregressions with deterministic terms: Strong consistency and lag order determination. http://www.nuffield.ox.ac.uk/economics/papers/2008/w14/Nielsen08VAR explosive.pdf. Ord, K. 1975. Estimation methods for models of spatial interaction. J. Amer. Statist. Assoc., 70, 120–126. Pesaran, M. H., Chudik, A. 2010. Econometric analysis of high dimensional VARs featuring a dominant unit, 41 pp. http://ideas.repec.org/p/ces/ceswps/_3055.html Phillips, P. C. B. 1999. 
Discrete Fourier transforms of fractional processes. Cowles Foundation discussion paper no. 1243, Yale University. Phillips, P. C. B. 2007. Regression with slowly varying regressors and nonlinear trends. Economet. Theor., 23, 557– 614. Po¨tscher, B. M., Prucha, I. R. 1997. Dynamic Nonlinear Econometric Models. Berlin: Springer-Verlag. Rao, C. R. 1965. Linear Statistical Inference and its Applications. New York: John Wiley & Sons Inc. Rao, M. M. 1961. Consistency and limit distributions of estimators of parameters in explosive stochastic difference equations. Ann. Math. Stat., 32, 195–218. Robinson, P. M. 1995. Log-periodogram regression of time series with long range dependence. Ann. Stat., 23(3), 1048–1072. Rubin, H. 1950. Consistency of maximum likelihood estimates in the explosive case. In: Koopmans, T. C. (ed), Statistical Inference in Dynamic Economic Models. Cowles Commission Monograph No. 10. New York: John Wiley & Sons Inc., pp. 356–364. Schmidt, P. 1976. Econometrics. Statistics: Textbooks and monographs. New York: Marcel Dekker, Inc. Seneta, E. 1985. Pravilno Menyayushchiesya Funktsii. Moscow: “Nauka”. Translated from English by I. S. Shiganov, Translation edited and with a preface by V. M. Zolotarev, With appendices by I. S. Shiganov and V. M. Zolotarev. Smirnov, O. L., Anselin, L. 2001. Fast maximum likelihood estimation of very large spatial autoregressive models: a characteristic polynomial approach. Comput. Statist. Data Anal., 35(3), 301– 319. Smith, R. L. 1982. Uniform rates of convergence in extreme-value theory. Adv. in Appl. Probab., 14, 600–622. Stigum, B. P. 1974. Asymptotic properties of dynamic stochastic parameter estimates. III. J. Multivar. Anal., 4, 351–381. Stout, W. F. 1974. Almost Sure Convergence. Academic Press Inc., New York-London. Probability and Mathematical Statistics, Vol. 24. Tanaka, K. 1996. Time Series Analysis. New York: John Wiley & Sons Inc. Nonstationary and noninvertible distribution theory. Taylor, R. L. 1978. 
Stochastic Convergence of Weighted Sums of Random Elements in Linear Spaces. Lecture Notes in Mathematics, Vol. 672. Berlin: Springer. Theil, H. 1971. Principles of Econometrics. New York: John Wiley & Sons Inc.
Trenogin, V. A. 1980. Funktsional’nyi Analiz. Moscow: “Nauka”. Varga, R. S. 1962. Matrix Iterative Analysis. Englewood Cliffs, Prentice-Hall Inc. Vilenkin, N. Ja. 1969. Kombinatorika. Moscow: Nauka. Wei, C. Z. 1985. Asymptotic properties of least-squares estimates in stochastic regression models. Ann. Stat., 13(4), 1498–1508. White, H. 1994. Estimation, Inference and Specification Analysis. Econometric Society Monographs, Vol. 22. Cambridge: Cambridge University Press. White, J. S. 1958. The limiting distribution of the serial correlation coefficient in the explosive case. Ann. Math. Stat., 29, 1188– 1197. Wooldridge, J. M. 1994. Estimation and inference for dependent processes. Handbook of Econometrics, Vol. IV. Handbooks in Econom., Vol. 2. Amsterdam: North-Holland., pp. 2639– 2738. Wu, Chien-Fu. 1981. Asymptotic theory of nonlinear least squares estimation. Ann. Stat., 9(3), 501 –513. Wu, W. B. 2005. Nonlinear system theory: Another look at dependence. Proc. Natl. Acad. Sci. USA, 102, 14150– 14154. Wu, W. B., Min, W. 2005. On linear processes with dependent innovations. Stochastic Process. Appl., 115, 939–958. Zhuk, V. V., Natanson, G. I. 2001. Seminorms and moduli of continuity of functions defined on a segment. Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI), 276 (Anal. Teor. Chisel i Teor. Funkts. 17), 155– 203.
AUTHOR INDEX

Aljančić, S., 143
Amemiya, T., 36, 41, 42, 96, 264, 374
Anderson, T. W., xi, 36, 41, 42, 95–96, 299, 302, 375, 383
Anselin, L., 189, 236, 263
Barlow, W. J., 282, 285
Barro, R. J., 131
Bellman, R., 22, 23
Bera, A. K., 189
Beveridge, S., 104
Billingsley, P., 26, 29, 43, 125
Bojanić, R., 143
Brodsky, J., 131
Brown, B. M., 94
Burkholder, D. L., 267, 269, 272, 280
Cartan, H., 374, 378
Case, A. C., 230–231
Castelar, I., 37, 80
Chow, Y. S., xii, 93, 266, 268, 272
Christopeit, N., 299
Cliff, A. D., 189
Davidson, J., xii, xiii, 20, 21, 25, 26, 32, 41, 91–95, 266, 269, 270, 390
de Haan, L., 141
Deo, R., 131
Dvoretzky, A., 95
Eicker, F., 36, 43, 375, 385
Fomin, S. V., 14, 18, 80, 115, 116, 377
Gohberg, I. C., 121, 201, 293
Gouriéroux, C., xii, 339, 374, 381, 384, 391
Gradshteyn, I. S., 413
Gundy, R. F., 271–272
Hall, P., 95, 103, 266, 267
Hamilton, J. D., 33
Hannan, E. J., 95
Helmes, K., 299
Herstein, I. N., 307, 308, 406
Heyde, C. C., 95, 103, 266, 267
Hoque, A., 225
Hsiao, C., 375
Hurvich, C. M., 131
Iosida, K., 76
Johansen, S., x, 393, 405
Jones, M. C., 226
Kadison, R. V., 387
Kelejian, H. H., 190, 191, 214, 222, 230, 236, 263
Kolmogorov, A. N., 14, 18, 80, 115, 116, 377
Kreĭn, M. G., 121, 201, 293
Kunitomo, N., xi, 95
Kushner, H., 322
Lai, T. L., xii, 265, 270, 272, 274, 278, 281–282, 284, 287, 288, 289, 293, 299, 301–302, 375, 383
Lee, LungFei, 127, 190, 191, 214, 217, 222, 230, 234, 236, 263
Long, R. L., 267, 281
Lütkepohl, H., 24, 25, 220, 231, 397
Mann, H. B., 302
Marcinkiewicz, J., 271, 280, 286
Mathai, A. M., 224
McLeish, D.L., xii, 94 Monfort, A., xii, 339, 374, 381, 384, 391 Morimune, K., 374 Moussatat, M.W., 41 Muench, T.J., 302 Mynbaev, K.T., x, xi, xii, 37, 41, 52, 54, 65, 69, 70, 75, 77, 80, 97, 103, 121, 122 –123, 126, 149, 157, 181, 189, 190, 192, 214, 216, 218, 223, 227, 254, 263 Nabeya, S., xii, 40, 110, 122 Natanson, G.I., 46, 60 Nelson, C.R., 104 Nielsen, B., x, xii, xiii, 265, 301 –302, 318, 393, 405, 414 Ord, K., 189, 224, 236, 263 Paley, R.E.A.C., 267, 276, 280 Phillips, P.C.B., x, xii, 36, 131– 132, 133, 135, 137, 140 –142, 148, 153, 159 – 161, 166 –167, 169, 171 –172, 175 – 176, 179, 181, 185 –186, 339, 346, 351, 358, 366 –367, 370 Provost, S.B., 224 Prucha, I.R., 41, 191, 214, 222, 230, 236, 263 Po¨tscher, B.M., 41 Rao, C.R., 29– 30, 38 Rao, M.M., 302 Resnick, S., 141 Robbins, H., 299 Robinson, P.M., 131
Rubin, H., 302, 315 Ryzhik, I.M., 413 Sala-i-Martin, X., 131 Schmidt, P., 36, 42 Seneta, E., 132, 133, 143–146 Smirnov, O.L., 236 Stigum, B.P., 302 Stout, W.F., 311 Tanaka, K., xii, 40, 99, 104, 110, 122, 123 Taylor, J.B., 299, 375, 383 Taylor, R.L., 95 Theil, H., 140 Tomic´, M., 143 Trenogin, V.A., 6 Ullah, A., xii, 189, 192, 209, 214, 216, 218, 223, 227, 230, 264 Varga, R.S., 303 Vilenkin, N.Ja., 168 Wald, A., 302 Wei, C.Z., xii, 265, 270, 272, 274, 278, 281–282, 284, 287–289, 293, 299, 301–302, 375, 383 White, H., 217 White, J.S., 302 Wooldridge, J.M., 339, 340, 341, 351, 358, 364, 366, 368 Wu, Chien-Fu, 131, 358 Zhuk, V.V., 46, 60 Zygmund, A., 267, 268, 271, 276, 278, 280, 286, 287
SUBJECT INDEX σ-additive measures, 14 σ-additivity of Lebesgue integrals, 18 L2-approximable sequence, 41 Lp-approximability abstract example, 87 convergence of the trinity, 70 criterion, 77, 87 definitions, 1-D, 68 explicit construction, 79 exponential trend, 80 geometric progression, 80 logarithmic trend, 80 matrix-valued functions, 110 polynomial trend, 80 c-properties, 68 m-properties, 68 refined convergence of bilinear forms, 69 sp classes, 117 L2-close, 41 Lp-close, 68 matrix case, 110, 394 uniformly, 353 h-continuity, 195 T-decomposition, 103, 104 σ-field, 13 σ-finite measure, 16 δ-function, 174 1-function, 134 G-function, 141 H-function, 155 h-function, 194 L = K(1), 134 L2-generated by, 41 Lp-generated by, 45 s-numbers, 117 T-operator, 55, 104, 297 adjoint, 114, 397 boundedness, 26, 298 definition (matrix case), 397
h-series, 194 2-step estimator, 226 absolute continuity, 18 of Lebesgue integrals, 18 adapted sequence of random variables, 31 adaptive normalizer, 34 adjoint operator, 11, 114 annihilation lemma, 224 asymptotic independence, condition, 39 of normalized regressors, 237 asymptotic linear independence, condition, 40 asymptotically almost surely, 340 autoregressive term domination, 258 auxiliary vector, 245 awkward aggregates, 206 balancer mixed spatial model, 238 basis, 3 Beveridge–Nelson decomposition, 104 bilinear form, 10 binary model, 373 Borel σ-field, 13 Borel–Cantelli lemma, 265 conditional, 266 Borel-measurability, 14 bounded operator, 7 sequence, 4 boundedness in probability, 26 Cauchy sequence, 5 Central Limit Theorems for weighted sums, of linear processes, 103 chain product, 205 characteristic function, 29 characteristic subspace, 21
class of normal vectors, 30 classification theorem, 181 CLT, xi CLT for quadratic forms of linear processes version 1, 122 version 2, 126 CLT for weighted sums of linear processes, 103 of martingale differences, 97 coefficients bad, 176 good, 176 compact operator, 112 companion matrix, 301 complete, 5 condition X, 75 conditional expectation, 18 conjugate number, 11 continuous in mean, 47 continuity modulus 2-D, 107 condition X, 75 definition, 46 doubling property, 59 limit at zero, 47 monotonicity, 47 of a continuous function, 47 of a step function, 72 uniform continuity, 60 continuous mapping theorem, 26 conventional scheme, 36 conventional-scheme-compliant normalizer, 37 convergence in distribution, 26 in normed spaces, 4 in probability, 24 in probability to zero, 28 convergence neighborhood, 234 convergence of the trinity (matrix case), 398 convergence on Lp-generated sequences, 65 correction term, 226 correlation, 16 counting function, 266 covariance, 16 covering, 45 Cramér–Wold device, 91 theorem, 91
criterion of uniform integrability, 91 CSC normalizer, 37 cutter, 64 1-D, 26 2-D, 2 3-D, 20 denominator, 238 dense, 16 dense set, 16 density, 25 dimensional, 3 discretization operator, 40 1-D, 45 2-D, 106 distance, 12 distribution function, 25 double A lemma, 206 double P lemma, 240 Egorov’s theorem, 269 eigenvalue, 21, 115 eigenvector, 21, 115 embedding operator, 55 errors contribution negligibility condition, 34 escort, 195 essential formula of the ML theory of spatial models, 220 essential supremum, 15 estimator, 340 exogenous regressors domination, 258 extended homogeneity, 20 eXtended M, 240 eXtender, 240 first moment, 28 fitted value, 42 Fourier coefficients, 113 series, 113 functions of operators, 115 G&M, 374 gauge, 128 generalized M-Z theorem I, 278 II, 280 multivariate case, 287 genie, 241 Gram matrix, 23
Hölder's inequality, 15 in lp, 11, 15 Haar projector 1-D, 48 2-D, 107 Hessian, 340 high-level conditions, 44 Hilbert space, 11 Hilbert-Schmidt operators, 117 ID, 218 identically distributed vectors, 25 image, 6 increments, 293 independent events, 20 family, 20 variables, 20 indicator, 16 inequality Cauchy–Schwarz, 10 Chebyshov, 89 conditional Chebyshov, 267 conditional Hölder, 267 conditional Jensen, 267 conditional Paley–Zygmund, 268 gauge, 128 Paley–Zygmund, 267 infinite-dimensional, 3 infinitely often, 265 innovations, 32 integral operator, 111 1-D, 48 2-D, 106 interpolation operator, 42 intrinsic characterization, 72 invariant subspace, 22 inverse Lipschitz function, 379 inverse of the major, 62 invertibility criterion, 260 isomorphic spaces, 8 isomorphism, 7 Karamata representation, 133, 143 kernel, 111 Kronecker's lemma, 270 lag, 399 law of iterated expectations (LIE), 20 least σ-field, 13
Lebesgue measure, 14 lemma on almost decreasing sequences, 318 on convex combinations of nonnegative random variables, 269 likelihood function, 373 Lindeberg function, 391 Lindeberg–Lévy theorem, 92 linear independent, 2 operator, 6 space, 2 span, 2 subspace, 2 linear combination of operators, 6 of vectors, 1 linear form in independent standard normals, 92 linear process, 32, 103 linearity of Lp-approximable sequences, 178 link, 205 Lipschitz condition, 377 locator, 45 log-likelihood, 373 logistic function, 373 logit model, 373 low-level conditions, 44 m.d., xiv M–Z condition, 272 major, 62 Marcinkiewicz-Zygmund condition local, 272, 287 multivariate local, 287 martingale difference, 31 difference array, 93 transform, 270 WLLN, 93 martingale convergence theorem, 265–266 martingale strong law, 267 matrix-valued functions Lp-approximable, 394 constant, 384, 401 convergence of trilinear forms, 394 definition, 393 multiplication, 400
matrix-valued functions (Continued) summation, 400 transposition, 400 measurable function, 14 space, 14 measure nonatomic, 269 Minkowski inequality in Lp, 15 in lp , 5 ML, xiv ML estimator a.s. existence, 379 convergence at some rate, 379 MM, 189 modification factor, 226 monotonicity of lp norms, 5 Moore-Penrose inverse, 38 multicollinearity detector, 257 multiplicity of an eigenvalue, 21, 115 negative part of a function, 19 neighborhood, 13 Newton-Leibniz formula, 221 non zero-tail sequences, 63 nonnegative operator, 114 nonreduced model, 174, 176 norm equivalent, 6 homogeneity, 3 nondegeneracy, 3 nonnegativity, 3 of a vector, 3 of an operator, 7 triangle inequality, 3 normal equation, 30 normalization of system of vectors, 113 normalizer, 32 NLS, xiv nuclear operators, 117 nuke, 201 null space, 6 numerator, 238 OLS, xii open set, 13 orthogonality of subspaces, 12
of system of vectors, 113 of vectors, 12 orthogonal matrix, 21 orthogonality lemma, 76 orthonormal system, 21, 113 orthoprojector, 12 pair, 208 Parseval identity, 114 parsimony principle, 44 perturbation argument, 104 pig-in-a-poke result, 43 positive part of a function, 19 precompact set, 76 premeasurable random variables, 265 principal projectors, 240 projy mixed spatial model, 245 purely spatial model, 209 probabilistic measure, 14 probability space, 14 process long-memory, 32 non-explosive, 301 purely explosive, 301 short-memory, 32 product of operators, 7 projector, 8 proper random variable, 90 proXy mixed spatial model, 245 purely spatial model, 208 pseudo-Case matrices, 232 purely spatial model, 190 QML, 191 quadratic form in independent standard normals, 92 quasi-monotone, 147 random variable, 15, 24 vector, 24 reduced form, 191 refined convergence of bilinear forms of Lp-approximable sequences, 69 of Lp-generated sequences, 52 regulator, 63 relatively compact with probability 1, 319
remainder, 143 representation theorem, 132 resolvent convergence, 403 enveloping, 304, 403 left, 403 right, 403 2SLS, 224 sample space, 14 scalar product in Rn, 9 in a Hilbert space, 10 score, 339 second moments, 28 self-adjoint operator, 114 semireduced model, 174, 179 seminorm, 4 sequences nonzero-tail, 63 zero-tail, 63 shift backward, 399 forward, 399 size of a matrix, 394 slowly varying function, 132 with remainder, 143 solver, 191 spaces C[0,1], 2 Lp, 15 Lp, 85, 89 lp, 5 Mp, 5 Rnp, 5–6 spatial matrix, 190 spectral decomposition, 115 spectrum-separating decomposition, 308 square root of a matrix, 23 standard deviation, 16 step function, 16 strong convergence of operators, 8 strong law of large numbers, 93 strongly converge of operators, 8 submultiplicativity of norms, 193 SV, xii SV function, 132 SV function, with remainder, 143 symmetric operator, 12, 114
theorem on relative compactness, 319 thread, 394 transition matrix, 176 translation continuous, 47 translation operator 1-D, 46 2-D, 107 in lp(Z), 55 trend deterministic, 404 exponential, 80 geometric progression, 80 linear, 34 logarithmic, 80 polynomial, 80 trinity, 55 boundedness, 54 convergence on Lp-generated sequences, 65 definition, 55 matrix representation, 56 truncation argument, 270 two-step estimator, 226 uncorrelated variables, 30 uniform absolute continuity, 91 L1-boundedness, 91 convergence of operators, 8 equicontinuity in mean, 76 uniform convergence theorem, 132, 144 uniform partition, 45 uniformly integrable, 91 unit vector in lp, 5 in Rn, 2 vanishing sequence, 4 VAR, xv variance stabilization condition, 33 variance-stabilizing normalizer, 36 variation, 268 vector, 1 vector spaces, 2 weak law of large numbers, 93 WLLN, 93 zero-tail sequences, 63