Lecture Notes in Statistics
Edited by P. Bickel, P. Diggle, S. Fienberg, K. Krickeberg, I. Olkin, N. Wermuth, and S. Zeger
166
Springer
New York Berlin Heidelberg Barcelona Hong Kong London Milan Paris Singapore Tokyo
Hira L. Koul
Weighted Empirical Processes in Dynamic Nonlinear Models Second Edition
Springer
Hira L. Koul
Department of Statistics and Probability
Michigan State University
East Lansing, MI 48824
USA
Library of Congress Cataloging-in-Publication Data
Koul, H.L. (Hira L.)
Weighted empirical processes in dynamic nonlinear models / Hira L. Koul. -- 2nd ed.
p. cm. -- (Lecture notes in statistics ; 166)
Rev. ed. of: Weighted empiricals and linear models. c1992.
ISBN 0-387-95476-7 (softcover : alk. paper)
1. Sampling (Statistics) 2. Linear models (Statistics) 3. Regression analysis. 4. Autoregression (Statistics) I. Koul, H.L. (Hira L.). Weighted empiricals and linear models. II. Title. III. Lecture notes in statistics (Springer-Verlag) ; v. 166.
QA276.6 .K68 2002
519.5 dc21 2002020944
ISBN 0-387-95476-7
Printed on acid-free paper.
First edition © 1992 Institute of Mathematical Statistics, Ohio.
© 2002 Springer-Verlag New York, Inc. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. Printed in the United States of America.
9 8 7 6 5 4 3 2 1
SPIN 10874053
www.springer-ny.com
Springer-Verlag New York Berlin Heidelberg
A member of BertelsmannSpringer Science+Business Media GmbH
Preface to the Second Edition

The weak convergence technique via weighted empirical processes proved very useful in the 1990's in advancing the asymptotic theory of so-called robust inference procedures corresponding to non-smooth score functions from linear models to nonlinear dynamic models. This monograph is an expanded version of the monograph Weighted Empiricals and Linear Models, IMS Lecture Notes-Monograph Series, 21, published in 1992, and includes some aspects of this development.

The new inclusions are as follows. Theorems 2.2.4 and 2.2.5 extend Theorem 2.2.3 (old Theorem 2.2b.1) to the case of unbounded random weights. These results are useful in Chapters 7 and 8 when dealing with homoscedastic and conditionally heteroscedastic autoregressive models, an actively researched family of dynamic models in time series analysis in the 1990's. The weak convergence results pertaining to the partial sum process given in Theorems 2.2.6 and 2.2.7 are useful in fitting a parametric autoregressive model, as is expounded in Section 7.7 in some detail. Section 6.6 discusses the related problem of fitting a regression model, using a certain partial sum process. In both sections a certain transform of the underlying process is shown to provide asymptotically distribution free tests.

Other important changes are as follows. Theorem 7.3.1 gives the asymptotic uniform linearity of linear rank statistics in linear autoregressive (LAR) models for any nondecreasing bounded score function ψ, compared to its older version Theorem 7.3b.1, which assumed ψ to be differentiable with a uniformly continuous derivative. The new Section 7.5 is devoted to autoregression quantiles and rank scores. Its
contents provide an important extension of the regression quantiles of Koenker and Bassett to LAR models.

The author gratefully acknowledges the help of Kanchan Mukherjee with Section 8.3, Vince Melfi's help with some TeX problems, and the support of NSF grant DMS 0071619.

East Lansing, Michigan
March 18, 2002
Hira L. Koul
Preface to the First Edition

An empirical process that assigns possibly different non-random (random) weights to different observations is called a weighted (randomly weighted) empirical process. These processes are as basic to linear regression and autoregression models as the ordinary empirical process is to one sample models. Their usefulness in studying linear regression and autoregression models, however, has not been fully exploited. This monograph addresses this question to a large extent.

There is a vast literature in nonparametric inference that discusses inferential procedures based on empirical processes in k-sample location models, but their analogs in autoregression and linear regression models are not readily accessible. This monograph makes an attempt to fill this void. The statistical methodologies studied here extend to these models many of the known results in k-sample location models, thereby giving a unified theory.

By viewing linear regression models via certain weighted empirical processes one is naturally led to new and interesting inferential procedures. Examples include minimum distance estimators of regression parameters and goodness-of-fit tests pertaining to the errors in linear models. Similarly, by viewing autoregression models via certain randomly weighted empirical processes one is naturally led to classes of minimum distance estimators of autoregression parameters and goodness-of-fit tests pertaining to the error distribution.

The introductory Chapter 1 gives an overview of the usefulness of weighted and randomly weighted empirical processes in linear models. Chapter 2 gives general sufficient conditions for the weak convergence of suitably standardized versions of these processes to continuous Gaussian processes. This chapter also contains the proof of the asymptotic uniform linearity of weighted empirical processes based
on the residuals when errors are heteroscedastic and independent. Chapter 3 discusses the asymptotic uniform linearity of linear rank and signed rank statistics when errors are heteroscedastic and independent. It also includes some results about the weak convergence of weighted empirical processes of ranks and signed ranks. Chapter 4 is devoted to the study of the asymptotic behavior of M- and R-estimators of regression parameters under heteroscedastic and independent errors, via weighted empirical processes. A brief discussion about bootstrap approximations to the distribution of a class of M-estimators appears in Section 4.2.2. This chapter also contains a proof of the consistency of a class of robust estimators for certain scale parameters under heteroscedastic errors. In carrying out the analysis of variance of linear regression models based on ranks, one often needs an estimator of the functional …

Then, for every ε > 0,

(2.2.25)    lim_{δ→0} lim sup_n P( sup_{|t−s|≤δ} |W_d(t) − W_d(s)| ≥ ε ) = 0.
2.2.2. V_h-Processes
Proof. Follows from Theorem 2.2.1 (i) applied to η_i = H(X_i), G_i = L_i, 1 ≤ i ≤ n. □
Remark 2.2.3 Note that if H is continuous then n^{-1} Σ_{i=1}^n L_{ni}(t) ≡ t. Therefore,

sup_t [ L_d(t + δ) − L_d(t) ] ≤ n max_i d_{ni}^2 δ.
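The weighted empirical processes discussed here are easy to evaluate numerically. The following minimal sketch (illustrative only; the function names and the particular choices below are ours, not the text's) computes W_d(t) = Σ_{i=1}^n d_{ni}{I(η_{ni} ≤ t) − G_{ni}(t)} for i.i.d. Uniform(0,1) observations, a common d.f. G(t) = t, and weights d_{ni} ≡ n^{-1/2}, which satisfy Σ_i d_{ni}^2 = 1:

```python
import math
import random

def weighted_empirical(etas, weights, G, t):
    """W_d(t) = sum_i d_i * (1{eta_i <= t} - G(t)) for a common d.f. G."""
    return sum(d * ((1.0 if e <= t else 0.0) - G(t))
               for e, d in zip(etas, weights))

random.seed(0)
n = 200
etas = [random.random() for _ in range(n)]      # i.i.d. Uniform(0,1) observations
weights = [1.0 / math.sqrt(n)] * n              # d_ni = n^{-1/2}, so sum d_ni^2 = 1
G = lambda t: t                                 # Uniform(0,1) d.f.

w0 = weighted_empirical(etas, weights, G, 0.0)      # vanishes at t = 0
w1 = weighted_empirical(etas, weights, G, 1.0)      # vanishes at t = 1
w_half = weighted_empirical(etas, weights, G, 0.5)  # O_p(1) in between
```

The process is tied down at t = 0 and t = 1 and remains stochastically bounded in between, in line with the weak convergence results of this section.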
max_{1≤i≤n} |a_{ni}| = O_p(1),    … < ∞.
Then, for every x ∈ ℝ at which G is continuous,

(2.2.29)    …

In addition, if

(2.2.30)    …,
(2.2.31)    G has a uniformly continuous a.e. positive Lebesgue density g,

then

(2.2.32)    …

If, in addition,

(2.2.33)    A_{ni} ⊂ A_{n+1,i}, 1 ≤ i ≤ n; n ≥ 1,
(2.2.34)    ( n^{-1} Σ_{i=1}^n h_{ni}^2 )^{1/2} = α + o_p(1),  α a positive r.v.,

then

(2.2.35)    U_h ⇒ α · B(G),

where B is a Brownian bridge in C[0,1], independent of α.
The proof of (2.2.29) is straightforward using Chebyshev's inequality, while that of (2.2.32) uses a restricted chaining argument and an exponential inequality for martingales with bounded differences. It will be a consequence of the following two lemmas.
Lemma 2.2.5 Under (2.2.27)-(2.2.31), for every ε > 0 and for r = 1, 2,

lim_n P( sup n^{-1/2} Σ_{i=1}^n |h_{ni}|^r |G(y + δ_{ni}) − G(x + δ_{ni})| ≤ 2ε ) = 1,

where the supremum is taken over the set {x, y ∈ ℝ; n^{1/2} |G(x) − G(y)| ≤ ε}.
2. Asymptotic Properties of W.E.P.'s
Proof. Let ε > 0, and set

q(u) := g(G^{-1}(u)), 0 ≤ u ≤ 1;    I_n := max_i |δ_{ni}|;
ω_n := sup{ |q(u) − q(v)|; |u − v| ≤ n^{-1/2} } = sup{ |g(x) − g(y)|; |G(x) − G(y)| ≤ n^{-1/2} };
Δ_n := sup{ |g(y) − g(z)|; |y − z| ≤ I_n }.

By (2.2.31), q is uniformly continuous on [0,1]. Hence, by (2.2.28),

(2.2.36)    Δ_n = o_p(1),    ω_n = o(1).
But

sup_{x,y} n^{-1/2} Σ_{i=1}^n |h_{ni}| |G(y + δ_{ni}) − G(x + δ_{ni})| ≤ …

Lemma 2.2.6 Let {ξ_i, F_i; 1 ≤ i ≤ m} be such that, for constants M < ∞ and L < ∞,

(2.2.37)    E(ξ_i|F_{i−1}) = 0,  |ξ_i| ≤ M a.s.,  1 ≤ i ≤ m,

and let τ ≤ m be a stopping time relative to {F_i} with Σ_{i=1}^τ E(ξ_i^2|F_{i−1}) ≤ L a.s. Then, for every a > 0,

P( | Σ_{i=1}^τ ξ_i | ≥ a ) ≤ 2 exp{ −(a/2M) arcsinh(Ma/2L) }.
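The exponential bound above is easy to sanity-check by Monte Carlo for the simplest bounded martingale-difference sequence, i.i.d. Rademacher (±1) steps with τ ≡ n; the sketch below (our own illustration, with hypothetical constants) compares the empirical tail probability of the partial sum with the arcsinh bound:

```python
import math
import random

def arcsinh_bound(a, M, L):
    """Right-hand side of the lemma: 2*exp(-(a/(2*M)) * arcsinh(M*a/(2*L)))."""
    return 2.0 * math.exp(-(a / (2.0 * M)) * math.asinh(M * a / (2.0 * L)))

random.seed(1)
n, reps = 400, 2000
M = 1.0                      # |xi_i| <= M for Rademacher steps
L = float(n)                 # sum of conditional variances when tau = n
a = 3.0 * math.sqrt(n)       # a three-standard-deviation threshold

hits = 0
for _ in range(reps):
    s = sum(random.choice((-1.0, 1.0)) for _ in range(n))
    if abs(s) >= a:
        hits += 1
empirical = hits / reps
bound = arcsinh_bound(a, M, L)   # about 0.21 here; the true tail is far smaller
```

The bound is conservative at this threshold but valid, which is all the chaining argument below needs.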
Proof. Write E_{i−1}(·) := E(·|F_{i−1}), i ≥ 1. First, consider the case τ = m. Recall the following elementary facts: for all x ∈ ℝ,

(a) exp(x) − x − 1 ≤ 2(cosh x − 1) ≤ x sinh x;
(b) sinh(x)/x is increasing in |x|;
(c) x ≤ exp(x − 1).

Because E(ξ_i|F_{i−1}) ≡ 0, 1 ≤ i ≤ m, and by (2.2.37), for a δ > 0 and for all 1 ≤ i ≤ m,

(2.2.40)    E{ [exp(δξ_i) − 1] | F_{i−1} } ≤ E{ δξ_i sinh(δξ_i) | F_{i−1} },  by (a),
                 ≤ σ_i^2 δ sinh(δM)/M,  by (b),

where σ_i^2 := E(ξ_i^2|F_{i−1}). Use a conditioning argument to obtain

E exp( δ Σ_{i=1}^m ξ_i ) ≤ E[ exp( δ Σ_{i=1}^{m−1} ξ_i ) …
~ E[exp (S~ O. Let an := [n l / 2 IE], the greatest integer less than or equal to n 1/2 IE, and define the grid 1i n := {Yj ;G(Yj) = jm l / 2 ,
1::; j
::; an},
n 2:: 1.
Also let h_{i+} := max(0, h_i) and h_{i−} := h_{i+} − h_i, so that h_i = h_{i+} − h_{i−}. Write

U_h(x) = n^{-1/2} Σ_{i=1}^n h_{i+} Z_i(x) − n^{-1/2} Σ_{i=1}^n h_{i−} Z_i(x) =: U_h^+(x) − U_h^-(x), say.
Thus, to prove (b)(i), by the triangle inequality it suffices to prove it for the U_h^+ and U_h^- processes. The details of the proof will be given for the U_h^+ process only; those for U_h^- are similar. Next, let, for k ≥ 1 and x, y ∈ ℝ,
D_k^2(x, y) := Σ_{i=1}^k h_{i+}^2 E{ [Z_i(x) − Z_i(y)]^2 | A_i },
and define th e sequence of stopping times
+ ._ { . Dk(x , y)) < 3EC2 n } . .  n 1\ max k 2:: 1, max d2 ( x, yE 1i n x ,Y
Tn
Observe that T;t ::; n. To adapt the present situation to that of the Pollard , we first prove that P(T;t < n) 7 0 (see (2.2.41) below). This allows one to work with n l / 2(T;t)1 /2U\ instead of u;t . By r« Lemma 2.2.6 and the fact that arcsinh(x) is increasing and concave in x, one obtains that if X,Y in 1i n are such that d2 (x , y ) 2:: tEn  1/ 2 then P(nl/2(T:)1/2IU~(x)  U~(Y)I 2:: t) 2
::; 2 exp { 2Cd 2t(2 , y) EarCSinh(1 /(6E c))}, X
for all t > O.
This enables one to carry out the chaining argument as in Pollard. What remains to be done is to connect the points in ℝ with the points in H_n, which will be done in (2.2.42) below. We shall now prove

(2.2.41)    P( τ_n^+ < n ) → 0.
Proof of (2.2.41). For y_j, y_k in H_n with y_j < y_k, d^2(y_j, y_k) ≥ (k − j) ε n^{-1/2}. Hence, using the fact that (h_{i+})^2 ≤ h_i^2,

D_n^2(y_j, y_k)/d^2(y_j, y_k)
  ≤ Σ_{i=1}^n h_i^2 [ G(y_k + δ_i) − G(y_j + δ_i) ] {(k − j)ε}^{-1} n^{1/2}
  ≤ {(k − j)ε}^{-1} n^{1/2} Σ_{i=1}^n h_i^2 Σ_{r=j}^{k−1} [ G(y_{r+1} + δ_i) − G(y_r + δ_i) ].
Now apply Lemma 2.2.5 with r = 2 to obtain

P( max_{1≤j<k≤a_n} D_n^2(y_j, y_k)/d^2(y_j, y_k) < 3εC^2 n ) → 1.

This completes the proof of (2.2.41). Next, for each x ∈ ℝ, let y_{j_x} denote the point in H_n that is the closest to x in the d-metric among the points in H_n that satisfy y_{j_x} ≤ x. We shall now prove: for every ε > 0,

(2.2.42)    P( sup_x |U_h^+(x) − U_h^+(y_{j_x})| > 8Cε ) → 0.
Proof of (2.2.42). Now write V_n^+, J_n^+ for V_n, J_n when {h_i} in these quantities is replaced by {h_{i+}}. The definition of y_{j_x}, G increasing, and the fact that h_{i+} ≤ |h_i| for all i, imply that

sup_x | J_n^+(x) − J_n^+(y_{j_x}) | ≤ …
An application of Lemma 2.2.5 with r = 1 now yields that

(2.2.43)    P( sup_x | J_n^+(x) − J_n^+(y_{j_x}) | > 4Cε ) → 0.
But h_{i+} ≥ 0, 1 ≤ i ≤ n, implies that V_n^+ is nondecreasing in x. Therefore, using the definition of y_{j_x},

[ U_n^+(y_{j_x−1}) − U_n^+(y_{j_x}) ] + J_n^+(y_{j_x−1}) − J_n^+(y_{j_x})
  = V_n^+(y_{j_x−1}) − V_n^+(y_{j_x})
  ≤ V_n^+(x) − V_n^+(y_{j_x})
  ≤ V_n^+(y_{j_x+1}) − V_n^+(y_{j_x})
  = [ U_n^+(y_{j_x+1}) − U_n^+(y_{j_x}) ] + J_n^+(y_{j_x+1}) − J_n^+(y_{j_x}).

Hence,

(2.2.44)    sup_x | V_n^+(x) − V_n^+(y_{j_x}) |
  ≤ 2 max_{1≤j≤a_n} | U_n^+(y_{j+1}) − U_n^+(y_j) | + 2 max_{1≤j≤a_n} | J_n^+(y_{j+1}) − J_n^+(y_j) |.
Thus, (2.2.42) will follow from (2.2.43), (2.2.44) and

(2.2.45)    P( max_{1≤j≤a_n} | U_n^+(y_{j+1}) − U_n^+(y_j) | > Cε ) → 0.

In view of (2.2.41), to prove (2.2.45) it suffices to show that

(2.2.46)    P( max_{1≤j≤a_n} | U^+_{τ_n^+}(y_{j+1}) − U^+_{τ_n^+}(y_j) | > Cε ) → 0,

where U^+_{τ}(x) := n^{-1/2} Σ_{i=1}^{τ} h_{i+} Z_i(x).
But the L.H.S. of (2.2.46) is bounded above by

Σ_{j=1}^{a_n} P( | Σ_{i=1}^{τ_n^+} h_{i+}[ Z_i(y_{j+1}) − Z_i(y_j) ] | > Cε n^{1/2} ).

Now apply Lemma 2.2.6 with ξ_i ≡ h_{i+}[Z_i(y_{j+1}) − Z_i(y_j)], F_{i−1} ≡ A_i, τ ≡ τ_n^+, M = C, a = Cε n^{1/2}, m = n. By the definition of τ_n^+, L = 3C^2 ε^2 n^{1/2}. Hence, by Lemma 2.2.6,

P( | Σ_{i=1}^{τ_n^+} h_{i+}[ Z_i(y_{j+1}) − Z_i(y_j) ] | > Cε n^{1/2} ) ≤ 2 exp[ −(ε n^{1/2}/2) arcsinh(1/6ε) ].
Since this bound does not depend on j, it follows that

L.H.S. (2.2.46) ≤ 2 ε^{-1} n^{1/2} exp[ −(ε n^{1/2}/2) arcsinh(1/6ε) ] → 0.
This completes the proof of (2.2.42) for U_h^+. As mentioned earlier, the proof of (2.2.42) for U_h^- is exactly similar, thereby completing the proof of (b)(i). Adapt the above proof of (b)(i) with δ_i ≡ 0 to conclude (b)(ii). Note that (b)(ii) holds solely under (2.2.27) and the assumption that G is continuous and strictly increasing; the other assumptions are not required here. The proof of (2.2.32) is now complete. The claim (2.2.35) follows from (b)(ii) above, Lemma 9.1.3 of the Appendix and the Cramér-Wold device. □

As noted in the proof of the above theorem, the weak convergence of U_h holds under only (2.2.27), (2.2.34) and the assumption that G is continuous and strictly increasing. For easy reference later on we state this result as
Corollary 2.2.3 Let the setup of Theorem 2.2.3 hold. Assume that G is continuous and strictly increasing and that (2.2.27), (2.2.34) hold. Then, U_h ⇒ α · B(G), where B is a Brownian bridge in C[0,1], independent of α.

Consider the process Ũ_h(t) := U_h(G^{-1}(t)), 0 ≤ t ≤ 1. Now work with the metric |t − s|^{1/2} on [0,1]. Upon repeating the arguments in the proof of the above theorem, modified appropriately, one can readily conclude the following
Corollary 2.2.4 Let the setup of Theorem 2.2.3 hold. Assume that G is continuous and that (2.2.27), (2.2.34) hold. Then {Ũ_h} ⇒ α·B, where B is a Brownian bridge in C[0,1], independent of α.

Remark 2.2.4 Suppose that in Theorem 2.2.2 the r.v.'s η_{n1}, …, η_{nn} are i.i.d. Uniform [0,1]. Then, upon choosing h_{ni} ≡ n^{1/2} d_{ni}, ζ_{ni} ≡ G^{-1}(η_{ni}), one sees that U_h ≡ W_d(G), provided G is continuous. Moreover, the condition (D) is a priori satisfied, (B) is equivalent to (2.2.27), and (N1) implies (2.2.34) trivially. Consequently, for this
special setup, Theorem 2.2.2 is a special case of Corollary 2.2.4. But in general these two results serve different purposes. Theorem 2.2.1 is the most general for the independent setup given there and cannot be deduced from Theorem 2.2.3. The next theorem gives an analog of the above theorem when the weights are not necessarily bounded r.v.'s, nor do they need to be square integrable.
Theorem 2.2.4 Suppose the setup in (2.2.26) and the assumptions (2.2.28), (2.2.30), (2.2.31), and (2.2.34) hold. Then (2.2.32) continues to hold. If, in addition, (2.2.33) holds, and

(2.2.47)    for each n ≥ 1, {h_{ni}; 1 ≤ i ≤ n} are square integrable,

then (2.2.35) continues to hold.

A proof of this theorem appears in Koul and Ossiander (1994). This proof uses the ideas of bracketing and chaining and is a bit involved, hence it is not included here. We are rather interested in its numerous applications to time series models, as will be seen later in Chapters 7 and 8 of this monograph.

The next theorem gives an analogue of the above two results useful in the presence of heteroscedasticity. Accordingly, let τ_{ni}, 1 ≤ i ≤ n, be another array of r.v.'s independent of ζ_{ni}, for each 1 ≤ i ≤ n. Let {A_{ni}} be an array of sub σ-fields such that A_{ni} ⊂ A_{n,i+1}, 1 ≤ i ≤ n, n ≥ 1; (h_{n1}, δ_{n1}, τ_{n1}) is A_{n1}-measurable; the r.v.'s {ζ_{n1}, …, ζ_{n,j−1}; (h_{ni}, δ_{ni}, τ_{ni}), 1 ≤ i ≤ j} are A_{nj}-measurable, 2 ≤ j ≤ n; and ζ_{nj} is independent of A_{nj}, 1 ≤ j ≤ n. Define,
(2.2.48)    V_h(x) := n^{-1/2} Σ_{i=1}^n h_{ni} I( ζ_{ni} ≤ x + x τ_{ni} + δ_{ni} ),
            J_h(x) := n^{-1/2} Σ_{i=1}^n h_{ni} G( x + x τ_{ni} + δ_{ni} ),    x ∈ ℝ.
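For concreteness, the processes in (2.2.48) can be computed directly; the following sketch (our own illustration, with standard normal errors, unit weights, and small hypothetical perturbations τ_{ni}, δ_{ni}) evaluates V_h and its compensator J_h at a point:

```python
import math
import random

def Vh_Jh(x, zetas, h, tau, delta, G):
    """V_h(x) and J_h(x) of (2.2.48): weighted indicator sum and its compensator."""
    rn = math.sqrt(len(zetas))
    Vh = sum(hi * (1.0 if z <= x + x * ti + di else 0.0)
             for z, hi, ti, di in zip(zetas, h, tau, delta)) / rn
    Jh = sum(hi * G(x + x * ti + di)
             for hi, ti, di in zip(h, tau, delta)) / rn
    return Vh, Jh

random.seed(2)
n = 500
Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))   # N(0,1) d.f. as G
zetas = [random.gauss(0.0, 1.0) for _ in range(n)]
h = [1.0] * n                                        # unit weights
tau = [0.01 * random.random() for _ in range(n)]     # small scale perturbations
delta = [0.01 * random.random() for _ in range(n)]   # small location shifts
V, J = Vh_Jh(0.0, zetas, h, tau, delta, Phi)
```

The difference V_h − J_h is the centered, martingale part of the process, and it stays O_p(1), which is the behavior the results of this section quantify.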
Next, we introduce the additional needed assumptions.

(2.2.49)    G has a Lebesgue density g satisfying the following: g > 0 on the set {x : 0 < G(x) < 1}, and g(G^{-1}(u)) is uniformly continuous in 0 ≤ u ≤ 1.

(2.2.50)    sup_{x∈ℝ} (1 + |x|) g(x) < ∞.

(2.2.51)    lim_{u→0} sup_{x∈ℝ} | x g(x(1 + u)) − x g(x) | = 0.

(2.2.52)    E( n^{-1} Σ_{i=1}^n h_{ni}^2 ) = O(1).

(2.2.53)    max_{1≤i≤n} |τ_{ni}| = o_p(1).

(2.2.54)    n^{1/2} E[ n^{-1} Σ_{i=1}^n { h_{ni}^2 ( |δ_{ni}| + |τ_{ni}| ) } ]^2 = O(1).

(2.2.55)    n^{-1/2} Σ_{i=1}^n | h_{ni} τ_{ni} | = O_p(1).
Remark 2.2.5 Note that (2.2.50) implies that for some constant 0 < C < ∞, …

P[ |B_{nj,1}| > ε ] ≤ C ε^{-4} ( n^{-2} Σ_{i=1}^n E h_{ni}^4 + E[ n^{-1} Σ_{i=1}^n h_{ni}^2 | G(x_j(1 + τ_{ni}) + δ_{ni}) − G(x_j) | ]^2 ).
The first term in the above inequality is free from j. To deal with the second term, use the boundedness of g, (2.2.56), and a telescoping argument to obtain that, for all 1 ≤ j ≤ r_n, 1 ≤ i ≤ n,

| G(x_j(1 + τ_{ni}) + δ_{ni}) − G(x_{j−1}) |
  ≤ | G(x_j) − G(x_{j−1}) | + | G(x_j(1 + τ_{ni}) + δ_{ni}) − G(x_j(1 + τ_{ni})) | + | G(x_j(1 + τ_{ni})) − G(x_j) |
  ≤ C [ δ n^{-1/2} + |δ_{ni}| + |τ_{ni}| ].

Therefore, … Hence, using r_n = O(n^{1/2}),

P( max_{1≤j≤r_n} |B_{nj,1}| > ε )
  ≤ C ( n^{-1/2} (1 + δ^2) n^{-1} Σ_{i=1}^n E h_{ni}^2 + n^{1/2} E[ n^{-1} Σ_{i=1}^n h_{ni}^2 ( |δ_{ni}| + |τ_{ni}| ) ]^2 ),

which in turn, together with (2.2.52) and (2.2.54), implies that max_{1≤j≤r_n} |B_{nj,1}| = o_p(1). A similar statement holds for B_{nj,2}. Next, consider

B_{nj,3} = n^{-1/2} Σ_{i=1}^n h_{ni} [ G(x_j(1 + τ_{ni}) + δ_{ni}) − G(x_{j−1}(1 + τ_{ni}) + δ_{ni}) ] …
… > 0, a.e.

(F3)    sup_{x∈ℝ} |x f(x)| ≤ k < ∞.

Note that (F1) implies that F is strictly increasing and that f is bounded, and that (F2) implies that …
Corollary 2.3.1 Let X_{n1}, …, X_{nn} be i.i.d. F. In addition, suppose that (N1), (N2), (2.3.6), (2.3.7) and (F1) hold. Then (2.3.8) holds with q_{ni} = f(F^{-1}). If, in addition, (F2) holds, then (2.3.9) holds with f_{ni} ≡ f. □

Corollary 2.3.2 Let X_{n1}, …, X_{nn} be i.i.d. F. In addition, suppose that (N1), (N2), (2.3.6), (2.3.7), (F1) and (F3) hold. Then (2.3.21) holds with H = F and q_{ni} = f(F^{-1}). If, in addition, (F2) holds, then (2.3.28) holds with f_{ni} ≡ f. □
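The score q_{ni} = f(F^{-1}) appearing in these corollaries has a simple closed form for the logistic distribution, where f = F(1 − F), so that f(F^{-1}(u)) = u(1 − u). The sketch below (our own example, not from the text) checks this identity numerically against a direct evaluation from the density and quantile function:

```python
import math

def q_logistic(u):
    """f(F^{-1}(u)) for the logistic d.f.; closed form u*(1 - u)."""
    return u * (1.0 - u)

# direct evaluation of f(F^{-1}(u)) from the density and quantile function
f = lambda x: math.exp(-x) / (1.0 + math.exp(-x)) ** 2   # logistic density
F_inv = lambda u: math.log(u / (1.0 - u))                # logistic quantile

checks = [(u, abs(q_logistic(u) - f(F_inv(u)))) for u in (0.1, 0.25, 0.5, 0.9)]
```

Both evaluations agree to machine precision, so either form can be used when computing the limiting quantities of this section for logistic errors.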
We shall now apply the above results to the model (1.1.1) and the {V_j}-processes of (1.1.2). The results thus obtained are useful in studying the asymptotic distributions of certain goodness-of-fit tests and of a class of M-estimators of β of (1.1.1) when there is also an unknown scale parameter. We need the following assumption about the design matrix X:

(NX)    max_{1≤i≤n} x_{ni}' (X'X)^{-1} x_{ni} = o(1).
This is Noether's condition for the design matrix X. Now, let

(2.3.30)    A := (X'X)^{-1/2},    D := XA,
            q'(t) := (q_{n1}(t), …, q_{nn}(t)),    Λ(t) := diag(q(t)),
            Γ_1(t) := A X' Λ(t) X A,    Γ_2(t) := n^{-1/2} H^{-1}(t) D' q(t),    0 ≤ t ≤ 1.
Write D = ((d_{ij})), 1 ≤ i ≤ n, 1 ≤ j ≤ p, and let d^{(j)} denote the jth column of D. Note that D'D = I_{p×p}. This in turn implies that

(2.3.31)    (N1) is satisfied by d^{(j)} for all 1 ≤ j ≤ p.

Moreover, with a^{(j)} denoting the jth column of A,

(2.3.32)    max_i d_{ij}^2 = max_i (x_i' a^{(j)})^2 ≤ max_i Σ_{j=1}^p (x_i' a^{(j)})^2
            = max_i x_i' ( Σ_{j=1}^p a^{(j)} a^{(j)'} ) x_i = max_i x_i' (X'X)^{-1} x_i = o(1),  by (NX).
Let

(2.3.33)    L_j(t) := Σ_{i=1}^n d_{ij}^2 F_i( H^{-1}(t) ),    0 ≤ t ≤ 1,  1 ≤ j ≤ p.

We are now ready to state
2.3. AUL of residual W.E.P.'s
Theorem 2.3.3 Let {(x_{ni}', Y_{ni}), 1 ≤ i ≤ n}, β, {F_{ni}, 1 ≤ i ≤ n} be as in the model (1.1.1). In addition, assume that {F_{ni}} satisfy (2.3.4), (2.3.5), and that (C*) is satisfied by each L_j of (2.3.33), 1 ≤ j ≤ p. Then, for every 0 < b < ∞,

(2.3.34)    sup || A{ V(H^{-1}(t), β + Au) − V(H^{-1}(t), β) } − Γ_1(t) u || = o_p(1),

where the supremum is over 0 ≤ t ≤ 1, ||u|| ≤ b. If, in addition, H is strictly increasing for all n ≥ 1, then for every 0 < b < ∞,

(2.3.35)    …

If, in addition, H is strictly increasing for every n ≥ 1, then

(2.3.38)    A[ V(ax, β + Auγ) − V(γx, β) ] = Γ_1(H(x)) u − Γ_2(H(x)) v + u_p(1).

In (2.3.37) and (2.3.38), u_p(1) are sequences of stochastic processes in (t, x, u, v) tending to zero uniformly in probability over 0 ≤ t ≤ 1, −∞ ≤ x ≤ ∞, ||u|| ≤ b and |v| ≤ b.
Proof of Theorem 2.3.3. Apply Theorem 2.3.1 to X_i = Y_i − x_i'β, c_i' = x_i'A, 1 ≤ i ≤ n. Then F_i is the d.f. of X_i, and the jth components of A V(H^{-1}(t), β + Au) and A V(H^{-1}(t), β) are S_d(t, u), S_d(t, 0) of (2.3.1), respectively, with d_i = d_{ij} of (2.3.30), 1 ≤ i ≤ n, 1 ≤ j ≤ p. Therefore (2.3.34) follows by p applications of (2.3.8), one for each d^{(j)}, provided the assumptions of Theorem 2.3.1 are satisfied. But in view of (2.3.31) and (2.3.32), the assumption (NX) implies (N1), (N2) for d^{(j)}, 1 ≤ j ≤ p. Also, (2.3.6) for the specified {c_i} is equivalent to (NX). Finally, the C-S inequality and (2.3.31) verify (2.3.7) in the present case. This makes Theorem 2.3.1 applicable and hence (2.3.34) follows. □

Proof of Theorem 2.3.4. Follows from Theorem 2.3.2 when applied to X_i = (Y_i − x_i'β) γ^{-1}, c_i' = x_i'A, 1 ≤ i ≤ n, in a fashion similar to the proof of Theorem 2.3.3 above. □

The following corollaries follow from Corollaries 2.3.1 and 2.3.2 in the same way as the above Theorems 2.3.3 and 2.3.4 follow from Theorems 2.3.1 and 2.3.2. They are stated for easy reference later on. Let C(b) := { s ∈ ℝ^p : ||A^{-1}(s − β)|| ≤ b }.

Corollary 2.3.3 Suppose that the model (1.1.1) with F_{ni} ≡ F holds. Assume that the design matrix X and the d.f. F satisfy (NX) and (F1). Then, for every 0 < b < ∞,

(2.3.39)    A[ V(F^{-1}(t), s) − V(F^{-1}(t), β) ] = f(F^{-1}(t)) A^{-1}(s − β) + u_p(1),

where u_p(1) is a sequence of stochastic processes in (t, s) tending to zero uniformly in probability over 0 ≤ t ≤ 1, s ∈ C(b). If, in addition, F satisfies (F2), then

(2.3.40)    || A{ V(x, s) − V(x, β) } − f(x) A^{-1}(s − β) || = u_p(1),

where u_p(1) is a sequence of stochastic processes in (x, s) tending to zero uniformly in probability over −∞ ≤ x ≤ ∞, s ∈ C(b). □
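The asymptotic uniform linearity in (2.3.39)-(2.3.40) can be illustrated by simulation in a one-dimensional design, where A = (Σ_i x_i^2)^{-1/2}. The sketch below is entirely our own (hypothetical design points, N(0,1) errors); it compares the left-hand side at x = 0 with the predicted linear term f(0)·u:

```python
import math
import random

random.seed(3)
n = 2000
beta = 1.0
xs = [random.random() + 0.5 for _ in range(n)]          # positive design points
ys = [beta * x + random.gauss(0.0, 1.0) for x in xs]    # Y_i = x_i*beta + e_i

A = 1.0 / math.sqrt(sum(x * x for x in xs))             # (X'X)^{-1/2} for p = 1

def V(x0, t):
    """Weighted residual empirical: sum_i x_i * 1{Y_i - x_i t <= x0}."""
    return sum(xi * (1.0 if y - xi * t <= x0 else 0.0) for xi, y in zip(xs, ys))

u, x0 = 1.0, 0.0
lhs = A * (V(x0, beta + A * u) - V(x0, beta))
f0 = 1.0 / math.sqrt(2.0 * math.pi)   # N(0,1) density at 0; AUL predicts lhs ~ f0*u
```

For n this large the discrepancy between the two sides is already small, which is what the u_p(1) terms in the corollaries assert asymptotically.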
Corollary 2.3.4 Suppose that the model (2.3.36) with F_{ni} ≡ F holds and that the design matrix X and the d.f. F satisfy (NX), (F1) and (F3). Then (2.3.37) holds with

(2.3.41)    Γ_1(t) = f(F^{-1}(t)) I_{p×p},    Γ_2(t) = F^{-1}(t) f(F^{-1}(t)) A X' 1,    0 ≤ t ≤ 1.

If, in addition, F satisfies (F2), then (2.3.38) holds with Γ_j(H) = Γ_j(F), j = 1, 2, i.e.,

A[ V(ax, β + Auγ) − V(γx, β) ] = f(x) u − x f(x) v + u_p(1),

where v is as in (2.3.37) and u_p(1) is as in (2.3.38). □
We end this section by stating an AUL result about the ordinary residual empirical process H_n of (1.2.1) for easy reference later on.

Corollary 2.3.5 Suppose that the model (1.1.1) with F_{ni} ≡ F holds. Assume that the design matrix X and the d.f. F satisfy (NX) and (F1). Then, for every 0 < b < ∞, … □
2.4 Some Additional Results for W.E.P.'s

For the sake of general interest, here we state some further results about W.E.P.'s. To begin with, we have
2.4.1 Laws of the iterated logarithm
In this subsection, we assume that

(2.4.1)    …

Define

(2.4.2)    σ_n^2 := Σ_{i=1}^n d_i^2,    U_n(t) := Σ_{i=1}^n d_i { I(η_i ≤ t) − G_i(t) },
           Δ_n(t) := U_n(t) / { 2 σ_n^2 ℓnℓn σ_n^2 }^{1/2},    n ≥ 1,  0 ≤ t ≤ 1.

Let r(s, t) := s ∧ t − st, 0 ≤ s, t ≤ 1, and let H(r) be the reproducing kernel Hilbert space generated by the kernel r, with ||·||_r denoting the associated norm on H(r). Let

K := { f ∈ H(r) ; ||f||_r ≤ 1 },

and let ρ denote the uniform metric.

Theorem 2.4.1 If η_1, η_2, … are i.i.d. uniform [0,1] r.v.'s and d_1, d_2, … are any real numbers satisfying

lim_n σ_n^2 = ∞,    lim_n ( max_{1≤i≤n} d_i^2 ) ℓnℓn σ_n^2 / σ_n^2 = 0,

then P( ρ(Δ_n, K) → 0 and the set of limit points of {Δ_n} is K ) = 1. □
This theorem was proved by Vanderzanden (1980, 1984) using some of the results of Kuelbs (1976) and certain martingale properties of Δ_n.
Theorem 2.4.2 Let η_1, η_2, … be independent nonnegative random variables. Let {d_i} be any real numbers. Then

lim sup_n sup_{t>0} σ_n^{-1} |U_n(t)| < ∞  a.s. □
A proof of this appears in Marcus and Zinn (1984). Actually, they prove some other interesting results about w.e.p.'s with weights that are r.v.'s and functions of t. Most of their results, however, concern the bounded law of the iterated logarithm. They also proved the following inequality, which is similar to, yet a generalization of, the classical Dvoretzky-Kiefer-Wolfowitz exponential inequality for the ordinary empirical process. Their proof is valid for triangular arrays and real r.v.'s.

Exponential inequality. Let X_{n1}, X_{n2}, …, X_{nn} be independent r.v.'s with respective d.f.'s F_{n1}, …, F_{nn}, and let {d_{ni}} be any real numbers satisfying (N1). Then, for every λ > 0 and every n ≥ 1,

P( sup_{x∈ℝ} Σ_{i=1}^n d_{ni}{ I(X_{ni} ≤ x) − F_{ni}(x) } ≥ λ ) ≤ [ 1 + (8π)^{1/2} λ ] exp(−λ^2/8). □
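The Marcus-Zinn inequality can also be checked by Monte Carlo. The sketch below (our own, with weights d_{ni} ≡ n^{-1/2} satisfying (N1) and i.i.d. Uniform(0,1) observations) compares the empirical tail of the supremum with the stated bound at λ = 6:

```python
import math
import random

def mz_bound(lam):
    """Right-hand side of the inequality: (1 + sqrt(8*pi)*lam) * exp(-lam^2/8)."""
    return (1.0 + math.sqrt(8.0 * math.pi) * lam) * math.exp(-lam * lam / 8.0)

random.seed(4)
n, reps, lam = 200, 500, 6.0
d = 1.0 / math.sqrt(n)          # equal weights satisfying (N1)
hits = 0
for _ in range(reps):
    xs = sorted(random.random() for _ in range(n))
    # the supremum of d*(#{X_i <= x} - n*x) over x is attained at the jump points
    sup = max(max(d * ((i + 1) - n * x), d * (n * x - i))
              for i, x in enumerate(xs))
    if sup >= lam:
        hits += 1
empirical = hits / reps
```

At this λ the bound is already below 1/2 while the empirical tail is essentially zero, so the inequality holds with a large margin, as one would expect from a DKW-type bound.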
The above two theorems immediately suggest some interesting probabilistic questions. For example, is Vanderzanden's result valid for non-identically distributed r.v.'s {η_i}? Or can one remove the assumption of nonnegative {η_i} in Theorem 2.4.2?
2.4.2 Weak convergence of W.E.P.'s in D[0,1]^p in the ρ_q-metric, and an embedding result

Next, we state a weak convergence result for multivariate r.v.'s. For this we revert back to triangular arrays. Now suppose that η_{ni} ∈ [0,1]^p, 1 ≤ i ≤ n, are independent r.v.'s of dimension p. Define

(2.4.3)    W_d(t) := Σ_{i=1}^n d_{ni}{ I(η_{ni} ≤ t) − G_{ni}(t) },    t ∈ [0,1]^p.

Let G_{nij} be the jth marginal of G_{ni}, 1 ≤ i ≤ n, 1 ≤ j ≤ p.

Theorem 2.4.3 Let {η_{ni}, 1 ≤ i ≤ n} be independent p-variate r.v.'s and {d_{ni}} satisfy (N1) and (N2). Moreover, suppose that for each 1 ≤ j ≤ p,

lim_{δ→0} lim sup_n sup_{0≤t≤1−δ} Σ_{i=1}^n d_{ni}^2 { G_{nij}(t + δ) − G_{nij}(t) } = 0.
Then, for every ε > 0, … Moreover, W_d ⇒ some W on (D[0,1]^p, ρ) if, and only if, for each s, t ∈ [0,1]^p, Cov(W_d(s), W_d(t)) → Cov(W(s), W(t)) =: C(s, t). In this case W is necessarily a Gaussian process, P(W ∈ C[0,1]^p) = 1, and W(0) = 0 = W(1). □
Theorem 2.4.3 is essentially proved in Vanderzanden (1980), using results of Bickel and Wichura (1971). Mehra and Rao (1975), Withers (1975), and Koul (1977), among others, obtain weak convergence results for {W_d}-processes when {η_{ni}} are weakly dependent. See Dehling and Taqqu (1989) and Koul and Mukherjee (1992) for similar results when {η_{ni}} are long range dependent. Shorack (1971) proved the weak convergence of the W_d/q-process in the ρ-metric, where q ∈ Q, with

Q := { q ; q a continuous function on [0,1], q ≥ 0, q(t) = q(1 − t), q(t)↑ and t^{-1/2} q(t)↓ for 0 < t < 1/2, ∫_0^1 q^{-2}(t) dt < ∞ }.
These derivations, together with Lemma 3.2.2 and (3.2.35), complete the proof of (3.2.36). □

We combine Theorem 3.2.1 and Lemmas 3.2.2 and 3.2.3 to obtain the following.

Theorem 3.2.4 Under the notation and assumptions of Theorem 3.2.1, Lemmas 3.2.2 and 3.2.3, for every 0 < b < ∞,

(3.2.40)    sup | Z_d(t, u) − Z_d(t, 0) − u' Σ_{i=1}^n (d_{ni} − d̄_n(t)) c_{ni} q_{ni}(t) | = o_p(1),

(3.2.41)    sup | T_d( …

(b)    lim sup_n n^{1/2} |d̄_n(v)| < ∞.
Then,

(3.4.16)    …,

where μ_d is as in (3.4.8).

Proof. The proof of this theorem is similar to that of Theorem 3.4.1, so we shall be brief. To begin with, by (3.4.9) and (N2), it suffices to prove that {τ(v)}^{-1} Γ_d(v) →_d N(0, 1), where Γ_d(v) := V_d(v) − μ_d(v). Apply Lemma 3.4.1 above to the r.v.'s |X_{n1}|, …, |X_{nn}| to conclude that

sup_{0<t<1} | J(J_n^{-1}(t)) − t | = o_p(1).
Finally, with K_d(t) := Σ_{i=1}^n (d_{ni} − d̄_n(t)){ I(X_{ni} ≤ H^{-1}(t)) − L_{ni}(t) }, assume that

(3.4.21)    C(t, s) := lim_n Cov( K_d(t), K_d(s) )
                     = lim_n Σ_{i=1}^n (d_{ni} − d̄_n(t))(d_{ni} − d̄_n(s)) L_{ni}(s)(1 − L_{ni}(t))

exists for all 0 ≤ s ≤ t ≤ 1. Then, Z_d − μ_d converges weakly to a mean zero, covariance C, continuous Gaussian process on [0,1], tied down at 0 and 1. □

Remark 3.4.5 In (3.4.17), without loss of generality it may be assumed that n^{-1} Σ_i ℓ_{ni}(s) ≡ 1, 0 ≤ s ≤ 1. For, if (3.4.17) holds for some {ℓ_{ni}, 1 ≤ i ≤ n}, then it also holds for ℓ_{ni}(s) ≡ ℓ*_{ni}(s) := n^{1/2}[ L_{ni}(s + n^{-1/2}) − L_{ni}(s) ], 1 ≤ i ≤ n, 0 ≤ s ≤ 1. Because n^{-1} Σ_{i=1}^n L_{ni}(s) ≡ s, n^{-1} Σ_{i=1}^n ℓ*_{ni}(s) ≡ 1. □

Remark 3.4.6 Conditions (C*), (N1) and (3.4.18) may be replaced by the condition (B), because, in view of the previous remark,

n^{1/2} |d̄_n(t)| = | n^{-1/2} Σ_{i=1}^n d_{ni} ℓ_{ni}(t) | ≤ n^{1/2} max_i |d_{ni}|,    0 ≤ t ≤ 1. □
Remark 3.4.7 In the case the F_{ni} have densities f_{ni}, one can choose

ℓ_{ni} = f_{ni}(H^{-1}) / n^{-1} Σ_{j=1}^n f_{nj}(H^{-1}),    1 ≤ i ≤ n.
Remark 3.4.8 In the case F_{ni} ≡ F, F a continuous and strictly increasing d.f., L_{ni}(t) ≡ t, ℓ_{ni}(t) ≡ 1, so that (C*) and (3.4.17)-(3.4.20) are trivially satisfied. Moreover, C(s, t) = s(1 − t), 0 ≤ s ≤ t ≤ 1, so that (3.4.21) is satisfied. Thus Theorem 3.4.3 includes Theorem V.3.5.1 of Hájek and Šidák (1967). □

3. Linear Rank and Signed Rank Statistics

Theorem 3.4.4 (Weak convergence of Z_d^+). Let X_{n1}, …, X_{nn} be independent r.v.'s with respective d.f.'s F_{n1}, …, F_{nn}, and let d_{n1}, …, d_{nn} be real numbers. Assume that (N1) and (N2) hold and that the following hold:

(3.4.22)    lim_{δ↓0} lim sup_n sup …
… sup{ |…(x) − Φ(x)|; x ∈ ℝ } = O(n^{-1/2}), where Φ is the d.f. of a N(0,1) r.v. (see, e.g., Feller (1966), Ch. XVI). Babu and Singh (1983, 1984), among others, pointed out that this phenomenon is shared by a large class of statistics. For further reading on bootstrapping we refer the reader to Efron (1982) and Hall (1992).

We now turn to the problem of bootstrapping M-estimators in a linear regression model. For the sake of clarity we shall restrict our attention to a simple linear regression model only. Our main purpose is to show how a certain weighted empirical sampling distribution naturally helps to overcome some inherent difficulties in defining bootstrap M-estimators. What follows is based on the work of Lahiri (1989). No proofs will be given, as they involve intricate technicalities of the Edgeworth expansion for independent non-identically distributed r.v.'s.

Accordingly, assume that {e_i, i ≥ 1} are i.i.d. F r.v.'s, {x_{ni}, 1 ≤ i ≤ n} are the known design points, and {Y_{ni}, 1 ≤ i ≤ n} are observable
4.2.2. Bootstrap approximations
r.v.'s such that, for a β ∈ ℝ,

(4.2.21)    Y_{ni} = x_{ni} β + e_i,    1 ≤ i ≤ n.

The score function ψ is assumed to satisfy

(4.2.22)    ∫ ψ dF = 0.

Let β̂_n be an M-estimator obtained as a solution t of

(4.2.23)    Σ_{i=1}^n x_{ni} ψ( Y_{ni} − x_{ni} t ) = 0,

and let F̂_n be an estimator of F based on the residuals ê_{ni} := Y_{ni} − x_{ni} β̂_n, 1 ≤ i ≤ n. Let {e*_{ni}, 1 ≤ i ≤ n} be i.i.d. F̂_n r.v.'s and define

(4.2.24)    Y*_{ni} := x_{ni} β̂_n + e*_{ni},    1 ≤ i ≤ n.

The bootstrap M-estimator β̂*_n is defined to be a solution t of

(4.2.25)    Σ_{i=1}^n x_{ni} ψ( Y*_{ni} − x_{ni} t ) = 0.
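For a nondecreasing bounded score such as Huber's ψ and positive design points, the score equation above is monotone in t and can be solved by simple bisection. The following sketch is our own illustration (data, constants, and function names are hypothetical), not the text's procedure:

```python
import random

def huber_psi(x, c=1.345):
    """Huber's bounded, nondecreasing score function."""
    return max(-c, min(c, x))

def m_estimate(xs, ys, psi, lo=-10.0, hi=10.0, tol=1e-10):
    """Solve sum_i x_i * psi(y_i - x_i*t) = 0 by bisection; the score is
    nonincreasing in t when psi is nondecreasing and all x_i > 0."""
    def score(t):
        return sum(x * psi(y - x * t) for x, y in zip(xs, ys))
    a, b = lo, hi
    while b - a > tol:
        mid = 0.5 * (a + b)
        if score(mid) > 0.0:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)

random.seed(5)
beta = 2.0
xs = [random.random() + 0.5 for _ in range(300)]     # positive design points
ys = [beta * x + random.gauss(0.0, 1.0) for x in xs]
beta_hat = m_estimate(xs, ys, huber_psi)             # close to beta = 2
```

The same routine, applied to resampled responses Y*_{ni}, produces the bootstrap M-estimates whose distribution the remainder of this section studies.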
Recall from the previous section that, in general, (4.2.22) ensures the absence of asymptotic bias in β̂_n. Analogously, to ensure the absence of asymptotic bias in β̂*_n, we need to have F̂_n such that

(4.2.26)    E_n ψ(e*_{n1}) = 0,

where E_n is the expectation under F̂_n. In general, the choice of F̂_n that will satisfy (4.2.26) and at the same time be a reasonable estimator of F depends heavily on the forms of ψ and F. When bootstrapping the least squares estimator of β, i.e., when ψ(x) ≡ x, Freedman (1981) ensures (4.2.26) by choosing F̂_n to be the empirical d.f. of the centered residuals {ê_{ni} − ê_{n·}, 1 ≤ i ≤ n}, where ê_{n·} := n^{-1} Σ_{j=1}^n ê_{nj}. In fact, he shows that if one does not center the residuals, the bootstrap distribution of the least squares estimator does not approximate the corresponding original distribution. Clearly, the ordinary empirical d.f. H_n of the residuals {ê_{ni}; 1 ≤ i ≤ n} does not ensure the validity of (4.2.26) for general designs
4. M, R and Some Scale Estimators
and a general ψ. We are thus forced to look at appropriate modifications of the usual bootstrap. Here we describe two modifications. One chooses the resampling distribution appropriately, and the other modifies the defining equation (4.2.25) à la Shorack (1982). Both provide second order correct approximations to the distribution of the standardized β̂_n.
Weighted Empirical Bootstrap. Assume that the design points {x_ni} are either all nonnegative or all nonpositive. Let w_x := Σ_{i=1}^n |x_ni| be positive and define

(4.2.27)    F_1n(y) := w_x^{−1} Σ_{i=1}^n |x_ni| I(ê_ni ≤ y),    y ∈ ℝ.

Take the resampling distribution F̂_n to be F_1n. Then, clearly,

E_1n ψ(e*_n1) = w_x^{−1} Σ_{i=1}^n |x_ni| ψ(ê_ni) = sign(x_1) w_x^{−1} Σ_{i=1}^n x_ni ψ(Y_ni − x_ni Δ_n) = 0,

by the definition of Δ_n. That is, F_1n satisfies (4.2.26) for any ψ.
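The cancellation E_1n ψ(e*_n1) = 0 displayed above can be checked numerically. The sketch below is our own illustration (the bisection solver and the Huber score with cutoff k are assumptions, not part of the text): it solves (4.2.23) for a positive design and then evaluates the F_1n-weighted mean of ψ at the residuals.

```python
def huber_psi(u, k=1.345):
    # a bounded nondecreasing score; any such psi works here
    return max(-k, min(k, u))

def m_estimate(x, y, psi=huber_psi, lo=-100.0, hi=100.0, tol=1e-12):
    # solve sum_i x_i psi(y_i - x_i t) = 0 by bisection; for x_i > 0 the
    # score is nonincreasing in t, so bisection applies
    def score(t):
        return sum(xi * psi(yi - xi * t) for xi, yi in zip(x, y))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def f1n_mean_psi(x, y, psi=huber_psi):
    # E_{1n} psi(e*) = w_x^{-1} sum_i |x_i| psi(ehat_i); this vanishes by the
    # defining equation (4.2.23) when all x_i have the same sign
    t = m_estimate(x, y, psi)
    wx = sum(abs(xi) for xi in x)
    return sum(abs(xi) * psi(yi - xi * t) for xi, yi in zip(x, y)) / wx
```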
Modified Scores Bootstrap. Let F̂_n be any resampling distribution based on the residuals. Define the bootstrap estimator Δ_ns to be a solution t of the equation

(4.2.28)    Σ_{i=1}^n x_ni [ψ(Y*_ni − x_ni t) − E_n ψ(e*_ni)] = 0.
In other words, the score function is now a priori centered under F̂_n and hence (4.2.26) holds for any F̂_n and any ψ.

We now describe the second order correctness of these procedures. To that effect we need some more notation and assumptions. To begin with, let τ_x² := Σ_{i=1}^n x_ni² and define

b_1x := Σ_{i=1}^n x_ni³/τ_x³,    b_x := Σ_{i=1}^n |x_ni|³/τ_x³.
For a d.f. F and any sampling d.f. F̂_n, define, for x ∈ ℝ,
γ(x) := E ψ(e_1 − x),    ω(x) = σ²(x) := E{ψ(e_1 − x) − γ(x)}²,    ω_1(x) := E{ψ(e_1 − x) − γ(x)}³,
γ_n(x) := E_n ψ(e*_n1 − x),    ω_n(x) = σ_n²(x) := E_n{ψ(e*_n1 − x) − γ_n(x)}²,    ω_1n(x) := E_n{ψ(e*_n1 − x) − γ_n(x)}³,
A_n(c) := {i : 1 ≤ i ≤ n, |x_ni| > c τ_x b_x},    κ_n(c) := #A_n(c),    c > 0.
For any real valued function g on ℝ, let ġ, g̈ denote its first and second derivatives at 0, whenever they exist. Also, write γ_n, ω_n, etc. for γ_n(0), ω_n(0), etc. Finally, let a := −γ̇/σ, a_n := −γ̇_n/σ_n and define, for x ∈ ℝ, H_2(x) := x² − 1, and

p_n(x) := Φ(x) − b_1x [ {γ̈_n/(2σ_n³) − γ̇_n ω_n/(2σ_n² a_n²)} x² + (ω_1n/(6σ_n³)) H_2(x) ] φ(x).
In the following theorems, a.s. means for almost all sequences {e_i; i ≥ 1} of i.i.d. F r.v.'s.

Theorem 4.2.3. Let the model (4.2.21) hold. In addition, assume that ψ has a uniformly continuous bounded second derivative and that the following hold:
(a) τ_x → ∞.
(b) a > 0.
(c) There exists a constant 0 < c < 1 such that n = O(κ_n(c)).
(d) m_x ln τ_x = O(τ_x), where m_x := max_{1≤i≤n} |x_ni|.
(e) There exist constants δ > 0, Δ > 0, and q < 1 such that
    sup{|E exp{itψ(e_1 − x)}| : |x| < δ, |t| > Δ} < q.
(f) Σ_{n≥1} exp(−λ w_x²/τ_x²) < ∞, ∀ λ > 0.

Then, with Δ*_n defined as a solution of (4.2.25) with F̂_n = F_1n,

sup_y |P_1n(a_n τ_x (Δ*_n − Δ_n) ≤ y) − p_n(y)| = o(m_x/τ_x),    a.s.,
sup_y |P(a τ_x (Δ_n − β) ≤ y) − P_1n(a_n τ_x (Δ*_n − Δ_n) ≤ y)| = o(m_x/τ_x),    a.s.,

where P_1n denotes the bootstrap probability under F_1n, and where the supremum is over y ∈ ℝ. □
4. M, R and Some Scale Estimators
Next we state the analogous result for Δ_ns.

Theorem 4.2.4. Suppose that all of the hypotheses of Theorem 4.2.3 except (f) hold and that Δ_ns is defined as a solution of (4.2.28) with F̂_n = Ĥ_n, the ordinary empirical d.f. of the residuals. Then,

sup_y |P̂_n(a_n τ_x (Δ_ns − Δ_n) ≤ y) − p_n(y)| = o(m_x/τ_x),    a.s.,
sup_y |P(a τ_x (Δ_n − β) ≤ y) − P̂_n(a_n τ_x (Δ_ns − Δ_n) ≤ y)| = o(m_x/τ_x),    a.s.,

where P̂_n denotes the bootstrap probability under Ĥ_n. □
The proofs of these theorems appear in Lahiri (1989), where he also discusses analogous results for a nonsmooth ψ. In this case he chooses the sampling distribution to be a smooth estimator obtained from a kernel type density estimator. Lahiri (1992) gives extensions of the above theorems to multiple linear regression models.

Here we briefly comment on the assumptions (a) - (f). As is seen from the previous section, (a) and (b) are minimally required for the asymptotic normality of M-estimators. Assumptions (c), (e) and (f) are required to carry out the Edgeworth expansions, while (d) is slightly stronger than Noether's condition (NX) applied to (4.2.21). In particular, x_i ≡ 1 and x_i ≡ i satisfy (a), (c), (d) and (f). A sufficient condition for (e) to hold is that F have a positive density and ψ have a continuous positive derivative on an open interval in ℝ.
4.3 Distributions of Some Scale Estimators
Here we shall discuss some robust scale estimators.

Definitions. An estimator β̂(X, Y) of β, based on the design matrix X and the observation vector Y, is said to be location invariant if

(4.3.1)    β̂(X, Y + Xb) = β̂(X, Y) + b,    ∀ b ∈ ℝᵖ.

It is said to be scale invariant if

(4.3.2)    β̂(X, aY) = a β̂(X, Y),    ∀ a ∈ ℝ, a ≠ 0.
A scale estimator s(X, Y) of a scale parameter is said to be location invariant if

(4.3.3)    s(X, Y + Xb) = s(X, Y),    ∀ b ∈ ℝᵖ.

It is said to be scale invariant if

(4.3.4)    s(X, aY) = a s(X, Y),    ∀ a > 0.
Now observe that the M-estimators Δ̂ and Δ* of β of Section 4.2.1 are location invariant but not scale invariant. The estimators Δ̂₁ defined at (4.2.15) are location and scale invariant whenever s satisfies (4.3.3) and (4.3.4). Note that if s does not satisfy (4.3.3) then Δ̂₁ need not be location invariant. Some of the candidates for s are

(4.3.5)    s := {(n − p)^{−1} Σ_{i=1}^n (Y_i − x_i'β̂)²}^{1/2},
           s₁ := med{|Y_i − x_i'β̂|; 1 ≤ i ≤ n},
           s₂ := med{|Y_i − Y_j − (x_i − x_j)'β̂|; 1 ≤ i < j ≤ n},

where β̂ is a preliminary estimator satisfying (4.3.1) and (4.3.2). The estimator s², with β̂ the least squares estimator, is the usual estimator of the error variance, assuming it exists. It is known to be nonrobust against outliers in the errors. In robustness studies one needs scale estimators that are not sensitive to outliers in the errors. The estimator s₁ has been mentioned by Huber (1981, p. 175) as one such candidate. The asymptotic properties of s₁, s₂ will be discussed shortly. Here we just mention that each of these estimators estimates a different scale parameter, but that is not a point of concern if our goal is only to have location and scale invariant M-estimators of β.

An alternative way of having location and scale invariant M-estimators of β is to use the simultaneous M-estimation method for estimating β and the scale parameter of (2.3.36), as discussed in Huber (1981). We mention here, without giving details, that it is possible to study the asymptotic joint distribution of these estimators under heteroscedastic errors by using the results of Chapter 2.
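The robustness contrast just described can be computed directly. The sketch below is our own (residuals are fed in directly in place of Y_i − x_i'β̂): it evaluates the three candidates of (4.3.5) and shows how one gross outlier inflates s while leaving s₁ and s₂ untouched.

```python
import statistics

def scale_candidates(resid, p=1):
    # s, s1, s2 of (4.3.5), applied to residuals Y_i - x_i' beta_hat
    n = len(resid)
    s = (sum(r * r for r in resid) / (n - p)) ** 0.5          # classical, non-robust
    s1 = statistics.median(abs(r) for r in resid)             # median absolute residual
    s2 = statistics.median(abs(resid[i] - resid[j])           # median pairwise difference
                           for i in range(n) for j in range(i + 1, n))
    return s, s1, s2
```

Replacing one residual by a gross outlier inflates s without bound, while s₁ and s₂ are unchanged, which is why Huber singles out s₁ as a robust candidate.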
We shall now study the asymptotic distributions of s₁ and s₂ under the model (1.1.1). With F_i denoting the d.f. of e_i and H := n^{−1} Σ_{i=1}^n F_i, let

(4.3.6)    ρ₁(y) := H(y) − H(−y),
(4.3.7)    ρ₂(y) := ∫ [H(y + x) − H(−y + x)] dH(x),    y ≥ 0.

Define γ₁ and γ₂ by the relations

(4.3.8)    ρ₁(γ₁) = 1/2,    ρ₂(γ₂) = 1/2.

Note that in the case F_i ≡ F, γ₁ is the median of the distribution of |e₁| and γ₂ is the median of the distribution of |e₁ − e₂|. In general, γ_j, ρ_j, etc. depend on n, but we suppress this for the sake of convenience.

The asymptotic distribution of s_j is obtained by the usual method of connecting the event {s_j ≤ a} with certain events based on certain empirical processes, as is done when studying the asymptotic distribution of the sample median, j = 1, 2. Accordingly, let, for y ≥ 0,
(4.3.9)    S(y) := Σ_{i=1}^n I(|Y_i − x_i'β̂| ≤ y),
           T(y) := Σ Σ_{1≤i<j≤n} I(|Y_i − Y_j − (x_i − x_j)'β̂| ≤ y).
Then, for any a > 0,

(4.3.10)    {s₁ ≤ a} = {S(a) ≥ (n + 1)2^{−1}},    n odd,
           {S(a) ≥ n2^{−1}} ⊂ {s₁ ≤ a} ⊂ {S(a) ≥ n2^{−1} − 1},    n even.

Similarly, with N := n(n − 1)/2, for any a > 0,

(4.3.11)    {s₂ ≤ a} = {T(a) ≥ (N + 1)2^{−1}},    N odd,
           {T(a) ≥ N2^{−1}} ⊂ {s₂ ≤ a} ⊂ {T(a) ≥ N2^{−1} − 1},    N even.
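The event identity (4.3.10) is the usual duality between a median and a counting process, and can be verified by brute force. The sketch is our own; residuals again stand in for Y_i − x_i'β̂.

```python
import statistics

def S_count(resid, a):
    # S(a) of (4.3.9): number of absolute residuals not exceeding a
    return sum(1 for r in resid if abs(r) <= a)

def check_median_duality(resid, grid):
    # for odd n: {s1 <= a} holds iff S(a) >= (n + 1)/2, as in (4.3.10)
    n = len(resid)
    assert n % 2 == 1
    s1 = statistics.median(abs(r) for r in resid)
    return all((s1 <= a) == (S_count(resid, a) >= (n + 1) / 2) for a in grid)
```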
Thus, to study the asymptotic distributions of s_j, j = 1, 2, it suffices to study those of S(y) and T(y), y ≥ 0.

In what follows we shall be using the notation of Chapter 2 with the following modifications. As before, we shall write S₁*, μ₁*, etc. for S_d*, μ_d*, etc. of (2.3.2) whenever d_ni ≡ n^{−1/2}. Moreover, in (2.3.2), we shall take

(4.3.12)

With these modifications, for all n ≥ 1, y ≥ 0,
S(y) = n^{1/2}[S₁*(y, v) − S₁*(−y, v)] = Σ_{i=1}^n I(|e_i − c_i'v| ≤ y),
2n^{−1}T(y) = ∫ [S₁*(y + x, v) − S₁*(−y + x, v)] S₁*(dx, v) − 1,
with probability 1, where v = A^{−1}(β̂ − β). Let

(4.3.13)    μ₁⁰(y, u) := μ₁(H(y), u),    Y₁⁰(y, u) := Y₁(H(y), u),
           K(y, u) := ∫ [Y₁⁰(y + x, u) − Y₁⁰(−y + x, u)] dH(x),
           W(y, u) := Y₁⁰(y, u) − Y₁⁰(−y, u),    y ≥ 0, u ∈ ℝᵖ,
           g_i(x) := {f_i(γ₂ + x) − f_i(−γ₂ + x)},
           h_i(x) := {f_i(γ₂ + x) + f_i(−γ₂ + x)},

for 1 ≤ i ≤ n, x ∈ ℝ. We shall write W(y), K(y), etc., for W(y, 0), K(y, 0), etc.

Theorem 4.3.1. Assume that (1.1.1) holds with X and {F_ni} satisfying (NX), (2.3.4) and (2.3.5). Moreover, assume that H is strictly increasing for each n and that

(4.3.14)    lim_{δ→0} limsup_n sup_{0≤s≤1−δ} [H(H^{−1}(s + δ) ± γ₂) − H(H^{−1}(s) ± γ₂)] = 0.

About {β̂} assume that

(4.3.15)    ||A^{−1}(β̂ − β)|| = O_p(1).
Then, ∀ a ∈ ℝ,

(4.3.16)    P(n^{1/2}(s₁ − γ₁) ≤ aγ₁)
           = P( W(γ₁) + n^{−1/2} Σ_{i=1}^n x_i'A {f_i(γ₁) − f_i(−γ₁)} · v
               ≥ −a γ₁ n^{−1} Σ_{i=1}^n [f_i(γ₁) + f_i(−γ₁)] ) + o(1),

(4.3.17)    P(n^{1/2}(s₂ − γ₂) ≤ aγ₂)
           = P( 2K(γ₂) + n^{−3/2} Σ_i Σ_j c_ij' ∫ g_i(x) dF_j(x) · v
               ≥ −γ₂ a n^{−1} Σ_{i=1}^n ∫ h_i(x) dH(x) ) + o(1),

where c_ij = (x_i − x_j)'A, 1 ≤ i, j ≤ n.
Proof. We shall give the proof of (4.3.17) only; that of (4.3.16) is similar and less involved. Fix an a ∈ ℝ and let Q_n(a) denote the left hand side of (4.3.17). Assume that n is large enough so that a_n := (an^{−1/2} + 1)γ₂ > 0. Then, by (4.3.11), it suffices to study P(T(a_n) ≥ N2^{−1} + b), b ∈ ℝ. Now, let

T₁(y) := n^{−1/2}[2n^{−1}T(y) + 1] − n^{1/2}ρ₂(y),    y ≥ 0,
k_n := (N + 2b)n^{−3/2} + n^{−1/2} − n^{1/2}ρ₂(a_n).

Then, direct calculations show that

(4.3.18)    P(T(a_n) ≥ N2^{−1} + b) = P(T₁(a_n) ≥ k_n).

We now analyze k_n. By (4.3.8),
k_n = (N + 2b)n^{−3/2} + n^{−1/2} − n^{1/2}ρ₂(γ₂) − n^{1/2}[ρ₂(a_n) − ρ₂(γ₂)]. But

n^{1/2}[ρ₂(a_n) − ρ₂(γ₂)]
    = n^{1/2} ∫ [{H(a_n + x) − H(γ₂ + x)} − {H(−a_n + x) − H(−γ₂ + x)}] dH(x).

By (2.3.4) and (2.3.5), the sequence of distributions {ρ₂} is tight on (ℝ, B), implying that γ₂ = O(1) and n^{1/2}(a_n − γ₂) = aγ₂ = O(1). Consequently,

n^{1/2} ∫ {H(±a_n + x) − H(±γ₂ + x)} dH(x) = a γ₂ n^{−1} Σ_{i=1}^n ∫ f_i(±γ₂ + x) dH(x) + o(1),

(4.3.19)    k_n = −a γ₂ n^{−1} Σ_{i=1}^n ∫ [f_i(γ₂ + x) + f_i(−γ₂ + x)] dH(x)
+ o(1).

Next, we approximate T₁(a_n) by a sum of independent r.v.'s. The proof is similar to the one used in approximating linear rank statistics in Section 3.4. From the definition of T₁,

T₁(y) = n^{−1/2} ∫ [S₁*(y + x, v) − S₁*(−y + x, v)] S₁*(dx, v) − n^{1/2}ρ₂(y)
      = n^{−1/2} ∫ [Y₁⁰(y + x, v) − Y₁⁰(−y + x, v)] S₁*(dx, v)
        + n^{−1/2} ∫ [μ₁⁰(y + x, v) − μ₁⁰(−y + x, v)] Y₁⁰(dx, v)
        + n^{−1/2} ∫ [μ₁⁰(y + x, v) − μ₁⁰(−y + x, v)] μ₁⁰(dx, v) − n^{1/2}ρ₂(y)
      =: E₁(y) + E₂(y) + E₃(y).

But

E₃(y) := n^{−1/2} ∫ [μ₁⁰(y + x, v) − μ₁⁰(−y + x, v)] μ₁⁰(dx, v) − n^{1/2}ρ₂(y)
= n^{−3/2} Σ_i Σ_j ∫ { F_i(y + x + c_ij'v) − F_i(y + x) − F_i(−y + x + c_ij'v) + F_i(−y + x) } dF_j(x)
= n^{−3/2} Σ_i Σ_j c_ij'v ∫ [f_i(y + x) − f_i(−y + x)] dF_j(x) + u_p(1),
E2(Y) .
n 1/2 n  1/2
!
! {JJ,~(Y + ! + {y10(y
x , v) 
fl~( y + x , v)}YI0(dx , v)
x , v)  y 10(y + x, v)}fl~(dx, v)
{y10(y + x )  y 10(y + x)}dH(x)
+ u p (1).
Similarly,
(4.3.20)
Et{y) =
n  1/2
!
{YI0(y
+ x)
 y 10(y + x)}S~(dx)
+ up (1).
Now observe that n1/2S~ = H n , the ordinary empirical d.f of the errors {ei} ' Let
Eu(Y)
. !
{y10(y + x)  Y?( y + x)}d(Hn(x)  H(x))
Z(y)  Z( y), where
Z(±y) := We shall show that (4.3.21)
!
y 10(± y + x)d[Hn(x)  H(x)], y 2: O.
4.3. Distributions of some scale estim ators
119
But
IZ(±an )
(4.3 .22)

Z(±'Y2)1
I![Yt{H(±an + x)) 
Yl(H(±,2 + x ))]
xd(Hn(x)  H(X)}I
S 2
sup 1Y1(H(y))  Yl(H(z))1 Iys!:::; lain 1 / 2,
= op(l) ,
bec ause of (2.3.4}(2.3.5) and Corollary 2.3.1 applied with dni n l / 2 . Thus, to prove (4.3.21) , it suffices to show that
(4.3.23) But
IZ(±,2}!
111
P (liS+ ILII :S K t )
~ 1
E/ 2, \fn
~
NI, ·
Therefore by (4.4.13) ad (4.4.15) t here exist N, and b as in (4.4.14) such t hat (4.4.16) But
where di = e'Al (XiX) , Rir is the rank of rj  r (xix)AIe. But such a linear rank statistic is nondecreasing in r , for every e . See, e.g., Hajek (1969; Theorem 7E , Chapter II) . This together with (4.4.16) 0 enables one to conclude (4.4.12) and hence (4.4.11). Theorem 4 .4.1 Supp ose tha t (1.1 .1) holds an d th at th e design matrix X an d th e error d.j. 's { Fn i } satisfy th e assumptions of Lem mas 4·4 · 1 and
4·4·2 above.
Th en
(4.4.17) Proof. Follows from Lemmas 4.4.1 and 4.4.2.
o
4. M, R and Some Scale Estimators
126
Remark 4.4.1 Arguing as in J aeckel combined with an argument of Lemma 4.4 .2, one can show t hat II A l (.82  .83)1/ = Op(l) . Con
1
sequently, under t he condit ions of Lemmas 4.4 .1 an d 4.4 .2, .82 and th e Jaeckel estimator .83 also satisfy (4.4.1 7). Remark 4.4.2 Cons ider the case when Fn i
0
== F, F a d.f. satisfying
(F l) , (F 2) . Then J.L = 0 and S = T ({3 ). Moreover, under (N X c ) all ot her ass umptions of Lem mas 4.4 .1 and 4.4.2 ar e a priori satisfied. Note t hat here
Moreover , from Theorem 3.4 .3 above, it follows that S converges in distribut ion to a
N( o, a~ Ip xp) ,
a~
=
LV . ,
where
t' r.p (u )du ) 2 . Jt'o r.p2(u )du  (Jo
We summar ize t he above d iscussion as Corollary 4.4.1 Suppo se that (i .i .i ) with Fn i
== F holds. Suppos e
that F and X satisfy (Fl ) , (F2 ), and (NX c ) . In addit ion, suppose that r.p
is nondecreasing bound ed on [0, 1] and
J f dr.p (F ) > O.
Then
A 1l (.82  (3 ) =
(J
f dr.p (F ))  IT({3)
+ op(l) .
Hence,
T his resul t is qui te general as far as t he conditio ns on the design mat rix X and F are concerned but not that general as far as t he score function ip is concerne d . 0 Remark 4.4.3 Robustness against heteroscedastic gross errors . F irst we give a working definit ion of qualitative rob ustness .
Estim ation of QU )
127
Consider the model (1.1.1). Suppose that we have modeled the errors {eni' 1
~
i ~ n } to be i.i.d. F whereas their actua l d.f. 's ar e
{Fni , 1 ~ i ~ n }. Let p n := TI ~=l F, Qn := TI ~=l Fni denote the cor respond ing product probability measure. Definition 4.4.1. A sequence of est imators 13 is sa id to be qualit atively robust for f3 at F against Qn if it is consistent for f3 under P " and un der those Qn that sat isfy 'Dn
:=
max i SUPy IFni (Y)  F( y)1 t O.
The above definition is a variant of that of Hampel (1971). On e could use the notions of weak convergence on product probability spaces to give a bit more general definition. For example we could insist that the Prohorov dist an ce between Qn and P " should tend to zero inst ead of requiring 'Dn t O. We do not pursue this any further here. The result (4.4.17) can be used to st udy t he qu ali tativ e robust
132
ness of
against certain heteroscedasti c errors. Consider, for ex
Consider, for example, the gross errors model where, for some 0 ≤ δ_ni ≤ 1 with max_i δ_ni → 0,

F_ni = (1 − δ_ni) F + δ_ni G,    1 ≤ i ≤ n,

where G is a d.f. having a uniformly continuous a.e. positive density. If, in addition, {δ_ni} satisfy

(4.4.18)    || A^{−1} Σ_{i=1}^n (x_ni − x̄_n) δ_ni || = O(1),

then one can readily see that ||K_n^{−1}|| = O(1) and ||μ|| = O(1). It follows from (4.4.17) that β̂₂ is qualitatively robust against the above heteroscedastic gross errors at every F that has a uniformly continuous a.e. positive density. Examples of δ_ni satisfying (4.4.18) would be

δ_ni ≡ n^{−1/2}    or    δ_ni = p^{−1/2} ||A(x_ni − x̄_n)||,    1 ≤ i ≤ n.

It may be argued that the latter choice of contaminating proportions {δ_ni} is more natural to linear regression than the former. A similar remark is applicable to β̂₁ and β̂₃. □
128
4.5
Estimation of Q(f)
Consid er the mod el (1.1.1) with Fni de nsity J on JR. Define (4.5.1)
QU ) =
JJ
F , where F is a d .f. with
drp(F ),
where sp E C of (3.2.1). As is seen from Corollary 4.4.1, the paramet er Q appears in the asy mptotic vari ance of R  est imato rs. The comp lete rank an alysis of the mod el (1.1.1) requires an est imate of Q. This est imate is used to standardize rank test stat ist ics when carrying out the ANOVA of linear models using J aeckel' s dispersion .J of (4.4 .4). See, for example, Hettmansp erger (1984) and refer ences t herein for the rank based ANOVA . Lehmann (1963) and Sen (1966) give est imators of Q in t he one and two sa mple location mod els. These est imators are given in te rms of lengths of confide nce interva ls based on linear rank statistics. Cheng and Serft ing (1981) discuss several est imators of Q when obs ervations are i.i.d . F , i.e., when t here are no nuisance pa rameters . Some of t hese est imators ar e obtained by replacin g J by a kernel ty pe dens ity estimator and F by an emp irical d. f. in Q . Schewede r (1975) disc usses similar estimates of Q in t he one sa mple location mod el. In this sect ion we discuss two ty pes of est imators of Q . Both use a kernel typ e density est imator of J based on t he residuals and the ordinary residual em pirical d. f. to estimate F. The d iffer ence is in the way the window widt h and the kernel are chosen. In one the window width is parti ally based on the data and is of the orde r of square root of n and the kernel is the histogram typ e whereas in the ot her the kernel and the window width are arbit rary. It will be observed t hat the AUL result abo ut the residual empirical process of Corollary 2.3.5 is t he basic too l needed to prove t he consistency of t hese est imat ors. We begin with th e class of est ima tors wher e the wind ow wid th is partly based on the data. Define (4.5.2)
p(y)
:=
J
[F(y
+x) 
F ( y + x )]drp (F (x )),
y
~ O.
4.5. Estimation of Q(J )
129
Since cp is a d.f., p(y) = P (le e*1 ::; y), where e, e* ar e indep endent r.v .'s with respective d.f.'s F and cp(F) . Con sequ ently, under (Fl ), t he densi ty of pat 0 is 2Q . This suggests that an estimate of Q can be obtained by est imating t he slope of p at O. Recall the definiti on of t he residual empirical pr ocess H n (y, t ) from (1.2.1). Let fJ be an est imato r of f3 and define (4.5.3) A natural est imator of p is obtained by substituting
it;
for F in p ,
v.i.z. ,
Let  00 = e(O) < e( l ) ::; e(2) ::; . . . ::; ern) < e (n + l ) = 00 denote t he ordere d residuals {ei := Yi  x~fJ , 1 ::; i ::; n}. Since cp(H n ) assigns mass {cpUI n )  cp(j  1) In))} to each e(j ) and zero mass to each of the intervals (e (j  l ) ' e(j) ), 1 ::; j ::; n + 1; it follows that 'Ii y E JR, (4.5.4)
Pn (y) n
L {cpU In) 
cpU  1)l n ) }[Hn(y + e(j) )  H n(  y + e(j))]
i= l
From t his one sees t hat Pn (y) has the following int erpret ation. For each j, one first computes the proportion of {e(i) } falling in th e interval [y + e(j), y + e(j) ] and then Pn(y) gives the weight ed average of such proportions. Formula (4.5.4) is clearly suitable for computations . Now, if {h n } is a sequence of positive numbers te nding to zero , an est imator of Q is given by
T his estimator can be viewed from t he density est ima tio n po int of view also. Consider a kern eltype density est imator In of I base d on
4. M, R and Some Scale Estimators
130 the residuals {ei}:
n
fn(x)
:= (2nh n)1
L 1(lx  eil :S hn ), i=l
which uses the window wn(x) = (1/2) . 1(lxl :S h n) . Then a natural estimator of Q is
Scheweder (1975) studied the asymptotic properties of this estimator in the one sample location model. Observe that in this case the estimator of Q does not depend on the estimator of the location parameter which makes it relatively easier to derive its asymptotic properties. In Qn, there is an arbitrariness due to the choice of the window width h n . Here we recommend that h n be determined from the spread of the data as follows . Let 0 < 0: < 1, t~ be oth quantile of Pn and define the estimator Q~ of Q as (4.5.5) The quantile t~ is an estimator of the oth quantile to: of p . Note that if cp(s) := s, then to: is the o:th quantile of the distribution of leI  e21 and t~ is the oth quantile of the empirical d .f. Pn of the LV . 'S {lei  ejl , 1 :S i,j :S n}. Thus, e.g., t~ = 82 of (4.3.5) . Similarly, if cp(8) = 1(8 ~ 0.5) then to: (t~) is oth quantile of the d.f. of lei I (empirical d.f. of leil, 1 :S i :S n). Again, here t:/! would correspond to 81 of (4.3.5) . In any case , in general, to: is a scale parameter in the sense of Bickel and Lehmann (1975) . Note that the estimator Q~ is location and scale invariant in the sense of (4.3.3) and (4.3.4) , as long as is location and scale invariant in the sense of (4.3.1) and (4.3.2). This follows from the fact that under (4.3.1) and (4.3.2), Pn(Y, aY + Xb) = Pn(y/a , Y), t~(aY + Xb) = at~(Y) , for all y E~, a > 0, bE W. The consistency of Q~ is asserted in the following
/3
4.5. Estimation of QU)
131
Theorem 4.5.1 Let (1.1.1) hold with Fni (NX), (Fl) and (F2), assume that
/3
==
F.
In addition to
is an estimator of (3 satis
fying (4.3.15). Then, (4.5.6)
sup IQ~
 QU)I =
op(1) .
0,
E
Proof. Ob serve t hat t he event
[Pn(1  E)t
Q )
< 0: :S Pn(l + E)t ~
[(1  E)t
Q
Q ) ]
:s t~ :s (1 + E)t
Q ] •
Hence, by two applications of Lemma 4.5.2, once wit h y = (1 + E)t and once wit h y = (1  E)t we obtain that
Q ,
Q
,
lim inf P (l t ~  t Q I :S Et Q , V sp E C) n
> P(p( l  E)t =
Q )
< 0: :S p(( l + E) t
Q ) ,
Vep E C)
o
1.
Proof of Theorem 4.5.1. Clearly, V ep E C,
I Q~  Q(f)1
=
(2t~) 1 I nl /2Pn (n l/2 t~ )  2t~Q(f)I ·
By Lemma 4.5.3, V E > 0,
P (o < t~ :S (1 + E)t
Q ,
Vep E C)
1
1.
Hence (4.5.6) follows from (4.5.8) applied wit h a = (1 + E)t 4.5.3 and t he Slutsky Theorem .
Q ,
Lemma 0
4.5. Estimation of Q(J)
135
Remark 4.5.1 The estimator
Q~
shifts the burden of choosing the
window width to the choice of a . There does not seem to be an easy way to recommend a universal a . In an empirical study done in Koul, Sievers and McKean (1987) that investigated level and power of some rank tests in the linear regression setting, a to be most desirable.
= 0.8 was found 0
Remark 4.5.2 It is an interesting theoretical exercise to see if, for
some 0 < O)}
>
.6.EY c
yc
and where denotes the complement of y . Then (3* can be uniquely defined by the relation (3* = ((31 + (32 ) / 2. This (3* corresponding to L(a) = s was studied by Willi amson (1979, 1982). In genera l t his estimator is asymptotically relatively mor e efficient than Wilcoxon type Restimators as will be seen lat er on in Sect ion 5.6. There does not seem to be such a nice charac terizat ion for p 2:: 1 and general D sat isfying (5.2.16). However , pro ceeding as in the derivation of (5.3.6), a computat iona l formula for K * of (5.2.17) can be obtained to be
This formula is valid for a general used to compute {3 *.
(J" 
finit e measure L and can be
5. Minimum Distance Estimators
156
We now turn to t he m.d. est imator defined at (5.2.21) and (5.2.22). Let di == Xi  ii , The first obse rvat ion one ma kes is t ha t for t E JR, n
V n(t ) := sup yE IR
L di I(~ ~ y + tdd i=l
n
=
L
sup dJ( Rit ~ ns) 0'S s'S l i=l
Proceedings as in th e above discussion per tainin g to 13* , assume , without loss of generality, that the dat a is so ar ra nged that Xl ~ X2 ~ . . . ~ Xn so t hat d1 ~ d2 ~ . . . ~ dn· Let Y1 := {(Yj  ~ )/ ( dj  di);d i < O,dj 2 0, 1 ~ i < j ~ n }. It can be proved that V;i(D~) is a left cont inuous nondecreasing (right continuous nonincreasing) st ep fun cti on on JR whos e points of discontinuity are a subse t of Y1. Moreover , if  00 = to < ii ~ t2·· . ~ t m < t m+ 1 = 00 denote t he ordered members of Y1 then V ;i(t1  ) = = V ;;(t m+ ) and v ;i(tm+) = 2:7=1 d; = V~ ( t1 ) , where dt == max (di , 0) . Consequent ly, t he following ent it ies are finite:
°
13s1 .
inf{t E JR; V;i(t) 2 V ; (t )},
13s2 .
sup] t E JR; V;i (t) ~ V ; (t )}.
Note t hat 13s2 2 13s1 w.p. l.. One can now take f3s = (131 + 13s2)/ 2. Willi am son (1979) pr ovides t he pr oofs of t he abov e claims and obtains t he asy mptotic dist ribut ion of 13s . This est imator is the pr ecise generalization of th e m.d. est imator of th e two sam ple location par ameter of Rao, Schuster and Lit tell (1975). Its asymptotic distribut ion is t he sa me as that of t heir estimator. We shall now discuss some add it ional dist ri bu tional prop erties of th e above m.d. estima tors . To facilit ate this discussion let /3 denote anyone of the est imators defined at (5.2.8), (5.2.15), (5.2.18) and (5.2.22). As in Section 4.3, we shall write /3(X , Y ) to emphasize the depe nde nce on the data { (x~ , Y i ) ; 1 ~ i ~ n} . It also helps to t hink of th e definin g dist an ces K , K + , etc . as fun cti ons of residu als. Thus we shall some times write K (Y  Xt) etc. for K (t ). Let K stand for eit her K or K + or K * of (5.2.7), (5.2.14) and (5.2.17). To begin with , obse rve t hat (5.3.21)
K (t  b ) = K(Y
+ Xb 
Xt ), \I t , b E W,
5.3. Finite sample prop erties
157
so that (5.3.22)
!3(X, Y
+ Xb) =
!3(X, Y)
+ b,
Vb E W .
Consequentl y, the dist ri bu ti on of !3  {3 does not depend on {3. The dist an ce measure Q of (5.2.9) do es not satisfy (5.3.21) and hen ce the distribution of (:J  {3 will generally depend on {3. In general, the classes of est imators {,8} and {{3 +} are not scale invariant . However , as can be readily seen from (5.3.6) and (5.3.7) , the class {,8} corr esp onding to G (y) = y, H i = F and those {D} that satisfy (5.2.16) and the class {{3+} corres ponding to G(y) = y and general {D} ar e scale invariant in the sense of (4.3.2). An interesting property of all of the above m.d. estima tors is that th ey are invari ant under nonsin gular tran sformat ion of the design matrix X . That is, !3(XB , Y) = B  1!3 (X, Y) V p x p nonsin gular matrix B. A similar st at ement holds for (:J. We sha ll end t his sect ion by discussin g t he symmetry pr operty of th ese est ima tors . In the following lemma it is implicitly ass umed t hat all integrals involved are finite. Some sufficient condit ions for t ha t to happen will unfold as we proceed in this chapter. Lemma 5.3.2. Let (1.1.1) hold wi th th e actual and the modeled d.f. of e; equal to H i , 1 ::; i ::; n . (i) If either (ia) {Hi ,l ::; i ::; n } an d G are sym me tri c aroun d 0 an d {Hi , 1 ::; i ::; n } are con tinuous, or (ib) dij = dn H1 ,j , Xij =  Xn i+l ,j and H i = F V I::; i ::; n, 1 ::; j ::; p , th en
,8
an d {3* are symmetrically distributed aroun d {3 , whenever th ey exist uniquely.
(ii) If {Hi , 1 ::; i ::; n } and G are symmetric around 0 an d eit her { H i , 1 ::; i ::; n} are cont inuous or G is con tinuous, th en
5. Minimum Distance Estimators
158
13+
is symmetrically distributed around 13, whenever it exists uniquely. Proof. In view of (5.3.22) there is no loss of generality in assuming that the true 13 is O. Suppose that (ia) holds. Then /:J(X, Y) =d /:J(X,  Y). But, by definition (5.2.8) , /:J(X,  Y) is the minimizer of K(  Y  Xt) w.r.t. t . Observe that V t E W , K(Y  Xt)
=
t J[t
t J[t
t J[t
r r r
dii{ I( Yi : 0 :3 a 0 < Zf < 00 and N 1f such th at
Pn(IMn( 9 0)1 :S Zf) ?: 1 (A5) V E > 0 and 0 and 0:) such t hat
E,
V n?: Nl{
< 0: < 00, :3 an N 2f and a b (depending on
E
It is conveni ent to let , for a 9 E W ,
Qn(9 , 9 0)
:=
(9  90)'S n(90) + ~(9  9 0)'W n (9 0)(9  9 0),
and On := argmin{Qn(9 , 9 0) , 9 E W }. Clearly, relation (5.4.3)
On must
satisfy the
5.4. A general case
163
where B n := 8~ lwn8~1, where W n = Wn(l~O) ' Some generality is achieved by making the following assumption. (A6) 118 n(On  ( 0)11 = Op(I). Note that (A2) and (A3) imply (A6). We now state and prove Theorem 5.4.1 Let th e dispe rs ions M n satisfy (Ai) , (A4)  (A6). Th en , under ti;
(5.4.4)
I(On  On)'8 nB ndn(On  On)1 = op(I) , inf Mn(O)  Mn(Oo)
(5.4.5)
O ED.
= (1 /2)(On 
Oo)'Wn(On  ( 0) + op(I) .
Con sequently, if (A6) is repla ced by (A2) and (A3), th en
8n(O n  ( 0) ~d {W(OO)} ly(OO),
(5.4.6) (5.4.7)
inf Mn(O )  Mn(Oo) O ED.
= (1 /2)S~(00)8~113~18~ISn(00)
+ op(I) .
If, ins tead of (Ai)  (A3), M n satisfi es (AI)  (A3), and if (A4) and (A5) hold th en also (5.4.4)  (5.4.7) hold and
(5.4.8) whe re
Proof. Let
Zt
be as in (A4). Choose an a >
[ IMn(Oo)1 ~
Zt,
inf Mn(Oo Ilhll>b
C [ inf Mn(Oo +
Ilhll:Sb
C [ inf
IIhll>b
Mn(Oo
Zt
in (A5). Then
+ 8~lh) 2:: a]
8~lh) ~ Z t , Ilhll>b inf Mn(Oo + d~lh)
+ 8~lh) >
inf Mn(Oo Ilhll :Sb
+ 8~lh)]
2:: a ]
.
Hence by (A4) and (A5), for any E > 0 there exists a 0 (now dep ending only on E) su ch that V n 2:: NIt V N2t ,
0. Consider n
E[Mn(On)  Mn(Oo)  (On  ( 0 ) Lg(Xi,Oo)] n n
J Jl
i=l
[g(x, On)  g(x . ( 0 )
(On  ( 0 ) g(x , ( 0 ) ] dF(x)
un
[g(x,00
l J l x l un
n

[g(x. eo
+ s)  g(x ,Oo)] ds dF( x) , + s)  g(x , ( 0 ) ] dF(x) ds ,
un
n
n 1/ 2
[
(00
+ s)  >.(( 0 ) ] ds
tn
[>'(0 0
+ sn  1/ 2 ) 
>.(00 ) ] ds .
In the above , the second equality follows from a and (5.4.11) while the third uses Fubini. Note that t;.j2 = J~n s ds. Hence
167
5.4. A general case
0.
(On  Oo)g(X,( 0 ) ]
< nE[g(X, On)  g(X,Oo)  (On  Oo)g(X,Oo)f
J[l Jl
un
n
< n
{g(x , s + ( 0 )
un
u ,
Tnnl/21un

g(x , Oo)} ds
f
dF( x)
{g(x , s + eo)  g(x , Oo)} 2 dsdF(x)
J
{g(x , s + ( 0 )

g(x , ( 0 ) } 2 dF( x)ds
< b2(bn 1/ 2)  1
xl! s+ ( bn 
1/ 2
{g(x ,
+
0,
0) 
g(x , Oo)}2dF(x)
by the assumption c .
This completes the proof of (5.4.13) in the case On > 00. The proof is similar in the opposite case , thereby completing the proof of (5.4.13) . Actually we can prove mor e under the same assumptions . Let
We sh all now show , under the same assumptions as above , that for every 0 < b < 00 , (5.4.15)
E(SUItl:SbP ID
n(t)I)2
+ o.
To prove this , because of a , we can rewrite with probability 1 for all t 2': 0 and for all n 2': 1,
5. Minimum Distance Estimators
168
n
Dn(t)
=
L
[g(Xi,(JO+ tn l / 2 )

g(Xi , ( 0)
i::::l
2
. ] t A(Oo) 2n
 tn" 1/2 g(Xi , ( 0) 
=
=
rt
Jo
n l / 2
t
[g(Xi , 00 + sn l / 2 )

g(Xi , ( 0)
i :::: l
 sn l / 2 ~(OO) ] ds
it t
[g(Xi , 00 + sn l / 2 )
n l / 2
o

g(Xi , ( 0)
i::::l
 A(Oo + sn l / 2 )
+ A(Oo) ] ds
t
+ 1
[
n l / 2[A(00
+ sn l / 2 )

A(Oo)
 sn l / 2 ~(Oo) ] ds say . Hence ,
E( sup ID
(t)12 )
nl
O'(t) . Suppose, additionally, that 1 1 is continuously differ entiable at 0 and that 1 2 is cont inuous at O. Then , under (NX) , it follows that Huber 's dispersion M n is
171
5.4. A general case ULANQ with n
=
00
~n
0,
= (X'X)I/2,
Sn(j3)
=  LXi1/J(J!i 
x~{3),
i=1
W
= \(O)X'X, W = \(O)Ipxp,
n({3)
:E = J'ljJ2dFlpxp .
If p is additionally assumed to be convex , then using a result in Rockafeller (1970), to prove the ULANQ, it suffices to prove that the score statistic Sn({3) is asymptotically linear in (X'X)I/2({3  {30) , for {3 in the set II (X'X)I/2({3  {30)11 :::; b, in probability. See Heiler and Weiler (1988) and Pollard (1991) for details. For p(x) = Ixl and F continuous, 'ljJ(x) = sgn(x) and Ir(t) = 2r IF (t )  F(O)I. The condition on II now translates to the usual condition on F in terms of the density f at O. For p(x) = x 2, 'ljJ(x) == 2x , 11(t) == 2t , so that II is trivially continuously differentiable with ')'1 (0) = 2. Note that in general W =I :E unless \(0) = J 'ljJ2dF which 0 is the case when 'ljJ is related to the likelihood scores.
3. An example where the full strength of (Ai)  (A3) is realized is obtained by considering the least square dispersion in an explosive AR(I) process where X o = 0 and for some 101 > 1, EXAMPLE
i = 1,2, ' " ,
where it is assumed that e, are mean zero finite variance i.i.d. r.v .'s . Let 1 n Mn(t) := 2 (Xi  tXi _ 1)2, ItI > 1.
L
i=1
Then the least square estimator of 0 is en := argmintMn(t). We now verify (Ai)  (A3) for this dispersion. Let 00 denote the true value. Rewrite, for a 101 > 1, 1 n
Mn(O)
= 2 L (s, 
(0  00)X i _ 1)2
i=1
1
n
n
2L i=1
(0  00 )
L i=l
ciXi  l
+
1
2(0  00 ) 2
n
L i=l
xLI
5. Minimum Distance Estimators
172
So one read ily sees that what Sn and W n should be. Now t he question is what should be On , Wand Y . F irst note that one can rewrite j
x, =
L O~  i ei, i= l
so t ha t j
ooj x, =
L «' e, i= l
is a mean zero square int egrabl e martingale. Not e also that , with a 2 = E eI , and because 05 > 1, as j 7 00,
sup)· E ( 4)2 =
The above derivati ons also show t hat
T
2
°0 exists theorem , t here
Hence, by t he martingale convergence with mean zero and vari an ce T 2 such th at a.s. and in
£2 ,
as j
< a
00 .
LV.
Z
7 00 .
This implies that |X_j| explodes to infinity as j → ∞. Also note that, because Σ_{i=1}^j θ₀^{−i} ε_i is a sum of i.i.d. mean zero finite variance r.v.'s, by the CLT, Z is a N(0, (θ₀² − 1)^{−1}) r.v. Now let Z_i := θ₀^{−i} X_i. Then

θ₀^{−2n} W_n = Σ_{i=1}^n θ₀^{−2i} Z²_{n−i} → Z²/(θ₀² − 1),  a.s.

This suggests that δ_n = θ₀^{−n} and W = Z²/(θ₀² − 1). Now consider

θ₀^{−n} S_n = Σ_{i=1}^n θ₀^{−i} Z_{n−i} ε_{n−i+1}.
Because of the independence of ε_i from X_{i−1}, S_n can be verified to be a mean zero square integrable martingale. By the martingale central limit theorem it follows that θ₀^{−n} S_n converges in distribution to Z Z₁, where Z₁ is a N(0, (θ₀² − 1)^{−1}) r.v., independent of Z. So here Y = Z Z₁. □

The next section is devoted to verifying (A1)–(A5) for the various dispersions introduced in Section 5.2.
5.5
Asymptotic Uniform Quadraticity
In this section we shall give sufficient conditions under which K_D, K⁺_D of Section 5.2 will satisfy (5.4.A1), (5.4.A4), (5.4.A5), and K and Q of Section 5.2 will satisfy (5.4.A1). As is seen from the previous section, this will bring us a step closer to obtaining the asymptotic distributions of the various m.d. estimators introduced in Section 5.2. To begin with, we shall focus on (5.4.A1) for K_D, K⁺_D and K. Our basic objective is to study the asymptotic distribution of β̂_D when the actual d.f.'s of {e_ni, 1 ≤ i ≤ n} are {F_ni, 1 ≤ i ≤ n} but we model them to be {H_ni, 1 ≤ i ≤ n}. Similarly, we wish to study the asymptotic distribution of β̂⁺_D when actually the errors may not be symmetric but we model them to be so. To achieve these objectives it is necessary to obtain the asymptotic results under as general a setting as possible. This of course makes the exposition that follows look somewhat complicated. The results thus obtained will enable us to study not only the asymptotic distributions of these estimators at the true model but also some of their robustness properties. With this in mind we proceed to state our assumptions.
(a) X satisfies (NX).

(b) With d(j) denoting the jth column of D, ||d(j)||² > 0 for at least one j; ||d(j)||² = 1 for all those j for which ||d(j)||² > 0, 1 ≤ j ≤ p.

0 < liminf_n ∫ g_n² dG_n ≤ limsup_n ∫ g_n² dG_n, and such that there exists an a > 0 satisfying

liminf_n inf{e'Γ_n e; e ∈ ℝᵖ, ||e|| = 1} ≥ a.

(l) Either (1) e'd_ni x'_ni A e ≥ 0, ∀ 1 ≤ i ≤ n and ∀ e ∈ ℝᵖ, ||e|| = 1;
or (2) e'd_ni x'_ni A e ≤ 0, ∀ 1 ≤ i ≤ n and ∀ e ∈ ℝᵖ, ||e|| = 1.
In most of the subsequent applications of the results obtained in this section, the sequence of integrating measures {G_n} will be a fixed G. However, we formulate the results of this section in terms of sequences {G_n} to allow extra generality. Note that if G_n ≡ G, then there always exists a g ∈ L_r(G), r = 1, 2, such that g > 0, 0 < ∫ g^r dG < ∞. Define, for y ∈ ℝ, u ∈ ℝᵖ, 1 ≤ j ≤ p,

(5.5.1)  S_j(y, u) := Σ_{i=1}^n d_nij I(Y_ni ≤ y + x'_ni A u),
         μ_j(y, u) := Σ_{i=1}^n d_nij F_ni(y + x'_ni A u),
         Y_j^0(y, u) := S_j(y, u) − μ_j(y, u).
Note that for each j, S_j, μ_j, Y_j^0 are the same as in (2.3.2) applied to X_ni = Y_ni, c_ni = A x_ni and d_ni = d_nij, 1 ≤ i ≤ n, 1 ≤ j ≤ p.

Notation. For any functions g, h : ℝ^{p+1} → ℝ,

|g_u − h_v|²_G := ∫ {g(y, u) − h(y, v)}² dG_n(y).

Occasionally we shall write |g|²_G for |g_0|²_G.

Lemma 5.5.1. Let Y_n1, ⋯, Y_nn be independent r.v.'s with respective d.f.'s F_n1, ⋯, F_nn. Then (e) implies

(5.5.2)  E Σ_{j=1}^p |Y_j^0|²_G = O(1).
Proof. By Fubini's Theorem,

E Σ_{j=1}^p |Y_j^0|²_G = ∫ Σ_{i=1}^n ||d_i||² F_i(1 − F_i) dG_n,

and hence (e) implies the Lemma. □

Lemma 5.5.2. Let {Y_ni} be as in Lemma 5.5.1. Then the assumptions (a)–(d), (f)–(j) imply that, for every 0 < b < ∞,

(5.5.3)  E sup_{||u||≤b} Σ_{j=1}^p |Y_ju^0 − Y_j0^0|²_G = o(1).

Proof. Let N(b) := {u ∈ ℝᵖ; ||u|| ≤ b}. By Fubini's Theorem, ∀ u ∈ N(b),

E Σ_{j=1}^p |Y_ju^0 − Y_j0^0|²_G ≤ ∫ Σ_{i=1}^n ||d_i||² 2|F_i(y + c'_i u) − F_i(y)| dG_n(y)
  ≤ ∫_{|x| ≤ b_n} (∫ f_n(y + x) dG_n(y)) dx,

where b_n = b max_i κ_i, f_n as in (f). Therefore, by the assumption (f),

(5.5.4)  E Σ_{j=1}^p |Y_ju^0 − Y_j0^0|²_G = o(1),  ∀ u ∈ ℝᵖ.
To complete the proof of (5.5.3), because of the compactness of N(b), it suffices to show that ∀ ε > 0, ∃ a δ > 0 such that ∀ v ∈ N(b),

(5.5.5)  limsup_n E Σ_{j=1}^p sup_{||u−v||≤δ} |L_ju − L_jv| ≤ ε,

where

L_ju := |Y_ju^0 − Y_j0^0|²_G,  u ∈ ℝᵖ, 1 ≤ j ≤ p.
Expand the quadratic and apply the CS inequality to the cross product terms to obtain, for all 1 ≤ j ≤ p,

(5.5.6)
|Y_ju^0 − Y_jv^0|²_G ≤ 2{|S_ju − S_jv|²_G + |μ_ju − μ_jv|²_G},
|S_ju − S_jv|²_G ≤ 2{|S⁺_ju − S⁺_jv|²_G + |S⁻_ju − S⁻_jv|²_G},
|μ_ju − μ_jv|²_G ≤ 2{|μ⁺_ju − μ⁺_jv|²_G + |μ⁻_ju − μ⁻_jv|²_G},

where S_j^±, μ_j^± are the S_j, μ_j with d_ij replaced by d⁺_ij := max(0, d_ij), d⁻_ij := d⁺_ij − d_ij, 1 ≤ i ≤ n, 1 ≤ j ≤ p. Now, ||u − v|| ≤ δ, the nonnegativity of {d^±_ij}, and the monotonicity of {F_i} yield (use (2.3.17) here) that, for all 1 ≤ j ≤ p,

|μ^±_ju − μ^±_jv|²_G ≤ ∫ [Σ_{i=1}^n d^±_ij {F_i(y + c'_i v + δκ_i) − F_i(y + c'_i v − δκ_i)}]² dG_n(y).

Therefore, by the assumption (g),

(5.5.7)  limsup_n sup_{||u−v||≤δ} Σ_{j=1}^p |μ_ju − μ_jv|²_G ≤ 4kδ².

By the monotonicity of S^±_j and (2.3.17), for all 1 ≤ j ≤ p, y ∈ ℝ, ||u − v|| ≤ δ implies that

− Σ_{i=1}^n d^±_ij I(−δκ_i < Y_i − v'c_i − y ≤ 0)
  ≤ S^±_j(y, u) − S^±_j(y, v)
  ≤ Σ_{i=1}^n d^±_ij I(0 < Y_i − v'c_i − y ≤ δκ_i).

This in turn implies (using the fact that a ≤ b ≤ c implies b² ≤ a² + c² for any reals a, b, c)

{S^±_j(y, u) − S^±_j(y, v)}²
  ≤ {Σ_{i=1}^n d^±_ij I(0 < Y_i − y − v'c_i ≤ δκ_i)}²
  + {Σ_{i=1}^n d^±_ij I(−δκ_i < Y_i − y − v'c_i ≤ 0)}².
Because of the compactness of N(b) and the assumption (i), it suffices to prove that ∀ ε > 0, ∃ a δ > 0 ∋ ∀ v ∈ N(b),

(5.5.12)  limsup_n sup_{||u−v||≤δ} Σ_{j=1}^p sup |ξ_ju − ξ_jv| ≤ ε.

But

|ξ_ju − ξ_jv| ≤ 2{|μ_ju − μ_jv|²_G + ||u − v||² ||ν_j||²_G + |μ_ju − μ_jv|_G ||u − v|| ||ν_j||_G}.

Hence, from (5.5.7) and the assumption (i), the left hand side of (5.5.12) is bounded above by

2{4kδ² + δ²(a + 2k^{1/2} a^{1/2})} = k₁δ²,  where a = limsup_n Σ_{j=1}^p ||ν_j||²_G.

Thus upon choosing δ² ≤ ε/k₁ one obtains (5.5.12), hence (5.5.11), and therefore the Theorem. □

Our next goal is to obtain an analogue of (5.5.9) for K⁺_D. Before stating it rigorously, it helps to rewrite K⁺_D in terms of the standardized processes {Y_j^0} and {μ_j} defined at (5.5.1). In fact, we have
K⁺_D(Au) = Σ_{j=1}^p ∫ [S_j(y, u) − Σ_{i=1}^n d_ij + S_j(−y, u)]² dG_n(y)

 = Σ_{j=1}^p ∫ [Y_j^0(y, u) − Y_j^0(y) + Y_j^0(−y, u) − Y_j^0(−y)
   + μ_j(y, u) − μ_j(y) − u'ν_j(y) + μ_j(−y, u) − μ_j(−y) − u'ν_j(−y)
   + u'ν_j^+(y) + W_j^+(y) + m_j(y)]² dG_n(y),

where ν_j^+(y) := ν_j(y) + ν_j(−y), W_j^+(y) := Y_j^0(y) + Y_j^0(−y), and

m_j(y) := Σ_{i=1}^n d_ij {F_i(y) − 1 + F_i(−y)}
        = μ_j(y) + μ_j(−y) − Σ_{i=1}^n d_ij,  y ∈ ℝ, 1 ≤ j ≤ p.

Let

(5.5.13)  k⁺_D(Au) := Σ_{j=1}^p ∫ [W_j^+ + m_j + u'ν_j^+]² dG_n,  u ∈ ℝᵖ.

Now, proceeding as in (5.5.10), one obtains a similar upper bound for |K⁺_D(Au) − k⁺_D(Au)| involving terms like those in the R.H.S. of (5.5.10) and terms like |Y_ju^0 − Y_j0^0|_{−G}, |μ_ju − μ_j0 − u'ν_j|_{−G}, ||ν_j||_{−G}, |Y_j^0|_{−G}, where, for any function h : ℝ^{p+1} → ℝ, |h_u|²_{−G} := ∫ h²(−y, u) dG_n(y).
It thus becomes apparent that one needs an analogue of Lemmas 5.5.1 and 5.5.2 with G_n(·) replaced by G_n(−·). That is, if the conditions (e)–(j) are also assumed to hold for the measures {G_n(−·)}, then obviously analogues of these lemmas will hold. Alternatively, the statement of the following theorem and the details of its proof are considerably simplified if one assumes G_n to be symmetric around zero, as we shall do for convenience. Before stating the theorem, we state

Lemma 5.5.3. Let Y_n1, ⋯, Y_nn be independent r.v.'s with respective d.f.'s F_n1, ⋯, F_nn. Assume that (a)–(d), (f), (g) hold, {G_n} satisfies (5.3.8) and that (5.5.14) holds, where

(5.5.14)  ∫ Σ_{i=1}^n ||d_ni||² {F_ni(−y) + 1 − F_ni(y)} dG_n(y) = O(1).

Then,

(5.5.15)  E Σ_{j=1}^p |Y_j^0|²_{−G} = O(1),
and ∀ 0 < b < ∞, the analogue of (5.5.3) holds with |·|_G replaced by |·|_{−G}.

Lemma 5.5.4. In addition to the assumptions of Theorem 5.5.2, assume that (k) and (l) hold. Then, ∀ ε > 0, 0 < z < ∞, ∃ N (depending only on ε) and a b (depending on ε, z) ∋ ∀ n ≥ N,

(5.5.21)  P(inf_{||u||>b} K_D(Au) ≥ z) ≥ 1 − ε,
(5.5.22)  P(inf_{||u||>b} K̂_D(Au) ≥ z) ≥ 1 − ε.
Proof. As before, write K, K̂, etc. for K_D, K̂_D, etc. Recall the definition of Γ_n from (k). Let k_n(e) := e'Γ_n e, e ∈ ℝᵖ. By the CS inequality and (k),

(5.5.23)  sup_{||e||=1} |k_n(e)|² ≤ ||Γ_n||² ≤ Σ_{j=1}^p ||ν_j||²_G · |g_n|²_G = O(1).
Fix an ε > 0 and a z ∈ (0, ∞). Define, for t ∈ ℝᵖ, 1 ≤ j ≤ p,

V̂_j(t) := ∫ {Y_j^0 + t'ν_j + m_j} g_n dG_n,
V_j(t) := ∫ [S_j(y, t) − Σ_{i=1}^n d_nij H_ni(y)] g_n(y) dG_n(y).

Also, let V̂ := (V̂_1, ⋯, V̂_p)', V := (V_1, ⋯, V_p)', γ_n := |g_n|²_G, τ := limsup_n γ_n. Write a u ∈ ℝᵖ with ||u|| > b as u = re, |r| > b, ||e|| = 1. Then, by the CS inequality,

inf_{||u||>b} K(Au) ≥ inf_{|r|>b, ||e||=1} (e'V(rAe))²/γ_n,
inf_{||u||>b} K̂(Au) ≥ inf_{|r|>b, ||e||=1} (e'V̂(rAe))²/γ_n.

It thus suffices to show that ∃ a b ∈ (0, ∞) and N ∋ ∀ n ≥ N,

(5.5.24)  P(inf_{|r|>b, ||e||=1} (e'V(rAe))²/γ_n ≥ z) > 1 − ε,
(5.5.25)  P(inf_{|r|>b, ||e||=1} (e'V̂(rAe))²/γ_n ≥ z) > 1 − ε.

But ∀ u ∈ ℝᵖ,

||V(Au) − V̂(Au)|| ≤ 2γ_n Σ_{j=1}^p {|Y_ju^0 − Y_j0^0|²_G + |μ_ju − μ_j0 − u'ν_j|²_G}.

Thus, from (k), (5.5.3) and (5.5.10), it follows that ∀ b ∈ (0, ∞),

(5.5.26)  sup_{||u||≤b} ||V(Au) − V̂(Au)|| = o_p(1).

Now, let T_j := ∫ {Y_j^0 + m_j} g_n dG_n, 1 ≤ j ≤ p; T' := (T_1, ⋯, T_p), and rewrite

e'V̂(rAe) = e'T + r k_n(e).
Again, by the CS inequality, Fubini, (5.5.3) and the assumptions (j) and (k), it follows that ∃ N₁ and b₁, possibly both depending on ε, such that

(5.5.27)  P(||T|| ≤ b₁) ≥ 1 − (ε/2),  ∀ n ≥ N₁.

Choose b such that

(5.5.28)  −(zτ)^{1/2} + ba ≥ b₁,

where a is as in (k). Then, with a_n := inf{|k_n(e)|; ||e|| = 1},

(5.5.29)
P(inf_{|r|=b, ||e||=1} (e'V̂(rAe))²/γ_n ≥ z)
 = P(|e'V̂(rAe)| ≥ (zγ_n)^{1/2},  ∀ ||e|| = 1, |r| = b)
 ≥ P(| |e'T| − |r| |k_n(e)| | ≥ (zγ_n)^{1/2},  ∀ ||e|| = 1, |r| = b)
 ≥ P(||T|| ≤ −(zγ_n)^{1/2} + b a_n)
 ≥ P(||T|| ≤ −(zτ)^{1/2} + ba)
 ≥ P(||T|| ≤ b₁) ≥ 1 − (ε/2),  ∀ n ≥ N₁.

In the above, the first inequality follows from the fact that ||d| − |c|| ≤ |d + c| for real numbers d, c; the second uses the fact that |e'T| ≤ ||T|| for all ||e|| = 1; the third uses the relation (−∞, −(zτ)^{1/2} + ba) ⊂ (−∞, −(zγ_n)^{1/2} + b a_n); while the last inequality follows from (5.5.27) and (5.5.28).

Observe that e'V̂(rAe) is monotonic in r for every ||e|| = 1. Therefore, (5.5.29) implies (5.5.25) and hence (5.5.22) in a straightforward fashion.

Next, consider e'V(rAe). Rewrite

e'V(rAe) = ∫ Σ_{i=1}^n (e'd_ni)[I(Y_ni ≤ y + r x'_ni Ae) − H_ni(y)] g_n(y) dG_n(y),
which, in view of the assumption (l), shows that e'V(rAe) is monotonic in r for every ||e|| = 1. Therefore, by (5.5.26), ∃ N₂, depending on ε, ∋

P(inf_{|r|>b, ||e||=1} (e'V(rAe))²/γ_n ≥ z)
 ≥ P(inf_{|r|=b, ||e||=1} (e'V(rAe))²/γ_n ≥ z)
 ≥ P(inf_{|r|=b, ||e||=1} (e'V̂(rAe))²/γ_n ≥ z) − (ε/2),  ∀ n ≥ N₂,
 ≥ 1 − ε,  ∀ n ≥ N₁ ∨ N₂,
by (5.5.29). This proves (5.5.24) and hence (5.5.21). □

The next lemma gives an analogue of the previous lemma for K⁺_D. Since the proof is quite similar, no details will be given.

Lemma 5.5.5. In addition to the assumptions of Theorem 5.5.2, assume that (k⁺) and (l) hold, where (k⁺) is the condition (k) with Γ_n replaced by Γ⁺_n := (ν⁺_1, ⋯, ν⁺_p), and where the {ν⁺_j} are defined just above (5.5.13). Then, ∀ ε > 0, 0 < z < ∞, ∃ N (depending only on ε) and a b (depending on ε, z) ∋ ∀ n ≥ N,

P(inf_{||u||>b} K⁺_D(Au) ≥ z) > 1 − ε,
P(inf_{||u||>b} k⁺_D(Au) ≥ z) > 1 − ε.  □
The above two lemmas verify (5.4.A5) for the two dispersions K and K⁺. Also note that (5.5.22), together with (e) and (j), implies that ||A^{−1}(β̄ − β)|| = O_p(1), where β̄ is defined at (5.5.31) below. Similarly, Lemma 5.5.5, (e), (5.5.17) and the symmetry assumption (5.3.8) about {G_n} imply that ||A^{−1}(β̄⁺ − β)|| = O_p(1), where β̄⁺ is defined at (5.5.35) below. The proofs of these facts are exactly similar to that of (5.4.2) given in the proof of Theorem 5.4.1. In view of Remark 5.5.1 and Theorem 5.4.1, we have now proved the following theorems.

Theorem 5.5.3. Assume that (1.1.1) holds with the modeled and actual d.f.'s of the errors {e_ni, 1 ≤ i ≤ n} equal to {H_ni, 1 ≤ i ≤ n}
and {F_ni, 1 ≤ i ≤ n}, respectively. In addition, suppose that (a)–(l) hold. Then (5.5.30) holds, where β̄ satisfies the equation (5.5.31). If, in addition,

(5.5.32)  B_n^{−1} exists for all n ≥ p,

then (5.5.33) holds, where Z̄_n and B_n are defined at (5.5.19). □

Theorem 5.5.4. Assume that (1.1.1) holds with the actual d.f.'s of the errors {e_ni, 1 ≤ i ≤ n} equal to {F_ni, 1 ≤ i ≤ n}. In addition, suppose that {X, F_ni, D, G_n} satisfy (a)–(d), (f)–(i), (5.3.8) for all n ≥ 1, (k), (l) and (5.5.14). Then, ⋯
≥ 0, 1 ≤ i ≤ n;  max_i δ_ni → 0;

lim_{s→1} ∫ |y|^j f^k(sy) dG(y) = ∫ |y|^j f^k(y) dG(y),  j = 1, k = 1;  j = 0, k = 1, 2.
Claim 5.5.4. Under (a), (b), (d) with G_n ≡ G, (5.5.44)–(5.5.46), (5.5.50) and (5.5.51), the assumptions (e)–(i) are satisfied.

Proof. By the Lemmas 9.1.5 and 9.1.6 of the Appendix below, and by (5.5.23), (5.5.27), and (5.5.31),

(5.5.52)  lim_{x→0} limsup_n max_i ∫ |f(T_i^{−1}(y + x)) − f(y + x)|² dG(y) = 0,
          lim_{x→0} ∫ |f(y + x) − f(y)|^r dG(y) = 0,  r = 1, 2.
Now,

∫ Σ_{i=1}^n ||d_i||² {F_i(1 − F_i) − F(1 − F)} dG
 ≤ 2p max_i ∫ |F(T_i^{−1}y) − F(y)| dG(y)
 ≤ 2p max_i ∫_{τ_i}^1 ∫ |y| f(sy) dG(y) ds = o(1),

and

(5.5.59)  limsup_n Σ_{j=1}^p ∫ [Σ_{i=1}^n d²_nij {F_ni(y a_s + c'_ni v + δ(n^{−1/2}|y| + κ_ni))
          − F_ni(y a_s + c'_ni v − δ(n^{−1/2}|y| + κ_ni))}] dG(y) ≤ kδ,

for some k not depending on (s, v) and δ.
> 0, and 0 < ∫ g² dG < ∞. The condition (5.5.k) with g_n ≡ g becomes

(5.6.3)  liminf_n inf_{||e||=1} |e'D'XAe| ≥ α,  for some α > 0.

Condition (5.5.l) implies that e'D'XAe ≥ 0 or e'D'XAe ≤ 0, ∀ ||e|| = 1 and ∀ n ≥ 1. It need not imply (5.6.3). The above discussion together with the Cramér–Wold Theorem leads to

Corollary 5.6a.1. Assume that (1.1.1) holds with the error r.v.'s correctly modeled to be i.i.d. F, F known. In addition, assume that (5.5.a), (5.5.b), (5.5.l), (5.5.44)–(5.5.46), (5.6.3) and (5.6.4) hold, where

(5.6.4)  (D'XA)^{−1} exists for all n ≥ p.

Then,

(5.6.5)  A^{−1}(β̂_D − β) = (D'XA ∫ f²dG)^{−1} Σ_{i=1}^n d_ni [ψ(e_ni) − Eψ(e_ni)] + o_p(1).

If, in addition, we assume (5.6.6), then (5.6.7) holds,
5.6.1. Asymptotic distributions, efficiency & robustness
where

Σ_D := (D'XA)^{−1} D'D (AX'D)^{−1},  τ² = Var ψ(e₁) / (∫ f²dG)².  □

For any two square matrices L₁ and L₂ of the same order, by L₁ ≥ L₂ we mean that L₁ − L₂ is nonnegative definite. Let L and J be two p × n matrices such that (LL')^{−1} exists. The CS inequality for matrices states that

(5.6.8)  J J' ≥ J L'(LL')^{−1} L J',

with equality if and only if J ∝ L. Now note that if D = XA then Σ_D = I_{p×p}. In general, upon choosing J = D', L = AX' in this inequality, we obtain

D'D ≥ D'XA · AX'D,  or  Σ_D ≥ I_{p×p},

with equality if and only if D ∝ XA. From these observations we deduce

Theorem 5.6a.1 (Optimality of β̂_X). Suppose that (1.1.1) holds with the error r.v.'s correctly modeled to be i.i.d. F. In addition, assume that (5.5.a), (5.5.d) with G_n ≡ G, and (5.5.44)–(5.5.46) hold. Then, among the class of estimators {β̂_D; D satisfying (5.5.b), (5.5.l), (5.6.3), (5.6.4) and (5.6.6)}, the estimator that minimizes the asymptotic variance of b'A^{−1}(β̂_D − β), for every b ∈ ℝᵖ, is β̂_X, the β̂_D with D = XA.  □

Observe that under (5.5.a), D = XA a priori satisfies (5.5.b), (5.6.3), (5.6.4) and (5.6.6). Consequently we obtain

Corollary 5.6a.2 (Asymptotic normality of β̂_X). Assume that (1.1.1) holds with the error r.v.'s correctly modeled to be i.i.d. F. In addition, assume that (5.5.a) and (5.5.44)–(5.5.46) hold. Then,

A^{−1}(β̂_X − β) →_d N(0, τ² I_{p×p}).  □
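The matrix CS inequality (5.6.8) and the resulting bound Σ_D ≥ I_{p×p} can be checked numerically. In this sketch (our own, with arbitrary simulated X and D), A = (X'X)^{−1/2} is formed by a spectral decomposition, so that AX'XA = I:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 3
X = rng.normal(size=(n, p))

# A = (X'X)^{-1/2} via the spectral decomposition of X'X
w, V = np.linalg.eigh(X.T @ X)
A = V @ np.diag(w ** -0.5) @ V.T

def Sigma_D(D):
    """Sigma_D = (D'XA)^{-1} D'D (AX'D)^{-1}, as in the display above"""
    M = np.linalg.inv(D.T @ X @ A)
    return M @ (D.T @ D) @ M.T

D = rng.normal(size=(n, p))                    # an arbitrary weight matrix
gap = np.linalg.eigvalsh(Sigma_D(D) - np.eye(p))
```

The eigenvalues of Σ_D − I are nonnegative for any D with D'XA invertible, and Σ_D reduces to the identity when D = XA, matching the equality case D ∝ XA.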
Remark 5.6a.1. Write β̂_D(G) for β̂_D to emphasize the dependence on G. The above theorem proves the optimality of β̂_X(G) among a class of estimators {β̂_D(G), as D varies}. To obtain an asymptotically efficient estimator at a given F among the class of estimators
{β̂_X(G), G varies}, one must have F and G satisfy the following relation. Assume that F satisfies (3.2.a) of Theorem 3.2.3, that all of the derivatives occurring below make sense, and that (5.5.44) holds. Then a G that will give an asymptotically efficient β̂_X(G) must satisfy the relation

f dG = −I(f) · d(ḟ/f),  I(f) := ∫ (ḟ/f)² dF.

From this it follows that the m.d. estimators β̂_X(G), for G satisfying the relations dG(y) = (2/3)dy and dG(y) = 4dδ₀(y), are asymptotically efficient at the logistic and double exponential error d.f.'s, respectively. For β̂_X(G) to be asymptotically efficient at N(0, 1) errors, G would have to satisfy f(y)dG(y) = dy. But such a G does not satisfy (5.5.58). Consequently, under the current state of affairs, one cannot estimate β asymptotically efficiently at the N(0, 1) error d.f. by using a β̂_X(G). This naturally leaves an open problem, viz., is the conclusion of Corollary 5.6a.2 true without requiring ∫ f dG < ∞, 0 < ∫ f²dG < ∞?  □

Observe that Theorem 5.6a.1 does not include the estimator β̂₁, the β̂_D with D = n^{−1/2}[1, 0, ⋯, 0]_{n×p}, i.e., the m.d. estimator defined at (5.2.4), (5.2.5) after H_ni is replaced by F in there. The main reason for this is that the given D does not satisfy (5.6.4). However, Theorem 5.5.3 is general enough to cover this case also. Upon specializing that theorem and applying (5.5.31), one obtains the following

Theorem 5.6a.2. Assume that (1.1.1) holds with the errors correctly modeled to be i.i.d. F. In addition, assume that (5.5.a), (5.5.44)–(5.5.46) and the following hold:

(5.6.9)  Either n^{1/2} e₁' x'_ni Ae ≥ 0 for all 1 ≤ i ≤ n, all ||e|| = 1,
         or n^{1/2} e₁' x'_ni Ae ≤ 0 for all 1 ≤ i ≤ n, all ||e|| = 1.

(5.6.10)  ⋯,
where x̄_n is as in (4.2a.11), and

(5.6.11)  ⋯,

where β̂₁ is the first coordinate of the above β̂_D. Then

A^{−1}(β̂₁ − β) = Z̄_n / ∫ f²dG + o_p(1),

where

Z̄_n = n^{−1/2} Σ_{i=1}^n {ψ(e_ni) − Eψ(e_ni)},  with ψ as in (5.6.2).

Consequently, n^{1/2} x̄'_n(β̂₁ − β) is asymptotically a N(0, τ²) r.v.  □
Consequently, nl/2x~(,Bl  {3) is asymptotically a N(O, 72) r.v. 0 Next, we focus on the class of estimators {{3i;} and the case of i.i. d. symmetric errors. An analogue of Corollary 5.6a.1 is obtained with the help of Theorem 5.5.4 instead of Theorem 5.5.3 and is given in Corollary 5.6a.3. The details of its proof are similar to those of Corollary 5.6a.1. Corollary 5.6a.3 . Assume that (1.1.1) holds with th e errors correctly modeled to be i.i.d. symmetric around O. In addition, assume that (5.3.8), (5.5.a), (5.5 .b), (5 .5.d) with G n == G, (5.5.44) , (5.5.46), (5.6.3), (5.6.4) and (5.6.12) hold, where
1
00
(5.6.12)
(1  F)dG
O.
(5.6.16)  ⋯,

(5.6.17)  Either e'd_ni(x_ni − x̄_n)'A₁e ≥ 0, ∀ 1 ≤ i ≤ n, ∀ ||e|| = 1,
          or e'd_ni(x_ni − x̄_n)'A₁e ≤ 0, ∀ 1 ≤ i ≤ n, ∀ ||e|| = 1,

(5.6.18)  (D'X_c A₁)^{−1} exists for all n ≥ p.

Then,

(5.6.19)  A₁^{−1}(β̂_D − β) = (D'X_c A₁ ∫₀¹ q²dL)^{−1} Σ_{i=1}^n d_ni φ(F(e_ni)) + o_p(1).
If, in addition, (5.6.6) holds, then (5.6.20) holds, where Σ_D = (D'X_c A₁)^{−1} D'D (A₁X'_c D)^{−1}, and

(5.6.21)  a₀² = Var φ₀(F(e₁)) / (∫₀¹ q² dL)²,

with φ₀ as in (5.6.15). Consequently, (5.6.22) holds, and {β̂_{X_c}} is asymptotically efficient among all {β̂_D, D satisfying the above conditions}.

Consider the case when L(s) ≡ s. Then

a₀² = ∫∫ [F(x∧y) − F(x)F(y)] f²(x) f²(y) dx dy / (∫ f²(x) dx)².

It is interesting to make a numerical comparison of this variance with that of some other well celebrated estimators. Let a_w², a_lad², a_ls² and a_ns² denote the respective asymptotic variances of the Wilcoxon rank, the least absolute deviation, the least square and the normal scores estimators of β. Recall from Chapter 4 that

a_w² = {12 (∫ f²(x) dx)²}^{−1};  a_lad² = {2f(0)}^{−2};
a_ls² = σ²;  a_ns² = {∫ f²(x)/φ(Φ^{−1}(F(x))) dx}^{−2},

where σ² is the error variance. Using these we obtain the following table.

Table 1
F             a₀²      a_w²     a_lad²   a_ns²   a_ls² = σ²
Double Exp.   1.2      1.333    1        π/2     2
Logistic      3.0357   3        4        π       π²/3
Normal        1.0946   π/3      π/2      1       1
Cauchy        2.5739   3.2899   2.46     ·       ∞
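Two columns of Table 1 can be recomputed directly from the formulas recalled above, a_lad² = {2f(0)}^{−2} and a_w² = {12(∫f²)²}^{−1}; the remaining columns require the heavier double integrals. A quick numerical check (our own, using the standard densities on a grid):

```python
import numpy as np

x = np.linspace(-60.0, 60.0, 600001)          # grid containing 0
densities = {
    "double exp": 0.5 * np.exp(-np.abs(x)),
    "logistic":   np.exp(-x) / (1.0 + np.exp(-x)) ** 2,
    "normal":     np.exp(-x * x / 2.0) / np.sqrt(2.0 * np.pi),
    "cauchy":     1.0 / (np.pi * (1.0 + x * x)),
}

a_lad, a_w = {}, {}
for name, f in densities.items():
    f0 = f[np.argmin(np.abs(x))]              # f(0)
    int_f2 = np.trapz(f * f, x)               # integral of f^2
    a_lad[name] = (2.0 * f0) ** -2            # LAD asymptotic variance
    a_w[name] = 1.0 / (12.0 * int_f2 ** 2)    # Wilcoxon asymptotic variance
```

The values reproduce the a_lad² and a_w² columns of Table 1; e.g. the Cauchy LAD entry 2.46 is π²/4 ≈ 2.467.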
It thus follows that the m.d. estimator β̂_{X_c}(L), with L(s) ≡ s, is superior to the Wilcoxon rank estimator and the LAD estimator at double exponential and logistic errors, respectively. At normal errors, it has smaller variance than the LAD estimator and compares favorably with the optimal estimator. The same is true for the m.d. estimator β̂_X(F).

Next, we shall discuss β̃. In the following theorem the framework is the same as in Theorem 5.5.7. Also see (5.5.82) for the definitions of B₁n etc.

Theorem 5.6a.3. In addition to the assumptions of Theorem 5.5.7,
assume that

(5.6.23)  liminf_n inf_{||e||=1} ∫ |ν'_n e| dα_n ≥ a,  for some a > 0.

Moreover, assume that (5.6.9) holds and that

(5.6.24)  B₁n^{−1} exists for all n ≥ p.

Then,

(5.6.25)  A^{−1}(β̃ − β) = B₁n^{−1} ∫∫ ν_n(s, y){M₁n(s, y) + m_n(s, y)} dα_n(s, y) + o_p(1).
Proof. The proof of (5.6.25) is similar to that of (5.5.33), hence no details are given. □

Corollary 5.6a.5. Suppose the conditions of Theorem 5.6a.3 are satisfied by F_ni ≡ F ≡ H_ni, G_n ≡ G, L_n ≡ L, where F is supposed to have a continuous density f. Let

C = ∫∫₀¹ ∫∫₀¹ [{A Σ_{i=1}^{ns} Σ_{j=1}^{nt} x_i x_j' f_i(y) f_j(z) A}(s∧t)
    × (F(y∧z) − F(y)F(z))] dα(s, y) dα(t, z),

where f_i(y) = f(y − x'_i β), and dα(s, y) = dL(s) dG(y). Then the asymptotic distribution of A^{−1}(β̃ − β) is N(0, Σ₀(β)), where Σ₀(β) = B₁n^{−1} C B₁n^{−1}.
Because of the dependence of Σ₀ on β, no clear cut comparison between β̃ and β̂_X in terms of their asymptotic covariance matrices seems to be feasible. However, some comparison at a given β can be made. To demonstrate this, consider the case when L(s) ≡ s, p = 1 and β₁ = 0. Write x_i for x_{i1} etc. Note that here, with T_x² := Σ_{i=1}^n x_i²,

B₁n = T_x^{−2} ∫₀¹ Σ_{i=1}^{ns} x_i Σ_{j=1}^{ns} x_j ds × ∫ f² dG,

C = T_x^{−2} ∫₀¹ ∫₀¹ Σ_{i=1}^{ns} x_i Σ_{j=1}^{nt} x_j (s∧t) ds dt × ∫∫ [F(y∧z) − F(y)F(z)] dψ(y) dψ(z).

Consequently Σ₀(0) can be evaluated directly. Recall that τ² is the asymptotic variance of T_x(β̂_X − β). Direct integration shows that, in the cases x_i ≡ 1 and x_i ≡ i, Σ₀(0) exceeds τ². Thus, in the cases of the one sample location model and the first degree polynomial through the origin, in terms of the asymptotic variance, β̂_X dominates β̃ with L(s) ≡ s at β = 0.  □
5.6.2
Robustness
In a linear regression setup an estimator needs to be robust against departures in the assumed design variables and the error distributions. As seen in the previous section, one purpose of having general weights D in β̂_D was to prove that β̂_X is asymptotically efficient among a certain class of m.d. estimators {β̂_D, D varies}. Another purpose is to robustify these estimators against the extremes in the design, by choosing D to be a bounded function of X that satisfies all the other conditions of Theorem 5.6a.1. Then the corresponding β̂_D would be asymptotically normal and robust against the extremes in
the design, but not as efficient as β̂_X. This gives another example of the phenomenon that compromises efficiency in return for robustness. A similar remark applies to {β̂⁺_D} and {β̃_D}. We shall now focus on the qualitative robustness (see Definition 4.4.1) of β̂_X and β̂⁺_X. For simplicity, we shall write β̂ and β̂⁺ for β̂_X and β̂⁺_X, respectively, in the rest of this section.

To begin with, consider β̂. Recall Theorem 5.5.3 and the notation of (5.5.19). We need to apply these to the case when the errors in (1.1.1) are modeled to be i.i.d. F, but their actual d.f.'s are {F_ni}, D = XA and G_n ≡ G. Then the various quantities in (5.5.19) acquire the following form:

r_n(y) = AX'A*(y)XA,  B_n = AX' ∫ A* Π A* dG XA,

Z̄_n = ∫ r_n(y) AX'[α_n(y) + Δ_n(y)] dG(y) = Z_n + b_n, say,

where, for 1 ≤ i ≤ n, y ∈ ℝ,

α'_n := (α_n1, α_n2, ⋯, α_nn),  Δ'_n := (Δ_n1, Δ_n2, ⋯, Δ_nn);
Π := X(X'X)^{−1}X';  b_n := ∫ r_n(y) AX' Δ_n(y) dG(y).

The assumption (5.5.a) ensures that the design matrix X is of full rank p. This in turn implies the existence of B_n^{−1} and the satisfaction of (5.5.b), (5.5.l) in the present case. Moreover, because G_n ≡ G, (5.5.k) now becomes

(5.6.26)  liminf_n inf_{||e||=1} k_n(e) ≥ τ,  for some τ > 0,

where

k_n(e) := e'AX' ∫ A* g dG XAe,  ||e|| = 1,

and where g is a function from ℝ to [0, ∞], 0 < ∫ g^r dG < ∞, r = 1, 2. Because G is a σ-finite measure, such a g always exists. Upon specializing Theorem 5.5.3 to the present case, we readily obtain

Corollary 5.6b.1. Assume that in (1.1.1) the actual and modeled d.f.'s of the errors {e_ni, 1 ≤ i ≤ n} are {F_ni, 1 ≤ i ≤ n} and F,
respectively. In addition, assume that (5.5.a), (5.5.c)–(5.5.j) with D = XA, H_ni ≡ F, G_n ≡ G, and (5.6.26) hold. Then

A^{−1}(β̂ − β) = B_n^{−1}{Z_n + b_n} + o_p(1).  □

Observe that B_n^{−1}b_n measures the amount of the asymptotic bias in the estimator β̂ when F_ni ≠ F. Our goal here is to obtain the asymptotic distribution of A^{−1}(β̂ − β) when {F_ni} converge to F in a certain sense. The achievement of this goal is facilitated by the following lemma. Recall that for any square matrix L, ||L||_∞ = sup{||t'L||; ||t|| ≤ 1}. Also recall the fact that

(5.6.27)  ||L||_∞ ≤ {tr. LL'}^{1/2},

where tr. denotes the trace operator.

Lemma 5.6b.1. Let F and G satisfy (5.5.44). Assume that (5.5.e) and (5.5.j) are satisfied by G_n ≡ G, {F_ni}, H_ni ≡ F and D = XA. Moreover, assume that (5.5.c) and (5.6.28) hold. Then, with I = I_{p×p},

(i)   ||B_n − I ∫f²dG||_∞ = o(1).
(ii)  ||B_n^{−1} − I(∫f²dG)^{−1}||_∞ = o(1).
(iii) |tr.B_n − p ∫f²dG| = o(1).
(iv)  |Σ_{j=1}^p ∫ ||ν_j||² dG − p ∫f²dG| = o(1).
(v)   ||b_n − ∫ AX'Δ_n(y) f(y) dG(y)|| = o(1).
(vi)  ||Z_n − ∫ AX'α_n(y) f(y) dG(y)|| = o_p(1).
(vii) sup_{||e||=1} |k_n(e) − ∫ f g dG| = o(1).

Remark 5.6b.1. Note that the condition (5.5.j) with D = XA, G_n ≡ G now becomes (5.6.29).

Proof. To begin with, because AX'XA ≡ I, we obtain the relation

r_n(y) r_n'(y) − f²(y)I = AX'[A*(y) − f(y)I]XA · AX'[A*(y) − f(y)I]XA
 = AX'C(y)XA · AX'C(y)XA = 𝒟(y)𝒟'(y),  y ∈ ℝ,
where C(y) := A*(y) − I f(y), 𝒟(y) := AX'C(y)XA, y ∈ ℝ. Therefore,

(5.6.30)  ||B_n − I ∫ f² dG||_∞ ⋯

⋯ and every a ∈ ℝᵖ with ||a|| = 1,

Σ_{i=1}^n ∫ (a'ξ_ni)² I(|a'ξ_ni| > ε) dP_ni = o(1).

Now, let X_n1, ⋯, X_nn be independent r.v.'s with Q_n1, ⋯, Q_nn denoting their respective distributions, and let S̄_n = S̄_n(X_n1, ⋯, X_nn) be an estimator of S_n(Q_n). Let U be a nondecreasing bounded function on [0, ∞] to [0, ∞) and define the risk of estimating S_n(Q_n) by S̄_n to be R_n(S̄_n, Q_n), where the expectation E_n involved is computed under Q_n.
Theorem 5.6c.1. Suppose that {S_n : Π_n → ℝᵖ, n ≥ 1} is a sequence of H-differentiable functionals and that the sequence {P^n ∈ Π_n} is such that (5.6.44) holds. Then (5.6.45) holds, where Z is a N(0, I_{p×p}) r.v.

Sketch of a proof. This is a reformulation of a result of Beran (1982), pp. 425–426. He actually proved (5.6.45) with 𝒦_n(P^n, c, η_n) replaced by ℋ_n(P^n, c) and without requiring (5.6.44). The assumption (5.6.44) is an assumption on the fixed sequence {P^n} of probability measures. Beran's proof proceeds as follows:
Under (1) and (2) of the Definition 5.6c.1, there exists a sequence of probability measures {Q_n(h)} such that for every 0 < b < ∞,

sup_{||h||≤b} Σ_{i=1}^n ∫ {q_ni^{1/2}(h) − p_ni^{1/2} − (1/2) h'ξ_ni p_ni^{1/2}}² dν_ni = o(1).

Consequently, for n sufficiently large, the family {Q_n(h), ||h|| ≤ b, h ∈ ℝᵖ} is a subset of ℋ_n(P^n, (b/2)). Hence, ∀ c > 0 and ∀ sequences of statistics {S_n},

(5.6.46)  liminf_n inf_{S_n} sup_{Q_n ∈ ℋ_n(P^n, c)} R_n(S_n, Q_n)
          ≥ liminf_n inf_{S_n} sup_{||h|| ≤ 2c} R_n(S_n, Q_n(h)).

Then the proof proceeds as in the Hájek–Le Cam setup for the parametric family {Q_n(h), ||h|| ≤ b}, under the LAN property of the family {Q_n(h), ||h|| ≤ b} with b = 2c. Thus (5.6.45) will be proved if we verify (5.6.46) with ℋ_n(P^n, c) replaced by ℋ_n(P^n, c, η_n) under the additional assumption (5.6.44). That is, we have to show that there exist sequences 0 < η_n1 → 0, 0 < η_n2 → 0 such that the above family {Q_n(h), ||h|| ≤ b} is a subset of ℋ_n(P^n, (b/2), η_n) for sufficiently large n. To that effect we recall the family {Q_n(h)} from Beran. With ξ_ni as in the Definition 5.6c.1, let ξ_nij denote the jth component of ξ_ni, 1 ≤ j ≤ p, 1 ≤ i ≤ n. By (3), there exists a sequence ε_n > 0, ε_n ↓ 0, such that ⋯.
Now, define , for 1 :::; i :::; n, 1 :::; j :::; P,
5.6.3. Locally asymptotically minimax Note that (5.6.47) For a
Ilenill :::; 2pE n ,
°
.(Xi , d(Xi)) I(£(X i) :::; x ),
x E JR.
i= l
An adaptat ion of t he GlivenkoCantelli arg uments ( see (6.6.31) below) yield s sup In 1/ 2 D n(x)  E>'(X ,d(X))I( £(X) :::; x )1 + 0, a.s., xE IR
where the expectation E is com pute d under the alte rnati ve m'l/J . Mor eover , by Theorem 6.6.1 below, Sn,'l/J converges weakl y to a continuous Gau ssian process. These fact s t ogether with the ass umpt ion (6.6.4) and a r outine arg ument yield the consiste ncy of t he KS and Cramerv on Mises t est s based on Sn,'l/J ' Note that t he condit ion (6.6.4) is t rivi ally satisfied when 'l/J (x) == x while for 'l/J = 'l/Jo:, it is equivalent to requirin g t hat zero be the un iqu e o:th per centile of the condi tional distribution of the error Y  m'l/J( X ), given X .
6.6.2
Transform T_n of S_n,ψ
This section first discusses the asymptotic behavior of the processes introduced in the previous section under the simple and composite hypotheses. Then, in the special case p = 1, ℓ(x) ≡ x, a transformation T_n is given so that the process T_n S_n,ψ has a weak limit with a known distribution. Consequently the tests based on the process T_n S_n,ψ are ADF. To begin with, we give a general result of somewhat independent interest. For each n, let (Z_ni, X_i) be i.i.d. r.v.'s, {X_i, 1 ≤ i ≤ n} independent of {Z_ni; 1 ≤ i ≤ n},

(6.6.5)  EZ_n1 ≡ 0,  EZ²_n1 < ∞,  ∀ n ≥ 1,

and define

(6.6.6)  Z_n(x) = n^{−1/2} Σ_{i=1}^n Z_ni I(ℓ(X_i) ≤ x),  x ∈ ℝ.
6. Goodness-of-fit Tests in Regression
The process Z_n takes its values in the Skorokhod space D(−∞, ∞). Extend it continuously to ±∞ by putting

Z_n(−∞) = 0,  and  Z_n(∞) = n^{−1/2} Σ_{i=1}^n Z_ni.

Then Z_n becomes a process in D[−∞, ∞]. Let σ_n² := EZ²_n1 and let L denote the d.f. of ℓ(X). Note that, under (6.6.5), the covariance function of Z_n is

K_n(x, y) := σ_n² L(x ∧ y).

⋯ A smooth ψ (see (Ψ1) below) and a nonsmooth ψ (see (Ψ2) below) are dealt with separately. All probability statements in these assumptions are understood to be made under H₀. We make the following assumptions. About the estimator θ_n assume

(6.6.12)  ⋯.
About the model under H₀ assume the following: there exists a function ṁ from ℝᵖ × Θ to ℝ^q such that ṁ(·, θ₀) is measurable and satisfies the following: for all k < ∞,

(6.6.13)  sup_{1≤i≤n, n^{1/2}||θ−θ₀||≤k} n^{1/2} |m(X_i, θ) − m(X_i, θ₀) − (θ − θ₀)'ṁ(X_i, θ₀)| = o_p(1),

(6.6.14)  E||ṁ(X, θ₀)||² < ∞.

⋯ ∀ ε > 0, ∃ a δ > 0 and an N < ∞ ∋ ∀ 0 < b < ∞, ||s|| ≤ b, n > N,

(6.6.21)  P(Σ_{i=1}^n sup_{||t−s||≤δ} n^{−1/2} ||ṁ_i(θ + n^{−1/2}t) − ṁ_i(θ + n^{−1/2}s)|| ≤ ε) ≥ 1 − ε,

and e'M(θ + n^{−1/2}re) is monotonic in r ∈ ℝ, ∀ e ∈ ℝ^q, ||e|| = 1, n ≥ 1, a.s.

A proof of the above claim uses the methodology of Chapter 5.4. See also Section 8.2 below in connection with autoregressive models. Unlike (6.6.3), the structure of K̂ given at (6.6.17) does not allow for a simple representation of Ŝ_n,ψ in terms of a process with a known distribution.
The situation is similar to the model checking for the underlying error distribution as in the previous sections of this chapter. Now focus on the case p = 1, ℓ(x) ≡ x. In this case it is possible to transform the process S_n,ψ so that it is still a statistic with a known limiting null distribution. To simplify the exposition, write m(·) = m(·, θ₀). Set

A(x) = ∫ ṁ(y) ṁ'(y) I(y ≥ x) G(dy),  x ∈ ℝ,

where G denotes the d.f. of X, assumed to be continuous. Assume that A(x₀) is nonsingular for some x₀. ⋯

Σ_{i=1}^n {I(Y_i − X_i θ_n > 0) − .5} [I(X_i ≤ x₀) − Σ_{j=1}^n X_j X_i I(X_j ≤ X_i ∧ x) / Σ_{k=1}^n X_k² I(X_k ≥ X_j)],

and

σ̂²_{n,.5} = n^{−1} Σ_{i=1}^n {I(Y_i − X_i θ_n > 0) − .5}².

By Theorem 6.6.4, the asymptotic null distribution of both of these tests is free from the null model and other underlying parameters, as long
as t he estimator en is t he least squa re est imator in t he form er test and t he least abso lute deviation esti mator in t he lat ter.
=
=
In the case q = 2, 91 (x) 1, 92(X) x , one obtains ril (x , ( 0 ) and A (x ) is t he 2 x 2 symmetric matrix
A (x )
= E I(X ?
x)
(~
= (1, xy
:2 ).
Clearly, E X 2 < 00 implies A (x ) is nonsingular for every real x and AI and A are cont inuous on lIt The matrix
). provides a uniformly a.s . consiste nt estimator of A (x). Thus one may use sUPx:SXQ ITnSn,I (x) II {(In,IGn (XO)} to test t he hypoth esis t hat t he regression mean function is a simpl e linear model on th e interval (00, xo ]. Similarl y, one can use t he test statistic
to test the hypothesis that the regression median function is given by a simple linear function. In both cases A_n is as above, and one should now use the general formula (6.6.30) to compute these statistics. Again, from Theorem 6.6.4 it readily follows that the asymptotic levels of both of these tests can be computed from the distribution of sup_{0≤u≤1} |B(u)|, provided the estimator θ_n is taken to be, respectively, the LS and the LAD.

Remark 6.6.2 Theorems 6.6.1 and 6.6.2 can be extended to the case where ℓ(X_i) is replaced by an r-vector of functions in the definitions of S_{n,ψ} and S̄_{n,ψ}, for some positive integer r. In this case the time parameter of these processes is an r-dimensional vector. The difficulty in transforming such processes to obtain a limiting process with a known limiting distribution is similar to that faced in transforming the multivariate empirical process in the i.i.d. setting. This, in turn, is related to the difficulty of having a proper definition of a multi-time parameter martingale. See Khmaladze (1988, 1993) for a discussion of the issues involved. For these reasons we restricted our attention here to the one dimensional time parameter. □
6.6.4 Proofs of some results of Section 6.6.2
Before proceeding further, we state two facts that will be used repeatedly below. Let {ξ_i} be r.v.'s with finite first moment such that {(ξ_i, X_i)} are i.i.d., and let ζ_i be i.i.d. square integrable r.v.'s. Then max_{1≤i≤n} n^{−1/2}|ζ_i| = o_p(1) and

(6.6.31) sup_{x∈ℝ} | n^{−1} Σ_{i=1}^n ξ_i I(X_i ≤ x) − Eξ₁ I(X ≤ x) | → 0, a.s.
The LLN implies the pointwise convergence in (6.6.31). The uniformity is obtained with the aid of the triangle inequality, by decomposing each ξ_i into its negative and positive parts, and by applying a Glivenko-Cantelli type argument to each part.

Remark 6.6.3 We are now ready to sketch an argument for the weak convergence of S_{n,ψ}((τ²_{n,ψ})^{−1}) to B under the hypothesis m_ψ = m₀. For the sake of brevity, let b_n := τ²_{n,ψ}(∞), b := τ²_ψ(∞). First, note that

sup_t | τ²_{n,ψ}((τ²_{n,ψ})^{−1}(t)) − t | ≤ max_{1≤j≤n} n^{−1} ψ²(Y_j − m₀(X_j)) = o_p(1).

Fix ε > 0 and let A_n := [|b_n − b| ≤ ε] and c_ε := 1/[1 − ε].
Now, for x ≤ x₀, rewrite the relevant term as

K_n(x) := n^{−1/2} Σ_{i=1}^n ψ(ε_i) ∫_{−∞}^{x} ṁ′(y) A^{−1}(y) ṁ(X_i) I(X_i ≥ y) G(dy).
Because the summands are martingale differences and because of (6.6.14), we obtain, with the help of Fubini's theorem, that for x < y,

E[K_n(y) − K_n(x)]² = σ²_ψ ∫_x^y ∫_x^y ṁ′(s) A^{−1}(s) A(s ∨ t) A^{−1}(t) ṁ(t) G(dt) G(ds).

By (6.6.14) and ||A||_∞ := sup_{x∈ℝ} ||A(x)|| ≤ ∫_{−∞}^{∞} ||ṁ||² dG < ∞, we obtain a corresponding bound on E[K_n(y) − K_n(x)]².
For every ε > 0 there is a δ > 0 such that for every ||u|| ≤ b,

(7.2.15) lim sup_n P( sup_{||s||≤b, ||s−u||≤δ} sup_{x∈ℝ} |W_s(x) − W_u(x)| > 4ε ) < ε.
By the definition of W± and the triangle inequality, for x ∈ ℝ, s, u ∈ ℝ^p, (7.2.16) provides a bound on |W_s(x) − W_u(x)|.

The condition in question requires that x² P(|ε₁| > x) → 0 as x → ∞. This last condition is weaker than requiring that Eε₁² < ∞. For example, let the right tail of the distribution of |ε₁| be given as follows:
P(|ε₁| > x) = 1,             0 ≤ x < 2,
            = 1/(x² ℓn x),   x ≥ 2.
Then E|ε₁| < ∞ and Eε₁² = ∞, yet x² P(|ε₁| > x) → 0 as x → ∞. A similar remark applies to (7.2.4) with respect to the square integrability of h(Y₀). □
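The claims in this example can be checked numerically; the following sketch (our own illustration, not from the text) approximates the moment integrals by a midpoint rule.

```python
import math

# Illustration: the tail P(|e1| > x) = 1 for 0 <= x < 2 and 1/(x^2 ln x)
# for x >= 2 gives E|e1| < infinity, E e1^2 = infinity, and
# x^2 P(|e1| > x) = 1/ln x -> 0.
def tail(x):
    return 1.0 if x < 2.0 else 1.0 / (x * x * math.log(x))

def midpoint(f, a, b, n=200000):
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# E|e1| = integral of the tail: the increment over [1e3, 1e6) is negligible,
# so the first-moment integral converges.
inc1 = midpoint(tail, 1.0e3, 1.0e6)

# E e1^2 = integral of 2x tail(x): the increment over [1e3, 1e6) equals
# 2(ln ln 1e6 - ln ln 1e3), roughly 1.39, and keeps growing without bound.
inc2 = midpoint(lambda x: 2.0 * x * tail(x), 1.0e3, 1.0e6)

# x^2 P(|e1| > x) = 1/ln x, small at x = 1e6 although E e1^2 is infinite.
print(inc1, inc2, 1.0e12 * tail(1.0e6))
```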
Remark 7.2.3 An analogue of (7.2.25) was first proved by Boldin (1982), requiring {X_i} to be stationary, Eε₁ = 0, Eε₁² < ∞, and a uniformly bounded second derivative of F. Corollary 7.2.2 improves Boldin's result in the sense that F needs to be smooth only up to the first derivative and the r.v.'s need not have a finite second moment. Again, if Y₀ and {ε_i} are so chosen that the Ergodic Theorem is applicable and E(Y₀) = 0, then the coefficient n^{−1} Σ_{i=1}^n Y_{i−1} of the linear term in (7.2.25) will converge to 0, a.s. Thus (7.2.25) becomes

(7.2.26)
sup_{x∈ℝ, ||u||≤b} n^{1/2} | F_n(x, ρ + n^{−1/2}u) − F_n(x, ρ) | = o_p(1).
In particular, this implies that if ρ̂ is an estimator of ρ such that n^{1/2}|ρ̂ − ρ| = O_p(1), then sup_{x∈ℝ} n^{1/2}|F_n(x, ρ̂) − F_n(x, ρ)| = o_p(1).
Consequently, the estimation of ρ has an asymptotically negligible effect on the estimation of the error d.f. F. This is similar to the fact, observed in the previous chapter, that the estimation of the slope parameters in linear regression has an asymptotically negligible effect on the estimation of the error d.f., as long as the design matrix is centered at the origin. □
7. Autoregression
Serial Rank Residual Correlations. An important application of (7.2.25) occurs in proving the AUL property of the serial rank correlations of the residuals as functions of t. More precisely, let R_it denote the rank of X_i − t′Y_{i−1} among X_j − t′Y_{j−1}, 1 ≤ j ≤ n, 1 ≤ i ≤ n. Define R_it = 0 for i ≤ 0. Residual rank correlations of lag j, for 1 ≤ j ≤ p, t ∈ ℝ^p, are defined as

(7.2.27) S_j(t) := 12/{n(n² − 1)} Σ_{i=j+1}^n ( R_{i−j,t} − (n+1)/2 )( R_it − (n+1)/2 ),

and S := (S₁, …, S_p)′.
Simple algebra shows that

S_j(t) = a_n L_j(t) + b_nj(t),

where a_n is a nonrandom sequence not depending on t, with |a_n| = O(1),

b_nj(t) := 6(n+1)/{n(n² − 1)} ( Σ_{i=n−j+1}^n + Σ_{i=1}^j ) R_it,

and

L_j(t) := n^{−3} Σ_{i=j+1}^n R_{i−j,t} R_it, 1 ≤ j ≤ p, t ∈ ℝ^p.

Observe that sup{|b_nj(t)|; t ∈ ℝ^p} ≤ 48p/n, so that n^{1/2} sup{|b_nj(t)|; t ∈ ℝ^p} tends to zero, a.s. It thus suffices to prove the AUL of {L_j} only, 1 ≤ j ≤ p. In order to state the AUL result we need to introduce
(7.2.28) Z_ij := f(ε_i) F(ε_{i−j}) + F(ε_i) f(ε_{i−j}), i > j, := 0, i ≤ j,

U_ij := Y_{i−j−1} F(ε_i) f(ε_{i−j}) + Y_{i−1} f(ε_i) F(ε_{i−j}), i > j, := 0, i ≤ j.
Observe that {Z_ij} are bounded r.v.'s with EZ_ij = ∫ f²(x) dx for all i and j. Moreover, {ε_i} i.i.d. F implies that {Z_ij, j < i ≤ n} are stationary
7.2. AUL of W_h & F_n
and ergodic. By the Ergodic Theorem,

Z̄_j → b(f) := ∫ f²(x) dx, a.s., j = 1, …, p.
We are now ready to state and prove

Theorem 7.2.2 Assume that (7.1.1), (7.2.5), (7.2.7) and (7.2.24) hold. Then for every 0 < b < ∞ and for every 1 ≤ j ≤ p,

(7.2.29) sup_{||u||≤b} | n^{1/2}[L_j(ρ + n^{−1/2}u) − L_j(ρ)] − u′[b(f) Ȳ_n − Ū_j] | = o_p(1).
If (7.2.5) and (7.2.24) are strengthened to requiring E(||Y₀||² + ε₁²) < ∞ and {X_i} stationary and ergodic, then Ȳ_n and Ū_j may be replaced by their respective expectations in (7.2.29).

Proof. Fix a j with 1 ≤ j ≤ p. For simplicity of exposition, write L(u), L(0) for L_j(ρ + n^{−1/2}u), L_j(ρ), respectively. Apply a similar convention to other functions of u. Also write ε_iu for ε_i − n^{−1/2}u′Y_{i−1} and F_n(·) for F_n(·, ρ). With these conventions, R_iu is now the rank of X_i − (ρ + n^{−1/2}u)′Y_{i−1} = ε_iu. In other words, R_iu ≡ nF_n(ε_iu, u) and
L(u) = n^{−1} Σ_{i=j+1}^n F_n(ε_{i−j,u}, u) F_n(ε_iu, u), u ∈ ℝ^p.
The proof is based on the linearity properties of F_n(·, u) as given in (7.2.25) of Corollary 7.2.2 above. In fact, if we let

B_n(x, u) := F_n(x, u) − F_n(x) − n^{−1/2} u′Ȳ_n f(x), x ∈ ℝ,

then (7.2.25) is equivalent to

sup n^{1/2} |B_n(x, u)| = o_p(1).
All supremums, unl ess specified ot herwise, in the proof are over x E JR, 1 lIuli ~ b. Rewrite
i ~ nand / or
n^{1/2}(L(u) − L(0))
 = n^{−1/2} Σ_{i=j+1}^n { F_n(ε_{i−j,u}, u) F_n(ε_iu, u) − F_n(ε_{i−j}) F_n(ε_i) }
 = n^{−1/2} Σ_{i=j+1}^n [ {B_n(ε_{i−j,u}, u) + F_n(ε_{i−j,u}) + n^{−1/2} u′Ȳ_n f(ε_{i−j,u})}
   · {B_n(ε_iu, u) + F_n(ε_iu) + n^{−1/2} u′Ȳ_n f(ε_iu)} − F_n(ε_{i−j}) F_n(ε_i) ].
Hence, from (7.2.5), (7.2.20) and (7.2.24),

(7.2.30)
n^{1/2}(L(u) − L(0)) = n^{−1/2} Σ_{i=j+1}^n [ F_n(ε_{i−j,u}) F_n(ε_iu) − F_n(ε_i) F_n(ε_{i−j}) ]
 + n^{−1} Σ_{i=j+1}^n [ F_n(ε_{i−j,u}) f(ε_iu) + F_n(ε_iu) f(ε_{i−j,u}) ] (u′Ȳ_n) + u_p(1),

where, now, u_p(1) is a sequence of stochastic processes converging to zero uniformly, in probability, over the set {u ∈ ℝ^p; ||u|| ≤ b}. Now recall that (7.2.7) and the asymptotic uniform continuity of the standard empirical process based on i.i.d. r.v.'s imply that
sup_{|x−y|≤δ} n^{1/2} | [F_n(x) − F(x)] − [F_n(y) − F(y)] | = o_p(1),

when first n → ∞ and then δ → 0. Hence, from (7.2.5) and the fact that

sup_{i,u} |ε_iu − ε_i| ≤ b n^{−1/2} max_i ||Y_{i−1}||,

one readily obtains

sup_{i,u} n^{1/2} | [F_n(ε_iu) − F(ε_iu)] − [F_n(ε_i) − F(ε_i)] | = o_p(1).
From this and (7.2.7) we obtain

(7.2.31) sup_{i,u} n^{1/2} | F_n(ε_iu) − F_n(ε_i) + n^{−1/2} u′Y_{i−1} f(ε_i) | = o_p(1).
From (7.2.30), (7.2.31), the uniform continuity of f and F, and the Glivenko-Cantelli lemma, one obtains

(7.2.32) n^{1/2}(L(u) − L(0))
 = n^{−1} Σ_{i=j+1}^n [ F(ε_{i−j}) f(ε_i) + F(ε_i) f(ε_{i−j}) ] (u′Ȳ_n)
 − u′ n^{−1} Σ_{i=j+1}^n { Y_{i−j−1} f(ε_{i−j}) F(ε_i) + Y_{i−1} f(ε_i) F(ε_{i−j}) } + u_p(1).
In concluding (7.2.32) we also used the fact that, by (7.2.5) and (7.2.24),

sup_u n^{−3/2} Σ_{i=j+1}^n |u′Y_{i−j−1}| |u′Y_{i−1}| ≤ b² n^{−1/2} max_i ||Y_{i−1}|| · n^{−1} Σ_{i=j+1}^n ||Y_{i−j−1}|| = o_p(1).
Now (7.2.29) readily follows from (7.2.32) and the notation introduced just before the statement of the theorem. The rest is obvious. □

Remark 7.2.4 Autoregressive moving average models. Boldin (1989) and Kreiss (1991) give an analogue of (7.2.26) for a moving average model of
order q and an autoregressive moving average model of order (p, q) (ARMA(p, q)), respectively, when the error d.f. F has zero mean, finite second moment and a bounded second derivative. Here we shall illustrate how Theorem 2.2.3 can be used to yield the same result under weaker conditions on F. For the sake of clarity, the details are carried out for an ARMA(1,1) model only. Let ε₀, ε₁, ε₂, … be i.i.d. F r.v.'s and X₀ be a r.v. independent of {ε_i, i ≥ 1}. Consider the process given by the relation

(7.2.33) X_i = ρX_{i−1} + ε_i + βε_{i−1}, i ≥ 1,

where

(7.2.34) |ρ| < 1, |β| < 1.
One can rewrite this model as

ε₁ = X₁ − (ρX₀ + βε₀),
ε_i = X_i − Σ_{j=1}^{i−1} (−β)^{j−1}(ρ + β) X_{i−j} − (−β)^{i−1}(ρX₀ + βε₀), i ≥ 2.
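The inversion above can be checked numerically. The following sketch uses the sign conventions stated above (our reconstruction, an assumption) and compares the closed form with the direct recursion ε_i = X_i − ρX_{i−1} − βε_{i−1}.

```python
import random

# Numerical check of the ARMA(1,1) residual expansion (illustrative values).
random.seed(0)
rho, beta, n = 0.5, 0.4, 200
eps = [random.gauss(0.0, 1.0) for _ in range(n + 1)]   # eps[0] plays e_0
X = [random.gauss(0.0, 1.0)]                           # X_0, independent of e_i, i >= 1
for i in range(1, n + 1):
    X.append(rho * X[i - 1] + eps[i] + beta * eps[i - 1])

# Direct inversion: e_i = X_i - rho X_{i-1} - beta e_{i-1}.
e_rec = [eps[0]]
for i in range(1, n + 1):
    e_rec.append(X[i] - rho * X[i - 1] - beta * e_rec[i - 1])

# Closed-form expansion:
# e_i = X_i - sum_{j=1}^{i-1} (-beta)^(j-1) (rho+beta) X_{i-j}
#           - (-beta)^(i-1) (rho X_0 + beta e_0),  i >= 2.
def e_closed(i):
    if i == 1:
        return X[1] - (rho * X[0] + beta * eps[0])
    s = X[i] - sum((-beta) ** (j - 1) * (rho + beta) * X[i - j]
                   for j in range(1, i))
    return s - (-beta) ** (i - 1) * (rho * X[0] + beta * eps[0])

err = max(abs(e_closed(i) - e_rec[i]) for i in range(1, n + 1))
print(err < 1e-9)   # the two forms agree
```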
Let θ := (s, t)′ denote a point in the open square (−1, 1)² and θ₀ := (ρ, β)′ denote the true parameter value. Assume that the θ's are restricted to the following sequence of neighbourhoods: for a b ∈ (0, ∞),

(7.2.35) n^{1/2} ||θ − θ₀|| ≤ b.

Let {ε̃_i, i ≥ 1} stand for the residuals of (7.2.34) after ρ and β are replaced by s and t, respectively. Let F_n(·, θ) denote the empirical d.f. of {ε̃_i, 1 ≤ i ≤ n}. This empirical d.f. can be rewritten as

(7.2.36) F_n(x, θ) = n^{−1} Σ_{i=1}^n I(ε_i ≤ x + δ_ni), x ∈ ℝ,
where
(7.2.37) δ_n1 := (s − ρ)X₀ + (t − β)ε₀,

δ_ni := Σ_{j=1}^{i−1} [ (−t)^{j−1}(s + t) − (−β)^{j−1}(ρ + β) ] X_{i−j}
      + [ (−t)^{i−1}(sX₀ + tε₀) − (−β)^{i−1}(ρX₀ + βε₀) ]
    =: δ_ni1 + δ_ni2, say, i ≥ 2.
From (7.2.37), it follows that for every θ ∈ (−1, 1)² satisfying (7.2.35),

|δ_n1| ≤ b n^{−1/2}(|X₀| + |ε₀|),

max_{2≤i≤n} |δ_ni1| ≤ 2b n^{−1/2} max_{1≤i≤n} |X_i| (1 − |β|)^{−1} {1 + (1 − |β|)^{−1}},

and max_{2≤i≤n} |δ_ni2| is bounded analogously.
because each summand is nonnegative. This proves that J(t) ≥ 0, t ∈ ℝ^p. By Theorem 368 of Hardy, Littlewood and Pólya (1952),

D(x) = max_σ D_σ(x), x ∈ ℝⁿ.

Therefore, (7.3.6) holds, for all t ∈ ℝ^p and x ∈ ℝⁿ.
7.3.2. GR-estimators
This shows that J(t) is a maximal element of a finite number of continuous and convex functions, and hence is itself continuous and convex. The statement about the a.e. differential being −nS(t) is obvious. This completes the proof of (a).

(b) Without loss of generality assume b > J(0). Write a t ∈ ℝ^p as t = se, s ∈ ℝ, e ∈ ℝ^p, ||e|| = 1. Let d_i ≡ e′Y_{i−1}. The assumptions about J imply that not all {d_i} are equal. Rewrite

J(t) = J(se) = Σ_{i=1}^n b_n(i)(X − sd)_(i) = Σ_{i=1}^n b_n(r_is)(X_i − sd_i),

where now r_is is the rank of X_i − sd_i among {X_j − sd_j; 1 ≤ j ≤ n}. From (7.3.6) it follows that J(se) is piecewise linear and convex in s, for every e ∈ ℝ^p, ||e|| = 1. Its a.e. derivative w.r.t. s is −Σ_{i=1}^n d_i b_n(r_is), which, by Lemma 7.3.1 and because of the assumed continuity, is nondecreasing in s and eventually positive. Hence J(se) will eventually exceed b, for every e ∈ ℝ^p, ||e|| = 1. Thus there exists an s_e such that J(s_e e) > b. Since J is continuous, there is an open set O_e of unit vectors v, containing e, such that J(s_e v) > b. Since b > J(0) and J is convex, J(sv) > b, for all s ≥ s_e and all v ∈ O_e. Now, for each unit vector e there is an open set O_e covering it. Since the unit sphere is compact, a finite number of these sets covers it. Let m be the maximum of the corresponding finite set of s_e. Then, for all s ≥ m and all unit vectors v, J(sv) > b. This proves claim (b), and hence the lemma. □

Note: Lemma 7.3.2 and its proof are an adaptation of Theorems 1 and 2 of Jaeckel (1972) to the present case. □
From the above lemma it follows that if the r.v.'s Y₀, X₁, X₂, …, X_n are continuous and the matrix n^{−1} Σ_{i=1}^n (Y_{i−1} − Ȳ)(Y_{i−1} − Ȳ)′ is a.s. positive definite, then the rank of X_c is a.s. p and the set {t ∈ ℝ^p; J(t) ≤ b} is a.s. bounded for every 0 ≤ b < ∞. Thus a minimizer ρ̂_J of J exists, a.s., and has the property that it makes ||S|| small. As is shown in Jaeckel (1972) in connection with the linear regression model, it follows from the AUL result given in Theorem 7.3.1 below that ρ̂_J and ρ̂_R are asymptotically equivalent. Note that the score function φ need not satisfy (7.3.5) in this theorem.
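The minimization of such a rank dispersion can be illustrated numerically. The sketch below (our own code) fits an AR(1) using centered Wilcoxon scores b_n(i) = i/(n+1) − 1/2, an assumed score choice, with a crude grid search standing in for a proper convex minimizer.

```python
import random

# Illustration: Jaeckel-type rank dispersion J(t) for an AR(1) model with
# centered Wilcoxon scores (assumed choice), minimized by grid search.
random.seed(1)
n, rho = 400, 0.6
X = [0.0]
for _ in range(n):
    X.append(rho * X[-1] + random.gauss(0.0, 1.0))

def J(t):
    res = [X[i] - t * X[i - 1] for i in range(1, n + 1)]
    order = sorted(range(n), key=lambda i: res[i])
    rank = [0] * n
    for r, i in enumerate(order, start=1):
        rank[i] = r
    # sum of centered-score-weighted residuals: convex, piecewise linear in t
    return sum((rank[i] / (n + 1.0) - 0.5) * res[i] for i in range(n))

grid = [k / 500.0 for k in range(-499, 500)]
rho_J = min(grid, key=J)
print(rho_J)   # close to the true rho for this sample size
```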
Unlike in the regression model (1.1.1), these estimators are not robust against outliers in the errors, because the weights in the scores S are now unbounded functions of the errors. Akin to GM estimators, we thus define GR estimators as

(7.3.7)

Strictly speaking, these estimators are not determined by the residual ranks alone, as here the weights in S_g also involve the observations. But we borrow this terminology from the linear regression setup. For the convenience of the statement of the assumptions and results, from now on we shall assume that the observed time series comes from the following model:

(7.3.8) X_i = ρ₁X_{i−1} + ρ₂X_{i−2} + ⋯ + ρ_p X_{i−p} + ε_i, i = 0, ±1, ±2, …, ρ ∈ ℝ^p,
with all roots of the equation (7.3.9) in (−1, 1), where {ε_i, i = 0, ±1, ±2, …} are i.i.d. F r.v.'s with

(7.3.10) Eε = 0, Eε² < ∞.

Either e′(g(Y_{i−1}) − ḡ)(Y_{i−1} − Ȳ)′e > 0, or e′(g(Y_{i−1}) − ḡ)(Y_{i−1} − Ȳ)′e < 0.
Then,

n^{1/2}(ρ̂_Rg − ρ) →_d N(0, σ²_φ Q^{−2} Σ_g^{−1} Γ_g Σ_g^{−1}).
Proof. From the methods in Sections 5.4 and 5.5, Theorem 7.3.1 and (7.3.16) imply that n^{1/2}||ρ̂_Rg − ρ|| = O_p(1) and that

n^{1/2}(ρ̂_Rg − ρ) = (QΣ_g)^{−1} n^{1/2} S_g(ρ) + o_p(1).

Observe that n^{1/2}S_g(ρ) is a vector of square integrable mean zero martingale arrays with nES_gS_g′ = σ²_φ Γ_g, σ²_φ := ∫₀¹ [φ(u) − φ̄]² du. Thus, by
7.3.3. Estimation of Q(f)
the routine Cramér-Wold device and by Lemma 9.1.3 in the Appendix, one readily obtains the claimed asymptotic normality. □

Remark 7.3.1 Write ρ̂_R for ρ̂_Rg when g(y) ≡ y. Argue either as in Section 3.4 or as in Jaeckel (1972) to conclude that ||n^{1/2}(ρ̂_R − ρ̂_J)|| = o_p(1). Consequently, by Theorem 7.3.2,
Remark 7.3.2 Recently, Mukherjee and Bai (2001) have extended the AUL result (7.3.15) to any nondecreasing square integrable φ for the case g(y) ≡ y, provided the error d.f. F has finite Fisher information for location, the condition (a) of Theorem 3.2.3. Their proof uses a contiguity argument, approximating the square integrable φ.

Then, a minimizer of K_t exists if either G(ℝ) = ∞ or G(ℝ) < ∞.
For every ||u|| ≤ b,

lim inf_n P( ∫ n^{−1} [ Σ_{i=1}^n h±(Y_{i−1}) { F(x + n^{−1/2}u′Y_{i−1} + δ_ni) − F(x + n^{−1/2}u′Y_{i−1} − δ_ni) } ]² dG(x) ≤ kδ² ) = 1,
where δ_ni := n^{−1/2} δ ||Y_{i−1}|| and h± is as in the proof of Theorem 7.2.1. For every ||u|| ≤ b,

(7.4.10) ∫ n^{−1} [ Σ_{i=1}^n h(Y_{i−1}) { F(x + n^{−1/2}u′Y_{i−1}) − F(x) − n^{−1/2}u′Y_{i−1} f(x) } ]² dG(x) = o_p(1),
and (5.5.44b) holds. Now recall the definitions of W_h, V_h, Ŵ_h, W±, T±, Ŵ±, Z±, m± from (7.1.6), (7.2.2), (7.2.11) and (7.2.12). Let |·|_G denote the L₂-norm w.r.t. the measure G. In the proofs below we adopt the notation and conventions used in the proof of Theorem 7.2.1. Thus, e.g., W_u(·), V_u(·) stand for W_h(·, ρ + n^{−1/2}u), V_h(·, ρ + n^{−1/2}u), etc.

Lemma 7.4.2 Suppose that the autoregression model (7.3.8) and (7.3.9) holds. Then the following hold. Assumption (7.4.8) implies that, for every 0 < b < ∞,

(7.4.11) E ∫ [Z±(x; u, a) − Z±(x; u, 0)]² dG(x) = o(1), ||u|| ≤ b, a ∈ ℝ.

Assumption (7.4.9) implies that, for every 0 < b < ∞,

(7.4.12) lim inf_n P( sup_{||v−u||≤δ} n^{1/2} |V±(·, ρ + n^{−1/2}v) − V±(·, ρ + n^{−1/2}u)|_G ≤ kδ ) = 1.

Since the set {u ∈ ℝ^p; ||u|| ≤ b} is compact, it suffices to show that for every η > 0 there is a δ > 0 such
that for every ||u|| ≤ b,

(7.4.19) lim inf_n P( sup_{||v−u||≤δ} … ) ≥ 1 − η.

A value e > 1 means the minimum distance estimator is asymptotically more efficient than the least squares estimator ρ̂_ls. It follows that it is to be preferred to ρ̂_ls for heavy tailed error d.f.'s F. Also note that if G(x) ≡ x, then τ² = 1/{12[∫ f²(x)dx]²} and e = 12σ²[∫ f²(x)dx]². If G is degenerate at 0, then τ² = 1/{4f²(0)} and e = 4σ² f²(0). These expressions are well known in connection with the Wilcoxon and median rank estimators of the slope parameters in linear regression models. For example, if F is N(0,1), then the first expression is 3/π while the second is 2/π. See Lehmann (1975) for some bounds on these expressions. Similar conclusions remain valid for p > 1. □

Remark 7.4.4 Least Absolute Deviation Estimator. As mentioned earlier,
if we choose g(x) ≡ x and G to be degenerate at 0, then the resulting estimator is the LAD estimator, viz.,

(7.4.41) ρ̂_lad := argmin_t Σ_{j=1}^p [ Σ_{i=1}^n X_{i−j} sign(X_i − t′Y_{i−1}) ]².
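For p = 1 the criterion in (7.4.41) reduces to minimizing [Σ_i X_{i−1} sign(X_i − tX_{i−1})]². A hedged grid-search sketch (our own simulation, illustrative values only):

```python
import random

# Illustration: sign-based criterion of (7.4.41) for AR(1), p = 1,
# minimized by a crude grid search over t in (-1, 1).
random.seed(2)
n, rho = 500, -0.3
X = [0.0]
for _ in range(n):
    X.append(rho * X[-1] + random.gauss(0.0, 1.0))

def crit(t):
    # squared sign-based estimating function; its zero crossing is the
    # LAD estimating equation for the AR(1) coefficient
    s = sum(X[i - 1] * (1.0 if X[i] - t * X[i - 1] > 0 else -1.0)
            for i in range(1, n + 1))
    return s * s

grid = [k / 500.0 for k in range(-499, 500)]
rho_lad = min(grid, key=crit)
print(rho_lad)   # near the true value -0.3 for this sample size
```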
See also (7.4.6). Because of its importance, we shall now summarize sufficient conditions under which it is asymptotically normally distributed. Of course we could use the stronger conditions (7.4.36)-(7.4.39), but they do not use the given information about G. Clearly, (7.4.7(b)) implies (7.4.7(a)) in the present case. Moreover, in this case the left hand side of (7.4.8) tends to 0 by the D.C.T., (7.4.7(b)) and the continuity of F, 1 ≤ j ≤ p. Now consider (7.4.9). Assume the following:

(7.4.42) F has a density f, continuous and positive at 0.
Recall from (7.3.12) that under (7.3.8), (7.3.9) and (7.4.7(b)),

(7.4.43) n^{−1/2} max{ ||Y_{i−1}||; 1 ≤ i ≤ n } = o_p(1).
The r.v.'s involved in the left hand side of (7.4.9) in the present case are

n^{−1} [ Σ_{i=1}^n X_{i−j} { F(n^{−1/2}u′Y_{i−1} + n^{−1/2}δ||Y_{i−1}||) − F(n^{−1/2}u′Y_{i−1} − n^{−1/2}δ||Y_{i−1}||) } ]²,
which, in view of (7.4.42), can be bounded above by

(7.4.44) 4δ² n^{−2} Σ_{i=1}^n X²_{i−j} ||Y_{i−1}||² f²(η_ni),

where {η_ni} are intermediate r.v.'s.
Hence, by the stationarity and the ergodicity of the process {X_i}, (7.4.7(b)), (7.4.42) and (7.4.43) imply that the r.v.'s in (7.4.44) converge to zero in probability. This verifies (7.4.9) in the present case. The condition (7.4.10) is verified similarly. Also note that here (5.5.44) is implied by (7.4.42), and (5.5.45) is trivially satisfied, as ∫ F(1 − F) dG ≤ 1/4 in the present case. We summarize the above discussion in
Corollary 7.4.2 Assume that the autoregressive model (7.3.8)-(7.3.10) holds. In addition, assume that the error d.f. F has a finite second moment, F(0) = 1/2, and satisfies (7.4.42). Then ρ̂_lad, defined at (7.4.41), is asymptotically normal.

7.5 Autoregression Quantiles and Rank Scores
The regression quantiles of Koenker and Bassett (1978) (KB) are now accepted as an appropriate extension of the one sample quantiles in the one sample location model to the multiple linear regression model (1.1.1). In this section we shall discuss an extension of these regression quantiles to the stationary ergodic linear autoregressive time series of order p as specified at (7.3.8)-(7.3.10). We shall also assume that the error d.f. F is continuous.
Let Z′_{i−1} := (1, Y′_{i−1}), and for 0 ≤ α ≤ 1, t ∈ ℝ^{p+1},

ψ_α(u) := αuI(u > 0) − (1 − α)uI(u ≤ 0),

Q_α(t) := Σ_{i=1}^n ψ_α(X_i − Z′_{i−1}t),

S_α(t) := n^{−1} Σ_{i=1}^n Y_{i−1} { I(X_i − Z′_{i−1}t ≤ 0) − α }.
The extension of the one sample order statistics to the linear AR(p) model (7.3.8)-(7.3.10) is given by the autoregression quantiles, defined as a minimizer

(7.5.1) ρ̂(α) := argmin_{t∈ℝ^{p+1}} Q_α(t).

We also need to define

(7.5.2)

Note that ρ̂(.5) and ρ̂_md(.5) are both equal to the LAD (least absolute deviation) estimator, which provides the extension of the one sample median to the above model. Let 1′_n := (1, …, 1)_{1×n} be an n-dimensional vector of 1's, π_n be a subset of size p+1 of the set of integers {1, 2, …, n}, X′_n := (X₁, …, X_n), X_{π_n} be the vector of X_i, i ∈ π_n, H_n be the n × (p+1) matrix with rows Z′_{i−1}, i = 1, …, n, and H_{π_n} be the (p+1) × (p+1) matrix with rows Z′_{i−1}, i ∈ π_n. Now recall that the above model is causal and invertible, satisfying a relation like (7.3.11). This and the continuity of F imply that the rows of H_n are linearly independent, as are its columns, w.p.1. Hence, the various inverses below exist w.p.1. Now, let
and consider the following linear programming problem:

(7.5.3) minimize α1′_n r⁺ + (1 − α)1′_n r⁻ w.r.t. (t, r⁺, r⁻), subject to X_n − H_n t = r⁺ − r⁻, over all (t, r⁺, r⁻) ∈ ℝ^{p+1} × [0, ∞)ⁿ × [0, ∞)ⁿ.
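As a brute-force illustration of (7.5.3) and the basic solutions discussed below (our own sketch; production code would use a simplex-based algorithm such as that of Koenker and d'Orey), for p = 1 one can enumerate the fits passing through p + 1 = 2 observations and keep the minimizer of Q_α.

```python
import random

# Illustration (p = 1): an autoregression quantile can be found among
# "basic" fits t = H_pi^{-1} X_pi through p+1 = 2 points; enumerate pairs.
random.seed(3)
n, rho, alpha = 40, 0.5, 0.25
X = [0.0]
for _ in range(n):
    X.append(rho * X[-1] + random.gauss(0.0, 1.0))
Z = [(1.0, X[i - 1]) for i in range(1, n + 1)]   # rows Z'_{i-1}
Y = [X[i] for i in range(1, n + 1)]

def Q(t0, t1):
    # Q_alpha(t) = sum_i psi_alpha(X_i - Z'_{i-1} t)
    s = 0.0
    for (z0, z1), y in zip(Z, Y):
        u = y - t0 * z0 - t1 * z1
        s += alpha * u if u > 0 else (alpha - 1.0) * u
    return s

best, fit = None, None
for i in range(n):
    for j in range(i + 1, n):
        det = Z[i][0] * Z[j][1] - Z[j][0] * Z[i][1]
        if abs(det) < 1e-12:
            continue
        t0 = (Y[i] * Z[j][1] - Y[j] * Z[i][1]) / det
        t1 = (Z[i][0] * Y[j] - Z[j][0] * Y[i]) / det
        q = Q(t0, t1)
        if best is None or q < best:
            best, fit = q, (t0, t1)

# roughly an alpha-fraction of residuals is <= 0 at the minimizer
neg = sum(1 for (z0, z1), y in zip(Z, Y) if y - fit[0] * z0 - fit[1] * z1 <= 0)
print(fit, neg / n)
```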
Note that ρ̂(α) ∈ B_n(α). Moreover, the set B_n(α) is the convex hull of one or more basic solutions of the form

(7.5.4) t = H_{π_n}^{−1} X_{π_n}.

This is proved in the same fashion as in KB. A closely related entity is the so-called autoregression rank scores, defined as follows. Consider the following dual of the above linear programming problem:

(7.5.5) maximize X′_n a w.r.t. a, subject to H′_n a = (1 − α) H′_n 1_n, a ∈ [0, 1]ⁿ.
By linear programming theory, the optimal solution â_n(α) = (â_{n1}(α), …, â_{nn}(α))′ of this problem can be computed in terms of ρ̂(α) as follows. If ρ̂(α) = H_{π(α)}^{−1} X_{π(α)}, for some (p+1)-dimensional subset π(α) of {1, …, n}, then, for i ∉ π(α),

(7.5.6) â_ni(α) = 1, X_i > Z′_{i−1} ρ̂(α),
               = 0, X_i < Z′_{i−1} ρ̂(α),
and, for i ∈ π(α), â_ni(α) is the solution of the p + 1 linear equations

(7.5.7) Σ_{j∈π_n(α)} Z_{j−1} â_nj(α) = (1 − α) Σ_{j=1}^n Z_{j−1} − Σ_{j=1}^n Z_{j−1} I(X_j > Z′_{j−1} ρ̂(α)).
The continuity of F implies that the autoregression rank scores â_n(α) are unique for all 0 < α < 1, w.p.1. The process â_n ∈ [0,1]ⁿ has piecewise linear paths in [C(0,1)]ⁿ and â_n(0) = 1_n = 1_n − â_n(1). It is invariant in the sense that the â_n(α) based on the vector X_n + H_n t is the same as the â_n(α) based on X_n, for all t ∈ ℝ^{p+1}, 0 < α < 1. One can use the computational algorithm of Koenker and d'Orey (1987, 1993) to compute these entities. In the next two subsections we shall discuss the asymptotic distributions of ρ̂_n(α) and â_n(α).
7.5.1 Autoregression quantiles

In this subsection we shall show how the results of Section 7.2 can be used to obtain the limiting distribution of ρ̂_n(α). All the needed results are given in the following lemma. Its statement needs the additional notation:
ρ(α) := ρ + F^{−1}(α) e₁, e₁ := (1, 0, …, 0)′,

q(α) := f(F^{−1}(α)), 0 < α < 1;

Σ_n := n^{−1} H′_n H_n = n^{−1} Σ_{i=1}^n Z_{i−1} Z′_{i−1}, Σ := plim_n Σ_n.
By the Ergodic Theorem, Σ exists and is positive definite. In this subsection, for any process Z_n(s, α), the statement Z_n(s, α) = o*_p(1) means that for every 0 < a ≤ 1/2, 0 < b < ∞, sup{ |Z_n(s, α)|; ||s|| ≤ b, a ≤ α ≤ 1 − a } = o_p(1).
Lemma 7.5.1 Suppose the assumptions made at (7.3.8)-(7.3.10) hold. If, in addition, (7.2.10) holds, then, for every 0 < a ≤ 1/2, 0 < b < ∞,

(7.5.8)

Moreover,

(7.5.9) n^{1/2}(ρ̂_md(α) − ρ(α)) = −{q(α)Σ_n}^{−1} n^{1/2} S_α(ρ(α)) + o*_p(1),

(7.5.10) n^{1/2}(ρ̂_md(α) − ρ̂(α)) = o*_p(1).

If (7.2.10) is strengthened to (F1) and (F2), then, for every 0 < b < ∞, (7.5.11) holds, where the supremum in (7.5.11) is taken over (α, s) ∈ [0,1] × {s ∈ ℝ^{p+1}; ||s|| ≤ b}.
A sketch of the proof. The claims (7.5.8) and (7.5.11) follow from Theorem 7.2.1 and Remark 7.2.1 in an obvious fashion: apply these results once with h ≡ 1 and p times, the j-th time with h(Y_{i−1}) ≡ X_{i−j}. In view of the Ergodic Theorem, all conditions of Theorem 7.2.1 are a priori satisfied under the current assumptions. The proof of (7.5.9) is similar to that of Theorem 5.5.3. It amounts to first showing that (7.5.12) holds and then using the result (7.5.8) to conclude the claim. But the proof of (7.5.12) is similar to that of Lemma 5.5.4, and hence no details are given.
To prove (7.5.10), we shall first show that for every 0 < a ≤ 1/2,

(7.5.13) sup_{a≤α≤1−a} n^{1/2} ||S_α(ρ̂(α))|| = o_p(1).

To that effect, let

w_n(α) := Σ_{i∉π_n(α)} Z′_{i−1} { I(X_i − Z′_{i−1}ρ̂(α) ≤ 0) − α } H_{π_n(α)}^{−1}
        + Σ_{i∉π_n(α)} Z′_{i−1} I(X_i − Z′_{i−1}ρ̂(α) = 0) H_{π_n(α)}^{−1},
where π_n(α) is as in (7.5.4). Using sgn(x) = 1 − 2I(x ≤ 0) + I(x = 0), we have the following inequalities w.p.1. For all 0 < α < 1,

(α − 1)1_p < w_n(α) < α1_p.

Note that from (7.5.4) we have I(X_i − Z′_{i−1}ρ̂(α) = 0) = 0 for all i ∉ π_n(α). Thus we obtain
[ Σ_{i=1}^n Z′_{i−1} { I(X_i − Z′_{i−1}ρ̂(α) ≤ 0) − α } − Σ_{i∈π_n(α)} Z′_{i−1} { I(X_i − Z′_{i−1}ρ̂(α) ≤ 0) − α } ] H_{π_n(α)}^{−1} = w_n(α).
Again, by (7.5.4), I(X_i − Z′_{i−1}ρ̂(α) ≤ 0) = 1, i ∈ π_n(α), 0 < α < 1, w.p.1. Hence, w.p.1, for all 0 < α < 1,

n^{1/2} S_α(ρ̂(α)) = −n^{−1/2} Σ_{i∈π_n(α)} Z′_{i−1}(1 − α) + n^{−1/2} H_{π_n(α)} w_n(α),

so that, in view of the square integrability of X₀ and the stationarity of the process, the claim follows. This completes the proof of (7.5.13). Hence we obtain (7.5.14)
sup_{a≤α≤1−a} inf_s ||n^{1/2} S_α(s)|| = o_p(1).

This and (7.5.13) essentially then show that

sup_{a≤α≤1−a} n^{1/2} ||S_α(ρ̂(α)) − S_α(ρ̂_md(α))|| = o_p(1),

which together with (7.5.8) proves the claim (7.5.10). The following corollary is immediate. □
Corollary 7.5.1 Under the assumptions (7.2.10) and (7.3.8)-(7.3.10),

(7.5.15) n^{1/2}(ρ̂(α) − ρ(α)) = −{q(α)Σ_n}^{−1} n^{1/2} S_α(ρ(α)) + o*_p(1).

Moreover, for every 0 < α₁ < ⋯ < α_k < 1, the asymptotic joint distribution of the vector n^{1/2}[(ρ̂(α₁) − ρ(α₁)), …, (ρ̂(α_k) − ρ(α_k))] is (p+1) × k normal with mean matrix 0 and covariance matrix Λ ⊗ Σ^{−1}, where

Λ := ( (α_i ∧ α_j − α_i α_j) / (q(α_i) q(α_j)) )_{1≤i,j≤k},

and where ⊗ denotes the Kronecker matrix product.
7.5.2 Autoregression rank scores

Now we shall discuss the asymptotic behaviour of the autoregression rank scores defined at (7.5.6). To that effect we need to introduce some more notation. Let q be a positive integer, and let {k_nij; 1 ≤ j ≤ q} be F_{i−1} := σ{Y₀, ε₀, ε₁, …, ε_{i−1}} measurable and independent of ε_i, 1 ≤ i ≤ n. Let k_ni := (k_{ni1}, …, k_{niq})′ and K_n denote the matrix whose i-th row is k′_ni, 1 ≤ i ≤ n. Define the processes

Û_k(α) := n^{−1} Σ_{i=1}^n k_ni { â_ni(α) − (1 − α) },

U_k(α) := n^{−1} Σ_{i=1}^n k_ni { I(ε_i > F^{−1}(α)) − (1 − α) }, 0 ≤ α ≤ 1.
Let

𝒦_n := n^{−1} K′_n H_n.

We are now ready to state

Lemma 7.5.2 In addition to the model assumptions (7.3.8)-(7.3.10), suppose the following two conditions hold. For some positive definite matrix Γ_{q×q},

(7.5.16) n^{−1} K′_n K_n = Γ + o_p(1),

(7.5.17) n^{−1/2} max_{1≤i≤n} ||k_ni|| = o_p(1).

Then, for every 0 < a ≤ 1/2,

(7.5.18) sup_{a≤α≤1−a} || n^{1/2}[Û_k(α) − U_k(α)] + 𝒦_n n^{1/2}(ρ̂(α) − ρ(α)) q(α) || = o_p(1).
Consequently,

(7.5.19)

Proof. From (7.5.6), we obtain that, for all 1 ≤ i ≤ n, 0 < α < 1,

â_ni(α) = I(ε_i > F^{−1}(α) + Z′_{i−1}(ρ̂(α) − ρ(α))) + â_ni(α) I(X_i = Z′_{i−1} ρ̂(α)),

which in turn yields the following identity:

â_ni(α) − (1 − α) = I(ε_i > F^{−1}(α)) − (1 − α)
 − { I(ε_i ≤ F^{−1}(α) + Z′_{i−1}(ρ̂(α) − ρ(α))) − I(ε_i ≤ F^{−1}(α)) }
 + â_ni(α) I(X_i = Z′_{i−1} ρ̂(α)),

for all 1 ≤ i ≤ n, 0 < α < 1, w.p.1. This and (7.5.4) yield
n^{1/2} Û_k(α) = n^{1/2} U_k(α) − 𝒦_n n^{1/2}(ρ̂(α) − ρ(α)) q(α)
 − [ n^{−1/2} Σ_{i=1}^n k_ni { I(ε_i ≤ F^{−1}(α) + Z′_{i−1}(ρ̂(α) − ρ(α))) − I(ε_i ≤ F^{−1}(α)) } − 𝒦_n n^{1/2}(ρ̂(α) − ρ(α)) q(α) ]
 + n^{−1/2} Σ_{i∈π_n(α)} k_ni â_ni(α) I(X_i = Z′_{i−1} ρ̂(α))
 = n^{1/2} U_k(α) − 𝒦_n n^{1/2}(ρ̂(α) − ρ(α)) q(α) − R₁(α) + R₂(α), say.
Now, by the C-S inequality and by (7.5.16), ||𝒦_n|| = O_p(1). Apply Remark 7.2.1 with r_ni ≡ k_nij and other entities as in the previous section to conclude that sup{ ||R₁(α)||; 0 ≤ α ≤ 1 } = o_p(1). Also note that, from the results of the previous section, sup_{a≤α≤1−a} ||n^{1/2}(ρ̂(α) − ρ(α))|| = O_p(1). Use this and (7.5.17) to obtain sup{ ||R₂(α)||; a ≤ α ≤ 1 − a } = o_p(1), thereby completing the proof of (7.5.18). The rest is obvious. □

Corollary 7.5.2 Under the assumptions of Lemma 7.5.2, the autoregression quantile and autoregression rank score processes are asymptotically independent. Moreover, for every k ≥ 1 and every 0 < α₁ < ⋯ < α_k < 1,

n^{1/2}(Û_k(α₁), …, Û_k(α_k)) ⇒ N(0, ℬ),

ℬ := B ⊗ plim_n n^{−1} [K′_n − 𝒦_n Σ_n^{−1} H′_n][K′_n − 𝒦_n Σ_n^{−1} H′_n]′,
where B := ((α_i ∧ α_j − α_i α_j))_{1≤i,j≤k}.
Proof. Let s_i(α) := I(ε_i > F^{−1}(α)) − (1 − α), 1 ≤ i ≤ n, and s(α) := (s₁(α), …, s_n(α))′. The leading r.v.'s on the right hand sides of (7.5.15) and (7.5.19) are equal to

−Σ_n^{−1} n^{−1/2} H′_n s(α)/q(α),  n^{−1/2} [K′_n − 𝒦_n Σ_n^{−1} H′_n] s(α),

respectively. By the stationarity and ergodicity of the underlying process, Lemma 9.1.3 in the Appendix, and the Cramér-Wold device, it follows that, for each α ∈ (0,1), the asymptotic joint distribution of n^{1/2}(ρ̂(α) − ρ(α)) and n^{1/2}Û_k(α) is (p + 1 + q)-dimensional normal with mean vector 0 and covariance matrix with blocks

D₁₁ := [α(1 − α)/q²(α)] Σ^{−1},
D₂₂ := plim_n n^{−1} [K′_n − 𝒦_n Σ_n^{−1} H′_n][K′_n − 𝒦_n Σ_n^{−1} H′_n]′,
D₁₂ := [α(1 − α)/q(α)] plim_n n^{−1} Σ_n^{−1} H′_n [K′_n − 𝒦_n Σ_n^{−1} H′_n]′.

But, by definition, w.p.1,

n^{−1} Σ_n^{−1} H′_n [K′_n − 𝒦_n Σ_n^{−1} H′_n]′ = Σ_n^{−1} 𝒦′_n − Σ_n^{−1} 𝒦′_n = 0.
This proves the claim of independence for each α. The result is proved similarly for any finite dimensional joint distribution. □

Note: The above results were first obtained in Koul and Saleh (1995), using numerous facts available in linear regression from the works of Koenker and Bassett (1978) and Gutenbrunner and Jurečková (1992), and of course the AUL result given in Theorem 7.2.1.
7.6 Goodness-of-fit Testing for F

Once again consider the AR(p) model given by (7.3.8), (7.3.9), and let F₀ be a known d.f. Consider the problem of testing H₀: F = F₀. One of the common tests of H₀ is based on the Kolmogorov-Smirnov statistic

D_n := n^{1/2} sup_x |F_n(x, ρ̂) − F₀(x)|.
From Corollary 7.2.1 one readily has the following: if F₀ has a finite second moment and a uniformly continuous density f₀ with f₀ > 0 a.e., and ρ̂ satisfies (7.3.20) under F₀, then, under H₀,

D_n = sup_x | B(F₀(x)) + n^{1/2}(ρ̂ − ρ)′ n^{−1} Σ_{i=1}^n Y_{i−1} f₀(x) | + o_p(1).
In addition, if EY₀ = 0 = Eε₁, then D_n →_d sup{ |B(t)|; 0 ≤ t ≤ 1 }, thereby rendering D_n asymptotically distribution free. Next, consider H₀₁: F = N(μ, σ²), μ ∈ ℝ, σ² > 0. In other words, H₀₁ states that the AR(p) process is generated by some normal errors. Let μ̂_n, σ̂_n and ρ̂_n be estimators of μ, σ and ρ, respectively. Define
F̂_n(x) := n^{−1} Σ_{i=1}^n I(X_i ≤ x σ̂_n + μ̂_n + ρ̂′_n Y_{i−1}), x ∈ ℝ,

D̂_n := n^{1/2} sup_x |F̂_n(x) − Φ(x)|, Φ := the N(0,1) d.f.
Corollary 7.2.1 can be readily modified in a routine fashion to yield that, under appropriate conditions on the estimators,

D̂_n = sup_x | B(Φ(x)) + n^{1/2}{ (μ̂_n − μ) + (σ̂_n − σ)x } σ^{−1} φ(x) | + o_p(1),

where φ is the density of Φ. Thus the asymptotic null distribution of D̂_n is similar to its analogue in the one sample location-scale model: the estimation of ρ has no effect on the large sample null distribution of D̂_n. Clearly, similar conclusions apply to other goodness-of-fit tests. In particular, we leave it as an exercise for the interested reader to investigate the large sample behaviour of the goodness-of-fit tests based on L₂-distances, analogous to the results obtained in Section 6.3. Lemma 6.3.1 and the results of the previous section are found useful here. □
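A hedged sketch of D̂_n for a simulated AR(1) series, using (assumed, illustrative) least squares ρ̂_n and moment estimates μ̂_n, σ̂_n:

```python
import math
import random

# Illustration: the statistic D_hat_n for H_01 (normal errors) in AR(1).
random.seed(4)
n, rho = 300, 0.4
X = [0.0]
for _ in range(n):
    X.append(rho * X[-1] + random.gauss(0.0, 1.0))

# simple estimates: least squares rho_n, then mean/sd of residuals
num = sum(X[i] * X[i - 1] for i in range(1, n + 1))
den = sum(X[i - 1] ** 2 for i in range(1, n + 1))
rho_n = num / den
res = [X[i] - rho_n * X[i - 1] for i in range(1, n + 1)]
mu_n = sum(res) / n
sig_n = math.sqrt(sum((r - mu_n) ** 2 for r in res) / n)

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

z = sorted((r - mu_n) / sig_n for r in res)
# KS distance between the empirical d.f. of standardized residuals and Phi
D = max(max(abs((k + 1) / n - Phi(v)), abs(k / n - Phi(v)))
        for k, v in enumerate(z))
D_hat = math.sqrt(n) * D
print(D_hat)   # moderate under H_01
```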
7.7 Autoregressive Model Fitting

7.7.1 Introduction

In this section we shall consider the problem of fitting a given parametric autoregressive model of order 1 to a real valued stationary ergodic Markovian time series X_i, i = 0, ±1, ±2, …. Much of the development here is
parallel to that of Section 6.6 above. We shall thus be brief on motivation and details here. Let ψ be a nondecreasing real valued function such that E|ψ(X₁ − r)| < ∞ for each r ∈ ℝ. Define the ψ-autoregressive function m_ψ by the requirement that

(7.7.1) E[ ψ(X₁ − m_ψ(X₀)) | X₀ ] = 0, a.s.
== x , then mt/J = u, and if
== 'lj;o: (x ) := I(x > 0)  (1 
0'), for an
°
sup
= T'IjJ(OO)
IB(t)1
O::;t~T~(OO)
xEIR
sup IB(t)l ,
in law.
0~t9
Thus, to test the simple hypothesis H̃₀: m_ψ = m₀, where m₀ is a known function, proceed as follows. Estimate (under m_ψ = m₀) the variance τ²_ψ(x) by

τ²_{n,ψ}(x) := n^{−1} Σ_{i=1}^n ψ²(X_i − m₀(X_{i−1})) I(X_{i−1} ≤ x), x ∈ ℝ,
and replace m'IjJ by mo in the definition of Mn ,'IjJ . Write s;,,'IjJ for T~ ,'IjJ(OO) . Then, for example, the KolmogorovSmirnov (KS) test based on Mn ,'IjJ of the given asymptotic level would reject the hypothesis fI o if
exceeds an appropriate critical value obtained from the boundary crossing probabilities of a Brownian motion on the unit interval, which are readily available. More generally, the asymptotic level of any test based on a continuous function of s^{-1}_{n,ψ} M_{n,ψ}((τ²_{n,ψ})^{-1}) can be obtained from the distribution of the corresponding function of B on [0,1], where (τ²_{n,ψ})^{-1}(t) := inf{x ∈ ℝ : τ²_{n,ψ}(x) ≥ t}, t ≥ 0. For example, the asymptotic level of the test based on a Cramér-von Mises type statistic is obtained from the distribution of ∫_0^1 B² dH, where H is a d.f. on [0,1].

Now, let M be as in Section 6.6 and consider the problem of testing the goodness-of-fit hypothesis

  H_0 : m_ψ(x) = m(x, θ_0), for some θ_0 ∈ Θ, x ∈ I,

where I is now a compact subset of ℝ. Let θ_n be an n^{1/2}-consistent estimator of θ_0 under H_0, based on {X_i, 0 ≤ i ≤ n}. Define, for -∞ ≤ x ≤ ∞,

  M_{n,ψ}(x) := n^{-1/2} Σ_{i=1}^n ψ(X_i - m(X_{i-1}, θ_n)) I(X_{i-1} ≤ x).
The process M_{n,ψ} is a weighted empirical process, where the weights at X_{i-1} are now given by the ψ-residuals ψ(X_i - m(X_{i-1}, θ_n)). Tests for H_0 can be based on an appropriately scaled discrepancy of this process. For example, an analogue of the KS test would reject H_0 in favor of H_1 if sup{σ^{-1}_{n,ψ}|M_{n,ψ}(x)| : x ∈ ℝ} is too large, where σ²_{n,ψ} := n^{-1} Σ_{i=1}^n ψ²(X_i - m(X_{i-1}, θ_n)). These tests, however, are not generally asymptotically distribution free. In the next subsection we shall show that, under the same conditions on M as in Section 6.6.2 and under H_0, the weak limit of T_n M_{n,ψ} is B(G), so that the asymptotic null distribution of various tests based on it will be known. Here T_n is the analogue of the transformation T_n of (6.6.25). Computational formulas for some of these tests are also given in the same section. Tests based on various discrepancies of M_{n,ψ} are consistent here also under the condition (6.6.4) with λ properly defined: just change X, Y to X_0, X_1, respectively.
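As a concrete illustration of the objects just described, the ψ-residual weighted empirical process M_{n,ψ} and the naive (untransformed) scaled KS statistic can be computed directly from data. The following Python sketch is illustrative only: the simulated AR(1) data, the least squares fit, and the function names are assumptions, not part of the text; ψ defaults to the identity.

```python
import numpy as np

def weighted_empirical(X, m_hat, psi=lambda r: r):
    """M_{n,psi}(x) evaluated at the jump points X_{i-1}, for a user-supplied
    fitted autoregression function m_hat and score psi (a sketch)."""
    n = len(X) - 1
    lagged = X[:-1]                      # X_0, ..., X_{n-1}
    resid = psi(X[1:] - m_hat(lagged))   # psi-residuals psi(X_i - m(X_{i-1}))
    order = np.argsort(lagged)
    # M_{n,psi}(x) = n^{-1/2} sum_i psi(X_i - m(X_{i-1})) I(X_{i-1} <= x),
    # evaluated at x = sorted lagged values via a cumulative sum
    M = np.cumsum(resid[order]) / np.sqrt(n)
    sigma2 = np.mean(resid ** 2)         # sigma^2_{n,psi}
    ks = np.max(np.abs(M)) / np.sqrt(sigma2)
    return M, ks

# toy usage: linear AR(1) fitted by least squares
rng = np.random.default_rng(0)
X = np.zeros(501)
for i in range(1, 501):
    X[i] = 0.5 * X[i - 1] + rng.standard_normal()
theta = np.sum(X[1:] * X[:-1]) / np.sum(X[:-1] ** 2)
M, ks = weighted_empirical(X, lambda y: theta * y)
```

As the text notes, this untransformed statistic is not asymptotically distribution free; the transformation T_n of the next subsection is what restores a known limit.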
7.7.2  Transform T_n of M_{n,ψ}
This section first discusses the asymptotic behavior of the processes introduced in the previous section under the simple and composite hypotheses. Then a transformation T and its estimate T_n are given so that the processes T M_{n,ψ} and T_n M_{n,ψ} have the same weak limit with a known distribution. Let

  ε_i := X_i - m_ψ(X_{i-1}),  i = 0, ±1, ±2, ⋯,
  F_y(x) := P(X_1 - m_ψ(X_0) ≤ x | X_0 = y),  x, y ∈ ℝ.
We are ready to state our first result.

Theorem 7.7.1 Assume that (7.7.1) and (7.7.2) hold. Then all finite dimensional distributions of M_{n,ψ} converge weakly to those of a centered continuous Gaussian process M_ψ with the covariance function K_ψ.

(I). Suppose, in addition, that for some η > 0, δ > 0,

(7.7.4)  (a) Eψ⁴(ε_1) < ∞,  (b) Eψ⁴(ε_1)|X_0|^{1+η} < ∞,  (c) E{ψ²(ε_2)ψ²(ε_1)|X_1|}^{1+δ} < ∞,

and that the family of d.f.'s {F_y, y ∈ ℝ} have Lebesgue densities {f_y, y ∈ ℝ} that are uniformly bounded:

(7.7.5)  sup_{x,y} f_y(x) < ∞.

Then

(7.7.6)  M_{n,ψ} ⟹ M_ψ,  in the space D[-∞, ∞].

(II). Instead of (7.7.4) and (7.7.5), suppose that ψ is bounded and the family of d.f.'s {F_y, y ∈ ℝ} have Lebesgue densities {f_y, y ∈ ℝ} satisfying

(7.7.7)  ∫ [ E{ f_{X_0}^{1+δ}(x - m_ψ(X_0)) } ]^{1/(1+δ)} dx < ∞,  for some δ > 0.

Then also (7.7.6) holds.
Proof. Part (I) follows from Theorem 2.2.4, while part (II) follows from Theorem 2.2.7, upon choosing the underlying entities there appropriately. □

Note that (7.7.4) is satisfied if, for some δ > 0, Eψ^{4(1+δ)}(ε_1) and E|X_1|^{2(1+δ)} are finite. Moreover, in this situation the conditional distributions do not depend on y, so that (7.7.5) amounts to assuming that the density of ε_1 is bounded. In the case of bounded ψ, E|X_1|^{1+δ} < ∞, for some δ > 0, implies (7.7.4).

Now consider the assumption (7.7.7). Note that the stationary distribution G has Lebesgue density g(x) ≡ E f_{X_0}(x - m_ψ(X_0)). This fact together with (7.7.5) implies that the left hand side of (7.7.7) is bounded from above by the constant C := [sup_{x,y} f_y(x)]^{δ/(1+δ)} times

  ∫ [ E f_{X_0}(x - m_ψ(X_0)) ]^{1/(1+δ)} dx = ∫ g^{1/(1+δ)}(x) dx.

Thus, (7.7.7) is implied by assuming

  ∫ g^{1/(1+δ)}(x) dx < ∞.

Alternately, suppose m_ψ is bounded, and that f_y(x) ≤ f(x), for all x, y ∈ ℝ, where f is a bounded and unimodal Lebesgue density on ℝ. Then also the left hand side of (7.7.7) is finite. One thus sees that, in the particular case of i.i.d. homoscedastic errors, (7.7.7) is satisfied either for all bounded error densities and all stationary densities that have an exponential tail, or for all bounded unimodal error densities in the case of bounded m_ψ. Summarizing, we see that (7.7.4), (7.7.5) and (7.7.7) are fulfilled in many models under standard assumptions on the relevant densities and moments.

Perhaps the differences between Theorem 6.6.1 and the above theorem are worth pointing out. In the former, no additional moment conditions beyond the finite second moment of the ψ-innovation were needed, nor did it require the error density to be bounded or to satisfy anything like (7.7.7).

Next, we need to study the asymptotic null behaviour of M̂_{n,ψ}. To that effect, the following additional regularity conditions on the underlying entities will be needed. To begin with, the regularity conditions for the asymptotic expansion of M̂_{n,ψ} are stated without assuming X_{i-1} to be independent of ε_i, i ≥ 1. All probability statements in these assumptions are understood to be made under H_0. Unlike in the regression setup, the d.f. of X_0 here in general depends on θ_0, but this dependence is not exhibited for the sake of convenience. We make the following assumptions. The estimator θ_n satisfies
(7.7.8)  n^{1/2}(θ_n - θ_0) = n^{-1/2} Σ_{i=1}^n φ(X_{i-1}, X_i, θ_0) + o_p(1),

for some q-vector valued function φ such that E{φ(X_0, X_1, θ_0) | X_0} = 0 and Φ(θ_0) := E φ(X_0, X_1, θ_0) φ′(X_0, X_1, θ_0) exists and is positive definite.

(F). The family of d.f.'s {F_y, y ∈ ℝ} has Lebesgue densities {f_y, y ∈ ℝ} that are equicontinuous: for every α > 0 there exists a δ > 0 such that

  sup_{y∈ℝ, |x-z|<δ} |f_y(x) - f_y(z)| ≤ α.

Let, for 1 ≤ j ≤ q, x ∈ ℝ,

  m_j(x, θ_0) := E ṁ_j(X_0, θ_0) ψ̇(X_1 - m(X_0, θ_0)) I(X_0 ≤ x),
  ν_j(x, θ_0) := E ṁ_j(X_0, θ_0) ∫ f_{X_0} dψ I(X_0 ≤ x),
  M(x, θ_0) := (m_1(x, θ_0), ⋯, m_q(x, θ_0))′,
  ν(x, θ_0) := (ν_1(x, θ_0), ⋯, ν_q(x, θ_0))′.

Note that by (6.6.14) and (Ψ_1), or by (6.6.14), (Ψ_2) and (F), these entities are well-defined.
We are now ready to formulate an asymptotic expansion of M̂_{n,ψ}, which is crucial for the subsequent results and the transformation T_n. Recall the conditions (Ψ_1) and (Ψ_2) from Section 6.6.2.

Theorem 7.7.2 Assume that (7.7.1), (7.7.8) and H_0 hold. About the model M, assume that (6.6.13) and (6.6.14) hold with X_i, X replaced by X_{i-1}, X_0, respectively.

(A). If, in addition, (Ψ_1) holds, then

(7.7.9)  | M̂_{n,ψ}(x) - M_{n,ψ}(x) + M′(x, θ_0) n^{-1/2} Σ_{i=1}^n φ(X_{i-1}, X_i, θ_0) | = u_p(1).

(B). Assume, in addition, that (Ψ_2) and (F) hold, and that either E|X_0|^{1+δ} < ∞, for some δ > 0, and (7.7.5) holds, or (7.7.7) holds. Then the conclusion (7.7.9), with M replaced by ν, continues to hold.

We note that Remark 6.6.1 applies to the autoregressive setup also. The following corollary is an immediate consequence of the above theorem and Theorem 7.7.1. We shall state it for the smooth ψ case only. The same holds in the nonsmooth ψ case. Note that under H_0, ε_i ≡ X_i - m(X_{i-1}, θ_0).
Corollary 7.7.1 Under the assumptions of Theorems 7.7.1 and 7.7.2(A),

  M̂_{n,ψ} ⟹ M̂_ψ,  in the space D[-∞, ∞],

where M̂_ψ is a centered continuous Gaussian process with the covariance function

  K̂_ψ(x, y) = K_ψ(x, y) + M′(x, θ_0) Φ(θ_0) M(y, θ_0)
    - M′(x, θ_0) E{ I(X_0 ≤ y) ψ(ε_1) φ(X_0, X_1, θ_0) }
    - M′(y, θ_0) E{ I(X_0 ≤ x) ψ(ε_1) φ(X_0, X_1, θ_0) }.
The above complicated looking covariance function can be further simplified if we choose θ_n to be related to the function ψ in the following fashion. Recall from the previous section that σ_ψ²(x) = E[ψ²(ε_1) | X_0 = x], and let, for x ∈ ℝ,

  γ_ψ(x) := E[ψ̇(ε_1) | X_0 = x],  for smooth ψ,
  γ_ψ(x) := ∫ f_x(y) ψ(dy),  for nonsmooth ψ.

From now onwards we shall assume that

(7.7.10)  the errors ε_i are i.i.d. F, ε_i independent of X_{i-1}, for each i = 0, ±1, ⋯, and F satisfies F1 and F2.

Then it readily follows that

  σ_ψ²(x) ≡ σ_ψ², a positive constant in x, a.s.,
  γ_ψ(x) ≡ γ_ψ, a positive constant in x, a.s.,

and that θ_n satisfies (7.7.8) with

(7.7.11)  φ(x, y, θ_0) = γ_ψ^{-1} Σ_0^{-1} ṁ(x, θ_0) ψ(y - m(x, θ_0)),

for x, y ∈ ℝ, where Σ_0 := E ṁ(X_0, θ_0) ṁ′(X_0, θ_0), so that Φ(θ_0) = τ Σ_0^{-1}, with τ := σ_ψ²/γ_ψ². Then direct calculations show that the above covariance function simplifies to

  K̂_ψ(x, y) = Eψ²(ε_1) [ G(x ∧ y) - ν′(x) Σ_0^{-1} ν(y) ],  ν(x) := E ṁ(X_0, θ_0) I(X_0 ≤ x),  x, y ∈ ℝ.
Unde r (7.7.10), a set of sufficient cond itio ns on th e mod el M is given in Sect ion 8.2 below und er which a class of Mestimators of 0 0 corres pond ing to a given '1/1 defined by t he relation n
On,,p
:=
argmin t lln I / 2
L
ril(Xi 
I,
t)'I/1(X i

m( X i 
I ,
t ))11
i= 1
satisfies (7.7.11). See, Theorem 8.2.1 and Cor ollar y 8.2.1 below. Throughout t he rest of t he section we shall ass ume that (7.7.10) holds. To simplify t he expos itio n fur th er write rnf ) = rn f, ( 0 ) . Set
A(x) := ∫ ṁ(y) ṁ′(y) I(y ≥ x) G(dy),  x ∈ ℝ.

Assume that

(7.7.12)  A(x_0) is nonsingular for some x_0 ∈ ℝ.

Then A(x) is positive definite for all x ≤ x_0. Hence A^{-1}(x) exists for all x ≤ x_0, and (6.6.14) implies (7.7.15). This fact is used in the proofs repeatedly.

Now, let a < b be given real numbers and suppose one is interested in testing the hypothesis

  H : m_ψ(x) = m(x, θ_0), for all x ∈ [a, b] and for some θ_0 ∈ Θ.

Assume the support of G is ℝ and A(b) is positive definite. Then A(x) is nonsingular for all x ≤ b and continuous on [a, b], A^{-1}(x) is continuous on [a, b], and

  E‖ṁ′(X_0) A^{-1}(X_0)‖ I(a < X_0 ≤ b) < ∞.

7.7.3  Computation of T_n M_{n,ψ} in some examples

Consider models with autoregressive function of the form m(x, θ) = Σ_{j=1}^q θ_j g_j(x), for known functions g_j. For example, the choice q = 4, g_1(x) = I(x ≤ 0), g_2(x) = x I(x ≤ 0), g_3(x) = I(x > 0), g_4(x) = x I(x > 0), gives the self-exciting threshold AR(1) model

  m(x, θ) = (θ_1 + θ_2 x) I(x ≤ 0) + (θ_3 + θ_4 x) I(x > 0).
For more on these and several other nonlinear AR(1) models see Tong (1990). In the following discussion the assumption (7.7.10) is in action.

In the linear AR(1) model, ṁ(x, θ) ≡ x and A(x) ≡ E X_0² I(X_0 ≥ x) is positive for all real x, uniformly continuous and decreasing on ℝ, and thus trivially satisfies (7.7.12). A uniformly a.s. consistent estimator of A is

  A_n(x) = n^{-1} Σ_{k=1}^n X²_{k-1} I(X_{k-1} ≥ x).
Thus a test of the hypothesis that the first order autoregressive mean function is linear AR(1) on the interval (-∞, x_0] can be based on

  sup_{x≤x_0} |T_n M_{n,I}(x)| / { σ̂_{n,I} G_n^{1/2}(x_0) },

where

  T_n M_{n,I}(x) = n^{-1/2} Σ_{i=1}^n (X_i - X_{i-1} θ_n) [ I(X_{i-1} ≤ x)
    - n^{-1} Σ_{j=1}^n ( X_{j-1} X_{i-1} I(X_{j-1} ≤ X_{i-1} ∧ x) ) / ( n^{-1} Σ_{k=1}^n X²_{k-1} I(X_{k-1} ≥ X_{j-1}) ) ],

  σ̂²_{n,I} = n^{-1} Σ_{i=1}^n (X_i - X_{i-1} θ_n)².
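The transformed statistic just displayed can be computed by brute force. The Python sketch below transcribes the displayed formula directly (it is O(n²) per evaluation point, so only practical for small n); the simulated data, the LS fit, and the choice of x_0 are illustrative assumptions, not part of the text.

```python
import numpy as np

def transformed_stat(X, x0):
    """sup_{x<=x0} |T_n M_{n,I}(x)| / (sigma_{n,I} G_n^{1/2}(x0)) for testing a
    linear AR(1) mean function; direct transcription of the displayed formula."""
    n = len(X) - 1
    lag, cur = X[:-1], X[1:]
    theta = np.sum(cur * lag) / np.sum(lag ** 2)   # LS estimator
    eps = cur - theta * lag                        # residuals
    # A_n(X_{j-1}) = n^{-1} sum_k X_{k-1}^2 I(X_{k-1} >= X_{j-1})
    A = np.array([np.mean(lag ** 2 * (lag >= xj)) for xj in lag])
    grid = np.sort(lag[lag <= x0])                 # evaluate at jump points <= x0
    vals = []
    for x in grid:
        inner = np.array([np.mean(lag * xi * (lag <= min(xi, x)) / A)
                          for xi in lag])          # the estimated compensator term
        vals.append(np.sum(eps * ((lag <= x) - inner)) / np.sqrt(n))
    sigma = np.sqrt(np.mean(eps ** 2))
    Gn = np.mean(lag <= x0)                        # empirical d.f. G_n(x0)
    return np.max(np.abs(vals)) / (sigma * np.sqrt(Gn))

# illustrative usage on a short simulated AR(1) series
rng = np.random.default_rng(1)
X = np.zeros(61)
for i in range(1, 61):
    X[i] = 0.3 * X[i - 1] + rng.standard_normal()
stat = transformed_stat(X, x0=float(np.quantile(X[:-1], 0.9)))
```

Under the null and with the LS estimator, the text shows this statistic is asymptotically distributed as sup_{0≤u≤1}|B(u)|, so its critical values do not depend on the error distribution.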
Similarly, a test of the hypothesis that the first order autoregressive median function is linear AR(1) can be based on

  sup_{x≤x_0} |T_n M_{n,.5}(x)| / { σ̂_{n,.5} G_n^{1/2}(x_0) },

where

  T_n M_{n,.5}(x) = n^{-1/2} Σ_{i=1}^n { I(X_i - X_{i-1} θ_n > 0) - .5 } [ I(X_{i-1} ≤ x)
    - n^{-1} Σ_{j=1}^n ( X_{j-1} X_{i-1} I(X_{j-1} ≤ X_{i-1} ∧ x) ) / ( n^{-1} Σ_{k=1}^n X²_{k-1} I(X_{k-1} ≥ X_{j-1}) ) ],

and

  σ̂²_{n,.5} = n^{-1} Σ_{i=1}^n { I(X_i - X_{i-1} θ_n > 0) - .5 }².
By Theorem 7.7.4, both of these tests are ADF as long as the estimator θ_n is the least square (LS) estimator in the former test and the least absolute deviation (LAD) estimator in the latter. For the former test we additionally require E|ε_1|^{4+δ} < ∞, for some δ > 0, while for the latter test Eε_1⁴ < ∞ and f being uniformly continuous and positive suffice.

In the EXPAR(1) model, ṁ(x, θ_0) ≡ (x, x e^{-x²})′ and A(x) is the corresponding 2 × 2 symmetric matrix. From Theorem 4.3 of Tong (1990: p 128), if Eε_1⁴ < ∞ and f is absolutely continuous and positive on ℝ, then the above EXPAR(1) process is stationary and ergodic, the corresponding stationary d.f. G is strictly increasing on ℝ, and EX_0⁴ < ∞. Moreover, one can directly verify that EX_0² < ∞ implies that A(x) is nonsingular for every real x and that A^{-1} and A are continuous on ℝ. The matrix

  A_n(x) = n^{-1} Σ_{i=1}^n ṁ(X_{i-1}, θ_n) ṁ′(X_{i-1}, θ_n) I(X_{i-1} ≥ x)

provides a uniformly a.s. consistent estimator of A(x). Thus one may use sup_{x≤x_0} |T_n M_{n,I}(x)| / { σ̂_{n,I} G_n^{1/2}(x_0) } to test the hypothesis that the autoregressive mean function is given by an EXPAR(1) function on an interval (-∞, x_0]. Similarly, one can use the test statistic

  sup_{x≤x_0} |T_n M_{n,.5}(x)| / { σ̂_{n,.5} G_n^{1/2}(x_0) }

to test the hypothesis that the autoregressive median function is given by an EXPAR(1) function. In both cases A_n is as above and one should now use the general formula (7.7.16) to compute these statistics. Again, from Theorem 7.7.4 it readily follows that the asymptotic levels of both of these tests can be computed from the distribution of sup_{0≤u≤1} |B(u)|, provided the estimator θ_n is taken to be, respectively, the LS and the LAD estimator. Again, one needs the (4 + δ)th moment assumption for the former test and the uniform continuity of f for the latter test. The relevant asymptotics of the LS estimator and of a class of M-estimators with bounded ψ in a class of nonlinear time series models are given in Tjøstheim (1986) and Koul (1996), respectively. In particular these papers include the above EXPAR(1) model.
7.7.4  Proofs of some results of Section 7.7.2

Many proofs are similar to those of Section 6.6.4. For example, the analogue of (6.6.31) holds here also with X_i replaced by X_{i-1} and with the i.i.d. assumption replaced by assuming that the r.v.'s {(ε_i, X_i)} are stationary and ergodic. In many arguments one just replaces the LLN's by the Ergodic Theorem and the classical CLT by the central limit theorem for martingales as given in Lemma A3 of the Appendix. For these reasons many details are not given or are shortened. Remark 6.6.3 applies here also without any change. The proof of part (A) of Theorem 7.7.2 is exactly similar to that of part (A) of Theorem 6.6.2, while that of part (B) is somewhat different. We give the details for this part only.

Proof of part (B) of Theorem 7.7.2. Put, for 1 ≤ i ≤ n, t ∈ ℝ^q,
  d_{n,i}(t) := m(X_{i-1}, θ_0 + n^{-1/2}t) - m(X_{i-1}, θ_0);
  γ_{n,i} := n^{-1/2}( 2α + δ‖ṁ(X_{i-1}, θ_0)‖ ),  α > 0, δ > 0;
  μ_n(X_{i-1}, t, a) := E[ ψ(ε_i - d_{n,i}(t) + aγ_{n,i}) | X_{i-1} ].

Define, for a, x ∈ ℝ and t ∈ ℝ^q,

  D_n(x, t, a) := n^{-1/2} Σ_{i=1}^n [ ψ(ε_i - d_{n,i}(t) + aγ_{n,i}) - μ_n(X_{i-1}, t, a) - ψ(ε_i) ] I(X_{i-1} ≤ x).

Write D_n(x, t) and μ_n(X_{i-1}, t) for D_n(x, t, 0) and μ_n(X_{i-1}, t, 0), respectively. Note that the summands in D_n(x, t, a) form mean zero bounded martingale differences, for each x, t and a. Thus

  Var(D_n(x, t, a)) ≤ E[ ψ(ε_1 - d_{n,1}(t) + aγ_{n,1}) - μ_n(X_0, t, a) - ψ(ε_1) ]²
    ≤ E[ ψ(ε_1 - d_{n,1}(t) + aγ_{n,1}) - ψ(ε_1) ]² → 0,

by assumption (6.6.13) and (Ψ_2). Upon an application of Theorem 2.2.5 with Z_{n,i} = ψ(ε_i - d_{n,i}(t) + aγ_{n,i}) - μ_n(X_{i-1}, t, a) - ψ(ε_i) we readily obtain that

(7.7.17)  sup_{x∈ℝ} |D_n(x, t, a)| = o_p(1).
The assumption (C) of Theorem 2.2.5 with these {Z_{n,i}} and τ² ≡ 0 is implied by (Ψ_2), while (7.7.7) implies (2.2.74) here. We need to prove that, for every b < ∞,

(7.7.18)  sup_{x∈ℝ, ‖t‖≤b} |D_n(x, t)| = o_p(1).

To that effect let

  C_n := { sup_{‖t‖≤b} |d_{n,i}(t)| ≤ n^{-1/2}( α + b‖ṁ(X_{i-1})‖ ), 1 ≤ i ≤ n },

and, for an ‖s‖ ≤ b, let

  A_n := { sup_{‖t‖≤b, ‖t-s‖≤δ} |d_{n,i}(t) - d_{n,i}(s)| ≤ γ_{n,i}, 1 ≤ i ≤ n } ∩ C_n.
0, k < 00, sEn, (8.2 .2)
supn 1/ 2 IJl(Y i_1, t)  Jl(Y i 1 ,S)  (t  S)'/J,i(Y i1,S)1 = op(l),
where the supremum is taken over 1 ::; i ::; n , n 1/ 211t  sll ::; k. Note that the differentiability of Jl in 0 alone need not imply this assumption. To see th is consider the simple case q = 1 = p, Jl(y,8) = 82y . Then the left hand side of (8.2.2) is bounded above by a constant multiple of n  1 / 2 max, !Yil l , which may not tend to zero in probability unless some additional conditions are satisfied. Now we are ready to define analogues of M and R and m .d. estimators for O. For the sake of brev ity we shall often write Jli (t), Jli for Jl(Y i l , t) , Jl(Y i 1 , 0) and /J,i(S), /J,i for /J,(Yi l ' s) , jJ,(YiI' 0) , respectively. The basic scores needed ar e as follows. Let ci( t) := Xi  Jl(Y i 1 , t) , and R it denote the rank of Ci(t ) among {Cj(t) ; 1::; j ::; n} , 1::; i ::; n , t E n . Also let ,¢, tp be as in Section 7.3, L be a d .f. on [0,1] and define n
(8.2.3)
M(t)
.
n  1/ 2
L /J,i(t)'¢(ci(t)) , i=1
S(t)
. n  I/ 2
n
L /J,i(t)tp(Rit/(n + 1)) , i=1 n
Z(u , t)
.
n 1 / 2
L /J,i(t)I(R it ::; nu), i=1 n
Z(u, t)
.
Z (u , t)  n  1 / 2
L /J,i(t) u, i=1
O::;u::;l ,
8. Nonlinear Autoregression
360
tEn . Also, define (8.2.4)
OM
:=
OR := argmin t IIS(t )1I 2,
argmin t IIM (t )1I2,
e.: := argmin IlZ (t )11
2
t
.
Note that OM, OR are the extensions of M and R estimators of Section 7.3 and Omd is an analogue of the m.d. estimator (5.2.18) appropriate for the model here . Thus, for example, the choice of'ljJ(x) == sign(x), 0, :3 a 15 > 0, and an N < 00,:1 V 0 < b < 00,
IIsll :S b, n > N,
p(
n
sup II t  sll 0, 3
and no , '3 Vn > no , and for each IIsll ~ b,
(8.2.47)
p(
sup
II D (y , t )  D (y, s)1I
> 0') ~
0',
y EIR,IIt  sll 0, n 21 , N := [nl / 2 / a] and {Yj} be the partition oflR such that F(Yj) = jlN, 1 ::; j ::; N , Yo = 00 , YN+l = 00 . Then, under {2.2.52}, [nul
(8.2.53)
suI? In 1/ 2 U,J
L h;i {I(ci ::; Yj+l) i=l
ti«, ::; Yj)  liN} I = op(I) ,
where the supremum is taken over 0 ::; U ::; 1, 0 ::; j ::; N + 1. Proof. Let i
Vi,j
:=
h;i{ I(c i ::; Yj+l) 
u«, < Yj) 
liN},
Si ,j :=
L Vk,j. k=l
8. Nonlinear Autoregression
374
Clearl y, for each 0 ::; j ::; N + 1, {Si,i ' F ni , 1 ::; i ;::: n,} is a mean zero martingale array. By the inequ ality (9.1.4) , for some C < 00 ,
P(l~~n ISi,il > a ) ::; a  ES~ ,i ' 4
n
2
n
ES~,i ::; C { E [ L E (V;~i I Fn,i 1 ) ] + L EV;~i }' i=l
i= l
But, because : ::; Ihn ;/ , for all i, n
n
L EV;;i ::; L i=l i=l
Eh~i '
E(Vi,i2Ir.r n,t. 1) a) ::; Co: n 2 N L
Eh~i = Q (n 
1/ 2
)
= 0(1),
i= l
Now we pro ceed to prove (8 .2.32) . Write n« = h7;i  h;;i' By the triangle inequ ali ty, it suffices to prove (8.2.32) with h ni repl aced by h~i ' Let V±(y , t , u ) denote t he difference inside th e absolute valu e on the L.R .S. of (8 .2.32) with h ni replaced by h ~i ' Now let Oni(t ) == dni(t ) + a ~ni , and as before we will conti nue writing Oni for Oni(S ), Define [nul
U±(a, y , t , u)
:=
n
1/ 2
L
i: [n Si ::; y + Oni(t) )  I( ci < y)
i =l
 F(y
+ Oni(t)) + F(Y) ]
We have, by (F ), (2 .2.52) , and th e DCT , for every fixed y E IR, t E IRq , and 0 ::; u ::; 1, n
Var( V ±(y , t , u )) ::; n l L
Eh;'ilF(y
+ dni(t )) 
F( y )1 = 0(1).
i= l
Now, fix an a > 0, 11511 ::; b and a 0 > O. Let An be as in t he proof of (8.2 .41) . Arguin g as in t here , we obtain t hat on t he set An, \f Iltll ::;
375
8.2.1 Main results in AR models
sll < 6, and for all Y E JR, 0 :S u :S 1,
b, lit 
V±(y, t, u)
< sup IU±(l,Y, s, u)1 + sup IU±(1, y, s, u)1 y ,u
y ,u
[nuJ
+ supn 1/ 2 L h;i[F(y + dni(s) + .6. ni )  F(y + dni(s)  .6. ni )). y ,u
i=l
But , by (Fl), (2.2.52) , and (8.2.30), the last term in this bound is bounded above by n
n
C(6n 1 L
IIhni it i li + 2bn 1 L
i=l
= Op(a) ,
Ihnila)
i=l
by the choice of 6. Thus to complete the proof of (8.2.32), it suffices to show that sup IU±(a,Y, s, u)1 = op(l),
(8.2.54)
a E JR,
IIsll :S b.
y,u
Let Nand {Yj} be as in the proof of Lemma 8.2.3. Then we obtain sup IU±(a,Y, s , u)1 y ,u
Uni(t 2 ) (Jni(t2) . Note that Uni(O) = U(Y i  I,,8)/(J(Yi  I ,,8), (Jni(O) := 1. In the sequel, iLi ' P,i , Ui, r, will stand for iLni(O), P,i(O) , Uni(O), rni(O), respectively, as they also do not depend on n . Also, let itni,j and iti,j , respectively, denote the lh coordinate of iLni and iLi, 1 :S: j :S: p. All expectations and probabilities below depend on (J := (a', ,8')" but this dependence is not exhibited for the sake of convenience. We now state additional assumptions. (8.3.8)
There exist positive definite matrices A, matrix I', all possibly depending on n
n l
L i=l
(J,
t , :1\1:, and a
such that
n
iLiiL;
= A + op(l) ,
n l
L i=l
UiU;
= t + op(l) ,
8. Nonlinear Autoregression
384 n
n 1 L Ji(Yi1, 0:)jL(Yi 
1 , 0:)'
=
M + op(I),
i= l
n
n 1 L Jiio~
= r + op(I) .
i=l
(8.3.10)
n . (t) 4 n 1 LE(J.Lni,j 1 ) = 0(1) , VI S:} S:P, t E ]Rm . i=l ani(tz) max n 1 / z (IIJiill + 110;11) = op(I) .
(8.3.11)
z 1 n L E{ llJini(td  Jiill
(8.3.9)
lS'Sn
n
i=l
o;lI Z } = 0(1),
+IIoni(tz) 
t E ]RID.
n
(8.3.12)
n
1 Z /
L
{IIJini(td  Jiill
i=l
oill}
+1!oni(t Z) (8.3.13)
For everyt E
]Rm ,
= Op(1),
t E ]RID .
1 S:} S: p ,
n1 /ZE[n1 t{Jini,j(td}Z i=l ani(tZ) x {!J.lni,j(t1)  J.li ,jl + lani(tz) (8.3.14)
V E > 0, :3 a 8 > 0, and an N
< 00,
11}f = 0(1).
3 V
°
N,
p(
n 1/ Z t
sup IItsll 1  E. 

Many of the above assumptions are th e analogues of the assumptions (8.2.6)  (8.2.10) needed for the AR models . Attention should be paid to th e difference between the Jini here and th e one appearing in the previous section due to th e presence of the conditional standard heteroscedasticity in the ARCH model. A relatively easily verifiable sufficient condition for (8.3.13) is th e following: For every 1 S: } S: P, t E ]Rm , (8.3.15)
n 1/ Z t
i=l
E [{ itni'i(~t;) an, Z
r
{1J.lni,j (td  J.li ,jIZ
+!ani(tz) 
liZ}] = 0(1).
Note also that if the und erlying process is stationary and ergodic, then un
8.3.2 Main results in ARCH model
385
der appropriate moment conditions, (8.3.8)  (8.3.10), are a priori satisfied. The first two theorems below give the asymptotic behavior of the preliminary estimators Q p and i3 of (8.3.6) and (8.3.7) , respectively. To state the first result we need to introduce n
tc;
:=
n 1 / 2
L jJ,(Y
i l ,
a){ (i(Y
i
1,
f3)  1
}€i'
i=1
Theorem 8.3.1 Suppose that the model assumptions {8.3.1}, {8.3.2}, and {8.3.3} hold and {8.3.8} holds. In addition, suppose the following holds: There exist a real matrixvalued function M on jRP x n1 such that V k < 00 , SI
E
n1 , n
(8.3.17) ~>arEIIM(YiI' a)11
= 0(1), IIn 1 L

M(Y i 
1,
a) €ill
where the supremum in {8.3.16} is over 1 ::; i ::; n, n 1/211tl Then , for every
= op(l),
i=1
°
(c€)} = 0, for every c > 0. This is satisfied for example when ¢> is skewsymmetric and
386
8. Nonlinear A utoregression
s is symm et rically distribut ed . In this case, a preliminary estimator of a can be defined as n
a :=
argm int EOllinI /2
L jt(Y i l , t ) (X i 
J.L(Y i  1 , t ))II·
i= 1
The next t heorem gives a similar lineari ty resul t about th e scores M s. Its proo f uses usual Taylor expansion and hence is not given here.
Theorem 8.3.2 Suppose that the assumptions (8.3.1), (8.3.2), and (8.3.3) hold. In addition , suppose the following hold. Th e fun ction K, is norulecreasing, twice different iable and satisfies: (i)
J
XK, (x )F (dx ) = 1,
(ii)
J
x 2 1k,(x )lF (dx)
< 00,
(iii) the second derivat ive of K, is bounded. Th ere exist a matrixvalued fun cti ons k < 00 , 82 E 11 2,
R
on
x l"h , such that for ever y
jRP
n
(8.3.20)
~>~ E II R(Yi  l , j3)1I
= 0(1), IIn 1 LR(Y i  1 , a ) c i li = op(1).

~l
where the sup remum in (8.3.19) is over 1 :S i :S n , n1/211 t2  8211 :S k . Then, for every 0 < b < 00, sup IIMs( O + n 
II t ll9
[J + [J +
1
/
2t)  M s(O )
J +J
+
K,(x) F(dx)
XK, (x )F (dx )
x k,(x ) F (dX)] x
2k,(x
r
l
) F (dX)]
t1
I:t211 = op(1).
Consequent ly, we have t he following corollary.
Corollary 8.3.2 In additi on to the assumptions of Theorem 8.3.2, assume that J K, (x)F(dx) = 0 = J x k,(x )F(dx) and that
Iln1/ 2 (,8 
(8.3.21)
13)11 = Op(1).
Th en,
[J
XK,(x) F( dx)
+
J
x 2k,(x) F (dX)] n 1 / 2 (,8 . 1
=:E

13 )
M s(O ) + op(l) .
8.3.2 Main results in ARCH model
387
Note that the asymptotic distribution of 13 does not depend on the preliminary estimator op used in defining 13. Also, the conditions (i)(iii) and those of the above corollary involving", are satisfied by ",(x) == x, because Ec 2 = 1. Again, if the underlying process is stationary and ergodic then (8.3.17) and (8.3.20) will be typically satisfied under appropriate moment conditions. Now, we address the problem of obtaining the limiting distributions of the estimators defined at (8.3.5) . The first ingredient needed is the AUL property of the M score and the ULAQ of the score K . The following lemma is basic to proving these results when the underlying functions 'IjJ and L are not smooth. Its role here is similar to that of its analogue given by the Lemma 8.2.1 in the AR models . Let, for t = (t~ , t~)' , t 1 E IRq , t2 E IRr , m = q + r , and x E IR,
~ J1,ni(td 1) .(t) I(c . < x + x (IJnt·(t?) ~
W(x , t) := n 1 / 2 L
i=l
t
IJn t

2
+(J.lni(td  J.li)) , v(x , t) := n 1 / 2
. (t) J.Lni(t 1) F(x + X(IJni (t 2)  1)
Ln i=l
IJnt
2
+(J.lni(t 1 )
J.li))'

W(x , t) := W(x, t)  v( x , t) , W*(x , t) := n 1 / 2
t
i =l
J1,ni(td [I(c i IJni(t 2)
~ x) 
F(x)] .
The basic result needed is given in the following lemma whose proof appears later. Lemma 8.3.1 Suppose the assumptions (8.3.1) (8.3.3}, (8.3.8}(8.3.14) hold and that the error d.f. F satisfies (2.2 .49), (2.2. 50} and {2.2.51}. Then, [or every 0 < b < 00,
= u p (l ). W(x, 0)11 = u p (l ).
(8.3.22)
IIW(x , t)  W*(x , t)1I
(8.3.23)
IIW(x, t) 
(8.3.24)
IIW(X,t)  W(x , 0)  n 1 / 2
t
i= l
{j(x)A t 1
+ x j (x )r' t 2 }
[itn i(t d  J1,i] IJn i(t 2 )
F(X)II = u (l ), p
388
8. Nonlinear Autoregression
where u p (l ) is a sequence of stochastic processes in x , t, converging to zero , uniformly over the set {x E JR,
litII ::; b} , in probability.
The claim (8.3.24) follows from (8.3.23) and the assumptions (8.3.8) (8.3.14), and the assumption that F satisfies (2.2.49), (2.2.50) and (2.2.51) . Note that the assumptions (8.3.8)(8 .3.14) ensure that for every 0 < b < 00 ,
The proofs of these two claims are routine and left out for an interested reader. The proofs of (8.3.22) and (8.3.23) appear in the last section as a consequence of Theorem 2.2.5. The next result gives the AUL result for Mscores. Theorem 8.3.3 Und er th e assumption (8.3.1) (8.3.3) , (2.2.49) , (2.2. 50} , (2.2.51) with G replac ed by F , and (8.3.8}(8 .3.14), for every 0 < b < 00 , and fo r ever y bounded nondecreasing 'ljJ with J 'ljJdF = 0, sup IIM(9 + n 1 / 2 t )  M(9)
Iltll:::;b
 ( I fd'ljJAt l
+
1
X f( X)d'ljJ(X)rt 2) II =op(l).
This theorem follows from the Lemma 8.3.1 in the sam e way as Th eorem 8.2.1 from Lemma 8.2.1, using the relation I[W(x , t)  W(x , 0)] d'ljJ(x)
== n 1 / 2
t
i=l
ni (t [itO'ni(t2) 1
) 
iti] 'ljJ (00 )  [M(9
+ n 1/ 2t)
 M(9)] .
Next, we have the following immediate corollary. Corollary 8.3.3 In addition to the assumptions of Theorem 8.3.3, assume that J fd'ljJ > 0, (8.3.21) holds, and that (8.3.25)
389
8.3.2 Main results in ARCH model Then ,
!
(8.3.26)
fd'ljJn1/ 2(&.  0)
_A 1 [M(O)
=
+ rn1/2(~ 
{3)
!
Xf(X)d'ljJ(X)] +op(l) .
From this corollary it is apparent that the asymptotic distribution of &. depends on the preliminary estimator of the scale parameter in gen
eral. However, if either f xf(x)d'ljJ(x) = 0 or if r = 0 , then the second term in the right hand side of (8.3.26) disappears and the preliminary estimation of the scale parameter has no effect on the asymptotic distribution of the estimation of o . Also, in this case , the asymptotic distribution of &. is the same as that of an Mestimator of 0 for the model Xii o(Yi1 , (3) = P,(Yi 1 , 0'.) /o(Y i 1, (3) +ci with {3 known. We summarize this in the following Corollary 8.3.4 In addition to the assumptions of Corollary 8.3.3, suppose either f xf (x)d'ljJ (x) = 0, or T = O. Then,
where v('ljJ , F) is as in (1.3.4). A sufficient condition for f xf (x)d'ljJ (x) = 0 is that f is symmetric and 'ljJ skew symmetric, i.e., f( x) = f( x) , 'ljJ (x) =  'ljJ(x) , for every x E JR. To give the analogous results for the process K and the corresponding minimum distance estimator &.md based on ranks, we need to introduce n
2
L
Z(u)
.
n 1 /
q(u)
.
f(F 1(u)) ,
[iti
lL] {I(G(c i) :S u) 
u} ,
lL :=
i=1
n
n
1
L iti ' i=1
s(u):= F 1(u)f(F1(u)),
u E [0,1].
Theorem 8.3.4 In addition to the assumptions (8.3.1)  (8.3 .3) , (2.2.49), (2.2.50), (2.2.51), and (8.3.8)  (8.3.14), suppose that for some positive definite matrix D(0) , n
n
1
L i=1
[iti
lL] [iti  Ii]
I
= D(O)
+ op(l) .
8. Nonlinear Autoregression
390 Then, for every
°
0, and an no < 00, 3 ' no,
where V := U U*. Details that follow are similar to the proof of Lemma 8.2.1. For convenience , write for 1 SiS n , t E IRm , x E IR,
13i(x, t)
:=
I(ci S x
+ XVni(t ) + Uni(t)) F(x
13i(:r ) .
+ XVn i(t) + 'Uni(t)) ,
I(ci S x)  F(x) .
Th en n
ui«. t)
= n
1 2 /
L Ini(t)13i(x, t),
n
U*(x , t)
= n
1 2 /
i =1
L Ini(t)13i(x) , i=1
and V( X, t)  V(x , s) n
n 1 / 2
L
[lni(t) lni(s)) [13i(x , t)  13i(x ))
i=1
n
+ n 1 / 2
L
lni(s) [13i(x, t)  13i(x , s))
i=1
It thus suffices to prove the analog of (8.3.42) for VI, '0 2 . Consider VI first. Note that because 13i (x , t)  13i (x ) are uniformly bounded by 1, we obtain n
IVdx, s , t)1 S
n 1 / 2
L
11ni(t) lni(s)
I·
i= 1
This and th e assumption (8.3.38) th en readily verifies (8.3.42) for VI .
8.3.2 Main result s in ARCH model Now consider V 2 . For a 8
395
> 0, and S fixed , let
sup
IVni(t )  vni(s)l,
sup
IUni(t )  uni(s)l,
II t  sll and an
n1
s: €} .
< 00, such t hat
(8.3.43) Next , writ e lni = l~i  l;;i and V 2 = to V 2 with i: replaced by l; i' Let
vt 
V :; , where
vt
correspond
Dt (x , s , a) n
'
n 1! 2
L
l;i(S) [I (ci ~ x
+ X{Vni(S) + adni( s)}
i= 1
+Uni(S) + aCni(s ))
 F( x + X{Vni(S) + adni(s) } +Uni(S) + aCni(s) ) ], Arguing as for (8.3.41), verify t hat hni == l; i(s ), Tni == vni( s) +adni(s), 8ni == Uni (S) + aCni(s) satisfy th e conditions of Theorem 2.2.5. Hence, one mor e ap plication of (2.2.57) yields that for each S E JRm , a E JR, (8.3.44)
sup \Dt(x , s , a)  Dt (x , s , 0)1 = op(l) . x ER
Now, sup pose x > 0. Then , again using monotonicity of th e indi cator function and G , we obtain that on En, for all lit  sil < 8, t , s E JRm ,
Ivt (x,s , t)1 < IDt (x ,s, 1)  Dt (x ,s,O)1 + IDt (x ,s, 1 )  Dt(x ,s ,O)1 n
+ n  1! 2
L
l; i(s )
[F( x + X(Vni( S) + dni (s ))
i= l
+Uni(S) + Cni (S))
 F( x + X(Vni( S) 
dni(s ))
+Uni(S)  Cni(S)) ] .
396
8. Nonlinear Autoregression
Again , und er the conditions (2.2.49}(2.2.51) and (8.3 .37) , there exist s a 8 > 0 such that the last term in this upper bound is Op(€} , while the first two terms are op(l} , by (8.3.44). Thi s completes the proof of (8.3.42) for D 2 in the case x > O. The proof is similar in the case x ::; O. Hence the proof of (8.3.39) is comp lete . 0 Proof of Lem ma 8.3.1. First, consider (8.3.22) . Let m = q + rand writ e t = (t~ , t~)' , t 1 E IRq, t2 E IRr and let W j , W; etc. denot e the jth coordina te of W , W · , etc. Take (8.3.45) (8.3.46)
in U to see that now U equals Wj . Thus (8.3.22) will follow from (8.3.39) once we verify th e condit ions of Lemma 8.3.2 for th e quantities in (8.3.45) for each 1 ::; j ::; q. To t hat effect , note t hat by (8.3.2) and (8.3.3), 'V € > 0, :3 N, such that "In > N , (8.3.47)
p(max{sup IJlni (t
1} 
Jli 
t ,tl
sup IOni(t 2}  1 't ,t 2
nl/2t~itil ,
nl/2t~ui l } ::; bm
1
/
2) 2: 1  e,
where, here and in th e sequel, i , t 1 , t 2 in the supremum var y over th e range 1 ::; i ::; n, lI tl l ::; b, t 1 E IRq , t 2 E IRr , unless specified otherwise. From (8.3.47) and th e assumption (8.3.1O) we obtain that (8.3.48)
T his verifies (8.3 .34) for t he follows from (8.3.9). Next , let
Vni, Uni
of (8.3.45) . The cond ition (8.3.32)
=
n 1 / 2(8I1itill + 2b€} ,
bn := max o« :
Cn i

n 1 / 2(8 1Iudl + 2b€} ,
Cn
Zni

bn
Wni
=
bn 1 / 2(lI it i li
bn i
1
/
2(llu dl + e),
+ e),
l ~ i ~n
:= max
Cn i ;
max
Zn i;
l ~i ~n
Zn := Wn
l ~ i ~n
:= max
l ~i ~n
Wn i .
8.3.2 Main results in ARCH model
397
Note that by (8.3.10), bn = op(l) = Zn = :=
en
Wn.
Now let , for an Iisil ~ b,
sup IJlni (td  Jlni (sdl II t,  s,ll for all j = 2, · , · ,q and the case when OJl = for some j = 2,· . . , q, separately. In the first case we have the following fact: Va, b E JRP+l , n>  1,
°
(8.3.55)
°
w
It
a'w/(o. + n 1 / 2b)'w, wE [0, oo)p+l, is bounded.
Use this fact and (8.3.54) to obtain that for some k E (0,00), possibly depending on t,
… > 0, EX_0^4 < ∞, and the stationarity and the ergodicity of {X_i} to verify (8.3.8), (8.3.9), and (8.3.10) here, with M := E[X_0^2], and

A = E[ Z_0Z_0' / (β'Z_0) ],   Σ = E[ Z_0Z_0' / (4(β'Z_0)²) ],   Γ = E[ X_0Z_0' / (2(β'Z_0)^{3/2}) ].
To verify (8.3.11), note that μ_ni(t_1) − μ_i ≡ 0. Hence, by the stationarity of the process, the left hand side of (8.3.11) here equals
But clearly the sequence of r.v.'s [{β'Z_0/((β + n^{−1/2}t_2)'Z_0)}^{1/2} − 1] is bounded and tends to 0, a.s. These facts together with the D.C.T. imply (8.3.11) in this example. Next, to verify (8.3.12), note that the derivative of the function s ↦ [x/(x + s)]^{1/2} at s = 0 is −1/(2x). Now, rewrite the left hand side of (8.3.12) as
(1/(2√n)) Σ_{i=1}^n … | { β'Z_i / ((β + n^{−1/2}t_2)'Z_i) }^{1/2} − 1 | …
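The derivative used above can be verified in one line; for fixed x > 0, write f(s) = [x/(x + s)]^{1/2} and differentiate:

```latex
f(s) = x^{1/2}\,(x+s)^{-1/2}
\quad\Longrightarrow\quad
f'(s) = -\tfrac{1}{2}\,x^{1/2}\,(x+s)^{-3/2},
\qquad
f'(0) = -\tfrac{1}{2}\,x^{1/2}\,x^{-3/2} = -\frac{1}{2x}.
```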
Let 0 = t_0 < t_1 < ⋯ < t_r = 1 with t_i − t_{i−1} ≥ δ, 2 ≤ i ≤ r − 1, be a partition of [0, 1]. Then, ∀ ε > 0, ∀ 0 < δ ≤ 1,

P( sup_{|t−s|≤δ} |ζ(t) − ζ(s)| … )

… ∀ ε > 0,

(9.1.1)  Σ_{i=1}^n E[ X_ni² I(|X_ni| > ε) | F_{n,i−1} ] = o_p(1),

(9.1.2)  Σ_{i=1}^n E[ X_ni² | F_{n,i−1} ] → η², a r.v., in probability,

(9.1.3)  F_{n,i} ⊂ F_{n+1,i}, 1 ≤ i ≤ n, n ≥ 1.
Then S_nn converges in distribution to a r.v. whose characteristic function at t is E exp(−η²t²/2), t ∈ ℝ. □

The following inequality on the tail probability of a sum of martingale differences is obtained by combining the Doob and Rosenthal inequalities; cf. Hall and Heyde (1980, Corollary 2.1 and Theorem 2.12). Suppose M_j = Σ_{i=1}^j D_i is a sum of martingale differences with respect to the underlying increasing filtration {V_i} and E|D_i|^p < ∞, 1 ≤ i ≤ n, for some p ≥ 2. Then, there exists a constant C = C(p) such that for any ε > 0,

(9.1.4)  P( max_{1≤j≤n} |M_j| ≥ ε ) ≤ C ε^{−p} { Σ_{i=1}^n E|D_i|^p + E( Σ_{i=1}^n E[D_i² | V_{i−1}] )^{p/2} }.
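The mixture-normal limit in the CLT above can be illustrated by simulation. In this sketch (my own construction, not the book's example) X_ni = η ε_i/√n with ε_i i.i.d. N(0, 1) and η drawn once per replication, so that the conditional variances sum to η²; the distribution of η and the sample sizes are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 400, 20_000

# eta^2 plays the role of the limit in (9.1.2): eta is drawn once per
# replication (so it is measurable w.r.t. the initial sigma-field) and
# X_ni = eta * eps_i / sqrt(n) gives sum_i E[X_ni^2 | F_{n,i-1}] = eta^2.
eta = rng.uniform(0.5, 1.5, size=reps)
eps = rng.standard_normal((reps, n))
X = eta[:, None] * eps / np.sqrt(n)
S_nn = X.sum(axis=1)

# empirical characteristic function of S_nn at t = 1 (imaginary part ~ 0 by
# symmetry) against the claimed limit E exp(-eta^2 t^2 / 2)
t = 1.0
emp = np.cos(t * S_nn).mean()
theo = np.exp(-0.5 * eta ** 2 * t ** 2).mean()
print(float(emp), float(theo))
```

The two printed values agree up to Monte Carlo error, since conditionally on η the sum S_nn is exactly N(0, η²) here.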
Next, we state and prove three lemmas of general interest. The first lemma is due to Scheffé (1947), while the second has its origin in Theorem II.4.2.1 of Hájek–Šidák (op. cit.). The third lemma is the same as Theorem V.1.3.1 of Hájek–Šidák (op. cit.). All these results are reproduced here for the sake of completeness.
Lemma 9.1.4 Let (Ω, A, ν) be a σ-finite measure space. Let e, e_n, n ≥ 1, be a sequence of probability densities w.r.t. ν such that e_n → e, a.e. ν. Then,

∫ |e_n − e| dν → 0.
Proof. Let δ_n := e_n − e, δ_n⁺ := max(δ_n, 0), δ_n⁻ := max(−δ_n, 0). By assumption, δ_n⁻ → 0, a.e. ν. Moreover, δ_n⁻ ≤ e. Thus, by the DCT, ∫ δ_n⁻ dν → 0. This in turn, along with the fact that ∫ δ_n dν = 0, implies that ∫ δ_n⁺ dν → 0. The claim now follows from these facts and the relation ∫ |e_n − e| dν = ∫ δ_n⁻ dν + ∫ δ_n⁺ dν. □
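As a quick numerical illustration of Lemma 9.1.4 (my own example, not from the text): let e_n be the N(1/n, 1) density and e the N(0, 1) density, so that e_n → e pointwise. The L1 distance tends to 0, and in this Gaussian case it is 2(2Φ(μ/2) − 1) for mean shift μ:

```python
import numpy as np
from math import erf, sqrt

# e_n = N(1/n, 1) density, e = N(0, 1) density; e_n -> e pointwise.
def l1_dist(mu, grid):
    """Riemann-sum approximation of the L1 distance between N(mu,1) and N(0,1)."""
    dx = grid[1] - grid[0]
    f = np.exp(-0.5 * (grid - mu) ** 2) / np.sqrt(2 * np.pi)
    g = np.exp(-0.5 * grid ** 2) / np.sqrt(2 * np.pi)
    return np.abs(f - g).sum() * dx

grid = np.linspace(-12.0, 12.0, 240001)
d = [l1_dist(1.0 / n, grid) for n in (1, 10, 100)]   # decreasing towards 0

# closed form for mu = 1: 2(2*Phi(1/2) - 1)
phi = 0.5 * (1 + erf(0.5 / sqrt(2)))                 # Phi(1/2)
print(d, 2 * (2 * phi - 1))
```

The densities cross only at x = μ/2, which is where the closed form comes from.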
Lemma 9.1.5 Let (Ω, A, ν) be a σ-finite measure space. Let {g_n}, g be a sequence of measurable functions such that
(9.1.5)  g_n → g, a.e. ν,

(9.1.6)  lim sup_n ∫ |g_n| dν ≤ ∫ |g| dν.
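A condition such as (9.1.6) cannot be dropped. A standard counterexample (my own illustration, assuming the lemma concludes ∫ |g_n − g| dν → 0 as in Hájek–Šidák): with Lebesgue measure on the line, g_n = 1_{[n, n+1)} → g ≡ 0 pointwise, yet ∫ |g_n| dν = 1 for every n, so the mass escapes to infinity and the L1 distance to the limit stays at 1:

```python
import numpy as np

# g_n = indicator of [n, n+1), g = 0, Lebesgue measure on the line:
# g_n -> 0 pointwise, but int |g_n| = 1 for all n, so (9.1.6) fails
# and int |g_n - g| = 1 does not tend to 0.
grid = np.linspace(0.0, 60.0, 600_001)
dx = grid[1] - grid[0]

def g_n(x, n):
    """Indicator of [n, n+1)."""
    return ((x >= n) & (x < n + 1)).astype(float)

for n in (5, 20, 50):
    l1_gn = g_n(grid, n).sum() * dx              # integral of |g_n|, about 1
    l1_diff = np.abs(g_n(grid, n) - 0.0).sum() * dx
    print(n, round(l1_gn, 3), round(l1_diff, 3))
```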