Empirical Processes Peter Gaenssler Mathematical Institute of the University of Munich ...
Institute of Mathematical Statistics LECTURE NOTES-MONOGRAPH SERIES Shanti S. Gupta, Series Editor Volume 3
Institute of Mathematical Statistics Hayward, California
Institute of Mathematical Statistics Lecture Notes-Monograph Series Series Editor, Shanti S. Gupta, Purdue University
The production of the IMS Lecture Notes-Monograph Series is managed by the IMS Business Office: Bruce E. Trumbo, Treasurer, and Jose L Gonzalez, Business Manager.
Library of Congress Catalog Card Number: 83-82637 International Standard Book Number 0-940600-03-X Copyright © 1983 Institute of Mathematical Statistics All rights reserved Printed in the United States of America
CONTENTS
1. Introduction and some structural properties of empirical measures .... 1
2. Glivenko-Cantelli Convergence: The Vapnik-Chervonenkis Theory with some extensions .... 12
3. Weak convergence of non-Borel measures on a metric space:
   Separable and tight measures on $\mathcal{B}_b(S)$ .... 41
   Weak convergence / Portmanteau-Theorem .... 46
   Identification of Limits .... 52
   Weak convergence and mappings (Continuous Mapping Theorems) .... 53
   Weak convergence criteria and compactness .... 58
   Some remarks on product spaces .... 69
   Sequential Compactness .... 74
   Skorokhod-Dudley-Wichura Representation Theorem .... 82
   The space D[0,1] .... 90
   Random change of time .... 102
4. Functional Central Limit Theorems:
   Functional Central Limit Theorems for empirical C-processes .... 105
   Some remarks on other measurability assumptions and further results .... 131
   Functional Central Limit Theorems for weighted empirical processes .... 139
   Some general remarks on weak convergence of random elements in D = D[0,1] w.r.t. $\rho_q$-metrics .... 147
Functional Central Limit Theorems for weighted empirical processes w.r.t, p -metrics 1
$1$ if $x \in B$, $0$ if $x \notin B$, for $B \in \mathcal{B}$.
In other words, given the first $n$ observations $x_i = \xi_i(\omega)$, $i=1,\dots,n$, $\mu_n(B) \equiv \mu_n(B,\omega)$ is the average number of the first $n$ $x_i$'s falling into $B$. (The notation $\mu_n(\cdot,\omega)$ should call attention to the fact that $\mu_n$ is a random p-measure on $\mathcal{B}$.) $\mu_n$ may be viewed as the statistical picture of $\mu$ and we are thus interested in the connection between $\mu_n$ and $\mu$, especially when $n$ tends to infinity.
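The definition of the empirical measure can be made concrete with a small simulation. The following is only a sketch under assumptions that are not in the text: the sample space is taken to be $[0,1]$ with $\mu$ the uniform law, the set $B = (-\infty, 0.3]$ is an arbitrary choice, and the helper name `empirical_measure` is hypothetical.

```python
import random

def empirical_measure(sample, indicator):
    """Return mu_n(B): the fraction of sample points x_i for which indicator(x_i) is true."""
    return sum(1 for x in sample if indicator(x)) / len(sample)

# Assumed setting: x_i = xi_i(omega) drawn i.i.d. from the uniform law on [0, 1].
random.seed(0)
sample = [random.random() for _ in range(10_000)]

# Assumed set B = (-inf, 0.3], so that mu(B) = 0.3 under the uniform law.
in_B = lambda x: x <= 0.3
mu_n_B = empirical_measure(sample, in_B)
print(mu_n_B)  # close to mu(B) = 0.3 for large n
```

For a fixed $\omega$ the function $B \mapsto \mu_n(B,\omega)$ computed this way is an ordinary p-measure; the randomness enters only through the sample.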
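In what follows, let $\mathcal{C}$ be some subset of $\mathcal{B}$ (e.g., $\mathcal{C} = \{(-\infty,t]: t \in \mathbb{R}^k\}$, the class of all lower left orthants in $X = \mathbb{R}^k$, or the class of all closed Euclidean balls in $\mathbb{R}^k$, to have at least two specific examples in mind). Denoting with $1_C$ the indicator function of a set $C \in \mathcal{C}$, $\mu_n(C)$ can be rewritten in the form

(1) $\mu_n(C) = \frac{1}{n} \sum_{i=1}^{n} 1_C(\xi_i)$.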
Now, since the random variables $1_C(\xi_i)$, $i=1,2,\dots$, are again i.i.d. with common mean $\mu(C)$ and variance $\mu(C)(1-\mu(C))$, it results from classical probability theory that

(2) (Strong Law of Large Numbers): For each fixed $C \in \mathcal{C}$ one has $\mu_n(C) \to \mu(C)$ $P$-almost surely ($P$-a.s.) as $n$ tends to infinity.

(3) (Central Limit Theorem): For each fixed $C \in \mathcal{C}$ one has
$n^{1/2}(\mu_n(C) - \mu(C)) \xrightarrow{L} G_\mu(C)$ as $n$ tends to infinity,
where $G_\mu(C)$ is a random variable with $L\{G_\mu(C)\} = N(0, \mu(C)(1-\mu(C)))$.

(4) (Multidimensional Central Limit Theorem): For any finitely many $C_1,\dots,C_k \in \mathcal{C}$ one has
$\{n^{1/2}(\mu_n(C_j) - \mu(C_j)) : j=1,\dots,k\} \xrightarrow{L} \{G_\mu(C_j) : j=1,\dots,k\}$ as $n$ tends to infinity,
where $\mathbb{G}_\mu \equiv (G_\mu(C))_{C \in \mathcal{C}}$ is a mean-zero Gaussian process with covariance structure
$\mathrm{cov}(G_\mu(C), G_\mu(D)) = \mu(C \cap D) - \mu(C)\mu(D)$, $C, D \in \mathcal{C}$.

Here, according to Kolmogorov's theorem (cf. Gaenssler-Stute (1977), 7.1.16), $\mathbb{G}_\mu$ is viewed as a random element in $(\mathbb{R}^{\mathcal{C}}, \mathbb{B}_{\mathcal{C}})$, where $\mathbb{B}_{\mathcal{C}} \equiv \bigotimes_{\mathcal{C}} \mathbb{B}$ denotes the product σ-algebra in $\mathbb{R}^{\mathcal{C}}$ of identical components $\mathbb{B}$, $\mathbb{B}$ being the σ-algebra of Borel sets in $\mathbb{R}$.

In this lecture we are going to present uniform analogues of (2) (with the uniformity being in $C \in \mathcal{C}$) known as GLIVENKO-CANTELLI THEOREMS (Section 2) and functional versions of (4), so-called FUNCTIONAL CENTRAL LIMIT THEOREMS (Section 4); an appropriate setting for the latter is presented in Section 3
which might also be of independent interest. First we want to give insight into some more or less known
STRUCTURAL PROPERTIES OF EMPIRICAL MEASURES: For this, consider instead of $\mu_n$ the counting process

$N_n(B) := n\mu_n(B)$, $B \in \mathcal{B}$.

Note that $L\{N_n(B)\} = \mathrm{Bin}(n,\mu(B))$ (i.e., $P(N_n(B)=j) = \binom{n}{j}\mu(B)^j(1-\mu(B))^{n-j}$, $j=0,1,\dots,n$). The following Markov and Martingale properties associated with empirical measures are well known; since however specific references are not conveniently available, and especially not in the set-indexed context of these lectures, we present detailed derivations.

LEMMA 1 (MARKOV PROPERTY). For any $B_i \in \mathcal{B}$ with $\emptyset = B_0 \subset B_1 \subset \dots \subset B_k \subset B_{k+1} = X$ such that for $D_i := B_i \setminus B_{i-1}$, $\mu(D_i) > 0$, $i=1,\dots,k+1$, and for any $m_i \in \{0,1,\dots,n\}$ with $0 \le m_1 \le \dots \le m_k \le n$ one has

$P(N_n(B_k)=m_k \mid N_n(B_1)=m_1,\dots,N_n(B_{k-1})=m_{k-1}) = P(N_n(B_k)=m_k \mid N_n(B_{k-1})=m_{k-1})$.
•
_ _
-: —
9 sa
y 9 where
=P(Nn(D1)=m1, 1
n 1 (m 2 -m 1 )!...(m k -m k _ 1 )!(n-m k )! m. 1
3
whence
— = b
(
"-\-l)!
(mk-\i>!
) =P({ωEΩ: ξ(ω)GA}) n
= L{ξ}(A) = L{ζ}(A) = F({ωGΩ : ξ(ω)6A}) b) n = Γ({ωGΩ : lim (sup | - Σ 1, n-x° tGR Π i = l °°'
. ( ξ . ( ω ) ) - F ( t ) | ) = 0 } ) = 1, ± b)
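When taking $(X,\mathcal{B}) = (\mathbb{R},\mathbb{B})$ and $\mathcal{C} := \{(-\infty,t] : t \in \mathbb{R}\}$, then, in the setting of Section 1, the GLIVENKO-CANTELLI-convergence (8) reads as

(8*) $D_n(\mathcal{C},\mu) \equiv \sup_{C \in \mathcal{C}} |\mu_n(C) - \mu(C)| \to 0$ $P$-a.s.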
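The convergence (8*) can be watched numerically. The sketch below assumes, for illustration only, that $\mu$ is the uniform law on $[0,1]$, so that $\mu((-\infty,t]) = t$ on $[0,1]$; the function name `sup_discrepancy_uniform` is hypothetical. It uses the standard fact that for the class of half-lines the supremum is attained at an order statistic, approached from the right or from the left.

```python
import random

def sup_discrepancy_uniform(sample):
    """D_n = sup_t |F_n(t) - t| for a sample assumed to come from the uniform law on [0, 1]."""
    xs = sorted(sample)
    n = len(xs)
    # At the i-th order statistic x (0-based), F_n jumps from i/n to (i+1)/n,
    # so the largest deviation near x is max((i+1)/n - x, x - i/n).
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

random.seed(1)
for n in (100, 1_000, 10_000):
    d_n = sup_discrepancy_uniform([random.random() for _ in range(n)])
    print(n, round(d_n, 4))  # the discrepancy shrinks as n grows
```

For general classes $\mathcal{C}$ no such finite reduction need exist, which is one reason why measurability of the supremum becomes an issue in what follows.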
Concerning more general situations it turns out however that (8*) may hold for the empirical measures obtained from one sequence $\xi_1,\xi_2,\dots$ of independent random elements in $(X,\mathcal{B})$ each having distribution $\mu$ but not for the empirical measures obtained from another such sequence, say $\eta_1,\eta_2,\dots$

EXAMPLE (cf. D. Pollard (1981), Example (5.1)).
Let $(X,\mathcal{B},\mu)$ be a nonatomic p-space (i.e., $\{x\} \in \mathcal{B}$ and $\mu(\{x\})=0$ for all $x \in X$). Suppose that there exists a subset $A$ of $X$ with inner measure $\mu_*(A)=0$ and outer measure $\mu^*(A)=1$ (cf. P.R. Halmos (1969), Section 16, for an example). Let $\mathcal{B}_A := A \cap \mathcal{B}$ be the trace σ-algebra of $\mathcal{B}$ on $A$ and $\mu_A$ be the p-measure defined on $\mathcal{B}_A$ by $\mu_A(A \cap B) := \mu(B)$, $B \in \mathcal{B}$; note that $\mu_A$ is well defined since $\mu^*(A)=1$. By the definition of $\mathcal{B}_A$ the embedding $\xi_A$ of $A$ into $X$ is $\mathcal{B}_A,\mathcal{B}$-measurable ($\xi_A^{-1}(B) = \{x \in A: \xi_A(x)=x \in B\} = A \cap B \in \mathcal{B}_A$ for any $B \in \mathcal{B}$) and one has

$\mu_A(\xi_A^{-1}(B)) = \mu(B)$ for all $B \in \mathcal{B}$.
Consider the p-space

$(\Omega_1,\mathcal{F}_1,P_1) := (A^{\mathbb{N}}, \bigotimes_{\mathbb{N}} \mathcal{B}_A, \times_{\mathbb{N}} \mu_A)$

and on it the random elements $\xi_i: \Omega_1 \to X$, defined by $\xi_i := \xi_A \circ \pi_i$, where $\pi_i: A^{\mathbb{N}} \to A$ denotes the $i$-th coordinate projection.

Then, by construction, the $\xi_i$'s are independent having distribution $\mu$ ($L\{\xi_i\}(B) = P_1(\xi_i^{-1}(B)) = P_1(\pi_i^{-1}(\xi_A^{-1}(B))) = \mu_A(\xi_A^{-1}(B)) = \mu(B)$ for each $B \in \mathcal{B}$). Now, let $\mathcal{C}$ be the class of all finite subsets of $A$; then $\mathcal{C} \subset \mathcal{B}$ and $\mu(C)=0$ for all $C \in \mathcal{C}$ since $(X,\mathcal{B},\mu)$ was assumed to be nonatomic. But since all the $\xi_i$'s take their values in $A$ it follows that for the empirical measures $\mu_n$ pertaining to $\xi_1,\dots,\xi_n$ one has

$\sup_{C \in \mathcal{C}} |\mu_n(C) - \mu(C)| \equiv 1$.
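Taking instead the p-space

$(\Omega_2,\mathcal{F}_2,P_2) := ((\complement A)^{\mathbb{N}}, \bigotimes_{\mathbb{N}} \mathcal{B}_{\complement A}, \times_{\mathbb{N}} \mu_{\complement A})$

and on it the random elements $\eta_i: \Omega_2 \to X$, defined by $\eta_i := \xi_{\complement A} \circ \pi_i$, where here $\pi_i: (\complement A)^{\mathbb{N}} \to \complement A$ is again the $i$-th coordinate projection and where $\xi_{\complement A}$ denotes the embedding of $\complement A$ into $X$, it follows as before (noticing that $\mu^*(\complement A)=1$) that the $\eta_i$'s are i.i.d. with distribution $\mu$. But, for the same class $\mathcal{C}$ as before one has now for the empirical measure $\mu_n$ pertaining to $\eta_1,\dots,\eta_n$, $\mu_n(C)=0$ for all $C \in \mathcal{C}$, since the $\eta_i$'s take their values in $\complement A$, whence

$\sup_{C \in \mathcal{C}} |\mu_n(C) - \mu(C)| \equiv 0$.

(Note that in both cases $D_n(\mathcal{C},\mu) \equiv \sup_{C \in \mathcal{C}} |\mu_n(C) - \mu(C)|$ is measurable.)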
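Finally, taking as underlying p-space the CANONICAL MODEL

$(\Omega,\mathcal{A},P) = (X^{\mathbb{N}}, \mathcal{B}_{\mathbb{N}}, \times_{\mathbb{N}} \mu)$

and on it the coordinate projections $\xi_i$, $i \in \mathbb{N}$, being again i.i.d. with distribution $\mu$, the above example shows that for the very same class $\mathcal{C}$ one gets e.g.

$\sup_{C \in \mathcal{C}} |\mu_1(C,x) - \mu(C)| = \sup_{C \in \mathcal{C}} \mu_1(C,x) = 1_A(x_1)$ for $x = (x_1,x_2,\dots) \in X^{\mathbb{N}}$,

whence, since $A \notin \mathcal{B}$:

$\{x \in X^{\mathbb{N}}: \sup_{C \in \mathcal{C}} |\mu_1(C,x) - \mu(C)| = 1\} = A \times X \times X \times \dots \notin \mathcal{B}_{\mathbb{N}}$,

i.e., here, in contrast to (9),

(10) $D_n(\mathcal{C},\mu)(x) := \sup_{C \in \mathcal{C}} |\mu_n(C,x) - \mu(C)|$

is not $\mathcal{B}_{\mathbb{N}},\mathbb{B}$-measurable. This indicates already the need for appropriate measurability assumptions to be discussed later.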
Let us point out at this stage the usefulness of GLIVENKO-CANTELLI-convergence in statistics by giving only one example concerning CHERNOFF-type estimates of the mode (c.f. H. Chernoff (1964), and E.J. Wegman (1971)). For other examples, see P. Gaenssler and J.A. Wellner (1981). For the moment we anticipate the following GLIVENKO-CANTELLI-Theorem which will be proved later in this section:
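(11) Let $\xi_1,\xi_2,\dots$ be i.i.d. random vectors on some p-space $(\Omega,\mathcal{F},P)$ with values in $X = \mathbb{R}^k$, $k \ge 1$, having distribution $\mu$ on $\mathcal{B} = \mathbb{B}_k$. Let $\mathcal{C}$ be the class of all closed Euclidean balls in $\mathbb{R}^k$; then

$\lim_{n \to \infty} \big(\sup_{C \in \mathcal{C}} |\mu_n(C) - \mu(C)|\big) = 0$ $P$-a.s.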
Now, consider $(X,\mathcal{B}) = (\mathbb{R}^k,\mathbb{B}_k)$, $k \ge 1$, and suppose that $\mu$ is "unimodal" in the following sense: (*) there exists a $\theta \in \mathbb{R}^k$ such that for some $\delta_0 > 0$, $\mu(B^c(\theta,\delta_0)) > \mu(B^c(x,\delta_0))$
for all xEΊR , xj=θ, where B°(x,& ) denotes the closed Euclidean ball with center x and radius βn)) = lim inf μ n
(B (θ,δ Q ),ω) ^ lim sup μ
C
(B (θ,r
),ω)
k which is in contradiction with the choice of θ ad (2): lim θ n (ω) = x * θ
implies that
according to (**).
lim inf μ
C
(B (Θ
(ω),r
),ω)
£ lim |μ (B°(θ (ω),r ),ω) - μ(BC(θ (ω),r ))| n. n. n. n. n. /
V = 0 according to (11)
+ lim sup μ(B°(θn (ω),i»n )) ύ μ(BC(x,δ )) < μ(B°(θ,δ )) 3"*°° j j ° (*) ° C
= lim inf μ n (B°( Θ, ]R+ i s some given WEIGHT FUNCTION. (Note that D (q) Ξ D F
for q Ξ l . )
Considering instead of the $\xi_i$'s the special versions $\tilde\xi_i = F^{-1}(\eta_i)$, the $\eta_i$'s being independent and uniformly distributed on $(0,1)$, it follows in the same
way as pointed out in part c*) of the outline of the proof of (8) that w.r.t. $P$-a.s. convergence of $D_n^F(q)$ for continuous $q$'s one may consider w.l.o.g. instead of $D_n^F(q)$ its versions

$\tilde D_n^F(q) := \sup_{t \in \mathbb{R}} \frac{|\tilde F_n(t) - F(t)|}{q(F(t))}$,

where $\tilde F_n$ is the empirical df pertaining to $\tilde\xi_1,\dots,\tilde\xi_n$. But, due to the identity

$\{\tilde\xi_i \le t\} = \{\eta_i \le F(t)\}$ for all $t \in \mathbb{R}$

one has $\tilde F_n(t) = U_n(F(t))$ for all $t \in \mathbb{R}$, where $U_n$ is the empirical df pertaining to $\eta_1,\dots,\eta_n$. Therefore

$\tilde D_n^F(q) = \sup_{t \in \mathbb{R}} \frac{|U_n(F(t)) - F(t)|}{q(F(t))} \le \sup_{s \in [0,1]} \frac{|U_n(s) - s|}{q(s)} =: D_n(q)$,
A F r
~F where we remark that for continuous F we have even D (q) = D (q), whence for continuous q's and F f s one has, comparing again with part c*) of the outline of the proof of (8), that D
n ( q ) = ^n ( q ) = V
q )
showing that in this case D F (q) is a DISTRIBUTION-FREE STATISTIC. By the way, since for continuous q f s and arbitrary F f s
F ** ~F D (q) = D (q) ύ D (q), we obtain
in this case that ^
for each d^O.
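The reduction to the uniform case rests on the identity $\{F^{-1}(\eta_i) \le t\} = \{\eta_i \le F(t)\}$, i.e. the empirical df of the transformed sample at $t$ equals the uniform empirical df at $F(t)$. A minimal numerical sketch follows, assuming for illustration the standard exponential df $F(t) = 1 - e^{-t}$ (not taken from the text); the names `empirical_df`, `F`, `F_inv` and `max_gap` are hypothetical.

```python
import math
import random

def empirical_df(sample, t):
    """Empirical distribution function of `sample` evaluated at t."""
    return sum(1 for x in sample if x <= t) / len(sample)

# Assumed example: F(t) = 1 - exp(-t) for t > 0, with inverse F_inv(u) = -log(1 - u).
F = lambda t: 1.0 - math.exp(-t) if t > 0 else 0.0
F_inv = lambda u: -math.log(1.0 - u)

random.seed(2)
eta = [random.random() for _ in range(1_000)]  # uniform sample
xi = [F_inv(u) for u in eta]                   # special versions F^{-1}(eta_i)

# Compare the empirical df of xi at t with the uniform empirical df at F(t).
grid = [0.1 * j for j in range(60)]
max_gap = max(abs(empirical_df(xi, t) - empirical_df(eta, F(t))) for t in grid)
print(max_gap)  # zero up to floating-point ties
```

Because the comparison is pointwise in $t$, the same identity carries over to any weighted supremum over $t$, which is what makes the weighted statistic distribution-free for continuous $F$.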
Also, the above remarks show that for continuous $q$'s we may restrict ourselves w.l.o.g. to the case of finding conditions on $q$ such that

(*) $\lim_{n \to \infty} D_n(q) = 0$ $P$-a.s.

in order to get the same GLIVENKO-CANTELLI-convergence for $D_n^F(q)$. The following theorem gives in a certain sense necessary and sufficient conditions on $q$ for (*) to hold (cf. J.A. Wellner (1977) and (1978)).

THEOREM 1. Let $(\eta_i)_{i \in \mathbb{N}}$ be a sequence of independent random variables on some p-space $(\Omega,\mathcal{F},P)$ being uniformly distributed on $[0,1]$. Let $D_n(q)$ be defined as above with a weight function $q$ belonging to the set
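$\mathcal{Q}_1 := \{q: [0,1] \to \mathbb{R}$, $q$ continuous, $q(0) = q(1) = 0$, $q(t) > 0$ for all $t \in (0,1)$, $q$ monotone increasing on $[0,\delta_0]$ and monotone decreasing on $[1-\delta_1,1]$ for appropriate $\delta_i = \delta_i(q)$, $i=0,1\}$.

Then, putting $\psi(t) := 1/q(t)$, $t \in (0,1)$, one has:

(i) For any $q \in \mathcal{Q}_1$ with $\int_0^1 \psi(t)\,dt < \infty$ it follows that $\lim_{n \to \infty} D_n(q) = 0$ $P$-a.s.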
1 (ii) For any qE£ with Jψ(t)dt = °° it follows that 1 0 lim sup D (q) = °° P-a.s. Proof.(i): For any ε>0 there exist θ.>0 such that θ. i.e. that (+)
(B\ {x 1 }) Π T± Φ F^
for all i=l,...,p. 1
ad (+): Let i E {l,...,p} be arbitrary but fixed; if x
E F., then x 1
1 2 E F., 1
X
1
but x £ (B\{x }) Π F i 3 implying (+) in this case. If x E (?F., i.e., 1
1
{x } Π F, = ?, then F^ = F* % whence (B\ {x }) Π F. = B Π F. Φ F^ = F?, implying (+) also in this case. This proves Lemma 8. □

The next lemma, being a consequence of Lemma 7 or Lemma 8, respectively, will be one of the key results used below.

Lemma 9. Let $X$ be any set and $\mathcal{C} \subset \mathcal{P}(X)$ be a Vapnik-Chervonenkis class (i.e., $V(\mathcal{C}) =: s < \infty$); then

(i) $m^{\mathcal{C}}(r) \le r^s$ for all $r \ge 2$, and

(ii) $m^{\mathcal{C}}(r) \le r^s + 1$ for all $r \ge 0$.

Proof. According to Lemma 7 and (13) we have $m^{\mathcal{C}}(r) < \varphi(s,r) \le r^s + 1$ for all $r \ge s$, whence (note that $m^{\mathcal{C}}(\cdot)$ is integer valued) $m^{\mathcal{C}}(r) \le r^s$ for all $r \ge s$; if $2 \le r \le s$, it follows that $m^{\mathcal{C}}(r) \le 2^r \le 2^s \le r^s$; this proves (i). Finally, for $r=0$ we have $m^{\mathcal{C}}(0) = 1 = 2^0$ (whence $s \ge 1$) $\le 0^s + 1$, and for $r=1$ we have $m^{\mathcal{C}}(1) \le 2 = 1^s + 1$, proving (ii). □

Besides Lemma 9, the following VAPNIK-CHERVONENKIS-INEQUALITIES are basic for the whole theory. We are going to present this part in a form strengthening the original bounds obtained by Vapnik and Chervonenkis. This will be done in a similar way as in a recent paper by L. Devroye (1981). For this, let again $(X,\mathcal{B})$ be an arbitrary measurable space (Devroye (1981) considers only $(X,\mathcal{B}) = (\mathbb{R}^k,\mathbb{B}_k)$, $k \ge 1$), and let $(\xi_i)_{i \in \mathbb{N}}$ be a sequence of independent and identically distributed random elements in $(X,\mathcal{B})$, defined on some common p-space $(\Omega,\mathcal{F},P)$, with distribution $\mu \equiv L\{\xi_i\}$ on $\mathcal{B}$. For $n,n' \in \mathbb{N}$ let $\mu_n$ and $\nu_{n'}$ be the empirical measures based on $\xi_1,\dots,\xi_n$ and $\xi_{n+1},\dots,\xi_{n+n'}$, respectively.

Let $\mathcal{C}$ be an arbitrary subset of $\mathcal{B}$, and let
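$D_n(\mathcal{C},\mu) := \sup_{C \in \mathcal{C}} |\mu_n(C) - \mu(C)|$,

$D_{n,n'}(\mathcal{C}) := \sup_{C \in \mathcal{C}} |\mu_n(C) - \nu_{n'}(C)|$,

where we assume that both $D_n(\mathcal{C},\mu)$ and $D_{n,n'}(\mathcal{C})$ are measurable w.r.t. the canonical model (i.e., with $(X^{\mathbb{N}}, \mathcal{B}_{\mathbb{N}}, \times_{\mathbb{N}} \mu)$ as basic p-space and with $\xi_i$'s being the coordinate projections of $X^{\mathbb{N}}$ onto $X$). (Note that then $D_n(\mathcal{C},\mu)$ and $D_{n,n'}(\mathcal{C})$ are also measurable considered as functions on the initially given p-space $(\Omega,\mathcal{F},P)$, since $\omega \mapsto \xi(\omega) := (\xi_1(\omega),\xi_2(\omega),\dots) \in X^{\mathbb{N}}$ is $\mathcal{F},\mathcal{B}_{\mathbb{N}}$-measurable.)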
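The polynomial bounds of Lemma 9 can be checked by brute force on a simple class. The sketch below assumes, as an illustration not taken from the text, the class of half-lines $(-\infty,t]$ on the real line, and it reads $V(\mathcal{C})$ as the smallest $r$ with $m^{\mathcal{C}}(r) < 2^r$ (consistent with the step "$m^{\mathcal{C}}(0) = 1 = 2^0$, whence $s \ge 1$" in the proof); all function names are hypothetical.

```python
def picked_out_count(points, thresholds):
    """Number of distinct subsets of `points` of the form points ∩ (-inf, t]."""
    return len({frozenset(p for p in points if p <= t) for t in thresholds})

def growth_function_half_lines(r):
    """m^C(r) for half-lines; any r distinct points attain the maximum."""
    points = list(range(r))
    thresholds = [-1] + points  # one t below all points, then one t at each point
    return picked_out_count(points, thresholds)

def vc_index(growth, r_max=20):
    """Smallest r with growth(r) < 2**r (assumed reading of V(C))."""
    return next(r for r in range(r_max + 1) if growth(r) < 2 ** r)

s = vc_index(growth_function_half_lines)
for r in range(0, 8):
    m = growth_function_half_lines(r)
    print(r, m, r ** s + 1)  # m^C(r) against the bound of Lemma 9 (ii)
```

For this class $m^{\mathcal{C}}(r) = r + 1$ grows linearly, far below $2^r$; this gap between polynomial and exponential growth is what the Vapnik-Chervonenkis inequalities exploit.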
The proof of the following inequalities is patterned on the proof of Vapnik and Chervonenkis (1971). As a corollary we will obtain both, the fundamental Vapnik-Chervonenkis inequality and its improvement by Devroye (1981). Lemma 10. For any ε>0, any 02]
^
:= J [ J u(h ,- (l-α)ε) dP 2 ] d P 1 1 2 n > n A X A 1 := {x 1 G X 1 : sup |μ (C,χ 1 )-μ(C)| > ε } , n CGC 1 1 Π μ (C,x ) = — Σ n i=1
But for any ( x 1 ^ 2 ) G X 1 x X 2
1 1 l p (x.) for x = (x1 ,.. . ,x ) G X . C i 1 n
with
x1 G A1
there exists a C χ l G C
such that 1
L.x ) - μ ( C χ l ) | > ε,
2
1
whence (for v τ (C,x ) = - t n n
Σ
i-n+1
l p (x.) with L> ϊ
x
9
= (x
2
,,...,x f) E X ) n+l n+n
we have μ
l n if
l
(C
v
χl)
χl' (C
n
χl'
therefore, we obtain for all x
" V
χ2)
( C
χl'
χ 2 )
> ( 1
l
"
α ) ε
" uCC χ l )I £ αe;
£ A
the following estimate for the inner
integral in (a):
/ u(h , ( x \ x 2 ) - (l-α)ε) P 2 (dχ 2 ) n χ2 'n P 2 ( { χ 2 € X 2 : |v.(C
l5
χ 2 ) - μ ( C i ) | S oε}).
But, by Tschebyschev's inequality,

$P_2(\{x^2 \in X^2: |\nu_{n'}(C_{x^1},x^2) - \mu(C_{x^1})| > \alpha\varepsilon\}) \le \frac{\mu(C_{x^1})(1-\mu(C_{x^1}))}{\alpha^2\varepsilon^2 n'} \le \frac{1}{4\alpha^2\varepsilon^2 n'}$;

thus, summarizing, we obtain

$P(D_{n,n'}(\mathcal{C}) > (1-\alpha)\varepsilon) \ge P(D_n(\mathcal{C},\mu) > \varepsilon)\Big(1 - \frac{1}{4\alpha^2\varepsilon^2 n'}\Big)$,

which proves (a).

Proof of inequality (b) in Lemma 10. According to our remarks preceding the proof we may and will consider $D_{n,n'}(\mathcal{C})$ as a function $h_{n,n'}$ of $x = (x^1,x^2) \in X^1 \times X^2$ with $x^1 = (x_1,\dots,x_n)$ and $x^2 = (x_{n+1},\dots,x_{n+n'})$.

Due to the symmetry of $P = P_1 \times P_2$ (w.r.t. coordinate permutations) one has
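for each $f \in L^1(X^1 \times X^2, \mathcal{B}_{n+n'}, P)$

$\int f(x)\,P(dx) = \int f(T_i x)\,P(dx)$

for every permutation $T_i x$ of $x$ (which means that $T_i x$ is the image of $x$ when applying a permutation $T_i$ to the $n+n'$ components of $x$). Therefore,

$P(D_{n,n'}(\mathcal{C}) > (1-\alpha)\varepsilon) = \int u(h_{n,n'}(x) - (1-\alpha)\varepsilon)\,P(dx) = \int \Big[\frac{1}{(n+n')!} \sum_i u(h_{n,n'}(T_i x) - (1-\alpha)\varepsilon)\Big] P(dx)$,

where the summation w.r.t. $i$ is over all $(n+n')!$ permutations $T_i$.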
Ws a l s o remark here for l a t e r use i n the proof t h a t Q
u(h
. - ( l - α ) ε ) = sup u(h , - (l-α)ε), n,n' n,n c ^ := |μ ( C ^ 1 ) - v , ( C , χ 2 ) | n n
where h° t ( x ) n jn 1 9
1
for
9
x = (x ,x ) e X xX . Next, l e t x = ( x \ χ 2 ) x F
2
χ
= (χn+1> :=
X
6 X 1 x X2 (with x 1 = ( x ^ .
#
X
X
9
^ l' "'' n' n+l '
Λ
Π C
±
# #
X
f
= F
x
Π C
Δ
C ,C 6 C , F X X
implies h
l Λ
C
X
2
n9n
C
C
1» 2
G
C
O n e
h a S
t h a t
,(T.χ) for all T.. l
I
n C , and such that at the same time for any
X
CEC there exists a C
l
a n y
a subclass of C such that for any two
Φ F
J.
f o r
,(T.x) = h
n5n
n C
t h e n
' n+n ^*
Hence, denoting with C Δ. A
and
» x n + i ) ) ^ e a r b i t r a r y but fixed and put
C
F
.x^
Z
EC
with F X
ΠC
X
sup u(h° ,(T.x) - ( l - α ) ε ) = n,n' l CEg
= F X
n C, we obtain for all T.
X
2. C
C
sup u ( h - (l-α)ε) t(T.x) n,n τ l E ^
C
Σ u(h (T.x) - (l-α)ε). t n n 1 CGC ' x For later use in the proof we note here also that jC I = Δ (F ) x x
(cf. DEFINITION 1) 2
1
2
for every x = ( x ^ x ) E X x X . It follows that (n+n')! Σ u(h ,(T.x) - (l-α)ε) . n,n ! 1
/ . t (n+nM1 )!
rrγ
Σ
sup u(h
,(T.x) - (l-α)ε)
(n+n')!
1
ΓΓ
.Σ.
u(h
n,n'(Tix)
Note that for each fixed $x$ and $C$

$\frac{1}{(n+n')!} \sum_i u(h^C_{n,n'}(T_i x) - (1-\alpha)\varepsilon)$

is the fraction of all $(n+n')!$ permutations $T_i x$ of $x$ for which

$|\mu_n(C,(T_i x)^1) - \nu_{n'}(C,(T_i x)^2)| > (1-\alpha)\varepsilon$.

Now, for $x$ and $C$ being arbitrary but fixed, put

$\eta_j := 1$ if $x_j \in C$, and $\eta_j := 0$ if $x_j \in \complement C$, $j=1,\dots,n+n'$,

and denote by $(\eta^i_1,\dots,\eta^i_{n+n'})$ the vector $T_i\eta$ for $\eta = (\eta_1,\dots,\eta_{n+n'})$. Consider then the p-space $(\Omega_0,\mathcal{A}_0,P_0)$ with $\Omega_0$ being the set of all $(n+n')!$ permutations $T_i$, $\mathcal{A}_0 := \mathcal{P}(\Omega_0)$, and

$P_0(A) := \frac{|A|}{(n+n')!}$, $A \in \mathcal{A}_0$.

Then, using the random variables $\zeta_j: \Omega_0 \to \{0,1\}$, defined by $\zeta_j(T_i) := \eta^i_j$, $T_i \in \Omega_0$, $j=1,\dots,n+n'$, we obtain
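$P_0\Big(\Big|\frac{1}{n}\sum_{j=1}^{n} \zeta_j - \frac{1}{n'}\sum_{j=n+1}^{n+n'} \zeta_j\Big| > (1-\alpha)\varepsilon\Big)$

$= P_0\Big(\Big|\frac{1}{n}\sum_{j=1}^{n} \zeta_j - \frac{1}{n'}\Big[(n+n')\mu_{n+n'}(C,x) - \sum_{j=1}^{n} \zeta_j\Big]\Big| > (1-\alpha)\varepsilon\Big)$

$= P_0\Big(\Big|\frac{1}{n}\sum_{j=1}^{n} \zeta_j - \mu_{n+n'}(C,x)\Big| > \frac{n'}{n+n'}(1-\alpha)\varepsilon\Big)$

$\le 2\exp\Big[-2n\Big(\frac{n'}{n+n'}\Big)^2(1-\alpha)^2\varepsilon^2\Big]$,

using Hoeffding's inequality for sampling without replacement from $n+n'$ binary-valued random variables with sum $(n+n')\mu_{n+n'}(C,x)$; cf. W. Hoeffding (1963) and R.J. Serfling (1974).

Summarizing we thus obtain for every $x = (x^1,x^2) \in X^1 \times X^2$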
(n+n!)ί Σ
Σ
ί ^ . ' W
u(h
.(T.x) - (l-α)ε)]
\C | (2 exp [ - 2 n ( ^ τ ) 2 ( l - α ) 2 ε 2 ] ) = ΔC(F ) x n+n x Q ! ^ m (n+n ) (
), and therefore 1
J I
> (l-α)ε) = X
(n+n )! Σ [J^J, Σ u(h^ ^ h ^ ^, (T.x)-(l-α)ε) (T.x [γ—^TTΓ 1 P(dx)
o
xX
l—1 f
Λ
( Σ
[7—VvΓ
(n+n )Σ Σ
u
p
^h
,(T.x) - (l-α)ε)l) P(dx)
mC(n+n') 2.exp [-2 which concludes the proof of (b). D
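The permutation step in the proof of (b) can be verified exactly for small samples. The following sketch assumes an arbitrary binary vector $\eta$ (8 ones and 8 zeros, chosen only for illustration) and writes `t` for the threshold $(1-\alpha)\varepsilon$; the function names are hypothetical. Instead of listing all $(n+n')!$ permutations it enumerates the $n$-element index sets that land in the first block, which gives the same fraction since each such set corresponds to exactly $n!\,n'!$ permutations.

```python
import math
from itertools import combinations

def permutation_tail(eta, n, t):
    """Fraction of permutations of eta with |mean(first n) - mean(remaining n')| > t."""
    total_sum = sum(eta)
    n_prime = len(eta) - n
    hits = total = 0
    for first_block in combinations(range(len(eta)), n):
        s = sum(eta[j] for j in first_block)
        hits += abs(s / n - (total_sum - s) / n_prime) > t
        total += 1
    return hits / total

def hoeffding_bound(n, n_prime, t):
    """2 exp[-2 n (n'/(n+n'))^2 t^2], the bound used in the proof with t = (1-alpha) eps."""
    return 2.0 * math.exp(-2.0 * n * (n_prime / (n + n_prime)) ** 2 * t ** 2)

eta = [1] * 8 + [0] * 8  # assumed indicator values 1_C(x_j), j = 1, ..., n + n'
for t in (0.5, 0.75):
    print(t, permutation_tail(eta, 8, t), hoeffding_bound(8, 8, t))
```

The exact fractions are much smaller than the bound here, which is expected: Hoeffding's inequality holds uniformly over all binary vectors and all sample sizes.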
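NOTE: We have in fact shown, assuming measurability of $\Delta^{\mathcal{C}}(\{\xi_1,\dots,\xi_n,\xi_{n+1},\dots,\xi_{n+n'}\})$, that

$P(D_{n,n'}(\mathcal{C}) > (1-\alpha)\varepsilon) \le 2\exp\Big[-2n\Big(\frac{n'}{n+n'}\Big)^2(1-\alpha)^2\varepsilon^2\Big]\, E(\Delta^{\mathcal{C}}(\{\xi_1,\dots,\xi_{n+n'}\}))$;

in many cases this bound is considerably smaller than the r.h.s. of (b).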
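COROLLARY. (i) Vapnik-Chervonenkis (1971). Taking $\alpha = \frac{1}{2}$ and $n' = n$, one gets

$P(D_n(\mathcal{C},\mu) > \varepsilon) \le 4\, m^{\mathcal{C}}(2n) \exp\big(-\frac{\varepsilon^2 n}{8}\big)$ for all $n \ge 2/\varepsilon^2$.

(ii) Devroye (1981). Taking $\alpha = \frac{1}{n\varepsilon}$ and $n' = n^2 - n$, one gets

$P(D_n(\mathcal{C},\mu) > \varepsilon) \le 4 \exp(4\varepsilon + 4\varepsilon^2)\, m^{\mathcal{C}}(n^2) \exp(-2n\varepsilon^2)$ for all $n > \max(\frac{1}{\varepsilon}, 2)$.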
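Proof. (i): It follows from (a) and (b) in Lemma 10 that in the present case

$P(D_n(\mathcal{C},\mu) > \varepsilon) \le \Big(1 - \frac{1}{\varepsilon^2 n}\Big)^{-1} m^{\mathcal{C}}(2n)\, 2\exp\Big[-2n \cdot \frac{1}{4} \cdot \frac{1}{4}\,\varepsilon^2\Big] \le 4\, m^{\mathcal{C}}(2n) \exp\big(-\frac{\varepsilon^2 n}{8}\big)$,

the last inequality holding since $n\varepsilon^2 \ge 2$.

(ii): Again (a) and (b) in Lemma 10 yield in the present case

$P(D_n(\mathcal{C},\mu) > \varepsilon) \le \Big(1 - \frac{n^2}{4(n^2-n)}\Big)^{-1} m^{\mathcal{C}}(n^2)\, 2\exp\Big[-2n\Big(\frac{n^2-n}{n^2}\Big)^2(\varepsilon^2 - 2\alpha\varepsilon^2 + \alpha^2\varepsilon^2)\Big]$

$\le 2\, m^{\mathcal{C}}(n^2) \times 2\exp[-2n(\varepsilon^2 - 2\alpha\varepsilon^2 + \alpha^2\varepsilon^2) + 4(\varepsilon^2 - 2\alpha\varepsilon^2 + \alpha^2\varepsilon^2)]$

$\le 4\, m^{\mathcal{C}}(n^2) \exp[-2n\varepsilon^2 + 4\alpha n\varepsilon^2 - 2n\alpha^2\varepsilon^2 + 4\varepsilon^2 + 4\alpha^2\varepsilon^2]$

$\le 4\, m^{\mathcal{C}}(n^2) \exp[-2n\varepsilon^2 + 4\varepsilon + 4\varepsilon^2]$

$= 4\exp(4\varepsilon + 4\varepsilon^2)\, m^{\mathcal{C}}(n^2) \exp(-2n\varepsilon^2)$. □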
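To get a feeling for the two bounds of the corollary, one can combine them with Lemma 9 (i), $m^{\mathcal{C}}(r) \le r^s$ for $r \ge 2$, and compare them on a logarithmic scale. This is only a numerical sketch; the values of $n$, $\varepsilon$ and $s$ below are arbitrary illustrations and the function names are hypothetical.

```python
import math

def log_vc_bound(n, eps, s):
    """log of 4 (2n)^s exp(-eps^2 n / 8); the corollary requires n >= 2 / eps^2."""
    return math.log(4) + s * math.log(2 * n) - eps ** 2 * n / 8

def log_devroye_bound(n, eps, s):
    """log of 4 exp(4 eps + 4 eps^2) (n^2)^s exp(-2 n eps^2); requires n > max(1/eps, 2)."""
    return math.log(4) + 4 * eps + 4 * eps ** 2 + 2 * s * math.log(n) - 2 * n * eps ** 2

# Assumed illustrative values: eps = 0.1 and a class with V(C) = s = 2.
eps, s = 0.1, 2
for n in (1_000, 10_000, 100_000):
    print(n, round(log_vc_bound(n, eps, s), 2), round(log_devroye_bound(n, eps, s), 2))
```

The exponent $-2n\varepsilon^2$ in (ii) is sixteen times larger in absolute value than $-\varepsilon^2 n/8$ in (i), at the price of evaluating $m^{\mathcal{C}}$ at $n^2$ instead of $2n$; for polynomially growing $m^{\mathcal{C}}$ the exponential factor dominates once $n$ is large.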
Based on Lemma 9 (i) and on part (i) of the corollary to Lemma 10 we now obtain the main result of Vapnik and Chervonenkis concerning almost sure convergence of empirical C-discrepancies in arbitrary sample spaces.
THEOREM 2. Let $(X,\mathcal{B})$ be an arbitrary measurable space and let $(\xi_i)_{i \in \mathbb{N}}$ be a sequence of independent and identically distributed random elements in $(X,\mathcal{B})$, defined on some common p-space $(\Omega,\mathcal{F},P)$, with distribution $\mu = L\{\xi_i\}$ on $\mathcal{B}$. For $n \in \mathbb{N}$ let $\mu_n$ and $\nu_n$ be the empirical measures based on $\xi_1,\dots,\xi_n$ and $\xi_{n+1},\dots,\xi_{2n}$, respectively. Let $\mathcal{C} \subset \mathcal{B}$ be a VCC such that both $D_n(\mathcal{C},\mu)$ as well as $D_{n,n}(\mathcal{C})$ are measurable w.r.t. the canonical model; then

$\lim_{n \to \infty} D_n(\mathcal{C},\mu) = 0$ $P$-a.s.
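Proof. Of course, it suffices to show that

(*) $\limsup_{n \to \infty} D_n(\mathcal{C},\mu) \le \varepsilon$ $P$-a.s. for every $\varepsilon > 0$;

according to the Borel-Cantelli Lemma, (*) is implied by

(**) $\sum_{n \in \mathbb{N}} P(D_n(\mathcal{C},\mu) > \varepsilon) < \infty$ for every $\varepsilon > 0$,

whence the proof will be concluded by showing that (**) holds true.

ad (**): Given any $\varepsilon > 0$, we obtain from part (i) of the corollary to Lemma 10 that for all $n \ge 2/\varepsilon^2$

$P(D_n(\mathcal{C},\mu) > \varepsilon) \le 4\, m^{\mathcal{C}}(2n) \exp\big(-\frac{\varepsilon^2 n}{8}\big)$.

Since, by assumption, $\mathcal{C}$ is a VCC, we have $V(\mathcal{C}) =: s < \infty$, whence by Lemma 9 (i), with $n_0$ denoting the smallest integer $\ge 2/\varepsilon^2$,

$\sum_{n \in \mathbb{N}} P(D_n(\mathcal{C},\mu) > \varepsilon) \le \sum_{n < n_0} P(D_n(\mathcal{C},\mu) > \varepsilon) + \sum_{n \ge n_0} 4\,(2n)^s \exp\big(-\frac{\varepsilon^2 n}{8}\big) < \infty$.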
g n Ξ 0,
(3) restp g^ = 1.
Therefore, for every nEΈ
we obtain
lim sup μ (F) = lim sup J l p d μ oi r o t α α = (h'),(D
^
(g^D
Jg dμ = Π
ί(SCΠFεn)
/
/
^ lim sup Jg dμ n o (3),(1) α
- , \ / - \
J g dμ = f g dμ + f SC n SCΠCFεn n S°ΠF ε n o o c
g dμ n
= J (2) S ° n F £ n c
t
g dμ n
= μ(F£n) < Ϊ(F) + - ,
°
whence (for n-*0^) we obtain
lim sup μ (F) ^ μ(F), which proves (b). α
(bτ ) — ^ ( b ) : Given an arbitrary FEF(S), we have as before that for every nEϋN there exists an F G n EG(S) s.t. μ ( F ε n ) ^ μ(F) + ~. Let g
be defined as before and put F
:= {xES: g (x^tyjll}
then
F EF(S)ΠB, (S), F 3F for all nE]N , and n D n (+)
F Π S C C F ε n Π S C for all nEIN.
no
o
(As to (t), let xθF Πs°; then, if S°ΠδF ε n Φ 0, we have by construction of g n , d(x,S^ΠCF ε n) ^ -^ > 0, whence xφS^ΠCF ε n , and therefore xGF £ r i ns^; if S^ΠCF £ n = 0
εn
(and therefore g = 1 ) , it follows that F Π S ° = S° and therefore n c o c
c
εn
c
F ns cs = F πs .) n o o We thus obtain
o
lim sup μ (F) ^ lim sup μ (F ) ^ μ(F ) α α n ! n α α (b )
μ(F) + p whence (for n-*») lim sup μ (F) ύ μ(F), which proves (b). α
(a) and ( b ) — ^ ( f ) : Given an Ae8(S) with μ(3A) = 0, we have μ(A°)
ύ lim inf (μ α ) # (A°) ^lim inf (y α > # (A) ^ lim inf μ*(A) (a) α α α
^ lim sup μ (A) ύ lim sup μ (A ) ^ μ(A ) = μ(A ) , whence α α α (b)
lim μ*(A) = ϊί(A), α On the other hand, one obtains in the same way that μ(A ) ^ lim inf (μ ) (A) ύ lim sup (μ ) (A) ύ lim sup μ (A ) α α α ^ μ(A C ) = μ(A°), whence also lim (μ )<Jt(A) = μ(A), which proves (f). α This concludes the proof of the Portmanteau theorem, •
IDENTIFICATION OF LIMITS: Let $\mathcal{C}$ be the set of all closed balls in $S = (S,d)$ and let $\mathcal{C}^{\cap f}$ denote the class of all subsets of $S$ which are finite intersections of sets in $\mathcal{C}$. Then, since $\mathcal{C}^{\cap f}$ is a $\cap$-closed generator of $\mathcal{B}_b(S)$, we have for any two $\mu_i \in \mathcal{M}_b^1(S)$, $i=1,2$, that $\mu_1 = \mu_2$ if $\mu_1(A) = \mu_2(A)$ for all $A \in \mathcal{C}^{\cap f}$ (cf. Gaenssler-Stute (1977), Satz 1.4.10).

We will show below that for any net $(\mu_\alpha)$ in $\mathcal{M}_a^1(S)$ and any $\mu_i \in \mathcal{M}_b^1(S)$, $\mu_\alpha \to_b \mu_i$, $i=1,2$, implies $\mu_1 = \mu_2$.

For this we need the following auxiliary result:
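(29) For any $A \subset S$, any $\varepsilon > 0$, and any separable $S_0 \subset S$, there exists an $f_\varepsilon \in U_b^b(S)$ such that $0 \le f_\varepsilon \le 1$, $\mathrm{rest}_{\complement (A \cap S_0^c)^\varepsilon} f_\varepsilon \equiv 0$ and $\mathrm{rest}_{A \cap S_0^c} f_\varepsilon \equiv 1$.

Proof. It follows from Lemma 11 (ii) that

$f_\varepsilon(x) := \max\big[\big(1 - \frac{d(x, A \cap S_0^c)}{\varepsilon}\big), 0\big]$, $x \in S$,

has the stated properties. □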
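The function used in the proof of (29) is a truncated distance function: it equals one on the given set, vanishes at distance at least $\varepsilon$ from it, and is Lipschitz with constant $1/\varepsilon$, hence uniformly continuous. A minimal sketch on the real line follows, where a finite set of points stands in for the set $A \cap S_0^c$ of the text (an assumption made only so that the distance is computable); the function names are hypothetical.

```python
def dist_to_set(x, points):
    """d(x, M) for a finite nonempty set M of reals, with d the usual distance on R."""
    return min(abs(x - a) for a in points)

def f_eps(x, points, eps):
    """f_eps(x) = max(1 - d(x, M) / eps, 0): 1 on M, 0 at distance >= eps from M."""
    return max(1.0 - dist_to_set(x, points) / eps, 0.0)

M = [0.0, 1.0]  # assumed stand-in for the set in (29)
for x in (0.0, 0.25, 0.5, 1.25, 1.5, 2.0):
    print(x, f_eps(x, M, 0.5))
```

Taking $\varepsilon = 1/n$ gives a decreasing sequence of such functions whose pointwise limit is the indicator of the closure of the set, which is how (29) is used in the subsequent identification of limits.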
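LEMMA 13. Let $\mu_i \in \mathcal{M}_b^1(S)$ be separable, $i=1,2$, and suppose that

(+) $\int f\,d\mu_1 = \int f\,d\mu_2$ for all $f \in U_b^b(S)$;

then $\mu_1 = \mu_2$.

Proof. Let $S_i$ be the separable subsets of $S$ for which $\mu_i(S_i^c) = \mu_i(S)$, $i=1,2$; put $S_0 := S_1 \cup S_2$ to get a separable subset of $S$ for which $\mu_i(S_0^c) = \mu_i(S)$, $i=1,2$. Now, given an arbitrary $A \in \mathcal{C}^{\cap f}$ and $n \in \mathbb{N}$, choose $f_n = f_{1/n}$ according to (29) to get a sequence $(f_n) \subset U_b^b(S)$ for which $\lim_{n \to \infty} f_n = 1_{A \cap S_0^c}$; from this, by Lebesgue's theorem and (+) it follows that

$\mu_1(A) = \mu_1(A \cap S_0^c) = \lim_{n \to \infty} \int f_n\,d\mu_1 = \lim_{n \to \infty} \int f_n\,d\mu_2 = \mu_2(A \cap S_0^c) = \mu_2(A)$. □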
Lemma 13, together with the equivalence of (g') and (h') in (28), implies the result announced above (cf. Definition 4 (i)):

Lemma 14. For any net $(\mu_\alpha)$ in $\mathcal{M}_a^1(S)$ with $\mu_\alpha \to_b \mu_i$, $i=1,2$, we have $\mu_1 = \mu_2$.
WEAK CONVERGENCE AND MAPPINGS (Continuous Mapping Theorems): Let $S = (S,d)$ and $S' = (S',d')$ be two metric spaces and suppose again that $\mathcal{A}$ is a σ-algebra of subsets of $S$ such that $\mathcal{B}_b(S) \subset \mathcal{A} \subset \mathcal{B}(S)$; let $g: S \to S'$ be $\mathcal{A},\mathcal{B}_b(S')$-measurable and let $\mu_\alpha \in \mathcal{M}_a^1(S)$ and $\mu \in \mathcal{M}_b^1(S)$, respectively, $\mu$ separable. Then $\mu_\alpha$ and $\tilde\mu$ induce measures $\nu_\alpha$ and $\nu$ on $\mathcal{B}_b(S')$, defined by

$\nu_\alpha(B') := \mu_\alpha(g^{-1}(B'))$ and $\nu(B') := \tilde\mu(g^{-1}(B'))$ for $B' \in \mathcal{B}_b(S')$,

where $g^{-1}(B') = \{x \in S: g(x) \in B'\}$ and where $\tilde\mu$ is again the unique Borel extension of $\mu$ (cf. (24)).

We are interested in conditions on $g$ under which $\mu_\alpha \to_b \mu$ implies $\nu_\alpha \equiv \mu_\alpha \circ g^{-1} \to_b \nu \equiv \tilde\mu \circ g^{-1}$. It can be shown by examples that measurability of $g$ alone is not sufficient for preserving weak convergence. As we will see, some continuity assumptions on $g$ will be needed. The corresponding theorems are then usually called CONTINUOUS MAPPING THEOREMS.
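A standard example of the kind alluded to (not taken from the text) uses point masses on the real line: $\delta_{1/n}$ converges weakly to $\delta_0$, but for the measurable map $g = 1_{(0,\infty)}$, which is discontinuous at $0$, the image measures are $\delta_1$ for every $n$ while the image of the limit is $\delta_0$. The sketch below just evaluates the integrals of one bounded continuous test function; the names `g` and `f` are hypothetical choices.

```python
def g(x):
    """Measurable map that is discontinuous at 0: the indicator of (0, inf)."""
    return 1.0 if x > 0 else 0.0

def f(y):
    """A bounded continuous test function on the image space R: y clipped to [0, 1]."""
    return min(max(y, 0.0), 1.0)

# For a point mass delta_x the integral of f against its image under g is f(g(x)).
image_integrals = [f(g(1.0 / n)) for n in range(1, 6)]  # along delta_{1/n}
limit_integral = f(g(0.0))                               # for the limit delta_0
print(image_integrals, limit_integral)
```

Here the limit measure $\delta_0$ puts all its mass on the single discontinuity point of $g$, which is exactly the situation that a condition on the discontinuity set of $g$ has to exclude.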
THEOREM 3. Let $S = (S,d)$ and $S' = (S',d')$ be metric spaces, let $\mathcal{A}$ be a σ-algebra of subsets of $S$ such that $\mathcal{B}_b(S) \subset \mathcal{A} \subset \mathcal{B}(S)$, and let $g: S \to S'$ be $\mathcal{A},\mathcal{B}_b(S')$-measurable and continuous. Let $(\mu_\alpha)$ be a net in $\mathcal{M}_a^1(S)$ and let $\mu \in \mathcal{M}_b^1(S)$ be separable such that $\mu_\alpha \to_b \mu$. Then $\nu_\alpha \equiv \mu_\alpha \circ g^{-1} \to_b \nu \equiv \tilde\mu \circ g^{-1}$.

Theorem 3 is a special case of the following result where the continuity assumption on $g$ is weakened:
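THEOREM 4. Let $S = (S,d)$ and $S' = (S',d')$ be metric spaces, let $\mathcal{A}$ be a σ-algebra of subsets of $S$ such that $\mathcal{B}_b(S) \subset \mathcal{A} \subset \mathcal{B}(S)$, and let $(\mu_\alpha)$ be a net in $\mathcal{M}_a^1(S)$ and $\mu \in \mathcal{M}_b^1(S)$ be separable such that $\mu_\alpha \to_b \mu$; let $g: S \to S'$ be $\mathcal{A},\mathcal{B}_b(S')$-measurable such that $\tilde\mu(D(g)) = 0$. Then $\nu_\alpha \equiv \mu_\alpha \circ g^{-1} \to_b \nu \equiv \tilde\mu \circ g^{-1}$.

(Note that $D(g) \in \mathcal{B}(S)$; cf. P. Billingsley (1968), p. 225-226.)

Proof. Note that $\nu_\alpha \in \mathcal{M}_b^1(S')$ and $\nu \in \mathcal{M}_b^1(S')$, whence $\nu_\alpha \to_b \nu$ iff (i) $\nu$ is separable and (ii) $\lim_\alpha \int_{S'} f\,d\nu_\alpha = \int_{S'} f\,d\nu$ for all $f \in C_b^b(S')$, where (ii) is equivalent to any of the conditions (a)-(h') in (28) (with $S$ replaced by $S'$ and $\mathcal{A}$ replaced by $\mathcal{A}' = \mathcal{B}_b(S')$).

1.) $\nu$ is separable: since $\mu$ is separable, there exists a separable $S_0 \subset S$ such that $\mu(S_0^c) = \mu(S)$.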
Let T C S Q be countable and dense in S we will show that S
1
:= g(S^\D(g))UT
55
(as well as in S ) and let T
1
is a separable subset of S
τ
f
:= g(T ) ;
with
v((sυ°) = v(S'). For this we will show that T
τ
!
(being countable) is dense in S :
o o in fact, l e t (w.l.o.g.) yGgtS^Dίg)), i . e . , y = g(x) for some xESC\D(g)f Since T is dense in S there exists a sequence (x ) ^τ. τ c T such that x -> x o o ^ n nEU o n and therefore, since x f D(g), we have g(x ) -> g(x) = y, where g(x ) GT1 . X
C
Next, since g ( ( S M ) D g
(g(S^\D(g))) D S ^ D ( g ) and since £(D(g)) = 0,
we have v((S') C ) = μ(g~
SM))
ϊ(S^D(g))
S(S)
= £(S°) = μ(S^) = μ(S) = Ϊ ( S ) = ϊ ί g ' ^ S 1 ) )
2.) It remains to show (ii)
= v(Sf).
= J fdv for all f6C?(S f ).
lim J fdv α
α S
μ(
b
S
f
For this, given any fEC, ( S ) , we have that f g: S -> ]R is a bounded function belonging totF (S) which is μ-a.e. continuous, and therefore it follows from a (28) (cf. (e 1 )) that lim / fdv α α Sf
= lim J (f g)dμ α α S
= J (f g)dμ = J S S!
fdv, which proves (ii). D
The following lemma is in some sense an inverse result: LEMMA 15. Let S = (S,d) be a metric space, (μ ) be a net in M (S) and let
ot μEK (S) be separable such that μ βf" b a
- ^ μ f" b
a for all fEC (S). Then μ -Γ μ. a α D
Proof. Note that in the present case S 1 = P. (a separable metric space), whence v
Ξ μ
f
—1
~ —1 and v Ξ μ f are separable Borel measures on
C = B(IR). Now,
for any fEC^(S) and any gEC^(]R) = Cb(]R) we have
lim J (g f)dμ
= lim J gdv α
αS
α
]
R
α
]
= J gdv = J (g f)dμ. R
S
Furthermore, for any fEC (S) there exists a c>0 such that |f| ^ c, whence for a r
- c , if t0, there k
exists a k E IN such that o μ( U
T°) ύ μ(T°) + ε for all k ^ k ,
kernk
k
°
and therefore, by (b), we obtain v(G) ^ μ(T°) + ε for all k U K
, o
But μ -7* μ implies (cf. (28)) that for every kEϋN n b Ϊ(T°) ύ lim inf (U n ) # (T°),
and therefore, noticing that T° C g (G) for sufficiently large n, we obtain K n
μ(T°) έ lim inf μ ( g ^ C O ) = lira inf v (G), k
n
n
n
whence v(G) ύ lim inf v (G) + ε for every ε>0, which implies (++). D n*»
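WEAK CONVERGENCE CRITERIA AND COMPACTNESS: As before, let $S = (S,d)$ be a (possibly non-separable) metric space, let $\mathcal{M}_a^1(S)$ be the space of all p-measures on a σ-algebra $\mathcal{A}$ with $\mathcal{B}_b(S) \subset \mathcal{A} \subset \mathcal{B}(S)$ and let $\mathcal{M}_b^1(S)$ be the space of all p-measures on $\mathcal{B}_b(S)$.

DEFINITION 5. Let $(\mu_\alpha)_{\alpha \in A}$ be a net in $\mathcal{M}_a^1(S)$; then $(\mu_\alpha)_{\alpha \in A}$ is called δ-tight iff

(30) $\sup_{K \in \mathcal{K}(S)} \inf_{\delta > 0} \liminf_{\alpha \in A} \mu_\alpha(K^\delta) = 1$.

(Note that $K^\delta \in \mathcal{B}_b(S) \subset \mathcal{A}$ according to Lemma 11 (ii).)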
The following two results were proved by M.J. Wichura (1968), Th. 1.3 and Th. 1.4; in view of (27) b) they can be restated as follows (where in Theorem 7 the assumption of $(S,d)$ being topologically complete cannot be dispensed with, in general).

THEOREM 6 (Wichura). Let $(\mu_\alpha)_{\alpha \in A} \subset \mathcal{M}_a^1(S)$ be δ-tight. Then there exists a subnet $(\mu_{\alpha'})_{\alpha' \in A'}$ of $(\mu_\alpha)_{\alpha \in A}$ and a separable $\mu \in \mathcal{M}_b^1(S)$ such that $\mu_{\alpha'} \to_b \mu$.

THEOREM 7 (Wichura). Let $S = (S,d)$ be a topologically complete metric space and $(\mu_\alpha)$ be a net in $\mathcal{M}_a^1(S)$; then there exists a separable $\mu \in \mathcal{M}_b^1(S)$ with $\mu_\alpha \to_b \mu$ iff

(a) $\liminf_\alpha \int f\,d\mu_\alpha = \limsup_\alpha \int f\,d\mu_\alpha$ for all $f \in U_b^b(S)$

and

(b) $(\mu_\alpha)$ is δ-tight. □

We are going to prove here instead the following versions of Theorem 6 and 7 (cf. Remark (31) below):
We are going to prove here instead the following versions of Theorem 6 and 7 (cf. Remark (31) below):
THEOREM 6*. Let $(\mu_\alpha)_{\alpha \in A}$ be a net in $\mathcal{M}_a^1(S)$ fulfilling the following two conditions:

($b_1$) For every $(f_n)_{n \in \mathbb{N}} \subset U_b^b(S)$ with $f_n \downarrow 0$ one has $\limsup_\alpha \int f_n\,d\mu_\alpha \to 0$ as $n \to \infty$.

($b_2$) There exists a separable $S_0 \subset S$ such that $\liminf_\alpha \int f\,d\mu_\alpha \ge 1$ for all $f \in U_b^b(S)$ with $f \ge 1_{S_0^c}$.

Then there exists a subnet $(\mu_{\alpha'})_{\alpha' \in A'}$ of $(\mu_\alpha)_{\alpha \in A}$ and a separable $\mu \in \mathcal{M}_b^1(S)$ with $\mu(S_0^c) = 1$ such that $\mu_{\alpha'} \to_b \mu$.

THEOREM 7*. Let $S = (S,d)$ be an arbitrary metric space and $(\mu_\alpha)$ be a net in $\mathcal{M}_a^1(S)$; then there exists a separable $\mu \in \mathcal{M}_b^1(S)$ with $\mu_\alpha \to_b \mu$ iff the following conditions are fulfilled: (a) as in Theorem 7 and ($b_i$), $i=1,2$, as in Theorem 6*, where in this connection the separable $S_0$ with $\mu(S_0^c) = 1$ and the separable $S_0$ occurring in ($b_2$) coincide.
Proof of Theorem 6*. Let $\mu_\alpha(f) := \int f\,d\mu_\alpha$ for $f \in U_b^b(S)$ and consider the net
\[ \alpha \mapsto (\mu_\alpha(f))_{f\in U_b^b(S)} \in \prod_{f\in U_b^b(S)} [-\|f\|,\|f\|], \quad\text{where } \|f\| := \sup_{x\in S}|f(x)|. \]
Since the product space $\prod_{f\in U_b^b(S)} [-\|f\|,\|f\|]$ is compact in the product topology (Tychonov's theorem), there exists a convergent subnet, say $\alpha' \mapsto (\mu_{\alpha'}(f))_{f\in U_b^b(S)}$, $\alpha' \in A'$. Therefore $\lim_{\alpha'} \mu_{\alpha'}(f)$ exists for each $f \in U_b^b(S)$. Let
\[ \mu(f) := \lim_{\alpha'} \mu_{\alpha'}(f) \quad\text{for } f \in U_b^b(S); \]
then $\mu: U_b^b(S) \to \mathbb{R}$ is positive, linear, and normed.

We are going to show that $\mu$ is also $\sigma$-smooth on $U_b^b(S)$: for this, let $(f_n)_{n\in\mathbb{N}} \subset U_b^b(S)$ with $f_n \downarrow 0$; then it follows by (b$_1$) that
\[ \mu(f_n) = \lim_{\alpha'} \mu_{\alpha'}(f_n) = \limsup_{\alpha'} \int f_n\,d\mu_{\alpha'} \le \limsup_{\alpha} \int f_n\,d\mu_\alpha \to 0 \quad\text{as } n\to\infty. \]
Therefore, according to the Daniell-Stone representation theorem, there exists one and only one $\mu \in M_b^1(S)$ such that $\mu(f) = \int f\,d\mu$ for all $f \in U_b^b(S)$. Hence, in view of (28) (cf. the equivalence of (g') and (h')) it follows that $\mu_{\alpha'} \xrightarrow[b]{} \mu$, if we finally show that $\mu(S_0^c) = 1$ (i.e. $\mu$ separable). For this we use (b$_2$) according to which
\[ (+)\qquad \liminf_\alpha \int f\,d\mu_\alpha \ge 1 \quad\text{for all } f \in U_b^b(S) \text{ with } f \ge 1_{S_0^c}; \]
taking
\[ f_n(x) := \max[\,1 - n\,d(x,S_0^c),\,0\,], \quad x \in S, \]
we obtain a sequence $(f_n)_{n\in\mathbb{N}} \subset U_b^b(S)$ with $0 \le f_n \le 1$ and $1_{S_0^c} \le f_n \le 1_{(S_0^c)^{1/n}}$, whence by the $\sigma$-smoothness of $\mu$ (note that $S_0^c, (S_0^c)^{1/n} \in \mathcal{B}_b(S)$ by Lemma 11 (ii)),
\[ \mu(S_0^c) = \inf_{n\in\mathbb{N}} \mu((S_0^c)^{1/n}) \ge \inf_n \int f_n\,d\mu = \inf_n \lim_{\alpha'} \int f_n\,d\mu_{\alpha'} \ge \inf_n \liminf_\alpha \int f_n\,d\mu_\alpha \ge 1, \]
the last inequality by (+), whence $\mu(S_0^c) = 1$. □
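The cut-off functions $f_n(x) = \max[1 - n\,d(x,S_0^c), 0]$ used at the end of the proof can be made concrete in a toy setting. The sketch below is only an illustration under assumptions not made in the text: the metric space is the real line with the usual distance, and the closed set playing the role of $S_0^c$ is the interval $[0,1]$. The names `dist_to_interval` and `f_n` are hypothetical helpers.

```python
def dist_to_interval(x, lo=0.0, hi=1.0):
    """Distance from the real number x to the closed interval [lo, hi]."""
    if x < lo:
        return lo - x
    if x > hi:
        return x - hi
    return 0.0


def f_n(x, n, lo=0.0, hi=1.0):
    """Cut-off function max(1 - n * d(x, [lo, hi]), 0).

    It equals 1 on the interval, vanishes outside its open 1/n-neighbourhood,
    and is Lipschitz with constant n (hence uniformly continuous).
    """
    return max(1.0 - n * dist_to_interval(x, lo, hi), 0.0)


if __name__ == "__main__":
    for n in (1, 2, 10):
        print(n, [round(f_n(x, n), 3) for x in (-0.5, 0.0, 0.5, 1.0, 1.25, 2.0)])
```

As $n$ grows, the functions decrease pointwise to the indicator of the closed interval, which is the monotone approximation the $\sigma$-smoothness argument relies on.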
Proof of Theorem 7*. Only if-part: Suppose $\mu_\alpha \xrightarrow[b]{} \mu$; then (a) is a consequence of (28) (cf. the equivalent statements (g') and (h')).

ad (b$_1$): Let $(f_n)_{n\in\mathbb{N}} \subset U_b^b(S)$ with $f_n \downarrow 0$; then (cf. again the equivalence of (g') and (h') in (28))
\[ \limsup_\alpha \int f_n\,d\mu_\alpha = \lim_\alpha \int f_n\,d\mu_\alpha = \int f_n\,d\mu \to 0 \quad\text{as } n\to\infty, \]
according to the $\sigma$-smoothness of $\mu$ on $U_b^b(S)$.

ad (b$_2$): Since, by assumption, $\mu$ is separable, there exists a separable $S_0 \subset S$ such that $\mu(S_0^c) = \mu(S)$; therefore, for any $f \in U_b^b(S)$ with $f \ge 1_{S_0^c}$ one has
\[ \liminf_\alpha \int f\,d\mu_\alpha = \lim_\alpha \int f\,d\mu_\alpha = \int f\,d\mu \ge \mu(S_0^c) = \mu(S) = 1. \]
If-part: It suffices to show that there exists a $\mu \in M_b^1(S)$ with $\mu(S_0^c) = 1$ such that for any subnet $(\mu_{\alpha'})$ of $(\mu_\alpha)$ there exists a further subnet $(\mu_{\alpha''})$ such that $\mu_{\alpha''} \xrightarrow[b]{} \mu$. For this, let $(\mu_{\alpha'})_{\alpha'\in A'}$ be an arbitrary subnet of $(\mu_\alpha)_{\alpha\in A}$; then it is easy to show that $(\mu_{\alpha'})_{\alpha'\in A'}$ fulfills (b$_1$) and (b$_2$) and therefore, by Theorem 6*, there exists a subnet $(\mu_{\alpha''})_{\alpha''\in A''}$ of $(\mu_{\alpha'})_{\alpha'\in A'}$ and a $\mu_{A',A''} \in M_b^1(S)$ with $\mu_{A',A''}(S_0^c) = 1$ such that $\mu_{\alpha''} \xrightarrow[b]{} \mu_{A',A''}$.

We are going to show that $\mu_{A',A''}$ in fact does not depend on $A'$ or $A''$, whence for $\mu$ being the common value of all the $\mu_{A',A''}$ we get $\mu_\alpha \xrightarrow[b]{} \mu$, which will conclude the proof. For this, given any $f \in U_b^b(S)$, we have by (a)
\[ \liminf_{\alpha\in A} \int f\,d\mu_\alpha \le \liminf_{\alpha''\in A''} \int f\,d\mu_{\alpha''} = \int f\,d\mu_{A',A''} = \limsup_{\alpha''\in A''} \int f\,d\mu_{\alpha''} \le \limsup_{\alpha\in A} \int f\,d\mu_\alpha = \liminf_{\alpha\in A} \int f\,d\mu_\alpha, \]
whence
\[ \int f\,d\mu_{A',A''} = \int f\,d\mu_{\tilde A',\tilde A''} \]
for all $f \in U_b^b(S)$ and any other subnet $(\mu_{\tilde\alpha''})_{\tilde\alpha''\in\tilde A''}$ of $(\mu_{\tilde\alpha'})_{\tilde\alpha'\in\tilde A'}$ which is a subnet of $(\mu_\alpha)_{\alpha\in A}$, with $\mu_{\tilde\alpha''} \xrightarrow[b]{} \mu_{\tilde A',\tilde A''}$; therefore Lemma 13 implies the assertion. □
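(31) REMARK. Any $\delta$-tight net $(\mu_\alpha) \subset M_a^1(S)$ fulfills (b$_i$), $i=1,2$, but not vice versa (look at $\mu_\alpha \equiv \mu$ with a separable $\mu \in M_b^1(S)$ which is not tight).

Proof. ad (b$_1$): Let $(f_n)_{n\in\mathbb{N}} \subset U_b^b(S)$ with $f_n \downarrow 0$ and assume w.l.o.g. $\sup_n f_n \le 1$; then for every $n \in \mathbb{N}$, every $\delta > 0$, and every $K \in \mathcal{K}(S)$ we have
\[ \limsup_\alpha \int f_n\,d\mu_\alpha = \limsup_\alpha \Big( \int_{K^\delta} f_n\,d\mu_\alpha + \int_{\complement K^\delta} f_n\,d\mu_\alpha \Big) \le \limsup_\alpha \int_{K^\delta} f_n\,d\mu_\alpha + \limsup_\alpha \int_{\complement K^\delta} f_n\,d\mu_\alpha \]
\[ \le \ (\text{since } f_n \le 1)\ \sup_{x\in K^\delta} f_n(x) + \limsup_\alpha \mu_\alpha(\complement K^\delta) \le \sup_{x\in K^\delta} f_n(x) + \sup_{\delta>0} \limsup_\alpha \mu_\alpha(\complement K^\delta). \]
Now, given any $\varepsilon > 0$, there exists by assumption (cf. (30) and look at complements) a $K_\varepsilon \in \mathcal{K}(S)$ such that
\[ \sup_{\delta>0} \limsup_\alpha \mu_\alpha(\complement K_\varepsilon^\delta) \le \varepsilon/3. \]
Therefore, for any $\varepsilon > 0$ there exists a $K_\varepsilon \in \mathcal{K}(S)$ such that for all $n \in \mathbb{N}$ and $\delta > 0$
\[ \limsup_\alpha \int f_n\,d\mu_\alpha \le \sup\{f_n(x): x \in K_\varepsilon^\delta\} + \varepsilon/3. \]
Furthermore, it is easy to show that for any $\varepsilon > 0$ and $n \in \mathbb{N}$ there exists a $\delta(\varepsilon,n)$ such that
\[ \sup\{f_n(x): x \in K_\varepsilon^{\delta(\varepsilon,n)}\} \le \sup\{f_n(x): x \in K_\varepsilon\} + \varepsilon/3. \]
We thus obtain that for any $\varepsilon > 0$ there exists a $K_\varepsilon \in \mathcal{K}(S)$ such that for every $n \in \mathbb{N}$
\[ \limsup_\alpha \int f_n\,d\mu_\alpha \le \sup\{f_n(x): x \in K_\varepsilon\} + \varepsilon/3 + \varepsilon/3. \]
But, since $K_\varepsilon$ is compact, $\sup\{f_n(x): x \in K_\varepsilon\} \to 0$ as $n \to \infty$, whence $\limsup_\alpha \int f_n\,d\mu_\alpha \le \varepsilon$ for sufficiently large $n$, which implies (b$_1$).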
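ad (b$_2$): $\delta$-tightness of $(\mu_\alpha)$ implies that for every $n \in \mathbb{N}$ there exists a $K_n \in \mathcal{K}(S)$ such that
\[ \inf_{\delta>0} \liminf_\alpha \mu_\alpha(K_n^\delta) \ge 1 - \tfrac{1}{n}. \]
Put $S_0 := \bigcup_{n\in\mathbb{N}} K_n$ to obtain a separable $S_0 \subset S$; then, given any $f \in U_b^b(S)$ with $f \ge 1_{S_0^c}$, we must show that
\[ (+)\qquad \liminf_\alpha \int f\,d\mu_\alpha \ge 1. \]
Since $f \ge 1_{K_n}$ for each $n$, it follows (by continuity of $f$) that for every $n \in \mathbb{N}$ and every $\varepsilon > 0$ there exists a $\delta_0 = \delta_0(\varepsilon,n) > 0$ such that $\inf\{f(x): x \in K_n^{\delta_0}\} \ge 1 - \varepsilon$, whence $\int f\,d\mu_\alpha \ge (1-\varepsilon)\,\mu_\alpha(K_n^{\delta_0})$. Therefore, for every $\varepsilon > 0$ and every $n \in \mathbb{N}$ we have
\[ \liminf_\alpha \int f\,d\mu_\alpha \ge (1-\varepsilon) \liminf_\alpha \mu_\alpha(K_n^{\delta_0}) \ge (1-\varepsilon) \inf_{\delta>0} \liminf_\alpha \mu_\alpha(K_n^\delta) \ge (1-\varepsilon)\big(1-\tfrac{1}{n}\big), \]
which implies (b$_2$). □

(32) Remark. The proof of Theorem 7* shows that any net $(\mu_\alpha)_{\alpha\in A} \subset M_a^1(S)$ which fulfills (b$_i$), $i=1,2$, is a compact net in $M_a^1(S)$ (i.e. for any subnet $(\mu_{\alpha'})_{\alpha'\in A'}$ of $(\mu_\alpha)_{\alpha\in A}$ there exists a further subnet $(\mu_{\alpha''})_{\alpha''\in A''}$ of $(\mu_{\alpha'})_{\alpha'\in A'}$ and a separable $\mu \in M_b^1(S)$ such that $\mu_{\alpha''} \xrightarrow[b]{} \mu$).

The following lemma prepares for the next theorem (cf. M.J. Wichura (1968), Theorem 1.2 (a)).

LEMMA 17. Let $(\mu_\alpha)$ be a net in $M_a(S)$ and $\mu \in M_b(S)$ be separable, i.e., $\mu(S_0^c) = \mu(S)$ for some separable $S_0 \subset S$; let $\mathcal{C} \subset \mathcal{A}$ be such that

(33) for each $x \in S_0^c$, $\{C \in \mathcal{C}: x \in C^\circ\}$ is a neighborhood base at $x$,

and let $\mathcal{C}^{\cap f}$ denote the class of all finite intersections of members of $\mathcal{C}$. Suppose that
\[ (+)\qquad \lim_\alpha \mu_\alpha(C) = \tilde\mu(C) \quad\text{for all } C \in \mathcal{C}^{\cap f}. \]
Then
\[ \liminf_\alpha \mu_\alpha(G) \ge \tilde\mu(G) \quad\text{for all } G \in \mathcal{G}(S) \cap \mathcal{A}. \]
(Here again $\tilde\mu$ denotes the unique Borel extension of $\mu$ and $\mathcal{A}$ is a $\sigma$-algebra of subsets of $S$ with $\mathcal{B}_b(S) \subset \mathcal{A} \subset \mathcal{B}(S)$.)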
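Proof. Given any $G \in \mathcal{G}(S) \cap \mathcal{A}$, it follows by (33) that for every $x \in G \cap S_0^c$ there exists a $C_x \in \mathcal{C}$ such that $x \in C_x^\circ \subset C_x \subset G$, whence
\[ G \cap S_0^c \subset \bigcup_{x\in G\cap S_0^c} C_x^\circ, \]
which means that $\{C_x^\circ \cap S_0^c: x \in G \cap S_0^c\}$ is an open covering of $G \cap S_0^c$ in the separable subspace $(S_0^c,d)$ of $(S,d)$. Therefore (cf. Billingsley (1968), p. 216) there exists a countable subcovering of $G \cap S_0^c$, i.e.,
\[ G \cap S_0^c \subset \bigcup_{n\in\mathbb{N}} (C_{x_n}^\circ \cap S_0^c) \quad\text{with } x_n \in G \cap S_0^c,\ n \in \mathbb{N}. \]
Put $C_n := C_{x_n}$, $n \in \mathbb{N}$; then $\bigcup_{n\in\mathbb{N}} C_n \subset \bigcup_{x\in G\cap S_0^c} C_x \subset G$, whence
\[ \tilde\mu(G) \ge \tilde\mu\Big(\bigcup_n C_n\Big) = \tilde\mu\Big(\bigcup_n (C_n \cap S_0^c)\Big) \ge \tilde\mu(G \cap S_0^c) = \tilde\mu(G), \]
i.e., $\tilde\mu(G) = \tilde\mu(\bigcup_n C_n)$.

Put $C_1' := C_1$ and $C_n' := C_n \setminus \bigcup_{i=1}^{n-1} C_i$, $n \ge 2$, to get pairwise disjoint sets $C_n' \in \mathcal{A}$ with $\bigcup_{n\in\mathbb{N}} C_n' = \bigcup_{n\in\mathbb{N}} C_n$, for which one can easily show (using the assumption (+)) that
\[ \lim_\alpha \mu_\alpha(C_n') = \tilde\mu(C_n') \quad\text{for all } n \in \mathbb{N}. \]
Therefore, for every $n \in \mathbb{N}$ we have
\[ (++)\qquad \lim_\alpha \mu_\alpha\Big(\bigcup_{i=1}^n C_i'\Big) = \lim_\alpha \sum_{i=1}^n \mu_\alpha(C_i') = \sum_{i=1}^n \tilde\mu(C_i') = \tilde\mu\Big(\bigcup_{i=1}^n C_i'\Big). \]
Since $\tilde\mu(G) = \tilde\mu(\bigcup_{n\in\mathbb{N}} C_n) = \tilde\mu(\bigcup_{n\in\mathbb{N}} C_n')$, there exists for each $\varepsilon > 0$ an $n = n(\varepsilon) \in \mathbb{N}$ such that
\[ \tilde\mu\Big(\bigcup_{i=1}^n C_i'\Big) \ge \tilde\mu(G) - \varepsilon, \]
and therefore (note also that $G \supset \bigcup_{i=1}^n C_i'$)
\[ \liminf_\alpha \mu_\alpha(G) \ge \liminf_\alpha \mu_\alpha\Big(\bigcup_{i=1}^n C_i'\Big) = \tilde\mu\Big(\bigcup_{i=1}^n C_i'\Big) \ge \tilde\mu(G) - \varepsilon, \]
where the equality holds by (++); this proves the assertion. □

THEOREM 8. Let $(\mu_\alpha)$ be a net in $M_a^1(S)$ and $\mu \in M_b^1(S)$ be separable (i.e., $\mu(S_0^c) = \mu(S) = 1$ for some separable $S_0 \subset S$). Suppose that $\mathcal{C} \subset \{B \in \mathcal{B}_b(S): \tilde\mu(\partial B) = 0\}$ fulfills (33). Then the following two assertions are equivalent:

(i) $\lim_\alpha \mu_\alpha(C) = \mu(C)$ for all $C \in \mathcal{C}^{\cap f}$;

(ii) $\mu_\alpha \xrightarrow[b]{} \mu$.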
Proof. (i) $\Rightarrow$ (ii): Follows immediately from Lemma 17 and (28) (cf. the equivalence of (a') and (g') there); note that $\lim_\alpha \mu_\alpha(S) = \mu(S)$ is trivially fulfilled for p-measures $\mu_\alpha$ and $\mu$.

(ii) $\Rightarrow$ (i): Again (28) (cf. the equivalence of (g') and (f')) yields
\[ \lim_\alpha \mu_\alpha(B) = \mu(B) \quad\text{for all } B \in \{B \in \mathcal{B}_b(S): \tilde\mu(\partial B) = 0\} =: \mathcal{R}_{\tilde\mu} = \mathcal{R}_{\tilde\mu}^{\cap f} \supset \mathcal{C}^{\cap f}. \]
□
We will consider next a Cramér-type result which is useful in applications. For this, let again $S = (S,d)$ be a (possibly non-separable) metric space, $\mathcal{A}$ be a $\sigma$-algebra of subsets of $S$ such that $\mathcal{B}_b(S) \subset \mathcal{A} \subset \mathcal{B}(S)$, and let $(\xi_n)_{n\in\mathbb{N}}$ be a sequence of random elements in $(S,\mathcal{A})$ and $\xi$ be a random element in $(S,\mathcal{B}_b(S))$, being all defined on a common p-space $(\Omega,\mathcal{F},\mathbb{P})$. Then

(34) $\xi_n$ is said to converge in law to $\xi$ (denoted by $\xi_n \xrightarrow{\mathcal{L}_b} \xi$) iff $\mathcal{L}\{\xi_n\} \xrightarrow[b]{} \mathcal{L}\{\xi\}$ (in the sense of our Definition 4).

Now, let $(\eta_n)_{n\in\mathbb{N}}$ be another sequence of random elements in $(S,\mathcal{A})$ defined on the same p-space $(\Omega,\mathcal{F},\mathbb{P})$, and let $d(\xi_n,\eta_n)(\omega) := d(\xi_n(\omega),\eta_n(\omega))$, $\omega \in \Omega$. Note that for non-separable $S$, $d(\xi_n,\eta_n)$ need not be a random variable.

THEOREM 9a. Suppose that in the setting just described
\[ \lim_{n\to\infty} \mathbb{P}^*(d(\xi_n,\eta_n) > \delta) = 0 \quad\text{for every } \delta > 0, \]
where $\mathbb{P}^*$ denotes the outer p-measure pertaining to $\mathbb{P}$. Then $\xi_n \xrightarrow{\mathcal{L}_b} \xi$ iff $\eta_n \xrightarrow{\mathcal{L}_b} \xi$.

Proof. By symmetry, it suffices to show that $\xi_n \xrightarrow{\mathcal{L}_b} \xi$ implies $\eta_n \xrightarrow{\mathcal{L}_b} \xi$. So, assume $\xi_n \xrightarrow{\mathcal{L}_b} \xi$ and let $f \in U_b^b(S)$ be arbitrary but fixed; then according to (28) (cf. (h')) it suffices to show that
\[ (+)\qquad \lim_{n\to\infty} |\mathbb{E}(f(\xi_n)) - \mathbb{E}(f(\eta_n))| = 0. \]
(Note that $f(\xi_n)$ and $f(\eta_n)$, as well as $f(\xi)$, are random variables.)

ad (+): Given an arbitrary $\varepsilon > 0$ there exists (by uniform continuity of $f$) a $\delta = \delta(\varepsilon) > 0$ such that $|f(x) - f(y)| \le \varepsilon$ whenever $d(x,y) \le \delta$; also $\|f\| = \sup_{x\in S}|f(x)| < \infty$. Since $|f(\xi_n) - f(\eta_n)|$ is at most $2\|f\|$ on $\{d(\xi_n,\eta_n) > \delta\}$ and at most $\varepsilon$ elsewhere, we obtain
\[ |\mathbb{E}(f(\xi_n)) - \mathbb{E}(f(\eta_n))| \le \int |f(\xi_n) - f(\eta_n)|\,d\mathbb{P} \le 2\|f\|\,\mathbb{P}^*(d(\xi_n,\eta_n) > \delta) + \varepsilon \to \varepsilon \quad\text{as } n\to\infty, \]
whence $\limsup_{n\to\infty} |\mathbb{E}(f(\xi_n)) - \mathbb{E}(f(\eta_n))| \le \varepsilon$ for every $\varepsilon > 0$, which implies (+). □

The following version of Theorem 9a is useful as well.

THEOREM 9a'. Let $(\xi_n)_{n\in\mathbb{N}}$ and $(\eta_n)_{n\in\mathbb{N}}$ be sequences of random elements in $(S,\mathcal{A})$, defined on a common p-space $(\Omega,\mathcal{F},\mathbb{P})$ such that

(a) $\lim_{n\to\infty} \mathbb{P}^*(d(\xi_n,\eta_n) > \delta) = 0$ for every $\delta > 0$.

Let $S' = (S',d')$ be another metric space and $H: S \to S'$ be $\mathcal{A},\mathcal{B}_b(S')$-measurable, and such that

(b) $d'(H(x),H(y)) \le L\,d(x,y)$ for all $x,y \in S$ and some constant $0 < L < \infty$.

Then, for any random element $\zeta$ in $(S',\mathcal{B}_b(S'))$, $H(\xi_n) \xrightarrow{\mathcal{L}_b} \zeta$ iff $H(\eta_n) \xrightarrow{\mathcal{L}_b} \zeta$.

Proof. $H(\xi_n)$ and $H(\eta_n)$ are random elements in $(S',\mathcal{B}_b(S'))$ for which by (a) and (b)
\[ \mathbb{P}^*(d'(H(\xi_n),H(\eta_n)) > \delta) \le \mathbb{P}^*(d(\xi_n,\eta_n) > \delta/L) \to 0 \]
for every $\delta > 0$, whence the assertion follows from Theorem 9a. □

REMARK. Instead of (b) it suffices to assume only that $H$ is uniformly continuous.

THEOREM 9b. Let $(\xi_n)_{n\in\mathbb{N}}$ be a sequence of random elements in $(S,\mathcal{A})$ and let $\xi$ be a random element in $(S,\mathcal{B}_b(S))$, being all defined on some common p-space $(\Omega,\mathcal{F},\mathbb{P})$. Suppose that $\xi$ is $\mathbb{P}$-a.s. constant; then $\xi_n \xrightarrow{\mathcal{L}_b} \xi$ implies
\[ \lim_{n\to\infty} \mathbb{P}^*(d(\xi_n,\xi) > \delta) = 0 \quad\text{for every } \delta > 0. \]
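Proof. We show first
\[ (+)\qquad \lim_{n\to\infty} \mathbb{E}(|f\circ\xi_n - f\circ\xi|) = 0 \quad\text{for all } f \in U_b^b(S). \]
In fact, for each $f \in U_b^b(S)$ we have (cf. Theorem 3) that $f\circ\xi_n \xrightarrow{\mathcal{L}} f\circ\xi$, $f\circ\xi_n$ and $f\circ\xi$ being real random variables such that $f\circ\xi$ is $\mathbb{P}$-a.s. constant, whence (by classical probability theory) $f\circ\xi_n \xrightarrow{\mathbb{P}} f\circ\xi$ (where $\xrightarrow{\mathbb{P}}$ denotes convergence in probability). Since $f$ is bounded, $\{f\circ\xi_n: n \in \mathbb{N}\}$ is uniformly integrable and therefore $f\circ\xi_n \xrightarrow{L^1} f\circ\xi$, which proves (+).

We are going to show that (+) implies
\[ \lim_{n\to\infty} \mathbb{P}^*(d(\xi_n,\xi) > \delta) = 0 \quad\text{for every } \delta > 0. \]
For this, let $\delta > 0$ be arbitrary; since $\mathcal{L}\{\xi\}(S_0^c) = 1$ for some separable $S_0 \subset S$, there exists a countable and dense subset $\{x_i: i \in \mathbb{N}\}$ of $S_0^c$ and we have
\[ (*)\qquad \mathbb{P}(\xi \in S_0^c) = 1. \]
Then, for each $i \in \mathbb{N}$, there exists an $f_i \in U_b^b(S)$ such that $0 \le f_i \le 1$ and
\[ f_i(x) = \begin{cases} 0 & \text{if } x \in B^\circ(x_i,\delta/4), \\ 1 & \text{if } x \in \complement B^\circ(x_i,\delta/2), \end{cases} \]
where $B^\circ(x_i,r)$ denotes the open ball with center $x_i$ and radius $r$. In fact, take
\[ f_i(x) := 1 - \max\Big[\Big(1 - \frac{d(x, B^\circ(x_i,\delta/4)\cap S_0^c)}{\delta/4}\Big),\,0\Big] \]
to get such a function.

Now, let $A_1 := \{\xi \in B^\circ(x_1,\delta/4)\}$ and for $i \ge 2$ let $A_i := \{\xi \in B^\circ(x_i,\delta/4)\} \setminus \bigcup_{j<i} A_j$; then $A_i \in \mathcal{F}$, the $A_i$'s being pairwise disjoint and such that $\mathbb{P}(\bigcup_{i\in\mathbb{N}} A_i) = 1$ according to (*). Therefore
\[ \mathbb{P}^*(d(\xi_n,\xi) > \delta) \le \sum_{i\in\mathbb{N}} \mathbb{P}^*(\{d(\xi_n,\xi) > \delta\} \cap A_i) \le \sum_{i\in\mathbb{N}} \mathbb{P}^*(\{d(\xi_n,x_i) > \tfrac{3}{4}\delta\} \cap A_i) \le \sum_{i\in\mathbb{N}} \int_{A_i} |f_i\circ\xi_n - f_i\circ\xi|\,d\mathbb{P}, \]
where the last inequality follows from the fact that for all $\omega \in \{d(\xi_n,x_i) > \tfrac{3}{4}\delta\} \cap A_i$ one has, by construction of the $f_i$'s, $f_i(\xi_n(\omega)) = 1$ and $f_i(\xi(\omega)) = 0$.

If we put
\[ g_n(i) := \int_{A_i} |f_i\circ\xi_n - f_i\circ\xi|\,d\mathbb{P} \quad\text{and}\quad g(i) := \mathbb{P}(A_i) \]
for each $i \in \mathbb{N}$, we obtain functions $g_n$ and $g$ on $\mathbb{N}$ for which $0 \le g_n \le g$ and
\[ \sum_{i\in\mathbb{N}} g(i) = \sum_{i\in\mathbb{N}} \mathbb{P}(A_i) = \mathbb{P}\Big(\bigcup_{i\in\mathbb{N}} A_i\Big) = 1, \]
i.e., the $g_n$'s are integrable functions on $\mathbb{N}$ (integrable w.r.t. the counting measure on $\mathbb{N}$) being dominated by an integrable function $g$; since, by (+), $\lim_{n\to\infty} g_n(i) = 0$ for all $i \in \mathbb{N}$, it follows from Lebesgue's dominated convergence theorem that
\[ \limsup_{n\to\infty} \mathbb{P}^*(d(\xi_n,\xi) > \delta) \le \lim_{n\to\infty} \sum_{i\in\mathbb{N}} g_n(i) = 0. \]
□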
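The key estimate in the proof of Theorem 9a, $|\mathbb{E} f(\xi_n) - \mathbb{E} f(\eta_n)| \le 2\|f\|\,\mathbb{P}^*(d(\xi_n,\eta_n) > \delta) + \varepsilon$, can be checked on simulated data. The following is a minimal sketch under assumptions that are not part of the text: $S$ is the real line, $f = \tanh$ (bounded by 1 and Lipschitz with constant 1, so that $\varepsilon = \delta$ works), and $\eta = \xi + $ small Gaussian noise. The function name `theorem_9a_bound_check` and all parameter values are hypothetical.

```python
import math
import random


def theorem_9a_bound_check(num_samples=10_000, noise=0.01, delta=0.05, seed=0):
    """Return (gap, bound) for the empirical version of the Theorem 9a estimate.

    gap   = |mean f(xi) - mean f(eta)| over the simulated pairs,
    bound = 2 * sup|f| * (fraction of pairs with |xi - eta| > delta) + delta,
    with f = tanh, whose sup norm and Lipschitz constant are both 1.
    """
    rng = random.Random(seed)
    f = math.tanh
    diff_sum = 0.0
    exceed = 0
    for _ in range(num_samples):
        xi = rng.gauss(0.0, 1.0)
        eta = xi + noise * rng.gauss(0.0, 1.0)  # eta stays close to xi
        diff_sum += f(xi) - f(eta)
        if abs(xi - eta) > delta:
            exceed += 1
    gap = abs(diff_sum) / num_samples
    bound = 2.0 * 1.0 * (exceed / num_samples) + delta
    return gap, bound


if __name__ == "__main__":
    gap, bound = theorem_9a_bound_check()
    print(f"gap = {gap:.6f}, bound = {bound:.6f}")
```

For the empirical averages the inequality holds sample by sample, so the check does not depend on the random seed; shrinking `noise` and `delta` together mimics the limit $n \to \infty$ followed by $\varepsilon \to 0$.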
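Finally, concerning the speed of convergence we have the following result:

THEOREM 10. Let $\xi_n$, $n \in \mathbb{N}$, and $\eta$ be random elements in $(S,\mathcal{A})$ defined on a common p-space $(\Omega,\mathcal{F},\mathbb{P})$ such that for some sequence $a_n \downarrow 0$

(a') $\mathbb{P}^*(d(\xi_n,\eta) > a_n) = \mathcal{O}(a_n)$.

Let $H: S \to \mathbb{R}$ be $\mathcal{A},\mathcal{B}$-measurable and such that

(b') $|H(x) - H(y)| \le L\cdot d(x,y)$ for all $x,y \in S$ and some constant $0 < L < \infty$.

Assume further that $\mathcal{L}\{H(\eta)\}$ is absolutely continuous w.r.t. Lebesgue measure $\lambda$ such that

(c') $\|h\| = \sup_{t\in\mathbb{R}} |h(t)| =: M < \infty$ for the density $h$ of $\mathcal{L}\{H(\eta)\}$ w.r.t. $\lambda$.

Then
\[ \sup_{t\in\mathbb{R}} |\mathbb{P}(H(\xi_n) \le t) - \mathbb{P}(H(\eta) \le t)| = \mathcal{O}(a_n). \]

Proof. Let $t \in \mathbb{R}$ be arbitrary but fixed; then
\begin{align*}
\mathbb{P}(H(\xi_n) \le t) - \mathbb{P}(H(\eta) \le t)
&\le \mathbb{P}^*(H(\xi_n) \le t,\ d(\xi_n,\eta) \le a_n) + \mathcal{O}(a_n) - \mathbb{P}(H(\eta) \le t) \\
&\le \mathbb{P}(H(\xi_n) \le t,\ |H(\xi_n) - H(\eta)| \le L\,a_n) + \mathcal{O}(a_n) - \mathbb{P}(H(\eta) \le t) \\
&\le \mathbb{P}(H(\eta) \le t + L\,a_n) + \mathcal{O}(a_n) - \mathbb{P}(H(\eta) \le t) \\
&\le M\cdot L\cdot a_n + \mathcal{O}(a_n) = \mathcal{O}(a_n),
\end{align*}
where the first step uses (a'), the second (b'), and the last (c'). In the same way one obtains that
\[ \mathbb{P}(H(\xi_n) > t) - \mathbb{P}(H(\eta) > t) = \mathcal{O}(a_n), \]
whence also $\mathbb{P}(H(\eta) \le t) - \mathbb{P}(H(\xi_n) \le t) = \mathcal{O}(a_n)$, so that in summary
\[ \sup_{t\in\mathbb{R}} |\mathbb{P}(H(\xi_n) \le t) - \mathbb{P}(H(\eta) \le t)| = \mathcal{O}(a_n). \]
□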
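A very simple special case shows that the rate in Theorem 10 cannot be improved in general. The sketch below assumes (these choices are illustrative and not from the text) that $S = \mathbb{R}$, $H$ is the identity (so $L = 1$), $\eta$ is uniformly distributed on $[0,1]$ (so $M = 1$), and $\xi_n = \eta + a_n$ is a deterministic shift, for which $d(\xi_n,\eta) = a_n$ and condition (a') holds trivially. The Kolmogorov distance is then approximated on a grid; the helper names are hypothetical.

```python
def uniform_cdf(t):
    """Distribution function of the uniform law on [0, 1]."""
    return min(max(t, 0.0), 1.0)


def kolmogorov_distance_shift(a, grid_size=3001):
    """Grid approximation of sup_t |P(eta + a <= t) - P(eta <= t)|.

    P(eta + a <= t) equals uniform_cdf(t - a); the grid covers [-1, 2].
    """
    best = 0.0
    for i in range(grid_size):
        t = -1.0 + 3.0 * i / (grid_size - 1)
        best = max(best, abs(uniform_cdf(t - a) - uniform_cdf(t)))
    return best


if __name__ == "__main__":
    for a in (0.5, 0.1, 0.01):
        print(a, kolmogorov_distance_shift(a))
```

In this example the distance equals $a_n = M\cdot L\cdot a_n$ (up to rounding), so the bound obtained in the proof is attained.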
SOME REMARKS ON PRODUCT SPACES:

Let $S' = (S',d')$ and $S'' = (S'',d'')$ be two (possibly non-separable) metric spaces. Let $S := S' \times S''$ be the Cartesian product of $S'$ and $S''$ and let $d := \max(d',d'')$, i.e.,
\[ d((x',x''),(y',y'')) := \max(d'(x',y'),\,d''(x'',y'')) \]
for $(x',x'') \in S$ and $(y',y'') \in S$. Then $S = (S,d)$ is again a (possibly non-separable) metric space.
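The product metric $d = \max(d',d'')$ is easy to realise in code. The sketch below is illustrative only: the two factor metrics (absolute difference on the reals and the discrete 0/1 metric on strings) and the name `product_metric` are assumptions chosen for the example, not objects from the text.

```python
def product_metric(d1, d2):
    """Return the max-metric on pairs built from the metrics d1 and d2."""
    def d(p, q):
        (x1, x2), (y1, y2) = p, q
        return max(d1(x1, y1), d2(x2, y2))
    return d


def abs_metric(x, y):
    """Usual metric on the real line."""
    return abs(x - y)


def discrete_metric(x, y):
    """Discrete metric: 0 for equal points, 1 otherwise."""
    return 0.0 if x == y else 1.0


if __name__ == "__main__":
    d = product_metric(abs_metric, discrete_metric)
    print(d((0.25, "a"), (0.75, "a")))  # only the first coordinates differ
    print(d((0.25, "a"), (0.75, "b")))  # the discrete part dominates
```

In particular, if the second coordinates coincide, the product distance reduces to the distance of the first coordinates; the proof of Theorem 9c uses exactly this kind of reduction (there with the first coordinates coinciding).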
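REMARK.

(1) $\mathcal{B}_b(S) \subset \mathcal{B}_b(S') \otimes \mathcal{B}_b(S'')$ and

(2) $\mathcal{B}(S') \otimes \mathcal{B}(S'') \subset \mathcal{B}(S)$,

the inclusions being strict in general as can be shown by examples.

Let $\mathcal{A}'$ and $\mathcal{A}''$ be $\sigma$-algebras of subsets of $S'$ and $S''$, respectively, such that
\[ \mathcal{B}_b(S') \subset \mathcal{A}' \subset \mathcal{B}(S') \quad\text{and}\quad \mathcal{B}_b(S'') \subset \mathcal{A}'' \subset \mathcal{B}(S''). \]
Then
\[ \mathcal{B}_b(S) \subset \mathcal{B}_b(S') \otimes \mathcal{B}_b(S'') \subset \mathcal{A}' \otimes \mathcal{A}'' \subset \mathcal{B}(S') \otimes \mathcal{B}(S'') \subset \mathcal{B}(S), \]
where the first inclusion holds by (1) and the last by (2); i.e., putting e.g.

(a) $\mathcal{A} := \mathcal{B}_b(S') \otimes \mathcal{B}_b(S'')$ or

(a') $\mathcal{A} := \mathcal{A}' \otimes \mathcal{A}''$,

we have again $\mathcal{B}_b(S) \subset \mathcal{A} \subset \mathcal{B}(S)$ for the product space $S = S' \times S''$.

Now, let $\xi_n$, $n \in \mathbb{N}$, be random elements in $(S',\mathcal{A}')$, $\xi$ be a random element in $(S',\mathcal{B}_b(S'))$, $\eta_n$, $n \in \mathbb{N}$, be random elements in $(S'',\mathcal{A}'')$, and let $\eta$ be a random element in $(S'',\mathcal{B}_b(S''))$; suppose that all these random elements are defined on a common p-space $(\Omega,\mathcal{F},\mathbb{P})$.

Then $(\xi_n,\eta_n)$, $n \in \mathbb{N}$, are random elements in $(S,\mathcal{A})$ (for both choices of $\mathcal{A}$ as in (a) or (a')) and $(\xi,\eta)$ is a random element in $(S,\mathcal{B}_b(S') \otimes \mathcal{B}_b(S''))$ as well as in $(S,\mathcal{B}_b(S))$ (cf. (1) in the above remark). Thus, considering $(\xi,\eta)$ as a random element in $(S,\mathcal{B}_b(S))$, $(\xi_n,\eta_n) \xrightarrow{\mathcal{L}_b} (\xi,\eta)$ is again defined in the sense of (34), i.e., as
\[ \mathcal{L}\{(\xi_n,\eta_n)\}\ (\in M_a^1(S)) \xrightarrow[b]{} \mathcal{L}\{(\xi,\eta)\}\ (\in M_b^1(S)). \]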
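Supplementing the results contained in Theorems 9a and 9b we can prove within the setting just described the following Theorems 9c and 9d:

THEOREM 9c. Suppose that $\eta$ equals $\mathbb{P}$-a.s. some constant $c$; then $\xi_n \xrightarrow{\mathcal{L}_b} \xi$ and $\eta_n \xrightarrow{\mathcal{L}_b} \eta$ together imply $(\xi_n,\eta_n) \xrightarrow{\mathcal{L}_b} (\xi,\eta)$.

Proof. According to Theorem 9b, $\eta_n \xrightarrow{\mathcal{L}_b} \eta$ and $\eta = c$ $\mathbb{P}$-a.s. imply
\[ \lim_{n\to\infty} \mathbb{P}^*(d''(\eta_n,\eta) > \delta) = 0 \quad\text{for every } \delta > 0. \]
Since
\[ d((\xi_n,\eta_n),(\xi_n,\eta)) = \max(d'(\xi_n,\xi_n),\,d''(\eta_n,\eta)) = d''(\eta_n,\eta), \]
we thus have $\lim_{n\to\infty} \mathbb{P}^*(d((\xi_n,\eta_n),(\xi_n,\eta)) > \delta) = 0$ for every $\delta > 0$. Therefore, by Theorem 9a, the assertion of the present theorem will follow if we show
\[ (+)\qquad (\xi_n,\eta) \xrightarrow{\mathcal{L}_b} (\xi,\eta). \]

ad (+): 1.) $\mathcal{L}\{(\xi,\eta)\}$ is separable: since $\mathcal{L}\{\xi\}$ is separable, there exists a separable $S_0' \subset S'$ such that $\mathcal{L}\{\xi\}(S_0'^c) = 1$. Take $S_0 := S_0'^c \times \{c\}$ to get a separable and closed subset $S_0 = S_0^c$ of $S$ for which
\[ \mathcal{L}\{(\xi,\eta)\}(S_0) = \mathbb{P}((\xi,\eta) \in S_0) = \mathbb{P}((\xi,\eta) \in S_0'^c \times \{c\}) = \mathbb{P}((\xi,c) \in S_0'^c \times \{c\}) = \mathbb{P}(\xi \in S_0'^c) = \mathcal{L}\{\xi\}(S_0'^c) = 1. \]

2.) According to the Portmanteau theorem (cf. (h') there) it remains to show that
\[ \int_S f\,d\mu_n \to \int_S f\,d\mu \quad\text{for all } f \in U_b^b(S), \]
where $\mu_n := \mathcal{L}\{(\xi_n,\eta)\}$ and $\mu := \mathcal{L}\{(\xi,\eta)\}$.

Now, given any $f: S = S' \times S'' \to \mathbb{R}$ being bounded, $d$-uniformly continuous and $\mathcal{B}_b(S)$-measurable, it follows from (1) in the remark made at the beginning that $f$ is also $\mathcal{B}_b(S') \otimes \mathcal{B}_b(S'')$-measurable, whence $f': S' \to \mathbb{R}$, defined by $f'(x') := f(x',c)$, $x' \in S'$, is $\mathcal{B}_b(S')$-measurable, and thus $f' \in U_b^b(S')$. But now, with $\mu_n' := \mathcal{L}\{\xi_n\}$ and $\mu' := \mathcal{L}\{\xi\}$, we obtain from $\xi_n \xrightarrow{\mathcal{L}_b} \xi$ (using again the Portmanteau theorem):
\begin{align*}
\int_S f\,d\mu_n &= \int_\Omega f\circ(\xi_n,\eta)\,d\mathbb{P} = \int_\Omega f\circ(\xi_n,c)\,d\mathbb{P} = \int_\Omega f'\circ\xi_n\,d\mathbb{P} = \int_{S'} f'\,d\mu_n' \\
&\to \int_{S'} f'\,d\mu' = \int_\Omega f'\circ\xi\,d\mathbb{P} = \int_\Omega f\circ(\xi,c)\,d\mathbb{P} = \int_\Omega f\circ(\xi,\eta)\,d\mathbb{P} = \int_S f\,d\mu.
\end{align*}
□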
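For sequences of independent random elements one gets

THEOREM 9d. Suppose that $\xi_n$ and $\eta_n$ are independent for each $n \in \mathbb{N}$ and suppose also that $\xi$ and $\eta$ are independent. Then the following two statements are equivalent:

(i) $\xi_n \xrightarrow{\mathcal{L}_b} \xi$ and $\eta_n \xrightarrow{\mathcal{L}_b} \eta$;

(ii) $(\xi_n,\eta_n) \xrightarrow{\mathcal{L}_b} (\xi,\eta)$.

Proof. (i) $\Rightarrow$ (ii): 1.) $\mathcal{L}\{(\xi,\eta)\}$ is separable: since both $\mathcal{L}\{\xi\}$ and $\mathcal{L}\{\eta\}$ are separable, there exist $S_0' \subset S'$ and $S_0'' \subset S''$ such that $(S_0',d')$ and $(S_0'',d'')$ are separable and
\[ \mathcal{L}\{\xi\}(S_0'^c) = \mathcal{L}\{\eta\}(S_0''^c) = 1. \]
Put $S_0 := S_0'^c \times S_0''^c$ to get a separable and closed subspace of $S = (S,d)$ ($S = S' \times S''$, $d = \max(d',d'')$) for which $\mathcal{L}\{(\xi,\eta)\}(S_0) = 1$.

2.) According to the Portmanteau theorem (cf. (a') there) it remains to show
\[ (+)\qquad \liminf_{n\to\infty} \mathcal{L}\{(\xi_n,\eta_n)\}(G) \ge \widetilde{\mathcal{L}\{(\xi,\eta)\}}(G) \quad\text{for all } G \in \mathcal{G}(S) \cap \mathcal{A} \]
(where $\mathcal{A} = \mathcal{A}' \otimes \mathcal{A}''$). For this, let $\mu' := \mathcal{L}\{\xi\}$ and $\mu'' := \mathcal{L}\{\eta\}$ and let
\[ \mathcal{C} := \{A' \times A'': A' \in \mathcal{A}',\ \tilde\mu'(\partial A') = 0,\ A'' \in \mathcal{A}'',\ \tilde\mu''(\partial A'') = 0\}; \]
then $\mathcal{C}$ is closed under finite intersections, i.e., $\mathcal{C} = \mathcal{C}^{\cap f}$, and (33) holds, which means that for each $x \in S_0^c$, $\{C \in \mathcal{C}: x \in C^\circ\}$ is a neighborhood base at $x$. Furthermore, by assumption and the Portmanteau theorem (cf. (f') there), we have for $\mu_n' = \mathcal{L}\{\xi_n\}$ and $\mu_n'' = \mathcal{L}\{\eta_n\}$
\[ \lim_{n\to\infty} (\mu_n' \times \mu_n'')(A' \times A'') = \lim_{n\to\infty} \mu_n'(A')\,\mu_n''(A'') = \tilde\mu'(A')\,\tilde\mu''(A'') = (\tilde\mu' \times \tilde\mu'')(A' \times A'') = \widetilde{\mu' \times \mu''}(A' \times A'') \]
for all $A' \times A'' \in \mathcal{C} = \mathcal{C}^{\cap f}$, whence (+) follows from Lemma 17.

(ii) $\Rightarrow$ (i): 1.) Both $\mathcal{L}\{\xi\}$ and $\mathcal{L}\{\eta\}$ are separable: since $\mathcal{L}\{(\xi,\eta)\}$ is separable there exists a separable $S_0 \subset S = S' \times S''$ such that $\mathcal{L}\{(\xi,\eta)\}(S_0^c) = 1$. Put
\[ S_0' := \{x \in S': \exists\, y \in S'' \text{ such that } (x,y) \in S_0\} \]
to get a separable $S_0' \subset S'$ for which $S_0^c \subset \pi'^{-1}(S_0'^c)$, whence
\[ \mathcal{L}\{\xi\}(S_0'^c) = \mathcal{L}\{(\xi,\eta)\}(\pi'^{-1}(S_0'^c)) \ge \mathcal{L}\{(\xi,\eta)\}(S_0^c) = 1; \]
here $\pi'$ denotes the projection of $S = S' \times S''$ onto $S'$. In the same way one shows that $\mathcal{L}\{\eta\}$ is separable.

2.) According to the Portmanteau theorem (cf. (a') there) it remains to show that for $\mu_n' = \mathcal{L}\{\xi_n\}$ and $\mu' = \mathcal{L}\{\xi\}$
\[ (+)\qquad \liminf_{n\to\infty} \mu_n'(G') \ge \tilde\mu'(G') \quad\text{for all } G' \in \mathcal{G}(S') \cap \mathcal{A}' \]
and that for $\mu_n'' = \mathcal{L}\{\eta_n\}$ and $\mu'' = \mathcal{L}\{\eta\}$
\[ (++)\qquad \liminf_{n\to\infty} \mu_n''(G'') \ge \tilde\mu''(G'') \quad\text{for all } G'' \in \mathcal{G}(S'') \cap \mathcal{A}''. \]
We will show (+); the proof of (++) runs analogously.

ad (+): Let $G' \in \mathcal{G}(S') \cap \mathcal{A}'$ be arbitrary but fixed; then
\[ \pi'^{-1}(G') = G' \times S'' \in \mathcal{A} \cap \mathcal{G}(S) \quad (\mathcal{A} = \mathcal{A}' \otimes \mathcal{A}'') \]
and
\[ \mu_n'(G') = (\mu_n' \times \mu_n'')(\pi'^{-1}(G')) = (\mu_n' \times \mu_n'')(G' \times S'') \quad\text{for each } n \in \mathbb{N}. \]
By assumption and the Portmanteau theorem (cf. (a') there) we therefore obtain
\[ \liminf_{n\to\infty} \mu_n'(G') = \liminf_{n\to\infty} (\mu_n' \times \mu_n'')(G' \times S'') \ge \widetilde{\mu' \times \mu''}(G' \times S'') = \tilde\mu'(G')\cdot\tilde\mu''(S'') = \tilde\mu'(G'). \]
□

Remark. Using the continuous mapping theorem (Theorem 3) one easily gets an alternative proof of "(ii) $\Rightarrow$ (i)" in Theorem 9d, even without imposing the independence assumptions.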
SEQUENTIAL COMPACTNESS:

We have shown before (cf. (32)) that any net $(\mu_\alpha)_{\alpha\in A} \subset M_a^1(S)$ which fulfills (b$_i$), $i=1,2$, is a compact net in $M_a^1(S)$. At this point we ask the question whether the same is true for sequences instead of nets, i.e., whether for any sequence $(\mu_n)_{n\in\mathbb{N}} \subset M_a^1(S)$ fulfilling (b$_i$), $i=1,2$, there exists a subsequence $(\mu_{n_k})_{k\in\mathbb{N}}$ and a separable $\mu \in M_b^1(S)$ such that $\mu_{n_k} \xrightarrow[b]{} \mu$ (as $k \to \infty$). (Note that a subnet of a sequence need not be a sequence!) If (b$_i$), $i=1,2$, is replaced by the (stronger) assumption of $(\mu_n)_{n\in\mathbb{N}}$ being

$\to 0$ as $m \to \infty$, and therefore also
\[ \limsup_k \int (f_m - f)\,d\mu_{n_k} \to 0 \quad\text{as } m \to \infty. \]
Since
\[ \limsup_k \int (f_m - f)\,d\mu_{n_k} \ge \limsup_k \int f_m\,d\mu_{n_k} - \limsup_k \int f\,d\mu_{n_k} \ge 0, \]
we have
\[ \limsup_k \int f_m\,d\mu_{n_k} - \limsup_k \int f\,d\mu_{n_k} \to 0 \quad\text{as } m \to \infty, \]
and therefore we obtain by (i)
\[ \text{(a)}\qquad \lim_{m\to\infty} \lim_{k\to\infty} \int f_m\,d\mu_{n_k} = \limsup_{k\to\infty} \int f\,d\mu_{n_k}. \]
On the other hand, since
\[ \liminf_k \int (f - f_m)\,d\mu_{n_k} = -\limsup_k \int (f_m - f)\,d\mu_{n_k}, \]
we have $\liminf_k \int (f - f_m)\,d\mu_{n_k} \to 0$ as $m \to \infty$. Since
\[ \liminf_k \int (f - f_m)\,d\mu_{n_k} \le \liminf_k \int f\,d\mu_{n_k} - \liminf_k \int f_m\,d\mu_{n_k} \le 0, \]
we thus obtain in the same way as before, using (i), that
\[ \lim_{m\to\infty} \lim_{k\to\infty} \int f_m\,d\mu_{n_k} = \liminf_{k\to\infty} \int f\,d\mu_{n_k}, \]
whence together with (a) the assertion in (v) follows. Finally, let G
:= {f: S ->B: f = P(g ,...,g ) for some g. E G , 1 ύ i £ k, k E]N};
then, by (iii), G (vi)
C U (S), and it can be easily shown that (v) implies
lim / fdμ
exists for all f E G .
n k-χ»
o k
C Now , let IΓ(S ) := {f: S° -> ]R: f bounded and uniformly (d-) continuous}
and consider G' := {rest_of: f E G } C U b ( S C ) . M
o
C
M"
O
Let F ,F
E F(S C ) be a d-strictly separated pair of closed subsets in the
metric space S
= (S , d ) , i.e.,(w.l.o.g.) there exists a δ>0 such that
c
(Γ? n s ) n F,°= 0. ° _L
O
Z.
Put f := minCdC ^
7
Π S Q ) , 1 ) ; then f E G^ and g := rest Q f E G^; o
we will show that
\[ \text{(b)}\qquad \sup_{x\in F_1} g(x) < \inf_{x\in F_2} g(x). \]
ad (b): $x \in F_1$ implies $d(x, F_1^{\delta/2} \cap S_0^c) = 0$ since $F_1 \subset S_0^c$, and therefore $f(x) = 0$ for all $x \in F_1$, whence $\sup_{x\in F_1} g(x) = 0$.

On the other hand, $x \in F_2$ together with $(F_1^\delta \cap S_0^c) \cap F_2 = \emptyset$ implies $d(x,F_1) \ge \delta$ and therefore $d(x,F_1^{\delta/2}) \ge \delta/2$; thus $d(x, F_1^{\delta/2} \cap S_0^c) \ge \delta/2$ for all $x \in F_2$, i.e., $\inf_{x\in F_2} g(x) \ge \min(\delta/2,1)$, which proves (b).
Therefore, by our proposition, G' is an analytic generator of U ( S C ) , i.e., for every h E U (S°) there exists a sequence (g ) Sn ~- V β n l
^
-Snk^
w i t h
^ni
f
G
i >
H
i
S
k
n'
such that a n d
sup | h ( x ) - g ( x ) | -> 0 a s n -> α>. n xES° c Since g . E G' , g . = rest f . for some f . ^ G.t 9 whence & ni 4 ni c ni ni M o
O
f := P (f i 9 f o,...,f , ) E G c with rest f = g for each n 6 W . n n nl n2' nk 6 _c n & n n S o We thus obtain that (vii)
For any f E U, (S) there exists a sequence (f )
C G
such that
sup |f(x) - f (x)| •> 0 as n n xES c o Furthermore, we will show that (viii)
lim J fdμ
exists for all f E UΓ(S).
n
b
k
ad (viii): Let f E U (S); then, by (vii), there exists a sequence (f ) e - I N c G such that sup |f(x) - f^(x)| ->• 0 as n ->• °°; therefore, given an arbitrary n x6S Sc o
but fixed m G UN there exists a n n
= n (m) G U such that o o |f(x) - f (x)| < —. Since f and f are uniformly continuous, there n m n o o
sup c xGS o exists a δ >0 such that |f(x) - f(y)| < - and |f (x) - f (y)| < m m n n m o o δ whenever d(x,y) < δ . Now, let S := (S ) then, for any x G S there exists a
y G S such that d(x,y) < δ , and therefore o m
|f
(x) - f(x)| ^ |f o
(x) - f (y)| + |f (y) - f(y)| + |f(y) - f(x)| < | n m o o o
for all x G S, whence f(x) ύ f (x) + n m o
and f(x) £ f (x) - - for all x G S. n m o
Since S G δ (S) (cf. Lemma 11 (ii)), it follows that ©
J fdμ n
(d)
k
J fdμ n
£ J (f + -) dμ n S Πo m k
+ I fdμ n GS k
for all k € U , and
^ / (f
+
for all k 6 1 ,
k
S
n
o
- -) dμ m
n
k
J fdμ
CS
n
k
Furthermore, it follows from (b ) that (e)
lim sup μ (CS) = 0, n k k d ( f
'So} ad Q) : l/»g = 1 r ( S C ) δ m = m i n ( — £ — — , D lim sup μ (£S) ^ lim sup J f dμ n n k k k k = 1 - lim inf J (1 - f°)dμ
lim inf J (1 - f°)dμ
& 1
=: f
o b G IΓ(S), whence
= lim sup (1 - /(I - f )dμ ) n k k
^ 0, since 1 - f° 2 1
and thus
by ( b o ) . This proves ( e ) , 2
\
Next, it can be easily shown that (<e) implies (f\
lim ^ f d μ n k+~ Cs k
=0
for all f G ϋ
But then, it follows from (c), Ql) and Qj lim sup J fdμ Π k k
that
^ lim sup J f dμ k S no nk
+ —, and
lim inf J f dμ ^ lim inf J f dμ k k k S o k Furthermore, ©
together with (vi) imply easily that
lim sup ί f dμ n k S o \ and therefore
.
= lim inf J f n k S o
lim sup J fdμ k \
dμ n k
- lim inf J fdμ n k k
= lim J f n k~> S o
dμ k
n
^ — ra
which implies the assertion in (viii) since we started with an arbitrary $m$. But now, putting
\[ \mu(f) := \lim_{k\to\infty} \int f\,d\mu_{n_k} \quad\text{for } f \in U_b^b(S), \]
the assertion of Theorem 11 a) follows as in the proof of Theorem 6*, applying the Daniell-Stone representation theorem (cf. H. Bauer (1978), 3rd edition, p. 188) and noticing that $\mathcal{B}_b(S)$ coincides with the smallest $\sigma$-algebra with respect to which all $f \in U_b^b(S)$ are measurable.

This concludes the proof of Theorem 11 a). □
SKOROKHOD-DUDLEY-WICHURA REPRESENTATION THEOREM:

Let again $S = (S,d)$ be a (possibly non-separable) metric space and suppose that $\mathcal{A}$ is a $\sigma$-algebra of subsets of $S$ such that $\mathcal{B}_b(S) \subset \mathcal{A} \subset \mathcal{B}(S)$; let $(\xi_n)_{n\in\mathbb{N}}$ be a sequence of random elements in $(S,\mathcal{A})$ and $\xi$ be a random element in $(S,\mathcal{B}_b(S))$ such that $\xi_n \xrightarrow{\mathcal{L}_b} \xi$ (cf. (34)).

Then the Skorokhod-Dudley-Wichura Representation Theorem states:

THEOREM 12. $\xi_n \xrightarrow{\mathcal{L}_b} \xi$ implies that there exists a sequence $\hat\xi_n$, $n \in \mathbb{N}$, of random elements in $(S,\mathcal{A})$ and a random element $\hat\xi$ in $(S,\mathcal{B}_b(S))$, being all defined on an appropriate p-space $(\hat\Omega,\hat{\mathcal{F}},\hat{\mathbb{P}})$, such that $\mathcal{L}\{\hat\xi_n\} = \mathcal{L}\{\xi_n\}$ (on $\mathcal{A}$) for all $n \in \mathbb{N}$, $\mathcal{L}\{\hat\xi\} = \mathcal{L}\{\xi\}$ (on $\mathcal{B}_b(S)$) and $\hat\xi_n \to \hat\xi$ $\hat{\mathbb{P}}$-almost surely as $n \to \infty$ (i.e., there exists an $\hat\Omega_0 \subset \hat\Omega$ with $\hat\Omega_0 \in \hat{\mathcal{F}}$ and $\hat{\mathbb{P}}(\hat\Omega_0) = 1$ such that for all $\hat\omega \in \hat\Omega_0$, $\lim_{n\to\infty} d(\hat\xi_n(\hat\omega),\hat\xi(\hat\omega)) = 0$).

For complete and separable metric spaces this result was proved by A.V. Skorokhod (1956); it was generalized to arbitrary separable metric spaces by R.M. Dudley (1968), and in its present form (for arbitrary metric spaces) it was first proved by M.J. Wichura (1970); cf. also R.M. Dudley (1976), Lectures 19 and 24. Our proof will be based on the one given by Dudley (1976). For this, we need the following proposition.

Proposition. Let $S = (S,d)$ be a metric space, $\mu \in M_b^1(S)$ be separable, i.e., $\mu(S_0^c) = 1$ for some separable $S_0 \subset S$; then, given any $\varepsilon > 0$, there exists a sequence $(A_n)_{n\in\mathbb{N}}$ of pairwise disjoint subsets $A_n$ of $S$ having the following properties:

(i) $S_0^c \subset \bigcup_{n\in\mathbb{N}} A_n$,

(ii) $\tilde\mu(\partial A_n) = 0$ for all $n \in \mathbb{N}$ (where $\tilde\mu$ denotes the unique Borel extension of $\mu$ (cf. (24))),

(iii) $\mathrm{diam}(A_n) := \sup_{x,y\in A_n} d(x,y) < \varepsilon$ for all $n \in \mathbb{N}$, and

(iv) $A_n \in \mathcal{B}_b(S)$ for all $n \in \mathbb{N}$.

Proof. Let $\{x_1,x_2,\ldots\}$ be dense in $S_0^c$. For each $n \in \mathbb{N}$, the open ball $B(x_n,\delta)$ is a $\tilde\mu$-continuity set (i.e., $\tilde\mu(\partial B(x_n,\delta)) = 0$) except for at most countably many values of $\delta$; hence, given any $\varepsilon > 0$, for each $n \in \mathbb{N}$ there exists an $\varepsilon_n$ such that $\varepsilon/4 < \varepsilon_n < \varepsilon/2$ and $\tilde\mu(\partial B(x_n,\varepsilon_n)) = 0$. Now, let $A_1 := B(x_1,\varepsilon_1)$ and for $n > 1$ let $A_n$
:= B(x ,ε ) \ U n n j 1 - Λ2
Σ
P(A
(where w.l.o.g. we may assume
) > 0 for all 1 ύ j ύ J ).
Applying (28) (cf. (f τ ) there) we obtain (b)
For each k 6 1 there exists an n
6 1 such that for
|P (A. .) - P(A. . ) | < 2 ~ k
1 £ j £ J,
min
P(A. .) for k
all n ^ n . We may assume w . l . o . g . l < n
STEP 2. For each n 6 1 8
n
:= A ® B ( I
n
measure on 8 ( 1
) n
let
< n
S, where Π A is the natural projection of
T Ω = A x T onto A, i(A) is the injection of A into T , and Π ° is the natural O
projection of T
o
= S x I onto S; then ξ is a random element in (S,B, (S)) and
U)
L{ξ} = L{ξ}
(on B. (S)). D 1
In fact, for any B e B b (S), L{ξ}(B) = P(ξ" (B)) A
T ~ 1 1 1 1 1 = P((π")"" c (i(A))" o (ΠS°)" (B)) ^ ( ( I I " ) " o (i(A))'" (B x I)) 1 ^ - ( A Π (B x I)))=£( [A Π (B x I)] x T) = J μ (T)Q (dx) A (j) AΠ(BxI) X ° = Q (A Π (B x I)) = Q (B x I) = P(B) = L{ξ}(B). o o Now, let Ω
:= lim inf Ω
, where
°
n
k(n) then Ω £ F and o (m)
lim d(ξ (ω), ξ(ω)) = 0 n n-*»
for all ω € Ω , o
In fact, for any
ώ £ Ω there exists an n E l such that for all n ^ n there o o o exists a j(n), 1 ^ j(n) ύ J . ., such that *k(n)j(n)
and
implies i(A)(Π^(ω)) € E n
< ™ t β that ή t t ) 6 A Π E n j ( n )
"(ίS whence Πs°(i(A)(π"(ίS))) β A k ( n ) ( n ))'
(n)
cf. the definition of the sets D . and E ., respectively. Therefore, for all n ^ n , d(ξ (ω), ξ(ω)) £ diam(Aj, Λ / \) ^ F Γ T "*" n -> °° (since k(n) -*• °° as n •> » ) . Next we will show that ]P(Ω ) = 1. For this we will prove later that (n)
Q (lim sup E ) = 0. o no -xχ) n
Now, C
"o,n=
( [ A n E
* k(n)
n j
]
X T
l
X
X T
n-lxCDnj
X T
n+1:
+ ([A Π E 1 x T x . . . x T x . ..) no 1 n = Ω. + Ω^ , say, l,n 2,n where
P(8
) = Σ J Pn x (CD . )Q (dx) = Σ P . (JD . )Q (A Π E .) n n j AΠE n j 3 ° j 3 n3 o nD
Q ((CD .)ΠD .) r Σ — ^ £3_ Q (E .) = 0 for all n E JJ, and therefore
ί ( l i m sup CΩ ) = ]P(lim sup ίL ) = ί ( [ A n lim sup E 1 x T) n 2 n n o n~ °' n-~ ' n~ = Q (A Π l i m s u p E
n o
°
I t follows t h a t
) = Q ( l i m s u pE
°
n~°
P(Ω ) = H l i m inf Ω
°
n o
) =0
by ( n ) .
) = 1 - P(lim sup Cfi Π
°
) = 1. n
It remains to show (i) (2) and (n): ad (i) (2): We have to show that ( +) If
V := {B e 8: μ( ,B): A -> I = [θ,l], is A Π B , B(I)-measurable} = B. B = T x ... x T - x B x T x ... for some B 6 B , then for each t G I 1 n-1 n n+1 n n
{x E A: μ(x,B) £ t} = {x G A: μ (B) ^ t} = {x G A: P (B ) ^ t} = x nx n ( h ) U
AΠE.GAΠB,
whence B E 1? f or a l l these s e t s .
From this and the product form of μ it follows that the class C of finite intersections of sets B just considered is also contained in P. Since C is a Π-closed generator of B, we get (+) as in Gaenssler-Stute (1977), 1.8.5.
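ad (n): It follows from (a) that
\[ \sum_{k} P\Big(S \setminus \bigcup_{1\le j\le J_k} A_{kj}\Big) \le \sum_{k} 2^{-k} < \infty, \]
whence, by the Borel-Cantelli lemma,
\[ P\Big(\limsup_{k\to\infty} \Big(S \setminus \bigcup_{1\le j\le J_k} A_{kj}\Big)\Big) = 0, \quad\text{i.e.,}\quad P\Big(\liminf_{k\to\infty} \bigcup_{1\le j\le J_k} A_{kj}\Big) = 1, \]
and thus
\[ P\Big(\liminf_{n\to\infty} \bigcup_{1\le j\le J_{k(n)}} A_{k(n)j}\Big) = 1 \]
as $k(n) \to \infty$ for $n \to \infty$.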
Furthermore, λ(lim inf [0,
with t < l ,
min 1
Since
min
^
g(n,k(n),j)1) = 1 ,
since for any
g(n,k(n) 5 j) ^ 1 - 2
J
> t
lim inf
U
E . D (lim inf
U
A. , Λ . ) x k ( n > 3
(lim inf [0, min
we thus obtain Q (lim inf
U
g(n,k(n),j)]),
E .) = 1 n3
° u
for all large enough n.
( )
nD
C(
t 6 I = [θ,ll
which implies (n) since
E .) = E . ni
no
This concludes the proof of Theorem 12.
D
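In the special case $S = \mathbb{R}$ the representation of Theorem 12 can be obtained by the classical quantile coupling: on $\hat\Omega = (0,1)$ with Lebesgue measure one puts $\hat\xi_n := F_n^{-1}$ and $\hat\xi := F^{-1}$. This is not the construction used in the proof above (which works for arbitrary metric spaces); it is only a sketch of the statement in the simplest setting. The concrete laws (exponential distributions with rates $1 + 1/n \to 1$) and the helper names are assumptions made for the illustration.

```python
import math


def exp_quantile(u, rate):
    """Quantile function of the exponential law with the given rate, 0 < u < 1."""
    return -math.log(1.0 - u) / rate


def coupled_pair(u, n):
    """Evaluate the coupled versions at the sample point u of (0, 1).

    The first component has the exponential law with rate 1 + 1/n, the second
    the limiting exponential law with rate 1; both use the same point u.
    """
    return exp_quantile(u, 1.0 + 1.0 / n), exp_quantile(u, 1.0)


if __name__ == "__main__":
    u = 0.9  # one fixed point of the common probability space
    for n in (1, 10, 100, 1000):
        xi_n, xi = coupled_pair(u, n)
        print(n, abs(xi_n - xi))
```

For every fixed `u` the distance $|\hat\xi_n(u) - \hat\xi(u)| = -\ln(1-u)/(n+1)$ tends to zero, so the coupled versions converge pointwise while each keeps its prescribed law.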
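Following a suggestion of Ron Pyke, let us demonstrate at this place the usefulness of the representation theorem for proving the following version of Theorem 5 (cf. Lemma 16 for the definition of the set $E$).

THEOREM 5'. Let $S = (S,d)$ and $S' = (S',d')$ be metric spaces, and $\mathcal{A}$ a $\sigma$-algebra of subsets of $S$ such that $\mathcal{B}_b(S) \subset \mathcal{A} \subset \mathcal{B}(S)$. For $n \in \mathbb{N}$ let $g_n: S \to S'$ be $\mathcal{A},\mathcal{B}_b(S')$-measurable and let $g: S \to S'$ be $\mathcal{B}_b(S),\mathcal{B}_b(S')$-measurable. Let $(\xi_n)_{n\in\mathbb{N}}$ be a sequence of random elements in $(S,\mathcal{A})$ and $\xi$ be a random element in $(S,\mathcal{B}_b(S))$ such that $\xi_n \xrightarrow{\mathcal{L}_b} \xi$ and $\mathcal{L}\{\xi\}^*(E) = 0$. Then
\[ g_n(\xi_n) \xrightarrow{\mathcal{L}_b} g(\xi). \]
(Note that $g_n(\xi_n)$ and $g(\xi)$ are random elements in $(S',\mathcal{B}_b(S'))$.)

Proof. As in the proof of Theorem 4 it is shown that $\mathcal{L}\{g(\xi)\} = \mathcal{L}\{\xi\}\circ g^{-1}$ ($= \mathcal{L}\{\hat\xi\}\circ g^{-1}$) is separable. Now according to Theorem 12, there exists a p-space $(\hat\Omega,\hat{\mathcal{F}},\hat{\mathbb{P}})$ and on it random elements $\hat\xi_n$ in $(S,\mathcal{A})$ and a random element $\hat\xi$ in $(S,\mathcal{B}_b(S))$ such that $\mathcal{L}\{\hat\xi_n\} = \mathcal{L}\{\xi_n\}$ (on $\mathcal{A}$) for all $n \in \mathbb{N}$, $\mathcal{L}\{\hat\xi\} = \mathcal{L}\{\xi\}$ (on $\mathcal{B}_b(S)$), and $\hat\xi_n(\hat\omega) \to \hat\xi(\hat\omega)$ (as $n \to \infty$) for all $\hat\omega \in \hat\Omega_0$, where $\hat\Omega_0 \in \hat{\mathcal{F}}$ with $\hat{\mathbb{P}}(\hat\Omega_0) = 1$.

Let $\hat\Omega_1 := \{\hat\xi \in \complement E\}$ and $\hat\Omega_2 := \hat\Omega_0 \cap \hat\Omega_1$; then for all $\hat\omega \in \hat\Omega_2$, $g_n(\hat\xi_n(\hat\omega)) \to g(\hat\xi(\hat\omega))$ (as $n \to \infty$). Since $\hat{\mathbb{P}}(\hat\Omega_0) = 1$ and $\hat{\mathbb{P}}_*(\hat\Omega_1) = 1$ (note that $\mathcal{L}\{\hat\xi\}_*(\complement E) = \mathcal{L}\{\xi\}_*(\complement E) = 1$) we have $\hat{\mathbb{P}}_*(\hat\Omega_2) = 1$, whence there exists $\hat\Omega_3 \in \hat{\mathcal{F}}$ such that $\hat\Omega_3 \subset \hat\Omega_2$ and $\hat{\mathbb{P}}(\hat\Omega_3) = 1$. It follows that for $\hat\omega \in \hat\Omega_3$ and each $f \in U_b^b(S')$
\[ f\circ g_n\circ\hat\xi_n(\hat\omega) \to f\circ g\circ\hat\xi(\hat\omega) \quad (\text{as } n \to \infty), \]
whence, by Lebesgue's theorem, $\hat{\mathbb{E}}(f\circ g_n\circ\hat\xi_n) \to \hat{\mathbb{E}}(f\circ g\circ\hat\xi)$, i.e.,
\[ \int f\,d\mathcal{L}\{g_n(\hat\xi_n)\} \to \int f\,d\mathcal{L}\{g(\hat\xi)\} \quad (\text{as } n \to \infty). \]
Since
\[ \mathcal{L}\{g_n(\hat\xi_n)\} = \mathcal{L}\{\hat\xi_n\}\circ g_n^{-1} = \mathcal{L}\{\xi_n\}\circ g_n^{-1} = \mathcal{L}\{g_n(\xi_n)\} \]
and
\[ \mathcal{L}\{g(\hat\xi)\} = \mathcal{L}\{\hat\xi\}\circ g^{-1} = \mathcal{L}\{\xi\}\circ g^{-1} = \mathcal{L}\{g(\xi)\}, \]
the assertion follows by (28) (cf. (h') there). □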
Next, we want to make some specific remarks concerning the special case $S = D[0,1]$, reviewing at the same time some of the key results from Billingsley's (1968) book (cf. Appendix A in G. Shorack (1979)).

THE SPACE D[0,1]:

Let $D \equiv D[0,1]$ be the space of all right continuous functions on the unit interval $[0,1]$ that have left hand limits at all points $t \in (0,1]$. Cf. P. Billingsley (1968), Lemma 1, p. 110, and its consequences concerning specific properties of functions $x \in D$; among others,
\[ \sup_{t\in[0,1]} |x(t)| < \infty \quad\text{for all } x \in D. \]
If not stated otherwise, the space $D$ will be equipped with the supremum metric $\rho$, i.e.
\[ \rho(x,y) := \sup_{t\in[0,1]} |x(t) - y(t)| \quad\text{for } x,y \in D. \]
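To get a feeling for the supremum metric, one can evaluate it for simple step functions in $D$. The sketch below is only illustrative and rests on assumptions not stated in the text: the functions are the indicators $1_{[s,1]}$ of intervals, and the supremum is taken over a finite grid, which in general yields only a lower bound for $\rho$ but is exact for these step functions as soon as the grid contains their jump points. The helper names are hypothetical.

```python
def indicator_from(s):
    """Right-continuous step function t -> 1 if t >= s else 0 on [0, 1]."""
    return lambda t: 1.0 if t >= s else 0.0


def sup_distance(x, y, grid):
    """Maximum of |x(t) - y(t)| over the grid (a lower bound for the sup metric)."""
    return max(abs(x(t) - y(t)) for t in grid)


if __name__ == "__main__":
    grid = [0.0, 0.3, 0.7, 1.0]  # contains the jump points used below
    x, y = indicator_from(0.3), indicator_from(0.7)
    print(sup_distance(x, y, grid))  # the two jumps sit at different places
    print(sup_distance(x, x, grid))
```

Two such indicators with different jump points are always at distance 1, however close the jump points are, so this family is uniformly discrete in the supremum metric.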
(By the way, $(D,\rho)$ is a linear topological space whereas $(D,s)$, with $s$ being the Skorokhod metric, is not (cf. P. Billingsley (1968), p. 123, 3.)). Note that $(S,d) = (D,\rho)$ is a non-separable metric space (in fact, look at $x_s := 1_{[s,1]}$, $s \in (0,1)$, to obtain an uncountable set of functions in $D$ for which $\rho(x_s, x_{s'}) = 1$ for $s \ne s'$).
Also, as pointed out by D.M. Chibisov (1965) (cf. P. Billingsley (1968), Section 18), the empirical df $U_n$ (based on independent random variables (on some p-space $(\Omega,\mathcal{F},\mathbb{P})$) being uniformly distributed on $[0,1]$) cannot be considered as a random element in $(D,\mathcal{B}(D,\rho))$ (i.e. $U_n: \Omega \to D$ is not $\mathcal{F},\mathcal{B}(D,\rho)$-measurable), where $\mathcal{B}(D,\rho)$ denotes the Borel σ-algebra in $(D,\rho)$. But, considering instead the smaller σ-algebra $\mathcal{B}_b(D) \equiv \mathcal{B}_b(D,\rho)$ generated by the open ($\rho$-)balls we have

(37)    $\mathcal{B}_b(D) = \sigma(\{\pi_t : t \in [0,1]\})$,

where $\sigma(\{\pi_t : t \in [0,1]\})$ denotes the σ-algebra generated by the coordinate projections $\pi_t = \pi_t(D)$ from $D$ onto $\mathbb{R}$, defined by $\pi_t(x) := x(t)$ for $x \in D$.
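The non-separability remark above can be checked numerically. The following is a minimal sketch and not part of the original text; the helper names `indicator_from`, `sup_distance` and `evaluation_points` are illustrative assumptions. It evaluates the sup-distance between two indicator functions $x_s = 1_{[s,1]}$ on a finite set of points that contains both jump locations, which is enough for step functions of this kind.

```python
def indicator_from(s):
    """Return x_s = 1_{[s,1]} as a right-continuous function on [0,1]."""
    return lambda t: 1.0 if t >= s else 0.0


def sup_distance(x, y, points):
    """Largest |x(t) - y(t)| over the given evaluation points."""
    return max(abs(x(t) - y(t)) for t in points)


def evaluation_points(s1, s2, m=100):
    """Regular grid on [0,1] plus both jump locations (illustrative helper)."""
    return sorted({i / m for i in range(m + 1)} | {s1, s2})


if __name__ == "__main__":
    # However close the two jump locations are, the distance stays equal to 1.
    for s1, s2 in [(0.3, 0.7), (0.5, 0.5000001)]:
        pts = evaluation_points(s1, s2)
        print(s1, s2, sup_distance(indicator_from(s1), indicator_from(s2), pts))
```

Because the distance does not shrink as the jump locations approach each other, no countable set of functions can be dense in this uncountable family.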
(Note that (37) implies that $U_n$ is $\mathcal{F},\mathcal{B}_b(D)$-measurable since $\mathcal{F},\sigma(\{\pi_t : t \in [0,1]\})$-measurability of $U_n$ is equivalent with $\mathcal{F},\mathbb{B}$-measurability of $\pi_t(U_n) = U_n(t)$ for each fixed $t \in [0,1]$, where the latter is satisfied since $U_n(t)$ is a random variable.)

Proof of (37). Let $T := \mathbb{Q} \cap [0,1]$ be the set of rational numbers in $[0,1]$; then, by the right continuity of each $x \in D$ one has

(a)    $\rho(x_1,x_2) = \sup_{t \in T} |x_1(t) - x_2(t)|$ for every $x_1, x_2 \in D$.

Therefore, for any $x_0 \in D$ and any $r > 0$

    $\{x \in D: \rho(x,x_0) \le r\} = \bigcap_{t \in T} \{x \in D: |x(t) - x_0(t)| \le r\} = \bigcap_{t \in T} \pi_t^{-1}([x_0(t) - r,\, x_0(t) + r]) \in \sigma(\{\pi_t : t \in [0,1]\})$;

thus $\mathcal{B}_b(D) \subset \sigma(\{\pi_t : t \in [0,1]\})$; (note that $\{x \in D: \rho(x,x_0) < r\} = \bigcup_{m \in \mathbb{N}} \{x \in D: \rho(x,x_0) \le r - \frac{1}{m}\}$).
To verify the other inclusion it suffices to show that for every fixed $t \in [0,1]$ and $r \in \mathbb{R}$ one has

(b)    $\{x \in D: \pi_t(x) < r\} \in \mathcal{B}_b(D)$.
For this we define, given the fixed t and r, for any n, k E IN and s E [θ,l] 0, if s < t k
χ (s) :=
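ξ iff $\mathcal{L}\{\xi_n\} \xrightarrow{b} \mathcal{L}\{\xi\}$, in which case (by our definition of $\xrightarrow{b}$-convergence) $\mathcal{L}\{\xi\}$ is assumed to be separable. On the other hand, in view of (38), $\mathcal{L}\{\xi_n\}$ and $\mathcal{L}\{\xi\}$ may also be considered as Borel measures on $\mathcal{B}(D,s)$, whence the usual concept of weak convergence of Borel measures can also be used, which means that $\xi_n \xrightarrow{\mathcal{L}} \xi$ iff, by definition, $\mathcal{L}\{\xi_n\}$ on $\mathcal{B}(D,s)$ converges weakly to $\mathcal{L}\{\xi\}$ on $\mathcal{B}(D,s)$ in the sense of Billingsley (1968).

LEMMA 18. If $\xi_n \xrightarrow{\mathcal{L}_b} \xi$, then $\xi_n \xrightarrow{\mathcal{L}} \xi$; on the other hand, if $\xi_n \xrightarrow{\mathcal{L}} \xi$ and $\mathcal{L}\{\xi\}(C) = 1$, then $\xi_n \xrightarrow{\mathcal{L}_b} \xi$.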
Proof. Note first that $(D,s)$ is a separable metric space whence we can use (28) with $\mathcal{B}_b(D,s) = \mathcal{A} = \mathcal{B}(D,s)$, which gives us

(+)    $\xi_n \xrightarrow{\mathcal{L}} \xi \iff \lim_{n\to\infty} E(f(\xi_n)) = E(f(\xi))$ for all bounded $\mathcal{B}(D,s),\mathbb{B}$-measurable functions $f: D \to \mathbb{R}$ which are $\mathcal{L}\{\xi\}$-a.e. continuous.

1.) Consider an $f: D \to \mathbb{R}$; then: if $f$ is $s$-continuous, it is also $\rho$-continuous and (cf. (38)) $\mathcal{B}_b(D,\rho),\mathbb{B}$-measurable. Therefore $\xi_n \xrightarrow{\mathcal{L}_b} \xi$ implies that $\lim_{n\to\infty} E(f(\xi_n)) = E(f(\xi))$ for all bounded $s$-continuous $f: D \to \mathbb{R}$, whence $\xi_n \xrightarrow{\mathcal{L}} \xi$.

2.) $\xi_n \xrightarrow{\mathcal{L}} \xi$ implies, according to (+), that $\lim_{n\to\infty} E(f(\xi_n)) = E(f(\xi))$ for all bounded $\mathcal{B}(D,s),\mathbb{B}$-measurable $f: D \to \mathbb{R}$ which are $\mathcal{L}\{\xi\}$-a.e. continuous. Since $\mathcal{L}\{\xi\}(C) = 1$ implies (cf. (38)) that any $\rho$-continuous $f$ is also $\mathcal{L}\{\xi\}$-a.e. $s$-continuous, we obtain, using again that $\mathcal{B}(D,s) = \mathcal{B}_b(D,\rho)$, that $\lim_{n\to\infty} E(f(\xi_n)) = E(f(\xi))$ for all bounded, $\rho$-continuous, and $\mathcal{B}_b(D,\rho),\mathbb{B}$-measurable $f: D \to \mathbb{R}$; furthermore, since $C = (C,\rho)$ is a closed separable subspace of $(D,\rho)$ with $\mathcal{L}\{\xi\}(C) = 1$, we finally obtain (cf. (28)(h')) that $\xi_n \xrightarrow{\mathcal{L}_b} \xi$. □
We now continue our review of some of the key results of Billingsley's (1968) book. The following lemma is well known (cf. Yu.V. Prohorov (1956)):
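LEMMA 19. Let $F: [0,1] \to \mathbb{R}$ be a continuous function and $a > 1$, $b > 0$ be constants such that for some random element $\xi$ in $(\mathbb{R}^{[0,1]}, \mathbb{B}_{[0,1]})$ (with $\mathbb{B}_{[0,1]} := \bigotimes_{t \in [0,1]} \mathbb{B}_t$, $\mathbb{B}_t = \mathbb{B}$)

(40)    $E(|\xi(t) - \xi(s)|^b) \le |F(t) - F(s)|^a$ for all $0 \le s \le t \le 1$;

then there exists a random element $\tilde\xi$ in $(D,\mathcal{B}_b(D,\rho))$ such that $\mathcal{L}\{\tilde\xi\}\,|\,\mathbb{B}_{[0,1]} = \mathcal{L}\{\xi\}$ and $(\mathcal{L}\{\tilde\xi\}\,|\,\mathcal{B}_b(D,\rho))(C) = 1$. (Note that $D \cap \mathbb{B}_{[0,1]} = \mathcal{B}_b(D,\rho)$ (cf. (37)) and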
$C \in \mathcal{B}_b(D,\rho)$ (cf. (39)).)

In what follows we shall write $\xi_n \xrightarrow{f.d.} \xi$, if the finite dimensional distributions (fidis) of $\xi_n$ converge weakly to the corresponding fidis of $\xi$. (Recall that, given a r.e. $\xi$ in $(D,\mathcal{B}_b(D,\rho))$, the fidis of $\xi$ (or $\mathcal{L}\{\xi\}$, respectively) are defined as the image measures that $\pi_{t_1,\dots,t_k}(D): D \to \mathbb{R}^k$ induce from $\mathcal{L}\{\xi\}$ on $\mathcal{B}_b(D,\rho)$ ($= \sigma(\{\pi_t(D): t \in [0,1]\})$) for each fixed $t_1,\dots,t_k \in [0,1]$, $k \ge 1$, where $\pi_{t_1,\dots,t_k}(D)(x) := (x(t_1),\dots,x(t_k))$ for $x \in D$; note that $\pi_{t_1,\dots,t_k}$ is $\mathcal{B}_b(D,\rho),\mathbb{B}^k$-measurable.)
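DEFINITION 6. Let $(\xi_n)_{n \in \mathbb{N}}$ be a sequence of random elements in $(D,\mathcal{B}_b(D,\rho)) = (D,\mathcal{B}(D,s))$;
(i) $(\xi_n)$ is said to be relatively $\mathcal{L}$-sequentially compact, iff for any subsequence $(\mathcal{L}\{\xi_{n'}\})$ of $(\mathcal{L}\{\xi_n\})$ there exists a further subsequence $(\mathcal{L}\{\xi_{n''}\})$ of $(\mathcal{L}\{\xi_{n'}\})$ and a p-measure $\mu$ on $\mathcal{B}(D,s)$ such that $\mathcal{L}\{\xi_{n''}\}$ converges weakly to $\mu$ in the sense of Billingsley (1968).
(ii) $(\xi_n)$ is said to be relatively $\mathcal{L}_b$-sequentially compact, iff for any subsequence $(\mathcal{L}\{\xi_{n'}\})$ of $(\mathcal{L}\{\xi_n\})$ there exists a further subsequence $(\mathcal{L}\{\xi_{n''}\})$ of $(\mathcal{L}\{\xi_{n'}\})$ and a separable p-measure $\mu$ on $\mathcal{B}_b(D,\rho)$ (in $(D,\rho)$!) such that $\mathcal{L}\{\xi_{n''}\} \xrightarrow{b} \mu$.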
The following theorem is well known (cf. P. Billingsley (1968), Th. 15.1).
THEOREM 13. Let $(\xi_n)$ be relatively $\mathcal{L}$-sequentially compact and suppose that
The next theorem gives sufficient conditions for $(\xi_n)$ to be relatively $\mathcal{L}$-sequentially compact. For this, given any $x \in D$ and $B \in [0,1] \cap \mathbb{B}$, let

    $\|x\| := \sup_{t \in [0,1]} |x(t)|$

and

    $w_x(B) := \sup_{s,t \in B} |x(t) - x(s)|$.

THEOREM 14. Let $(\xi_n)_{n \in \mathbb{N}}$ be a sequence of random elements in $(D,\mathcal{B}_b(D,\rho))$, all defined on a common p-space $(\Omega,\mathcal{F},\mathbb{P})$, and satisfying the following set of conditions (A) - (D):

(A): $\limsup_{n\to\infty} \mathbb{P}(\|\xi_n\| > m) \to 0$ as $m \to \infty$.

(B): For every $\varepsilon > 0$, $\limsup_{n\to\infty} \mathbb{P}(w_{\xi_n}([0,\delta)) \ge \varepsilon) \to 0$ as $\delta \to 0$ and $\limsup_{n\to\infty} \mathbb{P}(w_{\xi_n}([\delta,1)) \ge \varepsilon) \to 0$ as $\delta \to 1$.

(C): There exist constants $a > 1$, $b > 0$ and, for every $n \in \mathbb{N}$ there exist monotone increasing functions $F_n: [0,1] \to \mathbb{R}$ such that for every $\varepsilon > 0$ and any $0 \le r \le s \le t \le 1$

    $\mathbb{P}(|\xi_n(s) - \xi_n(r)| \ge \varepsilon,\ |\xi_n(t) - \xi_n(s)| \ge \varepsilon) \le \varepsilon^{-b}(F_n(t) - F_n(r))^a$.

(D): There exists a monotone increasing and continuous function $F: [0,1] \to \mathbb{R}$ such that for the $F_n$'s occurring in (C) and any $0 \le s \le t \le 1$

    $\limsup_{n\to\infty} (F_n(t) - F_n(s)) \le F(t) - F(s)$.

Then $(\xi_n)$ is relatively $\mathcal{L}$-sequentially compact.
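(41) REMARK. Given any $x \in D$ and $\delta > 0$, let

    $w''_x(\delta) := \sup_{r \le s \le t,\ t - r \le \delta} \min\{|x(s) - x(r)|,\ |x(t) - x(s)|\}$.

Then (C) and (D) together imply

(C''): For every $\varepsilon > 0$, $\limsup_{n\to\infty} \mathbb{P}(w''_{\xi_n}(\delta) \ge \varepsilon) \to 0$ as $\delta \to 0$.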
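To see why the modulus $w''_x$ rather than the ordinary oscillation is the natural quantity for functions with jumps, the following hedged sketch compares the two on a grid. It is an illustration only: the function names `w_mod` and `w2_mod`, the representation of $x$ by its values on an equidistant grid, and the use of a number of grid steps in place of $\delta$ are all assumptions made here, and the grid maxima only approximate the suprema.

```python
def w_mod(values, steps):
    """Grid version of the oscillation: max |x(t) - x(s)| over grid points
    that are at most `steps` grid cells apart."""
    n = len(values)
    return max(
        abs(values[j] - values[i])
        for i in range(n)
        for j in range(i, min(n, i + steps + 1))
    )


def w2_mod(values, steps):
    """Grid version of w''_x: max over r <= s <= t with t - r <= steps of
    min(|x(s) - x(r)|, |x(t) - x(s)|)."""
    n = len(values)
    best = 0.0
    for r in range(n):
        for t in range(r, min(n, r + steps + 1)):
            for s in range(r, t + 1):
                best = max(best, min(abs(values[s] - values[r]),
                                     abs(values[t] - values[s])))
    return best


if __name__ == "__main__":
    single_jump = [0.0] * 50 + [1.0] * 51           # one jump of size 1
    spike = [0.0] * 50 + [1.0] * 2 + [0.0] * 49     # two nearby jumps
    print(w_mod(single_jump, 10), w2_mod(single_jump, 10))
    print(w_mod(spike, 10), w2_mod(spike, 10))
```

A single jump makes the ordinary oscillation large while leaving the grid version of $w''$ at zero; only two jumps within one window make $w''$ large.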
As to Theorem 14, it is shown in Billingsley (1968), Theorem 15.3, that (A), (B) and (C'') together imply the assertion of Theorem 14. So we will prove here only the statement made in (41). For notational convenience we shall write $\xi_n(s,t]$ instead of $\xi_n(t) - \xi_n(s)$ for $0 \le s \le t \le 1$.

a) Given an arbitrary $\varepsilon > 0$, $t \in [0,1)$ and $\delta \le 1 - t$, it follows from Theorem 12.5 in Billingsley (1968) together with (C) that for every $n \in \mathbb{N}$ and every $m \in \mathbb{N}$

    $\mathbb{P}\big(\bigcup_{0 \le i \le j \le k \le m} \{\min[\,|\xi_n(t + \tfrac{i}{m}\delta,\ t + \tfrac{j}{m}\delta]|,\ |\xi_n(t + \tfrac{j}{m}\delta,\ t + \tfrac{k}{m}\delta]|\,] \ge \varepsilon\}\big) \le K(a,b)\,\varepsilon^{-b}(F_n(t + \delta) - F_n(t))^a$,

where $K(a,b)$ is a constant depending only on $a$ and $b$. Therefore, due to the right-continuity of the sample paths of $\xi_n$, putting

    $w''_x([t,t+\delta]) := \sup_{t \le r \le s \le t' \le t + \delta} \min\{|x(r,s]|,\ |x(s,t']|\}$

for $x \in D$, $\delta > 0$ and $t \le 1 - \delta$, it follows that

    $\mathbb{P}(w''_{\xi_n}([t,t+\delta]) \ge \varepsilon) \le K(a,b)\,\varepsilon^{-b}(F_n(t + \delta) - F_n(t))^a$.
b) Let, for any δ>0, m = m(δ) ;= [^j] (where [xl stands for the integer part of x ) ; then, for every n
EB9
P(M;" (δ) ^ ε) ύ Σ p(wy ([-,—1) ^ ε) + Σ p(wy a—1, n
\
a)
i=0
,
[
i=0
n
n
i=0
n
φ
i=0
% ^ 1 ) ^ ε)
n
^ ^ ^
]
,
which implies by (D) that

    $\limsup_{n\to\infty} \mathbb{P}(w''_{\xi_n}(\delta) \ge \varepsilon) \le K(a,b)\,\varepsilon^{-b} \cdot 2\,(w_F(\tfrac{1}{m}))^{a-1}\,(F(1) - F(0))$,

where $w_F(\tfrac{1}{m}) := \sup\{F(t) - F(s): s \le t,\ t - s \le \tfrac{1}{m}\} = w_F(\tfrac{1}{m(\delta)}) \to 0$ as $\delta \to 0$ (since $F$ is uniformly continuous). This proves (C''). □
(42) REMARK. Let us consider in Theorem 14 instead of (B) and (C) the following conditions (B') and (C'), respectively:

(B'): For every $\varepsilon > 0$, $\limsup_{n\to\infty} \mathbb{P}(|\xi_n(\delta) - \xi_n(0)| \ge \varepsilon) \to 0$ as $\delta \to 0$ and $\limsup_{n\to\infty} \mathbb{P}(|\xi_n(1) - \xi_n(\delta)| \ge \varepsilon) \to 0$ as $\delta \to 1$;
j) : There exist constants a.,b. > 0, i=l,2, such that a 1 + a o > 1 and, for every n E U there exist monotone increasing functions F :[θ9ll •*• ]R such that for any O ^ r ^ s ^ t ^ l b b2 E( |ξ ξ n (s) ( ) - ξξnn(r) (r)| | L |ξ |ξnn(t) ( - ξ n (s) | ) ύ ( F ^ s ) -
then (?j) together w i t h ^)
imply ^ ^ , and (c]) implies
THEOREM 15. Let $\xi_n$, $n \in \mathbb{N}$, and $\xi$ be random elements in $(D,\mathcal{B}_b(D,\rho))$, all defined on a common p-space $(\Omega,\mathcal{F},\mathbb{P})$, and suppose that (C) (or (C')) and (D) together with the following conditions (E) and (F) are fulfilled:

(E): $\xi_n \xrightarrow{f.d.} \xi$;
(F): $\mathcal{L}\{\xi\}(\{x \in D: x(1) \ne x(1-0)\}) = 0$;

then $\xi_n \xrightarrow{\mathcal{L}} \xi$.

Proof. As remarked in (42), (C') implies (C) which together with (D) implies (C'') according to (41). But (E) together with (F) and (C'') imply the assertion according to Theorem 15.4 in Billingsley (1968) (cf. also Gaenssler-Stute (1977), Satz 8.5.6). □
In view of Lemma 18 we thus obtain the following $\mathcal{L}_b$-convergence theorem:

THEOREM 16. Let $\xi_n$, $n \in \mathbb{N}$, and $\xi$ be random elements in $(D,\mathcal{B}_b(D,\rho))$, all defined on a common p-space $(\Omega,\mathcal{F},\mathbb{P})$, and suppose that $\mathcal{L}\{\xi\}(C) = 1$. Then (C) (or (C')) together with (D) and (E) imply $\xi_n \xrightarrow{\mathcal{L}_b} \xi$.
The following result is used in G. Shorack's (1979) paper concerning $\xi_n$'s of a special nature.

THEOREM 17. Let, for every $n \in \mathbb{N}$, $T_n := \{t_0^n, t_1^n, \dots, t_{m_n}^n\}$ be such that $0 = t_0^n \le t_1^n \le \dots \le t_{m_n}^n = 1$. Let $(\xi_n)_{n \in \mathbb{N}}$ be a sequence of random elements in $(D,\mathcal{B}_b(D,\rho))$ such that for all $n$ and $i \le m_n$, $\xi_n$ is constant on $[t_{i-1}^n, t_i^n)$, i.e. $w_{\xi_n}([t_{i-1}^n, t_i^n)) = 0$ a.s. Furthermore, assume that the following conditions (i) - (iii) are fulfilled:

(i) $\max_{1 \le i \le m_n} (t_i^n - t_{i-1}^n) \to 0$ as $n \to \infty$;

(ii) There exists a sequence $(F_n)_{n \in \mathbb{N}}$ of monotone increasing functions $F_n: [0,1] \to \mathbb{R}$ such that for some $a > 1$ and $b > 0$

    $\mathbb{P}(|\xi_n(s) - \xi_n(r)| \ge \varepsilon,\ |\xi_n(t) - \xi_n(s)| \ge \varepsilon) \le \varepsilon^{-b}(F_n(t) - F_n(r))^a$

for every $\varepsilon > 0$ and any set $\{r,s,t\} \subset T_n$ with $r \le s \le t$;

(iii) There exists a monotone increasing and continuous function $F: [0,1] \to \mathbb{R}$ such that for the $F_n$'s occurring in (ii) either
    (a) $F_n(t) - F_n(s) \le F(t) - F(s)$ for every $n$ and any $0 \le s \le t \le 1$
or
    (b) $F_n(t) \to F(t)$ as $n \to \infty$ for every $t \in [0,1]$.

Then $(\xi_n)$ satisfies (C) and (D).
Proof. Let, for each $n \in \mathbb{N}$, $\varphi_n: [0,1] \to T_n$ be defined by $\varphi_n(t) := \max\{r \le t: r \in T_n\}$, $t \in [0,1]$. Then, according to (i), $\lim_{n\to\infty} \varphi_n(t) = t$ for every $t \in [0,1]$. Now, put $F_n' := F_n \circ \varphi_n$, $n \in \mathbb{N}$, to get a sequence of monotone increasing functions on $[0,1]$; we are going to show that (C) and (D) are satisfied with $F_n'$ (instead of $F_n$ there):

As to (C), by the assumed nature of the $\xi_n$'s, we have for any $0 \le t_1 \le t_2 \le 1$

    $\xi_n(t_2) - \xi_n(t_1) = \xi_n(\varphi_n(t_2)) - \xi_n(\varphi_n(t_1))$ a.s.,

which implies by (ii) that for every $\varepsilon > 0$ and any $0 \le r \le s \le t \le 1$

    $\mathbb{P}(|\xi_n(s) - \xi_n(r)| \ge \varepsilon,\ |\xi_n(t) - \xi_n(s)| \ge \varepsilon) \le \varepsilon^{-b}(F_n'(t) - F_n'(r))^a$,

which proves (C). As to (D), we have to show that for any $0 \le s \le t \le 1$

(+)    $\limsup_{n\to\infty} (F_n'(t) - F_n'(s)) \le F(t) - F(s)$.

But this follows easily from (iii); in fact, (iii)(a) implies that for any $0 \le s \le t \le 1$

    $F_n'(t) - F_n'(s) = F_n(\varphi_n(t)) - F_n(\varphi_n(s)) \le F(\varphi_n(t)) - F(\varphi_n(s)) \to F(t) - F(s)$ as $n \to \infty$,

which implies (+). On the other hand, (iii)(b) implies by the Polya-Cantelli theorem that $\sup_{t \in [0,1]} |F_n(t) - F(t)| \to 0$ as $n \to \infty$ and therefore, for any $t \in [0,1]$,

    $|F_n'(t) - F(t)| \le |F(t) - F(\varphi_n(t))| + |F(\varphi_n(t)) - F_n(\varphi_n(t))| \to 0$ as $n \to \infty$,

which implies (+). □
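The map $\varphi_n(t) := \max\{r \le t: r \in T_n\}$ used in the proof can be sketched in a few lines. This is an illustration under stated assumptions, not the author's construction: the grid is passed as a sorted Python list containing 0, and the names `phi` and `mesh` are chosen here for convenience.

```python
from bisect import bisect_right


def phi(T, t):
    """phi_n(t) = max{r <= t : r in T_n} for a sorted grid T_n with T_n[0] = 0."""
    return T[bisect_right(T, t) - 1]


def mesh(T):
    """Largest gap between neighbouring grid points, as in condition (i)."""
    return max(b - a for a, b in zip(T, T[1:]))


if __name__ == "__main__":
    T = [0.0, 0.25, 0.5, 1.0]
    for t in (0.0, 0.3, 0.5, 0.99, 1.0):
        # t - phi(T, t) never exceeds the mesh, so phi_n(t) -> t under (i).
        print(t, phi(T, t), t - phi(T, t) <= mesh(T))
```

Composing a monotone increasing function with `phi` gives a step function that is constant between grid points, which is the role played by $F_n' = F_n \circ \varphi_n$ in the proof.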
This concludes our short review of some of the key results in Billingsley's (1968) book to be used in Section 4 when proving functional central limit theorems for weighted empirical processes along the lines of Shorack's (1979) paper; concerning the $\mathcal{L}_b$-statements there (cf. Theorem 18 and 19 in Section 4) it is possible to modify the above mentioned criteria in Billingsley's book in such a way that they allow for proofs working totally within the theory of $\mathcal{L}_b$-convergence (cf. Remark (73)(b) in Section 4) as it will be the case for the following example concerning Donsker's functional central limit theorem for the uniform empirical process $\alpha_n = (\alpha_n(t))_{t \in [0,1]}$, defined by

    $\alpha_n(t) := n^{1/2}(U_n(t) - t)$, $t \in [0,1]$,

where $U_n$ is the empirical distribution function based on independent random variables having uniform distribution on $[0,1]$. According to (37), $\alpha_n$ can be considered as a random element in $(D,\mathcal{B}_b(D,\rho))$ as well as in $(D,\mathcal{B}(D,s))$ (cf. (38)) and it follows from the multidimensional Central Limit Theorem that

(43)    $\alpha_n \xrightarrow{f.d.} B^\circ$,

where $B^\circ = (B^\circ(t))_{t \in [0,1]}$ is the Brownian bridge.
As to $B^\circ$, having all its sample paths in the separable and closed subspace $C = (C,\rho)$ of $D = (D,\rho)$, it follows from (39) that $\mathcal{L}\{B^\circ\}$, being originally defined on $\mathcal{B}(C,\rho)$, may be considered as well on $\mathcal{B}_b(D,\rho)$ having the additional property that $\mathcal{L}\{B^\circ\}(C) = 1$. Therefore, $B^\circ$ may be considered as a random element in $(D,\mathcal{B}_b(D,\rho))$, too, with $\mathcal{L}\{B^\circ\}$ being concentrated on $C$, whence by Lemma 18 one has

(44)    (i) $\alpha_n \xrightarrow{\mathcal{L}} B^\circ$   iff   (ii) $\alpha_n \xrightarrow{\mathcal{L}_b} B^\circ$.
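For readers who want to experiment with the uniform empirical process, here is a minimal sketch, assuming nothing beyond the definition $\alpha_n(t) = n^{1/2}(U_n(t) - t)$. The function names are illustrative; `sup_norm_alpha` uses the standard fact that the supremum of $|U_n(t) - t|$ is determined by the values just before and at the order statistics.

```python
import math
import random


def empirical_df(sample, t):
    """U_n(t): fraction of observations that are <= t."""
    return sum(1 for u in sample if u <= t) / len(sample)


def alpha(sample, t):
    """Uniform empirical process alpha_n(t) = sqrt(n) * (U_n(t) - t)."""
    return math.sqrt(len(sample)) * (empirical_df(sample, t) - t)


def sup_norm_alpha(sample):
    """sup over t in [0,1] of |alpha_n(t)|, computed from the order statistics."""
    n = len(sample)
    u = sorted(sample)
    d = max(max((i + 1) / n - u[i], u[i] - i / n) for i in range(n))
    return math.sqrt(n) * d


if __name__ == "__main__":
    random.seed(0)  # fixed seed only to make the demonstration repeatable
    for n in (10, 100, 1000):
        sample = [random.random() for _ in range(n)]
        print(n, sup_norm_alpha(sample))
```

The printed values stay of order one as $n$ grows, which is what one expects if $\alpha_n$ has a non-degenerate limit in the supremum metric.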
It was conjectured by J.L. Doob (1949) and shown by M.D. Donsker (1952) that (44)(i) holds true. There are various ways of proving this result which is known as Donsker's functional central limit theorem for the uniform empirical process: One may e.g. use Theorem 15 by showing that the hypotheses (C) and (D) are fulfilled (cf. Gaenssler-Stute (1977), Lemma 10.2.2) or one may apply Theorem 15.5 in Billingsley's (1968) book; as to the latter one has to show that

(45)    For each positive $\varepsilon$ and $\eta$ there exist a $\delta$, $0 < \delta < 1$, and an integer $n_0$ such that for all $n \ge n_0$
            $\mathbb{P}(w_{\alpha_n}(\delta) > \varepsilon) < \eta$,
        where
            $w_x(\delta) := \sup_{|t-s| < \delta;\ t,s \in [0,1]} |x(t) - x(s)|$ for $x \in D$.

(By the way, it follows from Theorem 15.5 in Billingsley (1968) together with Lemma 18 that (45) is a sufficient condition for $(\alpha_n)_{n \in \mathbb{N}}$ to be relatively $\mathcal{L}_b$-sequentially compact.)
As to (45), this can be shown either by using Donsker's invariance principle for partial sum processes (in case of independent exponential random variables) (cf. L. Breiman (1968), problem 9, p. 296) or by more direct computations using the structural properties of empirical measures as presented in Section 1 (cf. W. Stute (1982)) yielding at the same time an independent proof of (44)(ii) within the theory of $\mathcal{L}_b$-convergence in $(D,\rho)$; in fact, it can be shown (cf. Proposition B$_2$ in Section 4) that (45) implies δ-tightness of $(\mathcal{L}\{\alpha_n\})_{n \in \mathbb{N}}$ w.r.t. $S_0 = C[0,1]$, and therefore Theorem 11* together with an application of Theorem 3 yields (44)(ii) in view of (43). This also indicates the way to prove Functional Central Limit Theorems for more general empirical processes (empirical C-processes indexed by classes C of sets) in the setting of $\mathcal{L}_b$-convergence of random elements in appropriately chosen metric spaces. Before doing this in the next section we want to supplement the present one by some remarks on random change of time (cf. Billingsley (1968), Chapter 3, 17.).
RANDOM CHANGE OF TIME: Following Billingsley (1968) we will briefly indicate here that so-called random change of time arguments are valid also within the context of $\mathcal{L}_b$-convergence (even with simplified proofs not relying on Skorokhod's topology); in this connection the reader should recall our remarks on product spaces.

For this, let $D_0$ consist of those elements $\varphi$ of $D \equiv D[0,1]$ that are increasing and satisfy $0 \le \varphi(t) \le 1$ for all $t$. Such a $\varphi$ represents a transformation of the time interval $[0,1]$. We topologize $D_0$ by relativizing the uniform topology of $D$. Then (37) implies that $D_0 \in \mathcal{B}_b(D)$ and therefore

    $\mathcal{B}_b(D_0) \subset \mathcal{A}_0 := D_0 \cap \mathcal{B}_b(D) = \{B \subset D_0: B \in \mathcal{B}_b(D)\} \subset \mathcal{B}(D_0)$.

For $x \in D$ and $\varphi \in D_0$, let $x \circ \varphi: [0,1] \to \mathbb{R}$ be defined by $(x \circ \varphi)(t) := x(\varphi(t))$, $t \in [0,1]$. Then $x \circ \varphi$ lies in $D$ and, if $\psi: D \times D_0 \to D$ is defined by $\psi(x,\varphi) := x \circ \varphi$, then $\psi$ is $\mathcal{B}_b(D) \otimes \mathcal{A}_0,\ \mathcal{B}_b(D)$-measurable, i.e. one has

(+)    $\psi^{-1}(\mathcal{B}_b(D)) \subset \mathcal{A} := \mathcal{B}_b(D) \otimes \mathcal{A}_0$,

where $\mathcal{A}$ is a σ-algebra in the product space $S = D \times D_0$ (being equipped with the maximum metric $d$ (cf. our remarks on product spaces)) such that $\mathcal{B}_b(S) \subset \mathcal{A} \subset \mathcal{B}(S)$.

ad (+): cf. Billingsley (1968), p. 232, for a proof being based on the fact that $\mathcal{B}_b(D) = \sigma(\{\pi_t: t \in [0,1]\})$ by (37). □
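Now, let $\xi_n$, $n \in \mathbb{N}$, and $\xi$ be random elements in $(D,\mathcal{B}_b(D))$ and, in addition, let $\eta_n$, $n \in \mathbb{N}$, and $\eta$ be random elements in $(D_0,\mathcal{A}_0)$, all defined on a common p-space $(\Omega,\mathcal{F},\mathbb{P})$. Then $(\xi_n,\eta_n)$, $n \in \mathbb{N}$, and $(\xi,\eta)$ are random elements in

    $(S,\mathcal{A}) = (D \times D_0,\ \mathcal{B}_b(D) \otimes \mathcal{A}_0)$

and so, by (+), $\xi_n \circ \eta_n = \psi(\xi_n,\eta_n)$, $n \in \mathbb{N}$, and $\xi \circ \eta = \psi(\xi,\eta)$ are random elements in $(D,\mathcal{B}_b(D))$ resulting from subjecting $\xi_n$ and $\xi$ to the random change of time represented by $\eta_n$ and $\eta$, respectively.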
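The map $\psi(x,\varphi) = x \circ \varphi$ is just composition of functions. The sketch below is illustrative only; `compose` and `step` are names introduced here, `step` builds a right-continuous counting function as a stand-in for an element of $D$, and the time change $t \mapsto t^2$ is an arbitrary example of an increasing map of $[0,1]$ into itself.

```python
def compose(x, phi):
    """psi(x, phi) = x o phi, i.e. t -> x(phi(t))."""
    return lambda t: x(phi(t))


def step(jumps):
    """Right-continuous step function t -> number of jump points <= t."""
    return lambda t: sum(1 for s in jumps if s <= t)


if __name__ == "__main__":
    x = step([0.2, 0.6])          # an element of D with two unit jumps
    phi = lambda t: t * t         # increasing, maps [0,1] into [0,1]
    y = compose(x, phi)
    # The jumps of y sit where phi crosses 0.2 and 0.6.
    for t in (0.0, 0.4, 0.5, 0.8, 1.0):
        print(t, y(t))
```

Since the time change here is continuous and increasing, the composed function is again right continuous with left limits, in line with the statement that $x \circ \varphi$ lies in $D$.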
Concerning a "$(\xi_n,\eta_n) \xrightarrow{\mathcal{L}_b} (\xi,\eta)$"-statement, $(\xi,\eta)$ may be considered as a random element in $(S,\mathcal{B}_b(S))$, since $\mathcal{B}_b(S) \subset \mathcal{A}$, thus being in accordance with our definition of $\mathcal{L}_b$-convergence. When asking for conditions under which

(++)    $(\xi_n,\eta_n) \xrightarrow{\mathcal{L}_b} (\xi,\eta)$ implies $\xi_n \circ \eta_n \xrightarrow{\mathcal{L}_b} \xi \circ \eta$

we know from the continuous mapping theorem (Theorem 4) that (++) holds if $\psi$ is $\mathcal{A},\mathcal{B}_b(D)$-measurable and $\mathcal{L}\{(\xi,\eta)\}$-a.e. $d$-continuous. Now, the required measurability of $\psi$ is guaranteed by (+) and it follows as in Billingsley (1968), p. 145, that $\psi$ is also $\mathcal{L}\{(\xi,\eta)\}$-a.e. $d$-continuous if $\mathcal{L}\{\xi\}(C) = \mathcal{L}\{\eta\}(C) = 1$ for $C \equiv C[0,1]$; in fact, if $\mathcal{L}\{\xi\}$ and $\mathcal{L}\{\eta\}$ concentrate on $C$, then $\mathcal{L}\{(\xi,\eta)\}(C \times (C \cap D_0)) = 1$, and it is easy to show that $\psi$ is $d$-continuous on $C \times (C \cap D_0)$.

It remains of course the question of when $(\xi_n,\eta_n) \xrightarrow{\mathcal{L}_b} (\xi,\eta)$ holds and here Theorem 9c can be used leading to the following result on stability of $\mathcal{L}_b$-convergence in $D \equiv D[0,1]$ under random change of time:
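THEOREM. Suppose that $\xi_n$, $n \in \mathbb{N}$, and $\xi$ are random elements in $(D,\mathcal{B}_b(D))$ such that $\xi_n \xrightarrow{\mathcal{L}_b} \xi$ and $\mathcal{L}\{\xi\}(C) = 1$. Let $\eta_n$, $n \in \mathbb{N}$, and $\eta$ be random elements in $(D_0,\mathcal{A}_0)$ such that $\eta_n \xrightarrow{\mathcal{L}_b} \eta$ and $\eta$ equals $\mathbb{P}$-a.s. some function belonging to $C \equiv C[0,1]$ *). Then $\xi_n \circ \eta_n$, $n \in \mathbb{N}$, and $\xi \circ \eta$ are random elements in $(D,\mathcal{B}_b(D))$ for which $\xi_n \circ \eta_n \xrightarrow{\mathcal{L}_b} \xi \circ \eta$.

*) This last assumption may be omitted by considering instead the set $C \times \{c\}$ as separable support of $\mathcal{L}\{(\xi,\eta)\}$ if $\eta = c$ $\mathbb{P}$-a.s.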
4. Functional Central Limit Theorems.

In the last section we have already mentioned Donsker's functional central limit theorem for the uniform empirical process $\alpha_n = (\alpha_n(t))_{t \in [0,1]}$, where $\alpha_n(t) = n^{1/2}(U_n(t) - t)$, $U_n(t)$ being the empirical df based on independent random variables $\eta_i$ having uniform distribution on the sample space $X = [0,1]$ with its Borel σ-algebra $\mathcal{B} = [0,1] \cap \mathbb{B}$. In the setting of an empirical C-process $\beta_n \equiv (\beta_n(C))_{C \in \mathcal{C}}$ the uniform empirical process $\alpha_n$ is a very special case taking $\mathcal{C} = \{[0,t]: t \in [0,1]\}$ and identifying $\alpha_n(t)$ with $\beta_n(C) = n^{1/2}(\mu_n(C) - \mu(C))$ for $C = [0,t]$, $\mu_n$ being the empirical measure based on $\eta_1,\dots,\eta_n$ and $\mu$ being the uniform distribution on $[0,1]$; note that $\mu_n(C) = U_n(t)$ and $\mu(C) = t$ for $C = [0,t]$. The present section is concerned with some extensions of Donsker's functional central limit theorem in its form (44)(ii) to more general situations.
FUNCTIONAL CENTRAL LIMIT THEOREMS FOR EMPIRICAL C-PROCESSES: Let $X = (X,\mathcal{B})$ be an arbitrary measurable space considered as a sample space for a given sequence $\xi_1,\xi_2,\dots$ of i.i.d. random elements in $(X,\mathcal{B})$, the $\xi_i$'s being defined on some common p-space $(\Omega,\mathcal{F},\mathbb{P})$ with law $\mu$ on $\mathcal{B}$. If not stated otherwise we will consider the canonical model $(\Omega,\mathcal{F},\mathbb{P}) = (X^{\mathbb{N}}, \mathcal{B}_{\mathbb{N}}, \times_{\mathbb{N}}\, \mu)$ with the $\xi_i$'s being the coordinate projections of $X^{\mathbb{N}}$ onto $X$.

Let $\mu_n(B) = \frac{1}{n} \sum_{i=1}^{n} 1_B(\xi_i)$, $B \in \mathcal{B}$, be the empirical measure based on $\xi_1,\dots,\xi_n$.

Now, given some subclass $\mathcal{C}$ of $\mathcal{B}$, consider the empirical C-process $\beta_n \equiv (\beta_n(C))_{C \in \mathcal{C}}$ defined by

    $\beta_n(C) := n^{1/2}(\mu_n(C) - \mu(C))$, $C \in \mathcal{C}$,

as a stochastic process (on $(\Omega,\mathcal{F},\mathbb{P})$) indexed by $\mathcal{C}$. As mentioned in Section 1, its covariance structure is given by

    $\mathrm{cov}(\beta_n(C_1),\beta_n(C_2)) = \mu(C_1 \cap C_2) - \mu(C_1)\mu(C_2)$, $C_1, C_2 \in \mathcal{C}$.
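The covariance formula can be verified exactly in a toy setting. The following sketch is an illustration under explicit assumptions: the sample space is a small finite set, $\mu$ is the uniform distribution on it, and the helper names `mu` and `exact_cov` are introduced here. It enumerates all samples of size $n$ with exact rational arithmetic and computes $n \cdot \mathrm{cov}(\mu_n(C_1),\mu_n(C_2))$, which equals $\mathrm{cov}(\beta_n(C_1),\beta_n(C_2))$.

```python
from fractions import Fraction
from itertools import product


def mu(C, X):
    """Uniform probability of C within the finite sample space X."""
    return Fraction(len(C & X), len(X))


def exact_cov(C1, C2, X, n):
    """cov(beta_n(C1), beta_n(C2)) = n * cov(mu_n(C1), mu_n(C2)),
    computed by enumerating all |X|**n equally likely samples."""
    points = sorted(X)
    p = Fraction(1, len(points) ** n)
    e1 = e2 = e12 = Fraction(0)
    for sample in product(points, repeat=n):
        m1 = Fraction(sum(x in C1 for x in sample), n)   # mu_n(C1)
        m2 = Fraction(sum(x in C2 for x in sample), n)   # mu_n(C2)
        e1 += p * m1
        e2 += p * m2
        e12 += p * m1 * m2
    return n * (e12 - e1 * e2)


if __name__ == "__main__":
    X = frozenset(range(6))
    C1, C2 = frozenset({0, 1, 2}), frozenset({0, 1})
    formula = mu(C1 & C2, X) - mu(C1, X) * mu(C2, X)
    for n in (1, 2, 3):
        print(n, exact_cov(C1, C2, X, n), formula)
```

The enumerated value agrees with $\mu(C_1 \cap C_2) - \mu(C_1)\mu(C_2)$ for every $n$, reflecting that the covariance structure of $\beta_n$ does not depend on $n$.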
So, the analogue of (44)(ii) would be the statement that (in the sense of (34))

(46)    $\beta_n \xrightarrow{\mathcal{L}_b} G_\mu$, $G_\mu = (G_\mu(C))_{C \in \mathcal{C}}$ being a mean-zero Gaussian process with $\mathrm{cov}(G_\mu(C_1),G_\mu(C_2)) = \mu(C_1 \cap C_2) - \mu(C_1)\mu(C_2)$, $C_1, C_2 \in \mathcal{C}$.

But this amounts at first to make a proper choice for a metric space $S = (S,d)$ together with a suitable separable subspace $S_0$ serving as sample spaces for $\beta_n$ and its limiting process $G_\mu$, respectively.
Following Dudley (1978) we propose to choose

    $S_0 = U^b(\mathcal{C},d_\mu) := \{\varphi: \mathcal{C} \to \mathbb{R}: \varphi$ bounded and uniformly $d_\mu$-continuous$\}$,

where $d_\mu$ is the pseudo-metric defined on $\mathcal{C}$ by

    $d_\mu(C_1,C_2) := \mu(C_1 \,\Delta\, C_2)$, $C_1, C_2 \in \mathcal{C}$,

($C_1 \,\Delta\, C_2$ denoting the symmetric difference between $C_1$ and $C_2$). Note that, concerning the $\mu(C)$-part of $\beta_n(C)$, $C \mapsto \mu(C)$ is a function belonging to $S_0$ (since $|\mu(C_1) - \mu(C_2)| \le d_\mu(C_1,C_2)$).

In order to cope also with the $\mu_n(C)$-part of $\beta_n(C)$ (and the factor $n^{1/2}$), let

    $S \equiv D_0(\mathcal{C},\mu) := \{\varphi = \varphi_1 + \varphi_2: \varphi_1 \in S_0$ and $\varphi_2 = \sum_{i=1}^{k} a_i \varepsilon_{x_i}$ for some $a_i \in \mathbb{R}$, $x_i \in X$, $1 \le i \le k$, $k \in \mathbb{N}\}$.

Note that $S$ is a linear space containing $S_0$ as a linear subspace; also $\beta_n(\cdot,\omega) \in S$ for all $\omega \in \Omega$. Finally, let $S$ (and its subspace $S_0$) be metrized by the metric $d := \rho$, where $\rho$ is the supremum-metric, i.e.,

    $\rho(\varphi',\varphi'') := \sup_{C \in \mathcal{C}} |\varphi'(C) - \varphi''(C)|$ for $\varphi',\varphi'' \in S$.

Note that the closure $D(\mathcal{C},\mu)$ of $D_0(\mathcal{C},\mu)$ in the Banach space $\ell^\infty(\mathcal{C}) = (\ell^\infty(\mathcal{C}),\rho)$ of all bounded real-valued functions on $\mathcal{C}$ can be considered as an extension of $D = D[0,1]$ in the classical case, where $X = [0,1]$, $\mathcal{C} = \{[0,t]: t \in [0,1]\}$, and $\mu$ is the uniform distribution on $[0,1]$ or any other distribution on $[0,1]$ with a strictly increasing distribution function; also, in the latter case, $U^b(\mathcal{C},d_\mu)$ equals $C[0,1]$ after identifying $\varphi([0,t])$ with $x(t)$.
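The two facts used above about $d_\mu$, namely that it is a pseudo-metric and that $|\mu(C_1) - \mu(C_2)| \le d_\mu(C_1,C_2)$, can be checked exhaustively on a small example. This is a hedged sketch: the uniform distribution on a four-point set and the helper names `mu`, `d_mu` and `all_subsets` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations


def mu(C, X):
    """Uniform probability of C, where C is a subset of the finite set X."""
    return Fraction(len(C), len(X))


def d_mu(C1, C2, X):
    """Pseudo-metric d_mu(C1, C2) = mu(C1 symmetric-difference C2)."""
    return mu(C1 ^ C2, X)


def all_subsets(X):
    """All subsets of X as frozensets."""
    points = sorted(X)
    return [frozenset(c)
            for k in range(len(points) + 1)
            for c in combinations(points, k)]


if __name__ == "__main__":
    X = frozenset(range(4))
    subsets = all_subsets(X)
    lipschitz = all(abs(mu(A, X) - mu(B, X)) <= d_mu(A, B, X)
                    for A in subsets for B in subsets)
    triangle = all(d_mu(A, C, X) <= d_mu(A, B, X) + d_mu(B, C, X)
                   for A in subsets for B in subsets for C in subsets)
    print(lipschitz, triangle)
```

The first check is the reason why $C \mapsto \mu(C)$ is uniformly $d_\mu$-continuous and hence belongs to $S_0$.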
Having made this choice for $S_0$, $S$ and $d$, in view of (46) the following problems still remain:

PROBLEM (a) (MEASURABILITY): Find conditions under which the $\beta_n$'s can be viewed as random elements in $(S,\mathcal{A})$ for some σ-algebra $\mathcal{A}$ in $S$ such that one meets the situation of Section 3, i.e.

(47)    $\mathcal{B}_b(S,\rho) \subset \mathcal{A} \subset \mathcal{B}(S,\rho)$

(with $\mathcal{B}_b(S,\rho)$ being the σ-algebra generated by the open $\rho$-balls in $S$, and $\mathcal{B}(S,\rho)$ being the Borel σ-algebra in $(S,\rho)$).

Taking $\mathcal{A} := \sigma(\{\pi_C: C \in \mathcal{C}\})$, with $\pi_C: S \to \mathbb{R}$ being defined by $\pi_C(\varphi) := \varphi(C)$, $C \in \mathcal{C}$, $\beta_n$ is $\mathcal{F},\mathcal{A}$-measurable (since $\mathcal{F},\sigma(\{\pi_C: C \in \mathcal{C}\})$-measurability of $\beta_n$ is equivalent with $\mathcal{F},\mathbb{B}$-measurability of $\pi_C(\beta_n) = \beta_n(C)$ for each fixed $C \in \mathcal{C}$, the latter being satisfied since $\beta_n(C)$ is a random variable (on $(\Omega,\mathcal{F},\mathbb{P})$) for each fixed $C$), but the first inclusion in (47) fails to hold, in general: in fact, looking back to (10) in Section 1, it follows that in the example considered there $\beta_n$ is not even $\mathcal{F},\mathcal{B}_b(S,\rho)$-measurable.

So, we will restrict our consideration to cases where the following measurability condition

(M):    $\mathcal{B}_b(S,\rho) \subset \mathcal{A} := \sigma(\{\pi_C: C \in \mathcal{C}\})$

is fulfilled, which turns out to be satisfied in important cases of interest; note that (M) implies (47), since the other inclusion there holds trivially due to the $\rho$-continuity of the $\pi_C$'s for each fixed $C \in \mathcal{C}$.
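LEMMA 20. Suppose that $\mathcal{C}$ fulfills the following condition

(SE): There exists a countable subclass $\mathcal{D}$ of $\mathcal{C}$ such that for any $C \in \mathcal{C}$ there exists a sequence $(D_n)_{n \in \mathbb{N}}$ in $\mathcal{D}$ with $1_{D_n}(x) \to 1_C(x)$ for all $x \in X$;

then (M) holds true.

Proof. (SE) implies that for any $C \in \mathcal{C}$ there exists a sequence $(D_n)_{n \in \mathbb{N}}$ in $\mathcal{D}$ such that $\lim_{n\to\infty} d_\mu(D_n,C) = 0$ from which it follows that $\varphi_1(C) = \lim_{n\to\infty} \varphi_1(D_n)$ for every $\varphi_1 \in S_0$; on the other hand, since $1_{D_n}(x) \to 1_C(x)$ for all $x$ is equivalent with $\lim_{n\to\infty} \varepsilon_x(D_n) = \varepsilon_x(C)$ for all $x$, we obtain $\varphi(C) = \lim_{n\to\infty} \varphi(D_n)$ for every $\varphi \in S$. But from this it follows that for any $\varphi_0 \in S$ and any $r > 0$

    $\{\varphi \in S: \rho(\varphi,\varphi_0) \le r\} = \bigcap_{D \in \mathcal{D}} \{\varphi \in S: |\varphi(D) - \varphi_0(D)| \le r\} \in \mathcal{A}$,

since $\mathcal{D}$ is countable, implying (M). □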
(48) EXAMPLES. (a) Let $(X,\mathcal{B}) = (\mathbb{R}^k,\mathbb{B}_k)$, $k \ge 1$, and let $\mathcal{C}$ be the class $\mathcal{J}_k$ of all lower left orthants or the class $\mathcal{B}_k$ of all closed Euclidean balls in $\mathbb{R}^k$, respectively; then (SE) and therefore (M) holds true for $\mathcal{C} = \mathcal{J}_k$ and $\mathcal{C} = \mathcal{B}_k$, respectively.

(b) If we consider instead e.g. the class $\mathcal{C} := \{C_0 + z: z \in \mathbb{R}^k\}$, $C_0$ being a fixed closed Euclidean ball in $\mathbb{R}^k$, then (SE) fails to hold: in fact, no $\mathcal{D} = \{C_0 + q: q \in R\}$ with countable $R \subset \mathbb{R}^k$ can serve as a countable subclass of $\mathcal{C}$ with the desired property stated in (SE), since for any fixed $z \in \mathbb{R}^k \setminus R$ and any $D \in \mathcal{D}$ there exists a $y_0 \in \mathbb{R}^k$ such that

    $1 = 1_{C_0 + z}(y_0) \ne 1_D(y_0) = 0$

(cf. FIGURE 4).

FIGURE 4
We shall see below how to cope also with examples where (SE) fails to hold. For this another measurability assumption (M$_0$) weaker than (M) will be needed. It should be noticed (cf. the proof of Lemma 20) that in case of (SE) we have SEPARABILITY of the process $\beta_n \equiv (\beta_n(C))_{C \in \mathcal{C}}$ in the sense that each sample path of $\beta_n$ is uniquely determined by its values on $\mathcal{D}$.

Let us make some further remarks at this place: first, note that (M) implies

(49)    $\mathcal{B}_b(T,\rho) = \sigma(\{\pi_C(T): C \in \mathcal{C}\}) = \mathcal{B}(T,\rho)$ for any separable subspace $T$ of $S$, with $\pi_C(T): T \to \mathbb{R}$ being defined by $\pi_C(T)(\varphi) := \varphi(C)$.

In fact, the same reasoning which gave us (39) in Section 3 yields

(50)    $\mathcal{B}_b(T,\rho) = T \cap \mathcal{B}_b(S,\rho)$ for any separable subspace $T$ of $S$,

whence (cf. Lemma 11 (iv)), using (M) for the first inclusion,

    $\mathcal{B}_b(T,\rho) = T \cap \mathcal{B}_b(S,\rho) \subset T \cap \mathcal{A} = \sigma(\{\pi_C(T): C \in \mathcal{C}\}) \subset \mathcal{B}(T,\rho) = \mathcal{B}_b(T,\rho)$,

which proves (49). Next, concerning $S_0 = U^b(\mathcal{C},d_\mu)$, it follows even without imposing (M) that

(49*)    $\mathcal{B}_b(S_0,\rho) = \sigma(\{\pi_C(S_0): C \in \mathcal{C}\}) = \mathcal{B}(S_0,\rho)$, provided that $\mathcal{C}$ is totally bounded for $d_\mu$.

In fact, if $\mathcal{C}$ is totally bounded for $d_\mu$, there exists a countable $d_\mu$-dense subset $\mathcal{D}$ of $\mathcal{C}$ implying, due to the $d_\mu$-continuity of functions belonging to $S_0$, that for any $\varphi_0 \in S_0$ and any $r > 0$

    $\{\varphi \in S_0: \rho(\varphi,\varphi_0) \le r\} = \bigcap_{D \in \mathcal{D}} \{\varphi \in S_0: |\varphi(D) - \varphi_0(D)| \le r\} \in \sigma(\{\pi_C(S_0): C \in \mathcal{C}\})$,

whence $\mathcal{B}_b(S_0,\rho) \subset \sigma(\{\pi_C(S_0): C \in \mathcal{C}\}) \subset \mathcal{B}(S_0,\rho)$; on the other hand, using the Stone-Weierstraß theorem, it can be shown that

(51)    $S_0 = U^b(\mathcal{C},d_\mu)$ is separable and $\rho$-closed (i.e. $S_0^c = S_0$), provided that $\mathcal{C}$ is totally bounded for $d_\mu$.

This proves (49*). For later use it is important to note that (49*) together with (50) and (51) imply

LEMMA 21. Let $\mathcal{C}$ be totally bounded for $d_\mu$
and suppose that (B = (G (C))pw,
= lΓ(C,d ) ; then jζ ° J. K
This leads us to the next PROBLEM (b) : (EXISTENCE OF A VERSION OF g Let Φ
= (G (C))
r
Ξ (g ( O ) c c C in S
=_ (^(C^d ) ) :
be a mean-zero Gaussian process with covariance structure
(cf. Section 1, (4)) cov(G y (C 1 ), 6 (C 2 )) = μ ( C 1 Π C 2 ) - μ ( C 1 ) μ ( C 2 ) > C ^ Noticing that the fidis of 3
G C.
(viewed as a random element in (S,A) with
A := σ ( { π : C G C})) are well defined, we have according to (4) of Section 1 (52)
3 -Ξ-J C , G
being viewed as the coordinate process on
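(where $\mathcal{L}\{G_\mu\}$ is uniquely determined by the fidis of

For $\varepsilon > 0$ let $N(\varepsilon,\mathcal{C},\mu)$ be the smallest $n \in \mathbb{N}$ such that $\mathcal{C} = \bigcup_{j=1}^{n} \mathcal{C}_j$ for some classes $\mathcal{C}_j$ with $d_\mu$-diam$(\mathcal{C}_j) := \sup\{d_\mu(C',C''): C',C'' \in \mathcal{C}_j\} \le 2\varepsilon$ for each $j$; $\log N(\varepsilon,\mathcal{C},\mu)$ is called a METRIC ENTROPY (of $\mathcal{C}$ w.r.t. $\mu$).

Obviously, $N(\varepsilon,\mathcal{C},\mu) < \infty$ for each $\varepsilon > 0$ iff $\mathcal{C}$ is totally bounded for $d_\mu$ (in which case $S_0 = U^b(\mathcal{C},d_\mu)$ is separable and $\rho$-closed, by (51)).

Now, as shown by R.M. Dudley (1967) and (1973), cf. p. 71,

(53)    $G_\mu$ has a version $\tilde G_\mu = (\tilde G_\mu(C))_{C \in \mathcal{C}}$ having all its sample paths in $S_0 \equiv U^b(\mathcal{C},d_\mu)$ provided that

(E$_0$):    $\int_0^1 (\log N(x^2,\mathcal{C},\mu))^{1/2}\, dx < \infty$.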
But it turns out that (E ) is not sufficient to ensure (46); o in fact, disregarding for the moment measurability questions, the following example shows that (46), i.e. 3
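$> 0$ there exists a $\delta = \delta(\varepsilon,\eta)$, $0 < \delta < 1$, and there exists an $n_0 = n_0(\varepsilon,\eta,\delta) \in \mathbb{N}$ such that for $n \ge n_0$

    $\mathbb{P}^*(w_{\beta_n}(\delta) > \varepsilon) < \eta$,

where $w_\varphi(\delta) := \sup\{|\varphi(C_1) - \varphi(C_2)|: d_\mu(C_1,C_2) < \delta,\ C_1, C_2 \in \mathcal{C}\}$ for $\varphi \in S \equiv D_0(\mathcal{C},\mu)$.

(55) REMARK. A comparison with (45) shows the complete analogy with the classical situation $X = [0,1]$, $\mathcal{B} = [0,1] \cap \mathbb{B}$, $\mu$ = uniform distribution on $\mathcal{B}$, $\mathcal{C} = \{[0,t]: t \in [0,1]\}$, where $\beta_n$ can be identified with the uniform empirical process $\alpha_n$; note that, due to the compactness of the unit interval, $\mathcal{C} = \{[0,t]: t \in [0,1]\}$ is totally bounded for $d_\mu$: given any $\varepsilon > 0$ let

    $n_0 := \inf\{n: \tfrac{1}{n} \le 2\varepsilon\}$ and $\mathcal{C}_j := \{[0,t]: \tfrac{j-1}{n_0} \le t \le \tfrac{j}{n_0}\}$, $j = 1,\dots,n_0$;

then $d_\mu$-diam$(\mathcal{C}_j) \le 2\varepsilon$ and $\mathcal{C} = \bigcup_{j=1}^{n_0} \mathcal{C}_j$.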
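The covering in Remark (55) is easy to make concrete. The sketch below is only an illustration: it assumes the uniform distribution on $[0,1]$, so that $d_\mu([0,s],[0,t]) = |t - s|$, represents each class $\mathcal{C}_j$ by the endpoints of its range of $t$-values, and uses exact rationals to avoid rounding at the boundary case $1/n = 2\varepsilon$. The names `n0` and `blocks` are introduced here, and `n0` is an upper bound for $N(\varepsilon,\mathcal{C},\mu)$ rather than a claim about its exact value.

```python
import math
from fractions import Fraction


def n0(eps):
    """Smallest positive integer n with 1/n <= 2*eps."""
    return math.ceil(1 / (2 * eps))


def blocks(eps):
    """Endpoints ((j-1)/n0, j/n0), j = 1..n0, of the t-ranges defining C_j."""
    n = n0(eps)
    return [(Fraction(j - 1, n), Fraction(j, n)) for j in range(1, n + 1)]


if __name__ == "__main__":
    for eps in (Fraction(1, 4), Fraction(1, 10), Fraction(3, 100)):
        cover = blocks(eps)
        # For uniform mu the d_mu-diameter of C_j is the length of its t-range.
        widest = max(b - a for a, b in cover)
        print(eps, len(cover), widest <= 2 * eps)
```

Since finitely many classes of diameter at most $2\varepsilon$ suffice for every $\varepsilon > 0$, the class of intervals is totally bounded for $d_\mu$.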
Before proving Theorem B we will show two auxiliary results:
PROPOSITION B
(cf. Problem ©
above). Suppose that (a) and (b) of Theorem B
are fulfilled; then G
= (S (C)) P ί _ Γ has a version in S = u (C,d ) , i.e.,there μ L/vzL o μ
μ
exists a Gaussian process Φ μ such that Φ
= (G ( C ) ) n ^ p having all its sample paths in S and μ LtL o
L
= ' ) having all its sample paths in U(P,d )
L
and such that Φ -n =, ® -n μ,P f.d. μ,P In fact, once ©
is shown, we can define for each ω ' E Ω
τ
f
G (ω ) as the μ
uniquely determined uniformly d -continuous extension on C of Φ G (C,ω! ) = lim G ( D n , ω ! ) , ( D n ) n e ] N C V
for each C E C
r}(ω') (i.e t
being such that
d (C,D ) -> 0 as n •> «>). It follows that (*) and
(**)
€ (ω1 ) is bounded for each ω τ , whence Φ (ω 1 ) E S βμ
f
=d
for all ω',
βμ .
ad (*): By (a), for every ε>0 there exist an
n
= n (ε) 63N and C. C C, o 3 n o j=l,...,n , such that d -diam (C.) S 2ε and C = U C.. ° U 3 3 j = 1 o
Let ω f 6 Ω 1 be arbitrary but fixed^ since € (ω ! ) is uniformly continuous on C, for each δ>0 there exists an ε = ε(δ,ω ! ) > 0 such that | G μ ( C 1 5 ω ' ) - G ( C 2 5 ω ! ) | < δ whenever d ( C ^ C ^ ^ 2ε for C ^ C ^ E C. Now, given an arbitrary C £ C, there exists a j G {l,...,n } and a C.
EC.
such that d (C,C.) ^ 2ε, and therefore G μ ( C , ω f ) | ^ |G μ (C 5 ω ! ) - G^(C.,ω f )| + |G ( C . ω 1 ) ! £ δ + |G (C.,ω f )|, whence sup |G (C,ω')| ^ δ + sup |G (C.,ω f )| < «. μ CEC l^j^nQ μ : ad (**); Let us confine here to show that L{G (C)} = L{G (C)} for each fixed C 6 C; concerning the higher-dimensional fidis the proof runs in a similar way. Now, given any C E C , let (D )
C V be such that d (C,D ) •* 0 as n -> °°,
whence, by construction, G (C,ω ! ) = lim G (D ,ω ! )
i mJp l 7y i n& g G (D ) - ^ ^ μ n
G ( C ) . Now, by © , L{G (D )} = L{G (D )} = μ > J \~s * μn μn
W(O,μ(D ) ( 1 - μ(D ) ) ) (cf.
n
for all ω 1 E Ω ! ,
(3) of S e c t i o n 1) for
n
each n E ]N, where
μ(D ) -*• μ(C) as n •>0
ι/ n (δ) f ^ ( δ ) φ φ
as P
t V9 n
whence for any ε>0 we have (c)
{φ El?:
(/(δ) > ε} C Φ
U { φ e^: nEΦJ
We are going to show next that ^ Ώ
n
t/ (δ) > ε} as V
t P.
Ψ
is implied by
(R ): For any ε,η > 0 there exists a δ = δ(ε,η), 0 < δ < 1, such that Pp({φ EΈ°: MJ (δ) > ε}) < η. In fact, (R ) implies that for each fixed ε>0 Σ U?p({φ E E : ttf (δ ) > ε } ) ε i f f ψ
(φ(D.. ) , . . . , φ ( D β ) ) E G J.
Λι
with G = Gε ,o_ being some open subset of E . Now, given an arbitrary ε>0 and an arbitrary η>0 choose δ = δ(ε,n) 9 0 < δ < 1 according to (b) such that for all n ^ n (ε,n9δ) (e)
P (UJO (δ) > ε) < η. 3 n
r
Then it follows that for each V
— =
= ί D i 9 , .,D β } C V (C C)
-1
-1
«π
P|Λ
^ lim inf L{β } n-*» n
(G) = L{ϋ } *π
(G)
πr , (G) = lim inf P * (π r _ _ , ββ ) ~ (G) {D 1 9 ...,D £ } ^ {D 1 9 ..,,D £ } n
P1 = lim
inf
P(MJ O
n-x»
* (6)
> ε) ύ lim
n
inf
P
(M; H
n-*»
(δ)
> ε)
n
ύ
η,
(e)
where for the first inequality above we made use of (28) and the fact that according to (52) and (**) This proves Proposition B ^
3 ,. •, > Φ . n r.α. y D
(56) REMARK. The proof just given of Proposition B
shows that in order to get
a result like (53), it suffices to show that an entropy condition like (E ) implies (R ). This was nicely demonstrated by D. Pollard (1982) in one of his Seminar talks at Seattle using an analogue of the chaining argument of R.M. Dudley ((1978), pp. 915, 924); cf. also D. Pollard (1981), pp. 191-192.
PROPOSITION B$_2$. Suppose that (a) and (b) of Theorem B are fulfilled and also (M); then $(\mathcal{L}\{\beta_n\})_{n \in \mathbb{N}}$ is δ-tight w.r.t. $S_0 = U^b(\mathcal{C},d_\mu)$. (Note again that $\mathcal{L}\{\beta_n\} \in M_a^1(S)$, $S \equiv D_0(\mathcal{C},\mu)$.)

For the proof of Proposition B$_2$ we will make use of the

Kirszbraun-McShane-Theorem (cf. M.D. Kirszbraun (1934) and McShane (1934)): let $S = (S,d)$ be a metric space, $A \subset S$, and let $\varphi$ be a real-valued function defined on $A$ such that $\sup\{|\varphi(x) - \varphi(y)|/d(x,y): x,y \in A,\ x \ne y\} =: K < \infty$; then $\varphi$ can be extended to a function $\psi$ on all of $S$ with $\sup\{|\psi(x) - \psi(y)|/d(x,y): x,y \in S,\ x \ne y\} = K$.
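One standard way to obtain such an extension for real-valued functions is McShane's formula $\psi(x) := \inf_{a \in A} (\varphi(a) + K\, d(x,a))$. The sketch below illustrates it under simplifying assumptions that are not in the text: $A$ is a finite set of integers, $d(x,y) = |x - y|$, and $\varphi$ is stored as a dictionary; the function names are chosen here.

```python
from fractions import Fraction


def lipschitz_constant(phi, A):
    """K = max |phi(a) - phi(b)| / |a - b| over distinct points of A."""
    return max(Fraction(abs(phi[a] - phi[b]), abs(a - b))
               for a in A for b in A if a != b)


def mcshane_extension(phi, A, K):
    """psi(x) = min over a in A of phi(a) + K * |x - a| (McShane's formula)."""
    return lambda x: min(phi[a] + K * abs(x - a) for a in A)


if __name__ == "__main__":
    A = [0, 1, 3]
    phi = {0: 0, 1: 2, 3: 1}
    K = lipschitz_constant(phi, A)
    psi = mcshane_extension(phi, A, K)
    print("K =", K)
    for x in range(-2, 6):
        print(x, psi(x))
```

Each function $x \mapsto \varphi(a) + K|x - a|$ has Lipschitz constant $K$, and a minimum of such functions inherits it; on $A$ the minimum is attained at $a = x$, so $\psi$ agrees with $\varphi$ there.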
Proof of Proposition B$_2$ (cf. R.M. Dudley (1978), Lemma (1.3)).
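For any $\varepsilon, \delta > 0$ let

    $B_{\delta,\varepsilon} := \{\varphi \in D_0(\mathcal{C},\mu): \exists\, C_1, C_2 \in \mathcal{C}$ s.t. $d_\mu(C_1,C_2) < \delta$ and $|\varphi(C_1) - \varphi(C_2)| > \varepsilon\}$.

Note that $\varphi \in B_{\delta,\varepsilon}$ iff $w_\varphi(\delta) > \varepsilon$.

We have to show: for any $0 < \varepsilon < 1$ there exists a compact set $K \subset U^b(\mathcal{C},d_\mu)$ such that for each $\gamma > 0$

    $\mathcal{L}\{\beta_n\}(K^\gamma) > 1 - \varepsilon$ for $n$ large enough.

(Note that $K^\gamma \in \mathcal{B}_b(S,\rho) \subset \mathcal{A}$ by (M).)

Let $0 < \varepsilon < 1$ be given; by (b) take $\delta = \delta(\varepsilon)$, $0 < \delta < 1$, such that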
Θ
P*(β
n
E B Γ / o ) < ε Λ for all n ^ n (ε,δ(ε)), δ,ε/2 o
According to (a) there exists a finite C = C (&) C C such that for all C e C . o o d (C,C ) < δ for some C E C . μ 'o o o Let k := |C |; then k = k(δ(ε)) E H . Take M = M(ε) large enough so that (M - 1 ) ~ < ε/k; then Γb) ^
3P(sup |3n (C)| > M) < ε/2 for all n ^ n (ε) Ξ n (ε,δ(ε)). CEC o o
ad (b): Note that {ω: sup |β (C,ω)| > M} E F according to ( M ) ; n CEC now, for each C E C , ]P( j β (C ) | > M - 1) < ε/M k
by Chebyshev's inequality
(and the choice of M ) , whence
© Next,
P(sup |β (C )| > M - 1) < ε/4, CEC n ° o sup |βn (C,ω)| > M CθC
and
|β (C.,ω) - β (C o ,ω)| ^ ε/2 n i n z
for all C ,C E C with d (C ,C ) < δ together imply (due to the choice of C ) that there exists a C £ C such that o o |β (C ,ω)| > M - ε/2 > M - 1, whence {sup |β (C)| > M} C {3 6 B . ,J U {sup |β (C )| > M - 1} n δ CEC n 'ε/2 C EC Π ° o o which implies \bj according to (a} and (bj ,
3
Now, for any j E E , let ε(j) := ε
2 ; then by (b) there exists a sequence
ό(j) = ό(j,ε) > 0 , ] 6 1 , such that (i) (ii)
δ(j + 1) < δ(j)/29 and P*(3 n E B 6 ( j )
ε ( j )
) < ε(j) for all n ^ n Q (ε,j).
Let A. : = B., .λ , . Λ and 6 . : = — ~ z — 3 δ(]),ε(]) ] 2^1
^
*
then, by (i), we have (iii)
ό. < δ./M3+1 j
and
. Q
is increasing with j,
Furthermore, for m ^ 2, let Fm := {φ E D (C,μ): sup |φ(C)| ^ M and
°
then (c)
CEC
s.t. for all CX 1,C l
E C
d (C ,C ) IφίC^) - φ(C 2 )| si ε(j).max (1, - = — ^ ) for j=2,..,,m}; j sup |φ(C)| ^ M for some φ E D (C,μ) and φ 6 JA. for j=2,,.,,m 3 CEC ° together imply that φ E F ,
ad (c) : sup jφ(C)| ύ M implies that for all C ,C E C CEC |φ(C ) - φ(C o )| ύ 2M = — ~ — — X
2.
£ ε(j) — — ^
0.
, if
0.
d (C l 9 C 2 ) ^ δ(j) for all C l 9 C 2 E C; on the other hand, d (C l 9 C 2 ) < ό(j) for some C^C^ E C imply
together with φ E C A .
IΦCC^) - φ(C 2 )| ^ ε(j), which proves (c) .
We will show next that (ii) together with Q y and ^c) imply Id)
For each m ^ 2 there exists an n 1 =n 1 (ε 9 m) 6 3N such that for all
n ^ n 1 there exists an E £¥ with P(E ) > 1-ε and β ( ,ω) E F 1 nm nm n π for all ω E E nm ad (d): According to (ii), let n (ε,m) be large enough such that for all n ^ n (ε,m) and each j=2,...5m there exist E', E F with ° n3
{3
n
6 A . } C E 1 . a n d 3P(E f . ) < ε ( j ) = ε 3
whence
n j •
nj
m P ( f u E ' . ) > l - ε / 2 j=2 n:l
thus, for
m E := ( £ U nm j = 2
2 ^,
mm C u E ' . C Π {β j =2 n ^ j=2 n
and
and
3
E τ .) Π {sup |3 (C) | ^ M} G F, n3 n C E C
we obtain together with ζ y and ^ ) that for n ^ n ]P(E ) > 1-ε nm
efA.};
3 ( ,ω) 6 F n m
for all
:= max(n (ε,m),n (ε))
ω 6 E . nm
This proves (d} Now let K := {φ E A C ) : sup |φ(C)| ύ M and s.t. for all j G I
ceC d
μ
( C
l
9 C
2
)
ε/2}
c)
) < 3δ , r,s E ί l 9 , , , ,m(O)>l > ε/2}
= E 1 ( ε , δ Q 9 n ) U E 2 ( ε , δ Q 9 n ) , say, it suffices to show that δ
P (E.(ε,δ 9 n ) ) < ε / 2 , i=l,2, for an appropriate
= δ (ε) and n sufficiently large.
STEP Q :
Let us consider first E 2 replacing (in view of STEP (g) below)
ε by ε/2, i.e. we will show that P2
:= P*(E 2 (ε/2,δ Q ,n)) = P(E 2 (ε/2,δ Q ,n)) < ε/4 for a proper choice of
δ = δ (ε) and n sufficiently large. o o Applying Lemma 4 (i) of Section 1 we get
P o ύ 2 [m(O)]2exp ( 6δo
for n > n
o
2 (^ ,/ o „ ) ^ 2 [m(0)] exp + - n -
2
Z
o
2 2 := ε /(256 δ ) ; o
now, as to m(0), it follows from $(E_1)$ together with $N_I(x) \uparrow$ as $x \downarrow 0$ that
$x \log N_I(x) \to 0$ as $x \to 0$, whence there is a γ = γ(ε) > 0 such that
(1) $N_I(x) \le \exp(\varepsilon^2/(800\,x))$ for all $0 < x \le \gamma$.
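The step from the entropy condition to $x \log N_I(x) \to 0$ can be spelled out as follows. This is a sketch under the assumption that $(E_1)$ is used in the integral form $\int_0^1 y^{-1/2}(\log N_I(y))^{1/2}\,dy < \infty$ and that $N_I$ is nonincreasing in its argument.

```latex
% Monotonicity of N_I gives log N_I(y) >= log N_I(x) for x/2 <= y <= x.
\begin{align*}
\int_{x/2}^{x} y^{-1/2}\bigl(\log N_I(y)\bigr)^{1/2}\,dy
  &\ge \bigl(\log N_I(x)\bigr)^{1/2} \int_{x/2}^{x} y^{-1/2}\,dy \\
  &= (2-\sqrt{2})\,\bigl(x \log N_I(x)\bigr)^{1/2}.
\end{align*}
% The left-hand side tends to 0 as x -> 0 because the integrand is
% integrable on (0,1]; hence x log N_I(x) -> 0.
```

In particular $x \log N_I(x) \le \varepsilon^2/800$ for all sufficiently small x, which is the bound on $N_I(x)$ stated above.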
2 Thus, for δ o έ γ and n > n Q , ?2 έ 2 exp ( J^-
2 2 - I § 5 ? - ) = 2 exp ( - 3 ^ — ). o o o
But since 2 @
exp (-
we obtain for δ
^ min(γ,α) that P 2 < ε/4 for all n > n .
STEP (^:
) < ε/8 for α small enough,
To cope with the other event $E_1$ a certain chaining argument will be used: for this we note first that the entropy condition $(E_1)$ is equivalent to
$\int_0^1 y^{-1/2}(\log N_I(y))^{1/2}\,dy < \infty$ and to $\sum_{i\in\mathbb{N}} (2^{-i}\log N_I(2^{-i}))^{1/2} < \infty$;
therefore, there exists a u = u(ε) so that
$\sum_{i \ge u} (2^{-i}\log N_I(2^{-i}))^{1/2} < \varepsilon/96$.
Now, let $\delta_0 = \delta_0(\varepsilon) := 2^{-r}$ with r ≥ u and r large enough so that also
$\sum_{\ell \ge 0} \exp(-2^{\ell+r}\varepsilon^2/(9000(\ell+1)^4)) < \varepsilon/32$
and $\delta_0 \le \min(\gamma,\alpha)$ (cf. STEP 1).
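The choice of u(ε) from the series form of the entropy condition can be illustrated numerically. The sketch below assumes a hypothetical entropy function $N_I(x) = 1 + 1/x$ (the order of growth one expects for half-lines on the real line); this function, the helper names, and the truncation of the tail after 200 terms are all assumptions made for the illustration.

```python
import math

def log_N_halflines(x):
    """Hypothetical log-entropy: log N_I(x) with N_I(x) = 1 + 1/x."""
    return math.log(1.0 + 1.0 / x)

def entropy_term(i, log_N):
    """The i-th term (2^{-i} * log N_I(2^{-i}))^{1/2} of the series."""
    x = 2.0 ** (-i)
    return math.sqrt(x * log_N(x))

def tail_sum(u, log_N, terms=200):
    """Truncated tail sum over i = u, ..., u + terms - 1."""
    return sum(entropy_term(i, log_N) for i in range(u, u + terms))

def smallest_u(eps, log_N, max_u=200):
    """Smallest u whose (truncated) tail sum is below eps / 96."""
    for u in range(1, max_u):
        if tail_sum(u, log_N) < eps / 96.0:
            return u
    raise ValueError("no admissible u found below max_u")

u = smallest_u(1.0, log_N_halflines)
```

Since the terms decay geometrically (up to a logarithmic factor), the truncation error is negligible here; for a slowly converging entropy series the tail would have to be bounded analytically instead.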
For k=l,2,... let δ, : = δ - 2~ k = 2 ^ ( r + k ) and b, := (2~klog m ( k ) ) 1 / 2 , K O K i.e. b v δ
1 / 2
= (2-(r+k)log N τ t 2 - ( r + k ) ) ) 1 / 2 Σ
b. δ k
Next, let B ] < = B ^ C ) ;= ^ D
k
=
and
D
k(C)
:=
.
'
we have
< ε/96.
° ^
N
A^^ . ^ ^
^ljCk-elC) N ^ j d c C ) 5
μlD k ) < δ k + 1 < δ R
so that by ©
t h e n
" ^ ί
and
10.000/ε o Then, for each n > n
2
-> ~ as ε •* 0.)
there is a unique k = k(n) such that
Θ
1 /9
1/2 < 86.
K
Now, f o r e a c h n > n
n ' /ε £ 1.
a n d e a c h C 6 C we o h t a i n ( w i t h k = k ( n ) , i ( k ) = i ( k , C )
and j ( k ) = j ( k , O ) 1 /9
s-*κ
©
β
(A
n ki(k)
)
"
ε / δ
&
(A
* n ki(k)
)
δ
* k
n
*
β
n
(C)
Also
β (B )| + |β (D )|1. n l n ^ Let S n be the collection of sets B = A n .N.A n
or A Λ
Λ
Λ
v A n . with
j E {1,. . . ,m(A)} and m 6 {1,. , , 5m(λ+l)}, respectively, and so that μ(B) < <S J Then, for each C e C, B β (C) and D β (C) G S o . The number of sets in S. is bounded by (§)
|S |^ 2m(£Jm(A+l).
For later use, note that (by the definition of b 0 ) Jo
£
m(£) = Let d & := max( U + l ) " " 2 ε / 3 2 , 6 b £ + 1 6 Q l i ^ " ) (g)
Σ
d
^
t h e n by
(3^)
< e/8. 1/2
For each I ύ k = k ( n ) , n > n , we have n O thus by (9)
1/2 δ
Z n ' )L
<S > ε/16; K s£S
Now, by Lemma 4 (ii) of Section 1 we obtain for each B 6 S
P
:=P(|β (B)| > d ) ί 2exp(
Thus, since μ(B) < 6^
and
d^n
-1/2
4 ^ 2δ^, we have
ryx ).
o
Let
mU)mU+l) ^ 4[mU+l)]
{.+9 9
= 4 exp(2
b£+1).
Then, using (^ and (lg> we obtain P£
:= Γ( |β (B)| > d
for some B 6 S
)
2d }=M exp(
7
i
d 2 /(8 δ ) and
Now, by definition of d 0 ,
JO
d 2 /8
" T2Γ
ε 2 /(8
O
(32)2(ί,+l)i+) and so
4 e x p ( - 2 A + Γ ε2/(9000(ic+l)i+)). k=k(n)
Thus, by Σ
Next, again for k = k ( n )
P
9
(n > n ) we have o
< 4 ε/32 = ε/8.
n > n , let
and Q := P(V > ε/8). n n Then by Lemma 4 (i) of Section 1 and (jΓ) (according to which 4 3
n
-1/2 ε ϊ -
U(k)]
= exp(2
2
3 δ
k
}
. 2 exp(
2kb2)
^ ,
2 exp(- J L 2 _
• 2 exp(8" k
9
r29Γ
= 2 exp[2K(2b1" - - | — - ) ] .
Now, for s := k+r, 2b2 = 2 1 klog m(k) = 21"klog \i\)
= 21"klog
-ξ!
Thus Q n S 2 exp
Now, if V
£ ε/8 then by (?) |β (C) - β (A..,,
* ) I ^ε/4 for all C E C,
and therefore 1
1
o utc
= (E1 Π{V In with W
Now W
"- *-
> ε/8}) U (E1 Π{V In
:= {sup |β (\ i(k
c )
) " β (A . (
^ ε/8.}) C {V > ε/8} U W n n J | > ε/H}.
C W ! := {sup[ Σ | β (B,(C)) | I > ε/8} U {sup[ © n CGC 0^£ ε/8} ,
where according to (?) (note that B (C), D (C) 6 5 ) ^ ^ ^
At
P(W') ^ n
thus, together with P(V P
Σ
P
Λn
>ε/8)=Q
ΛJ
+
Σ
JO
P
< ε/4; (l^
< ε/4 it follows that
(E.Cεjδ ,n)) < ε/2 for n > n , 1 o o
This proves (+) and concludes the proof of Theorem A,
D
(57) REMARK. The above proof shows that the two conditions (a) and (b) of Theorem B are implied by $(E_1)$ without imposing (M). I.S. Borisov (1981) has shown that $(E_1)$ cannot be weakened, being necessary in case $\mathcal{C}$ is the collection of all subsets of a countable set X, where $(E_1)$ is equivalent to $\sum_{x\in X} (\mu(\{x\}))^{1/2} < \infty$; cf. also M. Durst and R.M. Dudley (1980).

(58) EXAMPLE. As an illustration of the applicability of Theorem A we will show that in $(X,\mathcal{B}) = (\mathbb{R}^k,\mathcal{B}_k)$, k ≥ 1, the class $\mathcal{C} = J_k$ of all lower left orthants is a μ-Donsker class for any p-measure μ on $\mathcal{B}_k$ (proved by M.D. Donsker (1952) for k = 1 and by R.M. Dudley (1966) for k ≥ 1).
As remarked in (48)(a), condition (M) holds true for $J_k$, so, by Theorem A, we must show that $(E_1)$ is fulfilled:
a) For k = 1, consider for any 0 < ε ≤ 1 the partition $-\infty =: t_0 < t$ …
b) For k > 1 the result is an immediate consequence of a) and the inequality (59) of the following lemma (formulated in greater generality than needed in the present case).

LEMMA. Let $(X,\mathcal{B})$ be a measurable space and let μ be a probability measure on the product σ-algebra $\bigotimes_1^k \mathcal{B}$ in $X^k$, k ≥ 1, with marginal laws $\pi_i\mu$ on $\mathcal{B}$, i = 1,...,k. Let $\mathcal{C}_i \subset \mathcal{B}$, i = 1,...,k, be given classes of sets and
$\mathcal{C} := \{\times_{i=1}^k C_i : C_i \in \mathcal{C}_i,\ i = 1,\dots,k\}$.
Then
(59) $N_I(\varepsilon,\mathcal{C},\mu) \le \prod_{i=1}^k N_I(\varepsilon/k,\mathcal{C}_i,\pi_i\mu)$.

Proof. We may and do assume that $n_i := N_I(\varepsilon/k,\mathcal{C}_i,\pi_i\mu) < \infty$ for each i = 1,...,k. Then there exist $A_{i1},\dots,A_{in_i} \in \mathcal{B}$ such that for any $C_i \in \mathcal{C}_i$ there exist $r_i, s_i \in \{1,\dots,n_i\}$ with
$A_{ir_i} \subset C_i \subset A_{is_i}$ and $\pi_i\mu(A_{is_i} \setminus A_{ir_i}) < \varepsilon/k$, i = 1,...,k.
This implies that
$\times_{i=1}^k A_{ir_i} \subset \times_{i=1}^k C_i \subset \times_{i=1}^k A_{is_i}$
and
$\mu(\times_{i=1}^k A_{is_i} \setminus \times_{i=1}^k A_{ir_i}) \le \sum_{i=1}^k \mu(B_i)$ (with $B_i := X \times \dots \times X \times (A_{is_i} \setminus A_{ir_i}) \times X \times \dots \times X$)
$= \sum_{i=1}^k \pi_i\mu(A_{is_i} \setminus A_{ir_i}) < \varepsilon$.
Since there are at most $\prod_{i=1}^k n_i$ approximating sets of the form $\times_{i=1}^k A_{ij_i} \in \bigotimes_1^k \mathcal{B}$, (59) follows. □
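The product argument behind (59) can be checked on a concrete case. The sketch below assumes μ is the uniform distribution on $[0,1]^k$ and brackets each half-line $(-\infty,t]$ between two grid half-lines; the grid construction and all function names are illustrative assumptions, not the partition used in part a) of the example.

```python
from fractions import Fraction
from math import ceil, floor, inf, prod

def cdf(x):
    """Distribution function of the uniform law on [0, 1]."""
    return min(max(x, 0), 1)

def bracket_endpoints(t, n):
    """Grid endpoints lo <= t <= hi; (-inf, lo] and (-inf, hi] bracket (-inf, t]."""
    if t < 0:
        return (-inf, Fraction(0))
    if t > 1:
        return (Fraction(1), inf)
    return (Fraction(floor(t * n), n), Fraction(ceil(t * n), n))

def bracket_gaps(ts, n):
    """Return (mass of upper minus lower orthant, sum of marginal gaps)."""
    ends = [bracket_endpoints(t, n) for t in ts]
    product_gap = prod(cdf(hi) for _, hi in ends) - prod(cdf(lo) for lo, _ in ends)
    marginal_sum = sum(cdf(hi) - cdf(lo) for lo, hi in ends)
    return product_gap, marginal_sum

eps, k = Fraction(1, 10), 2
n = int(k / eps) + 1                 # grid step 1/n < eps/k
gap, marg = bracket_gaps((Fraction(1, 3), Fraction(7, 9)), n)
num_brackets = (n + 3) ** k          # endpoints -inf, 0, 1/n, ..., 1, +inf per coordinate
```

Exact rational arithmetic is used so that the inequalities "mass of the difference ≤ sum of the marginal gaps < ε" can be asserted without rounding tolerance.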
SOME REMARKS ON OTHER MEASURABILITY ASSUMPTIONS AND FURTHER RESULTS:
Instead of (M) Dudley (1978) used the following measurability assumption $(M_0)$ (again w.r.t. the canonical model $(\Omega,F,\mathbb{P}) = (X^{\mathbb{N}}, \mathcal{B}^{\mathbb{N}}, \times_{\mathbb{N}}\,\mu)$):
$(M_0)$: $\beta_n : \Omega \to S \equiv D_0(\mathcal{C},\mu)$ is $\bar{F}, B_b(S,\rho)$-measurable,
where $\bar{F}$ denotes the measure-theoretic completion of F w.r.t. $\mathbb{P} = \times_{\mathbb{N}}\,\mu$.

Imposing (M), it follows that $\beta_n$ is $F, B_b(S,\rho)$-measurable, whence (M) implies $(M_0)$. On the other hand, replacing $A = \sigma(\{\pi_C : C \in \mathcal{C}\})$ by
$A_0 := \sigma(\{\pi_C : C \in \mathcal{C};\ \rho(\cdot,\varphi) : \varphi \in S\})$
and imposing $(M_0)$ instead of (M), it follows that $\beta_n$ is $\bar{F}, A_0$-measurable, where $B_b(S,\rho) \subset A_0 \subset B(S,\rho)$ (cf. (47)), which means that also under $(M_0)$ one meets the basic model of Section 3.

Thus, Theorem A and Theorem B hold as well (with the same proof) if (M) is replaced by $(M_0)$.

Besides $(M_0)$ Dudley (1978) introduced a second measurability assumption $(M_1)$ (called a μ-Suslin property for $\mathcal{C}$), stronger than $(M_0)$, which turned out to be verifiable in cases of interest where (M) or (SE) fails to hold (cf. (48)(b)). As shown in Gaenssler (1983), based on Theorem A (with (M) replaced by $(M_0)$) one obtains a functional central limit theorem for empirical C-processes indexed by classes $\mathcal{C}$ allowing a finite-dimensional parametrization in the sense of the following theorem:
THEOREM C. Let X be a locally compact, separable metric space, $\mathcal{B} = \mathcal{B}(X)$ be the σ-algebra of Borel sets in X, and let K be a compact subset of $\mathbb{R}^l$, l ≥ 1. Suppose that $f : X \times K \to \mathbb{R}$ is a function satisfying the following conditions (i) - (iii) ((iii) with respect to a given probability measure μ on $\mathcal{B}$):
(i) $f_z := f(\cdot,z) : X \to \mathbb{R}$ is continuous for each z ∈ K;
(ii) $f_\cdot(x) : K \to \mathbb{R}$ is "uniformly Lipschitz", i.e.,
$M := \sup_{x\in X} \sup\{|f_z(x) - f_{z'}(x)| / |z - z'| : z \ne z',\ z,z' \in K\} < \infty$
(where |z - z'| denotes the Euclidean distance between z and z');
(iii) $\mu(\{f_z \in [-\varepsilon,\varepsilon)\}) = O(\varepsilon)$ uniformly in z ∈ K.
Let $\mathcal{C} \subset \mathcal{B}$ be defined by
$\mathcal{C} := \{\{f_z \ge 0\} : z \in K\}$.
Then $\mathcal{C}$ is a μ-Donsker class; furthermore, $(M_1)$ (and therefore also $(M_0)$) is satisfied for $\mathcal{C}$ and μ.

(60) EXAMPLES. (a): Let $(X,\mathcal{B},\mu) = ([0,1]^k, [0,1]^k \cap \mathcal{B}_k, \lambda_k)$, k ≥ 1, $\lambda_k$ being the k-dimensional Lebesgue measure on $[0,1]^k \cap \mathcal{B}_k$, and let $\mathcal{C} \subset \mathcal{B}$ be the class of all closed Euclidean balls in $[0,1]^k$. Then $\mathcal{C}$ is a $\lambda_k$-Donsker class and $(M_1)$ is satisfied for $\mathcal{C}$ and $\lambda_k$.

In fact, take
$K := \{z = (y,r) : y \in [0,1]^k,\ 0 \le r \le r_y := \sup\{r : B^c(y,r) \subset [0,1]^k\}\}$,
where $B^c(y,r) := \{x \in \mathbb{R}^k : e(x,y) \le r\}$ (e denoting the Euclidean distance in $[0,1]^k$), and define $f : [0,1]^k \times K \to \mathbb{R}$ by
$f(x,z) := e(x,\complement B^\circ(y,r)) - e(x,B^c(y,r))$, z = (y,r) ∈ K (= r - e(x,y)),
where $B^\circ(y,r) := \{x \in \mathbb{R}^k : e(x,y) < r\}$. Then $\{\{f_z \ge 0\} : z \in K\}$ is the class of all closed Euclidean balls in $X = [0,1]^k$ and it is easy to verify (i) - (iii) of Theorem C giving the result.

(b) (cf. (48)(b)): consider the same p-space $(X,\mathcal{B},\mu)$ as in (a) and let
$\mathcal{C} := \{(C + z) \cap [0,1]^k : z \in [0,1]^k\}$,
C being a fixed closed and convex subset of $X = [0,1]^k$, k ≥ 1 (cf. R. Pyke (1979)). As in (a) let
$f(x,z) := e(x,\complement C_z^\circ) - e(x,C_z)$, $x,z \in [0,1]^k$,
with $C_z := C + z$ and $C_z^\circ$ denoting the interior of $C_z$.

Then $\mathcal{C}$ is a $\lambda_k$-Donsker class and $(M_1)$ is satisfied for $\mathcal{C}$ and $\lambda_k$. This follows again from Theorem C; for this we have to verify the conditions (i) - (iii) there and also that
(+) $\mathcal{C} = \{\{f_z \ge 0\} : z \in [0,1]^k\}$, i.e., that $C_z = \{f_z \ge 0\}$ for each $z \in [0,1]^k$.

ad (+): $x \in C_z$ implies that $e(x,C_z) = 0$ whence $f_z(x) = e(x,\complement C_z^\circ) \ge 0$; on the other hand, if $x \in \complement C_z$ then $e(x,C_z) > 0$, since $C_z$ is closed, and $e(x,\complement C_z^\circ) = 0$ whence $f_z(x) = -e(x,C_z) < 0$; this shows (+).

ad (i): follows immediately from the fact that for any $\emptyset \ne A \subset X$
$|e(x_1,A) - e(x_2,A)| \le e(x_1,x_2)$ for each $x_1,x_2 \in X$.

ad (ii): let $f'_z(x) := e(x,\complement C_z^\circ)$ and $f''_z(x) := e(x,C_z)$, i.e., $f_z(x) = f'_z(x) - f''_z(x)$ for all $x \in [0,1]^k$. Then it suffices to show that both $f'_\cdot(x)$ and $f''_\cdot(x)$ are uniformly Lipschitz: as to $f'_\cdot(x)$ this follows from
(1) $\forall x \in [0,1]^k$: $|e(x,\complement C_z^\circ) - e(x,\complement C_{z'}^\circ)| \le e(z,z')$ $\forall z,z' \in [0,1]^k$.

ad (1): we use the following fact which is easy to prove:
(+) For any closed $F \subset [0,1]^k$ and any $x \in F^\circ$ there exists a $w \in \partial F$ such that $e(x,w) = e(x,\complement F^\circ)$.
Now, given any $x \in [0,1]^k$ let w.l.o.g. z and z' be such that $x \in C_z^\circ \cap C_{z'}^\circ$; applying then (+) for $F = C_z$ and $F = C_{z'}$, respectively, we obtain
$e(x,\complement C_z^\circ) = e(x,w_{x,z})$ and $e(x,\complement C_{z'}^\circ) = e(x,w_{x,z'})$ for some $w_{x,z} \in \partial C_z$ and $w_{x,z'} \in \partial C_{z'}$, respectively.
Furthermore, since $C_z$ and $C_{z'}$ are closed,
$w_{x,z} = c_{x,z} + z$ and $w_{x,z'} = c_{x,z'} + z'$ for some $c_{x,z} \in C$ and $c_{x,z'} \in C$, respectively, and
(++) $e(x,c_{x,z} + z) \le e(x,c_{x,z'} + z)$ and $e(x,c_{x,z'} + z') \le e(x,c_{x,z} + z')$, respectively.
Thus
$e(x,\complement C_z^\circ) - e(x,\complement C_{z'}^\circ) = e(x,c_{x,z} + z) - e(x,c_{x,z'} + z')$
$\le e(x,c_{x,z'} + z) - e(x,c_{x,z'} + z') \le e(c_{x,z'} + z, c_{x,z'} + z') = e(z,z')$.
This proves (1). That also $f''_\cdot(x)$ is uniformly Lipschitz follows from
(2) $\forall x \in [0,1]^k$: $|e(x,C_z) - e(x,C_{z'})| \le e(z,z')$ $\forall z,z' \in [0,1]^k$.

ad (2): Given any $x \in [0,1]^k$, $z,z' \in [0,1]^k$ and any ε > 0 there exists a c = c(x,ε) ∈ C such that $e(x,c + z) \le e(x,C_z) + \varepsilon$ and thus
$e(x,C_{z'}) \le e(x,c + z') \le e(x,c + z) + e(c + z,c + z') = e(x,c + z) + e(z,z') \le e(x,C_z) + \varepsilon + e(z,z')$ for any ε > 0,
whence $e(x,C_{z'}) - e(x,C_z) \le e(z,z')$, yielding (2) by symmetry.
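For Example (60)(a), the identity f(x,z) = r - e(x,y) and the uniform Lipschitz property (ii) can be checked numerically. This is a minimal sketch, assuming complements are taken in all of $\mathbb{R}^k$ and sampling parameters z = (y,r) at random without enforcing $r \le r_y$; the helper names and the random test points are illustrative only.

```python
import math
import random

def dist_to_closed_ball(x, y, r):
    """e(x, B^c(y, r)): Euclidean distance from x to the closed ball."""
    return max(0.0, math.dist(x, y) - r)

def dist_to_complement_of_open_ball(x, y, r):
    """e(x, complement of B^o(y, r)): distance from x to {u : |u - y| >= r}."""
    return max(0.0, r - math.dist(x, y))

def f(x, z):
    """f(x, z) for z = (y, r); equals r - e(x, y)."""
    y, r = z
    return dist_to_complement_of_open_ball(x, y, r) - dist_to_closed_ball(x, y, r)

def param_distance(z1, z2):
    """Euclidean distance between parameters z = (y, r) in R^(k+1)."""
    (y1, r1), (y2, r2) = z1, z2
    return math.sqrt(math.dist(y1, y2) ** 2 + (r1 - r2) ** 2)

def max_lipschitz_ratio(trials=2000, k=3, seed=1):
    """Largest observed |f_z(x) - f_z'(x)| / |z - z'| over random draws."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x = [rng.random() for _ in range(k)]
        z1 = ([rng.random() for _ in range(k)], rng.random())
        z2 = ([rng.random() for _ in range(k)], rng.random())
        d = param_distance(z1, z2)
        if d > 0:
            worst = max(worst, abs(f(x, z1) - f(x, z2)) / d)
    return worst
```

Since $|f_z(x) - f_{z'}(x)| \le |r - r'| + e(y,y') \le \sqrt{2}\,|z - z'|$, the observed ratio can never exceed $\sqrt{2}$, so condition (ii) holds with $M \le \sqrt{2}$ in this example.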
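Before proving (iii), let us remark that so far we have only used that C is a closed subset of $[0,1]^k$; for proving (iii), in addition, some smoothness of the boundary of C is needed. So we will now use that C is convex.

ad (iii): We must show that $\lambda_k(\{f_z \in [-\varepsilon,\varepsilon)\}) = O(\varepsilon)$ uniformly in z ∈ K. For this it suffices to prove
(+) $\{f_z \in [-\varepsilon,\varepsilon)\} \subset C_z^\varepsilon \setminus {}_\varepsilon C_z$ for all $z \in [0,1]^k$, and
(++) $\sup_{z\in[0,1]^k} \lambda_k(C_z^\varepsilon \setminus {}_\varepsilon C_z) \le c_k\,\varepsilon$ for $\varepsilon \downarrow 0$ with some constant $c_k$ depending only on k.
(Here $A^\varepsilon := \{x : e(x,A) \le \varepsilon\}$ and ${}_\varepsilon A := \{x : e(x,\complement A) > \varepsilon\}$.)

ad (+): By definition, $f_z(x) = e(x,\complement C_z^\circ) - e(x,C_z)$, x ∈ X, whence
(a) $f_z(x) = -e(x,C_z)$ iff $x \in \complement C_z^\circ$,
(b) $f_z(x) = e(x,\complement C_z^\circ)$ iff $x \in C_z$, and
(c) $f_z(x) = 0$ iff $x \in \partial C_z$.
Thus (note that $X = [(\complement C_z^\circ) \setminus \partial C_z] + (C_z \setminus \partial C_z) + \partial C_z$), for any x with $-\varepsilon \le f_z(x) < \varepsilon$:
if $x \in (\complement C_z^\circ) \setminus \partial C_z$, then by (a) $e(x,C_z) \le \varepsilon$, so $x \in C_z^\varepsilon \setminus C_z$;
if $x \in C_z \setminus \partial C_z$, then by (b) $e(x,\complement C_z^\circ) < \varepsilon$, so $x \in C_z^\circ \setminus {}_\varepsilon C_z$;
if $x \in \partial C_z$, then by (c) $f_z(x) = 0$ and $x \in C_z \setminus {}_\varepsilon C_z$.
In each case $x \in C_z^\varepsilon \setminus {}_\varepsilon C_z$. This proves (+).

ad (++): Due to the translation invariance of $\lambda_k$, (++) is equivalent to
$\lambda_k(C^\varepsilon \setminus {}_\varepsilon C) \le c_k\,\varepsilon$ as $\varepsilon \downarrow 0$.
Now, as shown in Gaenssler (1981) one has
(61) $\sup_{C \in \dots} \lambda_k(C^\varepsilon \setminus {}_\varepsilon C) \le c_k\,\varepsilon$ …
This proves the assertion of Example (b). □

(62) ADDITIONAL REMARKS. (a): the above considerations show that the set of all translates of a fixed closed but not necessarily convex set C is a $\lambda_k$-Donsker class provided that C has a smooth boundary in the sense that $\lambda_k(C^\varepsilon \setminus {}_\varepsilon C) = O(\varepsilon)$. Based on a result of E.M. Bronštein (1976) it was shown by Dudley (1981a) that for the class … 0 we have to show that (+)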
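$\limsup_{n\to\infty} \mathbb{P}(|W_n(\delta) - W_n(0)| \ge \varepsilon) \to 0$ as $\delta \downarrow 0$,
and
(++) $\limsup_{n\to\infty} \mathbb{P}(|W_n(1) - W_n(\delta)| \ge \varepsilon) \to 0$ as $\delta \uparrow 1$.

ad (+):
$\mathbb{P}(|W_n(\delta) - W_n(0)| \ge \varepsilon) \le \varepsilon^{-2}\,\mathbb{E}(W_n^2(0,\delta])$
$= \varepsilon^{-2} n^{-1} \sum_{i=1}^n c_{ni}^2 [F_{ni}(0,\delta] - F_{ni}^2(0,\delta]]$ (by (b1), (b2), Lemma 22)
$\le \varepsilon^{-2} n^{-1} \sum_{i=1}^n c_{ni}^2 F_{ni}(0,\delta] = \varepsilon^{-2}\,\nu_n(0,\delta]$,
whence by (a) or (ã)
$\limsup_{n\to\infty} \mathbb{P}(|W_n(\delta) - W_n(0)| \ge \varepsilon) \le \varepsilon^{-2}(G(\delta) - G(0)) \to 0$ as $\delta \downarrow 0$
since G was assumed to be continuous.
ad (++): Follows in the same way as (+).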
By (42) (cj) implies (c)which together with ζϊj) implies ζ ? ) by (41),
According to (C*) , given any η > 0 there exist δ - δ(η) > 0 and
n
o
= n (δ(η)) G H such that for all n ^ n P(W" ( 0 such that k"1 < δ and
a"2 ύ η/2;
then ρ
W for some subsequence (n') of $\mathbb{N}$; then for any 0 ≤ s ≤ t ≤ 1 such that
$s,t \in T_W := \{r \in [0,1] : \pi_r$ is L{W}-a.e. s-continuous$\}$
it follows from Theorem 5.1 in Billingsley (1968) that
(+) $|W_{n'}(t) - W_{n'}(s)|^3 \xrightarrow{L} |W(t) - W(s)|^3$;
on the other hand it follows from Lemma 22 (ii) that
(++) $\mathbb{E}(W_n^4(s,t]) \le 3\nu_n^2(s,t] + (\max_{1\le i\le n} c_{ni}^2/n)\,\nu_n(s,t] \le 4$ for all n
(cf. (66) and (67)(a)).
But (+) together with (++) imply (cf. Gaenssler-Stute (1977), Exercise 1.14.4, p. 114) that
$\mathbb{E}(|W(s,t]|^3) = \lim_{n'\to\infty} \mathbb{E}(|W_{n'}(s,t]|^3) \le \limsup_{n'\to\infty} (\mathbb{E}(W_{n'}^4(s,t]))^{3/4}$ (by Hölder's inequality)
$\le (3G^2(s,t])^{3/4} \le 3(G(s,t])^{3/2}$ (by (++) and (a) or (ã)).
Since $T_W$ contains 0 and 1 and is dense in [0,1] (cf. Billingsley (1968)), it follows that
$\mathbb{E}(|W(t) - W(s)|^3) \le 3|G(t) - G(s)|^{3/2}$ for all 0 ≤ s ≤ t ≤ 1,
whence by Lemma 19 $\mathbb{P}(W \in C) = 1$.

ad (iii): Assume first that $\lim_{n\to\infty} K_n(s,t) = K(s,t)$ for all 0 ≤ s ≤ t ≤ 1.
We are going to show that for any αl5...,α (+++)
k , Σ α.W (t.) -=^W(O,V) with . Λ Jj n J j j=l
€ ]R and any t ,.. , ,t E [θ,ll
k V := Σ α α K(t ,t ). r s r s r,s=lΛ
ad (+++): If V = 0, then for any ε > 0 k P(| Σ α.W (t.)| ^ ε) ^ ε 3 n 3 j = 1
-
-2 ε
(cf.(67)(b))
k
k E( | Σ α.W (t.)| ) 3 n 3 j = 1
-2 α α K ( t , t ) —=> ε V = 0
Σ r,s=l
Γ
S n
in case V = 0. If V > 0, consider
C
ni[ j
Γ
S
asn-^°° which proves (+++)
Then the ζ .'s form a triangular array of row-wise independent random variables with
E(ζ .) = 0 and such that ni n
n
2
k
-1
2
±
Σ ECζ .) =E(( Σ ζ .Γ) = V τ(( Σ α.W (t.)Γ) Π1 Π 1 Π ^ ^ -1 = V
Σ α α K ( t 9,t ) + 1 , r s n r s r,s=l
as n -*- «;
furthermore, for any δ > 0 we have n
Σ
2 .
E ( ζ
i=l
n l
lr i {
I . ζ
l ni
x
λ
) + 0
a s
n
-* « :
l > δ }
in fact, given any ε > 0 let p := 1 + ε / 2 and q > 0 be such that 1/p + 1/q = 1; then, by Holder's and Markov's inequality we obtain for any ό > 0
Jfsi (Vn)
y
-l-ε/2
( maχ
l
l
ϋ n J >ε
-^Oasn-^00
=1
Thus, it follows from the Central Limit Theorem (cf, Gaenssler-Stute (1977), 9.2.9) that n
L
Z ζ . — > N(O,1) i=l n l
which proves (+++).
Next, let W be a mean-zero Gaussian process with covariance structure given by K. Then, again for any α ,. .. ,α k L{ Σ α.W(t.)} = N(O,V)
E H and any t.. ,. . , ,t, G [θ,ll
with V defined as above.
Therefore, by the Cramer-Wold Device (cf. Gaenssler-Stute (1977), 8.7.6) it follows together with (+++) that
W ~f
n f .α.
Now, by (ii), for any subsequence (W
f
W.
) of (W ) there exists a further subse-
quence (W ,,) and a random element W, τ x, ftx in
L{
(D,8, (D,ρ)) such that
V ) ( n » ) K C ) =l a n d V^ W (n')(n") i n (D'p}
Applying Theorem 3 and using the fact that each W, t,
M )
is uniquely deter-
mined by its fidis, it follows that L W, n1 , , n =: W (n )(n")
L = W
and therefore
d
L
$W_n \xrightarrow{L_b} W$ in (D,ρ). To prove the other direction, suppose that $W_n \xrightarrow{L_b} W$ in (D,ρ) where W is a r.e. in $(D,B_b(D,\rho))$ and is a mean-zero Gaussian process with covariance structure given by K (and such that L{W}(C) = 1). Then, by Theorem 3, for any 0 ≤ s ≤ t ≤ 1
$W_n(s) \cdot W_n(t) \xrightarrow{L} W(s) \cdot W(t)$ as $n \to \infty$,
and
$\mathbb{E}(W_n^2(s) \cdot W_n^2(t)) \le \mathbb{E}(W_n^4(s))^{1/2}\,\mathbb{E}(W_n^4(t))^{1/2} \le 4$
for all $n \in \mathbb{N}$ (cf. (+++) above), whence (by the same reasoning as in the proof of (ii))
$\mathbb{E}(W(s) \cdot W(t)) = \lim_{n\to\infty} \mathbb{E}(W_n(s) \cdot W_n(t))$, i.e.,
$\lim_{n\to\infty} K_n(s,t) = K(s,t)$ for all 0 ≤ s ≤ t ≤ 1. □
SOME GENERAL REMARKS ON WEAK CONVERGENCE OF RANDOM ELEMENTS IN D Ξ D[0,ll w.r.t. p -METRICS: Let q be any weight function belonging to the set (L := {q: [θ,l] ->]R9 q continuous, q(0) = q(l) = 0 and q(t) > 0 for all t G (0,1), having the following additional properties (i ) - (iii )}: There exists a δ
=δ(q),O D, defined by T (y) := y/q, is B (D ,p ), B,(D,p) -measurable.
This implies that ξ/q is a random element in $(D,B_b(D,\rho))$ iff ξ is a random element in $(D_q,B_b(D_q,\rho_q))$.

Note also (cf. (39)) that $C_q \in B_b(D_q,\rho_q)$ and that $(C_q,\rho_q)$ is a separable and closed subspace of $(D_q,\rho_q)$. Furthermore, one has the following

LEMMA 23. In the just described setting, the following two statements are equivalent:
(i) $\xi_n/q \xrightarrow{L_b} \xi_0/q$ in (D,ρ) and $L\{\xi_0/q\}(C) = 1$;
(ii) $\xi_n \xrightarrow{L_b} \xi_0$ in $(D_q,\rho_q)$ and $L\{\xi_0\}(C_q) = 1$.

Proof. (i) ⇒ (ii): Note first that $L\{\xi_0\}(C_q) = \mathbb{P}(\xi_0 \in C_q = qC) = \mathbb{P}(\xi_0/q \in C) = L\{\xi_0/q\}(C)$. Now, according to (28) (cf. (h') there) it remains to show
(+) $\mathbb{E}(f(\xi_n)) \to \mathbb{E}(f(\xi_0))$ for every $f : D_q \to \mathbb{R}$ which is bounded, uniformly $\rho_q$-continuous and $B_b(D_q,\rho_q), \mathcal{B}$-measurable.
So, let $f : D_q \to \mathbb{R}$ be bounded, uniformly $\rho_q$-continuous and $B_b(D_q,\rho_q), \mathcal{B}$-measurable, and let $g : D \to \mathbb{R}$ be defined by g(x) := f(qx), x ∈ D; then g is bounded, $B_b(D,\rho), \mathcal{B}$-measurable (since $g = f \circ T_q^{-1}$) and uniformly ρ-continuous (since $\rho(x_1,x_2) = \rho_q(qx_1,qx_2)$ and $|g(x_1) - g(x_2)| = |f(qx_1) - f(qx_2)|$ for any $x_1,x_2 \in D$, i.e. $qx_1,qx_2 \in D_q$). Therefore, by (i) and (28), $\mathbb{E}(g(\xi_n/q)) \to \mathbb{E}(g(\xi_0/q))$ which implies (+) since $\mathbb{E}(g(\xi_n/q)) = \mathbb{E}(f(\xi_n))$ for all n ≥ 0.
(ii) ⇒ (i): follows in the same way. □
FUNCTIONAL CENTRAL LIMIT THEOREMS FOR WEIGHTED EMPIRICAL PROCESSES w.r.t. $\rho_q$-METRICS:
As before let $W_n$ be a weighted empirical process based on an array $(\xi_{ni})$ of row-wise independent random variables $\xi_{ni}$, 1 ≤ i ≤ n, $n \in \mathbb{N}$, defined on some p-space $(\Omega,F,\mathbb{P})$, and on an array $(c_{ni})$ of given scores (cf. (65)). We assume again that the distribution functions $F_{ni}$ of the $\xi_{ni}$'s are concentrated on [0,1]; here, in addition, we suppose that
(69) for each $n \in \mathbb{N}$: $n^{-1} \sum_{i=1}^n F_{ni}(t) = t$ for all t ∈ [0,1].
Then, for any $q \in \mathcal{Q}$, we have

LEMMA 24. $\mathbb{P}(W_n/q \in D) = 1$ for all n, whence we may and do assume w.l.o.g. that $W_n(\cdot,\omega)/q(\cdot) \in D$ for each ω ∈ Ω and all $n \in \mathbb{N}$.
Proof. According to the definition of $W_n$, for $\mathbb{P}$-a.a. ω ∈ Ω there exists a $t_0 = t_0(\omega) \le \delta_0$ such that, for $0 < t \le t_0$, by (69) and (iv),
$|W_n(t,\omega)|/q(t) \le n^{-1/2} \sum_{i=1}^n |c_{ni}|\,F_{ni}(t)/q(t) \le (\max_{1\le i\le n} |c_{ni}|)\,n^{1/2}\,t/q(t) \to 0$ as $t \to 0$;
similarly, $|W_n(t,\omega)|/q(t) \to 0$ as $t \to 1$ for $\mathbb{P}$-a.a. ω, which implies the assertion (imposing the convention 0/0 := 0). □
Now, for uniformly bounded scores, Shorack (1979) has shown:

THEOREM 19 (Shorack (1979), Theorem 1.2). Suppose that
(70) $\sup_{n\in\mathbb{N}} (\max_{1\le i\le n} |c_{ni}|) \le M < \infty$.
Then for all $q \in \mathcal{Q}$ we have
(i) $(W_n/q)_{n\in\mathbb{N}}$ is relatively L-sequentially compact (in (D,s)).
(ii) Any limiting process, i.e. any random element W in $(D,B(D,s)) = (D,B_b(D,\rho))$ such that $W_{n'}/q \xrightarrow{L} W$ for some subsequence (n') of $\mathbb{N}$, satisfies L{W}(C) = 1, whence, by Lemma 18, $(W_n/q)$ is relatively $L_b$-sequentially compact in (D,ρ) such that for any limiting process W, L{W}(C) = 1, and therefore, by Lemma 23, $(W_n)$ is relatively $L_b$-sequentially compact in $(D_q,\rho_q)$ such that for any limiting process $W_0$, $L\{W_0\}(C_q) = 1$.
(iii) There exists a random element $W_0$ in $(D_q,B_b(D_q,\rho_q))$ being a mean-zero Gaussian stochastic process with
$\mathrm{cov}(W_0(s),W_0(t)) = K_0(s,t)$, $L\{W_0\}(C_q) = 1$ and such that $W_n \xrightarrow{L_b} W_0$ in $(D_q,\rho_q)$
if and only if (for $K_n(s,t) = \mathrm{cov}(W_n(s),W_n(t))$)
$\lim_{n\to\infty} K_n(s,t) = K_0(s,t)$ for all 0 ≤ s ≤ t ≤ 1.
The proof of Theorem 19 (being based on Theorem 18 and Theorem 17) can be carried through along the lines presented in Shorack's (1979) paper with some slight modifications being necessary due to our choice of 0 : by the way, instead of (15) on p. 171 it suffices to impose (iv ) and instead of P(A ) ύ exp(-l/a ) one shows
P(A ) ^ 1 - I/a
to get (v) on p. 181. We are not
going to give further details here. Instead, since the proof of Theorem 1.2 in Shorack (1979) seems not suited to carry over to give a proof of his Theorem 1, 3 as mentioned there on p. 182 (note that in the case of not uniformly bounded scores it is not possible to estimate 2 ( max l^n
)(t - s) by M (t - s)
for t - s > n
, which was essentially used
n
to get (c) on p. 179) we want to present here a completely different proof of the following result:
THEOREM 20 (Shorack (1979), Theorem 1.3). If all $\xi_{ni}$ are uniformly distributed on [0,1] and if instead of (70)
(71) $\max_{1\le i\le n} c_{ni}^2/n \to 0$ as $n \to \infty$,
then, for any $q \in \mathcal{Q}$, $W_n \xrightarrow{L_b} B^\circ$ in $(D_q,\rho_q)$ as $n \to \infty$ and $L\{B^\circ\}(C_q) = 1$, where $B^\circ$ denotes the Brownian bridge.

The proof of Theorem 20 is based on the following lemmata which may be of independent interest. The first lemma is concerned with a martingale property of the weighted empirical process $W_n$ based on $\xi_{ni}$ which are uniformly distributed on [0,1].
LEMMA 25. Let $n \in \mathbb{N}$ be arbitrary but fixed and write $\xi_i$ and $c_i$ instead of $\xi_{ni}$ and $c_{ni}$, respectively. Suppose that $F_{ni}(t) = t$ for all t ∈ [0,1]. Then for any $(c_1,\dots,c_n) \in \mathbb{R}^n$
$\left(\dfrac{n^{1/2} W_n(t)}{1-t}\right)_{0\le t<1}$ is a martingale w.r.t. $F_t := \sigma\left(\left\{\dfrac{n^{1/2} W_n(s)}{1-s} : s \le t\right\}\right)$, 0 ≤ t < 1.
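The martingale property in Lemma 25 can be verified exactly for a single uniform observation; the weighted sum then inherits it by linearity and independence. The sketch below assumes the single-observation process M(t) = (1{ξ ≤ t} - t)/(1 - t) and uses exact rational arithmetic; the function names are illustrative.

```python
from fractions import Fraction

def m_after_jump():
    """Value of M(u) = (1 - u)/(1 - u) once xi <= u; constant in u."""
    return Fraction(1)

def m_before_jump(s):
    """Value of M(s) = (0 - s)/(1 - s) on the event {xi > s}."""
    return -s / (1 - s)

def conditional_mean_given_survival(s, t):
    """E[M(t) | xi > s] for xi uniform on [0, 1] and 0 <= s <= t < 1."""
    p_jump = (t - s) / (1 - s)        # P(xi <= t | xi > s)
    return p_jump * m_after_jump() + (1 - p_jump) * m_before_jump(t)

s, t = Fraction(1, 5), Fraction(2, 3)
lhs = conditional_mean_given_survival(s, t)
rhs = m_before_jump(s)
```

On the event {ξ ≤ s} the process has already jumped and stays at 1, so only the event {ξ > s} needs the computation above; there the conditional mean of M(t) equals M(s) = -s/(1 - s).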
Proof. We use the following auxiliary result which is easy to prove: (72) Let ( C t ) 0 ^ t < T a (F t
n d (η
) t
o^t ε) £ ε2 8 J 6 q"2(u)du 0
which proves (+). ad (ii): by symmetry this follows in the same way.
□

LEMMA 27. For any $q \in \mathcal{Q}$ we have $\mathbb{P}(B^\circ/q \in C) = 1$, where $B^\circ$ denotes the Brownian bridge and C is the space of all continuous functions on [0,1].

Proof. We have to show that $B^\circ/q$ is $\mathbb{P}$-a.s. continuous at 0 (and also at 1 which is shown similarly). For this, according to Lemma 19, it suffices to show that for some constants a > 1, b > 0 and some continuous function $F : [0,1] \to \mathbb{R}$
(+) $\mathbb{E}\left(\left|\dfrac{B^\circ(t)}{q(t)} - \dfrac{B^\circ(s)}{q(s)}\right|^b\right) \le |F(t) - F(s)|^a$ for all 0 ≤ s ≤ t ≤ 1,
where (using again the convention 0/0 := 0)
(δ
= δ (q) as in the definition of
ad (+): since, for any 0 ^ s ^ t ύ 1, B°(t) - B°(s) is normally distributed with mean zero and variance (t - s)(l - (t - s)), we have θ
° (a)
^
2
, 3(t- s ) ,==
E (
2
( l- ( t - s ) )
t
q (t)
O
O ί
ί a
a S
S t
t i
i
l
q\t)
On the other hand, for any 0 < s ^ t ^ δ ,we have (s)) [
_
-
]
)
_
[
-
_
!
. 3 s (1 - s) ,
where S
2f L
1 1 Ί*+ _ r / Γ n q(s) .-,4 q(s) " q(t) J " L q(s) K± ' q(t) n
[ ^ y (1 - ψ)f
(since ^ y + on [0,6*1 by (ii
/t"-ϊ/s" 4^/t-sΛ2 = ΓL — n ( -x J Ί ^ ( -5 ) 9 whence
(b
,
E
(CB°(s)Λ
1
.
I1*, ^ 3 ( t T s ; 2 ( 1 ' s ) 2 , 0 < s £ t . δ *
1
Now, it follows from (a) and (b) that for 0 < s ^ t ^ δ , B°(t) q(t)
2
\
4 2
[
B°( S ) 4 q(s) I } -
E (
3
B°(t) - B°(s) + q(t)
E U
(B^t^B^s)) q (t)
4
,
+ E ( ( B
o
( B ) )
o B
C s ) L
^ q
1 . ^t;
Ct-s)2(l> (t-s))2 + 3(t-s)2(l-s)2
6(q"2(t)(t - s))2
Thus, taking F(t) := 4^T J 0 0 < s ύ t ύ 1.
t
ύ
q(t)
q
1 ^S;
q(s) f
n
j
(i+i/Γj11 q " 2 ( u ) d u ) 2 .
2
q" (u)du, we get ( + ) (with b = 4, a = 2) for all
It remains to consider the case s = 0 and 0 < t ύ δ 11
F2(t) = |F(t)- F(0)|2.
This proves
but, by (a), q" 2 (u)du) 2
(+). D
Proof of Theorem 20. In the setting of Theorem 18 and its preceding remarks (67) we have in the present situation (where $F_{ni}(t) = t$ for all t ∈ [0,1]) that (cf. (66))
$\nu_n(t) = t$ (=: G(t)) for all t ∈ [0,1]
and
$K_n(s,t) = K(s,t) = s \wedge t - st = \mathrm{cov}(B^\circ(s),B^\circ(t))$.
Therefore, by Theorem 18 (iii), we have
(a) $W_n \xrightarrow{L_b} B^\circ$ in (D,ρ).
Furthermore, by Lemma 24 and Lemma 27, for any $q \in \mathcal{Q}$ we have
(b) $\mathbb{P}(W_n/q \in D) = 1$ for all $n \in \mathbb{N}$ and $\mathbb{P}(B^\circ/q \in C) = 1$.
We are thus in a situation where our general remarks on weak convergence of random elements in D = D[0,1] w.r.t. $\rho_q$-metrics can be applied. So, by Lemma 23, it remains to show
(c) $W_n/q \xrightarrow{L_b} B^\circ/q$ in (D,ρ).
ad (c): let q q Since q
in (D,p).
Ξ 1 and for m £1 let := q 1
+ q(-) 1
is continuous and q
+ q(l-i) 1
> 0 on [θ,l]
L Wn/qm - £ *B 0 / ^
(d)
in (D t p) as n -> «,
Now, according to (28), (c) holds if we show that lim
E(f(W n /q)) =E(f(B°/q))
for all f E U^(D,p).
n-χ»
But, again by (28) and (d), we have for each m that
lim ECfCW^c^)) =E(f(B°/q m )) for a l l f E u£(D,p); n
->oo
furthermore, by Lebesgue's theorem lim m^
E(f(B°/q )) =E(f(B°/q)) for all f E U ^ ( D , p ) , b ^
since, by Lemma 27, P(B°/q E C) = 1 and therefore lim p(B°/q , B°/q) = 0 m
F-a.s.
->oo
Thus, given any f G U.(D,p), choosing for each m 6 UN k > k Λ (with k := 0) such that m m-1 o
and putting for each n 6 IN i
:= m
if
n 6 {k 9 .,. 5 k
*1}, we obtain
lim E(f(W /q. )) = E(f(B°/q)). n x n-**> n So, it remains to show (e)
lim
|E(f(W /q ± )) -E(f(W /q))| = 0. 1
n-*»
n
For this, let ε > 0 be arbitrary and δ = δ(ε) > 0 be such that ρ(x,y) ^ 6 implies |f(x) - f(y) | S ε. (Note also that
||f|| := sup |f(x)| < °°.) xGD Then for n sufficiently large (i.e. such that i ύ 6 ) n |E(f(
V q i ) } -E(f 6) n
W (t) | - i L - y | > δ/2) + ]P(
tG[o,i 1 n
}
'
W (t) |-ϋ-_|
sup
> δ/2)l,
te[i-i ,H n
whence it follows from Lemma 26 that |E(f(Wn/qi )) n
n
£ ε + 2 ||f|| [(δ/2)^ 2 . 8 ( 1 ^ q^2(u)du + 0
/
q" 2 (u)du)] 9
^i
and therefore, by (iii ), lim sup |E(f(W / q ^ ) -E(f(W /q))| ^ ε. n-^ n Since ε > 0 was arbitrary, this proves (e) and therefore (c) is shown.
D
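The covariance identity used at the start of the proof of Theorem 20, $K_n(s,t) = s \wedge t - st$, can be checked by simulation. This is a minimal sketch, assuming the form $W_n(t) = n^{-1/2}\sum_i c_{ni}(1\{\xi_{ni} \le t\} - t)$ for uniform $\xi_{ni}$ and scores normalized so that $n^{-1}\sum_i c_{ni}^2 = 1$; both the normalization and the particular scores are assumptions made for the illustration, since definition (65) is not reproduced here.

```python
import math
import random

def weighted_empirical(xi, c, t):
    """W_n(t) = n^{-1/2} * sum_i c_i * (1{xi_i <= t} - t) for uniform xi_i."""
    n = len(xi)
    return sum(ci * ((1.0 if x <= t else 0.0) - t) for x, ci in zip(xi, c)) / math.sqrt(n)

def normalized_scores(n):
    """Hypothetical scores, rescaled so that n^{-1} * sum_i c_i^2 = 1."""
    raw = [1.0 + (i % 3) for i in range(n)]
    scale = math.sqrt(sum(r * r for r in raw) / n)
    return [r / scale for r in raw]

def mc_covariance(c, s, t, reps=5000, seed=0):
    """Monte Carlo estimate of E[W_n(s) * W_n(t)] (the mean of W_n is zero)."""
    rng = random.Random(seed)
    n = len(c)
    total = 0.0
    for _ in range(reps):
        xi = [rng.random() for _ in range(n)]
        total += weighted_empirical(xi, c, s) * weighted_empirical(xi, c, t)
    return total / reps

def bridge_covariance(s, t):
    """Covariance of the Brownian bridge: min(s, t) - s * t."""
    return min(s, t) - s * t

scores = normalized_scores(30)
estimate = mc_covariance(scores, 0.3, 0.7)
```

With 5000 replications the Monte Carlo error is of the order of a few thousandths, so the estimate should sit close to `bridge_covariance(0.3, 0.7)`, which equals 0.09.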
(73) REMARKS. (a) W. Schneemeier (1982) has given an example showing that Theorem 19 fails to hold if the uniform boundedness condition (70) on the scores is replaced by the condition
$\max_{1\le i\le n} c_{ni}^2/n \to 0$ as $n \to \infty$
which was imposed in Theorem 18. Thus, the assumption in Theorem 20 of $\xi_{ni}$ being uniformly distributed on [0,1] cannot be weakened to the assumption that for every $n \in \mathbb{N}$
$n^{-1} \sum_{i=1}^n F_{ni}(t) = t$ for all t ∈ [0,1] (cf. (69))
without strengthening the condition on the scores.

(b) As to the L-statements in Theorem 18 and Theorem 19 it is possible by making use of Theorem 11 a) (or Theorem 11 ) to modify the given proofs such that they operate totally within our theory of $L_b$-convergence.

Note, for example, that along the same lines as in the proof of Proposition B together with an application of Theorem 11 one obtains (within the theory of $L_b$-convergence) that any sequence $(\xi_n)_{n\in\mathbb{N}}$ of random elements in $(D,B_b(D,\rho))$ which satisfies the following two conditions
(i) $\lim_{\delta\downarrow 0} \limsup_{n\to\infty} \mathbb{P}(w_{\xi_n}(\delta) > \varepsilon) = 0$ for each ε > 0
and
(ii) $\lim_{M\to\infty} \limsup_{n\to\infty} \mathbb{P}(\|\xi_n\| > M) = 0$
is relatively $L_b$-sequentially compact and such that for any limiting random element $\xi_0$ one has $L\{\xi_0\}(C) = 1$.

Further results in this direction will be contained in a forthcoming paper by P. Gaenssler, E. Haeusler and W. Schneemeier (1983).
CONCLUDING REMARKS ON FURTHER RESULTS FOR EMPIRICAL PROCESSES INDEXED BY CLASSES OF SETS OR CLASSES OF FUNCTIONS: (a) FUNCTIONAL LAWS OF THE ITERATED LOGARITHM (cf. Gaenssler-Stute (1979), Section 1.3, concerning results for the uniform empirical process α ). One of the main theorems in Kuelbs and Dudley (1980) states that for any
p-space (X,B,μ) the following holds true: (74) If (M^) is satisfied for a class C C β
and μ, and if C is a μ-Donsker
class, then C is a STRASSEN LOG-LOG CLASS for μ, i.e., with probability one the set β n (C) {(
:
Γ75" )nar 1/2 L t L (21oglogn)
n
-
n
^ °
^
s
relatively compact
(w.r.t. the supremum metric p in D (C,μ)) with limit set Br
L
:= {φ: C H- J fdμ, C 6 C; f 6 B } , where C
B := {f e i. 2 (X 5 8 5 μ): J f dμ = 0 X
and
J|f|2dμ ύ 1}. X
(Note that Bn C iP(C,d ) C D (C,μ).) L μ o
Now, as pointed out in Gaenssler (1983), since for $(X,\mathcal{B},\mu) = (\mathbb{R}^k,\mathcal{B}_k,\mu)$, k ≥ 1, the class $\mathcal{C} = J_k$ of all lower left orthants satisfies $(M_1)$ and is a μ-Donsker class for any μ by (58), one obtains by (74) the results of Finkelstein (1971) and Richter (1974), namely
(75) $J_k$ is a Strassen log log class for every p-measure μ on $\mathcal{B}_k$, k ≥ 1.

That the same holds true for $\mathcal{C} = B_k$ (the class of all closed Euclidean balls in $\mathbb{R}^k$, k ≥ 1) is a consequence of our remarks preceding Theorem D and of Corollary 2.4 in Kuelbs and Dudley (1980) according to which one has
(76) If $(M_1)$ is satisfied for μ and a Vapnik-Chervonenkis class $\mathcal{C}$, then $\mathcal{C}$ is a Strassen log log class for μ.
(b) DONSKER CLASSES OF FUNCTIONS. Let $\alpha_n$ be the uniform empirical process (cf. the end of Section 3) and let $q$ be some weight function considered above in connection with weak convergence of random elements in $D = D[0,1]$ w.r.t. $\rho_q$-metrics. For any q ∈ Q^ we know from Theorem 20 (with $c_{ni} = 1$) that
or, equivalently by Lemma 23, that
(77) $\alpha_n/q \stackrel{L}{\longrightarrow} B^\circ/q$ (in $(D,\rho)$).
Now, from a different point of view, taking for each $t \in [0,1]$ the functions $f_t : [0,1] \to \mathbb{R}$ defined by
$$f_t(s) := q^{-1}(t)\,1_{[0,t]}(s), \quad s \in [0,1],$$
$\alpha_n/q$ can be considered as an empirical process indexed by a class of functions; in fact, let
$$\mathcal{F}_q := \{f_t : t \in [0,1]\},$$
then for each $t \in [0,1]$
$$\alpha_n(t)/q(t) = \int_0^1 f_t(s)\,d\alpha_n(s) =: \alpha_n(f_t).$$
Also the limiting process in (77) can be viewed as a mean-zero Gaussian process $G_\mu$ ($\mu$ being here Lebesgue measure on $X = [0,1]$) indexed by $\mathcal{F}_q$, i.e.,
4, then $\mathcal{F}$ is a $\mu$-Donsker class.
$\mu(\{f_0 > t\}) = O\big(t^{-2}(\log t)^{-\beta}\big)$ as $t \to \infty$
FIGURE 5
Note that b), even for $\beta > 1$, implies $f_0 \in L^2(X,\mathcal{B},\mu)$. Conversely, the condition on $\mu(\{f_0 > t\})$ is implied by $f_0 \in L^p(X,\mathcal{B},\mu)$ for some $p > 2$.
Note also that by taking $f_0 = 1$ one obtains Theorem D as a corollary of (84).
In this context the following result of D. Pollard (1981a) extends a special case of (84) to the case where only $f_0 \in L^2(X,\mathcal{B},\mu)$ is assumed:
(85) If $f_0 \in L^2(X,\mathcal{B},\mu)$ and if $\mathcal{C}$ is a Vapnik-Chervonenkis class of sets, then (for a separable version of $G_\mu \equiv (G_\mu(f))_{f \in \mathcal{F}}$)
$$\mathcal{F} := \{f_0 1_C : C \in \mathcal{C}\}$$
is a $\mu$-Donsker class.
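The reinterpretation of $\alpha_n/q$ as a function-indexed process in (b) rests on the identity $\alpha_n(t)/q(t) = \alpha_n(f_t)$ with $f_t = q^{-1}(t)\,1_{[0,t]}$. The sketch below checks this identity on a fixed sample. It is an illustration only: the weight function `q`, the helper names and the sample are assumptions made for the example and are not taken from the text.

```python
import math


def q(t):
    # Hypothetical weight function, chosen only for illustration (assumption).
    return (t * (1.0 - t)) ** 0.25


def alpha_n(sample, t):
    """Uniform empirical process alpha_n(t) = sqrt(n) * (F_n(t) - t)."""
    n = len(sample)
    f_n = sum(1 for x in sample if x <= t) / n
    return math.sqrt(n) * (f_n - t)


def f_t(t):
    """Return f_t(s) = 1_{[0,t]}(s) / q(t) together with its Lebesgue integral over [0, 1]."""
    def f(s):
        return (1.0 if 0.0 <= s <= t else 0.0) / q(t)
    integral = t / q(t)  # int_0^1 f_t(s) ds
    return f, integral


def alpha_n_of(sample, f, integral):
    """Function-indexed version: alpha_n(f) = n^{-1/2} * sum_i (f(xi_i) - int f dmu)."""
    n = len(sample)
    return sum(f(x) - integral for x in sample) / math.sqrt(n)


if __name__ == "__main__":
    sample = [0.1, 0.4, 0.7, 0.95]  # fixed illustrative sample
    for t in (0.3, 0.5, 0.8):
        f, integral = f_t(t)
        print(t, alpha_n(sample, t) / q(t), alpha_n_of(sample, f, integral))
```

The two printed columns agree up to floating-point rounding, which is all the identity claims; the weak convergence statement (77) is a separate matter and is not tested here.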
(c) STRONG APPROXIMATIONS (cf. Gaenssler-Stute (1979), Section 3, concerning results for the uniform empirical process $\alpha_n$). In a recent paper by R.M. Dudley and W. Philipp (1983) almost sure and probability invariance principles are established for sums of independent not necessarily (Borel-)measurable random elements with values in a not necessarily
separable Banach space like the closure of $D_0(\mathcal{C},\mu)$ in $(\ell^\infty(\mathcal{C}),\rho)$, fitting readily into the theory of empirical C-processes $\beta_n \equiv (\beta_n(C))_{C \in \mathcal{C}}$, being now viewed as partial sum processes
$$\beta_n = n^{-1/2} \sum_{i=1}^{n} \zeta_i \quad \text{with} \quad \zeta_i \equiv (\zeta_i(C))_{C \in \mathcal{C}}$$
defined by $\zeta_i(C) := 1_C(\xi_i) - \mu(C)$ (for a given sequence $(\xi_i)_{i \in \mathbb{N}}$ of random elements in $(X,\mathcal{B})$ with distribution $\mu$ on $\mathcal{B}$) having its values in $D_0(\mathcal{C},\mu)$. In an analogous way the same viewpoint applies for empirical $\mathcal{F}$-processes. This approach of getting strong resp. weak invariance principles has the advantage that one can bypass most of the problems of measurability and topological characteristics which occurred in our theory of $L_b$-convergence, where it was essential to choose proper sample spaces $S$ and $S_0$ for the processes $\beta_n$ and $G_\mu$, respectively, together with suitable σ-algebras in $S$ and $S_0$ on which the laws of $\beta_n$ and $G_\mu$ could be defined.
On the other hand, we think that the availability of the presented theory of weak convergence of empirical processes is at least necessary to support Dudley's and Philipp's claim that strong approximation results are strengthened versions of functional central limit theorems.
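The partial-sum viewpoint described in (c) can be made concrete for a small finite class of sets. The sketch below is an assumption-laden illustration rather than anything from Dudley and Philipp (1983): it takes $\mu$ to be Lebesgue measure on $[0,1]$ and a hypothetical class of half-open intervals $(a,b]$, forms the summands $\zeta_i(C) = 1_C(\xi_i) - \mu(C)$, and checks that $n^{-1/2}\sum_i \zeta_i$ agrees coordinatewise with $n^{1/2}(\mu_n(C) - \mu(C))$, where $\mu_n$ denotes the empirical measure of the sample.

```python
import math


def zeta(xi, sets):
    """zeta_i(C) = 1_C(xi_i) - mu(C) for intervals C = (a, b], mu = Lebesgue measure on [0, 1]."""
    return [(1.0 if a < xi <= b else 0.0) - (b - a) for (a, b) in sets]


def beta_n_partial_sum(sample, sets):
    """beta_n = n^{-1/2} * sum_i zeta_i, evaluated coordinatewise over the class."""
    n = len(sample)
    total = [0.0] * len(sets)
    for xi in sample:
        total = [acc + z for acc, z in zip(total, zeta(xi, sets))]
    return [acc / math.sqrt(n) for acc in total]


def beta_n_direct(sample, sets):
    """beta_n(C) = n^{1/2} * (mu_n(C) - mu(C)), computed from the empirical measure mu_n."""
    n = len(sample)
    out = []
    for (a, b) in sets:
        mu_n = sum(1 for xi in sample if a < xi <= b) / n
        out.append(math.sqrt(n) * (mu_n - (b - a)))
    return out


if __name__ == "__main__":
    sample = [0.05, 0.2, 0.35, 0.6, 0.61, 0.9]        # fixed illustrative sample
    sets = [(0.0, 0.25), (0.25, 0.75), (0.5, 1.0)]    # hypothetical finite class of intervals
    print(beta_n_partial_sum(sample, sets))
    print(beta_n_direct(sample, sets))
```

Each $\zeta_i$ is a centered, bounded element indexed by the class, which is what lets results for sums of independent Banach-space-valued elements be applied; the finite class here sidesteps the measurability questions the text mentions.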
REFERENCES
ANDERSON, T.W. and DARLING, D.A. (1952). Asymptotic theory of certain 'goodness of fit' criteria based on stochastic processes. Ann. Math. Statist. 23 193-212.
BAUER, H. (1978). Wahrscheinlichkeitstheorie und Grundzüge der Maßtheorie, 3. Auflage. Walter de Gruyter, Berlin - New York.
BENNETT, G. (1962). Probability inequalities for the sum of independent random variables. J. Amer. Statist. Assoc. 57 33-45.
BILLINGSLEY, P. (1968). Convergence of Probability Measures. Wiley, New York.
BILLINGSLEY, P. (1971). Weak Convergence of Measures: Applications in Probability. Regional Conference Series in Appl. Math., No. 5. SIAM, Philadelphia.
BOLTHAUSEN, E. (1978). Weak convergence of an empirical process indexed by the closed convex subsets of $I^2$. Z. Wahrscheinlichkeitstheorie und verw. Gebiete 43 173-181.
BORISOV, I.S. (1981). Some limit theorems for empirical distributions; Abstract of Reports, Third Vilnius Conference on Probability Theory and Math. Statistics I 71-72.
BREIMAN, L. (1968). Probability. Addison-Wesley, Reading.
BRONŠTEIN, E.M. (1976). ε-entropy of convex sets and functions. Siberian Math. J. (English translation) 17 393-398.
CHERNOFF, H. (1964). Estimation of the mode. Ann. Inst. Statist. Math. 16 31-41.
CHIBISOV, D.M. (1965). An investigation of the asymptotic power of tests of fit. Theor. Probability Appl. 10 421-437.
COVER, T.M. (1965). Geometric and statistical properties of systems of linear inequalities with applications to pattern recognition. IEEE Trans. Elec. Comp. EC-14 326-334.
DEVROYE, L. (1982). Bounds for the uniform deviation of empirical measures. J. Multiv. Anal. 12 72-79.
DONSKER, M.D. (1952). Justification and extension of Doob's heuristic approach to the Kolmogorov-Smirnov theorems. Ann. Math. Statist. 23 277-281.
DOOB, J.L. (1949). Heuristic approach to the Kolmogorov-Smirnov theorems. Ann. Math. Statist. 20 393-403.
DUDLEY, R.M. (1966). Weak convergence of probabilities on nonseparable metric spaces and empirical measures on Euclidean spaces. Illinois J. Math. 10 109-126.
DUDLEY, R.M. (1967). The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. J. Functional Analysis 1 290-330.
DUDLEY, R.M. (1968). Distances of probability measures and random variables. Ann. Math. Statist. 39 1563-1572.
DUDLEY, R.M. (1973). Sample functions of the Gaussian process. Ann. Probability 1 66-103.
DUDLEY, R.M. (1974). Metric entropy of some classes of sets with differentiable boundaries. J. Approximation Theory 10 227-236.
DUDLEY, R.M. (1976). Probabilities and Metrics, Convergence of laws on metric spaces, with a view to statistical testing. Aarhus Lecture Notes Series No. 45.
DUDLEY, R.M. (1978). Central Limit Theorems for Empirical Measures. Ann. Probability 6 899-929.
DUDLEY, R.M. (1979). Balls in $\mathbb{R}^k$ Do Not Cut All Subsets of k + 2 Points. Adv. in Math. 31 306-308.
DUDLEY, R.M. (1979a). Lower layers in $\mathbb{R}^2$ and convex sets in $\mathbb{R}^3$ are not GB classes. Springer Lecture Notes in Math. 709 97-102.
DUDLEY, R.M. (1981a). Donsker classes of functions; Statistics and Related Topics (Proc. Symp. Ottawa, 1980), North Holland, New York - Amsterdam, 341-352.
DUDLEY, R.M. (1981b). Vapnik-Chervonenkis-Donsker classes of functions, Aspects Statistiques et aspects physiques des processus gaussiens (Proc. Colloque C.N.R.S. St. Flour, 1980), C.N.R.S. Paris, 251-269.
DUDLEY, R.M. (1982). Empirical and Poisson processes on classes of sets or functions too large for central limit theorems. Z. Wahrscheinlichkeitstheorie verw. Gebiete 61 355-368.
DUDLEY, R.M. and PHILIPP, W. (1983). Invariance principles for sums of Banach space valued random elements and empirical processes. Z. Wahrscheinlichkeitstheorie verw. Gebiete 62 509-552.
DURBIN, J. (1973). Distribution Theory for Tests based on the Sample Distribution Function. Regional Conference Series in Appl. Math., No. 9. SIAM, Philadelphia.
DURST, M. and DUDLEY, R.M. (1980). Empirical Processes, Vapnik-Chervonenkis classes and Poisson Processes. Probability and Mathematical Statistics (Wrocław) 1, Fasc. 2, 109-115.
FINKELSTEIN, H. (1971). The law of the iterated logarithm for empirical distributions. Ann. Math. Statist. 42 607-615.
GAENSSLER, P. and STUTE, W. (1977). Wahrscheinlichkeitstheorie. Springer, Berlin-Heidelberg-New York.
GAENSSLER, P. and STUTE, W. (1979). Empirical Processes: A Survey of Results for independent and identically distributed random variables. Ann. Probability 7 193-243.
GAENSSLER, P. (1981). On certain properties of convex sets in Euclidean spaces with probabilistic implications. Unpublished manuscript.
GAENSSLER, P. and WELLNER, J.A. (1981). Glivenko-Cantelli Theorems. To appear in the Encyclopedia of Statistical Sciences, Volume 3.
GAENSSLER, P. (1983). Limit Theorems for Empirical Processes indexed by classes of sets allowing a finite-dimensional parametrization. Probability and Mathematical Statistics (Wrocław), Vol. IV, Fasc. 1.
HARDING, E.F. (1967). The number of partitions of a set of N points in k dimensions induced by hyperplanes. Proc. Edinburgh Math. Soc. (Ser. II) 15 285-298.
HEWITT, E. (1947). Certain Generalizations of the Weierstraß Approximation Theorem. Duke Math. J. 14 419-427.
HOEFFDING, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58 13-30.
KELLEY, J.L. (1961). General Topology. D. van Nostrand Comp., Inc., Princeton, N.J.
KIRSZBRAUN, M.D. (1934). Über die zusammenziehende und Lipschitzsche Transformationen. Fund. Math. 22 77-108.
KUELBS, J. and DUDLEY, R.M. (1980). Log log laws for empirical measures. Ann. Probability 8 405-418.
McSHANE, E.J. (1934). Extension of range of functions. Bull. Amer. Math. Soc. 40 837-842.
POLLARD, D. (1979). Weak convergence on non-separable metric spaces. J. Austral. Math. Soc. (Ser. A) 28 197-204.
POLLARD, D. (1981). Limit theorems for empirical processes. Z. Wahrscheinlichkeitstheorie und verw. Gebiete 57 181-195.
POLLARD, D. (1981a). A central limit theorem for empirical processes. To appear in J. Austral. Math. Soc.
PROHOROV, Yu.V. (1956). Convergence of random processes and limit theorems in probability theory. Theor. Probability Appl. 1 157-214.
PYKE, R. and SHORACK, G. (1968). Weak convergence of a two sample empirical process and a new approach to Chernoff-Savage theorems. Ann. Math. Statist. 39 755-771.
PYKE, R. (1977). The Haar-function construction of Brownian motion indexed by sets. Technical Report 35, Dept. of Mathematics, University of Washington, Seattle.
PYKE, R. (1979). Recent developments in empirical processes. Adv. Appl. Prob. 11 267-268.
PYKE, R. (1982). The Haar-function construction of Brownian motion indexed by sets. Technical Report 18, Dept. of Statistics, University of Washington, Seattle.
PYKE, R. (1982a). A uniform central limit theorem for partial-sum processes indexed by sets. Technical Report 17, Dept. of Statistics, University of Washington, Seattle.
RICHTER, H. (1974). Das Gesetz vom iterierten Logarithmus für empirische Verteilungsfunktionen im $\mathbb{R}^k$. Manuscripta Math. 11 291-303.
SAUER, N. (1972). On the density of families of sets. J. Comb. Theory (A) 13 145-147.
SCHLÄFLI, L. (1901, posth.). Theorie der vielfachen Kontinuität, in Gesammelte Math. Abhandlungen I; Birkhäuser, Basel, 1950.
SERFLING, R.J. (1974). Probability Inequalities for the Sum in Sampling without Replacement. Ann. Statist. 2 39-48.
SHORACK, G.R. (1979). The weighted empirical process of row independent random variables with arbitrary distribution functions. Statistica Neerlandica 33 169-189.
SKOROKHOD, A.V. (1956). Limit theorems for stochastic processes. Theor. Probability Appl. 1 261-290.
STEELE, M. (1975). Combinatorial Entropy and Uniform Limit Laws. Ph.D. dissertation, Stanford University.
STEELE, M. (1978). Empirical discrepancies and subadditive processes. Ann. Probability 6 118-127.
STEINER, J. (1826). Einige Gesetze über die Theilung der Ebene und des Raumes. J. Reine Angew. Math. 1 349-364.
STUTE, W. (1982). The oscillation behavior of empirical processes. Ann. Probability 10 86-107.
SUN, T.G. (1977). Ph.D. dissertation, Dept. of Mathematics, University of Washington, Seattle.
SUN, T.G. and PYKE, R. (1982). Weak convergence of empirical processes. Technical Report 19, Dept. of Statistics, University of Washington, Seattle.
TALAGRAND, M. (1978). Les boules peuvent-elles engendrer la tribu borélienne d'un espace métrisable non séparable? Communication au Séminaire Choquet (Paris).
VALENTINE, F.A. (1964). Convex Sets. McGraw-Hill, New York.
VAPNIK, V.N. and CHERVONENKIS, A.Ya. (1971). On uniform convergence of the frequencies of events to their probabilities. Theor. Probability Appl. 16 264-280.
WATSON, D. (1969). On partitions of n points. Proc. Edinburgh Math. Soc. 16 263-264.
WEGMAN, E.J. (1971). A note on the estimation of the mode. Ann. Math. Statist. 42 1909-1915.
WELLNER, J.A. (1977). A Glivenko-Cantelli theorem and strong laws of large numbers for functions of order statistics. Ann. Statist. 5 473-480; Correction, ibid. 6 1394.
WENOCUR, R.S. and DUDLEY, R.M. (1981). Some special Vapnik-Chervonenkis classes. Discrete Math. 33 313-318.
WICHURA, M.J. (1968). On the weak convergence of non-Borel probabilities on a metric space. Ph.D. dissertation, Columbia University.
WICHURA, M.J. (1970). On the construction of almost uniformly convergent random variables with given weakly convergent image laws. Ann. Math. Statist. 41 284-291.
NOTATION INDEX
$|A|$, 21
$A^\circ$, 47
$A^c$, 47
$\alpha_n \equiv (\alpha_n(t))_{t \in [0,1]}$, 100
$\mathcal{B}(D,s)$, 92
$\mathcal{B}(C,\rho)$, 93
$\mathcal{B}_b(S,\rho)$, $\mathcal{B}(S,\rho)$ ($S = D_0(\mathcal{C},\mu)$), 107
$\mathcal{B}_b(S) \equiv \mathcal{B}_b(S,d)$, 41
$B(x,r)$, 41
co(F), 38
$C^b(S)$, 42
$C \equiv C[0,1]$, 93
$D_n^F$, 12
$D_n^F(q)$, 17
$\Delta^{\mathcal{C}}(F)$, 21
$F(S)$, 47
$D(f)$, 43
$\|f\| := \sup_{x \in S} |f(x)|$, 59
$d(\xi_n,\eta_n)$, 65
$G(S)$, 47
$d := \max(d',d'')$, 70
$g^{-1}(B')$, 54
diam(A), 83
$D \equiv D[0,1]$, 90
$\int^* f\,d\mu$, 43
$\int_* f\,d\mu$, 43
$d_\mu$, 106
$D_0(\mathcal{C},\mu)$, 106
$\varepsilon_x(B)$, 1
$\xi_n \stackrel{L_b}{\longrightarrow} \xi$, 65
$(E_0)$, 111
$\xi_n \stackrel{L}{\longrightarrow} \xi$ (in $(D,s)$), 93
$K_n(s,t)$, 140
$K(s,t)$, 143
$N(\varepsilon,\mathcal{C},\mu)$, 111
$N_I(\varepsilon,\mathcal{C},\mu)$, 112
$P(X)$, 21
$m^{\mathcal{C}}(r)$, 22
$P^*$, 65
$\mu^*(A)$, 43
$\pi_t$, 91
$\pi_C$, 107
(M), 107
$(M_0)$, 131
$(M_1)$, 132
$\rho$ = supremum metric, 90
$N_n(B)$, 3
$S = (S,d)$, 42
$W$, 143
$\sigma(\{d(\cdot,x) : x \in S\})$, 43
$\sigma(\{\pi_t : t \in [0,1]\})$, 91
$\|x\| := \sup_{t \in [0,1]} |x(t)|$ ($x \in D[0,1]$), 95
$s$ = Skorokhod metric, 92
(SE), 108
$U_n$, $U_n(t)$, 18
$U^b(\mathcal{C},d_\mu)$, 106
$V(\mathcal{C})$, 22

USED ABBREVIATIONS:
ad( ): = as to the proof of ( ):
a.e. = almost everywhere
CLT = Central Limit Theorem
df = distribution function
fidis = finite dimensional distributions
GCC = Glivenko-Cantelli class
= convergence in the mean
= convergence in probability
P-a.s. = $\mathbb{P}$-almost surely
rest$_A$ f = restriction of f on A
r.h.s. = right hand side
SLLN = Strong Law of Large Numbers
w.r.t. = with respect to
VCC = Vapnik-Chervonenkis class
SUBJECT INDEX
analytic generator, 77
Bernstein-type inequality, 8, 9
Borel extension μ of a separable measure μ on $\mathcal{B}_b(S)$, 44
Brownian bridge, 101
canonical model, 15
characterization theorem for μ-Donsker classes, 113
Chernoff-type estimates of the mode, 15
compactness, 58
compact net in $M_a(S)$, 63
continuous mapping theorems, 54
convergence in law ($L_b$-convergence, 65; $L$-convergence, 93)
convergence theorem for reversed submartingales, 10
Cramer-type result, 65
Devroye-Inequality, 34
distribution-free statistic, 18
Donsker classes of functions, 159, 162
Donsker's functional CLT for the uniform empirical process, 101
d-strictly separated, 77
δ-tight, 58
δ-tight w.r.t. $S_0$, 76
empirical C-discrepancy, 9
empirical C-process, 8, 106
empirical distribution function (empirical df), 12
empirical $\mathcal{F}$-process, 160
empirical measure, 1
existence of a version of $G_\mu \equiv (G_\mu(C))_{C \in \mathcal{C}}$ in $S_0 \equiv U^b(\mathcal{C},d_\mu)$, 110, 111, 114
finite dimensional distributions (fidis), 95, 110
functional CLT's for empirical C-processes, 105
functional CLT's for weighted empirical processes, 139
functional CLT's for weighted empirical processes w.r.t. $\rho_q$-metrics, 150
functional laws of the iterated logarithm, 158
Gaussian process, 2, 110, 143, 151, 160
Glivenko-Cantelli class (GCC), 36, 39, 137
Glivenko-Cantelli convergence, 13, 15, 18
Glivenko-Cantelli theorem, 12, 15, 21
growth function pertaining to C, 22
identification of limits, 52
$L$-convergence, 93
$L_b$-convergence, 65
lower left orthants in $X = \mathbb{R}^k$, 2, 108, 129, 159
Markov property of empirical measures, 3
martingale property of empirical measures, 6
martingale property of the weighted empirical process, 152
measurability, 107, 131
metric entropy, 111
metric entropy with inclusion, 112
μ-Donsker class, 113
μ-Donsker class of functions, 162
Poissonization, 7
Portmanteau-theorem, 47
product spaces, 69
p-space, 1
quantile transformation, 12
Radon's theorem, 38
random change of time, 102
random element in X, 1
randomized discrepancy, 10
relatively $L$-sequentially compact, 95
relatively $L_b$-sequentially compact, 95
reversed martingale, 9
reversed submartingale, 9, 10
$\rho_q$-metric, 148
scores, 140
separable measures on $\mathcal{B}_b(S)$, 44
separation property, 38
sequential compactness, 74
shattered by C, 22
Skorokhod-Dudley-Wichura Representation Theorem, 82
Skorokhod metric s, 92
smoothed version of empirical processes, 138
special versions, 12, 17
Strassen log log class, 159
strong approximation, 164
supremum metric ρ, 90, 106
tight measures on $\mathcal{B}_b(S)$, 45
trace σ-algebra, 14
uniform empirical process, 100
Vapnik-Chervonenkis class (VCC), 22
Vapnik-Chervonenkis-Inequalities, 27, 34
Vapnik-Chervonenkis-Lemma, 24
version, 111
weak convergence and mappings, 53
weak convergence criteria, 58
weak convergence of random elements in $D \equiv D[0,1]$ w.r.t. $\rho_q$-metrics, 147
weighted discrepancy, 17
weighted empirical process, 140
weight function, 17