μ̂(t) = ∫ e^{i⟨t,x⟩} μ(dx)    (t ∈ R^k),    (5.2)

where, as usual, the integral is over the whole space R^k. If μ is a probability measure, μ̂ is also called the characteristic function of μ. Note that if μ is absolutely continuous (as a finite signed measure) with respect to Lebesgue measure, having density (i.e., the Radon–Nikodym derivative) f, then

μ̂ = f̂.    (5.3)
The convolution μ*ν of two finite signed measures is the finite signed measure defined by

(μ*ν)(B) = ∫ μ(B − x) ν(dx)    (B ∈ 𝔅^k),    (5.4)

The Fourier–Stieltjes Transform

where for A ⊂ R^k, y ∈ R^k, the translate A + y is defined by

A + y = { z = u + y : u ∈ A }.    (5.5)
It is clear that convolution is commutative and associative. One defines the n-fold convolution μ*ⁿ by

μ*¹ = μ,    μ*ⁿ = μ*⁽ⁿ⁻¹⁾ * μ    (n > 1).    (5.6)
Let μ be a finite signed measure on R^k. For any measurable map T on the space (R^k, 𝔅^k, μ) into (R^s, 𝔅^s) one defines the induced signed measure μ∘T⁻¹ by

(μ∘T⁻¹)(B) = μ(T⁻¹(B))    (B ∈ 𝔅^s).    (5.7)
THEOREM 5.1. (i) (Uniqueness Theorem). The map μ → μ̂ is one-to-one on the class of finite signed measures on R^k. (ii) For a finite signed measure μ, μ̂ is uniformly continuous, μ̂(0) = μ(R^k), and sup_t |μ̂(t)| is bounded by the total variation norm of μ. (iii) (μ*ν)^ = μ̂·ν̂.
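Theorem 5.1(iii) can be checked concretely for discrete measures; the following minimal Python sketch (the point-mass weights p and q are arbitrary illustrative choices, not from the text) verifies that the transform of a convolution is the product of the transforms:

```python
import cmath

# A discrete illustration of Theorem 5.1(iii): the Fourier-Stieltjes
# transform of a convolution is the product of the transforms.
p = [0.2, 0.5, 0.3]   # measure sum_j p[j] * delta_j  (arbitrary choice)
q = [0.6, 0.4]        # another discrete measure       (arbitrary choice)

def convolve(a, b):
    """(a*b)[n] = sum_j a[j] b[n-j]: distribution of the sum."""
    out = [0.0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for l, bl in enumerate(b):
            out[j + l] += aj * bl
    return out

def cf(weights, t):
    """mu-hat(t) = sum_j weights[j] e^{i t j}."""
    return sum(w * cmath.exp(1j * t * j) for j, w in enumerate(weights))

conv = convolve(p, q)
for t in (0.0, 0.7, 2.3):
    assert abs(cf(conv, t) - cf(p, t) * cf(q, t)) < 1e-12
assert abs(cf(conv, 0.0) - 1.0) < 1e-12   # mu-hat(0) = mu(R^k)
print("convolution transform equals product of transforms")
```

The same multiplicativity is what underlies the treatment of n-fold convolutions μ*ⁿ via μ̂ⁿ below.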
φ_{m,V}(x) = (2π)^{−k/2} (Det V)^{−1/2} exp{ −½⟨x − m, V⁻¹(x − m)⟩ }    (x ∈ R^k).    (6.31)

Of the two parameters, m ∈ R^k, and V is a symmetric positive-definite k×k matrix. The notation Det V stands for the determinant of V, and V⁻¹ is, of course, the inverse of V. It is well known that†

Φ̂_{m,V}(t) = exp{ i⟨t, m⟩ − ½⟨t, Vt⟩ }    (t ∈ R^k).    (6.32)
From this it follows that m is the mean and V is the covariance matrix of Φ_{m,V}. For the computation of the cumulants χ_ν for |ν| ≥ 2, it is convenient to take m = 0 (for |ν| ≥ 2 a change of origin does not affect the cumulants χ_ν). This yields

log Φ̂_{0,V}(t) = −½⟨t, Vt⟩,    (6.33)

†See Cramér [4], pp. 118, 119.

which shows that

χ_ν = (i,j) element of V    if ν = e_i + e_j,    (6.34)

where e_i is the vector with 1 for the ith coordinate and 0 for the others, 1 ≤ i ≤ k. Also,

χ_ν = 0    if |ν| > 2.    (6.35)
Another important property of the normal distribution is

Φ_{m₁,V₁} * Φ_{m₂,V₂} * ··· * Φ_{m_n,V_n} = Φ_{m,V},    (6.36)

where

m = m₁ + m₂ + ··· + m_n,    V = V₁ + V₂ + ··· + V_n.    (6.37)

This follows from (6.32) and Theorem 5.1(i), (iii). The normal distribution Φ_{0,I}, where I is the k×k identity matrix, is called the standard normal distribution on R^k and is denoted by Φ; the density of Φ is denoted by φ. Lastly, if X = (X₁,...,X_k) is a random vector with distribution Φ_{m,V}, then, for every a ∈ R^k, a ≠ 0, the random variable ⟨a, X⟩ has the one-dimensional normal distribution with mean ⟨a, m⟩ and variance ⟨a, Va⟩ = Σ_{i,j=1}^k a_i a_j v_{ij}, where

v_{ij} = (i,j) element of V = cov(X_i, X_j)    (i,j = 1,...,k).
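Formula (6.32) is easy to verify numerically in one dimension; a minimal sketch (the parameter values m and v are arbitrary choices) compares a quadrature of the Fourier–Stieltjes integral against exp{itm − vt²/2}:

```python
import math, cmath

# Numerically verify (6.32) in one dimension: the Fourier-Stieltjes
# transform of the N(m, v) density equals exp(i t m - v t^2 / 2).
m, v = 0.5, 2.0   # arbitrary illustrative parameters

def density(x):
    return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

def cf_numeric(t, lo=-40.0, hi=40.0, n=20000):
    # Trapezoidal quadrature of the integral of e^{itx} phi_{m,v}(x) dx.
    h = (hi - lo) / n
    total = 0.0 + 0.0j
    for j in range(n + 1):
        x = lo + j * h
        w = 0.5 if j in (0, n) else 1.0
        total += w * cmath.exp(1j * t * x) * density(x)
    return total * h

for t in (0.0, 0.5, 1.5):
    exact = cmath.exp(1j * t * m - v * t * t / 2)
    assert abs(cf_numeric(t) - exact) < 1e-6
print("normal characteristic function verified")
```

The Gaussian decay makes the truncated trapezoidal rule accurate far beyond the tolerance used here.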
7. THE POLYNOMIALS P̃_s AND THE SIGNED MEASURES P_s

Throughout this section ν = (ν₁,...,ν_k) is a nonnegative integral vector in R^k. Consider the polynomials

χ̃_s(z) = s! Σ_{|ν|=s} (χ_ν / ν!) z^ν    (z^ν = z₁^{ν₁} ··· z_k^{ν_k}; s = 1, 2, ...)    (7.1)

in k variables z₁, z₂,...,z_k (real or complex) for a given set of real constants χ_ν. We define the formal polynomials P̃_s(z: {χ_ν}) in z₁,...,z_k by means of the following identity between two formal power series (in the real variable u):

1 + Σ_{s=1}^∞ P̃_s(z: {χ_ν}) u^s = exp{ Σ_{s=1}^∞ χ̃_{s+2}(z) u^s / (s+2)! }

= 1 + Σ_{m=1}^∞ (1/m!) [ Σ_{s=1}^∞ χ̃_{s+2}(z) u^s / (s+2)! ]^m.    (7.2)
Expansions of Characteristic Functions
In other words, P̃_s(z: {χ_ν}) is the coefficient of u^s in the series on the extreme right. Thus

P̃_s(z: {χ_ν}) = Σ_{m=1}^s (1/m!) Σ* { χ̃_{j₁+2}(z) χ̃_{j₂+2}(z) ··· χ̃_{j_m+2}(z) / [(j₁+2)! (j₂+2)! ··· (j_m+2)!] }    (s = 1, 2, ...),    (7.3)

where the summation Σ* is over all m-tuples of positive integers (j₁,...,j_m) satisfying

j_i = 1, 2, ..., s (1 ≤ i ≤ m),    Σ_{i=1}^m j_i = s,    (7.4)

and Σ** denotes summation over all m-tuples of nonnegative integral vectors (ν₁,...,ν_m) satisfying

|ν_i| = j_i + 2    (1 ≤ i ≤ m).    (7.5)

In particular,

P̃₁(z: {χ_ν}) = χ̃₃(z)/3! = Σ_{|ν|=3} (χ_ν/ν!) z^ν,    (7.6)

P̃₂(z: {χ_ν}) = χ̃₄(z)/4! + χ̃₃(z)²/(2!(3!)²).    (7.7)

For a probability measure G on R^k with zero mean, covariance matrix V, finite sth absolute moment (s ≥ 3), and cumulants χ_ν, one has, for each fixed t ∈ R^k,

Ĝⁿ(t/n^{1/2}) = exp{ −½⟨t, Vt⟩ } [ 1 + Σ_{r=1}^{s−2} n^{−r/2} P̃_r(it: {χ_ν}) ] × (1 + o(n^{−(s−2)/2}))    (n → ∞),    (7.8)
where, in the evaluation of P̃_r(it: {χ_ν}), one uses the cumulants χ_ν of G. Thus, for each t ∈ R^k, one has an asymptotic expansion of Ĝⁿ(t/n^{1/2}) in powers of n^{−1/2}, in the sense that the remainder is of smaller order of magnitude than the last term in the expansion. The first term in the asymptotic expansion is the characteristic function of the normal distribution Φ_{0,V}. The function (of t) that appears as the coefficient of n^{−r/2} in the asymptotic expansion is the Fourier transform of a function that we denote by P_r(−φ_{0,V}: {χ_ν}). The reason for such a notation is supplied by the following lemma.

LEMMA 7.2. The function

t → P̃_r(it: {χ_ν}) exp{ −½⟨t, Vt⟩ }    (t ∈ R^k)    (7.9)

is the Fourier transform of the function P_r(−φ_{0,V}: {χ_ν}) obtained by formally substituting

(−1)^{|ν|} D^ν φ_{0,V}    for    (it)^ν    (7.10)

for each ν in the polynomial P̃_r(it: {χ_ν}). Here φ_{0,V} is the normal density in R^k with mean zero and covariance matrix V. Thus one has the formal identity

P_r(−φ_{0,V}: {χ_ν}) = P̃_r(−D: {χ_ν}) φ_{0,V},    (7.11)

where −D = (−D₁,...,−D_k).

Proof. The Fourier transform of φ_{0,V} is given by [see (6.32)]

φ̂_{0,V}(t) = exp{ −½⟨t, Vt⟩ }    (t ∈ R^k).    (7.12)

Also

((−1)^{|ν|} D^ν φ_{0,V})^(t) = (it)^ν φ̂_{0,V}(t)    (t ∈ R^k),    (7.13)

which is obtained by taking the νth derivatives with respect to x on both sides of (the Fourier inversion formula)

φ_{0,V}(x) = (2π)^{−k} ∫ exp{ −i⟨t, x⟩ } φ̂_{0,V}(t) dt    (x ∈ R^k)    (7.14)
[or, by Theorem 4.1(iv), (v)]. Q.E.D.

We define P_r(−Φ_{0,V}: {χ_ν}) as the finite signed measure on R^k whose density is P_r(−φ_{0,V}: {χ_ν}). For any given finite signed measure μ on R^k, we define μ(·), the distribution function of μ, by

μ(x) = μ((−∞, x])    (x ∈ R^k),    (7.15)

where

(−∞, x] = (−∞, x₁] × ··· × (−∞, x_k]    [x = (x₁,...,x_k) ∈ R^k].    (7.16)

Note that

D₁ ··· D_k P_r(−Φ_{0,V}: {χ_ν})(x) = P_r(−φ_{0,V}: {χ_ν})(x) = P̃_r(−D: {χ_ν}) φ_{0,V}(x)
= P̃_r(−D: {χ_ν})(D₁ ··· D_k Φ_{0,V})(x) = D₁ ··· D_k (P̃_r(−D: {χ_ν}) Φ_{0,V})(x).    (7.17)

Thus the distribution function of P_r(−Φ_{0,V}: {χ_ν}) is obtained by applying the operator P̃_r(−D: {χ_ν}) to the normal distribution function Φ_{0,V}(·). The last equality in (7.17) follows from the fact that the differential operators P̃_r(−D: {χ_ν}) and D₁D₂···D_k commute.

Let us write down P₁(−φ_{0,V}: {χ_ν}) explicitly. By (7.6),

P̃₁(it: {χ_ν}) = Σ_{|ν|=3} (χ_ν / ν!) (it)^ν    (t ∈ R^k),    (7.18)
so that (by Lemma 7.2)

P₁(−φ_{0,V}: {χ_ν})(x) = − Σ_{|ν|=3} (χ_ν / ν!) (D^ν φ_{0,V})(x)    [x = (x₁,...,x_k) ∈ R^k, V⁻¹ = ((v^{ij}))].    (7.19)

Here each derivative D^ν φ_{0,V}(x), |ν| = 3, is a polynomial of degree three in the linear forms Σ_{j=1}^k v^{ij} x_j (1 ≤ i ≤ k) multiplied by φ_{0,V}(x), so that (7.19) may be written out term by term in the coordinates of x.

If one takes V = I, then (7.19) reduces to

P₁(−φ: {χ_ν})(x) = −{ (1/6)[χ_{(3,0,...,0)}(−x₁³ + 3x₁) + ··· + χ_{(0,...,0,3)}(−x_k³ + 3x_k)]
+ (1/2)[χ_{(2,1,0,...,0)}(−x₁²x₂ + x₂) + ··· + χ_{(0,...,0,1,2)}(−x_k²x_{k−1} + x_{k−1})]
+ [χ_{(1,1,1,0,...,0)}(−x₁x₂x₃) + ··· + χ_{(0,...,0,1,1,1)}(−x_k x_{k−1} x_{k−2})] } φ(x)    (x ∈ R^k),    (7.20)
where φ is the standard normal density in R^k. If k = 1, by letting χ_j be the jth cumulant of a probability measure G on R¹ (j = 1, 2, 3) having zero mean, one gets [using (6.13)]

P₁(−φ: {χ_ν})(x) = (μ₃/6)(x³ − 3x) φ(x)    (x ∈ R¹),    (7.21)

where μ₃ is the third moment of G. Finally, note that whatever the numbers {χ_ν} and the positive-definite symmetric matrix V are,

∫ P_s(−φ_{0,V}: {χ_ν})(x) dx = P_s(−Φ_{0,V}: {χ_ν})(R^k) = 0,    (7.22)
for all s ≥ 1. This follows from

∫ (D^ν φ_{0,V})(x) dx = 0    for |ν| ≥ 1.    (7.23)

The relation (7.23) is a consequence of the fact that φ_{0,V} and all its derivatives are integrable and vanish at infinity.
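Relations (7.21)–(7.23) can be illustrated in one dimension: the expansion term (μ₃/6)(x³ − 3x)φ(x) carries zero total mass, in line with (7.22). A minimal sketch (the value of μ₃ is an arbitrary choice):

```python
import math

# Check (7.22) in one dimension: the signed density
# p1(x) = (mu3/6)(x^3 - 3x) phi(x) integrates to zero.
mu3 = 1.7  # an arbitrary third cumulant, chosen for illustration

def phi(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def p1(x):
    return (mu3 / 6.0) * (x ** 3 - 3 * x) * phi(x)

def integrate(f, lo=-12.0, hi=12.0, n=24000):
    # composite trapezoidal rule
    h = (hi - lo) / n
    s = 0.5 * (f(lo) + f(hi))
    for j in range(1, n):
        s += f(lo + j * h)
    return s * h

assert abs(integrate(p1)) < 1e-10          # total mass 0, as in (7.22)
assert abs(integrate(phi) - 1.0) < 1e-10   # phi itself is a density
print("expansion term has zero total mass")
```

Here the vanishing integral also follows directly from the oddness of (x³ − 3x)φ(x).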
8. APPROXIMATION OF CHARACTERISTIC FUNCTIONS OF NORMALIZED SUMS OF INDEPENDENT RANDOM VECTORS

Let X₁,...,X_n be n independent random vectors in R^k, each with zero mean and a finite third (or fourth) absolute moment. In this section we investigate the rate of convergence of the characteristic function of n^{−1/2}(X₁ + ··· + X_n) to Φ̂_{0,V}, where V is the average of the covariance matrices of X₁,...,X_n. The following form of Taylor's expansion will be useful to us.

LEMMA 8.1.† Let f be a complex-valued function defined on an open interval J of the real line, having continuous derivatives f^{(r)} of orders r = 1,...,s. If x, x + h ∈ J, then

f(x + h) = f(x) + Σ_{r=1}^{s−1} (h^r/r!) f^{(r)}(x) + (h^s/(s−1)!) ∫₀¹ (1 − v)^{s−1} f^{(s)}(x + vh) dv.    (8.1)

COROLLARY 8.2. For all real numbers u and positive integers s,

| exp{iu} − 1 − iu − ··· − (iu)^{s−1}/(s−1)! | ≤ |u|^s / s!.    (8.2)

Consequently, if G is a probability measure on R^k having a finite sth absolute moment ρ_s for some positive integer s, then

| Ĝ(t) − 1 − iμ₁(t) − ··· − (i^{s−1}/(s−1)!) μ_{s−1}(t) | ≤ ρ_s(t)/s! ≤ ||t||^s ρ_s / s!    (t ∈ R^k),    (8.3)

†Hardy [1], p. 327.

where for r = 1,...,s,

μ_r(t) = ∫ ⟨t, x⟩^r G(dx),    ρ_s(t) = ∫ |⟨t, x⟩|^s G(dx).    (8.4)

Proof. The inequality (8.2) follows immediately from Lemma 8.1 on taking f(u) = exp{iu} (u ∈ R¹) and x = 0, h = u. Inequality (8.3) is obtained on replacing u by ⟨t, x⟩ in (8.2) and integrating with respect to G(dx). Note that
ρ_s(t) = ∫ |⟨t, x⟩|^s G(dx) ≤ ||t||^s ρ_s.    Q.E.D.

COROLLARY 8.3. Let f be a complex-valued function defined on an open subset Ω of R^k, having continuous derivatives D^ν f for |ν| ≤ s (on Ω). If the closed line segment joining x and x + h (∈ R^k) lies in Ω, then

f(x + h) = f(x) + Σ_{1 ≤ |ν| ≤ s−1} (h^ν/ν!)(D^ν f)(x) + s Σ_{|ν|=s} (h^ν/ν!) ∫₀¹ (1 − v)^{s−1} (D^ν f)(x + vh) dv.

In this notation it is clear from (8.10) that

l_{s,n} = n^{−(s−2)/2} sup_{||t||=1} ρ_s(t) / ⟨t, Vt⟩^{s/2} ≤ ρ_s λ^{−s/2} n^{−(s−2)/2},    (8.12)
where λ is the smallest eigenvalue of V. In one dimension (i.e., k = 1)

l_{s,n} = ρ_s V^{−s/2} n^{−(s−2)/2}    (s > 2).    (8.13)
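Inequality (8.2) above is a routine consequence of Lemma 8.1 and can be spot-checked numerically; a minimal sketch:

```python
import cmath, math

# Numerical check of inequality (8.2): the remainder of the degree-(s-1)
# Taylor polynomial of exp(iu) is bounded by |u|^s / s!.
def taylor_remainder(u, s):
    partial = sum((1j * u) ** r / math.factorial(r) for r in range(s))
    return abs(cmath.exp(1j * u) - partial)

for s in (1, 2, 3, 5):
    for u in (-3.0, -0.4, 0.1, 2.0, 7.5):
        assert taylor_remainder(u, s) <= abs(u) ** s / math.factorial(s) + 1e-12
print("inequality (8.2) holds on the sampled points")
```

The same bound, applied with u = ⟨t, x⟩ and integrated against G(dx), gives (8.3).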
We also use the following simple inequality in the proofs of some of the theorems below. If V = I, then

ρ_s ≥ ρ₂^{s/2} = k^{s/2}    (s ≥ 2).    (8.14)
In the rest of this section we use the notation (8.9)–(8.11), often without further mention.

THEOREM 8.4. Let X₁,...,X_n be n independent random vectors (with values) in R^k having distributions G₁,...,G_n, respectively. Assume that each X_j has zero mean and a finite third absolute moment. Assume also that the average covariance matrix V is nonsingular. Let B be the symmetric positive-definite matrix satisfying

B² = V⁻¹.    (8.15)

Define constants a(d) and b_n(d), depending only on d and l_{3,n}.    (8.16)

Then for every d ∈ (0, 2^{1/2}) and for all t satisfying

||t|| ≤ d ... ,    (8.17)

one has the inequality

| Π_{j=1}^n Ĝ_j(Bt/n^{1/2}) − exp{ −½||t||² } | ≤ ... .
Let s ≥ 3. Then there exists a constant c₂(s) depending only on s such that

| (D^ν log Ĝ)(t) | ≤ c₂(s) ρ_s    (9.3)

for all t in R^k satisfying

| Ĝ(t) − 1 | ≤ ½.
χ̄_ν = n⁻¹ Σ_{j=1}^n (νth cumulant of X_j)    [ν ∈ (Z⁺)^k].    (9.6)
The constants c's that appear below depend only on their arguments.

LEMMA 9.5. For every nonnegative integral vector α satisfying 0 ≤ |α| ≤ s, ... . If α ≥ β, then

D^{α−β}(exp{h(t)} − 1) = D^{α−β} exp{h(t)},    (9.49)

which is a linear combination of terms of the form

(D^{β₁}h(t))^{j₁} ··· (D^{βᵣ}h(t))^{jᵣ} exp{h(t)},

where Σᵢ jᵢβᵢ = α − β. By (9.46) [and (9.13)],
| (D^{β₁}h(t))^{j₁} ··· (D^{βᵣ}h(t))^{jᵣ} | ≤ c₁₆(s,k) ( ρ_s n^{−(s−2)/2} )^{Σjᵢ} ||t||^{s−2+2Σjᵢ−|α−β|}
≤ c₁₇(s,k) ρ_s n^{−(s−2)/2} ( ||t||^{s−|α−β|} + ||t||^{s+|α−β|−2} ).    (9.50)

Hence if α ≥ β, then

| D^{α−β}(exp{h(t)} − 1) | ≤ c₁₈(s,k) ρ_s n^{−(s−2)/2} ( ||t||^{s−|α−β|} + ||t||^{s+|α−β|+2} ) exp{ −||t||²/8 }.    (9.51)
Using (9.44), (9.48), and (9.51) in (9.40), one obtains

| D^α ( Π_{j=1}^n Ĝ_j(t/n^{1/2}) − exp{ −½||t||² } [ 1 + Σ_{r=1}^{s−3} n^{−r/2} P̃_r(it: {χ̄_ν}) ] ) |
≤ c₁₉(s,k) ρ_s n^{−(s−2)/2} ( ||t||^{s−|α|} + ||t||^{3(s−2)+|α|} ) exp{ −||t||²/4 }.    (9.52)
Now use Lemma 9.8 to complete the proof when V = B = I. If V ≠ I, look at the random vectors BX₁,...,BX_n and observe that

Π_{j=1}^n Ĝ_j(Bt/n^{1/2})

is the characteristic function of Z_n = n^{−1/2} Y_n, where Y_n = B(X₁ + ··· + X_n). Also, if the (r+2)th cumulant of the random variable ⟨t, X_j⟩ is denoted by χ_{r+2,j}(t), then the corresponding cumulant of ⟨t, BX_j⟩ is χ_{r+2,j}(Bt). Q.E.D.

The following theorems are easy consequences of Theorem 9.9.

THEOREM 9.10. Let G be a probability measure on R^k with zero mean, positive-definite covariance matrix V, and finite sth absolute moment for some integer s not smaller than 3. Then there exist two positive constants c₂₀(s,k), c₂₁(s,k) such that for all t in R^k satisfying
||t|| ≤ c₂₀(s,k) n^{1/2} / η_s^{1/(s−2)},

one has, for all nonnegative integral vectors α, 0 ≤ |α| ≤ s,

| D^α ( Ĝⁿ(t/n^{1/2}) − exp{ −½||t||² } Σ_{r=0}^{s−3} n^{−r/2} P̃_r(it: {χ_ν}) ) |
≤ c₂₁(s,k) η_s n^{−(s−2)/2} [ ||t||^{s−|α|} + ||t||^{3(s−2)+|α|} ] exp{ −||t||²/4 }.
Here B is the symmetric positive-definite matrix satisfying (9.7),

η_s = ∫ ||Bx||^s G(dx),

and χ_ν is the νth cumulant of G.

Proof. Note that if t satisfies ||t|| ≤ n^{1/2} / η_s^{1/(s−2)}, then

⟨Bt, VBt⟩ = ||t||²,    | Ĝ(Bt/n^{1/2}) − 1 | ≤ ||t||²/2n ≤ ½,

because of the relation ∫ ||Bx||² G(dx) = k. A bound of the same form, with a constant c₂₃(s,k), holds with Bt in place of t:

| D^α ( Ĝⁿ(Bt/n^{1/2}) − exp{ −½||t||² } Σ_{r=0}^{s−3} n^{−r/2} P̃_r(iBt: {χ_ν}) ) |
≤ c₂₃(s,k) η_s n^{−(s−2)/2} [ ||t||^{s−|α|} + ||t||^{3(s−2)+|α|} ] exp{ −||t||²/4 },
where the notation is as in Theorem 9.9.

Proof. As in the proof of Theorem 9.9 (see the concluding observations), it is enough to prove this theorem for B = V = I. In this case, for all t satisfying

||t|| ≤ n^{1/2} / η_s^{1/(s−2)},

one has

||t||² E( ||X₁||² ; ··· ) / 2n ≤ ||t||² ( E||X₁||^s )^{2/s} / 2n ≤ ... .    (10.1)
The measure U_{[−a,a]} is called the uniform distribution on [−a, a]. One has

Û_{[−a,a]}(t) = (1/2a) ∫_{−a}^{a} cos tx dx = sin at / at    (t ∈ R¹).    (10.2)

The probability measure

T_a = U_{[−a/2,a/2]} * U_{[−a/2,a/2]}    (10.3)

is called the triangular distribution on [−a, a]. It is easy to show that its density is

t_a(x) = (1/a)(1 − |x|/a)    for |x| ≤ a,
       = 0    for |x| > a,    (10.4)

and that

T̂_a(t) = [ sin(at/2) / (at/2) ]²    (t ∈ R¹).    (10.5)
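The characteristic functions (10.2) and (10.5) can be verified by direct quadrature; a minimal sketch for the triangular distribution (the value a = 1.5 is an arbitrary choice):

```python
import cmath, math

# Numerical check that the triangular density (1/a)(1 - |x|/a) on [-a, a]
# has characteristic function [sin(at/2) / (at/2)]^2, i.e. the square of
# the uniform characteristic function on [-a/2, a/2].
a = 1.5  # arbitrary illustrative value

def tri_density(x):
    return (1.0 / a) * (1.0 - abs(x) / a) if abs(x) <= a else 0.0

def cf_numeric(t, n=40000):
    # trapezoidal quadrature over the support [-a, a]
    h = 2 * a / n
    total = 0.0 + 0.0j
    for j in range(n + 1):
        x = -a + j * h
        w = 0.5 if j in (0, n) else 1.0
        total += w * cmath.exp(1j * t * x) * tri_density(x)
    return total * h

for t in (0.3, 1.0, 4.0):
    u = a * t / 2.0
    exact = (math.sin(u) / u) ** 2
    assert abs(cf_numeric(t) - exact) < 1e-6
print("triangular characteristic function verified")
```

The squaring reflects (10.3): convolving two uniforms multiplies their transforms, in line with Theorem 5.1(iii).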
One can write

c(m) = ( ∫_{R¹} | sin x / x |^m dx )⁻¹    (m = 2, ...).    (10.6)

For a > 0 and integer m ≥ 2 let G_{a,m} denote the probability measure on R¹ with density

g_{a,m}(x) = a c(m) | sin ax / ax |^m    (x ∈ R¹).    (10.7)
It follows from (10.2) that for even integers m ≥ 2

( sin at / at )^m = [ Û_{[−a,a]}(t) ]^m    (t ∈ R¹),    (10.8)

so that by the Fourier inversion theorem [Theorem 4.1(iv)]

Ĝ_{a,m}(t) = 2πa c(m) u*ᵐ_{[−a,a]}(t)    (t ∈ R¹),
          = 0    if |t| > ma,    (10.9)

since the m-fold convolution u*ᵐ_{[−a,a]} of the uniform density on [−a, a] vanishes outside [−ma, ma].
A Class of Kernels
Let Z₁,...,Z_k be independent random variables each with distribution G_{1/2,2}, and let Z = (Z₁,...,Z_k). Then Prob(||Z|| > d) → 0 as d → ∞.    (10.10)

Thus for any α in (0, 1), there exists a constant d depending only on k such that

Prob(||Z|| > d) ≤ 1 − α.    (10.11)

Note that the characteristic function of Z vanishes outside [−1, 1]^k. Now let K₁ denote the distribution of Z/d; then

K₁({x : ||x|| > 1}) ≤ 1 − α,    K̂₁(t) = 0    if t ∉ [−d, d]^k.    (10.12)

One thus has

THEOREM 10.1. Let r be any given positive integer and let α ∈ (0, 1). There exists a probability measure K₁ on R^k such that
(i) K₁({x : ||x|| > 1}) ≤ 1 − α, ... .

Suppose that {X_n : n ≥ 1} is a sequence of independent and identically distributed (i.i.d.) random vectors with common distribution Q₁, and that Q₁ has mean zero, covariance I (the identity matrix), and a finite third absolute moment ρ₃. Let Q_n denote the distribution of n^{−1/2}(X₁ + ··· + X_n). If Q₁ has an integrable characteristic function (c.f.) Q̂₁, then the c.f. Q̂_n of Q_n is integrable for all n. One can then use Fourier inversion to estimate the density h_n of the signed measure Q_n − Φ as

||h_n||_∞ = sup_y |h_n(y)| ≤ (2π)^{−k} ∫ | Q̂_n(t) − Φ̂(t) | dt.

In the general case one truncates the random vectors (so that moments of all orders are finite) and applies the above procedure to these truncated vectors. The various lemmas in Section 14 allow one to take care of the perturbation due to truncation. As in the case r₀ = 0, for the final accounting (i.e., to estimate the effect of smoothing by K_ε) one uses the smoothing inequalities of Section 11. The main theorems of Section 15 are obtained in this manner. A further truncation enables one to obtain corresponding analogs when only the finiteness of absolute moments of order 2 + δ is assumed for some δ, 0 < δ < 1, thus yielding generalizations and refinements of the classical one-dimensional theorems of Liapounov and Lindeberg.
11. SMOOTHING INEQUALITIES
Lemmas 11.1 and 11.4 show how the difference μ − ν between a finite measure μ and a finite signed measure ν is perturbed by convolution with a probability measure K_ε that concentrates (for small ε) most of its mass near zero. Let f be a real-valued, Borel-measurable function on R^k. Recall that in Chapter 1 we defined the following:

ω_f(A) = sup{ |f(x) − f(y)| : x, y ∈ A }    (A ⊂ R^k),
ω_f(x: ε) = ω_f(B(x: ε))    (x ∈ R^k, ε > 0).    (11.1)

Also define

M_f(x: ε) = sup{ f(y) : y ∈ B(x: ε) },    m_f(x: ε) = inf{ f(y) : y ∈ B(x: ε) }    (x ∈ R^k, ε > 0).    (11.2)

Note that

ω_f(x: ε) = M_f(x: ε) − m_f(x: ε)    (x ∈ R^k, ε > 0).    (11.3)
The functions M_f(·: ε), m_f(·: ε) are lower and upper semicontinuous, respectively, for every real-valued function f that is bounded on each compact subset of R^k. Also, ω_f(·: ε) is lower semicontinuous. These follow from

{x : M_f(x: ε) > c} = ∪ { B(x: ε) : f(x) > c }    (c ∈ R¹),

m_f(x: ε) = −M_{−f}(x: ε)    (x ∈ R^k, ε > 0).    (11.4)
In particular, it follows that M_f(·: ε), m_f(·: ε), ω_f(·: ε) are Borel-measurable for every real-valued function f on R^k that is bounded on compacts. Recall that the translate f_y of f by y (∈ R^k) is defined by

f_y(x) = f(x + y)    (x ∈ R^k).    (11.5)
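The quantities (11.1)–(11.3) can be made concrete for a simple discontinuous function; the following sketch computes M_f, m_f, and ω_f on a grid for f(x) = sign(x) (the grid and ε are arbitrary choices):

```python
# Illustration of (11.1)-(11.3): for f(x) = sign(x) on R^1, the ball
# B(x: eps) = (x - eps, x + eps) gives oscillation w_f(x: eps) = 2
# exactly when |x| < eps (the ball straddles the jump), else 0.
def f(x):
    return 1.0 if x >= 0 else -1.0

def M_f(x, eps, grid):
    return max(f(y) for y in grid if abs(y - x) < eps)

def m_f(x, eps, grid):
    return min(f(y) for y in grid if abs(y - x) < eps)

eps = 0.25
grid = [j / 100.0 for j in range(-300, 301)]   # grid on [-3, 3]
for x in (-2.0, -0.1, 0.0, 0.1, 2.0):
    osc = M_f(x, eps, grid) - m_f(x, eps, grid)   # w_f(x: eps), by (11.3)
    assert osc == (2.0 if abs(x) < eps else 0.0)
print("oscillation of sign(x) is 2 only within eps of the jump")
```

This localization of ω_f near discontinuities is what the smoothing inequalities below exploit.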
LEMMA 11.1. Let μ be a finite measure and ν a finite signed measure on R^k. Let ε be a positive number and K_ε a probability measure on R^k satisfying

K_ε(B(0: ε)) = 1.    (11.6)

Then for every real-valued, Borel-measurable function f on R^k that is bounded on compacts,

∫_{B(0:ε)} [ ∫ M_f(y + x: ε) μ(dy) − ∫ f(y) ν(dy) − ∫ (M_f(y + x: ε) − f(y)) ν(dy) ] K_ε(dx)
≥ ∫_{B(0:ε)} [ ∫ f(y) μ(dy) − ∫ f(y) ν(dy) − ∫ (M_f(y + x: ε) − f(y)) ν(dy) ] K_ε(dx)
≥ ∫ f d(μ − ν) − ∫ (M_f(·: 2ε) − f) dν,    (11.9)

the first inequality because M_f(y + x: ε) ≥ f(y) for ||x|| < ε, and the second because M_f(y + x: ε) ≤ M_f(y: 2ε) for ||x|| < ε.

Normal Approximation
Similarly,

∫ f d(μ − ν) ≥ −γ(f: ε).    (11.16)

Then for each real-valued, Borel-measurable, bounded function f on R^k, one has

| ∫ f d(μ − ν) | ≤ γ(f: ε).

Given η > 0, there exists x₀ such that F(x₀) − G(x₀) > δ − η.
Then

(P − Q)*K_ε((−∞, x₀ + ε]) = ∫_{B(0:ε)} [F(x₀ + ε − y) − G(x₀ + ε − y)] K_ε(dy)
    + ∫_{R\B(0:ε)} [F(x₀ + ε − y) − G(x₀ + ε − y)] K_ε(dy)
≥ ∫_{B(0:ε)} [F(x₀) − G(x₀) − (ε − y)m] K_ε(dy) − δ(1 − α)
≥ (δ − η)α − αmε − δ(1 − α) = δ(2α − 1) − αmε − αη,

so that, letting η ↓ 0,

(2α − 1)δ ≤ sup_{x∈R¹} (P − Q)*K_ε((−∞, x]) + αmε.    Q.E.D.

LEMMA 12.2. Let P be a finite measure and Q a finite signed measure on R¹ with distribution functions F and G, respectively. Assume that
∫ |x| P(dx) < ∞, ... .

Let P now be a probability measure on R¹ with mean zero and variance one, and let F denote its distribution function. Then

F(−x) ≤ (1 + x²)⁻¹,    1 − F(x) ≤ (1 + x²)⁻¹    (x ≥ 0).    (12.13)

Fix x > 0. For every b > 0, one has

1 + b² = ∫ (y − b)² P(dy) ≥ ∫_{(−∞,−x]} (y − b)² P(dy) ≥ (x + b)² F(−x),

so that g(b) ≡ (1 + b²)(x + b)⁻² ≥ F(−x). The minimum of g in [0, ∞) occurs at b = x⁻¹, and

g(x⁻¹) = (1 + x²)⁻¹,

which gives the first inequality in (12.13) [note that for x = 0, (12.13) is trivial]. The second is obtained similarly. This gives

F(−x) − Φ(−x) ≤ (1 + x²)⁻¹ − Φ(−x) ≡ h(x),

say, for x ≥ 0. The supremum of h in [0, ∞) is attained at a point x₀ satisfying h′(x₀) = 0, or

2x₀(1 + x₀²)⁻² = (2π)^{−1/2} e^{−x₀²/2}.

A numerical computation yields x₀ ≈ 0.2135 and h(x₀) ≈ 0.5416, thus proving (12.14):
|F(x) − Φ(x)| ≤ 0.5416    (x ∈ R¹).    (12.14)

The inequality for x ≥ 0 follows similarly (or, by looking at P). Q.E.D.

THEOREM 12.4 (Berry–Esseen Theorem). Let X₁,...,X_n be n independent random variables each with zero mean and a finite absolute third moment. If

σ̄² ≡ n⁻¹ Σ_{j=1}^n EX_j² > 0,

then

sup_x |F_n(x) − Φ(x)| ≤ c ρ̄₃ σ̄⁻³ n^{−1/2},

where F_n is the distribution function of n^{−1/2}σ̄⁻¹(X₁ + ··· + X_n), ρ̄₃ = n⁻¹ Σ_{j=1}^n E|X_j|³, and c is an absolute constant. This is justified in view of (8.12).
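Theorem 12.4 can be illustrated exactly for Rademacher summands X_i = ±1 (so σ̄ = ρ̄₃ = 1), where F_n is computable from the binomial distribution; a minimal sketch (n = 100 and n = 400 are arbitrary choices, and no particular value of the constant c is asserted):

```python
from math import comb, erf, sqrt

# Exact illustration of the Berry-Esseen theorem for Rademacher
# variables X_i = +-1 with probability 1/2 (sigma = 1, rho_3 = 1).
# F_n is a step function; its largest deviation from Phi is at a jump.
def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def sup_deviation(n):
    # S_n = 2*B - n with B ~ Binomial(n, 1/2); F_n(x) = P(S_n <= x sqrt(n)).
    acc, dev = 0.0, 0.0
    for b in range(n + 1):
        x = (2 * b - n) / sqrt(n)
        dev = max(dev, abs(acc - Phi(x)))     # left limit at the jump
        acc += comb(n, b) / 2.0 ** n
        dev = max(dev, abs(acc - Phi(x)))     # value at the jump
    return dev

d100, d400 = sup_deviation(100), sup_deviation(400)
assert d400 < d100               # the deviation shrinks with n
assert d100 < 1.0 / sqrt(100)    # well below rho_3 / (sigma^3 n^{1/2})
print(round(d100, 4), round(d400, 4))
```

For this lattice distribution the deviation is of exact order n^{−1/2}, which shows that the rate in Theorem 12.4 cannot be improved in general.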
THEOREM 13.2. Let X₁,...,X_n be n independent and identically distributed random vectors with values in R^k satisfying

EX₁ = 0,    Cov(X₁) = I,    ρ₄ = E||X₁||⁴ < ∞,

where X_j = (X_{j,1},...,X_{j,k}), 1 ≤ j ≤ n. Then

Prob( ··· ) ≤ ... .    (13.28)
Define truncated random vectors

Y_j = X_j    if ||X_j|| ≤ n^{1/2},
    = 0      if ||X_j|| > n^{1/2},    (14.2)

Z_j = Y_j − EY_j    (1 ≤ j ≤ n),    (14.3)

and

Δ_n = Σ_{j=1}^n Δ_{n,j,s}.    (14.4)

Finally, write

Cov(X_j) = ((v_{il})),    V = n⁻¹ Σ_{j=1}^n Cov(X_j),    D = n⁻¹ Σ_{j=1}^n Cov(Z_j) = ((d_{il})).    (14.5)
The constants c's depend only on their arguments.
LEMMA 14.1. Let ρ_s < ∞ for some s ≥ 2. (i) One has E||Y₁||^s ≤ ρ_s, ... .

Let ν₁,...,ν_m be nonnegative integral vectors satisfying

Σ_{i=1}^m |ν_i| ≥ 3,    (14.111)

which proves (14.107). Now assume that s is an integer, s > 2. Since ||x||^r ≤ 1 + ||x||^s for 0 ≤ r ≤ s, it is enough to prove (14.109) for r = 0 and r = s. The case r = 0 is precisely (14.107). We therefore need only prove (14.109) for r = s. One has
∫ ||x||^s | Q_n − Q_n' |(dx) ≤ Σ_{i=1}^n ∫ ||x||^s | G₁* ··· *G_{i−1}*(G_i − G_i')*G_{i+1}* ··· *G_n |(dx)
≤ Σ_{i=1}^n ∫ ( ∫ ||u + v||^s | G_i − G_i' |(dv) ) (G₁* ··· *G_{i−1}*G_{i+1}* ··· *G_n)(du).

Also, by the truncation,

∫_{{|Y_i| > ···}} |Y_i|^s ≤ ∫_{{|X_i| > 2n^{1/2}/3}} |X_i|^s ... .    (14.118)
Therefore, choosing c, a(s,k)=(,1) , / 2 (l6k) - ', one has n -'
"
IW;I s < T2^S/22a n.^(s) < f {IW , I>"' 12 )
n (r — 2)/2
(14.119)
8
so that Lemma 14.7 may be applied to yield EIX,+••• +X^+Z,+••• +Z,I
$
= TZ j 2E I W, +... + W IS .
EIW J +
E|W₁ + ··· + W_n|^s ≤ ... .

For r > 0 and x ∈ R^k set

M_r(f) = sup_{x∈R^k} |f(x)| (1 + ||x||^r)⁻¹    (r > 0),    M₀(f) = sup_{x,y∈R^k} |f(x) − f(y)| = ω_f(R^k).    (15.4)

If ν is a finite (signed) measure on R^k, define a new (signed) measure ν_r by

ν_r(dx) = (1 + ||x||^r) ν(dx)    if r > 0,    ν₀ = ν.    (15.5)

As in the preceding section, write

λ_{n,s}(ε) = n⁻¹ Σ_{j=1}^n E( ||X_j||^s ; ||X_j|| > ε n^{1/2} ),    λ_{n,s} = λ_{n,s}(1).    (15.6)

Then define
Then define n
inf A* s =O 3, then for every real-valued Bore/-measurable function f satisfying Mr(f)c 9 /e=n 1 / 2 /(16p 3 ), and (
( DPR0),
,lmmn -I
(15.39)
i-1
B being the symmetric positive-definite matrix satisfying B² = D⁻¹. By Lemma 14.1(v) (with s′ = m) and inequality (14.21) (Corollary 14.2), one has
IIB(I 0 ,,.3{i)> c1n 1/2,
then
f fd(Q,
-4i )I
{X_n : n ≥ 1} is a sequence of independent and identically distributed random vectors with zero means, covariance matrices I, and finite third absolute moments ρ₃. The reason for this is that the bound is in terms of ω̄_f [and ω_f(R^k)] rather than ω_f. Recall [see Lemma 1.2(iii), Theorem 1.3(iii)] that ∫ f dQ_n → ∫ f dΦ for all bounded Φ-continuous f, and that a bounded f is Φ-continuous if and only if

lim_{ε↓0} ω_f(ε: Φ) = 0    [ ω_f(ε: Φ) = ∫ ω_f(x: ε) Φ(dx) ].

On the other hand, it is not difficult to construct bounded Φ-continuous functions f such that ω̄_f(ε: Φ) does not go to zero with ε. For example, let f = I_A, the indicator function of the following Borel subset A of R¹:
A = ∪_{r=2}^∞ ∪_{i=1}^{[(r−1)/2]} { x ∈ R¹ : r + 2i/r < x ≤ r + (2i+1)/r },

with [(r−1)/2] denoting the integer part of (r−1)/2. It is easy to see that

ω̄_{I_A}(ε: Φ) = sup_{y∈R^k} Φ((∂A)^ε + y) = 1,    ω_{I_A}(ε: Φ) = Φ((∂A)^ε) → 0 as ε ↓ 0.    (16.6)
and M_r(f∘T⁻¹) and ω̄_{g_r} or ω_{f∘T⁻¹}, where

g_r(x) = (1 + ||x||^r)⁻¹ (f∘T⁻¹)(x)    if r > 0,
       = (f∘T⁻¹)(x)                     if r = 0.    (16.7)

Normalization

Since T is easily computable, we may leave things at this stage. If one would like the bound to involve moments of the X_j's and not those of the TX_j's, then the simple inequality (r > 0)
may be used. Assume that ρ_s < ∞ for some integer s ≥ 3; then for every real-valued, Borel-measurable function f satisfying (16.11), M_r(f) < ∞, ... . Examples of such sets are affine subspaces of dimension k' ≤ k − 1 (and their subsets and complements) and many other manifolds of dimension k' ≤ k − 1, for which α = k − k'. Below we assume that V = I merely to avoid notational complexity.

THEOREM 17.4. Let V = I. Assume that ρ_s < ∞ for some integer s ≥ 3. If A is a Borel set satisfying (17.10) for some α ≥ 1, then

| Q_n(A) − Φ(A) | ≤ ... .

Proof. First assume that Φ(A) = 0 or Φ(A) = 1. In this case P_m(−Φ: {χ̄_ν})(A) = 0 for all m, 1 ≤ m ≤ s − 2, because of (7.22). Hence (15.56) holds, and it remains to show that

n^{−m/2} ∫_{(∂A)^ε + y} | P_m(−φ: {χ̄_ν})(x) | dx ≤ ...
0 there exist c; F_ R', y, ER k , 1 < i < m, say, such that II 'B — = 1 c(x +y) 1 G + ((a {c+y))") k'
c10(n -1/2 P3)
k+1-k '
y E Rk-
+c', o (P 3 n -I / 2 ) k-k ,
where ¢= +n -t / ZP t (— : {X„}), , denoting the average of the with
cumulants of Uj 's. Q.E.D. Remark. It is fairly straightforward to extend the assertion (17.12) to a sequence (X,, : n> 1) of independent random vectors for which n
lim inf_n λ_n > 0,    sup_n n⁻¹ Σ_{j=1}^n E||X_j||^s < ∞.

If ρ_s < ∞ for some integer s ≥ 3, then

sup_{C∈𝒞} (1 + d³(0, ∂C)) | Q_n(C) − Φ(C) | ≤ c₁₆ ρ₃ n^{−1/2} + c₁₇ ρ_s n^{−(s−2)/2}.    (17.25)
Proof. Let C be a Borel-measurable convex set. Replacing A by C in (17.17) and using (17.22) in Theorem 15.1 (with r=s), one has (1+d'(U,8C))IQ.(C)-41(C)j
=^ ffd(Q^-0)I s-2
m/2I (' fdp (—: f fd(Q,, — ^)I + I n{X,))I m-1 J c18Pan-(s-2)/2+ c 1 9w (2E : (Y
+ )r°)
s-2
+ 2 n - m/ 2 I f fdpp (-4: (x.))I,
(17.26)
m-1
where e=c 10P 3 n - '/ 2 . Now, by (15.82) and (15.84),
1
∫ (1 + ||x||^s) | P_m(−φ: {χ̄_ν})(x) | dx ≤ c₂₁ ρ̄_{m+2} + n^{−m/2} ρ̄_{m+2} ... . Also,

d_P(G₁, G₂) = inf{ ε > 0 : G₁(F) ≤ G₂(F^ε) + ε for all closed F },    (17.46)

since, given G₁(F) ≤ G₂(F^ε) + ε for all closed F and some ε > 0, one obtains

1 − G₁(F^ε) = G₁(R^k \ F^ε) ≤ G₂((R^k \ F^ε)^ε) + ε ... .    (17.51)
Let 𝒜(d: Φ_{0,V}) denote ... .    (17.52)

THEOREM 17.10. If ρ₃ < ∞, then there exist constants c₃₈, c₃₉ depending only on k such that

sup_{A ∈ 𝒜(d: Φ_{0,V})} | Q_n(A) − Φ_{0,V}(A) | ≤ ... .

THEOREM 17.11. Let {X_n : n ≥ 1} be a sequence of independent random vectors with values in R^k having zero means and finite sth absolute moments for some integer s ≥ 3. Let
V_n = n⁻¹ Σ_{j=1}^n Cov(X_j),    λ_n = smallest eigenvalue of V_n,    Λ_n = largest eigenvalue of V_n,

ρ̄_{s,n} = n⁻¹ Σ_{j=1}^n E||X_j||^s,

and assume that

lim_n n^{−1/2} ρ̄_{3,n} log^{1/2} n = 0,    lim_n n^{−(s−2)/2} ρ̄_{s,n} = 0.    (17.56)

Then one has

sup_{a ≥ ((s−2+δ) log n)^{1/2}} a^s Prob( || n^{−1/2}(X₁ + ··· + X_n) || > Λ_n^{1/2} a )
≤ c₄₀(s,k) (1 + Λ_n^{s/2}) (1 + (n^{−1/2} λ_n^{−3/2} ρ̄_{3,n} log n)^{k+1+s}) n^{−(s−2)/2} ρ̄_{s,n} + Θ_n(δ) n^{−(s−2)/2},    (17.57)
where Θ_n(δ) → 0 as n → ∞ for each δ > 0.

Proof. Without loss of generality, assume λ_n > 0 for all n. Let Q_n' denote the distribution of n^{−1/2}(T_nX₁ + ··· + T_nX_n), where T_n is the symmetric, positive-definite matrix satisfying

T_n² = V_n⁻¹    (n ≥ 1).    (17.58)
Some Applications
Write n T
—n—'
m+2,n
m+2 (0<m<s-2), 2 EllTT Xjll i I —
n
(l vi < s).
(vth cumulant of T„Xj )
(17.59)
j-1
Define the function f by
f (x)° 0 if a if
IIxII a,
(17.60)
and use Theorem 15.4 with this f to get s-2
Qn — 2 n -mj2 Pm( —, D: (j4.,n}) Qx: Ilxll > a)) ,n 0
a' (
< cal(s k)M, (f)[ 1 +( n—'/2T3
,nlogn)k+:+110n 3n-(s-2)/2
,
s-2
'has m2.0 n—m/2 I Pm( — 'D: (Z,n))l (17.61)
(lx:IIxII>a—c42n-'/22r3,nlogn}) where n
0.* ,- inf ,
04141
en-1 2 f
j^ l (IIT.Xj IK <Enl/I)
II TnXj IIJ
n
+n -l I f
IITnXiiIs oo
(17.64)
(1 <m<s-2),
and a — c42n -1 /2i3„logn> 2 , I/2
a—c42 n -1 / 2T 3 „logn>((s-2+ 2 )logn)
(17.65)
for all sufficiently large n if a>((s-2+S)logn) 1 / 2 . Hence a'n -m/2 I Pm( -0 : {i.,^})l({x IIXII > 2 1/ < c43 n - m/2,rm+2
(IIxII
.n f
IIxll3m+aeXp
z >(s-2+6/2)1.`.)
_ II 211 2 ) dx J (17.66)
=©„(S)n -( s -2) / 2 (0<m<s-2). Also note that MS(f)= 1+a5 a ))
< c4,(s,k)r 1 +(n -1 / 2 X; 3/2P3
.logn)k+,+' ]n n sn -(..-2)/2
n-(S-2)/2. +O„(S) t
(17.68)
Finally observe that Q,,((x:IIxii>a})=Prob(I(n -tl2 (T,X,+
...
+T,,X)11>a)
>Prob(IIn -1 / 2 (X j +... +.X,)11>aA,',/ 2 ), (17.69)
since ||T_n x|| ≥ Λ_n^{−1/2} ||x||. Q.E.D.

COROLLARY 17.12. Let {X_n : n ≥ 1} be a sequence of independent and identically distributed random vectors having a common mean zero and common covariance matrix V. If ρ_s ≡ E||X₁||^s is finite for some integer s ≥ 3, then

P( ||X₁ + ··· + X_n|| > a_n Λ^{1/2} n^{1/2} ) = δ_n n^{−(s−2)/2} a_n^{−s},    (17.70)

where δ_n → 0 as n → ∞ uniformly for every sequence {a_n : n ≥ 1} of numbers satisfying

a_n ≥ (s − 2 + δ)^{1/2} log^{1/2} n    (17.71)

for any fixed δ > 0, and Λ is the largest eigenvalue of V.

Proof. Note that in this case

... → 0    (17.72)
as n → ∞. Here λ, Λ are the smallest and largest eigenvalues of V, respectively. Q.E.D.

COROLLARY 17.13. Let {X_n : n ≥ 1} be a sequence of independent random vectors having zero means and finite sth absolute moments for some integer s ≥ 3. Assume that

lim sup_{n→∞} ρ̄_{s,n} < ∞,    lim inf_{n→∞} λ_n > 0,    (17.73)

where the notation is the same as in Theorem 17.11. Then

P( ||X₁ + ··· + X_n|| > a_n Λ_n^{1/2} n^{1/2} ) = δ_n n^{−(s−2)/2} a_n^{−s},    (17.74)

where {δ_n : n ≥ 1} remains uniformly bounded for every sequence of numbers {a_n : n ≥ 1} satisfying (17.71) for any fixed δ > 0.

Proof. In view of (17.73) the sequence {Λ_n : n ≥ 1} is bounded since, writing V_n = ((v_{ij})), one has
where (S„ : n) 1 } remains uniformly bounded for every sequence of numbers { a„ : n) 1) satisfying (17.71) for any fixed 6 > 0. Proof. In view of (17.73) the sequence (A n : n) 1) is bounded since, writing V. = ((va )), one has k
A,,= sup (x, V,,x> = sup 2 v j xi xj I1x11-1
11x11 i,j-1 2
kk < G ( vii vjj) 1/2xi xjl =(
Pm,,, 1) is a bounded sequence. The relation (17.74) now follows from (17.57). Q.E.D. Note that the sequence (&, : n> 1) in Corollary 17.13 may be shown to go to zero in the same manner as (Sn : n > 1) in Corollary 17.12 if n
limn
1
(17.77)
j-1 {Ilxjll>Af2n'/')
18. RATES OF CONVERGENCE UNDER FINITENESS OF SECOND MOMENTS

Most of the main results in Sections 15, 16, and 17 have appropriate analogs when only the second moments are assumed finite. Here we prove an analog of Theorem 15.1 and derive some corollaries. As before, X₁,...,X_n are independent random vectors with values in R^k, and

n⁻¹ Σ_{j=1}^n Cov(X_j) = V.

For r > 0 set

M_r(f) = sup_{x∈R^k} |f(x)| (1 + ||x||^r)⁻¹    (r > 0),    M₀(f) = sup_{x,y∈R^k} |f(x) − f(y)| = ω_f(R^k).    (18.6)

Finally, let Q_n denote the distribution of n^{−1/2}(X₁ + ··· + X_n) and let Φ_{a,V} denote the normal distribution on R^k with mean a and covariance matrix V. We write Φ = Φ_{0,I}, where I is the identity matrix.

THEOREM 18.1. Let V = I and ρ_s < ∞ for some s, 2 < s ≤ 3. There exist positive constants c₁, c₂, c₃ depending only on k and s such that for every Borel-measurable function f on R^k satisfying

M_r(f) < ∞    (18.7)
M_r(f) < ∞, ... . If (18.19) holds for every ε > 0, then {G_n : n ≥ 1} converges weakly to the standard normal distribution Φ.

Proof. Apply Theorem 18.1 (with r = r₀ = 0) to the random vectors ... . For every Lipschitzian function f bounded by one (or for the indicator function f of an arbitrary Borel-measurable convex set) one has

| ∫ f d(G_n − Φ) | ≤ ... ,

and for n ≥ n₀(η) one has δ_n(η/2k) ≤ η/2. Then d_n ≤ η for all n ≥ n₀(η). Q.E.D.

The above corollary is an extension of Lindeberg's central limit theorem to the multidimensional case.
COROLLARY 18.3. If in Corollary 18.2 one replaces (18.19) by the assumption E||X_j^{(n)}||^s < ∞ for some s, 2 < s ≤ 3, then

sup_{C∈𝒞} |G_n(C) − Φ(C)| ≤ c₁₄ [ ... + k_n⁻¹ Σ_{j=1}^{k_n} ∫_{{||X_j^{(n)}|| > ε k_n^{1/2}}} || T_n X_j^{(n)} ||^s ] ≤ c₁₄ k_n^{−(s−2)/2} k_n⁻¹ Σ_{j=1}^{k_n} E|| T_n X_j^{(n)} ||^s,    (18.23)

Convergence Assuming Finite Second Moments

where 𝒞 is the class of all Borel-measurable convex subsets of R^k, and c₁₄ depends only on k.

Proof. The first inequality in (18.23) follows from Theorem 18.1 (with r = 0 = r₀) applied to the random vectors T_n X_j^{(n)}, 1 ≤ j ≤ k_n, and from Corollary
3.2 (with s = 0). The second inequality is obtained from the first by letting ε = 0 in the expression within square brackets. Q.E.D.

The above corollary contains a multidimensional extension of Liapounov's central limit theorem: {G_n : n ≥ 1} converges weakly to Φ if

lim_{n→∞} k_n^{−(s−2)/2} ( k_n⁻¹ Σ_{j=1}^{k_n} E|| T_n X_j^{(n)} ||^s ) = 0    (18.24)

for some s, 2 < s ≤ 3. The first inequality in (18.23), however, is sharper. For example, if {X_n : n ≥ 1} is a sequence of independent and identically distributed random vectors with common mean zero, common positive-definite covariance matrix V, and finite sth absolute moments for some s, 2 < s ≤ 3, ... ; if {X_n : n ≥ 1} is an independent and identically distributed sequence of random variables, then the right side is o(n^{−(s−2)/2}) as n → ∞.
NOTES

The first central limit theorem was proved for i.i.d. Bernoulli random variables by DeMoivre [1]; Laplace [1] elucidated and refined it, and also gave a statement (as well as some reasoning for its validity) of a rather general central limit theorem. Chebyshev [1] proved (with a complement due to Markov [1]) the first general central limit theorem by his famous method of moments; however, Chebyshev's moment conditions were very severe. Then came Liapounov's pioneering investigations [1, 2], in which he introduced the characteristic function in probability theory and used it to prove convergence to the normal distribution under the extremely mild hypothesis (18.24) (for k = 1). Finally Lindeberg [1] proved Corollary 18.2 (for k = 1). In the i.i.d. case this reduces to the so-called classical central limit theorem: if {X_n : n ≥ 1} is a sequence of i.i.d. random variables each with mean zero and variance one, then the distribution of n^{−1/2}(X₁ + ··· + X_n) converges weakly to the standard normal distribution Φ. This classical central limit theorem was also proved by Lévy [1] (p. 233). Feller [1] proved that the Lindeberg condition (18.19) is also necessary in order that (i) the distribution of k_n^{−1/2} s_n⁻¹(X₁^{(n)} + ··· + X_{k_n}^{(n)}) converge weakly to Φ and (ii) m_n/(k_n s_n²) → 0 as n → ∞; here k = 1, and we write s_n² for V_n, s_n⁻¹ for T_n, and m_n = max{var(X_j^{(n)}) : 1 ≤ j ≤ k_n}. Many authors have obtained multidimensional extensions of the central limit theorem, for example, Bernstein [1] and Khinchin [1, 2]; the Lindeberg–Feller theorem was extended to R^k by Takano [1].

Section 11. Lemma 11.1–Corollary 11.5 are due to Bhattacharya [1–5]. These easily extend to metric groups, and Bhattacharya [6] used them to derive rates of convergence of the n-fold convolution of a probability measure on a compact group to the normalized Haar measure as n → ∞. Lemma 11.6 is perhaps well known to analysts.

Section 12. The first result on the speed of convergence is due to Liapounov [2], who proved

sup_x |F_n(x) − Φ(x)| ≤ ... ,

whereas von Bahr [3] essentially assumed that the random vectors are i.i.d. and that ρ₃ < ∞, ρ_{k+1} < ∞. For the class 𝒞, Sazonov [1] finally relaxed the moment condition to ρ₃ < ∞, proving Corollary 17.2 in the i.i.d. case (Bergström [3] later proved this independently of
Sazonov), while Rotar' [1] relaxed it for the general non-i.i.d. case. For more general classes of sets this relaxation of the moment condition is due to Bhattacharya [7]. Paulauskas [1] also has a result that goes somewhat beyond the class C. The results of Section 13 are due to Bhattacharya [3], although the explicit computation of constants given here is new. The first effective use of truncation in the present context is due to Bikjalis [4]; Lemma 14.1 and Corollary 14.2 are essentially due to him. Lemma 14.3 is due to Rotar' [1]. Lemmas 14.6 and 14.8 were obtained by Bhattacharya [7]; a result analogous to the inequality (14.107) was obtained earlier by Bikjalis [4]. Analogs of Lemma 14.7 were obtained earlier by Doob [1], pp. 225–228, for a stationary Markov chain, by Brillinger [1] for the i.i.d. case, and by von Bahr [1] for the case considered by us; but we are unable to deduce the present explicit form needed by us from their results. Theorems 15.1, 15.4, and Corollary 15.2 are due to Bhattacharya [7], as is the present form of Corollary 15.3; earlier, a version of Corollary 15.3 was independently proved by von Bahr [3] and Bhattacharya [1, 2]. Theorems 17.1, 17.4, 17.8–17.10, and Corollary 17.3 are due to Bhattacharya [4, 5, 7]. Corollaries 17.5 and 17.12 were proved by von Bahr [2, 3] in the i.i.d. case; the corresponding results (Theorems 17.4, 17.11, and Corollary 17.13) in the non-i.i.d. case are new. The first global, or mean, central limit theorems are due to Esseen [1, 3] and Agnew [1]. The fairly precise result Corollary 17.7 was proved for s = 3 by Nagaev [1] in the i.i.d. case (a slightly weaker result was proved earlier by Esseen [1]) and later by Bikjalis [2] in the non-i.i.d. case; afterwards, the much more powerful Theorem 17.6 was proved by Rotar' [1] for s = 3. Rotar' [1] also stated a result which implies Theorem 17.6 for all s > 3; however we are unable to verify it.
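For symmetric coin flips the O(n^{-1/2}) rate discussed in these notes can be computed exactly, since F_n is a rescaled binomial distribution function. The sketch below (our own illustration, standard-library Python) shows the sup distance roughly halving when n is quadrupled:

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sup_dist(n):
    # sup_x |F_n(x) - Phi(x)| for F_n = CDF of (X_1 + ... + X_n)/sqrt(n),
    # X_i = +/-1 fair coins, so S_n = 2B - n with B ~ Binomial(n, 1/2).
    cdf = 0.0
    worst = 0.0
    for b in range(n + 1):
        p = math.comb(n, b) * 0.5 ** n
        x = (2 * b - n) / math.sqrt(n)
        # the sup is attained at the atoms: compare Phi with F_n(x-) and F_n(x)
        worst = max(worst, abs(cdf - Phi(x)), abs(cdf + p - Phi(x)))
        cdf += p
    return worst

d100, d400 = sup_dist(100), sup_dist(400)
print(d100, d400, d100 / d400)  # ratio near 2: the O(n^{-1/2}) rate
```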
Theorem 18.3 is new, as is perhaps Corollary 18.3; however Osipov and Petrov [1] and Feller [2] contain fairly general inequalities for the difference between the distribution functions F_n and Φ in the non-i.i.d. case in one dimension. More precise results than (18.25), (18.26) are known in one dimension. Ibragimov [1] has proved the following result. Suppose that {X_n : n ≥ 1} is a sequence of i.i.d. random variables each with mean zero and variance one; let 0 < δ < 1 … ≥ 1; but the Riemann–Lebesgue lemma [Theorem 4.1(iii)] applies for m ≥ p, so that there must exist t_0 ∈ R^k such that |Q̂_1(t_0)| = 1, which means that X_1 assigns all its mass to a countable set of parallel hyperplanes (see Section 21); this would imply singularity of Q_m with respect to Lebesgue measure for all m ≥ 1, contradicting the fact that Q_m is absolutely continuous for all m ≥ p.
Expansions of Densities

Next, for ||t|| > bn^{1/2},

|f_n(t)| = |Q̂_1(tn^{-1/2})|^n ≤ (sup{|Q̂_1(u)| : ||u|| > b})^{n-p} |Q̂_1(tn^{-1/2})|^p = δ^{n-p} |Q̂_1(tn^{-1/2})|^p  (n > p).  (19.12)

Now

lim_{n→∞} ∫ |f_n(t) − φ̂_{0,V}(t)| dt
≥ 3, and that the characteristic function Q̂_1 of X_1 belongs to L^p(R^k) for some p ≥ 1. A bounded continuous density q_n of the distribution Q_n of n^{-1/2}(X_1 + ··· + X_n) exists for every n ≥ p, and one has the asymptotic expansion

sup_{x∈R^k} (1 + ||x||^s) |q_n(x) − Σ_{j=0}^{s-2} n^{-j/2} P_j(−φ_{0,V}: {χ_ν})(x)| = o(n^{-(s-2)/2})  (n → ∞),  (19.17)

where χ_ν denotes the νth cumulant of X_1 (3 ≤ |ν| ≤ s).

Proof. Without loss of generality, assume p to be an integer (else, take [p] + 1 for p). For n ≥ p + s, D^α Q̂_n is integrable for 0 ≤ |α| ≤ s. Writing, for n ≥ p + s, |α| ≤ s,
h_n(x) = x^α (q_n(x) − Σ_{j=0}^{s-2} n^{-j/2} P_j(−φ_{0,V}: {χ_ν})(x))  (x ∈ R^k),

one has, by the Fourier inversion theorem,

h_n(x) = (2π)^{-k} (−i)^{|α|} ∫ D^α [f_n(t) − Σ_{j=0}^{s-2} n^{-j/2} P̃_j(it: {χ_ν}) exp{−½⟨t, Vt⟩}] exp{−i⟨t, x⟩} dt  (x ∈ R^k),  (19.19)

where f_n = Q̂_n.
Let B be the positive-definite symmetric matrix satisfying B² = V^{-1}. Define

β_s = E||BX_1||^s.  (19.20)

By Theorem 9.12 (and the remark following it), the integral over {||t|| ≤ an^{1/2}} of the integrand in (19.19) is bounded by δ(n)n^{-(s-2)/2} with δ(n) → 0,  (19.24)

where n ≥ p + s, |α| ≤ s; the integral over {||t|| > an^{1/2}} is o(n^{-(s-2)/2}) by the analog of (19.12), since

δ ≡ sup{|Q̂_1(t)| : ||t|| > a} < 1.  (19.25)

Q.E.D.
Expansions—Nonlattice Distributions
Remark. It should be pointed out that Theorem 19.2 holds even with s = 2. This is true because Theorem 9.12 holds with s = 2. Therefore a sharper assertion than (19.5) holds, namely,

lim_{n→∞} sup_{x∈R^k} (1 + ||x||²) |q_n(x) − φ_{0,V}(x)| = 0.  (19.26)
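The convergence in (19.26) can be watched numerically in a case where q_n is available in closed form: for uniform summands the density of the sum is the Irwin–Hall density. The sketch below (our own illustration; the grid and the values n = 4, 16 are arbitrary choices) shows the sup distance between the density of the normalized sum and the normal density shrinking:

```python
import math

def irwin_hall_pdf(n, x):
    # Density at x of the sum of n i.i.d. Uniform(0, 1) variables.
    if x <= 0.0 or x >= n:
        return 0.0
    s = 0.0
    for j in range(int(math.floor(x)) + 1):
        s += (-1) ** j * math.comb(n, j) * (x - j) ** (n - 1)
    return s / math.factorial(n - 1)

def q_n(n, y):
    # Density of the normalized sum Y = (T - n/2)/sqrt(n/12), T = sum of uniforms.
    sd = math.sqrt(n / 12.0)
    return sd * irwin_hall_pdf(n, n / 2.0 + y * sd)

phi = lambda y: math.exp(-y * y / 2.0) / math.sqrt(2.0 * math.pi)
grid = [i / 10.0 for i in range(-30, 31)]
err = {n: max(abs(q_n(n, y) - phi(y)) for y in grid) for n in (4, 16)}
print(err[4], err[16])  # the sup error decreases as n grows
```

Since the uniform is symmetric, the first correction polynomial vanishes and the observed error is of order n^{-1}, consistent with (19.17) for s = 4.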
The next theorem deals with the non-i.i.d. case.

THEOREM 19.3 Let {X_n : n ≥ 1} be a sequence of independent random vectors with values in R^k having zero means and average positive-definite covariance matrices V_n for large n. Assume that

lim sup_{n→∞} n^{-1} Σ_{j=1}^n E||B_nX_j||^s < ∞  (19.27)

for some integer s ≥ 3, where B_n is the positive-definite symmetric matrix satisfying

B_n = V_n^{-1/2},  V_n = n^{-1} Σ_{j=1}^n Cov(X_j),  (19.28)

defined for all sufficiently large n. Also, assume that there exists a positive integer p such that the functions

g_{m,n}(t) ≡ Π_{j=m+1}^{m+p} |E(exp{i⟨t, B_nX_j⟩})|  (0 ≤ m ≤ n − p − 1)

satisfy

sup{∫ g_{m,n}(t) dt : 0 ≤ m ≤ n − p − 1} < ∞.  (19.29)
(D °„E l
ex p l i < tn -'/2, Bn X> >) )) rT )
< n-IaI/2(EIIB ,,XIi illa1l)... (Eli BnX1,^il -I)'^ < n - IaI/ 2 (E' II Bn X1 IlI a l)' < n-lal/2
ilad/Ial
(E1IB,,Xj,111aI) 4
... (Eli B,,XI. jl Ial ) ', n - lal/ 2mn i I a l n .
^ la.l/lal
(19.44)
Also, since m < jat , there are at least (n — IaI)/(dal p)— I sets of p consecutive indices in {1,2,...,n}\{j 1 ,...,jm ). Hence dt < n -lal/2+ ^yflal.n(S (b4))in-IaI /(IaIP)-2
} (11111>b.n' / '
X fRk gm'.n n^/2 dt < b$n-IaI/2+k/2+Ila.n(6(b4))(n-lal)/(IaIP)-2 (19.45)
for some m', 0 < m' < n —p and, therefore,
(11111>bOI12
)
JD a Qn (t)I dt n+n'/ 2 I(EY1."II})=0, ✓
-i M",i({Ilxll n,).
(19.93)
Therefore for all such j, using (19.79) and the Leibniz formula for differentiating a product of n functions, ID OMM,j (t)I (I6v,) ^) -
By (19.69) and (19.70) (and remembering that G is absolutely continuous), sup
llm S„= lim S„= n-i 00
n- 00
IG(i)I(I6p3)
so that 8„ 3. Let V denote the covariance matrix of Q 1 and X its with cumulant (3 < II < s). Then for every real-valued, Bore/-measurable function f on R k satisfying (20.4)
M5,(f) n o , say, by T2 = and, writing A„ for the largest eigenvalue of D,
s+k+l
A„ _
c 7 (s, k)n 1/2
h/(s+k
(EIIT,Z1.1,II An
)
—
I)
'/Z
(20.22)
Expansions Under Cramer's Condition
By Corollary 14.2, (II T" II : n > no ) is bounded, and [by Lemma 14.1(v)] EIIT1,.II5+k+1 G2 s+k+lEIIy1 .nlls+k+1= 0(n(k+1)/2) C8(s,
(n—+),
k)n(1 /2)(s— 2)/(s+k— 1) (>
Ps /(s+k— 1)
(20.23)
where c 8 is positive. Use the first estimate of (20.23) in (20.21) to get
f
I ^ DO aHH (t) J [ DGK^ (1) -
{Il^ilc }
+ fc 9 (s, k)(1 + IItIIIS - °I) exp { — ie IItII 2 } di { II1II >A}
s+k-2
+ fDO - °` 2 n - '/ 2p(it : (X,}) exp( — Zm^^=
for every positive ε, and (iii) the characteristic functions ĝ_n of X_n satisfy

lim sup_{n→∞} sup_{||t||>b} |ĝ_n(t)| < 1  (20.55)

for every positive b. Then for every real-valued, Borel-measurable function f on R^k satisfying (20.4) for some s', 0 ≤ s' ≤ s, one has

|∫ f d(Q_n − Σ_{r=0}^{s-2} n^{-r/2} P_r(−Φ: {χ̄_{ν,n}}))| ≤ M_{s'}(f)δ_1(n) + c(s,k)ω̄_f(2e^{-dn}: Φ),  (20.56)

where Q_n is the distribution of n^{-1/2}B_n(X_1 + ··· + X_n), with B_n² = V_n^{-1}. Also δ_1(n) and d are as in Theorem 20.1, and χ̄_{ν,n} = average νth cumulant of B_nX_j (1 ≤ j ≤ n).
The proof of Theorem 20.6 is entirely analogous to that of Theorem 20.1 and is therefore omitted.

As indicated in the introduction to the present chapter, there are special functions f, for example, trigonometric polynomials, for which the expansion of ∫ f dQ_n is valid whatever may be the type of the distribution of X_1. This follows from Theorems 9.10–9.12. Our next theorem provides a class of functions of this type. For the sake of simplicity we state it for the i.i.d. case.

THEOREM 20.7 Let {X_n : n ≥ 1} be an i.i.d. sequence of random vectors with values in R^k. Assume that the common distribution has mean zero, positive-definite covariance matrix I, and a finite sth absolute moment ρ_s for some integer s ≥ 3. Let f be a (real- or complex-valued) Borel-measurable function on R^k that is the Fourier–Stieltjes transform of a finite signed measure µ satisfying

∫ ||x||^{s-2} |µ|(dx) < ∞.  (20.57)

Then

∫ f d(Q_n − Σ_{r=0}^{s-2} n^{-r/2} P_r(−Φ: {χ_ν})) = o(n^{-(s-2)/2})  (n → ∞).  (20.58)
Here Q_n is the distribution of n^{-1/2}(X_1 + ··· + X_n), χ_ν = νth cumulant of X_1.

Proof. By Parseval's relation [Theorem 5.1(viii)] and Theorem 9.12,

|∫ f d(Q_n − Σ_{r=0}^{s-2} n^{-r/2} P_r(−Φ: {χ_ν}))|
= |∫ [Q̂_n(t) − Σ_{r=0}^{s-2} n^{-r/2} P̃_r(it: {χ_ν}) exp{−½||t||²}] µ(dt)|
≤ δ(n) n^{-(s-2)/2} ∫_{{||t|| ≤ c_20(s,k)n^{1/2}ρ_s^{-1/(s-2)}}} (||t||² + ||t||^{3(s-2)}) exp{−¼||t||²} |µ|(dt) + … = o(n^{-(s-2)/2})  (n → ∞),

where δ(n) → 0. Q.E.D.

We point out that if µ is discrete, assigning its entire mass to a finite number of points of R^k, then the above theorem applies, and thus f may be taken to be an arbitrary trigonometric polynomial. However the result applies to a much larger class of functions f, including the class of all Schwartz functions (see Section A.2 for a definition of this class). Finally, for strongly nonlattice distributions we have the following result.

THEOREM 20.8 Let {X_n : n ≥ 1} be a sequence of i.i.d. strongly nonlattice random vectors with values in R^k. If EX_1 = 0, Cov(X_1) = I, and ρ_3 ≡ E||X_1||³ < ∞, then for every real-valued, bounded, Borel-measurable function f on R^k one has

|∫ f d(Q_n − Φ − n^{-1/2}P_1(−Φ: {χ_ν}))| = ω_f(R^k)·o(n^{-1/2}) + O(ω̄_f(δ_n : Φ))  (n → ∞),  (20.60)

where Q_n is the distribution of n^{-1/2}(X_1 + ··· + X_n), χ_ν = νth cumulant of X_1, and δ_n = o(n^{-1/2}); δ_n does not depend on f.

Proof. Given η > 0, we show that there exists n(η) such that for all n ≥ n(η) the left side of (20.60) is less than

ω_f(R^k)·o(n^{-1/2}) + c(k)ω̄_f(ηn^{-1/2} : Φ).  (20.61)
Introduce truncated random vectors Y_j (1 ≤ j ≤ n) … mean zero, variance σ² > 0, and third moment µ_3. It may be noted that for k = 1 "strongly nonlattice" is the same as "nonlattice." One may also easily write down analogs of (20.77) for more general classes of sets (than rectangles), for example, the class of all Borel-measurable convex subsets of R^k, or the class introduced in (17.3).

NOTES

Although Chebyshev [1] and Edgeworth [1] had conceived of the formal expansions of this chapter, it was not until Cramer's important work [1, 3] (Chapter VII) that a proper foundation was laid and the first important results derived.
Section 19. For an early local limit theorem for densities in the non-i.i.d. case and its significant applications to statistical mechanics the reader is referred to Khinchin [3]. Theorem 19.1 is essentially proved in Gnedenko and Kolmogorov [1], pp. 224–227; in this book (pp. 228–230) one also finds the following result of Gnedenko: in one dimension under the hypothesis of Theorem 19.2 one has

sup_{x∈R¹} |q_n(x) − Σ_{j=0}^{s-2} n^{-j/2} P_j(−φ: {χ_ν})(x)| = o(n^{-(s-2)/2}).  (N.1)
For k = 1 and s ≥ 3 the relation (19.17) in Theorem 19.2 was proved by Petrov [1] assuming boundedness of q_n for some n; however this assumption has been shown to be equivalent to ours in Theorem 19.1. Theorems 19.2, 19.3, and Corollary 19.4 appear here for the first time in their present forms. The assumptions (19.47), (19.49) may be considered too restrictive for the non-i.i.d. case; however it is not difficult to weaken them and get somewhat weaker results; we have avoided this on the ground that the conditions would look messier. Theorem 19.5 is due to Bhattacharya [8]; it strengthens Corollary 19.6, which was proved earlier by Bikjalis [4] for s ≥ 3. Section 20. Cramer [1, 3] (Chapter VII) proved that
sup_{x∈R¹} |F_n(x) − Σ_{j=0}^{s-3} n^{-j/2} P_j(−Φ_{0,σ²}: {χ_ν})(x)| = O(n^{-(s-2)/2})  (N.2)

in one dimension under the hypothesis of Theorem 20.1; here F_n is the distribution function of n^{-1/2}(X_1 + ··· + X_n) and var(X_1) = σ². This was sharpened by Esseen [1], who obtained a remainder o(n^{-(s-2)/2}) by adding one more term to the expansion; Esseen's result is equivalent to (20.49) when specialized to k = 1. R. Rao [1, 2] was the first to obtain multidimensional expansions under Cramer's condition (20.1) and prove that in the i.i.d. case one can expand probabilities of Borel-measurable convex sets with an error term O(n^{-(s-2)/2}(log n)^{(k-1)/2}) uniformly over the class C, provided that the hypothesis of Theorem 20.1 holds. This was extended to more general classes of sets independently by von Bahr [3] and Bhattacharya [1, 2]. Esseen's result on the expansion of the distribution function (mentioned above) was extended to R^k independently by Bikjalis [4] and von Bahr [3]. Corollaries 20.4, 20.5 as well as the relation (20.49), which refine earlier results of Bhattacharya [1, 2] and von Bahr [3], were obtained in Bhattacharya [4, 5]. The very general Theorem 20.1 is new; this extends Corollaries 20.2, 20.3 proved earlier by Bhattacharya [5]. Theorems 20.6, 20.7 are due to Bhattacharya [4, 5]. There is a result in Osipov [1] that yields o(n^{-(s-2)/2}) in place of O(n^{-(s-2)/2}) as the right side of (20.53). Some analogs of Theorem 20.8 have been obtained independently by Bikjalis [6]. Earlier Esseen [1] had proved (20.60) for the distribution function of Q_n (i.e., for the class of functions {f = 1_{(−∞, x]} : x ∈ R¹}) in one dimension and derived (20.78).
CHAPTER 5
Asymptotic Expansions—Lattice Distributions
The Cramer–Edgeworth expansions of Chapter 4 are not valid for purely discrete distributions. For example, if {X_n : n ≥ 1} is a sequence of i.i.d. lattice random variables (k = 1), then the distribution Q_n of the nth normalized partial sum is easily shown to have point masses each of order n^{-1/2} (if the variance of X_1 is finite and nonzero). Thus the distribution function of Q_n cannot possibly be expanded in terms of the absolutely continuous distribution functions of P_r(−Φ), 0 ≤ r ≤ s − 2, with a remainder term o(n^{-(s-2)/2}), when X_1 has a finite sth moment for some integer s not smaller than 3. However the situation may be salvaged in the following manner. The multiple Fourier series of Q_n is easily inverted to yield the point masses of Q_n. Making use of the approximation of Q̂_n by exp{−½⟨t, t⟩} Σ_{r=0}^{s-2} n^{-r/2} P̃_r(it) as provided by Chapter 2, Section 9, one obtains an asymptotic expansion of the point masses of Q_n in terms of Σ_{r=0}^{s-2} n^{-r/2} P_r(−Φ). To obtain an expansion of Q_n(B) for a Borel set B, one has to add up the asymptotic expansions of the point masses in B. For B = (−∞, x], x ∈ R^k, this sum may be expressed in a simple closed form. A multidimensional extension of the classical Euler–Maclaurin summation formula is used for this purpose.

21. LATTICE DISTRIBUTIONS

Consider R^k as a group under vector addition. A subgroup L of R^k is said to be a discrete subgroup if there is a ball B(0: d), d > 0, around the origin such that L ∩ B(0: d) = {0}. Equivalently, a subgroup L is discrete if every ball in R^k has only a finite number of points of L in it. In particular, a
discrete subgroup is a closed subset of R^k. The following theorem gives the structure of discrete subgroups.

THEOREM 21.1 Let L be a discrete subgroup of R^k and let r be the number of elements contained in a maximal set of linearly independent vectors in L. Then there exist r linearly independent vectors ξ_1, …, ξ_r in L such that

L = Z·ξ_1 + ··· + Z·ξ_r = {m_1ξ_1 + ··· + m_rξ_r : m_1, …, m_r integers}  (Z = {0, ±1, ±2, …}).  (21.1)
Proof First consider the case k = 1. If L is a discrete subgroup, L * (0), then to =min t : t E L, t > O) is positive. If t E L is arbitrary, let n be an integer such that nto < t < (n + 1)t a . Then 0 < t — nt p < to and t — nto E L, so that the minimality of t o implies that t = nt o or L = Z • t o . Now consider the case k> 1. The theorem will be proved if we can construct linearly independent vectors,..., , in L such that Ln(R• +•.. +R• ,)=Z•¢ i + • • • +Z..,, since it follows from the definition of the integer r that L c R• J, + • • • +R•,. Here R is the field of reals. We construct these vectors inductively. Assume that linearly independent vectors ¢, ..., ^ have been found for some s, s < r, such that (i) EE L, j=1,2,...,s and (ii) Ln(R•J,+•••+R•^,)=Z•t 1 +•••+Z. . Now we describe a method for choosing ^ ^. Let a E L\(R• J, + • • • + R.). Then a is linearly independent of ...,ts . Let M={toa: t oa+t i t s + • • • +tEL for some choice of real numbers t1,...,ts }. Since ¢ i ,...,C,EL, it follows that M = (t o a: toa + a i t i + • • • + as E L for some choice of a i with 00 such that M=Z•a oa. Choose constants a^, 1 r. We shall show that ro =r. Since r(d) is integer-valued, there exists d0 >0 such that r(d)=ro for 00 be arbitrary and let d 1 =min(e/k,d0 }. Then r(d 1 ) = do, and there exist linearly independent vectors Q^,...,/3,, in L n B(0: d 1 ). It follows that /3^ E R•rl 1 + • • • + R•rl o and, therefore, R•(3, + • + R•13,0 C R•rl, + • • • + R•17,, which implies R.f3 1 +••• +R•/3. 0 =R•rj 1 +••. +R•rl, o (21.5) Now let EE R•r1 1 + • • • + R•rl, o be arbitrary. Then J= t 1 X13, + • • • + t /3,0 Write t^ = m1 + tj with rn1 integral and ^ t^^ < 1, 1 <j < r o . Thus there exists /3 E L, /6= m, Q,+••• +m of13, 0, such that Ili — a II 0 and let to E L*. Then (21.11) implies that E a + 2irZ, so that P(E2IrZ)=1.
(21.13)
If S=(: P (X = x o + E) > 0), then (21.13) is equivalent to E 277Z
for all j' ES.
(21.14)
Since S generates L, (21.13) is equivalent to E27TZ
(21.15)
for all EEL.
Thus t o belongs to the right side of (ii). Conversely, if (21.15) holds for some t o , (21.13) holds and f(t+t o)I=IE(exp{i))I=IE(exp(i — i})I=IE(exp{i+i})J=J f (1)J
(tER"` ). (21.16)
Thus to E L*, and (ii) is proved. It remains to prove (iii). By Theorem 21.1, there exists a basis { ... k } of R k such that L = Z • t 1 + • • • + Z • J„ where r is the rank of L. Let { rl,, ... ,,t k } be the dual basis, that is, %, rl^) = 5 ., 1 < j, j' < k, where 8y is Kronecker's delta. Then (ii) implies L*=27r(Z•r1 1 +... +Z q,)+R'ij. +j +... +R• l k .
(21.17)
The relation (iii) follows immediately from (21.17). The last assertion is an immediate consequence of (i) and (ii) [or (21.17)]. Q.E.D. COROLLARY 21.7 Let X be a lattice random vector with characteristic function f. The set L* of periods of !fI is a lattice if and only if X is nondegenerate. Proof. This follows immediately from representation (21.17) and Lemma 21.5. Q.E.D. Let L be a lattice and {j I ,•••,Jk ) a basis of.L; that is, L=Z•,+ • • • +Z•jk .
(21.18)
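The periodicity in (ii) can be checked numerically in one dimension: for a variable supported on Z (so L = Z and L* = 2πZ), |f| is 2π-periodic. An illustrative sketch (the support points and probabilities are arbitrary choices of ours):

```python
import math
import cmath

# X supported on {-1, 0, 2} in Z = L, so L* = 2*pi*Z and |f| has period 2*pi:
# |f(t + 2*pi)| = |f(t)| for the characteristic function f(t) = E exp(itX).
support = [(-1, 0.2), (0, 0.5), (2, 0.3)]

def f(t):
    return sum(p * cmath.exp(1j * t * x) for x, p in support)

pts = [0.0, 0.37, 1.1, 2.9, -4.2]
diffs = [abs(abs(f(t + 2 * math.pi)) - abs(f(t))) for t in pts]
print(diffs)  # all numerically zero
```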
Let {η_1, …, η_k} be a dual basis, that is, ⟨ξ_j, η_{j'}⟩ = δ_{jj'}, and let L* be the lattice defined as

L* = 2π(Z·η_1 + ··· + Z·η_k).  (21.19)
Write Det(ξ_1, …, ξ_k) for the determinant of the matrix whose jth row is ξ_j = (ξ_{j1}, …, ξ_{jk}), so that Det(ξ_1, …, ξ_k) = Det((ξ_{jj'})). If {ζ_1, …, ζ_k} is another basis of L, then there exist integral matrices A = (a_{jj'}) and B = (b_{jj'}) such that ξ_j = Σ_{j'} a_{jj'}ζ_{j'} and ζ_j = Σ_{j'} b_{jj'}ξ_{j'}. Then Det A = ±1, so that Det(ξ_1, …, ξ_k) = ±Det(ζ_1, …, ζ_k). Thus the quantity defined by

det L = |Det(ξ_1, …, ξ_k)|  (21.20)

is independent of the basis and depends only on the lattice L. Also, with this definition,

det L* = (2π)^k |Det(η_1, …, η_k)| = (2π)^k / det L.  (21.21)
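The identities ⟨ξ_j, η_{j'}⟩ = δ_{jj'} and (21.21) can be verified mechanically for a concrete planar lattice; in the sketch below (our own illustration) the basis matrix A is an arbitrary choice and the 2×2 inverse is written out by hand:

```python
import math

# Basis of a lattice L = Z·xi1 + Z·xi2 in R^2 (rows of the matrix A).
A = [[2.0, 1.0],
     [0.0, 3.0]]
d = A[0][0] * A[1][1] - A[0][1] * A[1][0]          # Det(xi1, xi2)

# Dual basis eta_j, defined by <xi_j, eta_j'> = delta_{jj'}:
# rows of (A^{-1})^T for a 2x2 matrix, written out explicitly.
eta = [[ A[1][1] / d, -A[1][0] / d],
       [-A[0][1] / d,  A[0][0] / d]]

dot = lambda u, v: u[0] * v[0] + u[1] * v[1]
print([[round(dot(A[i], eta[j]), 12) for j in range(2)] for i in range(2)])

det_L = abs(d)
det_eta = eta[0][0] * eta[1][1] - eta[0][1] * eta[1][0]
det_Lstar = (2 * math.pi) ** 2 * abs(det_eta)
print(det_L, det_Lstar, (2 * math.pi) ** 2 / det_L)  # (21.21): the last two agree
```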
Consider the fundamental domain F* of L* defined as follows:

F* = {t_1η_1 + ··· + t_kη_k : |t_j| < π, 1 ≤ j ≤ k}.

… Let {X_n : n ≥ 1} be a sequence of i.i.d. lattice random vectors. We shall assume that ρ_s = E||X_1 − µ||^s is finite for some integer s ≥ 2, and write
D_n = Cov(Z_{1,n}),  l = det L,
χ_ν = νth cumulant of X_1,  χ̄_{ν,n} = νth cumulant of Z_{1,n}  (|ν| ≤ s),
y_{a,n} = n^{-1/2}(a − nµ),  ȳ_{a,n} = n^{-1/2}(a − nEY_{1,n})  (a ∈ L),
p_n(y_{a,n}) = P(X_1 + ··· + X_n = a) = P(n^{-1/2} Σ_{j=1}^n (X_j − µ) = y_{a,n}),
p̄_n(ȳ_{a,n}) = P(Y_{1,n} + ··· + Y_{n,n} = a) = P(n^{-1/2} Σ_{j=1}^n Z_{j,n} = ȳ_{a,n}),
q_{n,m} = l n^{-k/2} Σ_{r=0}^{m-2} n^{-r/2} P_r(−φ_{0,V}: {χ_ν})  (2 ≤ m ≤ s),
q̄_{n,m} = l n^{-k/2} Σ_{r=0}^{m-2} n^{-r/2} P_r(−φ_{0,D_n}: {χ̄_{ν,n}})  (2 ≤ m ≤ s).  (22.2)

THEOREM 22.1 If ρ_s < ∞ for some integer s ≥ 2, then

sup_{a∈L} (1 + ||y_{a,n}||^s) |p_n(y_{a,n}) − q_{n,s}(y_{a,n})| = o(n^{-(k+s-2)/2})  (n → ∞).  (22.4)

Also,

Σ_{a∈L} |p_n(y_{a,n}) − q_{n,s}(y_{a,n})| = o(n^{-(s-2)/2})  (n → ∞).  (22.5)
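For fair ±1 coin flips (k = 1, l = 2, V = 1, all odd cumulants zero) the leading term of q_{n,s} is l n^{-1/2}φ_{0,1}(y_{a,n}), and the flavor of (22.4) can be watched directly. The following sketch (our own illustration) rescales the sup error over lattice points by n^{1/2}:

```python
import math

phi = lambda y: math.exp(-y * y / 2.0) / math.sqrt(2.0 * math.pi)

def sup_local_error(n):
    # sqrt(n) * sup over lattice points a of
    # |P(S_n = a) - (2/sqrt(n)) * phi(a/sqrt(n))|   (k = 1, span 2, variance 1).
    worst = 0.0
    for b in range(n + 1):
        a = 2 * b - n
        p = math.comb(n, b) * 0.5 ** n
        worst = max(worst, abs(p - 2.0 / math.sqrt(n) * phi(a / math.sqrt(n))))
    return math.sqrt(n) * worst

e100, e400 = sup_local_error(100), sup_local_error(400)
print(e100, e400)  # decreasing: the rescaled local error is o(1)
```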
Proof. Let ĝ_1 denote the characteristic function of X_1 and f_n that of n^{-1/2}(X_1 + ··· + X_n − nµ). Then

f_n(t) = (ĝ_1(n^{-1/2}t))^n exp{−in^{1/2}⟨t, µ⟩},

and, inverting the Fourier series of the lattice distribution,

(iy_{a,n})^β p_n(y_{a,n}) = (2π)^{-k} l n^{-k/2} ∫_{n^{1/2}F*} exp{−i⟨t, y_{a,n}⟩} D^β f_n(t) dt,

where β is a nonnegative integral vector satisfying |β| ≤ s, and

n^{1/2}A = {n^{1/2}x : x ∈ A}  (A ⊂ R^k).  (22.8)

Also, clearly,

(iy_{a,n})^β q_{n,s}(y_{a,n}) = (2π)^{-k} l n^{-k/2} ∫_{R^k} exp{−i⟨t, y_{a,n}⟩} D^β [Σ_{r=0}^{s-2} n^{-r/2} P̃_r(it: {χ_ν}) exp{−½⟨t, Vt⟩}] dt  (|β| ≤ s).  (22.9)
Hence |y^β_{a,n}(p_n(y_{a,n}) − q_{n,s}(y_{a,n}))| ≤ … If s > k + 1, then (22.5) follows from (22.4). For the general case, we shall use truncation. First note that [see (14.111)]

Σ_{a∈L} |p_n(y_{a,n}) − p̄_n(ȳ_{a,n})| ≤ 2 Σ_{j=1}^n P(X_j ≠ Y_{j,n}) = 2nP(||X_1 − µ|| > n^{1/2}) = o(n^{-(s-2)/2})  (n → ∞).  (22.16)
Next, by Lemma 14.6, s-2 ^gn,s(Ya,n)
Qn,s(Ya.n)^
(I + IIY0,nII3r+2)
n-k/2sl(n) r=0
xexp{
—II 6 , n ll2
+ I8k '
II6a.nlI 2
1 8012
12I
} (aEL), (22.17)
where 8,(n) = o(n -(s -2) / 2 ). Now n-k/2a. (I+IIYa.n1I3r+2 )exp EL 6
=n -
k/2
I
(1+1In
{
l
—
- 1/2ajI3r+2)exp
aEL-nµ
< n - "`/ 2 sup
I h(n - '/ 2a+x)
IIxII< n-112aEL
.}
I/2
IIn
-
6
l
1
' /2a
1 2 + 11n
-
I/ 2a
1I
8k1/2 }
(=diameter of *). (22.18)
where
h(y)=(1+IIYII3r+2)exp{
—I^6Y1I2 + 8^k' 1 2 1
(yER k ). (22.19)
Let T be a nonsingular linear transformation on R k such that TL = Zk .
Then n —k/2 sup Y, h(n — ' /2a+x) IIxiIGAn ^/ 2 aEL
sup
2 h(n -1 / 2 T - 'a'+x), f h(T - y)dyn1/2/(16P 3 ))
( —) f' n(t) I di ate
< n" 2E IIZI.n Its (S' + 2p5n -(:-2)/2)" s = o(n -( -2) / 2 )
(s'=0, 1,...),
-;
(22.29)
where S' is defined by -
S'msup(Ig1(t)I:
tE
*\ (1':
IIt'll'/z. (24.1) {Ilxll — a&')
Combining (24.4)–(24.8) one arrives at

αδ_0 ≤ ∫ M_f(·: αε) d|(µ − ν)*K_ε| + α ∫ ω_f(·: 2αε) dν' + ∫ ω_f(·: αε) d|(µ − ν)*K_ε| + δ_1(1 − α)
+ (sup_{x∈R^k} ∫ (|f(x + y)| + ω_f(x + y: αε)) |µ − ν|(dy)) ∫_{{||x|| > αε'}} K_ε(dx).  (24.9)
In case δ_0 = −∫ f d(µ − ν), the above computations are applied to −f (in place of f). Thus in all cases one obtains

αδ_0 ≤ θ_1(ε) + αθ_2(ε) + cθ_3(ε) + δ_1(1 − α),  (24.10)
Two Recent Improvements
where

θ_1(ε) = ∫ (|f| + 2ω_f(·: αε)) d|(µ − ν)*K_ε|,
θ_2(ε) = ∫ ω_f(·: 2αε) dν',
θ_3(ε) = ∫_{{||x|| > αε'}} K_ε(dx),
c = sup_{x∈R^k} ∫ (|f(x + y)| + ω_f(x + y: αε)) |µ − ν|(dy).  (24.11)
Now define

δ_j = sup_{||x|| ≤ jαε'} |∫ f(x + y)(µ − ν)(dy)|  (j = 1, 2, …).  (24.12)

We are going to establish a relation such as (24.10) between δ_j and δ_{j+1} for all j = 1, 2, …. Fix j. Let η be a number satisfying 0 < η < δ_j. (Assume δ_j > 0; else, there is nothing to prove.) Once again suppose

δ_j = sup_{||x|| ≤ jαε'} ∫ f(x + y)(µ − ν)(dy).  (24.13)

There exists x_j such that ||x_j|| ≤ jαε' and

∫ f(x_j + y)(µ − ν)(dy) > δ_j − η.  (24.14)
Smoothing Inequality
Then
j J M, (x + y + x) (. — v) (dy) K (dx) E
1
= (M, (x + y + x) — f (x + j
j
y)) (u — v) (dy) K. (dx)
+ f (x, + y) (µ — v) (dy) KE (dx) = I.j + I +
131,
(24.15)
where I,j , I, I j are the integrals (w.r.t. K. (dx)) over [ Uxa — — a w,
(xj
+ y : 2as) v (dy) + a (Sj — -q) ,
1.► >— — J w,,j (.: as) d I (u — v) * KE
I —8
j. 1
(1 — a) ,
I I,j 15 c0,(e).
(24.16)
Since η > 0 may be taken arbitrarily small, one has αδ_j ≤ … Then the inequality (24.21) holds with δ_0 given by (24.2) and γ_∞ defined by (24.19), (24.20).
As a consequence of this strengthening of Lemma 11.4 one may obtain the following improvement of Theorems 15.1, 15.4. The notation below is that of Section 15, unless stated otherwise. The positive constants d depend only on k, s, r and are suitably chosen.

THEOREM 24.2. Under the hypothesis of Theorem 15.1 one has, for each positive integer j,

|∫ f d(Q_n − Φ')| ≤ M_s(f)(d_1 n^{-(s-2)/2} Δ*_{s,n} + d_{2,j}(ρ_3/n^{1/2})^j) + d_3 ω̄_f(d_4 ρ_3 n^{-1/2} : Φ),  (24.22)

where the quantities on the right are as defined in the statement of Theorem 15.1.
Proof The proof of Theorem 15.1 remains unchanged upto (and including) relation (15.30). Now apply Lemma 24.1 with ' µ (dy) = (1 + ly I 0) Q (dy), v (dy) = (1 + Ily u '°) ' (dy) ,
(24.23)
and KE satisfying (15.25) and
1yV
K (dy) < oo
J (24.24)
Let (See (15.25), (15.33) )
a = 1, s = 16c 9 Q,n, e' =
m = [E , _ ½ ] , (24.25)
and note that a ^ 3/4 (See (15.25) ). Instead of (15.31) one then obtains the relation
I
g.. (x) (I + lxu `°) (Q.' — 0') (dx)
< 27.. +
(!
—
a
)^
a., (24.26)
By (15.48),
81 ^, s d6 M, (f) I (U — is) .K6 1 s d, M, (f)A;,n'c' :>i: -
(24.27)
To estimate 82,,. write t for the density of ' and note that
(y : 2a6) v (dy) = J w, (a. + x + y : 2a6) (1 + lyu ro ) E ` (y) dy
= j cot (z : 2at) (1 + liz — a„ — xli '°) ' (z — a. — x) dz
= J w, (z : 2as) (1 + az1 '°) ^ ` (z) dz
+ j w, (z : 2as) j (1 + liz — a„ — xli '0 ) E ` (z — a„ — x)
— (1 + lizIir°)E`(z)Idz.
(24.28)
For lixl < mae' = ms' < s "½ _ ; 4 one may show, as in the proof of Lemma 14.6 (especially see (14.81) — (14.86)) ,
(1 + liz — a. — xli ' 0 ) t + (z — a. — x) — (1 + Ilzll ' 0 ) E' (z)
s 1 liz — a. — xu '° — lizu '° 1 ^' (z) + (1 + liz — an — xli '0)
I E'(z—a„—x) — + (z) I < d8 (s% + lia.li ) (I + Uz11 '0- ' ) t + (z)
+ d9 (e% + aa„II ) (1 + lid '0+1 ) exp ( ½ (e% + lia„II )'
+ (s'" + Ila„II ) lid — ½ Uzu 2 I
s d, o (1 + Id
'O-1
) E (z) + d 11 (1 + Id '0 ) exp ( — '/2 Nz1 2
if IzI s d%2 ( + Na.N )'' ,
s d, ° (I + Nz I '*-') t ` (z) + d,:,,s' exp I — Id 2 / 4 }
if Nzd > d 12 (&4 + Na„N) -' (24.29)
Hence
82 s dt, ,E
J w, (z : 26) (1 + NzI '°) E (z) dz + d,4, jm,(f)&J. (24.30)
Further, by Lemma 14.3 (or (14.93) ), and inequalities (14.12), (14.81) (also see (15.28) ), one has
NU
= I (1 + Nyl ' ) Q. (dy) s d,5,
UPI =
0
J (1 + NyI'° )) E(y) I dy s d,6.
(24.31)
These lead to the estimate
c =
sup
J(g
+ y) ( + w, (x + y: ac)) µ — v I (dy)
x E R'
5 d17 M,(f)(liO + IvU) A)
I (1 — o) a2 (& (t) — t. (t)) (NtN>A)
= o(n^{-(s-2)/2}) + o(n^{-(s/2 - k/4)}).  (25.12)
Expectations of Smooth Functions
In case s is an odd integer use truncation and carry out the above computations with s + 1 replacing s. Q.E.D.
For some purposes the result of Gotze and Hipp (1978) is somewhat better and, therefore, let us state it here.
THEOREM 25.2. Assume that ρ_s < ∞ for some integer s ≥ 3. If (i) D^α f is continuous for |α| ≤ s − 2, (ii) (1 + ||x||²)^{-s/2}|f(x)| is bounded above, and (iii) D^α f has at most a polynomial growth at infinity for |α| = s − 2, then

∫ f d(Q_n − Ψ_n) = o(n^{-(s-2)/2}).  (25.13)
In order to make a simple comparison between the two theorems above, assume ρ_s < ∞ for all integers s > 0. If f is m-times continuously differentiable and its derivatives of order m are polynomially bounded, then Theorem 25.2 provides an asymptotic expansion of ∫ f dQ_n with an error o(n^{-m/2}). Theorem 25.1 on the other hand gives a larger error o(n^{-(m/2 - k/4)}). However, there are functions in W^{m,2} which are not m-times continuously differentiable. In general, all that can be said is: if g ∈ W^{m,2} then g has continuous derivatives of order α for all α satisfying |α| < m − k/2.† Thus there are functions f for which Theorem 25.1 provides a sharper result. Finally, let us mention the recent monographs by Hall (1982) and Sazonov (1981) on the subject matter of this monograph.
† See Reed, M. and Simon, B. [1], Theorem IX.24, p. 52.
Chapter 7
An Application of Stein's Method In this section, we first present a brief outline of a method of approximation due to Stein (1986), which is, in general, not Fourier analytic. This is followed by a detailed derivation of the Berry—Esseen bound for convex sets obtained by Gotze (1991), who used Stein's method.
26.
AN EXPOSITION OF GOTZE'S ESTIMATION OF
THE RATE OF CONVERGENCE IN THE MULTIVARIATE CENTRAL LIMIT THEOREM In his article Gotze (1991) used Stein's method to provide an ingenious derivation of the Berry—Esseen-type bound for the class of Borel convex subsets of R' in the context of the classical multivariate central limit theorem. This approach has proved fruitful in deriving error bounds for the CLT under certain structures of dependence as well (see Rinott and Rotar 260
(1996)). Our view and elaboration of Gotze's proof follow Bhattacharya and Holmes (2010) and were first presented in a seminar at Stanford given in the summer of 2000. The authors wish to thank Persi Diaconis for pointing out the need for a more readable account of Gotze's result than that given in his original work. Raič (2004) has followed essentially the same route as Gotze, but in greater detail, in deriving Gotze's bound. It may be pointed out that we are unable to verify the derivations of the dimensional dependence O(k) in Gotze (1991) and Raič (2004). Our derivation provides a higher-order dependence of the error rate on k, namely, O(k^{5/2}). This rate can be reduced using an inequality of Ball (1993). The best order of dependence known, namely, O(k^{1/4}), is given by Bentkus (2003), using a different method, which would be difficult to extend to dependent cases. As a matter of notation, the constants c, with or without subscripts, are absolute constants. The k-dimensional standard Normal distribution is denoted by N(0, I_k) as well as Φ, with density φ.
26.1 The generator of the ergodic Markov process as a Stein operator

Suppose Q and Q_0 are two probability measures on a measurable space (S, 𝒮) and h is integrable (with respect to both Q and Q_0). Consider the problem of estimating

Eh − E_0h ≡ ∫ h dQ − ∫ h dQ_0.  (26.1.1)
A basic idea of Stein (1986) (developed in some examples in Diaconis and Holmes (2004) and Holmes (2004)) is
(i) to find an invertible map L which maps "nice" functions on S into the kernel, or null space, of E_0, (ii) to find a perturbation of L, say L_a, which maps "nice" functions on S into the kernel of E, and (iii) to estimate (26.1.1) using the identity

Eh − E_0h = ELg_0 = E(Lg_0 − L_a g_a),  (26.1.2)

where

g_0 = L^{-1}(h − E_0h),  g_a = L_a^{-1}(h − Eh).
In the present application, instead of finding a perturbation L_a of L, one obtains a smooth perturbation T_t h, say, of h, and applies the first relation in (26.1.2) to T_t h rather than h. Writing ψ_t = L^{-1}(T_t h − E_0T_t h) in place of g_0 above, one then estimates ELψ_t = ET_t h − E_0T_t h. Finally, the extent of the perturbation due to smoothing, (ET_t h − E_0T_t h) − (Eh − E_0h), is estimated. One way to find L is to consider an ergodic Markov process {X_t : t ≥ 0} on S which has Q_0 as its invariant distribution, and let L be its generator:
t oTtgt
-
g 2 g 6.1.3 E Dc,, ) (
where the limit is in L 2 (S, Qo) , and (Ttg)(x) = E ]g(Xt)IXo = x] , or, in terms of the transitions probability p(t; x, dy) of the Markov process {Xt :t>0}, (Tt g)(x)
= Js
g(y)p(t; x, dy)
(x E S, t > 0).
(26.1.4)
Also, D_L is the set of g for which the limit in (26.1.3) exists. By the Markov (or semigroup) property, T_{t+s} = T_tT_s = T_sT_t, so that

(d/dt) T_t g = lim_{s↓0} (T_{t+s}g − T_t g)/s = lim_{s↓0} T_t(T_s g − g)/s = T_t Lg.  (26.1.5)

Since T_tT_s = T_sT_t, T_t and L commute, and

(d/dt) T_t g = LT_t g.  (26.1.6)
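As a sanity check on the semigroup property T_{t+s} = T_tT_s used above, the one-dimensional Ornstein–Uhlenbeck kernel discussed later in this chapter, X_t | X_0 = x ~ N(e^{-t}x, 1 − e^{-2t}), composes exactly; the particular values of t, s, x below are arbitrary choices of ours:

```python
import math

# Chapman-Kolmogorov / semigroup check for the OU transition kernel (k = 1):
# X_t | X_0 = x  ~  N(e^{-t} x, 1 - e^{-2t}).  Composing a step of length t
# with a step of length s must reproduce the step of length t + s.
def compose(t, s, x):
    m1, v1 = math.exp(-t) * x, 1.0 - math.exp(-2.0 * t)
    # Conditionally on X_t, the next step is N(e^{-s} X_t, 1 - e^{-2s});
    # a Gaussian mixture of Gaussians with linear mean is Gaussian with:
    m = math.exp(-s) * m1
    v = math.exp(-2.0 * s) * v1 + (1.0 - math.exp(-2.0 * s))
    return m, v

t, s, x = 0.3, 0.7, 1.5
m, v = compose(t, s, x)
print(m, math.exp(-(t + s)) * x)          # equal
print(v, 1.0 - math.exp(-2.0 * (t + s)))  # equal
```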
Note that invariance of Q_0 means ET_t g(X_0) = Eg(X_0) = ∫ g dQ_0 if the distribution of X_0 is Q_0. This implies that, for every g ∈ D_L, ELg(X_0) = 0, or

∫_S Lg(x) dQ_0(x) = 0,

[ELg(X_0) = E(lim_{t↓0} (T_t g(X_0) − g(X_0))/t) = lim_{t↓0} (ET_t g(X_0) − Eg(X_0))/t = 0.]

That is, L maps D_L into the set 1^⊥ of mean-zero functions in L²(S, Q_0). It is known that the range of L is dense in 1^⊥, and if L has a spectral gap, then the range of L is all of 1^⊥. In the latter case L^{-1} is well defined on 1^⊥ and is bounded on it (Bhattacharya (1982)). Since T_t converges to the identity operator as t ↓ 0, one may also use T_t for small t > 0 to smooth the target function h̄ = h − ∫ h dQ_0. For the case of a diffusion {X_t : t ≥ 0}, L is a differential operator, and even nonsmooth functions such as h̄ = 1_B − Q_0(B) (h = 1_B) are immediately made smooth by applying T_t. One may then use the approximation to h̄ given by

T_t h̄ = L(L^{-1}T_t h̄) = Lψ_t, with ψ_t = L^{-1}T_t h̄,  (26.1.7)

and then estimate the error of this approximation by a "smoothing inequality," especially if T_t h̄ may be represented as a perturbation by convolution.
Chapter 7. An Application of Stein's Method
264
For several perspectives and applications of Stein's method, see Barbour (1988), Diaconis and Holmes (2004), Holmes (2004), and Rinott and Rotar (1996).
1(a) The Ornstein–Uhlenbeck Process and Its Gaussian Invariant Distribution

The Ornstein–Uhlenbeck (OU) process is governed by the Langevin equation (see, e.g., Bhattacharya and Waymire (2009), pp. 476, 597, 598)
dX_t = −X_t dt + √2 dB_t,  (26.1.8)

where {B_t : t ≥ 0} is a k-dimensional standard Brownian motion. Its transition density is

p(t; x, y) = Π_{i=1}^k [2π(1 − e^{-2t})]^{-1/2} exp{−(y_i − e^{-t}x_i)² / (2(1 − e^{-2t}))},
x = (x_1, …, x_k),  y = (y_1, …, y_k).  (26.1.9)
This is the density of a Gaussian (Normal) distribution with mean vector e^{−t}x and dispersion matrix (1 − e^{−2t}) I_k, where I_k is the k × k identity matrix. One can check (e.g., by direct differentiation) that the Kolmogorov backward equation holds:

∂p(t; x, y)/∂t = Σ_{i=1}^k ∂²p(t; x, y)/∂x_i² − Σ_{i=1}^k x_i ∂p(t; x, y)/∂x_i = Δ_x p − x·∇_x p = Lp,  with L = Δ − x·∇,   (26.1.10)
where Δ is the Laplacian and ∇ = grad. Integrating both sides against h(y) dy, we see that T_t h(x) = ∫ h(y) p(t; x, y) dy satisfies

(∂/∂t) T_t h(x) = Δ T_t h(x) − x·∇ T_t h(x) = L T_t h(x),  ∀ h ∈ L²(ℝ^k, Φ).   (26.1.11)
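The transition (26.1.9) can be sampled exactly, which also exhibits N(0, I_k) as the invariant distribution (a one-dimensional sketch of ours):

```python
import random
from math import exp, sqrt

def ou_step(x, t, rng):
    """One exact transition of the 1-d OU process dX = -X dt + sqrt(2) dB:
    X_t | X_0 = x  ~  N(e^{-t} x, 1 - e^{-2t})   (cf. (26.1.9) with k = 1)."""
    return exp(-t) * x + sqrt(1.0 - exp(-2.0 * t)) * rng.gauss(0.0, 1.0)

rng = random.Random(0)
# start from the invariant law N(0,1) and evolve: the law is preserved
xs = [ou_step(rng.gauss(0.0, 1.0), t=0.7, rng=rng) for _ in range(200_000)]
m = sum(xs) / len(xs)
v = sum(x * x for x in xs) / len(xs) - m * m
print(round(m, 2), round(v, 2))  # empirical mean and variance close to 0 and 1
```

The invariance is just the variance identity e^{−2t} · 1 + (1 − e^{−2t}) = 1 for the transition started from N(0, 1).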
Now on the space L²(ℝ^k, Φ) (where Φ = N(0, I_k) is the k-dimensional standard Normal distribution), L is self-adjoint and has a spectral gap, with the eigenvalue 0 corresponding to the invariant distribution Φ (or the constant function 1 in L²(ℝ^k, Φ)). This may be deduced from the fact that the Normal density p(t; x, y) (with mean vector e^{−t}x and dispersion matrix (1 − e^{−2t}) I_k) converges to the standard Normal density φ(y) exponentially fast as t → ∞, for every initial state x. Alternatively, one can compute the set of eigenvalues of L, namely {0, −1, −2, ...}, with eigenfunctions expressed in terms of Hermite polynomials (Bhattacharya and Waymire, 2009, page 487). In particular, L^{−1} is a bounded operator on 1^⊥ and is given by
L^{−1} h(x) = − ∫₀^∞ T_s h(x) ds,  ∀ h = h̄ = h − ∫ h dΦ ∈ L²(ℝ^k, Φ).   (26.1.12)
To check this, note that by (26.1.11),

h(x) = − ∫₀^∞ (∂/∂s) T_s h(x) ds = − ∫₀^∞ L T_s h(x) ds = L( − ∫₀^∞ T_s h(x) ds ).   (26.1.13)
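Relation (26.1.13) and the formula (26.1.12) for L^{−1} can be checked numerically in one dimension for the test function h(x) = x² − 1, which has Φ-mean zero (our own choice of example; for this h, T_s h(x) = e^{−2s}(x² − 1) in closed form):

```python
from math import exp

def psi(x, n=20_000, smax=20.0):
    """psi(x) = -int_0^infty T_s h(x) ds for h(x) = x^2 - 1, where
    T_s h(x) = E[(e^{-s}x + sqrt(1-e^{-2s})Z)^2 - 1] = e^{-2s}(x^2 - 1);
    the s-integral is computed by the trapezoid rule."""
    ds = smax / n
    total = 0.0
    for i in range(n + 1):
        w = 0.5 if i in (0, n) else 1.0
        total += w * exp(-2.0 * i * ds) * (x * x - 1.0)
    return -total * ds

def L(f, x, eps=1e-4):
    # generator of the 1-d OU process: L f = f'' - x f', by central differences
    f0, fp, fm = f(x), f(x + eps), f(x - eps)
    return (fp - 2.0 * f0 + fm) / (eps * eps) - x * (fp - fm) / (2.0 * eps)

for x in (0.0, 0.8, 1.5):
    print(round(L(psi, x), 3), round(x * x - 1.0, 3))  # L psi recovers h
```

Here psi agrees with the closed form −(x² − 1)/2, and applying L = d²/dx² − x d/dx returns h, as (26.1.13) asserts.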
For our purposes h = 1_C, the indicator function of a Borel convex subset C of ℝ^k. A smooth approximation of h̄ is T_t h̄ for small t > 0 (since T_t h̄ is infinitely differentiable). Also, by (26.1.12),

ψ_t(x) ≡ L^{−1} T_t h̄(x) = − ∫₀^∞ T_s T_t h̄(x) ds   (26.1.14)
  = − ∫₀^∞ T_{s+t} h̄(x) ds = − ∫_t^∞ T_s h̄(x) ds
  = − ∫_t^∞ { ∫_{ℝ^k} h̄(e^{−s}x + √(1 − e^{−2s}) z) φ(z) dz } ds,
where φ is the k-dimensional standard Normal density. We have expressed T_s h̄(x) = E[h̄(X_s) | X₀ = x] in (26.1.14) as

E[h̄(X_s) | X₀ = x] = E h̄(e^{−s}x + √(1 − e^{−2s}) Z), where Z is a standard Normal N(0, I_k),   (26.1.15)

for X_s has the same distribution as e^{−s}x + √(1 − e^{−2s}) Z. Now note that, using (26.1.14), one may write

T_t h̄(x) = L(L^{−1} T_t h̄(x)) = Δ(L^{−1} T_t h̄(x)) − x·∇(L^{−1} T_t h̄(x)) = Δψ_t(x) − x·∇ψ_t(x).   (26.1.16)

For the problem at hand (see (26.1.1)), Q₀ = Φ and Q = Q^{(n)} is the distribution of

S_n = (1/√n)(Y₁ + Y₂ + ⋯ + Y_n) = X₁ + X₂ + ⋯ + X_n  (X_j = Y_j/√n),

where Y₁, Y₂, ..., Y_n are i.i.d. mean-zero random vectors with covariance matrix I_k and finite absolute third moment

ρ₃ = E‖Y₁‖³ = E( Σ_{i=1}^k (Y₁^{(i)})² )^{3/2} < ∞.

We want to estimate

E h̄(S_n) = E h(S_n) − ∫ h dΦ   (26.1.17)

for h = 1_C, C ∈ 𝒞, the class of all Borel convex sets in ℝ^k.
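The quantity (26.1.17) can be approximated by simulation for a concrete choice of C and of the law of Y₁ (an illustrative sketch of ours; the uniform law on [−√3, √3]² has mean zero and covariance I₂):

```python
import random
from math import exp, sqrt

rng = random.Random(42)
k, n, reps, r = 2, 50, 10_000, 1.0

def S_n():
    # S_n = (Y_1 + ... + Y_n)/sqrt(n), with Y_i uniform on [-sqrt(3), sqrt(3)]^2,
    # so that E Y = 0 and Cov Y = I_2
    s = [0.0] * k
    for _ in range(n):
        for i in range(k):
            s[i] += rng.uniform(-sqrt(3.0), sqrt(3.0))
    return [c / sqrt(n) for c in s]

# C = closed ball of radius r; Phi(C) = P(||Z|| <= r) = 1 - exp(-r^2/2) for k = 2
hit = sum(1 for _ in range(reps) if sum(c * c for c in S_n()) <= r * r)
mc = hit / reps
phiC = 1.0 - exp(-r * r / 2.0)
print(abs(mc - phiC) < 0.05)  # True: the two probabilities are already close at n = 50
```

The simulation only illustrates the size of (26.1.17) for one convex set; the analysis below bounds the supremum over all of 𝒞.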
For this we first estimate (see (26.1.16)), for small t > 0,

E T_t h̄(S_n) = E[ Δψ_t(S_n) − S_n·∇ψ_t(S_n) ].   (26.1.18)

This is done in subsection 26.3. The next step is to estimate, for small t > 0,

E T_t h̄(S_n) − E h̄(S_n),   (26.1.19)

which is carried out in subsection 26.4. Combining the estimates of (26.1.18) and (26.1.19), with a suitable choice of t > 0, one arrives at the desired estimation of (26.1.17). We will write
δ_n = sup_{h = 1_C : C ∈ 𝒞} | ∫ h dQ^{(n)} − ∫ h dΦ |.   (26.1.20)

26.2 Derivatives of ψ_t = L^{−1} T_t h̄

Before we engage in the estimation of (26.1.18) and (26.1.19), it is useful to compute certain derivatives of ψ_t. Let

D_i = ∂/∂x_i,  D_{ii′} = ∂²/(∂x_i ∂x_{i′}),  D_{ii′i″} = ∂³/(∂x_i ∂x_{i′} ∂x_{i″}),  etc.
Then, using (26.1.14),

D_i ψ_t(x) = − ∫_t^∞ ∫_{ℝ^k} h̄(y) [2π(1 − e^{−2s})]^{−k/2} ( e^{−s}(y_i − e^{−s}x_i) / (1 − e^{−2s}) ) exp{ −‖y − e^{−s}x‖² / (2(1 − e^{−2s})) } dy ds   (26.2.1)
  = − ∫_t^∞ ( e^{−s} / √(1 − e^{−2s}) ) [ ∫_{ℝ^k} h̄(e^{−s}x + √(1 − e^{−2s}) z) z_i φ(z) dz ] ds,

using the change of variables z = (y − e^{−s}x)/√(1 − e^{−2s}) and the identity

z_i φ(z) = − (∂/∂z_i) φ(z) = − D_i φ(z).
In the same manner, one has, using D_{z_i}, D_{z_i z_{i′}}, etc. for the derivatives ∂/∂z_i, ∂²/(∂z_i ∂z_{i′}), etc.,

D_{ii′} ψ_t(x) = − ∫_t^∞ ( e^{−s} / √(1 − e^{−2s}) )² [ ∫_{ℝ^k} h̄(e^{−s}x + √(1 − e^{−2s}) z) · D_{ii′} φ(z) dz ] ds,   (26.2.2)

D_{ii′i″} ψ_t(x) = − ∫_t^∞ ( e^{−s} / √(1 − e^{−2s}) )³ [ ∫_{ℝ^k} h̄(e^{−s}x + √(1 − e^{−2s}) z) · (−D_{ii′i″} φ(z)) dz ] ds.
The following estimate is used in the next section:

sup_{u ∈ ℝ^k} | ∫_{ℝ^k} ∫_{ℝ^k} h̄( √((n−1)/n) e^{−s}x + e^{−s}u + √(1 − e^{−2s}) z ) φ(x) D_{ii′i″} φ(z) dx dz | ≤ c₀ k e^{2s}(1 − e^{−2s}).   (26.2.3)

To prove this, write a = √(n/(n−1)) e^{s} √(1 − e^{−2s}) and change variables x → y = x + az.
Then

φ(x) = φ(y − az) = φ(y) − az·∇φ(y) + a² Σ_{r,r′=1}^k z_r z_{r′} ∫₀¹ (1 − v) D_{rr′} φ(y − vaz) dv,   (26.2.4)
so that

h̄( √((n−1)/n) e^{−s}x + e^{−s}u + √(1 − e^{−2s}) z ) = h̄( √((n−1)/n) e^{−s}y + e^{−s}u ),

and the double integral in (26.2.3) becomes

∫_{ℝ^k} ∫_{ℝ^k} h̄( √((n−1)/n) e^{−s}y + e^{−s}u ) [ φ(y) − az·∇φ(y) + a² Σ_{r,r′=1}^k z_r z_{r′} ∫₀¹ (1 − v) D_{rr′} φ(y − vaz) dv ] D_{ii′i″} φ(z) dz dy.   (26.2.5)
Note that the integrals of D_{ii′i″} φ(z) and z_j D_{ii′i″} φ(z) over ℝ^k vanish for all i, i′, i″, and j, so that

∫_{ℝ^k} ∫_{ℝ^k} h̄( √((n−1)/n) e^{−s}y + e^{−s}u ) ( φ(y) − az·∇φ(y) ) D_{ii′i″} φ(z) dz dy = 0.   (26.2.6)
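The vanishing of these integrals can be checked numerically in one dimension (an illustrative sketch of ours, using the closed form φ‴(z) = (3z − z³)φ(z)):

```python
from math import exp, pi, sqrt

def phi3(z):
    # third derivative of the standard Normal density:
    # phi'(z) = -z phi(z), phi''(z) = (z^2 - 1) phi(z), phi'''(z) = (3z - z^3) phi(z)
    return (3.0 * z - z ** 3) * exp(-z * z / 2.0) / sqrt(2.0 * pi)

def trapz(f, a, b, n=200_000):
    # composite trapezoid rule on [a, b]
    h = (b - a) / n
    return h * (0.5 * f(a) + 0.5 * f(b) + sum(f(a + i * h) for i in range(1, n)))

i0 = trapz(lambda z: phi3(z), -10.0, 10.0)          # integral of D_{iii} phi
i1 = trapz(lambda z: z * phi3(z), -10.0, 10.0)      # integral of z D_{iii} phi
print(abs(i0) < 1e-6, abs(i1) < 1e-6)  # True True
```

Both vanish because each is, after integration by parts, an integral of a lower derivative of φ over the whole line.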
The magnitude of the last term on the right in (26.2.4) is

| a² ∫₀¹ (1 − v) [ Σ_{r,r′=1}^k z_r z_{r′} (y − vaz)_r (y − vaz)_{r′} − Σ_{r=1}^k z_r² ] φ(y − vaz) dv |   (26.2.7)
  ≤ a² ∫₀¹ (1 − v) [ Σ_{r,r′=1}^k | z_r z_{r′} (y − vaz)_r (y − vaz)_{r′} | + Σ_{r=1}^k z_r² ] φ(y − vaz) dv,

using D_{rr′} φ(y) = (y_r y_{r′} − δ_{rr′}) φ(y).

Recall that (see (26.1.15))

T_t h(x) = E h(e^{−t}x + √(1 − e^{−2t}) Z),

where Z has the standard Normal distribution
= N(0, I_k), which we take to be independent of S_n. Then

E T_t h(S_n) = E h(e^{−t}S_n + √(1 − e^{−2t}) Z)
  = ∫_{ℝ^k} ∫_{ℝ^k} h(e^{−t}x + √(1 − e^{−2t}) z) dQ^{(n)}(x) φ(z) dz   (26.4.1)
  = ∫_{ℝ^k} h d( (Q^{(n)})_{e^{−t}} * Φ_{√(1−e^{−2t})} )
  = ∫_{ℝ^k} h̄ d( [ (Q^{(n)})_{e^{−t}} − Φ_{e^{−t}} ] * Φ_{√(1−e^{−2t})} )

(here, for a probability measure μ and b > 0, μ_b denotes the distribution of bX when X has distribution μ; in particular, Φ_b = N(0, b²I_k)). The introduction of the extra term Φ_{e^{−t}} * Φ_{√(1−e^{−2t})} = Φ does not affect the integration in the last step, since ∫ h̄ dΦ = 0. Since the last integration is with respect to the difference between two probability measures, its value is unchanged if we replace h̄ by h. Hence

E T_t h̄(S_n) = ∫_{ℝ^k} h d( [ (Q^{(n)})_{e^{−t}} − Φ_{e^{−t}} ] * Φ_{√(1−e^{−2t})} ).   (26.4.2)
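The identity Φ_{e^{−t}} * Φ_{√(1−e^{−2t})} = Φ rests on the addition of Gaussian variances, e^{−2t} + (1 − e^{−2t}) = 1; a one-dimensional Monte Carlo sketch of ours (the choice t = 0.3 is arbitrary):

```python
import random
from statistics import NormalDist
from math import exp, sqrt

rng = random.Random(7)
t = 0.3
a, b = exp(-t), sqrt(1.0 - exp(-2.0 * t))   # the two component standard deviations

# X ~ N(0, a^2) and Y ~ N(0, b^2) independent  =>  X + Y ~ N(0, a^2 + b^2) = N(0, 1),
# i.e. Phi_{e^{-t}} * Phi_{sqrt(1-e^{-2t})} = Phi in one dimension
sample = sorted(a * rng.gauss(0, 1) + b * rng.gauss(0, 1) for _ in range(100_000))
n = len(sample)
nd = NormalDist()
ks = max(max((i + 1) / n - nd.cdf(x), nd.cdf(x) - i / n) for i, x in enumerate(sample))
print(ks < 0.01)  # True: Kolmogorov distance of the convolution to N(0,1) is tiny
```

The same variance bookkeeping underlies the k-dimensional statement, coordinate by coordinate.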
Also, the class 𝒞 of Borel convex sets is invariant under the maps C → bC, where b > 0 is given. Therefore,

δ_n = sup_{h ∈ ℋ} | E h̄(S_n) | = sup_{h ∈ ℋ} | ∫ h d(Q^{(n)} − Φ) | = sup_{h ∈ ℋ} | ∫ h d[ (Q^{(n)})_{e^{−t}} − Φ_{e^{−t}} ] |,   (26.4.3)

where ℋ = {1_C : C ∈ 𝒞}. Thus (26.4.2) is a perturbation (or smoothing) of the integral in (26.4.3) by convolution with Φ_{√(1−e^{−2t})}. If ε > 0 is a constant such that Φ_{√(1−e^{−2t})}({‖z‖ ≤ ε}) …
({izI 1 and an absolute constant c > 1 specified below. Note that (26.4.13) clearly holds for n < c 2 k 5 p3. Since c 2 k 5 p3 > k
,
that (26.4.13) holds for some n = n o > k 8 . Then under the induction hypothesis, and (26.4.12), and using n o co.. ck 4+4 P3
> 2(no 1) 14 , one obtains
C7k3/2P3
ago+1 (no(no + 1)) + (no + 1) cio. ck P3
C7k5/2P3
(n o + 1) + 2o(n o + 1)2 \ 1 (c lo = 2c 9 , k < k -o < 2 -9 for k >2 I .
no+ 1 —
JJJ
(26.4.14)
Now, choose c to be the greater of 1 and the positive solution of c = clo/+c72 -9 , to check that (26.4.13) holds for n = no+1. Hence (26.4.13) holds for all n. We have proved the following result.
Theorem 1. There exists an absolute constant c > 0 such that

δ_n ≤ c k^{5/2} ρ₃ / √n   (n ≥ 1).   (26.4.15)
26.5 The Non-Identically Distributed Case

For the general case considered in Götze (1991), the X_j's (1 ≤ j ≤ n) are independent with zero means and Σ_{j=1}^n Cov X_j = I_k. Assume

β₃ = Σ_{j=1}^n E‖X_j‖³ < ∞.

The analogue of (26.4.13) is

δ_n ≤ c k^{5/2} β₃.   (26.5.8)

If c k^{5/2} β₃ > 1 this holds trivially; otherwise β₃ may be assumed to be smaller than or equal to c^{−1} k^{−5/2}, and (1 − β₃)^{−1} ≤ (1 − c^{−1})^{−1} = c′. The
induction argument is similar.

Remark. If one defines

γ₃ = Σ_{j=1}^n E( Σ_{i=1}^k |X_j^{(i)}| )³,   (26.5.9)

then

Σ_{j=1}^n Σ_{i,i′,i″=1}^k E| X_j^{(i)} X_j^{(i′)} X_j^{(i″)} | = γ₃.   (26.5.10)

Since γ₃ now replaces k^{3/2} β₃ in the computations, it follows that δ_n ≤ c k γ₃. Since γ₃ ≤ k^{3/2} β₃, (26.5.10) provides a better bound than (26.5.8) or (26.4.13).
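The comparison γ₃ ≤ k^{3/2} β₃ rests on the Cauchy–Schwarz bound (Σᵢ|xᵢ|)³ ≤ k^{3/2}‖x‖³ applied vector by vector; a quick numerical sketch of ours:

```python
import random

def holds(x):
    # Cauchy-Schwarz: sum_i |x_i| <= sqrt(k) ||x||,
    # hence (sum_i |x_i|)^3 <= k^{3/2} ||x||^3
    k = len(x)
    l1 = sum(abs(c) for c in x)
    l2 = sum(c * c for c in x) ** 0.5
    return l1 ** 3 <= (k ** 1.5) * (l2 ** 3) + 1e-9  # tolerance for rounding

rng = random.Random(1)
trials = [[rng.uniform(-5.0, 5.0) for _ in range(rng.randint(1, 10))]
          for _ in range(1000)]
print(all(holds(x) for x in trials))  # True
```

Summing the inequality over j = 1, ..., n and taking expectations gives γ₃ ≤ k^{3/2} β₃.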
Bibliography

Ball, K. (1993). The reverse isoperimetric problem for Gaussian measure. Discrete Comput. Geom., 10(4):411-420.

Barbour, A. D. (1988). Stein's method and Poisson process convergence. In A Celebration of Applied Probability (Journal of Applied Probability, Volume 25A), pages 175-184.

Bentkus, V. (2003). On the dependence of the Berry-Esseen bound on dimension. J. Statist. Plann. Inference, 113(2):385-402.

Bhattacharya, R. (1982). On the functional central limit theorem and the law of the iterated logarithm for Markov processes. Z. Wahrsch. Verw. Gebiete, 60(2):185-201.

Bhattacharya, R. and Holmes, S. (2010). An exposition of Gotze's estimation of the rate of convergence in the multivariate central limit theorem. Technical report, Stanford University, Stanford, CA. http://arxiv.org/abs/1003.4254.

Bhattacharya, R. and Waymire, E. C. (2009). Stochastic Processes with Applications. Classics Appl. Math. 61, SIAM, Philadelphia.
Diaconis, P. and Holmes, S., editors (2004). Stein's Method: Expository Lectures and Applications. IMS Lecture Notes Monogr. Ser. 46, Inst. Math. Statist., Beachwood, OH.

Gotze, F. (1991). On the rate of convergence in the multivariate CLT. The Annals of Probability, 19:724-739.

Holmes, S. (2004). Stein's method for birth and death chains. In Stein's Method: Expository Lectures and Applications, IMS Lecture Notes Monogr. Ser. 46, Inst. Math. Statist., Beachwood, OH, pp. 45-68.

Raic, M. (2004). A multivariate CLT. Personal communication.

Rinott, Y. and Rotar, V. (1996). A multivariate CLT for local dependence with n^{-1/2} log n rate and applications to multivariate graph related statistics. J. Multivariate Anal., 56(2):333-350.

Stein, C. (1986). Approximate Computation of Expectations. Inst. Math. Statist., Beachwood, OH.
Appendix

A.1 RANDOM VECTORS AND INDEPENDENCE

A measure space is a triple (Ω, 𝓐, μ), where Ω is a nonempty set, 𝓐 is a sigma-field of subsets of Ω, and μ is a measure defined on 𝓐. A measure space (Ω, 𝓐, P) is called a probability space if the measure P is a probability measure, that is, if P(Ω) = 1. Let (Ω, 𝓐, P) be a probability space. A random vector X with values in ℝ^k is a map on Ω into ℝ^k satisfying

X^{−1}(A) ≡ {ω : X(ω) ∈ A} ∈ 𝓐   (A.1.1)

for all A ∈ 𝓑^k, where 𝓑^k is the Borel sigma-field of ℝ^k. When k = 1, such an X is also called a random variable. If X is an integrable random variable, the mean, or expectation, of X, denoted by EX [or E(X)], is defined by
EX = ∫_Ω X dP.   (A.1.2)

If X = (X₁, ..., X_k) is a random vector (with values in ℝ^k) each of whose coordinates is integrable, then the mean, or expectation, EX of X is defined by

EX ≡ (EX₁, ..., EX_k).   (A.1.3)

If X is a square-integrable random variable, then the variance of X, denoted by var X [or var(X)], is defined by

var X = E(X − EX)².   (A.1.4)
Let X, Y be two random variables defined on (Ω, 𝓐, P). If X, Y, and XY are all integrable, one defines the covariance between X and Y, denoted cov(X, Y), by

cov(X, Y) ≡ E(X − EX)(Y − EY) = E XY − (EX)(EY).   (A.1.5)

If X = (X₁, ..., X_k) is a random vector (with values in ℝ^k) such that cov(X_i, X_j) is defined for every pair of coordinates (X_i, X_j), then one defines the covariance matrix Cov(X) of X as the k × k matrix whose (i, j) element is cov(X_i, X_j). The distribution P_X of a random vector X (with values in ℝ^k) is the induced probability measure P ∘ X^{−1} on ℝ^k, that is,

P_X(A) ≡ P(X^{−1}(A))   (A ∈ 𝓑^k).   (A.1.6)

Since the mean and the covariance matrix of a random vector X depend only on its distribution, one also defines the mean and the covariance matrix of a probability measure Q on ℝ^k as those of a (any) random vector having distribution Q. Random vectors X₁, ..., X_m (with values in ℝ^k) defined on (Ω, 𝓐, P) are independent if

P(X₁ ∈ A₁, X₂ ∈ A₂, ..., X_m ∈ A_m) = P(X₁ ∈ A₁) P(X₂ ∈ A₂) ⋯ P(X_m ∈ A_m)   (A.1.7)

for every m-tuple (A₁, ..., A_m) of Borel subsets of ℝ^k. In other words, X₁, ..., X_m are independent if the induced measure P ∘ (X₁, ..., X_m)^{−1} is a product measure. A sequence {X_n : n ≥ 1} of random vectors [defined on (Ω, 𝓐, P)] is independent if every finite subfamily is so.
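The two expressions for the covariance in (A.1.5) can be compared on simulated data (an illustrative sketch of ours; the construction of the correlated pair is arbitrary):

```python
import random

rng = random.Random(3)
# pairs (X, Y) with Y = 0.5 X + noise, so cov(X, Y) = 0.5 when var X = 1
pairs = [(x, 0.5 * x + rng.gauss(0, 1))
         for x in (rng.gauss(0, 1) for _ in range(50_000))]
n = len(pairs)
ex = sum(x for x, _ in pairs) / n
ey = sum(y for _, y in pairs) / n
exy = sum(x * y for x, y in pairs) / n
cov_centered = sum((x - ex) * (y - ey) for x, y in pairs) / n   # E(X-EX)(Y-EY)
cov_product = exy - ex * ey                                     # EXY - EX EY
print(abs(cov_centered - cov_product) < 1e-9)  # True: the two forms agree
```

The agreement is exact up to floating-point rounding, since the identity is purely algebraic.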
A.2 FUNCTIONS OF BOUNDED VARIATION AND DISTRIBUTION FUNCTIONS

Let μ be a finite signed measure on ℝ^k. The distribution function F_μ of μ is the real-valued function on ℝ^k defined by

F_μ(x) = μ((−∞, x])   (x ∈ ℝ^k),   (A.2.1)

where

(−∞, x] = (−∞, x₁] × (−∞, x₂] × ⋯ × (−∞, x_k]   [x = (x₁, ..., x_k) ∈ ℝ^k].   (A.2.2)
It is simple to check that F_μ is right continuous. For a random vector X defined on some probability space (Ω, 𝓐, P), the distribution function of X is merely the distribution function of its distribution P_X. The distribution function F_μ completely determines the (finite) signed measure μ. To see this, consider the class ℛ of all rectangles of the form

(a, b] = (a₁, b₁] × ⋯ × (a_k, b_k]   (A.2.3)

[a_i ≤ b_i for i = 1, ..., k]. For h = (h₁, ..., h_k) with h_i ≥ 0 for all i, define the difference operator Δ_h by

Δ_h F(x) = Δ_{h₁}^{(1)} ⋯ Δ_{h_k}^{(k)} F(x),   (A.2.4)

where

Δ_{h_i}^{(i)} F(x) = F(x₁, ..., x_{i−1}, x_i + h_i, x_{i+1}, ..., x_k) − F(x₁, ..., x_{i−1}, x_i − h_i, x_{i+1}, ..., x_k)   [i = 1, ..., k].   (A.2.5)

If k = 1, we shall write Δ_h for the difference operator. One can also show† that for every (a, b] ∈ ℛ,

μ((a, b]) = Δ_h F_μ(x),   (A.2.6)

where

h = (b − a)/2,  x = (a + b)/2.   (A.2.7)

The class 𝓡₀ of all finite disjoint unions of sets in ℛ is a ring over which μ is determined by (A.2.6). Since the sigma-ring generated by 𝓡₀ is 𝓑^k, the

†See Cramer [4], pp. 78-80.
uniqueness of the Caratheodory extension implies that μ on 𝓑^k is determined by μ on 𝓡₀ (and, hence, by the distribution function F_μ). One may also show by an induction argument‡ that

Δ_h F(x) = Σ ± F(x₁ + ε₁h₁, x₂ + ε₂h₂, ..., x_k + ε_k h_k),   (A.2.8)

where the summation is over all k-tuples (ε₁, ε₂, ..., ε_k), each ε_i being either +1 or −1. The sign of a summand in (A.2.8) is plus or minus depending on whether the number of negative ε's is even or odd. Now let F be an arbitrary real-valued function on an open set U. Define a set function μ_F on the class 𝓡_U of all those sets in ℛ that are contained in U by

μ_F((a, b]) ≡ Δ_h F(x),   (A.2.9)

where x and h are given by (A.2.7). One can check that μ_F is finitely additive on 𝓡_U. The function F is said to be of bounded variation on an open set U if

sup Σ_j | μ_F(I_j) |   (A.2.10)

is finite, where the supremum is over all finite collections {I₁, I₂, ...} of pairwise disjoint sets in ℛ such that I_j ⊂ U for all j. The expression (A.2.10) is called the variation of F on U. The following theorem is proved in Saks [1] (Theorem 6.2, p. 68).

THEOREM A.2.1. Let F be a right continuous function of bounded variation on a nonempty open set U. There exists a unique finite signed measure on U that agrees with μ_F on the class 𝓡_U of all sets in ℛ contained in U.

It may be checked that the variation on U of a right continuous function F of bounded variation (on U) coincides with the variation norm of the signed measure whose existence is asserted in Theorem A.2.1. A function F is said to be absolutely continuous on an open set U if given ε > 0 there exists δ > 0 such that
Σ_j | μ_F(I_j) | < ε   (A.2.11)

for all finite collections {I₁, I₂, ...} of pairwise disjoint rectangles I_j ∈ 𝓡_U satisfying

Σ_j λ_k(I_j) < δ,   (A.2.12)

†See Halmos [1], p. 54.  ‡See Cramer [4], pp. 78-80.
where λ_k denotes the Lebesgue measure on ℝ^k. If F is absolutely continuous on a bounded open set U, then it may be shown that F is of bounded variation on U.†

THEOREM A.2.2. Let F be a right continuous function of bounded variation on an open set U ⊂ ℝ^k. Let μ_F be the measure on U defined by (A.2.6) (and Theorem A.2.1). Suppose that on U the successive derivatives D_k F, D_{k−1}D_k F, ..., D₁ ⋯ D_k F exist and are continuous. Then F is absolutely continuous on U and one has

μ_F(A) = ∫_A (D₁ ⋯ D_k F)(x) dx   (A.2.13)

for every Borel subset A of U. Also,

lim_{h₁↓0, ..., h_k↓0} (2^k h₁ ⋯ h_k)^{−1} Δ_h F = D₁ D₂ ⋯ D_k F   (A.2.14)

on U.
on U. Proof. Let the closed rectangle [a, b] be contained in U. Let h and x be defined by (A.2.7). Then µF((a+b])=A F(x)=A^.
..
Ok F(x)
Ok-IlF(XI,...,Xk-1, Xk+ hk)-
_oi... Ak_y
F(x1,...,xk-1,xk-hk))
(X4th4
JX4
hk
(DkF)(x, , ... , xk-I+Yk)dYk
(A.2.15)
by the fundamental theorem of integral calculus. Since the integrand has a continuous derivative with respect to X k - 1 , N'F(( a,b])=(]^'... Qk_2
(x4+hkr
2 J X4
f
h4
L ( Dk F )(X1 , ...9 Xk-I +hk-I'Yk) —
Xk t k4
(DkF)( X I , ... , Xk-1 - hk-Ilyk)]dyk
^^X4_^+/J4_^ //
-
lDk-IDkF)1X1,...+Yk-I+Yk)^k-I x4_i - h4_i
X4 k4
J
^k•
(A.2.16) tSaks [1), p. 93.
Proceeding in this manner, we arrive at (A.2.13) for A = (a, b], remembering that by Fubini's theorem the iterated integral as obtained by the above procedure is equal to the integral on (a, b] with respect to Lebesgue measure on ℝ^k. We next show that D₁ ⋯ D_k F is integrable on U. For if this is false, then for every integer n ≥ 1 there exist an integer m_n and pairwise disjoint rectangles (a¹, b¹], ..., (a^{m_n}, b^{m_n}] such that [a^i, b^i] ⊂ U, i = 1, ..., m_n, and

Σ_{i=1}^{m_n} | ∫_{(a^i, b^i]} (D₁ ⋯ D_k F)(x) dx | > n.

By (A.2.13), which we have proved for sets like (a^i, b^i], one then has

Σ_{i=1}^{m_n} | μ_F((a^i, b^i]) | > n

for all n, contradicting the hypothesis that F is of bounded variation on U. Thus we have two finite signed measures on U, defined by

A → ∫_A (D₁ ⋯ D_k F)(x) dx,  A → μ_F(A),

that coincide on the class of all rectangles (a, b] such that [a, b] ⊂ U. Therefore the two signed measures on U are equal, and (A.2.13) is established. To prove (A.2.14), let x ∈ U. Choose h = (h₁, ..., h_k) such that h_i > 0 for all i and [x − h, x + h] ⊂ U. Then by (A.2.13) one has

Δ_h F(x) = μ_F((x − h, x + h]) = ∫_{(x−h, x+h]} (D₁ ⋯ D_k F)(y) dy.

From this and the continuity of D₁ ⋯ D_k F on U, the relation (A.2.14) follows.
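The limit (A.2.14) can be illustrated numerically (our own sketch, with an arbitrary smooth F on ℝ²):

```python
from math import sin, cos

# F(x, y) = sin(x) sin(y), so D1 D2 F = cos(x) cos(y)
F = lambda x, y: sin(x) * sin(y)

def mixed_difference(F, x, y, h):
    # (2^k h_1 ... h_k)^{-1} Delta_h F at (x, y), with k = 2 and h_1 = h_2 = h
    num = F(x + h, y + h) - F(x - h, y + h) - F(x + h, y - h) + F(x - h, y - h)
    return num / (4.0 * h * h)

x, y = 0.3, 1.1
exact = cos(x) * cos(y)
approx = mixed_difference(F, x, y, 1e-4)
print(abs(approx - exact) < 1e-6)  # True: the normalized difference converges
```

Shrinking h further drives the normalized corner difference to the mixed derivative D₁D₂F, as the theorem asserts.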
Q.E.D.

It follows from the definition that the sum of a finite number of functions of bounded variation, or absolutely continuous, on an open set U is itself of bounded variation, or absolutely continuous, on U. Our next result establishes the bounded variation of a product of a special set of functions of bounded variation. We say that a function g on ℝ^k (into ℝ¹) is Schwartz if it is infinitely differentiable and if for every nonnegative integral vector α and every positive integer m one has

sup_x ‖x‖^m |(D^α g)(x)| < ∞. …

… k ≥ p, and G = g if k = p. Then the function

F(x) = F₁(x₁) ⋯ F_p(x_p) G(x)   (x ∈ ℝ^k)   (A.2.19)

is of bounded variation on ℝ^k.

Proof. Consider an arbitrary function H₀ on ℝ^p. We first show that

Δ_{h₁}^{(1)} ⋯ Δ_{h_p}^{(p)} F₁(x₁) ⋯ F_p(x_p) H₀(x₁, ..., x_p)
  = Σ F_{i₁}(x_{i₁} − h_{i₁}) ⋯ F_{i_s}(x_{i_s} − h_{i_s}) [ Δ_{h_{i₁}}^{(i₁)} ⋯ Δ_{h_{i_s}}^{(i_s)} H₀(x′) ] × [ Δ_{h_{j₁}}^{(j₁)} F_{j₁}(x_{j₁}) ] ⋯ [ Δ_{h_{j_{p−s}}}^{(j_{p−s})} F_{j_{p−s}}(x_{j_{p−s}}) ],   (A.2.20)

where x′ = (x′₁, ..., x′_p), x′_{i₁} = x_{i₁}, ..., x′_{i_s} = x_{i_s}, x′_{j₁} = x_{j₁} + h_{j₁}, ..., x′_{j_{p−s}} = x_{j_{p−s}} + h_{j_{p−s}}, and the summation is over all partitions of {1, 2, ..., p} into two disjoint subsets {i₁, ..., i_s}, {j₁, ..., j_{p−s}}, 0 ≤ s ≤ p; when one of these subsets is empty, the corresponding factor drops out. If p = 1, then

Δ_h^{(1)} F₁(x) H₀(x) = F₁(x + h) H₀(x + h) − F₁(x − h) H₀(x − h)
  = F₁(x − h)[H₀(x + h) − H₀(x − h)] + H₀(x + h)[F₁(x + h) − F₁(x − h)]
  = F₁(x − h) Δ_h^{(1)} H₀(x) + H₀(x + h) Δ_h^{(1)} F₁(x),

which proves (A.2.20) for p = 1. Assume, as an induction hypothesis, that (A.2.20) holds for some p. Then

Δ_{h₁}^{(1)} ⋯ Δ_{h_{p+1}}^{(p+1)} F₁(x₁) ⋯ F_{p+1}(x_{p+1}) H₀(x₁, ..., x_{p+1})
  = Δ_{h₁}^{(1)} ⋯ Δ_{h_p}^{(p)} [ F₁(x₁) ⋯ F_p(x_p) · Δ_{h_{p+1}}^{(p+1)} ( F_{p+1}(x_{p+1}) H₀(x₁, ..., x_{p+1}) ) ]
  = Δ_{h₁}^{(1)} ⋯ Δ_{h_p}^{(p)} [ F₁(x₁) ⋯ F_p(x_p) { F_{p+1}(x_{p+1} − h_{p+1}) Δ_{h_{p+1}}^{(p+1)} H₀(x₁, ..., x_{p+1}) + H₀(x₁, ..., x_p, x_{p+1} + h_{p+1}) Δ_{h_{p+1}}^{(p+1)} F_{p+1}(x_{p+1}) } ]
  = Δ_{h₁}^{(1)} ⋯ Δ_{h_p}^{(p)} F₁(x₁) ⋯ F_p(x_p) H₀′(x₁, ..., x_{p+1}) + Δ_{h₁}^{(1)} ⋯ Δ_{h_p}^{(p)} F₁(x₁) ⋯ F_p(x_p) H₀″(x₁, ..., x_{p+1}),   (A.2.21)
where

H₀′(x₁, ..., x_{p+1}) = F_{p+1}(x_{p+1} − h_{p+1}) Δ_{h_{p+1}}^{(p+1)} H₀(x₁, ..., x_{p+1}),
H₀″(x₁, ..., x_{p+1}) = H₀(x₁, ..., x_p, x_{p+1} + h_{p+1}) Δ_{h_{p+1}}^{(p+1)} F_{p+1}(x_{p+1}).   (A.2.22)

Now apply the induction hypothesis to each of the two summands of the last expression in (A.2.21), and then substitute from (A.2.22), to see that (A.2.20) holds with p replaced by p + 1. This completes the proof of (A.2.20) for all p. Looking at F given by (A.2.19), one sees that

Δ_{h₁}^{(1)} ⋯ Δ_{h_k}^{(k)} F(x) = Δ_{h₁}^{(1)} ⋯ Δ_{h_p}^{(p)} F₁(x₁) ⋯ F_p(x_p) H₀(x₁, ..., x_k),   (A.2.23)

where

H₀(x₁, ..., x_k) = Δ_{h_{p+1}}^{(p+1)} ⋯ Δ_{h_k}^{(k)} G(x₁, ..., x_k).   (A.2.24)

By (A.2.20) we obtain

Δ_h F(x) = Σ F_{i₁}(x_{i₁} − h_{i₁}) ⋯ F_{i_s}(x_{i_s} − h_{i_s}) × [ Δ_{h_{i₁}}^{(i₁)} ⋯ Δ_{h_{i_s}}^{(i_s)} Δ_{h_{p+1}}^{(p+1)} ⋯ Δ_{h_k}^{(k)} G(x′) ] × [ Δ_{h_{j₁}}^{(j₁)} F_{j₁}(x_{j₁}) ] ⋯ [ Δ_{h_{j_{p−s}}}^{(j_{p−s})} F_{j_{p−s}}(x_{j_{p−s}}) ],   (A.2.25)

where x′ = (x′₁, ..., x′_k), x′_{i₁} = x_{i₁}, ..., x′_{i_s} = x_{i_s}, x′_{p+1} = x_{p+1}, ..., x′_k = x_k, x′_{j₁} = x_{j₁} + h_{j₁}, ..., x′_{j_{p−s}} = x_{j_{p−s}} + h_{j_{p−s}}, and the summation is over all partitions of {1, 2, ..., p} into two disjoint subsets {i₁, ..., i_s}, {j₁, ..., j_{p−s}}, 0 ≤ s ≤ p. For the sake of simplicity consider the summand on the right side of (A.2.25) corresponding to i₁ = 1, ..., i_s = s. Then, by the definition (A.2.18) of G, one has

Δ_{h₁}^{(1)} ⋯ Δ_{h_s}^{(s)} Δ_{h_{p+1}}^{(p+1)} ⋯ Δ_{h_k}^{(k)} G(x′)
  = ∫_{x_{p+1} − h_{p+1}}^{x_{p+1} + h_{p+1}} ⋯ ∫_{x_k − h_k}^{x_k + h_k} Δ_{h₁}^{(1)} ⋯ Δ_{h_s}^{(s)} g(x₁, ..., x_s, x_{s+1} + h_{s+1}, ..., x_p + h_p, y_{p+1}, ..., y_k) dy_k ⋯ dy_{p+1}
  = ∫_{x₁ − h₁}^{x₁ + h₁} ⋯ ∫_{x_s − h_s}^{x_s + h_s} ∫_{x_{p+1} − h_{p+1}}^{x_{p+1} + h_{p+1}} ⋯ ∫_{x_k − h_k}^{x_k + h_k} (D₁ ⋯ D_s g)(y₁, ..., y_s, x_{s+1} + h_{s+1}, ..., x_p + h_p, y_{p+1}, ..., y_k) dy_k ⋯ dy_{p+1} dy_s ⋯ dy₁.   (A.2.26)
Let the derivative of F_i on (0, 1) be bounded above in magnitude by b_i, and let c_i denote the magnitude of the jump of F_i at 0 (c_i may be zero), 1 ≤ i ≤ p. Assume that 2h_i < 1 for all i. …

Let S_j (j = 0, 1, 2, ...) be a sequence of real-valued periodic functions on ℝ of period one, possessing the following properties:

(i) for j ≥ 0, S_{j+1} is differentiable at all nonintegral points and S′_{j+1}(x) = S_j(x) (at all nonintegral x);
(ii) S₀(x) ≡ 1, S₁ is right continuous, and S_j is continuous for j ≥ 2.   (A.4.1)

Such a sequence is uniquely determined by the above properties and plays a fundamental role in the summation formula. To see this, write S_j(0) = B_j/(j!) and observe that (i) leads to S₁(x) = x + B₁,
S₂(x) = x²/2! + B₁x + B₂/2!, ..., and generally

S_j(x) = x^j/j! + (B₁/1!) x^{j−1}/(j−1)! + ⋯ + B_j/j! = (1/j!) Σ_{r=0}^{j} (j choose r) B_r x^{j−r}   (0 < x < 1, j ≥ 1).   (A.4.2)
The constants B_j are determined by the property (A.4.1). In fact, S_j(0) = S_j(1) for j ≥ 2, which yields

1 + (j choose 1) B₁ + (j choose 2) B₂ + ⋯ + (j choose j−1) B_{j−1} = 0   (j = 2, 3, ...).   (A.4.3)

The sequence of constants B_j is recursively defined by the relation (A.4.3), thus completely determining the sequence of functions S_j in the interval 0 < x < 1. The continuity assumption determines their values at integral points. The numbers B_j defined by (A.4.3) are called Bernoulli numbers, and the polynomial

B_j(x) = Σ_{r=0}^{j} (j choose r) B_r x^{j−r}   (A.4.4)

is called the jth Bernoulli polynomial. Clearly, S_j(x) = B_j(x)/(j!) for 0 < x < 1. Since the sequence {(−1)^j S_j(−x) : j ≥ 0} has the properties (A.4.1), excepting right continuity of −S₁(−x), it follows from uniqueness that
S_j(−x) = (−1)^j S_j(x)   (for all x if j ≠ 1; for nonintegral x if j = 1).   (A.4.5)

The functions S_j are thus even or odd depending on whether j is even or odd. In particular,

B_j = j! S_j(0) = 0  for j odd, j ≥ 3.   (A.4.6)

The first few Bernoulli numbers are

B₀ = 1,  B₁ = −1/2,  B₂ = 1/6,  B₃ = 0,  B₄ = −1/30,  B₅ = 0.   (A.4.7)
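The recursion (A.4.3) is easy to run with exact rational arithmetic (a sketch of ours reproducing the values in (A.4.7)):

```python
from fractions import Fraction
from math import comb

def bernoulli(n):
    """Bernoulli numbers B_0, ..., B_n via the recursion (A.4.3):
    sum_{r=0}^{j-1} C(j, r) B_r = 0 for j >= 2, with B_0 = 1,
    solved for B_{j-1} at each step."""
    B = [Fraction(1)]
    for j in range(2, n + 2):
        s = sum(comb(j, r) * B[r] for r in range(j - 1))
        B.append(Fraction(-s, comb(j, j - 1)))
    return B[: n + 1]

B = bernoulli(6)
print([str(b) for b in B])  # ['1', '-1/2', '1/6', '0', '-1/30', '0', '1/42']
```

The odd-index values beyond B₁ vanish, in accordance with (A.4.6).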
Therefore

S₁(x) = x − 1/2,  S₂(x) = (1/2)(x² − x + 1/6),  S₃(x) = (1/6)(x³ − (3/2)x² + (1/2)x)   (0 < x < 1),   (A.4.8)

and so on. The periodic functions S_j have the following Fourier series
Euler-Maclaurin Summation Formula
expansions when x is not an integer:

S_j(x) = (−1)^{(j/2)−1} Σ_{n=1}^{∞} 2 cos(2nπx)/(2nπ)^j   (j even, j > 0),
S_j(x) = (−1)^{(j+1)/2} Σ_{n=1}^{∞} 2 sin(2nπx)/(2nπ)^j   (j odd).   (A.4.9)
This may be seen as follows. Let u_j denote the function represented by the Fourier series (j ≥ 1). It can be checked directly that u₁ is the Fourier series of S₁ and that u′_{j+1} = u_j for j ≥ 1. Thus S_j = u_j for all j ≥ 2, and S₁(x) = u₁(x) for all nonintegral x.

THEOREM A.4.1. Let f be a real-valued function on ℝ¹ having r continuous derivatives, r ≥ 1, and let

∫ |D^r f| dx < ∞. …

… let a > 0, and let r be a positive integer. Define

A_r(x) = …

… Then if h = (h₁, ..., h_k), h_i > 0 for all i,

Δ_h H = Δ_{h₁}^{(1)} ⋯ Δ_{h_k}^{(k)}(H) = Δ_{h₁}^{(1)}( Δ_{h₂}^{(2)} ⋯ Δ_{h_k}^{(k)} G ).