μ̂(t) = ∫ e^{i⟨t,x⟩} μ(dx)    (t ∈ R^k),    (5.2)

where, as usual, the integral is over the whole space R^k. If μ is a probability measure, μ̂ is also called the characteristic function of μ. Note that if μ is absolutely continuous (as a finite signed measure) with respect to Lebesgue measure, having density (i.e., the Radon–Nikodym derivative) f, then

μ̂ = f̂.    (5.3)
The convolution μ*ν of two finite signed measures is the finite signed measure defined by

(μ*ν)(B) = ∫ μ(B − x) ν(dx)    (B ∈ 𝔅^k),    (5.4)

The Fourier–Stieltjes Transform

where for A ⊂ R^k, y ∈ R^k, the translate A + y is defined by

A + y = { z = u + y : u ∈ A }.    (5.5)
It is clear that convolution is commutative and associative. One defines the n-fold convolution μ*ⁿ by

μ*¹ = μ,    μ*ⁿ = μ*⁽ⁿ⁻¹⁾ * μ    (n > 1).    (5.6)
Let μ be a finite signed measure on R^k. For any measurable map T on the space (R^k, 𝔅^k, μ) into (R^s, 𝔅^s) one defines the induced signed measure μ∘T⁻¹ by

(μ∘T⁻¹)(B) = μ(T⁻¹(B))    (B ∈ 𝔅^s).    (5.7)
THEOREM 5.1. (i) (Uniqueness Theorem). The map μ → μ̂ is one-to-one on the class of finite signed measures on R^k. (ii) For a finite signed measure μ, μ̂ is uniformly continuous, μ̂(0) = μ(R^k), and sup_t |μ̂(t)| is bounded by the total variation norm of μ. (iii) (μ*ν)^ = μ̂·ν̂.
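Theorem 5.1(iii) can be checked concretely for discrete measures; the following minimal Python sketch (the point-mass weights p and q are arbitrary illustrative choices, not from the text) verifies that the transform of a convolution is the product of the transforms:

```python
import cmath

# A discrete illustration of Theorem 5.1(iii): the Fourier-Stieltjes
# transform of a convolution is the product of the transforms.
p = [0.2, 0.5, 0.3]   # measure sum_j p[j] * delta_j  (arbitrary choice)
q = [0.6, 0.4]        # another discrete measure       (arbitrary choice)

def convolve(a, b):
    """(a*b)[n] = sum_j a[j] b[n-j]: distribution of the sum."""
    out = [0.0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for l, bl in enumerate(b):
            out[j + l] += aj * bl
    return out

def cf(weights, t):
    """mu-hat(t) = sum_j weights[j] e^{i t j}."""
    return sum(w * cmath.exp(1j * t * j) for j, w in enumerate(weights))

conv = convolve(p, q)
for t in (0.0, 0.7, 2.3):
    assert abs(cf(conv, t) - cf(p, t) * cf(q, t)) < 1e-12
assert abs(cf(conv, 0.0) - 1.0) < 1e-12   # mu-hat(0) = mu(R^k)
print("convolution transform equals product of transforms")
```

The same multiplicativity is what underlies the treatment of n-fold convolutions μ*ⁿ via μ̂ⁿ below.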
φ_{m,V}(x) = (2π)^{−k/2} (Det V)^{−1/2} exp{ −½⟨x − m, V⁻¹(x − m)⟩ }    (x ∈ R^k).    (6.31)

Of the two parameters, m ∈ R^k, and V is a symmetric positive-definite k×k matrix. The notation Det V stands for the determinant of V, and V⁻¹ is, of course, the inverse of V. It is well known that†

Φ̂_{m,V}(t) = exp{ i⟨t, m⟩ − ½⟨t, Vt⟩ }    (t ∈ R^k).    (6.32)
From this it follows that m is the mean and V is the covariance matrix of Φ_{m,V}. For the computation of the cumulants χ_ν for |ν| ≥ 2, it is convenient to take m = 0 (for |ν| ≥ 2 a change of origin does not affect the cumulants χ_ν). This yields

log Φ̂_{0,V}(t) = −½⟨t, Vt⟩,    (6.33)

†See Cramér [4], pp. 118, 119.

which shows that

χ_ν = (i,j) element of V    if ν = e_i + e_j,    (6.34)

where e_i is the vector with 1 for the ith coordinate and 0 for the others, 1 ≤ i ≤ k. Also,

χ_ν = 0    if |ν| > 2.    (6.35)
Another important property of the normal distribution is

Φ_{m₁,V₁} * Φ_{m₂,V₂} * ··· * Φ_{m_n,V_n} = Φ_{m,V},    (6.36)

where

m = m₁ + m₂ + ··· + m_n,    V = V₁ + V₂ + ··· + V_n.    (6.37)

This follows from (6.32) and Theorem 5.1(i), (iii). The normal distribution Φ_{0,I}, where I is the k×k identity matrix, is called the standard normal distribution on R^k and is denoted by Φ; the density of Φ is denoted by φ. Lastly, if X = (X₁,...,X_k) is a random vector with distribution Φ_{m,V}, then, for every a ∈ R^k, a ≠ 0, the random variable ⟨a, X⟩ has the one-dimensional normal distribution with mean ⟨a, m⟩ and variance ⟨a, Va⟩ = Σ_{i,j=1}^k a_i a_j v_{ij}, where

v_{ij} = (i,j) element of V = cov(X_i, X_j)    (i,j = 1,...,k).
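Formula (6.32) is easy to verify numerically in one dimension; a minimal sketch (the parameter values m and v are arbitrary choices) compares a quadrature of the Fourier–Stieltjes integral against exp{itm − vt²/2}:

```python
import math, cmath

# Numerically verify (6.32) in one dimension: the Fourier-Stieltjes
# transform of the N(m, v) density equals exp(i t m - v t^2 / 2).
m, v = 0.5, 2.0   # arbitrary illustrative parameters

def density(x):
    return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

def cf_numeric(t, lo=-40.0, hi=40.0, n=20000):
    # Trapezoidal quadrature of the integral of e^{itx} phi_{m,v}(x) dx.
    h = (hi - lo) / n
    total = 0.0 + 0.0j
    for j in range(n + 1):
        x = lo + j * h
        w = 0.5 if j in (0, n) else 1.0
        total += w * cmath.exp(1j * t * x) * density(x)
    return total * h

for t in (0.0, 0.5, 1.5):
    exact = cmath.exp(1j * t * m - v * t * t / 2)
    assert abs(cf_numeric(t) - exact) < 1e-6
print("normal characteristic function verified")
```

The Gaussian decay makes the truncated trapezoidal rule accurate far beyond the tolerance used here.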
7. THE POLYNOMIALS P̃_s AND THE SIGNED MEASURES P_s

Throughout this section ν = (ν₁,...,ν_k) is a nonnegative integral vector in R^k. Consider the polynomials

χ̃_s(z) = s! Σ_{|ν|=s} (χ_ν / ν!) z^ν    (z^ν = z₁^{ν₁} ··· z_k^{ν_k}; s = 1, 2, ...)    (7.1)

in k variables z₁, z₂,...,z_k (real or complex) for a given set of real constants χ_ν. We define the formal polynomials P̃_s(z: {χ_ν}) in z₁,...,z_k by means of the following identity between two formal power series (in the real variable u):

1 + Σ_{s=1}^∞ P̃_s(z: {χ_ν}) u^s = exp{ Σ_{s=1}^∞ χ̃_{s+2}(z) u^s / (s+2)! }

= 1 + Σ_{m=1}^∞ (1/m!) [ Σ_{s=1}^∞ χ̃_{s+2}(z) u^s / (s+2)! ]^m.    (7.2)
Expansions of Characteristic Functions
In other words, P̃_s(z: {χ_ν}) is the coefficient of u^s in the series on the extreme right. Thus

P̃_s(z: {χ_ν}) = Σ_{m=1}^s (1/m!) Σ* { χ̃_{j₁+2}(z) χ̃_{j₂+2}(z) ··· χ̃_{j_m+2}(z) / [(j₁+2)! (j₂+2)! ··· (j_m+2)!] }    (s = 1, 2, ...),    (7.3)

where the summation Σ* is over all m-tuples of positive integers (j₁,...,j_m) satisfying

j_i = 1, 2, ..., s (1 ≤ i ≤ m),    Σ_{i=1}^m j_i = s,    (7.4)

and Σ** denotes summation over all m-tuples of nonnegative integral vectors (ν₁,...,ν_m) satisfying

|ν_i| = j_i + 2    (1 ≤ i ≤ m).    (7.5)

In particular,

P̃₁(z: {χ_ν}) = χ̃₃(z)/3! = Σ_{|ν|=3} (χ_ν/ν!) z^ν,    (7.6)

P̃₂(z: {χ_ν}) = χ̃₄(z)/4! + χ̃₃(z)²/(2!(3!)²).    (7.7)

For a probability measure G on R^k with zero mean, covariance matrix V, finite sth absolute moment (s ≥ 3), and cumulants χ_ν, one has, for each fixed t ∈ R^k,

Ĝⁿ(t/n^{1/2}) = exp{ −½⟨t, Vt⟩ } [ 1 + Σ_{r=1}^{s−2} n^{−r/2} P̃_r(it: {χ_ν}) ] × (1 + o(n^{−(s−2)/2}))    (n → ∞),    (7.8)
where, in the evaluation of P̃_r(it: {χ_ν}), one uses the cumulants χ_ν of G. Thus, for each t ∈ R^k, one has an asymptotic expansion of Ĝⁿ(t/n^{1/2}) in powers of n^{−1/2}, in the sense that the remainder is of smaller order of magnitude than the last term in the expansion. The first term in the asymptotic expansion is the characteristic function of the normal distribution Φ_{0,V}. The function (of t) that appears as the coefficient of n^{−r/2} in the asymptotic expansion is the Fourier transform of a function that we denote by P_r(−φ_{0,V}: {χ_ν}). The reason for such a notation is supplied by the following lemma.

LEMMA 7.2. The function

t → P̃_r(it: {χ_ν}) exp{ −½⟨t, Vt⟩ }    (t ∈ R^k)    (7.9)

is the Fourier transform of the function P_r(−φ_{0,V}: {χ_ν}) obtained by formally substituting

(−1)^{|ν|} D^ν φ_{0,V}    for    (it)^ν    (7.10)

for each ν in the polynomial P̃_r(it: {χ_ν}). Here φ_{0,V} is the normal density in R^k with mean zero and covariance matrix V. Thus one has the formal identity

P_r(−φ_{0,V}: {χ_ν}) = P̃_r(−D: {χ_ν}) φ_{0,V},    (7.11)

where −D = (−D₁,...,−D_k).

Proof. The Fourier transform of φ_{0,V} is given by [see (6.32)]

φ̂_{0,V}(t) = exp{ −½⟨t, Vt⟩ }    (t ∈ R^k).    (7.12)

Also

((−1)^{|ν|} D^ν φ_{0,V})^(t) = (it)^ν φ̂_{0,V}(t)    (t ∈ R^k),    (7.13)

which is obtained by taking the νth derivatives with respect to x on both sides of (the Fourier inversion formula)

φ_{0,V}(x) = (2π)^{−k} ∫ exp{ −i⟨t, x⟩ } φ̂_{0,V}(t) dt    (x ∈ R^k)    (7.14)
[or, by Theorem 4.1(iv), (v)]. Q.E.D.

We define P_r(−Φ_{0,V}: {χ_ν}) as the finite signed measure on R^k whose density is P_r(−φ_{0,V}: {χ_ν}). For any given finite signed measure μ on R^k, we define μ(·), the distribution function of μ, by

μ(x) = μ((−∞, x])    (x ∈ R^k),    (7.15)

where

(−∞, x] = (−∞, x₁] × ··· × (−∞, x_k]    [x = (x₁,...,x_k) ∈ R^k].    (7.16)

Note that

D₁ ··· D_k P_r(−Φ_{0,V}: {χ_ν})(x) = P_r(−φ_{0,V}: {χ_ν})(x) = P̃_r(−D: {χ_ν}) φ_{0,V}(x)
= P̃_r(−D: {χ_ν})(D₁ ··· D_k Φ_{0,V})(x) = D₁ ··· D_k (P̃_r(−D: {χ_ν}) Φ_{0,V})(x).    (7.17)

Thus the distribution function of P_r(−Φ_{0,V}: {χ_ν}) is obtained by applying the operator P̃_r(−D: {χ_ν}) to the normal distribution function Φ_{0,V}(·). The last equality in (7.17) follows from the fact that the differential operators P̃_r(−D: {χ_ν}) and D₁D₂···D_k commute.

Let us write down P₁(−φ_{0,V}: {χ_ν}) explicitly. By (7.6),

P̃₁(it: {χ_ν}) = Σ_{|ν|=3} (χ_ν / ν!) (it)^ν    (t ∈ R^k),    (7.18)
so that (by Lemma 7.2)

P₁(−φ_{0,V}: {χ_ν})(x) = − Σ_{|ν|=3} (χ_ν / ν!) (D^ν φ_{0,V})(x)    [x = (x₁,...,x_k) ∈ R^k, V⁻¹ = ((v^{ij}))].    (7.19)

Here each derivative D^ν φ_{0,V}(x), |ν| = 3, is a polynomial of degree three in the linear forms Σ_{j=1}^k v^{ij} x_j (1 ≤ i ≤ k) multiplied by φ_{0,V}(x), so that (7.19) may be written out term by term in the coordinates of x.

If one takes V = I, then (7.19) reduces to

P₁(−φ: {χ_ν})(x) = −{ (1/6)[χ_{(3,0,...,0)}(−x₁³ + 3x₁) + ··· + χ_{(0,...,0,3)}(−x_k³ + 3x_k)]
+ (1/2)[χ_{(2,1,0,...,0)}(−x₁²x₂ + x₂) + ··· + χ_{(0,...,0,1,2)}(−x_k²x_{k−1} + x_{k−1})]
+ [χ_{(1,1,1,0,...,0)}(−x₁x₂x₃) + ··· + χ_{(0,...,0,1,1,1)}(−x_k x_{k−1} x_{k−2})] } φ(x)    (x ∈ R^k),    (7.20)
where φ is the standard normal density in R^k. If k = 1, by letting χ_j be the jth cumulant of a probability measure G on R¹ (j = 1, 2, 3) having zero mean, one gets [using (6.13)]

P₁(−φ: {χ_ν})(x) = (μ₃/6)(x³ − 3x) φ(x)    (x ∈ R¹),    (7.21)

where μ₃ is the third moment of G. Finally, note that whatever the numbers {χ_ν} and the positive-definite symmetric matrix V are,

∫ P_s(−φ_{0,V}: {χ_ν})(x) dx = P_s(−Φ_{0,V}: {χ_ν})(R^k) = 0,    (7.22)
for all s ≥ 1. This follows from

∫ (D^ν φ_{0,V})(x) dx = 0    for |ν| ≥ 1.    (7.23)

The relation (7.23) is a consequence of the fact that φ_{0,V} and all its derivatives are integrable and vanish at infinity.
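Relations (7.21)–(7.23) can be illustrated in one dimension: the expansion term (μ₃/6)(x³ − 3x)φ(x) carries zero total mass, in line with (7.22). A minimal sketch (the value of μ₃ is an arbitrary choice):

```python
import math

# Check (7.22) in one dimension: the signed density
# p1(x) = (mu3/6)(x^3 - 3x) phi(x) integrates to zero.
mu3 = 1.7  # an arbitrary third cumulant, chosen for illustration

def phi(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def p1(x):
    return (mu3 / 6.0) * (x ** 3 - 3 * x) * phi(x)

def integrate(f, lo=-12.0, hi=12.0, n=24000):
    # composite trapezoidal rule
    h = (hi - lo) / n
    s = 0.5 * (f(lo) + f(hi))
    for j in range(1, n):
        s += f(lo + j * h)
    return s * h

assert abs(integrate(p1)) < 1e-10          # total mass 0, as in (7.22)
assert abs(integrate(phi) - 1.0) < 1e-10   # phi itself is a density
print("expansion term has zero total mass")
```

Here the vanishing integral also follows directly from the oddness of (x³ − 3x)φ(x).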
8. APPROXIMATION OF CHARACTERISTIC FUNCTIONS OF NORMALIZED SUMS OF INDEPENDENT RANDOM VECTORS

Let X₁,...,X_n be n independent random vectors in R^k, each with zero mean and a finite third (or fourth) absolute moment. In this section we investigate the rate of convergence of the characteristic function of n^{−1/2}(X₁ + ··· + X_n) to Φ̂_{0,V}, where V is the average of the covariance matrices of X₁,...,X_n. The following form of Taylor's expansion will be useful to us.

LEMMA 8.1.† Let f be a complex-valued function defined on an open interval J of the real line, having continuous derivatives f^{(r)} of orders r = 1,...,s. If x, x + h ∈ J, then

f(x + h) = f(x) + Σ_{r=1}^{s−1} (h^r/r!) f^{(r)}(x) + (h^s/(s−1)!) ∫₀¹ (1 − v)^{s−1} f^{(s)}(x + vh) dv.    (8.1)

COROLLARY 8.2. For all real numbers u and positive integers s,

| exp{iu} − 1 − iu − ··· − (iu)^{s−1}/(s−1)! | ≤ |u|^s / s!.    (8.2)

Consequently, if G is a probability measure on R^k having a finite sth absolute moment ρ_s for some positive integer s, then

| Ĝ(t) − 1 − iμ₁(t) − ··· − (i^{s−1}/(s−1)!) μ_{s−1}(t) | ≤ ρ_s(t)/s! ≤ ||t||^s ρ_s / s!    (t ∈ R^k),    (8.3)

†Hardy [1], p. 327.

where for r = 1,...,s,

μ_r(t) = ∫ ⟨t, x⟩^r G(dx),    ρ_s(t) = ∫ |⟨t, x⟩|^s G(dx).    (8.4)

Proof. The inequality (8.2) follows immediately from Lemma 8.1 on taking f(u) = exp{iu} (u ∈ R¹) and x = 0, h = u. Inequality (8.3) is obtained on replacing u by ⟨t, x⟩ in (8.2) and integrating with respect to G(dx). Note that
ρ_s(t) = ∫ |⟨t, x⟩|^s G(dx) ≤ ||t||^s ρ_s.    Q.E.D.

COROLLARY 8.3. Let f be a complex-valued function defined on an open subset Ω of R^k, having continuous derivatives D^ν f for |ν| ≤ s (on Ω). If the closed line segment joining x and x + h (∈ R^k) lies in Ω, then

f(x + h) = f(x) + Σ_{1 ≤ |ν| ≤ s−1} (h^ν/ν!)(D^ν f)(x) + s Σ_{|ν|=s} (h^ν/ν!) ∫₀¹ (1 − v)^{s−1} (D^ν f)(x + vh) dv.

In this notation it is clear from (8.10) that

l_{s,n} = n^{−(s−2)/2} sup_{||t||=1} ρ_s(t) / ⟨t, Vt⟩^{s/2} ≤ ρ_s λ^{−s/2} n^{−(s−2)/2},    (8.12)
where λ is the smallest eigenvalue of V. In one dimension (i.e., k = 1)

l_{s,n} = ρ_s V^{−s/2} n^{−(s−2)/2}    (s > 2).    (8.13)
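Inequality (8.2) above is a routine consequence of Lemma 8.1 and can be spot-checked numerically; a minimal sketch:

```python
import cmath, math

# Numerical check of inequality (8.2): the remainder of the degree-(s-1)
# Taylor polynomial of exp(iu) is bounded by |u|^s / s!.
def taylor_remainder(u, s):
    partial = sum((1j * u) ** r / math.factorial(r) for r in range(s))
    return abs(cmath.exp(1j * u) - partial)

for s in (1, 2, 3, 5):
    for u in (-3.0, -0.4, 0.1, 2.0, 7.5):
        assert taylor_remainder(u, s) <= abs(u) ** s / math.factorial(s) + 1e-12
print("inequality (8.2) holds on the sampled points")
```

The same bound, applied with u = ⟨t, x⟩ and integrated against G(dx), gives (8.3).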
We also use the following simple inequality in the proofs of some of the theorems below. If V = I, then

ρ_s ≥ ρ₂^{s/2} = k^{s/2}    (s ≥ 2).    (8.14)
In the rest of this section we use the notation (8.9)–(8.11), often without further mention.

THEOREM 8.4. Let X₁,...,X_n be n independent random vectors (with values) in R^k having distributions G₁,...,G_n, respectively. Assume that each X_j has zero mean and a finite third absolute moment. Assume also that the average covariance matrix V is nonsingular. Let B be the symmetric positive-definite matrix satisfying

B² = V⁻¹.    (8.15)

Define constants a(d) and b_n(d), depending only on d and l_{3,n}.    (8.16)

Then for every d ∈ (0, 2^{1/2}) and for all t satisfying

||t|| ≤ d ... ,    (8.17)

one has the inequality

| Π_{j=1}^n Ĝ_j(Bt/n^{1/2}) − exp{ −½||t||² } | ≤ ... .
Let s ≥ 3. Then there exists a constant c₂(s) depending only on s such that

| (D^ν log Ĝ)(t) | ≤ c₂(s) ρ_s    (9.3)

for all t in R^k satisfying

| Ĝ(t) − 1 | ≤ ½.
χ̄_ν = n⁻¹ Σ_{j=1}^n (νth cumulant of X_j)    [ν ∈ (Z⁺)^k].    (9.6)
The constants c's that appear below depend only on their arguments.

LEMMA 9.5. For every nonnegative integral vector α satisfying 0 ≤ |α| ≤ s, ... . If α ≥ β, then

D^{α−β}(exp{h(t)} − 1) = D^{α−β} exp{h(t)},    (9.49)

which is a linear combination of terms of the form

(D^{β₁}h(t))^{j₁} ··· (D^{βᵣ}h(t))^{jᵣ} exp{h(t)},

where Σᵢ jᵢβᵢ = α − β. By (9.46) [and (9.13)],
| (D^{β₁}h(t))^{j₁} ··· (D^{βᵣ}h(t))^{jᵣ} | ≤ c₁₆(s,k) ( ρ_s n^{−(s−2)/2} )^{Σjᵢ} ||t||^{s−2+2Σjᵢ−|α−β|}
≤ c₁₇(s,k) ρ_s n^{−(s−2)/2} ( ||t||^{s−|α−β|} + ||t||^{s+|α−β|−2} ).    (9.50)

Hence if α ≥ β, then

| D^{α−β}(exp{h(t)} − 1) | ≤ c₁₈(s,k) ρ_s n^{−(s−2)/2} ( ||t||^{s−|α−β|} + ||t||^{s+|α−β|+2} ) exp{ −||t||²/8 }.    (9.51)
Using (9.44), (9.48), and (9.51) in (9.40), one obtains

| D^α ( Π_{j=1}^n Ĝ_j(t/n^{1/2}) − exp{ −½||t||² } [ 1 + Σ_{r=1}^{s−3} n^{−r/2} P̃_r(it: {χ̄_ν}) ] ) |
≤ c₁₉(s,k) ρ_s n^{−(s−2)/2} ( ||t||^{s−|α|} + ||t||^{3(s−2)+|α|} ) exp{ −||t||²/4 }.    (9.52)
Now use Lemma 9.8 to complete the proof when V = B = I. If V ≠ I, look at the random vectors BX₁,...,BX_n and observe that

Π_{j=1}^n Ĝ_j(Bt/n^{1/2})

is the characteristic function of Z_n = n^{−1/2} Y_n, where Y_n = B(X₁ + ··· + X_n). Also, if the (r+2)th cumulant of the random variable ⟨t, X_j⟩ is denoted by χ_{r+2,j}(t), then the corresponding cumulant of ⟨t, BX_j⟩ is χ_{r+2,j}(Bt). Q.E.D.

The following theorems are easy consequences of Theorem 9.9.

THEOREM 9.10. Let G be a probability measure on R^k with zero mean, positive-definite covariance matrix V, and finite sth absolute moment for some integer s not smaller than 3. Then there exist two positive constants c₂₀(s,k), c₂₁(s,k) such that for all t in R^k satisfying
||t|| ≤ c₂₀(s,k) n^{1/2} / η_s^{1/(s−2)},

one has, for all nonnegative integral vectors α, 0 ≤ |α| ≤ s,

| D^α ( Ĝⁿ(t/n^{1/2}) − exp{ −½||t||² } Σ_{r=0}^{s−3} n^{−r/2} P̃_r(it: {χ_ν}) ) |
≤ c₂₁(s,k) η_s n^{−(s−2)/2} [ ||t||^{s−|α|} + ||t||^{3(s−2)+|α|} ] exp{ −||t||²/4 }.
Here B is the symmetric positive-definite matrix satisfying (9.7),

η_s = ∫ ||Bx||^s G(dx),

and χ_ν is the νth cumulant of G.

Proof. Note that if t satisfies ||t|| ≤ n^{1/2} / η_s^{1/(s−2)}, then

⟨Bt, VBt⟩ = ||t||²,    | Ĝ(Bt/n^{1/2}) − 1 | ≤ ||t||²/2n ≤ ½,

because of the relation ∫ ||Bx||² G(dx) = k. A bound of the same form, with a constant c₂₃(s,k), holds with Bt in place of t:

| D^α ( Ĝⁿ(Bt/n^{1/2}) − exp{ −½||t||² } Σ_{r=0}^{s−3} n^{−r/2} P̃_r(iBt: {χ_ν}) ) |
≤ c₂₃(s,k) η_s n^{−(s−2)/2} [ ||t||^{s−|α|} + ||t||^{3(s−2)+|α|} ] exp{ −||t||²/4 },
where the notation is as in Theorem 9.9.

Proof. As in the proof of Theorem 9.9 (see the concluding observations), it is enough to prove this theorem for B = V = I. In this case, for all t satisfying

||t|| ≤ n^{1/2} / η_s^{1/(s−2)},

one has

||t||² E( ||X₁||² ; ··· ) / 2n ≤ ||t||² ( E||X₁||^s )^{2/s} / 2n ≤ ... .    (10.1)
The measure U_{[−a,a]} is called the uniform distribution on [−a, a]. One has

Û_{[−a,a]}(t) = (1/2a) ∫_{−a}^{a} cos tx dx = sin at / at    (t ∈ R¹).    (10.2)

The probability measure

T_a = U_{[−a/2,a/2]} * U_{[−a/2,a/2]}    (10.3)

is called the triangular distribution on [−a, a]. It is easy to show that its density is

t_a(x) = (1/a)(1 − |x|/a)    for |x| ≤ a,
       = 0    for |x| > a,    (10.4)

and that

T̂_a(t) = [ sin(at/2) / (at/2) ]²    (t ∈ R¹).    (10.5)
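The characteristic functions (10.2) and (10.5) can be verified by direct quadrature; a minimal sketch for the triangular distribution (the value a = 1.5 is an arbitrary choice):

```python
import cmath, math

# Numerical check that the triangular density (1/a)(1 - |x|/a) on [-a, a]
# has characteristic function [sin(at/2) / (at/2)]^2, i.e. the square of
# the uniform characteristic function on [-a/2, a/2].
a = 1.5  # arbitrary illustrative value

def tri_density(x):
    return (1.0 / a) * (1.0 - abs(x) / a) if abs(x) <= a else 0.0

def cf_numeric(t, n=40000):
    # trapezoidal quadrature over the support [-a, a]
    h = 2 * a / n
    total = 0.0 + 0.0j
    for j in range(n + 1):
        x = -a + j * h
        w = 0.5 if j in (0, n) else 1.0
        total += w * cmath.exp(1j * t * x) * tri_density(x)
    return total * h

for t in (0.3, 1.0, 4.0):
    u = a * t / 2.0
    exact = (math.sin(u) / u) ** 2
    assert abs(cf_numeric(t) - exact) < 1e-6
print("triangular characteristic function verified")
```

The squaring reflects (10.3): convolving two uniforms multiplies their transforms, in line with Theorem 5.1(iii).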
One can write

c(m) = ( ∫_{R¹} | sin x / x |^m dx )⁻¹    (m = 2, ...).    (10.6)

For a > 0 and integer m ≥ 2 let G_{a,m} denote the probability measure on R¹ with density

g_{a,m}(x) = a c(m) | sin ax / ax |^m    (x ∈ R¹).    (10.7)
It follows from (10.2) that for even integers m ≥ 2

( sin at / at )^m = [ Û_{[−a,a]}(t) ]^m    (t ∈ R¹),    (10.8)

so that by the Fourier inversion theorem [Theorem 4.1(iv)]

Ĝ_{a,m}(t) = 2πa c(m) u*ᵐ_{[−a,a]}(t)    (t ∈ R¹),
          = 0    if |t| > ma,    (10.9)

since the m-fold convolution u*ᵐ_{[−a,a]} of the uniform density on [−a, a] vanishes outside [−ma, ma].
A Class of Kernels
Let Z₁,...,Z_k be independent random variables each with distribution G_{1/2,2}, and let Z = (Z₁,...,Z_k). Then Prob(||Z|| > d) → 0 as d → ∞.    (10.10)

Thus for any α in (0, 1), there exists a constant d depending only on k such that

Prob(||Z|| > d) ≤ 1 − α.    (10.11)

Note that the characteristic function of Z vanishes outside [−1, 1]^k. Now let K₁ denote the distribution of Z/d; then

K₁({x : ||x|| > 1}) ≤ 1 − α,    K̂₁(t) = 0    if t ∉ [−d, d]^k.    (10.12)

One thus has

THEOREM 10.1. Let r be any given positive integer and let α ∈ (0, 1). There exists a probability measure K₁ on R^k such that
(i) K₁({x : ||x|| > 1}) ≤ 1 − α, ... .

Suppose that {X_n : n ≥ 1} is a sequence of independent and identically distributed (i.i.d.) random vectors with common distribution Q₁, and that Q₁ has mean zero, covariance I (the identity matrix), and a finite third absolute moment ρ₃. Let Q_n denote the distribution of n^{−1/2}(X₁ + ··· + X_n). If Q₁ has an integrable characteristic function (c.f.) Q̂₁, then the c.f. Q̂_n of Q_n is integrable for all n. One can then use Fourier inversion to estimate the density h_n of the signed measure Q_n − Φ as

||h_n||_∞ = sup_y |h_n(y)| ≤ (2π)^{−k} ∫ | Q̂_n(t) − Φ̂(t) | dt.

In the general case one truncates the random vectors (so that moments of all orders are finite) and applies the above procedure to these truncated vectors. The various lemmas in Section 14 allow one to take care of the perturbation due to truncation. As in the case r₀ = 0, for the final accounting (i.e., to estimate the effect of smoothing by K_ε) one uses the smoothing inequalities of Section 11. The main theorems of Section 15 are obtained in this manner. A further truncation enables one to obtain corresponding analogs when only the finiteness of absolute moments of order 2 + δ is assumed for some δ, 0 < δ < 1, thus yielding generalizations and refinements of the classical one-dimensional theorems of Liapounov and Lindeberg.
11. SMOOTHING INEQUALITIES
Lemmas 11.1 and 11.4 show how the difference μ − ν between a finite measure μ and a finite signed measure ν is perturbed by convolution with a probability measure K_ε that concentrates (for small ε) most of its mass near zero. Let f be a real-valued, Borel-measurable function on R^k. Recall that in Chapter 1 we defined the following:

ω_f(A) = sup{ |f(x) − f(y)| : x, y ∈ A }    (A ⊂ R^k),
ω_f(x: ε) = ω_f(B(x: ε))    (x ∈ R^k, ε > 0).    (11.1)

Also define

M_f(x: ε) = sup{ f(y) : y ∈ B(x: ε) },    m_f(x: ε) = inf{ f(y) : y ∈ B(x: ε) }    (x ∈ R^k, ε > 0).    (11.2)

Note that

ω_f(x: ε) = M_f(x: ε) − m_f(x: ε)    (x ∈ R^k, ε > 0).    (11.3)
The functions M_f(·: ε), m_f(·: ε) are lower and upper semicontinuous, respectively, for every real-valued function f that is bounded on each compact subset of R^k. Also, ω_f(·: ε) is lower semicontinuous. These follow from

{x : M_f(x: ε) > c} = ∪ { B(x: ε) : f(x) > c }    (c ∈ R¹),

m_f(x: ε) = −M_{−f}(x: ε)    (x ∈ R^k, ε > 0).    (11.4)
In particular, it follows that M_f(·: ε), m_f(·: ε), ω_f(·: ε) are Borel-measurable for every real-valued function f on R^k that is bounded on compacts. Recall that the translate f_y of f by y (∈ R^k) is defined by

f_y(x) = f(x + y)    (x ∈ R^k).    (11.5)
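The quantities (11.1)–(11.3) can be made concrete for a simple discontinuous function; the following sketch computes M_f, m_f, and ω_f on a grid for f(x) = sign(x) (the grid and ε are arbitrary choices):

```python
# Illustration of (11.1)-(11.3): for f(x) = sign(x) on R^1, the ball
# B(x: eps) = (x - eps, x + eps) gives oscillation w_f(x: eps) = 2
# exactly when |x| < eps (the ball straddles the jump), else 0.
def f(x):
    return 1.0 if x >= 0 else -1.0

def M_f(x, eps, grid):
    return max(f(y) for y in grid if abs(y - x) < eps)

def m_f(x, eps, grid):
    return min(f(y) for y in grid if abs(y - x) < eps)

eps = 0.25
grid = [j / 100.0 for j in range(-300, 301)]   # grid on [-3, 3]
for x in (-2.0, -0.1, 0.0, 0.1, 2.0):
    osc = M_f(x, eps, grid) - m_f(x, eps, grid)   # w_f(x: eps), by (11.3)
    assert osc == (2.0 if abs(x) < eps else 0.0)
print("oscillation of sign(x) is 2 only within eps of the jump")
```

This localization of ω_f near discontinuities is what the smoothing inequalities below exploit.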
LEMMA 11.1. Let μ be a finite measure and ν a finite signed measure on R^k. Let ε be a positive number and K_ε a probability measure on R^k satisfying

K_ε(B(0: ε)) = 1.    (11.6)

Then for every real-valued, Borel-measurable function f on R^k that is bounded on compacts,

∫_{B(0:ε)} [ ∫ M_f(y + x: ε) μ(dy) − ∫ f(y) ν(dy) − ∫ (M_f(y + x: ε) − f(y)) ν(dy) ] K_ε(dx)
≥ ∫_{B(0:ε)} [ ∫ f(y) μ(dy) − ∫ f(y) ν(dy) − ∫ (M_f(y + x: ε) − f(y)) ν(dy) ] K_ε(dx)
≥ ∫ f d(μ − ν) − ∫ (M_f(·: 2ε) − f) dν,    (11.9)

the first inequality because M_f(y + x: ε) ≥ f(y) for ||x|| < ε, and the second because M_f(y + x: ε) ≤ M_f(y: 2ε) for ||x|| < ε.

Normal Approximation
Similarly,

∫ f d(μ − ν) ≥ −γ(f: ε).    (11.16)

Then for each real-valued, Borel-measurable, bounded function f on R^k, one has

| ∫ f d(μ − ν) | ≤ γ(f: ε).

Given η > 0, there exists x₀ such that F(x₀) − G(x₀) > δ − η.
Then

(P − Q)*K_ε((−∞, x₀ + ε]) = ∫_{B(0:ε)} [F(x₀ + ε − y) − G(x₀ + ε − y)] K_ε(dy)
    + ∫_{R\B(0:ε)} [F(x₀ + ε − y) − G(x₀ + ε − y)] K_ε(dy)
≥ ∫_{B(0:ε)} [F(x₀) − G(x₀) − (ε − y)m] K_ε(dy) − δ(1 − α)
≥ (δ − η)α − αmε − δ(1 − α) = δ(2α − 1) − αmε − αη,

so that, letting η ↓ 0,

(2α − 1)δ ≤ sup_{x∈R¹} (P − Q)*K_ε((−∞, x]) + αmε.    Q.E.D.

LEMMA 12.2. Let P be a finite measure and Q a finite signed measure on R¹ with distribution functions F and G, respectively. Assume that
∫ |x| P(dx) < ∞, ... .

Let P now be a probability measure on R¹ with mean zero and variance one, and let F denote its distribution function. Then

F(−x) ≤ (1 + x²)⁻¹,    1 − F(x) ≤ (1 + x²)⁻¹    (x ≥ 0).    (12.13)

Fix x > 0. For every b > 0, one has

1 + b² = ∫ (y − b)² P(dy) ≥ ∫_{(−∞,−x]} (y − b)² P(dy) ≥ (x + b)² F(−x),

so that g(b) ≡ (1 + b²)(x + b)⁻² ≥ F(−x). The minimum of g in [0, ∞) occurs at b = x⁻¹, and

g(x⁻¹) = (1 + x²)⁻¹,

which gives the first inequality in (12.13) [note that for x = 0, (12.13) is trivial]. The second is obtained similarly. This gives

F(−x) − Φ(−x) ≤ (1 + x²)⁻¹ − Φ(−x) ≡ h(x),

say, for x ≥ 0. The supremum of h in [0, ∞) is attained at a point x₀ satisfying h′(x₀) = 0, or

2x₀(1 + x₀²)⁻² = (2π)^{−1/2} e^{−x₀²/2}.

A numerical computation yields x₀ ≈ 0.2135 and h(x₀) ≈ 0.5416, thus proving (12.14):
|F(x) − Φ(x)| ≤ 0.5416    (x ∈ R¹).    (12.14)

The inequality for x ≥ 0 follows similarly (or, by looking at P). Q.E.D.

THEOREM 12.4 (Berry–Esseen Theorem). Let X₁,...,X_n be n independent random variables each with zero mean and a finite absolute third moment. If

σ̄² ≡ n⁻¹ Σ_{j=1}^n EX_j² > 0,

then

sup_x |F_n(x) − Φ(x)| ≤ c ρ̄₃ σ̄⁻³ n^{−1/2},

where F_n is the distribution function of n^{−1/2}σ̄⁻¹(X₁ + ··· + X_n), ρ̄₃ = n⁻¹ Σ_{j=1}^n E|X_j|³, and c is an absolute constant. This is justified in view of (8.12).
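Theorem 12.4 can be illustrated exactly for Rademacher summands X_i = ±1 (so σ̄ = ρ̄₃ = 1), where F_n is computable from the binomial distribution; a minimal sketch (n = 100 and n = 400 are arbitrary choices, and no particular value of the constant c is asserted):

```python
from math import comb, erf, sqrt

# Exact illustration of the Berry-Esseen theorem for Rademacher
# variables X_i = +-1 with probability 1/2 (sigma = 1, rho_3 = 1).
# F_n is a step function; its largest deviation from Phi is at a jump.
def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def sup_deviation(n):
    # S_n = 2*B - n with B ~ Binomial(n, 1/2); F_n(x) = P(S_n <= x sqrt(n)).
    acc, dev = 0.0, 0.0
    for b in range(n + 1):
        x = (2 * b - n) / sqrt(n)
        dev = max(dev, abs(acc - Phi(x)))     # left limit at the jump
        acc += comb(n, b) / 2.0 ** n
        dev = max(dev, abs(acc - Phi(x)))     # value at the jump
    return dev

d100, d400 = sup_deviation(100), sup_deviation(400)
assert d400 < d100               # the deviation shrinks with n
assert d100 < 1.0 / sqrt(100)    # well below rho_3 / (sigma^3 n^{1/2})
print(round(d100, 4), round(d400, 4))
```

For this lattice distribution the deviation is of exact order n^{−1/2}, which shows that the rate in Theorem 12.4 cannot be improved in general.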
THEOREM 13.2. Let X₁,...,X_n be n independent and identically distributed random vectors with values in R^k satisfying

EX₁ = 0,    Cov(X₁) = I,    ρ₄ = E||X₁||⁴ < ∞,

where X_j = (X_{j,1},...,X_{j,k}), 1 ≤ j ≤ n. Then

Prob( ··· ) ≤ ... .    (13.28)
Define truncated random vectors

Y_j = X_j    if ||X_j|| ≤ n^{1/2},
    = 0      if ||X_j|| > n^{1/2},    (14.2)

Z_j = Y_j − EY_j    (1 ≤ j ≤ n),    (14.3)

and

Δ_n = Σ_{j=1}^n Δ_{n,j,s}.    (14.4)

Finally, write

Cov(X_j) = ((v_{il})),    V = n⁻¹ Σ_{j=1}^n Cov(X_j),    D = n⁻¹ Σ_{j=1}^n Cov(Z_j) = ((d_{il})).    (14.5)
The constants c's depend only on their arguments.
LEMMA 14.1. Let ρ_s < ∞ for some s ≥ 2. (i) One has E||Y₁||^s ≤ ρ_s, ... .

Let ν₁,...,ν_m be nonnegative integral vectors satisfying

Σ_{i=1}^m |ν_i| ≥ 3,    (14.111)

which proves (14.107). Now assume that s is an integer, s > 2. Since ||x||^r ≤ 1 + ||x||^s for 0 ≤ r ≤ s, it is enough to prove (14.109) for r = 0 and r = s. The case r = 0 is precisely (14.107). We therefore need only prove (14.109) for r = s. One has
∫ ||x||^s | Q_n − Q_n' |(dx) ≤ Σ_{i=1}^n ∫ ||x||^s | G₁* ··· *G_{i−1}*(G_i − G_i')*G_{i+1}* ··· *G_n |(dx)
≤ Σ_{i=1}^n ∫ ( ∫ ||u + v||^s | G_i − G_i' |(dv) ) (G₁* ··· *G_{i−1}*G_{i+1}* ··· *G_n)(du).

Also, by the truncation,

∫_{{|Y_i| > ···}} |Y_i|^s ≤ ∫_{{|X_i| > 2n^{1/2}/3}} |X_i|^s ... .    (14.118)
Therefore, choosing c, a(s,k)=(,1) , / 2 (l6k) - ', one has n -'
"
IW;I s < T2^S/22a n.^(s) < f {IW , I>"' 12 )
n (r — 2)/2
(14.119)
8
so that Lemma 14.7 may be applied to yield EIX,+••• +X^+Z,+••• +Z,I
$
= TZ j 2E I W, +... + W IS .
EIW J +
E|W₁ + ··· + W_n|^s ≤ ... .

For r > 0 and x ∈ R^k set

M_r(f) = sup_{x∈R^k} |f(x)| (1 + ||x||^r)⁻¹    (r > 0),    M₀(f) = sup_{x,y∈R^k} |f(x) − f(y)| = ω_f(R^k).    (15.4)

If ν is a finite (signed) measure on R^k, define a new (signed) measure ν_r by

ν_r(dx) = (1 + ||x||^r) ν(dx)    if r > 0,    ν₀ = ν.    (15.5)

As in the preceding section, write

λ_{n,s}(ε) = n⁻¹ Σ_{j=1}^n E( ||X_j||^s ; ||X_j|| > ε n^{1/2} ),    λ_{n,s} = λ_{n,s}(1).    (15.6)

Then define
Then define n
inf A* s =O 3, then for every real-valued Bore/-measurable function f satisfying Mr(f)c 9 /e=n 1 / 2 /(16p 3 ), and (
( DPR0),
,lmmn -I
(15.39)
i-1
B being the symmetric positive-definite matrix satisfying B² = D⁻¹. By Lemma 14.1(v) (with s′ = m) and inequality (14.21) (Corollary 14.2), one has
IIB(I 0 ,,.3{i)> c1n 1/2,
then
f fd(Q,
-4i )I
{X_n : n ≥ 1} is a sequence of independent and identically distributed random vectors with zero means, covariance matrices I, and finite third absolute moments ρ₃. The reason for this is that the bound is in terms of ω̄_f [and ω_f(R^k)] rather than ω_f. Recall [see Lemma 1.2(iii), Theorem 1.3(iii)] that ∫ f dQ_n → ∫ f dΦ for all bounded Φ-continuous f, and that a bounded f is Φ-continuous if and only if

lim_{ε↓0} ω_f(ε: Φ) = 0    [ ω_f(ε: Φ) = ∫ ω_f(x: ε) Φ(dx) ].

On the other hand, it is not difficult to construct bounded Φ-continuous functions f such that ω̄_f(ε: Φ) does not go to zero with ε. For example, let f = I_A, the indicator function of the following Borel subset A of R¹:
A = ∪_{r=2}^∞ ∪_{i=1}^{[(r−1)/2]} { x ∈ R¹ : r + 2i/r < x ≤ r + (2i+1)/r },

with [(r−1)/2] denoting the integer part of (r−1)/2. It is easy to see that

ω̄_{I_A}(ε: Φ) = sup_{y∈R^k} Φ((∂A)^ε + y) = 1,    ω_{I_A}(ε: Φ) = Φ((∂A)^ε) → 0 as ε ↓ 0.    (16.6)
and M_r(f∘T⁻¹) and ω̄_{g_r} or ω_{f∘T⁻¹}, where

g_r(x) = (1 + ||x||^r)⁻¹ (f∘T⁻¹)(x)    if r > 0,
       = (f∘T⁻¹)(x)                     if r = 0.    (16.7)

Normalization

Since T is easily computable, we may leave things at this stage. If one would like the bound to involve moments of the X_j's and not those of the TX_j's, then the simple inequality (r > 0)
may be used. Assume that ρ_s < ∞ for some integer s ≥ 3; then for every real-valued, Borel-measurable function f satisfying (16.11), M_r(f) < ∞, ... . Examples of such sets are affine subspaces of dimension k' ≤ k − 1 (and their subsets and complements) and many other manifolds of dimension k' ≤ k − 1, for which α = k − k'. Below we assume that V = I merely to avoid notational complexity.

THEOREM 17.4. Let V = I. Assume that ρ_s < ∞ for some integer s ≥ 3. If A is a Borel set satisfying (17.10) for some α ≥ 1, then

| Q_n(A) − Φ(A) | ≤ ... .

Proof. First assume that Φ(A) = 0 or Φ(A) = 1. In this case P_m(−Φ: {χ̄_ν})(A) = 0 for all m, 1 ≤ m ≤ s − 2, because of (7.22). Hence (15.56) holds, and it remains to show that

n^{−m/2} ∫_{(∂A)^ε + y} | P_m(−φ: {χ̄_ν})(x) | dx ≤ ...
0 there exist c; F_ R', y, ER k , 1 < i < m, say, such that II 'B — = 1 c(x +y) 1 G + ((a {c+y))") k'
c10(n -1/2 P3)
k+1-k '
y E Rk-
+c', o (P 3 n -I / 2 ) k-k ,
where ¢= +n -t / ZP t (— : {X„}), , denoting the average of the with
cumulants of Uj 's. Q.E.D. Remark. It is fairly straightforward to extend the assertion (17.12) to a sequence (X,, : n> 1) of independent random vectors for which n
lim inf_n λ_n > 0,    sup_n n⁻¹ Σ_{j=1}^n E||X_j||^s < ∞.

If ρ_s < ∞ for some integer s ≥ 3, then

sup_{C∈𝒞} (1 + d³(0, ∂C)) | Q_n(C) − Φ(C) | ≤ c₁₆ ρ₃ n^{−1/2} + c₁₇ ρ_s n^{−(s−2)/2}.    (17.25)
Proof. Let C be a Borel-measurable convex set. Replacing A by C in (17.17) and using (17.22) in Theorem 15.1 (with r=s), one has (1+d'(U,8C))IQ.(C)-41(C)j
=^ ffd(Q^-0)I s-2
m/2I (' fdp (—: f fd(Q,, — ^)I + I n{X,))I m-1 J c18Pan-(s-2)/2+ c 1 9w (2E : (Y
+ )r°)
s-2
+ 2 n - m/ 2 I f fdpp (-4: (x.))I,
(17.26)
m-1
where e=c 10P 3 n - '/ 2 . Now, by (15.82) and (15.84),
1
∫ (1 + ||x||^s) | P_m(−φ: {χ̄_ν})(x) | dx ≤ c₂₁ ρ̄_{m+2} + n^{−m/2} ρ̄_{m+2} ... . Also,

d_P(G₁, G₂) = inf{ ε > 0 : G₁(F) ≤ G₂(F^ε) + ε for all closed F },    (17.46)

since, given G₁(F) ≤ G₂(F^ε) + ε for all closed F and some ε > 0, one obtains

1 − G₁(F^ε) = G₁(R^k \ F^ε) ≤ G₂((R^k \ F^ε)^ε) + ε ... .    (17.51)
Let 𝒜(d: Φ_{0,V}) denote ... .    (17.52)

THEOREM 17.10. If ρ₃ < ∞, then there exist constants c₃₈, c₃₉ depending only on k such that

sup_{A ∈ 𝒜(d: Φ_{0,V})} | Q_n(A) − Φ_{0,V}(A) | ≤ ... .

THEOREM 17.11. Let {X_n : n ≥ 1} be a sequence of independent random vectors with values in R^k having zero means and finite sth absolute moments for some integer s ≥ 3. Let
V_n = n⁻¹ Σ_{j=1}^n Cov(X_j),    λ_n = smallest eigenvalue of V_n,    Λ_n = largest eigenvalue of V_n,

ρ̄_{s,n} = n⁻¹ Σ_{j=1}^n E||X_j||^s,

and assume that

lim_n n^{−1/2} ρ̄_{3,n} log^{1/2} n = 0,    lim_n n^{−(s−2)/2} ρ̄_{s,n} = 0.    (17.56)

Then one has

sup_{a ≥ ((s−2+δ) log n)^{1/2}} a^s Prob( || n^{−1/2}(X₁ + ··· + X_n) || > Λ_n^{1/2} a )
≤ c₄₀(s,k) (1 + Λ_n^{s/2}) (1 + (n^{−1/2} λ_n^{−3/2} ρ̄_{3,n} log n)^{k+1+s}) n^{−(s−2)/2} ρ̄_{s,n} + Θ_n(δ) n^{−(s−2)/2},    (17.57)
where Θ_n(δ) → 0 as n → ∞ for each δ > 0.

Proof. Without loss of generality, assume λ_n > 0 for all n. Let Q_n' denote the distribution of n^{−1/2}(T_nX₁ + ··· + T_nX_n), where T_n is the symmetric, positive-definite matrix satisfying

T_n² = V_n⁻¹    (n ≥ 1).    (17.58)
Some Applications
Write n T
—n—'
m+2,n
m+2 (0<m<s-2), 2 EllTT Xjll i I —
n
(l vi < s).
(vth cumulant of T„Xj )
(17.59)
j-1
Define the function f by
f (x)° 0 if a if
IIxII a,
(17.60)
and use Theorem 15.4 with this f to get s-2
Qn — 2 n -mj2 Pm( —, D: (j4.,n}) Qx: Ilxll > a)) ,n 0
a' (
< cal(s k)M, (f)[ 1 +( n—'/2T3
,nlogn)k+:+110n 3n-(s-2)/2
,
s-2
'has m2.0 n—m/2 I Pm( — 'D: (Z,n))l (17.61)
(lx:IIxII>a—c42n-'/22r3,nlogn}) where n
0.* ,- inf ,
04141
en-1 2 f
j^ l (IIT.Xj IK <Enl/I)
II TnXj IIJ
n
+n -l I f
IITnXiiIs oo
(17.64)
(1 <m<s-2),
and a — c42n -1 /2i3„logn> 2 , I/2
a—c42 n -1 / 2T 3 „logn>((s-2+ 2 )logn)
(17.65)
for all sufficiently large n if a>((s-2+S)logn) 1 / 2 . Hence a'n -m/2 I Pm( -0 : {i.,^})l({x IIXII > 2 1/ < c43 n - m/2,rm+2
(IIxII
.n f
IIxll3m+aeXp
z >(s-2+6/2)1.`.)
_ II 211 2 ) dx J (17.66)
=©„(S)n -( s -2) / 2 (0<m<s-2). Also note that MS(f)= 1+a5 a ))
< c4,(s,k)r 1 +(n -1 / 2 X; 3/2P3
.logn)k+,+' ]n n sn -(..-2)/2
n-(S-2)/2. +O„(S) t
(17.68)
Finally observe that Q,,((x:IIxii>a})=Prob(I(n -tl2 (T,X,+
...
+T,,X)11>a)
>Prob(IIn -1 / 2 (X j +... +.X,)11>aA,',/ 2 ), (17.69)
since ||T_n x|| ≥ Λ_n^{−1/2} ||x||. Q.E.D.

COROLLARY 17.12. Let {X_n : n ≥ 1} be a sequence of independent and identically distributed random vectors having a common mean zero and common covariance matrix V. If ρ_s ≡ E||X₁||^s is finite for some integer s ≥ 3, then

P( ||X₁ + ··· + X_n|| > a_n Λ^{1/2} n^{1/2} ) = δ_n n^{−(s−2)/2} a_n^{−s},    (17.70)

where δ_n → 0 as n → ∞ uniformly for every sequence {a_n : n ≥ 1} of numbers satisfying

a_n ≥ (s − 2 + δ)^{1/2} log^{1/2} n    (17.71)

for any fixed δ > 0, and Λ is the largest eigenvalue of V.

Proof. Note that in this case

... → 0    (17.72)
as n → ∞. Here λ, Λ are the smallest and largest eigenvalues of V, respectively. Q.E.D.

COROLLARY 17.13. Let {X_n : n ≥ 1} be a sequence of independent random vectors having zero means and finite sth absolute moments for some integer s ≥ 3. Assume that

lim sup_{n→∞} ρ̄_{s,n} < ∞,    lim inf_{n→∞} λ_n > 0,    (17.73)

where the notation is the same as in Theorem 17.11. Then

P( ||X₁ + ··· + X_n|| > a_n Λ_n^{1/2} n^{1/2} ) = δ_n n^{−(s−2)/2} a_n^{−s},    (17.74)

where {δ_n : n ≥ 1} remains uniformly bounded for every sequence of numbers {a_n : n ≥ 1} satisfying (17.71) for any fixed δ > 0.

Proof. In view of (17.73) the sequence {Λ_n : n ≥ 1} is bounded since, writing V_n = ((v_{ij})), one has
where (S„ : n) 1 } remains uniformly bounded for every sequence of numbers { a„ : n) 1) satisfying (17.71) for any fixed 6 > 0. Proof. In view of (17.73) the sequence (A n : n) 1) is bounded since, writing V. = ((va )), one has k
A,,= sup (x, V,,x> = sup 2 v j xi xj I1x11-1
11x11 i,j-1 2
kk < G ( vii vjj) 1/2xi xjl =(
Pm,,, 1) is a bounded sequence. The relation (17.74) now follows from (17.57). Q.E.D. Note that the sequence (&, : n> 1) in Corollary 17.13 may be shown to go to zero in the same manner as (Sn : n > 1) in Corollary 17.12 if n
limn
1
(17.77)
j-1 {Ilxjll>Af2n'/')
18. RATES OF CONVERGENCE UNDER FINITENESS OF SECOND MOMENTS

Most of the main results in Sections 15, 16, and 17 have appropriate analogs when only the second moments are assumed finite. Here we prove an analog of Theorem 15.1 and derive some corollaries. As before, X₁,...,X_n are independent random vectors with values in R^k, and

n⁻¹ Σ_{j=1}^n Cov(X_j) = V.

For r > 0 set

M_r(f) = sup_{x∈R^k} |f(x)| (1 + ||x||^r)⁻¹    (r > 0),    M₀(f) = sup_{x,y∈R^k} |f(x) − f(y)| = ω_f(R^k).    (18.6)

Finally, let Q_n denote the distribution of n^{−1/2}(X₁ + ··· + X_n) and let Φ_{a,V} denote the normal distribution on R^k with mean a and covariance matrix V. We write Φ = Φ_{0,I}, where I is the identity matrix.

THEOREM 18.1. Let V = I and ρ_s < ∞ for some s, 2 < s ≤ 3. There exist positive constants c₁, c₂, c₃ depending only on k and s such that for every Borel-measurable function f on R^k satisfying

M_r(f) < ∞    (18.7)
M_r(f) < ∞, ... . If (18.19) holds for every ε > 0, then {G_n : n ≥ 1} converges weakly to the standard normal distribution Φ.

Proof. Apply Theorem 18.1 (with r = r₀ = 0) to the random vectors ... . For every Lipschitzian function f bounded by one (or for the indicator function f of an arbitrary Borel-measurable convex set) one has

| ∫ f d(G_n − Φ) | ≤ ... ,

and for n ≥ n₀(η) one has δ_n(η/2k) ≤ η/2. Then d_n ≤ η for all n ≥ n₀(η). Q.E.D.

The above corollary is an extension of Lindeberg's central limit theorem to the multidimensional case.
COROLLARY 18.3. If in Corollary 18.2 one replaces (18.19) by the assumption E||X_j^{(n)}||^s < ∞ for some s, 2 < s ≤ 3, then

sup_{C∈𝒞} |G_n(C) − Φ(C)| ≤ c₁₄ [ ... + k_n⁻¹ Σ_{j=1}^{k_n} ∫_{{||X_j^{(n)}|| > ε k_n^{1/2}}} || T_n X_j^{(n)} ||^s ] ≤ c₁₄ k_n^{−(s−2)/2} k_n⁻¹ Σ_{j=1}^{k_n} E|| T_n X_j^{(n)} ||^s,    (18.23)

Convergence Assuming Finite Second Moments

where 𝒞 is the class of all Borel-measurable convex subsets of R^k, and c₁₄ depends only on k.

Proof. The first inequality in (18.23) follows from Theorem 18.1 (with r = 0 = r₀) applied to the random vectors T_n X_j^{(n)}, 1 ≤ j ≤ k_n, and from Corollary
3.2 (with s = 0). The second inequality is obtained from the first by letting ε = 0 in the expression within square brackets. Q.E.D.

The above corollary contains a multidimensional extension of Liapounov's central limit theorem: {G_n : n ≥ 1} converges weakly to Φ if

lim_{n→∞} k_n^{−(s−2)/2} ( k_n⁻¹ Σ_{j=1}^{k_n} E|| T_n X_j^{(n)} ||^s ) = 0    (18.24)

for some s, 2 < s ≤ 3. The first inequality in (18.23), however, is sharper. For example, if {X_n : n ≥ 1} is a sequence of independent and identically distributed random vectors with common mean zero, common positive-definite covariance matrix V, and finite sth absolute moments for some s, 2 < s ≤ 3, ... ; if {X_n : n ≥ 1} is an independent and identically distributed sequence of random variables, then the right side is o(n^{−(s−2)/2}) as n → ∞.
NOTES

The first central limit theorem was proved for i.i.d. Bernoulli random variables by DeMoivre [1]; Laplace [1] elucidated and refined it, and also gave a statement (as well as some reasoning for its validity) of a rather general central limit theorem. Chebyshev [1] proved (with a complement due to Markov [1]) the first general central limit theorem by his famous method of moments; however, Chebyshev's moment conditions were very severe. Then came Liapounov's pioneering investigations [1, 2], in which he introduced the characteristic function in probability theory and used it to prove convergence to the normal distribution under the extremely mild hypothesis (18.24) (for k = 1). Finally Lindeberg [1] proved Corollary 18.2 (for k = 1). In the i.i.d. case this reduces to the so-called classical central limit theorem: if {X_n : n ≥ 1} is a sequence of i.i.d. random variables each with mean zero and variance one, then the distribution of n^{−1/2}(X₁ + ··· + X_n) converges weakly to the standard normal distribution Φ. This classical central limit theorem was also proved by Lévy [1] (p. 233). Feller [1] proved that the Lindeberg condition (18.19) is also necessary in order that (i) the distribution of k_n^{−1/2} s_n⁻¹(X₁^{(n)} + ··· + X_{k_n}^{(n)}) converge weakly to Φ and (ii) m_n/(k_n s_n²) → 0 as n → ∞; here k = 1, and we write s_n² for V_n, s_n⁻¹ for T_n, and m_n = max{var(X_j^{(n)}) : 1 ≤ j ≤ k_n}. Many authors have obtained multidimensional extensions of the central limit theorem, for example, Bernstein [1] and Khinchin [1, 2]; the Lindeberg–Feller theorem was extended to R^k by Takano [1].

Section 11. Lemma 11.1–Corollary 11.5 are due to Bhattacharya [1–5]. These easily extend to metric groups, and Bhattacharya [6] used them to derive rates of convergence of the n-fold convolution of a probability measure on a compact group to the normalized Haar measure as n → ∞. Lemma 11.6 is perhaps well known to analysts.

Section 12. The first result on the speed of convergence is due to Liapounov [2], who proved

sup_x |F_n(x) − Φ(x)| ≤ ... ,

whereas von Bahr [3] essentially assumed that the random vectors are i.i.d. and that ρ₃ < ∞, ρ_{k+1} < ∞. For the class 𝒞, Sazonov [1] finally relaxed the moment condition to ρ₃ < ∞, proving Corollary 17.2 in the i.i.d. case (Bergström [3] later proved this independently of
Sazonov), while Rotar' [1] relaxed it for the general non-i.i.d. case. For more general classes of sets this relaxation of the moment condition is due to Bhattacharya [7]. Paulauskas [1] also has a result that goes somewhat beyond the class C. The results of Section 13 are due to Bhattacharya [3], although the explicit computation of constants given here is new. The first effective use of truncation in the present context is due to Bikjalis [4]; Lemma 14.1 and Corollary 14.2 are essentially due to him. Lemma 14.3 is due to Rotar' [1]. Lemmas 14.6 and 14.8 were obtained by Bhattacharya [7]; a result analogous to the inequality (14.107) was obtained earlier by Bikjalis [4]. Analogs of Lemma 14.7 were obtained earlier by Doob [1], pp. 225–228, for a stationary Markov chain, by Brillinger [1] for the i.i.d. case, and by von Bahr [1] for the case considered by us; but we are unable to deduce the present explicit form needed by us from their results. Theorems 15.1, 15.4, and Corollary 15.2 are due to Bhattacharya [7], as is the present form of Corollary 15.3; earlier, a version of Corollary 15.3 was independently proved by von Bahr [3] and Bhattacharya [1, 2]. Theorems 17.1, 17.4, 17.8–17.10, and Corollary 17.3 are due to Bhattacharya [4, 5, 7]. Corollaries 17.5 and 17.12 were proved by von Bahr [2, 3] in the i.i.d. case; the corresponding results (Theorems 17.4, 17.11, and Corollary 17.13) in the non-i.i.d. case are new. The first global, or mean, central limit theorems are due to Esseen [1, 3] and Agnew [1]. The fairly precise result Corollary 17.7 was proved for s = 3 by Nagaev [1] in the i.i.d. case (a slightly weaker result was proved earlier by Esseen [1]) and later by Bikjalis [2] in the non-i.i.d. case; afterwards, the much more powerful Theorem 17.6 was proved by Rotar' [1] for s = 3. Rotar' [1] also stated a result which implies Theorem 17.6 for all s > 3; however we are unable to verify it.
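For symmetric coin flips the O(n^{-1/2}) rate discussed in these notes can be computed exactly, since F_n is a rescaled binomial distribution function. The sketch below (our own illustration, standard-library Python) shows the sup distance roughly halving when n is quadrupled:

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sup_dist(n):
    # sup_x |F_n(x) - Phi(x)| for F_n = CDF of (X_1 + ... + X_n)/sqrt(n),
    # X_i = +/-1 fair coins, so S_n = 2B - n with B ~ Binomial(n, 1/2).
    cdf = 0.0
    worst = 0.0
    for b in range(n + 1):
        p = math.comb(n, b) * 0.5 ** n
        x = (2 * b - n) / math.sqrt(n)
        # the sup is attained at the atoms: compare Phi with F_n(x-) and F_n(x)
        worst = max(worst, abs(cdf - Phi(x)), abs(cdf + p - Phi(x)))
        cdf += p
    return worst

d100, d400 = sup_dist(100), sup_dist(400)
print(d100, d400, d100 / d400)  # ratio near 2: the O(n^{-1/2}) rate
```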
Theorem 18.3 is new, as is perhaps Corollary 18.3; however Osipov and Petrov [1] and Feller [2] contain fairly general inequalities for the difference between the distribution functions F_n and Φ in the non-i.i.d. case in one dimension. More precise results than (18.25), (18.26) are known in one dimension. Ibragimov [1] has proved the following result. Suppose that {X_n : n ≥ 1} is a sequence of i.i.d. random variables each with mean zero and variance one; let 0 < δ < 1 … ≥ 1; but the Riemann–Lebesgue lemma [Theorem 4.1(iii)] applies for m ≥ p, so that there must exist t_0 ∈ R^k such that |Q̂_1(t_0)| = 1, which means that X_1 assigns all its mass to a countable set of parallel hyperplanes (see Section 21); this would imply singularity of Q_m with respect to Lebesgue measure for all m ≥ 1, contradicting the fact that Q_m is absolutely continuous for all m ≥ p.
Expansions of Densities

Next, for ||t|| > bn^{1/2},

|f_n(t)| = |Q̂_1(tn^{-1/2})|^n ≤ (sup{|Q̂_1(u)| : ||u|| > b})^{n-p} |Q̂_1(tn^{-1/2})|^p = δ^{n-p} |Q̂_1(tn^{-1/2})|^p  (n > p).  (19.12)

Now

lim_{n→∞} ∫ |f_n(t) − φ̂_{0,V}(t)| dt
≥ 3, and that the characteristic function Q̂_1 of X_1 belongs to L^p(R^k) for some p ≥ 1. A bounded continuous density q_n of the distribution Q_n of n^{-1/2}(X_1 + ··· + X_n) exists for every n ≥ p, and one has the asymptotic expansion

sup_{x∈R^k} (1 + ||x||^s) |q_n(x) − Σ_{j=0}^{s-2} n^{-j/2} P_j(−φ_{0,V}: {χ_ν})(x)| = o(n^{-(s-2)/2})  (n → ∞),  (19.17)

where χ_ν denotes the νth cumulant of X_1 (3 ≤ |ν| ≤ s).

Proof. Without loss of generality, assume p to be an integer (else, take [p] + 1 for p). For n ≥ p + s, D^α Q̂_n is integrable for 0 ≤ |α| ≤ s. Writing, for n ≥ p + s, |α| ≤ s,
h_n(x) = x^α (q_n(x) − Σ_{j=0}^{s-2} n^{-j/2} P_j(−φ_{0,V}: {χ_ν})(x))  (x ∈ R^k),

one has, by the Fourier inversion theorem,

h_n(x) = (2π)^{-k} (−i)^{|α|} ∫ D^α [f_n(t) − Σ_{j=0}^{s-2} n^{-j/2} P̃_j(it: {χ_ν}) exp{−½⟨t, Vt⟩}] exp{−i⟨t, x⟩} dt  (x ∈ R^k),  (19.19)

where f_n = Q̂_n.
Let B be the positive-definite symmetric matrix satisfying B² = V^{-1}. Define

β_s = E||BX_1||^s.  (19.20)

By Theorem 9.12 (and the remark following it), the integral over {||t|| ≤ an^{1/2}} of the integrand in (19.19) is bounded by δ(n)n^{-(s-2)/2} with δ(n) → 0,  (19.24)

where n ≥ p + s, |α| ≤ s; the integral over {||t|| > an^{1/2}} is o(n^{-(s-2)/2}) by the analog of (19.12), since

δ ≡ sup{|Q̂_1(t)| : ||t|| > a} < 1.  (19.25)

Q.E.D.
Expansions—Nonlattice Distributions
Remark. It should be pointed out that Theorem 19.2 holds even with s = 2. This is true because Theorem 9.12 holds with s = 2. Therefore a sharper assertion than (19.5) holds, namely,

lim_{n→∞} sup_{x∈R^k} (1 + ||x||²) |q_n(x) − φ_{0,V}(x)| = 0.  (19.26)
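The convergence in (19.26) can be watched numerically in a case where q_n is available in closed form: for uniform summands the density of the sum is the Irwin–Hall density. The sketch below (our own illustration; the grid and the values n = 4, 16 are arbitrary choices) shows the sup distance between the density of the normalized sum and the normal density shrinking:

```python
import math

def irwin_hall_pdf(n, x):
    # Density at x of the sum of n i.i.d. Uniform(0, 1) variables.
    if x <= 0.0 or x >= n:
        return 0.0
    s = 0.0
    for j in range(int(math.floor(x)) + 1):
        s += (-1) ** j * math.comb(n, j) * (x - j) ** (n - 1)
    return s / math.factorial(n - 1)

def q_n(n, y):
    # Density of the normalized sum Y = (T - n/2)/sqrt(n/12), T = sum of uniforms.
    sd = math.sqrt(n / 12.0)
    return sd * irwin_hall_pdf(n, n / 2.0 + y * sd)

phi = lambda y: math.exp(-y * y / 2.0) / math.sqrt(2.0 * math.pi)
grid = [i / 10.0 for i in range(-30, 31)]
err = {n: max(abs(q_n(n, y) - phi(y)) for y in grid) for n in (4, 16)}
print(err[4], err[16])  # the sup error decreases as n grows
```

Since the uniform is symmetric, the first correction polynomial vanishes and the observed error is of order n^{-1}, consistent with (19.17) for s = 4.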
The next theorem deals with the non-i.i.d. case.

THEOREM 19.3 Let {X_n : n ≥ 1} be a sequence of independent random vectors with values in R^k having zero means and average positive-definite covariance matrices V_n for large n. Assume that

lim sup_{n→∞} n^{-1} Σ_{j=1}^n E||B_nX_j||^s < ∞  (19.27)

for some integer s ≥ 3, where B_n is the positive-definite symmetric matrix satisfying

B_n = V_n^{-1/2},  V_n = n^{-1} Σ_{j=1}^n Cov(X_j),  (19.28)

defined for all sufficiently large n. Also, assume that there exists a positive integer p such that the functions

g_{m,n}(t) ≡ Π_{j=m+1}^{m+p} |E(exp{i⟨t, B_nX_j⟩})|  (0 ≤ m ≤ n − p − 1)

satisfy

sup{∫ g_{m,n}(t) dt : 0 ≤ m ≤ n − p − 1} < ∞.  (19.29)
(D °„E l
ex p l i < tn -'/2, Bn X> >) )) rT )
< n-IaI/2(EIIB ,,XIi illa1l)... (Eli BnX1,^il -I)'^ < n - IaI/ 2 (E' II Bn X1 IlI a l)' < n-lal/2
ilad/Ial
(E1IB,,Xj,111aI) 4
... (Eli B,,XI. jl Ial ) ', n - lal/ 2mn i I a l n .
^ la.l/lal
(19.44)
Also, since m < jat , there are at least (n — IaI)/(dal p)— I sets of p consecutive indices in {1,2,...,n}\{j 1 ,...,jm ). Hence dt < n -lal/2+ ^yflal.n(S (b4))in-IaI /(IaIP)-2
} (11111>b.n' / '
X fRk gm'.n n^/2 dt < b$n-IaI/2+k/2+Ila.n(6(b4))(n-lal)/(IaIP)-2 (19.45)
for some m', 0 < m' < n —p and, therefore,
(11111>bOI12
)
JD a Qn (t)I dt n+n'/ 2 I(EY1."II})=0, ✓
-i M",i({Ilxll n,).
(19.93)
Therefore for all such j, using (19.79) and the Leibniz formula for differentiating a product of n functions, ID OMM,j (t)I (I6v,) ^) -
By (19.69) and (19.70) (and remembering that G is absolutely continuous), sup
llm S„= lim S„= n-i 00
n- 00
IG(i)I(I6p3)
so that 8„ 3. Let V denote the covariance matrix of Q 1 and X its with cumulant (3 < II < s). Then for every real-valued, Bore/-measurable function f on R k satisfying (20.4)
M5,(f) n o , say, by T2 = and, writing A„ for the largest eigenvalue of D,
s+k+l
A„ _
c 7 (s, k)n 1/2
h/(s+k
(EIIT,Z1.1,II An
)
—
I)
'/Z
(20.22)
Expansions Under Cramer's Condition
By Corollary 14.2, (II T" II : n > no ) is bounded, and [by Lemma 14.1(v)] EIIT1,.II5+k+1 G2 s+k+lEIIy1 .nlls+k+1= 0(n(k+1)/2) C8(s,
(n—+),
k)n(1 /2)(s— 2)/(s+k— 1) (>
Ps /(s+k— 1)
(20.23)
where c 8 is positive. Use the first estimate of (20.23) in (20.21) to get
f
I ^ DO aHH (t) J [ DGK^ (1) -
{Il^ilc }
+ fc 9 (s, k)(1 + IItIIIS - °I) exp { — ie IItII 2 } di { II1II >A}
s+k-2
+ fDO - °` 2 n - '/ 2p(it : (X,}) exp( — Zm^^=
for every positive ε, and (iii) the characteristic functions ĝ_n of X_n satisfy

lim sup_{n→∞} sup_{||t||>b} |ĝ_n(t)| < 1  (20.55)

for every positive b. Then for every real-valued, Borel-measurable function f on R^k satisfying (20.4) for some s', 0 ≤ s' ≤ s, one has

|∫ f d(Q_n − Σ_{r=0}^{s-2} n^{-r/2} P_r(−Φ: {χ̄_{ν,n}}))| ≤ M_{s'}(f)δ_1(n) + c(s,k)ω̄_f(2e^{-dn}: Φ),  (20.56)

where Q_n is the distribution of n^{-1/2}B_n(X_1 + ··· + X_n), with B_n² = V_n^{-1}. Also δ_1(n) and d are as in Theorem 20.1, and χ̄_{ν,n} = average νth cumulant of B_nX_j (1 ≤ j ≤ n).
The proof of Theorem 20.6 is entirely analogous to that of Theorem 20.1 and is therefore omitted.

As indicated in the introduction to the present chapter, there are special functions f, for example, trigonometric polynomials, for which the expansion of ∫ f dQ_n is valid whatever may be the type of the distribution of X_1. This follows from Theorems 9.10–9.12. Our next theorem provides a class of functions of this type. For the sake of simplicity we state it for the i.i.d. case.

THEOREM 20.7 Let {X_n : n ≥ 1} be an i.i.d. sequence of random vectors with values in R^k. Assume that the common distribution has mean zero, positive-definite covariance matrix I, and a finite sth absolute moment ρ_s for some integer s ≥ 3. Let f be a (real- or complex-valued) Borel-measurable function on R^k that is the Fourier–Stieltjes transform of a finite signed measure µ satisfying

∫ ||x||^{s-2} |µ|(dx) < ∞.  (20.57)

Then

∫ f d(Q_n − Σ_{r=0}^{s-2} n^{-r/2} P_r(−Φ: {χ_ν})) = o(n^{-(s-2)/2})  (n → ∞).  (20.58)
Here Q_n is the distribution of n^{-1/2}(X_1 + ··· + X_n), χ_ν = νth cumulant of X_1.

Proof. By Parseval's relation [Theorem 5.1(viii)] and Theorem 9.12,

|∫ f d(Q_n − Σ_{r=0}^{s-2} n^{-r/2} P_r(−Φ: {χ_ν}))|
= |∫ [Q̂_n(t) − Σ_{r=0}^{s-2} n^{-r/2} P̃_r(it: {χ_ν}) exp{−½||t||²}] µ(dt)|
≤ δ(n) n^{-(s-2)/2} ∫_{{||t|| ≤ c_20(s,k)n^{1/2}ρ_s^{-1/(s-2)}}} (||t||² + ||t||^{3(s-2)}) exp{−¼||t||²} |µ|(dt) + … = o(n^{-(s-2)/2})  (n → ∞),

where δ(n) → 0. Q.E.D.

We point out that if µ is discrete, assigning its entire mass to a finite number of points of R^k, then the above theorem applies, and thus f may be taken to be an arbitrary trigonometric polynomial. However the result applies to a much larger class of functions f, including the class of all Schwartz functions (see Section A.2 for a definition of this class). Finally, for strongly nonlattice distributions we have the following result.

THEOREM 20.8 Let {X_n : n ≥ 1} be a sequence of i.i.d. strongly nonlattice random vectors with values in R^k. If EX_1 = 0, Cov(X_1) = I, and ρ_3 ≡ E||X_1||³ < ∞, then for every real-valued, bounded, Borel-measurable function f on R^k one has

|∫ f d(Q_n − Φ − n^{-1/2}P_1(−Φ: {χ_ν}))| = ω_f(R^k)·o(n^{-1/2}) + O(ω̄_f(δ_n : Φ))  (n → ∞),  (20.60)

where Q_n is the distribution of n^{-1/2}(X_1 + ··· + X_n), χ_ν = νth cumulant of X_1, and δ_n = o(n^{-1/2}); δ_n does not depend on f.

Proof. Given η > 0, we show that there exists n(η) such that for all n ≥ n(η) the left side of (20.60) is less than

ω_f(R^k)·o(n^{-1/2}) + c(k)ω̄_f(ηn^{-1/2} : Φ).  (20.61)
Introduce truncated random vectors Y_j (1 ≤ j ≤ n) … mean zero, variance σ² > 0, and third moment µ_3. It may be noted that for k = 1 "strongly nonlattice" is the same as "nonlattice." One may also easily write down analogs of (20.77) for more general classes of sets (than rectangles), for example, the class of all Borel-measurable convex subsets of R^k, or the class introduced in (17.3).

NOTES

Although Chebyshev [1] and Edgeworth [1] had conceived of the formal expansions of this chapter, it was not until Cramer's important work [1, 3] (Chapter VII) that a proper foundation was laid and the first important results derived.
Section 19. For an early local limit theorem for densities in the non-i.i.d. case and its significant applications to statistical mechanics the reader is referred to Khinchin [3]. Theorem 19.1 is essentially proved in Gnedenko and Kolmogorov [1], pp. 224–227; in this book (pp. 228–230) one also finds the following result of Gnedenko: in one dimension under the hypothesis of Theorem 19.2 one has

sup_{x∈R¹} |q_n(x) − Σ_{j=0}^{s-2} n^{-j/2} P_j(−φ: {χ_ν})(x)| = o(n^{-(s-2)/2}).  (N.1)
For k = 1 and s ≥ 3 the relation (19.17) in Theorem 19.2 was proved by Petrov [1] assuming boundedness of q_n for some n; however this assumption has been shown to be equivalent to ours in Theorem 19.1. Theorems 19.2, 19.3, and Corollary 19.4 appear here for the first time in their present forms. The assumptions (19.47), (19.49) may be considered too restrictive for the non-i.i.d. case; however it is not difficult to weaken them and get somewhat weaker results; we have avoided this on the ground that the conditions would look messier. Theorem 19.5 is due to Bhattacharya [8]; it strengthens Corollary 19.6, which was proved earlier by Bikjalis [4] for s ≥ 3. Section 20. Cramer [1, 3] (Chapter VII) proved that
sup_{x∈R¹} |F_n(x) − Σ_{j=0}^{s-3} n^{-j/2} P_j(−Φ_{0,σ²}: {χ_ν})(x)| = O(n^{-(s-2)/2})  (N.2)

in one dimension under the hypothesis of Theorem 20.1; here F_n is the distribution function of n^{-1/2}(X_1 + ··· + X_n) and var(X_1) = σ². This was sharpened by Esseen [1], who obtained a remainder o(n^{-(s-2)/2}) by adding one more term to the expansion; Esseen's result is equivalent to (20.49) when specialized to k = 1. R. Rao [1, 2] was the first to obtain multidimensional expansions under Cramer's condition (20.1) and prove that in the i.i.d. case one can expand probabilities of Borel-measurable convex sets with an error term O(n^{-(s-2)/2}(log n)^{(k-1)/2}) uniformly over the class C, provided that the hypothesis of Theorem 20.1 holds. This was extended to more general classes of sets independently by von Bahr [3] and Bhattacharya [1, 2]. Esseen's result on the expansion of the distribution function (mentioned above) was extended to R^k independently by Bikjalis [4] and von Bahr [3]. Corollaries 20.4, 20.5 as well as the relation (20.49), which refine earlier results of Bhattacharya [1, 2] and von Bahr [3], were obtained in Bhattacharya [4, 5]. The very general Theorem 20.1 is new; this extends Corollaries 20.2, 20.3 proved earlier by Bhattacharya [5]. Theorems 20.6, 20.7 are due to Bhattacharya [4, 5]. There is a result in Osipov [1] that yields o(n^{-(s-2)/2}) in place of O(n^{-(s-2)/2}) as the right side of (20.53). Some analogs of Theorem 20.8 have been obtained independently by Bikjalis [6]. Earlier Esseen [1] had proved (20.60) for the distribution function of Q_n (i.e., for the class of functions {f = 1_{(−∞, x]} : x ∈ R¹}) in one dimension and derived (20.78).
CHAPTER 5
Asymptotic Expansions—Lattice Distributions
The Cramer–Edgeworth expansions of Chapter 4 are not valid for purely discrete distributions. For example, if {X_n : n ≥ 1} is a sequence of i.i.d. lattice random variables (k = 1), then the distribution Q_n of the nth normalized partial sum is easily shown to have point masses each of order n^{-1/2} (if the variance of X_1 is finite and nonzero). Thus the distribution function of Q_n cannot possibly be expanded in terms of the absolutely continuous distribution functions of P_r(−Φ), 0 ≤ r ≤ s − 2, with a remainder term o(n^{-(s-2)/2}), when X_1 has a finite sth moment for some integer s not smaller than 3. However the situation may be salvaged in the following manner. The multiple Fourier series of Q_n is easily inverted to yield the point masses of Q_n. Making use of the approximation of Q̂_n by exp{−½⟨t, t⟩} Σ_{r=0}^{s-2} n^{-r/2} P̃_r(it) as provided by Chapter 2, Section 9, one obtains an asymptotic expansion of the point masses of Q_n in terms of Σ_{r=0}^{s-2} n^{-r/2} P_r(−Φ). To obtain an expansion of Q_n(B) for a Borel set B, one has to add up the asymptotic expansions of the point masses in B. For B = (−∞, x], x ∈ R^k, this sum may be expressed in a simple closed form. A multidimensional extension of the classical Euler–Maclaurin summation formula is used for this purpose.

21. LATTICE DISTRIBUTIONS

Consider R^k as a group under vector addition. A subgroup L of R^k is said to be a discrete subgroup if there is a ball B(0: d), d > 0, around the origin such that L ∩ B(0: d) = {0}. Equivalently, a subgroup L is discrete if every ball in R^k has only a finite number of points of L in it. In particular, a
discrete subgroup is a closed subset of R^k. The following theorem gives the structure of discrete subgroups.

THEOREM 21.1 Let L be a discrete subgroup of R^k and let r be the number of elements contained in a maximal set of linearly independent vectors in L. Then there exist r linearly independent vectors ξ_1, …, ξ_r in L such that

L = Z·ξ_1 + ··· + Z·ξ_r = {m_1ξ_1 + ··· + m_rξ_r : m_1, …, m_r integers}  (Z = {0, ±1, ±2, …}).  (21.1)
Proof First consider the case k = 1. If L is a discrete subgroup, L * (0), then to =min t : t E L, t > O) is positive. If t E L is arbitrary, let n be an integer such that nto < t < (n + 1)t a . Then 0 < t — nt p < to and t — nto E L, so that the minimality of t o implies that t = nt o or L = Z • t o . Now consider the case k> 1. The theorem will be proved if we can construct linearly independent vectors,..., , in L such that Ln(R• +•.. +R• ,)=Z•¢ i + • • • +Z..,, since it follows from the definition of the integer r that L c R• J, + • • • +R•,. Here R is the field of reals. We construct these vectors inductively. Assume that linearly independent vectors ¢, ..., ^ have been found for some s, s < r, such that (i) EE L, j=1,2,...,s and (ii) Ln(R•J,+•••+R•^,)=Z•t 1 +•••+Z. . Now we describe a method for choosing ^ ^. Let a E L\(R• J, + • • • + R.). Then a is linearly independent of ...,ts . Let M={toa: t oa+t i t s + • • • +tEL for some choice of real numbers t1,...,ts }. Since ¢ i ,...,C,EL, it follows that M = (t o a: toa + a i t i + • • • + as E L for some choice of a i with 00 such that M=Z•a oa. Choose constants a^, 1 r. We shall show that ro =r. Since r(d) is integer-valued, there exists d0 >0 such that r(d)=ro for 00 be arbitrary and let d 1 =min(e/k,d0 }. Then r(d 1 ) = do, and there exist linearly independent vectors Q^,...,/3,, in L n B(0: d 1 ). It follows that /3^ E R•rl 1 + • • • + R•rl o and, therefore, R•(3, + • + R•13,0 C R•rl, + • • • + R•17,, which implies R.f3 1 +••• +R•/3. 0 =R•rj 1 +••. +R•rl, o (21.5) Now let EE R•r1 1 + • • • + R•rl, o be arbitrary. Then J= t 1 X13, + • • • + t /3,0 Write t^ = m1 + tj with rn1 integral and ^ t^^ < 1, 1 <j < r o . Thus there exists /3 E L, /6= m, Q,+••• +m of13, 0, such that Ili — a II 0 and let to E L*. Then (21.11) implies that E a + 2irZ, so that P(E2IrZ)=1.
(21.13)
If S=(: P (X = x o + E) > 0), then (21.13) is equivalent to E 277Z
for all j' ES.
(21.14)
Since S generates L, (21.13) is equivalent to E27TZ
(21.15)
for all EEL.
Thus t o belongs to the right side of (ii). Conversely, if (21.15) holds for some t o , (21.13) holds and f(t+t o)I=IE(exp{i))I=IE(exp(i — i})I=IE(exp{i+i})J=J f (1)J
(tER"` ). (21.16)
Thus to E L*, and (ii) is proved. It remains to prove (iii). By Theorem 21.1, there exists a basis { ... k } of R k such that L = Z • t 1 + • • • + Z • J„ where r is the rank of L. Let { rl,, ... ,,t k } be the dual basis, that is, %, rl^) = 5 ., 1 < j, j' < k, where 8y is Kronecker's delta. Then (ii) implies L*=27r(Z•r1 1 +... +Z q,)+R'ij. +j +... +R• l k .
(21.17)
The relation (iii) follows immediately from (21.17). The last assertion is an immediate consequence of (i) and (ii) [or (21.17)]. Q.E.D. COROLLARY 21.7 Let X be a lattice random vector with characteristic function f. The set L* of periods of !fI is a lattice if and only if X is nondegenerate. Proof. This follows immediately from representation (21.17) and Lemma 21.5. Q.E.D. Let L be a lattice and {j I ,•••,Jk ) a basis of.L; that is, L=Z•,+ • • • +Z•jk .
(21.18)
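The periodicity in (ii) can be checked numerically in one dimension: for a variable supported on Z (so L = Z and L* = 2πZ), |f| is 2π-periodic. An illustrative sketch (the support points and probabilities are arbitrary choices of ours):

```python
import math
import cmath

# X supported on {-1, 0, 2} in Z = L, so L* = 2*pi*Z and |f| has period 2*pi:
# |f(t + 2*pi)| = |f(t)| for the characteristic function f(t) = E exp(itX).
support = [(-1, 0.2), (0, 0.5), (2, 0.3)]

def f(t):
    return sum(p * cmath.exp(1j * t * x) for x, p in support)

pts = [0.0, 0.37, 1.1, 2.9, -4.2]
diffs = [abs(abs(f(t + 2 * math.pi)) - abs(f(t))) for t in pts]
print(diffs)  # all numerically zero
```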
Let {η_1, …, η_k} be a dual basis, that is, ⟨ξ_j, η_{j'}⟩ = δ_{jj'}, and let L* be the lattice defined as

L* = 2π(Z·η_1 + ··· + Z·η_k).  (21.19)
Write Det(ξ_1, …, ξ_k) for the determinant of the matrix whose jth row is ξ_j = (ξ_{j1}, …, ξ_{jk}), so that Det(ξ_1, …, ξ_k) = Det((ξ_{jj'})). If {ζ_1, …, ζ_k} is another basis of L, then there exist integral matrices A = (a_{jj'}) and B = (b_{jj'}) such that ξ_j = Σ_{j'} a_{jj'}ζ_{j'} and ζ_j = Σ_{j'} b_{jj'}ξ_{j'}. Then Det A = ±1, so that Det(ξ_1, …, ξ_k) = ±Det(ζ_1, …, ζ_k). Thus the quantity defined by

det L = |Det(ξ_1, …, ξ_k)|  (21.20)

is independent of the basis and depends only on the lattice L. Also, with this definition,

det L* = (2π)^k |Det(η_1, …, η_k)| = (2π)^k / det L.  (21.21)
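The identities ⟨ξ_j, η_{j'}⟩ = δ_{jj'} and (21.21) can be verified mechanically for a concrete planar lattice; in the sketch below (our own illustration) the basis matrix A is an arbitrary choice and the 2×2 inverse is written out by hand:

```python
import math

# Basis of a lattice L = Z·xi1 + Z·xi2 in R^2 (rows of the matrix A).
A = [[2.0, 1.0],
     [0.0, 3.0]]
d = A[0][0] * A[1][1] - A[0][1] * A[1][0]          # Det(xi1, xi2)

# Dual basis eta_j, defined by <xi_j, eta_j'> = delta_{jj'}:
# rows of (A^{-1})^T for a 2x2 matrix, written out explicitly.
eta = [[ A[1][1] / d, -A[1][0] / d],
       [-A[0][1] / d,  A[0][0] / d]]

dot = lambda u, v: u[0] * v[0] + u[1] * v[1]
print([[round(dot(A[i], eta[j]), 12) for j in range(2)] for i in range(2)])

det_L = abs(d)
det_eta = eta[0][0] * eta[1][1] - eta[0][1] * eta[1][0]
det_Lstar = (2 * math.pi) ** 2 * abs(det_eta)
print(det_L, det_Lstar, (2 * math.pi) ** 2 / det_L)  # (21.21): the last two agree
```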
Consider the fundamental domain F* of L* defined as follows:

F* = {t_1η_1 + ··· + t_kη_k : |t_j| < π, 1 ≤ j ≤ k}.

… Let {X_n : n ≥ 1} be a sequence of i.i.d. lattice random vectors. We shall assume that ρ_s = E||X_1 − µ||^s is finite for some integer s ≥ 2, and write
D_n = Cov(Z_{1,n}),  l = det L,
χ_ν = νth cumulant of X_1,  χ̄_{ν,n} = νth cumulant of Z_{1,n}  (|ν| ≤ s),
y_{a,n} = n^{-1/2}(a − nµ),  ȳ_{a,n} = n^{-1/2}(a − nEY_{1,n})  (a ∈ L),
p_n(y_{a,n}) = P(X_1 + ··· + X_n = a) = P(n^{-1/2} Σ_{j=1}^n (X_j − µ) = y_{a,n}),
p̄_n(ȳ_{a,n}) = P(Y_{1,n} + ··· + Y_{n,n} = a) = P(n^{-1/2} Σ_{j=1}^n Z_{j,n} = ȳ_{a,n}),
q_{n,m} = l n^{-k/2} Σ_{r=0}^{m-2} n^{-r/2} P_r(−φ_{0,V}: {χ_ν})  (2 ≤ m ≤ s),
q̄_{n,m} = l n^{-k/2} Σ_{r=0}^{m-2} n^{-r/2} P_r(−φ_{0,D_n}: {χ̄_{ν,n}})  (2 ≤ m ≤ s).  (22.2)

THEOREM 22.1 If ρ_s < ∞ for some integer s ≥ 2, then

sup_{a∈L} (1 + ||y_{a,n}||^s) |p_n(y_{a,n}) − q_{n,s}(y_{a,n})| = o(n^{-(k+s-2)/2})  (n → ∞).  (22.4)

Also,

Σ_{a∈L} |p_n(y_{a,n}) − q_{n,s}(y_{a,n})| = o(n^{-(s-2)/2})  (n → ∞).  (22.5)
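For fair ±1 coin flips (k = 1, l = 2, V = 1, all odd cumulants zero) the leading term of q_{n,s} is l n^{-1/2}φ_{0,1}(y_{a,n}), and the flavor of (22.4) can be watched directly. The following sketch (our own illustration) rescales the sup error over lattice points by n^{1/2}:

```python
import math

phi = lambda y: math.exp(-y * y / 2.0) / math.sqrt(2.0 * math.pi)

def sup_local_error(n):
    # sqrt(n) * sup over lattice points a of
    # |P(S_n = a) - (2/sqrt(n)) * phi(a/sqrt(n))|   (k = 1, span 2, variance 1).
    worst = 0.0
    for b in range(n + 1):
        a = 2 * b - n
        p = math.comb(n, b) * 0.5 ** n
        worst = max(worst, abs(p - 2.0 / math.sqrt(n) * phi(a / math.sqrt(n))))
    return math.sqrt(n) * worst

e100, e400 = sup_local_error(100), sup_local_error(400)
print(e100, e400)  # decreasing: the rescaled local error is o(1)
```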
Proof. Let ĝ_1 denote the characteristic function of X_1 and f_n that of n^{-1/2}(X_1 + ··· + X_n − nµ). Then

f_n(t) = (ĝ_1(n^{-1/2}t))^n exp{−in^{1/2}⟨t, µ⟩},

and, inverting the Fourier series of the lattice distribution,

(iy_{a,n})^β p_n(y_{a,n}) = (2π)^{-k} l n^{-k/2} ∫_{n^{1/2}F*} exp{−i⟨t, y_{a,n}⟩} D^β f_n(t) dt,

where β is a nonnegative integral vector satisfying |β| ≤ s, and

n^{1/2}A = {n^{1/2}x : x ∈ A}  (A ⊂ R^k).  (22.8)

Also, clearly,

(iy_{a,n})^β q_{n,s}(y_{a,n}) = (2π)^{-k} l n^{-k/2} ∫_{R^k} exp{−i⟨t, y_{a,n}⟩} D^β [Σ_{r=0}^{s-2} n^{-r/2} P̃_r(it: {χ_ν}) exp{−½⟨t, Vt⟩}] dt  (|β| ≤ s).  (22.9)
Hence |y^β_{a,n}(p_n(y_{a,n}) − q_{n,s}(y_{a,n}))| ≤ … If s > k + 1, then (22.5) follows from (22.4). For the general case, we shall use truncation. First note that [see (14.111)]

Σ_{a∈L} |p_n(y_{a,n}) − p̄_n(ȳ_{a,n})| ≤ 2 Σ_{j=1}^n P(X_j ≠ Y_{j,n}) = 2nP(||X_1 − µ|| > n^{1/2}) = o(n^{-(s-2)/2})  (n → ∞).  (22.16)
Next, by Lemma 14.6, s-2 ^gn,s(Ya,n)
Qn,s(Ya.n)^
(I + IIY0,nII3r+2)
n-k/2sl(n) r=0
xexp{
—II 6 , n ll2
+ I8k '
II6a.nlI 2
1 8012
12I
} (aEL), (22.17)
where 8,(n) = o(n -(s -2) / 2 ). Now n-k/2a. (I+IIYa.n1I3r+2 )exp EL 6
=n -
k/2
I
(1+1In
{
l
—
- 1/2ajI3r+2)exp
aEL-nµ
< n - "`/ 2 sup
I h(n - '/ 2a+x)
IIxII< n-112aEL
.}
I/2
IIn
-
6
l
1
' /2a
1 2 + 11n
-
I/ 2a
1I
8k1/2 }
(=diameter of *). (22.18)
where
h(y)=(1+IIYII3r+2)exp{
—I^6Y1I2 + 8^k' 1 2 1
(yER k ). (22.19)
Let T be a nonsingular linear transformation on R k such that TL = Zk .
Then n —k/2 sup Y, h(n — ' /2a+x) IIxiIGAn ^/ 2 aEL
sup
2 h(n -1 / 2 T - 'a'+x), f h(T - y)dyn1/2/(16P 3 ))
( —) f' n(t) I di ate
< n" 2E IIZI.n Its (S' + 2p5n -(:-2)/2)" s = o(n -( -2) / 2 )
(s'=0, 1,...),
-;
(22.29)
where S' is defined by -
S'msup(Ig1(t)I:
tE
*\ (1':
IIt'll'/z. (24.1) {Ilxll — a&')
Combining (24.4)–(24.8) one arrives at

αδ_0 ≤ ∫ M_f(·: αε) d|(µ − ν)*K_ε| + α ∫ ω_f(·: 2αε) dν' + ∫ ω_f(·: αε) d|(µ − ν)*K_ε| + δ_1(1 − α)
+ (sup_{x∈R^k} ∫ (|f(x + y)| + ω_f(x + y: αε)) |µ − ν|(dy)) ∫_{{||x|| > αε'}} K_ε(dx).  (24.9)
In case δ_0 = −∫ f d(µ − ν), the above computations are applied to −f (in place of f). Thus in all cases one obtains

αδ_0 ≤ θ_1(ε) + αθ_2(ε) + cθ_3(ε) + δ_1(1 − α),  (24.10)
Two Recent Improvements
where

θ_1(ε) = ∫ (|f| + 2ω_f(·: αε)) d|(µ − ν)*K_ε|,
θ_2(ε) = ∫ ω_f(·: 2αε) dν',
θ_3(ε) = ∫_{{||x|| > αε'}} K_ε(dx),
c = sup_{x∈R^k} ∫ (|f(x + y)| + ω_f(x + y: αε)) |µ − ν|(dy).  (24.11)
Now define

δ_j = sup_{||x|| ≤ jαε'} |∫ f(x + y)(µ − ν)(dy)|  (j = 1, 2, …).  (24.12)

We are going to establish a relation such as (24.10) between δ_j and δ_{j+1} for all j = 1, 2, …. Fix j. Let η be a number satisfying 0 < η < δ_j. (Assume δ_j > 0; else, there is nothing to prove.) Once again suppose

δ_j = sup_{||x|| ≤ jαε'} ∫ f(x + y)(µ − ν)(dy).  (24.13)

There exists x_j such that ||x_j|| ≤ jαε' and

∫ f(x_j + y)(µ − ν)(dy) > δ_j − η.  (24.14)
Smoothing Inequality
Then
j J M, (x + y + x) (. — v) (dy) K (dx) E
1
= (M, (x + y + x) — f (x + j
j
y)) (u — v) (dy) K. (dx)
+ f (x, + y) (µ — v) (dy) KE (dx) = I.j + I +
131,
(24.15)
where I,j , I, I j are the integrals (w.r.t. K. (dx)) over [ Uxa — — a w,
(xj
+ y : 2as) v (dy) + a (Sj — -q) ,
1.► >— — J w,,j (.: as) d I (u — v) * KE
I —8
j. 1
(1 — a) ,
I I,j 15 c0,(e).
(24.16)
Since η > 0 may be taken arbitrarily small, one has αδ_j ≤ … Then the inequality (24.21) holds with δ_0 given by (24.2) and γ_∞ defined by (24.19), (24.20).
As a consequence of this strengthening of Lemma 11.4 one may obtain the following improvement of Theorems 15.1, 15.4. The notation below is that of Section 15, unless stated otherwise. The positive constants d depend only on k, s, r and are suitably chosen.

THEOREM 24.2. Under the hypothesis of Theorem 15.1 one has, for each positive integer j,

|∫ f d(Q_n − Φ')| ≤ M_s(f)(d_1 n^{-(s-2)/2} Δ*_{s,n} + d_{2,j}(ρ_3/n^{1/2})^j) + d_3 ω̄_f(d_4 ρ_3 n^{-1/2} : Φ),  (24.22)

where the quantities on the right are as defined in the statement of Theorem 15.1.
Proof The proof of Theorem 15.1 remains unchanged upto (and including) relation (15.30). Now apply Lemma 24.1 with ' µ (dy) = (1 + ly I 0) Q (dy), v (dy) = (1 + Ily u '°) ' (dy) ,
(24.23)
and KE satisfying (15.25) and
1yV
K (dy) < oo
J (24.24)
Let (See (15.25), (15.33) )
a = 1, s = 16c 9 Q,n, e' =
m = [E , _ ½ ] , (24.25)
and note that a ^ 3/4 (See (15.25) ). Instead of (15.31) one then obtains the relation
I
g.. (x) (I + lxu `°) (Q.' — 0') (dx)
< 27.. +
(!
—
a
)^
a., (24.26)
By (15.48),
81 ^, s d6 M, (f) I (U — is) .K6 1 s d, M, (f)A;,n'c' :>i: -
(24.27)
To estimate 82,,. write t for the density of ' and note that
(y : 2a6) v (dy) = J w, (a. + x + y : 2a6) (1 + lyu ro ) E ` (y) dy
= j cot (z : 2at) (1 + liz — a„ — xli '°) ' (z — a. — x) dz
= J w, (z : 2as) (1 + az1 '°) ^ ` (z) dz
+ j w, (z : 2as) j (1 + liz — a„ — xli '0 ) E ` (z — a„ — x)
— (1 + lizIir°)E`(z)Idz.
(24.28)
For lixl < mae' = ms' < s "½ _ ; 4 one may show, as in the proof of Lemma 14.6 (especially see (14.81) — (14.86)) ,
(1 + liz — a. — xli ' 0 ) t + (z — a. — x) — (1 + Ilzll ' 0 ) E' (z)
s 1 liz — a. — xu '° — lizu '° 1 ^' (z) + (1 + liz — an — xli '0)
I E'(z—a„—x) — + (z) I < d8 (s% + lia.li ) (I + Uz11 '0- ' ) t + (z)
+ d9 (e% + aa„II ) (1 + lid '0+1 ) exp ( ½ (e% + lia„II )'
+ (s'" + Ila„II ) lid — ½ Uzu 2 I
s d, o (1 + Id
'O-1
) E (z) + d 11 (1 + Id '0 ) exp ( — '/2 Nz1 2
if IzI s d%2 ( + Na.N )'' ,
s d, ° (I + Nz I '*-') t ` (z) + d,:,,s' exp I — Id 2 / 4 }
if Nzd > d 12 (&4 + Na„N) -' (24.29)
Hence
82 s dt, ,E
J w, (z : 26) (1 + NzI '°) E (z) dz + d,4, jm,(f)&J. (24.30)
Further, by Lemma 14.3 (or (14.93) ), and inequalities (14.12), (14.81) (also see (15.28) ), one has
NU
= I (1 + Nyl ' ) Q. (dy) s d,5,
UPI =
0
J (1 + NyI'° )) E(y) I dy s d,6.
(24.31)
These lead to the estimate
c =
sup
J(g
+ y) ( + w, (x + y: ac)) µ — v I (dy)
x E R'
5 d17 M,(f)(liO + IvU) A)
I (1 — o) a2 (& (t) — t. (t)) (NtN>A)
= o(n^{-(s-2)/2}) + o(n^{-(s/2 - k/4)}).  (25.12)
Expectations of Smooth Functions
In case s is an odd integer use truncation and carry out the above computations with s + 1 replacing s. Q.E.D.
For some purposes the result of Gotze and Hipp (1978) is somewhat better and, therefore, let us state it here.
THEOREM 25.2. Assume that ρ_s < ∞ for some integer s ≥ 3. If (i) D^α f is continuous for |α| ≤ s − 2, (ii) (1 + ||x||²)^{-s/2}|f(x)| is bounded above, and (iii) D^α f has at most a polynomial growth at infinity for |α| = s − 2, then

∫ f d(Q_n − Ψ_n) = o(n^{-(s-2)/2}).  (25.13)
In order to make a simple comparison between the two theorems above, assume ρ_s < ∞ for all integers s > 0. If f is m-times continuously differentiable and its derivatives of order m are polynomially bounded, then Theorem 25.2 provides an asymptotic expansion of ∫ f dQ_n with an error o(n^{-m/2}). Theorem 25.1 on the other hand gives a larger error o(n^{-(m/2 - k/4)}). However, there are functions in W^{m,2} which are not m-times continuously differentiable. In general, all that can be said is: if g ∈ W^{m,2} then g has continuous derivatives of order α for all α satisfying |α| < m − k/2.† Thus there are functions f for which Theorem 25.1 provides a sharper result. Finally, let us mention the recent monographs by Hall (1982) and Sazonov (1981) on the subject matter of this monograph.
† See Reed, M. and Simon, B. [1], Theorem IX.24, p. 52.
Chapter 7
An Application of Stein's Method In this section, we first present a brief outline of a method of approximation due to Stein (1986), which is, in general, not Fourier analytic. This is followed by a detailed derivation of the Berry—Esseen bound for convex sets obtained by Gotze (1991), who used Stein's method.
26.
AN EXPOSITION OF GOTZE'S ESTIMATION OF
THE RATE OF CONVERGENCE IN THE MULTIVARIATE CENTRAL LIMIT THEOREM In his article Gotze (1991) used Stein's method to provide an ingenious derivation of the Berry—Esseen-type bound for the class of Borel convex subsets of R' in the context of the classical multivariate central limit theorem. This approach has proved fruitful in deriving error bounds for the CLT under certain structures of dependence as well (see Rinott and Rotar 260
(1996)). Our view and elaboration of Gotze's proof follow Bhattacharya and Holmes (2010) and were first presented in a seminar at Stanford given in the summer of 2000. The authors wish to thank Persi Diaconis for pointing out the need for a more readable account of Gotze's result than that given in his original work. Raič (2004) has followed essentially the same route as Gotze, but in greater detail, in deriving Gotze's bound. It may be pointed out that we are unable to verify the derivations of the dimensional dependence O(k) in Gotze (1991) and Raič (2004). Our derivation provides a higher-order dependence of the error rate on k, namely, O(k^{5/2}). This rate can be reduced using an inequality of Ball (1993). The best order of dependence known, namely, O(k^{1/4}), is given by Bentkus (2003), using a different method, which would be difficult to extend to dependent cases. As a matter of notation, the constants c, with or without subscripts, are absolute constants. The k-dimensional standard Normal distribution is denoted by N(0, I_k) as well as Φ, with density φ.
26.1 The generator of the ergodic Markov process as a Stein operator

Suppose Q and Q_0 are two probability measures on a measurable space (S, 𝒮) and h is integrable (with respect to both Q and Q_0). Consider the problem of estimating

Eh − E_0h ≡ ∫ h dQ − ∫ h dQ_0.  (26.1.1)
A basic idea of Stein (1986) (developed in some examples in Diaconis and Holmes (2004) and Holmes (2004)) is
(i) to find an invertible map L which maps "nice" functions on S into the kernel, or null space, of E_0, (ii) to find a perturbation of L, say L_a, which maps "nice" functions on S into the kernel of E, and (iii) to estimate (26.1.1) using the identity

Eh − E_0h = ELg_0 = E(Lg_0 − L_a g_a),  (26.1.2)

where

g_0 = L^{-1}(h − E_0h),  g_a = L_a^{-1}(h − Eh).
In the present application, instead of finding a perturbation L_a of L, one obtains a smooth perturbation T_t h, say, of h, and applies the first relation in (26.1.2) to T_t h rather than h. Writing ψ_t = L^{-1}(T_t h − E_0T_t h) in place of g_0 above, one then estimates ELψ_t = ET_t h − E_0T_t h. Finally, the extent of the perturbation due to smoothing, (ET_t h − E_0T_t h) − (Eh − E_0h), is estimated. One way to find L is to consider an ergodic Markov process {X_t : t ≥ 0} on S which has Q_0 as its invariant distribution, and let L be its generator:
t oTtgt
-
g 2 g 6.1.3 E Dc,, ) (
where the limit is in L 2 (S, Qo) , and (Ttg)(x) = E ]g(Xt)IXo = x] , or, in terms of the transitions probability p(t; x, dy) of the Markov process {Xt :t>0}, (Tt g)(x)
= Js
g(y)p(t; x, dy)
(x E S, t > 0).
(26.1.4)
Also, D_L is the set of g for which the limit in (26.1.3) exists. By the Markov (or semigroup) property, T_{t+s} = T_tT_s = T_sT_t, so that

(d/dt) T_t g = lim_{s↓0} (T_{t+s}g − T_t g)/s = lim_{s↓0} T_t(T_s g − g)/s = T_t Lg.  (26.1.5)

Since T_tT_s = T_sT_t, T_t and L commute, and

(d/dt) T_t g = LT_t g.  (26.1.6)
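As a sanity check on the semigroup property T_{t+s} = T_tT_s used above, the one-dimensional Ornstein–Uhlenbeck kernel discussed later in this chapter, X_t | X_0 = x ~ N(e^{-t}x, 1 − e^{-2t}), composes exactly; the particular values of t, s, x below are arbitrary choices of ours:

```python
import math

# Chapman-Kolmogorov / semigroup check for the OU transition kernel (k = 1):
# X_t | X_0 = x  ~  N(e^{-t} x, 1 - e^{-2t}).  Composing a step of length t
# with a step of length s must reproduce the step of length t + s.
def compose(t, s, x):
    m1, v1 = math.exp(-t) * x, 1.0 - math.exp(-2.0 * t)
    # Conditionally on X_t, the next step is N(e^{-s} X_t, 1 - e^{-2s});
    # a Gaussian mixture of Gaussians with linear mean is Gaussian with:
    m = math.exp(-s) * m1
    v = math.exp(-2.0 * s) * v1 + (1.0 - math.exp(-2.0 * s))
    return m, v

t, s, x = 0.3, 0.7, 1.5
m, v = compose(t, s, x)
print(m, math.exp(-(t + s)) * x)          # equal
print(v, 1.0 - math.exp(-2.0 * (t + s)))  # equal
```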
Note that invariance of Q_0 means ET_t g(X_0) = Eg(X_0) = ∫ g dQ_0 if the distribution of X_0 is Q_0. This implies that, for every g ∈ D_L, ELg(X_0) = 0, or

∫_S Lg(x) dQ_0(x) = 0,

[ELg(X_0) = E(lim_{t↓0} (T_t g(X_0) − g(X_0))/t) = lim_{t↓0} (ET_t g(X_0) − Eg(X_0))/t = 0.]

That is, L maps D_L into the set 1^⊥ of mean-zero functions in L²(S, Q_0). It is known that the range of L is dense in 1^⊥, and if L has a spectral gap, then the range of L is all of 1^⊥. In the latter case L^{-1} is well defined on 1^⊥ and is bounded on it (Bhattacharya (1982)). Since T_t converges to the identity operator as t ↓ 0, one may also use T_t for small t > 0 to smooth the target function h̄ = h − ∫ h dQ_0. For the case of a diffusion {X_t : t ≥ 0}, L is a differential operator, and even nonsmooth functions such as h̄ = 1_B − Q_0(B) (h = 1_B) are immediately made smooth by applying T_t. One may then use the approximation to h̄ given by

T_t h̄ = L(L^{-1}T_t h̄) = Lψ_t, with ψ_t = L^{-1}T_t h̄,  (26.1.7)

and then estimate the error of this approximation by a "smoothing inequality," especially if T_t h̄ may be represented as a perturbation by convolution.
Chapter 7. An Application of Stein's Method
264
For several perspectives and applications of Stein's method, see Barbour (1988), Diaconis and Holmes (2004), Holmes (2004), and Rinott and Rotar (1996).
1(a) The Ornstein–Uhlenbeck Process and Its Gaussian Invariant Distribution

The Ornstein–Uhlenbeck (OU) process is governed by the Langevin equation (see, e.g., Bhattacharya and Waymire (2009), pp. 476, 597, 598)
dX_t = −X_t dt + √2 dB_t,  (26.1.8)

where {B_t : t ≥ 0} is a k-dimensional standard Brownian motion. Its transition density is

p(t; x, y) = Π_{i=1}^k [2π(1 − e^{-2t})]^{-1/2} exp{−(y_i − e^{-t}x_i)² / (2(1 − e^{-2t}))},
x = (x_1, …, x_k),  y = (y_1, …, y_k).  (26.1.9)
This is the density of a Gaussian (Normal) distribution with mean vector e^{−t}x and dispersion matrix (1 − e^{−2t}) I_k, where I_k is the k × k identity matrix. One can check (e.g., by direct differentiation) that the Kolmogorov backward equation holds:

∂p(t; x, y)/∂t = Σ_{i=1}^k ∂²p(t; x, y)/∂x_i² − Σ_{i=1}^k x_i ∂p(t; x, y)/∂x_i = Δ_x p − x·∇_x p = Lp,  with L = Δ − x·∇,   (26.1.10)
where Δ is the Laplacian and ∇ = grad. Integrating both sides against h(y) dy, we see that T_t h(x) = ∫ h(y) p(t; x, y) dy satisfies

(∂/∂t) T_t h(x) = Δ T_t h(x) − x·∇ T_t h(x) = L T_t h(x),  ∀ h ∈ L²(ℝ^k, Φ).   (26.1.11)
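The transition (26.1.9) can be sampled exactly, which also exhibits N(0, I_k) as the invariant distribution (a one-dimensional sketch of ours):

```python
import random
from math import exp, sqrt

def ou_step(x, t, rng):
    """One exact transition of the 1-d OU process dX = -X dt + sqrt(2) dB:
    X_t | X_0 = x  ~  N(e^{-t} x, 1 - e^{-2t})   (cf. (26.1.9) with k = 1)."""
    return exp(-t) * x + sqrt(1.0 - exp(-2.0 * t)) * rng.gauss(0.0, 1.0)

rng = random.Random(0)
# start from the invariant law N(0,1) and evolve: the law is preserved
xs = [ou_step(rng.gauss(0.0, 1.0), t=0.7, rng=rng) for _ in range(200_000)]
m = sum(xs) / len(xs)
v = sum(x * x for x in xs) / len(xs) - m * m
print(round(m, 2), round(v, 2))  # empirical mean and variance close to 0 and 1
```

The invariance is just the variance identity e^{−2t} · 1 + (1 − e^{−2t}) = 1 for the transition started from N(0, 1).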
Now on the space L²(ℝ^k, Φ) (where Φ = N(0, I_k) is the k-dimensional standard Normal distribution), L is self-adjoint and has a spectral gap, with the eigenvalue 0 corresponding to the invariant distribution Φ (or the constant function 1 in L²(ℝ^k, Φ)). This may be deduced from the fact that the Normal density p(t; x, y) (with mean vector e^{−t}x and dispersion matrix (1 − e^{−2t}) I_k) converges to the standard Normal density φ(y) exponentially fast as t → ∞, for every initial state x. Alternatively, one can compute the set of eigenvalues of L, namely {0, −1, −2, ...}, with eigenfunctions expressed in terms of Hermite polynomials (Bhattacharya and Waymire, 2009, page 487). In particular, L^{−1} is a bounded operator on 1^⊥ and is given by
L^{−1} h(x) = − ∫₀^∞ T_s h(x) ds,  ∀ h = h̄ = h − ∫ h dΦ ∈ L²(ℝ^k, Φ).   (26.1.12)
To check this, note that by (26.1.11),

h(x) = − ∫₀^∞ (∂/∂s) T_s h(x) ds = − ∫₀^∞ L T_s h(x) ds = L( − ∫₀^∞ T_s h(x) ds ).   (26.1.13)
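Relation (26.1.13) and the formula (26.1.12) for L^{−1} can be checked numerically in one dimension for the test function h(x) = x² − 1, which has Φ-mean zero (our own choice of example; for this h, T_s h(x) = e^{−2s}(x² − 1) in closed form):

```python
from math import exp

def psi(x, n=20_000, smax=20.0):
    """psi(x) = -int_0^infty T_s h(x) ds for h(x) = x^2 - 1, where
    T_s h(x) = E[(e^{-s}x + sqrt(1-e^{-2s})Z)^2 - 1] = e^{-2s}(x^2 - 1);
    the s-integral is computed by the trapezoid rule."""
    ds = smax / n
    total = 0.0
    for i in range(n + 1):
        w = 0.5 if i in (0, n) else 1.0
        total += w * exp(-2.0 * i * ds) * (x * x - 1.0)
    return -total * ds

def L(f, x, eps=1e-4):
    # generator of the 1-d OU process: L f = f'' - x f', by central differences
    f0, fp, fm = f(x), f(x + eps), f(x - eps)
    return (fp - 2.0 * f0 + fm) / (eps * eps) - x * (fp - fm) / (2.0 * eps)

for x in (0.0, 0.8, 1.5):
    print(round(L(psi, x), 3), round(x * x - 1.0, 3))  # L psi recovers h
```

Here psi agrees with the closed form −(x² − 1)/2, and applying L = d²/dx² − x d/dx returns h, as (26.1.13) asserts.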
For our purposes h = 1_C, the indicator function of a Borel convex subset C of ℝ^k. A smooth approximation of h̄ is T_t h̄ for small t > 0 (since T_t h̄ is infinitely differentiable). Also, by (26.1.12),

ψ_t(x) ≡ L^{−1} T_t h̄(x) = − ∫₀^∞ T_s T_t h̄(x) ds   (26.1.14)
  = − ∫₀^∞ T_{s+t} h̄(x) ds = − ∫_t^∞ T_s h̄(x) ds
  = − ∫_t^∞ { ∫_{ℝ^k} h̄(e^{−s}x + √(1 − e^{−2s}) z) φ(z) dz } ds,
where φ is the k-dimensional standard Normal density. We have expressed T_s h̄(x) = E[h̄(X_s) | X₀ = x] in (26.1.14) as

E[h̄(X_s) | X₀ = x] = E h̄(e^{−s}x + √(1 − e^{−2s}) Z), where Z is a standard Normal N(0, I_k),   (26.1.15)

for X_s has the same distribution as e^{−s}x + √(1 − e^{−2s}) Z. Now note that, using (26.1.14), one may write

T_t h̄(x) = L(L^{−1} T_t h̄(x)) = Δ(L^{−1} T_t h̄(x)) − x·∇(L^{−1} T_t h̄(x)) = Δψ_t(x) − x·∇ψ_t(x).   (26.1.16)

For the problem at hand (see (26.1.1)), Q₀ = Φ and Q = Q^{(n)} is the distribution of

S_n = (1/√n)(Y₁ + Y₂ + ⋯ + Y_n) = X₁ + X₂ + ⋯ + X_n  (X_j = Y_j/√n),

where Y₁, Y₂, ..., Y_n are i.i.d. mean-zero random vectors with covariance matrix I_k and finite absolute third moment

ρ₃ = E‖Y₁‖³ = E( Σ_{i=1}^k (Y₁^{(i)})² )^{3/2} < ∞.

We want to estimate

E h̄(S_n) = E h(S_n) − ∫ h dΦ   (26.1.17)

for h = 1_C, C ∈ 𝒞, the class of all Borel convex sets in ℝ^k.
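The quantity (26.1.17) can be approximated by simulation for a concrete choice of C and of the law of Y₁ (an illustrative sketch of ours; the uniform law on [−√3, √3]² has mean zero and covariance I₂):

```python
import random
from math import exp, sqrt

rng = random.Random(42)
k, n, reps, r = 2, 50, 10_000, 1.0

def S_n():
    # S_n = (Y_1 + ... + Y_n)/sqrt(n), with Y_i uniform on [-sqrt(3), sqrt(3)]^2,
    # so that E Y = 0 and Cov Y = I_2
    s = [0.0] * k
    for _ in range(n):
        for i in range(k):
            s[i] += rng.uniform(-sqrt(3.0), sqrt(3.0))
    return [c / sqrt(n) for c in s]

# C = closed ball of radius r; Phi(C) = P(||Z|| <= r) = 1 - exp(-r^2/2) for k = 2
hit = sum(1 for _ in range(reps) if sum(c * c for c in S_n()) <= r * r)
mc = hit / reps
phiC = 1.0 - exp(-r * r / 2.0)
print(abs(mc - phiC) < 0.05)  # True: the two probabilities are already close at n = 50
```

The simulation only illustrates the size of (26.1.17) for one convex set; the analysis below bounds the supremum over all of 𝒞.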
For this we first estimate (see (26.1.16)), for small t > 0,

E T_t h̄(S_n) = E[ Δψ_t(S_n) − S_n·∇ψ_t(S_n) ].   (26.1.18)

This is done in subsection 26.3. The next step is to estimate, for small t > 0,

E T_t h̄(S_n) − E h̄(S_n),   (26.1.19)

which is carried out in subsection 26.4. Combining the estimates of (26.1.18) and (26.1.19), with a suitable choice of t > 0, one arrives at the desired estimation of (26.1.17). We will write
δ_n = sup_{h = 1_C : C ∈ 𝒞} | ∫ h dQ^{(n)} − ∫ h dΦ |.   (26.1.20)

26.2 Derivatives of ψ_t = L^{−1} T_t h̄

Before we engage in the estimation of (26.1.18) and (26.1.19), it is useful to compute certain derivatives of ψ_t. Let

D_i = ∂/∂x_i,  D_{ii′} = ∂²/(∂x_i ∂x_{i′}),  D_{ii′i″} = ∂³/(∂x_i ∂x_{i′} ∂x_{i″}),  etc.
Then, using (26.1.14),

D_i ψ_t(x) = − ∫_t^∞ ∫_{ℝ^k} h̄(y) [2π(1 − e^{−2s})]^{−k/2} ( e^{−s}(y_i − e^{−s}x_i) / (1 − e^{−2s}) ) exp{ −‖y − e^{−s}x‖² / (2(1 − e^{−2s})) } dy ds   (26.2.1)
  = − ∫_t^∞ ( e^{−s} / √(1 − e^{−2s}) ) [ ∫_{ℝ^k} h̄(e^{−s}x + √(1 − e^{−2s}) z) z_i φ(z) dz ] ds,

using the change of variables z = (y − e^{−s}x)/√(1 − e^{−2s}) and the identity

z_i φ(z) = − (∂/∂z_i) φ(z) = − D_i φ(z).
In the same manner, one has, using D_{z_i}, D_{z_i z_{i′}}, etc. for the derivatives ∂/∂z_i, ∂²/(∂z_i ∂z_{i′}), etc.,

D_{ii′} ψ_t(x) = − ∫_t^∞ ( e^{−s} / √(1 − e^{−2s}) )² [ ∫_{ℝ^k} h̄(e^{−s}x + √(1 − e^{−2s}) z) · D_{ii′} φ(z) dz ] ds,   (26.2.2)

D_{ii′i″} ψ_t(x) = − ∫_t^∞ ( e^{−s} / √(1 − e^{−2s}) )³ [ ∫_{ℝ^k} h̄(e^{−s}x + √(1 − e^{−2s}) z) · (−D_{ii′i″} φ(z)) dz ] ds.
The following estimate is used in the next section:

sup_{u ∈ ℝ^k} | ∫_{ℝ^k} ∫_{ℝ^k} h̄( √((n−1)/n) e^{−s}x + e^{−s}u + √(1 − e^{−2s}) z ) φ(x) D_{ii′i″} φ(z) dx dz | ≤ c₀ k e^{2s}(1 − e^{−2s}).   (26.2.3)

To prove this, write a = √(n/(n−1)) e^{s} √(1 − e^{−2s}) and change variables x → y = x + az.
Then

φ(x) = φ(y − az) = φ(y) − az·∇φ(y) + a² Σ_{r,r′=1}^k z_r z_{r′} ∫₀¹ (1 − v) D_{rr′} φ(y − vaz) dv,   (26.2.4)
so that

h̄( √((n−1)/n) e^{−s}x + e^{−s}u + √(1 − e^{−2s}) z ) = h̄( √((n−1)/n) e^{−s}y + e^{−s}u ),

and the double integral in (26.2.3) becomes

∫_{ℝ^k} ∫_{ℝ^k} h̄( √((n−1)/n) e^{−s}y + e^{−s}u ) [ φ(y) − az·∇φ(y) + a² Σ_{r,r′=1}^k z_r z_{r′} ∫₀¹ (1 − v) D_{rr′} φ(y − vaz) dv ] D_{ii′i″} φ(z) dz dy.   (26.2.5)
Note that the integrals of D_{ii′i″} φ(z) and z_j D_{ii′i″} φ(z) over ℝ^k vanish for all i, i′, i″, and j, so that

∫_{ℝ^k} ∫_{ℝ^k} h̄( √((n−1)/n) e^{−s}y + e^{−s}u ) ( φ(y) − az·∇φ(y) ) D_{ii′i″} φ(z) dz dy = 0.   (26.2.6)
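The vanishing of these integrals can be checked numerically in one dimension (an illustrative sketch of ours, using the closed form φ‴(z) = (3z − z³)φ(z)):

```python
from math import exp, pi, sqrt

def phi3(z):
    # third derivative of the standard Normal density:
    # phi'(z) = -z phi(z), phi''(z) = (z^2 - 1) phi(z), phi'''(z) = (3z - z^3) phi(z)
    return (3.0 * z - z ** 3) * exp(-z * z / 2.0) / sqrt(2.0 * pi)

def trapz(f, a, b, n=200_000):
    # composite trapezoid rule on [a, b]
    h = (b - a) / n
    return h * (0.5 * f(a) + 0.5 * f(b) + sum(f(a + i * h) for i in range(1, n)))

i0 = trapz(lambda z: phi3(z), -10.0, 10.0)          # integral of D_{iii} phi
i1 = trapz(lambda z: z * phi3(z), -10.0, 10.0)      # integral of z D_{iii} phi
print(abs(i0) < 1e-6, abs(i1) < 1e-6)  # True True
```

Both vanish because each is, after integration by parts, an integral of a lower derivative of φ over the whole line.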
The magnitude of the last term on the right in (26.2.4) is

| a² ∫₀¹ (1 − v) [ Σ_{r,r′=1}^k z_r z_{r′} (y − vaz)_r (y − vaz)_{r′} − Σ_{r=1}^k z_r² ] φ(y − vaz) dv |   (26.2.7)
  ≤ a² ∫₀¹ (1 − v) [ Σ_{r,r′=1}^k | z_r z_{r′} (y − vaz)_r (y − vaz)_{r′} | + Σ_{r=1}^k z_r² ] φ(y − vaz) dv,

using D_{rr′} φ(y) = (y_r y_{r′} − δ_{rr′}) φ(y).

Recall that (see (26.1.15))

T_t h(x) = E h(e^{−t}x + √(1 − e^{−2t}) Z),

where Z has the standard Normal distribution
= N(0, I_k), which we take to be independent of S_n. Then

E T_t h(S_n) = E h(e^{−t}S_n + √(1 − e^{−2t}) Z)
  = ∫_{ℝ^k} ∫_{ℝ^k} h(e^{−t}x + √(1 − e^{−2t}) z) dQ^{(n)}(x) φ(z) dz   (26.4.1)
  = ∫_{ℝ^k} h d( (Q^{(n)})_{e^{−t}} * Φ_{√(1−e^{−2t})} )
  = ∫_{ℝ^k} h̄ d( [ (Q^{(n)})_{e^{−t}} − Φ_{e^{−t}} ] * Φ_{√(1−e^{−2t})} )

(here, for a probability measure μ and b > 0, μ_b denotes the distribution of bX when X has distribution μ; in particular, Φ_b = N(0, b²I_k)). The introduction of the extra term Φ_{e^{−t}} * Φ_{√(1−e^{−2t})} = Φ does not affect the integration in the last step, since ∫ h̄ dΦ = 0. Since the last integration is with respect to the difference between two probability measures, its value is unchanged if we replace h̄ by h. Hence

E T_t h̄(S_n) = ∫_{ℝ^k} h d( [ (Q^{(n)})_{e^{−t}} − Φ_{e^{−t}} ] * Φ_{√(1−e^{−2t})} ).   (26.4.2)
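The identity Φ_{e^{−t}} * Φ_{√(1−e^{−2t})} = Φ rests on the addition of Gaussian variances, e^{−2t} + (1 − e^{−2t}) = 1; a one-dimensional Monte Carlo sketch of ours (the choice t = 0.3 is arbitrary):

```python
import random
from statistics import NormalDist
from math import exp, sqrt

rng = random.Random(7)
t = 0.3
a, b = exp(-t), sqrt(1.0 - exp(-2.0 * t))   # the two component standard deviations

# X ~ N(0, a^2) and Y ~ N(0, b^2) independent  =>  X + Y ~ N(0, a^2 + b^2) = N(0, 1),
# i.e. Phi_{e^{-t}} * Phi_{sqrt(1-e^{-2t})} = Phi in one dimension
sample = sorted(a * rng.gauss(0, 1) + b * rng.gauss(0, 1) for _ in range(100_000))
n = len(sample)
nd = NormalDist()
ks = max(max((i + 1) / n - nd.cdf(x), nd.cdf(x) - i / n) for i, x in enumerate(sample))
print(ks < 0.01)  # True: Kolmogorov distance of the convolution to N(0,1) is tiny
```

The same variance bookkeeping underlies the k-dimensional statement, coordinate by coordinate.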
Also, the class 𝒞 of Borel convex sets is invariant under the maps C → bC, where b > 0 is given. Therefore,

δ_n = sup_{h ∈ ℋ} | E h̄(S_n) | = sup_{h ∈ ℋ} | ∫ h d(Q^{(n)} − Φ) | = sup_{h ∈ ℋ} | ∫ h d[ (Q^{(n)})_{e^{−t}} − Φ_{e^{−t}} ] |,   (26.4.3)

where ℋ = {1_C : C ∈ 𝒞}. Thus (26.4.2) is a perturbation (or smoothing) of the integral in (26.4.3) by convolution with Φ_{√(1−e^{−2t})}. If ε > 0 is a constant such that Φ_{√(1−e^{−2t})}({‖z‖ ≤ ε}) …
({izI 1 and an absolute constant c > 1 specified below. Note that (26.4.13) clearly holds for n < c 2 k 5 p3. Since c 2 k 5 p3 > k
,
that (26.4.13) holds for some n = n o > k 8 . Then under the induction hypothesis, and (26.4.12), and using n o co.. ck 4+4 P3
> 2(no 1) 14 , one obtains
C7k3/2P3
ago+1 (no(no + 1)) + (no + 1) cio. ck P3
C7k5/2P3
(n o + 1) + 2o(n o + 1)2 \ 1 (c lo = 2c 9 , k < k -o < 2 -9 for k >2 I .
no+ 1 —
JJJ
(26.4.14)
Now, choose c to be the greater of 1 and the positive solution of c = clo/+c72 -9 , to check that (26.4.13) holds for n = no+1. Hence (26.4.13) holds for all n. We have proved the following result.
Theorem 1. There exists an absolute constant c > 0 such that

δ_n ≤ c k^{5/2} ρ₃ / √n   (n ≥ 1).   (26.4.15)
26.5 The Non-Identically Distributed Case

For the general case considered in Götze (1991), the X_j's (1 ≤ j ≤ n) are independent with zero means and Σ_{j=1}^n Cov X_j = I_k. Assume

β₃ = Σ_{j=1}^n E‖X_j‖³ < ∞.

The analogue of (26.4.13) is

δ_n ≤ c k^{5/2} β₃.   (26.5.8)

If c k^{5/2} β₃ > 1 this holds trivially; otherwise β₃ may be assumed to be smaller than or equal to c^{−1} k^{−5/2}, and (1 − β₃)^{−1} ≤ (1 − c^{−1})^{−1} = c′. The
induction argument is similar.

Remark. If one defines

γ₃ = Σ_{j=1}^n E( Σ_{i=1}^k |X_j^{(i)}| )³,   (26.5.9)

then

Σ_{j=1}^n Σ_{i,i′,i″=1}^k E| X_j^{(i)} X_j^{(i′)} X_j^{(i″)} | = γ₃.   (26.5.10)

Since γ₃ now replaces k^{3/2} β₃ in the computations, it follows that δ_n ≤ c k γ₃. Since γ₃ ≤ k^{3/2} β₃, (26.5.10) provides a better bound than (26.5.8) or (26.4.13).
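The comparison γ₃ ≤ k^{3/2} β₃ rests on the Cauchy–Schwarz bound (Σᵢ|xᵢ|)³ ≤ k^{3/2}‖x‖³ applied vector by vector; a quick numerical sketch of ours:

```python
import random

def holds(x):
    # Cauchy-Schwarz: sum_i |x_i| <= sqrt(k) ||x||,
    # hence (sum_i |x_i|)^3 <= k^{3/2} ||x||^3
    k = len(x)
    l1 = sum(abs(c) for c in x)
    l2 = sum(c * c for c in x) ** 0.5
    return l1 ** 3 <= (k ** 1.5) * (l2 ** 3) + 1e-9  # tolerance for rounding

rng = random.Random(1)
trials = [[rng.uniform(-5.0, 5.0) for _ in range(rng.randint(1, 10))]
          for _ in range(1000)]
print(all(holds(x) for x in trials))  # True
```

Summing the inequality over j = 1, ..., n and taking expectations gives γ₃ ≤ k^{3/2} β₃.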
Bibliography

Ball, K. (1993). The reverse isoperimetric problem for Gaussian measure. Discrete Comput. Geom., 10(4):411-420.

Barbour, A. D. (1988). Stein's method and Poisson process convergence. In A Celebration of Applied Probability (Journal of Applied Probability, Volume 25A), pages 175-184.

Bentkus, V. (2003). On the dependence of the Berry-Esseen bound on dimension. J. Statist. Plann. Inference, 113(2):385-402.

Bhattacharya, R. (1982). On the functional central limit theorem and the law of the iterated logarithm for Markov processes. Z. Wahrsch. Verw. Gebiete, 60(2):185-201.

Bhattacharya, R. and Holmes, S. (2010). An exposition of Gotze's estimation of the rate of convergence in the multivariate central limit theorem. Technical report, Stanford University, Stanford, CA. http://arxiv.org/abs/1003.4254.

Bhattacharya, R. and Waymire, E. C. (2009). Stochastic Processes with Applications. Classics Appl. Math. 61, SIAM, Philadelphia.
Diaconis, P. and Holmes, S., editors (2004). Stein's Method: Expository Lectures and Applications. IMS Lecture Notes Monogr. Ser. 46, Inst. Math. Statist., Beachwood, OH.

Gotze, F. (1991). On the rate of convergence in the multivariate CLT. The Annals of Probability, 19:724-739.

Holmes, S. (2004). Stein's method for birth and death chains. In Stein's Method: Expository Lectures and Applications, IMS Lecture Notes Monogr. Ser. 46, Inst. Math. Statist., Beachwood, OH, pp. 45-68.

Raic, M. (2004). A multivariate CLT. Personal communication.

Rinott, Y. and Rotar, V. (1996). A multivariate CLT for local dependence with n^{-1/2} log n rate and applications to multivariate graph related statistics. J. Multivariate Anal., 56(2):333-350.

Stein, C. (1986). Approximate Computation of Expectations. Inst. Math. Statist., Beachwood, OH.
Appendix

A.1 RANDOM VECTORS AND INDEPENDENCE

A measure space is a triple (Ω, 𝓐, μ), where Ω is a nonempty set, 𝓐 is a sigma-field of subsets of Ω, and μ is a measure defined on 𝓐. A measure space (Ω, 𝓐, P) is called a probability space if the measure P is a probability measure, that is, if P(Ω) = 1. Let (Ω, 𝓐, P) be a probability space. A random vector X with values in ℝ^k is a map on Ω into ℝ^k satisfying

X^{−1}(A) ≡ {ω : X(ω) ∈ A} ∈ 𝓐   (A.1.1)

for all A ∈ 𝓑^k, where 𝓑^k is the Borel sigma-field of ℝ^k. When k = 1, such an X is also called a random variable. If X is an integrable random variable, the mean, or expectation, of X, denoted by EX [or E(X)], is defined by
EX = ∫_Ω X dP.   (A.1.2)

If X = (X₁, ..., X_k) is a random vector (with values in ℝ^k) each of whose coordinates is integrable, then the mean, or expectation, EX of X is defined by

EX ≡ (EX₁, ..., EX_k).   (A.1.3)

If X is a square-integrable random variable, then the variance of X, denoted by var X [or var(X)], is defined by

var X = E(X − EX)².   (A.1.4)
Let X, Y be two random variables defined on (Ω, 𝓐, P). If X, Y, and XY are all integrable, one defines the covariance between X and Y, denoted cov(X, Y), by

cov(X, Y) ≡ E(X − EX)(Y − EY) = E XY − (EX)(EY).   (A.1.5)

If X = (X₁, ..., X_k) is a random vector (with values in ℝ^k) such that cov(X_i, X_j) is defined for every pair of coordinates (X_i, X_j), then one defines the covariance matrix Cov(X) of X as the k × k matrix whose (i, j) element is cov(X_i, X_j). The distribution P_X of a random vector X (with values in ℝ^k) is the induced probability measure P ∘ X^{−1} on ℝ^k, that is,

P_X(A) ≡ P(X^{−1}(A))   (A ∈ 𝓑^k).   (A.1.6)

Since the mean and the covariance matrix of a random vector X depend only on its distribution, one also defines the mean and the covariance matrix of a probability measure Q on ℝ^k as those of a (any) random vector having distribution Q. Random vectors X₁, ..., X_m (with values in ℝ^k) defined on (Ω, 𝓐, P) are independent if

P(X₁ ∈ A₁, X₂ ∈ A₂, ..., X_m ∈ A_m) = P(X₁ ∈ A₁) P(X₂ ∈ A₂) ⋯ P(X_m ∈ A_m)   (A.1.7)

for every m-tuple (A₁, ..., A_m) of Borel subsets of ℝ^k. In other words, X₁, ..., X_m are independent if the induced measure P ∘ (X₁, ..., X_m)^{−1} is a product measure. A sequence {X_n : n ≥ 1} of random vectors [defined on (Ω, 𝓐, P)] is independent if every finite subfamily is so.
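The two expressions for the covariance in (A.1.5) can be compared on simulated data (an illustrative sketch of ours; the construction of the correlated pair is arbitrary):

```python
import random

rng = random.Random(3)
# pairs (X, Y) with Y = 0.5 X + noise, so cov(X, Y) = 0.5 when var X = 1
pairs = [(x, 0.5 * x + rng.gauss(0, 1))
         for x in (rng.gauss(0, 1) for _ in range(50_000))]
n = len(pairs)
ex = sum(x for x, _ in pairs) / n
ey = sum(y for _, y in pairs) / n
exy = sum(x * y for x, y in pairs) / n
cov_centered = sum((x - ex) * (y - ey) for x, y in pairs) / n   # E(X-EX)(Y-EY)
cov_product = exy - ex * ey                                     # EXY - EX EY
print(abs(cov_centered - cov_product) < 1e-9)  # True: the two forms agree
```

The agreement is exact up to floating-point rounding, since the identity is purely algebraic.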
A.2 FUNCTIONS OF BOUNDED VARIATION AND DISTRIBUTION FUNCTIONS

Let μ be a finite signed measure on ℝ^k. The distribution function F_μ of μ is the real-valued function on ℝ^k defined by

F_μ(x) = μ((−∞, x])   (x ∈ ℝ^k),   (A.2.1)

where

(−∞, x] = (−∞, x₁] × (−∞, x₂] × ⋯ × (−∞, x_k]   [x = (x₁, ..., x_k) ∈ ℝ^k].   (A.2.2)
It is simple to check that F_μ is right continuous. For a random vector X defined on some probability space (Ω, 𝓐, P), the distribution function of X is merely the distribution function of its distribution P_X. The distribution function F_μ completely determines the (finite) signed measure μ. To see this, consider the class ℛ of all rectangles of the form

(a, b] = (a₁, b₁] × ⋯ × (a_k, b_k]   (A.2.3)

[a_i ≤ b_i for i = 1, ..., k]. For h = (h₁, ..., h_k) with h_i ≥ 0 for all i, define the difference operator Δ_h by

Δ_h F(x) = Δ_{h₁}^{(1)} ⋯ Δ_{h_k}^{(k)} F(x),   (A.2.4)

where

Δ_{h_i}^{(i)} F(x) = F(x₁, ..., x_{i−1}, x_i + h_i, x_{i+1}, ..., x_k) − F(x₁, ..., x_{i−1}, x_i − h_i, x_{i+1}, ..., x_k)   [i = 1, ..., k].   (A.2.5)

If k = 1, we shall write Δ_h for the difference operator. One can also show† that for every (a, b] ∈ ℛ,

μ((a, b]) = Δ_h F_μ(x),   (A.2.6)

where

h = (b − a)/2,  x = (a + b)/2.   (A.2.7)

The class 𝓡₀ of all finite disjoint unions of sets in ℛ is a ring over which μ is determined by (A.2.6). Since the sigma-ring generated by 𝓡₀ is 𝓑^k, the

†See Cramer [4], pp. 78-80.
uniqueness of the Caratheodory extension implies that μ on 𝓑^k is determined by μ on 𝓡₀ (and, hence, by the distribution function F_μ). One may also show by an induction argument‡ that

Δ_h F(x) = Σ ± F(x₁ + ε₁h₁, x₂ + ε₂h₂, ..., x_k + ε_k h_k),   (A.2.8)

where the summation is over all k-tuples (ε₁, ε₂, ..., ε_k), each ε_i being either +1 or −1. The sign of a summand in (A.2.8) is plus or minus depending on whether the number of negative ε's is even or odd. Now let F be an arbitrary real-valued function on an open set U. Define a set function μ_F on the class 𝓡_U of all those sets in ℛ that are contained in U by

μ_F((a, b]) ≡ Δ_h F(x),   (A.2.9)

where x and h are given by (A.2.7). One can check that μ_F is finitely additive on 𝓡_U. The function F is said to be of bounded variation on an open set U if

sup Σ_j | μ_F(I_j) |   (A.2.10)

is finite, where the supremum is over all finite collections {I₁, I₂, ...} of pairwise disjoint sets in ℛ such that I_j ⊂ U for all j. The expression (A.2.10) is called the variation of F on U. The following theorem is proved in Saks [1] (Theorem 6.2, p. 68).

THEOREM A.2.1. Let F be a right continuous function of bounded variation on a nonempty open set U. There exists a unique finite signed measure on U that agrees with μ_F on the class 𝓡_U of all sets in ℛ contained in U.

It may be checked that the variation on U of a right continuous function F of bounded variation (on U) coincides with the variation norm of the signed measure whose existence is asserted in Theorem A.2.1. A function F is said to be absolutely continuous on an open set U if given ε > 0 there exists δ > 0 such that
Σ_j | μ_F(I_j) | < ε   (A.2.11)

for all finite collections {I₁, I₂, ...} of pairwise disjoint rectangles I_j ∈ 𝓡_U satisfying

Σ_j λ_k(I_j) < δ,   (A.2.12)

†See Halmos [1], p. 54.  ‡See Cramer [4], pp. 78-80.
where λ_k denotes the Lebesgue measure on ℝ^k. If F is absolutely continuous on a bounded open set U, then it may be shown that F is of bounded variation on U.†

THEOREM A.2.2. Let F be a right continuous function of bounded variation on an open set U ⊂ ℝ^k. Let μ_F be the measure on U defined by (A.2.6) (and Theorem A.2.1). Suppose that on U the successive derivatives D_k F, D_{k−1}D_k F, ..., D₁ ⋯ D_k F exist and are continuous. Then F is absolutely continuous on U and one has

μ_F(A) = ∫_A (D₁ ⋯ D_k F)(x) dx   (A.2.13)

for every Borel subset A of U. Also,

lim_{h₁↓0, ..., h_k↓0} (2^k h₁ ⋯ h_k)^{−1} Δ_h F = D₁ D₂ ⋯ D_k F   (A.2.14)

on U.
on U. Proof. Let the closed rectangle [a, b] be contained in U. Let h and x be defined by (A.2.7). Then µF((a+b])=A F(x)=A^.
..
Ok F(x)
Ok-IlF(XI,...,Xk-1, Xk+ hk)-
_oi... Ak_y
F(x1,...,xk-1,xk-hk))
(X4th4
JX4
hk
(DkF)(x, , ... , xk-I+Yk)dYk
(A.2.15)
by the fundamental theorem of integral calculus. Since the integrand has a continuous derivative with respect to X k - 1 , N'F(( a,b])=(]^'... Qk_2
(x4+hkr
2 J X4
f
h4
L ( Dk F )(X1 , ...9 Xk-I +hk-I'Yk) —
Xk t k4
(DkF)( X I , ... , Xk-1 - hk-Ilyk)]dyk
^^X4_^+/J4_^ //
-
lDk-IDkF)1X1,...+Yk-I+Yk)^k-I x4_i - h4_i
X4 k4
J
^k•
(A.2.16) tSaks [1), p. 93.
Proceeding in this manner, we arrive at (A.2.13) for A = (a, b], remembering that by Fubini's theorem the iterated integral as obtained by the above procedure is equal to the integral on (a, b] with respect to Lebesgue measure on ℝ^k. We next show that D₁ ⋯ D_k F is integrable on U. For if this is false, then for every integer n ≥ 1 there exist an integer m_n and pairwise disjoint rectangles (a¹, b¹], ..., (a^{m_n}, b^{m_n}] such that [a^i, b^i] ⊂ U, i = 1, ..., m_n, and

Σ_{i=1}^{m_n} | ∫_{(a^i, b^i]} (D₁ ⋯ D_k F)(x) dx | > n.

By (A.2.13), which we have proved for sets like (a^i, b^i], one then has

Σ_{i=1}^{m_n} | μ_F((a^i, b^i]) | > n

for all n, contradicting the hypothesis that F is of bounded variation on U. Thus we have two finite signed measures on U, defined by

A → ∫_A (D₁ ⋯ D_k F)(x) dx,  A → μ_F(A),

that coincide on the class of all rectangles (a, b] such that [a, b] ⊂ U. Therefore the two signed measures on U are equal, and (A.2.13) is established. To prove (A.2.14), let x ∈ U. Choose h = (h₁, ..., h_k) such that h_i > 0 for all i and [x − h, x + h] ⊂ U. Then by (A.2.13) one has

Δ_h F(x) = μ_F((x − h, x + h]) = ∫_{(x−h, x+h]} (D₁ ⋯ D_k F)(y) dy.

From this and the continuity of D₁ ⋯ D_k F on U, the relation (A.2.14) follows.
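The limit (A.2.14) can be illustrated numerically (our own sketch, with an arbitrary smooth F on ℝ²):

```python
from math import sin, cos

# F(x, y) = sin(x) sin(y), so D1 D2 F = cos(x) cos(y)
F = lambda x, y: sin(x) * sin(y)

def mixed_difference(F, x, y, h):
    # (2^k h_1 ... h_k)^{-1} Delta_h F at (x, y), with k = 2 and h_1 = h_2 = h
    num = F(x + h, y + h) - F(x - h, y + h) - F(x + h, y - h) + F(x - h, y - h)
    return num / (4.0 * h * h)

x, y = 0.3, 1.1
exact = cos(x) * cos(y)
approx = mixed_difference(F, x, y, 1e-4)
print(abs(approx - exact) < 1e-6)  # True: the normalized difference converges
```

Shrinking h further drives the normalized corner difference to the mixed derivative D₁D₂F, as the theorem asserts.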
Q.E.D.

It follows from the definition that the sum of a finite number of functions of bounded variation, or absolutely continuous, on an open set U is itself of bounded variation, or absolutely continuous, on U. Our next result establishes the bounded variation of a product of a special set of functions of bounded variation. We say that a function g on ℝ^k (into ℝ¹) is Schwartz if it is infinitely differentiable and if for every nonnegative integral vector α and every positive integer m one has

sup_x ‖x‖^m |(D^α g)(x)| < ∞. …

… k ≥ p, and G = g if k = p. Then the function

F(x) = F₁(x₁) ⋯ F_p(x_p) G(x)   (x ∈ ℝ^k)   (A.2.19)

is of bounded variation on ℝ^k.

Proof. Consider an arbitrary function H₀ on ℝ^p. We first show that

Δ_{h₁}^{(1)} ⋯ Δ_{h_p}^{(p)} F₁(x₁) ⋯ F_p(x_p) H₀(x₁, ..., x_p)
  = Σ F_{i₁}(x_{i₁} − h_{i₁}) ⋯ F_{i_s}(x_{i_s} − h_{i_s}) [ Δ_{h_{i₁}}^{(i₁)} ⋯ Δ_{h_{i_s}}^{(i_s)} H₀(x′) ] × [ Δ_{h_{j₁}}^{(j₁)} F_{j₁}(x_{j₁}) ] ⋯ [ Δ_{h_{j_{p−s}}}^{(j_{p−s})} F_{j_{p−s}}(x_{j_{p−s}}) ],   (A.2.20)

where x′ = (x′₁, ..., x′_p), x′_{i₁} = x_{i₁}, ..., x′_{i_s} = x_{i_s}, x′_{j₁} = x_{j₁} + h_{j₁}, ..., x′_{j_{p−s}} = x_{j_{p−s}} + h_{j_{p−s}}, and the summation is over all partitions of {1, 2, ..., p} into two disjoint subsets {i₁, ..., i_s}, {j₁, ..., j_{p−s}}, 0 ≤ s ≤ p; when one of these subsets is empty, the corresponding factor drops out. If p = 1, then

Δ_h^{(1)} F₁(x) H₀(x) = F₁(x + h) H₀(x + h) − F₁(x − h) H₀(x − h)
  = F₁(x − h)[H₀(x + h) − H₀(x − h)] + H₀(x + h)[F₁(x + h) − F₁(x − h)]
  = F₁(x − h) Δ_h^{(1)} H₀(x) + H₀(x + h) Δ_h^{(1)} F₁(x),

which proves (A.2.20) for p = 1. Assume, as an induction hypothesis, that (A.2.20) holds for some p. Then

Δ_{h₁}^{(1)} ⋯ Δ_{h_{p+1}}^{(p+1)} F₁(x₁) ⋯ F_{p+1}(x_{p+1}) H₀(x₁, ..., x_{p+1})
  = Δ_{h₁}^{(1)} ⋯ Δ_{h_p}^{(p)} [ F₁(x₁) ⋯ F_p(x_p) · Δ_{h_{p+1}}^{(p+1)} ( F_{p+1}(x_{p+1}) H₀(x₁, ..., x_{p+1}) ) ]
  = Δ_{h₁}^{(1)} ⋯ Δ_{h_p}^{(p)} [ F₁(x₁) ⋯ F_p(x_p) { F_{p+1}(x_{p+1} − h_{p+1}) Δ_{h_{p+1}}^{(p+1)} H₀(x₁, ..., x_{p+1}) + H₀(x₁, ..., x_p, x_{p+1} + h_{p+1}) Δ_{h_{p+1}}^{(p+1)} F_{p+1}(x_{p+1}) } ]
  = Δ_{h₁}^{(1)} ⋯ Δ_{h_p}^{(p)} F₁(x₁) ⋯ F_p(x_p) H₀′(x₁, ..., x_{p+1}) + Δ_{h₁}^{(1)} ⋯ Δ_{h_p}^{(p)} F₁(x₁) ⋯ F_p(x_p) H₀″(x₁, ..., x_{p+1}),   (A.2.21)
where

H₀′(x₁, ..., x_{p+1}) = F_{p+1}(x_{p+1} − h_{p+1}) Δ_{h_{p+1}}^{(p+1)} H₀(x₁, ..., x_{p+1}),
H₀″(x₁, ..., x_{p+1}) = H₀(x₁, ..., x_p, x_{p+1} + h_{p+1}) Δ_{h_{p+1}}^{(p+1)} F_{p+1}(x_{p+1}).   (A.2.22)

Now apply the induction hypothesis to each of the two summands of the last expression in (A.2.21), and then substitute from (A.2.22), to see that (A.2.20) holds with p replaced by p + 1. This completes the proof of (A.2.20) for all p. Looking at F given by (A.2.19), one sees that

Δ_{h₁}^{(1)} ⋯ Δ_{h_k}^{(k)} F(x) = Δ_{h₁}^{(1)} ⋯ Δ_{h_p}^{(p)} F₁(x₁) ⋯ F_p(x_p) H₀(x₁, ..., x_k),   (A.2.23)

where

H₀(x₁, ..., x_k) = Δ_{h_{p+1}}^{(p+1)} ⋯ Δ_{h_k}^{(k)} G(x₁, ..., x_k).   (A.2.24)

By (A.2.20) we obtain

Δ_h F(x) = Σ F_{i₁}(x_{i₁} − h_{i₁}) ⋯ F_{i_s}(x_{i_s} − h_{i_s}) × [ Δ_{h_{i₁}}^{(i₁)} ⋯ Δ_{h_{i_s}}^{(i_s)} Δ_{h_{p+1}}^{(p+1)} ⋯ Δ_{h_k}^{(k)} G(x′) ] × [ Δ_{h_{j₁}}^{(j₁)} F_{j₁}(x_{j₁}) ] ⋯ [ Δ_{h_{j_{p−s}}}^{(j_{p−s})} F_{j_{p−s}}(x_{j_{p−s}}) ],   (A.2.25)

where x′ = (x′₁, ..., x′_k), x′_{i₁} = x_{i₁}, ..., x′_{i_s} = x_{i_s}, x′_{p+1} = x_{p+1}, ..., x′_k = x_k, x′_{j₁} = x_{j₁} + h_{j₁}, ..., x′_{j_{p−s}} = x_{j_{p−s}} + h_{j_{p−s}}, and the summation is over all partitions of {1, 2, ..., p} into two disjoint subsets {i₁, ..., i_s}, {j₁, ..., j_{p−s}}, 0 ≤ s ≤ p. For the sake of simplicity consider the summand on the right side of (A.2.25) corresponding to i₁ = 1, ..., i_s = s. Then, by the definition (A.2.18) of G, one has

Δ_{h₁}^{(1)} ⋯ Δ_{h_s}^{(s)} Δ_{h_{p+1}}^{(p+1)} ⋯ Δ_{h_k}^{(k)} G(x′)
  = ∫_{x_{p+1} − h_{p+1}}^{x_{p+1} + h_{p+1}} ⋯ ∫_{x_k − h_k}^{x_k + h_k} Δ_{h₁}^{(1)} ⋯ Δ_{h_s}^{(s)} g(x₁, ..., x_s, x_{s+1} + h_{s+1}, ..., x_p + h_p, y_{p+1}, ..., y_k) dy_k ⋯ dy_{p+1}
  = ∫_{x₁ − h₁}^{x₁ + h₁} ⋯ ∫_{x_s − h_s}^{x_s + h_s} ∫_{x_{p+1} − h_{p+1}}^{x_{p+1} + h_{p+1}} ⋯ ∫_{x_k − h_k}^{x_k + h_k} (D₁ ⋯ D_s g)(y₁, ..., y_s, x_{s+1} + h_{s+1}, ..., x_p + h_p, y_{p+1}, ..., y_k) dy_k ⋯ dy_{p+1} dy_s ⋯ dy₁.   (A.2.26)
Let the derivative of F_i on (0, 1) be bounded above in magnitude by b_i, and let c_i denote the magnitude of the jump of F_i at 0 (c_i may be zero), 1 ≤ i ≤ p. Assume that 2h_i < 1 for all i. …

Let S_j (j = 0, 1, 2, ...) be a sequence of real-valued periodic functions on ℝ of period one, possessing the following properties:

(i) for j ≥ 0, S_{j+1} is differentiable at all nonintegral points and S′_{j+1}(x) = S_j(x) (at all nonintegral x);
(ii) S₀(x) ≡ 1, S₁ is right continuous, and S_j is continuous for j ≥ 2.   (A.4.1)

Such a sequence is uniquely determined by the above properties and plays a fundamental role in the summation formula. To see this, write S_j(0) = B_j/(j!) and observe that (i) leads to S₁(x) = x + B₁,
S₂(x) = x²/2! + B₁x + B₂/2!, ..., and generally

S_j(x) = x^j/j! + (B₁/1!) x^{j−1}/(j−1)! + ⋯ + B_j/j! = (1/j!) Σ_{r=0}^{j} (j choose r) B_r x^{j−r}   (0 < x < 1, j ≥ 1).   (A.4.2)
The constants B_j are determined by the property (A.4.1). In fact, S_j(0) = S_j(1) for j ≥ 2, which yields

1 + (j choose 1) B₁ + (j choose 2) B₂ + ⋯ + (j choose j−1) B_{j−1} = 0   (j = 2, 3, ...).   (A.4.3)

The sequence of constants B_j is recursively defined by the relation (A.4.3), thus completely determining the sequence of functions S_j in the interval 0 < x < 1. The continuity assumption determines their values at integral points. The numbers B_j defined by (A.4.3) are called Bernoulli numbers, and the polynomial

B_j(x) = Σ_{r=0}^{j} (j choose r) B_r x^{j−r}   (A.4.4)

is called the jth Bernoulli polynomial. Clearly, S_j(x) = B_j(x)/(j!) for 0 < x < 1. Since the sequence {(−1)^j S_j(−x) : j ≥ 0} has the properties (A.4.1), excepting right continuity of −S₁(−x), it follows from uniqueness that
S_j(−x) = (−1)^j S_j(x)   (for all x if j ≠ 1; for nonintegral x if j = 1).   (A.4.5)

The functions S_j are thus even or odd depending on whether j is even or odd. In particular,

B_j = j! S_j(0) = 0  for j odd, j ≥ 3.   (A.4.6)

The first few Bernoulli numbers are

B₀ = 1,  B₁ = −1/2,  B₂ = 1/6,  B₃ = 0,  B₄ = −1/30,  B₅ = 0.   (A.4.7)
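The recursion (A.4.3) is easy to run with exact rational arithmetic (a sketch of ours reproducing the values in (A.4.7)):

```python
from fractions import Fraction
from math import comb

def bernoulli(n):
    """Bernoulli numbers B_0, ..., B_n via the recursion (A.4.3):
    sum_{r=0}^{j-1} C(j, r) B_r = 0 for j >= 2, with B_0 = 1,
    solved for B_{j-1} at each step."""
    B = [Fraction(1)]
    for j in range(2, n + 2):
        s = sum(comb(j, r) * B[r] for r in range(j - 1))
        B.append(Fraction(-s, comb(j, j - 1)))
    return B[: n + 1]

B = bernoulli(6)
print([str(b) for b in B])  # ['1', '-1/2', '1/6', '0', '-1/30', '0', '1/42']
```

The odd-index values beyond B₁ vanish, in accordance with (A.4.6).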
Therefore

S₁(x) = x − 1/2,  S₂(x) = (1/2)(x² − x + 1/6),  S₃(x) = (1/6)(x³ − (3/2)x² + (1/2)x)   (0 < x < 1),   (A.4.8)

and so on. The periodic functions S_j have the following Fourier series
Euler-Maclaurin Summation Formula
expansions when x is not an integer:

S_j(x) = (−1)^{(j/2)−1} Σ_{n=1}^{∞} 2 cos(2nπx)/(2nπ)^j   (j even, j > 0),
S_j(x) = (−1)^{(j+1)/2} Σ_{n=1}^{∞} 2 sin(2nπx)/(2nπ)^j   (j odd).   (A.4.9)
This may be seen as follows. Let u_j denote the function represented by the Fourier series (j ≥ 1). It can be checked directly that u₁ is the Fourier series of S₁ and that u′_{j+1} = u_j for j ≥ 1. Thus S_j = u_j for all j ≥ 2, and S₁(x) = u₁(x) for all nonintegral x.

THEOREM A.4.1. Let f be a real-valued function on ℝ¹ having r continuous derivatives, r ≥ 1, and let

∫ |D^r f| dx < ∞. …

… let a > 0, and let r be a positive integer. Define

A_r(x) = …

… Then if h = (h₁, ..., h_k), h_i > 0 for all i,

Δ_h H = Δ_{h₁}^{(1)} ⋯ Δ_{h_k}^{(k)}(H) = Δ_{h₁}^{(1)}( Δ_{h₂}^{(2)} ⋯ Δ_{h_k}^{(k)} G ).