ADVANCED TEXTS IN ECONOMETRICS
General Editors: C. W. J. Granger, G. E. Mizon

STOCHASTIC LIMIT THEORY
An Introduction for Econometricians

JAMES DAVIDSON

Oxford University Press 1994
Oxford University Press, Walton Street, Oxford OX2 6DP
Oxford New York Athens Auckland Bangkok Bombay Calcutta Cape Town Dar es Salaam Delhi Florence Hong Kong Istanbul Karachi Kuala Lumpur Madras Madrid Melbourne Mexico City Nairobi Paris Singapore Taipei Tokyo Toronto
and associated companies in Berlin Ibadan

Published in the United States by Oxford University Press Inc., New York

© James Davidson, 1994

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press. Within the UK, exceptions are allowed in respect of any fair dealing for the purpose of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act, 1988, or in the case of reprographic reproduction in accordance with the terms of the licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside these terms and in other countries should be sent to the Rights Department, Oxford University Press, at the address above.

This book is sold subject to the condition that it shall not, by way of trade or otherwise, be lent, re-sold, hired out or otherwise circulated without the publisher's prior consent in any form of binding or cover other than that in which it is published and without a similar condition including this condition being imposed on the subsequent purchaser.

British Library Cataloguing in Publication Data
Data available

Library of Congress Cataloging in Publication Data
Data available

ISBN 0-19-877402-8
ISBN 0-19-877403-6 (Pbk)

1 3 5 7 9 10 8 6 4 2

Printed in Great Britain on acid-free paper by Biddles Ltd., Guildford and King's Lynn
For Lynette, Julia, and Nicola.
'... what in me is dark
Illumine, what is low raise and support,
That, to the height of this great argument,
I may assert Eternal Providence,
And justify the ways of God to men.'

Paradise Lost, Book 1, 16-20
Contents

Preface xiii
Mathematical Symbols and Abbreviations

Part I: Mathematics

1. Sets and Numbers
1.1 Basic Set Theory 3
1.2 Countable Sets 8
1.3 The Real Continuum 10
1.4 Sequences of Sets 12
1.5 Classes of Subsets 13
1.6 Sigma Fields 15

2. Limits and Continuity
2.1 The Topology of the Real Line 20
2.2 Sequences and Limits 23
2.3 Functions and Continuity 27
2.4 Vector Sequences and Functions 29
2.5 Sequences of Functions 30
2.6 Summability and Order Relations 31
2.7 Arrays 33

3. Measure
3.1 Measure Spaces 36
3.2 The Extension Theorem 40
3.3 Non-measurability 46
3.4 Product Spaces 48
3.5 Measurable Transformations 50
3.6 Borel Functions 55

4. Integration
4.1 Construction of the Integral 57
4.2 Properties of the Integral 61
4.3 Product Measure and Multiple Integrals 64
4.4 The Radon-Nikodym Theorem 69

5. Metric Spaces
5.1 Distances 75
5.2 Separability and Completeness 78
5.3 Examples 82
5.4 Mappings on Metric Spaces 84
5.5 Function Spaces 87

6. Topology
6.1 Topological Spaces 93
6.2 Countability and Compactness 94
6.3 Separation Properties 97
6.4 Weak Topologies 101
6.5 The Topology of Product Spaces 102
6.6 Embedding and Metrization 105

Part II: Probability

7. Probability Spaces
7.1 Probability Measures 111
7.2 Conditional Probability 113
7.3 Independence 114
7.4 Product Spaces 115

8. Random Variables
8.1 Measures on the Line 117
8.2 Distribution Functions 117
8.3 Examples 122
8.4 Multivariate Distributions 124
8.5 Independent Random Variables 126

9. Expectations
9.1 Averages and Integrals 128
9.2 Expectations of Functions of X 130
9.3 Theorems for the Probabilist's Toolbox 132
9.4 Multivariate Distributions 135
9.5 More Theorems for the Toolbox 137
9.6 Random Variables Depending on a Parameter 140

10. Conditioning
10.1 Conditioning in Product Measures 143
10.2 Conditioning on a Sigma Field 145
10.3 Conditional Expectations 147
10.4 Some Theorems on Conditional Expectations 149
10.5 Relationships between Subfields 154
10.6 Conditional Distributions 157

11. Characteristic Functions
11.1 The Distribution of Sums of Random Variables 161
11.2 Complex Numbers 162
11.3 The Theory of Characteristic Functions 164
11.4 The Inversion Theorem 168
11.5 The Conditional Characteristic Function 171

Part III: Theory of Stochastic Processes

12. Stochastic Processes
12.1 Basic Ideas and Terminology 177
12.2 Convergence of Stochastic Sequences 178
12.3 The Probability Model 179
12.4 The Consistency Theorem 183
12.5 Uniform and Limiting Properties 186
12.6 Uniform Integrability 188

13. Dependence
13.1 Shift Transformations 191
13.2 Independence and Stationarity 192
13.3 Invariant Events 195
13.4 Ergodicity and Mixing 199
13.5 Subfields and Regularity 203
13.6 Strong and Uniform Mixing 206

14. Mixing
14.1 Mixing Sequences of Random Variables 209
14.2 Mixing Inequalities 211
14.3 Mixing in Linear Processes 215
14.4 Sufficient Conditions for Strong and Uniform Mixing 219

15. Martingales
15.1 Sequential Conditioning 229
15.2 Extensions of the Martingale Concept 232
15.3 Martingale Convergence 235
15.4 Convergence and the Conditional Variances 238
15.5 Martingale Inequalities 240

16. Mixingales
16.1 Definition and Examples 247
16.2 Telescoping Sum Representations 249
16.3 Maximal Inequalities 252
16.4 Uniform Square-integrability 257

17. Near-Epoch Dependence
17.1 Definition and Examples 261
17.2 Near-Epoch Dependence and Mixingales 264
17.3 Near-Epoch Dependence and Transformations 267
17.4 Approximability 273

Part IV: The Law of Large Numbers

18. Stochastic Convergence
18.1 Almost Sure Convergence 281
18.2 Convergence in Probability 284
18.3 Transformations and Convergence 285
18.4 Convergence in Lp Norm 287
18.5 Examples 288
18.6 Laws of Large Numbers 289

19. Convergence in Lp-Norm
19.1 Weak Laws by Mean-square Convergence 293
19.2 Almost Sure Convergence by the Method of Subsequences 295
19.3 A Martingale Weak Law 298
19.4 A Mixingale Weak Law 302
19.5 Approximable Processes 304

20. The Strong Law of Large Numbers
20.1 Technical Tricks for Proving LLNs 306
20.2 The Case of Independence 311
20.3 Martingale Strong Laws 313
20.4 Conditional Variances and Random Weighting 316
20.5 Two Strong Laws for Mixingales 318
20.6 Near-epoch Dependent and Mixing Processes 323

21. Uniform Stochastic Convergence
21.1 Stochastic Functions on a Parameter Space 327
21.2 Pointwise and Uniform Stochastic Convergence 330
21.3 Stochastic Equicontinuity 335
21.4 Generic Uniform Convergence 337
21.5 Uniform Laws of Large Numbers 340

Part V: The Central Limit Theorem

22. Weak Convergence of Distributions
22.1 Basic Concepts 347
22.2 The Skorokhod Representation Theorem 350
22.3 Weak Convergence and Transformations 355
22.4 Convergence of Moments and Characteristic Functions 357
22.5 Criteria for Weak Convergence 359
22.6 Convergence of Random Sums 361

23. The Classical Central Limit Theorem
23.1 The i.i.d. Case 364
23.2 Independent Heterogeneous Sequences 368
23.3 Feller's Theorem and Asymptotic Negligibility 373
23.4 The Case of Trending Variances 377

24. CLTs for Dependent Processes
24.1 A General Convergence Theorem 380
24.2 The Martingale Case 383
24.3 Stationary Ergodic Sequences 385
24.4 The CLT for NED Functions of Mixing Processes 386
24.5 Proving the CLT by the Bernstein Blocking Method 391

25. Some Extensions
25.1 The CLT with Estimated Normalization 399
25.2 The CLT with Random Norming 403
25.3 The Multivariate CLT 405
25.4 Error Estimation 407

Part VI: The Functional Central Limit Theorem

26. Weak Convergence in Metric Spaces
26.1 Probability Measures on a Metric Space 413
26.2 Measures and Expectations 416
26.3 Weak Convergence 418
26.4 Metrizing the Space of Measures 422
26.5 Tightness and Convergence 427
26.6 Skorokhod's Representation 431

27. Weak Convergence in a Function Space
27.1 Measures on Function Spaces 434
27.2 The Space C 437
27.3 Measures on C 440
27.4 Brownian Motion 442
27.5 Weak Convergence on C 447
27.6 The Functional Central Limit Theorem 449
27.7 The Multivariate Case 453

28. Cadlag Functions
28.1 The Space D 456
28.2 Metrizing D 459
28.3 Billingsley's Metric 461
28.4 Measures on D 465
28.5 Prokhorov's Metric 467
28.6 Compactness and Tightness in D 469

29. FCLTs for Dependent Variables
29.1 The Distribution of Continuous Functions on D 474
29.2 Asymptotic Independence 479
29.3 The FCLT for NED Functions of Mixing Processes 481
29.4 Transformed Brownian Motion 485
29.5 The Multivariate Case 490

30. Weak Convergence to Stochastic Integrals
30.1 Weak Limit Results for Random Functionals 496
30.2 Stochastic Processes in Continuous Time 500
30.3 Stochastic Integrals 503
30.4 Convergence to Stochastic Integrals 509

Notes 517
References 519
Index
Preface
Recent years have seen a marked increase in the mathematical sophistication of econometric research. While the theory of linear parametric models which forms the backbone of the subject makes an extensive and clever use of matrix algebra, the statistical prerequisites of this theory are comparatively simple. But now that these models are pretty thoroughly understood, research is concentrated increasingly on the less tractable questions, such as nonlinear and nonparametric estimation and nonstationary data generation processes. The standard econometrics texts are no longer an adequate guide to this new technical literature, and a sound understanding of the probabilistic foundations of the subject is becoming less and less of a luxury. The asymptotic theory traditionally taught to students of econometrics is founded on a small body of classical limit theorems, such as Khinchine's weak law of large numbers and the Lindeberg-Lévy central limit theorem, relevant to the stationary and independent data case. To deal with linear stochastic difference equations, appeal can be made to the results of Mann and Wald (1943a), but even these are rooted in the assumption of independent and identically distributed disturbances. This foundation has become increasingly inadequate to sustain the expanding edifice of econometric inference techniques, and recent years have seen a systematic attempt to construct a less restrictive limit theory. Hall and Heyde's Martingale Limit Theory and its Application (1980) is an important landmark, as are a series of papers by econometricians including among others Halbert White, Ronald Gallant, Donald Andrews, and Herman Bierens. This work introduced to the econometrics profession pioneering research into limit theory under dependence, done in the preceding decades by probabilists such as J. L. Doob, I. A. Ibragimov, Patrick Billingsley, Robert Serfling, Murray Rosenblatt, and Donald McLeish.
These latter authors devised various concepts of limited dependence for general nonstationary time series. The concept of a martingale has a long history in probability, but it was primarily Doob's Stochastic Processes (1953) that brought it to prominence as a tool of limit theory. Martingale processes behave like the wealth of a gambler who undertakes a succession of fair bets; the differences of a martingale (the net winnings at each step) are unpredictable from lagged information. Powerful limit theorems are available for martingale difference sequences involving no further restrictions on the dependence of the process. Rosenblatt and Ibragimov respectively defined strong mixing and uniform mixing as characterizations of 'memory' or independence at long range. McLeish defined the notion of a mixingale, the asymptotic counterpart of a martingale difference, becoming unpredictable m steps ahead as m becomes large. This is a weaker property than mixing because it involves only low-order moments of the distribution, but mixingales possess most of those attributes of mixing processes needed to make [...]

[...] are not reflexive or antisymmetric, but they are transitive. A set A is said to be linearly ordered by a partial ordering ≤ if one of the relations x < y, x > y, and x = y holds for every pair (x, y) ∈ A × A. If there exist elements a ∈ A and b ∈ A such that a ≤ x for all x ∈ A, or x ≤ b for all x ∈ A, a and b are called respectively the smallest and largest elements of A. A linearly ordered set A is called well-ordered if every subset of A contains a smallest element. It is of course in sets whose elements are numbers that the ordering concept is most familiar. Consider two sets X and Y, which can be thought of as representing the universal sets for a pair of related problems. The following bundle of definitions contains the essential ideas.
A mapping (or transformation) T: X → Y is a rule that associates each element of X with a unique element of Y; in other words, for each x ∈ X there exists a specified element y ∈ Y, denoted T(x). X is called the domain of the mapping, and Y the codomain. The set

  G_T = {(x, y): x ∈ X, y = T(x)} ⊆ X × Y   (1.10)

is called the graph of T. For A ⊆ X, the set

  T(A) = {T(x): x ∈ A} ⊆ Y   (1.11)

is called the image of A under T, and for B ⊆ Y, the set

  T⁻¹(B) = {x: T(x) ∈ B} ⊆ X   (1.12)

is called the inverse image of B under T. The set T(X) is called the range of T, and if T(X) = Y the mapping is said to be from X onto Y, and otherwise, into Y. If each y is the image of one and only one x ∈ X, so that T(x₁) = T(x₂) if and only if x₁ = x₂, the mapping is said to be one-to-one, or 1-1.

The notions of mapping and graph are really interchangeable, and it is permissible to say that the graph is the mapping, but it is convenient to keep a distinction in mind between the rule and the subset of X × Y which it generates. The term function is usually reserved for cases when the codomain is the set of real numbers (see §1.3). The term correspondence is used for a rule connecting elements of X to elements of Y where the latter are not necessarily unique. T⁻¹ is a correspondence, but not a mapping unless T is one-to-one. However, the term one-to-one correspondence is often used specifically, in certain contexts that will arise below, to refer to a mapping that is both 1-1 and onto. If partial orderings are defined on both X and Y, a mapping is called order-preserving if T(x₁) ≤ T(x₂) iff x₁ ≤ x₂. On the other hand, if X is partially ordered by ≤, a 1-1 mapping induces a corresponding ordering on the image. If T: X → Y and U: Y → Z is a further mapping, the composite mapping U∘T: X → Z takes each x ∈ X to the element U(T(x)) ∈ Z. U∘T operates as a simple transformation from X to Z, and 1.2 applies to this case. For C ⊆ Z, (U∘T)⁻¹(C) = T⁻¹(U⁻¹(C)).
[Fig. 1.3: the product space Θ × Ξ drawn as a rectangle in the plane, with a set A and the sets T(Aᶜ) and T(A)ᶜ indicated as segments of Ξ.]
1.3 Example If X = Θ × Ξ is a product space, having as elements the ordered pairs x = (θ, ξ), the mapping T: Θ × Ξ → Ξ defined by T(θ, ξ) = ξ is called the projection mapping onto Ξ. The projection of a set A ⊆ Θ × Ξ onto Ξ (respectively, Θ) is the set consisting of the second (resp., first) members of each pair in A. On the other hand, for a set B ⊆ Ξ, T⁻¹(B) = Θ × B. It is a useful exercise to verify 1.2 for this case, and also to check that T(Aᶜ) ≠ T(A)ᶜ in general. In Fig. 1.3, Θ and Ξ are line segments and Θ × Ξ is a rectangle in the plane. Here, T(Aᶜ) is the union of the indicated line segments, whereas T(A)ᶜ is empty. □
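The failure of images to commute with complements, noted in the example, can be confirmed directly on a small finite product space (the sets Theta, Xi, and A below are illustrative choices of mine):

```python
# Projection T(theta, xi) = xi from a finite product Theta x Xi
Theta, Xi = {1, 2}, {'a', 'b', 'c'}
product = {(t, x) for t in Theta for x in Xi}

def project(A):
    """Image of A under the projection onto Xi."""
    return {x for (_, x) in A}

A = {(1, 'a'), (2, 'a'), (1, 'b')}
image_of_complement = project(product - A)   # T(A^c)
complement_of_image = Xi - project(A)        # T(A)^c

# Inverse images behave better: T^-1(B) = Theta x B
B = {'a'}
inverse_image = {p for p in product if p[1] in B}
```

Here T(Aᶜ) = {'b', 'c'} while T(A)ᶜ = {'c'}, so the two differ, whereas the inverse image of B is exactly the rectangle Θ × B.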
The number of elements contained in a set is called the cardinality or cardinal number of the set. The notion of number in this context is not a primitive one, but can be reduced to fundamentals by what is called the pigeon-hole principle. A set A is said to be equipotent with a set B if there exists a 1-1 correspondence connecting A and B. Think in terms of taking an element from each set and placing the pair in a pigeon-hole. Equipotency means that such a procedure can never exhaust one set before the other. Now, think of the number 0 as being just a name for the null set, ∅. Let the number 1 be the name for the set that has a single element, the number 0. Let 2 denote the set whose elements are the numbers 0 and 1. And proceeding recursively, let n be the name for the set {0, ..., n − 1}. Then, the statement that a set A has n elements, or has cardinal number n, can be interpreted to mean that A is equipotent with the set n. The set of natural numbers, denoted ℕ, is the collection {n: n = 1, 2, 3, ...}. This collection is well ordered by the usual relation <. [...] An important example is the positive half-line (0, +∞), which we will denote subsequently by ℝ⁺.
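The recursive construction of the numbers from the null set can be mimicked directly with nested frozensets. A small sketch (the function name is mine; frozenset stands in for an abstract set):

```python
def von_neumann(n):
    """The set-theoretic natural n = {0, 1, ..., n-1}, with 0 a name for
    the null set, built recursively as in the text: each number is the
    set of all its predecessors."""
    numbers = [frozenset()]                  # 0 = the null set
    for _ in range(n):
        numbers.append(frozenset(numbers))   # k+1 = {0, ..., k}
    return numbers[n]
```

Equipotency with the set n then reduces to a count: a set has cardinal number n exactly when it can be matched 1-1 with von_neumann(n), whose Python length is n.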
We say that the rationals are dense in the reals, by virtue of the following theorem.

1.10 Theorem Between any two distinct real numbers there is a rational number.

Proof Assume x < y. For the case x ≥ 0, choose q as the smallest integer exceeding 1/(y − x), and p as the smallest integer exceeding qx + 1. Then x < (p − 1)/q < y. For the case x < 0, choose an integer n > −x; the rational r satisfying n + x < r < n + y is found as above, and then x < r − n < y. ∎
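The construction in the proof is explicit enough to compute. A sketch in exact rational arithmetic (the function name and the use of fractions.Fraction are my own choices):

```python
import math
from fractions import Fraction

def rational_between(x, y):
    """Return a rational r with x < r < y, following the proof's construction.
    Inputs are assumed to be Fractions (or ints) with x < y."""
    assert x < y
    if x < 0:
        n = math.floor(-x) + 1            # integer n > -x, shifting into [0, inf)
        return rational_between(x + n, y + n) - n
    q = math.floor(1 / (y - x)) + 1       # smallest integer exceeding 1/(y - x)
    p = math.floor(q * x + 1) + 1         # smallest integer exceeding qx + 1
    return Fraction(p - 1, q)
```

Since p − 1 > qx and p − 1 ≤ qx + 1, the value (p − 1)/q lies strictly between x and x + 1/q < y, exactly as the proof requires.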
1.11 Corollary Every collection of disjoint open intervals is countable.
Proof Since each open interval contains a rational appearing in no other interval disjoint with it, a set of disjoint open intervals can be placed in 1-1 correspondence with a subset of the rationals. ∎

The supremum of a set A ⊆ ℝ, when it exists, is the smallest number y such that x ≤ y for every x ∈ A, written sup A. The infimum of A, when it exists, is the largest number y such that x ≥ y for every x ∈ A, written inf A. These may or may not be elements of A. In particular, inf [a, b] = inf (a, b) = a, and sup [a, b] = sup (a, b) = b. Open intervals do not possess largest or smallest elements. However, every subset of ℝ which is bounded above (resp. below) has a supremum (resp. infimum). While unbounded sets in ℝ lack suprema and/or infima, it is customary to define the set ℝ̄ = ℝ ∪ {−∞, +∞}, called the extended real line. In ℝ̄, every set has a supremum, either a finite real number or +∞, and similarly, every set has an infimum. The notation ℝ̄⁺ will also be used subsequently to denote ℝ⁺ ∪ {+∞}.
1.4 Sequences of Sets

Set sequences {A₁, A₂, A₃, ...} will be written, variously, as (Aₙ: n ∈ ℕ), (Aₙ)₁^∞, or just {Aₙ} when the context is clear. A monotone sequence is one that is either non-decreasing, with each member of the sequence being contained in its successor (Aₙ ⊆ Aₙ₊₁, ∀n), or non-increasing, with each member containing its successor (Aₙ₊₁ ⊆ Aₙ, ∀n). We also speak of increasing (resp. decreasing) sequences when the inclusion is strict, with ⊂ (resp. ⊃) replacing ⊆ (resp. ⊇). For a non-decreasing sequence, define the set A = ∪_{n=1}^∞ Aₙ, and for a non-increasing sequence the set A = ∩_{n=1}^∞ Aₙ = (∪_{n=1}^∞ Aₙᶜ)ᶜ. These sets are called the limits of the respective sequences. We may write Aₙ ↑ A and Aₙ ↓ A, and also, in general, lim_{n→∞} Aₙ = A and Aₙ → A.
1.12 Example The sequence {[0, 1/n], n ∈ ℕ} is decreasing and has as limit the singleton {0}. In fact, lim_{n→∞} [0, 1/n) = {0} also, whereas lim_{n→∞} (0, 1/n) = ∅. The decreasing sequence of open intervals {(a − 1/n, b + 1/n), n ∈ ℕ} has as its limit the closed interval [a, b]. On the other hand, the sequence {(a + 1/n, b − 1/n], n ∈ ℕ} is increasing, and its limit is (a, b). □
Consider an arbitrary sequence {Aₙ}. The sequence Bₙ = ∪_{m=n}^∞ Aₘ is non-increasing, so that B = lim_{n→∞} Bₙ exists. This set is called the superior limit of the sequence {Aₙ}, written limsupₙ Aₙ. Similarly, the limit of the non-decreasing sequence Cₙ = ∩_{m=n}^∞ Aₘ is called the inferior limit of the sequence, written liminfₙ Aₙ. Formally: for a sequence {Aₙ, n ∈ ℕ},

  limsupₙ Aₙ = ∩_{n=1}^∞ ∪_{m=n}^∞ Aₘ,   (1.19)

  liminfₙ Aₙ = ∪_{n=1}^∞ ∩_{m=n}^∞ Aₘ.   (1.20)

De Morgan's laws imply that liminfₙ Aₙ = (limsupₙ Aₙᶜ)ᶜ. The limsup is the set of elements contained in infinitely many of the Aₙ, while the liminf is the set belonging to all but a finite number of the Aₙ, that is, to every member of the sequence from some point onwards. These concepts provide us with a criterion for convergence of set sequences in general. liminfₙ Aₙ ⊆ limsupₙ Aₙ, and if these two sets differ, there are elements that belong to infinitely many of the Aₙ, but also do not belong to infinitely many of them. Such a sequence is not convergent. On the other hand, if liminfₙ Aₙ = limsupₙ Aₙ = A, the elements of A belong to infinitely many of the Aₙ and do not belong to at most a finite number of them. Then the sequence {Aₙ} is said to converge to A, and A is called the limit of the sequence.
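Formulas (1.19) and (1.20) can be illustrated numerically by truncating the infinite unions and intersections at a finite horizon N. The truncation and helper names are mine: the result agrees with the true limits only when the tail behaviour of the sequence repeats within the horizon, as it does for the simple sequences below.

```python
def limsup_sets(A, N):
    """Finite-horizon version of (1.19): intersection over n of the
    unions of A(m) for m = n, ..., N (n runs only to N//2 so each
    'tail' union is long enough to be representative)."""
    return set.intersection(
        *[set().union(*[A(m) for m in range(n, N + 1)])
          for n in range(1, N // 2)])

def liminf_sets(A, N):
    """Finite-horizon version of (1.20): union over n of the
    intersections of A(m) for m = n, ..., N."""
    result = set()
    for n in range(1, N // 2):
        tail = set(A(n))
        for m in range(n, N + 1):
            tail &= set(A(m))
        result |= tail
    return result

# An alternating sequence: each element occurs infinitely often,
# but neither occurs in all but finitely many terms.
A = lambda n: {0} if n % 2 == 0 else {1}
```

For this A the limsup is {0, 1} and the liminf is empty, so the sequence does not converge; a sequence that settles down eventually has the two limits equal.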
1.5 Classes of Subsets

The set of all the subsets of X is called the power set of X, denoted 2^X. The power set of a set with n elements has 2ⁿ elements, which accounts for its name and representation. In the case of a countable set, the power set is thought of formally as having 2^ℵ₀ elements. One of the fundamental facts of set theory is that the number of subsets of a given set strictly exceeds the number of its elements. For finite sets this is obvious, but when extended to countable sets it amounts to the claim that 2^ℵ₀ > ℵ₀.

1.13 Theorem 2^ℵ₀ > ℵ₀.

Proof The proposition is proved if we can show that 2^ℕ is equipotent with ℝ or, equivalently (in view of 1.8), with the unit interval (0, 1]. For a set A ∈ 2^ℕ, construct the sequence of binary digits {b₁, b₂, ...} according to the rule, 'bₙ = 1 if n ∈ A, bₙ = 0 otherwise'. Using formula (1.15) with m = 1 and q = 0, let this sequence define an element x_A of (0, 1] (the case where bₙ = 1 for all n defines 1). On the other hand, for any element x ∈ (0, 1], construct the set A_x ∈ 2^ℕ according to the rule, 'include n in A_x if and only if the nth digit in the binary expansion of x is a 1'. These constructions define a 1-1 correspondence between 2^ℕ and (0, 1]. ∎
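The digit correspondence used in the proof can be sketched for finite truncations of the binary expansion (the function names are mine; Fraction keeps the arithmetic exact):

```python
from fractions import Fraction

def subset_to_real(A):
    """x_A = sum of 2^(-n) over n in A: the point of (0,1] whose binary
    digits are the indicator sequence of the (finite) set A."""
    return sum((Fraction(1, 2 ** n) for n in A), Fraction(0))

def real_to_subset(x, k):
    """Recover {n <= k: the nth binary digit of x is 1} by repeated doubling."""
    A = set()
    for n in range(1, k + 1):
        x *= 2
        if x >= 1:
            A.add(n)
            x -= 1
    return A
```

For finite sets the two constructions are exact inverses, which is the finite shadow of the 1-1 correspondence between 2^ℕ and (0, 1] asserted in the proof.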
When studying the subsets of a given set, particularly their measure-theoretic properties, the power set is often too big for anything very interesting or useful to be said about it. The idea behind the following definitions is to specify subsets of 2^X that are large enough to be interesting, but whose characteristics may be more tractable. We typically do this by choosing a base collection of sets with known properties, and then specifying certain operations for creating new sets from existing ones. These operations permit an interesting diversity of class members to be generated, but important properties of the sets may be deduced from those of the base collection, as the following examples show.
1.14 Definition A ring ℛ is a nonempty class of subsets of X satisfying
(a) ∅ ∈ ℛ.
(b) If A ∈ ℛ and B ∈ ℛ, then A ∪ B ∈ ℛ, A ∩ B ∈ ℛ, and A − B ∈ ℛ. □
One generates a ring by specifying an arbitrary basic collection C, which must include ∅, and then declaring that any sets that can be generated by the specified operations also belong to the class. A ring is said to be closed under the operations of union, intersection, and difference. Rings lack a crucial piece of structure, for there is no requirement for the set X itself to be a member. If X is included, a ring becomes a field, or synonymously, an algebra. Since X − A = Aᶜ, this amounts to including all complements, and, in view of the de Morgan laws, specifying the inclusion of intersections and differences becomes redundant.
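The 'generate by closure' idea can be carried out by brute force when X is finite: start from a seed collection and keep applying the class operations until nothing new appears. A sketch for the field operations, complement and union (the function name and example sets are mine):

```python
from itertools import combinations

def generated_field(X, seed):
    """Smallest field on finite X containing the seed sets: close under
    complement and pairwise union until the collection stops growing."""
    F = {frozenset(), frozenset(X)} | {frozenset(s) for s in seed}
    changed = True
    while changed:
        changed = False
        for A in list(F):
            comp = frozenset(X) - A
            if comp not in F:
                F.add(comp)
                changed = True
        for A, B in combinations(list(F), 2):
            if A | B not in F:
                F.add(A | B)
                changed = True
    return F
```

With seed {{1}} on X = {1, 2, 3, 4} the closure stops at the four sets {∅, X, {1}, {2, 3, 4}}, the 'scarcely less trivial' field mentioned below; adding {2} to the seed yields the eight unions of the atoms {1}, {2}, {3, 4}.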
1.15 Definition A field ℱ is a class of subsets of X satisfying
(a) X ∈ ℱ.
(b) If A ∈ ℱ then Aᶜ ∈ ℱ.
(c) If A and B ∈ ℱ then A ∪ B ∈ ℱ. □

A field is said to be closed under complementation and finite union, and hence under intersections and differences too; none of these operations can take one outside the class. These classes can be very complex, and also very trivial. The simplest case of a ring is {∅}. The smallest possible field is {X, ∅}. Scarcely less trivial is the field {X, A, Aᶜ, ∅} where A is any subset of X. What makes any class of sets interesting, or not, is the collection C of sets it is declared to contain, which we can think of as the 'seed' for the class. We speak of the smallest field containing C as 'the field generated by C'. Rings and fields are natural classes in the sense of being defined in terms of the simple set operations, but their structure is rather restrictive for some of the applications in probability. More inclusive definitions, carefully tailored to include some important cases, are as follows.
1.16 Definition A semi-ring 𝒮 is a non-empty class of subsets of X satisfying
(a) ∅ ∈ 𝒮.
(b) If A, B ∈ 𝒮 then A ∩ B ∈ 𝒮.
(c) If A, B ∈ 𝒮 and A ⊆ B, there exists n < ∞ such that B − A = ∪_{j=1}^n Cⱼ, where Cⱼ ∈ 𝒮 and Cⱼ ∩ Cⱼ′ = ∅ for each pair j ≠ j′. □

More succinctly, condition (c) says that the difference of two 𝒮-sets has a finite partition into 𝒮-sets.

1.17 Definition A semi-algebra 𝒮 is a class of subsets of X satisfying
(a) X ∈ 𝒮.
(b) If A, B ∈ 𝒮 then A ∩ B ∈ 𝒮.
(c) If A ∈ 𝒮, there exists n < ∞ such that Aᶜ = ∪_{j=1}^n Cⱼ, where Cⱼ ∈ 𝒮 and Cⱼ ∩ Cⱼ′ = ∅ for each pair j ≠ j′. □
A semi-ring containing X is a semi-algebra.

1.18 Example Let X = ℝ, and consider the class of all the half-open intervals I = (a, b] for a ≤ b, together with the empty set. If I₁ = (a₁, b₁] and I₂ = (a₂, b₂], then I₁ ∩ I₂ is one of I₁, I₂, (a₂, b₁], (a₁, b₂], and ∅. And if a₁ ≤ a₂ and b₂ ≤ b₁, then I₁ − I₂ is one of (a₁, a₂] ∪ (b₂, b₁], I₁, and ∅. The conditions defining a semi-ring are therefore satisfied, although not those defining a ring. If we now let ℝ be a member of the class and follow 1.17, we find that the half-open intervals, plus the unbounded intervals of the form (−∞, b] and (a, +∞), plus ∅ and ℝ, constitute a semi-algebra. □
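The two computations in the example, intersection and difference of half-open intervals, can be checked mechanically. In the sketch below a tuple (a, b) stands for the interval (a, b], and the function names are mine:

```python
def intersect(i1, i2):
    """(a1, b1] intersected with (a2, b2]; None stands for the empty set."""
    a, b = max(i1[0], i2[0]), min(i1[1], i2[1])
    return (a, b) if a < b else None

def difference(i1, i2):
    """(a1, b1] minus (a2, b2], returned as a list of disjoint half-open
    intervals: the finite partition required by semi-ring condition (c)."""
    inter = intersect(i1, i2)
    if inter is None:
        return [i1]          # nothing removed
    (a1, b1), (a2, b2) = i1, inter
    pieces = []
    if a1 < a2:
        pieces.append((a1, a2))
    if b2 < b1:
        pieces.append((b2, b1))
    return pieces
```

The difference always comes out as zero, one, or two half-open pieces, never anything outside the class, which is exactly why these intervals form a semi-ring but not a ring.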
1.6 Sigma Fields

When the operations defining a field are extended from finite to countable unions, we obtain the class of sets of central importance in what follows.

1.19 Definition A σ-field (or σ-algebra) ℱ is a class of subsets of X satisfying
(a) X ∈ ℱ.
(b) If A ∈ ℱ then Aᶜ ∈ ℱ.
(c) If {Aⱼ, j ∈ ℕ} is a countable collection of ℱ-sets, then ∪_{j=1}^∞ Aⱼ ∈ ℱ. □

A σ-field is accordingly closed under countable unions and, by the de Morgan laws, under countable intersections as well. Given any collection C of subsets of X, the intersection of all the σ-fields containing C is itself a σ-field,
called the σ-field generated by C, customarily denoted σ(C). The following theorem establishes a basic fact about σ-fields.

1.20 Theorem If C is a finite collection, σ(C) is finite; otherwise σ(C) is always uncountable.

Proof Define the relation R between elements of X by xRy iff x and y are elements of the same sets of C. R is an equivalence relation, and hence defines a partition ℰ of X into disjoint subsets. Each set of ℰ is the intersection of all the C-sets containing its elements and the complements of the remainder. (For example, see Fig. 1.1. For this collection of regions of ℝ², ℰ is the partition defined by the complete network of set boundaries.) If C contains n sets, ℰ contains at most 2ⁿ sets and σ(C), in this case the collection of all unions of ℰ-sets, contains at most 2^(2ⁿ) sets. This proves the first part of the theorem.

Let C be infinite. If it is uncountable then so is σ(C) and there is nothing more to show, so assume C is countable. In this case every set in ℰ is a countable intersection of C-sets or the complements of C-sets, hence ℰ ⊆ σ(C), and hence also 𝒰(ℰ) ⊆ σ(C), where 𝒰(ℰ) is the collection of all the countable unions of ℰ-sets. If we show 𝒰(ℰ) is uncountable, the same will be true of σ(C). We may assume that ℰ is countable, since otherwise there is nothing more to show. So let the sets of ℰ be indexed by ℕ. Then every union of ℰ-sets corresponds uniquely with a subset of ℕ, and every subset of ℕ corresponds uniquely to a union of ℰ-sets. In other words, the elements of 𝒰(ℰ) are equipotent with those of 2^ℕ, which are uncountable by 1.13. This completes the proof. ∎
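The equivalence-class argument in the proof can be reproduced on a finite X: form the partition ℰ from the relation 'x and y belong to the same sets of C', then build σ(C) as the collection of all unions of its cells. A sketch with illustrative sets (function names mine):

```python
from collections import defaultdict
from itertools import combinations

def atoms(X, C):
    """The partition of X induced by 'x and y lie in the same sets of C':
    group points by the signature of C-sets containing them."""
    cells = defaultdict(set)
    for x in X:
        signature = frozenset(i for i, A in enumerate(C) if x in A)
        cells[signature].add(x)
    return [frozenset(cell) for cell in cells.values()]

def sigma_field(X, C):
    """sigma(C) for finite X: every union of atoms (including the empty union)."""
    ats = atoms(X, C)
    F = set()
    for r in range(len(ats) + 1):
        for combo in combinations(ats, r):
            F.add(frozenset().union(*combo))
    return F
```

With n seed sets there are at most 2ⁿ atoms and hence at most 2^(2ⁿ) members of σ(C), matching the counting step in the proof.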
1.21 Example Let X = ℝ, and let C = {(−∞, r], r ∈ ℚ}, the collection of closed half-lines with rational endpoints. σ(C) is the Borel field of ℝ, generally denoted ℬ. A number of different base collections generate ℬ. Since countable unions of open intervals can be closed intervals, and vice versa (compare 1.12), the set of open half-lines, {(−∞, r), r ∈ ℚ}, will also serve. Or, letting {rₙ} be a decreasing sequence of rational numbers with rₙ ↓ x,

  (−∞, x] = ∩_{n=1}^∞ (−∞, rₙ].   (1.21)

Such a sequence exists for any x ∈ ℝ (see 2.15), and hence the same σ-field is generated by the (uncountable) collection of half-lines with real endpoints, {(−∞, x], x ∈ ℝ}. It easily follows that various other collections generate ℬ, including the open intervals of ℝ, the closed intervals, and the half-open intervals. □
T
=
F, the extended
real line. The Borel
field of F is easily
# e Sl, where B is the Borel field of R. You can verify that S is a c-field, and is generated by the collection F' of 1.21 augmented by the sets (-x) and V. n =
(#, B tp (+xJ, B t.p (-x4, B QJ (+x) t-p (-x):
1.23 Example Given an interval I of the line, the class ℬ_I = {B ∩ I: B ∈ ℬ} is called the restriction of ℬ to I, or the Borel field on I. In fact, ℬ_I is the σ-field generated from the collection C = {(−∞, r] ∩ I: r ∈ ℚ}. □
Notice how σ(C) has been defined 'from the outside'. It might be thought that σ(C) could be defined 'from the inside', in terms of a specified sequence of the operations of complementation and countable union applied to the elements of C. But, despite the constructive nature of the definitions, 1.20 suggests how this may be impossible. Suppose we define 𝒜₁ as the set that contains C, together with the complement of every set in C and all the finite and countable unions of the sets of C. Of course, 𝒜₁ is not σ(C) because it does not contain the complements of the unions. So let 𝒜₂ be the set containing 𝒜₁ together with all the complements and finite and countable unions of the sets in 𝒜₁. Defining 𝒜₃, 𝒜₄, ... in the same manner, it might be thought that the monotone sequence {𝒜ₙ} would approach σ(C) as n → ∞; but in fact this is not so. In the case of the class ℬ_[0,1], for example, it can be shown that 𝒜_∞ is strictly smaller than σ(C) (see Billingsley 1986: 26). This fact is again demonstrated, for ℬ_[0,1], in §3.4.

The union of two σ-fields (the set of elements contained in either or both of them) is not generally a σ-field, for the unions of the sets from one field with those from the other are not guaranteed to belong to it. The concept of union for σ-fields is therefore extended by adding in these sets. Given σ-fields ℱ and 𝒢, the smallest σ-field containing all the elements of ℱ and all the elements of 𝒢 is denoted ℱ ∨ 𝒢, called the union of ℱ and 𝒢. On the other hand, ℱ ∩ 𝒢 = {A: A ∈ ℱ and A ∈ 𝒢} is a σ-field, although for uniformity the notation ℱ ∧ 𝒢 may be used for such intersections. Formally, ℱ ∧ 𝒢 denotes the largest of the σ-fields whose elements belong to both ℱ and 𝒢. Both of these operations generalize to the countable case, so that for a sequence of σ-fields ℱₙ, n = 1, 2, 3, ..., we may define ∨_{n=1}^∞ ℱₙ and ∧_{n=1}^∞ ℱₙ.

Without going prematurely into too many details, it can be said that a large part of the intellectual labour in probability and measure theory is devoted to proving that particular classes of sets are σ-fields. Problems of this kind will arise throughout this book. It is usually not too hard to show that Aᶜ ∈ ℱ whenever A ∈ ℱ, but the requirement to show that a class contains the countable unions can be tough to fulfil. The following material can be helpful in this connection.

A monotone class ℳ is a class of sets such that, if {Aₙ} is a monotone sequence with limit A, and Aₙ ∈ ℳ for all n, then A ∈ ℳ. If {Aₙ} is non-decreasing, then A = ∪_{n=1}^∞ Aₙ. If it is non-increasing, then A = ∩_{n=1}^∞ Aₙ. The next theorem shows that, to determine whether or not we are dealing with a σ-field, it is sufficient to consider whether the limits of monotone sequences belong to it, which should often be easier to establish than the general case.
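On a finite X, where every field is automatically a σ-field, the union ℱ ∨ 𝒢 described above can be computed by closing the plain union ℱ ∪ 𝒢 under complements and unions; the computation also shows why the plain union is not itself enough. A sketch with illustrative sets (function name mine):

```python
from itertools import combinations

def sigma_join(X, F, G):
    """F v G on finite X: close the collection F union G under
    complement and pairwise union until it stops growing."""
    H = {frozenset(A) for A in F} | {frozenset(A) for A in G}
    changed = True
    while changed:
        changed = False
        for A in list(H):
            comp = frozenset(X) - A
            if comp not in H:
                H.add(comp)
                changed = True
        for A, B in combinations(list(H), 2):
            if A | B not in H:
                H.add(A | B)
                changed = True
    return H

X = {1, 2, 3, 4}
F = [set(), X, {1}, {2, 3, 4}]   # sigma-field generated by {1}
G = [set(), X, {2}, {1, 3, 4}]   # sigma-field generated by {2}
H = sigma_join(X, F, G)
```

The set {1, 2} = {1} ∪ {2} belongs to neither ℱ nor 𝒢, so ℱ ∪ 𝒢 fails to be a σ-field; the closure adds it (and its complement {3, 4}), giving the eight-element σ-field with atoms {1}, {2}, {3, 4}.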
tfrom
.d1
.4z
.44,...
--->
.
=
=
=
=
=
C' y' .'E
-'. .' -' :t' '
.'''
.' '''.'' ..'.'.'....'. ..'.' '. ' . . .'... . .. .. ..(E.
' . .. '
E
' g. ...
' . .'
r . .. : :
1);,4
.
. .lj ' .. . y. .. . . Flrqh:oykm
l q)
l g...(
.. . , . . .
l .
.
'
.
.
. .
.
.
?;; is a c-field iff it is both a field and a monotone class.
.t:)2
E
.
.
:.y
:
.
. . .. . . .r ..Eq(
,,k))
y.
t j' ' .
. . The..ftjjjjy, p tjtjjy .
''
of the
ito
is immediate. For
theorem patt the part y(t.: j. . ;) y y. j.r r ,'tlt:' t.ri.jjryry.yi.,yuit y 1:2j1. :jkq-.j(jjji. :g;pC;: ;(.):t..,.(.,)i'.:..'..y.;tt.t,jiyA......,y '.1t jkj:r qlj?ii. 4, )1:41.. .j1r:lg. 1:;(jj.. jpp:p-;,r 4:;:: ...... jlrrjs 4:2:: 422:: j,qbqf. . 4EEEE! jjr-iylr jjy-plrqjj .. jrrqls ,,krIg. :;; igjl. ji. ji. :. ,(EE L.,;.,jfL:..; ... j . I j!..I t;i:;r ni; (1q ... .g. )......)...... tjjii:;s ,b, 4:22EE jj tlij;r (yj,?)'' yj e. (. . k:j?,. .t.y(). r .., j; (jry-;;;(tj yy ). .s.u.. y... r(j-(.. (.; .i.qLL);,;j . y. . ... y... .;jtytytyt, . . . . tg -L$1t.. ttttt . . .;j )')r'--t)-tzjj i-.-tki -,.-(jkyr))-;-..---,r. y is . rj limit with =1A,, t monotone ;, );. by t sequence ) a ulyytjt U.-----.- e y-):-),::::r,:--.. --..L3LLLi..-. .,.-yt.-..,7r-...(-.......). r ;-)y)-. j/l-rjt--j,-....;.LL;L;.,.;j;;-),.,;b;qj, '''-?'''t''-' )jjt.-. (t)-.)rrtt.,..,)))-y)#)---.r),-.-. t;)--?--.--,r...r-....-.,... ...... ... . . .--. ... ;.,-(.$;.,q;-.,-LLLL1L--t;))plt:'-i yryjyti... E #,,.. .. kr. t).y-j-t,yr ....-..--...- ....-.... ttl'i?.-.-;r-..t...)k.-(--..-t..-.ti..-)-.,)---)'.-.-.t.....-..t-., .t'y4?.i@''. ... . ..Li q-?..'..I3;......;-;t.-b. .!.#,..-t,) llt))k...-.... tiy,-3$bqL-;--....').L ..::?)i..-..$kr'-'t)L' .. .)yyyiyyyi 14441111@11..... i
r
,.tE'jt.ty( : ;,
i E kE'
(
(
.
.
-.yy)j,ytj,.i...y.yE.y;;r.,...myt;y.U,
; -. . g . .
; ;
(
.
E
i..E -
-
q
,.
..
.
.,...
,,
-. .
-
-. .
- . . .
:.
-
.
-.
.E-...
,
q .
. . .
.. -.
.
.
:
E .
.
-
k.tlll.
..
-.
..
..
...-....
.....
..
,.
.
.
..
- ...-.-
.
.
. . ,
...
,,
-;::,---::--
.
-
........
-j
..-.
.,
.
.
..
...
'''--'f'':tt--
.....
define
,
-k,(
..
.....,.
x ' '--------8-'-:'''*''''% . -))
'if'
-jlqilii;,;j;r
.
....
..
-......-....
..
.
.
. .
.
.
.
.
.
..-
..
... ..... -
-p
. .
..
..
-
.
18
Mathetnatics Ur1=lAk, so the theorem follows.
assumption.Uo;=1An =
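The device in this proof, trading an arbitrary countable union for the limit of the non-decreasing finite unions B_n, can be sketched numerically. The following Python fragment is an added illustration, not part of the text; the sets A_n are arbitrary subsets of a finite space.

```python
# Added illustration (not part of the text): the finite unions B_n used in
# the proof of 1.24. The sets A_n below are arbitrary subsets of a finite X.
A = [{1, 4}, {2}, {1, 7}, {0, 2, 9}]

B = []
acc = set()
for A_n in A:
    acc = acc | A_n            # B_n = B_{n-1} U A_n: a finite union only
    B.append(set(acc))

# {B_n} is monotone (non-decreasing), as the proof requires ...
assert all(B[i] <= B[i + 1] for i in range(len(B) - 1))
# ... and its limit coincides with the arbitrary countable union U_n A_n.
assert B[-1] == set().union(*A)
```

Only the field operations (finite unions) are used to build each B_n; the monotone-class property then delivers the countable union as a limit.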
Another useful trick is Dynkin's π-λ theorem.¹ To develop this result, we define two new classes of subsets of X.
1.25 Definition A class 𝒫 is a π-system if A ∈ 𝒫 and B ∈ 𝒫 implies A ∩ B ∈ 𝒫. A class ℒ is a λ-system if
(a) X ∈ ℒ;
(b) if A ∈ ℒ and B ∈ ℒ with B ⊆ A, then A − B ∈ ℒ;
(c) if {A_n ∈ ℒ} is a non-decreasing sequence and A_n ↑ A, then A ∈ ℒ.
Conditions (a) and (b) imply that a λ-system is closed under complementation (put A = X). Moreover, since (b) implies that A_{n+1} − A_n ∈ ℒ for each n of a non-decreasing ℒ-sequence, (c) implies that a countable union of disjoint ℒ-sets is in ℒ. In fact, these implications hold in both directions, and we have the following.
1.26 Theorem A class ℒ is a λ-system if and only if
(a) X ∈ ℒ;
(b) if B ∈ ℒ, then Bᶜ ∈ ℒ;
(c) if {A_n ∈ ℒ} is a disjoint sequence, then ⋃_n A_n ∈ ℒ. ∎
In particular, a σ-field is a λ-system, and moreover, a class that is both a π-system and a λ-system is a σ-field. This follows by 1.24, because such a class is a monotone class closed under unions by 1.25(c), and by de Morgan's laws is closed under both intersections and complementation. The following result makes these definitions useful.
1.27 Dynkin's π-λ theorem (Billingsley 1979, 1986: th. 3.2) If 𝒫 is a π-system, ℒ is a λ-system, and 𝒫 ⊆ ℒ, then σ(𝒫) ⊆ ℒ.

Proof Let ℒ(𝒫) denote the smallest λ-system containing 𝒫 (the intersection of all the λ-systems containing 𝒫), so that in particular, ℒ(𝒫) ⊆ ℒ. We show that ℒ(𝒫) is a π-system. By the remarks above, it will then follow that ℒ(𝒫) is a σ-field, and hence that σ(𝒫) ⊆ ℒ(𝒫) ⊆ ℒ, as required.

For a set A ∈ ℒ(𝒫), let 𝒢_A denote the class of sets B such that A ∩ B ∈ ℒ(𝒫). We shall show that 𝒢_A is a λ-system. Clearly, X ∈ 𝒢_A, so that condition 1.25(a) is satisfied. Let B₁, B₂ ∈ 𝒢_A, with B₁ ⊆ B₂; then A ∩ B₁ ∈ ℒ(𝒫) and A ∩ B₂ ∈ ℒ(𝒫), and (A ∩ B₁) ⊆ (A ∩ B₂), which implies that

(A ∩ B₂) − (A ∩ B₁) = A ∩ (B₂ − B₁) ∈ ℒ(𝒫).  (1.22)

But this means that B₂ − B₁ ∈ 𝒢_A, and condition 1.25(b) is satisfied. Lastly, suppose A ∩ B_i ∈ ℒ(𝒫) for each i = 1, 2, ... and B_i ↑ B. Then A ∩ B ∈ ℒ(𝒫) by 1.25(c), which means that 1.25(c) holds for 𝒢_A, and 𝒢_A is a λ-system, as asserted.

Suppose A ∈ 𝒫. Then B ∈ 𝒫 implies A ∩ B ∈ 𝒫 (𝒫 is a π-system), and since 𝒫 ⊆ ℒ(𝒫), this further implies B ∈ 𝒢_A. Hence 𝒫 ⊆ 𝒢_A. Since 𝒢_A is a λ-system containing 𝒫, and ℒ(𝒫) is the smallest such λ-system, we also have ℒ(𝒫) ⊆ 𝒢_A in this case. So, when A ∈ 𝒫, B ∈ ℒ(𝒫) implies B ∈ 𝒢_A, and hence A ∩ B ∈ ℒ(𝒫). We can summarize the last conclusion as:

(A ∈ 𝒫, B ∈ ℒ(𝒫)) ⟹ (A ∩ B ∈ ℒ(𝒫)).  (1.23)

Now defining 𝒢_B by analogy with 𝒢_A, so that

(B ∈ ℒ(𝒫), A ∩ B ∈ ℒ(𝒫)) ⟺ (B ∈ ℒ(𝒫), A ∈ 𝒢_B),  (1.24)

we see that (1.23) and (1.24) together yield 𝒫 ⊆ 𝒢_B whenever B ∈ ℒ(𝒫). Since 𝒢_B is also a λ-system, by the same argument as held for 𝒢_A, and contains 𝒫, ℒ(𝒫) ⊆ 𝒢_B by definition of ℒ(𝒫). Thus, suppose B ∈ ℒ(𝒫) and C ∈ ℒ(𝒫). Then C ∈ 𝒢_B, which means that B ∩ C ∈ ℒ(𝒫). So ℒ(𝒫) is a π-system, as required. ∎
2 Limits and Continuity
2.1 The Topology of the Real Line

The purpose of this section is to treat rigorously the idea of 'nearness' as it applies to points of the real line. The key ingredient of the theory is the distance between a pair of points x, y ∈ ℝ, defined as the non-negative real number |x − y|, what is formally called the Euclidean distance. In Chapters 5 and 6 we examine the generalization of this theory to non-Euclidean spaces, and find not only that most aspects of the theory have a natural generalization, but that the concept of distance itself can be dispensed with in their development. We are really studying a special case of a very powerful general theory. This fact may be helpful in making sense of certain ideas, the definition of compactness for example, which can otherwise appear a little puzzling at first sight.

An ε-neighbourhood of a point x ∈ ℝ is a set S(x,ε) = {y: |x − y| < ε}, for some ε > 0. An open set is a set A ⊆ ℝ such that for each x ∈ A, there exists for some ε > 0 an ε-neighbourhood S(x,ε) which is a subset of A. The open intervals defined in §1.3 are open sets, since if a < x < b, ε = min{|x − a|, |b − x|} > 0 satisfies the definition. ℝ and ∅ are also open sets on the definition.

The concept of an open set is subtle and often gives beginners some difficulty. Naive intuition strongly favours the notion that in any bounded set of points there ought to be one that is 'nearest' to a point outside the set. But open sets are sets that do not have this property, and there is no shortage of them in ℝ. For a complete understanding of the issues involved we need the additional concepts of Cauchy sequence and limit, to appear in §2.2 below. Doubters are invited to suspend their disbelief for the moment and just take the definition at face value.

The collection of all the open sets of ℝ is known as the topology of ℝ. More precisely, we ought to call this the usual topology on ℝ, since other ways of defining open sets of ℝ can be devised, although these will not concern us.
(See Chapter 6 for more information on these matters.) More generally, we can discuss subsets of ℝ from a topological standpoint, although we would tend to use the term subspace rather than subset in this context. If A ⊆ S ⊆ ℝ, we say that A is open in S if for each x ∈ A there exists S(x,ε), ε > 0, such that S(x,ε) ∩ S is a subset of A. Thus, the interval (½, 1] is not open in ℝ, but it is open in (0, 1]. These sets define the relative topology on S, that is, the topology on S relative to ℝ. The following result is an immediate consequence of the definition.

2.1 Theorem A ⊆ S is open in S iff A = B ∩ S, where B is an open set of ℝ. □

A closure point of a set A is a point x ∈ ℝ such that, for every ε > 0, the set S(x,ε) ∩ A is not empty. The closure points of A are not necessarily elements of A, open sets being a case in point. The set of closure points of A is called the closure of A, and will be denoted Ā, or sometimes (A)⁻ if the set is defined by an expression. An accumulation point of A is a point x ∈ ℝ which is a closure point of the set A − {x}. An accumulation point has other points of A arbitrarily close to it, and if x is a closure point of A and x ∉ A, it must also be an accumulation point. A closure point that is not an accumulation point (the former definition being satisfied because each ε-neighbourhood of x contains x itself) is an isolated point of A.

A boundary point of a set A is a point x ∈ Ā such that the set Aᶜ ∩ S(x,ε) is not empty for any ε > 0. The set of boundary points of A is denoted ∂A, and the interior of A is the set A° = A − ∂A. A closed set is one containing A ∪ ∂A; that is, a set A such that Ā = A. For an open interval A = (a,b) ⊆ ℝ, Ā = [a,b]. Every point of (a,b) is a closure point, and a and b are also closure points, not belonging to (a,b). They are the boundary points of both (a,b) and [a,b].
2.2 Theorem The complement of an open set in ℝ is a closed set. □
This gives an alternative definition of a closed set. According to the definitions, ∅ (the empty set) and ℝ are both open and closed. The half-line (−∞, x] is the complement of the open set (x, +∞), and is hence closed. Extending this result to relative topologies, we have the following.
2.3 Theorem If A is open in S ⊆ ℝ, then S − A is closed in S. □
In particular, a corollary to 2.1 is that if B is closed in ℝ then S ∩ B is closed in S. But, for example, the interval [½, 1) is not closed in ℝ, although it is closed in the set (0,1), since its complement (0,½) is open in (0,1). Some additional properties of open sets are given in the following theorems.
2.4 Theorem
(i) The union of a collection of open sets is open.
(ii) If A and B are open, then A ∩ B is open. □
This result will be proved in a more general context below, as 5.4. Arbitrary intersections of open sets need not be open; see 1.12 for a counter-example.
2.5 Theorem Every open set A ⊆ ℝ is the union of a countable collection of disjoint open intervals.
Proof Consider a collection {S(x, ε_x), x ∈ A}, where for each x, ε_x > 0 is chosen small enough that S(x, ε_x) ⊆ A. Then ⋃_{x∈A} S(x, ε_x) ⊆ A but, since necessarily A ⊆ ⋃_{x∈A} S(x, ε_x), it follows that A = ⋃_{x∈A} S(x, ε_x). This shows that A is a union of open intervals. Now define a relation R for elements of A, such that xRy if there exists an open interval I ⊆ A with x ∈ I and y ∈ I. Every x ∈ A is contained in some interval by the preceding argument, so that xRx for all x ∈ A. The symmetry of R is obvious. Lastly, if x, y ∈ I ⊆ A and y, z ∈ I′ ⊆ A, then I ∩ I′ is nonempty and hence I ∪ I′ is also an open interval, so R is transitive. Hence R is an equivalence relation, and the intervals I form the equivalence classes partitioning A. Thus, A is a union of disjoint open intervals. The theorem now follows from 1.11. ∎

Recall from 1.21 that ℬ, the Borel field of ℝ, is the σ-field of sets generated by both the open and the closed half-lines. Since every interval is the intersection of a half-line (open or closed) with the complement of another half-line, 2.2 and 2.5 yield directly the following important fact.
2.6 Theorem ℬ contains the open sets and the closed sets of ℝ. □
A collection 𝒞 is called a covering for a set A ⊆ ℝ if A ⊆ ⋃_{B∈𝒞} B. If each B ∈ 𝒞 is an open set, it is called an open covering.
2.7 Lindelöf's covering theorem If 𝒞 is any collection of open subsets of ℝ, there is a countable subcollection {B_i ∈ 𝒞, i ∈ ℕ} such that

⋃_{i=1}^∞ B_i = ⋃_{B∈𝒞} B.
Proof Consider the collection 𝒮 = {S_k = S(r_k, s_k), r_k ∈ ℚ, s_k ∈ ℚ⁺}; that is, the collection of all neighbourhoods of rational points of ℝ having rational radii. The set ℚ × ℚ⁺ is countable by 1.5, and hence 𝒮 is countable; in other words, indexing by k ∈ ℕ exhausts the set. We show that, for any open set B ⊆ ℝ and point x ∈ B, there is a set S_k ∈ 𝒮 such that x ∈ S_k ⊆ B. Since x has an ε-neighbourhood inside B by definition, the desired S_k is found by setting s_k to any rational from the open interval (0, ½ε) for ε > 0 sufficiently small, and then choosing r_k ∈ S(x, s_k), as is possible by 1.10. Now for each x ∈ ⋃_{B∈𝒞} B, choose a member of 𝒮, say S_{k(x)}, satisfying x ∈ S_{k(x)} ⊆ B for some B ∈ 𝒞, letting k(x) be the smallest index which satisfies the requirement so as to give an unambiguous choice. The distinct members of this collection form a set that covers ⋃_{B∈𝒞} B, but is a subset of 𝒮 and hence countable. Labelling the indices of this set as k₁, k₂, ..., choose B_i as any member of 𝒞 containing S_{k_i}. Clearly, ⋃_{i=1}^∞ B_i is a countable covering for ⋃_{i=1}^∞ S_{k_i}, and hence also for ⋃_{B∈𝒞} B. ∎
It follows that, if 𝒞 is a covering for a set in ℝ, it contains a countable subcovering. This is sometimes called the Lindelöf property. The concept of a covering leads on to the crucial notion of compactness. A set A is said to be compact if every open covering of A contains a finite subcovering. The words that matter in this definition are 'every' and 'open'. Any open covering that has ℝ as a member obviously contains a finite subcovering. But for a set to be compact, there must be no way to construct an irreducible, infinite open covering. Moreover, every interval has an irreducible infinite cover, consisting of the singleton sets of its individual points; but these sets are not open.
2.8 Example Consider the half-open interval (0,1]. An open covering is the countable collection {(1/n, 1], n ∈ ℕ}. It is easy to see that there is no finite subcollection covering (0,1] in this case, so (0,1] is not compact. □
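A numerical sketch of why Example 2.8 fails compactness (an added illustration, not part of the text; the indices chosen are arbitrary): any finite subfamily of {(1/n, 1]} covers only (1/max n, 1], leaving a witness point uncovered.

```python
# Added illustration (not part of the text): Example 2.8. A finite subfamily
# of the covering {(1/n, 1], n in N} covers only (1/max(F), 1].

def covered(x, ns):
    """True if x lies in the union of the intervals (1/n, 1], n in ns."""
    return any(1.0 / n < x <= 1.0 for n in ns)

finite_subfamily = [2, 5, 17]                  # an arbitrary finite choice
witness = 1.0 / (2 * max(finite_subfamily))    # a point of (0,1] ...
assert 0.0 < witness <= 1.0
assert not covered(witness, finite_subfamily)  # ... that the subfamily misses

# The full countable family reaches every point of (0,1]:
assert covered(witness, range(1, 100))
```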
A set A is bounded if A ⊆ S(x,ε) for some x ∈ A and ε > 0. The idea here is that
ε is a possibly large but finite number. In other words, a bounded set must be containable within a finite interval.

2.9 Theorem A set in ℝ is compact iff it is closed and bounded. □

This can be proved as a case of 5.12 below, and provides an alternative definition of compactness in ℝ. The sufficiency part is known as the Heine-Borel theorem.

A subset B of A is said to be dense in A if B ⊆ A ⊆ B̄. Readers may think they know what is implied here after studying the following theorem, but denseness is a slightly tricky notion. See also 2.15 and the remarks following before coming to any premature conclusions.
2.10 Theorem Let A be an interval of ℝ, and let C ⊂ A be a countable set. Then A − C is dense in A.

Proof By 1.7, each neighbourhood of a point in A contains an uncountable number of points. Hence for each x ∈ A (whether or not x ∈ C), the set (A − C) ∩ S(x,ε) is not empty for every ε > 0, so that x is a closure point of A − C. Thus A − C ⊆ A ⊆ (A − C)⁻, and A − C is dense in A. ∎
The k-fold Cartesian product of ℝ with copies of itself generates what is called Euclidean k-space, ℝᵏ. The points of ℝᵏ have the interpretation of k-vectors, or ordered k-tuples of real numbers, x = (x₁, x₂, ..., x_k)′. All the concepts defined above for sets in ℝ generalize directly to ℝᵏ. The only modification required is to replace the scalars x and y by vectors x and y, and define an ε-neighbourhood in a new way. Let ‖x − y‖ be the Euclidean distance between x and y, where ‖a‖ = (Σ_{i=1}^k a_i²)^{1/2} is the length of the vector a = (a₁,...,a_k)′, and then define S(x,ε) = {y: ‖x − y‖ < ε}, for some ε > 0. An open set A of ℝ² is one in which every point x ∈ A can be contained in an open disk with positive radius centred on x. In ℝ³ the open disk becomes an open sphere, and so on.
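The Euclidean norm and ε-neighbourhood just defined can be computed directly; the following Python sketch is an added illustration, not part of the text, checking membership of S(x, ε) in three dimensions.

```python
# Added illustration (not part of the text): Euclidean distance in R^k and
# membership of the neighbourhood S(x, eps) = {y : ||x - y|| < eps}.
import math

def norm(a):
    return math.sqrt(sum(a_i * a_i for a_i in a))

def in_S(y, x, eps):
    return norm([y_i - x_i for y_i, x_i in zip(y, x)]) < eps

assert norm((3.0, 4.0, 0.0)) == 5.0       # a 3-4-5 right triangle in R^3
x = (0.0, 0.0, 0.0)
assert in_S((0.1, 0.1, 0.1), x, 0.2)      # ||y - x|| is about 0.173 < 0.2
assert not in_S((1.0, 1.0, 1.0), x, 0.2)
```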
2.2 Sequences and Limits

A real sequence is a mapping from ℕ into ℝ. The elements of the domain are called the indices, and those of the range variously the terms, members, or coordinates of the sequence. We will denote a sequence either by {x_n, n ∈ ℕ}, or more briefly by {x_n}₁^∞, or just by {x_n} when the context is clear. {x_n}₁^∞ is said to converge to a limit x if for every ε > 0 there is an integer N_ε for which

|x_n − x| < ε for all n > N_ε.  (2.2)

Write x_n → x, or x = lim_{n→∞} x_n. When a sequence is tending to +∞ or −∞ it is often said to diverge, but it may also be said to converge in R̄, to distinguish it from those cases when it does not approach any fixed value, but is always wandering.

A sequence is monotone (non-decreasing, increasing, non-increasing, or decreasing) if one of the inequalities x_n ≤ x_{n+1}, x_n < x_{n+1}, x_n ≥ x_{n+1}, or x_n > x_{n+1} holds for every n. To indicate that a monotone sequence is converging, one may write for emphasis either x_n ↑ x or x_n ↓ x, as appropriate, although x_n → x will
also do in both cases. The following result does not require elaboration.

2.11 Theorem Every monotone sequence in a compact set converges. □
A sequence that does not converge may none the less visit the same point an infinite number of times, so exhibiting a kind of convergent behaviour. If {x_n, n ∈ ℕ} is a real sequence, a subsequence is {x_{n_k}, k ∈ ℕ}, where {n_k, k ∈ ℕ} is any increasing sequence of positive integers. If there exists a subsequence {x_{n_k}, k ∈ ℕ} and a constant c such that x_{n_k} → c, then c is called a cluster point of the sequence. For example, the sequence {(−1)ⁿ, n = 1, 2, 3, ...} does not converge, but the subsequence obtained by taking only the even values of n converges, trivially. c is usually a finite constant, but +∞ and −∞ may be cluster points of a sequence if we allow the notion of convergence in R̄. If a subsequence is convergent, then so is any subsequence of the subsequence, defined as {x_{m_k}, k ∈ ℕ} where {m_k} is an increasing sequence whose members are also members of {n_k}.

The concept of a subsequence is often useful in arguments concerning convergence. A typical line of reasoning employs a two-pronged attack: first one identifies a convergent subsequence (a monotone sequence, perhaps); then one uses other characteristics of the sequence to show that the cluster point is actually a limit. Especially useful in this connection is the knowledge that the members of the sequence are points in a compact set. Such sequences cannot diverge to infinity, since the set is bounded; and because the set is closed, any limit points or cluster points that exist must be in the set. Specifically, we have two useful results.
2.12 Theorem Every sequence in a compact set of ℝ has at least one cluster point.

Proof A monotone sequence converges in a compact set by 2.11. We show that every sequence {x_n, n ∈ ℕ} has a monotone subsequence. Define a subsequence {x_{n_k}} as follows: set n₁ = 1, and for k = 1, 2, 3, ... let x_{n_{k+1}} = sup_{n>n_k} x_n, if there exists a finite n_{k+1} satisfying this condition; otherwise let the subsequence terminate at n_k. This subsequence is non-increasing. If it terminates at n_k, the sequence {x_n, n > n_k} must contain a non-decreasing subsequence. A monotone subsequence therefore exists in every case. ∎
2.13 Theorem A sequence in a compact set either has two or more cluster points, or it converges.

Proof Suppose that c is the unique cluster point of the sequence {x_n}, but that x_n ↛ c. Then there is an infinite set of integers {n_k, k ∈ ℕ} such that |x_{n_k} − c| ≥ ε for some ε > 0. Define a sequence {y_k} by setting y_k = x_{n_k}. Since {y_k} is also a sequence on a compact set, it has a cluster point c′ which by construction is different from c. But c′ is also a cluster point of {x_n}, of which {y_k} is a subsequence, which is a contradiction. Hence, x_n → c. ∎
2.14 Example Consider the sequence {1, x, x², x³, ..., xⁿ, ...}, or more formally {x^{n−1}, n ∈ ℕ}, where x is a real number. In the case |x| < 1, this sequence converges to zero, {|x^{n−1}|} being monotone on the compact interval [0,1]. The condition specified in (2.2) is satisfied for N_ε = log ε/log|x| in this case. If x = 1 it converges to 1, trivially. If x > 1 it diverges in ℝ, but converges in R̄ to +∞. If x = −1 it neither converges nor diverges, but oscillates between the cluster points +1 and −1. Finally, if x < −1 the sequence diverges in ℝ, but does not converge in R̄; ultimately, it oscillates between the cluster points +∞ and −∞. □
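The bound N_ε = log ε/log|x| from Example 2.14 can be checked numerically. The following Python fragment is an added illustration, not part of the text; the values of x and ε are arbitrary.

```python
# Added illustration (not part of the text): Example 2.14 with |x| < 1.
# The convergence condition (2.2) holds with N_eps = log(eps)/log|x|.
import math

x, eps = 0.9, 1e-6
N = math.log(eps) / math.log(abs(x))         # about 131.1 for these values
n = math.ceil(N) + 1
assert abs(x) ** n < eps                     # beyond N_eps, |x^n| < eps
assert abs(x) ** (math.ceil(N) - 2) >= eps   # just before it, not yet
```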
We may discuss the asymptotic behaviour of a real sequence even when it has no limit. The superior limit of a sequence {x_n} is

limsup_{n→∞} x_n = inf_n sup_{m≥n} x_m.  (2.3)

(Alternative notation: lim̄_n x_n.) The limsup is the eventual upper bound of a sequence. Think of {sup_{m≥n} x_m, n = 1, 2, ...} as the sequence of the largest values the sequence takes beyond the point n. This may be +∞ for every n, but in all cases it must be a non-increasing sequence having a limit, either +∞ or a finite real number; this limit is the limsup of the sequence. A link with the corresponding concept for set sequences is that if x_n = sup A_n for some sequence of sets {A_n ⊆ ℝ}, then limsup x_n = sup A, where A = limsup_n A_n. The inferior limit is defined likewise, as the eventual lower bound:

liminf_{n→∞} x_n = −limsup_{n→∞}(−x_n) = sup_n inf_{m≥n} x_m,  (2.4)
also written lim_n x_n. Necessarily, liminf_n x_n ≤ limsup_n x_n. When the limsup and liminf of a sequence are equal, the sequence is convergent, and the limit is equal to their common value. If both equal +∞, or −∞, the sequence converges in R̄.

The usual application of these concepts is in arguments to establish the value of a limit. It may not be permissible to assume the existence of the limit, but the limsup and liminf always exist. The trick is to derive these and show them to be equal. For this purpose, it is sufficient in view of the above inequality to show limsup_n x_n ≤ liminf_n x_n. We often use this type of argument in the sequel.

To determine whether a sequence converges it is not necessary to know what the limit is; the relationship between sequence coordinates in the tail (as n becomes large) is sufficient for this purpose. The Cauchy criterion for convergence states that {x_n} converges iff for every ε > 0 there exists N_ε such that |x_n − x_m| < ε whenever n > N_ε and m > N_ε. A sequence satisfying this criterion is called a Cauchy sequence. Any sequence satisfying (2.2) is a Cauchy sequence and, conversely, a real Cauchy sequence must possess a limit in ℝ. The two definitions are therefore equivalent (in ℝ, at least), but the Cauchy condition may be easier to verify in practice. The limit of a Cauchy sequence whose members all belong to a set A is by definition a closure point of A, though it need not itself belong to A. Conversely, for every accumulation point x of a set A there must exist a Cauchy sequence in the set whose limit is x. Construct such a sequence by taking one point from each of the sequence of sets {S(x, 1/n) ∩ (A − {x}), n ∈ ℕ}.
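Definitions (2.3) and (2.4) can be approximated by truncating the inner sup and inf at a large index. The Python sketch below is an added illustration, not part of the text; the test sequence and truncation point are arbitrary. It recovers limsup = 1 and liminf = −1 for x_n = (−1)ⁿ(1 + 1/n), and confirms that the tail suprema are non-increasing, as stated above.

```python
# Added illustration (not part of the text): approximating (2.3) and (2.4)
# for x_n = (-1)^n (1 + 1/n), truncating the tails at an arbitrary index.

def x(n):
    return (-1) ** n * (1 + 1 / n)

TAIL = 10_000                      # stands in for m -> infinity

def tail_sup(n):
    return max(x(m) for m in range(n, TAIL))

def tail_inf(n):
    return min(x(m) for m in range(n, TAIL))

sups = [tail_sup(n) for n in (1, 10, 100, 1000)]
assert all(s >= t for s, t in zip(sups, sups[1:]))   # non-increasing in n
assert abs(tail_sup(1000) - 1.0) < 1e-2              # limsup x_n = 1
assert abs(tail_inf(1000) + 1.0) < 1e-2              # liminf x_n = -1
```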
2.15 Example If the decimal expansion of x_n terminates at the nth digit, then 10ⁿx_n is an integer: removing the decimal point produces a finite integer a, and x_n = a/10ⁿ, so x_n is rational. Given any real x, a sequence of rationals {x_n} is therefore obtained by replacing with a zero every digit in the decimal expansion of x beyond the nth, for n = 1, 2, .... Since |x − x_n| < 10⁻ⁿ, {x_n} is a Cauchy sequence and x_n → x as n → ∞. ∎
The sequence exhibited is increasing, but a decreasing sequence can also be constructed, as {−y_n} where {y_n} is an increasing sequence tending to −x. If x is itself rational, this construction works by putting x_n = x for every n, which trivially defines a Cauchy sequence, but certain arguments such as in 2.16 below depend on having x_n ≠ x for every n. To satisfy this requirement, choose the 'non-terminating' representation of the number; for example, instead of 1 take 0.9999999..., and consider the sequence {0.9, 0.99, 0.999, ...}. This does not work for the point 0, but then one can choose {0.1, 0.01, 0.001, ...}.

One interesting corollary of 2.15 is that, since every ε-neighbourhood of a real number must contain a rational, ℚ is dense in ℝ. We also showed in 2.10 that ℝ − ℚ is dense in ℝ, since ℚ is countable. We must be careful not to jump to the conclusion that because a set is dense, its complement must be 'sparse'. Another version of this proof, at least for points of the interval [0,1], is got by using the binary expansion of a real number. The dyadic rationals are the set

D = {i/2ⁿ, i = 1, ..., 2ⁿ − 1, n ∈ ℕ}.  (2.5)

The dyadic rationals corresponding to a finite n define a covering of [0,1] by intervals of width 1/2ⁿ, which are bisected each time n is incremented. For any x ∈ [0,1], a point of the set {i/2ⁿ, i = 1, ..., 2ⁿ − 1} is contained in S(x,ε) whenever ε > 1/2ⁿ, so the dyadic rationals are dense in [0,1]. D is a convenient analytic tool when we need to define a sequence of partitions of an interval that is becoming dense in the limit, and will often appear in the sequel. Another set of useful applications concerns set limits in ℝ.
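The density claim for the dyadic rationals can be checked mechanically: at level n, some i/2ⁿ lies within 1/2ⁿ of any x ∈ [0,1]. The Python sketch below is an added illustration, not part of the text; the rounding-and-clamping rule is merely one way to pick a nearby admissible index.

```python
# Added illustration (not part of the text): density of the dyadic rationals
# (2.5) in [0,1]. Rounding, then clamping i to 1..2^n - 1, is one way to
# pick a nearby admissible point.

def nearest_dyadic(x, n):
    i = min(max(round(x * 2 ** n), 1), 2 ** n - 1)
    return i / 2 ** n

for x in (0.0, 0.123456, 0.5, 0.999, 1.0):
    for n in (3, 8, 16):
        assert abs(x - nearest_dyadic(x, n)) <= 1.0 / 2 ** n
```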
2.16 Theorem Every open interval is the limit of a sequence of closed subintervals with rational endpoints.

Proof If (a,b) is the interval, with a < b, choose Cauchy sequences of rationals a_n ↓ a and b_n ↑ b, with a_n < b_n (always possible by 1.10). By definition, for every x ∈ (a,b) there exists N ≥ 1 such that x ∈ [a_n, b_n] for all n ≥ N, and hence (a,b) ⊆ liminf_n [a_n, b_n]. On the other hand, since a_n > a and b > b_n, (a,b)ᶜ ⊆ [a_n, b_n]ᶜ for all n ≥ 1, so that (a,b)ᶜ ⊆ liminf_n [a_n, b_n]ᶜ. This is equivalent to limsup_n [a_n, b_n] ⊆ (a,b). Hence lim_n [a_n, b_n] exists and is equal to (a,b). ∎
This shows that the limits of sequences of open sets need not be open, nor the limits of sequences of closed sets closed (take complements above). The only hard and fast rules we may lay down are the following corollaries of 2.4(i): the limit of a non-decreasing sequence of open sets is open, and (by complements) the limit of a non-increasing sequence of closed sets is closed.
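The construction of 2.16 can be imitated with exact rational arithmetic. The Python sketch below is an added illustration, not part of the text, taking a_n = 1/(n+1) and b_n = 1 − 1/(n+1) as one convenient choice of rational Cauchy sequences for the interval (0, 1): points of (0,1) are eventually captured by [a_n, b_n], while the endpoint 0 never is.

```python
# Added illustration (not part of the text): the construction of 2.16 for
# (a, b) = (0, 1), with the arbitrary rational choices a_n = 1/(n+1),
# falling to 0, and b_n = 1 - 1/(n+1), rising to 1.
from fractions import Fraction

def a(n):
    return Fraction(1, n + 1)

def b(n):
    return 1 - Fraction(1, n + 1)

x = Fraction(1, 100)                       # a point of (0, 1)
N = next(n for n in range(1, 1000) if a(n) <= x <= b(n))
assert all(a(n) <= x <= b(n) for n in range(N, N + 50))    # eventually inside
assert all(not (a(n) <= 0 <= b(n)) for n in range(1, 50))  # 0 never captured
```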
2.3 Functions and Continuity

A function of a real variable is a mapping f: S ↦ 𝕋, where S ⊆ ℝ and 𝕋 ⊆ ℝ. By specifying a subset of ℝ as the codomain, we imply without loss of generality that f(S) = 𝕋, such that the mapping is onto 𝕋. Consider the image in 𝕋, under f, of a Cauchy sequence {x_n} in S converging to x. If the image of every such sequence converging to x ∈ S is a Cauchy sequence in 𝕋 converging to f(x), the function is said to be continuous at x. Continuity is formally defined, without invoking sequences explicitly, using the ε-δ approach. f is continuous at the point x ∈ S if for any ε > 0 ∃ δ > 0 such that |f(y) − f(x)| < ε whenever |y − x| < δ and y ∈ S. The choice of δ may depend on x. If f is continuous at every point of S, it is simply said to be continuous on S. Perhaps the chief reason why continuity matters is the following result.
2.17 Theorem If f: S ↦ 𝕋 is continuous at all points of S, then f⁻¹(A) is open in S whenever A is open in 𝕋, and f⁻¹(A) is closed in S whenever A is closed in 𝕋. □
This important result has several generalizations, of which one, the extension to vector functions, is given in the next section. A proof will be given in a still more general context below; see 5.19.

Continuity does not ensure that f(A) is open when A is open. A mapping with this property is called an open mapping, although, since f(Aᶜ) ≠ f(A)ᶜ in general, we cannot assume that an open mapping is also a closed mapping, taking closed sets to closed sets. However, a homeomorphism is a function which is 1-1 onto, continuous, and has a continuous inverse. If f is a homeomorphism, so is f⁻¹, and hence by 2.17 it is both an open mapping and a closed mapping. It therefore preserves the structure of neighbourhoods, so that, if two points are close in the domain, their images are always close in the range. Such a transformation amounts to a relabelling of axes.

If f(x+h) has a limit as h ↓ 0, this is denoted f(x+). Likewise, f(x−) denotes the limit of f(x−h). It is not necessary to have x ∈ S for these limits to exist, but if f(x) exists, there is a weaker notion of continuity at x. f is said to be right-continuous at the point x ∈ S if, for any ε > 0, ∃ δ > 0 such that, whenever 0 ≤ h < δ and x + h ∈ S,

|f(x+h) − f(x)| < ε.  (2.6)

It is said to be left-continuous at the point x ∈ S if, for any ε > 0, ∃ δ > 0 such that, whenever 0 ≤ h < δ and x − h ∈ S,

|f(x) − f(x−h)| < ε.  (2.7)
Right continuity at x implies f(x) = f(x+), and left continuity at x implies f(x) = f(x−). If f(x) = f(x+) = f(x−), the function is continuous at x. Continuity is the property of a point x, not of the function f as a whole. Despite continuity holding pointwise on S, the property may none the less break down as certain points are approached.
2.18 Example Consider f(x) = 1/x, with S = 𝕋 = (0, ∞). For h > 0,

|f(x+h) − f(x)| = h/(x(x+h)),

and hence the choice of δ depends on both ε and x. f is continuous at each x > 0, but not in the limit as x → 0. □
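The dependence of δ on x in Example 2.18 is easy to exhibit numerically. The following Python fragment is an added illustration, not part of the text; the particular ε, δ, and test points are arbitrary.

```python
# Added illustration (not part of the text): in Example 2.18 the delta that
# secures |f(y) - f(x)| < eps shrinks as x approaches 0, so no single delta
# serves the whole domain (0, inf).

def f(x):
    return 1.0 / x

eps, delta = 0.5, 0.01
# Near x = 1 this delta suffices:
assert abs(1.009 - 1.0) < delta and abs(f(1.009) - f(1.0)) < eps
# The same delta fails near the boundary point 0:
assert abs(0.019 - 0.01) < delta
assert abs(f(0.019) - f(0.01)) >= eps
```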
The function f: S ↦ 𝕋 is uniformly continuous if for every ε > 0 ∃ δ > 0 such that

|f(x) − f(y)| < ε whenever |x − y| < δ,  (2.8)
for every x, y ∈ S. In 2.18 the function is not uniformly continuous, for whichever δ is chosen, we can pick x small enough to invalidate the definition. The problem arises because the set on which the function is defined is open and the boundary point is a discontinuity. Another class of cases that gives difficulty is the one where the domain is unbounded, and continuity at x is breaking down as x → ∞.
However, we have the following result.
2.19 Theorem If a function is continuous everywhere on a compact set S, then it is bounded and uniformly continuous on S. □

(For proof, see 5.20 and 5.21.)

Continuity is the weakest concept of smoothness of a function. So-called Lipschitz conditions provide a whole class of smoothness properties. A function f is said to satisfy a Lipschitz condition at a point x if, for any y ∈ S(x,δ) for some δ > 0, ∃ M > 0 such that

|f(y) − f(x)| ≤ M h(|x − y|),  (2.9)

where h: ℝ⁺ ↦ ℝ⁺ satisfies h(d) ↓ 0 as d ↓ 0. f is said to satisfy a uniform Lipschitz condition if condition (2.9) holds, with fixed M, for all x, y ∈ S. The type of smoothness imposed depends on the function h. Continuity (resp. uniform continuity) follows from the Lipschitz (resp. uniform Lipschitz) property for any choice of h. Implicit in continuity is the idea that some function δ(.): ℝ⁺ ↦ ℝ⁺ exists satisfying δ(ε) ↓ 0 as ε ↓ 0; this is equivalent to the Lipschitz condition holding for some h(.), the case h = δ⁻¹. By imposing some degree of smoothness on h, making it a positive power of the argument for example, we impose a degree of smoothness on the function, forbidding 'sharp corners'.

The next smoothness concept is undoubtedly well known to the reader, although differential calculus will play a fairly minor role here. Let a function f: S ↦ 𝕋 be continuous at x ∈ S. If
f′₋(x) = lim_{h↓0} [f(x−h) − f(x)]/(−h)  (2.10)

exists, f′₋(x) is called the left-hand derivative of f at x. The right-hand derivative, f′₊(x), is defined correspondingly for the case h ↑ 0. If f′₊(x) = f′₋(x), the common value is called the derivative of f at x, denoted f′(x) or df/dx, and f is said to be differentiable at x. If f′: S ↦ ℝ is a continuous function, f is said to be continuously differentiable on S.

A function f is said to be non-decreasing (resp. increasing) if f(y) ≥ f(x) (resp. f(y) > f(x)) whenever y > x. It is non-increasing (resp. decreasing) if −f is non-decreasing (resp. increasing). A monotone function is either non-decreasing or non-increasing.

When the domain is an interval we have yet another smoothness condition. A function f: [a,b] ↦ ℝ is of bounded variation if ∃ M < ∞ such that, for every partition of [a,b] by finite collections of points a = x₀ < x₁ < ... < x_n = b,

Σ_{i=1}^n |f(x_i) − f(x_{i−1})| < M.  (2.11)
2.20 Theorem f is of bounded variation if and only if there exist non-decreasing functions f₁ and f₂ such that f = f₂ − f₁. □

(For proof see Apostol 1974: Ch. 6.) A function that satisfies the uniform Lipschitz condition on [a,b] with h(|x − y|) = |x − y| is of bounded variation on [a,b].
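The sum in (2.11) and the decomposition of 2.20 can be computed on a grid. The Python sketch below is an added illustration, not part of the text: the kinked function f(x) = |x − 1/2| on [0,1] is an arbitrary choice, and f₁, f₂ are built from the negative and positive increments, so that f − f(a) = f₂ − f₁ on the grid.

```python
# Added illustration (not part of the text): the sum (2.11) on a grid, and a
# grid version of the Jordan decomposition of 2.20, with f1 and f2
# accumulating the falls and rises of f respectively.

def total_variation(f, xs):
    return sum(abs(f(xs[i]) - f(xs[i - 1])) for i in range(1, len(xs)))

f = lambda x: abs(x - 0.5)
xs = [i / 1000 for i in range(1001)]              # partition of [0, 1]
assert abs(total_variation(f, xs) - 1.0) < 1e-9   # falls 1/2, then rises 1/2

f1, f2 = [0.0], [0.0]
for i in range(1, len(xs)):
    d = f(xs[i]) - f(xs[i - 1])
    f1.append(f1[-1] + max(-d, 0.0))              # the falls
    f2.append(f2[-1] + max(d, 0.0))               # the rises
assert all(u <= v + 1e-12 for u, v in zip(f1, f1[1:]))   # f1 non-decreasing
assert all(abs((f2[i] - f1[i]) - (f(xs[i]) - f(xs[0]))) < 1e-9
           for i in range(len(xs)))               # f - f(0) = f2 - f1 here
```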
2.4 Vector Sequences and Functions

A sequence {x_n} of real k-vectors is said to converge to a limit x if for every ε > 0 there is an integer N_ε for which

‖x_n − x‖ < ε for all n > N_ε.  (2.12)

The sequence is called a Cauchy sequence in ℝᵏ iff ‖x_n − x_m‖ < ε whenever n > N_ε and m > N_ε. A function

f: S ↦ 𝕋,

where S ⊆ ℝᵏ and 𝕋 ⊆ ℝ, associates each point of S with a unique point of 𝕋. Its graph is the subset of S × 𝕋 consisting of the (k+1)-vectors (x, f(x)) for each x ∈ S. f is continuous at x ∈ S if for any ε > 0 ∃ δ > 0 such that
,
#
IIII
0, 3 > 0 such that
IIII
2.5 Sequences of Functions

Let {fₙ, n ∈ ℕ} be a sequence of functions fₙ: Ω → T. If for each ω ∈ Ω and each ε > 0 there exists N_εω ≥ 1 such that |fₙ(ω) − f(ω)| < ε when n > N_εω, then fₙ is said to converge to f, pointwise on Ω. As for real sequences, we use the notations fₙ → f, fₙ ↑ f, or fₙ ↓ f, as appropriate, for general or monotone convergence, where in the latter case the monotonicity must apply for every ω ∈ Ω. This is a relatively weak notion of convergence, for it does not rule out the possibility that the convergence is breaking down at certain points of Ω. The following example is related to 2.18 above.
2.22 Example Let fₙ(x) = n/(nx + 1), x ∈ (0, ∞). The pointwise limit of fₙ(x) on (0, ∞) is 1/x. But

$|f_n(x) - 1/x| = \frac{1}{x(nx+1)},$

and 1/(x(N_εx x + 1)) < ε only for N_εx > (1/(εx) − 1)(1/x). Thus, for given ε, N_εx → ∞ as x → 0, and it is not possible to put an upper bound on N_εx such that |fₙ(x) − 1/x| < ε for n ≥ N_εx, for every x > 0. □
To rule out cases of this type, we define the stronger notion of uniform convergence. If there exists a function f such that, for each ε > 0, there exists N such that

$\sup_{\omega \in \Omega} |f_n(\omega) - f(\omega)| < \varepsilon$ when n > N,

fₙ is said to converge to f uniformly on Ω.
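The contrast between Example 2.22 and uniform convergence is easy to check numerically (a sketch; the particular grids are our own choice):

```python
def f_n(n, x):
    """The sequence of Example 2.22: f_n(x) = n/(nx + 1), with pointwise limit 1/x."""
    return n / (n * x + 1.0)

def sup_dev(n, grid):
    """Supremum of |f_n(x) - 1/x| over a finite grid of points x > 0."""
    return max(abs(f_n(n, x) - 1.0 / x) for x in grid)

# A grid reaching down to 1/n^2: the deviation near 0 grows with n,
# so convergence is not uniform on (0, infinity).
near_zero = [sup_dev(n, [k / n**2 for k in range(1, n**2 + 1)]) for n in (10, 100)]

# A grid confined to [0.5, 10]: here the sup deviation is 1/(0.5*(0.5n + 1)) -> 0,
# so convergence is uniform on sets bounded away from 0.
grid = [0.5 + 0.01 * k for k in range(951)]
away_from_zero = [sup_dev(n, grid) for n in (10, 100, 1000)]
```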
2.6 Summability and Order Relations

The sum of the terms of a real sequence {xₙ}₁^∞ is called a series, written Σ_{n=1}^∞ xₙ (or just Σxₙ). The terms of the real sequence {Σ_{k=1}^n x_k, n ∈ ℕ} are called the partial sums of the series. We say that the series converges if the partial sums converge to a finite limit. A series is said to converge absolutely if the monotone sequence {Σ_{k=1}^n |x_k|, n ∈ ℕ} converges.

2.23 Example Consider the geometric series, Σ_{j=0}^∞ x^j. This converges to 1/(1 − x) when |x| < 1, and also converges absolutely. It oscillates between cluster points 0 and 1 for x = −1, and for other values of x it diverges. □
2.24 Theorem If a series converges absolutely, then it converges.

Proof The sequence {Σ_{k=1}^n |x_k|, n ∈ ℕ} is monotone, and either diverges to +∞ or converges to a finite limit. In the latter case the Cauchy criterion implies that |xₙ| + |x_{n+1}| + ⋯ + |x_{n+m}| → 0 as m and n tend to infinity. Since |xₙ + x_{n+1} + ⋯ + x_{n+m}| ≤ |xₙ| + |x_{n+1}| + ⋯ + |x_{n+m}| by the triangle inequality,⁴ convergence of {Σ_{k=1}^n x_k, n ∈ ℕ} follows by the same criterion. ∎

An alternative terminology speaks of summability. A real sequence {xₙ}₁^∞ is said to be summable if the series Σxₙ converges, and absolutely summable if {|xₙ|}₁^∞ is summable. Any absolutely summable sequence is summable by 2.24, and any summable sequence must be converging to zero. Convergence to zero does not imply summability (see 2.27 below, for example), but convergence of the tail sums to zero is necessary and sufficient.
2.25 Theorem {xₙ}₁^∞ is summable iff Σ_{m=n}^∞ xₘ → 0 as n → ∞.

Proof For necessity, assume summability and let A = Σ_{m=1}^∞ xₘ. Then

Σ_{m=n}^∞ xₘ = A − Σ_{m=1}^{n−1} xₘ → A − A = 0.

Conversely, if the tail sums Σ_{m=n}^∞ xₘ are defined and converge to zero, then in particular Σ_{m=1}^∞ xₘ = Σ_{m=1}^{n−1} xₘ + Σ_{m=n}^∞ xₘ is defined and finite, so {xₙ} is summable. ∎
A sequence {xₙ}₁^∞ is Cesàro-summable if the sequence {n⁻¹ Σ_{k=1}^n x_k, n ∈ ℕ} converges. This is weaker than ordinary convergence.

2.26 Theorem If {xₙ}₁^∞ converges to x, its Cesàro sum also converges to x. □

But a sequence can be Cesàro-summable in spite of not converging. The sequence {(−1)ⁿ}₁^∞ converges in Cesàro sum to zero, whereas the partial sum sequence {Σ_{t=0}^n (−1)ᵗ}₁^∞ converges in Cesàro sum to ½ (compare 2.14).

Various notations are used to indicate the relationships between rates of divergence or convergence of different sequences. If {xₙ}₁^∞ is any real sequence, {aₙ}₁^∞ is a sequence of positive real numbers, and there exists a constant B < ∞ such that |xₙ|/aₙ ≤ B for all n, we say that xₙ is (at most) of the order of magnitude of aₙ, and write xₙ = O(aₙ). If {xₙ/aₙ} converges to zero, we write xₙ = o(aₙ), and say that xₙ is of smaller order of magnitude than aₙ. aₙ can be increasing or decreasing, so this notation can be used to express an upper bound either on the rate of growth of a divergent sequence, or on the rate of convergence of a sequence to zero.

Here are some rules for the manipulation of O(·), whose proofs follow from the definition. If xₙ = O(n^α) and yₙ = O(n^β), then

xₙ + yₙ = O(n^{max{α,β}}),   (2.15)

xₙyₙ = O(n^{α+β}),   (2.16)

xₙ^s = O(n^{αs}), whenever xₙ^s is defined.   (2.17)
An alternative notation for the case xₙ = O(aₙ) is xₙ ≪ aₙ, which means that there is a constant, 0 < B < ∞, such that xₙ ≤ Baₙ for all n. This may be more convenient in algebraic manipulations. The notation xₙ ~ aₙ will be used to indicate that there exist N ≥ 0, and finite constants A > 0 and B ≥ A, such that inf_{n≥N}(xₙ/aₙ) ≥ A and sup_{n≥N}(xₙ/aₙ) ≤ B. This says that {xₙ} and {aₙ} grow ultimately at the same rate, and is different from the relation xₙ = O(aₙ), since the latter does not exclude xₙ/aₙ → 0. Some authors use xₙ ~ aₙ in the stronger sense of xₙ/aₙ → 1.
2.27 Theorem If {xₙ} is a real positive sequence, and xₙ ~ n^α, then
(i) if α > −1, Σ_{m=1}^n xₘ ~ n^{1+α};
(ii) if α = −1, Σ_{m=1}^n xₘ ~ log n;
(iii) if α < −1, Σ_{m=1}^∞ xₘ < ∞.

Proof By assumption there exist N ≥ 1 and constants A > 0 and B ≥ A such that An^α ≤ xₙ ≤ Bn^α for n ≥ N, and hence A Σ_{m=N}^n m^α ≤ Σ_{m=N}^n xₘ ≤ B Σ_{m=N}^n m^α. The limit of Σ_{m=1}^n m^α as n → ∞ for α < −1 defines the Riemann zeta function of −α, and its rates of divergence for α ≥ −1 are standard results; see e.g. Apostol (1974: Sects. 8.12-8.13). Since the sum of the terms from 1 to N − 1 is finite, their omission cannot change the conclusions. ∎
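The three rates in 2.27 are easy to confirm numerically for the boundary case xₙ = n^α itself (a sketch; the cut-off n and tolerances are arbitrary):

```python
import math

def partial_sum(alpha, n):
    """Partial sum of m**alpha for m = 1..n."""
    return sum(m ** alpha for m in range(1, n + 1))

n = 100_000
# (i) alpha > -1: the sum grows like n**(1+alpha); the ratio tends to 1/(1+alpha) = 2/3
r_growth = partial_sum(0.5, n) / n ** 1.5
# (ii) alpha = -1: the sum grows like log n
r_log = partial_sum(-1.0, n) / math.log(n)
# (iii) alpha < -1: the sum converges (for alpha = -2, to pi**2/6, the zeta function at 2)
r_conv = partial_sum(-2.0, n)
```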
It is common practice to express the rate of convergence to zero of a positive real sequence in terms of the summability of the coordinates raised to a given power. The following device allows some further refinement of summability conditions. Let U(ν) be a positive function of ν. If U(νx)/U(ν) → x^ρ as ν → ∞ (0) for x > 0 and −∞ < ρ < +∞, U is said to be regularly varying at infinity (zero). If a positive function L(ν) has the property L(νx)/L(ν) → 1 for x > 0 as ν → ∞ (0), it is said to be slowly varying at infinity (zero). Evidently, any regularly varying function can be expressed in the form U(ν) = ν^ρ L(ν), where L(ν) is slowly varying. While the definition allows ν to be a real variable, in the cases of interest we will have ν = n for n ∈ ℕ, with U and L having the interpretation of positive sequences.

2.28 Example (log ν)^α is slowly varying at infinity, for any α. □

On the theory of regular variation see Feller (1971), or Loève (1977). The important property is the following.

2.29 Theorem If L is slowly varying at infinity, then for any δ > 0 there exists N ≥ 1 such that
ν^{−δ} ≤ L(ν) ≤ ν^{δ} for all ν ≥ N.   (2.18)  □

If U(ν) = ν^ρ L(ν) is differentiable, write

U′(ν) = ρν^{ρ−1}L(ν) + ν^ρ L′(ν) = ν^{ρ−1}(ρL(ν) + νL′(ν)).   (2.20)

If νL′(ν)/L(ν) → 0, then

U′(νx)/U′(ν) = x^{ρ−1} (ρL(νx) + νxL′(νx))/(ρL(ν) + νL′(ν)) → x^{ρ−1},   (2.21)

so that U′ is itself regularly varying at infinity, with exponent ρ − 1; moreover,

νU′(ν)/U(ν) = (ρL(ν) + νL′(ν))/L(ν) → ρ.   (2.22)
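The defining ratio limits, and the bound (2.18), can be checked for L(ν) = log ν (slowly varying, as in 2.28) and U(ν) = ν^ρ L(ν); the particular ρ, δ, and checkpoints below are arbitrary choices of ours:

```python
import math

rho = 0.5
L = math.log                        # slowly varying at infinity (2.28)
U = lambda v: v ** rho * L(v)       # regularly varying with exponent rho

x = 3.0
ratios_L = [L(v * x) / L(v) for v in (1e2, 1e6, 1e12)]   # tends to 1
ratios_U = [U(v * x) / U(v) for v in (1e2, 1e6, 1e12)]   # tends to x**rho
# (2.18): for fixed delta > 0, v**(-delta) <= L(v) <= v**delta once v exceeds some N
delta = 0.2
bounds_ok = all(v ** -delta <= L(v) <= v ** delta for v in (1e8, 1e10, 1e12))
```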
2.7 Arrays

Arguments concerning stochastic convergence often involve a double-indexing of elements. An array is a mapping whose domain is the Cartesian product of countable, linearly ordered sets, such as ℕ × ℕ or ℤ × ℕ, or a subset thereof. A real double array, in particular, is a double-indexed collection of numbers or, alternatively, a sequence whose members are real sequences. We will use notation such as {{xₙₜ, t ∈ ℤ}, n ∈ ℕ}, or just {xₙₜ} when the context is clear.

A collection of finite sequences {{xₙₜ, t = 1,…,kₙ}, n ∈ ℕ}, where kₙ ↑ ∞ as n → ∞, is called a triangular array. As an example, consider array elements of the form xₙₜ = yₜ/n, where {yₜ, t = 1,…,n} is a real sequence. The question of whether the series {Σ_{t=1}^n xₙₜ, n ∈ ℕ} converges is equivalent to that of the Cesàro convergence of the original sequence; however, the array formulation is frequently the more convenient.
2.34 Toeplitz's lemma Suppose {yₙ} is a real sequence and yₙ → y. If {{xₙₜ, t = 1,…,kₙ}, n ∈ ℕ} is a triangular array such that
(a) xₙₜ → 0 as n → ∞ for each fixed t,
(b) lim_{n→∞} Σ_{t=1}^{kₙ} |xₙₜ| ≤ C < ∞,
(c) lim_{n→∞} Σ_{t=1}^{kₙ} xₙₜ = 1,
then Σ_{t=1}^{kₙ} xₙₜyₜ → y.

Proof For y = 0, (c) can be omitted. By assumption on {yₙ}, for any ε > 0 there exists N_ε ≥ 1 such that |yₙ − y| < ε/C for n > N_ε. Hence, by (c), and then (b) and the triangle inequality,

$\lim_{n\to\infty}\Big|\sum_{t=1}^{k_n} x_{nt}y_t - y\Big| = \lim_{n\to\infty}\Big|\sum_{t=1}^{k_n} x_{nt}(y_t - y)\Big| \le \lim_{n\to\infty}\sum_{t=1}^{N_\varepsilon} |x_{nt}||y_t - y| + \varepsilon = \varepsilon,$   (2.23)

where the first term on the majorant side vanishes in view of (a). This completes the proof, since ε is arbitrary. ∎
A particular case of an array {xₙₜ} satisfying the conditions of the lemma is xₙₜ = (Σ_{s=1}^n yₛ)⁻¹yₜ, where {yₜ} is a positive sequence and Σ_{s=1}^n yₛ → ∞. A leading application of this result is to prove the following theorem, a fundamental tool of limit theory.
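The weighted averages just described can be simulated directly; with yₜ ≡ 1 as weights they reduce to the Cesàro means of 2.26 (a sketch, with an arbitrary test sequence of our own):

```python
def weighted_mean(y_seq, w_seq):
    """sum_t x_nt * y_t with Toeplitz weights x_nt = w_t / sum_{s<=n} w_s."""
    total_w = sum(w_seq)
    return sum(w * y for w, y in zip(w_seq, y_seq)) / total_w

n = 100_000
y = [1.0 + 1.0 / t for t in range(1, n + 1)]         # y_t -> 1
w_flat = [1.0] * n                                    # Cesaro weights (2.26)
w_grow = [float(t) for t in range(1, n + 1)]          # another positive choice

cesaro = weighted_mean(y, w_flat)     # both averages approach the limit 1
growing = weighted_mean(y, w_grow)
```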
2.35 Kronecker's lemma Consider sequences {aₙ}₁^∞ and {xₜ}₁^∞ of positive real numbers, with aₙ ↑ ∞. If Σ_{t=1}^n xₜ/aₜ → C < ∞ as n → ∞, then

$\frac{1}{a_n}\sum_{t=1}^{n} x_t \to 0.$   (2.24)

Proof Defining c₀ = 0 and cₙ = Σ_{t=1}^n xₜ/aₜ for n ∈ ℕ, note that xₜ = aₜ(cₜ − c_{t−1}), t = 1,…,n. Also define a₀ = 0 and bₜ = aₜ − a_{t−1} for t = 1,…,n, so that aₙ = Σ_{t=1}^n bₜ. Now apply the identity for arbitrary sequences a₀,…,aₙ and c₀,…,cₙ,

$\sum_{t=1}^{n} a_t(c_t - c_{t-1}) = a_n c_n - a_0 c_0 - \sum_{t=1}^{n} (a_t - a_{t-1})c_{t-1}.$   (2.25)

(This is known as Abel's partial summation formula.) We obtain

$\frac{1}{a_n}\sum_{t=1}^{n} x_t = \frac{1}{a_n}\sum_{t=1}^{n} a_t(c_t - c_{t-1}) = c_n - \frac{1}{a_n}\sum_{t=1}^{n} b_t c_{t-1} \to C - C = 0,$   (2.26)

where the convergence is by the Toeplitz lemma, setting xₙₜ = bₜ/aₙ. ∎
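Kronecker's lemma can be checked on a concrete pair of sequences satisfying its hypothesis; the choice aₜ = t and xₜ = log(t+1)/t is our own, picked so that Σ xₜ/aₜ converges:

```python
import math

def x(t):
    """A positive sequence with sum_t x_t / t < infinity (x_t / t ~ log t / t^2)."""
    return math.log(t + 1) / t

# Hypothesis: the partial sums of x_t / a_t converge (a_t = t here).
c = sum(x(t) / t for t in range(1, 100_001))
# Conclusion (2.24): (1/a_n) * sum_{t<=n} x_t -> 0.
k_vals = [sum(x(t) for t in range(1, m + 1)) / m for m in (100, 10_000, 100_000)]
```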
The notion of array convergence extends the familiar sequence concept. Consider, for full generality, an array of subsequences, {{x_{m,n_k}, k ∈ ℕ}, m ∈ ℕ}, where {n_k, k ∈ ℕ} is an increasing sequence of positive integers. If the limit xₘ = lim_{k→∞} x_{m,n_k} exists for each m ∈ ℕ, we would say that the array is convergent, and its limit is the infinite sequence {xₘ, m ∈ ℕ}. Whether this sequence converges is a separate question from whether it exists at all. Suppose the array is bounded, in the sense that sup_{m,k}|x_{m,n_k}| ≤ B < ∞. We know by 2.12 that for each m there exists at least one cluster point, say xₘ, of the inner sequence {x_{m,n_k}, k ∈ ℕ}. An important question in several contexts is this: is it valid to say that the array as a whole has a cluster point?

2.36 Theorem Corresponding to any bounded array {{x_{mk}, k ∈ ℕ}, m ∈ ℕ}, there exists a sequence {xₘ} which is the limit of the array {{x_{m,n_k}, k ∈ ℕ}, m ∈ ℕ} as k → ∞, where {n_k} is the same subsequence for each m.

Proof This is by construction of the required subsequence. Begin with a convergent subsequence for m = 1: let {n_k¹} be a subsequence such that x_{1,n_k¹} → x₁. Next, consider the sequence {x_{2,n_k¹}}. Like {x_{1,n_k¹}}, this lies on the bounded interval [−B, B], and so contains a convergent subsequence. Let the indices of this latter subsequence, drawn from the members of {n_k¹}, be denoted {n_k²}, and note that x_{1,n_k²} → x₁ as well as x_{2,n_k²} → x₂. Proceeding in the same way for each m generates an array {{n_kᵐ, k ∈ ℕ}, m ∈ ℕ}, having the property that {x_{i,n_kᵐ}, k ∈ ℕ} is a convergent sequence for 1 ≤ i ≤ m.

Now consider the sequence {n_kᵏ, k ∈ ℕ}: in other words, take the first member of {n_k¹}, the second member of {n_k²}, and so on. For each m, this sequence is a subsequence of {n_kᵐ} from the mth point onwards, and hence the sequence {x_{m,n_kᵏ}, k ≥ m} is convergent. This means that the sequence {x_{m,n_kᵏ}, k ∈ ℕ} is convergent, so setting n_k = n_kᵏ satisfies the requirement of the theorem. ∎
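The diagonal construction can be run mechanically once one has, for each row, a rule selecting a convergent subsequence. In the sketch below every row oscillates between cluster points near ±1, and 'keep every second index' plays the role of the Bolzano-Weierstrass selection step (the array and the selector are our own invention):

```python
def x(m, k):
    """A bounded array: row m oscillates, with cluster points +1 and -1."""
    return (-1) ** k * (1.0 + 1.0 / (m + k + 1))

def refine(indices):
    """Stand-in for the convergent-subsequence step: keep every second index,
    so surviving indices are all even and x(m, k) -> +1 along them."""
    return indices[::2]

K = 2 ** 12
idx = list(range(K))
diag = []
for m in range(1, 9):        # build n^1, n^2, ... and collect the diagonal n^m_m
    idx = refine(idx)
    diag.append(idx[m - 1])

# Along the single index sequence `diag`, every row converges (here to +1).
row3_tail = [x(3, k) for k in diag[2:]]
```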
The sequence {B_j}, where B₁ = A₁ and B_j = A_j − A_{j−1} for j > 1, is disjoint by construction, with B_j ∈ 𝔉 and Aₙ = ⋃_{j=1}^n B_j, so that by finite additivity

μ(Aₙ) = Σ_{j=1}^n μ(B_j).

The real sequence {μ(Aₙ)} is therefore monotone, and converges since it is bounded above by μ(Ω) < ∞. Countable additivity implies Σ_{j=1}^∞ μ(B_j) = μ(A), where A = ⋃_{n=1}^∞ Aₙ. Alternatively, let {Aₙ} be decreasing, with A_{n+1} ⊆ Aₙ and A = ⋂_{n=1}^∞ Aₙ. Consider the increasing sequence {Aₙᶜ}, determine μ(Aᶜ) by the same argument, and use finite additivity to conclude that μ(A) = μ(Ω) − μ(Aᶜ) is the limit of

μ(Aₙ) = μ(Ω) − μ(Aₙᶜ).  ∎

The finiteness of the measure is needed for the second part of the argument, but the result that μ(Aₙ) → μ(A) when Aₙ ↑ A actually holds generally, not excluding the case μ(A) = ∞. This theorem has a partial converse:
3.5 Theorem A non-negative set function μ which is finitely additive and continuous is countably additive.

Proof Let {Bₙ} be a countable, disjoint sequence. If Aₙ = ⋃_{j=1}^n B_j, the sequence {Aₙ} is increasing, Bₙ ∩ A_{n−1} = ∅, and so μ(Aₙ) = μ(Bₙ) + μ(A_{n−1}) for every n, by finite additivity. Given non-negativity, it follows by induction that {μ(Aₙ)} is monotone. If A = ⋃_{j=1}^∞ B_j, then μ(Aₙ) = Σ_{j=1}^n μ(B_j), whereas continuity implies that μ(Aₙ) → μ(A); hence μ(A) = Σ_{j=1}^∞ μ(B_j). ∎
Arguments in the theory of integration often turn on the notion of a 'negligible' set. In a measure space (Ω, 𝔉, μ), a set of measure zero is (simply enough) a set M ∈ 𝔉 with μ(M) = 0. A condition or restriction on the elements of Ω is said to occur almost everywhere (a.e.) if it holds on a set E and Ω − E has measure zero. If more than one measure is assigned to the same space, it may be necessary to indicate which measure the statement applies to, by writing a.e.[μ] or a.e.[ν] as the case may be.

3.6 Theorem (i) If M and N are 𝔉-sets, M has measure 0 and N ⊆ M, then N has measure 0.
(ii) If {M_j} is a countable sequence with μ(M_j) = 0, ∀j, then μ(⋃_j M_j) = 0.
(iii) If {E_j} is a countable sequence with μ(E_jᶜ) = 0, ∀j, then μ((⋂_j E_j)ᶜ) = 0.
Proof (i) is an application of monotonicity; (ii) is a consequence of countable additivity; and (iii) follows likewise, using the second de Morgan law. ∎
In §3.2 and §3.3 we will be concerned with the measurability of the sets in a given space. We show that, if the sets of a given collection are measurable, the sets of the σ-field generated by that collection are also measurable (the extension theorem). For many purposes this fact is sufficient, but there may be sets outside the σ-field which can be shown in other ways to be measurable, and it might be desirable to include these in the measure space. In particular, if μ(A) = μ(B) it would seem reasonable to assign μ(E) = μ(A) whenever A ⊆ E ⊆ B. This is equivalent to assigning measure 0 to any subset of a set of measure 0. The measure space (Ω, 𝔉, μ) is said to be complete if, for any set E ∈ 𝔉 with μ(E) = 0, all subsets of E are also in 𝔉. According to the following result, every measure space can be completed without changing any of our conclusions except in respect of these negligible sets.

3.7 Theorem Given any measure space (Ω, 𝔉, μ), there exists a complete measure space (Ω, 𝔉^μ, μ̄), called the completion of (Ω, 𝔉, μ), such that 𝔉 ⊆ 𝔉^μ and μ̄(E) = μ(E) for all E ∈ 𝔉. □
Notice that the completion of a space is defined with respect to a particular measure. The measurable space (Ω, 𝔉) has a different completion for each measure that can be defined on it.

Proof Let 𝒵^μ denote the collection of all subsets of 𝔉-sets of μ-measure 0, and let

𝔉^μ = {F ⊆ Ω: F Δ E ∈ 𝒵^μ for some E ∈ 𝔉}.   (3.7)

If μ(E) = 0, any set F ⊆ E satisfies the criterion of (3.7) and so is in 𝔉^μ, as the definition requires. For F ∈ 𝔉^μ, let μ̄(F) = μ(E), where E is any 𝔉-set satisfying E Δ F ∈ 𝒵^μ. To show that the choice of E is immaterial, let E₁ and E₂ be two such sets, and note that

μ(E₁ Δ E₂) = μ((F Δ E₁) Δ (F Δ E₂)) = 0.   (3.8)

Since μ(Eᵢ) ≥ μ(E₁ ∩ E₂) and μ(Eᵢ) ≤ μ(E₁ ∩ E₂) + μ(E₁ Δ E₂), we must conclude that μ(Eᵢ) = μ(E₁ ∩ E₂) for i = 1 and 2, or μ(E₁) = μ(E₂). Hence, the measure is unique. When F ∈ 𝔉, we can choose E = F, since F Δ F = ∅ ∈ 𝒵^μ, confirming that the measures agree on 𝔉.

It remains to show that 𝔉^μ is a σ-field containing 𝔉. Choosing E = F in (3.7) shows that F ∈ 𝔉^μ for F ∈ 𝔉, and hence 𝔉 ⊆ 𝔉^μ. If F ∈ 𝔉^μ, then F Δ E ∈ 𝒵^μ for some E ∈ 𝔉, and since Fᶜ Δ Eᶜ = F Δ E, where Eᶜ ∈ 𝔉, Fᶜ ∈ 𝔉^μ too. And finally, if F_j ∈ 𝔉^μ for j ∈ ℕ, there exist E_j ∈ 𝔉 for j ∈ ℕ such that E_j Δ F_j ∈ 𝒵^μ. Hence

(⋃_j E_j) Δ (⋃_j F_j) ⊆ ⋃_j (E_j Δ F_j) ∈ 𝒵^μ   (3.10)

by 3.6(ii). This means that ⋃_j F_j ∈ 𝔉^μ, and completes the proof. ∎
3.2 The Extension Theorem
You may wonder why, in the definition of a measurable space, 𝔉 could not simply be the set of all subsets, the power set of Ω. The problem is to find a consistent method of assigning a measure to every set. This is straightforward when the space has a finite number of elements, but not in an infinite space where there is no way, even conceptually, to assign a specific measure to each set. It is necessary to specify a rule which generates a measure for any designated set. The problem of measurability is basically the problem of going beyond constructive methods without running into inconsistencies. We now show how this problem can be solved for σ-fields. These are a sufficiently general class of sets to cope with most situations arising in probability.

One must begin by assigning a measure, to be denoted μ₀, to the members of some basic collection 𝒞 for which this can feasibly be done. For example, to construct Lebesgue measure we started by assigning to each interval (a, b] the measure b − a. We then reason from the properties of μ₀ to extend it from this basic collection to all the sets of interest. 𝒞 must be rich enough to allow μ₀ to be uniquely defined by it. A collection 𝒞 ⊆ 𝔉 is called a determining class for (Ω, 𝔉) if, whenever μ and ν are measures on 𝔉, μ(A) = ν(A) for all A ∈ 𝒞 implies that μ = ν. Given 𝒞, we must also know how to assign μ₀-values to any sets derived from 𝒞 by operations such as union, intersection, complementation, and difference. For disjoint sets A and B we have μ₀(A ∪ B) = μ₀(A) + μ₀(B) by finite additivity, and when B ⊆ A, μ₀(A − B) = μ₀(A) − μ₀(B). We also need to be able to determine μ₀(A ∩ B), which will require specific knowledge of the relationship between the sets.

When such assignments are possible for any pair of sets whose measures are themselves known, the measure is thereby extended to a wider class of sets, to be denoted 𝒮. Often 𝒮 and 𝒞 are the same collection, but in any event 𝒮 is closed under various finite set operations, and must at least be a semi-ring. In the applications 𝒮 is typically either a field (algebra) or a semi-algebra. Example 1.18 is a good case to keep in mind. However, 𝒮 cannot be a σ-field since at most a finite number of operations are permitted to determine μ₀(A) for any A ∈ 𝒮. At this point we might pose the opposite question to the one we started with, and ask why 𝒮 might not be a rich enough collection for our needs. In fact, events of interest frequently arise which 𝒮 cannot contain. 3.15 below illustrates the necessity of being able to go to the limit, and consider events that are expressible only as countably infinite unions or intersections of 𝒞-sets. Extending 𝒮 to the events 𝔉 = σ(𝒮) proves indispensable. We have two results, establishing existence and uniqueness respectively.

3.8 Extension theorem (existence) Let 𝒮 be a semi-ring, and let μ₀: 𝒮 → ℝ̄⁺ be a measure on 𝒮. If 𝔉 = σ(𝒮), there exists a measure μ on (Ω, 𝔉) such that μ(E) = μ₀(E) for each E ∈ 𝒮. □
Although the proof of the theorem is rather lengthy and some of the details are fiddly, the basic idea is simple. Take an event A ⊆ Ω to which we wish to assign a measure μ(A). If A ∈ 𝒮, we have μ(A) = μ₀(A). If A ∉ 𝒮, consider choosing a finite or countable covering for A from members of 𝒮; that is, a selection of sets E_j ∈ 𝒮, j = 1, 2, 3, …, such that A ⊆ ⋃_j E_j. The object is to find as 'economical' a covering as possible, in the sense that Σ_j μ₀(E_j) is as small as possible. The outer measure of A is

μ*(A) = inf Σ_j μ₀(E_j),   (3.11)

where the infimum is taken over all finite and countable coverings of A by 𝒮-sets. If no such covering exists, set μ*(A) = ∞. Clearly, μ*(A) = μ₀(A) for each A ∈ 𝒮. μ* is called the outer measure because, for any eligible definition of μ(A),

μ*(A) ≥ Σ_j μ(E_j) ≥ μ(⋃_j E_j) ≥ μ(A), for E_j ∈ 𝒮.   (3.12)

The first inequality here is by the stipulation that μ(E_j) = μ₀(E_j) for E_j ∈ 𝒮, in the case where a covering exists, or else the majorant side is infinite. The second and third follow by countable subadditivity and monotonicity respectively, because μ is a measure. We could also construct a minimal covering for Aᶜ and, at least if the relevant outer measures are finite, define the inner measure of A as μ_*(A) = μ(Ω) − μ*(Aᶜ). Note that since μ(A) = μ(Ω) − μ(Aᶜ) and μ*(Aᶜ) ≥ μ(Aᶜ) by (3.12),

μ*(A) ≥ μ(A) ≥ μ_*(A).   (3.13)
If μ*(A) = μ_*(A), it would make sense to call this common value the measure of A, and say that A is measurable. In fact, we employ a more stringent criterion. A set A ⊆ Ω is said to be measurable if, for any B ⊆ Ω,

μ*(A ∩ B) + μ*(Aᶜ ∩ B) = μ*(B).   (3.14)

This yields μ*(A) = μ_*(A) as a special case on putting B = Ω, but remains valid even if μ(Ω) = ∞. Let ℳ denote the collection of all measurable sets, those subsets of Ω satisfying (3.14). Since μ*(A) = μ₀(A) for A ∈ 𝒮 and μ*(∅) = 0, putting A = ∅ in (3.14) gives the trivial equality μ*(B) = μ*(B). Hence ∅ ∈ ℳ, and since the definition implies that Aᶜ ∈ ℳ if A ∈ ℳ, Ω ∈ ℳ too. The next steps are to determine what properties the set function μ*: ℳ → ℝ̄⁺ shares with a measure. Clearly,

μ*(A) ≥ 0 for all A ⊆ Ω.   (3.15)

Another property which follows directly from the definition of μ* is monotonicity:

A₁ ⊆ A₂ ⟹ μ*(A₁) ≤ μ*(A₂), for A₁, A₂ ⊆ Ω.   (3.16)
Our goal is to show that countable additivity also holds for μ* in respect of ℳ-sets, but it proves convenient to begin by establishing countable subadditivity.

3.9 Lemma If {A_j, j ∈ ℕ} is any sequence of subsets of Ω, then

μ*(⋃_j A_j) ≤ Σ_j μ*(A_j).   (3.17)

Proof Assume μ*(A_j) < ∞ for each j. (If not, the result is trivial.) For each j, let {E_jk} denote a countable covering of A_j by 𝒮-sets which satisfies

Σ_k μ₀(E_jk) ≤ μ*(A_j) + ε2^{−j}

for any ε > 0. Such a collection always exists, by the definition of μ*. Since ⋃_j A_j ⊆ ⋃_j ⋃_k E_jk, it follows by definition that

μ*(⋃_j A_j) ≤ Σ_j Σ_k μ₀(E_jk) ≤ Σ_j μ*(A_j) + ε,

noting Σ_{j=1}^∞ 2^{−j} = 1. Since ε is arbitrary, the conclusion follows. ∎
3.10 Lemma A ∈ ℳ if, for any B ⊆ Ω,

μ*(A ∩ B) + μ*(Aᶜ ∩ B) ≤ μ*(B),

the reverse inequality holding in any case by 3.9. □

3.11 Lemma If {Aₙ ∈ ℳ} is a monotone sequence with limit A, then A ∈ ℳ.

Proof Let {Aₙ} be increasing, with Aₙ ↑ A. For n > 1 and E ⊆ Ω, the definition of an ℳ-set gives

μ*(Aₙ ∩ E) = μ*(A_{n−1} ∩ (Aₙ ∩ E)) + μ*(A_{n−1}ᶜ ∩ (Aₙ ∩ E)) = μ*(A_{n−1} ∩ E) + μ*(Bₙ ∩ E),   (3.20)

where Bₙ = Aₙ − A_{n−1}, and the sequence {B_j} is disjoint. Put A₀ = ∅, so that μ*(A₀ ∩ E) = 0; then by induction,

μ*(Aₙ ∩ E) = Σ_{j=1}^n μ*(B_j ∩ E)   (3.21)

holds for every n. The right-hand side of (3.21) for n ∈ ℕ is a monotone real sequence, and by 3.9, μ*(Aₙ ∩ E) → Σ_{j=1}^∞ μ*(B_j ∩ E) ≥ μ*(A ∩ E) as n → ∞. Now, since Aₙ ∈ ℳ,

μ*(E) = μ*(Aₙ ∩ E) + μ*(Aₙᶜ ∩ E) ≥ μ*(Aₙ ∩ E) + μ*(Aᶜ ∩ E),   (3.22)

using the monotonicity of μ* and the fact that Aᶜ ⊆ Aₙᶜ. Taking the limit, we have from the foregoing argument that

μ*(E) ≥ μ*(A ∩ E) + μ*(Aᶜ ∩ E),   (3.23)

so that A ∈ ℳ by 3.10. For the case of a decreasing sequence, simply move to the complements and argue as above. ∎

Since {B_j} is a disjoint sequence, countable additivity emerges as a by-product of the lemma, as the following corollary shows.
3.12 Corollary If {B_j} is a disjoint sequence of ℳ-sets,

μ*(⋃_j B_j) = Σ_j μ*(B_j).   (3.24)

Proof Immediate on putting E = ⋃_j B_j in (3.21) and letting n → ∞, noting ⋃_j B_j ∈ ℳ. ∎
Notice how we needed 3.10 in the proof of 3.11, which is why additivity has been derived from subadditivity rather than the other way about.

Proof of 3.8 We have established in (3.15) and (3.24) that μ* is a measure for the elements of ℳ. If it can be shown that 𝔉 ⊆ ℳ, setting μ(A) = μ*(A) for all A ∈ 𝔉 will satisfy the existence criteria of the theorem.

The first step is to show that 𝒮 ⊆ ℳ or, by 3.10, that A ∈ 𝒮 implies

μ*(E ∩ A) + μ*(E ∩ Aᶜ) ≤ μ*(E)   (3.25)

for any E ⊆ Ω. Let {A_j ∈ 𝒮} denote a finite or countable covering of E such that Σ_j μ₀(A_j) ≤ μ*(E) + ε, for ε > 0. If no such covering exists, μ*(E) = ∞ by definition and (3.25) holds trivially. Note that E ∩ A ⊆ ⋃_j (A_j ∩ A), and the sets A_j ∩ A are in 𝒮 since 𝒮 is a semi-ring. Similarly, E ∩ Aᶜ ⊆ ⋃_j (A_j ∩ Aᶜ), and by the definition of a semi-ring and simple set algebra,

A_j ∩ Aᶜ = A_j − (A_j ∩ A) = ⋃_k C_jk,   (3.26)

where the C_jk are a finite collection of 𝒮-sets, disjoint with each other and also with A_j ∩ A. Now, applying 3.9 and the fact that μ*(B) = μ₀(B) for B ∈ 𝒮, we find

μ*(E ∩ A) + μ*(E ∩ Aᶜ) ≤ Σ_j (μ₀(A_j ∩ A) + Σ_k μ₀(C_jk)) = Σ_j μ₀(A_j) ≤ μ*(E) + ε.   (3.27)

Since ε is arbitrary, (3.25) follows, and hence 𝒮 ⊆ ℳ. Next, if A₁ and A₂ are ℳ-sets, then for any E ⊆ Ω,

μ*(E) = μ*(A₁ ∩ E) + μ*(A₁ᶜ ∩ E)
= μ*(A₂ ∩ A₁ ∩ E) + μ*(A₂ᶜ ∩ A₁ ∩ E) + μ*(A₂ ∩ A₁ᶜ ∩ E) + μ*(A₂ᶜ ∩ A₁ᶜ ∩ E)
≥ μ*((A₂ ∩ A₁) ∩ E) + μ*((A₂ ∩ A₁)ᶜ ∩ E),   (3.28)

where the inequality is by subadditivity, and the rest is set algebra. By 3.10 this is sufficient for A₁ ∩ A₂ ∈ ℳ, and hence also for A₁ ∪ A₂ ∈ ℳ, using closure under complementation. It follows that ℳ is a σ-field containing 𝒮, and since 𝔉 is the smallest such σ-field, we have that 𝔉 ⊆ ℳ, as required. ∎

Notice that (3.28) was got by using (3.14) as the relation defining measurability. The proof does not go through using μ*(A) = μ_*(A) as the definition. The style of this argument tells us some important things about the role of 𝒮. Any set that has no covering by 𝒮-sets is assigned the measure ∞, so for finite measures it is a requisite that Ω ⊆ ⋃_j E_j for a finite or countable collection {E_j ∈ 𝒮}. The measure of a union of 𝒮-sets must be able to approximate the measure of any 𝔉-set arbitrarily well, and the basic content of the theorem is to establish that a semi-ring has this property. To complete the demonstration of the extension, there remains the question of uniqueness. To get this result we need to impose σ-finiteness, which was not needed for existence.
3.13 Extension theorem (uniqueness) Let μ and μ′ denote measures on a space (Ω, 𝔉), where 𝔉 = σ(𝒮) and 𝒮 is a semi-ring. If the measures are σ-finite on 𝒮 and μ(E) = μ′(E) for all E ∈ 𝒮, then μ(E) = μ′(E) for all E ∈ 𝔉. □
Proof We first prove the theorem for the case of finite measures, by an application of the π-λ theorem. Define 𝒟 = {E ∈ 𝔉: μ(E) = μ′(E)}. Then 𝒮 ⊆ 𝒟 by hypothesis. If 𝒮 is a semi-ring, it is also a π-system. By 1.27, the proof is completed if we can show that 𝒟 is a λ-system, for then 𝒟 contains σ(𝒮).

When the measure is finite, Ω ∈ 𝒟 and condition 1.26(a) holds. Additivity implies that, for A ∈ 𝒟,

μ(Aᶜ) = μ(Ω) − μ(A) = μ′(Ω) − μ′(A) = μ′(Aᶜ),   (3.29)

so that 1.26(b) holds. Lastly, let {A_j} be a disjoint sequence in 𝒟. By countable additivity,

μ(⋃_{j=1}^∞ A_j) = Σ_{j=1}^∞ μ(A_j) = Σ_{j=1}^∞ μ′(A_j) = μ′(⋃_{j=1}^∞ A_j),   (3.30)

and 1.26(c) holds. It follows by 1.26 and the π-λ theorem that 𝔉 = σ(𝒮) ⊆ 𝒟.

Now consider the σ-finite case. Let Ω = ⋃_j B_j, where B_j ∈ 𝒮 and μ(B_j) = μ′(B_j) < ∞. 𝔉_j = {B_j ∩ A: A ∈ 𝔉} is a σ-field, so that the (B_j, 𝔉_j) are measurable spaces,
on which μ and μ′ are finite measures agreeing on the 𝒮-sets contained in B_j. The preceding argument showed that, for A ∈ 𝔉, μ(B_j ∩ A) = μ′(B_j ∩ A). It remains to show that μ and μ′ are the same measure.

Consider the following recursion. By 3.3(ii) we have

μ(A ∩ (B₁ ∪ B₂)) = μ(A ∩ B₁) + μ(A ∩ B₂) − μ(A ∩ B₁ ∩ B₂).   (3.31)

Letting Cₙ = ⋃_{j=1}^n B_j, the same relation yields

μ(A ∩ Cₙ) = μ(A ∩ Bₙ) + μ(A ∩ C_{n−1}) − μ(A ∩ Bₙ ∩ C_{n−1}).   (3.32)

The terms involving C_{n−1} on the right-hand side can be solved backwards to yield an expression for μ(A ∩ Cₙ) as a sum of terms having the general form

μ(A ∩ B_{j₁} ∩ B_{j₂} ∩ ⋯ ∩ B_{jᵣ}).   (3.33)

Each such term can be written as μ(D ∩ B_j) for some j, say j = j₁, in which case D = A ∩ B_{j₂} ∩ ⋯ ∩ B_{jᵣ} ∈ 𝔉. Since we know by the preceding argument that μ(D ∩ B_j) = μ′(D ∩ B_j) < ∞ for all D ∈ 𝔉, it follows that

μ(A ∩ Cₙ) = μ′(A ∩ Cₙ)   (3.34)

holds for any n. Since Cₙ ↑ Ω as n → ∞, we obtain in the limit

μ(A) = μ′(A),   (3.35)

the two sides of the equality being either finite and equal, or both equal to +∞. This completes the proof, since A is arbitrary. ∎
3.14 Example Let ℳ denote the subsets of ℝ which are measurable according to (3.14) when μ* is the outer measure defined on the half-open intervals, whose measures μ₀ are taken equal to their lengths. This defines Lebesgue measure m. These sets form a semi-ring by 1.18, a countable collection of them covers ℝ, and the extension theorem shows that, m being a σ-finite measure, ℳ contains the Borel field ℬ on ℝ (see 1.21), so (ℝ, ℬ, m) is a measure space. It can be shown (we won't) that all the Lebesgue-measurable sets not in ℬ are subsets of ℬ-sets of measure 0. For any measure μ on (ℝ, ℬ), the complete space (ℝ, ℬ^μ, μ̄) includes all of the Lebesgue-measurable sets. □

The following is a basic property of Lebesgue measure. Notice the need to deal with a countable intersection of intervals to determine so simple a thing as the measure of a point.

3.15 Theorem Any countable set from ℝ has Lebesgue measure 0.

Proof The measure of a point {x} is zero, since for x ∈ ℝ,

{x} = ⋂_{n=1}^∞ (x − 1/n, x],

and hence m({x}) = limₙ m((x − 1/n, x]) = limₙ 1/n = 0. The result for a countable set follows by 3.6(ii). ∎

…the smallest σ-field containing this collection, and since the collection is a π-system, it follows by the π-λ theorem that the σ-field it generates is contained in the product σ-field. Exactly the same conclusion holds for the corresponding collection for the second coordinate. Every element of the joint collection is the intersection of an element from the one and an element from the other, so the joint collection is contained in the product σ-field; but it is a π-system by 3.19, and hence a further application of the π-λ theorem completes the argument. ∎

The notion of a product extends beyond pairs to triples and general n-tuples, and we shall be interested in the properties of Euclidean n-space, ℝⁿ.
For finite n at least, a separate theory is not needed, because results can be obtained by recursion. If (Υ, 𝒳) is a third measurable space, then, trivially,

E × Υ = {(ω, ξ, υ): (ω, ξ) ∈ E, υ ∈ Υ} for E ∈ 𝔉 ⊗ 𝔊,   (3.49)

so that the triple product can be constructed as (Ω × Ξ) × Υ. Either or both of (Ω, 𝔉) and (Ξ, 𝔊) can be product spaces, and the last two theorems extend to product spaces of any finite dimension.
3.5 Measurable Transformations

Consider measurable spaces (Ω, 𝔉) and (Ξ, 𝔊) in a different context, as domain and codomain of a mapping T: Ω → Ξ. T is said to be 𝔉/𝔊-measurable if T⁻¹(B) ∈ 𝔉 for all B ∈ 𝔊. The idea is that a measure μ defined on (Ω, 𝔉) can be mapped into (Ξ, 𝔊), every event B ∈ 𝔊 being assigned a measure ν(B) = μ(T⁻¹(B)). We have just encountered one example, the projection mapping, whose inverse defined in (3.44) takes each set into a measurable rectangle. Corresponding to a measurable transformation there is always a transformed measure, in the following sense.

3.21 Theorem Let μ be a measure on (Ω, 𝔉) and T: Ω → Ξ a measurable transformation. Then μT⁻¹ is a measure on (Ξ, 𝔊), where

μT⁻¹(B) = μ(T⁻¹(B)), each B ∈ 𝔊.   (3.50)  □
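The content of 3.21 is what makes the 'distribution' of a transformed variable meaningful. For a discrete μ the transformed measure can be tabulated directly by summing weights over preimages (a sketch; the space, map, and weights here are invented for illustration):

```python
from collections import defaultdict
from fractions import Fraction

# A finite measure space: Omega = {0,...,5} with uniform weights (a "die").
mu = {w: Fraction(1, 6) for w in range(6)}
T = lambda w: w % 3          # a map into Xi = {0, 1, 2}

# nu(B) = mu(T^{-1}(B)); for singletons, sum mu over the preimage of each point.
nu = defaultdict(Fraction)
for w, p in mu.items():
    nu[T(w)] += p
```

nu inherits non-negativity, nu(∅) = 0, and additivity from μ, exactly the three conditions the proof verifies in general.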
Proof We check conditions 3.1(a)-(c). Clearly μT⁻¹(B) ≥ 0 for all B ∈ 𝔊. Since T⁻¹(∅) = ∅ by 1.2(iii), μT⁻¹(∅) = μ(T⁻¹(∅)) = μ(∅) = 0. For countable additivity we must show that

μT⁻¹(⋃_j B_j) = Σ_j μT⁻¹(B_j)   (3.51)

for a disjoint collection B₁, B₂, … ∈ 𝔊. Letting E_j = T⁻¹(B_j), 1.2 shows both that the E_j are disjoint and that T⁻¹(⋃_j B_j) = ⋃_j E_j. Equation (3.51) therefore becomes

μ(⋃_j E_j) = Σ_j μ(E_j)   (3.52)

for disjoint sets {E_j}, which holds because μ is a measure. ∎

The main result on general transformations is the following.
3.22 Theorem Suppose T⁻¹(B) ∈ 𝔉 for each B ∈ 𝒟, where 𝒟 is an arbitrary class of sets, and 𝔊 = σ(𝒟). Then the transformation T is 𝔉/𝔊-measurable.

Proof By 1.2(ii) and (iii), if T⁻¹(B) ∈ 𝔉 then T⁻¹(Bᶜ) = T⁻¹(B)ᶜ ∈ 𝔉; and if T⁻¹(B_j) ∈ 𝔉 for j ∈ ℕ, then T⁻¹(⋃_j B_j) = ⋃_j T⁻¹(B_j) ∈ 𝔉. It follows that the class of sets

𝒜 = {B: T⁻¹(B) ∈ 𝔉}

is a σ-field. Since 𝒟 ⊆ 𝒜, 𝔊 = σ(𝒟) ⊆ 𝒜 by definition. ∎
This result is easily iterated. If (Ψ, ℋ) is another measurable space and U: Ξ → Ψ is a 𝔊/ℋ-measurable transformation, then U∘T: Ω → Ψ is 𝔉/ℋ-measurable, since for C ∈ ℋ,

(U∘T)⁻¹(C) = T⁻¹(U⁻¹(C)) ∈ 𝔉.   (3.53)

An important special case: T: Ω → Ξ is called a measurable isomorphism if it is 1-1 onto, and both T and T⁻¹ are measurable. The measurable spaces (Ω, 𝔉) and (Ξ, 𝔊) are said to be isomorphic if such a mapping between them exists. The implication is that measure-theoretic discussions can be conducted equivalently in either (Ω, 𝔉) or (Ξ, 𝔊). This might appear related to the homeomorphic property of real functions, and a homeomorphism is indeed measurably isomorphic. But there is no implication the other way.
r0,11 F--> (0,11,defined
3.23 Example Consider g: #(A)
x+
,1
0 f x f 1 2
x
1
l
0
< xlclc,
c
Combining parts (i) and (ii) shows that if f_1,...,f_n are measurable functions so is Σ_j c_j f_j, where the c_j are constant coefficients. The measurability of suprema, infima, and limits of sequences of measurable functions is important in many applications, especially the derivation of integrals in Chapter 4. These are the main cases involving the extended line, because of the possibility that sequences in ℝ are diverging. Such limits lying in ℝ̄ are called extended functions.

3.26 Theorem Let {f_n} be a sequence of ℱ/ℬ-measurable functions. Then inf_n f_n, sup_n f_n, liminf_n f_n, and limsup_n f_n are ℱ/ℬ-measurable.

Proof For any x ∈ ℝ, {ω: f_n(ω) ≤ x} ∈ ℱ for each n by assumption. Hence

{ω: sup_n f_n(ω) ≤ x} = ⋂_n {ω: f_n(ω) ≤ x} ∈ ℱ.

In this case,

μg⁻¹(B) = μ(B × ℝ).  (3.68)
4 Integration

4.1 Construction of the Integral

The reader may be familiar with the Riemann integral of a bounded non-negative function f on a bounded interval of the line [a,b], usually written ∫_a^b f dx. The objects to be studied in this chapter represent a heroic generalization of the same idea. Instead of intervals of the line, the integral is defined on an arbitrary measure space. Suppose (Ω,ℱ,μ) is a measure space and f: Ω ↦ ℝ̄⁺ is an ℱ/ℬ-measurable function into the non-negative, extended real line. The integral of f is defined to be the real-valued functional

∫ f dμ = sup Σ_i (inf_{ω∈E_i} f(ω)) μ(E_i),  (4.1)

where the supremum is taken over all finite partitions of Ω into sets E_i ∈ ℱ, and the supremum exists. If no supremum exists, the integral is assigned the value +∞. The integral of the function f·1_A, where 1_A(ω) is the indicator of the set A ∈ ℱ, is called the integral of f over A, and written ∫_A f dμ.

The expression in (4.1) is sometimes called the lower integral. Likewise defining the upper integral of f,

∫* f dμ = inf Σ_i (sup_{ω∈E_i} f(ω)) μ(E_i),  (4.2)

we should like these two constructions, approximating f from below and from above, to agree. And indeed, it is possible to show that ∫ f dμ = ∫* f dμ whenever f is bounded and μ(Ω) < ∞. However, ∫* f dμ = ∞ if either the set {ω: f(ω) > 0} has infinite measure, or f is unbounded on sets of positive measure. Definition (4.1) is preferred because it can yield a finite value in these cases.
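The lower and upper constructions can be compared numerically. A minimal sketch, assuming f(x) = x² on [0,1] with Lebesgue measure and restricting the partitions in (4.1)-(4.2) to equal subintervals (genuine partitions may be arbitrary ℱ-sets):

```python
# Lower and upper partition sums for f(x) = x^2 on [0,1] with Lebesgue
# measure, using the interval partition E_i = ((i-1)/n, i/n]. Both bracket
# the integral 1/3 and converge to it as n grows.

def lower_sum(f, n):
    # for increasing f, the infimum on ((i-1)/n, i/n] is f((i-1)/n)
    return sum(f((i - 1) / n) * (1 / n) for i in range(1, n + 1))

def upper_sum(f, n):
    # for increasing f, the supremum on ((i-1)/n, i/n] is f(i/n)
    return sum(f(i / n) * (1 / n) for i in range(1, n + 1))

f = lambda x: x * x
lo, hi = lower_sum(f, 1000), upper_sum(f, 1000)
```

Since f here is bounded and the measure is finite, the two sums squeeze together, illustrating the agreement of (4.1) and (4.2) in the bounded case.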
4.1 Example A familiar case is the measure space (ℝ,ℬ,m), where m is Lebesgue measure. The integral ∫ f dm, where f is a Borel function, is the Lebesgue integral of f. This is customarily written ∫ f(x) dx, reflecting the fact that m((x, x + dx]) = dx, even though the sets {E_i} in (4.1) need not be intervals. □
4.2 Example Consider a measure space (ℝ,ℬ,μ) where μ differs from m. The integral ∫ f dμ, where f is a Borel function, is the Lebesgue-Stieltjes integral. The monotone function

F(x) = μ((−∞,x])  (4.3)

has the property μ((a,b]) = F(b) − F(a), and the measure of the interval (x, x + dx] can be written dF(x). The notation ∫ f dF means exactly the same as ∫ f dμ, the choice between the μ and F representations being a matter of taste. See §8.2 and §9.1 for details. □
For a contrast with these cases, consider the Riemann-Stieltjes integral. For an interval [a,b], let a partition into subintervals be defined by a set of points Π = {x_1,...,x_n}, with a = x_0 < x_1 < ... < x_n = b. Another set Π' is called a refinement of Π if Π ⊆ Π'. Given functions f and α: ℝ ↦ ℝ, let

S(Π,α,f) = Σ_{i=1}^n f(t_i)(α(x_i) − α(x_{i−1})),  (4.4)

where t_i ∈ (x_{i−1},x_i]. If there exists a number ∫_a^b f dα such that for every ε > 0 there is a partition Π_ε with

|S(Π,α,f) − ∫_a^b f dα| < ε  (4.5)

for all Π ⊇ Π_ε and every choice of {t_i}, this is called the Riemann-Stieltjes integral of f with respect to α. Recall in this connection the well-known formula for integration by parts, which states that when both integrals exist,

∫_a^b f dα = f(b)α(b) − f(a)α(a) − ∫_a^b α df.

When α(x) = x and f is bounded this definition yields the ordinary Riemann integral, and when it exists, this always agrees with the Lebesgue integral of f over [a,b]. Moreover, if α is an increasing function of the form in (4.3), this integral is equal to the Lebesgue-Stieltjes integral whenever the latter is defined. There do exist bounded, measurable functions which are not Riemann-integrable (consider 3.30 for example), so that even for bounded intervals the Lebesgue integral is the more inclusive concept. However, the Riemann-Stieltjes integral is defined for more general classes of integrator function. In particular, if f is continuous it exists for α of bounded variation on [a,b], not necessarily monotone. These integrals therefore fall outside the class defined by (4.1), although note that when α is of bounded variation, having a representation as the difference of two increasing functions, the Riemann-Stieltjes integral is the difference between a pair of Lebesgue-Stieltjes integrals on (a,b].

The best way to understand the general integral is not to study a particular measure space, such as the line, but to restrict attention initially to particular classes of function. The simplest possible case is the indicator of a set. Then, every partition {E_i} yields the same value for the sum of terms in (4.1), which is
∫ 1_A dμ = μ(A),  (4.6)

for any A ∈ ℱ. Note that if A ∉ ℱ, the integral is undefined.

Another case of much importance is the following.
4.3 Theorem If f = 0 a.e.[μ], then ∫ f dμ = 0.

Proof The theorem says there exists C ∈ ℱ with μ(Cᶜ) = 0, such that f(ω) = 0 for ω ∈ C. For any partition {E_1,...,E_n} let E'_i = E_i ∩ C, and E''_i = E_i − E'_i. By additivity of μ,

Σ_i (inf_{ω∈E_i} f(ω)) μ(E_i) = Σ_i (inf_{ω∈E_i} f(ω)) μ(E'_i) + Σ_i (inf_{ω∈E_i} f(ω)) μ(E''_i) = 0,  (4.7)

the first sum of terms disappearing because inf_{ω∈E_i} f(ω) = 0 whenever E'_i is non-empty, and the second disappearing since μ(E''_i) ≤ μ(Cᶜ) = 0 for each i. ∎
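On a finite measure space the supremum in (4.1) is attained, and the indicator case (4.6) can be verified directly. A sketch with invented weights:

```python
# Integral of an indicator over a finite measure space, illustrating
# that the integral of 1_A equals mu(A), as in (4.6). The weights are
# an invented example summing to 1.

Omega = [0, 1, 2, 3, 4]
weight = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.0625, 4: 0.0625}

def integral(f):
    # on a finite space every function is simple, and the partition into
    # singletons attains the supremum in (4.1)
    return sum(f(w) * weight[w] for w in Omega)

A = {1, 3}
indicator_A = lambda w: 1.0 if w in A else 0.0
mu_A = sum(weight[w] for w in A)
```

The same routine evaluates any simple function, anticipating Theorem 4.4 below.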
A class of functions for which evaluation of the integral is simple, as their name suggests, is the non-negative simple functions.

4.4 Theorem Let φ(ω) = Σ_{i=1}^n α_i 1_{E_i}(ω), where α_i ≥ 0 for i = 1,...,n, and E_1,...,E_n ∈ ℱ is a partition of Ω. Then

∫ φ dμ = Σ_{i=1}^n α_i μ(E_i).  (4.8)

Proof Consider an arbitrary finite partition of Ω, A_1,...,A_m, and define β_j = inf_{ω∈A_j} φ(ω). Then, using additivity of μ,
Proof Let g_n = inf_{k≥n} f_k, so that {g_n} is a non-decreasing sequence, and g_n ↑ g = liminf_n f_n. Since f_n ≥ g_n, ∫ f_n dμ ≥ ∫ g_n dμ. Letting n → ∞ on both sides of the inequality gives

liminf_n ∫ f_n dμ ≥ lim_n ∫ g_n dμ = ∫ g dμ = ∫ liminf_n f_n dμ.  (4.24) ∎

4.12 Dominated convergence theorem If f_n → f a.e.[μ], and there exists g such that |f_n| ≤ g a.e.[μ] for all n and ∫ g dμ < ∞, then lim_n ∫ f_n dμ = ∫ f dμ.

Proof Put h_n = |f_n − f|, so that h_n → 0 a.e.[μ] and h_n ≤ 2g a.e.[μ]. Applying Fatou's lemma to the non-negative sequence {2g − h_n},

∫ 2g dμ = ∫ liminf_n (2g − h_n) dμ ≤ liminf_n ∫ (2g − h_n) dμ = ∫ 2g dμ − limsup_n ∫ h_n dμ,  (4.25)

where the last equality uses (2.4). Clearly limsup_n ∫ h_n dμ = 0, and since the modulus inequality implies |∫ f_n dμ − ∫ f dμ| ≤ ∫ h_n dμ,

lim_n |∫ f_n dμ − ∫ f dμ| ≤ lim_n ∫ h_n dμ = 0.  (4.26) ∎
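The dominated convergence theorem can be watched in action on the measure space (ℕ, 2^ℕ, μ) with μ({k}) = 2⁻ᵏ. A sketch; the sequence and the truncation point are our own choices:

```python
# Dominated convergence: f_n(k) = k for k <= n, else 0, converges pointwise
# to f(k) = k, and is dominated by g(k) = k, whose integral is
# sum_k k * 2^{-k} = 2 < infinity. Hence integral(f_n) -> integral(f) = 2.

K = 200  # truncation point; the tail of the geometric-type series is negligible

def integral(f):
    """Integral with respect to mu({k}) = 2^{-k}, truncated at K."""
    return sum(f(k) * 2.0 ** (-k) for k in range(1, K + 1))

def f_n(n):
    return lambda k: k if k <= n else 0

limit = integral(lambda k: k)                      # = 2 up to truncation error
vals = [integral(f_n(n)) for n in (1, 5, 20, 100)]  # increasing toward 2
```

Here the convergence is in fact monotone, so the example also illustrates 4.9; the point of 4.12 is that the same conclusion survives without monotonicity, given the dominating g.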
Taking the case where g is replaced by a finite constant produces the following version, often more convenient:

4.13 Bounded convergence theorem If f_n → f a.e.[μ] and |f_n| ≤ B < ∞, then lim_n ∫ f_n dμ = ∫ f dμ < ∞. □

Theorem 4.7 extends by recursion from pairs to arbitrary finite sums of functions, and in particular we may assert that ∫(Σ_{i=1}^n g_i) dμ = Σ_{i=1}^n ∫ g_i dμ, where the g_i are non-negative functions. Put f_n = Σ_{i=1}^n g_i. Then, if f_n ↑ f = Σ_{i=1}^∞ g_i < ∞ a.e., 4.9 also permits us to assert the following.
4.14 Corollary If {g_i} is a sequence of non-negative functions,

∫ Σ_{i=1}^∞ g_i dμ = Σ_{i=1}^∞ ∫ g_i dμ.  (4.27) □

By implication, the two sides of this equation are either both infinite, or finite and equal. This has a particular application to results involving σ-finite measures. Suppose we wish to evaluate an integral ∫ g dμ using a method that works for finite measures. To extend to the σ-finite case, choose a countable partition {Ω_i, i ∈ ℕ} of Ω, such that μ(Ω_i) < ∞ for each i. Letting g_i = g·1_{Ω_i}, note that g = Σ_i g_i and ∫ g dμ = Σ_i ∫ g_i dμ by (4.27).
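For counting measure on ℕ, (4.27) is just the interchange of a non-negative double series. A numerical sketch with the invented choice g_i(k) = 2⁻ⁱ3⁻ᵏ, truncated in both indices:

```python
# Corollary 4.14 with counting measure: integrating sum_i g_i term by term.
# Here g_i(k) = 2^{-i} * 3^{-k}; both orders of summation give
# (sum_i 2^{-i}) * (sum_k 3^{-k}) = 1 * (1/2) = 1/2.

N = 60  # truncation of each index; both geometric series converge fast

def g(i, k):
    return 2.0 ** (-i) * 3.0 ** (-k)

# "integral of the sum": sum over k of (sum over i of g_i(k))
integral_of_sum = sum(sum(g(i, k) for i in range(1, N + 1))
                      for k in range(1, N + 1))
# "sum of the integrals": sum over i of (sum over k of g_i(k))
sum_of_integrals = sum(sum(g(i, k) for k in range(1, N + 1))
                       for i in range(1, N + 1))
```

Non-negativity is what licenses the interchange here; for signed summands the analogous exchange needs the integrability condition of the Fubini theorem below.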
4.3 Product Measure and Multiple Integrals

Let (Ω,ℱ,μ) and (Ξ,𝒢,ν) be measure spaces. In general, (Ω×Ξ, ℱ⊗𝒢, π) might also be a measure space, with π a measure on the sets of ℱ⊗𝒢. In this case the measures μ and ν defined by μ(F) = π(F×Ξ) and ν(G) = π(Ω×G), respectively, are called the marginal measures corresponding to π. Alternatively, suppose that μ and ν are given, and define the set function π: ℛ ↦ ℝ̄⁺, where ℛ denotes the measurable rectangles of the space Ω×Ξ, by

π(F×G) = μ(F)ν(G).  (4.28)

We will show that π is a measure on ℛ, called the product measure, and has an extension to ℱ⊗𝒢, so that (Ω×Ξ, ℱ⊗𝒢, π) is indeed a measure space. The first
step in this demonstration is to define the mapping T_ω: Ξ ↦ Ω×Ξ by T_ω(ξ) = (ω,ξ), so that, for G ∈ 𝒢, T_ω(G) = {ω}×G. For E ∈ ℱ⊗𝒢, let

E_ω = T_ω⁻¹(E) = {ξ: (ω,ξ) ∈ E} ⊆ Ξ.  (4.29)

The set E_ω can be thought of as the cross-section through E at the element ω. For future reference, note that for any countable collection of ℱ⊗𝒢-sets {E_j, j ∈ ℕ},

(⋃_j E_j)_ω = {ξ: (ω,ξ) ∈ ⋃_j E_j} = ⋃_j {ξ: (ω,ξ) ∈ E_j} = ⋃_j (E_j)_ω.  (4.30)

4.15 Lemma T_ω is a 𝒢/(ℱ⊗𝒢)-measurable mapping for each ω ∈ Ω.

Proof We must show that E_ω ∈ 𝒢 whenever E ∈ ℱ⊗𝒢. If E = F×G for F ∈ ℱ and G ∈ 𝒢, it is obvious that

E_ω = G for ω ∈ F, and E_ω = ∅ for ω ∉ F,  (4.31)

so that in either case E_ω ∈ 𝒢. Since ℱ⊗𝒢 = σ(ℛ), the lemma follows by 3.22. ∎

The second step is to show the following.

4.16 Theorem π is a measure on ℛ.
Proof Clearly π is non-negative, and π(∅) = 0, recalling that F×∅ = ∅×G = ∅. For countable additivity, suppose the rectangle F×G is a countable disjoint union of rectangles F_j×G_j, so that 1_F(ω)1_G(ξ) = Σ_j 1_{F_j}(ω)1_{G_j}(ξ). Integrating over ξ and then ω gives

π(F×G) = μ(F)ν(G) = Σ_j μ(F_j)ν(G_j) = Σ_j π(F_j×G_j),  (4.33)

as required, where the penultimate equality is by 4.14. ∎
It is now straightforward to extend the measure from ℛ to ℱ⊗𝒢.

4.17 Theorem (Ω×Ξ, ℱ⊗𝒢, π) is a measure space.

Proof ℱ and 𝒢 are σ-fields and hence semi-rings; hence ℛ is a semi-ring by 3.19. The theorem follows from 4.16 and 3.8. ∎
Iterating the preceding arguments (i.e. letting (Ω,ℱ) and/or (Ξ,𝒢) be product spaces) allows the concept to be extended to products of higher order. In later chapters, product probability measures will embody the intuitive notion of statistical independence, although this is by no means the only application we shall meet. The following case has a familiar geometrical interpretation.

4.18 Example Lebesgue measure in the plane, ℝ² = ℝ×ℝ, is defined for intervals by

m((a_1,b_1] × (a_2,b_2]) = (b_1 − a_1)(b_2 − a_2).  (4.34)

Here the measurable rectangles include the actual geometrical rectangles (products of intervals), and ℬ², the Borel sets of the plane, is generated from these as a consequence of 3.20. By the foregoing reasoning, (ℝ²,ℬ²,m) is a measure space in which the measure of a set is given by its area. □
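The rectangle formula (4.28), and its geometric reading in Example 4.18, are easy to exercise with intervals as the marginal sets. A sketch; the interval representation is our own:

```python
# Product measure on measurable rectangles, pi(F x G) = mu(F) nu(G), with
# both marginals equal to length (Lebesgue) measure on intervals (a, b].

def length(interval):
    a, b = interval
    return max(b - a, 0.0)

def pi_rect(F, G):
    """Product measure of the rectangle F x G: an area, as in (4.34)."""
    return length(F) * length(G)

# additivity over a partition of (0,1] x (0,1] into four sub-rectangles
quarters = [((0.0, 0.5), (0.0, 0.5)), ((0.5, 1.0), (0.0, 0.5)),
            ((0.0, 0.5), (0.5, 1.0)), ((0.5, 1.0), (0.5, 1.0))]
total = sum(pi_rect(F, G) for F, G in quarters)
```

The additivity check mirrors the proof of 4.16: a rectangle partitioned into rectangles has the sum of the pieces' measures.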
We now construct integrals of functions f(ω,ξ) on the product space. The following lemma is a natural extension of 4.15, for it considers what we might think of as a cross-section through the mapping at a point ω ∈ Ω, yielding a function with domain Ξ.

4.19 Lemma Let f: Ω×Ξ ↦ ℝ be (ℱ⊗𝒢)/ℬ-measurable. Define f_ω(ξ) = f(ω,ξ) for fixed ω ∈ Ω. Then f_ω: Ξ ↦ ℝ is 𝒢/ℬ-measurable.

Proof We can write

f_ω(ξ) = f(ω,ξ) = f(T_ω(ξ)) = f∘T_ω(ξ).  (4.35)

By 4.15 and the remarks following 3.22, the composite function f∘T_ω is 𝒢/ℬ-measurable. ∎
Suppose we are able to integrate f with respect to ν over Ξ. There are two questions of interest that arise here. First, is the resulting function g(ω) = ∫_Ξ f_ω dν ℱ/ℬ-measurable? And second, if g is now integrated over Ω, what is the relationship between this integral and the integral ∫_{Ω×Ξ} f dπ over Ω×Ξ? The affirmative answer to the first of these questions, and the fact that the 'iterated' integral is identical with the 'double' integral where these exist, are the most important results for product spaces, known jointly as the Fubini theorem. Since iterated integration is an operation we tend to take for granted with multiple Riemann integrals, perhaps the main point needing to be stressed here is that this convenient property of product measures (and multivariate Lebesgue measure in particular) does not generalize to arbitrary measures on product spaces.

The first step is to let f be the indicator of a set E ∈ ℱ⊗𝒢. In this case f_ω is the indicator of the set E_ω defined in (4.29), and

∫_Ξ f_ω dν = ν(E_ω) = g_E(ω),  (4.36)

say. In view of 4.15, E_ω ∈ 𝒢 and the function g_E: Ω ↦ ℝ̄⁺ is well-defined, although, unless ν is a finite measure, it may take its values in the extended half line, as shown.
4.20 Lemma Let μ and ν be σ-finite. For all E ∈ ℱ⊗𝒢, g_E is ℱ/ℬ-measurable, and

∫_Ω g_E dμ = π(E).  (4.37)

By implication, the two sides of the equality in (4.37) are either both infinite, or finite and equal. □
Proof Assume first that the measures are finite. The theorem is proved for this case using the π-λ theorem. Let 𝒜 denote the collection of sets E such that g_E satisfies (4.37). ℛ ⊆ 𝒜, since if E = F×G then, by (4.31),

g_E(ω) = ν(G)1_F(ω),  (4.38)

and ∫_Ω g_E dμ = μ(F)ν(G) = π(E) as required. Clearly Ω×Ξ ∈ ℛ ⊆ 𝒜, and so 1.25(a) holds. We now show that 𝒜 is a λ-system. If E_1 ⊆ E_2, with E_1, E_2 ∈ 𝒜, then

g_{E_2−E_1}(ω) = ∫_Ξ 1_{(E_2)_ω}(ξ) dν(ξ) − ∫_Ξ 1_{(E_1)_ω}(ξ) dν(ξ) = g_{E_2}(ω) − g_{E_1}(ω),  (4.39)

an ℱ/ℬ-measurable function by 3.25, and so, by additivity of π,

∫_Ω g_{E_2−E_1} dμ = π(E_2) − π(E_1) = π(E_2 − E_1),  (4.40)

showing that 𝒜 satisfies 1.25(b). If A_1 and A_2 are disjoint, so are (A_1)_ω and (A_2)_ω, and ν((A_1)_ω ∪ (A_2)_ω) = ν((A_1)_ω) + ν((A_2)_ω). Finally, to establish 1.25(c), let {E_j, j ∈ ℕ} ⊆ 𝒜 be a monotone sequence, E_j ↑ E. Define the disjoint collection {A_j} with A_1 = E_1 and A_j = E_j − E_{j−1}, j > 1, so that E = ⋃_{j=1}^∞ A_j and A_j ∈ 𝒜 by (4.39). By countable additivity of ν,

ν(E_ω) = Σ_j ν((A_j)_ω),  (4.41)
so that

∫_Ω g_E dμ = ∫_Ω Σ_{j=1}^∞ g_{A_j} dμ = Σ_{j=1}^∞ ∫_Ω g_{A_j} dμ = Σ_{j=1}^∞ π(A_j) = π(E),  (4.42)

where the first equality is by 4.14. This shows that 𝒜 is a λ-system. Since ℛ is a semi-ring it is also a π-system, and ℱ⊗𝒢 = σ(ℛ) ⊆ 𝒜 by 1.27. This completes the proof for finite measures.

To extend to the σ-finite case, let {Ω_i} and {Ξ_j} be countable partitions of Ω and Ξ with finite μ-measure and ν-measure respectively; then the collection {Ω_i×Ξ_j ∈ ℛ} forms a countable partition of Ω×Ξ having finite measures, π(Ω_i×Ξ_j) = μ(Ω_i)ν(Ξ_j). For a set E ∈ ℱ⊗𝒢, write E_{ij} = E ∩ (Ω_i×Ξ_j). Then by the last argument,

∫_Ω g_{E_{ij}} dμ = π(E_{ij}),  (4.43)

where g_{E_{ij}}: Ω ↦ ℝ̄⁺ is defined by g_{E_{ij}}(ω) = ν((E_{ij})_ω) when ω ∈ Ω_i, or 0 otherwise. The sets E_{ij} are disjoint, and

g_E = Σ_i 1_{Ω_i} Σ_j g_{E_{ij}}.  (4.44)

The sum on the right need not converge, and in that case g_E = +∞. However, ℱ/ℬ-measurability holds by 3.25/3.26, and

∫_Ω g_E dμ = Σ_i Σ_j ∫_Ω g_{E_{ij}} dμ = Σ_i Σ_j π(E_{ij}) = π(E),  (4.45)

using 4.14 and countable additivity. This completes the proof. ∎

Now extend from indicator functions to non-negative functions:
4.21 Tonelli's theorem Let π be a product measure with σ-finite marginal measures μ and ν, and let f: Ω×Ξ ↦ ℝ̄⁺ be (ℱ⊗𝒢)/ℬ-measurable. Define f_ω(ξ) = f(ω,ξ), and let g(ω) = ∫_Ξ f_ω dν. Then
(i) g is ℱ/ℬ-measurable;
(ii) ∫_{Ω×Ξ} f dπ = ∫_Ω (∫_Ξ f_ω(ξ) dν(ξ)) dμ(ω). □

In part (ii) it is again understood that the two sides of the equation are either finite and equal, or both infinite. Like the other results of this section, the theorem is symmetric in (Ω,ℱ,μ) and (Ξ,𝒢,ν), and the complementary results given by interchanging the roles of the marginal spaces do not require a separate statement. The theorem holds even for measures that are not σ-finite, but this further complicates the proof.

Proof This is on the lines of 4.6. For a partition {E_1,...,E_n} of Ω×Ξ into ℱ⊗𝒢-sets, let f = Σ_i α_i 1_{E_i}; then f_ω = Σ_i α_i 1_{(E_i)_ω} and g = Σ_i α_i g_{E_i} by 4.4. g is ℱ/ℬ-measurable by 3.25, and 4.20 gives

∫_Ω g dμ = Σ_i α_i π(E_i) = ∫_{Ω×Ξ} f dπ,  (4.46)

so that the theorem holds for simple functions. For general non-negative f, choose a monotone sequence of simple functions converging to f as in 3.28, show measurability of g in the limit using 3.26, and apply the monotone convergence theorem. ∎

Extending to general f requires the additional assumption of integrability.
4.22 Fubini's theorem Let π be a product measure with σ-finite marginal measures μ and ν; let f: Ω×Ξ ↦ ℝ be (ℱ⊗𝒢)/ℬ-measurable with

∫_{Ω×Ξ} |f(ω,ξ)| dπ < ∞,  (4.47)

and define f_ω(ξ) = f(ω,ξ) and g(ω) = ∫_Ξ f_ω dν. Then
(i) f_ω is 𝒢/ℬ-measurable and integrable for ω ∈ A ⊆ Ω, with μ(Ω − A) = 0;
(ii) g is ℱ/ℬ-measurable, and integrable on A;
(iii) ∫_Ω (∫_Ξ f(ω,ξ) dν(ξ)) dμ(ω) = ∫_{Ω×Ξ} f(ω,ξ) dπ. □

Proof Apart from the integrability, 4.19 shows (i), and Tonelli's theorem shows (ii) and (iii) for the functions f⁺ = max{f,0} and f⁻ = f⁺ − f, where |f| = f⁺ + f⁻. But under (4.47), |f| < ∞ on a set whose complement has π-measure 0. With A defined as the projection of this set onto Ω, (i), (ii) and (iii) hold for f⁺ and f⁻, with both sides of the equation in (iii) finite. Since f = f⁺ − f⁻, (i) extends to f by 3.25, and (ii) and (iii) extend to f by 4.7. ∎
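For counting measures on finite index sets the Fubini theorem reduces to the interchange of absolutely convergent double sums, signs and all. A sketch with an invented summable array:

```python
# Fubini for counting measure: when sum |f(i,k)| < infinity, the double sum
# and both iterated sums coincide, even though f changes sign.

M = 30
f = {(i, k): (-1.0) ** (i + k) / (2.0 ** i * 3.0 ** k)
     for i in range(1, M) for k in range(1, M)}

double_sum = sum(f.values())                                       # "d pi"
iter_i_then_k = sum(sum(f[(i, k)] for k in range(1, M)) for i in range(1, M))
iter_k_then_i = sum(sum(f[(i, k)] for i in range(1, M)) for k in range(1, M))
```

The limiting value is the product of the two alternating geometric sums, (−1/3)(−1/4) = 1/12; without absolute summability, rearranging a double series like this can change (or destroy) the limit, which is the content of the integrability condition (4.47).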
4.4 The Radon-Nikodym Theorem

Consider σ-finite measures μ and ν on a measurable space (Ω,ℱ). μ is said to be absolutely continuous with respect to ν if ν(E) = 0, for E ∈ ℱ, implies μ(E) = 0. This relationship is written μ ≪ ν.

λ(E) > 0. By construction, E and A⁺ are disjoint. Every subset of A⁺ ∪ E is the disjoint union of a subset of A⁺ with a subset of E, so if E is a positive set, so is A⁺ ∪ E. By definition of α,

λ(A⁺ ∪ E) = λ(A⁺) + λ(E) = α + λ(E),  (4.51)

which requires λ(E) ≤ 0, so E cannot be a positive set. If F is a subset of E, it is also a subset of A⁻, and if positive it must have zero measure, by the argument just applied to E. The desired contradiction is obtained by showing that if λ(E) > 0, E must have a subset F which is both positive and has positive measure. The technique is to successively remove subsets of negative measure from E until what is left has to be a positive set, and then to show that this remainder has positive measure. Let n_1 be the smallest integer such that there is a subset E_1 ⊆ E with λ(E_1) < −1/n_1, and define F_1 = E − E_1. Then let n_2 be the smallest integer such that there exists E_2 ⊆ F_1 with λ(E_2) < −1/n_2. In general, for k = 2,3,..., let n_k be the smallest positive integer such that F_{k−1} has a subset E_k satisfying λ(E_k) < −1/n_k, and let

F_k = E − ⋃_{j=1}^k E_j.  (4.52)

If no such set exists for finite n_k, let n_k = +∞ and E_k = ∅. The sequence {F_k} is non-increasing, and so must converge to a limit F as k → ∞. We may therefore write E = F ∪ (⋃_{j=1}^∞ E_j), where the sets on the right-hand side are mutually disjoint, and hence, by countable additivity,

λ(E) = λ(F) + Σ_{j=1}^∞ λ(E_j).

Since λ(E) > 0 it must be the case that λ(F) > 0; but since λ(E) < ∞ by assumption, it is also the case that Σ_{j=1}^∞ (1/n_j) < ∞, and hence n_k → ∞ as k → ∞. This means that F contains no subset with negative measure, and is therefore a positive set having positive measure. ∎
For any set B ∈ ℱ, define λ⁺(B) = λ(A⁺ ∩ B) and λ⁻(B) = −λ(A⁻ ∩ B), such that λ(B) = λ⁺(B) − λ⁻(B). It is easy to verify that λ⁺ and λ⁻ are mutually singular, non-negative measures on (Ω,ℱ). λ = λ⁺ − λ⁻ is called the Jordan decomposition of the signed measure λ. λ⁺ and λ⁻ are called the upper variation and lower variation of λ, and |λ| = λ⁺ + λ⁻ is called the total variation of λ. The Jordan decomposition shows that all signed measures can be represented in the form of (4.48). Signed measures therefore introduce no new technical difficulties. We can integrate with respect to λ by taking the difference of the integrals with respect to λ⁺ and λ⁻.

We are now able to prove the Radon-Nikodym theorem. It is actually most convenient to derive the Lebesgue decomposition (4.23) in such a way that the Radon-Nikodym theorem emerges as a fairly trivial corollary. It is also easiest to begin with finite measures, and then extend the results to the σ-finite case.
4.27 Theorem Finite, non-negative measures ν and μ have a Lebesgue decomposition μ = μ1 + μ2, where μ1 ⊥ ν and μ2 ≪ ν. □

Proof For each n ∈ ℕ, let (A_n⁺, A_n⁻) be a Hahn decomposition of Ω for the signed measure μ − nν, and let A = ⋂_n A_n⁺, so that Aᶜ = ⋃_n A_n⁻. Define

μ1(E) = μ(E ∩ A), μ2(E) = μ(E ∩ Aᶜ).  (4.56)

Since μ(A) ≥ nν(A) for every n and μ is finite, ν(A) = 0, so that μ1 ⊥ ν. If ν(E) = 0, then for each n, μ(E ∩ A_n⁻) ≤ nν(E ∩ A_n⁻) = 0, and hence

μ2(E) = μ(E ∩ Aᶜ) ≤ Σ_n μ(E ∩ A_n⁻) = 0,

so that μ2 ≪ ν. ∎
A countable partition {Ω_j} of Ω into sets of finite measure defines such a partition. If different collections with finite measures are known for ν and μ, say {Λ_{μj}} and {Λ_{νj}}, the collection containing all the Λ_{μj} ∩ Λ_{νk} for j,k ∈ ℕ is countable and of finite measure with respect to both ν and μ, and after reindexing this collection can generate {Ω_j}. Consider the restrictions of μ and ν to the measurable spaces (Ω_j,ℱ_j), for j ∈ ℕ, where ℱ_j = {E ∩ Ω_j, E ∈ ℱ}. By countable additivity, μ(E) = Σ_j μ(E ∩ Ω_j), with similar equalities for μ1, μ2, and ν; by implication, the two sides are in each case either finite and equal, or both +∞. If ν(E ∩ Ω_j) = 0 implies μ2(E ∩ Ω_j) = 0 for each j, then ν(E) = 0 implies μ2(E) = 0 for E ∈ ℱ, and μ2 ≪ ν. Similarly, let A_j, A_jᶜ define partitions of the Ω_j such that μ1(A_jᶜ) = ν(A_j) = 0; then A = ⋃_j A_j and Aᶜ = ⋃_j A_jᶜ are disjoint unions, ν(A) = Σ_j ν(A_j) = 0, and μ1(Aᶜ) = Σ_j μ1(A_jᶜ) = 0. Hence μ1 ⊥ ν. ∎
The proof of the Radon-Nikodym theorem is now achieved by extending the other conclusion of 4.27 to the σ-finite case.

Proof of 4.24 In the countable partition of the last proof, 4.27 implies the existence of non-negative ℱ_j/ℬ-measurable functions f_j such that

μ(E ∩ Ω_j) = μ1(E ∩ Ω_j) + μ2(E ∩ Ω_j),  (4.59)

where

μ2(E ∩ Ω_j) = ∫_{E∩Ω_j} f_j dν, all E ∈ ℱ.  (4.60)

Define f: Ω ↦ ℝ̄⁺ by f(ω) = Σ_j 1_{Ω_j}(ω) f_j(ω). This is a function since the Ω_j are disjoint, and is ℱ/ℬ-measurable since f⁻¹(B) = ⋃_j f_j⁻¹(B) = ⋃_j E_j ∈ ℱ, where E_j ∈ ℱ_j, for each B ∈ ℬ. Apply 4.14 to give

μ2(E) = Σ_j μ2(E ∩ Ω_j) = Σ_j ∫_E 1_{Ω_j} f_j dν = ∫_E f dν.  (4.61) ∎
Consider the case where μ is absolutely continuous with respect to another measure ν. If the Lebesgue decomposition with respect to ν is μ = μ1 + μ2, then ν(A) = 0 implies μ(A) = 0, which in turn implies μ1(A) = 0 since μ1 ≤ μ. But μ1 ⊥ ν means that μ1 is concentrated on a set A with ν(A) = 0, so μ1(Ω) = μ1(A) + μ1(Aᶜ) = 0, and μ = μ2. Thus, the absolute continuity of a measure implies the existence of a Radon-Nikodym derivative f as an equivalent representation of the measure, given ν, in the sense that μ(E) = ∫_E f dν for any E ∈ ℱ. An important application of these results is to measures on the line.
4.28 Example Let ν in the last result be Lebesgue measure, m, and let μ be any other measure on the line. Clearly, μ1 ⊥ m requires that μ1 be concentrated on a set of Lebesgue measure 0. On the other hand, absolute continuity of μ2 with respect to m implies that any set of 'zero length', any countable collection of isolated points for example, must have zero measure under μ2. If μ is absolutely continuous with respect to m, we may write the integral of a measurable function g as

∫_{−∞}^{+∞} g(x) dμ(x) = ∫_{−∞}^{+∞} g(x) f(x) dx,  (4.62)

so that all integrals reduce to Lebesgue integrals. Here, f is known as the density function of the measure μ, and is an equivalent representation of μ, with the relation

μ(E) = ∫_E f(x) dx  (4.63)

(the Lebesgue integral of f over E) holding for each E ∈ ℬ. □
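A discrete analogue makes the Radon-Nikodym derivative tangible: take ν to be counting measure on a finite set, so that any μ on that set is absolutely continuous with respect to ν and its density is just the point-mass function. The support and weights below are invented:

```python
# Radon-Nikodym in miniature: nu = counting measure on a finite support,
# mu given by point masses. Then f(x) = mu({x}) acts as dmu/dnu, and
# mu(E) = integral of f over E with respect to nu.

support = ["a", "b", "c", "d"]
mu_point = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}

def mu(E):
    return sum(mu_point[x] for x in E)

def dmu_dnu(x):
    """Density of mu with respect to counting measure."""
    return mu_point[x]

def integral_wrt_nu(f, E):
    # nu({x}) = 1 for each point, so the integral is a plain sum over E
    return sum(f(x) for x in E)
```

Here ν(E) = 0 only for E = ∅, so μ ≪ ν holds trivially; the interest of the theorem lies in the general case, where the density must be constructed.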
5 Metric Spaces

5.1 Distances and Metrics

Central to the properties of ℝ studied in Chapter 2 was the concept of distance. For any real numbers x and y, the Euclidean distance between them is the number d_E(x,y) = |x − y| ∈ ℝ⁺. Generalizing this idea, a set (otherwise arbitrary) having a distance measure, or metric, defined for each pair of elements is called a metric space. Let 𝕊 denote such a set.

5.1 Definition A metric is a mapping d: 𝕊×𝕊 ↦ ℝ⁺ having the properties
(a) d(x,y) = d(y,x),
(b) d(x,y) = 0 iff x = y,
(c) d(x,z) ≤ d(x,y) + d(y,z) (triangle inequality).
A metric space (𝕊,d) is a set 𝕊 paired with a metric d, such that conditions (a)-(c) hold for each pair of elements of 𝕊. □

If 5.1(a) and (c) hold, and d(x,x) = 0, but d(x,y) = 0 is possible when x ≠ y, we would call d a pseudo-metric. A fundamental fact is that if (A,d) is a metric space and B ⊆ A, (B,d) is also a metric space. If ℚ is the set of rational numbers, ℚ ⊂ ℝ and (ℚ,d_E) is a metric space; another example is ([0,1],d_E). While the Euclidean metric on ℝ is the familiar case, and the proof that d_E satisfies 5.1(a)-(c) is elementary, d_E is not the only possible metric on ℝ.
5.2 Example For x,y ∈ ℝ̄ let

d_0(x,y) = |x − y| / (1 + |x − y|).  (5.1)

It is immediate that 5.1(a) and (b) hold. To show (c), note that d_0 is an increasing function of the Euclidean distance |x − y|. The inequality a/(1+a) + b/(1+b) ≥ c/(1+c) simplifies to a + b ≥ c − ab(2 + c). We obtain 5.1(c) on putting a = d_E(x,y), b = d_E(y,z), c = d_E(x,z), and using the fact that 0 ≤ d_0 ≤ 1. Unlike the Euclidean metric, d_0 is defined for x or y = ±∞. (ℝ̄,d_0) is a metric space on the definition, while ℝ̄ with the Euclidean metric is not. □
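The bounded metric of Example 5.2 can be checked numerically against the axioms of Definition 5.1 on a grid of test points:

```python
# Numerical check of 5.1(a)-(c) for d_0(x,y) = |x-y|/(1+|x-y|), together
# with the bound d_0 < 1 that distinguishes it from the Euclidean metric.

def d0(x, y):
    return abs(x - y) / (1.0 + abs(x - y))

points = [-3.0, -0.7, 0.0, 0.2, 1.0, 5.0]
symmetric = all(d0(x, y) == d0(y, x) for x in points for y in points)
identity = all((d0(x, y) == 0.0) == (x == y) for x in points for y in points)
triangle = all(d0(x, z) <= d0(x, y) + d0(y, z) + 1e-15   # rounding slack
               for x in points for y in points for z in points)
bounded = all(d0(x, y) < 1.0 for x in points for y in points)
```

Such a check is of course no proof, but it is a quick sanity test of the algebra behind 5.1(c).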
In the space ℝ² a larger variety of metrics is found.

5.3 Example The Euclidean distance on ℝ² is

d_E(x,y) = ‖x − y‖ = [(x_1 − y_1)² + (x_2 − y_2)²]^{1/2}.  (5.2)

(ℝ²,d_E) is the Euclidean plane. An alternative is the 'taxicab' metric,

d_T(x,y) = |x_1 − y_1| + |x_2 − y_2|.  (5.3)

d_E is the shortest distance between two addresses in Manhattan as the crow flies, but d_T is the shortest distance by taxi (see Fig. 5.1). The reader will note that d_T and d_E are actually the cases for p = 1 and p = 2 of a sequence of metrics on ℝ². He/she is invited to supply the definition for the case p = 3, and so for any p. The limiting case as p → ∞ is the maximum metric,

d_M(x,y) = max{|x_1 − y_1|, |x_2 − y_2|}.  (5.4)

All these distance measures can be shown to satisfy 5.1(a)-(c). Letting ℝⁿ = ℝ×...×ℝ for any finite n, they can be generalized in the obvious fashion to define metric spaces (ℝⁿ,d_E), (ℝⁿ,d_T), (ℝⁿ,d_M), and so forth. □
Fig. 5.1

Metrics d_1 and d_2 on a space 𝕊 are said to be equivalent if, for each x ∈ 𝕊 and ε > 0, there is a δ > 0 such that

d_1(x,y) < δ ⟹ d_2(x,y) < ε,
d_2(x,y) < δ ⟹ d_1(x,y) < ε,
for each y ∈ 𝕊. The idea here is that the two metrics confer essentially the same properties on the space, apart from a possible relabelling of points and axes. A metric that is a continuous, increasing function of another metric is equivalent to it; thus, if d is any metric on 𝕊, it is equivalent to the bounded metric d/(1 + d). d_E and d_0 of 5.2 are equivalent in ℝ, as are d_E and d_M in ℝ². On the other hand, consider for any 𝕊 the discrete metric d_D, where for x,y ∈ 𝕊, d_D(x,y) = 0 if x = y, and 1 otherwise. d_D is a metric, but d_D and d_E are not equivalent in ℝ.

In metric space theory, the properties of ℝ outlined in §2.1 are revealed as a special case. Many definitions are the same, word for word, although other concepts are novel. In a metric space (𝕊,d) the concept of an open neighbourhood in ℝ generalizes to the sphere or ball, a set S_d(x,ε) = {y: y ∈ 𝕊, d(x,y) < ε}, where x ∈ 𝕊 and ε > 0. We write simply S(x,ε) when the context makes clear which
metric is being adopted. In (ℝ²,d_E), S(x,ε) is a circle with centre at x and radius ε. In (ℝ²,d_T) it is a 'diamond' (rotated square) centred on x, with ε the distance from x to the vertices. In (ℝ²,d_M) it is a regular square centred on x, with sides of 2ε. For (ℝ³,d_E) we'll think about it!

An open set of (𝕊,d) is a set A ⊆ 𝕊 such that, for each x ∈ A, ∃ ε > 0 such that S(x,ε) is a subset of A. If metrics d_1 and d_2 are equivalent, a set is open in (𝕊,d_1) iff it is open in (𝕊,d_2). The theory of open sets of ℝ generalizes straightforwardly. For example, the Borel field of 𝕊 is a well-defined notion, the smallest σ-field containing the open sets of (𝕊,d). Here is the general version of 2.4.

5.4 Theorem
(i) If 𝒞 is any collection of open sets of (𝕊,d), then C = ⋃_{A∈𝒞} A is open.
(ii) If A and B are open in (𝕊,d), then A ∩ B is open. □

Proof (i) If S(x,ε) ⊆ A and A ∈ 𝒞, then S(x,ε) ⊆ C. Since such a ball exists by definition for all x ∈ A, all A ∈ 𝒞, it follows that one exists for all x ∈ C.
(ii) If S(x,ε_A) and S(x,ε_B) are two spheres centred on x, then

S(x,ε_A) ∩ S(x,ε_B) = S(x,ε),  (5.8)

where ε = min{ε_A,ε_B}. If x ∈ A, ∃ S(x,ε_A) ⊆ A with ε_A > 0, and if x ∈ B, ∃ S(x,ε_B) ⊆ B similarly, with ε_B > 0. If x ∈ A ∩ B, then S(x,ε) ⊆ A ∩ B, with ε > 0. ∎
The important thing to bear in mind is that openness is not preserved under arbitrary intersections.

A closure point of a set A is a point x ∈ 𝕊 (not necessarily belonging to A) such that for all ε > 0, ∃ y ∈ A with d(x,y) < ε. The set of closure points of A, denoted Ā, is called the closure of A. Closure points are also called adherent points, 'sticking to' a set though not necessarily belonging to it. If for some ε > 0 the definition of a closure point is satisfied only for y = x, so that S(x,ε) ∩ A = {x}, x is said to be an isolated point of A. A boundary point of A is a point x ∈ Ā such that for all ε > 0 ∃ z ∈ Aᶜ with d(x,z) < ε. The set of boundary points of A is denoted ∂A, and Ā = A ∪ ∂A. The interior of A is A° = A − ∂A. A closed set is one containing all its closure points, such that Ā = A. An open set does not contain all of its closure points, since the boundary points do not belong to the set. The empty set ∅ and the space 𝕊 are both open and closed. A subset B of A is said to be dense in A if B ⊆ A ⊆ B̄.

A collection of sets 𝒞 is called a covering for A if A ⊆ ⋃_{B∈𝒞} B. If each B is open, it is called an open covering. A set A is called compact if every open covering of A contains a finite subcover. A is said to be relatively compact if Ā is compact. If 𝕊 is itself compact, (𝕊,d) is said to be a compact space. The remarks in §2.1 about compactness are equally relevant to the general case. A set A is bounded if ∃ x ∈ 𝕊 and r < ∞ such that A ⊆ S(x,r); and also
totally bounded (or precompact) if for every ε > 0 there exists a finite collection of points x_1,...,x_m (called an ε-net) such that the spheres S(x_i,ε), i = 1,...,m, form a covering for A. The S(x_i,ε) can be replaced in this definition by their closures S̄(x_i,ε), noting that S̄(x_i,ε) is contained in S(x_i, ε + δ) for all δ > 0. The points of the ε-net need not be elements of A. An attractive mental image is a region of ℝ² covered with little cocktail umbrellas of radius ε (Fig. 5.2). Any set that is totally bounded is also bounded. In certain cases such as (ℝⁿ,d_E) the converse is also true, but this is not true in general.
Fig. 5.2
5.5 Theorem If a set is relatively compact, it is totally bounded.

Proof Let A be relatively compact, and consider the covering of Ā consisting of the ε-balls S(x,ε) for all x ∈ Ā. By the definition this contains a finite subcover S(x_i,ε), i = 1,...,m, which also covers A. Then {x_1,...,x_m} is an ε-net for A, and the theorem follows since ε is arbitrary. ∎

The converse is true only when the space is complete; see 5.13.
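Total boundedness of the unit square under d_E can be exhibited constructively: a square grid with fine enough spacing is an ε-net. A sketch; the grid construction is our own:

```python
# An epsilon-net for [0,1]^2 under the Euclidean metric: with grid spacing h,
# every point of the square lies within h/sqrt(2) of a grid point, so
# choosing h < eps*sqrt(2) makes the eps-balls a finite cover.

import math

def eps_net(eps):
    h = eps * math.sqrt(2.0) * 0.99          # spacing with h/sqrt(2) < eps
    n = int(math.ceil(1.0 / h))              # number of grid cells per side
    return [(i / n, j / n) for i in range(n + 1) for j in range(n + 1)]

def dist_to_net(p, net):
    return min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in net)

net = eps_net(0.1)
sample = [(0.37, 0.91), (0.0, 0.0), (1.0, 0.5), (0.123, 0.456)]
covered = all(dist_to_net(p, net) < 0.1 for p in sample)
```

The net here has its points inside the square, but, as the definition notes, that is not required.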
5.2 Separability and Completeness

In thinking about metric spaces, it is sometimes helpful to visualize the analogous problem for ℝ, or at most for ℝⁿ with n ≤ 3, and use one's intuitive knowledge of those cases. But this trick can be misleading if the space in question is too alien to geometrical intuition. A metric space is said to be separable if it contains a countable, dense subset. Separability is one of the properties that might be considered to characterize an 'ℝ-like' space. The rational numbers ℚ are countable and dense in ℝ, so ℝ is separable, as is ℝⁿ. An alternative definition of a separable metric space is a metric space for which the Lindelöf property holds (see 2.7). This result can be given in the following form.
5.6 Theorem In a metric space S the following three properties are equivalent:
(a) S is separable.
(b) Every open set A ⊆ S has the representation

A = ⋃_{i=1}^∞ B_i,  B_i ∈ V,  (5.9)

where V is a countable collection of open spheres in S.
(c) Every open cover of a set in S has a countable subcover. ∎
A collection V with property (b) is called a base of S, so that separability is equated in this theorem with the existence of a countable base for the space. In topology this property is called second-countability (see §6.2). (c) is the Lindelöf property.

Proof We first show that (a) implies (b). Let V be the countable collection of spheres {S(x,r): x ∈ D, r ∈ ℚ⁺}, where D is a countable, dense subset of S, and ℚ⁺ is the set of positive rationals. If A is an open subset of S, then for each x ∈ A, ∃ δ > 0 such that S(x,δ) ⊆ A. For any such x, choose x_i ∈ D such that d(x_i,x) < δ/2 (possible since D is dense), and then choose rational r_i to satisfy d(x_i,x) < r_i < δ/2. Define B_i = S(x_i,r_i) ∈ V, and observe that

x ∈ B_i ⊆ S(x,δ) ⊆ A.  (5.10)
Since V as a whole is countable, the subcollection {B_i} of all the sets that satisfy this condition for at least one x ∈ A is also countable, and clearly A ⊆ ⋃_i B_i ⊆ A, so A = ⋃_i B_i.

Next we show that (b) implies (c). Since V is countable we may index its elements as {B_j, j ∈ ℕ}. If C is any collection of open sets covering A, choose a subcollection {C_j, j ∈ ℕ}, where C_j is a set from C which contains B_j if such exists; otherwise let C_j = ∅. There exists a covering of A by V-sets, as just shown, and each B_j can itself be covered by other elements of V with smaller radii, so that by taking small enough spheres we may always find an element of C to contain them. Thus A ⊆ ⋃_j C_j, and the Lindelöf property holds.

Finally, to show that (c) implies (a), consider the open cover of S by the sets {S(x,1/n), x ∈ S}. If there exists for each n a countable subcover {S(x_{nk},1/n), k ∈ ℕ}, then for each x ∈ S and each n there must be one or more indices k such that d(x_{nk},x) < 1/n. Since this must be true for every n, the countable set {x_{nk}, k ∈ ℕ, n ∈ ℕ} must be dense in S. This completes the proof. ∎
The theorem has a useful corollary.

5.7 Corollary A totally bounded space is separable. ∎

Another important property is that subspaces of separable spaces are separable, which we show as follows.

5.8 Theorem If (S,d) is a separable space and A ⊆ S, then (A,d) is separable.
Proof Let D be a countable set, dense in S. Construct the countable set E by taking one point from each of the nonempty sets A ∩ S(y,r), for y ∈ D and rational r > 0.
For any x ∈ A and δ > 0, we may choose y ∈ D such that d(x,y) < δ/2. For every such y, ∃ z ∈ E satisfying z ∈ A ∩ S(y,r) for some rational r < δ/2, so that d(y,z) < δ/2. Thus

d(x,z) ≤ d(x,y) + d(y,z) < δ,  (5.11)

and E is dense in A. ∎

A set A is called discrete if for each x ∈ A, ∃ δ > 0 such that (S(x,δ) − {x}) ∩ A is empty. In other words, each element is an isolated point. The integers ℤ are a discrete set of (ℝ,d_E), for example. If S is itself discrete, the discrete metric d_0 is equivalent to d.
5.9 Theorem If a metric space contains an uncountable discrete subset, it is not separable.

Proof This is immediate from 5.6. Let A be discrete, and consider the open set ⋃_{x∈A} S(x,ε_x), where ε_x is chosen small enough that the specified spheres form a disjoint collection. This is an open cover of A, and if A is uncountable it has no countable subcover. ∎

The separability question arises when we come to define measures on metric spaces (see Chapter 26). Unless a space is separable, we cannot be sure that all of its Borel sets are measurable. The space D_[a,b] discussed below (5.27) is an important example of this difficulty.

The concepts of sequence, limit, subsequence, and cluster point all extend from ℝ to general metric spaces. A sequence {x_n} of points in (S,d) is said to converge to a limit x if for all ε > 0 there exists N_ε ≥ 1 such that
d(x_n,x) < ε for all n > N_ε.  (5.13)
Theorems 2.12 and 2.13 extend in an obvious way, as follows.
5.10 Theorem Every sequence on a compact subset of S has one or more cluster points. ∎

5.11 Theorem If a sequence on a compact subset of S has a unique cluster point, then it converges. ∎

The notion of a Cauchy sequence also remains fundamental. A sequence {x_n} of points in a metric space (S,d) is a Cauchy sequence if for all ε > 0, ∃ N_ε such that d(x_n,x_m) < ε whenever n > N_ε and m > N_ε. The novelty is that Cauchy sequences in a metric space do not always possess limits. It is possible that the point on which the sequence is converging lies outside the space. Consider the space (ℚ,d_E). The sequence {x_n}, where x_n = 1 + 1 + 1/2 + 1/6 + ⋯ + 1/n! ∈ ℚ, is a Cauchy sequence since |x_{n+1} − x_n| = 1/(n+1)! → 0; but of course, x_n → e (the base of the natural logarithms), an irrational number. A metric space (S,d) is said to be complete if it contains the limits of all Cauchy sequences defined on it. (ℝ,d_E) is a complete space, while (ℚ,d_E) is not.

Although compactness is a primitive notion which does not require the concept of a Cauchy sequence, we can nevertheless define it, following the idea in 2.12, in terms of the properties of sequences. This is often convenient from a practical point of view.
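The factorial-series example can be replayed in exact rational arithmetic; this sketch (variable names are illustrative) confirms that the partial sums are rationals whose gaps shrink factorially, while the value they approach is the irrational e:

```python
from fractions import Fraction
import math

# Partial sums x_n = 1/0! + 1/1! + ... + 1/n! are rational, and form a
# Cauchy sequence in (Q, d_E): the gap x_{n+1} - x_n = 1/(n+1)! shrinks
# rapidly. Their limit e is irrational, so it lies outside the space Q.
terms = []
x, fact = Fraction(0), 1
for k in range(15):
    if k > 0:
        fact *= k
    x += Fraction(1, fact)
    terms.append(x)

gaps = [terms[n + 1] - terms[n] for n in range(len(terms) - 1)]
assert all(gaps[n + 1] < gaps[n] for n in range(len(gaps) - 1))
print(float(terms[-1]))  # approaches e = 2.71828...
```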
5.12 Theorem The following statements about a metric space (S,d) are equivalent:
(a) S is compact.
(b) Every sequence in S has a cluster point in S.
(c) S is totally bounded and complete. ∎

Notice the distinction between completeness and compactness. In a complete space all Cauchy sequences converge, which says nothing about the behaviour of non-Cauchy sequences. But in a compact space, which is also totally bounded, all sequences contain Cauchy subsequences which converge in the space.

Proof We show in turn that (a) implies (b), (b) implies (c), and (c) implies (a). Suppose S is compact. Let {x_n, n ∈ ℕ} be a sequence in S, and define a decreasing sequence of subsets of S by B_n = {x_k: k ≥ n}. The sets B̄_n are closed, and the cluster points of the sequence, if any, compose the set C = ⋂_{n=1}^∞ B̄_n. If C = ∅, then S = (⋂_{n=1}^∞ B̄_n)ᶜ = ⋃_{n=1}^∞ B̄_nᶜ, so that the open sets B̄_nᶜ are a cover for S, and by assumption these contain a finite subcover. This means that, for some m < ∞, S = ⋃_{n=1}^m B̄_nᶜ = (⋂_{n=1}^m B̄_n)ᶜ = B̄_mᶜ. This leads to the contradiction B̄_m = ∅, so that C must be nonempty. Hence, (a) implies (b).

Now suppose that every sequence has a cluster point in S. Considering the case of Cauchy sequences, it is clear that the space is complete; it remains to show that it is totally bounded. Suppose not: then there must exist an ε > 0 for which no ε-net exists; in other words, there are no finite n and points {x_1,...,x_n} such that the spheres S(x_j,ε) cover S. We can accordingly construct a sequence {x_n} with d(x_j,x_k) ≥ ε for all j ≠ k; but letting n → ∞ in this case, we have found a sequence with no cluster point, which is again a contradiction. Hence, (b) implies (c).
Finally, let C be an arbitrary open cover of S. We assume that C contains no finite subcover of S, and obtain a contradiction. Since S is totally bounded it must possess for each n ≥ 1 a finite cover of the form

B_{ni} = S(x_{ni}, 1/2ⁿ), i = 1,...,k_n.  (5.14)
Fixing n, choose an i for which B_{ni} has no finite cover by C-sets (at least one such exists by hypothesis) and call this set D_n. For n > 1, {B_{ni} ∩ D_{n−1}, i = 1,...,k_n} is also a covering of D_{n−1}, and since D_{n−1} has no finite cover by C-sets, the D_n can be chosen so that D_n ∩ D_{n−1} is nonempty for every n. Choose a sequence of points {x_n}, with x_n ∈ D_n ∩ D_{n+1}. Since D_{n+1} is a ball of radius 1/2ⁿ⁺¹ containing both x_n and x_{n+1}, d(x_n,x_{n+1}) < 1/2ⁿ, and {x_n} is therefore a Cauchy sequence. Since S is complete, x_n → x for a point x ∈ S; and since C covers S, there exist a set A ∈ C and a δ > 0 such that S(x,δ) ⊆ A. Choosing n large enough that d(x_n,x) < δ/2 and 1/2ⁿ < δ/4 ensures that D_n ⊆ S(x,δ). But this means D_n ⊆ A, which is a contradiction, since D_n has no finite cover by C-sets. Hence C contains a finite subcover, and (c) implies (a). ∎

In complete spaces, the set properties of relative compactness and precompactness are identical. The following is the converse of 5.5.
5.13 Corollary In a complete metric space, a totally bounded set A is relatively compact.

Proof If S is complete, every Cauchy sequence in Ā has a limit in S, and all such points are closure points of A, and hence belong to Ā. The subspace (Ā,d) is therefore a complete space. It follows from 5.12 that if A is totally bounded, Ā is compact. ∎
5.3 Examples

The following cases are somewhat more remote from ordinary geometric intuition than the ones we looked at above.
5.14 Example In §12.3 and subsequently we shall encounter ℝ^∞, that is, infinite-dimensional Euclidean space. If x = (x_1,x_2,...) ∈ ℝ^∞, and y = (y_1,y_2,...) ∈ ℝ^∞ similarly, a metric for ℝ^∞ is given by

d_∞(x,y) = Σ_{k=1}^∞ 2^{-k} d_0(x_k,y_k),  (5.15)

where d_0 is defined in (5.1). Like d_0, d_∞ is a bounded metric, with d_∞(x,y) ≤ 1 for all x and y. ∎

5.15 Theorem (ℝ^∞,d_∞) is separable and complete.
Proof To show separability, consider the collection

A_m = {x = (x_1,x_2,...): x_k rational if k ≤ m, x_k = 0 otherwise}.  (5.16)

A_m is countable, and by 1.5 the collection A = {A_m, m = 1,2,...} is also countable. For any y ∈ ℝ^∞ and ε > 0, ∃ x ∈ A_m such that

d_∞(x,y) ≤ Σ_{k=1}^m 2^{-k} ε + Σ_{k=m+1}^∞ 2^{-k} d_0(0,y_k) ≤ ε + 2^{-m}.  (5.17)

Since the right-hand side can be made as small as desired by choice of ε and m, y is a closure point of A. Hence, A is dense in ℝ^∞.

To show completeness, suppose {x_n = (x_{1n},x_{2n},...), n ∈ ℕ} is a Cauchy sequence in ℝ^∞. Since d_0(x_{kn},x_{km}) ≤ 2^k d_∞(x_n,x_m) for any k, {x_{kn}, n ∈ ℕ} must be a Cauchy sequence in ℝ. Since

d_∞(x_n,x) ≤ Σ_{k=1}^m 2^{-k} d_0(x_{kn},x_k) + 2^{-m}  (5.18)

for all m, we can say that x_n → x = (x_1,x_2,...) ∈ ℝ^∞ iff x_{kn} → x_k for each k = 1,2,...; the completeness of ℝ implies that {x_n} has a limit in ℝ^∞. ∎
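A truncated evaluation of (5.15) is easy to sketch. Here d_0 is taken to be the bounded metric |a − b|/(1 + |a − b|) — equation (5.1) lies outside this excerpt, so treat that choice as an assumption — and a sequence is represented as a function of its coordinate index:

```python
def d0(a, b):
    # Assumed form of the bounded metric d0 of (5.1): |a-b| / (1 + |a-b|).
    return abs(a - b) / (1 + abs(a - b))

def d_inf(x, y, terms=60):
    # Truncation of (5.15); since d0 <= 1, the neglected tail is at most
    # 2**-terms, negligible here.
    return sum(2.0 ** -k * d0(x(k), y(k)) for k in range(1, terms + 1))

x = lambda k: 0.0        # the zero sequence
y = lambda k: float(k)   # y_k = k: unbounded, yet at finite distance

d = d_inf(x, y)
print(d < 1.0, d_inf(x, x) == 0.0)
```

Note that the coordinates of y diverge, yet d_∞(x,y) < 1: this is the sense in which d_∞ is a bounded metric.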
5.16 Example Consider the 'infinite-dimensional cube', [0,1]^∞: the Cartesian product of an infinite collection of unit intervals. The space ([0,1]^∞,d_∞) is separable, by 5.8. We can also endow [0,1]^∞ with the equivalent and in this case bounded metric

ρ_∞(x,y) = Σ_{k=1}^∞ 2^{-k} |x_k − y_k|.  (5.19)  ∎
In a metric space (S,d), where d can be assumed bounded without loss of generality, define the distance between a point x ∈ S and a subset A ⊆ S as d(x,A) = inf_{y∈A} d(x,y). Then, for a pair of subsets A,B of (S,d), define the function d_H: 2^S × 2^S → ℝ⁺, where 2^S is the power set of S, by

d_H(A,B) = max{ sup_{x∈B} d(x,A), sup_{y∈A} d(y,B) }.  (5.20)

d_H(A,B) is called the Hausdorff distance between sets A and B.
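For finite subsets of the real line, (5.20) can be computed directly, with minima standing in for the infima; a small sketch:

```python
def dist_to_set(x, A):
    # d(x, A) = inf over y in A of |x - y|; a min suffices for finite A.
    return min(abs(x - y) for y in A)

def d_H(A, B):
    # Hausdorff distance (5.20) for finite subsets of the real line.
    return max(max(dist_to_set(x, A) for x in B),
               max(dist_to_set(y, B) for y in A))

A = [0.0, 1.0]
B = [0.0, 0.5, 2.0]
print(d_H(A, B))  # 1.0: the point 2.0 of B lies at distance 1 from A
```

The two one-sided suprema differ here (every point of A is within 0.5 of B, but not conversely), which is why the outer maximum in (5.20) is needed for symmetry.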
5.17 Theorem Letting K_S denote the compact, nonempty subsets of S, (K_S,d_H) is a metric space.

Proof Clearly d_H satisfies 5.1(a). It satisfies 5.1(b) since the sets of K_S are closed, although note that d_H(A,Ā) = 0, so that d_H is only a pseudo-metric for general subsets of S. To show 5.1(c), note that for any x ∈ A and any z ∈ C we have, by definition of d(x,B) and the fact that d is a metric,

sup_{x∈A} d(x,B) ≤ sup_{x∈A} {d(x,z) + d(z,B)}.  (5.21)

Since C is compact, the infimum over C of the expression in braces on the right-hand side above is attained at a point z ∈ C. We can therefore write

sup_{x∈A} d(x,B) ≤ sup_{x∈A} inf_{z∈C} {d(x,z) + d(z,B)} ≤ sup_{x∈A} d(x,C) + sup_{z∈C} d(z,B),

and the same inequality holds with the roles of A and B interchanged; combining the two gives d_H(A,B) ≤ d_H(A,C) + d_H(C,B). ∎

5.4 Mappings

Consider a function f: S → T, where (S,d) and (T,ρ) are metric spaces. f is said to be continuous at x ∈ S if for each ε > 0 there exists δ > 0 such that

f(S_d(x,δ)) ⊆ S_ρ(f(x),ε).  (5.25)

5.19 Theorem f⁻¹(A) is open (closed) in S whenever A is open (closed) in T, iff f is continuous at all points of S.

Proof To show sufficiency, let A be an open subset of T, and consider any x ∈ f⁻¹(A); then S_ρ(f(x),ε) ⊆ A for some ε > 0. By 1.2(iv) and continuity at x,

S_d(x,δ) ⊆ f⁻¹(S_ρ(f(x),ε)) ⊆ f⁻¹(A),  (5.27)

so that f⁻¹(A) is open. If A is closed then Aᶜ is open, and by 1.2(iii) f⁻¹(A) = (f⁻¹(Aᶜ))ᶜ, which is closed if f⁻¹(Aᶜ) is open. This proves sufficiency. To prove necessity, suppose f⁻¹(A) is open in S whenever A is open in T, and in particular f⁻¹(S_ρ(f(x),ε)) for ε > 0 is open in S. Since x ∈ f⁻¹(S_ρ(f(x),ε)), there is a δ > 0 such that (5.25) holds. Use complements again for the case of closed sets. ∎
This property of inverse images under f provides an alternative characterization of continuity, and in topological spaces provides the primary definition of continuity. The notion of Borel measurability discussed in §3.6 extends naturally to mappings between pairs of metric spaces, and the theorem establishes that continuous transformations are Borel-measurable. The properties of functions on compact sets are of interest in a number of contexts. The essential results are as follows.
5.20 Theorem
The continuous image of a compact set is compact.
Proof We show that, if A ⊆ S is compact and f is continuous, then f(A) is compact. Let C be an open covering of f(A). Continuity of f means that the sets f⁻¹(B), B ∈ C, are open by 5.19, and their union covers A by 1.2(ii). Since A is compact, these sets contain a finite subcover, say f⁻¹(B_1),...,f⁻¹(B_m). It follows that

f(A) ⊆ f(⋃_{j=1}^m f⁻¹(B_j)) = ⋃_{j=1}^m f(f⁻¹(B_j)) ⊆ ⋃_{j=1}^m B_j,  (5.28)

where the equality is by 1.2(i) and the second inclusion by 1.2(v). Hence, B_1,...,B_m is a finite subcover of f(A) by C-sets. Since C is arbitrary, it follows that f(A) is compact. ∎
5.21 Theorem If f is continuous on a compact set, it is uniformly continuous on the set.

Proof Let A ⊆ S be compact, and choose ε > 0. For each x ∈ A, continuity of f at x means that there exists r_x > 0 such that ρ(f(x),f(y)) < ½ε whenever d(x,y) < 2r_x. The spheres S_d(x,r_x), x ∈ A, form a covering of A, and since A is compact they contain a finite subcover, say S_d(x_k,r_k), k = 1,...,m. Let δ = min_{1≤k≤m} r_k, and now consider a pair of points x,y ∈ A such that d(x,y) < δ. We have x ∈ S_d(x_k,r_k) for some k, so that ρ(f(x_k),f(x)) < ½ε, and also

d(x_k,y) ≤ d(x_k,x) + d(x,y) ≤ r_k + δ ≤ 2r_k,  (5.29)

using the triangle inequality, so that ρ(f(x_k),f(y)) < ½ε. Hence

ρ(f(x),f(y)) ≤ ρ(f(x),f(x_k)) + ρ(f(x_k),f(y)) < ε.  (5.30)

Since δ is independent of x and y, f is uniformly continuous on A. ∎
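The role of compactness in 5.21 can be seen numerically with f(x) = 1/x, which is continuous but not uniformly continuous on the non-compact (0,1]. The sketch below (grid construction and names are my own) approximates the oscillation of f over windows of width δ:

```python
def osc(f, a, b, delta, grid=400):
    # Largest |f(x) - f(y)| over grid pairs with |x - y| <= delta:
    # a discrete stand-in for the modulus of continuity on [a, b].
    xs = [a + (b - a) * i / grid for i in range(grid + 1)]
    return max(abs(f(x) - f(y))
               for i, x in enumerate(xs) for y in xs[i:] if y - x <= delta)

f = lambda x: 1.0 / x
big = osc(f, 0.001, 1.0, 0.01)   # blows up near 0: no uniform control
small = osc(f, 0.5, 1.0, 0.01)   # compact interval away from 0: controlled
print(big > 100.0, small < 0.1)
```

On [1/2,1] a single δ controls the oscillation everywhere, exactly as the finite-subcover argument in the proof predicts; on (0,1] no such δ exists.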
If f: S → T is 1-1 onto, and f and f⁻¹ are continuous, f is called a homeomorphism, and S and T are said to be homeomorphic if such a function exists. If S is homeomorphic with a subset of T, it is said to be embedded in T by f. If f also preserves distances, so that ρ(f(x),f(y)) = d(x,y) for each x,y ∈ S, it is called an isometry. Metrics d_1 and d_2 in a space S are equivalent if and only if the identity mapping from (S,d_1) to (S,d_2) (the mapping which takes each point of S into itself) is a homeomorphism.
5.22 Example If d_∞ and ρ_∞ are the metrics defined in (5.15) and (5.19) respectively, the mapping g: (ℝ^∞,d_∞) → ([0,1]^∞,ρ_∞), where g = (g_1,g_2,...) and

g_i(x) = 1/2 + x_i/(2(1 + |x_i|)),

is a homeomorphism. ∎

A function f: S → ℝ is said to be upper semicontinuous at x if for each ε > 0 ∃ δ > 0 such that, for y ∈ S, d(x,y) < δ implies f(y) < f(x) + ε. A function f satisfies a Lipschitz condition at x if there exists M > 0 such that, for each y ∈ S,

|f(y) − f(x)| ≤ M h(d(x,y)),  (5.33)

where h: ℝ⁺ → ℝ⁺ satisfies h(d) ↓ 0 as d ↓ 0. It satisfies a uniform Lipschitz condition if condition (5.33) holds uniformly, with fixed M, for all x ∈ S. The remarks following (2.9) apply equally here. Continuity is enforced by this condition with arbitrary h, and stronger smoothness conditions are obtained for special cases of h.
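The coordinate map of 5.22 can be checked directly: it is a strictly increasing bijection from ℝ onto (0,1), and its closed-form inverse (derived here, not quoted from the text) shows both directions to be continuous:

```python
def g(x):
    # The coordinate map of 5.22: g(x) = 1/2 + x / (2 (1 + |x|)),
    # a strictly increasing map of R onto (0, 1).
    return 0.5 + x / (2.0 * (1.0 + abs(x)))

def g_inv(u):
    # Inverse: with v = 2u - 1 in (-1, 1), x = v / (1 - |v|).
    v = 2.0 * u - 1.0
    return v / (1.0 - abs(v))

for x in (-1e6, -1.0, 0.0, 0.5, 1e6):
    # round trips agree up to floating-point error
    assert abs(g_inv(g(x)) - x) <= 1e-6 * max(1.0, abs(x))
print(g(0.0), 0.0 < g(-1e6) and g(1e6) < 1.0)
```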
Fig. 5.4
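As a concrete instance of the uniform Lipschitz condition (5.33), take h(d) = d and M = 1: the mean value theorem gives |sin y − sin x| ≤ |y − x| for all x, y, which the following sketch spot-checks at random points:

```python
import math
import random

# Spot-check of (5.33) for f = sin with h(d) = d and M = 1; the small
# additive tolerance guards against floating-point rounding only.
random.seed(0)
pairs = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(1000)]
ok = all(abs(math.sin(y) - math.sin(x)) <= abs(y - x) + 1e-12
         for x, y in pairs)
print(ok)
```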
5.5 Function Spaces

The non-Euclidean metric spaces met in later chapters are mostly spaces of real functions on an interval of ℝ. The elements of such spaces are graphs, subsets of ℝ². However, most of the relevant theory holds for functions whose domain is any metric space (S,d), and accordingly, it is this more general case that we will study. Let C_S denote the set of all bounded continuous functions f: S → ℝ, and define

d_U(f,g) = sup_{x∈S} |f(x) − g(x)|.  (5.34)

5.23 Theorem d_U is a metric.
Proof Conditions 5.1(a) and (b) are immediate. To prove the triangle inequality 5.1(c), write, given functions f, g, and h of C_S,

d_U(f,h) = sup_{x∈S} |f(x) − h(x)| ≤ sup_{x∈S} {|f(x) − g(x)| + |g(x) − h(x)|} ≤ d_U(f,g) + d_U(g,h).  (5.35)  ∎

To see that the limit f of a sequence {f_n} converging uniformly from C_S is itself an element of C_S, choose ε > 0; by continuity of f_n, for given x ∃ δ > 0 such that |f_n(x) − f_n(y)| < ⅓ε if d(x,y) < δ. Also, by uniform convergence, ∃ n large enough that

max{|f(x) − f_n(x)|, |f_n(y) − f(y)|} < ⅓ε,  (5.36)

so that |f(x) − f(y)| < ε. Hence f ∈ C_S.
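The uniform metric (5.34) can be approximated on C_[0,1] by maximizing over a fine grid (a lower bound on the supremum, and exact in this instance because |x − sin x| is increasing on [0,1], so the maximum falls on the grid point x = 1):

```python
import math

def d_U(f, g, grid=10000):
    # Grid approximation of the uniform metric (5.34) on C_[0,1].
    return max(abs(f(i / grid) - g(i / grid)) for i in range(grid + 1))

f = math.sin
g = lambda x: x
d = d_U(f, g)
print(abs(d - (1.0 - math.sin(1.0))) < 1e-12)  # sup attained at x = 1
```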
To show that we cannot expect this under more general circumstances, we exhibit a nonseparable function space.
5.27 Example For S = [a,b], an interval of the real line, consider the metric space (D_[a,b],d_U) of real, bounded cadlag functions of a real variable. Cadlag is a colourful French acronym (continue à droite, limites à gauche) describing functions of a real variable which may have discontinuities, but are right-continuous at every point, with the image of every decreasing sequence in [a,b] containing its limit point; in other words, there is a limit point to the left of every point. Of course, C_[a,b] ⊂ D_[a,b]. Functions with completely arbitrary discontinuities form a larger class still, but one that for most purposes is too unstructured to permit a useful theory.
To show that (D_[a,b],d_U) is not separable, consider the subset with elements

f_θ(t) = 0, t < θ;  1, t ≥ θ,

for each θ ∈ [a,b]. Since d_U(f_θ,f_θ′) = 1 whenever θ ≠ θ′, this is an uncountable discrete subset of the space, which accordingly is not separable, by 5.9. ∎

5.28 Theorem (Arzelà-Ascoli) Let S be a compact metric space. A set A ⊆ C_S is relatively compact iff it is bounded in the uniform metric and uniformly equicontinuous, where the modulus of continuity of f is

w(f,δ) = sup_{x∈S} sup_{y∈S(x,δ)} |f(y) − f(x)|.  (5.48)

Proof To prove 'if', fix ε > 0, and choose δ such that

sup_{f∈A} w(f,δ) < ε  (5.49)

(as is possible by uniform equicontinuity).
Boundedness of A under the uniform metric means that there exist finite real numbers U and L such that

inf_{f∈A} inf_{x∈S} f(x) ≥ L,  sup_{f∈A} sup_{x∈S} f(x) ≤ U.  (5.50)

Let (x_1,...,x_m) be a δ-net for S, and construct the finite family

D_m = {g_k, k = 1,...,(v+1)^m}  (5.51)

according to the recipe of 5.25, with the constants a_i taken from the finite set {L + (U − L)u/v, u = 0,...,v}, where u and v are integers with v exceeding (U − L)/ε. This set contains v + 1 real values between L and U which are less than ε apart, so that D_m has (v+1)^m members, as indicated. Since the assumptions imply A ⊆ C_S, it follows by 5.25 that for every f ∈ A there exists g_k ∈ D_m with d_U(f,g_k) < ε. This shows that D_m is an ε-net for A, and A is totally bounded.

To prove 'only if', suppose A is relatively compact, and hence totally bounded. Trivially, total boundedness implies boundedness, and it remains to show uniform equicontinuity. Consider for ε > 0 the set
B_k(ε) = {f: w(f,1/k) < ε}.  (5.52)
Uniform equicontinuity of A is the condition that, for any ε > 0, there exists k large enough that Ā ⊆ B_k(ε). It is easily verified that

|w(f,δ) − w(g,δ)| ≤ 2 d_U(f,g),  (5.53)

so that the function w(·,δ): (C_S,d_U) → (ℝ⁺,d_E) is continuous. B_k(ε) is the inverse image under w(·,1/k) of the half-line [0,ε), which is open in ℝ⁺, and hence B_k(ε) is open by 5.19. By definition of C_S, w(f,1/k) → 0 as k → ∞ for each f ∈ C_S. In other words, w converges to 0 pointwise on C_S, which implies that the collection {B_k(ε), k ∈ ℕ} must be an open covering for C_S, and hence for Ā. But by hypothesis Ā is compact, so every such covering of Ā has a finite subcover, and so Ā ⊆ B_k(ε) for finite k, as required. ∎
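The modulus of continuity w(f,δ) of (5.48) can be discretized on S = [0,1]; for a member of C_S it falls towards zero as δ shrinks, which is the pointwise convergence invoked at the end of the proof. A sketch, with the illustrative choice f(x) = x²:

```python
def w(f, delta, grid=400):
    # Discretized modulus of continuity (5.48) on S = [0, 1]:
    # sup of |f(y) - f(x)| over grid pairs with |x - y| < delta.
    xs = [i / grid for i in range(grid + 1)]
    return max(abs(f(y) - f(x))
               for i, x in enumerate(xs) for y in xs[i:] if y - x < delta)

f = lambda x: x * x  # an illustrative element of C_[0,1]
vals = [w(f, 1.0 / k) for k in (1, 2, 4, 8, 16)]
print([round(v, 3) for v in vals])  # shrinks as delta -> 0
```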
6 Topology
6.1 Topological Spaces

Metric spaces form a subclass of a larger class of mathematical objects called topological spaces. These do not have a distance defined upon them, but the concepts of open set, neighbourhood, and continuous mapping are still well defined. Even though only metric spaces are encountered in the sequel (Part VI), much of the reasoning is essentially topological in character. An appreciation of the topological underpinnings is essential for getting to grips with the theory of
weak convergence.

6.1 Definition A topological space (X,τ) is a set X on which is defined a topology τ, a class of subsets called open sets having the following properties:
(a) X ∈ τ, ∅ ∈ τ.
(b) If C ⊆ τ, then ⋃_{O∈C} O ∈ τ.
(c) If O_1, O_2 ∈ τ, then O_1 ∩ O_2 ∈ τ. ∎

These three conditions define an open set, so that openness becomes a primitive concept, of which the notion of ε-spheres around points is only one characterization. A metric induces a topology on a space because it is one way (though not the only way) of defining what an open set is, and all metric spaces are also topological spaces. On the other hand, some topological spaces may be made into metric spaces by defining a metric on them under which the sets of τ are open in the sense defined in §5.1. Such spaces are called metrizable.

A subset of a topological space (X,τ) has a topology naturally induced on it by the parent space. If A ⊆ X, the collection τ_A = {A ∩ O: O ∈ τ} is called the relative topology for A. (A,τ_A) would normally be referred to as a subspace of X.

If two topologies τ_1 and τ_2 are defined on a space and τ_1 ⊂ τ_2, then τ_1 is said to be coarser, or weaker, than τ_2, whereas τ_2 is finer (stronger) than τ_1. In particular, the power set of X is a topology, called the discrete topology, whereas {∅,X} is called the trivial topology. Two metrics define the same topology on a space if and only if they are equivalent: if two points are close in one space, their images in the other space must be correspondingly close.

If a set O is open, its complement Oᶜ in X is said to be closed. The closure Ā of an arbitrary set A ⊆ X is the intersection of all the closed sets containing A. As for metric spaces, a set A ⊆ B, for B ⊆ X, is said to be dense in B if B ⊆ Ā.
6.2 Theorem The intersection of any collection of closed sets is closed. X and ∅ are both open and closed. ∎
However, an arbitrary union of closed sets need not be closed, just as an arbitrary intersection of open sets need not be open. For given x ∈ X, a collection V_x of open sets is called a base for the point x if for every open O containing x there is a set B ∈ V_x such that x ∈ B and B ⊆ O. This is the generalization to topological spaces of the idea of a system of neighbourhoods or spheres in a metric space. A base for the topology τ on X is a collection V of sets such that, for every O ∈ τ and every x ∈ O, there exists B ∈ V such that x ∈ B ⊆ O. The definition implies that any open set can be expressed as the union of sets from the base of the topology; a topology may be defined for a space by specifying a base collection, and letting the open sets be defined as the unions and finite intersections of the base sets. In the case of ℝ, for example, the open intervals form a base.
6.3 Theorem A collection V is a base for a topology τ on X iff
(a) ⋃_{B∈V} B = X;
(b) for all B_1,B_2 ∈ V and x ∈ B_1 ∩ B_2, ∃ B_3 ∈ V such that x ∈ B_3 ⊆ B_1 ∩ B_2.

Proof Necessity of these conditions follows from the definitions of base and open set. For sufficiency, define a collection τ in terms of the base V, as follows:

O ∈ τ iff, for each x ∈ O, ∃ B ∈ V such that x ∈ B ⊆ O.  (6.1)

∅ satisfies the condition in (6.1), and X satisfies it given condition (a) of the theorem. If C is a collection of τ-sets, ⋃_{O∈C} O ∈ τ, since (6.1) holds in this case in respect of a base set B corresponding to any set in C which contains x. And if O_1, O_2 ∈ τ and x ∈ O_1 ∩ O_2, then, letting B_1 and B_2 be the base sets specified in (6.1) in respect of x and O_1 and O_2 respectively, condition (b) implies that x ∈ B_3 ⊆ O_1 ∩ O_2, which shows that τ is closed under finite intersections. Hence, τ is a topology for X. ∎
.x
.x
6.2 Countability and Compactness

The countability axioms provide one classification of topological spaces according, roughly speaking, to their degree of structure and amenability to the methods of analysis. A topological space is said to satisfy the first axiom of countability (to be first-countable) if every point of the space has a countable base. It satisfies the second axiom of countability (is second-countable) if the space as a whole has a countable base. Every metric space is first-countable, in view of the existence of the countable base composed of open spheres S(x,1/n), n ∈ ℕ, for each x.
More generally, sequences in first-countable spaces tend to behave in a similar manner to those in metric spaces, as the following theorem illustrates.
6.4 Theorem In a first-countable space, x is a cluster point of a sequence {x_n, n ∈ ℕ} iff there is a subsequence {x_{n_k}, k ∈ ℕ} converging to x.

Proof Sufficiency is immediate. For necessity, the definition of a cluster point implies that ∃ n ≥ N such that x_n ∈ O, for every open O containing x and every N ≥ 1. Let the countable base of x be the collection V_x = {B_i, i ∈ ℕ}, and choose a monotone sequence of base sets {A_k, k ∈ ℕ} containing x (and hence nonempty) with A_1 = B_1, and A_k ⊆ A_{k-1} ∩ B_k for k = 2,3,...; this is always possible by 6.3. Since x is a cluster point, we may construct an infinite subsequence by taking x_{n_k} as the next member of the sequence contained in A_k, for k = 1,2,... For every open set O containing x, ∃ N ≥ 1 such that x_{n_k} ∈ A_k ⊆ O for all k ≥ N, and hence x_{n_k} → x as k → ∞, as required. ∎
The point of quoting a result such as this has less to do with demonstrating a new property than with reminding us of the need for caution in assuming properties we take for granted in metric spaces. While the intuition derived from ℝ-like situations might lead us to suppose that the existence of a cluster point and a convergent subsequence amount to the same thing, this need not be true unless we can establish first-countability.

A topological space is said to be separable if it contains a countable dense subset. Second-countable spaces are separable. This fact follows directly on taking a point from each set in a countable base, and verifying that these points are dense in the space. The converse is not generally true, but it is true for metric spaces, where separability, second-countability, and the Lindelöf property (that every open cover of X has a countable subcover) are all equivalent to one another. This is just what we showed in 5.6. More generally, we can say the following.
6.5 Theorem A second-countable space is both separable and Lindelöf.
Proof The proof of separability is in the text above. To prove the Lindelöf property, let C be an open cover of X, such that ∪_{A∈C} A = X. For each A ∈ C and x ∈ A, we can find a base set B_i such that x ∈ B_i ⊆ A. Since ∪_{i=1}^∞ B_i = X, we may choose a countable subcollection {A_i, i = 1,2,...} such that B_i ⊆ A_i for each i, and hence ∪_{i=1}^∞ A_i = X. ∎
A topological space is said to be compact if every covering of the space by open sets has a finite subcover. It is said to be countably compact if each countable covering has a finite subcovering. And it is said to be sequentially compact if each sequence on the space has a convergent subsequence. Sometimes, compactness is more conveniently characterized in terms of the complements. The complements of an open cover of X are a collection of closed sets whose intersection is empty; if and only if X is compact, every such collection must have a finite subcollection with empty intersection. An equivalent way to state this proposition is in terms of the converse implication. A collection of closed sets is said to have the finite intersection property if no finite subcollection has an empty intersection. Thus:

6.6 Theorem X is compact (countably compact) if and only if no collection (countable collection) of closed sets having the finite intersection property has an empty intersection. ∎

The following pair of theorems summarize important relationships between the different varieties of compactness.
6.7 Theorem A first-countable space X is countably compact iff it is sequentially compact.

Proof Let the space be countably compact. Let {x_n, n ∈ ℕ} be a sequence in X, and define the sets B_n = {x_n, x_{n+1},...}, n = 1,2,... The collection of closed sets {B̄_n, n ∈ ℕ} clearly possesses the finite intersection property, and hence ∩_n B̄_n is nonempty by 6.6, which is another way of saying that {x_n} has a cluster point. Since the sequence is arbitrary, sequential compactness follows by 6.4. This proves necessity.

For sufficiency, 6.4 implies that under sequential compactness, all sequences in X have a cluster point. Let {C_i, i ∈ ℕ} be a countable collection of closed sets having the finite intersection property, such that A_n = ∩_{i=1}^n C_i ≠ ∅ for every finite n. Consider a sequence {x_n} chosen such that x_n ∈ A_n, and note, since {A_n} is monotone, that x_n ∈ A_m for all n ≥ m; or in other words, A_m contains the sequence {x_n, n ≥ m}. Since {x_n} has a cluster point x and A_m is closed, x ∈ A_m. This is true for every m ∈ ℕ, so that ∩_{i=1}^∞ C_i is nonempty, and X is countably compact by 6.6. ∎
6.8 Theorem A metric space (S,d) is countably compact iff it is compact.

Proof Sufficiency is immediate. For necessity, we show first that if S is countably compact, it is separable. A metric space is first-countable, hence countable compactness implies sequential compactness (6.7), which in turn implies that every sequence in S has a cluster point (6.4). This must mean that for any ε > 0 there exists a finite ε-net {x_1,...,x_m} such that, for all x ∈ S, d(x,x_k) < ε for some k ∈ {1,...,m}; for otherwise, we can construct an infinite sequence {x_n} with d(x_n,x_n') ≥ ε for n ≠ n', contradicting the existence of a cluster point. Thus, for each n ∈ ℕ there is a finite collection of points A_n such that, for every x ∈ S, d(x,y) < 2⁻ⁿ for some y ∈ A_n. The set D = ∪_{n=1}^∞ A_n is countable and dense in S, and S is separable.

Separability in a metric space is equivalent by 5.6 to the Lindelöf property, that every open cover of S has a countable subcover; but countable compactness implies that this countable subcover has a finite subcover in its turn, so that compactness is proved. ∎
Like separability and compactness, the notion of a continuous mapping may be defined in terms of a distance measure, but is really topological in character. In a pair of topological spaces X and Y, the mapping f: X ↦ Y is said to be continuous if f⁻¹(B) is open in X when B is open in Y, and closed in X when B is closed in Y. That in metric spaces this definition is equivalent to the more familiar one in terms of ε- and δ-neighbourhoods follows from 5.19. The concepts of homeomorphism and embedding, as mappings that are respectively onto or into, and 1-1 continuous with continuous inverse, remain well defined. The following theorem gives two important properties of continuous maps.
6.9 Theorem Suppose there exists a continuous mapping f from a topological space X onto another space Y.
(i) If X is separable, Y is separable.
(ii) If X is compact, Y is compact.

Proof (i) The problem is to exhibit a countable, dense subset of Y. Consider f(D), where D is dense in X. If f(D)⁻ is the closure of f(D), the inverse image f⁻¹(f(D)⁻) is closed by continuity of f, and contains f⁻¹(f(D)), and hence also contains D by 1.2(iv). Since D̄ is the smallest closed set containing D, and D̄ = X, it follows that X ⊆ f⁻¹(f(D)⁻). But since the mapping is onto, Y = f(X) ⊆ f(f⁻¹(f(D)⁻)) ⊆ f(D)⁻, where the last inclusion is by 1.2(v). f(D) is therefore dense in Y, as required. f(D) is countable if D is countable, and the conclusion follows directly.
(ii) Let C be an open cover of Y. Then {f⁻¹(B): B ∈ C} must be an open cover of X by the definition. The compactness of X means that it contains a finite subcover, say {f⁻¹(B_1),...,f⁻¹(B_n)}, such that

Y = f(X) = f(∪_{j=1}^n f⁻¹(B_j)) = ∪_{j=1}^n f(f⁻¹(B_j)) ⊆ ∪_{j=1}^n B_j,   (6.2)

where the third equality uses 1.2(ii) and the inclusion, 1.2(v). Hence C contains a finite subcover. ∎

Note the importance of the stipulation 'onto' in both these results. The extension of (ii) to the case of compact subsets of X and Y is obvious, and can be supplied by the reader.

Completeness, unlike separability, compactness, and continuity, is not a topological property. To define a Cauchy sequence it is necessary to have the concept of a distance between points. One of the advantages of defining a metric on a space is that the relatively weak notion of completeness provides some of the essential features of compactness in a wider class than the compact spaces.
Normality implies the existence of an open set U_{1/2} (say) such that U_{1/2} contains A and (U_{1/2})ᶜ contains B. The same story can be told with U_{1/2} replacing B in the role of C_1 to define U_{1/4}, and then again with U_{1/2} replacing A in the role of C_1 to define U_{3/4}. The argument extends by induction to generate sets {U_{m/2ⁿ}, m = 1,...,2ⁿ − 1} for any n ∈ ℕ, and the collection {U_r, r ∈ D} is obtained on letting n → ∞. It is easy to verify conditions (6.3)-(6.5) for this collection. Fig. 6.1 illustrates the construction for n = 3 when A and B are regions of the plane. One must imagine countably many more 'layers of the onion' in the limiting case.
π_X(x,y) = x,   (6.11)

π_Y(x,y) = y.   (6.12)

If X and Y are topological spaces, the coordinate projections can be used to generate a new topology on the product space. The product topology on X × Y is the weak topology induced by the coordinate projections. The underlying idea here is very simple. If A ⊆ X and B ⊆ Y are open sets, the set A × B = (A × Y) ∩ (X × B), where A × Y = π_X⁻¹(A) and X × B = π_Y⁻¹(B), will be regarded as open in X × Y, and is called an open rectangle of X × Y. The product topology on X × Y is the one having the open rectangles as a base. This means that two points (x_1,y_1) and (x_2,y_2) are close in X × Y provided x_1 is close to x_2 in X, and y_1 to y_2 in Y. Equivalently, it is the weakest topology under which the coordinate projections are continuous.

If the factors are metric spaces (X,d_X) and (Y,d_Y), several metrics can be constructed to induce the product topology on X × Y, including
ρ((x_1,y_1),(x_2,y_2)) = max{d_X(x_1,x_2), d_Y(y_1,y_2)}   (6.13)

and

ρ′((x_1,y_1),(x_2,y_2)) = d_X(x_1,x_2) + d_Y(y_1,y_2).   (6.14)

An open sphere in the space (X × Y, ρ), where ρ is the metric in (6.13), happens also to be an open rectangle, for

S_ρ((x,y),ε) = S_{d_X}(x,ε) × S_{d_Y}(y,ε),   (6.15)
but of course, this is not true for every metric. Since either X or Y may be a product space, the generalization of these results from two to any finite number of factors is straightforward. The generic element of the space ⨉_{i=1}^n X_i is the n-tuple (x_1,...,x_n: x_i ∈ X_i), and so on. But to deal with infinite collections of factor spaces, as we shall wish to do below, it is necessary to approach the product from a slightly different viewpoint.

Let A denote an arbitrary index set, and {X_α, α ∈ A} a collection of spaces indexed by A. The Cartesian product X = ⨉_{α∈A} X_α is the collection of all the mappings x: A ↦ ∪_{α∈A} X_α such that x(α) ∈ X_α for each α ∈ A. This definition contains that given in §1.1 as a special case, but is fundamentally more general in character. The coordinate projections are the mappings π_α: X ↦ X_α with
π_α(x) = x(α),   (6.16)
but these can also be defined as the images under x of the points α ∈ A. Thus, a point in the product space is a mapping, the one which generates the coordinate projections when it is evaluated at points of the domain A. In the case of a finite product, A can be the integers 1,...,n. In a countable case A = ℕ, or a set equipotent with ℕ, and we should call x an infinite sequence, an element of X^∞ (say). A familiar uncountable example is provided by a class of real-valued functions x: ℝ ↦ ℝ, so that A = ℝ. In this case, x associates each point α ∈ ℝ with a real number x(α), and defines an element of the product ℝ^ℝ.

The product topology is now generalized as follows. Let {X_α, α ∈ A} be an arbitrary collection of topological spaces. The Tychonoff topology (product topology) on the space X = ⨉_{α∈A} X_α has as base the finite-dimensional open rectangles, sets of the form ⨉_{α∈A} O_α, where the O_α ⊆ X_α are open sets and O_α = X_α except for at most a finite number of coordinates. These basic sets can be written as the intersections of finite collections of cylinders, say
B = π_{α_1}⁻¹(O_{α_1}) ∩ ... ∩ π_{α_k}⁻¹(O_{α_k}),   (6.17)
for indices α_1,...,α_k ∈ A. Let τ be a topology under which the coordinate projections are continuous. If O_α is open in X_α, then π_α⁻¹(O_α) ∈ τ, and hence τ contains the Tychonoff topology. Since this is true for any such τ, we can characterize the Tychonoff topology as the weak topology generated by the coordinate projections. The sets π_α⁻¹(O_α) form the sub-base for the topology, whose finite intersections yield the base sets.

Something to keep in mind in these infinite product spaces is that, if any of the sets X_α are empty, X is empty. Some of our results are true only for nonempty spaces, so for full rigour the stipulation that elements exist is desirable.
6.15 Example The space (C,d_U) examined in §5.5 is an uncountable product space having the Tychonoff topology; the uniform metric is the generalization of the maximum metric ρ of (6.13). Continuous functions are regarded as close to one another under d_U only if they are close at every point of the domain. The subsequent usefulness of this characterization of (C,d_U) stems mainly from the fact that the coordinate projections are known to be continuous. ∎

The two essential theorems on product spaces extend separability and compactness from the factor spaces to the product. The following theorem has a generalization to uncountable products, which we shall not pursue since this is harder to prove, and the countable case is sufficient for our purposes.
6.16 Theorem Finite or countable product spaces are separable under the product topology iff the factor spaces are separable.

Proof The proof for finite products is an easy implication of the countable case, hence consider X^∞ = ⨉_{i=1}^∞ X_i. Let D_i = {d_{i1}, d_{i2},...} ⊆ X_i be a countable dense set for each i, and construct a set D by defining
F_m = ⨉_{i=1}^m D_i × ⨉_{i=m+1}^∞ {d_{i1}}   (6.18)

for m = 1,2,..., and then letting D = ∪_{m=1}^∞ F_m. F_m is equipotent with the set of m-tuples formed from the elements of the countable D_1,...,D_m, and is countable by induction from 1.4. Hence D is countable, as a countable union of countable sets. We will show that D is dense in X^∞. Let B = ⨉_{i=1}^∞ O_i be a non-empty basic set, with O_i open in X_i and O_i = X_i except for a finite number of coordinates. Choose m such that O_i = X_i for i > m, and then

B ∩ F_m = ⨉_{i=1}^m (O_i ∩ D_i) × ⨉_{i=m+1}^∞ {d_{i1}} ≠ ∅,   (6.19)

recalling that the dense property implies O_i ∩ D_i ≠ ∅, for i = 1,...,m. Since B ∩ F_m ⊆ B ∩ D, it follows that B contains a point of D; and since B is an arbitrary basic set, D is dense in X^∞ as required. ∎
One of the most powerful and important results in topology is Tychonoff's theorem, which states that arbitrary products of compact topological spaces are also compact, under the product topology. It will suffice here to prove the result for countable products of metric spaces, and this case can be dealt with using a more elementary and familiar line of argument. It is not necessary to specify the metrics involved, for we need the spaces to be metric solely to exploit the equivalence of compactness and sequential compactness.
6.17 Theorem A finite or countable product of metric spaces (X_i,d_i) is compact under the product topology iff the factor spaces are compact.

Proof As before, the finite case follows easily from the countable case, so assume X^∞ = ⨉_{i=1}^∞ X_i, where the X_i are compact spaces. In a metric space, which is first-countable, compactness implies separability and is equivalent to sequential compactness by 6.8 and 6.7. Since X_i is sequentially compact and first-countable, every sequence {x_{in}, n ∈ ℕ} on X_i has a cluster point x_i on the space (6.4). Applying the diagonal argument of 2.36, there exists a single subsequence of integers, {n_k, k ∈ ℕ}, such that x_{in_k} → x_i for every i. Consider the subsequence {x_{n_k}, k ∈ ℕ}, where x_{n_k} = (x_{1n_k}, x_{2n_k},...). In the product topology, x_{n_k} → x = (x_1, x_2,...) in X^∞ iff x_{in_k} → x_i for every i, which proves that X^∞ is sequentially compact.

X^∞ can be endowed with the metric ρ_∞ = Σ_{i=1}^∞ 2⁻ⁱ d_i/(1 + d_i), which induces the product topology. X^∞ is separable by 6.16, and sequential compactness is equivalent to compactness by 6.8 and 6.7, as above. This proves sufficiency. Necessity follows from 6.9(ii), by continuity of the projections as before. ∎
6.18 Example The space ℝ^∞ (see 5.14) is endowed with the Tychonoff topology if we take as the base sets of a point x the collection

N(x; k, ε) = {y: |x_i − y_i| < ε, i = 1,...,k}, k ∈ ℕ, ε > 0.   (6.20)
A point in ℝ^∞ is close to x in this topology if many of its coordinates are close to those of x; another point is closer if either more coordinates are within ε of each other, or the same coordinates are closer than ε, or both. The metric d_∞ defined in (5.15) induces the topology of (6.20). If {x_n} is a sequence in X, d_∞(x_n,x) → 0 iff ∀ ε,k ∃ N ≥ 1 such that x_n ∈ N(x; k, ε) for all n ≥ N. We already know that ℝ^∞ is separable under d_∞ (5.15), but now we can deduce this as a purely topological property, since ℝ^∞ inherits separability from ℝ by 6.16. ∎
The infinite cube [0,1]^∞ shares the topology (6.20) with ℝ^∞, and is a compact space by 6.17; to show this we can assign the Euclidean metric to the factor spaces [0,1]. The trick of metrizing a space to establish a topological property is frequently useful, and is one we shall exploit again below.
6.6 Embedding and Metrization
Let X be a topological space, and F a class of functions f: X ↦ Y_f. The evaluation map e: X ↦ ⨉_{f∈F} Y_f is the mapping defined by

e(x) = (f(x), f ∈ F).   (6.21)
The class F may be quite general, but if it were finite we would think of e(x) as the vector whose elements are the f(x), f ∈ F. (6.21) could also be written π_f ∘ e = f, where π_f is the coordinate projection. A minor complication arises because e need not be onto ⨉_{f∈F} Y_f, and e(X) ⊂ ⨉_{f∈F} Y_f is possible. If A is a set of points in Y_f, the inverse projection π_f⁻¹(A) may contain points not in e(X). We therefore need to express the inverse image of A under f, in terms of e, as

f⁻¹(A) = (π_f ∘ e)⁻¹(A) = e⁻¹(π_f⁻¹(A) ∩ e(X)).   (6.22)

The importance of this concept stems from the fact that, under the right conditions, the evaluation map embeds X in the product space generated by it. It would be homeomorphic to it in the case e(X) = ⨉_{f∈F} Y_f.
6.19 Theorem Suppose the class F separates points of X, meaning that f(x) ≠ f(y) for some f ∈ F whenever x and y are distinct points of X. If X is endowed with the weak topology induced by F, the evaluation map defines an embedding of X into ⨉_{f∈F} Y_f.
Proof It has to be shown that e is a 1-1 mapping from X onto a subset of ⨉_{f∈F} Y_f, which is continuous with continuous inverse. Since F separates points of X, e is 1-1, for e(x) ≠ e(y) whenever f(x) ≠ f(y) for some f ∈ F. To show continuity of e, note first that f⁻¹(A) is open in X whenever A is open in Y_f, under the weak topology, and sets of the form π_f⁻¹(A) are open in ⨉_{f∈F} Y_f with the product topology, since the projections are continuous. Since e⁻¹(π_f⁻¹(A)) = (π_f ∘ e)⁻¹(A) = f⁻¹(A), the inverse images under e of sets of the form π_f⁻¹(A), A ⊆ Y_f open, are open. Since inverse images preserve unions and intersections (see 1.2), the same property extends to the sub-base sets of ⨉_{f∈F} Y_f, to the finite intersections of these forming the base of the product topology, and
represent the contingent probability to be assigned to B after drawing an event A from G, where G ⊆ F. We can think of G as an information set in the sense that, for each A ∈ G, an observer knows whether or not the outcome is in A. Since the elements of the domain are random events, we must think of P(B|G) as itself a random outcome (a random variable, in the terminology of Chapter 8) derived from the restricted probability space (Ω,G,P). We may think of this space as a model of the action of an observer possessing information G, who assigns the conditional probability P(B|A) to B when he observes the occurrence of A, viewed from the standpoint of another observer who has no prior information.

G is a σ-field, because if we know an outcome is in A we also know it is not in Aᶜ, and if we know whether or not it is in A_j for each j = 1,2,3,..., we know whether or not it is in ∪_j A_j. The more sets there are in G, the larger the volume of information, all the way from the trivial field {Ω,∅} (complete ignorance, with P(B|{Ω,∅}) = P(B) a.s.) to the field F itself, which corresponds to almost sure knowledge of the outcome. In the latter case, P(B|F)(ω) = 1 a.s. if ω ∈ B, and 0 otherwise. If you know whether or not ω ∈ A for every A ∈ F, you effectively know ω.
7.3 Independence
A pair of events A, B ∈ F is said to be independent if P(A ∩ B) = P(A)P(B), or, equivalently, if

P(B|A) = P(B).   (7.4)

If, in a collection of events C, (7.4) holds for every pair of distinct sets A and B from the collection, C is said to be pairwise independent. In addition, C is said to be totally independent if

P(∩_{A∈B'} A) = ∏_{A∈B'} P(A)   (7.5)

for every subset B' ⊆ C containing two or more events. This is a stronger condition than pairwise independence. Suppose C consists of sets A, B, and C. Knowing that B has occurred may not influence the probability we attach to A, and similarly for C; but the joint occurrence of B and C may none the less imply something about A. Pairwise independence implies that P(A ∩ B) = P(A)P(B), P(A ∩ C) = P(A)P(C), and P(B ∩ C) = P(B)P(C), but total independence would also require P(A ∩ B ∩ C) = P(A)P(B)P(C).
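The distinction is easy to check on a finite sample space. The following sketch (the two-coin example is our own illustration, not the text's) exhibits events A, B, C that are pairwise independent but not totally independent:

```python
from itertools import product
from fractions import Fraction

# Sample space: two fair coin flips, each outcome with probability 1/4.
omega = list(product("HT", repeat=2))
prob = {w: Fraction(1, 4) for w in omega}

def P(event):
    return sum(prob[w] for w in event)

A = {w for w in omega if w[0] == "H"}   # first flip is heads
B = {w for w in omega if w[1] == "H"}   # second flip is heads
C = {w for w in omega if w[0] == w[1]}  # the two flips agree

# Pairwise independence: P(X ∩ Y) = P(X)P(Y) for every pair.
for X, Y in [(A, B), (A, C), (B, C)]:
    assert P(X & Y) == P(X) * P(Y)

# Total independence fails: P(A ∩ B ∩ C) = 1/4, but P(A)P(B)P(C) = 1/8.
print(P(A & B & C), P(A) * P(B) * P(C))  # → 1/4 1/8
```

Knowing B or C separately tells us nothing about A, but knowing both B and C together pins A down completely.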
Here are two useful facts about independent events. In each theorem let C be a totally independent collection, satisfying P(∩_{A∈B'} A) = ∏_{A∈B'} P(A) for each subset B' ⊆ C.

7.5 Theorem The collection C' which contains A and Aᶜ for each A ∈ C is totally independent.
Proof It is sufficient to prove that the independence of A and B implies that of Aᶜ and B, for B can denote any arbitrary intersection of sets from the collection, and (7.5) will then be satisfied for either A or Aᶜ. This is certainly true, since if P(A ∩ B) = P(A)P(B), then

P(Aᶜ ∩ B) = P(B) − P(A ∩ B) = P(B) − P(A)P(B) = P(Aᶜ)P(B).   (7.6)  ∎
7.6 Theorem Let B = {B_j} be a countable disjoint collection, and let the collections consisting of B_j and the sets of C be totally independent for each j. Then, if B = ∪_j B_j, the collection consisting of B and C is also totally independent.

Proof Let B' be any subset of C. Using the disjointness of the sets of B, and countable additivity,

P(B ∩ (∩_{A∈B'} A)) = P((∪_j B_j) ∩ (∩_{A∈B'} A)) = Σ_j P(B_j ∩ (∩_{A∈B'} A)) = Σ_j P(B_j) ∏_{A∈B'} P(A) = P(B) ∏_{A∈B'} P(A). ∎

7.4 Product Spaces
Questions of dependence and independence arise when multiple random experiments run in parallel, and product spaces play a natural role in the analysis of these issues. Let (Ω × Ξ, F ⊗ G, P) be a probability space, where F ⊗ G is the σ-field generated by the measurable rectangles of Ω × Ξ, and P(Ω × Ξ) = 1. The random outcome is a pair (ω,ξ). This is no more than a case of the general theory of §7.1 (where the nature of ω is unspecified) except that it becomes possible to ask questions about the part of the outcome represented by ω or ξ alone. P_Ω(F) = P(F × Ξ) for F ∈ F, and P_Ξ(G) = P(Ω × G) for G ∈ G, are called the marginal probabilities. (Ω,F,P_Ω) and (Ξ,G,P_Ξ) are probability spaces representing an incompletely observed random experiment, with ω or ξ, respectively, being the only thing observed in an experiment generating (ω,ξ).

On the other hand, suppose we observe ξ and subsequently consider the experiment of observing ω. Knowing ξ means that for each Ω × G we know whether or not (ω,ξ) is in it. The conditional probabilities generated by this two-stage experiment can be written, by a slight abuse of notation, as P(F|G), although strictly speaking the relevant events are the cylinders F × Ξ, and the elements of the conditioning σ-field are Ω × G for G ∈ G. Conditioning statements then take the form P(F × Ξ | Ω × G).

In this context, product measure assumes a special role as the model of independence. In (Ω × Ξ, F ⊗ G, P), the coordinate spaces are said to be independent when

P(F × G) = P_Ω(F)P_Ξ(G) for each F ∈ F and G ∈ G.
Unity of the notation is preserved since F × G = (F × Ξ) ∩ (Ω × G). We can also write P(F × Ξ | Ω × G) = P_Ω(F), or with a further slight abuse of notation P(F|G) = P_Ω(F), for any pair F ∈ F and G ∈ G. Independence means that knowing ξ does not affect the probabilities assigned to sets of F. Since the measurable rectangles are a determining class for the space, the p.m. P is entirely determined by the marginal measures.
8 Random Variables
8.1 Measures on the Line
Let (Ω,F,P) be a probability space. A real random variable (r.v.) is an F/B-measurable function X: Ω ↦ ℝ. That is to say, X(ω) induces an inverse mapping from B to F such that X⁻¹(B) ∈ F for every B ∈ B, where B is the linear Borel field. The term 'F-measurable r.v.' may be used when the role of B is understood. The symbol μ will generally be used to denote a p.m. on the line, reserving P for the p.m. on the underlying space. Random variables therefore live in the space (ℝ,B,μ), where μ is the derived measure such that μ(B) = P(X⁻¹(B)) = P(X ∈ B). The term distribution is synonymous with measure in this context. The properties of r.v.s are special cases of the results in Chapter 3; in particular, the contents of §3.6 should be reviewed in conjunction with this chapter. If g: ℝ ↦ ℝ is a Borel function, then g∘X is also a r.v., having derived p.m. μg⁻¹ according to 3.21.

If there is a set S ∈ B having the property μ(S) = 1, the trace of (ℝ,B,μ) on S is equivalent to the original space in the sense that the same measure is assigned to B and to B ∩ S, for each B ∈ B. Which space to work with is basically a matter of technical convenience. If X is a r.v., it may be more satisfactory to say that the Borel function X² is a r.v. distributed on ℝ₊ than that it is distributed on ℝ but takes values in ℝ₊ almost surely. One could substitute for (ℝ,B,μ) the extended space (ℝ̄,B̄,μ) (see 1.22), but note that assigning a positive probability to infinity does not lead to meaningful results. Random variables must be finite with probability 1. Thus (ℝ,B,μ), the trace of (ℝ̄,B̄,μ) on ℝ, is equivalent to it for nearly all purposes. However, while it is always finite a.s., a r.v. is not necessarily bounded a.s.; there may exist no constant B such that |X(ω)| ≤ B for all ω ∈ C, with P(Ω − C) = 0. The essential supremum of X is

ess sup X = inf{x: P(|X| > x) = 0},

and this may be either a finite number, or +∞.
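For a discrete distribution the definition reduces to the largest absolute value carrying positive probability; a minimal sketch (the distributions are our own illustrations):

```python
def ess_sup(pmf):
    # ess sup X = inf{x : P(|X| > x) = 0}; for a discrete distribution
    # given as {value: probability}, this is the largest |value| that
    # carries positive probability.
    support = [abs(v) for v, p in pmf.items() if p > 0]
    return max(support) if support else 0.0

# A bounded r.v. taking values -2, 0, 3:
print(ess_sup({-2: 0.2, 0: 0.5, 3: 0.3}))  # → 3

# Altering X on a zero-probability event leaves ess sup unchanged:
print(ess_sup({-2: 0.2, 0: 0.5, 3: 0.3, 1000: 0.0}))  # → 3
```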
8.2 Distribution Functions
The cumulative distribution function (c.d.f.) of X is the function F: ℝ̄ ↦ [0,1], where

F(x) = μ((−∞, x]) = P(X ≤ x), x ∈ ℝ̄.   (8.2)

We take the domain to be ℝ̄ since it is natural to assign the values 0 and 1 to
F(−∞) and F(+∞) respectively. No other values are possible, so there is no contradiction in confining attention to just the points of ℝ. To specify a distribution for X it is sufficient to assign a functional form for F; μ and F are equivalent representations of the distribution, each useful for different purposes. To represent μ(A) in terms of F for a set A much more complicated than an interval would be cumbersome, but on the other hand, the graph of F is an appealing way to display the characteristics of the distribution.

To see how probabilities are assigned to sets using F, start with the half-open interval (x,y] for x < y. This is the intersection of the half-lines (−∞, y] and (x, +∞) = (−∞, x]ᶜ. Let A = (−∞, x] and B = (−∞, y], so that μ(A) = F(x) and μ(B) = F(y); then

μ((x,y]) = μ(Aᶜ ∩ B) = 1 − μ(A ∪ Bᶜ) = 1 − (μ(A) + 1 − μ(B)) = μ(B) − μ(A) = F(y) − F(x),   (8.3)
A and Bᶜ being disjoint. The half-open intervals form a semi-ring (see 1.18), and from the results of §3.2 the measure extends uniquely to the sets of B.

As an example of the extension, we determine μ({x}) = P(X = x) for x ∈ ℝ (compare 3.15). Putting x = y in (8.3) will not yield this result, since A ∩ Aᶜ = ∅, not {x}. We could obtain {x} as the intersection of (−∞, x] and (−∞, x)ᶜ, but then there is no obvious way to find the probability for the open interval (−∞, x). The solution to the problem is to consider monotone sequences of half-open intervals. Since (x, x + 1/n] ↓ ∅ as n → ∞, for ε > 0 there exists N_ε such that μ((x, x + 1/n]) < ε for n ≥ N_ε, so that F(x+) = F(x): the c.d.f. is right-continuous. Similarly, (x − 1/n, x] ↓ {x}, so that μ({x}) = F(x) − F(x−), the jump in the c.d.f. at x, as required.
8.6 Example The Bemoulli (orbinal'y) nv. takes values 1 and 0 with fixed probaThink of it as a mapping from any probability space containbilities p and 1 elements. and tFailure' 1 on R and is called the probability densit.v function(p.d.f.)of th p.m. According to the Radon-Nikodym theorem, the p.d.f. has the property that for each E e f, =
g,(f)
=
jyflvdx.
,
(S' 16)
8.9 Example For the uniform distribution on [0,1] (see 7.3),

F(x) = 0 for x < 0; F(x) = x for 0 ≤ x ≤ 1; F(x) = 1 for x > 1.   (8.17)

The p.d.f. is constant at 1 on the interval, but is undefined at 0 and 1. ∎
8.10 Example The standard normal or Gaussian distribution has p.d.f.

f(x) = (1/√(2π)) e^{−x²/2}, −∞ < x < +∞.   (8.18)
as n becomes large. However, the validity of this hypothesis depends on the method of repeating the experiment. See Part IV for the details, but suffice it to say at this point that the equivalence certainly holds if E(X) exists and the random experiments are independent of one another. The connection is most evident for simple random variables. If X = Σ_j x_j 1_{E_j}, where the {E_j} are a partition of Ω, then by 4.4,

E(X) = Σ_j x_j P(E_j).   (9.3)

When the probabilities are interpreted as relative frequencies of the events E_j = {ω: X(ω) = x_j} in a large number of drawings from the distribution, (9.2) with large n should approximate (9.3). The values x_j will appear in the sum in a proportion roughly equal to their probability of occurrence.

E(X) has a dual characterization, as an abstract integral on the parent probability space and as a Lebesgue-Stieltjes integral on the line, under the derived distribution. It is equally correct to write either (9.1) or

E(X) = ∫_{−∞}^{+∞} x dF(x).   (9.4)
Which of these representations is adopted is mainly a matter of convenience. If 1_A(ω) is the indicator function of a set A ∈ F, then

E(1_A X) = ∫_A X dP = ∫_{X(A)} x dF(x),   (9.5)
where X(A) ∈ B is the image of A under X. Here the abstract integral is obviously the more direct and simple representation, but by the same token, the Stieltjes form is the natural way to represent integration over a set in B.
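The relative-frequency reading of (9.3) can be illustrated by simulation; the values and probabilities below are our own illustration, and the agreement is only approximate for any finite n:

```python
import random

random.seed(0)

# A simple r.v. taking value x_j on an event of probability P(E_j):
values = [-1.0, 0.0, 2.0]
probs = [0.2, 0.5, 0.3]

# E(X) = sum_j x_j P(E_j), as in (9.3):
expectation = sum(x * p for x, p in zip(values, probs))  # 0.4

# In n independent drawings, each x_j appears with relative frequency
# roughly P(E_j), so the sample mean should approximate E(X):
n = 100_000
draws = random.choices(values, weights=probs, k=n)
sample_mean = sum(draws) / n
print(expectation, sample_mean)
```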
If the distribution is discrete, X is a simple function and the formula in (9.3) applies directly. Under the derived distribution,

E(X) = Σ_j x_j μ({x_j}),   (9.6)

where x_j, j = 1,2,..., are the atoms of the distribution.
9.1 Example If X is a Bernoulli variable (8.6), E(X) = 1·p + 0·(1 − p) = p. ∎
9.2 Example If X is Poisson (8.8),

E(X) = Σ_{x=0}^∞ x e^{−λ} λ^x / x! = λ e^{−λ} Σ_{x=1}^∞ λ^{x−1}/(x − 1)! = λ. ∎   (9.7)
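The telescoping sum in (9.7) can be checked by accumulating the Poisson probabilities recursively (a sketch; the truncation at 200 terms is our own choice and lies far into the negligible tail for the λ values used):

```python
import math

def poisson_mean(lam, terms=200):
    # partial sums of E(X) = sum_{x>=0} x e^{-lam} lam^x / x!,
    # using the recursion P(X = x+1) = P(X = x) * lam / (x+1)
    total = 0.0
    pmf = math.exp(-lam)  # P(X = 0)
    for x in range(terms):
        total += x * pmf
        pmf *= lam / (x + 1)
    return total

for lam in (0.5, 1.0, 4.0):
    print(lam, poisson_mean(lam))  # each partial sum is close to lam
```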
For a continuous distribution, the Lebesgue-Stieltjes integral of x coincides with the integral in ordinary Lebesgue measure of the function x f(x).
9.3 Example For the uniform distribution on the interval [a,b] (8.13),

E(X) = (1/(b − a)) ∫_a^b x dx = ½(a + b). ∎   (9.8)
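A midpoint-rule evaluation of the integral in (9.8) reproduces (a + b)/2; the endpoints 2 and 5 below are our own illustration:

```python
def uniform_mean(a, b, n=100_000):
    # Riemann (midpoint) sum of x * f(x) with f(x) = 1/(b-a) on [a,b]
    width = (b - a) / n
    density = 1.0 / (b - a)
    return sum((a + (i + 0.5) * width) * density * width for i in range(n))

print(uniform_mean(2.0, 5.0))  # → close to (2+5)/2 = 3.5
```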
9.4 Example For the Gaussian family (8.19),

E(X) = (1/√(2π)σ) ∫_{−∞}^{+∞} x e^{−(x−μ)²/2σ²} dx = μ.   (9.9)

This can be shown by integration by parts, but for a neater proof see 11.8. ∎
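Short of the integration by parts, (9.9) can be verified numerically; truncating the integral at μ ± 10σ is our own choice, and the omitted tail mass is negligible:

```python
import math

def gauss_mean(mu, sigma, n=200_000):
    # midpoint-rule integral of x times the N(mu, sigma^2) density
    lo, hi = mu - 10 * sigma, mu + 10 * sigma
    h = (hi - lo) / n
    norm = math.sqrt(2 * math.pi) * sigma
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += x * math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / norm * h
    return total

print(gauss_mean(1.5, 2.0))  # → close to mu = 1.5
```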
In a mixed continuous-discrete distribution with atoms x_1, x_2,..., we can use the decomposition F = F_1 + F_2, where F_1(x) = Σ_{x_j ≤ x} μ({x_j}) and F_2(x) is absolutely continuous with derivative f(x). Then

E(X) = Σ_j x_j μ({x_j}) + ∫ x f(x) dx.   (9.10)

The set of atoms has Lebesgue measure zero in ℝ, so there is no need to exclude these from the integral on the right-hand side of (9.10).

Some random variables do not have an expectation.
9.5 Example Recall the condition for integrability in (4.15), and note that for the Cauchy distribution (8.11),
$$\int_{-a}^{a}\frac{|x|}{\pi(1+x^2)}\,dx \to \infty \ \text{ as } a \to \infty. \quad\Box \qquad (9.11)$$
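The divergence in (9.11) can be made concrete: the truncated integral has the closed form $\ln(1+a^2)/\pi$, so enlarging the truncation point keeps adding mass without bound. A quadrature sketch, with illustrative truncation points:

```python
import math

def trunc_abs_mean(a, m=200_000):
    # midpoint rule for the integral of |x| / (pi * (1 + x^2)) over [-a, a];
    # the closed form is log(1 + a^2) / pi, which grows without bound in a
    h = 2 * a / m
    total = 0.0
    for i in range(m):
        x = -a + (i + 0.5) * h
        total += abs(x) / (math.pi * (1 + x * x))
    return total * h

v10 = trunc_abs_mean(10.0)
v100 = trunc_abs_mean(100.0)
```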
9.2 Expectations of Functions of X

If $X$ is a r.v. on the probability space $(\Omega,\mathcal{F},\mu)$, and $g: \mathbb{R} \mapsto \mathbb{R}$ is a Borel function, $g\circ X$ is a r.v. on the space $(\mathbb{R},\mathcal{B},\mu X^{-1})$, as noted in §8.1. This leads to the following dual characterization of the expectation of a function.

9.6 Theorem If $g$ is a Borel function,
$$E(g(X)) = \int g(x)\,d\mu X^{-1}(x) = \int y\,d\mu(g\circ X)^{-1}(y). \qquad (9.12)$$

Proof Define a sequence of simple functions $Z_{(n)}: \mathbb{R}^+ \mapsto \mathbb{R}^+$ by
$$Z_{(n)}(x) = \sum_{i=1}^{m}\frac{i-1}{2^n}1_{B_i}(x), \qquad (9.13)$$
where $m = n2^n + 1$ and $B_i = [2^{-n}(i-1),\,2^{-n}i)$ for $i = 1,\ldots,m$. Then $Z_{(n)}(x) \uparrow x$ for $x \ge 0$, by arguments paralleling 3.28. According to 3.21, $(\mathbb{R},\mathcal{B},\mu g^{-1})$ is a measure space, where $\mu g^{-1}(B) = \mu(g^{-1}(B))$ for $B \in \mathcal{B}$, and so by the monotone convergence theorem,
$$\int Z_{(n)}(y)\,d\mu g^{-1}(y) = \sum_{i=1}^{m}\frac{i-1}{2^n}\mu g^{-1}(B_i) \to \int_0^{\infty} y\,d\mu g^{-1}(y). \qquad (9.14)$$
Consider first the case of non-negative $g$. Let $1_B(y)$ be the indicator of the set $B \in \mathcal{B}$; then, if $g$ is Borel, so is the composite function
$$(1_B\circ g)(x) = \begin{cases}1, & g(x)\in B,\\ 0, & g(x)\notin B,\end{cases} \;=\; 1_{g^{-1}(B)}(x). \qquad (9.15)$$
Hence, consider the simple function
$$(Z_{(n)}\circ g)(x) = \sum_{i=1}^{m}\frac{i-1}{2^n}(1_{B_i}\circ g)(x) = \sum_{i=1}^{m}\frac{i-1}{2^n}1_{g^{-1}(B_i)}(x). \qquad (9.16)$$
By the same arguments as before, $Z_{(n)}\circ g \uparrow g$, and $E(Z_{(n)}\circ g) \to E(g) = \int g\,d\mu$. However,
$$E(Z_{(n)}\circ g) = \sum_{i=1}^{m}\frac{i-1}{2^n}\mu(g^{-1}(B_i)) = \sum_{i=1}^{m}\frac{i-1}{2^n}\mu g^{-1}(B_i) = \int Z_{(n)}(y)\,d\mu g^{-1}(y), \qquad (9.17)$$
and (9.12) follows from (9.14). To extend the result to general $g$, consider the non-negative functions $g^+ = \max\{g,0\}$ and $g^- = g^+ - g$ separately. It is immediate that
$$E(Z_{(n)}\circ g^+) - E(Z_{(n)}\circ g^-) \to E(g^+) - E(g^-) = E(g), \qquad (9.18)$$
so consider each component of this limit separately.
$$E(Z_{(n)}\circ g^+) = \sum_{i=1}^{m}\frac{i-1}{2^n}\mu((g^+)^{-1}(B_i)) = \sum_{i=1}^{m}\frac{i-1}{2^n}\mu(g^{-1}(B_i)) \to \int_0^{\infty} y\,d\mu g^{-1}(y), \qquad (9.19)$$
where the second equality holds because $(g^+)^{-1}(B_i) = g^{-1}(B_i)$ for $i \ge 2$, since the elements of $B_i$ are all positive for these cases, whereas for $i = 1$ the term disappears. Similarly, $-Z_{(n)}(-x) \downarrow x$ for $x < 0$, and
$$-E(Z_{(n)}\circ g^-) = -\sum_{i=1}^{m}\frac{i-1}{2^n}\mu((g^-)^{-1}(B_i)) = \sum_{i=1}^{m}\frac{-(i-1)}{2^n}\mu(g^{-1}(B_i^-)) \to \int_{-\infty}^{0} y\,d\mu g^{-1}(y), \qquad (9.20)$$
where $B_i^- = (-2^{-n}i,\,-2^{-n}(i-1)]$, and in this case the second equality holds because $(g^-)^{-1}(B_i) = g^{-1}(B_i^-)$ for $i \ge 2$. Hence
$$E(Z_{(n)}\circ g^+) - E(Z_{(n)}\circ g^-) \to \int_{-\infty}^{+\infty} y\,d\mu g^{-1}(y), \qquad (9.21)$$
and the theorem follows in view of (9.18). ∎

The quantities $E(X^k)$, for integer $k \ge 1$, are called the moments of the distribution of $X$, and for $k > 1$ the central moments are defined by
$$E(X - E(X))^k = \sum_{i=0}^{k}\binom{k}{i}E(X^i)\,(-E(X))^{k-i}. \qquad (9.22)$$
A familiar case is the variance, $\mathrm{Var}(X) = E(X - E(X))^2 = E(X^2) - E(X)^2$, the usual measure of dispersion about the mean. When the distribution is symmetric, with $P(X - E(X) \in A) = P(E(X) - X \in A)$ for each $A \in \mathcal{B}$, the odd-order central moments are all zero.
9.7 Example For the Gaussian case (8.14), the central moments are
$$E(X-\mu)^k = \begin{cases}\dfrac{k!\,\sigma^k}{2^{k/2}(k/2)!}, & k \text{ even},\\[1ex] 0, & k \text{ odd}.\end{cases} \qquad (9.23)$$
This formula may be derived after some manipulation from equation (11.22) below. $\mathrm{Var}(X) = \sigma^2$, and all the finite-order moments exist, although the sequence increases monotonically. □
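Formula (9.23) agrees with the standard Gaussian recursion $E(X-\mu)^k = (k-1)\sigma^2\,E(X-\mu)^{k-2}$; the recursion is a well-known identity, not taken from the text, but it gives an independent cross-check:

```python
import math

def central_moment_formula(k, sigma):
    # (9.23): k! * sigma^k / (2^(k/2) * (k/2)!) for even k, zero for odd k
    if k % 2:
        return 0.0
    return math.factorial(k) * sigma ** k / (2 ** (k // 2) * math.factorial(k // 2))

def central_moment_recursion(k, sigma):
    # E(X - mu)^k = (k - 1) * sigma^2 * E(X - mu)^(k - 2), starting from 1 at k = 0
    if k % 2:
        return 0.0
    m = 1.0
    for j in range(2, k + 1, 2):
        m *= (j - 1) * sigma ** 2
    return m
```

For example, $k = 2$ gives $\sigma^2$ and $k = 4$ gives $3\sigma^4$, under both routes.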
The existence of a moment of given order requires the existence of the corresponding absolute moment. If $E|X|^p < \infty$, for any real $p > 0$, $X$ is sometimes said to belong to the set $L_p$ (of functions Lebesgue-integrable to order $p$), or otherwise, to be $L_p$-bounded.
9.8 Example For $X \sim N(0,\sigma^2)$, we have, by (8.26),
$$E|X| = 2(2\pi\sigma^2)^{-1/2}\int_0^{\infty} x\,e^{-x^2/2\sigma^2}\,dx = (2/\pi)^{1/2}\sigma. \quad\Box \qquad (9.24)$$
Taking the corresponding root of the absolute moment is convenient for purposes of comparison (see 9.23), and for $X \in L_p$ the $L_p$-norm of $X$ is defined as
$$\|X\|_p = (E|X|^p)^{1/p}. \qquad (9.25)$$
The Gaussian distribution possesses all finite-order moments according to (9.23), but its support is none the less the whole of $\mathbb{R}$, and its $p$-norms are not uniformly bounded. If $\|X\|_p$ has a finite limit as $p \to \infty$, it coincides with the essential supremum of $X$, so that a random variable belonging to $L_\infty$ is bounded almost surely.
9.3 Theorems for the Probabilist's Toolbox

The following inequalities for expected values are exploited in the proofs of innumerable theorems in probability. The first is better known as Chebyshev's inequality for the special case $p = 2$.
9.9 Markov's inequality For $\varepsilon > 0$ and $p > 0$,
$$P(|X| \ge \varepsilon) \le \frac{E|X|^p}{\varepsilon^p}. \qquad (9.26)$$
Proof
$$\varepsilon^p P(|X| \ge \varepsilon) = \int_{\{|x|\ge\varepsilon\}} \varepsilon^p\,dF(x) \le \int_{\{|x|\ge\varepsilon\}} |x|^p\,dF(x) \le E|X|^p. \quad\blacksquare$$

This inequality does not bind unless $E|X|^p/\varepsilon^p < 1$, but it shows that if $E|X|^p < \infty$, the tail probabilities converge to zero at the rate $\varepsilon^{-p}$ as $\varepsilon \to \infty$. The order of $L_p$-boundedness measures the tendency of a distribution to generate outliers. The Markov inequality is a special case of (at least) two more general inequalities.
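A simulation makes the bound (9.26) concrete. With $p = 1$ and unit-mean exponential draws (an illustrative choice), the Markov bound $E|X|/\varepsilon$ is loose but valid, and in fact the empirical tail frequency can never exceed the empirical bound, since the inequality also holds under the empirical measure:

```python
import random

random.seed(1)
xs = [random.expovariate(1.0) for _ in range(50_000)]  # non-negative, E|X| = 1

eps = 3.0
tail = sum(1 for x in xs if x >= eps) / len(xs)  # empirical P(|X| >= eps), about exp(-3)
bound = (sum(xs) / len(xs)) / eps                # empirical Markov bound with p = 1
```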
9.10 Corollary For any event $A \in \mathcal{F}$,
$$\int_A |X|^p\,dP \ge \varepsilon^p\,P(\{\omega: |X(\omega)| \ge \varepsilon\} \cap A). \qquad (9.27)$$
Equivalently, $P(\{|X| \ge \varepsilon\} \cap A) \le E(1_A|X|^p)/\varepsilon^p$.
Proof Obvious from 9.9. ∎
9.11 Corollary Let $g: \mathbb{R} \mapsto \mathbb{R}$ be a function with the property that $x \ge a$ implies $g(x) \ge g(a) > 0$, for a given constant $a$. Then
$$P(X \ge a) \le \frac{E(g(X))}{g(a)}.$$
Proof Integration by parts gives, for some $b > 0$,
$$\int_0^b x^r\,dF(x) = b^rF(b) - r\int_0^b x^{r-1}F(x)\,dx = -b^r(1 - F(b)) + r\int_0^b x^{r-1}(1 - F(x))\,dx.$$
The theorem follows on letting $b$ tend to infinity. ∎
If the left-hand side of (9.37) diverges, so does the right, and in this sense the theorem is true whether or not $E(X^r)$ is finite.

9.15 Corollary If $X$ is non-negative and integrable,
$$\int_{\{X \ge \varepsilon\}} X\,dP = \varepsilon P(X \ge \varepsilon) + \int_{\varepsilon}^{\infty} P(X > x)\,dx. \qquad (9.38)$$
Proof Apply 9.14 with $r = 1$ to the r.v. $1_{\{X \ge \varepsilon\}}X$. This gives
$$E(1_{\{X \ge \varepsilon\}}X) = \int_0^{\infty} P(1_{\{X \ge \varepsilon\}}X > x)\,dx = \int_0^{\varepsilon} P(X \ge \varepsilon)\,dx + \int_{\varepsilon}^{\infty} P(X > x)\,dx = \varepsilon P(X \ge \varepsilon) + \int_{\varepsilon}^{\infty} P(X > x)\,dx. \quad\blacksquare \qquad (9.39)$$
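Behind (9.37)-(9.39) is the tail-integral identity $E(X) = \int_0^\infty P(X > x)\,dx$ for non-negative $X$. It is easy to verify numerically for the exponential distribution, whose tail is $e^{-\lambda x}$ (the rate below is an illustrative choice):

```python
import math

lam = 2.0                 # illustrative exponential rate; E(X) = 1 / lam = 0.5
h, upper = 1e-4, 20.0     # step and truncation point for the quadrature
m = int(upper / h)

# midpoint rule for the integral of the tail P(X > x) = exp(-lam * x) over [0, upper]
tail_integral = sum(math.exp(-lam * (i + 0.5) * h) for i in range(m)) * h
```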
Not only does (9.38) give the Markov inequality on replacing non-negative $X$ by $|X|^p$, for $p > 0$ and arbitrary $X$, but the error in the Markov estimate of the tail probability is neatly quantified. Noting that $P(|X| \ge \varepsilon) = P(|X|^p \ge \varepsilon^p)$,
$$\varepsilon^p P(|X| \ge \varepsilon) = \int_{\{|X|^p \ge \varepsilon^p\}} |X|^p\,dP - \int_{\varepsilon^p}^{\infty} P(|X|^p > x)\,dx = E|X|^p - \int_{\{|X|^p < \varepsilon^p\}} |X|^p\,dP - \int_{\varepsilon^p}^{\infty} P(|X|^p > x)\,dx, \qquad (9.40)$$
where both the subtracted terms on the right-hand side are non-negative.
9.4 Multivariate Distributions

From one point of view, the integral of a function of two or more random variables presents no special problems. For example, if
$$g: \mathbb{R}^2 \mapsto \mathbb{R}$$
is Borel-measurable, meaning in this case that $g^{-1}(B) \in \mathcal{B}^2$ for every $B \in \mathcal{B}$, then $h(\omega) = g(X(\omega),Y(\omega))$ is just an $\mathcal{F}/\mathcal{B}$-measurable r.v., and
$$E(g(X,Y)) = \int_{\Omega} h\,dP \qquad (9.41)$$
is its expectation, which involves no new ideas apart from the particular way in which the r.v. $h$ happens to be defined. Alternatively, the Lebesgue-Stieltjes form is
$$E(g) = \int_{\mathbb{R}^2} g(x,y)\,dF(x,y), \qquad (9.42)$$
where $dF(x,y)$ is to be thought of as the limiting case of $\Delta F(x,y)$ defined in (8.30) as the rectangle tends to the differential of area. When the distribution is continuous, the integral is the ordinary integral of $g(x,y)f(x,y)$ with respect to Lebesgue product measure. According to Fubini's theorem, it is equivalent to an iterated integral, and may be written
$$E(g(X,Y)) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} g(x,y)f(x,y)\,dy\,dx. \qquad (9.43)$$
But caution must be exercised with formula (9.42), because this is not a double integral in general. It might seem more appropriate to write $d^2F(x,y)$ instead of $dF(x,y)$, but except in the continuous case this would not be correct. The abstract notation of (9.41) is often preferable, because it avoids these ambiguities. In spite of these caveats, the expectation of a function of (say) $X$ alone can in every case be constructed with respect to either the marginal distribution or the joint distribution.
9.16 Theorem
$$E(g(X)) = \int_{\mathbb{R}^2} g(x)\,dF_{XY}(x,y) = \int_{\mathbb{R}} g(x)\,dF_X(x).$$
Proof Define a function $g^*: \mathbb{R}^2 \mapsto \mathbb{R}$ by setting $g^*(x,y) = g(x)$, all $y \in \mathbb{R}$. $g^{*-1}(B)$ is a cylinder in $\mathbb{R}^2$ with base $g^{-1}(B) \in \mathcal{B}$ for $B \in \mathcal{B}$, and $g^*$ is $\mathcal{B}^2/\mathcal{B}$-measurable. For non-negative $g$, let
$$g^*_{(n)}(x,y) = \sum_{i=1}^{m}\frac{i-1}{2^n}1_{E_i}(x,y), \qquad (9.44)$$
where $m = n2^n + 1$ and $E_i = A_i\times\mathbb{R} = \{(x,y): 2^{-n}(i-1) \le g^*(x,y) < 2^{-n}i\} \in \mathcal{B}^2$, where $A_i = \{x: 2^{-n}(i-1) \le g(x) < 2^{-n}i\} \in \mathcal{B}$. Since $\mu(E_i) = \mu_X(A_i)$,
$$E(g^*_{(n)}) = \sum_{i=1}^{m}\frac{i-1}{2^n}\mu(E_i) = \sum_{i=1}^{m}\frac{i-1}{2^n}\mu_X(A_i) = E(g_{(n)}). \qquad (9.45)$$
By the monotone convergence theorem the left and right-hand members of (9.45) converge to $E(g^*) = \int g^*(x,y)\,dF(x,y)$ and $E(g) = \int g(x)\,dF_X(x)$ respectively. Extend from non-negative to general $g$ to complete the proof. ∎
The means and variances of $X$ and $Y$ are the leading cases of this result. We also have cross moments, and in particular, the covariance of $X$ and $Y$ is
$$\mathrm{Cov}(X,Y) = E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y). \qquad (9.46)$$
Fubini's theorem suggests a characterization of pairwise independence:

9.17 Theorem If $X$ and $Y$ are independent r.v.s, $\mathrm{Cov}(\varphi(X),\psi(Y)) = 0$ for all pairs of integrable Borel functions $\varphi$ and $\psi$.
Proof Fubini's theorem gives
$$E(\varphi(X)\psi(Y)) = \int_{\mathbb{R}^2}\varphi(x)\psi(y)\,dF(x,y) = \int_{\mathbb{R}}\varphi(x)\,dF_X(x)\int_{\mathbb{R}}\psi(y)\,dF_Y(y) = E(\varphi(X))E(\psi(Y)). \quad\blacksquare \qquad (9.47)$$

The condition is actually sufficient as well as necessary for independence, although this cannot be shown using the present approach; see 10.25 below.
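Theorem 9.17 predicts zero covariance between any integrable functions of independent variables, whereas dependence between functions of the same variable shows up immediately. A Monte Carlo sketch; the functions, sample size, and seed are illustrative choices:

```python
import random

random.seed(2)
n = 200_000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [random.gauss(0.0, 1.0) for _ in range(n)]  # drawn independently of xs

def cov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

c_indep = cov([x * x for x in xs], [abs(y) for y in ys])  # near zero, by 9.17
c_dep = cov([x * x for x in xs], [abs(x) for x in xs])    # Cov(X^2, |X|) > 0
```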
Extending from the bivariate to the general $k$-dimensional case adds nothing of substance to the above, and is mainly a matter of appropriate notation. If $X$ is a random $k$-vector,
$$E(X) = \int X\,dF(X) \qquad (9.48)$$
denotes the $k$-vector of expectations $E(X_i)$ for $i = 1,\ldots,k$. The variance of a scalar r.v. generalizes to the covariance matrix of a random vector. The $k\times k$ matrix
$$XX' = \begin{bmatrix} X_1^2 & X_1X_2 & \cdots & X_1X_k \\ X_2X_1 & X_2^2 & & \vdots \\ \vdots & & \ddots & \vdots \\ X_kX_1 & \cdots & \cdots & X_k^2 \end{bmatrix} \qquad (9.49)$$
is called the outer product of $X$, and $E(XX')$ is the $k\times k$ positive semi-definite matrix whose elements are the expectations of the elements of $XX'$. The covariance matrix of $X$ is
$$\mathrm{Var}(X) = E[(X - E(X))(X - E(X))'] = E(XX') - E(X)E(X)'. \qquad (9.50)$$
$\mathrm{Var}(X)$ is positive semi-definite, generalizing the non-negative property of a scalar variance. It is of full rank (notwithstanding that $XX'$ has rank 1) unless an element of $X$ is an exact linear function of the remainder. The following generalizes 4.7, the proof being essentially an exercise in interpreting the matrix formulae.
9.18 Theorem If $Y = BX + c$, where $X$ is a $k$-vector with $E(X) = \mu$ and $\mathrm{Var}(X) = \Sigma$, and $B$ and $c$ are respectively an $m\times k$ constant matrix and a constant $m$-vector, then
(i) $E(Y) = B\mu + c$;
(ii) $\mathrm{Var}(Y) = B\Sigma B'$. □
Note that if $m > k$, $\mathrm{Var}(Y)$ is singular, having rank $k$.
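Part (ii) of 9.18, and the rank remark, can be checked with a small numerical example. The matrices below are arbitrary illustrations with $m = 3 > k = 2$, so the resulting $\mathrm{Var}(Y)$ is singular:

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

B = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]   # 3 x 2 coefficient matrix (m > k)
Sigma = [[2.0, 0.5], [0.5, 1.0]]            # Var(X), positive definite

V = matmul(matmul(B, Sigma), transpose(B))  # Var(Y) = B Sigma B', a 3 x 3 matrix

# determinant of the 3 x 3 result: zero, since rank(V) <= k = 2
detV = (V[0][0] * (V[1][1] * V[2][2] - V[1][2] * V[2][1])
        - V[0][1] * (V[1][0] * V[2][2] - V[1][2] * V[2][0])
        + V[0][2] * (V[1][0] * V[2][1] - V[1][1] * V[2][0]))
```

V comes out symmetric and positive semi-definite, with zero determinant, as the theorem requires.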
9.19 Example If a random vector $Z = (Z_1,\ldots,Z_k)'$ is standard Gaussian (8.19), it is easy to verify, applying 9.17, that $E(Z) = 0$ and $E(ZZ') = I_k$. Applying 9.18 to the transformation in (8.35) produces $E(X) = \mu$ and
$$\mathrm{Var}(X) = E(X-\mu)(X-\mu)' = E(AZZ'A') = AE(ZZ')A' = AA' = \Sigma. \quad\Box \qquad (9.51)$$
9.5 More Theorems for the Toolbox

The following collection of theorems, together with the Jensen and Markov inequalities of §9.3, constitute the basic toolbox for the proof of results in probability. The student will find that it will suffice to have his/her thumb in these pages to be able to follow a gratifyingly large number of the arguments to be encountered in subsequent chapters.

9.20 Cauchy-Schwarz inequality
$$E|XY| \le \sqrt{E(X^2)E(Y^2)}.$$
This is the case $p = q = 2$ of

9.21 Hölder's inequality
$$E|XY| \le \|X\|_p\|Y\|_q, \quad \text{where } \frac{1}{p} + \frac{1}{q} = 1 \text{ and } p \ge 1. \qquad (9.52)$$
Proof The proof for the case $p > 1$ requires a small lemma.

9.22 Lemma For any pair of non-negative numbers $a, b$,
$$ab \le \frac{a^p}{p} + \frac{b^q}{q}. \qquad (9.53)$$
Proof If either $a$ or $b$ is zero this is trivial. If both are positive, let $s = p\log a$ and $t = q\log b$. Inverting these relations gives $a = e^{s/p}$ and $b = e^{t/q}$, so that $ab = e^{s/p+t/q}$, and (9.53) follows from the fact that $e^x$ is a convex function of $x$, noting $1/q = 1 - 1/p$ and applying (9.29). ∎

Choose $a = |X|/\|X\|_p$ and $b = |Y|/\|Y\|_q$. For these choices, $E(a^p) = E(b^q) = 1$, and
$$\frac{E|XY|}{\|X\|_p\|Y\|_q} = E(ab) \le \frac{1}{p} + \frac{1}{q} = 1. \qquad (9.54)$$
For the case $p = 1$, the inequality reduces to $E|XY| \le E|X|\,\mathrm{ess\,sup}\,Y$, which holds since $Y \le \mathrm{ess\,sup}\,Y$ a.s., by definition. ∎
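Because (9.52) holds for any probability measure, it holds in particular for the empirical measure of a sample, so a direct numerical check never fails, whatever the draws. A sketch with illustrative distributions and conjugate exponents:

```python
import random

random.seed(7)
n = 20_000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [random.gauss(0.0, 1.0) ** 2 for _ in range(n)]

p, q = 3.0, 1.5                          # conjugate exponents: 1/p + 1/q = 1
lhs = sum(abs(x * y) for x, y in zip(xs, ys)) / n
rhs = ((sum(abs(x) ** p for x in xs) / n) ** (1 / p)
       * (sum(abs(y) ** q for y in ys) / n) ** (1 / q))  # ||X||_p * ||Y||_q
```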
The Hölder inequality spawns a range of useful corollaries and special cases, including the following.
9.23 Liapunov's inequality (norm inequality) If $r > p > 0$, then $\|X\|_r \ge \|X\|_p$.
Proof Let $Z = |X|^p$, $W = 1$, $s = r/p$. Then (9.52) gives $E|ZW| \le \|Z\|_s\|W\|_{s/(s-1)}$, or
$$E(|X|^p) \le (E|X|^{ps})^{1/s} = (E|X|^r)^{p/r}. \quad\blacksquare \qquad (9.55)$$
This result is also obtainable as a corollary of the Jensen inequality.
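Like the Hölder check above, the norm inequality holds exactly under the empirical measure of any sample, so the norms computed from data are non-decreasing in p by construction (sample and exponents below are illustrative):

```python
import random

random.seed(3)
xs = [random.gauss(0.0, 1.0) for _ in range(100_000)]

def lp_norm(sample, p):
    # empirical (E|X|^p)^(1/p), as in (9.25)
    return (sum(abs(x) ** p for x in sample) / len(sample)) ** (1.0 / p)

norms = [lp_norm(xs, p) for p in (1, 2, 3, 4)]  # non-decreasing, by 9.23
```

For standard Gaussian draws the p = 2 entry should also sit close to 1, the true second-moment norm.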
9.24 Corollary For each $A \in \mathcal{F}$, and $1/p + 1/q = 1$ with $p > 1$,
$$\int_A |X|\,dP \le \left(\int_A |X|^p\,dP\right)^{1/p} P(A)^{1/q}.$$
Proof In 9.21, put $1_A X$ in place of $X$ and $1_A$ in place of $Y$. ∎
9.29 Theorem Let $X$, $Y$, and $Z$ be non-negative r.v.s satisfying $X \le a(Y+Z)$ a.s. for a constant $a > 0$. Then, for any constant $M > 0$,
$$E(1_{\{X>M\}}X) \le 2a\big(E(1_{\{Y>M/2a\}}Y) + E(1_{\{Z>M/2a\}}Z)\big). \qquad (9.64)$$
Proof If we can prove the almost sure inequality
$$1_{\{X>M\}}X \le 2a\big(1_{\{Y>M/2a\}}Y + 1_{\{Z>M/2a\}}Z\big), \text{ a.s.,} \qquad (9.65)$$
the theorem will follow on taking expectations. $1_{\{X>M\}}X$ is the r.v. that is equal to $X$ if $X > M$ and 0 otherwise. If $X \le M$, (9.65) is immediate. At least one of the inequalities $Y \ge X/2a$, $Z \ge X/2a$ must hold, and if $X > M$, (9.65) is no less obviously true. ∎
9.6 Random Variables Depending on a Parameter

Let $G(\omega,\theta): \Omega\times\Theta \mapsto \mathbb{R}$, $\Theta \subseteq \mathbb{R}$, denote a random function of a real variable $\theta$, or in other words, a family of random variables indexed on points of the real line. The following results, due to Cramér (1946), are easy consequences of the dominated convergence theorem.

9.30 Theorem Suppose that for each $\omega \in C$, with $P(C) = 1$, $G(\omega,\theta)$ is continuous at a point $\theta_0$, and $|G(\omega,\theta)| < Y(\omega)$ for each $\theta$ in an open neighbourhood $N_0$ of $\theta_0$, where $E(Y) < \infty$. Then $E(G(\theta))$ is continuous at $\theta_0$. □

9.31 Theorem If, for each $\omega \in C$ with $P(C) = 1$, $(\partial G/\partial\theta)(\omega)$ exists at a point $\theta_0$, and
$$\left|\frac{G(\omega,\theta_0+h) - G(\omega,\theta_0)}{h}\right| \le Y(\omega) \ \text{ for } 0 < |h| < h_1,$$
where $E(Y) < \infty$, then
$$\frac{\partial E(G)}{\partial\theta}\bigg|_{\theta_0} = E\left(\frac{\partial G}{\partial\theta}\bigg|_{\theta_0}\right). \quad\Box$$
is a cylinder set in $\mathbb{R}^2$, and the p.m. defined on $\sigma(X)$ is the marginal distribution of $X$.
10.2 Example The knowledge that $x_1 < X \le x_2$ can be represented by
$$\mathcal{R} = \sigma(\{(-\infty, x]: x < x_1\},\ \{(x, \infty): x > x_2\}).$$
Satisfy yourself that for every element of this $\sigma$-field we know whether or not $X$ belongs to the set; also that it contains all sets about which we possess this knowledge. The closer together $x_1$ and $x_2$ are, the more sets there are in $\mathcal{R}$. When $x_1 = x_2$, $\mathcal{R} = \sigma(X)$, and when $x_1 = -\infty$ and $x_2 = +\infty$, $\mathcal{R} = \{\emptyset,\mathbb{R}\}$. □
The relationships between transformations and subfield measurability are summarized in the next theorem, of which the first part is an easy consequence of the definitions but the second is trickier. If two random variables are measurable with respect to the same subfield, the implication is that they contain the same information; knowledge of one is equivalent to knowledge of the other. This means that every Borel set is the image of a Borel set under $g$. This is a stronger condition than measurability, and requires that $g$ is an isomorphism. It suffices for $g$ to be a homeomorphism, although this is not necessary, as was shown in 3.23.
10.3 Theorem Let $X$ be a r.v. on the space $(\mathbb{S},\mathcal{B}_S,\mu)$, and let $Y = g(X)$, where $g: \mathbb{S} \mapsto \mathbb{T}$ is a Borel function, with $\mathbb{S} \subseteq \mathbb{R}$ and $\mathbb{T} \subseteq \mathbb{R}$.
(i) $\sigma(Y) \subseteq \sigma(X)$.
(ii) $\sigma(Y) = \sigma(X)$ iff $g$ is a Borel-measurable isomorphism.
Proof Each $B \in \mathcal{B}_T$ has an image in $\mathcal{B}_S$ under $g^{-1}$, which in turn has an image in $\sigma(X)$ under $X^{-1}$. This proves (i). To prove (ii), define a class of subsets of $\mathbb{S}$, $\mathcal{C} = \{g^{-1}(B): B \in \mathcal{B}_T\}$. To every $A \in \mathcal{C}$ there corresponds (since $g$ is a mapping) a set $B \subseteq \mathbb{T}$ such that $A = g^{-1}(B)$, and making this substitution gives
$$\mathcal{C} = \{A: g(A) \in \mathcal{B}_T\} \supseteq \mathcal{B}_S, \qquad (10.14)$$
where the inclusion is by measurability of $g^{-1}$, and the equality is because $g(g^{-1}(B)) = B$ for any $B \subseteq \mathbb{T}$, since $g$ is 1-1 onto. It follows from (10.14) that
$$\{X^{-1}(A): A \in \mathcal{B}_S\} \subseteq \{X^{-1}(g^{-1}(B)): B \in \mathcal{B}_T\}. \qquad (10.15)$$
If $Y$ is $\mathcal{G}$-measurable for some $\sigma$-field $\mathcal{G}$ (such that $\mathcal{G}$ contains the sets of the right-hand member of (10.15)), then $X$ is also $\mathcal{G}$-measurable. In particular, $\sigma(X) \subseteq \sigma(Y)$. Part (i) then implies $\sigma(X) = \sigma(Y)$, proving sufficiency of the conditions.
To show the necessity, suppose first that $g$ is not 1-1, and $g(x_1) = g(x_2) = y$ for $x_1 \ne x_2$. The sets $\{x_1\}$ and $\{x_2\}$ are elements of $\mathcal{B}_S$ but not of $\mathcal{C}$, which contains only $g^{-1}(\{y\}) \supseteq \{x_1\}\cup\{x_2\}$. Hence $\mathcal{B}_S \ne \mathcal{C}$, and $\exists$ a $\mathcal{B}_S$-set $A$ for which there is no $\mathcal{B}_T$-set $B$ having the property $g^{-1}(B) = A$. This implies that $X^{-1}(A) \in \sigma(X)$ but $X^{-1}(A) \notin \sigma(Y)$, so that $\sigma(Y) \subset \sigma(X)$. We may therefore assume that $g$ is 1-1. If $g^{-1}$ is not Borel-measurable, then by definition $\exists\,A = g^{-1}(B) \in \mathcal{B}_S$ such that $g(A) = B \notin \mathcal{B}_T$, and hence $A \notin \mathcal{C}$, so that $\sigma(Y) \subset \sigma(X)$ by the same argument. This completes the proof of necessity. ∎
We should briefly note the generalization of these results to the vector case. A random vector $X(\omega): \Omega \mapsto \mathbb{R}^k$ is measurable with respect to $\mathcal{G} \subseteq \mathcal{F}$ if
$$X^{-1}(B) = \{\omega: X(\omega) \in B\} \in \mathcal{G}, \quad \forall B \in \mathcal{B}^k. \qquad (10.16)$$
If $\sigma(X)$ is the $\sigma$-field generated by $X$, we have the following result.
10.4 Theorem Let $X$ be a random $k$-vector on the probability space $(\mathbb{S},\mathcal{B}^k_S,\mu)$, where $\mathbb{S} \subseteq \mathbb{R}^k$ and $\mathcal{B}^k_S = \mathbb{S}\cap\mathcal{B}^k$, and consider a Borel function
$$Y = g(X): \mathbb{S} \mapsto \mathbb{T}, \quad \text{where } \mathbb{T} \subseteq \mathbb{R}^m. \qquad (10.17)$$
(i) $\sigma(Y) \subseteq \sigma(X)$.
(ii) If $m = k$ and $g$ is 1-1 with Borel inverse, then $\sigma(Y) = \sigma(X)$.
Proof This follows the proof of 10.3 almost word for word, with the substitutions of $\mathcal{B}^k_S$ and $\mathcal{B}^m_T$ for $\mathcal{B}_S$ and $\mathcal{B}_T$, and so forth. ∎
10.3 Conditional Expectations

Let $Y$ be an integrable r.v. on $(\Omega,\mathcal{F},P)$ and $\mathcal{G}$ a $\sigma$-field contained in $\mathcal{F}$. The term conditional expectation, and symbol $E(Y|\mathcal{G})$, can be used to refer to any integrable, $\mathcal{G}$-measurable r.v. having the property
$$\int_G E(Y|\mathcal{G})\,dP = \int_G Y\,dP = E(Y|G)P(G), \quad \text{all } G \in \mathcal{G}. \qquad (10.18)$$
Intuitively, $E(Y|\mathcal{G})(\omega)$ represents the prediction of $Y(\omega)$ made by an observer having information $\mathcal{G}$, when the outcome $\omega$ is realized. The second equality of (10.18) supplies the definition of the constant $E(Y|G)$, although this need not exist unless $P(G) > 0$. The two extreme cases are $E(Y|\mathcal{F}) = Y$ a.s., and $E(Y|\mathcal{T}) = E(Y)$ a.s., where $\mathcal{T}$ denotes the trivial $\sigma$-field with elements $\{\Omega,\emptyset\}$. Note that $\mathcal{G} \subseteq \mathcal{F}$, so integrability of $Y$ is necessary for the existence of $E(Y|\mathcal{G})$. The conditional expectation is a slightly bizarre construction, not only a r.v., but evidently not even an integral. To demonstrate that an object satisfying (10.18) actually does exist, consider initially the case $Y \ge 0$, and define
$$\nu(G) = \int_G Y\,dP. \qquad (10.19)$$

10.5 Theorem $\nu$ is a measure, and is absolutely continuous with respect to $P$.
Proof Clearly, $\nu(G) \ge 0$, and $P(G) = 0$ implies $\nu(G) = 0$. It remains to show countable additivity. If $\{G_j\}$ is a disjoint sequence, then
$$\nu\bigg(\bigcup_j G_j\bigg) = \int_{\cup_j G_j} Y\,dP = \sum_j\int_{G_j} Y\,dP = \sum_j \nu(G_j), \qquad (10.20)$$
where the second equality holds under disjointness. ∎
So the implication of (10.18) for non-negative $Y$ turns out to be that $E(Y|\mathcal{G})$ is the Radon-Nikodym derivative of $\nu$ with respect to $P$. The extension from non-negative to general $Y$ is easy, since we can write $Y = Y^+ - Y^-$, where $Y^+$ and $Y^-$ are non-negative, and from (10.18), $E(Y|\mathcal{G}) = E(Y^+|\mathcal{G}) - E(Y^-|\mathcal{G})$, where both of the right-hand r.v.s are Radon-Nikodym derivatives. The Radon-Nikodym theorem therefore establishes the existence of $E(Y|\mathcal{G})$; at any rate, it establishes the existence of at least one r.v. satisfying (10.18). It does not guarantee that there is only one such r.v., and in the event of non-uniqueness, we speak of the different versions of $E(Y|\mathcal{G})$. The possibility of multiple versions is rarely of practical concern, since 10.5 assures us that they are all equal to one another a.s.$[P]$, but it does make it necessary to qualify any statement we make about conditional expectations with the tag 'a.s.', to indicate that there may be sets of measure zero on which our assertions do not apply. In the bivariate context, $E(Y|\sigma(X))$, which we can write as $E(Y|X)$ when the context is clear, is interpreted as the prediction of $Y$ made by observers who observe $X$. This notion is related to (10.7) by thinking of $E(Y|x)$ as a drawing from the distribution of $E(Y|X)$.
10.6 Example In place of (10.11) we write
$$E(Y|X)(\omega) = \mu_2 + \frac{\sigma_{12}}{\sigma_{11}}(X(\omega) - \mu_1), \qquad (10.21)$$
which is a function of $X(\omega)$, and hence a r.v. defined on the marginal distribution of $X$. $E(Y|X)$ is Gaussian with mean $\mu_2$ and variance $\sigma_{12}^2/\sigma_{11}$. □
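The regression form of (10.21) can be recovered from simulated bivariate Gaussian data: the least-squares slope of Y on X estimates $\sigma_{12}/\sigma_{11}$. All parameter values below are illustrative choices, not from the text:

```python
import math
import random

random.seed(5)
mu1, mu2 = 1.0, -2.0
s11, s12, s22 = 2.0, 0.8, 1.0            # Var(X), Cov(X, Y), Var(Y)
rho = s12 / math.sqrt(s11 * s22)

n = 200_000
xs, ys = [], []
for _ in range(n):
    z1, z2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    xs.append(mu1 + math.sqrt(s11) * z1)
    ys.append(mu2 + math.sqrt(s22) * (rho * z1 + math.sqrt(1 - rho * rho) * z2))

mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))   # estimates sigma12 / sigma11 = 0.4
```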
F(FIx)a point
in a probability space on R circumvents the difficulty
previously with conditioning on events of probability 0, and our is valid for a11distributions. lt is possible to define F(FIG4 when
encountered construction #(G) 0. What is required is to exhibit a decreasing sequence fGn e N) with #(Gn) > 0 for every n, and Gn 1 G, such that the real sequence (F(1'1 Gn)) converges. This is why (10.7)works for continuous distributions. Take Gn gx,x + 1/p1XR s =
=
c(A'), so that G
=
Using
(x)xR.
(10.4)in (10.18),
+co f*x+1/n
f(rl Gn)
J..J
yfl,ytdldy
+x
x
=
--
+.
ujtn
uJ-?'(''.''vjvy
j-giy'xsdy =
&Fl*,
(10.22)
as $n \to \infty$. Fubini's theorem allows us to evaluate these double integrals one dimension at a time, and to take the limits with respect to $n$ inside the integrals with respect to $y$.
Conditional probability can sometimes generate paradoxical results, as the following case demonstrates.
10.7 Example Let $X$ be a drawing from the space $([0,1],\mathcal{B}_{[0,1]},m)$, where $m$ is Lebesgue measure. Let $\mathcal{G} \subseteq \mathcal{B}_{[0,1]}$ denote the $\sigma$-field generated from the singletons $\{x\}$, $x \in [0,1]$. All countable unions of singletons have measure 0, while all complements have measure 1. Since either $P(G) = 0$ or $P(G) = 1$ for each $G \in \mathcal{G}$, it is clear from (10.18) that $E(X|\mathcal{G}) = E(X)$ a.s. However, consider the following argument. 'Since $\{x\} \in \mathcal{G}$, if we know whether or not $x \in G$ for each $G \in \mathcal{G}$, we know $x$. In particular, $\mathcal{G}$ contains knowledge of the outcome. It ought to be the case that $E(X|\mathcal{G}) = X$ a.s.' □
The mathematics are unambiguous, but there is evidently some difficulty with the idea that $\mathcal{G}$ should always represent partial knowledge. It must be accepted that the mathematical model may sometimes part company with intuition, and generate paradoxical results. Whether it is the model or the intuition that fails is a nice point for debate.
10.4 Some Theorems on Conditional Expectations

10.8 Law of iterated expectations (LIE)
$$E[E(Y|\mathcal{G})] = E(Y). \qquad (10.23)$$
Proof Immediate from (10.18), setting $G = \Omega$. ∎
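With $\mathcal{G} = \sigma(X)$ for a discrete $X$, the LIE reduces to averaging the group means of Y with the group frequencies as weights, which necessarily matches the overall mean. A sketch; the data-generating choices are illustrative:

```python
import random

random.seed(4)
n = 90_000
groups = {0: [], 1: [], 2: []}
for _ in range(n):
    x = random.choice([0, 1, 2])                   # conditioning variable
    groups[x].append(random.gauss(2.0 * x, 1.0))   # E(Y | X = x) = 2x

# E[E(Y|X)]: weight each conditional mean by the empirical P(X = x)
cond_means = {x: sum(v) / len(v) for x, v in groups.items()}
lie = sum(len(v) / n * cond_means[x] for x, v in groups.items())

overall = sum(sum(v) for v in groups.values()) / n   # E(Y) directly
```

The two quantities agree to rounding error, and both estimate $E(Y) = 2E(X) = 2$ here.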
The intuitive idea that conditioning variables can be held 'as if constant' under the conditional distribution is confirmed by the following pair of results.

10.9 Theorem If $X$ is integrable and $\mathcal{G}$-measurable, then
$$E(X|\mathcal{G}) = X, \text{ a.s.} \qquad (10.24)$$
Proof Since $X$ is $\mathcal{G}$-measurable, $E^+ = \{\omega: X(\omega) > E(X|\mathcal{G})(\omega)\} \in \mathcal{G}$. If $P(E^+) > 0$, then
$$\int_{E^+} X\,dP - \int_{E^+} E(X|\mathcal{G})\,dP > 0.$$
This contradicts (10.18), so $P(E^+) = 0$. By the same argument, $P(E^-) = 0$, where $E^- = \{\omega: X(\omega) < E(X|\mathcal{G})(\omega)\} \in \mathcal{G}$. ∎
10.10 Theorem If $Y$ is $\mathcal{F}$-measurable and integrable, $X$ is $\mathcal{G}$-measurable for $\mathcal{G} \subseteq \mathcal{F}$, and $E|XY| < \infty$, then $E(XY|\mathcal{G}) = XE(Y|\mathcal{G})$ a.s.
Proof By definition, the theorem follows if
$$\int_G XE(Y|\mathcal{G})\,dP = \int_G XY\,dP, \quad \text{a.s., for all } G \in \mathcal{G}. \qquad (10.25)$$
Let $X_{(n)} = \sum_{i=1}^{n} x_i 1_{E_i}$ be a $\mathcal{G}$-measurable simple r.v., with $E_1,\ldots,E_n$ a partition of $\Omega$ and $E_i \in \mathcal{G}$ for each $i$. (10.25) holds for $X = X_{(n)}$ since, for all $G \in \mathcal{G}$,
$$\int_G X_{(n)}E(Y|\mathcal{G})\,dP = \sum_{i=1}^{n} x_i\int_{G\cap E_i} E(Y|\mathcal{G})\,dP = \sum_{i=1}^{n} x_i\int_{G\cap E_i} Y\,dP = \int_G X_{(n)}Y\,dP, \qquad (10.26)$$
noting $G\cap E_i \in \mathcal{G}$ when $G \in \mathcal{G}$ and $E_i \in \mathcal{G}$. Let $X \ge 0$ a.s., and let $\{X_{(n)}\}$ be a monotone sequence of simple $\mathcal{G}$-measurable functions converging to $X$ as in 3.28. Then $X_{(n)}Y \to XY$ a.s. and $|X_{(n)}Y| \le |XY|$, where $E|XY| < \infty$ by assumption. Similarly, $X_{(n)}E(Y|\mathcal{G}) \to XE(Y|\mathcal{G})$ a.s., and
$$E|X_{(n)}E(Y|\mathcal{G})| = E|E(X_{(n)}Y|\mathcal{G})| \le E[E(|X_{(n)}Y|\,|\mathcal{G})] = E|X_{(n)}Y| \le E|XY|.$$
$E(Y|\mathcal{G})(\omega)\} \in \mathcal{G}$, it follows that $P(A) = 0$. The proof of (iii) uses 4.8(ii), and is otherwise identical to that of (ii). ∎
10.14 Conditional modulus inequality $|E(Y|\mathcal{G})| \le E(|Y|\,|\mathcal{G})$ a.s.
Proof Note that $|Y| = Y^+ + Y^-$, where $Y^+$ and $Y^-$ are defined in (10.28). These are non-negative r.v.s, so that $E(Y^+|\mathcal{G}) \ge 0$ a.s. and $E(Y^-|\mathcal{G}) \ge 0$ a.s., by 10.13(i) and (ii). For $\omega \in C$ with $P(C) = 1$,
$$|E(Y^+ - Y^-|\mathcal{G})(\omega)| = |E(Y^+|\mathcal{G})(\omega) - E(Y^-|\mathcal{G})(\omega)| \le E(Y^+|\mathcal{G})(\omega) + E(Y^-|\mathcal{G})(\omega) = E(Y^+ + Y^-|\mathcal{G})(\omega), \qquad (10.34)$$
where both the equalities are by linearity. ∎
10.15 Conditional monotone convergence theorem If $\{Y_n\}$ is an integrable sequence and $Y_n \uparrow Y$ a.s., then $E(Y_n|\mathcal{G}) \uparrow E(Y|\mathcal{G})$ a.s.
Proof Consider the monotone sequence $Z_n = Y_n - Y$. Since $Z_n \le 0$ and $Z_n \le Z_{n+1}$, 10.13 implies that the sequence $\{E(Z_n|\mathcal{G})\}$ is non-positive and non-decreasing a.s., and hence converges a.s. By Fatou's lemma,
$$\int_G \limsup_{n\to\infty} E(Z_n|\mathcal{G})\,dP \ge \limsup_{n\to\infty}\int_G E(Z_n|\mathcal{G})\,dP = \limsup_{n\to\infty}\int_G Z_n\,dP = 0 \qquad (10.35)$$
for $G \in \mathcal{G}$, the first equality being by (10.18), and the second by regular monotone convergence. Choose $G = \{\omega: \limsup_n E(Z_n|\mathcal{G})(\omega) < 0\}$, which is in $\mathcal{G}$ by 3.26, and (10.35) implies that $P(G) = 0$. It follows that
$$\lim_{n\to\infty} E(Z_n|\mathcal{G}) = 0, \text{ a.s.} \quad\blacksquare \qquad (10.36)$$
10.16 Conditional Fatou's lemma If $Y_n \ge 0$ a.s., then
$$E\big(\liminf_{n\to\infty} Y_n\,\big|\,\mathcal{G}\big) \le \liminf_{n\to\infty} E(Y_n|\mathcal{G}) \text{ a.s.} \qquad (10.37)$$
Proof Put $Y_n' = \inf_{m\ge n} Y_m$, so that $\{Y_n'\}$ is non-decreasing, and converges to $Y' = \liminf_{n\to\infty} Y_n$. Then $E(Y_n'|\mathcal{G}) \uparrow E(Y'|\mathcal{G})$ by 10.15. $Y_n \ge Y_n'$, and hence $E(Y_n|\mathcal{G}) \ge E(Y_n'|\mathcal{G})$ a.s., by 10.13(ii). The theorem follows on letting $n \to \infty$. ∎

Extending the various other corollaries, such as the dominated convergence theorem, follows the pattern of the last results, and is left to the reader.
10.17 Conditional Markov inequality
$$P(|Y| \ge \varepsilon\,|\mathcal{G}) \le \frac{E(|Y|^p\,|\mathcal{G})}{\varepsilon^p}, \text{ a.s.}$$
Proof By Corollary 9.10 we have
$$\varepsilon^p P(\{|Y| \ge \varepsilon\}\cap G) \le \int_G |Y|^p\,dP, \quad G \in \mathcal{G}. \qquad (10.38)$$
By definition, for $G \in \mathcal{G}$,
$$\int_G P(|Y| \ge \varepsilon\,|\mathcal{G})\,dP = P(\{|Y| \ge \varepsilon\}\cap G) \qquad (10.39)$$
and
$$\int_G E(|Y|^p\,|\mathcal{G})\,dP = \int_G |Y|^p\,dP. \qquad (10.40)$$
Substituting (10.39) and (10.40) into (10.38), it follows that
$$\int_G \big[\varepsilon^p P(|Y| \ge \varepsilon\,|\mathcal{G}) - E(|Y|^p\,|\mathcal{G})\big]\,dP \le 0. \qquad (10.41)$$
The contents of the square brackets in (10.41) is a $\mathcal{G}$-measurable r.v. Let $G \in \mathcal{G}$ denote the set on which it is positive, and it is clear that $P(G) = 0$. ∎
10.18 Conditional Jensen's inequality Let a Borel function $\varphi(\cdot)$ be convex on an interval $I$ containing the support of an $\mathcal{F}$-measurable r.v. $Y$, where $Y$ and $\varphi(Y)$ are integrable. Then
$$\varphi(E(Y|\mathcal{G})) \le E(\varphi(Y)|\mathcal{G}), \text{ a.s.} \qquad (10.42)$$
Proof The proof applies 9.13. Setting $x = E(Y|\mathcal{G})$ and $y = Y$ in (9.31),
$$A(E(Y|\mathcal{G}))(Y - E(Y|\mathcal{G})) \le \varphi(Y) - \varphi(E(Y|\mathcal{G})). \qquad (10.43)$$
However, unlike $A(E(Y))$, $A(E(Y|\mathcal{G}))$ is a random variable. It is not certain that the left-hand side of (10.43) is integrable, so the proof cannot proceed exactly like that of 9.12. The extra trick is to replace $Y$ by $1_BY$, where $B = \{\omega: E(Y|\mathcal{G})(\omega) \le \beta\}$ for $\beta < \infty$. $E(Y|\mathcal{G})$ and hence also $1_B$ are $\mathcal{G}$-measurable random variables, so $E(1_BY|\mathcal{G}) = 1_BE(Y|\mathcal{G})$ by 10.10, and
$$E(\varphi(1_BY)|\mathcal{G}) = E(\varphi(Y)1_B + (1-1_B)\varphi(0)\,|\mathcal{G}) = 1_BE(\varphi(Y)|\mathcal{G}) + (1-1_B)\varphi(0). \qquad (10.44)$$
Thus, instead of (10.43), consider
$$A(E(1_BY|\mathcal{G}))(1_BY - 1_BE(Y|\mathcal{G})) \le \varphi(1_BY) - \varphi(1_BE(Y|\mathcal{G})). \qquad (10.45)$$
The majorant side of (10.45) is integrable given that $\varphi(Y)$ is integrable, and hence so is the minorant side. Application of 10.9 and 10.10 establishes that the conditional expectation of the latter term is zero almost surely, so with (10.44) we get
$$\varphi(1_BE(Y|\mathcal{G})) \le 1_BE(\varphi(Y)|\mathcal{G}) + (1-1_B)\varphi(0), \text{ a.s.} \qquad (10.46)$$
Finally, let $\beta \to \infty$, so that $1_B \to 1$, to complete the proof. ∎
The following is a simple application of the last result which will have use subsequently.

10.19 Theorem Let $X$ be $\mathcal{G}$-measurable and $L_r$-bounded for $r \ge 1$. If $Y$ is $\mathcal{F}$-measurable, $X + Y$ is also $L_r$-bounded, and $E(Y|\mathcal{G}) = 0$ a.s., then
$$E|X+Y|^r \ge E|X|^r. \qquad (10.47)$$
Proof Take expectations and apply the LIE to the inequality
$$E(|X+Y|^r\,|\mathcal{G}) \ge |E(X+Y|\mathcal{G})|^r = |X|^r \text{ a.s.} \quad\blacksquare \qquad (10.48)$$
Finally, we can generalize the results of §9.6. It will suffice to illustrate with the case of differentiation under the conditional expectation.
10.20 Theorem Let a function $G(\omega,\theta)$ satisfy the conditions of 9.31. Then
$$E\left(\frac{\partial G}{\partial\theta}\bigg|_{\theta_0}\,\bigg|\,\mathcal{G}\right) = \frac{\partial E(G|\mathcal{G})}{\partial\theta}\bigg|_{\theta_0}, \text{ a.s.} \qquad (10.49)$$
Proof Take a countable sequence $\{h_\nu\}$ with $h_\nu \to 0$ as $\nu \to \infty$. By linearity of the conditional expectation,
$$E\left(\frac{G(\theta_0+h_\nu) - G(\theta_0)}{h_\nu}\,\bigg|\,\mathcal{G}\right) = \frac{E(G(\theta_0+h_\nu)|\mathcal{G}) - E(G(\theta_0)|\mathcal{G})}{h_\nu} \text{ a.s.} \qquad (10.50)$$
If $C_\nu$ is the set on which the equality in (10.50) holds, with $P(C_\nu) = 1$, then $P(\bigcap_\nu C_\nu) = 1$ by 3.6. The sequences on the two sides of (10.50) agree on the set $\bigcap_\nu C_\nu$, and the left-hand side of (10.50) converges a.s. to the left-hand side of (10.49) by assumption, applying the conditional version of the dominated convergence theorem. Since, whenever it exists, the a.s. limit of the right-hand side of (10.50) is the right-hand side of (10.49) by definition, the theorem follows. ∎
10.5 Relationships between Subfields

$\mathcal{G}_1 \subseteq \mathcal{F}$ and $\mathcal{G}_2 \subseteq \mathcal{F}$ are independent subfields if, for every pair of events $G_1 \in \mathcal{G}_1$ and $G_2 \in \mathcal{G}_2$,
$$P(G_1\cap G_2) = P(G_1)P(G_2). \qquad (10.51)$$
Note that if $Y$ is measurable on $\mathcal{G}_1$ it is also measurable on any collection containing $\mathcal{G}_1$, and on $\mathcal{F}$ in particular. Theorems 10.10 and 10.11 cover cases where $Y$ as well as $X$ is measurable on a subfield.
10.21 Theorem Random variables $X$ and $Y$ are independent iff $\sigma(X)$ and $\sigma(Y)$ are independent.
Proof Under the inverse mapping in (10.13), $G_1 \in \sigma(X)$ if and only if $G_1 = X^{-1}(B_1)$ for some $B_1 \in \mathcal{B}$, with a corresponding condition for $\sigma(Y)$. It follows that (10.51) holds for each $G_1 \in \sigma(X)$, $G_2 \in \sigma(Y)$ iff $P(X \in B_1,\,Y \in B_2) = P(X \in B_1)P(Y \in B_2)$ for each $B_1, B_2$ with $G_1 = X^{-1}(B_1)$, $G_2 = Y^{-1}(B_2)$. The 'only if' part of the theorem then follows directly from the definition of $\sigma(X)$. The 'if' part follows, given (8.38), from the fact that every $B_i \in \mathcal{B}$ has an inverse image in any subfield on which the r.v. is measurable. ∎

The 'only if' in the first line of this proof is essential. Independence of the subfields always implies independence of $X$ and $Y$, but the converse holds only for the minimal cases, $\sigma(X)$ and $\sigma(Y)$.
10.22 Theorem Let $Y$ be integrable and measurable on $\mathcal{G}_1$. Then $E(Y|\mathcal{G}) = E(Y)$ a.s. for all $\mathcal{G}$ independent of $\mathcal{G}_1$.
Proof Define the simple $\mathcal{G}_1$-measurable r.v.s $Y_{(n)} = \sum_{i=1}^{n} y_i 1_{G_{1i}}$ on a partition $G_{11},\ldots,G_{1n}$ of $\Omega$, where $G_{1i} \in \mathcal{G}_1$ for each $i$, with $Y_{(n)} \uparrow Y$ as in 3.28. Then
$$\int_G Y_{(n)}\,dP = \sum_{i=1}^{n} y_i P(G\cap G_{1i}) = P(G)\sum_{i=1}^{n} y_i P(G_{1i}) = P(G)E(Y_{(n)}), \quad \text{for all } G \in \mathcal{G}, \qquad (10.52)$$
and $E(Y_{(n)}) \to E(Y)$ by the monotone convergence theorem. $E(Y|\mathcal{G})$ is not a simple function, but $E(Y_{(n)}|\mathcal{G}) \uparrow E(Y|\mathcal{G})$ a.s. by 10.15, and
$$\int_G E(Y_{(n)}|\mathcal{G})\,dP \to \int_G E(Y|\mathcal{G})\,dP \qquad (10.53)$$
by regular monotone convergence. Hence, for $\mathcal{G}_1$-measurable $Y$,
$$\int_G E(Y|\mathcal{G})\,dP = P(G)E(Y) \quad \text{for all } G \in \mathcal{G}. \qquad (10.54)$$
From the second equality of (10.18) it follows that $E(Y|G) = E(Y)$ for all $G \in \mathcal{G}$, which proves the theorem. ∎

10.23 Corollary If $X$ and $Y$ are independent, then $E(Y|X) = E(Y)$.
Proof Direct from 10.21 and 10.22, putting $\mathcal{G} = \sigma(X)$ and $\mathcal{G}_1 = \sigma(Y)$. ∎
10.24 Theorem A pair of $\sigma$-fields $\mathcal{G}_1 \subseteq \mathcal{F}$ and $\mathcal{G}_2 \subseteq \mathcal{F}$ are independent iff $\mathrm{Cov}(X,Y) = 0$ for every pair of integrable r.v.s $X$ and $Y$ such that $X$ is measurable on $\mathcal{G}_1$ and $Y$ is measurable on $\mathcal{G}_2$.
Proof By 10.22, independence implies that the condition of 10.11 is satisfied, proving 'only if'. To prove 'if', consider $X = 1_{G_1}$ for $G_1 \in \mathcal{G}_1$, and $Y = 1_{G_2}$ for $G_2 \in \mathcal{G}_2$. $X$ is $\mathcal{G}_1$-measurable and $Y$ is $\mathcal{G}_2$-measurable. For this case,
Let $M_{ij}$ denote the set of $\omega$ on which $F^{\circ}(r_i) > F^{\circ}(r_j)$ for $r_i, r_j \in \mathbb{Q}$ with $r_i < r_j$. Similarly, let $R_i$ denote the set of $\omega$ on which $\lim_{n\to\infty}F^{\circ}(r_i + 1/n) \ne F^{\circ}(r_i)$, $r_i \in \mathbb{Q}$. And finally, let $L$ denote the set of those $\omega$ for which $F^{\circ}(+\infty) \ne 1$ and $F^{\circ}(-\infty) \ne 0$. Then $C^c = (\bigcup_{i,j}M_{ij})\cup(\bigcup_i R_i)\cup L$; $C$ is the set of $\omega$ on which $F^{\circ}$ is monotone and right-continuous at all rational points of the line, with $F^{\circ}(+\infty) = 1$ and $F^{\circ}(-\infty) = 0$. For $y \in \mathbb{R}$ let
$$F_Y(y|\mathcal{G})(\omega) = \begin{cases} F^{\circ}(y+), & \omega \in C,\\ G(y), & \text{otherwise},\end{cases} \qquad (10.74)$$
where $G$ is an arbitrary c.d.f. In view of 10.29, $P(M_{ij}) = 0$ for each pair $i,j$, $P(R_i) = 0$ for each $i$, and $P(L) = 0$. (If need be, work in the completion of the space to define these probabilities.) Since this collection is countable, $P(C) = 1$, and in view of 8.4, $F_Y(\cdot|\mathcal{G})(\omega)$ is a c.d.f. which satisfies (10.73), as it was required to show. ∎
It is straightforward, at least in principle, to generalize this argument to multivariate distributions. For $B \in \mathcal{B}$ it is possible to write
$$E(1_B|\mathcal{G})(\omega) = \int_B dF_Y(y|\mathcal{G})(\omega) \text{ a.s.}, \qquad (10.75)$$
and the standard argument by way of simple functions and monotone convergence leads us full circle, to the representation
$$E(Y|\mathcal{G})(\omega) = \int_{-\infty}^{+\infty} y\,dF_Y(y|\mathcal{G})(\omega) \text{ a.s.} \qquad (10.76)$$
If $\mathcal{G} = \sigma(X)$, we have constructions to parallel those of §10.1. Since no restriction had to be placed on the distribution to obtain this result, we have evidently found a way around the difficulties associated with the earlier definitions. However, $F_Y(\cdot|\mathcal{G})(\omega)$ is something of a novelty, a c.d.f. that is a random element from a probability space. Intuitively, we must attempt to understand this as representing the subjective distribution of $Y(\omega)$ in the mind of the observer who knows whether or not $\omega \in G$ for each $G \in \mathcal{G}$. The particular case $F_Y(\cdot|\mathcal{G})(\omega)$ is the one of interest to the statistical modeller when the outcome $\omega$ is realized. Many random variables may be generated from the elements of $(\Omega,\mathcal{F},P)$, not only the outcome itself (in the bivariate case, the pair $Y(\omega), X(\omega)$) but also variables such as $E(Y|X)(\omega)$ and the quantiles of $F_Y(\cdot|X)(\omega)$. All these have to be thought of as different aspects of the same random experiment. Let $X$ and $Y$ be r.v.s, and $\mathcal{G}$ a subfield with $\mathcal{G} \subseteq \mathcal{R}_X = \sigma(X)$ and $\mathcal{G} \subseteq \mathcal{R}_Y = \sigma(Y)$. We say that $X$ and $Y$ are independent conditional on $\mathcal{G}$ if
$$F_{XY}(x,y|\mathcal{G}) = F_X(x|\mathcal{G})F_Y(y|\mathcal{G}) \text{ a.s.} \qquad (10.77)$$
This condition implies, for example, that E(XY|𝒢) = E(X|𝒢)E(Y|𝒢) a.s. Let μ_ω be the conditional measure such that

μ_ω({X ∈ (−∞, x], Y ∈ (−∞, y]}) = F_XY(x, y|𝒢)(ω).

With ω fixed this is a regular p.m., by (the bivariate generalization of) 10.30, and μ_ω(A ∩ B) = μ_ω(A)μ_ω(B) for each A ∈ ℋ_X and B ∈ ℋ_Y, by 10.21. In this sense, the subfields ℋ_X and ℋ_Y can be called conditionally independent.
10.31 Theorem If X and Y are independent conditional on 𝒢, then

E(Y|ℋ_X) = E(Y|𝒢) a.s.   (10.78)
Proof By independence of ℋ_X and ℋ_Y under μ_ω we can write

∫ 1_A 1_B dμ_ω = ∫ 1_A dμ_ω ∫ 1_B dμ_ω, A ∈ ℋ_X, B ∈ ℋ_Y.   (10.79)

This is equivalent to

E(1_A E(Y|𝒢)|𝒢)(ω) = E(1_A Y|𝒢)(ω) = E(1_A E(Y|ℋ_X)|𝒢)(ω) a.s.(P),   (10.80)

where the first equality follows from 10.26(i) and 10.10. Integrating over Ω with respect to P, using 4.8(ii) and the LIE, we arrive at

∫_A E(Y|𝒢) dP = ∫_A Y dP = ∫_A E(Y|ℋ_X) dP, A ∈ ℋ_X.   (10.81)

This shows that E(Y|𝒢) is a version of E(Y|ℋ_X), completing the proof. ∎
Though E(Y|ℋ_X) is in principle ℋ_X-measurable, it is in fact almost surely (P) equal to a 𝒢-measurable r.v. Needless to say, the whole argument is symmetric in X and Y. The idea we are capturing here is that, to an observer who possesses the information in 𝒢 (knows whether ω ∈ G for each G ∈ 𝒢), observing X does not yield any additional information that improves his prediction of Y, and vice versa. This need not be true for an observer who does not possess the prior information. Equation (10.78) shows that the predictors of Y based on the smaller and larger information sets are the same a.s.(P), although this does not imply that E(Y|ℋ_X) = E(Y) a.s., so X and Y need not be independent in the ordinary sense.
11 Characteristic Functions
11.1 The Distribution of Sums of Random Variables

Let a pair of independent r.v.s X and Y have marginal c.d.f.s F_X(x) and F_Y(y). The c.d.f. of the sum W = X + Y is given by the convolution of F_X and F_Y, the function

F_X*F_Y(w) = ∫_{−∞}^{+∞} F_X(w − y) dF_Y(y).   (11.1)

11.1 Theorem If r.v.s X and Y are independent, then

F_X*F_Y(w) = P(X + Y ≤ w) = F_Y*F_X(w).   (11.2)
Proof Let 1_w(x, y) be the indicator function of the set {(x, y): x + y ≤ w}, so that P(X + Y ≤ w) = E(1_w(X, Y)). By independence, F(x, y) = F_X(x)F_Y(y), so this is

∫_{ℝ²} 1_w(x, y) dF(x, y) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} 1_w(x, y) dF_X(x) dF_Y(y)
  = ∫_{−∞}^{+∞} (∫_{−∞}^{w−y} dF_X(x)) dF_Y(y)
  = ∫_{−∞}^{+∞} F_X(w − y) dF_Y(y),   (11.3)

where the first equality is by Fubini's theorem. This establishes the first equality in (11.2). Reversing the roles of X and Y in (11.3) establishes the second. ∎

For continuous distributions, the convolution f = f_X*f_Y of p.d.f.s f_X and f_Y is

f(w) = ∫_{−∞}^{+∞} f_X(w − y)f_Y(y) dy,   (11.4)

such that ∫_{−∞}^{w} f(x) dx = F(w).
11.2 Example Let X and Y be independent drawings from the uniform distribution on [0,1], so that f_X(x) = 1_{[0,1]}(x). Applying (11.4) gives

f_{X+Y}(w) = ∫_0^1 1_{[w−1,w]}(y) dy.   (11.5)

It is easily verified that the graph of this function forms an isosceles triangle with base [0,2] and height 1. ∎
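The triangle in 11.2 is easily checked numerically. The following sketch (plain Python; the function name is ours, not the text's) evaluates the integral in (11.5) in closed form and compares the implied probabilities with simulated draws of X + Y.

```python
import random

def f_sum_uniform(w):
    """Density of W = X + Y for independent X, Y ~ U[0,1], from (11.5):
    the integrand 1_[w-1,w](y) is 1 on [0,1] intersected with [w-1, w],
    so f(w) is the length of that overlap."""
    lo, hi = max(0.0, w - 1.0), min(1.0, w)
    return max(0.0, hi - lo)

# The isosceles triangle with base [0,2] and height 1, peaking at w = 1:
assert f_sum_uniform(0.5) == 0.5
assert f_sum_uniform(1.0) == 1.0
assert f_sum_uniform(1.5) == 0.5
assert f_sum_uniform(2.5) == 0.0

# Monte Carlo check: P(0.9 <= W <= 1.1) equals the area 2 * (0.5 - 0.405) = 0.19.
random.seed(0)
n = 100_000
hits = sum(0.9 <= random.random() + random.random() <= 1.1 for _ in range(n))
assert abs(hits / n - 0.19) < 0.01
```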
This is the most direct result on the distribution of sums, but the formulae generated by applying the rule recursively are not easy to handle, and other approaches are preferred. The moment generating function (m.g.f.) of X, when it exists, is

M_X(t) = E(e^{tX}) = ∫ e^{tx} dF_X(x), t ∈ ℝ,   (11.6)

where e denotes the base of natural logarithms. (Integrals are taken over (−∞,+∞) unless otherwise indicated.) If X and Y are independent,

M_{X+Y}(t) = ∫∫ e^{t(x+y)} dF_X(x) dF_Y(y) = ∫ e^{tx} dF_X(x) ∫ e^{ty} dF_Y(y) = M_X(t)M_Y(t).   (11.7)
This suggests a simple approach to analysing the distribution of independent sums. The difficulty is that the method is not universal, since the m.g.f. is not defined for every distribution: considering the series expansion of e^{tX}, all the moments of X must evidently exist. The solution to this problem is to replace the variable t by it, where i = √−1 is the imaginary number. The characteristic function (ch.f.) of X is defined as

φ_X(t) = E(e^{itX}) = ∫ e^{itx} dF_X(x).   (11.8)
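Since (11.8) is just an expectation, it can be approximated by quadrature for any given distribution. A minimal sketch, taking X ~ U[0,1] for illustration (for which direct integration gives the exact ch.f. (e^{it} − 1)/(it)):

```python
import cmath

def chf_uniform(t, n=10_000):
    """Approximate E[exp(itX)] for X ~ U[0,1] by a midpoint Riemann sum
    over the density on [0,1]."""
    h = 1.0 / n
    return sum(cmath.exp(1j * t * (k + 0.5) * h) * h for k in range(n))

def chf_uniform_exact(t):
    # Closed form (e^{it} - 1)/(it); equal to 1 at t = 0.
    return (cmath.exp(1j * t) - 1) / (1j * t) if t != 0 else 1.0

for t in (0.5, 1.0, 3.0):
    assert abs(chf_uniform(t) - chf_uniform_exact(t)) < 1e-6

# |phi(t)| <= 1 for every t, since |e^{itx}| = 1 for all x.
assert abs(chf_uniform_exact(3.0)) <= 1.0
```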
11.2 Complex Numbers

A complex number is z = a + ib, where a and b are real numbers and i = √−1. a and b are called the real and imaginary parts of the number, denoted a = Re(z) and b = Im(z). The complex conjugate of z is the number z̄ = a − ib. Complex arithmetic is mainly a matter of carrying i as an algebraic unknown, and replacing i² by −1, i³ by −i, i⁴ by 1, etc., wherever these appear in an expression. One can represent z as a point in the plane with Cartesian coordinates a and b. The modulus or absolute value of z is its Euclidean distance from the origin,

|z| = (z z̄)^{1/2} = (a² + b²)^{1/2}.   (11.9)
Polar coordinates can also be used. Let the complex exponential be defined by

e^{iθ} = cos θ + i sin θ   (11.10)

for real θ. All the usual properties of the exponential function, such as multiplication by summing exponents (according to the rules of complex arithmetic), go through under this definition, and

|e^{iθ}| = (cos²θ + sin²θ)^{1/2} = 1   (11.11)
for any θ, by a standard trigonometric identity. We may therefore write z = |z|e^{iθ}, where Re(z) = |z| cos θ and Im(z) = |z| sin θ. Also note, by (11.11), that

|e^z| = |e^{Re(z)+i Im(z)}| = |e^{Re(z)}| |e^{i Im(z)}| = e^{Re(z)}.   (11.12)
If X and Y are real random variables, Z = X + iY is a complex-valued random variable. Its distribution is defined in the obvious way, in terms of a bivariate c.d.f., F(x, y). In particular,

E(Z) = E(X) + iE(Y).   (11.13)

Whereas E(Z) is a complex variable, E|Z| is of course real, and since |Z| ≤ |X| + |Y| by the triangle inequality, integrability of X and Y is sufficient for the integrability of Z. Many of the standard properties of expectations extend to the complex case in a straightforward way. One result needing proof, however, is the generalization of the modulus inequality.
11.3 Theorem If Z is a complex random variable,

|E(Z)| ≤ E|Z|.   (11.14)

Proof Consider a complex-valued simple r.v.

Z^(n) = Σ_{j=1}^{n} (α_j + iβ_j) 1_{E_j},

where the α_j and β_j are real non-negative constants and the E_j ∈ ℱ for j = 1,...,n constitute a partition of Ω. Write p_j = E(1_{E_j}). Then

|E(Z^(n))|² = (Σ_j α_j p_j)² + (Σ_j β_j p_j)²
  = Σ_j (α_j² + β_j²) p_j² + Σ_{j≠k} (α_j α_k + β_j β_k) p_j p_k,   (11.15)
whereas

(E|Z^(n)|)² = (Σ_j (α_j² + β_j²)^{1/2} p_j)²
  = Σ_j (α_j² + β_j²) p_j² + Σ_{j≠k} (α_j² + β_j²)^{1/2}(α_k² + β_k²)^{1/2} p_j p_k.   (11.16)

The modulus inequality holds for Z^(n) if

0 ≤ (E|Z^(n)|)² − |E(Z^(n))|² = Σ_{j≠k} [(α_j² + β_j²)^{1/2}(α_k² + β_k²)^{1/2} − (α_j α_k + β_j β_k)] p_j p_k.   (11.17)
The coefficients of p_j p_k in this expression are the differences of pairs of non-negative terms, and these differences are non-negative if and only if the differences of the squares are non-negative. But, as required,

(α_j² + β_j²)(α_k² + β_k²) − (α_j α_k + β_j β_k)² = α_j² β_k² + α_k² β_j² − 2α_j α_k β_j β_k = (α_j β_k − α_k β_j)² ≥ 0.   (11.18)

This result extends to any complex r.v. having non-negative real and imaginary parts by letting Z^(n) = X^(n) + iY^(n) ↑ Z = X + iY, using 3.28, and invoking the monotone convergence theorem. To extend to general integrable r.v.s, split X and Y into positive and negative parts, so that Z = Z⁺ − Z⁻, where Z⁺ = X⁺ + iY⁺ and Z⁻ = X⁻ + iY⁻, with X⁻ ≥ 0 and Y⁻ ≥ 0. Noting that these have non-negative real and imaginary parts,

|E(Z)| = |E(Z⁺ − Z⁻)| ≤ E|Z⁺ − Z⁻| = E|Z|   (11.19)

completes the proof. ∎
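Theorem 11.3 can be illustrated directly for a discrete complex r.v. (the values and probabilities below are arbitrary choices of ours), together with the equality case in which all mass points share a common direction:

```python
import cmath

# A simple complex r.v.: values z_j taken with probabilities p_j.
values = [complex(1, 2), complex(-3, 0.5), complex(0.25, -1)]
probs = [0.5, 0.3, 0.2]

EZ = sum(p * z for p, z in zip(probs, values))          # E(Z), as in (11.13)
E_abs_Z = sum(p * abs(z) for p, z in zip(probs, values))  # E|Z|

# The modulus inequality (11.14): |E(Z)| <= E|Z|.
assert abs(EZ) <= E_abs_Z

# Equality when Z is a.s. a non-negative multiple of one fixed direction:
aligned = [2 * cmath.exp(1j * 0.7), 5 * cmath.exp(1j * 0.7)]
EZ2 = 0.4 * aligned[0] + 0.6 * aligned[1]
assert abs(abs(EZ2) - (0.4 * 2 + 0.6 * 5)) < 1e-12
```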
11.3 The Theory of Characteristic Functions

We are now equipped to study some of the properties of the characteristic function φ_X(t). The fact that it is defined for any distribution follows from the fact that |e^{itx}| = 1 for all x; E(|e^{itX}|) = 1, and E(e^{itX}) is finite regardless of the distribution of X. The real and imaginary parts of φ_X(t) are respectively E(cos tX) and E(sin tX).

11.4 Theorem If E|X|^k < ∞, then

d^k φ_X(t)/dt^k = E((iX)^k e^{itX}).   (11.20)
Proof

(φ_X(t + h) − φ_X(t))/h = ∫_{−∞}^{+∞} ((e^{ix(t+h)} − e^{itx})/h) dF(x),   (11.21)

where, using (11.10),

(e^{ix(t+h)} − e^{itx})/h = (cos x(t + h) − cos tx)/h + i(sin x(t + h) − sin tx)/h.

The limits of the real and imaginary terms in this expression as h → 0 are respectively −x sin tx and i(x cos tx), so the limit of the integrand in (11.21) is

x(i cos tx − sin tx) = ix(cos tx + i sin tx) = ixe^{itx}.

Since |ixe^{itx}| = |x|, the integral exists if E|X| < ∞. This proves (11.20) for k = 1. To complete the proof, the same argument can be applied inductively to the integrands (ix)^{k−1}e^{itx}, for k = 2, 3, .... ∎

It follows that the integer moments of the distribution, when they exist, can be obtained by repeated differentiation with respect to t.
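For instance, taking X ~ U[0,1] (an assumption made purely for illustration), numerical differentiation of the exact ch.f. at t = 0 recovers iE(X) = i/2 and i²E(X²) = −1/3, in line with (11.20):

```python
import cmath

def chf(t):
    """Exact ch.f. of X ~ U[0,1]: (e^{it} - 1)/(it), with chf(0) = 1."""
    return (cmath.exp(1j * t) - 1) / (1j * t) if t != 0 else 1.0

h = 1e-3  # step for central differences

# First derivative at t = 0: should equal i E(X) = i/2.
d1 = (chf(h) - chf(-h)) / (2 * h)
assert abs(d1 - 0.5j) < 1e-5

# Second derivative at t = 0: should equal i^2 E(X^2) = -1/3.
d2 = (chf(h) - 2 * chf(0.0) + chf(-h)) / h**2
assert abs(d2 - (-1 / 3)) < 1e-4
```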
If E|X|^k < ∞, then, from the series expansion of e^{ix},

|e^{ix} − Σ_{j=0}^{k} (ix)^j/j!| ≤ min{|x|^{k+1}/(k + 1)!, 2|x|^k/k!}.

The second alternative on the right is obtained in view of the fact that the kth-order remainder is the difference of the (k − 1)th-order remainder, whose modulus is at most |x|^k/k!, and the term (ix)^k/k!, whose modulus is |x|^k/k!.
11.11 Inversion theorem Let X have c.d.f. F and ch.f. φ defined by (11.8); then

F(b) − F(a) = lim_{T→∞} (1/2π) ∫_{−T}^{T} ((e^{−ita} − e^{−itb})/it) φ(t) dt   (11.42)

for any pair a and b of continuity points of F, with a < b. A generalization of this formula is

ΔF(x_1,...,x_k) = lim_{T→∞} (1/(2π)^k) ∫_{−T}^{T} ⋯ ∫_{−T}^{T} (∏_{j=1}^{k} (e^{−it_j a_j} − e^{−it_j b_j})/it_j) φ(t_1,...,t_k) dt_1 ⋯ dt_k,   (11.43)

where ΔF(x_1,...,x_k) is the probability assigned to the rectangle with opposite vertices (a_1,...,a_k) and (b_1,...,b_k), b_j > a_j, whose vertices are all continuity points of F. □

It is easily verified, using (11.10), that
(e^{−ita} − e^{−itb})/it → b − a as t → 0.
The integrals in (11.42) and (11.43) are therefore well defined in spite of including the point t = 0. Despite this, it is necessary to avoid writing (11.42) as

F(b) − F(a) = ∫_{−∞}^{+∞} ((e^{−ita} − e^{−itb})/2πit) φ(t) dt,   (11.44)

because the Lebesgue integral on the right may not exist. For example, suppose the random variable is degenerate at the point 0; this means that φ(t) = e^{it0} = 1, and
(1/2π) ∫_{−T}^{T} |(e^{−ita} − e^{−itb})/it| dt ≥ k log T   (11.45)

for some constant k > 0 and all T sufficiently large,
so that the criterion for Lebesgue integrability over (−∞,+∞) fails. However, the limits in (11.42) and (11.43) do exist, as the proof reveals.

Proof of 11.11 Only the univariate case will be proved, the multivariate extension being identical in principle. After substituting for φ(t) in (11.42), we can interchange the order of integration by 9.32, whose continuity and a.s. boundedness requirements are certainly satisfied here:

(1/2π) ∫_{−T}^{T} ((e^{−ita} − e^{−itb})/it) φ(t) dt = (1/2π) ∫_{−T}^{T} ∫_{−∞}^{+∞} ((e^{it(x−a)} − e^{it(x−b)})/it) dF(x) dt
  = ∫_{−∞}^{+∞} (∫_{−T}^{T} (e^{it(x−a)} − e^{it(x−b)})/2πit dt) dF(x).   (11.46)
Using (11.10),

∫_{−T}^{T} (e^{it(x−a)} − e^{it(x−b)})/2πit dt = (1/π) ∫_0^T (sin t(x − a))/t dt − (1/π) ∫_0^T (sin t(x − b))/t dt,   (11.47)

noting that the cosine is an even function, so that the terms containing cosines (which are also the imaginary terms) vanish in the integral. The limit of this expression as T → ∞ is obtained from the standard formula

lim_{T→∞} (1/π) ∫_0^T (sin tx)/t dt = 1/2 for x > 0, 0 for x = 0, and −1/2 for x < 0.
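The limit in (11.42) is easy to check numerically for a distribution whose c.d.f. is known. The sketch below (our own construction) uses the standard normal, whose ch.f. is e^{−t²/2}, truncates the integral at a finite T of our choosing, and uses midpoint nodes so that t = 0 is never evaluated:

```python
import cmath
import math

def normal_chf(t):
    # ch.f. of the standard normal distribution.
    return cmath.exp(-0.5 * t * t)

def invert(a, b, T=10.0, n=4000):
    """Approximate F(b) - F(a) via the inversion formula (11.42),
    truncating the integral at [-T, T]; midpoint rule avoids t = 0."""
    h = 2 * T / n
    total = 0.0
    for k in range(n):
        t = -T + (k + 0.5) * h
        integrand = (cmath.exp(-1j * t * a) - cmath.exp(-1j * t * b)) / (1j * t)
        total += (integrand * normal_chf(t)).real * h
    return total / (2 * math.pi)

# F(1) - F(-1) for the standard normal is about 0.6826894921.
est = invert(-1.0, 1.0)
assert abs(est - 0.6826894921) < 1e-4
```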
If the series on the majorant side converges, this inequality remains valid as n → ∞. It may be contrasted for tightness with the c_r inequality for general X_t, (9.62). In this case, 2^{r−1} must be replaced by n^{r−1} for 1 < r ≤ 2, which is of no use for large n.
III
THEORY OF STOCHASTIC PROCESSES
12 Stochastic Processes
12.1 Basic Ideas and Terminology

Let (Ω, ℱ, P) be a probability space, let 𝕋 be any set, and let ℝ^𝕋 be the product space generated by taking a copy of ℝ for each element of 𝕋. Then a stochastic process is a measurable mapping x: Ω ↦ ℝ^𝕋, where

x(ω) = {X_τ(ω), τ ∈ 𝕋}.   (12.1)
𝕋 is called the index set, and the r.v. X_τ(ω) is called a coordinate of the process. A stochastic process can also be characterized as a mapping from Ω × 𝕋 to ℝ. However, the significant feature of the definition given is the requirement of joint measurability of the coordinates. Something more is implied than having X_τ(ω) a measurable r.v. for each τ. Here 𝕋 is an arbitrary set which in principle need not even be ordered, although linear ordering characterizes the important cases. A familiar example is 𝕋 = {1,...,k}, where x is a random k-vector. Another important case is where 𝕋 is an interval of ℝ, such that x(ω) is a function of a real variable and ℝ^𝕋 the space of random functions. And when 𝕋 is a countable subset of ℝ, x = {X_τ(ω), τ ∈ 𝕋} defines a stochastic sequence. Thus, a stochastic sequence is a stochastic process whose index set is countable and linearly ordered. When the X_τ represent random observations equally spaced in time, no relevant information is lost by assigning a linear ordering through ℕ or ℤ, indicated by the notations {X_t(ω)}_{t=1}^∞ and {X_t(ω)}_{t=−∞}^∞. The definition does not rule out 𝕋 containing information about distances between the sequence coordinates, as when the observations are irregularly spaced in time, with a real number representing elapsed time from a chosen origin, but cases of this kind will not be considered explicitly.

Familiarly, a time series is a time-ordered sequence of observations of (say) economic variables, although the term may extend to unobserved or hypothetical variables, such as the errors in a regression model. Time-series coordinates are labelled t. If a sample is defined as a time series of finite length n (or, more generally, a collection of such series for different variables), it is convenient to assume that samples are embedded in infinite sequences of 'potential' observations.

Various mathematical functions of sample observations, statistics or estimators, will also be well known to the reader, characteristically involving a summation of terms over the coordinates. The sample moments of a time series, regression coefficients, log-likelihood functions and their derivatives, are standard examples. By letting n take the values 1, 2, 3, ..., these functions of n observations generate what we may call 'derived' sequences. The notion of a sequence in this case comes from the idea of analysing samples of progressively increasing size. The mathematical theory often does not distinguish between the types of sequence under consideration, and some of our definitions and results apply generally, but a clue to the usual application will be given by the choice of index symbol, t or n as the case may be.
A leading case which does not fall under the definition of a sequence is where 𝕋 is partially ordered. When there are two dimensions to the observations, as in a panel data set having both a time dimension and a dimension over agents, x may be called a random field. Such cases are not treated explicitly here, although in many applications one dimension is regarded as fixed and the sequence notion is adequate for asymptotic analysis. However, cases where 𝕋 is either the product set ℕ × ℕ, or a subset thereof, are often met below in a different context. A triangular stochastic array is a doubly-indexed collection of random variables,

X_11
X_21  X_22
X_31  X_32  X_33
 ⋮      ⋮      ⋮      (12.2)

compactly written as {{X_nt}_{t=1}^{k_n}}_{n=1}^∞, where {k_n}_{n=1}^∞ is some increasing integer sequence. Array notation is called for when the points of a sample are subjected to scale transformations or the like, depending on the complete sample. A standard example is {{X_nt}_{t=1}^{n}}_{n=1}^∞, where X_nt = X_t/s_n and s_n² = Σ_{t=1}^{n} Var(X_t), or some similar function of the sample moments from 1 to n.
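The standard array just described is easy to make concrete. A minimal sketch (our own construction; the variances are taken as known inputs) builds the rows {X_nt, t = 1,...,n} with X_nt = X_t/s_n:

```python
import math

def triangular_array(x, variances):
    """Rows of the array X_nt = X_t / s_n for n = 1..len(x),
    where s_n^2 = Var(X_1) + ... + Var(X_n)."""
    rows = []
    s2 = 0.0
    for n, v in enumerate(variances, start=1):
        s2 += v
        s_n = math.sqrt(s2)
        rows.append([xt / s_n for xt in x[:n]])
    return rows

rows = triangular_array([1.0, -2.0, 3.0], [1.0, 1.0, 1.0])

# A triangular shape: row n has n entries.
assert [len(r) for r in rows] == [1, 2, 3]
# With unit variances, s_n = sqrt(n), so row 3 scales by 1/sqrt(3):
assert abs(rows[2][0] - 1.0 / math.sqrt(3.0)) < 1e-12
```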
12.2 Convergence of Stochastic Sequences

Consider the functional expression {X_n(ω)}_{n=1}^∞ for a random sequence on the space (Ω, ℱ, P). When evaluated at a point ω ∈ Ω, this denotes a realization of the sequence, the actual collection of real numbers generated when the outcome ω is drawn. It is natural to consider, in the spirit of ordinary analysis, whether this sequence converges to a limit, say X(ω). If this is the case for every ω ∈ Ω, we would say that X_n → X surely (or elementwise), where, if X_n is an ℱ/ℬ-measurable r.v. for each n, then so is X, by 3.26. But, except by direct construction, it is usually difficult to establish in terms of a given collection of distributional properties that a stochastic sequence converges surely to a limit. A much more useful notion (because more easily shown) is almost sure convergence. Let C ⊆ Ω be the set of outcomes such
that, for every ω ∈ C, X_n(ω) → X(ω) as n → ∞. If P(C) = 1, the sequence is said to converge almost surely or, equivalently, with probability one. The notations X_n → X a.s., X_n →^{a.s.} X, and a.s.lim_{n→∞} X_n = X are all used to denote almost sure convergence. A similar concept, of convergence almost everywhere (a.e.), was invoked in connection with the properties of integrals in §4.2. For many purposes, almost sure convergence can be thought of as yielding the same implications as sure convergence in probabilistic arguments.

However, attaching probabilities to the convergent set is not the only way in which stochastic convergence can be understood. Associated with any stochastic sequence are various non-stochastic sequences of variables and functions describing aspects of its behaviour, moments being the obvious case. Convergence of the stochastic sequence may be defined in terms of the ordinary convergence of an associated sequence. If the sequence {E(X_n − X)²}_{n=1}^∞ converges to zero, there is clearly a sense in which X_n → X; this is called convergence in mean square. Or suppose that, for any ε > 0, the probabilities of the events {ω: |X_n(ω) − X(ω)| < ε} ∈ ℱ form a real sequence converging to 1. This is another distinct convergence concept, so-called convergence in probability. In neither case is there any obvious way to attach a probability to the convergent set; this can even be zero! These issues are studied in Part IV.

Another convergence concept relates to the sequence of marginal p.m.s of the coordinates, {μ_n}_{n=1}^∞, or equivalently the marginal c.d.f.s, {F_n}_{n=1}^∞. Here we can consider conditions for convergence of the real sequences {μ_n(A)}_{n=1}^∞ for various sets A ∈ ℬ, or alternatively of {F_n(x)}_{n=1}^∞ for various x ∈ ℝ. In the latter case, uniform or pointwise convergence on ℝ is a possibility, but these are relatively strong notions. It is sufficient for a theory of the limiting distribution if convergence is confined just to the continuity points of the limiting function F, or equivalently (as we shall show in Chapter 22), of {μ_n(A)}, to sets A having μ(∂A) = 0. This condition is referred to as the weak convergence of the distributions, and forms the subject of Part V.
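A small simulation may make the distinction between these modes vivid. Take X_n = 1 with probability 1/n and 0 otherwise, independently across n (our own example). Then P(|X_n| > ε) = 1/n → 0, so X_n → 0 in probability; yet since Σ 1/n diverges, the event X_n = 1 occurs infinitely often with probability 1 (by the second Borel–Cantelli lemma), so the sequence does not converge almost surely:

```python
import random

random.seed(42)

def estimate_prob(n, reps=200_000):
    """Monte Carlo estimate of P(X_n = 1) when X_n ~ Bernoulli(1/n)."""
    return sum(random.random() < 1.0 / n for _ in range(reps)) / reps

p10, p100 = estimate_prob(10), estimate_prob(100)

# The exceedance probabilities shrink toward zero with n:
assert abs(p10 - 0.1) < 0.005
assert abs(p100 - 0.01) < 0.002
assert p100 < p10
```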
12.3 The Probability Model

Some very important ideas are implicit in the notion of a stochastic sequence. Given the equipotency of ℕ and 𝕋, it will suffice to consider the random element {X_t(ω)}_{t=1}^∞, ω ∈ Ω, mapping from a point of Ω to a point in infinite-dimensional Euclidean space, ℝ^∞. From a probabilistic point of view, the entire infinite sequence corresponds to a single outcome ω of the underlying abstract probability space. In principle, a sampling exercise in this framework is the random drawing of a point in ℝ^∞, called a realization or sample path of the random sequence; we may actually observe only a finite segment of this sequence, but the key idea is that a random experiment consists of drawing a complete realization. Repeated sampling means observing the same finite segment (relative to the origin of the index set) of different realizations, not different segments of the same
realization. The reason for this characterization of the random experiment will become clear
in the sequel; for the moment we will just concentrate on placing this slightly outlandish notion of an infinite-dimensioned random element into perspective. To show that there is no difficulty in establishing a correspondence between a probability space of a familiar type and a random sequence, we discuss a simple example in some detail.

12.1 Example Consider a repeated game of coin tossing, generating a random sequence of heads and tails; if the game continues for ever, it will generate a sequence of infinite length. Let 1 represent a head and 0 a tail, and we have a random sequence of 1s and 0s. Such a sequence corresponds to the binary (base 2) representation of a real number; according to equation (1.15) there is a one-to-one correspondence between infinite sequences of coin tosses and points on the unit interval. On this basis, the fundamental space (Ω, ℱ) for the coin-tossing experiment can be chosen as ([0,1), ℬ_[0,1)). The form of P can be deduced in an elementary way from the stipulation that P(heads) = P(tails) = 0.5 (i.e. the coin is fair) and successive tosses are independent. For example, the events {tails on first toss} and {heads on first toss} are the images of the sets [0, 0.5) and [0.5, 1) respectively, whose measures must accordingly be 0.5 each. More generally, the probability that the first n tosses in a sequence yield a given configuration of heads and tails out of the 2^n possible ones is equal in every case to 1/2^n, so that each sequence is (in an appropriate limiting sense) 'equally likely'. The corresponding sets in [0,1) of the binary expansions with the identical pattern of 0s and 1s in the first n positions occupy intervals all of width precisely 1/2^n in the unit interval. The conclusion is that the probability measure of any interval is equal to its width. This is nothing but Lebesgue measure on the half-open interval [0,1). □

This example can be elaborated from binary sequences to sequences of real variables without too much difficulty. There is an intimate connection between infinite random sequences and continuous probability distributions on the line, and understanding one class of problem is frequently an aid to understanding the other. The question often posed, about the probability of some sequence predicted in advance being realized, say an infinite run of heads or a perpetual alternation of heads and tails, is precisely answered. In either the decimal or binary expansions, all the numbers whose digit sequences either terminate or, beyond some point, are found to cycle perpetually through a finite sequence belong to the set of rational numbers. Since the rationals have Lebesgue measure zero in the space of the reals, we have a proof that the probability of any such sequence occurring is zero.
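The correspondence in 12.1 is easy to compute with: the first n tosses pin the realization down to a dyadic interval of width 1/2^n, which is also its probability under fair independent tosses. A small sketch (the function name is ours):

```python
def point_from_tosses(tosses):
    """Map a finite prefix of a coin-toss sequence (1 = head, 0 = tail)
    to the left endpoint of the corresponding dyadic interval in [0, 1)."""
    return sum(b / 2 ** (i + 1) for i, b in enumerate(tosses))

# 'Tails first' -> [0, 0.5); 'heads first' -> [0.5, 1), as in the example.
assert point_from_tosses([0]) == 0.0
assert point_from_tosses([1]) == 0.5

# Three tosses fix an interval of width 1/2^3 = 0.125, the probability
# of that exact pattern; H,T,H starts the interval at 0.625.
assert point_from_tosses([1, 0, 1]) == 0.625
assert 2.0 ** -3 == 0.125
```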
Another well-known conundrum concerns the troupe of monkeys equipped with typewriters who, it is claimed, will eventually type out the complete works of Shakespeare. We can show that this event will occur with probability 1. For the sake of argument, assume that a single monkey types into a word processor, and his ASCII-encoded output takes the form of a string of bits (binary digits). Suppose Shakespeare's encoded complete works occupy k bits, equivalent to k/5 characters allowing for a 32-character keyboard (upper case only, but including some punctuation marks). This string is one of the 2^k possible strings of k bits. Assuming that each such string is equally likely to arise in k/5 random key presses, the probability that the monkey will type Shakespeare without an error from scratch is exactly 2^{−k}. However, the probability that the second string of k bits it produces is the right one, given that the first one is wrong, is (1 − 2^{−k})2^{−k} when the strings are independent. In general, the probability that the monkey will type Shakespeare correctly on the (m + 1)th independent attempt, given that the first m attempts were failures, is (1 − 2^{−k})^m 2^{−k}. All these events are disjoint, and summing their probabilities over all m ≥ 0 yields

P(monkey types Shakespeare eventually) = 2^{−k} Σ_{m=0}^∞ (1 − 2^{−k})^m = 1.

In the meantime, of course, the industrious primate has produced much of the rest of world literature, not to mention a good many telephone books. It is also advisable to estimate the length of time we are likely to wait for the desired text to appear, which requires a further calculation. The average waiting time, expressed in units of the time taken to type k bits, is Σ_{m=1}^∞ m 2^{−k}(1 − 2^{−k})^{m−1} = 2^k. If we scale down our ambitions and decide to be content with just 'TO BE OR NOT TO BE' (5 × 18 = 90 bits), and the monkey takes 1 minute over each attempt, we shall wait on average 2^90 minutes, or about 2.3 × 10^21 years. So the Complete Works don't really bear thinking about. What we have shown is that almost every infinite string of bits contains every finite string somewhere in its length; but also, that the mathematical concept of 'almost surely' has no difficulty in coinciding with an everyday notion indistinguishable from 'never'. The example is frivolous, but it is useful to be reminded occasionally that limit theory deals in large numbers. A sense of perspective is always desirable in evaluating the claims of the theory.
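The waiting-time arithmetic is elementary and worth verifying (the figures below simply reproduce the calculation in the text):

```python
# Attempts are independent with success probability p = 2^-k, so the number
# of attempts to the first success is geometric with mean 1/p = 2^k.
k = 90  # 'TO BE OR NOT TO BE': 18 characters x 5 bits per character
expected_attempts = 2 ** k
assert (2.0 ** -k) * expected_attempts == 1.0  # mean = 1/p exactly

# One attempt per minute:
minutes_per_year = 60 * 24 * 365.25
expected_years = expected_attempts / minutes_per_year
assert 2.2e21 < expected_years < 2.4e21  # about 2.3 * 10^21 years
```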
The first technical challenge we face in the theory of stochastic processes is to handle distributions on ℝ^∞. To construct the Borel field ℬ^∞ of events on ℝ^∞ we implicitly endow ℝ^∞ with the Tychonoff, or product, topology. It is not essential to have absorbed the theory of §6.5 to make sense of the discussion that follows, but it may help to glance at Example 6.18 to see what this assumption implies. Given a process x = {X_t}_{t=1}^∞, we shall write

π_k(x) = (x_1,...,x_k): ℝ^∞ ↦ ℝ^k,   (12.3)

for each k ∈ ℕ, to denote the k-dimensional coordinate projection. Let 𝒞 denote the collection of finite-dimensional cylinder sets of ℝ^∞, the sets

C = {x ∈ ℝ^∞: π_k(x) ∈ E, E ∈ ℬ^k, k ∈ ℕ}.   (12.4)

In other words, elements of 𝒞 have the form

C = π_k^{−1}(E)   (12.5)

for some E ∈ ℬ^k and some finite k. Although we may wish to consider arbitrary finite-dimensional cylinders, there is no loss of generality in considering the projections onto just the first k coordinates. Any finite-dimensional cylinder can
be embedded in a cylinder of the form π_k^{−1}(E), E ∈ ℬ^k, where k is just the largest of the restricted coordinates. The distinguishing feature of an element of 𝒞 is that at most a finite number of its coordinates are restricted.

12.2 Theorem 𝒞 is a field.
Proof First, the complement in ℝ^∞ of a set C defined by (12.4) is

C^c = {x ∈ ℝ^∞: π_k(x) ∈ E^c} = π_k^{−1}(E^c), E^c ∈ ℬ^k,   (12.6)

which is another element of 𝒞, i.e. C^c ∈ 𝒞. Second, consider the union of sets C = π_k^{−1}(E) ∈ 𝒞 and C' = π_k^{−1}(E') ∈ 𝒞, for E, E' ∈ ℬ^k. C ∪ C' is given by (12.4) with E replaced by E ∪ E', and hence C ∪ C' ∈ 𝒞. Third, if E ∈ ℬ^k and E' ∈ ℬ^m for m > k, then E × ℝ^{m−k} ∈ ℬ^m, and so the argument of the second case applies. ∎
Fig. 12.1

It is not easy to imagine sets in arbitrary numbers of dimensions, but good visual intuition is provided by thinking about one-dimensional cylinders in ℝ³. Letting (x, y, z) denote the coordinate directions, the one-dimensional cylinder generated by an interval of the x axis is a region of 3-space bounded by two infinite planes at right angles to the x axis (see Fig. 12.1 for a cut-away representation). A union of x-cylinders is another x-cylinder, a collection of parallel 'walls'. But the union and intersection of an x-cylinder with a y-cylinder are two-dimensional cylinder sets, a 'cross' and a 'column' respectively (see Fig. 12.2). These examples show that the collection of cylinder sets in ℝ^k for fixed k is not a field; the intersection of three mutually orthogonal 'walls' in ℝ³ is a bounded 'cube', not a cylinder set. The set of finite-dimensional cylinders is not closed under the operations of union and complementation (and hence intersection) except in an infinite-dimensional space. This fact is critical in considering σ(𝒞), the class obtained by adding the countable unions to 𝒞. By the last-mentioned property of unions, σ(𝒞) includes sets of the form (12.4) with k tending to infinity. Thus, we have the following theorem.
12.3 Theorem σ(𝒞) = ℬ^∞, the Borel field of sets in ℝ^∞ with the Tychonoff topology. □
Fig. 12.2

The condition of this result is something we can take for granted in the usual applications. Recalling that the Borel field of a space is the smallest σ-field containing the open sets, 12.3 is true by definition, since 𝒞 is a sub-base for the product topology (see §6.5) and all the open sets of ℝ^∞ are generated by unions and finite intersections of 𝒞-sets. To avoid explicit topological considerations, the reader may like to think of 12.3 as providing the definition of ℬ^∞. One straightforward implication, since the coordinate projections are continuous mappings and hence measurable, is that, given a distribution on (ℝ^∞, ℬ^∞), finite collections of sequence coordinates can always be treated as random vectors. But, while this is obviously a condition that will need to be satisfied, the real problem runs the other way. The only practical method we have of defining distributions for infinite sequences is to assign probabilities to finite collections of coordinates, after the manner of §8.4. The serious question is whether this can be done in a consistent manner, so that, in particular, there is one and only one p.m. on (ℝ^∞, ℬ^∞) that corresponds to a given set of finite-dimensional distributions. The affirmative answer to this question is the famous Kolmogorov consistency theorem.
12.4 The Consistency Theorem The goal is to construct a p.m. on @*,f*),and, following the approach of j3.2, the plausible first step in this direction is to assign probabilities to elements Of C Let p denote a p.m. on th space (R,f$, for k 1,2,3,.... We will say that this family of measures satisfies the consistency property if =
.
gt' )
=
g'-t'
X
R
m-k
)
(12.7)
Theory of Stochastic Processes
184
for E G Bk and al1 m > k > 0. ln other words, any k-dimensional distribution can be obtained from an v-dimensional distribution with m > k, by the usual operation of marginalization.
The consistency theorem actually generalizes to stochastic processes with uncountable index sets 𝕋 (see 27.1), but it is sufficient for present purposes to consider the countable case.

12.4 Kolmogorov's consistency theorem Suppose there exists a family of p.m.s {μ_k} which satisfy the consistency condition (12.7). Then there exists a stochastic sequence {X_t, t ∈ ℕ} on a probability space (ℝ^∞, ℬ^∞, μ) such that μ_k is the p.m. of the vector of the first k coordinate functions, (X_1,...,X_k)′. □
The candidate measure μ for x is defined for sets in 𝒞 by

μ(C) = μ_k(E),   (12.8)

where C and E are related by (12.4). The problem is to show that μ is a p.m. on 𝒞. If this is the case then, since 𝒞 is a field and ℬ^∞ = σ(𝒞), we may appeal to the extension theorem (3.8/3.13) to establish the existence of a unique measure on (ℝ^∞, ℬ^∞) which agrees with μ for all C ∈ 𝒞. The theorem has a simple but important corollary.

12.5 Corollary 𝒞 is a determining class for (ℝ^∞, ℬ^∞). □
In other words, if μ and ν are two measures on (ℝ^∞, ℬ^∞) and μ_k = ν_k for every finite k, then μ = ν.
To prove the consistency theorem we require a technical lemma whose proof is beyond us at this stage. It is quite intuitive, however, and will be proved in a more general context as 26.23.

12.6 Lemma For every E ∈ ℬ^k and η > 0 there exists K, a compact subset of E, such that μ_k(E − K) < η. □

In other words, a p.m. on the space (ℝ^k, ℬ^k) has nearly all of its mass confined to a compact set; this implies in particular the proposition asserted in §8.1, that random variables are finite almost surely.
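For instance, with k = 1 and μ₁ standard normal (our choice of illustration), the compact intervals [−M, M] absorb all but an arbitrarily small fraction of the mass, as 12.6 asserts:

```python
import math

def normal_cdf(x):
    # Standard normal c.d.f. via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def compact_mass(M):
    """Mass the standard normal p.m. puts on the compact set [-M, M]."""
    return normal_cdf(M) - normal_cdf(-M)

# For eta = 10^-6, the compact set [-5, 5] already suffices:
eta = 1e-6
assert compact_mass(5.0) > 1 - eta
# The mass outside shrinks as the compact set grows:
assert compact_mass(5.0) > compact_mass(2.0) > compact_mass(1.0)
```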
Proof of 12.4 We will verify that μ of (12.8) satisfies the probability axioms with respect to elements of 𝒞. When E = ℝ^k, C = ℝ^∞, so that the first two probability axioms, 7.1(a) and (b), are certainly satisfied. To establish finite additivity, suppose we have 𝒞-sets C = π_k^{−1}(E) and C' = π_m^{−1}(E'), for E ∈ ℬ^k, E' ∈ ℬ^m, and m ≥ k. If C and C' are disjoint,

μ(C) + μ(C') = μ_k(E) + μ_m(E')
  = μ_m(E × ℝ^{m−k}) + μ_m(E')
  = μ_m((E × ℝ^{m−k}) ∪ E')
  = μ(C ∪ C'),   (12.9)

where the second equality applies the consistency condition (12.7), and the third one uses the fact that E × ℝ^{m−k} and E' are disjoint if C and C' are.

The remaining, relatively tricky, step is to extend finite additivity to countable additivity. This is done by proving continuity, which is an equivalent property according to 3.5. If and only if the measure is continuous, a monotone sequence {C_j ∈ 𝒞} such that C_j ↑ C or C_j ↓ C has the property μ(C_j) → μ(C). Since C_j ↑ C implies C_j^c ↓ C^c, where μ(C^c) = 1 − μ(C), it is sufficient to consider the decreasing case. And by considering the sequence C_j − C, there is also no loss of generality in setting C = ∅, so that continuity implies μ(C_j) → 0. To prove continuity, it is sufficient to show that if μ(C_j) ≥ ε for some ε > 0, for every j, then C is nonempty. If C_j ∈ 𝒞 for some j ≥ 1, then μ(C_j) = μ_{k(j)}(E_j) for some set E_j ∈ ℬ^{k(j)}, where k(j) is the dimension of the cylinder C_j. By consistency, μ_{k(j)}(E_j) = μ_m(E_j × ℝ^{m−k(j)}) for any m > k(j), so there is no loss of generality in assuming that k(1) < k(2) < .... We may therefore define, for fixed j, sets F_i ∈ ℬ^{k(j)}, i = 1,...,j, by setting F_j = E_j and

F_i = E_i × ℝ^{k(j)−k(i)}, i = 1,...,j − 1.   (12.10)

Since {C_j} is a decreasing sequence, so is the sequence of subsets {F_i}_{i=1}^{j}, for each j ≥ 1.

Consider any fixed j. There exists, by 12.6, a compact set K_i ⊆ E_i such that μ_{k(i)}(E_i − K_i) < ε/2^{i+1}, for i = 1,...,j. Define the sets

K_i' = K_i × ℝ^{k(j)−k(i)}, i = 1,...,j,   (12.11)

and

D_j = ∩_{i=1}^{j} K_i' ⊆ F_j.   (12.12)

Since the {F_i} are decreasing,

E_j − D_j ⊆ ∪_{i=1}^{j} (F_i − K_i'),   (12.13)

while by consistency,

μ_{k(j)}(F_i − K_i') = μ_{k(i)}(E_i − K_i) < ε/2^{i+1},   (12.14)

so that

μ_{k(j)}(E_j − D_j) < Σ_{i=1}^{j} ε/2^{i+1} < ε/2.   (12.15)

Since μ_{k(j)}(E_j) ≥ ε by assumption, it follows from (12.15) that μ_{k(j)}(D_j) > ε/2, and accordingly that D_j is nonempty.

Now we construct a point of C. Let {x(j), j ∈ ℕ} denote a sequence of points
.
=
F/let??'yof Stochastic Processes
186
of R=
with
x(J) e
Dj for each j, so that
(X1U),...,Xw)U))lk(j)@U)) =
Note that for m
=
(12. 16)
Fj.
1,...,./,
1 Km, (XlU),...,Akm)U))'&(,?,)(xU))
(12.17)
=
by (12.12), where Kmis compact. Now 1et m be tixed. Our reasoning ensures that (12. 17) holds for each.j k m. A set in Rkm' is compact if and only if each of the coordinate sets in R is compact, so consider the bounded scalar scquences, fxibij, j r?z) for f 1,...,k4-). Each of these has a cluster point A'1, and we can use the diagonal method (2.36)to construct a single subsequence (Al with the = A': for each 1,...,k(-). By the compactness, property that Xijn) e Km i Em. This is true for every m e IN (X1,...,X(m)) Consider the point a7*e R= defined by 7v(,,,)(x*) for each m e N. (X1,...,X(m)) = Since x* e Cm for each m, we have x* G O'>;=ICk C, as required. . =
.-+
.
=
This theorem shows that, if a p.m. satisfying (12.7) can be assigned to the finite-dimensional distributions of a sequence x, then x is a random element of a probability space (ℝ^∞, 𝓑^∞, μ). We shall often wish to think of (ℝ^∞, 𝓑^∞, μ) as derived from an abstract probability space (Ω, 𝓕, P), and then we shall say that x is 𝓕/𝓑^∞-measurable if x^{-1}(E) ∈ 𝓕 for each event E ∈ 𝓑^∞. This statement implies that the coordinates X_t are 𝓕/𝓑-measurable r.v.s for each t, but it also implies a great deal more than this, since it is possible to assign measures to events involving countably many sequence coordinates.
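The consistency condition (12.7) can be made concrete in the simplest possible setting, a fair coin-tossing sequence, where μ_k assigns probability 2^{-k} to each k-dimensional cylinder. The sketch below is our own illustration, not part of the text; the function and set names are illustrative assumptions. It checks that padding a base set with an unrestricted extra coordinate leaves its measure unchanged, the discrete analogue of μ_m(E × ℝ^{m−k}) = μ_k(E):

```python
# Hedged sketch: Kolmogorov consistency for a fair coin-toss sequence.
# A k-cylinder is identified by the outcomes of the first k coordinates,
# and mu_k gives each such cell probability 2**-k.
def mu(cells):
    """Measure of a disjoint union of cylinders, each a tuple of 0/1 digits."""
    return sum(0.5 ** len(cell) for cell in cells)

# Base set E in B^2: the outcomes where the first two tosses agree.
E = [(0, 0), (1, 1)]

# Embed E in B^3 as E x {0,1}: the third coordinate is unrestricted.
E_padded = [cell + (b,) for cell in E for b in (0, 1)]

p2 = mu(E)         # mu_2(E)
p3 = mu(E_padded)  # mu_3(E x {0,1}); consistency requires p3 == p2
```

Here the padded set consists of four cells of probability 1/8 each, matching μ_2(E) = 1/2 exactly.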
12.5 Uniform and Limiting Properties Much the largest part of stochastic process theory has to do with the joint distributions of sets of coordinates, under the general heading of dependence. Before getting into these topics, we shall deal in the rest of this chapter with the various issues relating exclusively to the marginal distributions of the coordinates. Of special interest are conditions that limit the random behaviour of a sequence as the index tends to infinity. The concept of a unform condition on the marginal distributions often plays a key role. Thus, a collection of rov.s tX,r, T e is said to be uniformly bounded in probability if, for any 8 > 0, there exists Bg < x such that -1)
#( IXzI sup ' It is also said to be unjrmly
)
p > 0, by Liapunov's inequality. Also, uniform fv-boundedness for anyp > 0 implies uniform boundedness in probability; the Markov inequality gives 'r
=
tbounded
,
-->
-x).
-->
,
#(IAI
B4
(12.20)
terminology we sometimes speak of fvboundedness in the case of (12.18). A standard shorthand (dueto Mann and Wald 1943b) for the maximum rate of (positive or negative) increase of a stochastic sequence uses the notion ot Oh' notation unifonu boundedness in probability to extend the tBig Oh' and real E lf, for exists ordinary 0, > Be < x such for there sequences (see j2.6). that the stochastic sequence fXn )T satisfies supa't 1XnI > B < E, we write Xn 0p(1). lf (L)T is another sequence, either stochastic or nonstochastic, and XnlYn = 0p(1) we say that Xn Op(L), or in words, &Xnis at most of order Fn in x, 0 as n tN(1); more we say that Xn Probability' If #( IXnI > :) t7p(1), when words, in GXn is of order lss than Xnlk'n generally Xn o (L') or r l'n in probability The main use of these notations is in manipulating small-order terms in an expression, without specifying them explicitly. Usually, Fn is a positive or negative power of n. To say that Xn tV1)is equivalent to saying that Xn converges in probability to zero, following the terminology of j12.2. Sometimes Xn Op(1) is defined by the condition that for each s > 0 there exists Be < x and an integer Ng > 1 such that #( IXnl > B < s for a1l n Nz. But Xn is finite almost surely, and there necessarily exists (this is by 12.6) a constant BL< #E') < E for 1 f n < Ne. For al1 possib1y 1arg er than Be, such that #( IXn l > practical purposes, the formulations are equivalent. Ktaittle
=
=
--)
-->
=
.
=
=
.
=
=
=,
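As a rough numerical illustration of these orders (our own sketch, not from the text, with Uniform(0,1) draws as an arbitrary choice): the deviation of the mean of n i.i.d. draws from 1/2 is o_p(1), while the same deviation scaled by n^{1/2} remains O_p(1), bounded in probability without shrinking to zero.

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

def mean_deviation(n):
    """Deviation of the sample mean of n Uniform(0,1) draws from E(X) = 0.5."""
    return sum(random.random() for _ in range(n)) / n - 0.5

# X_n = mean_deviation(n) is o_p(1): it collapses toward 0 as n grows ...
devs = {n: mean_deviation(n) for n in (100, 10_000, 1_000_000)}

# ... while sqrt(n) * X_n is O_p(1): it stays inside a fixed band (~ N(0, 1/12)).
scaled = {n: abs(d) * n ** 0.5 for n, d in devs.items()}
```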
12.6 Uniform Integrability

If a r.v. X is integrable, the contributions to the integral of extreme X values must be negligible. In other words, if E|X| < ∞,

    E(|X| 1{|X| ≥ M}) → 0 as M → ∞.                                     (12.21)
However, it is possible to construct uniformly L_1-bounded sequences {X_n} which fail to satisfy (12.21) in the limit.
12.7 Example Define a stochastic sequence as follows: for n = 1,2,3,..., let X_n = 0 with probability 1 − 1/n, and X_n = n with probability 1/n. Note that E(|X_n|) = n(1/n) = 1 for every n, and hence the sequence is uniformly L_1-bounded. But to have

    lim_{M→∞} E(|X_n| 1{|X_n| ≥ M}) = 0                                 (12.22)

uniformly in n requires that for each ε > 0 there exists M_ε such that E(|X_n| 1{|X_n| ≥ M}) < ε for all M > M_ε, uniformly in n. Clearly, this condition fails, for ε < 1, in view of the cases n > M_ε. □

Something very strange is going on in this example. Although E(X_n) = 1 for every n, X_n = 0 with probability approaching 1 as n → ∞. To be precise, we may show that X_n →pr 0 (see 18.15). The intuitive concept of expectation appears to fail when faced with r.v.s taking values approaching infinity with probabilities approaching zero.
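The failure in Example 12.7 is easy to compute exactly, since X_n takes only the values 0 and n. The sketch below (our own, in exact rational arithmetic) evaluates the tail expectation E(|X_n| 1{|X_n| ≥ M}) in closed form and confirms that its supremum over n equals 1 for every truncation point M:

```python
from fractions import Fraction

def tail_expectation(n, M):
    """E(|X_n| 1{|X_n| >= M}) for Example 12.7, in exact arithmetic:
    the only nonzero value, n, has probability 1/n and survives the
    truncation if and only if n >= M."""
    return n * Fraction(1, n) if n >= M else Fraction(0)

# For every M, choosing n >= M leaves a full unit of expectation in the
# tail, so the supremum over n never decreases as M grows: no uniform
# integrability, exactly as the example claims.
sup_tail = {M: max(tail_expectation(n, M) for n in range(1, 10_000))
            for M in (10, 100, 1000)}
```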
The uniform integrability condition rules out this type of perverse behaviour in a sequence. The collection {X_τ, τ ∈ T} is said to be uniformly integrable if

    lim_{M→∞} sup_{τ∈T} E(|X_τ| 1{|X_τ| ≥ M}) = 0.                      (12.23)
In our applications the collection in question is usually either a sequence or an array. In the latter case, uniform integrability of {X_nt} (say) is defined by taking the supremum with respect to both t and n. The following is a collection of theorems on uniform integrability which will find frequent application later on; 12.8 in particular provides insight into why this concept is so important, since the last example shows that the conclusion does not generally hold without uniform integrability.

12.8 Theorem Let {X_n}_1^∞ be a uniformly integrable sequence. If X_n →pr X, then E(X_n) → E(X).

Proof Define Y_n = |X_n − X|, and note that X is integrable, by Fatou's lemma applied to a subsequence converging almost surely. Since

    Y_n ≤ |X_n| + |X| ≤ 2 max(|X_n|, |X|),                              (12.24)

the event {Y_n ≥ M} entails either |X_n| ≥ M/2 or |X| ≥ M/2, and hence

    E(Y_n 1{Y_n ≥ M}) ≤ 2E(|X_n| 1{|X_n| ≥ M/2}) + 2E(|X| 1{|X| ≥ M/2}).   (12.25)

The second right-hand-side term goes to zero as M → ∞, so {Y_n} is uniformly integrable if {X_n} is. We may write

    E(Y_n) = E(Y_n 1{Y_n < M}) + E(Y_n 1{Y_n ≥ M}),                     (12.26)

and by the bounded convergence theorem there exists, for any ε > 0 and M < ∞, an N_ε such that E(Y_n 1{Y_n < M}) < ε/2 for n > N_ε. M can be chosen large enough that E(Y_n 1{Y_n ≥ M}) < ε/2 uniformly in n, so that E(Y_n) < ε for n > N_ε; or, since ε is arbitrary, E(Y_n) → 0. But

    E(Y_n) = E|X_n − X| ≥ |E(X_n) − E(X)|                               (12.27)

by the modulus inequality, and the theorem follows. ∎
The next theorem gives an alternative form for the condition in (12.23), which is often more convenient for establishing uniform integrability.
12.9 Theorem A collection {X_τ, τ ∈ T} of r.v.s on a probability space (Ω, 𝓕, P) is uniformly integrable iff it is uniformly L_1-bounded and satisfies the following condition: ∀ ε > 0, ∃ δ > 0 such that, for E ∈ 𝓕,

    P(E) ≤ δ ⟹ sup_{τ∈T} E(|X_τ| 1_E) < ε.                              (12.28)

Proof To show sufficiency, fix ε > 0 and E ∈ 𝓕. By L_1-boundedness and the Markov inequality for p = 1,

    P(|X_τ| ≥ M) ≤ E|X_τ|/M ≤ sup_{τ∈T} E|X_τ|/M < ∞,                   (12.29)

and for M large enough, P(|X_τ| ≥ M) ≤ δ, for any δ > 0. Choosing δ to satisfy (12.28), it follows, since τ is arbitrary, that

    sup_{τ∈T} E(|X_τ| 1{|X_τ| ≥ M}) < ε,                                (12.30)

and (12.23) follows since ε is arbitrary.

To show necessity, note that, for any E ∈ 𝓕 and τ ∈ T,

    E(|X_τ| 1_E) = E(|X_τ| 1_E 1{|X_τ| < M}) + E(|X_τ| 1_E 1{|X_τ| ≥ M})
                 ≤ M P(E) + E(|X_τ| 1{|X_τ| ≥ M}).                      (12.31)

Consider the suprema with respect to τ of each side of this inequality. For ε > 0, (12.23) implies there exists M < ∞ such that

    sup_{τ∈T} E(|X_τ| 1_E) ≤ M P(E) + ½ε.                               (12.32)

Uniform L_1-boundedness now follows on setting E = Ω, and (12.28) also follows with δ ≤ ε/(2M). ∎
12.10 Theorem If

    sup_{τ∈T} E|X_τ|^{1+θ} < ∞                                          (12.33)

for some θ > 0, then lim_{M→∞} sup_{τ∈T} E(|X_τ| 1{|X_τ| ≥ M}) = 0.

Proof Note that

    E(|X_τ| 1{|X_τ| ≥ M}) ≤ E((|X_τ|^{1+θ}/M^θ) 1{|X_τ| ≥ M}) ≤ M^{−θ} sup_{τ∈T} E|X_τ|^{1+θ},   (12.34)

for any θ > 0, using |X_τ| ≤ |X_τ|^{1+θ}/M^θ on the event {|X_τ| ≥ M}. The result follows on letting M → ∞, since the majorant side of (12.34) is finite by (12.33). ∎
Example 12.7 illustrated the fact that uniform L_1-boundedness is not sufficient for uniform integrability, but 12.10 shows that uniform L_{1+θ}-boundedness is sufficient, for any θ > 0. Adding this result to those of §12.5, we have established the hierarchy of uniform conditions summarized in the following theorem.

12.11 Theorem
Uniform boundedness a.s.
    ⟹ uniform L_p-boundedness, p > 1
    ⟹ uniform integrability
    ⟹ uniform L_p-boundedness, 0 < p ≤ 1
    ⟹ uniform boundedness in probability.
None of the reverse implications hold. □
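The gap between uniform L_1-boundedness and uniform integrability in this hierarchy can be seen by comparing Example 12.7 with a slightly thinner-tailed relative. In the sketch below (our own construction, not Davidson's), A_n = n with probability 1/n is L_1-bounded but not uniformly integrable, while B_n = n with probability 1/n² is uniformly L_2-bounded, so 12.10 applies and its tail expectations vanish uniformly:

```python
def tail_A(n, M):
    """E(|A_n| 1{|A_n| >= M}) where A_n = n w.p. 1/n: equals n*(1/n) = 1
    once n >= M, and 0 otherwise."""
    return 1.0 if n >= M else 0.0

def tail_B(n, M):
    """E(|B_n| 1{|B_n| >= M}) where B_n = n w.p. 1/n**2: equals n*(1/n**2)
    = 1/n once n >= M, and 0 otherwise.  Note E(B_n**2) = 1 for every n."""
    return 1.0 / n if n >= M else 0.0

N = 100_000
sup_A = {M: max(tail_A(n, M) for n in range(1, N)) for M in (10, 100, 1000)}
sup_B = {M: max(tail_B(n, M) for n in range(1, N)) for M in (10, 100, 1000)}
# sup_A stays at 1 for every M; sup_B = 1/M -> 0, as uniform integrability requires.
```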
13 Dependence
13.1 Shift Transformations

We now consider relationships between the different members of a sequence. In what ways, for example, might the joint distribution of X_t and X_{t−k} depend upon k, and upon t? To answer questions such as these it is sometimes helpful to think about the sequence in a new way. Having introduced the notion of a random sequence as a point in a probability space, there is a useful analogy between studying the relationships within a sequence and comparing different sequences, that is, different sample outcomes ω ∈ Ω.

In a probability space (Ω, 𝓕, P), consider a 1-1 measurable mapping T: Ω ↦ Ω (onto). This is a rule for pairing each outcome with another outcome of the space, but if each ω ∈ Ω maps into an infinite sequence, T induces a mapping from one sequence to another. T is called measure-preserving if P(TE) = P(E) for all E ∈ 𝓕. The shift transformation for a sequence {X_t(ω)}_1^∞ is defined by

    X_t(Tω) = X_{t+1}(ω).                                               (13.1)
T takes each outcome ω into the outcome under which the realized value of X occurring in period t now occurs in period t − 1, for every t. In effect, each coordinate of the sequence from t = 2 onwards is relabelled with the previous period's index. More generally we can write X_t(T^k ω) = X_{t+k}(ω), the relationship between points in the sequence k periods apart becoming a characteristic of the transformation T^k. Since X_t is a r.v. for all t, both the shift transformation and its inverse T^{-1}, the backshift transformation, must be measurable. Taken together, the single r.v. X_1(ω): Ω ↦ ℝ and the shift transformation T can be thought of as generating a complete description of the sequence {X_t(ω)}_1^∞. This can be seen as follows. Given X_1(ω), apply the transformation T to ω, and obtain X_2 = X_1(Tω). Doing this for each ω ∈ Ω defines the mapping X_2(ω): Ω ↦ ℝ, and we are ready to get X_3 = X_2(Tω). Iterating the procedure generates as many points in the sequence as we require.
13.1 Example Consider 12.1. Let {X_t(ω)}_1^∞ be a sequence of coin tosses (with 1 for heads, 0 for tails) beginning 110100100011... (say). Somewhere on the interval [0,1) of real numbers (in binary representation), there is also a sequence {X_t(ω')}_1^∞ beginning 10100100011..., identical to the sequence indexed by ω apart from the dropping of the initial digit and the backshift of the remainder by one position. Likewise there is another sequence {X_t(ω'')}_1^∞, a backshifted version of {X_t(ω')}_1^∞, beginning 0100100011...; and so forth. If we define the transformation T by Tω = ω', Tω' = ω'', etc., the sequence {X_t(ω)}_1^∞ can be constructed as the sequence {X_1(T^{t−1}ω)}_1^∞, that is, the sequence of first members of the sequences found by iterating the transformation, in this case beginning 1,1,0,... □
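The construction of Example 13.1 is easy to mimic directly. In the sketch below (our own illustration), an outcome ω is represented by its string of binary digits, the coordinate r.v. X_t reads off the t-th digit, and T simply drops the leading digit, so that X_t(Tω) = X_{t+1}(ω) holds by construction:

```python
def X(t, omega):
    """Coordinate r.v.: the t-th binary digit of the outcome (t = 1, 2, ...)."""
    return omega[t - 1]

def T(omega):
    """Shift transformation: drop the leading digit of the expansion."""
    return omega[1:]

omega = (1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1)  # the digits of Example 13.1
# Iterating T and reading off the first coordinate reconstructs the sequence:
first_members = [X(1, omega), X(1, T(omega)), X(1, T(T(omega)))]
```

The list of first members begins 1, 1, 0, ..., exactly as in the example.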
This device reveals, among other things, the complex structure of the probability space we are postulating. To each point ω ∈ Ω there must correspond a countably infinite set of points T^k ω ∈ Ω which reproduce the same sequence apart from the absolute date associated with it. The intertemporal properties of a sequence can then be treated as a comparison of two sequences, the original and the sequence lagged k periods.

Econometricians attempt to make inferences about economic behaviour from recorded economic data. In time-series analysis, the sample available is usually a single realization of a random sequence, economic history as it actually occurred. Because we observe only one world, it is easy to make the mistake of looking on it as the whole sample space, whereas it is really only one of the many possible realizations the workings of historical chance might have generated. Indeed, in our probability model the whole economic universe is the counterpart of a single random outcome ω; there is an important sense in which the time-series analyst is a statistician who draws inferences from single data points! But although a single realization of the sequence can be treated as a mapping from a single ω, it is linked to a countably infinite set of ωs corresponding to the leads and lags of the sequence. A large part of our subsequent enquiry can be summarized as posing the question: is this set really rich enough to allow us to make inferences about P from a single realization?
13.2 Independence and Stationarity

Independence and stationarity are the best-known restrictions on the behaviour of a sequence, but also the most stringent, from the point of view of describing economic time series. But while the emphasis in this book will mainly be on finding ways to relax these conditions, they remain important because of the many classic theorems in probability and limit theory which are founded on them.

The degree to which random variations of sequence coordinates are related to those of their neighbours in the time ordering is sometimes called the memory of a sequence; in the context of time-ordered observations, one may think in terms of the amount of information contained in the current state of the sequence about its previous states. A sequence with no memory is a rather special kind of object, because the ordering ceases to have significance. It is like the outcome of a collection of independent random experiments conducted in parallel, and indexed arbitrarily. When a time ordering does nominally exist, we call such a sequence serially independent. Generalizing the theory of §8.6, a pair of sequences {X_t(ω)}_1^∞, {Y_t(ω)}_1^∞ ∈ ℝ^∞ × ℝ^∞ is independent if, for all E_1, E_2 ∈ 𝓑^∞,

    P({X_t}_1^∞ ∈ E_1, {Y_t}_1^∞ ∈ E_2) = P({X_t}_1^∞ ∈ E_1)P({Y_t}_1^∞ ∈ E_2).   (13.2)

Accordingly, a sequence {X_t(ω)}_1^∞ is serially independent if it is independent of {X_t(T^k ω)}_1^∞ for all k ≠ 0. This is equivalent to saying that every finite collection of sequence coordinates is totally independent. Serial independence is the simplest possible assumption about memory.

Similarly, looking at the distribution of the sequence as a whole, the simplest treatment is to assume that the joint distribution of the coordinates is invariant with respect to the time index. A random sequence is called strictly stationary if the shift transformation is measure-preserving. This implies that the sequences {X_t}_1^∞ and {X_t}_{k+1}^∞ have the same joint distribution, for every k > 0. Subject to the existence of particular moments, less restrictive versions of the condition are also commonly employed. Letting μ_t = E(X_t) and γ_{mt} = Cov(X_t, X_{t+m}), consider those cases in which the sequence {μ_t}_1^∞ and the array {{γ_{mt}}_{m=0}^∞}_{t=1}^∞ are well defined. If μ_t = μ, all t, we say the sequence is mean stationary. If a mean stationary sequence has γ_{mt} = γ_m, all t, where {γ_m}_0^∞ is a sequence of constants, it is called covariance stationary, or wide-sense stationary. If the marginal distribution of X_t is the same for any t, the sequence {X_t} is said to be identically distributed. This concept is different from stationarity, which also restricts the joint distribution of neighbours in the sequence. However, when a stochastic sequence is both serially independent and identically distributed (or i.i.d.), this suffices for stationarity. An i.i.d. sequence is like an arbitrarily indexed random sample drawn from some underlying population. The following clutch of examples includes both stationary and nonstationary cases.

13.2 Example Let the sequence {ε_t}_{−∞}^∞ be i.i.d. with mean 0 and variance σ² < ∞, and let {θ_j}_0^∞ be a square-summable sequence of constants. Then

    X_t = Σ_{j=0}^∞ θ_j ε_{t−j}                                         (13.3)

is a covariance stationary sequence, with E(X_t) = 0 and E(X_t²) = σ² Σ_{j=0}^∞ θ_j², for every t. This is the infinite-order moving average (MA(∞)) process. See §14.3 for additional details. □

13.3 Example If ε_t is i.i.d. with mean 0, and

    X_t = cos at + ε_t                                                  (13.4)

for a constant a, then E(X_t) = cos at, depending systematically on t. □
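Covariance stationarity of the MA process in 13.2 amounts to the identity Cov(X_t, X_{t+m}) = σ² Σ_j θ_j θ_{j+m}, which involves the lag m but not the date t. A small sketch of our own (the geometric weights are an arbitrary square-summable choice, truncated for computation):

```python
sigma2 = 2.0
theta = [0.9 ** j for j in range(200)]  # square-summable weights, truncated

def autocov(m):
    """gamma_m = sigma2 * sum_j theta_j * theta_{j+m}: no dependence on t."""
    return sigma2 * sum(theta[j] * theta[j + m] for j in range(len(theta) - m))

gamma = [autocov(m) for m in range(4)]
# For geometric weights, gamma_m is approximately sigma2 * 0.9**m / (1 - 0.81).
```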
13.4 Example Let {X_t} be any stationary sequence with autocovariance sequence {γ_m, m ≥ 0}. The sequence {X_t + X_0} has autocovariances given by the array

    γ_0 + γ_t + γ_{t+m} + γ_m,  m ≥ 0, t ≥ 1,

and hence it is nonstationary. □
13.5 Example Let X be a r.v. which is symmetrically distributed about 0, with variance σ². If X_t = (−1)^t X, then {X_t}_1^∞ is a stationary sequence. In particular, E(X_t) = 0, and Cov(X_t, X_{t+k}) = σ² when k is even and −σ² when k is odd, independent of t. □
These examples show that, contrary to a common misconception, stationarity does not imply homogeneity in the appearance of a sequence, or the absence of periodic patterns. The essential feature is that any patterns in the sequence do not depend systematically on the time index. It is also important to distinguish between stationarity and limited dependence, although these notions are often closely linked. Example 13.4 is nonstationary in view of dependence on initial conditions. The square-summability condition in 13.2 allows us to show covariance stationarity, but is actually a limitation on the long-range dependence of the process. Treatments of time-series modelling which focus exclusively on models in the linear MA class often fail to distinguish between these properties, but examples 13.3 and 13.5 demonstrate that there is no necessary connection between them.
Stationarity is a strong assumption, particularly for the description of empirical time series, where features like seasonal patterns are commonly found. It is useful to distinguish between 'local' nonstationarity, something we might think of as capable of elimination by local averaging of the coordinates, and 'global' nonstationarity, involving features such as persistent trends in the moments. If sequences {X_t}_{t=1}^∞ and {X_{t+k}}_{t=1}^∞ have the same distribution for some (not necessarily every) k > 0, it follows that {X_{t+2k}}_{t=1}^∞ has the same distribution as {X_{t+k}}_{t=1}^∞, and the same property extends to every integer multiple of k. Such a sequence accordingly has certain stationary characteristics, if we think in terms of the distributions of successive blocks of k coordinates. This idea retains force even as k → ∞. Consider the limit of a finite sequence of length n, divided into [n^{1−α}] blocks of length [n^α], plus any remainder, for some α between 0 and 1. (Note, [x] here denotes the largest integer below x.) The number of blocks as well as their extent is going to infinity, and the stationarity (or otherwise) of the sequence of blocks in the limit is clearly an issue. Important applications of these ideas arise in Parts V and VI below. It is convenient to formulate a definition embodying this concept in terms of moments. Thus, a zero-mean sequence will be said to be globally covariance stationary if the autocovariance sequences {γ_{kt}}_{t=1}^∞ are Cesàro-summable for each k ≥ 0, where the Cesàro sum is strictly positive in the case of the variances (k = 0). The following are a pair of contrasting counter-examples.

13.6 Example A sequence with variances γ_{0t} = t^β is globally nonstationary for any β ≠ 0. □
13.7 Example Consider the integer sequence beginning 1,2,1,1,2,2,2,2,1,1,1,1,1,1,1,1,..., i.e. the value changes at the points t = 2^k, k = 1,2,3,... The Cesàro sum of this sequence fails to converge as n → ∞. It fluctuates eventually between the points 5/3 at n = 2^k, k odd, and 4/3 at n = 2^k, k even. A stochastic sequence having a variance sequence of this form is globally nonstationary. □
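The oscillation in Example 13.7 can be checked by direct computation. The sketch below (our own) builds the sequence block by block and evaluates the Cesàro means at n = 2^k, which settle near 5/3 for odd k and 4/3 for even k:

```python
# Build the sequence of Example 13.7 in blocks of lengths 1, 1, 2, 4, 8, ...
seq = [1]
value, length = 1, 1
while len(seq) < 2 ** 14:
    value = 3 - value              # alternate between the values 1 and 2
    seq.extend([value] * length)
    length *= 2

def cesaro_mean(n):
    """Average of the first n terms of the sequence."""
    return sum(seq[:n]) / n

m_odd = cesaro_mean(2 ** 13)       # k = 13, odd: close to 5/3
m_even = cesaro_mean(2 ** 14)      # k = 14, even: close to 4/3
```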
13.3 Invariant Events

The amount of dependence in a sequence is the chief factor determining how informative a realization of given length can be about the distribution that generated it. At one extreme, the i.i.d. sequence is equivalent to a true random sample. The classical theorems of statistics can be applied to this type of distribution. At the other extreme, it is easy to specify sequences for which a single realization can never reveal the parameters of the distribution to us, even in the limit as its length tends to infinity. This last possibility is what concerns us most, since we want to know whether averaging operations applied to sequences have useful limiting properties; whether, for example, parameters of the generation process can be consistently estimated in this way.

To clarify these issues, imagine repeated sampling of random sequences {X_t(ω)}_1^∞; in other words, imagine being given a function X_1(.) and transformation T, making repeated random drawings of ω from Ω and constructing the corresponding random sequences; 13.1 illustrates the procedure. Let the sample drawings be denoted ω_i, i = 1,...,N, and imagine constructing the average of the realizations at some fixed time t_0. The average X̄_{t_0} = N^{-1} Σ_{i=1}^N X_{t_0}(ω_i) is called an ensemble average, which may be contrasted with the time average of a realization of length n for some given ω_j ∈ Ω, X̄_n(ω_j) = n^{-1} Σ_{t=1}^n X_t(ω_j). Fig. 13.1 illustrates this procedure, showing a sample of three realizations of the sequence. The ensemble average is the average of the points falling on the vertical line labelled t_0. It is clear that the limits of the time average and the ensemble average as n and N respectively go to infinity are not in general the same; we might expect the ensemble average to tend to the marginal expectation E(X_{t_0}), but the time average will not do so except in special cases. If the sequence is nonstationary, E(X_{t_0}) depends upon t_0, but even assuming stationarity, it is still possible that different realizations of the sequence depend upon random effects which are common to all t.
[Fig. 13.1 omitted: three sample realizations of the sequence, with the ensemble average taken over the points on the vertical line at t_0.]
In a probability space (Ω, 𝓕, P) the event E ∈ 𝓕 is said to be invariant under a transformation T if P(TE Δ E) = 0. The criterion for invariance is sometimes given as TE = E, but allowing the two events to differ by a set of measure zero does not change anything important in the theory. The set of events in 𝓕 that are invariant under the shift transformation is denoted 𝓘.
13.8 Theorem 𝓘 is a σ-field.
Proof Since T is onto, Ω is clearly invariant. Since T is also 1-1,

    TE^c Δ E^c = (TE)^c Δ E^c = TE Δ E                                  (13.5)

by definition. And, given {E_n ∈ 𝓘, n ∈ ℕ},

    P(TE_n Δ E_n) = P(TE_n − E_n) + P(E_n − TE_n) = 0                   (13.6)

for each n, and also

    T(∪_n E_n) Δ (∪_n E_n) = (∪_n TE_n) Δ (∪_n E_n),                    (13.7)

using 1.2(i). By 1.1(i) and then 1.1(iii),

    ∪_n TE_n − ∪_n E_n ⊆ ∪_n (TE_n − E_n),                              (13.8)

and similarly,

    ∪_n E_n − ∪_n TE_n ⊆ ∪_n (E_n − TE_n).                              (13.9)

The conclusion P[T(∪_n E_n) Δ (∪_n E_n)] = 0 now follows by (13.6) and 3.6(ii), completing the proof. ∎
An invariant random variable is one that is 𝓘-measurable. An invariant r.v. Z(ω) has the property that Z(Tω) = Z(ω), and an 𝓘-measurable sequence {Z_t(ω)}_1^∞ is trivial in the sense that Z_t(ω) = Z_1(ω) a.s. for every t. The invariant events and associated r.v.s constitute those aspects of the probability model that do not alter with the passage of time.
13.9 Example Consider the sequence {X_t(ω)}_1^∞ = {Y_t(ω) + Z(ω)}_1^∞, with {Y_t(ω)}_1^∞ a stationary random sequence and Z(ω) a r.v. An example of an invariant event is E = {ω: Z(ω) ≤ z, Y_t(ω) ∈ ℝ}. Clearly E and TE are the same event, since Z is the only thing subject to a condition. Fig. 13.1 illustrates this case: if {Y_t(ω)}_1^∞ is a zero-mean stationary sequence, the figure illustrates the cases Z(ω_1) = Z(ω_2) = 0, and Z(ω_3) > 0. Even if E(Z) = 0, the influence of Z(ω) in the time average is not 'averaged out' in the limit, as it will be from the ensemble average. □
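A simulation in the spirit of Example 13.9 (our own sketch; the Gaussian choices are arbitrary) makes the contrast between the two averages vivid: each realization draws its invariant component Z once, so the time average of a single path hugs that path's Z(ω), while the ensemble average across paths at a fixed date estimates E(X_t) = 0:

```python
import random

random.seed(1)

def realization(n, z):
    """One sample path of X_t = Y_t + Z, with Y_t i.i.d. standard Gaussian."""
    return [random.gauss(0.0, 1.0) + z for _ in range(n)]

z_values = [random.gauss(0.0, 1.0) for _ in range(400)]  # one Z per outcome
paths = [realization(200, z) for z in z_values]

time_avg_first = sum(paths[0]) / 200                 # tracks z_values[0], not 0
ensemble_avg = sum(path[0] for path in paths) / 400  # tracks E(X_t) = 0
```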
13.10 Theorem Let {X_t(ω)}_1^∞ be a stationary, integrable sequence, and let S_n = Σ_{t=1}^n X_t. Then

    n^{-1}S_n → E(X_1 | 𝓘), a.s.                                        (13.10)

Proof The first step is to show that, for an invariant set M ∈ 𝓘 and

    A_n = {ω: max_{1≤k≤n} S_k(ω) > 0},                                  (13.11)

we have

    ∫_{M∩A_n} X_1(ω) dP(ω) ≥ 0.                                         (13.12)

Since

    S_k(Tʲω) = Σ_{t=1}^k X_t(Tʲω) = Σ_{t=j+1}^{j+k} X_t(ω) = S_{j+k}(ω) − S_j(ω),   (13.13)

by defining S_0 = 0 we may also write

    A_{n−j}(Tʲω) = {ω: max_{j+1≤k≤n} (S_k(ω) − S_j(ω)) > 0},  0 ≤ j ≤ n − 1.   (13.14)

This is the set of outcomes for which the partial sums of the coordinates from j + 1 to n are positive at least once, and we have the inequality (explained below)

    Σ_{j=0}^{n−1} X_{j+1}(ω) 1_{A_{n−j}(Tʲω)} ≥ 0,  all ω ∈ Ω.          (13.15)

Integrating this sum over the invariant set M, and noting that by stationarity and the invariance of M each term has the same value, gives

    0 ≤ Σ_{j=0}^{n−1} ∫_{M∩A_{n−j}} X_1(ω) dP(ω).

This holds for every n, and the limit of the normalized right-hand member is equal to the Cesàro limit by 2.26, so that, as required,

    ∫_{M∩A} X_1(ω) dP(ω) = lim_{n→∞} n^{-1} Σ_{j=0}^{n−1} ∫_{M∩A_{n−j}} X_1(ω) dP(ω) ≥ 0,   (13.18)

where A = ∪_{n=1}^∞ A_n.

The inequality in (13.15) is not self-evident, but is justified as follows. The expression on the left is the sum containing only those X_t having the property that, in the realization, the partial sums from the point t onwards are positive at least once; otherwise the tth contribution to the sum is 0. The sum includes only X_t lying in segments of the sequence over which S_k increases, so that their net contribution must be positive. It would be zero only in the case X_t ≤ 0 for 1 ≤ t ≤ n. Fig. 13.2 depicts a realization: the points show values of S_k for k = 1,...,n, so the X_t(ω) are the separations between successive points, and the left-hand side of (13.15) is the running sum of the included terms. The coordinates where the X_t are to be omitted from (13.15) are arrowed, the criterion being that there is no point to the right which exceeds the current one.

Now define the invariant r.v.s S̄(ω) = limsup_{n→∞} n^{-1}S_n(ω) and S̲(ω) = liminf_{n→∞} n^{-1}S_n(ω), and for α < β the invariant event M(α,β) = {ω: S̲(ω) < α, S̄(ω) > β}. Applying (13.18) to the stationary sequence {X_t − β}, whose partial sums are positive at least once on M(α,β), yields

    ∫_{M(α,β)} X_1(ω) dP(ω) ≥ β P(M(α,β)),                              (13.20)

and applying the same argument to the sequence {α − X_t},

    ∫_{M(α,β)} X_1(ω) dP(ω) ≤ α P(M(α,β)).                              (13.21)

Since the left-hand sides of (13.21) and (13.20) are equal and α < β, it follows that P(M(α,β)) = 0; that is, S̲(ω) = S̄(ω) with probability 1. This completes the first stage of the proof.

It is now required to show that S̄ = E(X_1 | 𝓘) a.s.; that is, according to equation (10.18), that

    ∫_M X_1 dP = ∫_M S̄ dP, each M ∈ 𝓘.                                  (13.22)

Since M is invariant, stationarity implies that ∫_M X_t dP = ∫_M X_1 dP for every t, and hence

    ∫_M X_1(ω) dP(ω) = n^{-1} Σ_{t=1}^n ∫_M X_t(ω) dP(ω) = ∫_M (n^{-1}S_n(ω)) dP(ω),   (13.23)

and the issue hinges on the convergence of the right-hand member of (13.23) to ∫_M S̄ dP. Since the sequence {X_t} is stationary and integrable, it is also uniformly integrable, and the same is true of the sequence {Y_t}, where Y_t = X_t 1_M and M ∈ 𝓘. For ε > 0, it is possible by 12.9 to choose an event E ∈ 𝓕 with P(E) < δ, such that sup_t E(|Y_t| 1_E) < ε. For the same E, the triangle inequality gives

    |E[(n^{-1} Σ_{t=1}^n Y_t) 1_E]| ≤ n^{-1} Σ_{t=1}^n E(|Y_t| 1_E) < ε.   (13.24)

By the same argument, also using the stationarity and integrability of Y_t,

    E(|n^{-1} Σ_{t=1}^n Y_t|) ≤ n^{-1} Σ_{t=1}^n E|Y_t| = E|Y_1| < ∞.   (13.25)

Hence by 12.9 the sequence {n^{-1} Σ_{t=1}^n Y_t}_{n=1}^∞ is also uniformly integrable. Since n^{-1}S_n 1_M → S̄ 1_M a.s., it follows by 12.8 that

    ∫_M (n^{-1}S_n(ω)) dP(ω) → ∫_M S̄ dP.                                (13.26)

(13.26) and (13.23) together give (13.22), and the proof is complete. ∎
13.4 Ergodicity and Mixing

The property of a stationary sequence which ensures that the time average and the ensemble average have the same limit is ergodicity, which is defined in terms of the probability of invariant events. A measure-preserving transformation T is ergodic if either P(E) = 0 or P(E) = 1 for all E ∈ 𝓘, where 𝓘 is the σ-field of events invariant under T. A stationary sequence {X_t(ω)}_1^∞ is said to be ergodic if

    X_t(ω) = X_1(T^{t−1}ω) for every t,

where T is measure-preserving and ergodic. Some authors, such as Doob, use the term metrically transitive for ergodic.

Events that are invariant under ergodic transformations either occur almost surely, or do not occur almost surely. In the case of 13.9, Z must be a constant almost surely. Intuitively, stationarity and ergodicity together are seen to be sufficient conditions for time averages and ensemble averages to converge to the same limit. Stationarity implies that, for example, μ = E(X_1(ω)) is the mean not just of X_1 but of any member of the sequence. The existence of events that are invariant under the shift transformation means that there are regions of the sample space which a particular realization of the sequence will never visit. If P(TE Δ E) = 0, then the event E^c occurs with probability 0 in a realization where E occurs. However, if invariant events other than the trivial ones are ruled out, we ensure that a sequence will eventually visit all parts of the space, with probability 1. In this case time averaging and ensemble averaging are effectively equivalent operations.

The following corollary is the main reason for our interest in Theorem 13.10.
13.12 Ergodic theorem Let {X_t(ω)}_1^∞ be a stationary, ergodic, integrable sequence. Then

    lim_{n→∞} S_n(ω)/n = E(X_1), a.s.                                   (13.27)

Proof This is immediate from 13.10, since by ergodicity, E(X_1 | 𝓘) = E(X_1) a.s. ∎
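A quick numerical check of the ergodic theorem (our own sketch; the AR(1) model is an illustrative stand-in for a stationary, ergodic, integrable sequence): the time average of one long realization approaches the ensemble mean E(X_1) = 0.

```python
import random

random.seed(2)

# AR(1): X_t = 0.5 * X_{t-1} + eps_t, eps_t i.i.d. N(0,1); stationary
# and ergodic, with E(X_1) = 0.
x, total, n = 0.0, 0.0, 200_000
for _ in range(n):
    x = 0.5 * x + random.gauss(0.0, 1.0)
    total += x

time_average = total / n  # the ergodic theorem puts this near E(X_1) = 0
```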
In an ergodic sequence, conditioning on 𝓘 is a trivial operation almost surely, in that the information contained in events of probability zero or one is trivial. The ergodic theorem is an example of a law of large numbers, the first of several such theorems to be studied in later chapters. Unlike most of the subsequent examples, this one is for stationary sequences. Its practical applications in econometrics are limited by the fact that the stationarity assumption is often inappropriate, but it is of much theoretical interest, because ergodicity is a very mild constraint on the dependence, as we now show.

A transformation that is measure-preserving eventually mixes up the outcomes in a non-invariant event A with those in A^c. The measure-preserving property rules out mapping sets into proper subsets of themselves, so we can be sure that TA ∩ A^c is nonempty. Repeated iterations of the transformation generate a sequence of sets {T^k A} containing different mixtures of the elements of A and A^c. A positive dependence of B on A implies a negative dependence of B on A^c; that is, if P(A ∩ B) − P(A)P(B) > 0 then P(A^c ∩ B) − P(A^c)P(B) = P(B) − P(A ∩ B) − P(B) + P(A)P(B) = −[P(A ∩ B) − P(A)P(B)] < 0. Intuition suggests that the average dependence of B on the mixtures of A and A^c should tend to zero as the mixing-up proceeds. In fact, ergodicity can be characterized in just this kind of way, as the following theorem shows.
13.13 Theorem A measure-preserving shift transformation T is ergodic if and only if, for any pair of events A, B ∈ 𝓕,

    lim_{n→∞} n^{-1} Σ_{k=1}^n P(T^k A ∩ B) = P(A)P(B).                 (13.28)

Proof To show 'if', let A be an invariant event and B = A. Then P(T^k A ∩ B) = P(A) for all k, and hence the left-hand side of (13.28) is equal to P(A) for all n. This gives P(A) = P(A)², implying P(A) = 0 or 1. To show 'only if', apply the ergodic theorem to the indicator functions of the sets T^k A, where T is measure-preserving and ergodic, to give

    lim_{n→∞} n^{-1} Σ_{k=1}^n 1_{T^k A}(ω) = P(A), a.s.                (13.29)

But for any B ∈ 𝓕,

    |∫_B (n^{-1} Σ_{k=1}^n 1_{T^k A}(ω) − P(A)) dP(ω)|
        = |n^{-1} Σ_{k=1}^n P(T^k A ∩ B) − P(A)P(B)|.                   (13.30)

The sequence whose absolute value is the integrand in the left-hand member of (13.30) converges almost surely to zero as n → ∞ by (13.29); it is bounded absolutely, uniformly in n, by 1 + P(A), so is clearly uniformly integrable. Hence the left-hand member of (13.30) converges to zero by 12.8, and the theorem follows. ∎
Following from this result, ergodicity of a stationary sequence is often associated with convergence to zero in Cesàro sum of the autocovariances; and indeed, in a Gaussian sequence the conditions are equivalent.
13.14 Corollary If {X_t(ω)}_1^∞ is a stationary, ergodic, square-integrable sequence, then

    n^{-1} Σ_{k=1}^n Cov(X_1, X_k) → 0 as n → ∞.                        (13.31)
Proof Setting B A and defining a real sequence by the indicators of F A, Xkk equivalent to (13.31).First extend this result to sequencs = 1wu((o), (13.28)is X,'%1Aj(), so that A%((t))= X1(T) Lwlr-ujt). of simple r.v.s. Let X1(() = The main point to be established is tha.tthe differencebetween X1 and a simple r.v. can be ignored in integiation. In other words, the sets F Aj must form a =
=
.j
Theory of Stochastic Processes
202
partition of \Omega, apart possibly from sets of measure 0. Since T is measure-preserving, P(\bigcup_j T^{-1}A_j) = P(\bigcup_j A_j) = 1, using 1.2(ii), and hence P(\Omega - \bigcup_j T^{-1}A_j) = 0. And since \sum_j P(T^{-1}A_j) = \sum_j P(A_j) = 1, additivity of P implies that the collection \{T^{-1}A_j\} is also disjoint apart from possible sets of measure 0, verifying the required property. This argument extends by induction to T^{-k} for any k \in \mathbb{N}. Hence,

E(X_1 X_k) = \sum_i \sum_j \alpha_i \alpha_j P(A_i \cap T^{-k}A_j)   (13.32)
(the sum being absolutely convergent by assumption), and by 13.13,

\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} E(X_1 X_k) = \sum_i \sum_j \alpha_i \alpha_j \lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} P(A_i \cap T^{-k}A_j)
= \sum_i \sum_j \alpha_i \alpha_j P(A_i)P(A_j) = E(X_1)^2,   (13.33)

where E(X_1)^2 = E(X_1)E(X_k) for any k, by stationarity. The theorem extends to general sequences by the usual application of 3.28 and the monotone convergence theorem. ∎
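A quick numerical sketch of (13.31), not from the text: take X_1 distributed Bernoulli(1/2) and X_{k+1} = 1 - X_k, a deterministic alternation after a random start. Then Cov(X_1, X_k) alternates between +1/4 and -1/4 and never converges, while its Cesàro mean converges to zero, exactly as the corollary requires.

```python
# Cesàro average of autocovariances for the alternating 0/1 sequence:
# X_1 ~ Bernoulli(1/2), X_{k+1} = 1 - X_k. Then E(X_k) = 1/2 and
# Cov(X_1, X_k) = +1/4 when k is odd (X_k = X_1) and -1/4 when k is even.
def cov_x1_xk(k: int) -> float:
    return 0.25 if k % 2 == 1 else -0.25

def cesaro_mean(n: int) -> float:
    return sum(cov_x1_xk(k) for k in range(1, n + 1)) / n

# The individual covariances do not converge, but their Cesàro mean does:
print(cesaro_mean(10))    # 0.0
print(cesaro_mean(1001))  # ~0.00025, shrinking like 1/n
```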
This result might appear to mean that ergodicity implies some form of asymptotic independence of the sequence, since one condition under which (13.31) certainly holds is where \operatorname{Cov}(X_1, X_k) \to 0 as k \to \infty. But this is not so. The following example illustrates nicely what ergodicity implies and does not imply.
13.15 Example Let the probability space (\Omega, \mathcal{F}, P) be defined by \Omega = \{0, 1\}, \mathcal{F} = \{\{0\}, \{1\}, \Omega, \emptyset\}, and P(\omega) = 0.5 for \omega = 0 and \omega = 1. Let T be the transformation that sets T0 = 1 and T1 = 0. In this setup a random sequence \{X_t(\omega)\} may be defined by letting X_1(\omega) = \omega and generating the sequence by iterating T. These sequences always consist of alternating 0s and 1s, but the initial value is randomly chosen with equal probabilities. Now, T is measure-preserving; the invariant events are \Omega and \emptyset, both trivial, so the sequence is ergodic. And it is easily verified that \lim_{n\to\infty} n^{-1}\sum_{k=1}^{n} P(T^k A \cap B) = P(A)P(B) for every pair A, B \in \mathcal{F}. For instance, let A = B = \{1\}; then P(T^k A \cap B) = 0.5 for k even and 0 for k odd, so that the limit is indeed 0.25, as required. You can verify, equivalently, that the ergodic theorem holds, since the time average of the sequence will always converge to 0.5, which is the same as the ensemble mean of X_1(\omega). ∎
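The two-point system of Example 13.15 is small enough to check exactly; the following sketch (an illustration, not part of the text) verifies that the time average from either starting point equals the ensemble mean 0.5, and that the Cesàro means of P(T^k A \cap B) converge to P(A)P(B) = 0.25 for A = B = \{1\}.

```python
# The two-point swap system of Example 13.15, computed exactly.
# Omega = {0, 1} with P(0) = P(1) = 0.5; T swaps the two points.
def T(omega: int) -> int:
    return 1 - omega

def X(t: int, omega: int) -> int:
    # X_1(omega) = omega; later coordinates iterate T (t - 1) times.
    for _ in range(t - 1):
        omega = T(omega)
    return omega

n = 1000
avg0 = sum(X(t, 0) for t in range(1, n + 1)) / n   # time average, start at 0
avg1 = sum(X(t, 1) for t in range(1, n + 1)) / n   # time average, start at 1
print(avg0, avg1)  # both 0.5, the ensemble mean

# Cesàro mean of P(T^k A ∩ B) for A = B = {1}: terms alternate 0, 0.5, 0, ...
def p_TkA_and_B(k: int) -> float:
    return 0.5 if k % 2 == 0 else 0.0

cesaro = sum(p_TkA_and_B(k) for k in range(1, n + 1)) / n
print(cesaro)      # 0.25 = P(A)P(B)
```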
In this example, X_t is perfectly predictable once we know X_s, for any single s. This shows that ergodicity does not imply independence between different parts of the sequence, even as the time separation increases. By contrast, a mixing sequence has this property. A measure-preserving, ergodic shift transformation T is said to be mixing if, for each A, B \in \mathcal{F},

\lim_{k\to\infty} P(T^k A \cap B) = P(A)P(B).   (13.34)
The stationary sequence \{X_t\}_1^{\infty} is said to be mixing if X_t(\omega) = X_1(T^{t-1}\omega) for each t, where T is a mixing transformation. Compare this condition with (13.28): Cesàro convergence of the sequence \{P(T^k A \cap B), k \in \mathbb{N}\} has been replaced by actual convergence. To obtain a sound intuition about mixing transformations, one cannot do better than reflect on the following oft-quoted example, originally due to Halmos (1956).
13.16 Example Consider a dry martini initially poured as a layer of vermouth (10% of the volume) on top of the gin (90%). Let G denote the gin, and F an arbitrary small region of the fluid, so that F \cap G is the gin contained in F. If P(.) denotes the volume of a set as a proportion of the whole, P(G) = 0.9, and P(F \cap G)/P(F), the proportion of gin in F, is initially either 0 or 1. Let T denote the operation of stirring the martini with a swizzle stick, so that P(T^k F \cap G)/P(F) is the proportion of gin in F after k stirs. Assuming the fluid is incompressible, stirring is a measure-preserving transformation in that P(T^k F) = P(F) for all k. If the stirring mixes the martini we would expect the proportion of gin in F, P(T^k F \cap G)/P(F), to tend to P(G), so that each region F of the martini eventually contains 90% gin. ∎

This is precisely condition (13.34). Repeated applications of a mixing transformation to an event A should eventually mix outcomes in A and A^c so thoroughly that for large enough k the composition of T^k A gives no clues about the original A. Mixing in a real sequence implies that events such as A = \{\omega: X_t(\omega) \le a\} and T^{-k}A = \{\omega: X_{t+k}(\omega) \le a\} are becoming independent as k increases. It is immediate, or virtually so, that for stationary mixing sequences the result of 13.14 can be strengthened to \operatorname{Cov}(X_1, X_n) \to 0 as n \to \infty.
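The contrast between (13.28) and (13.34) can be made concrete with the two-point system of Example 13.15 (a sketch, not from the text): for A = B = \{1\} the sequence P(T^k A \cap B) alternates between 0 and 0.5 and has no limit, so the system is ergodic but not mixing, even though the Cesàro means converge.

```python
# Ergodic but not mixing: in the two-point swap system (Example 13.15),
# with A = B = {1}, the sequence P(T^k A ∩ B) alternates 0, 0.5, 0, 0.5, ...
def p_TkA_and_B(k: int) -> float:
    return 0.5 if k % 2 == 0 else 0.0

terms = [p_TkA_and_B(k) for k in range(1, 9)]
print(terms)    # no limit exists, so condition (13.34) fails

# ... yet the Cesàro means converge to P(A)P(B) = 0.25, so (13.28) holds.
cesaros = [sum(p_TkA_and_B(k) for k in range(1, n + 1)) / n
           for n in (10, 100, 1000)]
print(cesaros)  # each equals 0.25
```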
13.5 Subfields and Regularity

We now introduce an alternative approach to studying dependence, which considers the collection of \sigma-subfields of events generated by a stochastic sequence. This theory is fundamental to nearly everything we do subsequently, particularly because, unlike the ergodic theory of the preceding sections, it generalizes beyond the measure-preserving (stationary) case. Consider a doubly infinite sequence \{X_t, t \in \mathbb{Z}\} (not necessarily stationary) and define the family of subfields \{\mathcal{F}_s^t, s \le t\}, where \mathcal{F}_s^t = \sigma(X_s, ..., X_t) is the smallest \sigma-field on which the sequence coordinates from dates s to t are measurable. The sets of \mathcal{F}_s^t can be visualized as the inverse images of (t - s + 1)-dimensional cylinder sets in \mathbb{R}^{\infty}; compare the discussion of §12.3, recalling that \mathbb{N} and \mathbb{Z} are equipotent. We can let one or other of the bounds tend to infinity, and a particularly important sub-family is the increasing sequence \{\mathcal{F}_{-\infty}^t, t \in \mathbb{Z}\}, which can be thought of as, in effect, 'the information contained in the sequence up to date t'. The limiting case, the \sigma-field on which the sequence as a whole is measurable, is \mathcal{F}_{-\infty}^{\infty} = \bigvee_t \mathcal{F}_{-\infty}^t. In cases where the underlying probability model concerns just the sequence \{X_t\} we shall identify \mathcal{F}_{-\infty}^{\infty} with \mathcal{F}.
Another interesting object is the remote \sigma-field (or tail \sigma-field), \mathcal{F}_{-\infty} = \bigcap_t \mathcal{F}_{-\infty}^{-t}. This \sigma-field contains events about which we can learn something by observing any coordinate of the sequence, and it might plausibly be supposed that these events occurred in the 'remote' past, when the initial conditions for the sequence were set. However, note that a tail field may be generated in other ways, for example from the remote future as \bigcap_t \mathcal{F}_t^{\infty}.
One of the ways to characterize independence in a sequence is to say that any pair of non-overlapping subfields, of the form \mathcal{F}_{t_1}^{t_2} and \mathcal{F}_{t_3}^{t_4} where t_1 \le t_2 < t_3 \le t_4, are independent (see §10.5). One of the most famous early results in the theory of stochastic processes is Kolmogorov's 'zero-one law' for independent sequences. This theorem is usually given for the case of a sequence \{X_t\}_1^{\infty}, and the remote \sigma-field is defined in this case as \bigcap_t \mathcal{F}_t^{\infty}.

13.17 Theorem If the sequence \{X_t\}_1^{\infty} is independent, every remote event has probability either 0 or 1. ∎

The zero-one law shows us that for an independent sequence there are no events, other than trivial ones, that can be relevant to all sequence coordinates. But clearly, not only independent sequences have the zero-one property, and from our point of view the interesting problem is to identify the wider class of sequences that possess it. A sequence \{X_t\}_{-\infty}^{\infty} is said to be regular if every remote event has probability 0 or 1. Regularity is the term adopted by Ibragimov and Linnik (1971), to whom the basics of this theory are due. In a suitably unified framework, this is essentially equivalent to the mixing concept defined in §13.4. The following theorem says that in a regular sequence, remote events must be independent of all events in \mathcal{F}. Note that trivial events are independent of themselves, on the definition.
13.18 Theorem (Ibragimov and Linnik 1971: th. 17.1.1) \{X_t\}_{-\infty}^{\infty} is regular if and only if, for every B \in \mathcal{F},

\sup_{A \in \mathcal{F}_{-\infty}^{-t}} |P(A \cap B) - P(A)P(B)| \to 0 as t \to \infty.   (13.35)

Proof To prove 'if', suppose there exists E \in \mathcal{F}_{-\infty} with 0 < P(E) < 1. Then for every t, E \in \mathcal{F}_{-\infty}^{-t}, and setting A = B = E,

\sup_{A \in \mathcal{F}_{-\infty}^{-t}} |P(A \cap B) - P(A)P(B)| \ge P(E) - P(E)^2 > 0,   (13.36)

which contradicts (13.35).
To prove 'only if', assume regularity and define random variables \xi = 1_A - P(A) (\mathcal{F}_{-\infty}^{-t}-measurable) and \eta = 1_B - P(B) (\mathcal{F}-measurable), such that P(A \cap B) - P(A)P(B) = E(\xi\eta). Then, by the Cauchy-Schwartz inequality,

|E(\xi\eta)| = |E(\xi E(\eta | \mathcal{F}_{-\infty}^{-t}))| \le \|\xi\|_2 \|E(\eta | \mathcal{F}_{-\infty}^{-t})\|_2,   (13.37)

where the equality is by the law of iterated expectations, because A \in \mathcal{F}_{-\infty}^{-t}. Note that \|\xi\|_2 \le 1. We show that \|E(\eta | \mathcal{F}_{-\infty}^{-t})\|_2 \to 0 as t \to \infty, which will complete the proof, since A is an arbitrary element of \mathcal{F}_{-\infty}^{-t}. Consider the sequence \{E(1_B | \mathcal{F}_{-\infty}^{-t})(\omega)\}. For any \omega \in \Omega,

E(1_B | \mathcal{F}_{-\infty}^{-t})(\omega) \to E(1_B | \mathcal{F}_{-\infty})(\omega) as t \to \infty,   (13.38)

where by equation (10.18) and the zero-one property,

\int_E E(1_B | \mathcal{F}_{-\infty})(\omega)\,dP(\omega) = P(E \cap B) = \begin{cases} P(B), & P(E) = 1 \\ 0, & P(E) = 0, \end{cases}  E \in \mathcal{F}_{-\infty}.   (13.39)

It is clear that setting E(1_B | \mathcal{F}_{-\infty})(\omega) = P(B) a.s. agrees with the definition, so we may say that E(1_B | \mathcal{F}_{-\infty}) = P(B) a.s., or, equivalently, that

(E(1_B | \mathcal{F}_{-\infty}^{-t})(\omega) - P(B))^2 \to 0, a.s.   (13.40)

Since |1_B(\omega) - P(B)| \le 1 for all \omega \in \Omega, (E(1_B | \mathcal{F}_{-\infty}^{-t})(\omega) - P(B))^2 is similarly bounded, uniformly in t. Uniform integrability of the sequence can therefore be assumed, and it follows from 12.8 that

\|E(\eta | \mathcal{F}_{-\infty}^{-t})\|_2 = \|E(1_B | \mathcal{F}_{-\infty}^{-t}) - P(B)\|_2 \to 0 as t \to \infty,   (13.41)

as required. ∎
In this theorem, it is less the existence of the limit than the passage to the limit, the fact that the supremum in (13.35) can be made small by choosing t large, that gives the result practical significance. When the only remote events are trivial, the dependence of X_{t+k} on events in \mathcal{F}_{-\infty}^{-t}, for fixed k, must eventually decline as t increases. The zero-one law is an instant corollary of the necessity part of the theorem, since an independent sequence would certainly satisfy (13.35).
There is an obvious connection between the properties of invariance and remoteness. If T is a measure-preserving shift transformation we have the following simple implication.

13.19 Theorem If TA = A, then A \in \mathcal{F}_{-\infty}.

Proof If A \in \mathcal{F}_{-\infty}^{t}, then TA \in \mathcal{F}_{-\infty}^{t-1}. If TA = A, then T^kA = A for every k, and hence A \in \mathcal{F}_{-\infty}^{t-k} for every k; it follows immediately that A \in \bigcap_s \mathcal{F}_{-\infty}^{-s} = \mathcal{F}_{-\infty}. ∎

The last result of this section establishes formally the relationship between regularity and ergodicity, which has been implicit in the foregoing discussion.
13.20 Theorem (Ibragimov and Linnik 1971: cor. 17.1.1) If a stationary sequence \{X_t(\omega)\}_{-\infty}^{\infty} is regular, it is also ergodic.

Proof Every set A \in \mathcal{F}_{-\infty}^{\infty} is contained in a set A_t \in \mathcal{F}_{-t}^{t}, with the sequence \{A_t\} non-increasing and A_t \downarrow A. Thus, A_t may be constructed as the inverse image under x of the (2t+1)-dimensional cylinder set whose base is the projection of x(A) onto the coordinates -t, ..., t. The inclusion follows by 1.2(iv). By continuity of P, we can assume that P(A_t) \to P(A). Let A be invariant. Using the measure-preserving property of T, we find

P(A_t \cap A) = P(A_t \cap T^{-k}A) = P(T^kA_t \cap A).   (13.42)

Since k is arbitrary, regularity implies by (13.34) that P(A_t \cap A) = P(A_t)P(A). Letting t \to \infty yields P(A) = P(A)^2, so that P(A) = 0 or 1, as required. ∎
13.6 Strong and Uniform Mixing

The defect of mixing (regularity) as an operational concept is that remote events are of less interest than arbitrary events which happen to be widely separated in time. The extra ingredient we need for a workable theory is the concept of dependence between pairs of \sigma-subfields of events. There are several ways to characterize such dependence, but the following are the concepts that have been most commonly exploited in limit theory. Let (\Omega, \mathcal{F}, P) be a probability space, and let \mathcal{G} and \mathcal{H} be \sigma-subfields of \mathcal{F}; then

\alpha(\mathcal{G}, \mathcal{H}) = \sup_{G \in \mathcal{G}, H \in \mathcal{H}} |P(G \cap H) - P(G)P(H)|   (13.43)

is known as the strong mixing coefficient, and

\phi(\mathcal{G}, \mathcal{H}) = \sup_{G \in \mathcal{G}, H \in \mathcal{H}; P(G) > 0} |P(H | G) - P(H)|   (13.44)

as the uniform mixing coefficient. These are alternative measures of the dependence between the subfields \mathcal{G} and \mathcal{H}. If the subfields \mathcal{G} and \mathcal{H} are independent, then \alpha(\mathcal{G}, \mathcal{H}) = 0 and \phi(\mathcal{G}, \mathcal{H}) = 0, and the converse is also true in the case of uniform mixing, although not for strong mixing. At first sight there may appear not much to choose between the definitions, but since they are set up in terms of suprema of the dependence measures over the sets of events in question, it is the extreme (and possibly anomalous) cases which define the characteristics of the mixing coefficients. The strong mixing concept is weaker than the uniform concept. Since

|P(G \cap H) - P(G)P(H)| = P(G)|P(H | G) - P(H)| \le |P(H | G) - P(H)| \le \phi(\mathcal{G}, \mathcal{H})   (13.45)

for all G \in \mathcal{G} with P(G) > 0 and all H \in \mathcal{H}, it is clear that \alpha(\mathcal{G}, \mathcal{H}) \le \phi(\mathcal{G}, \mathcal{H}). However, the following example shows how the two concepts differ more crucially.

13.21 Example Suppose that, for a sequence of subfields \{\mathcal{G}_m\}_1^{\infty} and a subfield \mathcal{H}, \alpha(\mathcal{G}_m, \mathcal{H}) \to 0 as m \to \infty. This condition is compatible with the existence of sets G_m \in \mathcal{G}_m and H \in \mathcal{H} with the properties P(G_m) = 1/m and P(G_m \cap H) = P(G_m) = 1/m, for then |P(G_m \cap H) - P(G_m)P(H)| = (1/m)(1 - P(H)) \to 0. But \phi(\mathcal{G}_m, \mathcal{H}) \ge |P(H | G_m) - P(H)| = |1 - P(H)| for every m \ge 1, showing that the subfields \mathcal{G}_m and \mathcal{H} are not independent in the limit. ∎
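The phenomenon in Example 13.21 can be computed exactly on a finite sample space (an illustrative sketch, with the particular sets G_m and H chosen here being my own): a shrinking cell G_m contained in a fixed event H drives the strong mixing coefficient to zero while the uniform coefficient stays at |1 - P(H)|.

```python
from itertools import product

def sigma_field(partition):
    """All unions of cells of a partition (the generated sigma-field)."""
    return [frozenset().union(*(c for c, keep in zip(partition, mask) if keep))
            for mask in product([0, 1], repeat=len(partition))]

def alpha_phi(partG, partH, n):
    # Uniform probabilities on n outcomes; enumerate all event pairs.
    p = lambda e: len(e) / n
    Gs, Hs = sigma_field(partG), sigma_field(partH)
    a = max(abs(p(G & H) - p(G) * p(H)) for G in Gs for H in Hs)
    f = max(abs(p(G & H) / p(G) - p(H))
            for G in Gs if p(G) > 0 for H in Hs)
    return a, f

for m in (2, 10, 50):
    n = 2 * m
    omega = frozenset(range(n))
    Gm = frozenset({0})          # P(Gm) = 1/(2m), shrinking with m
    H = frozenset(range(m))      # P(H) = 1/2, and Gm is a subset of H
    a, f = alpha_phi([Gm, omega - Gm], [H, omega - H], n)
    print(m, a, f)               # alpha = 1/(4m) -> 0, phi stays at 1/2
```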
Evidently, the strong mixing characterization of 'independence' does not rule out the possibility of non-negligible dependence between the sets. \alpha and \phi are not the only mixing coefficients that can be defined, although they have proved the most popular in applications. Others that have appeared in the literature include

\beta(\mathcal{G}, \mathcal{H}) = \sup_{H \in \mathcal{H}} E|P(H | \mathcal{G}) - P(H)|   (13.46)

and

\rho(\mathcal{G}, \mathcal{H}) = \sup_{\xi, \eta} \frac{|E(\xi\eta)|}{\|\xi\|_2\|\eta\|_2},   (13.47)

where the latter supremum is taken with respect to all square-integrable, zero-mean, \mathcal{G}-measurable r.v.s \xi and \mathcal{H}-measurable r.v.s \eta. To compare these alternatives, first let \xi(\omega) = P(H | \mathcal{G})(\omega) - P(H), so that

\beta(\mathcal{G}, \mathcal{H}) = \sup_{H \in \mathcal{H}} \int |\xi|\,dP \ge \sup_{G \in \mathcal{G}, H \in \mathcal{H}} \left|\int_G \xi\,dP\right| = \sup_{G \in \mathcal{G}, H \in \mathcal{H}} |P(G \cap H) - P(G)P(H)| = \alpha(\mathcal{G}, \mathcal{H}).   (13.48)

Moreover, since for any sets G \in \mathcal{G} and H \in \mathcal{H}, \xi(\omega) = 1_G(\omega) - P(G) and \eta(\omega) = 1_H(\omega) - P(H) are members of the set over which \rho(\mathcal{G}, \mathcal{H}) is defined, while E(\xi\eta) = P(G \cap H) - P(G)P(H) and |\xi| \le 1 and |\eta| \le 1 in these cases, it is also clear that \rho \ge \alpha. Thus, notwithstanding its designation, 'strong' mixing is the weakest of these four variants, although it is of course stronger than ordinary regularity, characterized by trivial remote events. We also have \beta \le \phi, by an immediate corollary of the following result.
13.22 Theorem For all H \in \mathcal{H},

|P(H | \mathcal{G}) - P(H)| \le \phi(\mathcal{G}, \mathcal{H}) a.s. ∎
The main step in the proof of 13.22 is the following lemma.

13.23 Lemma Let X be an (almost surely bounded) \mathcal{G}-measurable r.v. Then

\sup_{G \in \mathcal{G}, P(G) > 0} \frac{1}{P(G)}\left|\int_G X\,dP\right| = \operatorname{ess\,sup} |X|.   (13.49)
Proof P(G)^{-1}|\int_G X\,dP| \le \operatorname{ess\,sup} |X|, for any set G in the designated class. For any \varepsilon > 0, consider the sets

G^+ = \{\omega: X(\omega) \ge \operatorname{ess\,sup} X - \varepsilon\},
G^- = \{\omega: -X(\omega) \ge \operatorname{ess\,sup}(-X) - \varepsilon\}.

By definition of ess sup, both these sets belong to \mathcal{G}, and at least one of them, the one corresponding to the larger of \operatorname{ess\,sup} X and \operatorname{ess\,sup}(-X), is nonempty and has positive probability. Letting G^* denote that set, we may conclude that

\frac{1}{P(G^*)}\left|\int_{G^*} X\,dP\right| \ge \operatorname{ess\,sup} |X| - \varepsilon.   (13.50)

(13.49) now follows on letting \varepsilon approach 0. ∎
Proof of 13.22 Put X = P(H | \mathcal{G}) - P(H) in the lemma, noting that this is a \mathcal{G}-measurable r.v. lying between +1 and -1. Observe that, for any G \in \mathcal{G} with P(G) > 0,

\frac{1}{P(G)}\left|\int_G X\,dP\right| = |P(H | G) - P(H)|.

Hence the lemma together with (13.44) implies that, for any H \in \mathcal{H},

\phi(\mathcal{G}, \mathcal{H}) \ge \operatorname{ess\,sup} |P(H | \mathcal{G}) - P(H)|,   (13.51)

so that |P(H | \mathcal{G}) - P(H)| \le \phi(\mathcal{G}, \mathcal{H}) with probability 1. ∎
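On a finite probability space, Lemma 13.23 can be checked by brute force (a sketch, with made-up atom probabilities and values): the essential supremum of |X| is just the largest |value|, and the supremum of P(G)^{-1}|\int_G X\,dP| over events of positive probability attains it, since we may take G to be the atom where |X| is largest.

```python
# Discrete check of Lemma 13.23 on a four-atom space.
from itertools import product

p = [0.1, 0.2, 0.3, 0.4]        # probabilities of the four atoms
x = [0.5, -1.25, 0.75, -0.3]    # values of a G-measurable r.v. X

best = 0.0
for mask in product([0, 1], repeat=4):
    pg = sum(pi for pi, m in zip(p, mask) if m)
    if pg > 0:
        integral = sum(pi * xi for pi, xi, m in zip(p, x, mask) if m)
        best = max(best, abs(integral) / pg)   # P(G)^{-1} |∫_G X dP|

print(best, max(abs(v) for v in x))  # both 1.25
```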
14 Mixing
14.1 Mixing Sequences of Random Variables

For a sequence \{X_t(\omega)\}_{-\infty}^{\infty}, let \mathcal{F}_{-\infty}^{t} = \sigma(..., X_{t-2}, X_{t-1}, X_t) as in §13.5, and similarly define \mathcal{F}_{t+m}^{\infty} = \sigma(X_{t+m}, X_{t+m+1}, X_{t+m+2}, ...). The sequence is said to be \alpha-mixing (or strong mixing) if \lim_{m\to\infty} \alpha_m = 0, where

\alpha_m = \sup_t \alpha(\mathcal{F}_{-\infty}^{t}, \mathcal{F}_{t+m}^{\infty})   (14.1)

and \alpha is defined in (13.43). It is said to be \phi-mixing (or uniform mixing) if \lim_{m\to\infty} \phi_m = 0, where

\phi_m = \sup_t \phi(\mathcal{F}_{-\infty}^{t}, \mathcal{F}_{t+m}^{\infty})   (14.2)
and \phi is defined in (13.44). \phi-mixing implies \alpha-mixing, as noted in §13.6, while the converse does not hold. Another difference is that \phi-mixing is not time-reversible; in other words, it is not necessarily the case that \sup_t \phi(\mathcal{F}_{t+m}^{\infty}, \mathcal{F}_{-\infty}^{t}) = \sup_t \phi(\mathcal{F}_{-\infty}^{t}, \mathcal{F}_{t+m}^{\infty}). By contrast, \alpha-mixing is time-reversible: if the sequence \{X_t\}_{-\infty}^{\infty} is \alpha-mixing, so is the sequence \{Y_t\}_{-\infty}^{\infty} where Y_t = X_{-t}. The sequence \{X_t(\omega)\}_{-\infty}^{\infty} is also said to be absolutely regular if \lim_{m\to\infty} \beta_m = 0, where

\beta_m = \sup_t \beta(\mathcal{F}_{-\infty}^{t}, \mathcal{F}_{t+m}^{\infty})   (14.3)

and \beta is defined in (13.46). According to the results in §13.6, absolute regularity is a condition intermediate between strong mixing and uniform mixing. On the other hand, if \{X_t\}_{-\infty}^{\infty} is a stationary, L_2-bounded sequence, and \mathcal{F}_{-\infty}^{0} = \sigma(..., X_{-1}, X_0) and \mathcal{F}_{m}^{\infty} = \sigma(X_m, X_{m+1}, ...), the sequence is said to be completely regular if \rho_m = \rho(\mathcal{F}_{-\infty}^{0}, \mathcal{F}_{m}^{\infty}) \to 0, where \rho is defined in (13.47). In stationary Gaussian sequences, complete regularity is equivalent to strong mixing. Kolmogorov and Rozanov (1960) show that in this case

\alpha_m \le \rho_m \le 2\pi\alpha_m,   (14.4)

the coefficients being related to the spectral density function of the sequence,

f(\lambda) = \frac{1}{2\pi}\sum_{j=-\infty}^{\infty} c_j e^{ij\lambda}, \lambda \in (-\pi, \pi],   (14.5)

where \{c_j\} denotes the autocovariance sequence.
The theorem of Kolmogorov and Rozanov leads to the result proved by Ibragimov and Linnik (1971: th. 17.3.3), that a stationary Gaussian sequence is strong mixing when f(\lambda) exists and is continuous and strictly positive everywhere on [-\pi, \pi].
This topic is something of a terminological minefield. 'Regularity' is an undescriptive term, and there does not seem to be unanimity among authors with regard to usage, complete regularity and absolute regularity sometimes being used synonymously. Nor is the list of mixing concepts given here by any means exhaustive. Fortunately, we shall be able to avoid this confusion by sticking with the strong and uniform cases. While there are some applications in which absolute regularity provides just the right condition, we shall not encounter any of these. Incidentally, the term 'weak mixing' might be thought appropriate as a synonym for regularity, but should be avoided as there is a risk of confusion with weak dependence, a term used, often somewhat imprecisely, to refer to sequences having summable covariances. Strongly dependent sequences may be stationary and mixing, but their covariances are non-summable. ('Weak' implies less dependence than 'strong' in this instance, not more!)

Confining our attention now to the strong and uniform mixing definitions, measures of the dependence in a sequence can be based in various ways on the rate at which the mixing coefficients \alpha_m or \phi_m tend to zero. To avoid repetition we will discuss just strong mixing, but the following remarks apply equally to the uniform mixing case, on substituting \phi for \alpha throughout. Since the collections \mathcal{F}_{-\infty}^{t} and \mathcal{F}_{t+m}^{\infty} are respectively non-decreasing in t and non-increasing in t and m, the sequence \{\alpha_m\} is monotone. The rate of convergence is often quantified by a summability criterion, that for some number \varphi > 0, \alpha_m \to 0 sufficiently fast that

\sum_{m=1}^{\infty} \alpha_m^{1/\varphi} < \infty.   (14.6)
The term size has been coined to describe the rate of convergence of the mixing numbers, although different definitions have been used by different authors, and the terminology should be used with caution. One possibility is to say that the sequence is of size -\varphi if the mixing numbers satisfy (14.6). However, the commonest usage (see for example White 1984) is to say that a sequence is \alpha-mixing of size -\varphi_0 if \alpha_m = O(m^{-\varphi}) for some \varphi > \varphi_0. It is clear that such sequences are summable when raised to the power 1/\varphi_0, so that this concept of size is stronger than the summability concept. One temptation to be avoided is to define the size as '-\varphi, where \varphi is the largest constant such that the \alpha_m^{1/\varphi} are summable', for no such number may exist.

Since mixing is not so much a property of the sequence \{X_t\} as of the sequences of \sigma-fields generated by \{X_t\}, it holds for any random variables measurable on those \sigma-fields, such as measurable transformations of X_t. More generally, we have the following implication.

14.1 Theorem Let Y_t = g(X_t, X_{t-1}, ..., X_{t-\tau}) be a measurable function, for finite \tau. If X_t is \alpha-mixing (\phi-mixing) of size -\varphi, then Y_t is also. ∎
Proof Let \mathcal{G}_{-\infty}^{t} = \sigma(..., Y_{t-1}, Y_t) and \mathcal{G}_{t+m}^{\infty} = \sigma(Y_{t+m}, Y_{t+m+1}, ...). Since Y_t is measurable on any \sigma-field on which each of X_t, X_{t-1}, ..., X_{t-\tau} is measurable, \mathcal{G}_{-\infty}^{t} \subseteq \mathcal{F}_{-\infty}^{t} and \mathcal{G}_{t+m}^{\infty} \subseteq \mathcal{F}_{t+m-\tau}^{\infty}. Let \alpha_{Y,m} = \sup_t \alpha(\mathcal{G}_{-\infty}^{t}, \mathcal{G}_{t+m}^{\infty}); it follows that \alpha_{Y,m} \le \alpha_{m-\tau} for m > \tau. With \tau finite, \alpha_{m-\tau} = O(m^{-\varphi}) if \alpha_m = O(m^{-\varphi}), and the conclusion follows. The same argument follows word for word with \phi replacing \alpha. ∎
14.2 Mixing Inequalities

Strong and uniform mixing are restrictions on the complete joint distribution of the sequence, and to make practical use of the concepts we must know what they imply about particular measures of dependence. This section establishes a set of fundamental moment inequalities for mixing processes. The main results bound the m-step-ahead predictions, E(X_{t+m} | \mathcal{F}_{-\infty}^{t}). Mixing implies that, as we try to forecast the future path of a sequence from knowledge of its history to date, looking further and further forward, we will eventually be unable to improve on the predictor based solely on the distribution of the sequence as a whole, E(X_{t+m}). The r.v. E(X_{t+m} | \mathcal{F}_{-\infty}^{t}) - E(X_{t+m}) is tending to zero as m increases, and we prove convergence in the L_p-norm.
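A toy illustration of these prediction norms, not from the text: for a stationary Gaussian AR(1), X_{t+1} = \rho X_t + \varepsilon_{t+1}, the m-step conditional mean is exactly \rho^m X_t, so \|E(X_{t+m} | \mathcal{F}_{-\infty}^{t}) - E(X_{t+m})\|_2 = |\rho|^m \|X_t\|_2 decays geometrically with m. The simulation below (parameter values are arbitrary choices) confirms the Monte Carlo norm against this formula.

```python
# m-step prediction norms for a stationary AR(1): X_{t+1} = rho*X_t + eps,
# with standard normal shocks. E(X_{t+m} | F_t) = rho^m * X_t, so the L2 norm
# of the predictor's deviation from the mean (zero) is |rho|^m * sd(X_t).
import math
import random

rho, n, m = 0.8, 200_000, 5
random.seed(0)
sd_x = math.sqrt(1.0 / (1.0 - rho**2))  # stationary standard deviation

x = random.gauss(0, sd_x)               # draw from the stationary law
pred_sq = 0.0
for _ in range(n):
    pred_sq += (rho**m * x) ** 2        # squared m-step conditional mean
    x = rho * x + random.gauss(0, 1)    # advance the chain one step

mc_norm = math.sqrt(pred_sq / n)
theory = abs(rho) ** m * sd_x
print(mc_norm, theory)                  # close agreement
```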
14.2 Theorem (Ibragimov 1962) For r > p \ge 1, and with \alpha_m defined in (14.1),

\|E(X_{t+m} | \mathcal{F}_{-\infty}^{t}) - E(X_{t+m})\|_p \le 2(2^{1/p} + 1)\alpha_m^{1/p - 1/r}\|X_{t+m}\|_r.   (14.7)
Proof To simplify notation, substitute X for X_{t+m}, \mathcal{G} for \mathcal{F}_{-\infty}^{t}, \mathcal{H} for \mathcal{F}_{t+m}^{\infty}, and \alpha for \alpha_m. It will be understood that X is an \mathcal{H}-measurable random variable, where \mathcal{G}, \mathcal{H} \subseteq \mathcal{F}. The proof is in two stages, first to establish the result for |X| \le M_X < \infty a.s., and then to extend it to the case where X is L_r-bounded for finite r. Define the \mathcal{G}-measurable r.v.

\eta = \operatorname{sgn}(E(X | \mathcal{G}) - E(X)) = \begin{cases} 1, & E(X | \mathcal{G}) \ge E(X) \\ -1, & \text{otherwise.} \end{cases}   (14.8)

Using 10.8 and 10.10,

E|E(X | \mathcal{G}) - E(X)| = E[\eta(E(X | \mathcal{G}) - E(X))] = E[E(\eta X | \mathcal{G}) - \eta E(X)] = \operatorname{Cov}(\eta, X) = |\operatorname{Cov}(\eta, X)|.   (14.9)

Let Y be any \mathcal{G}-measurable r.v., such as \eta for example. Noting that \xi = \operatorname{sgn}(E(Y | \mathcal{H}) - E(Y)) is \mathcal{H}-measurable, similar arguments give

|\operatorname{Cov}(X, Y)| = |E[X(E(Y | \mathcal{H}) - E(Y))]| \le E(|X||E(Y | \mathcal{H}) - E(Y)|) \le M_X E|E(Y | \mathcal{H}) - E(Y)| = M_X|\operatorname{Cov}(\xi, Y)|,   (14.10)
where the first inequality is the modulus inequality. \xi and \eta are simple random variables taking only two distinct values each, so define the sets A^+ = \{\eta = 1\}, A^- = \{\eta = -1\}, B^+ = \{\xi = 1\}, and B^- = \{\xi = -1\}. Putting (14.9) and (14.10) together with Y = \eta gives

E|E(X | \mathcal{G}) - E(X)| \le M_X|\operatorname{Cov}(\eta, \xi)| = M_X|E(\xi\eta) - E(\xi)E(\eta)|
= M_X|P(A^+ \cap B^+) + P(A^- \cap B^-) - P(A^+ \cap B^-) - P(A^- \cap B^+)
- P(A^+)P(B^+) - P(A^-)P(B^-) + P(A^+)P(B^-) + P(A^-)P(B^+)|
\le 4M_X\alpha.   (14.11)

Since |E(X | \mathcal{G}) - E(X)| \le |E(X | \mathcal{G})| + |E(X)| \le 2M_X, it follows that, for p \ge 1,

\|E(X | \mathcal{G}) - E(X)\|_p \le 2M_X(2\alpha)^{1/p}.   (14.12)
This completes the first part of the proof. The next step is to let X be L_r-bounded. Choose a finite positive M_X, and define X_1 = 1_{\{|X| \le M_X\}}X and X_2 = X - X_1. By the Minkowski inequality and (14.12),

\|E(X | \mathcal{G}) - E(X)\|_p \le \|E(X_1 | \mathcal{G}) - E(X_1)\|_p + \|E(X_2 | \mathcal{G}) - E(X_2)\|_p \le 2M_X(2\alpha)^{1/p} + 2\|X_2\|_p,   (14.13)

and the problem is to bound the second right-hand-side member. But

\|X_2\|_p^p \le M_X^{p-r}\|X\|_r^r   (14.14)

for r > p, so we arrive at

\|E(X | \mathcal{G}) - E(X)\|_p \le 2M_X(2\alpha)^{1/p} + 2M_X^{1-r/p}\|X\|_r^{r/p}.   (14.15)

Finally, choosing M_X = \|X\|_r\alpha^{-1/r} and simplifying yields

\|E(X | \mathcal{G}) - E(X)\|_p \le 2(2^{1/p} + 1)\alpha^{1/p-1/r}\|X\|_r,

which is the required result. ∎
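The bounded-case inequality (14.12) can be verified exactly on a small example (a sketch; the transition matrix is an arbitrary choice): for a two-state stationary Markov chain with states \{0, 1\}, the \sigma-fields generated by X_0 and X_m are finite, so \alpha and the prediction norm are both computable by enumeration, and the bound 2M_X(2\alpha)^{1/p} with p = 1 and M_X = 1 reads E|E(X_m|X_0) - E(X_m)| \le 4\alpha.

```python
# Exact check of (14.12), p = 1, on a two-state stationary Markov chain.
def mat_mult(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

P = [[0.9, 0.1], [0.2, 0.8]]
pi = [2 / 3, 1 / 3]                 # stationary distribution of P

Pm = [[1.0, 0.0], [0.0, 1.0]]
for m in range(1, 6):
    Pm = mat_mult(Pm, P)            # m-step transition probabilities
    # alpha(sigma(X_0), sigma(X_m)): enumerate the nontrivial event pairs.
    alpha = max(abs(pi[i] * (Pm[i][j] - (pi[0] * Pm[0][j] + pi[1] * Pm[1][j])))
                for i in range(2) for j in range(2))
    # L1 norm of E(X_m | X_0) - E(X_m), where X_m = 1{state = 1}, |X_m| <= 1.
    ex_m = pi[0] * Pm[0][1] + pi[1] * Pm[1][1]
    lhs = sum(pi[i] * abs(Pm[i][1] - ex_m) for i in range(2))
    assert lhs <= 4 * alpha + 1e-12  # inequality (14.12) with p = 1, M_X = 1
    print(m, lhs, 4 * alpha)         # both shrink geometrically with m
```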
There is an easy corollary, bounding the autocovariances of the sequence.

14.3 Corollary For p > 1 and r > p/(p-1),

|\operatorname{Cov}(X_t, X_{t+m})| \le 2(2^{1-1/p} + 1)\alpha_m^{1-1/p-1/r}\|X_t\|_p\|X_{t+m}\|_r.   (14.16)

Proof

|\operatorname{Cov}(X_t, X_{t+m})| = |E(X_tX_{t+m}) - E(X_t)E(X_{t+m})| = |E[X_t(E(X_{t+m} | \mathcal{F}_{-\infty}^{t}) - E(X_{t+m}))]|
\le \|X_t\|_p\|E(X_{t+m} | \mathcal{F}_{-\infty}^{t}) - E(X_{t+m})\|_{p/(p-1)}   (14.17)
\le 2(2^{1-1/p} + 1)\alpha_m^{1-1/p-1/r}\|X_t\|_p\|X_{t+m}\|_r,

where the first inequality is by Hölder's inequality, and the second is by 14.2 with p/(p-1) in place of p. ∎

The counterpart of 14.2 for uniform mixing is the following.

14.4 Theorem For r \ge p \ge 1,

\|E(X_{t+m} | \mathcal{F}_{-\infty}^{t}) - E(X_{t+m})\|_p \le 2\phi_m^{1-1/r}\|X_{t+m}\|_r.   (14.18)

Proof The result is proved first for the case where X_{t+m} is a simple r.v., X_{t+m} = \sum_i x_i1_{A_i}, where the sets A_i \in \mathcal{F}_{t+m}^{\infty} form a partition of \Omega. Letting q = r/(r-1), we have

|E(X_{t+m} | \mathcal{F}_{-\infty}^{t}) - E(X_{t+m})|^r = \left|\sum_i x_i(P(A_i | \mathcal{F}_{-\infty}^{t}) - P(A_i))\right|^r
\le \left(\sum_i |x_i||P(A_i | \mathcal{F}_{-\infty}^{t}) - P(A_i)|^{1/r}|P(A_i | \mathcal{F}_{-\infty}^{t}) - P(A_i)|^{1/q}\right)^r
\le \left(\sum_i |x_i|^r|P(A_i | \mathcal{F}_{-\infty}^{t}) - P(A_i)|\right)\left(\sum_i |P(A_i | \mathcal{F}_{-\infty}^{t}) - P(A_i)|\right)^{r/q}
\le \left(E(|X_{t+m}|^r | \mathcal{F}_{-\infty}^{t}) + E|X_{t+m}|^r\right)\left(\sum_i |P(A_i | \mathcal{F}_{-\infty}^{t}) - P(A_i)|\right)^{r/q}.   (14.19)
The second inequality here is by 9.25. The sets A_i partition \Omega, and P(A_i \cup A_{i'} | \mathcal{F}_{-\infty}^{t}) = P(A_i | \mathcal{F}_{-\infty}^{t}) + P(A_{i'} | \mathcal{F}_{-\infty}^{t}) a.s., and P(A_i \cup A_{i'}) = P(A_i) + P(A_{i'}), for i \ne i'. Letting A_1^* denote the union of all those A_i for which P(A_i | \mathcal{F}_{-\infty}^{t}) - P(A_i) \ge 0, and A_2^* the complement of A_1^* in \Omega,

\sum_i |P(A_i | \mathcal{F}_{-\infty}^{t}) - P(A_i)| = |P(A_1^* | \mathcal{F}_{-\infty}^{t}) - P(A_1^*)| + |P(A_2^* | \mathcal{F}_{-\infty}^{t}) - P(A_2^*)|.   (14.20)

By 13.22, the inequalities

|P(A_1^* | \mathcal{F}_{-\infty}^{t}) - P(A_1^*)| \le \phi_m, |P(A_2^* | \mathcal{F}_{-\infty}^{t}) - P(A_2^*)| \le \phi_m

hold with probability 1. Substituting into (14.19) gives

|E(X_{t+m} | \mathcal{F}_{-\infty}^{t}) - E(X_{t+m})|^r \le \left(E(|X_{t+m}|^r | \mathcal{F}_{-\infty}^{t}) + E|X_{t+m}|^r\right)(2\phi_m)^{r/q}, a.s.   (14.21)

Taking expectations and using the law of iterated expectations then gives

E|E(X_{t+m} | \mathcal{F}_{-\infty}^{t}) - E(X_{t+m})|^r \le 2E|X_{t+m}|^r(2\phi_m)^{r/q},   (14.22)

and, raising both sides to the power 1/r,

\|E(X_{t+m} | \mathcal{F}_{-\infty}^{t}) - E(X_{t+m})\|_r \le 2\|X_{t+m}\|_r\phi_m^{1-1/r}.   (14.23)

Inequality (14.18) follows by Liapunov's inequality. The result extends from simple to general r.v.s using the construction of 3.28. For any r.v. X_{t+m} there exists a monotone sequence of simple r.v.s \{X_{t+m}^{(k)}, k \in \mathbb{N}\} such that |X_{t+m}^{(k)}(\omega) - X_{t+m}(\omega)| \to 0 as k \to \infty, for all \omega \in \Omega. This convergence transfers a.s. to the sequences \{E(X_{t+m}^{(k)} | \mathcal{F}_{-\infty}^{t}), k \in \mathbb{N}\} by 10.15. Then, assuming X_{t+m} is L_r-bounded, the inequality in (14.22) holds as k \to \infty by the dominated convergence theorem applied to each side, with |X_{t+m}|^r as the dominating function, thanks to 10.13(ii). This completes the proof. ∎
14.5 Corollary
For r k 1, ICov(-Yr+,,,.x))l
where, if r
=
1, replace
s zgml/rjjxrjjrjjx/+mjlrytr-j),
by 1IX,+,,,IIx ess IlXf+r,,IIr/(r-1) =
(14.24)
sup Xt.vm.
Proof
ICov(x)+,,,,.')I I9-f-) F(-Y,+r,,)lIs(r-l) s lIA-,IlrlI'(.2t+,,, -
s 2#-17rlIx,lIrII=,+-lIr/(r-1), where the first inequality corresponds is by 14.4. .
(14.25)
to the one in (14.17),and the second one
where the first inequality corresponds to the one in (14.17), and the second one is by 14.4. ∎

These results tell us a good deal about the behaviour of mixing sequences. A fundamental property is mean reversion. The mean deviation sequence \{X_t - E(X_t)\} must change sign frequently when the rate of mixing is high. If the sequence exhibits persistent behaviour, with X_t - E(X_t) tending to have the same sign for a large number of successive periods, then |E(X_{t+m} | \mathcal{F}_{-\infty}^{t}) - E(X_{t+m})| would likewise tend to be large for large m. If this quantity is small, the sign of the mean deviation m periods hence is unpredictable, indicating that it changes frequently. But while mixing implies mean reversion, mean reversion need not imply mixing. Theorems 14.2 and 14.4 isolate the properties of greatest importance, but not the only ones. A sequence having the property that \|\operatorname{Var}(X_{t+m} | \mathcal{F}_{-\infty}^{t}) - \operatorname{Var}(X_{t+m})\|_p > 0 is called conditionally heteroscedastic. Mixing also requires this sequence of norms to converge as m \to \infty, and similarly for other integrable functions of X_{t+m}.

Comparison of 14.2 and 14.4 also shows that being able to assert uniform mixing can give us considerably greater flexibility in applications with respect to the existence of moments. In (14.18), the rate of convergence of the left-hand side to zero with m does not depend upon p; in particular, \|E(X_{t+m} | \mathcal{F}_{-\infty}^{t}) - E(X_{t+m})\|_1 converges to zero whenever \|X_{t+m}\|_{1+\delta} exists for a \delta > 0, a condition infinitesimally stronger than uniform integrability. In the corresponding inequality for \alpha_m in 14.2, p < r is required for the restriction to 'bite'. Likewise, 14.5 for the case r = 2 yields

|\operatorname{Cov}(X_t, X_{t+m})| \le 2\phi_m^{1/2}\|X_t\|_2\|X_{t+m}\|_2,   (14.26)

but to be useful, 14.3 requires that either X_t or X_{t+m} be L_{2+\delta}-bounded for \delta > 0. Mere existence of the variances will not suffice.
14.3 Mixing in Linear Processes

A type of stochastic sequence \{X_t(\omega)\}_0^{\infty} which arises very frequently in econometric modelling applications has the representation

X_t = \sum_{j=0}^{\infty} \theta_jZ_{t-j},

where \{Z_t\} is an independent innovation sequence and the coefficients \{\theta_j\} are square-summable. When the innovations have zero mean and common finite variance \sigma^2, the process is covariance stationary, with spectral density function

f(\lambda) = \frac{\sigma^2}{2\pi}\left|\sum_{j=0}^{\infty} \theta_je^{ij\lambda}\right|^2.   (14.29)

The theorem of Ibragimov and Linnik cited in §14.1 yields the condition \sum_{j=0}^{\infty} |\theta_j| < \infty as sufficient for strong mixing in the Gaussian case. However, another standard result (see Doob 1953: ch. X.8, or Ibragimov and Linnik 1971: ch. 16.7) states that every wide-sense stationary sequence admitting a spectral density has a (doubly-infinite) moving average representation with orthogonal increments and square-summable coefficients.
But allowing more general distributions for the innovations yields surprising results. Contrary to what might be supposed, having the \theta_j tend to zero even at an exponential rate is not sufficient by itself for strong mixing. Here is a simple illustration. Recall that the first-order autoregressive process X_t = \rho X_{t-1} + Z_t, |\rho| < 1, has the MA(\infty) form with \theta_j = \rho^j, j = 0, 1, 2, ...
14.6 Example Let \{Z_t\}_0^{\infty} be an independent sequence of Bernoulli r.v.s, with P(Z_t = 1) = P(Z_t = 0) = \frac{1}{2}. Let X_0 = Z_0 and

X_t = \tfrac{1}{2}X_{t-1} + Z_t = \sum_{j=0}^{t} 2^{-j}Z_{t-j}, t = 1, 2, 3, ...   (14.30)

It is not difficult to see that the random variable

X_t = \sum_{j=0}^{t} 2^{-j}Z_{t-j} = 2^{-t}\sum_{k=0}^{t} 2^kZ_k   (14.31)

belongs for each t to the set of dyadic rationals B_t = \{k/2^t, k = 0, 1, 2, ..., 2^{t+1} - 1\}. Each element of B_t corresponds to one of the 2^{t+1} possible drawings (Z_0, ..., Z_t), and has equal probability of 2^{-(t+1)}. If Z_0 = 0,

X_t \in B_t' = \{k/2^t, k = 0, 2, 4, ..., 2(2^t - 1)\},

whereas if Z_0 = 1,

X_t \in B_t'' = B_t - B_t' = \{k/2^t, k = 1, 3, 5, ..., 2^{t+1} - 1\}.

It follows that \{X_0 = 1\} \cap \{X_t \in B_t'\} = \emptyset, for every finite t. But it is clear that P(X_0 = 1) = \frac{1}{2} = P(X_t \in B_t'). Hence for every finite m,

\alpha_m \ge |P(\{X_0 = 1\} \cap \{X_m \in B_m'\}) - P(X_0 = 1)P(X_m \in B_m')| = \tfrac{1}{4},   (14.32)

which contradicts \alpha_m \to 0. ∎
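The dyadic mechanism in Example 14.6 simulates cleanly (an illustrative sketch): working with the integer numerator k = 2^tX_t avoids rounding entirely, and the parity of k recovers Z_0 exactly, so the joint event \{Z_0 = 1\} \cap \{X_m \in B_m'\} never occurs even though each factor has probability about one half.

```python
# Simulation of Example 14.6: X_t = X_{t-1}/2 + Z_t with fair Bernoulli Z_t.
# The numerator k = 2^m * X_m satisfies k = sum_j Z_j * 2^j, so k's parity
# equals Z_0: the events {Z_0 = 1} and {X_m in B_m'} (k even) are disjoint.
import random

random.seed(1)
m, trials, joint, even = 12, 20_000, 0, 0
for _ in range(trials):
    z = [random.randint(0, 1) for _ in range(m + 1)]
    k = sum(zj << j for j, zj in enumerate(z))   # numerator of X_m = k / 2^m
    if k % 2 == 0:
        even += 1            # X_m in B_m'
        joint += z[0]        # ... jointly with Z_0 = 1: never happens

assert joint == 0
print(even / trials)         # ~0.5, so the covariance gap is ~1/4, not 0
```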
Since the process starts at t = 0 in this case it is not stationary, but the example is easily generalized to a wider class of processes, as follows.

14.7 Theorem (Andrews 1984) Let \{Z_t\}_{-\infty}^{\infty} be an independent sequence of Bernoulli r.v.s, taking values 1 and 0 with fixed probabilities p and 1 - p. If X_t = \rho X_{t-1} + Z_t for \rho \in (0, \tfrac{1}{2}], \{X_t\}_{-\infty}^{\infty} is not strong mixing. ∎

Note, the condition on \rho is purely to expedite the argument. The theorem surely holds for other values of \rho, although this cannot be proved by the present approach.
Proof Write X_{t+s} = \rho^sX_t + X_{ts}, where

X_{ts} = \sum_{j=0}^{s-1} \rho^jZ_{t+s-j}.   (14.33)

The support of X_{ts} is finite for finite s, having at most 2^s distinct members. Call this set W_s, so that W_1 = \{0, 1\}, W_2 = \{0, 1, \rho, 1 + \rho\}, and so on. In general, W_{s+1} is obtained from W_s by adding \rho^s to each of its elements and forming the union of these elements with those of W_s; formally,

W_{s+1} = W_s \cup \{w + \rho^s: w \in W_s\}, s = 2, 3, ...   (14.34)

For given s, denote the distinct elements of W_s by w_j, ordered by magnitude with w_1 < ... < w_J, for J \le 2^s. Now suppose that X_t \in (0, \rho), so that \rho^sX_t \in (0, \rho^{s+1}). This means that X_{t+s} assumes a value between w_j and w_j + \rho^{s+1}, for some j. Defining the events A = \{X_t \in (0, \rho)\} and B_s = \{X_{t+s} \in \bigcup_{j=1}^{J}(w_j, w_j + \rho^{s+1})\}, we have P(B_s | A) = 1 for any s, however large. To see that P(A) > 0, consider the case Z_t = Z_{t-1} = Z_{t-2} = 0 and Z_{t-3} = 1, and note that then

0 < \rho^3 \le X_t = \sum_{j=3}^{\infty} \rho^jZ_{t-j} \le \sum_{j=3}^{\infty} \rho^j = \frac{\rho^3}{1 - \rho} < \rho   (14.35)

for \rho \in (0, \tfrac{1}{2}]. So, unless P(B_s) \to 1 as s \to \infty,

\alpha_s \ge |P(A \cap B_s) - P(A)P(B_s)| = P(A)(1 - P(B_s))   (14.36)

is bounded away from zero, and the sequence cannot be strong mixing. It remains to rule out P(B_s) \to 1. Suppose that

\min_{1 \le j \le J-1}(w_{j+1} - w_j) \ge \rho^s,

so that each interval (w_j, w_j + \rho^{s+1}) is followed, within the range of X_{t+s}, by a gap of positive width. Then X_{t+s} lies in such a gap, and B_s fails, whenever X_t \in (\rho, 1), an event whose probability is positive and does not depend on s. Hence P(B_s) is bounded away from 1, and the theorem follows by (14.36). ∎
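The key step of the proof, that P(B_s | A) = 1 for every s, can be checked numerically (a sketch; the values of \rho and s are arbitrary illustrative choices): conditional on X_t \in (0, \rho), the simulated X_{t+s} always lies within \rho^{s+1} above a support point of X_{ts}.

```python
# Numerical check of P(B_s | A) = 1 in Andrews' construction.
import random
from itertools import product

random.seed(7)
rho, s, burn, steps = 0.4, 6, 200, 50_000
# Support of X_{ts}: all subset sums of {1, rho, ..., rho^{s-1}}.
W = sorted({sum(e * rho**j for j, e in enumerate(eps))
            for eps in product([0, 1], repeat=s)})

x, path = 0.0, []
for _ in range(burn + steps + s):
    x = rho * x + random.randint(0, 1)   # X_t = rho*X_{t-1} + Z_t
    path.append(x)

hits = checks = 0
for t in range(burn, burn + steps):
    if 0.0 < path[t] < rho:              # event A occurred at date t
        checks += 1
        xs = path[t + s]
        # B_s: X_{t+s} within rho^{s+1} above some support point w.
        if any(w < xs < w + rho**(s + 1) + 1e-12 for w in W):
            hits += 1

assert checks > 0 and hits == checks     # P(B_s | A) = 1, empirically
print(checks, hits)
```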
At worst, the separation property assumed above holds after substituting \{-Z_t\} for \{Z_t\}, and hence \{-X_t\} for \{X_t\}; by stationarity, P(X_{-1} < 0) = P(X_0 < 0) < 1, and independence of the \{Z_t\} supplies the required positive-probability events in that case also. The remainder of this section is concerned with conditions under which a linear process X_t = \sum_j \theta_jZ_{t-j} is strong mixing. The argument works with a truncated weighted sum W_t of the recent innovations, chosen so that the mean-square truncation error E(X_t - W_t)^2 is suitably small; smoothness of the innovation density, moment conditions, and summability conditions on the coefficients (conditions 14.9(a)-(c) below) then deliver the result.
=
=
.%
0 for some C e e C) e @1-p,
G
=
f(t):
#(,()
H
=
(:
A%() e
DJ
E
=
f ):
Fz()
#) e
e
for some D m e J;'7l,
G
TP ,
Bk ,
10 -x,
where # (p2:Ir21 f n) e T k lv21 denotes the vector whose elements are the is a vector of positive constants. absolute values of p2, and z1 = (q,,,+1,...,n,,,+z)' Also define D pa = (w2:.,2 +#2 e D) e T k =
,
-
.
H may be thought of as the random event that has occun-ed when, first F2 = 1/2 is realized, and then W2 e D p2. By independence, the joint c.d.f. of the variables (Wa,F2,A%) factorizes as F = FwzFh,xo (say) and we can write -
#(&
=
#(X2 e D)
JsJss.va#F(+2,V2,20) =
=
jepjekzvltdFvz,xvl'x'
(14.49)
Mixing
223
where
:(1/2)
#(W2 e D
=
-
v2)
jp-vgdFwzw.
=
( 14.50)
These definitions set the scene for the main business of the proof, which is to show that events G and H are tending to independence as m becomes large. Given 5IB P'm'k ility of X this is sufficient for the result, since C and D are arbitrary. By the same reasoning that gave (14.49),we have -measurab
,
#(G Detine k*
E4
/'7 HIA
supyzssztpz) and
=
X.PIGcjE)
jcjg(v2)#Fvz,xa(v2,&).
=
x.
infpzssztva), and
=
(14.51)
(14.51)implies (14.52)
f PIGCHIAEI < z.PGcjE4.
Hence we have the bounds
PCGchP
+#45rn PCGCDHI-?EI
=
z*#(G)
; and similarly, since
-
k k.P(G4 f
(14.53)
+#(Grnz;faFC)
k z.PGcjE)
= z.P(G)
=
PEc),
+
z. K 1,
PGcllj
Choosing G
H(nEcl
(i.e., C x.
Ec) + #(G chHcj Ec4
fa
P(Ec4.
R#) in
=
PCEC4
-
-
k.PG
( 14.54)
(14.53)and (14.54)gives
f #(& S
z*+
PCEC),
in particular
(14.55)
and combining all these inequalities yields 1#(G rn /f)
1 f x* x++lPEc).
#4G)#(&
-
-
(14.56)
Write W_2 = A_{m,k}Z, where Z = (Z_{m+1}, ..., Z_{m+k})′ and A_{m,k} is defined by (14.48). Since |A_{m,k}| = 1 and the {Z_{m+1}, ..., Z_{m+k}} are independent, the change of variable formula from 8.18 yields the result that W_2 is continuously distributed with density

f(w) = Π_{t=m+1}^{m+k} f_t(ζ_t),  where ζ = A_{m,k}^{-1}w.   (14.57)

Theory of Stochastic Processes 224

Then, for v in a neighbourhood of the origin with |v| ≤ η, the following relations hold:

ε* − ε_* ≤ 2 sup_{|v|≤η} ∫ |f(w + v) − f(w)| dw
       = 2 sup_{|v|≤η} ∫ |Π_t f_t(ζ_t + ξ_t) − Π_t f_t(ζ_t)| dζ
       ≤ 2 Σ_{t=m+1}^{m+k} sup_{|v|≤η} ∫ |f_t(ζ_t + ξ_t) − f_t(ζ_t)| dζ_t
       ≤ 2M Σ_{t=m+1}^{m+k} sup_{|v|≤η} |ξ_t|,   (14.58)

where it is understood in the final inequality (which is by 14.10) that |ξ_t| ≤ δ, where δ is defined in condition 14.9(a). The third equality substitutes ξ = A_{m,k}^{-1}v, and uses the fact that ξ_t = 0 if v_t = 0, by the lower triangularity of A_{m,k}. For |v| ≤ η, note that

|ξ_t| = |Σ_{j=0}^{t-m-1} a_j v_{t-j}| ≤ Σ_{j=0}^{t-m-1} |a_j| η_{t-j} ≤ (Σ_{j=0}^∞ |a_j|) max_{m<s≤t} η_s,   (14.59)

where the a_j are the elements of A_{m,k}^{-1}, assuming η has been chosen with elements small enough that the terms in parentheses in the penultimate member do not exceed δ. This is possible by condition 14.9(c). For the final step, choose r to be the largest order of absolute moment if this does not exceed 2, and the largest even integer moment otherwise.
Then

P(E^c) = P(|Y_2| > η) = P(∪_{t=m+1}^{m+k} {|Y_t| > η_t}) ≤ Σ_{t=m+1}^{m+k} P(|Y_t| > η_t) ≤ Σ_{t=m+1}^{m+k} E|Y_t|^r η_t^{-r},   (14.60)

by the Markov inequality, and

E|Y_t|^r ≤ sup_t E|Z_t|^r G_t(r),   (14.61)

where G_t is given by (14.43), applying 11.15 for r ≤ 2 (see (11.65) for the required extension) and Lemma 14.11 for r > 2. Substituting inequalities (14.58), (14.59), (14.60), and (14.61) into (14.56) yields

|P(G ∩ H) − P(G)P(H)| ≤ K Σ_{t=m+1}^{m+k} (η_t + G_t(r)η_t^{-r})   (14.62)

for a finite constant K.
Since G_t(r) ↓ 0 by 14.9(b), it is possible to choose m large enough that (14.59), and hence (14.62), hold with η_t = G_t(r)^{1/(1+r)} for each t > m. We obtain

|P(G ∩ H) − P(G)P(H)| ≤ K Σ_{t=m+1}^∞ G_t(r)^{1/(1+r)},   (14.63)

where the right-hand sum is finite by 14.9(b), and goes to zero as m → ∞. This completes the proof. ∎

It is worth examining this argument with care, to see how violation of the conditions can lead to trouble. According to (14.56), mixing will follow from two conditions. The obvious one is that the tail component Y_2, the F^m_{−∞}-measurable part of X_2, becomes negligible, that is, that P(E) gets close to 1 when m is large, even when η is allowed to approach 0. But in addition, to have ε* − ε_* disappear, P(W_2 ∈ D_{−v_2}) must approach a unique limit as v_2 → 0, for any D, and whatever the path of convergence. When the distribution has atoms, it is easy to devise examples where this requirement fails. In 14.6, the set B_t is translated a distance 2^{−t} at each step, and for such a case these probabilities evidently do not converge, even in the limiting case as t → ∞. However, this is a sufficiency result, and it remains unclear just how much more than the absence of atoms is strictly necessary. Consider an example where the distribution is continuous, having a differentiable p.d.f., but condition (14.42) none the less fails.
14.12 Example Let f(z) = C_0 z^{-2} sin^2(z^4), z ∈ R. This is non-negative, continuous everywhere, and bounded by C_0 z^{-2}, and hence integrable. By choice of C_0 we can have ∫_{-∞}^{+∞} f(z)dz = 1, so f is a p.d.f. By the mean value theorem,

|f(z + a) − f(z)| = |a||f′(z + α(z)a)|,  α(z) ∈ (0, 1],   (14.64)

where f′(z) = 8C_0 z sin(z^4)cos(z^4) − 2C_0 sin^2(z^4)z^{-3}. But note that ∫_{-∞}^{+∞} |f′(z)|dz = ∞, and hence

(1/|a|) ∫_{-∞}^{+∞} |f(z + a) − f(z)| dz → ∞  as |a| → 0,   (14.65)
which contradicts (14.42). The problem is that the density is varying too rapidly in the tails of the distribution, and |f(z + a) − f(z)| does not diminish rapidly enough in these regions as a → 0.

The rate of divergence in (14.65) can be estimated. For fixed (small) a, |f(z + a) − f(z)| is at a local maximum at points at which sin^2((z + a)^4) ≈ 1 (or 0) and sin^2(z^4) ≈ 0 (or 1); since (z + a)^4 = z^4 + 4az^3 + O(a^2), this happens, in other words, where 4az^3 ≈ π/2. The solutions to these approximate relations can be written as z = ±C_1|a|^{-1/3} for C_1 > 0. At these points we can write, again approximately (orders of magnitude are all we need here),

|f(z + a) − f(z)| ≈ f(±C_1|a|^{-1/3}) = C_0 C_1^{-2}|a|^{2/3}.

The integral is bounded within the interval ±C_1|a|^{-1/3} by the area of the rectangle having height 2C_0C_1^{-2}|a|^{2/3} and base 2C_1|a|^{-1/3}, that is, by 4C_0C_1^{-1}|a|^{1/3}. Outside the interval, f is bounded by C_0z^{-2}, and the integral over this region is bounded by

2C_0 ∫_{C_1|a|^{-1/3}}^{+∞} z^{-2} dz = 2C_0C_1^{-1}|a|^{1/3}.

Adding up the approximations shows that ∫|f(z + a) − f(z)|dz = O(|a|^{1/3}); there is no M < ∞ for which this is bounded by M|a| as a → 0, and condition 14.13(a′) fails.

To conclude this chapter, we look at the case of uniform mixing. Manipulating inequalities (14.52)-(14.55) yields
|P(H | G) − P(H)| ≤ ε* − ε_* + P(E^c)(1 + 1/P(G)),   (14.73)

which shows that uniform mixing can fail unless P(E) = 1 for all m exceeding a finite value. Otherwise, we can always construct a sequence of events G whose probability is positive but approaching 0 no slower than P(E^c). When the support of (X_{−p}, ..., X_0) is unbounded this kind of thing can occur, as illustrated by 14.8.
The essence of this example does not depend on the AR(1) model, and similar cases could be constructed in the general MA(∞) framework. Sufficient conditions must include a.s. boundedness of the distributions, and the summability conditions are also modified. We will adapt the extended version of the strong mixing condition in 14.13, although it is easy to deduce the relationship between these conditions and 14.9 by setting δ = 1 below.
14.14 Theorem Modify the conditions of 14.13 as follows. Let (a′) and (c′) hold as before, but replace (b′) by

(b″) Σ_{t=0}^∞ (Σ_{j=t}^∞ |θ_j|)^δ < ∞,

and add

(d) {Z_t} is uniformly bounded a.s.

Then {X_t} is uniform mixing, with φ_m = O(Σ_{t=m+1}^∞ (Σ_{j=t}^∞ |θ_j|)^δ). ∎

Proof Follow the proof of 14.9 up to (14.55), but replace (14.56) by (14.73). By condition 14.14(d), there exists K < ∞ such that sup_t |Z_t| ≤ K a.s., and hence |X_t| ≤ K Σ_{j=0}^∞ |θ_j| a.s. It further follows, recalling the definition of Y_2, that P(E) = 1 when η_t ≥ K Σ_{j=t-m}^∞ |θ_j| for t = m + 1, ..., m + k. Substituting directly into (14.73), and making this choice of η_t with (14.70) and (14.71), gives (for any G with P(G) > 0) the stated bound on φ_m. ∎
E(S_{n+1} | F_n) = S_n + φ_{n+1}E(X_{n+1} | F_n) = S_n.   (15.20)
We might think of X_t as the random return on a stake of 1 unit in a sequence of bets, and the sequence {φ_t} as representing a betting system, a rule based on information available at time t − 1 for deciding how many units to bet in the next game. The implication of (15.20) is that, if the basic game (in which the same stake is bet every time) is fair, there is no betting system (based on no more than information about past play) that can turn it into a game favouring the player, or, for that matter, a game favouring the house into a fair game.

For an increasing sequence {F_t} of σ-subfields of F in a probability space (Ω, F, P), a stopping time τ(ω) is a random integer having the property {ω: τ(ω) ≤ t} ∈ F_t. The classic example is from gambling, a policy which entails withdrawing from the game whenever a certain condition depending only on the outcomes to date (such as one's losses exceeding some limit, or a certain number of successive wins) is realized. If τ is the random variable defined as the first time the said condition is met in a sequence of bets, it is a stopping time.
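The no-betting-system implication of (15.20) can be verified by exact enumeration, since E(S_n) = Σ_t E(φ_t X_t) = 0 whenever φ_t depends only on past play. The sketch below is our illustration, not part of the text; the doubling-after-a-loss rule for φ_t is an arbitrary choice, and exact rational arithmetic over all 2^n equally likely ±1 outcome paths makes the check deterministic.

```python
from fractions import Fraction
from itertools import product

def stake(past):
    """A betting system: phi_t is a function of outcomes before t only.
    Here: double the stake after every loss, reset to 1 after a win."""
    s = 1
    for x in past:
        s = 2 * s if x < 0 else 1
    return s

n = 6
total = Fraction(0)
paths = list(product((-1, 1), repeat=n))      # all equally likely outcomes
for path in paths:
    total += sum(stake(path[:t]) * path[t] for t in range(n))
expected = total / len(paths)
assert expected == 0   # E(S_n) = 0: no system turns a fair game favourable
```

Replacing stake by any other rule that looks only at its argument leaves the assertion intact, which is exactly the content of (15.20).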
Let τ be a stopping time of {F_n}, and consider
S_{n∧τ} = S_n for n ≤ τ, and S_{n∧τ} = S_τ for n > τ,   (15.21)

where n ∧ τ stands for min{n, τ}. {S_{n∧τ}, F_n} is called a stopped process.

15.6 Theorem If {S_n, F_n} is a martingale (submartingale), then {S_{n∧τ}, F_n} is a martingale (submartingale).
Proof Since {F_n} is increasing, {ω: τ(ω) ≥ k} = {ω: τ(ω) ≤ k − 1}^c ∈ F_{k-1} for k ≤ n, and hence the indicator 1_{{τ ≥ k}} is F_{k-1}-measurable. Writing

S_{n∧τ} = S_1 + Σ_{k=2}^n 1_{{τ ≥ k}}(S_k − S_{k-1}),

the martingale (submartingale) property follows on taking conditional expectations term by term. ∎
Proof of 15.7 Fix a and b with a < b. By 15.8, the number U_n of upcrossings of (a, b) satisfies

E(U_n) ≤ (E|S_n| + |a|)/(b − a) ≤ (sup_n E|S_n| + |a|)/(b − a) < ∞,

so that the limiting number of upcrossings is finite a.s., and the convergence argument goes through.
Let τ_M(ω) be the smallest integer n such that

Σ_{k=1}^{n+1} E(X_k^2 | F_{k-1})(ω) > M,   (15.39)

if one exists; if there is no finite integer with this property, then τ_M(ω) = ∞. If D_M = {ω: τ_M(ω) = ∞}, then D = lim_{M→∞} D_M. The r.v. 1_{{τ_M ≥ n}}(ω) is F_{n-1}-measurable, since it is known at time n − 1 whether the inequality in (15.39) is true. Define the stopped process

S_{n∧τ_M} = Σ_{k=1}^n 1_{{τ_M ≥ k}}X_k,   (15.40)

which is a martingale by 15.6. The increments are orthogonal, and

sup_n E(S_{n∧τ_M}^2) = sup_n E(Σ_{k=1}^n 1_{{τ_M ≥ k}}X_k^2) = sup_n E(Σ_{k=1}^n 1_{{τ_M ≥ k}}E(X_k^2 | F_{k-1})) ≤ M,

where the final inequality holds for the expectation since it holds for each ω, by definition of τ_M(ω). By Liapunov's inequality,

sup_n E|S_{n∧τ_M}| ≤ sup_n ‖S_{n∧τ_M}‖_2 ≤ M^{1/2},

and hence S_{n∧τ_M} converges a.s., by 15.7. If ω ∈ D_M, S_{n∧τ_M}(ω) = S_n(ω) for every n ∈ N, and hence S_n converges on D_M, except for ω in a set of zero measure. That is, P(D_M ∩ C) = P(D_M). The theorem follows on taking complements, and then letting M → ∞. ∎
15.12 Example To get an idea of what convergence entails, consider the case of {X_t} an i.i.d. sequence (compare 15.1). Then {X_t/a_t} is a m.d. sequence for any constant sequence {a_t} of positive constants. Since E(X_t^2/a_t^2 | F_{t-1}) = E(X_t^2)/a_t^2 = σ^2/a_t^2, which we assume finite, S_n = Σ_{t=1}^n X_t/a_t is an a.s. convergent martingale whenever Σ_{t=1}^∞ 1/a_t^2 < ∞. For example, a_t = t would satisfy the requirement. ∎
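The convergence condition in 15.12 is purely deterministic: with a_t = t the accumulated conditional variance is σ^2 Σ_{t≤n} t^{-2}, which is monotone and bounded by σ^2 π^2/6. A quick numerical confirmation (our sketch, with σ^2 = 1 assumed for simplicity):

```python
import math

# Partial sums of sum_{t>=1} 1/t^2 are monotone and bounded by pi^2/6,
# so with a_t = t the martingale S_n = sum_{t<=n} X_t / t converges a.s.
# by Theorem 15.11.
partial = [sum(1.0 / t ** 2 for t in range(1, n + 1)) for n in (10, 100, 1000)]
assert partial == sorted(partial)          # monotone increasing in n
assert partial[-1] < math.pi ** 2 / 6      # strictly below the limit
```

By contrast a_t = t^{1/2} gives Σ 1/a_t^2 = Σ 1/t, which diverges, and the theorem delivers no conclusion.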
In the almost sure case of Theorem 15.11 (when P(C) = P(D) = 1), the summability of the conditional variances transfers to that of the ordinary variances, σ_t^2 = E(X_t^2). Also, when E(sup_t X_t^2) < ∞, the summability of the conditional variances is almost equivalent to the summability of the X_t^2 themselves. These are consequences of the following pair of useful results.
15.13 Theorem Let {Z_t} be any non-negative stochastic sequence.
(i) Σ_{t=1}^∞ E(Z_t) < ∞ if and only if Σ_{t=1}^∞ E(Z_t | F_{t-1}) < ∞ a.s.
(ii) If E(sup_t Z_t) < ∞, then P(D Δ E) = 0, where

D = {ω: Σ_{t=1}^∞ E(Z_t | F_{t-1})(ω) < ∞},  E = {ω: Σ_{t=1}^∞ Z_t(ω) < ∞}.
E(max_{1≤k≤n} |S_k|^p) ≤ (p/(p − 1))^p E|S_n|^p.   (15.47)

Proof Consider the penultimate member of (15.46) for the case p = 1, that is,

P(max_{1≤k≤n} |S_k| > ε) ≤ ε^{-1}E(|S_n| 1_{{max_{1≤k≤n}|S_k| > ε}}),   (15.48)

and apply the following ingenious lemma involving integration by parts.

15.16 Lemma Let X and Y be non-negative r.v.s. If P(X > ε) ≤ ε^{-1}E(Y 1_{{X > ε}}) for all ε > 0, then E(X^p) ≤ [p/(p − 1)]^p E(Y^p), for p > 1.

Proof Letting F_X denote the marginal c.d.f. of X, and integrating by parts, using d(1 − F_X) = −dF_X and [ξ^p(1 − F_X(ξ))]_0^∞ = 0,

E(X^p) = ∫_0^∞ ξ^p dF_X(ξ) = −[ξ^p(1 − F_X(ξ))]_0^∞ + ∫_0^∞ pξ^{p-1}(1 − F_X(ξ))dξ = ∫_0^∞ pξ^{p-1}P(X > ξ)dξ.   (15.49)

Define the function 1_{{X>ξ}}(ω) = 1 when X(ω) > ξ, and 0 otherwise. Letting F_{XY} denote the joint c.d.f. of X and Y, and substituting the assumption of the lemma into (15.49), we have

E(X^p) ≤ ∫_0^∞ pξ^{p-2}E(Y 1_{{X>ξ}})dξ
  = ∫_0^∞ pξ^{p-2}(∫∫_{(R^2)+} y 1_{{x>ξ}} dF_{XY}(x, y))dξ
  = ∫∫_{(R^2)+} y(∫_0^x pξ^{p-2}dξ)dF_{XY}(x, y)
  = (p/(p − 1)) ∫∫_{(R^2)+} yx^{p-1}dF_{XY}(x, y)
  = (p/(p − 1)) E(YX^{p-1}).   (15.50)

Here (R^2)+ denotes the non-negative orthant of R^2, or [0,∞) × [0,∞). The second equality is permitted by Tonelli's theorem, noting that the function F_{XY} defines a σ-finite product measure on (R^2)+. By Hölder's inequality,

E(YX^{p-1}) ≤ [E(Y^p)]^{1/p}[E(X^p)]^{1-1/p}.

Substituting into the majorant side of (15.50) and simplifying gives the result. ∎

To complete the proof of 15.15, apply the lemma to (15.48), putting X = max_{1≤k≤n}|S_k| and Y = |S_n|, to yield (15.47). ∎
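The tail-integral representation (15.49), E(X^p) = ∫_0^∞ pξ^{p-1}P(X > ξ)dξ, is easily verified in closed form for a discrete random variable, since P(X > ξ) is then piecewise constant. The sketch below (ours; the two-point distribution and p = 2 are arbitrary choices) checks it with exact rational arithmetic.

```python
from fractions import Fraction

# E(X^p) = integral_0^inf p xi^{p-1} P(X > xi) d xi, for X uniform on {1, 2}.
vals = [Fraction(1), Fraction(2)]
p = 2
lhs = sum(v ** p for v in vals) / len(vals)          # E(X^2) = (1 + 4)/2
# P(X > xi) = 1 on [0, 1), 1/2 on [1, 2), 0 beyond; and the integral of
# p xi^{p-1} over [a, b) is b^p - a^p, so the right-hand side is:
rhs = 1 * (1 ** p - 0 ** p) + Fraction(1, 2) * (2 ** p - 1 ** p)
assert lhs == rhs == Fraction(5, 2)
```

The same piecewise computation works for any finite support, which is a convenient way to confirm integration-by-parts identities of this kind before using them in a proof.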
Because of the orthogonality of the differences, we have the interesting property of a martingale {S_n} that

E(S_n^2) = E(Σ_{t=1}^n X_t^2),   (15.51)

where, with S_0 = 0, X_t = S_t − S_{t-1}. This lets us extend the last two inequalities, with the variance property for the case p = 2, to link P(max_{1≤k≤n}|S_k| > ε) and E(max_{1≤k≤n}S_k^2) directly to the variances of the increments. It would be most useful if this type of property extended to other values of p, in particular for p ∈ (0, 2). One approach to this problem is the von Bahr-Essen inequality of §11.5. Obviously, 11.15 has a direct application to martingales.
15.17 Theorem If {X_t, F_t} is a m.d. sequence and S_n = Σ_{t=1}^n X_t, then

E|S_n|^p ≤ 2 Σ_{t=1}^n E|X_t|^p   (15.52)

for 0 < p ≤ 2. ∎
Defining

K_t = Y_{t-1}^{p-1} − (Y_{t-1} + θ_t y_t)^{p-1},   (15.58)

we obtain the result by showing that Σ_{t=2}^n K_t y_t is bounded. Recall that (x + y)^r ≤ x^r + y^r for x, y > 0 and 0 < r < 1 (see (9.63)). Hence, (15.58) implies that

0 ≤ K_t y_t ≤ (1 − p)Y_{t-1}^{p-2} y_t^2,   (15.61)

and hence

Σ_{t=2}^n Y_n^{-p} K_t y_t ≤ (1 − p) Σ_{t=2}^n (y_t*)^2/Y_{t-1}* ≤ B_p(n),   (15.62)

where y_t* = y_t/Y_n, for t = 1, ..., n, is a collection of non-negative numbers summing to 1, Y_t* = Σ_{s≤t} y_s*, and B_p(n) denotes the supremum of the indicated sum over all such collections, given p and n. The terms (y_t*)^2/Y_{t-1}* for t ≥ 2 are finite since y_1 > 0. If at most a finite number of the y_t are positive, the majorant side of (15.62) is certainly finite, so assume otherwise. Without loss of generality, by reinterpreting n if necessary as the number of nonzero terms, we can also assume y_t > 0 for every t. Then y_t*/Y_{t-1}* = O(t^{-1}), and applying 2.27 yields the result B_p(n) = O(1) for all p ∈ (0, 1). Putting B_p = sup_n B_p(n) < ∞ completes the proof. ∎
Proof of 15.18 Put A_n = Σ_{t=1}^n E(X_t^2 | F_{t-1}), and for ε > 0 and δ > 0 set Y_n = ε + A_n. Then

S_n^2 + δ(ε + A_n) = (1 + δ)(ε + A_n) − ε + Σ_{t=1}^n (X_t^2 − E(X_t^2 | F_{t-1})) + 2Σ_{t=2}^n S_{t-1}X_t.   (15.63)

Then, by the left-hand inequality of (15.55), applied with y_1 = (1 + δ)(ε + X_1^2) > 0 and y_t = (1 + δ)(X_t^2 + 2S_{t-1}X_t) for t ≥ 2,

(S_n^2 + δ(ε + A_n))^p ≤ ((1 + δ)(ε + X_1^2))^p + (1 + δ)^p Σ_{t=2}^n Y_{t-1}^{p-1}(X_t^2 + 2S_{t-1}X_t).   (15.64)

However, ε is arbitrary in (15.64), and we may allow it to approach 0. Taking expectations through, using the law of iterated expectations, and the facts that E(X_t | F_{t-1}) = 0 and that Y_{t-1}^{p-1} is decreasing in its argument, we obtain

E(S_n^2 + δA_n)^p ≤ (1 + δ)^p E(X_1^{2p}) + (1 + δ)^p E(Σ_{t=2}^n A_{t-1}^{p-1}X_t^2).   (15.65)

But if we now put Y_n = ε + A_n, with y_1 = ε + X_1^2 and y_t = X_t^2 for t ≥ 2, the right-hand inequality of (15.55) yields (again, as the limiting case as ε ↓ 0)

E(X_1^{2p}) + E(Σ_{t=2}^n A_{t-1}^{p-1}X_t^2) ≤ (1 + B_p)E(A_n^p),   (15.66)

and this combines with (15.65) to give

(1 + δ)^p(1 + B_p)E(A_n^p) ≥ E(S_n^2 + δA_n)^p ≥ 2^{p-1}(E|S_n|^{2p} + δ^p E(A_n^p)),   (15.67)

where the second inequality is by the concavity of the function x^p for p ≤ 1. Rearrangement yields

E|S_n|^{2p} ≤ [2^{1-p}(1 + δ)^p(1 + B_p) − δ^p]E(A_n^p),   (15.68)

which is the right-hand inequality in (15.54), where C_p is given by choosing δ to minimize the expression on the majorant side of (15.68).

In a similar manner, combining the right-hand inequality of (15.55) with (15.65) and (15.67), and using concavity, yields

(1 + δ)(1 + B_p)E|S_n|^{2p} ≥ E(S_n^2 + δA_n)^p ≥ 2^{p-1}(E|S_n|^{2p} + δ^p E(A_n^p)),   (15.69)

which rearranges as

E|S_n|^{2p} ≥ δ^p[2^{1-p}(1 + δ)(1 + B_p) − 1]^{-1}E(A_n^p),   (15.70)

which is the left-hand inequality of (15.54), with the constant given by choosing δ to maximize the expression on the majorant side. ∎

For the case p = 1, B_p = 0 identically in (15.55), reproducing the known orthogonality property. Our final result is a so-called exponential inequality. This gives a probability bound for martingale processes whose increments are a.s. bounded, and which is accordingly related directly to the bounding constants, rather than to absolute moments.
15.20 Theorem If {X_t, F_t}_1^∞ is a m.d. sequence with |X_t| ≤ B_t a.s., where {B_t} is a sequence of positive constants, and S_n = Σ_{t=1}^n X_t, then

P(|S_n| > ε) ≤ 2 exp{−ε^2/(2Σ_{t=1}^n B_t^2)}. ∎   (15.71)

This is due, in a slightly different form, to Azuma (1967), although the corresponding result for independent sequences is Hoeffding's inequality (Hoeffding 1963). The chief interest of these results is the fact that the tail probabilities decline exponentially as ε increases. To fix ideas, consider the case B_t = B for all t, so that the probability bound becomes P(|S_n| > ε) ≤ 2 exp{−ε^2/(2nB^2)}. This inequality is trivial when n is small, since of course P(|S_n| > nB) = 0 by construction. However, choosing ε = O(n^{1/2}) allows us to estimate the tail probabilities associated with the quantity n^{-1/2}S_n. The fact that these are becoming exponential suggests an interesting connection with the central limit results to be studied in Chapter 24.
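The scaling remark can be made concrete: with B_t = B and ε = c·n^{1/2}B, the right-hand side of (15.71) collapses to 2 exp(−c^2/2), free of n. A small computation (our sketch; the function name azuma_bound is an assumption of this illustration) verifies both this invariance and the exponential decline in c.

```python
import math

def azuma_bound(eps, b_sq_sum):
    """Right-hand side of (15.71): 2 exp(-eps^2 / (2 * sum_t B_t^2))."""
    return 2.0 * math.exp(-eps ** 2 / (2.0 * b_sq_sum))

B, n = 1.0, 10_000
# Tail bound for P(|S_n| > c * sqrt(n) * B): independent of n.
for c in (1.0, 2.0, 3.0):
    bound = azuma_bound(c * math.sqrt(n) * B, n * B ** 2)
    assert abs(bound - 2.0 * math.exp(-c ** 2 / 2.0)) < 1e-12

# The tail probabilities decline exponentially as the threshold grows:
assert azuma_bound(3.0, 1.0) < azuma_bound(2.0, 1.0) < azuma_bound(1.0, 1.0)
```

This n-free bound for the normalized sum is precisely what anticipates the Gaussian tails of the central limit theory in Chapter 24.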
Proof of 15.20 By convexity, every x ∈ [−B_t, B_t] satisfies

e^{αx} ≤ ((B_t + x)/2B_t)e^{αB_t} + ((B_t − x)/2B_t)e^{−αB_t}

for any α > 0.

Indeed, 'mixingale differences' might appear the more logical terminology, but for the fact that the counterpart of the martingale (i.e. the cumulation of a mixingale sequence) does not play any direct role in this theory. The present terminology, due to Donald McLeish, who invented the concept, is standard. Many of the results of this chapter are basically due to McLeish, although his theorems are for the case p = 2. Unlike martingales, mixingales form a very general class of stochastic processes; many of the processes for which limit theorems are known to hold can be characterized as mixingales, although supplementary conditions are generally needed. Note that mixingales are not adapted sequences, in general. X_t is not assumed to be F_t-measurable, although if it is, (16.2) holds trivially for every m ≥ 0. The mixingale property captures the idea that the sequence {F_s} contains progressively more information about X_t as s increases; in the remote past nothing is known, according to (16.1), whereas in the remote future everything will eventually be known, according to (16.2). The constants c_t are scaling factors to make the choice of ζ_m scale-independent, and multiples of ‖X_t‖_p will often fulfil this role. As for mixing processes (see §14.1), we usually say that the sequence is of size −φ_0 if ζ_m = O(m^{−φ}) for φ > φ_0. However, the discussion following (14.6) also applies to this case.
16.2 Example Consider a linear process

X_t = Σ_{j=−∞}^∞ θ_j U_{t-j},   (16.3)

where {U_t}_{−∞}^{+∞} is an L_p-bounded martingale difference sequence, with p ≥ 1. Also let F_t = σ(U_s, s ≤ t). Then

E(X_t | F_{t-m}) = Σ_{j=m}^∞ θ_j U_{t-j}  a.s.,   (16.4)

X_t − E(X_t | F_{t+m}) = Σ_{j=m+1}^∞ θ_{-j}U_{t+j}  a.s.   (16.5)

Assuming {U_t}_{−∞}^∞ to be uniformly L_p-bounded, the Minkowski inequality shows that (16.1) and (16.2) are satisfied with c_t = sup_s‖U_s‖_p for every t, and ζ_m = Σ_{j=m}^∞ (|θ_j| + |θ_{-j}|). {X_t, F_t} is therefore an L_p-mixingale if ζ_m → 0 as m → ∞, and hence if the coefficients {θ_j}_{j=−∞}^∞ are absolutely summable. The 'one-sided' process in which θ_j = 0 for j < 0 arises more commonly in the econometric modelling context. In this case X_t is F_t-measurable and X_t − E(X_t | F_{t+m}) = 0 a.s., but we may set c_t = sup_{s≤t}‖U_s‖_p, which may increase with t, and does not have to be bounded in the limit to satisfy the definition. To prove X_t integrable, given integrability of the U_t, requires the absolute summability of the coefficients, and in this sense integrability is effectively sufficient for a linear process to be an L_1-mixingale. ∎
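The coefficients ζ_m of Example 16.2 can be computed directly for any choice of lag weights. The following sketch is our illustration, not part of the example: the one-sided geometric weights θ_j = φ^j with φ = 1/2 are an arbitrary choice, for which ζ_m = φ^m/(1 − φ), so that ζ_m decays faster than any power of m and the mixingale size is unrestricted.

```python
def zeta(m, theta):
    """Mixingale coefficient zeta_m = sum_{j >= m} |theta_j| for the
    one-sided linear process X_t = sum_{j >= 0} theta_j U_{t-j}."""
    return sum(abs(w) for w in theta[m:])

phi = 0.5
theta = [phi ** j for j in range(200)]        # theta_j = phi^j, j >= 0
for m in (0, 1, 5, 10):
    closed_form = phi ** m / (1.0 - phi)      # geometric tail sum
    assert abs(zeta(m, theta) - closed_form) < 1e-9
# zeta_m -> 0: {X_t} is an L_p-mixingale given uniform L_p-boundedness of U_t
assert zeta(20, theta) < 1e-5
```

Replacing the geometric weights by θ_j = (j + 1)^{-2}, say, gives ζ_m = O(m^{-1}), illustrating how slower coefficient decay translates into a smaller mixingale size.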
We could say that mixingales are to mixing processes as martingale differences are to independent processes; in each case, a restriction on arbitrary dependence is replaced by a restriction on a simple type of dependence, predictability of the level of the process. Just as martingale differences need not be independent, so mixingales need not be mixing. However, application of 14.2 shows that a mixing zero-mean process is an adapted L_p-mixingale for some p ≥ 1 with respect to the subfields F_t = σ(X_t, X_{t-1}, ...), provided it is bounded in the relevant norm. To be precise, the mean deviations of any L_r-bounded sequence which is α-mixing of size −a, for r > 1, form an L_p-mixingale of size −a(1/p − 1/r) for p satisfying 1 ≤ p < r. If the process is also φ-mixing, application of 14.4 tightens up the mixingale size: the mean deviations of a φ-mixing L_r-bounded sequence of size −a form an L_p-mixingale of size −a(1 − 1/r) for 1 ≤ p ≤ r. The reader can supply suitable definitions of c_t in each case. It is interesting that the indicated mixingale size is lower (absolutely) than the mixing size, except only in the case of a φ-mixing sequence having finite sup-norm (L_r-bounded for all r). Although these relative sizes could be an artefact of the inequalities which can be proved, rather than the sharpest available, this is not an unreasonable result. If a sequence has so many outliers that it fails to possess higher-order moments, it would not be surprising to find that it can be predicted further into the future than a sequence with the same dependence structure but more restricted variations.
The next examples show the type of case arising in the sequel.

16.3 Example An L_r-bounded, zero-mean adapted sequence is an L_2-mixingale of size −½ if either r > 2 and the sequence is α-mixing of size −r/(r − 2), or r ≥ 2 and it is φ-mixing of size −r/2(r − 1). ∎

16.4 Example Consider, for any j ≥ 0, the adapted zero-mean sequence {X_tX_{t+j} − σ_{t,t+j}, F_{t+j}}, where σ_{t,t+j} = E(X_tX_{t+j}) and {X_t} is defined as in 16.3. By 14.1 this is α-mixing (φ-mixing) of the same size as X_t for finite j, and is L_{r/2}-bounded, since

‖X_tX_{t+j}‖_{r/2} ≤ ‖X_t‖_r ‖X_{t+j}‖_r
by the Cauchy-Schwartz inequality. To establish the mixingale property, bounds are needed for E|E_{t-s-1}X_t| and E|X_t − E_{t+s}X_t|; hence, applying the triangle inequality,

E|Z_t| ≤ Σ_s E|E_{t-s-1}X_t| + Σ_s E|X_t − E_{t+s}X_t| < 2Σ_s ζ_s,

and the majorant side tends to 0 as m → ∞, as required.
The technical argument in the final paragraph of this proof can be better appreciated after studying Chapter 18. It is neither possible nor necessary in this approach to assert that E(G_{n+1} | F_n) → 0 a.s. Note how taking conditional expectations of (16.12) yields

E(W_t | F_{t-1}) = Z_t − ε_{t+1}  a.s.   (16.21)

It follows that the conditional expectation vanishes otherwise. For any p > 1,

E(max_{1≤j≤n} |S_j|^p) ≤ (p/(p − 1))^p (Σ_{k=−∞}^∞ a_k)^{p-1} Σ_{k: a_k>0} a_k^{1-p} E|Y_{nk}|^p.   (16.26)
Proof For a real sequence {x_k}_{−∞}^∞ and positive real sequence {a_k}_{−∞}^∞, let K = Σ_{k=−∞}^∞ a_k, and note that

|Σ_k x_k|^p = K^p |Σ_k (a_k/K)(x_k/a_k)|^p ≤ K^{p-1} Σ_k a_k^{1-p}|x_k|^p,   (16.27)

where the weights a_k/K sum to unity, and the inequality follows by the convexity of the power transformation (Jensen's inequality). Clearly, (16.27) remains true if the terms corresponding to zero x_k are omitted from the sums, and for these cases we may set a_k = 0 without loss of generality. Put x_k = Y_{jk}, take the max over 1 ≤ j ≤ n, and then take expectations, to give
the result. ∎

If ζ_k = 0 for k > m, one may choose a_k = 1 for k = 0, ..., m + 1, and K reduces to (m + 1)(ζ_0^2 + ζ_1^2). Alternatively, consider the case where ζ_k > 0 for every k. Choosing a_0 proportional to ζ_0, and defining the a_k for k ≥ 1 by the recursion of (16.36), each a_k is real and positive if this is true of a_{k-1}, and Σ_{k=0}^∞ a_k ≤ 2Σ_{k=0}^∞ ζ_k remains finite whenever the ζ_k are summable. Manipulating the telescoping relations implied by the recursion, (16.37)-(16.40), then leads to the bound

K ≤ 16 Σ_{m=1}^∞ (Σ_{k=0}^m ζ_k^{-2})^{-1/2}.   (16.41)

This result links the maximal inequality directly with the issue of the summability of the mixingale coefficients. In particular, we have the following
corollary.

16.10 Corollary Let {X_t, F_t} be an L_2-mixingale of size −½. Then

E(max_{1≤j≤n} S_j^2) ≤ K Σ_{t=1}^n c_t^2,   (16.42)

where K < ∞.

Proof If ζ_k = O(k^{-1/2-ε}) for ε > 0, as the size condition imposes, then Σ_{k=1}^m ζ_k^{-2} is of order m^{2+2ε}, so that (Σ_{k=1}^m ζ_k^{-2})^{-1/2} = O(m^{-1-ε}), and hence is summable over m. The theorem follows by (16.41). ∎
However, it should be noted that the condition

Σ_{m=1}^∞ (Σ_{k=0}^m ζ_k^{-2})^{-1/2} < ∞   (16.43)

is weaker than ζ_k = O(k^{-1/2-ε}) for ε > 0. Consider the case ζ_k = k^{-1/2}(log k)^{-1-ε} for some ε > 0: this sequence is not of size −½, but Σ_{k=1}^m ζ_k^{-2} = Σ_{k=1}^m k(log k)^{2+2ε} grows like m^2(log m)^{2+2ε}, whose inverse square root is summable over m, so that (16.43) holds.
Now we have the mixingale inequalities

‖E_{t-k}X_t − E_{t-k-1}X_t‖_p ≤ ‖E_{t-k}X_t‖_p + ‖E_{t-k-1}X_t‖_p ≤ 2c_tζ_k   (16.48)

for k ≥ 0, by (16.33). Hence,

E|Y_{nk}|^p ≤ 2^{p+1}ζ_k^p Σ_{t=1}^n c_t^p   (16.49)

(put ζ_0 = 1), and substitution in (16.26), with {a_k}_0^∞ a positive sequence and a_{-k} = a_k, gives

E(max_{1≤j≤n} |S_j|^p) ≤ 2^{p+1}(p/(p − 1))^p (Σ_k a_k)^{p-1} (Σ_k a_k^{1-p}ζ_k^p) Σ_{t=1}^n c_t^p.   (16.50)

Both Σ_k a_k and Σ_k a_k^{1-p}ζ_k^p can be summable for p > 1 only in the case ζ_k = O(a_k), and the conclusion follows. ∎
A case of special importance is the linear process of 16.2. Here we can specialize 16.11 as follows.

16.12 Corollary For 1 < p ≤ 2:
(i) if X_t = Σ_{j=−∞}^∞ θ_jU_{t-j}, then

E(max_{1≤j≤n} |S_j|^p) ≤ C_p (Σ_{j=0}^∞ (|θ_j| + |θ_{-j}|))^p n sup_t E|U_t|^p;

(ii) if X_t = Σ_{j=0}^∞ θ_jU_{t-j}, then

E(max_{1≤j≤n} |S_j|^p) ≤ C_p (Σ_{j=0}^∞ |θ_j|)^p Σ_{t=1}^n sup_{s≤t} E|U_s|^p. ∎

Proof In this case, E_{t-k}X_t − E_{t-k-1}X_t = θ_kU_{t-k}, and hence E|Y_{nk}|^p ≤ 2|θ_k|^p Σ_{t=1}^n E|U_{t-k}|^p. Letting {a_k} be any non-negative constant sequence with a_{-k} = a_k,

Σ_k a_k^{1-p} E|Y_{nk}|^p ≤ 2 Σ_k a_k^{1-p}|θ_k|^p Σ_{t=1}^n E|U_{t-k}|^p,

and choosing a_k = |θ_k| gives (i); the one-sided case (ii) follows in the same way. ∎

Recall that the mixingale coefficients in this case are ζ_m = Σ_{j=m}^∞ (|θ_j| + |θ_{-j}|), so linearity yields a dramatic relaxation of the conditions for the inequalities to be satisfied: absolute summability of the θ_j is sufficient, which corresponds simply to ζ_m → 0. A mixingale size of zero suffices. Moreover, there is no separate result for L_2-bounded linear processes; putting p = 2 yields a result that is correspondingly superior, in terms of mixingale size restrictions, to 16.11.
16.4 Uniform Square-integrability

One of the most important of McLeish's mixingale theorems is a further consequence of 16.9. It is not a maximal inequality, but belongs to the same family of results and has a related application. The question at issue is the uniform integrability of the sequence of squared partial sums.

16.13 Theorem (from McLeish 1975b: Lemma 6.5; 1977: Lemma 3.5) Let {X_t, F_t} be an L_2-mixingale of size −½, and v_n^2 = Σ_{t=1}^n c_t^2, where c_t is defined in (16.1)/(16.2). If the sequence {X_t^2/c_t^2}_{t=1}^∞ is uniformly integrable, then so is the sequence {max_{1≤j≤n} S_j^2/v_n^2}_{n=1}^∞. ∎
Proof A preliminary step is to decompose X_t into three components. Choose positive numbers B and m (to be specified below), let 1_t^B = 1_{{|X_t| ≤ Bc_t}}, and then define

U_t = X_t − E_{t+m}X_t + E_{t-m}X_t,   (16.52)
Y_t = E_{t+m}(X_t 1_t^B) − E_{t-m}(X_t 1_t^B),   (16.53)
Z_t = E_{t+m}(X_t(1 − 1_t^B)) − E_{t-m}(X_t(1 − 1_t^B)),   (16.54)

such that X_t = U_t + Y_t + Z_t. This decomposition allows us to exploit the following collection of properties. (To verify these, use various results from Chapter 10 on conditional expectations, and consider the cases k ≥ m and k < m separately.) First,

E(E_{t-k}U_t)^2 ≤ c_t^2 ζ_{k∨m}^2,  E(U_t − E_{t+k}U_t)^2 ≤ c_t^2 ζ_{(k∨m)+1}^2,   (16.55)-(16.56)

for k ≥ 0, where k ∨ m = max{k, m}. Second, E(E_{t-k}Y_t)^2 and E(Y_t − E_{t+k}Y_t)^2, in (16.57)-(16.58), are both zero if k ≥ m, and are otherwise bounded by (Bc_t)^2, where k ∧ m = min{k, m}. Third, E(E_{t-k}Z_t)^2 and E(Z_t − E_{t+k}Z_t)^2, in (16.59)-(16.60), are zero for k ≥ m, and bounded by E(X_t^2(1 − 1_t^B)) otherwise. Note that E(X_t^2(1 − 1_t^B))/c_t^2 = E((X_t/c_t)^2 1_{{|X_t/c_t| > B}}) → 0 as B → ∞, uniformly in t, by the assumption of uniform integrability.

The inequality

(Σ_{t=1}^j X_t)^2 ≤ 3[(Σ_{t=1}^j U_t)^2 + (Σ_{t=1}^j Y_t)^2 + (Σ_{t=1}^j Z_t)^2]   (16.61)

for 1 ≤ j ≤ n follows from substituting X_t = U_t + Y_t + Z_t, multiplying out, and applying the Cauchy-Schwartz inequality. For brevity, write x_j = S_j^2/v_n^2, u_j = (Σ_{t=1}^j U_t)^2/v_n^2, y_j = (Σ_{t=1}^j Y_t)^2/v_n^2, and z_j = (Σ_{t=1}^j Z_t)^2/v_n^2. Then (16.61) is equivalent to x_j ≤ 3(u_j + y_j + z_j), for each j = 1, ..., n. Also let x̄_n = max_{1≤j≤n} x_j, and define ū_n, ȳ_n, and z̄_n similarly; then clearly,

x̄_n ≤ 3(ū_n + ȳ_n + z̄_n).   (16.62)

For any r.v. X ≥ 0 and constant M > 0, introduce the notation E_M(X) = E(1_{{X>M}}X), so that the object of the proof is to show that sup_n E_M(x̄_n) → 0 as M → ∞. As a consequence of (16.62) and 9.29,

E_M(x̄_n) ≤ 3E_{M/3}(ū_n + ȳ_n + z̄_n) ≤ 6[E_{M′}(ū_n) + E_{M′}(ȳ_n) + E_{M′}(z̄_n)]   (16.63)

for a suitable M′ proportional to M. We now show that, for any ε > 0, each of the expectations on the right-hand side of (16.63) can be bounded by ε by choosing M large enough. First consider E_M(ū_n); given (16.55) and (16.56), and assuming ζ_m = O(m^{-1/2-δ}), we can apply 16.9 to this case, setting a_k = m^{-1-δ} for k ≤ m and a_k = k^{-1-δ} for k > m. Applying (16.29), with Σ U_t substituted for S_j in that expression, produces

E(ū_n) = O(m^{-δ}),   (16.64)

where the order of magnitude in m follows from 2.27(iii). Evidently we can choose m large enough that E(ū_n) < ε. Henceforth, let m be fixed at this value.

A similar argument is applied to E(z̄_n), but in view of (16.59) and (16.60) we may choose a_k = 1, 0 ≤ k ≤ m, and a_k = 0 otherwise. Write, formally, E(E_{t-k}Z_t)^2 ≤ c_t^2(ζ_k^Z)^2 and E(Z_t − E_{t+k}Z_t)^2 ≤ c_t^2(ζ_{k+1}^Z)^2, where

(ζ_k^Z)^2 ≤ max_{1≤t≤n} E((X_t/c_t)^2 1_{{|X_t/c_t| > B}}).   (16.66)

This term goes to zero as B → ∞, so let B be fixed at a value large enough that E(z̄_n) < ε.

For the remaining term, notice that

Y_t = Σ_{k=−m+1}^m (E_{t+k}(X_t 1_t^B) − E_{t+k-1}(X_t 1_t^B)),   (16.67)

where, for each k, {E_{t+k}(X_t 1_t^B) − E_{t+k-1}(X_t 1_t^B), F_{t+k}} is a m.d. sequence. If 16.8 is applied for the case p = 4, with a_k = 1 for |k| ≤ m and 0 otherwise, we obtain (not forgetting that, for y_j > 0, (max_j y_j)^2 = max_j y_j^2)

E(ȳ_n^2) ≤ (2m + 1)^4 · 11(2B)^4,   (16.72)

using the bound Σ_t E(Y_t^4) ≤ 11(2B)^4 v_n^4. Plugging this bound into (16.68), and finally applying the inequality E_M(X) ≤ M^{-1}E(X^2) for X ≥ 0 and M > 0, yields

E_M(ȳ_n) ≤ M^{-1}E(ȳ_n^2) ≤ M^{-1}(2m + 1)^4 · 11(2B)^4.   (16.73)

By choice of M, this quantity can be made smaller than ε. Thus, according to (16.63), we have shown that E_M(x̄_n) < 18ε for large enough M or, equivalently,

E_M(x̄_n) → 0 as M → ∞.   (16.74)

By assumption, the foregoing argument applies uniformly in n, so the proof is complete. ∎
The array version of this result, which is effectively identical, is quoted for the record.

16.14 Corollary Let {X_nt, F_nt} be an L_2-mixingale array of size −½ with respect to constants {c_nt} given by (16.6)/(16.7), and define S_nj = Σ_{t=1}^j X_nt and v_n^2 = Σ_{t=1}^n c_nt^2. If {X_nt^2/c_nt^2} is uniformly integrable, then {max_{1≤j≤n} S_nj^2/v_n^2}_{n=1}^∞ is uniformly integrable.

Proof As for 16.13, after inserting the subscript n as required. ∎
17 Near-Epoch Dependence
17.1 Definitions and Examples As noted in j14.3, the mixing concept has a serious drawbackfrom the viewpoint of applications in time-series modelling, in that a function of a mixing sequence (even an independent sequence) that depends on an intinite number of lags and/or leads of the sequence is not generally mixing. Let X
=
#f(...,16-l,1$,'P)+1,...),
(17.1)
where Js is a vector of mixing processes. The idea to be developed in tllis chapter is that although Xt may not be mixing, if it depends almost entirely on the epoch' of f F,) it will often have properties permitting the application of limit theorems, of which the mixingale property is the most important. This idea goes back to lbragimov (1962),and had been formalized in different ways by Billingsley (1968),McLeish (1975a),Bierens (1983),Gallant and White (1988), Andrews (1988),and Ptscher and Prucha (1991a),among others. The following definitions encompass and extend most existing ones. Consider first a definition for sequences. tnear
17.1 Definition For a stochastic sequence {V_t}_{-∞}^{∞}, possibly vector-valued, on a probability space (Ω, F, P), let F_{t-m}^{t+m} = σ(V_{t-m},...,V_{t+m}), such that {F_{t-m}^{t+m}}_{m=0}^{∞} is an increasing sequence of σ-fields. If, for p > 0, a sequence of integrable r.v.s {X_t}_{-∞}^{∞} satisfies

    ‖X_t − E(X_t | F_{t-m}^{t+m})‖_p ≤ d_t ν_m,    (17.2)

where ν_m → 0, and {d_t}_{-∞}^{∞} is a sequence of positive constants, X_t will be said to be near-epoch dependent in L_p-norm (L_p-NED) on {V_t}_{-∞}^{∞}. □

Many results in this literature are proved for the case p = 2 (Gallant and White, 1988, for example), and the term near-epoch dependence, without qualification, may be used in this case. As for mixingales, there is an extension to the array case.
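To see Definition 17.1 in action, consider the simplest case: a one-sided linear process with geometrically declining weights driven by independent innovations. The sketch below is an illustration, not part of the text; the process X_t = Σ_{j≥0} θ^j V_{t-j} and its parameters are assumptions. With independent V_t, the conditional mean E(X_t|F_{t-m}^{t+m}) simply truncates the sum at lag m, so the L_2 distance in (17.2) can be computed in closed form and is seen to decline geometrically.

```python
import math

# For X_t = sum_{j>=0} theta^j V_{t-j} with V_t i.i.d.(0, sigma^2) and
# |theta| < 1, the conditional mean given F_{t-m}^{t+m} truncates the sum,
# so ||X_t - E(X_t|F_{t-m}^{t+m})||_2 = sigma * sqrt(sum_{j>m} theta^(2j)).
def ned_norm(theta: float, sigma: float, m: int) -> float:
    # tail sum_{j=m+1}^inf theta^(2j) = theta^(2(m+1)) / (1 - theta^2)
    return sigma * math.sqrt(theta ** (2 * (m + 1)) / (1.0 - theta ** 2))

theta, sigma = 0.8, 1.0
nu = [ned_norm(theta, sigma, m) for m in range(10)]
# nu_m falls by the factor theta with each extra lag, so the process is
# L2-NED of arbitrarily large size (geometric rate beats any power of m).
ratios = [nu[m + 1] / nu[m] for m in range(9)]
```

Since the decay is geometric, d_t can be taken constant and ν_m = θ^{m+1}/√(1 − θ²), satisfying (17.2) with room to spare.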
17.2 Definition For a stochastic array {{V_nt}_{t=-∞}^{∞}}_{n=1}^{∞}, possibly vector-valued, on a probability space (Ω, F, P), let F_{n,t-m}^{t+m} = σ(V_{n,t-m},...,V_{n,t+m}). If an integrable array {{X_nt}_{t=-∞}^{∞}}_{n=1}^{∞} satisfies

    ‖X_nt − E(X_nt | F_{n,t-m}^{t+m})‖_p ≤ d_nt ν_m,    (17.3)

where ν_m → 0, and {d_nt} is an array of positive constants, it is said to be L_p-NED on {V_nt}. □
Theory of Stochastic Processes
262
We discuss the sequence case below, with the extensions to the array case being easily supplied when needed. The size terminology which has been defined for mixing processes and mixingales is also applicable here. We will say that the sequence or array is L_p-NED of size −φ_0 when ν_m = O(m^{−φ}) for φ > φ_0. According to the Minkowski and conditional modulus inequalities,

    ‖X_t − E(X_t|F_{t-m}^{t+m})‖_p ≤ ‖X_t − W_t‖_p + ‖E(X_t − W_t | F_{t-m}^{t+m})‖_p ≤ 2‖X_t − W_t‖_p,    (17.12)

for any F_{t-m}^{t+m}-measurable random variable W_t.

where * denotes evaluation of the derivatives at points in the intervals (0, V_{t-j}]. Now define the stochastic sequence {X_t} by evaluating g_t at (V_t, V_{t-1}, ...). Note that

    ‖X_t − E(X_t|F_{t-m}^{t+m})‖_2 ≤ ‖X_t − g_t^m(V_t,...,V_{t-m})‖_2    (17.13)
by 10.12. The Minkowski inequality, (17.12), and (17.10) further imply that

    ‖X_t − g_t^m‖_2 ≤ Σ_{j=m+1}^{∞} ‖G_{t-j} V_{t-j}‖_2
                   ≤ Σ_{j=m+1}^{∞} b_j ‖F_{t-j} V_{t-j}‖_2
                   ≤ (Σ_{j=m+1}^{∞} b_j) sup_{j>m} ‖F_{t-j} V_{t-j}‖_2,    (17.14)

where G_{t-j} is the random variable defined by evaluating ∂g_t/∂v_{t-j} with respect to the random point (V_{t-j}, V_{t-j-1}, ...), and F_{t-j} bears the corresponding relationship with ∂f_t/∂v_{t-j}. X_t is therefore L_2-NED on {V_t}.

Now consider the variance of the partial sums S_n = Σ_{t=1}^{n} X_t:

    E(S_n²) = Σ_{t=1}^{n} E(X_t²) + 2 Σ_{t=1}^{n-1} Σ_{m=1}^{n-t} E(X_t X_{t+m}),

a sum of n² terms. If the sequence {X_t} is uncorrelated, only the n variances appear and, assuming uniformly bounded moments, E(S_n²) = O(n). For general dependent processes, summability of the sequences of autocovariances {E(X_t X_{t-j}), j ∈ ℕ} implies that on a global scale the sequence behaves like an uncorrelated sequence, in the sense that, again, E(S_n²) = O(n).

For future reference, here is the basic result on absolute summability. To fulfil subsequent requirements, this incorporates two easy generalizations. First, we consider a pair (X_t, Y_t), which effectively permits a generalization to random vectors, since any element of an autocovariance matrix can be considered. To deal with the leading case discussed above we simply set Y_t = X_t. Second, we frame the result in such a way as to accommodate trending moments, imposing L_r-boundedness but not uniform L_r-boundedness. It is also noted that, like previous results, this extends trivially to the array case.

17.7 Theorem Let {X_t, Y_t} be a pair of sequences, each L_{r/(r-1)}-NED of size −1 with respect to constants {d_t^X}, {d_t^Y}, for r > 2, on either (i) an α-mixing process of size −r/(r − 2), or
the constants appearing in (17.21) and (17.22) are smaller (absolutely) than the autocorrelations, and the latter need not converge at the same rate. But notice too that r/(r − 1) < 2, so it is always sufficient for the result if the functions are L_2-NED.

Proof As before, let E_s^t(·) denote E(·|F_s^t), and let k = [m/2]. The modulus and Hölder inequalities give

    |E(X_t Y_{t+m})| ≤ |E(X_t (Y_{t+m} − E_{t+m-k}^{t+m+k} Y_{t+m}))| + |E(X_t E_{t+m-k}^{t+m+k} Y_{t+m})|,

assuming throughout that the constants d_t^X and d_t^Y are not smaller than the corresponding L_2 norms.
All these results extend to the array case as before, by simply including the extra subscript throughout. Corollary 17.11 should be compared with 17.7, and care taken not to confuse the two. In the former we have k fixed and finite, whereas the latter result deals with the case m → ∞. The two theorems naturally complement each other in applying truncation arguments to infinite sums of products. Applications will arise in subsequent chapters. More general classes of function can be treated under an assumption of continuity, but in this case we can deal only with cases where φ_t(X_t) is L_2-NED. Let
    φ_t: ℝ^v → ℝ

be a function of v real variables, and use the taxicab metric on ℝ^v,

    ρ(x_1, x_2) = Σ_{i=1}^{v} |x_{1i} − x_{2i}|,    (17.31)

to measure the distance between points x_1 and x_2 of ℝ^v. We consider a set of results that impose restrictions of differing severity on the types of function allowed, but offer a trade-off with the severity of the moment restrictions. To begin with, impose the uniform Lipschitz condition,

    |φ_t(x_1) − φ_t(x_2)| ≤ B_t ρ(x_1, x_2) a.s.,    (17.32)

where B_t is a finite constant.

17.12 Theorem Let X_it be L_2-NED of size −a on {V_t} for i = 1,...,v, with constants d_it. If (17.32) holds, {φ_t(X_t)} is also L_2-NED on {V_t} of size −a, with constants a finite multiple of max_i{d_it}.

Proof Let φ̃_t denote an F_{t-m}^{t+m}-measurable approximation to φ_t(X_t). Then
    ‖φ_t(X_t) − E(φ_t(X_t)|F_{t-m}^{t+m})‖_2 ≤ ‖φ_t(X_t) − φ̃_t‖_2    (17.33)

by 10.12. Since φ_t(E_{t-m}^{t+m}X_t) is an F_{t-m}^{t+m}-measurable random variable, (17.33) holds for this choice of φ̃_t, and by Minkowski's inequality,

    ‖φ_t(X_t) − φ_t(E_{t-m}^{t+m}X_t)‖_2 ≤ B_t ‖ρ(X_t, E_{t-m}^{t+m}X_t)‖_2
        ≤ B_t Σ_{i=1}^{v} ‖X_it − E_{t-m}^{t+m}X_it‖_2
        ≤ B_t Σ_{i=1}^{v} d_it ν_im ≤ d_t ν_m,    (17.34)

where d_t = v B_t max_i{d_it} and ν_m = v^{-1} Σ_{i=1}^{v} ν_im, the latter sequence being of size −a. ∎
If we can assume only that the X_it are L_p-NED on {V_t} for some p ∈ (1,2), this argument fails. There is a way to get the result, however, if the functions φ_t are bounded almost surely.

17.13 Theorem Let X_it be L_p-NED of size −a on {V_t}, for 1 ≤ p ≤ 2, with constants d_it, i = 1,...,v. Suppose that, for each t, |φ_t(x)| ≤ M < ∞ a.s., and also that

    |φ_t(x_1) − φ_t(x_2)| ≤ min{B_t ρ(x_1, x_2), 2M} a.s.

Then {φ_t(X_t)} is L_2-NED on {V_t}, with constants a finite multiple of max_i{d_it}.

Proof For brevity, write φ_t for φ_t(X_t) and φ̃_t for φ_t(E_{t-m}^{t+m}X_t), so that |φ_t − φ̃_t| ≤ min{B_t ρ(X_t, E_{t-m}^{t+m}X_t), 2M} a.s.
and B_t ρ/C > 1. Substituting for C in (17.45) and (17.46) and applying (17.44) yields the result. ∎
The general result is then as follows.

17.16 Theorem Let {X_t} be a v-dimensioned random sequence, of which each element is L_2-NED of size −a on {V_t}, and suppose that φ_t(X_t) is L_r-bounded. Suppose further that, for 1 ≤ q ≤ 2,

    ‖ρ(X_t, E_{t-m}^{t+m}X_t)‖_q < ∞,  ‖B_t(X_t, E_{t-m}^{t+m}X_t)‖_{2(r-1)} < ∞,

and, for r > 2,

    ‖B_t(X_t, E_{t-m}^{t+m}X_t) ρ(X_t, E_{t-m}^{t+m}X_t)‖_r < ∞.

Then {φ_t(X_t)} is L_2-NED on {V_t} of size −a(r − 2)/2(r − 1).

Proof For ease of notation, write ρ for ρ(X_t, E_{t-m}^{t+m}X_t) and B for B_t(X_t, E_{t-m}^{t+m}X_t). As in the previous two theorems, the basic inequality (17.33) is applied, but now we have

    ‖φ_t(X_t) − E_{t-m}^{t+m}φ_t(X_t)‖_2 ≤ ‖Bρ‖_2 ≤ 2‖ρ‖_q^{q(r-2)/2(r-1)} ‖B‖_{2(r-1)}^{(r-2)/2(r-1)} ‖Bρ‖_r^{r/2(r-1)},    (17.47)

where the last step is by 17.15. For q ≤ 2, ‖ρ‖_q
ε > 0, there exist {h_t^m}, {d_t}, and {ν_m} as above such that, for every m,

    P(|X_t − h_t^m| > d_t ε) ≤ ν_m. □    (17.56)

The usual size terminology can be applied here. There is also the usual extension to arrays, by inclusion of the additional subscript wherever appropriate. If a sequence is L_p-approximable for p > 0, then by the Markov inequality,

    P(|X_t − h_t^m| > d_t ε) ≤ (d_t ε)^{-p} ‖X_t − h_t^m‖_p^p ≤ (ν_m/ε)^p,    (17.57)

and hence an L_p-approximable process is also L_0-approximable. An L_p-NED sequence is L_p-approximable, although only in the case p = 2 are we able to claim (from 10.12) that E(X_t|F_{t-m}^{t+m}) is the best L_p-approximator, in the sense that the p-norms in (17.55) are smaller than for any alternative choice of h_t^m.
17.19 Example Consider the linear process of 17.3. The function

    h_t^m = Σ_{j=-m}^{m} θ_j V_{t-j}    (17.58)

is different from E(X_t|F_{t-m}^{t+m}) unless {V_t} is an independent process, but it is also an L_p-approximator for X_t, since

    ‖X_t − h_t^m‖_p ≤ Σ_{j=m+1}^{∞} (|θ_j| ‖V_{t-j}‖_p + |θ_{-j}| ‖V_{t+j}‖_p) ≤ d_t ν_m,    (17.59)

where ν_m = Σ_{j=m+1}^{∞} (|θ_j| + |θ_{-j}|) and d_t = sup_t ‖V_t‖_p. □
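The bound in (17.59) can be checked numerically for an assumed two-sided filter. In the sketch below (an illustration, not from the text) the coefficients θ_j = 0.5^{|j|} and unit-variance independent innovations are assumptions; under them, the exact L_2 distance ‖X_t − h_t^m‖_2 is the square root of the tail sum of squared coefficients, while the L_1-coefficient sum ν_m (with d_t = 1) gives the cruder but general bound of (17.59).

```python
import math

# Truncation approximator h_t^m for a two-sided linear process with assumed
# coefficients theta_j = 0.5^|j| and i.i.d.(0,1) innovations.  The exact L2
# error is sqrt(sum_{|j|>m} theta_j^2); (17.59) bounds the Lp error by
# d_t * nu_m with nu_m = sum_{|j|>m} |theta_j| and d_t = 1 here.
def theta(j: int) -> float:
    return 0.5 ** abs(j)

def exact_l2(m: int, J: int = 200) -> float:
    return math.sqrt(sum(theta(j) ** 2 for j in range(-J, J + 1) if abs(j) > m))

def nu(m: int, J: int = 200) -> float:
    return sum(abs(theta(j)) for j in range(-J, J + 1) if abs(j) > m)

gaps = [(exact_l2(m), nu(m)) for m in range(8)]
# The exact L2 distance never exceeds the coefficient bound, and both
# decline geometrically with the truncation point m.
```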
17.20 Example In 17.4, the functions g_t^m are L_p-approximators for X_t, of infinite size, whenever sup_{s≤t} ‖V_s‖_p < ∞. □

One reason why approximability might have advantages over the L_p-NED property is the ease of handling transformations. As we found above, transferring the L_p-NED property to transformations of the original functions can present difficulties, and impose undesirable moment restrictions. With approximability, these difficulties can be largely overcome. The first step is to show that, subject to L_r-boundedness, a sequence that is approximable in probability is also L_p-approximable for p < r, and moreover, that the approximator functions can be bounded for
each finite m. The following is adapted from Pötscher and Prucha (1991a).

17.21 Theorem Suppose {X_t} is L_r-bounded, r > 1, and L_0-approximable by {h_t^m}. Then for 0 < p < r it is L_p-approximable by h̃_t^m = h_t^m 1(|h_t^m| ≤ d_t M_m), for a suitable sequence of constants {M_m}.

Proof Choose q = r/p and apply the Hölder inequality in the form

    ‖XY‖_p ≤ ‖X‖_pq ‖Y‖_{pq/(q-1)},    (17.61)

noting that ‖h̃_t^m‖_r ≤ ‖X_t‖_r + d_t M_m, by Minkowski's inequality. We obtain the following inequalities. First,

    ‖(X_t − h̃_t^m) 1(|X_t − h̃_t^m| > d_t η_m)‖_p ≤ ‖X_t − h̃_t^m‖_r P(|X_t − h̃_t^m| > d_t η_m)^{1/p − 1/r}.    (17.62)

Second, observe that

    {|X_t − h_t^m| ≤ d_t η_m} ∩ {|h_t^m| > d_t M_m} ⊆ {|X_t| > d_t(M_m − η_m)},

and hence, when M_m > η_m,

    P(|h_t^m| > d_t M_m, |X_t − h_t^m| ≤ d_t η_m) ≤ P(|X_t| > d_t(M_m − η_m)) ≤ ‖X_t‖_r^r (d_t(M_m − η_m))^{-r}    (17.63)

by the Markov inequality.
17.22 Theorem If φ_t: ℝ^v → ℝ satisfies (17.40), and {X_t} is L_0-approximable of size −φ, then {φ_t(X_t)} is L_0-approximable of size −φ.

Proof Fix δ > 0 and M > 0. The Markov inequality gives

    P(|φ_t(X_t) − φ_t(h_t^m)| > d_t δ) ≤ P(B_t(X_t, h_t^m) ρ(X_t, h_t^m) > d_t δ)
        ≤ P(B_t(X_t, h_t^m) > M) + P(ρ(X_t, h_t^m) > d_t δ/M).    (17.69)

Since M is arbitrary, the first term on the majorant side can be made as small as desired. The proof is completed by noting that

    P(ρ(X_t, h_t^m) > d_t δ/M) ≤ Σ_{i=1}^{v} P(|X_it − h_it^m| > d_it δ/(vM)) ≤ Σ_{i=1}^{v} ν_im → 0 as m → ∞. ∎    (17.70)
It might seem as if teaming 17.22 with 17.21 would allow us to show that, given an L_2-NED and L_r-bounded sequence, r > 2, which is accordingly L_2-approximable and hence L_0-approximable, any transformation satisfying the conditions of 17.22 is L_0-approximable, and therefore also L_2-NED, by 10.12. The catch with circumventing the moment restrictions of 17.16 in this manner is that it is not possible to specify the L_2-NED size of the transformed sequence. In (17.67), one cannot put a bound on the rate at which the sequence {η_m} may converge without specifying the distributions of the X_it in greater detail. However, if it is possible to do this in a given application, we have here an alternative route to dealing with transformations.

Pötscher and Prucha (1991a), to whom the concepts of this section are due, define approximability in a slightly different way, in terms of the convergence of the Cesàro-sums of the p-norms or probabilities. These authors say that X_t is L_p-approximable (p > 0) if
    limsup_{n→∞} n^{-1} Σ_{t=1}^{n} ‖X_t − h_t^m‖_p → 0 as m → ∞,    (17.71)

and is L_0-approximable if, for every δ > 0,

    limsup_{n→∞} n^{-1} Σ_{t=1}^{n} P(|X_t − h_t^m| > δ) → 0 as m → ∞.    (17.72)
It is clear that we might choose to define near-epoch dependence and the mixingale property in an analogous manner, leading to a whole class of alternative convergence results. Comparing these alternatives, it turns out that neither definition dominates, each permitting a form of behaviour by the sequences which is ruled out by the other. If (17.55) holds, we may write

    limsup_{n→∞} n^{-1} Σ_{t=1}^{n} ‖X_t − h_t^m‖_p ≤ (limsup_{n→∞} n^{-1} Σ_{t=1}^{n} d_t) ν_m → 0 as m → ∞,    (17.73)
so long as the limsup on the majorant side is bounded. On the other hand, if (17.71) holds we may define

    ν_m = limsup_{n→∞} n^{-1} Σ_{t=1}^{n} ‖X_t − h_t^m‖_p,    (17.74)

and then d_t = sup_m ‖X_t − h_t^m‖_p/ν_m will satisfy 17.18 so long as it is finite for finite t. Evidently, (17.71) permits the existence of a set of sequence coordinates for which the p-norms fail to converge to 0 with m, so long as these are ultimately negligible, accumulating at a rate strictly less than n as n increases. On the other hand, (17.55) permits trending moments, with for example d_t = O(t^λ), λ >
0, which would contradict (17.71). Similarly, for μ > 0 and ν_m > 0, define d_tm by the relation

    P(|X_t − h_t^m| > d_tm μ) = ν_m.    (17.75)

Then, allowing ν_m → 0 as m → ∞, define d_t = sup_m d_tm. (17.56) is satisfied if d_t < ∞ for each finite t; this latter condition need not hold under (17.72). On the other hand, (17.72) could fail in cases where, for fixed δ and every m, P(|X_t − h_t^m| > δ) is tending to unity as t → ∞.
IV THE LAW OF LARGE NUMBERS
18 Stochastic Convergence
18.1 Almost Sure Convergence

Almost sure convergence was defined formally in §12.2. Sometimes the condition is stated in the form
    P(limsup_{n→∞} |X_n − X| > ε) = 0, for all ε > 0.    (18.1)
Yet another way to express the same idea is to say that P(C) = 1 where, for each ω ∈ C and any ε > 0, |X_n(ω) − X(ω)| > ε at most a finite number of times as we pass down the sequence. This is also written as

    P(|X_n − X| > ε, i.o.) = 0, all ε > 0,    (18.2)

where i.o. stands for 'infinitely
often'. Note that the probability in (18.2) is assigned to an attribute of the whole sequence, not to a particular n. One way to grasp the 'infinitely often' idea is to consider the event ∪_{n=m}^{∞} {|X_n − X| > ε}; in words, 'the event that |X_n − X| > ε occurs at least once beyond a given point m in the sequence'. If this event occurs for every m, no matter how large, {|X_n − X| > ε} occurs infinitely often. In other words,

    {|X_n − X| > ε, i.o.} = ∩_{m=1}^{∞} ∪_{n=m}^{∞} {|X_n − X| > ε} = limsup_{n→∞} {|X_n − X| > ε}.    (18.3)

Useful facts about this set and its complement are contained in the following lemma.
18.1 Lemma Let {E_n ∈ F}_1^{∞} be an arbitrary sequence of events. Then

    (i) P(limsup_{n→∞} E_n) = lim_{n→∞} P(∪_{m=n}^{∞} E_m);

    (ii) P(liminf_{n→∞} E_n) = lim_{n→∞} P(∩_{m=n}^{∞} E_m).

Proof The sequence {∪_{m=n}^{∞} E_m}_{n=1}^{∞} is decreasing monotonically to limsup_n E_n. Part (i) therefore follows by 3.4. Part (ii) follows in exactly the same way, since the sequence {∩_{m=n}^{∞} E_m}_{n=1}^{∞} increases monotonically to liminf_n E_n. ∎
A fundamental tool in proofs of a.s. convergence is the Borel–Cantelli lemma. This has two parts, the 'convergence' part and the 'divergence' part. The former is the most useful, since it yields a very general sufficient condition for a.s. convergence, whereas the second part, which generates a necessary condition for convergence, requires independence of the sequence.
18.2 Borel–Cantelli lemma

(i) For an arbitrary sequence of events {E_n ∈ F}_1^{∞},

    Σ_{n=1}^{∞} P(E_n) < ∞ implies P(E_n, i.o.) = 0.

(ii) For a sequence of independent events {E_n ∈ F}_1^{∞},

    Σ_{n=1}^{∞} P(E_n) = ∞ implies P(E_n, i.o.) = 1.

Proof By 18.1(i) and countable subadditivity,

    P(E_n, i.o.) = lim_{m→∞} P(∪_{n=m}^{∞} E_n) ≤ lim_{m→∞} Σ_{n=m}^{∞} P(E_n) = 0    (18.7)

when Σ_{n=1}^{∞} P(E_n) < ∞, proving part (i). For part (ii), {E_n^c ∈ F}_1^{∞} is independent; hence

    P(∩_{n=m}^{∞} E_n^c) = Π_{n=m}^{∞} P(E_n^c) = Π_{n=m}^{∞} (1 − P(E_n)) ≤ exp(−Σ_{n=m}^{∞} P(E_n)) = 0,    (18.8)

by hypothesis, since e^{−x} ≥ 1 − x. (18.8) holds for all m, so

    P(liminf_{n→∞} E_n^c) = lim_{m→∞} P(∩_{n=m}^{∞} E_n^c) = 0,    (18.9)

by 18.1(ii). Hence,

    P(E_n, i.o.) = P(limsup_{n→∞} E_n) = 1 − P(liminf_{n→∞} E_n^c) = 1. ∎    (18.10)
To appreciate the role of tls result (the convergence partl in showing a.s. convergence, consider the particular case
Stochastic Convergence En
f
=
IXnlk
):
I
X())
-
283
*n=3PEn)< x, the condition PEn) > 0 can hold for at most a finite number of n. The lemma shows that P(En i.o.) has to be zero to avoid a contradiction. Yet another way to characterize a.s. convergence is suggested by the following
If
theorem. (Xn) converges a.s. to X if and only if for all 6: > 0
18.3 Theorem
lim # sup lXn
.:-1
S
-
s
nkm
pl-x
=
1.
(18.11)
e1 e @,
(18.12)
Proof Let X
A,,,(E)
=
(')1:
IS
IAk(t,))
-#()
n=m
and then (18.11) can be written in the fonn lim,,,-+x#(Am(e)) 1. The sequence U,,7=lAm(E), fA,,,(E) 17 is non-decreasing, so A,,,(E) = UXlA/:); letting A(E) 1. (18. 11) can be stated as #(A(:)) Define the set C by the property that, for each ) s C, (Xn((t))lT converges. That is, for (t) e C, =
=
=
3
lA%((J))
such that sup
??z()
-
X(l
numl)
1
0. Set E To show if' assume k, and define #(A(E))
=
i) >
(18.13)
0.
HencePto
1implies
=
.
=
,
X
A*
=
f)
=
t=1
$Ik for positive integer
C
<x)
Ajlk)
=
tj
Atlkjc
(18.14)
.
&1
1. The second equality here is 1.1(iv). By 3.6(ii), #(A*) 1 #(U7=1A(1/k)9 z4* is a convergent outcome in the sense of (18.13), hence A* But every element of c C, and the conclusion follows.w =
=
-
The last theorem characterizes a.s. convergence in tenns of the unform proximity of the tail sequences ( IXnll) X()) l 1';.,,,to zero, on a setAm whose measure x. A related, but distinct, result establishes a direct link approaches 1 as m between a.s. convergence and uniform convergence on subsets of (. -
--y
-6:t-:
X, there exists for every > 0 a 18.4 Egoroff's Theorem lf and only if Xn X() uniformly on C(&). set C(8) with #(C48)) k 1 8, such that Ak() .-+
-
Proof To show if', suppose aL()) converges uniformly on sets 1 2 3 The sequence ( C(1/k) k e (N) can be chosen as non-decreasing tonicityof #, and #(U7=1C(1/k)) 1 by continuity of #. To show Eonly
1
,
;***
,
=
C(I/Q,k
=
by mono1et
tif',
The fuzw of fzzrgd Numbers
284
A,,()
O (t,):lXn(t,))-X()I
=
0,
1im#(IXn X1 -
> f:)
=
0,
(18.18)
n'M
Xn is said to converge in probability (inpr.) to X. Here the convergent sequences are specified to be, not random elements (X,,()) 1T,but the nonstochastic sequences (#( IXn Xl > E) )T.The probability of the convergent subset of ('1 is left unspecified. However, the following relation is immediate from 18.3, since (18.11) -
implies (18.18).
18.5 Theorem If Xn
-66.9
X then Xn -E'.% X.
(a
The converse does not holdkConvergence in probability imposes a limiting condix. The tion on the marginal distribution of the pth member of the sequence as n negligible deviation of is approaches Xn X 1 probability that the from as we move down the sequence. Almost sure convergence, on the other hand, requires that byond a certain point in the sequence the probability that deviations are negligible from there on approaches 1. While it may not be intuitively obvious that a sequence can convergein pr. but not a.s., in 18.16 below we show thatconvergence in pr. is compatible with a.s. n/nconvergence. However, convergence in probability is equivalent to a.s. convergence on a ..-.h
Stochastic Convergence
285
subsequence', given a sequence that converges in pr., it is always possible, by throwing away some of the members of the sequence, to be left with an a.s. convergent sequence. 18.6 Theorem Xn -EI-> X if and only if every subsequence fAk, k e IN) contains a further subsequence fxnkcj, j e (N) which converges a.s. to X. 0 for any e > 0. This means if : suppose P( IXn XI> E) Proof To prove #( that, for any sequence of integers fns k e INl IXnk XI > e) 0. Hence for each j G LNthere exists an integer kU)such that 'only
-
--
-->
-
,
#( 1Xnk 71
>
-
1/.j) < 2-/, a11k k k(j).
(18.19)
Since this sequence of probabilities is summable over j, we conclude from the first Borel-cantelli lemma that
#( IXnkv
XI >
-
$lj i.o.)
=
0.
(18.20)
It follows, by consideration of the intinite subsequences Lj k J) 0 for every 6: > 0, and hence the that #( IXnkv XI > e i.o.) required. fxnw)J converges a.s. as : if (Ak) does not convergence in probability, there To prove > s)l subsequence (rl1)such that inflf#tlAk-rl s, for some 6: > which nlles out subsequence of in out convergence (?u), pr. on any a.s. on the same subsequence, by 18.5. w =
-
Kif'
18.3 Transformations
for J > 1/:, subsequenc: must exist a 0. This rules convergence
and Convergence
The following set tools of asymptotic even though most k-vector Xn is said
of results on convergence, a.s. and in pr., are lndamental theory. For completeness they are given for the vector case, of our own applications are to scalar sequences. A random to converge a.s. (in pr.) to a vector X if each element of Xn converges a.s. (in pr.) to the corresponding element of X. 18.7 Lemma Xn X a.s. (in pr.) if and Only if IlAk #II 0 a.s. (in pr.).19 -->
-->
-
Proof Take first the case of a.s. convergence. The relation
expressedas
k
P 1im n-yoo
for any
0. But
6: >
xni Xill X f=1
':
-6.+
-6-+
'(m.
t
is by hypothesis a set D e @, with #(D) = 1, such that A%() #4(z)),each (t G D. Continuity and 18.7 together imply that glxnll for each (.t e X-1(Cg ) ch D. This set has probability 1 by 3.6(iii). g(#()) Toprove (ii),analogous reasoning shows that, for each r > 0, 3 > O such that Proof For case
(i), there
--
--
1t0:IlX()
-A)lI
Note that if #(#)
) c'-kr,sj
0, then XnYn.-CL>0. FPn
Proof For a constant # > 0, define L1 ( Iyklus). The event ( lXaF,,I 6: > 0 is expressible as a disjoint union: IXnYnI sl l IA%IIF$ l k e:1 tp f IXnIIi'n F$ l El
t
For any :
>
0,
-
.
( IXn IIF,, I k #(l-Yn lIi
Ik
c l IXn I 2 e/#l,
:1
el for
=
=
s) < #(l-Yn 1
zlB4
-->
(18.27)
and 0.
-->
(18.28)
> 0, B& < cxa such that By the 0:(1) assumption there exists, for each YB YB #( IL Fn%I > 0) < for n e ENSince ( IXnIIF,, n l sl c l Il'n n I > 01, (18.27)and additivity imply, putting B = Bs in (8.28),that -
-
-
.
lim P( lXnk
lk
e)
x
The theorem follows since both 6: and 6 are arbitrary.
.
18.4 Convergence in L Norm #
Recall that when F(l Xn IP) < x, we have said that Xn is fw-bounded. Consider, for lf F(llxnllp < :cn all n, and limn-jxllAk --Y11p 0, p > 0, the sequence ( II-L-XIIpl7. When p = 2 we speak of Xn is said to converge in N zm?wzto x (writeXn l convergence in mean square (m.s.). Convergence in probability is sometimes called fvconvergence, terminology which can be explained by the factthatfv-convergence implies fx-convergence for 0 < q < p by Liapunov's inequality, together with the following relationship, which is immediate from the Markov inequality. =
---+
m.
-1f,+
18.1 Theorem lf Xn The converse
X for any p
>
0, then Xn .T.LA X. n
does not follow in general, but see the following theorem.
18.14 Theorem If Xn
.-#-(..>
X, and
( IL IPITis uniformly
integrable, then Xn -f:E.-)X.
288
The fzzw of zlr':
Numbers
Proof For e > 0, e-lAk
-11:
=
F(1( Ix,,-xl''>:) lXn
K e'(1(Ixn-xj''>EJ
-11#)
+
F(1f
--YlP) laY?,
Ixa-xtzkEl
+ s. l-Yn-XI#)
(18.30)
x. Uniform integrab0 as n Convergence in pr. means that #( lXn Xl > e) ility therefore implies, by 12.9, that the expectation on the majorant side of (18.30) converges to zero. The theorem follows since e: is arbitrary. K .-+
.-.+
-
We proved the a.s. counterpart of this result, in effect, as 12.8, whose conclusion can be written as: jXn X1 -E.. 0 implies A'lXn XI 0. The extension f'rom the fw1case to the Lp case is easily obtained by applying 18.8(i) to the case #(.) -,-h
-
-
lP. =I .
One of the useful features of Lp convergence is that the Lp norms of Xn X define a sequence of constants whose order of magnitude in n may be determined, providing a measure of the rate of approach to the limit. We will say for example that Xn converges to X in mean square at the rate nk if Ijxf-.Y112= On-h, but not on -k). This is useful in that the scaled random variable nkxt A') may be non-degenerate in the limit, in the sense of having positive but finite limiting variance. Determining this rate of convergence is often the tirst step in the analysis of limiting distributions, as discussed in Part V below. -
-
18.5 Exnmples Convergence in pr. is a weak mode of convergence in that without side conditions it does not imply, yet is implied by, a.s. convergence and Lp convergence. However, there is no implication from a.s. convergence to Lp convergence, or vice versa. A good way to appreciate the distinctions is to consider cases where one or other mode of convergence fails to hold. dpathological'
18.15 Example Look again at 12.7, in whichAk 0 withprobability 1 1/n, and Xn = = n with probability 1/n, for n 1,2,3,.... A convenient model for this sequence where m is Lebesgue is to let ( be a drawing from the space (r0,1),m,1j,pz) variable and random define the measure, =
Xnlj
=
n,
)
-
e (0,1/a),
(18.31)
0, otherwise.
The set f : limuAkt) # 0) consists of the point (0J, and has p.m. zero, so that Xn -E't-A 0 according to (18.1).But Fl XnlP = 0 (1 1/rl) + t'l'ln = n#-1 lt will be recalled that this sequence is not uniformly integrable. lt fails to converge in 1 for every n. The Lp for any p > 1, but for the case p = 1 we obtain Exn) limiting expectation of Xn is therefore different from its almost sure limit. n -
.
.
=
The same device can be used to define a.s. convergent sequences wltich do not converge in Lp for any p > 0. It is left to the reader to construct examples.
Stochastic Convergence
289
18.16 Example Let a sequence be generated as follows: .Y1 1 with probability 1; xztxzl are either (0,1) or (1,0) with equal probability; (X4,X5,X are chosen from (1,0,0),(0,1,0),(0,0,1) with equal probability; and so forth. For k 1,2,3,... the next k members of the sequence are randomly selected such that one of them is unity, the others zero. Hence, for n in the range (kt/c 1) + 1, jkk + 1)J,P(Xn 1) 1/k, as well as FIXnlP 1/k for p > 0. Since k .-.A x as n .-+ x, it is clear that Xn converges to zero both in pr. and in Lp norm. But since, for any n, Xnwj 1 a.s. for infinitely many j, =
=
-
=
=
=
=
#(I-L I
18.18 Example Let (m) denote a sequence of independent Cauchy random variables w ith characteristic function 9xf(X) e- lXI for each t (11.9).It is easy to I.I verify using formulae (11.30) and (11.33) that #:a(D= e -aI,l/,, e xxord. ing to the inversion theorem, the average of n independent Cauchy variables is also a Cauchy variable. This result holds for any n, contradicting the possibility that Xncould converge to a constant. l:a =
-
=
.
18.19 Example Consider a process l
Xt
lgsz.
=
Xt- I +
=
jthzt,
t
1,2,3,
=
(18.34)
...
J=1
with Ab 0, where IZ/J7is an independent stationary sequence with mean 0 and variance c2, and (% 17is a sequence of constant coefficients. Notice, these are indexed with the absolute date rather than the lag relative to time /, as in the linear processes considered in j14.3. For m > 0, =
COv(X/,Xtwm)
=
VartXJ c21
t
=
x=l
Fs. l
(18.35)
< x, i n this case the effect fw-bounded requires Zyzzlvs * with of the innovations declines to zero t and Xt approaches a limiting random Without the square-summability assumption, Vartl An X, variable say. example of the latter case is the random walk process, in which vs 1, all s. Since Cov(xY1,m) for every /, these processes are not mixing. Xnhas zero mean, but
For
(X,)Tto be uniformly
.
-->
=.
=
=
W1c2
n
1 7'vartx,l+
vara) nw =
>1
n
277 >2
J-l
yvartxp
(18.36)
.
j=1
c27,=1vl; otherwise Vartxa) x. In either then limn-oxvartk) fixed limit, being either stochastic fails the fYa to converge to a ) sequence case
If
17
J=
lvy?
c,o,
=
292
The fww of zzr'd Numbers
asymptotically, or divergent. n These counter-examples illustrate the fact that, to obey the law of large numbers, a sequence must satisfy regularity conditions relating to two distinct factors: the probability of outliers (limitedby bounding absolute moments) and the degree of dependence between coordinates. ln 18.18 we have a case where the mean fails to exist, and in 18.19 an example of long-range dependence. ln neither case can Xn be thought of as a sample statistic which is estimatinj a parameter of the underlying distribution in any meaningf'ul fashion. ln Chapters 19 and 20 we devise sets of regularity conditions sufficient for weak and strong laws to operate, constraining both characteristics in different configurations. The necessity of a set of regularity conditions is usually hard to prove (the exception is when the sequences are independent), but various configurations of mixing and fw-boundedness conditions can be shown to be sufficient. These results usually exhibit a trade-off between the two dimensions of regularity; the stronger the moment restrictions are, the weaker dependence restrictions can be, and vice Versa.
One word of caution before we proceed to the theorems. In j9. 1 we sought to motivate the idea of an expectation by viewing it as the limit of the empirical average. There is a temptation to attempt to defne an expectation as such a limit', but to do so would inevitably involve us in circular reasoning, since the arguments establishing convergence are couched in the laguage of probability. The aim of the theory is to establish convergence in particular sampling schemes. lt cannot, for example, be used to validate the frequentist interpretation of probability. However, it does show that axiomatic probability yields predictions that accord with the frequentist model, and in this sense the laws of large numbers are among the most fundamental results in probability theory.
19 Convergence in Lp Norm
19.1 Weak Laws by Mean-square
Convergence
This chapter surveys a range of techniques for proving (mainly)weak laws of large numbers, ranging from classical results to recent additions to the literature. The common theme in these results is that they depend on showing convergence ih fv-norm, where in general p lies in the interval (1,21.Initially we consider the 2. The regularity conditions for these results relate directly to the case p variances and covariances of the process. While for subsequent results these moments will not need to exist, the faz case is of interest both because the conditions are familiar and intuitive, and because in certain respects the results available are more powedl. and variConsider a stochastic sequence (m)T,with sequence of means (g.r1T, 2 ances (cf )7. There is no loss of generality in setting g,f 0 by simply considering the case of faV w)7,but to focus the discussion on a familiar case, 1et us ---:: p, question, what initially assume Fn n -11J=1p< tfinite) and so consider therelation g)2 for 0:7 elementary conditions E(Xn An is are su fticient =
=
-
=
,
-->
-
Ek
g,)2
-
n
=
vartfnl+ EXn)
g)2.,
-
(19.1)
where the second term on the right-hand side converges to zero by detinition of p.. Thus the question becomes: when does Var(Xn) 0? We have -->
n
E ?z-1X(.2Vg,
Var(Xa) =
-
2
n
n
t- l
N(U 2XXcs /2
?z-2
=
+
>1
>1
(19.2)
,
ml
where (U Vartx and nts Cov(Xt,Xs). Suppose, to make life simple, we assume that the sequence is uncorrelated, with nts 0 for f # s in (19.2).Then we have the following well-known rsult. =
=
=
19.1 Theorem If
(.41Tis
uncorrelated sequence and *
77t
-2g2
,
< x
(19.3)
,
>1
-i+
then Xn
g,.
Proof
This is an application of Kronecker's 0. . implies VartYn) =
?z-21(U
lemma
(2.35), by
which
(19.3)
--
This result yields a weak 1aw of large numbers by application of 18.13, known conditton for (19.3)is that the as Chebyshev's theorem. An (amply)suftkient >nG
The zzw of fxzr'c Numbers
294
variances are uniformly bounded with, say, suprczff B < x. Wide-sense stationary 0(n-1). But since Vartxa) sequences fall into this class. ln such cases we have f1-8 /41), cl x is evidently permissable. If cl for Var(Xn)= a11we need is -lGl has O(f-1 and therefore converges by 2.27. terms of > 0, Z:=1f t is an unnecLooking at (19.2)again, it is also clear that uncorrelatedness essarily tough condition. It will suffice if the magnitude of the covariances can be suitably controlled. Imposing unifonn fu-boundebness to allow the maximum relaxation of constraints on dependence, the Cauchy-schwmz inequality tells us that lcsl S B for a11 t and s. Rearranging the formula in (19.2), =
-.->
-
-8)
,
Var(x-
n
1 n) n
+
-
n
xn x
p.2
c2, + g
p.. /2 Proof For part (i), 14.5 for the case r 2 yields the inequality Bm ; lB(m1 y.or /(z+y a r = 2 + 8 yields Bm S 6 IImII2+8a,,, part (ii), 14.3 for the case p xoting lIAll22+a, the conditions of 19.2 are satisfied in either case. . that B S =
.
=
.
O((1og ra)-2-f) for any 6: > 0. For A sufticient condition for 19 3(i) is (),,, -(1+V)(l+e)) Ottlog m) 19.3(ii), m sr : > () is suftkient. In the size terminology of j14.1, mixing of any size will ensure these conditiops. The most significant cost of using the strong mixing condition is that simple existence of the variances is not sufticient. This is not of course to say that no weak 1aw exists for Q-bounded strong mixing processes, but more subtle arguments, such as those of j19.4, are needed for the proof. =
=
19.2 Almost Sure Convergence by the Method of Subsequences Almost sure convergence does not follow from convergence in mean square (a counter-example is 18.16), but a clever adaptation of the above techniqueg yields a result. The proof of the following theorems makes use of the method of subsequences, exploiting the relation between convergence in pr. and convergence a.s. demonstrated in 18.6. Mainly for the sake of clarity, we first prve the result for the uncorrelated case. Notice how the condittons have to be strengthened, relative to 19.1.
19.4 Theorem If {X_t}₁^∞ is uniformly L₂-bounded and uncorrelated, X̄_n − μ̄_n → 0 a.s. □
A natural place to start in a sufficiency proof of the strong law is with the convergence part of the Borel–Cantelli lemma. The Chebyshev inequality yields, under the stated conditions,
$$P(|\bar X_n - \bar\mu_n| > \varepsilon) \le \frac{\mathrm{Var}(\bar X_n)}{\varepsilon^2} \le \frac{B}{n\varepsilon^2} \qquad (19.7)$$
for B < ∞, with the probability on the left-hand side going to zero with the right-hand side as n → ∞. One approach to the problem of bounding the quantity
The Law of Large Numbers
P(|X̄_n − μ̄_n| > ε, i.o.) would be to add up the inequalities in (19.7) over n. Since the partial sums of 1/n form a divergent sequence, a direct attack on these lines does not succeed. However, Σ_{n=1}^∞ n⁻² = π²/6 ≈ 1.64, and we can add up the subsequence of probabilities in (19.7) for n = 1, 4, 9, 16, ..., as follows.
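The arithmetic behind the method of subsequences can be checked directly. A small sketch (my own illustration, not from the text) contrasts the divergent partial sums of 1/k with the convergent sums taken over the subsequence k = n²:

```python
import math

# Partial sums of 1/k diverge (they grow like log k), but summing the
# Chebyshev bounds only over the subsequence k = n^2 gives sum 1/n^2,
# which converges to pi^2/6 = 1.6449..., the constant quoted in the text.
harmonic = sum(1.0 / k for k in range(1, 10_001))
over_squares = sum(1.0 / n**2 for n in range(1, 10_001))
```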
Proof of 19.4 By (19.7),
$$\sum_{n=1}^{\infty} P(|\bar X_{n^2} - \bar\mu_{n^2}| > \varepsilon) \le \frac{B}{\varepsilon^2}\sum_{n=1}^{\infty}\frac{1}{n^2} \le \frac{1.64B}{\varepsilon^2} < \infty, \qquad (19.11)$$
so that X̄_{n²} − μ̄_{n²} → 0 a.s., by the first Borel–Cantelli lemma. For the intermediate values of k, with n² ≤ k < (n+1)², define D_n = max_{n² ≤ k < (n+1)²} |X̄_k − X̄_{n²}|. Noting that
$$n^2(n+1)^{-2} \to 1, \qquad (19.12)$$
the Chebyshev inequality gives Σ_{n=1}^∞ P(D_n > ε) < ∞, so that D_n → 0 a.s. by the same Borel–Cantelli argument, and combining the two convergences completes the proof. ∎
A sufficient condition here is B_m = O(m⁻¹(log m)^{−1−δ}) for δ > 0. Instead of, in effect, having the autocovariances merely decline to zero, we now require their summability.
Proof of 19.5 By (19.4), Var(Ȳ_n) ≤ (B + 2B*)/n, and hence (19.11) holds in the modified form
$$\sum_{n=1}^{\infty} P(|\bar X_{n^2} - \bar\mu_{n^2}| > \varepsilon) \le \frac{1.64(B + 2B^*)}{\varepsilon^2} < \infty. \qquad (19.16)$$
Instead of (19.10), write, for n² ≤ k < (n+1)²,
$$\bar X_k - \bar X_{n^2} = \frac{1}{k}\sum_{t=n^2+1}^{k} X_t - \Big(1 - \frac{n^2}{k}\Big)\bar X_{n^2},$$
and we have, on multiplying out and taking expectations,
$$\mathrm{Var}(\bar X_k - \bar X_{n^2}) = \Big(1-\frac{n^2}{k}\Big)^{2}\mathrm{Var}(\bar X_{n^2}) + \frac{1}{k^{2}}\mathrm{Var}\Big(\sum_{t=n^2+1}^{k} X_t\Big) - \frac{2}{k}\Big(1-\frac{n^2}{k}\Big)\mathrm{Cov}\Big(\bar X_{n^2},\sum_{t=n^2+1}^{k} X_t\Big). \qquad (19.17)$$
The first term on the right-hand side is bounded by (1 − n²/k)²(B + 2B*)/n², the second by (k − n²)(B + 2B*)/k², and the third (absolutely) by 2(1 − n²/k)(k − n²)B*/(kn²). Adding together these latter terms, noting that 1 − n²/k ≤ 2/n and k − n² < 2n + 1 on the range considered, and simplifying yields
$$\mathrm{Var}(\bar X_k - \bar X_{n^2}) = O(n^{-3}), \qquad (19.18)$$
so the term in B* is summable. In place of (19.13), the Chebyshev inequality now gives
$$\sum_{n=1}^{\infty} P(D_n > \varepsilon) < \infty,$$
and the proof is completed as for 19.4. ∎
(c) Σ_{t=1}^n b_t²/a_n² → 0 as n → ∞;
then a_n⁻¹Σ_{t=1}^n (X_t − E(X_t)) → 0 in probability.

Proof Immediate from 19.7, defining X_nt = X_t/a_n and c_nt = b_t/a_n. ∎
Be careful to distinguish the constants a_n and k_n. Although both are equal to n in the sample-average case, more generally their roles are quite different. The case with k_n different from n typically arises in 'blocking' arguments, where the array coordinates are generated from successive blocks of underlying sequence coordinates. We might have k_n = [n^α] for α ∈ (0,1) ([x] denoting the largest integer not exceeding x), where the length of a block does not exceed [n^{1−α}]. For an application of this sort see §24.4. Conditions 19.8(b) and (c) together imply a_n ↑ ∞, so this does not need to be separately asserted. To form a clear idea of the role of the assumptions, it is helpful to suppose that b_t and a_n are regularly varying functions of their arguments. It is easily verified by 2.27 that the conditions are observed if b_t = t^β for β > −1 and a_n = n^{1+β}. In particular, setting b_t = 1 for all t and k_n = n, the choice a_n = n yields
$$\|\bar X_n\|_p \to 0. \qquad (19.22)$$
Choosing a_n = k_n^{1/p}b_{k_n} will automatically satisfy condition (a), and condition (b) will also hold when b_t = O(t^{1/p}). On the other hand, a case where the conditions
fail is where b₁ = 1 and, for t > 1, b_t = 2Σ_{j=1}^{t−1} b_j. In this case condition (a) imposes the requirement b_n = O(a_n), so that b_n² = O(a_n²), contradicting condition (b). The growth rate of b_t exceeds that of t^β for every β > 0.
Proof of 19.7 Uniform integrability implies that
$$\sup_{n,t} E\big(|X_{nt}/c_{nt}|\,1_{\{|X_{nt}/c_{nt}| > M\}}\big) \to 0 \text{ as } M \to \infty.$$
One may therefore find, for ε > 0, a constant B_ε < ∞ such that
$$\big\|X_{nt}1_{\{|X_{nt}| > B_\varepsilon c_{nt}\}}\big\|_p \le \varepsilon c_{nt}, \text{ for all } n \text{ and } t. \qquad (19.23)$$
Define Y_nt = X_nt 1{|X_nt| ≤ B_ε c_nt} and Z_nt = X_nt − Y_nt. Then, since E(X_nt | F_{n,t−1}) = 0,
$$X_{nt} = \big(Y_{nt} - E(Y_{nt}|\mathcal F_{n,t-1})\big) + \big(Z_{nt} - E(Z_{nt}|\mathcal F_{n,t-1})\big).$$
By the Minkowski inequality,
$$\Big\|\sum_{t=1}^{k_n} X_{nt}\Big\|_p \le \Big\|\sum_{t=1}^{k_n}\big(Y_{nt} - E(Y_{nt}|\mathcal F_{n,t-1})\big)\Big\|_p + \Big\|\sum_{t=1}^{k_n}\big(Z_{nt} - E(Z_{nt}|\mathcal F_{n,t-1})\big)\Big\|_p. \qquad (19.24)$$
Consider each of these right-hand-side terms. First,
$$\Big\|\sum_{t=1}^{k_n}\big(Y_{nt} - E(Y_{nt}|\mathcal F_{n,t-1})\big)\Big\|_p \le \Big\|\sum_{t=1}^{k_n}\big(Y_{nt} - E(Y_{nt}|\mathcal F_{n,t-1})\big)\Big\|_2 = \Big(\sum_{t=1}^{k_n} E\big(Y_{nt} - E(Y_{nt}|\mathcal F_{n,t-1})\big)^2\Big)^{1/2}. \qquad (19.25)$$
The first inequality in (19.25) is Liapunov's inequality, and the equality follows because {Y_nt − E(Y_nt|F_{n,t−1})} is a m.d., and hence orthogonal. Second,
$$\Big\|\sum_{t=1}^{k_n}\big(Z_{nt} - E(Z_{nt}|\mathcal F_{n,t-1})\big)\Big\|_p \le \sum_{t=1}^{k_n}\|Z_{nt}\|_p + \sum_{t=1}^{k_n}\|E(Z_{nt}|\mathcal F_{n,t-1})\|_p \le 2\sum_{t=1}^{k_n}\|Z_{nt}\|_p \le 2\varepsilon\sum_{t=1}^{k_n} c_{nt}. \qquad (19.26)$$
The second inequality here follows because
$$E\big|E(Z_{nt}|\mathcal F_{n,t-1})\big|^p \le E\big(E(|Z_{nt}|^p\,|\,\mathcal F_{n,t-1})\big) = E|Z_{nt}|^p,$$
from, respectively, the conditional Jensen inequality and the law of iterated expectations. The last is by (19.23). It follows by (c) that for ε > 0 there exists N_ε ≥ 1 such that, for n ≥ N_ε,
Convergence in L_p Norm
$$\sum_{t=1}^{k_n} c_{nt}^2 < \varepsilon^2 B_\varepsilon^{-2}. \qquad (19.27)$$
Putting together (19.24) with (19.25) and (19.26) shows
$$\Big\|\sum_{t=1}^{k_n} X_{nt}\Big\|_p \le \varepsilon(1 + 2K) \qquad (19.28)$$
for n ≥ N_ε, where K = limsup_n Σ_{t=1}^{k_n} c_nt < ∞. Since ε is arbitrary, this completes the proof. ∎
where β and γ can be any real constants. With k_n = n, note that the majorant side of (19.29) is bounded if γ(1 + β) < 0, independently of the value of p. This condition is automatically satisfied as an equality by setting a_n = k_n b_{k_n}, but note how the choice of a_n can accommodate different choices of k_n. None the less, in some situations condition (b) is stronger than what we know to be sufficient. For the case p = 2 it can be omitted, in addition to weakening the martingale difference assumption to uncorrelatedness, and uniform integrability to simple L₂-boundedness. Here is the array version of 19.1, with the conditions cast in the framework of 19.7 for comparability, although all they do is to ensure that the variance of the partial sums goes to zero.
19.10 Corollary If {X_nt} is a zero-mean stochastic array with E(X_nt X_ns) = 0 for t ≠ s, and
(a) {X_nt/c_nt} is uniformly L₂-bounded, and
(b) lim_{n→∞} Σ_{t=1}^{k_n} c_nt² = 0,
then E(Σ_{t=1}^{k_n} X_nt)² → 0. □
19.4 A Mixingale Weak Law

To generalize the last results from martingale differences to mixingales is not too difficult. The basic tool is the 'telescoping series' argument developed in §16.2. The array element X_nt can be decomposed into a finite sum of martingale differences, to which 19.7 can be applied, and two residual components which can be treated as negligible. The following result, from Davidson (1993a), is an extension to the heterogeneous case of a theorem due to Andrews (1988).
19.11 Theorem Let the array {X_nt, F_nt}_{t=−∞}^∞ be an L₁-mixingale with respect to a constant array {c_nt}. If
(a) {X_nt/c_nt} is uniformly integrable,
(b) limsup_{n→∞} Σ_{t=1}^{k_n} c_nt < ∞,
(c) lim_{n→∞} Σ_{t=1}^{k_n} c_nt² = 0,
where k_n is an increasing integer-valued function of n and k_n ↑ ∞, then Σ_{t=1}^{k_n} X_nt → 0 in probability. □
There is no restriction on the size here. It suffices simply for the mixingale coefficients to tend to zero. The remarks following 19.7 apply here in just the same way. In particular, if X_t is an L₁-mixingale sequence and {X_t/b_t} is uniformly integrable for positive constants {b_t}, the theorem holds for X_nt = X_t/a_n and c_nt = b_t/a_n, where a_n = Σ_{t=1}^n b_t. Theorems 14.2 and 14.4 give us the corresponding results for mixing sequences, and 17.5 and 17.6 for NED processes. It is sufficient for, say, X_nt to be L_r-bounded for r > 1, and L_p-NED, for p ≥ 1, on an α-mixing process. Again, no size restrictions need to be specified. Uniform integrability of {X_nt/c_nt} will obtain in those cases where ‖X_nt‖_r is finite for r > 1 and each t, and the NED constants likewise satisfy d_nt ≪ ‖X_nt‖_r. A simple lemma is required for the proof:

19.12 Lemma If the array {X_nt/c_nt} is uniformly integrable, so is the array {E_{t−j}X_nt/c_nt} for j > 0.

Proof By the necessity part of 12.9, for any ε > 0 there exists δ > 0 such that
$$\sup_{n,t}\int_E |X_{nt}/c_{nt}|\,dP < \varepsilon, \qquad (19.30)$$
where the inner supremum is taken over all E ∈ F satisfying P(E) ≤ δ. Since F_{n,t−j} ⊆ F, (19.30) also holds when the supremum is taken over E ∈ F_{n,t−j}. For any such E,
$$\int_E |E_{t-j}X_{nt}/c_{nt}|\,dP \le \int_E E_{t-j}|X_{nt}/c_{nt}|\,dP = \int_E |X_{nt}/c_{nt}|\,dP, \qquad (19.31)$$
by definition of E_{t−j}(·), and the conditional Jensen inequality (10.18). We may accordingly say that, for ε > 0, there exists δ > 0 such that
$$\sup_{n,t}\int_E |E_{t-j}X_{nt}/c_{nt}|\,dP < \varepsilon, \qquad (19.32)$$
taking the inner supremum over E ∈ F_{n,t−j} satisfying P(E) ≤ δ. Since E_{t−j}X_nt is F_{n,t−j}-measurable, uniform integrability holds by the sufficiency part of 12.9. ∎

Proof of 19.11 Fix an integer j and let
$$Y_{nj} = \sum_{t=1}^{k_n}\big(E_{t+j}X_{nt} - E_{t+j-1}X_{nt}\big). \qquad (19.33)$$
The summands {E_{t+j}X_nt − E_{t+j−1}X_nt, F_{n,t+j}}_{t=1}^{k_n} form a martingale difference array, for each j. Since the array {(E_{t+j}X_nt − E_{t+j−1}X_nt)/c_nt} is uniformly integrable by 19.12, it follows by (a), (b), and (c) and 19.7 that Y_nj → 0 in probability. We now express Σ_{t=1}^{k_n} X_nt as a telescoping sum. For any M ≥ 1,
$$\sum_{j=1-M}^{M-1} Y_{nj} = \sum_{t=1}^{k_n} E_{t+M-1}X_{nt} - \sum_{t=1}^{k_n} E_{t-M}X_{nt}, \qquad (19.34)$$
and hence
$$\sum_{t=1}^{k_n} X_{nt} = \sum_{j=1-M}^{M-1} Y_{nj} + \sum_{t=1}^{k_n}\big(X_{nt} - E_{t+M-1}X_{nt}\big) + \sum_{t=1}^{k_n} E_{t-M}X_{nt}. \qquad (19.35)$$
The triangle inequality and the L₁-mixingale property now give
$$E\Big|\sum_{t=1}^{k_n} X_{nt}\Big| \le \sum_{j=1-M}^{M-1} E|Y_{nj}| + \sum_{t=1}^{k_n} E|X_{nt} - E_{t+M-1}X_{nt}| + \sum_{t=1}^{k_n} E|E_{t-M}X_{nt}| \le \sum_{j=1-M}^{M-1} E|Y_{nj}| + 2\zeta_M\sum_{t=1}^{k_n} c_{nt}. \qquad (19.36)$$
According to the assumptions, the second member on the right-hand side of (19.36) can be made smaller than ε/2 by taking M large enough, say
M ≥ M_ε. By choosing n large enough, the sum of 2M − 1 terms on the right-hand side of (19.36) can be made smaller than ε/2 for any finite M, by (19.33). So, by choosing M ≥ M_ε, we have E|Σ_{t=1}^{k_n} X_nt| < ε when n is large enough. The theorem is proved since ε is arbitrary. ∎

A comparison with the results of §19.1 is instructive. In an L₂-bounded process, the L₂-mixingale property would be a stronger form of dependence restriction than the limiting uncorrelatedness specified in 19.2, just as the martingale property is stronger than simple uncorrelatedness. The value of the present result is the substantial weakening of the moment conditions.
19.5 Approximable Processes

There remains the possibility of cases in which the mixingale property is not easily established, perhaps because of a nonlinear transformation of an L_p-NED process which cannot be shown to preserve the requisite moments for application of the results in §17.3. In such cases the theory of §17.4 may yield a result. On the assumption that the approximator sequence is mixing, so that its mean deviations converge in probability by 19.11, it will be sufficient to show that this implies the convergence of the approximable sequence. This is the object of the following theorem.
19.13 Theorem Suppose that, for each m ∈ ℕ, {h_nt^m} is a stochastic array, and the centred array {h_nt^m − E(h_nt^m)} satisfies the conditions of 19.11. If the array {X_nt} is L₁-approximable by {h_nt^m} with respect to a constant array {d_nt}, and limsup_{n→∞} Σ_{t=1}^{k_n} d_nt ≤ B < ∞, then Σ_{t=1}^{k_n} X_nt → 0 in probability. □

Establishing the conditions of the theorem will typically be achieved using 17.21, by showing that X_nt is L_r-bounded for r > 1, and approximable in probability on h_nt^m for each m, the latter being finite-order lag functions of a mixing array of any size.
Proof Since
$$\Big|\sum_{t=1}^{k_n} X_{nt}\Big| \le \Big|\sum_{t=1}^{k_n}\big(X_{nt} - h_{nt}^m\big)\Big| + \Big|\sum_{t=1}^{k_n}\big(h_{nt}^m - E(h_{nt}^m)\big)\Big| + \Big|\sum_{t=1}^{k_n} E(h_{nt}^m)\Big| \qquad (19.37)$$
by the triangle inequality, we have for ε > 0
$$P\Big(\Big|\sum_{t=1}^{k_n} X_{nt}\Big| > \varepsilon\Big) \le P\Big(\Big|\sum_{t=1}^{k_n}\big(X_{nt} - h_{nt}^m\big)\Big| > \frac{\varepsilon}{3}\Big) + P\Big(\Big|\sum_{t=1}^{k_n}\big(h_{nt}^m - E(h_{nt}^m)\big)\Big| > \frac{\varepsilon}{3}\Big) + P\Big(\Big|\sum_{t=1}^{k_n} E(h_{nt}^m)\Big| > \frac{\varepsilon}{3}\Big) \qquad (19.38)$$
by subadditivity, since the event whose probability is on the minorant side implies at least one of those on the majorant. By the Markov inequality,
$$P\Big(\Big|\sum_{t=1}^{k_n}\big(X_{nt} - h_{nt}^m\big)\Big| > \frac{\varepsilon}{3}\Big) \le \frac{3}{\varepsilon}\sum_{t=1}^{k_n} E|X_{nt} - h_{nt}^m| \le \frac{3\nu_m}{\varepsilon}\sum_{t=1}^{k_n} d_{nt}. \qquad (19.39)$$
P(|Σ_{t=1}^{k_n} E(h_nt^m)| > ε/3) is equal to either 0 or 1, according to whether the non-stochastic inequality holds or does not hold. By the fact that E(X_nt) = 0 and L₁-approximability,
$$|E(h_{nt}^m)| = |E(X_{nt}) - E(h_{nt}^m)| \le E|X_{nt} - h_{nt}^m| \le d_{nt}\nu_m, \qquad (19.40)$$
and hence
$$\sum_{t=1}^{k_n}|E(h_{nt}^m)| \le \nu_m\sum_{t=1}^{k_n} d_{nt} \le B\nu_m. \qquad (19.41)$$
We therefore find that, for each m ∈ ℕ,
$$\limsup_{n\to\infty} P\Big(\Big|\sum_{t=1}^{k_n} X_{nt}\Big| > \varepsilon\Big) \le \frac{3B\nu_m}{\varepsilon} + \limsup_{n\to\infty} P\Big(\Big|\sum_{t=1}^{k_n}\big(h_{nt}^m - E(h_{nt}^m)\big)\Big| > \frac{\varepsilon}{3}\Big) + 1_{\{B\nu_m > \varepsilon/3\}} = \frac{3B\nu_m}{\varepsilon} + 1_{\{B\nu_m > \varepsilon/3\}} \qquad (19.42)$$
by the assumption that h_nt^m satisfies the WLLN for each m ∈ ℕ. The proof is completed by letting m → ∞. ∎
20 The Strong Law of Large Numbers

20.1 Technical Tricks for Proving LLNs
In this chapter we explore the strong law under a range of different assumptions, from independent sequences to near-epoch dependent functions of mixing processes. Many of the proofs are based on one or more of a collection of ingenious technical lemmas, and we begin by studying these results. The reader has the option of skipping ahead to §20.2, and referring back as necessary, but there is something to be said for forming an impression of the method of attack at the outset. These theorems are found in several different versions in the literature, usually in a form adapted to the particular problem in hand. Here we will take note of the minimal conditions needed to make each trick work. We start with the basic convergence result that shows why maximal inequalities (for example, 15.14, 15.15, 16.9, and 16.11) are important.
20.1 Convergence lemma Let {X_t}₁^∞ be a stochastic sequence on a probability space (Ω, F, P), and let S_n = Σ_{t=1}^n X_t and S₀ = 0. For ω ∈ Ω, let
$$M(\omega) = \inf_m \sup_{j>m} |S_j(\omega) - S_m(\omega)|. \qquad (20.1)$$
If P(M > ε) = 0 for all ε > 0, then S_n → S a.s.

Proof By the Cauchy criterion for convergence, the realization {S_n(ω)} converges if we can find an m such that |S_j − S_m| ≤ ε for all j > m, for all ε > 0; in other words, it converges if M(ω) < ε, for all ε > 0. ∎
This result is usually applied in the following way.
20.2 Corollary Let {c_t}₁^∞ be a sequence of constants, and suppose there exists p > 0 such that, for every m ≥ 0 and n > m, and every ε > 0,
$$P\Big(\max_{m<j\le n}|S_j - S_m| > \varepsilon\Big) \le \frac{K}{\varepsilon^p}\sum_{t=m+1}^{n} c_t, \qquad (20.2)$$
where K is a finite constant. If {c_t} is summable, then S_n → S a.s.

Proof Since {c_t} is summable, lim_{m→∞} Σ_{t=m+1}^∞ c_t = 0. Let M be the r.v. in (20.1). By definition, M ≤ sup_{j>m}|S_j − S_m| for any m, and hence, using (20.2) and 2.25,
$$P(M > \varepsilon) \le \lim_{m\to\infty} P\Big(\sup_{j>m}|S_j - S_m| > \varepsilon\Big) \le \lim_{m\to\infty}\frac{K}{\varepsilon^p}\sum_{t=m+1}^{\infty} c_t = 0.$$
The result follows by 20.1. ∎
Hence, for ω ∈ C with P(C) = 1, there exists n₀(ω) such that X_t(ω) = Y_t(ω) for all t ≥ n₀(ω), and
$$\sum_{t=1}^{n}\big(X_t(\omega) - Y_t(\omega)\big) = \sum_{t=1}^{n_0(\omega)}\big(X_t(\omega) - Y_t(\omega)\big), \text{ for all } n \ge n_0(\omega),$$
and the sum converges, for all ω ∈ C. ∎
The equivalent sequences concept is often put to use by means of the following theorem.
20.4 Theorem Let {X_t}₁^∞ be a zero-mean random sequence satisfying
$$\sum_{t=1}^{\infty}\frac{E|X_t|^p}{a_t^p} < \infty. \qquad (20.17)$$
$$|E(Y_t)| = |E(X_t) - E(X_t1_{\{|X_t|>a_t\}})| = |E(X_t1_{\{|X_t|>a_t\}})| \le E(|X_t|1_{\{|X_t|>a_t\}}) \le 2E(|X_t|^p)/a_t^{p-1}. \qquad (20.19)$$
The second equality in (20.19) is again because E(X_t) = 0. The first inequality is an application of the modulus and triangle inequalities in succession, and the last one uses (20.9). By similar arguments, except that here the c_r inequality is used in the second line, we have
$$E|Y_t - E(Y_t)|^r \le 2^{r-1}\big(E|Y_t|^r + |E(Y_t)|^r\big) \le 2^r E(|X_t|^p)\,a_t^{r-p}, \text{ for } p \le r. \qquad (20.20)$$
The theorem follows on summing over t as before. ∎
Clearly, 20.5 could be adapted to this case if desired, but that extension will not be needed for our results. The last extension is relatively modest, but permits summability conditions for norms to be applied.
20.7 Corollary (20.6), (20.7), (20.8), (20.17), and (20.18) all continue to hold if (20.5) is replaced by
$$\sum_{t=1}^{\infty}\big(E(|X_t|^p)/a_t^p\big)^{1/q} < \infty \qquad (20.21)$$
for any q ≥ 1.

Proof The modified forms of (20.9), and of (20.19) and (20.20) (say), are
$$P(|X_t| > a_t) \le P(|X_t| > a_t)^{1/q} \le \big(E(|X_t|^p)/a_t^p\big)^{1/q}, \qquad (20.22)$$
$$|E(Y_t)| \le |E(Y_t)|^{1/q} \le a_t^{1/q}\big(E(|X_t|^p)/a_t^p\big)^{1/q}, \qquad (20.23)$$
$$E|Y_t - E(Y_t)|^r \le \big(E|Y_t - E(Y_t)|^r\big)^{1/q} \le 2^{r/q}a_t^{r/q}\big(E(|X_t|^p)/a_t^p\big)^{1/q}, \qquad (20.24)$$
where in each case the first inequality is because the left-hand-side member does not exceed 1. ∎

For example, by choosing p = q the condition that the sequence {‖X_t/a_t‖_p} is summable is seen to be sufficient for 20.4 and 20.6.
20.2 The Case of Independence

The classic results on strong convergence are for the case of independent sequences. The following is the 'three series theorem' of Kolmogorov:
20.8 Three series theorem Let {X_t} be an independent sequence, and S_n = Σ_{t=1}^n X_t. S_n converges a.s. if and only if the following conditions hold for some fixed α > 0:
$$\sum_{t=1}^{\infty} P(|X_t| > \alpha) < \infty, \qquad (20.25)$$
$$\sum_{t=1}^{\infty} E\big(1_{\{|X_t|\le\alpha\}}X_t\big) \text{ converges}, \qquad (20.26)$$
$$\sum_{t=1}^{\infty}\mathrm{Var}\big(1_{\{|X_t|\le\alpha\}}X_t\big) < \infty. \qquad (20.27)$$
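As a worked instance (my own example, not from the text), take the independent sequence X_t = ±1/t with equal probability. With α = 1 the three series can be evaluated in closed form, and a partial-sum check confirms that the variance series converges:

```python
import math

# Three-series check for X_t = +/- 1/t (independent signs), with alpha = 1:
#   first series:  P(|X_t| > 1) = 0 for every t, so it is zero;
#   second series: E(truncated X_t) = 0 for every t, so it converges;
#   third series:  Var(truncated X_t) = 1/t^2, whose sum is pi^2/6 < infinity.
prob_tail = sum(0.0 for t in range(1, 1001))          # every term is exactly 0
var_partial = sum(1.0 / t**2 for t in range(1, 1001))
```

All three conditions hold, so the theorem guarantees that the random series Σ ±1/t converges almost surely.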
$$a_n^{-1}\sum_{t=1}^{n} E|X_t| \to 0. \qquad (20.39)$$
For such an {a_n}, the sequence has got to be zero or very close to it most of the time. In fact, there is a trivially direct proof of convergence. Applying the monotone convergence theorem (4.9),
$$E\Big(\lim_{n\to\infty} a_n^{-1}|S_n|\Big) \le E\Big(\lim_{n\to\infty} a_n^{-1}\sum_{t=1}^{n}|X_t|\Big) = \lim_{n\to\infty} a_n^{-1}\sum_{t=1}^{n} E|X_t| = 0. \qquad (20.40)$$
For any random variable X, E|X| = 0 if and only if X = 0 a.s. Nothing more is needed to show that S_n/a_n converges, regardless of other conditions. Thus, having latitude in the value of p for which the theorem may hold is really a matter of being able to trade off the existence of absolute moments against the rate of damping necessary to make them summable. We may meet interesting cases in which (20.38) holds for p < 2 only rarely, but since this extension is available at small extra cost in complexity, it makes sense to take advantage of it.
20.11 Theorem If {X_t, F_t}₁^∞ is a m.d. sequence satisfying (20.38) for 1 ≤ p ≤ 2, then S_n/a_n → 0 a.s.
Proof Let Y_t = 1{|X_t| ≤ a_t}X_t, and note that {X_t} and {Y_t} are equivalent under (20.38), by 20.4. Y_t is also F_t-measurable, and hence the centred sequence {Z_t, F_t}, where Z_t = Y_t − E(Y_t|F_{t−1}), is a m.d. Now,
$$E(Z_t^2) = E\big(E(Z_t^2|\mathcal F_{t-1})\big) = E(Y_t^2) - E\big(E(Y_t|\mathcal F_{t-1})^2\big). \qquad (20.41)$$
According to 20.4 with r = 2, (20.38) implies
$$\sum_{t=1}^{\infty} E(Z_t^2)/a_t^2 < \infty,$$
since E(Z_t²) ≤ E(Y_t²), by (20.41); hence Σ_{t=1}^∞ Z_t/a_t converges a.s., by 15.13(i). Also, (20.38) implies
$$\sum_{t=1}^{\infty} E|E(Y_t|\mathcal F_{t-1})|/a_t < \infty. \qquad (20.44)$$
According to 20.5, (20.44) implies that Σ_{t=1}^∞ |E(Y_t|F_{t−1})|/a_t < ∞ a.s. Absolute convergence of a series implies convergence by 2.24, so we may say that Σ_{t=1}^n Z_t/a_t → S₁ a.s. and Σ_{t=1}^n E(Y_t|F_{t−1})/a_t → S₂ a.s., and so Σ_{t=1}^n Y_t/a_t → S₁ + S₂ a.s. Hence Σ_{t=1}^n X_t/a_t converges a.s., by 20.3 and the equivalence of {X_t} and {Y_t}, and S_n/a_n → 0 a.s. now follows by the Kronecker lemma. ∎
Notice that in this proof there are no short cuts through the martingale convergence theorem. While we know that Σ_{t=1}^n X_t/a_t is a martingale, the problem is to establish that it is uniformly L₁-bounded, given only information about the joint distribution of {X_t}, in the form of (20.38). We have to go by way of a result for p = 2 to exploit orthogonality, which is where the truncation arguments come in handy.
20.4 Conditional Variances and Random Weighting

A feature of martingale theory exploited in the last theorem is the possibility of relating convergence to the behaviour of the sequences of one-step-ahead conditional moments; we now extend this principle to the conditional variances E(X_t²|F_{t−1}). The elegant results of this section contain those such as 20.10 and 20.11.

The conditional variance of a centred coordinate is the variance of the innovation, that is, of X_t − E(X_t|F_{t−1}), and in some circumstances it may be more natural to place restrictions on the behaviour of the innovations than on the original sequence. In regression models, for example, the innovations may correspond to the regression disturbances. Moreover, the fact that the conditional moments are F_{t−1}-measurable random variables, so that any constraint upon them is probabilistic, permits a generalization of the concept of convergence, following the results of §15.4; our confidence in the summability of the weighted conditional variances translates into a probability that the sequence converges, in the
manner of the following theorem. A nice refinement is that the constant weight sequence {a_t} can be replaced by a sequence of F_{t−1}-measurable random weights.

20.12 Theorem Let {X_t, F_t}₁^∞ be a m.d. sequence, {W_t}₁^∞ a non-decreasing sequence of positive, F_{t−1}-measurable r.v.s, and S_n = Σ_{t=1}^n X_t. Then
$$P(S_n/W_n \to 0) \ge P\Big(\Big\{\sum_{t=1}^{\infty} E(X_t^2|\mathcal F_{t-1})/W_t^2 < \infty\Big\} \cap \{W_t \uparrow \infty\}\Big). \qquad (20.45) \ \square$$

The last statement is perhaps a little opaque, but roughly translated it says that the probability of convergence, of the event {S_n/W_n → 0}, is not less than that of the intersection of the two other events in (20.45). In particular, when one probability is 1, so is the other.
Proof If {X_t} is a m.d. sequence so is {X_t/W_t}, since W_t is F_{t−1}-measurable, and T_n = Σ_{t=1}^n X_t/W_t is a martingale. For ω ∈ Ω, if T_n(ω) → T(ω) and W_n(ω) ↑ ∞, then S_n(ω)/W_n(ω) → 0 by Kronecker's lemma. Applying 15.11 completes the proof. ∎
See how this result contains 20.10, corresponding to the case of a fixed, divergent weight sequence and a.s. summability. As before, we now weaken the summability conditions from conditional variances to pth absolute moments for 1 ≤ p < 2. However, to exploit 20.5 outside the almost sure case requires a modification of the equivalent sequences argument (20.3), as follows.
20.13 Theorem If {X_t} and {Y_t} are sequences of F_t-measurable r.v.s, then
$$P\Big(\sum_{t=1}^{\infty}(X_t - Y_t) \text{ converges}\Big) \ge P\Big(\sum_{t=1}^{\infty} P(X_t \ne Y_t\,|\,\mathcal F_{t-1}) < \infty\Big). \ \square$$

Proof It follows by 15.11, applied to the indicator sequence {1{X_t ≠ Y_t}}, that Σ_{t=1}^∞ 1{X_t ≠ Y_t} < ∞ on the event C = {Σ_{t=1}^∞ P(X_t ≠ Y_t|F_{t−1}) < ∞}, except for an ω-set of probability zero. For such ω there is therefore an n₀(ω) with X_t(ω) = Y_t(ω) for all t ≥ n₀(ω), so that Σ_{t=1}^∞ (X_t − Y_t) has only finitely many non-zero terms, and converges. ∎

This device extends 20.12 from conditional variances to pth absolute moments for 1 ≤ p < 2, by truncation: with Y_t = 1{|X_t| ≤ W_t}X_t and Z_t = Y_t − E(Y_t|F_{t−1}), applications of 20.5 give Σ Z_t/W_t → S₁ a.s. and Σ E(Y_t|F_{t−1})/W_t → S₂ a.s. on the relevant event, whence Σ Y_t/W_t → S₁ + S₂, and S_n/W_n → 0 follows from 20.13 and Kronecker's lemma.
20.5 Two Strong Laws for Mixingales

The martingale difference assumption is specialized, and the last results are not sufficient to support a general treatment of dependent processes, although they are the central prop. The key to extending them, as in the weak law case, is the mixingale concept. In this section we contrast two approaches to proving mixingale strong convergence. The first applies a straightforward generalization of the methods introduced by McLeish (1975a); see also Hansen (1991, 1992a) for related results. We have two versions of the theorem to choose from, a milder constraint on the dependence being available in return for the existence of second moments.

20.15 Theorem Let the sequence {X_t, F_t}_{−∞}^∞ be an L_p-mixingale with respect to constants {c_t}, either
(i) for p = 2, with mixingale size −1/2, or
(ii) for 1 < p < 2, with mixingale size −1.
If Σ_{t=1}^∞ c_t^p < ∞ then S_n → S a.s. □
Proof We have the maximal inequality
$$E\Big(\max_{m<j\le n}|S_j - S_m|^p\Big) \le K\sum_{t=m+1}^{n} c_t^p \qquad (20.57)$$
(see §16.3), and
$$P\Big(\max_{m<j\le n}|S_j - S_m| > \varepsilon\Big) \le \varepsilon^{-p}E\Big(\max_{m<j\le n}|S_j - S_m|^p\Big) \qquad (20.58)$$
by the Markov inequality. Inequalities (20.57) and (20.58) combine to yield (20.2), and the convergence lemma 20.2 now yields the result. ∎

We can add the usual corollary from Kronecker's lemma.
20.16 Corollary Let {X_t/a_t, F_t}_{−∞}^∞ satisfy either (i) or (ii) of 20.15 with respect to constants {c_t/a_t}, for a positive sequence {a_t} with a_t ↑ ∞. If
$$\sum_{t=1}^{\infty}(c_t/a_t)^p < \infty,$$
then S_n/a_n → 0 a.s. □
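Kronecker's lemma, invoked in this corollary and repeatedly below, is easy to see in action. A deterministic sketch of my own: with x_t = cos t, the series Σ x_t/t converges (by Dirichlet's test, since the partial sums of cos t are uniformly bounded), so the scaled partial sums n⁻¹Σ_{t≤n} x_t must shrink to zero:

```python
import math

# Kronecker's lemma: if a_t increases to infinity and sum x_t/a_t converges,
# then (1/a_n) * sum_{t<=n} x_t -> 0.  Here x_t = cos(t) and a_t = t; the
# partial sums of cos(t) are bounded by 1/sin(1/2), so the scaled sums decay
# at the rate 1/n.
def scaled_sum(n):
    return sum(math.cos(t) for t in range(1, n + 1)) / n

vals = [abs(scaled_sum(n)) for n in (10, 100, 1000, 10_000)]
```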
The second result exploits a novel and remarkably powerful argument due to R. M. de Jong (1992).
20.17 Theorem Let {X_t, F_t} be an L_r-bounded, L₁-mixingale with respect to constants {c_t}, for r > 1; let {a_t} and {B_t} be positive constant sequences and {M_t} a positive integer sequence, with a_n ↑ ∞. If, for all ε > 0,
$$\sum_{n=1}^{\infty} M_n\exp\Big\{\frac{-\varepsilon^2 a_n^2}{32M_n^2\sum_{t=1}^{n}B_t^2}\Big\} < \infty, \qquad (20.60)$$
$$\sum_{t=1}^{\infty} B_t^{1-r}E|X_t|^r/a_t < \infty, \qquad (20.61)$$
$$\sum_{t=1}^{\infty}(c_t/a_t)\zeta_{M_t} < \infty, \qquad (20.62)$$
where {ζ_m}_{m=0}^∞ are the mixingale coefficients, then S_n/a_n → 0 a.s. □
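The exponential condition (20.60) is easiest to appreciate in a simple parameterization. Assuming constant M_t = M and B_t = B with a_n = n (my own choice of constants, not Davidson's), the summand becomes a geometric tail, summable for every ε > 0:

```python
import math

# With M_t = M and B_t = B constant and a_n = n, sum_{t<=n} B_t^2 = n*B^2 and
# the summand of (20.60) is M * exp(-eps^2 * n / (32 M^2 B^2)): a geometric
# series in n, hence summable however small eps is.
M, B, eps = 2, 1.0, 0.1
c = eps**2 / (32 * M**2 * B**2)
partial = sum(M * math.exp(-c * n) for n in range(1, 200_001))
closed_form_bound = M * math.exp(-c) / (1.0 - math.exp(-c))
```

The truncated sum sits just below the closed-form geometric total, confirming summability.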
Here {B_t} and {M_t} are chosen freely to satisfy the conditions, given {a_t} and {c_t}, which suggests a considerable amount of flexibility in application. The sequence {B_t} will be used to define a truncation of {X_t}, the role which was played by {a_t} in 20.11. The most interesting of the conditions is (20.62), which explicitly trades off the rate of decrease of the mixingale numbers with that of the sequence {c_t/a_t}. This approach is in contrast with the McLeish method of defining separate summability conditions for the moments and mixingale numbers, as detailed in §16.3.
Proof Writing 1_{B_t} for 1{|X_t| ≤ B_t}, start by noting that E_{t+j}X_t = E_{t+j}1_{B_t}X_t + E_{t+j}(1 − 1_{B_t})X_t. Hence, we have the identity
$$X_t = \big(E_{t+M_t-1}1_{B_t}X_t - E_{t-M_t}1_{B_t}X_t\big) + \big(E_{t+M_t-1}(1-1_{B_t})X_t - E_{t-M_t}(1-1_{B_t})X_t\big) + \big(X_t - E_{t+M_t-1}X_t\big) + E_{t-M_t}X_t, \qquad (20.63)$$
and, by the usual 'telescoping sum' argument,
$$E_{t+M_t-1}1_{B_t}X_t - E_{t-M_t}1_{B_t}X_t = \sum_{j=1-M_t}^{M_t-1} Z_{jt}, \qquad (20.64)$$
where Z_{jt} = E_{t+j}1_{B_t}X_t − E_{t+j−1}1_{B_t}X_t is a m.d. sequence. Note that |Z_{jt}| ≤ 2B_t a.s., by a double application of 10.13(ii). Summing yields
$$\sum_{t=1}^{n} X_t = \sum_{t=1}^{n}\sum_{j=1-M_t}^{M_t-1} Z_{jt} + \sum_{t=1}^{n} E_{t+M_t-1}(1-1_{B_t})X_t - \sum_{t=1}^{n} E_{t-M_t}(1-1_{B_t})X_t + \sum_{t=1}^{n}\big(X_t - E_{t+M_t-1}X_t\big) + \sum_{t=1}^{n} E_{t-M_t}X_t$$
$$= S_{1n} + S_{2n} + S_{3n} + S_{4n} + S_{5n}, \qquad (20.65)$$
say. Consider S_{1n}, taking {M_t} to be non-decreasing with M_n = max_{t≤n} M_t. Reordering the sum over j, and applying subadditivity and then the exponential inequality for bounded martingale differences, with |Z_{jt}| ≤ 2B_t,
$$P(|S_{1n}| > a_n\varepsilon) \le \sum_{j=1-M_n}^{M_n-1} P\Big(\Big|\sum_{t=1}^{n} Z_{jt}\Big| > \frac{a_n\varepsilon}{2M_n}\Big) \le 4M_n\exp\Big\{\frac{-\varepsilon^2 a_n^2}{32M_n^2\sum_{t=1}^{n}B_t^2}\Big\}. \qquad (20.67)$$
Under (20.60), these probabilities are summable over n and so S_{1n}/a_n → 0 a.s. by the first Borel–Cantelli lemma. Now let {Y_t} be any integrable sequence and define
n
,7,1 -
X Ytlat.
(20.68)
,-1
By the Markov inequality, P max IS1 m<jxn
f by an application of 20.2, and hence Sllan remaining result to each of the lemma. We apply tltis terms. For Szn. put l', --t!-
=
BJ),
an d note that
E r+M,-lX(1 1 -
n
?;
l
< A7#1,-rF1 x,$r/cr,
F7FIF,1lat < F7A'1 (1 114Xtl/J/ -
(20.71)
>1
>1
>1
B
r. S5n is dealt with in exusing the fact that IXf(1 1 t) IBt I f lXt (1 1)/#/l t s%n and %n,put successively Ff Xt Et-t-jxt and Ff actly the same way. For -
-
=
=
-
assumption,
Et-utxt, and by the mixingale
n
n
77FIF,1lat
f
77 ctlatjLut.
(20.72)
>1
>1
The proof is completed by noting that the majorant terms of are bounded in the limit by assumption. .
(20.71)and (20.72)
The conditions of 20.17 are rather difficult to apply and interpret. We will restrict them very slightly, to derive a simple summability condition which can be compared directly with 20.15.

20.18 Corollary Let {X_t, F_t} be an L_r-bounded, L₁-mixingale of size −φ₀ with respect to constants {c_t}. If {c_t} and {a_t} are positive, regularly varying sequences, a_n ↑ ∞, and
$$\sum_{t=1}^{\infty}(c_t/a_t)^{\zeta_0} < \infty, \qquad (20.73)$$
where
$$\zeta_0 = \frac{2r\varphi_0 + 2(r-1)}{(1+r)\varphi_0 + 2(r-1)}, \qquad (20.74)$$
then S_n/a_n → 0 a.s.
Proof Define o_t = (log t)^{−1−δ} for δ > 0. This is slowly varying at infinity by 2.28, and the sequence {o_t/t} is summable by 2.31. Apply the conditions of 20.17 with the added stipulation that {B_t} and {M_t} are regularly varying, increasing sequences, and so consider the conditions for summability of a series of the form {U₁(n)exp{−U₂(n)}}, for U₂ > 0. Since Σ_n o_n/n converges, summability follows from (o_n/n)⁻¹U₁(n)exp{−U₂(n)} → 0. Taking logarithms, this is equivalent to
$$\log U_1(n) + \log n + (1+\delta)\log\log n - U_2(n) \to -\infty. \qquad (20.75)$$
Writing U₁(n) = n^{μ₁}L₁(n) and U₂(n) = n^{μ₂}L₂(n), where μ₁ and μ₂ are non-negative constants and L₁(n) and L₂(n) are slowly varying, this condition has the form
$$(1 + \mu_1)\log n + \log(L_1(n)) + (1+\delta)\log\log n - n^{\mu_2}L_2(n) \to -\infty. \qquad (20.76)$$
The terms loglog n and log(L₁(n)) can be neglected here. Put μ₂ = 0 and L₂(n) = L₂₀(log n)^{1+δ}, and the condition reduces to
$$\big(1 + \mu_1 - L_{20}(\log n)^{\delta}\big)\log n \to -\infty, \qquad (20.77)$$
which holds for all μ₁, for any L₂₀ > 0 and δ > 0. Condition (20.60) is therefore satisfied (recalling that {B_t} is monotone) if
$$nM_n^2B_n^2/a_n^2 \ll (\log n)^{-1-\delta}. \qquad (20.78)$$
Similarly, conditions (20.61) and (20.62) are satisfied if, respectively,
$$B_t^{1-r}E|X_t|^r/a_t \ll B_t^{1-r}c_t^r/a_t \ll o_t/t, \qquad (20.79)$$
$$(c_t/a_t)M_t^{-\varphi_0} \ll o_t/t. \qquad (20.80)$$
We can identify the bounding cases of B_t and M_t by replacing the second order-of-magnitude inequality sign in (20.79), and that in (20.80), by equalities, leaving the required scaling constants implicit. Solving for M_t and B_t in this way, substituting into (20.78), and simplifying yields the condition
$$(c_t/a_t)^{\zeta_0} \ll o_t/t, \qquad (20.81)$$
where ζ₀ = [2rφ₀ + 2(r−1)]/[(1+r)φ₀ + 2(r−1)]. This is sufficient for (20.60), (20.61), and (20.62) to hold. Since c_t and a_t are specified to be regularly varying, there exist non-negative constants μ₃ and μ₄, and slowly varying functions L₃ and L₄, such that c_t = t^{μ₃}L₃(t) and a_t = t^{μ₄}L₄(t). The assumption that {(c_t/a_t)^ζ} is summable implies that (μ₃ − μ₄)ζ ≤ −1. But φ > φ₀ implies ζ > ζ₀, so that (μ₃ − μ₄)ζ < (μ₃ − μ₄)ζ₀ ≤ −1, which in turn implies (20.81). This completes the proof. ∎
Noting that 1 ≤ ζ₀ ≤ 2, the condition in (20.73) may be compared with (20.59). With φ₀ = 1 we obtain ζ₀ = 2(2r−1)/(3r−1), which does not exceed r in the relevant range, taking values between 1 when r = 1 and 6/5 when r = 2. Square-summability of c_t/a_t is sufficient only in the limit as both φ₀ → ∞ and r → ∞. Thus, this theorem does not contain 20.16. On the other hand, in the cases where {c_t} is uniformly bounded and a_t = t, we need only ζ₀ > 1, so that any r > 1 and φ₀ > 0 will serve. These dependence restrictions are on a par with those of the L₁ convergence law of 19.10, and a striking improvement on 20.16. The case r = 1 is not permitted for sample averages, but is compatible with a_t = t(log t)^{1+δ} for δ > 0. In other words, the theorem shows that
$$\big(n(\log n)^{1+\delta}\big)^{-1}\sum_{t=1}^{n} X_t \to 0 \text{ a.s.} \qquad (20.82)$$
This amounts to saying that the sequence of sample means is almost surely slowly varying as n → ∞; it could diverge, but no faster than a power of log n.
20.6 Near-Epoch Dependent and Mixing Processes

In view of the last results, there are two possible approaches to the NED case. It turns out that neither approach dominates the other in terms of permissible conditions. We begin with the simplest of the arguments, the straightforward extension of 20.18.
-
*
11(.:)g,rl/hllti < 77 >1 for q
>
p in the a-rnixing case q
(20.83)
x
-
p in the
1) lqa lqb + lq +qlb jqxs + lq 1) ( (1 -
( t
Sxt n t= hen z-1E''
=
min
,
-
w)-fJ.-.
-
Nvhere
case) g-nzixing + +
1) gc
(20.84)
,
0.
Proof
J(1 1/)) with By 17.5, (Xt- jttl is a fal-mixingale of size respect to constants (c,), with ct
(x)
-->
(x)
-=.
-
20.20 Yheorem For real numbers b, p and r, let a squence f-vl'!x with means be JO-NED of size with constants dt 1 -
n Then an-1Xr=1( Xt ;1, b (i) p 1, 1 (ii) b (iii) b p (iv) b = 1, 1 =
=
=
,
-
=
< =
0
2, r > 2, p < 2, r 2, r k 2, p < 2, r
Proof By 17.5, conditions
(20.85)
=.
in each of the following cases'. 2)., (Ji) is (x-mixing of size > p, ( F,) is a-mixing of size 1),. f Ff J is (-tnixingof size 1). 2 p, f F,) is (-mixingof size
(i)-(iv)are
-r/(r
-
-#r/(r
-p);
-r/2(r
-
-r/(r-
all sufficient for
l Xt
-
P,,)/Jr,
?h1 to be
The Strong f-aw of fzzr'd Numbers
325
where 5t c(F,, s < f). The mixingale constants are of size II-Y, p.fllrl/c, 11*-, wIIr/c,.The theorem follows by 20.16. w
an fv-mixingale ctlat (t maxtlf,
=
-,
=
-
-
-z1,
As an example, let X_t possess moments of all orders and be L₂-NED of size −1 on an α-mixing process of arbitrarily large size (letting a_n = n). Square-summability of the terms ‖X_t − μ_t‖_r/a_t is sufficient by (20.85). The same numbers, on putting q = 2 in (20.84), yield a value of ζ₀ which is not far from requiring summability of the L₂-norms. However, this theorem requires L_r-boundedness, which if r is small constrains the permitted mixing size, as well as offering poor NED size characteristics for cases with p < 2. It can be improved upon in these situations by introducing a truncation argument. The third of our strong laws is the following.
20.21 Theorem Let a sequence {X_t}_{−∞}^∞ with means {μ_t}_{−∞}^∞ be L_p-NED of size −1, 1 ≤ p ≤ 2, with constants d_t ≪ ‖X_t − μ_t‖_p, on a sequence {V_t}_{−∞}^∞ which is either
(i) α-mixing of size −r/(r−2) for r > 2, or
(ii) φ-mixing of size −r/2(r−1) for r > 1, and r ≥ p;
and for q with p ≤ q ≤ r and a constant positive sequence {a_t},

$$\sum_{t=1}^{n}\big(Y_t - E(Y_t|\mathcal F_{t-1})\big)/a_t \to S_2 \text{ a.s.}, \qquad (20.90)$$
where S₂ is another random variable, by 20.3. We conclude that
$$\sum_{t=1}^{n}(X_t - \mu_t)/a_t \to S_1 + S_2 = S \text{ a.s.}, \qquad (20.91)$$
say. It follows by Kronecker's lemma that a_n⁻¹Σ_{t=1}^n (X_t − μ_t) → 0 a.s., the required conclusion. ∎
0, the required
more specialized, result. The linear function of martingale differences with summable coefficients is a case of particular interest since it unifies our two approaches to the strong law. Here is a fill,
20.22Theorem sequence with
where ( U(k is a uniformly Ip-bounded m.d. Let Xt Lcej=-xjut-j 0;1 < n. Then p > 1, and lrv-xl j =
n^{-1} Σ_{t=1}^{n} X_t → 0 a.s.
Proof {X_t/t} is an L_1-mixingale with c_t ≪ 1/t and arbitrary size. It was shown in 16.12 that the maximal inequality (20.56) holds for this case. Application of the convergence lemma and Kronecker's lemma leads directly to the result. Alternatively, apply 20.18 to X_t with a_t = t. ∎
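The conclusion of 20.22 is easy to visualize by simulation. In this sketch, the coefficient choice θ_j = 0.5^j for j ≥ 0 (truncated at q lags) and i.i.d. Gaussian U_t (which form a martingale difference sequence) are illustrative assumptions:

```python
import random

# A simulation sketch of the situation in 20.22: X_t = sum_j theta_j * U_{t-j}
# with absolutely summable coefficients and an i.i.d. (hence martingale
# difference) sequence U_t.  The sample mean of X_t shrinks toward zero.
random.seed(1)
q = 30                                  # truncation point of the filter
theta = [0.5 ** j for j in range(q)]    # absolutely summable coefficients
n = 50_000
u = [random.gauss(0.0, 1.0) for _ in range(n + q)]
total = 0.0
for t in range(n):
    total += sum(theta[j] * u[t + j] for j in range(q))
mean = total / n
print(mean)   # the sample mean of X_t; small for large n
```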
In these results, four features summarize the relevant characteristics of the stochastic process: the order of existing moments, the summability characteristics of the moments, and the sizes of the mixing and near-epoch dependence numbers. The way in which the currently available theorems trade off these features suggests that some unification should be possible. The McLeish-style argument is revealed by de Jong's approach to be excessively restrictive with respect to the dependence conditions it imposes, whereas the tough summability conditions the latter's theorem requires may also be an artefact of the method adopted. The repertoire of dependent strong laws is currently being extended (de Jong, 1994) in work as yet too recent for incorporation in this book.
21 Uniform Stochastic Convergence
21.1 Stochastic Functions on a Parameter Space

The setting for this chapter is the class of functions
f: Ω × Θ ↦ ℝ,

where (Ω, F, μ) is a measure space and (Θ, ρ) is a metric space. We write f(ω, θ) to denote the real value which the function assumes at the point (ω, θ); f(·, θ) is a random variable for fixed θ. But f(ω, ·), alternatively written just f(θ), is not a random variable, but a random element of a space of functions. Econometric analysis is very frequently concerned with this type of object. Log-likelihoods, sums of squares, and other criterion functions for the estimation of econometric models, and also the first and second derivatives of these criterion functions, are all the subject of important convergence theorems on which proofs of consistency and the derivation of limiting distributions are based. Except in a restricted class of linear models, all of these are typically functions both of the model parameters and of random data.
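A least-squares criterion behaves in exactly this dual way: with the data drawn and held fixed, Q_n(·) is an ordinary function of θ, while for fixed θ it is a random variable across samples. A hypothetical sketch, in which the linear model, parameter values, and grid are all illustrative assumptions rather than anything from the text:

```python
import random

# A stochastic criterion function: Q_n(theta) is the mean squared error for
# the hypothetical linear model y_t = theta0 * x_t + e_t.
random.seed(2)
n = 500
theta0 = 1.5
x = [random.uniform(-1.0, 1.0) for _ in range(n)]
y = [theta0 * xt + random.gauss(0.0, 0.5) for xt in x]

def Q_n(theta):
    # With the sample (x, y) fixed, this is an ordinary function of theta;
    # with theta fixed, it is a random variable across repeated samples.
    return sum((yt - theta * xt) ** 2 for xt, yt in zip(x, y)) / n

values = {k / 10: Q_n(k / 10) for k in range(31)}   # grid on [0, 3]
best = min(values, key=values.get)
print(best)   # grid minimizer of the criterion, near theta0
```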
To deal with convergence on a function space, it is necessary to have a criterion by which to judge when two functions are close to one another. In this chapter we examine the questions posed by stochastic convergence (almost sure or in probability) when the relevant space of functions is endowed with the uniform metric. A class of functions that are therefore going to be central to our discussion have the form

f*(ω) = sup_{θ∈Θ} f(ω, θ).  (21.1)
For example, if g and h are two stochastic functions whose uniform proximity is at issue, we would be interested in the supremum of

f(ω, θ) = |g(ω, θ) - h(ω, θ)|.
An important technical problem arises here which ought to be confronted at the outset. We have not so far given any results that would justify treating f* as a random variable, when (Θ, ρ) may be an arbitrary metric space. We can write

{ω: f*(ω) > x} = ⋃_{θ∈Θ} {ω: f(ω, θ) > x},  (21.2)

and the results of 3.26 show that {ω: f*(ω) > x} ∈ F when {ω: f(ω, θ) > x} ∈ F for each θ, when Θ is a countable set. But typically Θ is a subset of ℝ^k or something of the kind, and is uncountable.
This is one of a class of measurability problems having ramifications far beyond the uniform convergence issue, and to handle it properly requires a mathematical apparatus going beyond what is covered in Chapter 3. We shall not attempt to deal with this question in depth, and will offer no proofs in this instance. We will merely outline the main features of the theory required for its solution.

The essential step is to recognize that the set on the left-hand side of (21.2) can be expressed as a projection. Let B_Θ denote the Borel field of subsets of Θ, that is, the smallest σ-field containing the sets of Θ that are open with respect to ρ. Then let (Ω × Θ, F ⊗ B_Θ) denote the product space endowed with the product σ-field (the σ-field generated from the measurable rectangles of F and B_Θ), and suppose that f(.,.) is F ⊗ B_Θ/B-measurable. Observe that, if

A_x = {(ω, θ): f(ω, θ) > x} ∈ F ⊗ B_Θ,  (21.3)

the projection of A_x into Ω is

E_x = {ω: f(ω, θ) > x, θ ∈ Θ} = {ω: f*(ω) > x}.  (21.4)

In view of 3.24, measurability of f* is equivalent to the condition that E_x ∈ F for rational x. Projections are not as a rule measurable transformations, but under certain conditions it can be shown that E_x ∈ F^μ, where (Ω, F^μ, μ) is the completion of the probability space. The key notion is that of an analytic set. A standard reference on this topic is Dellacherie and Meyer (1978); see also Dudley (1989: ch. 13) and Stinchcombe and White (1992). The latter authors provide the following definition. Letting (Ω, F) be a measurable space, a set E ⊆ Ω is called F-analytic if there exists a compact metric space (Θ, ρ) such that E is the projection onto Ω of a set A ∈ F ⊗ B_Θ. The collection of F-analytic sets is written A(F). Also, a function f: Ω ↦ ℝ is called F-analytic if {ω: f(ω) ≥ x} ∈ A(F) for each x ∈ ℝ. Since every E ∈ F is the projection of E × Θ ∈ F ⊗ B_Θ, F ⊆ A(F). A measurable set (or function) is therefore also analytic. A(F) is not in general a σ-field, although it can be shown to be closed under countable unions and countable intersections.

The conditions under which an image under projection is known to be analytic are somewhat weaker than the definition might suggest, and it will actually suffice to let (Θ, ρ) be a Souslin space, that is, a space that is measurably isomorphic to an analytic subset of a compact metric space. A sufficient condition, whose proof can be extracted from the results in Stinchcombe and White (1992), is the following:

21.1 Theorem Let (Ω, F) be a measurable space and (Θ, ρ) a Souslin space. If B ∈ F ⊗ B_Θ, the projection of B onto Ω is in A(F). ∎

Now, given the measurable space (Ω, F), define F^U = ⋂_μ F^μ, where (Ω, F^μ, μ) is the completion of the probability space (Ω, F, μ) (see 3.7) and the intersection is taken over all p.m.s μ defined on the space. The elements of F^U are called universally measurable sets. The key conclusion, from Dellacherie and Meyer (1978: III.33(a)), is the following.

21.2 Theorem For a measurable space (Ω, F),

A(F) ⊆ F^U.  (21.5)

Since by definition F^U ⊆ F^μ
for any choice of μ, it follows that the analytic sets of F are measurable under the completion of (Ω, F, μ) for any choice of μ. In other words, if E is analytic there exist A, B ∈ F such that A ⊆ E ⊆ B and μ(A) = μ(B). In this sense we say that analytic sets are 'nearly' measurable. All the standard probabilistic arguments, and in particular the values of integrals, will be unaffected by this technical non-measurability, and we can ignore it. We can legitimately treat f*(ω) as a random variable, provided the conditions on Θ are observed and we can assume f(.,.) to be 'near'-F ⊗ B_Θ/B-measurable.

An analytic subset of a compact space need not be compact, but must be totally bounded. It is convenient that we do not have to insist on compactness of the parameter space, since the latter is often required to be open, thanks to strict inequality constraints (think of variances, stable roots of polynomials, and the like). In the convergence results below, we find that Θ will in any case have to be totally bounded for completely different reasons: to ensure equicontinuity; to ensure that the stochastic functions have bounded moments; and because, when a stochastic criterion function is being optimized with respect to θ, the optimum is usually required to lie almost surely in the interior of a compact set. Hence, total boundedness is not an extra restriction in practice.

The measurability condition on f(.,.) might be verifiable using an argument from simple functions. It is certainly necessary by 4.19 that the cross-section functions f(., θ): Ω ↦ ℝ and f(ω, .): Θ ↦ ℝ be, respectively, F/B-measurable for each θ ∈ Θ and B_Θ/B-measurable for each ω ∈ Ω. For a finite partition {Θ_1, ..., Θ_m} of Θ by B_Θ-sets, consider the functions

f^(m)(ω, θ) = f(ω, θ_j), θ ∈ Θ_j, j = 1, ..., m,  (21.6)

where θ_j is a point of Θ_j. If E_j^x = {ω: f(ω, θ_j) ≤ x} ∈ F for each j, then

{(ω, θ): f^(m)(ω, θ) ≤ x} = ⋃_{j=1}^{m} E_j^x × Θ_j ∈ F ⊗ B_Θ,  (21.7)

being a finite union of measurable rectangles. Since this is true for any x, f^(m) is F ⊗ B_Θ/B-measurable. The question to be addressed in any particular case is whether a sequence of such partitions can be constructed such that f^(m) → f as m → ∞.
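The partitioning device can be mimicked numerically: evaluate f(ω, ·) at one point per partition member and take the maximum, which is the supremum of the simple approximation f^(m); as the partition is refined, the value approaches the supremum of f itself. A sketch with an illustrative family of functions continuous in θ (nothing here is from the text):

```python
import math, random

# Partition Theta = [0, pi] into m cells, represent f(omega, .) by its
# value at one point per cell, and maximize; as m grows this approaches
# sup_theta f(omega, theta).
random.seed(3)
A, B = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)   # one draw of omega

def f(theta):
    return A * math.sin(theta) + B * math.cos(theta)

def sup_simple(m):
    # Supremum of f^(m): one representative point per partition member.
    return max(f(math.pi * (i + 0.5) / m) for i in range(m))

coarse, fine = sup_simple(256), sup_simple(65536)
print(fine - coarse)   # refinement gap; shrinks as the partition is refined
```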
Henceforth we shall assume without further comment that suprema of stochastic functions are random variables. The following result should be carefully noted, not least because of its deceptive similarity to the monotone convergence theorem, although this inequality goes the opposite way. The monotone convergence theorem concerns the expectation of the supremum of a monotone sequence of functions, whereas the present one is more precisely concerned with the envelope of a class of functions, the function f*(ω) which assumes the value sup_{θ∈Θ} f(ω, θ) at each point ω of Ω.

21.3 Theorem sup_{θ∈Θ} E(f(θ)) ≤ E( sup_{θ∈Θ} f(θ) ).
Proof Appealing to 3.28, it will suffice to prove this inequality for simple functions. A simple function depending on θ has the form

f(ω, θ) = Σ_{i=1}^{m} α_i(θ) 1_{E_i}(ω), ω ∈ E_i.  (21.8)

Defining α_i* = sup_{θ∈Θ} α_i(θ),

sup_{θ∈Θ} f(ω, θ) = α_i*, ω ∈ E_i.  (21.9)

Hence

sup_{θ∈Θ} E(f(θ)) - E( sup_{θ∈Θ} f(θ) ) = sup_{θ∈Θ} Σ_{i=1}^{m} (α_i(θ) - α_i*) P(E_i) ≤ 0,  (21.10)

where the final inequality is by definition of α_i*. ∎
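The inequality of 21.3 is easy to verify by Monte Carlo. With the illustrative family f(ω, θ) = θZ(ω) - θ², θ ∈ [-1, 1] and Z standard normal (an example of my own choosing, not from the text), E f(θ) = -θ² has supremum 0 over Θ, while the envelope sup_θ f is strictly positive with positive probability:

```python
import random

# Monte Carlo check of 21.3: sup_theta E[f(theta)] <= E[sup_theta f(theta)]
# for f(omega, theta) = theta * Z(omega) - theta**2 on Theta = [-1, 1].
random.seed(4)
reps = 100_000
total = 0.0
for _ in range(reps):
    z = random.gauss(0.0, 1.0)
    t = max(-1.0, min(1.0, z / 2.0))   # argmax of theta*z - theta**2 on [-1, 1]
    total += t * z - t * t
e_sup = total / reps   # estimate of E[sup_theta f(theta)]
sup_e = 0.0            # sup_theta E[f(theta)] = sup_theta (-theta**2)
print(sup_e, e_sup)    # the second number strictly exceeds the first
```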
21.2 Pointwise and Uniform Stochastic Convergence

Consider the convergence (a.s., in pr., in L_p, etc.) of the sequence {Q_n(θ)} to a limit function Q̄(θ). Typically this is a law-of-large-numbers-type problem, with

Q_n(θ) = Σ_{t=1}^{n} q_nt(θ)  (21.11)

(we use array notation for generality, but the case q_nt = q_t/n may usually be assumed), and Q̄(θ) = lim_{n→∞} E(Q_n(θ)). Alternatively, we may want to consider the case G_n(θ) → 0, where

G_n(θ) = Σ_{t=1}^{n} (q_nt(θ) - E(q_nt(θ))).  (21.12)

By considering (21.12) we divide the problem into two parts, the stochastic convergence of the sum of the mean deviations to zero, and the nonstochastic convergence assumed in the definition of Q̄(θ). This raises the separate question of whether the latter convergence is uniform, which is a matter for the problem at hand and will not concern us here. As we have seen in previous chapters, obedience to a law of large numbers calls for both the boundedness and the dependence of the sequence to be controlled. In the case of a function on Θ, the dependence question presents no extra difficulty; for example, if q_nt(θ_1) is a mixing or near-epoch dependent array of a given class, the property will generally be shared by q_nt(θ_2) for any θ_1, θ_2 ∈ Θ. But the existence of particular moments is clearly not independent of θ. If there
exists a positive array {D_nt} such that |q_nt(θ)| ≤ D_nt for all θ ∈ Θ, and ‖D_nt‖_r < ∞ uniformly in t and n, {q_nt(θ)} is said to be L_r-dominated. To ensure pointwise convergence on Θ, we need to postulate the existence of a dominating array. There is no problem if the q_nt(θ) are bounded functions of θ. More generally it is necessary to bound Θ, but since Θ will often have to be bounded for a different set of reasons, this does not necessarily present an additional restriction.

Given restrictions on the dependence plus suitable domination conditions, pointwise stochastic convergence follows by considering {G_n(θ)} as an ordinary stochastic sequence, for each θ ∈ Θ. However, this line of argument does not guarantee that there is a minimum rate of convergence which applies for all θ, the condition of uniform convergence. If pointwise convergence of {G_n(θ)} to the limit zero is defined by

G_n(θ) → 0 (a.s., in L_p, or in pr.), each θ ∈ Θ,  (21.13)

a sequence of stochastic functions {G_n(θ)} is said to converge uniformly (a.s., in L_p, or in pr.) on Θ if

sup_{θ∈Θ} |G_n(θ)| → 0 (a.s., in L_p, or in pr.).  (21.14)

To appreciate the difference, consider the following example.
21.4 Example Let Θ = [0, ∞), and define a zero-mean array {q_nt(θ)}, where

q_nt(θ) = Zθ, 0 ≤ θ ≤ 1/2n,
q_nt(θ) = Z(1/n - θ), 1/2n < θ ≤ 1/n,
q_nt(θ) = 0, θ > 1/n,
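The pointwise-but-not-uniform mechanism in this example has a simple deterministic analogue that can be computed directly: a tent of height 1 supported on [0, 1/n] converges to zero at every fixed θ, yet its supremum over θ never falls below 1. A sketch, with this h_n as an illustrative stand-in for the random array above:

```python
# Deterministic analogue of the example: a tent of height 1 with support
# [0, 1/n], peaking at theta = 1/(2n).
def h(n, theta):
    return max(0.0, 1.0 - abs(2.0 * n * theta - 1.0))

# Pointwise convergence: at any fixed theta > 0 the support eventually
# slides past it, so h(n, theta) -> 0 (and h(n, 0) = 0 for every n).
print([h(n, 0.05) for n in (1, 10, 100, 1000)])

# No uniform convergence: the supremum over theta stays at 1 for every n.
print([max(h(n, k / 10_000) for k in range(10_001)) for n in (1, 10, 100, 1000)])
```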
ε > 0 and δ > 0, there exists N_1 ≥ 1 large enough that, for n ≥ N_1,

P( sup_{θ∈B_0} |Q_n(θ) - Q̄(θ)| < ε ) > 1 - δ.

Since θ_n* converges to θ_0, there exists N_2 such that, for n ≥ N_2,

P(θ_n* ∈ B_0) ≥ 1 - δ.  (21.28)
To consider the joint occurrence of these two events, use the elementary relation

P(A ∩ B) ≥ P(A) + P(B) - 1.  (21.29)

Since, for n ≥ max{N_1, N_2},

{θ_n* ∈ B_0} ∩ { sup_{θ∈B_0} |Q_n(θ) - Q̄(θ_0)| < ε } …
--->
The set F t/'n,n e IN) QJ (01,endowed with the uniform metric, is a subspace of (Co,Jg), and by definition, convergence of fn to 0 in the unifonn metlic is the same thing as unifonn convergence on 0- According to 5.12, compactness of F is equivalent to the property that every sequence in F has a cluster point. ln view of the pointwise convergence, the cluster point must be unique and equal to 0, so that the conclusion of this theorem is really identical with the Arzel-Ascoli theorem, although the method of proof will be adapted to the present case. Where convenient, we shall use the notation =
.
w(/k,)
=
sup 0 (5 O
0/G
sup
510,8)
IJn(0') fn) 1
(21.37)
-
.
The function w(A,.):R+ F-+ R+ is called the modulus ofcontinuity of fn. Asympof the sequence (Jn) is the property that totic unifonn equicntinuity limsupnwt/k,8) 1 0 as 0. .1'
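The modulus of continuity in (21.37) can be approximated on a finite grid, which is a practical way to inspect the (asymptotic) equicontinuity of simulated criterion functions. A sketch in which Θ = [0, 1], the grid, and f(θ) = sin(10θ) are illustrative choices:

```python
import math

def modulus(f, delta, grid):
    # w(f, delta): largest change of f between any two grid points at
    # distance at most delta -- a finite-grid proxy for (21.37).
    return max(
        abs(f(t2) - f(t1))
        for i, t1 in enumerate(grid)
        for t2 in grid[i + 1:]
        if t2 - t1 <= delta
    )

grid = [k / 500 for k in range(501)]     # Theta = [0, 1]
f = lambda theta: math.sin(10 * theta)   # illustrative continuous function
m_small = modulus(f, 0.01, grid)
m_big = modulus(f, 0.1, grid)
print(m_small, m_big)   # w(f, .) is non-decreasing in delta
```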
Proof of 21.7 To prove 'if': given ε > 0, there exists by assumption δ > 0 to satisfy

limsup_{n→∞} w(f_n, δ) < ε.  (21.38)
Since Θ is totally bounded, it has a finite cover {S(θ̇_i, δ/2), i = 1, ..., m}. For each i, choose θ_i ∈ Θ_0 such that ρ(θ̇_i, θ_i) < δ/2 (possible because Θ_0 is dense in Θ), and note that {S(θ_i, δ), i = 1, ..., m} is also a cover for Θ. Every θ ∈ Θ is contained in S(θ_i, δ) for some i, and for this i,

sup_{θ∈S(θ_i,δ)} |f_n(θ)| ≤ sup_{θ'∈S(θ_i,δ)} |f_n(θ') - f_n(θ_i)| + |f_n(θ_i)|