Construct (on an entirely irrelevant probability space) an auxiliary random variable Z such that

(9.13)    P[Z = y_j] = e^{τ y_j} P[Y = y_j] / ρ

for each y_j in the range of Y. Note that the probabilities on the right do add to 1, since Σ_j e^{τ y_j} P[Y = y_j] = M(τ) = ρ. The moment generating function of Z is

(9.14)    Σ_j e^{t y_j} e^{τ y_j} P[Y = y_j] / ρ = M(τ + t)/ρ,

and therefore

(9.15)    E[Z] = M′(τ)/ρ = 0,

because τ minimizes M. For all positive t, P[Y > 0] = P[e^{tY} > 1] ≤ M(t) by Markov's inequality (5.31), and hence

(9.16)    P[Y > 0] ≤ ρ.
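The change of measure in (9.13)–(9.15) can be checked numerically. The following is a small sketch (an illustration added here, not part of the original text): for a concrete two-valued Y with E[Y] < 0 and P[Y > 0] > 0, it locates the minimizing τ by ternary search, builds the tilted variable Z, and confirms that the tilted probabilities add to 1 and that E[Z] = 0. The particular values taken by Y are arbitrary choices.

```python
from math import exp

# Y takes value ys[i] with probability ps[i]; E[Y] < 0, P[Y > 0] > 0.
ys = [2.0, -1.0]
ps = [0.2, 0.8]

def M(t):
    """Moment generating function of Y."""
    return sum(p * exp(t * y) for y, p in zip(ys, ps))

# Locate tau minimizing the convex function M by ternary search on [0, 10].
lo, hi = 0.0, 10.0
for _ in range(200):
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if M(m1) < M(m2):
        hi = m2
    else:
        lo = m1
tau = (lo + hi) / 2
rho = M(tau)

# Tilted variable Z of (9.13): P[Z = y] = e^{tau*y} P[Y = y] / rho.
tilted = [p * exp(tau * y) / rho for y, p in zip(ys, ps)]

total = sum(tilted)                              # adds to 1, since M(tau) = rho
mean_Z = sum(y * q for y, q in zip(ys, tilted))  # 0, since tau minimizes M
```

For this Y, the minimizer solves .4e^{2t} = .8e^{−t}, so τ = (log 2)/3.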
Inequalities in the other direction are harder to obtain. If Σ′ denotes summation over those indices j for which y_j > 0, then

(9.17)    P[Y > 0] = Σ′ P[Y = y_j] = ρ Σ′ e^{−τ y_j} P[Z = y_j].

Put the final sum here in the form e^{−θ}, and let p = P[Z > 0]. By (9.16), θ ≥ 0. Since log x is concave, Jensen's inequality (5.33) gives

−θ = log Σ′ e^{−τ y_j} p^{−1} P[Z = y_j] + log p
   ≥ Σ′ (−τ y_j) p^{−1} P[Z = y_j] + log p
   = −τ p^{−1} Σ′ y_j P[Z = y_j] + log p.

By (9.15) and Lyapounov's inequality (5.37), Σ′ y_j P[Z = y_j] ≤ E[|Z|] ≤ s, where s² = E[Z²].

150    PROBABILITY

The last two inequalities give

(9.18)    0 ≤ θ ≤ τ s p^{−1} − log p.

Thus P[Y > 0] = ρ e^{−θ}, where θ satisfies (9.18). To use (9.18) requires a lower bound for P[Z > 0].
Theorem 9.2. If E[Z] = 0, E[Z²] = s², and E[Z⁴] = ξ, where s² > 0, then P[Z > 0] ≥ s⁴/4ξ.†

PROOF. Let Z⁺ = Z·I[Z ≥ 0] and Z⁻ = −Z·I[Z < 0]. Then Z⁺ and Z⁻ are nonnegative, Z = Z⁺ − Z⁻, Z² = (Z⁺)² + (Z⁻)², and

(9.19)    s² = E[Z²] = E[(Z⁺)²] + E[(Z⁻)²].

Let p = P[Z > 0]. By Schwarz's inequality (5.36),

E[(Z⁺)²] = E[I[Z > 0]·Z²] ≤ E^{1/2}[I[Z > 0]]·E^{1/2}[Z⁴] = p^{1/2} ξ^{1/2}.

By Hölder's inequality (5.35) (for p = 3/2 and q = 3),

E[(Z⁻)²] = E[(Z⁻)^{2/3}(Z⁻)^{4/3}] ≤ E^{2/3}[Z⁻]·E^{1/3}[(Z⁻)⁴] ≤ E^{2/3}[Z⁻]·ξ^{1/3}.

Since E[Z] = 0, another application of Hölder's inequality (for p = 4 and q = 4/3) gives

E[Z⁻] = E[Z⁺] = E[Z·I[Z > 0]] ≤ E^{1/4}[Z⁴]·E^{3/4}[I[Z > 0]] = ξ^{1/4} p^{3/4}.

Combining these three inequalities with (9.19) gives

s² ≤ p^{1/2} ξ^{1/2} + (ξ^{1/4} p^{3/4})^{2/3} ξ^{1/3} = 2 p^{1/2} ξ^{1/2},

and squaring out this inequality gives p ≥ s⁴/4ξ. ∎

†For a related result, see Problem 25.19.
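Theorem 9.2 is easy to verify on a concrete distribution. The sketch below (an added illustration; the two-point distribution is an arbitrary choice) computes s², ξ, and P[Z > 0] for a centered Z and checks the bound P[Z > 0] ≥ s⁴/4ξ.

```python
# Check P[Z > 0] >= s^4 / (4 xi) for a simple centered two-point Z.
zs = [3.0, -1.0]
qs = [0.25, 0.75]          # E[Z] = 0.25*3 - 0.75 = 0

mean = sum(z * q for z, q in zip(zs, qs))
s2 = sum(z ** 2 * q for z, q in zip(zs, qs))   # E[Z^2]
xi = sum(z ** 4 * q for z, q in zip(zs, qs))   # E[Z^4]
p_pos = sum(q for z, q in zip(zs, qs) if z > 0)

bound = s2 ** 2 / (4 * xi)   # here 9/84, while P[Z > 0] = 0.25
```

The bound is far from tight here, but it is dimensionally right: it is invariant under rescaling Z.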
SECTION 9. LARGE DEVIATIONS AND THE ITERATED LOGARITHM
Chernoff's Theorem†

Theorem 9.3. Let X₁, X₂, … be independent, identically distributed simple random variables satisfying E[X_n] < 0 and P[X_n > 0] > 0, let M(t) be their common moment generating function, and put ρ = inf_t M(t). Then

(9.20)    lim_{n→∞} (1/n) log P[X₁ + ⋯ + X_n > 0] = log ρ.

PROOF. Put Y_n = X₁ + ⋯ + X_n. Then E[Y_n] < 0 and P[Y_n > 0] ≥ Pⁿ[X₁ > 0] > 0, and so the hypotheses of Theorem 9.1 are satisfied. Define ρ_n and τ_n by inf_t M_n(t) = M_n(τ_n) = ρ_n, where M_n(t) is the moment generating function of Y_n. Since M_n(t) = Mⁿ(t), it follows that ρ_n = ρⁿ and τ_n = τ, where M(τ) = ρ.

Let Z_n be the analogue for Y_n of the Z described by (9.13). Its moment generating function (see (9.14)) is M_n(τ + t)/ρⁿ = (M(τ + t)/ρ)ⁿ. This is also the moment generating function of V₁ + ⋯ + V_n for independent random variables V₁, …, V_n each having moment generating function M(τ + t)/ρ. Now each V_i has (see (9.15)) mean 0 and some positive variance σ² and fourth moment ξ independent of i. Since Z_n must have the same moments as V₁ + ⋯ + V_n, it has mean 0, variance s_n² = nσ², and fourth moment ξ_n = nξ + 3n(n − 1)σ⁴ = O(n²) (see (6.2)). By Theorem 9.2, P[Z_n > 0] ≥ s_n⁴/4ξ_n ≥ a for some positive a independent of n. By Theorem 9.1, then,

P[Y_n > 0] = ρⁿ e^{−θ_n},

where 0 ≤ θ_n ≤ τ_n s_n a^{−1} − log a = τ a^{−1} σ √n − log a. This gives (9.20) and shows, in fact, that the rate of convergence is O(n^{−1/2}). ∎
This result is important in the theory of statistical hypothesis testing. An informal treatment of the Bernoulli case will illustrate the connection.

Suppose S_n = X₁ + ⋯ + X_n, where the X_i are independent and assume the values 1 and 0 with probabilities p and q = 1 − p. Now P[S_n ≥ na] = P[Σ_{k≤n}(X_k − a) ≥ 0], and Chernoff's theorem applies if p < a < 1. In this case M(t) = E[e^{t(X₁ − a)}] = e^{−at}(pe^t + q), and minimizing over t shows that ρ = e^{−K(a, p)}, where

K(a, p) = a log(a/p) + (1 − a) log((1 − a)/q).

Hence

(9.21)    P[S_n ≥ na] ≈ e^{−nK(a, p)}

in the sense that (1/n) log P[S_n ≥ na] → −K(a, p).

Suppose now that p is known to be either p₁ or p₂, where p₁ < p₂, and the problem is to decide between the hypotheses H₁: p = p₁ and H₂: p = p₂ on the basis of S_n. Decide in favor of H₂ if S_n ≥ na and in favor of H₁ if S_n < na, where a is some number satisfying p₁ < a < p₂. The problem is to find an advantageous value for the threshold a.

By (9.21),

(9.22)    P[S_n ≥ na | H₁] ≈ e^{−nK(a, p₁)},

where the notation indicates that the probability is calculated for p = p₁, that is, under the assumption of H₁. By symmetry,

(9.23)    P[S_n < na | H₂] ≈ e^{−nK(a, p₂)}.

The left sides of (9.22) and (9.23) are the probabilities of erroneously deciding in favor of H₂ when H₁ is, in fact, true and of erroneously deciding in favor of H₁ when H₂ is, in fact, true: the probabilities describing the level and power of the test. Suppose a is chosen so that K(a, p₁) = K(a, p₂), which makes the two error probabilities approximately equal. This constraint gives for a a linear equation with solution

(9.24)    a(p₁, p₂) = log(q₁/q₂) / [log(p₂/p₁) + log(q₁/q₂)],

where q_i = 1 − p_i. The common error probability is approximately e^{−nK(a, p₁)} for this value of a, and so the larger K(a, p₁) is, the easier it is to distinguish statistically between p₁ and p₂.

Although K(a(p₁, p₂), p₁) is a complicated function, it has a simple approximation for p₁ near p₂. As x → 0, log(1 + x) = x − ½x² + O(x³). Using this in the definition of K and collecting terms gives

(9.25)    K(p + x, p) = x²/(2pq) + O(x³),    x → 0.

Fix p₁ = p, and let p₂ = p + t; (9.24) becomes a function φ(t) of t, and expanding the logarithms gives

(9.26)    φ(t) = p + ½t + O(t²),    t → 0,

after some reductions. Finally, (9.25) and (9.26) together imply that

(9.27)    K(φ(t), p) = t²/(8pq) + O(t³),    t → 0.

In distinguishing p₁ = p from p₂ = p + t for small t, if a is chosen to equalize the two error probabilities, then their common value is about e^{−nt²/8pq}. For t fixed, the nearer p is to ½, the larger this probability is and the more difficult it is to distinguish p from p + t. As an example, compare p = .1 with p = .5. Now .36nt²/(8(.1)(.9)) = nt²/(8(.5)(.5)). With a sample only 36 percent as large, .1 can therefore be distinguished from .1 + t with about the same precision as .5 can be distinguished from .5 + t.
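The formulas (9.21)–(9.27) can be exercised numerically. Below is a small sketch (an added illustration; the values p₁ = .5, p₂ = .55 are arbitrary) computing K and the equalizing threshold (9.24), then checking that K(a, p₁) = K(a, p₂), that a ≈ p₁ + t/2 as in (9.26), and that the common rate is close to t²/(8p₁q₁) as in (9.27).

```python
from math import log

def K(a, p):
    """K(a, p) = a log(a/p) + (1-a) log((1-a)/(1-p)), the rate in (9.21)."""
    return a * log(a / p) + (1 - a) * log((1 - a) / (1 - p))

def threshold(p1, p2):
    """The a of (9.24), which equalizes K(a, p1) = K(a, p2)."""
    q1, q2 = 1 - p1, 1 - p2
    return log(q1 / q2) / (log(p2 / p1) + log(q1 / q2))

p1, p2 = 0.5, 0.55
a = threshold(p1, p2)
# K(a, p1) == K(a, p2) by construction; for t = p2 - p1 small,
# a is close to p1 + t/2, and K(a, p1) is close to t^2 / (8 p1 q1).
```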
The Law of the Iterated Logarithm
The analysis of the rate at which S_n/n approaches the mean depends on the following variant of the theorem on large deviations.

Theorem 9.4. Let S_n = X₁ + ⋯ + X_n, where the X_n are independent and identically distributed simple random variables with mean 0 and variance 1. If the constants a_n satisfy

(9.28)    a_n → ∞,    a_n/√n → 0,

then

(9.29)    P[S_n > a_n √n] = e^{−a_n²(1 + ζ_n)/2}

for a sequence ζ_n going to 0.
PROOF. Put Y_n = S_n − a_n√n = Σ_{k=1}^n (X_k − a_n/√n). Then E[Y_n] < 0. Since X₁ has mean 0 and variance 1, P[X₁ > 0] > 0, and it follows by (9.28) that P[X₁ > a_n/√n] > 0 for n sufficiently large, in which case P[Y_n > 0] ≥ Pⁿ[X₁ − a_n/√n > 0] > 0. Thus Theorem 9.1 applies to Y_n for all large enough n.

Let M_n(t), ρ_n, τ_n, and Z_n be associated with Y_n as in the theorem. If m(t) and c(t) are the moment and cumulant generating functions of the X_n, then M_n(t) is the nth power of the moment generating function e^{−t a_n/√n} m(t) of X₁ − a_n/√n, and so Y_n has cumulant generating function

(9.30)    C_n(t) = −t a_n √n + n c(t).

Since τ_n is the unique minimum of C_n(t), and since C_n′(t) = −a_n√n + n c′(t), τ_n is determined by the equation c′(τ_n) = a_n/√n. Since X₁ has mean 0 and variance 1, it follows by (9.6) that

(9.31)    c(0) = c′(0) = 0,    c″(0) = 1.

Now c′(t) is nondecreasing because c(t) is convex, and since c′(τ_n) = a_n/√n goes to 0, τ_n must therefore go to 0 as well and must in fact be O(a_n/√n). By the second-order mean-value theorem for c′(t), a_n/√n = c′(τ_n) = τ_n + O(τ_n²), from which follows

(9.32)    τ_n = (a_n/√n)(1 + o(1)).

By the third-order mean-value theorem for c(t),

log ρ_n = C_n(τ_n) = −τ_n a_n √n + n c(τ_n) = −τ_n a_n √n + n[½τ_n² + O(τ_n³)].

Applying (9.32) gives

(9.33)    log ρ_n = −½ a_n² (1 + o(1)).

Now (see (9.14)) Z_n has moment generating function M_n(τ_n + t)/ρ_n and (see (9.30)) cumulant generating function

D_n(t) = C_n(τ_n + t) − log ρ_n = −(τ_n + t) a_n √n + n c(t + τ_n) − log ρ_n.

The mean of Z_n is D_n′(0) = 0. Its variance s_n² is D_n″(0); by (9.31) this is

(9.34)    s_n² = n c″(τ_n) = n(c″(0) + O(τ_n)) = n(1 + o(1)).

The fourth cumulant of Z_n is D_n⁗(0) = n c⁗(τ_n) = O(n). By the formula (9.9) relating moments and cumulants (applicable because E[Z_n] = 0), E[Z_n⁴] = 3s_n⁴ + D_n⁗(0). Therefore, E[Z_n⁴]/s_n⁴ → 3, and it follows by Theorem 9.2 that there exists an a such that P[Z_n > 0] ≥ a > 0 for all sufficiently large n.

By Theorem 9.1, P[Y_n > 0] = ρ_n e^{−θ_n} with 0 ≤ θ_n ≤ τ_n s_n a^{−1} − log a. By (9.28), (9.32), and (9.34), θ_n = O(a_n) = o(a_n²), and it follows by (9.33) that

P[Y_n > 0] = e^{−a_n²(1 + o(1))/2}. ∎
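Theorem 9.4 can be checked against exact binomial tails. In the sketch below (an added illustration; the choice a_n = n^{1/4} is arbitrary but satisfies (9.28)), the X_i are symmetric ±1 steps, so P[S_n > a_n√n] is computable exactly, and the ratio −2 log P[S_n > a_n√n] / a_n² should drift toward 1, as (9.29) asserts.

```python
from math import comb, log

def lil_ratio(n):
    """-2 log P[S_n > a_n sqrt(n)] / a_n^2 for a_n = n**0.25 and +/-1 steps."""
    a = n ** 0.25
    kmin = int((n + a * n ** 0.5) / 2) + 1        # S_n = 2k - n > a sqrt(n)
    prob = sum(comb(n, k) for k in range(kmin, n + 1)) / 2 ** n
    return -2 * log(prob) / a ** 2

# lil_ratio(n) is above 1 and decreases toward 1 as n grows.
```

The convergence is slow because ζ_n in (9.29) decays only like log n / a_n², which is the reason the iterated-logarithm scale below is so delicate.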
The law of the iterated logarithm is this:

Theorem 9.5. Let S_n = X₁ + ⋯ + X_n, where the X_n are independent, identically distributed simple random variables with mean 0 and variance 1. Then

(9.35)    P[lim sup_n S_n/(2n log log n)^{1/2} = 1] = 1.

Equivalent to (9.35) is the assertion that for positive ε,

(9.36)    P[S_n > (1 + ε)(2n log log n)^{1/2} i.o.] = 0

and

(9.37)    P[S_n > (1 − ε)(2n log log n)^{1/2} i.o.] = 1.

The set in (9.35) is, in fact, the intersection over positive rational ε of the sets in (9.37) minus the union over positive rational ε of the sets in (9.36).
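A simulation gives a feel for (9.35), though of course it proves nothing. The sketch below (an added illustration; the seed and horizon are arbitrary) follows one ±1 random walk and records the largest observed value of S_n/√(2n log log n); by the theorem the lim sup of this ratio is 1 with probability 1, so over a long finite stretch the running maximum is typically a moderate number somewhat below 1.

```python
import random
from math import log, sqrt

# One sample path of a +/-1 walk; track sup of S_n / sqrt(2 n log log n).
random.seed(7)                    # arbitrary fixed seed for reproducibility
N = 100_000
s = 0
ratio_sup = 0.0
for n in range(1, N + 1):
    s += random.choice((-1, 1))
    if n >= 100:                  # log log n is only sensible for n > e^e
        ratio_sup = max(ratio_sup, s / sqrt(2 * n * log(log(n))))
```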
The idea of the proof is this. Write

(9.38)    A_n^± = [S_n > (1 ± ε)φ(n)],    where φ(n) = (2n log log n)^{1/2}.

By (9.29), P(A_n^±) is near (log n)^{−(1±ε)²}. If n_k increases exponentially, say n_k ~ θ^k for θ > 1, then P(A_{n_k}^±) is of the order k^{−(1±ε)²}. Now Σ_k k^{−(1±ε)²} converges if the sign is + and diverges if the sign is −. It will follow by the first Borel–Cantelli lemma that there is probability 0 that A_{n_k}^+ occurs for infinitely many k. In proving (9.36), an extra argument is required to get around the fact that the A_n^+ for n ≠ n_k must also be accounted for (this requires choosing θ near 1). If the A_{n_k}^− were independent, it would follow by the second Borel–Cantelli lemma that with probability 1, A_{n_k}^− occurs for infinitely many k, which would in turn imply (9.37). An extra argument is required to get around the fact that the A_{n_k}^− are dependent (this requires choosing θ large).

For the proof of (9.36) a preliminary result is needed. Put M_k = max{S₀, S₁, …, S_k}, where S₀ = 0.

Theorem 9.6. If the X_k are independent simple random variables with mean 0 and variance 1, then for α > √2,

(9.39)    P[M_n ≥ α√n] ≤ 2 P[S_n ≥ (α − √2)√n].

PROOF. Let A_j = [M_{j−1} < α√n ≤ S_j], j = 1, …, n; these sets are disjoint, and their union is [M_n ≥ α√n]. If S_j ≥ α√n and S_n − S_j ≥ −√(2n), then S_n ≥ (α − √2)√n, and hence

P[M_n ≥ α√n] ≤ P[S_n ≥ (α − √2)√n] + Σ_{j=1}^n P(A_j ∩ [S_n − S_j < −√(2n)]).

Since S_n − S_j has variance n − j, it follows by independence and Chebyshev's inequality that the probability in the sum is at most

P(A_j) · (n − j)/(2n) ≤ ½ P(A_j),

and the sum is therefore at most ½ P[M_n ≥ α√n]. ∎
PROOF OF (9.36). Given ε, choose θ so that θ > 1 but

θ² < 1 + ε.

Let n_k = ⌊θ^k⌋ and x_k = θ(2 log log n_k)^{1/2}. By (9.29) and (9.39),

P[M_{n_k} ≥ x_k √(n_k)] ≤ 2 P[S_{n_k} ≥ (x_k − √2)√(n_k)] = 2 e^{−(x_k − √2)²(1 + ζ_k)/2},

where ζ_k → 0. The negative of the exponent is asymptotically θ² log k and hence for large k exceeds θ log k, so that

P[M_{n_k} ≥ x_k √(n_k)] ≤ 2 k^{−θ}.

Since θ > 1, it follows by the first Borel–Cantelli lemma that there is probability 0 that (see (9.38))

(9.40)    M_{n_k} ≥ x_k √(n_k)

for infinitely many k. Suppose that n_{k−1} < n ≤ n_k and that

(9.41)    S_n > (1 + ε)φ(n);

since 1 + ε > θ² > θ^{3/2}, it follows that for large k, (1 + ε)φ(n) ≥ (1 + ε)φ(n_{k−1}) ≥ x_k√(n_k), so that (9.41) implies M_{n_k} ≥ S_n > x_k√(n_k), which is (9.40). With probability 1, therefore, (9.41) holds for only finitely many n, which proves (9.36).

PROBLEMS

… P[S_n > n^{3/4}(log n)^{(1+ε)/4} i.o.] = 0. Use (9.29) to give a simple proof that P[S_n > (3n log n)^{1/2} i.o.] = 0.
9.7. Show that (9.35) is true if S_n is replaced by |S_n| or max_{k≤n} S_k or max_{k≤n} |S_k|.
CHAPTER 2

Measure
SECTION 10. GENERAL MEASURES

Lebesgue measure on the unit interval was central to the ideas in Chapter 1. Lebesgue measure on the entire real line is important in probability as well as in analysis generally, and a uniform treatment of this and other examples requires a notion of measure for which infinite values are possible. The present chapter extends the ideas of Sections 2 and 3 to this more general setting.

Classes of Sets
The σ-field of Borel sets in (0, 1] played an essential role in Chapter 1, and it is necessary to construct the analogous classes for the entire real line and for k-dimensional Euclidean space.

Example 10.1. Let x = (x₁, …, x_k) be the generic point of Euclidean k-space R^k. The bounded rectangles

(10.1)    [x = (x₁, …, x_k): a_i < x_i ≤ b_i, i = 1, …, k]

will play in R^k the role the intervals (a, b] played in (0, 1]. Let ℛ^k be the σ-field generated by these rectangles. This is the analogue of the class ℬ of Borel sets in (0, 1]; see Example 2.6. The elements of ℛ^k are the k-dimensional Borel sets. For k = 1 they are also called the linear Borel sets.

Call the rectangle (10.1) rational if the a_i and b_i are all rational. If G is an open set in R^k and y ∈ G, then there is a rational rectangle A_y such that y ∈ A_y ⊂ G. But then G = ∪_{y∈G} A_y, and since there are only countably many rational rectangles, this is a countable union. Thus ℛ^k contains the open sets. Since a closed set has an open complement, ℛ^k also contains the closed sets. Just as ℬ contains all the sets in (0, 1] that actually arise in ordinary analysis and probability theory, ℛ^k contains all the sets in R^k that actually arise.

The σ-field ℛ^k is generated by subclasses other than the class of rectangles. If A_n is the x-set where a_i < x_i < b_i + n^{−1}, i = 1, …, k, then A_n is open and (10.1) is ∩_n A_n. Thus ℛ^k is generated by the open sets. Similarly, it is generated by the closed sets. Now an open set is a countable union of rational rectangles. Therefore, the (countable) class of rational rectangles generates ℛ^k. ∎
The σ-field ℛ¹ on the line R¹ is by definition generated by the finite intervals. The σ-field ℬ in (0, 1] is generated by the subintervals of (0, 1]. The question naturally arises whether the elements of ℬ are the elements of ℛ¹ that happen to lie inside (0, 1], and the answer is yes. If 𝒜 is a class of sets in a space Ω and Ω₀ is a subset of Ω, let 𝒜 ∩ Ω₀ = [A ∩ Ω₀: A ∈ 𝒜].

Theorem 10.1. (i) If ℱ is a σ-field in Ω, then ℱ ∩ Ω₀ is a σ-field in Ω₀.
(ii) If 𝒜 generates the σ-field ℱ in Ω, then 𝒜 ∩ Ω₀ generates the σ-field ℱ ∩ Ω₀ in Ω₀: σ(𝒜 ∩ Ω₀) = σ(𝒜) ∩ Ω₀.

PROOF. Of course Ω₀ = Ω ∩ Ω₀ lies in ℱ ∩ Ω₀. If B lies in ℱ ∩ Ω₀, so that B = A ∩ Ω₀ for an A ∈ ℱ, then Ω₀ − B = (Ω − A) ∩ Ω₀ lies in ℱ ∩ Ω₀. If B_n lies in ℱ ∩ Ω₀ for all n, so that B_n = A_n ∩ Ω₀ for an A_n ∈ ℱ, then ∪_n B_n = (∪_n A_n) ∩ Ω₀ lies in ℱ ∩ Ω₀. Hence part (i).

Let ℱ₀ be the σ-field 𝒜 ∩ Ω₀ generates in Ω₀. Since 𝒜 ∩ Ω₀ ⊂ ℱ ∩ Ω₀ and ℱ ∩ Ω₀ is a σ-field by part (i), ℱ₀ ⊂ ℱ ∩ Ω₀. Now ℱ ∩ Ω₀ ⊂ ℱ₀ will follow if it is shown that A ∈ ℱ implies A ∩ Ω₀ ∈ ℱ₀, or, to put it another way, if it is shown that ℱ is contained in 𝒢 = [A ⊂ Ω: A ∩ Ω₀ ∈ ℱ₀]. Since A ∈ 𝒜 implies that A ∩ Ω₀ lies in 𝒜 ∩ Ω₀ and hence in ℱ₀, it follows that 𝒜 ⊂ 𝒢. It is therefore enough to show that 𝒢 is a σ-field in Ω. Since Ω ∩ Ω₀ = Ω₀ lies in ℱ₀, it follows that Ω ∈ 𝒢. If A ∈ 𝒢, then (Ω − A) ∩ Ω₀ = Ω₀ − (A ∩ Ω₀) lies in ℱ₀ and hence Ω − A ∈ 𝒢. If A_n ∈ 𝒢 for all n, then (∪_n A_n) ∩ Ω₀ = ∪_n (A_n ∩ Ω₀) lies in ℱ₀ and hence ∪_n A_n ∈ 𝒢. ∎

If Ω₀ ∈ ℱ, then ℱ ∩ Ω₀ = [A: A ⊂ Ω₀, A ∈ ℱ]. If Ω = R¹, Ω₀ = (0, 1], and ℱ = ℛ¹, and if 𝒜 is the class of finite intervals on the line, then 𝒜 ∩ Ω₀ is the class of subintervals of (0, 1], and ℬ = σ(𝒜 ∩ Ω₀) is given by

(10.2)    ℬ = ℛ¹ ∩ (0, 1].

A subset of (0, 1] is thus a Borel set (lies in ℬ) if and only if it is a linear Borel set (lies in ℛ¹), and the distinction in terminology can be dropped.
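Theorem 10.1(ii) can be verified by brute force on a small finite space, where a generated σ-field is obtained by closing under complement and union. The sketch below (an added illustration; the space and generators are arbitrary choices) checks that σ(𝒜 ∩ Ω₀) = σ(𝒜) ∩ Ω₀.

```python
from itertools import combinations

def generate_sigma_field(space, gens):
    """Smallest class containing gens, closed under complement and union
    (on a finite space this is the generated sigma-field)."""
    field = {frozenset(), frozenset(space)} | {frozenset(g) for g in gens}
    grew = True
    while grew:
        grew = False
        for a in list(field):
            c = frozenset(space) - a
            if c not in field:
                field.add(c); grew = True
        for a, b in combinations(list(field), 2):
            if a | b not in field:
                field.add(a | b); grew = True
    return field

omega = {1, 2, 3, 4, 5}
omega0 = {1, 2, 3}
gens = [{1, 2}, {2, 3, 4}]

F = generate_sigma_field(omega, gens)                        # sigma(A)
lhs = generate_sigma_field(omega0, [set(g) & omega0 for g in gens])
rhs = {a & frozenset(omega0) for a in F}                     # sigma(A) trace
```

Here the atoms of σ(𝒜) are {1}, {2}, {3, 4}, {5}, so σ(𝒜) has 16 elements, and both sides of the identity are the full power set of Ω₀.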
Conventions Involving ∞

Measures assume values in the set [0, ∞] consisting of the ordinary nonnegative reals and the special value ∞, and some arithmetic conventions are called for. For x, y ∈ [0, ∞], x ≤ y means that y = ∞ or else x and y are finite (that is, are ordinary real numbers) and x ≤ y holds in the usual sense. Similarly, x < y means that y = ∞ and x is finite or else x and y are both finite and x < y holds in the usual sense.

For a finite or infinite sequence x, x₁, x₂, … in [0, ∞],

(10.3)    x = Σ_k x_k

means that either (i) x = ∞ and x_k = ∞ for some k, or (ii) x = ∞ and x_k < ∞ for all k and Σ_k x_k is an ordinary divergent infinite series, or (iii) x < ∞ and x_k < ∞ for all k and (10.3) holds in the usual sense for Σ_k x_k an ordinary finite sum or convergent infinite series. By these conventions and Dirichlet's theorem [A26], the order of summation in (10.3) has no effect on the sum.

For an infinite sequence x, x₁, x₂, … in [0, ∞],

(10.4)    x_k ↑ x

means in the first place that x_k ≤ x_{k+1} ≤ x and in the second place that either (i) x < ∞ and there is convergence in the usual sense, or (ii) x_k = ∞ for some k, or (iii) x = ∞ and the x_k are finite reals converging to infinity in the usual sense.

Measures
A set function μ on a field ℱ in Ω is a measure if it satisfies these conditions:

(i) μ(A) ∈ [0, ∞] for A ∈ ℱ;
(ii) μ(∅) = 0;
(iii) if A₁, A₂, … is a disjoint sequence of ℱ-sets and if ∪_{k=1}^∞ A_k ∈ ℱ, then (see (10.3))

μ(∪_{k=1}^∞ A_k) = Σ_{k=1}^∞ μ(A_k).

The measure μ is finite or infinite as μ(Ω) < ∞ or μ(Ω) = ∞; it is a probability measure if μ(Ω) = 1, as in Chapter 1. If Ω = A₁ ∪ A₂ ∪ ⋯ for some finite or countable sequence of ℱ-sets satisfying μ(A_k) < ∞, then μ is σ-finite. The significance of this concept will be seen later. A finite measure is by definition σ-finite; a σ-finite measure may be finite or infinite. If 𝒜 is a subclass of ℱ, then μ is σ-finite on 𝒜 if Ω = ∪_k A_k for some finite or infinite sequence of 𝒜-sets satisfying μ(A_k) < ∞. It is important to understand that σ-finiteness is a joint property of the space Ω, the measure μ, and the class 𝒜.

If μ is a measure on a σ-field ℱ in Ω, the triple (Ω, ℱ, μ) is a measure space. (This term is not used if ℱ is merely a field.) It is an infinite, a σ-finite, a finite, or a probability measure space according as μ has the corresponding property. If μ(A^c) = 0 for an ℱ-set A, then A is a support of μ, and μ is concentrated on A. For a finite measure, A is a support if and only if μ(A) = μ(Ω).

The pair (Ω, ℱ) itself is a measurable space if ℱ is a σ-field in Ω. To say that μ is a measure on (Ω, ℱ) indicates clearly both the space and the class of sets involved.

As in the case of probability measures, (iii) above is the condition of countable additivity, and it implies finite additivity: if A₁, …, A_n are disjoint ℱ-sets, then

μ(A₁ ∪ ⋯ ∪ A_n) = Σ_{k=1}^n μ(A_k).

As in the case of probability measures, if this holds for n = 2, then it extends inductively to all n.
Example 10.2. A measure μ on (Ω, ℱ) is discrete if there are finitely or countably many points ω_i in Ω and masses m_i in [0, ∞] such that μ(A) = Σ_{ω_i ∈ A} m_i for A ∈ ℱ. It is an infinite, a finite, or a probability measure as Σ_i m_i diverges, or converges, or converges to 1; the last case was treated in Example 2.9. If ℱ contains each singleton {ω_i}, then μ is σ-finite if and only if m_i < ∞ for all i. ∎

Example 10.3. Let ℱ be the σ-field of all subsets of an arbitrary Ω, and let μ(A) be the number of points in A, where μ(A) = ∞ if A is not finite. This μ is counting measure; it is finite if and only if Ω is finite, and is σ-finite if and only if Ω is countable. Even if ℱ does not contain every subset of Ω, counting measure is well defined on ℱ. ∎

Example 10.4. Specifying a measure includes specifying its domain. If μ is a measure on a field ℱ and ℱ₀ is a field contained in ℱ, then the restriction μ₀ of μ to ℱ₀ is also a measure. Although often denoted by the same symbol, μ₀ is really a different measure from μ unless ℱ₀ = ℱ. Its properties may be different: if μ is counting measure on the σ-field ℱ of all subsets of a countably infinite Ω, then μ is σ-finite, but its restriction to the σ-field ℱ₀ = {∅, Ω} is not σ-finite. ∎
Certain properties of probability measures carry over immediately to the general case. First, μ is monotone: μ(A) ≤ μ(B) if A ⊂ B. This is derived, just like its special case (2.5), from μ(A) + μ(B − A) = μ(B). But it is possible to go on and write μ(B − A) = μ(B) − μ(A) only if μ(B) < ∞. If μ(B) = ∞ and μ(A) < ∞, then μ(B − A) = ∞; but for every a ∈ [0, ∞] there are cases where μ(A) = μ(B) = ∞ and μ(B − A) = a. The inclusion-exclusion formula (2.9) also carries over without change to ℱ-sets of finite measure:

(10.5)    μ(∪_{k=1}^n A_k) = Σ_i μ(A_i) − Σ_{i<j} μ(A_i ∩ A_j) + ⋯ + (−1)^{n+1} μ(A₁ ∩ ⋯ ∩ A_n).

The proof of finite subadditivity also goes through just as before:

μ(∪_{k=1}^n A_k) ≤ Σ_{k=1}^n μ(A_k);

here the A_k need not have finite measure.

Theorem 10.2.
Let μ be a measure on a field ℱ.

(i) Continuity from below: If A_n and A lie in ℱ and A_n ↑ A, then† μ(A_n) ↑ μ(A).
(ii) Continuity from above: If A_n and A lie in ℱ and A_n ↓ A, and if μ(A₁) < ∞, then μ(A_n) ↓ μ(A).
(iii) Countable subadditivity: If A₁, A₂, … and ∪_{k=1}^∞ A_k lie in ℱ, then

μ(∪_{k=1}^∞ A_k) ≤ Σ_{k=1}^∞ μ(A_k).

(iv) If μ is σ-finite on ℱ, then ℱ cannot contain an uncountable, disjoint collection of sets of positive μ-measure.

†See (10.4).

PROOF. The proofs of (i) and (iii) are exactly as for the corresponding parts of Theorem 2.1. The same is essentially true of (ii): if μ(A₁) < ∞, subtraction is possible, and A₁ − A_n ↑ A₁ − A implies that μ(A₁) − μ(A_n) = μ(A₁ − A_n) ↑ μ(A₁ − A) = μ(A₁) − μ(A).

There remains (iv). Let [B_θ: θ ∈ Θ] be a disjoint collection of ℱ-sets satisfying μ(B_θ) > 0. Consider an ℱ-set A for which μ(A) < ∞. If θ₁, …, θ_n are distinct indices satisfying μ(A ∩ B_{θ_i}) ≥ ε > 0, then nε ≤ Σ_{i=1}^n μ(A ∩ B_{θ_i}) ≤ μ(A), and so n ≤ μ(A)/ε. Thus the index set [θ: μ(A ∩ B_θ) ≥ ε] is finite,
and hence (take the union over positive rational ε) [θ: μ(A ∩ B_θ) > 0] is countable. Since μ is σ-finite, Ω = ∪_k A_k for some finite or countable sequence of ℱ-sets A_k satisfying μ(A_k) < ∞. But then Θ_k = [θ: μ(A_k ∩ B_θ) > 0] is countable for each k. Since μ(B_θ) > 0, for each θ there is a k for which μ(A_k ∩ B_θ) > 0, and so Θ = ∪_k Θ_k: Θ is indeed countable. ∎

Uniqueness
According to Theorem 3.3, probability measures agreeing on a π-system 𝒫 agree on σ(𝒫). There is an extension to the general case.

Theorem 10.3. Suppose that μ₁ and μ₂ are measures on σ(𝒫), where 𝒫 is a π-system, and suppose they are σ-finite on 𝒫. If μ₁ and μ₂ agree on 𝒫, then they agree on σ(𝒫).

PROOF. Suppose that B ∈ 𝒫 and μ₁(B) = μ₂(B) < ∞, and let ℒ_B be the class of sets A in σ(𝒫) for which μ₁(B ∩ A) = μ₂(B ∩ A). Then ℒ_B is a λ-system containing 𝒫 and hence (Theorem 3.2) containing σ(𝒫).

By σ-finiteness there exist 𝒫-sets B_k satisfying Ω = ∪_k B_k and μ₁(B_k) = μ₂(B_k) < ∞. By the inclusion-exclusion formula (10.5),

μ_α(∪_{i=1}^n (B_i ∩ A)) = Σ_{1≤i≤n} μ_α(B_i ∩ A) − Σ_{1≤i<j≤n} μ_α(B_i ∩ B_j ∩ A) + ⋯

for α = 1, 2 and all n. Since 𝒫 is a π-system containing the B_i, it contains the B_i ∩ B_j, and so on. For each σ(𝒫)-set A, the terms on the right above are therefore the same for α = 1 as for α = 2. The left side is then the same for α = 1 as for α = 2; letting n → ∞ gives μ₁(A) = μ₂(A). ∎

Theorem 10.4. Suppose that μ₁ and μ₂ are finite measures on σ(𝒫), where 𝒫 is a π-system and Ω is a finite or countable union of sets in 𝒫. If μ₁ and μ₂ agree on 𝒫, then they agree on σ(𝒫).

PROOF. By hypothesis, Ω = ∪_k B_k for 𝒫-sets B_k, and of course μ_α(B_k) ≤ μ_α(Ω) < ∞, α = 1, 2. Thus μ₁ and μ₂ are σ-finite on 𝒫, and Theorem 10.3 applies. ∎
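Theorem 10.3 can be illustrated on a finite space, where every measure is trivially finite and σ-finite. In the sketch below (an added illustration; the masses are arbitrary choices), two different mass functions agree on a π-system 𝒫 and therefore agree on every set of σ(𝒫), while differing on a set outside σ(𝒫).

```python
from itertools import combinations

def close_to_field(space, gens):
    """Generate the (sigma-)field on a finite space by closing under
    complement and pairwise union."""
    field = {frozenset(), frozenset(space)} | {frozenset(g) for g in gens}
    grew = True
    while grew:
        grew = False
        for a in list(field):
            c = frozenset(space) - a
            if c not in field:
                field.add(c); grew = True
        for a, b in combinations(list(field), 2):
            if a | b not in field:
                field.add(a | b); grew = True
    return field

omega = {1, 2, 3, 4}
P = [{1}, {1, 2}]                       # a pi-system: {1} & {1, 2} = {1} is in P
m1 = {1: 1.0, 2: 2.0, 3: 1.0, 4: 3.0}  # two mass functions agreeing on P
m2 = {1: 1.0, 2: 2.0, 3: 2.0, 4: 2.0}

def mu(masses, a):
    return sum(masses[w] for w in a)

sigma_P = close_to_field(omega, P)
agree_on_sigma = all(mu(m1, a) == mu(m2, a) for a in sigma_P)
differ_somewhere = mu(m1, {3}) != mu(m2, {3})   # {3} is not in sigma(P)
```

The atoms of σ(𝒫) are {1}, {2}, {3, 4}; the two measures agree on each atom, hence on all of σ(𝒫), but they split the mass of {3, 4} differently.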
Example 10.5. If 𝒫 consists of the empty set alone, then it is a π-system and σ(𝒫) = {∅, Ω}. Any two finite measures agree on 𝒫, but of course they need not agree on σ(𝒫). Theorem 10.4 does not apply in this case, because Ω is not a countable union of sets in 𝒫. For the same reason, no measure on σ(𝒫) is σ-finite on 𝒫, and hence Theorem 10.3 does not apply either. ∎
Example 10.6. Suppose that (Ω, ℱ) = (R¹, ℛ¹) and 𝒫 consists of the half-infinite intervals (−∞, x]. By Theorem 10.4, two finite measures on ℱ that agree on 𝒫 also agree on ℱ. The 𝒫-sets of finite measure required in the definition of σ-finiteness cannot in this example be made disjoint. ∎
Example 10.7. If a measure on (Ω, ℱ) is σ-finite on a subfield ℱ₀ of ℱ, then Ω = ∪_k B_k for disjoint ℱ₀-sets B_k of finite measure: if they are not disjoint, replace B_k by B_k ∩ B₁^c ∩ ⋯ ∩ B_{k−1}^c. ∎
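The disjointification in Example 10.7 is mechanical; here is a sketch in code (an added illustration, with arbitrary sample sets):

```python
def disjointify(sets):
    """Replace B_1, B_2, ... by D_k = B_k - (B_1 u ... u B_{k-1})."""
    seen, out = set(), []
    for b in sets:
        out.append(set(b) - seen)   # the part of B_k not covered earlier
        seen |= set(b)
    return out

B = [{1, 2, 3}, {2, 3, 4}, {4, 5}, {1, 5, 6}]
D = disjointify(B)
# The D_k are pairwise disjoint and have the same union as the B_k.
```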
The proof of Theorem 10.3 simplifies slightly if Ω = ∪_k B_k for disjoint 𝒫-sets with μ₁(B_k) = μ₂(B_k) < ∞, because additivity itself can be used in place of the inclusion-exclusion formula.

PROBLEMS
10.1. Show that if conditions (i) and (iii) in the definition of measure hold, and if μ(A) < ∞ for some A ∈ ℱ, then condition (ii) holds.

10.2. On the σ-field of all subsets of Ω = {1, 2, …} put μ(A) = Σ_{k∈A} 2^{−k} if A is finite and μ(A) = ∞ otherwise. Is μ finitely additive? Countably additive?
10.3. (a) In connection with Theorem 10.2(ii), show that if A_n ↓ A and μ(A_k) < ∞ for some k, then μ(A_n) ↓ μ(A).
(b) Find an example in which A_n ↓ A, μ(A_n) = ∞, and A = ∅.
10.4. The natural generalization of (4.9) is

(10.6)    μ(lim inf_n A_n) ≤ lim inf_n μ(A_n) ≤ lim sup_n μ(A_n) ≤ μ(lim sup_n A_n).

Show that the left-hand inequality always holds. Show that the right-hand inequality holds if μ(∪_{k≥n} A_k) < ∞ for some n but can fail otherwise.
10.5. 3.10 A measure space (Ω, ℱ, μ) is complete if A ⊂ B, B ∈ ℱ, and μ(B) = 0 together imply that A ∈ ℱ; the definition is just as in the probability case. Use the ideas of Problem 3.10 to construct a complete measure space (Ω, ℱ⁺, μ⁺) such that ℱ ⊂ ℱ⁺ and μ and μ⁺ agree on ℱ.

10.6. The condition in Theorem 10.2(iv) essentially characterizes σ-finiteness.
(a) Suppose that (Ω, ℱ, μ) has no "infinite atoms," in the sense that for every A in ℱ, if μ(A) = ∞, then there is in ℱ a B such that B ⊂ A and 0 < μ(B) < ∞. Show that if ℱ does not contain an uncountable, disjoint collection of sets of positive measure, then μ is σ-finite. (Use Zorn's lemma.)
(b) Show by example that this is false without the condition that there are no "infinite atoms."
10.7. Example 10.5 shows that Theorem 10.3 fails without the σ-finiteness condition. Construct other examples of this kind.
SECTION 11. OUTER MEASURE

Outer Measure

An outer measure is a set function μ* that is defined for all subsets of a space Ω and has these four properties:

(i) μ*(A) ∈ [0, ∞] for every A ⊂ Ω;
(ii) μ*(∅) = 0;
(iii) μ* is monotone: A ⊂ B implies μ*(A) ≤ μ*(B);
(iv) μ* is countably subadditive: μ*(∪_n A_n) ≤ Σ_n μ*(A_n).

The set function P* defined by (3.1) is an example, one which generalizes:

Example 11.1. Let ρ be a set function on a class 𝒜 in Ω. Assume that ∅ ∈ 𝒜 and ρ(∅) = 0, and that ρ(A) ∈ [0, ∞] for A ∈ 𝒜; ρ and 𝒜 are otherwise arbitrary. Put

(11.1)    μ*(A) = inf Σ_n ρ(A_n),

where the infimum extends over all finite and countable coverings of A by 𝒜-sets A_n. If no such covering exists, take μ*(A) = ∞ in accordance with the convention that the infimum over an empty set is ∞.

That μ* satisfies (i), (ii), and (iii) is clear. If μ*(A_n) = ∞ for some n, then obviously μ*(∪_n A_n) ≤ Σ_n μ*(A_n). Otherwise, given ε, cover each A_n by 𝒜-sets B_{nk} satisfying Σ_k ρ(B_{nk}) ≤ μ*(A_n) + ε/2^n; then μ*(∪_n A_n) ≤ Σ_{n,k} ρ(B_{nk}) ≤ Σ_n μ*(A_n) + ε. Thus μ* is an outer measure. ∎

Define A to be μ*-measurable if

(11.2)    μ*(A ∩ E) + μ*(A^c ∩ E) = μ*(E)

for every E. This is the general version of the definition (3.4) used in Section 3. By subadditivity it is equivalent to

(11.3)    μ*(A ∩ E) + μ*(A^c ∩ E) ≤ μ*(E).

Denote by ℳ(μ*) the class of μ*-measurable sets. The extension property for probability measures in Theorem 3.1 was proved by a sequence of lemmas, the first three of which carry over directly to the case of the general outer measure: if P* is replaced by μ* and ℳ by ℳ(μ*) at each occurrence, the proofs hold word for word, symbol for symbol.
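On a finite space the infimum in (11.1) and the criterion (11.2) can be evaluated by brute force. The sketch below (an added illustration; the class and the "length" ρ are arbitrary choices) builds μ* from coverings and then tests Carathéodory measurability: {2} splits every E additively, while {0} does not, since the only way to cover {0} is by the larger set {0, 1}.

```python
from itertools import chain, combinations

omega = frozenset({0, 1, 2, 3})
# A small class containing the empty set, with rho(a) = number of points of a.
cls = [frozenset(), frozenset({0, 1}), frozenset({2}), frozenset({3}), omega]
rho = {a: len(a) for a in cls}

def mu_star(A):
    """(11.1): inf over coverings of A by sets from cls of the total rho."""
    best = float("inf")
    for r in range(len(cls) + 1):
        for cover in combinations(cls, r):
            if set(A) <= set(chain.from_iterable(cover)):
                best = min(best, sum(rho[c] for c in cover))
    return best

def measurable(A):
    """Caratheodory criterion (11.2), tested against every E in omega."""
    A = frozenset(A)
    subsets = [frozenset(s) for r in range(5) for s in combinations(omega, r)]
    return all(mu_star(E & A) + mu_star(E - A) == mu_star(E) for E in subsets)
```

Here μ*({0}) = 2 even though {0} has one point, and the failure of (11.2) at E = {0, 1} is exactly what excludes {0} from ℳ(μ*).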
In particular, an examination of the arguments shows that ∞ as a possible value for μ* does not require any changes. Lemma 3 in Section 3 becomes this:

Theorem 11.1. If μ* is an outer measure, then ℳ(μ*) is a σ-field, and μ* restricted to ℳ(μ*) is a measure.

This will be used to prove an extension theorem, but it has other applications as well.

Extension

Theorem 11.2. A measure on a field has an extension to the generated σ-field.
If the original measure on the field is σ-finite, then it follows by Theorem 10.3 that the extension is unique. Theorem 11.2 can be deduced from Theorem 11.1 by the arguments used in the proof of Theorem 3.1.† It is unnecessary to retrace the steps, however, because the ideas will appear in stronger form in the proof of the next result, which generalizes Theorem 11.2.

Define a class 𝒜 of subsets of Ω to be a semiring if

(i) ∅ ∈ 𝒜;
(ii) A, B ∈ 𝒜 implies A ∩ B ∈ 𝒜;
(iii) if A, B ∈ 𝒜 and A ⊂ B, then there exist disjoint 𝒜-sets C₁, …, C_n such that B − A = ∪_{k=1}^n C_k.

The class of finite intervals in Ω = R¹ and the class of subintervals of Ω = (0, 1] are the simplest examples of semirings. Note that a semiring need not contain Ω.

Theorem 11.3. Suppose that μ is a set function on a semiring 𝒜. Suppose that μ has values in [0, ∞], that μ(∅) = 0, and that μ is finitely additive and countably subadditive. Then μ extends to a measure on σ(𝒜).

This contains Theorem 11.2, because the conditions are all satisfied if 𝒜 is a field and μ is a measure on it. If Ω = ∪_k A_k for a sequence of 𝒜-sets satisfying μ(A_k) < ∞, then it follows by Theorem 10.3 that the extension is unique.

†See also Problem 11.1.
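Condition (iii) is concrete for intervals: removing a contained (a₁, a₂] from (b₁, b₂] leaves at most two disjoint intervals. A minimal sketch (an added illustration; intervals are encoded as endpoint pairs):

```python
def interval_diff(b, a):
    """For intervals (x, y] with a contained in b, write b - a as a
    disjoint union of at most two intervals."""
    (b1, b2), (a1, a2) = b, a
    assert b1 <= a1 <= a2 <= b2, "a must be contained in b"
    pieces = [(b1, a1), (a2, b2)]
    return [p for p in pieces if p[0] < p[1]]   # drop empty intervals

# (0, 10] minus (3, 7] = (0, 3] together with (7, 10].
parts = interval_diff((0, 10), (3, 7))
```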
PROOF. If A, B, and the C_k are related as in condition (iii) in the definition of semiring, then by finite additivity μ(B) = μ(A) + Σ_{k=1}^n μ(C_k) ≥ μ(A). Thus μ is monotone.

Define an outer measure μ* by (11.1) for ρ = μ:

(11.4)    μ*(A) = inf Σ_n μ(A_n),

the infimum extending over coverings of A by 𝒜-sets.

The first step is to show that 𝒜 ⊂ ℳ(μ*). Suppose that A ∈ 𝒜. If μ*(E) = ∞, then (11.3) holds trivially. If μ*(E) < ∞, for given ε choose 𝒜-sets A_n such that E ⊂ ∪_n A_n and Σ_n μ(A_n) < μ*(E) + ε. Since 𝒜 is a semiring, B_n = A ∩ A_n lies in 𝒜, and A^c ∩ A_n = A_n − B_n has the form ∪_{k=1}^{m_n} C_{nk} for disjoint 𝒜-sets C_{nk}. Note that A_n = B_n ∪ ∪_{k=1}^{m_n} C_{nk}, where the union is disjoint, and that A ∩ E ⊂ ∪_n B_n and A^c ∩ E ⊂ ∪_n ∪_{k=1}^{m_n} C_{nk}. By the definition of μ* and the assumed finite additivity of μ,

μ*(A ∩ E) + μ*(A^c ∩ E) ≤ Σ_n μ(B_n) + Σ_n Σ_{k=1}^{m_n} μ(C_{nk}) = Σ_n μ(A_n) < μ*(E) + ε.

Since ε is arbitrary, (11.3) follows. Thus 𝒜 ⊂ ℳ(μ*).

The next step is to show that μ* and μ agree on 𝒜. If A ⊂ ∪_n A_n for 𝒜-sets A and A_n, then by the assumed countable subadditivity of μ and the monotonicity established above, μ(A) ≤ Σ_n μ(A ∩ A_n) ≤ Σ_n μ(A_n). Therefore, A ∈ 𝒜 implies that μ(A) ≤ μ*(A) and hence, since the reverse inequality is an immediate consequence of (11.4), μ(A) = μ*(A). Thus μ* agrees with μ on 𝒜.

Since 𝒜 ⊂ ℳ(μ*) and ℳ(μ*) is a σ-field (Theorem 11.1), σ(𝒜) ⊂ ℳ(μ*). Since μ* is countably additive when restricted to ℳ(μ*) (Theorem 11.1 again), μ* further restricted to σ(𝒜) is an extension of μ on 𝒜, as required. ∎
Example 11.2. For 𝒜 take the semiring of subintervals of Ω = (0, 1] (together with the empty set). For μ take length λ: λ(a, b] = b − a. The finite additivity and countable subadditivity of λ follow by Theorem 1.3.† By Theorem 11.3, λ extends to a measure on the class σ(𝒜) = ℬ of Borel sets in (0, 1]. ∎

†On a field, countable additivity implies countable subadditivity, and λ is in fact countably additive on 𝒜; but 𝒜 is merely a semiring. Hence the separate consideration of additivity and subadditivity; but see Problem 11.2.
This gives a second construction of Lebesgue measure in the unit interval. In the first construction λ was extended first from the class of intervals to the field ℬ₀ of finite disjoint unions of intervals (see Theorem 2.2) and then by Theorem 11.2 (in its special form Theorem 3.1) from ℬ₀ to ℬ = σ(ℬ₀). Using Theorem 11.3 instead of Theorem 11.2 effects a slight economy, since the extension then goes from 𝒜 directly to ℬ without the intermediate stop at ℬ₀, and the arguments involving (2.13) and (2.14) become unnecessary.

Example 11.3. In Theorem 11.3 take for 𝒜 the semiring of finite intervals on the real line R¹, and consider λ₁(a, b] = b − a. The arguments for Theorem 1.3 in no way require that the (finite) intervals in question be contained in (0, 1], and so λ₁ is finitely additive and countably subadditive on this class 𝒜. Hence λ₁ extends to the σ-field ℛ¹ of linear Borel sets, which is by definition generated by 𝒜. This defines Lebesgue measure λ₁ over the whole real line. ∎
A subset of (0, 1] lies in ℬ if and only if it lies in ℬ¹ (see (10.2)). Now λ₁(A) = λ(A) for subintervals A of (0, 1], and it follows by uniqueness (Theorem 3.3) that λ₁(A) = λ(A) for all A in ℬ. Thus there is no inconsistency in dropping λ₁ and using λ to denote Lebesgue measure on ℬ¹ as well as on ℬ.
Example 11.4. The class of bounded rectangles in Rᵏ is a semiring, a fact needed in the next section. Suppose that A = [x: xᵢ ∈ Iᵢ, i ≤ k] and B = [x: xᵢ ∈ Jᵢ, i ≤ k] are nonempty rectangles, the Iᵢ and Jᵢ being finite intervals. If A ⊂ B, then Iᵢ ⊂ Jᵢ, so that Jᵢ − Iᵢ is a disjoint union Iᵢ′ ∪ Iᵢ″ of intervals (possibly empty). Consider the 3ᵏ disjoint rectangles [x: xᵢ ∈ Uᵢ, i ≤ k], where for each i, Uᵢ is Iᵢ or Iᵢ′ or Iᵢ″. One of these rectangles is A itself, and B − A is the union of the others. The rectangles thus form a semiring. ∎

An Approximation Theorem
If 𝒜 is a semiring, then by Theorem 10.3 a measure on σ(𝒜) is determined by its values on 𝒜 if it is σ-finite there. Theorem 11.4 shows more explicitly how the measure of a σ(𝒜)-set can be approximated by the measures of 𝒜-sets.
Lemma 1. If A, A₁, …, Aₙ are sets in a semiring 𝒜, then there are disjoint 𝒜-sets C₁, …, Cₘ such that

A ∩ A₁ᶜ ∩ ⋯ ∩ Aₙᶜ = C₁ ∪ ⋯ ∪ Cₘ.

PROOF. The case n = 1 follows from the definition of semiring applied to A ∩ A₁ᶜ = A − (A ∩ A₁). If the result holds for n, then A ∩ A₁ᶜ ∩ ⋯ ∩ Aₙ₊₁ᶜ = ∪ⱼ₌₁ᵐ (Cⱼ ∩ Aₙ₊₁ᶜ); apply the case n = 1 to each set in the union. ∎
Theorem 11.4. Suppose that 𝒜 is a semiring, μ is a measure on ℱ = σ(𝒜), and μ is σ-finite on 𝒜.

(i) If B ∈ ℱ and ε > 0, there exists a finite or infinite disjoint sequence A₁, A₂, … of 𝒜-sets such that B ⊂ ∪ₖAₖ and μ((∪ₖAₖ) − B) < ε.
(ii) If B ∈ ℱ and ε > 0, and if μ(B) < ∞, then there exists a finite disjoint sequence A₁, …, Aₙ of 𝒜-sets such that μ(B △ (∪ₖ₌₁ⁿ Aₖ)) < ε.
PROOF. Return to the proof of Theorem 11.3. If μ* is the outer measure defined by (11.4), then ℱ ⊂ ℳ(μ*) and μ* agrees with μ on 𝒜, as was shown. Since μ* restricted to ℱ is a measure, it follows by Theorem 10.3 that μ* agrees with μ on ℱ as well.
Suppose now that B lies in ℱ and μ(B) = μ*(B) < ∞. There exist 𝒜-sets Aₖ such that B ⊂ ∪ₖAₖ and μ(∪ₖAₖ) ≤ Σₖ μ(Aₖ) < μ(B) + ε; but then μ((∪ₖAₖ) − B) < ε. To make the sequence {Aₖ} disjoint, replace Aₖ by Aₖ ∩ A₁ᶜ ∩ ⋯ ∩ Aₖ₋₁ᶜ; by Lemma 1, each of these sets is a finite disjoint union of sets in 𝒜.
Next suppose that B lies in ℱ and μ(B) = μ*(B) = ∞. By σ-finiteness there exist 𝒜-sets Cₘ such that Ω = ∪ₘCₘ and μ(Cₘ) < ∞. By what has just been shown, there exist 𝒜-sets Aₘₖ such that B ∩ Cₘ ⊂ ∪ₖAₘₖ and μ((∪ₖAₘₖ) − (B ∩ Cₘ)) < ε/2ᵐ. The sets Aₘₖ taken all together provide a sequence A₁, A₂, … of 𝒜-sets satisfying B ⊂ ∪ₖAₖ and μ((∪ₖAₖ) − B) < ε. As before, the Aₖ can be made disjoint.
To prove part (ii), consider the Aₖ of part (i). If B has finite measure, so has A = ∪ₖAₖ, and hence by continuity from above (Theorem 10.2(ii)), μ(A − ∪ₖ≤ₙAₖ) < ε for some n. But then μ(B △ (∪ₖ₌₁ⁿ Aₖ)) < 2ε. ∎
If, for example, B is a linear Borel set of finite Lebesgue measure, then λ(B △ (∪ₖ₌₁ⁿ Aₖ)) < ε for some disjoint collection of finite intervals A₁, …, Aₙ.
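Part (ii) can be tried numerically. In the sketch below (our own illustration in Python; the set B and all names are our assumptions, not the text's), B is the union of the disjoint intervals (1/(2n+1), 1/(2n)], n ≥ 1, with λ(B) = 1 − log 2; keeping the largest component intervals until the tail measure drops below ε produces the finite disjoint collection of part (ii):

```python
import math

# B = union over n >= 1 of the disjoint intervals (1/(2n+1), 1/(2n)],
# a Borel set of finite Lebesgue measure lambda(B) = 1 - log 2.
LAM_B = 1.0 - math.log(2.0)

def approx(eps):
    """Disjoint intervals A_1, ..., A_N with lambda(B sym-diff union A_k) < eps.

    Because the A_k are taken from B itself, the symmetric difference is
    just the tail of B: its measure is lambda(B) minus the length covered."""
    picked, covered = [], 0.0
    n = 1
    while LAM_B - covered >= eps:     # tail measure still at least eps
        a, b = 1.0 / (2 * n + 1), 1.0 / (2 * n)
        picked.append((a, b))
        covered += b - a
        n += 1
    return picked, LAM_B - covered    # the intervals and the leftover measure

As, err = approx(0.01)
```

Here err = λ(B △ ∪ₖAₖ) < 0.01 is achieved with only a few dozen intervals, since the tail of B beyond the first N components has measure of order 1/(4N).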
Corollary 1. If μ is a finite measure on a σ-field ℱ generated by a field ℱ₀, then for each ℱ-set A and each positive ε there is an ℱ₀-set B such that μ(A △ B) < ε.
PROOF. This is of course an immediate consequence of part (ii) of the theorem, but there is a simple direct argument. Let 𝒢 be the class of ℱ-sets with the required property. Since Aᶜ △ Bᶜ = A △ B, 𝒢 is closed under complementation. If A = ∪ₙAₙ, where Aₙ ∈ 𝒢, given ε choose n₀ so that μ(A − ∪ₙ≤ₙ₀Aₙ) < ε, and then choose ℱ₀-sets Bₙ, n ≤ n₀, so that μ(Aₙ △ Bₙ) < ε/n₀. Since (∪ₙ≤ₙ₀Aₙ) △ (∪ₙ≤ₙ₀Bₙ) ⊂ ∪ₙ≤ₙ₀(Aₙ △ Bₙ), the ℱ₀-set B = ∪ₙ≤ₙ₀Bₙ satisfies μ(A △ B) < 2ε. Of course ℱ₀ ⊂ 𝒢; since 𝒢 is a σ-field, ℱ ⊂ 𝒢, as required. ∎
Corollary 2. Suppose that 𝒜 is a semiring, Ω is a countable union of 𝒜-sets, and μ₁, μ₂ are measures on ℱ = σ(𝒜). If μ₁(A) = μ₂(A) < ∞ for A ∈ 𝒜, then μ₁(B) = μ₂(B) for B ∈ ℱ.
… ≥ 0 (pointwise) implies Λ(f) ≥ 0, and continuous from above at 0 in the sense that fₙ ↓ 0 (pointwise) implies Λ(fₙ) → 0.
(a) If f ≤ g (f, g ∈ ℒ), define in Ω × R¹ an "interval"

(11.5)  (f, g] = [(ω, t): f(ω) < t ≤ g(ω)].

Show that these sets form a semiring 𝒜₀.
(b) Define a set function ν₀ on 𝒜₀ by

(11.6)  ν₀(f, g] = Λ(g − f).

Show that ν₀ is finitely additive and countably subadditive on 𝒜₀.
11.5. ↑ (a) Assume f ∈ ℒ and let fₙ = (n(f − f ∧ 1)) ∧ 1. Show that f(ω) ≤ 1 implies fₙ(ω) = 0 for all n and f(ω) > 1 implies fₙ(ω) = 1 for all sufficiently large n. Conclude that for x > 0,

(11.7)  (0, xfₙ] ↑ [ω: f(ω) > 1] × (0, x].

(b) Let ℱ be the smallest σ-field with respect to which every f in ℒ is measurable: ℱ = σ[f⁻¹H: f ∈ ℒ, H ∈ ℬ¹]. Let 𝒜₁ be the class of A in ℱ for which A × (0, 1] ∈ σ(𝒜₀). Show that 𝒜₁ is a semiring and that ℱ = σ(𝒜₁).
(c) Let ν be the extension of ν₀ (see (11.6)) to σ(𝒜₀), and for A ∈ 𝒜₁ define μ₀(A) = ν(A × (0, 1]). Show that μ₀ is finitely additive and countably subadditive on the semiring 𝒜₁.

SECTION 12. MEASURES IN EUCLIDEAN SPACE

Lebesgue Measure
In Example 11.3 Lebesgue measure λ was constructed on the class ℬ¹ of linear Borel sets. By Theorem 10.3, λ is the only measure on ℬ¹ satisfying λ(a, b] = b − a for all intervals. There is an analogous k-dimensional Lebesgue measure λₖ in k-space, on the class ℬᵏ of k-dimensional Borel sets (Example 10.1). It is specified by the requirement that bounded rectangles have measure

(12.1)  λₖ[x: aᵢ < xᵢ ≤ bᵢ, i = 1, …, k] = ∏ᵢ₌₁ᵏ (bᵢ − aᵢ).

This is ordinary volume, that is, length (k = 1), area (k = 2), volume (k = 3), or hypervolume (k ≥ 4).
Since an intersection of rectangles is again a rectangle, the uniqueness theorem shows that (12.1) completely determines λₖ. That there does exist such a measure on ℬᵏ can be proved in several ways. One is to use the ideas involved in the case k = 1. A second construction is given in Theorem 12.5. A third, independent construction uses the general theory of product measures; this is carried out in Section 18.† For the moment, assume the existence on ℬᵏ of a measure λₖ satisfying (12.1). Of course, λₖ is σ-finite.
A basic property of λₖ is translation invariance.‡
Theorem 12.1. If A ∈ ℬᵏ, then A + x = [a + x: a ∈ A] ∈ ℬᵏ and λₖ(A) = λₖ(A + x) for all x.

PROOF. If 𝒢 is the class of A such that A + x is in ℬᵏ for all x, then 𝒢 is a σ-field containing the bounded rectangles, and so 𝒢 ⊃ ℬᵏ. Thus A + x ∈ ℬᵏ for A ∈ ℬᵏ.
For fixed x define a measure μ on ℬᵏ by μ(A) = λₖ(A + x). Then μ and λₖ agree on the π-system of bounded rectangles and so agree for all Borel sets. ∎
If A is a (k − 1)-dimensional subspace and x lies outside A, the hyperplanes A + tx for real t are disjoint, and by Theorem 12.1 all have the same measure. Since only countably many disjoint sets can have positive measure (Theorem 10.2(iv)), the measure common to the A + tx must be 0. Every (k − 1)-dimensional hyperplane has k-dimensional Lebesgue measure 0.
The Lebesgue measure of a rectangle is its ordinary volume. The following theorem makes it possible to calculate the measures of simple figures.

Theorem 12.2. If T: Rᵏ → Rᵏ is linear and nonsingular, then A ∈ ℬᵏ implies that TA ∈ ℬᵏ and

(12.2)  λₖ(TA) = |det T| · λₖ(A).

Since a parallelepiped is the image of a rectangle under a linear transformation, (12.2) can be used to compute its volume. If T is a rotation or a reflection (an orthogonal or a unitary transformation), then det T = ±1, and so λₖ(TA) = λₖ(A). Hence every rigid transformation or isometry (an orthogonal transformation followed by a translation) preserves Lebesgue measure. An affine transformation has the form Fx = Tx + x₀ (the general

† See also Problems 17.14 and 20.4.
‡ An analogous fact was used in the construction of a nonmeasurable set on p. 45.
linear transformation T followed by a translation); it is nonsingular if T is. It follows by Theorems 12.1 and 12.2 that λₖ(FA) = |det T| · λₖ(A) in the nonsingular case.

PROOF OF THE THEOREM. Since T∪ₙAₙ = ∪ₙTAₙ and TAᶜ = (TA)ᶜ because of the assumed nonsingularity of T, the class 𝒢 = [A: TA ∈ ℬᵏ] is a σ-field. Since TA is open for open A, it follows again by the assumed nonsingularity of T that 𝒢 contains all the open sets and hence (Example 10.1) all the Borel sets. Therefore, A ∈ ℬᵏ implies TA ∈ ℬᵏ.
For A ∈ ℬᵏ, set μ₁(A) = λₖ(TA) and μ₂(A) = |det T| · λₖ(A). Then μ₁ and μ₂ are measures, and by Theorem 10.3 they will agree on ℬᵏ (which is the assertion (12.2)) if they agree on the π-system consisting of the rectangles [x: aᵢ < xᵢ ≤ bᵢ, i = 1, …, k] for which the aᵢ and the bᵢ are all rational (Example 10.1). It suffices therefore to prove (12.2) for rectangles with sides of rational length. Since such a rectangle is a finite disjoint union of cubes and λₖ is translation-invariant, it is enough to check (12.2) for cubes
(12.3)  A = [x: 0 < xᵢ ≤ c, i = 1, …, k]

that have their lower corner at the origin.
Now the general T can by elementary row and column operations† be represented as a product of linear transformations of these three special forms:
(1°) T(x₁, …, xₖ) = (x_{π1}, …, x_{πk}), where π is a permutation of the set {1, 2, …, k};
(2°) T(x₁, …, xₖ) = (ax₁, x₂, …, xₖ);
(3°) T(x₁, …, xₖ) = (x₁ + x₂, x₂, …, xₖ).
Because of the rule for multiplying determinants, it suffices to check (12.2) for T of these three forms. And, as observed, for each such T it suffices to consider cubes (12.3).
(1°): Such a T is a permutation matrix, and so det T = ±1. Since (12.3) is invariant under T, (12.2) is in this case obvious.
(2°): Here det T = a, and TA = [x: x₁ ∈ H, 0 < xᵢ ≤ c, i = 2, …, k], where H = (0, ac] if a > 0, H = {0} if a = 0 (although a cannot in fact be 0 if T is nonsingular), and H = [ac, 0) if a < 0. In each case, λₖ(TA) = |a| · cᵏ = |a| · λₖ(A).

† BIRKHOFF & MAC LANE, Section 8.9.
(3°): Here det T = 1. Let B = [x: 0 < xᵢ ≤ c, i = 3, …, k], where B = Rᵏ if k < 3, and define

B₁ = [x: 0 < x₁ ≤ x₂ ≤ c] ∩ B,
B₂ = [x: 0 < x₂ < x₁ ≤ c] ∩ B,
B₃ = [x: c < x₁ ≤ c + x₂, 0 < x₂ ≤ c] ∩ B.

Then A = B₁ ∪ B₂, TA = B₂ ∪ B₃, and B₁ + (c, 0, …, 0) = B₃. Since λₖ(B₁) = λₖ(B₃) by translation invariance, (12.2) follows by additivity. ∎
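Formula (12.2) lends itself to a quick numerical check. The sketch below (our own Python illustration; the particular T and the grid size are arbitrary choices, not from the text) estimates λ₂(TA) for the unit square A = (0, 1]² by counting grid-cell centers x of a bounding box whose preimages T⁻¹x land in A:

```python
# Check lambda_2(TA) = |det T| * lambda_2(A) for A = (0,1]^2 and a fixed T.

def det2(T):
    return T[0][0] * T[1][1] - T[0][1] * T[1][0]

def inv2(T):
    d = det2(T)
    return [[T[1][1] / d, -T[0][1] / d],
            [-T[1][0] / d, T[0][0] / d]]

def image_area(T, N=800):
    """Grid estimate of lambda_2(TA): scan centers of N*N cells covering
    [0,3]x[0,3] and count those whose preimage lies in the unit square."""
    Ti = inv2(T)
    h = 3.0 / N
    hits = 0
    for i in range(N):
        x1 = (i + 0.5) * h
        for j in range(N):
            x2 = (j + 0.5) * h
            y1 = Ti[0][0] * x1 + Ti[0][1] * x2
            y2 = Ti[1][0] * x1 + Ti[1][1] * x2
            if 0.0 < y1 <= 1.0 and 0.0 < y2 <= 1.0:
                hits += 1
    return hits * h * h

T = [[2.0, 1.0], [0.0, 3.0]]   # det T = 6; TA is a parallelogram inside [0,3]^2
est = image_area(T)
```

The estimate agrees with |det T| · λ₂(A) = 6 up to the boundary error of the grid, which is bounded by the cell area times the number of cells meeting the boundary of the parallelogram.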
If T is singular, then det T = 0 and TA lies in a (k − 1)-dimensional subspace. Since such a subspace has measure 0, (12.2) holds if A and TA lie in ℬᵏ. The surprising thing is that A ∈ ℬᵏ need not imply that TA ∈ ℬᵏ if T is singular. Even for a very simple transformation such as the projection T(x₁, x₂) = (x₁, 0) in the plane, there exist Borel sets A for which TA is not a Borel set.†

Regularity

Important among measures on ℬᵏ are those assigning finite measure to bounded sets. They share with λₖ the property of regularity:

Theorem 12.3. Suppose that μ is a measure on ℬᵏ such that μ(A) < ∞ if A is bounded.

(i) For A ∈ ℬᵏ and ε > 0, there exist a closed C and an open G such that C ⊂ A ⊂ G and μ(G − C) < ε.
(ii) If μ(A) < ∞, then μ(A) = sup μ(K), the supremum extending over the compact subsets K of A.

PROOF. The second part of the theorem follows from the first: μ(A) < ∞ implies that μ(A − A₀) < ε for a bounded subset A₀ of A, and it then follows from the first part that μ(A₀ − K) < ε for a closed and hence compact subset K of A₀.
† See HAUSDORFF, p. 241.
To prove (i), consider first a bounded rectangle A = [x: aᵢ < xᵢ ≤ bᵢ, i ≤ k]. The set Gₙ = [x: aᵢ < xᵢ < bᵢ + n⁻¹, i ≤ k] is open and Gₙ ↓ A. Since μ(G₁) is finite by hypothesis, it follows by continuity from above that μ(Gₙ − A) < ε for large n. A bounded rectangle can therefore be approximated from the outside by open sets.
The rectangles form a semiring (Example 11.4). For an arbitrary set A in ℬᵏ, by Theorem 11.4(i) there exist bounded rectangles Aₖ such that A ⊂ ∪ₖAₖ and μ((∪ₖAₖ) − A) < ε. Choose open sets Gₖ such that Aₖ ⊂ Gₖ and μ(Gₖ − Aₖ) < ε/2ᵏ. Then G = ∪ₖGₖ is open and μ(G − A) < 2ε. Thus the general k-dimensional Borel set can be approximated from the outside by open sets. To approximate from the inside by closed sets, pass to complements. ∎
Specifying Measures on the Line

There are on the line many measures other than λ that are important for probability theory. There is a useful way to describe the collection of all measures on ℬ¹ that assign finite measure to each bounded set. If μ is such a measure, define a real function F by

(12.4)  F(x) = μ(0, x] if x ≥ 0;  F(x) = −μ(x, 0] if x < 0.
It is because μ(A) < ∞ for bounded A that F is a finite function. Clearly, F is nondecreasing. Suppose that xₙ ↓ x. If x ≥ 0, apply part (ii) of Theorem 10.2, and if x < 0, apply part (i); in either case, F(xₙ) ↓ F(x) follows. Thus F is continuous from the right. Finally,
J.L ( a , b]
= F(b)
-
F( a )
for every bou nded interval (a, b ]. If J.L is Lebesgue measure, then (1 2.4) gives F(x ) = x. The finite intervals form a 1r-system generating .9i' 1 , and therefore by Theorem 10.3 the function F completely determines J.L through the relation (12.5)." But (1 2.5) and J.L do not determine F: if F( x ) satisfies (1 2.5), then so does F(x ) + c. On the other hand, for a given J.L, (12.5) certainly determines F to within such an additive constant. For finite J.L , it is customary to standardize F by defining it not by (1 2.4) but by
( 12.6 )
F( x) = 11- (
-
oo , x ;
]
then limx .... - oo F( x ) = O and limx --+ oo F( x ) = J.L(R 1 ). If J.L is a probability measure, F is called a distribution function (the adjective cumulative is sometimes added).
176
M EASURE
Measures J.L are often specified by means of the function theorem ensures that to each F there does exist a J.L ·
F. The following
Theorem 12.4. If F is a nondecreasing, right-continuous real function on the line, then there exists on ℬ¹ a unique measure μ satisfying (12.5) for all a and b.

As noted above, uniqueness is a simple consequence of Theorem 10.3. The proof of existence is almost the same as the construction of Lebesgue measure, the case F(x) = x. This proof is not carried through at this point, because it is contained in a parallel, more general construction for k-dimensional space in the next theorem. For a very simple argument establishing Theorem 12.4, see the second proof of Theorem 14.1.
Specifying Measures in Rᵏ
The σ-field ℬᵏ of k-dimensional Borel sets is generated by the class of bounded rectangles

(12.7)  A = [x: aᵢ < xᵢ ≤ bᵢ, i = 1, …, k]

(Example 10.1). If Iᵢ = (aᵢ, bᵢ], A has the form of a Cartesian product

(12.8)  A = I₁ × I₂ × ⋯ × Iₖ.

Consider the sets of the special form
(12.9)  Sₓ = [y: yᵢ ≤ xᵢ, i = 1, …, k];

Sₓ consists of the points "southwest" of x = (x₁, …, xₖ); in the case k = 1 it is the closed half-infinite interval (−∞, x]. Now (12.7) has the form

(12.10)  A = S_b − ∪ᵢ₌₁ᵏ S_{b(i)},

where b = (b₁, …, bₖ) and b(i) is the vertex obtained from b by replacing its ith coordinate with aᵢ.
Therefore, the class of sets (12.9) generates ℬᵏ. This class is a π-system.
The objective is to find a version of Theorem 12.4 for k-space. This will in particular give k-dimensional Lebesgue measure. The first problem is to find the analogue of (12.5). A bounded rectangle (12.7) has 2ᵏ vertices, the points x = (x₁, …, xₖ) for which each xᵢ is either aᵢ or bᵢ. Let sgn_A x, the signum of the vertex, be +1 or −1, according as the number of i (1 ≤ i ≤ k) satisfying xᵢ = aᵢ is even or odd. For a real function F on Rᵏ, the difference of F around the vertices of A is Δ_A F = Σ sgn_A x · F(x), the sum extending over the 2ᵏ vertices x of A. In the case k = 1, A = (a, b] and Δ_A F = F(b) − F(a). In the case k = 2,

Δ_A F = F(b₁, b₂) − F(b₁, a₂) − F(a₁, b₂) + F(a₁, a₂).
Since the k-dimensional analogue of (12.4) is complicated, suppose at first that μ is a finite measure on ℬᵏ and consider instead the analogue of (12.6), namely

(12.11)  F(x) = μ[y: yᵢ ≤ xᵢ, i = 1, …, k].

Suppose that Sₓ is defined by (12.9) and A is a bounded rectangle (12.7). Then

(12.12)  μ(A) = Δ_A F.

To see this, apply to the union on the right in (12.10) the inclusion-exclusion formula (10.5). The k sets in the union give 2ᵏ − 1 intersections, and these are the sets Sₓ for x ranging over the vertices of A other than (b₁, …, bₖ). Taking into account the signs in (10.5) leads to (12.12).
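The difference Δ_A F is easy to compute mechanically. The sketch below (our own Python illustration) sums sgn_A x · F(x) over the 2ᵏ vertices and checks (12.12) against the "southwest" function of the uniform distribution on the unit square, for which μ(A) is the area of A ∩ (0, 1)²:

```python
from itertools import product

def delta(F, a, b):
    """Delta_A F for the rectangle A = prod (a_i, b_i]: sum of sgn_A(x) F(x)
    over the vertices x, with sign +1 when the number of coordinates equal
    to a_i is even and -1 when it is odd."""
    k = len(a)
    total = 0.0
    for choice in product((0, 1), repeat=k):          # 0 -> a_i, 1 -> b_i
        x = [b[i] if choice[i] else a[i] for i in range(k)]
        sgn = -1.0 if (k - sum(choice)) % 2 else 1.0
        total += sgn * F(x)
    return total

def F_unif(x):
    """The function of (12.11) for the uniform distribution on the unit square."""
    p = 1.0
    for t in x:
        p *= min(max(t, 0.0), 1.0)
    return p

d = delta(F_unif, (0.2, 0.1), (0.7, 0.4))   # rectangle (0.2,0.7] x (0.1,0.4]
```

Here d = 0.5 · 0.3 = 0.15, the area of the rectangle, as (12.12) requires; in the case k = 1 the same routine reduces to F(b) − F(a).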
Suppose that x⁽ⁿ⁾ ↓ x in the sense that xᵢ⁽ⁿ⁾ ↓ xᵢ as n → ∞ for each i = 1, …, k. Then S_{x⁽ⁿ⁾} ↓ Sₓ …

… if k ≥ 3, and if A and B are bounded sets in Rᵏ that have nonempty interiors, then A and B are congruent by dissection. (The result does not hold if k is 1 or 2.) This is the Banach-Tarski paradox. It is usually illustrated in 3-space this way: it is possible to break a solid ball the size of a pea into finitely many pieces and then put them back together again in such a way as to get a solid ball the size of the sun.†
PROBLEMS

12.1. Suppose that μ is a measure on ℬ¹ that is finite for bounded sets and is translation-invariant: μ(A + x) = μ(A). Show that μ(A) = aλ(A) for some a ≥ 0. Extend to Rᵏ.
12.2. Suppose that A ∈ ℬ¹, λ(A) > 0, and 0 < θ < 1. Show that there is a bounded open interval I such that λ(A ∩ I) > θλ(I). Hint: Show that λ(A) may be assumed finite, and choose an open G such that A ⊂ G and λ(A) > θλ(G). Now G = ∪ₙIₙ for disjoint open intervals Iₙ [A12], and Σₙλ(A ∩ Iₙ) > θΣₙλ(Iₙ); use an Iₙ.
12.3. ↑ If A ∈ ℬ¹ and λ(A) > 0, then the origin is interior to the difference set D(A) = [x − y: x, y ∈ A]. Hint: Choose a bounded open interval I as in Problem 12.2 for θ = 3/4. Suppose that |z| < λ(I)/2; since A ∩ I and (A ∩ I) + z are contained in an interval of length less than 3λ(I)/2 and hence cannot be disjoint, z ∈ D(A).
12.4. ↑ The following construction leads to a subset H of the unit interval that is nonmeasurable in the extreme sense that its inner and outer Lebesgue measures are 0 and 1: λ⋆(H) = 0 and λ*(H) = 1 (see (3.9) and (3.10)). Complete the details. The ideas are those in the construction of a nonmeasurable set at the end of Section 3. It will be convenient to work in G = [0, 1); let ⊕ and ⊖ denote addition and subtraction modulo 1 in G, which is a group with identity 0.
(a) Fix an irrational θ in G and for n = 0, ±1, ±2, … let θₙ be nθ reduced modulo 1. Show that θₙ ⊕ θₘ = θₙ₊ₘ, θₙ ⊖ θₘ = θₙ₋ₘ, and the θₙ are distinct. Show that {θ₂ₙ: n = 0, ±1, …} and {θ₂ₙ₊₁: n = 0, ±1, …} are dense in G.
(b) Take x and y to be equivalent if x ⊖ y lies in {θₙ: n = 0, ±1, …}, which is a subgroup. Let S contain one representative from each equivalence class (each coset). Show that G = ∪ₙ(S ⊕ θₙ), where the union is disjoint. Put H = ∪ₙ(S ⊕ θ₂ₙ) and show that G − H = H ⊕ θ.
(c) Suppose that A is a Borel set contained in H. If λ(A) > 0, then D(A) contains an interval (0, ε); but then some θ₂ₖ₊₁ lies in (0, ε) ⊂ D(A) ⊂ D(H), and so θ₂ₖ₊₁ = h₁ − h₂ = h₁ ⊖ h₂ = (s₁ ⊕ θ₂ₘ) ⊖ (s₂ ⊕ θ₂ₙ) for some h₁, h₂ in H and some s₁, s₂ in S. Deduce that s₁ = s₂ and obtain a contradiction. Conclude that λ⋆(H) = 0.
(d) Show that λ⋆(H ⊕ θ) = 0 and λ*(H) = 1.
† See WAGON for an account of these prodigies.
12.5. ↑ The construction here gives sets Hₙ such that Hₙ ↑ G and λ⋆(Hₙ) = 0. If Jₙ = G − Hₙ, then Jₙ ↓ ∅ and λ*(Jₙ) = 1.
(a) Let Hₙ = ∪ₖ₌₋ₙⁿ (S ⊕ θₖ), so that Hₙ ↑ G. Show that the sets Hₙ ⊕ θ₍₂ₙ₊₁₎ₗ are disjoint for different l.
(b) Suppose that A is a Borel set contained in Hₙ. Show that A, and indeed all the A ⊕ θ₍₂ₙ₊₁₎ₗ, have Lebesgue measure 0.
12.6. Suppose that μ is nonnegative and finitely additive on ℬᵏ and that μ(Rᵏ) < ∞. Suppose further that μ(A) = sup μ(K), where K ranges over the compact subsets of A. Show that μ is countably additive. (Compare Theorem 12.3(ii).)

12.7. Suppose μ is a measure on ℬᵏ such that bounded sets have finite measure. Given A, show that there exist an F_σ-set U (a countable union of closed sets) and a G_δ-set V (a countable intersection of open sets) such that U ⊂ A ⊂ V and μ(V − U) = 0.

12.8. 2.19 ↑ Suppose that μ is a nonatomic probability measure on (Rᵏ, ℬᵏ) and that μ(A) > 0. Show that there is an uncountable compact set K such that K ⊂ A and μ(K) = 0.

12.9. The minimal closed support of a measure μ on ℬᵏ is a closed set C_μ such that C_μ ⊂ C for closed C if and only if C supports μ. Prove its existence and uniqueness. Characterize the points of C_μ as those x such that μ(U) > 0 for every neighborhood U of x. If k = 1 and if μ and the function F are related by (12.5), the condition is F(x − ε) < F(x + ε) for all ε; x is in this case called a point of increase of F.

12.10. Of minor interest is the k-dimensional analogue of (12.4). Let Iₜ be (0, t] for t ≥ 0 and (t, 0] for t < 0, and let Aₓ = I_{x₁} × ⋯ × I_{xₖ}. Let φ(x) be +1 or −1 according as the number of i, 1 ≤ i ≤ k, for which xᵢ < 0 is even or odd. Show that, if F(x) = φ(x)μ(Aₓ), then (12.12) holds for bounded rectangles A. Call F degenerate if it is a function of some k − 1 of the coordinates, the requirement in the case k = 1 being that F is constant. Show that Δ_A F = 0 for every bounded rectangle if and only if F is a finite sum of degenerate functions; (12.12) determines F to within addition of a function of this sort.

12.11. Let G be a nondecreasing, right-continuous function on the line, and put F(x, y) = min{G(x), y}. Show that F satisfies the conditions of Theorem 12.5, that the curve C = [(x, G(x)): x ∈ R¹] supports the corresponding measure, and that λ₂(C) = 0.
12.12. Let F₁ and F₂ be nondecreasing, right-continuous functions on the line, and put F(x₁, x₂) = F₁(x₁)F₂(x₂). Show that F satisfies the conditions of Theorem 12.5. Let μ, μ₁, μ₂ be the measures corresponding to F, F₁, F₂, and prove that μ(A₁ × A₂) = μ₁(A₁)μ₂(A₂) for intervals A₁ and A₂. This μ is the product of μ₁ and μ₂; products are studied in a general setting in Section 18.
SECTION 13. MEASURABLE FUNCTIONS AND MAPPINGS
If a real function X on Ω has finite range, it is by the definition in Section 5 a simple random variable if [ω: X(ω) = x] lies in the basic σ-field ℱ for each x. The requirement appropriate for the general real function X is stronger: namely, [ω: X(ω) ∈ H] must lie in ℱ for each linear Borel set H. An abstract version of this definition greatly simplifies the theory of such functions.

Measurable Mappings
Let (Ω, ℱ) and (Ω′, ℱ′) be two measurable spaces. For a mapping T: Ω → Ω′, consider the inverse images T⁻¹A′ = [ω ∈ Ω: Tω ∈ A′] for A′ ⊂ Ω′. (See [A7] for the properties of inverse images.) The mapping T is measurable ℱ/ℱ′ if T⁻¹A′ ∈ ℱ for each A′ ∈ ℱ′.
For a real function f, the image space Ω′ is the line R¹, and in this case ℬ¹ is always tacitly understood to play the role of ℱ′. A real function f on Ω is thus measurable ℱ (or simply measurable, if it is clear from the context what ℱ is involved) if it is measurable ℱ/ℬ¹, that is, if f⁻¹H = [ω: f(ω) ∈ H] ∈ ℱ for every H ∈ ℬ¹. In probability contexts, a real measurable function is called a random variable. The point of the definition is to ensure that [ω: f(ω) ∈ H] has a measure or probability for all sufficiently regular sets H of real numbers, that is, for all Borel sets H.
Example 13.1. A real function f with finite range is measurable if f⁻¹{x} ∈ ℱ for each singleton {x}, but this is too weak a condition to impose on the general f. (It is satisfied if (Ω, ℱ) = (R¹, ℬ¹) and f is any one-to-one map of the line into itself; but in this case f⁻¹H, even for so simple a set H as an interval, can for an appropriately chosen f be any uncountable set, say the non-Borel set constructed in Section 3.) On the other hand, for a measurable f with finite range, f⁻¹H ∈ ℱ for every H ⊂ R¹; but this is too strong a condition to impose on the general f. (For (Ω, ℱ) = (R¹, ℬ¹), even f(x) = x fails to satisfy it.) Notice that nothing is required of fA; it need not lie in ℬ¹ for A in ℱ. ∎
If in addition to (Ω, ℱ), (Ω′, ℱ′), and the map T: Ω → Ω′, there is a third measurable space (Ω″, ℱ″) and a map T′: Ω′ → Ω″, the composition T′T = T′ ∘ T is the mapping Ω → Ω″ that carries ω to T′(T(ω)).

Theorem 13.1. (i) If T⁻¹A′ ∈ ℱ for each A′ ∈ 𝒜′ and 𝒜′ generates ℱ′, then T is measurable ℱ/ℱ′.
(ii) If T is measurable ℱ/ℱ′ and T′ is measurable ℱ′/ℱ″, then T′T is measurable ℱ/ℱ″.
PROOF. Since T⁻¹(Ω′ − A′) = Ω − T⁻¹A′ and T⁻¹(∪ₙA′ₙ) = ∪ₙT⁻¹A′ₙ, and since ℱ is a σ-field in Ω, the class [A′: T⁻¹A′ ∈ ℱ] is a σ-field in Ω′. If this σ-field contains 𝒜′, it must also contain σ(𝒜′), and (i) follows.
As for (ii), it follows by the hypotheses that A″ ∈ ℱ″ implies that (T′)⁻¹A″ ∈ ℱ′, which in turn implies that (T′T)⁻¹A″ = [ω: T′Tω ∈ A″] = [ω: Tω ∈ (T′)⁻¹A″] = T⁻¹((T′)⁻¹A″) ∈ ℱ. ∎

By part (i), if f is a real function such that [ω: f(ω) ≤ x] lies in ℱ for all x, then f is measurable ℱ. This condition is usually easy to check.
Mappings into Rᵏ
For a mapping f: Ω → Rᵏ carrying Ω into k-space, ℬᵏ is always understood to be the σ-field in the image space. In probabilistic contexts, a measurable mapping into Rᵏ is called a random vector. Now f must have the form

(13.1)  f(ω) = (f₁(ω), …, fₖ(ω))

for real functions fⱼ(ω). Since the sets (12.9) (the "southwest regions") generate ℬᵏ, Theorem 13.1(i) implies that f is measurable ℱ if and only if the set
(13.2)  [ω: f₁(ω) ≤ x₁, …, fₖ(ω) ≤ xₖ] = ∩ⱼ₌₁ᵏ [ω: fⱼ(ω) ≤ xⱼ]

lies in ℱ for each (x₁, …, xₖ). This condition holds if each fⱼ is measurable ℱ. On the other hand, if xⱼ = x is fixed and x₁ = ⋯ = xⱼ₋₁ = xⱼ₊₁ = ⋯ = xₖ = n goes to ∞, the sets (13.2) increase to [ω: fⱼ(ω) ≤ x]; the condition thus implies that each fⱼ is measurable. Therefore, f is measurable ℱ if and only if each component function fⱼ is measurable ℱ. This provides a practical criterion for mappings into Rᵏ.
A mapping f: Rⁱ → Rᵏ is defined to be measurable if it is measurable ℬⁱ/ℬᵏ. Such functions are often called Borel functions. To sum up: T: Ω → Ω′ is measurable ℱ/ℱ′ if T⁻¹A′ ∈ ℱ for all A′ ∈ ℱ′; f: Ω → Rᵏ is measurable ℱ if it is measurable ℱ/ℬᵏ; and f: Rⁱ → Rᵏ is measurable (a Borel function) if it is measurable ℬⁱ/ℬᵏ. If H lies outside ℬ¹, then I_H (i = k = 1) is not a Borel function.
Theorem 13.2. If f: Rⁱ → Rᵏ is continuous, then it is measurable.

PROOF. As noted above, it suffices to check that each set (13.2) lies in ℬⁱ. But each is closed because of continuity. ∎
Theorem 13.3. If fⱼ: Ω → R¹ is measurable ℱ, j = 1, …, k, then g(f₁(ω), …, fₖ(ω)) is measurable ℱ if g: Rᵏ → R¹ is measurable, in particular, if it is continuous.
PROOF. If the fⱼ are measurable, then so is (13.1), so that the result follows by Theorem 13.1(ii). ∎
Taking g(x₁, …, xₖ) to be Σⱼ₌₁ᵏ xⱼ, ∏ⱼ₌₁ᵏ xⱼ, and max{x₁, …, xₖ} in turn shows that sums, products, and maxima of measurable functions are measurable. If f(ω) is real and measurable, then so are sin f(ω), e^{f(ω)}, and so on, and if f(ω) never vanishes, then 1/f(ω) is measurable as well.

Limits and Measurability
For a real function f it is often convenient to admit the artificial values ∞ and −∞, that is, to work with the extended real line [−∞, ∞]. Such an f is by definition measurable ℱ if [ω: f(ω) ∈ H] lies in ℱ for each Borel set H of (finite) real numbers and if [ω: f(ω) = ∞] and [ω: f(ω) = −∞] both lie in ℱ. This extension of the notion of measurability is convenient in connection with limits and suprema, which need not be finite.
Theorem 13.4. Suppose that f₁, f₂, … are real functions measurable ℱ.

(i) The functions supₙ fₙ, infₙ fₙ, lim supₙ fₙ, and lim infₙ fₙ are measurable ℱ.
(ii) If limₙ fₙ exists everywhere, then it is measurable ℱ.
(iii) The ω-set where {fₙ(ω)} converges lies in ℱ.
(iv) If f is measurable ℱ, then the ω-set where fₙ(ω) → f(ω) lies in ℱ.
PROOF. Clearly, [supₙ fₙ ≤ x] = ∩ₙ[fₙ ≤ x] lies in ℱ even for x = ∞ and x = −∞, and so supₙ fₙ is measurable. The measurability of infₙ fₙ follows in the same way, and hence lim supₙ fₙ = infₙ supₖ≥ₙ fₖ and lim infₙ fₙ = supₙ infₖ≥ₙ fₖ are measurable. If limₙ fₙ exists, it coincides with these last two functions and hence is measurable. Finally, the set in (iii) is the set where lim supₙ fₙ(ω) = lim infₙ fₙ(ω), and that in (iv) is the set where this common value is f(ω). ∎

Special cases of this theorem have been encountered before, part (iv), for example, in connection with the strong law of large numbers. The last three parts of the theorem obviously carry over to mappings into Rᵏ.
A simple real function is one with finite range; it can be put in the form

(13.3)  f = Σᵢ₌₁ⁿ xᵢ I_{Aᵢ},
where the Aᵢ form a finite decomposition of Ω. It is measurable ℱ if each Aᵢ lies in ℱ. The simple random variables of Section 5 have this form. Many results concerning measurable functions are most easily proved first for simple functions and then, by an appeal to the next theorem and a passage to the limit, for the general measurable function.
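The passage to the limit rests on the dyadic truncation used to prove Theorem 13.5 below; a sketch of that standard construction in code (our own illustration, not the text's; the function f is an arbitrary choice):

```python
import math

def f_n(f, n):
    """The n-th dyadic approximant: truncate f at +-n and round |f| toward 0
    to a multiple of 2^{-n}.  Then 0 <= f_n, increasing to f on [f >= 0],
    and 0 >= f_n, decreasing to f on [f <= 0], as in (13.4) and (13.5)."""
    def fn(w):
        v = f(w)
        if v >= 0:
            return min(math.floor(v * 2 ** n) / 2 ** n, float(n))
        return max(-math.floor(-v * 2 ** n) / 2 ** n, float(-n))
    return fn

f = lambda w: w * w - 1.0        # any measurable f will do
f3, f6 = f_n(f, 3), f_n(f, 6)
```

Each fₙ has finite range (multiples of 2⁻ⁿ between −n and n), so it is simple, and at every ω the sequence is monotone in n and converges to f(ω).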
Theorem 13.5. If f is real and measurable ℱ, there exists a sequence {fₙ} of simple functions, each measurable ℱ, such that

(13.4)  0 ≤ fₙ(ω) ↑ f(ω)  if f(ω) ≥ 0

and

(13.5)  0 ≥ fₙ(ω) ↓ f(ω)  if f(ω) ≤ 0.

PROOF.
Define

fₙ(ω) = −n if −∞ ≤ f(ω) ≤ −n; fₙ(ω) = −(k − 1)/2ⁿ if −k/2ⁿ < f(ω) ≤ −(k − 1)/2ⁿ, 1 ≤ k ≤ n2ⁿ; fₙ(ω) = (k − 1)/2ⁿ if (k − 1)/2ⁿ ≤ f(ω) < k/2ⁿ, 1 ≤ k ≤ n2ⁿ; fₙ(ω) = n if n ≤ f(ω) ≤ ∞.

… λ(A) > 1 − n⁻¹ and that B contains at most n points. Show that some rotation carries B into A: B ⊕ θ ⊂ A for some θ in C.
13.14. Show by example that μ σ-finite does not imply μT⁻¹ σ-finite.
13.15. Consider Lebesgue measure λ restricted to the class ℬ of Borel sets in (0, 1]. For a fixed permutation n₁, n₂, … of the positive integers, if x has dyadic expansion .x₁x₂…, take Tx = .x_{n₁}x_{n₂}…. Show that T is measurable ℬ/ℬ and that λT⁻¹ = λ.
… Let Hₖ be the union of the intervals ((i − 1)/2ᵏ, i/2ᵏ] for i even, 1 ≤ i ≤ 2ᵏ. Show that if …, and conversely.

Weak Convergence
Random variables X₁, …, Xₙ are defined to be independent if the events [X₁ ∈ A₁], …, [Xₙ ∈ Aₙ] are independent for all Borel sets A₁, …, Aₙ, so that P[Xᵢ ∈ Aᵢ, i = 1, …, n] = ∏ᵢ₌₁ⁿ P[Xᵢ ∈ Aᵢ]. To find the distribution function of the maximum Mₙ = max{X₁, …, Xₙ}, take A₁ = ⋯ = Aₙ = (−∞, x]. This gives P[Mₙ ≤ x] = ∏ᵢ₌₁ⁿ P[Xᵢ ≤ x]. If the Xᵢ are independent and have common distribution function G and Mₙ has distribution function Fₙ, then

(14.8)  Fₙ(x) = Gⁿ(x).

It is possible without any appeal to measure theory to study the real function Fₙ solely by means of the relation (14.8), which can indeed be taken as defining Fₙ. It is possible in particular to study the asymptotic properties of Fₙ:
Example 14.1. Consider a stream or sequence of events, say arrivals of calls at a telephone exchange. Suppose that the times between successive events, the interarrival times, are independent and that each has the exponential form (14.7) with a common value of α. By (14.8) the maximum Mₙ among the first n interarrival times has distribution function Fₙ(x) = (1 − e^(−αx))ⁿ, x ≥ 0. For each x, limₙ Fₙ(x) = 0, which means that Mₙ tends to be large for n large. But P[Mₙ − α⁻¹ log n ≤ x] = Fₙ(x + α⁻¹ log n). This is the distribution function of Mₙ − α⁻¹ log n, and it satisfies

(14.9)  Fₙ(x + α⁻¹ log n) = (1 − e^(−αx)/n)ⁿ → e^(−e^(−αx))

as n → ∞; the equality here holds if log n ≥ −αx, and so the limit holds for all x. This gives for large n the approximate distribution of the normalized random variable Mₙ − α⁻¹ log n. ∎

If Fₙ and F are distribution functions, then by definition, Fₙ converges weakly to F, written Fₙ ⇒ F, if

(14.10)  limₙ Fₙ(x) = F(x)

for each x at which F is continuous.†

To study the approximate distribution of a random variable Yₙ it is often necessary to study instead the normalized or rescaled random variable (Yₙ − bₙ)/aₙ for appropriate constants aₙ and bₙ. If Yₙ has distribution function Fₙ and if aₙ > 0, then P[(Yₙ − bₙ)/aₙ ≤ x] = P[Yₙ ≤ aₙx + bₙ], and therefore (Yₙ − bₙ)/aₙ has distribution function Fₙ(aₙx + bₙ). For this reason weak convergence often appears in the form‡

(14.11)  Fₙ(aₙx + bₙ) ⇒ F(x).

An example of this is (14.9): there aₙ = 1, bₙ = α⁻¹ log n, and F(x) = e^(−e^(−αx)).
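The convergence in (14.9) is easy to check numerically. The sketch below (plain Python; the choice α = 1 and the particular values of n and x are illustrative, not from the text) evaluates Fₙ(x + α⁻¹ log n) = (1 − e^(−αx)/n)ⁿ and compares it with the limit e^(−e^(−αx)).

```python
import math

def F_n_shifted(x, n, alpha=1.0):
    # F_n(x + log n / alpha) = (1 - e^{-alpha x}/n)^n, valid once log n >= -alpha*x
    return (1.0 - math.exp(-alpha * x) / n) ** n

def gumbel(x, alpha=1.0):
    # the limit in (14.9)
    return math.exp(-math.exp(-alpha * x))

for x in (-1.0, 0.0, 2.0):
    approx, limit = F_n_shifted(x, 10**6), gumbel(x)
    print(f"x={x:+.1f}  F_n={approx:.6f}  limit={limit:.6f}")
```

The agreement is already close for moderate n, since (1 − c/n)ⁿ differs from e^(−c) by O(c²/n).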
Example 14.2. Consider again the distribution function (14.8) of the maximum, but suppose that G has the form

G(x) = 0 if x < 1,  G(x) = 1 − x^(−α) if x ≥ 1,

where α > 0. Then Fₙ(n^(1/α)x) = (1 − x^(−α)/n)ⁿ for x ≥ n^(−1/α), and therefore

Fₙ(n^(1/α)x) → 0 if x ≤ 0,  Fₙ(n^(1/α)x) → e^(−x^(−α)) if x > 0.

This is an example of (14.11) in which aₙ = n^(1/α) and bₙ = 0. ∎
†For the role of continuity, see Example 14.4.
‡To write Fₙ(aₙx + bₙ) ⇒ F(x) ignores the distinction between a function and its value at an unspecified value of its argument, but the meaning of course is that Fₙ(aₙx + bₙ) → F(x) at continuity points x of F.
MEASURE

Example 14.3.
Consider (14.8) once more, but for

G(x) = 0 if x < 0,  G(x) = 1 − (1 − x)^α if 0 ≤ x ≤ 1,  G(x) = 1 if x > 1,

where α > 0. This time

Fₙ(n^(−1/α)x + 1) = (1 − n⁻¹(−x)^α)ⁿ for −n^(1/α) ≤ x ≤ 0.

Therefore,

Fₙ(n^(−1/α)x + 1) → e^(−(−x)^α) if x ≤ 0,  Fₙ(n^(−1/α)x + 1) → 1 if x > 0,

a case of (14.11) in which aₙ = n^(−1/α) and bₙ = 1. ∎
Let Δ be the distribution function with a unit jump at the origin:

(14.12)  Δ(x) = 0 if x < 0,  Δ(x) = 1 if x ≥ 0.
If X(ω) = 0 identically, then X has distribution function Δ.

Example 14.4. Let X₁, X₂, … be independent random variables for which P[Xₖ = 1] = P[Xₖ = −1] = ½, and put Sₙ = X₁ + ⋯ + Xₙ. By the weak law of large numbers,

(14.13)  P[|n⁻¹Sₙ| ≥ ε] → 0  for ε > 0.

Let Fₙ be the distribution function of n⁻¹Sₙ. If x > 0, then Fₙ(x) = 1 − P[n⁻¹Sₙ > x] → 1; if x < 0, then Fₙ(x) ≤ P[|n⁻¹Sₙ| ≥ |x|] → 0. As this accounts for all the continuity points of Δ, Fₙ ⇒ Δ. It is easy to turn the argument around and deduce (14.13) from Fₙ ⇒ Δ. Thus the weak law of large numbers is equivalent to the assertion that the distribution function of n⁻¹Sₙ converges weakly to Δ.

If n is odd, so that Sₙ = 0 is impossible, then by symmetry the events [Sₙ < 0] and [Sₙ > 0] each have probability ½ and hence Fₙ(0) = ½. Thus Fₙ(0) does not converge to Δ(0) = 1, but because Δ is discontinuous at 0, the definition of weak convergence does not require this. ∎
Allowing (14.10) to fail at discontinuity points x of F thus makes it possible to bring the weak law of large numbers under the theory of weak convergence. But if (14.10) need hold only for certain values of x, there
arises the question of whether weak limits are unique. Suppose that Fₙ ⇒ F and Fₙ ⇒ G. Then F(x) = limₙ Fₙ(x) = G(x) if F and G are both continuous at x. Since F and G each have only countably many points of discontinuity,† the set of common continuity points is dense, and it follows by right continuity that F and G are identical. A sequence can thus have at most one weak limit. Convergence of distribution functions is studied in detail in Chapter 5. The remainder of this section is devoted to some weak-convergence theorems which are interesting both for themselves and for the reason that they require so little technical machinery.

Convergence of Types*
Distribution functions F and G are of the same type if there exist constants a and b, a > 0, such that F(ax + b) = G(x) for all x. A distribution function is degenerate if it has the form Δ(x − x₀) (see (14.12)) for some x₀; otherwise, it is nondegenerate.
Theorem 14.2. Suppose that Fₙ(uₙx + vₙ) ⇒ F(x) and Fₙ(aₙx + bₙ) ⇒ G(x), where uₙ > 0, aₙ > 0, and F and G are nondegenerate. Then there exist a and b, a > 0, such that aₙ/uₙ → a, (bₙ − vₙ)/uₙ → b, and F(ax + b) = G(x).
Thus there can be only one possible limit type and essentially only one possible sequence of norming constants. The proof of the theorem is for clarity set out in a sequence of lemmas. In all of them, a and the aₙ are assumed to be positive.
Lemma 1. If Fₙ ⇒ F, aₙ → a, and bₙ → b, then Fₙ(aₙx + bₙ) ⇒ F(ax + b).

PROOF. If x is a continuity point of F(ax + b) and ε > 0, choose continuity points u and v of F so that u < ax + b < v and F(v) − F(u) < ε; this is possible because F has only countably many discontinuities. For large enough n, u < aₙx + bₙ < v, |Fₙ(u) − F(u)| < ε, and |Fₙ(v) − F(v)| < ε; but then F(ax + b) − 2ε < F(u) − ε < Fₙ(u) ≤ Fₙ(aₙx + bₙ) ≤ Fₙ(v) < F(v) + ε < F(ax + b) + 2ε. ∎
Lemma 2. If Fₙ ⇒ F and aₙ → ∞, then limₙ Fₙ(aₙx) = 1 for x > 0 and limₙ Fₙ(aₙx) = 0 for x < 0.

PROOF. Given ε, choose a continuity point u of F so large that F(u) > 1 − ε. If x > 0, then for all large enough n, aₙx > u and |Fₙ(u) − F(u)| < ε, so that Fₙ(aₙx) ≥ Fₙ(u) > F(u) − ε > 1 − 2ε. Thus limₙ Fₙ(aₙx) = 1 for x > 0; similarly, limₙ Fₙ(aₙx) = 0 for x < 0. ∎

Lemma 3. If Fₙ ⇒ F and bₙ is unbounded, then Fₙ(x + bₙ) cannot converge weakly.

PROOF. Suppose that bₙ is unbounded and that bₙ → ∞ along some subsequence (the case bₙ → −∞ is similar). Suppose that Fₙ(x + bₙ) ⇒ G(x). Given ε, choose a continuity point u of F so that F(u) > 1 − ε. Whatever x may be, for n far enough out in the subsequence, x + bₙ > u and Fₙ(u) > 1 − 2ε, so that Fₙ(x + bₙ) > 1 − 2ε. Thus G(x) = limₙ Fₙ(x + bₙ) = 1 for all continuity points x of G, which is impossible. ∎

†The proof following (14.3) uses measure theory, but this is not necessary: if the saltus σ(x) = F(x) − F(x−) exceeds ε at x₁ < ⋯ < xₙ, then F(xᵢ) − F(xᵢ₋₁) ≥ ε (take x₀ < x₁), and so nε ≤ F(xₙ) − F(x₀) ≤ 1; hence [x: σ(x) > ε] is finite and [x: σ(x) > 0] is countable.
*This topic may be omitted.
Lemma 4. If Fₙ(x) ⇒ F(x) and Fₙ(aₙx + bₙ) ⇒ G(x), where F and G are nondegenerate, then

(14.14)  0 < infₙ aₙ ≤ supₙ aₙ < ∞,  supₙ |bₙ| < ∞.

PROOF. Suppose that aₙ is not bounded above. Arrange by passing to a subsequence that aₙ → ∞. Then by Lemma 2,

(14.15)  limₙ Fₙ(aₙx) = 1 for x > 0,  limₙ Fₙ(aₙx) = 0 for x < 0.

Since

(14.16)  Fₙ(aₙ(x + bₙ/aₙ)) = Fₙ(aₙx + bₙ) ⇒ G(x),

it follows by Lemma 3 that bₙ/aₙ is bounded along this subsequence. By passing to a further subsequence, arrange that bₙ/aₙ converges to some c. By (14.15) and Lemma 1, Fₙ(aₙ(x + bₙ/aₙ)) ⇒ Δ(x + c) along this subsequence. But (14.16) now implies that G is degenerate, contrary to hypothesis. Thus aₙ is bounded above.

If Gₙ(x) = Fₙ(aₙx + bₙ), then Gₙ(x) ⇒ G(x) and Gₙ(aₙ⁻¹x − aₙ⁻¹bₙ) = Fₙ(x) ⇒ F(x). The result just proved shows that aₙ⁻¹ is bounded. Thus aₙ is bounded away from 0 and ∞.

If bₙ is not bounded, neither is bₙ/aₙ; pass to a subsequence along which bₙ/aₙ → ±∞ and aₙ converges to a positive a. Since, by Lemma 1, Fₙ(aₙx) ⇒ F(ax) along the subsequence, (14.16) and bₙ/aₙ → ±∞ stand in contradiction (Lemma 3 again). Therefore bₙ is bounded. ∎
Lemma 5. If F(x) = F(ax + b) for all x and F is nondegenerate, then a = 1 and b = 0.

PROOF. Since F(x) = F(aⁿx + (aⁿ⁻¹ + ⋯ + a + 1)b), it follows by Lemma 4 that aⁿ is bounded away from 0 and ∞, so that a = 1, and it then follows that nb is bounded, so that b = 0. ∎
PROOF OF THEOREM 14.2. Suppose first that uₙ = 1 and vₙ = 0. Then (14.14) holds. Fix any subsequence along which aₙ converges to some positive a and bₙ converges to some b. By Lemma 1, Fₙ(aₙx + bₙ) ⇒ F(ax + b) along this subsequence, and the hypothesis gives F(ax + b) = G(x). Suppose that along some other subsequence, aₙ → u > 0 and bₙ → v. Then F(ux + v) = G(x) and F(ax + b) = G(x) both hold, so that u = a and v = b by Lemma 5. Every convergent subsequence of {(aₙ, bₙ)} thus converges to (a, b), and so the entire sequence does.

For the general case, let Hₙ(x) = Fₙ(uₙx + vₙ). Then Hₙ(x) ⇒ F(x) and Hₙ(aₙuₙ⁻¹x + (bₙ − vₙ)uₙ⁻¹) ⇒ G(x), and so by the case already treated, aₙuₙ⁻¹ converges to some positive a and (bₙ − vₙ)uₙ⁻¹ to some b, and as before, F(ax + b) = G(x). ∎
Extremal Distributions*

A distribution function F is extremal if it is nondegenerate and if, for some distribution function G and constants aₙ (aₙ > 0) and bₙ,

(14.17)  Gⁿ(aₙx + bₙ) ⇒ F(x).

These are the possible limiting distributions of normalized maxima (see (14.8)), and Examples 14.1, 14.2, and 14.3 give three specimens. The following analysis shows that these three examples exhaust the possible types.

Assume that F is extremal. From (14.17) follow Gⁿᵏ(aₙx + bₙ) ⇒ Fᵏ(x) and Gⁿᵏ(aₙₖx + bₙₖ) ⇒ F(x), and so by Theorem 14.2 there exist constants cₖ and dₖ such that cₖ is positive and

(14.18)  Fᵏ(x) = F(cₖx + dₖ).

From F(cⱼₖx + dⱼₖ) = Fʲᵏ(x) = Fʲ(cₖx + dₖ) = F(cⱼcₖx + cⱼdₖ + dⱼ) follow (Lemma 5) the relations

(14.19)  cⱼₖ = cⱼcₖ,  dⱼₖ = cⱼdₖ + dⱼ.

Of course, c₁ = 1 and d₁ = 0. There are three cases to be considered separately.
CASE 1. Suppose that cₖ = 1 for all k. Then

(14.20)  F^(1/k)(x) = F(x − dₖ).

This implies that F^(j/k)(x) = F(x + dⱼ − dₖ). For positive rational r = j/k, put δᵣ = dⱼ − dₖ; (14.19) implies that the definition is consistent, and Fʳ(x) = F(x + δᵣ). Since F is nondegenerate, there is an x such that 0 < F(x) < 1, and it follows by (14.20) that dₖ is decreasing in k, so that δᵣ is strictly decreasing in r.

*This topic may be omitted.
For positive real t, let φ(t) = inf{δᵣ: 0 < r ≤ t} (r rational in the infimum). Then φ(t) is decreasing in t, and

(14.21)  Fᵗ(x) = F(x + φ(t))

for all x and all positive t. Further, (14.19) implies that φ(st) = φ(s) + φ(t), so that by the theorem on Cauchy's equation [A20] applied to φ(eˣ), φ(t) = −β log t, where β > 0 because φ(t) is strictly decreasing. Now (14.21) with x = 0 gives F(φ(t)) = exp{t log F(0)}, that is, F(x) = exp{e^(−x/β) log F(0)}, and so F must be of the same type as

(14.22)  F₁(x) = e^(−e^(−x)).
Example 14.1 shows that this distribution function can arise as a limit of distributions of maxima; that is, F₁ is indeed extremal.

CASE 2. Suppose that cₖ₀ ≠ 1 for some k₀, which necessarily exceeds 1. Then there exists an x′ such that cₖ₀x′ + dₖ₀ = x′; but (14.18) then gives F^(k₀)(x′) = F(x′), so that F(x′) is 0 or 1. (In Case 1, F has the type (14.22) and so never assumes the values 0 and 1.) Now suppose further that, in fact, F(x′) = 0. Let x₀ be the supremum of those x for which F(x) = 0. By passing to a new F of the same type one can arrange that x₀ = 0; then F(x) = 0 for x ≤ 0 and F(x) > 0 for x > 0. The new F will satisfy (14.18), but with new constants dₖ. If a (new) dₖ is distinct from 0, then there is an x near 0 for which the arguments on the two sides of (14.18) have opposite signs. Therefore, dₖ = 0 for all k, and

(14.23)  Fᵏ(x) = F(cₖx)

for all k and x. This implies that F^(j/k)(x) = F(xcⱼ/cₖ). For positive rational r = j/k, put γᵣ = cⱼ/cₖ. The definition is again consistent by (14.19), and Fʳ(x) = F(γᵣx). Since 0 < F(x) < 1 for some x, necessarily positive, it follows by (14.23) that cₖ is decreasing in k, so that γᵣ is strictly decreasing in r. Put ψ(t) = inf{γᵣ: 0 < r ≤ t} for positive real t. From (14.19) follows ψ(st) = ψ(s)ψ(t), and by the corollary to the theorem on Cauchy's equation [A20] applied to ψ(eˣ), it follows that ψ(t) = t^(−ξ) for some ξ > 0. Since Fᵗ(x) = F(ψ(t)x) for all x and for t positive, F(x) = exp{x^(−1/ξ) log F(1)} for x > 0. Thus (take α = 1/ξ) F is of the same type as

(14.24)  F₂(x) = 0 if x ≤ 0,  F₂(x) = e^(−x^(−α)) if x > 0.
Example 14.2 shows that this case can arise.

CASE 3. Suppose as in Case 2 that cₖ₀ ≠ 1 for some k₀, so that F(x′) is 0 or 1 for some x′, but this time suppose that F(x′) = 1. Let x₁ be the infimum of those x for which F(x) = 1. By passing to a new F of the same type, arrange that x₁ = 0; then F(x) < 1 for x < 0 and F(x) = 1 for x ≥ 0. If dₖ ≠ 0, then for some x near 0, one side of (14.18) is 1 and the other is not. Thus dₖ = 0 for all k, and (14.23) again holds. And again γⱼ/ₖ = cⱼ/cₖ consistently defines a function satisfying Fʳ(x) = F(γᵣx). Since F is nondegenerate, 0 < F(x) < 1 for some x, but this time x is necessarily negative, so that cₖ is increasing. The same analysis as before shows that there is a positive ξ such that Fᵗ(x) = F(t^ξ x) for all x and for t positive. Thus F(x) = exp{(−x)^(1/ξ) log F(−1)} for x < 0, and F is of the type (take α = 1/ξ)

(14.25)  F₃(x) = e^(−(−x)^α) if x ≤ 0,  F₃(x) = 1 if x > 0.
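The three types can be checked directly against the stability relation (14.18), Fᵏ(x) = F(cₖx + dₖ): for (14.22), cₖ = 1 and dₖ = −log k; for (14.24), cₖ = k^(−1/α) and dₖ = 0; for (14.25), cₖ = k^(1/α) and dₖ = 0. A quick numerical confirmation (plain Python; α = 2 and the sample points are illustrative choices):

```python
import math

def F1(x):           # (14.22), the type of Example 14.1
    return math.exp(-math.exp(-x))

def F2(x, a=2.0):    # (14.24), the type of Example 14.2
    return 0.0 if x <= 0 else math.exp(-(x ** (-a)))

def F3(x, a=2.0):    # (14.25), the type of Example 14.3
    return 1.0 if x >= 0 else math.exp(-((-x) ** a))

a, k = 2.0, 7
for x in (-1.5, -0.5, 0.5, 1.5):
    assert abs(F1(x) ** k - F1(x - math.log(k))) < 1e-12
    assert abs(F2(x, a) ** k - F2(k ** (-1 / a) * x, a)) < 1e-12
    assert abs(F3(x, a) ** k - F3(k ** (1 / a) * x, a)) < 1e-12
print("stability relation (14.18) holds for all three types")
```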
Example 14.3 shows that this distribution function is indeed extremal. This completely characterizes the class of extremal distributions:
Theorem 14.3. The class of extremal distribution functions consists exactly of the distribution functions of the types (14.22), (14.24), and (14.25).
It is possible to go on and characterize the domains of attraction. That is, it is possible for each extremal distribution function F to describe the class of G satisfying (14.17) for some constants aₙ and bₙ, the class of G attracted to F.†

PROBLEMS

14.1. The general nondecreasing function F has at most countably many discontinuities. Prove this by considering the open intervals (sup{F(u): u < x}, inf{F(u): u > x}): each nonempty one contains a rational.

14.2. For distribution functions F, the second proof of Theorem 14.1 shows how to construct a measure μ on (R¹, 𝓡¹) such that μ(a, b] = F(b) − F(a).
(a) Extend to the case of bounded F.
(b) Extend to the general case. Hint: Let Fₙ(x) be −n or F(x) or n as F(x) < −n or −n ≤ F(x) ≤ n or n < F(x). Construct the corresponding μₙ and define μ(A) = limₙ μₙ(A).
14.3. (a) Suppose that X has a continuous, strictly increasing distribution function F. Show that the random variable F(X) is uniformly distributed over the unit interval in the sense that P[F(X) ≤ u] = u for 0 ≤ u ≤ 1. Passing from X to F(X) is called the probability transformation.
(b) Show that the function φ(u) defined by (14.5) satisfies F(φ(u)−) ≤ u ≤ F(φ(u)) and that, if F is continuous (but not necessarily strictly increasing), then F(φ(u)) = u for 0 < u < 1.
(c) Show that P[F(X) < u] = F(φ(u)−) and hence that the result in part (a) holds as long as F is continuous.
†This theory is associated with the names of Fisher, Fréchet, Gnedenko, and Tippett. For further information, see GALAMBOS.
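Problem 14.3 is easy to try out concretely. The sketch below (plain Python) computes the quantile function φ(u) = inf[x: u ≤ F(x)] of (14.5) by bisection, for the exponential distribution function as an arbitrary continuous, strictly increasing example; the bisection bracket and tolerance are implementation choices, not part of the problem.

```python
import math

def F(x):
    # exponential distribution function, an arbitrary continuous example
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def phi(u, lo=0.0, hi=50.0):
    # phi(u) = inf[x: u <= F(x)], computed by bisection on [lo, hi]
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if u <= F(mid):
            hi = mid
        else:
            lo = mid
    return hi

for u in (0.1, 0.5, 0.9):
    print(u, phi(u), F(phi(u)))   # F(phi(u)) = u, as in part (b)
```

Since P[F(X) ≤ u] = P[X ≤ φ(u)] = F(φ(u)) = u here, this also illustrates part (a).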
14.4. ↑ Let C be the set of continuity points of F.
(a) Show that for every Borel set A, P[F(X) ∈ A, X ∈ C] is at most the Lebesgue measure of A.
(b) Show that if F is continuous at each point of F⁻¹A, then P[F(X) ∈ A] is at most the Lebesgue measure of A.
14.5. The Lévy distance d(F, G) between two distribution functions is the infimum of those ε such that G(x − ε) − ε ≤ F(x) ≤ G(x + ε) + ε for all x. Verify that this is a metric on the set of distribution functions. Show that a necessary and sufficient condition for Fₙ ⇒ F is that d(Fₙ, F) → 0.
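The Lévy distance of Problem 14.5 can be approximated in a few lines. The sketch below (plain Python) checks the defining inequalities on a finite grid, which only approximates the "for all x" of the definition; the grid, tolerance, and the test distributions (unit jumps, as in (14.12)) are illustrative choices.

```python
def levy_distance(F, G, grid, tol=1e-6):
    # smallest eps (within tol) with G(x-eps)-eps <= F(x) <= G(x+eps)+eps on the grid
    def ok(eps):
        return all(G(x - eps) - eps <= F(x) <= G(x + eps) + eps for x in grid)
    lo, hi = 0.0, 1.0     # eps = 1 always works for distribution functions
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if ok(mid):
            hi = mid
        else:
            lo = mid
    return hi

def jump_at(c):
    # distribution function with a unit jump at c, as in (14.12)
    return lambda x: 1.0 if x >= c else 0.0

grid = [i / 100.0 - 2.0 for i in range(401)]    # [-2, 2] in steps of 0.01
print(levy_distance(jump_at(0.0), jump_at(0.25), grid))   # close to 1/4
```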
14.6. 12.3 ↑ A Borel function satisfying Cauchy's equation [A20] is automatically bounded in some interval and hence satisfies f(x) = xf(1). Hint: Take K large enough that λ[x: x > s, |f(x)| ≤ K] > 0. Apply Problem 12.3 and conclude that f is bounded in some interval to the right of 0.
14.7. ↑ Consider sets S of reals that are linearly independent over the field of rationals in the sense that n₁x₁ + ⋯ + nₖxₖ = 0 for distinct points xᵢ in S and integers nᵢ (positive or negative) is impossible unless nᵢ = 0.
(a) By Zorn's lemma find
INTEGRATION

SECTION 15. THE INTEGRAL
The reasons for these conventions will become clear later. Also in force are the conventions of Section 10 for sums and limits involving infinity; see (10.3) and (10.4). If Aᵢ is empty, the infimum in (15.1) is by the standard convention ∞; but then μ(Aᵢ) = 0, so that by the convention (15.2), this term makes no contribution to the sum (15.1). The integral of f is defined as the supremum of the sums (15.1):

(15.3)  ∫f dμ = sup Σᵢ [inf{f(ω): ω ∈ Aᵢ}] μ(Aᵢ).

The supremum here extends over all finite decompositions {Aᵢ} of Ω into 𝓕-sets.

For general f, consider its positive part,

(15.4)  f⁺(ω) = f(ω) if 0 ≤ f(ω) ≤ ∞,  f⁺(ω) = 0 if −∞ ≤ f(ω) < 0,

and its negative part,

(15.5)  f⁻(ω) = −f(ω) if −∞ ≤ f(ω) < 0,  f⁻(ω) = 0 if 0 ≤ f(ω) ≤ ∞.

These functions are nonnegative and measurable, and f = f⁺ − f⁻. The general integral is defined by

(15.6)  ∫f dμ = ∫f⁺ dμ − ∫f⁻ dμ,

unless ∫f⁺ dμ = ∫f⁻ dμ = ∞, in which case f has no integral. If ∫f⁺ dμ and ∫f⁻ dμ are both finite, then f is integrable, or integrable μ, or summable, and has (15.6) as its definite integral. If ∫f⁺ dμ = ∞ and ∫f⁻ dμ < ∞, then f is not integrable but in accordance with (15.6) is assigned ∞ as its definite integral. Similarly, if ∫f⁺ dμ < ∞ and ∫f⁻ dμ = ∞, then f is not integrable but has definite integral −∞. Note that f can have a definite integral without being integrable; it fails to have a definite integral if and only if its positive and negative parts both have infinite integrals.

The really important case of (15.6) is that in which ∫f⁺ dμ and ∫f⁻ dμ are both finite. Allowing infinite integrals is a convention that simplifies the statements of various theorems, especially theorems involving nonnegative functions. Note that (15.6) is defined unless it involves "∞ − ∞"; if one term on the right is ∞ and the other is a finite real x, the difference is defined by the conventions ∞ − x = ∞ and x − ∞ = −∞.

The extension of the integral from the nonnegative case to the general case is consistent: (15.6) agrees with (15.3) if f is nonnegative, because then f⁻ = 0.
Nonnegative Functions
It is convenient first to analyze nonnegative functions.
Theorem 15.1. (i) If f = Σᵢ xᵢ I(Aᵢ) is a nonnegative simple function, {Aᵢ} being a finite decomposition of Ω into 𝓕-sets, then ∫f dμ = Σᵢ xᵢ μ(Aᵢ).
(ii) If 0 ≤ f(ω) ≤ g(ω) for all ω, then ∫f dμ ≤ ∫g dμ.
(iii) If 0 ≤ fₙ(ω) ↑ f(ω) for all ω, then 0 ≤ ∫fₙ dμ ↑ ∫f dμ.
(iv) For nonnegative functions f and g and nonnegative constants α and β, ∫(αf + βg) dμ = α∫f dμ + β∫g dμ.
In part (iii) the essential point is that ∫f dμ = limₙ ∫fₙ dμ, and it is important to understand that both sides of this equation may be ∞. If fₙ = I(Aₙ) and f = I(A), where Aₙ ↑ A, the conclusion is that μ is continuous from below (Theorem 10.2(i)): limₙ μ(Aₙ) = μ(A); this equation often takes the form ∞ = ∞.
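Part (i) makes the integral of a simple function directly computable. A minimal sketch (plain Python; the function `conv_mult` and the particular simple function are illustrative, and the conventions 0 · ∞ = 0 and x · ∞ = ∞ for x > 0 are those of (15.2)):

```python
import math

def conv_mult(x, m):
    # the conventions (15.2): 0 * inf = 0; x * inf = inf for x > 0
    if x == 0.0 or m == 0.0:
        return 0.0
    return x * m    # math.inf propagates correctly when x > 0

def integral_simple(values_and_measures):
    # Theorem 15.1(i): the integral of sum_i x_i I_{A_i} is sum_i x_i mu(A_i)
    return sum(conv_mult(x, m) for x, m in values_and_measures)

# f = 2 on a set of measure 3, 5 on a set of measure 1/2, and 0 on a set of
# infinite measure; the last term contributes nothing by (15.2)
print(integral_simple([(2.0, 3.0), (5.0, 0.5), (0.0, math.inf)]))   # -> 8.5
```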
PROOF OF (i). Let {Bⱼ} be a finite decomposition of Ω and let βⱼ be the infimum of f over Bⱼ. If Aᵢ ∩ Bⱼ ≠ ∅, then βⱼ ≤ xᵢ; therefore, Σⱼ βⱼ μ(Bⱼ) = Σᵢⱼ βⱼ μ(Aᵢ ∩ Bⱼ) ≤ Σᵢⱼ xᵢ μ(Aᵢ ∩ Bⱼ) = Σᵢ xᵢ μ(Aᵢ). On the other hand, there is equality here if {Bⱼ} coincides with {Aᵢ}. ∎

PROOF OF (ii). The sums (15.1) obviously do not decrease if f is replaced by g. ∎
PROOF OF (iii). By (ii) the sequence ∫fₙ dμ is nondecreasing and bounded above by ∫f dμ. It therefore suffices to show that ∫f dμ ≤ limₙ ∫fₙ dμ, or that

(15.7)  limₙ ∫fₙ dμ ≥ S = Σᵢ₌₁ᵐ vᵢ μ(Aᵢ),

if A₁, …, Aₘ is any decomposition of Ω into 𝓕-sets and vᵢ = inf{f(ω): ω ∈ Aᵢ}.

In order to see the essential idea of the proof, which is quite simple, suppose first that S is finite and all the vᵢ and μ(Aᵢ) are positive and finite. Fix an ε that is positive and less than each vᵢ, and put Aᵢₙ = [ω ∈ Aᵢ: fₙ(ω) > vᵢ − ε]. Since fₙ ↑ f, Aᵢₙ ↑ Aᵢ. Decompose Ω into A₁ₙ, …, Aₘₙ and the complement of their union, and observe that, since μ is continuous from below,

(15.8)  ∫fₙ dμ ≥ Σᵢ₌₁ᵐ (vᵢ − ε) μ(Aᵢₙ) → Σᵢ₌₁ᵐ (vᵢ − ε) μ(Aᵢ) = S − ε Σᵢ₌₁ᵐ μ(Aᵢ).

Since the μ(Aᵢ) are all finite, letting ε → 0 gives (15.7).
Now suppose only that S is finite. Each product vᵢμ(Aᵢ) is then finite; suppose it is positive for i ≤ m₀ and 0 for i > m₀. (Here m₀ ≤ m; if the product is 0 for all i, then S = 0 and (15.7) is trivial.) Now vᵢ and μ(Aᵢ) are positive and finite for i ≤ m₀ (one or the other may be ∞ for i > m₀). Define Aᵢₙ as before, but only for i ≤ m₀. This time decompose Ω into A₁ₙ, …, Aₘ₀ₙ and the complement of their union. Replace m by m₀ in (15.8) and complete the proof as before.

Finally, suppose that S = ∞. Then vᵢ₀μ(Aᵢ₀) = ∞ for some i₀, so that vᵢ₀ and μ(Aᵢ₀) are both positive and at least one is ∞. Suppose 0 < x < vᵢ₀ ≤ ∞ and 0 < y < μ(Aᵢ₀) ≤ ∞, and put Aᵢ₀ₙ = [ω ∈ Aᵢ₀: fₙ(ω) > x]. From fₙ ↑ f follows Aᵢ₀ₙ ↑ Aᵢ₀; hence μ(Aᵢ₀ₙ) > y for n exceeding some n₀. But then (decompose Ω into Aᵢ₀ₙ and its complement) ∫fₙ dμ ≥ x μ(Aᵢ₀ₙ) > xy for n > n₀, and therefore limₙ ∫fₙ dμ ≥ xy. If vᵢ₀ = ∞, let x → ∞, and if μ(Aᵢ₀) = ∞, let y → ∞. In either case (15.7) follows: limₙ ∫fₙ dμ = ∞. ∎
PROOF OF (iv). Suppose at first that f = Σᵢ xᵢ I(Aᵢ) and g = Σⱼ yⱼ I(Bⱼ) are simple. Then αf + βg = Σᵢⱼ (αxᵢ + βyⱼ) I(Aᵢ ∩ Bⱼ), and so

∫(αf + βg) dμ = Σᵢⱼ (αxᵢ + βyⱼ) μ(Aᵢ ∩ Bⱼ) = α Σᵢ xᵢ μ(Aᵢ) + β Σⱼ yⱼ μ(Bⱼ) = α ∫f dμ + β ∫g dμ.

Note that the argument is valid if some of α, β, xᵢ, yⱼ are infinite. Apart from this possibility, the ideas are as in the proof of (5.21). For general nonnegative f and g, there exist by Theorem 13.5 simple functions fₙ and gₙ such that 0 ≤ fₙ ↑ f and 0 ≤ gₙ ↑ g. But then 0 ≤ αfₙ + βgₙ ↑ αf + βg and ∫(αfₙ + βgₙ) dμ = α∫fₙ dμ + β∫gₙ dμ, so that (iv) follows from (iii). ∎
By part (i) of Theorem 15.1, the expected values of simple random variables in Chapter 1 are integrals: E[X] = ∫X(ω)P(dω). This also covers the step functions in Section 1 (see (1.6)). The relation between the Riemann integral and the integral as defined here will be studied in Section 17.

Example 15.1. Consider the line (R¹, 𝓡¹, λ) with Lebesgue measure. Suppose that −∞ < a₀ < a₁ < ⋯ < aₘ < ∞, and let f be the function with nonnegative value xᵢ on (aᵢ₋₁, aᵢ], i = 1, …, m, and value 0 on (−∞, a₀] and (aₘ, ∞). By part (i) of Theorem 15.1, ∫f dλ = Σᵢ₌₁ᵐ xᵢ(aᵢ − aᵢ₋₁) because of the convention 0 · ∞ = 0; see (15.2). If the "area under the curve" to the left of a₀ and to the right of aₘ is to be 0, this convention is inevitable. From ∞ · 0 = 0 it follows that ∫f dλ = 0 if f is ∞ at a single point (say) and 0 elsewhere.
If f = I((a, ∞)), the area-under-the-curve point of view makes ∫f dλ = ∞ natural. Hence the second convention in (15.2), which also requires that the integral be infinite if f is ∞ on a nonempty interval and 0 elsewhere. ∎

Recall that almost everywhere means outside a set of measure 0.

Theorem 15.2. Suppose that f and g are nonnegative.
(i) If f = 0 almost everywhere, then ∫f dμ = 0.
(ii) If μ[ω: f(ω) > 0] > 0, then ∫f dμ > 0.
(iii) If ∫f dμ < ∞, then f < ∞ almost everywhere.
(iv) If f ≤ g almost everywhere, then ∫f dμ ≤ ∫g dμ.
(v) If f = g almost everywhere, then ∫f dμ = ∫g dμ.
=
•
[ ]
L��f f ] JL( A [ ;] < I: [ inf g ] JL( AJl G) < jgdJL,
I: i�iff JL ( A ; ) = I: i� f JL ( A ; n G) < I:
a
;
•
•
n G)
kI n G
where the last inequality comes from a consideration of the decomposition • A 1 n G, . . . , A m n G, G c. Th is proves (iv), and (v) follows immediately. Suppose that f = g almost everywhere, where f and g need not be nonnegative. If f has a definite integral, then since r = g + and r = g almost everywhere, it follows by Theorem 15.2(v) that g also has a definite integral and ffdJL = fgdJL. Uniqueness Although there are various ways to frame the definition of the integral, they are all equivalent-they all assign the same value to ffdJL . This is because the integral is uniquely determined by certain simple properties it is natural to require of it. It is natural to want the integral to have properties (i) and (iii) of Theorem 15.1. But these uniquely determine the integral for nonnegative functions: For f nonnega tive, there exist by Theorem 13.5 simple functions [, such that 0 < fn i f; by (iii), ffdJL must be lim , ffn dJL , and (i) determines the value of each ffn dJL.
Property (i) can itself be derived from (iv) (linearity) together with the assumption that ∫I(A) dμ = μ(A) for indicators I(A): ∫(Σᵢ xᵢ I(Aᵢ)) dμ = Σᵢ xᵢ ∫I(Aᵢ) dμ = Σᵢ xᵢ μ(Aᵢ). If (iv) of Theorem 15.1 is to persist when the integral is extended beyond the class of nonnegative functions, ∫f dμ must be ∫(f⁺ − f⁻) dμ = ∫f⁺ dμ − ∫f⁻ dμ, which makes the definition (15.6) inevitable.
PROBLEMS

These problems outline alternative definitions of the integral and clarify the role measurability plays. Call (15.3) the lower integral, and write it as

(15.9)  ∫_* f dμ = sup Σᵢ [inf{f(ω): ω ∈ Aᵢ}] μ(Aᵢ),

to distinguish it from the upper integral

(15.10)  ∫* f dμ = inf Σᵢ [sup{f(ω): ω ∈ Aᵢ}] μ(Aᵢ).

The infimum in (15.10), like the supremum in (15.9), extends over all finite partitions {Aᵢ} of Ω into 𝓕-sets.
f is measurable and nonnegative. Show that f *fdp. = oo if p.[w: f(w) 0 = or if p.[w: f(w) 0 for all
15. 1. Suppose that
>]
oo
> a] >
a
There are many functions familiar from calculus that ought to be integrable but are of the types in the preceding problem and hence have infinite upper integral. Examples are x⁻² I((1, ∞))(x) and x^(−1/2) I((0, 1))(x). Therefore, (15.10) is inappropriate as a definition of ∫f dμ for nonnegative f. The only problem with (15.10), however, is that it treats infinity the wrong way. To see this, and to focus on essentials, assume that μ(Ω) < ∞ and that f is bounded, although not necessarily nonnegative or measurable 𝓕.
15.2. ↑
(a) Show that

Σᵢ [inf{f(ω): ω ∈ Aᵢ}] μ(Aᵢ) ≤ Σⱼ [inf{f(ω): ω ∈ Bⱼ}] μ(Bⱼ)

if {Bⱼ} refines {Aᵢ}. Prove a dual relation for the sums in (15.10) and conclude that

(15.11)  ∫_* f dμ ≤ ∫* f dμ.

(b) Now assume that f is measurable 𝓕 and let M be a bound for |f|. Consider the partition Aᵢ = [ω: (i − 1)ε < f(ω) ≤ iε], |i| ≤ M/ε. Show that ∫* f dμ − ∫_* f dμ ≤ ε μ(Ω), and conclude that

(15.12)  ∫_* f dμ = ∫* f dμ.

To define the integral as the common value in (15.12) is the Darboux-Young approach. The advantage of (15.3) as a definition is that (in the nonnegative case) it applies at once to unbounded f and infinite μ(Ω).

15.3. 3.2, 15.2 ↑ For A ⊂ Ω, define μ*(A) and μ_*(A) by (3.9) and (3.10) with μ in place of P. Show that ∫* I(A) dμ = μ*(A) and ∫_* I(A) dμ = μ_*(A) for every A. Therefore, (15.12) can fail if f is not measurable 𝓕. (Where was measurability used in the proof of (15.12)?)

The definitions (15.3) and (15.6) always make formal sense (for finite μ(Ω) and sup|f|), but they are reasonable, and accord with intuition, only if (15.12) holds. Under what conditions does it hold?

15.4. 10.5, 15.3 ↑
(a) Suppose of f that there exist a function g, measurable 𝓕, and an 𝓕-set A such that μ(A) = 0 and [f ≠ g] ⊂ A. This is the same thing as assuming that μ*[f ≠ g] = 0, or assuming that f is measurable with respect to 𝓕 completed with respect to μ. Show that (15.12) holds.
(b) Show that if (15.12) holds, then so does the italicized condition in part (a).
Rather than assume that f is measurable 𝓕, one can assume that it satisfies the italicized condition in Problem 15.4(a), which in case (Ω, 𝓕, μ) is complete is the same thing anyway. For the next three problems, assume that μ(Ω) < ∞ and that f is measurable 𝓕 and bounded.
15.5. ↑ Show that for positive ε there exists a finite partition {Aᵢ} such that, if {Bⱼ} is any finer partition and ωⱼ ∈ Bⱼ, then

|∫f dμ − Σⱼ f(ωⱼ) μ(Bⱼ)| < ε.

15.6. ↑ Show that

∫f dμ = limₙ Σ{|k| ≤ n2ⁿ} ((k − 1)/2ⁿ) μ[ω: (k − 1)/2ⁿ ≤ f(ω) < k/2ⁿ].

The limit on the right here is Lebesgue's definition of the integral.
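Lebesgue's definition in Problem 15.6 can be tried out concretely. The sketch below (plain Python) takes f(x) = x² on Ω = [0, 1] with μ = Lebesgue measure, an illustrative choice for which μ[ω: a ≤ f(ω) < b] = min(√b, 1) − min(√a, 1) can be written down directly, and compares the Lebesgue sums with ∫₀¹ x² dx = 1/3.

```python
def mu_level_set(a, b):
    # Lebesgue measure of [x in [0,1]: a <= x^2 < b], for 0 <= a < b
    lo = min(max(a, 0.0), 1.0) ** 0.5
    hi = min(max(b, 0.0), 1.0) ** 0.5
    return hi - lo

def lebesgue_sum(n):
    # sum over |k| <= n 2^n of ((k-1)/2^n) mu[(k-1)/2^n <= f < k/2^n];
    # f = x^2 is nonnegative, so only k >= 1 contributes
    h = 2.0 ** n
    return sum(((k - 1) / h) * mu_level_set((k - 1) / h, k / h)
               for k in range(1, int(n * h) + 1))

for n in (2, 5, 10):
    print(n, lebesgue_sum(n))   # increases toward 1/3
```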
15.7. ↑ Suppose that the integral is defined for simple nonnegative functions by ∫(Σᵢ xᵢ I(Aᵢ)) dμ = Σᵢ xᵢ μ(Aᵢ). Suppose that fₙ and gₙ are simple and nondecreasing and have a common limit: 0 ≤ fₙ ↑ f and 0 ≤ gₙ ↑ f. Adapt the arguments used to prove Theorem 15.1(iii) and show that limₙ ∫fₙ dμ = limₙ ∫gₙ dμ. Thus, in the nonnegative case, ∫f dμ can (Theorem 13.5) consistently be defined as limₙ ∫fₙ dμ for simple functions fₙ for which 0 ≤ fₙ ↑ f.
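A standard choice for the simple functions of Theorem 13.5 invoked in Problem 15.7 is the dyadic approximation fₙ(ω) = min(⌊2ⁿf(ω)⌋/2ⁿ, n). The sketch below (plain Python; the test function eˣ is an arbitrary nonnegative example) checks 0 ≤ fₙ ↑ f pointwise.

```python
import math

def dyadic_approx(f, n):
    # the standard simple functions of Theorem 13.5: round f down to the
    # nearest multiple of 2^{-n} and cap the result at n
    def fn(w):
        return min(math.floor(2.0 ** n * f(w)) / 2.0 ** n, float(n))
    return fn

f = lambda w: math.exp(w)       # an arbitrary nonnegative example
for w in (0.0, 0.7, 2.3):
    vals = [dyadic_approx(f, n)(w) for n in range(1, 12)]
    assert all(a <= b for a, b in zip(vals, vals[1:]))   # nondecreasing in n
    print(w, vals[-1], f(w))    # f_11(w) is already close to f(w)
```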
SECTION 16. PROPERTIES OF THE INTEGRAL

Equalities and Inequalities

By definition, the requirement for integrability of f is that ∫f⁺ dμ and ∫f⁻ dμ both be finite, which is the same as the requirement that ∫f⁺ dμ + ∫f⁻ dμ < ∞ and hence is the same as the requirement that ∫|f| dμ < ∞.

If f ≤ g almost everywhere, then f⁺ ≤ g⁺ and f⁻ ≥ g⁻ almost everywhere, and so (16.2) follows by the definition (15.6). ∎

PROOF OF (ii). First, αf + βg is integrable because, by Theorem 15.1,

∫|αf + βg| dμ ≤ ∫(|α| · |f| + |β| · |g|) dμ = |α| ∫|f| dμ + |β| ∫|g| dμ < ∞.

That ∫αf dμ = α∫f dμ can be checked separately for the cases α ≥ 0 and α < 0. Therefore, it is enough to check (16.3) for the case α = β = 1. By definition, (f + g)⁺ − (f + g)⁻ = f + g = f⁺ − f⁻ + g⁺ − g⁻, and therefore (f + g)⁺ + f⁻ + g⁻ = (f + g)⁻ + f⁺ + g⁺. All these functions being nonnegative, ∫(f + g)⁺ dμ + ∫f⁻ dμ + ∫g⁻ dμ = ∫(f + g)⁻ dμ + ∫f⁺ dμ + ∫g⁺ dμ, which can be rearranged to give ∫(f + g)⁺ dμ − ∫(f + g)⁻ dμ = ∫f⁺ dμ − ∫f⁻ dμ + ∫g⁺ dμ − ∫g⁻ dμ. But this reduces to (16.3). ∎
Since −|f| ≤ f ≤ |f|, it follows that |∫f dμ| ≤ ∫|f| dμ.

Example 16.1. Let Ω be the set of positive integers, let 𝓕 consist of all subsets, and let μ be counting measure, so that a real function f on Ω is a sequence {xₘ}. The function corresponding to x₁, …, xₙ, 0, 0, … has integral Σₘ₌₁ⁿ xₘ by Theorem 15.1(i) (consider the decomposition {1}, …, {n}, {n + 1, n + 2, …}). It follows by Theorem 15.1(iii) that in the nonnegative case the integral of the function given by {xₘ} is the sum Σₘ xₘ (finite or infinite) of the corresponding infinite series. In the general case the function is integrable if and only if Σₘ₌₁^∞ |xₘ| is a convergent infinite series, in which case the integral is Σₘ₌₁^∞ xₘ⁺ − Σₘ₌₁^∞ xₘ⁻.

The function xₘ = (−1)^(m+1) m⁻¹ is not integrable by this definition and even fails to have a definite integral, since Σₘ₌₁^∞ xₘ⁺ = Σₘ₌₁^∞ xₘ⁻ = ∞. This invites comparison with the ordinary theory of infinite series, according to which the alternating harmonic series does converge in the sense that lim_M Σₘ₌₁^M (−1)^(m+1) m⁻¹ = log 2. But since this says that the sum of the first M terms has a limit, it requires that the elements of the space Ω be ordered. If Ω consists not of the positive integers but, say, of the integer lattice points in 3-space, it has no canonical linear ordering. And if Σₘ xₘ is to have the same finite value no matter what the order of summation, the series must be absolutely convergent.† This helps to explain why f is defined to be integrable only if ∫f⁺ dμ and ∫f⁻ dμ are both finite. ∎
Example 16.2. In connection with Example 15.1, consider the function f = 3I((a, ∞)) − 2I((−∞, a)). There is no natural value for ∫f dλ (it is "∞ − ∞"), and none is assigned by the definition. ∎

†RUDIN, p. 76.
If a function f is bounded on bounded intervals, then each function fₙ = f I((−n, n]) is integrable with respect to λ. Since f = limₙ fₙ, the limit of ∫fₙ dλ, if it exists, is sometimes called the "principal value" of the integral of f. Although it is natural for some purposes to integrate symmetrically about the origin, this is not the right definition of the integral in the context of general measure theory. The functions gₙ = f I((−n, n + 1]), for example, also converge to f, and ∫gₙ dλ may have some other limit, or none at all; f(x) = x is a case in point. There is no general reason why fₙ should take precedence over gₙ. As in the preceding example, f = Σₖ₌₁^∞ (−1)ᵏ k⁻¹ I((k, k + 1]) has no integral, even though the ∫fₙ dλ above converge. ∎
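For f(x) = x the contrast between fₙ and gₙ is easy to compute. The sketch below (plain Python; the midpoint rule and step count are implementation choices) approximates the two integrals numerically: ∫fₙ dλ vanishes for every n, while ∫gₙ dλ grows like n + 1/2.

```python
def integrate(f, a, b, steps=100000):
    # midpoint-rule approximation to the integral of f over (a, b];
    # exact (up to rounding) for linear f
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

f = lambda x: x
for n in (1, 5, 20):
    print(n, integrate(f, -n, n), integrate(f, -n, n + 1))
    # first column stays 0; second column is n + 1/2
```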
Integration to the Limit
The first result, the monotone convergence theorem, essentially restates Theorem 15.1(iii).

Theorem 16.2. If 0 ≤ fₙ ↑ f almost everywhere, then ∫fₙ dμ ↑ ∫f dμ.

If fₙ ≥ 0, then ∫Σₙ fₙ dμ = Σₙ ∫fₙ dμ. The members of this last equation are both equal either to ∞ or to the same finite, nonnegative real number.
Theorem 16.7. If Σ_n f_n converges almost everywhere and |Σ_{k=1}^n f_k| ≤ g almost everywhere, where g is integrable, then Σ_n f_n and the f_n are integrable and ∫ Σ_n f_n dμ = Σ_n ∫f_n dμ.
Corollary. If Σ_n ∫|f_n| dμ < ∞, then Σ_n f_n converges absolutely almost everywhere and is integrable, and ∫ Σ_n f_n dμ = Σ_n ∫f_n dμ.
PROOF. The function g = Σ_n |f_n| is integrable by Theorem 16.6 and is finite almost everywhere by Theorem 15.2(iii). Hence Σ_n |f_n| and Σ_n f_n converge almost everywhere, and Theorem 16.7 applies. ∎
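The corollary can be illustrated numerically with counting measure on {1, 2, …}, where integration is summation: when Σ_n Σ_m |f_n(m)| < ∞, the two iterated sums must agree. The particular array and the truncation levels below are illustrative choices, not from the text.

```python
def f(n, m):
    # A doubly indexed array with sum_n sum_m |f(n, m)| finite.
    return (-1) ** n / (2 ** n * m ** 2)

N, M = 30, 2000  # truncation levels; the discarded tails are provably tiny

abs_total = sum(abs(f(n, m)) for n in range(1, N) for m in range(1, M))
sum_n_first = sum(sum(f(n, m) for n in range(1, N)) for m in range(1, M))
sum_m_first = sum(sum(f(n, m) for m in range(1, M)) for n in range(1, N))

assert abs_total < 2.0                                # absolute summability
assert abs(sum_n_first - sum_m_first) < 1e-9          # iterated sums agree
```

The exact value here is (Σ_n (−1/2)^n)(Σ_m m⁻²) = −π²/18 ≈ −0.5483, which the truncated sums approach.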
In place of a sequence {f_n} of real measurable functions on (Ω, 𝓕, μ), consider a family [f_t: t > 0] indexed by a continuous parameter t. Suppose of a measurable f that

(16.8) lim_{t→∞} f_t(ω) = f(ω)

on a set A, where

(16.9) A ∈ 𝓕, μ(Ω − A) = 0.
A technical point arises here, since 𝓕 need not contain the ω-set where (16.8) holds:
Example 16.8. Let 𝓕 consist of the Borel subsets of Ω = [0, 1), and let H be a nonmeasurable set, a subset of Ω that does not lie in 𝓕 (see the end of Section 3). Define f_t(ω) = 1 if ω equals the fractional part t − ⌊t⌋ of t and their common value lies in H^c; define f_t(ω) = 0 otherwise. Each f_t is measurable 𝓕, but if f(ω) ≡ 0, then the ω-set where (16.8) holds is exactly H. ∎
Because of such examples, the set A above must be assumed to lie in 𝓕. (Because of Theorem 13.4, no such assumption is necessary in the case of sequences.) Suppose that f and the f_t are integrable. If I_t = ∫f_t dμ converges to I = ∫f dμ as t → ∞, then certainly I_{t_n} → I for each sequence {t_n} going to infinity. But the converse holds as well: if I_t does not converge to I, then there is a positive ε such that |I_{t_n} − I| ≥ ε for a sequence {t_n} going to infinity. To the question of whether I_{t_n} converges to I the previous theorems apply. Suppose that (16.8) and |f_t(ω)| ≤ g(ω) both hold for ω ∈ A, where A satisfies (16.9) and g is integrable. By the dominated convergence theorem, f and the f_t must then be integrable and I_{t_n} → I for each sequence {t_n} going to infinity. It follows that ∫f_t dμ → ∫f dμ. In this result t could go continuously to 0 or to some other value instead of to infinity.
Theorem 16.8. Suppose that f(ω, t) is a measurable and integrable function of ω for each t in (a, b). Let φ(t) = ∫f(ω, t) μ(dω).
(i) Suppose that for ω ∈ A, where A satisfies (16.9), f(ω, t) is continuous in t at t₀; suppose further that |f(ω, t)| ≤ g(ω) for ω ∈ A and |t − t₀| < δ, where δ is independent of ω and g is integrable. Then φ(t) is continuous at t₀.
(ii) Suppose that for ω ∈ A, where A satisfies (16.9), f(ω, t) has in (a, b) a derivative f′(ω, t); suppose further that |f′(ω, t)| ≤ g(ω) for ω ∈ A and t ∈ (a, b), where g is integrable. Then φ(t) has derivative ∫f′(ω, t) μ(dω) on (a, b).†
PROOF. Part (i) is an immediate consequence of the preceding discussion. To prove part (ii), consider a fixed t. If ω ∈ A, then by the mean-value theorem,

(f(ω, t + h) − f(ω, t))/h = f′(ω, s),

where s lies between t and t + h. The ratio on the left goes to f′(ω, t) as h → 0 and is by hypothesis dominated by the integrable function g(ω). Therefore,

(φ(t + h) − φ(t))/h = ∫ (f(ω, t + h) − f(ω, t))/h μ(dω) → ∫ f′(ω, t) μ(dω). ∎
The condition involving g in part (ii) can be weakened. It suffices to assume that for each t there is an integrable g such that |f′(ω, s)| ≤ g(ω) for ω ∈ A and all s in some neighborhood of t.
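Theorem 16.8(ii) can be checked numerically. A minimal sketch, assuming f(ω, t) = e^{tω} on Ω = [0, 1] with Lebesgue measure (my own choice of example, with |f′(ω, t)| = ωe^{tω} ≤ e² an integrable dominating function on (0, 2)); a midpoint rule stands in for the integral:

```python
import math

def quad(h, n=20000):
    # midpoint rule for the integral of h over [0, 1]
    return sum(h((i + 0.5) / n) for i in range(n)) / n

t = 1.0
phi = lambda s: quad(lambda w: math.exp(s * w))        # phi(t) = integral of f
dphi_direct = quad(lambda w: w * math.exp(t * w))      # differentiate inside
eps = 1e-6
dphi_limit = (phi(t + eps) - phi(t - eps)) / (2 * eps) # differentiate outside

assert abs(dphi_direct - dphi_limit) < 1e-5
```

Here the exact value is φ′(1) = ∫₀¹ ωe^ω dω = 1, and the two ways of differentiating agree to quadrature accuracy.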
Integration over Sets

The integral of f over a set A in 𝓕 is defined by

(16.10) ∫_A f dμ = ∫ I_A f dμ.

The definition applies if f is defined only on A in the first place (set f = 0 outside A). Notice that ∫_A f dμ = 0 if μ(A) = 0. All the concepts and theorems above carry over in an obvious way to integrals over A. Theorems 16.6 and 16.7 yield this result:
Theorem 16.9. If A₁, A₂, … are disjoint, and if f is either nonnegative or integrable, then ∫_{∪_n A_n} f dμ = Σ_n ∫_{A_n} f dμ.

† Letting h go to 0 through a sequence shows that each f′(·, t) is measurable 𝓕 on A; take it to be 0, say, elsewhere.
SECTION 16. PROPERTIES OF THE INTEGRAL
The integrals (16.10) usually suffice to determine f:

Theorem 16.10. (i) If f and g are nonnegative and ∫_A f dμ = ∫_A g dμ for all A in 𝓕, and if μ is σ-finite, then f = g almost everywhere.
(ii) If f and g are integrable and ∫_A f dμ = ∫_A g dμ for all A in 𝓕, then f = g almost everywhere.
(iii) If f and g are integrable and ∫_A f dμ = ∫_A g dμ for all A in 𝓟, where 𝓟 is a π-system generating 𝓕 and Ω is a finite or countable union of 𝓟-sets, then f = g almost everywhere.

PROOF. Suppose that f and g are nonnegative and that ∫_A f dμ ≤ ∫_A g dμ for all A in 𝓕. If μ is σ-finite, there are 𝓕-sets A_n such that A_n ↑ Ω and μ(A_n) < ∞. If B_n = [0 ≤ g …]

PROBLEMS

16.1. (a) … f_n → f uniformly, and deduce ∫f_n dμ → ∫f dμ from (16.5). (b) Use part (a) and Egoroff's theorem to give another proof of Theorem 16.5.
16.2. Prove that if 0 ≤ f_n → f almost everywhere and ∫f_n dμ ≤ A < ∞, then f is integrable and ∫f dμ ≤ A. (This is essentially the same as Fatou's lemma and is sometimes called by that name.)

16.3. Suppose that the f_n are integrable and sup_n ∫f_n dμ < ∞. Show that, if f_n ↑ f, then f is integrable and ∫f_n dμ → ∫f dμ. This is Beppo Levi's theorem.

16.4. (a) Suppose that functions a_n, b_n, f_n converge almost everywhere to functions a, b, f, respectively. Suppose that the first two sequences may be integrated to the limit, that is, the functions are all integrable and ∫a_n dμ → ∫a dμ, ∫b_n dμ → ∫b dμ. Suppose, finally, that the first two sequences enclose the third: a_n ≤ f_n ≤ b_n almost everywhere. Show that the third may be integrated to the limit.
(b) Deduce Lebesgue's dominated convergence theorem from part (a).

16.5. About Theorem 16.8:
(a) Part (i) is local: there can be a different set A for each t₀. Part (ii) can be recast as a local theorem. Suppose that for ω ∈ A, where A satisfies (16.9),
f(ω, t) has derivative f′(ω, t₀) at t₀; suppose further that

(16.32) |h⁻¹(f(ω, t₀ + h) − f(ω, t₀))| ≤ g₁(ω)

for ω ∈ A and 0 < |h| < δ, where δ is independent of ω and g₁ is integrable. Then φ′(t₀) = ∫f′(ω, t₀) μ(dω). The natural way to check (16.32), however, is by the mean-value theorem, and this requires (for ω ∈ A) a derivative throughout a neighborhood of t₀.
(b) If μ is Lebesgue measure on the unit interval Ω, (a, b) = (0, 1), and f(ω, t) = I…
For positive h,

|h⁻¹ ∫_x^{x+h} f(y) dy − f(x)| ≤ h⁻¹ ∫_x^{x+h} |f(y) − f(x)| dy ≤ sup[|f(y) − f(x)|: x ≤ y ≤ x + h],
and the right side goes to 0 with h if f is continuous at x. The same thing holds for negative h, and therefore ∫_a^x f(y) dy has derivative f(x):

(17.5) d/dx ∫_a^x f(y) dy = f(x)

if f is continuous at x. Suppose that F is a function with continuous derivative F′ = f; suppose, that is, that F is a primitive of the continuous function f. Then

(17.6) ∫_a^b f(x) dx = ∫_a^b F′(x) dx = F(b) − F(a),

as follows from the fact that F(x) − F(a) and ∫_a^x f(y) dy agree at x = a and by (17.5) have identical derivatives. For continuous f, (17.5) and (17.6) are two ways of stating the fundamental theorem of calculus. To the calculation of Lebesgue integrals the methods of elementary calculus thus apply.

As will follow from the general theory of derivatives in Section 31, (17.5) holds outside a set of Lebesgue measure 0 if f is integrable; it need not be continuous. As the following example shows, however, (17.6) can fail for discontinuous f.
Example 17.4. Define F(x) = x² sin x⁻² for 0 < x ≤ ½ and F(x) = 0 for x ≤ 0 and for x ≥ 1; now for ½ < x < 1 define F(x) in such a way that F is continuously differentiable over (0, ∞). Then F is everywhere differentiable, but F′(0) = 0 and F′(x) = 2x sin x⁻² − 2x⁻¹ cos x⁻² for 0 < x < ½. Thus F′ is discontinuous at 0; F′ is, in fact, not even integrable over (0, 1], which makes (17.6) impossible for a = 0. For a more extreme example, decompose (0, 1] into countably many subintervals (a_n, b_n]. Define G(x) = 0 for x ≤ 0 and x > 1, and on (a_n, b_n] define G(x) = F((x − a_n)/(b_n − a_n)). Then G is everywhere differentiable, but (17.6) is impossible for G if (a, b] contains any of the (a_n, b_n], because G′ is not integrable over any of them. ∎

Change of Variable
SECTION 17. THE INTEGRAL WITH RESPECT TO LEBESGUE MEASURE

For a map

(17.7) T: [a, b] → R¹,

the change-of-variable formula is

(17.8) ∫_a^b f(Tx) T′(x) dx = ∫_{Ta}^{Tb} f(y) dy.
If T′ exists and is continuous, and if f is continuous, the two integrals are finite because the integrands are bounded, and to prove (17.8) it is enough to let b be a variable and differentiate with respect to it.† With the obvious limiting arguments, this applies to unbounded intervals and to open ones:

Example 17.5. Put T(x) = tan x on (−π/2, π/2). Then T′(x) = 1 + T²(x), and (17.8) applied to f(y) = (1 + y²)⁻¹ gives

(17.9) ∫_{−∞}^{∞} (1 + y²)⁻¹ dy = π. ∎
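The substitution in Example 17.5 can be checked numerically. The check below is my own sketch: with T(x) = tan x and f(y) = (1 + y²)⁻¹, the substituted integrand f(tan x)(1 + tan² x) is identically 1 on (−π/2, π/2), so the left side is exactly π, and a truncation of the integral over the line approaches the same value.

```python
import math

def midpoint(h, a, b, n=100000):
    # simple midpoint rule for the integral of h over (a, b)
    step = (b - a) / n
    return sum(h(a + (i + 0.5) * step) for i in range(n)) * step

f = lambda y: 1.0 / (1.0 + y * y)
lhs = midpoint(lambda x: f(math.tan(x)) * (1 + math.tan(x) ** 2),
               -math.pi / 2, math.pi / 2)      # substituted integral
rhs = midpoint(f, -1000.0, 1000.0)             # truncation of integral over R

assert abs(lhs - math.pi) < 1e-6
assert abs(rhs - math.pi) < 0.01               # tail beyond |y| = 1000 is ~ 0.002
```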
The Lebesgue Integral in R^k

The k-dimensional Lebesgue integral, the integral in (R^k, 𝓡^k, λ_k), is denoted ∫f(x) dx, x = (x₁, …, x_k). In low-dimensional cases it is also denoted ∫∫f(x₁, x₂) dx₁ dx₂, and so on.

As for the rule for changing variables, suppose that T: U → R^k, where U is an open set in R^k. The map has the form Tx = (t₁(x), …, t_k(x)); it is by definition continuously differentiable if the partial derivatives t_{ij}(x) = ∂t_i(x)/∂x_j exist and are continuous in U. Let D_x = [t_{ij}(x)] be the Jacobian matrix, let J(x) = det D_x be the Jacobian determinant, and let V = TU.

Theorem 17.2. Let T be a continuously differentiable map of the open set U onto V. Suppose that T is one-to-one and that J(x) ≠ 0 for all x. If f is nonnegative, then

(17.10) ∫_U f(Tx) |J(x)| dx = ∫_V f(y) dy.
By the inverse-function theorem [A35], V is open and the inverse mapping T⁻¹ is continuously differentiable. It is assumed in (17.10) that f: V → R¹ is a Borel function. As usual, for the general f, (17.10) holds with |f| in place of f, and if the two sides are finite, the absolute-value bars can be removed; and of course f can be replaced by f I_{TA}.

† See Problem 17.11 for extensions.
Example 17.6. Suppose that T is a nonsingular linear transformation on U = V = R^k. Then D_x is for each x the matrix of the transformation. If T is identified with this matrix, then (17.10) becomes

(17.11) |det T| ∫_U f(Tx) dx = ∫_V f(y) dy.

If f = I_{TA}, this holds because of (12.2), and then it follows in the usual sequence for simple f and for the general nonnegative f: Theorem 17.2 is easy in the linear case. ∎
Example 17.7. In R², take U = [(ρ, θ): ρ > 0, 0 < θ < 2π] and T(ρ, θ) = (ρ cos θ, ρ sin θ). The Jacobian is J(ρ, θ) = ρ, and (17.10) gives the formula for integrating in polar coordinates:

(17.12) ∫∫_{ρ>0, 0<θ<2π} f(ρ cos θ, ρ sin θ) ρ dρ dθ = ∫∫ f(x, y) dx dy. ∎
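Formula (17.12) can be spot-checked numerically. A sketch, with f(x, y) = e^{−(x²+y²)} as my own choice of integrand: in polar coordinates the integral is 2π ∫₀^∞ e^{−ρ²} ρ dρ = π, while the Cartesian integral factors into the square of a one-dimensional Gaussian integral.

```python
import math

def midpoint(h, a, b, n=200000):
    # midpoint rule for the integral of h over (a, b)
    step = (b - a) / n
    return sum(h(a + (i + 0.5) * step) for i in range(n)) * step

# polar: 2*pi * integral of exp(-r^2) * r dr  (tail beyond r = 10 is negligible)
polar = 2 * math.pi * midpoint(lambda r: math.exp(-r * r) * r, 0.0, 10.0)

# Cartesian: (integral of exp(-x^2) dx)^2
one_d = midpoint(lambda x: math.exp(-x * x), -10.0, 10.0)
cartesian = one_d ** 2

assert abs(polar - math.pi) < 1e-6
assert abs(cartesian - math.pi) < 1e-6
```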
(17.14) ∫_A |J(x)| dx = λ_k(TA).

† SPIVAK, p. 72.
Each side of (17.14) is a measure on 𝓡_U = U ∩ 𝓡^k. If 𝒜 consists of the rectangles A satisfying A⁻ ⊂ U, then 𝒜 is a semiring generating 𝓡_U, U is a countable union of 𝒜-sets, and the left side of (17.14) is finite for A in 𝒜 (sup_{A⁻} |J| < ∞). It follows by Corollary 2 to Theorem 11.4 that if (17.14) holds for A in 𝒜, then it holds for A in 𝓡_U. But then (linearity and monotone convergence) (17.13) will follow.

Proof of (17.14) for A in 𝒜. Split the given rectangle A into finitely many subrectangles Q_i satisfying

(17.15) diam Q_i < δ,

δ to be determined. Let x_i be some point of Q_i. Given ε, choose δ in the first place so that |J(x) − J(x′)| < ε if x, x′ ∈ A⁻ and |x − x′| < δ. Then (17.15) implies

(17.16) |Σ_i |J(x_i)| λ_k(Q_i) − ∫_A |J(x)| dx| ≤ ε λ_k(A).

Let Q_i^ε be a rectangle that is concentric with Q_i and similar to it and whose edge lengths are those of Q_i multiplied by 1 + ε. For x in U consider the affine transformation

(17.17) ψ_x z = Tx + D_x(z − x);

ψ_x z will [A34] be a good approximation to Tz for z near x. Suppose, as will be proved in a moment, that for each ε there is a δ such that, if (17.15) holds, then, for each i, ψ_{x_i} approximates T so well on Q_i that
(17.18) TQ_i ⊂ ψ_{x_i} Q_i^ε.

By Theorem 12.2, which shows in the nonsingular case how an affine transformation changes the Lebesgue measures of sets, λ_k(ψ_{x_i} Q_i^ε) = |J(x_i)| λ_k(Q_i^ε). If (17.18) holds, then

(17.19) λ_k(TA) = Σ_i λ_k(TQ_i) ≤ Σ_i λ_k(ψ_{x_i} Q_i^ε) = Σ_i |J(x_i)| λ_k(Q_i^ε) = (1 + ε)^k Σ_i |J(x_i)| λ_k(Q_i).
(This, the central step in the proof, shows where the Jacobian in (17.10) comes from.) If for each ε there is a δ such that (17.15) implies both (17.16) and (17.19), then (17.14) will follow. Thus everything depends on (17.18), and the remaining problem is to show that for each ε there is a δ such that (17.18) holds if (17.15) does.

Proof of (17.18). As (x, z) varies over the compact set A⁻ × [z: |z| = 1], |D_x⁻¹ z| is continuous, and therefore, for some c,

(17.20) |D_x⁻¹ z| ≤ c |z|, x ∈ A⁻.

Since the t_{jl} are uniformly continuous on A⁻, δ can be chosen so that |t_{jl}(z) − t_{jl}(x)| < ε/kc for all j, l if z, x ∈ A⁻ and |z − x| < δ. But then, by linear approximation [A34: (16)], |Tz − Tx − D_x(z − x)| ≤ εc⁻¹ |z − x| < εc⁻¹ δ. If (17.15) holds and δ ≤ 1, then by the definition (17.17),

(17.21) |Tz − ψ_{x_i} z| < ε/c for z ∈ Q_i.
To prove (17.18), note that z ∈ Q_i implies

|ψ_{x_i}⁻¹ Tz − z| = |D_{x_i}⁻¹(Tz − ψ_{x_i} z)| ≤ c |Tz − ψ_{x_i} z| < ε,

where the first inequality follows by (17.20) and the second by (17.21). Since ψ_{x_i}⁻¹ Tz is within ε of the point z of Q_i, it lies in Q_i^ε: ψ_{x_i}⁻¹ Tz ∈ Q_i^ε, or Tz ∈ ψ_{x_i} Q_i^ε. Hence (17.18) holds, which completes the proof. ∎
Stieltjes Integrals

Suppose that F is a function on R^k satisfying the hypotheses of Theorem 12.5, so that there exists a measure μ such that μ(A) = Δ_A F for bounded rectangles A. In integrals with respect to μ, μ(dx) is often replaced by dF(x):

(17.22) ∫_A f(x) dF(x) = ∫_A f(x) μ(dx).

The left side of this equation is the Stieltjes integral of f with respect to F; since it is defined by the right side of the equation, nothing new is involved. Suppose that f is uniformly continuous on a rectangle A, and suppose that A is decomposed into rectangles A_m small enough that |f(x) − f(y)| < ε/μ(A) for x, y ∈ A_m. Then

|∫_A f(x) dF(x) − Σ_m f(x_m) Δ_{A_m} F| < ε

for x_m ∈ A_m. In this case the left side of (17.22) can be defined as the limit of the approximating sums Σ_m f(x_m) Δ_{A_m} F. […]

17.6. … λ[x: |x| ≥ a, |f(x)| ≥ ε] → 0 as a → ∞. Show by example that f(x) need not go to 0 as x → ∞ (even if f is continuous).
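The approximating sums below (17.22) are easy to compute. A minimal sketch under my own choice of data: F(x) = x² on [0, 1], whose associated measure has density 2x, so the Stieltjes integral of f(x) = x is ∫₀¹ x · 2x dx = 2/3; the Riemann-Stieltjes sums Σ_m f(x_m)(F(b_m) − F(a_m)) converge to that value.

```python
def stieltjes_sum(f, F, a, b, n):
    # sum over a decomposition of (a, b] into n equal subintervals of
    # f(x_m) * (F(b_m) - F(a_m)), with x_m the midpoint of each piece
    total = 0.0
    for m in range(n):
        am = a + (b - a) * m / n
        bm = a + (b - a) * (m + 1) / n
        xm = 0.5 * (am + bm)          # any point of [am, bm] works in the limit
        total += f(xm) * (F(bm) - F(am))
    return total

approx = stieltjes_sum(lambda x: x, lambda x: x * x, 0.0, 1.0, 20000)
assert abs(approx - 2.0 / 3.0) < 1e-6
```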
17.7. Let f_n(x) = x^{n−1} − 2x^{2n−1}. Calculate and compare ∫₀¹ Σ_n f_n(x) dx and Σ_n ∫₀¹ f_n(x) dx. Relate this to Theorem 16.6 and to the corollary to Theorem 16.7.

17.8. Show that (1 + y²)⁻¹ has equal integrals over (−∞, −1), (−1, 0), (0, 1), (1, ∞). Conclude from (17.9) that ∫₀¹ (1 + y²)⁻¹ dy = π/4. Expand the integrand in a geometric series and deduce Leibniz's formula

π/4 = 1 − 1/3 + 1/5 − 1/7 + ⋯

by Theorem 16.7 (note that its corollary does not apply).

17.9. Show that if f is integrable, there exist continuous, integrable functions g_n such that g_n(x) → f(x) except on a set of Lebesgue measure 0. (Use Theorem 17.1(ii) with ε = n⁻².)

17.10. 13.9 17.9 ↑ Let f be a finite-valued Borel function over [0, 1]. By the following steps, prove Lusin's theorem: For each ε there exists a continuous function g such that λ[x ∈ (0, 1): f(x) ≠ g(x)] < ε.
(a) Show that f may be assumed integrable, or even bounded.
(b) Let g_n be continuous functions converging to f almost everywhere. Combine Egoroff's theorem and Theorem 12.3 to show that convergence is uniform on a compact set K such that λ((0, 1) − K) < ε. The limit lim_n g_n(x) = f(x) must be continuous when restricted to K.
(c) Exhibit (0, 1) − K as a disjoint union of open intervals I_k [A12], define g as f on K, and define it by linear interpolation on each I_k.
17.11. Suppose in (17.7) that T′ exists and is continuous and f is a Borel function, and suppose that ∫_a^b |f(Tx) T′(x)| dx < ∞. Show in steps that ∫_{T(a,b)} |f(y)| dy < ∞ and (17.8) holds. Prove this for (a) f continuous, (b) f the indicator of an interval, (c) f = I_B, (d) f simple, (e) f ≥ 0, (f) f general.

17.12. 16.12 ↑ Let 𝓛 consist of the continuous functions on R¹ with compact support. Show that 𝓛 is a vector lattice in the sense of Problem 11.4 and has the property that f ∈ 𝓛 implies f ∧ 1 ∈ 𝓛 (note that 1 ∉ 𝓛). Show that the σ-field 𝓕 generated by 𝓛 is 𝓡¹. Suppose Λ is a positive linear functional on 𝓛; show that Λ has the required continuity property if and only if f_n(x) ↓ 0 uniformly in x implies Λ(f_n) → 0. Show under this assumption on Λ that there is a measure μ on 𝓡¹ such that

(17.25) Λ(f) = ∫f dμ, f ∈ 𝓛.

Show that μ is σ-finite and unique. This is a version of the Riesz representation theorem.

17.13. ↑ Let Λ(f) be the Riemann integral of f, which does exist for f in 𝓛. Using the most elementary facts about Riemann integration, show that the μ determined by (17.25) is Lebesgue measure. This gives still another way of constructing Lebesgue measure.

17.14. ↑ Extend the ideas in the preceding two problems to R^k.
SECTION 18. PRODUCT MEASURE AND FUBINI'S THEOREM
Let (X, 𝓧) and (Y, 𝓨) be measurable spaces. For given measures μ and ν on these spaces, the problem is to construct on the Cartesian product X × Y a product measure π such that π(A × B) = μ(A)ν(B) for A ∈ 𝓧 and B ∈ 𝓨. In the case where μ and ν are Lebesgue measure on the line, π will be Lebesgue measure in the plane. The main result is Fubini's theorem, according to which double integrals can be calculated as iterated integrals.
Product Spaces

It is notationally convenient in this section to change from (Ω, 𝓕) to (X, 𝓧) and (Y, 𝓨). In the product space X × Y, a measurable rectangle is a product A × B for which A ∈ 𝓧 and B ∈ 𝓨. The natural class of sets in X × Y to consider is the σ-field 𝓧 × 𝓨 generated by the measurable rectangles. (Of course, 𝓧 × 𝓨 is not a Cartesian product in the usual sense.)

Example 18.1. Suppose that X = Y = R¹ and 𝓧 = 𝓨 = 𝓡¹. Then a measurable rectangle is a Cartesian product A × B in which A and B are linear Borel sets. The term rectangle has up to this point been reserved for Cartesian products of intervals, and so a measurable rectangle is more general. As the measurable rectangles do include the ordinary ones and the latter generate 𝓡², it follows that 𝓡² ⊂ 𝓡¹ × 𝓡¹. On the other hand, if A is an interval, [B: A × B ∈ 𝓡²] contains R¹ (A × R¹ = ∪_n (A × (−n, n]) ∈ 𝓡²) and is closed under the formation of proper differences and countable unions; thus it is a σ-field containing the intervals and hence the Borel sets. Therefore, if B is a Borel set, [A: A × B ∈ 𝓡²] contains the intervals and hence, being a σ-field, contains the Borel sets. Thus all the measurable rectangles are in 𝓡², and so 𝓡¹ × 𝓡¹ = 𝓡² consists exactly of the two-dimensional Borel sets. ∎

As this example shows, 𝓧 × 𝓨 is in general much larger than the class of measurable rectangles.
Theorem 18.1. (i) If E ∈ 𝓧 × 𝓨, then for each x the set [y: (x, y) ∈ E] lies in 𝓨 and for each y the set [x: (x, y) ∈ E] lies in 𝓧.
(ii) If f is measurable 𝓧 × 𝓨, then for each fixed x the function f(x, ·) is measurable 𝓨, and for each fixed y the function f(·, y) is measurable 𝓧.

The set [y: (x, y) ∈ E] is the section of E determined by x, and f(x, ·) is the section of f determined by x.

PROOF. Fix x, and consider the mapping T_x: Y → X × Y defined by T_x y = (x, y). If E = A × B is a measurable rectangle, T_x⁻¹ E is B or ∅ according as A contains x or not, and in either case T_x⁻¹ E ∈ 𝓨. By Theorem 13.1(i), T_x is measurable 𝓨/𝓧 × 𝓨. Hence [y: (x, y) ∈ E] = T_x⁻¹ E ∈ 𝓨 for E ∈ 𝓧 × 𝓨. By Theorem 13.1(ii), if f is measurable 𝓧 × 𝓨/𝓡¹, then f T_x is measurable 𝓨/𝓡¹. Hence f(x, ·) = f T_x(·) is measurable 𝓨. The symmetric statements for fixed y are proved the same way. ∎
Product Measure

Now suppose that (X, 𝓧, μ) and (Y, 𝓨, ν) are measure spaces, and suppose for the moment that μ and ν are finite. By the theorem just proved, ν[y: (x, y) ∈ E] is a well-defined function of x. If 𝓛 is the class of E in 𝓧 × 𝓨 for which this function is measurable 𝓧, it is not hard to show that 𝓛 is a λ-system. Since the function is I_A(x)ν(B) for E = A × B, 𝓛 contains the π-system consisting of the measurable rectangles. Hence 𝓛 coincides with 𝓧 × 𝓨 by the π-λ theorem. It follows without difficulty that

(18.1) π′(E) = ∫_X ν[y: (x, y) ∈ E] μ(dx), E ∈ 𝓧 × 𝓨,

is a finite measure on 𝓧 × 𝓨, and similarly for

(18.2) π″(E) = ∫_Y μ[x: (x, y) ∈ E] ν(dy), E ∈ 𝓧 × 𝓨.

For measurable rectangles,

(18.3) π′(A × B) = π″(A × B) = μ(A) · ν(B).

The class of E in 𝓧 × 𝓨 for which π′(E) = π″(E) thus contains the measurable rectangles; since this class is a λ-system, it contains 𝓧 × 𝓨. The common value π′(E) = π″(E) is the product measure sought.

To show that (18.1) and (18.2) also agree for σ-finite μ and ν, let {A_m} and {B_n} be decompositions of X and Y into sets of finite measure, and put μ_m(A) = μ(A ∩ A_m) and ν_n(B) = ν(B ∩ B_n). Since ν(B) = Σ_n ν_n(B), the integrand in (18.1) is measurable 𝓧 in the σ-finite as well as in the finite case; hence π′ is a well-defined measure on 𝓧 × 𝓨, and so is π″. If π′_{mn} and π″_{mn} are (18.1) and (18.2) for μ_m and ν_n, then by the finite case already treated, π′(E) = Σ_{mn} π′_{mn}(E) = Σ_{mn} π″_{mn}(E) = π″(E) (see Example 16.5). Thus (18.1) and (18.2) coincide in the σ-finite case as well. Moreover, π′(A × B) = Σ_{mn} μ_m(A) ν_n(B) = μ(A) ν(B).

Theorem 18.2. If (X, 𝓧, μ) and (Y, 𝓨, ν) are σ-finite measure spaces, π(E) = π′(E) = π″(E) defines a σ-finite measure on 𝓧 × 𝓨; it is the only measure such that π(A × B) = μ(A) · ν(B) for measurable rectangles.
PROOF. Only σ-finiteness and uniqueness remain to be proved. The products A_m × B_n for {A_m} and {B_n} as above decompose X × Y into measurable rectangles of finite π-measure. This proves both σ-finiteness and uniqueness, since the measurable rectangles form a π-system generating 𝓧 × 𝓨 (Theorem 10.3). ∎

The π thus defined is called product measure; it is usually denoted μ × ν. Note that the integrands in (18.1) and (18.2) may be infinite for certain x and y, which is one reason for introducing functions with infinite values. Note also that (18.3) in some cases requires the conventions (15.2).
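For discrete μ and ν the construction of Theorem 18.2 can be carried out directly. A small sanity check (the weights are arbitrary choices of mine): the set function π(E) = Σ_{(x,y)∈E} μ{x}ν{y} satisfies π(A × B) = μ(A)ν(B), and the two iterated formulas (18.1) and (18.2) agree on non-rectangles as well.

```python
from itertools import product

X, Y = [0, 1, 2], [0, 1]
mu = {0: 0.5, 1: 1.5, 2: 2.0}
nu = {0: 0.25, 1: 0.75}

def pi(E):
    # E is a set of pairs (x, y); pi is the product measure on X x Y
    return sum(mu[x] * nu[y] for (x, y) in E)

A, B = {0, 2}, {1}
rect = set(product(A, B))
assert abs(pi(rect) - sum(mu[x] for x in A) * sum(nu[y] for y in B)) < 1e-12

E = {(0, 0), (1, 1), (2, 0)}  # not a measurable rectangle
pi1 = sum(mu[x] * sum(nu[y] for y in Y if (x, y) in E) for x in X)  # like (18.1)
pi2 = sum(nu[y] * sum(mu[x] for x in X if (x, y) in E) for y in Y)  # like (18.2)
assert abs(pi1 - pi2) < 1e-12
```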
Fubini's Theorem

Integrals with respect to π are usually computed via the formulas

(18.4) ∫_{X×Y} f(x, y) π(d(x, y)) = ∫_X [∫_Y f(x, y) ν(dy)] μ(dx)

and

(18.5) ∫_{X×Y} f(x, y) π(d(x, y)) = ∫_Y [∫_X f(x, y) μ(dx)] ν(dy).

The left sides here are double integrals, and the right sides are iterated integrals. The formulas hold very generally, as the following argument shows.
Consider (18.4). The inner integral on the right is

(18.6) ∫_Y f(x, y) ν(dy).

Because of Theorem 18.1(ii), for f measurable 𝓧 × 𝓨 the integrand here is measurable 𝓨; the question is whether the integral exists, whether (18.6) is measurable 𝓧 as a function of x, and whether it integrates to the left side of (18.4).

First consider nonnegative f. If f = I_E, everything follows from Theorem 18.2: (18.6) is ν[y: (x, y) ∈ E], and (18.4) reduces to π(E) = π′(E). Because of linearity (Theorem 15.1(iv)), if f is a nonnegative simple function, then (18.6) is a linear combination of functions measurable 𝓧 and hence is itself measurable 𝓧; further application of linearity to the two sides of (18.4) shows that (18.4) again holds. The general nonnegative f is the monotone limit of nonnegative simple functions; applying the monotone convergence theorem to (18.6) and then to each side of (18.4) shows that again f has the properties required. Thus for nonnegative f, (18.6) is a well-defined function of x (the value ∞ is not excluded), measurable 𝓧, whose integral satisfies (18.4). If one side of
(18.4) is infinite, so is the other; if both are finite, they have the same finite value.

Now suppose that f, not necessarily nonnegative, is integrable with respect to π. Then the two sides of (18.4) are finite if f is replaced by |f|. Now make the further assumption that

(18.7) ∫_Y |f(x, y)| ν(dy) < ∞

for all x. Then

(18.8) ∫_Y f(x, y) ν(dy) = ∫_Y f⁺(x, y) ν(dy) − ∫_Y f⁻(x, y) ν(dy).

The functions on the right here are measurable 𝓧 and (since f⁺, f⁻ ≤ |f|) integrable with respect to μ, and so the same is true of the function on the left. Integrating out the x and applying (18.4) to f⁺ and to f⁻ gives (18.4) for f itself.

The set A₀ of x satisfying (18.7) need not coincide with X, but μ(X − A₀) = 0 if f is integrable with respect to π, because the function in (18.7) integrates to ∫|f| dπ (Theorem 15.2(iii)). Now (18.8) holds on A₀, (18.6) is measurable 𝓧 on A₀, and (18.4) again follows if the inner integral on the right is given some arbitrary constant value on X − A₀. The same analysis applies to (18.5):

Theorem 18.3. Under the hypotheses of Theorem 18.2, for nonnegative f the functions

(18.9) ∫_Y f(x, y) ν(dy), ∫_X f(x, y) μ(dx)

are measurable 𝓧 and 𝓨, respectively, and (18.4) and (18.5) hold. If f (not necessarily nonnegative) is integrable with respect to π, then the two functions (18.9) are finite and measurable on A₀ and on B₀, respectively, where μ(X − A₀) = ν(Y − B₀) = 0, and again (18.4) and (18.5) hold.
It is understood here that the inner integrals on the right in (18.4) and (18.5) are set equal to 0 (say) outside A₀ and B₀.† This is Fubini's theorem; the part concerning nonnegative f is sometimes called Tonelli's theorem. Application of the theorem usually follows a two-step procedure that parallels its proof. First, one of the iterated integrals is computed (or estimated above) with |f| in place of f. If the result is finite,

† Since two functions that are equal almost everywhere have the same integral, the theory of integration could be extended to functions that are only defined almost everywhere; then A₀ and B₀ would disappear from Theorem 18.3.
then the double integral (the integral with respect to π) of |f| must be finite, so that f is integrable with respect to π; then the value of the double integral of f is found by computing one of the iterated integrals of f. If the iterated integral of |f| is infinite, f is not integrable π.

Example 18.2. Let D_r be the closed disk in the plane with center at the origin and radius r. By (17.12),
λ₂(D_r) = ∫∫_{D_r} dx dy = ∫∫_{0<ρ≤r, 0<θ<2π} ρ dρ dθ = πr².
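The two-step Fubini procedure can be rehearsed numerically. A sketch under my own choice of integrand and grid: f(x, y) = x e^{−xy} on the finite-measure rectangle [0, 1] × [0, 2] is bounded, so the |f| check is immediate, and the two iterated integrals must agree (here the exact value is ∫₀¹ (1 − e^{−2x}) dx = 1 − (1 − e^{−2})/2).

```python
import math

def midpoint(h, a, b, n=1000):
    # midpoint rule for the integral of h over (a, b)
    step = (b - a) / n
    return sum(h(a + (i + 0.5) * step) for i in range(n)) * step

f = lambda x, y: x * math.exp(-x * y)

# iterate y first, then x (as in (18.4)), and in the other order (as in (18.5))
xy_order = midpoint(lambda x: midpoint(lambda y: f(x, y), 0.0, 2.0), 0.0, 1.0)
yx_order = midpoint(lambda y: midpoint(lambda x: f(x, y), 0.0, 1.0), 0.0, 2.0)

exact = 1.0 - (1.0 - math.exp(-2.0)) / 2.0
assert abs(xy_order - yx_order) < 1e-5
assert abs(xy_order - exact) < 1e-5
```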
PROOF. Suppose that p < ∞, and choose {n_k} so that 2^{pk} ‖f_m − f_n‖_p^p < 2^{−k} for m, n ≥ n_k. Since ∫|f_m − f_n|^p dμ ≥ α^p μ[|f_m − f_n| ≥ α] (this is just a general version of Markov's inequality (5.31)), μ[|f_n − f_m| ≥ 2^{−k}] ≤ 2^{pk} ‖f_m − f_n‖_p^p < 2^{−k} for m, n ≥ n_k. Therefore, Σ_k μ[|f_{n_{k+1}} − f_{n_k}| ≥ 2^{−k}] < ∞, and it follows by the first Borel-Cantelli lemma (which works for arbitrary measures) that, outside a set of μ-measure 0, Σ_k |f_{n_{k+1}} − f_{n_k}| converges. But then f_{n_k} converges to some f almost everywhere, and by Fatou's lemma, ∫|f − f_{n_k}|^p dμ ≤ lim inf_j ∫|f_{n_j} − f_{n_k}|^p dμ ≤ 2^{−k}. Therefore, f ∈ L^p and ‖f − f_{n_k}‖_p → 0, as required.

If p = ∞, choose {n_k} so that ‖f_m − f_n‖_∞ < 2^{−k} for m, n ≥ n_k. Since |f_{n_{k+1}} − f_{n_k}| < 2^{−k} almost everywhere, f_{n_k} converges to some f, and |f − f_{n_k}| ≤ 2^{−k+1} almost everywhere. Again, ‖f − f_{n_k}‖_p → 0. ∎
The next theorem has to do with separability.
Theorem 19.2. (i) Let U be the set of simple functions Σ_{i=1}^k a_i I_{B_i} for a_i finite and μ(B_i) finite. For 1 ≤ p < ∞, U is dense in L^p.
(ii) If μ is σ-finite and 𝓕 is countably generated, and if p < ∞, then L^p is separable.
PROOF of (i). Suppose first that p < ∞. For f ∈ L^p, choose (Theorem 13.5) simple functions f_n such that f_n → f and |f_n| ≤ |f|. Then f_n ∈ L^p, and by the dominated convergence theorem, ∫|f − f_n|^p dμ → 0. Therefore, ‖f − f_n‖_p is small for some n; but each f_n is in U. As for the case p = ∞, […]

For f ∈ L^p and g ∈ L^q, the functional γ(f) = ∫fg dμ satisfies |γ(f)| ≤ ‖f‖_p ‖g‖_q and is obviously linear. According to the Riesz representation theorem, this is the most general bounded linear functional in the case p < ∞:

Theorem 19.3. Suppose that μ is σ-finite, that 1 ≤ p < ∞, and that q is conjugate to p. Every bounded linear functional γ on L^p has the form (19.8),

γ(f) = ∫fg dμ,

for some g ∈ L^q; further, (19.9) holds, in the sense that the smallest M satisfying (19.7) is ‖g‖_q, and g is unique up to a set of μ-measure 0.
PROOF. Suppose first that μ is finite, and for A ∈ 𝓕 put φ(A) = γ(I_A). For disjoint A_n, linearity gives φ(∪_{n≤N} A_n) = Σ_{n≤N} φ(A_n), and since |φ(∪_{n>N} A_n)| ≤ M μ^{1/p}(∪_{n>N} A_n) → 0, it follows that φ is an additive set function in the sense of (32.1). The Jordan decomposition (32.2) represents φ as the difference of two finite measures φ⁺ and φ⁻ with disjoint supports A⁺ and A⁻. If μ(A) = 0, then φ⁺(A) = φ(A ∩ A⁺) ≤ M μ^{1/p}(A) = 0. Thus φ⁺ is absolutely continuous with respect to μ and by the Radon-Nikodym theorem (p. 422) has an integrable density g⁺: φ⁺(A) = ∫_A g⁺ dμ. Together with the same result for φ⁻, this shows that there is an integrable g such that γ(I_A) = φ(A) = ∫_A g dμ = ∫ I_A g dμ. Thus γ(f) = ∫fg dμ for simple functions f in L^p.

Assume for the moment that this g lies in L^q, and define γ_g by the equation (19.8). Then γ and γ_g are bounded linear functionals that agree for simple functions; since the latter are dense (Theorem 19.2(i)), it follows by the continuity of γ and γ_g that they agree on all of L^p. It is therefore enough (in the case of finite μ) to prove g ∈ L^q. It will also be shown that ‖g‖_q is at most the M of (19.7); since ‖g‖_q does work as a bound in (19.7), (19.9) will follow. If γ(f) ≡ 0, (19.9) will imply that g = 0 almost everywhere, and for the general γ it will follow further that two functions g satisfying (19.8) must agree almost everywhere.

Suppose that 1 < p < ∞. Let g_n be simple functions such that 0 ≤ g_n ↑ |g|, and take h_n = g_n^{q/p} sgn g. Then h_n g ≥ g_n^{q/p} g_n = g_n^q, and since h_n is simple, it follows that ∫ g_n^q dμ ≤ ∫ h_n g dμ = γ(h_n) ≤ M ‖h_n‖_p = M (∫ g_n^q dμ)^{1/p}; this gives (∫ g_n^q dμ)^{1/q} ≤ M. Now the monotone convergence theorem gives g ∈ L^q and even ‖g‖_q ≤ M.

Suppose that p = 1, so that q = ∞. In this case |γ(f)| = |∫fg dμ| ≤ M ‖f‖₁ for simple functions f in L¹. Take f = I_{[|g| ≥ a]} sgn g. Then a μ[|g| ≥ a] ≤ ∫_{[|g| ≥ a]} |g| dμ = γ(f) ≤ M ‖f‖₁ = M μ[|g| ≥ a]. If a > M, this inequality gives μ[|g| ≥ a] = 0; therefore ‖g‖_∞ ≤ M and g ∈ L^∞ = L^q.

Case μ σ-finite. Let A_n be sets such that A_n ↑ Ω and μ(A_n) < ∞. If μ_n(A) = μ(A ∩ A_n), then |γ(f I_{A_n})| ≤ M ‖f I_{A_n}‖_p ≤ M ‖f‖_p for f ∈ L^p(μ). By the finite case, A_n supports a g_n in L^q such that γ(f I_{A_n}) = ∫ f I_{A_n} g_n dμ for f ∈ L^p and ‖g_n‖_q ≤ M. Because of uniqueness, g_{n+1} can be taken to agree with g_n on A_n. There is therefore a function g on Ω such that g = g_n on A_n and ‖g‖_q ≤ M; it follows that g ∈ L^q. By the dominated convergence theorem, ‖f − f I_{A_n}‖_p → 0 for f ∈ L^p, and the continuity of γ implies γ(f) = lim_n γ(f I_{A_n}) = lim_n ∫ f I_{A_n} g dμ = ∫ fg dμ. Uniqueness follows as before. ∎

† Problem 19.3.
Weak Compactness

For f ∈ L^p and g ∈ L^q, where p and q are conjugate, write

(19.10) (f, g) = ∫fg dμ.

For fixed f in L^p, this defines a bounded linear functional on L^q; for fixed g in L^q, it defines a bounded linear functional on L^p. By Hölder's inequality,

(19.11) |(f, g)| ≤ ‖f‖_p ‖g‖_q.
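Inequality (19.11) is easy to spot-check numerically. A minimal sketch with μ approximated by the uniform measure on a midpoint grid of [0, 1]; the conjugate pair p = 3, q = 3/2 and the particular functions are arbitrary choices of mine. (For the discretized measure the inequality holds exactly, since it is just the finite-dimensional Hölder inequality.)

```python
import math

n = 100000
pts = [(i + 0.5) / n for i in range(n)]
f = [math.exp(x) for x in pts]
g = [1.0 / math.sqrt(x + 0.1) for x in pts]

p, q = 3.0, 1.5
assert abs(1 / p + 1 / q - 1.0) < 1e-12                    # conjugate exponents

pairing = abs(sum(a * b for a, b in zip(f, g)) / n)        # |(f, g)|
norm_p = (sum(abs(a) ** p for a in f) / n) ** (1 / p)      # ||f||_p
norm_q = (sum(abs(b) ** q for b in g) / n) ** (1 / q)      # ||g||_q

assert pairing <= norm_p * norm_q
```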
LP. (f, g) = g) f fn weakl g fn converges --> fnl i I ! y P f, 1, = [fELP : 1 / I P < ] Suppose that is uf i n ite and :Y is countably generated. If oo, then eve1y sequence in B f contains a subsequence converging weakly 1to in S, choose rules o) = x ( · ) is an element of L"", in fact of 8 �, and so by Theorem 19.4 there is a subsequence along which, for each j = 1, . . . , k, oJ">
Then ∫_A δ_j dμ ≥ 0, and ∫(1 − Σ_j δ_j) I_A dμ = lim_n ∫(1 − Σ_j δ_j^{(n)}) I_A dμ = 0, so that δ_j ≥ 0 and Σ_j δ_j = 1 almost everywhere. Since μ is σ-finite, the δ_j can be altered on a set of μ-measure 0 in such a way as to ensure that δ = (δ₁, …, δ_k) is an element of D. But, along the subsequence, the risk points R(δ^{(n)}) converge to R(δ). Therefore:
The set i s compact and vex. conThe rest is geometry. For x in Rk , let Qx be the set of x' such that 0 <x; <x1 for all If x R(o) and x' = R(o'), then o' is better than o if and only if x ' E Qx and is admissible if there exists no better t han o; it makes no sense to x'use a ruleA rule that is not admissible. Geometrically, admissibility means that. for x = R(o), Q x consists of x alone. i. * X.
=
•
-->
0
•
•
01
Sn
>
x
Let = R(o) be given, and suppose that o is not admissible. Since S n Qx is compact, it contains a po int nearest the origin unique, since S n Qx is convex as well as compact); let o' be a conesponding rule: = R(o'). Since o is not admissible, it would x' and o' is better than o. If S n Q x · contained a point distinct from ' be a point of S n Q, nearer the origin than x , which is impossible. This means that Q x · contains no point of S other than x ' itself, which means in turn that 8' is admissible. Therefore, if o is not itself admissible, there is a o ' that is admissible and is bette1 than o . This is expressed by saying that the class of admissible rules is
x'
=l=x,
(x'x'
x',
complete.

Let p = (p_1, ..., p_k) be a probability vector, and view p_i as an a priori probability that f_i is the correct density. A rule δ has Bayes risk R(p, δ) = Σ_i p_i R_i(δ) with respect to p. This is a kind of compound risk: f_i is correct with probability p_i, and the statistician chooses f_j with probability δ_j(ω). A Bayes rule is one that minimizes the Bayes risk for a given p. In this case, take a = R(p, δ) and consider the hyperplane

(19.12) H = [z: Σ_i p_i z_i = a]

and the half space

(19.13) H⁺ = [z: Σ_i p_i z_i ≥ a].

Then x = R(δ) lies on H, and S is contained in H⁺: x is on the boundary of S, and
SECTION 19. THE L^p SPACES 249
H is a supporting hyperplane. If p_i > 0 for all i, then Q_x meets S only at x, and so δ is admissible.

Suppose now that δ is admissible, so that x = R(δ) is the only point in S ∩ Q_x and x lies on the boundary of S. The problem is to show that δ is a Bayes rule, which means finding a supporting hyperplane (19.12) corresponding to a probability vector p. Let T consist of those y for which Q_y meets S. Then T is convex: given a convex combination y'' = Ay + A'y' of points in T, choose in S points z and z' southwest of y and y', respectively, and note that z'' = Az + A'z' lies in S and is southwest of y''. Since S meets Q_x only in the point x, the same is true of T, so that x is a boundary point of T as well as of S. Let (19.12) (p ≠ 0) be a supporting hyperplane through x: x ∈ H and T ⊂ H⁺. If p_i < 0, take z_i = x_i + 1 and take z_j = x_j for the other j; then z lies in T but not in H⁺, a contradiction. (The right-hand figure shows the role of T: the planes H_1 and H_2 both support S, but only H_2 supports T, and only H_2 corresponds to a probability vector.) Thus p_i ≥ 0 for all i, and since Σ_i p_i = 1 can be arranged by normalization, δ is indeed a Bayes rule. Therefore: The admissible rules are Bayes rules, and they form a complete class.

The Space L²
The space L² is special because p = 2 is its own conjugate index. If f, g ∈ L², the inner product (f, g) = ∫ fg dμ is well defined, and by (19.11), |(f, g)| ≤ ||f|| ||g|| (write ||f|| in place of ||f||_2). This is the Schwarz (or Cauchy-Schwarz) inequality. If one of f and g is fixed, (f, g) is a bounded (hence continuous) linear functional in the other. Further, the norm is given by ||f|| = (f, f)^{1/2}, and L² is complete under the metric ||f − g||. A Hilbert space is a vector space on which is defined an inner product having all these properties. The Hilbert space L² is quite like Euclidean space. If (f, g) = 0, then f and g are orthogonal, and orthogonality is like perpendicularity. If f_1, ..., f_n are orthogonal (in pairs), then by linearity, (Σ_i f_i, Σ_j f_j) = Σ_i Σ_j (f_i, f_j) = Σ_i (f_i, f_i): ||Σ_i f_i||² = Σ_i ||f_i||². This is a version of the Pythagorean theorem. If f and g are orthogonal, write f ⊥ g. For every f, f ⊥ 0.

Suppose now that μ is σ-finite and ℱ is countably generated, so that L² is separable as a metric space. The construction that follows gives a (finite or infinite) sequence φ_1, φ_2, ... that is orthonormal in the sense that ||φ_n|| = 1 for all n and (φ_m, φ_n) = 0 for m ≠ n, and is complete in the sense that (f, φ_n) = 0 for all n implies f = 0, so that the orthonormal system cannot be enlarged. Start with a sequence f_1, f_2, ... that is dense in L². Define g_n inductively: Let g_1 = f_1. Suppose that g_1, ..., g_n have been defined and are orthogonal. Define g_{n+1} = f_{n+1} − Σ_{i=1}^n a_{ni} g_i, where a_{ni} = (f_{n+1}, g_i)/||g_i||² if g_i ≠ 0 and a_{ni} is arbitrary if g_i = 0. Then g_{n+1} is orthogonal to g_1, ..., g_n, and g_{n+1} = 0 if and only if f_{n+1} is a linear combination of g_1, ..., g_n. This, the Gram-Schmidt method, gives an orthogonal sequence g_1, g_2, ... with the property that the finite linear combinations of the g_n include all the f_n and are therefore dense in L². If g_n ≠ 0, take φ_n = g_n/||g_n||; if g_n = 0, discard it from the sequence. Then φ_1, φ_2, ... is orthonormal, and the finite linear combinations of the φ_n are still dense. It can happen that all but finitely many of the g_n are 0, in which
case there are only finitely many of the φ_n. In what follows it is assumed that φ_1, φ_2, ... is an infinite sequence; the finite case is analogous and somewhat simpler.

Suppose that f is orthogonal to all the φ_n. If a_1, ..., a_n are arbitrary scalars, then f, a_1 φ_1, ..., a_n φ_n is an orthogonal set, and by the Pythagorean property, ||f − Σ_{i=1}^n a_i φ_i||² = ||f||² + Σ_{i=1}^n a_i² ≥ ||f||². If ||f|| > 0, then f cannot be approximated by finite linear combinations of the φ_n, a contradiction: φ_1, φ_2, ... is a complete orthonormal system.

Consider now a sequence a_1, a_2, ... of scalars for which Σ_{i=1}^∞ a_i² converges. If s_n = Σ_{i=1}^n a_i φ_i, then the Pythagorean theorem gives ||s_n − s_m||² = Σ_{i=m+1}^n a_i². Since the scalar series converges, {s_n} is fundamental and therefore by Theorem 19.1 converges to some g in L². Thus g = lim_n Σ_{i=1}^n a_i φ_i, which it is natural to express as g = Σ_{i=1}^∞ a_i φ_i. The series (that is to say, the sequence of partial sums) converges to g in the mean of order 2 (not almost everywhere). By the following argument, every element of L² has a unique representation in this form.

The Fourier coefficients of f with respect to {φ_n} are the inner products a_i = (f, φ_i). For each n, 0 ≤ ||f − Σ_{i=1}^n a_i φ_i||² = ||f||² − 2 Σ_i a_i (f, φ_i) + Σ_i a_i² = ||f||² − Σ_{i=1}^n a_i², and hence, n being arbitrary, Σ_{i=1}^∞ a_i² ≤ ||f||². By the argument above, the series Σ_{i=1}^∞ a_i φ_i therefore converges. By linearity, (f − Σ_{i=1}^n a_i φ_i, φ_j) = 0 for n ≥ j, and by continuity, (f − Σ_{i=1}^∞ a_i φ_i, φ_j) = 0. Therefore, f − Σ_{i=1}^∞ a_i φ_i is orthogonal to each φ_j and by completeness must
by linearity, so that f − P_M f ⊥ φ_j by continuity. But if f − P_M f is orthogonal to each φ_j, then, again by linearity and continuity, it is orthogonal to the general element Σ_j b_j φ_j of M. Therefore, P_M f ∈ M and f − P_M f ⊥ M. The map f → P_M f is the orthogonal projection on M.
The fundamental properties of P_M are these:

(i) g ∈ M and f − g ⊥ M together imply g = P_M f;
(ii) f ∈ M implies P_M f = f;
(iii) g ∈ M implies ||f − g|| ≥ ||f − P_M f||;
(iv) P_M(af + a'f') = a P_M f + a' P_M f'.
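The projection and its basic properties can be sketched numerically. The following is an illustrative Python sketch, not part of the text's apparatus: vectors in R^5 stand in for elements of L², M is the span of two of them, and the Gram-Schmidt recipe above builds an orthonormal basis; all names here are invented for the illustration.

```python
import random

# Finite-dimensional sketch of P_M: Gram-Schmidt, then
# P_M f = sum over j of the Fourier coefficients (f, phi_j) times phi_j.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    basis = []
    for v in vectors:
        g = v
        for phi in basis:
            c = dot(g, phi)
            g = [a - c * b for a, b in zip(g, phi)]  # subtract projection on phi
        norm = dot(g, g) ** 0.5
        if norm > 1e-12:          # g = 0 means v was a linear combination
            basis.append([a / norm for a in g])
    return basis

def project(f, basis):
    pf = [0.0] * len(f)
    for phi in basis:
        c = dot(f, phi)
        pf = [a + c * b for a, b in zip(pf, phi)]
    return pf

random.seed(0)
basis = gram_schmidt([[random.gauss(0, 1) for _ in range(5)] for _ in range(2)])
f = [random.gauss(0, 1) for _ in range(5)]
pf = project(f, basis)
residual = [a - b for a, b in zip(f, pf)]

# f - P_M f is orthogonal to M, and P_M is idempotent (cf. (i) and (ii)):
assert all(abs(dot(residual, phi)) < 1e-10 for phi in basis)
assert all(abs(a - b) < 1e-10 for a, b in zip(project(pf, basis), pf))
```

The two assertions check exactly the conditions that, by property (i), determine P_M f uniquely.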
Property (i) says that P_M f is uniquely determined by the two conditions P_M f ∈ M and f − P_M f ⊥ M. To prove it, suppose that g, g' ∈ M, f − g ⊥ M, and f − g' ⊥ M. Then g − g' ∈ M and g − g' ⊥ M, so that g − g' is orthogonal to itself and hence ||g − g'||² = 0: g = g'. Thus the mapping P_M is independent of the particular basis {φ_i}; it is determined by M alone. Clearly, (ii) follows from (i); it implies that P_M is idempotent in the sense that P_M² f = P_M f. As for (iii), if g lies in M, so does P_M f − g, so that, by the Pythagorean relation, ||f − g||² = ||f − P_M f||² + ||P_M f − g||² ≥ ||f − P_M f||²; the inequality is strict if g ≠ P_M f. Thus P_M f is the unique point of M lying nearest to f. Property (iv), linearity, follows from (i).

An Estimation Problem
First, the technical setting: Let (Ω, ℱ, μ) and (Θ, 𝒢, π) be a σ-finite space and a probability space, and assume that ℱ and 𝒢 are countably generated. Let f_θ(ω) be a nonnegative function on Θ × Ω, measurable 𝒢 × ℱ, and assume that ∫_Ω f_θ(ω) μ(dω) = 1 for each θ ∈ Θ. For some unknown value of θ, ω is drawn from Ω according to the probabilities P_θ(A) = ∫_A f_θ(ω) μ(dω), and the statistical problem is to estimate the value of g(θ), where g is a real function on Θ. The statistician knows the functions f and g as well as the value of ω; it is the value of θ that is unknown.

For an example, take Ω to be the line, f(ω) a function known to the statistician, and f_θ(ω) = αf(αω + β), where θ = (α, β) specifies unknown scale and location parameters; the problem is to estimate g(θ) = α, say. Or, more simply, as in the exponential case (14.7), take f_θ(ω) = αf(αω), where θ = g(θ) = α.

An estimator of g(θ) is a function t(ω). It is unbiased if

(19.15) ∫_Ω t(ω) f_θ(ω) μ(dω) = g(θ)

for all θ in Θ (assume the integral exists); this condition means that the estimate is on target in an average sense.

A natural loss function is (t(ω) − g(θ))², and if f_θ is the correct density, the risk is taken to be ∫_Ω (t(ω) − g(θ))² f_θ(ω) μ(dω). If the probability measure π is regarded as an a priori distribution for the unknown θ, the Bayes risk of t is

(19.16) R(π, t) = ∫_Θ ∫_Ω (t(ω) − g(θ))² f_θ(ω) μ(dω) π(dθ);

this integral, assumed finite, can be viewed as a joint integral or as an iterated integral (Fubini's theorem). And now t_0 is a Bayes estimator of g with respect to π if it minimizes R(π, t) over t. This is analogous to the Bayes rules discussed earlier. The
following simple projection argument shows that, except in trivial cases, no Bayes estimator is unbiased. Let Q be the probability measure on 𝒢 × ℱ having density f_θ(ω) with respect to π × μ, and let L² be the space of square-integrable functions on (Θ × Ω, 𝒢 × ℱ, Q). Then Q is finite and 𝒢 × ℱ is countably generated. Recall that an element of L² is an equivalence class of functions that are equal almost everywhere with respect to Q. Let G be the class of elements of L² containing a function of the form g̃(θ, ω) = g(θ), functions of θ alone. Then G is a subspace. (That G is algebraically closed is clear; if t_i ∈ G and ||t_i − t|| → 0, then, see the proof of Theorem 19.1, some subsequence converges to t outside a set of Q-measure 0,
The Neyman-Pearson lemma. Suppose f_1 and f_2 are rival densities and L(j|i) is 0 or 1 as j = i or j ≠ i, so that R_i(δ) is the probability of choosing the opposite density when f_i is the right one. Suppose of δ that δ_2(ω) = 1 if f_2(ω) > t f_1(ω) and δ_2(ω) = 0 if f_2(ω) < t f_1(ω), where t > 0. Show that δ is admissible: For any rule δ', ∫ δ'_2 f_1 dμ ≤ ∫ δ_2 f_1 dμ implies ∫ δ'_2 f_2 dμ ≤ ∫ δ_2 f_2 dμ. Hint: ∫ (δ_2 − δ'_2)(f_2 − t f_1) dμ ≥ 0, since the integrand is nonnegative.
19.8. The classical orthonormal basis for L²[0, 2π] with Lebesgue measure is the trigonometric system

(19.17) (2π)^{−1/2}, π^{−1/2} cos nx, π^{−1/2} sin nx, n = 1, 2, ....

Prove orthonormality. Hint: Express the sines and cosines in terms of e^{inx}, multiply out the products, and use the fact that ∫_0^{2π} e^{imx} dx is 2π or 0 as m = 0 or m ≠ 0. (For the completeness of the trigonometric system, see Problem 26.26.)
19.9. Drop the assumption that L² is separable. Order by inclusion the orthonormal systems in L², and let (Zorn's lemma) Φ = [φ_γ: γ ∈ Γ] be maximal.
(a) Show that Γ_f = [γ: (f, φ_γ) ≠ 0] is countable. Hint: Use Σ_{j=1}^n (f, φ_{γ_j})² ≤ ||f||² and the argument for Theorem 10.2(iv).
(b) Let Pf = Σ_{γ ∈ Γ_f} (f, φ_γ) φ_γ. Show that f − Pf ⊥ Φ and hence (maximality) f = Pf. Thus Φ is an orthonormal basis.
(c) Show that Γ is countable if and only if L² is separable.
(d) Now take Φ to be a maximal orthonormal system in a subspace M, and define P_M f = Σ_{γ ∈ Γ_f} (f, φ_γ) φ_γ. Show that P_M f ∈ M and f − P_M f ⊥ Φ, that g = P_M g if g ∈ M, and that f − P_M f ⊥ M. This defines the general orthogonal projection.
CHAPTER 4
Random Variables and Expected Values
SECTION 20. RANDOM VARIABLES AND DISTRIBUTIONS
This section and the next cover random variables and the machinery for dealing with them: expected values, distributions, moment generating functions, independence, convolution.

Random Variables and Vectors
A random variable on a probability space (Ω, ℱ, P) is a real-valued function X = X(ω) measurable ℱ.

Sections 5 through 9 dealt with random variables of a special kind, namely simple random variables, those with finite range. All concepts and facts concerning real measurable functions carry over to random variables; any changes are matters of viewpoint, notation, and terminology only. The positive and negative parts X⁺ and X⁻ of X are defined as in (15.4) and (15.5). Theorem 13.5 also applies: Define

(20.1) ψ_n(x) = (k − 1)2^{−n} if (k − 1)2^{−n} ≤ x < k 2^{−n}, 1 ≤ k ≤ n 2^n; ψ_n(x) = n if x ≥ n.

If X is nonnegative and X_n = ψ_n(X), then 0 ≤ X_n ↑ X. If X is not necessarily nonnegative, define

(20.2) X_n = ψ_n(X) if X ≥ 0; X_n = −ψ_n(−X) if X < 0.

(This is the same as (13.6).) Then 0 ≤ X_n(ω) ↑ X(ω) if X(ω) ≥ 0 and 0 ≥ X_n(ω) ↓ X(ω) if X(ω) ≤ 0; and |X_n(ω)| ↑ |X(ω)| for every ω. The random variable X_n is in each case simple.

A random vector is a mapping from Ω to R^k that is measurable ℱ. Any mapping from Ω to R^k must have the form ω → X(ω) = (X_1(ω), ..., X_k(ω)), where each X_i(ω) is real; as shown in Section 13 (see (13.2)), X is measurable if and only if each X_i is. Thus a random vector is simply a k-tuple X = (X_1, ..., X_k) of random variables.
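The staircase approximation (20.1)-(20.2) can be sketched directly. The following Python fragment is an illustration, not part of the text; the function names are invented.

```python
# The simple function psi_n of (20.1): round a nonnegative x down to the
# nearest (k-1)2^{-n}, capping at n; then X_n = psi_n(X) increases to X.

def psi(x, n):
    """psi_n(x) for x >= 0, as in (20.1)."""
    if x >= n:
        return float(n)
    return int(x * 2 ** n) / 2 ** n   # (k-1)2^{-n} with (k-1)2^{-n} <= x < k2^{-n}

def x_n(x, n):
    """The approximation (20.2), defined for x of either sign."""
    return psi(x, n) if x >= 0 else -psi(-x, n)

x = 3.714
approximations = [x_n(x, n) for n in range(1, 30)]
# Monotone increase toward x, never exceeding it:
assert all(a <= b for a, b in zip(approximations, approximations[1:]))
assert all(a <= x for a in approximations)
assert x - approximations[-1] < 2 ** -20
```

Since multiplying a float by a power of 2 is exact, the truncation really does stay at or below x, mirroring 0 ≤ X_n ↑ X.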
Subfields
If 𝒢 is a σ-field for which 𝒢 ⊂ ℱ, a k-dimensional random vector X is of course measurable 𝒢 if [ω: X(ω) ∈ H] ∈ 𝒢 for every H in ℛ^k. The σ-field σ(X) generated by X is the smallest σ-field with respect to which it is measurable. The σ-field generated by a collection of random vectors is the smallest σ-field with respect to which each one is measurable. As explained in Sections 4 and 5, a sub-σ-field corresponds to partial information about ω. The information contained in σ(X) = σ(X_1, ..., X_k) consists of the k numbers X_1(ω), ..., X_k(ω).†

The following theorem is the analogue of Theorem 5.1, but there are technical complications in its proof.
Theorem 20.1. Let X = (X_1, ..., X_k) be a random vector.
(i) The σ-field σ(X) = σ(X_1, ..., X_k) consists exactly of the sets [X ∈ H] for H ∈ ℛ^k.
(ii) In order that a random variable Y be measurable σ(X) = σ(X_1, ..., X_k), it is necessary and sufficient that there exist a measurable map f: R^k → R¹ such that Y(ω) = f(X_1(ω), ..., X_k(ω)) for all ω.

PROOF. The class 𝒢 of sets of the form [X ∈ H] for H ∈ ℛ^k is a σ-field. Since X is measurable σ(X), 𝒢 ⊂ σ(X). Since X is measurable 𝒢, σ(X) ⊂ 𝒢. Hence part (i).

Measurability of f in part (ii) refers of course to measurability ℛ^k/ℛ¹. The sufficiency is easy: if such an f exists, Theorem 13.1(ii) implies that Y is measurable σ(X). To prove necessity,‡ suppose at first that Y is a simple random variable, and let y_1, ..., y_m be its different possible values. Since A_i = [ω: Y(ω) = y_i] lies in σ(X), it must by part (i) have the form [ω: X(ω) ∈ H_i] for some H_i in ℛ^k. Put f = Σ_i y_i I_{H_i}; certainly f is measurable. Since the A_i are disjoint, no X(ω) can lie in more than one H_i (even though the latter need not be disjoint), and hence f(X(ω)) = Y(ω).
PROOF.
f.
,
=
t The partition defined by (4. 16) consists of the sets [w. X(w) t For a general version of this argument, see Problem 13.3.
=
x ] for x E
R k.
RANDOM VARIABLES AND EXPECTED VALUES
256
To treat the general case, consider simple random variables Y,. such that Y,.( w ) � Y(w) for each w. For each there is a measurable function fn : 1 R k � R such that Y,.(w) = fn( X(w)) for all w. Let M be the set of x in R k for which {fn( x )} converges; by Theorem 13.4{iii), M lies in � k. Let f( x ) = limn fJx) for x in M, and let f( x ) = O for x in R k - M. Since f= lim n fn lM and fn lM is measurable, f is measurable by Theorem 13.4{ii). For each w, Y( w ) = l im n JJ X(w)); this implies in the first place that X( w ) lies in M and in the second place that Y(w) = lim n JJX(w )) = f( X(w)). •
n,
Distributions
The distribution or law of a random variable X was in Section 14 defined as the probability measure on the line given by JL = px - 1 (see (13.7)), or ( 20.3 )
JL( A ) = P[ X t:: A ] ,
A
E �1 •
The distribution function of X was defined by ( 20.4) for real x. The left-hand limit of F satisfies ( 20.5)
F( x - ) = JL ( - oo , x )
= P[ X < x ) ,
F( x ) - F( x - ) = JL { x} = P [ X = x ] ,
and F has at most countably many discontinuities. Further, F is nondecreas ing and right-continuous, and limx --+ - oo F( x ) = 0, lim x --+oo F( x ) 1. By Theo rem 14. 1, for each F with these properties there exists on some probability space a random variable having F as its distribution function. A support for JL is a Borel set S for which JL{S ) = 1. A random variable, its distribution, and its distribution function are discrete if JL has a countable support S = { x 1 , x 2 , . . . }. In this case JL is completely determined by the values JL{ X 1 }, jL{X 2 }, • • • • A familiar discrete distribution is the =
binomial:
(20.6) P[X = r] = μ{r} = C(n, r) p^r (1 − p)^{n−r}, r = 0, 1, ..., n.

There are many random variables, on many spaces, with this distribution: If {X_k} is an independent sequence such that P[X_k = 1] = p and P[X_k = 0] = 1 − p (see Theorem 5.3), then X could be Σ_{i=1}^n X_i, or Σ_{i=n+1}^{2n} X_i, or the sum of any n of the X_i. Or Ω could be {0, 1, ..., n} if ℱ consists of all subsets, P{r} = μ{r}, r = 0, 1, ..., n, and X(r) = r. Or again the space and random variable could be those given by the construction in either of the two proofs of Theorem 14.1. These examples show that, although the distribution of a
random variable X contains all the information about the probabilistic behavior of X itself, it contains beyond this no further information about the underlying probability space (Ω, ℱ, P) or about the interaction of X with other random variables on the space.

Another common discrete distribution is the Poisson distribution with parameter λ > 0:
(20.7) P[X = r] = μ{r} = e^{−λ} λ^r / r!, r = 0, 1, ....
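A small numerical sketch, not part of the text, relates (20.6) and (20.7): for large n and small p with np = λ, the binomial probabilities are close to the Poisson ones. All names below are invented for the illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(n, p, r):
    # (20.6): C(n, r) p^r (1-p)^{n-r}
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

def poisson_pmf(lam, r):
    # (20.7): e^{-lambda} lambda^r / r!
    return exp(-lam) * lam ** r / factorial(r)

lam, n = 2.0, 1000
p = lam / n

# The binomial probabilities add to 1 ...
assert abs(sum(binomial_pmf(n, p, r) for r in range(n + 1)) - 1.0) < 1e-9
# ... and agree with the Poisson probabilities to a few decimal places.
for r in range(10):
    assert abs(binomial_pmf(n, p, r) - poisson_pmf(lam, r)) < 5e-3
```

The closeness of the two mass functions here is the usual Poisson approximation to the binomial; the text itself does not make this comparison at this point.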
A constant c can be regarded as a discrete random variable with X(ω) ≡ c. In this case P[X = c] = μ{c} = 1. For an artificial discrete example, let x_1, x_2, ... be an enumeration of the rationals, and put

(20.8) μ{x_k} = 2^{−k};

the point of the example is that the support need not be contained in a lattice.

A random variable X and its distribution have density f with respect to Lebesgue measure if f is a nonnegative Borel function on R¹ and

(20.9) P[X ∈ A] = μ(A) = ∫_A f(x) dx, A ∈ ℛ¹.

In other words, the requirement is that μ have density f with respect to Lebesgue measure in the sense of (16.11). The density is assumed to be with respect to Lebesgue measure if no other measure is specified. Taking A = R¹ in (20.9) shows that f must integrate to 1. Note that f is determined only to within a set of Lebesgue measure 0: if f = g except on a set of Lebesgue measure 0, then g can also serve as a density for X and μ. It follows by Theorem 3.3 that (20.9) holds for every Borel set A if it holds for every interval, that is, if
F(b) − F(a) = ∫_a^b f(x) dx

holds for every a and b. Note that F need not differentiate to f everywhere (see (20.13), for example); all that is required is that f integrate properly, that is, that (20.9) hold. On the other hand, if F does differentiate to f and f is continuous, it follows by the fundamental theorem of calculus that f is indeed a density for F.†
† The general question of the relation between differentiation and integration is taken up in Section 31.
For the exponential distribution with parameter α > 0, the density is

(20.10) f(x) = 0 if x < 0; f(x) = α e^{−αx} if x ≥ 0.

The corresponding distribution function

(20.11) F(x) = 0 if x < 0; F(x) = 1 − e^{−αx} if x ≥ 0

was studied in Section 14. For the normal distribution with parameters m and σ, σ > 0,

(20.12) f(x) = (σ √(2π))^{−1} exp[−(x − m)²/(2σ²)], −∞ < x < ∞.
For a > 0, (20.16) shows that aX + b has the normal density with parameters am + b and aσ. Finding the density of g(X) from first principles, as in the argument leading to (20.16), often works even if g is many-to-one:
Example 20.1. If X has the standard normal distribution, then

P[X² ≤ x] = P[−√x ≤ X ≤ √x] = 2(2π)^{−1/2} ∫_0^{√x} e^{−t²/2} dt

for x > 0. Hence X² has density

f(x) = 0 if x ≤ 0; f(x) = (2π)^{−1/2} x^{−1/2} e^{−x/2} if x > 0. •
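The density found in Example 20.1 can be checked by simulation. The following Python sketch is an illustration only (names invented): it compares the empirical distribution of X² for normal samples with the integral of the density above.

```python
import math, random

# Monte Carlo check of Example 20.1: the fraction of samples with X^2 <= x
# should match the integral of (2*pi)^{-1/2} t^{-1/2} e^{-t/2} over (0, x].

random.seed(7)
n = 200_000
samples = [random.gauss(0.0, 1.0) ** 2 for _ in range(n)]

def cdf_from_density(x, steps=50_000):
    h = x / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h          # midpoint rule sidesteps the singularity at 0
        total += h * math.exp(-t / 2) / math.sqrt(2 * math.pi * t)
    return total

for x in (0.5, 1.0, 2.0):
    empirical = sum(s <= x for s in samples) / n
    assert abs(empirical - cdf_from_density(x)) < 0.01
```

At x = 1 both quantities are near P[|X| ≤ 1] ≈ 0.6827, as the example's first display predicts.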
Multidimensional Distributions
For a k-dimensional random vector X = (X_1, ..., X_k), the distribution μ (a probability measure on ℛ^k) and the distribution function F (a real function on R^k) are defined by

(20.17) μ(A) = P[X ∈ A], A ∈ ℛ^k; F(x) = μ(S_x), x ∈ R^k,

where S_x = [y: y_i ≤ x_i, i = 1, ..., k] consists of the points "southwest" of x.
Often μ and F are called the joint distribution and joint distribution function of X_1, ..., X_k. Now F is nondecreasing in each variable, and Δ_A F ≥ 0 for bounded rectangles A (see (12.12)). As h decreases to 0, the set S_x^h = [y: y_i ≤ x_i + h, i = 1, ..., k] decreases to S_x, and therefore (Theorem 2.1(ii)) F is continuous from above in the sense that lim_{h↓0} F(x_1 + h, ..., x_k + h) = F(x_1, ..., x_k). Further, F(x_1, ..., x_k) → 0 if x_i → −∞ for some i (the other coordinates held fixed), and F(x_1, ..., x_k) → 1 if x_i → ∞ for each i. For any F with these properties there is by Theorem 12.5 a unique probability measure μ on ℛ^k such that μ(A) = Δ_A F for bounded rectangles A, and μ(S_x) = F(x) for all x.

As h decreases to 0, S_x^{−h} increases to the interior S_x° = [y: y_i < x_i, i = 1, ..., k] of S_x, and so

(20.18) lim_{h↓0} F(x_1 − h, ..., x_k − h) = μ(S_x°).
Since F is nondecreasing in each variable, it is continuous at x if and only if it is continuous from below there in the sense that this last limit coincides with F(x). Thus F is continuous at x if and only if μ(S_x) = μ(S_x°), which holds if and only if the boundary ∂S_x = S_x − S_x° (the y-set where y_i ≤ x_i for all i and y_i = x_i for some i) satisfies μ(∂S_x) = 0.

If k > 1, F can have discontinuity points even if μ has no point masses: if μ corresponds to a uniform distribution of mass over the segment B = [(x, 0): 0 ≤ x ≤ 1] in the plane (μ(A) = λ[x: 0 ≤ x ≤ 1, (x, 0) ∈ A]), then F is discontinuous at each point of B. This also shows that F can be discontinuous at uncountably many points. On the other hand, for fixed x the boundaries ∂S_{(x_1 + h, ..., x_k + h)} are disjoint for different values of h, and so (Theorem 10.2(iv)) only countably many of them can have positive μ-measure. Thus x is the limit of points (x_1 + h, ..., x_k + h) at which F is continuous: the continuity points of F are dense.

There is always a random vector having a given distribution and distribution function: Take (Ω, ℱ, P) = (R^k, ℛ^k, μ) and X(ω) = ω. This is the obvious extension of the construction in the first proof of Theorem 14.1.

The distribution may as for the line be discrete in the sense of having countable support. It may have density f with respect to k-dimensional Lebesgue measure: μ(A) = ∫_A f(x) dx. As in the case k = 1, the distribution μ is more fundamental than the distribution function F, and usually μ is described not by F but by a density or by discrete probabilities.

If X is a k-dimensional random vector and g: R^k → R^i is measurable, then g(X) is an i-dimensional random vector; if the distribution of X is μ, the distribution of g(X) is μg^{−1}, just as in the case k = 1 (see (20.14)). If g_j: R^k → R¹ is defined by g_j(x_1, ..., x_k) = x_j, then g_j(X) is X_j, and its distribution μ_j = μg_j^{−1} is given by μ_j(A) = μ[(x_1, ..., x_k): x_j ∈ A] = P[X_j ∈ A]
•
SECTION 20.
RANDOM VARIABLES AND DISTRIBUTIONS
261
.9i' 1 . The are the marginal distributions of J..L . If J..L has a density f in AR \E then J..Lj has over the line the density J..L i
(20.19)
f;(x) = j f(x p .. . ,x,_ 1, x , x; + �> · · · · xk ) dx1 • • • dx,_ 1 dx; + 1 • • • dxk , since by Fubini ' s theorem the right side integrated over A comes to J..L[(xpNow·· suppose · · x�) XjthatEA].g is a one-to-one, continuously differentiable map of V onto U, where U and V are open sets in R k . Let T be the inverse, and suppose its Jacobian J(x) never vanishes. If X has a density f supported b)" V, then for A c U, P[g(X) E A) = P[ X E TA] = fTA f( y ) dy , and by (17. 10), this equals fA f(Tx)I J(x)l dx. Therefore, g{X) has density x I J I for x E U, ( �( Tx ( ) ) (20 .20) d( x) = for x $. U. Rk - 1
This is the analogue of (20.16).

Example 20.2. Suppose that (X_1, X_2) has density

f(x_1, x_2) = (2π)^{−1} exp[−(x_1² + x_2²)/2],

and let g be the transformation to polar coordinates. Then U, V, and T are as in Example 17.7. If R and Θ are the polar coordinates of (X_1, X_2), then (R, Θ) has density (2π)^{−1} ρ e^{−ρ²/2} in V. By (20.19), R has density ρ e^{−ρ²/2} on (0, ∞), and Θ is uniformly distributed over (0, 2π). •
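Example 20.2 is easy to confirm by simulation. The following Python sketch is illustrative only (names invented): it draws independent standard normals, passes to polar coordinates, and checks the two marginal laws.

```python
import math, random

# Monte Carlo check of Example 20.2: Theta is uniform on (0, 2*pi), and R
# has density r*exp(-r^2/2), i.e. P[R <= r] = 1 - exp(-r^2/2).

random.seed(1)
n = 100_000
angles, radii = [], []
for _ in range(n):
    x1, x2 = random.gauss(0, 1), random.gauss(0, 1)
    angles.append(math.atan2(x2, x1) % (2 * math.pi))
    radii.append(math.hypot(x1, x2))

# Each quarter of (0, 2*pi) receives about a quarter of the angle mass:
for k in range(4):
    frac = sum(k * math.pi / 2 < a <= (k + 1) * math.pi / 2 for a in angles) / n
    assert abs(frac - 0.25) < 0.01

# The distribution function of R matches 1 - exp(-r^2/2):
for r in (0.5, 1.0, 2.0):
    frac = sum(s <= r for s in radii) / n
    assert abs(frac - (1 - math.exp(-r * r / 2))) < 0.01
```

Read in reverse, this is the familiar recipe for generating normal pairs from a uniform angle and a Rayleigh radius.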
For the normal distribution in R^k, see Section 29.

Independence

Random variables X_1, ..., X_k are defined to be independent if the σ-fields σ(X_1), ..., σ(X_k) they generate are independent in the sense of Section 4. This concept for simple random variables was studied extensively in Chapter 1; the general case was touched on in Section 14. Since σ(X_i) consists of the sets [X_i ∈ H] for H ∈ ℛ¹, X_1, ..., X_k are independent if and only if

(20.21) P[X_1 ∈ H_1, ..., X_k ∈ H_k] = P[X_1 ∈ H_1] ⋯ P[X_k ∈ H_k]

for all linear Borel sets H_1, ..., H_k. The definition (4.10) of independence requires that (20.21) hold also if some of the events are suppressed on each side, but this only means taking H_i = R¹.
Suppose that

(20.22) P[X_1 ≤ x_1, ..., X_k ≤ x_k] = P[X_1 ≤ x_1] ⋯ P[X_k ≤ x_k]

for all real x_1, ..., x_k; it then also holds if some of the events [X_i ≤ x_i] are suppressed on each side (let x_i → ∞). Since the intervals (−∞, x] form a π-system generating ℛ¹, the sets [X_i ≤ x] form a π-system generating σ(X_i). Therefore, by Theorem 4.2, (20.22) implies that X_1, ..., X_k are independent. If, for example, the X_i are integer-valued, it is enough that

P[X_1 = n_1, ..., X_k = n_k] = P[X_1 = n_1] ⋯ P[X_k = n_k]

for integral n_1, ..., n_k (see (5.9)).

Let (X_1, ..., X_k) have distribution μ and distribution function F, and let the X_i have distributions μ_i and distribution functions F_i (the marginals). By (20.21), X_1, ..., X_k are independent if and only if μ is product measure in the sense of Section 18:
• • •
,
Xk are independent if and only if
(20.24) Suppose that each integrated over J.L has density
J.L,
fl f ( y1) ) yk k F1( x1) Fk(xk), so that
has density [;; by Fubini' s theorem, is just
(-oo, xd x · · · x { -oo, xd
•
•
•
•
•
•
( 20 . 25) in the case of independence. If are independent IT-fields and X; is measurable .#;, i 1, . . . ' k, then certainly XI , . are independent. are If X; is a d;-dimensional random vector, i 1, . . . , k, then X1 , , u(Xk ) are independent. by definition independent if the u-fields u(X The theory is just as for random variables: X1 , are independent if and E � d1 , Xk ) can be only if (20.21 ) holds for E � dk. Now (X re garded as a random vector of dimension d f.7_ d; ; if J.L is its distribution X R dk and J.L ; is the distribution of X; in R d;, then, just as in R d R d1 before, , X are independent if and only if J.L J.L X X J.L In none of this need the d; components of a single X; be themselves independent random variables. An infinite collection of random variables or random vectors is by defini tion independent if each finite subcollection is. The argument following (5.10)
RANDOM VARIABLES AND DISTRIBUTIONS
263
extends from collections of simple random variables to collections of random vectors: Theorem 20.2.
Suppose that ... ...
( 20.26) •
is an independent collection of random vectors. If .9; is the u-field generated by the i th row, then � .9;, . . . are independent. •
Let J:1f; consist of the finite intersections of sets of the form [ H] with H a Borel set in a space of the appropriate dimension, and apply Theorem 4.2. The u-fields .9; u{J:If;), i 1, . . . , n, are independent for each n, and the result follows. • PROOF.
=
X;j E
=
Each row of (20.26) may be finite or infinite, and there may be finitely or infinitely many rows. As a matter of fact, rows may be uncountable and there may be uncountably many of them. Suppose that X and are independent random vectors with distributions Then ( X, has distribution /-L and in and in and over Let range over By Fubini's theorem,
Y k k k + Rj = Rj . X . Rj R X R Y) J.L v v k ·x Rj y R . · ( 20 .27 ) (J.L x v)(B ) = J v[y: (x, y) E B] J.L(dx ) , Replace B by (A X R k ) n B, where E �j and B E �j + k . Then (20.27) reduces to ( 20.28) (J.L X v) (( A x R k ) n B ) = f v[y : (x, y) EB ]J.L( dx ) , RJ
A
A
) E B] is the x-section of B, so that Bx E� k Theorem = [ (x, B y y x 1 8.1), then P[(x, Y) E B] = P[w : (x, Y(w)) E B] P[w: Y(w) E B ] v(Bx). :
If
=
(
x
=
Expressing the formulas in terms of the random vectors themselves gives this result: Theorem 20.3.
If X and Y are independent random vectors with distribu and then
J.L and v in Rj R\ B E�j + k ( 20 .29 ) P [( X , Y ) EB ] = f P [(x , Y ) EB ] J.L(dx ) ,
tions
Rl
'
RANDOM VARIABLES AND EXPECTED VALUES
264
and
P [ XEA,( X,Y ) E B] f P [(x ,Y ) E B] J.L( dx ) , A E �j , B E �j + k . Suppose that X and Y are independent exponentially distributed random variables. By (20.29), P[Y/X > z] f�P[Yjx x zae-ax dx = (l +z)- 1 • Thus YjX has density x (l +z)-2 = J�e-a z]ae-a dx for z > O. Since P[X> z1, Y/X>z2] = fz� P[Yjx > z2]ae-a x dx by (20.30), • the joint distribution of X and YIX n be calculated as well. (20 .30)
=
A
Example 20.3.
>
=
ca
The formulas (20.29) and (20.30) are constantly applied as in thi example. There is no virtue in making an issue of each case, however, and the appeal to Theorem 20.3 is usua lly silent. Here is a more complicated argument of the same sort. Let , Xn be independent random variables, earh unifo rmly distributed over [0, t ]. X < Yn < t. The X; Let Yk be the k th smallest among the X;, so that 0 < Y < Yn - Yn t - Yn; let M divide [0, t ] into + 1 subintervals of lengths Y , Y2 P[ M < ] . The problem is to show be the maximum of these lengths. Define 1/Jn( t, that
1
Example
, • • •
20.4.
1 1a)= - Y1, •••a ,
n
·
·
·
_
1,
( 20 .31 )
=
where x + ( x + I xi)/2 denotes positive part. Separate consideration of the possibilities 0 < a < t /2, t /2 < a < t , and t 1 . Suppose it is shown that the probability rfii t, satisf.es disposes of the case the recursion ( 20 .32)
s this same recursion, and so it will follow by induction that (20.31) holds for all In intuitive form, the argument for (20.32) is this: If [ M < a] is to hold, the smallest If X1 is the smallest of the X1, then of the X; mus t have some value in [0, , Xn must all lie in [ t ] and divide it into subintervals of length at most the X n probability this is (1 - xjt) - J r/J, _ /t because X , , Xn have probability n of (1 - xjt ) - I of all lying in [ t ], and if they do, they are independe nt and uniformly distributed there. Now (20 .32) results from integrating with respect to the density for xl and multiplying by to allow for the fact that any of XI , . . . , xn may be the smallest. 1. Let be To make this argument rigorous, apply (20.30) for j 1 and k = the interval [0, a], and let consist of the points (x n) for which 0 < X; < t, is the minimum of xn divide [ x 1 , t] into subintervals of length , xn, and at most a. Then P[ X1 = min X;, M < a] = P[ X1 (X1 , , Xn ) 8 ]. Take X1 for
2
,
• . .
n
x, x,
B x1, • • •
x
n.
a]. - x, a),
2
1, x
a;
• • •
n-
=
x2,
• • • ,
• • • ,
EA,
• • •
E
A x1
S ECTION 20.
RANDOM VARIABLES AND DISTR IBUTIONS
X and ( X2, • • • , X,) for
265
Y in (20.30). Since X1 has density ljt,
(20.33) If C is the event that x < X; < t for 2 < i < n, then P(C) = 0 - xjt)" - 1 • A simple calculation shows that P[ X; - X < s" 2 < i < n!C] = n;'= z(sJ(t - X )); in other words, given C, the random variables X2 - x, . . . , X, - x are conditionally independent and uniformly distributed over [0, t - x ]. Now X2 , • • • , X, are random variables on some 5', P); replacing P by P( ·!C) shows that the integrand in probability space (20.33) is the same as that in (20.32). The same argument holds wi th the index 1 replaced by any k (1 < k < n), which gives (20.32). (The events [ Xk = mi n X;, < ] a re not disjoi nt, but any two intersect in a set of probability 11
(D.,
O.)
Y a
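The closed form for the maximum spacing can be compared with direct simulation. The sketch below is a hypothetical check (function names and parameters are illustrative): `psi` implements the alternating-sum formula, and `sim` estimates $P[M \le \alpha]$ by sampling.

```python
import math
import random

def psi(n, t, a):
    """Closed form (20.31): P[M <= a] for n uniform points on [0, t]."""
    return sum((-1) ** i * math.comb(n + 1, i) * max(1 - i * a / t, 0.0) ** n
               for i in range(n + 2))

def sim(n, t, a, trials, seed=0):
    """Direct Monte Carlo estimate of P[M <= a]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = sorted(rng.uniform(0, t) for _ in range(n))
        gaps = [xs[0]] + [v - u for u, v in zip(xs, xs[1:])] + [t - xs[-1]]
        hits += max(gaps) <= a
    return hits / trials
```

For $n = 2$, $t = 1$, $\alpha = 1/2$ the formula gives $1 - 3 \cdot (1/2)^2 = 1/4$, and the simulation agrees to within sampling error; for $\alpha \ge t$ the formula correctly returns 1.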
Sequences of Random Variables
Theorem 5.3 extends to general distributions.

Theorem 20.4. If $\{\mu_n\}$ is a finite or infinite sequence of probability measures on $\mathscr{R}^1$, there exists on some probability space $(\Omega, \mathscr{F}, P)$ an independent sequence $\{X_n\}$ of random variables such that $X_n$ has distribution $\mu_n$.

PROOF. By Theorem 5.3 there exists on some probability space an independent sequence $Z_1, Z_2, \ldots$ of random variables assuming the values 0 and 1 with probabilities $P[Z_n = 0] = P[Z_n = 1] = \frac{1}{2}$. As a matter of fact, Theorem 5.3 is not needed: take the space to be the unit interval and the $Z_n(\omega)$ to be the digits of the dyadic expansion of $\omega$ (the functions $d_n(\omega)$ of Sections 1 and 4). Relabel the countably many random variables $Z_n$ so that they form a double array $Z_{nk}$, $n, k = 1, 2, \ldots$; all the $Z_{nk}$ are independent. Put $U_n = \sum_{k=1}^{\infty} Z_{nk} 2^{-k}$. The series certainly converges, and $U_n$ is a random variable by Theorem 13.4. Further, $U_1, U_2, \ldots$ is, by Theorem 20.2, an independent sequence. Now $P[Z_{ni} = z_i,\ 1 \le i \le k] = 2^{-k}$ for each sequence $z_1, \ldots, z_k$ of 0's and 1's; hence the $2^k$ possible values $j 2^{-k}$, $0 \le j < 2^k$, of $S_{nk} = \sum_{i=1}^k Z_{ni} 2^{-i}$ all have probability $2^{-k}$. If $0 \le x \le 1$, the number of the $j 2^{-k}$ that lie in $[0, x]$ is $\lfloor 2^k x \rfloor + 1$, and therefore $P[S_{nk} \le x] = (\lfloor 2^k x \rfloor + 1)/2^k$. Since $S_{nk}(\omega) \uparrow U_n(\omega)$ as $k \uparrow \infty$, it follows that $[S_{nk} \le x] \downarrow [U_n \le x]$ as $k \uparrow \infty$, and so $P[U_n \le x] = \lim_k P[S_{nk} \le x] = \lim_k (\lfloor 2^k x \rfloor + 1)/2^k = x$ for $0 \le x \le 1$. Thus $U_n$ is uniformly distributed over the unit interval.

The construction thus far establishes the existence of an independent sequence $U_1, U_2, \ldots$ of random variables each uniformly distributed over $[0, 1]$. Let $F_n$ be the distribution function corresponding to $\mu_n$, and put $\varphi_n(u) = \inf[x\colon u \le F_n(x)]$ for $0 < u < 1$. This is the inverse used in Section 14; see (14.5). Set $\varphi_n(u) = 0$, say, for $u$ outside $(0, 1)$, and put $X_n(\omega) = \varphi_n(U_n(\omega))$. Since $\varphi_n(u) \le x$ if and only if $u \le F_n(x)$ (see the argument following (14.5)), $P[X_n \le x] = P[U_n \le F_n(x)] = F_n(x)$. Thus $X_n$ has distribution function $F_n$. And by Theorem 20.2, $X_1, X_2, \ldots$ are independent. ∎
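The quantile transformation $\varphi_n(U_n)$ in the proof is exactly the inverse-transform method used in simulation. The sketch below is an illustrative assumption, not part of the text: it computes $\varphi(u) = \inf[x\colon u \le F(x)]$ by bisection for a continuous, strictly increasing $F$, and samples from an exponential distribution that way.

```python
import math
import random

def phi(F, u, lo, hi, tol=1e-9):
    """phi(u) = inf[x: u <= F(x)], by bisection; F is assumed continuous
    and strictly increasing on [lo, hi] with F(lo) < u <= F(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if F(mid) >= u:
            hi = mid
        else:
            lo = mid
    return hi

# Exponential distribution function with alpha = 2 (an assumed example).
F_exp = lambda x: 1.0 - math.exp(-2.0 * x) if x > 0 else 0.0

rng = random.Random(42)
samples = [phi(F_exp, rng.random(), 0.0, 100.0) for _ in range(10_000)]
mean = sum(samples) / len(samples)  # should approach 1/alpha = 0.5
```

Because $\varphi(U) \le x$ exactly when $U \le F(x)$, the samples have distribution function $F$, and their average is near $1/\alpha = 0.5$.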
This theorem of course includes Theorem 5.3 as a special case, and its proof does not depend on the earlier result. Theorem 20.4 is a special case of Kolmogorov's existence theorem in Section 36.

Convolution

Let $X$ and $Y$ be independent random variables with distributions $\mu$ and $\nu$. Apply (20.27) and (20.29) to the planar set $B = [(x, y)\colon x + y \in H]$ with $H \in \mathscr{R}^1$:

(20.34)    $P[X + Y \in H] = \int_{-\infty}^{\infty} \nu(H - x)\, \mu(dx) = \int_{-\infty}^{\infty} P[Y \in H - x]\, \mu(dx).$

The convolution of $\mu$ and $\nu$ is the measure $\mu * \nu$ defined by

(20.35)    $(\mu * \nu)(H) = \int_{-\infty}^{\infty} \nu(H - x)\, \mu(dx), \qquad H \in \mathscr{R}^1.$

If $X$ and $Y$ are independent and have distributions $\mu$ and $\nu$, (20.34) shows that $X + Y$ has distribution $\mu * \nu$. Since addition of random variables is commutative and associative, the same is true of convolution: $\mu * \nu = \nu * \mu$ and $\mu * (\nu * \eta) = (\mu * \nu) * \eta$.

If $F$ and $G$ are the distribution functions corresponding to $\mu$ and $\nu$, the distribution function corresponding to $\mu * \nu$ is denoted $F * G$. Taking $H = (-\infty, y]$ in (20.35) shows that

(20.36)    $(F * G)(y) = \int_{-\infty}^{\infty} G(y - x)\, dF(x).$

(See (17.22) for the notation $dF(x)$.) If $G$ has density $g$, then $G(y - x) = \int_{-\infty}^{y - x} g(s)\, ds = \int_{-\infty}^{y} g(t - x)\, dt$, and so the right side of (20.36) is $\int_{-\infty}^{y} \big[\int_{-\infty}^{\infty} g(t - x)\, dF(x)\big]\, dt$ by Fubini's theorem. Thus $F * G$ has density $F * g$, where

(20.37)    $(F * g)(y) = \int_{-\infty}^{\infty} g(y - x)\, dF(x);$
this holds if $G$ has density $g$. If, in addition, $F$ has density $f$, (20.37) is denoted $f * g$ and reduces by (16.12) to

(20.38)    $(f * g)(y) = \int_{-\infty}^{\infty} g(y - x) f(x)\, dx.$

This defines convolution for densities, and $\mu * \nu$ has density $f * g$ if $\mu$ and $\nu$ have densities $f$ and $g$. The formula (20.38) can be used for many explicit calculations.
Example 20.5. Let $X_1, \ldots, X_k$ be independent random variables, each with the exponential density (20.10). Define $g_k$ by

(20.39)    $g_k(x) = \frac{\alpha^k x^{k-1}}{(k-1)!}\, e^{-\alpha x}, \qquad x > 0,\ k = 1, 2, \ldots;$

put $g_k(x) = 0$ for $x \le 0$. Now $g_1$ coincides with (20.10), and a direct computation of the integral (20.38) shows that $g_{k-1} * g_1 = g_k$. Since $X_1 + \cdots + X_k$ has density $g_{k-1} * g_1$ if $X_1 + \cdots + X_{k-1}$ has density $g_{k-1}$, it follows by induction that the sum $X_1 + \cdots + X_k$ has density $g_k$. The corresponding distribution function is

(20.40)    $G_k(x) = 1 - \sum_{i=0}^{k-1} e^{-\alpha x}\, \frac{(\alpha x)^i}{i!}, \qquad x > 0,$

as follows by differentiation. ∎
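The convolution (20.38) can also be evaluated numerically by discretizing the integral. The sketch below is an illustrative Riemann-sum approximation (grid parameters are assumptions); it reproduces the $k = 2$ case of (20.39) from two copies of the exponential density.

```python
import math

def convolve(f, g, step, n_points):
    """Riemann-sum approximation to (20.38): (f*g)(y) = integral of
    g(y - x) f(x) dx over a grid of nonnegative x values."""
    xs = [i * step for i in range(n_points)]
    return lambda y: step * sum(g(y - x) * f(x) for x in xs)

alpha = 1.0
g1 = lambda x: alpha * math.exp(-alpha * x) if x > 0 else 0.0            # (20.10)
g2 = lambda x: alpha ** 2 * x * math.exp(-alpha * x) if x > 0 else 0.0   # (20.39), k = 2

g2_num = convolve(g1, g1, 0.001, 10_000)
```

At, say, $y = 1$ and $y = 3$, the numerical convolution agrees with the closed form $g_2(y) = \alpha^2 y e^{-\alpha y}$ to the accuracy of the grid.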
Example 20.6. Suppose that $X$ has the normal density (20.12) with $m = 0$ and $\sigma$, and that $Y$ has the same density with $\tau$ in place of $\sigma$. If $X$ and $Y$ are independent, then $X + Y$ has density

$(f * g)(y) = \frac{1}{2\pi\sigma\tau} \int_{-\infty}^{\infty} \exp\Big[-\frac{x^2}{2\sigma^2} - \frac{(y - x)^2}{2\tau^2}\Big]\, dx.$

Completing the square in the exponent and making the change of variable $u = \dfrac{(\sigma^2 + \tau^2)^{1/2}}{\sigma\tau}\Big(x - \dfrac{\sigma^2 y}{\sigma^2 + \tau^2}\Big)$ reduces this to

$(f * g)(y) = \frac{1}{\sqrt{2\pi(\sigma^2 + \tau^2)}} \exp\Big[-\frac{y^2}{2(\sigma^2 + \tau^2)}\Big].$

Thus $X + Y$ has the normal density with $m = 0$ and with $\sigma^2 + \tau^2$ in place of $\sigma^2$. ∎
If $\mu$ and $\nu$ are arbitrary finite measures on the line, their convolution $\mu * \nu$ is defined by (20.35) even if they are not probability measures.
Convergence in Probability

Random variables $X_n$ converge in probability to $X$, written $X_n \to_P X$, if

(20.41)    $\lim_n P[\,|X_n - X| \ge \epsilon\,] = 0$

for each positive $\epsilon$.† This is the same as (5.7), and the proof of Theorem 5.2 carries over without change (see also Example 5.4).

Theorem 20.5. (i) If $X_n \to X$ with probability 1, then $X_n \to_P X$.
(ii) A necessary and sufficient condition for $X_n \to_P X$ is that each subsequence $\{X_{n_k}\}$ contain a further subsequence $\{X_{n_{k(i)}}\}$ such that $X_{n_{k(i)}} \to X$ with probability 1 as $i \to \infty$.

PROOF. Only part (ii) needs proof. If $X_n \to_P X$, then given $\{n_k\}$, choose a subsequence $\{n_{k(i)}\}$ so that $k \ge k(i)$ implies that $P[\,|X_{n_k} - X| \ge i^{-1}\,] < 2^{-i}$. By the first Borel–Cantelli lemma there is probability 1 that $|X_{n_{k(i)}} - X| < i^{-1}$ for all but finitely many $i$. Therefore, $\lim_i X_{n_{k(i)}}(\omega) = X(\omega)$ with probability 1.

If $X_n$ does not converge to $X$ in probability, there is some positive $\epsilon$ for which $P[\,|X_{n_k} - X| \ge \epsilon\,] \ge \epsilon$ holds along some sequence $\{n_k\}$; then no subsequence of $\{X_{n_k}\}$ can converge to $X$ with probability 1, since by part (i) that would force convergence in probability along it. ∎

The definition makes sense on an arbitrary measure space: if $\mu[\,|f_n - f| \ge \epsilon\,] \to 0$ for each positive $\epsilon$, then $f_n$ converges in measure to $f$.

†This is often expressed $\operatorname{p\,lim}_n X_n = X$.
The Glivenko–Cantelli Theorem*

The empirical distribution function for random variables $X_1, \ldots, X_n$ is the distribution function $F_n(x, \omega)$ with a jump of $n^{-1}$ at each $X_k(\omega)$:

(20.42)    $F_n(x, \omega) = \frac{1}{n} \sum_{k=1}^{n} I_{(-\infty, x]}(X_k(\omega)).$

*This topic may be omitted.
If the $X_k$ have a common unknown distribution function $F(x)$, then $F_n(x, \omega)$ is its natural estimate. The estimate has the right limiting behavior, according to the Glivenko–Cantelli theorem:

Theorem 20.6. Suppose that $X_1, X_2, \ldots$ are independent and have a common distribution function $F$; put $D_n(\omega) = \sup_x |F_n(x, \omega) - F(x)|$. Then $D_n \to 0$ with probability 1.

For each $x$, $F_n(x, \omega)$ as a function of $\omega$ is a random variable. By right continuity, the supremum above is unchanged if $x$ is restricted to the rationals, and therefore $D_n$ is a random variable. The summands in (20.42) are independent, identically distributed simple random variables, and so by the strong law of large numbers (Theorem 6.1), for each $x$ there is a set $A_x$ of probability 0 such that

(20.43)    $\lim_n F_n(x, \omega) = F(x)$

for $\omega \notin A_x$. But Theorem 20.6 says more, namely that (20.43) holds for $\omega$ outside some set $A$ of probability 0, where $A$ does not depend on $x$; as there are uncountably many of the sets $A_x$, it is conceivable a priori that their union might necessarily have positive measure. Further, the convergence in (20.43) is uniform in $x$. Of course, the theorem implies that with probability 1 there is weak convergence $F_n(x, \omega) \Rightarrow F(x)$ in the sense of Section 14.

PROOF OF THE THEOREM. As already observed, the set $A_x$ where (20.43) fails has probability 0. Another application of the strong law of large numbers, with $I_{(-\infty, x)}$ in place of $I_{(-\infty, x]}$ in (20.42), shows that (see (20.5)) $\lim_n F_n(x-, \omega) = F(x-)$ except on a set $B_x$ of probability 0.

Let $\varphi(u) = \inf[x\colon u \le F(x)]$ for $0 < u < 1$ (see (14.5)), and put $x_{m,k} = \varphi(k/m)$ for $m \ge 1$, $1 \le k \le m$. It is not hard to see that $F(\varphi(u)-) \le u \le F(\varphi(u))$; hence $F(x_{m,k}-) - F(x_{m,k-1}) \le m^{-1}$, $F(x_{m,1}-) \le m^{-1}$, and $F(x_{m,m}) \ge 1 - m^{-1}$. Let $D_{m,n}(\omega)$ be the maximum of the quantities $|F_n(x_{m,k}, \omega) - F(x_{m,k})|$ and $|F_n(x_{m,k}-, \omega) - F(x_{m,k}-)|$ for $k = 1, \ldots, m$. If $x_{m,k-1} \le x < x_{m,k}$, then

$F_n(x, \omega) \le F_n(x_{m,k}-, \omega) \le F(x_{m,k}-) + D_{m,n}(\omega) \le F(x) + m^{-1} + D_{m,n}(\omega),$

$F_n(x, \omega) \ge F_n(x_{m,k-1}, \omega) \ge F(x_{m,k-1}) - D_{m,n}(\omega) \ge F(x) - m^{-1} - D_{m,n}(\omega).$

Together with similar arguments for the cases $x < x_{m,1}$ and $x \ge x_{m,m}$, this shows that

(20.44)    $D_n(\omega) \le D_{m,n}(\omega) + m^{-1}.$

If $\omega$ lies outside the union $A$ of all the $A_{x_{m,k}}$ and $B_{x_{m,k}}$, then $\lim_n D_{m,n}(\omega) = 0$ for each $m$ and hence $\lim_n D_n(\omega) = 0$ by (20.44). But $A$ has probability 0. ∎
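For continuous $F$ the supremum defining $D_n$ is attained at a jump of $F_n$, so $D_n$ can be computed exactly from the order statistics. The sketch below is an illustrative check with a uniform $F$ (an assumed example); the decrease of $D_n$ with $n$ is visible.

```python
import random

def sup_distance(sample, F):
    """D_n = sup_x |F_n(x) - F(x)|: for continuous F it suffices to
    compare F with F_n(x) and F_n(x-) at each order statistic."""
    xs = sorted(sample)
    n = len(xs)
    return max(max(abs((i + 1) / n - F(x)), abs(i / n - F(x)))
               for i, x in enumerate(xs))

F_unif = lambda x: min(max(x, 0.0), 1.0)  # uniform [0, 1] distribution function
rng = random.Random(7)
d100 = sup_distance([rng.random() for _ in range(100)], F_unif)
d10000 = sup_distance([rng.random() for _ in range(10_000)], F_unif)
```

With the seed fixed, `d10000` is an order of magnitude smaller than `d100`, consistent with Theorem 20.6.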
PROBLEMS

20.1. 2.11 ↑ A necessary and sufficient condition for a $\sigma$-field $\mathscr{F}$ to be countably generated is that $\mathscr{F} = \sigma(X)$ for some random variable $X$. Hint: If $\mathscr{F} = \sigma(A_1, A_2, \ldots)$, consider $X = \sum_{k=1}^{\infty} f(I_{A_k})/10^k$, where $f(x)$ is 4 for $x = 0$ and 5 for $x \ne 0$.

20.2. If $X$ is a positive random variable with density $f$, then $X^{-1}$ has density $f(1/x)/x^2$. Prove this by (20.16) and by a direct argument.
20.3.
Suppose that a two-dimensional distribution function $F$ has a continuous density $f$. Show that $f(x, y) = \partial^2 F(x, y)/\partial x\, \partial y$.
20.4. The construction in Theorem 20.4 requires only Lebesgue measure on the unit interval. Use the theorem to prove the existence of Lebesgue measure on $R^k$. First construct $\lambda_k$ restricted to $(-n, n] \times \cdots \times (-n, n]$, and then pass to the limit ($n \to \infty$). The idea is to argue from first principles, and not to use previous constructions, such as those in Theorems 12.5 and 18.2.

20.5. Suppose that $A$, $B$, and $C$ are positive, independent random variables with distribution function $F$. Show that the quadratic $Az^2 + Bz + C$ has real zeros with probability $\int_0^{\infty}\!\int_0^{\infty} F(x^2/4y)\, dF(x)\, dF(y)$.

20.6. Show that $X_1, X_2, \ldots$ are independent if $\sigma(X_1, \ldots, X_{n-1})$ and $\sigma(X_n)$ are independent for each $n$.

20.7. Let $X_0, X_1, \ldots$ be a persistent, irreducible Markov chain, and for a fixed state $j$ let $T_1, T_2, \ldots$ be the times of the successive passages through $j$. Let $Z_1 = T_1$ and $Z_n = T_n - T_{n-1}$, $n \ge 2$. Show that $Z_1, Z_2, \ldots$ are independent and that $P[Z_n = k] = f_{jj}^{(k)}$ for $n \ge 2$.

20.8. Ranks and records. Let $X_1, X_2, \ldots$ be independent random variables with a common continuous distribution function. Let $B$ be the $\omega$-set where $X_m(\omega) = X_n(\omega)$ for some pair $m, n$ of distinct integers, and show that $P(B) = 0$. Remove $B$ from the space $\Omega$ on which the $X_n$ are defined. This leaves the joint distributions of the $X_n$ unchanged and makes ties impossible.

Let $T^{(n)}(\omega) = (T_1^{(n)}(\omega), \ldots, T_n^{(n)}(\omega))$ be that permutation $(t_1, \ldots, t_n)$ of $(1, \ldots, n)$ for which $X_{t_1}(\omega) < X_{t_2}(\omega) < \cdots < X_{t_n}(\omega)$. Let $Y_n$ be the rank of $X_n$ among $X_1, \ldots, X_n$: $Y_n = r$ if and only if $X_i < X_n$ for exactly $r - 1$ values of $i$ preceding $n$.
(a) Show that $T^{(n)}$ is uniformly distributed over the $n!$ permutations.
(b) Show that $P[Y_n = r] = 1/n$, $1 \le r \le n$.
(c) Show that $Y_n$ is measurable $\sigma(T^{(n)})$.

20.14. The Cauchy distribution has density

(20.45)    $c_u(x) = \frac{1}{\pi} \cdot \frac{u}{u^2 + x^2}, \qquad -\infty < x < \infty,$

for $u > 0$. (By (17.9), the density integrates to 1.)
(a) Show that $c_u * c_v = c_{u+v}$. Hint: Expand the convolution integrand in partial fractions.
(b) Show that, if $X_1, \ldots, X_n$ are independent and have density $c_u$, then $(X_1 + \cdots + X_n)/n$ has density $c_u$ as well.

20.15. (a) Show that, if $X$ and $Y$ are independent and have the standard normal density, then $X/Y$ has the Cauchy density with $u = 1$.
(b) Show that, if $X$ has the uniform distribution over $(-\pi/2, \pi/2)$, then $\tan X$ has the Cauchy distribution with $u = 1$.
20.16. 18.18 ↑ Let $X_1, \ldots, X_n$ be independent, each having the standard normal distribution. Show that $\chi_n^2 = X_1^2 + \cdots + X_n^2$ has density

(20.46)    $\frac{1}{2^{n/2}\, \Gamma(n/2)}\, x^{n/2 - 1} e^{-x/2}$

over $(0, \infty)$. This is called the chi-squared distribution with $n$ degrees of freedom.
20.17. ↑ The gamma distribution has density

(20.47)    $f(x; \alpha, u) = \frac{\alpha^u}{\Gamma(u)}\, x^{u-1} e^{-\alpha x}$

over $(0, \infty)$ for positive parameters $\alpha$ and $u$. Check that (20.47) integrates to 1; show that

(20.48)    $f(\cdot\,; \alpha, u) * f(\cdot\,; \alpha, v) = f(\cdot\,; \alpha, u + v).$

Note that (20.46) is $f(x; \frac{1}{2}, n/2)$, and from (20.48) deduce again that (20.46) is the density of $\chi_n^2$. Note that the exponential density (20.10) is $f(x; \alpha, 1)$, and from (20.48) deduce (20.39) once again.

20.18. ↑ Let $N, X_1, X_2, \ldots$ be independent, where $P[N = n] = q^{n-1} p$, $n \ge 1$, and each $X_k$ has the exponential density $f(x; \alpha, 1)$. Show that $X_1 + \cdots + X_N$ has density $f(x; \alpha p, 1)$.

20.19. Let $A_{nm}(\epsilon) = [\,|Z_k - Z| \le \epsilon,\ n \le k \le m\,]$. Show that $Z_n \to Z$ with probability 1 if and only if $\lim_n \lim_m P(A_{nm}(\epsilon)) = 1$ for all positive $\epsilon$, whereas $Z_n \to_P Z$ if and only if $\lim_n P(A_{nn}(\epsilon)) = 1$ for all positive $\epsilon$.

20.20. (a) Suppose that $f\colon R^2 \to R^1$ is continuous. Show that $X_n \to_P X$ and $Y_n \to_P Y$ imply $f(X_n, Y_n) \to_P f(X, Y)$.
(b) Show that addition and multiplication preserve convergence in probability.

20.21. Suppose that the sequence $\{X_n\}$ is fundamental in probability in the sense that for positive $\epsilon$ there exists an $N_\epsilon$ such that $P[\,|X_m - X_n| > \epsilon\,] < \epsilon$ for $m, n \ge N_\epsilon$.
(a) Prove that there is a subsequence $\{X_{n_k}\}$ and a random variable $X$ such that $\lim_k X_{n_k} = X$ with probability 1. Hint: Choose increasing $n_k$ such that $P[\,|X_m - X_n| > 2^{-k}\,] < 2^{-k}$ for $m, n \ge n_k$. Analyze $P[\,|X_{n_{k+1}} - X_{n_k}| > 2^{-k}\,]$.
(b) Show that $X_n \to_P X$.

20.22. (a) Suppose that $X_1 \le X_2 \le \cdots$ and that $X_n \to_P X$. Show that $X_n \to X$ with probability 1.
(b) Show by example that in an infinite measure space functions can converge almost everywhere without converging in measure.

20.23. If $X_n \to 0$ with probability 1, then $n^{-1} \sum_{k=1}^n X_k \to 0$ with probability 1 by the standard theorem on Cesàro means [A30]. Show by example that this is not so if convergence with probability 1 is replaced by convergence in probability.

20.24. 2.19 ↑ (a) Show that in a discrete probability space convergence in probability is equivalent to convergence with probability 1.
(b) Show that discrete spaces are essentially the only ones where this equivalence holds: Suppose that $P$ has a nonatomic part in the sense that there is a set $A$ such that $P(A) > 0$ and $P(A \cap \cdot\,)$ is nonatomic. Construct random variables $X_n$ such that $X_n \to_P 0$ but $X_n$ does not converge to 0 with probability 1.
20.25. 20.21 20.24 ↑ Let $d(X, Y)$ be the infimum of those positive $\epsilon$ for which $P[\,|X - Y| \ge \epsilon\,] \le \epsilon$.
(a) Show that $d(X, Y) = 0$ if and only if $X = Y$ with probability 1. Identify random variables that are equal with probability 1, and show that $d$ is a metric on the resulting space.
(b) Show that $X_n \to_P X$ if and only if $d(X_n, X) \to 0$.
(c) Show that the space is complete.
(d) Show that in general there is no metric $d_0$ on this space such that $X_n \to X$ with probability 1 if and only if $d_0(X_n, X) \to 0$.

20.26. Construct in $R^k$ a random variable $X$ that is uniformly distributed over the surface of the unit sphere in the sense that $|X| = 1$ and $UX$ has the same distribution as $X$ for orthogonal transformations $U$. Hint: Let $Z$ be uniformly distributed in the unit ball in $R^k$, define $\psi(x) = x/|x|$ ($\psi(0) = (1, 0, \ldots, 0)$, say), and take $X = \psi(Z)$.

20.27. ↑ Let $\Theta$ and $\Phi$ be the longitude and latitude of a random point on the surface of the unit sphere in $R^3$. Show that $\Theta$ and $\Phi$ are independent, $\Theta$ is uniformly distributed over $[0, 2\pi)$, and $\Phi$ is distributed over $[-\pi/2, +\pi/2]$ with density $\frac{1}{2} \cos \varphi$.
SECTION 21. EXPECTED VALUES

Expected Value as Integral

The expected value of a random variable $X$ on $(\Omega, \mathscr{F}, P)$ is the integral of $X$ with respect to the measure $P$:

$E[X] = \int X\, dP = \int_{\Omega} X(\omega)\, P(d\omega).$

All the definitions, conventions, and theorems of Chapter 3 apply. For nonnegative $X$, $E[X]$ is always defined (it may be infinite); for the general $X$, $E[X]$ is defined, or $X$ has an expected value, if at least one of $E[X^+]$ and $E[X^-]$ is finite, in which case $E[X] = E[X^+] - E[X^-]$; and $X$ is integrable if and only if $E[\,|X|\,] < \infty$. The integral $\int_A X\, dP$ over a set $A$ is defined, as before, as $E[I_A X]$. In the case of simple random variables, the definition reduces to that used in Sections 5 through 9.
Expected Values and Limits

The theorems on integration to the limit in Section 16 apply. A useful fact: If the random variables $X_n$ are dominated by an integrable random variable, or if they are uniformly integrable, then $E[X_n] \to E[X]$ follows if $X_n$ converges to $X$ in probability; convergence with probability 1 is not necessary. This follows easily from Theorem 20.5.
Expected Values and Distributions

Suppose that $X$ has distribution $\mu$. If $g$ is a real function of a real variable, then by the change-of-variable formula (16.17),

(21.1)    $E[g(X)] = \int_{-\infty}^{\infty} g(x)\, \mu(dx).$

(In applying (16.17), replace $T\colon \Omega \to \Omega'$ by $X\colon \Omega \to R^1$, $\mu$ by $P$, $\mu T^{-1}$ by $\mu$, and $f$ by $g$.) This formula holds in the sense explained in Theorem 16.13: It holds in the nonnegative case, so that

(21.2)    $E[\,|g(X)|\,] = \int_{-\infty}^{\infty} |g(x)|\, \mu(dx);$

if one side is infinite, then so is the other. And if the two sides of (21.2) are finite, then (21.1) holds.

If $\mu$ is discrete and $\mu\{x_1, x_2, \ldots\} = 1$, then (21.1) becomes (use Theorem 16.9)

(21.3)    $E[g(X)] = \sum_r g(x_r)\, \mu\{x_r\}.$

If $X$ has density $f$, then (21.1) becomes (use Theorem 16.11)

(21.4)    $E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\, dx.$

If $F$ is the distribution function of $X$ and $\mu$, (21.1) can be written $E[g(X)] = \int_{-\infty}^{\infty} g(x)\, dF(x)$ in the notation (17.22).
Moments

By (21.2), $\mu$ and $F$ determine all the absolute moments of $X$:

(21.5)    $E[\,|X|^k\,] = \int_{-\infty}^{\infty} |x|^k\, \mu(dx) = \int_{-\infty}^{\infty} |x|^k\, dF(x), \qquad k = 1, 2, \ldots.$

Since $|x|^j \le 1 + |x|^k$ for $j \le k$, if $X$ has a finite absolute moment of order $k$, then it has finite absolute moments of orders $1, 2, \ldots, k - 1$ as well. For each $k$ for which (21.5) is finite, $X$ has $k$th moment

(21.6)    $E[X^k] = \int_{-\infty}^{\infty} x^k\, \mu(dx) = \int_{-\infty}^{\infty} x^k\, dF(x).$

These quantities are also referred to as the moments of $\mu$ and of $F$. They can be computed by (21.3) and (21.4) in the appropriate circumstances.
Example 21.1. Consider the normal density (20.12) with $m = 0$ and $\sigma = 1$. For each $k$, $x^k e^{-x^2/2}$ goes to 0 exponentially as $x \to \pm\infty$, and so finite moments of all orders exist. Integration by parts shows that

$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x^k e^{-x^2/2}\, dx = (k - 1) \cdot \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x^{k-2} e^{-x^2/2}\, dx, \qquad k = 2, 3, \ldots.$

(Apply (18.16) to $g(x) = x^{k-2}$ and $f(x) = x e^{-x^2/2}$, and let $a \to -\infty$, $b \to \infty$.) Of course, $E[X] = 0$ by symmetry and $E[X^0] = 1$. It follows by induction that

(21.7)    $E[X^{2k}] = 1 \times 3 \times 5 \times \cdots \times (2k - 1), \qquad k = 1, 2, \ldots,$
and that the odd moments all vanish. ∎

If the first two moments of $X$ are finite and $E[X] = m$, then, just as in Section 5, the variance is

(21.8)    $\operatorname{Var}[X] = E[(X - m)^2] = \int_{-\infty}^{\infty} (x - m)^2\, \mu(dx).$

From Example 21.1 and a change of variable, it follows that a random variable with the normal density (20.12) has mean $m$ and variance $\sigma^2$.

Consider for nonnegative $X$ the relation

(21.9)    $E[X] = \int_0^{\infty} P[X > t]\, dt = \int_0^{\infty} P[X \ge t]\, dt.$

Since $P[X = t]$ can be positive for at most countably many values of $t$, the two integrands differ only on a set of Lebesgue measure 0, and hence the integrals are the same. For $X$ simple and nonnegative, (21.9) was proved in Section 5; see (5.29). For the general nonnegative $X$, let $X_n$ be simple random variables for which $0 \le X_n \uparrow X$ (see (20.1)). By the monotone convergence theorem, $E[X_n] \uparrow E[X]$; moreover, $P[X_n > t] \uparrow P[X > t]$, and therefore $\int_0^{\infty} P[X_n > t]\, dt \uparrow \int_0^{\infty} P[X > t]\, dt$, again by the monotone convergence theorem. Since (21.9) holds for each $X_n$, a passage to the limit establishes (21.9) for $X$ itself. Note that both sides of (21.9) may be infinite; if the integral on the right is finite, then $X$ is integrable.

Replacing $X$ by $X I_{[X > a]}$ leads from (21.9) to

(21.10)    $\int_{[X > a]} X\, dP = a P[X > a] + \int_a^{\infty} P[X > t]\, dt, \qquad a \ge 0.$

As long as $a \ge 0$, this holds even if $X$ is not nonnegative.
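For a simple nonnegative $X$, both sides of (21.9) can be computed exactly, since $P[X > t]$ is then a step function. A sketch with an assumed three-point distribution:

```python
# Check E[X] = integral of P[X > t] dt for a simple nonnegative X;
# X takes value v with probability p.
dist = [(0.5, 1.0), (0.3, 2.5), (0.2, 4.0)]  # (probability, value) pairs

mean = sum(p * v for p, v in dist)

# P[X > t] is constant between consecutive values of X, so the
# integral is a finite sum of rectangle areas.
cuts = [0.0] + sorted(v for _, v in dist)
tail_integral = sum((b - a) * sum(p for p, v in dist if v > a)
                    for a, b in zip(cuts, cuts[1:]))
```

Both `mean` and `tail_integral` come out to $2.05$, illustrating the identity before any limit passage is needed.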
Inequalities

Since the final term in (21.10) is nonnegative,

$a P[X \ge a] \le \int_{[X \ge a]} X\, dP \le E[X]$

for nonnegative $X$. Thus

(21.12)    $P[X \ge a] \le \frac{1}{a} \int_{[X \ge a]} X\, dP \le \frac{1}{a}\, E[X], \qquad a > 0,$

for nonnegative $X$. It is the inequality between the two extreme terms here that usually goes under the name of Markov; but the left-hand inequality is often useful, too. As a special case (apply (21.12) to $(X - m)^2$) there is Chebyshev's inequality,

(21.13)    $P[\,|X - m| \ge a\,] \le \frac{1}{a^2} \operatorname{Var}[X]$

($m = E[X]$).

Jensen's inequality

(21.14)    $\varphi(E[X]) \le E[\varphi(X)]$

holds if $\varphi$ is convex on an interval containing the range of $X$ and if $X$ and $\varphi(X)$ both have expected values. To prove it, let $L(x) = ax + b$ be a supporting line through $(E[X], \varphi(E[X]))$, a line lying entirely under the graph of $\varphi$ [A33]. Then $aX(\omega) + b \le \varphi(X(\omega))$, so that $aE[X] + b \le E[\varphi(X)]$. But the left side of this inequality is $\varphi(E[X])$.

Hölder's inequality is

(21.15)    $E[\,|XY|\,] \le E^{1/p}[\,|X|^p\,]\, E^{1/q}[\,|Y|^q\,], \qquad \frac{1}{p} + \frac{1}{q} = 1,\quad p, q > 1.$

For discrete random variables, this was proved in Section 5; see (5.35). For the general case, choose simple random variables $X_n$ and $Y_n$ satisfying $0 \le |X_n| \uparrow |X|$ and $0 \le |Y_n| \uparrow |Y|$; see (20.2). Then (5.35) and the monotone convergence theorem give (21.15). Notice that (21.15) implies that if $|X|^p$ and $|Y|^q$ are integrable, then so is $XY$.

Schwarz's inequality is the case $p = q = 2$:

(21.16)    $E[\,|XY|\,] \le E^{1/2}[X^2]\, E^{1/2}[Y^2].$

If $X$ and $Y$ have second moments, then $XY$ must have a first moment.
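Markov's inequality (21.12) and Jensen's inequality (21.14) hold in particular for the empirical measure of any finite sample, so they can be verified exactly (not just approximately) on simulated data. A sketch, with an exponential sample as an assumed example:

```python
import random

rng = random.Random(1)
xs = [rng.expovariate(1.0) for _ in range(50_000)]  # nonnegative sample
n = len(xs)

a = 2.0
mean = sum(xs) / n
second_moment = sum(x * x for x in xs) / n
tail = sum(1 for x in xs if x >= a) / n

# Markov (21.12) for the empirical distribution: P[X >= a] <= E[X]/a.
# Jensen (21.14) with phi(x) = x^2: (E[X])^2 <= E[X^2].
```

Here `tail <= mean / a` and `mean ** 2 <= second_moment` hold for every sample, because the inequalities are valid for any probability measure, the empirical one included.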
The same reasoning shows that Lyapounov's inequality (5.37) carries over from the simple to the general case.

Joint Integrals

The relation (21.1) extends to random vectors. Suppose that $(X_1, \ldots, X_k)$ has distribution $\mu$ in $k$-space and $g\colon R^k \to R^1$ is measurable. By Theorem 16.13,

(21.17)    $E[g(X_1, \ldots, X_k)] = \int_{R^k} g(x)\, \mu(dx),$

with the usual provisos about infinite values. For example, $E[X_i X_j] = \int_{R^k} x_i x_j\, \mu(dx)$. If $E[X_i] = m_i$, the covariance of $X_i$ and $X_j$ is $\operatorname{Cov}[X_i, X_j] = E[(X_i - m_i)(X_j - m_j)] = E[X_i X_j] - m_i m_j$. Random variables are uncorrelated if they have covariance 0.
Independence and Expected Value

Suppose that $X$ and $Y$ are independent. If they are also simple, then $E[XY] = E[X]E[Y]$, as proved in Section 5; see (5.25). Define $X_n$ by (20.2), and similarly define $Y_n = \psi_n(Y^+) - \psi_n(Y^-)$. Then $X_n$ and $Y_n$ are independent and simple, so that $E[\,|X_n Y_n|\,] = E[\,|X_n|\,]\, E[\,|Y_n|\,]$, and $0 \le |X_n| \uparrow |X|$, $0 \le |Y_n| \uparrow |Y|$. If $X$ and $Y$ are integrable, then $E[\,|X_n|\,]\, E[\,|Y_n|\,] \uparrow E[\,|X|\,]\, E[\,|Y|\,]$, and it follows by the monotone convergence theorem that $E[\,|XY|\,] = E[\,|X|\,]\, E[\,|Y|\,] < \infty$; since $X_n Y_n \to XY$ and $|X_n Y_n| \le |XY|$, it follows further by the dominated convergence theorem that $E[XY] = \lim_n E[X_n Y_n] = \lim_n E[X_n]\, E[Y_n] = E[X]\, E[Y]$. Therefore, $XY$ is integrable if $X$ and $Y$ are (which is by no means true for dependent random variables), and $E[XY] = E[X]E[Y]$. This argument obviously extends inductively: If $X_1, \ldots, X_k$ are independent and integrable, then the product $X_1 \cdots X_k$ is also integrable and

(21.18)    $E[X_1 X_2 \cdots X_k] = E[X_1]\, E[X_2] \cdots E[X_k].$

Moment Generating Functions

The moment generating function of $X$ (equivalently, of its distribution $\mu$) is

(21.21)    $M(s) = E[e^{sX}] = \int_{-\infty}^{\infty} e^{sx}\, \mu(dx),$

defined for all $s$ for which the integral is finite; in nonprobabilistic contexts it is the Laplace transform of $\mu$. (The integral may be finite only at $s = 0$; this happens, for example, if $\mu\{n\} = \mu\{-n\} = C/n^2$ for $n = 1, 2, \ldots$.) Suppose that $M(s)$ is defined in an interval $(-s_0, s_0)$ with $s_0 > 0$. Since $e^{|sx|} \le e^{sx} + e^{-sx}$, $E[e^{|sX|}] < \infty$ for $|s| < s_0$; and since $\sum_{k=0}^{\infty} |sx|^k/k! = e^{|sx|}$, the series for $e^{sX}$ can be integrated term by term: $X$ has finite moments of all orders, and

(21.22)    $M(s) = \sum_{k=0}^{\infty} \frac{s^k}{k!}\, E[X^k], \qquad |s| < s_0.$

Thus $M(s)$ has a Taylor expansion about 0 with positive radius of convergence if it is defined in some $(-s_0, s_0)$, $s_0 > 0$. If $M(s)$ can somehow be calculated and expanded in a series $\sum_k a_k s^k$, and if the coefficients $a_k$ can be identified, then, since by (21.22) $a_k = E[X^k]/k!$, the moments of $X$ can be computed: $E[X^k] = k!\, a_k$. It also follows from the theory of Taylor expansions [A29] that $E[X^k]$ is the $k$th derivative $M^{(k)}(s)$ evaluated at $s = 0$:

(21.23)    $M^{(k)}(0) = E[X^k] = \int_{-\infty}^{\infty} x^k\, \mu(dx).$

This holds if $M(s)$ exists in some neighborhood of 0.

Suppose now that $M$ is defined in some neighborhood of $s$. If $\nu$ has density $e^{sx}/M(s)$ with respect to $\mu$ (see (16.11)), then $\nu$ has moment generating function $N(u) = M(s + u)/M(s)$ for $u$ in some neighborhood of 0. By (21.23), $N^{(k)}(0) = \int_{-\infty}^{\infty} x^k\, \nu(dx) = \int_{-\infty}^{\infty} x^k e^{sx}\, \mu(dx)/M(s)$, and since $N^{(k)}(0) = M^{(k)}(s)/M(s)$,

(21.24)    $M^{(k)}(s) = \int_{-\infty}^{\infty} x^k e^{sx}\, \mu(dx).$

This holds as long as the moment generating function exists in some neighborhood of $s$; if $s = 0$, it gives (21.23) again. Taking $k = 2$ shows that $M(s)$ is convex in its interval of definition.
Example 21.2. For the standard normal density,

$M(s) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{sx} e^{-x^2/2}\, dx,$

and a change of variable (complete the square and put $x = u + s$) gives

(21.25)    $M(s) = e^{s^2/2} \cdot \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-u^2/2}\, du = e^{s^2/2}.$

The moment generating function is in this case defined for all $s$. Since

$e^{s^2/2} = \sum_{k=0}^{\infty} \frac{1}{k!} \Big(\frac{s^2}{2}\Big)^k = \sum_{k=0}^{\infty} \frac{1 \times 3 \times \cdots \times (2k - 1)}{(2k)!}\, s^{2k},$

the moments can be read off from (21.22), which proves (21.7) once more. ∎

Example 21.3. In the exponential case (20.10), the moment generating function is

(21.26)    $M(s) = \int_0^{\infty} e^{sx}\, \alpha e^{-\alpha x}\, dx = \frac{\alpha}{\alpha - s},$

defined for $s < \alpha$. By (21.22) the $k$th moment is $k!\, \alpha^{-k}$. The mean and variance are thus $\alpha^{-1}$ and $\alpha^{-2}$. ∎

Example 21.4. For the Poisson distribution (20.7),

(21.27)    $M(s) = \sum_{k=0}^{\infty} e^{sk}\, \frac{e^{-\lambda} \lambda^k}{k!} = e^{-\lambda} e^{\lambda e^s} = \exp[\lambda(e^s - 1)].$

Since $M'(s) = \lambda e^s M(s)$ and $M''(s) = (\lambda^2 e^{2s} + \lambda e^s) M(s)$, the first two moments are $M'(0) = \lambda$ and $M''(0) = \lambda^2 + \lambda$; the mean and variance are both $\lambda$. ∎
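The relations $M'(0) = E[X]$ and $M''(0) = E[X^2]$ from (21.23) can be checked by numerical differentiation of (21.27). The sketch below is illustrative, with $\lambda = 3$ an assumed value:

```python
import math

lam = 3.0
M = lambda s: math.exp(lam * (math.exp(s) - 1.0))  # Poisson mgf (21.27)

h = 1e-5
M1 = (M(h) - M(-h)) / (2.0 * h)            # central difference ~ M'(0) = lam
M2 = (M(h) - 2.0 * M(0.0) + M(-h)) / h**2  # second difference ~ M''(0) = lam^2 + lam
```

Here `M1` is close to 3 and `M2` to 12, so the variance `M2 - M1**2` is close to 3; mean and variance both equal $\lambda$, as in Example 21.4.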
Let $X_1, \ldots, X_k$ be independent random variables, and suppose that each $X_i$ has a moment generating function $M_i(s) = E[e^{sX_i}]$ in $(-s_0, s_0)$. For $|s| < s_0$, each $\exp(sX_i)$ is integrable, and, since they are independent, their product $\exp\{s \sum_{i=1}^k X_i\}$ is also integrable (see (21.18)). The moment generating function of $X_1 + \cdots + X_k$ is therefore

(21.28)    $M(s) = M_1(s) \cdots M_k(s)$

in $(-s_0, s_0)$. This relation for simple random variables was essential to the arguments in Section 9.

For simple random variables it was shown in Section 9 that the moment generating function determines the distribution. This will later be proved for general random variables; see Theorem 22.2 for the nonnegative case and Section 30 for the general case.

PROBLEMS
21.1. Prove that

$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-t x^2/2}\, dx = t^{-1/2},$

differentiate $k$ times with respect to $t$ inside the integral (justify), and derive (21.7) again.

21.2. Show that, if $X$ has the standard normal distribution, then $E[\,|X|^{2n+1}\,] = 2^n n! \sqrt{2/\pi}$.

21.3. 20.9 ↑ Records. Consider the sequence of records in the sense of Problem 20.9. Show that the expected waiting time to the next record is infinite.

21.4. 20.14 ↑ Show that the Cauchy distribution has no mean.

21.5. Prove the first Borel–Cantelli lemma by applying Theorem 16.6 to indicator random variables. Why is Theorem 16.6 not enough for the second Borel–Cantelli lemma?

21.6. Prove (21.9) by Fubini's theorem.

21.7. Prove for integrable $X$ that

$E[X] = \int_0^{\infty} P[X > t]\, dt - \int_{-\infty}^0 P[X < t]\, dt.$
21.8. (a) Suppose that $X$ and $Y$ have first moments, and prove

$E[Y] - E[X] = \int_{-\infty}^{\infty} \big(P[X \le t < Y] - P[Y \le t < X]\big)\, dt.$

(b) Let $(X, Y]$ be a nondegenerate random interval. Show that its expected length is the integral with respect to $t$ of the probability that it covers $t$.

21.9. Suppose that $X$ and $Y$ are random variables with distribution functions $F$ and $G$.
(a) Show that if $F$ and $G$ have no common jumps, then $E[F(Y)] + E[G(X)] = 1$.
(b) If $F$ is continuous, then $E[F(X)] = \frac{1}{2}$.
(c) Even if $F$ and $G$ have common jumps, if $X$ and $Y$ are taken to be independent, then $E[F(Y)] + E[G(X)] = 1 + P[X = Y]$.
(d) Even if $F$ has jumps, $E[F(X)] = \frac{1}{2} + \frac{1}{2} \sum_x P^2[X = x]$.

21.10. (a) Show that uncorrelated random variables need not be independent.
(b) Show that $\operatorname{Var}\big[\sum_{i=1}^n X_i\big] = \sum_{i,j=1}^n \operatorname{Cov}[X_i, X_j] = \sum_{i=1}^n \operatorname{Var}[X_i] + 2 \sum_{1 \le i < j \le n} \operatorname{Cov}[X_i, X_j]$. The cross terms drop out if the $X_i$ are uncorrelated, and hence drop out if they are independent.

21.11. ↑ Let $X$, $Y$, and $Z$ be independent random variables such that $X$ and $Y$ assume the values 0, 1, 2 with probability $\frac{1}{3}$ each and $Z$ assumes the values 0 and 1 with probabilities $\frac{1}{3}$ and $\frac{2}{3}$. Let $X' = X$ and $Y' = X + Z$ (mod 3).
(a) Show that $X'$, $Y'$, and $X' + Y'$ have the same one-dimensional distributions as $X$, $Y$, and $X + Y$, respectively, even though $(X', Y')$ and $(X, Y)$ have different distributions.
(b) Show that $X'$ and $Y'$ are dependent but uncorrelated.
(c) Show that, despite dependence, the moment generating function of $X' + Y'$ is the product of the moment generating functions of $X'$ and $Y'$.

21.12. Suppose that $X$ and $Y$ are independent, nonnegative random variables and that $E[X] = \infty$ and $E[Y] = 0$. What is the value common to $E[XY]$ and $E[X]E[Y]$? Use the conventions (15.2) for both the product of the random variables and the product of their expected values. What if $E[X] = \infty$ and $0 < E[Y] < \infty$?

21.13. Suppose that $X$ and $Y$ are independent and that $f(x, y)$ is nonnegative. Put $g(x) = E[f(x, Y)]$ and show that $E[g(X)] = E[f(X, Y)]$. Show more generally that $\int_{[X \in A]} g(X)\, dP = \int_{[X \in A]} f(X, Y)\, dP$. Extend to $f$ that may be negative.

21.14. ↑ The integrability of $X + Y$ does not imply that of $X$ and $Y$ separately. Show that it does if $X$ and $Y$ are independent.

21.15. 20.25 ↑ Write $d_1(X, Y) = E[\,|X - Y|/(1 + |X - Y|)\,]$. Show that this is a metric equivalent to the one in Problem 20.25.

21.16. For the density $C e^{-|x|^{1/2}}$, $-\infty < x < \infty$, show that moments of all orders exist but that the moment generating function exists only at $s = 0$.

21.17. 16.6 ↑ Show that a moment generating function $M(s)$ defined in $(-s_0, s_0)$, $s_0 > 0$, can be extended to a function analytic in the strip $[z\colon -s_0 < \operatorname{Re} z < s_0]$. If $M(s)$ is defined in $[0, s_0)$, $s_0 > 0$, show that it can be extended to a function continuous in $[z\colon 0 \le \operatorname{Re} z < s_0]$ and analytic in $[z\colon 0 < \operatorname{Re} z < s_0]$.

21.18. Use (21.28) to find the moment generating function of (20.39).

21.19. For independent random variables having moment generating functions, show by (21.28) that the variances add.

21.20. 20.17 ↑ Show that the gamma density (20.47) has moment generating function $(1 - s/\alpha)^{-u}$ for $s < \alpha$. Show that the $k$th moment is $u(u + 1) \cdots (u + k - 1)/\alpha^k$. Show that the chi-squared distribution with $n$ degrees of freedom has mean $n$ and variance $2n$.

21.21. Let $X_1, X_2, \ldots$ be identically distributed random variables with finite second moment. Show that $n P[\,|X_1| \ge \epsilon \sqrt{n}\,] \to 0$ and $n^{-1/2} \max_{k \le n} |X_k| \to_P 0$.
SECTION 22. SUMS OF INDEPENDENT RANDOM VARIABLES
Let $X_1, X_2, \ldots$ be a sequence of independent random variables on some probability space. It is natural to ask whether the infinite series $\sum_{k=1}^{\infty} X_k$ converges with probability 1, or, as in Section 6, whether $n^{-1} \sum_{k=1}^{n} X_k$ converges to some limit with probability 1. It is to questions of this sort that the present section is devoted. Throughout the section, $S_n$ will denote the partial sum $\sum_{k=1}^{n} X_k$ ($S_0 = 0$).
The central result is a general version of Theorem 6. 1.
If X 1 , X2 ,
are independent and identically distributed and have finite mean, then S,/n £[ X1] with probability 1. Theorem 22.1.
•
• •
-7
Formerly this theorem stood at the end of a chain of results. The following argument, due to Etemadi, proceeds from first principles. If the theorem holds for nonnegative random variables, then 1 n - L� = 1X:- n - 1 L:� = 1 X; E[XtJ - E[X;l £[X1] with proba bility 1. Assume then that Xk > 0. Consider the truncated random variables Yk Xk l[xk k l and their partial sums S,� L� = 1Yk . For a > 1, temporarily fixed, let u, l a " J. The first step is to prove S, � E [ s: J (22.1 ) u,
PROOF. n - • s, =
=
-7
=
=
11
n
s
=
Since the X_n are independent and identically distributed,

Var[S_n*] = Σ_{k=1}^n Var[Y_k] ≤ Σ_{k=1}^n E[Y_k²] = Σ_{k=1}^n E[X_1² I_{[X_1 ≤ k]}] ≤ n E[X_1² I_{[X_1 ≤ n]}].

It follows by Chebyshev's inequality that the sum in (22.1) is at most

Σ_n (ε u_n)^{−2} Var[S*_{u_n}] ≤ ε^{−2} E[ X_1² Σ_n u_n^{−1} I_{[X_1 ≤ u_n]} ].

Let K = 2α/(α − 1), and suppose x > 0. If N is the smallest n such that u_N ≥ x, then α^N ≥ x, and since ⌊y⌋ ≥ y/2 for y ≥ 1,

Σ_{n: u_n ≥ x} u_n^{−1} ≤ 2 Σ_{n≥N} α^{−n} = 2α^{−N} · α/(α − 1) ≤ Kx^{−1}.

Therefore, Σ_{n=1}^∞ u_n^{−1} I_{[X_1 ≤ u_n]} ≤ K X_1^{−1} for X_1 > 0, and the sum in (22.1) is at most Kε^{−2} E[X_1] < ∞.

From (22.1) it follows by the first Borel–Cantelli lemma (take a union over positive, rational ε) that (S*_{u_n} − E[S*_{u_n}])/u_n → 0 with probability 1. But by the consistency of Cesàro summation [A30], n^{−1} E[S_n*] = n^{−1} Σ_{k=1}^n E[Y_k] has the same limit as E[Y_n], namely E[X_1]. Therefore S*_{u_n}/u_n → E[X_1] with probability 1.
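The limit asserted in Theorem 22.1 is easy to watch numerically. The following is a small illustrative sketch, not part of the text: the X_k are taken exponential with parameter 1 (so E[X_1] = 1, a choice made only for concreteness), and S_n/n is computed for several n.

```python
import random

# Illustrative sketch of Theorem 22.1 (strong law of large numbers):
# for i.i.d. X_k with finite mean, S_n / n should approach E[X_1].
# Here the X_k are exponential with parameter 1, so E[X_1] = 1.
rng = random.Random(7)

def sample_average(n, rng):
    # One realization of S_n / n.
    return sum(rng.expovariate(1.0) for _ in range(n)) / n

averages = {n: sample_average(n, rng) for n in (10, 1000, 100000)}
error_large_n = abs(averages[100000] - 1.0)
```

With 100000 summands the average is typically within a few hundredths of the mean, while for small n it can wander widely; the almost-sure convergence itself is, of course, what the theorem asserts.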
(22.4) M(s) = ∫_{[0,∞)} e^{−sx} μ(dx), s ≥ 0;

here 0 is included in the range of integration. This is the moment generating function (21.21), but the argument has been reflected through the origin. It is a one-sided Laplace transform, defined for all nonnegative s. For positive s, (21.24) gives

(22.5) M^{(k)}(s) = (−1)^k ∫_{[0,∞)} y^k e^{−sy} μ(dy).

Therefore, for positive x and s,†

(22.6) Σ_{k=0}^{⌊sx⌋} (−1)^k (s^k/k!) M^{(k)}(s) = ∫_{[0,∞)} ( Σ_{k=0}^{⌊sx⌋} e^{−sy} (sy)^k/k! ) μ(dy).

Fix x > 0. If 0 ≤ y < x, then the integrand on the right converges to 1 as s → ∞ by (22.3); if y > x, the limit is 0. If μ{x} = 0, the integrand on the right in (22.6) thus converges as s → ∞ to I_{[0,x]}(y) except on a set of μ-measure 0. The bounded convergence theorem then gives

(22.7) lim_{s→∞} Σ_{k=0}^{⌊sx⌋} (−1)^k (s^k/k!) M^{(k)}(s) = μ[0, x].

† If y = 0, the integrand in (22.5) is 1 for k = 0 and 0 for k ≥ 1; hence for y = 0 the integrand on the right in (22.6) is 1.
RANDOM VARIABLES AND EXPECTED VALUES
Thus M(s) determines the value of F at x if x > 0 and μ{x} = 0, which covers all but countably many values of x in [0, ∞). Since F is right-continuous, F itself and hence μ are determined through (22.7) by M(s). In fact μ is by (22.7) determined by the values of M(s) for s beyond an arbitrary s_0:

Theorem 22.2. Let μ and ν be probability measures on [0, ∞). If

∫_{[0,∞)} e^{−sx} μ(dx) = ∫_{[0,∞)} e^{−sx} ν(dx), s ≥ s_0,

where s_0 ≥ 0, then μ = ν.

Corollary. Let f_1 and f_2 be real functions on [0, ∞). If

∫_0^∞ e^{−sx} f_1(x) dx = ∫_0^∞ e^{−sx} f_2(x) dx, s ≥ s_0,

where s_0 ≥ 0, then f_1 = f_2 outside a set of Lebesgue measure 0. The f_i need not be nonnegative, and they need not be integrable, but e^{−sx} f_i(x) must be integrable over [0, ∞) for s ≥ s_0.

PROOF. For the nonnegative case, apply the theorem to the probability densities g_i(x) = e^{−s_0 x} f_i(x)/m, where m = ∫_0^∞ e^{−s_0 x} f_i(x) dx, i = 1, 2. For the general case, prove that f_1^+ + f_2^− = f_2^+ + f_1^− almost everywhere. •

Example 22.1. If μ_1 * μ_2 = μ_3, then the corresponding transforms (22.4) satisfy M_1(s)M_2(s) = M_3(s) for s ≥ 0. If μ_i is the Poisson distribution with mean λ_i, then (see (21.27)) M_i(s) = exp[λ_i(e^{−s} − 1)]. It follows by Theorem 22.2 that if two of the μ_i are Poisson, so is the third, and λ_1 + λ_2 = λ_3. •
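Example 22.1 can be checked numerically. The sketch below is illustrative (λ₁ = 1.5, λ₂ = 2.5, and s = 0.7 are arbitrary choices): it computes the transform M(s) = Σ_n e^{−sn} μ{n} of Poisson laws by truncating the series, and verifies both the closed form exp[λ(e^{−s} − 1)] and the product rule M_1(s)M_2(s) = M_3(s) for the convolution.

```python
import math

def poisson_pmf(lam, n):
    return math.exp(-lam) * lam**n / math.factorial(n)

def transform(lam, s, terms=60):
    # M(s) = sum_n e^{-s n} P[X = n] for X ~ Poisson(lam); the tail
    # beyond 60 terms is negligible for the small lam used here.
    return sum(math.exp(-s * n) * poisson_pmf(lam, n) for n in range(terms))

lam1, lam2, s = 1.5, 2.5, 0.7
m1, m2 = transform(lam1, s), transform(lam2, s)
m3 = transform(lam1 + lam2, s)          # transform of the convolution
closed = math.exp(lam1 * (math.exp(-s) - 1))
product_error = abs(m1 * m2 - m3)
closed_form_error = abs(m1 - closed)
```

The product rule holds exactly in theory; numerically the two errors are at the level of series truncation and rounding.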
Kolmogorov's Zero-One Law

Consider the set A of ω for which n^{−1} Σ_{k=1}^n X_k(ω) → 0 as n → ∞. For each m, the values of X_1(ω), ..., X_{m−1}(ω) are irrelevant to the question of whether or not ω lies in A, and so A ought to lie in the σ-field σ(X_m, X_{m+1}, ...). In fact, lim_n n^{−1} Σ_{k=1}^{m−1} X_k(ω) = 0 for fixed m, and hence ω lies in A if and only if lim_n n^{−1} Σ_{k=m}^n X_k(ω) = 0. Therefore,

(22.8) A = ⋂_ε ⋃_{N≥m} ⋂_{n≥N} [ω: |n^{−1} Σ_{k=m}^n X_k(ω)| < ε],

the first intersection extending over positive rational ε. The set on the inside
lies in σ(X_m, X_{m+1}, ...), and hence so does A. Similarly, the ω-set where the series Σ_n X_n(ω) converges lies in each σ(X_m, X_{m+1}, ...).

The intersection 𝒯 = ⋂_{n=1}^∞ σ(X_n, X_{n+1}, ...) is the tail σ-field associated with the sequence X_1, X_2, ...; its elements are tail events. In the case X_n = I_{A_n}, this is the σ-field (4.29) studied in Section 4. The following general form of Kolmogorov's zero-one law extends Theorem 4.5.

Theorem 22.3. Suppose that {X_n} is independent and that A ∈ 𝒯 = ⋂_{n=1}^∞ σ(X_n, X_{n+1}, ...). Then either P(A) = 0 or P(A) = 1.
PROOF. Let ℱ_0 = ⋃_{k=1}^∞ σ(X_1, ..., X_k). The first thing to establish is that ℱ_0 is a field generating the σ-field σ(X_1, X_2, ...). If B and C lie in ℱ_0, then B ∈ σ(X_1, ..., X_j) and C ∈ σ(X_1, ..., X_k) for some j and k; if m = max{j, k}, then B and C both lie in σ(X_1, ..., X_m), so that B ∪ C ∈ σ(X_1, ..., X_m) ⊂ ℱ_0. Thus ℱ_0 is closed under the formation of finite unions; since it is similarly closed under complementation, ℱ_0 is a field. For H ∈ ℛ^1, [X_n ∈ H] ∈ ℱ_0 ⊂ σ(ℱ_0), and hence X_n is measurable σ(ℱ_0); thus ℱ_0 generates σ(X_1, X_2, ...) (which in general is much larger than ℱ_0).

Suppose that A lies in 𝒯. Then A lies in σ(X_{k+1}, X_{k+2}, ...) for each k. Therefore, if B ∈ σ(X_1, ..., X_k), then A and B are independent by Theorem 20.2. Therefore, A is independent of ℱ_0 and hence by Theorem 4.2 is also independent of σ(X_1, X_2, ...). But then A is independent of itself: P(A ∩ A) = P(A)P(A). Therefore, P(A) = P²(A), which implies that P(A) is either 0 or 1. •
As noted above, the set where Σ_n X_n(ω) converges satisfies the hypothesis of Theorem 22.3, and so does the set where n^{−1} Σ_{k=1}^n X_k(ω) → 0. In many similar cases it is very easy to prove by this theorem that a set at hand must have probability either 0 or 1. But to determine which of 0 and 1 is, in fact, the probability of the set may be extremely difficult.

Maximal Inequalities

Essential to the study of random series are maximal inequalities, inequalities concerning the maxima of partial sums. The best known is that of Kolmogorov.
Theorem 22.4. Suppose that X_1, ..., X_n are independent with mean 0 and finite variances. For α > 0,

(22.9) P[ max_{1≤k≤n} |S_k| ≥ α ] ≤ (1/α²) Var[S_n].
PROOF. Let A_k be the set where |S_k| ≥ α but |S_j| < α for j < k. Since the A_k are disjoint,

E[S_n²] ≥ Σ_{k=1}^n ∫_{A_k} S_n² dP = Σ_{k=1}^n ∫_{A_k} [S_k² + 2S_k(S_n − S_k) + (S_n − S_k)²] dP ≥ Σ_{k=1}^n ∫_{A_k} [S_k² + 2S_k(S_n − S_k)] dP.

Since A_k and S_k are measurable σ(X_1, ..., X_k) and S_n − S_k is measurable σ(X_{k+1}, ..., X_n), and since the means are all 0, it follows by (21.19) and independence that ∫_{A_k} S_k(S_n − S_k) dP = 0. Therefore,

E[S_n²] ≥ Σ_{k=1}^n ∫_{A_k} S_k² dP ≥ α² Σ_{k=1}^n P(A_k) = α² P[ max_{1≤k≤n} |S_k| ≥ α ]. •

By Chebyshev's inequality, P[|S_n| ≥ α] ≤ α^{−2} Var[S_n]. That this can be strengthened to (22.9) is an instance of a general phenomenon: For sums of independent variables, if max_{k≤n} |S_k| is large, then |S_n| is probably large as well. Theorem 9.6 is an instance of this, and so is the following result, due to Etemadi.

Theorem 22.5. Suppose that X_1, ..., X_n are independent. For α ≥ 0,

(22.10) P[ max_{1≤k≤n} |S_k| ≥ 3α ] ≤ 3 max_{1≤k≤n} P[ |S_k| ≥ α ].

PROOF. Let B_k be the set where |S_k| ≥ 3α but |S_j| < 3α for j < k. Since the B_k are disjoint,

P[ max_{1≤k≤n} |S_k| ≥ 3α ] ≤ P[|S_n| ≥ α] + Σ_{k=1}^{n−1} P(B_k ∩ [|S_n| < α])
≤ P[|S_n| ≥ α] + Σ_{k=1}^{n−1} P(B_k ∩ [|S_n − S_k| > 2α])
= P[|S_n| ≥ α] + Σ_{k=1}^{n−1} P(B_k) P[|S_n − S_k| > 2α]
≤ P[|S_n| ≥ α] + max_{1≤k≤n} P[|S_n − S_k| > 2α]
≤ P[|S_n| ≥ α] + max_{1≤k≤n} (P[|S_n| ≥ α] + P[|S_k| ≥ α])
≤ 3 max_{1≤k≤n} P[ |S_k| ≥ α ]. •
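Both inequalities are easy to probe by simulation. The following illustrative sketch (a ±1 random walk with n = 50 steps and α = 10, values chosen arbitrarily) estimates the left side of Kolmogorov's inequality (22.9) by Monte Carlo and compares it with the bound Var[S_n]/α².

```python
import random

rng = random.Random(12)

# Monte Carlo check of Kolmogorov's inequality (22.9) for a +/-1 walk:
# P[max_k |S_k| >= alpha] <= Var[S_n] / alpha^2, and Var[S_n] = n here.
n, trials, alpha = 50, 20000, 10.0
hits = 0
for _ in range(trials):
    s, peak = 0, 0
    for _ in range(n):
        s += rng.choice((-1, 1))
        peak = max(peak, abs(s))
    if peak >= alpha:
        hits += 1

p_max = hits / trials
bound = n / alpha**2   # = 0.5
```

The empirical probability comes out well under the bound, and noticeably larger than P[|S_n| ≥ α] alone, which is the point of working with the maximum.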
If the X_k have mean 0 and Chebyshev's inequality is applied to the right side of (22.10), and if α is replaced by α/3, the result is Kolmogorov's inequality (22.9) with an extra factor of 27 on the right side. For this reason, the two inequalities are equally useful for the applications in this section.

Convergence of Random Series
For independent X_n, the probability that ΣX_n converges is either 0 or 1. It is natural to try to characterize the two cases in terms of the distributions of the individual X_n.

Theorem 22.6. Suppose that {X_n} is an independent sequence and E[X_n] = 0. If Σ Var[X_n] < ∞, then ΣX_n converges with probability 1.
PROOF. By (22.9),

P[ max_{1≤k≤r} |S_{n+k} − S_n| ≥ ε ] ≤ ε^{−2} Σ_{k=n+1}^{n+r} Var[X_k].

Since the sets on the left are nondecreasing in r, letting r → ∞ gives

P[ sup_{k≥1} |S_{n+k} − S_n| ≥ ε ] ≤ ε^{−2} Σ_{k=n+1}^∞ Var[X_k].

Since Σ Var[X_n] converges,

(22.11) lim_n P[ sup_{k≥1} |S_{n+k} − S_n| ≥ ε ] = 0

for each ε. Let E(n, ε) be the set where sup_{j,k≥n} |S_j − S_k| ≥ 2ε, and put E(ε) = ⋂_n E(n, ε). Then E(n, ε) ↓ E(ε), and (22.11) implies P(E(ε)) = 0. Now ⋃_ε E(ε), where the union extends over positive rational ε, contains the set where the sequence {S_n} is not fundamental (does not have the Cauchy property), and this set therefore has probability 0. •

Example 22.2. Let X_n(ω) = r_n(ω)a_n, where the r_n are the Rademacher functions on the unit interval (see (1.13)). Then X_n has variance a_n², and so Σ a_n² < ∞ implies that Σ r_n(ω)a_n converges with probability 1. An interesting special case is a_n = n^{−1}. If the signs in Σ ± n^{−1} are chosen on the toss of a coin, then the series converges with probability 1. The alternating harmonic series 1 − 2^{−1} + 3^{−1} − ··· is thus typical in this respect. •
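The random harmonic series of Example 22.2 can be watched settling down. The sketch below is illustrative (one sample path, fixed seed): since the tail of the series past N has variance Σ_{n>N} 1/n² < 1/N, partial sums recorded at 10000 and 20000 terms should differ very little.

```python
import random

rng = random.Random(3)

# One sample path of the random signed harmonic series sum_n r_n / n
# (Example 22.2).  The variances are summable, so by Theorem 22.6 the
# partial sums converge with probability 1.
s = 0.0
s_at_10000 = None
for n in range(1, 20001):
    s += rng.choice((-1.0, 1.0)) / n
    if n == 10000:
        s_at_10000 = s
s_at_20000 = s
late_fluctuation = abs(s_at_20000 - s_at_10000)
```

The standard deviation of the increment between terms 10000 and 20000 is about 0.007, so the late partial sums agree to roughly two decimal places on a typical path.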
If ΣX_n converges with probability 1, then S_n converges with probability 1 to some finite random variable S. By Theorem 20.5, this implies that S_n →_P S. The reverse implication of course does not hold in general, but it does if the summands are independent.
Theorem 22.7. For an independent sequence {X_n}, the S_n converge with probability 1 if and only if they converge in probability.
PROOF. It is enough to show that if S_n →_P S, then {S_n} is fundamental with probability 1. Since

|S_{n+j} − S_n| ≤ |S_{n+j} − S| + |S_n − S|,

S_n →_P S implies

(22.12) lim_n sup_{j≥1} P[ |S_{n+j} − S_n| ≥ ε ] = 0.

But by (22.10),

P[ max_{1≤k≤j} |S_{n+k} − S_n| ≥ ε ] ≤ 3 max_{1≤k≤j} P[ |S_{n+k} − S_n| ≥ ε/3 ],

and therefore

P[ sup_{k≥1} |S_{n+k} − S_n| ≥ ε ] ≤ 3 sup_{k≥1} P[ |S_{n+k} − S_n| ≥ ε/3 ].

It now follows by (22.12) that (22.11) holds, and the proof is completed as before. •

The final result in this direction, the three-series theorem, provides necessary and sufficient conditions for the convergence of ΣX_n in terms of the individual distributions of the X_n. Let X_n^{(c)} be X_n truncated at c: X_n^{(c)} = X_n I_{[|X_n| ≤ c]}.
Theorem 22.8. Suppose that {X_n} is independent, and consider the three series

(22.13) Σ_n P[|X_n| > c], Σ_n E[X_n^{(c)}], Σ_n Var[X_n^{(c)}].

In order that ΣX_n converge with probability 1 it is necessary that the three series converge for all positive c and sufficient that they converge for some positive c.
PROOF OF SUFFICIENCY. Suppose that the series (22.13) converge, and put m_n^{(c)} = E[X_n^{(c)}]. By Theorem 22.6, Σ(X_n^{(c)} − m_n^{(c)}) converges with probability 1, and since Σ m_n^{(c)} converges, so does Σ X_n^{(c)}. Since P[X_n ≠ X_n^{(c)} i.o.] = 0 by the first Borel–Cantelli lemma, it follows finally that ΣX_n converges with probability 1. •
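A concrete instance of the three series (22.13), evaluated numerically below as an illustrative sketch (not from the text): X_n = Z_n/n with Z_n independent standard normal and truncation level c = 1. All three series converge, so ΣX_n converges with probability 1.

```python
import math

# The three series (22.13) for X_n = Z_n / n, Z_n standard normal, c = 1:
# P[|X_n| > 1] = P[|Z| > n] = erfc(n / sqrt(2)); E[X_n^{(c)}] = 0 by
# symmetry; and Var[X_n^{(c)}] <= E[X_n^2] = 1 / n^2.
def tail_prob(n):
    return math.erfc(n / math.sqrt(2.0))

series1 = sum(tail_prob(n) for n in range(1, 200))       # converges fast
series2 = 0.0                                            # identically zero
series3_bound = sum(1.0 / n**2 for n in range(1, 200))   # < pi^2 / 6
```

The first series is dominated by its first term (about 0.32); the normal tails decay so quickly that the remaining terms contribute almost nothing.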
Although it is possible to prove necessity in the three-series theorem by the methods of the present section, the simplest and clearest argument uses the central limit theorem as treated in Section 27. This involves no circularity of reasoning, since the three-series theorem is nowhere used in what follows.

PROOF OF NECESSITY. Suppose that ΣX_n converges with probability 1, and fix c > 0. Since X_n → 0 with probability 1, it follows that ΣX_n^{(c)} converges with probability 1 and, by the second Borel–Cantelli lemma, that

Σ P[|X_n| > c] < ∞.

Let m_n^{(c)} and s_n^{(c)} be the mean and standard deviation of S_n^{(c)} = Σ_{k=1}^n X_k^{(c)}. If s_n^{(c)} → ∞, then since the X_n^{(c)} − m_n^{(c)} are uniformly bounded, it follows by the central limit theorem (see Example 27.4) that

(22.14) lim_n P[ (S_n^{(c)} − m_n^{(c)})/s_n^{(c)} ≤ x ] = (2π)^{−1/2} ∫_{−∞}^x e^{−u²/2} du.

Consider an open disk D = [z: |z − ζ| < r]. The function (22.17) coincides in D_0 with a function analytic in D_0 ∪ D if and only if its expansion about ζ converges at least for |z − ζ| < r. Let A_D be the set of ω for which this holds. The coefficient a_m(ω) in the expansion Σ_m a_m(ω)(z − ζ)^m is a complex-valued random variable measurable σ(X_m, X_{m+1}, ...). By the root test, ω ∈ A_D if and only if lim sup_m |a_m(ω)|^{1/m} ≤ r^{−1}. For each m_0, the condition for ω ∈ A_D can thus be expressed in terms of a_{m_0}(ω), a_{m_0+1}(ω), ... alone, and so A_D lies in σ(X_{m_0}, X_{m_0+1}, ...). Thus A_D has a probability, and in fact P(A_D) is 0 or 1 by the zero-one law.

Of course, P(A_D) = 1 if D ⊂ D_0. The central step in the proof is to show that P(A_D) = 0 if D contains points not in D_0. Assume on the contrary that P(A_D) = 1 for such a D. Consider that part of the circumference of the unit circle that lies in D, and let k be an integer large enough that this arc has length exceeding 2π/k. Define

Y_n = X_n if n ≢ 0 (mod k), Y_n = −X_n if n ≡ 0 (mod k).

Let B_D be the ω-set where the function

(22.19) G(ω, z) = Σ_{n=0}^∞ Y_n(ω) z^n

coincides in D_0 with a function analytic in D_0 ∪ D. The sequence {Y_0, Y_1, ...} has the same structure as the original sequence: the Y_n are independent and assume the values ±1 with probability ½ each. Since B_D is defined in terms of the Y_n in the same way as A_D is defined in terms of the X_n, it is intuitively clear that P(B_D) and P(A_D) must be the same. Assume for the moment the truth of this statement, which is somewhat more obvious than its proof.

If for a particular ω each of (22.17) and (22.19) coincides in D_0 with a function analytic in D_0 ∪ D, the same must be true of

(22.20) F(ω, z) − G(ω, z) = 2 Σ_{m=0}^∞ X_{mk}(ω) z^{mk}.

Let D_l = [z e^{2πil/k}: z ∈ D]. Since replacing z by z e^{2πi/k} leaves the function (22.20) unchanged, it can be extended analytically to each D_0 ∪ D_l, l = 1, 2, .... Because of the choice of k, it can therefore be extended analytically to [z: |z| < 1 + ε] for some positive ε; but this is impossible if (22.18) holds, since the radius of convergence must then be 1. Therefore, A_D ∩ B_D cannot contain a point ω satisfying (22.18). Since (22.18) holds with probability 1, this rules out the possibility P(A_D) = P(B_D) = 1 and by the zero-one law leaves only the possibility P(A_D) = P(B_D) = 0.

Let A be the ω-set where (22.17) extends to a function analytic in some open set larger than D_0. Then ω ∈ A if and only if (22.17) extends to D_0 ∪ D for some D = [z: |z − ζ| < r] for which D − D_0 ≠ ∅, r is rational, and ζ has rational real and imaginary parts; in other words, A is the countable union of A_D for such D. Therefore, A lies in σ(X_0, X_1, ...) and has probability 0.

It remains only to show that P(A_D) = P(B_D), and this is most easily done by comparing {X_n} and {Y_n} with a canonical sequence having the same structure. Put Z_n(ω) = (X_n(ω) + 1)/2, and let Tω be Σ_{n=0}^∞ Z_n(ω) 2^{−n−1} on the ω-set A* where this sum lies in (0, 1]; on Ω − A* let Tω be 1, say. Because of (22.16), P(A*) = 1. Let ℱ = σ(X_0, X_1, ...) and let ℬ be the σ-field of Borel subsets of (0, 1]; then T: Ω → (0, 1] is measurable ℱ/ℬ. Let r_n(x) be the nth Rademacher function. If M = [x: r_i(x) = u_i, i = 1, ..., n], where u_i = ±1 for each i, then P(T^{−1}M) = P[ω: X_{i−1}(ω) = u_i, i = 1, ..., n] = 2^{−n}, which is the Lebesgue measure λ(M) of M. Since these sets form a π-system generating ℬ, P(T^{−1}M) = λ(M) for all M in ℬ (Theorem 3.3).

Let M_D be the set of x for which Σ_{n=0}^∞ r_{n+1}(x) z^n extends analytically to D_0 ∪ D. Then M_D lies in ℬ, this being a special case of the fact that A_D lies in ℱ. Moreover, if ω ∈ A*, then ω ∈ A_D if and only if Tω ∈ M_D: A* ∩ A_D = A* ∩ T^{−1}M_D. Since P(A*) = 1, it follows that P(A_D) = λ(M_D). This argument uses only (22.16), and therefore it applies to {Y_n} and B_D as well. Therefore, P(B_D) = λ(M_D) = P(A_D). •
PROBLEMS

22.1. Suppose that X_1, X_2, ... is an independent sequence and Y is measurable σ(X_n, X_{n+1}, ...) for each n. Show that there exists a constant a such that P[Y = a] = 1.

22.2. Assume {X_n} independent, and define X_n^{(c)} as in Theorem 22.8. Prove that for Σ|X_n| to converge with probability 1 it is necessary that ΣP[|X_n| > c] and ΣE[|X_n^{(c)}|] converge for all positive c and sufficient that they converge for some positive c. If the three series (22.13) converge but ΣE[|X_n^{(c)}|] = ∞, then there is probability 1 that ΣX_n converges conditionally but not absolutely.

22.3. ↑ Generalize the Borel–Cantelli lemmas: Suppose X_n are nonnegative. (a) If ΣE[X_n] < ∞, then ΣX_n converges with probability 1. If the X_n are independent and uniformly bounded, and if ΣE[X_n] = ∞, then ΣX_n diverges with probability 1. (b) Construct independent, nonnegative X_n such that ΣX_n converges with probability 1 but ΣE[X_n] diverges. For an extreme example, arrange that P[X_n > 0 i.o.] = 0 but E[X_n] → ∞.

22.4. Show under the hypothesis of Theorem 22.6 that ΣX_n has finite variance, and extend Theorem 22.4 to infinite sequences.

22.5. 20.14 22.1 ↑ Suppose that X_1, X_2, ... are independent, each with the Cauchy distribution (20.45) for a common value of u. (a) Show that n^{−1} Σ_{k=1}^n X_k does not converge with probability 1. Contrast with Theorem 22.1. (b) Show that P[n^{−1} max_{k≤n} X_k ≤ x] → e^{−u/πx} for x > 0. Relate to Theorem 14.3.

22.6. If X_1, X_2, ... are independent and identically distributed, and if P[X_1 ≥ 0] = 1 and P[X_1 > 0] > 0, then ΣX_n = ∞ with probability 1. Deduce this from Theorem 22.1 and its corollary and also directly: find a positive ε such that X_n > ε infinitely often with probability 1.

22.7. Suppose that X_1, X_2, ... are independent and identically distributed and E[|X_1|] = ∞. Use (21.9) to show that Σ_n P[|X_n| ≥ an] = ∞ for each a, and conclude that sup_n n^{−1}|X_n| = ∞ with probability 1. Now show that sup_n n^{−1}|S_n| = ∞ with probability 1. Compare with the corollary to Theorem 22.1.

22.8. Wald's equation. Let X_1, X_2, ... be independent and identically distributed with finite mean, and put S_n = X_1 + ··· + X_n. Suppose that τ is a stopping time: τ has positive integers as values and [τ = n] ∈ σ(X_1, ..., X_n); see Section 7 for examples. Suppose also that E[τ] < ∞. (a) Prove that

(22.21) E[S_τ] = E[X_1]E[τ].

(b) Suppose that X_n is ±1 with probabilities p and q, p ≠ q, let τ be the first n for which S_n is −a or b (a and b positive integers), and calculate E[τ]. This gives the expected duration of the game in the gambler's ruin problem for unequal p and q.

22.9. 20.9 ↑ Let Z_n be 1 or 0 according as at time n there is or is not a record in the sense of Problem 20.9. Let R_n = Z_1 + ··· + Z_n be the number of records up to time n. Show that R_n/log n →_P 1.

22.10. 22.1 ↑ (a) Show that for an independent sequence {X_n} the radius of convergence of the random Taylor series Σ_n X_n z^n is r with probability 1 for some nonrandom r. (b) Suppose that the X_n have the same distribution and P[X_1 ≠ 0] > 0. Show that r is 1 or 0 according as log⁺|X_1| has finite mean or not.

22.11. Suppose that X_0, X_1, ... are independent and each is uniformly distributed over [0, 2π]. Show that with probability 1 the series Σ_n e^{iX_n} z^n has the unit circle as its natural boundary.

22.12. Prove (what is essentially Kolmogorov's zero-one law) that if A is independent of a π-system 𝒫 and A ∈ σ(𝒫), then P(A) is either 0 or 1.
22.13. Suppose that 𝒜 is a semiring containing Ω. (a) Show that if P(A ∩ B) ≤ bP(B) for all B ∈ 𝒜, and if b < 1 and A ∈ σ(𝒜), then P(A) = 0. (b) Show that if P(A ∩ B) ≤ P(A)P(B) for all B ∈ 𝒜, and if A ∈ σ(𝒜), then P(A) is 0 or 1. (c) Show that if aP(B) ≤ P(A ∩ B) for all B ∈ 𝒜, and if a > 0 and A ∈ σ(𝒜), then P(A) = 1. (d) Show that if P(A)P(B) ≤ P(A ∩ B) for all B ∈ 𝒜, and if A ∈ σ(𝒜), then P(A) is 0 or 1. (e) Reconsider Problem 3.20.

22.14. Burstin's theorem. Let f be a Borel function on [0, 1] with arbitrarily small periods: for each ε there is a p such that 0 < p < ε and f(x) = f(x + p) for 0 ≤ x ≤ 1 − p. Show that such an f is constant almost everywhere.

SECTION 23. THE POISSON PROCESS

... if S_n(ω) = inf[t: N_t(ω) ≥ n] and the variables are defined by X_n(ω) = S_n(ω) − S_{n−1}(ω), then (23.3) and (23.4) hold, and the definition (23.5) gives back the original N_t. Therefore, anything that can be said about the X_n can be stated in terms of the N_t, and conversely. The points S_1(ω), S_2(ω), ... of (0, ∞) are exactly the discontinuities of N_t(ω) as a function of t; because of the queueing example, it is natural to call them arrival times. The program is to study the joint distributions of the N_t under conditions on the waiting times X_n and vice versa. The most common model specifies the independence of the waiting times and the absence of aftereffect:
Condition 1°. The X_n are independent, and each is exponentially distributed with parameter α.
In this case P[X_n > 0] = 1 for each n and n^{−1}S_n → α^{−1} by the strong law of large numbers (Theorem 22.1), and so (23.3) and (23.4) hold with probability 1; to assume they hold everywhere (Condition 0°) is simply a convenient normalization. Under Condition 1°, S_n has the distribution function specified by (20.40), so that P[N_t ≥ n] = Σ_{i≥n} e^{−αt}(αt)^i/i! by (23.6), and

(23.8) P[N_t = n] = e^{−αt} (αt)^n/n!, n = 0, 1, ....

Thus N_t has the Poisson distribution with mean αt. More will be proved in a moment.
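(23.8) can be illustrated by simulating the waiting-time model directly. The sketch below is illustrative (α = 2, t = 3, and the trial count are arbitrary choices): generate exponential waiting times, count the arrivals in [0, t], and compare with the Poisson probabilities.

```python
import random
import math

rng = random.Random(1)

# Simulate N_t under Condition 1: exponential(alpha) waiting times,
# N_t = max[n : X_1 + ... + X_n <= t].  (23.8) says N_t ~ Poisson(alpha t).
alpha, t, trials = 2.0, 3.0, 20000

def arrivals_by(t, rng):
    total, n = 0.0, 0
    while True:
        total += rng.expovariate(alpha)
        if total > t:
            return n
        n += 1

counts = [arrivals_by(t, rng) for _ in range(trials)]
mean_count = sum(counts) / trials          # should be near alpha * t = 6
empirical_p0 = counts.count(0) / trials    # should be near e^{-alpha t}
poisson_p0 = math.exp(-alpha * t)
```

Both the sample mean and the frequency of "no arrivals by time t" land close to the Poisson(αt) values.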
Condition 2°. (i) For 0 ≤ t_1 ≤ ··· ≤ t_k the increments N_{t_1}, N_{t_2} − N_{t_1}, ..., N_{t_k} − N_{t_{k−1}} are independent. (ii) The individual increments have the Poisson distribution:

(23.9) P[N_t − N_s = n] = e^{−α(t−s)} (α(t − s))^n/n!, n = 0, 1, ..., 0 ≤ s < t.
Since N_0 = 0, (23.8) is a special case of (23.9). A collection [N_t: t ≥ 0] of random variables satisfying Condition 2° is called a Poisson process, and α is the rate of the process. As the increments are independent by (i), if r < s < t, then the distributions of N_s − N_r and N_t − N_s must convolve to that of N_t − N_r. But the requirement is consistent with (ii) because Poisson distributions with parameters u and v convolve to a Poisson distribution with parameter u + v.

Theorem 23.1. Conditions 1° and 2° are equivalent in the presence of Condition 0°.
PROOF OF 1° → 2°. Fix t, and consider the events that happen after time t. By (23.5), S_{N_t} ≤ t < S_{N_t+1}, and the waiting time from t to the first event following t is S_{N_t+1} − t; the waiting time between the first and second events following t is X_{N_t+2}; and so on. Thus

(23.10) X_1^{(t)} = S_{N_t+1} − t, X_2^{(t)} = X_{N_t+2}, X_3^{(t)} = X_{N_t+3}, ...

define the waiting times following t. By (23.6), N_{t+s} − N_t ≥ m, or N_{t+s} ≥ N_t + m, if and only if S_{N_t+m} ≤ t + s, which is the same thing as X_1^{(t)} + ··· + X_m^{(t)} ≤ s. Thus

(23.11) N_{t+s} − N_t = max[m: X_1^{(t)} + ··· + X_m^{(t)} ≤ s].

Hence [N_{t+s} − N_t = m] = [X_1^{(t)} + ··· + X_m^{(t)} ≤ s < X_1^{(t)} + ··· + X_{m+1}^{(t)}]. A comparison of (23.11) and (23.5) shows that for fixed t the random variables N_{t+s} − N_t for s ≥ 0 are defined in terms of the sequence (23.10) in exactly the same way as the N_s are defined in terms of the original sequence of waiting times.

The idea now is to show that conditionally on the event [N_t = n] the random variables (23.10) are independent and exponentially distributed. Because of the independence of the X_k and the basic property (23.2) of the exponential distribution, this seems intuitively clear. For a proof, apply (20.30). Suppose y ≥ 0; if G_n is the distribution function of S_n, then since X_{n+1} has the exponential distribution,

P[N_t = n, X_1^{(t)} > y] = P[S_n ≤ t, S_{n+1} > t + y]
= ∫_{x≤t} P[X_{n+1} > t + y − x] dG_n(x)
= e^{−αy} ∫_{x≤t} P[X_{n+1} > t − x] dG_n(x)
= P[S_n ≤ t < S_{n+1}] e^{−αy} = P[N_t = n] e^{−αy}.

By the assumed independence of the X_n, a similar calculation gives

P[N_t = n, X_1^{(t)} > y_1, ..., X_j^{(t)} > y_j] = P[N_t = n] e^{−αy_1} ··· e^{−αy_j},

and therefore

(23.12) P[N_t = n, (X_1^{(t)}, ..., X_j^{(t)}) ∈ H] = P[N_t = n] P[(X_1, ..., X_j) ∈ H].
By Theorem 10.4, the equation extends from H of the special form above to all H in ℛ^j. Now the event [N_{s_i} = m_i, 1 ≤ i ≤ u] can be put in the form [(X_1, ..., X_j) ∈ H], where j = m_u + 1 and H is the set of x in R^j for which x_1 + ··· + x_{m_i} ≤ s_i < x_1 + ··· + x_{m_i+1}, 1 ≤ i ≤ u. But then [(X_1^{(t)}, ..., X_j^{(t)}) ∈ H] is by (23.11) the same as the event [N_{t+s_i} − N_t = m_i, 1 ≤ i ≤ u]. Thus (23.12) gives

P[N_t = n, N_{t+s_i} − N_t = m_i, 1 ≤ i ≤ u] = P[N_t = n] P[N_{s_i} = m_i, 1 ≤ i ≤ u].

From this it follows by induction on k that if 0 = t_0 ≤ t_1 ≤ ··· ≤ t_k, then the increments N_{t_i} − N_{t_{i−1}} are independent and each satisfies (23.9).

PROOF OF 2° → 1°. First, P[X_1 > t] = P[N_t = 0] = e^{−αt}, so that X_1 is exponentially distributed. To find the joint distribution of X_1 and X_2, suppose that 0 < s_1 < t_1 < s_2 < t_2 and perform the calculation
P[s_1 < S_1 ≤ t_1, s_2 < S_2 ≤ t_2]
= P[N_{s_1} = 0, N_{t_1} − N_{s_1} = 1, N_{s_2} − N_{t_1} = 0, N_{t_2} − N_{s_2} ≥ 1]
= e^{−αs_1} × α(t_1 − s_1)e^{−α(t_1−s_1)} × e^{−α(s_2−t_1)} × (1 − e^{−α(t_2−s_2)}).

In the case where r_n = n and p_{nk} = λ/n, (23.15) is the Poisson approximation to the binomial. Note that if λ > 0, then (23.14) implies r_n → ∞.
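The Poisson approximation to the binomial mentioned here can be quantified numerically. The sketch below is illustrative (λ = 2 is an arbitrary choice): it computes the total variation distance between Binomial(n, λ/n) and Poisson(λ) and checks that it shrinks as n grows.

```python
import math

def binom_pmf(n, p, k):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(lam, k):
    # Log form avoids overflow in lam**k / k! for large k.
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

lam = 2.0

def total_variation(n):
    # (1/2) sum_k |Binomial(n, lam/n){k} - Poisson(lam){k}|; the Poisson
    # mass above n (where the binomial puts none) is added separately.
    d = 0.5 * sum(abs(binom_pmf(n, lam / n, k) - poisson_pmf(lam, k))
                  for k in range(n + 1))
    d += 0.5 * sum(poisson_pmf(lam, k) for k in range(n + 1, n + 100))
    return d

tv_small = total_variation(10)
tv_large = total_variation(1000)
```

For n = 1000 the distance is below the classical bound λ²/n = 0.004, consistent with the approximation becoming exact in the limit.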
PROOF. The argument depends on a construction like that in the proof of Theorem 20.4. Let U_1, U_2, ... be independent random variables, each uniformly distributed over [0, 1). For each p, 0 ≤ p ≤ 1, split [0, 1) into the two intervals I_0(p) = [0, 1 − p) and I_1(p) = [1 − p, 1), as well as into the sequence of intervals J_i(p) = [Σ_{j<i} e^{−p}p^j/j!, Σ_{j≤i} e^{−p}p^j/j!), i = 0, 1, .... Define V_{nk} = 1 if U_k ∈ I_1(p_{nk}) and V_{nk} = 0 if U_k ∈ I_0(p_{nk}) ...

(23.17) P[S_n = t] = 0, t > 0, n ≥ 1;
that is, each of S_1, S_2, ... has a continuous distribution function. Of course there is probability 1 (under Condition 0°) that N_t(ω) has a discontinuity somewhere (and indeed has infinitely many of them). But (23.17) ensures that a t specified in advance has probability 0 of being a discontinuity, or time of an arrival. The Poisson process satisfies this natural condition.

Theorem 23.3. If Condition 0° holds and [N_t: t ≥ 0] has independent increments and no fixed discontinuities, then each increment has a Poisson distribution.

This is Prekopa's theorem. The conclusion is not that [N_t: t ≥ 0] is a Poisson process, because the mean of N_t − N_s need not be proportional to t − s. If φ is an arbitrary nondecreasing, continuous function on [0, ∞) and φ(0) = 0, and if [N_t: t ≥ 0] is a Poisson process, then [N_{φ(t)}: t ≥ 0] satisfies the hypotheses of the theorem.

(23.19) Σ_{k=1}^{r_n} P[N_{t_{nk}} − N_{t_{n,k−1}} ≥ 1] → λ

for some finite λ, and, second,

(23.20) max_{1≤k≤r_n} P[N_{t_{nk}} − N_{t_{n,k−1}} ≥ 1] → 0.

Third,

(23.21) P[ max_{1≤k≤r_n} (N_{t_{nk}} − N_{t_{n,k−1}}) ≥ 2 ] → 0.
Once the partitions have been constructed, the rest of the proof is easy: Let Z_{nk} be 1 or 0 according as N_{t_{nk}} − N_{t_{n,k−1}} is positive or not. Since [N_t: t ≥ 0] has independent increments, the Z_{nk} are independent for each n. By Theorem 23.2, therefore, (23.19) and (23.20) imply that Z_n = Σ_{k=1}^{r_n} Z_{nk} satisfies P[Z_n = i] → e^{−λ}λ^i/i!. Now N_{t''} − N_{t'} ≥ Z_n, and there is strict inequality if and only if N_{t_{nk}} − N_{t_{n,k−1}} ≥ 2 for some k. Thus (23.21) implies P[N_{t''} − N_{t'} ≠ Z_n] → 0, and therefore P[N_{t''} − N_{t'} = i] = e^{−λ}λ^i/i!.

To construct the partitions, consider for each t the distance D_t = inf_m |t − S_m| from t to the nearest arrival time. Since S_m → ∞, the infimum is achieved. Further, D_t = 0 if and only if S_m = t for some m, and since by hypothesis there are no fixed discontinuities, the probability of this is 0: P[D_t = 0] = 0. Choose δ_t so that 0 < δ_t < n^{−1} and P[D_t ≤ δ_t] < n^{−1}. The intervals (t − δ_t, t + δ_t) for t' ≤ t ≤ t'' cover [t', t'']. Choose a finite subcover, and in (23.18) take the t_{nk} for 0 < k < r_n to be the endpoints (of intervals in the subcover) that are contained in (t', t''). By the construction,

(23.22) max_{1≤k≤r_n} (t_{nk} − t_{n,k−1}) < 2n^{−1},

and the probability that (t_{n,k−1}, t_{nk}] contains some S_m is less than n^{−1}. This gives a sequence of partitions satisfying (23.20). Inserting more points in a partition cannot increase the maxima in (23.20) and (23.22), and so it can be arranged that each partition refines the preceding one.

To prove (23.21) it is enough (Theorem 4.1) to show that the limit superior of the sets involved has probability 0. It is in fact empty: If for infinitely many n, N_{t_{nk}}(ω) − N_{t_{n,k−1}}(ω) ≥ 2 holds for some k ≤ r_n, then by (23.22), N_t(ω) as a function of t has in [t', t''] discontinuity points (arrival times) arbitrarily close together, which requires S_m(ω) ∈ [t', t''] for infinitely many m, in violation of Condition 0°.

It remains to prove (23.19). If Z_{nk} and Z_n are defined as above and p_{nk} = P[Z_{nk} = 1], then the sum in (23.19) is Σ_k p_{nk} = E[Z_n]. Since Z_{n+1} ≥ Z_n, Σ_k p_{nk} is nondecreasing in n. Now

P[N_{t''} − N_{t'} = 0] = P[Z_{nk} = 0, k ≤ r_n] = Π_{k=1}^{r_n} (1 − p_{nk}) ≤ exp[ −Σ_{k=1}^{r_n} p_{nk} ].

If the left-hand side here is positive, this puts an upper bound on Σ_k p_{nk}, and (23.19) follows. But suppose P[N_{t''} − N_{t'} = 0] = 0. If s is the midpoint of t' and t'', then since the increments are independent, one of P[N_s − N_{t'} = 0] and P[N_{t''} − N_s = 0] must vanish. It is therefore possible to find a nested sequence of intervals [u_m, v_m] such that v_m − u_m → 0 and the event A_m = [N_{v_m} − N_{u_m} ≥ 1] has probability 1. But then P(⋂_m A_m) = 1, and if t is the point common to the [u_m, v_m], there is an arrival at t with probability 1, contrary to the assumption that there are no fixed discontinuities. •
Theorem 23.3 in some cases makes the Poisson model quite plausible. The increments will be essentially independent if the arrivals to time s cannot seriously deplete the population of potential arrivals, so that N_s has for t > s negligible effect on N_t − N_s. And the condition that there are no fixed discontinuities is entirely natural. These conditions hold for arrivals of calls at a telephone exchange if the rate of calls is small in comparison with the population of subscribers and calls are not placed at fixed, predetermined times. If the arrival rate is essentially constant, this leads to the following condition.
Condition 3°. (i) For 0 ≤ t_1 ≤ ··· ≤ t_k the increments N_{t_1}, N_{t_2} − N_{t_1}, ..., N_{t_k} − N_{t_{k−1}} are independent. (ii) The distribution of N_t − N_s depends only on the difference t − s.

Theorem 23.4. Conditions 1°, 2°, and 3° are equivalent in the presence of Condition 0°.
PROOF. Obviously Condition 2° implies 3°. Suppose that Condition 3° holds. If J_t is the saltus at t (J_t = N_t − sup_{s<t} N_s), then [N_t − N_{t−n^{−1}} ≥ 1] ↓ [J_t ≥ 1], and it follows by (ii) of Condition 3° that P[J_t ≥ 1] is the same for all t. But if the value common to the P[J_t ≥ 1] is positive, then by the independence of the increments and the second Borel–Cantelli lemma there is probability 1 that J_t ≥ 1 for infinitely many rational t in (0, 1), for example, which contradicts Condition 0°.
By Theorem 23.3, then, the increments have Poisson distributions. If f(t) is the mean of N_t, then N_t − N_s for s < t must have mean f(t) − f(s) and must by (ii) have mean f(t − s); thus f(t) = f(s) + f(t − s). Therefore, f satisfies Cauchy's functional equation [A20] and, being nondecreasing, must have the form f(t) = αt for α ≥ 0. Condition 0° makes α = 0 impossible. •
Condition 4°. If G < t 1 < then
· · ·
< tk and if n 1 , . . . , n k are nonnegative integers,
( 23.23) and (23.24) as h � 0. Moreover, [ N, : t > 0] has no fixed discontinuities. The occurrences of o(h) in (23.23) and (23.24) denote functions, say 0 and It - sl < n- 1 , then
As n oo, the right side here decreases to the probability of a discontinuity at t, which is 0 by hypothesis. Thus P[N, = n] is continuous at t. The same kind of argument works for conditional probabilities and for t = 0, and so Pn(t) is continuous for t > 0. To simplify the notation, put D, = N, k + r - N,k . If D, + h = n, then D, = m for some m < n. If t > 0, then by the rules for conditional probabilities, �
Pn( t + h) = Pn( t )P [ Dr + h - D, = OIA n [ D, = n ]) + Pn - 1 ( t ) P [ D, +11 - D, -= 1 I A n [ D, = n - 1 ] ] n- 2 + L Pm( t )P[ Dr + h - D, = n - m i A n [ D, = m l ] . m=O
For n < 1, the final sum is absent, and for n = 0, the middle term is absent as well. This holds in the case Pn (t) = P[N, = n] if D, = N, and A = fl. (If t = 0, some of the conditioning events here are empty; hence the assumption t > 0.) By (23.24), the final sum is o(h) for each fixed n. Applying {23.23) and (23.24) now leads to
Pn( t + h ) = pn( t)(1 - ah) +pn_ 1 ( t)ah + o( h ) , and letting h � 0 gives
( 23 .26) In the case n = 0, take p _ 1(t) to be identically 0. In (23.26), t > 0 and p�(t) is a right-hand derivative. But since Pn(t) and the right side of the equation are continuous on [0, oo), (23.26) holds also for t = 0 and p�(t) can be taken as a two-sided derivative for t > 0 [A22]. Now (23.26) gives [A23]
Since p_n(0) is 1 or 0 as n = 0 or n > 0, (23.25) follows by induction on n.
•
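As a quick sanity check, not part of the text, the solution p_n(t) = e^{−at}(at)^n/n! can be verified against the system (23.26) by finite differences; the rate a = 2 and the evaluation point are arbitrary choices:

```python
import math

def p(n, t, a=2.0):
    # Claimed solution (23.25): p_n(t) = e^{-at} (at)^n / n!
    return math.exp(-a * t) * (a * t) ** n / math.factorial(n)

a, t, h = 2.0, 1.3, 1e-6
for n in range(6):
    deriv = (p(n, t + h, a) - p(n, t - h, a)) / (2 * h)  # numerical p_n'(t)
    rhs = -a * p(n, t, a) + (a * p(n - 1, t, a) if n >= 1 else 0.0)  # (23.26)
    assert abs(deriv - rhs) < 1e-5, (n, deriv, rhs)
print("ok")
```

The initial conditions p_0(0) = 1 and p_n(0) = 0 for n ≥ 1 hold by inspection of the formula.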
RANDOM VARIABLES AND EXPECTED VALUES
Stochastic Processes
The Poisson process [N_t: t ≥ 0] is one example of a stochastic process, that is, a collection of random variables (on some probability space (Ω, 𝓕, P)) indexed by a parameter regarded as representing time. In the Poisson case, time is continuous. In some cases the time is discrete: Section 7 concerns the sequence {F_n} of a gambler's fortunes; there n represents time, but time that increases in jumps.

Part of the structure of a stochastic process is specified by its finite-dimensional distributions. For any finite sequence t_1, …, t_k of time points, the k-dimensional random vector (N_{t_1}, …, N_{t_k}) has a distribution μ_{t_1⋯t_k} over R^k. These measures μ_{t_1⋯t_k} are the finite-dimensional distributions of the process. Condition 2° of this section in effect specifies them for the Poisson case:
(23.27)  P[N_{t_j} = n_j, j ≤ k] = Π_{j=1}^k e^{−a(t_j − t_{j−1})} (a(t_j − t_{j−1}))^{n_j − n_{j−1}} / (n_j − n_{j−1})!

if 0 ≤ n_1 ≤ ⋯ ≤ n_k and 0 ≤ t_1 < ⋯ < t_k (take n_0 = t_0 = 0). The finite-dimensional distributions do not, however, contain all the mathematically interesting information about the process in the case of continuous time. Because of (23.3), (23.4), and the definition (23.5), for each fixed ω, N_t(ω) as a function of t has the regularity properties given in the second version of Condition 0°. These properties are used in an essential way in the proofs. Suppose that f(t) is t or 0 according as t is rational or irrational. Let N_t be defined as before, and let
(23.28)  M_t(ω) = N_t(ω) + f(t + X_1(ω)).

If R is the set of rationals, then P[ω: f(t + X_1(ω)) ≠ 0] = P[ω: X_1(ω) ∈ R − t] = 0 for each t because R − t is countable and X_1 has a density. Thus P[M_t = N_t] = 1 for each t, and so the stochastic process [M_t: t ≥ 0] has the same finite-dimensional distributions as [N_t: t ≥ 0]. For ω fixed, however, M_t(ω) as a function of t is everywhere discontinuous and is neither monotone nor exclusively integer-valued.

The functions obtained by fixing ω and letting t vary are called the path functions or sample paths of the process. The example above shows that the finite-dimensional distributions do not suffice to determine the character of the path functions. In specifying a stochastic process as a model for some phenomenon, it is natural to place conditions on the character of the sample paths as well as on the finite-dimensional distributions. Condition 0° was imposed throughout this section to ensure that the sample paths are nondecreasing, right-continuous, integer-valued step functions, a natural condition if N_t is to represent the number of events in [0, t].

Stochastic processes in continuous time are studied further in Chapter 7.
SECTION 23. THE POISSON PROCESS
PROBLEMS
Assume the Poisson processes here satisfy Condition 0° as well as Condition 1°.

23.1. Show that the minimum of independent exponential waiting times is again exponential and that the parameters add.
23.2. 20.17 ↑ Show that the time S_n of the nth event in a Poisson stream has the gamma density f(x; a, n) as defined by (20.47). This is sometimes called the Erlang density.

23.3. Let A_t = t − S_{N_t} be the time back to the most recent event in the Poisson stream (or to 0), and let B_t = S_{N_t + 1} − t be the time forward to the next event. Show that A_t and B_t are independent, that B_t is distributed as X_1 (exponentially with parameter a), and that A_t is distributed as min{X_1, t}: P[A_t ≤ x] is 0, 1 − e^{−ax}, or 1 as x < 0, 0 ≤ x < t, or x ≥ t.
23.4. ↑ Let L_t = A_t + B_t = S_{N_t + 1} − S_{N_t} be the length of the interarrival interval covering t.
(a) Show that L_t has density

f(x) = a²x e^{−ax} if 0 < x ≤ t,  f(x) = a(1 + at) e^{−ax} if x > t.

(b) Show that E[L_t] converges to 2E[X_1] as t → ∞. This seems paradoxical because L_t is one of the X_n. Give an intuitive resolution of the apparent paradox.
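The "paradox" in this problem is easy to see by simulation. The following sketch (not from the text; the rate, horizon, and seed are arbitrary) estimates E[L_t] and finds it near 2E[X_1] = 2/a rather than E[X_1] = 1/a, because the interval covering t is sampled with length bias:

```python
import random

# Simulate the interarrival interval L_t = S_{N_t + 1} - S_{N_t} covering a
# fixed time t, for a Poisson stream with rate a, and estimate E[L_t].
random.seed(7)
a, t, trials = 1.0, 50.0, 20000
total = 0.0
for _ in range(trials):
    s_prev, s = 0.0, 0.0
    while s <= t:
        s_prev = s
        s += random.expovariate(a)   # the X_n are independent Exp(a)
    total += s - s_prev              # length of the interval covering t
mean_L = total / trials
assert abs(mean_L - 2 / a) < 0.1, mean_L
print(round(mean_L, 2))
```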
23.5. Merging Poisson streams. Define a process {N_t} by (23.5) for a sequence {X_n} of random variables satisfying (23.4). Let {X_n′} be a second sequence of random variables, on the same probability space, satisfying (23.4), and define {N_t′} by N_t′ = max[n: X_1′ + ⋯ + X_n′ ≤ t]. Define {N_t″} by N_t″ = N_t + N_t′. Show that, if σ(X_1, X_2, …) and σ(X_1′, X_2′, …) are independent and {N_t} and {N_t′} are Poisson processes with respective rates a and β, then {N_t″} is a Poisson process with rate a + β.
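An empirical check of the superposition in Problem 23.5 (a sketch with arbitrary rates and seed, not part of the text): merge two independent streams and compare the merged count at time t with a Poisson variable of mean (a + β)t via its mean and variance, which coincide for a Poisson distribution.

```python
import random

random.seed(1)
a, b, t, trials = 0.7, 1.5, 10.0, 20000

def count(rate):
    # Number of events of a rate-`rate` Poisson stream in [0, t]
    n, s = 0, random.expovariate(rate)
    while s <= t:
        n += 1
        s += random.expovariate(rate)
    return n

merged = [count(a) + count(b) for _ in range(trials)]
mean = sum(merged) / trials
var = sum((m - mean) ** 2 for m in merged) / trials
# For a Poisson count, mean and variance both equal (a + b) t = 22.
assert abs(mean - (a + b) * t) < 0.5
assert abs(var - (a + b) * t) < 2.0
print(round(mean, 1), round(var, 1))
```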
23.6. ↑ The nth and (n + 1)st events in the process {N_t} occur at times S_n and S_{n+1}.
(a) Find the distribution of the number N′_{S_{n+1}} − N′_{S_n} of events in the other process during this time interval.
(b) Generalize to N′_{S_{n+k}} − N′_{S_n}.

23.7. Suppose that X_1, X_2, … are independent and exponentially distributed with parameter a, so that (23.5) defines a Poisson process {N_t}. Suppose that Y_1, Y_2, … are independent and identically distributed and that σ(X_1, X_2, …) and σ(Y_1, Y_2, …) are independent. Put Z_t = Σ_{k ≤ N_t} Y_k. This is the compound Poisson process. If, for example, the event at time S_k in the original process
represents an insurance claim, and if Y_n represents the amount of the claim, then Z_t represents the total claims to time t.
(a) If Y_k = 1 with probability 1, then {Z_t} is an ordinary Poisson process.
(b) Show that {Z_t} has independent increments and that Z_{s+t} − Z_s has the same distribution as Z_t.
(c) Show that, if Y_k assumes the values 1 and 0 with probabilities p and 1 − p (0 < p ≤ 1), then {Z_t} is a Poisson process with rate pa.
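Part (c) is the "thinning" of a Poisson stream, and it too can be checked empirically; in this sketch (parameters and seed arbitrary, not from the text) each event of a rate-a stream is kept independently with probability p, and the kept count up to time t is compared with a Poisson variable of mean pat:

```python
import random

random.seed(3)
a, p, t, trials = 2.0, 0.3, 10.0, 20000

def thinned_count():
    # Count events of a rate-a stream in [0, t] kept with probability p
    n, s = 0, random.expovariate(a)
    while s <= t:
        if random.random() < p:   # Y_k = 1 with probability p
            n += 1
        s += random.expovariate(a)
    return n

zs = [thinned_count() for _ in range(trials)]
mean = sum(zs) / trials
var = sum((z - mean) ** 2 for z in zs) / trials
# For a Poisson count, mean and variance both equal p * a * t = 6.
assert abs(mean - p * a * t) < 0.2
assert abs(var - p * a * t) < 0.8
print(round(mean, 1), round(var, 1))
```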
23.8. Suppose a process satisfies Condition 0° and has independent, Poisson-distributed increments and no fixed discontinuities. Show that it has the form {N_{f(t)}} for a continuous, nondecreasing f.

P(A ∩ T^n I) ≥ (1 − ε)P(T^n I). Suppose the arc I runs from a to b and has length α ≤ ε. Let n_1 be arbitrary and, using the fact that {T^n a} is dense, choose n_2 so that T^{n_1}I and T^{n_2}I are disjoint and the distance from T^{n_1}b to T^{n_2}a is less than εα. Then choose n_3 so that T^{n_1}I, T^{n_2}I, T^{n_3}I are disjoint and the distance from T^{n_2}b to T^{n_3}a is less than εα. Continue until T^{n_k}b is within α of T^{n_1}a and a further step is impossible. Since the T^{n_i}I are disjoint, kα ≤ 1; and by the construction, the T^{n_i}I cover the circle to within a set of measure kεα + α, which is at most 2ε. And now by disjointness,

P(A) ≥ Σ_{i=1}^k P(A ∩ T^{n_i}I) ≥ (1 − ε) Σ_{i=1}^k P(T^{n_i}I) ≥ (1 − ε)(1 − 2ε).
Since ε was arbitrary, P(A) must be 1: T is ergodic if c is not a root of unity.†

† For a simple Fourier-series proof, see Problem 26.30.
SECTION 24. THE ERGODIC THEOREM
Proof of the Ergodic Theorem
The argument depends on a preliminary result the statement and proof of which are most clearly expressed in terms of functional operators. For a real function f on Ω, let Uf be the real function with value (Uf)(ω) = f(Tω) at ω. If f is integrable, then by change of variable (Theorem 16.13),

(24.12)  E[Uf] = ∫_Ω f(Tω) P(dω) = ∫_Ω f(ω) PT⁻¹(dω) = E[f].
And the operator U is nonnegative in the sense that it carries nonnegative functions to nonnegative functions; hence f ≤ g (pointwise) implies Uf ≤ Ug. Make these pointwise definitions: S_0 f = 0, S_n f = f + Uf + ⋯ + U^{n−1}f, M_n f = max_{0≤k≤n} S_k f, and M_∞ f = sup_{n≥0} S_n f = sup_{n≥0} M_n f. The maximal ergodic theorem:

Theorem 24.2. If f is integrable, then

(24.13)  ∫_{[M_∞f > 0]} f dP ≥ 0.

PROOF. Since B_n = [M_n f > 0] ↑ [M_∞ f > 0], it is enough, by the dominated convergence theorem, to show that ∫_{B_n} f dP ≥ 0. On B_n, M_n f = max_{1≤k≤n} S_k f. Since the operator U is nonnegative, S_k f = f + US_{k−1} f ≤ f + UM_n f for 1 ≤ k ≤ n, and therefore M_n f ≤ f + UM_n f on B_n. Since M_n f vanishes outside B_n and UM_n f is nonnegative everywhere, (24.12) gives

∫_{B_n} f dP ≥ ∫_{B_n} M_n f dP − ∫_{B_n} UM_n f dP ≥ E[M_n f] − E[UM_n f] = 0. ∎
Replace f by fI_A. If A is invariant, then S_n(fI_A) = (S_n f)I_A and M_n(fI_A) = (M_n f)I_A, and therefore (24.13) gives

(24.14)  ∫_{A ∩ [M_∞f > 0]} f dP ≥ 0.

Now replace f here by f − λ, λ a constant. Clearly [M_∞(f − λ) > 0] is the set
where for some n ≥ 1, S_n(f − λ) > 0, or n⁻¹S_n f > λ. Let

(24.15)  F_λ = [ω: sup_{n≥1} n⁻¹ Σ_{k=1}^n f(T^{k−1}ω) > λ];

it follows by (24.14) that ∫_{A ∩ F_λ} (f − λ) dP ≥ 0, or

(24.16)  λP(A ∩ F_λ) ≤ ∫_{A ∩ F_λ} f dP.
Now let a_n(ω) = n⁻¹ Σ_{k=1}^n f(T^{k−1}ω), and let G_λ = [ω: sup_n |a_n(ω)| > λ]; then λP(G_λ) ≤ 2E[|f|] (trivial if λ ≤ 0). Therefore, for positive a and λ,

∫_{[|a_n| > λ]} |a_n| dP ≤ (1/n) Σ_{k=1}^n ∫_{G_λ} |f(T^{k−1}ω)| P(dω)
  ≤ (1/n) Σ_{k=1}^n ∫_{G_λ ∩ [|f(T^{k−1}ω)| > a]} |f(T^{k−1}ω)| P(dω) + aP(G_λ)
  ≤ ∫_{[|f| > a]} |f(ω)| P(dω) + aP(G_λ)
  ≤ ∫_{[|f| > a]} |f(ω)| P(dω) + (2a/λ) E[|f|].

Take a = λ^{1/2}; since f is integrable, the final expression here goes to 0 as
λ → ∞. The a_n(ω) are therefore uniformly integrable, and E[f̂] = E[f]. The uniform integrability also implies E[|a_n − f̂|] → 0.

Set f̂(ω) = 0 outside the set where the a_n(ω) have a finite limit. Then (24.17) still holds with probability 1, and f̂(Tω) = f̂(ω). Since [ω: f̂(ω) ≤ x] is invariant, in the ergodic case its measure is either 0 or 1; if x_0 is the infimum of the x for which it is 1, then f̂(ω) = x_0 with probability 1, and from x_0 = E[f̂] = E[f] it follows that f̂(ω) = E[f] with probability 1. ∎
The Continued-Fraction Transformation
Let Ω consist of the irrationals in the unit interval, and for x in Ω let

(24.18)  Tx = {1/x},  a_1(x) = ⌊1/x⌋

be the fractional and integral parts of 1/x. This defines a mapping of Ω into itself, a mapping associated with the continued-fraction expansion of x [A36]. Concentrating on irrational x avoids some trivial details connected with the rational case, where the expansion is finite; it is an inessential restriction because the interest here centers on results of the almost-everywhere kind.

For x ∈ Ω and n ≥ 1 let a_n(x) = a_1(T^{n−1}x) be the nth partial quotient, and define integer-valued functions p_n(x) and q_n(x) by the recursions
(24.19)  p_{−1}(x) = 1, q_{−1}(x) = 0, p_0(x) = 0, q_0(x) = 1,
  p_n(x) = a_n(x) p_{n−1}(x) + p_{n−2}(x), n ≥ 1,
  q_n(x) = a_n(x) q_{n−1}(x) + q_{n−2}(x), n ≥ 1.
Simple induction arguments show that

(24.20)  q_n(x) p_{n−1}(x) − p_n(x) q_{n−1}(x) = (−1)^n, n ≥ 0,

and [A37: (27)]

(24.21)  x = 1/(a_1(x) + 1/(a_2(x) + ⋯ + 1/(a_n(x) + T^n x))), n ≥ 1.

It also follows inductively [A36: (26)] that

(24.22)  1/(a_1(x) + 1/(a_2(x) + ⋯ + 1/(a_n(x) + t))) = (p_n(x) + t p_{n−1}(x)) / (q_n(x) + t q_{n−1}(x)), n ≥ 1, 0 ≤ t < 1.
Taking t = 0 here gives the formula for the nth convergent:

(24.23)  p_n(x)/q_n(x) = 1/(a_1(x) + 1/(a_2(x) + ⋯ + 1/a_n(x))), n ≥ 1,

where, as follows from (24.20), p_n(x) and q_n(x) are relatively prime. By (24.21) and (24.22),

(24.24)  x = (p_n(x) + (T^n x) p_{n−1}(x)) / (q_n(x) + (T^n x) q_{n−1}(x)), n ≥ 0,

which, together with (24.20), implies†

(24.25)  x − p_n(x)/q_n(x) = (−1)^n T^n x / (q_n(x)(q_n(x) + (T^n x) q_{n−1}(x))), n ≥ 0.
Thus the convergents for even n fall to the left of x, and those for odd n fall to the right. And since (24.19) obviously implies that q_n(x) goes to infinity with n, the convergents p_n(x)/q_n(x) do converge to x: Each irrational x in (0, 1) has the infinite simple continued-fraction representation

(24.26)  x = 1/(a_1(x) + 1/(a_2(x) + ⋯)).

The representation is unique [A36: (35)], and Tx = 1/(a_2(x) + 1/(a_3(x) + ⋯)): T shifts the partial quotients in the same way the dyadic transformation (Example 24.3) shifts the digits of the dyadic expansion. Since the continued-fraction transformation turns out to be ergodic, it can be used to study the continued-fraction algorithm.

Suppose now that a_1, a_2, … are positive integers and define p_n and q_n by the recursions (24.19) without the argument x. Then (24.20) again holds (without the x), and so p_n/q_n − p_{n−1}/q_{n−1} = (−1)^{n+1}/q_{n−1}q_n, n ≥ 1. Since q_n increases to infinity, the right side here is the nth term of a convergent alternating series. And since p_0/q_0 = 0, the nth partial sum is p_n/q_n, which therefore converges to some limit: Every simple infinite continued fraction converges, and [A36: (36)] the limit is an irrational in (0, 1).

Let Δ_{a_1⋯a_n} be the set of x in Ω such that a_k(x) = a_k for 1 ≤ k ≤ n; call it a fundamental set of rank n. These sets are analogous to the dyadic intervals and the thin cylinders. For an explicit description of Δ_{a_1⋯a_n}, necessary for the proof of ergodicity, consider the function

(24.27)  ψ_{a_1⋯a_n}(t) = 1/(a_1 + 1/(a_2 + ⋯ + 1/(a_n + t))).

† Theorem 1.4 follows from this.
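The recursions (24.19) are easy to run. The following sketch (illustrative, not from the text; the starting point x = √2 − 1 and the number of steps are arbitrary) checks (24.20) at each step and verifies that the convergents p_n/q_n approximate x to within 1/q_n²:

```python
import math

x0 = math.sqrt(2) - 1
x = x0
p_prev, p = 1, 0            # p_{-1}, p_0
q_prev, q = 0, 1            # q_{-1}, q_0
for n in range(1, 16):
    a = int(1 / x)          # a_n(x) = a_1(T^{n-1} x)
    p_prev, p = p, a * p + p_prev   # recursion (24.19)
    q_prev, q = q, a * q + q_prev
    assert q * p_prev - p * q_prev == (-1) ** n   # identity (24.20)
    x = 1 / x - a           # apply T
assert abs(p / q - x0) < 1 / q ** 2               # convergents approximate x
print(p, q)
```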
If x ∈ Δ_{a_1⋯a_n}, then x = ψ_{a_1⋯a_n}(T^n x) by (24.21); on the other hand, because of the uniqueness of the partial quotients [A36: (33)], if t is an irrational in the unit interval, then (24.27) lies in Δ_{a_1⋯a_n}. Thus Δ_{a_1⋯a_n} is the image under (24.27) of Ω itself. Just as (24.22) follows by induction, so does

(24.28)  ψ_{a_1⋯a_n}(t) = (p_n + t p_{n−1}) / (q_n + t q_{n−1}).
And ψ_{a_1⋯a_n}(t) is increasing or decreasing in t according as n is even or odd, as is clear from the form of (24.27) (or differentiate in (24.28) and use (24.20)). It follows that Δ_{a_1⋯a_n} consists of the irrationals lying between p_n/q_n and (p_n + p_{n−1})/(q_n + q_{n−1}). By (24.20), this set has Lebesgue measure

(24.29)  λ(Δ_{a_1⋯a_n}) = 1/(q_n(q_n + q_{n−1})).

The fundamental sets of rank n form a partition 𝓟 of Ω, and their unions form a field 𝓕_0. Since q_n ≥ 2q_{n−2} by (24.19), induction gives q_n ≥ 2^{(n−1)/2} for n ≥ 0. And now (24.29) implies that 𝓕_0 generates the σ-field 𝓕 of linear Borel sets that are subsets of Ω (use Theorem 10.1(ii)). Thus 𝓟, 𝓕_0, 𝓕 are related as in the hypothesis of Lemma 2. Clearly T is measurable 𝓕/𝓕.

Although T does not preserve λ, it does preserve Gauss's measure, defined by

(24.30)  P(A) = (1/log 2) ∫_A dx/(1 + x), A ∈ 𝓕.

In fact, since T⁻¹(0, u) = ∪_{k=1}^∞ (1/(k + u), 1/k) for 0 < u < 1,
it is enough to verify

∫_0^u dx/(1 + x) = Σ_{k=1}^∞ ∫_{1/(k+u)}^{1/k} dx/(1 + x);

each term on the right is log((k + 1)/k) − log((k + u + 1)/(k + u)), and the sum telescopes to log(1 + u), the value of the left side.
Gauss's measure is useful because it is preserved by T and has the same sets of measure 0 as Lebesgue measure does.

Proof that T is ergodic. Fix a_1, …, a_n, and write ψ_n for ψ_{a_1⋯a_n} and Δ_n for Δ_{a_1⋯a_n}. Suppose that n is even, so that ψ_n is increasing. If x ∈ Δ_n, then (since x = ψ_n(T^n x)) s ≤ T^n x < t if and only if ψ_n(s) ≤ x < ψ_n(t); and this last condition implies x ∈ Δ_n. Combined with (24.28) and (24.20), this shows that

λ(Δ_n ∩ [x: s ≤ T^n x < t]) = ψ_n(t) − ψ_n(s) = (t − s)/((q_n + t q_{n−1})(q_n + s q_{n−1})).

If B is an interval with endpoints s and t, then by (24.29),

λ(Δ_n ∩ T^{−n}B) / (λ(Δ_n) λ(B)) = q_n(q_n + q_{n−1}) / ((q_n + t q_{n−1})(q_n + s q_{n−1})).

A similar argument establishes this for n odd. Since the ratio on the right lies between ½ and 2,

(24.31)  ½ λ(Δ_n) λ(B) ≤ λ(Δ_n ∩ T^{−n}B) ≤ 2 λ(Δ_n) λ(B).

Therefore, (24.10) holds for 𝓟, 𝓕_0, 𝓕 as defined above, A = Δ_n, n_A = n, c = 4, and λ in the role of P. Thus T⁻¹C = C implies that λ(C) is 0 or 1, and since Gauss's measure (24.30) comes from a density, P(C) is 0 or 1 as well. Therefore, T is an ergodic measure-preserving transformation on (Ω, 𝓕, P). It follows by the ergodic theorem that if f is integrable, then

(24.32)  lim_n (1/n) Σ_{k=1}^n f(T^{k−1}x) = (1/log 2) ∫_0^1 f(x)/(1 + x) dx
holds almost everywhere. Since the density in (24.30) is bounded away from 0 and ∞, the "integrable" and "almost everywhere" here can refer to P or to λ indifferently. Taking f to be the indicator of the x-set where a_1(x) = k shows that the asymptotic relative frequency of k among the partial quotients is almost everywhere equal to
(1/log 2) ∫_{1/(k+1)}^{1/k} dx/(1 + x) = (1/log 2) log((k + 1)²/(k(k + 2))).
In particular, the partial quotients are unbounded almost everywhere.
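This frequency law can be observed numerically. The following Monte Carlo sketch (not from the text; sample sizes and seed are arbitrary) pools the early partial quotients of many uniformly drawn points, an approximation to the almost-everywhere orbit frequency, and compares with the predicted value (1/log 2) log((k+1)²/(k(k+2))):

```python
import math
import random

random.seed(5)
quotients = []
for _ in range(4000):
    x = random.random()
    for _ in range(15):          # 15 steps keeps double-precision roundoff harmless
        if x <= 1e-12:
            break
        a = int(1 / x)           # a_n(x) = a_1(T^{n-1} x)
        quotients.append(a)
        x = 1 / x - a            # apply T

for k in (1, 2, 3, 4):
    freq = quotients.count(k) / len(quotients)
    predicted = math.log((k + 1) ** 2 / (k * (k + 2)), 2)
    assert abs(freq - predicted) < 0.03, (k, freq, predicted)
print(len(quotients))
```

For k = 1 the predicted frequency is log₂(4/3) ≈ 0.415: about 41.5 percent of all partial quotients equal 1.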
For understanding the accuracy of the continued-fraction algorithm, the magnitude of a_n(x) is less important than that of q_n(x). The key relationship is

(24.33)  1/(q_n(x)(q_{n+1}(x) + q_n(x))) ≤ |x − p_n(x)/q_n(x)| ≤ 1/(q_n(x) q_{n+1}(x)),

which follows from (24.25) and (24.19). These facts yield

(24.34)  lim_n (1/n) log q_n(x) = π²/(12 log 2)

almost everywhere. Since q_n(x) ≥ 2^{(n−1)/2}, (24.33) gives

|x − p_n(x)/q_n(x)| ≤ 1/q_{n+1}(x) ≤ 2^{−n/2}, n ≥ 1.

Since |log(1 + s)| ≤ 4|s| if |s| ≤ 1/√2, it follows that log x and log(p_n(x)/q_n(x)) differ by at most 4·2^{−n/2}·q_n(x)/p_n(x) for large n. Moreover, by (24.24), |q_{n−1}(x)x − p_{n−1}(x)| = Π_{k=1}^n T^{k−1}x, and this product lies between 1/(q_n(x) + q_{n−1}(x)) and 1/q_n(x); hence

(1/n) | Σ_{k=1}^n log T^{k−1}x + log q_n(x) | ≤ (log 2)/n → 0.
By the ergodic theorem, then,

lim_n (1/n) log q_n(x) = −(1/log 2) ∫_0^1 (log x)/(1 + x) dx
  = (1/log 2) ∫_0^1 (log(1 + x))/x dx
  = (1/log 2) Σ_{k=0}^∞ (−1)^k/(k + 1)²
  = π²/(12 log 2)

almost everywhere.
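The limit π²/(12 log 2) ≈ 1.1866 can be checked numerically. A sketch (not from the text; random 64-bit rationals stand in for typical reals, and exact Fraction arithmetic avoids roundoff in the iteration):

```python
import math
import random
from fractions import Fraction

random.seed(11)
levy = math.pi ** 2 / (12 * math.log(2))
n, samples, acc = 14, 1500, 0.0
for _ in range(samples):
    x = Fraction(random.getrandbits(64) | 1, 2 ** 64)
    q_prev, q = 0, 1                    # q_{-1} = 0, q_0 = 1
    for _ in range(n):
        if x == 0:
            break                       # rational expansion ended early (rare)
        a_k = int(1 / x)                # partial quotient
        q_prev, q = q, a_k * q + q_prev # recursion (24.19)
        x = 1 / x - a_k                 # apply T
    acc += math.log(q) / n
estimate = acc / samples
assert abs(estimate - levy) < 0.15, estimate
print(round(estimate, 2))
```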
Hence (24.34).

Diophantine Approximation
The fundamental theorem of the measure theory of Diophantine approximation, due to Khinchine, is Theorem 1.5 together with Theorem 1.6. As in Section 1, let φ(q) be a positive function of integers and let A_φ be the set of x in (0, 1) such that

(24.37)  |x − p/q| < φ(q)/q
If x_n > x for infinitely many n, then (25.1) fails at the discontinuity point of F. ∎

Example 25.4. Uniform Distribution Modulo 1.*
For a sequence x_1, x_2, … of real numbers, consider the corresponding sequence of their fractional parts {x_n} = x_n − ⌊x_n⌋. For each n, define a probability measure μ_n by

(25.3)  μ_n(A) = (1/n) #[k: 1 ≤ k ≤ n, {x_k} ∈ A];
† There is (see Section 28) a related notion of vague convergence in which μ may be defective in the sense that μ(R¹) < 1. Weak convergence is in this context sometimes called complete convergence.
* This topic, which requires ergodic theory, may be omitted.
SECTION 25. WEAK CONVERGENCE
has mass n⁻¹ at the points {x_1}, …, {x_n}, and if several of these points coincide, the masses add. The problem is to find the weak limit of {μ_n} in number-theoretically interesting cases. If the μ_n defined by (25.3) converge weakly to Lebesgue measure restricted to the unit interval, the sequence x_1, x_2, … is said to be uniformly distributed modulo 1. In this case every subinterval has asymptotically its proportional share of the points {x_n}; by Theorem 25.8 below, the same is then true of every subset whose boundary has Lebesgue measure 0.
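The definition is easy to probe empirically. The following sketch (illustrative, not from the text) takes x_n = nθ with θ = √2, anticipating the theorem below, and checks that each initial subinterval [0, u) receives asymptotically proportional mass:

```python
import math

theta = math.sqrt(2)        # an irrational multiplier
n = 100000
fracs = [(k * theta) % 1.0 for k in range(1, n + 1)]  # {k * theta}
for u in (0.1, 0.25, 0.5, 0.9):
    share = sum(f < u for f in fracs) / n             # mu_n([0, u))
    assert abs(share - u) < 0.01, (u, share)
print("ok")
```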
Theorem 25.1. For θ irrational, θ, 2θ, 3θ, … is uniformly distributed modulo 1.
PROOF. Since {nθ} = {n{θ}}, θ can be assumed to lie in [0, 1]. As in Example 24.4, map [0, 1) to the unit circle in the complex plane by φ(x) = e^{2πix}. If θ is irrational, then c = φ(θ) is not a root of unity, and so, as shown there, Tω = cω defines an ergodic transformation with respect to circular Lebesgue measure P. Let 𝓘 be the class of open arcs with endpoints in some fixed countable, dense set. By the ergodic theorem, the orbit {T^n ω} of almost every ω enters every I in 𝓘 with asymptotic relative frequency P(I). Fix such an ω. If I_1 ⊂ J ⊂ I_2, where J is a closed arc and I_1, I_2 are in 𝓘, then the upper and lower limits of n⁻¹ Σ_{k=1}^n I_J(T^{k−1}ω) are between P(I_1) and P(I_2), and therefore the limit exists and equals P(J). Since the orbits and the arcs are rotations of one another, every orbit enters every closed arc J with frequency P(J). This is true in particular of the orbit {c^n} of 1. Now carry all this back to [0, 1) by φ⁻¹: For every x in [0, 1), {nθ} = φ⁻¹(c^n) lies in [0, x] with asymptotic relative frequency x. ∎

For a simple proof by Fourier series, see Example 26.3.

Convergence in Distribution
Let X_n and X be random variables with respective distribution functions F_n and F. If F_n ⇒ F, then X_n is said to converge in distribution or in law to X, written X_n ⇒ X. This dual use of the double arrow will cause no confusion. Because of the defining conditions (25.1) and (25.2), X_n ⇒ X if and only if

(25.4)  lim_n P[X_n ≤ x] = P[X ≤ x]

for every x such that P[X = x] = 0.

Example 25.5. Let X_1, X_2, … be independent random variables, each with the exponential distribution: P[X_n > x] = e^{−ax}, x ≥ 0. Put M_n = max{X_1, …, X_n} and b_n = a⁻¹ log n. The relation (14.9), established in Example 14.1, can be restated as P[M_n − b_n ≤ x] → e^{−e^{−ax}}. If X is any random variable with distribution function e^{−e^{−ax}}, this can be written M_n − b_n ⇒ X. ∎
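A simulation of Example 25.5 (illustrative parameters and seed, not part of the text) compares the empirical distribution function of M_n − b_n with the limit e^{−e^{−ax}} at a few points:

```python
import math
import random

random.seed(2)
a, n, trials = 1.0, 500, 4000
b_n = math.log(n) / a
# M_n - b_n for `trials` independent replications
vals = [max(random.expovariate(a) for _ in range(n)) - b_n
        for _ in range(trials)]
for x in (-1.0, 0.0, 1.0, 2.0):
    emp = sum(v <= x for v in vals) / trials
    limit = math.exp(-math.exp(-a * x))   # the Gumbel-type limit law
    assert abs(emp - limit) < 0.03, (x, emp, limit)
print("ok")
```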
One is usually interested in proving weak convergence of the distributions of some given sequence of random variables, such as the Mn - bn in this example, and the result is often most clearly expressed in terms of the random variables themselves rather than in terms of their distributions or
CONVERGENCE OF DISTRIBUTIONS
distribution functions. Although the M_n − b_n here arise naturally from the problem at hand, the random variable X is simply constructed to make it possible to express the asymptotic relation compactly by M_n − b_n ⇒ X. Recall that by Theorem 14.1 there does exist a random variable for any prescribed distribution.

Example 25.6. For each n, let Ω_n be the space of n-tuples of 0's and 1's, let 𝓕_n consist of all subsets of Ω_n, and let P_n assign probability (λ/n)^k (1 − λ/n)^{n−k} to each ω consisting of k 1's and n − k 0's. Let X_n(ω) be the number of 1's in ω; then X_n, a random variable on (Ω_n, 𝓕_n, P_n), represents the number of successes in n Bernoulli trials having probability λ/n of success at each. Let X be a random variable, on some (Ω, 𝓕, P), having the Poisson distribution with parameter λ. According to Example 25.2, X_n ⇒ X. ∎
As this example shows, the random variables X_n may be defined on entirely different probability spaces. To allow for this possibility, the P on the left in (25.4) really should be written P_n. Suppressing the n causes no confusion if it is understood that P refers to whatever probability space it is that X_n is defined on; the underlying probability space enters into the definition only via the distribution μ_n it induces on the line. Any instance of F_n ⇒ F or of μ_n ⇒ μ can be rewritten in terms of convergence in distribution: There exist random variables X_n and X (on some probability spaces) with distribution functions F_n and F, and F_n ⇒ F and X_n ⇒ X express the same fact.

Convergence in Probability
Suppose that X, X_1, X_2, … are random variables all defined on the same probability space (Ω, 𝓕, P). If X_n → X with probability 1, then P[|X_n − X| ≥ ε i.o.] = 0 for ε > 0, and hence

(25.5)  lim_n P[|X_n − X| ≥ ε] = 0

by Theorem 4.1. Thus there is convergence in probability X_n →_P X; see Theorems 5.2 and 20.5.

Suppose that (25.5) holds for each positive ε. Now P[X ≤ x − ε] − P[|X_n − X| ≥ ε] ≤ P[X_n ≤ x] ≤ P[X ≤ x + ε] + P[|X_n − X| ≥ ε]; letting n tend to ∞ and then letting ε tend to 0 shows that P[X < x] ≤ lim inf_n P[X_n ≤ x] ≤ lim sup_n P[X_n ≤ x] ≤ P[X ≤ x]. Thus P[X_n ≤ x] → P[X ≤ x] if P[X = x] = 0, and so X_n ⇒ X:

Theorem 25.2. Suppose that X_n and X are random variables on the same probability space. If X_n → X with probability 1, then X_n →_P X. If X_n →_P X, then X_n ⇒ X.
Of the two implications in this theorem, neither converse holds. Because of Example 5.4, convergence in probability does not imply convergence with probability 1. Neither does convergence in distribution imply convergence in probability: if X and Y are independent and assume the values 0 and 1 with probability ½ each, and if X_n ≡ Y, then X_n ⇒ X, but X_n →_P X cannot hold because P[|X − Y| = 1] = ½. What is more, (25.5) is impossible if X and the X_n are defined on different probability spaces, as may happen in the case of convergence in distribution.

Although (25.5) in general makes no sense unless X and the X_n are defined on the same probability space, suppose that X is replaced by a constant real number a, that is, suppose that X(ω) ≡ a. Then (25.5) becomes

(25.6)  lim_n P[|X_n − a| ≥ ε] = 0,

and this condition makes sense even if the space of X_n does vary with n. Now a can be regarded as a random variable (on any probability space at all), and it is easy to show that (25.6) implies that X_n ⇒ a: Put ε = |x − a|; if x > a, then P[X_n ≤ x] ≥ P[|X_n − a| ≤ ε] → 1, and if x < a, then P[X_n ≤ x] ≤ P[|X_n − a| ≥ ε] → 0. If a is regarded as a random variable, its distribution function is 0 for x < a and 1 for x ≥ a. Thus (25.6) implies that the distribution function of X_n converges weakly to that of a. Suppose, on the other hand, that X_n ⇒ a. Then P[|X_n − a| ≥ ε] ≤ P[X_n ≤ a − ε] + 1 − P[X_n < a + ε] → 0, so that (25.6) holds:

Theorem 25.3. The condition (25.6) holds for all positive ε if and only if X_n ⇒ a, that is, if and only if

lim_n F_n(x) = 0 if x < a, 1 if x > a.

If (25.6) holds for all positive ε, X_n may be said to converge to a in probability. As this does not require that the X_n be defined on the same space, it is not really a special case of convergence in probability as defined by (25.5). Convergence in probability in this new sense will be denoted X_n ⇒ a, in accordance with the theorem just proved.

Example 14.4 restates the weak law of large numbers in terms of this concept. Indeed, if X_1, X_2, … are independent, identically distributed random variables with finite mean m, and if S_n = X_1 + ⋯ + X_n, the weak law of large numbers is the assertion n⁻¹S_n ⇒ m. Example 6.3 provides another illustration: If S_n is the number of cycles in a random permutation on n letters, then S_n/log n ⇒ 1.
Example 25.7. Suppose that X_n ⇒ X and δ_n → 0. Given ε and η, choose x so that P[|X| ≥ x] < η and P[X = ±x] = 0, and then choose n_0 so that n ≥ n_0 implies that |δ_n| ≤ ε/x and |P[X_n ≤ y] − P[X ≤ y]| < η for y = ±x. Then P[|δ_n X_n| ≥ ε] ≤ 3η for n ≥ n_0. Thus X_n ⇒ X and δ_n → 0 imply that δ_n X_n ⇒ 0, a restatement of Lemma 2 of Section 14 (p. 193). ∎
The asymptotic properties of a random variable should remain unaffected if it is altered by the addition of a random variable that goes to 0 in probability. Let (X_n, Y_n) be a two-dimensional random vector.

Theorem 25.4. If X_n ⇒ X and X_n − Y_n ⇒ 0, then Y_n ⇒ X.
PROOF. Suppose that y′ < x < y″ and P[X = y′] = P[X = y″] = 0. If y′ < x − ε < x < x + ε < y″, then

(25.7)  P[X_n ≤ y′] − P[|X_n − Y_n| ≥ ε] ≤ P[Y_n ≤ x] ≤ P[X_n ≤ y″] + P[|X_n − Y_n| ≥ ε].

Since X_n ⇒ X, letting n → ∞ gives

(25.8)  P[X ≤ y′] ≤ lim inf_n P[Y_n ≤ x] ≤ lim sup_n P[Y_n ≤ x] ≤ P[X ≤ y″].

Since P[X = y] = 0 for all but countably many y, if P[X = x] = 0, then y′ and y″ can further be chosen so that P[X ≤ y′] and P[X ≤ y″] are arbitrarily near P[X ≤ x]; hence P[Y_n ≤ x] → P[X ≤ x]. ∎

Theorem 25.4 has a useful extension. Suppose that (X_n^{(u)}, Y_n) is a two-dimensional random vector.

Theorem 25.5.
If, for each u, X_n^{(u)} ⇒ X^{(u)} as n → ∞, if X^{(u)} ⇒ X as u → ∞, and if

(25.9)  lim_u lim sup_n P[|X_n^{(u)} − Y_n| ≥ ε] = 0

for positive ε, then Y_n ⇒ X.

PROOF. Replace X_n by X_n^{(u)} in (25.7). If P[X = y′] = 0 = P[X^{(u)} = y′] and P[X = y″] = 0 = P[X^{(u)} = y″], letting n → ∞ and then u → ∞ gives (25.8) once again. Since P[X = y] = 0 = P[X^{(u)} = y] for all but countably many y, the proof can be completed as before. ∎
Fundamental Theorems
Some of the fundamental properties of weak convergence were established in Section 14. It was shown there that a sequence cannot have two distinct weak limits: If F_n ⇒ F and F_n ⇒ G, then F = G. The proof is simple: The hypothesis implies that F and G agree at their common points of continuity, hence at all but countably many points, and hence by right continuity at all points. Another simple fact is this: If lim_n F_n(d) = F(d) for d in a set D dense in R¹, then F_n ⇒ F. Indeed, if F is continuous at x, there are in D points d′ and d″ such that d′ < x < d″ and F(d″) − F(d′) < ε, and it follows that the limits superior and inferior of F_n(x) are within ε of F(x).

For any probability measure on (R¹, 𝓡¹) there is on some probability space a random variable having that measure as its distribution. Therefore, for probability measures satisfying μ_n ⇒ μ, there exist random variables Y_n and Y having these measures as distributions and satisfying Y_n ⇒ Y. According to the following theorem, the Y_n and Y can be constructed on the same probability space, and even in such a way that Y_n(ω) → Y(ω) for every ω, a condition much stronger than Y_n ⇒ Y. This result, Skorohod's theorem, makes possible very simple and transparent proofs of many important facts.
Theorem 25.6. Suppose that μ_n and μ are probability measures on (R¹, 𝓡¹) and μ_n ⇒ μ. There exist random variables Y_n and Y on a common probability space (Ω, 𝓕, P) such that Y_n has distribution μ_n, Y has distribution μ, and Y_n(ω) → Y(ω) for each ω.

PROOF.
For the probability space (Ω, 𝓕, P), take Ω = (0, 1), let 𝓕 consist of the Borel subsets of (0, 1), and for P(A) take the Lebesgue measure of A. The construction is related to that in the proofs of Theorems 14.1 and 20.4. Consider the distribution functions F_n and F corresponding to μ_n and μ. For 0 < ω < 1, put Y_n(ω) = inf[x: ω ≤ F_n(x)] and Y(ω) = inf[x: ω ≤ F(x)]. Since ω ≤ F_n(x) if and only if Y_n(ω) ≤ x (see the argument following (14.5)), P[ω: Y_n(ω) ≤ x] = P[ω: ω ≤ F_n(x)] = F_n(x). Thus Y_n has distribution function F_n; similarly, Y has distribution function F.

It remains to show that Y_n(ω) → Y(ω). The idea is that Y_n and Y are essentially inverse functions to F_n and F; if the direct functions converge, so must the inverses.

Suppose that 0 < ω < 1. Given ε, choose x so that Y(ω) − ε < x < Y(ω) and μ{x} = 0. Then F(x) < ω; F_n(x) → F(x) now implies that, for n large enough, F_n(x) < ω and hence Y(ω) − ε < x < Y_n(ω). Thus lim inf_n Y_n(ω) ≥ Y(ω). If ω < ω′ and ε is positive, choose a y for which Y(ω′) < y < Y(ω′) + ε and μ{y} = 0. Now ω < ω′ ≤ F(Y(ω′)) ≤ F(y), and so, for n large enough, ω < F_n(y) and hence Y_n(ω) ≤ y < Y(ω′) + ε. Thus lim sup_n Y_n(ω) ≤ Y(ω′) if ω < ω′. Therefore, Y_n(ω) → Y(ω) if Y is continuous at ω.
Since Y is nondecreasing on (0, 1), it has at most countably many discontinuities. At discontinuity points ω of Y, redefine Y_n(ω) = Y(ω) = 0. With this change, Y_n(ω) → Y(ω) for every ω. Since Y and the Y_n have been altered only on a set of Lebesgue measure 0, their distributions are still μ_n and μ. ∎

Note that this proof uses the order structure of the real line in an essential way. The proof of the corresponding result in R^k is more complicated.

The following mapping theorem is of very frequent use.
Theorem 25.7. Suppose that h: R¹ → R¹ is measurable and that the set D_h of its discontinuities is measurable.† If μ_n ⇒ μ and μ(D_h) = 0, then μ_n h⁻¹ ⇒ μh⁻¹. Recall (see (13.7)) that μh⁻¹ has value μ(h⁻¹A) at A.

PROOF. Consider the random variables Y_n and Y of Theorem 25.6. Since Y_n(ω) → Y(ω), if Y(ω) ∉ D_h, then h(Y_n(ω)) → h(Y(ω)). Since P[ω: Y(ω) ∈ D_h] = μ(D_h) = 0, it follows that h(Y_n(ω)) → h(Y(ω)) with probability 1. Hence h(Y_n) ⇒ h(Y) by Theorem 25.2. Since P[h(Y) ∈ A] = P[Y ∈ h⁻¹A] = μ(h⁻¹A), h(Y) has distribution μh⁻¹; similarly, h(Y_n) has distribution μ_n h⁻¹. Thus h(Y_n) ⇒ h(Y) is the same thing as μ_n h⁻¹ ⇒ μh⁻¹. ∎
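The quantile construction used in the proof of Theorem 25.6 is easy to carry out concretely. A sketch (illustrative, not from the text): for the standard exponential F, the map Y(ω) = inf[x: ω ≤ F(x)] is −log(1 − ω), and applying it to a uniform ω does produce the distribution F:

```python
import math
import random

random.seed(4)

def quantile_exp(w):
    # Y(w) = inf[x: w <= 1 - e^{-x}] for the standard exponential F
    return -math.log(1.0 - w)

trials = 20000
ys = [quantile_exp(random.random()) for _ in range(trials)]
for x in (0.5, 1.0, 2.0):
    emp = sum(y <= x for y in ys) / trials
    assert abs(emp - (1 - math.exp(-x))) < 0.02, (x, emp)
print("ok")
```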
Because of the definition of convergence in distribution, this result has an equivalent statement in terms of random variables:

Corollary 1. If X_n ⇒ X and P[X ∈ D_h] = 0, then h(X_n) ⇒ h(X).

Take X ≡ a:

Corollary 2. If X_n ⇒ a and h is continuous at a, then h(X_n) ⇒ h(a).
Example 25.8. From X_n ⇒ X it follows directly by the theorem that aX_n + b ⇒ aX + b. Suppose also that a_n → a and b_n → b. Then (a_n − a)X_n ⇒ 0 by Example 25.7, and so (a_n X_n + b_n) − (aX_n + b) ⇒ 0. And now a_n X_n + b_n ⇒ aX + b follows by Theorem 25.4: If X_n ⇒ X, a_n → a, and b_n → b, then a_n X_n + b_n ⇒ aX + b. This fact was stated and proved differently in Section 14; see Lemma 1 on p. 193. ∎
By definition, μ_n ⇒ μ means that the corresponding distribution functions converge weakly. The following theorem characterizes weak convergence

† That D_h lies in 𝓡¹ is generally obvious in applications. In point of fact, it always holds (even if h is not measurable): Let A(ε, δ) be the set of x for which there exist y and z such that |x − y| < δ, |x − z| < δ, and |h(y) − h(z)| ≥ ε. Then A(ε, δ) is open and D_h = ∪_ε ∩_δ A(ε, δ), where ε and δ range over the positive rationals.
without reference to distribution functions. The boundary ∂A of A consists of the points that are limits of sequences in A and are also limits of sequences in A^c; alternatively, ∂A is the closure of A minus its interior. A set A is a μ-continuity set if it is a Borel set and μ(∂A) = 0.
The following three conditions are equivalent.
{i) 1-Ln = /-L; {ii) ffdi-Ln � ffdJ.L for every bounded, continuous real function f; (iii) 1-L n( A ) � J.L( A) for every wcontinuity set A.
PROOF.
Suppose that 1-Ln 1-L· and consider the random variables Yn and Y of Theorem 25.6. Suppose that f is a bounded function such that J.L( D ) = 0, where D1 is the set of points of discontinuity of F. From P[ Y E D1 = J.L(D1) = 0 it follows that fCY,,) � f(Y) with probabiiity 1, and so by change of variable (see (21. 1)) and the bounded convergence theorem, ffdi-Ln = E[f(Yn)] � E[ f(Y)] = ffdJ.L. Thus 1-Ln = J.L and J.L(D1) = 0 together imply that ffdi-L n � JfdJ.L if f is bounded. In particular, (i) implies (ii). Further, if f = /A , then D1 = a A, and from J.L(aA) = 0 and 1-Ln = 1-L follows 1-Ln(A) -= ffdi-L n � Jfdi-L = J.L(A). Thus (i) also implies {iii). Since a( - oo, x ] = {x}, obviously (iii) implies {i). It therefore remains only to deduce 1-Ln = 1-L from {ii). Consider the corresponding distribution func tions. Suppose that x < y, and let f(t) be 1 for t < x, 0 for t > y, and interpolate linearly on [x, y ]: f(t) = { y - t)j(y - x) for x < t < y. Since Fn(x) < Jfdi-Ln and Jfdi-L < F(y), it follows from (ii) that lim supn Fn(x) < F(y); letting y ! x shows that lim supn Fn(x) < F(x). Similarly, F(u) < lim infn Fn(x) for u < x and hence F(x - ) < lim infn Fn(x). This implies • convergence at continuity points.
f
=
The function f in this last part of the proof is uniformly continuous. Hence 1-Ln = 1-L follows if ffdi-L n � ffdJ.L for every bounded and uniformly continuous f. The distributions in Example 25.3 satisfy 1-Ln = J.L, but 1-Ln( A ) does not converge to J.L( A) if A is the set of rationals. Hence this A cannot be a wcontinuity set; in fact, of course, aA = R 1 • • Example 25.9.
The concept of weak convergence would be nearly useless if (25.2) were not allowed to fail when μ(∂A) > 0. Since F(x) − F(x−) = μ{x} = μ(∂(−∞, x]), it is therefore natural in the original definition to allow (25.1) to fail when x is not a continuity point of F.
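The coexistence of genuine weak convergence with failure of setwise convergence is easy to see numerically. The following sketch is not from the text (the test function and tolerances are chosen for illustration): it checks condition (ii) of Theorem 25.8 for the measures of Example 25.3, which put mass 1/n at each point k/n.

```python
import math

# Illustrative sketch (not from the text): mu_n puts mass 1/n at each point
# k/n, k = 1, ..., n.  For a bounded continuous f, the integral under mu_n
# is a Riemann sum converging to the integral under Lebesgue measure on
# [0, 1], which is condition (ii) of Theorem 25.8.
def integral_mu_n(f, n):
    return sum(f(k / n) for k in range(1, n + 1)) / n

limit = math.sin(1.0)          # integral of cos over [0, 1]
errors = [abs(integral_mu_n(math.cos, n) - limit) for n in (10, 100, 1000)]
assert errors[0] > errors[1] > errors[2]   # the Riemann sums improve
assert errors[2] < 1e-3
# Yet mu_n(A) -> mu(A) fails for A = rationals: every atom k/n is rational,
# so mu_n(A) = 1 for all n, while the Lebesgue measure of A is 0.
```

This is exactly why (iii) of Theorem 25.8 is restricted to μ-continuity sets.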
CONVERGENCE OF DISTRIBUTIONS

Helly's Theorem

One of the most frequently used results in analysis is the Helly selection theorem:
Theorem 25.9. For every sequence {F_n} of distribution functions there exists a subsequence {F_{n_k}} and a nondecreasing, right-continuous function F such that lim_k F_{n_k}(x) = F(x) at continuity points x of F.
PROOF. An application of the diagonal method [A14] gives a sequence {n_k} of integers along which the limit G(r) = lim_k F_{n_k}(r) exists for every rational r. Define F(x) = inf[G(r): x < r]. Clearly F is nondecreasing. To each x and ε there is an r for which x < r and G(r) < F(x) + ε. If x < y < r, then F(y) ≤ G(r) < F(x) + ε. Hence F is continuous from the right. If F is continuous at x, choose y < x so that F(x) − ε < F(y); now choose rational r and s so that y < r < x < s and G(s) < F(x) + ε. From F(x) − ε < G(r) ≤ G(s) < F(x) + ε and F_{n_k}(r) ≤ F_{n_k}(x) ≤ F_{n_k}(s) it follows that as k goes to infinity F_{n_k}(x) has limits superior and inferior within ε of F(x). ∎
The F in this theorem necessarily satisfies 0 ≤ F(x) ≤ 1. But F need not be a distribution function: if F_n has a unit jump at n, for example, F(x) ≡ 0 is the only possibility. It is important to have a condition which ensures that for some subsequence the limit F is a distribution function.

A sequence of probability measures μ_n on (R¹, 𝓡¹) is said to be tight if for each ε there exists a finite interval (a, b] such that μ_n(a, b] > 1 − ε for all n. In terms of the corresponding distribution functions F_n, the condition is that for each ε there exist x and y such that F_n(x) < ε and F_n(y) > 1 − ε for all n. If μ_n is a unit mass at n, {μ_n} is not tight in this sense: the mass of μ_n "escapes to infinity." Tightness is a condition preventing this escape of mass.
Theorem 25.10. Tightness is a necessary and sufficient condition that for every subsequence {μ_{n_k}} there exist a further subsequence {μ_{n_{k(j)}}} and a probability measure μ such that μ_{n_{k(j)}} ⇒ μ as j → ∞.
Only the sufficiency of the condition in this theorem is used in what follows.

PROOF. Sufficiency. Apply Helly's theorem to the subsequence {F_{n_k}} of corresponding distribution functions. There exists a further subsequence {F_{n_{k(j)}}} such that lim_j F_{n_{k(j)}}(x) = F(x) at continuity points of F, where F is nondecreasing and right-continuous. There exists by Theorem 12.4 a measure μ on (R¹, 𝓡¹) such that μ(a, b] = F(b) − F(a). Given ε, choose a and b so that μ_n(a, b] > 1 − ε for all n, which is possible by tightness. By decreasing a and increasing b, one can ensure that they are continuity points of F. But then μ(a, b] ≥ 1 − ε. Therefore, μ is a probability measure, and of course μ_{n_{k(j)}} ⇒ μ.

Necessity. If {μ_n} is not tight, there exists a positive ε such that for each finite interval (a, b], μ_n(a, b] ≤ 1 − ε for some n. Choose n_k so that μ_{n_k}(−k, k] ≤ 1 − ε. Suppose that some subsequence {μ_{n_{k(j)}}} of {μ_{n_k}} were to converge weakly to some probability measure μ. Choose (a, b] so that μ{a} = μ{b} = 0 and μ(a, b] > 1 − ε. For large enough j, (a, b] ⊂ (−k(j), k(j)], and so 1 − ε ≥ μ_{n_{k(j)}}(−k(j), k(j)] ≥ μ_{n_{k(j)}}(a, b] → μ(a, b]. Thus μ(a, b] ≤ 1 − ε, a contradiction. ∎
Corollary. If {μ_n} is a tight sequence of probability measures, and if each subsequence that converges weakly at all converges weakly to the probability measure μ, then μ_n ⇒ μ.

PROOF. By the theorem, each subsequence {μ_{n_k}} contains a further subsequence {μ_{n_{k(j)}}} converging weakly (j → ∞) to some limit, and that limit must by hypothesis be μ. Thus every subsequence {μ_{n_k}} contains a further subsequence {μ_{n_{k(j)}}} converging weakly to μ. Suppose that μ_n ⇒ μ is false. Then there exists some x such that μ{x} = 0 but μ_n(−∞, x] does not converge to μ(−∞, x]. But then there exists a positive ε such that |μ_{n_k}(−∞, x] − μ(−∞, x]| ≥ ε for an infinite sequence {n_k} of integers, and no subsequence of {μ_{n_k}} can converge weakly to μ. This contradiction shows that μ_n ⇒ μ. ∎
Example 25.10. If μ_n is a unit mass at x_n, then {μ_n} is tight if and only if {x_n} is bounded. The theorem above and its corollary reduce in this case to standard facts about the real line; see Example 25.4 and [A10]: tightness of sequences of probability measures is analogous to boundedness of sequences of real numbers.

Let μ_n be the normal distribution with mean m_n and variance σ_n². If m_n and σ_n² are bounded, then the second moment of μ_n is bounded, and it follows by Markov's inequality (21.12) that {μ_n} is tight. The conclusion of Theorem 25.10 can also be checked directly: If {n_{k(j)}} is chosen so that lim_j m_{n_{k(j)}} = m and lim_j σ²_{n_{k(j)}} = σ², then μ_{n_{k(j)}} ⇒ μ, where μ is normal with mean m and variance σ² (a unit mass at m if σ² = 0).

If m_n ≥ b, then μ_n(b, ∞) ≥ ½; if m_n ≤ a, then μ_n(−∞, a] ≥ ½. Hence {μ_n} cannot be tight if m_n is unbounded. If m_n is bounded, say by K, then μ_n(−∞, a] ≥ ν(−∞, (a − K)σ_n^{−1}], where ν is the standard normal distribution. If σ_n is unbounded, then ν(−∞, (a − K)σ_n^{−1}] → ½ along some subsequence, and {μ_n} cannot be tight. Thus a sequence of normal distributions is tight if and only if the means and variances are bounded. ∎
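The normal case in Example 25.10 can be spot-checked numerically. A hedged sketch (the interval and parameter values are assumptions chosen for illustration, not from the text):

```python
import math

# Illustrative check of Example 25.10: a family of normal laws is tight
# if and only if the means and variances are bounded.
# Phi is the standard normal cdf.
def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mass_in(a, b, m, s):
    # mu(a, b] for the normal law with mean m and standard deviation s
    return Phi((b - m) / s) - Phi((a - m) / s)

# Bounded means and variances: one interval works uniformly (tightness).
params = [(m, s) for m in (-2, 0, 2) for s in (0.5, 1, 3)]
assert all(mass_in(-25, 25, m, s) > 1 - 1e-6 for m, s in params)

# Unbounded means: the mass escapes every fixed interval.
assert mass_in(-25, 25, 100, 1) < 1e-6

# Unbounded variances: a fixed interval eventually holds almost no mass.
assert mass_in(-25, 25, 0, 1e4) < 0.01
```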
Integration to the Limit

Theorem 25.11. If X_n ⇒ X, then E[|X|] ≤ lim inf_n E[|X_n|].
PROOF. Apply Skorohod's Theorem 25.6 to the distributions of X_n and X: There exist on a common probability space random variables Y_n and Y such that Y = lim_n Y_n with probability 1, Y_n has the distribution of X_n, and Y has the distribution of X. By Fatou's lemma, E[|Y|] ≤ lim inf_n E[|Y_n|]. Since |X| and |Y| have the same distribution, they have the same expected value (see (21.6)), and similarly for |X_n| and |Y_n|. ∎

The random variables X_n are said to be uniformly integrable if
(25.10)  lim_{α→∞} sup_n ∫_{[|X_n| ≥ α]} |X_n| dP = 0;

see (16.21). This implies (see (16.22)) that
(25.11)  sup_n E[|X_n|] < ∞.

Theorem 25.12. If X_n ⇒ X and the X_n are uniformly integrable, then X is integrable and

(25.12)  E[X_n] → E[X].
PROOF. Construct random variables Y_n and Y as in the preceding proof. Since Y_n → Y with probability 1 and the Y_n are uniformly integrable in the sense of (16.21), E[X_n] = E[Y_n] → E[Y] = E[X] by Theorem 16.14. ∎

If sup_n E[|X_n|^{1+ε}] < ∞ for some positive ε, then the X_n are uniformly integrable, because

(25.13)  ∫_{[|X_n| ≥ α]} |X_n| dP ≤ α^{−ε} ∫_{[|X_n| ≥ α]} |X_n|^{1+ε} dP ≤ α^{−ε} E[|X_n|^{1+ε}].

Since X_n ⇒ X implies that X_n^r ⇒ X^r by Theorem 25.7, there is the following consequence of the theorem.
Corollary. Let r be a positive integer. If X_n ⇒ X and sup_n E[|X_n|^{r+ε}] < ∞, where ε > 0, then E[|X^r|] < ∞ and E[X_n^r] → E[X^r].
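Uniform integrability cannot be dropped from Theorem 25.12. A hedged numerical illustration (the example is a standard one, assumed here for illustration rather than taken from the text): let X_n = n with probability 1/n and 0 otherwise.

```python
# Illustrative sketch (assumed example, not from the text): X_n = n with
# probability 1/n, else 0.  Then X_n => 0, since P[|X_n| > eps] = 1/n -> 0,
# but E[X_n] = n * (1/n) = 1 for every n, so E[X_n] does not converge to
# E[0] = 0.  The family is not uniformly integrable: for any alpha, the
# integral of |X_n| over [|X_n| >= alpha] is still 1 once n >= alpha.
def e_xn(n):
    return n * (1.0 / n)                  # exact expectation of X_n

def tail_integral(n, alpha):
    # integral of |X_n| over the event [|X_n| >= alpha]
    return n * (1.0 / n) if n >= alpha else 0.0

assert all(e_xn(n) == 1.0 for n in (10, 100, 1000))
assert all(tail_integral(n, 50.0) == 1.0 for n in (100, 1000))

# Truncation restores uniform integrability: Y_n = min(X_n, c) is bounded
# by c, and E[Y_n] = c/n -> 0 = E[0], consistent with Theorem 25.12.
c = 5.0
assert abs(c / 1000.0) < 0.01
```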
The X_n are also uniformly integrable if there is an integrable random variable Z such that P[|X_n| ≥ t] ≤ P[|Z| ≥ t] for t ≥ 0, because then (21.10) gives

∫_{[|X_n| ≥ α]} |X_n| dP = αP[|X_n| ≥ α] + ∫_α^∞ P[|X_n| ≥ t] dt ≤ αP[|Z| ≥ α] + ∫_α^∞ P[|Z| ≥ t] dt = ∫_{[|Z| ≥ α]} |Z| dP.

From this the dominated convergence theorem follows again.

PROBLEMS
25.1. (a) Show by example that distribution functions having densities can converge weakly even if the densities do not converge. Hint: Consider f_n(x) = 1 + cos 2πnx on [0, 1].
(b) Let f_n be 2^n times the indicator of the set of x in the unit interval for which d_{n+1}(x) = ··· = d_{2n}(x) = 0, where d_k(x) is the kth dyadic digit. Show that f_n(x) → 0 except on a set of Lebesgue measure 0; on this exceptional set, redefine f_n(x) = 0 for all n, so that f_n(x) → 0 everywhere. Show that the distributions corresponding to these densities converge weakly to Lebesgue measure confined to the unit interval.
(c) Show that distributions with densities can converge weakly to a limit that has no density (even to a unit mass).
(d) Show that discrete distributions can converge weakly to a distribution that has a density.
(e) Construct an example, like that of Example 25.3, in which μ_n(A) → μ(A) fails but in which all the measures come from continuous densities on [0, 1].
25.2. 14.8 ↑ Give a simple proof of the Glivenko-Cantelli theorem (Theorem 20.6) under the extra hypothesis that F is continuous.
25.3. Initial digits. (a) Show that the first significant digit of a positive number x is d (in the scale of 10) if and only if {log₁₀ x} lies between log₁₀ d and log₁₀(d + 1), d = 1, ..., 9, where the braces denote fractional part.
(b) For positive numbers x₁, x₂, ..., let N_n(d) be the number among the first n that have initial digit d. Show that

(25.14)  lim_n (1/n) N_n(d) = log₁₀(d + 1) − log₁₀ d,  d = 1, ..., 9,

if the sequence log₁₀ x_n, n = 1, 2, ..., is uniformly distributed modulo 1. This is true, for example, of x_n = θⁿ if log₁₀ θ is irrational.
(c) Let D_n be the first significant digit of a positive random variable X_n. Show that

(25.15)  lim_n P[D_n = d] = log₁₀(d + 1) − log₁₀ d,  d = 1, ..., 9,
Show that for each probability measure p, on the line there exist probability measures 1-tn with finite support such that p, 11 p,. Show further that ttn{x} can =
CONVERGENCE OF DISTRIBUTIONS
340
be taken rational and that each point in the support can be taken rational. Thus there exists a countable set of probability measures such that every p., is the weak limit of some sequence from the set. The space of distribution fu nctions is thus separable in the Levy metric (see Problem 14.5). 25.5. Show that
(25.5) implies that
P([ X < X ] ll [ xn
< X ]) --+ 0 if
P[ X = X 1 =
0.
25.6. For arbitrary random variables X_n there exist positive constants a_n such that a_n X_n ⇒ 0.
25.7. Generalize Example 25.8 by showing for three-dimensional random vectors (A_n, B_n, X_n) and constants a and b, a > 0, that, if A_n ⇒ a, B_n ⇒ b, and X_n ⇒ X, then A_n X_n + B_n ⇒ aX + b. Hint: First show that if Y_n ⇒ Y and D_n ⇒ 0, then D_n Y_n ⇒ 0.
25.8. Suppose that X_n ⇒ X and that h_n and h are Borel functions. Let E be the set of x for which h_n x_n → hx fails for some sequence x_n → x. Suppose that E ∈ 𝓡¹ and P[X ∈ E] = 0. Show that h_n X_n ⇒ hX.
25.9. Suppose that the distributions of random variables X_n and X have densities f_n and f. Show that if f_n(x) → f(x) for x outside a set of Lebesgue measure 0, then X_n ⇒ X.
25.10. ↑ Suppose that X_n assumes as values γ_n + kδ_n, k = 0, ±1, ..., where δ_n > 0. Suppose that δ_n → 0 and that, if k_n is an integer varying with n in such a way that γ_n + k_n δ_n → x, then P[X_n = γ_n + k_n δ_n] δ_n^{−1} → f(x), where f is the density of a random variable X. Show that X_n ⇒ X.
25.11. ↑ Let S_n have the binomial distribution with parameters n and p. Assume as known that

(25.16)  (np(1 − p))^{1/2} P[S_n = k_n] → (2π)^{−1/2} e^{−x²/2}

if (k_n − np)(np(1 − p))^{−1/2} → x. Deduce the De Moivre-Laplace theorem: (S_n − np)(np(1 − p))^{−1/2} ⇒ N, where N has the standard normal distribution. This is a special case of the central limit theorem; see Section 27.
25.12. Prove weak convergence in Example 25.3 by using Theorem 25.8 and the theory of the Riemann integral.
25.13. (a) Show that probability measures satisfy μ_n ⇒ μ if μ_n(a, b] → μ(a, b] whenever μ{a} = μ{b} = 0.
(b) Show that, if ∫f dμ_n → ∫f dμ for all continuous f with bounded support, then μ_n ⇒ μ.
25.14. ↑ Let μ be Lebesgue measure confined to the unit interval; let μ_n correspond to a mass of x_{n,i} − x_{n,i−1} at some point in (x_{n,i−1}, x_{n,i}], where 0 = x_{n0} < x_{n1} < ··· < x_{nn} = 1. Show by considering the distribution functions that μ_n ⇒ μ if max_{i≤n}(x_{n,i} − x_{n,i−1}) → 0. Deduce that a bounded Borel function continuous almost everywhere on the unit interval is Riemann integrable. See Problem 17.1.
25.15. 2.18 5.19 ↑ A function f of positive integers has distribution function F if F is the weak limit of the distribution function P_n[m: f(m) ≤ x] of f under the measure having probability 1/n at each of 1, ..., n (see (2.34)). In this case D[m: f(m) ≤ x] = F(x) (see (2.35)) for continuity points x of F. Show that φ(m)/m (see (2.37)) has a distribution:
(a) Show by the mapping theorem that it suffices to prove that f(m) = log(φ(m)/m) = Σ_{p|m} log(1 − 1/p) has a distribution.
(b) Let f_u(m) = Σ_{p≤u} δ_p(m) log(1 − 1/p), and show by (5.45) that f_u has distribution function F_u(x) = P[Σ_{p≤u} X_p log(1 − 1/p) ≤ x], where the X_p are independent random variables (one for each prime p) such that P[X_p = 1] = 1/p and P[X_p = 0] = 1 − 1/p.
(c) Show that Σ_p X_p log(1 − 1/p) converges with probability 1. Hint: Use Theorem 22.6.
(d) Show that lim_u sup_n E_n[|f − f_u|] = 0 (see (5.46) for the notation).
(e) Conclude by Markov's inequality and Theorem 25.5 that f has the distribution of the sum in (c).
25.16. For A ∈ 𝓡¹ and T > 0, put λ_T(A) = λ([−T, T] ∩ A)/2T, where λ is Lebesgue measure. The relative measure of A is

(25.17)  ρ(A) = lim_{T→∞} λ_T(A),

provided that this limit exists. This is a continuous analogue of density (see (2.35)) for sets of integers. A Borel function f has a distribution under λ_T; if this converges weakly to F, then

(25.18)  ρ[x: f(x) ≤ u] = F(u)

for continuity points u of F, and F is called the distribution function of f. Show that all periodic functions have distributions.

25.17. Suppose that sup_n ∫f dμ_n < ∞ for a nonnegative f such that f(x) → ∞ as x → ±∞. Show that {μ_n} is tight.

25.18. 23.4 ↑ Show that the random variables A and L in Problems 23.3 and 23.4 converge in distribution. Show that the moments converge.
25.19. In the applications of Theorem 9.2, only a weaker result is actually needed: For each K there exists a positive α = α(K) such that if E[X] = 0, E[X²] = 1, and E[X⁴] ≤ K, then P[X > 0] ≥ α. Prove this by using tightness and the corollary to Theorem 25.12.
25.20. Find uniformly integrable random variables X_n for which there is no integrable Z satisfying P[|X_n| ≥ t] ≤ P[|Z| ≥ t] for t ≥ 0.
SECTION 26. CHARACTERISTIC FUNCTIONS

Definition
The characteristic function of a probability measure μ on the line is defined for real t by

φ(t) = ∫_{−∞}^∞ e^{itx} μ(dx) = ∫_{−∞}^∞ cos tx μ(dx) + i ∫_{−∞}^∞ sin tx μ(dx);

see the end of Section 16 for integrals of complex-valued functions.† A random variable X with distribution μ has characteristic function

φ(t) = E[e^{itX}] = ∫_{−∞}^∞ e^{itx} μ(dx).

The characteristic function is thus defined as the moment generating function but with the real argument s replaced by it; it has the advantage that it always exists because e^{itx} is bounded. The characteristic function in nonprobabilistic contexts is called the Fourier transform.

The characteristic function has three fundamental properties to be established here:

(i) If μ₁ and μ₂ have respective characteristic functions φ₁(t) and φ₂(t), then μ₁ * μ₂ has characteristic function φ₁(t)φ₂(t). Although convolution is essential to the study of sums of independent random variables, it is a complicated operation, and it is often simpler to study the products of the corresponding characteristic functions.

(ii) The characteristic function uniquely determines the distribution. This shows that in studying the products in (i), no information is lost.

(iii) From the pointwise convergence of characteristic functions follows the weak convergence of the corresponding distributions. This makes it possible, for example, to investigate the asymptotic distributions of sums of independent random variables by means of their characteristic functions.

Moments and Derivatives

It is convenient first to study the relation between a characteristic function and the moments of the distribution it comes from.

†From complex variable theory only De Moivre's formula and the simplest properties of the exponential function are needed here.
Of course, φ(0) = 1, and by (16.30), |φ(t)| ≤ 1 for all t. By Theorem 16.8(i), φ(t) is continuous in t. In fact, |φ(t + h) − φ(t)| ≤ ∫|e^{ihx} − 1| μ(dx), and so it follows by the bounded convergence theorem that φ(t) is uniformly continuous.
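These basic facts (φ(0) = 1, |φ(t)| ≤ 1) are easy to verify numerically for a concrete law. A hedged sketch (quadrature step and tolerances are assumptions for illustration): the exponential distribution's characteristic function is approximated by a Riemann sum and compared with the closed form 1/(1 − it), which appears in the table later in this section.

```python
import cmath
import math

# Illustrative sketch: approximate phi(t) = E[e^{itX}] for X exponential
# by a midpoint Riemann sum of e^{itx} e^{-x} over [0, xmax], and compare
# with the closed form 1/(1 - it).  Step h and cutoff xmax are assumptions.
def phi_exponential(t, h=1e-3, xmax=40.0):
    s = 0.0 + 0.0j
    x = 0.5 * h
    while x < xmax:
        s += cmath.exp(1j * t * x) * math.exp(-x) * h
        x += h
    return s

for t in (0.0, 0.7, -2.0, 5.0):
    exact = 1.0 / (1.0 - 1j * t)
    assert abs(phi_exponential(t) - exact) < 1e-3   # quadrature agrees
    assert abs(phi_exponential(t)) <= 1.0 + 1e-9    # |phi(t)| <= 1
```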
In the following relations, versions of Taylor's formula with remainder, x is assumed real. Integration by parts shows that

(26.1)  ∫₀^x (x − s)^n e^{is} ds = x^{n+1}/(n + 1) + (i/(n + 1)) ∫₀^x (x − s)^{n+1} e^{is} ds,

and it follows by induction that

(26.2)  e^{ix} = Σ_{k=0}^n (ix)^k/k! + (i^{n+1}/n!) ∫₀^x (x − s)^n e^{is} ds

for n ≥ 0. Replace n by n − 1 in (26.1), solve for the integral on the right, and substitute this for the integral in (26.2); this gives

(26.3)  e^{ix} = Σ_{k=0}^n (ix)^k/k! + (i^n/(n − 1)!) ∫₀^x (x − s)^{n−1} (e^{is} − 1) ds.

Estimating the integrals in (26.2) and (26.3) (consider separately the cases x ≥ 0 and x ≤ 0) now leads to

(26.4)  |e^{ix} − Σ_{k=0}^n (ix)^k/k!| ≤ min{|x|^{n+1}/(n + 1)!, 2|x|^n/n!}

for n ≥ 0. The first term on the right gives a sharp estimate for |x| small, the second a sharp estimate for |x| large. For n = 0, 1, 2, the inequality specializes to

(26.4₀)  |e^{ix} − 1| ≤ min{|x|, 2},
(26.4₁)  |e^{ix} − (1 + ix)| ≤ min{½x², 2|x|},
(26.4₂)  |e^{ix} − (1 + ix − ½x²)| ≤ min{⅙|x|³, x²}.

If X has a moment of order n, it follows that

(26.5)  |φ(t) − Σ_{k=0}^n ((it)^k/k!) E[X^k]| ≤ E[min{|tX|^{n+1}/(n + 1)!, 2|tX|^n/n!}].
For any t satisfying

(26.6)  Σ_{k=0}^∞ (|t|^k/k!) E[|X|^k] = E[e^{|tX|}] < ∞,

φ(t) must therefore have the expansion

(26.7)  φ(t) = Σ_{k=0}^∞ ((it)^k/k!) E[X^k];

compare (21.22). If E[e^{|tX|}] < ∞, then (see (16.31)) (26.7) must hold. Thus (26.7) holds if X has a moment generating function over the whole line.

Example 26.1. Since E[e^{|tX|}] < ∞ if X has the standard normal distribution, by (26.7) and (21.7) its characteristic function is

(26.8)  φ(t) = Σ_{k=0}^∞ ((it)^{2k}/(2k)!) · 1 × 3 × ··· × (2k − 1) = Σ_{k=0}^∞ (1/k!) (−t²/2)^k = e^{−t²/2}.

This and (21.25) formally coincide if s = it. ∎
If the power-series expansion (26.7) holds, the moments of X can be read off from it:

(26.9)  E[X^k] = i^{−k} φ^{(k)}(0).

This is the analogue of (21.23). It holds, however, under the weakest possible assumption, namely that E[|X^k|] < ∞. Indeed,

(φ(t + h) − φ(t))/h − E[iXe^{itX}] = E[e^{itX} ((e^{ihX} − 1)/h − iX)].

By (26.4₁), the integrand on the right is dominated by 2|X| and goes to 0 with h; hence the expected value goes to 0 by the dominated convergence theorem. Thus φ'(t) = E[iXe^{itX}]. Repeating this argument inductively gives

(26.10)  φ^{(k)}(t) = E[(iX)^k e^{itX}] = i^k E[X^k e^{itX}]
if E[|X^k|] < ∞. Hence (26.9) holds if E[|X^k|] < ∞. The proof of uniform continuity for φ(t) works for φ^{(k)}(t) as well.

If E[X²] is finite, then

(26.11)  φ(t) = 1 + itE[X] − ½t²E[X²] + o(t²),  t → 0.

Indeed, by (26.4₂), the error is at most t²E[min{|t||X|³, X²}], and as t → 0 the integrand goes to 0 and is dominated by X². Estimates of this kind are essential for proving limit theorems.

The more moments μ has, the more derivatives φ has. This is one sense in which lightness of the tails of μ is reflected by smoothness of φ. There are results which connect the behavior of φ(t) as |t| → ∞ with smoothness properties of μ. The Riemann-Lebesgue theorem is the most important of these:

Theorem 26.1. If μ has a density, then φ(t) → 0 as |t| → ∞.
PROOF. The problem is to prove for integrable f that ∫f(x)e^{itx} dx → 0 as |t| → ∞. There exists by Theorem 17.1 a step function g = Σ_k a_k I_{A_k}, a finite linear combination of indicators of intervals A_k = (a_k, b_k], for which ∫|f − g| dx < ε. Now ∫f(x)e^{itx} dx differs by at most ε from ∫g(x)e^{itx} dx = Σ_k a_k (e^{itb_k} − e^{ita_k})/(it), and this goes to 0 as |t| → ∞. ∎
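The contrast in Theorem 26.1 between measures with and without densities can be seen in closed form. A hedged illustration (the chosen values of t are arbitrary): the uniform density on (0, 1) has characteristic function (e^{it} − 1)/(it), which decays, while a unit point mass at 0 has characteristic function identically 1.

```python
import cmath

# Illustrative check of the Riemann-Lebesgue contrast: a density forces
# phi(t) -> 0 as |t| -> infinity; a point mass does not.
def cf_uniform(t):
    # characteristic function of the uniform distribution on (0, 1)
    return (cmath.exp(1j * t) - 1.0) / (1j * t)

def cf_point_mass_at_zero(t):
    return 1.0 + 0.0j              # phi(t) = E[e^{it*0}] = 1 for all t

mags = [abs(cf_uniform(t)) for t in (10.0, 100.0, 1000.0)]
assert mags[0] > mags[1] > mags[2]          # decays (at worst like 2/|t|)
assert mags[2] < 0.01
assert all(abs(cf_point_mass_at_zero(t)) == 1.0 for t in (10.0, 1000.0))
```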
Independence
The multiplicative property (21.28) of moment generating functions extends to characteristic functions. Suppose that X₁ and X₂ are independent random variables with characteristic functions φ₁ and φ₂. If Y_j = cos tX_j and Z_j = sin tX_j, then (Y₁, Z₁) and (Y₂, Z₂) are independent; by the rules for integrating complex-valued functions,

φ₁(t)φ₂(t) = (E[Y₁] + iE[Z₁])(E[Y₂] + iE[Z₂])
  = E[Y₁]E[Y₂] − E[Z₁]E[Z₂] + i(E[Y₁]E[Z₂] + E[Z₁]E[Y₂])
  = E[Y₁Y₂ − Z₁Z₂ + i(Y₁Z₂ + Z₁Y₂)] = E[e^{it(X₁+X₂)}].

This extends to sums of three or more: If X₁, ..., X_n are independent, then

(26.12)  E[e^{it Σ_{k=1}^n X_k}] = Π_{k=1}^n E[e^{itX_k}].
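The product rule (26.12) can be spot-checked numerically. A hedged sketch (the distributions, grid size, and tolerance are assumptions for illustration): the sum of two independent U(0, 1) variables has the triangular density x on [0, 1] and 2 − x on [1, 2], so its characteristic function should equal the square of (e^{it} − 1)/(it).

```python
import cmath

# Illustrative check of (26.12): characteristic function of a sum of two
# independent U(0,1) variables, computed from the triangular density of
# the sum, versus the product (here: square) of the uniform cf.
def cf_triangular(t, n=20000):
    h = 2.0 / n
    s = 0.0 + 0.0j
    for k in range(n):
        x = (k + 0.5) * h
        f = x if x <= 1.0 else 2.0 - x     # density of U(0,1) + U(0,1)
        s += cmath.exp(1j * t * x) * f * h
    return s

def cf_uniform(t):
    return (cmath.exp(1j * t) - 1.0) / (1j * t)

for t in (0.3, 1.0, -4.0):
    assert abs(cf_triangular(t) - cf_uniform(t) ** 2) < 1e-5
```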
If X has characteristic function φ(t), then aX + b has characteristic function

(26.13)  E[e^{it(aX+b)}] = e^{itb} φ(at).

In particular, −X has characteristic function φ(−t), which is the complex conjugate of φ(t).
Inversion and the Uniqueness Theorem

A characteristic function φ uniquely determines the measure μ it comes from. This fundamental fact will be derived by means of an inversion formula through which μ can in principle be recovered from φ. Define

S(T) = ∫₀^T (sin t/t) dt,  T > 0.

In Example 18.4 it is shown that

(26.14)  lim_{T→∞} S(T) = π/2;

S(T) is therefore bounded. If sgn θ is +1, 0, or −1 as θ is positive, 0, or negative, then

(26.15)  ∫₀^T (sin tθ/t) dt = sgn θ · S(T|θ|).

Theorem 26.2. If the probability measure μ has characteristic function φ, and if μ{a} = μ{b} = 0, then

(26.16)  μ(a, b] = lim_{T→∞} (1/2π) ∫_{−T}^T ((e^{−ita} − e^{−itb})/(it)) φ(t) dt.

Distinct measures cannot have the same characteristic function.

Note: By (26.4₁) the integrand here converges as t → 0 to b − a, which is to be taken as its value at t = 0. For fixed a and b the integrand is thus continuous in t, and by (26.4₀) it is bounded. If μ is a unit mass at 0, then φ(t) ≡ 1, and the integral in (26.16) cannot be extended over the whole line.
PROOF. The inversion formula will imply uniqueness: It will imply that if μ and ν have the same characteristic function, then μ(a, b] = ν(a, b] if μ{a} = ν{a} = μ{b} = ν{b} = 0; but such intervals (a, b] form a π-system generating 𝓡¹.
Denote by I_T the quantity inside the limit in (26.16). By Fubini's theorem,

(26.17)  I_T = ∫_{−∞}^∞ [ (1/2π) ∫_{−T}^T ((e^{it(x−a)} − e^{it(x−b)})/(it)) dt ] μ(dx).

This interchange is legitimate because the double integral extends over a set of finite product measure and by (26.4₀) the integrand is bounded by |b − a|. Rewrite the integrand by De Moivre's formula. Since sin s and cos s are odd and even, respectively, (26.15) gives

I_T = ∫_{−∞}^∞ [ (sgn(x − a)/π) S(T|x − a|) − (sgn(x − b)/π) S(T|x − b|) ] μ(dx).

The integrand here is bounded and converges as T → ∞ to the function

(26.18)  ψ_{a,b}(x) = 0 if x < a;  ½ if x = a;  1 if a < x < b;  ½ if x = b;  0 if b < x.

Thus I_T → ∫ψ_{a,b} dμ, which implies that (26.16) holds if μ{a} = μ{b} = 0. ∎

The inversion formula contains further information. Suppose that

(26.19)  ∫_{−∞}^∞ |φ(t)| dt < ∞.
In this case the integral in (26.16) can be extended over R¹. By (26.4₀),

|(e^{−itb} − e^{−ita})/(it)| = |e^{it(b−a)} − 1|/|t| ≤ |b − a|;

therefore, μ(a, b] ≤ ((b − a)/2π) ∫_{−∞}^∞ |φ(t)| dt, and there can be no point masses. By (26.16), the corresponding distribution function satisfies

(F(x + h) − F(x))/h = (1/2π) ∫_{−∞}^∞ ((e^{−itx} − e^{−it(x+h)})/(ith)) φ(t) dt

(whether h is positive or negative). The integrand is by (26.4₀) dominated by |φ(t)| and goes to e^{−itx}φ(t) as h → 0. Therefore, F has derivative

(26.20)  f(x) = (1/2π) ∫_{−∞}^∞ e^{−itx} φ(t) dt.
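The inversion formula (26.20) lends itself to a direct numerical check. A hedged sketch (grid, cutoff, and tolerance are assumptions for illustration): inserting the standard normal characteristic function e^{−t²/2} from Example 26.1 should return the standard normal density.

```python
import cmath
import math

# Illustrative check of (26.20): numerically invert phi(t) = exp(-t^2/2)
# and compare with the standard normal density.  Step h and cutoff tmax
# are chosen so the truncation and quadrature errors are negligible.
def density_from_cf(x, h=0.01, tmax=12.0):
    s = 0.0 + 0.0j
    t = -tmax + 0.5 * h
    while t < tmax:
        s += cmath.exp(-1j * t * x) * math.exp(-t * t / 2.0) * h
        t += h
    return s.real / (2.0 * math.pi)

for x in (0.0, 0.5, -1.3):
    target = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    assert abs(density_from_cf(x) - target) < 1e-6
```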
Since f is continuous for the same reason φ is, it integrates to F by the fundamental theorem of the calculus (see (17.6)). Thus (26.19) implies that μ has the continuous density (26.20). Moreover, this is the only continuous density. In this result, as in the Riemann-Lebesgue theorem, conditions on the size of φ(t) for large |t| are connected with smoothness properties of μ.

The inversion formula (26.20) has many applications. In the first place, it can be used for a new derivation of (26.14). As pointed out in Example 17.3, the existence of the limit in (26.14) is easy to prove. Denote this limit temporarily by π₀/2, without assuming that π₀ = π. Then (26.16) and (26.20) follow as before if π is replaced by π₀. Applying the latter to the standard normal density (see (26.8)) gives

(26.21)  (1/√(2π)) e^{−x²/2} = (1/2π₀) ∫_{−∞}^∞ e^{−itx} e^{−t²/2} dt,

where the π on the left is that of analysis and geometry: it comes ultimately from the quadrature (18.10). An application of (26.8) with x and t interchanged reduces the right side of (26.21) to (√(2π)/2π₀) e^{−x²/2}, and therefore π₀ does equal π.

Consider the densities in the table. The characteristic function for the normal distribution has already been calculated. For the uniform distribution over (0, 1), the computation is of course straightforward; note that in this case the density cannot be recovered from (26.20), because φ(t) is not integrable; this is reflected in the fact that the density has discontinuities at 0 and 1.
Distribution              Density                      Interval            Characteristic function

1. Normal                 (2π)^{−1/2} e^{−x²/2}        −∞ < x < ∞          e^{−t²/2}
2. Uniform                1                            0 < x < 1           (e^{it} − 1)/(it)
3. Exponential            e^{−x}                       0 < x < ∞           1/(1 − it)
4. Double exponential
   or Laplace             ½ e^{−|x|}                   −∞ < x < ∞          1/(1 + t²)
5. Cauchy                 (1/π) · 1/(1 + x²)           −∞ < x < ∞          e^{−|t|}
6. Triangular             1 − |x|                      −1 < x < 1          2(1 − cos t)/t²
7.                        (1/π)(1 − cos x)/x²          −∞ < x < ∞          (1 − |t|) I_{(−1,1)}(t)
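Two rows of the table can be verified by direct quadrature. A hedged sketch (grid sizes and tolerances are assumptions for illustration): the double exponential density should give 1/(1 + t²), and the triangular density should give 2(1 − cos t)/t².

```python
import cmath
import math

# Illustrative spot-check of two table rows by midpoint quadrature.
def cf(density, a, b, t, n=40000):
    h = (b - a) / n
    s = sum(cmath.exp(1j * t * (a + (k + 0.5) * h)) * density(a + (k + 0.5) * h)
            for k in range(n)) * h
    return s

for t in (0.5, 2.0, -3.0):
    # Row 4: double exponential (Laplace) density (1/2) e^{-|x|}
    lap = cf(lambda x: 0.5 * math.exp(-abs(x)), -40.0, 40.0, t)
    assert abs(lap - 1.0 / (1.0 + t * t)) < 1e-4
    # Row 6: triangular density 1 - |x| on (-1, 1)
    tri = cf(lambda x: 1.0 - abs(x), -1.0, 1.0, t)
    assert abs(tri - 2.0 * (1.0 - math.cos(t)) / (t * t)) < 1e-6
```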
The characteristic function for the exponential distribution is easily calculated; compare Example 21.3. As for the double exponential or Laplace distribution, e^{−|x|} e^{itx} integrates over (0, ∞) to (1 − it)^{−1} and over (−∞, 0) to (1 + it)^{−1}, which gives the result. By (26.20), then,

½ e^{−|x|} = (1/2π) ∫_{−∞}^∞ e^{−itx} (1/(1 + t²)) dt.

For x = 0 this gives the standard integral ∫_{−∞}^∞ dt/(1 + t²) = π; see Example 17.5. Thus the Cauchy density in the table integrates to 1 and has characteristic function e^{−|t|}. This distribution has no first moment, and the characteristic function is not differentiable at the origin.

A straightforward integration shows that the triangular density has the characteristic function given in the table, and by (26.20),

(1 − |x|) I_{(−1,1)}(x) = (1/π) ∫_{−∞}^∞ e^{−itx} ((1 − cos t)/t²) dt.

For x = 0 this is ∫_{−∞}^∞ (1 − cos t) t^{−2} dt = π; hence the last line of the table. Each density and characteristic function in the table can be transformed by (26.13), which gives a family of distributions.
The Continuity Theorem
Because of (26.12), the characteristic function provides a powerful means of studying the distributions of sums of independent random variables. It is often easier to work with products of characteristic functions than with convolutions, and knowing the characteristic function of the sum is by Theorem 26.2 in principle the same thing as knowing the distribution itself. Because of the following continuity theorem, characteristic functions can be used to study limit distributions.

Theorem 26.3. Let μ_n, μ be probability measures with characteristic functions φ_n, φ. A necessary and sufficient condition for μ_n ⇒ μ is that φ_n(t) → φ(t) for each t.

PROOF. Necessity. For each t, e^{itx} has bounded modulus and is continuous in x. The necessity therefore follows by an application of Theorem 25.8 (to the real and imaginary parts of e^{itx}).
Sufficiency. By Fubini's theorem,

(26.22)  (1/u) ∫_{−u}^u (1 − φ_n(t)) dt = 2 ∫_{−∞}^∞ (1 − sin ux/(ux)) μ_n(dx) ≥ 2 ∫_{[|x| ≥ 2/u]} (1 − 1/(u|x|)) μ_n(dx) ≥ μ_n[x: |x| ≥ 2/u].

(Note that the first integral is real.) Since φ is continuous at the origin and φ(0) = 1, there is for positive ε a u for which u^{−1} ∫_{−u}^u (1 − φ(t)) dt < ε. Since φ_n converges to φ, the bounded convergence theorem implies that there exists an n₀ such that u^{−1} ∫_{−u}^u (1 − φ_n(t)) dt < 2ε for n ≥ n₀. If a = 2/u in (26.22), then μ_n[x: |x| ≥ a] ≤ 2ε for n ≥ n₀. Increasing a if necessary will ensure that this inequality also holds for the finitely many n preceding n₀. Therefore, {μ_n} is tight.

By the corollary to Theorem 25.10, μ_n ⇒ μ will follow if it is shown that each subsequence {μ_{n_k}} that converges weakly at all converges weakly to μ. But if μ_{n_k} ⇒ ν as k → ∞, then by the necessity half of the theorem, already proved, ν has characteristic function lim_k φ_{n_k}(t) = φ(t). By Theorem 26.2, ν and μ must coincide. ∎
Two corollaries, interesting in themselves, will make clearer the structure of the proof of sufficiency given above. In each, let μ_n be probability measures on the line with characteristic functions φ_n.

Corollary 1. Suppose that lim_n φ_n(t) = g(t) for each t, where the limit function g is continuous at 0. Then there exists a μ such that μ_n ⇒ μ, and μ has characteristic function g.

PROOF. The point of the corollary is that g is not assumed at the outset to be a characteristic function. But in the argument following (26.22), only φ(0) = 1 and the continuity of φ at 0 were used; hence {μ_n} is tight under the present hypothesis. If μ_{n_k} ⇒ ν as k → ∞, then ν must have characteristic function lim_k φ_{n_k}(t) = g(t). Thus g is, in fact, a characteristic function, and the proof goes through as before. ∎
In this proof the continuity of g was used to establish tightness. Hence if {μ_n} is assumed tight in the first place, the hypothesis of continuity can be suppressed:

Corollary 2. Suppose that lim_n φ_n(t) = g(t) exists for each t and that {μ_n} is tight. Then there exists a μ such that μ_n ⇒ μ, and μ has characteristic function g.

Example 26.2. This second corollary applies, for example, if the μ_n have a common bounded support. If μ_n is the uniform distribution over (−n, n), its characteristic function is (nt)^{−1} sin tn for t ≠ 0, and hence it converges to I_{{0}}(t). In this case {μ_n} is not tight, the limit function is not continuous at 0, and μ_n does not converge weakly. ∎
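Theorem 26.3 is how the central limit theorem is usually approached, and the convergence of characteristic functions can be watched numerically. A hedged sketch (parameter p, the value of t, and the tolerances are assumptions for illustration): the characteristic functions of the standardized binomial converge pointwise to e^{−t²/2}, so by the continuity theorem the distributions converge weakly (the De Moivre-Laplace theorem of Problem 25.11).

```python
import cmath
import math

# Illustrative check of Theorem 26.3 for the standardized binomial (n, p):
# phi_{S_n}(t) = (1 - p + p e^{it})^n, so the standardized variable
# (S_n - np)/sigma, sigma = sqrt(np(1-p)), has the cf below.
def cf_standardized_binomial(t, n, p=0.3):
    sigma = math.sqrt(n * p * (1.0 - p))
    u = t / sigma
    return cmath.exp(-1j * u * n * p) * (1.0 - p + p * cmath.exp(1j * u)) ** n

t = 1.7
target = math.exp(-t * t / 2.0)     # standard normal cf from Example 26.1
errs = [abs(cf_standardized_binomial(t, n) - target) for n in (10, 100, 10000)]
assert errs[0] > errs[2]            # pointwise convergence sets in
assert errs[2] < 1e-2
```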
Fourier Series*

Let μ be a probability measure on 𝓡¹ that is supported by [0, 2π]. Its Fourier coefficients are defined by

(26.23)  c_m = ∫₀^{2π} e^{imx} μ(dx),  m = 0, ±1, ±2, ....

These coefficients, the values of the characteristic function for integer arguments, suffice to determine μ except for the weights it may put at 0 and 2π. The relation between μ and its Fourier coefficients can be expressed formally by

(26.24)  μ(dx) ~ (1/2π) Σ_{l=−∞}^∞ c_l e^{−ilx} dx:

if the μ(dx) in (26.23) is replaced by the right side of (26.24), and if the sum over l is interchanged with the integral, the result is a formal identity.

To see how to recover μ from its Fourier coefficients, consider the symmetric partial sums s_m(t) = (2π)^{−1} Σ_{l=−m}^m c_l e^{−ilt} and their Cesàro averages σ_m(t) = m^{−1} Σ_{l=0}^{m−1} s_l(t). From the trigonometric identity [A24]

(26.25)  Σ_{l=0}^{m−1} Σ_{k=−l}^l e^{ikx} = sin²(½mx)/sin²(½x)

it follows that

(26.26)  σ_m(t) = (1/2πm) ∫₀^{2π} (sin²(½m(t − x))/sin²(½(t − x))) μ(dx).

*This topic may be omitted.
If μ is (2π)^{−1} times Lebesgue measure confined to [0, 2π], then c₀ = 1 and c_m = 0 for m ≠ 0, so that σ_m(t) = s_m(t) = (2π)^{−1}; this gives the identity

(26.27)  (1/2πm) ∫_{−π}^π (sin²(½ms)/sin²(½s)) ds = 1.
Suppose that 0 < a < b < 2π, and integrate (26.26) over (a, b). Fubini's theorem (the integrand is nonnegative) and a change of variable lead to

(26.28)  ∫_a^b σ_m(t) dt = ∫₀^{2π} [ (1/2πm) ∫_{a−x}^{b−x} (sin²(½ms)/sin²(½s)) ds ] μ(dx).

The denominator in (26.27) is bounded away from 0 outside (−δ, δ), and so as m goes to ∞ with δ fixed (0 < δ < π),

(1/2πm) ∫_{δ ≤ |s| ≤ π} (sin²(½ms)/sin²(½s)) ds → 0.

By (26.27) and (26.28), the inner integral in (26.28) therefore goes to 1 for a < x < b, to ½ for x = a and for x = b, and to 0 for the remaining x, so that ∫_a^b σ_m(t) dt → μ(a, b) + ½μ{a} + ½μ{b}. The Fourier coefficients thus determine μ apart from the way it splits its mass between 0 and 2π.

Now let μ_n be probability measures supported by [0, 2π] with Fourier coefficients c_m^{(n)}, and suppose that lim_n c_m^{(n)} = c_m for all m, where the c_m are the Fourier coefficients of μ. Since {μ_n} is tight, μ_n ⇒ μ will hold if μ_{n_k} ⇒ ν (k → ∞) implies ν = μ. But in this case ν and μ have the same coefficients c_m, and hence they are identical except perhaps in the way they split the mass ν{0, 2π} = μ{0, 2π} between the points 0 and 2π. But this poses no problem if μ{0, 2π} = 0: If lim_n c_m^{(n)} = c_m for all m and μ{0} = μ{2π} = 0, then μ_n ⇒ μ.
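The recovery of interval masses from Fourier coefficients can be sketched numerically. A hedged illustration (the example measure and all numerical parameters are assumptions, not from the text): take μ with density (1 + cos x)/(2π) on [0, 2π], whose Fourier coefficients (26.23) are c₀ = 1, c₁ = c₋₁ = ½, and c_m = 0 otherwise; integrating the Cesàro averages σ_m over (a, b) should approach μ(a, b).

```python
import math

# Illustrative sketch of the recovery argument behind (26.26)-(26.28)
# for mu(dx) = (1 + cos x)/(2 pi) dx on [0, 2 pi].
def c(m):
    return {0: 1.0, 1: 0.5, -1: 0.5}.get(m, 0.0)

def sigma(t, m):
    # Cesaro average of the symmetric partial sums s_l(t), l = 0, ..., m-1;
    # the coefficients here satisfy c_l = c_{-l}, so the paired terms
    # c_l e^{-ilt} + c_{-l} e^{ilt} reduce to 2 c_l cos(lt).
    s = c(0)                     # running partial sum (up to the 1/2pi)
    total = s
    for l in range(1, m):
        s += 2.0 * c(l) * math.cos(l * t)
        total += s
    return total / (m * 2.0 * math.pi)

a, b, n, m = 1.0, 4.0, 2000, 200
h = (b - a) / n
approx = sum(sigma(a + (k + 0.5) * h, m) * h for k in range(n))
exact = (b - a + math.sin(b) - math.sin(a)) / (2.0 * math.pi)   # mu(a, b)
assert abs(approx - exact) < 0.01
```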
Example 26.3. If μ is (2π)⁻¹ times Lebesgue measure confined to the interval [0, 2π], the condition is that lim_n c_m^{(n)} = 0 for m ≠ 0. Let x₁, x₂, ... be a sequence of reals, and let μ_n put mass n⁻¹ at each point 2π{x_k}, 1 ≤ k ≤ n, where {x_k} = x_k − [x_k] denotes fractional part. This is the probability measure (25.3) rescaled to [0, 2π]. The sequence x₁, x₂, ... is uniformly distributed modulo 1 if and only if

(1/n) Σ_{k=1}^{n} e^{2πimx_k} → 0

for m ≠ 0. This is Weyl's criterion.
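Weyl's criterion is easy to probe numerically. The following sketch is an illustration of mine, not part of the text; the choice θ = √2 and the sample size are arbitrary.

```python
import cmath

# Numeric sketch of Weyl's criterion (illustration, not from the text):
# for irrational theta and x_k = k*theta, the averages
# n^{-1} sum_{k<=n} e^{2*pi*i*m*k*theta} should be near 0 for integer m != 0.
theta = 2 ** 0.5          # an irrational rotation number (my choice)
n = 100_000
weyl_avgs = {}
for m in (1, 2, 5):
    s = sum(cmath.exp(2j * cmath.pi * m * k * theta) for k in range(1, n + 1))
    weyl_avgs[m] = abs(s / n)
    print(f"m = {m}: |average| = {weyl_avgs[m]:.2e}")
```

The geometric-series computation in the x_k = kθ discussion below shows these averages are in fact O(1/n), which is why the printed moduli are tiny.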
If x_k = kθ, where θ is irrational, then exp(2πiθm) ≠ 1 for m ≠ 0 and hence

(1/n) Σ_{k=1}^{n} e^{2πikθm} = (e^{2πiθm}/n) · (1 − e^{2πinθm}) / (1 − e^{2πiθm}) → 0.

Thus θ, 2θ, 3θ, ... is uniformly distributed modulo 1 if θ is irrational, which gives another proof of Theorem 25.1. ∎

PROBLEMS
26.1. A random variable X has a lattice distribution if for some a and b, b > 0, the lattice [a + nb: n = 0, ±1, ...] supports the distribution of X. Let X have characteristic function φ.
(a) Show that a necessary condition for X to have a lattice distribution is that |φ(t)| = 1 for some t ≠ 0.
(b) Show that the condition is sufficient as well.
(c) Suppose that |φ(t)| = |φ(t′)| = 1 for incommensurable t and t′ (t ≠ 0, t′ ≠ 0, t/t′ irrational). Show that P[X = c] = 1 for some constant c.
26.2. If μ(−∞, x] = μ[−x, ∞) for all x (which implies that μ(A) = μ(−A) for all A ∈ ℛ¹), then μ is symmetric. Show that this holds if and only if the characteristic function is real.
26.3. Consider functions φ that are real and nonnegative and satisfy φ(−t) = φ(t) and φ(0) = 1.
(a) Suppose that d₁, d₂, ... are positive and Σ_k d_k = ∞, that s₁ > s₂ > ··· > 0 and lim_k s_k = 0, and that Σ_k s_k d_k = 1. Let φ be the convex polygon whose successive sides have slopes −s₁, −s₂, ... and lengths d₁, d₂, ... when projected on the horizontal axis: φ has value 1 − Σ_{j=1}^{k} s_j d_j at t_k = d₁ + ··· + d_k. If s_n = 0, there are in effect only n sides. Let φ₀(t) = (1 − |t|)I_{(−1,1)}(t) be the characteristic function in the last line in the table on p. 348, and show that φ(t) is a convex combination of the characteristic functions φ₀(t/t_k) and hence is itself a characteristic function.
(b) Pólya's criterion. Show that φ is a characteristic function if it is even and continuous and, on [0, ∞), nonincreasing and convex (φ(0) = 1).
26.4. ↑ Let φ₁ and φ₂ be characteristic functions, and show that the set A = [t: φ₁(t) = φ₂(t)] is closed, contains 0, and is symmetric about 0. Show that every set with these three properties can be such an A. What does this say about the uniqueness theorem?
26.5. Show by Theorem 26.1 and integration by parts that if μ has a density f with integrable derivative f′, then φ(t) = o(t⁻¹) as |t| → ∞. Extend to higher derivatives.
26.6. Show for independent random variables uniformly distributed over (−1, +1) that X₁ + ··· + X_n has density π⁻¹ ∫₀^∞ ((sin t)/t)ⁿ cos tx dt for n ≥ 2.
26.7. 21.17 ↑ Uniqueness theorem for moment generating functions. Suppose that F has a moment generating function in (−s₀, s₀), s₀ > 0. From the fact that ∫_{−∞}^{∞} e^{zx} dF(x) is analytic in the strip −s₀ < Re z < s₀, prove that the moment generating function determines F. Show that it is enough that the moment generating function exist in [0, s₀), s₀ > 0.
26.8. 21.20 26.7 ↑ Show that the gamma density (20.47) has characteristic function

(1 − it/α)^{−u} = exp[−u log(1 − it/α)],

where the logarithm is the principal part. Show that ∫₀^∞ e^{zx} f(x; α, u) dx is analytic for Re z < α.

26.9. Use characteristic functions for a simple proof that the family of Cauchy distributions defined by (20.45) is closed under convolution; compare the argument in Problem 20.14(a). Do the same for the normal distribution (compare Example 20.6) and for the Poisson and gamma distributions.
26.10. Suppose that F_n ⇒ F and that the characteristic functions are dominated by an integrable function. Show that F has a density that is the limit of the densities of the F_n.

26.11. Show for all a and b that the right side of (26.16) is μ(a, b) + ½μ{a} + ½μ{b}.
26.12. By the kind of argument leading to (26.16), show that

(26.30)   μ{a} = lim_{T→∞} (1/2T) ∫_{−T}^{T} e^{−ita} φ(t) dt.

26.13. ↑ Let x₁, x₂, ... be the points of positive μ-measure. By the following steps, prove that

(26.31)   lim_{T→∞} (1/2T) ∫_{−T}^{T} |φ(t)|² dt = Σ_k (μ{x_k})².
Let X and Y be independent and have characteristic function φ.
(a) Show by (26.30) that the left side of (26.31) is P[X − Y = 0].
(b) Show (Theorem 20.3) that P[X − Y = 0] = ∫_{−∞}^{∞} P[X = y] μ(dy) = Σ_k (μ{x_k})².

26.14. ↑ Show that μ has no point masses if |φ(t)|² is integrable.
26.15. (a) Show that if {μ_n} is tight, then the characteristic functions φ_n(t) are uniformly equicontinuous (for each ε there is a δ such that |s − t| < δ implies that |φ_n(s) − φ_n(t)| < ε for all n).
(b) Show that μ_n ⇒ μ implies that φ_n(t) → φ(t) uniformly on bounded sets.
(c) Show that the convergence in part (b) need not be uniform over the entire line.
26.16. 14.5 26.15 ↑ For distribution functions F and G, define d′(F, G) = sup_t |φ(t) − ψ(t)|/(1 + |t|), where φ and ψ are the corresponding characteristic functions. Show that this is a metric and equivalent to the Lévy metric.
26.17. 25.16 ↑ A real function f has mean value

(26.32)   M[f(x)] = lim_{T→∞} (1/2T) ∫_{−T}^{T} f(x) dx,

provided that f is integrable over each [−T, T] and the limit exists.
(a) Show that, if f is bounded and e^{itf(x)} has a mean value for each t, then f has a distribution in the sense of (25.18).
(b) Show that

(26.33)   M[e^{itx}] = 1 if t = 0, and 0 if t ≠ 0.

Of course, f(x) = x has no distribution.
26.18. Suppose that X is irrational with probability 1. Let μ_n be the distribution of the fractional part {nX}. Use the continuity theorem and Theorem 25.1 to show that n⁻¹ Σ_{k=1}^{n} μ_k converges weakly to the uniform distribution on [0, 1].

26.19. 25.13 ↑ The uniqueness theorem for characteristic functions can be derived from the Weierstrass approximation theorem. Fill in the details of the following argument. Let μ and ν be probability measures on the line. For continuous f with bounded support choose a so that μ(−a, a) and ν(−a, a) are nearly 1 and f vanishes outside (−a, a). Let g be periodic and agree with f in (−a, a), and by the Weierstrass theorem uniformly approximate g(x) by a trigonometric sum p(x) = Σ_{k=1}^{N} a_k e^{it_k x}. If μ and ν have the same characteristic function, then ∫f dμ ≈ ∫g dμ ≈ ∫p dμ = ∫p dν ≈ ∫g dν ≈ ∫f dν.
26.20. Use the continuity theorem to prove the result in Example 25.2 concerning the convergence of the binomial distribution to the Poisson.
26.21. According to Example 25.8, if X_n ⇒ X, a_n → a, and b_n → b, then a_n X_n + b_n ⇒ aX + b. Prove this by means of characteristic functions.

26.22. 26.1 26.15 ↑ According to Theorem 14.2, if X_n ⇒ X and a_n X_n + b_n ⇒ Y, where a_n > 0 and the distributions of X and Y are nondegenerate, then a_n → a > 0, b_n → b, and aX + b and Y have the same distribution. Prove this by characteristic functions. Let φ_n, φ, ψ be the characteristic functions of X_n, X, Y.
(a) Show that |φ_n(a_n t)| → |ψ(t)| uniformly on bounded sets and hence that a_n cannot converge to 0 along a subsequence.
(b) Interchange the roles of φ and ψ and show that a_n cannot converge to infinity along a subsequence.
(c) Show that a_n converges to some a > 0.
(d) Show that e^{itb_n} → ψ(t)/φ(at) in a neighborhood of 0 and hence that ∫₀^t e^{isb_n} ds → ∫₀^t (ψ(s)/φ(as)) ds. Conclude that b_n converges.
26.23. Prove a continuity theorem for moment generating functions as defined by (22.4) for probability measures on [0, ∞). For uniqueness, see Theorem 22.2; the analogue of (26.22) is

(2/u) ∫₀^u (1 − M(−s)) ds ≥ μ(2/u, ∞).

26.24. 26.4 ↑ Show by example that the values φ(m) of the characteristic function at integer arguments m may not determine the distribution if it is not supported by [0, 2π].
26.25. If f is integrable over [0, 2π], define its Fourier coefficients as c_m = ∫₀^{2π} e^{imx} f(x) dx. Show that these coefficients uniquely determine f up to sets of measure 0.
26.26. 19.8 26.25 ↑ Show that the trigonometric system (19.17) is complete.
26.27. The Fourier-series analogue of the condition (26.19) is Σ_m |c_m| < ∞. Show that it implies that μ has density f(x) = (2π)⁻¹ Σ_m c_m e^{−imx} on [0, 2π], where f is continuous and f(0) = f(2π). This is the analogue of the inversion formula (26.20).
26.28. ↑ Show that

Σ_{m=1}^{∞} (sin mx)/m = (π − x)/2,   0 < x < 2π.
26.29. (a) Suppose X′ and X″ are independent random variables with values in [0, 2π], and let X be X′ + X″ reduced modulo 2π. Show that the corresponding Fourier coefficients satisfy c_m = c′_m c″_m.
(b) Show that if one or the other of X′ and X″ is uniformly distributed, so is X.
26.30. 26.25 ↑ The theory of Fourier series can be carried over from [0, 2π] to the unit circle in the complex plane with normalized circular Lebesgue measure P. The circular functions e^{imx} become the powers wᵐ, and an integrable f is determined to within sets of measure 0 by its Fourier coefficients c_m = ∫_Ω wᵐ f(w) P(dw). Suppose that A is invariant under the rotation through the angle arg c (Example 24.4). Find a relation on the Fourier coefficients of I_A, and conclude that the rotation is ergodic if c is not a root of unity. Compare the proof on p. 316.
SECTION 27. THE CENTRAL LIMIT THEOREM

Identically Distributed Summands

The central limit theorem says roughly that the sum of many independent random variables will be approximately normally distributed if each summand has high probability of being small. Theorem 27.1, the Lindeberg-Lévy theorem, will give an idea of the techniques and hypotheses needed for the more general results that follow. Throughout, N will denote a random variable with the standard normal distribution:

(27.1)   P[N ≤ x] = (1/√(2π)) ∫_{−∞}^{x} e^{−u²/2} du.

Theorem 27.1. Suppose that {X_n} is an independent sequence of random variables having the same distribution with mean c and finite positive variance σ². If S_n = X₁ + ··· + X_n, then

(27.2)   (S_n − nc)/(σ√n) ⇒ N.

By the argument in Example 25.7, (27.2) implies that n⁻¹S_n ⇒ c. The central limit theorem and the strong law of large numbers thus refine the weak law of large numbers in different directions.

Since Theorem 27.1 is a special case of Theorem 27.2, no proof is really necessary. To understand the methods of this section, however, consider the special case in which X_k takes the values ±1 with probability ½ each. Each X_k then has characteristic function φ(t) = ½e^{it} + ½e^{−it} = cos t. By (26.12) and (26.13), S_n/√n has characteristic function φⁿ(t/√n), and so, by the continuity theorem, the problem is to show that cosⁿ(t/√n) → E[e^{itN}] = e^{−t²/2}, or that n log cos(t/√n) (well defined for large n) goes to −½t². But this follows by l'Hôpital's rule: let t/√n = x go continuously to 0.
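The convergence cosⁿ(t/√n) → e^{−t²/2} asserted here is easy to see numerically; the following small check is mine, with an arbitrary value of t.

```python
import math

# Check (illustration only, not from the text) that cos^n(t/sqrt(n))
# approaches exp(-t^2/2) as n grows, for a fixed t.
t = 1.5
vals = []
for n in (10, 100, 10_000):
    vals.append(math.cos(t / math.sqrt(n)) ** n)
    print(n, vals[-1])
gap = abs(vals[-1] - math.exp(-t * t / 2))
print(f"distance to exp(-t^2/2): {gap:.1e}")
```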
For a proof closer in spirit to those that follow, note that (26.5) for n = 3 gives |φ(t) − (1 − ½t²)| ≤ |t|³ (since |X_k| ≤ 1). Therefore,

(27.3)   |φ(t/√n) − (1 − t²/2n)| ≤ |t|³/n^{3/2}.

Rather than take logarithms, use (27.5) below, which gives (n large)

(27.4)   |φⁿ(t/√n) − (1 − t²/2n)ⁿ| ≤ n · |t|³/n^{3/2} = |t|³/√n.

But of course (1 − t²/2n)ⁿ → e^{−t²/2}, which completes the proof for this special case. Logarithms for complex arguments can be avoided by use of the following simple lemma.
Lemma 1. Let z₁, ..., z_m and w₁, ..., w_m be complex numbers of modulus at most 1; then

(27.5)   |z₁ ⋯ z_m − w₁ ⋯ w_m| ≤ Σ_{k=1}^{m} |z_k − w_k|.

PROOF. Since

z₁ ⋯ z_m − w₁ ⋯ w_m = (z₁ − w₁) z₂ ⋯ z_m + w₁ (z₂ ⋯ z_m − w₂ ⋯ w_m),

and since z₂ ⋯ z_m and w₁ have modulus at most 1, the result follows by induction on m. ∎
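Lemma 1 can be exercised on random points of the unit disk; the check below is mine, not the book's.

```python
import random

# Sanity check (mine) of Lemma 1: for complex numbers of modulus at most 1,
# |z_1...z_m - w_1...w_m| <= sum_k |z_k - w_k|.
rng = random.Random(0)

def disk_point():
    # rejection-sample a point of the closed unit disk
    while True:
        z = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
        if abs(z) <= 1:
            return z

checked = 0
for _ in range(1000):
    m = rng.randint(1, 8)
    zs = [disk_point() for _ in range(m)]
    ws = [disk_point() for _ in range(m)]
    pz = pw = complex(1, 0)
    for z in zs:
        pz *= z
    for w in ws:
        pw *= w
    assert abs(pz - pw) <= sum(abs(z - w) for z, w in zip(zs, ws)) + 1e-12
    checked += 1
print(checked, "random cases verified")
```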
The sequence X₁, X₂, ... is said to be α-mixing if there exist nonnegative numbers α_n such that

(27.19)   |P(A ∩ B) − P(A)P(B)| ≤ α_n

for A ∈ σ(X₁, ..., X_k), B ∈ σ(X_{k+n}, X_{k+n+1}, ...), k ≥ 1, n ≥ 1. Suppose that α_n → 0, the idea being that X_k and X_{k+n} are then approximately independent for large n. If the distribution of the random vector (X_n, X_{n+1}, ..., X_{n+j}) does not depend on n, the sequence is said to be stationary.
Example 27.6. Let {Y_n} be a Markov chain with finite state space and positive transition probabilities p_ij, and suppose that X_n = f(Y_n), where f is some real function on the state space. If the initial probabilities p_i are the stationary ones (see Theorem 8.9), then clearly {X_n} is stationary. Moreover, by (8.42), |p_ij^{(n)} − p_j| ≤ ρⁿ, where ρ < 1. By (8.11),

P[Y₁ = i₁, ..., Y_k = i_k, Y_{k+n} = j₀, ..., Y_{k+n+l} = j_l] = p_{i₁} p_{i₁i₂} ⋯ p_{i_{k−1}i_k} p_{i_kj₀}^{(n)} p_{j₀j₁} ⋯ p_{j_{l−1}j_l},

which differs from P[Y₁ = i₁, ..., Y_k = i_k] · P[Y_{k+n} = j₀, ..., Y_{k+n+l} = j_l] by at most p_{i₁} p_{i₁i₂} ⋯ p_{i_{k−1}i_k} ρⁿ p_{j₀j₁} ⋯ p_{j_{l−1}j_l}. It follows by addition that, if s is the number of states, then for sets of the form A = [(Y₁, ..., Y_k) ∈ H] and B = [(Y_{k+n}, ..., Y_{k+n+l}) ∈ H′], (27.19) holds with α_n = sρⁿ. These sets (for k and n fixed) form fields generating σ-fields which contain σ(X₁, ..., X_k) and σ(X_{k+n}, X_{k+n+1}, ...), respectively. For fixed A the set of B satisfying (27.19) is a monotone class, and similarly if A and B are interchanged. It follows by the monotone class theorem (Theorem 3.4) that {X_n} is α-mixing with α_n = sρⁿ. ∎

The sequence is m-dependent if (X₁, ..., X_k) and (X_{k+n}, ..., X_{k+n+l}) are independent whenever n > m. In this case the sequence is α-mixing with α_n = 0 for n > m. In this terminology an independent sequence is 0-dependent.

*This topic may be omitted.
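The geometric decay |p_ij^{(n)} − p_j| ≤ ρⁿ that drives Example 27.6 can be watched in a toy chain; the transition matrix below is my own choice, not from the text.

```python
# Toy illustration (chain is hypothetical): for a finite chain with positive
# transition probabilities, the n-step probabilities approach the stationary
# ones geometrically, which is what yields alpha_n = s * rho^n in Example 27.6.
P = [[0.7, 0.3],
     [0.4, 0.6]]          # hypothetical 2-state transition matrix
pi = [4 / 7, 3 / 7]       # its stationary distribution: pi P = pi

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

Pn, errs = P, []
for _ in range(10):
    errs.append(max(abs(Pn[i][j] - pi[j]) for i in range(2) for j in range(2)))
    Pn = matmul(Pn, P)
print([round(e, 6) for e in errs[:4]])
# successive errors shrink by the factor rho = 0.3, the second eigenvalue of P
```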
Example 27.7. Let Y₁, Y₂, ... be independent and identically distributed, and put X_n = f(Y_n, ..., Y_{n+m}) for a real function f on R^{m+1}. Then {X_n} is stationary and m-dependent. ∎

Theorem 27.4. Suppose that X₁, X₂, ... is stationary and α-mixing with α_n = O(n⁻⁵) and that E[X_n] = 0 and E[X_n^{12}] < ∞. If S_n = X₁ + ··· + X_n, then

(27.20)   n⁻¹ Var[S_n] → σ² = E[X₁²] + 2 Σ_{k=1}^{∞} E[X₁ X_{1+k}],

where the series converges absolutely. If σ > 0, then S_n/(σ√n) ⇒ N.
The conditions α_n = O(n⁻⁵) and E[X_n^{12}] < ∞ are stronger than necessary; they are imposed to avoid technical complications in the proof. The idea of the proof, which goes back to Markov, is this: Split the sum X₁ + ··· + X_n into alternating blocks of length b_n (the big blocks) and l_n (the little blocks). Namely, let

(27.21)   U_{ni} = X_{(i−1)(b_n+l_n)+1} + ··· + X_{(i−1)(b_n+l_n)+b_n},   1 ≤ i ≤ r_n,

where r_n is the largest integer i for which (i − 1)(b_n + l_n) + b_n ≤ n. Further, let

(27.22)   V_{ni} = X_{(i−1)(b_n+l_n)+b_n+1} + ··· + X_{i(b_n+l_n)},   1 ≤ i < r_n,
          V_{nr_n} = X_{(r_n−1)(b_n+l_n)+b_n+1} + ··· + X_n.

Then S_n = Σ_{i=1}^{r_n} U_{ni} + Σ_{i=1}^{r_n} V_{ni}, and the technique will be to choose the l_n small enough that Σ_i V_{ni} is small in comparison with Σ_i U_{ni} but large enough
that the U_{ni} are nearly independent, so that Lyapounov's theorem can be adapted to prove Σ_i U_{ni} asymptotically normal.
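The bookkeeping in (27.21) and (27.22) can be checked mechanically; the block sizes below are arbitrary choices of mine.

```python
# Index check (mine) of the block decomposition (27.21)-(27.22): the big
# blocks U_{ni} and little blocks V_{ni} together use each index 1..n once.
def blocks(n, b, l):
    r = max(i for i in range(1, n + 1) if (i - 1) * (b + l) + b <= n)
    U = [list(range((i - 1) * (b + l) + 1, (i - 1) * (b + l) + b + 1))
         for i in range(1, r + 1)]
    V = [list(range((i - 1) * (b + l) + b + 1, i * (b + l) + 1))
         for i in range(1, r)]
    V.append(list(range((r - 1) * (b + l) + b + 1, n + 1)))  # last, ragged block
    return U, V

U, V = blocks(n=100, b=20, l=5)
flat = sorted(sum(U, []) + sum(V, []))
assert flat == list(range(1, 101))
print(len(U), "big blocks;", "little block lengths:", [len(v) for v in V])
```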
Lemma 2. If Y is measurable σ(X₁, ..., X_k) and bounded by C, and if Z is measurable σ(X_{k+n}, X_{k+n+1}, ...) and bounded by D, then

(27.23)   |E[YZ] − E[Y]E[Z]| ≤ 4CDα_n.
PROOF. It is no restriction to take C = D = 1 and (by the usual approximation method) to take Y = Σ_i y_i I_{A_i} and Z = Σ_j z_j I_{B_j} simple (|y_i|, |z_j| ≤ 1). If d_{ij} = P(A_i ∩ B_j) − P(A_i)P(B_j), the left side of (27.23) is |Σ_{ij} y_i z_j d_{ij}|. Take ξ_i to be +1 or −1 as Σ_j z_j d_{ij} is positive or not; now take η_j to be +1 or −1 as Σ_i ξ_i d_{ij} is positive or not. Then

|Σ_{ij} y_i z_j d_{ij}| ≤ Σ_i |Σ_j z_j d_{ij}| = Σ_{ij} ξ_i z_j d_{ij} ≤ Σ_j |Σ_i ξ_i d_{ij}| = Σ_{ij} ξ_i η_j d_{ij},

and the last sum is E[ΞH] − E[Ξ]E[H] for the simple random variables Ξ = Σ_i ξ_i I_{A_i} and H = Σ_j η_j I_{B_j}. Splitting each of Ξ and H according to the sign of its coefficients expresses this as a sum of four terms of the form ±(P(A ∩ B) − P(A)P(B)) with A ∈ σ(X₁, ..., X_k) and B ∈ σ(X_{k+n}, X_{k+n+1}, ...), each at most α_n in modulus by (27.19). ∎
PROBLEMS

27.4. Suppose that the X_k are independent and uniformly bounded, with mean 0, and that s_n → ∞. Verify Lyapounov's condition.
27.5. Suppose that the random variables in any single row of the triangular array are identically distributed. To what do Lindeberg's and Lyapounov's conditions reduce?
27.6. Suppose that Z₁, Z₂, ... are independent and identically distributed with mean 0 and variance 1, and suppose that X_{nk} = σ_{nk} Z_k. Write down the Lindeberg condition and show that it holds if max_{k≤r_n} σ²_{nk} = o(Σ_{k=1}^{r_n} σ²_{nk}).

27.7. Construct an example where Lindeberg's condition holds but Lyapounov's does not.
27.8. 22.9 ↑ Prove a central limit theorem for the number R_n of records up to time n.

27.9. 6.3 ↑ Let S_n be the number of inversions in a random permutation on n letters. Prove a central limit theorem for S_n.

27.10. The δ-method. Suppose that Theorem 27.1 applies to {X_n}, so that √n σ⁻¹(X̄_n − c) ⇒ N, where X̄_n = n⁻¹ Σ_{k=1}^{n} X_k. Use Theorem 25.6 as in Example 27.2 to show that, if f(x) has a nonzero derivative at c, then √n (f(X̄_n) − f(c)) ⇒ σ f′(c) N.
27.17. ↑ Suppose that X₁, X₂, ... are independent and identically distributed with mean 0 and variance 1, and suppose that a_n → ∞. Formally combine the central limit theorem and (27.28) to obtain

(27.29)   P[S_n ≥ a_n √n] = e^{−a_n²(1+ζ_n)/2},

where ζ_n → 0 if a_n → ∞. For a case in which this does hold, see Theorem 9.4.

27.18. 21.2 ↑ Stirling's formula. Let S_n = X₁ + ··· + X_n, where the X_n are independent and each has the Poisson distribution with parameter 1. Prove successively:

(a)   E[((S_n − n)/√n)⁻] = e^{−n} Σ_{k=0}^{n} ((n − k)/√n) nᵏ/k! = n^{n+1/2} e^{−n}/n!.

(b)   ((S_n − n)/√n)⁻ ⇒ N⁻.

(c)   E[((S_n − n)/√n)⁻] → E[N⁻] = (2π)^{−1/2}.

(d)   n! ∼ √(2π) n^{n+1/2} e^{−n}.
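Step (a) and the limit in (c)-(d) can be verified numerically; this script is my own check, not part of the problem.

```python
import math

# Check (mine) of 27.18(a): e^{-n} * sum_{k=0}^{n} ((n-k)/sqrt(n)) * n^k/k!
# equals n^{n+1/2} e^{-n}/n!, which by (b)-(c) tends to E[N^-] = 1/sqrt(2 pi).
vals = []
for n in (5, 20, 100):
    lhs = math.exp(-n) * sum((n - k) / math.sqrt(n) * n ** k / math.factorial(k)
                             for k in range(n + 1))
    rhs = n ** (n + 0.5) * math.exp(-n) / math.factorial(n)
    assert abs(lhs - rhs) < 1e-9 * rhs      # the identity in (a)
    vals.append(rhs)
    print(n, rhs)
print("limit 1/sqrt(2 pi) =", 1 / math.sqrt(2 * math.pi))
```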
27.19. Let l_n(ω) be the length of the run of 0's starting at the nth place in the dyadic expansion of a point ω drawn at random from the unit interval; see Example 4.1.
(a) Show that l₁, l₂, ... is an α-mixing sequence, where α_n = 4/2ⁿ.
(b) Show that Σ_{k=1}^{n} l_k is approximately normally distributed with mean n and variance 6n.

27.20. Prove under the hypotheses of Theorem 27.4 that S_n/n → 0 with probability 1. Hint: Use (27.25).

27.21. 26.1 26.29 ↑ Let X₁, X₂, ... be independent and identically distributed, and suppose that the distribution common to the X_n is supported by [0, 2π] and is not a lattice distribution. Let S_n = X₁ + ··· + X_n, where the sum is reduced modulo 2π. Show that S_n ⇒ U, where U is uniformly distributed over [0, 2π].
SECTION 28. INFINITELY DIVISIBLE DISTRIBUTIONS*
Suppose that Z_λ has the Poisson distribution with parameter λ and that X_{n1}, ..., X_{nn} are independent and P[X_{nk} = 1] = λ/n, P[X_{nk} = 0] = 1 − λ/n. According to Example 25.2, X_{n1} + ··· + X_{nn} ⇒ Z_λ. This contrasts with the central limit theorem, in which the limit law is normal. What is the class of all possible limit laws for independent triangular arrays? A suitably restricted form of this question will be answered here.
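The opening Poisson example can be watched through characteristic functions; this comparison is my illustration, with arbitrary λ and t.

```python
import cmath

# Illustration (mine) of the opening example: the characteristic function of
# X_n1 + ... + X_nn, with P[X_nk = 1] = lam/n, is (1 - p + p e^{it})^n, and it
# approaches exp(lam (e^{it} - 1)), the Poisson(lam) characteristic function.
lam, t = 3.0, 1.2
poisson_cf = cmath.exp(lam * (cmath.exp(1j * t) - 1))
gaps = []
for n in (10, 100, 10_000):
    p = lam / n
    bernoulli_sum_cf = (1 - p + p * cmath.exp(1j * t)) ** n
    gaps.append(abs(bernoulli_sum_cf - poisson_cf))
    print(n, gaps[-1])
```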
Vague Convergence
The theory requires two preliminary facts about convergence of measures. Let μ_n and μ be finite measures on (R¹, ℛ¹). If μ_n(a, b] → μ(a, b] for every finite interval for which μ{a} = μ{b} = 0, then μ_n converges vaguely to μ, written μ_n →_v μ. If μ_n and μ are probability measures, it is not hard to see that this is equivalent to weak convergence μ_n ⇒ μ. On the other hand, if μ_n is a unit mass at n and μ(R¹) = 0, then μ_n →_v μ, but μ_n ⇒ μ makes no sense, because μ is not a probability measure.

The first fact needed is this: Suppose that μ_n →_v μ and

(28.1)   sup_n μ_n(R¹) < ∞;

then

(28.2)   ∫f dμ_n → ∫f dμ

for every continuous real f that vanishes at ±∞ in the sense that lim_{|x|→∞} f(x) = 0. Indeed, choose M so that μ(R¹) ≤ M and μ_n(R¹) ≤ M for all n. Given ε, choose a and b so that μ{a} = μ{b} = 0 and |f(x)| < ε/M if x ∉ A = (a, b]. Then |∫_{A^c} f dμ_n| < ε and |∫_{A^c} f dμ| < ε. If μ(A) > 0, define ν(B) = μ(B ∩ A)/μ(A) and ν_n(B) = μ_n(B ∩ A)/μ_n(A). It is easy to see that ν_n ⇒ ν, so that ∫f dν_n → ∫f dν. But then |∫_A f dμ_n − ∫_A f dμ| < ε for large n, and hence |∫f dμ_n − ∫f dμ| < 3ε for large n. If μ(A) = 0, then ∫_A f dμ_n → 0, and the argument is even simpler.

The other fact needed below is this: If (28.1) holds, then there is a subsequence {μ_{n_k}} and a finite measure μ such that μ_{n_k} →_v μ as k → ∞. Indeed, let F_n(x) = μ_n(−∞, x]. Since the F_n are uniformly bounded because of (28.1), the proof of Helly's theorem shows there exists a subsequence {F_{n_k}} and a bounded, nondecreasing, right-continuous function F such that lim_k F_{n_k}(x) = F(x) at continuity points x of F. If μ is the measure for which μ(a, b] = F(b) − F(a) (Theorem 12.4), then clearly μ_{n_k} →_v μ.
The Possible Limits
Let X_{n1}, ..., X_{nr_n}, n = 1, 2, ..., be a triangular array as in the preceding section. The random variables in each row are independent, the means are 0, and the variances are finite:

(28.3)   σ²_{nk} = E[X²_{nk}] < ∞,   s²_n = Σ_{k=1}^{r_n} σ²_{nk}.

*This section may be omitted.
Assume s²_n > 0 and put S_n = X_{n1} + ··· + X_{nr_n}. Here it will be assumed that the total variance is bounded:

(28.4)   sup_n s²_n < ∞.

In order that the individual X_{nk} be small compared with S_n, assume that

(28.5)   lim_n max_{k≤r_n} σ²_{nk} = 0.

The arrays in the preceding section were normalized by replacing X_{nk} by X_{nk}/s_n. This has the effect of replacing s_n by 1, in which case of course (28.4) holds, and (28.5) is the same thing as max_k σ²_{nk}/s²_n → 0.

A distribution function F is infinitely divisible if for each n there is a distribution function F_n such that F is the n-fold convolution F_n * ··· * F_n (n copies) of F_n. The class of possible limit laws will turn out to consist of the infinitely divisible distributions with mean 0 and finite variance.† It will be possible to exhibit the characteristic functions of these laws in an explicit way.

Theorem 28.1. Suppose that

(28.6)   φ(t) = exp ∫_{R¹} (e^{itx} − 1 − itx) x⁻² μ(dx),

where μ is a finite measure. Then φ is the characteristic function of an infinitely divisible distribution with mean 0 and variance μ(R¹).

By (26.4₂), the integrand in (28.6) converges to −t²/2 as x → 0; take this as its value at x = 0. By (26.4₁), the integrand is at most t²/2 in modulus and so is integrable. The formula (28.6) is the canonical representation of φ, and μ is the canonical measure.
Before proceeding to the proof, consider three examples.
Example 28.1. If μ consists of a mass of σ² at the origin, (28.6) is e^{−σ²t²/2}, the characteristic function of a centered normal distribution F. It is certainly infinitely divisible: take F_n normal with variance σ²/n. ∎

†There do exist infinitely divisible distributions without moments (see Problems 28.3 and 28.4), but they do not figure in the theory of this section.
Example 28.2. Suppose that μ consists of a mass of λx² at x ≠ 0. Then (28.6) is exp λ(e^{itx} − 1 − itx); but this is the characteristic function of x(Z_λ − λ), where Z_λ has the Poisson distribution with mean λ. Thus (28.6) is the characteristic function of a distribution function F, and F is infinitely divisible: take F_n to be the distribution function of x(Z_{λ/n} − λ/n). ∎

Example 28.3. If φ_j(t) is given by (28.6) with μ_j for the measure, and if μ = Σ_{j=1}^{k} μ_j, then (28.6) is φ₁(t) ⋯ φ_k(t). It follows by the preceding two examples that (28.6) is a characteristic function if μ consists of finitely many point masses. It is easy to check in the preceding two examples that the distribution corresponding to φ(t) has mean 0 and variance μ(R¹), and since the means and variances add, the same must be true in the present example. ∎
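Example 28.2 invites a direct check: when μ is a single point mass, (28.6) reduces to a compound-Poisson characteristic function. The parameters below are arbitrary choices of mine.

```python
import cmath
import math

# Check (mine) of Example 28.2: with mu a mass of lam*x^2 at x, formula (28.6)
# gives exp(lam (e^{itx} - 1 - itx)), the characteristic function of
# x (Z_lam - lam) with Z_lam Poisson of mean lam.
lam, x, t = 2.0, 0.7, 1.3
canonical = cmath.exp(lam * (cmath.exp(1j * t * x) - 1 - 1j * t * x))
direct = sum(cmath.exp(1j * t * x * (k - lam)) * math.exp(-lam) * lam ** k / math.factorial(k)
             for k in range(60))    # Poisson expectation, negligible tail truncated
err = abs(canonical - direct)
print(f"difference: {err:.2e}")
```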
PROOF OF THEOREM 28.1. Let μ_k have mass μ(j2⁻ᵏ, (j+1)2⁻ᵏ] at j2⁻ᵏ for j = 0, ±1, ..., ±2²ᵏ. Then μ_k →_v μ. As observed in Example 28.3, if φ_k(t) is (28.6) with μ_k in place of μ, then φ_k is a characteristic function. For each t the integrand in (28.6) vanishes at ±∞; since sup_k μ_k(R¹) < ∞, φ_k(t) → φ(t) follows (see (28.2)). By Corollary 2 to Theorem 26.3, φ(t) is itself a characteristic function. Further, the distribution corresponding to φ_k(t) has second moment μ_k(R¹), and since this is bounded, it follows (Theorem 25.11) that the distribution corresponding to φ(t) has a finite second moment. Differentiation (use Theorem 16.8) shows that the mean is φ′(0) = 0 and the variance is −φ″(0) = μ(R¹). Thus (28.6) is always the characteristic function of a distribution with mean 0 and variance μ(R¹). If ψ_n(t) is (28.6) with μ/n in place of μ, then φ(t) = ψ_nⁿ(t), so that the distribution corresponding to φ(t) is indeed infinitely divisible. ∎
The representation (28.6) shows that the normal and Poisson distributions are special cases in a very large class of infinitely divisible laws.
Theorem 28.2. Every infinitely divisible distribution with mean 0 and finite variance is the limit law of S_n for some independent triangular array satisfying (28.3), (28.4), and (28.5).

The proof requires this preliminary result:

Lemma. If X and Y are independent and X + Y has a second moment, then X and Y have second moments as well.

PROOF.
Since X² + Y² ≤ (X + Y)² + 2|XY|, it suffices to prove |XY| integrable, and by Fubini's theorem applied to the joint distribution of X and Y it suffices to prove |X| and |Y| individually integrable. Since |Y| ≤ |x| + |x + Y|, E[|Y|] = ∞ would imply E[|x + Y|] = ∞ for each x; by Fubini's theorem again E[|Y|] = ∞ would therefore imply E[|X + Y|] = ∞, which is impossible. Hence E[|Y|] < ∞, and similarly E[|X|] < ∞. ∎

PROOF OF THEOREM 28.2. Let F be infinitely divisible with mean 0 and variance σ². If F is the n-fold convolution of F_n, then by the lemma (extended inductively) F_n has finite mean and variance, and these must be 0 and σ²/n. Take r_n = n and take X_{n1}, ..., X_{nn} independent, each with distribution function F_n. ∎
Theorem 28.3. If F is the limit law of S_n for an independent triangular array satisfying (28.3), (28.4), and (28.5), then F has characteristic function of the form (28.6) for some finite measure μ.
PROOF. The proof will yield information making it possible to identify the limit. Let φ_{nk}(t) be the characteristic function of X_{nk}. The first step is to prove that

(28.7)   Π_{k=1}^{r_n} φ_{nk}(t) − exp Σ_{k=1}^{r_n} (φ_{nk}(t) − 1) → 0

for each t. Since |z| ≤ 1 implies that |e^{z−1}| = e^{Re z − 1} ≤ 1, it follows by (27.5) that the difference δ_n(t) in (28.7) satisfies |δ_n(t)| ≤ Σ_{k=1}^{r_n} |φ_{nk}(t) − exp(φ_{nk}(t) − 1)|. Fix t. If φ_{nk}(t) − 1 = θ_{nk}, then |θ_{nk}| ≤ t²σ²_{nk}/2, and it follows by (28.4) and (28.5) that max_k |θ_{nk}| → 0 and Σ_k |θ_{nk}| = O(1). Therefore, for sufficiently large n, |δ_n(t)| ≤ Σ_k |1 + θ_{nk} − e^{θ_{nk}}| ≤ e² Σ_k |θ_{nk}|² ≤ e² max_k |θ_{nk}| · Σ_k |θ_{nk}| by (27.15). Hence (28.7).

If F_{nk} is the distribution function of X_{nk}, then, since the means are 0,

Σ_{k=1}^{r_n} (φ_{nk}(t) − 1) = Σ_{k=1}^{r_n} ∫_{R¹} (e^{itx} − 1) dF_{nk}(x) = Σ_{k=1}^{r_n} ∫_{R¹} (e^{itx} − 1 − itx) dF_{nk}(x).

Let μ_n be the finite measure satisfying

(28.8)   μ_n(−∞, x] = Σ_{k=1}^{r_n} ∫_{y≤x} y² dF_{nk}(y),

and put

(28.9)   φ_n(t) = exp ∫_{R¹} (e^{itx} − 1 − itx) x⁻² μ_n(dx).

Then (28.7) can be written

(28.10)   Π_{k=1}^{r_n} φ_{nk}(t) − φ_n(t) → 0.

By (28.8), μ_n(R¹) = s²_n, and this is bounded by assumption. Thus (28.1) holds, and some subsequence {μ_{n_k}} converges vaguely to a finite measure μ. Since the integrand in (28.9) vanishes at ±∞, φ_{n_k}(t) converges to (28.6). But, of course, lim_n φ_n(t) must coincide with the characteristic function of the limit law F, which exists by hypothesis. Thus F must have characteristic function of the form (28.6). ∎

Theorems 28.1, 28.2, and 28.3 together show that the possible limit laws are exactly the infinitely divisible distributions with mean 0 and finite variance, and they give explicitly the form the characteristic functions of such laws must have.

Characterizing the Limit
Theorem 28.4. Suppose that F has characteristic function (28.6) and that an independent triangular array satisfies (28.3), (28.4), and (28.5). Then S_n has limit law F if and only if μ_n →_v μ, where μ_n is defined by (28.8).

PROOF. Since (28.7) holds as before, S_n has limit law F if and only if φ_n(t) (defined by (28.9)) converges for each t to φ(t) (defined by (28.6)). If μ_n →_v μ, then φ_n(t) → φ(t) follows because the integrands in (28.9) and (28.6) vanish at ±∞ and because (28.1) follows from (28.4). Now suppose that S_n has limit law F.
PROBLEMS

28.10. A distribution function F is stable if for all a, a′ > 0 and b, b′ there exist a″ > 0 and b″ such that

F((x − b)/a) * F((x − b′)/a′) = F((x − b″)/a″).
28.11. Show that a stable law is infinitely divisible.

28.12. Show that the Poisson law, although infinitely divisible, is not stable.

28.13. Show that the normal and Cauchy laws are stable.

28.14. 28.10 ↑ Suppose that F has mean 0 and variance 1 and that the dependence of a″, b″ on a, a′, b, b′ is such that

F(x/σ₁) * F(x/σ₂) = F(x/√(σ₁² + σ₂²)).

Show that F is the standard normal distribution.
28.15. (a) Let Y_{nk} be independent random variables having the Poisson distribution with mean c n^α/|k|^{1+α}, where c > 0 and 0 < α < 2. Let Z_n = n⁻¹ Σ_{k=−n²}^{n²} k Y_{nk} (omit k = 0 in the sum), and show that if c is properly chosen then the characteristic function of Z_n converges to e^{−|t|^α}.
(b) Show for 0 < α < 2 that e^{−|t|^α} is the characteristic function of a symmetric stable distribution; it is called the symmetric stable law of exponent α. The case α = 2 is the normal law, and α = 1 is the Cauchy law.
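Part (a) can be probed numerically in the case α = 1, where taking c = 1/π makes the limit e^{−|t|} (the Cauchy case). The truncation and parameters below are my own choices.

```python
import math

# Numeric probe (mine) of 28.15(a) for alpha = 1 and c = 1/pi: the exponent of
# the characteristic function of Z_n, namely
#   sum_{k != 0} (c n / k^2) (e^{i t k / n} - 1),
# is real by the +-k symmetry and should approach -|t| as n grows.
def exponent(t, n, c=1 / math.pi):
    return sum(2 * c * n / k ** 2 * (math.cos(t * k / n) - 1)
               for k in range(1, n * n + 1))

t = 1.0
for n in (20, 80, 200):
    print(n, exponent(t, n))
gap = abs(exponent(t, 200) - (-t))
print(f"distance to -|t|: {gap:.3f}")
```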
SECTION 29. LIMIT THEOREMS IN R^k

If F_n and F are distribution functions on R^k, then F_n converges weakly to F, written F_n ⇒ F, if lim_n F_n(x) = F(x) for all continuity points x of F. The corresponding distributions μ_n and μ are in this case also said to converge weakly: μ_n ⇒ μ. If X_n and X are k-dimensional random vectors (possibly on different probability spaces), X_n converges in distribution to X, written X_n ⇒ X, if the corresponding distribution functions converge weakly. The definitions are thus exactly as for the line.
The Basic Theorems
The closure A⁻ of a set A in R^k is the set of limits of sequences in A; the interior is A° = R^k − (R^k − A)⁻; and the boundary is ∂A = A⁻ − A°. A Borel set A is a μ-continuity set if μ(∂A) = 0. The first theorem is the k-dimensional version of Theorem 25.8.

Theorem 29.1. For probability measures μ_n and μ on (R^k, ℛ^k), each of the following conditions is equivalent to the weak convergence of μ_n to μ:

(i) lim_n ∫f dμ_n = ∫f dμ for bounded continuous f;
(ii) lim sup_n μ_n(C) ≤ μ(C) for closed C;
(iii) lim inf_n μ_n(G) ≥ μ(G) for open G;
(iv) lim_n μ_n(A) = μ(A) for μ-continuity sets A.
PROOF. It will first be shown that (i) through (iv) are all equivalent.

(i) implies (ii): Consider the distance dist(x, C) = inf[|x − y|: y ∈ C] from x to C. It is continuous in x. Let

φ_j(t) = 1 if t ≤ 0,   φ_j(t) = 1 − jt if 0 ≤ t ≤ j⁻¹,   φ_j(t) = 0 if j⁻¹ ≤ t.

Then f_j(x) = φ_j(dist(x, C)) is continuous and bounded by 1, and f_j(x) ↓ I_C(x) as j ↑ ∞ because C is closed. If (i) holds, then lim sup_n μ_n(C) ≤ lim_n ∫f_j dμ_n = ∫f_j dμ. As j ↑ ∞, ∫f_j dμ ↓ ∫I_C dμ = μ(C).

(ii) is equivalent to (iii): Take C = R^k − G.

(ii) and (iii) imply (iv): From (ii) and (iii) follows

μ(A°) ≤ lim inf_n μ_n(A°) ≤ lim inf_n μ_n(A) ≤ lim sup_n μ_n(A) ≤ lim sup_n μ_n(A⁻) ≤ μ(A⁻).

Clearly (iv) follows from this.

(iv) implies (i): Suppose that f is continuous and |f(x)| is bounded by K. Given ε, choose reals a₀ < a₁ < ··· < a_l so that a₀ < −K < K < a_l, a_i − a_{i−1} < ε, and μ[x: f(x) = a_i] = 0. The last condition can be achieved because the sets [x: f(x) = a] are disjoint for different a. Put A_i = [x: a_{i−1} < f(x) ≤ a_i].

It must be shown that Δ_A F ≥ 0 (see (12.12)). Given ε and a rectangle A = (a₁, b₁] × ··· × (a_k, b_k], choose δ such that if z = (δ, ..., δ), then for each of the 2^k vertices x of A, x < r ≤ x + z implies |F(x) − G(r)| < ε/2^k. Now choose rational points r and s such that a < r ≤ a + z and b < s ≤ b + z. If B = (r₁, s₁] × ··· × (r_k, s_k], then |Δ_A F − Δ_B G| < ε. Since Δ_B G = lim_i Δ_B F_{n_i} ≥ 0 and ε is arbitrary, it follows that Δ_A F ≥ 0.
With the present interpretation of the symbols, the proof of Theorem 25.9 shows that F is continuous from above and lim_i F_{n_i}(x) = F(x) for continuity points x of F.

†The approach of this section carries over to general metric spaces; for this theory and its applications, see BILLINGSLEY¹ and BILLINGSLEY². Since Skorohod's theorem is no easier in R^k than in the general metric space, it is not treated here.
By Theorem 12.5, there is a measure μ on (R^k, ℛ^k) such that μ(A) = Δ_A F for rectangles A. By tightness, there is for given ε a t such that μ_n[y: −t < y_j ≤ t, j ≤ k] > 1 − ε for all n. Suppose that all coordinates of x exceed t: If r > x, then F_{n_i}(r) > 1 − ε and hence (r rational) G(r) ≥ 1 − ε, so that F(x) ≥ 1 − ε. Suppose, on the other hand, that some coordinate of x is less than −t: Choose a rational r such that x < r and some coordinate of r is less than −t; then F_{n_i}(r) < ε, hence G(r) ≤ ε, and so F(x) ≤ ε. Therefore, for every ε there is a t such that

(29.1)    F(x) ≥ 1 − ε if x_j > t for all j;    F(x) ≤ ε if x_j < −t for some j.

If S_x = [y: y_j ≤ x_j, j ≤ k] and B_s = [y: −s < y_j ≤ x_j, j ≤ k], then μ(S_x) = lim_s μ(B_s) = lim_s Δ_{B_s} F. Of the 2^k terms in the sum Δ_{B_s} F, all but F(x) go to 0 (s → ∞) because of the second part of (29.1). Thus μ(S_x) = F(x).† Because of the other part of (29.1), μ is a probability measure. Therefore, F_{n_i} ⇒ F and μ_{n_i} ⇒ μ. •

Obviously Theorem 29.3 implies that tightness is a sufficient condition that each subsequence of {μ_n} contain a further subsequence converging weakly to some probability measure. (An easy modification of the proof of Theorem 25.10 shows that tightness is necessary for this as well.) And clearly the corollary to Theorem 25.10 now goes through:

Corollary. If {μ_n} is a tight sequence of probability measures, and if each subsequence that converges weakly at all converges weakly to the probability measure μ, then μ_n ⇒ μ.
Characteristic Functions
Consider a random vector X = (X_1, ..., X_k) and its distribution μ in R^k. Let t·x = ∑_{u=1}^k t_u x_u denote inner product. The characteristic function of X and of μ is defined over R^k by

(29.2)    φ(t) = E[e^{it·X}] = ∫_{R^k} e^{it·x} μ(dx).

To a great extent its properties parallel those of the one-dimensional characteristic function and can be deduced by parallel arguments.

†This requires proof because there exist (Problem 12.10) functions F′ other than F for which μ(A) = Δ_A F′ holds for all rectangles A.
382   CONVERGENCE OF DISTRIBUTIONS
The inversion formula (26.16) takes this form: For a bounded rectangle A = [x: a_u < x_u ≤ b_u, u ≤ k] such that μ(∂A) = 0,

(29.3)    μ(A) = lim_{T→∞} (1/(2π)^k) ∫_{B_T} ∏_{u=1}^k [(e^{−it_u a_u} − e^{−it_u b_u})/(it_u)] φ(t) dt,

where B_T = [t ∈ R^k: |t_u| ≤ T, u ≤ k] and dt is short for dt_1 ⋯ dt_k. To prove (29.3), replace φ(t) by the middle term in (29.2) and reverse the integrals as in (26.17). The inner integral may be evaluated by Fubini's theorem in R^k, which gives for the integral in (29.3)

    I_T = ∫_{R^k} ∏_{u=1}^k [sgn(x_u − a_u) S(T|x_u − a_u|) − sgn(x_u − b_u) S(T|x_u − b_u|)] μ(dx).

Since the integrand converges to ∏_{u=1}^k ψ_{a_u, b_u}(x_u) (see (26.18)), (29.3) follows as in the case k = 1.

The proof that weak convergence implies (iii) in Theorem 29.1 shows that for probability measures μ and ν on R^k there exists a dense set D of reals such that μ(∂A) = ν(∂A) = 0 for all rectangles A whose vertices have coordinates in D. If μ(A) = ν(A) for such rectangles, then μ and ν are identical by Theorem 3.3. Thus the characteristic function φ uniquely determines the probability measure μ.

Further properties of the characteristic function can be derived from the one-dimensional case by means of the following device of Cramér and Wold. For t ∈ R^k, define h_t: R^k → R^1 by h_t(x) = t·x. For real α, [x: t·x ≤ α] is a half-space, and its μ-measure is

(29.4)    μh_t^{−1}(−∞, α].

By change of variable, the characteristic function of μh_t^{−1} is

(29.5)    ∫_{R^1} e^{isy} μh_t^{−1}(dy) = ∫_{R^k} e^{is(t·x)} μ(dx) = φ(st_1, ..., st_k).

To know the μ-measure of every half-space is (by (29.4)) to know each μh_t^{−1}, and hence is (by (29.5) for s = 1) to know φ(t) for every t; and to know the
characteristic function φ of μ is to know μ. Thus μ is uniquely determined by the values it gives to the half-spaces. This result, very simple in its statement, seems to require Fourier methods; no elementary proof is known.

If μ_n ⇒ μ for probability measures on R^k, then φ_n(t) → φ(t) for the corresponding characteristic functions by Theorem 29.1. But suppose that the characteristic functions converge pointwise. It follows by (29.5) that for each t the characteristic function of μ_n h_t^{−1} converges pointwise on the line to the characteristic function of μh_t^{−1}; by the continuity theorem for characteristic functions on the line, then, μ_n h_t^{−1} ⇒ μh_t^{−1}. Take the u-th component of t to be 1 and the others 0; then the μ_n h_t^{−1} are the marginals for the u-th coordinate. Since {μ_n h_t^{−1}} is weakly convergent, there is a bounded interval (a_u, b_u] such that μ_n[x ∈ R^k: a_u < x_u ≤ b_u] = μ_n h_t^{−1}(a_u, b_u] > 1 − ε/k for all n. But then μ_n(A) > 1 − ε for the bounded rectangle A = [x: a_u < x_u ≤ b_u, u = 1, ..., k]. The sequence {μ_n} is therefore tight. If a subsequence {μ_{n_i}} converges weakly to ν, then φ_{n_i}(t) converges to the characteristic function of ν, which is therefore φ(t). By uniqueness, ν = μ, so that μ_{n_i} ⇒ μ. By the corollary to Theorem 29.3, μ_n ⇒ μ. This proves the continuity theorem for k-dimensional characteristic functions: μ_n ⇒ μ if and only if φ_n(t) → φ(t) for all t.
The Cramer-Wold idea leads also to the following result, by means of which certain limit theorems can be reduced in a routine way to the one-dimensional case.
Theorem 29.4. For random vectors X_n = (X_{n1}, ..., X_{nk}) and Y = (Y_1, ..., Y_k), a necessary and sufficient condition for X_n ⇒ Y is that ∑_{u=1}^k t_u X_{nu} ⇒ ∑_{u=1}^k t_u Y_u for each (t_1, ..., t_k) in R^k.

PROOF. The necessity follows from a consideration of the continuous mapping h_t above; use Theorem 29.2. As for sufficiency, the condition implies by the continuity theorem for one-dimensional characteristic functions that

    E[exp(is ∑_{u=1}^k t_u X_{nu})] → E[exp(is ∑_{u=1}^k t_u Y_u)]

for each (t_1, ..., t_k) and all real s. Taking s = 1 shows that the characteristic function of X_n converges pointwise to that of Y. •

Normal Distributions in R^k
By Theorem 20.4 there is (on some probability space) a random vector X = (X_1, ..., X_k) with independent components each having the standard normal distribution. Since each X_u has density e^{−x²/2}/√(2π), X has density (see (20.25))

(29.6)    f(x) = (2π)^{−k/2} e^{−|x|²/2},

where |x|² = ∑_{u=1}^k x_u² is the square of the Euclidean norm. This distribution plays the role of the standard normal distribution in R^k. Its characteristic function is

(29.7)    E[e^{it·X}] = ∏_{u=1}^k e^{−t_u²/2} = e^{−|t|²/2}.

Let A = [a_{uv}] be a k × k matrix, and put Y = AX, where X is viewed as a column vector. Since E[X_α X_β] = δ_{αβ}, the matrix Σ = [σ_{uv}] of the covariances of Y has entries σ_{uv} = E[Y_u Y_v] = ∑_{α=1}^k a_{uα} a_{vα}. Thus Σ = AA′, where the prime denotes transpose. The matrix Σ is symmetric and nonnegative definite: ∑_{u,v} σ_{uv} x_u x_v = |A′x|² ≥ 0. View t also as a column vector with transpose t′, and note that t·x = t′x. The characteristic function of AX is thus

(29.8)    E[e^{it′(AX)}] = E[e^{i(A′t)′X}] = e^{−|A′t|²/2} = e^{−t′Σt/2}.

Define a centered normal distribution as any probability measure whose characteristic function has this form for some symmetric nonnegative definite Σ. If Σ is symmetric and nonnegative definite, then for an appropriate orthogonal matrix U, U′ΣU = D is a diagonal matrix whose diagonal elements are the eigenvalues of Σ and hence are nonnegative. If D_0 is the diagonal matrix whose elements are the square roots of those of D, and if A = UD_0, then Σ = AA′. Thus for every nonnegative definite Σ there exists a centered normal distribution (namely the distribution of AX) with covariance matrix Σ and characteristic function exp(−½t′Σt).

If Σ is nonsingular, so is the A just constructed. Since X has density (29.6), Y = AX has, by the Jacobian transformation formula (20.20), density f(A^{−1}x)|det A^{−1}|. From Σ = AA′ follows |det A^{−1}| = (det Σ)^{−1/2}. Moreover, Σ^{−1} = (A′)^{−1}A^{−1}, so that |A^{−1}x|² = x′Σ^{−1}x. Thus the centered normal distribution has density (2π)^{−k/2}(det Σ)^{−1/2} exp(−½x′Σ^{−1}x) if Σ is nonsingular. If Σ is singular, the A constructed above must be singular as well, so that AX is confined to some hyperplane of dimension k − 1 and the distribution can have no density. By (29.8) and the uniqueness theorem for characteristic functions in R^k, a centered normal distribution is completely determined by its covariance matrix.

Suppose the off-diagonal elements of Σ are 0, and let A be the diagonal matrix with the σ_{uu}^{1/2} along the diagonal. Then Σ = AA′, and if X has the standard normal distribution, the components X_i are independent and hence so are the components σ_{ii}^{1/2}X_i of AX. Therefore, the components of a normally distributed random vector are independent if and only if they are uncorrelated.

If M is a j × k matrix and Y has in R^k the centered normal distribution with covariance matrix Σ, then MY has in R^j the characteristic function exp(−½(M′t)′Σ(M′t)) = exp(−½t′(MΣM′)t) (t ∈ R^j). Hence MY has the centered normal distribution in R^j with covariance matrix MΣM′. Thus a linear transformation of a normal distribution is itself normal.
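The factorization Σ = AA′ is easy to check numerically. The sketch below (pure Python, with a hypothetical 2 × 2 covariance matrix) takes A to be the Cholesky factor of Σ rather than the UD_0 of the text; any A with AA′ = Σ yields the same centered normal distribution, since by (29.8) the characteristic function depends on A only through AA′.

```python
import random
import math

random.seed(7)

# Hypothetical symmetric, positive definite covariance matrix.
sigma = [[2.0, 0.6],
         [0.6, 1.0]]

# Cholesky factor A (lower triangular) with A A' = sigma.
a11 = math.sqrt(sigma[0][0])
a21 = sigma[1][0] / a11
a22 = math.sqrt(sigma[1][1] - a21 * a21)

def sample_y():
    # X has independent standard normal components; Y = A X.
    x1, x2 = random.gauss(0, 1), random.gauss(0, 1)
    return a11 * x1, a21 * x1 + a22 * x2

n = 20000
ys = [sample_y() for _ in range(n)]
cov = [[sum(y[i] * y[j] for y in ys) / n for j in range(2)]
       for i in range(2)]
print(cov)  # empirical covariance, close to sigma
```

The empirical covariance of the simulated Y recovers Σ up to Monte Carlo error, illustrating that the centered normal law is determined by Σ alone.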
These normal distributions are special in that all the first moments vanish. The general normal distribution is a translation of one of these centered distributions. It is completely determined by its means and covariances.

The Central Limit Theorem
Let X_n = (X_{n1}, ..., X_{nk}) be independent random vectors all having the same distribution. Suppose that E[X_{nu}²] < ∞; let the vector of means be c = (c_1, ..., c_k), where c_u = E[X_{nu}], and let the covariance matrix be Σ = [σ_{uv}], where σ_{uv} = E[(X_{nu} − c_u)(X_{nv} − c_v)]. Put S_n = X_1 + ⋯ + X_n.

Theorem 29.5. Under these assumptions, the distribution of the random vector (S_n − nc)/√n converges weakly to the centered normal distribution with covariance matrix Σ.
PROOF. Let Y = (Y_1, ..., Y_k) be a normally distributed random vector with 0 means and covariance matrix Σ. For given t = (t_1, ..., t_k), let Z_n = ∑_{u=1}^k t_u(X_{nu} − c_u) and Z = ∑_{u=1}^k t_u Y_u. By Theorem 29.4, it suffices to prove that n^{−1/2}∑_{i=1}^n Z_i ⇒ Z (for arbitrary t). But this is an instant consequence of the Lindeberg–Lévy theorem (Theorem 27.1). •
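A simulation sketch of Theorem 29.5 in the spirit of the Cramér–Wold reduction (all concrete choices hypothetical): the two coordinates of each vector are strongly dependent, yet the one-dimensional projection t·(S_n − nc)/√n behaves like a centered normal variable whose variance is the exact variance of t·(X − c).

```python
import random
import math

random.seed(1)

def draw_x():
    # Hypothetical iid vectors in R^2 with dependent coordinates:
    # U uniform on [0, 1], X = (U, U^2).
    u = random.random()
    return u, u * u

c = (0.5, 1.0 / 3.0)   # exact means E[U], E[U^2]
t = (1.0, -2.0)        # fixed projection direction

# exact variance of t.(X - c) = U - 2U^2 + 1/6:
# E[(U - 2U^2)^2] = 1/3 - 1 + 4/5 = 2/15, E[U - 2U^2] = -1/6
var_z = 2.0 / 15.0 - (1.0 / 6.0) ** 2

n, reps = 400, 2000
samples = []
for _ in range(reps):
    s0 = s1 = 0.0
    for _ in range(n):
        x0, x1 = draw_x()
        s0 += x0 - c[0]
        s1 += x1 - c[1]
    # projection of (S_n - n c)/sqrt(n) on t
    samples.append((t[0] * s0 + t[1] * s1) / math.sqrt(n))

frac_neg = sum(1 for z in samples if z <= 0.0) / reps  # ~ 1/2 in the limit
emp_var = sum(z * z for z in samples) / reps           # ~ t' Sigma t
print(frac_neg, emp_var)
```

The empirical variance of the projected sums matches t′Σt, and about half the samples fall below 0, as the limiting centered normal law requires.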
PROBLEMS
29.1. A real function f on R^k is everywhere upper semicontinuous (see Problem 13.8) if for each x and ε there is a δ such that |x − y| < δ implies that f(y) < f(x) + ε; f is lower semicontinuous if −f is upper semicontinuous.
(a) Use condition (iii) of Theorem 29.1, Fatou's lemma, and (21.9) to show that, if μ_n ⇒ μ and f is bounded and lower semicontinuous, then

(29.9)    lim inf_n ∫ f dμ_n ≥ ∫ f dμ.

(b) Show that, if (29.9) holds for all bounded, lower semicontinuous functions f, then μ_n ⇒ μ.
(c) Prove the analogous results for upper semicontinuous functions.
29.2. (a) Show for probability measures on the line that μ_n × ν_n ⇒ μ × ν if and only if μ_n ⇒ μ and ν_n ⇒ ν.
(b) Suppose that X_n and Y_n are independent and that X and Y are independent. Show that, if X_n ⇒ X and Y_n ⇒ Y, then (X_n, Y_n) ⇒ (X, Y) and hence that X_n + Y_n ⇒ X + Y.
(c) Show that part (b) fails without independence.
(d) If F_n ⇒ F and G_n ⇒ G, then F_n * G_n ⇒ F * G. Prove this by part (b) and also by characteristic functions.

29.3. (a) Show that {μ_n} is tight if and only if for each ε there is a compact set K such that μ_n(K) > 1 − ε for all n.
(b) Show that {μ_n} is tight if and only if each of the k sequences of marginal distributions is tight on the line.

29.4. Assume of (X_n, Y_n) that X_n ⇒ X and Y_n ⇒ c. Show that (X_n, Y_n) ⇒ (X, c). This is an example of Problem 29.2(b) where X_n and Y_n need not be assumed independent.

29.5. Prove analogues for R^k of the corollaries to Theorem 26.3.

29.6. Suppose that f(X) and g(Y) are uncorrelated for all bounded continuous f and g. Show that X and Y are independent. Hint: Use characteristic functions.

29.7. 20.16 ↑ Suppose that the random vector X has a centered k-dimensional normal distribution whose covariance matrix has 1 as an eigenvalue of multiplicity r and 0 as an eigenvalue of multiplicity k − r. Show that |X|² has the chi-squared distribution with r degrees of freedom.

29.8. Multinomial sampling. Let p_1, ..., p_k be positive and add to 1, and let Z_1, Z_2, ... be independent k-dimensional random vectors such that Z_n has with probability p_i a 1 in the i-th component and 0's elsewhere. Then f_n = (f_{n1}, ..., f_{nk}) = ∑_{m=1}^n Z_m is the frequency count for a sample of size n from a multinomial population with cell probabilities p_i. Put X_{ni} = (f_{ni} − np_i)/√(np_i) and X_n = (X_{n1}, ..., X_{nk}).
(a) Show that X_n has mean values 0 and covariances σ_{ij} = (δ_{ij}p_j − p_i p_j)/√(p_i p_j).
(b) Show that the chi-squared statistic ∑_{i=1}^k (f_{ni} − np_i)²/(np_i) has asymptotically the chi-squared distribution with k − 1 degrees of freedom.

29.9. A theorem of Poincaré. (a) Suppose that X_n = (X_{n1}, ..., X_{nn}) is uniformly distributed over the surface of the sphere of radius √n in R^n. Fix r, and show that X_{n1}, ..., X_{nr} are in the limit independent, each with the standard normal distribution. Hint: If the components of Y_n = (Y_{n1}, ..., Y_{nn}) are independent, each with the standard normal distribution, then X_n has the same distribution as √n Y_n/|Y_n|.
(b) Suppose that the distribution of X_n = (X_{n1}, ..., X_{nn}) is spherically symmetric in the sense that X_n/|X_n| is uniformly distributed over the unit sphere. Assume that |X_n|²/n ⇒ 1, and show that X_{n1}, ..., X_{nr} are asymptotically independent and normal.
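A quick simulation sketch for the multinomial chi-squared statistic of Problem 29.8 (hypothetical cell probabilities): the statistic has exact expected value ∑_i(1 − p_i) = k − 1, which the empirical mean should approximate even at moderate n.

```python
import random

random.seed(3)

p = [0.2, 0.3, 0.5]   # hypothetical cell probabilities
k, n, reps = len(p), 500, 2000

def chi_sq_stat():
    # one multinomial sample of size n, then the chi-squared statistic
    f = [0] * k
    for _ in range(n):
        u, i = random.random(), 0
        acc = p[0]
        while u > acc:
            i += 1
            acc += p[i]
        f[i] += 1
    return sum((f[i] - n * p[i]) ** 2 / (n * p[i]) for i in range(k))

stats = [chi_sq_stat() for _ in range(reps)]
mean = sum(stats) / reps
print(mean)  # expected value is exactly k - 1 = 2
```

For k = 3 the empirical mean settles near 2, consistent with a limiting chi-squared distribution with k − 1 degrees of freedom.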
29.10. Let X_n = (X_{n1}, ..., X_{nk}) be random vectors satisfying the mixing condition (27.19) with α_n = O(n^{−5}). Suppose that the sequence is stationary (the distribution of (X_n, X_{n+1}, ..., X_{n+j}) is the same for all n), that E[X_{nu}] = 0, and that the X_{nu} are uniformly bounded. Show that if S_n = X_1 + ⋯ + X_n, then S_n/√n has in the limit the centered normal distribution with covariances

    σ_{uv} = E[X_{1u}X_{1v}] + ∑_{j=1}^∞ E[X_{1u}X_{1+j,v}] + ∑_{j=1}^∞ E[X_{1+j,u}X_{1v}].

Hint: Use the Cramér–Wold device.

29.11. As in Example 27.6, let {Y_n} be a Markov chain with finite state space S = {1, ..., s}, say. Suppose the transition probabilities p_{uv} are all positive and the initial probabilities p_u are the stationary ones. Let f_{nu} be the number of i ≤ n for which Y_i = u. Show that the normalized frequency count (f_{n1} − np_1, ..., f_{ns} − np_s)/√n has in the limit a centered normal distribution.
Suppose that r_α ≥ 2 for each α and r_α > 2 for some α. Then r > 2u, and since |E[X_{ni}^{r_α}]| ≤ M_n^{r_α−2}σ_{ni}², it follows that A_n(r_1, ..., r_u) ≤ (M_n/s_n)^{r−2u}A_n(2, ..., 2). But this goes to 0 because (30.5) holds and because A_n(2, ..., 2) is bounded by 1 (it increases to 1 if the sum in (30.8) is enlarged to include all the u-tuples (i_1, ..., i_u)).

It remains only to check (30.9) for r_1 = ⋯ = r_u = 2. As just noted, A_n(2, ..., 2) is at most 1, and it differs from 1 by ∑ s_n^{−2u}σ_{ni_1}² ⋯ σ_{ni_u}², the sum extending over the (i_1, ..., i_u) with at least one repeated index. Since σ_{ni}² ≤ M_n², the terms for example with i_u = i_{u−1} sum to at most M_n²s_n^{−2} ∑ s_n^{−2(u−1)}σ_{ni_1}² ⋯ σ_{ni_{u−1}}² ≤ M_n²s_n^{−2}. Thus 1 − A_n(2, ..., 2) ≤ u²M_n²s_n^{−2} → 0. This proves that the moments (30.7) converge to those of the normal distribution and hence that S_n/s_n ⇒ N. •
Application to Sampling Theory

Suppose that n numbers x_{n1}, ..., x_{nn}, not necessarily distinct, are associated with the elements of a population of size n. Suppose that these numbers are normalized by the requirement

(30.10)    ∑_{h=1}^n x_{nh} = 0,    ∑_{h=1}^n x_{nh}² = 1,

and put M_n = max_{h≤n} |x_{nh}|. An ordered sample X_{n1}, ..., X_{nk_n} is taken, where the sampling is without replacement. By (30.10), E[X_{nk}] = 0 and E[X_{nk}²] = 1/n. Let s_n² = k_n/n be the fraction of the population sampled. If the X_{nk} were independent, which they are not, S_n = X_{n1} + ⋯ + X_{nk_n} would have variance s_n². If k_n is small in comparison with n, the effects of dependence should be small. It will be shown that S_n/s_n ⇒ N if

(30.11)    k_n/n → 0,    M_n/s_n → 0,    k_n → ∞.

Since M_n² ≥ n^{−1} by (30.10), the second condition here in fact implies the third.

The moments again have the form (30.7), but this time E[X_{ni_1}^{r_1} ⋯ X_{ni_u}^{r_u}] cannot be factored as in (30.8). On the other hand, this expected value is by symmetry the same for each of the (k_n)_u = k_n(k_n − 1) ⋯ (k_n − u + 1) choices of the indices i_α in the sum ∑′. Thus

    A_n(r_1, ..., r_u) = (k_n)_u s_n^{−r} E[X_{n1}^{r_1} ⋯ X_{nu}^{r_u}].

The problem again is to prove (30.9).
SECTION 30. THE METHOD OF MOMENTS   393
The proof goes by induction on u. Now A_n(r) = k_n s_n^{−r} n^{−1} ∑_{h=1}^n x_{nh}^r, so that A_n(1) = 0 and A_n(2) = 1. If r ≥ 3, then |x_{nh}^r| ≤ M_n^{r−2}x_{nh}², and so |A_n(r)| ≤ (M_n/s_n)^{r−2} → 0 by (30.11).

Next suppose as induction hypothesis that (30.9) holds with u − 1 in place of u. Since the sampling is without replacement, E[X_{n1}^{r_1} ⋯ X_{nu}^{r_u}] = ∑ x_{nh_1}^{r_1} ⋯ x_{nh_u}^{r_u}/(n)_u, where the summation extends over the u-tuples (h_1, ..., h_u) of distinct integers in the range 1 ≤ h_α ≤ n. In this last sum enlarge the range by requiring of (h_1, h_2, ..., h_u) only that h_2, ..., h_u be distinct, and then compensate by subtracting away the terms where h_1 = h_2, where h_1 = h_3, and so on. The result is

    E[X_{n1}^{r_1} ⋯ X_{nu}^{r_u}] = (n(n)_{u−1}/(n)_u) E[X_{n1}^{r_1}] E[X_{n2}^{r_2} ⋯ X_{nu}^{r_u}]
        − ((n)_{u−1}/(n)_u) ∑_{α=2}^u E[X_{n2}^{r_2} ⋯ X_{nα}^{r_1+r_α} ⋯ X_{nu}^{r_u}].

This takes the place of the factorization made possible in (30.8) by the assumed independence there. It gives

    A_n(r_1, ..., r_u) = (n(k_n − u + 1)/((n − u + 1)k_n)) A_n(r_1) A_n(r_2, ..., r_u)
        − ((k_n − u + 1)/(n − u + 1)) ∑_{α=2}^u A_n(r_2, ..., r_1 + r_α, ..., r_u).

By the induction hypothesis the last sum is bounded, and the factor in front of it goes to 0 by (30.11). As for the first term on the right, the factor in front goes to 1. If r_1 ≠ 2, then A_n(r_1) → 0 and A_n(r_2, ..., r_u) is bounded, and so A_n(r_1, ..., r_u) → 0. The same holds by symmetry if r_α ≠ 2 for some α other than 1. If r_1 = ⋯ = r_u = 2, then A_n(r_1) = 1, and A_n(r_2, ..., r_u) → 1 by the induction hypothesis. Thus (30.9) holds in all cases, and S_n/s_n ⇒ N follows by the method of moments.
Application to Number Theory
Let g(m) be the number of distinct prime factors of the integer m; for example, g(3⁴ × 5²) = 2. Since there are infinitely many primes, g(m) is unbounded above; for the same reason, it drops back to 1 for infinitely many m (for the primes and their powers). Since g fluctuates in an irregular way, it
is natural to inquire into its average behavior.

On the space Ω of positive integers, let P_n be the probability measure that places mass 1/n at each of 1, 2, ..., n, so that among the first n positive integers the proportion that are contained in a given set A is just P_n(A). The problem is to study P_n[m: g(m) ≤ x] for large n. If δ_p(m) is 1 or 0 according as the prime p divides m or not, then

(30.12)    g(m) = ∑_p δ_p(m).

Probability theory can be used to investigate this sum because under P_n the δ_p(m) behave somewhat like independent random variables. If p_1, ..., p_u are distinct primes, then by the fundamental theorem of arithmetic, δ_{p_1}(m) = ⋯ = δ_{p_u}(m) = 1, that is, each p_i divides m, if and only if the product p_1 ⋯ p_u divides m. The probability under P_n of this is just n^{−1} times the number of m in the range 1 ≤ m ≤ n that are multiples of p_1 ⋯ p_u, and this number is the integer part of n/(p_1 ⋯ p_u). Thus

(30.13)    P_n[m: δ_{p_1}(m) = ⋯ = δ_{p_u}(m) = 1] = (1/n)⌊n/(p_1 ⋯ p_u)⌋

for distinct p_i. Now let X_p be independent random variables (on some probability space, one variable for each prime p) satisfying

    P[X_p = 1] = 1/p,    P[X_p = 0] = 1 − 1/p.

If p_1, ..., p_u are distinct, then

(30.14)    P[X_{p_i} = 1, i = 1, ..., u] = 1/(p_1 ⋯ p_u).

For fixed p_1, ..., p_u, (30.13) converges to (30.14) as n → ∞. Thus the behavior of the X_p can serve as a guide to that of the δ_p(m).

If m ≤ n, (30.12) is ∑_{p≤n} δ_p(m), because no prime exceeding m can divide it. The idea† is to compare this sum with the corresponding sum ∑_{p≤n} X_p. This will require from number theory the elementary estimate‡

(30.15)    ∑_{p≤x} 1/p = log log x + O(1).

The mean and variance of ∑_{p≤n} X_p are ∑_{p≤n} p^{−1} and ∑_{p≤n} p^{−1}(1 − p^{−1}); since ∑_p p^{−2} converges, each of these two sums is asymptotically log log n.

†Compare Problems 2.18, 5.19, and 6.16.
‡See, for example, Problem 18.17, or HARDY & WRIGHT, Chapter XXII.
Comparing ∑_{p≤n} δ_p(m) with ∑_{p≤n} X_p then leads one to conjecture the Erdős–Kac central limit theorem for the prime divisor function:

Theorem 30.3. For all x,

(30.16)    P_n[m: (g(m) − log log n)/√(log log n) ≤ x] → (1/√(2π)) ∫_{−∞}^x e^{−u²/2} du.
PROOF. The argument uses the method of moments. The first step is to show that (30.16) is unaffected if the range of p in (30.12) is further restricted. Let {α_n} be a sequence going to infinity slowly enough that

(30.17)    (log α_n)/(log n) → 0

but fast enough that

(30.18)    ∑_{α_n < p ≤ n} 1/p = o((log log n)^{1/2}).

Because of (30.15), these two requirements are met if, for example, log α_n = (log n)/(log log n). Now define

(30.19)    g_n(m) = ∑_{p ≤ α_n} δ_p(m).

For a function f of positive integers, let

    E_n[f] = n^{−1} ∑_{m=1}^n f(m)

denote its expected value computed with respect to P_n. By (30.13) for u = 1, E_n[δ_p] = (1/n)⌊n/p⌋ ≤ 1/p. By (30.18) and Markov's inequality,

    P_n[m: g(m) − g_n(m) ≥ ε(log log n)^{1/2}] ≤ (ε(log log n)^{1/2})^{−1} ∑_{α_n < p ≤ n} 1/p → 0.

Therefore (Theorem 25.4), (30.16) is unaffected if g_n(m) is substituted for g(m).
Now compare (30.19) with the corresponding sum S_n = ∑_{p ≤ α_n} X_p. The mean and variance of S_n are ∑_{p ≤ α_n} p^{−1} and ∑_{p ≤ α_n} p^{−1}(1 − p^{−1}), and by (30.15) and (30.17) each is asymptotically log log n.
] 0. Prove this with G in place of (a)
E
=
(b)
g
30.11.
)
=
e
J2Tr - oo
=
(a) op] (b) (c)
g]
--+
30.12.
g( ) m
g.
� v L. 'TT'
E
=
e
- 00
CHAPTER 6

Derivatives and Conditional Probability

SECTION 31. DERIVATIVES ON THE LINE*

This section on Lebesgue's theory of derivatives for real functions of a real variable serves to introduce the general theory of Radon–Nikodym derivatives, which underlies the modern theory of conditional probability. The results here are interesting in themselves and will be referred to later for purposes of illustration and comparison, but they will not be required in subsequent proofs.

The Fundamental Theorem of Calculus
To what extent are the operations of integration and differentiation inverse to one another? A function F is by definition an indefinite integral of another function f on [a, b] if

(31.1)    F(x) − F(a) = ∫_a^x f(t) dt

for a ≤ x ≤ b; F is by definition a primitive of f if it has derivative f:

(31.2)    F′(x) = f(x)

for a ≤ x ≤ b. According to the fundamental theorem of calculus (see (17.5)), these concepts coincide in the case of continuous f:
Theorem 31.1. Suppose that f is continuous on [a, b].
(i) An indefinite integral of f is a primitive of f: if (31.1) holds for all x in [a, b], then so does (31.2).
(ii) A primitive of f is an indefinite integral of f: if (31.2) holds for all x in [a, b], then so does (31.1).

*This section may be omitted.
A basic problem is to investigate the extent to which this theorem holds if f is not assumed continuous. First consider part (i). Suppose f is integrable, so that the right side of (31.1) makes sense. If f is 0 for x < m and 1 for x ≥ m (a < m < b), then an F satisfying (31.1) has no derivative at m. It is thus too much to ask that (31.2) hold for all x. On the other hand, according to a famous theorem of Lebesgue, if (31.1) holds for all x, then (31.2) holds almost everywhere, that is, except for x in a set of Lebesgue measure 0. In this section almost everywhere will refer to Lebesgue measure only. This result, the most one could hope for, will be proved below (Theorem 31.3).

Now consider part (ii) of Theorem 31.1. Suppose that (31.2) holds almost everywhere, as in Lebesgue's theorem just stated. Does (31.1) follow? The answer is no: If f is identically 0, and if F(x) is 0 for x < m and 1 for x ≥ m (a < m < b), then (31.2) holds almost everywhere, but (31.1) fails for x ≥ m. The question was wrongly posed, and the trouble is not far to seek: If f is integrable and (31.1) holds, then

(31.3)    F(x + h) − F(x) = ∫_a^b I_{(x, x+h]}(t) f(t) dt → 0

as h ↓ 0 by the dominated convergence theorem. Together with a similar argument for h ↑ 0 this shows that F must be continuous. Hence the question becomes this: If F is continuous and f is integrable, and if (31.2) holds almost everywhere, does (31.1) follow? The answer, strangely enough, is still no: In Example 31.1 there is constructed a continuous, strictly increasing F for which F′(x) = 0 except on a set of Lebesgue measure 0.

    ≥ ∑′(F(a_{i−1}) − F(a_i)) + |∑″(F(a_i) − F(a_{i−1}))|
    = ∑′(F(a_{i−1}) − F(a_i)) + (F(b) − F(a)) + ∑′(F(a_{i−1}) − F(a_i)).

As all the differences in this last expression are nonnegative, the absolute value bars can be suppressed; therefore,

    ‖F‖_Δ ≥ F(b) − F(a) + 2∑′(F(a_{i−1}) − F(a_i)) ≥ F(b) − F(a) + 2α∑′(a_i − a_{i−1}).

A function F has at each x four derivates, the upper and lower right
derivatives

    D̄F(x) = lim sup_{h↓0} (F(x + h) − F(x))/h,    D̲F(x) = lim inf_{h↓0} (F(x + h) − F(x))/h,

and the upper and lower left derivatives

    FD̄(x) = lim sup_{h↓0} (F(x) − F(x − h))/h,    FD̲(x) = lim inf_{h↓0} (F(x) − F(x − h))/h.
404
DERIVATIVES AND CONDITIONAL PROBABILITY
There is a derivative at x if and only if these four quantities have a common value. Suppose that F has a finite derivative F′(x) at x. If u < x < v, then

    (F(v) − F(u))/(v − u) − F′(x)
    = ((v − x)/(v − u))[(F(v) − F(x))/(v − x) − F′(x)] + ((x − u)/(v − u))[(F(x) − F(u))/(x − u) − F′(x)].

Therefore,

(31.8)    (F(v) − F(u))/(v − u) → F′(x)

as u ↑ x and v ↓ x; that is to say, for each ε there is a δ such that u ≤ x ≤ v and 0 < v − u < δ together imply that the quantities on either side of the arrow differ by less than ε.

Suppose that F is measurable and that it is continuous except possibly at countably many points. This will be true if F is nondecreasing or is the difference of two nondecreasing functions. Let M be a countable, dense set containing all the discontinuity points of F, and let r_n(x) be the smallest number of the form k/n exceeding x. Since F is continuous off M and M is dense, the supremum of (F(y) − F(x))/(y − x) over x < y < r_n(x) is unchanged if y is restricted to M, and so it is a measurable function of x; D̄F(x) is the limit of these suprema as n → ∞ and is therefore measurable, and the same argument applies to the other three derivates.
Singular Functions

If f(x) is nonnegative and integrable, differentiating its indefinite integral ∫_{−∞}^x f(t) dt leads back to f(x) except perhaps on a set of Lebesgue measure 0. That is the content of Theorem 31.3. The converse question is this: If F(x) is nondecreasing and hence has almost everywhere a derivative F′(x), does integrating F′(x) lead back to F(x)? As stated before, the answer turns out to be no even if F(x) is assumed continuous:

Example 31.1. Let X_1, X_2, ... be independent, identically distributed random variables such that P[X_n = 0] = p_0 and P[X_n = 1] = p_1 = 1 − p_0, and let X = ∑_{n=1}^∞ X_n 2^{−n}. Let F(x) = P[X ≤ x] be the distribution function of X. For an arbitrary sequence u_1, u_2, ... of 0's and 1's, P[X_n = u_n, n = 1, 2, ...] = lim_n p_{u_1} ⋯ p_{u_n} = 0; since x can have at most two dyadic expansions x = ∑_n u_n 2^{−n}, P[X = x] = 0. Thus F is everywhere continuous. Of course, F(0) = 0 and F(1) = 1. For 0 ≤ k < 2^n, k2^{−n} has the form ∑_{i=1}^n u_i 2^{−i} for some n-tuple (u_1, ..., u_n) of 0's and 1's. Since F is continuous,

(31.15)    F((k + 1)2^{−n}) − F(k2^{−n}) = P[X_i = u_i, i ≤ n] = p_{u_1} ⋯ p_{u_n}.
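The distribution function F of Example 31.1 can be evaluated at dyadic rationals directly from (31.15). The sketch below (the values of p_0, p_1 are hypothetical) uses the equivalent self-similarity relations F(x) = p_0·F(2x) for 0 ≤ x ≤ 1/2 and F(x) = p_0 + p_1·F(2x − 1) for 1/2 ≤ x ≤ 1, which follow from (31.15) by conditioning on the first digit X_1.

```python
p0, p1 = 0.3, 0.7   # hypothetical digit probabilities, p0 + p1 = 1

def F(x, depth=60):
    # distribution function of X = sum_n X_n 2^{-n}, P[X_n = 1] = p1,
    # evaluated via the dyadic self-similarity relation
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if depth == 0:
        return 0.0          # truncation error at most max(p0, p1)^60
    if x <= 0.5:
        return p0 * F(2.0 * x, depth - 1)
    return p0 + p1 * F(2.0 * x - 1.0, depth - 1)

print(F(0.5), F(0.25), F(0.75))  # p0, p0^2, p0 + p1*p0 by (31.15)
```

For p_1 ≠ 1/2 this F is continuous and strictly increasing yet its increments over dyadic intervals are products p_{u_1} ⋯ p_{u_n}, the mechanism behind its singularity.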
Theorem 31.4. Suppose that F and μ are related by (31.18), and let A be a Borel set such that F has a derivative F′(x) at each x in A.
(i) If F′(x) ≤ α for x ∈ A, then μ(A) ≤ αλ(A).
(ii) If F′(x) ≥ α for x ∈ A, then μ(A) ≥ αλ(A).

PROOF. It is no restriction to assume A bounded. Fix ε for the moment. Let E be a countable, dense set, and let A_n be the set of x in A such that

(31.19)    μ(I) ≤ (α + ε)λ(I)

holds for every interval I = (u, v] with u, v ∈ E, x ∈ I, and 0 < λ(I) < n^{−1}. Then A_n is a Borel set and (see (31.8)) A_n ↑ A under the hypothesis of (i). By Theorem 11.4 there exist disjoint intervals I_{nk} (open on the left, closed on the right) such that A_n ⊂ ⋃_k I_{nk} and

(31.20)    ∑_k λ(I_{nk}) ≤ λ(A_n) + ε.
It is no restriction to assume that each I_{nk} has endpoints in E, meets A_n, and satisfies λ(I_{nk}) < n^{−1}. Then (31.19) applies to each I_{nk}, and hence

    μ(A_n) ≤ ∑_k μ(I_{nk}) ≤ (α + ε)∑_k λ(I_{nk}) ≤ (α + ε)(λ(A_n) + ε).

In the extreme terms here let n → ∞ and then ε → 0; (i) follows.

To prove (ii), let the countable, dense set E contain all the discontinuity points of F, and use the same argument with μ(I) ≥ (α − ε)λ(I) in place of (31.19) and ∑_k μ(I_{nk}) ≤ μ(A_n) + ε in place of (31.20). Since E contains all the discontinuity points of F, it is again no restriction to assume that each I_{nk} has endpoints in E, meets A_n, and satisfies λ(I_{nk}) < n^{−1}. It follows that

    μ(A_n) + ε ≥ ∑_k μ(I_{nk}) ≥ (α − ε)∑_k λ(I_{nk}) ≥ (α − ε)λ(A_n).

Again let n → ∞ and then ε → 0. •

The measures μ and λ have disjoint supports if there exist Borel sets S_μ and S_λ such that

(31.21)    μ(R^1 − S_μ) = 0,    λ(R^1 − S_λ) = 0,    S_μ ∩ S_λ = ∅.
Suppose that F and J.L are related by (31.18). A necessary and sufficient condition for J.L and A to have disjoint supports is that F'(x) = 0 except on a set of Lebesgue measure 0. Theorem 31.5.
PROOF. By Theorem 31.4, J.L[x: l xl < a, F'(x) < E} < 2aE, and so {let E -7 0 and then a -7 oo) J.L[x: F'(x) = 0] = 0. If F'(x) = 0 outside a set of
Lebesgue measure O, then S,� [ x : F'(x) = 0] and S,.. = R 1 - S,� satisfy (3 1.21). Suppose that there exist S,_. and S,� satisfying (31.21). By the other half of Theorem 31.4, d[x: F'(x) > E] = d[ x : x E S,�, F'(x) > E] < J.L(S,�) = O, and so {let E -7 0) F'(x) = 0 except on a set of Lebesgue measure 0. • =
Example 31.2. Suppose that μ is discrete, consisting of a mass m_k at each of countably many points x_k. Then F(x) = ∑ m_k, the sum extending over the k for which x_k ≤ x. Certainly, μ and λ have disjoint supports, and so F′ must vanish except on a set of Lebesgue measure 0. This is directly obvious if the x_k have no limit points, but not, for example, if they are dense. •
Example 31.3. Consider again the distribution function F in Example 31.1. Here μ(A) = P[X ∈ A]. Since F is singular, μ and λ have disjoint supports. This fact has an interesting direct probabilistic proof.

For x in the unit interval, let d_1(x), d_2(x), ... be the digits in its nonterminating dyadic expansion, as in Section 1. If (k2^{−n}, (k + 1)2^{−n}] is the dyadic interval of rank n consisting of the reals whose expansions begin with the digits u_1, ..., u_n, then, by (31.15),

(31.22)    μ(k2^{−n}, (k + 1)2^{−n}] = p_{u_1} ⋯ p_{u_n}.

If the unit interval is regarded as a probability space under the measure μ, then the d_i(x) become random variables, and (31.22) says that these random variables are independent and identically distributed and μ[x: d_i(x) = 0] = p_0, μ[x: d_i(x) = 1] = p_1. Since these random variables have expected value p_1, the strong law of large numbers implies that their averages go to p_1 with probability 1:

(31.23)    μ[x ∈ (0, 1]: lim_n (1/n)∑_{i=1}^n d_i(x) = p_1] = 1.

On the other hand, by the normal number theorem,

(31.24)    λ[x ∈ (0, 1]: lim_n (1/n)∑_{i=1}^n d_i(x) = ½] = 1.

(Of course, (31.24) is just (31.23) for the special case p_0 = p_1 = ½; in this case μ and λ coincide in the unit interval.) If p_1 ≠ ½, the sets in (31.23) and (31.24) are disjoint, so that μ and λ do have disjoint supports. It was shown in Example 31.1 that if F′(x) exists at all (0 < x < 1), then it is 0. By part (i) of Theorem 31.4 the set where F′(x) fails to exist therefore has μ-measure 1; in particular, this set is uncountable. •

In the singular case, according to Theorem 31.5, F′ vanishes on a support of λ. It is natural to ask for the size of F′ on a support of μ. If B is the x-set where F has a finite derivative, and if (31.21) holds, then by Theorem 31.4, μ[x ∈ B: F′(x) ≤ n] = μ[x ∈ B ∩ S_μ: F′(x) ≤ n] ≤ nλ(S_μ) = 0, and hence μ(B) = 0. The next theorem goes further.

Theorem 31.6. Suppose that F and μ are related by (31.18) and that μ and λ have disjoint supports. Then, except for x in a set of μ-measure 0, FD̲(x) = ∞.
412
DERIVATIVES AND CONDITIONAL PROBABILITY
If μ has finite support, then clearly FD(x) = ∞ if μ{x} > 0, while DF(x) = 0 for all x. Since F is continuous from the right, FD and DF play different roles.†

PROOF. Let A_n be the set where FD(x) < n. The problem is to prove that μ(A_n) = 0, and by (31.21) it is enough to prove that μ(A_n ∩ S_μ) = 0. Further, by Theorem 12.3 it is enough to prove that μ(K) = 0 if K is a compact subset of A_n ∩ S_μ.

Fix ε. Since λ(K) = 0, there is an open G such that K ⊂ G and λ(G) < ε. If x ∈ K, then x ∈ A_n, and by the definition of FD and the right-continuity of F, there is an open interval I_x for which x ∈ I_x ⊂ G and μ(I_x) < nλ(I_x). By compactness, K has a finite subcover I_{x_1}, ..., I_{x_k}. If some three of these have a nonempty intersection, one of them must be contained in the union of the other two. Such superfluous intervals can be removed from the subcover, and it is therefore possible to assume that no point of K lies in more than two of the I_{x_i}. But then

μ(K) ≤ μ(⋃_i I_{x_i}) ≤ Σ_i μ(I_{x_i}) < n Σ_i λ(I_{x_i}) ≤ 2nλ(⋃_i I_{x_i}) ≤ 2nλ(G) < 2nε.

Since ε was arbitrary, μ(K) = 0. ∎
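The strong-law dichotomy in Example 31.3 is easy to watch numerically. The sketch below is ours, not the book's; the digit probability p_1 = 0.9, the seed, and the sample size are arbitrary choices. Sampling dyadic digits under μ (Bernoulli(p_1)) and under λ (Bernoulli(1/2)) shows the averages settling near p_1 and 1/2 respectively, which is why the two measures concentrate on disjoint sets.

```python
import random

def digit_average(p1, n, rng):
    """Average of n i.i.d. dyadic digits that equal 1 with probability p1.

    Under mu the digits of x are Bernoulli(p1); under Lebesgue measure
    they are Bernoulli(1/2).  The strong law drives the average to p1
    (resp. 1/2), the disjoint-supports argument of Example 31.3.
    """
    return sum(1 if rng.random() < p1 else 0 for _ in range(n)) / n

rng = random.Random(31)
avg_mu = digit_average(0.9, 100_000, rng)   # digits sampled under mu
avg_leb = digit_average(0.5, 100_000, rng)  # digits sampled under lambda
print(avg_mu, avg_leb)                      # near 0.9 and near 0.5
```

With p_1 ≠ 1/2 the two limit sets in (31.23) and (31.24) are disjoint, and each average lands in its own set.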
If A_n = [x: f(x) > n], then A_n ↓ [x: f(x) = ∞], a set of Lebesgue measure 0, and since f is integrable, the dominated convergence theorem implies that ∫_{A_n} f(x) dx < ε/2 for large n. Fix such an n and take δ = ε/2n. If λ(A) < δ, then

∫_A f(x) dx = ∫_{A−A_n} f(x) dx + ∫_{A∩A_n} f(x) dx ≤ nλ(A) + ε/2 < ε.

If F is given by (31.26), then F(b) − F(a) = ∫_a^b f(x) dx, and (31.27) has this consequence: For every ε there exists a δ such that for each finite collection [a_i, b_i], i = 1, ..., k, of nonoverlapping† intervals,

(31.28)    Σ_{i=1}^k |F(b_i) − F(a_i)| < ε    if Σ_{i=1}^k (b_i − a_i) < δ.

A function F with this property is said to be absolutely continuous.‡ A function of the form (31.26) (f integrable) is thus absolutely continuous. A continuous distribution function is uniformly continuous, and so for every ε there is a δ such that the implication in (31.28) holds provided that k = 1. The definition of absolute continuity requires this to hold whatever k may be, which puts severe restrictions on F. Absolute continuity of F can be characterized in terms of the measure μ:
Theorem 31.7. Suppose that F and μ are related by (31.18). Then F is absolutely continuous in the sense of (31.28) if and only if μ(A) = 0 for every A for which λ(A) = 0.
†Intervals are nonoverlapping if their interiors are disjoint. In this definition it is immaterial whether the intervals are regarded as closed or open or half-open, since this has no effect on (31.28).
‡The definition applies to all functions, not just to distribution functions. If F is a distribution function as in the present discussion, the absolute-value bars in (31.28) are unnecessary.
PROOF. Suppose that F is absolutely continuous and that λ(A) = 0. Given ε, choose δ so that (31.28) holds. There exists a countable disjoint union B = ⋃_k I_k of intervals such that A ⊂ B and λ(B) < δ. By (31.28) it follows that μ(⋃_{k=1}^n I_k) < ε for each n and hence that μ(A) ≤ μ(B) ≤ ε. Since ε was arbitrary, μ(A) = 0.

If F is not absolutely continuous, then there exists an ε such that for every δ some finite disjoint union A of intervals satisfies λ(A) < δ and μ(A) ≥ ε. Choose A_n so that λ(A_n) < n^{-2} and μ(A_n) ≥ ε. Then λ(lim sup_n A_n) = 0 by the first Borel-Cantelli lemma (Theorem 4.3, the proof of which does not require P to be a probability measure or even finite). On the other hand, μ(lim sup_n A_n) ≥ ε > 0 by Theorem 4.1 (the proof of which applies because μ is assumed finite). ∎
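Theorem 31.7 can be watched failing for the Cantor function, whose measure lives on the λ-null Cantor set. The sketch below is ours (the stage depth n = 8 is an arbitrary choice): over the 2^n stage-n construction intervals the total length (2/3)^n shrinks to 0 while the total increase of F stays exactly 1, so no δ can work in (31.28).

```python
from fractions import Fraction

def cantor_intervals(n):
    """The 2^n closed intervals left after n middle-thirds removals."""
    ivs = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        ivs = [piece for a, b in ivs
               for piece in ((a, a + (b - a) / 3), (b - (b - a) / 3, b))]
    return ivs

def cantor_F(x):
    """Cantor function at a triadic rational x in [0, 1], computed exactly."""
    if x >= 1:
        return Fraction(1)
    total, scale = Fraction(0), Fraction(1, 2)
    while x > 0:
        x *= 3
        d = int(x)          # next ternary digit of x: 0, 1, or 2
        x -= d
        if d == 1:          # x meets a removed middle third; F is flat there
            return total + scale
        if d == 2:
            total += scale
        scale /= 2
    return total

n = 8
ivs = cantor_intervals(n)
lengths = sum(b - a for a, b in ivs)                     # (2/3)^n, tending to 0
variation = sum(cantor_F(b) - cantor_F(a) for a, b in ivs)  # exactly 1 at every stage
print(float(lengths), float(variation))
```

Exact rational arithmetic via `Fraction` keeps the endpoint values of F free of rounding, so the variation comes out as exactly 1.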
This result leads to a characterization of indefinite integrals.
Theorem 31.8. A distribution function F(x) has the form ∫_{−∞}^x f(t) dt for an integrable f if and only if it is absolutely continuous in the sense of (31.28).

PROOF. That an F of the form (31.26) is absolutely continuous was proved in the argument leading to the definition (31.28). For another proof, apply Theorem 31.7: if F has this form, then λ(A) = 0 implies that μ(A) = ∫_A f(t) dt = 0.
To go the other way, define for any distribution function F

(31.29)    F_ac(x) = ∫_{−∞}^x F′(t) dt

and

(31.30)    F_s(x) = F(x) − F_ac(x).

Then F_s is right-continuous, and by (31.9) it is both nonnegative and nondecreasing. Since F_ac comes from a density, it is absolutely continuous. By Theorem 31.3, F′_ac = F′ and hence F′_s = 0 except on a set of Lebesgue measure 0. Thus F has a decomposition

(31.31)    F(x) = F_ac(x) + F_s(x),
where F_ac has a density and hence is absolutely continuous and F_s is singular. This is called the Lebesgue decomposition.

Suppose that F is absolutely continuous. Then F_s of (31.30) must, as the difference of absolutely continuous functions, be absolutely continuous itself. If it can be shown that F_s is identically 0, it will follow that F = F_ac has the required form. It thus suffices to show that a distribution function that is both absolutely continuous and singular must vanish.
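A minimal numerical sketch of the decomposition (31.31), for an arbitrarily chosen mixed F (half a uniform density plus half an atom at 1/2; the example and all numbers are ours): integrating the a.e. derivative recovers F_ac, and the remainder F_s carries the jump.

```python
def F(x):
    # mixed distribution function on [0, 1]: uniform part plus an atom at 1/2
    return 0.5 * x + (0.5 if x >= 0.5 else 0.0)

def F_prime(x, h=1e-6):
    # two-sided difference quotient; exists and equals 0.5 except at the atom
    return (F(x + h) - F(x - h)) / (2 * h)

def F_ac(x, steps=10_000):
    # (31.29): integrate the derivative from 0 to x (midpoint Riemann sum)
    dx = x / steps
    return sum(F_prime((i + 0.5) * dx) * dx for i in range(steps))

x = 0.75
ac = F_ac(x)        # absolutely continuous part, about 0.5 * 0.75 = 0.375
s = F(x) - ac       # singular part, the jump of size 0.5 at the atom
print(ac, s)
```

The singular remainder here is purely discrete; Problem 31.20 splits the general F_s further into a continuous singular part and a jump part.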
SECTION 31. DERIVATIVES ON THE LINE 415
If a distribution function F is singular, then by Theorem 31.5 there are disjoint supports S_μ and S_λ. But if F is also absolutely continuous, then from λ(S_μ) = 0 it follows by Theorem 31.7 that μ(S_μ) = 0. But then μ(R¹) = 0, and so F(x) ≡ 0. ∎
This theorem identifies the distribution functions that are integrals of their derivatives as the absolutely continuous functions. Theorem 31.7, on the other hand, characterizes absolute continuity in a way that extends to spaces Ω without the geometric structure of the line necessary to a treatment involving distribution functions and ordinary derivatives.† The extension is studied in Section 32.

Functions of Bounded Variation
The remainder of this section briefly sketches the extension of the preceding theory to functions that are not monotone. The results are for simplicity given only for a finite interval [a, b] and for functions F on [a, b] satisfying F(a) = 0. If F(x) = ∫_a^x f(t) dt and f is integrable but not necessarily nonnegative, then

F(x) = ∫_a^x f⁺(t) dt − ∫_a^x f⁻(t) dt

exhibits F as the difference of two nondecreasing functions. The problem of characterizing indefinite integrals thus leads to the preliminary problem of characterizing functions representable as the difference of nondecreasing functions.

Now F is said to be of bounded variation over [a, b] if sup_Δ ‖F‖_Δ is finite, where ‖F‖_Δ is defined by (31.5) and Δ ranges over all partitions (31.4) of [a, b]. Clearly, a difference of nondecreasing functions is of bounded variation. But the converse holds as well: For every finite collection Γ of nonoverlapping intervals [x_i, y_i] in [a, b], put

P_Γ = Σ_i (F(y_i) − F(x_i))⁺,    N_Γ = Σ_i (F(y_i) − F(x_i))⁻.

Now define P(x) = sup_Γ P_Γ and N(x) = sup_Γ N_Γ, where the suprema extend over partitions Γ of [a, x]. If F is of bounded variation, then P(x) and N(x) are finite. For each such Γ, P_Γ = N_Γ + F(x). This gives the inequalities

P_Γ ≤ N(x) + F(x),    P(x) ≥ N_Γ + F(x),

which in turn lead to the inequalities

P(x) ≤ N(x) + F(x),    P(x) ≥ N(x) + F(x).

Thus

(31.32)    F(x) = P(x) − N(x)

gives the required representation: A function is the difference of two nondecreasing functions if and only if it is of bounded variation.
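The quantities P_Γ and N_Γ are easy to compute. In the sketch below (F = sin on [0, 5] and the partition sizes are arbitrary choices of ours) the telescoping identity P_Γ − N_Γ = F(x) − F(a) holds for every partition of [a, x], and refining the partition drives P_Γ up toward the positive variation P(x).

```python
import math

def pos_neg_variation(F, a, x, k):
    """P_Gamma and N_Gamma over the uniform k-interval partition of [a, x]."""
    pts = [a + (x - a) * i / k for i in range(k + 1)]
    diffs = [F(pts[i + 1]) - F(pts[i]) for i in range(k)]
    P = sum(d for d in diffs if d > 0)    # total rise over the partition
    N = -sum(d for d in diffs if d < 0)   # total fall over the partition
    return P, N

F = math.sin
P, N = pos_neg_variation(F, 0.0, 5.0, 100_000)
print(P - N, F(5.0))          # the identity P_Gamma - N_Gamma = F(x) - F(a)
print(P, 2 + math.sin(5.0))   # refined P_Gamma approximates P(x) for sin on [0, 5]
```

For sin on [0, 5] the exact positive variation is the total rise, 2 + sin 5, so the refined P_Γ can be checked against a closed form.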
†Theorems 31.3 and 31.8 do have geometric analogues in R^k; see RUDIN², Chapter 8.
If T_Γ = P_Γ + N_Γ, then T_Γ = Σ_i |F(y_i) − F(x_i)|. According to the definition (31.28), F is absolutely continuous if for every ε there exists a δ such that T_Γ < ε whenever the intervals in the collection Γ have total length less than δ. If F is absolutely continuous, take ε = 1, take the corresponding δ, and decompose [a, b] into a finite number, say n, of subintervals [u_{j−1}, u_j) of lengths less than δ. Any partition Δ of [a, b] can by the insertion of the u_j be split into n sets of intervals, each of total length less than δ, and it follows that ‖F‖_Δ ≤ n. Therefore, an absolutely continuous function is necessarily of bounded variation.

An absolutely continuous F thus has a representation (31.32). It follows by the definitions that P(y) − P(x) is at most sup_Γ T_Γ, where Γ ranges over the partitions of [x, y]. If [x_i, y_i] are nonoverlapping intervals, then Σ_i (P(y_i) − P(x_i)) is at most sup_Γ T_Γ, where now Γ ranges over the collections of intervals that partition each of the [x_i, y_i]. Therefore, if F is absolutely continuous, there exists for each ε a δ such that Σ_i (y_i − x_i) < δ implies Σ_i (P(y_i) − P(x_i)) < ε. In other words, P is absolutely continuous. Similarly, N is absolutely continuous.

Thus an absolutely continuous F is the difference of two nondecreasing absolutely continuous functions. By Theorem 31.8, each of these is an indefinite integral, which implies that F is an indefinite integral as well: For an F on [a, b] satisfying F(a) = 0, F is an indefinite integral if and only if it is absolutely continuous.
31.17. Let μ and F be as in Example 31.1, with p_0 ≠ 1/2. Let I_n(x) be the dyadic interval of rank n containing x, and put s_n(x) = Σ_{k=1}^n d_k(x), so that μ(I_n(x)) = p_0^{n−s_n(x)} p_1^{s_n(x)}.
(a) Show that μ(I_n(x))/2^{−n} → 0 for x in a set of λ-measure 1, while μ(I_n(x))/2^{−n} → ∞ for x in a set of μ-measure 1.
(b) Show that F′(x) = 0 if μ(I_n(x))/2^{−n} → 0.

31.18. Let a(x) be the distance from x to the nearest integer, and put F(x) = Σ_{k=0}^∞ a_k(x), where a_k(x) = 2^{−k} a(2^k x). Show that F is continuous but nowhere differentiable.

†A Lipschitz condition of order α holds at x if F(x + h) − F(x) = O(|h|^α) as h → 0; for α > 1 this implies F′(x) = 0, and for 0 < α ≤ 1 it is a smoothness condition stronger than continuity and weaker than differentiability.
31.19. Show (see (31.31)) that (apart from addition of constants) a function can have only one representation F_1 + F_2 with F_1 absolutely continuous and F_2 singular.

31.20. Show that the F_s in the Lebesgue decomposition can be further split into F_d + F_cs, where F_cs is continuous and singular and F_d increases only in jumps, in the sense that the corresponding measure is discrete. The complete decomposition is then F = F_ac + F_cs + F_d.

31.21. (a) Suppose that F assumes the value 0 in each of a sequence of intervals shrinking to 0 and that Σ_i |F(y_i)| = ∞ for some points y_i lying between these intervals. Show that F is of unbounded variation.
(b) Define F over [0, 1] by F(0) = 0 and F(x) = x^a sin x^{−1} for 0 < x ≤ 1. For which values of a is F of bounded variation?

31.22. If f is nonnegative and Lebesgue integrable, then by Theorem 31.3 and (31.8), except for x in a set of Lebesgue measure 0,

(31.35)    (v − u)^{−1} ∫_u^v f(t) dt → f(x)    if u ≤ x ≤ v and v − u → 0.

There is an analogue in which Lebesgue measure is replaced by a general probability measure μ: If f is nonnegative and integrable with respect to μ, then as h ↓ 0,

(31.36)    μ(x − h, x + h]^{−1} ∫_{(x−h, x+h]} f(t) μ(dt) → f(x)

on a set of μ-measure 1. Let F be the distribution function corresponding to μ, and put φ(u) = inf[x: u ≤ F(x)] for 0 < u < 1 (see (14.5)). Deduce (31.36) from (31.35) by change of variable and Problem 14.4.

SECTION 32. THE RADON-NIKODYM THEOREM
Theorem 32.1. There exist sets A⁺ and A⁻ such that A⁺ ∪ A⁻ = Ω, A⁺ ∩ A⁻ = ∅, φ(E) ≥ 0 for all E in ℱ with E ⊂ A⁺, and φ(E) ≤ 0 for all E in ℱ with E ⊂ A⁻.

A set A is positive if φ(E) ≥ 0 for E ⊂ A and negative if φ(E) ≤ 0 for E ⊂ A. The A⁺ and A⁻ in the theorem decompose Ω into a positive and a negative set. This is the Hahn decomposition. If φ(A) = ∫_A f dμ (see Example 32.1), the result is easy: take A⁺ = [f ≥ 0] and A⁻ = [f < 0].
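For a discrete signed measure the easy case just mentioned can be exhibited directly. A sketch (the five weights are arbitrary choices of ours): A⁺ collects the points of nonnegative weight, and the resulting φ⁺(A) = φ(A ∩ A⁺) is checked by brute force to attain the supremum of φ(E) over E ⊂ A, the characterization established after (32.2) below.

```python
from itertools import chain, combinations

# a finite signed measure phi(A) = sum of w(i) over A; the weights are arbitrary
w = {1: 2.0, 2: -1.5, 3: 0.5, 4: -0.25, 5: 1.0}

def phi(A):
    return sum(w[i] for i in A)

A_plus = {i for i in w if w[i] >= 0}    # positive set, the analogue of [f >= 0]
A_minus = set(w) - A_plus               # negative set, the analogue of [f < 0]

def phi_plus(A):                        # upper variation: phi(A intersect A+)
    return phi(A & A_plus)

def phi_minus(A):                       # lower variation: -phi(A intersect A-)
    return -phi(A & A_minus)

def subsets(s):
    s = sorted(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

A = {1, 2, 4, 5}
jordan = phi_plus(A) - phi_minus(A)               # the Jordan decomposition (32.2)
sup_phi = max(phi(set(E)) for E in subsets(A))    # sup of phi(E) over E subset of A
print(phi(A), jordan, sup_phi, phi_plus(A))
```

The brute-force supremum is feasible only because the space is finite; the point is that it lands exactly on φ(A ∩ A⁺).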
PROOF. Let α = sup[φ(A): A ∈ ℱ]. Suppose that there exists a set A⁺ satisfying φ(A⁺) = α (which implies that α is finite). Let A⁻ = Ω − A⁺. If A ⊂ A⁺ and φ(A) < 0, then φ(A⁺ − A) > α, an impossibility; hence A⁺ is a positive set. If A ⊂ A⁻ and φ(A) > 0, then φ(A⁺ ∪ A) > α, an impossibility; hence A⁻ is a negative set.

It is therefore only necessary to construct a set A⁺ for which φ(A⁺) = α. Choose sets A_n such that φ(A_n) → α, and let A = ⋃_n A_n. For each n consider the 2^n sets B_{ni} (some perhaps empty) that are intersections of the form ⋂_{k=1}^n A′_k, where each A′_k is either A_k or else A − A_k. The collection 𝓑_n = [B_{ni}: 1 ≤ i ≤ 2^n] of these sets partitions A. Clearly, 𝓑_n refines 𝓑_{n−1}: each B_{nj} is contained in exactly one of the B_{n−1,i}. Let C_n be the union of those B_{ni} in 𝓑_n for which φ(B_{ni}) > 0. Since A_n is the union of certain of the B_{ni}, it follows that φ(A_n) ≤ φ(C_n). Since the partitions 𝓑_1, 𝓑_2, ... are successively finer, m ≤ n implies that (C_m ∪ ⋯ ∪ C_{n−1} ∪ C_n) − (C_m ∪ ⋯ ∪ C_{n−1}) is the union (perhaps empty) of certain of the sets B_{ni}; the B_{ni} in this union must satisfy φ(B_{ni}) > 0 because they are contained in C_n. Therefore, φ(C_m ∪ ⋯ ∪ C_{n−1}) ≤ φ(C_m ∪ ⋯ ∪ C_n), so that by induction φ(A_m) ≤ φ(C_m) ≤ φ(C_m ∪ ⋯ ∪ C_n). If D_m = ⋃_{n=m}^∞ C_n, then by Lemma 1 (take E_i = C_m ∪ ⋯ ∪ C_{m+i}) φ(A_m) ≤ φ(D_m). Let A⁺ = ⋂_{m=1}^∞ D_m (note that A⁺ = lim sup_n C_n), so that D_m ↓ A⁺. By Lemma 1, α = lim_m φ(A_m) ≤ lim_m φ(D_m) = φ(A⁺). Thus A⁺ does have maximal φ-value. ∎
If φ⁺(A) = φ(A ∩ A⁺) and φ⁻(A) = −φ(A ∩ A⁻), then φ⁺ and φ⁻ are finite measures. Thus

(32.2)    φ(A) = φ⁺(A) − φ⁻(A)

represents the set function φ as the difference of two finite measures having disjoint supports. If E ⊂ A, then φ(E) ≤ φ⁺(E) ≤ φ⁺(A), and there is equality if E = A ∩ A⁺. Therefore, φ⁺(A) = sup_{E⊂A} φ(E). Similarly, φ⁻(A) = −inf_{E⊂A} φ(E). The measures φ⁺ and φ⁻ are called the upper and lower variations of φ, and the measure |φ| with value φ⁺(A) + φ⁻(A) at A is called the total variation. The representation (32.2) is the Jordan decomposition.
Absolute Continuity and Singularity
Measures μ and ν on (Ω, ℱ) are by definition mutually singular if they have disjoint supports, that is, if there exist sets S_μ and S_ν such that

(32.3)    μ(Ω − S_μ) = 0,    ν(Ω − S_ν) = 0,    S_μ ∩ S_ν = ∅.
In this case μ is also said to be singular with respect to ν and ν singular with respect to μ. Note that measures are automatically singular if one of them is identically 0. According to Theorem 31.5 a finite measure on R¹ with distribution function F is singular with respect to Lebesgue measure in the sense of (32.3) if and only if F′(x) = 0 except on a set of Lebesgue measure 0. In Section 31 the latter condition was taken as the definition of singularity, but of course it is the requirement of disjoint supports that can be generalized from R¹ to an arbitrary Ω.

The measure ν is absolutely continuous with respect to μ if for each A in ℱ, μ(A) = 0 implies ν(A) = 0. In this case ν is also said to be dominated by μ, and the relation is indicated by ν ≪ μ. If ν ≪ μ and μ ≪ ν, the measures are equivalent, indicated by ν ≡ μ. A finite measure on the line is by Theorem 31.7 absolutely continuous in this sense with respect to Lebesgue measure if and only if the corresponding distribution function F satisfies the condition (31.28). The latter condition, taken in Section 31 as the definition of absolute continuity, is again not the one that generalizes from R¹ to Ω.

There is an ε-δ idea related to the definition of absolute continuity given above. Suppose that for every ε there exists a δ such that
(32.4)    ν(A) < ε    if μ(A) < δ.
If this condition holds, μ(A) = 0 implies that ν(A) < ε for all ε, and so ν ≪ μ. Suppose, on the other hand, that this condition fails and that ν is finite. Then for some ε there exist sets A_n such that μ(A_n) < n^{-2} and ν(A_n) ≥ ε. If A = lim sup_n A_n, then μ(A) = 0 by the first Borel-Cantelli lemma (which applies to arbitrary measures), but ν(A) ≥ ε > 0 by the right-hand inequality in (4.9) (which applies because ν is finite). Hence ν ≪ μ fails, and so (32.4) follows if ν is finite and ν ≪ μ. If ν is finite, in order that ν ≪ μ it is therefore necessary and sufficient that for every ε there exist a δ satisfying (32.4). This condition is not suitable as a definition, because it need not follow from ν ≪ μ if ν is infinite.†
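The failure of (32.4) for infinite ν can be made concrete. This is a sketch and the choice is ours (compare Problem 32.3): μ puts mass 2^{−n} at the integer n and ν is counting measure. Then ν ≪ μ, since μ(A) = 0 forces A to be empty, yet sets of arbitrarily small μ-measure carry arbitrarily large ν-measure.

```python
def mu(A):
    """mu puts mass 2^-n at the integer n; mu is finite."""
    return sum(2.0 ** -n for n in A)

def nu(A):
    """nu is counting measure; nu is infinite but sigma-finite."""
    return len(list(A))

# nu << mu, yet no delta can work in (32.4): the blocks {k+1, ..., 2k}
# have mu-measure below 2^-k while their nu-measure k grows without bound.
for k in (5, 10, 20):
    A = list(range(k + 1, 2 * k + 1))
    print(k, mu(A), nu(A))
```

So for finite ν the ε-δ condition characterizes ν ≪ μ, but for infinite ν it is strictly stronger.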
The Main Theorem
If ν(A) = ∫_A f dμ, then certainly ν ≪ μ. The Radon-Nikodym theorem goes in the opposite direction:

Theorem 32.2. If μ and ν are σ-finite measures such that ν ≪ μ, then there exists a nonnegative f, a density, such that ν(A) = ∫_A f dμ for all A ∈ ℱ. For two such densities f and g, μ[f ≠ g] = 0.

†See Problem 32.3.
The uniqueness of the density up to sets of μ-measure 0 is settled by Theorem 16.10. It is only the existence that must be proved. The density f is integrable μ if and only if ν is finite. But since f is integrable μ over A if ν(A) < ∞, and since ν is assumed σ-finite, f < ∞ except on a set of μ-measure 0; and f can be taken finite everywhere. By Theorem 16.11, integrals with respect to ν can be calculated by the formula

(32.5)    ∫_A h dν = ∫_A h f dμ.
The density whose existence is to be proved is called the Radon-Nikodym derivative of ν with respect to μ and is often denoted dν/dμ. The term derivative is appropriate because of Theorems 31.3 and 31.8: For an absolutely continuous distribution function F on the line, the corresponding measure μ has with respect to Lebesgue measure the Radon-Nikodym derivative F′. Note that (32.5) can be written

(32.6)    ∫_A h dν = ∫_A h (dν/dμ) dμ.
Suppose that Theorem 32.2 holds for finite μ and ν (which is in fact enough for the probabilistic applications in the sections that follow). In the σ-finite case there is a countable decomposition of Ω into ℱ-sets A_n for which μ(A_n) and ν(A_n) are both finite. If

(32.7)    μ_n(A) = μ(A ∩ A_n),    ν_n(A) = ν(A ∩ A_n),

then ν ≪ μ implies ν_n ≪ μ_n, and so ν_n(A) = ∫_A f_n dμ_n for some density f_n. Since μ_n has density I_{A_n} with respect to μ (Example 16.9),

ν_n(A) = ∫_A f_n I_{A_n} dμ.

Thus Σ_n f_n I_{A_n} is the density sought. It is therefore enough to treat finite μ and ν. This requires a preliminary result.
Lemma 2. If μ and ν are finite measures and are not mutually singular, then there exist a set A and a positive ε such that μ(A) > 0 and εμ(E) ≤ ν(E) for all E ⊂ A.
PROOF. Let A_n⁺ ∪ A_n⁻ be a Hahn decomposition for the set function ν − n^{-1}μ; put M = ⋃_n A_n⁺, so that M^c = ⋂_n A_n⁻. Since M^c is in the negative set A_n⁻ for ν − n^{-1}μ, it follows that ν(M^c) ≤ n^{-1}μ(M^c); since this holds for all n, ν(M^c) = 0. Thus M supports ν, and from the fact that μ and ν are not mutually singular it follows that μ(M) > 0 and hence that μ(A_n⁺) > 0 for some n. Take A = A_n⁺ and ε = n^{-1}. ∎

Example 32.2. Suppose that (Ω, ℱ) = (R¹, ℛ¹), μ is Lebesgue measure λ, and ν(a, b] = F(b) − F(a). If ν and λ do not have disjoint supports, then by Theorem 31.5, λ[x: F′(x) > 0] > 0, and hence for some ε, A = [x: F′(x) > ε] satisfies λ(A) > 0. If E = (a, b] is a sufficiently small interval about an x in A, then ν(E)/λ(E) = (F(b) − F(a))/(b − a) > ε, which is the same thing as

ελ(E) < ν(E). ∎
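Example 32.2's quotients ν(E)/λ(E) can be watched converging to the density. A sketch with an arbitrarily chosen F (F(x) = x³ on [0, 1], so that the Radon-Nikodym derivative is f(x) = 3x²; the point x = 0.6 and the shrinking half-widths are also our choices):

```python
def F(x):
    """Distribution function on [0, 1] with density f(x) = 3x^2 (arbitrary choice)."""
    return x ** 3

def ratio(x, h):
    """nu(E)/lambda(E) for the small interval E = (x - h, x + h]."""
    return (F(x + h) - F(x - h)) / (2 * h)

x0 = 0.6
for h in (0.1, 0.01, 0.001):
    print(h, ratio(x0, h))   # tends to the density f(0.6) = 3 * 0.36 = 1.08
```

Here the quotient works out algebraically to 3x² + h², so the error is exactly h²; in general the martingale results cited above make such limits precise.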
Thus Lemma 2 ties in with derivatives and quotients ν(E)/μ(E) for "small" sets E. Martingale theory links Radon-Nikodym derivatives with such quotients; see Theorem 35.7 and Example 35.10.

PROOF OF THEOREM 32.2. Suppose that μ and ν are finite measures satisfying ν ≪ μ. Let 𝒢 be the class of nonnegative functions g such that ∫_E g dμ ≤ ν(E) for all E. If g and g′ lie in 𝒢, then max(g, g′) also lies in 𝒢 because

∫_E max(g, g′) dμ = ∫_{E∩[g≥g′]} g dμ + ∫_{E∩[g′>g]} g′ dμ ≤ ν(E ∩ [g ≥ g′]) + ν(E ∩ [g′ > g]) = ν(E).

Thus 𝒢 is closed under the formation of finite maxima. Suppose that functions g_n lie in 𝒢 and g_n ↑ g. Then ∫_E g dμ = lim_n ∫_E g_n dμ ≤ ν(E) by the monotone convergence theorem, so that g lies in 𝒢. Thus 𝒢 is closed under nondecreasing passages to the limit.

Let α = sup ∫ g dμ for g ranging over 𝒢 (α ≤ ν(Ω)). Choose g_n in 𝒢 so that ∫ g_n dμ > α − n^{-1}. If f_n = max(g_1, ..., g_n) and f = lim f_n, then f lies in 𝒢 and ∫ f dμ = lim_n ∫ f_n dμ ≥ lim_n ∫ g_n dμ = α. Thus f is an element of 𝒢 for which ∫ f dμ is maximal. Define ν_ac by ν_ac(E) = ∫_E f dμ and ν_s by ν_s(E) = ν(E) − ν_ac(E). Thus

(32.8)    ν(E) = ∫_E f dμ + ν_s(E).
If μ and ν_s are not mutually singular, then by Lemma 2 there exist a set A and a positive ε such that μ(A) > 0 and εμ(E) ≤ ν_s(E) for all E ⊂ A. Then for every E

∫_E (f + εI_A) dμ = ∫_{E−A} f dμ + ∫_{E∩A} f dμ + εμ(E ∩ A) ≤ ∫_{E−A} f dμ + ν_ac(E ∩ A) + ν_s(E ∩ A) = ∫_{E−A} f dμ + ν(E ∩ A) ≤ ν(E − A) + ν(E ∩ A) = ν(E).

In other words, f + εI_A lies in 𝒢; since ∫(f + εI_A) dμ = α + εμ(A) > α, this contradicts the maximality of f. Therefore, μ and ν_s are mutually singular, and there exists an S such that ν_s(S) = μ(S^c) = 0. But since ν ≪ μ, ν_s(S^c) ≤ ν(S^c) = 0, and so ν_s(Ω) = 0. The rightmost term in (32.8) thus drops out. ∎

Absolute continuity was not used until the last step of the proof, and what the argument shows is that ν always has a decomposition (32.8) into an absolutely continuous part and a singular part with respect to μ. This is the Lebesgue decomposition, and it generalizes the one in the preceding section (see (31.31)).

PROBLEMS
32.1. There are two ways to show that the convergence in (32.1) must be absolute: (a) Use the Jordan decomposition. (b) Use the fact that a series converges absolutely if it has the same sum no matter what order the terms are taken in.

32.2. If A⁺ ∪ A⁻ is a Hahn decomposition of φ, there may be other ones. Construct an example of this. Show that there is uniqueness to the extent that φ(A⁺ △ A_1⁺) = φ(A⁻ △ A_1⁻) = 0.

32.3. Show that absolute continuity does not imply the ε-δ condition (32.4) if ν is infinite. Hint: Let the space consist of the integers, let ν be counting measure, and let μ have mass n^{-2} at n. Note that μ is finite and ν is σ-finite.

32.4. Show that the Radon-Nikodym theorem fails if μ is not σ-finite, even if ν is finite. Hint: Let ℱ consist of the countable and the cocountable sets in an uncountable Ω, let μ be counting measure, and let ν(A) be 0 or 1 as A is countable or cocountable.
32.5. Let μ be the restriction to the σ-field of vertical strips [A × R¹: A ∈ ℛ¹] of planar Lebesgue measure λ_2. Define ν on this σ-field by ν(A × R¹) = λ_2(A × (0, 1)). Show that ν is absolutely continuous with respect to μ but has no density. Why does this not contradict the Radon-Nikodym theorem?

32.6. Let μ, ν, and ρ be σ-finite measures on (Ω, ℱ). Assume the Radon-Nikodym derivatives here are everywhere nonnegative and finite.
(a) Show that ν ≪ μ and μ ≪ ρ imply that ν ≪ ρ and

dν/dρ = (dν/dμ)(dμ/dρ).

(b) Show that ν ≡ μ implies

dν/dμ = I_{[dμ/dν > 0]} (dμ/dν)^{−1}.

(c) Suppose that μ ≪ ρ and ν ≪ ρ, and let A be the set where dν/dρ > 0 = dμ/dρ. Show that ν ≪ μ if and only if ρ(A) = 0, in which case

dν/dμ = I_{[dμ/dρ > 0]} (dν/dρ)/(dμ/dρ).
32.7. Show that there is a Lebesgue decomposition (32.8) in the σ-finite as well as the finite case. Prove that it is unique.

32.8. The Radon-Nikodym theorem holds if μ is σ-finite, even if ν is not. Assume at first that μ is finite (and ν ≪ μ).
(a) Let 𝒜 be the class of ℱ-sets B such that μ(E) = 0 or ν(E) = ∞ for each E ⊂ B. Show that 𝒜 contains a set B_0 of maximal μ-measure.
(b) Let 𝒞 be the class of sets in Ω_0 = Ω − B_0 that are countable unions of sets of finite ν-measure. Show that 𝒞 contains a set C_0 of maximal μ-measure.
(c) Let D_0 = Ω_0 − C_0. Deduce from the maximality of B_0 and C_0 that μ(D_0) = ν(D_0) = 0.
(d) Let ν_0(A) = ν(A ∩ C_0). Using the Radon-Nikodym theorem for the pair μ, ν_0, prove it for μ, ν.
(e) Now show that the theorem holds if μ is merely σ-finite.
(f) Show that if the density can be taken everywhere finite, then ν is σ-finite.

32.9. Let μ and ν be finite measures on (Ω, ℱ), and suppose that ℱ_0 is a σ-field contained in ℱ. Then the restrictions μ_0 and ν_0 of μ and ν to ℱ_0 are measures on (Ω, ℱ_0). Let ν_ac, ν_s, ν⁰_ac, ν⁰_s be, respectively, the absolutely continuous and singular parts of ν and ν_0 with respect to μ and μ_0. Show that ν⁰_ac(E) ≥ ν_ac(E) and ν⁰_s(E) ≤ ν_s(E) for E ∈ ℱ_0.

32.10. Suppose that μ, ν, ν_n are finite measures on (Ω, ℱ) and that ν(A) = Σ_n ν_n(A) for all A. Let ν_n(A) = ∫_A f_n dμ + ν′_n(A) and ν(A) = ∫_A f dμ + ν′(A) be the decompositions (32.8); here ν′ and ν′_n are singular with respect to μ.
(a) Show that f = Σ_n f_n except on a set of μ-measure 0 and that ν′(A) = Σ_n ν′_n(A) for all A.
(b) Show that ν ≪ μ if and only if ν_n ≪ μ for all n.
32.11. Suppose that φ(A) = ∫_A f dμ for an integrable f. Show that A⁺ = [ω: f(ω) ≥ 0] and A⁻ = [ω: f(ω) < 0] give a Hahn decomposition for φ, and that the three variations satisfy φ⁺(A) = ∫_A f⁺ dμ, φ⁻(A) = ∫_A f⁻ dμ, and |φ|(A) = ∫_A |f| dμ. Show, conversely, that if φ(A) = 0 whenever μ(A) = 0, then φ has such a representation. Hint: To construct f, start with (32.2).

32.12. A signed measure is a set function that satisfies (32.1) if A_1, A_2, ... are disjoint and may assume one of the values +∞ and −∞ but not both. Extend the Hahn and Jordan decompositions to signed measures.

32.13. Suppose that μ and ν are a probability measure and a σ-finite measure on the line and that ν ≪ μ. Show that the Radon-Nikodym derivative f satisfies

lim_{h↓0} ν(x − h, x + h]/μ(x − h, x + h] = f(x)

on a set of μ-measure 1.

32.14. Find in the unit interval uncountably many probability measures μ_p with supports S_p such that μ_p{x} = 0 for each x and the S_p are disjoint in pairs.

32.15. Let ℱ_0 be the field consisting of the finite and the cofinite sets in an uncountable Ω, and define φ on ℱ_0 by taking φ(A) to be the number of points in A if A is finite and the negative of the number of points in A^c if A is cofinite. Show that (32.1) holds (this is not true if Ω is countable). Show that there are no negative sets for φ (except the empty set), that there is no Hahn decomposition, and that φ does not have bounded range.
SECTION 33. CONDITIONAL PROBABILITY

The concepts of conditional probability and expected value with respect to a σ-field underlie much of modern probability theory. The difficulty in understanding these ideas has to do not with mathematical detail so much as with probabilistic meaning, and the way to get at this meaning is through calculations and examples, of which there are many in this section and the next.

The Discrete Case

Consider first the conditional probability of a set A with respect to another set B. It is defined of course by P(A|B) = P(A ∩ B)/P(B), unless P(B) vanishes, in which case it is not defined at all.

It is helpful to consider conditional probability in terms of an observer in possession of partial information.† A probability space (Ω, ℱ, P) describes

†As always, observer, information, know, and so on are informal, nonmathematical terms; see the related discussion in Section 4 (p. 57).
the working of a mechanism, governed by chance, which produces a result ω distributed according to P; P(A) is for the observer the probability that the point ω produced lies in A. Suppose now that ω lies in B and that the observer learns this fact and no more. From the point of view of the observer, now in possession of this partial information about ω, the probability that ω also lies in A is P(A|B) rather than P(A). This is the idea lying back of the definition. If, on the other hand, ω happens to lie in B^c and the observer learns of this, his probability instead becomes P(A|B^c). These two conditional probabilities can be linked together by the simple function

(33.1)    f(ω) = P(A|B) if ω ∈ B,    f(ω) = P(A|B^c) if ω ∈ B^c.

The observer learns whether ω lies in B or in B^c; his new probability for the event ω ∈ A is then just f(ω). Although the observer does not in general know the argument ω of f, he can calculate the value f(ω) because he knows which of B and B^c contains ω. (Note conversely that from the value f(ω) it is possible to determine whether ω lies in B or in B^c, unless P(A|B) = P(A|B^c), that is, unless A and B are independent, in which case the conditional probability coincides with the unconditional one anyway.)

The sets B and B^c partition Ω, and these ideas carry over to the general partition. Let B_1, B_2, ... be a finite or countable partition of Ω into ℱ-sets, and let 𝒢 consist of all the unions of the B_i. Then 𝒢 is the σ-field generated by the B_i. For A in ℱ, consider the function with values
(33.2)    f(ω) = P(A|B_i) = P(A ∩ B_i)/P(B_i)    if ω ∈ B_i, i = 1, 2, ....

If the observer learns which element B_i of the partition it is that contains ω, then his new probability for the event ω ∈ A is f(ω). The partition {B_i}, or equivalently the σ-field 𝒢, can be regarded as an experiment, and to learn which B_i it is that contains ω is to learn the outcome of the experiment. For this reason the function or random variable f defined by (33.2) is called the conditional probability of A given 𝒢 and is denoted P[A‖𝒢]. This is written P[A‖𝒢]_ω whenever the argument ω needs to be explicitly shown.

Thus P[A‖𝒢] is the function whose value on B_i is the ordinary conditional probability P(A|B_i). This definition needs to be completed, because P(A|B_i) is not defined if P(B_i) = 0. In this case P[A‖𝒢] will be taken to have any constant value on B_i; the value is arbitrary but must be the same over all of the set B_i. If there are nonempty sets B_i for which P(B_i) = 0, P[A‖𝒢] therefore stands for any one of a family of functions on Ω. A specific such function is for emphasis often called a version of the conditional
probability. Note that any two versions are equal except on a set of probability 0.

Example 33.1. Consider the Poisson process. Suppose that 0 < s < t, and let A = [N_s = 0] and B_i = [N_t = i], i = 0, 1, .... Since the increments are independent (Section 23), P(A|B_i) = P[N_s = 0]P[N_t − N_s = i]/P[N_t = i], and since they have Poisson distributions (see (23.9)), a simple calculation reduces this to

(33.3)    P(A|B_i) = (1 − s/t)^i,    i = 0, 1, 2, ....

Since i = N_t(ω) on B_i, this can be written

(33.4)    P[N_s = 0‖𝒢]_ω = (1 − s/t)^{N_t(ω)}.
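The simple calculation behind (33.3) can be checked directly from the Poisson probabilities. This is a sketch; the rate α = 2 and the times s = 1, t = 3 are arbitrary choices of ours.

```python
from math import exp, factorial

def poisson_pmf(lam, i):
    return exp(-lam) * lam ** i / factorial(i)

alpha, s, t = 2.0, 1.0, 3.0   # rate and times; arbitrary choices

def cond_prob(i):
    # P(A | B_i) = P[N_s = 0] P[N_t - N_s = i] / P[N_t = i], by independent increments
    return (poisson_pmf(alpha * s, 0) * poisson_pmf(alpha * (t - s), i)
            / poisson_pmf(alpha * t, i))

for i in range(5):
    print(i, cond_prob(i), (1 - s / t) ** i)   # the two columns agree
```

The rate α cancels entirely, as (33.3) says it must: given N_t = i, the i call times behave like i independent uniform points of [0, t].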
Here the experiment or observation corresponding to {B_i} or 𝒢 determines the number of events (telephone calls, say) occurring in the time interval [0, t]. For an observer who knows this number but not the locations of the calls within [0, t], (33.4) gives his probability for the event that none of them occurred before time s. Although this observer does not know ω, he knows N_t(ω), which is all he needs to calculate the right side of (33.4). ∎

Example 33.2. Suppose that X_0, X_1, ... is a Markov chain with state space S as in Section 8. The events

(33.5)    [X_0 = i_0, ..., X_n = i_n]

form a finite or countable partition of Ω as i_0, ..., i_n range over S. If 𝒢_n is the σ-field generated by this partition, then by the defining condition (8.2) for Markov chains, P[X_{n+1} = j‖𝒢_n]_ω = p_{i_n j} holds for ω in (33.5). The sets

(33.6)    [X_n = i],    i ∈ S,

also partition Ω, and they generate a σ-field 𝒢_n⁰ smaller than 𝒢_n. Now (8.2) also stipulates P[X_{n+1} = j‖𝒢_n⁰]_ω = p_{ij} for ω in (33.6), and the essence of the Markov property is that

(33.7)    P[X_{n+1} = j‖𝒢_n] = P[X_{n+1} = j‖𝒢_n⁰]. ∎

The General Case
· · · ,
as to know which sets in 𝒢 contain ω and which do not. This second way of looking at the matter carries over to the general σ-field 𝒢 contained in ℱ. (As always, the probability space is (Ω, ℱ, P).) The σ-field 𝒢 will not in general come from a partition as above. One can imagine an observer who knows for each G in 𝒢 whether ω ∈ G or ω ∉ G. Thus the σ-field 𝒢 can in principle be identified with an experiment or observation. This is the point of view adopted in Section 4; see p. 57. It is natural to try and define conditional probabilities P[A‖𝒢] with respect to the experiment 𝒢.

To do this, fix an A in ℱ and define a finite measure ν on 𝒢 by

ν(G) = P(A ∩ G),    G ∈ 𝒢.

Then P(G) = 0 implies that ν(G) = 0. The Radon-Nikodym theorem can be applied to the measures ν and P on the measurable space (Ω, 𝒢) because the first one is absolutely continuous with respect to the second.† It follows that there exists a function or random variable f, measurable 𝒢 and integrable with respect to P, such that P(A ∩ G) = ν(G) = ∫_G f dP for all G in 𝒢. Denote this function f by P[A‖𝒢]. It is a random variable with two properties:

(i) P[A‖𝒢] is measurable 𝒢 and integrable.
(ii) P[A‖𝒢] satisfies the functional equation
j P[ A II.:# ] dP = P( A n G) ,
(33.8)
G
G E .§.
There will in general be many such random variables P[A‖𝒢], but any two of them are equal with probability 1. A specific such random variable is called a version of the conditional probability.

If 𝒢 is generated by a partition B_1, B_2, ..., the function f defined by (33.2) is measurable 𝒢 because [ω: f(ω) ∈ H] is the union of those B_i over which the constant value of f lies in H. Any G in 𝒢 is a disjoint union G = ⋃_k B_{i_k}, and P(A ∩ G) = Σ_k P(A|B_{i_k})P(B_{i_k}), so that (33.2) satisfies (33.8) as well. Thus the general definition is an extension of the one for the discrete case.

Condition (i) in the definition above in effect requires that the values of P[A‖𝒢] depend only on the sets in 𝒢. An observer who knows the outcome of 𝒢 viewed as an experiment knows for each G in 𝒢 whether it contains ω or not; for each x he knows this in particular for the set [ω′: P[A‖𝒢]_{ω′} = x],
† Let P_0 be the restriction of P to 𝒢 (Example 10.4), and find on (Ω, 𝒢) a density f for ν with respect to P_0. Then, for G ∈ 𝒢, ν(G) = ∫_G f dP_0 = ∫_G f dP (Example 16.4). If g is another such density, then P[f ≠ g] = P_0[f ≠ g] = 0.
and hence he knows in principle the functional value P[A‖𝒢]_ω even if he does not know ω itself. In Example 33.1 a knowledge of N_t(ω) suffices to determine the value of (33.4); ω itself is not needed.

Condition (ii) in the definition has a gambling interpretation. Suppose that the observer, after he has learned the outcome of 𝒢, is offered the opportunity to bet on the event A (unless A lies in 𝒢, he does not yet know whether or not it occurred). He is required to pay an entry fee of P[A‖𝒢] units and will win 1 unit if A occurs and nothing otherwise. If the observer decides to bet and pays his fee, he gains 1 − P[A‖𝒢] if A occurs and −P[A‖𝒢] otherwise, so that his gain is

(1 − P[A‖𝒢])I_A + (−P[A‖𝒢])I_{Aᶜ} = I_A − P[A‖𝒢].

If he declines to bet, his gain is of course 0. Suppose that he adopts the strategy of betting if G occurs but not otherwise, where G is some set in 𝒢. He can actually carry out this strategy, since after learning the outcome of the experiment 𝒢 he knows whether or not G occurred. His expected gain with this strategy is his gain integrated over G:

∫_G (I_A − P[A‖𝒢]) dP.

But (33.8) is exactly the requirement that this vanish for each G in 𝒢. Condition (ii) requires then that each strategy be fair in the sense that the observer stands neither to win nor to lose on the average. Thus P[A‖𝒢] is the just entry fee, as intuition requires.
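The fairness requirement can be checked exactly in a small discrete model. The sample space, event, and partition below are hypothetical choices made only for illustration; P[A‖𝒢] is computed blockwise as P(A ∩ B_i)/P(B_i), the functional equation (33.8) is verified for every set in the generated σ-field, and the expected gain of each strategy "bet when G occurs" comes out to 0. A minimal sketch in Python:

```python
import itertools
from fractions import Fraction

# Hypothetical finite model: uniform P on six points, an event A, and a
# two-block partition generating the sigma-field G.
omega = set(range(6))
prob = {w: Fraction(1, 6) for w in omega}
A = {0, 1, 2}
partition = [frozenset({0, 1}), frozenset({2, 3, 4, 5})]

def P(S):
    return sum(prob[w] for w in S)

# A version of P[A || G]: on each block B, the constant P(A ∩ B)/P(B).
cond = {}
for B in partition:
    for w in B:
        cond[w] = P(A & B) / P(B)

# (33.8): integrating P[A || G] over any G in G recovers P(A ∩ G).
checks = []
for r in range(len(partition) + 1):
    for blocks in itertools.combinations(partition, r):
        G = frozenset().union(*blocks)
        checks.append(sum(cond[w] * prob[w] for w in G) == P(A & G))

# Fair-bet interpretation: the gain I_A - P[A || G] integrates to 0 over each G.
gains = [sum(((1 if w in A else 0) - cond[w]) * prob[w] for w in B)
         for B in partition]
print(all(checks), gains)
```

Exact rational arithmetic makes the fairness identity hold with no rounding error.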
Example 33.3. Suppose that A ∈ 𝒢, which will always hold if 𝒢 coincides with the whole σ-field ℱ. Then I_A satisfies conditions (i) and (ii), so that P[A‖𝒢] = I_A with probability 1. If A ∈ 𝒢, then to know the outcome of 𝒢 viewed as an experiment is in particular to know whether or not A has occurred. •
Example 33.4. If 𝒢 is {∅, Ω}, the smallest possible σ-field, every function measurable 𝒢 must be constant. Therefore, P[A‖𝒢]_ω = P(A) for all ω in this case. The observer learns nothing from the experiment 𝒢. •

According to these two examples, P[A‖{∅, Ω}] is identically P(A), whereas I_A is a version of P[A‖ℱ]. For any 𝒢, the function identically equal to P(A) satisfies condition (i) in the definition of conditional probability, whereas I_A satisfies condition (ii). Condition (i) becomes more stringent as 𝒢 decreases, and condition (ii) becomes more stringent as 𝒢 increases. The two conditions work in opposite directions and between them delimit the class of versions of P[A‖𝒢].
Example 33.5. Let Ω be the plane R² and let ℱ be the class ℛ² of planar Borel sets. A point of Ω is a pair (x, y) of reals. Let 𝒢 be the σ-field consisting of the vertical strips, the product sets E × R¹ = [(x, y): x ∈ E], where E is a linear Borel set. If the observer knows for each strip E × R¹ whether or not it contains (x, y), then, as he knows this for each one-point set E, he knows the value of x. Thus the experiment 𝒢 consists in the determination of the first coordinate of the sample point.

Suppose now that P is a probability measure on ℛ² having a density f(x, y) with respect to planar Lebesgue measure: P(A) = ∫∫_A f(x, y) dx dy. Let A be a horizontal strip R¹ × F = [(x, y): y ∈ F], F being a linear Borel set. The conditional probability P[A‖𝒢] can be calculated explicitly. Put

(33.9)  φ(x, y) = ∫_F f(x, t) dt / ∫_{R¹} f(x, t) dt.

Set φ(x, y) = 0, say, at points where the denominator here vanishes; these points form a set of P-measure 0. Since φ(x, y) is a function of x alone, it is measurable 𝒢. The general element of 𝒢 being E × R¹, it will follow that φ is a version of P[A‖𝒢] if it is shown that

(33.10)  ∫_{E×R¹} φ(x, y) dP(x, y) = P(A ∩ (E × R¹)).

Since A = R¹ × F, the right side here is P(E × F). Since P has density f, Theorem 16.11 and Fubini's theorem reduce the left side to

∫_E [∫_{R¹} φ(x, y) f(x, y) dy] dx = ∫_E [∫_F f(x, t) dt] dx = ∫∫_{E×F} f(x, y) dx dy = P(E × F).

Thus (33.9) does give a version of P[R¹ × F‖𝒢]. •

The right side of (33.9) is the classical formula for the conditional probability of the event R¹ × F (the event that y ∈ F) given the event {x} × R¹ (given the value of x). Since the event {x} × R¹ has probability 0, the formula P(A|B) = P(A ∩ B)/P(B) does not work here. The whole point of this section is the systematic development of a notion of conditional probability that covers conditioning with respect to events of probability 0. This is accomplished by conditioning with respect to collections of events, that is, with respect to σ-fields 𝒢.
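The computation in (33.10) can be checked numerically. The density f(x, y) = x + y on the unit square (which integrates to 1) and the strips E and F below are hypothetical choices for illustration; the left and right sides of (33.10) are approximated by midpoint Riemann sums and agree, both near 1/8. A sketch:

```python
# Numerical check of (33.9)-(33.10) for the hypothetical density
# f(x, y) = x + y on the unit square.  A = R x F is a horizontal strip,
# G = E x R a vertical strip, and phi depends on x alone.

def f(x, y):
    return x + y

N = 200
h = 1.0 / N
grid = [(i + 0.5) * h for i in range(N)]   # midpoint rule on [0, 1]
E = [x for x in grid if x <= 0.5]
F = [y for y in grid if y <= 0.5]

def phi(x):
    # (33.9): conditional probability of [y in F] given the value of x
    return sum(f(x, t) for t in F) / sum(f(x, t) for t in grid)

phi_at = {x: phi(x) for x in grid}

# Left side of (33.10): integrate phi over E x R against dP = f(x, y) dx dy.
lhs = sum(phi_at[x] * f(x, y) * h * h for x in E for y in grid)

# Right side: P(A ∩ (E x R)) = P(E x F), which is 1/8 for this density.
rhs = sum(f(x, y) * h * h for x in E for y in F)

print(lhs, rhs)
```

The agreement of the two sums is exactly the content of the functional equation for this 𝒢.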
Example 33.6. The set A is by definition independent of the σ-field 𝒢 if it is independent of each G in 𝒢: P(A ∩ G) = P(A)P(G). This being the same thing as P(A ∩ G) = ∫_G P(A) dP, A is independent of 𝒢 if and only if P[A‖𝒢] = P(A) with probability 1. •
The σ-field σ(X) generated by a random variable X consists of the sets [ω: X(ω) ∈ H] for H ∈ ℛ¹; see Theorem 20.1. The conditional probability of A given X is defined as P[A‖σ(X)] and is denoted P[A‖X]. Thus P[A‖X] = P[A‖σ(X)] by definition. From the experiment corresponding to the σ-field σ(X), one learns which of the sets [ω′: X(ω′) = x] contains ω and hence learns the value X(ω). Example 33.5 is a case of this: take X(x, y) = x for (x, y) in the sample space Ω = R² there.

This definition applies without change to a random vector, or, equivalently, to a finite set of random variables. It can be adapted to arbitrary sets of random variables as well. For any such set [X_t, t ∈ T], the σ-field σ[X_t, t ∈ T] it generates is the smallest σ-field with respect to which each X_t is measurable. It is generated by the collection of sets of the form [ω: X_t(ω) ∈ H] for t in T and H in ℛ¹. The conditional probability P[A‖X_t, t ∈ T] of A with respect to this set of random variables is by definition the conditional probability P[A‖σ[X_t, t ∈ T]] of A with respect to the σ-field σ[X_t, t ∈ T]. In this notation the property (33.7) of Markov chains becomes

(33.11)  P[X_{n+1} = j‖X_0, ..., X_n] = P[X_{n+1} = j‖X_n].

The conditional probability of [X_{n+1} = j] is the same for someone who knows the present state X_n as for someone who knows the present state X_n and the past states X_0, ..., X_{n−1} as well. •
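The equality (33.11) can be observed empirically. Below, a two-state chain with a hypothetical transition matrix is simulated; the relative frequency of X_{n+1} = 1 among steps with X_n = 1 is about p_{11} = 0.6 regardless of the value of X_{n−1}, which is the content of the Markov property. A sketch in Python:

```python
import random
from collections import Counter

random.seed(1)

# Hypothetical transition probabilities p_{ij} for a two-state chain.
p = {0: [0.7, 0.3], 1: [0.4, 0.6]}

path = [0]
for _ in range(200_000):
    path.append(random.choices([0, 1], weights=p[path[-1]])[0])

counts = Counter()
for a, b, c in zip(path, path[1:], path[2:]):
    counts[(a, b, c)] += 1      # observed (X_{n-1}, X_n, X_{n+1}) triples

# Conditional frequency of X_{n+1} = 1 given X_n = 1 and X_{n-1} = a:
# by (33.11) it should be near p[1][1] = 0.6 for both values of a.
freqs = {a: counts[(a, 1, 1)] / (counts[(a, 1, 0)] + counts[(a, 1, 1)])
         for a in (0, 1)}
print(freqs)
```

Conditioning on the extra past state leaves the empirical frequency essentially unchanged.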
Example 33.7. Let X and Y be random vectors of dimensions j and k, let μ be the distribution of X over R^j, and suppose that X and Y are independent. According to (20.30),

P([(X, Y) ∈ J] ∩ [X ∈ H]) = ∫_H P[(x, Y) ∈ J] μ(dx)

for H ∈ ℛ^j and J ∈ ℛ^{j+k}. This is a consequence of Fubini's theorem; it has a conditional-probability interpretation. For each x in R^j put

(33.12)  f(x) = P[(x, Y) ∈ J] = P[ω′: (x, Y(ω′)) ∈ J].

By Theorem 20.1(ii), f(X(ω)) is measurable σ(X), and since μ is the distribution of X, a change of variable gives

∫_{[X ∈ H]} f(X(ω)) P(dω) = ∫_H f(x) μ(dx) = P([(X, Y) ∈ J] ∩ [X ∈ H]).
Since [X ∈ H] is the general element of σ(X), this proves that

(33.13)  P[(X, Y) ∈ J‖X]_ω = f(X(ω))

with probability 1. •

The fact just proved can be written

P[(X, Y) ∈ J‖X]_ω = P[(X(ω), Y) ∈ J] = P[ω′: (X(ω), Y(ω′)) ∈ J].

Replacing ω′ by ω on the right here causes a notational collision like the one replacing y by x causes in ∫_F f(x, y) dy.

Suppose that X and Y are independent random variables and that Y has distribution function F. For J = [(u, v): max{u, v} ≤ m], (33.12) is 0 for m < x and F(m) for m ≥ x; if M = max{X, Y}, then (33.13) gives

(33.14)  P[M ≤ m‖X]_ω = I_{[X ≤ m]}(ω) F(m)

with probability 1. All equations involving conditional probabilities must be qualified in this way by the phrase "with probability 1," because the conditional probability is unique only to within a set of probability 0.

The following theorem is useful for checking conditional probabilities.
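A Monte Carlo check of (33.14). Taking X and Y standard normal is an assumption made here only for concreteness, so that F is the normal distribution function; the proposed version I_{[X ≤ m]}F(m) integrates over a generating set G = [X ≤ a] of σ(X) to the same value as P([M ≤ m] ∩ G), as (33.8) requires. A sketch:

```python
import math
import random

random.seed(7)

def F(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

m, a = 0.3, -0.2        # hypothetical levels; G = [X <= a] lies in sigma(X)
n = 200_000
lhs = rhs = 0.0
for _ in range(n):
    x, y = random.gauss(0, 1), random.gauss(0, 1)
    if x <= a:
        lhs += 1.0 if max(x, y) <= m else 0.0   # contributes to P([M <= m] ∩ G)
        rhs += F(m) if x <= m else 0.0          # proposed version (33.14)
lhs /= n
rhs /= n
print(lhs, rhs)
```

Any other generating set [X ≤ a] could be used in place of the one chosen here.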
Theorem 33.1. Let 𝒫 be a π-system generating the σ-field 𝒢, and suppose that Ω is a finite or countable union of sets in 𝒫. An integrable function f is a version of P[A‖𝒢] if it is measurable 𝒢 and if

(33.15)  ∫_G f dP = P(A ∩ G)

holds for all G in 𝒫.

PROOF. Apply Theorem 10.4. •

The condition that Ω is a finite or countable union of 𝒫-sets cannot be suppressed; see Example 10.5.
Example 33.8. Suppose that X and Y are independent random variables with a common distribution function F that is positive and continuous. What is the conditional probability of [X ≤ x] given the random variable M = max{X, Y}? Clearly it should be 1 if M ≤ x. Suppose that M > x. Since X ≤ x requires M = Y, the chance of which is ½ by symmetry, the conditional probability of [X ≤ x] should by independence be ½P[X ≤ x | X ≤ m] = ½F(x)/F(m), with the random variable M substituted for m. Intuition thus gives

(33.16)  P[X ≤ x‖M]_ω = I_{[M ≤ x]}(ω) + (F(x)/2F(M(ω))) I_{[x < M]}(ω).

It suffices to check (33.15) for sets G = [M ≤ m], because these form a π-system generating σ(M). The functional equation reduces to

P[M ≤ min{x, m}] + ∫_{[x < M ≤ m]} (F(x)/2F(M)) dP = P[M ≤ m, X ≤ x].

Since the other case is easy, suppose that x < m. Since the distribution of (X, Y) is product measure, it follows by Fubini's theorem and the assumed continuity of F that

(33.17)  ∫_{[x < M ≤ m]} (F(x)/2F(M)) dP = ∫_{(x, m]} (F(x)/2F(t)) dF²(t) = F(x)(F(m) − F(x)),

and the two sides of the functional equation agree: F²(x) + F(x)(F(m) − F(x)) = F(x)F(m) = P[M ≤ m, X ≤ x]. •

Similarly, P[∅‖𝒢]_ω = 0, and, if the A_n are disjoint, P[⋃_n A_n‖𝒢]_ω = Σ_n P[A_n‖𝒢]_ω. Therefore, P[A‖𝒢]_ω is a probability measure as A ranges over ℱ. Thus conditional probabilities behave like probabilities at points of positive probability. That they may not do so at points of probability 0 causes no problem because individual such points have no effect on the probabilities of sets. Of course, sets of points individually having probability 0 do have an effect, but here the global point of view reenters.
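The intuitive formula (33.16) can also be tested by simulation. Choosing X and Y standard normal is an assumption for concreteness (any positive continuous F would do); both sides of the functional equation over a generating set G = [M ≤ m] are estimated below and agree. A sketch:

```python
import math
import random

random.seed(11)

def F(t):
    """Standard normal distribution function (positive and continuous)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

x, m = -0.5, 1.0        # hypothetical levels with x < m
n = 200_000
lhs = rhs = 0.0
for _ in range(n):
    u, v = random.gauss(0, 1), random.gauss(0, 1)
    M = max(u, v)
    if M <= m:                                  # the generating set G = [M <= m]
        lhs += 1.0 if u <= x else 0.0           # contributes to P([X <= x] ∩ G)
        rhs += 1.0 if M <= x else F(x) / (2.0 * F(M))   # version (33.16)
lhs /= n
rhs /= n
print(lhs, rhs)
```

Both estimates approximate F(x)F(m), in agreement with the computation in (33.17).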
Conditional Probability Distributions

Let X be a random variable on (Ω, ℱ, P), and let 𝒢 be a σ-field in ℱ.

Theorem 33.3. There exists a function μ(H, ω), defined for H in ℛ¹ and ω in Ω, with these two properties:

(i) For each ω in Ω, μ(·, ω) is a probability measure on ℛ¹.
(ii) For each H in ℛ¹, μ(H, ·) is a version of P[X ∈ H‖𝒢].

The probability measure μ(·, ω) is a conditional distribution of X given 𝒢. If 𝒢 = σ(Z), it is a conditional distribution of X given Z.

PROOF. For rationals r < s, P[X ≤ r‖𝒢] ≤ P[X ≤ s‖𝒢] with probability 1 by (33.23).
If P(B_i) = 0, the value of E[X‖𝒢] over B_i is constant but arbitrary.
Example 34.2. For an indicator I_A the defining properties of E[I_A‖𝒢] and P[A‖𝒢] coincide; therefore, E[I_A‖𝒢] = P[A‖𝒢] with probability 1. It is easily checked that, more generally, E[X‖𝒢] = Σ_i a_i P[A_i‖𝒢] with probability 1 for a simple function X = Σ_i a_i I_{A_i}. •
In analogy with the case of conditional probability, if [X_t, t ∈ T] is a collection of random variables, E[X‖X_t, t ∈ T] is by definition E[X‖𝒢] with σ[X_t, t ∈ T] in the role of 𝒢.
Example 34.3. Let 𝒥 be the σ-field of sets invariant under a measure-preserving transformation T on (Ω, ℱ, P). For f integrable, the limit f̂ in (24.7) is E[f‖𝒥]: Since f̂ is invariant, it is measurable 𝒥. If G is invariant, then the averages a_n in the proof of the ergodic theorem (p. 318) satisfy E[I_G a_n] = E[I_G f]. But since the a_n converge to f̂ and are uniformly integrable, E[I_G f̂] = E[I_G f]. •
Properties of Conditional Expectation

Theorem 34.1. Let 𝒫 be a π-system generating the σ-field 𝒢, and suppose that Ω is a finite or countable union of sets in 𝒫. An integrable function f is a version of E[X‖𝒢] if it is measurable 𝒢 and if

(34.3)  ∫_G f dP = ∫_G X dP

holds for all G in 𝒫.

PROOF. Apply Theorem 16.10(iii) to f and E[X‖𝒢] on (Ω, 𝒢, P). •

In most applications it is clear that Ω ∈ 𝒫. All the equalities and inequalities in the following theorem hold with probability 1.
Theorem 34.2. Suppose that X, Y, X_n are integrable.

(i) If X = a with probability 1, then E[X‖𝒢] = a.
(ii) For constants a and b, E[aX + bY‖𝒢] = aE[X‖𝒢] + bE[Y‖𝒢].
(iii) If X ≤ Y with probability 1, then E[X‖𝒢] ≤ E[Y‖𝒢].
(iv) |E[X‖𝒢]| ≤ E[|X|‖𝒢].
(v) If lim_n X_n = X with probability 1, |X_n| ≤ Y, and Y is integrable, then lim_n E[X_n‖𝒢] = E[X‖𝒢] with probability 1.

PROOF. If X = a with probability 1, the function identically equal to a satisfies conditions (i) and (ii) in the definition of E[X‖𝒢], and so (i) above follows by uniqueness. As for (ii), aE[X‖𝒢] + bE[Y‖𝒢] is integrable and measurable 𝒢, and

∫_G (aE[X‖𝒢] + bE[Y‖𝒢]) dP = a∫_G E[X‖𝒢] dP + b∫_G E[Y‖𝒢] dP = a∫_G X dP + b∫_G Y dP = ∫_G (aX + bY) dP

for all G in 𝒢, so that this function satisfies the functional equation.

If X ≤ Y with probability 1, then ∫_G (E[Y‖𝒢] − E[X‖𝒢]) dP = ∫_G (Y − X) dP ≥ 0 for all G in 𝒢. Since E[Y‖𝒢] − E[X‖𝒢] is measurable 𝒢, it must be nonnegative with probability 1 (consider the set G where it is negative). This proves (iii), which clearly implies (iv) as well as the fact that E[X‖𝒢] = E[Y‖𝒢] if X = Y with probability 1.

To prove (v), consider Z_n = sup_{k ≥ n} |X_k − X|. Now Z_n ↓ 0 with probability 1, and by (ii), (iii), and (iv), |E[X_n‖𝒢] − E[X‖𝒢]| ≤ E[Z_n‖𝒢]. It suffices, therefore, to show that E[Z_n‖𝒢] ↓ 0 with probability 1. By (iii) the sequence E[Z_n‖𝒢] is nonincreasing and hence has a limit Z; the problem is to prove that Z = 0 with probability 1, or, Z being nonnegative, that E[Z] = 0. But 0 ≤ Z_n ≤ 2Y, and so (34.1) and the dominated convergence theorem give E[Z] = ∫E[Z‖𝒢] dP ≤ ∫E[Z_n‖𝒢] dP = E[Z_n] → 0. •
The properties (33.21) through (33.28) can be derived anew from Theorem 34.2. Part (ii) shows once again that E[Σ_i a_i I_{A_i}‖𝒢] = Σ_i a_i P[A_i‖𝒢] for simple functions. If X is measurable 𝒢, then clearly E[X‖𝒢] = X with probability 1. The following generalization of this is used constantly. For an observer with the information in 𝒢, X is effectively a constant if it is measurable 𝒢:

Theorem 34.3. If X is measurable 𝒢, and if Y and XY are integrable, then

(34.4)  E[XY‖𝒢] = XE[Y‖𝒢]

with probability 1.
PROOF. It will be shown first that the right side of (34.4) is a version of the left side if X = I_{G_0} and G_0 ∈ 𝒢. Since I_{G_0}E[Y‖𝒢] is certainly measurable 𝒢, it suffices to show that it satisfies the functional equation ∫_G I_{G_0}E[Y‖𝒢] dP = ∫_G I_{G_0}Y dP, G ∈ 𝒢. But this reduces to ∫_{G ∩ G_0} E[Y‖𝒢] dP = ∫_{G ∩ G_0} Y dP, which holds by the definition of E[Y‖𝒢]. Thus (34.4) holds if X is the indicator of an element of 𝒢. It follows by Theorem 34.2(ii) that (34.4) holds if X is a simple function measurable 𝒢.

For the general X that is measurable 𝒢, there exist simple functions X_n, measurable 𝒢, such that |X_n| ≤ |X| and lim_n X_n = X (Theorem 13.5). Since |X_n Y| ≤ |XY| and |XY| is integrable, Theorem 34.2(v) implies that lim_n E[X_n Y‖𝒢] = E[XY‖𝒢] with probability 1. But E[X_n Y‖𝒢] = X_n E[Y‖𝒢] by the case already treated, and of course lim_n X_n E[Y‖𝒢] = XE[Y‖𝒢]. (Note that |X_n E[Y‖𝒢]| = |E[X_n Y‖𝒢]| ≤ E[|X_n Y|‖𝒢] ≤ E[|XY|‖𝒢], so that the limit XE[Y‖𝒢] is integrable.) Thus (34.4) holds in general. Notice that X has not been assumed integrable. •

*This topic may be omitted.  †See Problem 34.19.
If X_i is the function on R^k defined by X_i(x) = x_i, then under P_θ, X_1, ..., X_k are independent random variables, each uniformly distributed over [0, θ]. Let T(x) = max_{i ≤ k} X_i(x). If g_θ(t) is θ^{−k} for 0 ≤ t ≤ θ and 0 otherwise, and if h(x) is 1 or 0 according as all the x_i are nonnegative or not, then f_θ(x) = g_θ(T(x))h(x). The factorization criterion is thus satisfied, and T is a sufficient statistic.

Sufficiency is clear on intuitive grounds as well: θ is not involved in the conditional distribution of X_1, ..., X_k given T because, roughly speaking, a random one of them equals T and the others are independent and uniform over [0, T]. If this is true, the distribution of X_i given T ought to have a mass of k^{−1} at T and a uniform distribution of mass 1 − k^{−1} over [0, T], so that

(34.9)  E_θ[X_i‖T] = (1/k)T + (1 − 1/k)(T/2) = ((k+1)/2k)T.

For a proof of this fact, needed later, note that by (21.9)

(34.10)  ∫_{[T ≤ t]} X_i dP_θ = ∫_0^t u (t/θ)^{k−1} (du/θ) = t^{k+1}/2θ^k

if 0 ≤ t ≤ θ. On the other hand, P_θ[T ≤ t] = (t/θ)^k, so that under P_θ the distribution of T has density kt^{k−1}/θ^k over [0, θ]. Thus

(34.11)  ∫_{[T ≤ t]} ((k+1)/2k)T dP_θ = ((k+1)/2k) ∫_0^t u · (ku^{k−1}/θ^k) du = t^{k+1}/2θ^k.

Since (34.10) and (34.11) agree, (34.9) follows by Theorem 34.1. •
The essential ideas in the proof of Theorem 34.6 are most easily understood through a preliminary consideration of special cases.
Lemma 1. Suppose that [P_θ: θ ∈ Θ] is dominated by a probability measure P and that each P_θ has with respect to P a density g_θ that is measurable 𝒢. Then 𝒢 is sufficient, and P[A‖𝒢] is a version of P_θ[A‖𝒢] for each θ in Θ.

PROOF. For G in 𝒢, (34.4) gives

∫_G P[A‖𝒢] dP_θ = ∫_G g_θ P[A‖𝒢] dP = ∫_G E[g_θ I_A‖𝒢] dP = ∫_G g_θ I_A dP = P_θ(A ∩ G).

Therefore, P[A‖𝒢] (the conditional probability calculated with respect to P) does serve as a version of P_θ[A‖𝒢] for each θ in Θ. Thus 𝒢 is sufficient for the family
[P_θ: θ ∈ Θ], even for this family augmented by P (which might happen to lie in the family to start with). •

For the necessity, suppose first that the family is dominated by one of its members.

Lemma 2. Suppose that [P_θ: θ ∈ Θ] is dominated by P_{θ_0} for some θ_0 ∈ Θ. If 𝒢 is sufficient, then each P_θ has with respect to P_{θ_0} a density g_θ that is measurable 𝒢.

PROOF. Let p(A, ω) be the function in the definition of sufficiency, and take P_θ[A‖𝒢]_ω = p(A, ω) for all A ∈ ℱ, ω ∈ Ω, and θ ∈ Θ. Let d_θ be any density of P_θ with respect to P_{θ_0}. By a number of applications of (34.4),

∫_A E_{θ_0}[d_θ‖𝒢] dP_{θ_0} = P_θ(A),  A ∈ ℱ,

the next-to-last equality in the chain holding by sufficiency (the integrand on either side being p(A, ·)). Thus g_θ = E_{θ_0}[d_θ‖𝒢], which is measurable 𝒢, can serve as a density for P_θ with respect to P_{θ_0}. •
To complete the proof of Theorem 34.6 requires one more lemma of a technical sort.
Lemma 3. If [P_θ: θ ∈ Θ] is dominated by a σ-finite measure, then it is equivalent to some finite or countably infinite subfamily.

In many examples, the P_θ are all equivalent to each other, in which case the subfamily can be taken to consist of a single P_{θ_0}.

PROOF. Since μ is σ-finite, there is a finite or countable partition of Ω into ℱ-sets A_n such that 0 < μ(A_n) < ∞. Choose positive constants a_n, one for each A_n, in such a way that Σ_n a_n < ∞. The finite measure with value Σ_n a_n μ(A ∩ A_n)/μ(A_n) at A dominates μ. In proving the lemma it is therefore no restriction to assume the family dominated by a finite measure μ.

Each P_θ is dominated by μ and hence has a density f_θ with respect to it. Let S_θ = [ω: f_θ(ω) > 0]. Then P_θ(A) = P_θ(A ∩ S_θ) for all A, and P_θ(A) = 0 if and only if μ(A ∩ S_θ) = 0. In particular, S_θ supports P_θ.

Call a set B in ℱ a kernel if B ⊂ S_θ for some θ, and call a finite or countable union of kernels a chain. Let α be the supremum of μ(C) over chains C. Since μ is finite and a finite or countable union of chains is a chain, α is finite and μ(C) = α for some chain C. Suppose that C = ⋃_n B_n, where each B_n is a kernel, and suppose that B_n ⊂ S_{θ_n}. The problem is to show that [P_θ: θ ∈ Θ] is dominated by [P_{θ_n}: n = 1, 2, ...] and hence equivalent to it.

Suppose that P_{θ_n}(A) = 0 for all n. Then μ(A ∩ S_{θ_n}) = 0, as observed above. Since C ⊂ ⋃_n S_{θ_n}, μ(A ∩ C) = 0, and it follows that P_θ(A ∩ C) = 0
whatever θ may be. But suppose that P_θ(A − C) > 0. Then P_θ((A − C) ∩ S_θ) = P_θ(A − C) is positive, and so (A − C) ∩ S_θ is a kernel, disjoint from C, of positive μ-measure; this is impossible because of the maximality of C. Thus P_θ(A − C) is 0 along with P_θ(A ∩ C), and so P_θ(A) = 0. •
Suppose that [P_θ: θ ∈ Θ] is dominated by a σ-finite measure μ. By Lemma 3 it is equivalent to a finite or countable subfamily [P_{θ_n}: n = 1, 2, ...]. For positive constants c_n adding to 1, put

(34.12)  P = Σ_n c_n P_{θ_n}.

Then P is equivalent to [P_{θ_n}: n = 1, 2, ...] and hence to [P_θ: θ ∈ Θ], and all three are equivalent. If f_θ is a density for P_θ with respect to μ, then P has with respect to μ the density

(34.13)  h = Σ_n c_n f_{θ_n}.

PROOF OF SUFFICIENCY IN THEOREM 34.6. If each P_θ has density g_θ h with respect to μ, then by the construction (34.12), P has density fh with respect to μ, where f = Σ_n c_n g_{θ_n}. Put r_θ = g_θ/f if f > 0, and r_θ = 0 (say) if f = 0. If each g_θ is measurable 𝒢, the same is true of f and hence of the r_θ. Since P[f = 0] = 0 and P is equivalent to the entire family, P_θ[f = 0] = 0 for all θ. Therefore,

∫_A r_θ dP = ∫_A r_θ fh dμ = ∫_{A ∩ [f > 0]} r_θ fh dμ = ∫_{A ∩ [f > 0]} g_θ h dμ = P_θ(A).

Each P_θ thus has with respect to the probability measure P a density measurable 𝒢, and it follows by Lemma 1 that 𝒢 is sufficient. •

PROOF OF NECESSITY IN THEOREM
34.6. Let p(A, ω) be a function such that, for each A and θ, p(A, ·) is a version of P_θ[A‖𝒢], as required by the definition of sufficiency. For P as in (34.12) and G ∈ 𝒢,

(34.14)  ∫_G p(A, ω) P(dω) = Σ_n c_n ∫_G p(A, ω) P_{θ_n}(dω) = Σ_n c_n ∫_G P_{θ_n}[A‖𝒢] dP_{θ_n} = Σ_n c_n P_{θ_n}(A ∩ G) = P(A ∩ G).

Thus p(A, ·) serves as a version of P[A‖𝒢] as well, and 𝒢 is still sufficient if P is added to the family. Since P dominates the augmented family, Lemma 2 implies that each P_θ has with respect to P a density g_θ that is measurable 𝒢. But if h is the density of P with respect to μ (see (34.13)), then P_θ has density g_θ h with respect to μ. •
A sub-σ-field 𝒢_0 sufficient with respect to [P_θ: θ ∈ Θ] is minimal if, for each sufficient 𝒢, 𝒢_0 is essentially contained in 𝒢 in the sense that for each A in 𝒢_0 there is a B in 𝒢 such that P_θ(A △ B) = 0 for all θ in Θ. A sufficient 𝒢 represents a compression of the information in ℱ, and a minimal sufficient 𝒢_0 represents the greatest possible compression.

Suppose the densities f_θ of the P_θ with respect to μ have the property that f_θ(ω) is measurable 𝒞 × ℱ, where 𝒞 is a σ-field in Θ. Let π be a probability measure on 𝒞, and define P as ∫_Θ P_θ π(dθ), in the sense that P(A) = ∫_Θ ∫_A f_θ(ω) μ(dω) π(dθ) = ∫_Θ P_θ(A) π(dθ). Obviously, P ≪ [P_θ: θ ∈ Θ]. Assume that

(34.15)  [P_θ: θ ∈ Θ] ≪ P.

If π has mass c_n at θ_n, then P is given by (34.12), and of course, (34.15) holds if (34.13) does. Let r_θ be a density for P_θ with respect to P.

Theorem 34.7. If (34.15) holds, then 𝒢_0 = σ[r_θ: θ ∈ Θ] is a minimal sufficient sub-σ-field.

PROOF. That 𝒢_0 is sufficient follows by Theorem 34.6. Suppose that 𝒢 is sufficient. It follows by a simple extension of (34.14) that 𝒢 is still sufficient if P is added to the family, and then it follows by Lemma 2 that each P_θ has with respect to P a density g_θ that is measurable 𝒢. Since densities are essentially unique, P[g_θ = r_θ] = 1.

Let ℋ be the class of A in 𝒢_0 such that P(A △ B) = 0 for some B in 𝒢. Then ℋ is a σ-field containing each set of the form A = [r_θ ∈ H] (take B = [g_θ ∈ H]) and hence containing 𝒢_0. Since, by (34.15), P dominates each P_θ, 𝒢_0 is essentially contained in 𝒢, in the sense of the definition. •
Minimum-Variance Estimation*
To illustrate sufficiency, let g be a real function on Θ, and consider the problem of estimating g(θ). One possibility is that Θ is a subset of the line and g is the identity; another is that Θ is a subset of R^k and g picks out one of the coordinates. (This problem is considered from a slightly different point of view at the end of Section 19.) An estimate of g(θ) is a random variable Z, and the estimate is unbiased if E_θ[Z] = g(θ) for all θ. One measure of the accuracy of the estimate Z is E_θ[(Z − g(θ))²].

If 𝒢 is sufficient, it follows by linearity (Theorem 34.2(ii)) that E_θ[X‖𝒢] has for simple X a version that is independent of θ. Since there are simple X_n such that |X_n| ≤ |X| and X_n → X, the same is true of any X that is integrable with respect to each P_θ (use Theorem 34.2(v)). Suppose that 𝒢 is, in fact, sufficient, and denote by E[X‖𝒢] a version of E_θ[X‖𝒢] that is independent of θ.

Theorem 34.8. Suppose that E_θ[(Z − g(θ))²] < ∞ for all θ and that 𝒢 is sufficient. Then

(34.16)  E_θ[(E[Z‖𝒢] − g(θ))²] ≤ E_θ[(Z − g(θ))²]

for all θ. If Z is unbiased, then so is E[Z‖𝒢].

*This topic may be omitted.
PROOF. By Jensen's inequality (34.7) for φ(x) = (x − g(θ))², (E[Z‖𝒢] − g(θ))² ≤ E_θ[(Z − g(θ))²‖𝒢]. Applying E_θ to each side gives (34.16). The second statement follows from the fact that E_θ[E[Z‖𝒢]] = E_θ[Z]. •

This, the Rao–Blackwell theorem, says that E[Z‖𝒢] is at least as good an estimate as Z if 𝒢 is sufficient.

Example 34.5. Returning to Example 34.4, note that each X_i has mean θ/2 under P_θ, so that if X̄ = k^{−1}Σ_{i=1}^k X_i is the sample mean, then 2X̄ is an unbiased estimate of θ. But there is a better one. By (34.9), E_θ[2X̄‖T] = (k+1)T/k = T′, and by the Rao–Blackwell theorem, T′ is an unbiased estimate with variance at most that of 2X̄.

In fact, for an arbitrary unbiased estimate Z, E_θ[(T′ − θ)²] ≤ E_θ[(Z − θ)²]. To prove this, let δ = T′ − E[Z‖T]. By Theorem 20.1(ii), δ = f(T) for some Borel function f, and E_θ[f(T)] = 0 for all θ. Taking account of the density for T leads to ∫_0^θ f(x)x^{k−1} dx = 0, so that f(x)x^{k−1} integrates to 0 over all intervals. Therefore, f(x) along with f(x)x^{k−1} vanishes for x > 0, except on a set of Lebesgue measure 0, and hence P_θ[f(T) = 0] = 1 and P_θ[T′ = E[Z‖T]] = 1 for all θ. Therefore, E_θ[(T′ − θ)²] = E_θ[(E[Z‖T] − θ)²] ≤ E_θ[(Z − θ)²] for Z unbiased, and T′ has minimum variance among all unbiased estimates of θ. •
PROBLEMS

34.1. Work out for conditional expected values the analogues of Problems 33.4, 33.5, and 33.9.

34.2. In the context of Examples 33.5 and 33.12, show that the conditional expected value of Y (if it is integrable) given X is g(X), where

g(x) = ∫_{−∞}^{∞} y f(x, y) dy / ∫_{−∞}^{∞} f(x, y) dy.

34.3. Show that the independence of X and Y implies that E[Y‖X] = E[Y], which in turn implies that E[XY] = E[X]E[Y]. Show by examples in an Ω of three points that the reverse implications are both false.
34.4. (a) Let B be an event with P(B) > 0, and define a probability measure P_0 by P_0(A) = P(A|B). Show that P_0[A‖𝒢] = P[A ∩ B‖𝒢]/P[B‖𝒢] on a set of P_0-measure 1.
(b) Suppose that ℋ is generated by a partition B_1, B_2, ..., and let 𝒢 ∨ ℋ = σ(𝒢 ∪ ℋ). Show that with probability 1,

P[A‖𝒢 ∨ ℋ] = Σ_i I_{B_i} P[A ∩ B_i‖𝒢] / P[B_i‖𝒢].
34.5. The equation (34.5) was proved by showing that the left side is a version of the right side. Prove it by showing that the right side is a version of the left side.

34.6. Prove for bounded X and Y that E[YE[X‖𝒢]] = E[XE[Y‖𝒢]].

34.7. 33.9↑ Generalize Theorem 34.5 by replacing X with a random vector.

34.8. Assume that X is nonnegative but not necessarily integrable. Show that it is still possible to define a nonnegative random variable E[X‖𝒢], measurable 𝒢, such that (34.1) holds. Prove versions of the monotone convergence theorem and Fatou's lemma.

34.9. (a) Show for nonnegative X that E[X‖𝒢] = ∫_0^∞ P[X > t‖𝒢] dt with probability 1.
(b) Generalize Markov's inequality: P[|X| ≥ α‖𝒢] ≤ α^{−k}E[|X|^k‖𝒢] with probability 1.
(c) Similarly generalize Chebyshev's and Hölder's inequalities.

34.10. (a) Show that, if 𝒢_1 ⊂ 𝒢_2 and E[X²] < ∞, then E[(X − E[X‖𝒢_2])²] ≤ E[(X − E[X‖𝒢_1])²]. The dispersion of X about its conditional mean becomes smaller as the σ-field grows.
(b) Define Var[X‖𝒢] = E[(X − E[X‖𝒢])²‖𝒢]. Prove that Var[X] = E[Var[X‖𝒢]] + Var[E[X‖𝒢]].

34.11. Let 𝒢_1, 𝒢_2, 𝒢_3 be σ-fields in ℱ, let 𝒢_{ij} be the σ-field generated by 𝒢_i ∪ 𝒢_j, and let A_i be the generic set in 𝒢_i. Consider three conditions:
(i) P[A_3‖𝒢_{12}] = P[A_3‖𝒢_2] for all A_3.
(ii) P[A_1 ∩ A_3‖𝒢_2] = P[A_1‖𝒢_2]P[A_3‖𝒢_2] for all A_1 and A_3.
(iii) P[A_1‖𝒢_{23}] = P[A_1‖𝒢_2] for all A_1.
If 𝒢_1, 𝒢_2, and 𝒢_3 are interpreted as descriptions of the past, present, and future, respectively, (i) is a general version of the Markov property: the conditional probability of a future event A_3 given the past and present 𝒢_{12} is the same as the conditional probability given the present 𝒢_2 alone. Condition (iii) is the same with time reversed. And (ii) says that past and future events A_1 and A_3 are conditionally independent given the present 𝒢_2. Prove the three conditions equivalent.

34.12. 33.7 34.11↑ Use Example 33.10 to calculate P[N_s = k‖N_u, u ≥ t] (s < t) for the Poisson process.
34.13. Let L² be the Hilbert space of square-integrable random variables on (Ω, ℱ, P). For 𝒢 a σ-field in ℱ, let M_𝒢 be the subspace of elements of L² that are measurable 𝒢. Show that the operator P_𝒢 defined for X ∈ L² by P_𝒢X = E[X‖𝒢] is the perpendicular projection on M_𝒢.

34.14. ↑ Suppose in Problem 34.13 that 𝒢 = σ(Z) for a random variable Z in L². Let S_Z be the one-dimensional subspace spanned by Z. Show that S_Z may be much smaller than M_{σ(Z)}, so that E[X‖Z] (for X ∈ L²) is by no means the projection of X on Z. Hint: Take Z the identity function on the unit interval with Lebesgue measure.
34.15. ↑ Problem 34.13 can be turned around to give an alternative approach to conditional probability and expected value. For a σ-field 𝒢 in ℱ, let P_𝒢 be the perpendicular projection on the subspace M_𝒢. Show that P_𝒢X has for X ∈ L² the two properties required of E[X‖𝒢]. Use this to define E[X‖𝒢] for X ∈ L² and then extend it to all integrable X via approximation by random variables in L². Now define conditional probability.
34.16. Mixing sequences. A sequence A_1, A_2, ... of ℱ-sets in a probability space (Ω, ℱ, P) is mixing with constant α if

(34.17)  lim_n P(A_n ∩ E) = αP(E)

for every E in ℱ. Then α = lim_n P(A_n).
(a) Show that {A_n} is mixing with constant α if and only if

(34.18)  lim_n ∫_{A_n} X dP = α ∫ X dP

for each integrable X (measurable ℱ).
(b) Suppose that (34.17) holds for E ∈ 𝒫, where 𝒫 is a π-system, Ω ∈ 𝒫, and A_n ∈ σ(𝒫) for all n. Show that {A_n} is mixing. Hint: First check (34.18) for X measurable σ(𝒫) and then use conditional expected values with respect to σ(𝒫).
(c) Show that, if P_0 is a probability measure on (Ω, ℱ) and P_0 ≪ P, then mixing is preserved if P is replaced by P_0.
be 34.17. i Application of mixing to the central limit theorem. Let XI > X2 , random variables on P), independent and identically distributed with mean 0 and variance u 2, and pu t S, = X1 + · · · +X,. Then S,ju{n = N by the Lindeberg-Levy theorem. Show by the steps below that this still holds if P is replaced by any probability measure P0 on (!l, !T) that P dominates. For example, the central limit theorem applies to the sums [i: = 1 r1,(w) o f Rademacher functions if w i s chosen according to the unifo rm density over the unit interval, and this result shows that the same is true if w is chosen according to an arbitrary density. Let Y,, = S,juVn and Z, = (S, - S[1o , 1 )/uVn, and take fY to consist of the sets of the form [(XI ' . . . , Xk ) E H], � > 1, H E !JR k . Prove successively: (a) P[Y, < x] --+ P[N < x]. •
•
•
(D., !T,
P[IY, - Z,I > £] --+ 0. P[Z, <x] -+ P[N < x]. P(E n [Z, <x]) --+ P(E)P[N < x] for E E 9J. P(E n [Z, <x]) --+ P(E)P[N <x] for E E !T. (f) P0[Z, <x] --+ P[ N < x]. (g) P0[IY, - Z,I > E] -+ O. (h) P0[Y, <x] --+ P[N < x]. 34.18. Suppose that .:# is a sufficient subfield for the family of probability measures P6 , (} E E>, on !T). Suppose that for each (} and A, p(A, w) is a version of P6[AII.:#].,. and suppose further t hat for each p(- , w) is a probability (b) (c) (d) (e)
�
(D.,
w,
measure on F. Define Q_θ on F by Q_θ(A) = ∫_Ω p(A, ω)P_θ(dω), and show that Q_θ = P_θ.

The idea is that an observer with the information in G (but ignorant of ω itself) in principle knows the values p(A, ω), because each p(A, ·) is measurable G. If he has the appropriate randomization device, he can draw an ω′ from Ω according to the probability measure p(·, ω), and his ω′ will have the same distribution Q_θ = P_θ that ω has. Thus, whatever the value of the unknown θ, the observer can, on the basis of the information in G alone and without knowing ω itself, construct a probabilistic replica of ω.
34.19. 34.13 ↑ In the context of the discussion on p. 252, let F be the σ-field of sets of the form Θ × A for A ∈ G. Show that under the probability measure Q, g_0 is the conditional expected value of g given F.
34.20. (a) In Example 34.4, take π to have density e^{−θ} over Θ = (0, ∞). Show by Theorem 34.7 that T is a minimal sufficient statistic (in the sense that σ(T) is minimal). (b) Let P_θ be the distribution for samples of size n from a normal distribution with parameter θ = (m, σ²), σ² > 0, and let π put unit mass at (0, 1). Show that the sample mean and variance form a minimal sufficient statistic.
SECTION 35. MARTINGALES

Definition

Let X_1, X_2, ... be a sequence of random variables on a probability space (Ω, F, P), and let F_1, F_2, ... be a sequence of σ-fields in F. The sequence {(X_n, F_n): n = 1, 2, ...} is a martingale if these four conditions hold:

(i) F_n ⊂ F_{n+1};
(ii) X_n is measurable F_n;
(iii) E[|X_n|] < ∞;
(iv) with probability 1,

(35.1)  E[X_{n+1} || F_n] = X_n.

Alternatively, the sequence X_1, X_2, ... is said to be a martingale relative to the σ-fields F_1, F_2, ... . Condition (i) is expressed by saying that the F_n form a filtration, and condition (ii) by saying that the X_n are adapted to the filtration.

If X_n represents the fortune of a gambler after the nth play and F_n represents his information about the game at that time, (35.1) says that his expected fortune after the next play is the same as his present fortune. Thus a martingale represents a fair game, and sums of independent random variables with mean 0 give one example. As will be seen below, martingales arise in very diverse connections.
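The fair-game reading of (35.1) can be made concrete in a toy case. The sketch below (the coin-tossing model, the helper name, and the exhaustive check are illustrative choices, not from the text) verifies the martingale property and the constant-expectation consequence (35.5) for sums of independent ±1 variables by exact enumeration rather than simulation.

```python
from itertools import product

# A minimal sketch: for the fair coin-tossing game, X_n = d_1 + ... + d_n
# with independent d_k = +/-1, each with probability 1/2.  Conditioning on
# F_n = sigma(d_1, ..., d_n) amounts to conditioning on the full history.

def conditional_mean_next(history):
    """E[X_{n+1} | d_1, ..., d_n]: average X_{n+1} over the two equally
    likely one-step continuations of the given history."""
    x_n = sum(history)
    return sum(x_n + d for d in (+1, -1)) / 2.0

# (35.1): E[X_{n+1} || F_n] = X_n, checked for every history of length 3.
for history in product((+1, -1), repeat=3):
    assert conditional_mean_next(history) == sum(history)

# (35.5): E[X_1] = E[X_2] = ... = 0, computed exactly over all paths.
for n in (1, 2, 3, 4):
    paths = list(product((+1, -1), repeat=n))
    assert sum(sum(p) for p in paths) / len(paths) == 0.0
```

The same check fails for a biased coin, which is one way to see that it is fairness, not independence alone, that (35.1) encodes.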
The sequence X_1, X_2, ... is defined to be a martingale if it is a martingale relative to some sequence F_1, F_2, ... . In this case, the σ-fields G_n = σ(X_1, ..., X_n) always work: obviously G_n ⊂ G_{n+1}, and X_n is measurable G_n; and if (35.1) holds, then

E[X_{n+1} || G_n] = E[E[X_{n+1} || F_n] || G_n] = E[X_n || G_n] = X_n

by (34.5). For these special σ-fields G_n, (35.1) reduces to

(35.2)  E[X_{n+1} || X_1, ..., X_n] = X_n.

Since σ(X_1, ..., X_n) ⊂ F_n for all n if and only if X_n is measurable F_n for each n, the σ(X_1, ..., X_n) are the smallest σ-fields with respect to which the X_n are a martingale.

The essential condition is embodied in (35.1) and in its specialization (35.2). Condition (iii) is of course needed to ensure that E[X_{n+1} || F_n] exists. Condition (iv) says that X_n is a version of E[X_{n+1} || F_n]; since X_n is measurable F_n, the requirement reduces to

(35.3)  ∫_A X_{n+1} dP = ∫_A X_n dP,  A ∈ F_n.

Since the F_n are nested, A ∈ F_n implies that ∫_A X_n dP = ∫_A X_{n+1} dP = ··· = ∫_A X_{n+k} dP. Therefore, X_n, being measurable F_n, is a version of E[X_{n+k} || F_n]:

(35.4)  E[X_{n+k} || F_n] = X_n

with probability 1 for k ≥ 1. Note that for A = Ω, (35.3) gives

(35.5)  E[X_1] = E[X_2] = ··· .

The defining conditions for a martingale can also be given in terms of the differences

(35.6)  d_n = X_n − X_{n−1}
(d_1 = X_1). By linearity, (35.1) is the same thing as

(35.7)  E[d_{n+1} || F_n] = 0.

Note that, since X_k = d_1 + ··· + d_k and d_k = X_k − X_{k−1}, the sets X_1, ..., X_n and d_1, ..., d_n generate the same σ-field:

(35.8)  σ(X_1, ..., X_n) = σ(d_1, ..., d_n).

Example 35.1. Let d_1, d_2, ... be independent, integrable random variables such that E[d_n] = 0 for n ≥ 2. If F_n is the σ-field (35.8), then by independence E[d_{n+1} || F_n] = E[d_{n+1}] = 0, and the X_n = d_1 + ··· + d_n are a martingale relative to the F_n. If d is another random variable, independent of the d_n, and if F_n is replaced by σ(d, d_1, ..., d_n), then the X_n are still a martingale relative to the F_n. It is natural and convenient in the theory to allow σ-fields F_n larger than the minimal ones (35.8).
Example 35.2. Let (Ω, F, P) be a probability space, let ν be a finite measure on F, and let F_1, F_2, ... be a nondecreasing sequence of σ-fields in F. Suppose that P dominates ν when both are restricted to F_n; that is, suppose that A ∈ F_n and P(A) = 0 together imply that ν(A) = 0. There is then a density, or Radon–Nikodym derivative, X_n of ν with respect to P when both are restricted to F_n; X_n is a function that is measurable F_n and integrable with respect to P, and it satisfies

(35.9)  ν(A) = ∫_A X_n dP,  A ∈ F_n.

If A ∈ F_n, then A ∈ F_{n+1} as well, so that ∫_A X_{n+1} dP = ν(A); this and (35.9) give (35.3). Thus the X_n are a martingale with respect to the F_n.

Example 35.3. For a specialization of the preceding example, let P be Lebesgue measure on the σ-field F of Borel subsets of Ω = (0, 1], and let F_n be the finite σ-field generated by the partition of Ω into dyadic intervals (k2^{−n}, (k + 1)2^{−n}], 0 ≤ k < 2^n. If A ∈ F_n and P(A) = 0, then A is empty. Hence P dominates every finite measure ν on F_n. The Radon–Nikodym derivative is
One assumes W_n ≥ 0, and that W_n is measurable F_{n−1}, to exclude prevision: before the nth play the information available to the gambler is that in F_{n−1}, and his choice of stake W_n must be based on this alone. For simplicity take W_n bounded. Then W_n d_n is integrable, and it is measurable F_n if d_n is; and if X_n is a martingale, then E[W_n d_n || F_{n−1}] = W_n E[d_n || F_{n−1}] = 0 by (34.2). Thus

(35.17)  W_1 d_1 + ··· + W_n d_n

is a martingale relative to the F_n. The sequence W_1, W_2, ... represents a betting system, and transforming a fair game by a betting system preserves fairness; that is, transforming X_n into (35.17) preserves the martingale property. The various betting systems discussed in Section 7 give rise to various martingales, and these martingales are not in general sums of independent random variables; they are not in general the special martingales of Example 35.1. If W_n assumes only the values 0 and 1, the betting system is a selection system; see Section 7.

If the game is unfavorable to the gambler (that is, if X_n is a supermartingale†) and if W_n is nonnegative, bounded, and measurable F_{n−1}, then the same argument shows that (35.17) is again a supermartingale, is again unfavorable. Betting systems are thus of no avail in unfavorable games.

The stopping-time arguments of Section 7 also extend.‡ Suppose that {X_n} is a martingale relative to {F_n}; it may have come from another martingale

†There is a reversal of terminology here: a subfair game (Section 7) is against the gambler, while a submartingale favors him.
‡The notation has, of course, changed; the F_n and X of Section 7 have become X_n and A_n.
via transformation by a betting system. Let τ be a random variable taking on nonnegative integers as values, and suppose that

(35.18)  [τ = n] ∈ F_n.

If τ is the time the gambler stops, [τ = n] is the event that he stops just after the nth play, and (35.18) requires that his decision depend only on the information F_n available to him at that time. His fortune at time n for this stopping rule is

(35.19)  X_n* = X_n if n < τ, and X_n* = X_τ if n ≥ τ.

Here X_τ (which has value X_{τ(ω)}(ω) at ω) is the gambler's ultimate fortune, and it is his fortune for all times subsequent to τ. The problem is to show that X_0*, X_1*, ... is a martingale relative to F_0, F_1, ... . First,

E[|X_n*|] = Σ_{k=0}^{n−1} ∫_{[τ=k]} |X_k| dP + ∫_{[τ≥n]} |X_n| dP < ∞.

Since [τ > n] = Ω − [τ ≤ n] ∈ F_n,

[X_n* ∈ H] = ⋃_{k=0}^{n} [τ = k, X_k ∈ H] ∪ [τ > n, X_n ∈ H] ∈ F_n.

Moreover,

∫_A X_{n+1}* dP = Σ_{k=0}^{n} ∫_{A∩[τ=k]} X_k dP + ∫_{A∩[τ>n]} X_{n+1} dP

and

∫_A X_n* dP = Σ_{k=0}^{n} ∫_{A∩[τ=k]} X_k dP + ∫_{A∩[τ>n]} X_n dP.

Because of (35.3), the right sides here coincide if A ∈ F_n; this establishes (35.3) for the sequence X_0*, X_1*, ..., which is thus a martingale. The same kind of argument works for supermartingales.

Since X_n* = X_τ for n ≥ τ, X_n* → X_τ. As pointed out in Section 7, it is not always possible to integrate to the limit here. Let X_n = a + d_1 + ··· + d_n (X_0 = a), where the d_n are independent and assume the values ±1 with probability ½ each, and let τ be the smallest n for which d_1 + ··· + d_n = 1. Then E[X_n*] = a and X_τ = a + 1. On the other hand, if the X_n* are uniformly bounded or uniformly integrable, it is possible to integrate to the limit: E[X_τ] = E[X_0].
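The counterexample just described (with a = 0) is small enough to check exactly. In the sketch below (the function name and the horizon 6 are illustrative choices, not from the text), τ is the first time the walk reaches +1, X_n* is the stopped walk, and exhaustive enumeration confirms E[X_n*] = 0 for every n up to the horizon even though X_τ = 1 whenever stopping has occurred.

```python
from itertools import product

def stopped_value(path, n):
    """X_n* for the fair +/-1 walk stopped at tau = first time the
    partial sum hits +1 (the a = 0 case of the text's example)."""
    x = 0
    for d in path[:n]:
        x += d
        if x == 1:          # tau has occurred: the fortune is frozen at +1
            return 1
    return x

paths = list(product((+1, -1), repeat=6))
for n in range(1, 7):
    mean = sum(stopped_value(p, n) for p in paths) / len(paths)
    assert mean == 0.0      # E[X_n*] = E[X_0] = 0: stopping keeps the game fair
```

Uniform integrability fails only in the limit: with no horizon cap the stopped walk converges to +1 almost surely, so E[X_τ] = 1 ≠ 0 = E[X_0].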
Functions of Martingales
Convex functions of martingales are submartingales:

Theorem 35.1. (i) If X_1, X_2, ... is a martingale relative to F_1, F_2, ..., if φ is convex, and if the φ(X_n) are integrable, then φ(X_1), φ(X_2), ... is a submartingale relative to F_1, F_2, ... .
(ii) If X_1, X_2, ... is a submartingale relative to F_1, F_2, ..., if φ is nondecreasing and convex, and if the φ(X_n) are integrable, then φ(X_1), φ(X_2), ... is a submartingale relative to F_1, F_2, ... .

Proof. In the submartingale case, X_n ≤ E[X_{n+1} || F_n], and if φ is nondecreasing, then φ(X_n) ≤ φ(E[X_{n+1} || F_n]). In the martingale case, X_n = E[X_{n+1} || F_n], and so φ(X_n) = φ(E[X_{n+1} || F_n]). If φ is convex, then by Jensen's inequality (34.7) for conditional expectations, it follows that φ(E[X_{n+1} || F_n]) ≤ E[φ(X_{n+1}) || F_n]. ∎

Example 35.8 is the case of part (i) for φ(x) = |x|.
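The case φ(x) = |x| can be checked concretely; the sketch below (the fair ±1 walk and the helper name are illustrative choices) verifies the defining submartingale inequality pointwise over every history.

```python
from itertools import product

def cond_mean_abs_next(history):
    """E[|X_{n+1}| | d_1, ..., d_n] for the fair +/-1 walk."""
    x = sum(history)
    return sum(abs(x + d) for d in (+1, -1)) / 2.0

# E[|X_{n+1}| || F_n] >= |X_n| for every history of length up to 4;
# the inequality is strict exactly when the walk sits at 0, where the
# kink of |.| makes Jensen's inequality strict.
for n in (1, 2, 3, 4):
    for history in product((+1, -1), repeat=n):
        assert cond_mean_abs_next(history) >= abs(sum(history))
```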
Stopping Times
Let τ be a random variable taking as values positive integers or the special value ∞. It is a stopping time with respect to {F_n} if [τ = k] ∈ F_k for each finite k (see (35.18)) or, equivalently, if [τ ≤ k] ∈ F_k for each finite k. Define

(35.20)  F_τ = [A ∈ F: A ∩ [τ ≤ k] ∈ F_k, 1 ≤ k < ∞].

This is a σ-field, and the definition is unchanged if [τ ≤ k] is replaced by [τ = k] on the right. Since clearly [τ = j] ∈ F_τ for finite j, τ is measurable F_τ.

If τ(ω) < ∞ for all ω and F_n = σ(X_1, ..., X_n), then I_A(ω) = I_A(ω′) for all A in F_τ if and only if X_i(ω) = X_i(ω′) for i ≤ τ(ω) = τ(ω′): the information in F_τ consists of the values τ(ω), X_1(ω), ..., X_{τ(ω)}(ω).

Suppose now that τ_1 and τ_2 are two stopping times and τ_1 ≤ τ_2. If A ∈ F_{τ_1}, then A ∩ [τ_1 ≤ k] ∈ F_k, and hence A ∩ [τ_2 ≤ k] = (A ∩ [τ_1 ≤ k]) ∩ [τ_2 ≤ k] ∈ F_k: F_{τ_1} ⊂ F_{τ_2}.

Theorem 35.2. If X_1, ..., X_n is a submartingale with respect to F_1, ..., F_n, and τ_1, τ_2 are stopping times satisfying 1 ≤ τ_1 ≤ τ_2 ≤ n, then X_{τ_1}, X_{τ_2} is a submartingale with respect to F_{τ_1}, F_{τ_2}.
DERIVATIVES AND CONDITIONAL PROBABILITY
466
This is the optional sampling theorem. The proof will show that X_{τ_1}, X_{τ_2} is a martingale if X_1, ..., X_n is.

Proof. Since the X_{τ_i} are dominated by Σ_{k=1}^n |X_k|, they are integrable. It is required to show that E[X_{τ_2} || F_{τ_1}] ≥ X_{τ_1}, or

(35.21)  ∫_A X_{τ_2} dP ≥ ∫_A X_{τ_1} dP,  A ∈ F_{τ_1}.

But A ∈ F_{τ_1} implies that A ∩ [τ_1 < k ≤ τ_2] = (A ∩ [τ_1 ≤ k − 1]) ∩ [τ_2 ≤ k − 1]^c lies in F_{k−1}. If Δ_k = X_k − X_{k−1}, then

∫_A (X_{τ_2} − X_{τ_1}) dP = ∫_A Σ_{k=1}^n I_{[τ_1 < k ≤ τ_2]} Δ_k dP = Σ_{k=1}^n ∫_{A∩[τ_1 < k ≤ τ_2]} Δ_k dP ≥ 0

by the submartingale property. ∎
Inequalities
There are two inequalities that are fundamental to the theory of martingales.

Theorem 35.3. If X_1, ..., X_n is a submartingale, then for α > 0,

(35.22)  P[max_{k≤n} X_k ≥ α] ≤ (1/α) E[|X_n|].

This extends Kolmogorov's inequality: if S_1, S_2, ... are partial sums of independent random variables with mean 0, they form a martingale; if the variances are finite, then S_1², S_2², ... is a submartingale by Theorem 35.1(i), and (35.22) for this submartingale is exactly Kolmogorov's inequality (22.9).

Proof. Let τ_2 = n; let τ_1 be the smallest k such that X_k ≥ α, if there is one, and n otherwise. If M_k = max_{i≤k} X_i, then [M_n ≥ α] ∩ [τ_1 ≤ k] = [M_k ≥ α] ∈ F_k, and hence [M_n ≥ α] is in F_{τ_1}. By Theorem 35.2,

(35.23)  αP[M_n ≥ α] ≤ ∫_{[M_n ≥ α]} X_{τ_1} dP ≤ ∫_{[M_n ≥ α]} X_{τ_2} dP = ∫_{[M_n ≥ α]} X_n dP ≤ E[|X_n|],

so that P[M_n ≥ α] ≤ α^{−1} E[|X_n|]. ∎

The second fundamental inequality requires the notion of an upcrossing. Let [α, β] be an interval (α < β), and let X_1, ..., X_n be random variables. Inductively define variables τ_1, τ_2, ...:

τ_1 is the smallest j such that 1 ≤ j ≤ n and X_j ≤ α, and is n if there is no such j; τ_k for even k is the smallest j such that τ_{k−1} < j ≤ n and X_j ≥ β, and is n if there is no such j; τ_k for odd k exceeding 1 is the smallest j such that τ_{k−1} < j ≤ n and X_j ≤ α, and is n if there is no such j.

The number U of upcrossings of [α, β] by X_1, ..., X_n is the largest i such that X_{τ_{2i−1}} ≤ α < β ≤ X_{τ_{2i}}. In the diagram, n = 20 and there are three upcrossings.
[Diagram omitted: a sample path of X_1, ..., X_20 with the levels α and β marked, making three upcrossings of [α, β].]
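The recursive definition of the τ_k and of U translates directly into code. The sketch below (the function and the test path are invented for illustration) counts completed passages from ≤ α up to ≥ β, and then confirms the upcrossing inequality (35.24) exhaustively for a short fair walk.

```python
from itertools import product

def upcrossings(x, alpha, beta):
    """Number of upcrossings of [alpha, beta] by the finite sequence x:
    completed passages from a value <= alpha to a later value >= beta."""
    count, below = 0, False
    for v in x:
        if not below and v <= alpha:
            below = True
        elif below and v >= beta:
            count += 1
            below = False
    return count

# Three upcrossings of [0, 2]: (-1 -> 3), (0 -> 2), and (-2 -> 5).
assert upcrossings([1, -1, 3, 0, 1, 2, -2, 5], alpha=0, beta=2) == 3

# The bound (35.24), E[U] <= (E[|X_n|] + |alpha|)/(beta - alpha), checked
# exactly for the fair +/-1 walk with n = 8 and [alpha, beta] = [-1, 1].
n, alpha, beta = 8, -1, 1
walks = [[sum(p[:k]) for k in range(1, n + 1)]
         for p in product((+1, -1), repeat=n)]
mean_u = sum(upcrossings(w, alpha, beta) for w in walks) / len(walks)
bound = (sum(abs(w[-1]) for w in walks) / len(walks) + abs(alpha)) / (beta - alpha)
assert mean_u <= bound
```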
Theorem 35.4. For a submartingale X_1, ..., X_n, the number U of upcrossings of [α, β] satisfies

(35.24)  E[U] ≤ (E[|X_n|] + |α|) / (β − α).

Proof. Let Y_k = max{0, X_k − α} and θ = β − α. By Theorem 35.1(ii), Y_1, ..., Y_n is a submartingale. The τ_k are unchanged if in the definitions X_j ≤ α is replaced by Y_j = 0 and X_j ≥ β by Y_j ≥ θ, and so U is also the number of upcrossings of [0, θ] by Y_1, ..., Y_n.

If k is even and τ_{k−1} is a stopping time, then for j ≤ n,

[τ_k = j] = ⋃_{i=1}^{j−1} [τ_{k−1} = i, Y_{i+1} < θ, ..., Y_{j−1} < θ, Y_j ≥ θ]

lies in F_j, and [τ_k = n] = [τ_k ≤ n − 1]^c lies in F_n, and so τ_k is also a stopping time. With a similar argument for odd k, this shows that the τ_k are all stopping times. Since the τ_k are strictly increasing until they reach n, τ_n = n. Therefore,

Y_n = Y_{τ_n} ≥ Y_{τ_n} − Y_{τ_1} = Σ_{k=2}^n (Y_{τ_k} − Y_{τ_{k−1}}) = Σ_e + Σ_o,

where Σ_e and Σ_o are the sums over the even k and the odd k in the range 2 ≤ k ≤ n. By Theorem 35.2, Σ_o has nonnegative expected value, and therefore E[Y_n] ≥ E[Σ_e]. If Y_{τ_{2i−1}} = 0 < θ ≤ Y_{τ_{2i}} (which is the same thing as X_{τ_{2i−1}} ≤ α < β ≤ X_{τ_{2i}}), then the difference Y_{τ_{2i}} − Y_{τ_{2i−1}} appears in the sum Σ_e and is at least θ. Since there are U of these differences, Σ_e ≥ θU, and therefore E[Y_n] ≥ θE[U]. In terms of the original variables, this is

(β − α) E[U] ≤ ∫_{[X_n ≥ α]} (X_n − α) dP ≤ E[|X_n|] + |α|. ∎
In a sense, an upcrossing of [α, β] is easy: since the X_k form a submartingale, they tend to increase. But before another upcrossing can occur, the sequence must make its way back down below α, which it resists. Think of the extreme case where the X_k are strictly increasing constants. This is reflected in the proof: each of Σ_e and Σ_o has nonnegative expected value, but for Σ_e the proof uses the stronger inequality E[Σ_e] ≥ θE[U].

Convergence Theorems
The martingale convergence theorem, due to Doob, has a number of forms. The simplest one is this:

Theorem 35.5. Let X_1, X_2, ... be a submartingale. If K = sup_n E[|X_n|] < ∞, then X_n → X with probability 1, where X is a random variable satisfying E[|X|] ≤ K.

Proof. Fix α and β for the moment, and let U_n be the number of upcrossings of [α, β] by X_1, ..., X_n. By the upcrossing theorem, E[U_n] ≤ (E[|X_n|] + |α|)/(β − α) ≤ (K + |α|)/(β − α). Since U_n is nondecreasing and E[U_n] is bounded, it follows by the monotone convergence theorem that sup_n U_n is integrable and hence finite-valued almost everywhere.

Let X* and X_* be the limits superior and inferior of the sequence X_1, X_2, ...; they may be infinite. If X_* < α < β < X*, then U_n must go to infinity. Since sup_n U_n is finite with probability 1, P[X_* < α < β < X*] = 0.
Now

(35.25)  [X_* < X*] = ⋃ [X_* < α < β < X*],

where the union extends over all pairs of rationals α and β. The set on the left therefore has probability 0. Thus X_* and X* are equal with probability 1, and X_n converges to their common value X, which may be ±∞. By Fatou's lemma, E[|X|] ≤ lim inf_n E[|X_n|] ≤ K. Since it is integrable, X is finite with probability 1. ∎

If the X_n form a martingale, then by (35.16) applied to the submartingale |X_1|, |X_2|, ..., the E[|X_n|] are nondecreasing, so that K = lim_n E[|X_n|]. The hypothesis in the theorem that K be finite is essential.

Suppose that for each n the sequence X_{n1}, X_{n2}, ... is a martingale with respect to F_{n1}, F_{n2}, ... . Define Y_{nk} = X_{nk} − X_{n,k−1}, suppose the Y_{nk} have second moments, and put σ²_{nk} = E[Y²_{nk} || F_{n,k−1}] (F_{n0} = {∅, Ω}). The probability space may vary with n. If the martingale is originally defined only for 1 ≤ k ≤ r_n, take Y_{nk} = 0 and F_{nk} = F_{n,r_n} for k > r_n. Assume that Σ_{k=1}^∞ Y_{nk} and Σ_{k=1}^∞ σ²_{nk} converge with probability 1.
Theorem 35.12. Suppose that

(35.35)  Σ_{k=1}^∞ σ²_{nk} →_P σ²,

where σ is a positive constant, and that

(35.36)  Σ_{k=1}^∞ E[Y²_{nk} I_{[|Y_{nk}| ≥ ε]}] → 0

for each ε. Then Σ_{k=1}^∞ Y_{nk} ⇒ σN.
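For the simplest array, Y_nk = d_k/√n with bounded i.i.d. differences, the hypotheses (35.35) and (35.36) can be computed in closed form; the sketch below (the ±1 coin and the function name are illustrative choices, not from the text) evaluates both sums for that case, where σ = 1.

```python
# For Y_nk = d_k / sqrt(n), k = 1, ..., n, with d_k = +/-1 fair coin flips:
# sigma2_nk = 1/n, so the conditional variances sum to exactly 1 = sigma^2,
# and each |Y_nk| = 1/sqrt(n) surely, so the Lindeberg sum (35.36) is
# identically zero once 1/sqrt(n) < eps, i.e. for n > 1/eps^2.

def lindeberg_sum(n, eps):
    """Sum over k of E[Y_nk^2 I[|Y_nk| >= eps]] for the +/-1 array."""
    y = 1.0 / n ** 0.5
    return sum((y * y if y >= eps else 0.0) for _ in range(n))

assert abs(sum(1.0 / 100 for _ in range(100)) - 1.0) < 1e-9  # (35.35), n = 100
assert lindeberg_sum(100, eps=0.2) == 0.0                    # 1/10 < 0.2
assert lindeberg_sum(9, eps=0.2) > 0.9                       # nonzero for small n
```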
Proof of Theorem 35.11. The proof will be given for t going to infinity through the integers.† Let Y_{nk} = I_{[ν_n ≥ k]} d_k/√n and F_{nk} = F_k. From [ν_n ≥ k] = [Σ_{j=1}^{k−1} σ_j² < n] ∈ F_{k−2} follow E[Y_{nk} || F_{n,k−1}] = 0 and σ²_{nk} = E[Y²_{nk} || F_{n,k−1}] = I_{[ν_n ≥ k]} σ_k²/n. If K bounds the |Y_k|, then 1 ≤ Σ_{k=1}^∞ σ²_{nk} = n^{−1} Σ_{k=1}^{ν_n} σ_k² ≤ 1 + K²/n, so that (35.35) holds for σ = 1. For n large enough that K/√n < ε, the sum in (35.36) vanishes. Theorem 35.12 therefore applies, and Σ_{k=1}^{ν_n} d_k/√n = Σ_{k=1}^∞ Y_{nk} ⇒ N. ∎

†For the general case, first check that the proof of Theorem 35.12 goes through without change if n is replaced by a parameter going continuously to infinity.
Proof of Theorem 35.12. Assume at first that there is a constant c such that

(35.37)  Σ_{k=1}^∞ σ²_{nk} ≤ c,

which in fact suffices for the application to Theorem 35.11. Write S_k = Σ_{j=1}^k Y_{nj} (S_0 = 0), S_∞ = Σ_{j=1}^∞ Y_{nj}, L_k = Σ_{j=1}^k σ²_{nj} (L_0 = 0), and L_∞ = Σ_{j=1}^∞ σ²_{nj}; the dependence on n is suppressed in the notation. The problem is to prove E[e^{itS_∞}] → e^{−t²σ²/2}. Fix t, and let K_t bound t² and |t|³. By (27.15),

(35.38)  e^{itY_{nk}} = 1 + itY_{nk} − ½t²Y²_{nk} + θ_{nk},

where |θ_{nk}| ≤ K_t min{Y²_{nk}, |Y_{nk}|³}, and

(35.39)  e^{−t²σ²_{nk}/2} = 1 − ½t²σ²_{nk} + θ′_{nk},

where (use (27.15) and increase K_t) |θ′_{nk}| ≤ K_t σ⁴_{nk}. Because of the condition E[Y_{nk} || F_{n,k−1}] = 0 and the definition of σ²_{nk}, the right sides of (35.38) and (35.39), minus θ_{nk} and θ′_{nk} respectively, have the same conditional expected value given F_{n,k−1}. By (35.37), therefore, E[e^{itS_∞}] − e^{−t²σ²/2} goes to 0: the error terms θ_{nk} are controlled by the Lindeberg condition (35.36), and the θ′_{nk} by (35.35) and (35.37).

To remove the restriction (35.37), define A_{nk} = [Σ_{j=1}^k σ²_{nj} ≤ c] and A_{n∞} = [Σ_{j=1}^∞ σ²_{nj} ≤ c], and take Z_{nk} = Y_{nk} I_{A_{nk}}. From A_{nk} ∈ F_{n,k−1} follow E[Z_{nk} || F_{n,k−1}] = 0 and τ²_{nk} = E[Z²_{nk} || F_{n,k−1}] = I_{A_{nk}} σ²_{nk}. Since Σ_{j=1}^∞ τ²_{nj} is Σ_{j=1}^k σ²_{nj} on A_{nk} − A_{n,k+1} and Σ_{j=1}^∞ σ²_{nj} on A_{n∞}, the Z-array satisfies (35.37). Now P(A_{n∞}) → 1 by (35.35) if c > σ², and on A_{n∞}, τ²_{nk} = σ²_{nk} for all k, so that the Z-array satisfies (35.35). And it satisfies (35.36) because |Z_{nk}| ≤ |Y_{nk}|. Therefore, by the case already treated, Σ_{k=1}^∞ Z_{nk} ⇒ σN. But since Σ_{k=1}^∞ Y_{nk} coincides with this last sum on A_{n∞}, it, too, is asymptotically normal. ∎
PROBLEMS

35.1. Suppose that Δ_1, Δ_2, ... are independent random variables with mean 0. Let X_1 = Δ_1 and X_{n+1} = X_n + Δ_{n+1} f_n(X_1, ..., X_n), and suppose that the X_n are integrable. Show that {X_n} is a martingale. The martingales of gambling have this form.

35.2. Let Y_1, Y_2, ... be independent random variables with mean 0 and variance σ². Let X_n = (Σ_{k=1}^n Y_k)² − nσ² and show that {X_n} is a martingale.
35.3. Suppose that {Y_n} is a finite-state Markov chain with transition matrix [p_{ij}]. Suppose that Σ_j p_{ij} x(j) = λx(i) for all i (the x(i) are the components of a right eigenvector of the transition matrix). Put X_n = λ^{−n} x(Y_n) and show that {X_n} is a martingale.

35.4. Suppose that Y_1, Y_2, ... are independent, positive random variables and that E[Y_n] = 1. Put X_n = Y_1 ··· Y_n.
(a) Show that {X_n} is a martingale and converges with probability 1 to an integrable X.
(b) Suppose specifically that Y_n assumes the values 1/2 and 3/2 with probability 1/2 each. Show that X = 0 with probability 1. This gives an example where E[Π_{n=1}^∞ Y_n] ≠ Π_{n=1}^∞ E[Y_n] for independent, integrable, positive random variables. Show, however, that E[Π_{n=1}^∞ Y_n] ≤ Π_{n=1}^∞ E[Y_n] always holds.
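Problem 35.4(b) is small enough to explore numerically; the sketch below (the names and the horizon n = 10 are illustrative choices) computes E[X_n] exactly by enumeration and exhibits the negative drift of log X_n that forces X_n → 0.

```python
from itertools import product
from math import log

n = 10
total, count = 0.0, 0
for ys in product((0.5, 1.5), repeat=n):   # all 2^n equally likely outcomes
    x = 1.0
    for y in ys:
        x *= y                             # X_n = Y_1 ... Y_n
    total += x
    count += 1
mean = total / count
assert abs(mean - 1.0) < 1e-9              # E[X_n] = 1 for every n

# log X_n is a random walk with negative per-step drift, so X_n -> 0 a.s.
drift = (log(0.5) + log(1.5)) / 2.0        # = log(3/4)/2 < 0
assert drift < 0
```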
35.5. Suppose that X_1, X_2, ... is a martingale satisfying E[X_1] = 0 and E[X_n²] < ∞. Show that E[(X_{n+r} − X_n)²] = Σ_{k=1}^r E[(X_{n+k} − X_{n+k−1})²] (the variance of the sum is the sum of the variances). Assume that Σ_n E[(X_n − X_{n−1})²] < ∞ and prove that X_n converges with probability 1. Do this first by Theorem 35.5 and then (see Theorem 22.6) by Theorem 35.3.
35.6. Show that a submartingale X_n can be represented as X_n = Y_n + Z_n, where Y_n is a martingale and 0 ≤ Z_1 ≤ Z_2 ≤ ··· . Hint: Take X_0 = 0 and Δ_n = X_n − X_{n−1}, and define Z_n = Σ_{k=1}^n E[Δ_k || F_{k−1}].

... ≥ (2k)^{−1} Σ φ(y), the sum extending over the 2k nearest neighbors y. Show for k = 1 and k = 2 that a bounded superharmonic function is constant. Show for k ≥ 3 that there exist nonconstant bounded harmonic functions.
35.13. 32.7 32.9 ↑ Let (Ω, F, P) be a probability space, let ν be a finite measure on F, and suppose that F_n ↑ F_∞ ⊂ F. For n ≤ ∞, let X_n be the Radon–Nikodym derivative with respect to P of the absolutely continuous part of ν when P and ν are both restricted to F_n. The problem is to extend Theorem 35.7 by showing that X_n → X_∞ with probability 1.
(a) For n ≤ ∞, let

ν(A) = ∫_A X_n dP + σ_n(A),  A ∈ F_n,

be the decomposition of ν into absolutely continuous and singular parts with respect to P on F_n. Show that X_1, X_2, ... is a supermartingale and converges with probability 1.
(b) Let

σ_∞(A) = ∫_A Z_n dP + τ_n(A),  A ∈ F_n,

be the decomposition of σ_∞ into absolutely continuous and singular parts with respect to P on F_n. Let Y_n = E[X_∞ || F_n], and prove

ν(A) = ∫_A (Y_n + Z_n) dP + τ_n(A),  A ∈ F_n.

Conclude that Y_n + Z_n = X_n with probability 1. Since Y_n converges to X_∞, Z_n converges with probability 1 to some Z. Show that ∫_A Z dP ≤ σ_∞(A) for A ∈ F_n, and conclude that Z = 0 with probability 1.
35.14. (a) Show that {X_n} is a martingale with respect to {F_n} if and only if, for all n and all stopping times τ such that τ ≤ n, E[X_n || F_τ] = X_τ.
(b) Show that, if {X_n} is a martingale and τ is a bounded stopping time, then E[X_τ] = E[X_1].

35.15. 31.9 ↑ Suppose that F_n ↑ F_∞ and A ∈ F_∞, and prove that P[A || F_n] → I_A with probability 1. Compare Lebesgue's density theorem.

35.16. Theorems 35.6 and 35.9 have analogues in Hilbert space. For n ≤ ∞, let P_n be the perpendicular projection on a subspace M_n. Then P_n x → P_∞ x for all x if either (a) M_1 ⊂ M_2 ⊂ ··· and M_∞ is the closure of ⋃_n M_n, or (b) M_1 ⊃ M_2 ⊃ ··· and M_∞ = ⋂_n M_n.

35.17. Suppose that θ has an arbitrary distribution, and suppose that, conditionally on θ, the random variables Y_1, Y_2, ... are independent and normally distributed with mean θ and variance σ². Construct such a sequence {θ, Y_1, Y_2, ...}. Prove (35.31).
35.18. It is shown on p. 471 that optional stopping has no effect on likelihood ratios. This is not true of tests of significance. Suppose that X_1, X_2, ... are independent and identically distributed and assume the values 1 and 0 with probabilities p and 1 − p. Consider the null hypothesis that p = 1/2 and the alternative that p > 1/2. The usual .05-level test of significance is to reject the null hypothesis if

(35.40)  (Σ_{k=1}^n X_k − n/2) / (½√n) > 1.645.

For this test the chance of falsely rejecting the null hypothesis is approximately P[N > 1.645] ≈ .05 if n is large and fixed. Suppose that n is not fixed in advance of sampling, and show by the law of the iterated logarithm that, even if p is, in fact, 1/2, there are with probability 1 infinitely many n for which (35.40) holds.
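The inflation of the significance level under optional stopping is visible even at a tiny horizon. The sketch below (the horizon 12 and the names are illustrative choices, not from the text) computes exactly, under p = 1/2, the probability that the statistic of (35.40) exceeds 1.645 at the fixed time n = 12 versus at some time n ≤ 12.

```python
from itertools import product

N = 12

def z(s, n):
    """The statistic of (35.40): (s - n/2) / (sqrt(n)/2)."""
    return (s - n / 2.0) / (n ** 0.5 / 2.0)

paths = list(product((0, 1), repeat=N))

# Reject at the fixed time N only.
fixed = sum(1 for p in paths if z(sum(p), N) > 1.645) / len(paths)

# Reject if the statistic ever exceeds 1.645 at some n <= N.
ever = 0
for p in paths:
    s = 0
    for n, x in enumerate(p, start=1):
        s += x
        if z(s, n) > 1.645:
            ever += 1
            break
ever /= len(paths)

assert fixed < 0.08          # near the nominal level at the fixed time
assert ever > fixed          # sampling until significance inflates the level
```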
35.19. (a) Suppose that (35.32) and (35.33) hold. Suppose further that, for constants s_n², s_n^{−2} Σ_{k=1}^n σ_k² →_P 1 and s_n^{−2} Σ_{k=1}^n E[Y_k² I_{[|Y_k| ≥ εs_n]}] → 0, and show that s_n^{−1} Σ_{k=1}^n Y_k ⇒ N. Hint: Simplify the proof of Theorem 35.11.
(b) The Lindeberg–Lévy theorem for martingales. Suppose that ..., Y_{−1}, Y_0, Y_1, ... is stationary and ergodic (p. 494) and that E[Y_k || ..., Y_{k−2}, Y_{k−1}] = 0 and E[Y_k²] = σ² < ∞. Prove that Σ_{k=1}^n Y_k / √n is asymptotically normal. Hint: Use Theorem 36.4 and the remark following the statement of Lindeberg's Theorem 27.2.

35.20. 24.4 ↑ Suppose that the σ-field G in Problem 24.4 is trivial. Deduce from Theorem 35.9 that P[A || T^{−n}F] → P[A || G] = P(A) with probability 1, and conclude that T is mixing.
CHAPTER 7

Stochastic Processes

SECTION 36. KOLMOGOROV'S EXISTENCE THEOREM

Stochastic Processes

A stochastic process is a collection [X_t: t ∈ T] of random variables on a probability space (Ω, F, P). The sequence of gambler's fortunes in Section 7, the sequences of independent random variables in Section 22, the martingales in Section 35: all these are stochastic processes for which T = {1, 2, ...}. For the Poisson process [N_t: t ≥ 0] of Section 23, T = [0, ∞). For all these processes the points of T are thought of as representing time. In most cases, T is the set of integers and time is discrete, or else T is an interval of the line and time is continuous. For the general theory of this section, however, T can be quite arbitrary.

Finite-Dimensional Distributions

A process is usually described in terms of the distributions it induces in Euclidean spaces. For each k-tuple (t_1, ..., t_k) of distinct elements of T, the random vector (X_{t_1}, ..., X_{t_k}) has over R^k some distribution μ_{t_1...t_k}:

(36.1)  μ_{t_1...t_k}(H) = P[(X_{t_1}, ..., X_{t_k}) ∈ H],  H ∈ 𝓡^k.

These probability measures μ_{t_1...t_k} are the finite-dimensional distributions of the stochastic process [X_t: t ∈ T]. The system of finite-dimensional distributions does not completely determine the properties of the process. For example, the Poisson process [N_t: t ≥ 0] as defined by (23.5) has sample paths (functions N_t(ω) with ω fixed and t varying) that are step functions. But (23.28) defines a process that has the same finite-dimensional distributions and sample paths that are not step functions. Nevertheless, the first step in a general theory is to construct processes for given systems of finite-dimensional distributions.
Now (36.1) implies two consistency properties of the system μ_{t_1...t_k}. Suppose the H in (36.1) has the form H = H_1 × ··· × H_k (H_i ∈ 𝓡¹), and consider a permutation π of (1, 2, ..., k). Since [(X_{t_1}, ..., X_{t_k}) ∈ (H_1 × ··· × H_k)] and [(X_{t_{π1}}, ..., X_{t_{πk}}) ∈ (H_{π1} × ··· × H_{πk})] are the same event, it follows by (36.1) that

(36.2)  μ_{t_1...t_k}(H_1 × ··· × H_k) = μ_{t_{π1}...t_{πk}}(H_{π1} × ··· × H_{πk}).

For example, if μ_{s,t} = ν × ν′, then necessarily μ_{t,s} = ν′ × ν. The second consistency condition is

(36.3)  μ_{t_1...t_{k−1}}(H_1 × ··· × H_{k−1}) = μ_{t_1...t_k}(H_1 × ··· × H_{k−1} × R¹).

This is clear because (X_{t_1}, ..., X_{t_{k−1}}) lies in H_1 × ··· × H_{k−1} if and only if (X_{t_1}, ..., X_{t_{k−1}}, X_{t_k}) lies in H_1 × ··· × H_{k−1} × R¹.

Measures μ_{t_1...t_k} coming from a process [X_t: t ∈ T] via (36.1) necessarily satisfy (36.2) and (36.3). Kolmogorov's existence theorem says conversely that if a given system of measures satisfies the two consistency conditions, then there exists a stochastic process having these finite-dimensional distributions. The proof is a construction, one which is more easily understood if (36.2) and (36.3) are combined into a single condition.

Define φ_π: R^k → R^k by

φ_π(x_1, ..., x_k) = (x_{π^{−1}1}, ..., x_{π^{−1}k});

φ_π applies the permutation π to the coordinates (for example, if π sends x_3 to first position, then π^{−1}1 = 3). Since φ_π^{−1}(H_1 × ··· × H_k) = H_{π1} × ··· × H_{πk}, it follows from (36.2) that

μ_{t_1...t_k}(H) = μ_{t_{π1}...t_{πk}}(φ_π^{−1} H)

for rectangles H. But then

(36.4)  μ_{t_1...t_k} = μ_{t_{π1}...t_{πk}} φ_π^{−1}.

Similarly, if φ: R^k → R^{k−1} is the projection φ(x_1, ..., x_k) = (x_1, ..., x_{k−1}), then (36.3) is the same thing as

(36.5)  μ_{t_1...t_{k−1}} = μ_{t_1...t_k} φ^{−1}.

The conditions (36.4) and (36.5) have a common extension. Suppose that (u_1, ..., u_m) is an m-tuple of distinct elements of T and that each element of (t_1, ..., t_k) is also an element of (u_1, ..., u_m). Then (t_1, ..., t_k) must be the initial segment of some permutation of (u_1, ..., u_m); that is, k ≤ m and there
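Both consistency conditions are finitary and can be verified mechanically for a small discrete process; in the sketch below (the two-coin process, the dictionary representation, and the helper name are all invented for illustration), mu plays the role of (36.1), and the two assertions are (36.2) and (36.3).

```python
from itertools import product

def mu(indices, joint):
    """Finite-dimensional distribution of (X_t : t in indices), extracted
    from a joint law given as a dict mapping outcome tuples to probabilities."""
    out = {}
    for omega, p in joint.items():
        key = tuple(omega[t] for t in indices)
        out[key] = out.get(key, 0.0) + p
    return out

# Joint law of (X_1, X_2) with X_1 a fair coin and X_2 = X_1 + another coin.
joint = {}
for c1, c2 in product((0, 1), repeat=2):
    key = (c1, c1 + c2)
    joint[key] = joint.get(key, 0.0) + 0.25

mu12, mu21 = mu((0, 1), joint), mu((1, 0), joint)
# (36.2): permuting the time indices permutes the coordinates of the law.
assert all(abs(p - mu21.get((b, a), 0.0)) < 1e-12 for (a, b), p in mu12.items())

# (36.3): integrating out the last coordinate gives the lower-order law.
marg = {}
for (a, b), p in mu12.items():
    marg[(a,)] = marg.get((a,), 0.0) + p
assert all(abs(marg[k] - v) < 1e-12 for k, v in mu((0,), joint).items())
```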
... (x_1, ..., x_k) of real numbers, and so R^T can be identified with k-dimensional Euclidean space R^k. If T = {1, 2, ...}, a real function on T is a sequence (x_1, x_2, ...) of real numbers. If T is an interval, R^T consists of all real functions, however irregular, on the interval. The theory of R^T is an elaboration of the theory of the analogous but simpler space S^∞ of Section 2 (p. 27).

Whatever the set T may be, an element of R^T will be denoted x. The value of x at t will be denoted x(t) or x_t, depending on whether x is viewed as a function of t with domain T or as a vector with components indexed by the elements t of T. Just as R^k can be regarded as the Cartesian product of k copies of the real line, R^T can be regarded as a product space, a product of copies of the real line, one copy for each t in T.

For each t define a mapping Z_t: R^T → R¹ by

(36.8)  Z_t(x) = x(t) = x_t.

The Z_t are called the coordinate functions or projections. When later on a probability measure has been defined on R^T, the Z_t will be random variables, the coordinate variables. Frequently, the value Z_t(x) is instead denoted Z(t, x). If x is fixed, Z(·, x) is a real function on T and is, in fact, nothing other than x(·), that is, x itself. If t is fixed, Z(t, ·) is a real function on R^T and is identical with the function Z_t defined by (36.8).
There is a natural generalization to R^T of the idea of the σ-field of k-dimensional Borel sets. Let 𝓡^T be the σ-field generated by all the coordinate functions Z_t, t ∈ T: 𝓡^T = σ[Z_t: t ∈ T]. It is generated by the sets of the form

[x ∈ R^T: Z_t(x) ∈ H]

for t ∈ T and H ∈ 𝓡¹. If T = {1, 2, ..., k}, then 𝓡^T coincides with 𝓡^k.

Consider the class 𝓡_0^T consisting of the sets of the form

(36.9)  A = [x ∈ R^T: (Z_{t_1}(x), ..., Z_{t_k}(x)) ∈ H] = [x ∈ R^T: (x_{t_1}, ..., x_{t_k}) ∈ H],

where k is an integer, (t_1, ..., t_k) is a k-tuple of distinct points of T, and H ∈ 𝓡^k. Sets of this form, elements of 𝓡_0^T, are called finite-dimensional sets, or cylinders. Of course, 𝓡_0^T generates 𝓡^T. Now 𝓡_0^T is not a σ-field, does not coincide with 𝓡^T (unless T is finite), but the following argument shows that it is a field.
[Figure: If T is an interval, the cylinder [x ∈ R^T: α_1 < x(t_1) < β_1, α_2 < x(t_2) < β_2] consists of the functions that go through the two gates shown; y lies in the cylinder and z does not (they need not be continuous functions, of course).]
The complement of (36.9) is R^T − A = [x ∈ R^T: (x_{t_1}, ..., x_{t_k}) ∈ R^k − H], and so 𝓡_0^T is closed under complementation. Suppose that A is given by (36.9) and B is given by

(36.10)  B = [x ∈ R^T: (x_{s_1}, ..., x_{s_j}) ∈ I],

where I ∈ 𝓡^j. Let (u_1, ..., u_m) be an m-tuple containing all the t_α and all the s_β. Now (t_1, ..., t_k) must be the initial segment of some permutation of (u_1, ..., u_m), and if ψ is as in (36.6) and H′ = ψ^{−1}H, then H′ ∈ 𝓡^m and A is given by

(36.11)  A = [x ∈ R^T: (x_{u_1}, ..., x_{u_m}) ∈ H′]

as well as by (36.9). Similarly, B can be put in the form

(36.12)  B = [x ∈ R^T: (x_{u_1}, ..., x_{u_m}) ∈ I′],

where I′ ∈ 𝓡^m. But then

(36.13)  A ∪ B = [x ∈ R^T: (x_{u_1}, ..., x_{u_m}) ∈ H′ ∪ I′].

Since H′ ∪ I′ ∈ 𝓡^m, A ∪ B is a cylinder. This proves that 𝓡_0^T is a field such that 𝓡^T = σ(𝓡_0^T).

The Z_t are measurable functions on the measurable space (R^T, 𝓡^T). If P is a probability measure on 𝓡^T, then [Z_t: t ∈ T] is a stochastic process on (R^T, 𝓡^T, P), the coordinate-variable process.

Kolmogorov's Existence Theorem
If J.L 1 1 1 k are a system of distributions satisfying the consistency conditions (36.2) ana (36.3), then there is a probability measure P on � T such that the coordinate-variable process [Z 1: t E T] on (R T, � T, P) has the f.L 1 1 1k as its finite-dimensional distributions. Thecrem 36.2. Tf J.L 1 1 • 1 k are a system of distributions satisfying the consis tency conditions (36.2) and (36.3), then there exists on some probability space (f!, !F, P) a stochastic process [X1 : t E T] having the Jl- 1 1 • 1 k as its finite dimensional distributions. Theorem 36.1.
For many purposes the underlying probability space is irrelevant, the joint distributions of the variables in the process being all that matters, so that the two theorems are equally useful. As a matter of fact, they are equivalent. Obviously, the first implies the second. To prove the converse, suppose that the process $[X_t\colon t \in T]$ on $(\Omega, \mathscr{F}, P)$ has finite-dimensional distributions $\mu_{t_1 \cdots t_k}$, and define a map $\zeta\colon \Omega \to R^T$ by the requirement

(36.14) $Z_t(\zeta(\omega)) = X_t(\omega), \qquad t \in T.$

For each $\omega$, $\zeta(\omega)$ is an element of $R^T$, a real function on $T$, and the
requirement is that $X_t(\omega)$ be its value at $t$. Clearly,
(36.15) $\zeta^{-1}[x \in R^T\colon (Z_{t_1}(x), \ldots, Z_{t_k}(x)) \in H] = [\omega \in \Omega\colon (Z_{t_1}(\zeta(\omega)), \ldots, Z_{t_k}(\zeta(\omega))) \in H] = [\omega \in \Omega\colon (X_{t_1}(\omega), \ldots, X_{t_k}(\omega)) \in H];$

since the $X_t$ are random variables, measurable $\mathscr{F}$, this set lies in $\mathscr{F}$ if $H \in \mathscr{R}^k$. Thus $\zeta^{-1}A \in \mathscr{F}$ for $A \in \mathscr{R}_0^T$, and so (Theorem 13.1) $\zeta$ is measurable $\mathscr{F}/\mathscr{R}^T$. By (36.15) and the assumption that $[X_t\colon t \in T]$ has finite-dimensional distributions $\mu_{t_1 \cdots t_k}$, $P\zeta^{-1}$ (see (13.7)) satisfies

(36.16) $P\zeta^{-1}[x \in R^T\colon (Z_{t_1}(x), \ldots, Z_{t_k}(x)) \in H] = P[\omega \in \Omega\colon (X_{t_1}(\omega), \ldots, X_{t_k}(\omega)) \in H] = \mu_{t_1 \cdots t_k}(H).$

Thus the coordinate-variable process $[Z_t\colon t \in T]$ on $(R^T, \mathscr{R}^T, P\zeta^{-1})$ also has finite-dimensional distributions $\mu_{t_1 \cdots t_k}$. Therefore, to prove either of the two versions of Kolmogorov's existence theorem is to prove the other one as well.

Example 36.1. Suppose that $T$ is finite, say $T = \{1, 2, \ldots, k\}$. Then $(R^T, \mathscr{R}^T)$ is $(R^k, \mathscr{R}^k)$, and taking $P = \mu_{1,2,\ldots,k}$ satisfies the requirements of Theorem 36.1.
Example 36.2. Suppose that $T = \{1, 2, \ldots\}$ and

(36.17) $\mu_{1, \ldots, k} = \mu_1 \times \cdots \times \mu_k,$

where $\mu_1, \mu_2, \ldots$ are probability distributions on the line. The consistency conditions are easily checked, and the probability measure $P$ guaranteed by Theorem 36.1 is product measure on the product space $(R^T, \mathscr{R}^T)$. But by Theorem 20.4 there exists on some $(\Omega, \mathscr{F}, P)$ an independent sequence $X_1, X_2, \ldots$ of random variables with respective distributions $\mu_1, \mu_2, \ldots$; then (36.17) is the distribution of $(X_1, \ldots, X_k)$. For the special case (36.17), Theorem 36.2 (and hence Theorem 36.1) was thus proved in Section 20. The existence of independent sequences with prescribed distributions was the measure-theoretic basis of all the probabilistic developments in Chapters 4, 5, and 6: even dependent processes like the Poisson were constructed from independent sequences. The existence of independent sequences can also be made the basis of a proof of Theorems 36.1 and 36.2 in their full generality; see the second proof below.
Example 36.3. The preceding example has an analogue in the space $S^\infty$ of sequences (2.15). Here the finite set $S$ plays the role of $R^1$, the $z_n(\cdot)$ are analogues of the $Z_n(\cdot)$, and the product measure defined by (2.21) is the analogue of the product measure specified by (36.17) with $\mu_i = \mu$. See also Example 24.2. The theory for $S^\infty$ is simple because $S$ is finite: see Theorem 2.3 and the lemma it depends on.

Example 36.4. If $T$ is a subset of the line, it is convenient to use the order structure of the line and take the $\mu_{s_1 \cdots s_k}$ to be specified initially only for $k$-tuples $(s_1, \ldots, s_k)$ that are in increasing order:

(36.18) $s_1 < s_2 < \cdots < s_k.$
It is natural, for example, to specify the finite-dimensional distributions for the Poisson process for increasing sequences of time points alone; see (23.27). Assume that the $\mu_{s_1 \cdots s_k}$ for $k$-tuples satisfying (36.18) have the consistency property (36.19). For given $s_1, \ldots, s_k$ satisfying (36.18), take $(X_{s_1}, \ldots, X_{s_k})$ to have distribution $\mu_{s_1 \cdots s_k}$. If $t_1, \ldots, t_k$ is a permutation of $s_1, \ldots, s_k$, take $\mu_{t_1 \cdots t_k}$ to be the distribution of $(X_{t_1}, \ldots, X_{t_k})$:

(36.20) $\mu_{t_1 \cdots t_k}(H) = P[(X_{t_1}, \ldots, X_{t_k}) \in H].$

This unambiguously defines a collection of finite-dimensional distributions. Are they consistent? If $t_{\pi 1}, \ldots, t_{\pi k}$ is a permutation of $t_1, \ldots, t_k$, then it is also a permutation of $s_1, \ldots, s_k$, and by the definition (36.20), $\mu_{t_{\pi 1} \cdots t_{\pi k}}$ is the distribution of $(X_{t_{\pi 1}}, \ldots, X_{t_{\pi k}})$.

Suppose that $A_1 \supset A_2 \supset \cdots$ and $P(A_n) \ge \epsilon > 0$ for all $n$. The problem is to show that $\bigcap_n A_n$ must be nonempty. Since $A_n \in \mathscr{R}_0^T$, and since the index set involved in the specification of a cylinder can always be permuted and
expanded, there exists a sequence $t_1, t_2, \ldots$ of points in $T$ for which

$A_n = [x \in R^T\colon (x_{t_1}, \ldots, x_{t_n}) \in H_n],$

where† $H_n \in \mathscr{R}^n$. Of course, $P(A_n) = \mu_{t_1 \cdots t_n}(H_n)$. By Theorem 12.3 (regularity), there exists inside $H_n$ a compact set $K_n$ such that $\mu_{t_1 \cdots t_n}(H_n - K_n) < \epsilon/2^{n+1}$. If $B_n = [x \in R^T\colon (x_{t_1}, \ldots, x_{t_n}) \in K_n]$, then $P(A_n - B_n) < \epsilon/2^{n+1}$. Put $C_n = \bigcap_{k=1}^n B_k$. Then $C_n \subset B_n \subset A_n$ and $P(A_n - C_n) < \epsilon/2$, so that $P(C_n) > \epsilon/2 > 0$. Therefore, $C_n \subset C_{n-1}$ and $C_n$ is nonempty.

Choose a point $x^{(n)}$ of $R^T$ in $C_n$. If $n \ge k$, then $x^{(n)} \in C_n \subset C_k \subset B_k$, and hence $(x^{(n)}_{t_1}, \ldots, x^{(n)}_{t_k}) \in K_k$. Since $K_k$ is bounded, the sequence $\{x^{(1)}_{t_k}, x^{(2)}_{t_k}, \ldots\}$ is bounded for each $k$. By the diagonal method [A14] select an increasing sequence $n_1, n_2, \ldots$ of integers such that $\lim_i x^{(n_i)}_{t_k}$ exists for each $k$. There is in $R^T$ some point $x$ whose $t_k$th coordinate is this limit for each $k$. But then, for each $k$, $(x_{t_1}, \ldots, x_{t_k})$ is the limit as $i \to \infty$ of $(x^{(n_i)}_{t_1}, \ldots, x^{(n_i)}_{t_k})$ and hence lies in $K_k$. But that means that $x$ itself lies in $B_k$ and hence in $A_k$. Thus $x \in \bigcap_{k=1}^\infty A_k$, which completes the proof.‡
The second proof of Kolmogorov's theorem goes in two stages, first for countable T, then for general T.*
SECOND PROOF FOR COUNTABLE $T$. The result for countable $T$ will be proved in its second formulation, Theorem 36.2. It is no restriction to enumerate $T$ as $\{t_1, t_2, \ldots\}$ and then to identify $t_n$ with $n$; in other words, it is no restriction to assume that $T = \{1, 2, \ldots\}$. Write $\mu_n$ in place of $\mu_{1,2,\ldots,n}$. By Theorem 20.4 there exists on a probability space $(\Omega, \mathscr{F}, P)$ (which can be taken to be the unit interval) an independent sequence $U_1, U_2, \ldots$ of random variables each uniformly distributed over $(0,1)$. Let $F_1$ be the distribution function corresponding to $\mu_1$. If the "inverse" $g_1$ of $F_1$ is defined over $(0,1)$ by $g_1(s) = \inf[x\colon s \le F_1(x)]$, then $X_1 = g_1(U_1)$ has distribution $\mu_1$ by the usual argument: $P[g_1(U_1) \le x] = P[U_1 \le F_1(x)] = F_1(x)$.
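The quantile argument is easy to check numerically. In the sketch below (the helper name `quantile` is ours, not the text's), $g(u) = \inf[x\colon u \le F(x)]$ is computed by bisection for the exponential distribution function, and the empirical distribution of $g(U)$ is compared with $F$:

```python
import random
import math

def quantile(F, u, lo=0.0, hi=1e6, tol=1e-9):
    """Numerical inverse g(u) = inf[x : u <= F(x)] for a continuous,
    increasing distribution function F, located by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if F(mid) >= u:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

# Exponential distribution function F(x) = 1 - e^{-x}, x >= 0.
F = lambda x: 1.0 - math.exp(-x)

random.seed(0)
n = 20000
sample = [quantile(F, random.random()) for _ in range(n)]

# g(U) should have distribution function F: compare at x = 1.
empirical = sum(1 for x in sample if x <= 1.0) / n
print(empirical, F(1.0))
```

The bisection stands in for the infimum in the definition of $g_1$; for distribution functions with a closed-form inverse one would of course invert directly.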
The problem is to construct $X_2, X_3, \ldots$ inductively in such a way that

(36.23) $X_k = h_k(U_1, \ldots, U_k)$

for a Borel function $h_k$, and $(X_1, \ldots, X_n)$ has the distribution $\mu_n$. Assume that $X_1, \ldots, X_{n-1}$ have been defined ($n \ge 2$): they have joint distribution $\mu_{n-1}$ and (36.23) holds for $k \le n-1$. The idea now is to construct an appropriate conditional distribution function $F_n(x \mid x_1, \ldots, x_{n-1})$; here $F_n(x \mid X_1(\omega), \ldots, X_{n-1}(\omega))$ will have the value $P[X_n \le x \,\|\, X_1, \ldots, X_{n-1}]_\omega$ would have if $X_n$ were already defined. If $g_n(\cdot \mid x_1, \ldots, x_{n-1})$

†In general, $A_n$ will involve indices $t_1, \ldots, t_{a_n}$, where $a_1 \le a_2 \le \cdots$. For notational simplicity $a_n$ is taken as $n$. As a matter of fact, this can be arranged anyway: take $A'_{a_n} = A_n$, $A'_k = [x\colon (x_{t_1}, \ldots, x_{t_k}) \in R^k] = R^T$ for $k < a_1$, and $A'_k = [x\colon (x_{t_1}, \ldots, x_{t_k}) \in H_n \times R^{k-a_n}] = A_n$ for $a_n \le k < a_{n+1}$. Now relabel $A'_n$ as $A_n$.

‡The last part of the argument is, in effect, the proof that a countable product of compact sets is compact.

*This second proof, which may be omitted, uses the conditional-probability theory of Section 33.
is the "inverse" function, then $X_n(\omega) = g_n(U_n(\omega) \mid X_1(\omega), \ldots, X_{n-1}(\omega))$ will by the usual argument have the right conditional distribution given $X_1, \ldots, X_{n-1}$, so that $(X_1, \ldots, X_{n-1}, X_n)$ will have the right distribution over $R^n$.

To construct the conditional distribution function, apply Theorem 33.3 in $(R^n, \mathscr{R}^n, \mu_n)$ to get a conditional distribution of the last coordinate of $(x_1, \ldots, x_n)$ given the first $n-1$ of them. This will have (Theorem 20.1) the form $\nu(H; x_1, \ldots, x_{n-1})$; it is a probability measure as $H$ varies over $\mathscr{R}^1$, and the defining property is that integrating it over $M$ with respect to $\mu_n$ recovers $\mu_n[x \in R^n\colon (x_1, \ldots, x_{n-1}) \in M,\ x_n \in H]$. Since the integrand involves only $x_1, \ldots, x_{n-1}$, and since $\mu_n$ by consistency projects to $\mu_{n-1}$ under the map $(x_1, \ldots, x_n) \to (x_1, \ldots, x_{n-1})$, a change of variable gives

$\int_M \nu(H; x_1, \ldots, x_{n-1})\,d\mu_{n-1}(x_1, \ldots, x_{n-1}) = \mu_n[x \in R^n\colon (x_1, \ldots, x_{n-1}) \in M,\ x_n \in H].$

Define $F_n(x \mid x_1, \ldots, x_{n-1}) = \nu((-\infty, x]; x_1, \ldots, x_{n-1})$. Then $F_n(\cdot \mid x_1, \ldots, x_{n-1})$ is a probability distribution function over the line, $F_n(x \mid \cdot)$ is a Borel function over $R^{n-1}$, and

$\int_M F_n(x \mid x_1, \ldots, x_{n-1})\,d\mu_{n-1}(x_1, \ldots, x_{n-1}) = \mu_n[x \in R^n\colon (x_1, \ldots, x_{n-1}) \in M,\ x_n \le x].$

Put $g_n(u \mid x_1, \ldots, x_{n-1}) = \inf[x\colon u \le F_n(x \mid x_1, \ldots, x_{n-1})]$ for $0 < u < 1$. Since $F_n(x \mid x_1, \ldots, x_{n-1})$ is nondecreasing and right-continuous in $x$, $g_n(u \mid x_1, \ldots, x_{n-1}) \le x$ if and only if $u \le F_n(x \mid x_1, \ldots, x_{n-1})$. Set $X_n = g_n(U_n \mid X_1, \ldots, X_{n-1})$. Since $(X_1, \ldots, X_{n-1})$ has distribution $\mu_{n-1}$ and by (36.23) is independent of $U_n$, an application of (20.30) gives

$P[(X_1, \ldots, X_{n-1}) \in M,\ X_n \le x] = P[(X_1, \ldots, X_{n-1}) \in M,\ U_n \le F_n(x \mid X_1, \ldots, X_{n-1})] = \int_M F_n(x \mid x_1, \ldots, x_{n-1})\,d\mu_{n-1}(x_1, \ldots, x_{n-1}) = \mu_n[x \in R^n\colon (x_1, \ldots, x_{n-1}) \in M,\ x_n \le x].$

Thus $(X_1, \ldots, X_n)$ has distribution $\mu_n$. Note that $X_n$, as a function of $X_1, \ldots, X_{n-1}$ and $U_n$, is a function of $U_1, \ldots, U_n$ because (36.23) was assumed to hold for $k \le n-1$. Hence (36.23) holds for $k = n$ as well.
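The inductive step $X_n = g_n(U_n \mid X_1, \ldots, X_{n-1})$ can be illustrated with a toy two-step case in which the conditional quantile function is explicit. The distributions below are our own choice, not ones from the text: $X_1$ is uniform on $(0,1)$, and $X_2$ is conditionally exponential with rate $1 + X_1$, so $g_2(u \mid x_1) = -\log(1-u)/(1+x_1)$:

```python
import random
import math

def g1(u):
    # Quantile of mu_1 = Uniform(0,1): F1(x) = x, so g1(u) = u.
    return u

def g2(u, x1):
    # Conditional quantile of X2 given X1 = x1, chosen here to be
    # exponential with rate (1 + x1): F2(t | x1) = 1 - exp(-(1 + x1) t),
    # so g2(u | x1) = -log(1 - u) / (1 + x1).
    return -math.log(1.0 - u) / (1.0 + x1)

random.seed(1)
n = 50000
pairs = []
for _ in range(n):
    u1, u2 = random.random(), random.random()
    x1 = g1(u1)         # X1 = g1(U1)
    x2 = g2(u2, x1)     # X2 = g2(U2 | X1), hence a function of (U1, U2)
    pairs.append((x1, x2))

mean_x2 = sum(x2 for _, x2 in pairs) / n
# E[X2] = E[E[X2 | X1]] = E[1/(1 + X1)] = log 2
print(mean_x2, math.log(2.0))
```

Each $X_2$ is a Borel function of $(U_1, U_2)$, exactly the form (36.23) requires, and the sample mean agrees with $E[1/(1+X_1)] = \log 2$.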
SECOND PROOF FOR GENERAL $T$. Consider $(R^T, \mathscr{R}^T)$ once again. If $S \subset T$, let $\mathscr{R}_S^T = \sigma[Z_t\colon t \in S]$. Then $\mathscr{R}_S^T \subset \mathscr{R}^T$. Suppose that $S$ is countable. By the case just treated, there exists a process $[X_t\colon t \in S]$ on some $(\Omega, \mathscr{F}, P)$ (the space and the process depend on $S$) such that $(X_{t_1}, \ldots, X_{t_k})$ has distribution $\mu_{t_1 \cdots t_k}$ for every $k$-tuple $(t_1, \ldots, t_k)$ from $S$. Define a map $\zeta\colon \Omega \to R^T$ by requiring that

$Z_t(\zeta(\omega)) = \begin{cases} X_t(\omega) & \text{if } t \in S, \\ 0 & \text{if } t \notin S. \end{cases}$

Now (36.15) holds as before if $t_1, \ldots, t_k$ all lie in $S$, and so $\zeta$ is measurable $\mathscr{F}/\mathscr{R}_S^T$. Further, (36.16) holds for $t_1, \ldots, t_k$ in $S$. Put $P_S = P\zeta^{-1}$ on $\mathscr{R}_S^T$. Then $P_S$ is a probability measure on $(R^T, \mathscr{R}_S^T)$, and

(36.24) $P_S[x \in R^T\colon (Z_{t_1}(x), \ldots, Z_{t_k}(x)) \in H] = \mu_{t_1 \cdots t_k}(H)$

if $H \in \mathscr{R}^k$ and $t_1, \ldots, t_k$ all lie in $S$. (The various spaces $(\Omega, \mathscr{F}, P)$ and processes $[X_t\colon t \in S]$ now become irrelevant.) If $S_0 \subset S_1 \cap S_2$, and if $A$ is a cylinder (36.9) for which the $t_1, \ldots, t_k$ lie in $S_0$, then $P_{S_1}(A)$ and $P_{S_2}(A)$ coincide, their common value being $\mu_{t_1 \cdots t_k}(H)$. Since these cylinders generate $\mathscr{R}_{S_0}^T$, $P_{S_1}(A) = P_{S_2}(A)$ for all $A$ in $\mathscr{R}_{S_0}^T$. If $A$ lies both in $\mathscr{R}_{S_1}^T$ and $\mathscr{R}_{S_2}^T$, then $P_{S_1}(A) = P_{S_1 \cup S_2}(A) = P_{S_2}(A)$. Thus $P(A) = P_S(A)$ consistently defines a set function on the class $\bigcup_S \mathscr{R}_S^T$, the union extending over the countable subsets $S$ of $T$. If $A_n$ lies in this union and $A_n \in \mathscr{R}_{S_n}^T$ ($S_n$ countable), then $S = \bigcup_n S_n$ is countable and $\bigcup_n A_n$ lies in $\mathscr{R}_S^T$. Thus $\bigcup_S \mathscr{R}_S^T$ is a $\sigma$-field and so must coincide with $\mathscr{R}^T$. Therefore, $P$ is a probability measure on $\mathscr{R}^T$, and by (36.24) the coordinate process has under $P$ the required finite-dimensional distributions.
The Inadequacy of $\mathscr{R}^T$

Theorem 36.3. Let $[X_t\colon t \in T]$ be a family of real functions on $\Omega$.

(i) If $A \in \sigma[X_t\colon t \in T]$ and $\omega \in A$, and if $X_t(\omega) = X_t(\omega')$ for all $t \in T$, then $\omega' \in A$.

(ii) If $A \in \sigma[X_t\colon t \in T]$, then $A \in \sigma[X_t\colon t \in S]$ for some countable subset $S$ of $T$.
PROOF. Define $\zeta\colon \Omega \to R^T$ by $Z_t(\zeta(\omega)) = X_t(\omega)$. Let $\mathscr{F} = \sigma[X_t\colon t \in T]$. By (36.15), $\zeta$ is measurable $\mathscr{F}/\mathscr{R}^T$, and hence $\mathscr{F}$ contains the class $[\zeta^{-1}M\colon M \in \mathscr{R}^T]$. The latter class is a $\sigma$-field, however, and by (36.15) it contains the sets $[\omega \in \Omega\colon (X_{t_1}(\omega), \ldots, X_{t_k}(\omega)) \in H]$, $H \in \mathscr{R}^k$, and hence contains the $\sigma$-field $\mathscr{F}$ they generate. Therefore,

(36.25) $\sigma[X_t\colon t \in T] = [\zeta^{-1}M\colon M \in \mathscr{R}^T].$

This is an infinite-dimensional analogue of Theorem 20.1(i).
As for (i), the hypotheses imply that $\omega \in A = \zeta^{-1}M$ and $\zeta(\omega) = \zeta(\omega')$, so that $\omega' \in A$ certainly follows.

For $S \subset T$, let $\mathscr{F}_S = \sigma[X_t\colon t \in S]$; (ii) says that $\mathscr{F} = \mathscr{F}_T$ coincides with $\mathscr{G} = \bigcup_S \mathscr{F}_S$, the union extending over the countable subsets $S$ of $T$. If $A_1, A_2, \ldots$ lie in $\mathscr{G}$, then $A_n$ lies in $\mathscr{F}_{S_n}$ for some countable $S_n$, and so $\bigcup_n A_n$ lies in $\mathscr{G}$ because it lies in $\mathscr{F}_S$ for $S = \bigcup_n S_n$. Thus $\mathscr{G}$ is a $\sigma$-field, and since it contains the sets $[X_t \in H]$, it contains the $\sigma$-field $\mathscr{F}$ they generate. (This part of the argument was used in the second proof of the existence theorem.)
From this theorem it follows that various important sets lie outside the class $\mathscr{R}^T$. Suppose that $T = [0, \infty)$. Of obvious interest is the subset $C$ of $R^T$ consisting of the functions continuous over $[0, \infty)$. But $C$ is not in $\mathscr{R}^T$. For suppose it were. By part (ii) of the theorem (let $\Omega = R^T$ and put $[Z_t\colon t \in T]$ in the role of $[X_t\colon t \in T]$), $C$ would lie in $\sigma[Z_t\colon t \in S]$ for some countable $S \subset [0, \infty)$. But then by part (i) of the theorem (let $\Omega = R^T$ and put $[Z_t\colon t \in S]$ in the role of $[X_t\colon t \in T]$), if $x \in C$ and $Z_t(x) = Z_t(y)$ for all $t \in S$, then $y \in C$. From the assumption that $C$ lies in $\mathscr{R}^T$ thus follows the existence of a countable set $S$ such that, if $x \in C$ and $x(t) = y(t)$ for all $t$ in $S$, then $y \in C$. But whatever countable set $S$ may be, for every continuous $x$ there obviously exist functions $y$ that have discontinuities but agree with $x$ on $S$. Therefore, $C$ cannot lie in $\mathscr{R}^T$.

What the argument shows is this: A set $A$ in $R^T$ cannot lie in $\mathscr{R}^T$ unless there exists a countable subset $S$ of $T$ with the property that, if $x \in A$ and $x(t) = y(t)$ for all $t$ in $S$, then $y \in A$. Thus $A$ cannot lie in $\mathscr{R}^T$ if it effectively involves all the points $t$ in the sense that, for each $x$ in $A$ and each $t$ in $T$, it is possible to move $x$ out of $A$ by changing its value at $t$ alone. And $C$ is such a set.

For another example, consider the set of functions $x$ over $T = [0, \infty)$ that are nondecreasing and assume as values $x(t)$ only nonnegative integers:

(36.26) $[x \in R^{[0,\infty)}\colon x(s) \le x(t),\ s \le t;\ x(t) \in \{0, 1, \ldots\},\ t \ge 0].$
This, too, lies outside $\mathscr{R}^T$.

In Section 23 the Poisson process was defined as follows: Let $X_1, X_2, \ldots$ be independent and identically distributed with the exponential distribution (the probability space $\Omega$ on which they are defined may by Theorem 20.4 be taken to be the unit interval with Lebesgue measure). Put $S_0 = 0$ and $S_n = X_1 + \cdots + X_n$. If $S_n(\omega) < S_{n+1}(\omega)$ for $n \ge 0$ and $S_n(\omega) \to \infty$, put $N(t, \omega) = N_t(\omega) = \max[n\colon S_n(\omega) \le t]$ for $t \ge 0$; otherwise, put $N(t, \omega) = N_t(\omega) = 0$ for $t \ge 0$. Then the stochastic process $[N_t\colon t \ge 0]$ has the finite-dimensional distributions described by the equations (23.27). The function $N(\cdot, \omega)$ is the path function or sample function† corresponding to $\omega$, and by the construction every path function lies in the set (36.26). This is a good thing if the

†Other terms are realization of the process and trajectory.
process is to be a model for, say, calls arriving at a telephone exchange: The sample path represents the history of the calls, its value at $t$ being the number of arrivals up to time $t$, and so it ought to be nondecreasing and integer-valued.

According to Theorem 36.1, there exists for $T = [0, \infty)$ a measure $P$ on $\mathscr{R}^T$ such that the coordinate process $[Z_t\colon t \ge 0]$ on $(R^T, \mathscr{R}^T, P)$ has the finite-dimensional distributions of the Poisson process. This time, does the path function $Z(\cdot, x)$ lie in the set (36.26) with probability 1? Since $Z(\cdot, x)$ is just $x$ itself, the question is whether the set (36.26) has $P$-measure 1. But this set does not lie in $\mathscr{R}^T$, and so it has no measure at all. An application of Kolmogorov's existence theorem will always yield a stochastic process with prescribed finite-dimensional distributions, but the process may lack certain path-function properties that it is reasonable to require of it as a model for some natural phenomenon. The special construction of Section 23 gets around the difficulty in the case of the Poisson process.
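The Section 23 construction is easy to carry out numerically. In the sketch below (function names are ours), the path $N(\cdot, \omega)$ built from exponential interarrival times is, by construction, a member of the set (36.26): nondecreasing and nonnegative-integer-valued.

```python
import random

random.seed(2)

def poisson_path(rate, horizon, rng):
    """Arrival times S_n = X_1 + ... + X_n from i.i.d. exponential
    interarrival times; N(t) = max[n : S_n <= t]."""
    arrivals = []
    s = 0.0
    while True:
        s += rng.expovariate(rate)
        if s > horizon:
            break
        arrivals.append(s)

    def N(t):
        return sum(1 for a in arrivals if a <= t)

    return N, arrivals

N, arrivals = poisson_path(rate=1.0, horizon=10.0, rng=random)

# Sample the path on a grid: it is nondecreasing and integer-valued.
values = [N(t / 10.0) for t in range(0, 101)]
print(values[:10])
```

This is exactly the path-function property that the coordinate process obtained from Theorem 36.1 alone cannot be said to have.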
Stationarity and Ergodicity*

Consider the space $(R^\infty, \mathscr{R}^\infty)$ of two-sided sequences $x = (\ldots, Z_{-1}(x), Z_0(x), Z_1(x), \ldots)$, with the shift $T$ defined by $Z_k(Tx) = Z_{k+1}(x)$. Let $X = (\ldots, X_{-1}, X_0, X_1, \ldots)$ be a two-sided sequence of random variables on $(\Omega, \mathscr{F}, P)$, and define $\zeta\colon \Omega \to R^\infty$ by (36.14). The measure $P\zeta^{-1} = PX^{-1}$ can be viewed as the distribution of $X$. Suppose that $X$ is stationary in the sense that, for each $k \ge 1$ and $H \in \mathscr{R}^k$, $P[(X_{n+1}, \ldots, X_{n+k}) \in H]$ is the same for all $n = 0, \pm 1, \pm 2, \ldots$. Then (use (36.16) and the lemma on p. 311) the shift preserves $P\zeta^{-1}$. The process $X$ is defined to be ergodic if under $P\zeta^{-1}$ the shift is ergodic in the sense of Section 24. In the ergodic case, it follows by the ergodic theorem that

(36.27) $\lim_n \frac{1}{n} \sum_{k=0}^{n-1} f(T^k x) = \int_{R^\infty} f\,d(P\zeta^{-1})$

*This topic, which requires Section 24, may be omitted.
on a set of $P\zeta^{-1}$-measure 1, provided $f$ is measurable $\mathscr{R}^\infty$ and integrable. Carry (36.27) back to $(\Omega, \mathscr{F}, P)$ by the inverse set mapping $\zeta^{-1}$. Then

(36.28) $\lim_n \frac{1}{n} \sum_{k=0}^{n-1} f(\ldots, X_{k-1}, X_k, X_{k+1}, \ldots) = E[f(\ldots, X_{-1}, X_0, X_1, \ldots)]$

with probability 1: (36.28) holds at $\omega$ if and only if (36.27) holds at $x = \zeta(\omega) = X(\omega)$. It is understood that on the left in (36.28), $X_k$ is the center coordinate (the 0th coordinate) of the argument of $f$, and on the right, $X_0$ is the center coordinate: For stationary, ergodic $X$ and integrable $f$, (36.28) holds with probability 1.

If the $X_k$ are independent, then the $Z_k$ are independent under $P\zeta^{-1}$. In this case, $\lim_n P\zeta^{-1}(A \cap T^{-n}B) = P\zeta^{-1}(A)P\zeta^{-1}(B)$ for $A$ and $B$ in $\mathscr{R}_0^\infty$, because for large enough $n$ the cylinders $A$ and $T^{-n}B$ depend on disjoint sets of time indices and hence are independent. But then it follows by approximation (Corollary 1 to Theorem 11.4) that the same limit holds for all $A$ and $B$ in $\mathscr{R}^\infty$. For invariant $B$, taking $A = B$ gives $P\zeta^{-1}(B) = (P\zeta^{-1}(B))^2$, so that $P\zeta^{-1}(B)$ is 0 or 1, and the shift is ergodic under $P\zeta^{-1}$: If $X$ is stationary and independent, then it is ergodic.

If $f$ depends on just one coordinate of $x$, then (36.28) is in the independent case a consequence of the strong law of large numbers, Theorem 22.1. But (36.28) follows by the ergodic theorem even if $f$ involves all the coordinates in some complicated way.

Consider now a measurable real function $\varphi$ on $R^\infty$. Define $\psi\colon R^\infty \to R^\infty$ by
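The force of (36.28) is that $f$ may involve several coordinates at once. A minimal numerical illustration (the choice of $f$ is ours): for an i.i.d. uniform sequence, which is stationary and ergodic, and $f(x) = Z_0(x)Z_1(x)$, the time average of $X_k X_{k+1}$ should converge to $E[X_0 X_1] = E[X_0]E[X_1] = \tfrac14$:

```python
import random

random.seed(3)
n = 200000
# An i.i.d. sequence is stationary and ergodic.
xs = [random.random() for _ in range(n + 1)]

# f depends on two coordinates of the shifted sequence: f(x) = Z_0(x) Z_1(x),
# so the k-th summand in (36.28) is X_k * X_{k+1}.
avg = sum(xs[k] * xs[k + 1] for k in range(n)) / n

# The limit is E[X_0 X_1] = (1/2)(1/2) = 1/4.
print(avg)
```

With $f$ a function of one coordinate this would just be the strong law; with this $f$ it is an instance of the ergodic theorem.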
$\psi(x) = (\ldots, \varphi(T^{-1}x), \varphi(x), \varphi(Tx), \ldots);$

here $\varphi(x)$ is the center coordinate: $Z_k(\psi(x)) = \varphi(T^k x)$. It is easy to show that $\psi$ is measurable $\mathscr{R}^\infty/\mathscr{R}^\infty$ and commutes with the shift in the sense of Example 24.6. Therefore, $T$ preserves $P\zeta^{-1}\psi^{-1}$ if it preserves $P\zeta^{-1}$, and it is ergodic under $P\zeta^{-1}\psi^{-1}$ if it is ergodic under $P\zeta^{-1}$. This translates immediately into a result on stochastic processes. Define $Y = (\ldots, Y_{-1}, Y_0, Y_1, \ldots)$ in terms of $X$ by

(36.29) $Y_n = \varphi(\ldots, X_{n-1}, X_n, X_{n+1}, \ldots),$

that is to say, $Y(\omega) = \psi(X(\omega)) = \psi\zeta(\omega)$. Since $P\zeta^{-1}$ is the distribution of $X$, $P\zeta^{-1}\psi^{-1} = P(\psi\zeta)^{-1} = PY^{-1}$ is the distribution of $Y$:
Theorem 36.4. If $X$ is stationary and ergodic, in particular if the $X_n$ are independent and identically distributed, then $Y$ as defined by (36.29) is stationary and ergodic.

This theorem fails if $Y$ is not defined in terms of $X$ in a time-invariant way, that is, if the $\varphi$ in (36.29) is not the same for all $n$: If $\varphi_n(x) = Z_{-n}(x)$ and $\varphi$ is replaced by $\varphi_n$ in (36.29), then $Y_n = X_0$; in this case $Y$ happens to be stationary, but it is not ergodic if the distribution of $X_0$ does not concentrate at a single point.
Example 36.5. The autoregressive model. Let $\varphi(x) = \sum_{k=0}^\infty \beta^k Z_{-k}(x)$ on the set where the series converges, and take $\varphi(x) = 0$ elsewhere. Suppose that $|\beta| < 1$ and that the $X_n$ are independent and identically distributed with finite second moments. Then by Theorem 22.6, $Y_n = \sum_{k=0}^\infty \beta^k X_{n-k}$ converges with probability 1, and by Theorem 36.4, the process $Y$ is ergodic. Note that $Y_{n+1} = \beta Y_n + X_{n+1}$ and that $X_{n+1}$ is independent of $Y_n$. This is the linear autoregressive model of order 1.
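A quick simulation of Example 36.5 (the parameter values and noise distribution are our own choices): iterating $Y_{n+1} = \beta Y_n + X_{n+1}$ with standard normal $X_n$, the ergodicity of $Y$ shows up as convergence of time averages to the stationary moments, here mean 0 and variance $\sum_k \beta^{2k} = 1/(1-\beta^2)$:

```python
import random

random.seed(4)
beta = 0.5
n = 200000

y = 0.0
ys = []
for _ in range(n):
    x = random.gauss(0.0, 1.0)   # X_{n+1}, independent of Y_n
    y = beta * y + x             # Y_{n+1} = beta * Y_n + X_{n+1}
    ys.append(y)

# Time averages along a single path approximate the stationary moments.
mean = sum(ys) / n
var = sum(v * v for v in ys) / n
print(mean, var)  # compare with 0 and 1/(1 - beta^2) = 4/3
```

A single realization suffices precisely because the process is ergodic; averaging over independent replications is not needed.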
The Hewitt-Savage Theorem*

Change notation: Let $(R^\infty, \mathscr{R}^\infty)$ be the product space with $\{1, 2, \ldots\}$ as the index set, the space of one-sided sequences. Let $P$ be a probability measure on $\mathscr{R}^\infty$. If the coordinate variables $Z_n$ are independent under $P$, then by Theorem 22.3, $P(A)$ is 0 or 1 for each $A$ in the tail $\sigma$-field $\mathscr{T}$. If the $Z_n$ are also identically distributed under $P$, a stronger result holds.

Let $\mathscr{I}_n$ be the class of $\mathscr{R}^\infty$-sets $A$ that are invariant under permutations of the first $n$ coordinates: if $\pi$ is a permutation of $\{1, \ldots, n\}$, then $x$ lies in $A$ if and only if $(Z_{\pi 1}(x), \ldots, Z_{\pi n}(x), Z_{n+1}(x), \ldots)$ does. Then $\mathscr{I}_n$ is a $\sigma$-field. Let $\mathscr{I} = \bigcap_{n=1}^\infty \mathscr{I}_n$ be the $\sigma$-field of $\mathscr{R}^\infty$-sets invariant under all finite permutations of coordinates. Then $\mathscr{I}$ is larger than $\mathscr{T}$, since, for example, the $x$-set where $\sum_{k=1}^n Z_k(x) \ge c_n$ infinitely often lies in $\mathscr{I}$ but not in $\mathscr{T}$. The Hewitt-Savage theorem is a zero-one law for $\mathscr{I}$ in the independent, identically distributed case.

Theorem 36.5. If the $Z_n$ are independent and identically distributed under $P$, then $P(A)$ is 0 or 1 for each $A$ in $\mathscr{I}$.
PROOF. By Corollary 1 to Theorem 11.4, there are for given $A$ and $\epsilon$ an $n$ and a set $U = [(Z_1, \ldots, Z_n) \in H]$ ($H \in \mathscr{R}^n$) such that $P(A \triangle U) < \epsilon$. Let $V = [(Z_{n+1}, \ldots, Z_{2n}) \in H]$. If the $Z_k$ are independent and identically distributed, then $P(A \triangle U)$ is the same as

$P([(Z_{n+1}, \ldots, Z_{2n}, Z_1, \ldots, Z_n, Z_{2n+1}, Z_{2n+2}, \ldots) \in A] \triangle V).$

But if $A \in \mathscr{I}_{2n}$, this is in turn the same as $P(A \triangle V)$. Therefore, $P(A \triangle U) = P(A \triangle V) < \epsilon$, and $P(A \triangle (U \cap V)) \le P(A \triangle U) + P(A \triangle V) < 2\epsilon$. Since $U$ and $V$ are independent and have the same probability, $P(U \cap V) = P(U)P(V) = P^2(U)$. Now $P(A)$ is within $\epsilon$ of $P(U)$, so $P^2(A)$ is within $2\epsilon$ of $P^2(U) = P(U \cap V)$, which is in turn within $2\epsilon$ of $P(A)$. Therefore, $|P^2(A) - P(A)| < 4\epsilon$ for all $\epsilon$, and so $P(A)$ must be 0 or 1.
PROBLEMS

36.1. Suppose that $[X_t\colon t \in T]$ is a stochastic process on $(\Omega, \mathscr{F}, P)$ and $A \in \mathscr{F}$. Show that there is a countable subset $S$ of $T$ for which $P[A \,\|\, X_t,\ t \in T] = P[A \,\|\, X_t,\ t \in S]$ with probability 1. Replace $A$ by a random variable and prove a similar result.
36.2. Let $T$ be arbitrary and let $K(s, t)$ be a real function over $T \times T$. Suppose that $K$ is symmetric in the sense that $K(s, t) = K(t, s)$ and nonnegative-definite in the sense that $\sum_{i,j=1}^k K(t_i, t_j) x_i x_j \ge 0$ for $k \ge 1$, $t_1, \ldots, t_k$ in $T$, and $x_1, \ldots, x_k$ real. Show that there exists a process $[X_t\colon t \in T]$ for which $(X_{t_1}, \ldots, X_{t_k})$ has the centered normal distribution with covariances $K(t_i, t_j)$, $i, j = 1, \ldots, k$.

*This topic may be omitted.
36.3. Let $L$ be a Borel set on the line, let $\mathscr{L}$ consist of the Borel subsets of $L$, and let $L^T$ consist of all maps from $T$ into $L$. Define the appropriate notion of cylinder, and let $\mathscr{L}^T$ be the $\sigma$-field generated by the cylinders. State a version of Theorem 36.1 for $(L^T, \mathscr{L}^T)$. Assume $T$ countable, and prove this theorem not by imitating the previous proof but by observing that $L^T$ is a subset of $R^T$ and lies in $\mathscr{R}^T$.

36.4. Suppose that the random variables $X_1, X_2, \ldots$ assume the values 0 and 1 and $P[X_n = 1 \text{ i.o.}] = 1$. Let $\mu$ be the distribution over $(0, 1]$ of $\sum_{n=1}^\infty X_n/2^n$. Show that on the unit interval with the measure $\mu$, the digits of the nonterminating dyadic expansion form a stochastic process with the same finite-dimensional distributions as $X_1, X_2, \ldots$.
36.5. 36.3↑ There is an infinite-dimensional version of Fubini's theorem. In the construction in Problem 36.3, let $L = I = (0, 1)$, $T = \{1, 2, \ldots\}$, let $\mathscr{I}$ consist of the Borel subsets of $I$, and suppose that each $k$-dimensional distribution is the $k$-fold product of Lebesgue measure over the unit interval. Then $I^T$ is a countable product of copies of $(0, 1)$, its elements are sequences $x = (x_1, x_2, \ldots)$ of points of $(0, 1)$, and Kolmogorov's theorem ensures the existence on $(I^T, \mathscr{I}^T)$ of a product probability measure $\pi$: $\pi[x\colon x_i \le a_i,\ i \le n] = a_1 \cdots a_n$ for $0 \le a_i \le 1$. Let $I^n$ denote the $n$-dimensional unit cube.

(a) Define $\psi\colon I^n \times I^T \to I^T$ by

$\psi((x_1, \ldots, x_n), (y_1, y_2, \ldots)) = (x_1, \ldots, x_n, y_1, y_2, \ldots).$

Show that $\psi$ is measurable $\mathscr{I}^n \times \mathscr{I}^T/\mathscr{I}^T$ and $\psi^{-1}$ is measurable $\mathscr{I}^T/\mathscr{I}^n \times \mathscr{I}^T$. Show that $(\lambda_n \times \pi)\psi^{-1} = \pi$, where $\lambda_n$ is $n$-dimensional Lebesgue measure restricted to $I^n$.

(b) Let $f$ be a function measurable $\mathscr{I}^T$ and, for simplicity, bounded. Define $f_n$ by integrating the first $n$ coordinates out of $f$; in other words, integrate out the coordinates one by one. Show by Problem 34.18, martingale theory, and the zero-one law that

(36.30) $\lim_n f_n(x) = \int_{I^T} f\,d\pi$

except for $x$ in a set of $\pi$-measure 0.

(c) Adopting the point of view of part (a), let $g_n(x_1, \ldots, x_n)$ be the result of integrating the variable $(y_{n+1}, y_{n+2}, \ldots)$ out (with respect to $\pi$) from $f(x_1, \ldots, x_n, y_{n+1}, \ldots)$. This may suggestively be written as

$g_n(x_1, \ldots, x_n) = \int_{I^T} f(x_1, \ldots, x_n, y_{n+1}, y_{n+2}, \ldots)\,d\pi(y).$

Show that $g_n(x_1, \ldots, x_n) \to f(x_1, x_2, \ldots)$ except for $x$ in a set of $\pi$-measure 0.
36.6. (a) Let $T$ be an interval of the line. Show that $\mathscr{R}^T$ fails to contain the sets of: linear functions, polynomials, constants, nondecreasing functions, functions of bounded variation, differentiable functions, analytic functions, functions continuous at a fixed $t_0$, Borel measurable functions. Show that it fails to contain the set of functions that: vanish somewhere in $T$, satisfy $x(s) \le x(t)$ for some pair with $s < t$, have a local maximum anywhere, fail to have a local maximum.

(b) Let $C$ be the set of continuous functions on $T = [0, \infty)$. Show that $A \in \mathscr{R}^T$ and $A \subset C$ imply that $A = \varnothing$. Show, on the other hand, that $A \in \mathscr{R}^T$ and $C \subset A$ do not imply that $A = R^T$.
36.7. Not all systems of finite-dimensional distributions can be realized by stochastic processes for which $\Omega$ is the unit interval. Show that there is on the unit interval with Lebesgue measure no process $[X_t\colon t \ge 0]$ for which the $X_t$ are independent and assume the values 0 and 1 with probability $\tfrac12$ each. Compare Problem 1.1.
36.8. Here is an application of the existence theorem in which $T$ is not a subset of the line. Let $(N, \mathscr{N}, \nu)$ be a measure space, and take $T$ to consist of the $\mathscr{N}$-sets of finite $\nu$-measure. The problem is to construct a generalized Poisson process, a stochastic process $[X_A\colon A \in T]$ such that (i) $X_A$ has the Poisson distribution with mean $\nu(A)$ and (ii) $X_{A_1}, \ldots, X_{A_n}$ are independent if $A_1, \ldots, A_n$ are disjoint. Hint: To define the finite-dimensional distributions, generalize this construction: For $A, B$ in $T$, consider independent random variables $Y_1, Y_2, Y_3$ having Poisson distributions with means $\nu(A \cap B^c)$, $\nu(A \cap B)$, $\nu(A^c \cap B)$, and take $\mu_{A,B}$ to be the distribution of $(Y_1 + Y_2, Y_2 + Y_3)$.
A Brownian motion or Wiener process is a stochastic process [ W, : t > 0], on some (f!, !F, P), with these three properties: {i) The process starts at 0:
P[ W0 = 0] = 1 . {ii) The increments are independent : If
( 37.1)
(37 .2)
then
( 37 .3)
P [ W,. - W, .
0] corresponding to the Jl-1 1 Taking w; = 0 for t = 0 shows that there exists on some (f!, !F, P) a process [w;: t > 0] with the finite-dimensional distributions specified by the conditions (i), (ii), and (iii). ,
•
k
·
•
•
' . k
Continuity of Paths
If the Brownian motion process is to represent the motion of a particle, it is natural to require that the path functions W( · , w ) be continuous. But Kolmogorov 's theorem does not guarantee continuity. Indeed, for T = [0, oo), the space (f!, !F) in the proof of Kolmogorov's theorem is (RT, ar), and as shown in the last section, the set of continuous functions does not lie in .§RT.
SECTION
37.
501
BROWNIAN MOTION
A special construction gets around this difficulty. The idea is to use for dyadic rational t the random variables � as already defined and then to
redefine the other � in such a way as to ensure continuity. To carry th is through requires proving that with probability 1 the sample path is uniformly continuous for dyadic rational arguments in bounded intervals. Fix a space (f!, !F, P) and on it a process [ � : t > 0] having the finite motion. Let D be the set dimensional distributions prescribed for Brownian n of nonnegative dyadic rationals, let In k = [k2 - ,(k + 2)z - ], and put n
Mn k ( w ) = sup I W( 1 , w ) - W( k 2 - n , w ) l r e In k n D
(37.8 )
Suppose it is shown that f P[Mn > n - 1 ] converges. The first Borel - Cantelli i.o.] ha� probability 0. But lemma will then imply that B = [Mn > n suppose w lies outside B. Then for every t and E there exists an n such that t < n, 2n - 1 < E, and Mn(w) < n - 1 . Take 8 = 2 - n . Suppose that r and r' are dyadic rationals in [0, t ] and l r - r'l < 8. Then rn and r' must for some k < n2 n lie in a common interval In k (length 2 X 2 - ), in which case iW(r, w) - W(r', w)l < 2Mn k(w) < 2Mn( w) < 2 n - 1 < €. Therefore, uJ $. B implies that W( r, w) is for every t uniformly continuous as r ranges over the dyadic rationals in [0, t ], and hence it will have a continuous extension to [0, oo). To prove L:P[Mn > n - 1 ] < oo, use Etemadi's maximal ineq�ality (22.10), which applies because of the independence of the increments. This, together with Markov's inequality, gives -1
_P [ �� I W( t + 8i2 - m ) - W( t ) I > a ) m ) - W( t) l :? a/3] < 3 imax P[ I W( t + 8i2 s; m 2
(see (21.7) for the moments of the normal distribution). The sets on the left here increase with m , and letting m � oo leads to
P sup I W( t + r8) - W( t) l > a
(37.9 )
O :s; r :s; l reD
Therefore,
n)2 X 2K( 2 P [ Mn > n - 1 ] < n2 n 1 4 (n- )
and L:P[ M > n - 1] does converge. n
4 Kn 5
STOCHASTIC PROCESSES
502
Therefore, there exists a measurable set B such that P(B ) = 0 and such that for w outside B, W(r, w ) is uniformly continuous as r ranges over the dyadic rationals in any bounded interval. If w $. B and r decreases to t through dyadic rational values, then W(r, w) has the Cauchy property and hence converges. Put
�, ( w ) = W' ( t, w ) =
lim W( r, w )
if w $. B,
0
if
r
!1
(J)
E B'
where r decreases to t through the set D of dyadic rationals. By construc tion, W'(t, w) is continuous in t for each w in fl. If w $. B, then W(r, w ) = W'( r, w ) for dyadic rationals, and W'( · , w ) is the continuous extension to all of [0, oo). The next thing is to show that the �, have the same joint distributions as the � . It is convenient to prove this by a lemma which will be used again further on.
Let Xn and X be k-dimensional random vectors, and Let Fn( x) be the distribution function of Xn. If Xn � X with probability 1 and Fn(x) � F(x) for all x, then F(x) is the distribution function of X. Lemma 1.
PROOF. t Le t X have distribution function Theorem 4. 1, if h > 0, then n
< lim infn Fn (
X1
+
H. By two applications of
h , . . . , Xk + h )
It follows by continuity from above that F and H agree.
•
Now, for $0 \le t_1 < \cdots < t_k$, choose dyadic rationals $r_i(n)$ decreasing to the $t_i$. Apply Lemma 1 with $(W_{r_1(n)}, \ldots, W_{r_k(n)})$ and $(W_{t_1}', \ldots, W_{t_k}')$ in the roles of $X_n$ and $X$, and with the distribution function with density (37.6) in the role of $F$. Since (37.6) is continuous in the $t_i$, it follows by Scheffé's theorem that $F_n(x) \to F(x)$, and by construction $X_n \to X$ with probability 1. By the lemma, $(W_{t_1}', \ldots, W_{t_k}')$ has distribution function $F$, which of course is also the distribution function of $(W_{t_1}, \ldots, W_{t_k})$.

Thus $[W_t'\colon t \ge 0]$ is a stochastic process, on the same probability space as $[W_t\colon t \ge 0]$, which has the finite-dimensional distributions required for Brownian motion and moreover has a continuous sample path $W'(\cdot, \omega)$ for every $\omega$.

†The lemma is an obvious consequence of the weak-convergence theory of Section 29; the point of the special argument is to keep the development independent of Chapters 5 and 6.
BROWNIAN MOTION
503
By enlarging the set B in the definition of �'(w) to include all the w for which W(O, w) =I= 0, one can also ensure that W'(O, w) = 0. Now discard the original random variables � and relabel �, as � . The new [ �: t > 0] is a stochastic process satisfying conditions (i), (ii), and (iii) for Brownian motion and this one as well: (iv) For each w, W(t, w) is continuous in t and W(O, w) = 0. From now on, by a Brownian motion will be meant a process satisfying (iv") as well as {i), (ii), and (iii). What has been proved is this:
There exist processes [ � : t > 0] satisfying conditions (i), (ii), (iii), and (iv)-Brownian motion processes. Theorem 37.1.
In the construction above, W,. for dyadic r was used to define � in general. F0 1 that reason it suffices to apply Kolmogorov's theorem for a countable index set. By the second proof of that theorem, the space (f!, !F, P) can be taken as the unit interval with Lebesgue measure. The next section treats a general scheme for dealing with path-function questions by in effect replacing an uncountabk time set by a countable one. �easurable Processes
Let T be a Borel set on the line, let [X_t: t ∈ T] be a stochastic process on an (Ω, ℱ, P), and consider the mapping

(37.10) (t, ω) → X_t(ω) = X(t, ω)

carrying T × Ω into R¹. Let 𝒯 be the σ-field of Borel subsets of T. The process is said to be measurable if the mapping (37.10) is measurable 𝒯 × ℱ/ℛ¹. In the presence of measurability, each sample path X(·, ω) is measurable 𝒯 by Theorem 18.1. Then, for example, ∫_a^b φ(X(t, ω)) dt makes sense if (a, b) ⊂ T and φ is a Borel function, and by Fubini's theorem it is integrable if

∫_a^b E[|φ(X_t)|] dt < ∞.
Hence the usefulness of this result:

Theorem 37.2. Brownian motion is measurable.

PROOF. Put X_n(t, ω) = W(k2^{-n}, ω) for (k−1)2^{-n} < t ≤ k2^{-n}, k = 1, 2, ..., and X_n(0, ω) = W(0, ω). Each X_n is measurable 𝒯 × ℱ, being constant in t on each dyadic interval, and by continuity of the paths X_n(t, ω) → W(t, ω) for every (t, ω); hence W is measurable. •

Irregularity of Brownian Motion Paths

For a Brownian motion [W_t: t ≥ 0] define
(37.11) W'_t = c^{-1}W_{c²t},

where c > 0. Since t → c²t is an increasing function, it is easy to see that the process [W'_t: t ≥ 0] has independent increments. Moreover, W'_t − W'_s = c^{-1}(W_{c²t} − W_{c²s}), and for s < t this is normally distributed with mean 0 and variance c^{-2}(c²t − c²s) = t − s. Since the paths W'(·, ω) all start from 0 and are continuous, [W'_t: t ≥ 0] is another Brownian motion. In (37.11) the time scale is contracted by the factor c², but the space scale only by the factor c.

That the transformation (37.11) preserves the properties of Brownian motion implies that the paths, although continuous, must be highly irregular. It seems intuitively clear that for c large enough the path W(·, ω) must with probability nearly 1 have somewhere in the time interval [0, c] a chord with slope exceeding, say, 1. But then W'(·, ω) has in [0, c^{-1}] a chord with slope exceeding c. Since the W'_t are distributed as the W_t, this makes it plausible that W(·, ω) must in arbitrarily small intervals [0, δ] have chords with arbitrarily great slopes, which in turn makes it plausible that W(·, ω) cannot be differentiable at 0. More generally, mild irregularities in the path will become ever more extreme under the transformation (37.11) with ever larger values of c. It is shown below that, in fact, the paths are with probability 1 nowhere differentiable. Also interesting in this connection is the transformation

(37.12) W''_t = tW_{1/t} if t > 0, W''_0 = 0.

Again it is easily checked that the increments are independent and normally distributed with the means and variances appropriate to Brownian motion. Moreover, the path W''(·, ω) is continuous except possibly at t = 0. But (37.9) holds with W''_t in place of W_t because it depends only on the finite-dimensional distributions, and by the continuity of W''(·, ω) over (0, ∞) the supremum is the same if not restricted to dyadic rationals. Therefore, P[sup_{s ≤ n^{-3}} |W''_s| > n^{-1}] ≤ K/n², and it follows by the first Borel-Cantelli lemma that W''(·, ω) is continuous also at 0 for ω outside a set M of probability 0. For ω ∈ M, redefine W''(t, ω) ≡ 0; then [W''_t: t ≥ 0] is a Brownian motion, and (37.12) holds with probability 1.
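The variance bookkeeping behind (37.11), c^{-2}(c²t) = t, can be spot-checked numerically; here W(c²t) is sampled directly from its N(0, c²t) distribution and rescaled (the setup and names are ours):

```python
import math
import random

rng = random.Random(1)
c, t = 3.0, 0.7
# W'(t) = c^{-1} W(c^2 t): sample W(c^2 t) ~ N(0, c^2 t) and divide by c.
samples = [rng.gauss(0.0, math.sqrt(c * c * t)) / c for _ in range(4000)]
mean = sum(samples) / len(samples)
var = sum(x * x for x in samples) / len(samples)
# mean should be near 0 and var near t, so W'(t) is again N(0, t).
```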
The behavior of W(·, ω) near 0 can be studied through the behavior of W''(·, ω) near ∞ and vice versa. Since (W''_t − W''_0)/t = W_{1/t}, W''(·, ω) cannot have a derivative at 0 if W(·, ω) has no limit at ∞. Now, in fact,

(37.13) inf_n W_n = −∞, sup_n W_n = +∞
with probability 1. To prove this, note that W_n = X_1 + ··· + X_n, where the X_k = W_k − W_{k−1} are independent. Consider

[sup_n W_n < ∞] = ⋃_{u=1}^∞ ⋂_{m=1}^∞ [max_{i ≤ m} W_i ≤ u];
this is a tail set and hence by the zero-one law has probability 0 or 1. Now −X_1, −X_2, ... have the same joint distributions as X_1, X_2, ..., and so this event has the same probability as

[inf_n W_n > −∞] = ⋃_{u=1}^∞ ⋂_{m=1}^∞ [max_{i ≤ m} (−W_i) ≤ u].
If these two sets have probability 1, so has [sup_n |W_n| < ∞], so that P[sup_n |W_n| ≤ x] > 0 for some x. But P[|W_n| ≤ x] = P[|W_1| ≤ x/n^{1/2}] → 0. This proves (37.13).

Since (37.13) holds with probability 1, W''(·, ω) has with probability 1 upper and lower right derivatives of +∞ and −∞ at t = 0. The same must be true of every Brownian motion. A similar argument shows that, for each fixed t, W(·, ω) is nondifferentiable at t with probability 1. In fact, W(·, ω) is nowhere differentiable:

Theorem 37.3. For ω outside a set of probability 0, W(·, ω) is nowhere differentiable.
PROOF. The proof is direct; it makes no use of the transformations (37.11) and (37.12). Let

X_{nk} = max{|W((k+i)2^{-n}) − W((k+i−1)2^{-n})|: i = 1, 2, 3}.

By independence and the fact that the differences here have the distribution of 2^{-n/2}W_1, P[X_{nk} ≤ ε] = P³[|W_1| ≤ 2^{n/2}ε]; since the standard normal density is bounded by 1, P[X_{nk} ≤ ε] ≤ (2·2^{n/2}ε)³. If Y_n = min_{k ≤ n2^n} X_{nk}, then P[Y_n ≤ K2^{-n}] ≤ n2^n(2·2^{n/2}·K2^{-n})³ = 8K³n2^{-n/2} → 0.
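The blow-up of difference quotients that the proof exploits is easy to see in simulation: the largest slope over dyadic increments of order n grows on the order of 2^{n/2} (a numerical sketch with our own names, not part of the proof):

```python
import math
import random

def max_dyadic_slope(n, rng):
    # Largest |increment| / 2^-n over the 2^n dyadic increments of order n on [0, 1].
    dt = 2.0 ** -n
    best = 0.0
    for _ in range(2 ** n):
        step = rng.gauss(0.0, math.sqrt(dt))  # one dyadic increment ~ N(0, 2^-n)
        best = max(best, abs(step) / dt)
    return best

rng = random.Random(2)
slopes = [max_dyadic_slope(n, rng) for n in (2, 6, 10)]
# The maxima increase with n: no finite derivative can exist anywhere.
```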
It is easily checked that [W'_t: t ≥ 0] has the finite-dimensional distributions appropriate to Brownian motion. As the other properties are obvious, it is in fact a Brownian motion. Let
(37.18) ℱ_{t_0} = σ[W_s: s ≤ t_0].

The random variables (37.17) are independent of ℱ_{t_0}. To see this, suppose that 0 < s_1 < ··· < s_j ≤ t_0 and 0 < t_1 < ··· < t_k. Put u_i = t_0 + t_i. Since the increments are independent, (W'_{t_1}, W'_{t_2} − W'_{t_1}, ..., W'_{t_k} − W'_{t_{k−1}}) = (W_{u_1} − W_{t_0}, W_{u_2} − W_{u_1}, ..., W_{u_k} − W_{u_{k−1}}) is independent of (W_{s_1}, W_{s_2} − W_{s_1}, ..., W_{s_j} − W_{s_{j−1}}). But then (W'_{t_1}, ..., W'_{t_k}) is independent of (W_{s_1}, ..., W_{s_j}). By Theorem 4.2, (W'_{t_1}, ..., W'_{t_k}) is independent of ℱ_{t_0}. Thus

(37.19) P([(W'_{t_1}, ..., W'_{t_k}) ∈ H] ∩ A) = P[(W'_{t_1}, ..., W'_{t_k}) ∈ H]P(A) = P[(W_{t_1}, ..., W_{t_k}) ∈ H]P(A),
where the second equality follows because (37.17) is a Brownian motion. This holds for all H in ℛ^k and A in ℱ_{t_0}.

The problem now is to prove all this when t_0 is replaced by a stopping time τ, a nonnegative random variable for which

(37.20) [ω: τ(ω) ≤ t] ∈ ℱ_t, t ≥ 0.

It will be assumed that τ is finite, at least with probability 1. Since [τ = t] = [τ ≤ t] − ⋃_n [τ ≤ t − n^{-1}], (37.20) implies that

(37.21) [ω: τ(ω) = t] ∈ ℱ_t, t ≥ 0.
The conditions (37.20) and (37.21) are analogous to the conditions (7.18) and (35.18), which prevent prevision on the part of the gambler. Now ℱ_{t_0} contains the information on the past of the Brownian motion up to time t_0, and the analogue for τ is needed. Let ℱ_τ consist of all measurable sets M for which

(37.22) M ∩ [ω: τ(ω) ≤ t] ∈ ℱ_t

for all t. (See (35.20) for the analogue in discrete time.) Note that ℱ_τ is a σ-field and τ is measurable ℱ_τ. Since M ∩ [τ = t] = M ∩ [τ ≤ t] ∩ [τ = t],

(37.23) M ∩ [ω: τ(ω) = t] ∈ ℱ_t

for M in ℱ_τ. For example, τ = inf[t: W_t = a] is a stopping time, and [inf_{s ≤ τ} W_s > −1] is in ℱ_τ.
STOCHASTIC PROCESSES
Theorem 37.5. Let τ be a stopping time, and put

(37.24) W*_t = W_{τ+t} − W_τ, t ≥ 0.

Then [W*_t: t ≥ 0] is a Brownian motion, and it is independent of ℱ_τ; that is, σ[W*_t: t ≥ 0] is independent of ℱ_τ:

(37.25) P([(W*_{t_1}, ..., W*_{t_k}) ∈ H] ∩ M) = P[(W*_{t_1}, ..., W*_{t_k}) ∈ H]P(M) = P[(W_{t_1}, ..., W_{t_k}) ∈ H]P(M)

for H in ℛ^k and M in ℱ_τ.
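Theorem 37.5 can be illustrated by a discretized simulation. In the sketch below (our construction; τ is taken as the first exit time of (−1, 1)) the restarted process W*_t = W_{τ+t} − W_τ is checked to have the mean 0 and variance t of a fresh Brownian motion:

```python
import math
import random

rng = random.Random(3)
dt = 1e-3
post = []
for _ in range(2000):
    w = 0.0
    while abs(w) < 1.0:          # run until tau, the first exit from (-1, 1)
        w += rng.gauss(0.0, math.sqrt(dt))
    # W*(t) = W(tau + t) - W(tau): accumulate increments after tau up to t = 0.5
    w_star = sum(rng.gauss(0.0, math.sqrt(dt)) for _ in range(500))
    post.append(w_star)
mean = sum(post) / len(post)
var = sum(x * x for x in post) / len(post)
# W*(0.5) should look like N(0, 0.5), independent of the pre-tau history.
```

In a step-by-step simulation the post-τ increments are fresh draws by construction, so this only illustrates the bookkeeping of (37.24), not the content of the proof.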
That the transformation (37.24) preserves Brownian motion is the strong Markov property.† Part of the conclusion is that the W*_t are random variables.

PROOF. Suppose first that τ has countable range V, and let t_0 be the general point of V. Since

[ω: W*_t(ω) ∈ H] = ⋃_{t_0 ∈ V} [ω: W_{t_0+t}(ω) − W_{t_0}(ω) ∈ H, τ(ω) = t_0],

W*_t is a random variable. Also,

P([(W*_{t_1}, ..., W*_{t_k}) ∈ H] ∩ M) = Σ_{t_0 ∈ V} P([(W*_{t_1}, ..., W*_{t_k}) ∈ H] ∩ M ∩ [τ = t_0]).

If M ∈ ℱ_τ, then M ∩ [τ = t_0] ∈ ℱ_{t_0} by (37.23). Further, if τ = t_0, then W*_t coincides with W'_t as defined by (37.17). Therefore, (37.19) reduces this last sum to

Σ_{t_0 ∈ V} P[(W_{t_1}, ..., W_{t_k}) ∈ H]P(M ∩ [τ = t_0]) = P[(W_{t_1}, ..., W_{t_k}) ∈ H]P(M).

This proves the first and third terms in (37.25) equal; to prove equality with the middle term, simply consider the case M = Ω.

† Since the Brownian motion has independent increments, it is a Markov process (see Examples 33.9 and 33.10); hence the terminology.
Thus the theorem holds if τ has countable range. For the general τ, put

(37.26) τ_n = k2^{-n} if (k−1)2^{-n} < τ ≤ k2^{-n}, k = 1, 2, ..., and τ_n = 0 if τ = 0.

If k2^{-n} ≤ t < (k+1)2^{-n}, then [τ_n ≤ t] = [τ ≤ k2^{-n}] ∈ ℱ_{k2^{-n}} ⊂ ℱ_t. Thus each τ_n is a stopping time. Suppose that M ∈ ℱ_τ and k2^{-n} ≤ t < (k+1)2^{-n}. Then M ∩ [τ_n ≤ t] = M ∩ [τ ≤ k2^{-n}] ∈ ℱ_{k2^{-n}} ⊂ ℱ_t. Thus ℱ_τ ⊂ ℱ_{τ_n}. Let W^{(n)}_t(ω) = W_{τ_n(ω)+t}(ω) − W_{τ_n(ω)}(ω); that is, let W^{(n)}_t be the W*_t corresponding to the stopping time τ_n. If M ∈ ℱ_τ, then M ∈ ℱ_{τ_n}, and by an application of (37.25) to the discrete case already treated,

P([(W^{(n)}_{t_1}, ..., W^{(n)}_{t_k}) ∈ H] ∩ M) = P[(W_{t_1}, ..., W_{t_k}) ∈ H]P(M).

But τ_n(ω) ↓ τ(ω) for each ω, and by continuity of the sample paths, W^{(n)}_t(ω) → W*_t(ω) for each ω. Condition on M and apply Lemma 1 with (W^{(n)}_{t_1}, ..., W^{(n)}_{t_k}) for X_n, (W*_{t_1}, ..., W*_{t_k}) for X, and the distribution function of (W_{t_1}, ..., W_{t_k}) for F = F_n. Then (37.25) follows. •

If ℱ* = σ[W*_t: t ≥ 0], then according to (37.25) (and Theorem 4.2) the σ-fields ℱ_τ and ℱ* are independent:

(37.28) P(A ∩ B) = P(A)P(B), A ∈ ℱ_τ, B ∈ ℱ*.

For fixed t define τ_n by (37.26) but with t2^{-n} in place of 2^{-n} at each occurrence. Then [W_τ ≤ x] ∩ [τ ≤ t] is the limit superior of the sets [W_{τ_n} ≤ x] ∩ [τ ≤ t], each of which lies in ℱ_t. This proves that [W_τ ≤ x] ∩ [τ ≤ t] lies in ℱ_t and hence that W_τ is measurable ℱ_τ. Since τ is measurable ℱ_τ,
(37.29) P([(τ, W_τ) ∈ H] ∩ B) = P[(τ, W_τ) ∈ H]P(B), B ∈ ℱ*,

for planar Borel sets H.

The Reflection Principle
For a stopping time τ, define

(37.30) W''_t = W_t if t ≤ τ, W''_t = 2W_τ − W_t if t > τ.

The sample path for [W''_t: t ≥ 0] is the same as the sample path for [W_t: t ≥ 0] up to τ, and beyond that it is reflected through the point W_τ. See the figure.
[Figure: a Brownian path W(·, ω) together with its reflection W''(·, ω) through the level W_τ after the stopping time τ.]
The process defined by (37.30) is a Brownian motion, and to prove it, one need only check th� finite-dimensional distributions: ?[( � 1 , , �) E H] = E H ]. By the argument starting with (37.26), it is enough to P [( �" , . . . , �") k consider the case where r has countable range, and for this it is enough to check the equation when the sets are intersected with [ r = t0 ]. Consider for notational simplicity a pair of points: • • •
I
(37.31 ) If s < t < t0 , this holds because the two events are identical. Suppose next that s < t0 < t. Since [ r = t0] lies in !F, , it follows by the independence of the increments, symmetry, and the definition (37.30) that 0
P [ r = t0 , (Ws , W, J E / , � - W,0 E l] = P [ r = t0, ( �, �0 ) E l , - (� - �J E l]
K
If
=
I X J, this is
P [ r = to , ( � , � � - � ) E K ] = P [ r = t 0, ( w;' , �� , �, - �� ) E K ] , o,
o
and by 1r-A it follows for all K E !JP3• For the appropriate K, this gives (37.31). The remaining case, t0 < s < t, is similar. These ideas can be used to derive in a very simple way the distribution of M = sup s � · Suppose that x > O. Let r = inf[s: � > x], define W" by and (37.30), ati"d put r" = inf[s: w;' > x] and M;' = SUPs I w;' . Since W" is another Brownian motion, reflection through the point W7 = x shows ,
< r
:S
r" = T
SECTION
that
37.
BROWNIAN MOTION
5 13
P[ M, > X ] = P[ T < t] = P[ T < t ' w, < X ] + P[ T < t ' w, > X ] = P[ < t ' W," < X ] + P[ T < t ' w, > X ] P [ r" < t, W, > X ] + P[ r < t , W, > X ] = P[ T < t ' w, > X] + P[ T < t' w, > X ] = 2P[ w, > X ] . T11
=
Therefore,
(37 .32)
!"'
2 e - u 2 12 du . -/2 7r x;{r
P[ M, > x ] =
-
This argument, an application of the reflection principle,t becomes quite transparent when referred to the diagram. Skorohod Embedding •
Suppose that Xp X2 , are independent and identically distributed random variables with mean 0 and variance u 2 A powerful method, due to Skoro hod, of studying the partial sums sn = XI + . . . + xn is to construct an increasing sequence r0 = O, r 1 , r 2 , of stopping times such that W(rn) has the Same distribution aS Sn. The differenCeS Tk - Tk - l Will turn OUt tO be independent and identically distributed with mean u 2 , so that by the law of large numbers n - l rn = n - I [Z = 1( rk - rk _ 1 ) is likely to be near u 2 • But if rn is near nu 2 , then by the continuity of Brownian motion paths W(rn) will be near W(nu 2 ), and so the distribution of Sn/uVn, which coincides with the distribution of W(rn)/uVn, will be near the distribution of W(na 2 )juVn -that is, will be near the standard normal distribution. The method will thus yield another proof of the central limit theorem, one independent of the characteristic-function arguments of Section 27. of max k < n Sk fuVn But it will also give more. For example, the distribution is exactly the distribution of max k p W( rk )ju{n , and this in turn is near the distribution of sup, ,;n u W(t)juvn , which can be written down explicitly because of (37.32). It will thus be possible to derive the limiting distribution of max k ,; n Sk . The joint behavior of the partial sums is closely related to the behavior of Brownian motion paths. The Skorohod construction involves the class !T of stopping times for which •
•
•
•
•
•
•
2
(37.33 ) (37.34 )
E[WT] = 0, E[r] = E [ W72 ] ,
t see Problem 37. 18 for another application. * The rest of this section, which requires martingale theory, may be omitted.
514
STOCHASTIC PROCESSES
and
Lemma
0. Suppose t and A E !f;. Since Brownian motion has independent increments,
PROOF.
that s
f Y9 , t dP = f e 9 Ws - 9 2s/2 dP . E [ e 9( W, - Ws) - 92(1 - s)/2 ] ' A
A
and a calculation with moment generating functions (see Example 21.2) shows that
s < t ' A E �-
(37.36)
This says that for 0 fixed, [ Y9, 1 : t > 0] is a continuous-time martingale adapted to the u-fields !F,. It is the moment-generating-function martingale associated with the Brownian motion. Let f(O, t) denote the right side of (37.36). By Theorem 16.8,
:O f( 0 , t ) = £ Y9 ( W, - 0 t ) dP, a2 2 - t ] dP, Ot Y t = O, ) f( ) ( j W, , 9 [ ao 2 ,t
A
•
iJ 4 , [ ( W, - Ot ) 4 - 6( W, - O t) 2 t + 3t 2 ] dP. Y f( f ) , t 8 9 ao 4 A ·
Differentiate the other side of the equation (37.36) the same way and set 0 = 0. The result is
f � dP = f W, dP, A
A
fA ( Ws4 - 6W/s + 3s 2 ) dP = f ( W,4 - 6W, 2 t + 3t 2 ) dP, A
s < t, A E �' s < t , A E g;;,
s < t , A E g;;,
This gives three more martingales: If Z, is any of the three random variables
(37 .37)
w, ,
SECTION
then
37.
BROWNIAN MOTION
Z0 = 0, Z,
is integrable and measurable !F,, and
j Zs dP = j Z, dP,
(37.38)
5 15
A
A
s < t, A E �-
In particular, E[ Z,] = [Z0] = 0. If r is a stopping time with finite range (37.38) implies that
{t 1 ,
• • •
, tm}
bounded by
t,
then
Suppose that r is bounded by t but does not necessarily have finite range. n n n n Put Tn = k2 - t if (k - l)T t < T ::;; k2- t, 1 k 2 , and put Tn = 0 if r =- 0. Then rn is a stopping time and E[ Z-r" ] = 0. For each of the three possibilities {37.37) for Z, SUPs :s;riZsl is integrable because of (37.32). It therefore follows by the dominated convergence theorem that £[27] = lim" E[ ZT " ] = 0. Thus £[ Z7 ] = 0 for every bounded stopping time -r. The three cases {17.37) give
<
E [ WT4 ] - 6£ 1 !2 [ WT4 ] £ 1 ! 2 [ r 2 ] + 3£(r 2 ]. If C = E 1 1 2 [ W74] and x = E 1 1 2 [ r 2], the inequality is 0 > q(x) = 3x 2 - 6Cx + C 2 . Each zero of q is at most 2C, and q is negative only between these two zeros. Therefore, x < 2C, which implies (37.35). •
Suppose that r and rn are stopping times, that each rn is a member of !T, and that rn � r with probability 1. Then r is a member of Y if (i) E[ W74" ] < E[ W74] < oo for all n, or if {ii) the W74" are uniformly integrable. Lemma 3.
PROOF. Since Brownian motion paths are continuous, W7 " � W7 with probability 1. Each of the two hypotheses {i) and {ii) implies that £[ �4] " is bounded and hence that £[ r,;J is bounded, and it follows (see (16.28)) that the sequences { rn }, {W7" }, and {W72" } are uniformly integrable. Hence (37.33) and (37.34) for r follow by Theorem 16.14 from the same relations for the rn The first hypothesis implies that lim infn £[ �4] " E[W74], and the second implies that limn E[ W74] = E[ W74]. In either case it follows by Fatou's lemma that £[ r 2 ] lim infn El r,;] 4 lim infn E[ W74] • " 4E[ W74].
and put .9;< 1 > = u[ �< l ): 0 < s < t ] and 1 sr-< > = u[ w,( l >: t > 0]. Let 8 1 be the stopping time of Theorem 37.6, so that W8\1 > and X1 have the same distribution. Let �� I ) be the class of M such that M n [81 < t ] E .9;< 1> for all t. Now put w. = w,li) and (8 , W8�2>) are independent. PROOF.
I
•
2
t This is obvious from the weak-convergence poi nt of view.
-
520
STOCHASTIC PROCESSES
class of M such that M n [8 < t] E .9; < 2> for all t. If 2 and sz- is the u-field generated by these random variables, then again SZ";,�2> and sz-(3) are independent. These two u-fields are contained in sz-( 2), which is independent of �� I )· Therefore, the three u-fields SZ";,� 1>, ��2>, sz-(3) are independent. The procedure therefore extends inductively to give independent, identically distributed random vectors (8n ' W,(8"n ) ). If Tn 8 1 + . . . + 8n ' then W ( l ) W,(81l ) + . . . + W,(8"n ) has the dis• tribution of XI + . . . + xn .
sz-;, be the WI(3 ) w./l2 l - w./l2< 2> Let
=
T"
=
=
In variance •
If E[ xn CT 2' then, since the random variables Tn - Tn I of Theorem 37.7 are independent and identically distributed, the strong law of large numbers (Theorem 22. 1) applies and hence so does the weak one: =
( 37 .40) (If E[X:] < oo, so that the rn - rn l have second moments, this follows immediately by Chebyshev 's inequality.) Now Sn has the distribution of W(rn), and rn is near nu 2 by (37.40); hence Sn should have nearly the distribution of W(nu 2 ), namely the normal distribution with mean 0 and variance nu 2• To prove this, choose an increasing s�quence of integers Nk such that P[ln - 1 rn - u 2 1 > k- 1 ] < k- 1 for n > Nk , and put En k - 1 for Nk < n < Nk + 1 • Then En � 0 and P[ln - 1 rn - u 2 1 > t: n ] < En. By two applications of (37.32), =
s.( < )
�
[ """:!f,;W(
P I W(
'•) I >
En]
+P
]
[ -nu
sup
lr
nn
2 1 :S
w·./n
< En + 4P [ I W( Enn) I � EuVn] ,
and it follows by Chebyshev's inequality that limn 8n(E) distributed as W(rn),
P
[ W��
2)
]
0. Since
[ u� < x ] < p [ w�{,( ) < x + E l + 8n( E ) .
< x - E - 8n(E) < P
*This topic may be om itted.
=
sn
] IS
SECTION
37.
521
BROWNIAN MOTION
Here W(nu 2 )ju..[rl can be replaced by a random variable N with the standard normal distribution, and lett ing n � oo and then € � 0 shows that
This gives a new proof of the central limit theorem for independent, identi cally distributed random variables with second moments (the Lindeberg -Uvy theorem-Theorem 27.1). Observe that none of the convergence theory of Chapter 5 has been used. This proof of the central limit theorem is an application of the invariance principle: Sn has nearly the distribution of W(nu 2 ), and the distribution of the latter does not depend on { ary with) the distribution common to the Xn . More can be said if the Xn have fourth moments. For each n, define a stochastic process [Y,.{t): 0 < t < 1] by Y,.(O, w) = 0 and v
(37.41 ) Yn( t , w ) =
1 k 1 . k S < t f < C (w) 1 n - n' uvn k
k = 1, . . . , n.
If kjn = t > 0 and n is large, then k is large, too, and Y,.(t) = t 1 1 2Sk /u..fk is by the central limit theorem approximately normally distributed with mean 0 and variance t. Since the Xn are independent, the increments of 07.41) should be approximately independent, and so the process should behave approximately as a Brownian motion does. Let rn be the stopping times of Theorem 37.7, and in analogy with (37.41) put Zn(O) = 0 and
(37.42) Zn( t ) = 1r W(rk ) I. f k n- 1· < t < nk- , uvn
k = 1 , . . . , n.
By construction, the finite-dimensional distributions of [ Y,.(t ): 0 < t < 1] coin cide with those of [Zn{t): 0 < t < 1]. It will be shown that the latter process nearly coincides with [ W(tnu 2 )juVn : 0 < t < 1], which is itself a Brownian motion over the time interval [0, 1]-see (37.11). Put Wn(t) = W(tnu 2 )ju..[rl . Let Bn (8) be the event that I rk - ku 2 1 > 8nu 2 for some k < n. By Kolmogorov's inequality (22.9),
(37.43) If (k -
1)n - 1 < t < kn - 1
and
n > 8 - 1 , then
STOCHASTIC PROCESSES
522
on the event ( Bn( 8 )Y, and so I Zn( t ) - W,.( t ) l = wn
( n�2 ) - Wn( t )
€ ts
I
(
< P ( Bn( o ) ) + P sup
]
J
sup I W( s) - W( t ) l > E .
1 :$ 1 1.< - r l :$ 2/l
Let n � oo and then 8 � 0; it follows by (37.43) and the continuity of Brownian motion paths that
[
]
lim P sup i Zn( t ) - W ( t ) l > € = 0
( 37.44)
n
n
r :S I
for positive E. Since the processes (37.41) and (37.42) have the same finite dimensional distributions, this proves the following general invariance princi ple or functional central Limit theorem.
Suppose that X1 , X2 , . . . are independent, identically dis tributed random variables with mean 0, variance u 2 , and finite fourth mo ments, and define Y ( t) by (37.41). There exist (on another probability space), for each n, processes [Zn(t): 0 < t < 1] and [W,.(t): 0 < t < 1] such that the first has the same finite-dimensional distributions as [ Yn( t ): 0 < t < 1], ihe second is a Brownian motion, and P[sup, 5 1 1 Zn(t) - W,.{t)l > €] � 0 for positive €. Theorem 37.8.
n
As an application, consider the maximum Mn = max k 5 n Sk . Now Mn fuVn = sup, Yn( t ) has the same distribution as sup, Zn(t ), and it follows by (37.44) that
[
]
P sup Zn( t ) - sup Wn( t ) > € � o . 1 :$ 1
1 :$ 1
But P[sup, s: 1 Wn(t) > x] = P[sup, 5 1 W(t) > x] = 2P[ N > x] for x > O by (37.32). Therefore, ( 37.45)
]
< x � 2 P [ N < x ],
X > 0.
SECTION 37.
BROWNIAN MOTION
523
PROBLEMS
37.1. 36. 2 j Show that K(s, t) = min{s, t} is nonnegative-definite; use Problem 36.2 to prove the existence of a process with the fi nite-dimensional distributions prescribed for Brownian motion. 37.2. Let X(t) be independent, standard normal variables, one for each dyadic rational t (Theorem 20.4; the unit interval can be used as the probability space). Let W(O) = O and W(n) = Ef:= 1 X(k). Suppose that W(t) is already defined for dyadic rationals of rank 'l, and put
Prove by induction that the W( t) for dyadic t have the finite-dimensional distributions prescnbed for Brownian motion. Now construct a Brownian motion with continuous paths by the argument leading to Theorem 37. 1. This avoids an appeal to Kolmogorov' s existence theorem.
37.3.
n
j
n
For each n define new variables Wn(t) by setting Wn(kj2 ) = W(kj2 ) for dyadics of order n and interpolating linearly in between. Set Bn = sup, , n1Wn + l(t) - Wn(t)l, and show that
The construction in the preceding problem makes it clear that the difference here is normal with variance 1/2n + xnl both converge, and conclude that outside a set of probability 0, Wn(t, w) converges uniformly over bounded intervals. Replace W(t, w) by lim n Wn( t, w ). This gives another construction of a Brownian motion with continuous paths.
37.4. 36.6 i Let T = [0, oo), and let P be a p1 0bability measure on ( RT, �T) having the finite-dimensional distributions prescribed for Brownian motion. Let C consist of the continuous elements of RT. (a) Show that P* (C) = 0, or P*(RT- C) = 1 (see (3.9) and (3.10)). Thus completing (RT, �T, P) will not give C probability 1. (b) Show that P*( C) = l. 37.5. Suppose that [W,: t > 0] is some stochastic process having independent, sta tionary increments satisfying E[ W,] = 0 and E[ W, 2 ] = t. Show that if the finite-dimensional distributions are pr eserved by the transformation (37. 11), then they must be those of Brownian motion. 37.6. Show that n I > oul W,: s > t 1 contains only sets of probability 0 and 1 . Do the same for n , >Ou( W,: 0 t €]; give examples of sets in this u-field.
a ]. Show that the distri6ution of Ta has over (0, oo) the density
a l_ - a 2 f2 t h a ( t ) - r;;- _ 32 v2rr t / e
(37 .46)
Show that £['Ta l = oo. Show that 'Ta has the same distribution as a 2 IN 2 , where N is a standard normal variable.
37.12. i (a) Show by the !.trong Markov property that Ta and Ta + /3 - Ta are independent and that the latter has t he same distribution as Tw Conclude that h a * h13 = h a + f3 · Show that {3Ta has the same distribution as Ta " (b)
,_fjj
Show that each h a is stable-see Problem 28. 10.
37.13. i Suppose that X 1 , X2 , • • • are in dependent and each has the distribution (37.46). (a) Show that (X1 + · · +Xn)ln2 also has the distribution (37.46). Contrast this with the law of large numbers. (b) Show that P[n - 2 max k !; n Xk < x] -+ exp( -a yf2/rr x ) for x > O. Relate this to Theorem 14.3. ·
37.14. 37. 1 1 i Let p(s, t) be the probability that a Brownian path has at least one zero in (s, t). From (37.46) and the Markov property deduce (37 .47 )
,; p(s, t ) = - arccos !.. .
2
t
1T
Hint: Condition with respect to Hj.
37 .15. i (a) Show that the probability of no zero in ( t, 1) is (21 rr) arcsin ..[t and hence that the position of the last zero preceding 1 is distributed over (0, 1) with density rr - 1 (t(l - t)) - 1 12 • (b) Similarly calculate the distribution of the position of the first zero follow ing time 1 . (c) Calculate the joint distribution of the two zeros in (a) and (b). 37.16. i (a) Show by Theorem 37.8 that inf s u s r Yn(u) and inf s u s r Zn(u) both converge in distribution to inf s u s 1 W(u) for 0 < s < t < 1. Prove a similar result for the supremum. (b) Let An(s, t) be the event that the position at time k in a symmetric random walk, is 0 for at least one k in the range sn < k < tn, and show that P(An(s, t )) -+ (21rr) arccos ..fi7i. s
s
Sk,
s
SECTION 37. (c) Let
BROWNIAN MOTION
525
Tn be the maximum k such that k < n and Sk = 0. Show that Tnfn has
asymptotically the distribution with density 7T- 1 (t(l - t)) - 1 12 over (0, l). As this density is larger at the ends of the interval than in the middle, the last time during a night's play a gambler was even is more likely to be either early or late than to be around midnight.
37.17. i Show that p(s, t) =p(t- ' , s - ') = p(cs, ct). Check this by (37.47) and also by the fact that the transformations (37. 1 1) and (37.12) preserve the properties of Brownian nlotion. 37.18. Deduce by the reflection principle that ( M, w;) has density
[
1
2 2(2y -x) (2y x ) --'---i=:==-. 0 for all w and that V has a continuous distribution. Define Example 38. 7.
Il l { f( t ) = �
if t * 0, if t = 0,
and put X(t, w) = f(t - V(w )). If [X; : t > 0] is any separable process with the same finite-dimensional distributions as [X,: t > 0], then X'( · , w) must with probability 1 assume the value 1 somewhere. In this case (38.11) holds for • a < 0 and b = 1, and equality in (38.12) cannot be avoided. If
( 38 .1 4)
sup i X ( t , w ) l < oo , I,
w
then (38.11) holds for some a and b. To treat the case in which (38. 14) fails, it is necessary to allow for the possibility of infinite values. If x(t) is oo or - oo, replace the third condition in (38.2) by x(tn ) � oo or x(t) � - oo. This extends the definition of separability to functions x that may assume infinite values and to processes [X,: t > 0] for which X(t, w) = + oo is a possibility.
If [X, : t > 0] is a finite-valued process on (f!, !F, P), there exists on the same space a separable process [X;: t > 0] such that P[X; X, ] 1 for each t. Theorem 38.1.
=
=
STOCHASTIC PROCESSES
532
It is assumed for convenience here that X(t, w) is finite for all t and w, although this is not really necessary. But in some cases infinite values for certain X'(t, w) cannot be avoided-see Example 38.8. PROOF. If (38.14) holds, the result is an immediate consequence of Lemma 2. The definition of separability allows an exceptional set N of probability 0; in the construction of Lemma 2 this set is actually empty, but it is clear from the definition this could be arranged anyway. The case in which (38.14) may fa il could be treated by tracing through the preceding proofs, making slight changes to allow for infinite values. A simple argument makes this unnecessary. Let g be a continuous, strictly increasing mapping of R 1 onto (0, 1). Let Y(t, w) g(X(t, w)). Lemma 2 applie� to [Y,: t > 0]; there exists a separable process [ Y,': t > 0] such that P[ Y,' Y,] 1. Since 0 < Y(t, w) < 1, Lemma 2 ensures 0 Y'(t, w) .::;; 1. Define =
- oo X'( t, w) = g - 1 ( Y '( t , w ) ) + oo Then [X;: each t.
t > 0]
0] is separable and has the finite-dimensional distributions of [X,: t > 0], • then X'( · , w) must with probability 1 assume the value oo for some t. Combining Theorem 38.1 with Kolmogorov' s existence theorem shows that
fior any consistent system offinite-dimensional distributions 11- r k there exists a separable process with the 11- r k as finite-dimensional distributions. As shown in Example 38.4, this leads to another construction of Brownian motion with I
1
I
1
continuous paths.
Consequences of Separability
The of a fact The
next theorem implies in effect that, if the finite-dimensional distributions process are such that it "should" have continuous paths, then it will in have continuous paths if it is separable. Example 38.4 illustrates this. same thing holds for properties other than continuity.
SECTION
38.
NONDENUMBERABLE PROBABILITIES
533
Let RT be the set of functions on T = [O, oo) with values that are ordinary reals or else oo or -oo. Thus R T is an enlargement of the R T of Section 36, an enlargement necessary because separability sometimes forces infinite values. Define the function Z, on R T by Z,(x) Z(t, x) = x(t ). This is just an extension of the coordinate function (36.8). Let �T be the u-field in R T generated by the Z, t > 0. Suppose that A is a subset of R T, not necessarily in � T. For D c T [0, oo), let A0 consist of those elements x of R T that agree on D with some element y of A : =
=
A D = u n [ x E R T: x(t ) y ( t )] .
(38. 15)
=
y EA t E D
Of course, A c A 0. Let S0 denote the set of x in R T that are separable whh respect to D. In the following theorem, [X,: t > 0] and [X;: t > 0] are processes on spaces (0, !F, P) and (0', ::7', P'), which may be distinct; the path functions are X( · , w) and X'( - , w').
Suppose of A that for each countable, dense subset T [0, oo), the set (38.15) satisfies Theorem 38.2.
D
of
=
(38.16 )
-T
A0 E � ,
If [X,: t > 0] and [X;: t > 0] have the same finite-dimensional distributions, if [w: X( · , w) EA] Lies in !Y and has ?-measure 1, and if [X;: t > O] is separable, then [w': X'( · , w') EA] contains an !F'-set of P'-measJ,tre 1. If (0', .'F', P' ) is complete, then of course !F'-set of ?'-measure 1.
[w': X'(· , w') EA]
is itself an
[X;: t > 0] is separable with respect to D. The [w': X'( · , ) EA0] - [w': X'( · , w') EA] is by (38.16) a subset of [w': X'( · , w') E R T - S0], which is contained in an !7'-set of N' of ?'-mea sure 0. Since the two processes have the same finite-dimensional distribu tions and hence induce the same distribution on (R T, � T ), and since A 0 lies in �T, it follows that P'[w': X'( · , w') EA0] = P[w: X(· , w) EA0] > P[w: X{ · , w) EA] = l. Thus the subset [w': X'( · , w') EA0] - N' of [w': X'( ·, w') E A ] lies in !F' and has P'-measure 1. • PROOF. difference
Suppose that '
w
Consider the set C of finite-valued, continuous functions on T. If x E S 0 and y E C, and if x and y agree on a dense D, then x and y agree everywhere: x y. Therefore, C0 n S0 c C. Further, Example 38.9.
=
C0 = n U n ( x E RT: I x(s ) l < oo, i x (t ) l < oo, i x( s ) - x( t) I < E ] , €, 1
li
s
.
STOCHASTIC PROCESSES
534
where ε and δ range over the positive rationals, t ranges over D, and the inner intersection extends over the s in D satisfying |s − t| < δ. Hence C_0 ∈ ℛ^T. Thus C satisfies the condition (38.16). Theorem 38.2 now implies that if a process has continuous paths with probability 1, then any separable process having the same finite-dimensional distributions has continuous paths outside a set of probability 0. In particular, a Brownian motion with continuous paths was constructed in the preceding section, and so any separable process with the finite-dimensional distributions of Brownian motion has continuous paths outside a set of probability 0. The argument in Example 38.4 now becomes supererogatory. •

Example 38.10. There is a somewhat similar argument for the step functions of the Poisson process. Let Z^+ be the set of nonnegative integers; let E consist of the nondecreasing functions x in R^T such that x(t) ∈ Z^+ for all t and such that for every n ∈ Z^+ there exists a nonempty interval I such that x(t) = n for t ∈ I. Then

E_0 = ⋂_{s, t ∈ D, s < t} [x: x(s) ≤ x(t)] ∩ ⋂_{t ∈ D} ⋃_{n=0}^∞ [x: x(t) = n] ∩ ⋂_{n=0}^∞ ⋃_I ⋂_{t ∈ D ∩ I} [x: x(t) = n],

where I ranges over the open intervals with rational endpoints. Thus E_0 ∈ ℛ^T. Clearly, E_0 ∩ S_0 ⊂ E, and so Theorem 38.2 applies. In Section 23 was constructed a Poisson process with paths in E, and therefore any separable process with the same finite-dimensional distributions will have paths in E except for a set of probability 0. •

Example 38.11. For E as in Example 38.10, let E_0 consist of the elements of E that are right-continuous; a function in E need not lie in E_0, although at each t it must be continuous from one side or the other. The Poisson process as defined in Section 23 by N_t = max[n: S_n ≤ t] (see (23.5)) has paths in E_0. But if N'_t = max[n: S_n < t], then [N'_t: t ≥ 0] is separable and has the same finite-dimensional distributions, but its paths are not in E_0. Thus E_0 does not satisfy the hypotheses of Theorem 38.2. Separability does not help distinguish between continuity from the right and continuity from the left. •
Example 38.12. The class of sets A satisfying (38.16) is closed under the formation of countable unions and intersections but is not closed under complementation. Define X_t and Y_t as in Example 38.1, and let C be the set of continuous paths. Then [Y_t: t ≥ 0] and [X_t: t ≥ 0] have the same finite-dimensional distributions, and the latter is separable; Y(·, ω) is in R^T − C for each ω, and X(·, ω) is in R^T − C for no ω. •
Example 38.13. As a final example, consider the set J of functions with discontinuities of at most the first kind: x is in J if it is finite-valued, if x(t+) = lim_{s↓t} x(s) exists (finite) for t ≥ 0 and x(t−) = lim_{s↑t} x(s) exists (finite) for t > 0, and if x(t) lies between x(t+) and x(t−) for t > 0. Continuous and right-continuous functions are special cases.
SECTION 38. NONDENUMERABLE PROBABILITIES 535

Let V denote the general system

(38.17)  r_1 < s_1 ≤ r_2 < s_2 ≤ ··· ≤ r_k < s_k;  a_1, a_2, ..., a_k,

where k is an integer and where the r_i, s_i, and a_i are rational. Define

I(D, V, ε) = ⋂_{i=1}^{k} [x: a_i < x(t) < a_i + ε, t ∈ (r_i, s_i) ∩ D]
  ∩ ⋂_{i=2}^{k} [x: min{a_{i−1}, a_i} < x(t) < max{a_{i−1}, a_i} + ε, t ∈ (s_{i−1}, r_i) ∩ D].

Let 𝒱_m be the class of systems (38.17) that have the fixed value m for s_k and satisfy r_i − s_{i−1} < δ. It will be shown that

(38.18)  J_0 = ⋂_ε ⋂_δ ⋂_{m=1}^{∞} ⋃_{V ∈ 𝒱_m} I(D, V, ε),

where ε and δ range over the positive rationals. From this it will follow that J_0 ∈ ℛ^T. It will also be shown that J_0 ∩ S_0 ⊂ J, so that J satisfies the hypothesis of Theorem 38.2.

Suppose that y ∈ J. For fixed ε, let H be the set of nonnegative h for which there exist finitely many points t_i such that 0 = t_0 < t_1 < ··· < t_r = h and |y(t) − y(t')| < ε for t and t' in the same interval (t_{i−1}, t_i). If h_n ∈ H and h_n ↑ h, then from the existence of y(h−) follows h ∈ H. Hence H is closed. If h ∈ H, from the existence of y(h+) it follows that H contains points to the right of h. Therefore, H = [0, ∞). From this it follows that the right side of (38.18) contains J_0. Suppose that x is a member of the right side of (38.18). It is not hard to deduce that for each t the limits
(38.19)  lim_{s↓t, s∈D} x(s),  lim_{s↑t, s∈D} x(s)

exist and that x(t) lies between them if t ∈ D. For t ∈ D take y(t) = x(t), and for t ∉ D take y(t) to be the first limit in (38.19). Then y ∈ J and hence x ∈ J_0. This argument also shows that J_0 ∩ S_0 ⊂ J. •
Appendix
Gathered here for easy reference are certain definitions and results from set theory and real analysis required in the text. Although there are many newer books, HAUSDORFF (the early sections) on set theory and HARDY on analysis are still excellent for the general background assumed here.

Set Theory
A1. The empty set is denoted by ∅. Sets are variable subsets of some space that is fixed in any one definition, argument, or discussion; this space is denoted either generically by Ω or by some special symbol (such as R^k for Euclidean k-space). A singleton is a set consisting of just one point or element. That A is a subset of B is expressed by A ⊂ B. In accordance with standard usage, A ⊂ B does not preclude A = B; A is a proper subset of B if A ⊂ B and A ≠ B. The complement of A is always relative to the overall space Ω; it consists of the points of Ω not contained in A and is denoted by A^c. The difference between A and B, denoted by A − B, is A ∩ B^c; here B need not be contained in A, and if it is, then A − B is a proper difference. The symmetric difference A △ B = (A ∩ B^c) ∪ (A^c ∩ B) consists of the points that lie in one of the sets A and B but not in both. Classes of sets are denoted by script letters. The power set of Ω is the class of all subsets of Ω; it is denoted 2^Ω.

A2. The set of ω that lie in A and satisfy a given property p(ω) is denoted [ω ∈ A: p(ω)]; if A = Ω, this is usually shortened to [ω: p(ω)].
A3. In this book, to say that a collection [A_θ: θ ∈ Θ] is disjoint always means that it is pairwise disjoint: A_θ ∩ A_θ' = ∅ if θ and θ' are distinct elements of the index set Θ. To say that A meets B, or that B meets A, is to say that they are not disjoint: A ∩ B ≠ ∅. The collection [A_θ: θ ∈ Θ] covers B if B ⊂ ⋃_θ A_θ. The collection is a decomposition or partition of B if it is disjoint and B = ⋃_θ A_θ.

A4. By A_n ↑ A is meant A_1 ⊂ A_2 ⊂ ··· and A = ⋃_n A_n; by A_n ↓ A is meant A_1 ⊃ A_2 ⊃ ··· and A = ⋂_n A_n.

A5. The indicator, or indicator function, of a set A is the function on Ω that assumes the value 1 on A and 0 on A^c; it is denoted I_A. The alternative term "characteristic function" is reserved for the Fourier transform (see Section 26).
A6. De Morgan's laws are (⋃_θ A_θ)^c = ⋂_θ A_θ^c and (⋂_θ A_θ)^c = ⋃_θ A_θ^c. These and the other facts of basic set theory are assumed known: a countable union of countable sets is countable, and so on.
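De Morgan's laws are easy to check concretely; a small sketch (the space and the indexed family below are arbitrary illustrative choices):

```python
# De Morgan's laws verified on a concrete finite family of sets.
Omega = set(range(20))
family = [set(range(0, 10)), set(range(5, 15)), {1, 17, 19}]

def complement(s):
    # Complement relative to the overall space Omega, as in A1.
    return Omega - s

union = set().union(*family)
intersection = set.intersection(*family)

# (union of A_theta)^c equals the intersection of the A_theta^c
law1 = complement(union) == set.intersection(*[complement(a) for a in family])
# (intersection of A_theta)^c equals the union of the A_theta^c
law2 = complement(intersection) == set().union(*[complement(a) for a in family])
```

The same computation works for any finite family; the countable case is the statement in A6.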
A7. If T: Ω → Ω' is a mapping of Ω into Ω' and A' is a set in Ω', the inverse image of A' is T^{−1}A' = [ω ∈ Ω: Tω ∈ A']. It is easily checked that each of these statements is equivalent to the next: ω ∈ Ω − T^{−1}A', ω ∉ T^{−1}A', Tω ∉ A', Tω ∈ Ω' − A', ω ∈ T^{−1}(Ω' − A'). Therefore, Ω − T^{−1}A' = T^{−1}(Ω' − A'). Simple considerations of this kind show that ⋃_θ T^{−1}A'_θ = T^{−1}(⋃_θ A'_θ) and ⋂_θ T^{−1}A'_θ = T^{−1}(⋂_θ A'_θ), and that A' ∩ B' = ∅ implies T^{−1}A' ∩ T^{−1}B' = ∅ (the reverse implication is false unless TΩ = Ω').

If f maps Ω into another space, f(ω) is the value of the function f at an unspecified value of the argument ω. The function f itself (the rule defining the mapping) is sometimes denoted f(·). This is especially convenient for a function f(ω, t) of two arguments: for each fixed t, f(·, t) denotes the function on Ω with value f(ω, t) at ω.

A8. The axiom of choice. Suppose that [A_θ: θ ∈ Θ] is a decomposition of Ω into nonempty sets. The axiom of choice says that there exists a set (at least one set) C that contains exactly one point from each A_θ: C ∩ A_θ is a singleton for each θ in Θ. The existence of such sets C is assumed in "everyday" mathematics, and the axiom of choice may even seem to be simply true. A careful treatment of set theory, however, is based on an explicit list of such axioms and a study of the relationships between them; see HALMOS or DUDLEY. A few of the problems require Zorn's lemma, which is equivalent to the axiom of choice; see DUDLEY or KAPLANSKY.

The Real Line
A9. The real line is denoted by R^1; x ∨ y = max{x, y} and x ∧ y = min{x, y}. For real x, ⌊x⌋ is the integer part of x, and sgn x is +1, 0, or −1 as x is positive, 0, or negative. It is convenient to be explicit about open, closed, and half-open intervals:

(a, b) = [x: a < x < b],  [a, b] = [x: a ≤ x ≤ b],
(a, b] = [x: a < x ≤ b],  [a, b) = [x: a ≤ x < b].

A10. Of course x_n → x means lim_n x_n = x; x_n ↑ x means x_1 ≤ x_2 ≤ ··· and x_n → x; x_n ↓ x means x_1 ≥ x_2 ≥ ··· and x_n → x. A sequence {x_n} is bounded if and only if every subsequence {x_{n_k}} contains a further subsequence {x_{n_{k(j)}}} that converges to some x: lim_j x_{n_{k(j)}} = x. If {x_n} is not bounded, then for each k there is an n_k for which |x_{n_k}| > k; no subsequence of {x_{n_k}} can converge. The implication in the other direction is a simple consequence of the fact that every bounded sequence contains a convergent subsequence.
If {x_n} is bounded, and if each subsequence that converges at all converges to x, then lim_n x_n = x. If x_n does not converge to x, then |x_{n_k} − x| ≥ ε for some positive ε and some increasing sequence {n_k} of integers; some subsequence of {x_{n_k}} converges, but the limit cannot be x.

A11. A set G is defined as open if for each x in G there is an open interval I such that x ∈ I ⊂ G. A set F is defined as closed if F^c is open. The interior of A, denoted A°, consists of the x in A for which there exists an open interval I such that x ∈ I ⊂ A. The closure of A, denoted A⁻, consists of the x for which there exists a sequence {x_n} in A with x_n → x. The boundary of A is ∂A = A⁻ − A°. The basic facts of real analysis are assumed known: A is open if and only if A = A°; A is closed if and only if A = A⁻; A is closed if and only if it contains all limits of sequences in it; x lies in ∂A if and only if there is a sequence {x_n} in A and a sequence {y_n} in A^c such that x_n → x and y_n → x; and so on.
Every open set G on the line is a countable, disjoint union of open intervals. To see this, define points x and y of G to be equivalent if x ≤ y and [x, y] ⊂ G or y ≤ x and [y, x] ⊂ G. This is an equivalence relation. Each equivalence class is an interval ...

... and lim_k x_{1, n_{1,k}} exists. Look next at

(3)  x_{2, n_{1,1}}, x_{2, n_{1,2}}, x_{2, n_{1,3}}, ....

As a subsequence of the second row of (1), (3) is bounded. Select from it a convergent subsequence x_{2, n_{2,1}}, x_{2, n_{2,2}}, ...; here {n_{2,k}} is an increasing sequence of integers, a subsequence of {n_{1,k}}, and lim_k x_{2, n_{2,k}} exists.
A20. ... = 0 for rational r_0 > x_0. It is thus no restriction to assume that g(x_0) > 0. Let I be an open interval in which g is bounded above. Given a number M, choose n so that ng(x_0) > M, and then choose a rational r so that nx_0 + r lies in I. If r > 0, then g(r + nx_0) = g(r) + g(nx_0) = g(nx_0) = ng(x_0). If r < 0, then ng(x_0) = g(nx_0) = g((−r) + (nx_0 + r)) = g(−r) + g(nx_0 + r) = g(nx_0 + r). In either case, g(nx_0 + r) = ng(x_0); of course this is trivial if r = 0. Since g(nx_0 + r) = ng(x_0) > M and M was arbitrary, g is not bounded above in I, a contradiction. Obviously, the same proof works if f is bounded below in some interval.

Corollary. Let U be a real function on (0, ∞) and suppose that U(x + y) = U(x)U(y) for x, y > 0. Suppose further that there is some interval on which U is bounded above. Then either U(x) = 0 for x > 0, or else there is an A such that U(x) = e^{Ax} for x > 0.

PROOF. Since U(x) = U²(x/2), U is nonnegative. If U(x) = 0, then U(x/2) = 0, and so U vanishes at points arbitrarily near 0. If U vanishes at a point, it must by the functional equation vanish everywhere to the right of that point. Hence U is identically 0 or else everywhere positive. In the latter case, the theorem applies to f(x) = log U(x), this function being bounded above in some interval, and so f(x) = Ax for A = log U(1). •
A21. A number-theoretic fact.

Theorem. Suppose that M is a set of positive integers closed under addition and that M has greatest common divisor 1. Then M contains all integers exceeding some n_0.

PROOF. Let M_1 consist of all the integers m, −m, and m − m' with m and m' in M. Then M_1 is closed under addition and subtraction (it is a subgroup of the group of integers). Let d be the smallest positive element of M_1. If n ∈ M_1, write n = qd + r, where 0 ≤ r < d. Since r = n − qd lies in M_1, r must actually be 0. Thus M_1 consists of the multiples of d. Since d divides all the integers in M_1 and hence all the integers in M, and since M has greatest common divisor 1, d = 1. Thus M_1 contains all the integers. Write 1 = m − m' with m and m' in M (if 1 itself is in M, the proof is easy), and take n_0 = (m + m')². Given n > n_0, write n = q(m + m') + r, where 0 ≤ r < m + m'. From n > n_0 ≥ (r + 1)(m + m') follows q = (n − r)/(m + m') > r. But n = q(m + m') + r(m − m') = (q + r)m + (q − r)m', and since q + r > q − r > 0, n lies in M. •
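The theorem is easy to check numerically; the generating pair {5, 8} below is an arbitrary choice with greatest common divisor 1, and n_0 = (m + m')² is the bound from the proof:

```python
from math import gcd

# A numerical sketch of A21: with gcd(5, 8) = 1 and n0 = (5 + 8)^2 = 169,
# every integer above n0 should be a nonnegative combination 5a + 8b.
m, mp = 5, 8
assert gcd(m, mp) == 1
n0 = (m + mp) ** 2

def in_M(n):
    # n lies in the additive closure M iff n = a*m + b*mp with a, b >= 0.
    return any((n - a * m) % mp == 0 for a in range(n // m + 1))

representable = all(in_M(n) for n in range(n0 + 1, n0 + 500))
```

For this pair the sharp threshold is in fact much smaller (27 is the largest integer not in M), so the bound (m + m')² from the proof is generous but correct.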
A22. One- and two-sided derivatives.

Theorem. Suppose that f and g are continuous on [0, ∞) and g is the right-hand derivative of f on (0, ∞): f⁺(t) = g(t) for t > 0. Then f⁺(0) = g(0) as well, and g is the two-sided derivative of f on (0, ∞).

PROOF. It suffices to show that F(t) = f(t) − f(0) − ∫₀ᵗ g(s) ds vanishes for t ≥ 0. By assumption, F is continuous on [0, ∞) and F⁺(t) = 0 for t > 0. Suppose that F(t_0) > F(t_1), where 0 ≤ t_0 < t_1. Then G(t) = F(t) − (t − t_0)(F(t_1) − F(t_0))/(t_1 − t_0) is continuous on [0, ∞), G(t_0) = G(t_1), and G⁺(t) > 0 on (0, ∞). But then the maximum of G over [t_0, t_1] must occur at some interior point; since G⁺ ≤ 0 at a local maximum, this is impossible. Similarly F(t_0) < F(t_1) is impossible. Thus F is constant over (0, ∞) and by continuity is constant over [0, ∞). Since F(0) = 0, F vanishes on [0, ∞). •
A23. A differential equation. The equation f'(t) = Af(t) + g(t) (t ≥ 0; g continuous) has the particular solution f_0(t) = e^{At} ∫₀ᵗ g(s) e^{−As} ds; for an arbitrary solution f, (f(t) − f_0(t)) e^{−At} has derivative 0 and hence equals f(0) identically. All solutions thus have the form f(t) = e^{At}[f(0) + ∫₀ᵗ g(s) e^{−As} ds].
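The solution formula can be checked numerically; the constants below (A = 2, f(0) = 1, g ≡ 1) are arbitrary choices for which the integral has a closed form:

```python
import math

# Sketch of A23 with g constant: f(t) = e^{At}[f(0) + integral_0^t g(s)e^{-As} ds].
A, f0 = 2.0, 1.0

def f(t):
    integral = (1.0 - math.exp(-A * t)) / A   # integral of e^{-As} on [0, t] for g = 1
    return math.exp(A * t) * (f0 + integral)

# Verify f' = A f + 1 by central differences at a few points.
h = 1e-6
errs = [abs((f(t + h) - f(t - h)) / (2 * h) - (A * f(t) + 1.0))
        for t in (0.3, 0.7, 1.5)]
max_err = max(errs)
```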
A24. A trigonometric identity. If z ≠ 1 and z ≠ 0, then

∑_{k=−l}^{l} z^k = (z^{−l} − z^{l+1})/(1 − z),

and hence

∑_{l=0}^{m−1} ∑_{k=−l}^{l} z^k = (1/(1 − z)) [ (1 − z^{−m})/(1 − z^{−1}) − z(1 − z^m)/(1 − z) ]
 = (2 − z^{−m} − z^{m}) / ((1 − z)(1 − z^{−1}))
 = (z^{m/2} − z^{−m/2})² / (z^{1/2} − z^{−1/2})².

Take z = e^{ix}. If x is not an integral multiple of 2π, then

∑_{l=0}^{m−1} ∑_{k=−l}^{l} e^{ikx} = (sin ½mx)² / (sin ½x)².

If x = 2πn, the left-hand side here is m², which is the limit of the right-hand side as x → 2πn.
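The closed form is easy to check numerically; m and x below are arbitrary test values:

```python
import cmath
import math

# Check: sum_{l=0}^{m-1} sum_{k=-l}^{l} e^{ikx} = (sin(mx/2)/sin(x/2))^2.
def double_sum(m, x):
    return sum(cmath.exp(1j * k * x) for l in range(m) for k in range(-l, l + 1))

m, x = 7, 1.3
lhs = double_sum(m, x)
rhs = (math.sin(m * x / 2) / math.sin(x / 2)) ** 2
gap = abs(lhs - rhs)

# At an integral multiple of 2*pi the double sum equals m^2, the limiting value.
gap_at_zero = abs(double_sum(m, 0.0) - m ** 2)
```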
Infinite Series
A25. Nonnegative series. Suppose x_1, x_2, ... are nonnegative. If E is a finite set of integers, then E ⊂ {1, 2, ..., n} for some n, so that by nonnegativity ∑_{k∈E} x_k ≤ ∑_{k=1}^n x_k. The set of partial sums ∑_{k=1}^n x_k thus has the same supremum as the larger set of sums ∑_{k∈E} x_k (E finite). Therefore, the nonnegative series ∑_{k=1}^∞ x_k converges if and only if the sums ∑_{k∈E} x_k for finite E are bounded, in which case the sum is the supremum: ∑_{k=1}^∞ x_k = sup_E ∑_{k∈E} x_k.
A26. Dirichlet's theorem. Since the supremum in A25 is invariant under permutations, so is ∑_{k=1}^∞ x_k: If the x_k are nonnegative and y_k = x_{f(k)} for some one-to-one map f of the positive integers onto themselves, then ∑_k x_k and ∑_k y_k diverge or converge together and in the latter case have the same sum.
A27. Double series. Suppose that x_{ij}, i, j = 1, 2, ..., are nonnegative. The ith row gives a series ∑_j x_{ij}, and if each of these converges, one can form the series ∑_i ∑_j x_{ij}. Let the terms x_{ij} be arranged in some order as a single infinite series ∑_{ij} x_{ij}; by Dirichlet's theorem, the sum is the same whatever order is used. Suppose each ∑_j x_{ij} converges and ∑_i ∑_j x_{ij} converges. If E is a finite set of the pairs (i, j), there is an n for which ∑_{(i,j)∈E} x_{ij} ≤ ∑_{i≤n} ∑_{j≤n} x_{ij} ≤ ∑_{i≤n} ∑_j x_{ij} ≤ ∑_i ∑_j x_{ij}; hence ∑_{ij} x_{ij} converges and has sum at most ∑_i ∑_j x_{ij}. On the other hand, if ∑_{ij} x_{ij} converges, then ∑_{i≤m} ∑_{j≤n} x_{ij} ≤ ∑_{ij} x_{ij}; letting n → ∞ and then m → ∞ shows that each ∑_j x_{ij} converges and that ∑_i ∑_j x_{ij} ≤ ∑_{ij} x_{ij}. Therefore, in the nonnegative case, ∑_{ij} x_{ij} converges if and only if the ∑_j x_{ij} all converge and ∑_i ∑_j x_{ij} converges, in which case ∑_{ij} x_{ij} = ∑_i ∑_j x_{ij}. By symmetry, ∑_{ij} x_{ij} = ∑_j ∑_i x_{ij}. Thus the order of summation can be reversed in a nonnegative double series: ∑_i ∑_j x_{ij} = ∑_j ∑_i x_{ij}.
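For a nonnegative double series the two iterated sums agree; a small sketch with x_{ij} = 2^{−i} 3^{−j} (an arbitrary convergent example), truncated at a depth where the tails are far below the tolerance:

```python
# Iterated sums of a nonnegative double series, rows first and columns first.
N = 60

def x(i, j):
    return 2.0 ** (-i) * 3.0 ** (-j)

row_first = sum(sum(x(i, j) for j in range(1, N)) for i in range(1, N))
col_first = sum(sum(x(i, j) for i in range(1, N)) for j in range(1, N))

# Exact value: (sum_{i>=1} 2^-i) * (sum_{j>=1} 3^-j) = 1 * (1/2).
exact = 0.5
```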
A28. The Weierstrass M-test.

Theorem. Suppose that lim_n x_{nk} = x_k for each k and that |x_{nk}| ≤ M_k, where ∑_k M_k < ∞. Then ∑_k x_k and all the ∑_k x_{nk} converge, and lim_n ∑_k x_{nk} = ∑_k x_k.

PROOF. The series of course converge absolutely, since ∑_k M_k < ∞. Now |∑_k x_{nk} − ∑_k x_k| ≤ ∑_{k≤k_0} |x_{nk} − x_k| + 2 ∑_{k>k_0} M_k. Given ε, choose k_0 so that ∑_{k>k_0} M_k < ε/3, and then choose n_0 so that n ≥ n_0 implies |x_{nk} − x_k| < ε/3k_0 for k ≤ k_0. Then n ≥ n_0 implies |∑_k x_{nk} − ∑_k x_k| < ε. •

A29. Power series. The principal fact needed is this: If f(x) = ∑_{k=0}^∞ a_k x^k converges in the range |x| < r, then it is differentiable there and

(8)  f'(x) = ∑_{k=1}^∞ k a_k x^{k−1}.
For a simple proof, choose r_0 and r_1 so that |x| < r_0 < r_1 < r. If |h| < r_0 − |x|, so that |x ± h| ≤ r_0, then the mean-value theorem gives (here 0 < θ_h < 1)

(9)  |((x + h)^k − x^k)/h − k x^{k−1}| = |k(x + θ_h h)^{k−1} − k x^{k−1}| ≤ 2k r_0^{k−1}.

Since 2k r_0^{k−1}/r_1^{k−1} goes to 0, it is bounded by some M, and if M_k = |a_k| · M r_1^{k−1}, then ∑_k M_k < ∞ and |a_k| times the left member of (9) is at most M_k for |h| < r_0 − |x|. By the M-test [A28] (applied with h → 0 instead of n → ∞),

lim_{h→0} (f(x + h) − f(x))/h = ∑_{k=1}^∞ k a_k x^{k−1}.

Hence (8). Repeated application of (8) gives

f^{(j)}(x) = ∑_{k=j}^∞ k(k − 1) ··· (k − j + 1) a_k x^{k−j}.

For x = 0, this is a_j = f^{(j)}(0)/j!, the formula for the coefficients in a Taylor series. This shows in particular that the values of f(x) for |x| < r determine the coefficients a_k.
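Term-by-term differentiation is easy to see at work on the geometric series (an arbitrary example inside its interval of convergence, truncated where the tail is negligible):

```python
# f(x) = sum_k x^k = 1/(1-x) on |x| < 1, and, by (8),
# f'(x) = sum_k k x^{k-1} = 1/(1-x)^2.
x = 0.37
N = 400
f_series = sum(x ** k for k in range(N))
fprime_series = sum(k * x ** (k - 1) for k in range(1, N))

err_f = abs(f_series - 1.0 / (1.0 - x))
err_fprime = abs(fprime_series - 1.0 / (1.0 - x) ** 2)
```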
A30. Cesàro averages. If x_n → x, then n^{−1} ∑_{k=1}^n x_k → x. To prove this, let M bound |x_k|, and given ε, choose k_0 so that |x − x_k| < ε/2 for k ≥ k_0. If n ≥ k_0 and n ≥ 4k_0M/ε, then

|x − n^{−1} ∑_{k=1}^n x_k| ≤ n^{−1} ∑_{k=1}^{k_0−1} 2M + n^{−1} ∑_{k=k_0}^{n} ε/2 < ε.
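A quick numerical sketch of A30; the convergent sequence below (limit 2) is an arbitrary illustration:

```python
# Cesaro averages: if x_n -> x then n^{-1}(x_1 + ... + x_n) -> x as well.
x_limit = 2.0
N = 100000

total = 0.0
for n in range(1, N + 1):
    total += x_limit + (-1) ** n / n   # x_n = 2 + (-1)^n / n converges to 2
cesaro = total / N
gap = abs(cesaro - x_limit)
```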
A31. Dyadic expansions. Define a mapping T of the unit interval Ω = (0, 1] into itself by

Tω = 2ω if 0 < ω ≤ ½,  Tω = 2ω − 1 if ½ < ω ≤ 1.

Define a function d_1 on Ω by

d_1(ω) = 0 if 0 < ω ≤ ½,  d_1(ω) = 1 if ½ < ω ≤ 1,

and let

(10)  d_i(ω) = d_1(T^{i−1}ω).

Then ∑_{i=1}^n d_i(ω)/2^i < ω for each n, and ω = ∑_{i=1}^∞ d_i(ω)/2^i: the d_i(ω) are the digits of the nonterminating dyadic expansion of ω.
... ≥ |D_x(x' − x)| − ½β|x' − x| ≥ ½β|x' − x|. Thus

(19)  |x' − x| ≤ (2/β)|Tx' − Tx|  for x, x' ∈ Q⁻.

This shows that T is one-to-one on Q⁻. Since x_0 does not lie in the compact set ∂Q, inf_{x∈∂Q} |Tx − Tx_0| = d > 0. Let Y be the open ball with center y_0 = Tx_0 and radius d/2 (Figure (vi)). Fix a y in Y. The problem is to show that y = Tx for some x in Q°, which means finding an x such that φ(x) = |y − Tx|² = ∑_i (y_i − t_i(x))² vanishes. By compactness, the minimum of φ on Q⁻ is achieved there. If x ∈ ∂Q (and y ∈ Y), then 2|y − y_0| < d ≤ |Tx − y_0| ≤ |Tx − y| + |y − y_0|, so that |y − Tx_0| < |y − Tx|. Therefore, φ(x_0) < φ(x) for x ∈ ∂Q, and so the minimum occurs in Q° rather than on ∂Q. At the minimizing point, ∂φ/∂x_j = −∑_i 2(y_i − t_i(x)) t_{ij}(x) = 0, and since D_x is nonsingular, it follows that y = Tx: each y in Y is the image under T of some point x in Q°. By (19), this x is unique (although it is possible that y = Tz for some z outside Q). Let X = Q° ∩ T^{−1}Y. Then X is open and T is a one-to-one map of X onto Y.

Now let T^{−1} denote the inverse point transformation on Y. By (19), T^{−1} is continuous. To prove differentiability, consider in Y a fixed point y and a variable point y' such that y' → y and y' ≠ y. Let x = T^{−1}y and x' = T^{−1}y'; then x' is a function of y', x' → x, and x' ≠ x. Define u by Tx' − Tx = D_x(x' − x) + u; then u is a function of x' and hence of y', and |u|/|x' − x| → 0 by (17). Apply D_x^{−1}: D_x^{−1}(Tx' − Tx) = x' − x + D_x^{−1}u, or T^{−1}y' − T^{−1}y = D_x^{−1}(y' − y) − D_x^{−1}u. By (18) and (19),

|T^{−1}y' − T^{−1}y − D_x^{−1}(y' − y)| / |y' − y| ≤ (|D_x^{−1}u| / |x' − x|) · (|x' − x| / |y' − y|) ≤ (2/β) · |D_x^{−1}u| / |x' − x|.
2. Since 0 < 1/(a_n + t) < 1, the induction hypothesis (use (26)) gives a_k(x) = a_k for k < n and T^{n−1}x = 1/(a_n + t). Now apply the case n = 1 to T^{n−1}x. (If a_n = 1 and t = 0, then a_k(x) = a_k for k ≤ n − 2, a_{n−1}(x) = a_{n−1} + 1, and T^{n−1}x = 0.)

Consider now the infinite case. Assume that

(34)  x = 1/(a_1 + 1/(a_2 + 1/(a_3 + ···)))

converges, where the a_n are positive integers. Then

(35)  a_n(x) = a_n,  n ≥ 1.

To prove this, let n → ∞ in (25): the continued fraction t = 1/(a_2 + 1/(a_3 + ···)) converges and x = 1/(a_1 + t). It follows by induction (use (26)) that

(36)  0 < T^{n−1}x < 1,  n ≥ 2.

Hence 0 < x < 1, and the same must be true of t. Therefore, a_1 and t are the integer and fractional parts of 1/x, which proves (35) for n = 1. Apply the same argument to Tx, and continue. The x defined by (34) is irrational: otherwise, T^n x = 0 for some n, which contradicts (35) and (36).

Thus the value of an infinite simple continued fraction uniquely determines the partial quotients. The same is almost true of finite simple continued fractions. Since (31) and (32) imply (33), it follows that if x is given by (30), then any continued fraction of n terms that represents x must indeed match (30) term for term. But, for example, 1/(1 + 1/3) = 1/(1 + 1/(2 + 1/1)). This is always possible: replace a_n(x) in (30) (where a_n(x) ≥ 2) by a_n(x) − 1 followed by a final partial quotient 1. Apart from this ambiguity, the representation is unique, and the representation (30) that results from repeated application of T to a rational x never ends with a partial quotient of 1.†

†See ROCKETT & SZÜSZ for more on continued fractions.
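The repeated application of T to a rational can be carried out in exact arithmetic; a short sketch (the sample value 3/4 is an arbitrary choice):

```python
from fractions import Fraction

# Partial quotients via the map x -> 1/x - floor(1/x); for a rational x in
# (0, 1) the expansion produced this way never ends with the quotient 1.
def partial_quotients(x):
    qs = []
    while x != 0:
        inv = 1 / x               # exact, since x is a Fraction
        a = int(inv)              # integer part of 1/x
        qs.append(a)
        x = inv - a               # fractional part: the next iterate Tx
    return qs

def cf_value(qs):
    v = Fraction(0)
    for a in reversed(qs):
        v = 1 / (a + v)
    return v

qs = partial_quotients(Fraction(3, 4))
# The ambiguity in the text: [1, 3] and [1, 2, 1] represent the same number.
same = cf_value([1, 3]) == cf_value([1, 2, 1]) == Fraction(3, 4)
```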
Notes on the Problems
These notes consist of hints, solutions, and references to the literature. As a rule a solution is complete in proportion to the frequency with which it is needed for the solution of subsequent problems.
Section 1

1.1. (a) Each point of the discrete space lies in one of the four sets A_1 ∩ A_2, A_1^c ∩ A_2, A_1 ∩ A_2^c, A_1^c ∩ A_2^c and hence would have probability at most 2^{−2}; continue. (b) If, for each i, B_i is A_i or A_i^c, then B_1 ∩ ··· ∩ B_n has probability at most ∏_{i=1}^n (1 − a_i) ≤ exp[−∑_{i=1}^n a_i].

1.3. (b) Suppose A is trifling and let A⁻ be its closure. Given ε choose intervals (a_k, b_k], k = 1, ..., n, such that A ⊂ ⋃_{k=1}^n (a_k, b_k] and ∑_{k=1}^n (b_k − a_k) < ε/2. If x_k = a_k − ε/2n, then A⁻ ⊂ ⋃_{k=1}^n [x_k, b_k] and ∑_{k=1}^n (b_k − x_k) < ε. For the other parts of the problem, consider the set of rationals in (0, 1).

1.4. (a) Cover A_r(i) by (r − 1)^n intervals of length r^{−n}. (c) Go to the base r^k. Identify the digits in the base r with the keys of the typewriter. The monkey is certain eventually to reproduce the eleventh edition of the Britannica and even, unhappily, the fifteenth.
1.5. (a) The set A_3(1) is itself uncountable, since a point in it is specified by a sequence of 0's and 2's (excluding the countably many that end in 0's). (b) For sequences u_1, ..., u_n of 0's, 1's, and 2's, let M_{u_1 ··· u_n} consist of the points in (0, 1] whose nonterminating base-3 expansions start out with those digits. Then A_3(1) = (0, 1] − ⋃ M_{u_1 ··· u_n}, where the union extends over n ≥ 1 and the sequences u_1, ..., u_n containing at least one 1. The set described in part (b) is [0, 1] − ⋃ M°_{u_1 ··· u_n}, where the union is as before, and this is the closure of A_3(1). From this representation of C, it is not hard to deduce that it can be defined as the set of points in [0, 1] that can be written in base 3 without any 1's if terminating expansions are also allowed. For example, C contains ⅔ = .1222··· = .2000··· because it is possible to avoid 1 in the expansion. (c) Given ε and an ω in C, choose ω' in A_3(1) within ε/2 of ω; now define ω″ by changing from 2 to 0 some digit of ω' far enough out that ω″ differs from ω' by at most ε/2. •
1.7. The interchange of limit and integral is justified because the series ∑_k r_k(ω) 2^{−k} converges uniformly in ω (integration to the limit is studied systematically in Section 16). There is a direct derivation of (1.40): let n → ∞ in sin t = 2^n sin(2^{−n}t) ∏_{k=1}^n cos(2^{−k}t), which follows by induction from the half-angle formula.
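The halving product converges quickly; a numerical check (t = 1.1 and the depth 25 are arbitrary choices):

```python
import math

# sin t = 2^n sin(t/2^n) * prod_{k=1}^n cos(t/2^k), by repeated halving.
def halving_product(t, n):
    p = (2.0 ** n) * math.sin(t / 2.0 ** n)
    for k in range(1, n + 1):
        p *= math.cos(t / 2.0 ** k)
    return p

t = 1.1
gap = abs(halving_product(t, 25) - math.sin(t))
```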
1.10. (a) Given m and a subinterval (a, b] of (0, 1], choose a dyadic interval I in (a, b], and then choose in I a dyadic interval J of order n > m such that n^{−1}s_n(ω) is near 1 for ω ∈ J. This is possible because to specify J is to specify the first n dyadic digits of the points in J: choose the first digits in such a way that J ⊂ I, and take the following ones to be 1, with n so large that n^{−1}s_n(ω) is near 1 for ω ∈ J. (b) A countable union of sets of the first category is also of the first category; (0, 1] = N ∪ N^c would be of the first category if N^c were. For Baire's theorem, see ROYDEN, p. 139.
1.11. (a) If x = p_0/q_0 ≠ p/q, then |x − p/q| = |p_0q − pq_0|/q_0q ≥ 1/q_0q. (c) The rational ∑_{k=1}^n 1/2^{a(k)} has denominator 2^{a(n)} and approximates x to within 2/2^{a(n+1)}.
Section 2

2.3. (b) Let Ω consist of four points, and let 𝒜 consist of the empty set, Ω itself, and all six of the two-point sets.

2.4. (b) For example, take Ω to consist of the integers, and let 𝓕_n be the σ-field generated by the singletons {k} with k ≤ n. As a matter of fact, any example in which 𝓕_n is a proper subclass of 𝓕_{n+1} for all n will do, because it can be shown that in this case ⋃_n 𝓕_n necessarily fails to be a σ-field; see A. Broughton and B. W. Huff: A comment on unions of sigma-fields, Amer. Math. Monthly, 84 (1977), 553-554.

2.5. (b) The class in question is certainly contained in f(𝒜) and is easily seen to be closed under the formation of finite intersections. But (⋃_{i=1}^m ⋂_{j=1}^{n_i} A_{ij})^c = ⋂_{i=1}^m ⋃_{j=1}^{n_i} A_{ij}^c, and ⋃_{j=1}^{n_i} A_{ij}^c = ⋃_{j=1}^{n_i} [A_{ij}^c ∩ ⋂_{l<j} A_{il}] has the required form.
2.8. If 𝒜* is the smallest class over 𝒜 closed under the formation of countable unions and intersections, clearly 𝒜* ⊂ σ(𝒜). To prove the reverse inclusion, first show that the class of A such that A^c ∈ 𝒜* is closed under the formation of countable unions and intersections and contains 𝒜 and hence contains 𝒜*.

2.9. Note that ⋃_n B_n ∈ σ(⋃_n 𝒜_{B_n}).

2.10. (a) Show that the class of A for which I_A(ω) = I_A(ω') is a σ-field. See Example 4.8.

2.11. (b) Suppose that 𝓕 is the σ-field of the countable and the cocountable sets in Ω. Suppose that 𝓕 is countably generated and Ω is uncountable. Show that 𝓕 is generated by a countable class of singletons; if Ω_0 is the union of these, then 𝓕 must consist of the sets B and B ∪ Ω_0^c with B ⊂ Ω_0, and these do not include the singletons in Ω_0^c, which is uncountable because Ω is. (c) Let 𝓕_1 consist of the Borel sets in Ω = (0, 1], and let 𝓕_2 consist of the countable and the cocountable sets there.
2.12. Suppose that A_1, A_2, ... is an infinite sequence of distinct sets in a σ-field 𝓕, and let 𝒢 consist of the nonempty sets of the form ⋂_{n=1}^∞ B_n, where B_n = A_n or B_n = A_n^c, n = 1, 2, .... Each A_n is the union of the 𝒢-sets it contains, and since the A_n are distinct, 𝒢 must be infinite. But there are uncountably many distinct countable unions of 𝒢-sets, and they all lie in 𝓕.
2.18. For this and the subsequent problems on applications of probability theory to arithmetic, the only number theory required is the fundamental theorem of arithmetic and its immediate consequences. The other problems on stochastic arithmetic are 4.15, 4.16, 5.19, 5.20, 6.16, 18.17, 25.15, 30.9, 30.10, 30.11, and 30.12. See also Theorem 30.3. (b) Let A consist of the even integers, let C_k = [m: u_k < m ≤ u_{k+1}], and let B consist of the even integers in C_1 ∪ C_3 ∪ ··· together with the odd integers in C_2 ∪ C_4 ∪ ···; take u_k to increase very rapidly with k and consider A ∩ B. (c) If c is the least common multiple of a and b, then M_a ∩ M_b = M_c. From M_a ∈ 𝓜 conclude in succession that M_a ∩ M_b ∈ 𝓜, M_{a_1} ∩ ··· ∩ M_{a_n} ∩ M_{g_1}^c ∩ ··· ∩ M_{g_m}^c ∈ 𝒟, f(𝓜) ⊂ 𝒟. By the same sequence of steps, show how D on 𝓜 determines D on f(𝓜). (d) If B_1 = M_a − ⋃_{p≤l} M_{ap}, then a ∈ B_1 and (the inclusion-exclusion formula requires only finite additivity) D(B_1) = a^{−1} − ∑_p (ap)^{−1} + ···

Section 3

3.2. (a) ... then A ⊂ B, B ∈ 𝓕, and P*(A) = P(B). For (3.10), argue by complementation. (b) Suppose that P*(A) = P_*(A) and choose 𝓕-sets A_1 and A_2 in such a way that A_1 ⊂ A ⊂ A_2 and P(A_1) = P(A_2). Given E, choose an 𝓕-set B in such a way that E ⊂ B and P*(E) = P(B). Then P*(A ∩ E) + P*(A^c ∩ E) ≤ P(A_2 ∩ B) + P(A_1^c ∩ B). Now use (2.7) to bound the last sum by P(B) + P(A_2 − A_1) = P*(E).
3.3. First note the general fact that P* agrees with P on 𝓕_0 if and only if P is countably additive there, a condition not satisfied in parts (b) and (e). Using Problem 3.2 simplifies the analysis of P* and 𝓜(P*) in the other parts. Note in parts (b) and (e) that, if P* and P_* are defined by (3.1) and (3.2), then, since P_*(A) = 0 for all A, (3.4) holds for all A and (3.3) holds for no A. Countable additivity thus plays an essential role in Problem 3.2.

3.6. (c) Split E^c by A: P_0(E) = 1 − P_0(E^c) = 1 − P(A ∩ E^c) − P_0(A^c ∩ E^c) = 1 − P(A ∩ E^c) − P(A^c) = P(A) − P(A − E).

3.7. (b) Apply (3.13): For A ∈ 𝓕_0, Q(A) = P(H ∩ A) + P_0(H^c ∩ A) = P(H ∩ A) + P(A) − P_0(A − (H^c ∩ A)) = P(A). (c) If A_1 and A_2 are disjoint 𝓕_0-sets, then by (3.12),

Apply (3.13) to the three terms in this equation, successively using A_1 ∪ A_2, A_1, and A_2 for A:

P_0(H^c ∩ (A_1 ∪ A_2)) = P_0(H^c ∩ A_1) + P_0(H^c ∩ A_2).

But for these two equations to hold it is enough that H ∩ A_1 ∩ A_2 = ∅ in the first case and H^c ∩ A_1 ∩ A_2 = ∅ in the second (replacing A_1 by A_1 ∩ A_2^c changes nothing).
3.8. By using Banach limits (BANACH, p. 34) one can similarly prove that density D on the class 𝒟 (Problem 2.18) extends to a finitely additive probability on the class of all subsets of Ω = {1, 2, ...}.

3.14. The argument is based on cardinality. Since the Cantor set C has Lebesgue measure 0, 2^C is contained in the class 𝓛 of Lebesgue sets in (0, 1]. But C is uncountable: card 𝓑 = card(0, 1] < card 2^C ≤ card 𝓛.

3.18. (a) Since the A ⊕ r are disjoint Borel sets, ∑_r λ(A ⊕ r) ≤ 1, and so the common value λ(A) of the λ(A ⊕ r) must be 0. Similarly, if A is a Borel set contained in some H ⊕ r, then λ(A) = 0. (b) If the E ∩ (H ⊕ r) are all Borel sets, they all have Lebesgue measure 0, and so E is a Borel set of Lebesgue measure 0.

3.19. (b) Given A_1, B_1, ..., A_{n−1}, B_{n−1}, note that their union C_n is nowhere dense, so that I contains an interval I_n disjoint from C_n. Choose in I_n disjoint, nowhere dense sets A_n and B_n of positive measure. (c) Note that A_n and B_n are disjoint and that A_n ∪ B_n ⊂ G.

3.20. (a) If I_n are disjoint open intervals with union G, then b^{−1}λ(A) ≥ ∑_n λ(I_n) ≥ ∑_n b^{−1}λ(A ∩ I_n) ≥ b^{−1}λ(A).
Section 4

4.1. Let r be the quantity on the right in (4.30), assumed finite. Suppose that x < r; then x < ⋁_{k=n}^∞ x_k for n ≥ 1 and hence x < x_k for some k ≥ n: x < x_n i.o. Suppose that x < x_n i.o.; then x ≤ ⋁_{k=n}^∞ x_k for n ≥ 1: x ≤ r. It follows that r = sup[x: x < x_n i.o.], which is easily seen to be the supremum of the limit points of the sequence. The argument for (4.31) is similar.

4.10. The class 𝓕 is the σ-field generated by 𝒢 ∪ {H} (Problem 2.7(a)). If (H ∩ G_1) ∪ (H^c ∩ G_2) = (H ∩ G'_1) ∪ (H^c ∩ G'_2), then G_1 △ G'_1 ⊂ H^c and G_2 △ G'_2 ⊂ H; consistency now follows because λ_*(H) = λ_*(H^c) = 0. If A_n = (H ∩ G_1^{(n)}) ∪ (H^c ∩ G_2^{(n)}) are disjoint, then G_1^{(m)} ∩ G_1^{(n)} ⊂ H^c and G_2^{(m)} ∩ G_2^{(n)} ⊂ H for m ≠ n, and therefore (see Problem 2.17) P(⋃_n A_n) = (1/2)λ(⋃_n G_1^{(n)}) + (1/2)λ(⋃_n G_2^{(n)}) = ∑_n ((1/2)λ(G_1^{(n)}) + (1/2)λ(G_2^{(n)})) = ∑_n P(A_n). The intervals with rational endpoints generate 𝒢.
4.14. Show as in Problem 1.1(b) that the maximum of P(B_1 ∩ ··· ∩ B_n), where B_i is A_i or A_i^c, goes to 0. Let A_x = [ω: ∑_n I_{A_n}(ω) 2^{−n} ≤ x], show that P(A ∩ A_x) is continuous in x, and proceed as in Problem 2.19(a).
4.15. Calculate D(F_1) by (2.36) and the inclusion-exclusion formula, and estimate P_n(F_1 − F) by subadditivity; now use 0 ≤ P_n(F_1) − P_n(F) = P_n(F_1 − F). For the calculation of the infinite product, see HARDY & WRIGHT, p. 246.
5.9. Check (5.39) for f(x, y) = −x^{1/p} y^{1/q}.

5.10. Check (5.39) for f(x, y) = −(x^{1/p} + y^{1/p})^p.

5.19. For (5.43) use (2.36) and the fundamental theorem of arithmetic: since the p_i are distinct, the p_i individually divide m if and only if their product does. For (5.44) use inclusion–exclusion. For (5.47), use (5.29) (see Problem 5.12).
5.20. (a) By (5.47), E_n[α_p] ≤ Σ_{k=1}^∞ p⁻ᵏ ≤ 2/p. And, of course, n⁻¹ log n! = E_n[log] = Σ_p E_n[α_p] log p.
(b) Use (5.48) and the fact that E_n[α_p − δ_p] ≤ Σ_{k≥2} p⁻ᵏ.
(c) By (5.49), π(x) ≥ x^{1/2} for large x, and hence log π(x) ≍ log x and π(x) ≍ x/log π(x). Apply this with x = p_r, and note that π(p_r) = r.
Section 6

6.3. Since for given values of X_{n1}(ω), …, X_{n,k−1}(ω) there are for X_{nk}(ω) the k possible values 0, 1, …, k − 1, the number of values of (X_{n1}(ω), …, X_{nn}(ω)) is n!. Therefore, the map ω → (X_{n1}(ω), …, X_{nn}(ω)) is one-to-one, and the X_{nk}(ω) determine ω. It follows that if 0 ≤ x_i < i for 1 ≤ i ≤ k, then the number of permutations ω satisfying X_{ni}(ω) = x_i, 1 ≤ i ≤ k, is just (k + 1)(k + 2) ⋯ n, so that P[X_{ni} = x_i, 1 ≤ i ≤ k] = 1/k!. It now follows by induction on k that X_{n1}, …, X_{nk} are independent and P[X_{nk} = x] = k⁻¹ (0 ≤ x < k). Now calculate

E[X_{nk}] = (0 + 1 + ⋯ + (k − 1))/k = (k − 1)/2,

E[S_n] = Σ_{k=1}^n (k − 1)/2 = n(n − 1)/4,

Var[X_{nk}] = (0² + 1² + ⋯ + (k − 1)²)/k − ((k − 1)/2)² = (k² − 1)/12,

Var[S_n] = Σ_{k=1}^n (k² − 1)/12 = (2n³ + 3n² − 5n)/72.

Apply Chebyshev's inequality.
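These moments are easy to check by Monte Carlo. The sketch below is my own (reading S_n, as is standard for this problem, as the inversion count of a uniform random permutation):

```python
import random

def inversions(perm):
    # count pairs (i, j) with i < j and perm[i] > perm[j]
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])

random.seed(0)
n, trials = 8, 20000
samples = []
for _ in range(trials):
    p = list(range(n))
    random.shuffle(p)
    samples.append(inversions(p))

mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / trials
print(mean, n * (n - 1) / 4)                      # both near 14
print(var, (2 * n**3 + 3 * n**2 - 5 * n) / 72)    # both near 16.33
```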
6.7. (a) If k² ≤ n < (k + 1)², let a_n = k²; if M bounds the |x_n|, then

|s_n/n − s_{a_n}/a_n| ≤ (1/n)|s_n − s_{a_n}| + |s_{a_n}|(1/a_n − 1/n) ≤ M(n − a_n)/n + M a_n (n − a_n)/(a_n n) ≤ 2M(n − a_n)/n → 0.
6.16. From (5.53) and (5.54) it follows that a_n = Σ_p n⁻¹⌊n/p⌋ → ∞. The left side of (6.8) is

(1/n)⌊n/(pq)⌋ − (1/n)⌊n/p⌋ · (1/n)⌊n/q⌋ ≤ 1/(pq) − (1/p − 1/n)(1/q − 1/n) ≤ (1/n)(1/p + 1/q).
Section 7
7.3. If one grants that there are only countably many effective rules, the result is an immediate consequence of the mathematics of this and the preceding sections: C is a countable intersection of ℱ-sets of measure 1. The argument proves in particular the nontrivial fact that collectives exist.

7.7. If n ≤ T, then W_n = W_{n−1} − X_{n−1} = W₁ − S_{n−1}, and T is the smallest n for which S_{n−1} = W₁. Use (7.8) for the question of whether the game terminates. Now

F_T = F₀ + Σ_{k=1}^{T−1} (W₁ − S_{k−1}) X_k = F₀ + W₁ S_{T−1} − ½(S²_{T−1} − (T − 1)).
7.8. Let x₁, …, x_r be the initial pattern and put Σ₀ = x₁ + ⋯ + x_r. Define Σ_n = Σ_{n−1} − W_n X_n, L₀ = k, and L_n = L_{n−1} − (3X_n + 1)/2. Then T is the smallest n such that L_n ≤ 0, and T is by the strong law finite with probability 1 if E[3X_n + 1] = 6(p − ⅓) > 0. For n ≤ T, Σ_n is the sum of the pattern used to determine W_{n+1}. Since F_n − F_{n−1} = Σ_{n−1} − Σ_n, it follows that F_n = F₀ + Σ₀ − Σ_n and F_T = F₀ + Σ₀.
Section 8
8.8. (b) With probability 1 the population either dies out or goes to infinity. If, for example, p_{k0} = 1 − p_{k,k+1} = 1/k², then extinction and explosion each have positive probability.
8.9. To prove that x_i ≡ 0 is the only possibility in the persistent case, use Problem 8.5, or else argue directly: if x_i = Σ_{j≠i₀} p_{ij} x_j for i ≠ i₀, and K bounds the |x_i|, … that P and N are both nonempty. For i₀ ∈ P and j₀ ∈ N choose n so that p_{i₀j₀}⁽ⁿ⁾ > 0. Then …, and apply Theorem 8.8.
In FELLER, Volume 1, the renewal theorem is proved by purely analytic means and is then used as the starting point for the theory of Markov chains. Here the procedure is the reverse.
8.19. The transition probabilities are p_{0r} = 1 and p_{i,r−i+1} = p, p_{i,r−i} = q, 1 ≤ i ≤ r; the stationary probabilities are u₁ = ⋯ = u_r = q⁻¹u₀ = (r + q)⁻¹. The chance of getting wet is u₀p, of which the maximum is 2r + 1 − 2√(r(r + 1)). For r = 5 this is .046, the pessimal value of p being .523. Of course, u₀p < 1/4r. In more reasonable climates fewer umbrellas suffice: if p = .25 and r = 3, then u₀p = .050; if p = .1 and r = 2, then u₀p = .031. At the other end of the scale, if p = .8 and r = 3, then u₀p = .050; and if p = .9 and r = 2, then u₀p = .043.
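A numerical check of these figures (my own sketch, not part of the text): the chance of getting wet is u₀p = pq/(r + q) with q = 1 − p, and a grid search over p recovers both the pessimal p ≈ .523 and the closed-form maximum 2r + 1 − 2√(r(r + 1)) for r = 5.

```python
import math

def wet(p, r):
    # stationary chance of getting wet: u0 * p with u0 = q/(r + q)
    q = 1.0 - p
    return p * q / (r + q)

r = 5
ps = [i / 100000.0 for i in range(1, 100000)]
p_star = max(ps, key=lambda p: wet(p, r))
closed = 2 * r + 1 - 2 * math.sqrt(r * (r + 1))

print(round(p_star, 3), round(wet(p_star, r), 3))   # ≈ 0.523, 0.046
print(round(closed, 3))                             # ≈ 0.046
```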
8.22. For the last part, consider the chain with state space C_m and transition probabilities p_{ij} for i, j ∈ C_m (show that they do add to 1).

8.23. Let C′ = S − (T ∪ C), and take U = T ∪ C′ in (8.51). The probability of absorption in C is the probability of ever entering it, and for initial states i in T ∪ C′ these probabilities are the minimal solution of

y_i = Σ_{j∈T} p_{ij} y_j + Σ_{j∈C′} p_{ij} y_j + Σ_{j∈C} p_{ij},  i ∈ T ∪ C′,
0 ≤ y_i ≤ 1,  i ∈ T ∪ C′.

Since the states in C′ (C′ = ∅ is possible) are persistent and C is closed, it is impossible to move from C′ to C. Therefore, in the minimal solution of the system above, y_i = 0 for i ∈ C′. This gives the system (8.55). It also gives, for the minimal solution, Σ_{j∈T} p_{ij} y_j + Σ_{j∈C} p_{ij} = 0, i ∈ C′. This makes probabilistic sense: for an i in C′, not only is it impossible to move to a j in C, it is impossible to move to a j in T for which there is positive probability of absorption in C.
8.24. Fix on a state i, and let S_ν consist of those j for which p_{ij}⁽ⁿ⁾ > 0 for some n congruent to ν modulo t. Choose k so that p_{ji}⁽ᵏ⁾ > 0; if p_{ij}⁽ᵐ⁾ and p_{ij}⁽ⁿ⁾ are positive, then t divides m + k and n + k, so that m and n are congruent modulo t. The S_ν are thus well defined.

8.25. Show that Theorem 8.6 applies to the chain with transition probabilities p_{ij}⁽ᵗ⁾.

8.27. (a) From PC = CΛ follows Pc_u = λ_u c_u, from RP = ΛR follows r_u P = λ_u r_u, and from RC = I follows r_u c_v = δ_{uv}. Clearly Λⁿ is diagonal and P = CΛR. Hence

p_{ij}⁽ⁿ⁾ = Σ_u c_{iu} λ_uⁿ r_{uj} = Σ_u λ_uⁿ (c_u r_u)_{ij} = Σ_u λ_uⁿ (A_u)_{ij}.
By Problem 8.26, there are scalars ρ and γ such that r₁ = ρr₀ = ρ(π₁, …, π_s) and c₁ = γc₀, where c₀ is the column vector of 1's. From r₁c₁ = 1 follows ργ = 1, and hence A₁ = c₁r₁ = c₀r₀ has rows (π₁, …, π_s). Of course, (8.56) gives the exact rate of convergence. It is useful for numerical work. Since P_i([τ < n] ∩ [X_k > 0, k ≥ 1]) → f_{i0} > 0, there is an n of the kind required in the last part of the problem. And now E_i[f(X_τ)] ≤ P_i[τ < n = u_n] f(i + n) + 1 − P_i[τ < n = u_n] = 1 − P_i[τ < n = u_n] f_{i+n,0}.
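A small numerical illustration of this spectral decomposition (my own sketch; the two-state chain and its closed form are standard, not taken from the text): for P = [[1−a, a], [b, 1−b]] one has Pⁿ = A₁ + λⁿA₂ with λ = 1 − a − b and A₁ having the stationary rows (b, a)/(a + b), and matrix powers match it term for term.

```python
a, b = 0.3, 0.1
lam = 1.0 - a - b

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

P = [[1 - a, a], [b, 1 - b]]
Pn = [[1.0, 0.0], [0.0, 1.0]]
n = 20
for _ in range(n):
    Pn = matmul(Pn, P)

s = a + b
closed = [[(b + a * lam**n) / s, (a - a * lam**n) / s],
          [(b - b * lam**n) / s, (a + b * lam**n) / s]]
err = max(abs(Pn[i][j] - closed[i][j]) for i in range(2) for j in range(2))
pi = (b / s, a / s)
print(err)          # essentially machine precision
print(Pn[0], pi)    # rows of P^n already close to (0.25, 0.75)
```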
8.37. If i ≥ 1, n₁ < n₂, (i, …, i + n₁) ∈ I_{n₁}, and (i, …, i + n₂) ∈ I_{n₂}, then P_i[τ = n₁, τ = n₂] ≥ P_i[X_k = i + k, k ≤ n₂] > 0, which is impossible.

Section 9

9.3. See
BAHADUR.
9.7. Because of Theorem 9.6 there are for P[M_n > a] bounds of the same order as the ones for P[S_n > a] used in the proof of (9.36).
Section 10

10.7. Let μ₁ be counting measure on the σ-field of all subsets of a countably infinite Ω, let μ₂ = 2μ₁, and let 𝒟 consist of the cofinite sets. Granted the existence of Lebesgue measure λ on ℛ¹, one can construct another example: let μ₁ = λ and μ₂ = 2λ, and let 𝒟 consist of the half-infinite intervals (−∞, x]. There are similar examples with a field ℱ₀ in place of 𝒟. Let Ω consist of the rationals in (0, 1], let μ₁ be counting measure, let μ₂ = 2μ₁, and let ℱ₀ consist of finite disjoint unions of "intervals" [r ∈ Ω: a < r ≤ b].

Section 11

11.4. (b) If (f, g] ⊂ ⋃_k (f_k, g_k], then (f(ω), g(ω)] ⊂ ⋃_k (f_k(ω), g_k(ω)] for all ω, and Theorem 1.3 gives g(ω) − f(ω) ≤ Σ_k (g_k(ω) − f_k(ω)). If h_n = (g − f − Σ_{k≤n}(g_k − f_k)) ∨ 0, then h_n ↓ 0 and g − f ≤ Σ_{k≤n}(g_k − f_k) + h_n. The positivity and continuity of Λ now give ν₀(f, g] ≤ Σ_k ν₀(f_k, g_k]. A similar, easier argument shows that Σ_k ν₀(f_k, g_k] ≤ ν₀(f, g] if the (f_k, g_k] are disjoint subsets of (f, g].

11.5. (b) From (11.7) it follows that [f ≥ 1] ∈ ℱ₀ for f in ℒ. Since ℒ is linear, [f ≥ x] and [f ≤ −x] are in ℱ₀ for f ∈ ℒ and x > 0. Since the sets [x, ∞) and (−∞, −x] for x > 0 generate ℛ¹, each f in ℒ is measurable σ(ℱ₀). Hence ℱ ⊂ σ(ℱ₀). It is easy to show that ℱ₀ is a semiring and is in fact closed under the formation of proper differences. It can happen that Ω ∉ ℱ₀ — for example, in the case where Ω = {1, 2} and ℒ consists of the f with f(1) = 0. (See Jürgen Kindler: A simple proof of the Daniell–Stone representation theorem, Amer. Math. Monthly 90 (1983), 396–397.)
Section 12

12.4. (a) If θ_n = θ_m, then θ_{n−m} = 0 and n = m because θ is irrational. Split G into finitely many intervals of length less than ε; one of them must contain points θ_{2n} and θ_{2m} with θ_{2n} < θ_{2m}. If k = m − n, then 0 < θ_{2m} ⊖ θ_{2n} = θ_{2k} < ε, and the points θ_{2kl} for 1 ≤ l ≤ ⌊1/θ_{2k}⌋ form a chain in which the distance from each to the next is less than ε, the first is to the left of ε, and the last is to the right of 1 − ε.
(c) If θ_{2k+1} = s₁ ⊖ s₂ with s₁ = θ_{2n₁} and s₂ = θ_{2n₂} in the subgroup, then θ_{2k+1} = θ_{2(n₁−n₂)}, which is impossible.
12.5. (a) The S ⊕ θ_n are disjoint, and (2n + 1)u + k = (2n + 1)u′ + k′ with |k|, |k′| ≤ n is impossible if u ≠ u′. (b) The A ⊕ θ_n are disjoint, contained in G, and have the same Lebesgue measure.

12.6. See Example 2.10 (which applies to any finite measure).

12.8. By Theorem 12.3 and Problem 2.19(b), A contains two disjoint compact sets of arbitrarily small positive measure. Construct inductively compact sets K_{u₁⋯u_n} (each u_i is 0 or 1) such that 0 < μ(K_{u₁⋯u_n}) < 3⁻ⁿ and K_{u₁⋯u_n0} and K_{u₁⋯u_n1} are disjoint subsets of K_{u₁⋯u_n}. Take K = ⋂_n ⋃_{u₁⋯u_n} K_{u₁⋯u_n}. The Cantor set is a special case.
Section 13

13.3. If f = Σ_i x_i I_{A_i} and A_i ∈ T⁻¹𝒳′, take A′_i in 𝒳′ so that A_i = T⁻¹A′_i, and set φ = Σ_i x_i I_{A′_i}. For the general f measurable T⁻¹𝒳′, there exist simple functions f_n, measurable T⁻¹𝒳′, such that f_n(ω) → f(ω) for each ω. Choose φ_n, measurable 𝒳′, so that f_n = φ_n T. Let C′ be the set of ω′ for which φ_n(ω′) has a finite limit, and define φ(ω′) = lim_n φ_n(ω′) for ω′ ∈ C′ and φ(ω′) = 0 for ω′ ∉ C′. Theorem 20.1(ii) is a special case.

13.7. The class of Borel functions contains the continuous functions and is closed under pointwise passages to the limit and hence contains 𝒢. By imitating the proof of the π–λ theorem, show that if f and g lie in 𝒢, then so do f + g, fg, f − g, f ∨ g (note that, for example, [g: f + g ∈ 𝒢] is closed under passages to the limit). If f_n(x) is 1 or 1 − n(x − a) or 0 as x ≤ a or a ≤ x ≤ a + n⁻¹ or a + n⁻¹ ≤ x, then f_n is continuous and f_n(x) → I_{(−∞, a]}(x). Show that [A: I_A ∈ 𝒢] is a λ-system. Conclude that 𝒢 contains all indicators of Borel sets, all simple Borel functions, all Borel functions.

13.13. Let B = {b₁, …, b_k}, where k < n, E_i = C − b_i⁻¹A, and E = ⋃_{i=1}^k E_i. Then E = C − ⋂_{i=1}^k b_i⁻¹A. Since μ is invariant under rotations, μ(E_i) = 1 − μ(A) < n⁻¹, and hence μ(E) < 1. Therefore C − E = ⋂_{i=1}^k b_i⁻¹A is nonempty. Use any θ in C − E.
Section 14

14.3. (b) Since u ≤ F(x) is equivalent to φ(u) ≤ x, it follows that u ≤ F(φ(u)). And since F(x) < u is equivalent to x < φ(u), it follows further that F(φ(u) − ε) < u for positive ε.

14.4. (a) If 0 < u < v < 1, then P[u ≤ F(X) < v, X ∈ C] = P[φ(u) ≤ X < φ(v), X ∈ C]. If φ(u) ∈ C, this is at most P[φ(u) ≤ X < φ(v)] = F(φ(v)−) − F(φ(u)−) = F(φ(v)−) − F(φ(u)) ≤ v − u; if φ(u) ∉ C, it is at most P[φ(u) < X < φ(v)] = F(φ(v)−) − F(φ(u)) ≤ v − u. Thus P[F(X) ∈ [u, v), X ∈ C] ≤ λ[u, v) if 0 < u < v < 1. This is true also for u = 0 (let u ↓ 0 and note that P[F(X) = 0] = 0) and for v = 1 (let v ↑ 1). The finite disjoint unions of intervals [u, v) in [0, 1) form a field there, and by addition P[F(X) ∈ A, X ∈ C] ≤ λ(A) for A in this field. By the monotone class theorem, the inequality holds for all Borel sets in [0, 1). Since P[F(X) = 1, X ∈ C] = 0, this holds also for A = {1}.

14.5. The sufficiency is easy. To prove necessity, choose continuity points x_i of F in such a way that x₀ < x₁ < ⋯ < x_k, F(x₀) < ε, F(x_k) > 1 − ε, and x_i − x_{i−1} < ε. If n exceeds some n₀, |F(x_i) − F_n(x_i)| < ε/2 for all i. Suppose that x_{i−1} ≤ x ≤ x_i. Then F_n(x) ≤ F_n(x_i) ≤ F(x_i) + ε/2 ≤ F(x + ε) + ε/2. Establish a similar inequality going the other direction, and give special arguments for the cases x < x₀ and x > x_k.
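The quantile relation in 14.3(b) is exactly what makes inverse-CDF sampling work. A sketch of my own (the exponential distribution is an arbitrary example, not the book's): with φ(u) = inf[x: u ≤ F(x)] and U uniform, φ(U) has distribution F.

```python
import math, random

random.seed(1)
n = 50000
# F(x) = 1 - e^{-x}  =>  quantile function phi(u) = -log(1 - u)
samples = sorted(-math.log(1.0 - random.random()) for _ in range(n))

# Kolmogorov-style distance between empirical and true CDFs
dist = max(abs((i + 1) / n - (1.0 - math.exp(-x)))
           for i, x in enumerate(samples))
print(dist)   # small, on the order of 1/sqrt(n)
```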
Section 15

15.1. Suppose there is an ℱ-partition {A_i} such that Σ_i [sup_{A_i} f] μ(A_i) < ∞. Then a_i = sup_{A_i} f < ∞ for i in the set I of indices for which μ(A_i) > 0. If a = max_I a_i, then μ[f > a] = Σ_i μ(A_i ∩ [f > a]) ≤ Σ_I μ(A_i ∩ [f > a_i]) = 0. And A_i ∩ [f > 0] = ∅ for i outside the set J of indices for which μ(A_i) < ∞, so that μ[f > 0] = Σ μ(A_i ∩ [f > 0]) ≤ Σ_J μ(A_i) < ∞.
15.4. Let (Ω, ℱ⁺, μ⁺) be the completion (Problems 3.10 and 10.5) of (Ω, ℱ, μ). If g is measurable ℱ, [f ≠ g] ⊂ A, A ∈ ℱ, μ(A) = 0, and H ∈ ℛ¹, then [f ∈ H] = (Aᶜ ∩ [f ∈ H]) ∪ (A ∩ [f ∈ H]) = (Aᶜ ∩ [g ∈ H]) ∪ (A ∩ [f ∈ H]) lies in ℱ⁺, and hence f is measurable ℱ⁺.
(a) Since f is measurable ℱ⁺, it will be enough to prove that for each (finite) ℱ⁺-partition {B_i} there is an ℱ-partition {A_i} such that Σ_i [inf_{A_i} f] μ(A_i) ≥ Σ_i [inf_{B_i} f] μ⁺(B_i), and to prove the dual relation for the upper sums. Choose (Problem 3.2) ℱ-sets A_i so that A_i ⊂ B_i and μ(A_i) = μ_*(B_i) = μ⁺(B_i). For the partition consisting of the A_i together with (⋃_i A_i)ᶜ, the lower sum is at least Σ_i [inf_{B_i} f] μ(A_i) = Σ_i [inf_{B_i} f] μ⁺(B_i).
(b) Choose successively finer ℱ-partitions {A_{ni}} in such a way that the corresponding upper and lower sums differ by at most 1/n³. Let g_n and f_n have values inf_{A_{ni}} f and sup_{A_{ni}} f on A_{ni}. Use Markov's inequality — since μ(Ω) is finite, it may as well be 1 — to show that μ[f_n − g_n > 1/n] ≤ 1/n², and then use the first Borel–Cantelli lemma to show that f_n − g_n → 0 almost everywhere. Take g = lim_n g_n.
Section 16
16.4. (a) By Fatou's lemma,

∫(b − lim sup_n f_n) dμ = ∫ lim inf_n (b_n − f_n) dμ ≤ lim inf_n ∫(b_n − f_n) dμ = ∫ b dμ − lim sup_n ∫ f_n dμ,

and therefore lim sup_n ∫ f_n dμ ≤ ∫ lim sup_n f_n dμ.
16.6. For ω ∈ A and small enough complex h,

|f(ω, z₀ + h) − f(ω, z₀)| = |∫_{z₀}^{z₀+h} f′(ω, z) dz| ≤ |h| g(ω, z₀).

16.8. Use the fact that ∫_A |f| dμ ≤ aμ(A) + ∫_{[|f|≥a]} |f| dμ.
16.9. If μ(A) < δ implies ∫_A |f_n| dμ < ε for all n, and if a ≥ δ⁻¹ sup_n ∫ |f_n| dμ, then μ[|f_n| ≥ a] ≤ a⁻¹ ∫ |f_n| dμ ≤ δ and hence ∫_{[|f_n|≥a]} |f_n| dμ < ε for all n. For the reverse implication adapt the argument in the preceding note.
16.10. (b) Suppose that the f_n are nonnegative and satisfy condition (ii) and μ is nonatomic. Choose δ so that μ(A) < δ implies ∫_A f_n dμ < 1 for all n. If μ[f_n = ∞] > 0, there is an A such that A ⊂ [f_n = ∞] and 0 < μ(A) < δ; but then ∫_A f_n dμ = ∞. Since μ[f_n = ∞] = 0, there is an a such that μ[f_n > a] < δ ≤ μ[f_n ≥ a]. Choose B ⊂ [f_n = a] in such a way that A = [f_n > a] ∪ B satisfies μ(A) = δ. Then aδ = aμ(A) ≤ ∫_A f_n dμ < 1 and ∫ f_n dμ ≤ 1 + aμ(Aᶜ) < 1 + δ⁻¹μ(Ω).

16.12. (b) Suppose that f ∈ ℒ and f ≥ 0. If f_n = (1 − n⁻¹)f ∨ 0, then f_n ∈ ℒ and f_n ↑ f, so that ν(f_n, f] = Λ(f − f_n) ↓ 0. Since ν(f₁, f] < ∞, it follows that ν[(ω, t): f(ω) = t] = 0. The disjoint union ⋃_n (f_n, f_{n+1}] increases to B, where B ⊂ (0, f] and (0, f] − B ⊂ [(ω, t): f(ω) = t]. Therefore ν(0, f] = lim_n Λ(f_n) = Λ(f).
Section 17

17.1. (a) Let A_ε be the set of x such that for every δ there are points y and z satisfying |y − x| < δ, |z − x| < δ, and |f(y) − f(z)| ≥ ε. Show that A_ε is closed and D_f is the union of the A_ε.
(c) Given ε and η, choose a partition into intervals I_i for which the corresponding upper and lower sums differ by at most εη. By considering those I_i whose interiors meet A_ε, show that εη ≥ ελ(A_ε), so that λ(A_ε) ≤ η.
(d) Let M bound |f| and, given ε, find an open G such that D_f ⊂ G and λ(G) < ε/M. Take C = [0, 1] − G and show by compactness that there is a δ such that |f(y) − f(x)| < ε if x (but perhaps not y) lies in C and |y − x| < δ. If [0, 1] is decomposed into intervals I_i with λ(I_i) < δ, and if x_i ∈ I_i, let g be the function with value f(x_i) on I_i. Let Σ′ denote summation over those i for which I_i meets C, and let Σ″ denote summation over the other i. Show that

|∫ f(x) dx − Σ_i f(x_i)λ(I_i)| ≤ ∫ |f(x) − g(x)| dx ≤ Σ′ 2ελ(I_i) + Σ″ 2Mλ(I_i) ≤ 4ε.
17.10. (c) Do not overlook the possibility that points in (0, 1) − K converge to a point in K.

17.11. (b) Apply the bounded convergence theorem to f_n(x) = (1 − n dist(x, [s, t]))⁺.
(c) The class of Borel sets B in [u, v] for which f = I_B satisfies (17.8) is a λ-system.
(e) Choose simple f_n such that 0 ≤ f_n ↑ f. To (17.8) for f = f_n, apply the monotone convergence theorem on the right and the dominated convergence theorem on the left.

17.12. If g(x) is the distance from x to [a, b], then f_n = (1 − ng) ∨ 0 ↓ I_{[a,b]} and f_n ∈ ℒ; since the continuous functions are measurable ℛ¹, it follows that σ(ℒ) ⊂ ℛ¹. If f_n(x) ↓ 0 for each x, then the compact sets [x: f_n(x) ≥ ε] decrease to ∅ and hence one of them is empty; thus the convergence is uniform.
17.13. The linearity and positivity of Λ are certainly elementary facts, and for the continuity property, note that if 0 ≤ f ≤ ε and f vanishes outside [a, b], then elementary considerations show that 0 ≤ Λ(f) ≤ ε(b − a).

Section 18

18.2. First, 𝒳 × 𝒳 is generated by the sets of the forms {x} × X and X × {x}. If the diagonal E lies in 𝒳 × 𝒳, then there must be a countable S in X such that E lies in the σ-field ℱ generated by the sets of these two forms for x in S. If 𝒫 consists of Sᶜ and the singletons in S, then ℱ is the class of unions of sets in the partition [P₁ × P₂: P₁, P₂ ∈ 𝒫]. But E ∈ ℱ is impossible.

18.3. Consider A × B, where A consists of a single point and B lies outside the completion of ℛ¹ with respect to λ.
18.17. Put f(p) = p⁻¹ log p, and put f_n = 0 if n is not a prime. In the notation of (18.17), F(x) = log x + φ(x), where φ is bounded because of (5.51). If G(x) = 1/log x, then

Σ_{p≤x} 1/p = F(x)/log x + ∫₂ˣ F(t)/(t log² t) dt
= 1 + φ(x)/log x + ∫₂ˣ dt/(t log t) + ∫₂ˣ φ(t)/(t log² t) dt
= log log x + 1 − log log 2 + ∫₂^∞ φ(t)/(t log² t) dt − ∫ₓ^∞ φ(t)/(t log² t) dt + φ(x)/log x.
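Numerically the bounded terms above settle down to the classical Mertens constant M ≈ 0.2615 (a known value, not derived in the text). A quick sieve check of my own:

```python
import math

N = 200000
sieve = [True] * (N + 1)
sieve[0] = sieve[1] = False
for i in range(2, int(N ** 0.5) + 1):
    if sieve[i]:
        for j in range(i * i, N + 1, i):
            sieve[j] = False

# sum_{p <= N} 1/p - log log N  ->  M ≈ 0.2615
s = sum(1.0 / p for p in range(2, N + 1) if sieve[p])
print(s - math.log(math.log(N)))   # ≈ 0.2615
```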
Section 19

19.3. See BANACH, p. 34.

19.4. (a) Take f = 0 and f_n = I_{(0,1/n)}. (b) Take f = 0, and let {f_n} be an infinite orthonormal set. Use the fact that Σ_n (f_n, g)² ≤ ‖g‖².

19.5. Take f_n = … ≥ 1/3; then B₀(6s) ≤ 1 ≤ 3B_ε(s). In either case, B₀(6s) ≤ 3B_ε(s).
Section 23

23.3. Note that A_t cannot exceed t. If 0 ≤ u < t and v ≥ 0, then

P[A_t ≥ u, B_t ≥ v] = P[N_{t+v} − N_{t−u} = 0] = e^{−αu} e^{−αv}.
23.4. (a) Use (20.37) and the distributions of A_t and B_t. (b) A long interarrival interval has a better chance of covering t than a short one does.

23.6. The probability in question — that an independent rate-β process registers exactly j events over an interval with the gamma density α^k x^{k−1} e^{−αx}/Γ(k) — is

∫₀^∞ e^{−βx} ((βx)^j/j!) (α^k x^{k−1}/Γ(k)) e^{−αx} dx = ((j + k − 1)!/(j!(k − 1)!)) α^k β^j/(α + β)^{k+j}.
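The inspection-paradox claim in 23.4(b) is easy to see by simulation (my own sketch, not part of the text): the interarrival interval covering a fixed time t has mean length about 2/α, twice that of a typical interval, precisely because long intervals are more likely to cover t.

```python
import random

random.seed(2)
alpha, t, trials = 1.0, 50.0, 20000
lengths = []
for _ in range(trials):
    s = 0.0
    while True:
        gap = random.expovariate(alpha)   # interarrival times, rate alpha
        if s + gap > t:                   # this gap covers time t
            lengths.append(gap)
            break
        s += gap

mean_len = sum(lengths) / trials
print(mean_len)   # ≈ 2/alpha = 2, not 1/alpha = 1
```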
23.8. Let M_t be the given process and put φ(t) = E[M_t]. Since there are no fixed discontinuities, φ(t) is continuous. Let ψ(u) = inf[t: u ≤ φ(t)], and show that N_u = M_{ψ(u)} is an ordinary Poisson process and M_t = N_{φ(t)}.
2 it /3")• 31 15 noon � l (.!.2 + .!.e 2 •
31.18. For x fixed, let u_n and v_n be the pair of successive dyadic rationals of order n (v_n − u_n = 2⁻ⁿ) for which u_n ≤ x < v_n. Show that

(f(v_n) − f(u_n))/(v_n − u_n) = Σ_{k=0}^{n−1} a_k⁻(x),

where a_k⁻ is the left-hand derivative. Since a_k⁻(x) = ±1 for all x and k, the difference ratio cannot have a finite limit.
31.22. Let A be the x-set where (31.35) fails if f is replaced by fφ; then A has Lebesgue measure 0. Let G be the union of all open sets of μ-measure 0; represent G as a countable disjoint union of open intervals, and let B be G together with any endpoints of zero μ-measure of these intervals. Let D be the set of discontinuity points of F. If F(x) ∉ A, x ∉ B, and x ∉ D, then F(x − h) < F(x) ≤ F(x + h), F(x ± h) → F(x), and …

Now x − ε ≤ φ(F(x)) ≤ x follows from F(x − ε) < F(x), and hence φ(F(x)) = x. If λ is Lebesgue measure restricted to (0, 1), then μ = λφ⁻¹, and (31.36) follows by change of variable. But (31.36) is easy if x ∈ D, and hence it holds outside B ∪ (Dᶜ ∩ F⁻¹A). But μ(B) = 0 by construction and μ(Dᶜ ∩ F⁻¹A) = 0 by Problem 14.4.
Section 32
32.7. Define μ_n and ν_n as in (32.7), and write ν_n = ν_ac⁽ⁿ⁾ + ν_s⁽ⁿ⁾, where ν_ac⁽ⁿ⁾ is absolutely continuous and ν_s⁽ⁿ⁾ singular with respect to μ_n. Take ν_ac = Σ_n ν_ac⁽ⁿ⁾ and ν_s = Σ_n ν_s⁽ⁿ⁾. Suppose that ν_ac(E) + ν_s(E) = ν′_ac(E) + ν′_s(E) for all E in ℱ. Choose an S in ℱ that supports ν_s and ν′_s and satisfies μ(S) = 0. Then

ν_ac(E) = ν_ac(E ∩ Sᶜ) = ν_ac(E ∩ Sᶜ) + ν_s(E ∩ Sᶜ) = ν′_ac(E ∩ Sᶜ) + ν′_s(E ∩ Sᶜ) = ν′_ac(E ∩ Sᶜ) = ν′_ac(E).

A similar argument shows that ν_s(E) = ν′_s(E).
32.8. (a) Show that the class in question is closed under the formation of countable unions; choose sets B_n in it such that μ(B_n) → sup_B μ(B) (< ∞), and take B₀ = ⋃_n B_n.
(b) The same argument.
(c) Suppose μ(D₀) > 0. The maximality of B₀ implies that B₀ ∪ D₀ contains an E such that μ(E) > 0 and ν(E) < ∞. Since B₀ ∩ E ⊂ B₀, μ(B₀ ∩ E) = 0 (ν(E) < ∞ rules out ν(B₀ ∩ E) = ∞). Therefore, μ(D₀ ∩ E) > 0 and ν(D₀ ∩ E) < ∞, which contradicts the maximality of C₀.
(d) Take the density to be ∞ on D₀.

32.9. Define f and ν_s as in (32.8), and let f⁰ and ν_s⁰ be the corresponding function and measure for ℱ₀: ν(E) = ∫_E f⁰ dμ + ν_s⁰(E) for E ∈ ℱ₀, and there is an ℱ₀-set S₀ such that ν_s⁰(Ω − S₀) = 0 and μ(S₀) = 0. If E ∈ ℱ₀, it follows that ∫_E f⁰ dμ = ∫_{E−S₀} f⁰ dμ = ∫_{E−S₀} f⁰ dμ₀ = ν₀(E − S₀) = ν(E − S₀) ≥ ∫_{E−S₀} f dμ = ∫_E f dμ.
It is instructive to consider the extreme case ℱ₀ = {∅, Ω}, in which ν₀ is absolutely continuous with respect to μ₀ (provided μ(Ω) > 0) and hence ν_s⁰ vanishes.
Section 33

33.2. (a) To prove independence, check the covariance. Now use Example 33.7. (b) Use the fact that R and Θ are independent (Example 20.2). (c) As the single event [X = Y] = [X − Y = 0] = [Θ = π/4] ∪ [Θ = 5π/4] has probability 0, the conditional probabilities have no meaning, and strictly speaking there is nothing to resolve. But whether it is natural to regard the degrees of freedom as one or as two depends on whether the 45° line through the origin is regarded as an element of the decomposition of the plane into 45° lines or whether it is regarded as the union of two elements of the decomposition of the plane into rays from the origin. Borel's paradox can be explained the same way: The equator is an element of the decomposition of the sphere into lines of constant latitude; the Greenwich meridian is an element of the decomposition of the sphere into great circles with common poles. The decomposition matters, which is to say the σ-field matters.

33.3. (a) If the guard says, "1 is to be executed," then the conditional probability that 3 is also to be executed is 1/(1 + p). The "paradox" comes from assuming that p must be ½, in which case the conditional probability is indeed ⅔. But if p ≠ ½, then the guard does give prisoner 3 some information.
(b) Here "one" and "other" are undefined, and the problem ignores the possibility that you have been introduced to a girl. Let the sample space consist of eight points,

bbo: α/4, bby: (1 − α)/4, bgo: β/4, bgy: (1 − β)/4, gbo: γ/4, gby: (1 − γ)/4, ggo: δ/4, ggy: (1 − δ)/4.

For example, bgo is the event (probability β/4) that the older child is a boy, the younger is a girl, and the child you have been introduced to is the older; and ggy is the event (probability (1 − δ)/4) that both children are girls and the one you have been introduced to is the younger. Note that the four sex distributions do have probability ¼. If the child you have been introduced to is a boy, then the conditional probability that the other child is also a boy is p = 1/(2 + β − γ). If β = 1 and γ = 0 (the parents present a son if they have one), then p = ⅓. If β = γ (the parents are indifferent), then p = ½. Any p between ⅓ and 1 is possible. This problem shows again that one must keep in mind the entire experiment the sub-σ-field 𝒢 represents, not just one of the possible outcomes of the experiment.
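The formula p = 1/(2 + β − γ) can be checked by simulating the presentation model (my own code; the parameters β, γ are the ones in the note):

```python
import random

def estimate(beta, gamma, trials=200000, seed=3):
    # beta: prob. the older child is presented when the pair is (boy, girl)
    # gamma: prob. the older child is presented when the pair is (girl, boy)
    rng = random.Random(seed)
    shown_boy = other_boy = 0
    for _ in range(trials):
        older = rng.choice("bg")
        younger = rng.choice("bg")
        if older == "b" and younger == "g":
            shown = "b" if rng.random() < beta else "g"
        elif older == "g" and younger == "b":
            shown = "g" if rng.random() < gamma else "b"
        else:
            shown = older  # both children have the same sex
        if shown == "b":
            shown_boy += 1
            other_boy += (older == "b" and younger == "b")
    return other_boy / shown_boy

print(estimate(1.0, 0.0))   # ≈ 1/3
print(estimate(0.5, 0.5))   # ≈ 1/2
```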
33.6. There is no problem, unless the notation gives rise to the illusion that p(A|x) is P(A ∩ [X = x])/P[X = x].

33.15. If N is a standard normal variable, then
Section 34

34.3. If (X, Y) takes the values (0, 0), (1, −1), and (1, 1) with probability ⅓ each, then X and Y are dependent but E[Y‖X] = E[Y] = 0. If (X, Y) takes the values (−1, 1), (0, −2), and (1, 1) with probability ⅓ each, then E[X] = E[Y] = E[XY] = 0 and so E[XY] = E[X]E[Y], but E[Y‖X] = Y ≠ 0 = E[Y]. Of course, this is another example of dependent but uncorrelated random variables.
34.4. First show that ∫ f dP₀ = ∫_B f dP/P(B) and that P[B‖𝒢] > 0 on a set of P₀-measure 1. Let G be the general set in 𝒢.
(a) Since

P(B) ∫_G P₀[A‖𝒢] dP₀ = P(B) P₀(A ∩ G) = P(A ∩ B ∩ G) = ∫_G P[A ∩ B‖𝒢] dP,

it follows that P₀[A‖𝒢] P[B‖𝒢] = P[A ∩ B‖𝒢] holds on a set of P-measure 1.
(b) If P_i(A) = P(A|B_i), then

∫_{G∩B_i} P_i[A‖𝒢] dP = ∫_{G∩B_i} P[A‖𝒢 ∨ ℋ] dP.

Therefore, ∫_C I_{B_i} P_i[A‖𝒢] dP = ∫_C I_{B_i} P[A‖𝒢 ∨ ℋ] dP if C = G ∩ B_i, and of course this holds for C = G ∩ B_j if j ≠ i. But C's of this form constitute a π-system generating 𝒢 ∨ ℋ, and hence I_{B_i} P_i[A‖𝒢] = I_{B_i} P[A‖𝒢 ∨ ℋ] on a set of P-measure 1. Now use the result in part (a).
34.9. All such results can be proved by imitating the proofs for the unconditional case or else by using Theorem 34.5 (for part (c), as generalized in Problem 34.7). For part (a), it must be shown that it is possible to take the integral measurable 𝒢.

34.10. (a) If Y = X − E[X‖𝒢₁], then X − E[X‖𝒢₂] = Y − E[Y‖𝒢₂], and E[(Y − E[Y‖𝒢₂])²‖𝒢₂] = E[Y²‖𝒢₂] − E²[Y‖𝒢₂] ≤ E[Y²‖𝒢₂]. Take expected values.

34.11. First prove that
From this and (i) deduce (ii). From (ii) and the preceding equation deduce … The sets A₁ ∩ A₂ form a π-system generating 𝒢₁₂.
34.16. (a) Obviously (34.18) implies (34.17). If (34.17) holds, then clearly (34.18) holds for X simple. For the general X, choose simple X_k such that lim_k X_k = X and |X_k| ≤ |X|. Note that

|∫_{A_n} X dP − α ∫ X dP| ≤ |∫_{A_n} X_k dP − α ∫ X_k dP| + 2E[|X − X_k|];

let n → ∞ and then let k → ∞.
(b) If Ω ∈ 𝒫, then the class of E satisfying (34.17) is a λ-system, and so by the π–λ theorem and part (a), (34.18) holds if X is measurable σ(𝒫). Since A_n ∈ σ(𝒫), it follows that

∫_{A_n} X dP = ∫_{A_n} E[X‖σ(𝒫)] dP → α ∫ E[X‖σ(𝒫)] dP = α ∫ X dP.

(c) Replace X by X dP₀/dP in (34.18).

34.17. (a) The Lindeberg–Lévy theorem. (b) Chebyshev's inequality. (c) Theorem 25.4. (d) Independence of the X_n. (e) Problem 34.16(b). (f) Problem 34.16(c). (g) Part (b) here and the ε-δ definition of absolute continuity. (h) Theorem 25.4 again.

Section 35

35.4. (b) Let S_n be the number of k such that 1 ≤ k ≤ n and Y_k = 1. Then X_n = 3^{S_n}/2ⁿ. Take logarithms and use the strong law of large numbers.
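A simulation sketch of 35.4(b) (the identification X_n = 3^{S_n}/2ⁿ, with S_n the number of successes in n fair coin tosses, is my reading of the garbled text): E[X_n] = 1 for every n, yet X_n → 0 almost surely, since E[log(3^{Y₁}/2)] = ½ log(3/4) < 0.

```python
import math, random

random.seed(4)
n, paths = 2000, 1000
finals = []
for _ in range(paths):
    log_x = 0.0            # track log X_n to avoid underflow
    for _ in range(n):
        log_x += math.log(1.5) if random.random() < 0.5 else math.log(0.5)
    finals.append(log_x)

# every simulated path has collapsed: log X_n is large and negative
print(max(finals))                     # far below 0
print(sum(finals) / paths)             # ≈ n/2 * log(3/4) ≈ -288
```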
35.9. Let K bound |X₁| and the |X_n − X_{n−1}|. Bound |X_τ| by Kτ. Write ∫_{τ≤k} X_τ dP = Σ_{i=1}^k ∫_{τ=i} X_i dP = Σ_{i=1}^k (∫_{τ≥i} X_i dP − ∫_{τ≥i+1} X_i dP). Transform the last integral by the martingale property and reduce the expression to E[X₁] − ∫_{τ>k} X_{k+1} dP. Now

|∫_{τ>k} X_{k+1} dP| ≤ K(k + 1) P[τ > k] ≤ K(k + 1) k⁻¹ ∫_{τ>k} τ dP → 0.
35.13. (a) By the result in Problem 32.9, X₁, X₂, … is a supermartingale. Since E[|X_n|] = E[X_n] ≤ ν(Ω), Theorem 35.5 applies.
(b) If A ∈ ℱ_n, then ∫_A (Y_n + Z_n) dP + σ_n(A) = ν(A) = ∫_A X_n dP + σ_n(A). Since the Lebesgue decomposition is unique (Problem 32.7), Y_n + Z_n = X_n with probability 1. Since X_n and Y_n converge, so does Z_n. If A ∈ ℱ_k and n ≥ k, then ∫_A Z_n dP ≤ σ_∞(A), and by Fatou's lemma, the limit Z satisfies ∫_A Z dP ≤ σ_∞(A). This holds for A in ⋃_k ℱ_k and hence (monotone class theorem) for A in ℱ_∞. Choose A so that P(A) = 1 and σ_∞(A) = 0: E[Z] = ∫_A Z dP ≤ σ_∞(A) = 0. It can happen that σ_n(Ω) = 0 and σ_∞(Ω) = ν(Ω) > 0, in which case σ_n does not converge to σ_∞ and the X_n cannot be integrated to the limit.
35.17. For a very general result, see J. L. Doob: Application of the theory of martingales, Le Calcul des Probabilités et ses Applications (Colloques Internationaux du Centre de la Recherche Scientifique, Paris, 1949).

Section 36

36.5. (b) Show by part (a) and Problem 34.18 that f_n is the conditional expected value of f with respect to the σ-field ℱ_{n+1} generated by the coordinates x_{n+1}, x_{n+2}, …. By Theorem 35.9, (36.30) will follow if each set in ⋂_n ℱ_n has π-measure either 0 or 1, and here the zero–one law applies.
(c) Show that g_n is the conditional expected value of f with respect to the σ-field generated by the coordinates x₁, …, x_n, and apply Theorem 35.6.

36.7. Let ℒ be the countable set of simple functions Σ_i a_i I_{A_i} for a_i rational and {A_i} a finite decomposition of the unit interval into subintervals with rational endpoints. Suppose that the X_t exist, and choose (Theorem 17.1) Y_t in ℒ so that E[|X_t − Y_t|] is small. From the lower bound on E[|X_s − X_t|], conclude that E[|Y_s − Y_t|] > 0 for s ≠ t. But there are only countably many of the Y_t. It does no good to replace Lebesgue measure by some other measure on the unit interval.

Section 37

37.1. If t₁, …, t_k are in increasing order and t₀ = 0, then

Σ_{i,j} K(t_i, t_j) x_i x_j = Σ_{i,j} x_i x_j Σ_{l≤min(i,j)} (t_l − t_{l−1}) = Σ_l (t_l − t_{l−1}) (Σ_{i≥l} x_i)² ≥ 0.
37.4. (a) Use Problem 36.6(b). (b) Let [W_t: t ≥ 0] be a Brownian motion on (Ω, ℱ, P₀), where W(·, ω) ∈ C for every ω. Define ζ: Ω → R^T by Z_t(ζ(ω)) = W_t(ω). Show that ζ is measurable ℱ/ℛ^T and P = P₀ζ⁻¹. If C ⊂ A ∈ ℛ^T, then P(A) = P₀(ζ⁻¹A) = P₀(Ω) = 1.
37.5. Consider W(1) = Σ_{k=1}^n (W(k/n) − W((k − 1)/n)) for notational convenience. Since

n ∫_{[|W(1/n)| ≥ ε]} W²(1/n) dP → 0,

the Lindeberg theorem applies.
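The same dyadic increments illustrate the quadratic variation of Brownian motion (my own sketch, not from the text): Σ_k (W(k/n) − W((k−1)/n))² concentrates at 1 as n grows.

```python
import random

random.seed(5)
n = 100000
# independent N(0, 1/n) increments of Brownian motion on [0, 1]
qv = sum(random.gauss(0.0, (1.0 / n) ** 0.5) ** 2 for _ in range(n))
print(qv)   # ≈ 1, with fluctuations of order sqrt(2/n)
```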
37.14. By symmetry,

p(s, t) = 2P[W_s > 0, inf_{s≤u≤t} (W_u − W_s) ≤ −W_s];

W_s and the infimum here are independent because of the Markov property, and so by (20.30) (and symmetry again)

p(s, t) = 2 ∫₀^∞ P[τ_x ≤ t − s] (2πs)^{−1/2} e^{−x²/2s} dx.

Reverse the order of integration, use ∫₀^∞ x e^{−x²t/2} dx = 1/t, and put v = (s/(s + u))^{1/2}:

p(s, t) = (1/π) ∫₀^{t−s} s^{1/2}/((u + s) u^{1/2}) du = (2/π) ∫_{√(s/t)}^1 dv/√(1 − v²) = (2/π) arccos √(s/t).
Bibliography
HALMOS and SAKS have been the strongest measure-theoretic and DOOB and FELLER the strongest probabilistic influences on this book, and the spirit of KAC's small volume has been very important.

AUBREY: Brief Lives, John Aubrey; ed. O. L. Dick. Secker and Warburg, London, 1949.
BAHADUR: Some Limit Theorems in Statistics, R. R. Bahadur. SIAM, Philadelphia, 1971.
BANACH: Théorie des Opérations Linéaires, S. Banach. Monografje Matematyczne, Warsaw, 1932.
BERGER: Statistical Decision Theory, 2nd ed., James O. Berger. Springer-Verlag, New York, 1985.
BHATTACHARYA & WAYMIRE: Stochastic Processes with Applications, Rabi N. Bhattacharya and Edward C. Waymire. Wiley, New York, 1990.
BILLINGSLEY 1: Convergence of Probability Measures, Patrick Billingsley. Wiley, New York, 1968.
BILLINGSLEY 2: Weak Convergence of Measures: Applications in Probability, Patrick Billingsley. SIAM, Philadelphia, 1971.
BIRKHOFF & MAC LANE: A Survey of Modern Algebra,
4th ed., Garrett Birkhoff and Saunders Mac Lane. Macmillan, New York, 1977.

List of Symbols

A ∪ B, 536; A ⊂ B, 536; ℱ-set, 20; ℱ₀, 20; σ(𝒜), 21; 𝒥, 22; ℬ, 22; (Ω, ℱ, P), 23; A_n ↑ A, 536; A_n ↓ A, 536; λ, 25, 43, 168; ∧, 537; ∨, 537; f(𝒜), 33; P(A), 35; D(A), 35; P*, 37, 47; P_*, 37, 47; 𝒮, 27, 41; 𝒢, 41; λ*, 44;
P(B|A), 51; lim sup_n A_n, 52; lim inf_n A_n, 52; lim_n A_n, 52; i.o., 53; 𝒯, 62, 287; R¹, 537; Rᵏ, 539; [X = x], 67; σ(X), 68, 255; μ, 73, 160, 256; E[X], 76, 273; Var[X], 78, 275; E_n[f], 87; s_n(a), 93; τ, 99, 133, 464, 508;
p_{ij}, 111; S, 111; α_i, 111; π_i, 124; M(t), 146, 278, 285; ℛᵏ, 158; 𝒜_n, 𝒜₀, 159; x_k ↑ x, 160, 537; x = Σ_k x_k, 160; μ*, 165; ℳ(μ*), 165; λ₁, 168; λ_k, 171; F, 175, 177; Δ_A F, 176; T⁻¹A′, 537; ℱ/ℱ′, 182; F_n ⇒ F, 191, 327, 378; dF(x), 228; X × Y, 231; 𝒳 × 𝒴, 231; μ × ν, 233;
*, 266
Xₙ →_P X, 70, 268, 330
‖f‖, 249
‖f‖_p, 241
Lᵖ, 241
μₙ ⇒ μ, 327, 378
Xₙ ⇒ X, 329, 378
Xₙ ⇒ a, 331
∂A, 538
φ(t), 342
μₙ →_v μ, 371
F_s, 414
F_ac, 414
ν ≪ μ, 422
dν/dμ, 423
ν_s, 424
ν_ac, 424
P[A‖𝒢], 428, 430
P[A‖X_t, t ∈ T], 433
E[X‖𝒢], 445
R^T, 484
ℛ^T, 485
W(t), 498
Index
Here An refers to paragraph n of the Appendix (p. 536); u.v refers to Problem v in Section u, or else to a note on it (Notes on the Problems, p. 552); the other references are to pages. Greek letters are alphabetized by their Roman equivalents (m for μ, and so on). Names in the bibliography are not indexed separately.

Absolute continuity, 413, 422
Absolutely continuous part, 425
Absolute moment, 274
Absorbing state, 112
Adapted σ-fields, 458
Additive set function, 420
Additivity: countable, 23, 161; finite, 23, 161
Admissible, 248, 252
Affine transformation, 172
Algebra, 19
Almost everywhere, 60
Almost surely, 60
α-mixing, 363, 29.10
Aperiodic, 125
Approximation of measure, 168
Area over the curve, 79
Area under the curve, 203
Asymptotic equipartition property, 91, 144
Asymptotic relative frequency, 8
Atom, 271
Autoregression, 495
Axiom of choice, 21, 45
Baire category, 1.10, A15
Baire function, 13.7
Banach limits, 3.8, 19.3
Banach space, 243
Banach-Tarski paradox, 180
Bayes estimation, 475
Bayes risk, 248, 251
Benford law, 25.3
Beppo Levi theorem, 16.3
Bernoulli-Laplace model of diffusion, 112
Bernoulli shift, 311
Bernoulli trials, 75
Bernstein polynomial, 87
Betting system, 98
Binary digit, 3
Binomial distribution, 256
Blackwell-Rao theorem, 455
Bold play, 102
Boole's inequality, 25
Borel, 9
Borel-Cantelli lemmas, 59, 60
Borel function, 183
Borel normal number theorem, 9
Borel paradox, 441
Borel set, 22, 158
Boundary, A11
Bounded convergence theorem, 210
Bounded variation, 415
Branching process, 461
Britannica, 552
Brownian motion, 498
Burstin's theorem, 22.14
Canonical measure, 372
Canonical representation, 372
Cantelli inequality, 5.5
Cantelli theorem, 6.6
Cantor function, 31.2, 31.15
Cantor set, 1.5
Cardinality of σ-fields, 2.12, 2.22
Cartesian product, 231
Category, A15, 1.10
Cauchy distribution, 20.14, 348
Cauchy equation, A20, 14.7
Cavalieri principle, 18.8
Central limit theorem, 291, 357, 385, 391, 34.17, 475
Cesàro averages, A30, 20.23
Change of variable, 215, 224, 225, 274
Characteristic function, 342
Chebyshev inequality, 5, 80, 276
Chernoff theorem, 151
Chi-squared distribution, 20.15
Chi-squared statistic, 29.8
Circular Lebesgue measure, 13.12, 313
Class of sets, 18
Closed set, A11
Closed set of states, 8.21
Closed support, 12.9
Closure, A11
Cocountable set, 21
Cofinite set, 20
Collective, 109
Compact, A13
Complement, A1
Completely normal number, 6.13
Complete space or measure, 44, 10.5
Completion, 3.10, 10.5
Complex functions, integration of, 218
Compound Poisson distribution, 28.3
Compound Poisson process, 32.7
Concentrated, 161
Conditional distribution, 439, 449
Conditional expected value, 133, 445
Conditional probability, 51, 427, 33.5
Congruent by dissection, 179
Conjugate index, 242
Conjugate space, 244
Consistency conditions for finite-dimensional distributions, 483
Content, 3.15
Continued-fraction transformation, 319, A36
Continuity from above, 25
Continuity of paths, 500
Continuum hypothesis, 46
Conventions involving ∞, 160
Convergence in distribution, 329, 378
Convergence in mean, 243
Convergence in measure, 268
Convergence in probability, 70, 268, 330
Convergence with probability 1, 70, 330
Convergence of random series, 289
Convergence of types, 193
Convex functions, A32
Convolution, 266
Coordinate function, 27, 484
Coordinate variable, 484
Countable, 8
Countable additivity, 23, 161
Countable subadditivity, 25, 162
Countably generated σ-field, 2.11
Countably infinite, 8
Counting measure, 161
Coupled chain, 126
Coupon problem, 362
Covariance, 277
Cover, A3
Cramér-Wold theorem, 383
Cylinder, 27, 485
Daniell-Stone theorem, 11.14, 16.12
Darboux-Young definition, 15.2
δ-distribution, 192
Decision theory, 247
Decomposition, A3
de Finetti theorem, 473
Definite integral, 200
Degenerate distribution function, 193
Delta method, 359
DeMoivre-Laplace theorem, 25.11, 358
DeMorgan law, A6
Dense, A15
Density of measure, 213, 422
Density point, 31.9
Density of random variable or distribution, 257, 260
Density of set of integers, 2.18
Denumerable probabilities, 51
Dependent random variables, 363
Derivatives of integrals, 402
Diagonal method, 29, A14
Difference equation, A19
Difference set, A1
Diophantine approximation, 13, 324
Dirichlet theorem, 13, A26
Discontinuity of the first kind, 534
Discrete measure, 23, 161
Discrete random variable, 256
Discrete space, 1.1, 23, 5.16
Disjoint, A3
Disjoint supports, 410, 421
Distribution: of random variable, 73, 187, 256; of random vector, 259
Distribution function, 175, 188, 256, 259, 409
Dominated convergence theorem, 78, 209
Dominated measure, 422
Double exponential distribution, 348
Double integral, 233
Double series, A27
Doubly stochastic matrix, 8.20
Dual space, 245
Dubins-Savage theorem, 102
Dyadic expansion, 3, A31
Dyadic interval, 4
Dyadic transformation, 313
Dynkin's π-λ theorem, 42
ε-δ definition of absolute continuity, 422
Egorov theorem, 13.9
Eigenvalues, 8.26
Empirical distribution function, 268
Empty set, A1
Entropy, 57, 6.14, 8.31, 31.17
Equicontinuous, 355
Equivalence class, 58
Equivalent measures, 422
Erdős-Kac central limit theorem, 395
Ergodic theorem, 314
Erlang density, 23.2
Essential supremum, 241
Estimation, 251, 452
Etemadi, 282, 288, 22.15
Euclidean distance, A16
Euclidean space, A1, A16
Euler function, 2.18
Event, 18
Excessive function, 134
Exchangeable, 473
Existence of independent sequences, 73, 265
Existence of Markov chains, 115
Expected value, 76, 273
Exponential convergence, 131, 8.18
Exponential distribution, 189, 258, 297, 348
Extension of measure, 36, 166, 11.1
Extremal distribution, 195
Factorization and sufficiency, 450
Fair game, 92, 463
Fatou lemma, 209
Field, 19, 2.5
Filtration, 458
Finite additivity, 20, 23, 2.15, 3.8, 161
Finite or countable, 8
Finite-dimensional distributions, 308, 482
Finite-dimensional sets, 485
Finitely additive field, 20
Finite subadditivity, 24, 162
First Borel-Cantelli lemma, 59
First category, 1.10, A15
First passage, 118
Fixed discontinuity, 303
Fourier representation, 250
Fourier series, 351, 26.30
Fourier transform, 342
Frequency, 8
Fubini theorem, 233
Functional central limit theorem, 522
Fundamental in probability, 20.21
Fundamental set, 320
Fundamental theorem of calculus, 224, 400
Fundamental theorem of Diophantine approximation, 324
Gambling policy, 98
Gamma distribution, 20.17
Gamma function, 18.18
Generated σ-field, 21
Glivenko-Cantelli theorem, 269
Goncharov's theorem, 361
Hahn decomposition, 420
Hamel basis, 14.7
Hardy-Ramanujan theorem, 6.16
Heine-Borel theorem, A13, A17
Hewitt-Savage zero-one law, 496
Hilbert space, 249
Hitting time, 136
Hölder's inequality, 80, 5.9, 242, 276
Hypothesis testing, 151
Identically distributed, 85
Inadequacy of ℛ^T, 492
Inclusion-exclusion formula, 24, 163
Indefinite integral, 400
Independent classes, 55
Independent events, 53
Independent increments, 299, 498
Independent random variables, 71, 261
Independent random vectors, 263
Indicator, A5
Infinitely divisible distributions, 371
Infinitely often, 53
Infinite series, A25
Information, 57
Initial digit problem, 25.3
Initial probabilities, 111
Inner boundary, 64
Inner measure, 37, 3.2
Integrable, 200, 206
Integral, 199
Integral with respect to Lebesgue measure, 221
Integrals of derivatives, 412
Integration by parts, 236
Integration over sets, 212
Integration with respect to a density, 214
Interior, A11
Interval, A9
Invariance principle, 520
Invariant set, 313
Inverse image, A7, 182
Inversion formula, 346
Irreducible chain, 119
Irregular paths, 504
Iterated integral, 233
Jacobian, 225, 261, 545
Jensen inequality, 80, 276, 449
Jordan decomposition, 421
Jordan measurable, 3.15
k-dimensional Borel set, 158
k-dimensional Lebesgue measure, 171, 177, 17.14, 20.4
Kolmogorov existence theorem, 483
Kolmogorov zero-one law, 63, 287
Landau notation, A18
Laplace distribution, 348
Laplace transform, 285
Large deviations, 148
Lattice distribution, 26.1
Law of the iterated logarithm, 153
Law of large numbers: strong, 9, 11, 85, 282; weak, 5, 11, 86, 284
Lebesgue decomposition, 414, 425
Lebesgue density theorem, 31.9, 35.15
Lebesgue function, 31.3
Lebesgue integrable, 221, 225
Lebesgue measure, 25, 43, 167, 171, 177
Lebesgue set, 45
Leibniz formula, 17.8
Lévy distance, 14.5, 25.4, 26.16
Likelihood ratio, 461, 471
Limit inferior, 52, 4.1
Limit of sets, 52
Lindeberg condition, 359
Lindeberg-Lévy theorem, 357
Linear Borel set, 158
Linear functional, 244
Linearity of expected value, 77
Linearity of the integral, 206
Linearly independent reals, 14.7, 30.8
Lipschitz condition, 418
Log-normal distribution, 388
Lower integral, 204, 228
Lower semicontinuous, 29.1
Lower variation, 421
Lᵖ-space, 241
λ-system, 41
Lusin theorem, 17.10
Lyapounov condition, 362
Lyapounov inequality, 81, 277
Mapping theorem, 344, 380
Marginal distribution, 261
Markov chain, 111, 363, 367, 29.11, 429
Markov inequality, 80, 276
Markov process, 435, 510
Markov shift, 312
Markov time, 133
Martingale, 101, 458, 514
Martingale central limit theorem, 475
Martingale convergence theorem, 468
Maximal ergodic theorem, 317
Maximal inequality, 287
Maximal solution, 122
μ-continuity set, 335, 378
m-dependent, 6.11, 364
Mean value, 26.17
Measurable mapping, 182
Measurable process, 503
Measurable rectangle, 231
Measurable with respect to a σ-field, 68, 225
Measurable set, 20, 38, 165
Measurable space, 161
Measure, 22, 160
Measure-preserving transformation, 311
Measure space, 161
Meets, A3
Method of moments, 388, 30.6
Minimal sufficient field, 454
Minimum-variance estimation, 454
Minkowski inequality, 5.10, 242
Mixing, 24.3, 363, 29.10, 34.16
Mixture, 473
μ*-measurable, 165
Moment, 274
Moment generating function, 1.6, 146, 278, 284, 390
Monotone, 24, 162, 206
Monotone class, 43, 3.12
Monotone class theorem, 43
Monotone convergence theorem, 208
M-test, 210, A28
Multidimensional central limit theorem, 385
Multidimensional characteristic function, 381
Multidimensional distribution, 259
Multidimensional normal distribution, 383
Multinomial sampling, 29.8
Negative part, 200, 254
Negligible set, 8, 1.3, 1.9, 44
Neyman-Pearson lemma, 19.7
Nonatomic, 2.19
Nondenumerable probabilities, 526
Nonmeasurable set, 45, 12.4
Nonnegative series, A25
Norm, 243
Normal distribution, 258, 383
Normal number, 8, 18, 86, 6.13
Normal number theorem, 9, 6.9
Nowhere dense, A15
Nowhere differentiable, 31.18, 505
Null persistent, 130
Number theory, 393
Open set, A11
Optimal stopping, 133
Optional sampling theorem, 466
Order of dyadic interval, 4
Orthogonal projection, 250
Orthonormal, 249
Ottaviani inequality, 22.15
Outer boundary, 64
Outer measure, 37, 3.2, 165
Pairwise disjoint, A3
Partial-fraction expansion, 20.14
Partial information, 57
Partition, A3
Path function, 308, 493, 500
Payoff function, 133
Peano curve, 179
Perfect set, A15
Period, 125
Permut