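0. Let $\langle E_n\rangle_{n\in\mathbb N}$, $\langle F_n\rangle_{n\in\mathbb N}$ be sequences in $\Sigma$, $\mathrm T$ respectively such that $A\subseteq\bigcup_{n\in\mathbb N}E_n\times F_n$ and $\sum_{n=0}^\infty\mu E_n\cdot\nu F_n\le\theta A+\epsilon$. Now, for each $n$, the product of the measures $\mu E_n$, $\nu F_n$ is finite, so either one is zero or both are finite. If $\mu E_n=0$ or $\nu F_n=0$ then of course $\mu E_n\cdot\nu F_n=0=\theta((E_n\times F_n)\cap H)+\theta((E_n\times F_n)\setminus H)$. If $\mu E_n<\infty$ and $\nu F_n<\infty$ then
$$\mu E_n\cdot\nu F_n=\lambda_0(E_n\times F_n)=\lambda_0((E_n\times F_n)\cap H)+\lambda_0((E_n\times F_n)\setminus H)=\theta((E_n\times F_n)\cap H)+\theta((E_n\times F_n)\setminus H).$$
Accordingly, because $\theta$ is an outer measure,
$$\theta(A\cap H)+\theta(A\setminus H)\le\sum_{n=0}^\infty\theta((E_n\times F_n)\cap H)+\sum_{n=0}^\infty\theta((E_n\times F_n)\setminus H)=\sum_{n=0}^\infty\mu E_n\cdot\nu F_n\le\theta A+\epsilon.$$
As $\epsilon$ is arbitrary, $\theta(A\cap H)+\theta(A\setminus H)\le\theta A$. As $A$ is arbitrary, $H\in\Lambda$.

251I Now for the fundamental properties of the c.l.d. product measure.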
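Theorem Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be measure spaces; let $\lambda$ be the c.l.d. product measure on $X\times Y$, and $\Lambda$ its domain. Then
(a) $\Sigma\widehat\otimes\mathrm T\subseteq\Lambda$ and $\lambda(E\times F)=\mu E\cdot\nu F$ whenever $E\in\Sigma$, $F\in\mathrm T$ and $\mu E\cdot\nu F<\infty$;
(b) for every $W\in\Lambda$ there is a $V\in\Sigma\widehat\otimes\mathrm T$ such that $V\subseteq W$ and $\lambda V=\lambda W$;
(c) $(X\times Y,\Lambda,\lambda)$ is complete and locally determined, and in fact is the c.l.d. version of $(X\times Y,\Lambda,\lambda_0)$ as described in 213D-213E; in particular, $\lambda W=\lambda_0W$ whenever $\lambda_0W<\infty$;
(d) if $W\in\Lambda$ and $\lambda W>0$ then there are $E\in\Sigma$, $F\in\mathrm T$ such that $\mu E<\infty$, $\nu F<\infty$ and $\lambda(W\cap(E\times F))>0$;
(e) if $W\in\Lambda$ and $\lambda W<\infty$, then for every $\epsilon>0$ there are $E_0,\dots,E_n\in\Sigma$, $F_0,\dots,F_n\in\mathrm T$, all of finite measure, such that $\lambda\bigl(W\mathbin{\triangle}\bigcup_{i\le n}(E_i\times F_i)\bigr)\le\epsilon$.
proof Take $\theta$ to be the outer measure of 251A and $\lambda_0$ the primitive product measure of 251C. Set $\Sigma^f=\{E:E\in\Sigma,\ \mu E<\infty\}$ and $\mathrm T^f=\{F:F\in\mathrm T,\ \nu F<\infty\}$.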
251I
Finite products
203
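(a) By 251E, $\Sigma\widehat\otimes\mathrm T\subseteq\Lambda$. If $E\in\Sigma$ and $F\in\mathrm T$ and $\mu E\cdot\nu F<\infty$, either $\mu E\cdot\nu F=0$ and $\lambda(E\times F)=\lambda_0(E\times F)=0$ or both $\mu E$ and $\nu F$ are finite and again $\lambda(E\times F)=\lambda_0(E\times F)=\mu E\cdot\nu F$.
(b)(i) Take any $a<\lambda W$. Then there are $E\in\Sigma^f$, $F\in\mathrm T^f$ such that $\lambda_0(W\cap(E\times F))>a$ (251H); now
$$\theta((E\times F)\setminus W)=\lambda_0((E\times F)\setminus W)=\lambda_0(E\times F)-\lambda_0(W\cap(E\times F))<\lambda_0(E\times F)-a.$$
Let $\langle E_n\rangle_{n\in\mathbb N}$, $\langle F_n\rangle_{n\in\mathbb N}$ be sequences in $\Sigma$, $\mathrm T$ respectively such that $(E\times F)\setminus W\subseteq\bigcup_{n\in\mathbb N}E_n\times F_n$ and $\sum_{n=0}^\infty\mu E_n\cdot\nu F_n\le\lambda_0(E\times F)-a$. Consider
$$V=(E\times F)\setminus\bigcup_{n\in\mathbb N}E_n\times F_n\in\Sigma\widehat\otimes\mathrm T;$$
then $V\subseteq W$, and
$$\begin{aligned}\lambda V=\lambda_0V&=\lambda_0(E\times F)-\lambda_0((E\times F)\setminus V)\\&\ge\lambda_0(E\times F)-\lambda_0\Bigl(\bigcup_{n\in\mathbb N}E_n\times F_n\Bigr)\quad\Bigl(\text{because }(E\times F)\setminus V\subseteq\bigcup_{n\in\mathbb N}E_n\times F_n\Bigr)\\&\ge\lambda_0(E\times F)-\sum_{n=0}^\infty\mu E_n\cdot\nu F_n\ge a\end{aligned}$$
(by the choice of the $E_n$, $F_n$).
(ii) Thus for every $a<\lambda W$ there is a $V\in\Sigma\widehat\otimes\mathrm T$ such that $V\subseteq W$ and $\lambda V\ge a$. Now choose a sequence $\langle a_n\rangle_{n\in\mathbb N}$ strictly increasing to $\lambda W$, and for each $a_n$ a corresponding $V_n$; then $V=\bigcup_{n\in\mathbb N}V_n$ belongs to the σ-algebra $\Sigma\widehat\otimes\mathrm T$, is included in $W$, and has measure at least $\sup_{n\in\mathbb N}\lambda V_n$ and at most $\lambda W$; so $\lambda V=\lambda W$, as required.
(c)(i) If $H\subseteq X\times Y$ is $\lambda$-negligible, there is a $W\in\Lambda$ such that $H\subseteq W$ and $\lambda W=0$. If $E\in\Sigma$, $F\in\mathrm T$ are of finite measure, $\lambda_0(W\cap(E\times F))=0$; but $\lambda_0$, being derived from the outer measure $\theta$ by Carathéodory's method, is complete (212A), so $H\cap(E\times F)\in\Lambda$ and $\lambda_0(H\cap(E\times F))=0$. Because $E$ and $F$ are arbitrary, $H\in\Lambda$, by 251H. As $H$ is arbitrary, $\lambda$ is complete.
(ii) If $W\in\Lambda$ and $\lambda W=\infty$, then there must be $E\in\Sigma$, $F\in\mathrm T$ such that $\mu E<\infty$, $\nu F<\infty$ and $\lambda_0(W\cap(E\times F))>0$; now $0<\lambda(W\cap(E\times F))\le\mu E\cdot\nu F<\infty$. Thus $\lambda$ is semi-finite.
(iii) If $H\subseteq X\times Y$ and $H\cap W\in\Lambda$ whenever $\lambda W<\infty$, then, in particular, $H\cap(E\times F)\in\Lambda$ whenever $\mu E<\infty$ and $\nu F<\infty$; by 251H again, $H\in\Lambda$. Thus $\lambda$ is locally determined.
(iv) If $W\in\Lambda$ and $\lambda_0W<\infty$, then we have sequences $\langle E_n\rangle_{n\in\mathbb N}$ in $\Sigma$, $\langle F_n\rangle_{n\in\mathbb N}$ in $\mathrm T$ such that $W\subseteq\bigcup_{n\in\mathbb N}E_n\times F_n$ and $\sum_{n=0}^\infty\mu E_n\cdot\nu F_n<\infty$. Set
$$I=\{n:\mu E_n=\infty\},\quad J=\{n:\nu F_n=\infty\},\quad K=\mathbb N\setminus(I\cup J);$$
then $\nu\bigl(\bigcup_{n\in I}F_n\bigr)=\mu\bigl(\bigcup_{n\in J}E_n\bigr)=0$, so $\lambda_0(W\setminus W')=0$, where
$$W'=W\cap\bigcup_{n\in K}(E_n\times F_n)\supseteq W\setminus\Bigl(\bigl(\bigcup_{n\in J}E_n\times Y\bigr)\cup\bigl(X\times\bigcup_{n\in I}F_n\bigr)\Bigr).$$
Now set $E'_n=\bigcup_{i\in K,i\le n}E_i$, $F'_n=\bigcup_{i\in K,i\le n}F_i$ for each $n$. We have $W'=\bigcup_{n\in\mathbb N}W'\cap(E'_n\times F'_n)$, so
$$\lambda W\le\lambda_0W=\lambda_0W'=\lim_{n\to\infty}\lambda_0(W'\cap(E'_n\times F'_n))\le\lambda W'\le\lambda W,$$
and $\lambda W=\lambda_0W$.
(v) Following the terminology of 213D, let us write
$$\tilde\Lambda=\{W:W\subseteq X\times Y,\ W\cap V\in\Lambda\text{ whenever }V\in\Lambda,\ \lambda_0V<\infty\},$$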
204
Product measures
251I
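$$\tilde\lambda W=\sup\{\lambda_0(W\cap V):V\in\Lambda,\ \lambda_0V<\infty\}\quad\text{for }W\in\tilde\Lambda.$$
Because $\lambda_0(E\times F)<\infty$ whenever $\mu E<\infty$ and $\nu F<\infty$, $\tilde\Lambda\subseteq\Lambda$ by 251H; as the reverse inclusion is trivial, $\tilde\Lambda=\Lambda$. Now for any $W\in\Lambda$ we have
$$\begin{aligned}\tilde\lambda W&=\sup\{\lambda_0(W\cap V):V\in\Lambda,\ \lambda_0V<\infty\}\\&\ge\sup\{\lambda_0(W\cap(E\times F)):E\in\Sigma^f,\ F\in\mathrm T^f\}=\lambda W\\&\ge\sup\{\lambda(W\cap V):V\in\Lambda,\ \lambda_0V<\infty\}\\&=\sup\{\lambda_0(W\cap V):V\in\Lambda,\ \lambda_0V<\infty\}=\tilde\lambda W,\end{aligned}$$
using (iv) just above, so that $\lambda=\tilde\lambda$ is the c.l.d. version of $\lambda_0$.
(d) If $W\in\Lambda$ and $\lambda W>0$, there are $E\in\Sigma^f$ and $F\in\mathrm T^f$ such that $\lambda(W\cap(E\times F))=\lambda_0(W\cap(E\times F))>0$.
(e) There are $E\in\Sigma^f$, $F\in\mathrm T^f$ such that $\lambda_0(W\cap(E\times F))\ge\lambda W-\frac13\epsilon$; set $V_1=W\cap(E\times F)$; then
$$\lambda(W\setminus V_1)=\lambda W-\lambda V_1=\lambda W-\lambda_0V_1\le\tfrac13\epsilon.$$
There are sequences $\langle E'_n\rangle_{n\in\mathbb N}$ in $\Sigma$, $\langle F'_n\rangle_{n\in\mathbb N}$ in $\mathrm T$ such that $V_1\subseteq\bigcup_{n\in\mathbb N}E'_n\times F'_n$ and $\sum_{n=0}^\infty\mu E'_n\cdot\nu F'_n\le\lambda_0V_1+\frac13\epsilon$. Replacing $E'_n$, $F'_n$ by $E'_n\cap E$, $F'_n\cap F$ if necessary, we may suppose that $E'_n\in\Sigma^f$, $F'_n\in\mathrm T^f$ for every $n$. Set $V_2=\bigcup_{n\in\mathbb N}E'_n\times F'_n$; then
$$\lambda(V_2\setminus V_1)\le\lambda_0(V_2\setminus V_1)\le\sum_{n=0}^\infty\mu E'_n\cdot\nu F'_n-\lambda_0V_1\le\tfrac13\epsilon.$$
Let $m\in\mathbb N$ be such that $\sum_{n=m+1}^\infty\mu E'_n\cdot\nu F'_n\le\frac13\epsilon$, and set $V=\bigcup_{n=0}^mE'_n\times F'_n$. Then
$$\lambda(V_2\setminus V)\le\sum_{n=m+1}^\infty\mu E'_n\cdot\nu F'_n\le\tfrac13\epsilon.$$
Putting these together, we have $W\mathbin{\triangle}V\subseteq(W\setminus V_1)\cup(V_2\setminus V_1)\cup(V_2\setminus V)$, so
$$\lambda(W\mathbin{\triangle}V)\le\lambda(W\setminus V_1)+\lambda(V_2\setminus V_1)+\lambda(V_2\setminus V)\le\tfrac13\epsilon+\tfrac13\epsilon+\tfrac13\epsilon=\epsilon.$$
And $V$ is of the required form.

251J Proposition If $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ are semi-finite measure spaces and $\lambda$ is the c.l.d. product measure on $X\times Y$, then $\lambda(E\times F)=\mu E\cdot\nu F$ for all $E\in\Sigma$, $F\in\mathrm T$.
proof Setting $\Sigma^f=\{E:E\in\Sigma,\ \mu E<\infty\}$, $\mathrm T^f=\{F:F\in\mathrm T,\ \nu F<\infty\}$, we have
$$\begin{aligned}\lambda(E\times F)&=\sup\{\lambda_0((E\cap E_0)\times(F\cap F_0)):E_0\in\Sigma^f,\ F_0\in\mathrm T^f\}\\&=\sup\{\mu(E\cap E_0)\cdot\nu(F\cap F_0):E_0\in\Sigma^f,\ F_0\in\mathrm T^f\}\\&=\sup\{\mu(E\cap E_0):E_0\in\Sigma^f\}\cdot\sup\{\nu(F\cap F_0):F_0\in\mathrm T^f\}=\mu E\cdot\nu F\end{aligned}$$
(using 213A).

251K σ-finite spaces Of course most of the measure spaces we shall apply these results to are σ-finite, and in this case there are some useful simplifications.
Proposition Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be σ-finite measure spaces. Then the c.l.d. product measure on $X\times Y$ is equal to the primitive product measure, and is the completion of its restriction to $\Sigma\widehat\otimes\mathrm T$; moreover, this common product measure is σ-finite.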
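proof Write $\lambda_0$, $\lambda$ for the primitive and c.l.d. product measures, as usual, and $\Lambda$ for their domain. Let $\langle E_n\rangle_{n\in\mathbb N}$, $\langle F_n\rangle_{n\in\mathbb N}$ be non-decreasing sequences of sets of finite measure covering $X$, $Y$ respectively (see 211D).
(a) For each $n\in\mathbb N$, $\lambda(E_n\times F_n)=\mu E_n\cdot\nu F_n$ is finite, by 251Ia. Since $X\times Y=\bigcup_{n\in\mathbb N}E_n\times F_n$, $\lambda$ is σ-finite.
(b) For any $W\in\Lambda$,
$$\lambda_0W=\lim_{n\to\infty}\lambda_0(W\cap(E_n\times F_n))=\lim_{n\to\infty}\lambda(W\cap(E_n\times F_n))=\lambda W.$$
So $\lambda=\lambda_0$.
(c) Write $\lambda_B$ for the restriction of $\lambda=\lambda_0$ to $\Sigma\widehat\otimes\mathrm T$, and $\hat\lambda_B$ for its completion.
(i) Suppose that $W\in\operatorname{dom}\hat\lambda_B$. Then there are $W',W''\in\Sigma\widehat\otimes\mathrm T$ such that $W'\subseteq W\subseteq W''$ and $\lambda_B(W''\setminus W')=0$ (212C). In this case, $\lambda(W\setminus W')=0$; as $\lambda$ is complete, $W\in\Lambda$ and
$$\lambda W=\lambda W'=\lambda_BW'=\hat\lambda_BW.$$
Thus $\lambda$ extends $\hat\lambda_B$.
(ii) If $W\in\Lambda$, then there is a $V\in\Sigma\widehat\otimes\mathrm T$ such that $V\subseteq W$ and $\lambda(W\setminus V)=0$. ▶ For each $n\in\mathbb N$ there is a $V_n\in\Sigma\widehat\otimes\mathrm T$ such that $V_n\subseteq W\cap(E_n\times F_n)$ and $\lambda V_n=\lambda(W\cap(E_n\times F_n))$ (251Ib). But as $\lambda(E_n\times F_n)=\mu E_n\cdot\nu F_n$ is finite, this means that $\lambda((W\cap(E_n\times F_n))\setminus V_n)=0$. So if we set $V=\bigcup_{n\in\mathbb N}V_n$, we shall have $V\in\Sigma\widehat\otimes\mathrm T$, $V\subseteq W$ and
$$W\setminus V=\bigcup_{n\in\mathbb N}(W\cap(E_n\times F_n))\setminus V\subseteq\bigcup_{n\in\mathbb N}(W\cap(E_n\times F_n))\setminus V_n$$
is $\lambda$-negligible. ◀
Similarly, there is a $V'\in\Sigma\widehat\otimes\mathrm T$ such that $V'\subseteq(X\times Y)\setminus W$ and $\lambda(((X\times Y)\setminus W)\setminus V')=0$. Setting $V''=(X\times Y)\setminus V'$, $V''\in\Sigma\widehat\otimes\mathrm T$, $W\subseteq V''$ and $\lambda(V''\setminus W)=0$. So
$$\lambda_B(V''\setminus V)=\lambda(V''\setminus V)=\lambda(V''\setminus W)+\lambda(W\setminus V)=0,$$
and $W$ is measured by $\hat\lambda_B$, with $\hat\lambda_BW=\lambda_BV=\lambda W$. As $W$ is arbitrary, $\hat\lambda_B=\lambda$.

251L It is time that I gave some examples. Of course the central example is Lebesgue measure. In this case we have the only reasonable result. I pause to describe the leading example of the product $\Sigma\widehat\otimes\mathrm T$ introduced in 251D.
Proposition Let $r,s\ge1$ be integers. Then we have a natural bijection $\phi:\mathbb R^r\times\mathbb R^s\to\mathbb R^{r+s}$, defined by setting
$$\phi((\xi_1,\dots,\xi_r),(\eta_1,\dots,\eta_s))=(\xi_1,\dots,\xi_r,\eta_1,\dots,\eta_s)$$
for $\xi_1,\dots,\xi_r,\eta_1,\dots,\eta_s\in\mathbb R$.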
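If we write $\mathcal B_r$, $\mathcal B_s$ and $\mathcal B_{r+s}$ for the Borel σ-algebras of $\mathbb R^r$, $\mathbb R^s$ and $\mathbb R^{r+s}$ respectively, then $\phi$ identifies $\mathcal B_{r+s}$ with $\mathcal B_r\widehat\otimes\mathcal B_s$.
proof (a) Write $\mathcal B$ for the σ-algebra $\{\phi^{-1}[W]:W\in\mathcal B_{r+s}\}$ copied onto $\mathbb R^r\times\mathbb R^s$ by the bijection $\phi$; we are seeking to prove that $\mathcal B=\mathcal B_r\widehat\otimes\mathcal B_s$. We have maps $\pi_1:\mathbb R^{r+s}\to\mathbb R^r$, $\pi_2:\mathbb R^{r+s}\to\mathbb R^s$ defined by setting $\pi_1(\phi(x,y))=x$, $\pi_2(\phi(x,y))=y$. Each co-ordinate of $\pi_1$ is continuous, therefore Borel measurable (121Db), so $\pi_1^{-1}[E]\in\mathcal B_{r+s}$ for every $E\in\mathcal B_r$, by 121K. Similarly, $\pi_2^{-1}[F]\in\mathcal B_{r+s}$ for every $F\in\mathcal B_s$. So $\phi[E\times F]=\pi_1^{-1}[E]\cap\pi_2^{-1}[F]$ belongs to $\mathcal B_{r+s}$, that is, $E\times F\in\mathcal B$, whenever $E\in\mathcal B_r$ and $F\in\mathcal B_s$. Because $\mathcal B$ is a σ-algebra, $\mathcal B_r\widehat\otimes\mathcal B_s\subseteq\mathcal B$.
(b) Now examine sets of the form
$$\{(x,y):\xi_i\le\alpha\}=\{x:\xi_i\le\alpha\}\times\mathbb R^s,\qquad\{(x,y):\eta_j\le\alpha\}=\mathbb R^r\times\{y:\eta_j\le\alpha\}$$
for $\alpha\in\mathbb R$, $i\le r$ and $j\le s$, taking $x=(\xi_1,\dots,\xi_r)$ and $y=(\eta_1,\dots,\eta_s)$. All of these belong to $\mathcal B_r\widehat\otimes\mathcal B_s$. But the σ-algebra they generate is just $\mathcal B$, by 121J. So $\mathcal B\subseteq\mathcal B_r\widehat\otimes\mathcal B_s$ and $\mathcal B=\mathcal B_r\widehat\otimes\mathcal B_s$.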
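The bijection $\phi$ and the projections $\pi_1$, $\pi_2$ in the proof above are purely combinatorial, and a small Python sketch may make the identity $\phi[E\times F]=\pi_1^{-1}[E]\cap\pi_2^{-1}[F]$ concrete. Points are tuples of floats, and sets $E\subseteq\mathbb R^r$, $F\subseteq\mathbb R^s$ are represented by membership predicates; that representation, and the names `phi`, `pi1`, `pi2` and `in_image_of_rectangle`, are hypothetical choices made only for this illustration.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]
Predicate = Callable[[Point], bool]  # stands in for a set via its indicator function


def phi(x: Sequence[float], y: Sequence[float]) -> Point:
    """The natural bijection R^r x R^s -> R^(r+s): concatenate the co-ordinates."""
    return tuple(x) + tuple(y)


def pi1(z: Sequence[float], r: int) -> Point:
    """First r co-ordinates, so that pi1(phi(x, y), r) == x."""
    return tuple(z[:r])


def pi2(z: Sequence[float], r: int) -> Point:
    """Remaining co-ordinates, so that pi2(phi(x, y), r) == y."""
    return tuple(z[r:])


def in_image_of_rectangle(z: Sequence[float], r: int, E: Predicate, F: Predicate) -> bool:
    """Decide z in phi[E x F] by testing z in pi1^{-1}[E] and z in pi2^{-1}[F]."""
    return E(pi1(z, r)) and F(pi2(z, r))
```

For instance, with $E=\{x\in\mathbb R^2:\xi_1\le0\}$ and $F=\{y\in\mathbb R:\eta_1>1\}$, the point $\phi((-1,5),(2))=(-1,5,2)$ lies in $\phi[E\times F]$, while $(1,5,2)$ does not.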
251M Theorem Let $r,s\ge1$ be integers. Then the bijection $\phi:\mathbb R^r\times\mathbb R^s\to\mathbb R^{r+s}$ described in 251L identifies Lebesgue measure on $\mathbb R^{r+s}$ with the c.l.d. product $\lambda$ of Lebesgue measure on $\mathbb R^r$ and Lebesgue measure on $\mathbb R^s$.
proof Write $\mu_r$, $\mu_s$, $\mu_{r+s}$ for the three versions of Lebesgue measure, $\mu_r^*$, $\mu_s^*$ and $\mu_{r+s}^*$ for the corresponding outer measures, and $\theta$ for the outer measure on $\mathbb R^r\times\mathbb R^s$ derived from $\mu_r$ and $\mu_s$ by the formula of 251A.
(a) If $I\subseteq\mathbb R^r$ and $J\subseteq\mathbb R^s$ are half-open intervals, then $\phi[I\times J]\subseteq\mathbb R^{r+s}$ is also a half-open interval, and $\mu_{r+s}(\phi[I\times J])=\mu_rI\cdot\mu_sJ$; this is immediate from the definition of the Lebesgue measure of an interval. (I speak of ‘half-open’ intervals here, that is, intervals of the form $\prod_{j=1}^r[\alpha_j,\beta_j[$, because I used them in the definition of Lebesgue measure in §115. If you prefer to work with open intervals or closed intervals it makes no difference.) Note also that every half-open interval in $\mathbb R^{r+s}$ is expressible as $\phi[I\times J]$ for suitable $I$, $J$.
(b) For any $A\subseteq\mathbb R^{r+s}$, $\theta(\phi^{-1}[A])\le\mu_{r+s}^*(A)$. ▶ For any $\epsilon>0$, there is a sequence $\langle K_n\rangle_{n\in\mathbb N}$ of half-open intervals in $\mathbb R^{r+s}$ such that $A\subseteq\bigcup_{n\in\mathbb N}K_n$ and $\sum_{n=0}^\infty\mu_{r+s}(K_n)\le\mu_{r+s}^*(A)+\epsilon$. Express each $K_n$ as $\phi[I_n\times J_n]$, where $I_n$ and $J_n$ are half-open intervals in $\mathbb R^r$ and $\mathbb R^s$ respectively; then $\phi^{-1}[A]\subseteq\bigcup_{n\in\mathbb N}I_n\times J_n$, so that
$$\theta(\phi^{-1}[A])\le\sum_{n=0}^\infty\mu_rI_n\cdot\mu_sJ_n=\sum_{n=0}^\infty\mu_{r+s}(K_n)\le\mu_{r+s}^*(A)+\epsilon.$$
As $\epsilon$ is arbitrary, we have the result. ◀
(c) If $E\subseteq\mathbb R^r$ and $F\subseteq\mathbb R^s$ are measurable, then $\mu_{r+s}^*(\phi[E\times F])\le\mu_rE\cdot\mu_sF$. ▶ (i) Consider first the case $\mu_rE<\infty$, $\mu_sF<\infty$. In this case, given $\epsilon>0$, there are sequences $\langle I_n\rangle_{n\in\mathbb N}$, $\langle J_n\rangle_{n\in\mathbb N}$ of half-open intervals such that $E\subseteq\bigcup_{n\in\mathbb N}I_n$, $F\subseteq\bigcup_{n\in\mathbb N}J_n$,
$$\sum_{n=0}^\infty\mu_rI_n\le\mu_r^*E+\epsilon=\mu_rE+\epsilon,\qquad\sum_{n=0}^\infty\mu_sJ_n\le\mu_s^*F+\epsilon=\mu_sF+\epsilon.$$
Accordingly $E\times F\subseteq\bigcup_{m,n\in\mathbb N}I_m\times J_n$ and $\phi[E\times F]\subseteq\bigcup_{m,n\in\mathbb N}\phi[I_m\times J_n]$, so that
$$\mu_{r+s}^*(\phi[E\times F])\le\sum_{m,n=0}^\infty\mu_{r+s}(\phi[I_m\times J_n])=\sum_{m,n=0}^\infty\mu_rI_m\cdot\mu_sJ_n=\sum_{m=0}^\infty\mu_rI_m\cdot\sum_{n=0}^\infty\mu_sJ_n\le(\mu_rE+\epsilon)(\mu_sF+\epsilon).$$
As $\epsilon$ is arbitrary, we have the result.
(ii) Next, if $\mu_rE=0$, there is a sequence $\langle F_n\rangle_{n\in\mathbb N}$ of sets of finite measure covering $\mathbb R^s\supseteq F$, so that
$$\mu_{r+s}^*(\phi[E\times F])\le\sum_{n=0}^\infty\mu_{r+s}^*(\phi[E\times F_n])\le\sum_{n=0}^\infty\mu_rE\cdot\mu_sF_n=0=\mu_rE\cdot\mu_sF.$$
(iii) Similarly, $\mu_{r+s}^*(\phi[E\times F])\le\mu_rE\cdot\mu_sF$ if $\mu_sF=0$.
(iv) The only remaining case is that in which both $\mu_rE$ and $\mu_sF$ are strictly positive and one is infinite; but in this case $\mu_rE\cdot\mu_sF=\infty$, so surely $\mu_{r+s}^*(\phi[E\times F])\le\mu_rE\cdot\mu_sF$. ◀
(d) If $A\subseteq\mathbb R^{r+s}$, then $\mu_{r+s}^*(A)\le\theta(\phi^{-1}[A])$. ▶ Given $\epsilon>0$, there are sequences $\langle E_n\rangle_{n\in\mathbb N}$, $\langle F_n\rangle_{n\in\mathbb N}$ of measurable sets in $\mathbb R^r$, $\mathbb R^s$ respectively such that $\phi^{-1}[A]\subseteq\bigcup_{n\in\mathbb N}E_n\times F_n$ and $\sum_{n=0}^\infty\mu_rE_n\cdot\mu_sF_n\le\theta(\phi^{-1}[A])+\epsilon$. Now $A\subseteq\bigcup_{n\in\mathbb N}\phi[E_n\times F_n]$, so
$$\mu_{r+s}^*(A)\le\sum_{n=0}^\infty\mu_{r+s}^*(\phi[E_n\times F_n])\le\sum_{n=0}^\infty\mu_rE_n\cdot\mu_sF_n\le\theta(\phi^{-1}[A])+\epsilon.$$
As $\epsilon$ is arbitrary, we have the result. ◀
(e) Putting (b) and (d) together, we have $\theta(\phi^{-1}[A])=\mu_{r+s}^*(A)$ for every $A\subseteq\mathbb R^{r+s}$. Thus $\theta$ on $\mathbb R^r\times\mathbb R^s$ corresponds exactly to $\mu_{r+s}^*$ on $\mathbb R^{r+s}$. So the associated measures $\lambda_0$, $\mu_{r+s}$ must correspond in the same way, writing $\lambda_0$ for the primitive product measure. But 251K tells us that $\lambda_0=\lambda$, so we have the result.
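Step (a) and the double-sum identity in (c)(i) are finite arithmetic facts about half-open boxes, and they can be checked mechanically. The Python sketch below represents a box $\prod_j[\alpha_j,\beta_j[$ as a list of endpoint pairs with $\alpha_j\le\beta_j$ (an assumption of this illustration; the helper names `volume`, `phi_box` and `cover_total` are hypothetical). It uses exact `Fraction` arithmetic so that equalities are not disturbed by rounding; it says nothing about the limiting argument in $\epsilon$.

```python
from fractions import Fraction
from math import prod
from typing import List, Tuple

Box = List[Tuple[Fraction, Fraction]]  # half-open box prod_j [a_j, b_j[, with a_j <= b_j


def volume(box: Box) -> Fraction:
    """Lebesgue measure of a half-open box: the product of its side lengths."""
    return prod((b - a for a, b in box), start=Fraction(1))


def phi_box(I: Box, J: Box) -> Box:
    """phi[I x J] is again a half-open box, in R^(r+s): concatenate the side lists."""
    return list(I) + list(J)


def cover_total(Is: List[Box], Js: List[Box]) -> Fraction:
    """Sum over all pairs (m, n) of mu_{r+s}(phi[I_m x J_n]), as in 251M(c)(i)."""
    return sum((volume(phi_box(I, J)) for I in Is for J in Js), Fraction(0))
```

With $I=[0,2[\times[1,4[$ and $J=[\frac12,1[$, `volume(phi_box(I, J))` equals $6\cdot\frac12=3$, and for any finite lists of boxes `cover_total` agrees with the product of the two separate sums of volumes, which is the finite version of the rearrangement used in (c)(i).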
251N In fact, a large proportion of the applications of the constructions here are to subspaces of Euclidean space, rather than to the whole product $\mathbb R^r\times\mathbb R^s$. It would not have been especially difficult to write 251M out to deal with arbitrary subspaces, but I prefer to give a more general description of the product of subspace measures, as I feel that it illuminates the method. I start with a straightforward result on strictly localizable spaces.
Proposition Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be strictly localizable measure spaces. Then the c.l.d. product measure on $X\times Y$ is strictly localizable; moreover, if $\langle X_i\rangle_{i\in I}$ and $\langle Y_j\rangle_{j\in J}$ are decompositions of $X$ and $Y$ respectively, $\langle X_i\times Y_j\rangle_{(i,j)\in I\times J}$ is a decomposition of $X\times Y$.
proof Let $\langle X_i\rangle_{i\in I}$ and $\langle Y_j\rangle_{j\in J}$ be decompositions of $X$, $Y$ respectively. Then $\langle X_i\times Y_j\rangle_{(i,j)\in I\times J}$ is a disjoint cover of $X\times Y$ by measurable sets of finite measure. If $W\subseteq X\times Y$ is measured by $\lambda$ and $\lambda W>0$, there are sets $E\in\Sigma$, $F\in\mathrm T$ such that $\mu E<\infty$, $\nu F<\infty$ and $\lambda(W\cap(E\times F))>0$ (251Id). We know that $\mu E=\sum_{i\in I}\mu(E\cap X_i)$ and $\nu F=\sum_{j\in J}\nu(F\cap Y_j)$, so there must be finite sets $I_0\subseteq I$, $J_0\subseteq J$ such that
$$\mu E\cdot\nu F-\Bigl(\sum_{i\in I_0}\mu(E\cap X_i)\Bigr)\Bigl(\sum_{j\in J_0}\nu(F\cap Y_j)\Bigr)<\lambda(W\cap(E\times F)).$$
Setting $E'=\bigcup_{i\in I_0}X_i$, $F'=\bigcup_{j\in J_0}Y_j$ we have
$$\lambda((E\times F)\setminus(E'\times F'))=\lambda(E\times F)-\lambda((E\cap E')\times(F\cap F'))<\lambda(W\cap(E\times F)),$$
so that $\lambda(W\cap(E'\times F'))>0$. There must therefore be some $i\in I_0$, $j\in J_0$ such that $\lambda(W\cap(X_i\times Y_j))>0$. This shows that $\{X_i\times Y_j:i\in I,\ j\in J\}$ satisfies the criterion of 213O, so that $\lambda$, being complete and locally determined, must be strictly localizable. Because $\langle X_i\times Y_j\rangle_{(i,j)\in I\times J}$ covers $X\times Y$, it is actually a decomposition of $X\times Y$ (213Ob).

251O Lemma Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X\times Y$. Let $\lambda^*$ be the corresponding outer measure (132B). Then
$$\lambda^*C=\sup\{\theta(C\cap(E\times F)):E\in\Sigma,\ F\in\mathrm T,\ \mu E<\infty,\ \nu F<\infty\}$$
for every $C\subseteq X\times Y$, where $\theta$ is the outer measure of 251A.
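proof Write $\Lambda$ for the domain of $\lambda$, $\Sigma^f$ for $\{E:E\in\Sigma,\ \mu E<\infty\}$, $\mathrm T^f$ for $\{F:F\in\mathrm T,\ \nu F<\infty\}$; set $u=\sup\{\theta(C\cap(E\times F)):E\in\Sigma^f,\ F\in\mathrm T^f\}$.
(a) If $C\subseteq W\in\Lambda$, $E\in\Sigma^f$ and $F\in\mathrm T^f$, then
$$\theta(C\cap(E\times F))\le\theta(W\cap(E\times F))=\lambda_0(W\cap(E\times F))\le\lambda W$$
(where $\lambda_0$ is the primitive product measure). As $E$ and $F$ are arbitrary, $u\le\lambda W$; as $W$ is arbitrary, $u\le\lambda^*C$.
(b) If $u=\infty$, then of course $\lambda^*C=u$. Otherwise, let $\langle E_n\rangle_{n\in\mathbb N}$, $\langle F_n\rangle_{n\in\mathbb N}$ be sequences in $\Sigma^f$, $\mathrm T^f$ respectively such that $u=\sup_{n\in\mathbb N}\theta(C\cap(E_n\times F_n))$. Consider $C'=C\setminus\bigl(\bigcup_{n\in\mathbb N}E_n\times\bigcup_{n\in\mathbb N}F_n\bigr)$. If $E\in\Sigma^f$ and $F\in\mathrm T^f$, then for every $n\in\mathbb N$ we have
$$\begin{aligned}u&\ge\theta(C\cap((E\cup E_n)\times(F\cup F_n)))\\&=\theta(C\cap((E\cup E_n)\times(F\cup F_n))\cap(E_n\times F_n))+\theta(C\cap((E\cup E_n)\times(F\cup F_n))\setminus(E_n\times F_n))\\&\qquad(\text{because }E_n\times F_n\in\Lambda,\text{ by 251E})\\&\ge\theta(C\cap(E_n\times F_n))+\theta(C'\cap(E\times F)).\end{aligned}$$
Taking the supremum of the right-hand expression as $n$ varies, we have $u\ge u+\theta(C'\cap(E\times F))$, so
$$\lambda(C'\cap(E\times F))=\theta(C'\cap(E\times F))=0.$$
As $E$ and $F$ are arbitrary, $\lambda C'=0$. But this means that
$$\lambda^*C\le\lambda^*\Bigl(C\cap\bigl(\bigcup_{n\in\mathbb N}E_n\times\bigcup_{n\in\mathbb N}F_n\bigr)\Bigr)+\lambda^*C'=\lim_{n\to\infty}\lambda^*\Bigl(C\cap\bigl(\bigcup_{i\le n}E_i\times\bigcup_{i\le n}F_i\bigr)\Bigr)$$
(using 132Ae) $\le u$, as required.

251P Proposition Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be measure spaces, and $A\subseteq X$, $B\subseteq Y$ subsets; write $\mu_A$, $\nu_B$ for the subspace measures on $A$, $B$ respectively. Let $\lambda$ be the c.l.d. product measure on $X\times Y$, and $\lambda^\#$ the subspace measure it induces on $A\times B$. Let $\tilde\lambda$ be the c.l.d. product measure of $\mu_A$ and $\nu_B$ on $A\times B$. Then
(i) $\tilde\lambda$ extends $\lambda^\#$.
(ii) If either (α) $A\in\Sigma$ and $B\in\mathrm T$ or (β) $A$ and $B$ can both be covered by sequences of sets of finite measure or (γ) $\mu$ and $\nu$ are both strictly localizable, then $\tilde\lambda=\lambda^\#$.
proof Let $\theta$ be the outer measure on $X\times Y$ defined from $\mu$ and $\nu$ by the formula of 251A, and $\tilde\theta$ the outer measure on $A\times B$ similarly defined from $\mu_A$ and $\nu_B$. Write $\Lambda$ for the domain of $\lambda$, $\tilde\Lambda$ for the domain of $\tilde\lambda$, and $\Lambda^\#=\{W\cap(A\times B):W\in\Lambda\}$ for the domain of $\lambda^\#$. Set $\Sigma^f=\{E:\mu E<\infty\}$, $\mathrm T^f=\{F:\nu F<\infty\}$.
(a) The first point to observe is that $\tilde\theta C=\theta C$ for every $C\subseteq A\times B$. ▶ (i) If $\langle E_n\rangle_{n\in\mathbb N}$ and $\langle F_n\rangle_{n\in\mathbb N}$ are sequences in $\Sigma$, $\mathrm T$ respectively such that $C\subseteq\bigcup_{n\in\mathbb N}E_n\times F_n$, then $C=C\cap(A\times B)\subseteq\bigcup_{n\in\mathbb N}(E_n\cap A)\times(F_n\cap B)$, so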
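$$\tilde\theta C\le\sum_{n=0}^\infty\mu_A(E_n\cap A)\cdot\nu_B(F_n\cap B)=\sum_{n=0}^\infty\mu^*(E_n\cap A)\cdot\nu^*(F_n\cap B)\le\sum_{n=0}^\infty\mu E_n\cdot\nu F_n.$$
As $\langle E_n\rangle_{n\in\mathbb N}$ and $\langle F_n\rangle_{n\in\mathbb N}$ are arbitrary, $\tilde\theta C\le\theta C$.
(ii) If $\langle\tilde E_n\rangle_{n\in\mathbb N}$, $\langle\tilde F_n\rangle_{n\in\mathbb N}$ are sequences in $\Sigma_A=\operatorname{dom}\mu_A$, $\mathrm T_B=\operatorname{dom}\nu_B$ respectively such that $C\subseteq\bigcup_{n\in\mathbb N}\tilde E_n\times\tilde F_n$, then for each $n\in\mathbb N$ we can choose $E_n\in\Sigma$, $F_n\in\mathrm T$ such that
$$\tilde E_n\subseteq E_n,\quad\mu E_n=\mu^*\tilde E_n=\mu_A\tilde E_n,\qquad\tilde F_n\subseteq F_n,\quad\nu F_n=\nu^*\tilde F_n=\nu_B\tilde F_n,$$
and now
$$\theta C\le\sum_{n=0}^\infty\mu E_n\cdot\nu F_n=\sum_{n=0}^\infty\mu_A\tilde E_n\cdot\nu_B\tilde F_n.$$
As $\langle\tilde E_n\rangle_{n\in\mathbb N}$, $\langle\tilde F_n\rangle_{n\in\mathbb N}$ are arbitrary, $\theta C\le\tilde\theta C$. ◀
(b) It follows that $\Lambda^\#\subseteq\tilde\Lambda$. ▶ Suppose that $V\in\Lambda^\#$ and that $C\subseteq A\times B$. In this case there is a $W\in\Lambda$ such that $V=W\cap(A\times B)$. So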
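$$\tilde\theta(C\cap V)+\tilde\theta(C\setminus V)=\theta(C\cap W)+\theta(C\setminus W)=\theta C=\tilde\theta C.$$
As $C$ is arbitrary, $V\in\tilde\Lambda$. ◀
Accordingly, for $V\in\Lambda^\#$,
$$\begin{aligned}\lambda^\#V=\lambda^*V&=\sup\{\theta(V\cap(E\times F)):E\in\Sigma^f,\ F\in\mathrm T^f\}\\&=\sup\{\theta(V\cap(\tilde E\times\tilde F)):\tilde E\in\Sigma_A,\ \tilde F\in\mathrm T_B,\ \mu_A\tilde E<\infty,\ \nu_B\tilde F<\infty\}\\&=\sup\{\tilde\theta(V\cap(\tilde E\times\tilde F)):\tilde E\in\Sigma_A,\ \tilde F\in\mathrm T_B,\ \mu_A\tilde E<\infty,\ \nu_B\tilde F<\infty\}=\tilde\lambda V,\end{aligned}$$
using 251O twice. This proves part (i) of the proposition.
(c) The next thing to observe is that if $V\in\tilde\Lambda$ and $V\subseteq E\times F$ where $E\in\Sigma^f$ and $F\in\mathrm T^f$, then $V\in\Lambda^\#$. ▶ Let $W\subseteq E\times F$ be a measurable envelope of $V$ with respect to $\lambda$ (132Ee). Then
$$\begin{aligned}\theta((W\cap(A\times B))\setminus V)&=\tilde\theta((W\cap(A\times B))\setminus V)=\tilde\lambda((W\cap(A\times B))\setminus V)\\&\qquad(\text{because }W\cap(A\times B)\in\Lambda^\#\subseteq\tilde\Lambda\text{ and }V\in\tilde\Lambda)\\&=\tilde\lambda(W\cap(A\times B))-\tilde\lambda V\le\tilde\theta(W\cap(A\times B))-\tilde\theta V\\&\qquad(\text{because }V\subseteq(E\cap A)\times(F\cap B)\text{, and }\mu_A(E\cap A)\text{ and }\nu_B(F\cap B)\text{ are both finite, so }\tilde\lambda V=\tilde\theta V)\\&=\theta(W\cap(A\times B))-\theta V=\lambda^*(W\cap(A\times B))-\lambda^*V\le\lambda W-\lambda^*V=0.\end{aligned}$$
But this means that $W'=(W\cap(A\times B))\setminus V\in\Lambda$ and $V=(A\times B)\cap(W\setminus W')$ belongs to $\Lambda^\#$. ◀
(d) Now fix any $V\in\tilde\Lambda$, and look at the conditions (α)-(γ) of part (ii) of the proposition.
(α) If $A\in\Sigma$ and $B\in\mathrm T$, and $C\subseteq X\times Y$, then $A\times B\in\Lambda$ (251E), so
$$\begin{aligned}\theta(C\cap V)+\theta(C\setminus V)&=\theta(C\cap V)+\theta((C\setminus V)\cap(A\times B))+\theta((C\setminus V)\setminus(A\times B))\\&=\tilde\theta(C\cap V)+\tilde\theta((C\cap(A\times B))\setminus V)+\theta(C\setminus(A\times B))\\&=\tilde\theta(C\cap(A\times B))+\theta(C\setminus(A\times B))\\&=\theta(C\cap(A\times B))+\theta(C\setminus(A\times B))=\theta C.\end{aligned}$$
As $C$ is arbitrary, $V\in\Lambda$, so $V=V\cap(A\times B)\in\Lambda^\#$.
(β) If $A\subseteq\bigcup_{n\in\mathbb N}E_n$ and $B\subseteq\bigcup_{n\in\mathbb N}F_n$ where all the $E_n$, $F_n$ are of finite measure, then $V=\bigcup_{m,n\in\mathbb N}V\cap(E_m\times F_n)\in\Lambda^\#$, by (c).
(γ) If $\langle X_i\rangle_{i\in I}$, $\langle Y_j\rangle_{j\in J}$ are decompositions of $X$, $Y$ respectively, then for each $i\in I$, $j\in J$ we have $V\cap(X_i\times Y_j)\in\Lambda^\#$, by (c), that is, there is a $W_{ij}\in\Lambda$ such that $V\cap(X_i\times Y_j)=W_{ij}\cap(A\times B)$. Now $\langle X_i\times Y_j\rangle_{(i,j)\in I\times J}$ is a decomposition of $X\times Y$ for $\lambda$ (251N), so that
$$W=\bigcup_{i\in I,j\in J}W_{ij}\cap(X_i\times Y_j)\in\Lambda,$$
and $V=W\cap(A\times B)\in\Lambda^\#$.
(e) Thus any of the three conditions is sufficient to ensure that $\tilde\Lambda=\Lambda^\#$, in which case (i) tells us that $\tilde\lambda=\lambda^\#$.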
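Part (a) of the proof, $\tilde\theta C=\theta C$ for $C\subseteq A\times B$, can be checked by brute force on a toy example in which $\Sigma$ is not the whole power set of $X$, so that the subspace measure $\mu_A$ really does measure sets (such as $\{0\}$ below) that $\mu$ does not. The Python sketch below is only an illustration under stated assumptions: $X$ and $Y$ are tiny finite sets, each measure is given as a dictionary from its measurable sets to their finite measures, and the helper names `outer`, `subspace`, `rectangles` and `theta` are hypothetical. Because there are only finitely many distinct rectangles and repeating one in a sequence only adds to the sum, finite covers give the same infimum as countable ones here; the search is exponential, so it is meant for a handful of points only.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Set_ = FrozenSet[int]
Measure = Dict[Set_, int]  # a finite measure: each measurable set -> its measure
Rect = Tuple[Tuple[Set_, Set_], int]  # ((E, F), mu E * nu F)


def outer(S: Set_, meas: Measure) -> int:
    """Outer measure: the least measure of a measurable set including S."""
    return min(m for E, m in meas.items() if S <= E)


def subspace(meas: Measure, A: Set_) -> Measure:
    """Subspace measure on A: the sets E & A, each measured by the outer measure."""
    return {E & A: outer(E & A, meas) for E in meas}


def rectangles(mu: Measure, nu: Measure) -> List[Rect]:
    """All non-empty measurable rectangles E x F, weighted by mu E * nu F."""
    return [((E, F), mu[E] * nu[F]) for E in mu for F in nu if E and F]


def theta(C: FrozenSet[Tuple[int, int]], rects: List[Rect]) -> float:
    """Outer measure of 251A: cheapest finite family of rectangles covering C."""
    best = float("inf")
    for k in range(len(rects) + 1):
        for cover in combinations(rects, k):
            covered = {(x, y) for (E, F), _ in cover for x in E for y in F}
            if C <= covered:
                best = min(best, sum(w for _, w in cover))
    return best
```

For instance, take $X=\{0,1,2\}$ with $\Sigma=\{\emptyset,\{0,1\},\{2\},X\}$, $\mu\{0,1\}=1$, $\mu\{2\}=2$; $Y=\{0,1\}$ with all subsets measurable, $\nu\{0\}=4$, $\nu\{1\}=1$; and $A=\{0,2\}$, $B=\{0\}$. Then $\mu_A\{0\}=\mu^*\{0\}=1$, and `theta` computed from $(\mu_A,\nu_B)$ agrees with `theta` computed from $(\mu,\nu)$ on every subset of $A\times B$; on $\{(0,0),(2,0)\}$ both give $12$.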
251Q Corollary Let $r,s\ge1$ be integers, and $\phi:\mathbb R^r\times\mathbb R^s\to\mathbb R^{r+s}$ the natural bijection. If $A\subseteq\mathbb R^r$ and $B\subseteq\mathbb R^s$, then the restriction of $\phi$ to $A\times B$ identifies the product of Lebesgue measure on $A$ and Lebesgue measure on $B$ with Lebesgue measure on $\phi[A\times B]\subseteq\mathbb R^{r+s}$.
Remark Note that by ‘Lebesgue measure on $A$’ I mean the subspace measure $\mu_{rA}$ on $A$ induced by $r$-dimensional Lebesgue measure $\mu_r$ on $\mathbb R^r$, whether or not $A$ is itself a measurable set.
proof By 251P, using either of the conditions (ii-β) or (ii-γ), the product measure $\tilde\lambda$ on $A\times B$ is just the subspace measure $\lambda^\#$ on $A\times B$ induced by the product measure $\lambda$ on $\mathbb R^r\times\mathbb R^s$. But by 251M we know that $\phi$ is an isomorphism between $(\mathbb R^r\times\mathbb R^s,\lambda)$ and $(\mathbb R^{r+s},\mu_{r+s})$; so it must also identify $\tilde\lambda$ with the subspace measure on $\phi[A\times B]$.
251R Corollary Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X\times Y$. If $A\subseteq X$ and $B\subseteq Y$ can be covered by sequences of sets of finite measure, then $\lambda^*(A\times B)=\mu^*A\cdot\nu^*B$.
proof In the language of 251P,
$$\lambda^*(A\times B)=\lambda^\#(A\times B)=\mu_AA\cdot\nu_BB\quad(\text{by 251K and 251E})\quad=\mu^*A\cdot\nu^*B.$$
251S
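The next proposition gives an idea of how the technical definitions here fit together.
Proposition Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be measure spaces. Write $(X,\hat\Sigma,\hat\mu)$ and $(X,\tilde\Sigma,\tilde\mu)$ for the completion and c.l.d. version of $(X,\Sigma,\mu)$ (212C, 213E). Let $\lambda$, $\hat\lambda$ and $\tilde\lambda$ be the three c.l.d. product measures on $X\times Y$ obtained from the pairs $(\mu,\nu)$, $(\hat\mu,\nu)$ and $(\tilde\mu,\nu)$ of factor measures. Then $\lambda=\hat\lambda=\tilde\lambda$.
proof Write $\Lambda$, $\hat\Lambda$ and $\tilde\Lambda$ for the domains of $\lambda$, $\hat\lambda$, $\tilde\lambda$ respectively; and $\theta$, $\hat\theta$, $\tilde\theta$ for the outer measures on $X\times Y$ obtained by the formula of 251A from the three pairs of factor measures.
(a) If $E\in\Sigma$ and $\mu E<\infty$, then $\theta$, $\hat\theta$ and $\tilde\theta$ agree on subsets of $E\times Y$. ▶ Take $A\subseteq E\times Y$ and $\epsilon>0$.
(i) There are sequences $\langle E_n\rangle_{n\in\mathbb N}$ in $\Sigma$, $\langle F_n\rangle_{n\in\mathbb N}$ in $\mathrm T$ such that $A\subseteq\bigcup_{n\in\mathbb N}E_n\times F_n$ and $\sum_{n=0}^\infty\mu E_n\cdot\nu F_n\le\theta A+\epsilon$. Now $\tilde\mu E_n\le\mu E_n$ for every $n$ (213F), so
$$\tilde\theta A\le\sum_{n=0}^\infty\tilde\mu E_n\cdot\nu F_n\le\sum_{n=0}^\infty\mu E_n\cdot\nu F_n\le\theta A+\epsilon.$$
(ii) There are sequences $\langle\hat E_n\rangle_{n\in\mathbb N}$ in $\hat\Sigma$, $\langle\hat F_n\rangle_{n\in\mathbb N}$ in $\mathrm T$ such that $A\subseteq\bigcup_{n\in\mathbb N}\hat E_n\times\hat F_n$ and $\sum_{n=0}^\infty\hat\mu\hat E_n\cdot\nu\hat F_n\le\hat\theta A+\epsilon$. Now for each $n$ there is an $E'_n\in\Sigma$ such that $\hat E_n\subseteq E'_n$ and $\mu E'_n=\hat\mu\hat E_n$, so that
$$\theta A\le\sum_{n=0}^\infty\mu E'_n\cdot\nu\hat F_n=\sum_{n=0}^\infty\hat\mu\hat E_n\cdot\nu\hat F_n\le\hat\theta A+\epsilon.$$
(iii) There are sequences $\langle\tilde E_n\rangle_{n\in\mathbb N}$ in $\tilde\Sigma$, $\langle\tilde F_n\rangle_{n\in\mathbb N}$ in $\mathrm T$ such that $A\subseteq\bigcup_{n\in\mathbb N}\tilde E_n\times\tilde F_n$ and $\sum_{n=0}^\infty\tilde\mu\tilde E_n\cdot\nu\tilde F_n\le\tilde\theta A+\epsilon$. Now for each $n$, $\tilde E_n\cap E\in\hat\Sigma$, so
$$\hat\theta A\le\sum_{n=0}^\infty\hat\mu(\tilde E_n\cap E)\cdot\nu\tilde F_n\le\sum_{n=0}^\infty\tilde\mu\tilde E_n\cdot\nu\tilde F_n\le\tilde\theta A+\epsilon.$$
(iv) Since $A$ and $\epsilon$ are arbitrary, $\theta=\hat\theta=\tilde\theta$ on $\mathcal P(E\times Y)$. ◀
(b) Consequently, the outer measures $\lambda^*$, $\hat\lambda^*$ and $\tilde\lambda^*$ are identical. ▶ Use 251O. Take $A\subseteq X\times Y$, $E\in\Sigma$, $\hat E\in\hat\Sigma$, $\tilde E\in\tilde\Sigma$, $F\in\mathrm T$ such that $\mu E$, $\hat\mu\hat E$, $\tilde\mu\tilde E$ and $\nu F$ are all finite. Then
(i)
$$\theta(A\cap(E\times F))=\hat\theta(A\cap(E\times F))\le\hat\lambda^*A,\qquad\theta(A\cap(E\times F))=\tilde\theta(A\cap(E\times F))\le\tilde\lambda^*A$$
because $\hat\mu E$ and $\tilde\mu E$ are both finite.
(ii) There is an $E'\in\Sigma$ such that $\hat E\subseteq E'$ and $\mu E'<\infty$, so that
$$\hat\theta(A\cap(\hat E\times F))\le\hat\theta(A\cap(E'\times F))=\theta(A\cap(E'\times F))\le\lambda^*A.$$
(iii) There is an $E''\in\Sigma$ such that $E''\subseteq\tilde E$ and $\tilde\mu(\tilde E\setminus E'')=0$ (213Fc), so that $\tilde\theta((\tilde E\setminus E'')\times Y)=0$ and $\mu E''<\infty$; accordingly
$$\tilde\theta(A\cap(\tilde E\times F))=\tilde\theta(A\cap(E''\times F))=\theta(A\cap(E''\times F))\le\lambda^*A.$$
(iv) Taking the supremum over $E$, $\hat E$, $\tilde E$ and $F$, we get
$$\lambda^*A\le\hat\lambda^*A,\quad\lambda^*A\le\tilde\lambda^*A,\quad\hat\lambda^*A\le\lambda^*A,\quad\tilde\lambda^*A\le\lambda^*A.$$
As $A$ is arbitrary, $\lambda^*=\hat\lambda^*=\tilde\lambda^*$. ◀
(c) Now $\lambda$, $\hat\lambda$ and $\tilde\lambda$ are all complete and locally determined, so by 213C are the measures defined by Carathéodory's method from their own outer measures, and are therefore identical.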
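Step (a) of the proof can be seen in miniature on a finite space carrying a negligible measurable set whose subsets are not all measurable. The Python sketch below assumes $X=\{0,1,2\}$ with $\Sigma=\{\emptyset,\{0,1\},\{2\},X\}$, $\mu\{0,1\}=0$, $\mu\{2\}=2$, and a one-point $Y=\{0\}$ with $\nu\{0\}=1$. It builds the completion $\hat\mu$ from the usual sandwich description (a set is measured if it lies between measurable $E\subseteq E'$ with $\mu(E'\setminus E)=0$) and checks by brute force that $\theta$ and $\hat\theta$ agree on every subset of $X\times Y$. The helper names are hypothetical, and the exhaustive search over families of rectangles is only practical for a handful of points.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Set_ = FrozenSet[int]
Measure = Dict[Set_, int]  # a finite measure: each measurable set -> its measure


def all_subsets(points: Set_) -> List[Set_]:
    pts = sorted(points)
    return [frozenset(c) for k in range(len(pts) + 1) for c in combinations(pts, k)]


def completion(X: Set_, mu: Measure) -> Measure:
    """S is measured iff E <= S <= E2 for measurable E, E2 with mu(E2 minus E) = 0;
    then mu_hat(S) = mu(E).  (mu is finite, so mu(E2 minus E) = mu E2 - mu E.)"""
    mu_hat: Measure = {}
    for S in all_subsets(X):
        for E in mu:
            for E2 in mu:
                if E <= S <= E2 and mu[E2] - mu[E] == 0:
                    mu_hat[S] = mu[E]
    return mu_hat


def theta(C: FrozenSet[Tuple[int, int]], mu: Measure, nu: Measure) -> float:
    """Outer measure of 251A: cheapest finite family of rectangles covering C."""
    rects = [((E, F), mu[E] * nu[F]) for E in mu for F in nu if E and F]
    best = float("inf")
    for k in range(len(rects) + 1):
        for cover in combinations(rects, k):
            covered = {(x, y) for (E, F), _ in cover for x in E for y in F}
            if C <= covered:
                best = min(best, sum(w for _, w in cover))
    return best
```

Here every subset of $X$ becomes $\hat\mu$-measurable (for example $\hat\mu\{0\}=0$ and $\hat\mu\{0,2\}=2$), yet $\theta$ and $\hat\theta$ both give $2$ on exactly those subsets of $X\times Y$ containing $(2,0)$ and $0$ on the rest, as (a) predicts.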
251T It is ‘obvious’, and an easy consequence of theorems so far proved, that the set $\{(x,x):x\in\mathbb R\}$ is negligible for Lebesgue measure on $\mathbb R^2$. The corresponding result is true in the square of any atomless measure space.
Proposition Let $(X,\Sigma,\mu)$ be an atomless measure space, and let $\lambda$ be the c.l.d. product measure on $X\times X$. Then $\Delta=\{(x,x):x\in X\}$ is $\lambda$-negligible.
proof Let $E,F\in\Sigma$ be sets of finite measure, and $n\in\mathbb N$. Applying 215D repeatedly, we can find a disjoint […] $\mu F$ for each $i$; setting $F_n=F\setminus$ […]
[…]
(v) If $W\in\Lambda$ and $\lambda W<\infty$, then for every $\epsilon>0$ there are $n\in\mathbb N$ and $E_{0i},\dots,E_{ni}\in\Sigma_i$, for each $i\in I$, such that $\lambda\bigl(W\mathbin{\triangle}\bigcup_{k\le n}\prod_{i\in I}E_{ki}\bigr)\le\epsilon$.
(g) If each $\mu_i$ is σ-finite, so is $\lambda$, and $\lambda=\lambda_0$ is the completion of its restriction to $\widehat\bigotimes_{i\in I}\Sigma_i$.
(h) If $\langle I_j\rangle_{j\in J}$ is any partition of $I$, then $\lambda$ can be identified with the c.l.d. product of $\langle\lambda_j\rangle_{j\in J}$, where $\lambda_j$ is the c.l.d. product of $\langle\mu_i\rangle_{i\in I_j}$. (See the arguments in 251M and also in 254N below.)
(i) If $I=\{1,\dots,n\}$ and each $\mu_i$ is Lebesgue measure on $\mathbb R$, then $\lambda$ can be identified with Lebesgue measure on $\mathbb R^n$.
(j) If, for each $i\in I$, we have a decomposition $\langle X_{ij}\rangle_{j\in J_i}$ of $X_i$, then $\langle\prod_{i\in I}X_{i,f(i)}\rangle_{f\in\prod_{i\in I}J_i}$ is a decomposition of $X$.
(k) For any $C\subseteq X$, $\lambda^*C=\sup\{\theta(C\cap\prod_{i\in I}E_i):E_i\in\Sigma_i^f\text{ for every }i\in I\}$.
(l) Suppose that $A_i\subseteq X_i$ for each $i\in I$. Write $\lambda^\#$ for the subspace measure on $A=\prod_{i\in I}A_i$, and $\tilde\lambda$ for the c.l.d. product of the subspace measures on the $A_i$. Then $\tilde\lambda$ extends $\lambda^\#$, and if either $A_i\in\Sigma_i$ for every $i$ or every $A_i$ can be covered by a sequence of sets of finite measure or every $\mu_i$ is strictly localizable, then $\tilde\lambda=\lambda^\#$.
(m) If $A_i\subseteq X_i$ can be covered by a sequence of sets of finite measure for each $i\in I$, then $\lambda^*\bigl(\prod_{i\in I}A_i\bigr)=\prod_{i\in I}\mu_i^*A_i$.
(n) Writing $\hat\mu_i$, $\tilde\mu_i$ for the completion and c.l.d. version of each $\mu_i$, $\lambda$ is the c.l.d. product of $\langle\hat\mu_i\rangle_{i\in I}$ and also of $\langle\tilde\mu_i\rangle_{i\in I}$.
(o) If all the $(X_i,\Sigma_i,\mu_i)$ are the same atomless measure space, then $\{x:x\in X,\ i\mapsto x(i)\text{ is injective}\}$ is $\lambda$-conegligible.

251X Basic exercises (a) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be measure spaces; let $\lambda_0$ be the primitive product measure on $X\times Y$, and $\lambda$ the c.l.d. product measure. Show that $\lambda_0W<\infty$ iff $\lambda W<\infty$ and $W$ is included in a set of the form $(E\times Y)\cup(X\times F)\cup\bigcup_{n\in\mathbb N}E_n\times F_n$ where $\mu E=\nu F=0$ and $\mu E_n<\infty$, $\nu F_n<\infty$ for every $n$.
>(b) Show that if $X$ and $Y$ are any sets, with their respective counting measures, then the primitive and c.l.d. product measures on $X\times Y$ are both counting measure on $X\times Y$.
(c) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be measure spaces; let $\lambda_0$ be the primitive product measure on $X\times Y$, and $\lambda$ the c.l.d. product measure. Show that $\lambda_0$ is locally determined $\iff$ $\lambda_0$ is semi-finite $\iff$ $\lambda_0=\lambda$ $\iff$ $\lambda_0$, $\lambda$ have the same negligible sets.
>(d) (See 251W.) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of measure spaces, where $I$ is a non-empty finite set. Set $X=\prod_{i\in I}X_i$. For $A\subseteq X$, set
$$\theta(A)=\inf\Bigl\{\sum_{n=0}^\infty\prod_{i\in I}\mu_iE_{ni}:E_{ni}\in\Sigma_i\ \forall\,n\in\mathbb N,\ i\in I,\ A\subseteq\bigcup_{n\in\mathbb N}\prod_{i\in I}E_{ni}\Bigr\}.$$
Show that $\theta$ is an outer measure on $X$. Let $\lambda_0$ be the measure defined from $\theta$ by Carathéodory's method, and for $W\in\operatorname{dom}\lambda_0$ set
$$\lambda W=\sup\Bigl\{\lambda_0\bigl(W\cap\prod_{i\in I}E_i\bigr):E_i\in\Sigma_i,\ \mu_iE_i<\infty\text{ for every }i\in I\Bigr\}.$$
Show that $\lambda$ is a measure on $X$, and is the c.l.d. version of $\lambda_0$.
>(e) (See 251W.) Let $I$ be a non-empty finite set and $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ a family of measure spaces. For non-empty $K\subseteq I$ set $X^{(K)}=\prod_{i\in K}X_i$ and let $\lambda_0^{(K)}$, $\lambda^{(K)}$ be the measures on $X^{(K)}$ constructed as in 251Xd. Show that if $K$ is a non-empty proper subset of $I$, then the natural bijection between $X^{(I)}$ and $X^{(K)}\times X^{(I\setminus K)}$ identifies $\lambda_0^{(I)}$ with the primitive product measure of $\lambda_0^{(K)}$ and $\lambda_0^{(I\setminus K)}$, and $\lambda^{(I)}$ with the c.l.d. product measure of $\lambda^{(K)}$ and $\lambda^{(I\setminus K)}$.
>(f) Using 251Xd-251Xe above, or otherwise, show that if $(X_1,\Sigma_1,\mu_1)$, $(X_2,\Sigma_2,\mu_2)$, $(X_3,\Sigma_3,\mu_3)$ are measure spaces then the primitive and c.l.d. product measures $\lambda_0$, $\lambda$ of $(X_1\times X_2)\times X_3$, constructed by first taking the appropriate product measure on $X_1\times X_2$ and then taking the product of this with the measure of $X_3$, are identified with the corresponding product measures on $X_1\times(X_2\times X_3)$ by the canonical bijection between the sets $(X_1\times X_2)\times X_3$ and $X_1\times(X_2\times X_3)$.
(g) (i) What happens in 251Xd when $I$ is a singleton? (ii) Devise an appropriate convention to make 251Xd-251Xe remain valid when one or more of the sets $I$, $K$, $I\setminus K$ there is empty.
>(h) Let $(X,\Sigma,\mu)$ be a complete locally determined measure space, and $I$ any non-empty set; let $\nu$ be counting measure on $I$. Show that the c.l.d. product measure on $X\times I$ is equal to (or at any rate identifiable with) the direct sum measure of the family $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$, if we set $(X_i,\Sigma_i,\mu_i)=(X,\Sigma,\mu)$ for every $i$.
>(i) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of measure spaces, with direct sum $(X,\Sigma,\mu)$ (214K). Let $(Y,\mathrm T,\nu)$ be any measure space, and give $X\times Y$, $X_i\times Y$ their c.l.d. product measures. Show that the natural bijection between $X\times Y$ and $Z=\bigcup_{i\in I}((X_i\times Y)\times\{i\})$ is an isomorphism between the measure of $X\times Y$ and the direct sum measure on $Z$.
>(j) Let $(X,\Sigma,\mu)$ be any measure space, and $Y$ a singleton set $\{y\}$; let $\nu$ be the measure on $Y$ such that $\nu Y=1$. Show that the natural bijection between $X\times\{y\}$ and $X$ identifies the primitive product measure on $X\times\{y\}$ with $\check\mu$ as defined in 213Xa, and the c.l.d. product measure with the c.l.d. version of $\mu$. Explain how to put this together with 251Xf and 251Ic to prove 251S.
>(k) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X\times Y$. Show that $\lambda$ is the c.l.d. version of its restriction to $\Sigma\widehat\otimes\mathrm T$.
(l) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be measure spaces, with primitive and c.l.d. product measures $\lambda_0$, $\lambda$. Let $\lambda_1$ be any measure with domain $\Sigma\widehat\otimes\mathrm T$ such that $\lambda_1(E\times F)=\mu E\cdot\nu F$ for every $E\in\Sigma$, $F\in\mathrm T$. Show that $\lambda W\le\lambda_1W\le\lambda_0W$ for every $W\in\Sigma\widehat\otimes\mathrm T$.
(m) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be two measure spaces, and $\lambda_0$ the primitive product measure on $X\times Y$. Show that the corresponding outer measure $\lambda_0^*$ is just the outer measure $\theta$ of 251A.
(n) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be measure spaces, and $A\subseteq X$, $B\subseteq Y$ subsets; write $\mu_A$, $\nu_B$ for the subspace measures. Let $\lambda_0$ be the primitive product measure on $X\times Y$, and $\lambda_0^\#$ the subspace measure it induces on $A\times B$. Let $\tilde\lambda_0$ be the primitive product measure of $\mu_A$ and $\nu_B$ on $A\times B$. Show that $\tilde\lambda_0$ extends $\lambda_0^\#$. Show that if either (α) $A\in\Sigma$ and $B\in\mathrm T$ or (β) $A$ and $B$ can both be covered by sequences of sets of finite measure or (γ) $\mu$ and $\nu$ are both strictly localizable, then $\tilde\lambda_0=\lambda_0^\#$.
(o) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be any measure spaces, and $\lambda_0$ the primitive product measure on $X\times Y$. Show that $\lambda_0^*(A\times B)=\mu^*A\cdot\nu^*B$ for any $A\subseteq X$, $B\subseteq Y$.
(p) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be measure spaces, and $\hat\mu$ the completion of $\mu$. Show that $\mu$, $\nu$ and $\hat\mu$, $\nu$ have the same primitive product measures.
(q) Let $(X,\Sigma,\mu)$ be an atomless measure space, and $(Y,\mathrm T,\nu)$ any measure space. Show that the c.l.d. product measure on $X\times Y$ is atomless.
(r) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X\times Y$. (i) Show that if $\mu$ and $\nu$ are purely atomic, so is $\lambda$.
(ii) Show that if $\mu$ and $\nu$ are point-supported, so is $\lambda$.
(s) Let $(X,\Sigma,\mu)$ be a semi-finite measure space. Show that $\mu$ is atomless iff the diagonal $\{(x,x):x\in X\}$ is negligible for the c.l.d. product measure on $X\times X$.

251Y Further exercises (a) Let $X$, $Y$ be sets with σ-algebras of subsets $\Sigma$, $\mathrm T$. Suppose that $h:X\times Y\to\mathbb R$ is $\Sigma\widehat\otimes\mathrm T$-measurable and $\phi:X\to Y$ is $(\Sigma,\mathrm T)$-measurable (121Yb). Show that $x\mapsto h(x,\phi(x)):X\to\mathbb R$ is $\Sigma$-measurable.
(b) Let $(X,\Sigma,\mu)$ be a complete locally determined measure space with a subspace $A$ whose measure is not locally determined (see 216Xb). Set $Y=\{0\}$, $\nu Y=1$ and consider the c.l.d. product measures on $X\times Y$ and $A\times Y$; write $\Lambda$, $\tilde\Lambda$ for their domains. Show that $\tilde\Lambda$ properly includes $\{W\cap(A\times Y):W\in\Lambda\}$.
(c) Let $(X,\Sigma,\mu)$ be any measure space, $(Y,\mathrm T,\nu)$ an atomless measure space, and $f:X\to Y$ a $(\Sigma,\mathrm T)$-measurable function. Show that $\{(x,f(x)):x\in X\}$ is negligible for the c.l.d. product measure on $X\times Y$.

251 Notes and comments There are real difficulties in deciding which construction to declare as ‘the’ product of two arbitrary measures. My phrase ‘primitive product measure’, and notation $\lambda_0$, betray a bias; my own preference is for the c.l.d. product $\lambda$, for two principal reasons. The first is that $\lambda_0$ is likely to be ‘bad’, in particular, not semi-finite, even if $\mu$ and $\nu$ are ‘good’ (251Xc, 252Yf), while $\lambda$ inherits some of the most important properties of $\mu$ and $\nu$ (see 251N); the second is that in the case of topological measure spaces $X$ and $Y$, there is often a canonical topological measure on $X\times Y$, which is likely to be more closely related to $\lambda$ than to $\lambda_0$. But for elucidation of this point I must ask you to wait until §417 in Volume 4. It would be possible to remove the ‘primitive’ product measure entirely from the exposition, or at least to relegate it to the exercises. This is indeed what I expect to do in the rest of this treatise, since (in my view) all significant features of product measures on finitely many factors can be expressed in terms of the c.l.d. product measure.
For the first introduction to product measures, however, a direct approach to the c.l.d. product measure (through the description of λ* in 251O, for instance) is an uncomfortably large bite, and I have some sort of duty to present the most natural rival to the c.l.d. product measure prominently enough for you to judge for yourself whether I am right to dismiss it. There certainly are results associated with the primitive product measure (251Xm, 251Xo, 252Yc) which have an agreeable simplicity. The clash is avoided altogether, of course, if we specialize immediately to σ-finite spaces, in which the two constructions coincide (251K). But even this does not solve all problems. There is a popular alternative measure often called ‘the’ product measure: the restriction of λ0 to the σ-algebra $\Sigma\widehat{\otimes}\mathrm T$. (See, for instance, Halmos 50.) The advantage of this is that if a function f on X × Y is $\Sigma\widehat{\otimes}\mathrm T$-measurable, then x ↦ f(x, y) is Σ-measurable for every y ∈ Y. (This is because
$$\{W : W\subseteq X\times Y,\ \{x : (x,y)\in W\}\in\Sigma\text{ for every }y\in Y\}$$
is a σ-algebra of subsets of X × Y containing E × F for every E ∈ Σ, F ∈ T, and therefore including $\Sigma\widehat{\otimes}\mathrm T$.) The primary objection, to my mind, is that Lebesgue measure on $\mathbb R^2$ is no longer ‘the’ product of Lebesgue measure on R with itself. Generally, it is right to seek measures which measure as many sets as possible, and I prefer to face up to the technical problems (which I acknowledge are off-putting) by seeking appropriate definitions on the approach to major theorems, rather than rely on ad hoc fixes when the time comes to apply them. I omit further examples of product measures for the moment, because the investigation of particular examples will be much easier with the aid of results from the next section. Of course the leading example, and the one which should come always to mind in response to the words ‘product measure’, is Lebesgue measure on $\mathbb R^2$, the case r = s = 1 of 251M and 251Q.
For an indication of what can happen when one of the factors is not σ-finite, you could look ahead to 252K. I hope that you will see that the definition of the outer measure θ in 251A corresponds to the standard definition of Lebesgue outer measure, with ‘measurable rectangles’ E × F taking the place of intervals, and the functional E × F ↦ µE · νF taking the place of ‘length’ or ‘volume’ of an interval; moreover, thinking of E and F as intervals, there is an obvious relation between Lebesgue measure on $\mathbb R^2$ and the product measure on R × R. Of course an ‘obvious relationship’ is not the same thing as a proper theorem with exact hypotheses and conclusions, but Theorem 251M is clearly central. Long before that, however, there is
another parallel between the construction of 251A and that of Lebesgue measure. In both cases, the proof that we have an outer measure comes directly from the defining formula (in 113Yd I gave as an exercise a general result covering 251B), and consequently a very general construction can lead us to a measure. But the measure would be of far less interest and value if it did not measure, and measure correctly, the basic sets, in this case the measurable rectangles. Thus 251E corresponds to the theorem that intervals are Lebesgue measurable, with the right measure (114Db, 114F). This is the real key to the construction, and is one of the fundamental ideas of measure theory. Yet another parallel is in 251Xm; the outer measure defining the primitive product measure λ0 is exactly equal to the outer measure defined from λ0 . I described the corresponding phenomenon for Lebesgue measure in 132C. Any construction which claims the title ‘canonical’ must satisfy a variety of natural requirements; for instance, one expects the canonical bijection between X × Y and Y × X to be an isomorphism between the corresponding product measure spaces. ‘Commutativity’ of the product in this sense is I think obvious from the definitions in 251A-251C. It is obviously desirable – not, I think, obviously true – that the product should be ‘associative’ in that the canonical bijection between (X × Y ) × Z and X × (Y × Z) should also be an isomorphism between the corresponding products of product measures. This is in fact valid for both the primitive and c.l.d. product measures (251W, 251Xd-251Xf). Working through the classification of measure spaces presented in §211, we find that the primitive product measure λ0 of arbitrary factor measures µ, ν is complete, while the c.l.d. product measure λ is always complete and locally determined. λ0 may not be semi-finite, even if µ and ν are strictly localizable (252Yf); but λ will be strictly localizable if µ and ν are (251N). 
Of course this is associated with the fact that the c.l.d. product measure is distributive over direct sums (251Xi). If either µ or ν is atomless, so is λ (251Xq). Both λ and λ0 are σ-finite if µ and ν are (251K). It is possible for both µ and ν to be localizable but λ not (254U). At least if you have worked through Chapter 21, you have now done enough ‘pure’ measure theory for this kind of investigation, however straightforward, to raise a good many questions. Apart from direct sums, we also have the constructions of ‘completion’, ‘subspace’, ‘outer measure’ and (in particular) ‘c.l.d. version’ to integrate into the new ideas; I offer some results in 251S and 251Xj. Concerning subspaces, some possibly surprising difficulties arise. The problem is that the product measure on the product of two subspaces can have a larger domain than one might expect. I give a simple example in 251Yb and a more elaborate one in 254Ye. For strictly localizable spaces, there is no problem (251P); but no other criterion drawn from the list of properties considered in §251 seems adequate to remove the possibility of a disconcerting phenomenon.
252 Fubini’s theorem

Perhaps the most important feature of the concept of ‘product measure’ is the fact that we can use it to discuss repeated integrals. In this section I give versions of Fubini’s theorem and Tonelli’s theorem (252B, 252G) with a variety of corollaries, the most useful ones being versions for σ-finite spaces (252C, 252H). As applications I describe the relationship between integration and measuring ordinate sets (252N) and calculate the r-dimensional volume of a ball in $\mathbb R^r$ (252Q, 252Xh). I mention counter-examples showing the difficulties which can arise with non-σ-finite measures and non-integrable functions (252K-252L, 252Xf-252Xg).

252A Repeated integrals Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and f a real-valued function defined on a set dom f ⊆ X × Y. We can seek to form the repeated integral
$$\iint f(x,y)\,\nu(dy)\,\mu(dx) = \int\Bigl(\int f(x,y)\,\nu(dy)\Bigr)\mu(dx),$$
which should be interpreted as follows: set
$$D = \Bigl\{x : x\in X,\ \int f(x,y)\,\nu(dy)\text{ is defined in }[-\infty,\infty]\Bigr\},\qquad g(x) = \int f(x,y)\,\nu(dy)\text{ for }x\in D,$$
and then write $\iint f(x,y)\,\nu(dy)\,\mu(dx) = \int g(x)\,\mu(dx)$ if this is defined. Of course the subset of Y on which y ↦ f(x, y) is defined may vary with x, but it must always be conegligible, as must D. Similarly, exchanging the roles of X and Y, we can seek a repeated integral
$$\iint f(x,y)\,\mu(dx)\,\nu(dy) = \int\Bigl(\int f(x,y)\,\mu(dx)\Bigr)\nu(dy).$$
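When both factors are finite sets carrying point masses, every integral in 252A becomes a finite weighted sum, and the bookkeeping with D and g can be made completely explicit. The Python sketch below is only an illustration under that assumption: the helper names (`inner_integral`, `repeated_integral`, `swap`, `product_integral`) are hypothetical, weights are assumed finite and non-negative, and a pair (x, y) missing from the dictionary `f` stands for a point where f is undefined. Points of weight zero are negligible, so the sketch ignores them when deciding whether a function is defined almost everywhere.

```python
from fractions import Fraction

def inner_integral(f, x, nu):
    """g(x) = ∫ f(x, y) ν(dy), or None if f(x, ·) is not defined ν-a.e."""
    total = Fraction(0)
    for y, weight in nu.items():
        if weight == 0:
            continue  # {y} is ν-negligible, so f(x, y) may be undefined here
        if (x, y) not in f:
            return None  # f(x, ·) is undefined on a set of positive measure
        total += f[(x, y)] * weight
    return total

def repeated_integral(f, mu, nu):
    """∫∫ f(x, y) ν(dy) µ(dx): integrate g over D, provided D is µ-conegligible."""
    total = Fraction(0)
    for x, weight in mu.items():
        if weight == 0:
            continue  # x may lie outside D without harm
        g = inner_integral(f, x, nu)
        if g is None:
            return None  # D is not conegligible, so the repeated integral is undefined
        total += g * weight
    return total

def swap(f):
    """Exchange the roles of X and Y."""
    return {(y, x): value for (x, y), value in f.items()}

def product_integral(f, mu, nu):
    """∫ f dλ for the product of point masses, λ{(x, y)} = µ{x}·ν{y}."""
    total = Fraction(0)
    for x, mx in mu.items():
        for y, ny in nu.items():
            if mx * ny == 0:
                continue
            if (x, y) not in f:
                return None
            total += f[(x, y)] * mx * ny
    return total

# Example data: the point "c" is µ-negligible, and f is undefined at ("c", 0).
mu = {"a": Fraction(1, 2), "b": Fraction(3, 2), "c": Fraction(0)}
nu = {0: Fraction(2), 1: Fraction(1, 3)}
f = {("a", 0): 1, ("a", 1): 3, ("b", 0): -2, ("b", 1): 6, ("c", 1): 100}

print(repeated_integral(f, mu, nu))        # -3/2
print(repeated_integral(swap(f), nu, mu))  # -3/2
print(product_integral(f, mu, nu))         # -3/2
```

On finite spaces the three numbers agree whenever they are defined, since finite sums can be rearranged freely; the substance of 252B-252C is that the same agreement survives, under suitable hypotheses, for general measures.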
The point is that, under appropriate conditions on µ and ν, we can relate these repeated integrals to each other by connecting them both with the integral of f itself with respect to the product measure on X × Y. As will become apparent shortly, it is essential here to allow oneself to discuss the integral of a function which is not everywhere defined. It is of less importance whether one allows integrands and integrals to take infinite values, but for definiteness let me say that I shall be following the rules of 135F; that is, $\int f = \int f^+ - \int f^-$ provided that f is defined almost everywhere, takes values in [−∞, ∞] and is virtually measurable, and at most one of $\int f^+$, $\int f^-$ is infinite.

252B Theorem Let (X, Σ, µ) and (Y, T, ν) be measure spaces, with c.l.d. product (X × Y, Λ, λ) (251F). Suppose that ν is σ-finite and that µ is either strictly localizable or complete and locally determined. Let f be a [−∞, ∞]-valued function such that $\int f\,d\lambda$ is defined in [−∞, ∞]. Then $\iint f(x,y)\,\nu(dy)\,\mu(dx)$ is defined and is equal to $\int f\,d\lambda$.

proof The proof of this result involves substantial technical difficulties. If you have not seen these ideas before, you should almost certainly not go straight to the full generality of the version announced above. I will therefore start by writing out a proof in the case in which both µ and ν are totally finite; this is already lengthy enough. I will present it in such a way that only the central section (part (b) below) needs to be amended in the general case, and then, after completing the proof of the special case, I will give the alternative version of (b) which is required for the full result.

(a) Write $\mathcal L$ for the family of [0, ∞]-valued functions f such that $\int f\,d\lambda$ and $\iint f(x,y)\,\nu(dy)\,\mu(dx)$ are defined and equal. My aim is to show first that $f\in\mathcal L$ whenever f is non-negative and $\int f\,d\lambda$ is defined, and then to look at differences of functions in $\mathcal L$.
To prove that enough functions belong to $\mathcal L$, my strategy will be to start with ‘elementary’ functions and work outwards through progressively larger classes. It is most efficient to begin by describing ways of building new members of $\mathcal L$ from old, as follows.

(i) $f_1+f_2\in\mathcal L$ for all $f_1, f_2\in\mathcal L$, and $cf\in\mathcal L$ for all $f\in\mathcal L$, $c\in[0,\infty[$; this is because
$$\int(f_1+f_2)(x,y)\,\nu(dy) = \int f_1(x,y)\,\nu(dy)+\int f_2(x,y)\,\nu(dy),\qquad\int(cf)(x,y)\,\nu(dy) = c\int f(x,y)\,\nu(dy)$$
whenever the right-hand sides are defined, which we are supposing to be the case for almost every x, so that
$$\iint(f_1+f_2)(x,y)\,\nu(dy)\,\mu(dx) = \iint f_1(x,y)\,\nu(dy)\,\mu(dx)+\iint f_2(x,y)\,\nu(dy)\,\mu(dx) = \int f_1\,d\lambda+\int f_2\,d\lambda = \int(f_1+f_2)\,d\lambda,$$
$$\iint(cf)(x,y)\,\nu(dy)\,\mu(dx) = c\iint f(x,y)\,\nu(dy)\,\mu(dx) = c\int f\,d\lambda = \int(cf)\,d\lambda.$$
(ii) If $\langle f_n\rangle_{n\in\mathbb N}$ is a sequence in $\mathcal L$ such that $f_n(x,y)\le f_{n+1}(x,y)$ whenever $n\in\mathbb N$ and $(x,y)\in\operatorname{dom}f_n\cap\operatorname{dom}f_{n+1}$, then $\sup_{n\in\mathbb N}f_n\in\mathcal L$. P P Set $f = \sup_{n\in\mathbb N}f_n$; for $x\in X$, $n\in\mathbb N$ set $g_n(x) = \int f_n(x,y)\,\nu(dy)$ when the integral is defined in [0, ∞]. Since here I am allowing ∞ as a value of a function, it is natural to regard f as defined on $\bigcap_{n\in\mathbb N}\operatorname{dom}f_n$. By B.Levi’s theorem, $\int f\,d\lambda = \sup_{n\in\mathbb N}\int f_n\,d\lambda$; write u for this common value in [0, ∞]. Next, because $f_n\le f_{n+1}$ wherever both are defined, $g_n\le g_{n+1}$ wherever both are defined, for each n; we are supposing that $f_n\in\mathcal L$, so $g_n$ is defined µ-almost everywhere for each n, and
$$\sup_{n\in\mathbb N}\int g_n\,d\mu = \sup_{n\in\mathbb N}\int f_n\,d\lambda = u.$$
By B.Levi’s theorem again, $\int g\,d\mu = u$, where $g = \sup_{n\in\mathbb N}g_n$. Now take any $x\in\bigcap_{n\in\mathbb N}\operatorname{dom}g_n$, and consider the functions $f_{xn}$ on Y, setting $f_{xn}(y) = f_n(x,y)$ whenever this is defined. Each $f_{xn}$ has an integral in [0, ∞], $f_{xn}(y)\le f_{x,n+1}(y)$ whenever both are defined, and $\sup_{n\in\mathbb N}\int f_{xn}\,d\nu = g(x)$; so, using B.Levi’s theorem for a third time, $\int(\sup_{n\in\mathbb N}f_{xn})\,d\nu$ is defined and equal to g(x), that is,
$$\int f(x,y)\,\nu(dy) = g(x).$$
This is true for almost every x, so
$$\iint f(x,y)\,\nu(dy)\,\mu(dx) = \int g\,d\mu = u = \int f\,d\lambda.$$
Thus $f\in\mathcal L$, as claimed. Q Q

(iii) The expression of the ideas in the next section of the proof will go more smoothly if I introduce another term. Write $\mathcal W$ for $\{W : W\subseteq X\times Y,\ \chi W\in\mathcal L\}$. Then

(α) if $W, W'\in\mathcal W$ and $W\cap W' = \emptyset$, then $W\cup W'\in\mathcal W$ by (i), because $\chi(W\cup W') = \chi W+\chi W'$;

(β) $\bigcup_{n\in\mathbb N}W_n\in\mathcal W$ whenever $\langle W_n\rangle_{n\in\mathbb N}$ is a non-decreasing sequence in $\mathcal W$ and $\sup_{n\in\mathbb N}\lambda W_n$ is finite, because $\langle\chi W_n\rangle_{n\in\mathbb N}\uparrow\chi\bigl(\bigcup_{n\in\mathbb N}W_n\bigr)$, and we can use (ii).

It is also helpful to note that, for any $W\subseteq X\times Y$ and any $x\in X$, $\int\chi W(x,y)\,\nu(dy) = \nu W[\{x\}]$, at least whenever $W[\{x\}] = \{y : (x,y)\in W\}$ is measured by ν. Moreover, because λ is complete, a set $W\subseteq X\times Y$ belongs to Λ iff χW is λ-virtually measurable iff $\int\chi W\,d\lambda$ is defined in [0, ∞], and in this case $\lambda W = \int\chi W\,d\lambda$.

(iv) Finally, we need to observe that, in appropriate circumstances, the difference of two members of $\mathcal W$ will belong to $\mathcal W$: if $W, W'\in\mathcal W$, $W\subseteq W'$ and $\lambda W'<\infty$, then $W'\setminus W\in\mathcal W$. P P We are supposing that $g(x) = \int\chi W(x,y)\,\nu(dy)$ and $g'(x) = \int\chi W'(x,y)\,\nu(dy)$ are defined for almost every x, and that $\int g\,d\mu = \lambda W$, $\int g'\,d\mu = \lambda W'$. Because $\lambda W'$ is finite, g′ must be finite almost everywhere, and
$$D = \{x : x\in\operatorname{dom}g\cap\operatorname{dom}g',\ g'(x)<\infty\}$$
is conegligible. Now, for any $x\in D$, both g(x) and g′(x) are finite, so $y\mapsto\chi(W'\setminus W)(x,y) = \chi W'(x,y)-\chi W(x,y)$ is the difference of two integrable functions, and
$$\int\chi(W'\setminus W)(x,y)\,\nu(dy) = \int\bigl(\chi W'(x,y)-\chi W(x,y)\bigr)\nu(dy) = \int\chi W'(x,y)\,\nu(dy)-\int\chi W(x,y)\,\nu(dy) = g'(x)-g(x).$$
Accordingly
$$\iint\chi(W'\setminus W)(x,y)\,\nu(dy)\,\mu(dx) = \int\bigl(g'(x)-g(x)\bigr)\mu(dx) = \lambda W'-\lambda W = \lambda(W'\setminus W),$$
and $W'\setminus W$ belongs to $\mathcal W$. Q Q (Of course the argument just above can be shortened by a few words if we allow ourselves to assume that µ and ν are totally finite, since then g(x) and g′(x) will be finite whenever they are defined; but the key idea, that the difference of integrable functions is integrable, is unchanged.)

(b) Now let us examine the class $\mathcal W$, assuming that µ and ν are totally finite.

(i) $E\times F\in\mathcal W$ for all $E\in\Sigma$, $F\in\mathrm T$. P P $\lambda(E\times F) = \mu E\cdot\nu F$ (251J), and
$$\int\chi(E\times F)(x,y)\,\nu(dy) = \nu F\,\chi E(x)$$
for each x, so
$$\iint\chi(E\times F)(x,y)\,\nu(dy)\,\mu(dx) = \int\nu F\,\chi E(x)\,\mu(dx) = \mu E\cdot\nu F = \lambda(E\times F) = \int\chi(E\times F)\,d\lambda.$$
Q Q
(ii) Let $\mathcal E$ be $\{E\times F : E\in\Sigma,\ F\in\mathrm T\}$. Then $\mathcal E$ is closed under finite intersections (because $(E\times F)\cap(E'\times F') = (E\cap E')\times(F\cap F')$) and is included in $\mathcal W$. In particular, $X\times Y\in\mathcal W$. But this, together with (a-iv) and (a-iii-β) above, means that $\mathcal W$ is a Dynkin class (definition: 136A), so includes the σ-algebra of subsets of X × Y generated by $\mathcal E$, by the Monotone Class Theorem (136B); that is, $\mathcal W\supseteq\Sigma\widehat{\otimes}\mathrm T$ (definition: 251D).

(iii) Next, $W\in\mathcal W$ whenever $W\subseteq X\times Y$ is λ-negligible. P P By 251Ib, there is a $V\in\Sigma\widehat{\otimes}\mathrm T$ such that $V\subseteq(X\times Y)\setminus W$ and $\lambda V = \lambda((X\times Y)\setminus W)$. Because $\lambda(X\times Y) = \mu X\cdot\nu Y$ is finite, $V' = (X\times Y)\setminus V$ is λ-negligible, and we have $W\subseteq V'\in\Sigma\widehat{\otimes}\mathrm T$. Consequently, since $V'\in\mathcal W$ by (ii),
$$0 = \lambda V' = \iint\chi V'(x,y)\,\nu(dy)\,\mu(dx).$$
But this means that
$$D = \Bigl\{x : \int\chi V'(x,y)\,\nu(dy)\text{ is defined and equal to }0\Bigr\}$$
is conegligible. If $x\in D$, then we must have $\chi V'(x,y) = 0$ for ν-almost every y, that is, $V'[\{x\}]$ is negligible; in which case $W[\{x\}]\subseteq V'[\{x\}]$ is also negligible, and $\int\chi W(x,y)\,\nu(dy) = 0$. And this is true for every $x\in D$, so $\int\chi W(x,y)\,\nu(dy)$ is defined and equal to 0 for almost every x, and
$$\iint\chi W(x,y)\,\nu(dy)\,\mu(dx) = 0 = \lambda W,$$
as required. Q Q

(iv) It follows that $\Lambda\subseteq\mathcal W$. P P If $W\in\Lambda$, then, by 251Ib again, there is a $V\in\Sigma\widehat{\otimes}\mathrm T$ such that $V\subseteq W$ and $\lambda V = \lambda W$, so that $\lambda(W\setminus V) = 0$. Now $V\in\mathcal W$ by (ii) and $W\setminus V\in\mathcal W$ by (iii), so $W\in\mathcal W$ by (a-iii-α). Q Q

(c) I return to the class $\mathcal L$.

(i) If $f\in\mathcal L$ and g is a [0, ∞]-valued function defined and equal to f λ-a.e., then $g\in\mathcal L$. P P Set
$$W = (X\times Y)\setminus\{(x,y) : (x,y)\in\operatorname{dom}f\cap\operatorname{dom}g,\ f(x,y) = g(x,y)\},$$
so that λW = 0. (Remember that λ is complete.) By (b), $\iint\chi W(x,y)\,\nu(dy)\,\mu(dx) = 0$, that is, $W[\{x\}]$ is ν-negligible for µ-almost every x. Let D be $\{x : x\in X,\ W[\{x\}]\text{ is $\nu$-negligible}\}$. Then D is µ-conegligible. If $x\in D$, then
$$W[\{x\}] = Y\setminus\{y : (x,y)\in\operatorname{dom}f\cap\operatorname{dom}g,\ f(x,y) = g(x,y)\}$$
is negligible, so that $\int f(x,y)\,\nu(dy) = \int g(x,y)\,\nu(dy)$ if either is defined. Thus the functions
$$x\mapsto\int f(x,y)\,\nu(dy),\qquad x\mapsto\int g(x,y)\,\nu(dy)$$
are equal almost everywhere, and
$$\iint g(x,y)\,\nu(dy)\,\mu(dx) = \iint f(x,y)\,\nu(dy)\,\mu(dx) = \int f\,d\lambda = \int g\,d\lambda,$$
so that $g\in\mathcal L$. Q Q

(ii) Now let f be any non-negative function such that $\int f\,d\lambda$ is defined in [0, ∞]. Then $f\in\mathcal L$. P P For $k, n\in\mathbb N$ set
$$W_{nk} = \{(x,y) : (x,y)\in\operatorname{dom}f,\ f(x,y)\ge 2^{-n}k\}.$$
Because λ is complete and f is λ-virtually measurable and dom f is conegligible, every $W_{nk}$ belongs to Λ, so $\chi W_{nk}\in\mathcal L$, by (b). Set $f_n = \sum_{k=1}^{4^n}2^{-n}\chi W_{nk}$, so that
$$f_n(x,y) = \begin{cases}2^{-n}k & \text{if } k\le 4^n\text{ and }2^{-n}k\le f(x,y)<2^{-n}(k+1),\\ 2^n & \text{if } f(x,y)\ge 2^n,\\ 0 & \text{if } (x,y)\in(X\times Y)\setminus\operatorname{dom}f.\end{cases}$$
By (a-i), $f_n\in\mathcal L$ for every $n\in\mathbb N$, while $\langle f_n\rangle_{n\in\mathbb N}$ is non-decreasing, so $f' = \sup_{n\in\mathbb N}f_n\in\mathcal L$, by (a-ii). But $f = f'$ a.e., so $f\in\mathcal L$, by (i) just above. Q Q

(iii) Finally, let f be any [−∞, ∞]-valued function such that $\int f\,d\lambda$ is defined in [−∞, ∞]. Then $\int f^+d\lambda$, $\int f^-d\lambda$ are both defined and at most one is infinite. By (ii), both $f^+$ and $f^-$ belong to $\mathcal L$. Set
$$g(x) = \int f^+(x,y)\,\nu(dy),\qquad h(x) = \int f^-(x,y)\,\nu(dy)$$
whenever these are defined; then $\int g\,d\mu = \int f^+d\lambda$ and $\int h\,d\mu = \int f^-d\lambda$ are both defined in [0, ∞].

Suppose first that $\int f^-d\lambda$ is finite. Then $\int h\,d\mu$ is finite, so h must be finite µ-almost everywhere; set
$$D = \{x : x\in\operatorname{dom}g\cap\operatorname{dom}h,\ h(x)<\infty\}.$$
For any $x\in D$, $\int f^+(x,y)\,\nu(dy)$ and $\int f^-(x,y)\,\nu(dy)$ are defined in [0, ∞], and the latter is finite; so
$$\int f(x,y)\,\nu(dy) = \int f^+(x,y)\,\nu(dy)-\int f^-(x,y)\,\nu(dy) = g(x)-h(x)$$
is defined in ]−∞, ∞]. Because D is conegligible,
$$\iint f(x,y)\,\nu(dy)\,\mu(dx) = \int\bigl(g(x)-h(x)\bigr)\mu(dx) = \int g\,d\mu-\int h\,d\mu = \int f^+d\lambda-\int f^-d\lambda = \int f\,d\lambda,$$
as required. Thus we have the result when $\int f^-d\lambda$ is finite. Similarly, or by applying the argument above to −f, $\iint f(x,y)\,\nu(dy)\,\mu(dx) = \int f\,d\lambda$ if $\int f^+d\lambda$ is finite. Thus the theorem is proved, at least when µ and ν are totally finite.

(b*) The only point in the argument above where we needed to know anything special about the measures µ and ν was in part (b), when showing that $\Lambda\subseteq\mathcal W$. I now return to this point under the hypotheses of the theorem as stated, that ν is σ-finite and µ is either strictly localizable or complete and locally determined.

(i) It will be helpful to note that the completion $\hat\mu$ of µ (212C) is identical with its c.l.d. version $\tilde\mu$ (213E). P P If µ is strictly localizable, then $\hat\mu = \tilde\mu$ by 213Ha. If µ is complete and locally determined, then $\hat\mu = \mu = \tilde\mu$ (212D, 213Hf). Q Q

(ii) Write $\Sigma^f = \{G : G\in\Sigma,\ \mu G<\infty\}$, $\mathrm T^f = \{H : H\in\mathrm T,\ \nu H<\infty\}$. For $G\in\Sigma^f$, $H\in\mathrm T^f$ let $\mu_G$, $\nu_H$ and $\lambda_{G\times H}$ be the subspace measures on G, H and G × H respectively; then $\lambda_{G\times H}$ is the c.l.d. product measure of $\mu_G$ and $\nu_H$ (251P(ii-α)). Now $W\cap(G\times H)\in\mathcal W$ for every $W\in\Lambda$. P P $W\cap(G\times H)$ belongs to the domain of $\lambda_{G\times H}$, so by (b) of this proof, applied to the totally finite measures $\mu_G$ and $\nu_H$,
$$\begin{aligned}\lambda(W\cap(G\times H)) &= \lambda_{G\times H}(W\cap(G\times H))\\ &= \int_G\int_H\chi(W\cap(G\times H))(x,y)\,\nu_H(dy)\,\mu_G(dx)\\ &= \int_G\int_Y\chi(W\cap(G\times H))(x,y)\,\nu(dy)\,\mu_G(dx)\end{aligned}$$
(because $\chi(W\cap(G\times H))(x,y) = 0$ if $y\in Y\setminus H$, so we can use 131E)
$$= \int_X\int_Y\chi(W\cap(G\times H))(x,y)\,\nu(dy)\,\mu(dx)$$
by 131E again, because $\int_Y\chi(W\cap(G\times H))(x,y)\,\nu(dy) = 0$ if $x\in X\setminus G$. So $W\cap(G\times H)\in\mathcal W$. Q Q
(iii) In fact, $W\in\mathcal W$ for every $W\in\Lambda$. P P Remember that we are supposing that ν is σ-finite. Let $\langle Y_n\rangle_{n\in\mathbb N}$ be a non-decreasing sequence in $\mathrm T^f$ covering Y, and for each $n\in\mathbb N$ set
$$W_n = W\cap(X\times Y_n),\qquad g_n(x) = \int\chi W_n(x,y)\,\nu(dy)$$
whenever this is defined. For any $G\in\Sigma^f$,
$$\int_G g_n\,d\mu = \iint\chi(W\cap(G\times Y_n))(x,y)\,\nu(dy)\,\mu(dx)$$
is defined and equal to $\lambda(W\cap(G\times Y_n))$, by (ii). But this means, first, that $G\setminus\operatorname{dom}g_n$ is negligible, that is, that $\hat\mu(G\setminus\operatorname{dom}g_n) = 0$. Since this is so whenever µG is finite, $\tilde\mu(X\setminus\operatorname{dom}g_n) = 0$, and $g_n$ is defined $\tilde\mu$-a.e.; but $\tilde\mu = \hat\mu$, so $g_n$ is defined $\hat\mu$-a.e., that is, µ-a.e. (212Eb). Next, if we set
$$E_{na} = \{x : x\in\operatorname{dom}g_n,\ g_n(x)\ge a\}$$
for $a\in\mathbb R$, then $E_{na}\cap G\in\hat\Sigma$ whenever $G\in\Sigma^f$, where $\hat\Sigma$ is the domain of $\hat\mu$; by the definition in 213D, $E_{na}$ is measured by $\tilde\mu = \hat\mu$. As a is arbitrary, $g_n$ is µ-virtually measurable (212Fa). We can therefore speak of $\int g_n\,d\mu$. Now
$$\begin{aligned}\iint\chi W_n(x,y)\,\nu(dy)\,\mu(dx) &= \int g_n\,d\mu = \sup_{G\in\Sigma^f}\int_G g_n\,d\mu\quad\text{(213B, because $\mu$ is semi-finite)}\\ &= \sup_{G\in\Sigma^f}\lambda(W\cap(G\times Y_n)) = \lambda(W\cap(X\times Y_n))\end{aligned}$$
by the definition in 251F. Thus $W\cap(X\times Y_n)\in\mathcal W$. This is true for every $n\in\mathbb N$. Because $\langle Y_n\rangle_{n\in\mathbb N}\uparrow Y$, $W\in\mathcal W$, by (a-iii-β). Q Q

(iv) We can therefore return to part (c) of the argument above and conclude as before.

252C The theorem above is of course asymmetric, in that different hypotheses are imposed on the two factor measures µ and ν. If we want a ‘symmetric’ theorem we have to suppose that they are both σ-finite, as follows.

Corollary Let (X, Σ, µ) and (Y, T, ν) be two σ-finite measure spaces, and λ the c.l.d. product measure on X × Y. If f is λ-integrable, then $\iint f(x,y)\,\nu(dy)\,\mu(dx)$ and $\iint f(x,y)\,\mu(dx)\,\nu(dy)$ are defined, finite and equal.

proof Since µ and ν are surely strictly localizable (211Lc), we can apply 252B from either side to conclude that
$$\iint f(x,y)\,\nu(dy)\,\mu(dx) = \int f\,d\lambda = \iint f(x,y)\,\mu(dx)\,\nu(dy).$$
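The integrability hypothesis in 252C cannot simply be dropped. A standard discrete illustration (not taken from the text) uses counting measure on N for both factors, so that both spaces are σ-finite and λ is counting measure on N × N, together with the hypothetical function a(m, n) = 1 if n = m, −1 if n = m + 1 and 0 otherwise. Each row and each column has at most two non-zero entries, so every inner sum is exact; but Σ|a(m, n)| = ∞, so a is not λ-integrable, and the two repeated sums disagree. The Python sketch below checks this, truncating only the outer sums at an illustrative cut-off `N` (every row or column beyond it sums to 0, so the truncation does not change either answer).

```python
def a(m, n):
    """Hypothetical example: 1 on the diagonal, -1 just above it, 0 elsewhere."""
    if n == m:
        return 1
    if n == m + 1:
        return -1
    return 0

N = 50  # outer cut-off; every row or column beyond it sums to 0 anyway

# Row m has non-zero entries only at n = m and n = m + 1, so n < m + 2 suffices.
row_sums = [sum(a(m, n) for n in range(m + 2)) for m in range(N)]
# Column n has non-zero entries only at m = n and m = n - 1, so m < n + 1 suffices.
col_sums = [sum(a(m, n) for m in range(n + 1)) for n in range(N)]

print(sum(row_sums))  # sum over m of (sum over n): 0
print(sum(col_sums))  # sum over n of (sum over m): 1

# The absolute values are not summable: this partial sum equals 2N.
abs_partial = sum(abs(a(m, n)) for m in range(N) for n in range(N + 1))
print(abs_partial)  # 100
```

Because both repeated sums are built from exact finite inner sums, the discrepancy between 0 and 1 is genuine; it is possible only because a fails the integrability hypothesis of 252C.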
252D So many applications of Fubini’s theorem are to characteristic functions that I take a few lines to spell out the form which 252B takes in this case, as in parts (b)-(b*) of the proof there.

Corollary Let (X, Σ, µ) and (Y, T, ν) be measure spaces and λ the c.l.d. product measure on X × Y. Suppose that ν is σ-finite and that µ is either strictly localizable or complete and locally determined.

(i) If $W\in\operatorname{dom}\lambda$, then $\int\nu^*W[\{x\}]\,\mu(dx)$ is defined in [0, ∞] and equal to λW.

(ii) If ν is complete, we can write $\int\nu W[\{x\}]\,\mu(dx)$ in place of $\int\nu^*W[\{x\}]\,\mu(dx)$.

proof The point is just that $\int\chi W(x,y)\,\nu(dy) = \hat\nu W[\{x\}]$ whenever either is defined, where $\hat\nu$ is the completion of ν (212Fb). Now 252B tells us that
$$\lambda W = \iint\chi W(x,y)\,\nu(dy)\,\mu(dx) = \int\hat\nu W[\{x\}]\,\mu(dx).$$
We always have $\hat\nu W[\{x\}] = \nu^*W[\{x\}]$ whenever $\hat\nu$ measures $W[\{x\}]$, by the definition of $\hat\nu$ (212C); and if ν is complete, then $\hat\nu = \nu$, so $\lambda W = \int\nu W[\{x\}]\,\mu(dx)$.

252E Corollary Let (X, Σ, µ) and (Y, T, ν) be measure spaces, with c.l.d. product (X × Y, Λ, λ). Suppose that ν is σ-finite and that µ is either strictly localizable or complete and locally determined. Then if f is a Λ-measurable real-valued function defined on a subset of X × Y, y ↦ f(x, y) is ν-virtually measurable for µ-almost every x ∈ X.

proof Let $\tilde f$ be a Λ-measurable extension of f to a real-valued function defined everywhere on X × Y (121I), and set
$$\tilde f_x(y) = \tilde f(x,y)\text{ for all }x\in X,\ y\in Y,\qquad D = \{x : x\in X,\ \tilde f_x\text{ is $\nu$-virtually measurable}\}.$$
If $G\in\Sigma$ and $\mu G<\infty$, then $G\setminus D$ is negligible. P P Let $\langle Y_n\rangle_{n\in\mathbb N}$ be a non-decreasing sequence of sets of finite measure covering Y, and set
$$\tilde f_n(x,y) = \begin{cases}\tilde f(x,y) & \text{if } x\in G,\ y\in Y_n\text{ and }|\tilde f(x,y)|\le n,\\ 0 & \text{for other }(x,y)\in X\times Y.\end{cases}$$
Then each $\tilde f_n$ is λ-integrable, being bounded and Λ-measurable and zero off $G\times Y_n$. Consequently, setting $\tilde f_{nx}(y) = \tilde f_n(x,y)$, the repeated integral
$$\int\Bigl(\int\tilde f_{nx}\,d\nu\Bigr)\mu(dx)$$
exists and is equal to $\int\tilde f_n\,d\lambda$.
But this surely means that $\tilde f_{nx}$ is ν-integrable, therefore ν-virtually measurable, for almost every $x\in X$. Set $D_n = \{x : x\in X,\ \tilde f_{nx}\text{ is $\nu$-virtually measurable}\}$; then every $D_n$ is µ-conegligible, so $\bigcap_{n\in\mathbb N}D_n$ is conegligible. But for any $x\in G\cap\bigcap_{n\in\mathbb N}D_n$, $\tilde f_x = \lim_{n\to\infty}\tilde f_{nx}$ is ν-virtually measurable. Thus $G\setminus D\subseteq X\setminus\bigcap_{n\in\mathbb N}D_n$ is negligible. Q Q

This is true whenever µG < ∞. By 213J, because µ is either strictly localizable or complete and locally determined, X \ D is negligible and D is conegligible. But, for any x ∈ D, y ↦ f(x, y) is a restriction of $\tilde f_x$ and must be ν-virtually measurable.

252F As a further corollary we can get some useful information about the c.l.d. product measure for arbitrary measure spaces.

Corollary Let (X, Σ, µ) and (Y, T, ν) be two measure spaces, λ the c.l.d. product measure on X × Y, and Λ its domain. Let W ∈ Λ be such that the vertical section W[{x}] is ν-negligible for µ-almost every x ∈ X. Then λW = 0.

proof Take E ∈ Σ, F ∈ T of finite measure. Let $\lambda_{E\times F}$ be the subspace measure on E × F. By 251P(ii-α), this is just the product of the subspace measures $\mu_E$ and $\nu_F$. We know that W ∩ (E × F) is measured by $\lambda_{E\times F}$. At the same time, the vertical section $(W\cap(E\times F))[\{x\}] = W[\{x\}]\cap F$ is $\nu_F$-negligible for $\mu_E$-almost every $x\in E$. Applying 252B to $\mu_E$ and $\nu_F$ and $\chi(W\cap(E\times F))$,
$$\lambda(W\cap(E\times F)) = \lambda_{E\times F}(W\cap(E\times F)) = \int\nu_F(W[\{x\}]\cap F)\,\mu_E(dx) = 0.$$
But looking at the definition in 251F, we see that this means that λW = 0, as claimed.

252G Theorem 252B and its corollaries depend on the factor measures µ and ν belonging to restricted classes. There is a partial result which applies to all c.l.d. product measures, as follows.

Tonelli’s theorem Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and (X × Y, Λ, λ) their c.l.d. product. Let f be a Λ-measurable [−∞, ∞]-valued function defined on a member of Λ, and suppose that either $\iint|f(x,y)|\,\mu(dx)\,\nu(dy)$ or $\iint|f(x,y)|\,\nu(dy)\,\mu(dx)$ exists in R. Then f is λ-integrable.

proof Because the construction of the product measure is symmetric in the two factors, it is enough to consider the case in which $\iint|f(x,y)|\,\nu(dy)\,\mu(dx)$ is defined and finite, as the same ideas will surely deal with the other case also.

(a) The first step is to check that f is defined and finite λ-a.e. P P Set W = {(x, y) : (x, y) ∈ dom f, f(x, y) is finite}. Then W ∈ Λ. The hypothesis

‘$\iint|f(x,y)|\,\nu(dy)\,\mu(dx)$ is defined and finite’

includes the assertion

‘$\int|f(x,y)|\,\nu(dy)$ is defined and finite for µ-almost every x’,

which implies that for µ-almost every x, f(x, y) is defined and finite for ν-almost every y; that is, that for µ-almost every x, W[{x}] is ν-conegligible. But by 252F this implies that (X × Y) \ W is λ-negligible, as required. Q Q

(b) Let h be any non-negative λ-simple function such that h ≤ |f| λ-a.e. Then $\int h\,d\lambda$ cannot be greater than $\iint|f(x,y)|\,\nu(dy)\,\mu(dx)$. P P Set
$$W = \{(x,y) : (x,y)\in\operatorname{dom}f,\ h(x,y)\le|f(x,y)|\},\qquad h' = h\times\chi W;$$
then h′ is a simple function and h′ = h a.e. Express h′ as $\sum_{i=0}^n a_i\chi W_i$ where $a_i\ge 0$ and $\lambda W_i<\infty$ for each i. Let ε > 0. For each i ≤ n there are $E_i\in\Sigma$, $F_i\in\mathrm T$ such that $\mu E_i<\infty$, $\nu F_i<\infty$ and
$$\lambda(W_i\cap(E_i\times F_i))\ge\lambda W_i-\varepsilon.$$
Set $E = \bigcup_{i\le n}E_i$ and $F = \bigcup_{i\le n}F_i$. Consider the subspace measures $\mu_E$ and $\nu_F$ and their product $\lambda_{E\times F}$ on E × F; then $\lambda_{E\times F}$ is the subspace measure on E × F defined from λ (251P). Accordingly, applying 252B to the product $\mu_E\times\nu_F$,
$$\int_{E\times F}h'\,d\lambda = \int_{E\times F}h'\,d\lambda_{E\times F} = \int_E\int_F h'(x,y)\,\nu_F(dy)\,\mu_E(dx).$$
For any x, we know that $h'(x,y)\le|f(x,y)|$ whenever f(x, y) is defined. So we can be sure that
$$\int_F h'(x,y)\,\nu_F(dy) = \int h'(x,y)\chi F(y)\,\nu(dy)\le\int|f(x,y)|\,\nu(dy)$$
at least whenever $\int_F h'(x,y)\,\nu_F(dy)$ and $\int|f(x,y)|\,\nu(dy)$ are both defined, which is the case for almost every x ∈ E. Consequently
$$\int_{E\times F}h'\,d\lambda = \int_E\int_F h'(x,y)\,\nu_F(dy)\,\mu_E(dx)\le\int_E\int|f(x,y)|\,\nu(dy)\,\mu(dx)\le\iint|f(x,y)|\,\nu(dy)\,\mu(dx).$$
On the other hand,
$$\int h'\,d\lambda-\int_{E\times F}h'\,d\lambda = \sum_{i=0}^n a_i\lambda(W_i\setminus(E\times F))\le\sum_{i=0}^n a_i\lambda(W_i\setminus(E_i\times F_i))\le\varepsilon\sum_{i=0}^n a_i.$$
So
$$\int h\,d\lambda = \int h'\,d\lambda\le\iint|f(x,y)|\,\nu(dy)\,\mu(dx)+\varepsilon\sum_{i=0}^n a_i.$$
As ε is arbitrary, $\int h\,d\lambda\le\iint|f(x,y)|\,\nu(dy)\,\mu(dx)$, as claimed. Q Q
(c) This is true whenever h is a λ-simple function less than or equal to |f| λ-a.e. But |f| is Λ-measurable and λ is semi-finite (251Ic), so this is enough to ensure that |f| is λ-integrable (213B), which (because f is supposed to be Λ-measurable) in turn implies that f is λ-integrable.

252H Corollary Let (X, Σ, µ) and (Y, T, ν) be σ-finite measure spaces, λ the c.l.d. product measure on X × Y, and Λ its domain. Let f be a Λ-measurable real-valued function defined on a member of Λ. Then if one of
$$\int_{X\times Y}|f(x,y)|\,\lambda(d(x,y)),\qquad\int_Y\int_X|f(x,y)|\,\mu(dx)\,\nu(dy),\qquad\int_X\int_Y|f(x,y)|\,\nu(dy)\,\mu(dx)$$
exists in R, so do the other two, and in this case
$$\int_{X\times Y}f(x,y)\,\lambda(d(x,y)) = \int_Y\int_X f(x,y)\,\mu(dx)\,\nu(dy) = \int_X\int_Y f(x,y)\,\nu(dy)\,\mu(dx).$$

proof (a) Suppose that $\int|f|\,d\lambda$ is finite. Because both µ and ν are σ-finite, 252B tells us that
$$\iint|f(x,y)|\,\mu(dx)\,\nu(dy),\qquad\iint|f(x,y)|\,\nu(dy)\,\mu(dx)$$
both exist and are equal to $\int|f|\,d\lambda$, while
$$\iint f(x,y)\,\mu(dx)\,\nu(dy),\qquad\iint f(x,y)\,\nu(dy)\,\mu(dx)$$
both exist and are equal to $\int f\,d\lambda$.

(b) Now suppose that $\iint|f(x,y)|\,\nu(dy)\,\mu(dx)$ exists in R. Then 252G tells us that |f| is λ-integrable, so we can use (a) to complete the argument. Exchanging the coordinates, the same argument applies if $\iint|f(x,y)|\,\mu(dx)\,\nu(dy)$ exists in R.

252I Corollary Let (X, Σ, µ) and (Y, T, ν) be measure spaces, λ the c.l.d. product measure on X × Y, and Λ its domain. Take W ∈ Λ. If either of the integrals
$$\int\nu^*W[\{x\}]\,\mu(dx),\qquad\int\mu^*W^{-1}[\{y\}]\,\nu(dy)$$
exists and is finite, then λW < ∞.

proof Apply 252G with f = χW, remembering that
$$\nu^*W[\{x\}] = \int\chi W(x,y)\,\nu(dy),\qquad\mu^*W^{-1}[\{y\}] = \int\chi W(x,y)\,\mu(dx)$$
whenever the integrals are defined, as in the proof of 252D.

252J Remarks 252H is the basic form of Fubini’s theorem; it is not a coincidence that most authors avoid non-σ-finite spaces in this context. The next two examples exhibit some of the difficulties which can arise if we leave the familiar territory of more-or-less Borel measurable functions on σ-finite spaces. The first is a classic.

252K Example Let (X, Σ, µ) be [0, 1] with Lebesgue measure, and let (Y, T, ν) be [0, 1] with counting measure.

(a) Consider the set W = {(t, t) : t ∈ [0, 1]} ⊆ X × Y. We observe that W is expressible as
$$\bigcap_{n\in\mathbb N}\bigcup_{k=0}^{n}\Bigl[\frac{k}{n+1},\frac{k+1}{n+1}\Bigr]\times\Bigl[\frac{k}{n+1},\frac{k+1}{n+1}\Bigr]\in\Sigma\widehat{\otimes}\mathrm T.$$
If we look at the sections $W^{-1}[\{t\}] = W[\{t\}] = \{t\}$ for t ∈ [0, 1], we have
$$\iint\chi W(x,y)\,\mu(dx)\,\nu(dy) = \int\mu W^{-1}[\{y\}]\,\nu(dy) = \int 0\,\nu(dy) = 0,$$
$$\iint\chi W(x,y)\,\nu(dy)\,\mu(dx) = \int\nu W[\{x\}]\,\mu(dx) = \int 1\,\mu(dx) = 1,$$
so the two repeated integrals differ. It is therefore not generally possible to reverse the order of repeated integration, even for a non-negative measurable function in which both repeated integrals exist and are finite.

(b) Because the set W of part (a) actually belongs to $\Sigma\widehat{\otimes}\mathrm T$, we know that it is measured by the c.l.d. product measure λ, and 252F (applied with the coordinates reversed) tells us that λW = 0.

(c) It is in fact easy to give a full description of λ.

(i) The point is that a set W ⊆ [0, 1] × [0, 1] belongs to the domain Λ of λ iff every horizontal section $W^{-1}[\{y\}]$ is Lebesgue measurable. P P (α) If W ∈ Λ, then, for every b ∈ [0, 1], λ([0, 1] × {b}) is finite, so W ∩ ([0, 1] × {b}) is a set of finite measure, and
$$\lambda(W\cap([0,1]\times\{b\})) = \int\mu\bigl((W\cap([0,1]\times\{b\}))^{-1}[\{y\}]\bigr)\,\nu(dy) = \mu W^{-1}[\{b\}]$$
by 252D, because µ is σ-finite, ν is both strictly localizable and complete and locally determined, and
$$(W\cap([0,1]\times\{b\}))^{-1}[\{y\}] = \begin{cases}W^{-1}[\{b\}] & \text{if } y = b,\\ \emptyset & \text{otherwise.}\end{cases}$$
As b is arbitrary, every horizontal section of W is measurable. (β) If every horizontal section of W is measurable, let F ⊆ [0, 1] be any set of finite measure for ν; then F is finite, so
$$W\cap([0,1]\times F) = \bigcup_{y\in F}W^{-1}[\{y\}]\times\{y\}\in\Sigma\widehat{\otimes}\mathrm T\subseteq\Lambda.$$
But it follows that W itself belongs to Λ, by 251H. Q Q

(ii) Now some of the same calculations show that for every W ∈ Λ,
$$\lambda W = \sum_{y\in[0,1]}\mu W^{-1}[\{y\}].$$
P P For any finite F ⊆ [0, 1],
$$\lambda(W\cap([0,1]\times F)) = \int\mu\bigl((W\cap([0,1]\times F))^{-1}[\{y\}]\bigr)\,\nu(dy) = \int_F\mu W^{-1}[\{y\}]\,\nu(dy) = \sum_{y\in F}\mu W^{-1}[\{y\}].$$
So
$$\lambda W = \sup_{F\subseteq[0,1]\text{ is finite}}\sum_{y\in F}\mu W^{-1}[\{y\}] = \sum_{y\in[0,1]}\mu W^{-1}[\{y\}].$$
Q Q
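The representation of the diagonal in 252K(a) as a countable intersection of finite unions of closed squares can be checked point by point. The Python sketch below is an illustration only, with hypothetical function names: it uses exact rational arithmetic, tests membership in the n-th union $\bigcup_{k\le n}\bigl[\frac{k}{n+1},\frac{k+1}{n+1}\bigr]^2$, and confirms that a diagonal point lies in every set tested, while an off-diagonal point (s, t) is excluded as soon as some grid point k/(n+1) falls strictly between s and t, and certainly once the squares have side smaller than |s − t|.

```python
from fractions import Fraction

def in_level(s, t, n):
    """Is (s, t) in the union over k = 0..n of [k/(n+1), (k+1)/(n+1)]^2 ?"""
    for k in range(n + 1):
        lo, hi = Fraction(k, n + 1), Fraction(k + 1, n + 1)
        if lo <= s <= hi and lo <= t <= hi:
            return True
    return False

def first_excluded_level(s, t, n_max):
    """Smallest n <= n_max whose union of squares misses (s, t), or None."""
    for n in range(n_max + 1):
        if not in_level(s, t, n):
            return n
    return None

diagonal_point = (Fraction(1, 3), Fraction(1, 3))
nearby_point = (Fraction(1, 3), Fraction(1, 3) + Fraction(1, 100))

print(first_excluded_level(*diagonal_point, 200))  # None
print(first_excluded_level(*nearby_point, 200))    # 34, since 1/3 < 12/35 < 1/3 + 1/100
```

Only finitely many levels can be tested, of course; the point of the sketch is that exclusion of an off-diagonal point happens at a finite stage, which is why the intersection is exactly the diagonal.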
252L Example For the second example, I turn to a problem that can arise if we neglect to check that a function is measurable as a function of two variables. Let (X, Σ, µ) = (Y, T, ν) be $\omega_1$, the first uncountable ordinal (2A1Fc), with the countable-cocountable measure (211R). Set $W = \{(\xi,\eta) : \xi\le\eta<\omega_1\}\subseteq X\times Y$. Then all the horizontal sections $W^{-1}[\{\eta\}] = \{\xi : \xi\le\eta\}$ are countable, so
$$\int\mu W^{-1}[\{\eta\}]\,\nu(d\eta) = \int 0\,\nu(d\eta) = 0,$$
while all the vertical sections $W[\{\xi\}] = \{\eta : \xi\le\eta<\omega_1\}$ are cocountable, so
$$\int\nu W[\{\xi\}]\,\mu(d\xi) = \int 1\,\mu(d\xi) = 1.$$
Because the two repeated integrals are different, they cannot both be equal to the measure of W (as 252D, applied in either order, would require if W were measured by λ), and the sole resolution is to say that W is not measurable for the product measure.

252M Remark A third kind of difficulty in the formula
$$\iint f(x,y)\,dx\,dy = \iint f(x,y)\,dy\,dx$$
can arise even on probability spaces with $\Sigma\widehat{\otimes}\mathrm T$-measurable real-valued functions defined everywhere if we neglect to check that f is integrable with respect to the product measure. In 252H, we do need the hypothesis that one of
$$\int_{X\times Y}|f(x,y)|\,\lambda(d(x,y)),\qquad\int_Y\int_X|f(x,y)|\,\mu(dx)\,\nu(dy),\qquad\int_X\int_Y|f(x,y)|\,\nu(dy)\,\mu(dx)$$
is finite. For examples to show this, see 252Xf and 252Xg.

252N Integration through ordinate sets I: Proposition Let (X, Σ, µ) be a complete locally determined measure space, and λ the c.l.d. product measure on X × R, where R is given Lebesgue measure; write Λ for the domain of λ. For any [0, ∞]-valued function f defined on a conegligible subset of X, write $\Omega_f$, $\Omega'_f$ for the ordinate sets
$$\Omega_f = \{(x,a) : x\in\operatorname{dom}f,\ 0\le a\le f(x)\}\subseteq X\times\mathbb R,\qquad\Omega'_f = \{(x,a) : x\in\operatorname{dom}f,\ 0\le a<f(x)\}\subseteq X\times\mathbb R.$$
Then
$$\lambda\Omega_f = \lambda\Omega'_f = \int f\,d\mu$$
in the sense that if one of these is defined in [0, ∞], so are the other two, and they are equal.

proof (a) If $\Omega_f\in\Lambda$, then
$$\int f(x)\,\mu(dx) = \int\nu\{y : (x,y)\in\Omega_f\}\,\mu(dx) = \lambda\Omega_f$$
by 252D, writing ν for Lebesgue measure, because f is defined almost everywhere. Similarly, if $\Omega'_f\in\Lambda$,
$$\int f(x)\,\mu(dx) = \int\nu\{y : (x,y)\in\Omega'_f\}\,\mu(dx) = \lambda\Omega'_f.$$
*252P
Fubini’s theorem
225
(b) If $\int f\,d\mu$ is defined, then f is µ-virtually measurable, therefore measurable (because µ is complete); again because µ is complete, dom f ∈ Σ. So
$$\Omega'_f = \bigcup_{q\in\mathbb Q,\,q>0}\{x : x\in\operatorname{dom}f,\ f(x)>q\}\times[0,q],\qquad\Omega_f = \bigcap_{n\ge1}\bigcup_{q\in\mathbb Q,\,q>0}\{x : x\in\operatorname{dom}f,\ f(x)\ge q-\tfrac1n\}\times[0,q]$$
belong to Λ, so that $\lambda\Omega_f$ and $\lambda\Omega'_f$ are defined. Now both are equal to $\int f\,d\mu$, by (a).
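252N can be sanity-checked numerically in the simplest case, X = [0, 1] with Lebesgue measure, where λ agrees with two-dimensional Lebesgue measure. The sketch below is only an illustration under stated assumptions: the function f(x) = x², the sample size and the random seed are arbitrary choices not taken from the text, and Monte Carlo sampling gives an estimate, not a proof. It estimates λΩ_f as the proportion of uniformly sampled points of the unit square that fall in Ω_f, and compares this with ∫f dµ = 1/3.

```python
import random

def ordinate_set_measure(f, n_samples: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the planar Lebesgue measure of the ordinate set
    {(x, a) : 0 <= x <= 1, 0 <= a <= f(x)}, assuming 0 <= f <= 1 on [0, 1]
    so that the set lies inside the unit square (which has measure 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x, a = rng.random(), rng.random()  # a uniform point of [0, 1[ x [0, 1[
        if a <= f(x):                      # the point lies in Omega_f
            hits += 1
    return hits / n_samples

def f(x: float) -> float:
    return x * x  # illustrative choice; its integral over [0, 1] is 1/3

estimate = ordinate_set_measure(f, 200_000)
print(f"Monte Carlo measure of Omega_f: {estimate:.4f}; integral of f: {1 / 3:.4f}")
```

Replacing `a <= f(x)` by `a < f(x)` estimates λΩ′_f instead; the two events differ only on a set of probability zero, matching the equality λΩ_f = λΩ′_f.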
252O Integration through ordinate sets II: Proposition Let (X, Σ, µ) be a measure space, and f a Σ-measurable [0, ∞]-valued function defined on a measurable conegligible subset of X. Then
$$\int f\,d\mu = \int_0^\infty\mu\{x : x\in\operatorname{dom}f,\ f(x)\ge t\}\,dt = \int_0^\infty\mu\{x : x\in\operatorname{dom}f,\ f(x)>t\}\,dt$$
in [0, ∞], where the integrals $\int\ldots dt$ are taken with respect to Lebesgue measure.
proof For n, k ∈ N set $E_{nk} = \{x : x\in\operatorname{dom}f,\ f(x)>2^{-n}k\}$, $g_n = 2^{-n}\sum_{k=1}^{4^n}\chi E_{nk}$. Then $\langle g_n\rangle_{n\in\mathbb N}$ is a non-decreasing sequence of measurable functions converging to f at every point of dom f, so $\int f\,d\mu = \lim_{n\to\infty}\int g_n\,d\mu$ and $\mu\{x : f(x)>t\} = \lim_{n\to\infty}\mu\{x : g_n(x)>t\}$ for every t ≥ 0; consequently
$$\int_0^\infty\mu\{x : f(x)>t\}\,dt = \lim_{n\to\infty}\int_0^\infty\mu\{x : g_n(x)>t\}\,dt.$$
On the other hand,
$$\mu\{x : g_n(x)>t\} = \begin{cases}\mu E_{nk} & \text{if } 1\le k\le 4^n \text{ and } 2^{-n}(k-1)\le t<2^{-n}k,\\ 0 & \text{if } t\ge 2^n,\end{cases}$$
so that
$$\int_0^\infty\mu\{x : g_n(x)>t\}\,dt = 2^{-n}\sum_{k=1}^{4^n}\mu E_{nk} = \int g_n\,d\mu$$
for every n ∈ N. So $\int_0^\infty\mu\{x : f(x)>t\}\,dt = \int f\,d\mu$.
Now µ{x : f(x) ≥ t} = µ{x : f(x) > t} for almost all t. ▶ Set C = {t : µ{x : f(x) > t} < ∞}, h(t) = µ{x : f(x) > t} for t ∈ C. Then h : C → [0, ∞[ is monotonic, so is continuous almost everywhere (222A). But at any point t of C at which h is continuous, $\mu\{x : f(x)\ge t\} = \lim_{s\uparrow t}\mu\{x : f(x)>s\} = \mu\{x : f(x)>t\}$. So we have the result, since µ{x : f(x) ≥ t} = µ{x : f(x) > t} = ∞ for any t ∈ [0, ∞[ \ C. ◀
Accordingly $\int_0^\infty\mu\{x : f(x)\ge t\}\,dt$ is also equal to $\int f\,d\mu$.

*252P If we work through the ideas of 252B for $\Sigma\widehat\otimes\mathrm T$-measurable functions, we get the following, which is sometimes useful.
Proposition Let (X, Σ, µ) be a measure space, and (Y, T, ν) a σ-finite measure space. Then for any $\Sigma\widehat\otimes\mathrm T$-measurable function f : X × Y → [0, ∞], $x\mapsto\int f(x,y)\,\nu(dy) : X\to[0,\infty]$ is Σ-measurable; and if µ is semi-finite, $\iint f(x,y)\,\nu(dy)\,\mu(dx) = \int f\,d\lambda$, where λ is the c.l.d. product measure on X × Y.
proof (a) Let $\langle Y_n\rangle_{n\in\mathbb N}$ be a non-decreasing sequence of subsets of Y of finite measure with union Y. Set
$$\mathcal A = \{W : W\subseteq X\times Y,\ W[\{x\}]\in\mathrm T\text{ for every }x\in X,\ x\mapsto\nu(Y_n\cap W[\{x\}])\text{ is }\Sigma\text{-measurable for every }n\in\mathbb N\}.$$
Then $\mathcal A$ is a Dynkin class of subsets of X × Y including {E × F : E ∈ Σ, F ∈ T}, so includes $\Sigma\widehat\otimes\mathrm T$, by the Monotone Class Theorem (136B). This means that if $W\in\Sigma\widehat\otimes\mathrm T$, then $\nu W[\{x\}] = \sup_{n\in\mathbb N}\nu(Y_n\cap W[\{x\}])$ is defined for every x ∈ X and is a Σ-measurable function of x.
(b) Now, for n, k ∈ N, set
$$W_{nk} = \{(x,y) : f(x,y)\ge 2^{-n}k\},\qquad g_n = \sum_{k=1}^{4^n}2^{-n}\chi W_{nk}.$$
Then if we set
$$h_n(x) = \int g_n(x,y)\,\nu(dy) = \sum_{k=1}^{4^n}2^{-n}\nu W_{nk}[\{x\}]$$
for n ∈ N and x ∈ X, $h_n : X\to[0,\infty]$ is Σ-measurable, and
$$\lim_{n\to\infty}h_n(x) = \int\lim_{n\to\infty}g_n(x,y)\,\nu(dy) = \int f(x,y)\,\nu(dy)$$
for every x, because $\langle g_n(x,y)\rangle_{n\in\mathbb N}$ is a non-decreasing sequence with limit f(x, y) for all x ∈ X, y ∈ Y. So $x\mapsto\int f(x,y)\,\nu(dy)$ is also defined everywhere in X and is Σ-measurable.
(c) If E ⊆ X is measurable and has finite measure, then $\int_E\int f(x,y)\,\nu(dy)\,\mu(dx) = \int_{E\times Y}f\,d\lambda$, applying 252B to the product of the subspace measure $\mu_E$ and ν (and using 251P to check that the product of $\mu_E$ and ν is the subspace measure on E × Y). Now if λW is defined and finite, there must be a non-decreasing sequence $\langle E_n\rangle_{n\in\mathbb N}$ of subsets of X of finite measure such that $\lambda W = \sup_{n\in\mathbb N}\lambda(W\cap(E_n\times Y))$, so that $W\setminus\bigcup_{n\in\mathbb N}(E_n\times Y)$ is negligible, and
$$\int_W f\,d\lambda = \lim_{n\to\infty}\int_{W\cap(E_n\times Y)}f\,d\lambda$$
(by B.Levi’s theorem applied to $\langle f\times\chi(W\cap(E_n\times Y))\rangle_{n\in\mathbb N}$)
$$\le\lim_{n\to\infty}\int_{E_n\times Y}f\,d\lambda = \lim_{n\to\infty}\int_{E_n}\int f(x,y)\,\nu(dy)\,\mu(dx)\le\iint f(x,y)\,\nu(dy)\,\mu(dx).$$
By 213B, $\int f\,d\lambda = \sup_{\lambda W<\infty}\int_W f\,d\lambda$ …

(c) Let $(X_1,\Sigma_1,\mu_1)$, $(X_2,\Sigma_2,\mu_2)$, $(X_3,\Sigma_3,\mu_3)$ be three σ-finite measure spaces, and f a real-valued function defined almost everywhere on $X_1\times X_2\times X_3$ and measurable for the product measure described in 251Xf. Show that if $\iiint|f(x_1,x_2,x_3)|\,dx_1\,dx_2\,dx_3$ is defined in R, then $\iiint f(x_1,x_2,x_3)\,dx_2\,dx_3\,dx_1$ and $\iiint f(x_1,x_2,x_3)\,dx_3\,dx_1\,dx_2$ exist and are equal.
(d) Give an example of strictly localizable measure spaces (X, Σ, µ), (Y, T, ν) and a $W\in\Sigma\widehat\otimes\mathrm T$ such that $x\mapsto\nu W[\{x\}]$ is not Σ-measurable. (Hint: in 252Kb, try Y a proper subset of [0, 1].)
> (e) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and f a $\Sigma\widehat\otimes\mathrm T$-measurable function defined on a subset of X × Y. Show that y ↦ f(x, y) is T-measurable for every x ∈ X.
> (f) Set f(x, y) = sin(x − y) if 0 ≤ y ≤ x ≤ y + 2π, 0 for other x, y ∈ R². Show that
$$\iint f(x,y)\,dx\,dy = 0,\qquad\iint f(x,y)\,dy\,dx = 2\pi,$$
taking all integrals with respect to Lebesgue measure.
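The two repeated integrals in 252Xf can be checked numerically. The sketch below is an illustration only: it uses a plain composite midpoint rule (a choice made for this sketch, not a method from the text) and integrates each section over the interval on which it can be non-zero. The helper names are ad hoc.

```python
import math

TWO_PI = 2 * math.pi

def midpoint(g, a: float, b: float, n: int = 400) -> float:
    """Composite midpoint rule for the integral of g over [a, b]."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x: float, y: float) -> float:
    """252Xf: sin(x - y) when 0 <= y <= x <= y + 2*pi, and 0 elsewhere."""
    return math.sin(x - y) if 0 <= y <= x <= y + TWO_PI else 0.0

def integral_dx(y: float) -> float:
    """Integral of f(., y): for y >= 0 the section is non-zero only on [y, y + 2*pi]."""
    return midpoint(lambda x: f(x, y), y, y + TWO_PI) if y >= 0 else 0.0

def integral_dy(x: float) -> float:
    """Integral of f(x, .): the section is non-zero only on [max(0, x - 2*pi), x]."""
    return midpoint(lambda y: f(x, y), max(0.0, x - TWO_PI), x)

# The exact value of integral_dx is 0 for every y, and that of integral_dy is 0 for
# x outside [0, 2*pi], so truncating both outer integrals to [0, 4*pi] changes neither.
dx_then_dy = midpoint(integral_dx, 0.0, 2 * TWO_PI)
dy_then_dx = midpoint(integral_dy, 0.0, 2 * TWO_PI)
print(f"iint f dx dy ~ {dx_then_dy:.6f}, iint f dy dx ~ {dy_then_dx:.6f} (2*pi = {TWO_PI:.6f})")
```

For 0 ≤ x ≤ 2π the inner integral over y is 1 − cos x, whose integral over [0, 2π] is 2π; every section in the other order integrates sin over a full period and gives 0.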
> (g) Set $f(x,y) = \dfrac{x^2-y^2}{(x^2+y^2)^2}$ for x, y ∈ ]0, 1]. Show that
$$\int_0^1\int_0^1 f(x,y)\,dy\,dx = \frac{\pi}{4},\qquad\int_0^1\int_0^1 f(x,y)\,dx\,dy = -\frac{\pi}{4}.$$
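Because f in 252Xg is not integrable near the origin, a naive two-dimensional quadrature is unreliable there. The sketch below (an illustration with ad hoc helper names, not a proof) instead evaluates each inner integral from an antiderivative, obtained by differentiating y/(x² + y²) and −x/(x² + y²) and spot-checked with central differences, and only then applies a midpoint rule in the outer variable.

```python
import math

def f(x: float, y: float) -> float:
    """252Xg: (x^2 - y^2) / (x^2 + y^2)^2 on ]0, 1]^2."""
    return (x * x - y * y) / (x * x + y * y) ** 2

# Closed-form inner integrals: d/dy [y / (x^2 + y^2)] = f(x, y) and
# d/dx [-x / (x^2 + y^2)] = f(x, y), so integrating over ]0, 1] gives
def integral_dy(x: float) -> float:  # integral of f(x, y) over y in ]0, 1]
    return 1.0 / (x * x + 1.0)

def integral_dx(y: float) -> float:  # integral of f(x, y) over x in ]0, 1]
    return -1.0 / (1.0 + y * y)

def central_difference(F, t: float, h: float = 1e-6) -> float:
    return (F(t + h) - F(t - h)) / (2 * h)

# Spot-check both antiderivatives against f at a few interior points.
max_antiderivative_error = max(
    max(abs(central_difference(lambda s: s / (x * x + s * s), y) - f(x, y)),
        abs(central_difference(lambda s: -s / (s * s + y * y), x) - f(x, y)))
    for x, y in [(0.3, 0.7), (0.9, 0.2), (0.5, 0.5), (0.1, 0.8)]
)

def midpoint(g, a: float, b: float, n: int = 10_000) -> float:
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

dy_then_dx = midpoint(integral_dy, 0.0, 1.0)  # approximately  pi/4
dx_then_dy = midpoint(integral_dx, 0.0, 1.0)  # approximately -pi/4
print(max_antiderivative_error, dy_then_dx, dx_then_dy)
```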
(h) Let r ≥ 1 be an integer, and write $\beta_r$ for the Lebesgue measure of the unit ball in $\mathbb R^r$. Set $g_r(t) = r\beta_rt^{r-1}$ for t ≥ 0. Set φ(x) = ‖x‖ for $x\in\mathbb R^r$. (i) Writing $\mu_r$ for Lebesgue measure on $\mathbb R^r$, show that $\mu_r\varphi^{-1}[E] = \int_E r\beta_rt^{r-1}\,\mu_1(dt)$ for every Lebesgue measurable set E ⊆ [0, ∞[. (Hint: start with intervals E, noting from 115Xe that $\mu_r\{x : \|x\|\le a\} = \beta_ra^r$ for a ≥ 0, and progress to open sets, negligible sets and general measurable sets.) (ii) Using 235T, show that
$$\int e^{-\|x\|^2/2}\,\mu_r(dx) = \int_0^\infty r\beta_rt^{r-1}e^{-t^2/2}\,\mu_1(dt) = 2^{(r-2)/2}r\beta_r\Gamma\bigl(\tfrac r2\bigr) = 2^{r/2}\beta_r\Gamma\bigl(1+\tfrac r2\bigr) = \bigl(\sqrt2\,\Gamma(\tfrac12)\bigr)^r,$$
where Γ is the Γ-function (225Xj). (iii) Show that $2\Gamma(\frac12)^2 = 2\beta_2$ and hence that
$$\beta_r = \frac{\pi^{r/2}}{\Gamma(1+\frac r2)},\qquad\int_{-\infty}^\infty e^{-t^2/2}\,dt = \sqrt{2\pi},\qquad\int_{-\infty}^\infty t^2e^{-t^2/2}\,dt = \sqrt{2\pi}.$$
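The formulas in 252Xh can be checked against each other numerically. In the sketch below (helper names, the truncation point t_max = 20 and the step count are assumptions of the sketch), β_r is computed from the formula the exercise asks you to prove, so this checks consistency with the known values β₁ = 2, β₂ = π, β₃ = 4π/3 and with the Gaussian integral (2π)^{r/2}, not the proof itself.

```python
import math

def beta(r: int) -> float:
    """Volume of the unit ball in R^r from 252Xh(iii): pi^(r/2) / Gamma(1 + r/2)."""
    return math.pi ** (r / 2) / math.gamma(1 + r / 2)

def radial_gaussian(r: int, t_max: float = 20.0, n: int = 20_000) -> float:
    """Midpoint-rule value of the one-dimensional integral in 252Xh(ii),
    int_0^oo r beta_r t^(r-1) exp(-t^2/2) dt, truncated at t_max (the tail is negligible)."""
    h = t_max / n
    c = r * beta(r)
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += c * t ** (r - 1) * math.exp(-t * t / 2)
    return total * h

for r in range(1, 6):
    closed_form = 2 ** (r / 2) * beta(r) * math.gamma(1 + r / 2)  # 252Xh(ii)
    print(r, round(beta(r), 6), round(radial_gaussian(r), 6), round(closed_form, 6),
          round((2 * math.pi) ** (r / 2), 6))
```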
(i) Let f, g : R → R be two non-decreasing functions, and $\mu_f$, $\mu_g$ the associated Lebesgue-Stieltjes measures (see 114Xa). Set
$$f(x^+) = \lim_{t\downarrow x}f(t),\qquad f(x^-) = \lim_{t\uparrow x}f(t)$$
for each x ∈ R, and define $g(x^+)$, $g(x^-)$ similarly. Show that whenever a ≤ b in R,
$$\begin{aligned}\int_{[a,b]}f(x^-)\,\mu_g(dx) + \int_{[a,b]}g(x^+)\,\mu_f(dx) &= g(b^+)f(b^+) - g(a^-)f(a^-)\\ &= \int_{[a,b]}\tfrac12\bigl(f(x^-)+f(x^+)\bigr)\,\mu_g(dx) + \int_{[a,b]}\tfrac12\bigl(g(x^-)+g(x^+)\bigr)\,\mu_f(dx).\end{aligned}$$
(Hint: find two expressions for $(\mu_f\times\mu_g)\{(x,y) : a\le x<y\le b\}$.)

252Y Further exercises (a) Let (X, Σ, µ) be a measure space. Show that the following are equiveridical: (i) the completion of µ is locally determined; (ii) the completion of µ coincides with the c.l.d. version of µ; (iii) whenever (Y, T, ν) is a σ-finite measure space and λ the c.l.d. product measure on X × Y and f is a function such that $\int f\,d\lambda$ is defined in [−∞, ∞], then $\iint f(x,y)\,\nu(dy)\,\mu(dx)$ is defined and equal to $\int f\,d\lambda$.
(b) Let (X, Σ, µ) be a measure space. Show that the following are equiveridical: (i) µ has locally determined negligible sets (213I); (ii) whenever (Y, T, ν) is a σ-finite measure space and λ the c.l.d. product measure on X × Y, then $\iint f(x,y)\,\nu(dy)\,\mu(dx)$ is defined and equal to $\int f\,d\lambda$ for any λ-integrable function f.
(c) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and $\lambda_0$ the primitive product measure on X × Y (251C). Let f be any $\lambda_0$-integrable real-valued function. Show that $\iint f(x,y)\,\nu(dy)\,\mu(dx) = \int f\,d\lambda_0$. (Hint: show that there are sequences $\langle G_n\rangle_{n\in\mathbb N}$, $\langle H_n\rangle_{n\in\mathbb N}$ of sets of finite measure such that f(x, y) is defined and equal to 0 for every $(x,y)\in(X\times Y)\setminus\bigcup_{n\in\mathbb N}G_n\times H_n$.)
(d) Let (X, Σ, µ) and (Y, T, ν) be measure spaces; let $\lambda_0$ be the primitive product measure on X × Y, and λ the c.l.d. product measure. Show that if f is a $\lambda_0$-integrable real-valued function, it is λ-integrable, and $\int f\,d\lambda = \int f\,d\lambda_0$.
(e) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, with c.l.d. product (X × Y, Λ, λ). Let f be a non-negative Λ-measurable real-valued function defined on a λ-conegligible set, and suppose that $\int\bigl(\int f(x,y)\,\mu(dx)\bigr)\nu(dy)$ is finite. Show that f is λ-integrable.
(f) Let (X, Σ, µ) be the unit interval [0, 1] with Lebesgue measure, and (Y, T, ν) the interval with counting measure, as in 252K; let $\lambda_0$ be the primitive product measure on $[0,1]^2$. (i) Setting ∆ = {(t, t) : t ∈ [0, 1]}, show that $\lambda_0\Delta = \infty$. (ii) Show that $\lambda_0$ is not semi-finite. (iii) Show that if $W\in\operatorname{dom}\lambda_0$, then $\lambda_0W = \sum_{y\in[0,1]}\mu W^{-1}[\{y\}]$ if there are a countable set A ⊆ [0, 1] and a Lebesgue negligible set E ⊆ [0, 1] such that W ⊆ ([0, 1] × A) ∪ (E × [0, 1]), ∞ otherwise.
(g) Let (X, Σ, µ) be a measure space, and $\lambda_0$ the primitive product measure on X × R, where R is given Lebesgue measure; write Λ for its domain. For any [0, ∞]-valued function f defined on a conegligible subset of X, write $\Omega_f$, $\Omega'_f$ for the corresponding ordinate sets, as in 252N. Show that if any of $\lambda_0\Omega_f$, $\lambda_0\Omega'_f$, $\int f\,d\mu$ is defined and finite, so are the others, and all three are equal.
(h) Let (X, Σ, µ) be a complete locally determined measure space, and f a non-negative function defined on a conegligible subset of X. Write $\Omega_f$, $\Omega'_f$ for the corresponding ordinate sets, as in 252N. Let λ be the c.l.d. product measure on X × R, where R is given Lebesgue measure. Show that $\overline{\int}f\,d\mu = \lambda^*\Omega_f = \lambda^*\Omega'_f$.
(i) Let (X, Σ, µ) be a measure space and f : X → [0, ∞[ a function. Show that $\overline{\int}f\,d\mu = \int_0^\infty\mu^*\{x : f(x)\ge t\}\,dt$.
(j) Let (X, Σ, µ) be a complete locally determined measure space and a < b in R, endowed with Lebesgue measure; let Λ be the domain of the c.l.d. product measure λ on X × [a, b]. Let f : X × [a, b] → R be a Λ-measurable function such that t ↦ f(x, t) : [a, b] → R is continuous on [a, b] and differentiable on ]a, b[ for every x ∈ X. (i) Show that the partial derivative $\frac{\partial f}{\partial t}$ with respect to the second variable is Λ-measurable. (ii) Now suppose that $\frac{\partial f}{\partial t}$ is λ-integrable and that $\int f(x,t_0)\,\mu(dx)$ is defined and finite for some $t_0\in\,]a,b[$. Show that $F(t) = \int f(x,t)\,\mu(dx)$ is defined in R for every t ∈ [a, b], that F is absolutely continuous, and that $F'(t) = \int\frac{\partial f}{\partial t}(x,t)\,\mu(dx)$ for almost every t ∈ ]a, b[. (Hint: $F(c) = F(a) + \int_{X\times[a,c]}\frac{\partial f}{\partial t}\,d\lambda$ for every c ∈ [a, b].)
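Returning to 252O (and its outer-measure variant 252Yi), the identity and the dyadic functions g_n from its proof can be made concrete on a finite measure space, where every set is measurable, µ* = µ and the upper integral is the ordinary integral. The point masses and values below are hypothetical data chosen for illustration; exact rational arithmetic (Python's `fractions.Fraction`) avoids any rounding.

```python
from fractions import Fraction

# Hypothetical finite measure space: point masses weights[i] at points x_i, f(x_i) = values[i].
values = [Fraction(0), Fraction(1, 3), Fraction(5, 2), Fraction(5, 2), Fraction(7)]
weights = [Fraction(2), Fraction(1, 2), Fraction(1, 4), Fraction(3, 4), Fraction(1, 10)]

def mu_where(pred) -> Fraction:
    """mu{x_i : pred(f(x_i))}."""
    return sum((w for v, w in zip(values, weights) if pred(v)), Fraction(0))

def integral_f() -> Fraction:
    return sum((v * w for v, w in zip(values, weights)), Fraction(0))

def layer_cake(strict: bool = True) -> Fraction:
    """Exact integral over [0, oo[ of t -> mu{f > t} (strict) or t -> mu{f >= t} (not strict).
    Both are step functions: for t strictly between consecutive levels lo < hi of f,
    f(x) > t and f(x) >= t each hold exactly when f(x) >= hi; beyond max f they vanish."""
    levels = sorted(set(values) | {Fraction(0)})
    total = Fraction(0)
    for lo, hi in zip(levels, levels[1:]):
        height = mu_where(lambda v: v > lo) if strict else mu_where(lambda v: v >= hi)
        total += (hi - lo) * height
    return total

def dyadic_integral(n: int) -> Fraction:
    """Integral of g_n = 2^-n sum_{k=1}^{4^n} chi E_nk with E_nk = {f > 2^-n k} (proof of 252O)."""
    step = Fraction(1, 2 ** n)
    return step * sum((mu_where(lambda v: v > step * k) for k in range(1, 4 ** n + 1)), Fraction(0))

approximations = [dyadic_integral(n) for n in range(1, 7)]
print(integral_f(), layer_cake(), layer_cake(strict=False), [float(a) for a in approximations])
```

Once 2ⁿ exceeds max f, the gap between ∫f dµ and ∫g_n dµ is at most 2⁻ⁿ·µX, which is the convergence the proof relies on.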
(k) Show that
$$\frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)} = \int_0^1t^{a-1}(1-t)^{b-1}\,dt$$
for all a, b > 0. (Hint: show that $\int_0^\infty t^{a-1}\int_t^\infty e^{-x}(x-t)^{b-1}\,dx\,dt = \int_0^\infty e^{-x}\int_0^xt^{a-1}(x-t)^{b-1}\,dt\,dx$.)
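The identity of 252Yk can be spot-checked numerically with `math.gamma`. The sketch below assumes a, b ≥ 1, so that the integrand is bounded and a plain midpoint rule is adequate; for a < 1 or b < 1 the endpoint singularities would need a different quadrature. The parameter pairs are arbitrary.

```python
import math

def beta_integral(a: float, b: float, n: int = 50_000) -> float:
    """Midpoint-rule value of int_0^1 t^(a-1) (1-t)^(b-1) dt (assumes a, b >= 1)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += t ** (a - 1) * (1 - t) ** (b - 1)
    return total * h

def beta_via_gamma(a: float, b: float) -> float:
    """Left-hand side of 252Yk: Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

for a, b in [(1.0, 1.0), (2.0, 3.0), (2.5, 1.5), (4.0, 4.0)]:
    print(a, b, beta_integral(a, b), beta_via_gamma(a, b))
```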
(l) Let (X, Σ, µ) and (Y, T, ν) be σ-finite measure spaces and λ the c.l.d. product measure on X × Y. Suppose that f ∈ ℒ⁰(λ) and that 1 < p < ∞. Show that
$$\Bigl(\int\Bigl|\int f(x,y)\,dx\Bigr|^p\,dy\Bigr)^{1/p}\le\int\Bigl(\int|f(x,y)|^p\,dy\Bigr)^{1/p}\,dx.$$
(Hint: set $q = \frac{p}{p-1}$ and consider the integral $\int|f(x,y)g(y)|\,\lambda(d(x,y))$ for $g\in\mathcal L^q(\nu)$, using 244K.)
(m) Let ν be Lebesgue measure on [0, ∞[; suppose that $f\in\mathcal L^p(\nu)$ where 1 < p < ∞. Set $F(y) = \frac1y\int_0^yf$ for y > 0. Show that $\|F\|_p\le\frac{p}{p-1}\|f\|_p$. (Hint: $F(y) = \int_0^1f(xy)\,dx$; use 252Yl with X = [0, 1], Y = [0, ∞[.)
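When both factors are finite spaces of point masses, the inequality of 252Yl reduces to Minkowski's inequality for finite sums. The sketch below tests it on random data (the data, the seed and the helper names are illustrative assumptions) and checks the equality case F(x, y) = a(x)b(y) with a ≥ 0.

```python
import random

def lhs(F, mu, nu, p):
    """( sum_y | sum_x F[x][y] mu[x] |^p nu[y] )^(1/p): left side of 252Yl for point masses."""
    inner = [sum(F[i][j] * mu[i] for i in range(len(mu))) for j in range(len(nu))]
    return sum(abs(s) ** p * w for s, w in zip(inner, nu)) ** (1 / p)

def rhs(F, mu, nu, p):
    """sum_x ( sum_y |F[x][y]|^p nu[y] )^(1/p) mu[x]: right side of 252Yl for point masses."""
    return sum(sum(abs(F[i][j]) ** p * nu[j] for j in range(len(nu))) ** (1 / p) * mu[i]
               for i in range(len(mu)))

rng = random.Random(2024)
worst_gap = float("inf")  # smallest value of rhs - lhs seen over the random trials
for _ in range(500):
    m, k = rng.randint(1, 5), rng.randint(1, 5)
    mu = [rng.uniform(0.1, 2.0) for _ in range(m)]
    nu = [rng.uniform(0.1, 2.0) for _ in range(k)]
    F = [[rng.uniform(-1.0, 1.0) for _ in range(k)] for _ in range(m)]
    p = rng.uniform(1.05, 6.0)
    worst_gap = min(worst_gap, rhs(F, mu, nu, p) - lhs(F, mu, nu, p))

# Equality case: F(x, y) = a(x) b(y) with a = (2, 0.5) >= 0 and b = (1, -3).
F_product = [[2.0, -6.0], [0.5, -1.5]]
equality_gap = rhs(F_product, [1.0, 2.0], [0.5, 1.5], 3.0) - lhs(F_product, [1.0, 2.0], [0.5, 1.5], 3.0)
print("smallest rhs - lhs:", worst_gap, "; equality case gap:", equality_gap)
```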
(n) Set f(t) = t − ln(t + 1) for t > −1. (i) Show that $\Gamma(a+1) = a^{a+1}e^{-a}\int_{-1}^\infty e^{-af(u)}\,du$ for every a > 0. (Hint: substitute $u = \frac ta - 1$ in 225Xj(iii).) (ii) Show that there is a δ > 0 such that $f(t)\ge\frac13t^2$ for −1 < t ≤ δ. (iii) Setting $\alpha = \frac12f(\delta)$, show that (for a ≥ 1)
$$\sqrt a\int_\delta^\infty e^{-af(t)}\,dt\le\sqrt a\,e^{-a\alpha}\int_0^\infty e^{-f(t)/2}\,dt\to0$$
as a → ∞. (iv) Set $g_a(t) = e^{-af(t/\sqrt a)}$ if $-\sqrt a<t\le\delta\sqrt a$, 0 otherwise. Show that $g_a(t)\le e^{-t^2/3}$ for all a, t and that $\lim_{a\to\infty}g_a(t) = e^{-t^2/2}$ for all t, so that
$$\lim_{a\to\infty}\frac{e^a\Gamma(a+1)}{a^{a+\frac12}} = \lim_{a\to\infty}\sqrt a\int_{-1}^\infty e^{-af(t)}\,dt = \lim_{a\to\infty}\sqrt a\int_{-1}^\delta e^{-af(t)}\,dt = \lim_{a\to\infty}\int_{-\infty}^\infty g_a(t)\,dt = \int_{-\infty}^\infty e^{-t^2/2}\,dt = \sqrt{2\pi}.$$
(v) Show that
$$\lim_{n\to\infty}\frac{n!}{e^{-n}n^n\sqrt n} = \sqrt{2\pi}.$$
(This is Stirling’s formula.)
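Both 252Yn(i) and the limit in 252Yn(v) can be checked numerically. The sketch below works in log space with `math.lgamma` to avoid overflow, and evaluates the integral in (i) by a midpoint rule on ]−1, 40]; the truncation point, step count and helper names are assumptions of the sketch.

```python
import math

def stirling_ratio(n: int) -> float:
    """n! / (e^-n n^n sqrt(n)) from 252Yn(v), evaluated in log space with math.lgamma."""
    return math.exp(math.lgamma(n + 1) + n - n * math.log(n) - 0.5 * math.log(n))

def gamma_via_integral(a: float, upper: float = 40.0, steps: int = 100_000) -> float:
    """Right-hand side of 252Yn(i), a^(a+1) e^-a int_{-1}^oo exp(-a f(u)) du with
    f(u) = u - ln(1 + u); midpoint rule on ]-1, upper], ignoring the tiny tail."""
    h = (upper + 1.0) / steps
    total = 0.0
    for i in range(steps):
        u = -1.0 + (i + 0.5) * h
        total += math.exp(-a * (u - math.log1p(u)))
    return a ** (a + 1) * math.exp(-a) * total * h

for n in (10, 100, 10_000, 1_000_000):
    print(n, stirling_ratio(n))
print("sqrt(2*pi) =", math.sqrt(2 * math.pi))
print("Gamma(11) =", math.gamma(11), "via 252Yn(i):", gamma_via_integral(10.0))
```

The ratios decrease towards √(2π) ≈ 2.5066 from above, with error roughly √(2π)/(12n).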
(o) Let (X, Σ, µ) be a measure space and f a µ-integrable complex-valued function. For α ∈ ]−π, π] set $H_\alpha = \{x : x\in\operatorname{dom}f,\ \operatorname{Re}(e^{-i\alpha}f(x))>0\}$. Show that $\int_{-\pi}^\pi\operatorname{Re}\bigl(e^{-i\alpha}\int_{H_\alpha}f\bigr)\,d\alpha = 2\int|f|$, and hence that there is some α such that $\bigl|\int_{H_\alpha}f\bigr|\ge\frac1\pi\int|f|$. (Compare 246F.)
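On a finite space of point masses, each point x contributes µ{x}·max(0, |f(x)| cos(arg f(x) − α)) to Re(e^{−iα}∫_{H_α} f), and integrating over a full period gives 2µ{x}|f(x)|, which is the identity in 252Yo. The sketch below (hypothetical complex values and masses; a midpoint grid in α) checks this and the resulting lower bound for max_α |∫_{H_α} f|.

```python
import cmath
import math

# Hypothetical finite measure space: f(x_i) = values[i] with point masses masses[i].
values = [3 + 4j, -2 + 1j, 0.5 - 2j, -1 - 1j, 2j]
masses = [0.5, 1.0, 2.0, 0.25, 0.75]

def integral_over_H(alpha: float) -> complex:
    """Integral of f over H_alpha = {x : Re(e^{-i alpha} f(x)) > 0}."""
    rot = cmath.exp(-1j * alpha)
    return sum((v * m for v, m in zip(values, masses) if (rot * v).real > 0), 0j)

l1_norm = sum(abs(v) * m for v, m in zip(values, masses))  # integral of |f|

n = 20_000
h = 2 * math.pi / n
alphas = [-math.pi + (k + 0.5) * h for k in range(n)]  # midpoints of a grid on ]-pi, pi]
lhs = h * sum((cmath.exp(-1j * a) * integral_over_H(a)).real for a in alphas)
best = max(abs(integral_over_H(a)) for a in alphas)
print(f"integral over alpha: {lhs:.6f}, 2*int|f| = {2 * l1_norm:.6f}")
print(f"max |int_H f| = {best:.6f} >= int|f|/pi = {l1_norm / math.pi:.6f}")
```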
(p) Let (X, Σ, µ) be a complete measure space and write $M^{0,\infty}$ for the set {f : f ∈ ℒ⁰(µ), µ{x : |f(x)| ≥ a} is finite for some a ∈ [0, ∞[}. (i) Show that for each $f\in M^{0,\infty}$ there is a non-increasing $f^* : ]0,\infty[\to\mathbb R$ such that $\mu_L\{t : f^*(t)\ge\alpha\} = \mu\{x : |f(x)|\ge\alpha\}$ for every α > 0, writing $\mu_L$ for Lebesgue measure. (ii) Show that $\int_E|f|\,d\mu\le\int_0^{\mu E}f^*\,d\mu_L$ for every E ∈ Σ (allowing ∞). (Hint: $(f\times\chi E)^*\le f^*$.) (iii) Show that $\|f^*\|_p = \|f\|_p$ for every p ∈ [1, ∞], $f\in M^{0,\infty}$. (Hint: $(|f|^p)^* = (f^*)^p$.) (iv) Show that if $f, g\in M^{0,\infty}$ then $\int|f\times g|\,d\mu\le\int f^*\times g^*\,d\mu_L$. (Hint: look at simple functions first.) (v) Show that if µ is atomless then $\int_0^af^*\,d\mu_L = \sup_{E\in\Sigma,\,\mu E\le a}\int_E|f|$ for every a ≥ 0. (Hint: 215D.) (vi) Show that A ⊆ ℒ¹(µ) is uniformly integrable iff $\{f^* : f\in A\}$ is uniformly integrable in $\mathcal L^1(\mu_L)$. ($f^*$ is called the decreasing rearrangement of f.)
(q) Let (X, Σ, µ) be a complete locally determined measure space, and write ν for Lebesgue measure on [0, 1]. Show that the c.l.d. product measure λ on X × [0, 1] is localizable iff µ is localizable. (Hints: (i) if ℰ ⊆ Σ, show that F ∈ Σ is an essential supremum for ℰ in Σ iff F × [0, 1] is an essential supremum for {E × [0, 1] : E ∈ ℰ} in Λ = dom λ. (ii) For W ∈ Λ, n ∈ N, $k<2^n$ set $W_{nk} = \{x : x\in X,\ \nu^*\{t : (x,t)\in W,\ 2^{-n}k\le t\le 2^{-n}(k+1)\}\ge 2^{-n-1}\}$. Show that if 𝒲 ⊆ Λ and $F_{nk}$ is an essential supremum for $\{W_{nk} : W\in\mathcal W\}$ in Σ for all n, k, then $\bigcup_{n\in\mathbb N}\bigcap_{m\ge n}\bigcup_{k<2^m}F_{mk}\times[2^{-m}k,2^{-m}(k+1)]$ […] 0. Show that if the c.l.d. product measure on X × Y is strictly localizable, then µ is strictly localizable. (Hint: take F ∈ T, 0 < νF < ∞. Let $\langle W_i\rangle_{i\in I}$ be a decomposition of X × Y. For i ∈ I, n ∈ N set $E_{in} = \{x : \nu^*\{y : y\in F,\ (x,y)\in W_i\}\ge 2^{-n}\}$. Apply 213Ye to $\{E_{in} : i\in I,\ n\in\mathbb N\}$.)
(t) Let (X, Σ, µ) be the space of Example 216E, and give Lebesgue measure to [0, 1]. Show that the c.l.d. product measure on X × [0, 1] is complete, locally determined, atomless and localizable, but not strictly localizable.
(u) Show that if p is any non-zero (real) polynomial in r variables, then $\{x : x\in\mathbb R^r,\ p(x) = 0\}$ is Lebesgue negligible.

252 Notes and comments For a volume and a half now I have asked you to accept the idea of integrating partially-defined functions, insisting that sooner or later they would appear at the core of the subject. The moment has now come. If we wish to apply Fubini’s and Tonelli’s theorems in the most fundamental of all cases, with both factors equal to Lebesgue measure on the unit interval, it is surely natural to look at all functions which are integrable on the square for two-dimensional Lebesgue measure. Now two-dimensional Lebesgue measure is a complete measure, so, in particular, assigns zero measure to any set of the form {(x, b) : x ∈ A} or {(a, y) : y ∈ A}, whether or not the set A is measurable for one-dimensional measure. Accordingly, if f is a function of two variables which is integrable for two-dimensional Lebesgue measure, there is no reason why any particular section x ↦ f(x, b) or y ↦ f(a, y) should be measurable, let alone integrable. Consequently, even if f itself is defined everywhere, the outer integral of $\iint f(x,y)\,dx\,dy$ is likely to be applied to a function which is not defined for every y. Let me remark that the problem does not concern ‘∞’; the awkward functions are those with sections so irregular that they cannot be assigned an integral at all.
I have seen many approaches to this particular nettle, generally less whole-hearted than the one I have determined on for this treatise. Part of the difficulty is that Fubini’s theorem really is at the centre of measure theory. Over large parts of the subject, it is possible to assert that a result is non-trivial if and only if it depends on Fubini’s theorem. I am therefore unwilling to insert any local fix, saying that ‘in this chapter, we shall integrate functions which are not defined everywhere’; before long, such a provision would have to be interpolated into the preambles to half the best theorems, or an explanation offered of why it wasn’t necessary in their particular contexts. I suppose that one of the commonest responses is (like Halmos 50) to restrict attention to $\Sigma\widehat\otimes\mathrm T$-measurable functions, which eliminates measurability problems for the moment (252Xe, 252P); but unhappily (or rather, to my mind, happily) there are crucial applications in which the functions are not actually $\Sigma\widehat\otimes\mathrm T$-measurable, but belong to some wider class, and this restriction sooner or later leads to undignified contortions as we are forced to adapt limited results to unforeseen contexts. Besides, it leaves unsaid the really rather important information that if f is a measurable function of two variables then (under appropriate conditions) almost all its sections are measurable (252E). In 252B and its corollaries there is a clumsy restriction: we assume that one of the measures is σ-finite and the other is either strictly localizable or complete and locally determined. The obvious question is, whether we need these hypotheses. From 252K we see that the hypothesis ‘σ-finite’ on the second factor can certainly not be abandoned, even when the first factor is a complete probability measure. The requirement ‘µ is either strictly localizable or complete and locally determined’ is in fact fractionally stronger than what is needed, as well as disagreeably elaborate.
The ‘right’ hypothesis is that the completion of µ should be locally determined (see (b*-i) of the proof of 252B). The point is that because the product of two measures is the same as the product of their c.l.d. versions (251S), no theorem which leads from the product measure to the factor measures can distinguish between a measure and its c.l.d. version; so that, in 252B, we must expect to need µ and its c.l.d. version to give rise to the same integrals. The proof of 252B would be better focused if the hypothesis was simplified to ‘ν is σ-finite and µ is complete and locally determined’. But this would just transfer part of the argument into the proof of 252C. We also have to work a little harder in 252B in order to cover functions and integrals taking the values ±∞. Fubini’s theorem is so central to measure theory that I believe it is worth taking a bit of extra trouble to state the results in maximal generality. This is especially important because we frequently apply it in multiply repeated integrals, as in 252Xc, in which we have even less control than usual over the intermediate functions to be integrated. I have expressed all the main results of this section in terms of the ‘c.l.d.’ product measure. In the case of σ-finite spaces, of course, which is where the theory works best, we could just as well use the ‘primitive’ product measure. Indeed, Fubini’s theorem itself has a version in terms of the primitive product measure which is rather more elegant than 252B as stated (252Yc), and covers the great majority of applications. (Integrals with respect to the primitive and c.l.d. product measures are of course very closely related; see 252Yd.) But we do sometimes need to look at non-σ-finite spaces, and in these cases the asymmetric form in 252B is close to the best we can do. Using the primitive product measure does not help at all with the most substantial obstacle, the phenomenon in 252K (see 252Yf). 
The pre-calculus concept of an integral as ‘the area under a curve’ is given expression in 252N: the integral of a non-negative function is the measure of its ordinate set. This is unsatisfactory as a definition of the integral, not just because of the requirement that the base space should be complete and locally determined (which can be dealt with by using the primitive product measure, as in 252Yg), but because the construction of the product measure involves integration (part (c) of the proof of 251E). The idea of 252N is to relate the measure of an ordinate set to the integral of the measures of its vertical sections. Curiously, if instead we integrate the measures of its horizontal sections, as in 252O, we get a more versatile result. (Indeed this one does not involve the concept of ‘product measure’, and could have appeared at any point after §123.) Note that the integral $\int_0^\infty\ldots dt$ here is applied to a monotonic function, so may be interpreted as an improper Riemann integral. If you think you know enough about the Riemann integral to make this a tempting alternative to the construction in §122, the tricky bit now becomes the proof that the integral is additive. A different line of argument is to use integration over sections to define a product measure. The difficulty with this approach is that unless we take great care we may find ourselves with an asymmetric construction. My own view is that such an asymmetry is acceptable only when there is no alternative. But in Chapter 43 of Volume 4 I will describe a couple of examples.
253Ab
Tensor products
233
Of the two examples I give here, 252K is supposed to show that when I call for σ-finite spaces they are really necessary, while 252L is supposed to show that joint measurability is essential in Tonelli’s theorem and its corollaries. The factor spaces in 252K, Lebesgue measure and counting measure, are chosen to show that it is only the lack of σ-finiteness that can be the problem; they are otherwise as regular as one can reasonably ask. In 252L I have used the countable-cocountable measure on $\omega_1$, which you may feel is fit only for counter-examples; and the question does arise, whether the same phenomenon occurs with Lebesgue measure. This leads into deep water, and I will return to it in Volume 5. I ought perhaps to note explicitly that in Fubini’s theorem, we really do need to have a function which is integrable for the product measure. I include 252Xf and 252Xg to remind you that even in the best-regulated circumstances, the repeated integrals $\iint f\,dx\,dy$, $\iint f\,dy\,dx$ may fail to be equal if f is not integrable as a function of two variables. There are many ways to calculate the volume $\beta_r$ of an r-dimensional ball; the one I have used in 252Q follows a line that would have been natural to me before I ever heard of measure theory. In 252Xh I suggest another method. The idea of integration-by-substitution, used in part (b) of the argument, is there supported by an ad hoc argument; I will present a different, more generally applicable, approach in Chapter 26. Elsewhere (252Xh, 252Yk, 252Yl) I find myself taking for granted substitutions of the form t ↦ at, t ↦ a + t; for a systematic justification, see §263. Of course an enormous number of other formulae of advanced calculus are also based on repeated integration of one kind or another, and I give a sample handful of such results (252Xi, 252Yj-252Yn).
253 Tensor products The theorems of the last section show that the integrable functions on a product of two measure spaces can be effectively studied in terms of integration on each factor space separately. In this section I present a very striking relationship between the L¹ space of a product measure and the L¹ spaces of its factors, which actually determines the product L¹ up to isomorphism as Banach lattice. I start with a brief note on bilinear maps (253A) and a description of the canonical bilinear map from L¹(µ) × L¹(ν) to L¹(µ × ν) (253B-253E). The main theorem of the section is 253F, showing that this canonical map is universal for continuous bilinear maps from L¹(µ) × L¹(ν) to Banach spaces; it also determines the ordering of L¹(µ × ν) (253G). I end with a description of a fundamental type of conditional expectation operator (253H) and notes on products of indefinite-integral measures (253I) and upper integrals of special kinds of function (253J, 253K).

253A Bilinear maps Before looking at any of the measure theory in this section, I introduce a concept from the theory of linear spaces.
(a) Let U, V and W be linear spaces over R (or, indeed, any other field). A map φ : U × V → W is bilinear if it is linear in each variable separately, that is,
$$\varphi(u_1+u_2,v) = \varphi(u_1,v) + \varphi(u_2,v),\qquad\varphi(u,v_1+v_2) = \varphi(u,v_1) + \varphi(u,v_2),\qquad\varphi(\alpha u,v) = \alpha\varphi(u,v) = \varphi(u,\alpha v)$$
for all $u, u_1, u_2\in U$, $v, v_1, v_2\in V$ and scalars α. Observe that φ gives rise to, and in turn can be defined by, a linear operator T : U → L(V; W), writing L(V; W) for the space of linear operators from V to W, where (Tu)(v) = φ(u, v) for all u ∈ U, v ∈ V. Hence, or otherwise, we can see, for instance, that φ(0, v) = φ(u, 0) = 0 whenever u ∈ U, v ∈ V. If W′ is another linear space over the same field, and S : W → W′ is a linear operator, then Sφ : U × V → W′ is bilinear.
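The definitions of 253A(a) can be exercised on finite-dimensional spaces. The sketch below assumes spaces over Q represented as Python lists; the coefficient array C, the vectors and the map S are arbitrary illustrative choices. It builds a bilinear map from coefficients, checks the identities above exactly with rational arithmetic, and checks that the operator T with (Tu)(v) = φ(u, v) reproduces φ.

```python
from fractions import Fraction

# A hypothetical bilinear map phi : Q^2 x Q^3 -> Q^2 given by a coefficient array,
# phi(u, v)_k = sum_{i, j} C[k][i][j] * u[i] * v[j].
C = [
    [[1, 0, 2], [0, -1, 3]],
    [[2, 1, 0], [1, 1, -1]],
]

def phi(u, v):
    return [sum(C[k][i][j] * u[i] * v[j] for i in range(2) for j in range(3)) for k in range(2)]

def T(u):
    """The linear operator U -> L(V; W) attached to phi: T(u) is the 2x3 matrix of v -> phi(u, v)."""
    return [[sum(C[k][i][j] * u[i] for i in range(2)) for j in range(3)] for k in range(2)]

def apply(matrix, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]

def add(x, y):
    return [a + b for a, b in zip(x, y)]

def scale(c, x):
    return [c * a for a in x]

S = [[3, -1]]  # a linear map S : Q^2 -> Q^1, so S∘phi should again be bilinear

def S_phi(u, v):
    return apply(S, phi(u, v))

u1, u2 = [Fraction(1), Fraction(-2)], [Fraction(3, 2), Fraction(5)]
v1, v2 = [Fraction(2), Fraction(0), Fraction(-1)], [Fraction(1, 3), Fraction(4), Fraction(1)]
alpha = Fraction(-7, 4)

checks = {
    "additive in u": phi(add(u1, u2), v1) == add(phi(u1, v1), phi(u2, v1)),
    "additive in v": phi(u1, add(v1, v2)) == add(phi(u1, v1), phi(u1, v2)),
    "homogeneous": phi(scale(alpha, u1), v2) == scale(alpha, phi(u1, v2)) == phi(u1, scale(alpha, v2)),
    "(Tu)(v) = phi(u, v)": apply(T(u2), v1) == phi(u2, v1),
    "phi(0, v) = 0": phi([0, 0], v2) == [0, 0],
    "S∘phi additive in v": S_phi(u1, add(v1, v2)) == add(S_phi(u1, v1), S_phi(u1, v2)),
}
print(checks)
```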
(b) Now suppose that U, V and W are normed spaces, and φ : U × V → W is a bilinear map. Then we say that φ is bounded if sup{‖φ(u, v)‖ : ‖u‖ ≤ 1, ‖v‖ ≤ 1} is finite, and in this case we call this supremum the norm ‖φ‖ of φ. Note that ‖φ(u, v)‖ ≤ ‖φ‖‖u‖‖v‖ for all u ∈ U, v ∈ V (because $\|\varphi(u,v)\| = \alpha\beta\|\varphi(\alpha^{-1}u,\beta^{-1}v)\|\le\alpha\beta\|\varphi\|$ whenever α > ‖u‖, β > ‖v‖). If W′ is another normed space and S : W → W′ is a bounded linear operator, then Sφ : U × V → W′ is a bounded bilinear map, and ‖Sφ‖ ≤ ‖S‖‖φ‖.

253B Definition The most important bilinear maps of this section are based on the following idea. Let f and g be real-valued functions. I will write f ⊗ g for the function (x, y) ↦ f(x)g(y) : dom f × dom g → R.

253C Proposition (a) Let X and Y be sets, and Σ, T σ-algebras of subsets of X, Y respectively. If f is a Σ-measurable real-valued function defined on a subset of X, and g is a T-measurable real-valued function defined on a subset of Y, then f ⊗ g, as defined in 253B, is $\Sigma\widehat\otimes\mathrm T$-measurable.
(b) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and λ the c.l.d. product measure on X × Y. If f ∈ ℒ⁰(µ) and g ∈ ℒ⁰(ν), then f ⊗ g ∈ ℒ⁰(λ).
Remark Recall from 241A that ℒ⁰(µ) is the space of µ-virtually measurable real-valued functions defined on µ-conegligible subsets of X.
proof (a) The point is that f ⊗ χY is $\Sigma\widehat\otimes\mathrm T$-measurable, because for any α ∈ R there is an E ∈ Σ such that {x : f(x) ≥ α} = E ∩ dom f, so that
$$\{(x,y) : (f\otimes\chi Y)(x,y)\ge\alpha\} = (E\cap\operatorname{dom}f)\times Y = (E\times Y)\cap\operatorname{dom}(f\otimes\chi Y),$$
and of course $E\times Y\in\Sigma\widehat\otimes\mathrm T$. Similarly, χX ⊗ g is $\Sigma\widehat\otimes\mathrm T$-measurable and f ⊗ g = (f ⊗ χY) × (χX ⊗ g) is $\Sigma\widehat\otimes\mathrm T$-measurable.
(b) Let E ∈ Σ, F ∈ T be conegligible subsets of X, Y respectively such that E ⊆ dom f, F ⊆ dom g, f↾E is Σ-measurable and g↾F is T-measurable. Write Λ for the domain of λ. Then $\Sigma\widehat\otimes\mathrm T\subseteq\Lambda$ (251Ia). Also E × F is λ-conegligible, because
$$\lambda((X\times Y)\setminus(E\times F))\le\lambda((X\setminus E)\times Y) + \lambda(X\times(Y\setminus F)) = \mu(X\setminus E)\cdot\nu Y + \mu X\cdot\nu(Y\setminus F) = 0$$
(also from 251Ia).
So dom(f ⊗ g) ⊇ E × F is conegligible. Also, by (a), (f ⊗ g)↾(E × F) = (f↾E) ⊗ (g↾F) is $\Sigma\widehat\otimes\mathrm T$-measurable, therefore Λ-measurable, and f ⊗ g is virtually measurable. Thus f ⊗ g ∈ ℒ⁰(λ), as claimed.

253D Now we can apply the ideas of 253B-253C to integrable functions.
Proposition Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and write λ for the c.l.d. product measure on X × Y. If f ∈ ℒ¹(µ) and g ∈ ℒ¹(ν), then f ⊗ g ∈ ℒ¹(λ) and $\int f\otimes g\,d\lambda = \int f\,d\mu\int g\,d\nu$.
Remark I follow §242 in writing ℒ¹(µ) for the space of µ-integrable real-valued functions.
proof (a) Consider first the case f = χE, g = χF where E ∈ Σ, F ∈ T have finite measure; then f ⊗ g = χ(E × F) is λ-integrable with integral $\lambda(E\times F) = \mu E\cdot\nu F = \int f\,d\mu\cdot\int g\,d\nu$, by 251Ia.
(b) It follows at once that f ⊗ g is λ-simple, with $\int f\otimes g\,d\lambda = \int f\,d\mu\int g\,d\nu$, whenever f is a µ-simple function and g is a ν-simple function.
(c) If f and g are non-negative integrable functions, there are non-decreasing sequences $\langle f_n\rangle_{n\in\mathbb N}$, $\langle g_n\rangle_{n\in\mathbb N}$ of non-negative simple functions converging almost everywhere to f, g respectively; now note that if E ⊆ X,
F ⊆ Y are conegligible, E × F is conegligible in X × Y, as remarked in the proof of 253C, so the non-decreasing sequence $\langle f_n\otimes g_n\rangle_{n\in\mathbb N}$ of λ-simple functions converges almost everywhere to f ⊗ g, and
$$\int f\otimes g\,d\lambda = \lim_{n\to\infty}\int f_n\otimes g_n\,d\lambda = \lim_{n\to\infty}\int f_n\,d\mu\int g_n\,d\nu = \int f\,d\mu\int g\,d\nu$$
by B.Levi’s theorem.
(d) Finally, for general f and g, we can express them as the differences $f^+-f^-$, $g^+-g^-$ of non-negative integrable functions, and see that
$$\int f\otimes g\,d\lambda = \int f^+\otimes g^+ - f^+\otimes g^- - f^-\otimes g^+ + f^-\otimes g^-\,d\lambda = \int f\,d\mu\int g\,d\nu.$$
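In the simplest case, finite spaces of point masses, λ gives each point (x, y) the mass µ{x}·ν{y} (by 251Ia, since singletons have finite measure), and 253D, together with the identity |f ⊗ g| = |f| ⊗ |g| used for the norm in 253E, reduces to the distributive law for finite sums. The masses and functions below are hypothetical; they are dyadic so that the floating-point sums happen to be exact.

```python
# Hypothetical finite measure spaces: X = {0, 1, 2}, Y = {0, 1} with point masses mu, nu.
mu = [0.5, 2.0, 1.5]
nu = [3.0, 0.25]
f = [1.0, -2.0, 0.5]  # a function on X
g = [-1.5, 4.0]       # a function on Y

def integral(h, masses):
    return sum(a * w for a, w in zip(h, masses))

def tensor(f, g):
    """(f ⊗ g)(x, y) = f(x) g(y), stored as a dict keyed by the point (x, y)."""
    return {(x, y): f[x] * g[y] for x in range(len(f)) for y in range(len(g))}

def product_integral(h):
    """Integral over X x Y when each point (x, y) carries mass mu[x] * nu[y]."""
    return sum(val * mu[x] * nu[y] for (x, y), val in h.items())

fg = tensor(f, g)
print(product_integral(fg), integral(f, mu) * integral(g, nu))  # 253D
abs_fg = {point: abs(val) for point, val in fg.items()}
print(product_integral(abs_fg),
      integral([abs(a) for a in f], mu) * integral([abs(b) for b in g], nu))  # |f ⊗ g| = |f| ⊗ |g|
```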
253E The canonical map L¹ × L¹ → L¹ I continue the argument from 253D. Because E × F is conegligible in X × Y whenever E and F are conegligible subsets of X and Y, $f_1\otimes g_1 = f\otimes g$ λ-a.e. whenever $f = f_1$ µ-a.e. and $g = g_1$ ν-a.e. We may therefore define u ⊗ v ∈ L¹(λ), for u ∈ L¹(µ) and v ∈ L¹(ν), by saying that $u\otimes v = (f\otimes g)^\bullet$ whenever $u = f^\bullet$ and $v = g^\bullet$. Now if $f, f_1, f_2\in\mathcal L^1(\mu)$, $g, g_1, g_2\in\mathcal L^1(\nu)$ and a ∈ R,
$$(f_1+f_2)\otimes g = (f_1\otimes g) + (f_2\otimes g),\qquad f\otimes(g_1+g_2) = (f\otimes g_1) + (f\otimes g_2),\qquad(af)\otimes g = a(f\otimes g) = f\otimes(ag).$$
It follows at once that the map (u, v) ↦ u ⊗ v is bilinear.
Moreover, if f ∈ ℒ¹(µ) and g ∈ ℒ¹(ν), |f| ⊗ |g| = |f ⊗ g|, so $\int|f\otimes g|\,d\lambda = \int|f|\,d\mu\int|g|\,d\nu$. Accordingly
$$\|u\otimes v\|_1 = \|u\|_1\|v\|_1$$
for all u ∈ L¹(µ), v ∈ L¹(ν). In particular, the bilinear map ⊗ is bounded, with norm 1 (except in the trivial case in which one of L¹(µ), L¹(ν) is 0-dimensional).

253F We are now ready for the main theorem of this section.
Theorem Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and let λ be the c.l.d. product measure on X × Y. Let W be any Banach space and φ : L¹(µ) × L¹(ν) → W a bounded bilinear map. Then there is a unique bounded linear operator T : L¹(λ) → W such that T(u ⊗ v) = φ(u, v) for all u ∈ L¹(µ), v ∈ L¹(ν), and ‖T‖ = ‖φ‖.
proof (a) The centre of the argument is the following fact: if $E_0,\dots,E_n$ are measurable sets of finite measure in X, $F_0,\dots,F_n$ are measurable sets of finite measure in Y, $a_0,\dots,a_n\in\mathbb R$ and $\sum_{i=0}^na_i\chi(E_i\times F_i) = 0$ λ-a.e., then $\sum_{i=0}^na_i\varphi(\chi E_i^\bullet,\chi F_i^\bullet) = 0$ in W. ▶ We can find a disjoint family $\langle G_j\rangle_{j\le m}$ of measurable sets of finite measure in X such that each $E_i$ is expressible as a union of some subfamily of the $G_j$; so that $\chi E_i$ is expressible in the form $\sum_{j=0}^mb_{ij}\chi G_j$ (see 122Ca). Similarly, we can find a disjoint family $\langle H_k\rangle_{k\le l}$ of measurable sets of finite measure in Y such that each $\chi F_i$ is expressible as $\sum_{k=0}^lc_{ik}\chi H_k$. Now
$$\sum_{j=0}^m\sum_{k=0}^l\Bigl(\sum_{i=0}^na_ib_{ij}c_{ik}\Bigr)\chi(G_j\times H_k) = \sum_{i=0}^na_i\chi(E_i\times F_i) = 0\quad\lambda\text{-a.e.}$$
Because the $G_j\times H_k$ are disjoint, and $\lambda(G_j\times H_k) = \mu G_j\cdot\nu H_k$ for all j, k, it follows that for every j ≤ m, k ≤ l we have either $\mu G_j = 0$ or $\nu H_k = 0$ or $\sum_{i=0}^na_ib_{ij}c_{ik} = 0$. In any of these three cases, $\sum_{i=0}^na_ib_{ij}c_{ik}\varphi(\chi G_j^\bullet,\chi H_k^\bullet) = 0$ in W. But this means that
$$0 = \sum_{j=0}^m\sum_{k=0}^l\Bigl(\sum_{i=0}^na_ib_{ij}c_{ik}\varphi(\chi G_j^\bullet,\chi H_k^\bullet)\Bigr) = \sum_{i=0}^na_i\varphi(\chi E_i^\bullet,\chi F_i^\bullet),$$
as claimed. ◀
(b) It follows that if $E_0,\dots,E_n,E'_0,\dots,E'_m$ are measurable sets of finite measure in X, $F_0,\dots,F_n,F'_0,\dots,F'_m$ are measurable sets of finite measure in Y, $a_0,\dots,a_n,a'_0,\dots,a'_m\in\mathbb R$ and $\sum_{i=0}^na_i\chi(E_i\times F_i) = \sum_{i=0}^ma'_i\chi(E'_i\times F'_i)$ λ-a.e., then
$$\sum_{i=0}^na_i\varphi(\chi E_i^\bullet,\chi F_i^\bullet) = \sum_{i=0}^ma'_i\varphi(\chi E_i'^\bullet,\chi F_i'^\bullet)$$
in W. Let M be the linear subspace of L¹(λ) generated by
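$\{\chi(E\times F)^\bullet : E\in\Sigma,\ \mu E<\infty,\ F\in\mathrm T,\ \nu F<\infty\}$; then we have a unique map $T_0 : M\to W$ such that
$$T_0\Bigl(\sum_{i=0}^na_i\chi(E_i\times F_i)^\bullet\Bigr) = \sum_{i=0}^na_i\varphi(\chi E_i^\bullet,\chi F_i^\bullet)$$
whenever $E_0,\dots,E_n$ are measurable sets of finite measure in X, $F_0,\dots,F_n$ are measurable sets of finite measure in Y and $a_0,\dots,a_n\in\mathbb R$. Of course $T_0$ is linear.
(c) Some of the same calculations show that $\|T_0u\|\le\|\varphi\|\|u\|_1$ for every u ∈ M. ▶ If u ∈ M, then, by the arguments of (a), we can express u as $\sum_{j=0}^m\sum_{k=0}^la_{jk}\chi(G_j\times H_k)^\bullet$, where $\langle G_j\rangle_{j\le m}$ and $\langle H_k\rangle_{k\le l}$ are disjoint families of sets of finite measure. Now
$$\begin{aligned}\|T_0u\| &= \Bigl\|\sum_{j=0}^m\sum_{k=0}^la_{jk}\varphi(\chi G_j^\bullet,\chi H_k^\bullet)\Bigr\|\le\sum_{j=0}^m\sum_{k=0}^l|a_{jk}|\,\|\varphi(\chi G_j^\bullet,\chi H_k^\bullet)\|\\ &\le\sum_{j=0}^m\sum_{k=0}^l|a_{jk}|\,\|\varphi\|\,\|\chi G_j^\bullet\|_1\|\chi H_k^\bullet\|_1 = \|\varphi\|\sum_{j=0}^m\sum_{k=0}^l|a_{jk}|\,\mu G_j\cdot\nu H_k\\ &= \|\varphi\|\sum_{j=0}^m\sum_{k=0}^l|a_{jk}|\,\lambda(G_j\times H_k) = \|\varphi\|\,\|u\|_1,\end{aligned}$$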
as claimed. ◀
(d) The next point is to observe that M is dense in L¹(λ) for ‖ ‖₁. ▶ Repeating the ideas above once again, we observe that if $E_0,\dots,E_n$ are sets of finite measure in X and $F_0,\dots,F_n$ are sets of finite measure in Y, then $\chi(\bigcup_{i\le n}E_i\times F_i)^\bullet\in M$; this is because, expressing each $E_i$ as a union of $G_j$, where the $G_j$ are disjoint, we have
$$\bigcup_{i\le n}E_i\times F_i = \bigcup_{j\le m}G_j\times F'_j,$$
where $F'_j = \bigcup\{F_i : G_j\subseteq E_i\}$ for each j; now $\langle G_j\times F'_j\rangle_{j\le m}$ is disjoint, so
$$\chi\Bigl(\bigcup_{j\le m}G_j\times F'_j\Bigr)^\bullet = \sum_{j=0}^m\chi(G_j\times F'_j)^\bullet\in M.$$
So 251Ie tells us that whenever λH < ∞ and ε > 0 there is a G such that λ(H △ G) ≤ ε and $\chi G^\bullet\in M$; now $\|\chi H^\bullet - \chi G^\bullet\|_1 = \lambda(G\triangle H)\le\epsilon$, so $\chi H^\bullet$ is approximated arbitrarily closely by members of M, and belongs to the closure $\overline M$ of M in L¹(λ). Because M is a linear subspace of L¹(λ), so is $\overline M$ (2A4Cb); accordingly $\overline M$ contains the equivalence classes of all λ-simple functions; but these are dense in L¹(λ) (242M), so $\overline M = L^1(\lambda)$, as claimed. ◀
(e) Because W is a Banach space, it follows that there is a bounded linear operator T : L¹(λ) → W extending $T_0$, with $\|T\| = \|T_0\|\le\|\varphi\|$ (2A4I). Now T(u ⊗ v) = φ(u, v) for all u ∈ L¹(µ), v ∈ L¹(ν). ▶ If $u = \chi E^\bullet$, $v = \chi F^\bullet$, where E and F are measurable sets of finite measure, then
$$T(u\otimes v) = T(\chi(E\times F)^\bullet) = T_0(\chi(E\times F)^\bullet) = \varphi(\chi E^\bullet,\chi F^\bullet) = \varphi(u,v).$$
Because φ and ⊗ are bilinear and T is linear, $T(f^\bullet\otimes g^\bullet) = \varphi(f^\bullet,g^\bullet)$ whenever f and g are simple functions. Now whenever u ∈ L¹(µ), v ∈ L¹(ν) and ε > 0, there are simple functions f, g such that $\|u-f^\bullet\|_1\le\epsilon$, $\|v-g^\bullet\|_1\le\epsilon$ (242M again); so that
$$\|\varphi(u,v)-\varphi(f^\bullet,g^\bullet)\|\le\|\varphi(u-f^\bullet,v-g^\bullet)\| + \|\varphi(u,g^\bullet-v)\| + \|\varphi(f^\bullet-u,v)\|\le\|\varphi\|(\epsilon^2+\epsilon\|u\|_1+\epsilon\|v\|_1).$$
Similarly $\|u\otimes v-f^\bullet\otimes g^\bullet\|_1\le\epsilon(\epsilon+\|u\|_1+\|v\|_1)$, so
253G
Tensor products
237
$$\|T(u\otimes v)-T(f^\bullet\otimes g^\bullet)\|\le\epsilon\|T\|(\epsilon+\|u\|_1+\|v\|_1);$$
because $T(f^\bullet\otimes g^\bullet)=\phi(f^\bullet,g^\bullet)$,
$$\|T(u\otimes v)-\phi(u,v)\|\le\epsilon(\|T\|+\|\phi\|)(\epsilon+\|u\|_1+\|v\|_1).$$
As $\epsilon$ is arbitrary, $T(u\otimes v)=\phi(u,v)$, as required. Q Q

(f) The argument of (e) ensured that $\|T\|\le\|\phi\|$. Because $\|u\otimes v\|_1\le\|u\|_1\|v\|_1$ for all $u\in L^1(\mu)$, $v\in L^1(\nu)$, $\|\phi(u,v)\|\le\|T\|\,\|u\|_1\|v\|_1$ for all $u$, $v$, and $\|\phi\|\le\|T\|$; so $\|T\|=\|\phi\|$.

(g) Thus $T$ has the required properties. To see that it is unique, we have only to observe that any bounded linear operator $S : L^1(\lambda)\to W$ such that $S(u\otimes v)=\phi(u,v)$ for all $u\in L^1(\mu)$, $v\in L^1(\nu)$ must agree with $T$ on objects of the form $\chi(E\times F)^\bullet$ where $E$ and $F$ are of finite measure, and therefore on every member of $M$; because $M$ is dense and both $S$ and $T$ are continuous, they agree everywhere in $L^1(\lambda)$.

253G The order structure of $L^1$ In 253F I have treated the $L^1$ spaces exclusively as normed linear spaces. In general, however, the order structure of an $L^1$ space (see 242C) is as important as its norm. The map $\otimes : L^1(\mu)\times L^1(\nu)\to L^1(\lambda)$ respects the order structures of the three spaces in the following strong sense.

Proposition Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X\times Y$. Then
(a) $u\otimes v\ge0$ in $L^1(\lambda)$ whenever $u\ge0$ in $L^1(\mu)$ and $v\ge0$ in $L^1(\nu)$.
(b) The positive cone $\{w : w\ge0\}$ of $L^1(\lambda)$ is precisely the closed convex hull $C$ of $\{u\otimes v : u\ge0,\ v\ge0\}$ in $L^1(\lambda)$.
*(c) Let $W$ be any Banach lattice, and $T : L^1(\lambda)\to W$ a bounded linear operator. Then the following are equiveridical:
(i) $Tw\ge0$ in $W$ whenever $w\ge0$ in $L^1(\lambda)$;
(ii) $T(u\otimes v)\ge0$ in $W$ whenever $u\ge0$ in $L^1(\mu)$ and $v\ge0$ in $L^1(\nu)$.

proof (a) If $u,v\ge0$ then they are expressible as $f^\bullet$, $g^\bullet$ where $f\in\mathcal{L}^1(\mu)$, $g\in\mathcal{L}^1(\nu)$ and $f\ge0$, $g\ge0$. Now $f\otimes g\ge0$ so $u\otimes v=(f\otimes g)^\bullet\ge0$.

(b)(i) Write $L^1(\lambda)^+$ for $\{w : w\in L^1(\lambda),\ w\ge0\}$. Then $L^1(\lambda)^+$ is a closed convex set in $L^1(\lambda)$ (242De); by (a), it contains $u\otimes v$ whenever $u\in L^1(\mu)^+$, $v\in L^1(\nu)^+$, so it must include $C$.
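(ii)(α) Of course $0=0\otimes0\in C$.

(β) If $u\in M$, as defined in the proof of 253F, and $u>0$, then $u$ is expressible as $\sum_{j\le m,k\le l}a_{jk}\chi(G_j\times H_k)^\bullet$, where $G_0,\dots,G_m$ and $H_0,\dots,H_l$ are disjoint sequences of sets of finite measure, as in (a) of the proof of 253F. Now $a_{jk}$ can be negative only if $\chi(G_j\times H_k)^\bullet=0$ (because $u\ge0$ and the sets $G_j\times H_k$ are disjoint), so replacing every $a_{jk}$ by $\max(0,a_{jk})$ if necessary, we can suppose that $a_{jk}\ge0$ for all $j$, $k$. Not all the $a_{jk}$ can be zero, so $a=\sum_{j\le m,k\le l}a_{jk}>0$, and, since the coefficients $a_{jk}/a$ are non-negative and sum to 1,
$$u=\sum_{j\le m,k\le l}\frac{a_{jk}}{a}\cdot a\,\chi(G_j\times H_k)^\bullet=\sum_{j\le m,k\le l}\frac{a_{jk}}{a}\cdot(a\,\chi G_j^\bullet)\otimes\chi H_k^\bullet\in C.$$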
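In the simplest finite situation the rewriting in (β) can be carried out mechanically. The sketch below is only an illustration under stated assumptions: $X$ and $Y$ are finite, each $G_j$ and $H_k$ is a single point carrying positive mass (so $u\ge0$ forces every $a_{jk}\ge0$), and functions are stored as Python lists; the names `tensor`, `convex_decomposition` and `combine` are hypothetical. It returns the weights $a_{jk}/a$ together with the functions $(a\,\chi G_j)\otimes\chi H_k$ and checks that their convex combination reproduces $u$.

```python
from fractions import Fraction


def tensor(f, g):
    """(f ⊗ g)(x, y) = f(x) g(y) for functions on finite sets stored as lists."""
    return [[fx * gy for gy in g] for fx in f]


def convex_decomposition(a):
    """Step (beta) on a finite grid (illustrative assumption: every cell is one point).

    a[j][k] is the value of u on the cell G_j x H_k.  Returns (weight, matrix)
    pairs: the weights a_jk / a are non-negative and sum to 1, and each matrix
    is (a * chi G_j) ⊗ chi H_k, where a is the sum of all the a_jk.
    """
    if any(x < 0 for row in a for x in row):
        raise ValueError("u >= 0 forces every a_jk >= 0 when all cells have positive mass")
    total = sum(x for row in a for x in row)
    if total == 0:
        raise ValueError("u must be non-zero (u > 0)")
    m, l = len(a), len(a[0])
    terms = []
    for j in range(m):
        for k in range(l):
            scaled_chi_G = [total if i == j else 0 for i in range(m)]  # a * chi G_j
            chi_H = [1 if i == k else 0 for i in range(l)]             # chi H_k
            weight = Fraction(a[j][k]) / Fraction(total)               # a_jk / a
            terms.append((weight, tensor(scaled_chi_G, chi_H)))
    return terms


def combine(terms):
    """Evaluate the combination sum of weight * matrix, entry by entry."""
    m, l = len(terms[0][1]), len(terms[0][1][0])
    return [[sum(w * t[r][c] for w, t in terms) for c in range(l)] for r in range(m)]


a = [[1, 0], [2, 3]]
terms = convex_decomposition(a)
print(sum(w for w, _ in terms))   # 1
print(combine(terms) == a)        # True
```

Exact `Fraction` arithmetic is used so that the reconstruction can be compared with `==` rather than up to rounding.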
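(γ) If $w\in L^1(\lambda)^+$ and $\epsilon>0$, express $w$ as $h^\bullet$ where $h\ge0$ in $\mathcal{L}^1(\lambda)$. There is a simple function $h_1\ge0$ such that $h_1\le_{\text{a.e.}}h$ and $\int h\le\int h_1+\epsilon$. Express $h_1$ as $\sum_{i=0}^n a_i\chi H_i$ where $\lambda H_i<\infty$, $a_i\ge0$ for each $i$, and for each $i\le n$ choose sets $G_{i0},\dots,G_{im_i}\in\Sigma$, $F_{i0},\dots,F_{im_i}\in\mathrm{T}$, all of finite measure, such that $G_{i0},\dots,G_{im_i}$ are disjoint and
$$\lambda\Big(H_i\triangle\bigcup_{j\le m_i}G_{ij}\times F_{ij}\Big)\le\frac{\epsilon}{(n+1)(a_i+1)},$$
as in (d) of the proof of 253F. Set
$$w_0=\sum_{i=0}^n a_i\sum_{j=0}^{m_i}\chi(G_{ij}\times F_{ij})^\bullet.$$
Then $w_0\in C$ because $w_0\in M$ and $w_0\ge0$. Also
$$\begin{aligned}
\|w-w_0\|_1&\le\|w-h_1^\bullet\|_1+\|h_1^\bullet-w_0\|_1\\
&\le\int(h-h_1)\,d\lambda+\sum_{i=0}^n a_i\int\Big|\chi H_i-\sum_{j=0}^{m_i}\chi(G_{ij}\times F_{ij})\Big|\,d\lambda\\
&\le\epsilon+\sum_{i=0}^n a_i\,\lambda\Big(H_i\triangle\bigcup_{j\le m_i}G_{ij}\times F_{ij}\Big)\le2\epsilon.
\end{aligned}$$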
As $\epsilon$ is arbitrary and $C$ is closed, $w\in C$. As $w$ is arbitrary, $L^1(\lambda)^+\subseteq C$ and $C=L^1(\lambda)^+$.

(c) Part (a) tells us that (i)$\Rightarrow$(ii). For the reverse implication, we need a fragment from the theory of Banach lattices: $W^+=\{w : w\in W,\ w\ge0\}$ is a closed set in $W$. P P If $w,w'\in W$, then
$$\begin{aligned}
w&=(w-w')+w'\le|w-w'|+w'\le|w-w'|+|w'|,\\
-w&=(w'-w)-w'\le|w-w'|-w'\le|w-w'|+|w'|,\\
|w|&\le|w-w'|+|w'|,\qquad|w|-|w'|\le|w-w'|,
\end{aligned}$$
because $|w|=w\vee(-w)$ and the order of $W$ is translation-invariant (241Ec). Similarly, $|w'|-|w|\le|w-w'|$ and $\big||w|-|w'|\big|\le|w-w'|$, so $\big\||w|-|w'|\big\|\le\|w-w'\|$, by the definition of Banach lattice (242G). Setting $\phi(w)=|w|-w$, we see that $\|\phi(w)-\phi(w')\|\le2\|w-w'\|$ for all $w,w'\in W$, so that $\phi$ is continuous. Now, because the order is invariant under multiplication by positive scalars,
$$w\ge0\iff2w\ge0\iff w\ge-w\iff w=|w|\iff\phi(w)=0,$$
so $W^+=\{w : \phi(w)=0\}$ is closed. Q Q Now suppose that (ii) is true, and set $C_1=\{w : w\in L^1(\lambda),\ Tw\ge0\}$. Then $C_1$ contains $u\otimes v$ whenever $u,v\ge0$; but also it is convex, because $T$ is linear, and closed, because $T$ is continuous and $C_1=T^{-1}[W^+]$. By (b), $C_1$ includes $\{w : w\in L^1(\lambda),\ w\ge0\}$, as required by (i).

253H Conditional expectations The ideas of this section and the preceding one provide us with some of the most important examples of conditional expectations.

Theorem Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be complete probability spaces, with c.l.d. product $(X\times Y,\Lambda,\lambda)$. Set $\Lambda_1=\{E\times Y : E\in\Sigma\}$. Then $\Lambda_1$ is a $\sigma$-subalgebra of $\Lambda$. Given a $\lambda$-integrable real-valued function $f$, set $g(x,y)=$
$\int f(x,z)\,\nu(dz)$ whenever this is defined. Then $g$ is a conditional expectation of $f$ on $\Lambda_1$.

proof We know that $\Lambda_1\subseteq\Lambda$, by 251Ia, and $\Lambda_1$ is a $\sigma$-algebra of sets because $\Sigma$ is. Fubini's theorem (252B, 252C) tells us that $f_1(x)=\int f(x,z)\,\nu(dz)$ is defined for almost every $x$, and therefore that $g=f_1\otimes\chi Y$ is defined almost everywhere in $X\times Y$. $f_1$ is $\mu$-virtually measurable; because $\mu$ is complete, $f_1$ is $\Sigma$-measurable, so $g$ is $\Lambda_1$-measurable (since $\{(x,y) : g(x,y)\le\alpha\}=\{x : f_1(x)\le\alpha\}\times Y$ for every $\alpha\in\mathbb{R}$). Finally, if $W\in\Lambda_1$, then $W=E\times Y$ for some $E\in\Sigma$, so
$$\begin{aligned}
\int_W g\,d\lambda&=\int(f_1\otimes\chi Y)\times(\chi E\otimes\chi Y)\,d\lambda=\int f_1\times\chi E\,d\mu\int\chi Y\,d\nu&&\text{(by 253D)}\\
&=\iint\chi E(x)f(x,y)\,\nu(dy)\,\mu(dx)=\int f\times\chi(E\times Y)\,d\lambda&&\text{(by Fubini's theorem)}\\
&=\int_W f\,d\lambda.
\end{aligned}$$
So g is a conditional expectation of f . 253I
This is a convenient moment to set out a useful result on indefinite-integral measures.
Proposition Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $f\in\mathcal{L}^0(\mu)$, $g\in\mathcal{L}^0(\nu)$ non-negative functions. Let $\mu'$, $\nu'$ be the corresponding indefinite-integral measures (see §234). Let $\lambda$ be the c.l.d. product of $\mu$ and $\nu$, and $\lambda'$ the indefinite-integral measure defined from $\lambda$ and $f\otimes g\in\mathcal{L}^0(\lambda)$ (253Cb). Then $\lambda'$ is the c.l.d. product of $\mu'$ and $\nu'$.

proof Write $\theta$ for the c.l.d. product of $\mu'$ and $\nu'$.
(a) If we replace $\mu$ by its completion, we do not change the integral $\int\,d\mu$ (212Fb), so (by the definition in 234A) we do not change $\mu'$; at the same time, we do not change $\lambda$, by 251S. The same applies to $\nu$. So it will be enough to prove the result on the assumption that $\mu$ and $\nu$ are complete; in which case $f$ and $g$ are measurable and have measurable domains. Set $F=\{x : x\in\operatorname{dom}f,\ f(x)>0\}$ and $G=\{y : y\in\operatorname{dom}g,\ g(y)>0\}$, so that $F\times G=\{w : w\in\operatorname{dom}(f\otimes g),\ (f\otimes g)(w)>0\}$. Then $F$ is $\mu'$-conegligible and $G$ is $\nu'$-conegligible, so $F\times G$ is $\theta$-conegligible as well as $\lambda'$-conegligible. Because both $\theta$ and $\lambda'$ are complete (251Ic, 234A), it will be enough to show that the subspace measures $\theta_{F\times G}$, $\lambda'_{F\times G}$ on $F\times G$ are equal. But note that $\theta_{F\times G}$ can be identified with the product of $\mu'_F$ and $\nu'_G$, where $\mu'_F$ and $\nu'_G$ are the subspace measures on $F$, $G$ respectively (251P(ii-α)). At the same time, $\mu'_F$ is the indefinite-integral measure defined from the subspace measure $\mu_F$ on $F$ and the function $f\restriction F$, $\nu'_G$ is the indefinite-integral measure defined from the subspace measure $\nu_G$ on $G$ and $g\restriction G$, and $\lambda'_{F\times G}$ is defined from the subspace measure $\lambda_{F\times G}$ and $(f\restriction F)\otimes(g\restriction G)$. Finally, by 251P again, $\lambda_{F\times G}$ is the product of $\mu_F$ and $\nu_G$. What all this means is that it will be enough to deal with the case in which $F=X$ and $G=Y$, that is, $f$ and $g$ are everywhere defined and strictly positive; which is what I will suppose from now on.

(b) In this case $\operatorname{dom}\mu'=\Sigma$ and $\operatorname{dom}\nu'=\mathrm{T}$ (234Da). Similarly, $\operatorname{dom}\lambda'=\Lambda$ is just the domain of $\lambda$. Set
$$F_n=\{x : x\in X,\ 2^{-n}\le f(x)\le2^n\},\qquad G_n=\{y : y\in Y,\ 2^{-n}\le g(y)\le2^n\}$$
for $n\in\mathbb{N}$.

(c) Set $A=\{W : W\in\operatorname{dom}\theta\cap\operatorname{dom}\lambda',\ \theta W=\lambda'W\}$. If $\mu'E$ and $\nu'H$ are defined and finite, then $f\times\chi E$ and $g\times\chi H$ are integrable, so
$$\lambda'(E\times H)=\int(f\otimes g)\times\chi(E\times H)\,d\lambda=\int(f\times\chi E)\otimes(g\times\chi H)\,d\lambda=\int f\times\chi E\,d\mu\cdot\int g\times\chi H\,d\nu=\theta(E\times H)$$
by 253D and 251Ia, that is, $E\times H\in A$. If we now look at $A_{EH}=\{W : W\subseteq X\times Y,\ W\cap(E\times H)\in A\}$, then we see that
$A_{EH}$ contains $E'\times H'$ for every $E'\in\Sigma$, $H'\in\mathrm{T}$,
if $\langle W_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence in $A_{EH}$ then $\bigcup_{n\in\mathbb{N}}W_n\in A_{EH}$,
if $W,W'\in A_{EH}$ and $W\subseteq W'$ then $W'\setminus W\in A_{EH}$.
Thus $A_{EH}$ is a Dynkin class of subsets of $X\times Y$, and by the Monotone Class Theorem (136B) includes the $\sigma$-algebra generated by $\{E'\times H' : E'\in\Sigma,\ H'\in\mathrm{T}\}$, which is $\Sigma\widehat{\otimes}\mathrm{T}$.

(d) Now suppose that $W\in\Lambda$. In this case $W\in\operatorname{dom}\theta$ and $\theta W\le\lambda'W$. P P Take $n\in\mathbb{N}$, and $E\in\operatorname{dom}\mu'$, $H\in\operatorname{dom}\nu'$ such that $\mu'E$ and $\nu'H$ are both finite. Set $E'=E\cap F_n$, $H'=H\cap G_n$ and $W'=W\cap(E'\times H')$. Then $W'\in\Lambda$, while $\mu E'\le2^n\mu'E$ and $\nu H'\le2^n\nu'H$ are finite. By 251Ib there is a $V\in\Sigma\widehat{\otimes}\mathrm{T}$ such that $V\subseteq W'$ and $\lambda V=\lambda W'$. Similarly, there is a $V'\in\Sigma\widehat{\otimes}\mathrm{T}$ such that $V'\subseteq(E'\times H')\setminus W'$ and $\lambda V'=\lambda((E'\times H')\setminus W')$. This means that $\lambda((E'\times H')\setminus(V\cup V'))=0$, so $\lambda'((E'\times H')\setminus(V\cup V'))=0$. But $(E'\times H')\setminus(V\cup V')\in A$, by (c), so $\theta((E'\times H')\setminus(V\cup V'))=0$ and $W'\in\operatorname{dom}\theta$, while $\theta W'=\theta V=\lambda'V\le\lambda'W$. Since $E$ and $H$ are arbitrary, $W\cap(F_n\times G_n)\in\operatorname{dom}\theta$ (251H) and $\theta(W\cap(F_n\times G_n))\le\lambda'W$. Since $\langle F_n\rangle_{n\in\mathbb{N}}$, $\langle G_n\rangle_{n\in\mathbb{N}}$ are non-decreasing sequences with unions $X$, $Y$ respectively, $\theta W=\sup_{n\in\mathbb{N}}\theta(W\cap(F_n\times G_n))\le\lambda'W$. Q Q

(e) In the same way, $\lambda'W$ is defined and less than or equal to $\theta W$ for every $W\in\operatorname{dom}\theta$. P P The arguments are very similar, but a refinement seems to be necessary at the last stage. Take $n\in\mathbb{N}$, and $E\in\Sigma$, $H\in\mathrm{T}$ such that $\mu E$ and $\nu H$ are both finite. Set $E'=E\cap F_n$, $H'=H\cap G_n$ and $W'=W\cap(E'\times H')$. Then $W'\in\operatorname{dom}\theta$, while $\mu'E'\le2^n\mu E$ and $\nu'H'\le2^n\nu H$ are finite. This time, there are $V,V'\in\Sigma\widehat{\otimes}\mathrm{T}$ such that $V\subseteq W'$, $V'\subseteq(E'\times H')\setminus W'$, $\theta V=\theta W'$ and $\theta V'=\theta((E'\times H')\setminus W')$. Accordingly
$$\lambda'V+\lambda'V'=\theta V+\theta V'=\theta(E'\times H')=\lambda'(E'\times H'),$$
so that $\lambda'W'$ is defined and equal to $\theta W'$. What this means is that $W\cap(F_n\times G_n)\cap(E\times H)\in A$ whenever $\mu E$ and $\nu H$ are finite. So $W\cap(F_n\times G_n)\in\Lambda$, by 251H; as $n$ is arbitrary, $W\in\Lambda$ and $\lambda'W$ is defined.

?? Suppose, if possible, that $\lambda'W>\theta W$. Then there is some $n\in\mathbb{N}$ such that $\lambda'(W\cap(F_n\times G_n))>\theta W$. Because $\lambda$ is semi-finite, 213B tells us that there is some $\lambda$-simple function $h$ such that $h\le(f\otimes g)\times\chi(W\cap(F_n\times G_n))$ and $\int h\,d\lambda>\theta W$; setting $V=\{(x,y) : h(x,y)>0\}$, we see that $V\subseteq W\cap(F_n\times G_n)$, $\lambda V$ is defined and finite and $\lambda'V>\theta W$. Now there must be sets $E\in\Sigma$, $H\in\mathrm{T}$ such that $\mu E$ and $\nu H$ are both finite and $\lambda(V\setminus(E\times H))<4^{-n}(\lambda'V-\theta W)$. But in this case $V\in\Lambda\subseteq\operatorname{dom}\theta$ (by (d)), so we can apply the argument just above to $V$ and conclude that $V\cap(E\times H)=V\cap(F_n\times G_n)\cap(E\times H)$ belongs to $A$. And now, because $f\otimes g\le4^n$ on $F_n\times G_n$,
$$\lambda'V=\lambda'(V\cap(E\times H))+\lambda'(V\setminus(E\times H))\le\theta(V\cap(E\times H))+4^n\lambda(V\setminus(E\times H))<\theta V+\lambda'V-\theta W\le\lambda'V,$$
which is absurd. X X So $\lambda'W$ is defined and not greater than $\theta W$. Q Q

(f) Putting this together with (d), we see that $\lambda'=\theta$, as claimed.

Remark If $\mu'$ and $\nu'$ are totally finite, so that they are ‘truly continuous’ with respect to $\mu$ and $\nu$ in the sense of 232Ab, then $f$ and $g$ are integrable, so $f\otimes g$ is $\lambda$-integrable, and $\theta=\lambda'$ is truly continuous with respect to $\lambda$. The proof above can be simplified using a fragment of the general theory of complete locally determined spaces, which will be given in §412 in Volume 4.

*253J Upper integrals The idea of 253D can be repeated in terms of upper integrals, as follows.

Proposition Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be $\sigma$-finite measure spaces, with c.l.d. product measure $\lambda$. Then for any functions $f$ and $g$, defined on conegligible subsets of $X$ and $Y$ respectively, and taking values in $[0,\infty]$,
$$\overline{\int}f\otimes g\,d\lambda=\overline{\int}f\,d\mu\cdot\overline{\int}g\,d\nu.$$
Remark Here $(f\otimes g)(x,y)=f(x)g(y)$ for all $x\in\operatorname{dom}f$, $y\in\operatorname{dom}g$, taking $0\cdot\infty=0$, as in §135.
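The convention $0\cdot\infty=0$ in the Remark is what keeps the identity valid in degenerate cases. As a sanity check, here is a sketch for finite discrete spaces only, assuming every subset is measurable and every point has finite mass, so that the upper integral of a $[0,\infty]$-valued function reduces to the weighted sum $\sum_x f(x)\mu\{x\}$; the helper names are hypothetical. Python's own `0 * math.inf` is `nan`, so the convention has to be coded explicitly.

```python
import math


def ext_mul(a, b):
    """Multiplication on [0, inf] with the convention 0 * inf = 0 (as in §135)."""
    if a == 0 or b == 0:
        return 0.0
    return a * b


def integral(f, mass):
    """Sum of f(x) * mass(x) over a finite space; in this finite setting it is
    assumed to coincide with the upper integral of f."""
    return sum(ext_mul(fx, mx) for fx, mx in zip(f, mass))


def tensor(f, g):
    """(f ⊗ g)(x, y) = f(x) g(y), flattened row by row, using 0 * inf = 0."""
    return [ext_mul(fx, gy) for fx in f for gy in g]


def product_mass(mu, nu):
    """Point masses of the product measure on X x Y, flattened the same way."""
    return [mx * ny for mx in mu for ny in nu]


# f is 0 where mu has mass and infinite on a mu-null point; g has infinite integral.
f, mu = [0.0, math.inf], [1.0, 0.0]
g, nu = [math.inf], [1.0]
print(0 * math.inf)                                   # nan: the naive product fails
print(integral(tensor(f, g), product_mass(mu, nu)))   # 0.0
print(ext_mul(integral(f, mu), integral(g, nu)))      # 0.0
```

Nothing here touches the real content of 253J, which is about non-measurable $f$, $g$ and needs $\sigma$-finiteness; the sketch only shows the arithmetic convention at work.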
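proof (a) I show first that $\overline{\int}f\otimes g\le\overline{\int}f\,\overline{\int}g$. P P If $\overline{\int}f=0$, then $f=_{\text{a.e.}}0$, so $f\otimes g=_{\text{a.e.}}0$ and the result is immediate. The same argument applies if $\overline{\int}g=0$. If both $\overline{\int}f$ and $\overline{\int}g$ are non-zero, and either is infinite, the result is trivial. So let us suppose that both are finite. In this case there are integrable $f_0$, $g_0$ such that $f\le_{\text{a.e.}}f_0$, $g\le_{\text{a.e.}}g_0$, $\overline{\int}f=\int f_0$ and $\overline{\int}g=\int g_0$ (133J). So $f\otimes g\le_{\text{a.e.}}f_0\otimes g_0$, and
$$\overline{\int}f\otimes g\le\int f_0\otimes g_0=\int f_0\int g_0=\overline{\int}f\,\overline{\int}g,$$
by 253D. Q Q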
(b) For the reverse inequality, we need consider only the case in which $\overline{\int}f\otimes g$ is finite, so that there is a $\lambda$-integrable function $h$ such that $f\otimes g\le_{\text{a.e.}}h$ and $\overline{\int}f\otimes g=\int h$. Set
$$f_0(x)=\int h(x,y)\,\nu(dy)$$
whenever this is defined in $\mathbb{R}$, which is almost everywhere, by Fubini's theorem (252B-252C). Then $f_0(x)\ge f(x)\,\overline{\int}g\,d\nu$ for almost every $x\in\operatorname{dom}f_0\cap\operatorname{dom}f$, which is a conegligible set in $X$; so
$$\overline{\int}f\otimes g=\int h\,d\lambda=\int f_0\,d\mu\ge\overline{\int}f\,\overline{\int}g,$$
as required.

*253K
A similar argument applies to upper integrals of sums, as follows.
Proposition Let (X, Σ, µ) and (Y, T, ν) be probability spaces, with c.l.d. product measure λ. Then for any real-valued functions f , g defined on conegligible subsets of X, Y respectively,
$$\overline{\int}f(x)+g(y)\,\lambda(d(x,y))=\overline{\int}f(x)\,\mu(dx)+\overline{\int}g(y)\,\nu(dy),$$
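at least when the right-hand side is defined in $[-\infty,\infty]$.

proof Set $h(x,y)=f(x)+g(y)$ for $x\in\operatorname{dom}f$, $y\in\operatorname{dom}g$, so that $\operatorname{dom}h$ is $\lambda$-conegligible.

(a) As in 253J, I start by showing that $\overline{\int}h\le\overline{\int}f+\overline{\int}g$. P P If either $\overline{\int}f$ or $\overline{\int}g$ is $\infty$, this is trivial. Otherwise, take integrable functions $f_0$, $g_0$ such that $f\le_{\text{a.e.}}f_0$ and $g\le_{\text{a.e.}}g_0$. Set $h_0=(f_0\otimes\chi Y)+(\chi X\otimes g_0)$; then $h\le h_0$ $\lambda$-a.e., so
$$\overline{\int}h\,d\lambda\le\int h_0\,d\lambda=\int f_0\,d\mu+\int g_0\,d\nu.$$
As $f_0$, $g_0$ are arbitrary, $\overline{\int}h\le\overline{\int}f+\overline{\int}g$. Q Q

(b) For the reverse inequality, suppose that $h\le h_0$ for $\lambda$-almost every $(x,y)$, where $h_0$ is $\lambda$-integrable. Set $f_0(x)=\int h_0(x,y)\,\nu(dy)$ whenever this is defined in $\mathbb{R}$. Then $f_0(x)\ge f(x)+\overline{\int}g\,d\nu$ for almost every $x\in\operatorname{dom}f\cap\operatorname{dom}f_0$, so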
$$\int h_0\,d\lambda=\int f_0\,d\mu\ge\overline{\int}f\,d\mu+\overline{\int}g\,d\nu.$$
As $h_0$ is arbitrary, $\overline{\int}h\ge\overline{\int}f+\overline{\int}g$, as required.

253L Complex spaces As usual, the ideas of 253F and 253H apply essentially unchanged to complex $L^1$ spaces. Writing $L^1_{\mathbb{C}}(\mu)$, etc., for the complex $L^1$ spaces involved, we have the following results. Throughout, let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X\times Y$.
(a) If $f\in\mathcal{L}^0_{\mathbb{C}}(\mu)$, $g\in\mathcal{L}^0_{\mathbb{C}}(\nu)$ then $f\otimes g$, defined by the formula $(f\otimes g)(x,y)=f(x)g(y)$ for all $x\in\operatorname{dom}f$, $y\in\operatorname{dom}g$, belongs to $\mathcal{L}^0_{\mathbb{C}}(\lambda)$.
(b) If $f\in\mathcal{L}^1_{\mathbb{C}}(\mu)$, $g\in\mathcal{L}^1_{\mathbb{C}}(\nu)$ then $f\otimes g\in\mathcal{L}^1_{\mathbb{C}}(\lambda)$ and $\int f\otimes g\,d\lambda=\int f\,d\mu\int g\,d\nu$.
(c) We have a bilinear map $(u,v)\mapsto u\otimes v : L^1_{\mathbb{C}}(\mu)\times L^1_{\mathbb{C}}(\nu)\to L^1_{\mathbb{C}}(\lambda)$ defined by writing $f^\bullet\otimes g^\bullet=(f\otimes g)^\bullet$ for all $f\in\mathcal{L}^1_{\mathbb{C}}(\mu)$, $g\in\mathcal{L}^1_{\mathbb{C}}(\nu)$.
(d) If $W$ is any complex Banach space and $\phi : L^1_{\mathbb{C}}(\mu)\times L^1_{\mathbb{C}}(\nu)\to W$ is any bounded bilinear map, then there is a unique bounded linear operator $T : L^1_{\mathbb{C}}(\lambda)\to W$ such that $T(u\otimes v)=\phi(u,v)$ for every $u\in L^1_{\mathbb{C}}(\mu)$, $v\in L^1_{\mathbb{C}}(\nu)$, and $\|T\|=\|\phi\|$.
(e) If $\mu$ and $\nu$ are complete probability measures, and $\Lambda_1=\{E\times Y : E\in\Sigma\}$, then for any $f\in\mathcal{L}^1_{\mathbb{C}}(\lambda)$ we have a conditional expectation $g$ of $f$ on $\Lambda_1$ given by setting $g(x,y)=\int f(x,z)\,\nu(dz)$ whenever this is defined.

253X Basic exercises >(a) Let $U$, $V$ and $W$ be linear spaces. Show that the set of bilinear maps from $U\times V$ to $W$ has a natural linear structure agreeing with those of $L(U;L(V;W))$ and $L(V;L(U;W))$, writing $L(U;W)$ for the linear space of linear operators from $U$ to $W$.
>(b) Let $U$, $V$ and $W$ be normed spaces. (i) Show that for a bilinear map $\phi : U\times V\to W$ the following are equiveridical: ($\alpha$) $\phi$ is bounded in the sense of 253Ab; ($\beta$) $\phi$ is continuous; ($\gamma$) $\phi$ is continuous at some point of $U\times V$.
(ii) Show that the space of bounded bilinear maps from $U\times V$ to $W$ is a linear subspace of the space of all bilinear maps from $U\times V$ to $W$, and that the functional $\|\;\|$ defined in 253Ab is a norm, agreeing with the norms of $B(U;B(V;W))$ and $B(V;B(U;W))$, writing $B(U;W)$ for the normed space of bounded linear operators from $U$ to $W$.
(c) Let $(X_1,\Sigma_1,\mu_1),\dots,(X_n,\Sigma_n,\mu_n)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X_1\times\dots\times X_n$, as described in 251W. Let $W$ be a Banach space, and suppose that $\phi : L^1(\mu_1)\times\dots\times L^1(\mu_n)\to W$ is multilinear (that is, linear in each variable separately) and bounded (that is, $\|\phi\|=\sup\{\|\phi(u_1,\dots,u_n)\| : \|u_i\|_1\le1\ \forall\,i\le n\}<\infty$). Show that there is a unique bounded linear operator $T : L^1(\lambda)\to W$ such that $T\otimes=\phi$, where $\otimes : L^1(\mu_1)\times\dots\times L^1(\mu_n)\to L^1(\lambda)$ is a canonical multilinear map (to be defined).
(d) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X\times Y$. Show that if $A\subseteq L^1(\mu)$ and $B\subseteq L^1(\nu)$ are both uniformly integrable, then $\{u\otimes v : u\in A,\ v\in B\}$ is uniformly integrable in $L^1(\lambda)$.
>(e) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces and $\lambda$ the c.l.d. product measure on $X\times Y$. Show that (i) we have a bilinear map $(u,v)\mapsto u\otimes v : L^0(\mu)\times L^0(\nu)\to L^0(\lambda)$ given by setting $f^\bullet\otimes g^\bullet=(f\otimes g)^\bullet$ for all $f\in\mathcal{L}^0(\mu)$, $g\in\mathcal{L}^0(\nu)$; (ii) if $1\le p\le\infty$ then $u\otimes v\in L^p(\lambda)$ and $\|u\otimes v\|_p=\|u\|_p\|v\|_p$ for all $u\in L^p(\mu)$, $v\in L^p(\nu)$; (iii) if $u,u'\in L^2(\mu)$ and $v,v'\in L^2(\nu)$ then the inner product $(u\otimes v|u'\otimes v')$, taken in $L^2(\lambda)$, is just $(u|u')(v|v')$; (iv) the map $(u,v)\mapsto u\otimes v : L^0(\mu)\times L^0(\nu)\to L^0(\lambda)$ is continuous if $L^0(\mu)$, $L^0(\nu)$ and $L^0(\lambda)$ are all given their topologies of convergence in measure.
(f) In 253Xe, assume that $\mu$ and $\nu$ are semi-finite. Show that if $u_0,\dots,u_n$ are linearly independent members of $L^0(\mu)$ and $v_0,\dots,v_n\in L^0(\nu)$ are not all 0, then $\sum_{i=0}^n u_i\otimes v_i\ne0$ in $L^0(\lambda)$. (Hint: start by finding sets $E\in\Sigma$, $F\in\mathrm{T}$ of finite measure such that $u_0\times\chi E^\bullet,\dots,u_n\times\chi E^\bullet$ are linearly independent and $v_0\times\chi F^\bullet,\dots,v_n\times\chi F^\bullet$ are not all 0.)
(g) In 253Xe, assume that $\mu$ and $\nu$ are semi-finite. If $U$, $V$ are linear subspaces of $L^0(\mu)$ and $L^0(\nu)$ respectively, write $U\otimes V$ for the linear subspace of $L^0(\lambda)$ generated by $\{u\otimes v : u\in U,\ v\in V\}$. Show that if $W$ is any linear space and $\phi : U\times V\to W$ is a bilinear map, there is a unique linear operator $T : U\otimes V\to W$ such that $T(u\otimes v)=\phi(u,v)$ for all $u\in U$, $v\in V$. (Hint: start by showing that if $u_0,\dots,u_n\in U$ and $v_0,\dots,v_n\in V$ are such that $\sum_{i=0}^n u_i\otimes v_i=0$, then $\sum_{i=0}^n\phi(u_i,v_i)=0$; do this by expressing the $u_i$ as linear combinations of some linearly independent family and applying 253Xf.)
>(h) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be complete probability spaces, with c.l.d. product measure $\lambda$. Suppose that $p\in[1,\infty]$ and that $f\in\mathcal{L}^p(\lambda)$. Set $g(x)=\int f(x,y)\,\nu(dy)$ whenever this is defined. Show that $g\in\mathcal{L}^p(\mu)$ and that $\|g\|_p\le\|f\|_p$. (Hint: 253H, 244M.)
(i) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, with c.l.d. product measure $\lambda$, and $p\in[1,\infty[$. Show that $\{w : w\in L^p(\lambda),\ w\ge0\}$ is the closed convex hull in $L^p(\lambda)$ of $\{u\otimes v : u\in L^p(\mu),\ v\in L^p(\nu),\ u\ge0,\ v\ge0\}$ (see 253Xe(ii) above).

253Y Further exercises (a) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $\lambda_0$ the primitive product measure on $X\times Y$. Show that if $f\in\mathcal{L}^0(\mu)$ and $g\in\mathcal{L}^0(\nu)$, then $f\otimes g\in\mathcal{L}^0(\lambda_0)$.
(b) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $\lambda_0$ the primitive product measure on $X\times Y$. Show that if $f\in\mathcal{L}^1(\mu)$ and $g\in\mathcal{L}^1(\nu)$, then $f\otimes g\in\mathcal{L}^1(\lambda_0)$ and $\int f\otimes g\,d\lambda_0=\int f\,d\mu\int g\,d\nu$.
(c) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $\lambda_0$, $\lambda$ the primitive and c.l.d. product measures on $X\times Y$. Show that the embedding $\mathcal{L}^1(\lambda_0)\subseteq\mathcal{L}^1(\lambda)$ induces a Banach lattice isomorphism between $L^1(\lambda_0)$ and $L^1(\lambda)$.
(d) Let $(X,\Sigma,\mu)$, $(Y,\mathrm{T},\nu)$ be strictly localizable measure spaces, with c.l.d. product measure $\lambda$. Show that $L^\infty(\lambda)$ can be identified with $L^1(\lambda)^*$. Show that under this identification $\{w : w\in L^\infty(\lambda),\ w\ge0\}$ is the weak*-closed convex hull of $\{u\otimes v : u\in L^\infty(\mu),\ v\in L^\infty(\nu),\ u\ge0,\ v\ge0\}$.
(e) Find a version of 253J valid when one of $\mu$, $\nu$ is not $\sigma$-finite.
(f) Let $(X,\Sigma,\mu)$ be any measure space and $V$ any Banach space. Write $\mathcal{L}^1_V=\mathcal{L}^1_V(\mu)$ for the set of functions $f$ such that ($\alpha$) $\operatorname{dom}f$ is a conegligible subset of $X$; ($\beta$) $f$ takes values in $V$; ($\gamma$) there is a conegligible set $D\subseteq\operatorname{dom}f$ such that $f[D]$ is separable and $D\cap f^{-1}[G]\in\Sigma$ for every open set $G\subseteq V$; ($\delta$) the integral $\int\|f(x)\|\,\mu(dx)$ is finite. (These are the Bochner integrable functions from $X$ to $V$.) For $f,g\in\mathcal{L}^1_V$ write $f\sim g$ if $f=g$ $\mu$-a.e.; let $L^1_V$ be the set of equivalence classes in $\mathcal{L}^1_V$ under $\sim$. Show that (i) $f+g$, $cf\in\mathcal{L}^1_V$ for all $f,g\in\mathcal{L}^1_V$, $c\in\mathbb{R}$;
(ii) $L^1_V$ has a natural linear space structure, defined by writing $f^\bullet+g^\bullet=(f+g)^\bullet$, $cf^\bullet=(cf)^\bullet$ for $f,g\in\mathcal{L}^1_V$ and $c\in\mathbb{R}$; (iii) $L^1_V$ has a norm $\|\;\|$, defined by writing $\|f^\bullet\|=\int\|f(x)\|\,\mu(dx)$ for $f\in\mathcal{L}^1_V$; (iv) $L^1_V$ is a Banach space under this norm; (v) there is a natural map $\otimes : \mathcal{L}^1\times V\to\mathcal{L}^1_V$ defined by writing $(f\otimes v)(x)=f(x)v$ when $f\in\mathcal{L}^1=\mathcal{L}^1_{\mathbb{R}}(\mu)$, $v\in V$, $x\in\operatorname{dom}f$; (vi) there is a canonical bilinear map $\otimes : L^1\times V\to L^1_V$ defined by writing $f^\bullet\otimes v=(f\otimes v)^\bullet$ for $f\in\mathcal{L}^1$, $v\in V$; (vii) whenever $W$ is a Banach space and $\phi : L^1\times V\to W$ is a bounded bilinear map, there is a unique bounded linear operator $T : L^1_V\to W$ such that $T(u\otimes v)=\phi(u,v)$ for all $u\in L^1$, $v\in V$, and $\|T\|=\|\phi\|$.
(g) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $\lambda_0$ the primitive product measure on $X\times Y$. If $f$ is a $\lambda_0$-integrable function, write $f_x(y)=f(x,y)$ whenever this is defined. Show that we have a map $x\mapsto f_x^\bullet$ from a conegligible subset $D_0$ of $X$ to $L^1(\nu)$. Show that this map is a Bochner integrable function, as defined in 253Yf.
(h) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and suppose that $\phi$ is a function from $X$ to a separable subset of $L^1(\nu)$ which is measurable in the sense that $\phi^{-1}[G]\in\Sigma$ for every open $G\subseteq L^1(\nu)$. Show that there is a $\Lambda$-measurable function $f$ from $X\times Y$ to $\mathbb{R}$, where $\Lambda$ is the domain of the c.l.d. product measure on $X\times Y$, such that $\phi(x)=f_x^\bullet$ for every $x\in X$, writing $f_x(y)=f(x,y)$ for $x\in X$, $y\in Y$.
(i) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X\times Y$. Show that 253Yg provides a canonical identification between $L^1(\lambda)$ and $L^1_{L^1(\nu)}(\mu)$.
(j) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be complete locally determined measure spaces, with c.l.d. product measure $\lambda$. (i) Suppose that $K\in\mathcal{L}^2(\lambda)$, $f\in\mathcal{L}^2(\mu)$. Show that $h(y)=\int K(x,y)f(x)\,dx$ is defined for almost all $y\in Y$ and that $h\in\mathcal{L}^2(\nu)$. (Hint: to see that $h$ is defined a.e., consider $\int_{E\times F}K(x,y)f(x)\,d(x,y)$ for $\mu E,\nu F<\infty$; to see that $h\in\mathcal{L}^2$ consider $\int h\times g$ where $g\in\mathcal{L}^2(\nu)$.) (ii) Show that the map $f\mapsto h$ corresponds to a bounded linear operator $T_K : L^2(\mu)\to L^2(\nu)$. (iii) Show that the map $K\mapsto T_K$ corresponds to a bounded linear operator, of norm at most 1, from $L^2(\lambda)$ to $B(L^2(\mu);L^2(\nu))$.
(k) Suppose that $p,q\in[1,\infty]$ and that $\frac1p+\frac1q=1$, interpreting $\frac1\infty$ as 0 as usual. Let $(X,\Sigma,\mu)$, $(Y,\mathrm{T},\nu)$ be complete locally determined measure spaces with c.l.d. product measure $\lambda$. Show that the ideas of 253Yj can be used to define a bounded linear operator, of norm 1, from $L^p(\lambda)$ to $B(L^q(\mu);L^p(\nu))$.
(l) In 253Xc, suppose that $W$ is a Banach lattice. Show that the following are equiveridical: (i) $Tu\ge0$ whenever $u\ge0$ in $L^1(\lambda)$; (ii) $\phi(u_1,\dots,u_n)\ge0$ whenever $u_i\ge0$ in $L^1(\mu_i)$ for each $i\le n$.

253 Notes and comments Throughout the main arguments of this section, I have written the results in terms of the c.l.d. product measure; of course the isomorphism noted in 253Yc means that they could just as well have been expressed in terms of the primitive product measure. The more restricted notion of integrability with respect to the primitive product measure is indeed the one appropriate for the ideas of 253Yg.

Theorem 253F is a ‘universal mapping theorem’; it asserts that every bounded bilinear operator on $L^1(\mu)\times L^1(\nu)$ factors through $\otimes : L^1(\mu)\times L^1(\nu)\to L^1(\lambda)$, at least if the range space is a Banach space. It is easy to see that this property defines the pair $(L^1(\lambda),\otimes)$ up to Banach space isomorphism, in the following sense: if $V$ is a Banach space, and $\psi : L^1(\mu)\times L^1(\nu)\to V$ is a bounded bilinear map such that for every bounded bilinear map $\phi$ from $L^1(\mu)\times L^1(\nu)$ to any Banach space $W$ there is a unique bounded linear operator $T : V\to W$ such that $T\psi=\phi$ and $\|T\|=\|\phi\|$, then there is an isometric Banach space isomorphism $S : L^1(\lambda)\to V$ such that $S\otimes=\psi$. There is of course a general theory of bilinear maps between Banach spaces; in the language of this theory, $L^1(\lambda)$ is, or is isomorphic to, the ‘projective tensor product’ of $L^1(\mu)$ and $L^1(\nu)$. For an introduction to this subject, see Defant & Floret 93, §I.3, or Semadeni 71, §20. I should perhaps emphasise, for the sake of those who have not encountered tensor products before, that this theorem is special to $L^1$ spaces. While some of the same ideas can be applied to other function spaces (see 253Xe-253Xg), there is no other class to which 253F applies.
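The universal property can be watched in the smallest possible case. The sketch below assumes $X$ and $Y$ are finite sets carrying positive point masses `mu`, `nu` (hypothetical data), so that $L^1(\mu)$, $L^1(\nu)$ and $L^1(\lambda)$ are just vectors and matrices and $u\otimes v$ is the outer product. It builds $T$ as $T_0$ is built in the proof of 253F, from the values $\phi(\chi\{j\}^\bullet,\chi\{k\}^\bullet)$, and checks $T(u\otimes v)=\phi(u,v)$ and $\|T\|=\|\phi\|$. The norms are computed from the extreme points $\pm\chi\{x\}/\mu\{x\}$ of the unit balls, which is legitimate only in this finite setting; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def indicator(i, n):
    """chi{i} as a vector of length n."""
    return [1 if t == i else 0 for t in range(n)]


def tensor(u, v):
    """(u ⊗ v)(j, k) = u(j) v(k)."""
    return [[uj * vk for vk in v] for uj in u]


def make_T(phi, m, l):
    """T0 as in the proof of 253F: T(chi({j} x {k})) = phi(chi{j}, chi{k}), extended linearly."""
    table = [[phi(indicator(j, m), indicator(k, l)) for k in range(l)] for j in range(m)]

    def T(w):
        return sum(w[j][k] * table[j][k] for j in range(m) for k in range(l))

    return T


def norm_T(T, mu, nu):
    """||T|| = max |T(chi{(j,k)})| / lambda{(j,k)}, over extreme points of the unit ball."""
    m, l = len(mu), len(nu)
    return max(abs(T(tensor(indicator(j, m), indicator(k, l)))) / (mu[j] * nu[k])
               for j, k in product(range(m), range(l)))


def norm_phi(phi, mu, nu):
    """||phi|| = max |phi(chi{j}/mu{j}, chi{k}/nu{k})|; bilinearity lets extreme points suffice."""
    m, l = len(mu), len(nu)
    return max(abs(phi([Fraction(x) / mu[j] for x in indicator(j, m)],
                       [Fraction(y) / nu[k] for y in indicator(k, l)]))
               for j, k in product(range(m), range(l)))


mu = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
nu = [Fraction(1, 3), Fraction(2, 3)]
A = [[1, -2], [0, 3], [5, 1]]


def phi(u, v):
    """A sample bounded bilinear form L1(mu) x L1(nu) -> R (hypothetical)."""
    return sum(A[j][k] * u[j] * v[k] for j in range(3) for k in range(2))


T = make_T(phi, 3, 2)
u, v = [1, -2, 3], [4, Fraction(1, 2)]
print(T(tensor(u, v)) == phi(u, v))              # True
print(norm_T(T, mu, nu), norm_phi(phi, mu, nu))  # 60 60
```

In infinite-dimensional $L^1$ spaces neither the extreme-point shortcut nor the finite table is available; that is where the density argument of 253F(d) and the extension step (e) do the work.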
There is also a theory of tensor products of Banach lattices, for which I do not think we are quite ready (it needs general ideas about ordered linear spaces for which I mean to wait until Chapter 35 in the next volume). However 253G shows that the ordering, and therefore the Banach lattice structure, of $L^1(\lambda)$ is determined by the ordering of $L^1(\mu)$ and $L^1(\nu)$ and the map $\otimes : L^1(\mu)\times L^1(\nu)\to L^1(\lambda)$.

The conditional expectation operators described in 253H are of very great importance, largely because in this special context we have a realization of the conditional expectation operator as a function $P_0$ from $\mathcal{L}^1(\lambda)$ to $\mathcal{L}^1(\lambda\restriction\Lambda_1)$, not just as a function from $L^1(\lambda)$ to $L^1(\lambda\restriction\Lambda_1)$, as in 242J. As described here, $P_0(f+f')$ need not be equal, in the strict sense, to $P_0f+P_0f'$; it can have a larger domain. In applications, however, one might be willing to restrict attention to the linear space $U$ of bounded $\Sigma\widehat{\otimes}\mathrm{T}$-measurable functions defined everywhere on $X\times Y$, so that $P_0$ becomes an operator from $U$ to itself (see 252P).
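Here is a minimal sketch of the 253H construction when everything is finite, an assumption made purely for illustration: $X$ and $Y$ are finite, every subset is measurable, and the probabilities are point masses `p`, `q` (hypothetical data). In this setting every function is defined everywhere, so $f\mapsto g$ is an honest linear operator, and the defining identity $\int_W g\,d\lambda=\int_W f\,d\lambda$ can be checked for every $W=E\times Y$ in $\Lambda_1$.

```python
from fractions import Fraction
from itertools import combinations


def cond_exp(f, q):
    """g(x, y) = sum_z f(x, z) q(z): integrate out the second coordinate, as in 253H."""
    g = []
    for row in f:
        avg = sum(fxz * qz for fxz, qz in zip(row, q))
        g.append([avg] * len(q))  # constant in y, hence Lambda_1-measurable
    return g


def integral_over(h, p, q, E):
    """Integral of h over E x Y with respect to the product of the point masses p, q."""
    return sum(h[x][y] * p[x] * q[y] for x in E for y in range(len(q)))


def all_subsets(n):
    """Every subset E of X = {0, ..., n-1}, as a tuple."""
    for r in range(n + 1):
        yield from combinations(range(n), r)


p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
q = [Fraction(1, 4), Fraction(3, 4)]
f = [[1, 5], [-2, 0], [7, 3]]
g = cond_exp(f, q)
print(all(integral_over(g, p, q, E) == integral_over(f, p, q, E) for E in all_subsets(3)))  # True
```

The awkwardness about domains mentioned in the preceding paragraph disappears here only because nothing is undefined on a finite space.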
254 Infinite products

I come now to the second basic idea of this chapter: the description of a product measure on the product of a (possibly large) family of probability spaces. The section begins with a construction on similar lines to that of §251 (254A-254F) and its defining property in terms of inverse-measure-preserving functions (254G). I discuss the usual measure on $\{0,1\}^I$ (254J-254K), subspace measures (254L) and various properties of subproducts (254M-254T), including a study of the associated conditional expectation operators (254R-254T).

254A Definitions (a) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces. Set $X=\prod_{i\in I}X_i$, the family of functions $x$ with domain $I$ such that $x(i)\in X_i$ for every $i\in I$. In this context, I will say that a measurable cylinder is a subset of $X$ expressible in the form $C=\prod_{i\in I}C_i$, where $C_i\in\Sigma_i$ for every $i\in I$ and $\{i : C_i\ne X_i\}$ is finite. Note that for a non-empty $C\subseteq X$ this expression is unique. P P Suppose that $C=\prod_{i\in I}C_i=\prod_{i\in I}C'_i$. For each $i\in I$ set $D_i=\{x(i) : x\in C\}$. Of course $D_i\subseteq C_i$. Because $C\ne\emptyset$, we can fix on some $z\in C$. If $i\in I$ and $\xi\in C_i$, consider $x\in X$ defined by setting
$$x(i)=\xi,\qquad x(j)=z(j)\text{ for }j\ne i;$$
then $x\in C$ so $\xi=x(i)\in D_i$. Thus $D_i=C_i$ for $i\in I$. Similarly, $D_i=C'_i$. Q Q

(b) We can therefore define a functional $\theta_0 : \mathcal{C}\to[0,1]$, where $\mathcal{C}$ is the set of measurable cylinders, by setting
$$\theta_0C=\prod_{i\in I}\mu_iC_i$$
whenever $C_i\in\Sigma_i$ for every $i\in I$ and $\{i : C_i\ne X_i\}$ is finite, noting that only finitely many terms in the product can differ from 1, so that it can safely be treated as a finite product. If $C=\emptyset$, one of the $C_i$ must be empty, so $\theta_0C$ is surely 0, even though the expression of $C$ as $\prod_{i\in I}C_i$ is no longer unique.

(c) Now define $\theta : \mathcal{P}X\to[0,1]$ by setting
$$\theta A=\inf\Big\{\sum_{n=0}^\infty\theta_0C_n : C_n\in\mathcal{C}\text{ for every }n\in\mathbb{N},\ A\subseteq\bigcup_{n\in\mathbb{N}}C_n\Big\}.$$

254B Lemma The functional $\theta$ defined in 254Ac is always an outer measure on $X$.

proof Use exactly the same arguments as those in 251B above.

254C Definition Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be any indexed family of probability spaces, and $X$ the Cartesian product $\prod_{i\in I}X_i$. The product measure on $X$ is the measure defined by Carathéodory's method (113C) from the outer measure $\theta$ defined in 254A.
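Because a measurable cylinder constrains only finitely many coordinates, $\theta_0$ can be computed even when $I$ is infinite. The sketch below assumes each factor is a finite probability space stored as a `{point: mass}` dictionary, supplied by a function of the index, and stores a cylinder as a dictionary from its finitely many constrained indices $i$ to the sets $C_i$; these representations and names are assumptions for illustration. It computes $\theta_0$ only; $\theta$ itself involves an infimum over countable covers and is not attempted. The last line checks the splitting $\theta_0C'=\theta_0C+\theta_0C''$ of a cylinder along one more coordinate, the kind of step used in the proof of 254F(b)(iii).

```python
from fractions import Fraction
from math import prod


def measure(space, subset):
    """mu_i(C_i) for a finite probability space given as {point: mass}."""
    return sum((space[pt] for pt in subset), Fraction(0))


def theta0(spaces, cylinder):
    """theta_0 C = prod_i mu_i C_i.

    `cylinder` maps each constrained index i to C_i; every other coordinate is
    implicitly C_i = X_i and contributes a factor 1, so the product is finite.
    """
    return prod((measure(spaces(i), c) for i, c in cylinder.items()), start=Fraction(1))


coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
die = {k: Fraction(1, 6) for k in range(1, 7)}


def spaces(i):
    """Hypothetical family indexed by I = N: coordinate 0 is a fair die, all others fair coins."""
    return die if i == 0 else coin


C_prime = {0: {1, 2}}            # C'  = {x : x(0) in {1, 2}}
C = {0: {1, 2}, 7: {0}}          # C   = C' ∩ {x : x(7) = 0}
C_second = {0: {1, 2}, 7: {1}}   # C'' = C' \ C
print(theta0(spaces, C))                                                        # 1/6
print(theta0(spaces, C_prime) == theta0(spaces, C) + theta0(spaces, C_second))  # True
```

The empty dictionary represents $X$ itself (empty product, value 1), and an empty $C_i$ gives $\theta_0C=0$, matching the remark in 254Ab about $C=\emptyset$.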
254D Remarks (a) In 254Ab, I asserted that if $C\in\mathcal{C}$ and no $C_i$ is empty, then nor is $C=\prod_{i\in I}C_i$. This is the ‘Axiom of Choice’: the product of any family $\langle C_i\rangle_{i\in I}$ of non-empty sets is non-empty, that is, there is a ‘choice function’ $x$ with domain $I$ picking out a distinguished member $x(i)$ of each $C_i$. In this volume I have not attempted to be scrupulous in indicating uses of the axiom of choice. In fact the use here is not an absolutely vital one; I mean, the theory of infinite products, even uncountable products, of probability spaces does not change character completely in the absence of the full axiom of choice (provided, that is, that we allow ourselves to use the countable axiom of choice). The point is that all we really need, in the present context, is that $X=\prod_{i\in I}X_i$ should be non-empty; and in many contexts we can prove this, for the particular cases of interest, without using the axiom of choice, by actually exhibiting a member of $X$. The simplest case in which this is difficult is when the $X_i$ are uncontrolled Borel subsets of $[0,1]$; and even then, if they are presented with coherent descriptions, we may, with appropriate labour, be able to construct a member of $X$. But clearly such a process is liable to slow us down a good deal, and for the moment I think there is no great virtue in taking so much trouble.

(b) I have given this section the title ‘infinite products’, but it is useful to be able to apply the ideas to finite $I$; I should mention in particular the cases $\#(I)\le2$.
(i) If $I=\emptyset$, $X$ consists of the unique function with domain $I$, the empty function. If we identify a function with its graph, then $X$ is actually $\{\emptyset\}$; in any case, $X$ is to be a singleton set, with $\lambda X=1$.
(ii) If $I$ is a singleton $\{i\}$, then we can identify $X$ with $X_i$; $\mathcal{C}$ becomes identified with $\Sigma_i$ and $\theta_0$ with $\mu_i$, so that $\theta$ can be identified with $\mu_i^*$ and the ‘product measure’ becomes the measure on $X_i$ defined from $\mu_i^*$, that is, the completion of $\mu_i$ (213Xa(iv)).
(iii) If $I$ is a doubleton $\{i,j\}$, then we can identify $X$ with $X_i\times X_j$; in this case the definitions of 254A, 254C above match exactly with those of 251A and 251C, so that $\lambda$ here can be identified with the primitive product measure as defined in 251C. Because $\mu_i$ and $\mu_j$ are both totally finite, this agrees with the c.l.d. product measure of 251F.

254E Definition Let $\langle X_i\rangle_{i\in I}$ be any family of sets, and $X=\prod_{i\in I}X_i$. If $\Sigma_i$ is a $\sigma$-subalgebra of subsets of $X_i$ for each $i\in I$, I write $\widehat{\bigotimes}_{i\in I}\Sigma_i$ for the $\sigma$-algebra of subsets of $X$ generated by $\{\{x : x\in X,\ x(i)\in E\} : i\in I,\ E\in\Sigma_i\}$. (Compare 251D.)

254F Theorem Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, and let $\lambda$ be the product measure on $X=\prod_{i\in I}X_i$ defined as in 254C; let $\Lambda$ be its domain.
(a) $\lambda X=1$.
(b) If $E_i\in\Sigma_i$ for every $i\in I$, and $\{i : E_i\ne X_i\}$ is countable, then $\prod_{i\in I}E_i\in\Lambda$, and $\lambda(\prod_{i\in I}E_i)=\prod_{i\in I}\mu_iE_i$. In particular, $\lambda C=\theta_0C$ for every measurable cylinder $C$, as defined in 254A, and if $j\in I$ then $x\mapsto x(j) : X\to X_j$ is inverse-measure-preserving.
(c) $\widehat{\bigotimes}_{i\in I}\Sigma_i\subseteq\Lambda$.
(d) $\lambda$ is complete.
(e) For every $W\in\Lambda$ and $\epsilon>0$ there is a finite family $C_0,\dots,C_n$ of measurable cylinders such that $\lambda(W\triangle\bigcup_{k\le n}C_k)\le\epsilon$.
(f) For every $W\in\Lambda$ there are $W_1,W_2\in\widehat{\bigotimes}_{i\in I}\Sigma_i$ such that $W_1\subseteq W\subseteq W_2$ and $\lambda(W_2\setminus W_1)=0$.

Remark Perhaps I should pause to interpret the product $\prod_{i\in I}\mu_iE_i$. Because all the $\mu_iE_i$ belong to $[0,1]$, this is simply $\inf_{J\subseteq I,\,J\text{ is finite}}\prod_{i\in J}\mu_iE_i$, taking the empty product to be 1.

proof Throughout this proof, define $\mathcal{C}$, $\theta_0$ and $\theta$ as in 254A. I will write out an argument which applies to finite $I$ as well as infinite $I$, but you may reasonably prefer to assume that $I$ is infinite on first reading.

(a) Of course $\lambda X=\theta X$, so I have to show that $\theta X=1$. Because $X,\emptyset\in\mathcal{C}$ and $\theta_0X=\prod_{i\in I}\mu_iX_i=1$ and $\theta_0\emptyset=0$,
$$\theta X\le\theta_0X+\theta_0\emptyset+\dots=1.$$
I therefore have to show that $\theta X\ge1$. ?? Suppose, if possible, otherwise.

(i) There is a sequence $\langle C_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{C}$, covering $X$, such that $\sum_{n=0}^\infty\theta_0C_n<1$. For each $n\in\mathbb{N}$, express $C_n$ as $\{x : x(i)\in E_{ni}\ \forall\,i\in I\}$, where every $E_{ni}\in\Sigma_i$ and $J_n=\{i : E_{ni}\ne X_i\}$ is finite. No $J_n$ can be empty, because $\theta_0C_n<1=\theta_0X$; set $J=\bigcup_{n\in\mathbb{N}}J_n$. Then $J$ is a countable non-empty subset of $I$. Set $K=\mathbb{N}$ if $J$ is infinite, $\{k : 0\le k<\#(J)\}$ if $J$ is finite; let $k\mapsto i_k : K\to J$ be a bijection. For each $k\in K$, set $L_k=\{i_j : j<k\}\subseteq J$, and set $\alpha_{nk}=\prod_{i\in I\setminus L_k}\mu_iE_{ni}$ for $n\in\mathbb{N}$, $k\in K$. If $J$ is finite, then we can identify $L_{\#(J)}$ with $J$, and set $\alpha_{n,\#(J)}=1$ for every $n$. We have $\alpha_{n0}=\theta_0C_n$ for each $n$, so $\sum_{n=0}^\infty\alpha_{n0}<1$. For $n\in\mathbb{N}$, $k\in K$, $t\in X_{i_k}$ set
$$f_{nk}(t)=\begin{cases}\alpha_{n,k+1}&\text{if }t\in E_{n,i_k},\\0&\text{otherwise.}\end{cases}$$
Then
$$\int f_{nk}\,d\mu_{i_k}=\alpha_{n,k+1}\,\mu_{i_k}E_{n,i_k}=\alpha_{nk}.$$
(ii) Choose $t_k\in X_{i_k}$ inductively, for $k\in K$, as follows. The inductive hypothesis will be that $\sum_{n\in M_k}\alpha_{nk}<1$, where $M_k=\{n : n\in\mathbb N,\ t_j\in E_{n,i_j}\ \forall\,j<k\}$; of course $M_0=\mathbb N$, so the induction starts. Given that
$$1>\sum_{n\in M_k}\alpha_{nk}=\sum_{n\in M_k}\int f_{nk}\,d\mu_{i_k}=\int\Bigl(\sum_{n\in M_k}f_{nk}\Bigr)d\mu_{i_k}$$
(by B.Levi's theorem), there must be a $t_k\in X_{i_k}$ such that $\sum_{n\in M_k}f_{nk}(t_k)<1$. Now for such a choice of $t_k$, $\alpha_{n,k+1}=f_{nk}(t_k)$ for every $n\in M_{k+1}$, so that $\sum_{n\in M_{k+1}}\alpha_{n,k+1}<1$, and the induction continues, unless $J$ is finite and $k+1=\#(J)$. In this last case we must just have $M_{\#(J)}=\emptyset$, because $\alpha_{n,\#(J)}=1$ for every $n$.
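(iii) If $J$ is infinite, we obtain a full sequence $\langle t_k\rangle_{k\in\mathbb N}$; if $J$ is finite, we obtain just a finite sequence $\langle t_k\rangle_{k<\#(J)}$ […] $\epsilon>0$. Then there is a sequence $\langle C_n\rangle_{n\in\mathbb N}$ in $\mathcal C$ such that $A\subseteq\bigcup_{n\in\mathbb N}C_n$ and $\sum_{n=0}^\infty\theta_0C_n\le\theta A+\epsilon$. In this case
$$A\cap W\subseteq\bigcup_{n\in\mathbb N}(C_n\cap W),\qquad A\setminus W\subseteq\bigcup_{n\in\mathbb N}(C_n\setminus W),$$
so
$$\theta(A\cap W)\le\sum_{n=0}^\infty\theta_0(C_n\cap W),\qquad\theta(A\setminus W)\le\sum_{n=0}^\infty\theta_0(C_n\setminus W),$$
and
$$\theta(A\cap W)+\theta(A\setminus W)\le\sum_{n=0}^\infty\bigl(\theta_0(C_n\cap W)+\theta_0(C_n\setminus W)\bigr)=\sum_{n=0}^\infty\theta_0C_n\le\theta A+\epsilon.$$
As $\epsilon$ is arbitrary, $\theta(A\cap W)+\theta(A\setminus W)\le\theta A$; as $A$ is arbitrary, $W\in\Lambda$.

(iii) I show next that if $J\subseteq I$ is finite, $C_i\in\Sigma_i$ for each $i\in J$, and $C=\{x : x\in X,\ x(i)\in C_i\ \forall\,i\in J\}$, then $C\in\Lambda$ and $\lambda C=\prod_{i\in J}\mu_iC_i$. P Induce on $\#(J)$. If $\#(J)=0$, that is, $J=\emptyset$, then $C=X$ and this is part (a). For the inductive step to $\#(J)=n+1$, take any $j\in J$ and set
$$J'=J\setminus\{j\},\qquad C'=\{x : x\in X,\ x(i)\in C_i\ \forall\,i\in J'\},\qquad C''=C'\setminus C=\{x : x\in C',\ x(j)\in X_j\setminus C_j\}.$$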
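Then $C$, $C'$, $C''$ all belong to $\mathcal C$, and
$$\theta_0C'=\prod_{i\in J'}\mu_iC_i=\alpha\text{ say},\qquad\theta_0C=\alpha\mu_jC_j,\qquad\theta_0C''=\alpha(1-\mu_jC_j).$$
Moreover, by the inductive hypothesis, $C'\in\Lambda$ and $\alpha=\lambda C'=\theta C'$. So $C=C'\cap\{x : x(j)\in C_j\}\in\Lambda$ by (ii), and $C''=C'\setminus C\in\Lambda$. We surely have $\lambda C=\theta C\le\theta_0C$ and $\lambda C''\le\theta_0C''$; but also
$$\alpha=\lambda C'=\lambda C+\lambda C''\le\theta_0C+\theta_0C''=\alpha,$$
so in fact $\lambda C=\theta_0C=\alpha\mu_jC_j=\prod_{i\in J}\mu_iC_i$, and the induction proceeds. Q

(iv) Now let us return to the general case of a set $W$ of the form $\prod_{i\in I}E_i$ where $E_i\in\Sigma_i$ for each $i$, and $K=\{i : E_i\ne X_i\}$ is countable. If $K$ is finite then $W=\{x : x(i)\in E_i\ \forall\,i\in K\}$, so $W\in\Lambda$ and $\lambda W=\prod_{i\in K}\mu_iE_i=\prod_{i\in I}\mu_iE_i$. Otherwise, let $\langle i_n\rangle_{n\in\mathbb N}$ be an enumeration of $K$. For each $n\in\mathbb N$ set $W_n=\{x : x\in X,\ x(i_k)\in E_{i_k}\ \forall\,k\le n\}$; then we know that $W_n\in\Lambda$ and that $\lambda W_n=\prod_{k\le n}\mu_{i_k}E_{i_k}$. But $\langle W_n\rangle_{n\in\mathbb N}$ is a non-increasing sequence with intersection $W$, so $W\in\Lambda$ and
$$\lambda W=\lim_{n\to\infty}\lambda W_n=\prod_{i\in K}\mu_iE_i=\prod_{i\in I}\mu_iE_i.$$

(c) is an immediate consequence of (b) and the definition of $\widehat{\bigotimes}_{i\in I}\Sigma_i$.

(d) Because $\lambda$ is constructed by Carathéodory's method it must be complete.

(e) Let $\langle C_n\rangle_{n\in\mathbb N}$ be a sequence in $\mathcal C$ such that $W\subseteq\bigcup_{n\in\mathbb N}C_n$ and $\sum_{n=0}^\infty\theta_0C_n\le\theta W+\frac12\epsilon$. Set $V=\bigcup_{n\in\mathbb N}C_n$; by (b), $V\in\Lambda$. Let $n\in\mathbb N$ be such that $\sum_{i=n+1}^\infty\theta_0C_i\le\frac12\epsilon$, and consider $W'=\bigcup_{k\le n}C_k$. Since $V\setminus W'\subseteq\bigcup_{i>n}C_i$,
$$\begin{aligned}\lambda(W\triangle W')&\le\lambda(V\setminus W')+\lambda(V\setminus W)=\lambda V-\lambda W+\lambda(V\setminus W')=\theta V-\theta W+\theta(V\setminus W')\\&\le\sum_{i=0}^\infty\theta_0C_i-\theta W+\sum_{i=n+1}^\infty\theta_0C_i\le\tfrac12\epsilon+\tfrac12\epsilon=\epsilon.\end{aligned}$$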
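(f)(i) If $W\in\Lambda$ and $\epsilon>0$ there is a $V\in\widehat{\bigotimes}_{i\in I}\Sigma_i$ such that $W\subseteq V$ and $\lambda V\le\lambda W+\epsilon$. P Let $\langle C_n\rangle_{n\in\mathbb N}$ be a sequence in $\mathcal C$ such that $W\subseteq\bigcup_{n\in\mathbb N}C_n$ and $\sum_{n=0}^\infty\theta_0C_n\le\theta W+\epsilon$. Then $C_n\in\widehat{\bigotimes}_{i\in I}\Sigma_i$ for each $n$, so $V=\bigcup_{n\in\mathbb N}C_n\in\widehat{\bigotimes}_{i\in I}\Sigma_i$. Now $W\subseteq V$, and
$$\lambda V=\theta V\le\sum_{n=0}^\infty\theta_0C_n\le\theta W+\epsilon=\lambda W+\epsilon.$$
Q

(ii) Now, given $W\in\Lambda$, let $\langle V_n\rangle_{n\in\mathbb N}$ be a sequence of sets in $\widehat{\bigotimes}_{i\in I}\Sigma_i$ such that $W\subseteq V_n$ and $\lambda V_n\le\lambda W+2^{-n}$ for each $n$; then $W_2=\bigcap_{n\in\mathbb N}V_n$ belongs to $\widehat{\bigotimes}_{i\in I}\Sigma_i$, $W\subseteq W_2$ and $\lambda W_2=\lambda W$. Similarly, there is a $W'\in\widehat{\bigotimes}_{i\in I}\Sigma_i$ such that $X\setminus W\subseteq W'$ and $\lambda W'=\lambda(X\setminus W)$, so we may take $W_1=X\setminus W'$ to complete the proof.

254G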
The following is a fundamental, indeed defining, property of product measures.
Lemma Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces with product $(X,\Lambda,\lambda)$. Let $(Y,\mathrm T,\nu)$ be a complete probability space and $\phi:Y\to X$ a function. Suppose that $\nu^*\phi^{-1}[C]\le\lambda C$ for every measurable cylinder $C\subseteq X$. Then $\phi$ is inverse-measure-preserving. In particular, $\phi$ is inverse-measure-preserving iff $\phi^{-1}[C]\in\mathrm T$ and $\nu\phi^{-1}[C]=\lambda C$ for every measurable cylinder $C\subseteq X$.

Remark By $\nu^*$ I mean the usual outer measure defined from $\nu$ as in §132.

proof (a) First note that, writing $\theta$ for the outer measure of 254A, $\nu^*\phi^{-1}[A]\le\theta A$ for every $A\subseteq X$. P Given $\epsilon>0$, there is a sequence $\langle C_n\rangle_{n\in\mathbb N}$ of measurable cylinders such that $A\subseteq\bigcup_{n\in\mathbb N}C_n$ and $\sum_{n=0}^\infty\theta_0C_n\le\theta A+\epsilon$,
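where $\theta_0$ is the functional of 254A. But we know that $\theta_0C=\lambda C$ for every measurable cylinder $C$ (254Fb), so
$$\nu^*\phi^{-1}[A]\le\nu^*\Bigl(\bigcup_{n\in\mathbb N}\phi^{-1}[C_n]\Bigr)\le\sum_{n=0}^\infty\nu^*\phi^{-1}[C_n]\le\sum_{n=0}^\infty\lambda C_n\le\theta A+\epsilon.$$
As $\epsilon$ is arbitrary, $\nu^*\phi^{-1}[A]\le\theta A$. Q

(b) Now take any $W\in\Lambda$. Then there are $F$, $F'\in\mathrm T$ such that
$$\phi^{-1}[W]\subseteq F,\qquad\phi^{-1}[X\setminus W]\subseteq F',\qquad\nu F=\nu^*\phi^{-1}[W]\le\theta W=\lambda W,\qquad\nu F'\le\lambda(X\setminus W).$$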
We have $F\cup F'\supseteq\phi^{-1}[W]\cup\phi^{-1}[X\setminus W]=Y$, so
$$\nu(F\cap F')=\nu F+\nu F'-\nu(F\cup F')\le\lambda W+\lambda(X\setminus W)-1=0.$$
Now $F\setminus\phi^{-1}[W]\subseteq F\cap\phi^{-1}[X\setminus W]\subseteq F\cap F'$ is $\nu$-negligible. Because $\nu$ is complete, $F\setminus\phi^{-1}[W]\in\mathrm T$ and $\phi^{-1}[W]=F\setminus(F\setminus\phi^{-1}[W])$ belongs to $\mathrm T$. Moreover, since $\nu(F\cup F')=1$ and $\nu(F\cap F')=0$, $1=\nu F+\nu F'\le\lambda W+\lambda(X\setminus W)=1$, so we must have $\nu F=\lambda W$; but this means that $\nu\phi^{-1}[W]=\lambda W$. As $W$ is arbitrary, $\phi$ is inverse-measure-preserving.

254H Corollary Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ and $\langle(Y_i,\mathrm T_i,\nu_i)\rangle_{i\in I}$ be two families of probability spaces, with products $(X,\Lambda,\lambda)$ and $(Y,\Lambda',\lambda')$. Suppose that for each $i\in I$ we are given an inverse-measure-preserving function $\phi_i:X_i\to Y_i$. Set $\phi(x)=\langle\phi_i(x(i))\rangle_{i\in I}$ for $x\in X$. Then $\phi:X\to Y$ is inverse-measure-preserving.

proof If $C=\prod_{i\in I}C_i$ is a measurable cylinder in $Y$, then $\phi^{-1}[C]=\prod_{i\in I}\phi_i^{-1}[C_i]$ is a measurable cylinder in $X$, and
$$\lambda\phi^{-1}[C]=\prod_{i\in I}\mu_i\phi_i^{-1}[C_i]=\prod_{i\in I}\nu_iC_i=\lambda'C.$$
Since $\lambda$ is a complete probability measure, 254G tells us that $\phi$ is inverse-measure-preserving.

254I
Corresponding to 251S we have the following.
Proposition Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, $\lambda$ the product measure on $X=\prod_{i\in I}X_i$ and $\Lambda$ its domain. Then $\lambda$ is also the product of the completions $\hat\mu_i$ of the $\mu_i$ (212C).
proof Write $\hat\lambda$ for the product of the $\hat\mu_i$, and $\hat\Lambda$ for its domain.

(i) The identity map from $X_i$ to itself is inverse-measure-preserving if regarded as a map from $(X_i,\hat\mu_i)$ to $(X_i,\mu_i)$, so the identity map on $X$ is inverse-measure-preserving if regarded as a map from $(X,\hat\lambda)$ to $(X,\lambda)$, by 254H; that is, $\Lambda\subseteq\hat\Lambda$ and $\lambda=\hat\lambda\restriction\Lambda$.

(ii) If $C$ is a measurable cylinder for $\langle\hat\mu_i\rangle_{i\in I}$, that is, $C=\prod_{i\in I}C_i$ where $C_i\in\hat\Sigma_i$ for every $i$ and $\{i : C_i\ne X_i\}$ is finite, then for each $i\in I$ we can find a $C_i'\in\Sigma_i$ such that $C_i\subseteq C_i'$ and $\mu_iC_i'=\hat\mu_iC_i$ (taking $C_i'=X_i$ whenever $C_i=X_i$); setting $C'=\prod_{i\in I}C_i'$, we get
$$\lambda^*C\le\lambda C'=\prod_{i\in I}\mu_iC_i'=\prod_{i\in I}\hat\mu_iC_i=\hat\lambda C.$$
By 254G, $\lambda W$ must be defined and equal to $\hat\lambda W$ whenever $W\in\hat\Lambda$. Putting this together with (i), we see that $\lambda=\hat\lambda$.

254J The product measure on $\{0,1\}^I$ (a) Perhaps the most important of all examples of infinite product measures is the case in which each factor $X_i$ is just $\{0,1\}$ and each $\mu_i$ is the 'fair-coin' probability measure, setting
$\mu_i\{0\}=\mu_i\{1\}=\frac12$. In this case, the product $X=\{0,1\}^I$ has a family $\langle E_i\rangle_{i\in I}$ of measurable sets such that, writing $\lambda$ for the product measure on $X$,
$$\lambda\Bigl(\bigcap_{i\in J}E_i\Bigr)=2^{-\#(J)}\text{ if }J\subseteq I\text{ is finite}.$$
(Just take $E_i=\{x : x(i)=1\}$ for each $i$.) I will call this $\lambda$ the usual measure on $\{0,1\}^I$. Observe that if $I$ is finite then $\lambda\{x\}=2^{-\#(I)}$ for each $x\in X$ (using 254Fb). On the other hand, if $I$ is infinite, then $\lambda\{x\}=0$ for every $x\in X$ (because, again using 254Fb, $\lambda^*\{x\}\le2^{-n}$ for every $n$).

(b) There is a natural bijection between $\{0,1\}^I$ and $\mathcal PI$, matching $x\in\{0,1\}^I$ with $\{i : i\in I,\ x(i)=1\}$. So we get a standard measure $\tilde\lambda$ on $\mathcal PI$, which I will call the usual measure on $\mathcal PI$. Note that for any finite $b\subseteq I$ and any $c\subseteq b$ we have
$$\tilde\lambda\{a : a\cap b=c\}=\lambda\{x : x(i)=1\text{ for }i\in c,\ x(i)=0\text{ for }i\in b\setminus c\}=2^{-\#(b)}.$$

(c) Of course we can apply 254G to these measures; if $(Y,\mathrm T,\nu)$ is a complete probability space, a function $\phi:Y\to\{0,1\}^I$ is inverse-measure-preserving iff $\nu\{y : y\in Y,\ \phi(y)\restriction J=z\}=2^{-\#(J)}$ whenever $J\subseteq I$ is finite and $z\in\{0,1\}^J$; this is because the non-empty measurable cylinders in $\{0,1\}^I$ are precisely the sets of the form $\{x : x\restriction J=z\}$ where $J\subseteq I$ is finite.

254K In the case of countably infinite $I$, we have a very important relationship between the usual product measure on $\{0,1\}^I$ and Lebesgue measure on $[0,1]$.

Proposition Let $\lambda$ be the usual measure on $X=\{0,1\}^{\mathbb N}$, and let $\mu$ be Lebesgue measure on $[0,1]$; write $\Lambda$ for the domain of $\lambda$ and $\Sigma$ for the domain of $\mu$.
(i) For $x\in X$ set $\phi(x)=\sum_{i=0}^\infty2^{-i-1}x(i)$. Then $\phi^{-1}[E]\in\Lambda$ and $\lambda\phi^{-1}[E]=\mu E$ for every $E\in\Sigma$; $\phi[F]\in\Sigma$ and $\mu\phi[F]=\lambda F$ for every $F\in\Lambda$.
(ii) There is a bijection $\tilde\phi:X\to[0,1]$ which is equal to $\phi$ at all but countably many points, and any such bijection is an isomorphism between $(X,\Lambda,\lambda)$ and $([0,1],\Sigma,\mu)$.

proof (a) The first point to observe is that $\phi$ is nearly a bijection.
Setting
$$H=\{x : x\in X,\ \exists\,m\in\mathbb N,\ x(i)=x(m)\ \forall\,i\ge m\},\qquad H'=\{2^{-n}k : n\in\mathbb N,\ k\le2^n\},$$
then $H$ and $H'$ are countable and $\phi\restriction X\setminus H$ is a bijection between $X\setminus H$ and $[0,1]\setminus H'$. (For $t\in[0,1]\setminus H'$, $\phi^{-1}(t)$ is the binary expansion of $t$.) Because $H$ and $H'$ are countably infinite, there is a bijection between them; combining this with $\phi\restriction X\setminus H$, we have a bijection between $X$ and $[0,1]$ equal to $\phi$ except at countably many points. For the rest of this proof, let $\tilde\phi$ be any such bijection. Let $M$ be the countable set $\{x : x\in X,\ \tilde\phi(x)\ne\phi(x)\}$, and $N$ the countable set $\phi[M]\cup\tilde\phi[M]$; then $\phi[A]\triangle\tilde\phi[A]\subseteq N$ for every $A\subseteq X$.

(b) To see that $\lambda\tilde\phi^{-1}[E]$ exists and is equal to $\mu E$ for every $E\in\Sigma$, I consider successively more complex sets $E$.

(α) If $E=\{t\}$ then $\lambda\tilde\phi^{-1}[E]=\lambda\{\tilde\phi^{-1}(t)\}$ exists and is zero.

(β) If $E$ is of the form $[2^{-n}k,2^{-n}(k+1)[$, where $n\in\mathbb N$ and $0\le k<2^n$, then $\phi^{-1}[E]$ differs by at most two points from a set of the form $\{x : x(i)=z(i)\ \forall\,i<n\}$, so $\tilde\phi^{-1}[E]$ differs from this by a countable set, and $\lambda\tilde\phi^{-1}[E]=2^{-n}=\mu E$.

(γ) If $E$ is of the form $[2^{-n}k,2^{-n}l[$, where $n\in\mathbb N$ and $0\le k<l\le2^n$, then
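$$E=\bigcup_{k\le i<l}[2^{-n}i,2^{-n}(i+1)[$$
[…] $\epsilon>0$ there is a sequence $\langle I_n\rangle_{n\in\mathbb N}$ of half-open subintervals of $[0,1[$ such that $E\setminus\{1\}\subseteq\bigcup_{n\in\mathbb N}I_n$ and $\sum_{n=0}^\infty\mu I_n\le\mu E+\epsilon$; now $\tilde\phi^{-1}[E]\subseteq\{\tilde\phi^{-1}(1)\}\cup\bigcup_{n\in\mathbb N}\tilde\phi^{-1}[I_n]$, so (using (α))
$$\lambda^*\tilde\phi^{-1}[E]\le\lambda\Bigl(\{\tilde\phi^{-1}(1)\}\cup\bigcup_{n\in\mathbb N}\tilde\phi^{-1}[I_n]\Bigr)\le\sum_{n=0}^\infty\lambda\tilde\phi^{-1}[I_n]=\sum_{n=0}^\infty\mu I_n\le\mu E+\epsilon.$$
As $\epsilon$ is arbitrary, $\lambda^*\tilde\phi^{-1}[E]\le\mu E$, and there is a $V\in\Lambda$ such that $\tilde\phi^{-1}[E]\subseteq V$ and $\lambda V\le\mu E$.

(ζ) Similarly, there is a $V'\in\Lambda$ such that $V'\supseteq\tilde\phi^{-1}[[0,1]\setminus E]$ and $\lambda V'\le\mu([0,1]\setminus E)$. Now $V\cup V'=X$, while $\lambda V+\lambda V'\le\mu E+(1-\mu E)=1=\lambda(V\cup V')$, so $\lambda(V\cap V')=0$ and
$$\tilde\phi^{-1}[E]=(X\setminus V')\cup(V\cap V'\cap\tilde\phi^{-1}[E])$$
belongs to $\Lambda$, with $\lambda\tilde\phi^{-1}[E]\le\lambda V\le\mu E$; at the same time, $1-\lambda\tilde\phi^{-1}[E]\le\lambda V'\le1-\mu E$, so $\lambda\tilde\phi^{-1}[E]=\mu E$.

(c) Now suppose that $C\subseteq X$ is a measurable cylinder of the special form $\{x : x(0)=\epsilon_0,\dots,x(n)=\epsilon_n\}$ for some $\epsilon_0,\dots,\epsilon_n\in\{0,1\}$. Then $\phi[C]=[t,t+2^{-n-1}]$ where $t=\sum_{i=0}^n2^{-i-1}\epsilon_i$, so that $\mu\phi[C]=\lambda C$. Since $\phi[C]\triangle\tilde\phi[C]\subseteq N$ is countable, $\mu\tilde\phi[C]=\lambda C$. If $C\subseteq X$ is any measurable cylinder, then it is of the form $\{x : x\restriction J=z\}$ for some finite $J\subseteq\mathbb N$; taking $n$ so large that $J\subseteq\{0,\dots,n\}$, $C$ is expressible as a disjoint union of $2^{n+1-\#(J)}$ sets of the form just considered, being just those in which $\epsilon_i=z(i)$ for $i\in J$. Summing their measures, we again get $\mu\tilde\phi[C]=\lambda C$. Now 254G tells us that $\tilde\phi^{-1}:[0,1]\to X$ is inverse-measure-preserving, that is, $\tilde\phi[W]$ is Lebesgue measurable, with measure $\lambda W$, for every $W\in\Lambda$. Putting this together with (b), $\tilde\phi$ must be an isomorphism between $(X,\Lambda,\lambda)$ and $([0,1],\Sigma,\mu)$, as claimed in (ii) of the proposition.

(d) As for (i), if $E\in\Sigma$ then $\phi^{-1}[E]\triangle\tilde\phi^{-1}[E]\subseteq M$ is countable, so $\lambda\phi^{-1}[E]=\lambda\tilde\phi^{-1}[E]=\mu E$. While if $W\in\Lambda$, $\phi[W]\triangle\tilde\phi[W]\subseteq N$ is countable, so $\mu\phi[W]=\mu\tilde\phi[W]=\lambda W$.

(e) Finally, if $\psi:X\to[0,1]$ is any other bijection which agrees with $\phi$ at all but countably many points, set $M'=\{x : \psi(x)\ne\phi(x)\}$, $N'=\psi[M']\cup\phi[M']$. Then
$$\psi^{-1}[E]\triangle\phi^{-1}[E]\subseteq M',\qquad\lambda\psi^{-1}[E]=\lambda\phi^{-1}[E]=\mu E$$
for every $E\in\Sigma$, and
$$\psi[F]\triangle\phi[F]\subseteq N',\qquad\mu\psi[F]=\mu\phi[F]=\lambda F$$
for every $F\in\Lambda$.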
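A computer can only inspect finitely many coordinates, but the finite-dimensional identities behind 254J and part (c) of the proof of 254K can still be checked exactly. The Python sketch below uses hypothetical function names chosen for illustration. It computes the usual measure of a cylinder $\{x : x\restriction J=z\}$ by enumerating $\{0,1\}^n$. It computes the interval $\phi[C]=[t,t+2^{-n-1}]$ for a cylinder fixing $x(0),\dots,x(n)$. It also evaluates $\phi$ on eventually constant sequences in $H$, using $\sum_{i\ge m}2^{-i-1}=2^{-m}$. It illustrates the statements; it does not replace the proof.

```python
from fractions import Fraction
from itertools import product


def cylinder_measure(n, z):
    """Usual (fair-coin) measure of {x in {0,1}^n : x(i) = z[i] for i in z},
    computed by brute-force counting; z is a dict from coordinates to 0/1."""
    hits = sum(1 for x in product((0, 1), repeat=n)
               if all(x[i] == v for i, v in z.items()))
    return Fraction(hits, 2 ** n)


def phi_interval(eps):
    """Endpoints of phi[C] for C = {x : x(0) = eps[0], ..., x(n) = eps[n]},
    where phi(x) = sum_i 2^(-i-1) x(i); the interval has length 2^(-n-1)."""
    t = sum((Fraction(e, 2 ** (i + 1)) for i, e in enumerate(eps)), Fraction(0))
    return t, t + Fraction(1, 2 ** len(eps))


def phi_eventually_constant(prefix, digit):
    """phi(x) for x = prefix followed by `digit` repeated forever (a point of H)."""
    head = sum((Fraction(e, 2 ** (i + 1)) for i, e in enumerate(prefix)), Fraction(0))
    return head + digit * Fraction(1, 2 ** len(prefix))


# lambda{x : x(0) = 1, x(3) = 0} = 2^-2, whatever the number n >= 4 of coordinates.
print(cylinder_measure(5, {0: 1, 3: 0}))          # 1/4
# C fixes x(0), x(1), x(2), so phi[C] = [5/8, 3/4] has length 2^-3 = lambda C.
print(phi_interval((1, 0, 1)))                    # (Fraction(5, 8), Fraction(3, 4))
# Two distinct points of H with the same image: phi is not injective on H.
print(phi_eventually_constant((0,), 1), phi_eventually_constant((1,), 0))  # 1/2 1/2
```

The last line shows the two points of $H$ that $\phi$ sends to $\frac12$. This is why $\phi$ has to be corrected on a countable set to obtain the bijection $\tilde\phi$.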
254L Subspaces Just as in 251P, we can consider the product of subspace measures. There is a simplification in the form of the result because in the present context we are restricted to probability measures.

Theorem Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, and $(X,\Lambda,\lambda)$ their product.
(a) For each $i\in I$, let $A_i\subseteq X_i$ be a set of full outer measure, and write $\tilde\mu_i$ for the subspace measure on $A_i$ (214B). Let $\tilde\lambda$ be the product measure on $A=\prod_{i\in I}A_i$. Then $\tilde\lambda$ is the subspace measure on $A$ induced by $\lambda$.
(b) $\lambda^*(\prod_{i\in I}A_i)=\prod_{i\in I}\mu_i^*A_i$ whenever $A_i\subseteq X_i$ for every $i$.

proof (a) Write $\lambda_A$ for the subspace measure on $A$ defined from $\lambda$, and $\Lambda_A$ for its domain; write $\tilde\Lambda$ for the domain of $\tilde\lambda$.

(i) Let $\phi:A\to X$ be the identity map. If $C\subseteq X$ is a measurable cylinder, say $C=\prod_{i\in I}C_i$ where $C_i\in\Sigma_i$ for each $i$, then $\phi^{-1}[C]=\prod_{i\in I}(C_i\cap A_i)$ is a measurable cylinder in $A$, and
$$\tilde\lambda\phi^{-1}[C]=\prod_{i\in I}\tilde\mu_i(C_i\cap A_i)\le\prod_{i\in I}\mu_iC_i=\lambda C.$$
By 254G, $\phi$ is inverse-measure-preserving, that is, $\tilde\lambda(A\cap W)=\lambda W$ for every $W\in\Lambda$. But this means that $\tilde\lambda V$ is defined and equal to $\lambda_AV=\lambda^*V$ for every $V\in\Lambda_A$, since for any such $V$ there is a $W\in\Lambda$ such that $V=A\cap W$ and $\lambda W=\lambda_AV$. In particular, $\lambda_AA=1$.

(ii) Now regard $\phi$ as a function from the measure space $(A,\Lambda_A,\lambda_A)$ to $(A,\tilde\Lambda,\tilde\lambda)$. If $D$ is a measurable cylinder in $A$, we can express it as $\prod_{i\in I}D_i$ where every $D_i$ belongs to the domain of $\tilde\mu_i$ and $D_i=A_i$ for all but finitely many $i$. Now for each $i$ we can find $C_i\in\Sigma_i$ such that $D_i=C_i\cap A_i$ and $\mu_iC_i=\tilde\mu_iD_i$, and we can suppose that $C_i=X_i$ whenever $D_i=A_i$. In this case $C=\prod_{i\in I}C_i\in\Lambda$ and
$$\lambda C=\prod_{i\in I}\mu_iC_i=\prod_{i\in I}\tilde\mu_iD_i=\tilde\lambda D.$$
Accordingly
$$\lambda_A\phi^{-1}[D]=\lambda_A(A\cap C)\le\lambda C=\tilde\lambda D.$$
By 254G again, $\phi$ is inverse-measure-preserving in this manifestation, that is, $\lambda_AV$ is defined and equal to $\tilde\lambda V$ for every $V\in\tilde\Lambda$. Putting this together with (i), we have $\lambda_A=\tilde\lambda$, as claimed.

(b) For each $i\in I$, choose a set $E_i\in\Sigma_i$ such that $A_i\subseteq E_i$ and $\mu_iE_i=\mu_i^*A_i$; do this in such a way that $E_i=X_i$ whenever $\mu_i^*A_i=1$. Set $B_i=A_i\cup(X_i\setminus E_i)$, so that $\mu_i^*B_i=1$ for each $i$. (If $F\in\Sigma_i$ and $F\supseteq B_i$ then $F\cap E_i\supseteq A_i$, so $\mu_iF=\mu_i(F\cap E_i)+\mu_i(F\setminus E_i)=\mu_iE_i+\mu_i(X_i\setminus E_i)=1$.)

By (a), we can identify the subspace measure $\lambda_B$ on $B=\prod_{i\in I}B_i$ with the product of the subspace measures $\tilde\mu_i$ on $B_i$. In particular, $\lambda^*B=\lambda_BB=1$. Now $A_i=B_i\cap E_i$, so (writing $A=\prod_{i\in I}A_i$) $A=B\cap\prod_{i\in I}E_i$.

If $\prod_{i\in I}\mu_i^*A_i=0$, then for every $\epsilon>0$ there is a finite $J\subseteq I$ such that $\prod_{i\in J}\mu_i^*A_i\le\epsilon$; consequently (using 254Fb)
$$\lambda^*A\le\lambda\{x : x(i)\in E_i\text{ for every }i\in J\}=\prod_{i\in J}\mu_iE_i\le\epsilon.$$
As $\epsilon$ is arbitrary, $\lambda^*A=0$.

If $\prod_{i\in I}\mu_i^*A_i>0$, then for every $n\in\mathbb N$ the set $\{i : \mu_i^*A_i\le1-2^{-n}\}$ must be finite, so $J=\{i : \mu_i^*A_i<1\}=\{i : E_i\ne X_i\}$ is countable. By 254Fb again, applied to $\langle E_i\cap B_i\rangle_{i\in I}$ in the product $\prod_{i\in I}B_i$,
$$\lambda^*\Bigl(\prod_{i\in I}A_i\Bigr)=\lambda_B\Bigl(\prod_{i\in I}A_i\Bigr)=\lambda_B\{x : x\in B,\ x(i)\in E_i\cap B_i\text{ for every }i\in J\}=\prod_{i\in J}\tilde\mu_i(E_i\cap B_i)=\prod_{i\in I}\mu_i^*A_i,$$
as required.
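Take $I$ finite, each $X_i$ finite, and each $\Sigma_i$ generated by a finite partition of $X_i$. Then the outer measure of a set is the total weight of the partition cells it meets, and the identity $\lambda^*(\prod_iA_i)=\prod_i\mu_i^*A_i$ of 254Lb can be checked directly. The Python sketch below does this for two factors. The partitions, weights and function names are hypothetical, chosen for illustration. The finite setting is an assumption that makes the statement nearly trivial; it is not a substitute for the proof.

```python
from fractions import Fraction
from itertools import product


def outer_measure(A, cells):
    """Outer measure of A when the measure is given by weights on the cells of a
    finite partition: the smallest measurable superset of A is the union of cells meeting A."""
    return sum((w for cell, w in cells if cell & A), Fraction(0))


def product_cells(cells1, cells2):
    """Cells of the product partition of X1 x X2, weighted by products of weights."""
    return [(frozenset(product(c1, c2)), w1 * w2)
            for (c1, w1), (c2, w2) in product(cells1, cells2)]


# Hypothetical factors: X1 = {0, 1, 2, 3} and X2 = {'a', 'b', 'c'}.
cells1 = [(frozenset({0, 1}), Fraction(1, 2)), (frozenset({2}), Fraction(1, 4)),
          (frozenset({3}), Fraction(1, 4))]
cells2 = [(frozenset({'a', 'b'}), Fraction(2, 3)), (frozenset({'c'}), Fraction(1, 3))]

A1, A2 = frozenset({1, 2}), frozenset({'a'})
lhs = outer_measure(frozenset(product(A1, A2)), product_cells(cells1, cells2))
rhs = outer_measure(A1, cells1) * outer_measure(A2, cells2)
print(lhs, rhs)   # 1/2 1/2
```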
254M I now turn to the basic results which make it possible to use these product measures effectively. First, I offer a vocabulary for dealing with subproducts. Let $\langle X_i\rangle_{i\in I}$ be a family of sets, with product $X$.

(a) For $J\subseteq I$, write $X_J$ for $\prod_{i\in J}X_i$. We have a canonical bijection $x\mapsto(x\restriction J,x\restriction I\setminus J):X\to X_J\times X_{I\setminus J}$. Associated with this we have the map $x\mapsto\pi_J(x)=x\restriction J:X\to X_J$. Now I will say that a set $W\subseteq X$ is determined by coordinates in $J$ if there is a $V\subseteq X_J$ such that $W=\pi_J^{-1}[V]$; that is, $W$ corresponds to $V\times X_{I\setminus J}\subseteq X_J\times X_{I\setminus J}$. It is easy to see that
$$\begin{aligned}W\text{ is determined by coordinates in }J&\iff x'\in W\text{ whenever }x\in W,\ x'\in X\text{ and }x'\restriction J=x\restriction J\\&\iff W=\pi_J^{-1}[\pi_J[W]].\end{aligned}$$
It follows that if $W$ is determined by coordinates in $J$, and $J\subseteq K\subseteq I$, then $W$ is also determined by coordinates in $K$.

The family $\mathcal W_J$ of subsets of $X$ determined by coordinates in $J$ is closed under complementation and arbitrary unions and intersections. P If $W\in\mathcal W_J$, then
$$X\setminus W=X\setminus\pi_J^{-1}[\pi_J[W]]=\pi_J^{-1}[X_J\setminus\pi_J[W]]\in\mathcal W_J.$$
If $\mathcal V\subseteq\mathcal W_J$, then
$$\bigcup\mathcal V=\bigcup_{V\in\mathcal V}\pi_J^{-1}[\pi_J[V]]=\pi_J^{-1}\Bigl[\bigcup_{V\in\mathcal V}\pi_J[V]\Bigr]\in\mathcal W_J.$$
Q

(b) It follows that
$$\mathcal W=\bigcup\{\mathcal W_J : J\subseteq I\text{ is countable}\},$$
the family of subsets of $X$ determined by coordinates in some countable set, is a σ-algebra of subsets of $X$. P (i) $X$ and $\emptyset$ are determined by coordinates in $\emptyset$ (recall that $X_\emptyset$ is a singleton, and that $X=\pi_\emptyset^{-1}[X_\emptyset]$, $\emptyset=\pi_\emptyset^{-1}[\emptyset]$). (ii) If $W\in\mathcal W$, there is a countable $J\subseteq I$ such that $W\in\mathcal W_J$; now $X\setminus W=\pi_J^{-1}[X_J\setminus\pi_J[W]]\in\mathcal W_J\subseteq\mathcal W$. (iii) If $\langle W_n\rangle_{n\in\mathbb N}$ is a sequence in $\mathcal W$, then for each $n\in\mathbb N$ there is a countable $J_n\subseteq I$ such that $W_n\in\mathcal W_{J_n}$. Now $J=\bigcup_{n\in\mathbb N}J_n$ is a countable subset of $I$, and every $W_n$ belongs to $\mathcal W_J$, so $\bigcup_{n\in\mathbb N}W_n\in\mathcal W_J\subseteq\mathcal W$. Q

(c) If $i\in I$ and $E\subseteq X_i$ then $\{x : x\in X,\ x(i)\in E\}$ is determined by the single coordinate $i$, so surely belongs to $\mathcal W$; accordingly $\mathcal W$ must include $\widehat{\bigotimes}_{i\in I}\mathcal PX_i$. A fortiori, if $\Sigma_i$ is a σ-algebra of subsets of $X_i$ for each $i$, $\mathcal W\supseteq\widehat{\bigotimes}_{i\in I}\Sigma_i$; that is, every member of $\widehat{\bigotimes}_{i\in I}\Sigma_i$ is determined by coordinates in some countable set.
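For a finite index set, the criterion $W=\pi_J^{-1}[\pi_J[W]]$ of 254Ma can be tested by enumeration. The Python sketch below checks whether a set is determined by coordinates in $J$. It takes $X=\{0,1\}^n$ with points represented as tuples, and the helper names are hypothetical. It also shows that the property passes to supersets of $J$ and to complements, as in 254Ma.

```python
from itertools import product


def restrict(x, J):
    """x restricted to J, recorded as (coordinate, value) pairs in increasing order."""
    return tuple((i, x[i]) for i in sorted(J))


def determined_by(W, J, n):
    """True iff W, a set of 0-1 tuples of length n, satisfies W = pi_J^{-1}[pi_J[W]]."""
    image = {restrict(x, J) for x in W}                       # pi_J[W]
    preimage = {x for x in product((0, 1), repeat=n)          # pi_J^{-1}[pi_J[W]]
                if restrict(x, J) in image}
    return set(W) == preimage


X = set(product((0, 1), repeat=3))
W = {x for x in X if x[0] == 1}
print(determined_by(W, {0}, 3), determined_by(W, {0, 2}, 3), determined_by(W, {1, 2}, 3))
# True True False
```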
254N Theorem Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces and $\langle K_j\rangle_{j\in J}$ a partition of $I$. For each $j\in J$ let $\lambda_j$ be the product measure on $Z_j=\prod_{i\in K_j}X_i$, and write $\lambda$ for the product measure on $X=\prod_{i\in I}X_i$. Then the natural bijection
$$x\mapsto\phi(x)=\langle x\restriction K_j\rangle_{j\in J}:X\to\prod_{j\in J}Z_j$$
identifies $\lambda$ with the product of the family $\langle\lambda_j\rangle_{j\in J}$. In particular, if $K\subseteq I$ is any set, then $\lambda$ can be identified with the c.l.d. product of the product measures on $\prod_{i\in K}X_i$ and $\prod_{i\in I\setminus K}X_i$.

proof (Compare 251M.) Write $Z=\prod_{j\in J}Z_j$ and $\tilde\lambda$ for the product measure on $Z$; let $\Lambda$, $\tilde\Lambda$ be the domains of $\lambda$ and $\tilde\lambda$.

(a) Let $C\subseteq Z$ be a measurable cylinder. Then $\lambda^*\phi^{-1}[C]\le\tilde\lambda C$. P Express $C$ as $\prod_{j\in J}C_j$ where $C_j\subseteq Z_j$ belongs to the domain $\Lambda_j$ of $\lambda_j$ for each $j$. Set $L=\{j : C_j\ne Z_j\}$, so that $L$ is finite. Let $\epsilon>0$. For each $j\in L$ let $\langle C_{jn}\rangle_{n\in\mathbb N}$ be a sequence of measurable cylinders in $Z_j=\prod_{i\in K_j}X_i$ such that $C_j\subseteq\bigcup_{n\in\mathbb N}C_{jn}$ and
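$\sum_{n=0}^\infty\lambda_jC_{jn}\le\lambda_jC_j+\epsilon$. Express each $C_{jn}$ as $\prod_{i\in K_j}C_{jni}$ where $C_{jni}\in\Sigma_i$ for $i\in K_j$ (and $\{i : C_{jni}\ne X_i\}$ is finite). For $f\in\mathbb N^L$, set
$$D_f=\{x : x\in X,\ x(i)\in C_{j,f(j),i}\text{ whenever }j\in L,\ i\in K_j\}.$$
Because $\bigcup_{j\in L}\{i : C_{j,f(j),i}\ne X_i\}$ is finite, $D_f$ is a measurable cylinder in $X$, and
$$\lambda D_f=\prod_{j\in L}\prod_{i\in K_j}\mu_iC_{j,f(j),i}=\prod_{j\in L}\lambda_jC_{j,f(j)}.$$
Also
$$\bigcup\{D_f : f\in\mathbb N^L\}\supseteq\phi^{-1}[C],$$
because if $\phi(x)\in C$ then $\phi(x)(j)\in C_j$ for each $j\in L$, so there must be an $f\in\mathbb N^L$ such that $\phi(x)(j)\in C_{j,f(j)}$ for every $j\in L$. But (because $\mathbb N^L$ is countable) this means that
$$\lambda^*\phi^{-1}[C]\le\sum_{f\in\mathbb N^L}\lambda D_f=\sum_{f\in\mathbb N^L}\prod_{j\in L}\lambda_jC_{j,f(j)}=\prod_{j\in L}\sum_{n=0}^\infty\lambda_jC_{jn}\le\prod_{j\in L}(\lambda_jC_j+\epsilon).$$
As $\epsilon$ is arbitrary, $\lambda^*\phi^{-1}[C]\le\prod_{j\in L}\lambda_jC_j=\tilde\lambda C$. Q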
By 254G, it follows that $\lambda\phi^{-1}[W]$ is defined, and equal to $\tilde\lambda W$, whenever $W\in\tilde\Lambda$.

(b) Next, $\tilde\lambda\phi[D]=\lambda D$ for every measurable cylinder $D\subseteq X$. P This is easy. Express $D$ as $\prod_{i\in I}D_i$ where $D_i\in\Sigma_i$ for every $i\in I$ and $\{i : D_i\ne X_i\}$ is finite. Then $\phi[D]=\prod_{j\in J}\tilde D_j$, where $\tilde D_j=\prod_{i\in K_j}D_i$ is a measurable cylinder in $Z_j$ for each $j\in J$. Because $\{j : \tilde D_j\ne Z_j\}$ must also be finite (in fact, it cannot have more members than the finite set $\{i : D_i\ne X_i\}$), $\prod_{j\in J}\tilde D_j$ is itself a measurable cylinder in $Z$, and
$$\tilde\lambda\phi[D]=\prod_{j\in J}\lambda_j\tilde D_j=\prod_{j\in J}\prod_{i\in K_j}\mu_iD_i=\lambda D.$$
Q

Applying 254G to $\phi^{-1}:Z\to X$, it follows that $\tilde\lambda\phi[W]$ is defined, and equal to $\lambda W$, for every $W\in\Lambda$. But together with (a) this means that for any $W\subseteq X$,
if $W\in\Lambda$ then $\phi[W]\in\tilde\Lambda$ and $\tilde\lambda\phi[W]=\lambda W$;
if $\phi[W]\in\tilde\Lambda$ then $W\in\Lambda$ and $\lambda W=\tilde\lambda\phi[W]$.
And of course this is just what is meant by saying that $\phi$ is an isomorphism between $(X,\Lambda,\lambda)$ and $(Z,\tilde\Lambda,\tilde\lambda)$.

254O Proposition Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces. For each $J\subseteq I$ let $\lambda_J$ be the product probability measure on $X_J=\prod_{i\in J}X_i$, and $\Lambda_J$ its domain; write $X=X_I$, $\lambda=\lambda_I$ and $\Lambda=\Lambda_I$. For $x\in X$ and $J\subseteq I$ set $\pi_J(x)=x\restriction J\in X_J$.
(a) For every $J\subseteq I$, $\lambda_J$ is the image measure $\lambda\pi_J^{-1}$ (112E); in particular, $\pi_J:X\to X_J$ is inverse-measure-preserving for $\lambda$ and $\lambda_J$.
(b) If $J\subseteq I$ and $W\in\Lambda$ is determined by coordinates in $J$ (254M), then $\lambda_J\pi_J[W]$ is defined and equal to $\lambda W$. Consequently there are $W_1$, $W_2$ belonging to the σ-algebra of subsets of $X$ generated by $\{\{x : x(i)\in E\} : i\in J,\ E\in\Sigma_i\}$ such that $W_1\subseteq W\subseteq W_2$ and $\lambda(W_2\setminus W_1)=0$.
(c) For every $W\in\Lambda$, we can find a countable set $J$ and $W_1,W_2\in\Lambda$, both determined by coordinates in $J$, such that $W_1\subseteq W\subseteq W_2$ and $\lambda(W_2\setminus W_1)=0$.
(d) For every $W\in\Lambda$, there is a countable set $J\subseteq I$ such that $\pi_J[W]\in\Lambda_J$ and $\lambda_J\pi_J[W]=\lambda W$; so that $W'=\pi_J^{-1}[\pi_J[W]]$ belongs to $\Lambda$, and $\lambda(W'\setminus W)=0$.

proof (a)(i) By 254N, we can identify $\lambda$ with the product of $\lambda_J$ and $\lambda_{I\setminus J}$ on $X_J\times X_{I\setminus J}$.
Now $\pi_J^{-1}[E]\subseteq X$ corresponds to $E\times X_{I\setminus J}\subseteq X_J\times X_{I\setminus J}$, so
$$\lambda(\pi_J^{-1}[E])=\lambda_JE\cdot\lambda_{I\setminus J}X_{I\setminus J}=\lambda_JE,$$
by 251E or 251Ia, whenever $E\in\Lambda_J$. This shows that $\pi_J$ is inverse-measure-preserving.

(ii) To see that $\lambda_J$ is actually the image measure, suppose that $E\subseteq X_J$ is such that $\pi_J^{-1}[E]\in\Lambda$. Identifying $\pi_J^{-1}[E]$ with $E\times X_{I\setminus J}$, as before, we are supposing that $E\times X_{I\setminus J}$ is measurable for the product measure on $X_J\times X_{I\setminus J}$. But this means that for $\lambda_{I\setminus J}$-almost every $z\in X_{I\setminus J}$, $E_z=\{y : (y,z)\in E\times X_{I\setminus J}\}$ belongs to $\Lambda_J$ (252D(ii), because $\lambda_J$ is complete). Since $E_z=E$ for every $z$, $E$ itself belongs to $\Lambda_J$, as claimed.

(b) If $W\in\Lambda$ is determined by coordinates in $J$, set $H=\pi_J[W]$; then $\pi_J^{-1}[H]=W$, so $H\in\Lambda_J$ by (a) just above. By 254Ff, there are $H_1,H_2\in\widehat{\bigotimes}_{i\in J}\Sigma_i$ such that $H_1\subseteq H\subseteq H_2$ and $\lambda_J(H_2\setminus H_1)=0$. Let $\mathrm T_J$ be the σ-algebra of subsets of $X$ generated by sets of the form $\{x : x(i)\in E\}$ where $i\in J$ and $E\in\Sigma_i$. Consider
$$\mathrm T_J'=\{G : G\subseteq X_J,\ \pi_J^{-1}[G]\in\mathrm T_J\}.$$
This is a σ-algebra of subsets of $X_J$, and it contains $\{y : y\in X_J,\ y(i)\in E\}$ whenever $i\in J$ and $E\in\Sigma_i$ (because $\pi_J^{-1}[\{y : y\in X_J,\ y(i)\in E\}]=\{x : x\in X,\ x(i)\in E\}$ whenever $i\in J$ and $E\subseteq X_i$). So $\mathrm T_J'$ must include $\widehat{\bigotimes}_{i\in J}\Sigma_i$. In particular, $H_1$ and $H_2$ both belong to $\mathrm T_J'$, that is, $W_k=\pi_J^{-1}[H_k]$ belongs to $\mathrm T_J$ for both $k$. Of course $W_1\subseteq W\subseteq W_2$, because $H_1\subseteq H\subseteq H_2$, and $\lambda(W_2\setminus W_1)=\lambda_J(H_2\setminus H_1)=0$, as required.

(c) Now take any $W\in\Lambda$. By 254Ff, there are $W_1$ and $W_2\in\widehat{\bigotimes}_{i\in I}\Sigma_i$ such that $W_1\subseteq W\subseteq W_2$ and $\lambda(W_2\setminus W_1)=0$. By 254Mc, there are countable sets $J_1,J_2\subseteq I$ such that, for each $k$, $W_k$ is determined by coordinates in $J_k$. Setting $J=J_1\cup J_2$, $J$ is a countable subset of $I$ and both $W_1$ and $W_2$ are determined by coordinates in $J$.

(d) Continuing the argument from (c), $\pi_J[W_1],\pi_J[W_2]\in\Lambda_J$, by (b), and $\lambda_J(\pi_J[W_2]\setminus\pi_J[W_1])=0$. Since $\pi_J[W_1]\subseteq\pi_J[W]\subseteq\pi_J[W_2]$, it follows that $\pi_J[W]\in\Lambda_J$, with $\lambda_J\pi_J[W]=\lambda_J\pi_J[W_2]$; so that, setting $W'=\pi_J^{-1}[\pi_J[W]]$, $W'\in\Lambda$, and
$$\lambda W'=\lambda_J\pi_J[W]=\lambda_J\pi_J[W_2]=\lambda\pi_J^{-1}[\pi_J[W_2]]=\lambda W_2=\lambda W.$$
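In the finite case, 254Oa says that the pull-back $\pi_J^{-1}[E]$ of a set $E\subseteq X_J$ has the same measure as $E$. The identification of 254N says that sets determined by disjoint blocks of coordinates have multiplying measures. The Python sketch below checks both on $\{0,1\}^3$ with independent biased coins. The probabilities $p_i$ of the value 1 and the helper names are hypothetical choices for illustration. `math.prod` requires Python 3.8 or later.

```python
from fractions import Fraction
from itertools import product
from math import prod


def point_mass(x, p):
    """Weight of the point x in {0,1}^n under the product of biased coins,
    where coordinate i takes the value 1 with probability p[i]."""
    return prod(p[i] if xi == 1 else 1 - p[i] for i, xi in enumerate(x))


def measure(W, p):
    """Product measure of a finite set W of 0-1 tuples."""
    return sum((point_mass(x, p) for x in W), Fraction(0))


def pullback(E, J, n):
    """pi_J^{-1}[E] in {0,1}^n, where E is a set of tuples indexed by sorted(J)."""
    Js = sorted(J)
    return {x for x in product((0, 1), repeat=n) if tuple(x[i] for i in Js) in E}


p = [Fraction(1, 2), Fraction(1, 3), Fraction(3, 4)]
E = {(1, 1), (0, 1)}                  # a subset of X_J for J = {0, 2}: "x(2) = 1"
print(measure(pullback(E, {0, 2}, 3), p), measure(E, [p[0], p[2]]))   # 3/4 3/4
```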
254P Proposition Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, and for each $J\subseteq I$ let $\lambda_J$ be the product probability measure on $X_J=\prod_{i\in J}X_i$, and $\Lambda_J$ its domain; write $X=X_I$, $\Lambda=\Lambda_I$ and $\lambda=\lambda_I$. For $x\in X$ and $J\subseteq I$ set $\pi_J(x)=x\restriction J\in X_J$.
(a) If $J\subseteq I$ and $g$ is a real-valued function defined on a subset of $X_J$, then $g$ is $\Lambda_J$-measurable iff $g\pi_J$ is $\Lambda$-measurable.
(b) Whenever $f$ is a $\Lambda$-measurable real-valued function defined on a $\lambda$-conegligible subset of $X$, we can find a countable set $J\subseteq I$ and a $\Lambda_J$-measurable function $g$ defined on a $\lambda_J$-conegligible subset of $X_J$ such that $f$ extends $g\pi_J$.

proof (a)(i) If $g$ is $\Lambda_J$-measurable and $a\in\mathbb R$, there is an $H\in\Lambda_J$ such that $\{y : y\in\operatorname{dom}g,\ g(y)\ge a\}=H\cap\operatorname{dom}g$. Now $\pi_J^{-1}[H]\in\Lambda$, by 254Oa, and $\{x : x\in\operatorname{dom}g\pi_J,\ g\pi_J(x)\ge a\}=\pi_J^{-1}[H]\cap\operatorname{dom}g\pi_J$. So $g\pi_J$ is $\Lambda$-measurable.

(ii) If $g\pi_J$ is $\Lambda$-measurable and $a\in\mathbb R$, then there is a $W\in\Lambda$ such that $\{x : g\pi_J(x)\ge a\}=W\cap\operatorname{dom}g\pi_J$. As in the proof of 254Oa, we may identify $\lambda$ with the product of $\lambda_J$ and $\lambda_{I\setminus J}$, and 252D(ii) tells us that, if we identify $W$ with the corresponding subset of $X_J\times X_{I\setminus J}$, there is at least one $z\in X_{I\setminus J}$ such that $W_z=\{y : y\in X_J,\ (y,z)\in W\}$ belongs to $\Lambda_J$. But since (on this convention) $g\pi_J(y,z)=g(y)$ for every $y\in X_J$, we see that $\{y : y\in\operatorname{dom}g,\ g(y)\ge a\}=W_z\cap\operatorname{dom}g$. As $a$ is arbitrary, $g$ is $\Lambda_J$-measurable.

(b) For rational numbers $q$, set $W_q=\{x : x\in\operatorname{dom}f,\ f(x)\ge q\}$. By 254Oc we can find for each $q$ a countable set $J_q\subseteq I$ and sets $W_q'$, $W_q''$, both determined by coordinates in $J_q$, such that $W_q'\subseteq W_q\subseteq W_q''$ and $\lambda(W_q''\setminus W_q')=0$. Set $J=\bigcup_{q\in\mathbb Q}J_q$ and $V=X\setminus\bigcup_{q\in\mathbb Q}(W_q''\setminus W_q')$; then $J$ is a countable subset of $I$ and $V$ is a conegligible subset of $X$; moreover, $V$ is determined by coordinates in $J$ because all the $W_q'$, $W_q''$ are.
For every $q\in\mathbb Q$, $W_q\cap V=W_q'\cap V$, because $V\cap(W_q\setminus W_q')\subseteq V\cap(W_q''\setminus W_q')=\emptyset$; so $W_q\cap V$ is determined by coordinates in $J$. Consequently $V\cap\operatorname{dom}f=\bigcup_{q\in\mathbb Q}V\cap W_q$ is also determined by coordinates in $J$. Also $\{x : x\in V\cap\operatorname{dom}f,\ f(x)\ge a\}=\bigcap_{q\le a}V\cap W_q$ is determined by coordinates in $J$. What this means is that if $x,x'\in V$ and $\pi_Jx=\pi_Jx'$, then $x\in\operatorname{dom}f$ iff $x'\in\operatorname{dom}f$, and in this case $f(x)=f(x')$. Setting $H=\pi_J[V\cap\operatorname{dom}f]$, we have $\pi_J^{-1}[H]=V\cap\operatorname{dom}f$, a conegligible subset of $X$, so (because $\lambda_J=\lambda\pi_J^{-1}$) $H$ is conegligible in $X_J$. Also, for $y\in H$, $f(x)=f(x')$ whenever $\pi_Jx=\pi_Jx'=y$, so there is a function $g:H\to\mathbb R$ defined by saying that $g\pi_J(x)=f(x)$ whenever $x\in V\cap\operatorname{dom}f$. Thus $g$ is defined almost everywhere in $X_J$ and $f$ extends $g\pi_J$. Finally, for any $a\in\mathbb R$, $\pi_J^{-1}[\{y : g(y)\ge a\}]=\{x : x\in V\cap\operatorname{dom}f,\ f(x)\ge a\}\in\Lambda$; by 254Oa, $\{y : g(y)\ge a\}\in\Lambda_J$; as $a$ is arbitrary, $g$ is measurable.

254Q Proposition Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, and for each $J\subseteq I$ let $\lambda_J$ be the product probability measure on $X_J=\prod_{i\in J}X_i$; write $X=X_I$, $\lambda=\lambda_I$. For $x\in X$ and $J\subseteq I$ set $\pi_J(x)=x\restriction J\in X_J$.
(a) Let $S$ be the linear subspace of $\mathbb R^X$ spanned by $\{\chi C : C\subseteq X\text{ is a measurable cylinder}\}$. Then for every $\lambda$-integrable real-valued function $f$ and every $\epsilon>0$ there is a $g\in S$ such that $\int|f-g|\,d\lambda\le\epsilon$.
(b) Whenever $J\subseteq I$ and $g$ is a real-valued function defined on a subset of $X_J$, then $\int g\,d\lambda_J=\int g\pi_J\,d\lambda$ if either integral is defined in $[-\infty,\infty]$.
(c) Whenever $f$ is a $\lambda$-integrable real-valued function, we can find a countable set $J\subseteq I$ and a $\lambda_J$-integrable function $g$ such that $f$ extends $g\pi_J$.

proof (a)(i) Write $\mathcal S$ for the set of functions $f$ satisfying the assertion, that is, such that for every $\epsilon>0$ there is a $g\in S$ such that $\int|f-g|\le\epsilon$. Then $f_1+f_2$ and $cf_1\in\mathcal S$ whenever $f_1,f_2\in\mathcal S$ and $c\in\mathbb R$. P Given $\epsilon>0$ there are $g_1,g_2\in S$ such that $\int|f_1-g_1|\le\frac{\epsilon}{2+|c|}$ and $\int|f_2-g_2|\le\frac{\epsilon}{2}$; now $g_1+g_2$, $cg_1\in S$ and
$$\int|(f_1+f_2)-(g_1+g_2)|\le\epsilon,\qquad\int|cf_1-cg_1|\le\epsilon.$$
Q Also, of course, $f\in\mathcal S$ whenever $f_0\in\mathcal S$ and $f=_{\text{a.e.}}f_0$.

(ii) Write $\mathcal W$ for $\{W : W\subseteq X,\ \chi W\in\mathcal S\}$, and $\mathcal C$ for the family of measurable cylinders in $X$. Then it is plain from the definition in 254A that $C\cap C'\in\mathcal C$ for all $C,C'\in\mathcal C$, and of course $C\in\mathcal W$ for every $C\in\mathcal C$, because $\chi C\in S$. Next, $W\setminus V\in\mathcal W$ whenever $W,V\in\mathcal W$ and $V\subseteq W$, because then $\chi(W\setminus V)=\chi W-\chi V$. Thirdly, $\bigcup_{n\in\mathbb N}W_n\in\mathcal W$ for every non-decreasing sequence $\langle W_n\rangle_{n\in\mathbb N}$ in $\mathcal W$. P Set $W=\bigcup_{n\in\mathbb N}W_n$. Given $\epsilon>0$, there is an $n\in\mathbb N$ such that $\lambda(W\setminus W_n)\le\frac{\epsilon}{2}$. Now there is a $g\in S$ such that $\int|\chi W_n-g|\le\frac{\epsilon}{2}$, so that $\int|\chi W-g|\le\epsilon$. Q Thus $\mathcal W$ is a Dynkin class of subsets of $X$. By the Monotone Class Theorem (136B), $\mathcal W$ must include the σ-algebra of subsets of $X$ generated by $\mathcal C$, which is $\widehat{\bigotimes}_{i\in I}\Sigma_i$. But this means that $\mathcal W$ contains every measurable subset of $X$, since by 254Ff any measurable set differs by a negligible set from some member of $\widehat{\bigotimes}_{i\in I}\Sigma_i$.
(iii) Thus $\mathcal S$ contains the characteristic function of any measurable subset of $X$. Because it is closed under addition and scalar multiplication, it contains all simple functions. But this means that it must contain all integrable functions. P If $f$ is a real-valued function which is integrable over $X$, and $\epsilon>0$, there is a simple function $h:X\to\mathbb R$ such that $\int|f-h|\le\frac{\epsilon}{2}$ (242M), and now there is a $g\in S$ such that $\int|h-g|\le\frac{\epsilon}{2}$, so that $\int|f-g|\le\epsilon$. Q This proves part (a) of the proposition.

(b) Put 254Oa and 235L together.

(c) By 254Pb, there are a countable $J\subseteq I$ and a real-valued function $g$ defined on a conegligible subset of $X_J$ such that $f$ extends $g\pi_J$. Now $\operatorname{dom}(g\pi_J)=\pi_J^{-1}[\operatorname{dom}g]$ is conegligible, so $f=_{\text{a.e.}}g\pi_J$ and $g\pi_J$ is $\lambda$-integrable. By (b), $g$ is $\lambda_J$-integrable.

254R Conditional expectations again Putting the ideas of 253H together with the work above, we obtain some results which are important not only for their direct applications but for the light they throw on the structures here.
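Take finitely many two-point factors with product measure. Then the conditional expectation on the σ-algebra of sets determined by coordinates in $J$ is obtained, as in the formula of 253H, by integrating out the coordinates outside $J$. The Python sketch below implements this averaging for independent biased coins on $\{0,1\}^n$. It then checks an identity of the form $P_KP_J=P_{K\cap J}$ exactly, with rational arithmetic. The probabilities, the test function and the name `cond_exp` are hypothetical, chosen for illustration. The finite setting is an assumption; the theorem concerns arbitrary index sets.

```python
from fractions import Fraction
from itertools import product
from math import prod


def cond_exp(f, J, p):
    """Conditional expectation of f on the sets determined by coordinates in J, for the
    product of biased coins on {0,1}^n (coordinate i is 1 with probability p[i]):
    keep x restricted to J fixed and average f over the remaining coordinates."""
    out = [i for i in range(len(p)) if i not in J]

    def g(x):
        total = Fraction(0)
        for t in product((0, 1), repeat=len(out)):
            y = list(x)
            for i, ti in zip(out, t):
                y[i] = ti                      # replace the coordinates outside J
            weight = prod(p[i] if ti == 1 else 1 - p[i] for i, ti in zip(out, t))
            total += weight * f(tuple(y))
        return total

    return g


p = [Fraction(1, 2), Fraction(1, 3), Fraction(2, 5), Fraction(3, 4)]
f = lambda x: x[0] + 2 * x[1] * x[2] - 3 * x[3] + x[0] * x[3]
J, K = {0, 1, 3}, {1, 2, 3}
lhs = cond_exp(cond_exp(f, J, p), K, p)   # P_K P_J f
rhs = cond_exp(f, J & K, p)               # P_{K cap J} f
print(all(lhs(x) == rhs(x) for x in product((0, 1), repeat=4)))   # True
```

Taking $J=\emptyset$ gives the constant function $\int f\,d\lambda$. Taking $J$ to be the whole index set returns $f$ unchanged.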
Theorem Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces with product $(X,\Lambda,\lambda)$. For $J\subseteq I$ let $\Lambda_J\subseteq\Lambda$ be the σ-subalgebra of sets determined by coordinates in $J$ (254Mb). Then we may regard $L^0(\lambda\restriction\Lambda_J)$ as a subspace of $L^0(\lambda)$ (242Jh). Let $P_J:L^1(\lambda)\to L^1(\lambda\restriction\Lambda_J)\subseteq L^1(\lambda)$ be the corresponding conditional expectation operator (242J). Then
(a) for any $J,K\subseteq I$, $P_{K\cap J}=P_KP_J$;
(b) for any $u\in L^1(\lambda)$, there is a countable set $J^*\subseteq I$ such that $P_Ju=u$ iff $J\supseteq J^*$;
(c) for any $u\in L^0(\lambda)$, there is a unique smallest set $J^*\subseteq I$ such that $u\in L^0(\lambda\restriction\Lambda_{J^*})$, and this $J^*$ is countable;
(d) for any $W\in\Lambda$ there is a unique smallest set $J^*\subseteq I$ such that $W\triangle W'$ is negligible for some $W'\in\Lambda_{J^*}$, and this $J^*$ is countable;
(e) for any $\Lambda$-measurable real-valued function $f:X\to\mathbb R$ there is a unique smallest set $J^*\subseteq I$ such that $f$ is equal almost everywhere to a $\Lambda_{J^*}$-measurable function, and this $J^*$ is countable.

proof For $J\subseteq I$, write $X_J=\prod_{i\in J}X_i$, let $\lambda_J$ be the product measure on $X_J$, and set $\phi_J(x)=x\restriction J$ for $x\in X$. Write $L^0_J$ for $L^0(\lambda\restriction\Lambda_J)$, regarded as a subset of $L^0=L^0_I$, and $L^1_J$ for $L^1(\lambda\restriction\Lambda_J)=L^1(\lambda)\cap L^0_J$, as in 242Jb; thus $L^1_J$ is the set of values of the projection $P_J$.

(a)(i) Let $C\subseteq X$ be a measurable cylinder, expressed as $\prod_{i\in I}C_i$ where $C_i\in\Sigma_i$ for every $i$ and $L=\{i : C_i\ne X_i\}$ is finite. Set
$$C_i'=C_i\text{ for }i\in J,\quad C_i'=X_i\text{ for }i\in I\setminus J,\qquad C'=\prod_{i\in I}C_i',\qquad\alpha=\prod_{i\in I\setminus J}\mu_iC_i.$$
Then $\alpha\chi C'$ is a conditional expectation of $\chi C$ on $\Lambda_J$. P By 254N, we can identify $\lambda$ with the product of $\lambda_J$ and $\lambda_{I\setminus J}$. This identifies $\Lambda_J$ with $\{E\times X_{I\setminus J} : E\in\operatorname{dom}\lambda_J\}$. By 253H we have a conditional expectation $g$ of $\chi C$ defined by setting
$$g(y,z)=\int\chi C(y,t)\,\lambda_{I\setminus J}(dt)$$
for $y\in X_J$, $z\in X_{I\setminus J}$. But $C$ is identified with $C_J\times C_{I\setminus J}$, where $C_J=\prod_{i\in J}C_i$ and $C_{I\setminus J}=\prod_{i\in I\setminus J}C_i$, so that $g(y,z)=0$ if $y\notin C_J$ and otherwise is $\lambda_{I\setminus J}C_{I\setminus J}=\alpha$. Thus $g=\alpha\chi(C_J\times X_{I\setminus J})$. But the identification between $X_J\times X_{I\setminus J}$ and $X$ matches $C_J\times X_{I\setminus J}$ with $C'$, as described above. So $g$ becomes identified with $\alpha\chi C'$, and $\alpha\chi C'$ is a conditional expectation of $\chi C$. Q

(ii) Next, setting
$$C_i''=C_i'\text{ for }i\in K,\quad C_i''=X_i\text{ for }i\in I\setminus K,\qquad C''=\prod_{i\in I}C_i'',\qquad\beta=\prod_{i\in I\setminus K}\mu_iC_i'=\prod_{i\in J\setminus K}\mu_iC_i,$$
the same arguments show that $\beta\chi C''$ is a conditional expectation of $\chi C'$ on $\Lambda_K$. So we have
$$P_KP_J(\chi C)^\bullet=\beta\alpha(\chi C'')^\bullet.$$
But if we look at $\beta\alpha$, this is just $\prod_{i\in I\setminus(K\cap J)}\mu_iC_i$, while $C_i''=C_i$ if $i\in K\cap J$, $X_i$ for other $i$. So $\beta\alpha\chi C''$ is a conditional expectation of $\chi C$ on $\Lambda_{K\cap J}$, and $P_KP_J(\chi C)^\bullet=P_{K\cap J}(\chi C)^\bullet$.

(iii) Thus we see that the operators $P_KP_J$ and $P_{K\cap J}$ agree on elements of the form $(\chi C)^\bullet$ where $C$ is a measurable cylinder. Because they are both linear, they agree on linear combinations of these, that is, $P_KP_Jv=P_{K\cap J}v$ whenever $v=g^\bullet$ for some $g$ in the space $S$ of 254Q. But if $u\in L^1(\lambda)$ and $\epsilon>0$, there is a $\lambda$-integrable function $f$ such that $f^\bullet=u$, and there is a $g\in S$ such that $\int|f-g|\le\epsilon$ (254Qa), so that $\|u-v\|_1\le\epsilon$, where $v=g^\bullet$. Since $P_J$, $P_K$ and $P_{K\cap J}$ are all linear operators of norm 1,
$$\|P_KP_Ju-P_{K\cap J}u\|_1\le2\|u-v\|_1+\|P_KP_Jv-P_{K\cap J}v\|_1\le2\epsilon.$$
As $\epsilon$ is arbitrary, $P_KP_Ju=P_{K\cap J}u$; as $u$ is arbitrary, $P_KP_J=P_{K\cap J}$.

(b) Take $u\in L^1(\lambda)$. Let $\mathcal J$ be the family of all subsets $J$ of $I$ such that $P_Ju=u$. By (a), $J\cap K\in\mathcal J$ for all $J,K\in\mathcal J$. Next, $\mathcal J$ contains a countable set $J_0$. P Let $f$ be a $\lambda$-integrable function such that $f^\bullet=u$. By 254Qc, we can find a countable set $J_0\subseteq I$ and a $\lambda_{J_0}$-integrable function $g$ such that $f=_{\text{a.e.}}g\pi_{J_0}$. Now $g\pi_{J_0}$ is $\Lambda_{J_0}$-measurable and $u=(g\pi_{J_0})^\bullet$ belongs to $L^1_{J_0}$, so $J_0\in\mathcal J$. Q
Write $J^*=\bigcap\mathcal J$, so that $J^*\subseteq J_0$ is countable. Then $J^*\in\mathcal J$. ▶ Let $\epsilon>0$. As in the proof of (a) above, there is a $g\in S$ such that $\|u-v\|_1\le\epsilon$, where $v=g^\bullet$. But because $g$ is a finite linear combination of characteristic functions of measurable cylinders, each determined by coordinates in some finite set, there is a finite $K\subseteq I$ such that $g$ is $\Lambda_K$-measurable, so that $P_Kv=v$. Because $K$ is finite, there must be $J_1,\dots,J_n\in\mathcal J$ such that $J^*\cap K=\bigcap_{1\le i\le n}J_i\cap K$; but as $\mathcal J$ is closed under finite intersections, $J=J_1\cap\dots\cap J_n\in\mathcal J$, and $J^*\cap K=J\cap K$. Now we have
$$P_{J^*}v=P_{J^*}P_Kv=P_{J^*\cap K}v=P_{J\cap K}v=P_JP_Kv=P_Jv,$$
using (a) twice. Because both $P_J$ and $P_{J^*}$ have norm 1,
$$\|P_{J^*}u-u\|_1\le\|P_{J^*}u-P_{J^*}v\|_1+\|P_{J^*}v-P_Jv\|_1+\|P_Jv-P_Ju\|_1+\|P_Ju-u\|_1\le\|u-v\|_1+0+\|u-v\|_1+0\le2\epsilon.$$
As $\epsilon$ is arbitrary, $P_{J^*}u=u$ and $J^*\in\mathcal J$. ◀
Now, for any $J\subseteq I$,
$$P_Ju=u\implies J\in\mathcal J\implies J\supseteq J^*\implies P_Ju=P_JP_{J^*}u=P_{J\cap J^*}u=P_{J^*}u=u.$$
Thus $J^*$ has the required properties.
(c) Set $e=(\chi X)^\bullet$ and $u_n=(-ne)\vee(u\wedge ne)$ for each $n\in\mathbb N$. Then, for any $J\subseteq I$, $u\in L^0_J$ iff $u_n\in L^0_J$ for every $n$. ▶ (α) If $u\in L^0_J$, then $u$ is expressible as $f^\bullet$ for some $\Lambda_J$-measurable $f$; now $f_n=(-n\chi X)\vee(f\wedge n\chi X)$ is $\Lambda_J$-measurable, so $u_n=f_n^\bullet\in L^0_J$ for every $n$. (β) If $u_n\in L^0_J$ for each $n$, then for each $n$ we can find a $\Lambda_J$-measurable function $f_n$ such that $f_n^\bullet=u_n$. But there is also a $\Lambda$-measurable function $f$ such that $u=f^\bullet$, and we must have $f_n=_{\text{a.e.}}(-n\chi X)\vee(f\wedge n\chi X)$ for each $n$, so that $f=_{\text{a.e.}}\lim_{n\to\infty}f_n$ and $u=(\lim_{n\to\infty}f_n)^\bullet$. Since $\lim_{n\to\infty}f_n$ is $\Lambda_J$-measurable, $u\in L^0_J$. ◀
As every $u_n$ belongs to $L^1$, we know that $u_n\in L^0_J\iff u_n\in L^1_J\iff P_Ju_n=u_n$. By (b), there is for each $n$ a countable $J^*_n$ such that $P_Ju_n=u_n$ iff $J\supseteq J^*_n$. So we see that $u\in L^0_J$ iff $J\supseteq J^*_n$ for every $n$, that is, $J\supseteq\bigcup_{n\in\mathbb N}J^*_n$. Thus $J^*=\bigcup_{n\in\mathbb N}J^*_n$ has the property claimed.
(d) Applying (c) to $u=(\chi W)^\bullet$, we have a (countable) unique smallest $J^*$ such that $u\in L^0_{J^*}$.
But if $J\subseteq I$, then there is a $W'\in\Lambda_J$ such that $W'\triangle W$ is negligible iff $u\in L^0_J$. So this is the $J^*$ we are looking for.
(e) Again apply (c), this time to $f^\bullet$.
254S Proposition Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, with product $(X,\Lambda,\lambda)$.
(a) If $A\subseteq X$ is determined by coordinates in $I\setminus\{j\}$ for every $j\in I$, then its outer measure $\lambda^*A$ must be either 0 or 1.
(b) If $W\in\Lambda$ and $\lambda W>0$, then for every $\epsilon>0$ there are a $W'\in\Lambda$ and a finite set $J\subseteq I$ such that $\lambda W'\ge1-\epsilon$ and for every $x\in W'$ there is a $y\in W$ such that $x\restriction(I\setminus J)=y\restriction(I\setminus J)$.
proof For $J\subseteq I$ write $X_J$ for $\prod_{i\in J}X_i$ and $\lambda_J$ for the product measure on $X_J$.
(a) Let $W$ be a measurable envelope of $A$. By 254Rd, there is a smallest $J\subseteq I$ for which there is a $W'\in\Lambda$, determined by coordinates in $J$, with $\lambda(W\triangle W')=0$. Now $J=\emptyset$. ▶ Take any $j\in I$. Then $A$ is determined by coordinates in $I\setminus\{j\}$, that is, can be regarded as $X_j\times A'$ for some $A'\subseteq X_{I\setminus\{j\}}$. We can also think of $\lambda$ as the product of $\lambda_{\{j\}}$ and $\lambda_{I\setminus\{j\}}$ (254N). Let $\Lambda_{I\setminus\{j\}}$ be the domain of $\lambda_{I\setminus\{j\}}$. By 251R, $\lambda^*A=\lambda^*_{\{j\}}X_j\cdot\lambda^*_{I\setminus\{j\}}A'=\lambda^*_{I\setminus\{j\}}A'$. Let $V\in\Lambda_{I\setminus\{j\}}$ be a measurable envelope of $A'$. Then $W'=X_j\times V$ belongs to $\Lambda$, includes $A$ and has measure $\lambda^*A$, so $\lambda(W\cap W')=\lambda W=\lambda W'$ and $W\triangle W'$ is negligible. At the same time, $W'$ is determined by coordinates in $I\setminus\{j\}$. This means that $J$ must be included in $I\setminus\{j\}$. As $j$ is arbitrary, $J=\emptyset$. ◀
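But the only subsets of $X$ which are determined by coordinates in $\emptyset$ are $X$ and $\emptyset$. Since $W$ differs from one of these by a negligible set, $\lambda^*A=\lambda W\in\{0,1\}$, as claimed.
(b) Set $\eta=\frac12\min(\epsilon,1)\lambda W$. By 254Fe, there is a measurable set $V$, determined by coordinates in a finite subset $J$ of $I$, such that $\lambda(W\triangle V)\le\eta$. Note that $\lambda V\ge\lambda W-\eta\ge\frac12\lambda W>0$, so $\lambda(W\triangle V)\le\frac12\epsilon\lambda W\le\epsilon\lambda V$.
We may identify $\lambda$ with the c.l.d. product of $\lambda_J$ and $\lambda_{I\setminus J}$ (254N). Let $\tilde W,\tilde V\subseteq X_J\times X_{I\setminus J}$ be the sets corresponding to $W,V\subseteq X$. Then $\tilde V$ can be expressed as $U\times X_{I\setminus J}$ where $\lambda_JU=\lambda V>0$. Set $U'=\{z:z\in X_{I\setminus J},\ \lambda_J\tilde W^{-1}[\{z\}]=0\}$. Then $U'$ is measured by $\lambda_{I\setminus J}$ (252D(ii) again, because both $\lambda_J$ and $\lambda_{I\setminus J}$ are complete), and
$$\begin{aligned}
\lambda_JU\cdot\lambda_{I\setminus J}U'&\le\int\lambda_J(\tilde W^{-1}[\{z\}]\triangle U)\,\lambda_{I\setminus J}(dz)\\
&\qquad\text{(because if $z\in U'$ then $\lambda_J(\tilde W^{-1}[\{z\}]\triangle U)=\lambda_JU$)}\\
&=\int\lambda_J(\tilde W\triangle\tilde V)^{-1}[\{z\}]\,\lambda_{I\setminus J}(dz)=(\lambda_J\times\lambda_{I\setminus J})(\tilde W\triangle\tilde V)\\
&\qquad\text{(252D once more)}\\
&=\lambda(W\triangle V)\le\epsilon\lambda V=\epsilon\lambda_JU.
\end{aligned}$$
This means that $\lambda_{I\setminus J}U'\le\epsilon$. Set $W'=\{x:x\in X,\ x\restriction(I\setminus J)\notin U'\}$; then $\lambda W'\ge1-\epsilon$. If $x\in W'$, then $z=x\restriction(I\setminus J)\notin U'$, so $\tilde W^{-1}[\{z\}]$ is not empty, that is, there is a $y\in W$ such that $y\restriction(I\setminus J)=z$. So this $W'$ has the required properties.
254T Remarks It is important to understand that the results above apply to $L^0$ and $L^1$ and to measurable sets up to negligible sets, not to sets and functions themselves. One idea does apply to sets and functions, whether measurable or not.
(a) Let $\langle X_i\rangle_{i\in I}$ be a family of sets with Cartesian product $X$. For each $J\subseteq I$ let $\mathcal W_J$ be the set of subsets of $X$ determined by coordinates in $J$. Then $\mathcal W_J\cap\mathcal W_K=\mathcal W_{J\cap K}$ for all $J,K\subseteq I$. ▶ Of course $\mathcal W_J\cap\mathcal W_K\supseteq\mathcal W_{J\cap K}$, because $\mathcal W_J\supseteq\mathcal W_{J'}$ whenever $J'\subseteq J$. On the other hand, suppose $W\in\mathcal W_J\cap\mathcal W_K$, $x\in W$, $y\in X$ and $x\restriction(J\cap K)=y\restriction(J\cap K)$. Set $z(i)=x(i)$ for $i\in J$, $y(i)$ for $i\in I\setminus J$. Then $z\restriction J=x\restriction J$, so $z\in W$. Also $y\restriction K=z\restriction K$, so $y\in W$. As $x$, $y$ are arbitrary, $W\in\mathcal W_{J\cap K}$; as $W$ is arbitrary, $\mathcal W_J\cap\mathcal W_K\subseteq\mathcal W_{J\cap K}$. ◀
Accordingly, for any $W\subseteq X$, $\mathcal F=\{J:W\in\mathcal W_J\}$ is a filter on $I$ (unless $W=X$ or $W=\emptyset$, in which case $\mathcal F=\mathcal PI$). But $\mathcal F$ does not necessarily have a least element, as the following example shows.
(b) Set $X=\{0,1\}^{\mathbb N}$, $W=\{x:x\in X,\ \lim_{i\to\infty}x(i)=0\}$. Then for every $n\in\mathbb N$, $W$ is determined by coordinates in $J_n=\{i:i\ge n\}$. But $W$ is not determined by coordinates in $\bigcap_{n\in\mathbb N}J_n=\emptyset$. Note that $W=\bigcup_{n\in\mathbb N}\bigcap_{i\ge n}\{x:x(i)=0\}$ is measurable for the usual measure on $X$. But it is also negligible (since it is countable); in 254Rd we have $J=\emptyset$, $W'=\emptyset$.
*254U I am now in a position to describe a counter-example answering a natural question arising out of §251.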
Example There are a localizable measure space $(X,\Sigma,\mu)$ and a probability space $(Y,\mathrm T,\nu)$ such that the c.l.d. product measure $\lambda$ on $X\times Y$ is not localizable.
proof (a) Take $(X,\Sigma,\mu)$ to be the space of 216E, so that $X=\{0,1\}^I$, where $I=\mathcal PC$ for some set $C$ of cardinal greater than $\mathfrak c$. For each $\gamma\in C$ write $E_\gamma$ for $\{x:x\in X,\ x(\{\gamma\})=1\}$ (that is, $G_{\{\gamma\}}$ in the notation of 216Ec); then $E_\gamma\in\Sigma$ and $\mu E_\gamma=1$; also every measurable set of non-zero measure meets some $E_\gamma$ in a set of non-zero measure, while $E_\gamma\cap E_\delta$ is negligible for all distinct $\gamma$, $\delta$ (see 216Ee). Let $(Y,\mathrm T,\nu)$ be $\{0,1\}^C$ with the usual measure (254J). For $\gamma\in C$, let $F_\gamma$ be $\{y:y\in Y,\ y(\gamma)=1\}$, so that $\nu F_\gamma=\frac12$. Let $\lambda$ be the c.l.d. product measure on $X\times Y$, and $\Lambda$ its domain.
(b) Consider the family $\mathcal W=\{E_\gamma\times F_\gamma:\gamma\in C\}\subseteq\Lambda$. Suppose, if possible, that $V$ were an essential supremum of $\mathcal W$ in $\Lambda$ in the sense of 211G. For $\gamma\in C$ write $H_\gamma=\{x:V[\{x\}]\triangle F_\gamma\text{ is negligible}\}$. Because $F_\gamma\triangle F_\delta$ is non-negligible, $H_\gamma\cap H_\delta=\emptyset$ for all $\gamma\ne\delta$. Now $E_\gamma\setminus H_\gamma$ is $\mu$-negligible for every $\gamma\in C$. ▶ $\lambda((E_\gamma\times F_\gamma)\setminus V)=0$, so $F_\gamma\setminus V[\{x\}]$ is negligible for almost every $x\in E_\gamma$, by 252D. On the other hand, if we set $F'_\gamma=Y\setminus F_\gamma$, $W_\gamma=(X\times Y)\setminus(E_\gamma\times F'_\gamma)$, then we see that
$$(E_\gamma\times F'_\gamma)\cap(E_\gamma\times F_\gamma)=\emptyset,\qquad E_\gamma\times F_\gamma\subseteq W_\gamma,$$
$$\lambda((E_\delta\times F_\delta)\setminus W_\gamma)=\lambda((E_\gamma\times F'_\gamma)\cap(E_\delta\times F_\delta))\le\mu(E_\gamma\cap E_\delta)=0$$
for every $\delta\ne\gamma$, so $W_\gamma$ is an essential upper bound for $\mathcal W$ and $V\cap(E_\gamma\times F'_\gamma)=V\setminus W_\gamma$ must be $\lambda$-negligible. Accordingly $V[\{x\}]\setminus F_\gamma=V[\{x\}]\cap F'_\gamma$ is $\nu$-negligible for $\mu$-almost every $x\in E_\gamma$. But this means that $V[\{x\}]\triangle F_\gamma$ is $\nu$-negligible for $\mu$-almost every $x\in E_\gamma$, that is, $\mu(E_\gamma\setminus H_\gamma)=0$. ◀
Now consider the family $\langle E_\gamma\cap H_\gamma\rangle_{\gamma\in C}$. This is a disjoint family of sets of finite measure in $X$. If $E\in\Sigma$ has non-zero measure, there is a $\gamma\in C$ such that $\mu(E_\gamma\cap H_\gamma\cap E)=\mu(E_\gamma\cap E)>0$. But this means that $\mathcal E=\{E_\gamma\cap H_\gamma:\gamma\in C\}$ satisfies the conditions of 213O, and $\mu$ must be strictly localizable; which it isn't.
(c) Thus we have found a family $\mathcal W\subseteq\Lambda$ with no essential supremum in $\Lambda$, and $\lambda$ is not localizable.
Remark If $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ are any localizable measure spaces with a non-localizable c.l.d. product measure, then their c.l.d. versions are still localizable (213Hb) and still have a non-localizable product (251S), which cannot be strictly localizable; so that one of the factors is also not strictly localizable (251N). Thus any example of the type here must involve a complete locally determined localizable space which is not strictly localizable, as in 216E.
254V Corresponding to 251T and 251Wo, we have the following result on countable powers of atomless probability spaces.
Proposition Let $(X,\Sigma,\mu)$ be an atomless probability space and $I$ a countable set. Let $\lambda$ be the product probability measure on $X^I$. Then $\{x:x\in X^I,\ x\text{ is injective}\}$ is $\lambda$-conegligible.
proof For any pair $\{i,j\}$ of distinct elements of $I$, the set $\{z:z\in X^{\{i,j\}},\ z(i)=z(j)\}$ is negligible for the product measure on $X^{\{i,j\}}$, by 251T. By 254Oa, $\{x:x\in X^I,\ x(i)=x(j)\}$ is $\lambda$-negligible.
Because $I$ is countable, there are only countably many such pairs $\{i,j\}$, so $\{x:x\in X^I,\ x(i)=x(j)\text{ for some distinct }i,j\in I\}$ is negligible, and its complement is conegligible; but this complement is just the set of injective functions from $I$ to $X$.
254X Basic exercises (a) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be any family of probability spaces, with product $(X,\Lambda,\lambda)$. Write $\mathcal E$ for the family of subsets of $X$ expressible as the union of a finite disjoint family of measurable cylinders. (i) Show that if $C\subseteq X$ is a measurable cylinder then $X\setminus C\in\mathcal E$. (ii) Show that $W\cap V\in\mathcal E$ for all $W,V\in\mathcal E$. (iii) Show that $X\setminus W\in\mathcal E$ for every $W\in\mathcal E$. (iv) Show that $\mathcal E$ is an algebra of subsets of $X$. (v) Show that for any $W\in\Lambda$, $\epsilon>0$ there is a $V\in\mathcal E$ such that $\lambda(W\triangle V)\le\epsilon^2$. (vi) Show that for any $W\in\Lambda$, $\epsilon>0$ there are disjoint measurable cylinders $C_0,\dots,C_n$ such that $\lambda(W\cap C_j)\ge(1-\epsilon)\lambda C_j$ for every $j$ and $\lambda(W\setminus\bigcup_{j\le n}C_j)\le\epsilon$. (Hint: select the $C_j$ from the measurable cylinders composing a set $V$ as in (v).) (vii) Show that if $f$, $g$ are $\lambda$-integrable functions and $\int_Cf\le\int_Cg$ for every measurable cylinder $C\subseteq X$, then $f\le_{\text{a.e.}}g$. (Hint: show that $\int_Wf\le\int_Wg$ for every $W\in\Lambda$.)
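Part (i) of 254Xa rests on a familiar decomposition. If $C=\prod_{i\in I}C_i$ and $\{i:C_i\ne X_i\}$ is enumerated as $i_1,\dots,i_k$, then $X\setminus C$ is the disjoint union, over $m\le k$, of the cylinders which agree with $C$ at $i_1,\dots,i_{m-1}$, take values in $X_{i_m}\setminus C_{i_m}$ at $i_m$, and are unrestricted elsewhere. The Python sketch below is only a brute-force check of this on a small finite product of finite sets; the function names and the toy sets are hypothetical, and the order in which the active coordinates are enumerated is an arbitrary choice.

```python
from itertools import product

def complement_pieces(factors, cyl):
    """Disjoint cylinders whose union is X minus the cylinder prod(cyl).

    factors: list of finite sets X_i (so X = product of the factors).
    cyl: list of subsets C_i of X_i.
    Piece m agrees with C at the earlier 'active' coordinates (where C_i != X_i),
    is X_i minus C_i at the m-th active coordinate, and is free elsewhere.
    """
    active = [i for i in range(len(factors)) if cyl[i] != factors[i]]
    pieces = []
    for m, i in enumerate(active):
        piece = list(factors)
        for k in active[:m]:
            piece[k] = cyl[k]
        piece[i] = factors[i] - cyl[i]
        pieces.append(piece)
    return pieces

def points(box):
    """All points of a finite product of finite sets, as tuples."""
    return set(product(*box))

# Toy example: X = {0,1} x {0,1,2} x {0,1}, C = {1} x {0,2} x {0,1}.
X = [{0, 1}, {0, 1, 2}, {0, 1}]
C = [{1}, {0, 2}, {0, 1}]
pieces = complement_pieces(X, C)
```

Since the pieces' sizes add up to the size of their union, they are pairwise disjoint; the same bookkeeping with product measures in place of counting gives $\lambda(X\setminus C)=1-\lambda C$ as a finite sum over cylinders.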
> (b) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, with product $(X,\Lambda,\lambda)$. Show that the outer measure $\lambda^*$ defined by $\lambda$ is exactly the outer measure $\theta$ described in 254A, that is, that $\theta$ is a regular outer measure.
(c) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, with product $(X,\Lambda,\lambda)$. Write $\lambda_0$ for the restriction of $\lambda$ to $\widehat\bigotimes_{i\in I}\Sigma_i$, and $\mathcal C$ for the family of measurable cylinders in $X$. Suppose that $(Y,\mathrm T,\nu)$ is a probability space and $\phi:Y\to X$ a function. (i) Show that $\phi$ is inverse-measure-preserving when regarded as a function from $(Y,\mathrm T,\nu)$ to $(X,\widehat\bigotimes_{i\in I}\Sigma_i,\lambda_0)$ iff $\phi^{-1}[C]$ belongs to $\mathrm T$ and $\nu\phi^{-1}[C]=\lambda_0C$ for every $C\in\mathcal C$. (ii) Show that $\lambda_0$ is the only measure on $X$ with this property. (Hint: 136C.)
> (d) Let $I$ be a set and $(Y,\mathrm T,\nu)$ a complete probability space. Show that a function $\phi:Y\to\{0,1\}^I$ is inverse-measure-preserving for $\nu$ and the usual measure on $\{0,1\}^I$ iff $\nu\{y:\phi(y)(i)=1\text{ for every }i\in J\}=2^{-\#(J)}$ for every finite $J\subseteq I$.
> (e) Let $I$ be any set and $\lambda$ the usual measure on $X=\{0,1\}^I$. Define addition on $X$ by setting $(x+y)(i)=x(i)+_2y(i)$ for every $i\in I$, $x,y\in X$, where $0+_20=1+_21=0$, $0+_21=1+_20=1$. (i) Show that for any $y\in X$, the map $x\mapsto x+y:X\to X$ is inverse-measure-preserving. (Hint: use 254G.) (ii) Show that the map $(x,y)\mapsto x+y:X\times X\to X$ is inverse-measure-preserving, if $X\times X$ is given its product measure.
> (f) Let $I$ be any set and $\lambda$ the usual measure on $\mathcal PI$. (i) Show that the map $a\mapsto a\triangle b:\mathcal PI\to\mathcal PI$ is inverse-measure-preserving for any $b\subseteq I$; in particular, $a\mapsto I\setminus a$ is inverse-measure-preserving. (ii) Show that the map $(a,b)\mapsto a\triangle b:\mathcal PI\times\mathcal PI\to\mathcal PI$ is inverse-measure-preserving.
> (g) Show that for any $q\in[0,1]$ and any set $I$ there is a measure $\lambda$ on $\mathcal PI$ such that $\lambda\{a:J\subseteq a\}=q^{\#(J)}$ for every finite $J\subseteq I$.
> (h) Let $(Y,\mathrm T,\nu)$ be a complete probability space, and write $\mu$ for Lebesgue measure on $[0,1]$.
Suppose that $\phi:Y\to[0,1]$ is a function such that $\nu\phi^{-1}[I]$ exists and is equal to $\mu I$ for every interval $I$ of the form $[2^{-n}k,2^{-n}(k+1)]$, where $n\in\mathbb N$ and $0\le k<2^n$. Show that $\phi$ is inverse-measure-preserving for $\nu$ and $\mu$.
(i) Let $\langle X_i\rangle_{i\in I}$ be a family of sets, and for each $i\in I$ let $\Sigma_i$ be a σ-algebra of subsets of $X_i$. Show that for every $E\in\widehat\bigotimes_{i\in I}\Sigma_i$ there is a countable set $J\subseteq I$ such that $E$ is expressible as $\pi_J^{-1}[F]$ for some $F\in\widehat\bigotimes_{j\in J}\Sigma_j$, writing $\pi_J(x)=x\restriction J\in\prod_{i\in J}X_i$ for $x\in\prod_{i\in I}X_i$.
(j) (i) Let $\nu$ be the usual measure on $X=\{0,1\}^{\mathbb N}$. Show that for any $k\ge1$, $(X,\nu)$ is isomorphic to $(X^k,\nu_k)$, where $\nu_k$ is the measure on $X^k$ which is the product measure obtained by giving each factor $X$ the measure $\nu$. (ii) Writing $\mu_{[0,1]}$ for Lebesgue measure on $[0,1]$, etc., show that for any $k\ge1$, $([0,1]^k,\mu_{[0,1]^k})$ is isomorphic to $([0,1],\mu_{[0,1]})$.
(k) (i) Writing $\mu_{[0,1]}$ for Lebesgue measure on $[0,1]$, etc., show that $([0,1],\mu_{[0,1]})$ is isomorphic to $([0,1[,\mu_{[0,1[})$. (ii) Show that for any $k\ge1$, $([0,1[^k,\mu_{[0,1[^k})$ is isomorphic to $([0,1[,\mu_{[0,1[})$. (iii) Show that for any $k\ge1$, $(\mathbb R,\mu_{\mathbb R})$ is isomorphic to $(\mathbb R^k,\mu_{\mathbb R^k})$.
(l) Let $\mu$ be Lebesgue measure on $[0,1]$ and $\lambda$ the product measure on $[0,1]^{\mathbb N}$. Show that $([0,1],\mu)$ and $([0,1]^{\mathbb N},\lambda)$ are isomorphic.
(m) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of complete probability spaces and $\lambda$ the product measure on $\prod_{i\in I}X_i$, with domain $\Lambda$. Suppose that $A_i\subseteq X_i$ for each $i\in I$. Show that $\prod_{i\in I}A_i\in\Lambda$ iff either (i) $\prod_{i\in I}\mu_i^*A_i=0$ or (ii) $A_i\in\Sigma_i$ for every $i$ and $\{i:A_i\ne X_i\}$ is countable. (Hint: assemble ideas from 252Xb, 254F, 254L and 254N.)
(n) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces with product $(X,\Lambda,\lambda)$. (i) Show that, for any $A\subseteq X$,
$$\lambda^*A=\min\{\lambda_J^*\pi_J[A]:J\subseteq I\text{ is countable}\},$$
where for $J\subseteq I$ I write $\lambda_J$ for the product probability measure on $X_J=\prod_{i\in J}X_i$ and $\pi_J:X\to X_J$ for the canonical map. (ii) Show that if $J,K\subseteq I$ are disjoint and $A,B\subseteq X$ are determined by coordinates in $J$, $K$ respectively, then $\lambda^*(A\cap B)=\lambda^*A\cdot\lambda^*B$.
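When $I$ is finite, the measure asked for in 254Xg can be written down directly: give each $a\subseteq I$ the weight $q^{\#(a)}(1-q)^{\#(I\setminus a)}$, that is, put each $i$ into $a$ independently with probability $q$. The Python sketch below assumes a finite $I=\{0,\dots,n-1\}$ (the exercise is about arbitrary $I$, where the product construction of 254A is needed) and checks $\lambda\{a:J\subseteq a\}=q^{\#(J)}$ with exact rational arithmetic; the helper names are hypothetical. With $q=\frac12$ this is the usual measure on $\mathcal PI$.

```python
from fractions import Fraction
from itertools import combinations

def subsets(n):
    """All subsets of {0, ..., n-1}, as frozensets."""
    for r in range(n + 1):
        for c in combinations(range(n), r):
            yield frozenset(c)

def point_mass(a, n, q):
    """Product weight of the single set a: each i lies in a with probability q."""
    return q ** len(a) * (1 - q) ** (n - len(a))

def measure_of_supersets(J, n, q):
    """lambda{a : J is a subset of a}, summed exactly over all a in P{0, ..., n-1}."""
    return sum(point_mass(a, n, q) for a in subsets(n) if J <= a)

n, q = 5, Fraction(1, 3)
table = {J: measure_of_supersets(J, n, q) for J in subsets(n)}
```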
(o) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces with product $(X,\Lambda,\lambda)$. Let $S$ be the linear span of the set of characteristic functions of measurable cylinders in $X$, as in 254Q. Show that $\{f^\bullet:f\in S\}$ is dense in $L^p(\lambda)$ for every $p\in[1,\infty[$.
(p) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, and $(X,\Lambda,\lambda)$ their product; for $J\subseteq I$ let $\Lambda_J$ be the σ-algebra of members of $\Lambda$ determined by coordinates in $J$ and $P_J:L^1=L^1(\lambda)\to L^1_J=L^1(\lambda\restriction\Lambda_J)$ the corresponding conditional expectation. (i) Show that if $u\in L^1_J$ and $v\in L^1_{I\setminus J}$ then $u\times v\in L^1$ and $\int u\times v=\int u\cdot\int v$. (Hint: 253D.) (ii) Show that if $u\in L^1$ then $u\in L^1_J$ iff $\int_Cu=\lambda C\cdot\int u$ for every measurable cylinder $C\subseteq X$ which is determined by coordinates in $I\setminus J$. (Hint: 254Xa(vii).) (iii) Show that if $\mathcal J\subseteq\mathcal PI$ is non-empty, with $J^*=\bigcap\mathcal J$, then $L^1_{J^*}=\bigcap_{J\in\mathcal J}L^1_J$.
(q) (i) Let $I$ be any set and $\lambda$ the usual measure on $\mathcal PI$. Let $A\subseteq\mathcal PI$ be such that $a\triangle b\in A$ whenever $a\in A$ and $b\subseteq I$ is finite. Show that $\lambda^*A$ must be either 0 or 1. (ii) Let $\lambda$ be the usual measure on $\{0,1\}^{\mathbb N}$, and $\Lambda$ its domain. Let $f:\{0,1\}^{\mathbb N}\to\mathbb R$ be a function such that, for $x,y\in\{0,1\}^{\mathbb N}$, $f(x)=f(y)\iff\{n:n\in\mathbb N,\ x(n)\ne y(n)\}$ is finite. Show that $f$ is not $\Lambda$-measurable. (Hint: for any $q\in\mathbb Q$, $\lambda^*\{x:f(x)\le q\}$ is either 0 or 1.)
(r) Let $\langle X_i\rangle_{i\in I}$ be any family of sets and $A\subseteq B\subseteq\prod_{i\in I}X_i$. Suppose that $A$ is determined by coordinates in $J\subseteq I$ and that $B$ is determined by coordinates in $K$. Show that there is a set $C$ such that $A\subseteq C\subseteq B$ and $C$ is determined by coordinates in $J\cap K$.
254Y Further exercises (a) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, and for $J\subseteq I$ let $\lambda_J$ be the product measure on $X_J=\prod_{i\in J}X_i$; write $X=X_I$, $\lambda=\lambda_I$ and $\pi_J(x)=x\restriction J$ for $x\in X$ and $J\subseteq I$. (i) Show that for $K\subseteq J\subseteq I$ we have a natural linear, order-preserving and norm-preserving map $T_{JK}:L^1(\lambda_K)\to L^1(\lambda_J)$ defined by writing $T_{JK}(f^\bullet)=(f\pi_{KJ})^\bullet$ for every $\lambda_K$-integrable function $f$, where $\pi_{KJ}(y)=y\restriction K$ for $y\in X_J$.
(ii) Write $\mathcal K$ for the set of finite subsets of $I$. Show that if $W$ is any Banach space and $\langle T_K\rangle_{K\in\mathcal K}$ is a family such that (α) $T_K$ is a bounded linear operator from $L^1(\lambda_K)$ to $W$ for every $K\in\mathcal K$, (β) $T_K=T_JT_{JK}$ whenever $K\subseteq J\in\mathcal K$, (γ) $\sup_{K\in\mathcal K}\|T_K\|<\infty$, then there is a unique bounded linear operator $T:L^1(\lambda)\to W$ such that $T_K=TT_{IK}$ for every $K\in\mathcal K$. (iii) Write $\mathcal J$ for the set of countable subsets of $I$. Show that $L^1(\lambda)=\bigcup_{J\in\mathcal J}T_{IJ}[L^1(\lambda_J)]$.
(b) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, and $\lambda$ a complete measure on $X=\prod_{i\in I}X_i$. Suppose that for every complete probability space $(Y,\mathrm T,\nu)$ and function $\phi:Y\to X$, $\phi$ is inverse-measure-preserving for $\nu$ and $\lambda$ iff $\nu\phi^{-1}[C]$ is defined and equal to $\theta_0C$ for every measurable cylinder $C\subseteq X$, writing $\theta_0$ for the functional of 254A. Show that $\lambda$ is the product measure on $X$.
(c) Let $I$ be a set, and $\lambda$ the usual measure on $\{0,1\}^I$. Show that $L^1(\lambda)$ is separable, in its norm topology, iff $I$ is countable.
(d) Let $I$ be a set, and $\lambda$ the usual measure on $\mathcal PI$. Show that if $\mathcal F$ is a non-principal ultrafilter on $I$ then $\lambda^*\mathcal F=1$. (Hint: 254Xq, 254Xf.)
(e) Let $(X,\Sigma,\mu)$, $(Y,\mathrm T,\nu)$ and $\lambda$ be as in 254U. Set $A=\{f_\gamma:\gamma\in C\}$ as defined in 216E. Let $\mu_A$ be the subspace measure on $A$, and $\tilde\lambda$ the c.l.d. product measure of $\mu_A$ and $\nu$ on $A\times Y$. Show that $\tilde\lambda$ is a proper extension of the subspace measure $\lambda_{A\times Y}$. (Hint: consider $\tilde W=\{(f_\gamma,y):\gamma\in C,\ y\in F_\gamma\}$, in the notation of 254U.)
(f) Let $(X,\Sigma,\mu)$ be an atomless probability space, $I$ a set with cardinal at most $\#(X)$, and $A$ the set of injective functions from $I$ to $X$. Show that $A$ has full outer measure for the product measure on $X^I$.
254 Notes and comments While there are many reasons for studying infinite products of probability spaces, one stands pre-eminent, from the point of view of abstract measure theory: they provide constructions of essentially new kinds of measure space. I cannot describe the nature of this ‘newness’ effectively without
venturing into the territory of Volume 3. But the function spaces of Chapter 24 do give at least a form of words we can use: these are the first probability spaces $(X,\Lambda,\lambda)$ we have seen for which $L^1(\lambda)$ need not be separable for its norm topology (254Yc).
The formulae of 254A, like those of 251A, lead very naturally to measures; the point at which they become more than a curiosity is when we find that the product measure $\lambda$ is a probability measure (254Fa), which must be regarded as the crucial argument of this section, just as 251E is the essential basis of §251. It is I think remarkable that it makes no difference to the result here whether $I$ is finite, countably infinite or uncountable. If you write out the proof for the case $I=\mathbb N$, it will seem natural to expand the sets $J_n$ until they are initial segments of $I$ itself, thereby avoiding altogether the auxiliary set $K$; but this is a misleading simplification, because it hides an essential feature of the argument, which is that any sequence in $\mathcal C$ involves only countably many coordinates, so that as long as we are dealing with only one such sequence the uncountability of the whole set $I$ is irrelevant. This general principle naturally permeates the whole of the section; in 254O I have tried to spell out the way in which many of the questions we are interested in can be expressed in terms of countable subproducts of the factor spaces $X_i$. See also the exercises 254Xi, 254Xm and 254Ya(iii).
There is a slightly paradoxical side to this principle: even the best-behaved subsets $E_i$ of $X_i$ may fail to have measurable products $\prod_{i\in I}E_i$ if $E_i\ne X_i$ for uncountably many $i$. For instance, $]0,1[^I$ is not a measurable subset of $[0,1]^I$ if $I$ is uncountable (254Xm). It has full outer measure and its own product measure is just the subspace measure (254L), but any measurable subset must have measure zero. The point is that the empty set is the only member of $\widehat\bigotimes_{i\in I}\Sigma_i$, where $\Sigma_i$ is the algebra of Lebesgue measurable subsets of $[0,1]$ for each $i$, which is included in $]0,1[^I$ (see 254Xi).
As in §251, I use a construction which automatically produces a complete measure on the product space. I am sure that this is the best choice for ‘the’ product measure. But there are occasions when its restriction to the σ-algebra generated by the measurable cylinders is worth looking at; see 254Xc.
Lemma 254G is a result of a type which will be commoner in Volume 3 than in the present volume. It describes the product measure in terms not of what it is but of what it does; specifically, in terms of a property of the associated family of inverse-measure-preserving functions. It is therefore a ‘universal mapping theorem’. (Compare 253F.) Because this description is sufficient to determine the product measure completely (254Yb), it is not surprising that I use it repeatedly.
The ‘usual measure’ on $\{0,1\}^I$ (254J) is sometimes called ‘coin-tossing measure’ because it can be used to model the concept of tossing a coin arbitrarily many times indexed by the set $I$, taking an $x\in\{0,1\}^I$ to represent the outcome in which the coin is ‘heads’ for just those $i\in I$ for which $x(i)=1$. The sets, or ‘events’, in the class $\mathcal C$ are just those which can be specified by declaring the outcomes of finitely many tosses, and the probability of any particular sequence of $n$ results is $1/2^n$, regardless of which tosses we look at or in which order. In Chapter 27 I will return to the use of product measures to represent probabilities involving independent events.
In 254K I come to the first case in this treatise of a non-trivial isomorphism between two measure spaces. If you have been brought up on a conventional diet of modern abstract pure mathematics based on algebra and topology, you may already have been struck by the absence of emphasis on any concept of ‘homomorphism’ or ‘isomorphism’.
Here indeed I start to speak of ‘isomorphisms’ between measure spaces without even troubling to define them; I hope it really is obvious that an isomorphism between measure spaces $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ is a bijection $\phi:X\to Y$ such that $\mathrm T=\{F:F\subseteq Y,\ \phi^{-1}[F]\in\Sigma\}$ and $\nu F=\mu\phi^{-1}[F]$ for every $F\in\mathrm T$, so that $\Sigma$ is necessarily $\{E:E\subseteq X,\ \phi[E]\in\mathrm T\}$ and $\mu E=\nu\phi[E]$ for every $E\in\Sigma$. Put like this, you may, if you worked through the exercises of Volume 1, be reminded of some constructions of σ-algebras in 111Xc-111Xd and of the ‘image measures’ in 112E. The result in 254K (see also 134Yo) naturally leads to two distinct notions of ‘homomorphism’ between two measure spaces $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$: (i) a function $\phi:X\to Y$ such that $\phi^{-1}[F]\in\Sigma$ and $\mu\phi^{-1}[F]=\nu F$ for every $F\in\mathrm T$, (ii) a function $\phi:X\to Y$ such that $\phi[E]\in\mathrm T$ and $\nu\phi[E]=\mu E$ for every $E\in\Sigma$. On either definition, we find that a bijection $\phi:X\to Y$ is an isomorphism iff $\phi$ and $\phi^{-1}$ are both homomorphisms. (Also, of course, the composition of homomorphisms will be a homomorphism.) My own view is that (i) is the more important, and in this treatise I study such functions at length, calling them ‘inverse-measure-preserving’. But both have their uses. The function $\phi$ of 254K not only satisfies both definitions, but is also ‘nearly’ an isomorphism in several different ways, of which possibly the most
important is that there are conegligible sets $X'\subseteq\{0,1\}^{\mathbb N}$, $Y'\subseteq[0,1]$ such that $\phi\restriction X'$ is an isomorphism between $X'$ and $Y'$ when both are given their subspace measures. Having once established the isomorphism between $[0,1]$ and $\{0,1\}^{\mathbb N}$, we are led immediately to many more; see 254Xj-254Xl. In fact Lebesgue measure on $[0,1]$ is isomorphic to a large proportion of the probability spaces arising in applications. In Volumes 3 and 4 I will discuss these isomorphisms at length.
The general notion of ‘subproduct’ is associated with some of the deepest and most characteristic results in the theory of product measures. Because we are looking at products of arbitrary families of probability spaces, the definition must ignore any possible structure in the index set $I$ of 254A-254C. But many applications, naturally enough, deal with index sets with favoured subsets or partitions, and the first essential step is the ‘associative law’ (254N; compare 251Xd-251Xe). This is, for instance, the tool by which we can apply Fubini’s theorem within infinite products. The natural projection maps from $\prod_{i\in I}X_i$ to $\prod_{i\in J}X_i$, where $J\subseteq I$, are related in a way which has already been used as the basis of theorems in §235; the product measure on $\prod_{i\in J}X_i$ is precisely the image of the product measure on $\prod_{i\in I}X_i$ (254Oa). In 254O-254Q I explore the consequences of this fact and the fact already noted that all measurable sets in the product are ‘essentially’ determined by coordinates in some countable set.
In 254R I go more deeply into this notion of a set $W\subseteq\prod_{i\in I}X_i$ ‘determined by coordinates in’ a set $J\subseteq I$. In its primitive form this is a purely set-theoretic notion (254M, 254Ta). I think that even a three-element set $I$ can give us surprises; I invite you to try to visualize subsets of $[0,1]^3$ which are determined by pairs of coordinates. But the interactions of this with measure-theoretic ideas, and in particular with a willingness to add or discard negligible sets, lead to much more, and in particular to the unique minimal sets of coordinates associated with measurable sets and functions (254R). Of course these results can be elegantly and effectively described in terms of $L^1$ and $L^0$ spaces, in which negligible sets are swept out of sight as the spaces are constructed. The basis of all this is the fact that the conditional expectation operators associated with subproducts multiply together in the simplest possible way (254Ra); but some further idea is needed to show that if $\mathcal J$ is a non-empty family of subsets of $I$, then $L^0_{\bigcap\mathcal J}=\bigcap_{J\in\mathcal J}L^0_J$ (see part (b) of the proof of 254R, and 254Xp(iii)). 254Sa is a version of the ‘zero-one law’ (272O below). 254Sb is a strong version of the principle that measurable sets in a product must be approximable by sets determined by a finite set of coordinates (254Fe, 254Qa, 254Xa). Evidently it is not a coincidence that the set $W$ of 254Tb is negligible. In §272 I will revisit many of the ideas of 254R-254S and 254Xp, in particular, in the more general context of ‘independent σ-algebras’.
Finally, 254U and 254Ye hardly belong to this section at all; they are unfinished business from §251. They are here because the construction of 254A-254C is the simplest way to produce an adequately complex probability space $(Y,\mathrm T,\nu)$.
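For a finite product of finite probability spaces, conditioning on $\Lambda_J$ is just integrating out the coordinates outside $J$, so the identity $P_KP_J=P_{K\cap J}$ of 254Ra can be checked by exact arithmetic. The three factor spaces and the function $f$ in this Python sketch are arbitrary illustrative choices, not taken from the text; in the general case the approximation argument of 254R(a)(iii) is still needed.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical finite probability spaces X_0, X_1, X_2 (point masses as exact fractions).
MU = [
    {0: Fraction(1, 2), 1: Fraction(1, 2)},
    {0: Fraction(1, 3), 1: Fraction(2, 3)},
    {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)},
]
I = range(len(MU))
POINTS = list(product(*[sorted(m) for m in MU]))

def cond_exp(f, J):
    """P_J f: integrate out the coordinates outside J against their product measure."""
    result = {}
    for x in POINTS:
        total = Fraction(0)
        for y in POINTS:
            if all(y[i] == x[i] for i in J):
                weight = Fraction(1)
                for i in I:
                    if i not in J:
                        weight *= MU[i][y[i]]
                total += weight * f[y]
        result[x] = total
    return result

SUBSETS = [frozenset(c) for r in range(len(MU) + 1) for c in combinations(I, r)]

# An arbitrary test function on the 12-point product space.
f = {x: Fraction((x[0] + 1) * (x[1] + 2) + x[0] * x[2] ** 2) for x in POINTS}

identity_holds = all(
    cond_exp(cond_exp(f, J), K) == cond_exp(f, J & K)
    for J in SUBSETS
    for K in SUBSETS
)
```

The two extreme cases behave as expected: $P_I$ is the identity, and $P_\emptyset$ replaces $f$ by the constant $\int f\,d\lambda$.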
255 Convolutions of functions
I devote a section to a construction which is of great importance – and will in particular be very useful in Chapters 27 and 28 – and may also be regarded as a series of exercises on the work so far. I find it difficult to know how much repetition to indulge in in this section, because the natural unified expression of the ideas is in the theory of topological groups, and I do not think we are yet ready for the general theory (I will come to it in Chapter 44 in Volume 4). The groups we need for this volume are $\mathbb R$; $\mathbb R^r$, for $r\ge2$; $S^1=\{z:z\in\mathbb C,\ |z|=1\}$, the ‘circle group’; $\mathbb Z$, the group of integers. All the ideas already appear in the theory of convolutions on $\mathbb R$, and I will therefore present this material in relatively detailed form, before sketching the forms appropriate to the groups $\mathbb R^r$ and $S^1$ (or $]-\pi,\pi]$); $\mathbb Z$ can I think be safely left to the exercises.
255A This being a book on measure theory, it is perhaps appropriate for me to emphasize, as the basis of the theory of convolutions, certain measure space isomorphisms.
Theorem Let $\mu$ be Lebesgue measure on $\mathbb R$ and $\mu_2$ Lebesgue measure on $\mathbb R^2$; write $\Sigma$, $\Sigma_2$ for their domains.
(a) For any $a\in\mathbb R$, the map $x\mapsto a+x:\mathbb R\to\mathbb R$ is a measure space automorphism of $(\mathbb R,\Sigma,\mu)$.
(b) The map $x\mapsto-x:\mathbb R\to\mathbb R$ is a measure space automorphism of $(\mathbb R,\Sigma,\mu)$.
(c) For any $a\in\mathbb R$, the map $x\mapsto a-x:\mathbb R\to\mathbb R$ is a measure space automorphism of $(\mathbb R,\Sigma,\mu)$.
(d) The map $(x,y)\mapsto(x+y,y):\mathbb R^2\to\mathbb R^2$ is a measure space automorphism of $(\mathbb R^2,\Sigma_2,\mu_2)$.
(e) The map $(x,y)\mapsto(x-y,y):\mathbb R^2\to\mathbb R^2$ is a measure space automorphism of $(\mathbb R^2,\Sigma_2,\mu_2)$.
Remark I ought to remark that (b), (d) and (e) may be regarded as simple special cases of Theorem 263A in the next chapter. I nevertheless feel that it is worth writing out separate proofs here, partly because the general case of linear operators dealt with in 263A requires some extra machinery not needed here, but more because the result here has nothing to do with the linear structure of $\mathbb R$ and $\mathbb R^2$; it is exclusively dependent on the group structure of $\mathbb R$, together with the links between its topology and measure, and the arguments I give now are adaptable to the proper generalizations to abelian topological groups.
proof (a) This is just the translation-invariance of Lebesgue measure, dealt with in §134. There I showed that if $E\in\Sigma$ then $E+a\in\Sigma$ and $\mu(E+a)=\mu E$ (134Ab); that is, writing $\phi(x)=x+a$, $\mu(\phi[E])$ exists and is equal to $\mu E$ for every $E\in\Sigma$. But of course we also have $\mu(\phi^{-1}[E])=\mu(E+(-a))=\mu E$ for every $E\in\Sigma$, so $\phi$ is an automorphism.
(b) The point is that $\mu^*(A)=\mu^*(-A)$ for every $A\subseteq\mathbb R$. ▶ (I follow the definitions of Volume 1.) If $\epsilon>0$, there is a sequence $\langle I_n\rangle_{n\in\mathbb N}$ of half-open intervals covering $A$ with $\sum_{n=0}^\infty\mu I_n\le\mu^*A+\epsilon$. Now $-A\subseteq\bigcup_{n\in\mathbb N}(-I_n)$. But if $I_n=[a_n,b_n[$ then $-I_n=\,]-b_n,-a_n]$, so
$$\mu^*(-A)\le\sum_{n=0}^\infty\mu(-I_n)=\sum_{n=0}^\infty\max(0,-a_n-(-b_n))=\sum_{n=0}^\infty\mu I_n\le\mu^*A+\epsilon.$$
As $\epsilon$ is arbitrary, $\mu^*(-A)\le\mu^*A$. Applying this to $-A$ in place of $A$, $\mu^*A=\mu^*(-(-A))\le\mu^*(-A)$, so $\mu^*(-A)=\mu^*A$.
◀ This means that, setting $\phi(x)=-x$ this time, $\phi$ is an automorphism of the structure $(\mathbb R,\mu^*)$. But since $\mu$ is defined from $\mu^*$ by the abstract procedure of Carathéodory’s method, $\phi$ must also be an automorphism of the structure $(\mathbb R,\Sigma,\mu)$.
(c) Put (a) and (b) together; $x\mapsto a-x$ is the composition of the automorphisms $x\mapsto-x$ and $x\mapsto a+x$, and the composition of automorphisms is surely an automorphism.
(d)(i) Write $\mathrm T$ for the set $\{E:E\in\Sigma_2,\ \phi[E]\in\Sigma_2\}$, where this time $\phi(x,y)=(x+y,y)$ for $x,y\in\mathbb R$, so that $\phi:\mathbb R^2\to\mathbb R^2$ is a bijection. Then $\mathrm T$ is a σ-algebra, being the intersection of the σ-algebras $\Sigma_2$ and $\{E:\phi[E]\in\Sigma_2\}=\{\phi^{-1}[F]:F\in\Sigma_2\}$. Moreover, $\mu_2E=\mu_2(\phi[E])$ for every $E\in\mathrm T$. ▶ By 252D, we have $\mu_2E=\int\mu\{x:(x,y)\in E\}\,\mu(dy)$. But applying the same result to $\phi[E]$ we have
$$\begin{aligned}
\mu_2\phi[E]&=\int\mu\{x:(x,y)\in\phi[E]\}\,\mu(dy)=\int\mu\{x:(x-y,y)\in E\}\,\mu(dy)\\
&=\int\mu(E^{-1}[\{y\}]+y)\,\mu(dy)=\int\mu E^{-1}[\{y\}]\,\mu(dy)\\
&\qquad\text{(because Lebesgue measure is translation-invariant)}\\
&=\mu_2E.
\end{aligned}$$
◀
(ii) Now $\phi$ and $\phi^{-1}$ are clearly continuous, so that $\phi[G]$ is open, and therefore measurable, for every open $G$; consequently all open sets must belong to $\mathrm T$. Because $\mathrm T$ is a σ-algebra, it contains all Borel sets. Now let $E$ be any measurable set. Then there are Borel sets $H_1$, $H_2$ such that $H_1\subseteq E\subseteq H_2$ and $\mu_2(H_2\setminus H_1)=0$ (134Fb). We have $\phi[H_1]\subseteq\phi[E]\subseteq\phi[H_2]$ and $\mu_2(\phi[H_2]\setminus\phi[H_1])=\mu_2\phi[H_2\setminus H_1]=\mu_2(H_2\setminus H_1)=0$. Thus $\phi[E]\setminus\phi[H_1]$ must be negligible, therefore measurable, and $\phi[E]=\phi[H_1]\cup(\phi[E]\setminus\phi[H_1])$ is measurable.
This shows that $\phi[E]$ is measurable whenever $E$ is. But now observe that $\mathrm T$ can also be expressed as $\{E:E\in\Sigma_2,\ \phi^{-1}[E]\in\Sigma_2\}$, so that we can apply the same argument with $\phi^{-1}$ in the place of $\phi$ to see that $\phi^{-1}[E]$ is measurable whenever $E$ is. So $\phi$ is an automorphism of the structure $(\mathbb R^2,\Sigma_2)$, and therefore (by (i) again) of $(\mathbb R^2,\Sigma_2,\mu_2)$.
(e) Of course this is an immediate corollary either of the proof of (d) or of (d) itself as stated, since $(x,y)\mapsto(x-y,y)$ is just the inverse of $(x,y)\mapsto(x+y,y)$.
255B Corollary (a) If $a\in\mathbb R$, then for any complex-valued function $f$ defined on a subset of $\mathbb R$
$$\int f(x)\,dx=\int f(a+x)\,dx=\int f(-x)\,dx=\int f(a-x)\,dx$$
in the sense that if one of the integrals exists so do the others, and they are then all equal.
(b) If $f$ is a complex-valued function defined on a subset of $\mathbb R^2$, then
$$\int f(x+y,y)\,d(x,y)=\int f(x-y,y)\,d(x,y)=\int f(x,y)\,d(x,y)$$
in the sense that if one of the integrals exists and is finite so do the others, and they are then equal.
255C Remarks (a) I am not sure whether it ought to be ‘obvious’ that if $(X,\Sigma,\mu)$, $(Y,\mathrm T,\nu)$ are measure spaces and $\phi:X\to Y$ is an isomorphism, then for any function $f$ defined on a subset of $Y$
$$\int f(\phi(x))\,\mu(dx)=\int f(y)\,\nu(dy)$$
in the sense that if one is defined so is the other, and they are then equal. If it is obvious then the obviousness must be contingent on the nature of the definition of integration: integrability with respect to the measure $\mu$ is something which depends on the structure $(X,\Sigma,\mu)$ and on no other properties of $X$. If it is not obvious then it is an easy deduction from Theorem 235A above, applied in turn to $\phi$ and $\phi^{-1}$ and to the real and imaginary parts of $f$. In any case the isomorphisms of 255A are just those needed to prove 255B.
(b) Note that in 255Bb I write $\int f(x,y)\,d(x,y)$ to emphasize that I am considering the integral of $f$ with respect to two-dimensional Lebesgue measure.
The fact that
$$\int\Bigl(\int f(x, y)\,dx\Bigr)dy = \int\Bigl(\int f(x + y, y)\,dx\Bigr)dy = \int\Bigl(\int f(x - y, y)\,dx\Bigr)dy$$
is actually easier, being an immediate consequence of the equality $\int f(a + x)\,dx = \int f(x)\,dx$. But applications of this result often depend essentially on the fact that the functions $(x, y) \mapsto f(x + y, y)$, $(x, y) \mapsto f(x - y, y)$ are measurable as functions of two variables.
(c) I have moved directly to complex-valued functions because these are necessary for the applications in Chapter 28. If, however, they give you any discomfort, either technically or aesthetically, all the measure-theoretic ideas of this section are already to be found in the real case, and you may wish at first to read it as if only real numbers were involved.

255D
A further corollary of 255A will be useful.
Corollary Let $f$ be a complex-valued function defined on a subset of $\mathbb{R}$.
(a) If $f$ is measurable, then the functions $(x, y) \mapsto f(x + y)$, $(x, y) \mapsto f(x - y)$ are measurable.
(b) If $f$ is defined almost everywhere on $\mathbb{R}$, then the functions $(x, y) \mapsto f(x + y)$, $(x, y) \mapsto f(x - y)$ are defined almost everywhere on $\mathbb{R}^2$.
proof Writing $g_1(x, y) = f(x + y)$, $g_2(x, y) = f(x - y)$ whenever these are defined, we have
$$g_1(x, y) = (f \otimes 1)(\varphi(x, y)), \qquad g_2(x, y) = (f \otimes 1)(\varphi^{-1}(x, y)),$$
writing $\varphi(x, y) = (x + y, y)$ as in 255A(d-e), and $(f \otimes 1)(x, y) = f(x)$, following the notation of 253B. By 253C, $f \otimes 1$ is measurable if $f$ is, and defined almost everywhere if $f$ is. Because $\varphi$ is a measure space automorphism, $(f \otimes 1)\varphi = g_1$ and $(f \otimes 1)\varphi^{-1} = g_2$ are measurable, or defined almost everywhere, if $f$ is.
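The measure-preserving half of 255A(d) can at least be seen by hand for polygons. The map $\varphi(x, y) = (x + y, y)$ is linear with determinant $1$, so it carries a polygon to a polygon of the same area. The Python sketch below checks this on two sample polygons using the shoelace formula, a standard area formula that the text does not use. It says nothing about the harder half of the argument, that $\varphi[E]$ is measurable for every measurable $E$, which is where the topology of $\mathbb{R}^2$ is needed.

```python
from fractions import Fraction


def shoelace_area(vertices):
    """Area of a simple polygon whose vertices are listed in order (shoelace formula)."""
    n = len(vertices)
    twice_signed_area = 0
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        twice_signed_area += x0 * y1 - x1 * y0
    return abs(twice_signed_area) / 2


def shear(point):
    """phi(x, y) = (x + y, y), the map of 255A(d)."""
    x, y = point
    return (x + y, y)


def shear_inverse(point):
    """phi^{-1}(x, y) = (x - y, y), the map of 255A(e)."""
    x, y = point
    return (x - y, y)


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
triangle = [(Fraction(1, 3), Fraction(-2)), (Fraction(5), Fraction(1, 2)), (Fraction(-1), Fraction(4))]

for polygon in (square, triangle):
    image = [shear(p) for p in polygon]
    # the two areas agree, because phi is linear with determinant 1
    print(shoelace_area(polygon), shoelace_area(image))
```

Exact `Fraction` coordinates are used so that the comparison involves no rounding.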
266
Product measures
255E
255E The basic formula Let $f$ and $g$ be measurable complex-valued functions defined almost everywhere in $\mathbb{R}$. Write $f * g$ for the function defined by the formula
$$(f * g)(x) = \int f(x - y)g(y)\,dy$$
whenever the integral exists (with respect to Lebesgue measure, naturally) as a complex number. Then $f * g$ is the convolution of the functions $f$ and $g$. Observe that $\mathrm{dom}(|f| * |g|) = \mathrm{dom}(f * g)$, and that $|f * g| \le |f| * |g|$ everywhere on their common domain, for all $f$ and $g$.
Remark Note that I am here prepared to contemplate the convolution of $f$ and $g$ for arbitrary members of $\mathcal{L}^0_{\mathbb{C}}$, the space of almost-everywhere-defined measurable complex-valued functions, even though the domain of $f * g$ may be empty.

255F Basic properties (a) Because integration is linear, we surely have
$$((f_1 + f_2) * g)(x) = (f_1 * g)(x) + (f_2 * g)(x), \quad (f * (g_1 + g_2))(x) = (f * g_1)(x) + (f * g_2)(x), \quad (cf * g)(x) = (f * cg)(x) = c(f * g)(x)$$
whenever the right-hand sides of the formulae are defined.
(b) If $f$, $g$ are measurable complex-valued functions defined almost everywhere in $\mathbb{R}$, then $f * g = g * f$, in the strict sense that they have the same domain and the same value at each point of that common domain. Take $x \in \mathbb{R}$ and apply 255Ba to see that
$$(f * g)(x) = \int f(x - y)g(y)\,dy = \int f(x - (x - y))g(x - y)\,dy = \int f(y)g(x - y)\,dy = (g * f)(x)$$
if either is defined. ∎
(c) If $f_1$, $f_2$, $g_1$, $g_2$ are measurable complex-valued functions defined almost everywhere in $\mathbb{R}$, and $f_1 =_{\mathrm{a.e.}} f_2$ and $g_1 =_{\mathrm{a.e.}} g_2$, then for every $x \in \mathbb{R}$ we shall have $f_1(x - y) = f_2(x - y)$ for almost every $y \in \mathbb{R}$, by 255Ac. Consequently $f_1(x - y)g_1(y) = f_2(x - y)g_2(y)$ for almost every $y$, and $(f_1 * g_1)(x) = (f_2 * g_2)(x)$ in the sense that if one of these is defined so is the other, and they are then equal. Accordingly we may regard convolution as a binary operator on $L^0_{\mathbb{C}}$; if $u, v \in L^0_{\mathbb{C}}$, we can define $u * v$ as being equal to $f * g$ whenever $f^\bullet = u$ and $g^\bullet = v$.
We need to remember, of course, that for general $u, v \in L^0$ the domain of $u * v$ may be empty.

255G I have grouped 255Fa-255Fc together because they depend only on ideas up to and including 255Ac and 255Ba. Using the second halves of 255A and 255B we can go much deeper. I begin with what seems to be the fundamental result.
Theorem Let $f$, $g$ and $h$ be measurable complex-valued functions defined almost everywhere in $\mathbb{R}$.
(a) $\int h(x)(f * g)(x)\,dx = \int h(x + y)f(x)g(y)\,d(x, y)$ whenever the right-hand side exists in $\mathbb{C}$, provided that in the expression $h(x)(f * g)(x)$ we interpret the product as $0$ if $h(x) = 0$ and $(f * g)(x)$ is undefined.
(b) If, on the same interpretation of $|h(x)|(|f| * |g|)(x)$, the integral $\int |h(x)|(|f| * |g|)(x)\,dx$ is finite, then $\int h(x + y)f(x)g(y)\,d(x, y)$ exists in $\mathbb{C}$, so again we shall have
$$\int h(x)(f * g)(x)\,dx = \int h(x + y)f(x)g(y)\,d(x, y) = \iint h(x + y)f(x)g(y)\,dx\,dy = \iint h(x + y)f(x)g(y)\,dy\,dx.$$
proof Consider the functions
$$k_1(x, y) = h(x)f(x - y)g(y), \qquad k_2(x, y) = h(x + y)f(x)g(y)$$
wherever these are defined. 255D tells us that $k_1$ and $k_2$ are measurable and defined almost everywhere. Now, setting $\varphi(x, y) = (x + y, y)$, we have $k_2 = k_1\varphi$, so that $\int k_1(x, y)\,d(x, y) = \int k_2(x, y)\,d(x, y)$ if either exists, by 255Bb. If $\int h(x + y)f(x)g(y)\,d(x, y) = \int k_2$ exists, then by Fubini's theorem we have
$$\int k_2 = \int k_1(x, y)\,d(x, y) = \int\Bigl(\int h(x)f(x - y)g(y)\,dy\Bigr)dx,$$
so $\int h(x)f(x - y)g(y)\,dy$ exists for almost every $x$, that is, $(f * g)(x)$ exists for almost every $x$ such that $h(x) \ne 0$; on the interpretation I am using here, $h(x)(f * g)(x)$ exists almost everywhere, and
$$\int h(x)(f * g)(x)\,dx = \int\Bigl(\int h(x)f(x - y)g(y)\,dy\Bigr)dx = \int k_1 = \int k_2 = \int h(x + y)f(x)g(y)\,d(x, y).$$
If (on the same interpretation) $|h| \times (|f| * |g|)$ is integrable, then $|k_1(x, y)| = |h(x)||f(x - y)||g(y)|$ is measurable, and
$$\iint |h(x)||f(x - y)||g(y)|\,dy\,dx = \int |h(x)|(|f| * |g|)(x)\,dx$$
is finite, so by Tonelli's theorem (252G, 252H) $k_1$ and $k_2$ are integrable, and once again
$$\int h(x)(f * g)(x)\,dx = \int h(x + y)f(x)g(y)\,d(x, y) = \iint h(x + y)f(x)g(y)\,dx\,dy = \iint h(x + y)f(x)g(y)\,dy\,dx.$$
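The pattern of 255G can be seen in a setting where nothing can go wrong: replace $\mathbb{R}$ by $\mathbb{Z}$ with counting measure (the setting of 255Xe), so that integrals become sums. In the minimal Python sketch below, finitely supported functions on $\mathbb{Z}$ are stored as dictionaries. This is an illustrative choice, not part of the text. With finite supports every sum is finite, so the existence hypotheses of 255G hold automatically. Only the algebraic identity $\sum_n c(n)(a * b)(n) = \sum_{i,j} c(i + j)a(i)b(j)$ is checked, with $c$ in the role of $h$.

```python
from itertools import product


def convolve(a, b):
    """(a*b)(n) = sum_i a(n - i) b(i) for finitely supported a, b: Z -> C, stored as dicts."""
    out = {}
    for i, j in product(a, b):
        # the product a(i) b(j) contributes to (a*b)(i + j)
        out[i + j] = out.get(i + j, 0) + a[i] * b[j]
    return out


def pair_with_convolution(c, a, b):
    """sum_n c(n) (a*b)(n): discrete analogue of the integral of h(x)(f*g)(x)."""
    return sum(c.get(n, 0) * value for n, value in convolve(a, b).items())


def pair_with_product(c, a, b):
    """sum_{i,j} c(i+j) a(i) b(j): discrete analogue of the integral of h(x+y)f(x)g(y)."""
    return sum(c.get(i + j, 0) * a[i] * b[j] for i, j in product(a, b))


a = {0: 2, 1: -1, 3: 4}
b = {-1: 1, 2: 5}
c = {n: n * n - 3 for n in range(-5, 8)}
print(pair_with_convolution(c, a, b), pair_with_product(c, a, b))
```

For these sample sequences both sums equal 423.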
255H
Certain standard results are now easy.
Corollary If $f$, $g$ are complex-valued functions which are integrable over $\mathbb{R}$, then $f * g$ is integrable, with
$$\int f * g = \int f \int g, \qquad \int |f * g| \le \int |f| \int |g|.$$
proof In 255G, set $h(x) = 1$ for every $x \in \mathbb{R}$; then
$$\int h(x + y)f(x)g(y)\,d(x, y) = \int f(x)g(y)\,d(x, y) = \int f \int g$$
by 253D, so, by 255Ga, $f * g$ is integrable and, as claimed,
$$\int f * g = \int h(x)(f * g)(x)\,dx = \int h(x + y)f(x)g(y)\,d(x, y) = \int f \int g.$$
Now, applying this to $|f|$ and $|g|$,
$$\int |f * g| \le \int |f| * |g| = \int |f| \int |g|.$$
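To see the definition in 255E and the conclusion of 255H on a concrete pair, take $f = \chi[0, 1]$ and $g(y) = e^{-y}$ for $y \ge 0$, $g(y) = 0$ for $y < 0$. This example is chosen here and is not taken from the text. Then $(f * g)(x) = \int_{\max(0, x-1)}^{x} e^{-y}\,dy$ for $x \ge 0$ and $(f * g)(x) = 0$ for $x < 0$, and 255H predicts $\int f * g = \int f \int g = 1$. The Python sketch below approximates the defining integral by a midpoint rule on a truncated range. Its agreement with the exact formula therefore holds only up to discretization error.

```python
import math


def f(t):
    # indicator function of [0, 1]
    return 1.0 if 0.0 <= t <= 1.0 else 0.0


def g(t):
    # g(t) = e^{-t} for t >= 0 and 0 for t < 0
    return math.exp(-t) if t >= 0.0 else 0.0


def midpoint(fn, lo, hi, n):
    """Midpoint-rule approximation to the integral of fn over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(fn(lo + (k + 0.5) * h) for k in range(n))


def conv_at(u, v, x, lo=-1.0, hi=40.0, n=40_000):
    """Approximate (u*v)(x) = integral of u(x - y) v(y) dy, with the y-range truncated to [lo, hi]."""
    return midpoint(lambda y: u(x - y) * v(y), lo, hi, n)


def conv_exact(x):
    # for this f and g: (f*g)(x) = integral of e^{-y} over [max(0, x-1), x] when x >= 0, else 0
    if x < 0.0:
        return 0.0
    return math.exp(-max(0.0, x - 1.0)) - math.exp(-x)


for x in (0.5, 2.5):
    print(x, conv_at(f, g, x), conv_at(g, f, x), conv_exact(x))

# 255H predicts that the integral of f*g is (integral of f)(integral of g) = 1
print(midpoint(conv_exact, 0.0, 50.0, 100_000))
```

The jumps of $f$ limit the midpoint rule to first-order accuracy in the step size. That is why the comparison with `conv_exact` uses a loose tolerance.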
255I Corollary For any measurable complex-valued functions $f$, $g$ defined almost everywhere in $\mathbb{R}$, $f * g$ is measurable and has measurable domain.
proof Set $f_n(x) = f(x)$ if $x \in \mathrm{dom}\,f$, $|x| \le n$ and $|f(x)| \le n$, and $f_n(x) = 0$ elsewhere in $\mathbb{R}$; define $g_n$ similarly from $g$. Then $f_n$ and $g_n$ are integrable, $|f_n| \le |f|$ and $|g_n| \le |g|$ almost everywhere, and $f = \lim_{n\to\infty} f_n$, $g = \lim_{n\to\infty} g_n$. Consequently, by Lebesgue's Dominated Convergence Theorem,
$$(f * g)(x) = \int f(x - y)g(y)\,dy = \int \lim_{n\to\infty} f_n(x - y)g_n(y)\,dy = \lim_{n\to\infty}\int f_n(x - y)g_n(y)\,dy = \lim_{n\to\infty}(f_n * g_n)(x)$$
for every $x \in \mathrm{dom}(f * g)$. But $f_n * g_n$ is integrable, therefore measurable, for every $n$, so that $f * g$ must be measurable. As for the domain of $f * g$,
$$\begin{aligned} x \in \mathrm{dom}(f * g) &\iff \int f(x - y)g(y)\,dy \text{ is defined in } \mathbb{C}\\ &\iff \int |f(x - y)||g(y)|\,dy \text{ is defined in } \mathbb{R}\\ &\iff \int |f_n(x - y)||g_n(y)|\,dy \text{ is defined in } \mathbb{R} \text{ for every } n\\ &\qquad\text{and } \sup_{n\in\mathbb{N}}\int |f_n(x - y)||g_n(y)|\,dy < \infty. \end{aligned}$$
Because every $|f_n| * |g_n|$ is integrable, therefore measurable and with measurable domain,
$$\mathrm{dom}(f * g) = \Bigl\{x : x \in \bigcap_{n\in\mathbb{N}}\mathrm{dom}(|f_n| * |g_n|),\ \sup_{n\in\mathbb{N}}(|f_n| * |g_n|)(x) < \infty\Bigr\}$$
is measurable.

255J Theorem Let $f$, $g$, $h$ be complex-valued measurable functions defined almost everywhere in $\mathbb{R}$, such that $f * g$ and $g * h$ are also defined a.e. Suppose that $x \in \mathbb{R}$ is such that one of $(|f| * (|g| * |h|))(x)$, $((|f| * |g|) * |h|)(x)$ is defined in $\mathbb{R}$. Then $f * (g * h)$ and $(f * g) * h$ are defined and equal at $x$.
proof Set $k(y) = f(x - y)$ when this is defined, so that $k$ is measurable and defined almost everywhere. Now
$$(|f| * (|g| * |h|))(x) = \int |f(x - y)|(|g| * |h|)(y)\,dy = \iint |k(y)||g(y - z)||h(z)|\,dz\,dy,$$
$$\begin{aligned}((|f| * |g|) * |h|)(x) &= \int (|f| * |g|)(x - y)|h(y)|\,dy = \iint |f(x - y - z)||g(z)||h(y)|\,dz\,dy\\ &= \iint |k(y + z)||g(z)||h(y)|\,dz\,dy = \iint |k(y + z)||g(y)||h(z)|\,dy\,dz.\end{aligned}$$
So if either of these is finite, the conditions of 255Gb are satisfied, with $k$, $g$, $h$ in place of $h$, $f$ and $g$, and
$$\int k(y)(g * h)(y)\,dy = \int k(y + z)g(y)h(z)\,d(y, z),$$
that is,
$$\begin{aligned}(f * (g * h))(x) &= \int f(x - y)(g * h)(y)\,dy = \int k(y)(g * h)(y)\,dy\\ &= \int k(y + z)g(y)h(z)\,d(y, z) = \iint k(y + z)g(y)h(z)\,dy\,dz\\ &= \iint f(x - y - z)g(y)h(z)\,dy\,dz = \int (f * g)(x - z)h(z)\,dz = ((f * g) * h)(x).\end{aligned}$$

255K I do not think we shall need an exhaustive discussion of the question of just when $(f * g)(x)$ is defined; this seems to be complicated. However, there is a fundamental case in which we can be sure that $(f * g)(x)$ is defined everywhere.
Proposition Suppose that $f$, $g$ are measurable complex-valued functions defined almost everywhere in $\mathbb{R}$, and that $f \in \mathcal{L}^p_{\mathbb{C}}$, $g \in \mathcal{L}^q_{\mathbb{C}}$ where $p, q \in [1, \infty]$ and $\frac1p + \frac1q = 1$ (writing $\frac1\infty = 0$ as usual). Then $f * g$ is defined everywhere in $\mathbb{R}$, is uniformly continuous, and $\sup_{x\in\mathbb{R}}|(f * g)(x)| \le \|f\|_p\|g\|_q$.
proof (a) (For an introduction to $L^p$ spaces, see §244.) For any $x \in \mathbb{R}$, the function $f_x$, defined by setting $f_x(y) = f(x - y)$ whenever $x - y \in \mathrm{dom}\,f$, must also belong to $\mathcal{L}^p$, because $f_x = f\varphi$ for an automorphism $\varphi$ of the measure space. Consequently $(f * g)(x) = \int f_x \times g$ is defined, and of modulus at most $\|f\|_p\|g\|_q$, by 243Fa/243K and 244Eb/244Ob.
(b) To see that $f * g$ is uniformly continuous, argue as follows. Suppose first that $p < \infty$. Let $\epsilon > 0$. Let $\eta > 0$ be such that $\eta(2 + 2^{1/p})\|g\|_q \le \epsilon$. Then there is a bounded continuous function $h : \mathbb{R} \to \mathbb{C}$ such that $\{x : h(x) \ne 0\}$ is bounded and $\|f - h\|_p \le \eta$ (244H, 244Ob); let $M \ge 1$ be such that $h(x) = 0$ whenever $|x| \ge M - 1$. Next, $h$ is uniformly continuous, so there is a $\delta \in \left]0, 1\right]$ such that $|h(x) - h(x')| \le \eta M^{-1/p}$ whenever $|x - x'| \le \delta$. Suppose that $|x - x'| \le \delta$. Defining $h_x(y) = h(x - y)$, as before, we have
$$\begin{aligned}\int |h_x - h_{x'}|^p &= \int |h(x - y) - h(x' - y)|^p\,dy = \int |h(t) - h(x' - x + t)|^p\,dt \quad\text{(substituting } t = x - y\text{)}\\ &= \int_{-M}^{M} |h(t) - h(x' - x + t)|^p\,dt \quad\text{(because } h(t) = h(x' - x + t) = 0 \text{ if } |t| \ge M\text{)}\\ &\le 2M(\eta M^{-1/p})^p \quad\text{(because } |h(t) - h(x' - x + t)| \le \eta M^{-1/p} \text{ for every } t\text{)}\\ &= 2\eta^p.\end{aligned}$$
So $\|h_x - h_{x'}\|_p \le 2^{1/p}\eta$. On the other hand,
$$\int |h_x - f_x|^p = \int |h(x - y) - f(x - y)|^p\,dy = \int |h(y) - f(y)|^p\,dy,$$
so $\|h_x - f_x\|_p = \|h - f\|_p \le \eta$, and similarly $\|h_{x'} - f_{x'}\|_p \le \eta$. So
$$\|f_x - f_{x'}\|_p \le \|f_x - h_x\|_p + \|h_x - h_{x'}\|_p + \|h_{x'} - f_{x'}\|_p \le \eta(2 + 2^{1/p}).$$
This means that
$$|(f * g)(x) - (f * g)(x')| = \Bigl|\int f_x \times g - \int f_{x'} \times g\Bigr| = \Bigl|\int (f_x - f_{x'}) \times g\Bigr| \le \|f_x - f_{x'}\|_p\|g\|_q \le \eta(2 + 2^{1/p})\|g\|_q \le \epsilon.$$
As $\epsilon$ is arbitrary, and $\delta$ does not depend on $x$, $f * g$ is uniformly continuous.
The argument here supposes that $p$ is finite. But if $p = \infty$ then $q = 1$ is finite, so we can apply the method with $g$ in place of $f$ to show that $g * f$ is uniformly continuous, and $f * g = g * f$ by 255Fb.

255L The r-dimensional case I have written 255A-255K out as theorems about Lebesgue measure on $\mathbb{R}$. However, they all apply equally well to Lebesgue measure on $\mathbb{R}^r$ for any $r \ge 1$, and the modifications required are so small that I think I need do no more than ask you to read through the arguments again, turning every $\mathbb{R}$ into an $\mathbb{R}^r$, and every $\mathbb{R}^2$ into an $(\mathbb{R}^r)^2$. In 255A and elsewhere, the measure $\mu_2$ should be read either as Lebesgue measure on $\mathbb{R}^{2r}$ or as the product measure on $(\mathbb{R}^r)^2$; by 251M the two may be identified. There is a trivial modification required in part (b) of the proof; if $I_n = [a_n, b_n[$ then
$$\mu I_n = \mu(-I_n) = \prod_{i\le r}\max(0, \beta_{ni} - \alpha_{ni}),$$
writing $a_n = (\alpha_{n1}, \dots, \alpha_{nr})$ and $b_n = (\beta_{n1}, \dots, \beta_{nr})$. In the proof of 255I, the functions $f_n$ should be defined by saying that $f_n(x) = f(x)$ if $x \in \mathrm{dom}\,f$, $|f(x)| \le n$ and $\|x\| \le n$, and $f_n(x) = 0$ otherwise. In quoting these results, therefore, I shall be uninhibited in referring to the paragraphs 255A-255K as if they were actually written out for general $r \ge 1$.
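The estimate in 255K is Hölder's inequality applied to $f_x$ and $g$. Its counterpart for $\mathbb{Z}$ with counting measure reads $|(\boldsymbol a * \boldsymbol b)(n)| \le \|\boldsymbol a\|_p\|\boldsymbol b\|_q$ whenever $\frac1p + \frac1q = 1$; 255Xe(iv) is the case $p = q = 2$. The Python sketch below checks this numerically on finite sequences. Storing the sequences as lists indexed from $0$ is an illustrative simplification. Uniform continuity has no content on $\mathbb{Z}$, so only the bound is tested.

```python
import math


def convolve(a, b):
    """Full convolution of finite sequences indexed from 0: c[n] = sum_i a[n - i] b[i]."""
    c = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c


def lp_norm(x, p):
    """The l^p norm of a finite sequence; p = math.inf gives the sup norm."""
    if p == math.inf:
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def conjugate(p):
    """The exponent q with 1/p + 1/q = 1, for 1 <= p < inf (q = inf when p = 1)."""
    return math.inf if p == 1 else p / (p - 1)


a = [1.0, -2.0, 0.5, 3.0]
b = [0.25, 4.0, -1.0]
for p in (1.0, 1.5, 2.0, 3.0):
    sup = lp_norm(convolve(a, b), math.inf)
    print(p, sup, lp_norm(a, p) * lp_norm(b, conjugate(p)))
```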
255M The case of $]-\pi, \pi]$ The same ideas also apply to the circle group $S^1$ and to the interval $]-\pi, \pi]$, but here perhaps rather more explanation is in order.
(a) The first thing to establish is the appropriate group operation. If we think of $S^1$ as the set $\{z : z \in \mathbb{C}, |z| = 1\}$, then the group operation is complex multiplication, and in the formulae above $x + y$ must be rendered as $xy$, while $x - y$ must be rendered as $xy^{-1}$. On the interval $]-\pi, \pi]$, the group operation is $+_{2\pi}$, where for $x, y \in ]-\pi, \pi]$ I write $x +_{2\pi} y$ for whichever of $x + y$, $x + y + 2\pi$, $x + y - 2\pi$ belongs to $]-\pi, \pi]$. To see that this is indeed a group operation, one method is to note that it corresponds to multiplication on $S^1$ if we use the canonical bijection $x \mapsto e^{ix} : ]-\pi, \pi] \to S^1$; another is to note that it corresponds to the operation on the quotient group $\mathbb{R}/2\pi\mathbb{Z}$. Thus in this interpretation of the ideas of 255A-255K, we shall wish to replace $x + y$ by $x +_{2\pi} y$, $-x$ by $-_{2\pi}x$, and $x - y$ by $x -_{2\pi} y$, where
$$-_{2\pi}x = -x \text{ if } x \in ]-\pi, \pi[, \qquad -_{2\pi}\pi = \pi,$$
and $x -_{2\pi} y$ is whichever of $x - y$, $x - y + 2\pi$, $x - y - 2\pi$ belongs to $]-\pi, \pi]$.
(b) As for the measure, the measure to use on $]-\pi, \pi]$ is just Lebesgue measure. Note that because $]-\pi, \pi]$ is Lebesgue measurable, there will be no confusion concerning the meaning of 'measurable subset', as the relatively measurable subsets of $]-\pi, \pi]$ are actually measurable for Lebesgue measure on $\mathbb{R}$. Also we can identify the product measure on $]-\pi, \pi] \times ]-\pi, \pi]$ with the subspace measure induced by Lebesgue measure on $\mathbb{R}^2$ (251Q). On $S^1$, we need the corresponding measure induced by the canonical bijection between $S^1$ and $]-\pi, \pi]$, which indeed is often called 'Lebesgue measure on $S^1$'. (We shall see in 265E that it is also equal to Hausdorff one-dimensional measure on $S^1$.) We are very close to the level at which it would become reasonable to move to $S^1$ and this measure (or its normalized version, in which it is reduced by a factor of $2\pi$, so as to make $S^1$ a probability space). However, the elementary theory of Fourier series, which will be the principal application of this work in the present volume, is generally done on intervals in $\mathbb{R}$, so that formulae based on $]-\pi, \pi]$ are closer to the standard expressions. Henceforth, therefore, I will express all the work in terms of $]-\pi, \pi]$.
(c) The result corresponding to 255A now takes a slightly different form, so I spell it out.

255N Theorem Let $\mu$ be Lebesgue measure on $]-\pi, \pi]$ and $\mu_2$ Lebesgue measure on $]-\pi, \pi] \times ]-\pi, \pi]$; write $\Sigma$, $\Sigma_2$ for their domains.
(a) For any $a \in ]-\pi, \pi]$, the map $x \mapsto a +_{2\pi} x : ]-\pi, \pi] \to ]-\pi, \pi]$ is a measure space automorphism of $(]-\pi, \pi], \Sigma, \mu)$.
(b) The map $x \mapsto -_{2\pi}x : ]-\pi, \pi] \to ]-\pi, \pi]$ is a measure space automorphism of $(]-\pi, \pi], \Sigma, \mu)$.
(c) For any $a \in ]-\pi, \pi]$, the map $x \mapsto a -_{2\pi} x : ]-\pi, \pi] \to ]-\pi, \pi]$ is a measure space automorphism of $(]-\pi, \pi], \Sigma, \mu)$.
(d) The map $(x, y) \mapsto (x +_{2\pi} y, y) : ]-\pi, \pi]^2 \to ]-\pi, \pi]^2$ is a measure space automorphism of $(]-\pi, \pi]^2, \Sigma_2, \mu_2)$.
(e) The map $(x, y) \mapsto (x -_{2\pi} y, y) : ]-\pi, \pi]^2 \to ]-\pi, \pi]^2$ is a measure space automorphism of $(]-\pi, \pi]^2, \Sigma_2, \mu_2)$.
proof (a) Set $\varphi(x) = a +_{2\pi} x$. Then for any $E \subseteq ]-\pi, \pi]$,
$$\varphi[E] = ((E + a) \cap ]-\pi, \pi]) \cup (((E + a) \cap ]\pi, 3\pi]) - 2\pi) \cup (((E + a) \cap ]-3\pi, -\pi]) + 2\pi),$$
and these three sets are disjoint, so that, writing $\mu_L$ for Lebesgue measure on $\mathbb{R}$,
$$\begin{aligned}\mu\varphi[E] &= \mu((E + a) \cap ]-\pi, \pi]) + \mu(((E + a) \cap ]\pi, 3\pi]) - 2\pi) + \mu(((E + a) \cap ]-3\pi, -\pi]) + 2\pi)\\ &= \mu_L((E + a) \cap ]-\pi, \pi]) + \mu_L(((E + a) \cap ]\pi, 3\pi]) - 2\pi) + \mu_L(((E + a) \cap ]-3\pi, -\pi]) + 2\pi)\\ &= \mu_L((E + a) \cap ]-\pi, \pi]) + \mu_L((E + a) \cap ]\pi, 3\pi]) + \mu_L((E + a) \cap ]-3\pi, -\pi])\\ &= \mu_L(E + a) = \mu_L E = \mu E\end{aligned}$$
for every $E \in \Sigma$. Similarly, $\mu\varphi^{-1}[E]$ is defined and equal to $\mu E$ for every $E \in \Sigma$, so that $\varphi$ is an automorphism of $(]-\pi, \pi], \Sigma, \mu)$.
(b) Of course this is quicker. Setting $\varphi(x) = -_{2\pi}x$ for $x \in ]-\pi, \pi]$, we have
$$\mu(\varphi[E]) = \mu(\varphi[E] \cap ]-\pi, \pi[) = \mu(-(E \cap ]-\pi, \pi[)) = \mu_L(-(E \cap ]-\pi, \pi[)) = \mu_L(E \cap ]-\pi, \pi[) = \mu(E \cap ]-\pi, \pi[) = \mu E$$
for every $E \in \Sigma$.
(c) This is just a matter of putting (a) and (b) together, as in 255A.
(d) We can argue as in (a), but with a little more elaboration. If $E \in \Sigma_2$, and $\varphi(x, y) = (x +_{2\pi} y, y)$ for $x, y \in ]-\pi, \pi]$, set $\psi(x, y) = (x + y, y)$ for $x, y \in \mathbb{R}$, and write $c = (2\pi, 0) \in \mathbb{R}^2$, $H = ]-\pi, \pi]^2$, $H' = H + c$, $H'' = H - c$. Then for any $E \in \Sigma_2$,
$$\varphi[E] = (\psi[E] \cap H) \cup ((\psi[E] \cap H') - c) \cup ((\psi[E] \cap H'') + c),$$
so, this time writing $\mu_L$ for Lebesgue measure on $\mathbb{R}^2$,
$$\begin{aligned}\mu_2\varphi[E] &= \mu_2(\psi[E] \cap H) + \mu_2((\psi[E] \cap H') - c) + \mu_2((\psi[E] \cap H'') + c)\\ &= \mu_L(\psi[E] \cap H) + \mu_L((\psi[E] \cap H') - c) + \mu_L((\psi[E] \cap H'') + c)\\ &= \mu_L(\psi[E] \cap H) + \mu_L(\psi[E] \cap H') + \mu_L(\psi[E] \cap H'')\\ &= \mu_L\psi[E] = \mu_L E = \mu_2 E.\end{aligned}$$
In the same way, $\mu_2(\varphi^{-1}[E]) = \mu_2 E$ for every $E \in \Sigma_2$, so $\varphi$ is an automorphism of $(]-\pi, \pi]^2, \Sigma_2, \mu_2)$, as required.
(e) Finally, (e) is just a restatement of (d), as before.

255O Convolutions on $]-\pi, \pi]$ With the fundamental result established, the same arguments as in 255B-255K now yield the following. Write $\mu$ for Lebesgue measure on $]-\pi, \pi]$.
(a) Let $f$ and $g$ be measurable complex-valued functions defined almost everywhere in $]-\pi, \pi]$. Write $f * g$ for the function defined by the formula
$$(f * g)(x) = \int_{-\pi}^{\pi} f(x -_{2\pi} y)g(y)\,dy$$
whenever $x \in ]-\pi, \pi]$ and the integral exists as a complex number. Then $f * g$ is the convolution of the functions $f$ and $g$.
(b) If $f$, $g$ are measurable complex-valued functions defined almost everywhere in $]-\pi, \pi]$, then $f * g = g * f$, in the strict sense that they have the same domain and the same value at each point of that common domain.
(c) We may regard convolution as a binary operator on $L^0_{\mathbb{C}}$; if $u, v \in L^0_{\mathbb{C}}(\mu)$, we can define $u * v$ as being equal to $(f * g)^\bullet$ whenever $f^\bullet = u$ and $g^\bullet = v$.
(d) Let $f$, $g$ and $h$ be measurable complex-valued functions defined almost everywhere in $]-\pi, \pi]$. Then
(i) $$\int_{-\pi}^{\pi} h(x)(f * g)(x)\,dx = \int_{]-\pi, \pi]^2} h(x +_{2\pi} y)f(x)g(y)\,d(x, y)$$
whenever the right-hand side exists and is finite, provided that in the expression $h(x)(f * g)(x)$ we interpret the product as $0$ if $h(x) = 0$ and $(f * g)(x)$ is undefined.
(ii) If, on the same interpretation of $|h(x)|(|f| * |g|)(x)$, the integral $\int_{-\pi}^{\pi}|h(x)|(|f| * |g|)(x)\,dx$ is finite, then $\int_{]-\pi,\pi]^2} h(x +_{2\pi} y)f(x)g(y)\,d(x, y)$ exists in $\mathbb{C}$, so again we shall have
$$\int_{-\pi}^{\pi} h(x)(f * g)(x)\,dx = \int_{]-\pi,\pi]^2} h(x +_{2\pi} y)f(x)g(y)\,d(x, y).$$
(e) If $f$, $g$ are complex-valued functions which are integrable over $]-\pi, \pi]$, then $f * g$ is integrable, with
$$\int_{-\pi}^{\pi} f * g = \int_{-\pi}^{\pi} f \int_{-\pi}^{\pi} g, \qquad \int_{-\pi}^{\pi} |f * g| \le \int_{-\pi}^{\pi} |f| \int_{-\pi}^{\pi} |g|.$$
(f) Let $f$, $g$, $h$ be complex-valued measurable functions defined almost everywhere in $]-\pi, \pi]$. Suppose that $x \in ]-\pi, \pi]$ is such that one of $(|f| * (|g| * |h|))(x)$, $((|f| * |g|) * |h|)(x)$ is defined in $\mathbb{R}$. Then $f * (g * h)$ and $(f * g) * h$ are defined and equal at $x$.
(g) Suppose that $f \in \mathcal{L}^p_{\mathbb{C}}(\mu)$, $g \in \mathcal{L}^q_{\mathbb{C}}(\mu)$ where $p, q \in [1, \infty]$ and $\frac1p + \frac1q = 1$. Then $f * g$ is defined everywhere in $]-\pi, \pi]$, and $\sup_{x\in]-\pi,\pi]}|(f * g)(x)| \le \|f\|_p\|g\|_q$.
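The operations $+_{2\pi}$ and $-_{2\pi}$ of 255M and the convolution of 255Oa are easy to imitate numerically. The Python sketch below is an illustration under stated assumptions and is not part of the text. It implements $+_{2\pi}$, $-_{2\pi}x$ and $x -_{2\pi} y$ exactly as defined in 255M(a). Integrals over $]-\pi, \pi]$ are approximated by equally spaced Riemann sums, which are very accurate for smooth $2\pi$-periodic integrands, the only kind used here. With these tools it checks 255Oe, $\int f * g = \int f \int g$, for the sample functions $f(t) = 1 + \cos t$ and $g(t) = e^{\sin t}$.

```python
import cmath
import math

PI = math.pi


def add_2pi(x, y):
    """x +_{2pi} y: whichever of x+y, x+y-2pi, x+y+2pi lies in ]-pi, pi]."""
    s = x + y  # lies in ]-2pi, 2pi] when x, y are in ]-pi, pi]
    if s > PI:
        s -= 2 * PI
    elif s <= -PI:
        s += 2 * PI
    return s


def neg_2pi(x):
    """-_{2pi} x: equal to -x on ]-pi, pi[, and to pi when x = pi."""
    return PI if x == PI else -x


def sub_2pi(x, y):
    """x -_{2pi} y: whichever of x-y, x-y-2pi, x-y+2pi lies in ]-pi, pi]."""
    d = x - y  # lies in ]-2pi, 2pi[ when x, y are in ]-pi, pi]
    if d > PI:
        d -= 2 * PI
    elif d <= -PI:
        d += 2 * PI
    return d


def grid(n):
    """The n equally spaced points -pi + 2pi(k+1)/n (k = 0, ..., n-1) of ]-pi, pi]."""
    return [-PI + 2 * PI * (k + 1) / n for k in range(n)]


def integral(fn, n=400):
    """Riemann-sum approximation to the integral of fn over ]-pi, pi] (fn smooth and periodic)."""
    return (2 * PI / n) * sum(fn(t) for t in grid(n))


def circ_conv(f, g, x, n=400):
    """Approximate (f*g)(x) = integral of f(x -_{2pi} y) g(y) dy over ]-pi, pi], as in 255Oa."""
    return integral(lambda y: f(sub_2pi(x, y)) * g(y), n)


f = lambda t: 1.0 + math.cos(t)
g = lambda t: math.exp(math.sin(t))

print(add_2pi(3.0, 1.0), neg_2pi(PI))
# +_{2pi} matches multiplication on S^1 under x -> e^{ix}
print(abs(cmath.exp(1j * add_2pi(2.5, 2.0)) - cmath.exp(2.5j) * cmath.exp(2.0j)))
lhs = integral(lambda x: circ_conv(f, g, x))  # integral of f*g
rhs = integral(f) * integral(g)               # (integral of f)(integral of g), as in 255Oe
print(lhs, rhs)
```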
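255X Basic exercises >(a) Let $f$, $g$ be complex-valued functions defined almost everywhere in $\mathbb{R}$. Show that for any $x \in \mathbb{R}$, $(f * g)(x) = \int f(x + y)g(-y)\,dy$ if either is defined.
>(b) Let $f$ and $g$ be complex-valued functions defined almost everywhere in $\mathbb{R}$. (i) Show that if $f$ and $g$ are even functions, so is $f * g$. (ii) Show that if $f$ is even and $g$ is odd then $f * g$ is odd. (iii) Show that if $f$ and $g$ are odd then $f * g$ is even.
>(c) Let $\mu$ be Lebesgue measure on $\mathbb{R}$. Show that we have a function $* : L^1_{\mathbb{C}}(\mu) \times L^1_{\mathbb{C}}(\mu) \to L^1_{\mathbb{C}}(\mu)$ given by setting $f^\bullet * g^\bullet = (f * g)^\bullet$ for all $f, g \in \mathcal{L}^1_{\mathbb{C}}(\mu)$. Show that $L^1_{\mathbb{C}}$ is a commutative Banach algebra under $*$ (definition: 2A4J).
(d)(i) Show that if $h$ is an integrable function on $\mathbb{R}^2$, then $(Th)(x) = \int h(x - y, y)\,dy$ exists for almost every $x \in \mathbb{R}$, and that $\int (Th)(x)\,dx = \int h(x, y)\,d(x, y)$. (ii) Write $\mu_2$ for Lebesgue measure on $\mathbb{R}^2$, $\mu$ for Lebesgue measure on $\mathbb{R}$. Show that there is a linear operator $\tilde T : L^1(\mu_2) \to L^1(\mu)$ defined by setting $\tilde T(h^\bullet) = (Th)^\bullet$ for every integrable function $h$ on $\mathbb{R}^2$. (iii) Show that in the language of 253E and (c) above, $\tilde T(u \otimes v) = u * v$ for all $u, v \in L^1(\mu)$.
>(e) For $\boldsymbol a, \boldsymbol b \in \mathbb{C}^{\mathbb{Z}}$ set $(\boldsymbol a * \boldsymbol b)(n) = \sum_{i\in\mathbb{Z}} \boldsymbol a(n - i)\boldsymbol b(i)$ whenever $\sum_{i\in\mathbb{Z}}|\boldsymbol a(n - i)\boldsymbol b(i)| < \infty$. Show that
(i) $\boldsymbol a * \boldsymbol b = \boldsymbol b * \boldsymbol a$;
(ii) $\sum_{i\in\mathbb{Z}} \boldsymbol c(i)(\boldsymbol a * \boldsymbol b)(i) = \sum_{i,j\in\mathbb{Z}} \boldsymbol c(i + j)\boldsymbol a(i)\boldsymbol b(j)$ if $\sum_{i,j\in\mathbb{Z}}|\boldsymbol c(i + j)\boldsymbol a(i)\boldsymbol b(j)| < \infty$;
(iii) if $\boldsymbol a, \boldsymbol b \in \ell^1(\mathbb{Z})$ then $\boldsymbol a * \boldsymbol b \in \ell^1(\mathbb{Z})$ and $\|\boldsymbol a * \boldsymbol b\|_1 \le \|\boldsymbol a\|_1\|\boldsymbol b\|_1$;
(iv) if $\boldsymbol a, \boldsymbol b \in \ell^2(\mathbb{Z})$ then $\boldsymbol a * \boldsymbol b \in \ell^\infty(\mathbb{Z})$ and $\|\boldsymbol a * \boldsymbol b\|_\infty \le \|\boldsymbol a\|_2\|\boldsymbol b\|_2$;
(v) if $\boldsymbol a, \boldsymbol b, \boldsymbol c \in \mathbb{C}^{\mathbb{Z}}$ and $(|\boldsymbol a| * (|\boldsymbol b| * |\boldsymbol c|))(n)$ is well-defined, then $(\boldsymbol a * (\boldsymbol b * \boldsymbol c))(n) = ((\boldsymbol a * \boldsymbol b) * \boldsymbol c)(n)$.
(f) Suppose that $f$, $g$ are real-valued measurable functions defined almost everywhere in $\mathbb{R}^r$ and such that $f > 0$ a.e., $g \ge 0$ a.e. and $\{x : g(x) > 0\}$ is not negligible. Show that $f * g > 0$ everywhere in $\mathrm{dom}(f * g)$.
>(g) Suppose that $f : \mathbb{R} \to \mathbb{C}$ is a bounded differentiable function and that $f'$ is bounded.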
Show that for any integrable complex-valued function $g$ on $\mathbb{R}$, $f * g$ is differentiable and $(f * g)' = f' * g$ everywhere. (Hint: 123D.)
(h) A complex-valued function $g$ defined almost everywhere in $\mathbb{R}$ is locally integrable if $\int_a^b g$ is defined in $\mathbb{C}$ whenever $a < b$ in $\mathbb{R}$. Suppose that $g$ is such a function and that $f : \mathbb{R} \to \mathbb{C}$ is a differentiable function, with continuous derivative, such that $\{x : f(x) \ne 0\}$ is bounded. Show that $(f * g)' = f' * g$ everywhere.
>(i) Set $\varphi_\delta(x) = \exp\bigl(-\frac{1}{\delta^2 - x^2}\bigr)$ if $|x| < \delta$, $0$ if $|x| \ge \delta$, as in 242Xi. Set $\alpha_\delta = \int \varphi_\delta$, $\psi_\delta = \alpha_\delta^{-1}\varphi_\delta$. Let $f$ be a locally integrable complex-valued function on $\mathbb{R}$. (i) Show that $f * \psi_\delta$ is a smooth function defined everywhere on $\mathbb{R}$ for every $\delta > 0$. (ii) Show that $\lim_{\delta\downarrow0}(f * \psi_\delta)(x) = f(x)$ for almost every $x \in \mathbb{R}$. (Hint: 223Yg.) (iii) Show that if $f$ is integrable then $\lim_{\delta\downarrow0}\int |f - f * \psi_\delta| = 0$. (Hint: use (ii) and 245H(a-ii), or look first at the case $f = \chi[a, b]$ and use 242O, noting that $\int |f * \psi_\delta| \le \int |f|$.) (iv) Show that if $f$ is uniformly continuous and defined everywhere on $\mathbb{R}$ then $\lim_{\delta\downarrow0}\sup_{x\in\mathbb{R}}|f(x) - (f * \psi_\delta)(x)| = 0$.
>(j) For $\alpha > 0$, set $g_\alpha(t) = \frac{1}{\Gamma(\alpha)}t^{\alpha - 1}$ for $t > 0$, $0$ for $t \le 0$. Show that $g_\alpha * g_\beta = g_{\alpha+\beta}$ for all $\alpha, \beta > 0$. (Hint: 252Yk.)

255Y Further exercises (a) Set $f(x) = 1$ for all $x \in \mathbb{R}$, $g(x) = \frac{x}{|x|}$ for $0 < |x| \le 1$ and $0$ otherwise,
$h(x) = \tanh x$ for all $x \in \mathbb{R}$. Show that $f * (g * h)$ and $(f * g) * h$ are both defined (and constant) everywhere, and are different.
(b) Discuss what can happen if, in the context of 255J, we know that $(|f| * (|g| * |h|))(x)$ is defined, but have no information on the domain of $f * g$.
(c) Suppose that $p \in [1, \infty[$ and that $f \in \mathcal{L}^p_{\mathbb{C}}(\mu)$, where $\mu$ is Lebesgue measure on $\mathbb{R}^r$. For $a \in \mathbb{R}^r$ set $(S_a f)(x) = f(a + x)$ whenever $a + x \in \mathrm{dom}\,f$. Show that $S_a f \in \mathcal{L}^p_{\mathbb{C}}(\mu)$, and that for every $\epsilon > 0$ there is a $\delta > 0$ such that $\|S_a f - f\|_p \le \epsilon$ whenever $\|a\| \le \delta$.
(d) Suppose that $p, q \in ]1, \infty[$ and $\frac1p + \frac1q = 1$. Let $f \in \mathcal{L}^p_{\mathbb{C}}(\mu)$, $g \in \mathcal{L}^q_{\mathbb{C}}(\mu)$, where $\mu$ is Lebesgue measure on $\mathbb{R}^r$. Show that $\lim_{\|x\|\to\infty}(f * g)(x) = 0$. (Hint: use 244Hb.)
(e) Repeat 255Yc and 255K, this time taking $\mu$ to be Lebesgue measure on $]-\pi, \pi]$, and setting $(S_a f)(x) = f(a +_{2\pi} x)$ for $a \in ]-\pi, \pi]$; show that in the new version of 255K, $(f * g)(\pi) = \lim_{x\downarrow-\pi}(f * g)(x)$.
(f) Let $\mu$ be Lebesgue measure on $\mathbb{R}$. For $a \in \mathbb{R}$, $f \in \mathcal{L}^0 = \mathcal{L}^0(\mu)$ set $(S_a f)(x) = f(a + x)$ whenever $a + x \in \mathrm{dom}\,f$. (i) Show that $S_a f \in \mathcal{L}^0$ for every $f \in \mathcal{L}^0$. (ii) Show that we have a map $\tilde S_a : L^0 \to L^0$ defined by setting $\tilde S_a(f^\bullet) = (S_a f)^\bullet$ for every $f \in \mathcal{L}^0$. (iii) Show that $\tilde S_a$ is a Riesz space isomorphism and is a homeomorphism for the topology of convergence in measure; moreover, that $\tilde S_a(u \times v) = \tilde S_a u \times \tilde S_a v$ for all $u, v \in L^0$. (iv) Show that $\tilde S_{a+b} = \tilde S_a\tilde S_b$ for all $a, b \in \mathbb{R}$. (v) Show that $\lim_{a\to0}\tilde S_a u = u$ for the topology of convergence in measure, for every $u \in L^0$. (vi) Show that if $1 \le p \le \infty$ then $\tilde S_a\restriction L^p$ is an isometric isomorphism of the Banach lattice $L^p$. (vii) Show that if $p \in [1, \infty[$ then $\lim_{a\to0}\|\tilde S_a u - u\|_p = 0$ for every $u \in L^p$. (viii) Show that if $A \subseteq L^1$ is uniformly integrable and $M \ge 0$, then $\{\tilde S_a u : u \in A, |a| \le M\}$ is uniformly integrable. (ix) Show that if $u, v \in L^0$ are such that $u * v$ is defined in $L^0$, then $\tilde S_a(u * v) = (\tilde S_a u) * v = u * (\tilde S_a v)$ for every $a \in \mathbb{R}$.
(g) Prove 255Nd from 255Na by the method used to prove 255Ad from 255Aa, rather than by quoting 255Ad.
(h) Repeat the results of this chapter for the group $(S^1)^r$, where $r \ge 2$, given its product measure.
(i) Let $f$ be a complex-valued function which is integrable over $\mathbb{R}$. (i) Let $x$ be any point of the Lebesgue set of $f$. Show that for any $\epsilon > 0$ there is a $\delta > 0$ such that $|f(x) - (f * g)(x)| \le \epsilon$ whenever $g : \mathbb{R} \to [0, \infty[$ is a function which is non-decreasing on $]-\infty, 0]$, non-increasing on $[0, \infty[$, and has $\int g = 1$ and $\int_{-\delta}^{\delta} g \ge 1 - \delta$. (ii) Show that for any $\epsilon > 0$ there is a $\delta > 0$ such that $\|f - f * g\|_1 \le \epsilon$ whenever $g : \mathbb{R} \to [0, \infty[$ is a function which is non-decreasing on $]-\infty, 0]$, non-increasing on $[0, \infty[$, and has $\int g = 1$ and $\int_{-\delta}^{\delta} g \ge 1 - \delta$.
(j) Let $f$ be a complex-valued function which is integrable over $\mathbb{R}$. Show that, for almost every $x \in \mathbb{R}$,
$$\lim_{a\downarrow0}\frac{a}{\pi}\int_{-\infty}^{\infty}\frac{f(y)}{(x - y)^2 + a^2}\,dy, \qquad \lim_{a\to\infty} a\int_x^{\infty} f(y)e^{-a(y - x)}\,dy, \qquad \lim_{\sigma\downarrow0}\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} f(y)e^{-(y - x)^2/2\sigma^2}\,dy$$
all exist and are equal to $f(x)$. (Hint: 263G.)
(k) Let $\mu$ be Lebesgue measure on $\mathbb{R}$, and $\varphi : \mathbb{R} \to \mathbb{R}$ a convex function such that $\varphi(0) = 0$; let $\bar\varphi : L^0 \to L^0 = L^0(\mu)$ be the associated operator (see 241I). Show that if $u \in L^1 = L^1(\mu)$, $v \in L^0$ are such that $u, v \ge 0$, $\int u = 1$ and $u * \bar\varphi(v)$ is defined in $L^0$, then $\bar\varphi(u * v) \le u * \bar\varphi(v)$. (Hint: 233I.)
(l) Let $\mu$ be Lebesgue measure on $\mathbb{R}$, and $p \in [1, \infty]$. Let $f \in \mathcal{L}^1_{\mathbb{C}}(\mu)$, $g \in \mathcal{L}^p_{\mathbb{C}}(\mu)$. Show that $f * g \in \mathcal{L}^p_{\mathbb{C}}(\mu)$ and that $\|f * g\|_p \le \|f\|_1\|g\|_p$. (Hint: argue from 255Yk, as in 244M.)
(m) Suppose that $p, q, r \in ]1, \infty[$ and that $\frac1p + \frac1q = 1 + \frac1r$. Let $\mu$ be Lebesgue measure on $\mathbb{R}$. (i) Show that
$$\int f \times g \le \|f\|_p^{1-p/r}\|g\|_q^{1-q/r}\Bigl(\int f^p \times g^q\Bigr)^{1/r}$$
whenever $f, g \ge 0$ and $f \in \mathcal{L}^p(\mu)$, $g \in \mathcal{L}^q(\mu)$. (Hint: set $p' = p/(p - 1)$, etc.; $f_1 = f^{p/q'}$, $g_1 = g^{q/p'}$, $h = (f^p \times g^q)^{1/r}$. Use 244Xd to see that $\|f_1 \times g_1\|_{r'} \le \|f_1\|_{q'}\|g_1\|_{p'}$, so that $\int f_1 \times g_1 \times h \le \|f_1\|_{q'}\|g_1\|_{p'}\|h\|_r$.) (ii) Show that $\|f * g\|_r \le \|f\|_p\|g\|_q$ for all $f \in \mathcal{L}^p(\mu)$, $g \in \mathcal{L}^q(\mu)$. (Hint: take $f, g \ge 0$. Use (i) to see that $(f * g)(x)^r \le \|f\|_p^{r-p}\|g\|_q^{r-q}\int f(y)^p g(x - y)^q\,dy$, so that $\|f * g\|_r^r \le \|f\|_p^{r-p}\|g\|_q^{r-q}\int f(y)^p\|g\|_q^q\,dy$.) (This is Young's inequality.)
(n) Let $G$ be a group and $\mu$ a σ-finite measure on $G$ such that ($\alpha$) for every $a \in G$, the map $x \mapsto ax$ is an automorphism of $(G, \mu)$; ($\beta$) the map $(x, y) \mapsto (x, xy)$ is an automorphism of $(G^2, \mu_2)$, where $\mu_2$ is the c.l.d. product measure on $G \times G$. For $f, g \in \mathcal{L}^0_{\mathbb{C}}(\mu)$ write $(f * g)(x) = \int f(y)g(y^{-1}x)\,dy$ whenever this is defined. Show that
(i) if $f, g, h \in \mathcal{L}^0_{\mathbb{C}}(\mu)$ and $\int h(xy)f(x)g(y)\,d(x, y)$ is defined in $\mathbb{C}$, then $\int h(x)(f * g)(x)\,dx$ exists and is equal to $\int h(xy)f(x)g(y)\,d(x, y)$, provided that in the expression $h(x)(f * g)(x)$ we interpret the product as $0$ if $h(x) = 0$ and $(f * g)(x)$ is undefined;
(ii) if $f, g \in \mathcal{L}^1_{\mathbb{C}}(\mu)$ then $f * g \in \mathcal{L}^1_{\mathbb{C}}(\mu)$ and $\int f * g = \int f \int g$, $\|f * g\|_1 \le \|f\|_1\|g\|_1$;
(iii) if $f, g, h \in \mathcal{L}^1_{\mathbb{C}}(\mu)$ then $f * (g * h) = (f * g) * h$.
(See Halmos 50, §59.)
(o) Repeat 255Yn for counting measure on any group $G$.

255 Notes and comments I have tried to set this section out in such a way that it will be clear that the basis of all the work here is 255A, and the crucial application is 255G. I hope that if and when you come to look at general topological groups (for instance, in Chapter 44), you will find it easy to trace through the ideas in any abelian topological group for which you can prove a version of 255A. For non-abelian groups, of course, rather more care is necessary, especially as in some important examples we no longer have $\mu\{x^{-1} : x \in E\} = \mu E$ for every $E$; see 255Yn-255Yo for a little of what can be done without using topological ideas.
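For a finite group with counting measure, the setting of 255Yo, the hypotheses of 255Yn hold automatically and everything can be computed exactly. The Python sketch below is an illustration only: the group $S_3$, its encoding as permutation tuples, and the test functions are all choices made here. It computes $(f * g)(x) = \sum_y f(y)g(y^{-1}x)$ and shows what survives in a non-abelian group. The identity $\sum f * g = \sum f \sum g$ and associativity hold, but commutativity fails, since $\delta_s * \delta_t = \delta_{st}$ and $st \ne ts$ for the two transpositions $s$, $t$ used below.

```python
from itertools import permutations

# Elements of the symmetric group S_3, as tuples p with p[i] the image of i.
G = list(permutations(range(3)))


def mul(p, q):
    """Group product pq: apply q first, then p."""
    return tuple(p[q[i]] for i in range(3))


def inv(p):
    """Inverse permutation."""
    r = [0, 0, 0]
    for i, image in enumerate(p):
        r[image] = i
    return tuple(r)


def conv(f, g):
    """(f*g)(x) = sum over y of f(y) g(y^{-1} x): the counting-measure case of 255Yn."""
    return {x: sum(f[y] * g[mul(inv(y), x)] for y in G) for x in G}


def delta(a):
    """Indicator function of the single element a."""
    return {x: 1 if x == a else 0 for x in G}


f = {p: k + 1 for k, p in enumerate(G)}
g = {p: (k * k) % 5 - 2 for k, p in enumerate(G)}
h = {p: 3 - k for k, p in enumerate(G)}

s, t = (1, 0, 2), (0, 2, 1)  # two transpositions of {0, 1, 2}
print(conv(delta(s), delta(t)) == delta(mul(s, t)))
print(mul(s, t), mul(t, s))  # different, so convolution is not commutative here
```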
The critical point in 255A is the move from the one-dimensional results in 255Aa-255Ac, which are just the translation- and reflection-invariance of Lebesgue measure, to the two-dimensional results in 255Ad-255Ae. And the living centre of the argument, as I present it, is the fact that the shear transformation $\varphi$ is an automorphism of the structure $(\mathbb{R}^2, \Sigma_2)$. The actual calculation of $\mu_2\varphi[E]$, assuming that it is measurable, is an easy application of Fubini's and Tonelli's theorems and the translation-invariance of $\mu$. It is for the measurability step that we absolutely need the topological properties of Lebesgue measure. I should perhaps remind you that the fact that $\varphi$ is a homeomorphism is not sufficient; in 134I I described a homeomorphism of the unit interval which does not preserve measurability, and it is easy to adapt this to produce a homeomorphism $\psi : \mathbb{R}^2 \to \mathbb{R}^2$ such that $\psi[E]$ is not always measurable for measurable $E$. The argument of 255A depends on the special relationships among all three of the measure, topology and group structure of $\mathbb{R}$.
I have already indulged in a few remarks on what ought, or ought not, to be 'obvious' (255C). But perhaps I can add that such results as 255B, and the later claim, in the proof of 255K, that a reflected version of a function in $\mathcal{L}^p$ is also in $\mathcal{L}^p$, can only be trivial consequences of results like 255A if every step in the construction of the integral is done in the abstract context of general measure spaces. Even though we are here working exclusively with the Lebesgue integral, the argument will become untrustworthy if we have at any stage in the definition of the integral even mentioned that we are thinking of Lebesgue measure. I advance this as a solid reason for defining 'integration' on abstract measure spaces from the beginning, as
I did in Volume 1. Indeed, I suggest that generally in pure mathematics there are good reasons for casting arguments into the forms appropriate to the arguments themselves. I am writing this book for readers who are interested in proofs, and as elsewhere I have written the proofs of this section out in detail. But most of us find it useful to go through some material in 'advanced calculus' mode, by which I mean starting with a formula such as
$$(f * g)(x) = \int f(x - y)g(y)\,dy,$$
and then working out consequences by formal manipulations, for instance
$$\int h(x)(f * g)(x)\,dx = \iint h(x)f(x - y)g(y)\,dy\,dx = \iint h(x + y)f(x)g(y)\,dy\,dx,$$
without troubling about the precise applicability of the formulae to begin with. In some ways this formula-driven approach can be more truthful to the structure of the subject than the careful analysis I habitually present. The exact hypotheses necessary to make the theorems strictly true are surely secondary, in such contexts as this section, to the pattern formed by the ensemble of the theorems, which can be adequately and elegantly expressed in straightforward formulae. Of course I do still insist that we cannot properly appreciate the structure, nor safely use it, without mastering the ideas of the proofs – and as I have said elsewhere, I believe that mastery of ideas necessarily includes mastery of the formal details, at least in the sense of being able to reconstruct them fairly fluently on demand.
Throughout the main exposition of this section, I have worked with functions rather than equivalence classes of functions. But all the results here have interpretations of great importance for the theory of the 'function spaces' of Chapter 24. It is an interesting point that if $u, v \in L^0$ then $u * v$ is most naturally interpreted as a function, not as a member of $L^0$, even if it is defined almost everywhere. Thus 255H can be regarded as saying that $u * v \in \mathcal{L}^1$ for $u, v \in L^1$. We cannot quite say that convolution is a bilinear operator from $L^1 \times L^1$ to $\mathcal{L}^1$, because $\mathcal{L}^1$ is not strictly speaking a linear space. If we want a bilinear operator, then we have to replace the function $u * v$ by its equivalence class, so that convolution becomes a bilinear map from $L^1 \times L^1$ to $L^1$. But when we look at convolution as a function on $L^2 \times L^2$, for instance, then our functions $u * v$ are defined everywhere (255K), and indeed are continuous functions vanishing at $\infty$ (255Yc-255Yd). So in this case it seems more appropriate to regard convolution as a bilinear operator from $L^2 \times L^2$ to some space of continuous functions, and not as an operator from $L^2 \times L^2$ to $L^\infty$. For an example of an interesting convolution which is not naturally representable in terms of an operator on $L^p$ spaces, see 255Xj.
Because convolution acts as a continuous bilinear operator from $L^1(\mu) \times L^1(\mu)$ to $L^1(\mu)$, where $\mu$ is Lebesgue measure on $\mathbb{R}$, Theorem 253F tells us that it must correspond to a linear operator from $L^1(\mu_2)$ to $L^1(\mu)$, where $\mu_2$ is Lebesgue measure on $\mathbb{R}^2$. This is the operator $\tilde T$ of 255Xd. So far in these notes I have written as though we were concerned only with Lebesgue measure on $\mathbb{R}$. However, many applications of the ideas involve $\mathbb{R}^r$ or $]-\pi, \pi]$ or $S^1$. The move to $\mathbb{R}^r$ should be elementary. The move to $S^1$ does require a re-formulation of the basic result 255A/255N. It should also be clear that there will be no new difficulties in moving to $]-\pi, \pi]^r$ or $(S^1)^r$. Moreover, we can also go through the whole theory for the groups $\mathbb{Z}$ and $\mathbb{Z}^r$, where the appropriate measure is now counting measure, so that $L^0_{\mathbb{C}}$ becomes identified with $\mathbb{C}^{\mathbb{Z}}$ or $\mathbb{C}^{\mathbb{Z}^r}$ (255Xe, 255Yo).
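As the notes say, the whole theory can be run on $\mathbb{Z}^r$ with counting measure (255Xe, 255Yo). The Python sketch below works on $\mathbb{Z}^2$, with finitely supported functions stored as dictionaries keyed by integer tuples, which is an illustrative choice. It checks commutativity, the identity $\sum \boldsymbol a * \boldsymbol b = \sum \boldsymbol a \sum \boldsymbol b$ (the analogue of 255H), the $\ell^1$ estimate of 255Xe(iii) transplanted to $\mathbb{Z}^2$, and associativity (the analogue of 255J). With finite supports the absolute-convergence hypothesis of 255J is automatic. The sketch therefore illustrates only the identity, not the failure of associativity exhibited in 255Ya.

```python
from fractions import Fraction
from itertools import product


def conv_zr(a, b):
    """(a*b)(n) = sum_i a(n - i) b(i) on Z^r, for finitely supported a, b
    stored as dicts keyed by r-tuples of integers."""
    out = {}
    for i, j in product(a, b):
        n = tuple(s + t for s, t in zip(i, j))
        out[n] = out.get(n, 0) + a[i] * b[j]
    return out


def l1_norm(a):
    """The l^1 norm sum_n |a(n)| of a finitely supported a."""
    return sum(abs(v) for v in a.values())


a = {(0, 0): 2, (1, -1): -3, (2, 5): 4}
b = {(0, 1): 4, (-1, 0): -1, (1, 1): Fraction(1, 2)}
c = {(0, 0): 1, (3, 0): -2}

ab = conv_zr(a, b)
print(sum(ab.values()), sum(a.values()) * sum(b.values()))  # discrete analogue of 255H
print(l1_norm(ab), l1_norm(a) * l1_norm(b))                  # 255Xe(iii), in Z^2
print(conv_zr(a, conv_zr(b, c)) == conv_zr(ab, c))           # discrete analogue of 255J
```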
256 Radon measures on $\mathbb{R}^r$

In the next section, and again in Chapters 27 and 28, we need to consider the principal class of measures on Euclidean spaces. For a proper discussion of this class, and the interrelationships between the measures and the topologies involved, we must wait until Volume 4. For the moment, therefore, I present definitions adapted to the case in hand, warning you that the correct generalizations are not quite obvious. I give the definition (256A) and a characterization (256C) of Radon measures on Euclidean spaces, and theorems on the construction of Radon measures as indefinite integrals (256E, 256J), as image measures (256G) and as product measures (256K). In passing I give a version of Lusin’s theorem concerning measurable functions on Radon measure spaces (256F).
276
Product measures
256A
256A Definitions Let $\nu$ be a measure on $\mathbb{R}^r$, where $r \ge 1$, and $\Sigma$ its domain.
(a) $\nu$ is a topological measure if every open set belongs to $\Sigma$. Note that in this case every Borel set, and in particular every closed set, belongs to $\Sigma$.
(b) $\nu$ is locally finite if every bounded set has finite outer measure.
(c) If $\nu$ is a topological measure, it is inner regular with respect to the compact sets if $\nu E = \sup\{\nu K : K \subseteq E$ is compact$\}$ for every $E \in \Sigma$. (Because $\nu$ is a topological measure, and compact sets are closed (2A2Ec), $\nu K$ is defined for every compact set $K$.)
(d) $\nu$ is a Radon measure if it is a complete locally finite topological measure which is inner regular with respect to the compact sets.

256B
It will be convenient to be able to call on the following elementary facts.
Lemma Let $\nu$ be a Radon measure on $\mathbb{R}^r$, and $\Sigma$ its domain.
(a) $\nu$ is $\sigma$-finite.
(b) For any $E \in \Sigma$ and any $\epsilon > 0$ there are a closed set $F \subseteq E$ and an open set $G \supseteq E$ such that $\nu(G \setminus F) \le \epsilon$.
(c) For every $E \in \Sigma$ there is a set $H \subseteq E$, expressible as the union of a sequence of compact sets, such that $\nu(E \setminus H) = 0$.
(d) Every continuous real-valued function on $\mathbb{R}^r$ is $\Sigma$-measurable.
(e) If $h : \mathbb{R}^r \to \mathbb{R}$ is continuous and has bounded support, then $h$ is $\nu$-integrable.
proof (a) For each $n \in \mathbb{N}$, $B(0,n) = \{x : \|x\| \le n\}$ is a closed bounded set, therefore Borel, and it has finite measure because $\nu$ is locally finite. So $\langle B(0,n)\rangle_{n\in\mathbb{N}}$ is a cover of $\mathbb{R}^r$ by a sequence of sets of finite measure.
(b) Set $E_n = \{x : x \in E,\ n \le \|x\| < n+1\}$ for each $n$. Then $\nu E_n < \infty$, so there is a compact set $K_n \subseteq E_n$ such that $\nu K_n \ge \nu E_n - 2^{-n-2}\epsilon$. Set $F = \bigcup_{n\in\mathbb{N}} K_n$; then
$\nu(E \setminus F) = \sum_{n=0}^{\infty} \nu(E_n \setminus K_n) \le \frac{1}{2}\epsilon$.
Also $F \subseteq E$ and $F$ is closed because $F \cap B(0,n) = \bigcup_{i\le n} K_i \cap B(0,n)$ is closed for each $n$. In the same way, there is a closed set $F' \subseteq \mathbb{R}^r \setminus E$ such that $\nu((\mathbb{R}^r \setminus E) \setminus F') \le \frac{1}{2}\epsilon$. Setting $G = \mathbb{R}^r \setminus F'$, we see that $G$ is open, that $G \supseteq E$ and that $\nu(G \setminus E) \le \frac{1}{2}\epsilon$, so that $\nu(G \setminus F) \le \epsilon$, as required.
(c) By (b), we can choose for each $n \in \mathbb{N}$ a closed set $F_n \subseteq E$ such that $\nu(E \setminus F_n) \le 2^{-n}$. Set $H = \bigcup_{n\in\mathbb{N}} F_n$; then $H \subseteq E$ and $\nu(E \setminus H) = 0$, and also $H = \bigcup_{m,n\in\mathbb{N}} B(0,m) \cap F_n$ is a countable union of compact sets.
(d) If $h : \mathbb{R}^r \to \mathbb{R}$ is continuous, all the sets $\{x : h(x) > a\}$ are open, so belong to $\Sigma$.
(e) By (d), $h$ is measurable. Now we are supposing that there is some $n \in \mathbb{N}$ such that $h(x) = 0$ whenever $x \notin B(0,n)$. Since $B(0,n)$ is compact (2A2F), $h$ is bounded on $B(0,n)$ (2A2G), and we have $|h| \le \gamma\chi B(0,n)$ for some $\gamma$; since $\nu B(0,n)$ is finite, $h$ is $\nu$-integrable.

256C Theorem A measure $\nu$ on $\mathbb{R}^r$ is a Radon measure iff it is the completion of a locally finite measure defined on the $\sigma$-algebra $\mathcal{B}$ of Borel subsets of $\mathbb{R}^r$.
proof (a) Suppose first that $\nu$ is a Radon measure. Write $\Sigma$ for its domain.
(i) Set $\nu_0 = \nu\restriction\mathcal{B}$. Then $\nu_0$ is a measure with domain $\mathcal{B}$, and it is locally finite because $\nu_0 B(0,n) = \nu B(0,n)$ is finite for every $n$. Let $\hat\nu_0$ be the completion of $\nu_0$ (212C).
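(ii) If $\hat\nu_0$ measures $E$, there are $E_1, E_2 \in \mathcal{B}$ such that $E_1 \subseteq E \subseteq E_2$ and $\nu_0(E_2 \setminus E_1) = 0$. Now $E \setminus E_1 \subseteq E_2 \setminus E_1$ must be $\nu$-negligible; as $\nu$ is complete, $E \in \Sigma$ and $\nu E = \nu E_1 = \nu_0 E_1 = \hat\nu_0 E$.
(iii) If $E \in \Sigma$, then by 256Bc there is a Borel set $H \subseteq E$ such that $\nu(E \setminus H) = 0$. Equally, there is a Borel set $H' \subseteq \mathbb{R}^r \setminus E$ such that $\nu((\mathbb{R}^r \setminus E) \setminus H') = 0$, so that we have $H \subseteq E \subseteq \mathbb{R}^r \setminus H'$ and $\nu_0((\mathbb{R}^r \setminus H') \setminus H) = \nu((\mathbb{R}^r \setminus H') \setminus H) = 0$. So $\hat\nu_0 E$ is defined and equal to $\nu_0 H = \nu E$. This shows that $\nu = \hat\nu_0$ is the completion of the locally finite Borel measure $\nu\restriction\mathcal{B}$. And this is true for any Radon measure $\nu$ on $\mathbb{R}^r$.
(b) For the rest of the proof, I suppose that $\nu_0$ is a locally finite measure on $\mathbb{R}^r$ with domain $\mathcal{B}$, and that $\nu$ is its completion. Write $\Sigma$ for the domain of $\nu$. We say that a subset of $\mathbb{R}^r$ is a $K_\sigma$ set if it is expressible as the union of a sequence of compact sets. Note that every $K_\sigma$ set is a Borel set, so belongs to $\Sigma$. Set
$\mathcal{A} = \{E : E \in \Sigma$, there is a $K_\sigma$ set $H \subseteq E$ such that $\nu(E \setminus H) = 0\}$,
$\tilde\Sigma = \{E : E \in \mathcal{A},\ \mathbb{R}^r \setminus E \in \mathcal{A}\}$.
(c)(i) Every open set is itself a $K_\sigma$ set, so belongs to $\mathcal{A}$. P P Let $G \subseteq \mathbb{R}^r$ be open. If $G = \emptyset$ then $G$ is compact and the result is trivial. Otherwise, let $\mathcal{I}$ be the set of closed intervals of the form $[q, q']$, where $q, q' \in \mathbb{Q}^r$, which are included in $G$. Then all the members of $\mathcal{I}$ are closed and bounded, therefore compact. If $x \in G$, there is a $\delta > 0$ such that $B(x,\delta) = \{y : \|y - x\| \le \delta\} \subseteq G$; now there is an $I \in \mathcal{I}$ such that $x \in I \subseteq B(x,\delta)$. Thus $G = \bigcup\mathcal{I}$. But $\mathcal{I}$ is countable, so $G$ is $K_\sigma$. Q Q
(ii) Every closed subset of $\mathbb{R}^r$ is $K_\sigma$, so belongs to $\mathcal{A}$. P P If $F \subseteq \mathbb{R}^r$ is closed, then $F = \bigcup_{n\in\mathbb{N}} F \cap B(0,n)$; but every $F \cap B(0,n)$ is closed and bounded, therefore compact. Q Q
(iii) If $\langle E_n\rangle_{n\in\mathbb{N}}$ is any sequence in $\mathcal{A}$, then $E = \bigcup_{n\in\mathbb{N}} E_n$ belongs to $\mathcal{A}$. P P For each $n \in \mathbb{N}$ we have a countable family $\mathcal{K}_n$ of compact subsets of $E_n$ such that $\nu(E_n \setminus \bigcup\mathcal{K}_n) = 0$; now $\mathcal{K} = \bigcup_{n\in\mathbb{N}} \mathcal{K}_n$ is a countable family of compact subsets of $E$, and $E \setminus \bigcup\mathcal{K} \subseteq \bigcup_{n\in\mathbb{N}} (E_n \setminus \bigcup\mathcal{K}_n)$ is $\nu$-negligible.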
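Q Q
(iv) If $\langle E_n\rangle_{n\in\mathbb{N}}$ is any sequence in $\mathcal{A}$, then $F = \bigcap_{n\in\mathbb{N}} E_n \in \mathcal{A}$. P P For each $n \in \mathbb{N}$, let $\langle K_{ni}\rangle_{i\in\mathbb{N}}$ be a sequence of compact subsets of $E_n$ such that $\nu(E_n \setminus \bigcup_{i\in\mathbb{N}} K_{ni}) = 0$. Set $K'_{nj} = \bigcup_{i\le j} K_{ni}$ for each $j$, so that
$\nu(E_n \cap H) = \lim_{j\to\infty} \nu(K'_{nj} \cap H)$
for every $H \in \Sigma$. Now, for each $m, n \in \mathbb{N}$, choose $j(m,n)$ such that
$\nu(E_n \cap B(0,m) \cap K'_{n,j(m,n)}) \ge \nu(E_n \cap B(0,m)) - 2^{-(m+n)}$.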
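Set $K_m = \bigcap_{n\in\mathbb{N}} K'_{n,j(m,n)}$; then $K_m$ is closed (being an intersection of closed sets) and bounded (being a subset of $K'_{0,j(m,0)}$), therefore compact. Also $K_m \subseteq F$, because $K'_{n,j(m,n)} \subseteq E_n$ for each $n$, and
$\nu(F \cap B(0,m) \setminus K_m) \le \sum_{n=0}^{\infty} \nu(E_n \cap B(0,m) \setminus K'_{n,j(m,n)}) \le \sum_{n=0}^{\infty} 2^{-(m+n)} = 2^{-m+1}$.
Consequently $H = \bigcup_{m\in\mathbb{N}} K_m$ is a $K_\sigma$ subset of $F$ and
$\nu(F \cap B(0,m) \setminus H) \le \inf_{k\ge m} \nu(F \cap B(0,k) \setminus K_k) = 0$
for every $m$, so $\nu(F \setminus H) = 0$ and $F \in \mathcal{A}$. Q Q
(d) $\tilde\Sigma$ is a $\sigma$-algebra of subsets of $\mathbb{R}^r$. P P (i) $\emptyset$ and its complement are open, so belong to $\mathcal{A}$ and therefore to $\tilde\Sigma$. (ii) If $E \in \tilde\Sigma$ then both $\mathbb{R}^r \setminus E$ and $\mathbb{R}^r \setminus (\mathbb{R}^r \setminus E) = E$ belong to $\mathcal{A}$, so $\mathbb{R}^r \setminus E \in \tilde\Sigma$. (iii) Let $\langle E_n\rangle_{n\in\mathbb{N}}$ be a sequence in $\tilde\Sigma$ with union $E$. By (c-iii) and (c-iv), $E \in \mathcal{A}$ and $\mathbb{R}^r \setminus E = \bigcap_{n\in\mathbb{N}} (\mathbb{R}^r \setminus E_n) \in \mathcal{A}$, so $E \in \tilde\Sigma$. Q Q
(e) By (c-i) and (c-ii), every open set belongs to $\tilde\Sigma$; consequently every Borel set belongs to $\tilde\Sigma$ and therefore to $\mathcal{A}$. Now if $E$ is any member of $\Sigma$, there is a Borel set $E_1 \subseteq E$ such that $\nu(E \setminus E_1) = 0$ and a $K_\sigma$ set $H \subseteq E_1$ such that $\nu(E_1 \setminus H) = 0$. Express $H$ as $\bigcup_{n\in\mathbb{N}} K_n$ where every $K_n$ is compact; then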
$\nu E = \nu H = \lim_{n\to\infty} \nu(\bigcup_{i\le n} K_i) \le \sup_{K\subseteq E \text{ is compact}} \nu K \le \nu E$
because $\bigcup_{i\le n} K_i$ is a compact subset of $E$ for every $n$.
(f) Thus $\nu$ is inner regular with respect to the compact sets. But of course it is complete (being the completion of $\nu_0$) and a locally finite topological measure (because $\nu_0$ is); so it is a Radon measure. This completes the proof.

256D Proposition If $\nu$ and $\nu'$ are two Radon measures on $\mathbb{R}^r$, the following are equiveridical:
(i) $\nu = \nu'$;
(ii) $\nu K = \nu' K$ for every compact set $K \subseteq \mathbb{R}^r$;
(iii) $\nu G = \nu' G$ for every open set $G \subseteq \mathbb{R}^r$;
(iv) $\int h\,d\nu = \int h\,d\nu'$ for every continuous function $h : \mathbb{R}^r \to \mathbb{R}$ with bounded support.
proof (a)(i)$\Rightarrow$(iv) is trivial.
(b)(iv)$\Rightarrow$(iii) If (iv) is true, and $G \subseteq \mathbb{R}^r$ is an open set, then for each $n \in \mathbb{N}$ set
$h_n(x) = \min\bigl(1, 2^n \inf_{y \in \mathbb{R}^r \setminus (G \cap B(0,n))} \|y - x\|\bigr)$
for $x \in \mathbb{R}^r$. Then $h_n$ is continuous (in fact $|h_n(x) - h_n(x')| \le 2^n\|x - x'\|$ for all $x, x' \in \mathbb{R}^r$) and zero outside $B(0,n)$, so $\int h_n\,d\nu = \int h_n\,d\nu'$. Next, $\langle h_n(x)\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence converging to $\chi G(x)$ for every $x \in \mathbb{R}^r$. So
$\nu G = \lim_{n\to\infty} \int h_n\,d\nu = \lim_{n\to\infty} \int h_n\,d\nu' = \nu' G$,
by 135Ga. As $G$ is arbitrary, (iii) is true.
(c)(iii)$\Rightarrow$(ii) If (iii) is true, and $K \subseteq \mathbb{R}^r$ is compact, let $n$ be so large that $\|x\| < n$ for every $x \in K$. Set $G = \{x : \|x\| < n\}$, $H = G \setminus K$. Then $G$ and $H$ are open and $G$ is bounded, so $\nu G = \nu' G$ is finite, and $\nu K = \nu G - \nu H = \nu' G - \nu' H = \nu' K$. As $K$ is arbitrary, (ii) is true.
(d)(ii)$\Rightarrow$(i) If $\nu$, $\nu'$ agree on the compact sets, then
$\nu E = \sup_{K\subseteq E \text{ is compact}} \nu K = \sup_{K\subseteq E \text{ is compact}} \nu' K = \nu' E$
for every Borel set $E$. So $\nu\restriction\mathcal{B} = \nu'\restriction\mathcal{B}$, where $\mathcal{B}$ is the $\sigma$-algebra of Borel sets. But since $\nu$ and $\nu'$ are both the completions of their restrictions to $\mathcal{B}$ (256C), they are identical.

256E It is, I suppose, time I gave some examples of Radon measures. However, it will save a few lines if I first establish some basic constructions. You may wish to glance ahead to 256H at this point.
Theorem Let $\nu$ be a Radon measure on $\mathbb{R}^r$, with domain $\Sigma$, and $f$ a non-negative $\Sigma$-measurable function defined on a $\nu$-conegligible subset of $\mathbb{R}^r$. Suppose that $f$ is locally integrable in the sense that $\int_E f\,d\nu < \infty$ for every bounded set $E$. Then the indefinite-integral measure $\nu'$ on $\mathbb{R}^r$ defined by saying that
$\nu' E = \int_E f\,d\nu$ whenever $E \cap \{x : x \in \operatorname{dom} f,\ f(x) > 0\} \in \Sigma$
is a Radon measure on $\mathbb{R}^r$.
proof For the construction of $\nu'$, see 234A-234D. It is a topological measure because every open set belongs to $\Sigma$ and therefore to the domain $\Sigma'$ of $\nu'$. $\nu'$ is locally finite because $f$ is locally integrable. To see that $\nu'$ is inner regular with respect to the compact sets, take any set $E \in \Sigma'$, and set $E' = \{x : x \in E \cap \operatorname{dom} f,\ f(x) > 0\}$. Then $E' \in \Sigma$, so there is a set $H \subseteq E'$, expressible as the union of a sequence of compact sets, such that $\nu(E' \setminus H) = 0$ (256Bc). In this case, because $f = 0$ wherever it is defined on $(E \setminus H) \setminus E'$, and $\nu(E' \setminus H) = 0$,
$\nu'(E \setminus H) = \int_{E\setminus H} f\,d\nu = 0$.
Let $\langle K_n\rangle_{n\in\mathbb{N}}$ be a sequence of compact sets with union $H$; then
$\nu' E = \nu' H = \lim_{n\to\infty} \nu'(\bigcup_{i\le n} K_i) \le \sup_{K\subseteq E \text{ is compact}} \nu' K \le \nu' E$.
As $E$ is arbitrary, $\nu'$ is inner regular with respect to the compact sets.
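Part (b) of the proof of 256D rests on the functions $h_n(x) = \min(1, 2^n \inf_{y \in \mathbb{R}^r \setminus (G \cap B(0,n))} \|y - x\|)$, which increase to $\chi G$. A minimal Python sketch for $r = 1$ lets one watch this happen; it assumes, purely for simplicity, that the open set $G$ is given as a finite list of pairwise disjoint open intervals (endpoints may be infinite), and the function name `h` and this representation are illustrative choices, not part of the text.

```python
import math


def h(n, x, intervals):
    """h_n(x) = min(1, 2**n * dist(x, R \\ (G ∩ [-n, n]))) for r = 1, where G is the union
    of the given pairwise disjoint open intervals (a, b); endpoints may be +-math.inf."""
    dist = 0.0
    for a, b in intervals:
        lo, hi = max(a, -n), min(b, n)   # the part of G ∩ [-n, n] coming from (a, b) lies between lo and hi
        if lo < x < hi:                  # then x is at distance min(x - lo, hi - x) from the complement
            dist = max(dist, min(x - lo, hi - x))
    return min(1.0, 2 ** n * dist)       # outside G ∩ [-n, n] the distance is 0, so h_n(x) = 0


G = [(0.0, 1.0), (2.0, math.inf)]
for x in (0.5, 1.5, 2.001, -3.0):
    print(x, [round(h(n, x, G), 3) for n in range(12)])
```

Each row printed is non-decreasing in $n$; it reaches $1$ at points of $G$ (slowly for $x = 2.001$, which is close to the boundary) and stays at $0$ off $G$, as the proof requires.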
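256F Theorem Let $\nu$ be a Radon measure on $\mathbb{R}^r$, and $\Sigma$ its domain. Let $f : D \to \mathbb{R}$ be a $\Sigma$-measurable function, where $D \subseteq \mathbb{R}^r$. Then for every $\epsilon > 0$ there is a closed set $F \subseteq \mathbb{R}^r$ such that $\nu(\mathbb{R}^r \setminus F) \le \epsilon$ and $f\restriction F$ is continuous.
proof By 121I, there is a $\Sigma$-measurable function $h : \mathbb{R}^r \to \mathbb{R}$ extending $f$. Enumerate $\mathbb{Q}$ as $\langle q_n\rangle_{n\in\mathbb{N}}$. For each $n \in \mathbb{N}$ set $E_n = \{x : h(x) \le q_n\}$, $E'_n = \{x : h(x) > q_n\}$ and use 256Bb to choose closed sets $F_n \subseteq E_n$, $F'_n \subseteq E'_n$ such that $\nu(E_n \setminus F_n) \le 2^{-n-2}\epsilon$, $\nu(E'_n \setminus F'_n) \le 2^{-n-2}\epsilon$. Set $F = \bigcap_{n\in\mathbb{N}} (F_n \cup F'_n)$; then $F$ is closed and
$\nu(\mathbb{R}^r \setminus F) \le \sum_{n=0}^{\infty} \nu(\mathbb{R}^r \setminus (F_n \cup F'_n)) \le \sum_{n=0}^{\infty} \bigl(\nu(E_n \setminus F_n) + \nu(E'_n \setminus F'_n)\bigr) \le \epsilon$.
I claim that $h\restriction F$ is continuous. P P Suppose that $x \in F$ and $\delta > 0$. Then there are $m, n \in \mathbb{N}$ such that
$h(x) - \delta \le q_m < h(x) \le q_n \le h(x) + \delta$.
This means that $x \in E'_m \cap E_n$; consequently $x \notin F_m \cup F'_n$. Because $F_m \cup F'_n$ is closed, there is an $\eta > 0$ such that $y \notin F_m \cup F'_n$ whenever $\|y - x\| \le \eta$. Now suppose that $y \in F$ and $\|y - x\| \le \eta$. Then $y \in (F_m \cup F'_m) \cap (F_n \cup F'_n)$ and $y \notin F_m \cup F'_n$, so $y \in F'_m \cap F_n \subseteq E'_m \cap E_n$ and $q_m < h(y) \le q_n$. Consequently $|h(y) - h(x)| \le \delta$. As $x$ and $\delta$ are arbitrary, $h\restriction F$ is continuous. Q Q
Consequently $f\restriction F = (h\restriction F)\restriction D$ is continuous, as required.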
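The error estimate in the proof of 256F can be written out step by step. The only inclusion used beyond the definitions is $\mathbb{R}^r \setminus (F_n \cup F'_n) \subseteq (E_n \setminus F_n) \cup (E'_n \setminus F'_n)$, which holds because $E_n \cup E'_n = \mathbb{R}^r$; the final line is just the geometric series $\sum_{n\ge 0} 2^{-n-1} = 1$.

```latex
\begin{align*}
\nu(\mathbb{R}^r\setminus F)
  &\le \sum_{n=0}^{\infty}\nu\bigl(\mathbb{R}^r\setminus(F_n\cup F'_n)\bigr)
   && \text{since } \mathbb{R}^r\setminus F=\textstyle\bigcup_{n}\bigl(\mathbb{R}^r\setminus(F_n\cup F'_n)\bigr),\\
  &\le \sum_{n=0}^{\infty}\bigl(\nu(E_n\setminus F_n)+\nu(E'_n\setminus F'_n)\bigr)
   && \text{since } E_n\cup E'_n=\mathbb{R}^r,\\
  &\le \sum_{n=0}^{\infty} 2\cdot 2^{-n-2}\epsilon
   \;=\; \sum_{n=0}^{\infty} 2^{-n-1}\epsilon \;=\; \epsilon .
\end{align*}
```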
256G Theorem Let $\nu$ be a Radon measure on $\mathbb{R}^r$, with domain $\Sigma$, and suppose that $\phi : \mathbb{R}^r \to \mathbb{R}^s$ is measurable in the sense that all its coordinates are $\Sigma$-measurable. If the image measure $\nu' = \nu\phi^{-1}$ (112F) is locally finite, it is a Radon measure.
proof Write $\Sigma$ for the domain of $\nu$ and $\Sigma'$ for the domain of $\nu'$. If $\phi = (\phi_1, \dots, \phi_s)$, then
$\phi^{-1}[\{y : \eta_j \le \alpha\}] = \{x : \phi_j(x) \le \alpha\} \in \Sigma$,
so $\{y : \eta_j \le \alpha\} \in \Sigma'$ for every $j \le s$, $\alpha \in \mathbb{R}$, where I write $y = (\eta_1, \dots, \eta_s)$ for $y \in \mathbb{R}^s$. Consequently every Borel subset of $\mathbb{R}^s$ belongs to $\Sigma'$ (121J), and $\nu'$ is a topological measure. It is complete because if $F$ is $\nu'$-negligible, and $H \subseteq F$, then $\phi^{-1}[H] \subseteq \phi^{-1}[F]$ is $\nu$-negligible, therefore belongs to $\Sigma$ (cf. 211Xd). The point is of course that $\nu'$ is inner regular with respect to the compact sets. P P Suppose that $F \in \Sigma'$ and that $\gamma < \nu' F$. For each $j \le s$, there is a closed set $H_j \subseteq \mathbb{R}^r$ such that $\phi_j\restriction H_j$ is continuous and $\nu(\mathbb{R}^r \setminus H_j) < \frac{1}{s}(\nu' F - \gamma)$, by 256F. Set $H = \bigcap_{j\le s} H_j$; then $H$ is closed and $\phi\restriction H$ is continuous and
$\nu(\mathbb{R}^r \setminus H) < \nu' F - \gamma = \nu\phi^{-1}[F] - \gamma$,
so that $\nu(\phi^{-1}[F] \cap H) > \gamma$. Let $K \subseteq \phi^{-1}[F] \cap H$ be a compact set such that $\nu K \ge \gamma$, and set $L = \phi[K]$. Because $K \subseteq H$ and $\phi\restriction H$ is continuous, $L$ is compact (2A2Eb). Of course $L \subseteq F$, and $\nu' L = \nu\phi^{-1}[L] \ge \nu K \ge \gamma$. As $F$ and $\gamma$ are arbitrary, $\nu'$ is inner regular with respect to the compact sets. Q Q
Since $\nu'$ is locally finite by the hypothesis of the theorem, it is a Radon measure.

256H Examples I come at last to the promised examples.
(a) Lebesgue measure on $\mathbb{R}^r$ is a Radon measure. (It is a topological measure by 115G, and inner regular with respect to the compact sets by 134Fb.)
(b) Let $\langle t_n\rangle_{n\in\mathbb{N}}$ be any sequence in $\mathbb{R}^r$, and $\langle a_n\rangle_{n\in\mathbb{N}}$ any summable sequence in $[0, \infty[$. For every $E \subseteq \mathbb{R}^r$ set
$\nu E = \sum_{n : t_n \in E} a_n$,
so that $\nu$ is a totally finite point-supported measure. Then $\nu$ is a (totally finite) Radon measure on $\mathbb{R}^r$. P P Clearly $\nu$ is complete and defined on every Borel set and gives finite measure to bounded sets.
To see that it is inner regular with respect to the compact sets, observe that for any $E \subseteq \mathbb{R}^r$ the sets $K_n = E \cap \{t_i : i \le n\}$ are compact and $\nu E = \lim_{n\to\infty} \nu K_n$. Q Q
(c) Now we come to a new idea. Recall that the Cantor set $C$ (134G) is a closed negligible subset of $[0,1]$, and that the Cantor function (134H) is a non-decreasing continuous function $f : [0,1] \to [0,1]$ such that $f(0) = 0$, $f(1) = 1$ and $f$ is constant on each of the intervals composing $[0,1] \setminus C$. It follows that if we set $g(x) = \frac{1}{2}(x + f(x))$ for $x \in [0,1]$, then $g : [0,1] \to [0,1]$ is a continuous bijection such that the Lebesgue measure of $g[C]$ is $\frac{1}{2}$ (134I); consequently, because $[0,1]$ is compact, $g^{-1} : [0,1] \to [0,1]$ is continuous. Now extend $g$ to a bijection $h : \mathbb{R} \to \mathbb{R}$ by setting $h(x) = x$ for $x \in \mathbb{R} \setminus [0,1]$. Then $h$ and $h^{-1}$ are continuous. Note that $h[C] = g[C]$ has Lebesgue measure $\frac{1}{2}$. Let $\nu_1$ be the indefinite-integral measure defined from Lebesgue measure $\mu$ on $\mathbb{R}$ and the function $2\chi(h[C])$; that is, $\nu_1 E = 2\mu(E \cap h[C])$ whenever this is defined. By 256E, $\nu_1$ is a Radon measure, and $\nu_1 h[C] = \nu_1\mathbb{R} = 1$. Let $\nu$ be the measure $\nu_1 h$, that is, $\nu E = \nu_1 h[E]$ for just those $E \subseteq \mathbb{R}$ such that $h[E] \in \operatorname{dom}\nu_1$. Then $\nu$ is a Radon probability measure on $\mathbb{R}$, by 256G (applied to $h^{-1}$), and $\nu C = 1$, $\nu(\mathbb{R} \setminus C) = 0 = \mu C$.

256I Remarks (a) The measure $\nu$ of 256Hc, sometimes called Cantor measure, is a classic example, and as such has many constructions, some rather more natural than the one I use here (see 256Xk, and also 264Ym below). But I choose the method above because it yields directly, without further investigation or any appeal to more advanced general theory, the fact that $\nu$ is a Radon measure.
(b) The examples above are chosen to represent the extremes under the ‘Lebesgue decomposition’ described in 232I. If $\nu$ is a (totally finite) Radon measure on $\mathbb{R}^r$, we can use 232Ib to express its restriction $\nu\restriction\mathcal{B}$ to the Borel $\sigma$-algebra as $\nu_p + \nu_{ac} + \nu_{cs}$, where $\nu_p$ is the ‘point-mass’ or ‘atomic’ part of $\nu\restriction\mathcal{B}$, $\nu_{ac}$ is the ‘absolutely continuous’ part (with respect to Lebesgue measure), and $\nu_{cs}$ is the ‘atomless singular part’.
In the example of 256Hb, we have $\nu\restriction\mathcal{B} = \nu_p$; in 256E, if we start from Lebesgue measure, we have $\nu\restriction\mathcal{B} = \nu_{ac}$; and in 256Hc we have $\nu\restriction\mathcal{B} = \nu_{cs}$.

256J Absolutely continuous Radon measures It is worth pausing a moment over the indefinite-integral measures described in 256E.
Proposition Let $\nu$ be a Radon measure on $\mathbb{R}^r$, where $r \ge 1$, and write $\mu$ for Lebesgue measure on $\mathbb{R}^r$. Then the following are equiveridical:
(i) $\nu$ is an indefinite-integral measure over $\mu$;
(ii) $\nu E = 0$ whenever $E$ is a Borel subset of $\mathbb{R}^r$ and $\mu E = 0$.
In this case, if $g \in \mathcal{L}^0(\mu)$ and $\int_E g\,d\mu = \nu E$ for every Borel set $E \subseteq \mathbb{R}^r$, then $g$ is a Radon-Nikodým derivative of $\nu$ with respect to $\mu$ in the sense of 234B.
proof (a)(i)$\Rightarrow$(ii) If $f$ is a Radon-Nikodým derivative of $\nu$ with respect to $\mu$, then of course
$\nu E = \int_E f\,d\mu = 0$
whenever $\mu E = 0$.
(ii)$\Rightarrow$(i) If $\nu E = 0$ for every $\mu$-negligible Borel set $E$, then $\nu E$ is defined and equal to $0$ for every $\mu$-negligible set $E$, because $\nu$ is complete and any $\mu$-negligible set is included in a $\mu$-negligible Borel set. Consequently $\operatorname{dom}\nu$ includes the domain $\Sigma$ of $\mu$, since every Lebesgue measurable set is expressible as the union of a Borel set and a negligible set. For each $n \in \mathbb{N}$ set $E_n = \{x : n \le \|x\| < n+1\}$, so that $\langle E_n\rangle_{n\in\mathbb{N}}$ is a partition of $\mathbb{R}^r$ into bounded Borel sets. Set $\nu_n E = \nu(E \cap E_n)$ for every Lebesgue measurable set $E$ and every $n \in \mathbb{N}$. Now $\nu_n$ is absolutely continuous with respect to $\mu$ (232Ba), so by the Radon-Nikodým theorem (232F) there is a $\mu$-integrable function $f_n$ such that $\int_E f_n\,d\mu = \nu_n E$ for every Lebesgue measurable set $E$. Because $\nu_n E \ge 0$ for every $E \in \Sigma$, $f_n \ge 0$ a.e.; because $\nu_n(\mathbb{R}^r \setminus E_n) = 0$, $f_n = 0$ a.e. on $\mathbb{R}^r \setminus E_n$. Now if we set $f = \max(0, \sum_{n=0}^{\infty} f_n)$, $f$ will be defined $\mu$-a.e. and we shall have
$\int_E f\,d\mu = \sum_{n=0}^{\infty} \int_E f_n\,d\mu = \sum_{n=0}^{\infty} \nu(E \cap E_n) = \nu E$
for every Borel set $E$, so that the indefinite-integral measure $\nu'$ defined by $f$ and $\mu$ agrees with $\nu$ on the Borel sets. Since this ensures that $\nu'$ is locally finite, $\nu'$ is a Radon measure, by 256E, and is equal to $\nu$, by 256D. Accordingly $\nu$ is an indefinite-integral measure over $\mu$.
(b) As in (a-ii) above, $g$ must be locally integrable and the indefinite-integral measure defined by $g$ agrees with $\nu$ on the Borel sets, so is identical with $\nu$.

256K Products The class of Radon measures on Euclidean spaces is stable under a wide variety of operations, as we have already seen; in particular, we have the following.
Theorem Let $\nu_1$, $\nu_2$ be Radon measures on $\mathbb{R}^r$ and $\mathbb{R}^s$ respectively, where $r, s \ge 1$. Let $\lambda$ be their c.l.d. product measure on $\mathbb{R}^r \times \mathbb{R}^s$. Then $\lambda$ is a Radon measure.
Remark When I say that $\lambda$ is ‘Radon’ according to the definition in 256A, I am of course identifying $\mathbb{R}^r \times \mathbb{R}^s$ with $\mathbb{R}^{r+s}$, as in 251L-251M.
proof (a) I hope the following rather voluminous notation will seem natural. Write $\Sigma_1$, $\Sigma_2$ for the domains of $\nu_1$, $\nu_2$; $\mathcal{B}_r$, $\mathcal{B}_s$ for the Borel $\sigma$-algebras of $\mathbb{R}^r$, $\mathbb{R}^s$; $\Lambda$ for the domain of $\lambda$; and $\mathcal{B}$ for the Borel $\sigma$-algebra of $\mathbb{R}^{r+s}$. Because each $\nu_i$ is the completion of its restriction to the Borel sets (256C), $\lambda$ is the product of $\nu_1\restriction\mathcal{B}_r$ and $\nu_2\restriction\mathcal{B}_s$ (251S). Because $\nu_1\restriction\mathcal{B}_r$ and $\nu_2\restriction\mathcal{B}_s$ are $\sigma$-finite (256Ba, 212Ga), $\lambda$ must be the completion of its restriction to $\mathcal{B}_r \widehat{\otimes} \mathcal{B}_s$, which by 251L is identified with $\mathcal{B}$. Setting $Q_n = \{(x,y) : \|x\| \le n,\ \|y\| \le n\}$ we have
$\lambda Q_n = \nu_1\{x : \|x\| \le n\} \cdot \nu_2\{y : \|y\| \le n\} < \infty$
for every $n$, while every bounded subset of $\mathbb{R}^{r+s}$ is included in some $Q_n$. So $\lambda\restriction\mathcal{B}$ is locally finite, and its completion $\lambda$ is a Radon measure, by 256C.

256L Remark We see from 253I that if $\nu_1$ and $\nu_2$ are Radon measures on $\mathbb{R}^r$ and $\mathbb{R}^s$ respectively, and are both indefinite-integral measures over Lebesgue measure, then their product measure on $\mathbb{R}^{r+s}$ is also an indefinite-integral measure over Lebesgue measure.

*256M For the sake of applications in §286 below, I include another result, which is in fact one of the fundamental properties of Radon measures, as will appear in §414.
Proposition Let $\nu$ be a Radon measure on $\mathbb{R}^r$, and $D$ any subset of $\mathbb{R}^r$. Let $\Phi$ be a non-empty upwards-directed family of non-negative continuous functions from $D$ to $\mathbb{R}$.
For $x \in D$ set $g(x) = \sup_{f\in\Phi} f(x)$ in $[0, \infty]$. Then
(a) $g : D \to [0, \infty]$ is lower semi-continuous, therefore Borel measurable;
(b) $\int_D g\,d\nu = \sup_{f\in\Phi} \int_D f\,d\nu$.
proof (a) For any $u \in [-\infty, \infty]$,
$\{x : x \in D,\ g(x) > u\} = \bigcup_{f\in\Phi} \{x : x \in D,\ f(x) > u\}$
is an open set for the subspace topology on $D$ (2A3C), so is the intersection of $D$ with a Borel subset of $\mathbb{R}^r$. This is enough to show that $g$ is Borel measurable (121B-121C).
(b) Accordingly $\int_D g\,d\nu$ will be defined in $[0, \infty]$, and of course $\int_D g\,d\nu \ge \sup_{f\in\Phi} \int_D f\,d\nu$. For the reverse inequality, observe that there is a countable set $\Psi \subseteq \Phi$ such that $g(x) = \sup_{f\in\Psi} f(x)$ for every $x \in D$. P P For $a \in \mathbb{Q}$ and $q, q' \in \mathbb{Q}^r$ set
$\Phi_{aqq'} = \{f : f \in \Phi,\ f(y) > a$ whenever $y \in D \cap [q, q']\}$,
interpreting $[q, q']$ as in 115G. Choose $f_{aqq'} \in \Phi_{aqq'}$ if $\Phi_{aqq'}$ is not empty, and arbitrarily in $\Phi$ otherwise; and set $\Psi = \{f_{aqq'} : a \in \mathbb{Q},\ q, q' \in \mathbb{Q}^r\}$, so that $\Psi$ is a countable subset of $\Phi$. If $x \in D$ and $b < g(x)$, there is an $a \in \mathbb{Q}$ such that $b \le a < g(x)$; there is an $\hat f \in \Phi$ such that $\hat f(x) > a$; because $\hat f$ is continuous, there are $q, q' \in \mathbb{Q}^r$ such that $q \le x \le q'$ and $\hat f(y) > a$ whenever $y \in D \cap [q, q']$; so that $\hat f \in \Phi_{aqq'}$, $\Phi_{aqq'} \ne \emptyset$, $f_{aqq'} \in \Phi_{aqq'}$ and $\sup_{f\in\Psi} f(x) \ge f_{aqq'}(x) \ge b$. As $b$ is arbitrary, $g(x) = \sup_{f\in\Psi} f(x)$. Q Q
Let $\langle f_n\rangle_{n\in\mathbb{N}}$ be a sequence running over $\Psi$. Because $\Phi$ is upwards-directed, we can choose $\langle f'_n\rangle_{n\in\mathbb{N}}$ in $\Phi$ inductively in such a way that $f'_{n+1} \ge \max(f'_n, f_n)$ for every $n \in \mathbb{N}$. So $\langle f'_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence in $\Phi$ and $\sup_{n\in\mathbb{N}} f'_n(x) \ge \sup_{f\in\Psi} f(x) = g(x)$ for every $x \in D$. By B.Levi’s theorem,
$\int_D g\,d\nu \le \sup_{n\in\mathbb{N}} \int_D f'_n\,d\nu \le \sup_{f\in\Phi} \int_D f\,d\nu$,
and we have the required inequality.
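A concrete instance of 256M(b), under the assumptions that $\nu$ is Lebesgue measure on $\mathbb{R}$, $D = ]0,1]$ and $\Phi = \{f_n : n \ge 1\}$ with $f_n(x) = \min(1, nx)$: this family is upwards-directed (indeed totally ordered), its supremum is $g = \chi D$, and $\int_D f_n\,d\nu = \frac{1}{2n} + (1 - \frac{1}{n}) = 1 - \frac{1}{2n}$, whose supremum is $1 = \int_D g\,d\nu$. The Python sketch below checks these integrals numerically with the midpoint rule; the helper names are illustrative only.

```python
def f(n, x):
    """The n-th member of the upwards-directed family: f_n(x) = min(1, n x)."""
    return min(1.0, n * x)


def midpoint_integral(func, a, b, cells):
    """Midpoint-rule approximation to the integral of func over [a, b]."""
    width = (b - a) / cells
    return width * sum(func(a + (k + 0.5) * width) for k in range(cells))


# CELLS is a multiple of every n used below, so the kink of f_n at 1/n is a grid point and
# the midpoint rule is exact (up to rounding) on each linear piece.
CELLS = 1000
for n in (1, 2, 4, 5, 10, 100, 1000):
    approx = midpoint_integral(lambda x: f(n, x), 0.0, 1.0, CELLS)
    print(n, approx, 1 - 1 / (2 * n))
```

The printed integrals increase towards $1$, matching $\sup_{f\in\Phi}\int_D f\,d\nu = \int_D g\,d\nu$.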
256X Basic exercises > (a) Let $\nu$ be a measure on $\mathbb{R}^r$. (i) Show that it is locally finite, in the sense of 256Ab, iff for every $x \in \mathbb{R}^r$ there is a $\delta > 0$ such that $\nu^* B(x,\delta) < \infty$. (Hint: the sets $B(0,n)$ are compact.) (ii) Show that in this case $\nu$ is $\sigma$-finite.
> (b) Let $\nu$ be a Radon measure on $\mathbb{R}^r$ and $\mathcal{G}$ a non-empty upwards-directed family of open sets in $\mathbb{R}^r$. (i) Show that $\nu(\bigcup\mathcal{G}) = \sup_{G\in\mathcal{G}} \nu G$. (Hint: observe that if $K \subseteq \bigcup\mathcal{G}$ is compact, then $K \subseteq G$ for some $G \in \mathcal{G}$.) (ii) Show that $\nu(E \cap \bigcup\mathcal{G}) = \sup_{G\in\mathcal{G}} \nu(E \cap G)$ for every set $E$ which is measured by $\nu$.
> (c) Let $\nu$ be a Radon measure on $\mathbb{R}^r$ and $\mathcal{F}$ a non-empty downwards-directed family of closed sets in $\mathbb{R}^r$ such that $\inf_{F\in\mathcal{F}} \nu F < \infty$. (i) Show that $\nu(\bigcap\mathcal{F}) = \inf_{F\in\mathcal{F}} \nu F$. (Hint: apply 256Xb(ii) to $\mathcal{G} = \{\mathbb{R}^r \setminus F : F \in \mathcal{F}\}$.) (ii) Show that $\nu(E \cap \bigcap\mathcal{F}) = \inf_{F\in\mathcal{F}} \nu(E \cap F)$ for every $E$ in the domain of $\nu$.
> (d) Show that a Radon measure $\nu$ on $\mathbb{R}^r$ is atomless iff $\nu\{x\} = 0$ for every $x \in \mathbb{R}^r$. (Hint: apply 256Xc with $\mathcal{F} = \{F : F \subseteq E$ is closed, not negligible$\}$.)
(e) Let $\nu_1$, $\nu_2$ be Radon measures on $\mathbb{R}^r$, and $\alpha_1, \alpha_2 \in ]0, \infty[$. Set $\Sigma = \operatorname{dom}\nu_1 \cap \operatorname{dom}\nu_2$, and for $E \in \Sigma$ set $\nu E = \alpha_1\nu_1 E + \alpha_2\nu_2 E$. Show that $\nu$ is a Radon measure on $\mathbb{R}^r$. Show that $\nu$ is an indefinite-integral measure over Lebesgue measure iff $\nu_1$, $\nu_2$ are, and that in this case, if $f_1$ and $f_2$ are Radon-Nikodým derivatives of $\nu_1$ and $\nu_2$, then $\alpha_1 f_1 + \alpha_2 f_2$ is a Radon-Nikodým derivative of $\nu$.
> (f) Let $\nu$ be a Radon measure on $\mathbb{R}^r$. (i) Show that there is a unique closed set $F \subseteq \mathbb{R}^r$ such that, for open sets $G \subseteq \mathbb{R}^r$, $\nu G > 0$ iff $G \cap F \ne \emptyset$. ($F$ is called the support of $\nu$.) (ii) Generally, a set $A \subseteq \mathbb{R}^r$ is called self-supporting if $\nu^*(A \cap G) > 0$ whenever $G \subseteq \mathbb{R}^r$ is an open set meeting $A$. Show that for every closed set $F \subseteq \mathbb{R}^r$ there is a unique self-supporting closed set $F' \subseteq F$ such that $\nu(F \setminus F') = 0$.
> (g) Show that a measure $\nu$ on $\mathbb{R}$ is a Radon measure iff it is a Lebesgue-Stieltjes measure as described in 114Xa. Show that in this case $\nu$ is an indefinite-integral measure over Lebesgue measure iff the function $x \mapsto \nu\,]-\infty, x]$ is absolutely continuous on every bounded interval.
(h) Let $\nu$ be a Radon measure on $\mathbb{R}^r$. Let $C_k$ be the space of continuous real-valued functions on $\mathbb{R}^r$ with bounded supports. Show that for every $\nu$-integrable function $f$ and every $\epsilon > 0$ there is a $g \in C_k$ such that $\int |f - g|\,d\nu \le \epsilon$. (Hint: use arguments from 242O, but in (a-i) of the proof there start with closed intervals $I$.)
(i) Let $\nu$ be a Radon measure on $\mathbb{R}^r$. Show that $\nu E = \inf\{\nu G : G \supseteq E$ is open$\}$ for every set $E$ in the domain of $\nu$.
(j) Let $\nu$, $\nu'$ be two Radon measures on $\mathbb{R}^r$, and suppose that $\nu I = \nu' I$ for every half-open interval $I \subseteq \mathbb{R}^r$ (definition: 115Ab). Show that $\nu = \nu'$.
(k) Let $\nu$ be Cantor measure (256Hc).
(i) Show that if $C_n$ is the $n$th set used in the construction of the Cantor set, so that $C_n$ consists of $2^n$ intervals of length $3^{-n}$, then $\nu I = 2^{-n}$ for each of the intervals $I$ composing $C_n$. (ii) Let $\lambda$ be the usual measure on $\{0,1\}^{\mathbb{N}}$ (254J). Define $\phi : \{0,1\}^{\mathbb{N}} \to \mathbb{R}$ by setting $\phi(x) = \frac{2}{3}\sum_{n=0}^{\infty} 3^{-n} x(n)$ for each $x \in \{0,1\}^{\mathbb{N}}$. Show that $\phi$ is a bijection between $\{0,1\}^{\mathbb{N}}$ and $C$. (iii) Show that if $\mathcal{B}$ is the Borel $\sigma$-algebra of $\mathbb{R}$, then $\{\phi^{-1}[E] : E \in \mathcal{B}\}$ is precisely the $\sigma$-algebra of subsets of $\{0,1\}^{\mathbb{N}}$ generated by the sets $\{x : x(n) = i\}$ for $n \in \mathbb{N}$, $i \in \{0,1\}$. (iv) Show that $\phi$ is an isomorphism between $(\{0,1\}^{\mathbb{N}}, \lambda)$ and $(C, \nu_C)$, where $\nu_C$ is the subspace measure on $C$ induced by $\nu$.
(l) Let $\nu$ and $\nu'$ be two Radon measures on $\mathbb{R}^r$. Show that $\nu'$ is an indefinite-integral measure over $\nu$ iff $\nu' E = 0$ whenever $\nu E = 0$, and in this case a function $f$ is a Radon-Nikodým derivative of $\nu'$ with respect to $\nu$ iff $\int_E f\,d\nu = \nu' E$ for every Borel set $E$.
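Exercise 256Xk can be explored numerically. The Python sketch below uses exact rational arithmetic; it evaluates $\phi$ only on finite initial segments of $x \in \{0,1\}^{\mathbb{N}}$ (an assumption: the infinite sum is truncated), and evaluates the Cantor function $f$ of 256Hc from ternary digits. Since $\nu\,]-\infty, t] = f(t)$ for $t \in [0,1]$ (a short computation from the construction in 256Hc), the checks illustrate $f(\phi(x)) = \sum_n 2^{-n-1} x(n)$, the injectivity of $\phi$ on initial segments, and $\nu[0, 3^{-n}] = f(3^{-n}) = 2^{-n}$, the leftmost case of part (i). The helper names `phi` and `cantor_function` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def phi(bits):
    """phi(x) = (2/3) * sum_n 3**(-n) x(n), truncated to the finite initial segment `bits`."""
    return Fraction(2, 3) * sum((Fraction(b, 3 ** n) for n, b in enumerate(bits)), Fraction(0))


def cantor_function(t, depth=64):
    """Cantor function on [0, 1] via ternary digits: a digit 1 ends the expansion, and each
    digit 2 before it contributes the current binary weight. Exact when the ternary expansion
    of t terminates within `depth` digits; otherwise correct to within 2**(-depth)."""
    t = Fraction(t)
    if t >= 1:
        return Fraction(1)
    value, weight = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        t *= 3
        digit = int(t)  # floor, since t >= 0
        t -= digit
        if digit == 1:
            return value + weight
        value += weight * (digit // 2)
        weight /= 2
    return value


print(phi((1, 0, 1)), cantor_function(phi((1, 0, 1))))  # prints 20/27 5/8
```

Here $\phi((1,0,1,0,0,\dots)) = \frac{2}{3} + \frac{2}{27} = \frac{20}{27}$, whose ternary digits are $2, 0, 2$, and the Cantor function turns them into the binary fraction $\frac12 + \frac18 = \frac58$.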
256Y Further exercises (a) Let $\nu$ be a Radon measure on $\mathbb{R}^r$, and $X$ any subset of $\mathbb{R}^r$; let $\nu_X$ be the subspace measure on $X$ and $\Sigma_X$ its domain, and give $X$ its subspace topology (2A3C). Show that $\nu_X$ has the following properties: (i) $\nu_X$ is complete and locally determined; (ii) every open subset of $X$ belongs to $\Sigma_X$; (iii) $\nu_X E = \sup\{\nu_X F : F \subseteq E$ is closed in $X\}$ for every $E \in \Sigma_X$; (iv) whenever $\mathcal{G}$ is a non-empty upwards-directed family of open subsets of $X$, $\nu_X(\bigcup\mathcal{G}) = \sup_{G\in\mathcal{G}} \nu_X G$; (v) every point of $X$ belongs to an open set of finite measure.
(b) Let $\nu$ be a Radon measure on $\mathbb{R}^r$, with domain $\Sigma$, and $f : \mathbb{R}^r \to \mathbb{R}$ a function. Show that the following are equiveridical: (i) $f$ is $\Sigma$-measurable; (ii) for every non-negligible set $E \in \Sigma$ there is a non-negligible $F \in \Sigma$ such that $F \subseteq E$ and $f\restriction F$ is continuous; (iii) for every set $E \in \Sigma$, $\nu E = \sup_{K\in\mathcal{K}_f, K\subseteq E} \nu K$, where $\mathcal{K}_f = \{K : K \subseteq \mathbb{R}^r$ is compact, $f\restriction K$ is continuous$\}$. (Hint: for (ii)$\Rightarrow$(i), take a maximal disjoint family $\mathcal{E} \subseteq \{K : K \in \mathcal{K}_f,\ \nu K > 0\}$; show that $\mathcal{E}$ is countable and that $\bigcup\mathcal{E}$ is conegligible.)
(c) Take $\nu$, $X$, $\nu_X$ and $\Sigma_X$ as in 256Ya. Suppose that $f : X \to \mathbb{R}$ is a function. Show that $f$ is $\Sigma_X$-measurable iff for every non-negligible measurable set $E \subseteq X$ there is a non-negligible measurable $F \subseteq E$ such that $f\restriction F$ is continuous.
(d) Let $\langle\nu_n\rangle_{n\in\mathbb{N}}$ be a sequence of Radon measures on $\mathbb{R}^r$. Show that there is a Radon measure $\nu$ on $\mathbb{R}^r$ such that every $\nu_n$ is an indefinite-integral measure over $\nu$. (Hint: find a sequence $\langle\alpha_n\rangle_{n\in\mathbb{N}}$ of strictly positive numbers such that $\sum_{n=0}^{\infty} \alpha_n \nu_n B(0,k) < \infty$ for every $k$, and set $\nu = \sum_{n=0}^{\infty} \alpha_n \nu_n$, using the idea of 256Xe.)
(e) A set $G \subseteq \mathbb{R}^{\mathbb{N}}$ is open if for every $x \in G$ there are $n \in \mathbb{N}$, $\delta > 0$ such that $\{y : y \in \mathbb{R}^{\mathbb{N}},\ |y(i) - x(i)| < \delta$ for every $i \le n\} \subseteq G$. The Borel $\sigma$-algebra of $\mathbb{R}^{\mathbb{N}}$ is the $\sigma$-algebra $\mathcal{B}$ of subsets of $\mathbb{R}^{\mathbb{N}}$ generated, in the sense of 111Gb, by the family $\mathfrak{T}$ of open sets. (i) Show that $\mathfrak{T}$ is a topology (2A3A). (ii) Show that a filter $\mathcal{F}$ on $\mathbb{R}^{\mathbb{N}}$ converges to $x \in \mathbb{R}^{\mathbb{N}}$ iff $\pi_i[[\mathcal{F}]] \to x(i)$ for every $i \in \mathbb{N}$, where $\pi_i(y) = y(i)$ for $i \in \mathbb{N}$, $y \in \mathbb{R}^{\mathbb{N}}$. (iii) Show that $\mathcal{B}$ is the $\sigma$-algebra generated by sets of the form $\{x : x \in \mathbb{R}^{\mathbb{N}},\ x(i) \le a\}$, where $i$ runs through $\mathbb{N}$ and $a$ runs through $\mathbb{R}$. (iv) Show that if $\alpha_i \ge 0$ for every $i \in \mathbb{N}$, then $\{x : |x(i)| \le \alpha_i\ \forall\, i \in \mathbb{N}\}$ is compact. (Hint: 2A3R.) (v) Show that any open set in $\mathbb{R}^{\mathbb{N}}$ is the union of a sequence of closed sets. (Hint: look at sets of the form $\{x : q_i \le x(i) \le q'_i\ \forall\, i \le n\}$, where $q_i, q'_i \in \mathbb{Q}$ for $i \le n$.) (vi) Show that if $\nu_0$ is any probability measure with domain $\mathcal{B}$, then its completion $\nu$ is inner regular with respect to the compact sets, and therefore may be called a ‘Radon measure on $\mathbb{R}^{\mathbb{N}}$’. (Hint: show that there are compact sets of measure arbitrarily close to 1, and therefore that every open set, and every closed set, includes a $K_\sigma$ set of the same measure.)

256 Notes and comments Radon measures on Euclidean spaces are very special, and the results of this section do not give clear pointers to the direction the theory takes when applied to other kinds of topological space. With the material here you could make a stab at developing a theory of Radon measures on separable complete metric spaces, provided you use 256Xa as the basis for your definition of ‘locally finite’. These are the spaces for which a version of 256C is true (see 256Ye). But for generalizations to other types of topological space, and for the more interesting parts of the theory on $\mathbb{R}^r$, I must ask you to wait for Volume 4.
My purpose in introducing Radon measures here is strictly limited; I wish only to give a basis for §257 and §271 sufficiently solid not to need later revision. In fact I think that all we really need are the Radon probability measures. The chief technical difficulty in the definition of ‘Radon measure’ here lies in the insistence on completeness. It may well be that for everything studied in this volume, it would be simpler to look at locally finite measures with domain the algebra of Borel sets. This would involve us in a number of circumlocutions when dealing with Lebesgue measure itself and its derivates, since Lebesgue measure is defined on a larger $\sigma$-algebra; but the serious objection arises in the more advanced theory, when non-Borel sets of various kinds become central. Since my aim in this book is to provide secure foundations for the study of all aspects of measure theory, I ask you to take a little extra trouble now in order to avoid the possibility of having to re-work all your ideas later. The extra trouble arises, for instance, in 256D, 256Xe and 256Xj; since different Radon measures are defined on different $\sigma$-algebras, we have to check that two Radon measures which agree on the compact sets, or on the open sets, have the same domains. On the credit side, some of the power of 256G arises from the fact that the Radon image measure $\nu\phi^{-1}$ is defined on the whole $\sigma$-algebra $\{F : \phi^{-1}[F] \in \operatorname{dom}\nu\}$, not just on the Borel sets. The further technical point that Radon measures are expected to be locally finite gives less difficulty; its effect is that from most points of view there is little difference between a general Radon measure and a totally finite Radon measure. The extra condition which obviously has to be put into the hypotheses of such results as 256E and 256G is no burden on either intuition or memory.
In effect, we have two definitions of Radon measures on Euclidean spaces: they are the inner regular locally finite topological measures, and they are also the completions of the locally finite Borel measures. The equivalence of these definitions is Theorem 256C. The latter definition is the better adapted to 256K, and the former to 256G. The ‘inner regularity’ of the basic definition refers to compact sets; we also have forms of inner regularity with respect to closed sets (256Bb) and Kσ sets (256Bc), and a complementary notion of ‘outer regularity’ with respect to open sets (256Xi).
257 Convolutions of measures

The ideas of this chapter can be brought together in a satisfying way in the theory of convolutions of Radon measures, which will be useful in §272 and again in §285. I give just the definition (257A) and the central property (257B) of the convolution of totally finite Radon measures, with a few corollaries and a note on the relation between convolution of functions and convolution of measures (257F).

257A Definition Let $r \ge 1$ be an integer and $\nu_1$, $\nu_2$ two totally finite Radon measures on $\mathbb{R}^r$. Let $\lambda$ be the product measure on $\mathbb{R}^r \times \mathbb{R}^r$; then $\lambda$ is also a (totally finite) Radon measure, by 256K. Define $\phi : \mathbb{R}^r \times \mathbb{R}^r \to \mathbb{R}^r$ by setting $\phi(x,y) = x + y$; then $\phi$ is continuous, therefore measurable in the sense of 256G. The convolution of $\nu_1$ and $\nu_2$, $\nu_1 * \nu_2$, is the image measure $\lambda\phi^{-1}$; by 256G, this is a Radon measure. Note that if $\nu_1$ and $\nu_2$ are Radon probability measures, then $\lambda$ and $\nu_1 * \nu_2$ are also probability measures.

257B Theorem Let $r \ge 1$ be an integer, and $\nu_1$ and $\nu_2$ two totally finite Radon measures on $\mathbb{R}^r$; let $\nu = \nu_1 * \nu_2$ be their convolution, and $\lambda$ their product on $\mathbb{R}^r \times \mathbb{R}^r$. Then for any real-valued function $h$ defined on a subset of $\mathbb{R}^r$,
$\int h(x+y)\,\lambda(d(x,y)) = \int h(x)\,\nu(dx)$
if either integral is defined in $[-\infty, \infty]$.
proof Apply 235L with $J(x,y) = 1$, $\phi(x,y) = x + y$ for all $x, y \in \mathbb{R}^r$.

257C Corollary Let $r \ge 1$ be an integer, and $\nu_1$, $\nu_2$ two totally finite Radon measures on $\mathbb{R}^r$; let $\nu = \nu_1 * \nu_2$ be their convolution, and $\lambda$ their product on $\mathbb{R}^r \times \mathbb{R}^r$; write $\Lambda$ for the domain of $\lambda$. Let $h$ be a $\Lambda$-measurable function defined $\lambda$-almost everywhere in $\mathbb{R}^r$. Suppose that any one of the integrals
$\iint |h(x+y)|\,\nu_1(dx)\nu_2(dy)$, $\quad\iint |h(x+y)|\,\nu_2(dy)\nu_1(dx)$, $\quad\int h(x+y)\,\lambda(d(x,y))$
exists and is finite. Then $h$ is $\nu$-integrable and
$\int h(x)\,\nu(dx) = \iint h(x+y)\,\nu_1(dx)\nu_2(dy) = \iint h(x+y)\,\nu_2(dy)\nu_1(dx)$.
proof Put 257B together with Fubini’s and Tonelli’s theorems (252H).
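For point-supported measures with finitely many atoms (a special case of 256Hb), the image measure of 257A can be computed directly, which gives a quick sanity check of the identity in 257B and of the commutativity and associativity proved in 257D-257E. A minimal Python sketch, assuming measures on $\mathbb{R}$ are represented as dicts from points to masses (a hypothetical representation chosen for illustration; `convolve` and `integral` are not names from the text):

```python
from collections import defaultdict
from fractions import Fraction


def convolve(nu1, nu2):
    """Image of the product measure nu1 x nu2 under (x, y) -> x + y,
    for measures with finitely many atoms, given as {point: mass} dicts."""
    out = defaultdict(Fraction)
    for x, p in nu1.items():
        for y, q in nu2.items():
            out[x + y] += p * q  # the atom (x, y) of the product is carried to x + y
    return dict(out)


def integral(h, nu):
    """Integral of h with respect to a measure with finitely many atoms."""
    return sum((h(x) * p for x, p in nu.items()), Fraction(0))


coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
die = {k: Fraction(1, 6) for k in range(1, 7)}
nu = convolve(coin, die)

h = lambda t: t * t
# 257B: integrating h against the convolution equals integrating h(x + y) against the product.
lhs = integral(h, nu)
rhs = sum((h(x + y) * p * q for x, p in coin.items() for y, q in die.items()), Fraction(0))
print(lhs, rhs)
```

Exact `Fraction` arithmetic is used so that the two sides, and the measures produced by different bracketings, can be compared for exact equality.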
257D Corollary If ν1 and ν2 are totally finite Radon measures on R^r, then ν1 ∗ ν2 = ν2 ∗ ν1.
proof For any Borel set E ⊆ R^r, apply 257C to h = χE to see that
$$(\nu_1*\nu_2)(E) = \iint \chi E(x+y)\,\nu_1(dx)\,\nu_2(dy) = \iint \chi E(x+y)\,\nu_2(dy)\,\nu_1(dx) = \iint \chi E(y+x)\,\nu_2(dy)\,\nu_1(dx) = (\nu_2*\nu_1)(E).$$
Thus ν1 ∗ ν2 and ν2 ∗ ν1 agree on the Borel sets of R^r; because they are both Radon measures, they must be identical (256D).

257E Corollary If ν1, ν2 and ν3 are totally finite Radon measures on R^r, then (ν1 ∗ ν2) ∗ ν3 = ν1 ∗ (ν2 ∗ ν3).
proof For any Borel set E ⊆ R^r, apply 257B to h = χE to see that
$$((\nu_1*\nu_2)*\nu_3)(E) = \iint \chi E(x+z)\,(\nu_1*\nu_2)(dx)\,\nu_3(dz) = \iiint \chi E(x+y+z)\,\nu_1(dx)\,\nu_2(dy)\,\nu_3(dz)$$
(because x ↦ χE(x + z) is Borel measurable for every z)
$$= \iint \chi E(x+y)\,\nu_1(dx)\,(\nu_2*\nu_3)(dy)$$
(because (x, y) ↦ χE(x + y) is Borel measurable, so y ↦ ∫χE(x + y)ν1(dx) is (ν2 ∗ ν3)-integrable)
$$= (\nu_1*(\nu_2*\nu_3))(E).$$
Thus (ν1 ∗ ν2) ∗ ν3 and ν1 ∗ (ν2 ∗ ν3) agree on the Borel sets of R^r; because they are both Radon measures, they must be identical.

257F Theorem Suppose that ν1 and ν2 are totally finite Radon measures on R^r which are indefinite-integral measures over Lebesgue measure µ. Then ν1 ∗ ν2 is also an indefinite-integral measure over µ; if f1 and f2 are Radon-Nikodým derivatives of ν1, ν2 respectively, then f1 ∗ f2 is a Radon-Nikodým derivative of ν1 ∗ ν2.
proof By 255H (see the remark in 255L), f1 ∗ f2 is integrable with respect to µ, with $\int f_1*f_2\,d\mu = \int f_1\,d\mu\cdot\int f_2\,d\mu = \nu_1\mathbb{R}^r\cdot\nu_2\mathbb{R}^r$, and of course f1 ∗ f2 is non-negative. If E ⊆ R^r is a Borel set,
$$\int_E f_1*f_2\,d\mu = \iint \chi E(x+y)\,f_1(x)\,f_2(y)\,\mu(dx)\,\mu(dy)$$
(by 255G)
$$= \iint \chi E(x+y)\,f_2(y)\,\nu_1(dx)\,\mu(dy)$$
(because x ↦ χE(x + y) is Borel measurable)
$$= \iint \chi E(x+y)\,\nu_1(dx)\,\nu_2(dy)$$
(because (x, y) ↦ χE(x + y) is Borel measurable, so y ↦ ∫χE(x + y)ν1(dx) is ν2-integrable)
$$= (\nu_1*\nu_2)(E).$$
So f1 ∗ f2 is a Radon-Nikodým derivative of ν1 ∗ ν2 with respect to µ, by 256J.

257X Basic exercises >(a) Let r ≥ 1 be an integer. Let δ0 be the Radon probability measure on R^r such that δ0{0} = 1. Show that δ0 ∗ ν = ν for every totally finite Radon measure ν on R^r.
(b) Let µ and ν be totally finite Radon measures on R^r, and E any set measured by their convolution µ ∗ ν. Show that ∫µ(E − y)ν(dy) is defined in [0, ∞] and equal to (µ ∗ ν)(E).
(c) Let ν1, . . . , νn be totally finite Radon measures on R^r, and let ν be the convolution ν1 ∗ . . . ∗ νn (using 257E to see that such a bracketless expression is legitimate). Show that
$$\int h(x)\,\nu(dx) = \int\!\cdots\!\int h(x_1+\ldots+x_n)\,\nu_1(dx_1)\ldots\nu_n(dx_n)$$
for every ν-integrable function h.
(d) Let ν1 and ν2 be totally finite Radon measures on R^r, with supports F1, F2 (256Xf). Show that the support of ν1 ∗ ν2 is {x + y : x ∈ F1, y ∈ F2}.
>(e) Let ν1 and ν2 be totally finite Radon measures on R^r, and suppose that ν1 has a Radon-Nikodým derivative f with respect to Lebesgue measure µ. Show that ν1 ∗ ν2 has a Radon-Nikodým derivative g, where g(x) = ∫f(x − y)ν2(dy) for µ-almost every x ∈ R^r.
(f) Suppose that ν1, ν2, ν1′ and ν2′ are totally finite Radon measures on R^r, and that ν1′, ν2′ are absolutely continuous with respect to ν1, ν2 respectively. Show that ν1′ ∗ ν2′ is absolutely continuous with respect to ν1 ∗ ν2.

257Y Further exercises (a) Let M be the space of countably additive functionals defined on the algebra B of Borel subsets of R, with its norm ‖ν‖ = |ν|(R) (see 231Yh). (i) Show that we have a unique bilinear operator ∗ : M × M → M such that (µ1↾B) ∗ (µ2↾B) = (µ1 ∗ µ2)↾B for all totally finite Radon measures µ1, µ2 on R. (ii) Show that ∗ is commutative and associative. (iii) Show that ‖ν1 ∗ ν2‖ ≤ ‖ν1‖‖ν2‖ for all ν1, ν2 ∈ M, so that M is a Banach algebra under this multiplication. (iv) Show that M has a multiplicative identity. (v) Show that L^1(µ) can be regarded as a closed subalgebra of M, where µ is Lebesgue measure on R (cf. 255Xc).
(b) Let us say that a Radon measure on ]−π, π] is a measure ν, with domain Σ, on ]−π, π] such that (i) every Borel subset of ]−π, π] belongs to Σ (ii) for every E ∈ Σ there are Borel sets E1, E2 such that E1 ⊆ E ⊆ E2 and ν(E2 \ E1) = 0 (iii) every compact subset of ]−π, π] has finite measure. Show that for any two totally finite Radon measures ν1, ν2 on ]−π, π] there is a unique totally finite Radon measure ν on ]−π, π] such that
$$\int h(x)\,\nu(dx) = \iint h(x +_{2\pi} y)\,\nu_1(dx)\,\nu_2(dy)$$
for every ν-integrable function h, where +2π is defined as in 255Ma.

257 Notes and comments Of course convolution of functions and convolution of measures are very closely connected; the obvious link is 257F, but the correspondence between 255G and 257B is also very marked. In effect, they give us the same notion of convolution u ∗ v when u, v are positive members of L^1 and u ∗ v is interpreted in L^1 rather than as a function (257Ya). But we should have to go rather deeper than the arguments here to find ideas in the theory of convolution of measures to correspond to such results as 255K.
I will return to questions of this type in §444 in Volume 4. All the theorems of this section can be extended to general abelian locally compact Hausdorff topological groups; but for such generality we need much more advanced ideas (see §444), and for the moment I leave only the suggestion in 257Yb that you should try to adapt the ideas here to ]−π, π] or S^1.
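257F can be seen concretely for two copies of the uniform distribution on [0, 1]. Its Radon-Nikodým derivative with respect to Lebesgue measure is the indicator f of [0, 1], and the convolution f ∗ f(x) = ∫f(x − y)f(y)dy is the 'triangle' function, equal to x on [0, 1] and to 2 − x on [1, 2]. The Python sketch below approximates the integrals by the midpoint rule; that quadrature is an assumption of this illustration only, and nothing in 257F depends on it. It also checks that the total mass of f ∗ f is close to ν1R · ν2R = 1.

```python
def indicator01(t):
    """Radon-Nikodym derivative of the uniform probability measure on [0, 1]."""
    return 1.0 if 0.0 <= t <= 1.0 else 0.0

def midpoint(g, a, b, n):
    """Midpoint-rule approximation to the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def conv_density(f1, f2, x, n=2000):
    """Approximate (f1 * f2)(x) = integral of f1(x - y) f2(y) dy; f2 vanishes off [0, 1]."""
    return midpoint(lambda y: f1(x - y) * f2(y), 0.0, 1.0, n)

def triangle(x):
    """Closed form of indicator01 * indicator01."""
    return max(0.0, 1.0 - abs(x - 1.0))

for x in (0.3, 1.0, 1.5):
    print(x, conv_density(indicator01, indicator01, x), triangle(x))

# Total mass of the convolution density; it should be close to 1 * 1.
total = midpoint(lambda x: conv_density(indicator01, indicator01, x, 500), 0.0, 2.0, 200)
print(total)
```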
Chapter 26 Change of Variable in the Integral
I suppose most courses on basic calculus still devote a substantial amount of time to practice in the techniques of integrating standard functions. Surely the most powerful single technique is that of substitution: replacing ∫g(y)dy by ∫g(φ(x))φ′(x)dx for an appropriate function φ. At this level one usually concentrates on the skills of guessing at appropriate φ and getting the formulae right. I will not address such questions here, except for rare special cases; in this book I am concerned rather with validating the process. For functions of one variable, it can usually be justified by an appeal to the fundamental theorem of calculus, and for any particular case I would normally go first to §225 in the hope that the results there would cover it. But for functions of two or more variables some much deeper ideas are necessary. I have already treated the general problem of integration-by-substitution in abstract measure spaces in §235. There I described conditions under which ∫g(y)dy = ∫g(φ(x))J(x)dx for an appropriate function J. The context there gave very little scope for suggestions as to how to compute J; at best, it could be presented as a Radon-Nikodým derivative (235O). In this chapter I give a form of the fundamental theorem for the case of Lebesgue measure, in which φ is a more or less differentiable function between Euclidean spaces, and J is a ‘Jacobian’, the modulus of the determinant of the derivative of φ (263D). This necessarily depends on a serious investigation of the relationship between Lebesgue measure and geometry. The first step is to establish a form of Vitali’s theorem for r-dimensional space, together with r-dimensional density theorems; I do this in §261, following closely the scheme of §§221 and 223 above.
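In one variable, the substitution rule ∫_{φ(a)}^{φ(b)} g(y)dy = ∫_a^b g(φ(x))φ′(x)dx can at least be checked numerically. The Python sketch below does so for the illustrative (assumed) choices g = cos and φ(x) = x² on [0, 1], where both sides equal sin 1. It is only a sanity check of the formula that this chapter sets out to justify in r dimensions; it forms no part of that justification.

```python
import math

def midpoint(f, a, b, n=10_000):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

g = math.cos                 # integrand in the variable y
phi = lambda x: x * x        # substitution y = phi(x)
dphi = lambda x: 2.0 * x     # its derivative phi'(x)

a, b = 0.0, 1.0
lhs = midpoint(g, phi(a), phi(b))                      # integral of g(y) dy over [phi(a), phi(b)]
rhs = midpoint(lambda x: g(phi(x)) * dphi(x), a, b)    # integral of g(phi(x)) phi'(x) dx over [a, b]
print(lhs, rhs, math.sin(1.0))
```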
We need to know quite a lot about differentiable functions between Euclidean spaces, and it turns out that the theory is intertwined with that of ‘Lipschitz’ functions; I treat these in §262. In the last two sections of the chapter, I turn to a separate problem for which some of the same techniques turn out to be appropriate: the description of surface measure on (smooth) surfaces in Euclidean space, like the surface of a cone or sphere. I suppose there is no difficulty in forming a robust intuition as to what is meant by the ‘area’ of such a surface and of suitably simple regions within it, and there is a very strong presumption that there ought to be an expression for this intuition in terms of measure theory as presented in this book; but the details are not I think straightforward. The first point to note is that for any calculation of the area of a region G in a surface S, one would always turn at once to a parametrization of the region, that is, a bijection φ : D → G from some subset D of Euclidean space. But obviously one needs to be sure that the result of the calculation is independent of the parametrization chosen, and while it would be possible to base the theory on results showing such independence directly, that does not seem to me to be a true reflection of the underlying intuition, which is that the area of simple surfaces, at least, is something intrinsic to their geometry. I therefore see no acceptable alternative to a theory of ‘r-dimensional measure’ which can be described in purely geometric terms. This is the burden of §264, in which I give the definition and most fundamental properties of Hausdorff r-dimensional measure in Euclidean spaces. With this established, we find that the techniques of §§261-263 are sufficient to relate it to calculations through parametrizations, which is what I do in §265.
261 Vitali’s theorem in R^r The main aim of this section is to give r-dimensional versions of Vitali’s theorem and Lebesgue’s Density Theorem, following ideas already presented in §§221 and 223.

261A Notation For most of this chapter, we shall be dealing with the geometry and measure of Euclidean space; it will save space to fix some notation. Throughout this section and the two following, r ≥ 1 will be an integer. I will use Roman letters for members of R^r and Greek letters for their coordinates, so that a = (α1, . . . , αr), etc.; if you see any Greek letter with a subscript you should look first for a nearby vector of which it might be a coordinate. The measure under consideration will nearly always be Lebesgue measure on R^r; so unless otherwise indicated µ should be interpreted as Lebesgue measure, and µ* as Lebesgue outer measure. Similarly, ∫ . . . dx will always be integration with respect to Lebesgue measure (in a dimension determined by the context).
For x = (ξ1, . . . , ξr) ∈ R^r, write $\|x\| = \sqrt{\xi_1^2+\ldots+\xi_r^2}$. Recall that ‖x + y‖ ≤ ‖x‖ + ‖y‖ (1A2C) and that ‖αx‖ = |α|‖x‖ for any vectors x, y and scalar α. I will use the same notation as in §115 for ‘intervals’, so that, in particular,
[a, b[ = {x : αi ≤ ξi < βi ∀ i ≤ r}, ]a, b[ = {x : αi < ξi < βi ∀ i ≤ r}, [a, b] = {x : αi ≤ ξi ≤ βi ∀ i ≤ r}
whenever a, b ∈ R^r. 0 = (0, . . . , 0) will be the zero vector in R^r, and 1 will be (1, . . . , 1). If x ∈ R^r and δ > 0, B(x, δ) will be the closed ball with centre x and radius δ, that is, {y : y ∈ R^r, ‖y − x‖ ≤ δ}. Note that B(x, δ) = x + B(0, δ); so that by the translation-invariance of Lebesgue measure we have µB(x, δ) = µB(0, δ) = βr δ^r, where
$$\beta_r = \frac{\pi^k}{k!}\ \text{ if } r = 2k \text{ is even},\qquad \beta_r = \frac{2^{2k+1}\,k!\,\pi^k}{(2k+1)!}\ \text{ if } r = 2k+1 \text{ is odd}$$
(252Q).

261B Vitali’s theorem in R^r Let A ⊆ R^r be any set, and I a family of closed non-trivial (that is, non-singleton, or, equivalently, non-negligible) balls in R^r such that every point of A is contained in arbitrarily small members of I. Then there is a countable disjoint set I0 ⊆ I such that µ(A \ ⋃I0) = 0.
proof (a) To begin with (down to the end of (f) below), suppose that ‖x‖ < M for every x ∈ A, and set I′ = {I : I ∈ I, I ⊆ B(0, M)}. If there is a finite disjoint set I0 ⊆ I′ such that A ⊆ ⋃I0 (including the possibility that A = I0 = ∅), we can stop. So let us suppose henceforth that there is no such I0.
(b) In this case, if I0 is any finite disjoint subset of I′, there is a J ∈ I′ which is disjoint from any member of I0. P P Take x ∈ A \ ⋃I0. Because every member of I0 is closed, there is a δ > 0 such that B(x, δ) does not meet any member of I0, and as ‖x‖ < M we can suppose that B(x, δ) ⊆ B(0, M). Let J be a member of I, containing x, and of diameter at most δ; then J ∈ I′ and J ∩ ⋃I0 = ∅. Q Q
(c) We can therefore choose a sequence ⟨γn⟩n∈N of real numbers and a disjoint sequence ⟨In⟩n∈N in I′ inductively, as follows. Given ⟨Ij⟩j 0, then there are q, q′ ∈ Q such that f(x) − ε ≤ q ≤ f(x) ≤ q′ ≤ f(x) + ε, and now
$$\liminf_{\delta\downarrow0}\frac{\mu^*\{y : y\in D\cap B(x,\delta),\ |f(y)-f(x)|\le\epsilon\}}{\mu B(x,\delta)} \ge \liminf_{\delta\downarrow0}\frac{\mu^*(D_{qq'}\cap B(x,\delta))}{\mu B(x,\delta)} = 1,$$
so
$$\lim_{\delta\downarrow0}\frac{\mu^*\{y : y\in D\cap B(x,\delta),\ |f(y)-f(x)|\le\epsilon\}}{\mu B(x,\delta)} = 1.$$
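(d) Define C as in (c). We know from (a) that µ(D \ C′) = 0, where
$$C' = \Bigl\{x : x\in D,\ \lim_{\delta\downarrow0}\frac{\mu^*(D\cap B(x,\delta))}{\mu B(x,\delta)} = 1\Bigr\}.$$
If x ∈ C ∩ C′ and ε > 0, we know from (c) that
$$\lim_{\delta\downarrow0}\frac{\mu^*\{y : y\in D\cap B(x,\delta),\ |f(y)-f(x)|\le\frac12\epsilon\}}{\mu B(x,\delta)} = 1.$$
But because f is measurable, we have
$$\mu^*\{y : y\in D\cap B(x,\delta),\ |f(y)-f(x)|\ge\epsilon\} + \mu^*\{y : y\in D\cap B(x,\delta),\ |f(y)-f(x)|\le\tfrac12\epsilon\} \le \mu^*(D\cap B(x,\delta))$$
for every δ > 0. Accordingly
$$\limsup_{\delta\downarrow0}\frac{\mu^*\{y : y\in D\cap B(x,\delta),\ |f(y)-f(x)|\ge\epsilon\}}{\mu B(x,\delta)} \le \lim_{\delta\downarrow0}\frac{\mu^*(D\cap B(x,\delta))}{\mu B(x,\delta)} - \lim_{\delta\downarrow0}\frac{\mu^*\{y : y\in D\cap B(x,\delta),\ |f(y)-f(x)|\le\frac12\epsilon\}}{\mu B(x,\delta)} = 0,$$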
and
$$\lim_{\delta\downarrow0}\frac{\mu^*\{y : y\in D\cap B(x,\delta),\ |f(y)-f(x)|\ge\epsilon\}}{\mu B(x,\delta)} = 0$$
for every x ∈ C ∩ C′, that is, for almost every x ∈ D.

261E Theorem Let D be a subset of R^r, and f a real-valued function which is integrable over D. Then
$$\lim_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{D\cap B(x,\delta)}|f(y)-f(x)|\,dy = 0$$
for almost every x ∈ D.
proof (Compare 223D.) (a) Suppose first that D is bounded. For each q ∈ Q, set gq(x) = |f(x) − q| for x ∈ D ∩ dom f; then gq is integrable over D, and
$$\lim_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{D\cap B(x,\delta)} g_q = g_q(x)$$
for almost every x ∈ D, by 261C. Setting
$$E_q = \Bigl\{x : x\in D\cap\operatorname{dom}f,\ \lim_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{D\cap B(x,\delta)} g_q = g_q(x)\Bigr\},$$
we have D \ Eq negligible for every q, so D \ E is negligible, where E = ⋂_{q∈Q} Eq. Now
$$\lim_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{D\cap B(x,\delta)}|f(y)-f(x)|\,dy = 0$$
for every x ∈ E. P P Take x ∈ E and ε > 0. Then there is a q ∈ Q such that |f(x) − q| ≤ ε, so that |f(y) − f(x)| ≤ |f(y) − q| + ε = gq(y) + ε for every y ∈ D ∩ dom f, and
$$\limsup_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{D\cap B(x,\delta)}|f(y)-f(x)|\,dy \le \limsup_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{D\cap B(x,\delta)}\bigl(g_q(y)+\epsilon\bigr)\,dy \le \epsilon + g_q(x) \le 2\epsilon.$$
As ε is arbitrary,
$$\lim_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{D\cap B(x,\delta)}|f(y)-f(x)|\,dy = 0,$$
as required. Q Q
(b) For unbounded sets D, apply (a) to D ∩ B(0, n) for each n ∈ N.
Remark The set
$$\Bigl\{x : x\in\operatorname{dom}f,\ \lim_{\delta\downarrow0}\frac{1}{\mu B(x,\delta)}\int_{D\cap B(x,\delta)}|f(y)-f(x)|\,dy = 0\Bigr\}$$
is sometimes called the Lebesgue set of f.

261F
Another very useful consequence of 261B is the following.
Proposition Let A ⊆ R^r be any set, and ε > 0. Then there is a sequence ⟨Bn⟩n∈N of closed balls in R^r, all of radius at most ε, such that A ⊆ ⋃_{n∈N} Bn and $\sum_{n=0}^\infty \mu B_n \le \mu^*A + \epsilon$. Moreover, we may suppose that the balls in the sequence whose centres do not lie in A have measures summing to at most ε.
proof (a) Set βr = µB(0, 1). The first step is the obvious remark that if x ∈ R^r, δ > 0 then the half-open cube I = [x, x + δ1[ is a subset of the ball B(x, δ√r), which has measure γr δ^r = γr µI, where γr = βr r^{r/2}. It follows that if G ⊆ R^r is any open set, then G can be covered by a sequence of balls of total measure at most γr µG. P P If G is empty, we can take all the balls to be singletons. Otherwise, for each k ∈ N, set
$$Q_k = \{z : z\in\mathbb{Z}^r,\ [2^{-k}z, 2^{-k}(z+\mathbf{1})[\ \subseteq G\},$$
$$E_k = \bigcup_{z\in Q_k}\,[2^{-k}z, 2^{-k}(z+\mathbf{1})[.$$
Then ⟨Ek⟩k∈N is a non-decreasing sequence of sets with union G, and E0 and each of the differences Ek+1 \ Ek are expressible as disjoint unions of half-open cubes. Thus G also is expressible as a disjoint union of a sequence ⟨In⟩n∈N of half-open cubes. Each In is covered by a ball Bn of measure γr µIn; so that G ⊆ ⋃_{n∈N} Bn and $\sum_{n=0}^\infty \mu B_n \le \gamma_r\sum_{n=0}^\infty \mu I_n = \gamma_r\mu G$. Q Q
(b) It follows at once that if µA = 0 then for any ε > 0 there is a sequence ⟨Bn⟩n∈N of balls covering A of measures summing to at most ε, because there is certainly an open set including A with measure at most ε/γr.
(c) Now take any set A, and ε > 0. Let G ⊇ A be an open set with µG ≤ µ*A + ½ε. Let I be the family of non-trivial closed balls included in G, of radius at most ε and with centres in A. Then every point of A belongs to arbitrarily small members of I, so there is a countable disjoint I0 ⊆ I such that µ(A \ ⋃I0) = 0. Let ⟨B′n⟩n∈N be a sequence of balls covering A \ ⋃I0 with $\sum_{n=0}^\infty \mu B'_n \le \min(\frac12\epsilon, \beta_r\epsilon^r)$; these surely all have radius at most ε. Let ⟨Bn⟩n∈N be a sequence amalgamating I0 with ⟨B′n⟩n∈N; then A ⊆ ⋃_{n∈N} Bn, every Bn has radius at most ε and
$$\sum_{n=0}^\infty \mu B_n = \sum_{B\in I_0}\mu B + \sum_{n=0}^\infty \mu B'_n \le \mu G + \tfrac12\epsilon \le \mu^*A + \epsilon,$$
while the Bn whose centres do not lie in A must come from the sequence ⟨B′n⟩n∈N, so their measures sum to at most ½ε ≤ ε.
Remark In fact we can (if A is not empty) arrange that the centre of every Bn belongs to A. This is an easy consequence of Besicovitch’s Covering Lemma (see §472 in Volume 4).

261X Basic exercises (a) Show that 261C and 261E are valid for any locally integrable real-valued function f; in particular, for any f ∈ L^p(µD) for any p ≥ 1, writing µD for the subspace measure on D.
(b) Show that 261C, 261Dc, 261Dd and 261E are valid for complex-valued functions f.
>(c) Take three disks in the plane, each touching the other two, so that they enclose an open region R with three cusps. In R let D be a disk tangent to each of the three original disks, and R0, R1, R2 the three components of R \ D. In each Rj let Dj be a disk tangent to each of the disks bounding Rj, and Rj0, Rj1, Rj2 the three components of Rj \ Dj. Continue, obtaining 27 regions at the next step, 81 regions at the next, and so on. Show that the total area of the residual regions converges to zero as the process continues indefinitely. (Hint: compare with the process in the proof of 261B.)

261Y Further exercises (a) Formulate an abstract definition of ‘Vitali cover’, meaning a family of sets satisfying the conclusion of 261B in some sense, and corresponding generalizations of 261C-261E, covering (at least) (b)-(d) below.
(b) For x ∈ R^r, k ∈ N let C(x, k) be the half-open cube of the form [2^{-k}z, 2^{-k}(z + 1)[, with z ∈ Z^r, containing x. Show that if f is an integrable function on R^r then
$$\lim_{k\to\infty} 2^{kr}\int_{C(x,k)} f = f(x)$$
for almost every x ∈ R^r.
(c) Let f be a real-valued function which is integrable over R^r. Show that
$$\lim_{\delta\downarrow0}\frac{1}{\delta^r}\int_{[x,x+\delta\mathbf{1}[} f = f(x)$$
for almost every x ∈ R^r.
(d) Give X = {0, 1}^N its usual measure ν (254J). For x ∈ X, k ∈ N set C(x, k) = {y : y ∈ X, y(i) = x(i) for i < k}. Show that if f is any real-valued function which is integrable over X then
$$\lim_{k\to\infty}2^k\int_{C(x,k)} f\,d\nu = f(x),\qquad \lim_{k\to\infty}2^k\int_{C(x,k)}|f(y)-f(x)|\,\nu(dy) = 0$$
for almost every x ∈ X.
(e) Let f be a real-valued function which is integrable over R^r, and x a point in the Lebesgue set of f. Show that for every ε > 0 there is a δ > 0 such that $\bigl|f(x) - \int f(x-y)\,g(\|y\|)\,dy\bigr| \le \epsilon$ whenever g : [0, ∞[ → [0, ∞[ is a non-increasing function such that $\int_{\mathbb{R}^r} g(\|y\|)\,dy = 1$ and $\int_{B(0,\delta)} g(\|y\|)\,dy \ge 1-\delta$. (Hint: 223Yg.)
(f) Let T be the family of those measurable sets G ⊆ R^r such that
$$\lim_{\delta\downarrow0}\frac{\mu(G\cap B(x,\delta))}{\mu B(x,\delta)} = 1$$
for every x ∈ G. Show that T is a topology on R^r, the density topology of R^r. Show that a function f : R^r → R is measurable iff it is T-continuous at almost every point of R^r.
(g) A set A ⊆ R^r is said to be porous at x ∈ R^r if $\limsup_{y\to x}\frac{\rho(y,A)}{\|y-x\|} > 0$, writing ρ(y, A) = inf_{z∈A} ‖y − z‖
(or ∞ if A is empty). Show that if A is porous at all its points then it is negligible.
(h) Let A ⊆ R^r be a bounded set and I a non-empty family of non-trivial closed balls covering A. Show that for any ε > 0 there are disjoint B0, . . . , Bn ∈ I such that $\mu^*A \le (3+\epsilon)^r\sum_{k=0}^n \mu B_k$.
(i) Let (X, ρ) be a metric space and A ⊆ X any set, x ↦ δx : A → [0, ∞[ any bounded function. Show that if γ > 3 then there is an A0 ⊆ A such that (i) ρ(x, y) > δx + δy for all distinct x, y ∈ A0 (ii) $\bigcup_{x\in A} B(x,\delta_x) \subseteq \bigcup_{x\in A_0} B(x,\gamma\delta_x)$, writing B(x, α) for the closed ball {y : ρ(y, x) ≤ α}.
(j) Show that any union of non-trivial closed balls in R^r is Lebesgue measurable. (Hint: induce on r. Compare 415Ye in Volume 4.)
(k) Suppose that A ⊆ R^r and that I is a family of closed subsets of R^r such that for every x ∈ A there is an η > 0 such that for every ε > 0 there is an I ∈ I such that x ∈ I and 0 < η(diam I)^r ≤ µI ≤ ε. Show that there is a countable disjoint set I0 ⊆ I such that A \ ⋃I0 is negligible.

261 Notes and comments In the proofs of 261B-261E above, I have done my best to follow the lines of the one-dimensional case; this section amounts to a series of generalizations of the work of §§221 and 223. It will be clear that the idea of 261A/261B can be used on other shapes than balls. To make it work in the form above, we need a family I such that there is a constant K for which µI′ ≤ KµI for every I ∈ I, where we write I′ = {x : inf_{y∈I} ‖x − y‖ ≤ diam(I)}. Evidently this will be true for many classes I determined by the shapes of the sets involved; for instance, if E ⊆ R^r is any bounded set of strictly positive measure, the family I = {x + δE : x ∈ R^r, δ > 0} will satisfy the condition. In 261Ya I challenge you to find an appropriate generalization of the arguments depending on the conclusion of 261B.
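Part (a) of the proof of 261F approximates an open set G from inside by the unions Ek of half-open dyadic cubes [2^{-k}z, 2^{-k}(z + 1)[ included in G. The Python sketch below carries this out for one assumed example: the open quarter-disk G = {x : ξ1 > 0, ξ2 > 0, ξ1² + ξ2² < 1} in R², whose measure is π/4. In integer arithmetic, a cube with z ∈ Z² lies in G exactly when z1, z2 ≥ 1 and (z1 + 1)² + (z2 + 1)² ≤ 4^k, so µEk can be computed exactly. The values are non-decreasing in k and approach µG, as the proof requires.

```python
import math

def dyadic_inner_area(k):
    """Measure of E_k for the open quarter-disk G = {x1 > 0, x2 > 0, x1^2 + x2^2 < 1}.

    Counts z in Z^2 with [2^-k z, 2^-k (z + 1)[ contained in G, i.e. z1, z2 >= 1 and
    (z1 + 1)^2 + (z2 + 1)^2 <= 4^k, and returns (number of cubes) / 4^k.
    """
    n = 4 ** k
    count = 0
    z1 = 1
    while (z1 + 1) ** 2 + 4 <= n:      # at least z2 = 1 must qualify
        # admissible z2 are 1 <= z2 <= isqrt(n - (z1 + 1)^2) - 1
        count += math.isqrt(n - (z1 + 1) ** 2) - 1
        z1 += 1
    return count / n

areas = [dyadic_inner_area(k) for k in range(13)]
for k, a in enumerate(areas):
    print(k, a)
print(math.pi / 4)
```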
Another way of using 261B is to say that because sets can be essentially covered by disjoint sequences of balls, it ought to be possible to use balls, rather than half-open intervals, in the definition of Lebesgue measure on R r . This is indeed so (261F). The difficulty in using balls in the basic definition comes right at the start, in proving that if a ball is covered by finitely many balls then the sum of the volumes of the covering balls is at least the volume of the covered ball. (There is a trick, using the compactness of closed balls and the openness of open balls, to extend such a proof to infinite covers.) Of course you could regard this fact as ‘elementary’, on the ground that Archimedes would have noticed if it weren’t true, but nevertheless it would be something of a challenge to prove it, unless you were willing to wait for a version of Fubini’s theorem, as some authors do.
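The constant βr = µB(0, 1) of 261A, with µB(x, δ) = βr δ^r, recurs throughout these covering arguments. One quick way to check the even and odd formulae quoted there from 252Q is to compare them with the standard closed form π^{r/2}/Γ(r/2 + 1). The Python sketch below does this using `math.gamma` from the standard library. The Gamma-function comparison is introduced here for the check; it is not taken from the text.

```python
import math

def beta(r):
    """Volume of the unit ball in R^r, by the even/odd formulae of 261A (252Q)."""
    k, odd = divmod(r, 2)
    if odd == 0:                                   # r = 2k
        return math.pi ** k / math.factorial(k)
    # r = 2k + 1
    return 2 ** (2 * k + 1) * math.factorial(k) * math.pi ** k / math.factorial(2 * k + 1)

def beta_gamma(r):
    """The same quantity via the Gamma function."""
    return math.pi ** (r / 2) / math.gamma(r / 2 + 1)

def ball_measure(r, delta):
    """mu B(x, delta) = beta_r * delta^r, independent of the centre x."""
    return beta(r) * delta ** r

for r in range(1, 7):
    print(r, beta(r), beta_gamma(r))
```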
I have given the results in 261C-261E for arbitrary subsets D of R^r not because I have any applications in mind in which non-measurable subsets are significant, but because I wish to make it possible to notice when measurability matters. Of course it is necessary to interpret the integrals ∫_D f dµ in the way laid down in §214. The game is given away in part (c) of the proof of 261C, where I rely on the fact that if f is integrable over D then there is an integrable f̃ : R^r → R such that ∫_F f̃ = ∫_{D∩F} f for every measurable F ⊆ R^r. In effect, for all the questions dealt with here, we can replace f, D by f̃, R^r. The idea of 261C is that, for almost every x, f(x) is approximated by its mean value on small balls B(x, δ), ignoring the missing values on B(x, δ) \ (D ∩ dom f); 261E is a sharper version of the same idea. The formulae of 261C-261E mostly involve the expression µB(x, δ). Of course this is just βr δ^r. But I think that leaving it unexpanded is actually more illuminating, as well as avoiding sub- and superscripts, since it makes it clearer what these density theorems are really about. In §472 of Volume 4 I will revisit this material, showing that a surprisingly large proportion of the ideas can be applied to arbitrary Radon measures on R^r, even though Vitali’s theorem (in the form stated here) is no longer valid.
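For r = 1, the balls B(x, δ) are the intervals [x − δ, x + δ] and µB(x, δ) = 2δ, so the averages in 261E can be computed directly. In the Python sketch below, the integrals are approximated by the midpoint rule, which is an assumption of the illustration. The sketch takes f to be the indicator of [0, ∞[. At x = ½ the mean of |f(y) − f(x)| over B(x, δ) is 0 once δ < ½, while at the jump x = 0 it stays at ½ for every δ. So 0 is not in the Lebesgue set of f, which is consistent with 261E because {0} is negligible. A smooth function (y², at x = 1) is included for contrast.

```python
def mean_abs_deviation(f, x, delta, n=20_000):
    """Midpoint approximation to (1 / mu B(x, delta)) * integral over B(x, delta) of |f(y) - f(x)|, r = 1."""
    h = 2.0 * delta / n
    fx = f(x)
    total = sum(abs(f(x - delta + (k + 0.5) * h) - fx) for k in range(n))
    return total / n        # (h * total) / (2 * delta) simplifies to total / n

step = lambda y: 1.0 if y >= 0.0 else 0.0
square = lambda y: y * y

for delta in (0.1, 0.01, 0.001):
    print(delta,
          mean_abs_deviation(step, 0.5, delta),    # point of continuity: 0
          mean_abs_deviation(step, 0.0, delta),    # jump: stays at 1/2
          mean_abs_deviation(square, 1.0, delta))  # smooth case: roughly delta
```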
262 Lipschitz and differentiable functions In preparation for the main work of this chapter in §263, I devote a section to two important classes of functions between Euclidean spaces. What we really need is the essentially elementary material down to 262I, together with the technical lemma 262M and its corollaries. Theorem 262Q is not relied on in this volume, though I believe that it makes the patterns which will develop more natural and comprehensible.

262A Lipschitz functions Suppose that r, s ≥ 1 and φ : D → R^s is a function, where D ⊆ R^r. We say that φ is γ-Lipschitz, where γ ∈ [0, ∞[, if
$$\|\phi(x)-\phi(y)\| \le \gamma\|x-y\|$$
for all x, y ∈ D, writing $\|x\| = \sqrt{\xi_1^2+\ldots+\xi_r^2}$ if x = (ξ1, . . . , ξr) ∈ R^r, and $\|z\| = \sqrt{\zeta_1^2+\ldots+\zeta_s^2}$ if z = (ζ1, . . . , ζs) ∈ R^s. In this case, γ is a Lipschitz constant for φ. A Lipschitz function is a function φ which is γ-Lipschitz for some γ ≥ 0. Note that in this case φ has a least Lipschitz constant (since if A is the set of Lipschitz constants for φ, and γ0 = inf A, then γ0 is a Lipschitz constant for φ).

262B
We need the following easy facts.
Lemma Let D ⊆ R^r be a set and φ : D → R^s a function.
(a) φ is Lipschitz iff φi : D → R is Lipschitz for every i, writing φ(x) = (φ1(x), . . . , φs(x)) for every x ∈ D = dom φ ⊆ R^r.
(b) In this case, there is a Lipschitz function φ̃ : R^r → R^s extending φ.
(c) If r = s = 1 and D = [a, b] is an interval, then φ is Lipschitz iff it is absolutely continuous and has a bounded derivative.
proof (a) For any x, y ∈ D and i ≤ s,
$$|\phi_i(x)-\phi_i(y)| \le \|\phi(x)-\phi(y)\| \le \sqrt{s}\,\sup_{j\le s}|\phi_j(x)-\phi_j(y)|,$$
so any Lipschitz constant for φ will be a Lipschitz constant for every φi, and if γj is a Lipschitz constant for φj for each j, then √s sup_{j≤s} γj will be a Lipschitz constant for φ.
(b) By (a), it is enough to consider the case s = 1, for if every φi has a Lipschitz extension φ̃i, we can set φ̃(x) = (φ̃1(x), . . . , φ̃s(x)) for every x to obtain a Lipschitz extension of φ. Taking s = 1, then, note that the case D = ∅ is trivial; so suppose that D ≠ ∅. Let γ be a Lipschitz constant for φ, and write
$$\tilde\phi(z) = \sup_{y\in D}\bigl(\phi(y) - \gamma\|y-z\|\bigr)$$
for every z ∈ R^r. If x ∈ D, then, for any z ∈ R^r and y ∈ D,
$$\phi(y) - \gamma\|y-z\| \le \phi(x) + \gamma\|y-x\| - \gamma\|y-z\| \le \phi(x) + \gamma\|z-x\|,$$
so that φ̃(z) ≤ φ(x) + γ‖z − x‖; this shows, in particular, that φ̃(z) < ∞. Also, if z ∈ D, we must have
$$\phi(z) - \gamma\|z-z\| \le \tilde\phi(z) \le \phi(z) + \gamma\|z-z\|,$$
so that φ̃ extends φ. Finally, if w, z ∈ R^r and y ∈ D,
$$\phi(y) - \gamma\|y-w\| \le \phi(y) - \gamma\|y-z\| + \gamma\|w-z\| \le \tilde\phi(z) + \gamma\|w-z\|;$$
and taking the supremum over y ∈ D,
$$\tilde\phi(w) \le \tilde\phi(z) + \gamma\|w-z\|.$$
As w and z are arbitrary, φ̃ is Lipschitz.
(c)(i) Suppose that φ is γ-Lipschitz. If ε > 0 and a ≤ a1 ≤ b1 ≤ . . . ≤ an ≤ bn ≤ b and $\sum_{i=1}^n (b_i - a_i) \le \epsilon/(1+\gamma)$, then
$$\sum_{i=1}^n|\phi(b_i)-\phi(a_i)| \le \sum_{i=1}^n\gamma|b_i-a_i| \le \epsilon.$$
As ε is arbitrary, φ is absolutely continuous. If x ∈ [a, b] and φ′(x) is defined, then
$$|\phi'(x)| = \lim_{y\to x}\frac{|\phi(y)-\phi(x)|}{|y-x|} \le \gamma,$$
so φ′ is bounded.
(ii) Now suppose that φ is absolutely continuous and that |φ′(x)| ≤ γ for every x ∈ dom φ′, where γ ≥ 0. Then whenever a ≤ x ≤ y ≤ b,
$$|\phi(y)-\phi(x)| = \Bigl|\int_x^y\phi'\Bigr| \le \int_x^y|\phi'| \le \gamma(y-x)$$
(using 225E for the first equality). As x and y are arbitrary, φ is γ-Lipschitz.

262C Remark The argument for (b) above shows that if φ : D → R is a Lipschitz function, where D ⊆ R^r, then φ has an extension to R^r with the same Lipschitz constants. In fact it is the case that if φ : D → R^s is a Lipschitz function, then φ has an extension φ̃ : R^r → R^s with the same Lipschitz constants; this is ‘Kirszbraun’s theorem’ (Kirszbraun 34, or Federer 69, 2.10.43).

262D Proposition If φ : D → R^r is a γ-Lipschitz function, where D ⊆ R^r, then µ*φ[A] ≤ γ^r µ*A for every A ⊆ D, where µ is Lebesgue measure on R^r. In particular, φ[D ∩ A] is negligible for every negligible set A ⊆ R^r.
proof Let ε > 0. By 261F, there is a sequence ⟨Bn⟩n∈N = ⟨B(xn, δn)⟩n∈N of closed balls in R^r, covering A, such that $\sum_{n=0}^\infty\mu B_n \le \mu^*A+\epsilon$ and $\sum_{n\in\mathbb{N}\setminus K}\mu B_n \le \epsilon$, where K = {n : n ∈ N, xn ∈ A}. Set L = {n : n ∈ N \ K, Bn ∩ D ≠ ∅}, and for n ∈ L choose yn ∈ D ∩ Bn. Now set
$$B'_n = \begin{cases} B(\phi(x_n),\gamma\delta_n) & \text{if } n\in K,\\ B(\phi(y_n),2\gamma\delta_n) & \text{if } n\in L,\\ \emptyset & \text{if } n\in\mathbb{N}\setminus(K\cup L).\end{cases}$$
Then φ[Bn ∩ D] ⊆ B′n for every n, so φ[D ∩ A] ⊆ ⋃_{n∈N} B′n, and
$$\mu^*\phi[A\cap D] \le \sum_{n=0}^\infty\mu B'_n = \gamma^r\sum_{n\in K}\mu B_n + 2^r\gamma^r\sum_{n\in L}\mu B_n \le \gamma^r(\mu^*A+\epsilon) + 2^r\gamma^r\epsilon.$$
As ε is arbitrary, µ*φ[A ∩ D] ≤ γ^r µ*A, as claimed.
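When D is finite, the formula φ̃(z) = sup_{y∈D}(φ(y) − γ‖y − z‖) from the proof of 262B(b) is easy to evaluate. The Python sketch below covers the case s = 1. The names `least_lipschitz_constant` and `extend` are hypothetical helpers for this illustration, and the sample data are assumed. The sketch computes the least Lipschitz constant of a real-valued function on a finite D ⊆ R² and builds the extension. It then checks on a grid that the extension agrees with φ on D and is still γ-Lipschitz, as the argument shows and 262C records.

```python
import itertools
import math

def least_lipschitz_constant(points, values):
    """Least gamma with |phi(x) - phi(y)| <= gamma * ||x - y|| on a finite set D."""
    return max(
        (abs(values[i] - values[j]) / math.dist(points[i], points[j])
         for i, j in itertools.combinations(range(len(points)), 2)),
        default=0.0,
    )

def extend(points, values, gamma):
    """The extension z -> sup over y in D of (phi(y) - gamma * ||y - z||), as in 262B(b)."""
    def phi_tilde(z):
        return max(v - gamma * math.dist(y, z) for y, v in zip(points, values))
    return phi_tilde

D = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
phi = [0.0, 1.0, 1.0, 0.5]
gamma = least_lipschitz_constant(D, phi)
phi_tilde = extend(D, phi, gamma)

# Largest difference quotient of the extension over pairs of grid points.
grid = [(i / 4, j / 4) for i in range(-4, 9) for j in range(-4, 9)]
worst = max(abs(phi_tilde(w) - phi_tilde(z)) / math.dist(w, z)
            for w, z in itertools.combinations(grid, 2))
print(gamma, [phi_tilde(y) for y in D], worst)
```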
262E Corollary Let φ : D → R^r be an injective Lipschitz function, where D ⊆ R^r, and f a measurable function from a subset of R^r to R.
(a) If φ^{-1} is defined almost everywhere in a subset H of R^r and f is defined almost everywhere in R^r, then fφ^{-1} is defined almost everywhere in H.
(b) If E ⊆ D is Lebesgue measurable then φ[E] is measurable.
(c) If D is measurable then fφ^{-1} is measurable.
proof Set C = dom(fφ^{-1}) = {y : y ∈ φ[D], φ^{-1}(y) ∈ dom f} = φ[D ∩ dom f].
(a) Because f is defined almost everywhere, φ[D \ dom f] is negligible. But now C = φ[D] \ φ[D \ dom f] = dom φ^{-1} \ φ[D \ dom f], so H \ C ⊆ (H \ dom φ^{-1}) ∪ φ[D \ dom f] is negligible.
(b) Now suppose that E ⊆ D and that E is measurable. Let ⟨Fn⟩n∈N be a sequence of closed bounded subsets of E such that µ(E \ ⋃_{n∈N} Fn) = 0 (134Fb). Because φ is Lipschitz, it is continuous, so φ[Fn] is compact, therefore closed, therefore measurable for every n (2A2F, 2A2E, 115G); also φ[E \ ⋃_{n∈N} Fn] is negligible, by 262D, therefore measurable. So
$$\phi[E] = \phi\Bigl[E\setminus\bigcup_{n\in\mathbb{N}}F_n\Bigr] \cup \bigcup_{n\in\mathbb{N}}\phi[F_n]$$
is measurable.
(c) For any a ∈ R, take a measurable set E ⊆ R^r such that {x : f(x) ≥ a} = E ∩ dom f. Then {y : y ∈ C, fφ^{-1}(y) ≥ a} = C ∩ φ[D ∩ E]. But φ[D ∩ E] is measurable, by (b), so {y : fφ^{-1}(y) ≥ a} is relatively measurable in C. As a is arbitrary, fφ^{-1} is measurable.

262F Differentiability I come now to the class of functions whose properties will take up most of the rest of the chapter.
Definitions Suppose r, s ≥ 1 and that φ is a function from a subset D = dom φ of R^r to R^s.
(a) φ is differentiable at x ∈ D if there is a real s × r matrix T such that
$$\lim_{y\to x}\frac{\|\phi(y)-\phi(x)-T(y-x)\|}{\|y-x\|} = 0;$$
in this case we may write T = φ′(x).
(b) I will say that φ is differentiable relative to its domain at x, and that T is a derivative of φ at x, if x ∈ D and for every ε > 0 there is a δ > 0 such that ‖φ(y) − φ(x) − T(y − x)‖ ≤ ε‖y − x‖ for every y ∈ B(x, δ) ∩ D.

262G Remarks (a) The standard definition in 262Fa, involving an all-sided limit ‘lim_{y→x}’, implicitly requires φ to be defined on some non-trivial ball centred on x, so that we can calculate φ(y) − φ(x) − T(y − x) for all y sufficiently near x. It has the advantage that the derivative T = φ′(x) is uniquely defined (because if $\lim_{z\to0}\frac{\|T_1z-T_2z\|}{\|z\|} = 0$ then
$$\frac{\|(T_1-T_2)z\|}{\|z\|} = \lim_{\alpha\to0}\frac{\|T_1(\alpha z)-T_2(\alpha z)\|}{\|\alpha z\|} = 0$$
for every non-zero z, so T1 − T2 must be the zero matrix). For our purposes here, there is some advantage in relaxing this slightly to the form in 262Fb, so that we do not need to pay special attention to the boundary of dom φ.
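To see the definition in 262Fa at work, take the illustrative (assumed) map φ(ξ1, ξ2) = (ξ1², ξ1ξ2) from R² to R². Its matrix of partial derivatives at x = (ξ1, ξ2) has rows (2ξ1, 0) and (ξ2, ξ1). For h = (η1, η2) we get φ(x + h) − φ(x) − Th = (η1², η1η2), so ‖φ(x + h) − φ(x) − Th‖/‖h‖ = |η1|, which tends to 0 as h → 0. The Python sketch below evaluates this ratio numerically. It also shows that perturbing one entry of T leaves a ratio that does not tend to 0 along the chosen direction, in line with the uniqueness argument in (a).

```python
import math

def phi(x):
    x1, x2 = x
    return (x1 * x1, x1 * x2)

def jacobian(x):
    """Matrix of partial derivatives of phi at x (rows: components of phi)."""
    x1, x2 = x
    return ((2.0 * x1, 0.0), (x2, x1))

def remainder_ratio(T, x, h):
    """||phi(x + h) - phi(x) - T h|| / ||h|| for a 2 x 2 matrix T."""
    fy = phi((x[0] + h[0], x[1] + h[1]))
    fx = phi(x)
    Th = tuple(T[i][0] * h[0] + T[i][1] * h[1] for i in range(2))
    return math.hypot(fy[0] - fx[0] - Th[0], fy[1] - fx[1] - Th[1]) / math.hypot(*h)

x = (1.0, 2.0)
T = jacobian(x)                      # ((2.0, 0.0), (2.0, 1.0))
T_wrong = ((2.0, 0.1), (2.0, 1.0))   # one entry perturbed
for t in (1e-1, 1e-2, 1e-3, 1e-4):
    h = (t, -t)
    print(t, remainder_ratio(T, x, h), remainder_ratio(T_wrong, x, h))
```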
(b) If you have not seen this concept of ‘differentiability’ before, but have some familiarity with partial differentiation, it is necessary to emphasize that the concept of ‘differentiable’ function (at least in the strict sense demanded by 262Fa) is strictly stronger than the concept of ‘partially differentiable’ function. For purposes of computation, the most useful method of finding true derivatives is through 262Id below. For a simple example of a function with a full set of partial derivatives, which is not everywhere differentiable, consider φ : R² → R defined by
$$\phi(\xi_1,\xi_2) = \frac{\xi_1\xi_2}{\xi_1^2+\xi_2^2}\ \text{ if } \xi_1^2+\xi_2^2 \ne 0,\qquad \phi(\xi_1,\xi_2) = 0\ \text{ if } \xi_1 = \xi_2 = 0.$$
Then φ is not even continuous at 0, although both partial derivatives ∂φ/∂ξj are defined everywhere.
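The example just given can be probed at a few points. In the Python sketch below, both difference quotients at the origin along the coordinate axes are exactly 0, because φ vanishes on both axes; so ∂φ/∂ξ1(0) = ∂φ/∂ξ2(0) = 0. But φ(t, t) = ½ for every t ≠ 0, so φ(t, t) does not tend to φ(0) = 0 as t → 0. The sketch only samples a few points; the exact argument is the one just given.

```python
def phi(x1, x2):
    s = x1 * x1 + x2 * x2
    return 0.0 if s == 0.0 else x1 * x2 / s

for eta in (1e-1, 1e-3, 1e-6):
    d1 = (phi(eta, 0.0) - phi(0.0, 0.0)) / eta   # difference quotient in the xi_1 direction
    d2 = (phi(0.0, eta) - phi(0.0, 0.0)) / eta   # difference quotient in the xi_2 direction
    print(eta, d1, d2, phi(eta, eta))             # phi(eta, eta) stays at 1/2
```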
(c) In the definition above, I speak of a derivative as being a matrix. Properly speaking, the derivative of a function defined on a subset of R^r and taking values in R^s should be thought of as a bounded linear operator from R^r to R^s; the formulation in terms of matrices is acceptable just because there is a natural one-to-one correspondence between s × r real matrices and linear operators from R^r to R^s, and all these linear operators are bounded. I use the ‘matrix’ description because it makes certain calculations more direct; in particular, the relationship between φ′ and the partial derivatives of φ (262Ic), and the notion of the determinant det φ′(x), used throughout §§263 and 265.

262H The norm of a matrix Some of the calculations below will rely on the notion of ‘norm’ of a matrix. The one I will use (in fact, for our purposes here, any norm would do) is the ‘operator norm’, defined by saying ‖T‖ = sup{‖Tx‖ : x ∈ R^r, ‖x‖ ≤ 1} for any s × r matrix T. For the basic facts concerning these norms, see 2A4F-2A4G. The following will also be useful.
(a) If all the coefficients of T are small, so is ‖T‖; in fact, if T = ⟨τij⟩i≤s,j≤r, and ‖x‖ ≤ 1, then |ξj| ≤ 1 for each j, so
$$\|Tx\| = \Bigl(\sum_{i=1}^s\Bigl(\sum_{j=1}^r\tau_{ij}\xi_j\Bigr)^2\Bigr)^{1/2} \le \Bigl(\sum_{i=1}^s\Bigl(\sum_{j=1}^r|\tau_{ij}|\Bigr)^2\Bigr)^{1/2} \le r\sqrt{s}\,\max_{i\le s,\,j\le r}|\tau_{ij}|,$$
and ‖T‖ ≤ r√s max_{i≤s,j≤r} |τij|. (This is a singularly crude inequality. A better one is in 262Ya. But it tells us, in particular, that ‖T‖ is always finite.)
(b) If ‖T‖ is small, so are all the coefficients of T; in fact, writing ej for the jth unit vector of R^r, the ith coordinate of Tej is τij, so |τij| ≤ ‖Tej‖ ≤ ‖T‖.

262I Lemma Let φ : D → R^s be a function, where D ⊆ R^r. For i ≤ s let φi : D → R be its ith coordinate, so that φ(x) = (φ1(x), . . . , φs(x)) for x ∈ D.
(a) If φ is differentiable relative to its domain at x ∈ D, then φ is continuous at x.
(b) If x ∈ D, then φ is differentiable relative to its domain at x iff each φi is differentiable relative to its domain at x.

(c) If φ is differentiable at x ∈ D, then all the partial derivatives ∂φi/∂ξj of φ are defined at x, and the derivative of φ at x is the matrix ⟨∂φi/∂ξj (x)⟩i≤s,j≤r.

(d) If all the partial derivatives ∂φi/∂ξj, for i ≤ s and j ≤ r, are defined in a neighbourhood of x ∈ D and are continuous at x, then φ is differentiable at x.
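The facts about matrix norms in 262H can be checked numerically. The following Python sketch is not part of the text, and the function names (`operator_norm`, `crude_bound`, `frobenius_bound`) are hypothetical. It estimates the operator norm ‖T‖ by power iteration on TᵗT, whose largest eigenvalue is ‖T‖². It then compares the estimate with three quantities: max|τij|, which is a lower bound by 262Hb; the bound of 262Ya; and the crude bound r√s max|τij| of 262Ha.

```python
import math


def mat_vec(T, x):
    """Apply the s x r matrix T (a list of s rows) to a vector x in R^r."""
    return [sum(t * xi for t, xi in zip(row, x)) for row in T]


def euclid(x):
    return math.sqrt(sum(v * v for v in x))


def operator_norm(T, iterations=200):
    """Estimate ||T|| = sup{||Tx|| : ||x|| <= 1} by power iteration on T^t T.

    Each value ||Tx|| with ||x|| = 1 is a lower bound for ||T||. Starting
    from every unit vector e_j ensures that some start is not orthogonal to
    a top singular vector, so the maximum converges to ||T||.
    """
    r = len(T[0])
    Tt = [list(col) for col in zip(*T)]
    best = 0.0
    for j in range(r):
        x = [1.0 if k == j else 0.0 for k in range(r)]
        for _ in range(iterations):
            y = mat_vec(Tt, mat_vec(T, x))
            ny = euclid(y)
            if ny == 0.0:
                break  # T x = 0 already
            x = [v / ny for v in y]
        best = max(best, euclid(mat_vec(T, x)))
    return best


def crude_bound(T):
    """The bound r * sqrt(s) * max|tau_ij| of 262H(a)."""
    s, r = len(T), len(T[0])
    return math.sqrt(s) * r * max(abs(t) for row in T for t in row)


def frobenius_bound(T):
    """The bound of 262Ya: square root of the sum of the squared coefficients."""
    return math.sqrt(sum(t * t for row in T for t in row))


T = [[1.0, 2.0], [3.0, 4.0]]
print(max(abs(t) for row in T for t in row), operator_norm(T),
      frobenius_bound(T), crude_bound(T))
```

For this T the four printed numbers increase, being roughly 4, 5.465, 5.477 and 11.31. The Frobenius-type bound of 262Ya is much sharper than the crude one.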
proof (a) Let T be a derivative of φ at x. Applying the definition 262Fb with ε = 1, we see that there is a δ > 0 such that ‖φ(y) − φ(x) − T(y − x)‖ ≤ ‖y − x‖ whenever y ∈ D and ‖y − x‖ ≤ δ. Now
‖φ(y) − φ(x)‖ ≤ ‖T(y − x)‖ + ‖y − x‖ ≤ (1 + ‖T‖)‖y − x‖ whenever y ∈ D and ‖y − x‖ ≤ δ, so φ is continuous at x.

(b)(i) If φ is differentiable relative to its domain at x ∈ D, let T be a derivative of φ at x. For i ≤ s let Ti be the 1 × r matrix consisting of the ith row of T. Let ε > 0. Then we have a δ > 0 such that |φi(y) − φi(x) − Ti(y − x)| ≤ ‖φ(y) − φ(x) − T(y − x)‖ ≤ ε‖y − x‖ whenever y ∈ D and ‖y − x‖ ≤ δ, so that Ti is a derivative of φi at x.

(ii) If each φi is differentiable relative to its domain at x, with corresponding derivatives Ti, let T be the s × r matrix with rows T1, . . . , Ts. Given ε > 0, there is for each i ≤ s a δi > 0 such that |φi(y) − φi(x) − Ti(y − x)| ≤ ε‖y − x‖ whenever y ∈ D and ‖y − x‖ ≤ δi. Set δ = min_{i≤s} δi > 0. Then if y ∈ D and ‖y − x‖ ≤ δ, we shall have

$$\|\phi(y)-\phi(x)-T(y-x)\|^2 = \sum_{i=1}^{s}|\phi_i(y)-\phi_i(x)-T_i(y-x)|^2 \le s\epsilon^2\|y-x\|^2,$$

so that ‖φ(y) − φ(x) − T(y − x)‖ ≤ ε√s ‖y − x‖.
As ε is arbitrary, T is a derivative of φ at x.

(c) Set T = φ′(x). We have

$$\lim_{y\to x}\frac{\|\phi(y)-\phi(x)-T(y-x)\|}{\|y-x\|} = 0.$$

Fix j ≤ r, and consider y = x + ηej, where ej = (0, . . . , 0, 1, 0, . . . , 0) is the jth unit vector in R^r. Then we must have

$$\lim_{\eta\to0}\frac{\|\phi(x+\eta e_j)-\phi(x)-\eta T(e_j)\|}{|\eta|} = 0.$$

Looking at the ith coordinate of φ(x + ηej) − φ(x) − ηT(ej), we have |φi(x + ηej) − φi(x) − τij η| ≤ ‖φ(x + ηej) − φ(x) − ηT(ej)‖, where τij is the (i, j)th coefficient of T. So

$$\lim_{\eta\to0}\frac{|\phi_i(x+\eta e_j)-\phi_i(x)-\tau_{ij}\eta|}{|\eta|} = 0.$$

But this just says that the partial derivative ∂φi/∂ξj (x) exists and is equal to τij, as claimed.

(d) Now suppose that the partial derivatives ∂φi/∂ξj are defined near x and continuous at x. Write τij = ∂φi/∂ξj (x). Let ε > 0. Let δ > 0 be such that all the partial derivatives are defined on B(x, δ), and |∂φi/∂ξj (y) − τij| ≤ ε for all i ≤ s and j ≤ r whenever ‖y − x‖ ≤ δ. Now suppose that ‖y − x‖ ≤ δ. Set

y = (η1, . . . , ηr),
x = (ξ1, . . . , ξr),
yj = (η1, . . . , ηj, ξj+1, . . . , ξr) for 0 ≤ j ≤ r.

Then y0 = x and yr = y. Moreover, whenever 1 ≤ j ≤ r, the line segment between yj−1 and yj lies wholly within δ of x, since if z = (ζ1, . . . , ζr) lies on this segment then ζi lies between ξi and ηi for every i. We apply the ordinary mean value theorem for differentiable real functions to the function t ↦ φi(η1, . . . , ηj−1, t, ξj+1, . . . , ξr). This gives, for each i ≤ s and j ≤ r, a point zij on the line segment between yj−1 and yj such that

φi(yj) − φi(yj−1) = (ηj − ξj) ∂φi/∂ξj (zij).
But |∂φi/∂ξj (zij) − τij| ≤ ε, so |φi(yj) − φi(yj−1) − τij(ηj − ξj)| ≤ ε|ηj − ξj| ≤ ε‖y − x‖. Summing over j,

$$\Bigl|\phi_i(y)-\phi_i(x)-\sum_{j=1}^{r}\tau_{ij}(\eta_j-\xi_j)\Bigr| \le r\epsilon\|y-x\|$$

for each i. Summing the squares and taking the square root,

‖φ(y) − φ(x) − T(y − x)‖ ≤ εr√s ‖y − x‖,

where T = ⟨τij⟩i≤s,j≤r. And this is true whenever ‖y − x‖ ≤ δ. As ε is arbitrary, φ′(x) = T is defined.

262J Remark I am not sure if I ought to apologize for the notation ∂/∂ξj. In such formulae as (ηj − ξj) ∂φi/∂ξj (zij) above, the two appearances of ξj clash most violently. But I do not think that any person of good will is likely to be misled, provided that the labels ξj (or whatever symbols are used to represent the variables involved) are adequately described when the domain of φ is first introduced. One should always remember that in partial differentiation we are not only moving one variable (a ξj in the present context) but also holding fixed some further list of variables, not listed in the notation. I believe that the traditional notation ∂/∂ξj has survived for solid reasons. I should like to offer a welcome to those who are more comfortable with it than with any of the many alternatives which have been proposed, but have never taken root.

262K The Cantor function revisited It is salutary to re-examine the examples of 134H-134I in the light of the present considerations. Let f : [0, 1] → [0, 1] be the Cantor function (134H) and set g(x) = ½(x + f(x)) for x ∈ [0, 1]. Then g : [0, 1] → [0, 1] is a homeomorphism (134I); set φ = g^{−1} : [0, 1] → [0, 1]. Because f is non-decreasing, we see that if 0 ≤ x ≤ y ≤ 1 then g(y) − g(x) ≥ ½(y − x). Equivalently, φ(y) − φ(x) ≤ 2(y − x) whenever 0 ≤ x ≤ y ≤ 1, so that φ is a Lipschitz function, therefore absolutely continuous (262Bc). If D = {x : φ′(x) is defined}, then [0, 1] \ D is negligible (225Cb), so [0, 1] \ φ[D] = φ[[0, 1] \ D] is negligible (262Da). I noted in 134I that there is a measurable function h : [0, 1] → R such that the composition hφ is not measurable. Now h(φ↾D) = (hφ)↾D cannot be measurable, even though φ↾D is differentiable.

262L
It will be convenient to be able to call on the following straightforward result.

Lemma Suppose that D ⊆ R^r and x ∈ R^r are such that lim_{δ↓0} µ∗(D ∩ B(x, δ))/µB(x, δ) = 1. Then lim_{z→0} ρ(x + z, D)/‖z‖ = 0, where ρ(x + z, D) = inf_{y∈D} ‖x + z − y‖.

proof Let ε > 0. Let δ0 > 0 be such that

µ∗(D ∩ B(x, δ)) > (1 − (ε/(1 + ε))^r) µB(x, δ)

whenever 0 < δ ≤ δ0. Take any z such that 0 < ‖z‖ ≤ δ0/(1 + ε). ?? Suppose, if possible, that ρ(x + z, D) > ε‖z‖. Then B(x + z, ε‖z‖) ⊆ B(x, (1 + ε)‖z‖) \ D, so

µ∗(D ∩ B(x, (1 + ε)‖z‖)) ≤ µB(x, (1 + ε)‖z‖) − µB(x + z, ε‖z‖) = (1 − (ε/(1 + ε))^r) µB(x, (1 + ε)‖z‖).

This is impossible, as (1 + ε)‖z‖ ≤ δ0. XX Thus ρ(x + z, D) ≤ ε‖z‖. As ε is arbitrary, this proves the result.

Remark There is a word for this; see 261Yg.

262M I come now to the first result connecting Lipschitz functions with differentiable functions. I approach it through a substantial lemma which will be the foundation of §263.
Lemma Let r, s ≥ 1 be integers and φ a function from a subset D of R^r to R^s which is differentiable at each point of its domain. For each x ∈ D let T(x) be a derivative of φ at x. Let Msr be the set of s × r matrices. Let A ⊆ Msr be a non-empty set containing T(x) for every x ∈ D, and let ζ : A → ]0, ∞[ be a strictly positive function. Then we can find sequences ⟨Dn⟩n∈N, ⟨Tn⟩n∈N such that

(i) ⟨Dn⟩n∈N is a disjoint cover of D by sets which are relatively measurable in D, that is, are intersections of D with measurable subsets of R^r;

(ii) Tn ∈ A for every n;

(iii) ‖φ(x) − φ(y) − Tn(x − y)‖ ≤ ζ(Tn)‖x − y‖ for every n ∈ N and x, y ∈ Dn;

(iv) ‖T(x) − Tn‖ ≤ ζ(Tn) for every x ∈ Dn.

proof (a) The first step is to note that there is a sequence ⟨Sn⟩n∈N in A such that

A ⊆ ⋃_{n∈N} {T : T ∈ Msr, ‖T − Sn‖ < ζ(Sn)}.

▶ (Of course this is a standard result about separable metric spaces.) Write 𝒬 for the set of matrices in Msr with rational coefficients. There is a natural bijection between 𝒬 and Q^{sr}, so 𝒬 and 𝒬 × N are countable. Enumerate 𝒬 × N as ⟨(Rn, kn)⟩n∈N. For each n ∈ N, choose Sn ∈ A by the rule

– if there is an S ∈ A such that {T : ‖T − Rn‖ ≤ 2^{−kn}} ⊆ {T : ‖T − S‖ < ζ(S)}, take such an S for Sn;
– otherwise, take Sn to be any member of A.

I claim that this works. For let S ∈ A. Then ζ(S) > 0; take k ∈ N such that 2^{−k} < ζ(S). Take R∗ ∈ 𝒬 such that ‖R∗ − S‖ < min(ζ(S) − 2^{−k}, 2^{−k}). This is possible because ‖R − S‖ will be small whenever all the coefficients of R are close enough to the corresponding coefficients of S (262Ha), and we can find rational numbers to achieve this. Let n ∈ N be such that R∗ = Rn and k = kn. Then {T : ‖T − Rn‖ ≤ 2^{−kn}} ⊆ {T : ‖T − S‖ < ζ(S)}, because ‖T − S‖ ≤ ‖T − Rn‖ + ‖Rn − S‖. So we must have chosen Sn by the first part of the rule above, and

S ∈ {T : ‖T − Rn‖ ≤ 2^{−kn}} ⊆ {T : ‖T − Sn‖ < ζ(Sn)}.

As S is arbitrary, this proves the result. ◀

(b) Enumerate Q^r × Q^r × N as ⟨(qn, q′n, mn)⟩n∈N.
For each n ∈ N, set

Hn = {x : x ∈ [qn, q′n] ∩ D, ‖φ(y) − φ(x) − Smn(y − x)‖ ≤ ζ(Smn)‖y − x‖ for every y ∈ [qn, q′n] ∩ D}
  = [qn, q′n] ∩ D ∩ ⋂_{y∈[qn,q′n]∩D} {x : x ∈ D, ‖φ(y) − φ(x) − Smn(y − x)‖ ≤ ζ(Smn)‖y − x‖}.

Because φ is continuous, Hn = D ∩ H̄n, writing H̄n for the closure of Hn, so Hn is relatively measurable in D. Note that if x, y ∈ Hn, then y ∈ D ∩ [qn, q′n], so that ‖φ(y) − φ(x) − Smn(y − x)‖ ≤ ζ(Smn)‖y − x‖. Set H′n = {x : x ∈ Hn, ‖T(x) − Smn‖ ≤ ζ(Smn)}.

(c) D = ⋃_{n∈N} H′n. ▶ Let x ∈ D. Then T(x) ∈ A, so there is a k ∈ N such that ‖T(x) − Sk‖ < ζ(Sk). Let δ > 0 be such that ‖φ(y) − φ(x) − T(x)(y − x)‖ ≤ (ζ(Sk) − ‖T(x) − Sk‖)‖y − x‖ whenever y ∈ D and ‖y − x‖ ≤ δ. Then

‖φ(y) − φ(x) − Sk(y − x)‖ ≤ (ζ(Sk) − ‖T(x) − Sk‖)‖y − x‖ + ‖T(x) − Sk‖‖y − x‖ ≤ ζ(Sk)‖y − x‖
whenever y ∈ D ∩ B(x, δ). Let q, q′ ∈ Q^r be such that x ∈ [q, q′] ⊆ B(x, δ). Let n be such that q = qn, q′ = q′n and k = mn. Then x ∈ H′n. ◀

(d) Write Cn = {x : x ∈ Hn, lim_{δ↓0} µ∗(Hn ∩ B(x, δ))/µB(x, δ) = 1}. Then Cn ⊆ H′n.

▶ (i) Take x ∈ Cn, and set T̃ = T(x) − Smn. I have to show that ‖T̃‖ ≤ ζ(Smn). Take ε > 0. Let δ0 > 0 be such that ‖φ(y) − φ(x) − T(x)(y − x)‖ ≤ ε‖y − x‖ whenever y ∈ D and ‖y − x‖ ≤ δ0. Since ‖φ(y) − φ(x) − Smn(y − x)‖ ≤ ζ(Smn)‖y − x‖ whenever y ∈ Hn, we have ‖T̃(y − x)‖ ≤ (ε + ζ(Smn))‖y − x‖ whenever y ∈ Hn and ‖y − x‖ ≤ δ0.

(ii) By 262L, there is a δ1 > 0 such that (1 + 2ε)δ1 ≤ δ0 and ρ(x + z, Hn) ≤ ε‖z‖ whenever 0 < ‖z‖ ≤ δ1. So if ‖z‖ ≤ δ1 there is a y ∈ Hn such that ‖x + z − y‖ ≤ 2ε‖z‖. (If z = 0 we can take y = x.) Now ‖x − y‖ ≤ (1 + 2ε)‖z‖ ≤ δ0, so

$$\begin{aligned}\|\tilde T z\| &\le \|\tilde T(y-x)\| + \|\tilde T(x+z-y)\| \le (\epsilon+\zeta(S_{m_n}))\|y-x\| + \|\tilde T\|\,\|x+z-y\|\\ &\le (\epsilon+\zeta(S_{m_n}))\|z\| + (\epsilon+\zeta(S_{m_n})+\|\tilde T\|)\,\|x+z-y\|\\ &\le \bigl(\epsilon+\zeta(S_{m_n})+2\epsilon^2+2\epsilon\zeta(S_{m_n})+2\epsilon\|\tilde T\|\bigr)\|z\|.\end{aligned}$$

And this is true whenever 0 < ‖z‖ ≤ δ1. But multiplying this inequality by suitable positive scalars we see that

‖T̃z‖ ≤ (ε + ζ(Smn) + 2ε² + 2εζ(Smn) + 2ε‖T̃‖)‖z‖

for all z ∈ R^r, and ‖T̃‖ ≤ ε + ζ(Smn) + 2ε² + 2εζ(Smn) + 2ε‖T̃‖. As ε is arbitrary, ‖T̃‖ ≤ ζ(Smn), as claimed. ◀

(e) By 261Da, Hn \ Cn is negligible for every n, so Hn \ H′n is negligible, and H′n = D ∩ (H̄n \ (Hn \ H′n)) is relatively measurable in D. Set Dn = H′n \
S k 1. (b) The first step is to show that all the partial derivatives ∂φ/∂ξj are defined almost everywhere and are Borel measurable. ▶ Take j ≤ r. For q ∈ Q \ {0} set

∆q(x) = (1/q)(φ(x + qej) − φ(x)),

writing ej for the jth unit vector of R^r. Because φ is continuous, so is ∆q, so that ∆q is a Borel measurable function for each q. Next, for any x ∈ R^r,

D+(x) = lim sup_{δ→0} (1/δ)(φ(x + δej) − φ(x)) = lim_{n→∞} supq∈Q,0 0 there is a δ > 0 such that |φ(x + (u, 0)) − φ(x) − T(x)(u, 0)| ≤ ε‖u‖ whenever ‖u‖ ≤ δ. That is, iff for every m ∈ N there is an n ∈ N such that |φ(x + (u, 0)) − φ(x) − T(x)(u, 0)| ≤ 2^{−m}‖u‖ whenever u ∈ Q^{r−1} and ‖u‖ ≤ 2^{−n}. But for any particular m ∈ N and u ∈ Q^{r−1} the set {x : |φ(x + (u, 0)) − φ(x) − T(x)(u, 0)| ≤ 2^{−m}‖u‖} is measurable, indeed Borel. This is because all the functions x ↦ φ(x + (u, 0)), x ↦ φ(x) and x ↦ T(x)(u, 0) are Borel measurable. So H1 is of the form

⋂_{m∈N} ⋃_{n∈N} ⋂_{u∈Q^{r−1}, ‖u‖≤2^{−n}} Emnu,

where every Emnu is a measurable set, and H1 is therefore measurable. Now however observe that for any σ ∈ R, the function v ↦ φσ(v) = φ(v, σ) : R^{r−1} → R
is Lipschitz, therefore (by the inductive hypothesis) differentiable almost everywhere on R^{r−1}. Also (v, σ) ∈ H1 iff (v, σ) ∈ H and φ′σ(v) is defined. Consequently {v : (v, σ) ∈ H1} is conegligible whenever {v : (v, σ) ∈ H} is, that is, for almost every σ ∈ R. So H1, being measurable, must be conegligible. ◀

(e) Now, for q, q′ ∈ Q and n ∈ N, set

F(q, q′, n) = {x : x ∈ R^r, q ≤ (φ(x + (0, η)) − φ(x))/η ≤ q′ whenever 0 < |η| ≤ 2^{−n}},

F∗(q, q′, n) = {x : x ∈ F(q, q′, n), lim_{δ↓0} µ∗(F(q, q′, n) ∩ B(x, δ))/µB(x, δ) = 1}.

By 261Da, F(q, q′, n) \ F∗(q, q′, n) is negligible for all q, q′, n, so that

H2 = H1 \ ⋃_{q,q′∈Q, n∈N} (F(q, q′, n) \ F∗(q, q′, n))

is conegligible.

(f) I claim that φ is differentiable at every point of H2. ▶ Take x = (u, σ) ∈ H2. Then α = ∂φ/∂ξr (x) and T = T(x) are defined. Let γ be a Lipschitz constant for φ. Take ε > 0; take q, q′ ∈ Q such that α − ε ≤ q < α < q′ ≤ α + ε. There must be an n ∈ N such that x ∈ F(q, q′, n); consequently x ∈ F∗(q, q′, n), by the definition of H2. By 262L, there is a δ0 > 0 such that ρ(x + z, F(q, q′, n)) ≤ ε‖z‖ whenever ‖z‖ ≤ δ0. Next, there is a δ1 > 0 such that |φ(x + (v, 0)) − φ(x) − T(v, 0)| ≤ ε‖v‖ whenever v ∈ R^{r−1} and ‖v‖ ≤ δ1. Set δ = min(δ0, δ1, 2^{−n})/(1 + 2ε) > 0.

Suppose that z = (v, τ) ∈ R^r and that ‖z‖ ≤ δ. Because ‖z‖ ≤ δ0 there is an x′ = (u′, σ′) ∈ F(q, q′, n) such that ‖x + z − x′‖ ≤ 2ε‖z‖. Set x∗ = (u′, σ). Now

max(‖u − u′‖, |σ − σ′|) ≤ ‖x − x′‖ ≤ (1 + 2ε)‖z‖ ≤ min(δ1, 2^{−n}),

so |φ(x∗) − φ(x) − T(x∗ − x)| ≤ ε‖u′ − u‖ ≤ ε(1 + 2ε)‖z‖. But also

|φ(x′) − φ(x∗) − T(x′ − x∗)| = |φ(x′) − φ(x∗) − α(σ′ − σ)| ≤ ε|σ′ − σ| ≤ ε(1 + 2ε)‖z‖.

This holds because x′ ∈ F(q, q′, n) and |σ − σ′| ≤ 2^{−n}, so that (if x′ ≠ x∗)

α − ε ≤ q ≤ (φ(x∗) − φ(x′))/(σ − σ′) ≤ q′ ≤ α + ε

and |(φ(x′) − φ(x∗))/(σ′ − σ) − α| ≤ ε.

Finally,

|φ(x + z) − φ(x′)| ≤ γ‖x + z − x′‖ ≤ 2γε‖z‖,
|Tz − T(x′ − x)| ≤ ‖T‖‖x + z − x′‖ ≤ 2ε‖T‖‖z‖.

Putting all these together,

|φ(x + z) − φ(x) − Tz| ≤ |φ(x + z) − φ(x′)| + |T(x′ − x) − Tz| + |φ(x′) − φ(x∗) − T(x′ − x∗)| + |φ(x∗) − φ(x) − T(x∗ − x)|
  ≤ 2γε‖z‖ + 2ε‖T‖‖z‖ + ε(1 + 2ε)‖z‖ + ε(1 + 2ε)‖z‖ = ε(2γ + 2‖T‖ + 2 + 4ε)‖z‖.

And this is true whenever ‖z‖ ≤ δ. As ε is arbitrary, φ is differentiable at x. ◀

Thus {x : φ is differentiable at x} includes H2 and is conegligible; and the induction continues.
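A small numerical illustration of 262Q follows; it is not from the text. The function φ(ξ1, ξ2) = max(|ξ1|, |ξ2|) is 1-Lipschitz on R². It fails to be differentiable exactly on the two diagonals |ξ1| = |ξ2|, which form a negligible set. The Python sketch below uses hypothetical helper names and does three things:

* It samples random points, which almost surely avoid the diagonals.
* It checks that the linearization error is essentially zero at a point of differentiability.
* It shows a symmetric second difference that does not tend to 0 at a point of a diagonal, which rules out differentiability there.

```python
import math
import random


def phi(x1, x2):
    """phi(x) = max(|x1|, |x2|); |phi(x) - phi(y)| <= ||x - y||, so phi is 1-Lipschitz."""
    return max(abs(x1), abs(x2))


def derivative(x1, x2):
    """The 1 x 2 derivative matrix of phi at x, or None on the diagonals |x1| = |x2|."""
    if abs(x1) > abs(x2):
        return (math.copysign(1.0, x1), 0.0)
    if abs(x2) > abs(x1):
        return (0.0, math.copysign(1.0, x2))
    return None


def linearization_error(x1, x2, h1, h2):
    """|phi(x + h) - phi(x) - T h| / ||h||, with T the derivative of phi at x."""
    t1, t2 = derivative(x1, x2)
    return abs(phi(x1 + h1, x2 + h2) - phi(x1, x2) - (t1 * h1 + t2 * h2)) / math.hypot(h1, h2)


def symmetric_defect(x1, x2, h1, h2):
    """(phi(x + h) + phi(x - h) - 2 phi(x)) / ||h||; tends to 0 as h -> 0
    wherever phi is differentiable."""
    return (phi(x1 + h1, x2 + h2) + phi(x1 - h1, x2 - h2) - 2 * phi(x1, x2)) / math.hypot(h1, h2)


rng = random.Random(0)
samples = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(10000)]
differentiable = sum(derivative(a, b) is not None for a, b in samples)
print(differentiable, "of", len(samples), "sampled points are points of differentiability")
print(linearization_error(0.7, 0.2, 1e-3, -2e-3))  # essentially 0
print(symmetric_defect(0.5, 0.5, 1e-4, -1e-4))     # about sqrt(2): not differentiable
```

This piecewise-linear φ is only a toy. The point of 262Q is that no such explicit formula is needed: the Lipschitz condition alone forces differentiability almost everywhere.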
262X Basic exercises (a) Let φ and ψ be Lipschitz functions from subsets of R^r to R^s. Show that φ + ψ is a Lipschitz function from dom φ ∩ dom ψ to R^s.

(b) Let φ be a Lipschitz function from a subset of R^r to R^s, and c ∈ R. Show that cφ is a Lipschitz function.

(c) Suppose φ : D → R^s and ψ : E → R^q are Lipschitz functions, where D ⊆ R^r and E ⊆ R^s. Show that the composition ψφ : D ∩ φ^{−1}[E] → R^q is Lipschitz.

(d) Suppose φ, ψ are functions from subsets of R^r to R^s. Suppose that x ∈ dom φ ∩ dom ψ is such that each function is differentiable relative to its domain at x, with derivatives S, T there. Show that φ + ψ is differentiable relative to its domain at x, and that S + T is a derivative of φ + ψ at x.

(e) Suppose that φ is a function from a subset of R^r to R^s, and is differentiable relative to its domain at x ∈ dom φ. Show that cφ is differentiable relative to its domain at x for every c ∈ R.

>(f) Suppose φ : D → R^s and ψ : E → R^q are functions, where D ⊆ R^r and E ⊆ R^s. Suppose that φ is differentiable relative to its domain at x ∈ D ∩ φ^{−1}[E], with an s × r matrix T a derivative there. Suppose also that ψ is differentiable relative to its domain at φ(x), with a q × s matrix S a derivative there. Show that the composition ψφ is differentiable relative to its domain at x, and that the q × r matrix ST is a derivative of ψφ at x.

(g) Let φ : R^r → R^s be a linear operator, with associated matrix T. Show that φ is differentiable everywhere, with φ′(x) = T for every x.

>(h) Let G ⊆ R^r be a convex open set, and φ : G → R^s a function such that all the partial derivatives ∂φi/∂ξj are defined everywhere in G. Show that φ is Lipschitz iff all the partial derivatives are bounded on G.

(i) Let φ : R^r → R^s be a function. Show that φ is differentiable at x ∈ R^r iff the following holds: for every m ∈ N there are an n ∈ N and an s × r matrix T with rational coefficients such that ‖φ(y) − φ(x) − T(y − x)‖ ≤ 2^{−m}‖y − x‖ whenever ‖y − x‖ ≤ 2^{−n}.

>(j) Suppose that f is a real-valued function which is integrable over R^r. Suppose that g : R^r → R is a bounded differentiable function such that the partial derivative ∂g/∂ξj is bounded, where j ≤ r. Let f ∗ g be the convolution of f and g (255L). Show that ∂(f ∗ g)/∂ξj is defined everywhere and equal to f ∗ ∂g/∂ξj. (Hint: 255Xg.)

>(k) Let (X, Σ, µ) be a measure space, G ⊆ R^r an open set, and f : X × G → R a function. Suppose that

(i) for every x ∈ X, t ↦ f(x, t) : G → R is differentiable;
(ii) there is an integrable function g on X such that |∂f/∂τj (x, t)| ≤ g(x) whenever x ∈ X, t ∈ G and j ≤ r;
(iii) ∫|f(x, t)|µ(dx) exists in R for every t ∈ G.

Show that t ↦ ∫ f(x, t)µ(dx) : G → R is differentiable. (Hint: show first that, for a suitable M, |f(x, t) − f(x, t′)| ≤ M|g(x)|‖t − t′‖ for every t, t′ ∈ G and x ∈ X.)

262Y Further exercises (a) Show that if T = ⟨τij⟩i≤s,j≤r is an s × r matrix then the operator norm ‖T‖, as defined in 262H, is at most (Σ_{i=1}^s Σ_{j=1}^r |τij|²)^{1/2}.

(b) Give an example of a measurable function φ : R² → R such that dom ∂φ/∂ξ1 is not measurable.

(c) Let φ : D → R be any function, where D ⊆ R^r. Show that H = {x : x ∈ D, φ is differentiable relative to its domain at x} is relatively measurable in D, and that (∂φ/∂ξj)↾H is measurable for every j ≤ r.
(d) A function φ : R^r → R is smooth if all its partial derivatives ∂ . . . ∂φ/∂ξi∂ξj . . . ∂ξl are defined everywhere in R^r and are continuous. Show that if f is integrable over R^r and φ : R^r → R is smooth and has bounded support then the convolution f ∗ φ is smooth. (Hint: 262Xj, 262Xk.)

(e) For δ > 0 define φ̃δ, αδ and φδ as follows:

φ̃δ(x) = e^{1/(‖x‖² − δ²)} if ‖x‖ < δ, and φ̃δ(x) = 0 if ‖x‖ ≥ δ;
αδ = ∫ φ̃δ(x)dx;
φδ(x) = αδ^{−1}φ̃δ(x) for every x.

(i) Show that φδ : R^r → R is smooth and has bounded support. (ii) Show that if f is integrable over R^r then lim_{δ↓0} ∫|f(x) − (f ∗ φδ)(x)|dx = 0. (Hint: start with continuous functions f with bounded support, and use 242O.)

(f) Show that if f is integrable over R^r and ε > 0 there is a smooth function h with bounded support such that ∫|f − h| ≤ ε. (Hint: either reduce to the case in which f has bounded support and use 262Ye, or adapt the method of 242Xi.)
(g) Suppose that f is a real function which is integrable over every bounded subset of R^r. (i) Show that f × φ is integrable whenever φ : R^r → R is a smooth function with bounded support. (ii) Show that if ∫ f × φ = 0 for every smooth function φ with bounded support then f =a.e. 0. (Hint: show that ∫_{B(x,δ)} f = 0 for every x ∈ R^r and δ > 0, and use 261C. Alternatively show that ∫_E f = 0 first for E = [b, c], then for open sets E, then for arbitrary measurable sets E.)

(h) Let f be integrable over R^r, and for δ > 0 let φδ : R^r → R be the function of 262Ye. Show that lim_{δ↓0}(f ∗ φδ)(x) = f(x) for every x in the Lebesgue set of f. (Hint: 261Ye.)

(i) Let L be the space of all Lipschitz functions from R^r to R^s. For φ ∈ L set

‖φ‖ = ‖φ(0)‖ + inf{γ : γ ∈ [0, ∞[, ‖φ(y) − φ(x)‖ ≤ γ‖y − x‖ for every x, y ∈ R^r}.

Show that (L, ‖ ‖) is a Banach space.

262 Notes and comments The emphasis of this section has turned out to be on the connexions between the concepts of 'Lipschitz function' and 'differentiable function'. It is the delight of classical real analysis that such intimate relationships arise between concepts which belong to different categories. 'Lipschitz functions' clearly belong to the theory of metric spaces (I will return to this in §264), while 'differentiable functions' belong to the theory of differentiable manifolds, which is outside the scope of this volume. I have written this section out carefully just in case there are readers who have so far missed the theory of differentiable mappings between multi-dimensional Euclidean spaces. But it also gives me a chance to work through the notion of 'function differentiable relative to its domain', which will make it possible in the next section to ride smoothly past a variety of problems arising at boundaries. The difficulties I am concerned with arise in the first place with such functions as the polar-coordinate transformation

(ρ, θ) ↦ (ρ cos θ, ρ sin θ) : {(0, 0)} ∪ (]0, ∞[ × ]−π, π]) → R².

In order to make this a bijection we have to do something rather arbitrary, and the domain of the transformation cannot be an open set. On the definitions I am using, this function is differentiable relative to its domain at every point of its domain, and we can apply such results as 262O uninhibitedly. You will observe that in this case the non-interior points of the domain form a negligible set {(0, 0)} ∪ (]0, ∞[ × {π}), so we can expect to be able to ignore them. For most of the geometrically straightforward transformations that the theory is applied to, judicious excision of negligible sets will reduce problems to the case of honestly differentiable functions with open domains. But while open-domain theory will deal with a large proportion of the most important examples, there is a danger that you would be left with real misapprehensions concerning the scope of these methods.

The essence of differentiability is that a differentiable function φ is approximable, near any given point of its domain, by an affine function. The idea of 262M is to describe a widely effective method of dissecting D = dom φ into countably many pieces on each of which φ is well-behaved. This will be applied in §§263 and 265 to investigate the measure of φ[D]; but we already have several straightforward consequences (262N-262P).
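The polar-coordinate example can be made concrete with a short Python sketch; it is not from the text, and the helper names are hypothetical. The sketch approximates the derivative of (ρ, θ) ↦ (ρ cos θ, ρ sin θ) by central differences, so its determinant should be close to ρ. It then integrates this numerically obtained |det| over [0, R] × ]−π, π] with a midpoint rule. The non-interior points excluded by the grid are negligible, so the result should approximate πR², the area of the image disk, as the change-of-variable formula of §263 leads one to expect.

```python
import math


def polar(rho, theta):
    return (rho * math.cos(theta), rho * math.sin(theta))


def derivative_fd(f, u, h=1e-6):
    """Central-difference approximation to the 2 x 2 derivative matrix of f at u.

    Entry [i][j] approximates the partial derivative of the i-th coordinate
    of f with respect to the j-th variable.
    """
    matrix = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        up, down = list(u), list(u)
        up[j] += h
        down[j] -= h
        f_up, f_down = f(*up), f(*down)
        for i in range(2):
            matrix[i][j] = (f_up[i] - f_down[i]) / (2 * h)
    return matrix


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def disk_area(radius, n=100):
    """Midpoint-rule integral of |det f'(rho, theta)| over [0, radius] x ]-pi, pi]."""
    d_rho, d_theta = radius / n, 2 * math.pi / n
    total = 0.0
    for i in range(n):
        rho = (i + 0.5) * d_rho
        for k in range(n):
            theta = -math.pi + (k + 0.5) * d_theta
            total += abs(det2(derivative_fd(polar, (rho, theta)))) * d_rho * d_theta
    return total


print(det2(derivative_fd(polar, (2.0, 0.3))))  # close to rho = 2.0
print(disk_area(1.5), math.pi * 1.5 ** 2)     # the two values nearly agree
```

The integral is computed from the finite-difference derivative rather than from the known formula |det| = ρ. This way the check exercises the derivative itself, not just the final identity.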
263 Differentiable transformations in R^r

This section is devoted to the proof of a single major theorem (263D) concerning differentiable transformations between subsets of R^r. There will be a generalization of this result in §265. Those with some familiarity with the topic, or sufficient hardihood, may wish to read §264 before taking this section and §265 together. I end with a few simple corollaries and an extension of the main result which can be made in the one-dimensional case (263I). Throughout this section, as in the rest of the chapter, µ will denote Lebesgue measure on R^r.

263A Linear transformations I begin with the special case of linear operators. This is not only the basis of the proof of 263D, but is also one of its most important applications, and is indeed sufficient for many very striking results.

Theorem Let T be a real r × r matrix; regard T as a linear operator from R^r to itself. Let J = |det T| be the modulus of its determinant. Then µT[E] = JµE for every measurable set E ⊆ R^r. If T is a bijection (that is, if J ≠ 0), then µF = JµT^{−1}[F] for every measurable F ⊆ R^r, and

∫_F g dµ = J ∫_{T^{−1}[F]} gT dµ

for every integrable function g and measurable set F.

proof (a) The first step is to show that T[I] is measurable for every half-open interval I ⊆ R^r. ▶ Any non-empty half-open interval I = [a, b[ is a countable union of closed intervals In = [a, b − 2^{−n}1]. Each In is compact (2A2F), so that T[In] is compact (2A2Eb), therefore closed (2A2Ec), therefore measurable (115G). Hence T[I] = ⋃_{n∈N} T[In] is measurable. ◀

(b) Set J∗ = µT[[0, 1[], where 0 = (0, . . . , 0) and 1 = (1, . . . , 1); because T[[0, 1[] is bounded, J∗ < ∞. (I will eventually show that J∗ = J.) It is convenient to deal with the case of singular T first. Recall that T, regarded as a linear transformation from R^r to itself, is either bijective or onto a proper linear subspace. In the latter case, take any e ∈ R^r \ T[R^r]. The sets T[[0, 1[] + γe, as γ runs over [0, 1], are disjoint and all of the same measure J∗, because µ is translation-invariant (134A). Moreover, their union is bounded, so has finite outer measure. As there are infinitely many such γ, the common measure J∗ must be zero. Now observe that

T[R^r] = ⋃_{z∈Z^r} (T[[0, 1[] + Tz),

and µ(T[[0, 1[] + Tz) = J∗ = 0 for every z ∈ Z^r, while Z^r is countable, so µT[R^r] = 0. At the same time, because T is singular, it has zero determinant, and J = 0. Accordingly µT[E] = 0 = JµE for every measurable E ⊆ R^r, and we're done.

(c) Henceforth, therefore, let us assume that T is non-singular. Note that it and its inverse are continuous, so that T is a homeomorphism, and T[G] is open iff G is open. If a ∈ R^r and k ∈ N, then µT[[a, a + 2^{−k}1[] = 2^{−kr}J∗.

▶ Set J∗k = µT[[0, 2^{−k}1[]. Now T[[a, a + 2^{−k}1[] = T[[0, 2^{−k}1[] + Ta; because µ is translation-invariant, its measure is also J∗k. Next, [0, 1[ is expressible as a disjoint union of 2^{kr} sets of the form [a, a + 2^{−k}1[. Consequently, T[[0, 1[] is expressible as a disjoint union of 2^{kr} sets of the form T[[a, a + 2^{−k}1[], and
J∗ = µT[[0, 1[] = 2^{kr}J∗k, that is, J∗k = 2^{−kr}J∗, as claimed. ◀

(d) Consequently µT[G] = J∗µG for every open set G ⊆ R^r. ▶ For each k ∈ N, set

Qk = {z : z ∈ Z^r, [2^{−k}z, 2^{−k}z + 2^{−k}1[ ⊆ G},
Gk = ⋃_{z∈Qk} [2^{−k}z, 2^{−k}z + 2^{−k}1[.

Then Gk is a disjoint union of #(Qk) sets of the form [2^{−k}z, 2^{−k}z + 2^{−k}1[, so µGk = 2^{−kr}#(Qk). Also, T[Gk] is a disjoint union of #(Qk) sets of the form T[[2^{−k}z, 2^{−k}z + 2^{−k}1[], so has measure 2^{−kr}J∗#(Qk) = J∗µGk, using (c). Observe next that ⟨Gk⟩k∈N is a non-decreasing sequence with union G, so that

µT[G] = lim_{k→∞} µT[Gk] = lim_{k→∞} J∗µGk = J∗µG. ◀

(e) It follows that µ∗T[A] = J∗µ∗A for every A ⊆ R^r. ▶ Given A ⊆ R^r and ε > 0, there are open sets G, H such that G ⊇ A, H ⊇ T[A], µG ≤ µ∗A + ε and µH ≤ µ∗T[A] + ε (134Fa). Set G1 = G ∩ T^{−1}[H]; then G1 is open because T^{−1}[H] is. Now µT[G1] = J∗µG1, so

µ∗T[A] ≤ µT[G1] = J∗µG1 ≤ J∗µ∗A + J∗ε ≤ J∗µG1 + J∗ε = µT[G1] + J∗ε ≤ µH + J∗ε ≤ µ∗T[A] + ε + J∗ε.

As ε is arbitrary, µ∗T[A] = J∗µ∗A. ◀

(f) Consequently µT[E] exists and is equal to J∗µE for every measurable E ⊆ R^r. ▶ Let E ⊆ R^r be measurable, and take any A ⊆ R^r. Set A′ = T^{−1}[A]. Then

µ∗(A ∩ T[E]) + µ∗(A \ T[E]) = µ∗(T[A′ ∩ E]) + µ∗(T[A′ \ E]) = J∗(µ∗(A′ ∩ E) + µ∗(A′ \ E)) = J∗µ∗A′ = µ∗T[A′] = µ∗A.

As A is arbitrary, T[E] is measurable, and now µT[E] = µ∗T[E] = J∗µ∗E = J∗µE. ◀

(g) We are at last ready for the calculation of J∗. Recall that the matrix T must be expressible as PDQ, where P and Q are orthogonal matrices and D is diagonal, with non-negative diagonal entries (2A6C). Now we must have T[[0, 1[] = P[D[Q[[0, 1[]]]], so, using (f),

J∗ = J∗P J∗D J∗Q,

where J∗P = µP[[0, 1[], etc. Now we find that J∗P = J∗Q = 1. ▶ Let B = B(0, 1) be the unit ball of R^r. Because B is closed, it is measurable; because it is bounded, µB < ∞; and because B includes the non-empty half-open interval [0, r^{−1/2}1[, µB > 0. Now P[B] = Q[B] = B, because P and Q are orthogonal matrices. So we have µB = µP[B] = J∗P µB, and J∗P must be 1; similarly, J∗Q = 1. ◀

(h) So we have only to calculate J∗D. Suppose the diagonal coefficients of D are δ1, . . . , δr ≥ 0. Writing d = (δ1, . . . , δr), this means Dx = (δ1ξ1, . . . , δrξr) = d × x. We have been assuming since the beginning of (c) that T is non-singular, so no δi can be 0. Accordingly

D[[0, 1[] = [0, d[,
and J∗D = µ[0, d[ = ∏_{i=1}^r δi = det D.
Now because P and Q are orthogonal, both have determinant ±1, so det T = ±det D and J∗ = ±det T. Because J∗ is surely non-negative, J∗ = |det T| = J.

(i) Thus µT[E] = JµE for every Lebesgue measurable E ⊆ R^r. If T is non-singular, then we may use the above argument, applied to T^{−1}, to show that T^{−1}[F] is measurable for every measurable F. Then

µF = µT[T^{−1}[F]] = JµT^{−1}[F] = ∫ J × χ(T^{−1}[F]) dµ,

identifying J with the constant function with value J. By 235A,

∫_F g dµ = ∫_{T^{−1}[F]} JgT dµ = J ∫_{T^{−1}[F]} gT dµ

for every integrable function g and measurable set F.

263B Remark Perhaps I should have warned you that I should be calling on the results of §235. If they were fresh in your mind, the formulae of the statement of the theorem will have recalled them. If not, it is perhaps better to turn back to them now rather than before reading the theorem, since they are used only in the last sentence of the proof.

I have taken the argument above at a leisurely, not to say pedestrian, pace. The translation-invariance of Lebesgue measure, and its behaviour under simple magnification of a single coordinate, are more or less built into the definition. Its behaviour under general rotations is not, since a rotation takes half-open intervals into skew cuboids. Of course the calculation of the measure of such an object is not really anything to do with the Lebesgue theory, and much of the argument would apply equally to any geometrically reasonable notion of r-dimensional volume.

We come now to the central result of the chapter. We have already done some of the detail work in 262M. The next basic element is the following lemma.

263C Lemma Let T be any r × r matrix; set J = |det T|.
Then for any ε > 0 there is a ζ = ζ(T, ε) > 0 such that

(i) |det S − det T| ≤ ε whenever S is an r × r matrix and ‖S − T‖ ≤ ζ;

(ii) whenever D ⊆ R^r is a bounded set and φ : D → R^r is a function such that ‖φ(x) − φ(y) − T(x − y)‖ ≤ ζ‖x − y‖ for all x, y ∈ D, then |µ∗φ[D] − Jµ∗D| ≤ εµ∗D.

proof (a) Of course (i) is the easy part. det S is a continuous function of the coefficients of S, and the coefficients of S must be close to those of T if ‖S − T‖ is small (262Hb). So there is surely a ζ0 > 0 such that |det S − det T| ≤ ε whenever ‖S − T‖ ≤ ζ0.

(b)(i) Write B = B(0, 1) for the unit ball of R^r, and consider T[B]. We know that µT[B] = JµB (263A). Let G ⊇ T[B] be an open set such that µG ≤ (J + ε)µB (134Fa). Because B is compact (2A2F) so is T[B], so there is a ζ1 > 0 such that T[B] + ζ1B ⊆ G (2A2Ed). This means that µ∗(T[B] + ζ1B) ≤ (J + ε)µB.

(ii) Now suppose that D ⊆ R^r is a bounded set, and that φ : D → R^r is a function such that ‖φ(x) − φ(y) − T(x − y)‖ ≤ ζ1‖x − y‖ for all x, y ∈ D. Then if x ∈ D and δ > 0,

φ[D ∩ B(x, δ)] ⊆ φ(x) + δT[B] + δζ1B.

This is because if y ∈ D ∩ B(x, δ) then T(y − x) ∈ δT[B] and

φ(y) = φ(x) + T(y − x) + (φ(y) − φ(x) − T(y − x)) ∈ φ(x) + δT[B] + ζ1‖y − x‖B ⊆ φ(x) + δT[B] + ζ1δB.

Accordingly

µ∗φ[D ∩ B(x, δ)] ≤ µ∗(δT[B] + δζ1B) = δ^r µ∗(T[B] + ζ1B) ≤ δ^r(J + ε)µB = (J + ε)µB(x, δ).

Let η > 0. By 261F there is a sequence ⟨Bn⟩n∈N of balls in R^r with the following properties: D ⊆ ⋃_{n∈N} Bn; Σ_{n=0}^∞ µBn ≤ µ∗D + η; and the sum of the measures of those Bn whose centres do not lie in D is at most η. Let K be the
set of those n such that the centre of Bn lies in D. Then µ∗φ[D ∩ Bn] ≤ (J + ε)µBn for every n ∈ K. Also, of course, φ is (‖T‖ + ζ1)-Lipschitz, so µ∗φ[D ∩ Bn] ≤ (‖T‖ + ζ1)^r µBn for n ∈ N \ K (262D). Now

$$\mu^*\phi[D] \le \sum_{n=0}^{\infty}\mu^*\phi[D\cap B_n] \le \sum_{n\in K}(J+\epsilon)\mu B_n + \sum_{n\in\mathbb{N}\setminus K}(\|T\|+\zeta_1)^r\mu B_n \le (J+\epsilon)(\mu^*D+\eta) + \eta(\|T\|+\zeta_1)^r.$$

As η is arbitrary, µ∗φ[D] ≤ (J + ε)µ∗D.

(c) If J = 0, we can stop here, setting ζ = min(ζ0, ζ1). For then we surely have |det S − det T| ≤ ε whenever ‖S − T‖ ≤ ζ. Moreover, if φ : D → R^r is such that ‖φ(x) − φ(y) − T(x − y)‖ ≤ ζ‖x − y‖ for all x, y ∈ D, then |µ∗φ[D] − Jµ∗D| = µ∗φ[D] ≤ εµ∗D.

If J ≠ 0, we have more to do. Because T has non-zero determinant, it has an inverse T^{−1}, and |det T^{−1}| = J^{−1}. As in (b-i) above, there is a ζ2 > 0 such that µ∗(T^{−1}[B] + ζ2B) ≤ (J^{−1} + ε′)µB, where ε′ = ε/(J(J + ε)). Repeating (b), we obtain the following: if C ⊆ R^r is bounded and ψ : C → R^r is such that ‖ψ(u) − ψ(v) − T^{−1}(u − v)‖ ≤ ζ2‖u − v‖ for all u, v ∈ C, then µ∗ψ[C] ≤ (J^{−1} + ε′)µ∗C.

Now set ζ′2 = min(ζ2, ‖T^{−1}‖)/(2‖T^{−1}‖²) > 0. Suppose that D ⊆ R^r is bounded and φ : D → R^r is such that ‖φ(x) − φ(y) − T(x − y)‖ ≤ ζ′2‖x − y‖ for all x, y ∈ D. Then

‖T^{−1}(φ(x) − φ(y)) − (x − y)‖ ≤ ‖T^{−1}‖ζ′2‖x − y‖ ≤ ½‖x − y‖

for all x, y ∈ D, so φ must be injective; set C = φ[D] and ψ = φ^{−1} : C → D. Note that C is bounded, because ‖φ(x) − φ(y)‖ ≤ (‖T‖ + ζ′2)‖x − y‖ whenever x, y ∈ D. Also

‖T^{−1}(u − v) − (ψ(u) − ψ(v))‖ ≤ ‖T^{−1}‖ζ′2‖ψ(u) − ψ(v)‖ ≤ ½‖ψ(u) − ψ(v)‖

for all u, v ∈ C. But this means that

‖ψ(u) − ψ(v)‖ − ‖T^{−1}‖‖u − v‖ ≤ ½‖ψ(u) − ψ(v)‖

and ‖ψ(u) − ψ(v)‖ ≤ 2‖T^{−1}‖‖u − v‖ for all u, v ∈ C. Hence

‖ψ(u) − ψ(v) − T^{−1}(u − v)‖ ≤ 2ζ′2‖T^{−1}‖²‖u − v‖ ≤ ζ2‖u − v‖

for all u, v ∈ C. By the result of repeating (b), just above, it follows that

µ∗D = µ∗ψ[C] ≤ (J^{−1} + ε′)µ∗C = (J^{−1} + ε′)µ∗φ[D],

and Jµ∗D ≤ (1 + Jε′)µ∗φ[D].

(d) So set ζ = min(ζ0, ζ1, ζ′2) > 0. Let D ⊆ R^r and φ : D → R^r be such that D is bounded and ‖φ(x) − φ(y) − T(x − y)‖ ≤ ζ‖x − y‖ for all x, y ∈ D. Then we shall have

µ∗φ[D] ≤ (J + ε)µ∗D,
µ∗φ[D] ≥ Jµ∗D − Jε′µ∗φ[D] ≥ Jµ∗D − Jε′(J + ε)µ∗D = Jµ∗D − εµ∗D,

so we get the required formula
312
Change of variable in the integral
263C
$$|\mu^*\phi[D] - J\mu^*D| \le \varepsilon\mu^*D.$$

263D
We are ready for the theorem.
Theorem Let $D \subseteq \mathbb{R}^r$ be any set, and $\phi : D \to \mathbb{R}^r$ a function differentiable relative to its domain at each point of $D$. For each $x \in D$ let $T(x)$ be a derivative of $\phi$ relative to $D$ at $x$, and set $J(x) = |\det T(x)|$. Then
(i) $J : D \to [0,\infty[$ is a measurable function,
(ii) $\mu^*\phi[D] \le \int_D J\,d\mu$, allowing $\infty$ as the value of the integral.
If $D$ is measurable, then
(iii) $\phi[D]$ is measurable.
If $D$ is measurable and $\phi$ is injective, then
(iv) $\mu\phi[D] = \int_D J\,d\mu$,
(v) for every real-valued function $g$ defined on a subset of $\phi[D]$,
$$\int_{\phi[D]} g\,d\mu = \int_D J\times g\phi\,d\mu$$
if either integral is defined in $[-\infty,\infty]$, provided we interpret $J(x)g(\phi(x))$ as zero when $J(x) = 0$ and $g(\phi(x))$ is undefined.

proof (a) To see that $J$ is measurable, use 262P; the function $T \mapsto |\det T|$ is a continuous function of the coefficients of $T$, and the coefficients of $T(x)$ are measurable functions of $x$, by 262P, so $x \mapsto |\det T(x)|$ is measurable (121K). We also know that if $D$ is measurable, $\phi[D]$ will be measurable, by 262Ob. Thus (i) and (iii) are done.

(b) For the moment, assume that $D$ is bounded, and fix $\varepsilon > 0$. For $r\times r$ matrices $T$, take $\zeta(T,\varepsilon) > 0$ as in 263C. Take $\langle D_n\rangle_{n\in\mathbb{N}}$, $\langle T_n\rangle_{n\in\mathbb{N}}$ as in 262M, so that $\langle D_n\rangle_{n\in\mathbb{N}}$ is a disjoint cover of $D$ by sets which are relatively measurable in $D$, and each $T_n$ is an $r\times r$ matrix such that
$$\|T(x) - T_n\| \le \zeta(T_n,\varepsilon) \text{ whenever } x \in D_n,$$
$$\|\phi(x) - \phi(y) - T_n(x-y)\| \le \zeta(T_n,\varepsilon)\|x-y\| \text{ for all } x, y \in D_n.$$
Then, setting $J_n = |\det T_n|$, we have
$$|J(x) - J_n| \le \varepsilon \text{ for every } x \in D_n, \qquad |\mu^*\phi[D_n] - J_n\mu^*D_n| \le \varepsilon\mu^*D_n,$$
by the choice of $\zeta(T_n,\varepsilon)$. So we have
$$\int_D J\,d\mu \le \sum_{n=0}^\infty J_n\mu^*D_n + \varepsilon\mu^*D \le \int_D J\,d\mu + 2\varepsilon\mu^*D;$$
I am using here the fact that all the $D_n$ are relatively measurable in $D$, so that, in particular, $\mu^*D = \sum_{n=0}^\infty\mu^*D_n$. Next,
$$\mu^*\phi[D] \le \sum_{n=0}^\infty\mu^*\phi[D_n] \le \sum_{n=0}^\infty J_n\mu^*D_n + \varepsilon\mu^*D.$$
Putting these together,
$$\mu^*\phi[D] \le \int_D J\,d\mu + 2\varepsilon\mu^*D.$$
If $D$ is measurable and $\phi$ is injective, then all the $D_n$ are measurable subsets of $\mathbb{R}^r$, so all the $\phi[D_n]$ are measurable, and they are also disjoint. Accordingly
$$\int_D J\,d\mu \le \sum_{n=0}^\infty J_n\mu D_n + \varepsilon\mu D \le \sum_{n=0}^\infty(\mu\phi[D_n] + \varepsilon\mu D_n) + \varepsilon\mu D = \mu\phi[D] + 2\varepsilon\mu D.$$
Since $\varepsilon$ is arbitrary, we get $\mu^*\phi[D] \le \int_D J\,d\mu$, and if $D$ is measurable and $\phi$ is injective, $\int_D J\,d\mu \le \mu\phi[D]$; thus we have (ii) and (iv), on the assumption that $D$ is bounded.
(c) For a general set $D$, set $B_k = B(0,k)$; then
$$\mu^*\phi[D] = \lim_{k\to\infty}\mu^*\phi[D\cap B_k] \le \lim_{k\to\infty}\int_{D\cap B_k}J\,d\mu = \int_D J\,d\mu,$$
with equality if $\phi$ is injective and $D$ is measurable.

(d) For part (v), I seek to show that the hypotheses of 235L are satisfied, taking $X = D$ and $Y = \phi[D]$. P Set $G = \{x : x \in D,\ J(x) > 0\}$.

($\alpha$) If $F \subseteq \phi[D]$ is measurable, then there are Borel sets $F_1$, $F_2$ such that $F_1 \subseteq F \subseteq F_2$ and $\mu(F_2\setminus F_1) = 0$. Set $E_j = \phi^{-1}[F_j]$ for each $j$, so that $E_1 \subseteq \phi^{-1}[F] \subseteq E_2$, and both the sets $E_j$ are measurable, because $\phi$ and $\operatorname{dom}\phi$ are measurable. Now, applying (iv) to $\phi\restriction E_j$,
$$\int_{E_j}J\,d\mu = \mu\phi[E_j] = \mu(F_j\cap\phi[D]) = \mu F$$
for both $j$, so $\int_{E_2\setminus E_1}J\,d\mu = 0$ and $J = 0$ a.e. on $E_2\setminus E_1$. Accordingly $J\times\chi(\phi^{-1}[F]) =_{\text{a.e.}} J\times\chi E_1$, and $\int J\times\chi(\phi^{-1}[F])\,d\mu$ exists and is equal to $\int_{E_1}J\,d\mu = \mu F$. At the same time, $(\phi^{-1}[F]\cap G)\,\triangle\,(E_1\cap G)$ is negligible, so $\phi^{-1}[F]\cap G$ is measurable.

($\beta$) If $F \subseteq \phi[D]$ and $G\cap\phi^{-1}[F]$ is measurable, then we know that $\mu\phi[D\setminus G] = \int_{D\setminus G}J = 0$ (by (iv)), so $F\setminus\phi[G]$ must be negligible; while $F\cap\phi[G] = \phi[G\cap\phi^{-1}[F]]$ is also measurable, by (iii). Accordingly $F$ is measurable whenever $G\cap\phi^{-1}[F]$ is measurable.

Thus all the hypotheses of 235L are satisfied. Q

Now (v) can be read off from the conclusion of 235L.

263E Remarks (a) This is a version of the classical result on change of variable in a many-dimensional integral. What I here call $J(x)$ is the Jacobian of $\phi$ at $x$; it describes the change in volumes of objects near $x$, following the rule already established in 263A for functions with constant derivative. The idea of the proof is also the classical one: to break the set $D$ up into small enough pieces $D_m$ for us to be able to approximate $\phi$ by affine operators $y \mapsto \phi(x) + T_m(y-x)$ on each. The potential irregularity of the set $D$, which in this theorem may be any set, is compensated for by a corresponding freedom in choosing the sets $D_m$.
In fact there is a further decomposition of the sets $D_m$ hidden in part (b-ii) of the proof of 263C; each $D_m$ is essentially covered by a disjoint family of balls, the measures of whose images we can estimate with adequate accuracy. There is always a danger of a negligible exceptional set, and we need the crude inequalities of the proof of 262D to deal with it.

(b) Throughout the work of this chapter, from 261B to 263D, I have chosen balls $B(x,\delta)$ as the basic shapes to work with. I think it should be clear that in fact any reasonable shapes would do just as well. In particular, the 'balls'
$$B_1(x,\delta) = \Bigl\{y : \sum_{i=1}^r|\eta_i - \xi_i| \le \delta\Bigr\}, \qquad B_\infty(x,\delta) = \{y : |\eta_i - \xi_i| \le \delta\ \forall\,i\}$$
would serve perfectly. There are many alternatives. We could use sets $C(x,k)$, for $x \in \mathbb{R}^r$ and $k \in \mathbb{N}$, defined to be the half-open cube of the form $[2^{-k}z, 2^{-k}(z+\mathbf{1})[$ with $z \in \mathbb{Z}^r$ containing $x$, instead; or even $C'(x,\delta) = [x, x+\delta\mathbf{1}[$. In all such cases we have versions of the density theorems (261Yb-261Yc) which support the remaining theory.

(c) I have presented 263D as a theorem about differentiable functions, because that is the normal form in which one uses it in elementary applications. However, the proof depends essentially on the fact that a differentiable function is a countable union of Lipschitz functions, and 263D would follow at once from the same theorem proved for Lipschitz functions only. Now the fact is that the theorem applies to any countable union of Lipschitz functions, because a Lipschitz function is differentiable almost everywhere. For more advanced work (see Federer 69 or Evans & Gariepy 92, or Chapter 47 in Volume 4) it seems clear that Lipschitz functions are the vital ones, so I spell out the result.

*263F Corollary Let $D \subseteq \mathbb{R}^r$ be any set and $\phi : D \to \mathbb{R}^r$ a Lipschitz function. Let $D_1$ be the set of points at which $\phi$ has a derivative relative to $D$, and for each $x \in D_1$ let $T(x)$ be such a derivative, with $J(x) = |\det T(x)|$. Then
(i) $D\setminus D_1$ is negligible;
(ii) $J : D_1 \to [0,\infty[$ is measurable;
(iii) $\mu^*\phi[D] \le \int_D J(x)\,dx$.
If $D$ is measurable, then
(iv) $\phi[D]$ is measurable.
If $D$ is measurable and $\phi$ is injective, then
(v) $\mu\phi[D] = \int_D J\,d\mu$,
(vi) for every real-valued function $g$ defined on a subset of $\phi[D]$,
$$\int_{\phi[D]} g\,d\mu = \int_D J\times g\phi\,d\mu$$
if either integral is defined in $[-\infty,\infty]$, provided we interpret $J(x)g(\phi(x))$ as zero when $J(x) = 0$ and $g(\phi(x))$ is undefined.

proof This is now just a matter of putting 262Q and 263D together, with a little help from 262D. Use 262Q to show that $D\setminus D_1$ is negligible, 262D to show that $\phi[D\setminus D_1]$ is negligible, and apply 263D to $\phi\restriction D_1$.

263G Polar coordinates in the plane I offer an elementary example with a useful consequence.

Define $\phi : \mathbb{R}^2 \to \mathbb{R}^2$ by setting $\phi(\rho,\theta) = (\rho\cos\theta, \rho\sin\theta)$ for $\rho, \theta \in \mathbb{R}$. Then
$$\phi'(\rho,\theta) = \begin{pmatrix}\cos\theta & -\rho\sin\theta\\ \sin\theta & \rho\cos\theta\end{pmatrix},$$
so $J(\rho,\theta) = |\rho|$ for all $\rho$, $\theta$. Of course $\phi$ is not injective, but if we restrict it to the domain $D = \{(0,0)\}\cup\{(\rho,\theta) : \rho > 0,\ -\pi < \theta \le \pi\}$ then $\phi\restriction D$ is a bijection between $D$ and $\mathbb{R}^2$, and
$$\int g\,d\xi_1\,d\xi_2 = \int_D g(\phi(\rho,\theta))\,\rho\,d\rho\,d\theta$$
for every real-valued function $g$ which is integrable over $\mathbb{R}^2$. Suppose, in particular, that we set $g(x) = e^{-\|x\|^2/2}$ for $x = (\xi_1,\xi_2) \in \mathbb{R}^2$. Then
$$g(x) = e^{-\xi_1^2/2}e^{-\xi_2^2/2}, \qquad \int g(x)\,dx = \int e^{-\xi_1^2/2}\,d\xi_1\int e^{-\xi_2^2/2}\,d\xi_2,$$
as in 253D. Setting $I = \int_{\mathbb{R}}e^{-t^2/2}\,dt$, we have $\int g = I^2$. (To see that $I$ is well-defined in $\mathbb{R}$, note that the integrand is continuous, therefore measurable, and that
$$\int_{-1}^{1}e^{-t^2/2}\,dt \le 2, \qquad \int_{-\infty}^{-1}e^{-t^2/2}\,dt = \int_1^\infty e^{-t^2/2}\,dt \le \int_1^\infty e^{-t/2}\,dt = \lim_{a\to\infty}\int_1^a e^{-t/2}\,dt = 2e^{-1/2}$$
are both finite.) Now looking at the alternative expression we have
$$I^2 = \int g(x)\,dx = \int_D g(\rho\cos\theta, \rho\sin\theta)\,\rho\,d(\rho,\theta) = \int_D e^{-\rho^2/2}\rho\,d(\rho,\theta) = \int_0^\infty\int_{-\pi}^{\pi}\rho e^{-\rho^2/2}\,d\theta\,d\rho$$
(ignoring the point $(0,0)$, which has zero measure)
$$= \int_0^\infty 2\pi\rho e^{-\rho^2/2}\,d\rho = 2\pi\lim_{a\to\infty}\int_0^a\rho e^{-\rho^2/2}\,d\rho = 2\pi\lim_{a\to\infty}\bigl(-e^{-a^2/2} + 1\bigr) = 2\pi.$$
Consequently
$$\int_{-\infty}^{\infty}e^{-t^2/2}\,dt = I = \sqrt{2\pi},$$
which is one of the many facts every mathematician should know, and in particular is vital for Chapter 27 below.
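As a minimal numerical sanity check of 263G (not part of the original argument), the following Python sketch approximates $I$ and the polar-coordinate integral $\int_0^\infty 2\pi\rho e^{-\rho^2/2}\,d\rho$ with a hand-written composite Simpson rule, and estimates $\det\phi'(\rho,\theta)$ by central differences. The helper names `simpson` and `polar_jacobian`, the truncation of the integrals to $[-12, 12]$ and $[0, 12]$ (the neglected tails are of order $e^{-72}$), and the step sizes are all assumptions made for illustration.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule for f on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def polar_jacobian(rho, theta, h=1e-6):
    """Central-difference estimate of det phi'(rho, theta), phi(rho, theta) = (rho cos theta, rho sin theta)."""
    def phi(r, t):
        return (r * math.cos(t), r * math.sin(t))

    x_rp, y_rp = phi(rho + h, theta)
    x_rm, y_rm = phi(rho - h, theta)
    x_tp, y_tp = phi(rho, theta + h)
    x_tm, y_tm = phi(rho, theta - h)
    dx_drho, dy_drho = (x_rp - x_rm) / (2 * h), (y_rp - y_rm) / (2 * h)
    dx_dtheta, dy_dtheta = (x_tp - x_tm) / (2 * h), (y_tp - y_tm) / (2 * h)
    # determinant of the 2x2 matrix with columns d/drho and d/dtheta
    return dx_drho * dy_dtheta - dx_dtheta * dy_drho


# I = integral of exp(-t^2/2) over R, truncated to [-12, 12]
I = simpson(lambda t: math.exp(-t * t / 2), -12.0, 12.0)
# Polar side: integral over rho in [0, 12] of 2*pi*rho*exp(-rho^2/2)
polar = simpson(lambda r: 2 * math.pi * r * math.exp(-r * r / 2), 0.0, 12.0)

print(I * I, polar, 2 * math.pi)  # the three values agree to many digits
print(polar_jacobian(1.5, 0.7))   # approximately rho = 1.5
```

The signed determinant returned is $\rho$ itself, so $J(\rho,\theta) = |\rho|$ as in the text; a negative $\rho$ gives a negative determinant.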
263H Corollary If $k \in \mathbb{N}$ is odd,
$$\int_{-\infty}^{\infty}x^ke^{-x^2/2}\,dx = 0;$$
if $k = 2l \in \mathbb{N}$ is even, then
$$\int_{-\infty}^{\infty}x^ke^{-x^2/2}\,dx = \frac{(2l)!}{2^l\,l!}\sqrt{2\pi}.$$

proof (a) To see that all the integrals are well-defined and finite, observe that $\lim_{x\to\pm\infty}x^ke^{-x^2/4} = 0$, so that $M_k = \sup_{x\in\mathbb{R}}|x^ke^{-x^2/4}|$ is finite, and
$$\int_{-\infty}^{\infty}|x^ke^{-x^2/2}|\,dx \le M_k\int_{-\infty}^{\infty}e^{-x^2/4}\,dx < \infty.$$

(b) If $k$ is odd, then substituting $y = -x$ we get
$$\int_{-\infty}^{\infty}x^ke^{-x^2/2}\,dx = -\int_{-\infty}^{\infty}y^ke^{-y^2/2}\,dy,$$
so that both integrals must be zero.

(c) For even $k$, proceed by induction. Set $I_l = \int_{-\infty}^{\infty}x^{2l}e^{-x^2/2}\,dx$. Then $I_0 = \sqrt{2\pi} = \frac{0!}{2^0\,0!}\sqrt{2\pi}$ by 263G. For the inductive step to $l+1 \ge 1$, integrate by parts to see that
$$\int_{-a}^{a}x^{2l+1}\cdot xe^{-x^2/2}\,dx = -a^{2l+1}e^{-a^2/2} + (-a)^{2l+1}e^{-a^2/2} + \int_{-a}^{a}(2l+1)x^{2l}e^{-x^2/2}\,dx$$
for every $a \ge 0$. Letting $a \to \infty$, $I_{l+1} = (2l+1)I_l$. Because
$$\frac{(2(l+1))!}{2^{l+1}(l+1)!}\sqrt{2\pi} = (2l+1)\frac{(2l)!}{2^l\,l!}\sqrt{2\pi},$$
the induction proceeds.

263I The one-dimensional case The restriction to injective functions $\phi$ in 263D(v) is unavoidable in the context of the result there. But in the substitutions of elementary calculus it is not always essential. In the hope of clarifying the position I give a result here which covers many of the standard tricks.

Theorem Let $I \subseteq \mathbb{R}$ be an interval with more than one point, and $\phi : I \to \mathbb{R}$ a function which is absolutely continuous on any closed bounded subinterval of $I$. Write $u = \inf I$, $u' = \sup I$ in $[-\infty,\infty]$, and suppose that $v = \lim_{x\downarrow u}\phi(x)$ and $v' = \lim_{x\uparrow u'}\phi(x)$ are defined in $[-\infty,\infty]$. Let $g$ be a Lebesgue measurable real-valued function defined almost everywhere on $\phi[I]$. Then
$$\int_v^{v'}g = \int_I g(\phi(x))\phi'(x)\,dx$$
whenever the right-hand side is defined in $\mathbb{R}$, on the understanding that we interpret $\int_v^{v'}g$ as $-\int_{v'}^{v}g$ when $v' < v$, and $g(\phi(x))\phi'(x)$ as $0$ when $\phi'(x) = 0$ and $g(\phi(x))$ is undefined.

proof (a) Recall that $\phi$ is differentiable almost everywhere on $I$ (225Cb) and that $\phi[A]$ is negligible for every negligible $A \subseteq I$ (225G). (These results are stated for closed bounded intervals; but since any interval is expressible as the union of a sequence of closed bounded intervals, they remain valid in the present context.) Set $D = \operatorname{dom}\phi'$, so that $I\setminus D$ and $\phi[I\setminus D]$ are negligible. Next, setting $D_0 = \{x : x \in D,\ \phi'(x) = 0\}$, $D$ and $D_0$ are Borel sets (225J) and $\phi[D_0]$ is negligible, by 263D(ii), while
$$\int_I g(\phi(x))\phi'(x)\,dx = \int_{D\setminus D_0}g(\phi(x))\phi'(x)\,dx.$$
Applying 262M with $A = \mathbb{R}\setminus\{0\}$ and $\zeta(\alpha) = \frac12|\alpha|$ for $\alpha \in A$, we have sequences $\langle E_n\rangle_{n\in\mathbb{N}}$, $\langle\alpha_n\rangle_{n\in\mathbb{N}}$ such that $\langle E_n\rangle_{n\in\mathbb{N}}$ is a disjoint cover of $D\setminus D_0$ by measurable sets, every $\alpha_n$ is non-zero, and
$$|\phi(x) - \phi(y) - \alpha_n(x-y)| \le \tfrac12|\alpha_n||x-y| \text{ for all } x, y \in E_n;$$
so that, in particular, $\phi\restriction E_n$ is injective, while $\operatorname{sgn}\phi'(x) = \operatorname{sgn}\alpha_n$ for every $x \in E_n$, writing $\operatorname{sgn}\alpha = \alpha/|\alpha|$ as usual. Set $\varepsilon_n = \operatorname{sgn}\alpha_n$ for each $n$. Now 263D(v) tells us that
$$\sum_{n=0}^\infty\int|g|\times\chi(\phi[E_n]) = \sum_{n=0}^\infty\int_{E_n}|g(\phi(x))\phi'(x)|\,dx$$
is finite.
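Note that 263D(v) also shows that if $B \subseteq \mathbb{R}$ is negligible, then $E_n\cap\phi^{-1}[B]$ must be negligible for every $n$, so that $\int_{\phi^{-1}[B]}g(\phi(x))\phi'(x)\,dx = 0$. Consequently, setting
$$C_0 = \Bigl\{y : y \in (\phi[I]\cap\operatorname{dom}g)\setminus(\{v,v'\}\cup\phi[I\setminus D]\cup\phi[D_0]),\ \sum_{n=0}^\infty|g(y)\chi(\phi[E_n])(y)| < \infty\Bigr\},$$
$\phi[I]\setminus C_0$ is negligible, and if we set $C = \{y : y \in C_0,\ g(y) \ne 0\}$, then $\int_J g = \int_{J\cap C}g$ for every $J \subseteq \phi[I]$.

(b) The point of the argument is the following fact: if $y \in C$ then
$$\sum_{n=0}^\infty\varepsilon_n\chi(\phi[E_n])(y) = \begin{cases}1 & \text{if } v < y < v',\\ -1 & \text{if } v' < y < v,\\ 0 & \text{if } y < \min(v,v') \text{ or } y > \max(v,v').\end{cases}$$
P Because $g(y) \ne 0$ and $\sum_{n=0}^\infty|g(y)\chi(\phi[E_n])(y)|$ is finite, $\{n : y \in \phi[E_n]\}$ is finite; because $y \notin \phi[I\setminus D]\cup\phi[D_0]$, and $\phi\restriction E_n$ is injective for every $n$, and $\bigcup_{n\in\mathbb{N}}E_n = D\setminus D_0$, $K = \phi^{-1}[\{y\}]$ is finite. For each $x \in K$, let $n_x$ be such that $x \in E_{n_x}$; then $\varepsilon_{n_x} = \operatorname{sgn}\phi'(x)$. So $\sum_{n=0}^\infty\varepsilon_n\chi(\phi[E_n])(y) = \sum_{x\in K}\operatorname{sgn}\phi'(x)$.

If $J \subseteq \mathbb{R}\setminus K$ is an interval, $\phi(z) \ne y$ for $z \in J$; since $\phi$ is continuous, the Intermediate Value Theorem tells us that $\operatorname{sgn}(\phi(z) - y)$ is constant on $J$. A simple induction on $\#(K\cap\,]-\infty,z[)$ shows that $\operatorname{sgn}(\phi(z) - y) = \operatorname{sgn}(v - y) + 2\sum_{x\in K,\,x<z}$ […] $\alpha\}\,d\alpha$ is finite, and in this case $\|f\|_p = \gamma^{1/p}$. (Hint: $\int|f|^p = \int_0^\infty\mu^*\{x : |f(x)|^p > \beta\}\,d\beta$, by 252O; now substitute $\beta = \alpha^p$.)

(b) Let $f$ be an integrable function defined almost everywhere on $\mathbb{R}^r$. Show that if $\alpha < r - 1$ then $\sum_{n=1}^\infty n^\alpha|f(nx)|$ is finite for almost every $x \in \mathbb{R}^r$. (Hint: estimate $\sum_{n=1}^\infty n^\alpha\int_B|f(nx)|\,dx$ for any ball $B$ centred at the origin.)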
(c) Let $A \subseteq\ ]0,1[$ be a set such that $\mu^*A = \mu^*([0,1]\setminus A) = 1$, where $\mu$ is Lebesgue measure on $\mathbb{R}$. Set $D = A\cup\{-x : x \in\ ]0,1[\,\setminus A\} \subseteq [-1,1]$, and set $\phi(x) = |x|$ for $x \in D$. Show that $\phi$ is injective, that $\phi$ is differentiable relative to its domain everywhere in $D$, and that $\mu^*\phi[D] < \int_D|\phi'(x)|\,dx$.

(d) Let $\phi : D \to \mathbb{R}^r$ be a function differentiable relative to $D$ at each point of $D \subseteq \mathbb{R}^r$, and suppose that for each $x \in D$ there is a non-singular derivative $T(x)$ of $\phi$ at $x$; set $J(x) = |\det T(x)|$. Show that $D$ is expressible as $\bigcup_{k\in\mathbb{N}}D_k$ where $D_k = D\cap D_k$ and $\phi\restriction D_k$ is injective for each $k$.

(e)(i) Show that for any Lebesgue measurable $E \subseteq \mathbb{R}$ and $t \in \mathbb{R}\setminus\{0\}$, $\int_{tE}\frac{1}{|u|}\,du = \int_E\frac{1}{|u|}\,du$. (ii) For $t \in \mathbb{R}$, $u \in \mathbb{R}\setminus\{0\}$ set $\phi(t,u) = (\frac{t}{u}, u)$. Show that $\int_{\phi[E]}\frac{1}{|tu|}\,d(t,u) = \int_E\frac{1}{|tu|}\,d(t,u)$ for any Lebesgue measurable $E \subseteq \mathbb{R}^2$.

(f) Define $\phi : \mathbb{R}^3 \to \mathbb{R}^3$ by setting $\phi(\rho,\theta,\alpha) = (\rho\sin\theta\cos\alpha, \rho\sin\theta\sin\alpha, \rho\cos\theta)$. Show that $\det\phi'(\rho,\theta,\alpha) = \rho^2\sin\theta$.

(g) Show that if $k = 2l+1$ is odd, then
$$\int_0^\infty x^ke^{-x^2/2}\,dx = 2^l\,l!.$$
(Compare 252Xh.)
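The moment formula of 263H and exercise 263Xg can likewise be checked numerically. This is a hedged sketch that only illustrates the identities and proves nothing: it uses a hand-written Simpson rule, the names `simpson`, `gaussian_moment` and `even_moment_formula` are hypothetical, and the integrals are truncated to $[-20, 20]$ or $[0, 20]$, where the neglected tails are negligible.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson rule for f on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return total * h / 3


def gaussian_moment(k, a=-20.0, b=20.0):
    """Approximate the integral of x**k * exp(-x**2/2) over [a, b]."""
    return simpson(lambda x: x ** k * math.exp(-x * x / 2), a, b)


def even_moment_formula(l):
    """Right-hand side of 263H for k = 2l: (2l)! / (2^l l!) * sqrt(2 pi)."""
    return math.factorial(2 * l) / (2 ** l * math.factorial(l)) * math.sqrt(2 * math.pi)


for l in range(5):
    print(2 * l, gaussian_moment(2 * l), even_moment_formula(l))

# odd k: the grid is symmetric about 0 and the integrand is odd, so this is ~0
print(gaussian_moment(3))

# 263Xg with k = 5 = 2l + 1, l = 2: the half-line integral should be 2^2 * 2! = 8
half_line = simpson(lambda x: x ** 5 * math.exp(-x * x / 2), 0.0, 20.0)
print(half_line)
```

The recursion $I_{l+1} = (2l+1)I_l$ from the proof is visible in the printed values: each even moment is the previous one times $2l+1$.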
263Y Further exercises (a) Define a measure $\nu$ on $\mathbb{R}$ by setting $\nu E = \int_E\frac{1}{|x|}\,dx$ for Lebesgue measurable sets $E \subseteq \mathbb{R}$. For $f, g \in \mathcal{L}^1(\nu)$ set $(f*g)(x) = \int f(\frac{x}{t})g(t)\,\nu(dt)$ whenever this is defined in $\mathbb{R}$. (i) Show that $f*g = g*f \in \mathcal{L}^1(\nu)$. (ii) Show that $\int h(x)(f*g)(x)\,\nu(dx) = \iint h(xy)f(x)g(y)\,\nu(dx)\,\nu(dy)$ for every $h \in \mathcal{L}^\infty(\nu)$. (iii) Show that $f*(g*h) = (f*g)*h$ for every $h \in \mathcal{L}^1(\nu)$. (Hint: 263Xe.)

(b) Let $E \subseteq \mathbb{R}^2$ be a measurable set such that $\limsup_{\alpha\to\infty}\frac{1}{\alpha^2}\mu_2(E\cap B(0,\alpha)) > 0$, writing $\mu_2$ for Lebesgue measure on $\mathbb{R}^2$. Show that there is some $\theta \in\ ]-\pi,\pi]$ such that $\mu_1E_\theta = \infty$, where $E_\theta = \{\rho : \rho \ge 0,\ (\rho\cos\theta, \rho\sin\theta) \in E\}$. (Hint: show that $\frac{1}{\alpha^2}\mu_2(E\cap B(0,\alpha)) \le \int_{-\pi}^{\pi}\min(\frac12, \frac{1}{\alpha}\mu_1E_\theta)\,d\theta$.) Generalize to higher dimensions and to functions other than $\chi E$.

(c) Let $E \subseteq \mathbb{R}^r$ be a measurable set, and $\phi : E \to \mathbb{R}^r$ a function differentiable relative to its domain, with a derivative $T(x)$, at each point $x$ of $E$; set $J(x) = |\det T(x)|$. Show that for any integrable function $g$ defined on $\phi[E]$,
$$\int g(y)\,\#(\phi^{-1}[\{y\}])\,dy = \int_E J(x)g(\phi(x))\,dx.$$
(Hint: 263I.)

(d) Find a proof of 263I based on the ideas of §225. (Hint: 225Xg.)

263 Notes and comments Yet again, approaching 263D, I find myself having to choose between giving an accessible, relatively weak result and making the extra effort to set out a theorem which is somewhere near the natural boundary of what is achievable within the concepts being developed in this volume; and, as usual, I go for the more powerful form. There are three basic sources of difficulty: (i) the fact that we are dealing with more than one dimension; (ii) the fact that we are dealing with irregular domains; (iii) the fact that we are dealing with arbitrary integrable functions. I do not think I need to apologise for (iii) in a book on measure theory.
Concerning (ii), it is quite true that the principal applications of these results are to cases in which the transformation φ is differentiable everywhere, with continuous derivative, and the set D has negligible boundary; and in these cases there are substantial simplifications available – mostly because the sets Dm of the proof of 263D can be taken to be cubes. Nevertheless, I think any form of the result which makes such assumptions is deeply unsatisfactory at this level, being an awkward compromise between ideas natural to the Riemann integral and those natural to the Lebesgue integral. Concerning (i), it might even have been right to lay out the whole argument for the case r = 1 before proceeding to the general case, as I did in §§114-115, because the one-dimensional case is already important and interesting; and if you find the work above difficult – which it is – and your immediate interests are in one-dimensional integration by substitution, then I think you might find it worth your time to reproduce the r = 1 argument
yourself, up to a proof of 263I. In fact the biggest difference is in 263A, which becomes nearly trivial; the work of 262M and 263C becomes more readable, because all the matrices turn into scalars and we can drop the word ‘determinant’, but I do not think we can dispense with any of the ideas, at least if we wish to obtain 263D as stated. (But see 263Yd.) I found myself insisting, in the last paragraph, that a distinction can be made between ‘ideas natural to the Riemann integral and those natural to the Lebesgue integral’. We are approaching deep questions here, like ‘what are books on measure theory for?’, which I do not think can be answered without some – possibly unconscious – reference to the question ‘what is mathematics for?’. I do of course want to present here some of the wonderful general theorems which arise in the Lebesgue theory. But more important than any specific theorem is a general idea of what can be proved by these methods. It is the essence of modern measure theory that continuity does not matter, or, if you prefer, that measurable functions are in some sense so nearly continuous that we do not have to add hypotheses of continuity in our theorems. Now this is in a sense a great liberation, and the Lebesgue integral is now the standard one. But you must not regard the Riemann integral as outdated. The intuitions on which it is founded – for instance, that the surface of a solid body has zero volume – remain of great value in their proper context, which certainly includes the study of differentiable functions with continuous derivatives. What I am saying here is that I believe we can use these intuitions best if we maintain a division, a flexible and permeable one, of course, between the ideas of the two theories; and that when transferring a theorem from one side of the boundary to the other we should do so whole-heartedly, seeking to express the full power of the methods we are using. 
I have already said that the essential difference between the one-dimensional and multi-dimensional cases lies in 263A, where the Jacobian $J = |\det T|$ enters the argument. Shorn of the technical devices necessary to deal with arbitrary Lebesgue measurable sets, this amounts to a calculation of the volume of the parallelepiped $T[I]$ where $I$ is the interval $[0,1[$. I have dealt with this by a little bit of algebra, saying that the result is essentially obvious if $T$ is diagonal, whereas if $T$ is an isometry it follows from the fact that the unit ball is left invariant; and the algebra comes in to express an arbitrary matrix as a product of diagonal and orthogonal matrices (2A6C). It is also plain from 261F that Lebesgue measure must be rotation-invariant as well as translation-invariant; that is to say, it is invariant under all isometries. Another way of looking at this will appear in the next section.

I feel myself that the centre of the argument for 263D is in the lemma 263C. This is where we turn the exact result for linear operators into an approximate result for almost-linear functions; and the whole point of differentiability is that a differentiable function is well approximated, in a neighbourhood of any point of its domain, by a linear operator. The lemma involves two rather different ideas. To show that $\mu^*\phi[D] \le (J+\varepsilon)\mu^*D$, we look first at balls and then use Vitali's theorem to see that $D$ is economically covered by balls, so that an upper bound for $\mu^*\phi[D]$ in terms of a sum $\sum_{B\in\mathcal{I}_0}\mu^*\phi[D\cap B]$ is adequate. To obtain a lower bound, we need to reverse the argument by looking at $\psi = \phi^{-1}$, which involves checking first that $\phi$ is invertible, and then that $\psi$ is appropriately linked to $T^{-1}$.
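The claim that $J = |\det T|$ is the factor by which $T$ scales volume can be illustrated in the plane. The sketch below is a hypothetical helper, not taken from the text: it estimates the area of $T[[0,1[^2]$ by testing, for each midpoint of an $n\times n$ grid over a bounding box, whether its preimage under $T^{-1}$ lies in $[0,1[^2$. The grid method and the default $n = 400$ are assumptions chosen for illustration; the error is at most the total area of the grid cells that meet the boundary of the parallelogram, which shrinks like $1/n$.

```python
def parallelogram_area_by_grid(T, n=400):
    """Estimate the area of T[[0,1)^2] for a 2x2 matrix T = ((a, b), (c, d)).

    A point (x, y) lies in T[[0,1)^2] exactly when T^{-1}(x, y) lies in [0,1)^2,
    so we count midpoints of an n-by-n grid of cells covering the image.
    """
    (a, b), (c, d) = T
    det = a * d - b * c
    if det == 0:
        return 0.0  # the image lies in a line, which has Lebesgue measure 0
    # The image is the parallelogram spanned by T(1,0) = (a, c) and T(0,1) = (b, d).
    corners = [(0.0, 0.0), (a, c), (b, d), (a + b, c + d)]
    x0 = min(p[0] for p in corners)
    x1 = max(p[0] for p in corners)
    y0 = min(p[1] for p in corners)
    y1 = max(p[1] for p in corners)
    hx, hy = (x1 - x0) / n, (y1 - y0) / n
    inside = 0
    for i in range(n):
        x = x0 + (i + 0.5) * hx
        for j in range(n):
            y = y0 + (j + 0.5) * hy
            # T^{-1} = (1/det) * ((d, -b), (-c, a))
            u = (d * x - b * y) / det
            v = (-c * x + a * y) / det
            if 0.0 <= u < 1.0 and 0.0 <= v < 1.0:
                inside += 1
    return inside * hx * hy


T = ((2.0, 1.0), (0.5, 3.0))
print(parallelogram_area_by_grid(T), abs(2.0 * 3.0 - 1.0 * 0.5))  # estimate vs |det T| = 5.5
```

A rotation such as $((0.6, -0.8), (0.8, 0.6))$ gives an estimate close to $1$, matching the invariance of Lebesgue measure under isometries mentioned above.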
I have written out exact formulae for $\varepsilon'$, $\zeta_2'$ and so on, but this is only in case you do not trust your intuition; the fact that $\|\phi^{-1}(u) - \phi^{-1}(v) - T^{-1}(u-v)\|$ is small compared with $\|u - v\|$ is pretty clearly a consequence of the hypothesis that $\|\phi(x) - \phi(y) - T(x-y)\|$ is small compared with $\|x - y\|$.

The argument of 263D itself is now a matter of breaking the set $D$ up into appropriate pieces on each of which $\phi$ is sufficiently nearly linear for 263C to apply, so that
$$\mu^*\phi[D] \le \sum_{m=0}^\infty\mu^*\phi[D_m] \le \sum_{m=0}^\infty(J_m + \varepsilon)\mu^*D_m.$$
With a little care (taken in 263C, with its condition (i)), we can also ensure that the Jacobian $J$ is well approximated by $J_m$ almost everywhere on $D_m$, so that $\sum_{m=0}^\infty J_m\mu^*D_m \approx \int_D J(x)\,dx$. These ideas, joined with the results of §262, bring us to the point
$$\int_E J\,d\mu = \mu\phi[E]$$
when $\phi$ is injective and $E \subseteq D$ is measurable. We need a final trick, involving Borel sets, to translate this into
$$\int_{\phi^{-1}[F]}J\,d\mu = \mu F$$
whenever $F \subseteq \phi[D]$ is measurable, which is what is needed for the application of 235L.

I hope that you long ago saw, and were delighted by, the device in 263G. Once again, this is not really
264B
Hausdorff measures
319
Lebesgue integration; but I include it just to show that the machinery of this chapter can be turned to deal with the classical results, and that indeed we have a tiny profit from our labour, in that no apology need be made for the boundary of the set $D$ into which the polar coordinate system maps the plane. I have already given the actual result as an exercise in 252Xh. That involved (if you chase through the references) a one-dimensional substitution (performed in 225Xj), Fubini's theorem and an application of the formulae of §235; that is to say, very much the same elements as those used above, though in a different order. I could present this with no mention of differentiation in higher dimensions because the first change of variable was in one dimension, and the second (involving the function $x \mapsto \|x\|$, in 252Xh(i)) was of a particularly simple type, so that a different method could be used to find the function $J$.

The abstract ideas to which this treatise is devoted do not, indeed, lead us to many particular examples on which to practise the ideas of this section. The ones which do arise tend to be very straightforward, as in 263G, 263Xa-263Xb and 263Xe. I mention the last because it provides a formula needed to discuss a new type of convolution (263Ya). In effect, this depends on the multiplicative group $\mathbb{R}\setminus\{0\}$ in place of the additive group $\mathbb{R}$ treated in §255. The formula $\frac{1}{x}$ in the definition of $\nu$ is of course the derivative of $\ln x$, and $\ln$ is an isomorphism between $(]0,\infty[\,, \cdot, \nu)$ and $(\mathbb{R}, +, \text{Lebesgue measure})$.
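The dilation invariance of $\nu$ behind 263Xe(i) and the convolution of 263Ya can be checked numerically on intervals. In this sketch the helper names `simpson` and `nu_interval` are hypothetical and Simpson's rule is an assumed choice of quadrature; it compares $\nu([a,b])$ with $\nu(t[a,b])$ for several non-zero $t$ (including a negative one) and with $\ln(b/a)$, reflecting the role of $\ln$ as an isomorphism.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule for f on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return total * h / 3


def nu_interval(p, q):
    """nu of the closed interval between p and q (assumed not to contain 0): integral of 1/|x|."""
    lo, hi = min(p, q), max(p, q)
    return simpson(lambda x: 1.0 / abs(x), lo, hi)


a, b = 0.5, 3.0
for t in (2.0, 0.1, -7.0):
    # t * [a, b] is the interval with endpoints t*a and t*b (reversed when t < 0)
    print(t, nu_interval(t * a, t * b), nu_interval(a, b), math.log(b / a))
```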
264 Hausdorff measures The next topic I wish to approach is the question of 'surface measure'; a useful example to bear in mind throughout this section and the next is the notion of area for regions on the sphere, but any other smoothly curved two-dimensional surface in three-dimensional space will serve equally well. It is I think more than plausible that our intuitive concepts of 'area' for such surfaces should correspond to appropriate measures. But the formalisation of this intuition is non-trivial, especially if we seek the generality that simple geometric ideas lead us to; I mean, not contenting ourselves with arguments that depend on the special nature of the sphere, for instance, to describe spherical surface area. I divide the problem into two parts. In this section I will describe a construction which enables us to define the $r$-dimensional measure of an $r$-dimensional surface – among other things – in $s$-dimensional space. In the next section I will set out the basic theorems making it possible to calculate these measures effectively in the leading cases.

264A Definitions Let $s \ge 1$ be an integer, and $r > 0$. (I am primarily concerned with integral $r$, but will not insist on this until it becomes necessary, since there are some very interesting ideas which involve non-integral 'dimension' $r$.) For any $A \subseteq \mathbb{R}^s$, $\delta > 0$ set
$$\theta_{r\delta}A = \inf\Bigl\{\sum_{n=0}^\infty(\operatorname{diam}A_n)^r : \langle A_n\rangle_{n\in\mathbb{N}} \text{ is a sequence of subsets of } \mathbb{R}^s \text{ covering } A,\ \operatorname{diam}A_n \le \delta \text{ for every } n \in \mathbb{N}\Bigr\}.$$
It is convenient in this context to say that $\operatorname{diam}\emptyset = 0$. Now set $\theta_rA = \sup_{\delta>0}\theta_{r\delta}A$; $\theta_r$ is $r$-dimensional Hausdorff outer measure on $\mathbb{R}^s$.

264B
Of course we must immediately check the following:
Lemma $\theta_r$, as defined in 264A, is always an outer measure.

proof You should be used to these arguments by now, but there is an extra step in this one, so I spell out the details.

(a) Interpreting the diameter of the empty set as $0$, we have $\theta_{r\delta}\emptyset = 0$ for every $\delta > 0$, so $\theta_r\emptyset = 0$.

(b) If $A \subseteq B \subseteq \mathbb{R}^s$, then every sequence covering $B$ also covers $A$, so $\theta_{r\delta}A \le \theta_{r\delta}B$ for every $\delta$ and $\theta_rA \le \theta_rB$.
(c) Let $\langle A_n\rangle_{n\in\mathbb{N}}$ be a sequence of subsets of $\mathbb{R}^s$ with union $A$, and take any $a < \theta_rA$. Then there is a $\delta > 0$ such that $a \le \theta_{r\delta}A$. Now $\theta_{r\delta}A \le \sum_{n=0}^\infty\theta_{r\delta}A_n$. P Let $\varepsilon > 0$, and for each $n \in \mathbb{N}$ choose a sequence $\langle A_{nm}\rangle_{m\in\mathbb{N}}$ of sets, covering $A_n$, with $\operatorname{diam}A_{nm} \le \delta$ for every $m$ and $\sum_{m=0}^\infty(\operatorname{diam}A_{nm})^r \le \theta_{r\delta}A_n + 2^{-n}\varepsilon$. Then $\langle A_{nm}\rangle_{m,n\in\mathbb{N}}$ is a cover of $A$ by countably many sets of diameter at most $\delta$, so
$$\theta_{r\delta}A \le \sum_{n=0}^\infty\sum_{m=0}^\infty(\operatorname{diam}A_{nm})^r \le \sum_{n=0}^\infty(\theta_{r\delta}A_n + 2^{-n}\varepsilon) = 2\varepsilon + \sum_{n=0}^\infty\theta_{r\delta}A_n.$$
As $\varepsilon$ is arbitrary, we have the result. Q Accordingly
$$a \le \theta_{r\delta}A \le \sum_{n=0}^\infty\theta_{r\delta}A_n \le \sum_{n=0}^\infty\theta_rA_n.$$
As $a$ is arbitrary, $\theta_rA \le \sum_{n=0}^\infty\theta_rA_n$; as $\langle A_n\rangle_{n\in\mathbb{N}}$ is arbitrary, $\theta_r$ is an outer measure.

264C Definition If $s \ge 1$ is an integer, and $r > 0$, then Hausdorff $r$-dimensional measure on $\mathbb{R}^s$ is the measure $\mu_{Hr}$ on $\mathbb{R}^s$ defined by Carathéodory's method from the outer measure $\theta_r$ of 264A-264B.

264D Remarks (a) It is important to note that the sets used in the definition of the $\theta_{r\delta}$ need not be balls; even in $\mathbb{R}^2$ not every set $A$ can be covered by a ball of the same diameter as $A$.

(b) In the definitions above I require $r > 0$. It is sometimes appropriate to take $\mu_{H0}$ to be counting measure. This is nearly the result of applying the formulae above with $r = 0$, but there can be difficulties if we interpret them over-literally.

(c) All Hausdorff measures must be complete, because they are defined by Carathéodory's method (212A). For $r > 0$, they are all atomless (264Yg). In terms of the other criteria of §211, however, they are very ill-behaved; for instance, if $r$, $s$ are integers and $1 \le r < s$, then $\mu_{Hr}$ on $\mathbb{R}^s$ is not semi-finite. (I will give a proof of this in §439 in Volume 4.) Nevertheless, they do have some striking properties which make them reasonably tractable.

(d) In 264A, note that $\theta_{r\delta}A \le \theta_{r\delta'}A$ when $0 < \delta' \le \delta$; consequently, for instance, $\theta_rA = \lim_{n\to\infty}\theta_{r,2^{-n}}A$. I have allowed arbitrary sets $A_n$ in the covers, but it makes no difference if we restrict our attention to covers consisting of open sets or of closed sets (264Xc).

264E Theorem Let $s \ge 1$ be an integer, and $r \ge 0$; let $\mu_{Hr}$ be Hausdorff $r$-dimensional measure on $\mathbb{R}^s$, and $\Sigma_{Hr}$ its domain. Then every Borel subset of $\mathbb{R}^s$ belongs to $\Sigma_{Hr}$.

proof This is trivial if $r = 0$; so suppose henceforth that $r > 0$.

(a) The first step is to note that if $A$, $B$ are subsets of $\mathbb{R}^s$ and $\eta > 0$ is such that $\|x - y\| \ge \eta$ for all $x \in A$, $y \in B$, then $\theta_r(A\cup B) = \theta_rA + \theta_rB$, where $\theta_r$ is $r$-dimensional Hausdorff outer measure on $\mathbb{R}^s$. P Of course $\theta_r(A\cup B) \le \theta_rA + \theta_rB$, because $\theta_r$ is an outer measure.
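For the reverse inequality, we may suppose that $\theta_r(A\cup B) < \infty$, so that $\theta_rA$ and $\theta_rB$ are both finite. Let $\varepsilon > 0$ and let $\delta_1, \delta_2 > 0$ be such that $\theta_rA + \theta_rB \le \theta_{r\delta_1}A + \theta_{r\delta_2}B + \varepsilon$. Set $\delta = \min(\delta_1, \delta_2, \frac12\eta) > 0$ and let $\langle A_n\rangle_{n\in\mathbb{N}}$ be a sequence of sets of diameter at most $\delta$, covering $A\cup B$, and such that $\sum_{n=0}^\infty(\operatorname{diam}A_n)^r \le \theta_{r\delta}(A\cup B) + \varepsilon$. Set
$$K = \{n : A_n\cap A \ne \emptyset\}, \qquad L = \{n : A_n\cap B \ne \emptyset\}.$$
Because $\|x - y\| \ge \eta > \operatorname{diam}A_n$ whenever $x \in A$, $y \in B$ and $n \in \mathbb{N}$, $K\cap L = \emptyset$; and of course $A \subseteq \bigcup_{n\in K}A_n$, $B \subseteq \bigcup_{n\in L}A_n$. Consequently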
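$$\theta_rA + \theta_rB \le \varepsilon + \theta_{r\delta_1}A + \theta_{r\delta_2}B \le \varepsilon + \sum_{n\in K}(\operatorname{diam}A_n)^r + \sum_{n\in L}(\operatorname{diam}A_n)^r \le \varepsilon + \sum_{n=0}^\infty(\operatorname{diam}A_n)^r \le 2\varepsilon + \theta_{r\delta}(A\cup B) \le 2\varepsilon + \theta_r(A\cup B).$$
As $\varepsilon$ is arbitrary, $\theta_r(A\cup B) \ge \theta_rA + \theta_rB$, as required. Q

(b) It follows that $\theta_rA = \theta_r(A\cap G) + \theta_r(A\setminus G)$ whenever $A \subseteq \mathbb{R}^s$ and $G$ is open. P As usual, it is enough to consider the case $\theta_rA < \infty$ and to show that in this case $\theta_r(A\cap G) + \theta_r(A\setminus G) \le \theta_rA$. Set
$$A_n = \{x : x \in A,\ \|x - y\| \ge 2^{-n} \text{ for every } y \in A\setminus G\}, \qquad B_0 = A_0, \qquad B_n = A_n\setminus A_{n-1} \text{ for } n \ge 1.$$
Observe that $A_n \subseteq A_{n+1}$ for every $n$ and $\bigcup_{n\in\mathbb{N}}A_n = \bigcup_{n\in\mathbb{N}}B_n = A\cap G$. The point is that if $m, n \in \mathbb{N}$ and $n \ge m + 2$, and if $x \in B_m$ and $y \in B_n$, then there is a $z \in A\setminus G$ such that $\|y - z\| < 2^{-n+1} \le 2^{-m-1}$, while $\|x - z\|$ must be at least $2^{-m}$, so $\|x - y\| \ge \|x - z\| - \|y - z\| \ge 2^{-m-1}$. It follows that for any $k \ge 0$
$$\sum_{m=0}^k\theta_rB_{2m} = \theta_r\Bigl(\bigcup_{m\le k}B_{2m}\Bigr) \le \theta_r(A\cap G) < \infty, \qquad \sum_{m=0}^k\theta_rB_{2m+1} = \theta_r\Bigl(\bigcup_{m\le k}B_{2m+1}\Bigr) \le \theta_r(A\cap G) < \infty$$
(inducing on $k$, using (a) above for the inductive step). Consequently $\sum_{n=0}^\infty\theta_rB_n < \infty$. But now, given $\varepsilon > 0$, there is an $m$ such that $\sum_{n=m}^\infty\theta_rB_n \le \varepsilon$, so that
$$\theta_r(A\cap G) + \theta_r(A\setminus G) \le \theta_rA_m + \sum_{n=m}^\infty\theta_rB_n + \theta_r(A\setminus G) \le \varepsilon + \theta_rA_m + \theta_r(A\setminus G) = \varepsilon + \theta_r(A_m\cup(A\setminus G))$$
(by (a) again, since $\|x - y\| \ge 2^{-m}$ for $x \in A_m$, $y \in A\setminus G$)
$$\le \varepsilon + \theta_rA.$$
As $\varepsilon$ is arbitrary, $\theta_r(A\cap G) + \theta_r(A\setminus G) \le \theta_rA$, as required. Q

(c) Part (b) shows exactly that open sets belong to $\Sigma_{Hr}$. It follows at once that the Borel $\sigma$-algebra of $\mathbb{R}^s$ is included in $\Sigma_{Hr}$, as claimed.

264F Proposition Let $s \ge 1$ be an integer, and $r > 0$; let $\theta_r$ be $r$-dimensional Hausdorff outer measure on $\mathbb{R}^s$, and write $\mu_{Hr}$ for $r$-dimensional Hausdorff measure on $\mathbb{R}^s$, $\Sigma_{Hr}$ for its domain. Then
(a) for every $A \subseteq \mathbb{R}^s$ there is a Borel set $E \supseteq A$ such that $\mu_{Hr}E = \theta_rA$;
(b) $\theta_r = \mu^*_{Hr}$, the outer measure defined from $\mu_{Hr}$;
(c) if $E \in \Sigma_{Hr}$ is expressible as a countable union of sets of finite measure, there are Borel sets $E'$, $E''$ such that $E' \subseteq E \subseteq E''$ and $\mu_{Hr}(E''\setminus E') = 0$.

proof (a) If $\theta_rA = \infty$ this is trivial – take $E = \mathbb{R}^s$. Otherwise, for each $n \in \mathbb{N}$ choose a sequence $\langle A_{nm}\rangle_{m\in\mathbb{N}}$ of sets of diameter at most $2^{-n}$, covering $A$, and such that $\sum_{m=0}^\infty(\operatorname{diam}A_{nm})^r \le \theta_{r,2^{-n}}A + 2^{-n}$. Set
$$F_{nm} = \overline{A_{nm}}, \qquad E = \bigcap_{n\in\mathbb{N}}\bigcup_{m\in\mathbb{N}}F_{nm};$$
then $E$ is a Borel set in $\mathbb{R}^s$. Of course
$$A \subseteq \bigcap_{n\in\mathbb{N}}\bigcup_{m\in\mathbb{N}}A_{nm} \subseteq \bigcap_{n\in\mathbb{N}}\bigcup_{m\in\mathbb{N}}F_{nm} = E.$$
For any $n \in \mathbb{N}$, $\operatorname{diam}F_{nm} = \operatorname{diam}A_{nm} \le 2^{-n}$ for every $m \in \mathbb{N}$,
$$\sum_{m=0}^\infty(\operatorname{diam}F_{nm})^r = \sum_{m=0}^\infty(\operatorname{diam}A_{nm})^r \le \theta_{r,2^{-n}}A + 2^{-n},$$
so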
$$\theta_{r,2^{-n}}E \le \theta_{r,2^{-n}}A + 2^{-n}.$$
Letting $n \to \infty$,
$$\theta_rE = \lim_{n\to\infty}\theta_{r,2^{-n}}E \le \lim_{n\to\infty}(\theta_{r,2^{-n}}A + 2^{-n}) = \theta_rA;$$
of course it follows that $\theta_rA = \theta_rE$, because $A \subseteq E$. Now by 264E we know that $E \in \Sigma_{Hr}$, so we can write $\mu_{Hr}E$ in place of $\theta_rE$.

(b) This follows at once, because (writing $\Sigma$ for the domain of $\mu_{Hr}$) we have
$$\mu^*_{Hr}A = \inf\{\mu_{Hr}E : E \in \Sigma,\ A \subseteq E\} = \inf\{\theta_rE : E \in \Sigma,\ A \subseteq E\} \ge \theta_rA$$
for every $A \subseteq \mathbb{R}^s$. On the other hand, if $A \subseteq \mathbb{R}^s$, we have a Borel set $E \supseteq A$ such that $\theta_rA = \mu_{Hr}E$, so that $\mu^*_{Hr}A \le \mu_{Hr}E = \theta_rA$.

(c)(i) Suppose first that $\mu_{Hr}E < \infty$. By (a), there are Borel sets $E'' \supseteq E$, $H \supseteq E''\setminus E$ such that $\mu_{Hr}E'' = \theta_rE$,
$$\mu_{Hr}H = \theta_r(E''\setminus E) = \mu_{Hr}(E''\setminus E) = \mu_{Hr}E'' - \mu_{Hr}E = \mu_{Hr}E'' - \theta_rE = 0.$$
So setting $E' = E''\setminus H$, we obtain a Borel set included in $E$, and $\mu_{Hr}(E''\setminus E') \le \mu_{Hr}H = 0$.

(ii) For the general case, express $E$ as $\bigcup_{n\in\mathbb{N}}E_n$ where $\mu_{Hr}E_n < \infty$ for each $n$; take Borel sets $E_n'$, $E_n''$ such that $E_n' \subseteq E_n \subseteq E_n''$ and $\mu_{Hr}(E_n''\setminus E_n') = 0$ for each $n$; and set $E' = \bigcup_{n\in\mathbb{N}}E_n'$, $E'' = \bigcup_{n\in\mathbb{N}}E_n''$.

264G Lipschitz functions The definition of Hausdorff measure is exactly adapted to the following result, corresponding to 262D.

Proposition Let $m, s \ge 1$ be integers, and $\phi : D \to \mathbb{R}^s$ a $\gamma$-Lipschitz function, where $D$ is a subset of $\mathbb{R}^m$. Then for any $A \subseteq D$ and $r \ge 0$,
$$\mu^*_{Hr}(\phi[A]) \le \gamma^r\mu^*_{Hr}A,$$
writing $\mu_{Hr}$ for $r$-dimensional Hausdorff measure on either $\mathbb{R}^m$ or $\mathbb{R}^s$.

proof (a) The case $r = 0$ is trivial, since then $\gamma^r = 1$ and $\mu^*_{Hr}A = \mu_{H0}A = \#(A)$ if $A$ is finite, $\infty$ otherwise, and $\#(\phi[A]) \le \#(A)$.

(b) If $r > 0$, then take any $\delta > 0$. Set $\eta = \delta/(1+\gamma)$ and consider $\theta_{r\eta} : \mathcal{P}\mathbb{R}^m \to [0,\infty]$, defined as in 264A. We know from 264Fb that $\mu^*_{Hr}A = \theta_rA \ge \theta_{r\eta}A$, so there is a sequence $\langle A_n\rangle_{n\in\mathbb{N}}$ of sets, all of diameter at most $\eta$, covering $A$, with $\sum_{n=0}^\infty(\operatorname{diam}A_n)^r \le \mu^*_{Hr}A + \delta$. Now $\phi[A] \subseteq \bigcup_{n\in\mathbb{N}}\phi[A_n\cap D]$ and $\operatorname{diam}\phi[A_n\cap D] \le \gamma\operatorname{diam}A_n \le \gamma\eta \le \delta$ for every $n$. Consequently
$$\theta_{r\delta}(\phi[A]) \le \sum_{n=0}^\infty(\operatorname{diam}\phi[A_n\cap D])^r \le \sum_{n=0}^\infty\gamma^r(\operatorname{diam}A_n)^r \le \gamma^r(\mu^*_{Hr}A + \delta),$$
and $\mu^*_{Hr}(\phi[A]) = \lim_{\delta\downarrow0}\theta_{r\delta}(\phi[A]) \le \gamma^r\mu^*_{Hr}A$, as claimed.

264H The next step is to relate $r$-dimensional Hausdorff measure on $\mathbb{R}^r$ to Lebesgue measure on $\mathbb{R}^r$. The basic fact we need is the following, which is even more important for the idea in its proof than for the result.
Theorem Let $r\ge1$ be an integer, and $A$ a bounded subset of $\mathbb{R}^r$; write $\mu_r$ for Lebesgue measure on $\mathbb{R}^r$ and $d=\operatorname{diam}A$. Then
$$\mu_r^*(A)\le\mu_rB(0,\tfrac d2)=2^{-r}\beta_rd^r,$$
where $B(0,\frac d2)$ is the ball with centre $0$ and diameter $d$, so that $B(0,1)$ is the unit ball in $\mathbb{R}^r$, and has measure
$$\beta_r=\frac{\pi^k}{k!}\ \text{ if } r=2k \text{ is even},\qquad\beta_r=\frac{2^{2k+1}k!\,\pi^k}{(2k+1)!}\ \text{ if } r=2k+1 \text{ is odd}.$$
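The two parity cases for $\beta_r$ can be checked numerically against the single closed form $\beta_r=\pi^{r/2}/\Gamma(1+r/2)$ (cf. 252Xh). The sketch below is only an illustration, not part of the proof; the function names `beta_parity`, `beta_gamma` and `isodiametric_bound` are our own.

```python
import math


def beta_parity(r: int) -> float:
    """Lebesgue measure of the unit ball in R^r, via the even/odd formulas of 264H."""
    k, odd = divmod(r, 2)
    if not odd:  # r = 2k
        return math.pi ** k / math.factorial(k)
    # r = 2k + 1
    return 2 ** (2 * k + 1) * math.factorial(k) * math.pi ** k / math.factorial(2 * k + 1)


def beta_gamma(r: float) -> float:
    """The same quantity as pi^(r/2) / Gamma(1 + r/2); this form also makes sense for non-integral r."""
    return math.pi ** (r / 2) / math.gamma(1 + r / 2)


def isodiametric_bound(r: int, d: float) -> float:
    """Right-hand side 2^(-r) * beta_r * d^r of 264H, the measure of a ball of diameter d."""
    return 2.0 ** (-r) * beta_parity(r) * d ** r


if __name__ == "__main__":
    for r in range(1, 6):
        print(r, beta_parity(r), beta_gamma(r))
    # The unit square has diameter sqrt(2), so its area 1 cannot exceed this bound (about 1.5708).
    print(isodiametric_bound(2, math.sqrt(2)))
```

The parity form is what the theorem states; the Gamma form is a convenient cross-check.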
proof (a) For the calculation of $\beta_r$, see 252Q or 252Xh.

(b) The case $r=1$ is elementary, for in this case $A$ is included in an interval of length $\operatorname{diam}A$, so that $\mu_1^*A\le\operatorname{diam}A$. So henceforth let us suppose that $r\ge2$.

(c) For $1\le i\le r$ let $S_i:\mathbb{R}^r\to\mathbb{R}^r$ be reflection in the $i$th coordinate, so that $S_ix=(\xi_1,\dots,\xi_{i-1},-\xi_i,\xi_{i+1},\dots,\xi_r)$ for every $x=(\xi_1,\dots,\xi_r)\in\mathbb{R}^r$. Let us say that a set $C\subseteq\mathbb{R}^r$ is symmetric in coordinates in $J$, where $J\subseteq\{1,\dots,r\}$, if $S_i[C]=C$ for $i\in J$. Now the centre of the argument is the following fact: if $C\subseteq\mathbb{R}^r$ is a bounded set which is symmetric in coordinates in $J$, where $J$ is a proper subset of $\{1,\dots,r\}$, and $j\in\{1,\dots,r\}\setminus J$, then there is a set $D$, symmetric in coordinates in $J\cup\{j\}$, such that $\operatorname{diam}D\le\operatorname{diam}C$ and $\mu_r^*C\le\mu_r^*D$.

▶ (i) Because Lebesgue measure is invariant under permutation of coordinates, it is enough to deal with the case $j=r$. Start by writing $F=\overline C$, so that $\operatorname{diam}F=\operatorname{diam}C$ and $\mu_rF\ge\mu_r^*C$. Note that because $S_i$ is a homeomorphism for every $i$, $S_i[F]=S_i[\overline C]=\overline{S_i[C]}=\overline C=F$ for $i\in J$, and $F$ is symmetric in coordinates in $J$. For $y=(\eta_1,\dots,\eta_{r-1})\in\mathbb{R}^{r-1}$, set
$$F_y=\{\xi:(\eta_1,\dots,\eta_{r-1},\xi)\in F\},\qquad f(y)=\mu_1F_y,$$
where $\mu_1$ is Lebesgue measure on $\mathbb{R}$. Set
$$D=\{(y,\xi):y\in\mathbb{R}^{r-1},\ |\xi| < \tfrac12f(y)\}\subseteq\mathbb{R}^r.$$

(ii) If $H\subseteq\mathbb{R}^r$ is measurable and $H\supseteq D$, then, writing $\mu_{r-1}$ for Lebesgue measure on $\mathbb{R}^{r-1}$, we have
$$\begin{aligned}\mu_rH&=\int\mu_1\{\xi:(y,\xi)\in H\}\,\mu_{r-1}(dy)\qquad\text{(using 251M and 252D)}\\&\ge\int\mu_1\{\xi:(y,\xi)\in D\}\,\mu_{r-1}(dy)=\int f(y)\,\mu_{r-1}(dy)\\&=\int\mu_1\{\xi:(y,\xi)\in F\}\,\mu_{r-1}(dy)=\mu_rF\ge\mu_r^*C.\end{aligned}$$
As $H$ is arbitrary, $\mu_r^*D\ge\mu_r^*C$.

(iii) The next step is to check that $\operatorname{diam}D\le\operatorname{diam}C$. If $x,x'\in D$, express them as $(y,\xi_r)$ and $(y',\xi_r')$. Because $F$ is a bounded closed set in $\mathbb{R}^r$, $F_y$ and $F_{y'}$ are bounded closed subsets of $\mathbb{R}$. Also both $f(y)$ and $f(y')$ must be greater than $0$, so that $F_y$, $F_{y'}$ are both non-empty. Consequently
$$\alpha=\inf F_y,\qquad\beta=\sup F_y,\qquad\alpha'=\inf F_{y'},\qquad\beta'=\sup F_{y'}$$
are all defined in $\mathbb{R}$, and $\alpha,\beta\in F_y$, while $\alpha'$ and $\beta'$ belong to $F_{y'}$. We have
$$|\xi_r-\xi_r'|\le|\xi_r|+|\xi_r'| < \tfrac12f(y)+\tfrac12f(y')=\tfrac12(\mu_1F_y+\mu_1F_{y'})\le\tfrac12(\beta-\alpha+\beta'-\alpha')\le\max(\beta'-\alpha,\beta-\alpha').$$
So, taking $(\xi,\xi')$ to be one of $(\alpha,\beta')$ or $(\beta,\alpha')$, we can find $\xi\in F_y$, $\xi'\in F_{y'}$ such that $|\xi-\xi'|\ge|\xi_r-\xi_r'|$. Now $z=(y,\xi)$ and $z'=(y',\xi')$ both belong to $F$, so
$$\|x-x'\|^2=\|y-y'\|^2+|\xi_r-\xi_r'|^2\le\|y-y'\|^2+|\xi-\xi'|^2=\|z-z'\|^2\le(\operatorname{diam}F)^2,$$
and $\|x-x'\|\le\operatorname{diam}F$. As $x$, $x'$ are arbitrary, $\operatorname{diam}D\le\operatorname{diam}F=\operatorname{diam}C$, as claimed.

(iv) Evidently $S_r[D]=D$. Moreover, if $i\in J$, then (interpreting $S_i$ as an operator on $\mathbb{R}^{r-1}$) $F_{S_i(y)}=F_y$ for every $y\in\mathbb{R}^{r-1}$, so $f(S_i(y))=f(y)$ and, for $\xi\in\mathbb{R}$, $y\in\mathbb{R}^{r-1}$,
$$(y,\xi)\in D\iff|\xi| < \tfrac12f(y)\iff|\xi| < \tfrac12f(S_i(y))\iff(S_i(y),\xi)\in D,$$
so that $S_i[D]=D$. Thus $D$ is symmetric in coordinates in $J\cup\{r\}$. ◀

(d) The rest is easy. Starting from any bounded $A\subseteq\mathbb{R}^r$, set $A_0=A$ and construct inductively $A_1,\dots,A_r$ such that
$$d=\operatorname{diam}A=\operatorname{diam}A_0\ge\operatorname{diam}A_1\ge\dots\ge\operatorname{diam}A_r,\qquad\mu_r^*A=\mu_r^*A_0\le\dots\le\mu_r^*A_r,$$
and $A_j$ is symmetric in coordinates in $\{1,\dots,j\}$ for every $j\le r$. At the end, we have $A_r$ symmetric in coordinates in $\{1,\dots,r\}$. But this means that if $x\in A_r$ then $-x=S_1S_2\dots S_rx\in A_r$, so that
$$\|x\|=\tfrac12\|x-(-x)\|\le\tfrac12\operatorname{diam}A_r\le\tfrac d2.$$
Thus $A_r\subseteq B(0,\frac d2)$, and
$$\mu_r^*A\le\mu_r^*A_r\le\mu_rB(0,\tfrac d2),$$
as claimed.

264I Theorem Let $r\ge1$ be an integer; let $\mu$ be Lebesgue measure on $\mathbb{R}^r$, and let $\mu_{Hr}$ be $r$-dimensional Hausdorff measure on $\mathbb{R}^r$. Then $\mu$ and $\mu_{Hr}$ have the same measurable sets and $\mu E=2^{-r}\beta_r\mu_{Hr}E$ for every measurable set $E\subseteq\mathbb{R}^r$, where $\beta_r=\mu B(0,1)$, so that the normalizing factor is
$$2^{-r}\beta_r=\frac{\pi^k}{2^{2k}k!}\ \text{ if } r=2k \text{ is even},\qquad2^{-r}\beta_r=\frac{k!\,\pi^k}{(2k+1)!}\ \text{ if } r=2k+1 \text{ is odd}.$$
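The normalizing factor can be tabulated the same way. As a further illustration of our own (not part of the proof), covering $[0,1[^r$ by the $n^r$ subcubes of side $1/n$ gives the cover sum $\sum(\operatorname{diam}Q)^r=r^{r/2}$ whatever $n$ is, and multiplying by $2^{-r}\beta_r$ yields the measure of the ball circumscribing the cube, which exceeds $1=\mu[0,1[^r$ once $r\ge2$. So cubes are wasteful covers, which is why part (b)(i) of the proof below works with balls (via 261F). The names `normalizing_factor` and `cube_grid_cover_sum` are hypothetical.

```python
import math


def normalizing_factor(r: int) -> float:
    """2^(-r) * beta_r, using the even/odd formulas stated in 264I."""
    k, odd = divmod(r, 2)
    if not odd:  # r = 2k
        return math.pi ** k / (2 ** (2 * k) * math.factorial(k))
    # r = 2k + 1
    return math.factorial(k) * math.pi ** k / math.factorial(2 * k + 1)


def cube_grid_cover_sum(r: int, n: int) -> float:
    """Sum of (diam Q)^r over the n^r subcubes Q of side 1/n tiling [0,1[^r."""
    diam = math.sqrt(r) / n      # diameter of one subcube
    return n ** r * diam ** r    # equals r^(r/2), independent of n


if __name__ == "__main__":
    for r in range(1, 5):
        # Second column: normalized cover sum = measure of the ball circumscribing the cube.
        print(r, normalizing_factor(r), normalizing_factor(r) * cube_grid_cover_sum(r, 10))
```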
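proof (a) Of course if $B=B(x,\alpha)$ is any ball of radius $\alpha$, $2^{-r}\beta_r(\operatorname{diam}B)^r=\beta_r\alpha^r=\mu B$.

(b) The point is that $\mu^*=2^{-r}\beta_r\mu_{Hr}^*$. ▶ Let $A\subseteq\mathbb{R}^r$.

(i) Let $\delta,\epsilon > 0$. By 261F, there is a sequence $\langle B_n\rangle_{n\in\mathbb{N}}$ of balls, all of diameter at most $\delta$, such that $A\subseteq\bigcup_{n\in\mathbb{N}}B_n$ and $\sum_{n=0}^\infty\mu B_n\le\mu^*A+\epsilon$. Now, defining $\theta_{r\delta}$ as in 264A,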
$$2^{-r}\beta_r\theta_{r\delta}(A)\le2^{-r}\beta_r\sum_{n=0}^\infty(\operatorname{diam}B_n)^r=\sum_{n=0}^\infty\mu B_n\le\mu^*A+\epsilon.$$
Letting $\delta\downarrow0$, $2^{-r}\beta_r\mu_{Hr}^*A\le\mu^*A+\epsilon$. As $\epsilon$ is arbitrary, $2^{-r}\beta_r\mu_{Hr}^*A\le\mu^*A$.

(ii) Let $\epsilon > 0$. Then there is a sequence $\langle A_n\rangle_{n\in\mathbb{N}}$ of sets of diameter at most $1$ such that $A\subseteq\bigcup_{n\in\mathbb{N}}A_n$ and $\sum_{n=0}^\infty(\operatorname{diam}A_n)^r\le\theta_{r1}A+\epsilon$, so that
$$\mu^*A\le\sum_{n=0}^\infty\mu^*A_n\le\sum_{n=0}^\infty2^{-r}\beta_r(\operatorname{diam}A_n)^r\le2^{-r}\beta_r(\theta_{r1}A+\epsilon)\le2^{-r}\beta_r(\mu_{Hr}^*A+\epsilon)$$
by 264H. As $\epsilon$ is arbitrary, $\mu^*A\le2^{-r}\beta_r\mu_{Hr}^*A$. ◀

(c) Because $\mu$, $\mu_{Hr}$ are the measures defined from their respective outer measures by Carathéodory's method, it follows at once that $\mu=2^{-r}\beta_r\mu_{Hr}$ in the strict sense required.

*264J The Cantor set I remarked in 264A that fractional 'dimensions' $r$ were of interest. I have no space for these here, and they are off the main lines of this volume, but I will give one result for its intrinsic interest.

Proposition Let $C$ be the Cantor set in $[0,1]$. Set $r=\ln2/\ln3$. Then the $r$-dimensional Hausdorff measure of $C$ is $1$.

proof (a) Recall that $C=\bigcap_{n\in\mathbb{N}}C_n$, where each $C_n$ consists of $2^n$ closed intervals of length $3^{-n}$, and $C_{n+1}$ is obtained from $C_n$ by deleting the middle (open) third of each interval of $C_n$. (See 134G.) Because $C$ is closed, $\mu_{Hr}C$ is defined (264E). Note that $3^r=2$.

(b) If $\delta > 0$, take $n$ such that $3^{-n}\le\delta$; then $C$ can be covered by $2^n$ intervals of diameter $3^{-n}$, so $\theta_{r\delta}C\le2^n(3^{-n})^r=1$. Consequently $\mu_{Hr}C=\mu_{Hr}^*C=\lim_{\delta\downarrow0}\theta_{r\delta}C\le1$.

(c) We need the following elementary fact: if $\alpha,\beta,\gamma\ge0$ and $\max(\alpha,\gamma)\le\beta$, then $\alpha^r+\gamma^r\le(\alpha+\beta+\gamma)^r$. ▶ Because $0 < r\le1$,
$$\xi\mapsto(\xi+\eta)^r-\xi^r=r\int_0^\eta(\xi+\zeta)^{r-1}\,d\zeta$$
is non-increasing for every $\eta\ge0$. Consequently
$$(\alpha+\beta+\gamma)^r-\alpha^r-\gamma^r\ge(\beta+\beta+\gamma)^r-\beta^r-\gamma^r\ge(\beta+\beta+\beta)^r-\beta^r-\beta^r=\beta^r(3^r-2)=0,$$
as required. ◀

(d) Now suppose that $I\subseteq\mathbb{R}$ is any interval, and $m\in\mathbb{N}$; write $j_m(I)$ for the number of the intervals composing $C_m$ which are included in $I$. Then $2^{-m}j_m(I)\le(\operatorname{diam}I)^r$. ▶ If $I$ does not meet $C_m$, this is trivial. Otherwise, induce on $l=\min\{i:I\text{ meets only one of the intervals composing }C_{m-i}\}$. If $l=0$, so that $I$ meets only one of the intervals composing $C_m$, then $j_m(I)\le1$, and if $j_m(I)=1$ then $\operatorname{diam}I\ge3^{-m}$, so $(\operatorname{diam}I)^r\ge2^{-m}$; thus the induction starts.
For the inductive step to $l\ge1$, let $J$ be the interval of $C_{m-l}$ which meets $I$, and $J'$, $J''$ the two intervals of $C_{m-l+1}$ included in $J$, so that $I$ meets both $J'$ and $J''$, and
$$j_m(I)=j_m(I\cap J)=j_m(I\cap J')+j_m(I\cap J'').$$
By the inductive hypothesis,
$$(\operatorname{diam}(I\cap J'))^r+(\operatorname{diam}(I\cap J''))^r\ge2^{-m}j_m(I\cap J')+2^{-m}j_m(I\cap J'')=2^{-m}j_m(I).$$
On the other hand, by (c),
$$(\operatorname{diam}(I\cap J'))^r+(\operatorname{diam}(I\cap J''))^r\le(\operatorname{diam}(I\cap J')+3^{-m+l-1}+\operatorname{diam}(I\cap J''))^r=(\operatorname{diam}(I\cap J))^r\le(\operatorname{diam}I)^r,$$
because $J'$, $J''$ both have diameter $3^{-(m-l+1)}$, the length of the interval between them. Thus the induction continues. ◀

(e) Now suppose that $\epsilon > 0$. Then there is a sequence $\langle A_n\rangle_{n\in\mathbb{N}}$ of sets, covering $C$, such that $\sum_{n=0}^\infty(\operatorname{diam}A_n)^r < \mu_{Hr}C+\epsilon$. Take $\eta_n > 0$ such that $\sum_{n=0}^\infty(\operatorname{diam}A_n+\eta_n)^r\le\mu_{Hr}C+\epsilon$, and for each $n$ take an open interval $I_n\supseteq A_n$ of length at most $\operatorname{diam}A_n+\eta_n$ and with neither endpoint belonging to $C$; this is possible because $C$ does not include any non-trivial interval. Now $C\subseteq\bigcup_{n\in\mathbb{N}}I_n$; because $C$ is compact, there is a $k\in\mathbb{N}$ such that $C\subseteq\bigcup_{n\le k}I_n$. Next, there is an $m\in\mathbb{N}$ such that no endpoint of any $I_n$, for $n\le k$, belongs to $C_m$. Consequently each of the intervals composing $C_m$ must be included in some $I_n$, and (in the terminology of (d) above) $\sum_{n=0}^kj_m(I_n)\ge2^m$. Accordingly
$$1\le\sum_{n=0}^k2^{-m}j_m(I_n)\le\sum_{n=0}^k(\operatorname{diam}I_n)^r\le\sum_{n=0}^\infty(\operatorname{diam}A_n+\eta_n)^r\le\mu_{Hr}C+\epsilon.$$
As $\epsilon$ is arbitrary, $\mu_{Hr}C\ge1$, as required.

*264K General metric spaces While this chapter deals exclusively with Euclidean spaces, readers familiar with the general theory of metric spaces may find the nature of the theory clearer if they use the language of metric spaces in the basic definitions and results. I therefore repeat the definition here, and spell out the corresponding results in the exercises 264Yb-264Yl. Let $(X,\rho)$ be a metric space, and $r > 0$. For any $A\subseteq X$, $\delta > 0$ set
$$\theta_{r\delta}A=\inf\Bigl\{\sum_{n=0}^\infty(\operatorname{diam}A_n)^r:\langle A_n\rangle_{n\in\mathbb{N}}\text{ is a sequence of subsets of }X\text{ covering }A,\ \operatorname{diam}A_n\le\delta\text{ for every }n\in\mathbb{N}\Bigr\},$$
interpreting the diameter of the empty set as $0$, and $\inf\emptyset$ as $\infty$, so that $\theta_{r\delta}A=\infty$ if $A$ cannot be covered by a sequence of sets of diameter at most $\delta$. Say that $\theta_rA=\sup_{\delta > 0}\theta_{r\delta}A$ is the $r$-dimensional Hausdorff outer measure of $A$, and take the measure $\mu_{Hr}$ defined by Carathéodory's method from this outer measure to be $r$-dimensional Hausdorff measure on $X$.

264X Basic exercises >(a) Show that all the functions $\theta_{r\delta}$ of 264A are outer measures. Show that in that context, $\theta_{r\delta}(A)=0$ iff $\theta_r(A)=0$, for any $\delta > 0$ and any $A\subseteq\mathbb{R}^s$.

(b) Let $s\ge1$ be an integer, and $\theta$ an outer measure on $\mathbb{R}^s$ such that $\theta(A\cup B)=\theta A+\theta B$ whenever $A$, $B$ are non-empty subsets of $\mathbb{R}^s$ and $\inf_{x\in A,y\in B}\|x-y\| > 0$. Show that every Borel subset of $\mathbb{R}^s$ is measurable for the measure defined from $\theta$ by Carathéodory's method.

>(c) Let $s\ge1$ be an integer and $r > 0$; define $\theta_{r\delta}$ as in 264A. Show that for any $A\subseteq\mathbb{R}^s$, $\delta > 0$,
$$\begin{aligned}\theta_{r\delta}A&=\inf\Bigl\{\sum_{n=0}^\infty(\operatorname{diam}F_n)^r:\langle F_n\rangle_{n\in\mathbb{N}}\text{ is a sequence of closed subsets of }\mathbb{R}^s\text{ covering }A,\ \operatorname{diam}F_n\le\delta\text{ for every }n\in\mathbb{N}\Bigr\}\\&=\inf\Bigl\{\sum_{n=0}^\infty(\operatorname{diam}G_n)^r:\langle G_n\rangle_{n\in\mathbb{N}}\text{ is a sequence of open subsets of }\mathbb{R}^s\text{ covering }A,\ \operatorname{diam}G_n\le\delta\text{ for every }n\in\mathbb{N}\Bigr\}.\end{aligned}$$

>(d) Let $s\ge1$ be an integer and $r\ge0$; let $\mu_{Hr}$ be $r$-dimensional Hausdorff measure on $\mathbb{R}^s$. Show that for every $A\subseteq\mathbb{R}^s$ there is a $G_\delta$ set (that is, a set expressible as the intersection of a sequence of open sets) $H\supseteq A$ such that $\mu_{Hr}H=\mu_{Hr}^*A$. (Hint: use 264Xc.)
>(e) Let $s\ge1$ be an integer, and $0\le r < r'$. Show that if $A\subseteq\mathbb{R}^s$ and the $r$-dimensional Hausdorff outer measure $\mu_{Hr}^*A$ of $A$ is finite, then $\mu_{Hr'}^*A$ must be zero.

264Y Further exercises (a) Let $\theta_{11}$ be the outer measure on $\mathbb{R}^2$ defined in 264A, with $r=\delta=1$, and $\mu_{11}$ the measure derived from $\theta_{11}$ by Carathéodory's method, $\Sigma_{11}$ its domain. Show that any set in $\Sigma_{11}$ is either negligible or conegligible.

(b) Let $(X,\rho)$ be a metric space and $r\ge0$. Show that if $A\subseteq X$ and $\mu_{Hr}^*A < \infty$, then $A$ is separable.

(c) Let $(X,\rho)$ be a metric space, and $\theta$ an outer measure on $X$ such that $\theta(A\cup B)=\theta A+\theta B$ whenever $A$, $B$ are non-empty subsets of $X$ and $\inf_{x\in A,y\in B}\rho(x,y) > 0$. (Such an outer measure is called a metric outer measure.) Show that every Borel subset of $X$ is measurable for the measure defined from $\theta$ by Carathéodory's method.

(d) Let $(X,\rho)$ be a metric space and $r > 0$; define $\theta_{r\delta}$ as in 264K. Show that for any $A\subseteq X$,
$$\begin{aligned}\mu_{Hr}^*A&=\sup_{\delta > 0}\inf\Bigl\{\sum_{n=0}^\infty(\operatorname{diam}F_n)^r:\langle F_n\rangle_{n\in\mathbb{N}}\text{ is a sequence of closed subsets of }X\text{ covering }A,\ \operatorname{diam}F_n\le\delta\text{ for every }n\in\mathbb{N}\Bigr\}\\&=\sup_{\delta > 0}\inf\Bigl\{\sum_{n=0}^\infty(\operatorname{diam}G_n)^r:\langle G_n\rangle_{n\in\mathbb{N}}\text{ is a sequence of open subsets of }X\text{ covering }A,\ \operatorname{diam}G_n\le\delta\text{ for every }n\in\mathbb{N}\Bigr\}.\end{aligned}$$

(e) Let $(X,\rho)$ be a metric space and $r\ge0$; let $\mu_{Hr}$ be $r$-dimensional Hausdorff measure on $X$. Show that for every $A\subseteq X$ there is a $G_\delta$ set $H\supseteq A$ such that $\mu_{Hr}H=\mu_{Hr}^*A$, the $r$-dimensional Hausdorff outer measure of $A$.

(f) Let $(X,\rho)$ be a metric space and $r\ge0$; let $Y$ be any subset of $X$, and give $Y$ its induced metric $\rho_Y$. Show that the $r$-dimensional Hausdorff outer measure $\mu_{Hr}^{(Y)*}$ on $Y$ is just the restriction to $\mathcal PY$ of the outer measure $\mu_{Hr}^*$ on $X$. Show that if either $\mu_{Hr}^*Y < \infty$ or $\mu_{Hr}Y$ is defined then $r$-dimensional Hausdorff measure $\mu_{Hr}^{(Y)}$ on $Y$ is just the subspace measure on $Y$ induced by the measure $\mu_{Hr}$ on $X$.

(g) Let $(X,\rho)$ be a metric space and $r > 0$. Show that $r$-dimensional Hausdorff measure on $X$ is atomless. (Hint: Let $E\in\operatorname{dom}\mu_{Hr}$. (i) If $E$ is not separable, there is an open set $G$ such that $E\cap G$ and $E\setminus G$ are both non-separable, therefore both non-negligible. (ii) If there is an $x\in E$ such that $\mu_{Hr}(E\cap B(x,\delta)) > 0$ for every $\delta > 0$, then one of these sets has non-negligible complement in $E$. (iii) Otherwise, $\mu_{Hr}E=0$.)

(h) Let $(X,\rho)$ be a metric space and $r\ge0$; let $\mu_{Hr}$ be $r$-dimensional Hausdorff measure on $X$. Show that if $\mu_{Hr}E < \infty$ then $\mu_{Hr}E=\sup\{\mu_{Hr}F:F\subseteq E\text{ is closed and totally bounded}\}$. (Hint: given $\epsilon > 0$, use 264Yd to find a closed totally bounded set $F$ such that $\mu_{Hr}(F\setminus E)=0$ and $\mu_{Hr}(E\setminus F)\le\epsilon$, and now apply 264Ye to $F\setminus E$.)

(i) Let $(X,\rho)$ be a complete metric space and $r\ge0$; let $\mu_{Hr}$ be $r$-dimensional Hausdorff measure on $X$. Show that if $\mu_{Hr}E < \infty$ then $\mu_{Hr}E=\sup\{\mu_{Hr}F:F\subseteq E\text{ is compact}\}$.

(j) Let $(X,\rho)$ and $(Y,\sigma)$ be metric spaces. If $D\subseteq X$ and $\phi:D\to Y$ is a function, then $\phi$ is $\gamma$-Lipschitz if $\sigma(\phi(x),\phi(x'))\le\gamma\rho(x,x')$ for every $x,x'\in D$. (i) Show that in this case, if $r\ge0$, $\mu_{Hr}^*(\phi[A])\le\gamma^r\mu_{Hr}^*A$ for every $A\subseteq D$, writing $\mu_{Hr}^*$ for $r$-dimensional Hausdorff outer measure on either $X$ or $Y$. (ii) Show that if $X$ is complete and $\mu_{Hr}E$ is defined and finite, then $\mu_{Hr}(\phi[E])$ is defined. (Hint: 264Yi.)

(k) Let $(X,\rho)$ be a metric space, and for $r\ge0$ let $\mu_{Hr}$ be Hausdorff $r$-dimensional measure on $X$. Show that there is a unique $\Delta=\Delta(X)\in[0,\infty]$ such that
$$\mu_{Hr}X=\begin{cases}\infty&\text{if }r\in[0,\Delta[,\\0&\text{if }r\in\,]\Delta,\infty[.\end{cases}$$
(l) Let $(X,\rho)$ be a metric space and $\phi:I\to X$ a continuous function, where $I\subseteq\mathbb{R}$ is an interval. Write $\mu_{H1}$ for one-dimensional Hausdorff measure on $X$. Show that
$$\mu_{H1}(\phi[I])\le\sup\Bigl\{\sum_{i=1}^n\rho(\phi(t_i),\phi(t_{i-1})):t_0,\dots,t_n\in I,\ t_0\le\dots\le t_n\Bigr\},$$
the length of the curve $\phi$, with equality if $\phi$ is injective.

(m) Set $r=\ln2/\ln3$, as in 264J, and write $\mu_{Hr}$ for $r$-dimensional Hausdorff measure on the Cantor set $C$. Let $\lambda$ be the usual measure on $\{0,1\}^{\mathbb{N}}$ (254J). Define $\phi:\{0,1\}^{\mathbb{N}}\to C$ by setting $\phi(x)=\frac23\sum_{n=0}^\infty3^{-n}x(n)$ for $x\in\{0,1\}^{\mathbb{N}}$. Show that $\phi$ is an isomorphism between $(\{0,1\}^{\mathbb{N}},\lambda)$ and $(C,\mu_{Hr})$.

(n) Set $r=\ln2/\ln3$ and write $\mu_{Hr}$ for $r$-dimensional Hausdorff measure on the Cantor set $C$. Let $f:[0,1]\to[0,1]$ be the Cantor function (134H) and let $\mu$ be Lebesgue measure on $\mathbb{R}$. Show that $\mu f[E]=\mu_{Hr}E$ for every $E\in\operatorname{dom}\mu_{Hr}$, and $\mu_{Hr}(C\cap f^{-1}[F])=\mu F$ for every Lebesgue measurable set $F\subseteq[0,1]$.

(o) Let $(X,\rho)$ be a metric space and $h:[0,\infty[\,\to[0,\infty[$ a non-decreasing function. For $A\subseteq X$ set
$$\theta_hA=\sup_{\delta > 0}\inf\Bigl\{\sum_{n=0}^\infty h(\operatorname{diam}A_n):\langle A_n\rangle_{n\in\mathbb{N}}\text{ is a sequence of subsets of }X\text{ covering }A,\ \operatorname{diam}A_n\le\delta\text{ for every }n\in\mathbb{N}\Bigr\},$$
interpreting $\operatorname{diam}\emptyset$ as $0$, $\inf\emptyset$ as $\infty$ as usual. Show that $\theta_h$ is an outer measure on $X$. State and prove theorems corresponding to 264E and 264F. Look through 264X and 264Y for further results which might be generalizable, perhaps on the assumption that $h$ is continuous on the right.

(p) Let $(X,\rho)$ be a metric space. Let us say that if $a < b$ in $\mathbb{R}$ and $f:[a,b]\to X$ is a function, then $f$ is absolutely continuous if for every $\epsilon > 0$ there is a $\delta > 0$ such that $\sum_{i=0}^n\rho(f(a_i),f(b_i))\le\epsilon$ whenever $a\le a_0\le b_0\le a_1\le\dots\le a_n\le b_n\le b$ and $\sum_{i=0}^n(b_i-a_i)\le\delta$. Show that $f:[a,b]\to X$ is absolutely continuous iff it is continuous and of bounded variation (in the sense of 224Ye) and $\mu_{H1}f[A]=0$ whenever $A\subseteq[a,b]$ is Lebesgue negligible, where $\mu_{H1}$ is 1-dimensional Hausdorff measure on $X$. (Compare 225M.) Show that in this case $\mu_{H1}f[\,[a,b]\,] < \infty$.

264 Notes and comments In this section we have come to the next step in 'geometric measure theory'. I am taking this very slowly, because there are real difficulties in the subject, and for the purposes of this volume we do not need to master very much of it. The idea here is to find a definition of $r$-dimensional Lebesgue measure which will be 'geometric' in the strict sense, that is, dependent only on the metric structure of $\mathbb{R}^r$, and therefore applicable to sets which have a metric structure but no linear structure. As has happened before, the definition of Hausdorff measure from an outer measure gives no problems (the only new idea in 264A-264C is that of using a supremum $\theta_r=\sup_{\delta > 0}\theta_{r\delta}$ of outer measures), and the difficult part is proving that our new measure has any useful properties.
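To make the supremum over $\delta$ concrete, here is a small example of our own (not from the text): cut a straight segment of length $L$ into $n$ equal closed pieces. Each piece has diameter $\delta=L/n$, so this single cover gives the upper bound $\theta_{r\delta}\le n(L/n)^r=L^rn^{1-r}$. For $r > 1$ these bounds tend to $0$ as $\delta\downarrow0$, so the segment is $\mu_{Hr}$-negligible; for $r=1$ they stay at $L$; for $r < 1$ they grow without bound, although an upper bound alone does not prove that $\theta_r$ is infinite. The function name `uniform_cover_sum` is hypothetical.

```python
def uniform_cover_sum(length: float, r: float, n: int) -> float:
    """Cover a straight segment of the given length by n equal closed pieces,
    each of diameter length/n, and return the sum of (diam piece)^r.
    This is an upper bound for theta_{r,delta} of the segment, with delta = length/n."""
    piece = length / n
    return n * piece ** r


if __name__ == "__main__":
    for r in (0.5, 1.0, 1.5):
        # delta = 2/n shrinks as n grows; watch the upper bounds as delta decreases.
        bounds = [uniform_cover_sum(2.0, r, 2 ** k) for k in range(0, 21, 5)]
        print(r, bounds)
```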
Concerning the properties of Hausdorff measure, there are two essential objectives; first, to check that these measures, in general, share a reasonable proportion of the properties of Lebesgue measure; and second, to justify the term ‘r-dimensional measure’ by relating Hausdorff r-dimensional measure on R r to Lebesgue measure on R r . As for the properties of general Hausdorff measures, we have to go rather carefully. I do not give counterexamples here because they involve concepts which belong to Volumes 4 and 5 rather than this volume, but I must warn you to expect the worst. However, we do at least have open sets measurable, so that all Borel sets are measurable (264E). The outer measure of a set A can be defined in terms of the Borel sets including A (264Fa), though not in general in terms of the open sets including A; but the measure of a measurable set E is not necessarily the supremum of the measures of the Borel sets included in E, unless E is of finite measure (264Fc). We do find that the outer measure θr defined in 264A is the outer measure defined from µHr (264Fb), so that the phrase ‘r-dimensional Hausdorff outer measure’ is unambiguous. A crucial property of Lebesgue measure is the fact that the measure of a measurable set E is the supremum of the measures of the compact subsets of E; this is not generally shared by Hausdorff measures, but is valid for sets E of finite measure in complete spaces (264Yi). Concerning subspaces, there are no problems with the outer measures, and for sets of finite measure the subspace measures are also consistent (264Yf). Because Hausdorff measure is defined in metric terms, it behaves regularly for Lipschitz maps (264G); one of the
most natural classes of functions to consider when studying metric spaces is that of 1-Lipschitz functions, so that (in the language of 264G) $\mu_{Hr}^*\phi[A]\le\mu_{Hr}^*A$ for every $A$.

The second essential feature of Hausdorff measure, its relation with Lebesgue measure in the appropriate dimension, is Theorem 264I. Because both Hausdorff measure and Lebesgue measure are translation-invariant, this can be proved by relatively elementary means, except for the evaluation of the normalizing constant; all we need to know is that $\mu[0,1[^r=1$ and $\mu_{Hr}[0,1[^r$ are both finite and non-zero, and this is straightforward. (The arguments of part (a) of the proof of 261F are relevant.) For the purposes of this chapter, we do not, I think, have to know the value of the constant; but I cannot leave it unsettled, and therefore give Theorem 264H, the isodiametric inequality, to show that it is just the Lebesgue measure of an $r$-dimensional ball of diameter $1$, as one would hope and expect.

The critical step in the argument of 264H is in part (c) of the proof. This is called 'Steiner symmetrization'; the idea is that given a set $A$, we transform $A$ through a series of steps, at each stage lowering, or at least not increasing, its diameter, and raising, or at least not decreasing, its outer measure, progressively making $A$ more symmetric, until at the end we have a set which is sufficiently constrained to be amenable. The particular symmetrization operation used in this proof is important enough; but the idea of progressive regularization of an object is one of the most powerful methods in measure theory, and you should give all your attention to mastering any example you encounter. In my experience, the idea is principally useful when seeking an inequality involving disparate quantities (in the present example, the diameter and volume of a set).
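One symmetrization step of 264H(c) can be tried out on a simple class of plane sets. The sketch below assumes (a simplification of our own) that the set is a finite union of closed rectangles arranged in columns with non-overlapping $x$-ranges, so that the vertical slice is the same throughout each column. Mirroring the passage from $F$ to $D$ in step (c)(i) with $j=r=2$, it replaces each slice by the centred interval of the same length, then checks that area is preserved and diameter does not increase. All names, and the example shape `EXAMPLE`, are hypothetical.

```python
import itertools
import math


def merged_length(intervals):
    """Total length of a finite union of closed intervals, overlaps counted once."""
    total = 0.0
    lo = hi = None
    for a, b in sorted(intervals):
        if hi is None or a > hi:      # start a new connected piece
            if hi is not None:
                total += hi - lo
            lo, hi = a, b
        else:                         # overlaps the current piece: extend it
            hi = max(hi, b)
    if hi is not None:
        total += hi - lo
    return total


def steiner_symmetrize(columns):
    """columns: list of (x0, x1, [(y0, y1), ...]) with non-overlapping x-ranges.
    Each vertical slice is replaced by the centred interval [-L/2, L/2] of the same length L."""
    result = []
    for x0, x1, ys in columns:
        length = merged_length(ys)
        if length > 0:
            result.append((x0, x1, [(-length / 2, length / 2)]))
    return result


def area(columns):
    """Area of the union, assuming the columns have non-overlapping x-ranges."""
    return sum((x1 - x0) * merged_length(ys) for x0, x1, ys in columns)


def diameter(columns):
    """For a finite union of closed rectangles the diameter is attained at two corners."""
    corners = [(x, y) for x0, x1, ys in columns for y0, y1 in ys
               for x in (x0, x1) for y in (y0, y1)]
    return max(math.dist(p, q) for p, q in itertools.combinations(corners, 2))


EXAMPLE = [(0.0, 1.0, [(0.0, 3.0)]),
           (1.0, 3.0, [(0.0, 1.0)]),
           (3.0, 4.0, [(0.0, 0.5), (2.0, 2.5)])]

if __name__ == "__main__":
    sym = steiner_symmetrize(EXAMPLE)
    print(area(EXAMPLE), area(sym))          # 6.0 6.0
    print(diameter(EXAMPLE), diameter(sym))  # 5.0 and sqrt(20), about 4.472
```

Only one coordinate is symmetrized here; the proof repeats the step once per coordinate, which this column representation does not support without re-gridding.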
Of course it is awkward having two measures on $\mathbb{R}^r$, differing by a constant multiple, and for the purposes of the next section it would actually have been a little more convenient to follow Federer 69 in using 'normalized Hausdorff measure' $2^{-r}\beta_r\mu_{Hr}$. (For non-integral $r$, we could take $\beta_r=\pi^{r/2}/\Gamma(1+\frac r2)$, as suggested in 252Xh.) However, I believe this to be a minority position, and the striking example of Hausdorff measure on the Cantor set (264J, 264Ym-264Yn) looks much better in the non-normalized version. Hausdorff $(\ln2/\ln3)$-dimensional measure on the Cantor set is of course but one, perhaps the easiest, of a large class of examples. Because the Hausdorff $r$-dimensional outer measure of a set $A$, regarded as a function of $r$, behaves dramatically (falling from $\infty$ to $0$) at a certain critical value $\Delta(A)$ (see 264Xe, 264Yk), it gives us a metric space invariant of $A$; $\Delta(A)$ is the Hausdorff dimension of $A$. Evidently the Hausdorff dimension of $C$ is $\ln2/\ln3$, while that of $r$-dimensional Euclidean space is $r$.
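The critical exponent for the Cantor set can be seen numerically from the covers used in part (b) of the proof of 264J: the $2^n$ intervals of $C_n$, each of length $3^{-n}$, give $\sum(\operatorname{diam})^{r'}=2^n3^{-nr'}$, which is exactly $1$ at $r'=\ln2/\ln3$, tends to $0$ for larger $r'$ (so $\mu_{Hr'}C=0$ there) and grows for smaller $r'$ (an upper bound only, so proving nothing by itself). The sketch also spot-checks the elementary inequality of 264J(c) on a grid. It is an illustration under our own naming (`cantor_intervals`, `cover_sum`, `elementary_inequality_holds`), not part of the proof.

```python
import math
from fractions import Fraction

R = math.log(2) / math.log(3)  # r = ln 2 / ln 3, so that 3^r = 2


def cantor_intervals(n):
    """The 2^n closed intervals composing C_n, with exact rational endpoints."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        refined = []
        for a, b in intervals:
            third = (b - a) / 3
            refined.append((a, a + third))   # keep the left third
            refined.append((b - third, b))   # keep the right third
        intervals = refined
    return intervals


def cover_sum(intervals, r):
    """Sum of (length)^r over a finite family of intervals."""
    return sum(float(b - a) ** r for a, b in intervals)


def elementary_inequality_holds(alpha, beta, gamma, r=R, tol=1e-12):
    """264J(c): if max(alpha, gamma) <= beta then alpha^r + gamma^r <= (alpha + beta + gamma)^r."""
    assert max(alpha, gamma) <= beta, "hypothesis of 264J(c) not satisfied"
    return alpha ** r + gamma ** r <= (alpha + beta + gamma) ** r + tol


if __name__ == "__main__":
    family = cantor_intervals(8)
    for r in (0.5, R, 0.7):
        print(round(r, 4), cover_sum(family, r))
```

Exact fractions keep the endpoints of $C_n$ free of rounding; only the final powers are computed in floating point.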
265 Surface measures In this section I offer a new version of the arguments of §263, this time not with the intention of justifying integration-by-substitution, but instead to give a practically effective method of computing the Hausdorff $r$-dimensional measure of a smooth $r$-dimensional surface in an $s$-dimensional space. The basic case to bear in mind is $r=2$, $s=3$, though any other combination which you can easily visualize will also be a valuable aid to intuition. I give a fundamental theorem (265E) providing a formula from which we can hope to calculate the $r$-dimensional measure of a surface in $s$-dimensional space which is parametrized by a differentiable function, and work through some of the calculations in the case of the $r$-sphere (265F-265H).

265A Normalized Hausdorff measure As I remarked at the end of the last section, Hausdorff measure, as defined in 264A-264C, is not quite the most appropriate measure for our work here; so in this section I will use normalized Hausdorff measure, meaning $\nu_r=2^{-r}\beta_r\mu_{Hr}$, where $\mu_{Hr}$ is $r$-dimensional Hausdorff measure (interpreted in whichever space is under consideration) and $\beta_r=\mu_rB(0,1)$ is the Lebesgue measure of any ball of radius $1$ in $\mathbb{R}^r$. It will be convenient to take $\beta_0=1$. As shown in 264H-264I, this normalization makes $\nu_r$ on $\mathbb{R}^r$ agree with Lebesgue measure $\mu_r$. Observe that of course $\nu_r^*=2^{-r}\beta_r\mu_{Hr}^*$ (264Fb).

265B Linear subspaces Just as in §263, the first step is to deal with linear operators.

Theorem Suppose that $r$, $s$ are integers with $1\le r\le s$, and that $T$ is a real $s\times r$ matrix; regard $T$ as a linear operator from $\mathbb{R}^r$ to $\mathbb{R}^s$. Set $J=\sqrt{\det T'T}$, where $T'$ is the transpose of $T$. Write $\nu_r$ for normalized $r$-dimensional Hausdorff measure on $\mathbb{R}^s$, $\mathrm T_r$ for its domain, and $\mu_r$ for Lebesgue measure on $\mathbb{R}^r$. Then
$$\nu_rT[E]=J\mu_rE$$
for every measurable set $E\subseteq\mathbb{R}^r$. If $T$ is injective (that is, if $J\ne0$), then $\nu_rF=J\mu_rT^{-1}[F]$ whenever $F\in\mathrm T_r$ and $F\subseteq T[\mathbb{R}^r]$.

proof The formula for $J$ assumes that $\det T'T$ is non-negative, which is a fact not in evidence; but the argument below will establish it adequately soon.

(a) Let $V$ be the linear subspace of $\mathbb{R}^s$ consisting of vectors $y=(\eta_1,\dots,\eta_s)$ such that $\eta_i=0$ whenever $r < i \le s$. Let $R$ be the $r\times s$ matrix $\langle\rho_{ij}\rangle_{i\le r,j\le s}$, where $\rho_{ij}=1$ if $i=j\le r$, $0$ otherwise; then the $s\times r$ matrix $R'$ may be regarded as a bijection from $\mathbb{R}^r$ to $V$. Let $W$ be an $r$-dimensional linear subspace of $\mathbb{R}^s$ including $T[\mathbb{R}^r]$, and let $P$ be an orthogonal $s\times s$ matrix such that $P[W]=V$. Then $S=RPT$ is an $r\times r$ matrix. We have $R'Ry=y$ for $y\in V$, so $R'RPT=PT$ and
$$S'S=T'P'R'RPT=T'P'PT=T'T;$$
accordingly $\det T'T=\det S'S=(\det S)^2\ge0$ and $J=|\det S|$. At the same time, $P'R'S=P'R'RPT=P'PT=T$. Observe that $J=0$ iff $S$ is not injective, that is, $T$ is not injective.

(b) If we consider the $s\times r$ matrix $P'R'$ as a map from $\mathbb{R}^r$ to $\mathbb{R}^s$, we see that $\phi=P'R'$ is an isometry between $\mathbb{R}^r$ and $W$, with inverse $\phi^{-1}=RP\upharpoonright W$. It follows that $\phi$ is an isomorphism between the measure spaces $(\mathbb{R}^r,\mu_{Hr}^{(r)})$ and $(W,\mu_{HrW}^{(s)})$, where $\mu_{Hr}^{(r)}$ is $r$-dimensional Hausdorff measure on $\mathbb{R}^r$ and $\mu_{HrW}^{(s)}$ is the subspace measure on $W$ induced by $r$-dimensional Hausdorff measure $\mu_{Hr}^{(s)}$ on $\mathbb{R}^s$.

▶ (i) If $A\subseteq\mathbb{R}^r$ and $A'\subseteq W$, then
$$\mu_{Hr}^{(s)*}(\phi[A])\le\mu_{Hr}^{(r)*}(A),\qquad\mu_{Hr}^{(r)*}(\phi^{-1}[A'])\le\mu_{Hr}^{(s)*}(A'),$$
using 264G twice. Thus $\mu_{Hr}^{(s)*}(\phi[A])=\mu_{Hr}^{(r)*}(A)$ for every $A\subseteq\mathbb{R}^r$.

(ii) Now because $W$ is closed, therefore in the domain of $\mu_{Hr}^{(s)}$ (264E), the subspace measure $\mu_{HrW}^{(s)}$ is just the measure induced by $\mu_{Hr}^{(s)*}\upharpoonright\mathcal PW$ by Carathéodory's method (214H(b-ii)). Because $\phi$ is an isomorphism between $(\mathbb{R}^r,\mu_{Hr}^{(r)*})$ and $(W,\mu_{Hr}^{(s)*}\upharpoonright\mathcal PW)$, it is an isomorphism between $(\mathbb{R}^r,\mu_{Hr}^{(r)})$ and $(W,\mu_{HrW}^{(s)})$. ◀

(c) It follows that $\phi$ is also an isomorphism between the normalized versions $(\mathbb{R}^r,\mu_r)$ and $(W,\nu_{rW})$, writing $\nu_{rW}$ for the subspace measure on $W$ induced by $\nu_r$. Now if $E\subseteq\mathbb{R}^r$ is Lebesgue measurable, we have $\mu_rS[E]=J\mu_rE$, by 263A; so that
$$\nu_rT[E]=\nu_r(P'R'[S[E]])=\nu_r(\phi[S[E]])=\mu_rS[E]=J\mu_rE.$$
If $T$ is injective, then $S=\phi^{-1}T$ must also be injective, so that $J\ne0$ and
$$\nu_rF=\mu_r(\phi^{-1}[F])=J\mu_r(S^{-1}[\phi^{-1}[F]])=J\mu_rT^{-1}[F]$$
whenever $F\in\mathrm T_r$ and $F\subseteq W=T[\mathbb{R}^r]$.

265C Corollary Under the conditions of 265B, $\nu_r^*T[A]=J\mu_r^*A$ for every $A\subseteq\mathbb{R}^r$.

proof (a) If $E$ is Lebesgue measurable and $A\subseteq E$, then $T[A]\subseteq T[E]$, so $\nu_r^*T[A]\le\nu_rT[E]=J\mu_rE$; as $E$ is arbitrary, $\nu_r^*T[A]\le J\mu_r^*A$.
(b) If $J=0$ we can stop. If $J\ne0$ then $T$ is injective, so if $F\in\mathrm T_r$ and $T[A]\subseteq F$ we shall have
$$J\mu_r^*A\le J\mu_rT^{-1}[F\cap W]=\nu_r(F\cap W)\le\nu_rF;$$
as $F$ is arbitrary, $J\mu_r^*A\le\nu_r^*T[A]$.

265D
I now proceed to the lemma corresponding to 263C.
Lemma Suppose that $1\le r\le s$ and that $T$ is an $s\times r$ matrix; set $J=\sqrt{\det T'T}$, and suppose that $J\ne0$. Then for any $\epsilon > 0$ there is a $\zeta=\zeta(T,\epsilon) > 0$ such that

(i) $|\sqrt{\det S'S}-J|\le\epsilon$ whenever $S$ is an $s\times r$ matrix and $\|S-T\|\le\zeta$;

(ii) whenever $D\subseteq\mathbb{R}^r$ is a bounded set and $\phi:D\to\mathbb{R}^s$ is a function such that $\|\phi(x)-\phi(y)-T(x-y)\|\le\zeta\|x-y\|$ for all $x,y\in D$, then $|\nu_r^*\phi[D]-J\mu_r^*D|\le\epsilon\mu_r^*D$.

proof (a) Because $\sqrt{\det S'S}$ is a continuous function of the coefficients of $S$, 262Hb tells us that there must be a $\zeta_0 > 0$ such that $|J-\sqrt{\det S'S}|\le\epsilon$ whenever $\|S-T\|\le\zeta_0$.

(b) Because $J\ne0$, $T$ is injective, and there is an $r\times s$ matrix $T^*$ such that $T^*T$ is the identity $r\times r$ matrix. Take $\zeta > 0$ such that $\zeta\le\zeta_0$, $\zeta\|T^*\| < 1$, $J(1+\zeta\|T^*\|)^r\le J+\epsilon$ and $1-J^{-1}\epsilon\le(1-\zeta\|T^*\|)^r$. Let $\phi:D\to\mathbb{R}^s$ be such that $\|\phi(x)-\phi(y)-T(x-y)\|\le\zeta\|x-y\|$ whenever $x,y\in D$. Set $\psi=\phi T^*$ (on $T[D]$), so that $\phi=\psi T$. Then for $u,v\in T[D]$
$$\|\psi(u)-\psi(v)\|\le(1+\zeta\|T^*\|)\|u-v\|,\qquad\|u-v\|\le(1-\zeta\|T^*\|)^{-1}\|\psi(u)-\psi(v)\|.$$
▶ Take $x,y\in D$ such that $u=Tx$, $v=Ty$; of course $x=T^*u$, $y=T^*v$. Then
$$\|\psi(u)-\psi(v)\|=\|\phi(T^*u)-\phi(T^*v)\|=\|\phi(x)-\phi(y)\|\le\|T(x-y)\|+\zeta\|x-y\|=\|u-v\|+\zeta\|T^*u-T^*v\|\le\|u-v\|(1+\zeta\|T^*\|).$$
Next,
$$\|u-v\|=\|Tx-Ty\|\le\|\phi(x)-\phi(y)\|+\zeta\|x-y\|=\|\psi(u)-\psi(v)\|+\zeta\|T^*u-T^*v\|\le\|\psi(u)-\psi(v)\|+\zeta\|T^*\|\|u-v\|,$$
so that $(1-\zeta\|T^*\|)\|u-v\|\le\|\psi(u)-\psi(v)\|$ and $\|u-v\|\le(1-\zeta\|T^*\|)^{-1}\|\psi(u)-\psi(v)\|$. ◀

(c) Now from 264G and 265C we see that
$$\nu_r^*\phi[D]=\nu_r^*\psi[T[D]]\le(1+\zeta\|T^*\|)^r\nu_r^*T[D]=(1+\zeta\|T^*\|)^rJ\mu_r^*D\le(J+\epsilon)\mu_r^*D,$$
and (provided $\epsilon\le J$)
$$(J-\epsilon)\mu_r^*D=(1-J^{-1}\epsilon)\nu_r^*T[D]\le(1-J^{-1}\epsilon)(1-\zeta\|T^*\|)^{-r}\nu_r^*\psi[T[D]]\le\nu_r^*\psi[T[D]]=\nu_r^*\phi[D].$$
(Of course, if $\epsilon\ge J$, then surely $(J-\epsilon)\mu_r^*D\le\nu_r^*\phi[D]$.) Thus $(J-\epsilon)\mu_r^*D\le\nu_r^*\phi[D]\le(J+\epsilon)\mu_r^*D$ as required, and we have an appropriate $\zeta$.

265E Theorem Suppose that $1\le r\le s$; write $\mu_r$ for Lebesgue measure on $\mathbb{R}^r$, $\nu_r$ for normalized Hausdorff measure on $\mathbb{R}^s$, and $\mathrm T_r$ for the domain of $\nu_r$. Let $D\subseteq\mathbb{R}^r$ be any set, and $\phi:D\to\mathbb{R}^s$ a function differentiable relative to its domain at each point of $D$. For each $x\in D$ let $T(x)$ be a derivative of $\phi$ at $x$ relative to $D$, and set $J(x)=\sqrt{\det T(x)'T(x)}$. Set $D_0=\{x:x\in D,\ J(x) > 0\}$. Then

(i) $J:D\to[0,\infty[$ is a measurable function;

(ii) $\nu_r^*\phi[D]\le\int_DJ(x)\,\mu_r(dx)$, allowing $\infty$ as the value of the integral;

(iii) $\nu_r^*\phi[D\setminus D_0]=0$.
If $D$ is Lebesgue measurable, then

(iv) $\phi[D]\in\mathrm T_r$.

If $D$ is measurable and $\phi$ is injective, then

(v) $\nu_r\phi[D]=\int_DJ\,d\mu_r$;

(vi) for any set $E\subseteq\phi[D]$, $E\in\mathrm T_r$ iff $\phi^{-1}[E]\cap D_0$ is Lebesgue measurable, and in this case
$$\nu_rE=\int_{\phi^{-1}[E]}J(x)\,\mu_r(dx)=\int_DJ\times\chi(\phi^{-1}[E])\,d\mu_r;$$

(vii) for every real-valued function $g$ defined on a subset of $\phi[D]$,
$$\int_{\phi[D]}g\,d\nu_r=\int_DJ\times g\phi\,d\mu_r$$
if either integral is defined in $[-\infty,\infty]$, provided we interpret $J(x)g(\phi(x))$ as zero when $J(x)=0$ and $g(\phi(x))$ is undefined.

proof I seek to follow the line laid out in the proof of 263D.

(a) Just as in 263D, we know that $J:D\to\mathbb{R}$ is measurable, since $J(x)$ is a continuous function of the coefficients of $T(x)$, all of which are measurable, by 262P. If $D$ is Lebesgue measurable, then there is a sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ of compact subsets of $D$ such that $D\setminus\bigcup_{n\in\mathbb{N}}F_n$ is $\mu_r$-negligible. Now $\phi[F_n]$ is compact, therefore belongs to $\mathrm T_r$, for each $n\in\mathbb{N}$. As for $\phi[D\setminus\bigcup_{n\in\mathbb{N}}F_n]$, this must be $\nu_r$-negligible by 264G, because $\phi$ is a countable union of Lipschitz functions (262N). So
$$\phi[D]=\bigcup_{n\in\mathbb{N}}\phi[F_n]\cup\phi\bigl[D\setminus\textstyle\bigcup_{n\in\mathbb{N}}F_n\bigr]\in\mathrm T_r.$$
This deals with (i) and (iv).

(b) For the moment, assume that $D$ is bounded and that $J(x) > 0$ for every $x\in D$, and fix $\epsilon > 0$. Let $M_{sr}^*$ be the set of $s\times r$ matrices $T$ such that $\det T'T\ne0$, that is, the corresponding map $T:\mathbb{R}^r\to\mathbb{R}^s$ is injective. For $T\in M_{sr}^*$ take $\zeta(T,\epsilon) > 0$ as in 265D.

Take $\langle D_n\rangle_{n\in\mathbb{N}}$, $\langle T_n\rangle_{n\in\mathbb{N}}$ as in 262M, with $A=M_{sr}^*$, so that $\langle D_n\rangle_{n\in\mathbb{N}}$ is a disjoint cover of $D$ by sets which are relatively measurable in $D$, and each $T_n$ is an $s\times r$ matrix such that
$$\|T(x)-T_n\|\le\zeta(T_n,\epsilon)\text{ whenever }x\in D_n,\qquad\|\phi(x)-\phi(y)-T_n(x-y)\|\le\zeta(T_n,\epsilon)\|x-y\|\text{ for all }x,y\in D_n.$$
Then, setting $J_n=\sqrt{\det T_n'T_n}$, we have
$$|J(x)-J_n|\le\epsilon\text{ for every }x\in D_n,\qquad|\nu_r^*\phi[D_n]-J_n\mu_r^*D_n|\le\epsilon\mu_r^*D_n,$$
by the choice of $\zeta(T_n,\epsilon)$. So
$$\begin{aligned}\nu_r^*\phi[D]&\le\sum_{n=0}^\infty\nu_r^*\phi[D_n]\qquad\Bigl(\text{because }\phi[D]=\textstyle\bigcup_{n\in\mathbb{N}}\phi[D_n]\Bigr)\\&\le\sum_{n=0}^\infty(J_n\mu_r^*D_n+\epsilon\mu_r^*D_n)\le\epsilon\mu_r^*D+\sum_{n=0}^\infty J_n\mu_r^*D_n\\&=\epsilon\mu_r^*D+\int_D\sum_{n=0}^\infty J_n\chi D_n\,d\mu_r\qquad(\text{because the }D_n\text{ are disjoint and relatively measurable in }D)\\&\le\epsilon\mu_r^*D+\int_D(J(x)+\epsilon)\,\mu_r(dx)=2\epsilon\mu_r^*D+\int_DJ\,d\mu_r.\end{aligned}$$
If $D$ is measurable and $\phi$ is injective, then all the $D_n$ are Lebesgue measurable subsets of $\mathbb{R}^r$, so all the $\phi[D_n]$ are measured by $\nu_r$, and they are also disjoint. Accordingly
265E
Surface measures
Z J dµ ≤ D
≤
∞ X n=0 ∞ X
333
Jn µr Dn + ²µr D (νr φ[Dn ] + ²µr Dn ) + ²µr D = νr φ[D] + 2²µr D.
n=0
Since ² is arbitrary, we get νr∗ φ[D] ≤ and if D is measurable and φ is injective,
R D
R D
J dµr ,
J dµr ≤ νr φ[D];
thus we have (ii) and (v), on the assumption that D is bounded and J > 0 everywhere on D. (c) Just as in 263D, we can now relax the assumption that D is bounded by considering Bk = B(0, k) ⊆ R r ; provided J > 0 everywhere on D, we get R R νr∗ φ[D] = limk→∞ µ∗r φ[D ∩ Bk ] ≤ limk→∞ D∩Bk J dµr = D J dµr , with equality if D is measurable and φ is injective. (d) Now we find that νr∗ φ[D \ D0 ] = 0. P P (i) Let η ∈ ]0, 1]. Define ψη : D → R s+r by setting ψ(x) = (φ(x), ηx), identifying R s+r with R s × R r . ψη is differentiable relative to its domain at each point of D, with derivative T˜η (x), being the (s + r) × r matrix in which the top s rows consist of the s × r matrix T (x), and the bottom r rows are ηIr , writing Ir for the r × r identity matrix. (Use 262Ib.) Now of course T˜η (x), regarded as a map from R r to R s+r , is injective, so q p J˜η (x) = det T˜η (x)0 T˜η (x) = det(T (x)0 T (x) + η 2 I) > 0. We have limη↓0 J˜η (x) = J(x) = 0 for x ∈ D \ D0 . (ii) Express T (x) as hτij (x)ii≤s,j≤r for each x ∈ D. Set Cm = {x : x ∈ D, kxk ≤ m, |τij (x)| ≤ m for all i ≤ s, j ≤ r} for each m ≥ 1. For x ∈ Cm , all the coefficients of T˜η (x) have moduli at most m; consequently (giving the crudest and most immediately available inequalities) all the coefficients of T˜η (x)0 T˜η (x) have moduli at p 2 most (r + s)m and J˜η (x) ≤ r!(s + r)r mr . Consequently we can use Lebesgue’s Dominated Convergence Theorem to see that R ˜ limη↓0 0 Jη dµr = 0. Cm \D
(iii) Let $\tilde\nu_r$ be normalized Hausdorff r-dimensional measure on $\mathbb{R}^{s+r}$. Applying (b) of this proof to $\psi_\eta{\restriction}(C_m\setminus D_0)$, we see that $\tilde\nu_r^*\psi_\eta[C_m\setminus D_0]\le\int_{C_m\setminus D_0}\tilde J_\eta\,d\mu_r$. Now we have a natural map $P : \mathbb{R}^{s+r}\to\mathbb{R}^s$ given by setting $P(\xi_1,\dots,\xi_{s+r}) = (\xi_1,\dots,\xi_s)$, and P is 1-Lipschitz, so by 264G we have (allowing for the normalizing constants $2^{-r}\beta_r$) $\nu_r^*P[A]\le\tilde\nu_r^*A$ for every $A\subseteq\mathbb{R}^{s+r}$. In particular,
$$\nu_r^*\phi[C_m\setminus D_0] = \nu_r^*P[\psi_\eta[C_m\setminus D_0]]\le\tilde\nu_r^*\psi_\eta[C_m\setminus D_0]\le\int_{C_m\setminus D_0}\tilde J_\eta\,d\mu_r\to0$$
as η ↓ 0. But this means that $\nu_r^*\phi[C_m\setminus D_0] = 0$. As $D = \bigcup_{m\ge1}C_m$, $\nu_r^*\phi[D\setminus D_0] = 0$, as claimed. ∎

(d) This proves (iii) of the theorem. But of course this is enough to give (ii) and (v), because we must have
$$\nu_r^*\phi[D] = \nu_r^*\phi[D_0]\le\int_{D_0}J\,d\mu_r = \int_D J\,d\mu_r,$$
with equality if D is measurable and φ is injective.

(e) So let us turn to part (vi). Assume that D is measurable and that φ is injective.

(i) Suppose that $E\subseteq\phi[D]$ belongs to $T_r$. Let $H_k = \{x : x\in D,\ \|x\|\le k,\ J(x)\le k\}$ for each k; then each $H_k$ is Lebesgue measurable, so (applying (iii) to $\phi{\restriction}H_k$) $\phi[H_k]\in T_r$, and $\nu_r\phi[H_k]\le k\mu_rH_k<\infty$. Thus φ[D] can be covered by a sequence of sets of finite measure for $\nu_r$, which of course are of finite measure for r-dimensional Hausdorff measure on $\mathbb{R}^s$. By 264Fc, there are Borel sets $E_1, E_2\subseteq\mathbb{R}^s$ such that $E_1\subseteq E\subseteq E_2$ and $\nu_r(E_2\setminus E_1) = 0$. Now $F_1 = \phi^{-1}[E_1]$ and $F_2 = \phi^{-1}[E_2]$ are Lebesgue measurable subsets of D, and
$$\int_{F_2\setminus F_1}J\,d\mu_r = \nu_r\phi[F_2\setminus F_1] = \nu_r(\phi[D]\cap E_2\setminus E_1) = 0.$$
Accordingly $\mu_r(D_0\cap(F_2\setminus F_1)) = 0$. But as $D_0\cap F_1\subseteq D_0\cap\phi^{-1}[E]\subseteq D_0\cap F_2$, it follows that $D_0\cap\phi^{-1}[E]$ is measurable, and that
$$\int_{\phi^{-1}[E]}J\,d\mu_r = \int_{D_0\cap\phi^{-1}[E]}J\,d\mu_r = \int_{D_0\cap F_1}J\,d\mu_r = \int_{D\cap F_1}J\,d\mu_r = \nu_r\phi[D\cap F_1] = \nu_rE_1 = \nu_rE.$$
Moreover, $J\times\chi(\phi^{-1}[E]) = J\times\chi(D_0\cap\phi^{-1}[E])$ is measurable, so we can write $\int J\times\chi(\phi^{-1}[E])$ in place of $\int_{\phi^{-1}[E]}J$.
(ii) If $E\subseteq\phi[D]$ and $D_0\cap\phi^{-1}[E]$ is measurable, then of course
$$E = \phi[D_0\cap\phi^{-1}[E]]\cup\phi[(D\setminus D_0)\cap\phi^{-1}[E]]\in T_r,$$
because $\phi[G]\in T_r$ for every measurable $G\subseteq D$ and $\phi[D\setminus D_0]$ is $\nu_r$-negligible.

(f) Finally, (vii) follows at once from (vi), applying 235L to $\mu_r$ and the subspace measure induced by $\nu_r$ on φ[D].

265F The surface of a sphere To show how these ideas can be applied to one of the basic cases, I give the details of a method of describing spherical surface measure in s-dimensional space. Take r ≥ 1 and s = r + 1. Write $S_r$ for $\{z : z\in\mathbb{R}^{r+1},\ \|z\| = 1\}$, the r-sphere. Then we have a parametrization $\phi_r$ of $S_r$ given by setting
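$$\phi_r\begin{pmatrix}\xi_1\\ \xi_2\\ \vdots\\ \xi_r\end{pmatrix} = \begin{pmatrix}\sin\xi_1\sin\xi_2\sin\xi_3\cdots\sin\xi_r\\ \cos\xi_1\sin\xi_2\sin\xi_3\cdots\sin\xi_r\\ \cos\xi_2\sin\xi_3\cdots\sin\xi_r\\ \vdots\\ \cos\xi_{r-2}\sin\xi_{r-1}\sin\xi_r\\ \cos\xi_{r-1}\sin\xi_r\\ \cos\xi_r\end{pmatrix}.$$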
I choose this formulation because I wish to use an inductive argument based on the fact that
$$\phi_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix} = \begin{pmatrix}\phi_r(x)\sin\xi\\ \cos\xi\end{pmatrix}$$
for $x\in\mathbb{R}^r$, $\xi\in\mathbb{R}$. Every $\phi_r$ is differentiable, by 262Id. If we set
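$$D_r = \{x : \xi_1\in\left]-\pi,\pi\right],\ \xi_2,\dots,\xi_r\in[0,\pi],\ \text{if } \xi_j\in\{0,\pi\} \text{ then } \xi_i = 0 \text{ for } i<j\},$$
then it is easy to check that $D_r$ is a Borel subset of $\mathbb{R}^r$ and that $\phi_r{\restriction}D_r$ is a bijection between $D_r$ and $S_r$. Now let $T_r(x)$ be the $(r+1)\times r$ matrix $\phi_r'(x)$. Then
$$T_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix} = \begin{pmatrix}\sin\xi\,T_r(x) & \cos\xi\,\phi_r(x)\\ 0 & -\sin\xi\end{pmatrix}.$$
So
$$T_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix}'T_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix} = \begin{pmatrix}\sin^2\xi\,T_r(x)'T_r(x) & \sin\xi\cos\xi\,T_r(x)'\phi_r(x)\\ \cos\xi\sin\xi\,\phi_r(x)'T_r(x) & \cos^2\xi\,\phi_r(x)'\phi_r(x) + \sin^2\xi\end{pmatrix}.$$
But of course $\phi_r(x)'\phi_r(x) = \|\phi_r(x)\|^2 = 1$ for every x, and (differentiating with respect to each coordinate of x, if you wish) $T_r(x)'\phi_r(x) = 0$, $\phi_r(x)'T_r(x) = 0$. So we get
$$T_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix}'T_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix} = \begin{pmatrix}\sin^2\xi\,T_r(x)'T_r(x) & 0\\ 0 & 1\end{pmatrix},$$
and, writing $J_r(x) = \sqrt{\det T_r(x)'T_r(x)}$,
$$J_{r+1}\begin{pmatrix}x\\ \xi\end{pmatrix} = |\sin^r\xi|\,J_r(x).$$
At this point we induce on r to see that
$$J_r(x) = |\sin^{r-1}\xi_r\,\sin^{r-2}\xi_{r-1}\cdots\sin\xi_2|$$
(since of course the induction starts with the case r = 1, where
$$\phi_1(x) = \begin{pmatrix}\sin x\\ \cos x\end{pmatrix},\qquad T_1(x) = \begin{pmatrix}\cos x\\ -\sin x\end{pmatrix},$$
so that $T_1(x)'T_1(x) = 1$ and $J_1(x) = 1$).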
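The inductive formula for $J_r$ lends itself to a quick numerical sanity check. The Python sketch below is an illustration added here, not part of the argument: it builds $\phi_r$ from the recursion $\phi_{r+1}(x,\xi) = (\phi_r(x)\sin\xi, \cos\xi)$, approximates $T_r(x)$ by central finite differences (so agreement is only up to rounding error), and compares $\sqrt{\det T_r(x)'T_r(x)}$ with $|\sin^{r-1}\xi_r\cdots\sin\xi_2|$. The function names and the step size h are assumptions of the sketch.

```python
import math


def phi(x):
    """Parametrization phi_r of the unit r-sphere S_r, x = (xi_1, ..., xi_r),
    built by the recursion phi_1(xi) = (sin xi, cos xi) and
    phi_{r+1}(x, xi) = (phi_r(x) sin xi, cos xi)."""
    v = [math.sin(x[0]), math.cos(x[0])]
    for xi in x[1:]:
        v = [c * math.sin(xi) for c in v] + [math.cos(xi)]
    return v


def derivative(f, x, h=1e-6):
    """Approximate the (r+1) x r derivative matrix T_r(x) by central differences."""
    columns = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        columns.append([(a - b) / (2 * h) for a, b in zip(f(xp), f(xm))])
    return [list(row) for row in zip(*columns)]  # rows = output coordinates


def det(m):
    """Determinant of a square matrix by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    n, d = len(m), 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda k: abs(m[k][i]))
        if m[p][i] == 0.0:
            return 0.0
        if p != i:
            m[i], m[p] = m[p], m[i]
            d = -d
        d *= m[i][i]
        for k in range(i + 1, n):
            factor = m[k][i] / m[i][i]
            for c in range(i, n):
                m[k][c] -= factor * m[i][c]
    return d


def numeric_J(x):
    """J_r(x) = sqrt(det T_r(x)' T_r(x)), with T_r approximated numerically."""
    T = derivative(phi, x)
    r = len(x)
    gram = [[sum(row[i] * row[j] for row in T) for j in range(r)] for i in range(r)]
    return math.sqrt(max(det(gram), 0.0))


def formula_J(x):
    """|sin^{r-1} xi_r ... sin xi_2|; x[k] is xi_{k+1}, raised to the power k."""
    p = 1.0
    for k in range(1, len(x)):
        p *= math.sin(x[k]) ** k
    return abs(p)


for x in ([0.4], [0.3, 1.1], [0.3, 1.1, 2.0, 0.7]):
    print(len(x), numeric_J(x), formula_J(x))
```

Because the parametrization has mutually orthogonal partial derivatives, the Gram matrix is in fact diagonal, but the sketch computes a general determinant so that it does not rely on that observation.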
To find the surface measure of $S_r$, we need to calculate
$$\int_{D_r}J_r\,d\mu_r = \int_0^\pi\cdots\int_0^\pi\int_{-\pi}^{\pi}\sin^{r-1}\xi_r\cdots\sin\xi_2\,d\xi_1\,d\xi_2\ldots d\xi_r = 2\pi\prod_{k=2}^{r}\int_0^\pi\sin^{k-1}t\,dt = 2\pi\prod_{k=1}^{r-1}\int_{-\pi/2}^{\pi/2}\cos^kt\,dt$$
(substituting $\frac{\pi}{2} - t$ for t). But in the language of 252Q, this is just
$$2\pi\prod_{k=1}^{r-1}I_k = 2\pi\beta_{r-1},$$
where $\beta_{r-1}$ is the volume of the unit ball of $\mathbb{R}^{r-1}$ (interpreting $\beta_0$ as 1, if you like).

265G
The surface area of a sphere can also be calculated through the following result.
Theorem Let $\mu_{r+1}$ be Lebesgue measure on $\mathbb{R}^{r+1}$, and $\nu_r$ normalized r-dimensional Hausdorff measure on $\mathbb{R}^{r+1}$. If f is a locally $\mu_{r+1}$-integrable real-valued function, $y\in\mathbb{R}^{r+1}$ and δ > 0,
$$\int_{B(y,\delta)}f\,d\mu_{r+1} = \int_0^\delta\int_{\partial B(y,t)}f\,d\nu_r\,dt,$$
where I write ∂B(y, s) for the sphere $\{x : \|x-y\| = s\}$ and the integral $\int\ldots dt$ is to be taken with respect to Lebesgue measure on ℝ.

proof Take any differentiable function $\phi : \mathbb{R}^r\to S_r$ with a Borel set $F\subseteq\mathbb{R}^r$ such that $\phi{\restriction}F$ is a bijection between F and $S_r$; such a pair (φ, F) is described in 265F. Define $\psi : \mathbb{R}^r\times\mathbb{R}\to\mathbb{R}^{r+1}$ by setting $\psi(z,t) = y + t\phi(z)$; then ψ is differentiable and $\psi{\restriction}(F\times\left]0,\delta\right])$ is a bijection between $F\times\left]0,\delta\right]$ and $B(y,\delta)\setminus\{y\}$. For $t\in\left]0,\delta\right]$, $z\in\mathbb{R}^r$ set $\psi_t(z) = \psi(z,t)$; then $\psi_t{\restriction}F$ is a bijection between F and the sphere $\{x : \|x-y\| = t\} = \partial B(y,t)$. The derivative of φ at z is an $(r+1)\times r$ matrix $T_1(z)$, say, and the derivative $T_t(z)$ of $\psi_t$ at z is just $tT_1(z)$; also the derivative of ψ at (z, t) is the $(r+1)\times(r+1)$ matrix $T(z,t) = \bigl(\,tT_1(z)\ \ \phi(z)\,\bigr)$, where φ(z) is interpreted as a column vector. If we set
$$J_t(z) = \sqrt{\det T_t(z)'T_t(z)},\qquad J(z,t) = |\det T(z,t)|,$$
then
$$J(z,t)^2 = \det T(z,t)'T(z,t) = \det\left(\begin{pmatrix}tT_1(z)'\\ \phi(z)'\end{pmatrix}\begin{pmatrix}tT_1(z) & \phi(z)\end{pmatrix}\right) = \det\begin{pmatrix}t^2T_1(z)'T_1(z) & 0\\ 0 & 1\end{pmatrix} = J_t(z)^2,$$
because when we come to calculate the (i, r + 1)-coefficient of $T(z,t)'T(z,t)$, for 1 ≤ i ≤ r, it is
$$\sum_{j=1}^{r+1}t\frac{\partial\phi_j}{\partial\zeta_i}(z)\,\phi_j(z) = \frac{t}{2}\frac{\partial}{\partial\zeta_i}\Bigl(\sum_{j=1}^{r+1}\phi_j(z)^2\Bigr) = 0,$$
where $\phi_j$ is the jth coordinate of φ; while the (r + 1, r + 1)-coefficient of $T(z,t)'T(z,t)$ is just $\sum_{j=1}^{r+1}\phi_j(z)^2 = 1$. So in fact $J(z,t) = J_t(z)$ for all $z\in\mathbb{R}^r$, t > 0. Now, given $f\in\mathcal{L}^1(\mu_{r+1})$, we can calculate
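$$\begin{aligned}
\int_{B(y,\delta)}f\,d\mu_{r+1} &= \int_{B(y,\delta)\setminus\{y\}}f\,d\mu_{r+1}\\
&= \int_{F\times]0,\delta]}f(\psi(z,t))\,J(z,t)\,\mu_{r+1}(d(z,t)) &&\text{(by 263D)}\\
&= \int_0^\delta\int_F f(\psi_t(z))\,J_t(z)\,\mu_r(dz)\,dt &&\text{(where $\mu_r$ is Lebesgue measure on $\mathbb{R}^r$, by Fubini's theorem, 252B)}\\
&= \int_0^\delta\int_{\partial B(y,t)}f\,d\nu_r\,dt
\end{aligned}$$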
by 265E(vii).

265H Corollary If $\nu_r$ is normalized r-dimensional Hausdorff measure on $\mathbb{R}^{r+1}$, then $\nu_rS_r = (r+1)\beta_{r+1}$.

proof In 265G, take y = 0, δ = 1, and $f = \chi B(0,1)$; then
$$\beta_{r+1} = \int f\,d\mu_{r+1} = \int_0^1\nu_r(\partial B(0,t))\,dt = \int_0^1t^r\nu_rS_r\,dt = \frac{1}{r+1}\nu_rS_r,$$
applying 264G to the maps $x\mapsto tx$, $x\mapsto\frac{1}{t}x$ from $\mathbb{R}^{r+1}$ to itself to see that $\nu_r(\partial B(0,t)) = t^r\nu_rS_r$ for t > 0.

265X Basic exercises (a) Let r ≥ 1, and let $S_r(\alpha) = \{z : z\in\mathbb{R}^{r+1},\ \|z\| = \alpha\}$ be the r-sphere of radius α. Show that $\nu_rS_r(\alpha) = 2\pi\beta_{r-1}\alpha^r = (r+1)\beta_{r+1}\alpha^r$ for every α ≥ 0.

> (b) Let r ≥ 1, and for a ∈ [−1, 1] set $C_a = \{z : z\in\mathbb{R}^{r+1},\ \|z\| = 1,\ \zeta_1\ge a\}$, writing $z = (\zeta_1,\dots,\zeta_{r+1})$ as usual. Show that $\nu_rC_a = r\beta_r\int_0^{\arccos a}\sin^{r-1}t\,dt$.

> (c) Again write $C_a = \{z : z\in S_r,\ \zeta_1\ge a\}$, where $S_r\subseteq\mathbb{R}^{r+1}$ is the unit sphere. Show that, for any $a\in\left]0,1\right]$, $\nu_rC_a\le\dfrac{\nu_rS_r}{2(r+1)a^2}$. (Hint: calculate $\sum_{i=1}^{r+1}\int_{S_r}|\xi_i|^2\,\nu_r(dx)$.)
> (d) Let $\phi : \left]0,1\right[\to\mathbb{R}^r$ be an injective differentiable function. Show that the ‘length’ or one-dimensional Hausdorff measure of $\phi[\left]0,1\right[]$ is just $\int_0^1\|\phi'(t)\|\,dt$.

(e) (i) Show that if I is the identity r × r matrix and $z\in\mathbb{R}^r$, then $\det(I + zz') = 1 + \|z\|^2$. (Hint: induce on r.)
(ii) Write $U_{r-1}$ for the open unit ball in $\mathbb{R}^{r-1}$, where r ≥ 2. Define $\phi : U_{r-1}\times\mathbb{R}\to S_r$ by setting
$$\phi\begin{pmatrix}x\\ \xi\end{pmatrix} = \begin{pmatrix}x\\ \theta(x)\cos\xi\\ \theta(x)\sin\xi\end{pmatrix},$$
where $\theta(x) = \sqrt{1 - \|x\|^2}$. Show that
$$\phi'\begin{pmatrix}x\\ \xi\end{pmatrix}'\phi'\begin{pmatrix}x\\ \xi\end{pmatrix} = \begin{pmatrix}I + \frac{1}{\theta(x)^2}xx' & 0\\ 0 & \theta(x)^2\end{pmatrix},$$
so that $J\begin{pmatrix}x\\ \xi\end{pmatrix} = 1$ for all $x\in U_{r-1}$, ξ ∈ ℝ.
(iii) Hence show that the normalized r-dimensional Hausdorff measure of $\{y : y\in S_r,\ \sum_{i=1}^{r-1}\eta_i^2 < 1\}$ is just $2\pi\beta_{r-1}$, where $\beta_{r-1}$ is the Lebesgue measure of $U_{r-1}$.
(iv) By considering $\psi(z) = \begin{pmatrix}z\\ 0\\ 0\end{pmatrix}$ for $z\in S_{r-2}$, or otherwise, show that the normalized r-dimensional Hausdorff measure of $S_r$ is $2\pi\beta_{r-1}$.
(v) Setting $C_a = \{z : z\in\mathbb{R}^{r+1},\ \|z\| = 1,\ \zeta_r\ge a\}$, as in 265Xb and 265Xc, show that $\nu_rC_a = 2\pi\mu_{r-1}\{x : x\in\mathbb{R}^{r-1},\ \|x\|\le1,\ \xi_1\ge a\}$ for every a ∈ [−1, 1].

265Y Further exercises (a) Take a < b in ℝ. (i) Show that $\phi : [a,b]\to\mathbb{R}^r$ is absolutely continuous in the sense of 264Yp iff all its coordinates $\phi_i : [a,b]\to\mathbb{R}$, for i ≤ r, are absolutely continuous in the sense of §225. (ii) Let $\phi : [a,b]\to\mathbb{R}^r$ be a continuous function, and set $F = \{x : x\in\left]a,b\right[,\ \phi \text{ is differentiable at } x\}$. Show that φ is absolutely continuous iff $\int_F\|\phi'(x)\|\,dx$ is finite and $\nu_1(\phi[[a,b]\setminus F]) = 0$, where $\nu_1$ is normalized Hausdorff one-dimensional measure on $\mathbb{R}^r$. (Hint: 225K.) (iii) Show that if $\phi : [a,b]\to\mathbb{R}^r$ is absolutely continuous then $\nu_1^*(\phi[D])\le\int_D\|\phi'(x)\|\,dx$ for every D ⊆ [a, b], with equality if D is measurable and $\phi{\restriction}D$ is injective.

265 Notes and comments The proof of 265B seems to call on most of the second half of the alphabet. The idea is supposed to be straightforward enough.
Because $T[\mathbb{R}^r]$ has dimension at most r, it can be rotated by an orthogonal transformation P into a subspace of the canonical r-dimensional subspace V, which is a natural copy of $\mathbb{R}^r$; the matrix R represents the copying process from V to $\mathbb{R}^r$, and φ, or $P'R'$, is a copy of $\mathbb{R}^r$ onto a subspace including $T[\mathbb{R}^r]$. All this copying back and forth is designed to turn T into a linear operator $S : \mathbb{R}^r\to\mathbb{R}^r$ to which we can apply 263A, and part (b) of the proof is the check that we are copying the measures as well as the linear structures.

In 265D-265E I have tried to follow 263C-263D as closely as possible. In fact only one new idea is needed. When s = r, we have a special argument available to show that $\mu_r^*\phi[D]\le J\mu_r^*D + \epsilon\mu_r^*D$ (in the language of 263C) which applies whether or not J = 0. When s > r, this approach fails, because we can no longer approximate $\nu_rT[B]$ by $\nu_rG$ where $G\supseteq T[B]$ is open. (See part (b-i) of the proof of 263C.) I therefore turn to a different argument, valid only when J > 0, and accordingly have to find a separate method to show that $\{\phi(x) : x\in D,\ J(x) = 0\}$ is $\nu_r$-negligible. Since we are working without restrictions on the dimensions r, s except that r ≤ s, we can use the trick of approximating $\phi : D\to\mathbb{R}^s$ by $\psi_\eta : D\to\mathbb{R}^{s+r}$, as in part (d) of the proof of 265E.

I give three methods by which the area of the r-sphere can be calculated: a bare-hands approach (265F), the surrounding-cylinder method (265Xe) and an important repeated-integral theorem (265G). The first two provide formulae for the area of a cap (265Xb, 265Xe(v)). The surrounding-cylinder method is attractive because the Jacobian comes out to be 1, that is, we have an inverse-measure-preserving function. I note that despite having developed a technique which allows irregular domains, I am still forced by the singularity in the function θ of 265Xe to take the sphere in two bites. Theorem 265G is a special case of the Coarea Theorem (Evans & Gariepy 92, §3.4; Federer 69, 3.2.12).
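For readers who like to see the formulae agree numerically, here is a short Python sketch, not part of the text, comparing the two expressions $2\pi\beta_{r-1}$ (265F) and $(r+1)\beta_{r+1}$ (265H) for $\nu_rS_r$. It computes $\beta_n$ as $\prod_{k=1}^{n}I_k$, in the manner of 252Q, with each $I_k$ approximated by Simpson's rule, and checks both against the standard closed form $2\pi^{(r+1)/2}/\Gamma(\frac{r+1}{2})$, which is not derived in this section. The quadrature rule, the step count and the function names are assumptions of the sketch.

```python
import math


def I(k, steps=2000):
    """I_k = integral of cos^k t over [-pi/2, pi/2], by composite Simpson's rule
    (steps must be even)."""
    a, b = -math.pi / 2, math.pi / 2
    h = (b - a) / steps
    total = math.cos(a) ** k + math.cos(b) ** k
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(a + i * h) ** k
    return total * h / 3


def beta(n):
    """Volume of the unit ball of R^n, as the product I_1 ... I_n (beta_0 = 1)."""
    p = 1.0
    for k in range(1, n + 1):
        p *= I(k)
    return p


def area_via_265F(r):
    """Surface measure of S_r as 2 pi beta_{r-1}."""
    return 2 * math.pi * beta(r - 1)


def area_via_265H(r):
    """Surface measure of S_r as (r + 1) beta_{r+1}."""
    return (r + 1) * beta(r + 1)


def area_closed_form(r):
    """Standard formula 2 pi^{(r+1)/2} / Gamma((r+1)/2), used only as a reference."""
    return 2 * math.pi ** ((r + 1) / 2) / math.gamma((r + 1) / 2)


for r in range(1, 6):
    print(r, area_via_265F(r), area_via_265H(r), area_closed_form(r))
```

For r = 1 all three give the circumference 2π, and for r = 2 the familiar 4π.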
For the next step in the geometric theory of measures on Euclidean space, see Chapter 47 in Volume 4.
Chapter 27 Probability theory

Lebesgue created his theory of integration in response to a number of problems in real analysis, and all his life seems to have thought of it as a tool for use in geometry and calculus (Lebesgue 72, vols. 1 and 2). Remarkably, it turned out, when suitably adapted, to provide a solid foundation for probability theory. The development of this approach is generally associated with the name of Kolmogorov. It has so come to dominate modern abstract probability theory that many authors ignore all other methods. I do not propose to commit myself to any view on whether σ-additive measures are the only way to give a rigorous foundation to probability theory, or whether they are adequate to deal with all probabilistic ideas; there are some serious philosophical questions here, since probability theory, at least in its applied aspects, seeks to help us to understand the material world outside mathematics. But from my position as a measure theorist, it is incontrovertible that probability theory is among the central applications of the concepts and theorems of measure theory, and is one of the most vital sources of new ideas; and that every measure theorist must be alert to the intuitions which probabilistic methods can provide.

I have written the preceding paragraph in terms suggesting that ‘probability theory’ is somehow distinguishable from the rest of measure theory; this is another point on which I should prefer not to put forward any opinion as definitive. But undoubtedly there is a distinction, rather deeper than the elementary point that probability deals (almost) exclusively with spaces of measure 1. M.Loève argues persuasively (Loève 77, §10.2) that the essence of probability theory is the artificial nature of the probability spaces themselves. In measure theory, when we wish to integrate a function, we usually feel that we have a proper function with a domain and values.
In probability theory, when we take the expectation of a random variable, the variable is an ‘observable’ or ‘the result of an experiment’; we are generally uncertain, or ignorant, or indifferent concerning the factors underlying the variable. Let me give an example from the theorems below. In the proof of the Central Limit Theorem (274F), I find that I need an auxiliary list $Z_0,\dots,Z_n$ of random variables, independent of each other and of the original sequence $X_0,\dots,X_n$. I create such a sequence by taking a product space Ω × Ω′, and writing $X_i'(\omega,\omega') = X_i(\omega)$, while the $Z_i$ are functions of ω′. Now the difference between the $X_i$ and the $X_i'$ is of a type which a well-trained analyst would ordinarily take seriously. We do not think that the function $x\mapsto x^2 : [0,1]\to[0,1]$ is the same thing as the function $(x_1,x_2)\mapsto x_1^2 : [0,1]^2\to[0,1]$. But a probabilist is likely to feel that it is positively pedantic to start writing $X_i'$ instead of $X_i$. He did not believe in the space Ω in the first place, and if it turns out to be inadequate for his intuition he enlarges it without a qualm. Loève calls probability spaces ‘fictions’, ‘inventions of the imagination’ in Larousse’s words; they are necessary in the models Kolmogorov has taught us to use, but we have a vast amount of freedom in choosing them, and in their essence they are nothing so definite as a set with points. A probability space, therefore, is somehow a more shadowy entity in probability theory than it is in measure theory. The important objects in probability theory are random variables and distributions, particularly joint distributions. In this volume I shall deal exclusively with random variables which can be thought of as taking values in some power of ℝ; but this is not the central point. What is vital is that somehow the codomain, the potential set of values, of a random variable, is much better defined than its domain.
Consequently our attention is focused not on any features of the artificial space which it is convenient to use as the underlying probability space – I write ‘underlying’, though it is the most superficial and easily changed aspect of the model – but on the distribution on the codomain induced by the random variable. Thus the Central Limit Theorem, which speaks only of distributions, is actually more important in applied probability than the Strong Law of Large Numbers, which claims to tell us what a long-term average will almost certainly be. W.Feller (Feller 66) goes even farther than Loève, and as far as possible works entirely with distributions, setting up machinery which enables him to go for long stretches without mentioning probability spaces at all. I make no attempt to emulate him. But the approach is instructive and faithful to the essence of the subject.

Probability theory includes more mathematics than can easily be encompassed in a lifetime, and I have selected for this introductory chapter the two limit theorems I have already mentioned, the Strong Law of Large Numbers and the Central Limit Theorem, together with some material on martingales (§§275-276). They illustrate not only the special character of probability theory – so that you will be able to form your own judgement on the remarks above – but also some of its chief contributions to ‘pure’ measure theory, the concepts of ‘independence’ and ‘conditional expectation’.
271 Distributions

I start this chapter with a discussion of ‘probability distributions’, the probability measures on $\mathbb{R}^n$ defined by families $(X_1,\dots,X_n)$ of random variables. I give the basic results describing the circumstances under which two distributions are equal (271G), integration with respect to a distribution (271E), and probability density functions (271I-271K).

271A Notation I have just spent some paragraphs on an attempt to describe the essential difference between probability theory and measure theory. But there is a quicker test by which you may discover whether your author is a measure theorist or a probabilist: open any page, and look for the phrases ‘measurable function’ and ‘random variable’, and the formulae ‘$\int f\,d\mu$’ and ‘$\mathbb{E}(X)$’. The first member of each pair will enable you to diagnose ‘measure’ and the second ‘probability’, with little danger of error. So far in this treatise I have firmly used measure theorists’ terminology, with a few individual quirks. But in a chapter on probability theory I find that measure-theoretic notation, while perfectly adequate in a formal sense, does such violence to the familiar formulations as to render them unnatural. Moreover, you must surely at some point – if you have not already done so – become familiar with probabilists’ language. So in this chapter I will make a substantial step in that direction. Happily, I think that this can be done without setting up any direct conflicts, so that I shall be able, in later volumes, to call upon this work in whichever notation then seems appropriate, without needing to re-formulate it.

(a) So let (Ω, Σ, µ) be a probability space. I take the opportunity given by a new phrase to make a technical move.
A real-valued random variable on Ω will be a member of $\mathcal{L}^0(\mu)$, as defined in 241A; that is, a real-valued function X defined on a conegligible subset of Ω such that X is measurable with respect to the completion $\hat\mu$ of µ, or, if you prefer, such that $X{\restriction}E$ is Σ-measurable for some conegligible set E ⊆ Ω.

(b) If X is a real-valued random variable on a probability space (Ω, Σ, µ), write $\mathbb{E}(X) = \int X\,d\mu$ if this exists in [−∞, ∞] in the sense of Chapter 12 and §133. I will sometimes write ‘X has a finite expectation’ in place of ‘X is integrable’. Thus 133A says that ‘$\mathbb{E}(X+Y) = \mathbb{E}(X) + \mathbb{E}(Y)$ whenever $\mathbb{E}(X)$ and $\mathbb{E}(Y)$ and their sum are defined in [−∞, ∞]’, and 122P becomes ‘a real-valued random variable X has a finite expectation iff $\mathbb{E}(|X|) < \infty$’. In this case I will call $\mathbb{E}(X)$ the mean or expectation of X.

(c) If X is a real-valued random variable with finite expectation, I will write
$$\operatorname{Var}(X) = \mathbb{E}(X - \mathbb{E}(X))^2 = \mathbb{E}\bigl(X^2 - 2\mathbb{E}(X)X + \mathbb{E}(X)^2\bigr) = \mathbb{E}(X^2) - (\mathbb{E}(X))^2,$$
the variance of X. (Note that this formula shows that $\mathbb{E}(X)^2\le\mathbb{E}(X^2)$; compare 244Xe(i).) Var(X) is finite iff $\mathbb{E}(X^2) < \infty$, that is, iff $X\in\mathcal{L}^2(\mu)$ (244A). In particular, X + Y and cX have finite variance whenever X and Y do and c ∈ ℝ.

(d) I shall allow myself to use such formulae as
$$\Pr(X > a),\qquad \Pr(X - \epsilon\le Y\le X + \delta),$$
where X and Y are random variables on the same probability space (Ω, Σ, µ), to mean respectively
$$\hat\mu\{\omega : \omega\in\operatorname{dom}X,\ X(\omega) > a\},\qquad \hat\mu\{\omega : \omega\in\operatorname{dom}X\cap\operatorname{dom}Y,\ X(\omega) - \epsilon\le Y(\omega)\le X(\omega) + \delta\},$$
writing $\hat\mu$ for the completion of µ as usual. There are two points to note here. First, Pr depends on $\hat\mu$, not on µ; in effect, the notation automatically directs us to complete the probability space (Ω, Σ, µ). I could, of course, equally well write
$$\Pr(X^2 + Y^2 > 1) = \mu^*\{\omega : \omega\in\operatorname{dom}X\cap\operatorname{dom}Y,\ X(\omega)^2 + Y(\omega)^2 > 1\},$$
taking µ* to be the outer measure on Ω associated with µ (132A). Secondly, I will use this notation only for predicates corresponding to Borel measurable sets; that is to say, I shall write
$$\Pr(\psi(X_1,\dots,X_n)) = \hat\mu\{\omega : \omega\in\textstyle\bigcap_{i\le n}\operatorname{dom}X_i,\ \psi(X_1(\omega),\dots,X_n(\omega))\}$$
only when the set $\{(\alpha_1,\dots,\alpha_n) : \psi(\alpha_1,\dots,\alpha_n)\}$ is a Borel set in $\mathbb{R}^n$. Part of the reason for this restriction will appear in the next few paragraphs; $\Pr(\psi(X_1,\dots,X_n))$ must be something calculable from knowledge of the joint distribution of $X_1,\dots,X_n$, as defined in 271C. In fact we can safely extend the idea to ‘universally measurable’ predicates ψ, to be discussed in Volume 4. But it could happen that µ gave a measure to a set of the form {ω : X(ω) ∈ A} for some exceedingly irregular set A, and in such a case it would be prudent to regard this as an accidental pathology of the probability space, and to treat it in a rather different way. (I see that I have rather glibly assumed that the formula above defines $\Pr(\psi(X_1,\dots,X_n))$ for every Borel predicate ψ. This is a consequence of 271Bb below.)

271B Theorem Let (Ω, Σ, µ) be a probability space, and $X_1,\dots,X_n$ real-valued random variables on Ω. Set $\boldsymbol{X}(\omega) = (X_1(\omega),\dots,X_n(\omega))$ for $\omega\in\bigcap_{i\le n}\operatorname{dom}X_i$.
(a) There is a unique Radon measure ν on $\mathbb{R}^n$ such that
$$\nu\left]-\infty,a\right] = \Pr(X_i\le\alpha_i \text{ for every } i\le n)$$
whenever $a = (\alpha_1,\dots,\alpha_n)\in\mathbb{R}^n$, writing $\left]-\infty,a\right]$ for $\prod_{i\le n}\left]-\infty,\alpha_i\right]$;
(b) $\nu\mathbb{R}^n = 1$ and $\nu E = \hat\mu\boldsymbol{X}^{-1}[E]$ whenever νE is defined, where $\hat\mu$ is the completion of µ; in particular, $\nu E = \Pr((X_1,\dots,X_n)\in E)$ for every Borel set $E\subseteq\mathbb{R}^n$.

proof Let $\hat\Sigma$ be the domain of $\hat\mu$, and set $D = \bigcap_{i\le n}\operatorname{dom}X_i = \operatorname{dom}\boldsymbol{X}$; then D is conegligible, so belongs to $\hat\Sigma$. Let $\hat\mu_D = \hat\mu{\restriction}\mathcal{P}D$ be the subspace measure on D (131B, 214B), and $\nu_0$ the image measure $\hat\mu_D\boldsymbol{X}^{-1}$ (112E); let T be the domain of $\nu_0$. Write $\mathcal{B}$ for the algebra of Borel sets in $\mathbb{R}^n$. Then $\mathcal{B}\subseteq T$. Proof: For i ≤ n, α ∈ ℝ set $F_{i\alpha} = \{x : x\in\mathbb{R}^n,\ \xi_i\le\alpha\}$, $H_{i\alpha} = \{\omega : \omega\in\operatorname{dom}X_i,\ X_i(\omega)\le\alpha\}$. $X_i$ is $\hat\Sigma$-measurable and its domain is in $\hat\Sigma$, so $H_{i\alpha}\in\hat\Sigma$, and $\boldsymbol{X}^{-1}[F_{i\alpha}] = D\cap H_{i\alpha}$ is $\hat\mu_D$-measurable. Thus $F_{i\alpha}\in T$. As T is a σ-algebra of subsets of $\mathbb{R}^n$, $\mathcal{B}\subseteq T$ (121J). ∎ Accordingly $\nu_0{\restriction}\mathcal{B}$ is a measure on $\mathbb{R}^n$ with domain $\mathcal{B}$; of course $\nu_0\mathbb{R}^n = \hat\mu_DD = 1$.
By 256C, the completion ν of $\nu_0{\restriction}\mathcal{B}$ is a Radon measure on $\mathbb{R}^n$, and $\nu\mathbb{R}^n = \nu_0\mathbb{R}^n = 1$. For $E\in\mathcal{B}$,
$$\nu E = \nu_0E = \hat\mu_D\boldsymbol{X}^{-1}[E] = \hat\mu\boldsymbol{X}^{-1}[E] = \Pr((X_1,\dots,X_n)\in E).$$
More generally, if E ∈ dom ν, then there are Borel sets E′, E″ such that E′ ⊆ E ⊆ E″ and ν(E″ \ E′) = 0, so that $\boldsymbol{X}^{-1}[E']\subseteq\boldsymbol{X}^{-1}[E]\subseteq\boldsymbol{X}^{-1}[E'']$ and $\hat\mu(\boldsymbol{X}^{-1}[E'']\setminus\boldsymbol{X}^{-1}[E']) = 0$. This means that $\boldsymbol{X}^{-1}[E]\in\hat\Sigma$ and $\hat\mu\boldsymbol{X}^{-1}[E] = \hat\mu\boldsymbol{X}^{-1}[E'] = \nu E' = \nu E$.

As for the uniqueness of ν, if ν′ is any Radon measure on $\mathbb{R}^n$ such that $\nu'\left]-\infty,a\right] = \Pr(X_i\le\alpha_i\ \forall\,i\le n)$ for every $a\in\mathbb{R}^n$, then surely
$$\nu'\mathbb{R}^n = \lim_{k\to\infty}\nu'\left]-\infty,k\mathbf{1}\right] = \lim_{k\to\infty}\nu\left]-\infty,k\mathbf{1}\right] = 1 = \nu\mathbb{R}^n.$$
Also $\mathcal{I} = \{\left]-\infty,a\right] : a\in\mathbb{R}^n\}$ is closed under finite intersections, and ν and ν′ agree on $\mathcal{I}$. So by the Monotone Class Theorem (or rather, its corollary 136C), ν and ν′ agree on the σ-algebra generated by $\mathcal{I}$, which is $\mathcal{B}$ (121J), and are identical (256D).

271C Definition Let (Ω, Σ, µ) be a probability space and $X_1,\dots,X_n$ real-valued random variables on Ω. By the (joint) distribution or law $\nu_{\boldsymbol{X}}$ of the family $\boldsymbol{X} = (X_1,\dots,X_n)$ I shall mean the Radon probability measure ν of 271B. If we think of $\boldsymbol{X}$ as a function from $\bigcap_{i\le n}\operatorname{dom}X_i$ to $\mathbb{R}^n$, then $\nu_{\boldsymbol{X}}E = \Pr(\boldsymbol{X}\in E)$ for every Borel set $E\subseteq\mathbb{R}^n$.
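On a finite probability space the construction of 271B reduces to bookkeeping, which may help to fix ideas. In the Python sketch below (an illustration only), the space (two fair dice with the uniform measure) and the family $\boldsymbol{X}$ = (sum, larger value) are hypothetical choices, as are all the function names. It computes $\nu_{\boldsymbol{X}}$ as the image measure $\mu\boldsymbol{X}^{-1}$, evaluates $\nu_{\boldsymbol{X}}\left]-\infty,a\right]$, and then recovers every point mass from those values alone by rectangle differences, a discrete shadow of the uniqueness assertion in 271Ba. Exact rational arithmetic (fractions.Fraction) is used so that the comparisons are exact.

```python
from fractions import Fraction
from itertools import product

# Hypothetical probability space: two fair dice, uniform measure on 36 outcomes.
omega = list(product(range(1, 7), repeat=2))
mu = {w: Fraction(1, 36) for w in omega}


def X(w):
    """The family X = (X1, X2): sum of the two dice, and the larger of the two."""
    return (w[0] + w[1], max(w))


def distribution(mu, X):
    """Image measure nu_X = mu X^{-1}, stored as point masses on R^2."""
    nu = {}
    for w, p in mu.items():
        x = X(w)
        nu[x] = nu.get(x, 0) + p
    return nu


def joint_cdf(nu, a):
    """nu_X ]-inf, a] = Pr(X1 <= a[0] and X2 <= a[1])."""
    return sum((p for x, p in nu.items() if x[0] <= a[0] and x[1] <= a[1]), Fraction(0))


def masses_from_cdf(F, xs, ys):
    """Recover point masses from F by rectangle differences over the grid xs x ys,
    which must contain every coordinate value of every atom."""
    recovered = {}
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            x_prev = xs[i - 1] if i > 0 else float("-inf")
            y_prev = ys[j - 1] if j > 0 else float("-inf")
            m = F((x, y)) - F((x_prev, y)) - F((x, y_prev)) + F((x_prev, y_prev))
            if m != 0:
                recovered[(x, y)] = m
    return recovered


nu = distribution(mu, X)
xs = sorted({x for x, _ in nu})
ys = sorted({y for _, y in nu})


def F(a):
    return joint_cdf(nu, a)


print(sum(nu.values()), masses_from_cdf(F, xs, ys) == nu)
```

Only the values of F are used in the reconstruction; the underlying space `omega` plays no further part once `nu` is computed, in keeping with the remarks in 271D(d).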
271D Remarks (a) The choice of the Radon probability measure $\nu_{\boldsymbol{X}}$ as ‘the’ distribution of $\boldsymbol{X}$, with the insistence that ‘Radon measures’ should be complete, is of course somewhat arbitrary. Apart from the general principle that one should always complete measures, these conventions fit better with some of the work in Volume 4 and with such results as 272G below.

(b) Observe that in order to speak of the distribution of a family $\boldsymbol{X} = (X_1,\dots,X_n)$ of random variables, it is essential that all the $X_i$ should be based on the same probability space.

(c) I see that the language I have chosen allows the $X_i$ to have different domains, so that the family $(X_1,\dots,X_n)$ may not be exactly identifiable with the corresponding function from $\bigcap_{i\le n}\operatorname{dom}X_i$ to $\mathbb{R}^n$. I hope however that using the same symbol $\boldsymbol{X}$ for both will cause no confusion.

(d) It is not useful to think of the whole image measure $\nu_0 = \hat\mu_D\boldsymbol{X}^{-1}$ in the proof of 271B as the distribution of $\boldsymbol{X}$, unless it happens to be equal to $\nu = \nu_{\boldsymbol{X}}$. The ‘distribution’ of a random variable is exactly that aspect of it which can be divorced from any consideration of the underlying space (Ω, Σ, µ), and the point of such results as 271K and 272G is that distributions can be calculated from each other, without going back to the relatively fluid and uncertain model of a random variable in terms of a function on a probability space.

(e) If $\boldsymbol{X} = (X_1,\dots,X_n)$ and $\boldsymbol{Y} = (Y_1,\dots,Y_n)$ are such that $X_i =_{\text{a.e.}} Y_i$ for each i, then
$$\{\omega : \omega\in\textstyle\bigcap_{i\le n}\operatorname{dom}X_i,\ X_i(\omega)\le\alpha_i\ \forall\,i\le n\}\ \triangle\ \{\omega : \omega\in\textstyle\bigcap_{i\le n}\operatorname{dom}Y_i,\ Y_i(\omega)\le\alpha_i\ \forall\,i\le n\}$$
is negligible, so
$$\Pr(X_i\le\alpha_i\ \forall\,i\le n) = \hat\mu\{\omega : \omega\in\textstyle\bigcap_{i\le n}\operatorname{dom}X_i,\ X_i(\omega)\le\alpha_i\ \forall\,i\le n\} = \Pr(Y_i\le\alpha_i\ \forall\,i\le n)$$
for all $\alpha_1,\dots,\alpha_n\in\mathbb{R}$, and $\nu_{\boldsymbol{X}} = \nu_{\boldsymbol{Y}}$. This means that we can, if we wish, think of a distribution as a measure $\nu_{\boldsymbol{u}}$ where $\boldsymbol{u} = (u_1,\dots,u_n)$ is a finite sequence in $L^0(\mu)$. In the present chapter I shall not emphasize this approach, but it will always be at the back of my mind.

271E Measurable functions of random variables: Proposition Let $\boldsymbol{X} = (X_1,\dots,X_n)$ be a family of random variables (as always in such a context, I mean them all to be on the same probability space (Ω, Σ, µ)); write $T_{\boldsymbol{X}}$ for the domain of $\nu_{\boldsymbol{X}}$, and let h be a $T_{\boldsymbol{X}}$-measurable real-valued function defined $\nu_{\boldsymbol{X}}$-a.e. on $\mathbb{R}^n$. Then we have a random variable $Y = h(X_1,\dots,X_n)$ defined by setting $h(X_1,\dots,X_n)(\omega) = h(X_1(\omega),\dots,X_n(\omega))$ for every $\omega\in\boldsymbol{X}^{-1}[\operatorname{dom}h]$. The distribution $\nu_Y$ of Y is the measure on ℝ defined by the formula
$$\nu_YF = \nu_{\boldsymbol{X}}h^{-1}[F]$$
for just those sets F ⊆ ℝ such that $h^{-1}[F]\in T_{\boldsymbol{X}}$. Also
$$\mathbb{E}(Y) = \int h\,d\nu_{\boldsymbol{X}}$$
in the sense that if one of these exists in [−∞, ∞], so does the other, and they are then equal.

proof (a)(i)
$$\Omega\setminus\operatorname{dom}Y\subseteq\bigcup_{i\le n}(\Omega\setminus\operatorname{dom}X_i)\cup\boldsymbol{X}^{-1}[\mathbb{R}^n\setminus\operatorname{dom}h]$$
is negligible (using 271Bb), so dom Y is conegligible. If a ∈ ℝ, then $E = \{x : x\in\operatorname{dom}h,\ h(x)\le a\}\in T_{\boldsymbol{X}}$, so $\{\omega : \omega\in\Omega,\ Y(\omega)\le a\} = \boldsymbol{X}^{-1}[E]\in\hat\Sigma$. As a is arbitrary, Y is $\hat\Sigma$-measurable, and is a random variable.

(ii) Let $\tilde h : \mathbb{R}^n\to\mathbb{R}$ be any extension of h to the whole of $\mathbb{R}^n$. Then $\tilde h$ is $T_{\boldsymbol{X}}$-measurable, so the ordinary image measure $\nu_{\boldsymbol{X}}\tilde h^{-1}$, defined on $\{F : \tilde h^{-1}[F]\in\operatorname{dom}\nu_{\boldsymbol{X}}\}$, is a Radon probability measure on ℝ (256G). But for any A ⊆ ℝ,
$$\tilde h^{-1}[A]\ \triangle\ h^{-1}[A]\subseteq\mathbb{R}^n\setminus\operatorname{dom}h$$
is $\nu_{\boldsymbol{X}}$-negligible, so $\nu_{\boldsymbol{X}}h^{-1}[F] = \nu_{\boldsymbol{X}}\tilde h^{-1}[F]$ if either is defined. If F ⊆ ℝ is a Borel set, then
$$\nu_YF = \hat\mu\{\omega : Y(\omega)\in F\} = \hat\mu(\boldsymbol{X}^{-1}[h^{-1}[F]]) = \nu_{\boldsymbol{X}}(h^{-1}[F]).$$
So $\nu_Y$ and $\nu_{\boldsymbol{X}}\tilde h^{-1}$ agree on the Borel sets and are equal (256D).

(b) Now apply Theorem 235E to the measures $\hat\mu$, $\nu_{\boldsymbol{X}}$ and the function $\phi = \boldsymbol{X}$. We have
$$\int\chi(\boldsymbol{X}^{-1}[F])\,d\hat\mu = \hat\mu(\boldsymbol{X}^{-1}[F]) = \nu_{\boldsymbol{X}}F$$
for every $F\in T_{\boldsymbol{X}}$, by 271Bb. Because h is $\nu_{\boldsymbol{X}}$-virtually measurable and defined $\nu_{\boldsymbol{X}}$-a.e., 235Eb tells us that
$$\int h(\boldsymbol{X})\,d\mu = \int h(\boldsymbol{X})\,d\hat\mu = \int h\,d\nu_{\boldsymbol{X}}$$
whenever either side is defined in [−∞, ∞], which is exactly the result we need.

271F Corollary If X is a single random variable with distribution $\nu_X$, then
$$\mathbb{E}(X) = \int_{-\infty}^{\infty}x\,\nu_X(dx)$$
if either is defined in [−∞, ∞]. Similarly
$$\mathbb{E}(X^2) = \int_{-\infty}^{\infty}x^2\,\nu_X(dx)$$
(whatever X may be). If X, Y are two random variables (on the same probability space!) then we have
$$\mathbb{E}(X\times Y) = \int xy\,\nu_{(X,Y)}(d(x,y))$$
if either side is defined in [−∞, ∞].

271G Distribution functions (a) If X is a real-valued random variable, its distribution function is the function $F_X : \mathbb{R}\to[0,1]$ defined by setting
$$F_X(a) = \Pr(X\le a) = \nu_X\left]-\infty,a\right]$$
for every a ∈ ℝ. (Warning! some authors prefer $F_X(a) = \Pr(X < a)$.) Observe that $F_X$ is non-decreasing, that $\lim_{a\to-\infty}F_X(a) = 0$, that $\lim_{a\to\infty}F_X(a) = 1$ and that $\lim_{x\downarrow a}F_X(x) = F_X(a)$ for every a ∈ ℝ. By 271Ba, X and Y have the same distribution iff $F_X = F_Y$.

(b) If $X_1,\dots,X_n$ are real-valued random variables on the same probability space, their (joint) distribution function is the function $F_{\boldsymbol{X}} : \mathbb{R}^n\to[0,1]$ defined by writing
$$F_{\boldsymbol{X}}(a) = \Pr(X_i\le\alpha_i\ \forall\,i\le n)$$
whenever $a = (\alpha_1,\dots,\alpha_n)\in\mathbb{R}^n$. If $\boldsymbol{X}$ and $\boldsymbol{Y}$ have the same distribution function, they have the same distribution, by the n-dimensional version of 271B.

271H Densities Let $\boldsymbol{X} = (X_1,\dots,X_n)$ be a family of random variables, all defined on the same probability space. A density function for $(X_1,\dots,X_n)$ is a Radon-Nikodým derivative, with respect to Lebesgue measure, for the distribution $\nu_{\boldsymbol{X}}$; that is, a non-negative function f, integrable with respect to Lebesgue measure $\mu_L$ on $\mathbb{R}^n$, such that
$$\int_Ef\,d\mu_L = \nu_{\boldsymbol{X}}E = \Pr(\boldsymbol{X}\in E)$$
for every Borel set $E\subseteq\mathbb{R}^n$ (256J), if there is such a function, of course.

271I Proposition Let $\boldsymbol{X} = (X_1,\dots,X_n)$ be a family of random variables, all defined on the same probability space. Write $\mu_L$ for Lebesgue measure on $\mathbb{R}^n$.
(a) There is a density function for $\boldsymbol{X}$ iff $\Pr(\boldsymbol{X}\in E) = 0$ for every Borel set E such that $\mu_LE = 0$.
(b) A non-negative Lebesgue integrable function f is a density function for $\boldsymbol{X}$ iff $\int_{]-\infty,a]}f\,d\mu_L = \Pr(\boldsymbol{X}\in\left]-\infty,a\right])$ for every $a\in\mathbb{R}^n$.
(c) Suppose that f is a density function for $\boldsymbol{X}$, and $G = \{x : f(x) > 0\}$. Then if h is a Lebesgue measurable real-valued function defined almost everywhere on G,
$$\mathbb{E}(h(\boldsymbol{X})) = \int h\,d\nu_{\boldsymbol{X}} = \int h\times f\,d\mu_L$$
if any of the three integrals is defined in [−∞, ∞], interpreting (h × f)(x) as 0 if f(x) = 0 and x ∉ dom h.

proof (a) Apply 256J to the Radon probability measure $\nu_{\boldsymbol{X}}$.

(b) Of course the condition is necessary. If it is satisfied, then (by B.Levi's theorem)
$$\int f\,d\mu_L = \lim_{k\to\infty}\int_{]-\infty,k\mathbf{1}]}f\,d\mu_L = \lim_{k\to\infty}\nu_{\boldsymbol{X}}\left]-\infty,k\mathbf{1}\right] = 1.$$
So we have a Radon probability measure ν defined by writing $\nu E = \int_Ef\,d\mu_L$ whenever $E\cap\{x : f(x) > 0\}$ is Lebesgue measurable (256E). We are supposing that $\nu\left]-\infty,a\right] = \nu_{\boldsymbol{X}}\left]-\infty,a\right]$ for every $a\in\mathbb{R}^n$; by 271Ba, as usual, $\nu = \nu_{\boldsymbol{X}}$, and
$$\int_Ef\,d\mu_L = \nu E = \nu_{\boldsymbol{X}}E = \Pr(\boldsymbol{X}\in E)$$
for every Borel set $E\subseteq\mathbb{R}^n$, and f is a density function for $\boldsymbol{X}$.

(c) By 256E, $\nu_{\boldsymbol{X}}$ is the indefinite-integral measure over $\mu_L$ associated with f. So, writing $G = \{x : f(x) > 0\}$, we have $\int h\,d\nu_{\boldsymbol{X}} = \int h\times f\,d\mu_L$ whenever either is defined in [−∞, ∞]. But h is $T_{\boldsymbol{X}}$-measurable and defined $\nu_{\boldsymbol{X}}$-almost everywhere, where $T_{\boldsymbol{X}} = \operatorname{dom}\nu_{\boldsymbol{X}}$, so $\mathbb{E}(h(\boldsymbol{X})) = \int h\,d\nu_{\boldsymbol{X}}$ by 271E.

271J The machinery developed in §263 is sufficient to give a very general result on the densities of random variables of the form $\phi(\boldsymbol{X})$, as follows.

Theorem Let $\boldsymbol{X} = (X_1,\dots,X_n)$ be a family of random variables, and $D\subseteq\mathbb{R}^n$ a Borel set such that $\Pr(\boldsymbol{X}\in D) = 1$. Let $\phi : D\to\mathbb{R}^n$ be a function which is differentiable relative to its domain everywhere in D; for x ∈ D, let T(x) be a derivative of φ at x, and set $J(x) = |\det T(x)|$. Suppose that J(x) ≠ 0 for each x ∈ D, and that $\boldsymbol{X}$ has a density function f; and suppose moreover that $\langle D_k\rangle_{k\in\mathbb{N}}$ is a disjoint sequence of Borel sets, with union D, such that $\phi{\restriction}D_k$ is injective for every k. Then $\phi(\boldsymbol{X})$ has a density function $g = \sum_{k=0}^{\infty}g_k$ where
$$g_k(y) = \frac{f(\phi^{-1}(y))}{J(\phi^{-1}(y))}\ \text{ for } y\in\phi[D_k\cap\operatorname{dom}f],\qquad g_k(y) = 0\ \text{ for } y\in\mathbb{R}^n\setminus\phi[D_k].$$

proof By 262Ia, φ is continuous, therefore Borel measurable, so $\phi(\boldsymbol{X})$ is a random variable. For the moment, fix k ∈ ℕ and a Borel set $F\subseteq\mathbb{R}^n$. By 263D(iii), $\phi[D_k]$ is measurable, and by 263D(ii) $\phi[D_k\setminus\operatorname{dom}f]$ is negligible. The function $g_k$ is such that $f(x) = J(x)g_k(\phi(x))$ for every $x\in D_k\cap\operatorname{dom}f$, so by 263D(v) we have
$$\int_Fg_k\,d\mu = \int_{\phi[D_k]}g_k\times\chi F\,d\mu = \int_{D_k}J(x)g_k(\phi(x))\chi F(\phi(x))\,\mu(dx) = \int_{D_k\cap\phi^{-1}[F]}f\,d\mu = \Pr(\boldsymbol{X}\in D_k\cap\phi^{-1}[F]).$$
(The integral $\int_{\phi[D_k]}g_k\times\chi F$ is defined because $\int_{D_k}J\times(g_k\times\chi F)\phi$ is defined, and the integral $\int g_k\times\chi F$ is defined because $\phi[D_k]$ is measurable and $g_k$ is zero off $\phi[D_k]$.) Now sum over k. Every $g_k$ is non-negative, so by B.Levi's theorem g is finite almost everywhere on F, and
$$\int_Fg\,d\mu=\sum_{k=0}^\infty\int_Fg_k\,d\mu=\sum_{k=0}^\infty\Pr(\mathbf X\in D_k\cap\phi^{-1}[F])=\Pr(\mathbf X\in\phi^{-1}[F])=\Pr(\phi(\mathbf X)\in F).$$
As $F$ is arbitrary, $g$ is a density function for $\phi(\mathbf X)$, as claimed.

271K The application of the last theorem to ordinary transformations is sometimes indirect, so I give an example.

Proposition Let $X$, $Y$ be two random variables with a joint density function $f$. Then $X\times Y$ has a density function $h$, where
$$h(u)=\int_{-\infty}^{\infty}\frac1{|v|}f\Bigl(\frac uv,v\Bigr)\,dv$$
whenever this is defined in $\mathbb R$.
proof Set $\phi(x,y)=(xy,y)$ for $(x,y)\in\mathbb R^2$. Then $\phi$ is differentiable, with derivative
$$T(x,y)=\begin{pmatrix}y&x\\0&1\end{pmatrix},$$
so $J(x,y)=|\det T(x,y)|=|y|$. Set $D=\{(x,y):y\ne0\}$; then $D$ is a conegligible Borel set in $\mathbb R^2$, so that $\Pr((X,Y)\in D)=1$, and $\phi\restriction D$ is injective. Now $\phi[D]=D$ and $\phi^{-1}(u,v)=(u/v,v)$ for $v\ne0$. So $\phi(X,Y)=(X\times Y,Y)$ has a density function $g$ (271J, with $D_0=D$ and $D_k=\emptyset$ for $k\ge1$), where
$$g(u,v)=\frac{f(u/v,v)}{|v|}\quad\text{if }v\ne0.$$
To find a density function for $X\times Y$, we calculate
$$\Pr(X\times Y\le a)=\int_{]-\infty,a]\times\mathbb R}g=\int_{-\infty}^a\int_{-\infty}^\infty g(u,v)\,dv\,du=\int_{-\infty}^ah$$
by Fubini's theorem (252B, 252C). In particular, $h$ is defined and finite almost everywhere; and by 271Ib it is a density function for $X\times Y$.

*271L When a random variable is presented as the limit of a sequence of random variables the following can be very useful.

Proposition Let $\langle X_n\rangle_{n\in\mathbb N}$ be a sequence of real-valued random variables converging in measure to a random variable $X$ (definition: 245A). Writing $F_{X_n}$, $F_X$ for the distribution functions of $X_n$, $X$ respectively,
$$F_X(a)=\inf_{b>a}\liminf_{n\to\infty}F_{X_n}(b)=\inf_{b>a}\limsup_{n\to\infty}F_{X_n}(b)$$
for every $a\in\mathbb R$.

proof Set $\gamma=\inf_{b>a}\liminf_{n\to\infty}F_{X_n}(b)$, $\gamma'=\inf_{b>a}\limsup_{n\to\infty}F_{X_n}(b)$.

(a) $F_X(a)\le\gamma$. ▶ Take any $b>a$ and $\epsilon>0$. Then there is an $n_0\in\mathbb N$ such that $\Pr(|X_n-X|\ge b-a)\le\epsilon$ for every $n\ge n_0$ (245F). Now, for $n\ge n_0$,
$$F_X(a)=\Pr(X\le a)\le\Pr(X_n\le b)+\Pr(X_n-X\ge b-a)\le F_{X_n}(b)+\epsilon.$$
So $F_X(a)\le\liminf_{n\to\infty}F_{X_n}(b)+\epsilon$; as $\epsilon$ is arbitrary, $F_X(a)\le\liminf_{n\to\infty}F_{X_n}(b)$; as $b$ is arbitrary, $F_X(a)\le\gamma$. ◀

(b) $\gamma'\le F_X(a)$. ▶ Let $\epsilon>0$. Then there is a $\delta>0$ such that $F_X(a+2\delta)\le F_X(a)+\epsilon$ (271Ga). Next, there is an $n_0\in\mathbb N$ such that $\Pr(|X_n-X|\ge\delta)\le\epsilon$ for every $n\ge n_0$. In this case, for $n\ge n_0$,
$$F_{X_n}(a+\delta)=\Pr(X_n\le a+\delta)\le\Pr(X\le a+2\delta)+\Pr(X-X_n\ge\delta)\le F_X(a+2\delta)+\epsilon\le F_X(a)+2\epsilon.$$
Accordingly $\gamma'\le\limsup_{n\to\infty}F_{X_n}(a+\delta)\le F_X(a)+2\epsilon$. As $\epsilon$ is arbitrary, $\gamma'\le F_X(a)$. ◀

(c) Since of course $\gamma\le\gamma'$, we must have $F_X(a)=\gamma=\gamma'$, as claimed.
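A small worked example, added as an illustration rather than taken from the argument above, shows why 271L takes an infimum over $b>a$ instead of evaluating the distribution functions at $a$ itself. It assumes the constant random variables $X=0$ and $X_n=1/(n+1)$, which certainly converge in measure.

```latex
% Assumed example: X = 0 and X_n = 1/(n+1) for every n, so X_n -> X in measure.
F_{X_n}(b)=\begin{cases}0, & b<\tfrac{1}{n+1},\\ 1, & b\ge\tfrac{1}{n+1},\end{cases}
\qquad
F_X(b)=\begin{cases}0, & b<0,\\ 1, & b\ge 0.\end{cases}

% At a = 0 the distribution functions do not converge to F_X(0):
F_{X_n}(0)=0\ \text{for every } n, \qquad F_X(0)=1.

% But for each b > 0 we have F_{X_n}(b) = 1 as soon as n+1 \ge 1/b, so
\inf_{b>0}\,\liminf_{n\to\infty}F_{X_n}(b)=\inf_{b>0}\,\limsup_{n\to\infty}F_{X_n}(b)=1=F_X(0).
```

So convergence of $F_{X_n}(a)$ to $F_X(a)$ can fail at a jump of $F_X$, while the formula of 271L still recovers $F_X(a)$.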
271X Basic exercises >(a) Let $X$ be a real-valued random variable with finite expectation, and $\epsilon>0$. Show that
$$\Pr(|X-\mathbb E(X)|\ge\epsilon)\le\frac1{\epsilon^2}\operatorname{Var}(X).$$
(This is Chebyshev's inequality.)
>(b) Let $F:\mathbb R\to[0,1]$ be a non-decreasing function such that (i) $\lim_{a\to-\infty}F(a)=0$, (ii) $\lim_{a\to\infty}F(a)=1$, (iii) $\lim_{x\downarrow a}F(x)=F(a)$ for every $a\in\mathbb R$. Show that there is a unique Radon probability measure $\nu$ on $\mathbb R$ such that $F(a)=\nu\,]-\infty,a]$ for every $a\in\mathbb R$. (Hint: look at 114Xa.) Hence show that $F$ is the distribution function of some random variable.

>(c) Let $X$ be a real-valued random variable with a density function $f$. (i) Show that $|X|$ has a density function $g_1$ where $g_1(x)=f(x)+f(-x)$ whenever $x\ge0$ and $f(x)$, $f(-x)$ are both defined, $0$ otherwise. (ii) Show that $X^2$ has a density function $g_2$ where $g_2(x)=(f(\sqrt x)+f(-\sqrt x))/2\sqrt x$ whenever $x>0$ and this is defined, $0$ for other $x$. (iii) Show that if $\Pr(X=0)=0$ then $1/X$ has a density function $g_3$ where $g_3(x)=\frac1{x^2}f(\frac1x)$ whenever this is defined. (iv) Show that if $\Pr(X<0)=0$ then $\sqrt X$ has a density function $g_4$ where $g_4(x)=2xf(x^2)$ if $x\ge0$ and $f(x^2)$ is defined, $0$ otherwise.

>(d) Let $X$ and $Y$ be random variables with a joint density function $f:\mathbb R^2\to\mathbb R$. Show that $X+Y$ has a density function $h$ where $h(u)=\int f(u-v,v)\,dv$ for almost every $u$.

(e) Let $X$, $Y$ be random variables with a joint density function $f:\mathbb R^2\to\mathbb R$. Show that $X/Y$ has a density function $h$ where $h(u)=\int|v|f(uv,v)\,dv$ for almost every $u$.

(f) Devise an alternative proof of 271K by using Fubini's theorem and one-dimensional substitutions to show that
$$\int_a^b\int_{-\infty}^{\infty}\frac1{|v|}f\Bigl(\frac uv,v\Bigr)\,dv\,du=\int_{\{(u,v):a\le uv\le b\}}f$$
whenever $a\le b$ in $\mathbb R$.

271Y Further exercises (a) Let $\mathfrak T$ be the topology of $\mathbb R^{\mathbb N}$ and $\mathcal B$ the σ-algebra of Borel sets (256Ye). (i) Let $\mathcal I$ be the family of sets of the form $\{x:x\in\mathbb R^{\mathbb N},\ x(i)\le\alpha_i\ \forall\,i\le n\}$, where $n\in\mathbb N$ and $\alpha_i\in\mathbb R$ for each $i\le n$. Show that $\mathcal B$ is the smallest family $\mathcal E$ of subsets of $\mathbb R^{\mathbb N}$ such that (α) $\mathcal I\subseteq\mathcal E$, (β) $B\setminus A\in\mathcal E$ whenever $A,B\in\mathcal E$ and $A\subseteq B$, (γ) $\bigcup_{k\in\mathbb N}A_k\in\mathcal E$ for every non-decreasing sequence $\langle A_k\rangle_{k\in\mathbb N}$ in $\mathcal E$. (ii) Show that if $\mu$, $\mu'$ are two totally finite measures defined on $\mathbb R^{\mathbb N}$, and $\mu F$ and $\mu'F$ are defined and equal for every $F\in\mathcal I$, then $\mu E$ and $\mu'E$ are defined and equal for every $E\in\mathcal B$. (iii) Show that if $\Omega$ is any set and $\Sigma$ any σ-algebra of subsets of $\Omega$ and $\mathbf X:\Omega\to\mathbb R^{\mathbb N}$ is any function, then $\mathbf X^{-1}[E]\in\Sigma$ for every $E\in\mathcal B$ iff $\pi_i\mathbf X$ is $\Sigma$-measurable for every $i\in\mathbb N$, where $\pi_i(x)=x(i)$ for each $x\in\mathbb R^{\mathbb N}$, $i\in\mathbb N$. (iv) Show that if $\mathbf X=\langle X_i\rangle_{i\in\mathbb N}$ is a sequence of real-valued random variables on a probability space $(\Omega,\Sigma,\mu)$, then there is a unique probability measure $\nu^{\mathcal B}_{\mathbf X}$, with domain $\mathcal B$, such that $\nu^{\mathcal B}_{\mathbf X}\{x:x(i)\le\alpha_i\ \forall\,i\le n\}=\Pr(X_i\le\alpha_i\ \forall\,i\le n)$ for every $\alpha_0,\dots,\alpha_n\in\mathbb R$. (v) Under the conditions of (iv), show that there is a unique Radon measure $\nu_{\mathbf X}$ on $\mathbb R^{\mathbb N}$ (in the sense of 256Ye) such that $\nu_{\mathbf X}\{x:x(i)\le\alpha_i\ \forall\,i\le n\}=\Pr(X_i\le\alpha_i\ \forall\,i\le n)$ for every $\alpha_0,\dots,\alpha_n\in\mathbb R$.

(b) Let $F:\mathbb R^2\to[0,1]$ be a function. Show that the following are equiveridical: (i) $F$ is the distribution function of some pair $(X_1,X_2)$ of random variables; (ii) there is a probability measure $\nu$ on $\mathbb R^2$ such that $\nu\,]-\infty,a]=F(a)$ for every $a\in\mathbb R^2$; (iii)(α) $F(\alpha_1,\alpha_2)+F(\beta_1,\beta_2)\ge F(\alpha_1,\beta_2)+F(\beta_1,\alpha_2)$ whenever $\alpha_1\le\beta_1$ and $\alpha_2\le\beta_2$, (β) $F(\alpha_1,\alpha_2)=\lim_{\xi_1\downarrow\alpha_1,\xi_2\downarrow\alpha_2}F(\xi_1,\xi_2)$ for all $\alpha_1$, $\alpha_2$, (γ) $\lim_{\alpha\to-\infty}F(\alpha,\beta)=\lim_{\alpha\to-\infty}F(\beta,\alpha)=0$ for all $\beta$, (δ) $\lim_{\alpha\to\infty}F(\alpha,\alpha)=1$. (Hint: for non-empty half-open intervals $]a,b]$, where $a=(\alpha_1,\alpha_2)$ and $b=(\beta_1,\beta_2)$, set $\lambda\,]a,b]=F(\alpha_1,\alpha_2)+F(\beta_1,\beta_2)-F(\alpha_1,\beta_2)-F(\beta_1,\alpha_2)$, and continue as in 115B-115F.)
(c) Generalize (b) to higher dimensions, finding a suitable formula to stand in place of that in (iii-α) of (b).

(d) Let $(\Omega,\Sigma,\mu)$ be a probability space and $\mathcal F$ a filter on $L^0(\mu)$ converging to $X_0\in L^0(\mu)$ for the topology of convergence in measure. Show that, writing $F_X$ for the distribution function of $X\in L^0(\mu)$,
$$F_{X_0}(a)=\inf_{b>a}\liminf_{X\to\mathcal F}F_X(b)=\inf_{b>a}\limsup_{X\to\mathcal F}F_X(b)$$
for every $a\in\mathbb R$.

271 Notes and comments Most of this section seems to have been taken up with technicalities. This is perhaps unsurprising in view of the fact that it is devoted to the relationship between a vector random variable $\mathbf X$ and the associated distribution $\nu_{\mathbf X}$, and this necessarily leads us into the minefield which I attempted to chart in §235. Indeed, I call on results from §235 twice: once in 271E, with $\phi(\omega)=\mathbf X(\omega)$ and $J(\omega)=1$, and once in 271I, with $\phi(x)=x$ and $J(x)=f(x)$. Distribution functions of one-dimensional random variables are easily characterized (271Xb); in higher dimensions we have to work harder (271Yb-271Yc). Distributions, rather than distribution functions, can be described for infinite sequences of random variables (271Ya); indeed, these ideas can be extended to uncountable families, but this requires proper topological measure theory, and belongs in Volume 4.

The statement of 271J is lengthy, not to say cumbersome. The point is that many of the most important transformations $\phi$ are not themselves injective, but can easily be dissected into injective fragments (see, for instance, 271Xc and 263Xd). The point of 271K is that we frequently wish to apply the ideas here to transformations which are singular, and indeed change the dimension of the random variable. I have not given the theorems which make such applications routine; I suggest rather that you seek out tricks such as that used in the proof of 271K, which in any case are necessary if you want amenable formulae. Of course other methods are available (271Xf).
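As a numerical sanity check on the formula of 271K (not part of the original text), one can take, purely as an assumed example, $X$ and $Y$ independent and uniformly distributed on $]0,1[$, so that the joint density is $f=\chi\,]0,1[^2$ and $h(u)=\int_u^1dv/v=-\ln u$ for $0<u<1$. The helper `product_density` is a hypothetical name; it approximates $\int\frac1{|v|}f(u/v,v)\,dv$ by the midpoint rule over a bounded range of $v$, which is adequate only when $f$ vanishes outside that range.

```python
import math


def product_density(f, u, v_lo, v_hi, steps=200_000):
    """Midpoint-rule approximation of h(u) = integral of f(u/v, v) / |v| dv over [v_lo, v_hi].

    Assumes the joint density f vanishes when v lies outside [v_lo, v_hi].
    With v_lo = 0 every midpoint is strictly positive, so u / v is always defined.
    """
    width = (v_hi - v_lo) / steps
    total = 0.0
    for k in range(steps):
        v = v_lo + (k + 0.5) * width
        total += f(u / v, v) / abs(v)
    return total * width


def uniform_square(x, y):
    """Joint density of two independent uniform(0, 1) variables (assumed example)."""
    return 1.0 if 0.0 < x < 1.0 and 0.0 < y < 1.0 else 0.0


if __name__ == "__main__":
    for u in (0.1, 0.3, 0.7):
        approx = product_density(uniform_square, u, 0.0, 1.0)
        print(f"u={u}: formula {approx:.5f}, exact -ln(u) {-math.log(u):.5f}")
```

The quadrature is deliberately naive; the point is only that the integrand $f(u/v,v)/|v|$ of 271K reproduces the known density of the product in a case where it can be computed by hand.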
272 Independence

I introduce the concept of 'independence' for families of events, σ-algebras and random variables. The first part of the section, down to 272G, amounts to an analysis of the elementary relationships between the three manifestations of the idea. In 272G I give the fundamental result that the joint distribution of a (finite) independent family of random variables is just the product of the individual distributions. Further expressions of the connexion between independence and product measures are in 272J, 272M and 272N. I give a version of the zero-one law (272O), and I end the section with a group of basic results from probability theory concerning sums and products of independent random variables (272Q-272U).

272A Definitions Let $(\Omega,\Sigma,\mu)$ be a probability space.

(a) A family $\langle E_i\rangle_{i\in I}$ in $\Sigma$ is (stochastically) independent if
$$\mu(E_{i_1}\cap E_{i_2}\cap\dots\cap E_{i_n})=\prod_{j=1}^n\mu E_{i_j}$$
whenever $i_1,\dots,i_n$ are distinct members of $I$.

(b) A family $\langle\Sigma_i\rangle_{i\in I}$ of σ-subalgebras of $\Sigma$ is (stochastically) independent if
$$\mu(E_1\cap E_2\cap\dots\cap E_n)=\prod_{j=1}^n\mu E_j$$
whenever $i_1,\dots,i_n$ are distinct members of $I$ and $E_j\in\Sigma_{i_j}$ for every $j\le n$.

(c) A family $\langle X_i\rangle_{i\in I}$ of real-valued random variables on $\Omega$ is (stochastically) independent if
$$\Pr(X_{i_j}\le\alpha_j\text{ for every }j\le n)=\prod_{j=1}^n\Pr(X_{i_j}\le\alpha_j)$$
whenever $i_1,\dots,i_n$ are distinct members of $I$ and $\alpha_1,\dots,\alpha_n\in\mathbb R$.

272B Remarks (a) This is perhaps the central contribution of probability theory to measure theory, and as such deserves the most careful scrutiny. The idea of 'independence' comes from outside mathematics altogether, in the notion of events which have independent causes. I suppose that 272G and 272M are the results below which most clearly show the measure-theoretic aspects of the concept. It is not an accident that both involve product measures; one of the wonders of measure theory is the fact that the same technical devices are used in establishing the probability theory of stochastic independence and the geometry of multi-dimensional volume.
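Definition 272Aa asks for the product rule on every finite subfamily, not just on pairs. The sketch below checks the definition by brute force; it assumes, as an illustrative simplification not made in the text, a finite sample space with equally likely points, and `prob` and `is_independent` are hypothetical helper names. The classical example of two fair coin tosses with the events 'first toss heads', 'second toss heads' and 'the tosses agree' is pairwise independent but not independent. The same helper also confirms, for one small family, that passing to complements preserves independence, a fact used in the proof of 272F.

```python
from fractions import Fraction
from itertools import combinations, product


def prob(event, omega):
    """Uniform probability of `event` (a subset of the finite sample space `omega`)."""
    return Fraction(len(event & omega), len(omega))


def is_independent(events, omega, max_size=None):
    """Test the product rule of 272Aa on every subfamily of size 2..max_size.

    max_size=None tests all subfamilies (full independence);
    max_size=2 tests only pairs (pairwise independence).
    Subfamilies are taken by position, so the indices are automatically distinct.
    """
    top = len(events) if max_size is None else max_size
    for n in range(2, top + 1):
        for family in combinations(events, n):
            inter = set(omega)
            rhs = Fraction(1)
            for e in family:
                inter &= e
                rhs *= prob(e, omega)
            if prob(inter, omega) != rhs:
                return False
    return True


# Two fair coin tosses, all four outcomes equally likely.
omega = set(product("HT", repeat=2))
first_heads = {w for w in omega if w[0] == "H"}
second_heads = {w for w in omega if w[1] == "H"}
agree = {w for w in omega if w[0] == w[1]}
coins = [first_heads, second_heads, agree]

print(is_independent(coins, omega, max_size=2))  # True: pairwise independent
print(is_independent(coins, omega))              # False: mu of the triple meet is 1/4, not 1/8

# One fair die: E1 = {0, 1, 2} and E2 = {0, 3} are independent,
# and so are their complements.
die = set(range(6))
e1, e2 = {0, 1, 2}, {0, 3}
print(is_independent([e1, e2], die), is_independent([die - e1, die - e2], die))  # True True
```

Exact rational arithmetic via `Fraction` is used so that the product rule is tested with equality rather than a floating-point tolerance.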
(b) In the following paragraphs I will try to describe some relationships between the three notions of independence just defined. But it is worth noting at once the fact that, in all three cases, a family is independent iff all its finite subfamilies are independent. Consequently any subfamily of an independent family is independent. Another elementary fact which is immediate from the definitions is that if $\langle\Sigma_i\rangle_{i\in I}$ is an independent family of σ-algebras, and $\Sigma'_i$ is a σ-subalgebra of $\Sigma_i$ for each $i$, then $\langle\Sigma'_i\rangle_{i\in I}$ is an independent family.

(c) A useful reformulation of 272Ab is the following: a family $\langle\Sigma_i\rangle_{i\in I}$ of σ-subalgebras of $\Sigma$ is independent iff
$$\mu\Bigl(\bigcap_{i\in I}E_i\Bigr)=\prod_{i\in I}\mu E_i$$
whenever $E_i\in\Sigma_i$ for every $i$ and $\{i:E_i\ne\Omega\}$ is finite. (Here I follow the convention of 254F, saying that for a family $\langle\alpha_i\rangle_{i\in I}$ in $[0,1]$ we take $\prod_{i\in I}\alpha_i=1$ if $I=\emptyset$, and otherwise it is to be $\inf_{J\subseteq I,\,J\text{ finite}}\prod_{i\in J}\alpha_i$.)

(d) In 272Aa-b I speak of sets $E_i\in\Sigma$ and algebras $\Sigma_i\subseteq\Sigma$. In fact (272Ac already gives a hint of this) we shall more often than not be concerned with $\hat\Sigma$ rather than with $\Sigma$, if there is a difference, where $(\Omega,\hat\Sigma,\hat\mu)$ is the completion of $(\Omega,\Sigma,\mu)$.

272C The σ-subalgebra defined by a random variable To relate 272Ab to 272Ac we need the following notion. Let $(\Omega,\Sigma,\mu)$ be a probability space and $X$ a real-valued random variable defined on $\Omega$. Write $\mathcal B$ for the σ-algebra of Borel subsets of $\mathbb R$, and
$$\Sigma_X=\{X^{-1}[F]:F\in\mathcal B\}\cup\{(\Omega\setminus\operatorname{dom}X)\cup X^{-1}[F]:F\in\mathcal B\}.$$
Then $\Sigma_X$ is a σ-algebra of subsets of $\Omega$. ▶ $\emptyset=X^{-1}[\emptyset]\in\Sigma_X$; if $F\in\mathcal B$ then
$$\Omega\setminus X^{-1}[F]=(\Omega\setminus\operatorname{dom}X)\cup X^{-1}[\mathbb R\setminus F]\in\Sigma_X,\qquad\Omega\setminus\bigl((\Omega\setminus\operatorname{dom}X)\cup X^{-1}[F]\bigr)=X^{-1}[\mathbb R\setminus F]\in\Sigma_X;$$
if $\langle F_k\rangle_{k\in\mathbb N}$ is any sequence in $\mathcal B$ then
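$$\bigcup_{k\in\mathbb N}X^{-1}[F_k]=X^{-1}\Bigl[\bigcup_{k\in\mathbb N}F_k\Bigr],$$
so
$$\bigcup_{k\in\mathbb N}X^{-1}[F_k],\qquad(\Omega\setminus\operatorname{dom}X)\cup\bigcup_{k\in\mathbb N}X^{-1}[F_k]$$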
belong to $\Sigma_X$. ◀

Evidently $\Sigma_X$ is the smallest σ-algebra of subsets of $\Omega$, containing $\operatorname{dom}X$, for which $X$ is measurable. Also $\Sigma_X$ is a subalgebra of $\hat\Sigma$, where $\hat\Sigma$ is the domain of the completion of $\mu$ (271Aa).

Now we have the following result.

272D Proposition Let $(\Omega,\Sigma,\mu)$ be a probability space and $\langle X_i\rangle_{i\in I}$ a family of real-valued random variables on $\Omega$. For each $i\in I$, let $\Sigma_i$ be the σ-algebra defined by $X_i$, as in 272C. Then the following are equiveridical:
(i) $\langle X_i\rangle_{i\in I}$ is independent;
(ii) whenever $i_1,\dots,i_n$ are distinct members of $I$ and $F_1,\dots,F_n$ are Borel subsets of $\mathbb R$, then
$$\Pr(X_{i_j}\in F_j\text{ for every }j\le n)=\prod_{j=1}^n\Pr(X_{i_j}\in F_j);$$
(iii) whenever $\langle F_i\rangle_{i\in I}$ is a family of Borel subsets of $\mathbb R$, and $\{i:F_i\ne\mathbb R\}$ is finite, then
$$\hat\mu\Bigl(\bigcap_{i\in I}\bigl(X_i^{-1}[F_i]\cup(\Omega\setminus\operatorname{dom}X_i)\bigr)\Bigr)=\prod_{i\in I}\Pr(X_i\in F_i),$$
where $\hat\mu$ is the completion of $\mu$;
(iv) $\langle\Sigma_i\rangle_{i\in I}$ is independent.

proof (a)(i)⇒(ii) Write $\mathbf X=(X_{i_1},\dots,X_{i_n})$. Write $\nu_{\mathbf X}$ for the joint distribution of $\mathbf X$, and for each $j\le n$ write $\nu_j$ for the distribution of $X_{i_j}$; let $\nu$ be the product of $\nu_1,\dots,\nu_n$ as described in 254A-254C. (I wrote §254 out as for infinite products. If you are interested only in finite products of probability spaces, which are adequate for our needs in this paragraph, I recommend reading §§251-252 with the mental proviso that all measures are probabilities, and then §254 with the proviso that the set $I$ is finite.) By 256K, $\nu$ is a Radon measure on $\mathbb R^n$. (This is an induction on $n$, relying on 254N for assurance that we can regard $\nu$ as the repeated product $(\dots((\nu_1\times\nu_2)\times\nu_3)\times\dots\times\nu_{n-1})\times\nu_n$.) Then for any $a=(\alpha_1,\dots,\alpha_n)\in\mathbb R^n$, we have
$$\begin{aligned}
\nu\,]-\infty,a]&=\nu\Bigl(\prod_{j=1}^n\,]-\infty,\alpha_j]\Bigr)=\prod_{j=1}^n\nu_j\,]-\infty,\alpha_j]&&\text{(using 254Fb)}\\
&=\prod_{j=1}^n\Pr(X_{i_j}\le\alpha_j)=\Pr(X_{i_j}\le\alpha_j\text{ for every }j\le n)&&\text{(using condition (i))}\\
&=\nu_{\mathbf X}\,]-\infty,a].
\end{aligned}$$
By the uniqueness assertion in 271Ba, $\nu=\nu_{\mathbf X}$. In particular, if $F_1,\dots,F_n$ are Borel subsets of $\mathbb R$,
$$\begin{aligned}
\Pr(X_{i_j}\in F_j\text{ for every }j\le n)&=\Pr\Bigl(\mathbf X\in\prod_{j\le n}F_j\Bigr)=\nu_{\mathbf X}\Bigl(\prod_{j\le n}F_j\Bigr)\\
&=\nu\Bigl(\prod_{j\le n}F_j\Bigr)=\prod_{j=1}^n\nu_jF_j=\prod_{j=1}^n\Pr(X_{i_j}\in F_j),
\end{aligned}$$
as required.

(b)(ii)⇒(i) is trivial, if we recall that all sets $]-\infty,\alpha]$ are Borel sets, so that the definition of independence given in 272Ac is just a special case of (ii).

(c)(ii)⇒(iv) Assume (ii), and suppose that $i_1,\dots,i_n$ are distinct members of $I$ and $E_j\in\Sigma_{i_j}$ for each $j\le n$. For each $j$, set $E'_j=E_j\cap\operatorname{dom}X_{i_j}$, so that $E'_j$ may be expressed as $X_{i_j}^{-1}[F_j]$ for some Borel set $F_j\subseteq\mathbb R$. Then $\hat\mu(E_j\setminus E'_j)=0$ for each $j$, because $X_{i_j}$ is defined almost everywhere, so
$$\hat\mu\Bigl(\bigcap_{1\le j\le n}E_j\Bigr)=\hat\mu\Bigl(\bigcap_{1\le j\le n}E'_j\Bigr)=\Pr(X_{i_1}\in F_1,\dots,X_{i_n}\in F_n)=\prod_{j=1}^n\Pr(X_{i_j}\in F_j)=\prod_{j=1}^n\hat\mu E_j,$$
using (ii) for the third equality. As $E_1,\dots,E_n$ are arbitrary, $\langle\Sigma_i\rangle_{i\in I}$ is independent.

(d)(iv)⇒(ii) Now suppose that $\langle\Sigma_i\rangle_{i\in I}$ is independent. If $i_1,\dots,i_n$ are distinct members of $I$ and $F_1,\dots,F_n$ are Borel sets in $\mathbb R$, then $X_{i_j}^{-1}[F_j]\in\Sigma_{i_j}$ for each $j$, so
$$\Pr(X_{i_1}\in F_1,\dots,X_{i_n}\in F_n)=\hat\mu\Bigl(\bigcap_{1\le j\le n}X_{i_j}^{-1}[F_j]\Bigr)=\prod_{j=1}^n\hat\mu X_{i_j}^{-1}[F_j]=\prod_{j=1}^n\Pr(X_{i_j}\in F_j).$$

(e) Finally, observe that (iii) is nothing but a reformulation of (ii), because if $F_i=\mathbb R$ then $\Pr(X_i\in F_i)=1$ and $X_i^{-1}[F_i]\cup(\Omega\setminus\operatorname{dom}X_i)=\Omega$.
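For random variables taking finitely many values, the step (i)⇒(ii) of 272D can be seen concretely: the product rule for the sets $]-\infty,\alpha]$ already forces it for every set of values. The sketch below assumes a joint probability mass function stored as a dictionary on a finite grid (a hypothetical representation chosen only for illustration; the function names are likewise hypothetical) and tests both conditions exhaustively, once for a product of marginals and once for a dependent pair.

```python
from fractions import Fraction
from itertools import chain, combinations


def marginals(joint):
    """Marginal mass functions of a joint pmf {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return px, py


def subsets(values):
    """All subsets of a finite set of values, as tuples."""
    vals = sorted(values)
    return chain.from_iterable(combinations(vals, r) for r in range(len(vals) + 1))


def cdf_product_rule(joint):
    """272Ac: Pr(X <= a, Y <= b) = Pr(X <= a) Pr(Y <= b) at every pair of support points."""
    px, py = marginals(joint)
    for a in px:
        for b in py:
            both = sum(p for (x, y), p in joint.items() if x <= a and y <= b)
            fx = sum(p for x, p in px.items() if x <= a)
            fy = sum(p for y, p in py.items() if y <= b)
            if both != fx * fy:
                return False
    return True


def set_product_rule(joint):
    """272D(ii): Pr(X in F, Y in G) = Pr(X in F) Pr(Y in G) for all sets F, G of values."""
    px, py = marginals(joint)
    for f in subsets(px):
        for g in subsets(py):
            both = sum(p for (x, y), p in joint.items() if x in f and y in g)
            if both != sum(px[x] for x in f) * sum(py[y] for y in g):
                return False
    return True


third = Fraction(1, 3)
indep = {(x, y): third * p for x in (0, 1, 2) for y, p in ((0, Fraction(1, 4)), (5, Fraction(3, 4)))}
dep = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

print(cdf_product_rule(indep), set_product_rule(indep))  # True True
print(cdf_product_rule(dep), set_product_rule(dep))      # False False
```

Looping over all subsets is exponential in the number of values, so this is a check on small examples, not a practical test of independence.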
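272E Corollary Let $\langle X_i\rangle_{i\in I}$ be an independent family of real-valued random variables, and $\langle h_i\rangle_{i\in I}$ any family of Borel measurable functions from $\mathbb R$ to $\mathbb R$. Then $\langle h_i(X_i)\rangle_{i\in I}$ is independent.

proof Write $\Sigma_i$ for the σ-algebra defined by $X_i$ and $\Sigma'_i$ for the σ-algebra defined by $h_i(X_i)$, as in 272C. Since $h_i(X_i)$ is $\Sigma_i$-measurable (121Eg) and has the same domain as $X_i$, $\Sigma'_i\subseteq\Sigma_i$ for every $i$. Now $\langle\Sigma_i\rangle_{i\in I}$ is independent by 272D, so $\langle\Sigma'_i\rangle_{i\in I}$ is independent, as in 272Bb, and $\langle h_i(X_i)\rangle_{i\in I}$ is independent by 272D again.

272F Similarly, we can relate the definition in 272Aa to the others.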
Proposition Let $(\Omega,\Sigma,\mu)$ be a probability space, and $\langle E_i\rangle_{i\in I}$ a family in $\Sigma$. Set $\Sigma_i=\{\emptyset,E_i,\Omega\setminus E_i,\Omega\}$, the (σ-)algebra of subsets of $\Omega$ generated by $E_i$, and $X_i=\chi E_i$, the characteristic function of $E_i$. Then the following are equiveridical:
(i) $\langle E_i\rangle_{i\in I}$ is independent;
(ii) $\langle\Sigma_i\rangle_{i\in I}$ is independent;
(iii) $\langle X_i\rangle_{i\in I}$ is independent.

proof (i)⇒(iii) If $i_1,\dots,i_n$ are distinct members of $I$ and $\alpha_1,\dots,\alpha_n\in\mathbb R$, then for each $j\le n$ the set $G_j=\{\omega:X_{i_j}(\omega)\le\alpha_j\}$ is $\emptyset$ if $\alpha_j<0$, $\Omega\setminus E_{i_j}$ if $0\le\alpha_j<1$, and $\Omega$ if $\alpha_j\ge1$. If any $G_j$ is empty, then
$$\Pr(X_{i_j}\le\alpha_j\text{ for every }j\le n)=0=\prod_{j=1}^n\Pr(X_{i_j}\le\alpha_j).$$
Otherwise, set $K=\{j:G_j=\Omega\setminus E_{i_j}\}$; then, expanding by inclusion-exclusion and applying (i) to each subfamily $\langle E_{i_j}\rangle_{j\in L}$,
$$\begin{aligned}
\Pr(X_{i_j}\le\alpha_j\text{ for every }j\le n)&=\mu\Bigl(\bigcap_{j\le n}G_j\Bigr)=\mu\Bigl(\bigcap_{j\in K}(\Omega\setminus E_{i_j})\Bigr)=\sum_{L\subseteq K}(-1)^{\#(L)}\mu\Bigl(\bigcap_{j\in L}E_{i_j}\Bigr)\\
&=\sum_{L\subseteq K}(-1)^{\#(L)}\prod_{j\in L}\mu E_{i_j}=\prod_{j\in K}(1-\mu E_{i_j})=\prod_{j=1}^n\Pr(X_{i_j}\le\alpha_j)
\end{aligned}$$
(interpreting $\bigcap_{j\in\emptyset}E_{i_j}$ as $\Omega$). As $i_1,\dots,i_n$ and $\alpha_1,\dots,\alpha_n$ are arbitrary, $\langle X_i\rangle_{i\in I}$ is independent.

(iii)⇒(ii) follows from (i)⇒(iv) of 272D, because $\Sigma_i$ is the σ-algebra defined by $X_i$.

(ii)⇒(i) is trivial, because $E_i\in\Sigma_i$ for each $i$.

Remark You will, I hope, feel that while the theory of product measures might be appropriate to 272D, it is surely rather heavy machinery to use on what ought to be a simple combinatorial problem like (iii)⇒(ii) of this proposition. I suggest that you construct an 'elementary' proof, and examine which of the ideas of the theory of product measures (and the Monotone Class Theorem, 136B) are actually needed here.

272G Distributions of independent random variables I have not tried to describe the 'joint distribution' of an infinite family of random variables. (Indications of how to deal with a countable family are offered in 271Ya and 272Yg.) As, however, the independence of a family of random variables is determined by the behaviour of finite subfamilies, we can approach it through the following proposition.

Theorem Let $\mathbf X=(X_1,\dots,X_n)$ be a finite family of real-valued random variables on a probability space. Let $\nu_{\mathbf X}$ be the corresponding distribution on $\mathbb R^n$. Then the following are equiveridical:
(i) $X_1,\dots,X_n$ are independent;
(ii) $\nu_{\mathbf X}$ can be expressed as a product of $n$ probability measures $\nu_1,\dots,\nu_n$, one for each factor $\mathbb R$ of $\mathbb R^n$;
(iii) $\nu_{\mathbf X}$ is the product measure of $\nu_{X_1},\dots,\nu_{X_n}$, writing $\nu_{X_i}$ for the distribution of the random variable $X_i$.

proof (a)(i)⇒(iii) In the proof of (i)⇒(ii) of 272D above I showed that $\nu_{\mathbf X}$ is the product $\nu$ of $\nu_{X_1},\dots,\nu_{X_n}$.

(b)(iii)⇒(ii) is trivial.

(c)(ii)⇒(i) Suppose that $\nu_{\mathbf X}$ is expressible as a product $\nu_1\times\dots\times\nu_n$. Let $a=(\alpha_1,\dots,\alpha_n)\in\mathbb R^n$. Then
$$\Pr(X_i\le\alpha_i\ \forall\,i\le n)=\Pr(\mathbf X\in\,]-\infty,a])=\nu_{\mathbf X}(]-\infty,a])=\prod_{i=1}^n\nu_i\,]-\infty,\alpha_i].$$
On the other hand, setting $F_i=\{(\xi_1,\dots,\xi_n):\xi_i\le\alpha_i\}$, we must have
$$\nu_i\,]-\infty,\alpha_i]=\nu_{\mathbf X}F_i=\Pr(\mathbf X\in F_i)=\Pr(X_i\le\alpha_i)$$
for each $i$. So we get
$$\Pr(X_i\le\alpha_i\text{ for every }i\le n)=\prod_{i=1}^n\Pr(X_i\le\alpha_i),$$
as required.

272H Corollary Suppose that $\langle X_i\rangle_{i\in I}$ is an independent family of real-valued random variables on a probability space $(\Omega,\Sigma,\mu)$, and that for each $i\in I$ we are given another real-valued random variable $Y_i$ on $\Omega$ such that $Y_i=_{\text{a.e.}}X_i$. Then $\langle Y_i\rangle_{i\in I}$ is independent.

proof For any distinct $i_1,\dots,i_n\in I$, if we set $\mathbf X=(X_{i_1},\dots,X_{i_n})$ and $\mathbf Y=(Y_{i_1},\dots,Y_{i_n})$, then $\mathbf X=_{\text{a.e.}}\mathbf Y$, so $\nu_{\mathbf X}$ and $\nu_{\mathbf Y}$ are equal (271De). By 272G, $Y_{i_1},\dots,Y_{i_n}$ must be independent because $X_{i_1},\dots,X_{i_n}$ are. As $i_1,\dots,i_n$ are arbitrary, the whole family $\langle Y_i\rangle_{i\in I}$ is independent.

Remark It follows that we may speak of independent families in the space $L^0(\mu)$ of equivalence classes of random variables (241C), saying that $\langle X_i^\bullet\rangle_{i\in I}$ is independent iff $\langle X_i\rangle_{i\in I}$ is.

272I Corollary Suppose that $X_1,\dots,X_n$ are independent random variables with density functions $f_1,\dots,f_n$ (271H). Then $\mathbf X=(X_1,\dots,X_n)$ has a density function $f$ given by setting $f(x)=\prod_{i=1}^nf_i(\xi_i)$ whenever $x=(\xi_1,\dots,\xi_n)\in\prod_{i\le n}\operatorname{dom}(f_i)\subseteq\mathbb R^n$.

proof For $n=2$ this is covered by 256L; the general case follows by induction on $n$.

272J The most important theorems of the subject refer to independent families of random variables, rather than independent families of σ-algebras. The value of the concept of independent σ-algebras lies in such results as the following.

Proposition Let $(\Omega,\Sigma,\mu)$ be a complete probability space, and $\langle\Sigma_i\rangle_{i\in I}$ a family of σ-subalgebras of $\Sigma$. For each $i\in I$ let $\mu_i$ be the restriction of $\mu$ to $\Sigma_i$, and let $(\Omega^I,\Lambda,\lambda)$ be the product probability space of the family $\langle(\Omega,\Sigma_i,\mu_i)\rangle_{i\in I}$. Define $\phi:\Omega\to\Omega^I$ by setting $\phi(\omega)(i)=\omega$ for every $\omega\in\Omega$, $i\in I$. Then $\phi$ is inverse-measure-preserving iff $\langle\Sigma_i\rangle_{i\in I}$ is independent.

proof This is virtually a restatement of 254Fb and 254G.

(i) If $\phi$ is inverse-measure-preserving, and $i_1,\dots,i_n\in I$ are distinct and $E_j\in\Sigma_{i_j}$ for each $j$, then $\bigcap_{j\le n}E_j=\phi^{-1}[\{x:x(i_j)\in E_j\text{ for every }j\le n\}]$, so that
$$\mu\Bigl(\bigcap_{j\le n}E_j\Bigr)=\lambda\{x:x(i_j)\in E_j\text{ for every }j\le n\}=\prod_{j\le n}\mu_{i_j}E_j=\prod_{j\le n}\mu E_j.$$

(ii) If $\langle\Sigma_i\rangle_{i\in I}$ is independent, and $E_i\in\Sigma_i$ for every $i\in I$ and $\{i:E_i\ne\Omega\}$ is finite, then
$$\mu\phi^{-1}\Bigl[\prod_{i\in I}E_i\Bigr]=\mu\Bigl(\bigcap_{i\in I}E_i\Bigr)=\prod_{i\in I}\mu E_i=\prod_{i\in I}\mu_iE_i.$$
So the conditions of 254G are satisfied and $\mu\phi^{-1}[W]=\lambda W$ for every $W\in\Lambda$.

272K Proposition Let $(\Omega,\Sigma,\mu)$ be a probability space and $\langle\Sigma_i\rangle_{i\in I}$ an independent family of σ-subalgebras of $\Sigma$. Let $\langle J(s)\rangle_{s\in S}$ be a disjoint family of subsets of $I$, and for each $s\in S$ let $\tilde\Sigma_s$ be the σ-algebra of subsets of $\Omega$ generated by $\bigcup_{i\in J(s)}\Sigma_i$. Then $\langle\tilde\Sigma_s\rangle_{s\in S}$ is independent.

proof Let $(\Omega,\hat\Sigma,\hat\mu)$ be the completion of $(\Omega,\Sigma,\mu)$. On $\Omega^I$ let $\lambda$ be the product of the measures $\mu\restriction\Sigma_i$, and let $\phi:\Omega\to\Omega^I$ be the diagonal map, as in 272J. Then $\phi$ is inverse-measure-preserving for $\hat\mu$ and $\lambda$, by 272J. We can identify $\lambda$ with the product of $\langle\lambda_s\rangle_{s\in S}$, where for each $s\in S$ $\lambda_s$ is the product of $\langle\mu\restriction\Sigma_i\rangle_{i\in J(s)}$ (254N). For $s\in S$, let $\Lambda_s$ be the domain of $\lambda_s$, and set $\pi_s(x)=x\restriction J(s)$ for $x\in\Omega^I$, so that $\pi_s$ is inverse-measure-preserving for $\lambda$ and $\lambda_s$ (254Oa), and $\phi_s=\pi_s\phi$ is inverse-measure-preserving for $\hat\mu$ and $\lambda_s$; of course $\phi_s$ is the diagonal map from $\Omega$ to $\Omega^{J(s)}$. Set $\Sigma^*_s=\{\phi_s^{-1}[H]:H\in\Lambda_s\}$. Then $\Sigma^*_s$ is a σ-subalgebra of $\hat\Sigma$, and $\Sigma^*_s\supseteq\tilde\Sigma_s$, because
$$E=\phi_s^{-1}[\{x:x(i)\in E\}]\in\Sigma^*_s$$
for every $i\in J(s)$, $E\in\Sigma_i$.

Now suppose that $s_1,\dots,s_n\in S$ are distinct and that $E_j\in\tilde\Sigma_{s_j}$ for each $j$. Then $E_j\in\Sigma^*_{s_j}$, so there are $H_j\in\Lambda_{s_j}$ such that $E_j=\phi_{s_j}^{-1}[H_j]$ for each $j$. Set
$$W=\{x:x\in\Omega^I,\ x\restriction J(s_j)\in H_j\text{ for every }j\le n\}.$$
Because we can identify $\lambda$ with the product of the $\lambda_s$, we have
$$\lambda W=\prod_{j=1}^n\lambda_{s_j}H_j=\prod_{j=1}^n\hat\mu(\phi_{s_j}^{-1}[H_j])=\prod_{j=1}^n\hat\mu E_j=\prod_{j=1}^n\mu E_j.$$
On the other hand, $\phi^{-1}[W]=\bigcap_{j\le n}E_j$, so, because $\phi$ is inverse-measure-preserving,
$$\mu\Bigl(\bigcap_{j\le n}E_j\Bigr)=\hat\mu\Bigl(\bigcap_{j\le n}E_j\Bigr)=\lambda W=\prod_{j=1}^n\mu E_j.$$
As $E_1,\dots,E_n$ are arbitrary, $\langle\tilde\Sigma_s\rangle_{s\in S}$ is independent.

272L
I give a typical application of this result as a sample.

Corollary Let $X,X_1,\dots,X_n$ be independent random variables and $h:\mathbb R^n\to\mathbb R$ a Borel function. Then $X$ and $h(X_1,\dots,X_n)$ are independent.

proof Let $\Sigma_X$, $\Sigma_{X_i}$ be the σ-algebras defined by $X$, $X_i$ (272C). Then $\Sigma_X,\Sigma_{X_1},\dots,\Sigma_{X_n}$ are independent (272D). Let $\Sigma^*$ be the σ-algebra generated by $\Sigma_{X_1}\cup\dots\cup\Sigma_{X_n}$. Then 272K (perhaps working in the completion of the original probability space) tells us that $\Sigma_X$ and $\Sigma^*$ are independent. But every $X_j$ is $\Sigma^*$-measurable, so $Y=h(X_1,\dots,X_n)$ is $\Sigma^*$-measurable (121Kb); also $\operatorname{dom}Y\in\Sigma^*$, so $\Sigma_Y\subseteq\Sigma^*$ and $\Sigma_X$, $\Sigma_Y$ are independent. By 272D again, $X$ and $Y$ are independent, as claimed.

Remark Nearly all of us, when teaching elementary probability theory, would invite our students to treat this corollary (with an explicit function $h$, of course) as 'obvious'. In effect, the proof here is a confirmation that the formal definition of 'independence' offered is a faithful representation of our intuition of independent events having independent causes.

272M Products of probability spaces and independent families of random variables We have already seen that the concept of 'independent random variables' is intimately linked with that of 'product measure'. I now give some further manifestations of the connexion.

Proposition Let $\langle(\Omega_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of probability spaces, and $(\Omega,\Sigma,\mu)$ their product.

(a) For each $i\in I$ write $\tilde\Sigma_i=\{\pi_i^{-1}[E]:E\in\Sigma_i\}$, where $\pi_i:\Omega\to\Omega_i$ is the coordinate map. Then $\langle\tilde\Sigma_i\rangle_{i\in I}$ is an independent family of σ-subalgebras of $\Sigma$.

(b) For each $i\in I$ let $\langle X_{ij}\rangle_{j\in J(i)}$ be an independent family of real-valued random variables on $\Omega_i$, and for $i\in I$, $j\in J(i)$ write $\tilde X_{ij}(\omega)=X_{ij}(\omega(i))$ for those $\omega\in\Omega$ such that $\omega(i)\in\operatorname{dom}X_{ij}$. Then $\langle\tilde X_{ij}\rangle_{i\in I,j\in J(i)}$ is an independent family of random variables, and each $\tilde X_{ij}$ has the same distribution as the corresponding $X_{ij}$.

proof (a) It is easy to check that each $\tilde\Sigma_i$ is a σ-algebra of sets. The rest amounts just to recalling from 254Fb that if $J\subseteq I$ is finite and $E_i\in\Sigma_i$ for $i\in J$, then
$$\mu\Bigl(\bigcap_{i\in J}\pi_i^{-1}[E_i]\Bigr)=\mu\{\omega:\omega(i)\in E_i\text{ for every }i\in I\}=\prod_{i\in I}\mu_iE_i$$
if we set $E_i=\Omega_i$ for $i\in I\setminus J$.

(b) We know also that $(\Omega,\Sigma,\mu)$ is the product of the completions $(\Omega_i,\hat\Sigma_i,\hat\mu_i)$ (254I). From this, we see that each $\tilde X_{ij}$ is defined $\mu$-a.e., and is $\Sigma$-measurable, with the same distribution as $X_{ij}$. Now apply condition (iii) of 272D. Suppose that $\langle F_{ij}\rangle_{i\in I,j\in J(i)}$ is a family of Borel sets in $\mathbb R$, and that $\{(i,j):F_{ij}\ne\mathbb R\}$ is finite. Consider
$$E_i=\bigcap_{j\in J(i)}\bigl(X_{ij}^{-1}[F_{ij}]\cup(\Omega_i\setminus\operatorname{dom}X_{ij})\bigr),\qquad E=\prod_{i\in I}E_i=\bigcap_{i\in I,j\in J(i)}\bigl(\tilde X_{ij}^{-1}[F_{ij}]\cup(\Omega\setminus\operatorname{dom}\tilde X_{ij})\bigr).$$
Because each family $\langle X_{ij}\rangle_{j\in J(i)}$ is independent, and $\{j:F_{ij}\ne\mathbb R\}$ is finite,
$$\hat\mu_iE_i=\prod_{j\in J(i)}\Pr(X_{ij}\in F_{ij})$$
for each $i\in I$. Because $\{i:E_i\ne\Omega_i\}\subseteq\{i:\exists\,j\in J(i),\ F_{ij}\ne\mathbb R\}$
is finite,
$$\mu E=\prod_{i\in I}\hat\mu_iE_i=\prod_{i\in I,j\in J(i)}\Pr(\tilde X_{ij}\in F_{ij});$$
as $\langle F_{ij}\rangle_{i\in I,j\in J(i)}$ is arbitrary, $\langle\tilde X_{ij}\rangle_{i\in I,j\in J(i)}$ is independent.

Remark The formulation in (b) is more complicated than is necessary to express the idea, but is what is needed for an application below.

272N A special case of 272J is of particular importance in general measure theory, and is most useful in an adapted form.

Proposition Let $(\Omega,\Sigma,\mu)$ be a complete probability space, and $\langle E_i\rangle_{i\in I}$ an independent family in $\Sigma$ such that $\mu E_i=\frac12$ for every $i\in I$. Define $\phi:\Omega\to\{0,1\}^I$ by setting $\phi(\omega)(i)=1$ if $\omega\in E_i$, $0$ if $\omega\in\Omega\setminus E_i$. Then $\phi$ is inverse-measure-preserving for the usual measure $\lambda$ on $\{0,1\}^I$ (254J).

proof I use 254G again. For each $i\in I$ let $\Sigma_i$ be the algebra $\{\emptyset,E_i,\Omega\setminus E_i,\Omega\}$; then $\langle\Sigma_i\rangle_{i\in I}$ is independent (272F). For $i\in I$ set $\phi_i(\omega)=\phi(\omega)(i)$. Let $\nu$ be the usual measure on $\{0,1\}$. Then it is easy to check that
$$\mu\phi_i^{-1}[H]=\tfrac12\#(H)=\nu H$$
for every $H\subseteq\{0,1\}$. If $\langle H_i\rangle_{i\in I}$ is a family of subsets of $\{0,1\}$, and $\{i:H_i\ne\{0,1\}\}$ is finite, then
$$\mu\phi^{-1}\Bigl[\prod_{i\in I}H_i\Bigr]=\mu\Bigl(\bigcap_{i\in I}\phi_i^{-1}[H_i]\Bigr)=\prod_{i\in I}\mu\phi_i^{-1}[H_i]$$
(because $\phi_i^{-1}[H_i]\in\Sigma_i$ for each $i$, and $\langle\Sigma_i\rangle_{i\in I}$ is independent)
$$=\prod_{i\in I}\nu H_i=\lambda\Bigl(\prod_{i\in I}H_i\Bigr).$$
As $\langle H_i\rangle_{i\in I}$ is arbitrary, 254G gives the result.

272O Tail σ-algebras and the zero-one law I have never been able to make up my mind whether the following result is 'deep' or not. I think it is one of the many cases in mathematics where a theorem is surprising and exciting if one comes on it unprepared, but is natural and straightforward if one approaches it from the appropriate angle.

Proposition Let $(\Omega,\Sigma,\mu)$ be a probability space and $\langle\Sigma_n\rangle_{n\in\mathbb N}$ an independent sequence of σ-subalgebras of $\Sigma$. Let $\Sigma^*_n$ be the σ-algebra generated by $\bigcup_{m\ge n}\Sigma_m$ for each $n$, and set $\Sigma^*_\infty=\bigcap_{n\in\mathbb N}\Sigma^*_n$. Then $\mu E$ is either $0$ or $1$ for every $E\in\Sigma^*_\infty$.

proof For each $n$, the family $(\Sigma_0,\dots,\Sigma_n,\Sigma^*_{n+1})$ is independent, by 272K. So $(\Sigma_0,\dots,\Sigma_n,\Sigma^*_\infty)$ is independent, because $\Sigma^*_\infty\subseteq\Sigma^*_{n+1}$. But this means that every finite subfamily of $(\Sigma^*_\infty,\Sigma_0,\Sigma_1,\dots)$ is independent, and therefore that the whole family is (272Bb). Consequently $(\Sigma^*_\infty,\Sigma^*_0)$ must be independent, by 272K again. Now if $E\in\Sigma^*_\infty$, then $E$ also belongs to $\Sigma^*_0$, so we must have $\mu(E\cap E)=\mu E\cdot\mu E$, that is, $\mu E=(\mu E)^2$; so that $\mu E\in\{0,1\}$, as claimed.

272P To support the claim that somewhere we have achieved a non-trivial insight, I give a corollary, which will be fundamental to the understanding of the limit theorems in the next section, and does not seem to be obvious.

Corollary Let $(\Omega,\Sigma,\mu)$ be a probability space, and $\langle X_n\rangle_{n\in\mathbb N}$ an independent sequence of real-valued random variables on $\Omega$. Then
$$\limsup_{n\to\infty}\frac1{n+1}(X_0+\dots+X_n)$$
is almost everywhere constant; that is, there is some $u\in[-\infty,\infty]$ such that
$$\limsup_{n\to\infty}\frac1{n+1}(X_0+\dots+X_n)=_{\text{a.e.}}u.$$

proof We may suppose that each $X_n$ is $\Sigma$-measurable and defined everywhere in $\Omega$, because (as remarked in 272H) changing the $X_n$ on a negligible set does not affect their independence, and it affects $\limsup_{n\to\infty}\frac1{n+1}(X_0+\dots+X_n)$ only on a negligible set. For each $n$, let $\Sigma_n$ be the σ-algebra defined by $X_n$ (272C), and $\Sigma^*_n$ the σ-algebra generated by $\bigcup_{m\ge n}\Sigma_m$; set $\Sigma^*_\infty=\bigcap_{n\in\mathbb N}\Sigma^*_n$. By 272D, $\langle\Sigma_n\rangle_{n\in\mathbb N}$ is independent, so $\mu E\in\{0,1\}$ for every $E\in\Sigma^*_\infty$ (272O). Now take any $a\in\mathbb R$ and set
$$E_a=\Bigl\{\omega:\limsup_{m\to\infty}\frac1{m+1}(X_0(\omega)+\dots+X_m(\omega))\le a\Bigr\}.$$
Then, for every $n\in\mathbb N$,
$$\limsup_{m\to\infty}\frac1{m+1}(X_0+\dots+X_m)=\limsup_{m\to\infty}\frac1{m+1}(X_n+\dots+X_{m+n}),$$
so
$$E_a=\Bigl\{\omega:\limsup_{m\to\infty}\frac1{m+1}(X_n(\omega)+\dots+X_{n+m}(\omega))\le a\Bigr\}$$
belongs to $\Sigma^*_n$ for every $n$, because $X_i$ is $\Sigma^*_n$-measurable for every $i\ge n$. So $E_a\in\Sigma^*_\infty$ and
$$\Pr\Bigl(\limsup_{m\to\infty}\frac1{m+1}(X_0+\dots+X_m)\le a\Bigr)=\mu E_a$$
must be either $0$ or $1$. Setting $u=\sup\{a:a\in\mathbb R,\ \mu E_a=0\}$ (allowing $\sup\emptyset=-\infty$ and $\sup\mathbb R=\infty$, as usual in such contexts), we see that
$$\limsup_{n\to\infty}\frac1{n+1}(X_0+\dots+X_n)=u$$
almost everywhere.

272Q
I must now catch up on some basic facts from elementary probability theory.

Proposition Let $X$, $Y$ be independent real-valued random variables with finite expectation (271Ab). Then $\mathbb E(X\times Y)$ exists and is equal to $\mathbb E(X)\mathbb E(Y)$.

proof Let $\nu_{(X,Y)}$ be the joint distribution of the pair $(X,Y)$. Then $\nu_{(X,Y)}$ is the product of the distributions $\nu_X$ and $\nu_Y$ (272G). Also $\int x\,\nu_X(dx)=\mathbb E(X)$ and $\int y\,\nu_Y(dy)=\mathbb E(Y)$ exist in $\mathbb R$ (271F). So $\int xy\,\nu_{(X,Y)}(d(x,y))$ exists and is equal to $\mathbb E(X)\mathbb E(Y)$ (253D). But this is just $\mathbb E(X\times Y)$, by 271E with $h(x,y)=xy$.

272R Bienaymé's Equality Let $X_1,\dots,X_n$ be independent random variables. Then
$$\operatorname{Var}(X_1+\dots+X_n)=\operatorname{Var}(X_1)+\dots+\operatorname{Var}(X_n).$$

proof (a) Suppose first that all the $X_i$ have finite variance. Set $a_i=\mathbb E(X_i)$, $Y_i=X_i-a_i$, $X=X_1+\dots+X_n$, $Y=Y_1+\dots+Y_n$; then $\mathbb E(X)=a_1+\dots+a_n$, so $Y=X-\mathbb E(X)$ and
$$\operatorname{Var}(X)=\mathbb E(Y^2)=\mathbb E\Bigl(\Bigl(\sum_{i=1}^nY_i\Bigr)^2\Bigr)=\mathbb E\Bigl(\sum_{i=1}^n\sum_{j=1}^nY_i\times Y_j\Bigr)=\sum_{i=1}^n\sum_{j=1}^n\mathbb E(Y_i\times Y_j).$$
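Now observe that if $i\ne j$ then $\mathbb E(Y_i\times Y_j)=\mathbb E(Y_i)\mathbb E(Y_j)=0$, because $Y_i$ and $Y_j$ are independent (by 272E) and we may use 272Q, while if $i=j$ then $\mathbb E(Y_i\times Y_j)=\mathbb E(Y_i^2)=\mathbb E((X_i-\mathbb E(X_i))^2)=\operatorname{Var}(X_i)$. So
$$\operatorname{Var}(X)=\sum_{i=1}^n\mathbb E(Y_i^2)=\sum_{i=1}^n\operatorname{Var}(X_i).$$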
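Part (a) of this proof, together with 272Q, can be checked exactly on small discrete examples. The sketch below works under assumed inputs: each random variable takes finitely many values with rational probabilities, independence is imposed by building the joint law as the product of the marginals (as 272G permits), and `mean`, `variance`, `product_law` and `law_of` are hypothetical helper names, not anything defined in the text.

```python
from fractions import Fraction
from itertools import product


def mean(pmf):
    """Expectation of a finitely-valued random variable given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Variance E((X - E(X))^2) of a finitely-valued random variable."""
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())


def product_law(*pmfs):
    """Joint law of independent variables: multiply the marginal probabilities."""
    joint = {}
    for combo in product(*(pmf.items() for pmf in pmfs)):
        values = tuple(v for v, _ in combo)
        prob = Fraction(1)
        for _, p in combo:
            prob *= p
        joint[values] = joint.get(values, 0) + prob
    return joint


def law_of(joint, g):
    """Distribution of g(X_1, ..., X_n) when (X_1, ..., X_n) has the given joint law."""
    out = {}
    for values, p in joint.items():
        s = g(*values)
        out[s] = out.get(s, 0) + p
    return out


x1 = {0: Fraction(1, 2), 1: Fraction(1, 2)}
x2 = {-1: Fraction(1, 3), 2: Fraction(2, 3)}
x3 = {0: Fraction(1, 4), 3: Fraction(1, 4), 10: Fraction(1, 2)}

total = law_of(product_law(x1, x2, x3), lambda a, b, c: a + b + c)
print(variance(total) == variance(x1) + variance(x2) + variance(x3))  # True (272R)

prod12 = law_of(product_law(x1, x2), lambda a, b: a * b)
print(mean(prod12) == mean(x1) * mean(x2))  # True (272Q)

# Without independence the equality can fail: add X to itself.
doubled = law_of({(v, v): p for v, p in x1.items()}, lambda a, b: a + b)
print(variance(doubled), 2 * variance(x1))  # 1 1/2
```

Applying `law_of` to a product law with $g(a,b)=a+b$ computes exactly the discrete convolution that 272S identifies as the distribution of a sum of independent random variables.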
(b)(i) I show next that if $\operatorname{Var}(X_1+X_2)<\infty$ then $\operatorname{Var}(X_1)<\infty$. ▶ We have
$$\int\!\!\int(x+y)^2\,\nu_{X_1}(dx)\,\nu_{X_2}(dy)=\int(x+y)^2\,\nu_{(X_1,X_2)}(d(x,y))$$
(by 272G and Fubini's theorem)
$$=\mathbb E((X_1+X_2)^2)$$
(by 271E) $<\infty$. So there must be some $a\in\mathbb R$ such that $\int(x+a)^2\,\nu_{X_1}(dx)$ is finite, that is, $\mathbb E((X_1+a)^2)<\infty$; consequently $\mathbb E(X_1^2)$ and $\operatorname{Var}(X_1)$ are finite. ◀

(ii) Now an easy induction (relying on 272L!) shows that if $\operatorname{Var}(X_1+\dots+X_n)$ is finite, so is $\operatorname{Var}(X_j)$ for every $j$. Turning this round, if $\sum_{j=1}^n\operatorname{Var}(X_j)=\infty$, then $\operatorname{Var}(X_1+\dots+X_n)=\infty$, and again the two are equal.

272S The distribution of a sum of independent random variables

Theorem Let $X$, $Y$ be independent real-valued random variables on a probability space $(\Omega,\Sigma,\mu)$, with distributions $\nu_X$, $\nu_Y$. Then the distribution of $X+Y$ is the convolution $\nu_X*\nu_Y$ (257A).

proof Set $\nu=\nu_X*\nu_Y$. Take $a\in\mathbb R$ and set $h=\chi\,]-\infty,a]$. Then $h$ is $\nu$-integrable, so
$$\nu\,]-\infty,a]=\int h\,d\nu=\int h(x+y)\,(\nu_X\times\nu_Y)(d(x,y))$$
(by 257B, writing $\nu_X\times\nu_Y$ for the product measure on $\mathbb R^2$)
$$=\int h(x+y)\,\nu_{(X,Y)}(d(x,y))$$
(by 272G, writing $\nu_{(X,Y)}$ for the joint distribution of $(X,Y)$; this is where we use the hypothesis that $X$ and $Y$ are independent)
$$=\mathbb E(h(X+Y))$$
(applying 271E to the function $(x,y)\mapsto h(x+y)$)
$$=\Pr(X+Y\le a).$$
As $a$ is arbitrary, $\nu_X*\nu_Y$ is the distribution of $X+Y$.

272T Corollary Suppose that $X$ and $Y$ are independent random variables, and that they have densities $f$ and $g$. Then $f*g$ is a density function for $X+Y$.

proof By 257F, $f*g$ is a density function for $\nu_X*\nu_Y=\nu_{X+Y}$.

272U The following simple result will be very useful when we come to stochastic processes in Volume 4, as well as in the next section.

Etemadi's lemma (Etemadi 96) Let $X_0,\dots,X_n$ be independent real-valued random variables. For $m\le n$, set $S_m=\sum_{i=0}^mX_i$. Then
272Xh
Independence
355
$$\Pr(\sup_{m\le n}|S_m|\ge3\gamma)\le3\max_{m\le n}\Pr(|S_m|\ge\gamma)$$
for every $\gamma>0$.

proof As in 272P, we may suppose that every $X_i$ is a measurable function defined everywhere on a measure space $\Omega$. Set $\alpha=\max_{m\le n}\Pr(|S_m|\ge\gamma)$. For each $r\le n$, set
$$E_r=\{\omega:|S_m(\omega)|<3\gamma\text{ for every }m<r,\ |S_r(\omega)|\ge3\gamma\}.$$
Then $E_0,\dots,E_n$ is a disjoint cover of $\{\omega:\max_{m\le n}|S_m(\omega)|\ge3\gamma\}$. Set $E'_r=\{\omega:\omega\in E_r,\ |S_n(\omega)|<\gamma\}$. Then $E'_r\subseteq\{\omega:\omega\in E_r,\ |(S_n-S_r)(\omega)|>2\gamma\}$. But $E_r$ depends on $X_0,\dots,X_r$, so is independent of $\{\omega:|(S_n-S_r)(\omega)|>2\gamma\}$, which can be calculated from $X_{r+1},\dots,X_n$ (272K). So
$$\mu E'_r\le\mu\{\omega:\omega\in E_r,\ |(S_n-S_r)(\omega)|>2\gamma\}=\mu E_r\cdot\Pr(|S_n-S_r|>2\gamma)\le\mu E_r\bigl(\Pr(|S_n|>\gamma)+\Pr(|S_r|>\gamma)\bigr)\le2\alpha\,\mu E_r,$$
and $\mu(E_r\setminus E'_r)\ge(1-2\alpha)\mu E_r$. On the other hand, $\langle E_r\setminus E'_r\rangle_{r\le n}$ is a disjoint family of sets all included in $\{\omega:|S_n(\omega)|\ge\gamma\}$. So
$$\alpha\ge\mu\{\omega:|S_n(\omega)|\ge\gamma\}\ge\sum_{r=0}^n\mu(E_r\setminus E'_r)\ge(1-2\alpha)\sum_{r=0}^n\mu E_r,$$
and
$$\Pr(\sup_{r\le n}|S_r|\ge3\gamma)=\sum_{r=0}^n\mu E_r\le\min\Bigl(1,\frac{\alpha}{1-2\alpha}\Bigr)\le3\alpha$$
(considering $\alpha\le\frac13$, $\alpha\ge\frac13$ separately), as required.
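Both 272S and 272U can be checked exactly in the simplest case. The following Python sketch assumes that every $X_i$ is a Rademacher variable, taking the values $\pm1$ with probability $\frac12$ each. The helpers `convolve` and `etemadi_sides` are hypothetical names used only for this illustration. The distribution of each partial sum $S_m$ is built by repeated convolution, as 272S permits. The left-hand side of Etemadi’s inequality depends on the whole path, so it is computed by enumerating all $2^{n+1}$ equally likely sign sequences. Rational arithmetic keeps everything exact.

```python
from fractions import Fraction
from itertools import product


def convolve(p, q):
    """Distribution of X + Y for independent X ~ p, Y ~ q (dicts: value -> probability); cf. 272S."""
    r = {}
    for x, px in p.items():
        for y, qy in q.items():
            r[x + y] = r.get(x + y, 0) + px * qy
    return r


def etemadi_sides(n, gamma):
    """Return (Pr(max_{m<=n} |S_m| >= 3*gamma), 3 * max_{m<=n} Pr(|S_m| >= gamma))
    for S_m = X_0 + ... + X_m, X_i independent Rademacher variables (an assumed example)."""
    step = {1: Fraction(1, 2), -1: Fraction(1, 2)}
    dist = dict(step)                       # distribution of S_0 = X_0
    alpha = Fraction(0)
    for m in range(n + 1):
        if m > 0:
            dist = convolve(dist, step)     # distribution of S_m = S_{m-1} + X_m
        alpha = max(alpha, sum(p for s, p in dist.items() if abs(s) >= gamma))
    hits = 0
    for signs in product((1, -1), repeat=n + 1):   # every path has probability 2^-(n+1)
        s = peak = 0
        for x in signs:
            s += x
            peak = max(peak, abs(s))
        if peak >= 3 * gamma:
            hits += 1
    return Fraction(hits, 2 ** (n + 1)), 3 * alpha


for n in (5, 10, 14):
    lhs, rhs = etemadi_sides(n, 2)
    print(n, lhs, rhs, lhs <= rhs)
```

The enumeration grows like $2^{n}$, so this is only a sanity check for small $n$. The lemma itself needs no such restriction.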
272X Basic exercises (a) Let $(\Omega,\Sigma,\mu)$ be an atomless probability space, and $\langle\epsilon_n\rangle_{n\in\mathbb N}$ any sequence in $[0,1]$. Show that there is an independent sequence $\langle E_n\rangle_{n\in\mathbb N}$ in $\Sigma$ such that $\mu E_n=\epsilon_n$ for every $n$. (Hint: 215D.)

>(b) Let $\langle X_i\rangle_{i\in I}$ be a family of real-valued random variables. Show that it is independent iff
$$\mathbb E(h_1(X_{i_1})\times\dots\times h_n(X_{i_n}))=\prod_{j=1}^n\mathbb E(h_j(X_{i_j}))$$
whenever $i_1,\dots,i_n$ are distinct members of $I$ and $h_1,\dots,h_n$ are Borel measurable functions from $\mathbb R$ to $\mathbb R$ such that the $\mathbb E(h_j(X_{i_j}))$ are all finite.

(c) Write out a proof of 272F which does not use the theory of product measures.

(d) Let $X=(X_1,\dots,X_n)$ be a family of random variables all defined on the same probability space, and suppose that $X$ has a density function $f$ expressible in the form $f(\xi_1,\dots,\xi_n)=f_1(\xi_1)f_2(\xi_2)\dots f_n(\xi_n)$ for suitable functions $f_1,\dots,f_n$ of one real variable. Show that $X_1,\dots,X_n$ are independent.

(e) Let $X_1$, $X_2$ be independent real-valued random variables both with distribution $\nu$ and distribution function $F$. Set $Y=\max(X_1,X_2)$. Show that the distribution of $Y$ is absolutely continuous with respect to $\nu$, with Radon-Nikodým derivative $F+F^-$, where $F^-(x)=\lim_{t\uparrow x}F(t)$ for every $x\in\mathbb R$. (Hint: use Fubini’s theorem to calculate $\lambda\{(t,u):t\le u\le x\}$ and $\lambda\{(t,u):t<u\le x\}$ where $\lambda$ is the joint distribution of $X_1$ and $X_2$.)

(f) Use 254Sa and the idea of 272J to give another proof of 272O.

(g) Let $(\Omega,\Sigma,\mu)$ be a probability space and $\langle\Sigma_n\rangle_{n\in\mathbb N}$ a non-decreasing sequence of $\sigma$-subalgebras of $\Sigma$. Let $\Sigma_\infty$ be the $\sigma$-algebra generated by $\bigcup_{n\in\mathbb N}\Sigma_n$. Let $\mathrm T$ be another $\sigma$-subalgebra of $\Sigma$ such that $\Sigma_n$ and $\mathrm T$ are independent for each $n$. Show that $\Sigma_\infty$ and $\mathrm T$ are independent. (Hint: apply the Monotone Class Theorem to $\{E:\mu(E\cap F)=\mu E\cdot\mu F$ for every $F\in\mathrm T\}$.) Use this to prove 272O.

(h) Let $\langle X_n\rangle_{n\in\mathbb N}$ be a sequence of random variables and $Y$ a random variable such that $Y$ and $X_n$ are independent for each $n\in\mathbb N$. Suppose that $\Pr(Y\in\mathbb N)=1$ and that $\sum_{n=0}^\infty\Pr(Y\ge n)\mathbb E(|X_n|)$ is finite. Set $Z=\sum_{n=0}^YX_n$ (that is, $Z(\omega)=\sum_{n=0}^{Y(\omega)}X_n(\omega)$ whenever $\omega\in\operatorname{dom}Y$ is such that $Y(\omega)\in\mathbb N$ and $\omega\in\operatorname{dom}X_n$ for every $n\le Y(\omega)$). (i) Show that $\mathbb E(Z)=\sum_{n=0}^\infty\Pr(Y\ge n)\mathbb E(X_n)$. (Hint: set $X'_n(\omega)=X_n(\omega)$ if $Y(\omega)\ge n$, $0$ otherwise.) (ii) Show that if $\mathbb E(X_n)=\gamma$ for every $n\in\mathbb N$ then $\mathbb E(Z)=\gamma\mathbb E(Y)$. (This is Wald’s equation.)
(i) Let $X_1,\dots,X_n$ be independent random variables. Show that if $X_1+\dots+X_n$ has finite expectation so does every $X_j$. (Hint: see part (b) of the proof of 272R.)

>(j) Let $X$ and $Y$ be independent real-valued random variables with densities $f$ and $g$. Show that $X\times Y$ has a density function $h$ where $h(x)=\int_{-\infty}^\infty\frac1{|y|}g(y)f(\frac xy)\,dy$ for almost every $x$. (Hint: 271K.)

272Y Further exercises (a) Develop a theory of independence for random variables taking values in $\mathbb R^r$, following through as many as possible of the ideas of this section.

(b) Show that all the ideas of this section apply equally to complex-valued random variables, subject to suitable adjustments (to be devised).

(c) Let $X_0,\dots,X_n$ be independent real-valued random variables with distributions $\nu_0,\dots,\nu_n$ and distribution functions $F_0,\dots,F_n$. Show that, for any Borel set $E\subseteq\mathbb R$, $\Pr(\sup_{i\le n}X_i\in E)=\sum_{i=0}^n\int_E\prod\prod_{0\le j}$ …

$$\frac1{n+1}\Bigl|\sum_{i=0}^n(x-x_i)\Bigr|\le\frac1{n+1}\Bigl|\sum_{i=0}^{m-1}(x-x_i)\Bigr|+\frac1{n+1}\sum_{i=m}^n|x-x_i|\le\frac{\epsilon m'}{n+1}+\frac{\epsilon(n-m+1)}{n+1}\le2\epsilon.$$
As $\epsilon$ is arbitrary, $\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^nx_i=x$.

… 0. Write $s_n=\sum_{i=0}^nx_i$ for each $n$, and $s=\lim_{n\to\infty}s_n=\sum_{i=0}^\infty x_i$; set $s^*=\sup_{n\in\mathbb N}|s_n|<\infty$. Let $m\in\mathbb N$ be such that $|s_n-s|\le\epsilon$ whenever $n\ge m$; then $|s_n-s_j|\le2\epsilon$ whenever $j,n\ge m$. Let $m'\ge m$ be such that $b_ms^*\le\epsilon b_{m'}$. Take any $n\ge m'$. Then
$$\begin{aligned}
\Bigl|\sum_{k=0}^nb_kx_k\Bigr|&=|b_0s_0+b_1(s_1-s_0)+\dots+b_n(s_n-s_{n-1})|\\
&=|(b_0-b_1)s_0+(b_1-b_2)s_1+\dots+(b_{n-1}-b_n)s_{n-1}+b_ns_n|\\
&=\Bigl|b_0s_n+\sum_{i=0}^{n-1}(b_{i+1}-b_i)(s_n-s_i)\Bigr|\\
&\le b_0|s_n|+\sum_{i=0}^{m-1}(b_{i+1}-b_i)|s_n-s_i|+\sum_{i=m}^{n-1}(b_{i+1}-b_i)|s_n-s_i|\\
&\le b_0s^*+2s^*\sum_{i=0}^{m-1}(b_{i+1}-b_i)+2\epsilon\sum_{i=m}^{n-1}(b_{i+1}-b_i)\\
&=b_0s^*+2s^*(b_m-b_0)+2\epsilon(b_n-b_m)\le2s^*b_m+2\epsilon b_n.
\end{aligned}$$
Consequently, because $b_n\ge b_{m'}$,
$$\Bigl|\frac1{b_n}\sum_{k=0}^nb_kx_k\Bigr|\le2\frac{s^*b_m}{b_n}+2\epsilon\le4\epsilon.$$
As $\epsilon$ is arbitrary, $\lim_{n\to\infty}\frac1{b_n}\sum_{k=0}^nb_kx_k=0$,
as required.

Remark Part (b) above is sometimes called ‘Kronecker’s lemma’.

273D The strong law of large numbers: first form Let $\langle X_n\rangle_{n\in\mathbb N}$ be an independent sequence of real-valued random variables, and suppose that $\langle b_n\rangle_{n\in\mathbb N}$ is a non-decreasing sequence in $]0,\infty[$, diverging to $\infty$, such that $\sum_{n=0}^\infty\frac1{b_n^2}\mathrm{Var}(X_n)<\infty$. Then
$$\lim_{n\to\infty}\frac1{b_n}\sum_{i=0}^n(X_i-\mathbb E(X_i))=0$$
almost everywhere.

proof As usual, write $(\Omega,\Sigma,\mu)$ for the underlying probability space. Set $Y_n=\frac1{b_n}(X_n-\mathbb E(X_n))$
for each $n$; then $\langle Y_n\rangle_{n\in\mathbb N}$ is independent (272E), $\mathbb E(Y_n)=0$ for each $n$, and
$$\sum_{n=0}^\infty\mathbb E(Y_n^2)=\sum_{n=0}^\infty\frac1{b_n^2}\mathrm{Var}(X_n)<\infty.$$
By 273B, $\langle Y_n(\omega)\rangle_{n\in\mathbb N}$ is summable for almost every $\omega\in\Omega$. But by 273C,
$$\lim_{n\to\infty}\frac1{b_n}\sum_{i=0}^n(X_i(\omega)-\mathbb E(X_i))=\lim_{n\to\infty}\frac1{b_n}\sum_{i=0}^nb_iY_i(\omega)=0$$
for all such $\omega$. So we have the result.

273E Corollary Let $\langle X_n\rangle_{n\in\mathbb N}$ be an independent sequence of random variables such that $\mathbb E(X_n)=0$ for every $n$ and $\sup_{n\in\mathbb N}\mathbb E(X_n^2)<\infty$. Then
$$\lim_{n\to\infty}\frac1{b_n}(X_0+\dots+X_n)=0$$
almost everywhere whenever $\langle b_n\rangle_{n\in\mathbb N}$ is a non-decreasing sequence of strictly positive numbers and $\sum_{n=0}^\infty\frac1{b_n^2}$ is finite. In particular,
$$\lim_{n\to\infty}\frac1{n+1}(X_0+\dots+X_n)=0$$
almost everywhere.

Remark For most of the rest of this section, we shall take $b_n=n+1$. The special virtue of 273D is that it allows other $b_n$, e.g., $b_n=\sqrt n\ln n$. A direct strengthening of this theorem is in 276C below.

273F Corollary Let $\langle E_n\rangle_{n\in\mathbb N}$ be an independent sequence of measurable sets in a probability space $(\Omega,\Sigma,\mu)$, and suppose that
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^n\mu E_i=c.$$
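Then
$$\lim_{n\to\infty}\frac1{n+1}\#(\{i:i\le n,\ \omega\in E_i\})=c$$
for almost every $\omega\in\Omega$.

proof In 273D, set $X_n=\chi E_n$ and $b_n=n+1$. Since $\mathrm{Var}(X_n)\le\mathbb E(X_n^2)\le1$ for every $n$, we have $\sum_{n=0}^\infty\frac1{(n+1)^2}\mathrm{Var}(X_n)<\infty$. For almost every $\omega$, we therefore have
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(X_i(\omega)-a_i)=0,$$
writing $a_i=\mu E_i=\mathbb E(X_i)$ for each $i$. (I see that I am using 272F to support the claim that $\langle X_n\rangle_{n\in\mathbb N}$ is independent.) But for any such $\omega$,
$$\lim_{n\to\infty}\Bigl(\frac1{n+1}\#(\{i:i\le n,\ \omega\in E_i\})-\frac1{n+1}\sum_{i=0}^na_i\Bigr)=\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(X_i(\omega)-a_i)=0;$$
because we are supposing that $\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^na_i=c$, we must have
$$\lim_{n\to\infty}\frac1{n+1}\#(\{i:i\le n,\ \omega\in E_i\})=c,$$
as required.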
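273D, and through it 273F, rests on Kronecker’s lemma (273C(b)). The heart of that proof is the summation-by-parts identity $\sum_{k=0}^nb_kx_k=b_0s_n+\sum_{i=0}^{n-1}(b_{i+1}-b_i)(s_n-s_i)$, and this can be checked in exact arithmetic. In the Python sketch below, the function names are hypothetical. The pair $x_k=(-1)^k/(k+1)$, $b_k=k+1$ is an assumed example of a summable sequence with a non-decreasing divergent weight; for it, $\frac1{b_n}\sum_{k\le n}b_kx_k$ is either $0$ or $\frac1{n+1}$.

```python
from fractions import Fraction


def summation_by_parts_holds(b, x):
    """Check sum_k b_k x_k == b_0 s_n + sum_{i<n} (b_{i+1} - b_i)(s_n - s_i) exactly."""
    n = len(x) - 1
    s, running = [], Fraction(0)
    for xk in x:                      # partial sums s_0, ..., s_n
        running += xk
        s.append(running)
    lhs = sum(bk * xk for bk, xk in zip(b, x))
    rhs = b[0] * s[n] + sum((b[i + 1] - b[i]) * (s[n] - s[i]) for i in range(n))
    return lhs == rhs


def kronecker_average(n):
    """(1/b_n) * sum_{k<=n} b_k x_k for x_k = (-1)^k/(k+1) and b_k = k + 1."""
    total = sum((k + 1) * Fraction((-1) ** k, k + 1) for k in range(n + 1))
    return total / (n + 1)


xs = [Fraction((-1) ** k, k + 1) for k in range(50)]
bs = [Fraction(k + 1) for k in range(50)]
print(summation_by_parts_holds(bs, xs))
print([kronecker_average(n) for n in (10, 100, 1000)])
```

The identity is purely algebraic and holds for any finite sequences. Monotonicity of $\langle b_n\rangle$ and summability of $\langle x_n\rangle$ enter only in the estimates that follow it.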
273G Corollary Let $\mu$ be the usual measure on $\mathcal P\mathbb N$, as described in 254Jb. Then for $\mu$-almost every set $a\subseteq\mathbb N$,
$$\lim_{n\to\infty}\frac1{n+1}\#(a\cap\{0,\dots,n\})=\frac12.$$
proof The sets $E_n=\{a:n\in a\}$ are independent, with measure $\frac12$, so 273F applies with $c=\frac12$.

Remark The limit $\lim_{n\to\infty}\frac1{n+1}\#(a\cap\{0,\dots,n\})$ is called the asymptotic density of $a$.
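273G can be watched numerically by sampling a ‘random subset’ of $\mathbb N$ one coin flip at a time. The Python sketch below assumes that the standard-library pseudo-random generator is an adequate stand-in for the measure of 254Jb, which is only an approximation. The helper name `prefix_density` is hypothetical. Its optional bias `q` (also an assumption) corresponds to the measure of 273Xc, for which the density should approach $q$ instead of $\frac12$.

```python
import random


def prefix_density(n, q=0.5, seed=0):
    """Return #(a ∩ {0,...,n}) / (n+1) for a pseudo-random set a ⊆ N that contains
    each integer independently with probability q (q = 1/2 is the situation of 273G)."""
    rng = random.Random(seed)
    count = sum(1 for _ in range(n + 1) if rng.random() < q)
    return count / (n + 1)


for n in (10, 1_000, 100_000):
    print(n, prefix_density(n))
```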
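273H Strong law of large numbers: second form Let $\langle X_n\rangle_{n\in\mathbb N}$ be an independent sequence of real-valued random variables, and suppose that $\sup_{n\in\mathbb N}\mathbb E(|X_n|^{1+\delta})<\infty$ for some $\delta>0$. Then
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(X_i-\mathbb E(X_i))=0$$
almost everywhere.

proof As usual, call the underlying probability space $(\Omega,\Sigma,\mu)$; as in 273B we can adjust the $X_n$ on negligible sets so as to make them measurable and defined everywhere on $\Omega$, without changing $\mathbb E(X_n)$, $\mathbb E(|X_n|)$ or the convergence of the partial sums except on a negligible set. Since $|\xi|^{1+\delta'}\le1+|\xi|^{1+\delta}$ whenever $0<\delta'\le\delta$, we may also suppose that $\delta\le1$.

(a) For each $n$, define a random variable $Y_n$ on $\Omega$ by setting
$$Y_n(\omega)=X_n(\omega)\text{ if }|X_n(\omega)|\le n,\qquad Y_n(\omega)=0\text{ if }|X_n(\omega)|>n.$$
Then $\langle Y_n\rangle_{n\in\mathbb N}$ is independent (272E). For each $n\in\mathbb N$,
$$\mathrm{Var}(Y_n)\le\mathbb E(Y_n^2)\le\mathbb E(n^{1-\delta}|X_n|^{1+\delta})\le n^{1-\delta}K$$
(because $|Y_n|\le\min(n,|X_n|)$ and $\delta\le1$), where $K=\sup_{n\in\mathbb N}\mathbb E(|X_n|^{1+\delta})$, so
$$\sum_{n=0}^\infty\frac1{(n+1)^2}\mathrm{Var}(Y_n)\le\sum_{n=0}^\infty\frac{n^{1-\delta}}{(n+1)^2}K<\infty.$$
By 273D,
$$G=\Bigl\{\omega:\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(Y_i(\omega)-\mathbb E(Y_i))=0\Bigr\}$$
is conegligible.

(b) On the other hand, setting $E_n=\{\omega:Y_n(\omega)\ne X_n(\omega)\}=\{\omega:|X_n(\omega)|>n\}$, we have $K\ge n^{1+\delta}\mu E_n$ for each $n$, so
$$\sum_{n=0}^\infty\mu E_n\le1+K\sum_{n=1}^\infty\frac1{n^{1+\delta}}<\infty,$$
and the set $H=\{\omega:\{n:\omega\in E_n\}$ is finite$\}$ is conegligible (273A). But of course
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(X_i(\omega)-Y_i(\omega))=0$$
for every $\omega\in H$.

(c) Finally,
$$|\mathbb E(Y_n)-\mathbb E(X_n)|\le\int_{E_n}|X_n|\le\int_{E_n}n^{-\delta}|X_n|^{1+\delta}\le n^{-\delta}K$$
whenever $n\ge1$, so $\lim_{n\to\infty}(\mathbb E(Y_n)-\mathbb E(X_n))=0$ and
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(\mathbb E(Y_i)-\mathbb E(X_i))=0$$
(273Ca). Putting these three together, we get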
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(X_i(\omega)-\mathbb E(X_i))=0$$
whenever $\omega$ belongs to the conegligible set $G\cap H$. So
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(X_i-\mathbb E(X_i))=0$$
almost everywhere, as required.

273I Strong law of large numbers: third form Let $\langle X_n\rangle_{n\in\mathbb N}$ be an independent sequence of real-valued random variables of finite expectation, and suppose that they are identically distributed, that is, all have the same distribution. Then
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(X_i-\mathbb E(X_i))=0$$
almost everywhere.

proof The proof follows the same line as that of 273H, but some of the inequalities require more delicate arguments. As usual, call the underlying probability space $(\Omega,\Sigma,\mu)$ and suppose that the $X_n$ are all measurable and defined everywhere on $\Omega$. (We need to remember that changing a random variable on a negligible set does not change its distribution.) Let $\nu$ be the common distribution of the $X_n$.

(a) For each $n$, define a random variable $Y_n$ on $\Omega$ by setting
$$Y_n(\omega)=X_n(\omega)\text{ if }|X_n(\omega)|\le n,\qquad Y_n(\omega)=0\text{ if }|X_n(\omega)|>n.$$
Then $\langle Y_n\rangle_{n\in\mathbb N}$ is independent (272E). For each $n\in\mathbb N$,
$$\mathrm{Var}(Y_n)\le\mathbb E(Y_n^2)=\int_{[-n,n]}x^2\,\nu(dx)$$
(271E). To estimate $\sum_{n=0}^\infty\frac1{(n+1)^2}\mathbb E(Y_n^2)$, set
$$f_n(x)=\frac{x^2}{(n+1)^2}\text{ if }|x|\le n,\qquad f_n(x)=0\text{ if }|x|>n,$$
so that $\frac1{(n+1)^2}\mathrm{Var}(Y_n)\le\int f_n\,d\nu$. If $r\ge1$ is an integer and $r<|x|\le r+1$ then
$$\sum_{n=0}^\infty f_n(x)\le\sum_{n=r+1}^\infty\frac{(r+1)|x|}{(n+1)^2}\le(r+1)|x|\sum_{n=r+1}^\infty\Bigl(\frac1n-\frac1{n+1}\Bigr)\le|x|,$$
while if $|x|\le1$ then
$$\sum_{n=0}^\infty f_n(x)\le\sum_{n=0}^\infty\frac1{(n+1)^2}=\frac{\pi^2}6\le2<\infty.$$
(You do not need to know that the sum is $\frac{\pi^2}6$, only that it is finite; but see 282Xo.) Consequently
$$f(x)=\sum_{n=0}^\infty f_n(x)\le2+|x|$$
for every $x$, and $\int f\,d\nu<\infty$, because $\int|x|\,\nu(dx)$ is the common value of $\mathbb E(|X_n|)$, and is finite. By any of the great convergence theorems,
$$\sum_{n=0}^\infty\frac1{(n+1)^2}\mathrm{Var}(Y_n)\le\sum_{n=0}^\infty\int f_n\,d\nu=\int f\,d\nu<\infty.$$
By 273D,
$$G=\Bigl\{\omega:\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(Y_i(\omega)-\mathbb E(Y_i))=0\Bigr\}$$
is conegligible.

(b) Next, setting $E_n=\{\omega:X_n(\omega)\ne Y_n(\omega)\}=\{\omega:|X_n(\omega)|>n\}$, we have $E_n=\bigcup_{i\ge n}F_{ni}$, where $F_{ni}=\{\omega:i<|X_n(\omega)|\le i+1\}$. Now $\mu F_{ni}=\nu\{x:i<|x|\le i+1\}$ for every $n$ and $i$. So
$$\sum_{n=0}^\infty\mu E_n=\sum_{n=0}^\infty\sum_{i=n}^\infty\mu F_{ni}=\sum_{i=0}^\infty\sum_{n=0}^i\mu F_{ni}=\sum_{i=0}^\infty(i+1)\nu\{x:i<|x|\le i+1\}\le\int(1+|x|)\,\nu(dx)<\infty.$$
Consequently the set $H=\{\omega:\{n:X_n(\omega)\ne Y_n(\omega)\}$ is finite$\}$ is conegligible (273A). But of course
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(X_i(\omega)-Y_i(\omega))=0$$
for every $\omega\in H$.

(c) Finally,
$$|\mathbb E(Y_n)-\mathbb E(X_n)|\le\int_{E_n}|X_n|=\int_{\mathbb R\setminus[-n,n]}|x|\,\nu(dx)$$
whenever $n\in\mathbb N$, so (because $\int|x|\,\nu(dx)<\infty$) $\lim_{n\to\infty}(\mathbb E(Y_n)-\mathbb E(X_n))=0$ and
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(\mathbb E(Y_i)-\mathbb E(X_i))=0$$
(273Ca). Putting these three together, we get
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(X_i(\omega)-\mathbb E(X_i))=0$$
whenever $\omega$ belongs to the conegligible set $G\cap H$. So
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(X_i-\mathbb E(X_i))=0$$
almost everywhere, as required.

Remarks In my own experience, this is the most important form of the strong law from the point of view of ‘pure’ measure theory. I note that 273G above can also be regarded as a consequence of this form. For a very striking alternative proof, see 275Yn. Yet another proof treats this result as a special case of the Ergodic Theorem (see 372Xg in Volume 3).

273J Corollary Let $(\Omega,\Sigma,\mu)$ be a probability space, and $f$ a $\mu$-integrable real-valued function. Let $\lambda$ be the product measure on $\Omega^{\mathbb N}$ (254A-254C). Then for $\lambda$-almost every $\boldsymbol\omega=\langle\omega_n\rangle_{n\in\mathbb N}\in\Omega^{\mathbb N}$,
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^nf(\omega_i)=\int f\,d\mu.$$
proof Define functions $X_n$ on $\Omega^{\mathbb N}$ by setting $X_n(\boldsymbol\omega)=f(\omega_n)$ whenever $\omega_n\in\operatorname{dom}f$.
Then $\langle X_n\rangle_{n\in\mathbb N}$ is an independent sequence of random variables, all with the same distribution as $f$ (272M). So
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^nf(\omega_i)-\int f\,d\mu=\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(X_i(\boldsymbol\omega)-\mathbb E(X_i))=0$$
for almost every $\boldsymbol\omega$, by 273I, and
$$\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^nf(\omega_i)=\int f\,d\mu$$
for almost every $\boldsymbol\omega$.

Remark I find myself slipping here into measure-theorists’ terminology; this corollary is one of the basic applications of the strong law to measure theory. Obviously, in view of 272J and 272M, this corollary is equivalent to 273I. It could also (in theory) be used as a definition of integration (on a probability space); it is sometimes called the ‘Monte Carlo’ method of integration.
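The ‘Monte Carlo’ reading of 273J can be sketched as follows. This minimal Python illustration makes two assumptions: $\Omega=[0,1]$ with Lebesgue measure, and a pseudo-random generator standing in for a $\lambda$-random sequence $\langle\omega_n\rangle$. The function `monte_carlo_integral` is a hypothetical name. With $f(x)=x^2$, the running average $\frac1{n+1}\sum_{i\le n}f(\omega_i)$ should approach $\int f\,d\mu=\frac13$.

```python
import random


def monte_carlo_integral(f, n, seed=0):
    """Return (1/(n+1)) * sum_{i<=n} f(omega_i) for pseudo-random points omega_i of [0, 1]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n + 1):
        total += f(rng.random())
    return total / (n + 1)


estimate = monte_carlo_integral(lambda x: x * x, 200_000)
print(estimate, abs(estimate - 1 / 3))
```

273J says only that the error tends to 0 almost surely. The familiar $1/\sqrt n$ size of the error, for $f$ of finite variance, is a central-limit-theorem statement and not part of the corollary.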
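273K It is tempting to seek extensions of 273I in which the $X_n$ are not identically distributed, but are otherwise well-behaved. Any such idea should be tested against the following example. I find that I need another standard result, complementing that in 273A.

Borel-Cantelli lemma Let $(\Omega,\Sigma,\mu)$ be a probability space and $\langle E_n\rangle_{n\in\mathbb N}$ an independent sequence of measurable subsets of $\Omega$ such that $\sum_{n=0}^\infty\mu E_n=\infty$. Then almost every point of $\Omega$ belongs to infinitely many of the $E_n$.

proof (a) Observe first that if $\alpha_0,\dots,\alpha_n\in[0,1]$ then
$$\prod_{i=0}^n(1-\alpha_i)\le\max\Bigl(\frac12,1-\frac12\sum_{i=0}^n\alpha_i\Bigr);$$
this is a simple induction on $n$. If $\prod_{i=0}^n(1-\alpha_i)<\frac12$ then certainly $\prod_{i=0}^{n+1}(1-\alpha_i)<\frac12$. If instead
$$\frac12\le\prod_{i=0}^n(1-\alpha_i)\le1-\frac12\sum_{i=0}^n\alpha_i,$$
then $\sum_{i=0}^n\alpha_i\le1$, so
$$\prod_{i=0}^{n+1}(1-\alpha_i)\le(1-\alpha_{n+1})\Bigl(1-\frac12\sum_{i=0}^n\alpha_i\Bigr)\le1-\frac12\sum_{i=0}^n\alpha_i-\frac12\alpha_{n+1}=1-\frac12\sum_{i=0}^{n+1}\alpha_i.$$
Consequently, if $\langle\alpha_n\rangle_{n\in\mathbb N}$ is a sequence in $[0,1]$ such that $\sum_{n=0}^\infty\alpha_n=\infty$, then $\prod_{n=0}^\infty(1-\alpha_n)=0$. For every $n\in\mathbb N$ there is an $m\ge n$ such that $\sum_{i=n+1}^m\alpha_i\ge1$, so that $\prod_{i=0}^m(1-\alpha_i)\le\frac12\prod_{i=0}^n(1-\alpha_i)$. Letting $n\to\infty$, $\prod_{i=0}^\infty(1-\alpha_i)\le\frac12\prod_{i=0}^\infty(1-\alpha_i)$, so the infinite product is $0$.

(b) Set $F_{mn}=\bigcap_{m\le i\le n}(\Omega\setminus E_i)$ for $m\le n$. Because $E_m,\dots,E_n$ are independent, so are $\Omega\setminus E_m,\dots,\Omega\setminus E_n$ (272F), and $\mu F_{mn}=\prod_{i=m}^n(1-\mu E_i)$. Letting $n\to\infty$,
$$\mu\Bigl(\bigcap_{i\ge m}(\Omega\setminus E_i)\Bigr)=\lim_{n\to\infty}\prod_{i=m}^n(1-\mu E_i)=0$$
by (a), because $\sum_{i=m}^\infty\mu E_i=\infty$. But this means that
$$\mu\{\omega:\{n:\omega\in E_n\}\text{ is finite}\}=\mu\Bigl(\bigcup_{m\in\mathbb N}\bigcap_{i\ge m}(\Omega\setminus E_i)\Bigr)=0,$$
and almost every point of $\Omega$ belongs to infinitely many $E_n$, as claimed.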
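Both halves of part (a) of the proof above lend themselves to a quick check. The Python sketch below uses hypothetical helper names. It first tests the inequality $\prod(1-\alpha_i)\le\max(\frac12,1-\frac12\sum\alpha_i)$ on pseudo-random rational inputs in $[0,1]$. It then takes the assumed example $\alpha_n=1/(n+2)$: here $\sum\alpha_n=\infty$, and the partial products telescope to $1/(N+2)$, so they tend to $0$ as part (a) requires.

```python
import random
from fractions import Fraction


def product_bound_holds(alphas):
    """Check prod(1 - a) <= max(1/2, 1 - (1/2) * sum(a)) for numbers a in [0, 1]."""
    prod = Fraction(1)
    for a in alphas:
        prod *= 1 - a
    return prod <= max(Fraction(1, 2), 1 - sum(alphas) / 2)


def telescoping_product(N):
    """prod_{n=0}^{N} (1 - 1/(n+2)), computed exactly; it equals 1/(N+2)."""
    p = Fraction(1)
    for n in range(N + 1):
        p *= 1 - Fraction(1, n + 2)
    return p


rng = random.Random(0)
for _ in range(1000):
    k = rng.randint(1, 8)
    alphas = [Fraction(rng.randint(0, 100), 100) for _ in range(k)]
    assert product_bound_holds(alphas)
print(telescoping_product(10), telescoping_product(1000))
```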
Now for the promised example.
Example There is an independent sequence $\langle X_n\rangle_{n\in\mathbb N}$ of random variables such that $\lim_{n\to\infty}\mathbb E(|X_n|)=0$ but
$$\limsup_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(X_i-\mathbb E(X_i))=\infty,\qquad\liminf_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(X_i-\mathbb E(X_i))=0$$
almost everywhere.

proof Let $(\Omega,\Sigma,\mu)$ be a probability space with an independent sequence $\langle E_n\rangle_{n\in\mathbb N}$ of measurable sets such that $\mu E_n=\frac1{(n+3)\ln(n+3)}$ for each $n$. (I have nowhere explained exactly how to build such a sequence. Two obvious methods are available to us, and another a trifle less obvious. (i) Take $\Omega=\{0,1\}^{\mathbb N}$ and $\mu$ to be the product of the probabilities $\mu_n$ on $\{0,1\}$, defined by saying that $\mu_n\{1\}=\frac1{(n+3)\ln(n+3)}$ for each $n$; set $E_n=\{\omega:\omega(n)=1\}$, and appeal to 272M to check that the $E_n$ are independent. (ii) Build the $E_n$ inductively as subsets of $[0,1]$, arranging that each $E_n$ should be a finite union of intervals, so that when you come to choose $E_{n+1}$ the sets $E_0,\dots,E_n$ define a partition $I_n$ of $[0,1]$ into intervals, and you can take $E_{n+1}$ to be the union of (say) the left-hand subintervals of length a proportion $\frac1{(n+4)\ln(n+4)}$ of the intervals in $I_n$. (iii) Use 215D to see that the method of (ii) can be used on any atomless probability space, as in 272Xa.)

Set $X_n=(n+3)\ln\ln(n+3)\,\chi E_n$ for each $n$; then $\langle X_n\rangle_{n\in\mathbb N}$ is an independent sequence of real-valued random variables (272F) and $\mathbb E(X_n)=\frac{\ln\ln(n+3)}{\ln(n+3)}$ for each $n$, so that $\mathbb E(X_n)\to0$ as $n\to\infty$. Thus, for instance, $\{X_n:n\in\mathbb N\}$ is uniformly integrable and $\langle X_n\rangle_{n\in\mathbb N}\to0$ in measure (246Jc); while surely $\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^n\mathbb E(X_i)=0$.

On the other hand,
$$\sum_{n=0}^\infty\mu E_n=\sum_{n=0}^\infty\frac1{(n+3)\ln(n+3)}\ge\int_0^\infty\frac{dx}{(x+3)\ln(x+3)}=\lim_{a\to\infty}(\ln\ln(a+3)-\ln\ln3)=\infty,$$
so almost every $\omega$ belongs to infinitely many of the $E_n$, by the Borel-Cantelli lemma (273K). Now if we write $Y_n=\frac1{n+1}\sum_{i=0}^nX_i$, then if $\omega\in E_n$ we have $X_n(\omega)=(n+3)\ln\ln(n+3)$, so
$$Y_n(\omega)\ge\frac{n+3}{n+1}\ln\ln(n+3).$$
This means that
$$\Bigl\{\omega:\limsup_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(X_i(\omega)-\mathbb E(X_i))=\infty\Bigr\}=\Bigl\{\omega:\limsup_{n\to\infty}\frac1{n+1}\sum_{i=0}^nX_i(\omega)=\infty\Bigr\}=\{\omega:\sup_{n\in\mathbb N}Y_n(\omega)=\infty\}\supseteq\{\omega:\{n:\omega\in E_n\}\text{ is infinite}\}$$
is conegligible, and the strong law of large numbers does not apply to $\langle X_n\rangle_{n\in\mathbb N}$. Because $\lim_{n\to\infty}\|Y_n\|_1=\lim_{n\to\infty}\mathbb E(Y_n)=\lim_{n\to\infty}\mathbb E(X_n)=0$ (273Ca), $\langle Y_n\rangle_{n\in\mathbb N}\to0$ for the topology of convergence in measure, and $\langle Y_n\rangle_{n\in\mathbb N}$ has a subsequence converging to $0$ almost everywhere (245K). So
$$\liminf_{n\to\infty}\frac1{n+1}\sum_{i=0}^n(X_i(\omega)-\mathbb E(X_i))=\liminf_{n\to\infty}Y_n(\omega)=0$$
for almost every $\omega$. The fact that both $\limsup_{n\to\infty}Y_n$ and $\liminf_{n\to\infty}Y_n$ are constant almost everywhere is of course a consequence of the zero-one law (272P).

*273M All the above has been concerned with pointwise convergence of the averages of independent random variables, and that is the important part of the work of this section. But it is perhaps worth complementing it with a brief investigation of norm-convergence. To deal efficiently with convergence in $L^p$, we need the following. (I should perhaps remark that, compared with the general case treated here, the case $p=2$ is trivial; see 273Xj.)

Lemma For any $p\in\,]1,\infty[$ and $\epsilon>0$, there is a $\delta>0$ such that $\|S+X\|_p\le1+\epsilon\|X\|_p$ whenever $S$ and $X$ are independent random variables, $\|S\|_p=1$, $\|X\|_p\le\delta$ and $\mathbb E(X)=0$.

proof (a) Take $\zeta\in\,]0,1]$ such that $p\zeta\le2$ and
$$(1+\xi)^p\le1+p\xi+\frac{p^2}2\xi^2$$
whenever $|\xi|\le\zeta$; such exists because $\lim_{\xi\to0}\frac{(1+\xi)^p-1-p\xi}{\xi^2}=\frac{p(p-1)}2<\frac{p^2}2$. Then
$$(1+\xi)^p\le\Bigl(1+\frac1\zeta\Bigr)^p+\xi^p+2p\xi^{p-1}$$
for every $\xi\ge0$. If $\xi\le\frac1\zeta$ this is trivial. Otherwise $\frac1\xi\le\zeta$, so
$$(1+\xi)^p=\xi^p\Bigl(1+\frac1\xi\Bigr)^p\le\xi^p\Bigl(1+\frac p\xi+\frac{p^2}{2\xi^2}\Bigr)=\xi^p+p\xi^{p-1}\Bigl(1+\frac p{2\xi}\Bigr)\le\xi^p+p\xi^{p-1}\Bigl(1+\frac{p\zeta}2\Bigr)\le\xi^p+2p\xi^{p-1}.$$
Now take $\eta>0$ such that $3p\eta^{p-1}=\frac{p\epsilon}2$ (this is one of the places where we need to know that $p>1$).

Let $\delta>0$ be such that
$$\delta\le\eta\zeta,\qquad\frac{p^2\delta}{2\eta^2}+\Bigl(1+\frac1\zeta\Bigr)^p\delta^{p-1}\le\frac{p\epsilon}2.$$
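(b) Now suppose that $S$ and $X$ are independent random variables with $\|S\|_p=1$, $\|X\|_p\le\delta$ and $\mathbb E(X)=0$. Write $(\Omega,\Sigma,\mu)$ for the underlying probability space and adjust $S$ and $X$ on negligible sets so that they are measurable and defined everywhere on $\Omega$. Set
$$\beta=\|X\|_p,\quad\gamma=\beta/\eta,\quad E=\{\omega:S(\omega)\ne0\},\quad F=\{\omega:|X(\omega)|>\gamma|S(\omega)|\},\quad\theta=\|S\times\chi F\|_p.$$
Then
$$\int|S+X|^p=\int_{E\setminus F}|S+X|^p+\int_F|S+X|^p$$
(because $S$ and $X$ are both zero on $\Omega\setminus(E\cup F)$)
$$=\int_{E\setminus F}|S|^p\Bigl|1+\frac XS\Bigr|^p+\|(S\times\chi F)+(X\times\chi F)\|_p^p\le\int_{E\setminus F}|S|^p\Bigl(1+p\frac XS+\frac{p^2}2\gamma^2\Bigr)+(\|S\times\chi F\|_p+\|X\times\chi F\|_p)^p$$
(because $|\frac XS|\le\gamma\le\delta/\eta\le\zeta\le1$ everywhere on $E\setminus F$)
$$\le\Bigl(1+\frac{p^2}2\gamma^2\Bigr)\int_{E\setminus F}|S|^p+p\int_{E\setminus F}|S|^{p-1}\times\operatorname{sgn}S\times X+(\theta+\beta)^p$$
(writing $\operatorname{sgn}(\xi)=\xi/|\xi|$ if $\xi\ne0$, $0$ if $\xi=0$)
$$=\Bigl(1+\frac{p^2}2\gamma^2\Bigr)\int_{\Omega\setminus F}|S|^p+p\int_{\Omega\setminus F}|S|^{p-1}\times\operatorname{sgn}S\times X+(\theta+\beta)^p$$
(because $S=0$ on $\Omega\setminus E$)
$$=\Bigl(1+\frac{p^2}2\gamma^2\Bigr)(1-\theta^p)-p\int_F|S|^{p-1}\times\operatorname{sgn}S\times X+\beta^p\Bigl(1+\frac\theta\beta\Bigr)^p$$
(because $X$ and $|S|^{p-1}\times\operatorname{sgn}S$ are independent, by 272L, so $\int|S|^{p-1}\times\operatorname{sgn}S\times X=\mathbb E(|S|^{p-1}\times\operatorname{sgn}S)\mathbb E(X)=0$)
$$\le\Bigl(1+\frac{p^2}2\gamma^2\Bigr)(1-\theta^p)+p\int_F|S|^{p-1}\times|X|+\beta^p\Bigl(\Bigl(1+\frac1\zeta\Bigr)^p+\Bigl(\frac\theta\beta\Bigr)^p+2p\Bigl(\frac\theta\beta\Bigr)^{p-1}\Bigr)$$
(see (a) above)
$$\le\Bigl(1+\frac{p^2}2\gamma^2\Bigr)(1-\theta^p)+\frac p{\gamma^{p-1}}\int_F|X|^p+\beta^p\Bigl(1+\frac1\zeta\Bigr)^p+\theta^p+2p\theta^{p-1}\beta$$
(because $|S|<|X|/\gamma$ on $F$)
$$\le1+\frac{p^2}2\gamma^2+p\frac{\beta^p}{\gamma^{p-1}}+\beta^p\Bigl(1+\frac1\zeta\Bigr)^p+2p\frac{\beta^p}{\gamma^{p-1}}$$
(because $\theta=\|S\times\chi F\|_p\le\frac1\gamma\|X\times\chi F\|_p\le\frac\beta\gamma$)
$$=1+\frac{p^2\beta^2}{2\eta^2}+3p\eta^{p-1}\beta+\beta^p\Bigl(1+\frac1\zeta\Bigr)^p=1+\beta\Bigl(\frac{p^2\beta}{2\eta^2}+3p\eta^{p-1}+\beta^{p-1}\Bigl(1+\frac1\zeta\Bigr)^p\Bigr)$$
$$\le1+\beta\Bigl(\frac{p^2\delta}{2\eta^2}+3p\eta^{p-1}+\delta^{p-1}\Bigl(1+\frac1\zeta\Bigr)^p\Bigr)\le1+p\beta\epsilon\le(1+\epsilon\|X\|_p)^p.$$
So $\|S+X\|_p\le1+\epsilon\|X\|_p$, as required.

*Remark What is really happening here is that $\phi=\|\;\|_p^p:L^p\to\mathbb R$ is differentiable (as a real-valued function on the normed space $L^p$) and $\phi'(S^\bullet)(X^\bullet)=p\int|S|^{p-1}\times\operatorname{sgn}S\times X$, so that in the context here
$$\phi((S+X)^\bullet)=\phi(S^\bullet)+\phi'(S^\bullet)(X^\bullet)+o(\|X\|_p)=1+o(\|X\|_p)$$
and $\|S+X\|_p=1+o(\|X\|_p)$. The calculations above are elaborate partly because they do not appeal to any non-trivial ideas about normed spaces, and partly because we need the estimates to be uniform in $S$.

273N Theorem Let $\langle X_n\rangle_{n\in\mathbb N}$ be an independent sequence of real-valued random variables with zero expectation, and set $Y_n=\frac1{n+1}(X_0+\dots+X_n)$ for each $n\in\mathbb N$.

(a) If $\langle X_n\rangle_{n\in\mathbb N}$ is uniformly integrable, then $\lim_{n\to\infty}\|Y_n\|_1=0$.

*(b) If $p\in\,]1,\infty[$ and $\sup_{n\in\mathbb N}\|X_n\|_p<\infty$, then $\lim_{n\to\infty}\|Y_n\|_p=0$.

proof (a) Let $\epsilon>0$. Then there is an $M\ge0$ such that $\mathbb E((|X_n|-M)^+)\le\epsilon$ for every $n\in\mathbb N$. Set $X'_n=(X_n\wedge M)\vee(-M)$ (that is, $X_n$ truncated at $\pm M$),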
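$$\alpha_n=\mathbb E(X'_n),\qquad\tilde X_n=X'_n-\alpha_n,\qquad X''_n=X_n-X'_n$$
for each $n\in\mathbb N$. Then $\langle X'_n\rangle_{n\in\mathbb N}$ and $\langle\tilde X_n\rangle_{n\in\mathbb N}$ are independent and uniformly bounded, and $\|X''_n\|_1\le\epsilon$ for every $n$. So if we write
$$\tilde Y_n=\frac1{n+1}\sum_{i=0}^n\tilde X_i,\qquad Y''_n=\frac1{n+1}\sum_{i=0}^nX''_i,$$
$\langle\tilde Y_n\rangle_{n\in\mathbb N}\to0$ almost everywhere, by 273E (for instance), while $\|Y''_n\|_1\le\epsilon$ for every $n$. Moreover,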
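$$|\alpha_n|=|\mathbb E(X'_n-X_n)|\le\mathbb E(|X''_n|)\le\epsilon$$
for every $n$. Set $\bar\alpha_n=\frac1{n+1}\sum_{i=0}^n\alpha_i$, so that $|\bar\alpha_n|\le\epsilon$ and $Y_n=\tilde Y_n+Y''_n+\bar\alpha_n$. As $|\tilde Y_n|\le2M$ almost everywhere for each $n$, $\lim_{n\to\infty}\|\tilde Y_n\|_1=0$, by Lebesgue’s Dominated Convergence Theorem. So
$$\limsup_{n\to\infty}\|Y_n\|_1=\limsup_{n\to\infty}\|\tilde Y_n+Y''_n+\bar\alpha_n\|_1\le\lim_{n\to\infty}\|\tilde Y_n\|_1+\sup_{n\in\mathbb N}\|Y''_n\|_1+\sup_{n\in\mathbb N}|\bar\alpha_n|\le2\epsilon.$$
As $\epsilon$ is arbitrary, $\lim_{n\to\infty}\|Y_n\|_1=0$, as claimed.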
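The truncation in part (a) splits each $X_n$ as $X'_n+X''_n$ with $X'_n=(X_n\wedge M)\vee(-M)$. The estimate $|\alpha_n|\le\mathbb E(|X''_n|)=\mathbb E((|X_n|-M)^+)$ uses $\mathbb E(X_n)=0$. A minimal Python sketch makes the bookkeeping concrete. It uses an assumed toy random variable with finitely many values and zero expectation, and the function name is hypothetical.

```python
from fractions import Fraction


def truncation_summary(dist, M):
    """For X with finite distribution dist (value -> probability) and X' = (X ∧ M) ∨ (−M),
    return (E(X'), E|X − X'|, E((|X| − M)^+))."""
    def clip(x):
        return max(-M, min(x, M))                                    # X' = (X ∧ M) ∨ (−M)
    e_clipped = sum(p * clip(x) for x, p in dist.items())
    e_tail = sum(p * abs(x - clip(x)) for x, p in dist.items())      # E|X''|
    e_excess = sum(p * max(abs(x) - M, 0) for x, p in dist.items())  # E((|X| − M)^+)
    return e_clipped, e_tail, e_excess


# X = 9 with probability 1/10 and X = -1 with probability 9/10, so E(X) = 0
X = {Fraction(9): Fraction(1, 10), Fraction(-1): Fraction(9, 10)}
alpha, tail, excess = truncation_summary(X, Fraction(2))
print(alpha, tail, excess)   # -7/10 7/10 7/10
```

In this example the bound $|\alpha_n|\le\mathbb E(|X''_n|)$ holds with equality, because $X''_n$ never changes sign.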
*(b) Set $M=\sup_{n\in\mathbb N}\|X_n\|_p$. For $n\in\mathbb N$, set $S_n=\sum_{i=0}^nX_i$. Let $\epsilon>0$. Then there is a $\delta>0$ such that $\|S+X\|_p\le1+\epsilon\|X\|_p$ whenever $S$ and $X$ are independent random variables, $\|S\|_p=1$, $\|X\|_p\le\delta$ and $\mathbb E(X)=0$ (273M). It follows that $\|S+X\|_p\le\|S\|_p+\epsilon\|X\|_p$ whenever $S$ and $X$ are independent random variables, $\|S\|_p$ is finite, $\|X\|_p\le\delta\|S\|_p$ and $\mathbb E(X)=0$. In particular, since $S_n$ and $X_{n+1}$ are independent (272L), $\|S_{n+1}\|_p\le\|S_n\|_p+\epsilon M$ whenever $\|S_n\|_p\ge M/\delta$. An easy induction shows that
$$\|S_n\|_p\le\frac M\delta+M+n\epsilon M$$
for every $n\in\mathbb N$. But this means that
$$\limsup_{n\to\infty}\|Y_n\|_p=\limsup_{n\to\infty}\frac1{n+1}\|S_n\|_p\le\epsilon M.$$
As $\epsilon$ is arbitrary, $\lim_{n\to\infty}\|Y_n\|_p=0$.

Remark There are strengthenings of (a) in 276Xd, and of (b) in 276Ya.

273X Basic exercises (a) In part (b) of the proof of 273B, use Bienaymé’s equality to show that $\lim_{m\to\infty}\sup_{n\ge m}\Pr(|S_n-S_m|\ge\epsilon)=0$ for every $\epsilon>0$, so that we can apply the argument of part (a) of the proof directly, without appealing to 242F or 245G or even 244E.

(b) Show that $\sum_{n=0}^\infty\frac{(-1)^{\omega(n)}}{n+1}$ is defined in $\mathbb R$ for almost every $\omega=\langle\omega(n)\rangle_{n\in\mathbb N}$ in $\{0,1\}^{\mathbb N}$, where $\{0,1\}^{\mathbb N}$ is given its usual measure (254J).

>(c) Take any $q\in[0,1]$, and give $\mathcal P\mathbb N$ a measure $\mu$ such that $\mu\{a:I\subseteq a\}=q^{\#(I)}$ for every $I\subseteq\mathbb N$, as in 254Xg. Show that for $\mu$-almost every $a\subseteq\mathbb N$, $\lim_{n\to\infty}\frac1{n+1}\#(a\cap\{0,\dots,n\})=q$.

>(d) Let $\mu$ be the usual probability measure on $\mathcal P\mathbb N$ (254Jb), and for $r\ge1$ let $\mu_r$ be the product probability measure on $(\mathcal P\mathbb N)^r$. Show that
$$\lim_{n\to\infty}\frac1{n+1}\#(a_1\cap\dots\cap a_r\cap\{0,\dots,n\})=2^{-r},\qquad\lim_{n\to\infty}\frac1{n+1}\#((a_1\cup\dots\cup a_r)\cap\{0,\dots,n\})=1-2^{-r}$$
for $\mu_r$-almost every $(a_1,\dots,a_r)\in(\mathcal P\mathbb N)^r$.

(e) Let $\mu$ be the usual probability measure on $\mathcal P\mathbb N$, and take any infinite $b\subseteq\mathbb N$. Show that $\lim_{n\to\infty}\#(a\cap b\cap\{0,\dots,n\})/\#(b\cap\{0,\dots,n\})=\frac12$ for almost every $a\subseteq\mathbb N$.

>(f) For each $x\in[0,1]$, let $\epsilon_k(x)$ be the $k$th digit in the decimal expansion of $x$ (choose for yourself what to do with $0{\cdot}100\ldots=0{\cdot}099\ldots$). Show that $\lim_{k\to\infty}\frac1k\#(\{j:j\le k,\ \epsilon_j(x)=7\})=\frac1{10}$ for almost every $x\in[0,1]$.
(g) Let $\langle F_n\rangle_{n\in\mathbb N}$ be a sequence of distribution functions for real-valued random variables, in the sense of 271Ga, and $F$ another distribution function; suppose that $\lim_{n\to\infty}F_n(q)=F(q)$ for every $q\in\mathbb Q$ and $\lim_{n\to\infty}F_n(a^-)=F(a^-)$ whenever $F(a^-)<F(a)$, where I write $F(a^-)$ for $\lim_{x\uparrow a}F(x)$. Show that $F_n\to F$ uniformly.

(h) Let $(\Omega,\Sigma,\mu)$ be a probability space and $\langle X_n\rangle_{n\in\mathbb N}$ an independent identically distributed sequence of real-valued random variables on $\Omega$ with common distribution function $F$. For $a\in\mathbb R$, $n\in\mathbb N$, $\omega\in\bigcap_{i\le n}\operatorname{dom}X_i$ set
$$F_n(\omega,a)=\frac1{n+1}\#(\{i:i\le n,\ X_i(\omega)\le a\}).$$
Show that $\lim_{n\to\infty}\sup_{a\in\mathbb R}|F_n(\omega,a)-F(a)|=0$ for almost every $\omega\in\Omega$.

(i) Find an independent sequence $\langle X_n\rangle_{n\in\mathbb N}$ of random variables with zero expectation such that $\|X_n\|_1=1$ and $\|\frac1{n+1}\sum_{i=0}^nX_i\|_1\ge\frac12$ for every $n\in\mathbb N$. (Hint: take $\Pr(X_n\ne0)$ very small.)

(j) Use 272R to prove 273Nb in the case $p=2$.

(k) Find an independent sequence $\langle X_n\rangle_{n\in\mathbb N}$ of random variables with zero expectation such that $\|X_n\|_\infty=\|\frac1{n+1}\sum_{i=0}^nX_i\|_\infty=1$ for every $n\in\mathbb N$.

(l) Repeat the work of this section for complex-valued random variables.

(m) Let $\langle E_n\rangle_{n\in\mathbb N}$ be an independent sequence of measurable sets in a probability space, all with the same non-zero measure. Let $\langle a_n\rangle_{n\in\mathbb N}$ be a sequence of non-negative real numbers such that $\sum_{n=0}^\infty a_n=\infty$. Show that $\sum_{n=0}^\infty a_n\chi E_n=\infty$ a.e. (Hint: Take a strictly increasing sequence $\langle k_n\rangle_{n\in\mathbb N}$ such that $d_n=\sum_{i=k_n+1}^{k_{n+1}}a_i\ge1$ for each $n$. Set $c_i=\frac{a_i}{(n+1)d_n}$ for $k_n<i\le k_{n+1}$; show that $\sum_{n=0}^\infty c_n^2<\infty=\sum_{n=0}^\infty c_n$. Apply 273D with $X_n=c_n\chi E_n$ and $b_n=\sqrt{\sum_{i=0}^nc_i}$.)

273Y Further exercises (a) Let $(\Omega,\Sigma,\mu)$ be a probability space, and $\lambda$ the product measure on $\Omega^{\mathbb N}$. Suppose that $f$ is a real-valued function, defined on a subset of $\Omega$, such that
$$h(\boldsymbol\omega)=\lim_{n\to\infty}\frac1{n+1}\sum_{i=0}^nf(\omega_i)$$
exists in $\mathbb R$ for $\lambda$-almost every $\boldsymbol\omega=\langle\omega_n\rangle_{n\in\mathbb N}$ in $\Omega^{\mathbb N}$. Show (i) that $f$ has conegligible domain (ii) $f$ is measurable for the completion of $\mu$ (iii) there is an $a\in\mathbb R$ such that $h=a$ almost everywhere in $\Omega^{\mathbb N}$ (iv) $f$ is integrable, with $\int f\,d\mu=a$.

(b) Repeat the work of this section for random variables taking values in $\mathbb R^r$.

273 Notes and comments I have tried in this section to offer the most useful of the standard criteria for pointwise convergence of averages of independent random variables. In my view the strong law of large numbers, like Fubini’s theorem, is one of the crucial steps in measure theory, where the subject changes character. Theorems depending on the strong law have a kind of depth and subtlety to them which is missing in other parts of the subject. I have described only a handful of applications here, but I hope that 273G, 273J, 273Xc, 273Xf and 273Xh will give an idea of what is to be expected. These do have rather different weights. Of the four, only 273J requires the full resources of this chapter; the others can be deduced from the essentially simpler version in 273Xh. 273Xh is the ‘fundamental theorem of statistics’ or ‘Glivenko-Cantelli theorem’. The $F_n(\cdot,a)$ are ‘statistics’, computed from the $X_i$; they are the ‘empirical distributions’, and the theorem says that, almost surely, $F_n\to F$ uniformly. (I say ‘uniformly’ to make the result look more striking, but of course the real content is that $F_n(\cdot,a)\to F(a)$ almost surely for each $a$; the extra step is just 273Xg.)
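273Xh, the Glivenko-Cantelli theorem, can be illustrated with pseudo-random uniform samples, for which $F(a)=a$ on $[0,1]$. The Python sketch below uses a hypothetical helper name and assumes that pseudo-random numbers behave like an independent uniform sequence. It computes $\sup_a|F_n(\omega,a)-F(a)|$ exactly from the sorted sample. This works because the empirical distribution function only jumps at sample points, so the supremum is attained, or approached, at those points from one side or the other.

```python
import random


def sup_distance_uniform(N, seed=0):
    """sup_a |F_N(a) - a| for the empirical distribution function F_N of N
    pseudo-random points of [0, 1] (N plays the role of n + 1 in 273Xh)."""
    rng = random.Random(seed)
    xs = sorted(rng.random() for _ in range(N))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # F_N jumps from (i-1)/N to i/N at x: compare both one-sided values with F(x) = x
        d = max(d, i / N - x, x - (i - 1) / N)
    return d


for N in (100, 10_000, 100_000):
    print(N, sup_distance_uniform(N))
```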
I have included 273N to show that independence is quite as important in questions of norm-convergence as it is in questions of pointwise convergence. It does not really rely on any form of the strong law; I quoted 273E as a quick way of disposing of the ‘uniformly bounded parts’ $X'_n$, but of course Bienaymé’s equality (272R) is already enough to show that if $\langle X'_n\rangle_{n\in\mathbb N}$ is an independent uniformly bounded sequence of random variables with zero expectation, then $\|\frac1{n+1}(X'_0+\dots+X'_n)\|_p\to0$ for $p=2$, and therefore for every $p<\infty$. The proofs of 273H, 273I and 273Na all involve ‘truncation’: the expression of a random variable $X$ as the sum of a bounded random variable and a tail. This is one of the most powerful techniques in the subject, and will appear again in §§274 and 276. In 273Na I used a slightly different formulation of the method, solely because it matched the definition of ‘uniformly integrable’ more closely.
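Truncation is cheap in 273I because of the elementary bound $f(x)=\sum_nf_n(x)\le2+|x|$ from part (a) of that proof. There $f_n(x)=x^2/(n+1)^2$ for $|x|\le n$ and $f_n(x)=0$ otherwise. A short Python check, with a hypothetical helper name, computes an upper bound for this series. It adds the tail estimate $\sum_{n>N}(n+1)^{-2}\le1/(N+1)$, the same telescoping comparison used in the proof, to a finite partial sum.

```python
import math


def truncation_series_upper(x, N=100_000):
    """Upper bound for sum_n f_n(x), where f_n(x) = x^2/(n+1)^2 if |x| <= n and 0 otherwise:
    the partial sum over n <= N plus x^2/(N+1), which dominates the remaining tail."""
    start = math.ceil(abs(x))          # f_n(x) = 0 whenever n < |x|
    partial = sum(x * x / (n + 1) ** 2 for n in range(start, N + 1))
    return partial + x * x / (N + 1)


for x in (0.0, 0.5, 1.0, 1.5, 3.7, 10.0, 100.0):
    print(x, truncation_series_upper(x), 2 + abs(x))
```

For $|x|>1$ the printed values also stay below $|x|$, matching the sharper estimate in the proof.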
274 The central limit theorem

The second of the great theorems to which this chapter is devoted is of a new type. It is a limit theorem, but the limit involved is a limit of distributions, not of functions (as in the strong law above or the martingale theorem below), nor of equivalence classes of functions (as in Chapter 24). I give three forms of the theorem, in 274I-274K, all drawn as corollaries of Theorem 274G; the proof is spread over 274C-274G. In 274A-274B and 274M I give the most elementary properties of the normal distribution.

274A The normal distribution We need some facts from basic probability theory.

(a) Recall that $\int_{-\infty}^\infty e^{-x^2/2}\,dx=\sqrt{2\pi}$ (263G). Consequently, if we set
$$\mu_GE=\frac1{\sqrt{2\pi}}\int_Ee^{-x^2/2}\,dx$$
for every Lebesgue measurable set $E$, $\mu_G$ is a Radon probability measure (256E); we call it the standard normal distribution. The corresponding distribution function is
$$\Phi(a)=\mu_G\,]-\infty,a]=\frac1{\sqrt{2\pi}}\int_{-\infty}^ae^{-x^2/2}\,dx$$
for $a\in\mathbb R$; for the rest of this section I will reserve the symbol $\Phi$ for this function. Writing $\Sigma$ for the algebra of Lebesgue measurable subsets of $\mathbb R$, $(\mathbb R,\Sigma,\mu_G)$ is a probability space. Note that it is complete, and has the same negligible sets as Lebesgue measure, because $e^{-x^2/2}>0$ for every $x$ (cf. 234Dc).

(b) A random variable $X$ is standard normal if its distribution is $\mu_G$; that is, if the function $x\mapsto\frac1{\sqrt{2\pi}}e^{-x^2/2}$ is a density function for $X$. The point of the remarks in (a) is that there are such random variables; for instance, take the probability space $(\mathbb R,\Sigma,\mu_G)$ there, and set $X(x)=x$ for every $x\in\mathbb R$.

(c) If $X$ is a standard normal random variable, then
$$\mathbb E(X)=\frac1{\sqrt{2\pi}}\int_{-\infty}^\infty xe^{-x^2/2}\,dx=0,\qquad\mathrm{Var}(X)=\frac1{\sqrt{2\pi}}\int_{-\infty}^\infty x^2e^{-x^2/2}\,dx=1$$
by 263H.

(d) More generally, a random variable $X$ is normal if there are $a\in\mathbb R$, $\sigma>0$ such that $Z=(X-a)/\sigma$ is standard normal. In this case $X=\sigma Z+a$ so
$$\mathbb E(X)=\sigma\mathbb E(Z)+a=a,\qquad\mathrm{Var}(X)=\sigma^2\mathrm{Var}(Z)=\sigma^2.$$
We have, for any $c\in\mathbb R$,
$$\frac1{\sigma\sqrt{2\pi}}\int_{-\infty}^ce^{-(x-a)^2/2\sigma^2}\,dx=\frac1{\sqrt{2\pi}}\int_{-\infty}^{(c-a)/\sigma}e^{-y^2/2}\,dy$$
(substituting $x=a+\sigma y$ for $-\infty<y\le(c-a)/\sigma$) $=\Pr(Z\le\frac{c-a}\sigma)=\Pr(X\le c)$. So $x\mapsto\frac1{\sigma\sqrt{2\pi}}e^{-(x-a)^2/2\sigma^2}$ is a density function for $X$ (271Ib). Conversely, of course, a random variable with this density function is normal, with expectation $a$ and variance $\sigma^2$.

(e) If $Z$ is standard normal, so is $-Z$, because
$$\Pr(-Z\le a)=\Pr(Z\ge-a)=\frac1{\sqrt{2\pi}}\int_{-a}^\infty e^{-x^2/2}\,dx=\frac1{\sqrt{2\pi}}\int_{-\infty}^ae^{-x^2/2}\,dx.$$
The definition in the first sentence of (d) now makes it obvious that if $X$ is normal, so is $a+bX$ for any $a\in\mathbb R$, $b\in\mathbb R\setminus\{0\}$.

274B Proposition Let $X_1,\dots,X_n$ be independent normal random variables. Then $Y=X_1+\dots+X_n$ is normal, with $\mathbb E(Y)=\mathbb E(X_1)+\dots+\mathbb E(X_n)$ and $\mathrm{Var}(Y)=\mathrm{Var}(X_1)+\dots+\mathrm{Var}(X_n)$.

proof There are innumerable proofs of this fact; the following one gives me a chance to show off the power of Chapter 26, but of course (at the price of some disagreeable algebra) 272T also gives the result.

(a) Consider first the case $n=2$. Setting $a_i=\mathbb E(X_i)$, $\sigma_i=\sqrt{\mathrm{Var}(X_i)}$, $Z_i=(X_i-a_i)/\sigma_i$ we get independent standard normal variables $Z_1$, $Z_2$. Set $\rho=\sqrt{\sigma_1^2+\sigma_2^2}$, and express $\sigma_1$, $\sigma_2$ as $\rho\cos\theta$, $\rho\sin\theta$. Consider $U=\cos\theta\,Z_1+\sin\theta\,Z_2$. We know that $(Z_1,Z_2)$ has a density function
$$(\zeta_1,\zeta_2)\mapsto g(\zeta_1,\zeta_2)=\frac1{2\pi}e^{-(\zeta_1^2+\zeta_2^2)/2}$$
(272I). Consequently, for any $c\in\mathbb R$,
$$\Pr(U\le c)=\int_Fg(z)\,dz,$$
where $F=\{(\zeta_1,\zeta_2):\zeta_1\cos\theta+\zeta_2\sin\theta\le c\}$. But now let $T$ be the matrix
$$\begin{pmatrix}\cos\theta&-\sin\theta\\\sin\theta&\cos\theta\end{pmatrix}.$$
Then it is easy to check that $T^{-1}[F]=\{(\eta_1,\eta_2):\eta_1\le c\}$, $\det T=1$, $g(Ty)=g(y)$ for every $y\in\mathbb R^2$, so by 263A
$$\Pr(U\le c)=\int_Fg(z)\,dz=\int_{T^{-1}[F]}g(Ty)\,dy=\int_{]-\infty,c]\times\mathbb R}g(y)\,dy=\Pr(Z_1\le c)=\Phi(c).$$
As this is true for every $c\in\mathbb R$, $U$ is also standard normal (I am appealing to 271Ga again). But
$$X_1+X_2=\sigma_1Z_1+\sigma_2Z_2+a_1+a_2=\rho U+a_1+a_2,$$
so $X_1+X_2$ is also normal.

(b) Now we can induce on $n$. If $n=1$ the result is trivial. For the inductive step to $n+1\ge2$, we know that $X_1+\dots+X_n$ is normal, by the inductive hypothesis, and that $X_{n+1}$ is independent of $X_1+\dots+X_n$, by 272L. So $X_1+\dots+X_n+X_{n+1}$ is normal, by (a). The computation of the expectation and variance of $X_1+\dots+X_n$ is immediate from 271Ab and 272R.
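274A and 274B can be checked numerically. $\Phi$ is available through the error function as $\Phi(a)=\frac12(1+\operatorname{erf}(a/\sqrt2))$. The density of $X_1+X_2$ is the convolution of the two normal densities (272T), and it should coincide with the normal density of expectation $a_1+a_2$ and variance $\sigma_1^2+\sigma_2^2$. This Python sketch uses a plain trapezoidal rule. The helper names, the integration window $[-40,40]$, the step count and the sample parameters are all assumptions chosen for illustration.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal random variable with the given expectation and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def Phi(a):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


def convolution_density(f, g, x, lo=-40.0, hi=40.0, steps=8000):
    """Trapezoidal approximation to (f * g)(x) = integral of f(t) g(x - t) dt over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) * g(x - lo) + f(hi) * g(x - hi))
    for k in range(1, steps):
        t = lo + k * h
        total += f(t) * g(x - t)
    return total * h


a1, s1, a2, s2 = 0.5, 1.0, -1.0, 2.0          # assumed sample parameters
f1 = lambda t: normal_pdf(t, a1, s1)
f2 = lambda t: normal_pdf(t, a2, s2)
for x in (-3.0, -0.5, 0.0, 2.0):
    approx = convolution_density(f1, f2, x)
    exact = normal_pdf(x, a1 + a2, math.sqrt(s1 ** 2 + s2 ** 2))
    print(x, approx, exact)
print(Phi(0.0), Phi(1.96))
```

The check is numerical, not a proof. The rotation argument of 274B(a) is what shows that the agreement is exact.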
274C Lemma Let U0 , . . . , Un , V0 , . . . , Vn be independent real-valued random variables and h : R → R a bounded Borel measurable function. Then ¡ Pn ¢ ¡ ¢ Pn Pn |E h( i=0 Ui ) − h( i=0 Vi ) | ≤ i=0 supt∈R |E h(t + Ui ) − h(t + Vi ) |. Pj−1 Pn Pn Pn proof For 0 ≤ j ≤ n + 1, set Zj = i=0 Ui + i=j Vi , taking Z0 = i=0 Vi and Zn+1 = i=0 Ui , and for Pj−1 Pn j ≤ n set Wj = i=0 Uj + i=j+1 Vj , so that Zj = Wj + Vj and Zj+1 = Wj + Uj and Wj , Uj and Vj are independent (I am appealing to 272K, as in 272L). Then n n n X ¡ X ¢ ¡X ¢ |E h( Ui ) − h( Vi ) | = |E h(Zi+1 ) − h(Zi ) | i=0
i=0
i=0
≤
n X
¡ ¢ |E h(Zi+1 ) − h(Zi ) |
i=0
=
n X
¡ ¢ |E h(Wi + Ui ) − h(Wi + Vi ) |.
i=0
To estimate this sum I turn it into a sum of integrals, as follows. For each i, let νWi be the distribution of Wi , and so on. Because (w, u) 7→ w + u is continuous, therefore Borel measurable, (w, u) 7→ h(w, u) is also Borel measurable; accordingly (w, u, v) 7→ h(w + u) − h(w + v) is measurable for each of the product measures νWi × νUi × νVi on R 3 , and 271E and 272G give us ¯ ¡ ¢¯ ¯E h(Wi + Ui )−h(Wi + Vi ) ¯ Z ¯ ¯ ¯ = h(w + u) − h(w + v)(νWi × νUi × νVi )d(w, u, v)¯ Z Z ¯ ¡ ¯ ¢ =¯ h(w + u) − h(w + v)(νUi × νVi )d(u, v) νWi (dw)¯ Z Z ¯ ¯ ≤ ¯ h(w + u) − h(w + v)(νUi × νVi )d(u, v)¯νWi (dw) Z ¯ ¡ ¢¯ = ¯E h(w + Ui ) − h(w + Vi ) ¯νWi (dw) ¯ ¡ ¢¯ ≤ sup¯E h(t + Ui ) − h(t + Vi ) ¯. t∈R
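So we get
$$\Bigl|\mathbb{E}\Bigl(h\bigl(\textstyle\sum_{i=0}^nU_i\bigr)-h\bigl(\sum_{i=0}^nV_i\bigr)\Bigr)\Bigr|\le\sum_{i=0}^n\bigl|\mathbb{E}\bigl(h(W_i+U_i)-h(W_i+V_i)\bigr)\bigr|\le\sum_{i=0}^n\sup_{t\in\mathbb{R}}\bigl|\mathbb{E}\bigl(h(t+U_i)-h(t+V_i)\bigr)\bigr|,$$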
as required.

274D Lemma Let $h:\mathbb{R}\to\mathbb{R}$ be a bounded three-times-differentiable function such that $M_2=\sup_{x\in\mathbb{R}}|h''(x)|$ and $M_3=\sup_{x\in\mathbb{R}}|h'''(x)|$ are both finite. Let $\epsilon>0$.
(a) Let $U$ be a real-valued random variable of zero expectation and finite variance $\sigma^2$. Then for any $t\in\mathbb{R}$ we have
$$\Bigl|\mathbb{E}\bigl(h(t+U)\bigr)-h(t)-\frac{\sigma^2}{2}h''(t)\Bigr|\le\frac16\epsilon M_3\sigma^2+M_2\,\mathbb{E}\bigl(\psi_\epsilon(U)\bigr),$$
where $\psi_\epsilon(x)=0$ if $|x|\le\epsilon$, and $\psi_\epsilon(x)=x^2$ if $|x|>\epsilon$.
(b) Let $U_0,\dots,U_n,V_0,\dots,V_n$ be independent random variables with finite variances, and suppose that $\mathbb{E}(U_i)=\mathbb{E}(V_i)=0$ and $\mathrm{Var}(U_i)=\mathrm{Var}(V_i)=\sigma_i^2$ for every $i\le n$. Then
$$\Bigl|\mathbb{E}\Bigl(h\bigl(\textstyle\sum_{i=0}^nU_i\bigr)-h\bigl(\sum_{i=0}^nV_i\bigr)\Bigr)\Bigr|\le\frac13\epsilon M_3\sum_{i=0}^n\sigma_i^2+M_2\sum_{i=0}^n\mathbb{E}\bigl(\psi_\epsilon(U_i)\bigr)+M_2\sum_{i=0}^n\mathbb{E}\bigl(\psi_\epsilon(V_i)\bigr).$$
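The estimate in 274D(a) can be tested numerically. The sketch below uses assumptions chosen for convenience:

- $h=\sin$, so that $M_2=M_3=1$;
- $U$ is a two-point random variable taking the value $-2$ with probability $1/21$ and $0.1$ with probability $20/21$, so that $\mathbb{E}(U)=0$ and $\sigma^2=0.2$.

The helper names `taylor_error` and `lindeberg_bound` are hypothetical. The check compares the worst left-hand side over a grid of values of $t$ with the right-hand side for several values of $\epsilon$.

```python
import math

# Hypothetical test case: h = sin, so |h''| <= 1 and |h'''| <= 1, i.e. M2 = M3 = 1.
def h(x):
    return math.sin(x)

def h2(x):
    """Second derivative of sin."""
    return -math.sin(x)

M2 = M3 = 1.0

# Two-point distribution with zero mean: -2 w.p. 1/21, 0.1 w.p. 20/21.
values = (-2.0, 0.1)
probs = (1 / 21, 20 / 21)
sigma2 = sum(p * u * u for u, p in zip(values, probs))  # equals 0.2

def taylor_error(t):
    """|E h(t+U) - h(t) - (sigma^2/2) h''(t)|, the left-hand side of 274D(a)."""
    mean_h = sum(p * h(t + u) for u, p in zip(values, probs))
    return abs(mean_h - h(t) - 0.5 * sigma2 * h2(t))

def lindeberg_bound(eps):
    """(1/6) eps M3 sigma^2 + M2 E(psi_eps(U)), the right-hand side of 274D(a)."""
    tail = sum(p * u * u for u, p in zip(values, probs) if abs(u) > eps)
    return eps * M3 * sigma2 / 6 + M2 * tail

ts = [k / 10 for k in range(-60, 61)]
worst_lhs = max(taylor_error(t) for t in ts)
for eps in (0.05, 0.5, 1.0, 3.0):
    print(eps, round(worst_lhs, 4), round(lindeberg_bound(eps), 4))
```

Because the right-hand side does not depend on $t$, comparing it with the largest left-hand value over the grid is enough.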
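proof (a) The point is that, by Taylor's theorem with remainder,
$$|h(t+x)-h(t)-xh'(t)|\le\tfrac12M_2x^2,\qquad\bigl|h(t+x)-h(t)-xh'(t)-\tfrac12x^2h''(t)\bigr|\le\tfrac16M_3|x|^3$$
for every $x\in\mathbb{R}$. So
$$\bigl|h(t+x)-h(t)-xh'(t)-\tfrac12x^2h''(t)\bigr|\le\min\bigl(\tfrac16M_3|x|^3,\,M_2x^2\bigr)\le\tfrac16\epsilon M_3x^2+M_2\psi_\epsilon(x).$$
Integrating with respect to the distribution of $U$, we get
$$\begin{aligned}\bigl|\mathbb{E}\bigl(h(t+U)\bigr)-h(t)-\tfrac12h''(t)\sigma^2\bigr|&=\bigl|\mathbb{E}\bigl(h(t+U)\bigr)-h(t)-h'(t)\mathbb{E}(U)-\tfrac12h''(t)\mathbb{E}(U^2)\bigr|\\&=\bigl|\mathbb{E}\bigl(h(t+U)-h(t)-h'(t)U-\tfrac12h''(t)U^2\bigr)\bigr|\\&\le\mathbb{E}\bigl(\bigl|h(t+U)-h(t)-h'(t)U-\tfrac12h''(t)U^2\bigr|\bigr)\\&\le\mathbb{E}\bigl(\tfrac16\epsilon M_3U^2+M_2\psi_\epsilon(U)\bigr)\\&=\tfrac16\epsilon M_3\sigma^2+M_2\,\mathbb{E}\bigl(\psi_\epsilon(U)\bigr),\end{aligned}$$
as claimed.
(b) By 274C,
$$\begin{aligned}\Bigl|\mathbb{E}\Bigl(h\bigl(\textstyle\sum_{i=0}^nU_i\bigr)-h\bigl(\sum_{i=0}^nV_i\bigr)\Bigr)\Bigr|&\le\sum_{i=0}^n\sup_{t\in\mathbb{R}}\bigl|\mathbb{E}\bigl(h(t+U_i)-h(t+V_i)\bigr)\bigr|\\&\le\sum_{i=0}^n\sup_{t\in\mathbb{R}}\Bigl(\bigl|\mathbb{E}\bigl(h(t+U_i)\bigr)-h(t)-\tfrac12h''(t)\sigma_i^2\bigr|+\bigl|\mathbb{E}\bigl(h(t+V_i)\bigr)-h(t)-\tfrac12h''(t)\sigma_i^2\bigr|\Bigr),\end{aligned}$$
which by (a) above is at most
$$\sum_{i=0}^n\Bigl(\tfrac13\epsilon M_3\sigma_i^2+M_2\,\mathbb{E}\bigl(\psi_\epsilon(U_i)\bigr)+M_2\,\mathbb{E}\bigl(\psi_\epsilon(V_i)\bigr)\Bigr),$$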
as claimed.

274E Lemma For any $\epsilon>0$, there is a three-times-differentiable function $h:\mathbb{R}\to[0,1]$, with continuous third derivative, such that $h(x)=1$ for $x\le-\epsilon$ and $h(x)=0$ for $x\ge\epsilon$.
proof Let $f:\,]-\epsilon,\epsilon[\,\to\,]0,\infty[$ be any twice-differentiable function such that $\lim_{x\downarrow-\epsilon}f^{(n)}(x)=\lim_{x\uparrow\epsilon}f^{(n)}(x)=0$ for $n=0$, $1$ and $2$, writing $f^{(n)}$ for the $n$th derivative of $f$. For instance, you could take $f(x)=(\epsilon^2-x^2)^3$, or $f(x)=\exp\bigl(-\frac{1}{\epsilon^2-x^2}\bigr)$. Now set
$$h(x)=1-\int_{-\epsilon}^xf\Big/\int_{-\epsilon}^{\epsilon}f$$
for $|x|\le\epsilon$.

274F Lindeberg's theorem Let $\epsilon>0$. Then there is a $\delta>0$ such that, whenever $X_0,\dots,X_n$ are independent real-valued random variables such that
$\mathbb{E}(X_i)=0$ for every $i\le n$,
$\sum_{i=0}^n\mathrm{Var}(X_i)=1$,
$\sum_{i=0}^n\mathbb{E}\bigl(\psi_\delta(X_i)\bigr)\le\delta$
(writing $\psi_\delta(x)=0$ if $|x|\le\delta$, and $\psi_\delta(x)=x^2$ if $|x|>\delta$), then
$$\Bigl|\Pr\Bigl(\sum_{i=0}^nX_i\le a\Bigr)-\Phi(a)\Bigr|\le\epsilon$$
for every $a\in\mathbb{R}$.
proof (a) Let $h:\mathbb{R}\to[0,1]$ be a three-times-differentiable function, with continuous third derivative, such that $\chi\,]-\infty,-\epsilon]\le h\le\chi\,]-\infty,\epsilon]$, as in 274E. Set
$$M_2=\sup_{x\in\mathbb{R}}|h''(x)|=\sup_{|x|\le\epsilon}|h''(x)|,\qquad M_3=\sup_{x\in\mathbb{R}}|h'''(x)|=\sup_{|x|\le\epsilon}|h'''(x)|;$$
because $h'''$ is continuous, both are finite. Write $\epsilon'=\epsilon\bigl(1-\frac{2}{\sqrt{2\pi}}\bigr)>0$, and let $\eta>0$ be such that $\bigl(\frac13M_3+2M_2\bigr)\eta\le\epsilon'$. Note that $\lim_{m\to\infty}\psi_m(x)=0$ for every $x$. So if $X$ is a random variable of finite variance we must have $\lim_{m\to\infty}\mathbb{E}(\psi_m(X))=0$, by Lebesgue's Dominated Convergence Theorem. Let $m\ge1$ be such that $\mathbb{E}(\psi_m(Z))\le\eta$, where $Z$ is some (or any) standard normal random variable. Finally, take $\delta>0$ such that $\delta\le\eta$ and $\delta+\delta^2\le(\eta/m)^2$. (I hope that you have seen enough $\epsilon$-$\delta$ arguments not to be troubled by any expectation of understanding the reasons for each particular formula here before reading the rest of the argument. But the formula $\frac13M_3+2M_2$, in association with $\psi_\delta$, should recall 274D.)
(b) Let $X_0,\dots,X_n$ be independent random variables with zero expectation such that $\sum_{i=0}^n\mathrm{Var}(X_i)=1$ and $\sum_{i=0}^n\mathbb{E}(\psi_\delta(X_i))\le\delta$. We need an auxiliary sequence $Z_0,\dots,Z_n$ of standard normal random variables to match against the $X_i$. To create this, I use the following device. Suppose that the probability space underlying $X_0,\dots,X_n$ is $(\Omega,\Sigma,\mu)$. Set $\Omega'=\Omega\times\mathbb{R}^{n+1}$, and let $\mu'$ be the product measure on $\Omega'$, where $\Omega$ is given the measure $\mu$ and each factor $\mathbb{R}$ of $\mathbb{R}^{n+1}$ is given the measure $\mu_G$. Set $X_i'(\omega,z)=X_i(\omega)$ and $Z_i(\omega,z)=\zeta_i$ for $\omega\in\operatorname{dom}X_i$, $z=(\zeta_0,\dots,\zeta_n)\in\mathbb{R}^{n+1}$ and $i\le n$. Then $X_0',\dots,X_n',Z_0,\dots,Z_n$ are independent, and each $X_i'$ has the same distribution as $X_i$ (272Mb). Consequently $S'=X_0'+\dots+X_n'$ has the same distribution as $S=X_0+\dots+X_n$ (using 272S, or otherwise). So $\mathbb{E}(g(S'))=\mathbb{E}(g(S))$ for any bounded Borel measurable function $g$ (using 271E). Also each $Z_i$ has distribution $\mu_G$, so is standard normal.
(c) Write $\sigma_i=\sqrt{\mathrm{Var}(X_i)}$ for each $i$, and set $K=\{i:i\le n,\ \sigma_i>0\}$. Observe that $\eta/\sigma_i\ge m$ for each $i\in K$. P P We know that
$$\sigma_i^2=\mathrm{Var}(X_i)=\mathbb{E}(X_i^2)\le\mathbb{E}\bigl(\delta^2+\psi_\delta(X_i)\bigr)=\delta^2+\mathbb{E}\bigl(\psi_\delta(X_i)\bigr)\le\delta^2+\delta,$$
so
$$\eta/\sigma_i\ge\eta/\sqrt{\delta+\delta^2}\ge m$$
by the choice of $\delta$. Q Q
(d) Consider the independent normal random variables $\sigma_iZ_i$. We have $\mathbb{E}(\sigma_iZ_i)=\mathbb{E}(X_i')=0$ and $\mathrm{Var}(\sigma_iZ_i)=\mathrm{Var}(X_i')=\sigma_i^2$ for each $i$. So $Z=\sigma_0Z_0+\dots+\sigma_nZ_n$ has expectation $0$ and variance $\sum_{i=0}^n\sigma_i^2=1$. Moreover, by 274B, $Z$ is normal, so in fact it is standard normal. Now we have
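$$\sum_{i=0}^n\mathbb{E}\bigl(\psi_\eta(\sigma_iZ_i)\bigr)=\sum_{i\in K}\mathbb{E}\bigl(\psi_\eta(\sigma_iZ_i)\bigr)=\sum_{i\in K}\sigma_i^2\,\mathbb{E}\bigl(\psi_{\eta/\sigma_i}(Z_i)\bigr)$$
(because $\sigma^2\psi_{\eta/\sigma}(x)=\psi_\eta(\sigma x)$ whenever $x\in\mathbb{R}$ and $\sigma>0$)
$$=\sum_{i\in K}\sigma_i^2\,\mathbb{E}\bigl(\psi_{\eta/\sigma_i}(Z)\bigr)\le\sum_{i\in K}\sigma_i^2\,\mathbb{E}\bigl(\psi_m(Z)\bigr)$$
(because, by (c), $\eta/\sigma_i\ge m$ for every $i\in K$, so $\psi_{\eta/\sigma_i}(t)\le\psi_m(t)$ for every $t$)
$$\le\sum_{i\in K}\sigma_i^2\eta$$
(by the choice of $m$) $=\eta$. On the other hand, we surely have
$$\sum_{i=0}^n\mathbb{E}\bigl(\psi_\eta(X_i')\bigr)=\sum_{i=0}^n\mathbb{E}\bigl(\psi_\eta(X_i)\bigr)\le\sum_{i=0}^n\mathbb{E}\bigl(\psi_\delta(X_i)\bigr)\le\delta\le\eta.$$
(e) For any real number $t$, set $h_t(x)=h(x-t)$ for each $x\in\mathbb{R}$. Then $h_t$ is three-times-differentiable, with $\sup_{x\in\mathbb{R}}|h_t''(x)|=M_2$ and $\sup_{x\in\mathbb{R}}|h_t'''(x)|=M_3$. Consequently $|\mathbb{E}(h_t(S))-\mathbb{E}(h_t(Z))|\le\epsilon'$. P P By 274Db,
$$\begin{aligned}|\mathbb{E}(h_t(S))-\mathbb{E}(h_t(Z))|&=|\mathbb{E}(h_t(S'))-\mathbb{E}(h_t(Z))|\\&=\Bigl|\mathbb{E}\Bigl(h_t\bigl(\textstyle\sum_{i=0}^nX_i'\bigr)\Bigr)-\mathbb{E}\Bigl(h_t\bigl(\sum_{i=0}^n\sigma_iZ_i\bigr)\Bigr)\Bigr|\\&\le\frac13\eta M_3\sum_{i=0}^n\sigma_i^2+M_2\sum_{i=0}^n\mathbb{E}\bigl(\psi_\eta(X_i')\bigr)+M_2\sum_{i=0}^n\mathbb{E}\bigl(\psi_\eta(\sigma_iZ_i)\bigr)\\&\le\frac13\eta M_3+M_2\eta+M_2\eta\le\epsilon',\end{aligned}$$
by the choice of $\eta$. Q Q
(f) Now take any $a\in\mathbb{R}$. We have
$$\chi\,]-\infty,a-2\epsilon]\le h_{a-\epsilon}\le\chi\,]-\infty,a]\le h_{a+\epsilon}\le\chi\,]-\infty,a+2\epsilon].$$
Note also that, for any $b$,
$$\Phi(b+2\epsilon)=\Phi(b)+\frac{1}{\sqrt{2\pi}}\int_b^{b+2\epsilon}e^{-x^2/2}\,dx\le\Phi(b)+\frac{2\epsilon}{\sqrt{2\pi}}=\Phi(b)+\epsilon-\epsilon'.$$
Consequently
$$\begin{aligned}\Phi(a)-\epsilon&\le\Phi(a-2\epsilon)-\epsilon'=\Pr(Z\le a-2\epsilon)-\epsilon'\le\mathbb{E}(h_{a-\epsilon}(Z))-\epsilon'\le\mathbb{E}(h_{a-\epsilon}(S))\\&\le\Pr(S\le a)\le\mathbb{E}(h_{a+\epsilon}(S))\le\mathbb{E}(h_{a+\epsilon}(Z))+\epsilon'\\&\le\Pr(Z\le a+2\epsilon)+\epsilon'=\Phi(a+2\epsilon)+\epsilon'\le\Phi(a)+\epsilon.\end{aligned}$$
But this means just that
$$\Bigl|\Pr\Bigl(\sum_{i=0}^nX_i\le a\Bigr)-\Phi(a)\Bigr|\le\epsilon,$$
as claimed.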
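The conclusion of 274F can be seen in a concrete case. For $i<N$, take $X_i=\pm1/\sqrt N$ with probability $\frac12$ each. Then $\sum_i\mathrm{Var}(X_i)=1$, and $\sum_i\mathbb{E}(\psi_\delta(X_i))=0$ as soon as $1/\sqrt N\le\delta$. The sketch below computes $\sup_a|\Pr(S\le a)-\Phi(a)|$ from the binomial distribution; the only error is floating-point rounding. It compares this with the Berry-Esséen bound $33\rho/(4\sigma^3\sqrt N)$ quoted in 274H(c), where $\sigma=\rho=1$. The function names are illustrative.

```python
from math import comb, erf, sqrt

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def kolmogorov_distance(N):
    """sup_a |Pr(S <= a) - Phi(a)| for S = X_0 + ... + X_{N-1}, X_i = +-1/sqrt(N).

    S has atoms at (2k - N)/sqrt(N), k = 0..N, with probabilities C(N, k)/2^N.
    Between consecutive atoms Pr(S <= a) is constant while Phi increases, so the
    supremum is found by comparing Phi at each atom with the distribution
    function just before and just after the jump.
    """
    total = 2 ** N
    before = 0.0  # Pr(S < current atom)
    d = 0.0
    for k in range(N + 1):
        x = (2 * k - N) / sqrt(N)
        after = before + comb(N, k) / total  # Pr(S <= current atom)
        d = max(d, abs(before - Phi(x)), abs(after - Phi(x)))
        before = after
    return d

for N in (4, 25, 100, 400):
    print(N, round(kolmogorov_distance(N), 4), round(33 / (4 * sqrt(N)), 4))
```

The distance roughly halves each time $N$ is multiplied by 4, in line with a rate of $1/\sqrt N$. The constant $33/4$ makes the Berry-Esséen bound much larger than the actual distance in this case.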
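274G Central Limit Theorem Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of random variables, all with zero expectation and finite variance; write $s_n=\sqrt{\sum_{i=0}^n\mathrm{Var}(X_i)}$ for each $n$. Suppose that
$$\lim_{n\to\infty}\frac{1}{s_n^2}\sum_{i=0}^n\mathbb{E}\bigl(\psi_{\delta s_n}(X_i)\bigr)=0\quad\text{for every }\delta>0,$$
writing $\psi_\delta(x)=0$ if $|x|\le\delta$, and $\psi_\delta(x)=x^2$ if $|x|>\delta$. Set
$$S_n=\frac{1}{s_n}(X_0+\dots+X_n)$$
for each $n\in\mathbb{N}$ such that $s_n>0$. Then $\lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a)$ uniformly for $a\in\mathbb{R}$.
proof Given $\epsilon>0$, take $\delta>0$ as in Lindeberg's theorem (274F). Then for all $n$ large enough,
$$\frac{1}{s_n^2}\sum_{i=0}^n\mathbb{E}\bigl(\psi_{\delta s_n}(X_i)\bigr)\le\delta.$$
Fix on any such $n$. Of course we have $s_n>0$. Set $X_i'=\frac{1}{s_n}X_i$ for $i\le n$. Then $X_0',\dots,X_n'$ are independent, with zero expectation, and
$$\sum_{i=0}^n\mathrm{Var}(X_i')=\frac{1}{s_n^2}\sum_{i=0}^n\mathrm{Var}(X_i)=1,\qquad\sum_{i=0}^n\mathbb{E}\bigl(\psi_\delta(X_i')\bigr)=\frac{1}{s_n^2}\sum_{i=0}^n\mathbb{E}\bigl(\psi_{\delta s_n}(X_i)\bigr)\le\delta.$$
By 274F,
$$\bigl|\Pr(S_n\le a)-\Phi(a)\bigr|=\Bigl|\Pr\Bigl(\sum_{i=0}^nX_i'\le a\Bigr)-\Phi(a)\Bigr|\le\epsilon$$
for every $a\in\mathbb{R}$. Since this is true for all $n$ large enough, we have the result.

274H Remarks (a) The condition
$$\lim_{n\to\infty}\frac{1}{s_n^2}\sum_{i=0}^n\mathbb{E}\bigl(\psi_{\epsilon s_n}(X_i)\bigr)=0\quad\text{for every }\epsilon>0$$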
is called Lindeberg’s condition, following Lindeberg 22.
(b) Lindeberg’s condition is necessary as well as sufficient, in the following sense. Suppose that $\langle X_n\rangle_{n\in\mathbb{N}}$ is an independent sequence of real-valued random variables with zero expectation and finite variance; write $\sigma_n=\sqrt{\mathrm{Var}(X_n)}$ and $s_n=\sqrt{\sum_{i=0}^n\mathrm{Var}(X_i)}$ for each $n$. Suppose that $\lim_{n\to\infty}s_n=\infty$ and $\lim_{n\to\infty}\sigma_n/s_n=0$. Suppose also that $\lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a)$ for each $a\in\mathbb{R}$, where $S_n=\frac{1}{s_n}(X_0+\dots+X_n)$. Then
$$\lim_{n\to\infty}\frac{1}{s_n^2}\sum_{i=0}^n\mathbb{E}\bigl(\psi_{\epsilon s_n}(X_i)\bigr)=0$$
for every $\epsilon>0$. (Feller 66, §XV.6, Theorem 3; Loève 77, §21.2.)
(c) The proof of 274F-274G here is adapted from Feller 66, §VIII.4. It has the virtue of being ‘elementary’, in that it does not involve characteristic functions. Of course this has to be paid for by a number of detailed estimations; and – what is much more serious – it leaves us without one of the most powerful techniques for describing distributions. The proof does offer a method of bounding $|\Pr(S_n\le a)-\Phi(a)|$. But it should be said that the bounds obtained are not useful ones, being grossly over-pessimistic, at least in the readily analysable cases. (For instance, a better bound, in many cases, is given by the Berry-Esséen theorem. Suppose that $\langle X_n\rangle_{n\in\mathbb{N}}$ is independent and identically distributed, with zero expectation, and that the common values of $\sqrt{\mathbb{E}(X_n^2)}$ and $\mathbb{E}(|X_n|^3)$ are $\sigma,\rho<\infty$. Then
$$|\Pr(S_n\le a)-\Phi(a)|\le\frac{33\rho}{4\sigma^3\sqrt{n+1}};$$
see Feller 66, §XVI.5, Loève 77, §21.3, or Hall 82.) Furthermore, when $|a|$ is large, $\Phi(a)$ is exceedingly close to either 0 or 1, so that any uniform bound for $|\Pr(S\le a)-\Phi(a)|$ gives very little information. A great deal of work has been done on estimating the tails of such distributions more precisely, subject to special conditions. For instance, suppose that $X_0,\dots,X_n$ are independent random variables of zero expectation, uniformly bounded with $|X_i|\le K$ almost everywhere for each $i$. Set $Y=X_0+\dots+X_n$, $s=\sqrt{\mathrm{Var}(Y)}>0$ and $S=\frac1sY$. Then for any $\alpha\in[0,s/K]$
$$\Pr(|S|\ge\alpha)\le2\exp\Bigl(\frac{-\alpha^2}{2\bigl(1+\frac{\alpha K}{2s}\bigr)^2}\Bigr)\approx2e^{-\alpha^2/2}\quad\text{if }s\gg\alpha K$$
(Rényi 70, §VII.4, Theorem 1).
I now list some of the standard cases in which Lindeberg’s conditions are satisfied, so that we may apply the theorem.

274I Corollary Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of real-valued random variables, all with the same distribution. Suppose that their common expectation is 0 and their common variance is finite and not zero. Write $\sigma$ for the common value of $\sqrt{\mathrm{Var}(X_n)}$, and set
$$S_n=\frac{1}{\sigma\sqrt{n+1}}(X_0+\dots+X_n)$$
for each $n\in\mathbb{N}$. Then $\lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a)$ uniformly for $a\in\mathbb{R}$.
proof In the language of 274H, we have $\sigma_n=\sigma$ and $s_n=\sigma\sqrt{n+1}$, so the first two conditions are surely satisfied. Moreover, if $\nu$ is the common distribution of the $X_n$, then
$$\mathbb{E}\bigl(\psi_{\epsilon s_n}(X_n)\bigr)=\int_{\{x:|x|>\epsilon\sigma\sqrt{n+1}\}}x^2\,\nu(dx)\to0$$
by Lebesgue’s Dominated Convergence Theorem; so that
$$\frac{1}{s_n^2}\sum_{i=0}^n\mathbb{E}\bigl(\psi_{\epsilon s_n}(X_i)\bigr)\to0$$
by 273Ca. Thus Lindeberg’s conditions are satisfied and 274G gives the result.

274J Corollary Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of real-valued random variables with zero expectation. Suppose that $\{X_n^2:n\in\mathbb{N}\}$ is uniformly integrable (246A) and that
$$\liminf_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n\mathrm{Var}(X_i)>0.$$
Set
$$s_n=\sqrt{\sum_{i=0}^n\mathrm{Var}(X_i)},\qquad S_n=\frac{1}{s_n}(X_0+\dots+X_n)$$
for large $n\in\mathbb{N}$. Then $\lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a)$ uniformly for $a\in\mathbb{R}$.
proof The condition $\liminf_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n\mathrm{Var}(X_i)>0$ means that there are $c>0$ and $n_0\in\mathbb{N}$ such that $s_n\ge c\sqrt{n+1}$ for every $n\ge n_0$. Let the underlying space be $(\Omega,\Sigma,\mu)$, and let $\epsilon,\eta>0$. Write $\psi_\delta(x)=0$ for $|x|\le\delta$ and $\psi_\delta(x)=x^2$ for $|x|>\delta$, as in 274F-274G. Then
$$\mathbb{E}\bigl(\psi_{\epsilon s_n}(X_i)\bigr)\le\mathbb{E}\bigl(\psi_{c\epsilon\sqrt{n+1}}(X_i)\bigr)=\int_{F(i,c\epsilon\sqrt{n+1})}X_i^2\,d\mu$$
for $n\ge n_0$ and $i\le n$, where $F(i,\gamma)=\{\omega:\omega\in\operatorname{dom}X_i,\ |X_i(\omega)|>\gamma\}$. Because $\{X_i^2:i\in\mathbb{N}\}$ is uniformly integrable, there is a $\gamma\ge0$ such that $\int_{F(i,\gamma)}X_i^2\,d\mu\le\eta c^2$ for every $i\in\mathbb{N}$ (246I). Let $n_1\ge n_0$ be such that $c\epsilon\sqrt{n_1+1}\ge\gamma$. Then for any $n\ge n_1$
$$\frac{1}{s_n^2}\sum_{i=0}^n\mathbb{E}\bigl(\psi_{\epsilon s_n}(X_i)\bigr)\le\frac{1}{c^2(n+1)}\sum_{i=0}^n\eta c^2=\eta.$$
As $\epsilon$ and $\eta$ are arbitrary, the conditions of 274G are satisfied and the result follows.

274K Corollary Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of real-valued random variables with zero expectation, and suppose that
(i) there is some $\delta>0$ such that $\sup_{n\in\mathbb{N}}\mathbb{E}(|X_n|^{2+\delta})<\infty$,
(ii) $\liminf_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^n\mathrm{Var}(X_i)>0$.
Set $s_n=\sqrt{\sum_{i=0}^n\mathrm{Var}(X_i)}$ and $S_n=\frac{1}{s_n}(X_0+\dots+X_n)$ for large $n\in\mathbb{N}$. Then $\lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a)$ uniformly for $a\in\mathbb{R}$.
proof The point is that $\{X_n^2:n\in\mathbb{N}\}$ is uniformly integrable. P P Set $K=1+\sup_{n\in\mathbb{N}}\mathbb{E}(|X_n|^{2+\delta})$. Given $\epsilon>0$, set $M=(K/\epsilon)^{2/\delta}$. Then $(X_n^2-M)^+\le M^{-\delta/2}|X_n|^{2+\delta}$, because $|X_n|^\delta>M^{\delta/2}$ wherever $X_n^2>M$. So $\mathbb{E}\bigl((X_n^2-M)^+\bigr)\le KM^{-\delta/2}=\epsilon$ for every $n\in\mathbb{N}$. As $\epsilon$ is arbitrary, $\{X_n^2:n\in\mathbb{N}\}$ is uniformly integrable. Q Q Accordingly the conditions of 274J are satisfied and we have the result.

274L Remarks (a) All the theorems of this section are devoted to finding conditions under which a random variable $S$ is ‘nearly’ standard normal. That is, $\Pr(S\le a)\approx\Pr(Z\le a)$ uniformly for $a\in\mathbb{R}$, where $Z$ is some (or any) standard normal random variable. In all cases the random variable $S$ is normalized to have expectation 0 and variance 1, and is a sum of a large number of independent random variables. (In 274G and 274I-274K it is explicit that there must be many $X_i$, since they refer to a limit as $n\to\infty$. This is not said in so many words in the formulation I give of Lindeberg’s theorem. But the proof makes it evident that $(n+1)(\delta+\delta^2)\ge1$, so surely $n$ will have to be large there also.)
(b) I cannot leave this section without remarking that the form of the definition of ‘nearly standard normal’ may lead your intuition astray if you try to apply it to other distributions. Take $F$ to be the distribution function of $S$, so that $F(a)=\Pr(S\le a)$. Then I am saying that $S$ is ‘nearly standard normal’ if $\sup_{a\in\mathbb{R}}|F(a)-\Phi(a)|$ is small. It is natural to think of this as approximation in a metric, writing $\tilde\rho(\nu,\nu')=\sup_{a\in\mathbb{R}}|F_\nu(a)-F_{\nu'}(a)|$ for distributions $\nu$, $\nu'$ on $\mathbb{R}$, where $F_\nu(a)=\nu\,]-\infty,a]$. In this form, the theorems above can be read as finding conditions under which $\lim_{n\to\infty}\tilde\rho(\nu_{S_n},\mu_G)=0$. But the point is that $\tilde\rho$ is not really the right metric to use. It works here because $\mu_G$ is atomless.
But suppose, for instance, that $\nu$ is the distribution which gives mass 1 to the point 0. (I mean that $\nu E=1$ if $0\in E\subseteq\mathbb{R}$, and $\nu E=0$ if $0\notin E\subseteq\mathbb{R}$.) Suppose also that $\nu_n$ is the distribution of a normal random variable with expectation 0 and variance $\frac1n$, for each $n\ge1$. Then $F_\nu(0)=1$ and $F_{\nu_n}(0)=\frac12$, so $\tilde\rho(\nu_n,\nu)=\frac12$ for each $n\ge1$. However, for most purposes one would regard the difference between $\nu_n$ and $\nu$ as small, and surely $\nu$ is the only distribution which one could reasonably call a limit of the $\nu_n$.
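The example of 274Lb can be checked numerically. In the sketch below, `rho_tilde` evaluates $\sup_a|F_{\nu_n}(a)-F_\nu(a)|$ over a finite grid containing $0$. This approximates the supremum, which here is attained at $a=0$. The function `rho_cos` gives the pseudometric $\rho_h(\nu_n,\nu)$ of 274Ld for the single choice $h=\cos$, using $\int\cos\,d\nu_n=e^{-1/(2n)}$. The grid and the choice of $h$ are assumptions made for illustration.

```python
from math import erf, exp, sqrt

def F_normal(a, var):
    """Distribution function of a normal law with expectation 0 and variance var."""
    return 0.5 * (1.0 + erf(a / sqrt(2.0 * var)))

def F_point_mass(a):
    """Distribution function of the unit mass at 0."""
    return 1.0 if a >= 0 else 0.0

# Finite grid containing a = 0, used to approximate the supremum defining rho~.
grid = [k / 1000 for k in range(-2000, 2001)]

def rho_tilde(n):
    """Max over the grid of |F_{nu_n}(a) - F_nu(a)|, with nu_n = N(0, 1/n)."""
    return max(abs(F_normal(a, 1.0 / n) - F_point_mass(a)) for a in grid)

def rho_cos(n):
    """rho_h(nu_n, nu) for h = cos: |E cos(Z/sqrt(n)) - cos 0| = 1 - exp(-1/(2n))."""
    return 1.0 - exp(-1.0 / (2 * n))

for n in (1, 10, 1000):
    print(n, rho_tilde(n), rho_cos(n))
```

The first column stays at $\frac12$ however large $n$ is, while the second tends to $0$. This is the contrast between $\tilde\rho$ and the vague topology that the remark describes.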
(c) The difficulties here present themselves in more than one form. A statistician would be unhappy with the idea that the $\nu_n$ of the last paragraph were far from $\nu$ (and from each other). Any measurement involving random variables with these distributions must be subject to error, and small errors of measurement will render them indistinguishable. A pure mathematician, looking forward to the possibility of generalizing these results, will be unhappy with the emphasis given to the values of $\nu\,]-\infty,a]$, for which it may be difficult to find suitable equivalents in more abstract spaces.
(d) These considerations join together to lead us to a rather different definition for a topology on the space $P$ of probability distributions on $\mathbb{R}$. For any bounded continuous function $h:\mathbb{R}\to\mathbb{R}$ we have a pseudometric $\rho_h:P\times P\to[0,\infty[$ defined by writing
$$\rho_h(\nu,\nu')=\Bigl|\int h\,d\nu-\int h\,d\nu'\Bigr|$$
for all $\nu,\nu'\in P$. The vague topology on $P$ is that generated by the pseudometrics $\rho_h$ (2A3F). I will not go into its properties in detail here (some are sketched in 274Ya-274Yd below; see also 285K-285L, 285S and 437J-437O in Volume 4). But I maintain that the right way to look at the results of this chapter is to say that (i) the distributions $\nu_S$ are close to $\mu_G$ for the vague topology and (ii) the sets $\{\nu:\tilde\rho(\nu,\mu_G)<\epsilon\}$ are open for that topology, and that is why $\tilde\rho(\nu_S,\mu_G)$ is small.

*274M I include a simple pair of inequalities which are frequently useful when studying normal random variables.
Lemma (a) $\displaystyle\int_x^\infty e^{-t^2/2}\,dt\le\frac1xe^{-x^2/2}$ for every $x>0$.
(b) $\displaystyle\int_x^\infty e^{-t^2/2}\,dt\ge\frac{1}{2x}e^{-x^2/2}$ for every $x\ge1$.
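Both inequalities of 274M are easy to check numerically. The sketch uses the identity $\int_x^\infty e^{-t^2/2}\,dt=\sqrt{\pi/2}\,\operatorname{erfc}(x/\sqrt2)$ and Python's `math.erfc`. The grid of test points is an arbitrary choice. A check of this kind is, of course, no substitute for the proof that follows.

```python
from math import erfc, exp, pi, sqrt

def tail(x):
    """int_x^infinity exp(-t^2/2) dt, via the complementary error function."""
    return sqrt(pi / 2) * erfc(x / sqrt(2))

def upper_bound(x):
    """Right-hand side of 274M(a), valid for x > 0."""
    return exp(-x * x / 2) / x

def lower_bound(x):
    """Right-hand side of 274M(b), valid for x >= 1."""
    return exp(-x * x / 2) / (2 * x)

xs_a = [k / 100 for k in range(1, 801)]      # 0.01, ..., 8.00
xs_b = [1 + k / 100 for k in range(0, 701)]  # 1.00, ..., 8.00

for x in (1.0, 2.0, 4.0):
    print(x, lower_bound(x), tail(x), upper_bound(x))
```

As $x$ grows, the upper bound in (a) becomes sharp. The lower bound in (b) stays off by a factor of about 2.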
proof (a)
$$\int_x^\infty e^{-t^2/2}\,dt=\int_0^\infty e^{-(x+s)^2/2}\,ds\le e^{-x^2/2}\int_0^\infty e^{-xs}\,ds=\frac1xe^{-x^2/2}.$$
(b) Set $f(t)=e^{-t^2/2}-\bigl(1-x(t-x)\bigr)e^{-x^2/2}$. Then $f(x)=f'(x)=0$, and $f''(t)=(t^2-1)e^{-t^2/2}$ is non-negative for $t\ge x$ (because $x\ge1$). Accordingly $f(t)\ge0$ for every $t\ge x$, and $\int_x^{x+1/x}f(t)\,dt\ge0$. But this means just that
$$\int_x^\infty e^{-t^2/2}\,dt\ge\int_x^{x+1/x}e^{-t^2/2}\,dt\ge\int_x^{x+1/x}\bigl(1-x(t-x)\bigr)e^{-x^2/2}\,dt=\frac{1}{2x}e^{-x^2/2},$$
as required.

274X Basic exercises >(a) Use 272T to give an alternative proof of 274B.
(b) Prove 274D when $h''$ is $M_3$-Lipschitz but not necessarily differentiable.
(c) Let $\langle m_k\rangle_{k\in\mathbb{N}}$ be a strictly increasing sequence in $\mathbb{N}$ such that $m_0=0$ and $\lim_{k\to\infty}m_k/m_{k+1}=0$. Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of random variables such that $\Pr(X_n=\sqrt{m_k})=\Pr(X_n=-\sqrt{m_k})=1/(2m_k)$ and $\Pr(X_n=0)=1-1/m_k$ whenever $m_{k-1}\le n<m_k$. Show that the Central Limit Theorem is not valid for $\langle X_n\rangle_{n\in\mathbb{N}}$. (Hint: setting $W_k=(X_0+\dots+X_{m_k-1})/\sqrt{m_k}$, show that $\Pr(W_k\in G)\to1$ for every open set $G$ including $\mathbb{Z}$.)
(d) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be any independent sequence of random variables all with the same distribution. Suppose that they all have finite variance $\sigma^2>0$, and that their common expectation is $c$. Set $S_n=\frac{1}{\sqrt{n+1}}(X_0+\dots+X_n)$ for each $n$, and let $Y$ be a normal random variable with expectation $c$ and variance $\sigma^2$. Show that $\lim_{n\to\infty}\Pr(S_n\le a)=\Pr(Y\le a)$ uniformly for $a\in\mathbb{R}$.
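>(e) Show that for any $a\in\mathbb{R}$,
$$\lim_{n\to\infty}\frac{1}{2^n}\sum_{r=0}^{\lfloor\frac n2+\frac{a\sqrt n}{2}\rfloor}\frac{n!}{r!(n-r)!}=\lim_{n\to\infty}\frac{1}{2^n}\#\Bigl(\Bigl\{I:I\subseteq n,\ \#(I)\le\frac n2+\frac{a\sqrt n}{2}\Bigr\}\Bigr)=\Phi(a).$$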
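Laplace's theorem in 274Xe can be explored numerically. The first and middle expressions agree because there are exactly $\binom nr$ subsets of $n=\{0,\dots,n-1\}$ with $r$ elements. The sketch computes the binomial sum in exact integer arithmetic and compares it with $\Phi(a)$. The helper names and the choice $n=10000$ are illustrative assumptions.

```python
from math import erf, floor, sqrt

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def laplace_sum(n, a):
    """(1/2^n) * sum of C(n, r) over 0 <= r <= floor(n/2 + a*sqrt(n)/2), exactly."""
    upper = min(floor(n / 2 + a * sqrt(n) / 2), n)
    total, c = 0, 1  # c runs through C(n, 0), C(n, 1), ...
    for r in range(upper + 1):
        total += c
        c = c * (n - r) // (r + 1)  # C(n, r+1) = C(n, r) * (n - r) / (r + 1)
    return total / 2 ** n  # Python divides big integers with correct rounding

for a in (-1.0, 0.0, 0.5, 1.96):
    print(a, round(laplace_sum(10_000, a), 4), round(Phi(a), 4))
```

The remaining discrepancy at $n=10000$ comes mostly from the floor in the upper limit, which can shift the sum by up to one atom of size about $0.008$.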
(f) Show that 274I is a special case of 274J.
(g) Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be an independent sequence of real-valued random variables with zero expectation. Set $s_n=\sqrt{\sum_{i=0}^n\mathrm{Var}(X_i)}$ and $S_n=\frac{1}{s_n}(X_0+\dots+X_n)$ for each $n\in\mathbb{N}$. Suppose that there is some $\delta>0$ such that
$$\lim_{n\to\infty}\frac{1}{s_n^{2+\delta}}\sum_{i=0}^n\mathbb{E}(|X_i|^{2+\delta})=0.$$
Show that $\lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a)$ uniformly for $a\in\mathbb{R}$. (This is a form of Liapounoff’s central limit theorem; see Liapounoff 1901.)
(h) Let $P$ be the set of Radon probability measures on $\mathbb{R}$. Let $\nu_0\in P$ and $a\in\mathbb{R}$. Show that the map $\nu\mapsto\nu\,]-\infty,a]:P\to[0,1]$ is continuous at $\nu_0$ for the vague topology on $P$ iff $\nu_0\{a\}=0$.

274Y Further exercises (a) Write $P$ for the set of Radon probability measures on $\mathbb{R}$. For $\nu,\nu'\in P$ set
$$\rho(\nu,\nu')=\inf\{\epsilon:\epsilon\ge0,\ \nu\,]-\infty,a-\epsilon]-\epsilon\le\nu'\,]-\infty,a]\le\nu\,]-\infty,a+\epsilon]+\epsilon\text{ for every }a\in\mathbb{R}\}.$$
Show that $\rho$ is a metric on $P$ and that it defines the vague topology on $P$. ($\rho$ is called Lévy’s metric.)
(b) Write $P$ for the set of Radon probability measures on $\mathbb{R}$, and let $\tilde\rho$ be the metric on $P$ defined in 274Lb. Show that if $\nu\in P$ is atomless and $\epsilon>0$, then $\{\nu':\nu'\in P,\ \tilde\rho(\nu',\nu)<\epsilon\}$ is open for the vague topology on $P$.
(c) Let $\langle S_n\rangle_{n\in\mathbb{N}}$ be a sequence of real-valued random variables, and $Z$ a standard normal random variable. Show that the following are equiveridical:
(i) $\mu_G=\lim_{n\to\infty}\nu_{S_n}$ for the vague topology, writing $\nu_{S_n}$ for the distribution of $S_n$;
(ii) $\mathbb{E}(h(Z))=\lim_{n\to\infty}\mathbb{E}(h(S_n))$ for every bounded continuous function $h:\mathbb{R}\to\mathbb{R}$;
(iii) $\mathbb{E}(h(Z))=\lim_{n\to\infty}\mathbb{E}(h(S_n))$ for every bounded function $h:\mathbb{R}\to\mathbb{R}$ such that ($\alpha$) $h$ has continuous derivatives of all orders and ($\beta$) $\{x:h(x)\ne0\}$ is bounded;
(iv) $\lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a)$ for every $a\in\mathbb{R}$;
(v) $\lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a)$ uniformly for $a\in\mathbb{R}$;
(vi) $\{a:\lim_{n\to\infty}\Pr(S_n\le a)=\Phi(a)\}$ is dense in $\mathbb{R}$.
(See also 285L.)
(d) Let $(\Omega,\Sigma,\mu)$ be a probability space, and $P$ the set of Radon probability measures on $\mathbb{R}$. Show that $X\mapsto\nu_X:L^0(\mu)\to P$ is continuous for the topology of convergence in measure on $L^0(\mu)$ and the vague topology on $P$.

274 Notes and comments For more than two hundred years the Central Limit Theorem has been one of the glories of mathematics, and no branch of mathematics or science would be the same without it.
I suppose it is the most important single theorem of probability theory; and I observe that the proof hardly uses measure theory. To be sure, I have clothed the arguments above in the language of measure and integration. But if you look at their essence, the vital elements of the proof are (i) a linear combination of independent normal random variables is normal (274Ae, 274B);
(ii) if $U$, $V$, $W$ are independent random variables, and $h$ is a bounded continuous function, then $|\mathbb{E}(h(U,V,W))|\le\sup_{t\in\mathbb{R}}|\mathbb{E}(h(U,V,t))|$ (274C);
(iii) if $(X_0,\dots,X_n)$ are independent random variables, then we can find independent random variables $(X_0',\dots,X_n',Z_0,\dots,Z_n)$ such that $Z_j$ is standard normal and $X_j'$ has the same distribution as $X_j$, for each $j$ (274F).
The rest of the argument consists of elementary calculus, careful estimations and a few of the most fundamental properties of expectations and independence. Now (ii) and (iii) are justified above by appeals to Fubini’s theorem. But surely they belong to the list of probabilistic intuitions which take priority over the identification of probabilities with countably additive functionals. If they had given any insuperable difficulty, it would have been a telling argument against the model of probability we are using, but would not have affected the Central Limit Theorem. In fact (i) seems to be the place where we really need a mathematical model of the concept of ‘distribution’. Even there, all the relevant calculations can be done in terms of the Riemann integral on the plane, with no mention of countable additivity. So while I am happy and proud to have written out a version of these beautiful ideas, I have to admit that they are in no essential way dependent on the rest of this treatise. In §285 I will describe a quite different approach to the theorem, using much more sophisticated machinery. But it will again be the case, perhaps more thoroughly hidden, that the relevance of measure theory will not be to the theorem itself, but to our imagination of what an arbitrary distribution is. For here I do have a claim to make for my subject.
The characterization of distribution functions as arbitrary monotonic functions, continuous on the right, and with the right limits at $\pm\infty$ (271Xb), together with the analysis of monotonic functions in §226, gives us a chance of forming a mental picture of the proper class of objects to which such results as the Central Limit Theorem can be applied. Theorem 274F is a trifling modification of Theorem 3 of Lindeberg 22. Like the original, it emphasizes what I believe to be vital to all the limit theorems of this chapter: they are best founded on a proper understanding of finite sequences of random variables. Lindeberg’s condition was the culmination of a long search for the most general conditions under which the Central Limit Theorem would be valid. I offer a version of Laplace’s theorem (274Xe) as the starting place, and Liapounoff’s condition (274Xg) as an example of one of the intermediate stages. Naturally the corollaries 274I, 274J, 274K and 274Xd are those one seeks to apply by choice. There is an intriguing, but as far as I know purely coincidental, parallel between 273H/274K and 273I/274Xd. As an example of an independent sequence $\langle X_n\rangle_{n\in\mathbb{N}}$ of random variables, all with expectation zero and variance 1, to which the Central Limit Theorem does not apply, I offer 274Xc.
275 Martingales This chapter so far has been dominated by independent sequences of random variables. I now turn to another of the remarkable concepts to which probabilistic intuitions have led us. Here we study evolving systems, in which we gain progressively more information as time progresses. I give the basic theorems on pointwise convergence of martingales (275F-275H, 275K) and a very brief account of ‘stopping times’ (275L-275P).

275A Definition Let $(\Omega,\Sigma,\mu)$ be a probability space with completion $(\Omega,\hat\Sigma,\hat\mu)$, and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing sequence of $\sigma$-subalgebras of $\hat\Sigma$. A martingale adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ is a sequence $\langle X_n\rangle_{n\in\mathbb{N}}$ of integrable real-valued random variables on $\Omega$ such that (i) $\operatorname{dom}X_n\in\Sigma_n$ and $X_n$ is $\Sigma_n$-measurable for each $n\in\mathbb{N}$, and (ii) whenever $m\le n\in\mathbb{N}$ and $E\in\Sigma_m$, then $\int_EX_n=\int_EX_m$.
Note that it is enough if $\int_EX_{n+1}=\int_EX_n$ whenever $n\in\mathbb{N}$ and $E\in\Sigma_n$.

275B Examples We have seen many contexts in which such sequences appear naturally; here are a few.
(a) Let $(\Omega,\Sigma,\mu)$ be a probability space and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing sequence of $\sigma$-subalgebras of $\Sigma$. Let $X$ be any real-valued random variable on $\Omega$ with finite expectation, and for each $n\in\mathbb{N}$ let $X_n$ be a conditional expectation of $X$ on $\Sigma_n$, as in §233. Subject to the conditions that $\operatorname{dom}X_n\in\Sigma_n$ and $X_n$ is
actually $\Sigma_n$-measurable for each $n$ (a purely technical point – see 232He), $\langle X_n\rangle_{n\in\mathbb{N}}$ will be a martingale adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$, because $\int_EX_{n+1}=\int_EX=\int_EX_n$ whenever $E\in\Sigma_n$.
(b) Let $(\Omega,\Sigma,\mu)$ be a probability space and $\langle X_n\rangle_{n\in\mathbb{N}}$ an independent sequence of random variables, all with zero expectation. For each $n\in\mathbb{N}$, let $\tilde\Sigma_n$ be the $\sigma$-algebra generated by $\bigcup_{i\le n}\Sigma_{X_i}$, writing $\Sigma_{X_i}$ for the $\sigma$-algebra defined by $X_i$ (272C), and set $S_n=X_0+\dots+X_n$. Then $\langle S_n\rangle_{n\in\mathbb{N}}$ is a martingale adapted to $\langle\tilde\Sigma_n\rangle_{n\in\mathbb{N}}$. (Use 272K to see that $\Sigma_{X_{n+1}}$ is independent of $\tilde\Sigma_n$. Then $\int_EX_{n+1}=\int X_{n+1}\times\chi E=0$ for every $E\in\tilde\Sigma_n$, by 272Q.)
(c) Let $(\Omega,\Sigma,\mu)$ be a probability space and $\langle X_n\rangle_{n\in\mathbb{N}}$ an independent sequence of random variables, all with expectation 1. For each $n\in\mathbb{N}$, let $\tilde\Sigma_n$ be the $\sigma$-algebra generated by $\bigcup_{i\le n}\Sigma_{X_i}$, writing $\Sigma_{X_i}$ for the $\sigma$-algebra defined by $X_i$, and set $W_n=X_0\times\dots\times X_n$. Then $\langle W_n\rangle_{n\in\mathbb{N}}$ is a martingale adapted to $\langle\tilde\Sigma_n\rangle_{n\in\mathbb{N}}$.

275C Remarks (a) It seems appropriate to the concept of a random variable $X$ being ‘adapted’ to a $\sigma$-algebra $\Sigma$ to require that $\operatorname{dom}X\in\Sigma$ and that $X$ should be $\Sigma$-measurable. This may mean that other random variables, equal almost everywhere to $X$, fail to be ‘adapted’ to $\Sigma$.
(b) Technical problems of this kind evaporate, of course, if all $\mu$-negligible subsets of $\Omega$ belong to $\Sigma_0$. But examples such as 275Bb make it seem unreasonable to insist on such a simplification as a general rule.
(c) The concept of ‘martingale’ can readily be extended to index sets other than $\mathbb{N}$. Indeed, if $I$ is any partially ordered set, we can say that $\langle X_i\rangle_{i\in I}$ is a martingale on $(\Omega,\Sigma,\mu)$ adapted to $\langle\Sigma_i\rangle_{i\in I}$ if (i) each $\Sigma_i$ is a $\sigma$-subalgebra of $\hat\Sigma$, (ii) each $X_i$ is an integrable real-valued $\Sigma_i$-measurable random variable such that $\operatorname{dom}X_i\in\Sigma_i$, and (iii) whenever $i\le j$ in $I$, then $\Sigma_i\subseteq\Sigma_j$ and $\int_EX_i=\int_EX_j$ for every $E\in\Sigma_i$.
The principal case, after $I=\mathbb{N}$, is $I=[0,\infty[$. $I=\mathbb{Z}$ is also interesting, and I think it is fair to say that the most important ideas can already be expressed in theorems about martingales indexed by finite sets $I$. But in this volume I will generally take martingales to be indexed by $\mathbb{N}$.
(d) Suppose we are given just a sequence $\langle X_n\rangle_{n\in\mathbb{N}}$ of integrable real-valued random variables on a probability space $(\Omega,\Sigma,\mu)$. We can say simply that $\langle X_n\rangle_{n\in\mathbb{N}}$ is a martingale on $(\Omega,\Sigma,\mu)$ if there is some non-decreasing sequence $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ of $\sigma$-subalgebras of $\hat\Sigma$ (the completion of $\Sigma$) such that $\langle X_n\rangle_{n\in\mathbb{N}}$ is a martingale adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$. Write $\tilde\Sigma_n$ for the $\sigma$-algebra generated by $\bigcup_{i\le n}\Sigma_{X_i}$, where $\Sigma_{X_i}$ is the $\sigma$-algebra defined by $X_i$, as in 275Bb. Then it is easy to see that $\langle X_n\rangle_{n\in\mathbb{N}}$ is a martingale iff it is a martingale adapted to $\langle\tilde\Sigma_n\rangle_{n\in\mathbb{N}}$.
(e) Continuing from (d), it is also easy to see the following. If $\langle X_n\rangle_{n\in\mathbb{N}}$ is a martingale on $(\Omega,\Sigma,\mu)$, and $X_n'=_{\text{a.e.}}X_n$ for every $n$, then $\langle X_n'\rangle_{n\in\mathbb{N}}$ is a martingale on $(\Omega,\Sigma,\mu)$. (The point is that if $\langle X_n\rangle_{n\in\mathbb{N}}$ is adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$, then both $\langle X_n\rangle_{n\in\mathbb{N}}$ and $\langle X_n'\rangle_{n\in\mathbb{N}}$ are adapted to $\langle\hat\Sigma_n\rangle_{n\in\mathbb{N}}$, where $\hat\Sigma_n=\{E\triangle F:E\in\Sigma_n,\ F\text{ is negligible}\}$.) Consequently we have a concept of ‘martingale’ as a sequence in $L^1(\mu)$: a sequence $\langle X_n^\bullet\rangle_{n\in\mathbb{N}}$ in $L^1(\mu)$ is a martingale iff $\langle X_n\rangle_{n\in\mathbb{N}}$ is a martingale. Nevertheless, I think that the concept of ‘martingale adapted to a sequence of $\sigma$-algebras’ is the primary one. In all the principal applications the $\sigma$-algebras reflect some essential aspect of the problem, which may not be fully encompassed by the random variables alone.
(f) The word ‘martingale’ originally (in English; the history in French is more complex) referred to a strap used to prevent a horse from throwing its head back. Later it was used as the name of a gambling system in which the gambler doubles his stake each time he loses, and (in French) as a general term for gambling systems.
These may be regarded as a class of ‘stopped-time martingales’, as described in 275L-275P below.

275D A large part of the theory of martingales consists of inequalities of various kinds. I give two of the most important, both due to J.L.Doob. (See also 276Xa-276Xb.)
Lemma Let $(\Omega,\Sigma,\mu)$ be a probability space, and $\langle X_n\rangle_{n\in\mathbb{N}}$ a martingale on $\Omega$. Fix $n\in\mathbb{N}$ and set $X^*=\max(X_0,\dots,X_n)$. Then for any $\epsilon>0$,
$$\Pr(X^*\ge\epsilon)\le\frac1\epsilon\mathbb{E}(X_n^+),$$
writing $X_n^+=\max(0,X_n)$.
proof Write $\hat\mu$ for the completion of $\mu$, and $\hat\Sigma$ for its domain. Let $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ be a non-decreasing sequence of $\sigma$-subalgebras of $\hat\Sigma$ to which $\langle X_n\rangle_{n\in\mathbb{N}}$ is adapted. For each $i\le n$ set
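$$E_i=\{\omega:\omega\in\operatorname{dom}X_i,\ X_i(\omega)\ge\epsilon\},\qquad F_i=E_i\setminus\bigcup_{j<i}E_j.$$
Then $F_0,\dots,F_n$ […] $\bigcap_{i\le n}\operatorname{dom}X_i$, […] $>n$, so if $n\le r_{u+1}$ we have
$$\sum_{k=0}^{n-1}c_k(y_{k+1}-y_k)=\sum_{k=0}^{s_u-1}c_k(y_{k+1}-y_k)\ge(b-a)u,$$
while if $n>r_{u+1}$ we have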
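$$\sum_{k=0}^{n-1}c_k(y_{k+1}-y_k)=\sum_{k=0}^{s_u-1}c_k(y_{k+1}-y_k)+\sum_{k=r_{u+1}}^{n-1}(y_{k+1}-y_k)\ge(b-a)u+y_n-y_{r_{u+1}}\ge(b-a)u$$
because $y_n\ge a=y_{r_{u+1}}$. Thus in both cases we have the required result. Q Q
(b)(i) Now define
$$Y_k(\omega)=\max\bigl(a,X_k(\omega)\bigr)\text{ for }\omega\in\operatorname{dom}X_k,$$
$$F_k=\Bigl\{\omega:\omega\in\bigcap_{i\le k}\operatorname{dom}X_i,\ \exists\,j\le k,\ X_j(\omega)\le a,\ X_i(\omega)<b\text{ if }j\le i\le k\Bigr\}$$
for each $k\in\mathbb{N}$. If $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence of $\sigma$-algebras to which $\langle X_n\rangle_{n\in\mathbb{N}}$ is adapted, then $F_k\in\Sigma_k$. (If $j\le k$, all the sets $\operatorname{dom}X_j$, $\{\omega:X_j(\omega)\le a\}$ and $\{\omega:X_j(\omega)<b\}$ belong to $\Sigma_j\subseteq\Sigma_k$.)
(ii) We find that $\int_FY_k\le\int_FY_{k+1}$ if $F\in\Sigma_k$. P P Set $G=\{\omega:X_k(\omega)>a\}\in\Sigma_k$. Then
$$\int_FY_k=\int_{F\cap G}X_k+a\hat\mu(F\setminus G)=\int_{F\cap G}X_{k+1}+a\hat\mu(F\setminus G)\le\int_{F\cap G}Y_{k+1}+\int_{F\setminus G}Y_{k+1}=\int_FY_{k+1}.$$
Q Q
(iii) Consequently $\int_FY_{k+1}-Y_k\le\int Y_{k+1}-Y_k$ for every $F\in\Sigma_k$. P P
$$\int(Y_{k+1}-Y_k)-\int_F(Y_{k+1}-Y_k)=\int_{\Omega\setminus F}Y_{k+1}-\int_{\Omega\setminus F}Y_k\ge0.$$
Q Q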
(c) Let $H$ be the conegligible set $\operatorname{dom}U=\bigcap_{i\le n}\operatorname{dom}X_i\in\Sigma_n$. We ought to check at some point that $U$ is $\Sigma_n$-measurable. But this is clearly true, because all the relevant sets $\{\omega:X_i(\omega)\le a\}$ and $\{\omega:X_i(\omega)\ge b\}$ belong to $\Sigma_n$. For each $\omega\in H$, apply (a) to the list $X_0(\omega),\dots,X_n(\omega)$ to see that
$$(b-a)U(\omega)\le\sum_{k=0}^{n-1}\chi F_k(\omega)\bigl(Y_{k+1}(\omega)-Y_k(\omega)\bigr).$$
Because $H$ is conegligible, it follows that
$$(b-a)\mathbb{E}(U)\le\sum_{k=0}^{n-1}\int_{F_k}Y_{k+1}-Y_k\le\sum_{k=0}^{n-1}\int Y_{k+1}-Y_k$$
(using (b-iii))
$$=\mathbb{E}(Y_n-Y_0)\le\mathbb{E}\bigl((X_n-X_0)^+\bigr),$$
because $Y_n-Y_0\le(X_n-X_0)^+$ everywhere on $\operatorname{dom}X_n\cap\operatorname{dom}X_0$. This completes the proof.

275G
We are now ready for the principal theorems of this section.
Doob’s Martingale Convergence Theorem Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale on a probability space $(\Omega,\Sigma,\mu)$, and suppose that $\sup_{n\in\mathbb{N}}\mathbb{E}(|X_n|)<\infty$. Then $\lim_{n\to\infty}X_n(\omega)$ is defined in $\mathbb{R}$ for almost every $\omega$ in $\Omega$.
proof (a) Set $H=\bigcap_{n\in\mathbb{N}}\operatorname{dom}X_n$, and for $\omega\in H$ set
$$Y(\omega)=\liminf_{n\to\infty}X_n(\omega),\qquad Z(\omega)=\limsup_{n\to\infty}X_n(\omega),$$
allowing $\pm\infty$ in both cases. Note that $Y\le\liminf_{n\to\infty}|X_n|$, so by Fatou’s Lemma $Y(\omega)<\infty$ for almost every $\omega$. Similarly, $Z(\omega)>-\infty$ for almost every $\omega$. It will therefore be enough if I can show that $Y=_{\text{a.e.}}Z$. Then $Y(\omega)=Z(\omega)\in\mathbb{R}$ for almost every $\omega$, and $\langle X_n(\omega)\rangle_{n\in\mathbb{N}}$ will be convergent for almost every $\omega$.
(b) ?? So suppose, if possible, that $Y$ and $Z$ are not equal almost everywhere. Of course both are $\hat\Sigma$-measurable, where $(\Omega,\hat\Sigma,\hat\mu)$ is the completion of $(\Omega,\Sigma,\mu)$, so we must have $\hat\mu\{\omega:\omega\in H,\ Y(\omega)<Z(\omega)\}>0$. Accordingly there are rational numbers $q$, $q'$ such that $q<q'$ and $\hat\mu G>0$, where
$$G=\{\omega:\omega\in H,\ Y(\omega)<q<q'<Z(\omega)\}.$$
Now, for each $\omega\in H$ and $n\in\mathbb{N}$, let $U_n(\omega)$ be the number of up-crossings from $q$ to $q'$ in the list $X_0(\omega),\dots,X_n(\omega)$. Then 275F tells us that
$$\mathbb{E}(U_n)\le\frac{1}{q'-q}\mathbb{E}\bigl((X_n-X_0)^+\bigr)\le\frac{1}{q'-q}\mathbb{E}(|X_n|+|X_0|)\le\frac{2M}{q'-q},$$
if we write $M=\sup_{i\in\mathbb{N}}\mathbb{E}(|X_i|)$. By B.Levi’s theorem, $U(\omega)=\sup_{n\in\mathbb{N}}U_n(\omega)<\infty$ for almost every $\omega$. On the other hand, if $\omega\in G$, then there are arbitrarily large $j$, $k$ such that $X_j(\omega)<q$ and $X_k(\omega)>q'$, so $U(\omega)=\infty$. This means that $\hat\mu G$ must be 0, contrary to the choice of $q$ and $q'$. X X
(c) Thus we must in fact have $Y=_{\text{a.e.}}Z$, and $\langle X_n(\omega)\rangle_{n\in\mathbb{N}}$ is convergent for almost every $\omega$, as claimed.

275H Theorem Let $(\Omega,\Sigma,\mu)$ be a probability space, and $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$ a non-decreasing sequence of $\sigma$-subalgebras of $\Sigma$. Let $\langle X_n\rangle_{n\in\mathbb{N}}$ be a martingale adapted to $\langle\Sigma_n\rangle_{n\in\mathbb{N}}$. Then the following are equiveridical:
(i) there is a random variable $X$, of finite expectation, such that $X_n$ is a conditional expectation of $X$ on $\Sigma_n$ for every $n$;
(ii) $\{X_n:n\in\mathbb{N}\}$ is uniformly integrable;
(iii) $X_\infty(\omega)=\lim_{n\to\infty}X_n(\omega)$ is defined in $\mathbb{R}$ for almost every $\omega$, and $\mathbb{E}(|X_\infty|)=\lim_{n\to\infty}\mathbb{E}(|X_n|)<\infty$.
proof (i)⇒(ii) By 246D, the set of all conditional expectations of $X$ is uniformly integrable, so $\{X_n:n\in\mathbb{N}\}$ is surely uniformly integrable.
(ii)⇒(iii) If $\{X_n:n\in\mathbb{N}\}$ is uniformly integrable, we surely have $\sup_{n\in\mathbb{N}}\mathbb{E}(|X_n|)<\infty$, so 275G tells us that $X_\infty$ is defined almost everywhere. By 246Ja, $X_\infty$ is integrable and $\lim_{n\to\infty}\mathbb{E}(|X_n-X_\infty|)=0$. Consequently $\mathbb{E}(|X_\infty|)=\lim_{n\to\infty}\mathbb{E}(|X_n|)<\infty$.
(iii)⇒(i) Because $\mathbb{E}(|X_\infty|)=\lim_{n\to\infty}\mathbb{E}(|X_n|)$, we have $\lim_{n\to\infty}\mathbb{E}(|X_n-X_\infty|)=0$ (245H(a-ii)). Now let $n\in\mathbb{N}$ and $E\in\Sigma_n$. Then
$$\int_EX_n=\lim_{m\to\infty}\int_EX_m=\int_EX_\infty.$$
As $E$ is arbitrary, $X_n$ is a conditional expectation of $X_\infty$ on $\Sigma_n$.
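Example 275Bc gives a simple illustration of the gap between 275G and 275H. Suppose, as an assumption for illustration, that each $X_i$ takes the values $\frac12$ and $\frac32$ with probability $\frac12$ each. Then $W_n=X_0\times\dots\times X_n$ is a non-negative martingale with $\mathbb{E}(|W_n|)=1$ for every $n$, so 275G applies. But $\ln W_n$ is a sum of independent terms with expectation $\frac12\ln\frac34<0$, so $W_n\to0$ almost everywhere. Hence $\mathbb{E}(|W_\infty|)=0\ne1=\lim_{n\to\infty}\mathbb{E}(|W_n|)$, and by 275H the set $\{W_n:n\in\mathbb{N}\}$ is not uniformly integrable. The sketch computes the expectations exactly and simulates some paths; the function names are hypothetical.

```python
import random
from fractions import Fraction
from math import comb

# Hypothetical choice for 275Bc: each X_i is 1/2 or 3/2 with probability 1/2.
LOW, HIGH = Fraction(1, 2), Fraction(3, 2)

def exact_mean(n):
    """E(X_0 * ... * X_{n-1}), summing over the number k of factors equal to 3/2."""
    return sum(Fraction(comb(n, k), 2 ** n) * HIGH ** k * LOW ** (n - k)
               for k in range(n + 1))

def simulate_final_values(paths, steps, seed=0):
    """Values of X_0 * ... * X_{steps-1} along independent simulated paths."""
    rng = random.Random(seed)
    finals = []
    for _ in range(paths):
        w = 1.0
        for _ in range(steps):
            w *= rng.choice((0.5, 1.5))
        finals.append(w)
    return finals

print([exact_mean(n) for n in (1, 5, 20)])  # each expectation is exactly 1
finals = simulate_final_values(paths=1000, steps=400)
print(sum(w < 1e-6 for w in finals) / len(finals))  # almost every path is near 0
```

The expectation is held at 1 by rare paths on which $W_n$ is very large. This is exactly the failure of uniform integrability that 275H detects.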
275I Theorem Let (Ω, Σ, µ) be a probability space, and ⟨Σ_n⟩_{n∈N} a non-decreasing sequence of σ-subalgebras of Σ; write Σ_∞ for the σ-algebra generated by ⋃_{n∈N} Σ_n. Let X be any real-valued random variable on Ω with finite expectation, and for each n ∈ N let X_n be a conditional expectation of X on Σ_n. Then X_∞(ω) = lim_{n→∞} X_n(ω) is defined almost everywhere; lim_{n→∞} E(|X_∞ − X_n|) = 0, and X_∞ is a conditional expectation of X on Σ_∞.

proof (a) By 275G-275H, we know that X_∞ is defined almost everywhere, and, as remarked in 275H, lim_{n→∞} E(|X_∞ − X_n|) = 0. To see that X_∞ is a conditional expectation of X on Σ_∞, set
$$A = \Bigl\{E : E \in \Sigma_\infty,\ \int_E X_\infty = \int_E X\Bigr\}, \qquad I = \bigcup_{n\in\mathbb{N}} \Sigma_n.$$
Now I and A satisfy the conditions of the Monotone Class Theorem (136B). P P (α) Of course Ω ∈ I and I is closed under finite intersections, because ⟨Σ_n⟩_{n∈N} is a non-decreasing sequence of σ-algebras; in fact I is a subalgebra of PΩ, and is closed under finite unions and complements.
(β) If E ∈ I, say E ∈ Σ_n; then
$$\int_E X_\infty = \lim_{m\to\infty}\int_E X_m = \int_E X,$$
as in (iii)⇒(i) of 275H, so E ∈ A. Thus I ⊆ A.
(γ) If E, F ∈ A and E ⊆ F, then
$$\int_{F\setminus E} X_\infty = \int_F X_\infty - \int_E X_\infty = \int_F X - \int_E X = \int_{F\setminus E} X,$$
so F \ E ∈ A.
(δ) If ⟨E_k⟩_{k∈N} is a non-decreasing sequence in A with union E, then
$$\int_E X_\infty = \lim_{k\to\infty}\int_{E_k} X_\infty = \lim_{k\to\infty}\int_{E_k} X = \int_E X,$$
so E ∈ A. Thus A is a Dynkin class. Q Q
Consequently, by 136B, A includes Σ_∞; that is, X_∞ is a conditional expectation of X on Σ_∞.

Remark I have written 'lim_{n→∞} E(|X_n − X_∞|) = 0'; but you may prefer to say 'X_∞• = lim_{n→∞} X_n• in L¹(µ)', as in Chapter 24. The importance of this theorem is such that you may be interested in a proof based on 275D rather than 275E-275G; see 275Xd.
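To see 275I at work in the setting of 275Xg, take Σ_n to be generated by the dyadic intervals of length 2^{−n} in [0, 1], so that a conditional expectation of X on Σ_n is the average of X over the dyadic interval I_n(t) containing t. The Python sketch below evaluates these averages exactly with rational arithmetic for the integrand X(t) = t², chosen purely for illustration (any integrand with a known antiderivative would do). It checks two things: the martingale property (each average is the mean of the averages over its two halves) and convergence to X(t) at a sample point, with error at most 2 · 2^{−n} for this particular integrand. It illustrates, and of course does not prove, the almost-everywhere convergence asserted by 275I.

```python
from fractions import Fraction
from math import floor


def dyadic_average(antiderivative, t, n):
    """Return 2^n * (integral of X over I_n(t)), where I_n(t) is the dyadic
    interval [2^-n r, 2^-n (r+1)[ containing t, for 0 <= t < 1.

    `antiderivative` must be an antiderivative of the integrand X; with
    Fraction inputs the result is exact.
    """
    h = Fraction(1, 2 ** n)   # length of the level-n dyadic intervals
    r = floor(t / h)          # index r with t in [r h, (r+1) h[
    left = r * h
    return (antiderivative(left + h) - antiderivative(left)) / h


def cube_over_three(s):
    """Antiderivative of the illustrative integrand X(s) = s^2."""
    return s ** 3 / 3


t = Fraction(1, 3)
for n in (1, 5, 10, 20):
    approx = dyadic_average(cube_over_three, t, n)
    print(n, float(approx), float(abs(approx - t * t)))
```

Exact rationals are used so that the martingale identity can be tested with equality rather than a floating-point tolerance.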
*275J As a corollary of this theorem I give an important result, a kind of density theorem for product measures.

Proposition Let ⟨(Ω_n, Σ_n, µ_n)⟩_{n∈N} be a sequence of probability spaces with product (Ω, Σ, µ). Let X be a real-valued random variable on Ω with finite expectation. For each n ∈ N define X_n by setting
$$X_n(\omega) = \int X(\omega_0, \dots, \omega_n, \xi_{n+1}, \dots)\,d(\xi_{n+1}, \dots)$$
wherever this is defined, where I write '∫ ... d(ξ_{n+1}, ...)' to mean integration with respect to the product measure λ′_n on ∏_{i≥n+1} Ω_i. Then X(ω) = lim_{n→∞} X_n(ω) for almost every ω = (ω_0, ω_1, ...) in Ω, and lim_{n→∞} E(|X − X_n|) = 0.

proof For each n, we can identify µ with the product of λ_n and λ′_n, where λ_n is the product measure on Ω_0 × ... × Ω_n (254N). So 253H tells us that X_n is a conditional expectation of X on the σ-algebra Λ_n = {E × ∏_{i>n} Ω_i : E ∈ dom λ_n}. Since (by 254N again) we can think of λ_{n+1} as the product of λ_n and µ_{n+1}, Λ_n ⊆ Λ_{n+1} for each n. So 275I tells us that ⟨X_n⟩_{n∈N} converges almost everywhere to a conditional expectation X_∞ of X on the σ-algebra Λ_∞ generated by ⋃_{n∈N} Λ_n. Now Λ_∞ ⊆ Σ and also ⨂̂_{n∈N} Σ_n ⊆ Λ_∞, so every member of Σ is sandwiched between two members of Λ_∞ of the same measure (254Ff), and X_∞ must be equal to X almost everywhere. Moreover, 275I also tells us that lim_{n→∞} E(|X − X_n|) = lim_{n→∞} E(|X_∞ − X_n|) = 0, as required.

275K Reverse martingales We have a result corresponding to 275I for decreasing sequences of σ-algebras. While this is used less often than 275G-275I, it does have very important applications.

Theorem Let (Ω, Σ, µ) be a probability space, and ⟨Σ_n⟩_{n∈N} a non-increasing sequence of σ-subalgebras of Σ, with intersection Σ_∞. Let X be any real-valued random variable with finite expectation, and for
each n ∈ N let X_n be a conditional expectation of X on Σ_n. Then X_∞ = lim_{n→∞} X_n is defined almost everywhere and is a conditional expectation of X on Σ_∞.

proof (a) Set H = ⋂_{n∈N} dom X_n, so that H is conegligible. For n ∈ N, a < b in R, and ω ∈ H, write U_{abn}(ω) for the number of up-crossings from a to b in the list X_n(ω), X_{n−1}(ω), ..., X_0(ω) (275E). Then
$$E(U_{abn}) \le \frac{1}{b-a}E\bigl((X_0 - X_n)^+\bigr) \le \frac{1}{b-a}E(|X_0| + |X_n|) \le \frac{2}{b-a}E(|X_0|) < \infty,$$
the first inequality being 275F.
So lim_{n→∞} U_{abn}(ω) is finite for almost every ω. But this means that {ω : lim inf_{n→∞} X_n(ω) < a, lim sup_{n→∞} X_n(ω) > b} is negligible. As a and b are arbitrary, ⟨X_n⟩_{n∈N} is convergent a.e., just as in 275G. Set X_∞(ω) = lim_{n→∞} X_n(ω) whenever this is defined in R.

(b) By 246D, {X_n : n ∈ N} is uniformly integrable, so E(|X_n − X_∞|) → 0 as n → ∞ (246Ja), and
$$\int_E X_\infty = \lim_{n\to\infty}\int_E X_n = \int_E X_0$$
for every E ∈ Σ_∞.

(c) Now there is a conegligible set G ∈ Σ_∞ such that G ⊆ dom X_∞ and X_∞↾G is Σ_∞-measurable. P P For each n ∈ N, there is a conegligible set G_n ∈ Σ_n such that G_n ⊆ dom X_n and X_n↾G_n is Σ_n-measurable. Set G′ = ⋃_{n∈N} ⋂_{m≥n} G_m; then, for any r ∈ N, G′ = ⋃_{n≥r} ⋂_{m≥n} G_m belongs to Σ_r, so G′ ∈ Σ_∞, while of course G′ is conegligible. For n ∈ N, set X′_n(ω) = X_n(ω) for ω ∈ G_n, 0 for ω ∈ Ω \ G_n; then for ω ∈ G′, lim_{n→∞} X′_n(ω) = lim_{n→∞} X_n(ω) if either is defined in R. Writing X′_∞ = lim_{n→∞} X′_n whenever this is defined in R, 121F and 121H tell us that X′_∞ is Σ_r-measurable and dom X′_∞ ∈ Σ_r for every r ∈ N, so that G″ = dom X′_∞ belongs to Σ_∞ and X′_∞ is Σ_∞-measurable. We also know, from (b), that G″ is conegligible. So setting G = G′ ∩ G″ we have the result. Q Q
Thus X_∞ is a conditional expectation of X on Σ_∞.

275L Stopping times In a sense, the main work of this section is over; I have no room for any more theorems of importance comparable to 275G-275I. However, it would be wrong to leave this chapter without briefly describing one of the most fruitful ideas of the subject.

Definition Let (Ω, Σ, µ) be a probability space, with completion (Ω, Σ̂, µ̂), and ⟨Σ_n⟩_{n∈N} a non-decreasing sequence of σ-subalgebras of Σ̂. A stopping time adapted to ⟨Σ_n⟩_{n∈N} (also called 'optional time' or 'Markov time') is a function τ from Ω to N ∪ {∞} such that {ω : τ(ω) ≤ n} ∈ Σ_n for every n ∈ N.

Remark Of course the condition {ω : τ(ω) ≤ n} ∈ Σ_n for every n ∈ N can be replaced by the equivalent condition {ω : τ(ω) = n} ∈ Σ_n for every n ∈ N.
I give priority to the former expression because it is more appropriate to other index sets (see 275Cc).

275M Examples (a) If ⟨X_n⟩_{n∈N} is a martingale adapted to a sequence ⟨Σ_n⟩_{n∈N} of σ-algebras, and H_n is a Borel subset of R^{n+1} for each n, then we have a stopping time τ adapted to ⟨Σ_n⟩_{n∈N} defined by the formula
$$\tau(\omega) = \inf\Bigl\{n : \omega \in \bigcap_{i\le n}\operatorname{dom} X_i,\ (X_0(\omega), \dots, X_n(\omega)) \in H_n\Bigr\},$$
setting inf ∅ = ∞ as usual. (For by 121K the set E_n = {ω : (X_0(ω), ..., X_n(ω)) ∈ H_n} belongs to Σ_n for each n, and {ω : τ(ω) ≤ n} = ⋃_{i≤n} E_i.) In particular, for instance, the formulae
$$\inf\{n : X_n(\omega) \ge a\}, \qquad \inf\{n : |X_n(\omega)| > a\}$$
define stopping times.
(b) Any constant function τ : Ω → N ∪ {∞} is a stopping time. If τ, τ′ are two stopping times adapted to the same sequence ⟨Σ_n⟩_{n∈N} of σ-algebras, then τ ∧ τ′ is a stopping time adapted to ⟨Σ_n⟩_{n∈N}, setting (τ ∧ τ′)(ω) = min(τ(ω), τ′(ω)) for ω ∈ Ω.

275N Lemma Let (Ω, Σ, µ) be a complete probability space, and ⟨Σ_n⟩_{n∈N} a non-decreasing sequence of σ-subalgebras of Σ. Suppose that τ and τ′ are stopping times on Ω, and ⟨X_n⟩_{n∈N} a martingale, all adapted to ⟨Σ_n⟩_{n∈N}.
(a) The family
$$\tilde\Sigma_\tau = \{E : E \in \Sigma,\ E \cap \{\omega : \tau(\omega) \le n\} \in \Sigma_n \text{ for every } n \in \mathbb{N}\}$$
is a σ-subalgebra of Σ.
(b) If τ(ω) ≤ τ′(ω) for every ω, then Σ̃_τ ⊆ Σ̃_{τ′}.
(c) Now suppose that τ is finite almost everywhere. Set
$$\tilde X_\tau(\omega) = X_{\tau(\omega)}(\omega)$$
whenever τ(ω) < ∞ and ω ∈ dom X_{τ(ω)}. Then dom X̃_τ ∈ Σ̃_τ and X̃_τ is Σ̃_τ-measurable.
(d) If τ is essentially bounded, that is, there is some m ∈ N such that τ ≤ m almost everywhere, then E(X̃_τ) exists and is equal to E(X_0).
(e) If τ ≤ τ′ almost everywhere, and τ′ is essentially bounded, then X̃_τ is a conditional expectation of X̃_{τ′} on Σ̃_τ.

proof (a) This is elementary. Write H_n = {ω : τ(ω) ≤ n} for each n ∈ N. The empty set belongs to Σ̃_τ because it belongs to Σ_n for every n. If E ∈ Σ̃_τ, then (Ω \ E) ∩ H_n = H_n \ (E ∩ H_n) ∈ Σ_n because H_n ∈ Σ_n; this is true for every n, so Ω \ E ∈ Σ̃_τ. If ⟨E_k⟩_{k∈N} is any sequence in Σ̃_τ then
$$\Bigl(\bigcup_{k\in\mathbb{N}} E_k\Bigr) \cap H_n = \bigcup_{k\in\mathbb{N}} (E_k \cap H_n) \in \Sigma_n$$
for every n, so ⋃_{k∈N} E_k ∈ Σ̃_τ.

(b) If E ∈ Σ̃_τ then of course E ∈ Σ, and if n ∈ N then {ω : τ′(ω) ≤ n} ⊆ {ω : τ(ω) ≤ n}, so that
$$E \cap \{\omega : \tau'(\omega) \le n\} = E \cap \{\omega : \tau(\omega) \le n\} \cap \{\omega : \tau'(\omega) \le n\}$$
belongs to Σ_n; as n is arbitrary, E ∈ Σ̃_{τ′}.

(c) Set H_n = {ω : τ(ω) ≤ n} for each n ∈ N. For any a ∈ R,
$$H_n \cap \{\omega : \omega \in \operatorname{dom}\tilde X_\tau,\ \tilde X_\tau(\omega) \le a\} = \bigcup_{k\le n}\{\omega : \tau(\omega) = k,\ \omega \in \operatorname{dom} X_k,\ X_k(\omega) \le a\} \in \Sigma_n.$$
As n is arbitrary,
$$G_a = \{\omega : \omega \in \operatorname{dom}\tilde X_\tau,\ \tilde X_\tau(\omega) \le a\} \in \tilde\Sigma_\tau.$$
As a is arbitrary, dom X̃_τ = ⋃_{m∈N} G_m ∈ Σ̃_τ and X̃_τ is Σ̃_τ-measurable.

(d) Set H_k = {ω : τ(ω) = k} for k ≤ m. Then ⋃_{k≤m} H_k is conegligible, so
$$E(\tilde X_\tau) = \sum_{k=0}^m \int_{H_k} X_k = \sum_{k=0}^m \int_{H_k} X_m = \int_\Omega X_m = \int_\Omega X_0.$$
(e) Suppose τ′ ≤ n almost everywhere. Set H_k = {ω : τ(ω) = k}, H′_k = {ω : τ′(ω) = k} for each k; then both ⟨H_k⟩_{k≤n} and ⟨H′_k⟩_{k≤n} are disjoint covers of conegligible subsets of Ω. Now suppose that E ∈ Σ̃_τ. Then
$$\int_E \tilde X_\tau = \sum_{k=0}^n \int_{E\cap H_k} \tilde X_\tau = \sum_{k=0}^n \int_{E\cap H_k} X_k = \sum_{k=0}^n \int_{E\cap H_k} X_n = \int_E X_n$$
because E ∩ H_k ∈ Σ_k for every k. By (b), E ∈ Σ̃_{τ′}, so we also have ∫_E X̃_{τ′} = ∫_E X_n. Thus ∫_E X̃_τ = ∫_E X̃_{τ′} for every E ∈ Σ̃_τ, as claimed.
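A small exact computation may help to fix the ideas of 275M and 275Nd. The Python sketch below takes, as an illustrative martingale not drawn from the text, the simple symmetric random walk X_0 = 0, X_{k+1} = X_k ± 1 with fair independent steps. It uses the first-passage time τ = inf{n : X_n ≥ level} of 275Ma and truncates it to the bounded stopping time τ ∧ m of 275Mb. Averaging X̃_{τ∧m} over all 2^m equally likely step sequences then reproduces E(X̃_{τ∧m}) = E(X_0) = 0, as 275Nd predicts, even though the walk is stopped at a level above its starting point. The function names are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction
from itertools import product


def first_passage_time(path, level):
    """tau = inf{n : X_n >= level} along a finite path (cf. 275Ma).

    Whether tau <= n is decided by path[0..n] alone, which is what makes
    tau a stopping time.  Returns None (standing for infinity) if the
    level is not reached within the path.
    """
    for n, x in enumerate(path):
        if x >= level:
            return n
    return None


def stopped_walk(m, level):
    """Exact (E(X~_{tau ^ m}), Pr(tau <= m)) for the simple symmetric random
    walk started at 0, averaging over all 2^m equally likely step sequences."""
    expectation = Fraction(0)
    hit = Fraction(0)
    weight = Fraction(1, 2 ** m)
    for steps in product((-1, 1), repeat=m):
        path = [0]
        for s in steps:
            path.append(path[-1] + s)
        tau = first_passage_time(path, level)
        stop = m if tau is None else tau  # tau <= m whenever it is not None
        expectation += weight * path[stop]
        if tau is not None:
            hit += weight
    return expectation, hit


for m in (1, 3, 8):
    print(m, stopped_walk(m, 1))
```

As m grows, Pr(τ ≤ m) tends to 1 (a classical property of the symmetric walk, not proved here) while E(X̃_{τ∧m}) stays 0; the untruncated X̃_τ equals the level almost everywhere, so the boundedness hypothesis of 275Nd cannot simply be dropped (compare 275Xj).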
275O Proposition Let ⟨X_n⟩_{n∈N} be a martingale and τ a stopping time, both adapted to the same sequence ⟨Σ_n⟩_{n∈N} of σ-algebras. For each n, set (τ ∧ n)(ω) = min(τ(ω), n) for ω ∈ Ω; then τ ∧ n is a stopping time, and ⟨X̃_{τ∧n}⟩_{n∈N} is a martingale adapted to ⟨Σ̃_{τ∧n}⟩_{n∈N}, defining X̃_{τ∧n} and Σ̃_{τ∧n} as in 275N.

proof As remarked in 275Mb, each τ ∧ n is a stopping time. If m ≤ n, then Σ̃_{τ∧m} ⊆ Σ̃_{τ∧n} by 275Nb. Each X̃_{τ∧m} is Σ̃_{τ∧m}-measurable, with domain belonging to Σ̃_{τ∧m}, by 275Nc, and has finite expectation, by 275Nd; finally, if m ≤ n, then X̃_{τ∧m} is a conditional expectation of X̃_{τ∧n} on Σ̃_{τ∧m}, by 275Ne.

275P Corollary Suppose that (Ω, Σ, µ) is a probability space and ⟨X_n⟩_{n∈N} is a martingale on Ω such that W = sup_{n∈N} |X_{n+1} − X_n| is finite almost everywhere and has finite expectation. Then for almost every ω ∈ Ω, either lim_{n→∞} X_n(ω) exists in R or sup_{n∈N} X_n(ω) = ∞ and inf_{n∈N} X_n(ω) = −∞.

proof Let ⟨Σ_n⟩_{n∈N} be a non-decreasing sequence of σ-algebras to which ⟨X_n⟩_{n∈N} is adapted. Let H be the conegligible set ⋂_{n∈N} dom X_n ∩ {ω : W(ω) < ∞}. For each m ∈ N, set
$$\tau_m(\omega) = \inf\{n : \omega \in \operatorname{dom} X_n,\ X_n(\omega) > m\}.$$
As in 275Ma, τ_m is a stopping time adapted to ⟨Σ_n⟩_{n∈N}. Set Y_{mn} = X̃_{τ_m∧n}, defined as in 275Nc, so that ⟨Y_{mn}⟩_{n∈N} is a martingale, by 275O. If ω ∈ H, then either τ_m(ω) > n and Y_{mn}(ω) = X_n(ω) ≤ m, or 0 < τ_m(ω) ≤ n and Y_{mn}(ω) = X_{τ_m(ω)}(ω) ≤ W(ω) + X_{τ_m(ω)−1}(ω) ≤ W(ω) + m, or τ_m(ω) = 0 and Y_{mn}(ω) = X_0(ω). Thus Y_{mn}(ω) ≤ |X_0(ω)| + W(ω) + m for every ω ∈ H, and
$$|Y_{mn}(\omega)| = 2\max(0, Y_{mn}(\omega)) - Y_{mn}(\omega) \le 2(|X_0(\omega)| + W(\omega) + m) - Y_{mn}(\omega),$$
so
$$E(|Y_{mn}|) \le 2E(|X_0|) + 2E(W) + 2m - E(Y_{mn}) = 2E(|X_0|) + 2E(W) + 2m - E(X_0)$$
by 275Nd. As this is true for every n ∈ N, sup_{n∈N} E(|Y_{mn}|) < ∞, and lim_{n→∞} Y_{mn} is defined in R almost everywhere, by Doob's Martingale Convergence Theorem (275G). Let F_m be the conegligible set on which ⟨Y_{mn}⟩_{n∈N} converges. Set H* = H ∩ ⋂_{m∈N} F_m, so that H* is conegligible.
Now consider E = {ω : ω ∈ H*, sup_{n∈N} X_n(ω) < ∞}. For any ω ∈ E, there must be an m ∈ N such that sup_{n∈N} X_n(ω) ≤ m. Now this means that Y_{mn}(ω) = X_n(ω) for every n, and as ω ∈ F_m we have lim_{n→∞} X_n(ω) = lim_{n→∞} Y_{mn}(ω) ∈ R. This means that ⟨X_n(ω)⟩_{n∈N} is convergent for almost every ω such that {X_n(ω) : n ∈ N} is bounded above. Similarly, ⟨X_n(ω)⟩_{n∈N} is convergent for almost every ω such that {X_n(ω) : n ∈ N} is bounded below, which completes the proof.

275X Basic exercises > (a) Let ⟨X_n⟩_{n∈N} be an independent sequence of random variables with zero expectation and finite variance. Set s_n = (Σ_{i=0}^n Var(X_i))^{1/2}, Y_n = (X_0 + ... + X_n)² − s_n² for each n. Show that ⟨Y_n⟩_{n∈N} is a martingale.

>(b) Let ⟨X_n⟩_{n∈N} be a martingale. Show that for any ε > 0,
$$\Pr\bigl(\sup_{n\in\mathbb{N}} |X_n| \ge \varepsilon\bigr) \le \frac{1}{\varepsilon}\sup_{n\in\mathbb{N}} E(|X_n|).$$
(c) Pólya's urn scheme Imagine a box containing red and white balls. At each move, a ball is drawn at random from the box and replaced together with another of the same colour. (i) Writing R_n, W_n for the numbers of red and white balls after the nth move and X_n = R_n/(R_n + W_n), show that ⟨X_n⟩_{n∈N} is a martingale. (ii) Starting from R_0 = W_0 = 1, find the distribution of (R_n, W_n) for each n. (iii) Show that X = lim_{n→∞} X_n is defined almost everywhere, and find its distribution when R_0 = W_0 = 1. (See Feller 66 for a discussion of other starting values.)

> (d) Let (Ω, Σ, µ) be a probability space, and ⟨Σ_n⟩_{n∈N} a non-decreasing sequence of σ-subalgebras of Σ; for each n ∈ N let P_n : L¹ → L¹ be the conditional expectation operator corresponding to Σ_n, where L¹ = L¹(µ) (242J). (i) Show that V = {u : u ∈ L¹, lim_{n→∞} ‖P_n u − u‖₁ = 0} is a ‖ ‖₁-closed linear subspace of L¹. (ii) Show that {E : E ∈ Σ, (χE)• ∈ V} is a Dynkin class including ⋃_{n∈N} Σ_n, so includes the σ-algebra Σ_∞ generated by ⋃_{n∈N} Σ_n. (iii) Show that if u ∈ L¹ then v = sup_{n∈N} P_n|u| is defined in L¹ and is of the form W• where Pr(W ≥ ε) ≤ (1/ε)‖u‖₁ for every ε > 0. (Hint: 275D.) (iv) Show that if X is a Σ_∞-measurable random variable with finite expectation, and for each n ∈ N X_n is a conditional expectation of X on Σ_n, then X• ∈ V and X =a.e. lim_{n→∞} X_n. (Hint: apply (iii) to u = (X − X_m)• for large m.)

(e) Let (Ω, Σ, µ) be a probability space, ⟨Σ_n⟩_{n∈N} a non-decreasing sequence of σ-subalgebras of Σ, and Σ_∞ the σ-algebra generated by ⋃_{n∈N} Σ_n. For each n ∈ N ∪ {∞} let P_n : L¹ → L¹ be the conditional expectation operator corresponding to Σ_n, where L¹ = L¹(µ). Show that lim_{n→∞} ‖P_n u − P_∞ u‖_p = 0 whenever p ∈ [1, ∞[ and u ∈ L^p(µ). (Hint: 275Xd, 233J/242K, 246Xg.)

(f) Let ⟨X_n⟩_{n∈N} be a martingale, and suppose that p ∈ ]1, ∞[ is such that sup_{n∈N} ‖X_n‖_p < ∞. Show that X = lim_{n→∞} X_n is defined almost everywhere and that lim_{n→∞} ‖X_n − X‖_p = 0.

> (g) Let (Ω, Σ, µ) be [0, 1] with Lebesgue measure.
For each n ∈ N let Σ_n be the finite subalgebra of Σ generated by intervals of the type [0, 2^{−n}r] for r ≤ 2^n. Use 275I to show that for any integrable X : [0, 1] → R we must have X(t) = lim_{n→∞} 2^n ∫_{I_n(t)} X for almost every t ∈ [0, 1[, where I_n(t) is the interval of the form [2^{−n}r, 2^{−n}(r + 1)[ containing t. Compare this result with 223A and 261Yd.

(h) In 275K, show that lim_{n→∞} ‖X_n − X_∞‖_p = 0 for any p ∈ [1, ∞[ such that ‖X_0‖_p is finite. (Compare 275Xe.)

(i) Let (Ω, Σ, µ) be a probability space, with completion (Ω, Σ̂, µ̂), and ⟨Σ_n⟩_{n∈N} a non-decreasing sequence of σ-subalgebras of Σ̂. Let ⟨X_n⟩_{n∈N} be a uniformly integrable martingale adapted to ⟨Σ_n⟩_{n∈N}, and set X_∞ = lim_{n→∞} X_n. Let τ be a stopping time adapted to ⟨Σ_n⟩_{n∈N}, and set X̃_τ(ω) = X_{τ(ω)}(ω) whenever ω ∈ dom X_{τ(ω)}, allowing ∞ as a value of τ(ω). Show that X̃_τ is a conditional expectation of X_∞ on Σ̃_τ, as defined in 275N.

(j) Let (Ω, Σ, µ) be a probability space, with completion (Ω, Σ̂, µ̂), and ⟨Σ_n⟩_{n∈N} a non-decreasing sequence of σ-subalgebras of Σ̂. Let ⟨X_n⟩_{n∈N} be a martingale and τ a stopping time, both adapted to ⟨Σ_n⟩_{n∈N}. Suppose that sup_{n∈N} E(|X_n|) < ∞ and that τ is finite almost everywhere. Show that X̃_τ, as defined in 275Nc, has finite expectation, but that E(X̃_τ) need not be equal to E(X_0).

(k) (i) Find a martingale ⟨X_n⟩_{n∈N} such that ⟨X_{2n}⟩_{n∈N} → 0 a.e. but |X_{2n+1}| ≥a.e. 1 for every n ∈ N. (ii) Find a martingale which converges in measure but is not convergent a.e.

275Y Further exercises (a) Let (Ω, Σ, µ) be a complete probability space, ⟨Σ_n⟩_{n∈N} a non-decreasing sequence of σ-subalgebras of Σ all containing every negligible set, and ⟨X_n⟩_{n∈N} a martingale adapted to ⟨Σ_n⟩_{n∈N}. Let ν be another probability measure with domain Σ which is absolutely continuous with respect to µ, with Radon-Nikodým derivative Z. For each n ∈ N let Z_n be a conditional expectation of Z on Σ_n (with respect to the measure µ).
(i) Show that Z_n is a Radon-Nikodým derivative of ν↾Σ_n with respect to µ↾Σ_n. (ii) Set W_n(ω) = Z_n(ω)/Z_{n−1}(ω) if this is defined in R, otherwise 0. For n ≥ 1, let V_n be a conditional expectation of W_n × (X_n − X_{n−1}) on Σ_{n−1} (with respect to the measure µ). Set Y_0 = X_0, Y_n = X_n − Σ_{k=1}^n V_k for n ≥ 1. Show that ⟨Y_n⟩_{n∈N} is a martingale adapted to ⟨Σ_n⟩_{n∈N} with respect to the measure ν.
(b) Combine the ideas of 275Cc with those of 275Cd-275Ce to describe a notion of 'martingale indexed by I', where I is an arbitrary partially ordered set.

(c) Let ⟨X_k⟩_{k∈N} be a martingale on a complete probability space (Ω, Σ, µ), and fix n ∈ N. Set X* = max(|X_0|, ..., |X_n|). Let p ∈ ]1, ∞[. Show that ‖X*‖_p ≤ (p/(p−1))‖X_n‖_p. (Hint: set F_t = {ω : X*(ω) ≥ t}. Show that tµF_t ≤ ∫_{F_t} |X_n|. Using Fubini's theorem on Ω × [0, ∞[ and on Ω × [0, ∞[ × [0, ∞[, show that
$$E\bigl((X^*)^p\bigr) = p\int_0^\infty t^{p-1}\hat\mu F_t\,dt, \qquad \int_0^\infty t^{p-2}\int_{F_t}|X_n|\,dt = \frac{1}{p-1}E\bigl(|X_n| \times (X^*)^{p-1}\bigr),$$
$$E\bigl(|X_n| \times (X^*)^{p-1}\bigr) \le \|X_n\|_p\,\|X^*\|_p^{p-1}.$$
Compare 286A.)

(d) Let ⟨X_k⟩_{k∈N} be a martingale on a complete probability space (Ω, Σ, µ), and fix n ∈ N. Set X* = max(|X_0|, ..., |X_n|), F_t = {ω : X*(ω) ≥ t}, G_t = {ω : |X_n(ω)| ≥ ½t} for t ≥ 0. (i) Show that tµF_t ≤ 2∫_{G_t} |X_n| for every t ≥ 0. (ii) Show that E(X*) ≤ 1 + (2 ln 2)E(|X_n|) + 2E(|X_n| × ln⁺|X_n|), where ln⁺ t = ln t for t ≥ 1, 0 for t ∈ [0, 1].

(e) Let (Ω, Σ, µ) be a probability space and ⟨Σ_i⟩_{i∈I} a countable family of σ-subalgebras of Σ such that for any i, j ∈ I either Σ_i ⊆ Σ_j or Σ_j ⊆ Σ_i. Let X be a real-valued random variable on Ω such that ‖X‖_p < ∞, where 1 < p < ∞, and suppose that X_i is a conditional expectation of X on Σ_i for each i ∈ I. Show that
$$\bigl\|\sup_{i\in I} |X_i|\bigr\|_p \le \frac{p}{p-1}\|X\|_p.$$
(f) Let (Ω, Σ, µ) be a probability space, with completion (Ω, Σ̂, µ̂), and let ⟨Σ_n⟩_{n∈N} be a non-decreasing sequence of σ-subalgebras of Σ̂. Let ⟨X_n⟩_{n∈N} be a sequence of µ-integrable real-valued functions such that dom X_n ∈ Σ_n and X_n is Σ_n-measurable for each n ∈ N. We say that ⟨X_n⟩_{n∈N} is a submartingale adapted to ⟨Σ_n⟩_{n∈N} (also called 'semi-martingale') if ∫_E X_{n+1} ≥ ∫_E X_n whenever n ∈ N and E ∈ Σ_n. Prove versions of 275D, 275F, 275G, 275Xf for submartingales.

(g) Let ⟨X_n⟩_{n∈N} be a martingale, and φ : R → R a convex function. Show that ⟨φ(X_n)⟩_{n∈N} is a submartingale. (Hint: 233J.) Re-examine part (b-ii) of the proof of 275F in the light of this fact.

(h) Let ⟨X_n⟩_{n∈N} be an independent sequence of non-negative random variables all with expectation 1. Set W_n = X_0 × ... × X_n for every n. (i) Show that W = lim_{n→∞} W_n is defined a.e. (ii) Show that E(W) is either 0 or 1. (Hint: suppose E(W) > 0. Set Z_n = lim_{m→∞} X_n × ... × X_m. Show that lim_{n→∞} Z_n = 1 when 0 < W < ∞, therefore a.e., by the zero-one law, while E(Z_n) ≤ 1, by Fatou's lemma, so lim_{n→∞} E(Z_n) = 1, while E(W) = E(W_n)E(Z_{n+1}) for every n.) (iii) Set γ = ∏_{n=0}^∞ E(√X_n). Show that γ > 0 iff E(W) = 1. (Hint: Pr(W_n ≥ ¼γ²) ≥ ¼γ² for every n, so if γ > 0 then W cannot be zero a.e.; while E(√W) ≤ γ.)

(i) Let ⟨(Ω_n, Σ_n, µ_n)⟩_{n∈N} be a sequence of probability spaces with product (Ω, Σ, µ). Suppose that for each n ∈ N we have a probability measure ν_n, with domain Σ_n, which is absolutely continuous with respect to µ_n, with Radon-Nikodým derivative f_n, and suppose that ∏_{n∈N} ∫√f_n dµ_n > 0. Let ν be the product of ⟨ν_n⟩_{n∈N}. Show that ν is an indefinite-integral measure over µ, with Radon-Nikodým derivative f, where f(ω) = ∏_{n∈N} f_n(ω_n) for µ-almost every ω = ⟨ω_n⟩_{n∈N} in Ω. (Hint: use 275Yh to show that ∫f dµ = 1.)

(j) Let ⟨p_n⟩_{n∈N} be a sequence in [0, 1].
Let µ be the usual measure on {0, 1}^N (254J) and ν the product of ⟨ν_n⟩_{n∈N}, where ν_n is the probability measure on {0, 1} defined by setting ν_n{1} = p_n. Show that ν is an indefinite-integral measure over µ iff Σ_{n=0}^∞ |p_n − ½|² < ∞.

(k) Let (Ω, Σ, µ) be a probability space, ⟨Σ_n⟩_{n∈N} a non-decreasing sequence of σ-subalgebras of Σ and ⟨X_n⟩_{n∈N} a sequence of random variables on Ω such that E(sup_{n∈N} |X_n|) is finite and X = lim_{n→∞} X_n is defined almost everywhere. For each n, let Y_n be a conditional expectation of X_n on Σ_n. Show that ⟨Y_n⟩_{n∈N} converges almost everywhere to a conditional expectation of X on the σ-algebra generated by ⋃_{n∈N} Σ_n.
(l) Show that 275Yk can fail if ⟨X_n⟩_{n∈N} is merely uniformly integrable, rather than dominated by an integrable function.

(m) Let (Ω, Σ, µ) be a probability space, ⟨Σ_n⟩_{n∈N} an independent sequence of σ-subalgebras of Σ, and X a random variable on Ω with finite variance. Let X_n be a conditional expectation of X on Σ_n for each n. Show that lim_{n→∞} X_n = E(X) almost everywhere. (Hint: consider Σ_{n=0}^∞ Var(X_n).)

(n) Let (Ω, Σ, µ) be a complete probability space, and ⟨X_n⟩_{n∈N} an independent sequence of random variables on Ω, all with the same distribution, and of finite expectation. For each n, set S_n = (1/(n+1))(X_0 + ... + X_n); let Σ_n be the σ-algebra defined by S_n and Σ*_n the σ-algebra generated by ⋃_{m≥n} Σ_m. Show that S_n is a conditional expectation of X_0 on Σ*_n. (Hint: assume every X_i defined everywhere on Ω. Set φ(ω) = ⟨X_i(ω)⟩_{i∈N}. Show that φ : Ω → R^N is inverse-measure-preserving for a suitable product measure on R^N, and that every set in Σ*_n is of the form φ^{−1}[H] where H ⊆ R^N is a Borel set invariant under permutations of coordinates in the set {0, ..., n}, so that ∫_E X_i = ∫_E X_j whenever i ≤ j ≤ n and E ∈ Σ*_n.) Hence show that ⟨S_n⟩_{n∈N} converges almost everywhere. (Compare 273I.)

(o) Formulate and prove versions of the results of this chapter for martingales consisting of functions taking values in C or R^r rather than R.

(p) Find a martingale ⟨X_n⟩_{n∈N} such that the sequence ⟨ν_{X_n}⟩_{n∈N} of distributions (271C) is convergent for the vague topology (274Ld), but ⟨X_n⟩_{n∈N} is not convergent in measure.

275 Notes and comments I hope that the sketch above, though distressingly abbreviated, has suggested some of the richness of the concepts involved, and will provide a foundation for further study. All the theorems of this section have far-reaching implications, but the one which is simply indispensable in advanced measure theory is 275I, 'Lévy's martingale convergence theorem', which I will use in the proof of the Lifting Theorem in Chapter 34 of the next volume.
As for stopping times, I mention them partly in an attempt to cast further light on what martingales are for (see 276Ed below), and partly because the ideas of 275N-275O are so important in modern probability theory that, just as a matter of general knowledge, you should be aware that there is something there. I add 275P as one of the most accessible of the standard results which may be obtained by this method.
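Part (i) of 275Xc can be checked mechanically. Because the next draw depends only on the current contents of the urn, the conditional expectation of X_{n+1} given the history up to move n is the one-step average computed below from (R_n, W_n) alone; this reduction is the only modelling assumption made here. The Python sketch verifies, with exact rational arithmetic, that this average equals X_n = R_n/(R_n + W_n) for all small urn contents. It deliberately says nothing about parts (ii) and (iii), and the function names are hypothetical.

```python
from fractions import Fraction


def polya_step(red, white):
    """Possible results of one move of the urn from (red, white) balls:
    a list of (probability, new_red, new_white)."""
    total = red + white
    return [(Fraction(red, total), red + 1, white),    # drew red
            (Fraction(white, total), red, white + 1)]  # drew white


def expected_next_proportion(red, white):
    """E(X_{n+1}) given R_n = red, W_n = white, where X = R / (R + W)."""
    return sum(p * Fraction(r, r + w) for p, r, w in polya_step(red, white))


# The martingale identity E(X_{n+1} | R_n, W_n) = X_n, checked exactly.
print(all(expected_next_proportion(r, w) == Fraction(r, r + w)
          for r in range(1, 20) for w in range(1, 20)))  # True
```

The identity is just r/(r+w) · (r+1)/(r+w+1) + w/(r+w) · r/(r+w+1) = r/(r+w), so the check passes for every r, w ≥ 1, not only the range tried.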
276 Martingale difference sequences

Hand in hand with the concept of 'martingale' is that of 'martingale difference sequence' (276A), a direct generalization of the notion of 'independent sequence'. In this section I collect results which can be naturally expressed in terms of difference sequences, including yet another strong law of large numbers (276C). I end the section with a proof of Komlós' theorem (276H).

276A Martingale difference sequences (a) If ⟨X_n⟩_{n∈N} is a martingale adapted to a sequence ⟨Σ_n⟩_{n∈N} of σ-algebras, then we have
$$\int_E (X_{n+1} - X_n) = 0$$
whenever E ∈ Σ_n. Let us say that if (Ω, Σ, µ) is a probability space, with completion (Ω, Σ̂, µ̂), and ⟨Σ_n⟩_{n∈N} is a non-decreasing sequence of σ-subalgebras of Σ̂, then a martingale difference sequence adapted to ⟨Σ_n⟩_{n∈N} is a sequence ⟨X_n⟩_{n∈N} of real-valued random variables on Ω, all with finite expectation, such that
(i) dom X_n ∈ Σ_n and X_n is Σ_n-measurable, for each n ∈ N;
(ii) ∫_E X_{n+1} = 0 whenever n ∈ N, E ∈ Σ_n.

(b) Evidently ⟨X_n⟩_{n∈N} is a martingale difference sequence adapted to ⟨Σ_n⟩_{n∈N} iff ⟨Σ_{i=0}^n X_i⟩_{n∈N} is a martingale adapted to ⟨Σ_n⟩_{n∈N}.

(c) Just as in 275Cd, we can say that a sequence ⟨X_n⟩_{n∈N} is in itself a martingale difference sequence if ⟨Σ_{i=0}^n X_i⟩_{n∈N} is a martingale, that is, if ⟨X_n⟩_{n∈N} is a martingale difference sequence adapted to ⟨Σ̃_n⟩_{n∈N}, where Σ̃_n is the σ-algebra generated by ⋃_{i≤n} Σ_{X_i}.
(d) If ⟨X_n⟩_{n∈N} is a martingale difference sequence then ⟨a_n X_n⟩_{n∈N} is a martingale difference sequence for any real a_n.

(e) If ⟨X_n⟩_{n∈N} is a martingale difference sequence and X′_n =a.e. X_n for every n, then ⟨X′_n⟩_{n∈N} is a martingale difference sequence. (Compare 275Ce.)

(f) Of course the most important example of 'martingale difference sequence' is that of 275Bb: any independent sequence of random variables with zero expectation is a martingale difference sequence. It turns out that some of the theorems of §273 concerning such independent sequences may be generalized to martingale difference sequences.

276B Proposition Let ⟨X_n⟩_{n∈N} be a martingale difference sequence such that Σ_{n=0}^∞ E(X_n²) < ∞. Then Σ_{n=0}^∞ X_n is defined, and finite, almost everywhere.

proof (a) Let (Ω, Σ, µ) be the underlying probability space, (Ω, Σ̂, µ̂) its completion, and ⟨Σ_n⟩_{n∈N} a non-decreasing sequence of σ-subalgebras of Σ̂ such that ⟨X_n⟩_{n∈N} is adapted to ⟨Σ_n⟩_{n∈N}. Set Y_n = Σ_{i=0}^n X_i for each n ∈ N. Then ⟨Y_n⟩_{n∈N} is a martingale adapted to ⟨Σ_n⟩_{n∈N}.

(b) E(Y_n × X_{n+1}) = 0 for each n. P P Y_n is a sum of random variables with finite variance, so E(Y_n²) < ∞, by 244Ba; it follows that Y_n × X_{n+1} has finite expectation, by 244Eb. Because the constant function 0 is a conditional expectation of X_{n+1} on Σ_n, E(Y_n × X_{n+1}) = E(Y_n × 0) = 0, by 242L. Q Q

(c) It follows that E(Y_n²) = Σ_{i=0}^n E(X_i²) for every n. P P Induce on n. For the inductive step, we have
$$E(Y_{n+1}^2) = E(Y_n^2 + 2Y_n \times X_{n+1} + X_{n+1}^2) = E(Y_n^2) + E(X_{n+1}^2)$$
because, by (b), E(Y_n × X_{n+1}) = 0. Q Q

(d) Of course
$$E(|Y_n|) = \int |Y_n| \times \chi\Omega \le \|Y_n\|_2\,\|\chi\Omega\|_2 = \sqrt{E(Y_n^2)},$$
so
$$\sup_{n\in\mathbb{N}} E(|Y_n|) \le \sup_{n\in\mathbb{N}}\sqrt{E(Y_n^2)} = \sqrt{\sum_{i=0}^\infty E(X_i^2)} < \infty.$$
By 275G, lim_{n→∞} Y_n is defined and finite almost everywhere, that is, Σ_{i=0}^∞ X_i is defined and finite almost everywhere.
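Parts (b) and (c) of the proof of 276B say that the terms of a square-integrable martingale difference sequence are orthogonal, so E(Y_n²) = Σ_{i≤n} E(X_i²) even when the X_i are far from independent. The Python sketch below checks this by exact enumeration for a hypothetical martingale difference sequence built in the manner of 276Eb: X_0 = W_0 and X_i = Z_i × W_i, where the W_i are independent fair ±1 signs and the stake Z_i = 1 + |Y_{i−1}| is an illustrative choice of predictable sequence made here, not in the text. The sizes of later terms depend on earlier outcomes, so the X_i are not independent, yet the identity holds exactly.

```python
from fractions import Fraction
from itertools import product


def stake(previous_sum):
    """Hypothetical predictable stake Z_i, a function of Y_{i-1} only."""
    return 1 + abs(previous_sum)


def second_moments(n):
    """Exact pair (E(Y_n^2), E(X_0^2) + ... + E(X_n^2)) where X_0 = W_0,
    X_i = stake(Y_{i-1}) * W_i for i >= 1, Y_i = X_0 + ... + X_i, and
    W_0, ..., W_n are independent fair signs +/-1."""
    weight = Fraction(1, 2 ** (n + 1))
    e_y2 = Fraction(0)
    sum_e_x2 = Fraction(0)
    for signs in product((-1, 1), repeat=n + 1):
        y = signs[0]            # Y_0 = X_0 = W_0
        sum_e_x2 += weight      # X_0^2 = 1 on every path
        for w in signs[1:]:
            x = stake(y) * w    # stake fixed before W_i is revealed
            sum_e_x2 += weight * x * x
            y += x
        e_y2 += weight * y * y
    return e_y2, sum_e_x2


for n in range(5):
    print(n, second_moments(n))
```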
276C The strong law of large numbers: fourth form Let ⟨X_n⟩_{n∈N} be a martingale difference sequence, and suppose that ⟨b_n⟩_{n∈N} is a non-decreasing sequence in ]0, ∞[, diverging to ∞, such that
$$\sum_{n=0}^\infty \frac{1}{b_n^2}\operatorname{Var}(X_n) < \infty.$$
Then
$$\lim_{n\to\infty}\frac{1}{b_n}\sum_{i=0}^n X_i = 0$$
almost everywhere.

proof (Compare 273D.) As usual, write (Ω, Σ, µ) for the underlying probability space. Set X̃_n = (1/b_n)X_n for each n; then ⟨X̃_n⟩_{n∈N} is also a martingale difference sequence, and
$$\sum_{n=1}^\infty E(\tilde X_n^2) = \sum_{n=1}^\infty \frac{1}{b_n^2}\operatorname{Var}(X_n) < \infty.$$
By 276B, ⟨X̃_n(ω)⟩_{n∈N} is summable for almost every ω ∈ Ω. But by 273Cb,
$$\lim_{n\to\infty}\frac{1}{b_n}\sum_{i=0}^n X_i(\omega) = \lim_{n\to\infty}\frac{1}{b_n}\sum_{i=0}^n b_i\tilde X_i(\omega) = 0$$
for all such ω. So we have the result.
276D Corollary Let ⟨X_n⟩_{n∈N} be a martingale such that b_n = E(X_n²) is finite for each n.
(a) If sup_{n∈N} b_n is infinite, then lim_{n→∞} (1/b_n)X_n =a.e. 0.
(b) If sup_{n≥1} (1/n)b_n < ∞, then lim_{n→∞} (1/n)X_n =a.e. 0.

proof Consider the martingale difference sequence ⟨Y_n⟩_{n∈N} = ⟨X_{n+1} − X_n⟩_{n∈N}. Then E(Y_n × X_n) = 0, so E(Y_n²) + E(X_n²) = E(X_{n+1}²) for each n. In particular, ⟨b_n⟩_{n∈N} must be non-decreasing.

(a) If lim_{n→∞} b_n = ∞, take m such that b_m > 0; then
$$\sum_{n=m}^\infty \frac{1}{b_{n+1}^2}\operatorname{Var}(Y_n) = \sum_{n=m}^\infty \frac{1}{b_{n+1}^2}(b_{n+1} - b_n) \le \int_{b_m}^\infty \frac{1}{t^2}\,dt < \infty.$$
By 276C (modifying b_i for i < m, if necessary),
$$\lim_{n\to\infty}\frac{1}{b_n}X_n = \lim_{n\to\infty}\frac{1}{b_{n+1}}\Bigl(X_0 + \sum_{i=0}^n Y_i\Bigr) = \lim_{n\to\infty}\frac{1}{b_{n+1}}\sum_{i=0}^n Y_i = 0$$
almost everywhere.

(b) If γ = sup_{n≥1} (1/n)b_n < ∞, then 1/(n+1)² ≤ min(1, γ²/t²) for b_n < t ≤ b_{n+1}, so
$$\sum_{n=0}^\infty \frac{1}{(n+1)^2}(b_{n+1} - b_n) \le \gamma + \gamma^2\int_\gamma^\infty \frac{1}{t^2}\,dt < \infty,$$
and, by the same argument as before, lim_{n→∞} (1/n)X_n =a.e. 0.

276E 'Impossibility of systems' (a) I return to the word 'martingale' and the idea of a gambling system. Consider a gambler who takes a sequence of 'fair' bets, that is, bets which have payoff expectations of zero, but who chooses which bets to take on the basis of past experience. The appropriate model for such a sequence of random events is a martingale in the sense of 275A, taking Σ_n to be the algebra of all events which are observable up to and including the outcome of the nth bet, and X_n to be the gambler's net gain at that time. (In this model it is natural to take Σ_0 = {∅, Ω} and X_0 = 0.) Certain paradoxes can arise if we try to imagine this model with atomless Σ_n; to begin with it is perhaps easier to work with the discrete case, in which each Σ_n is finite, or is the set of unions of some countable family of atomic events. Now suppose that the bets involved are just two-way bets, with two equally likely outcomes, but that the gambler chooses his stake each time. In this case we can think of the outcomes as corresponding to an independent sequence ⟨W_n⟩_{n∈N} of random variables, each taking the values ±1 with equal probability. The gambler's system must be of the form
$$X_{n+1} = X_n + Z_{n+1} \times W_{n+1},$$
where Z_{n+1} is his stake on the (n+1)-st bet, and must be constant on each atom of the σ-algebra Σ_n generated by W_1, ..., W_n. The point is that because ∫_E W_{n+1} = 0 for each E ∈ Σ_n, E(Z_{n+1} × W_{n+1}) = 0, so E(X_{n+1}) = E(X_n).

(b) The general result, of which this is a special case, is the following. If ⟨W_n⟩_{n∈N} is a martingale difference sequence adapted to ⟨Σ_n⟩_{n∈N}, and ⟨Z_n⟩_{n≥1} is a sequence of random variables such that (i) Z_n is Σ_{n−1}-measurable and (ii) Z_n × W_n has finite expectation for each n ≥ 1, then W_0, Z_1 × W_1, Z_2 × W_2, ... is a martingale difference sequence adapted to ⟨Σ_n⟩_{n∈N}; the proof that ∫_E Z_{n+1} × W_{n+1} = 0 for every E ∈ Σ_n is exactly the argument of (b) of the proof of 276B.
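The claim in (a) and (b), that a predictable choice of stakes cannot change the expected gain, can be checked exhaustively for ±1 bets. The Python sketch below enumerates all 2^n equally likely outcomes W_1, ..., W_n and applies a staking rule that sees only the number of bets made so far and the current gain X_k; this is a special case of Σ_k-measurability, assumed here for simplicity. With the doubling rule of 276E(d) it recovers E(X_n) = 0, Pr(X_n = a) = 1 − 2^{−n}, Pr(X_n = −(2^n − 1)a) = 2^{−n} and E(|X_n|) = (2 − 2^{−n+1})a. Any other rule of the same kind can be passed in place of `doubling_stake`, a hypothetical name introduced only for this illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def doubling_stake(k, x, a):
    """Stake Z_{k+1} for the (k+1)-st bet, given the current gain x = X_k:
    2^k * a while the gambler is not ahead, nothing once he is."""
    return 2 ** k * a if x <= 0 else 0


def gain_distribution(n, a=1, stake=doubling_stake):
    """Exact distribution {value: probability} of X_n, where X_0 = 0 and
    X_{k+1} = X_k + Z_{k+1} * W_{k+1} for independent fair signs W = +/-1.
    The stake Z_{k+1} = stake(k, X_k, a) uses only information available
    after k bets, so it is predictable."""
    dist = defaultdict(Fraction)
    for signs in product((-1, 1), repeat=n):
        x = 0
        for k, w in enumerate(signs):
            x += stake(k, x, a) * w
        dist[x] += Fraction(1, 2 ** n)
    return dict(dist)


def expectation(dist, f=lambda v: v):
    """E(f(X)) for a finite distribution {value: probability}."""
    return sum(p * f(v) for v, p in dist.items())


print(gain_distribution(3))  # {-7: Fraction(1, 8), 1: Fraction(7, 8)}
```

Replacing `doubling_stake` by any other function of (k, X_k) leaves E(X_n) = 0 unchanged; only the shape of the distribution moves, which is the content of the 'impossibility of systems'.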
(c) I invited you to restrict your ideas to the discrete case for a moment; but if you feel that you understand what it means to say that a 'system' or predictable sequence ⟨Z_n⟩_{n≥1} must be adapted to ⟨Σ_n⟩_{n∈N}, in the sense that every Z_n is Σ_{n−1}-measurable, then any further difficulty lies in the measure theory needed to show that the integrals ∫_E Z_{n+1} × W_{n+1} are zero, which is what this book is about.

(d) Consider the gambling system mentioned in 275Cf. Here the idea is that W_n = ±1, as in (a), and Z_{n+1} = 2^n a if X_n ≤ 0, 0 if X_n > 0; that is, the gambler doubles his stake each time until he wins, and then quits. Of course he is almost sure to win eventually, so we have lim_{n→∞} X_n =a.e. a, even though E(X_n) = 0 for every n. We can compute the distribution of X_n: for n ≥ 1 we have
Pr(X_n = a) = 1 − 2^{−n},
Pr(X_n = −(2^n − 1)a) = 2^{−n}.
Thus E(|X_n|) = (2 − 2^{−n+1})a and the almost-everywhere convergence of the X_n is an example of Doob's Martingale Convergence Theorem. In the language of stopping times (275N), X_n = Ỹ_{τ∧n}, where Y_n = Σ_{k=1}^n 2^{k−1}aW_k (so that Y_0 = 0) and τ = min{n : Y_n > 0}.

*276F
I come now to Komlós' theorem. The first step is a trifling refinement of 276C.
Lemma Let (Ω, Σ, µ) be a probability space, and ⟨Σ_n⟩_{n∈N} a non-decreasing sequence of σ-subalgebras of Σ. Suppose that ⟨X_n⟩_{n∈N} is a sequence of random variables on Ω such that
(i) X_n is Σ_n-measurable for each n;
(ii) Σ_{n=0}^∞ (1/(n+1)²)E(X_n²) is finite;
(iii) lim_{n→∞} X′_n =a.e. 0, where X′_n is a conditional expectation of X_n on Σ_{n−1} for each n ≥ 1.
Then lim_{n→∞} (1/(n+1)) Σ_{k=0}^n X_k =a.e. 0.
proof Making suitable adjustments on a negligible set if necessary, we may suppose that X′_n is Σ_{n−1}-measurable for n ≥ 1 and that every X_n and X′_n is defined on the whole of Ω. Set X′_0 = X_0 and Y_n = X_n − X′_n for n ∈ N. Then ⟨Y_n⟩_{n∈N} is a martingale difference sequence adapted to ⟨Σ_n⟩_{n∈N}. Also E(Y_n²) ≤ E(X_n²) for every n. P P If n ≥ 1, X′_n is square-integrable (244M), and E(Y_n × X′_n) = 0, as in part (b) of the proof of 276B. Now
$$E(X_n^2) = E\bigl((Y_n + X_n')^2\bigr) = E(Y_n^2) + 2E(Y_n \times X_n') + E\bigl((X_n')^2\bigr) \ge E(Y_n^2).\ \text{Q Q}$$
This means that Σ_{n=0}^∞ (1/(n+1)²)E(Y_n²) must be finite. By 276C, lim_{n→∞} (1/(n+1)) Σ_{i=0}^n Y_i =a.e. 0. But by 273Ca we also have lim_{n→∞} (1/(n+1)) Σ_{i=0}^n X′_i = 0 whenever lim_{n→∞} X′_n = 0, which is almost everywhere. So lim_{n→∞} (1/(n+1)) Σ_{i=0}^n X_i =a.e. 0.
*276G Lemma Let (Ω, Σ, µ) be a probability space, and ⟨X_n⟩_{n∈N} a sequence of random variables on Ω such that sup_{n∈N} E(|X_n|) is finite. For k ∈ N and x ∈ R set F_k(x) = x if |x| ≤ k, 0 otherwise. Let F be an ultrafilter on N.
(a) For each k ∈ N there is a measurable function Y_k : Ω → [−k, k] such that lim_{n→F} ∫_E F_k(X_n) = ∫_E Y_k for every E ∈ Σ.
(b) lim_{n→F} E((F_k(X_n) − Y_k)²) ≤ lim_{n→F} E(F_k(X_n)²) for each k.
(c) Y = lim_{k→∞} Y_k is defined a.e. and lim_{k→∞} E(|Y − Y_k|) = 0.

proof (a) For each k, |F_k(X_n)| ≤a.e. kχΩ for every n, so that {F_k(X_n) : n ∈ N} is uniformly integrable, and {F_k(X_n)• : n ∈ N} is relatively weakly compact in L¹ = L¹(µ) (247C). Accordingly v_k = lim_{n→F} F_k(X_n)• is defined in L¹ (2A3Se); take Y_k : Ω → R to be a measurable function such that Y_k• = v_k. For any E ∈ Σ,
$$\int_E Y_k = \int v_k \times (\chi E)^\bullet = \lim_{n\to\mathcal{F}}\int_E F_k(X_n).$$
In particular,
$$\Bigl|\int_E Y_k\Bigr| \le \sup_{n\in\mathbb{N}}\Bigl|\int_E F_k(X_n)\Bigr| \le k\mu E$$
for every E, so that {ω : Y_k(ω) > k} and {ω : Y_k(ω) < −k} are both negligible; changing Y_k on a negligible set if necessary, we may suppose that |Y_k(ω)| ≤ k for every ω ∈ Ω.

(b) Because Y_k is bounded, Y_k• ∈ L^∞(µ), and
$$\lim_{n\to\mathcal{F}}\int F_k(X_n) \times Y_k = \lim_{n\to\mathcal{F}}\int F_k(X_n)^\bullet \times Y_k^\bullet = \int Y_k^\bullet \times Y_k^\bullet = \int Y_k^2.$$
Accordingly
$$\lim_{n\to\mathcal{F}}\int (F_k(X_n) - Y_k)^2 = \lim_{n\to\mathcal{F}}\int F_k(X_n)^2 - 2\lim_{n\to\mathcal{F}}\int F_k(X_n) \times Y_k + \int Y_k^2 = \lim_{n\to\mathcal{F}}\int F_k(X_n)^2 - \int Y_k^2 \le \lim_{n\to\mathcal{F}}\int F_k(X_n)^2.$$
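(c) Set $W_0=Y_0=0$ and $W_k=Y_k-Y_{k-1}$ for $k\ge1$. Then $\mathbb E(|W_k|)\le\lim_{n\to\mathcal F}\mathbb E(|F_k(X_n)-F_{k-1}(X_n)|)$ for every $k\ge1$. P P Set $E=\{\omega:W_k(\omega)\ge0\}$. Then
$$\int_EW_k=\int_EY_k-\int_EY_{k-1}=\lim_{n\to\mathcal F}\int_EF_k(X_n)-\lim_{n\to\mathcal F}\int_EF_{k-1}(X_n)=\lim_{n\to\mathcal F}\int_E\big(F_k(X_n)-F_{k-1}(X_n)\big)\le\lim_{n\to\mathcal F}\int_E|F_k(X_n)-F_{k-1}(X_n)|.$$
Similarly, $|\int_{\Omega\setminus E}W_k|\le\lim_{n\to\mathcal F}\int_{\Omega\setminus E}|F_k(X_n)-F_{k-1}(X_n)|$. So
$$\mathbb E(|W_k|)=\int_EW_k-\int_{\Omega\setminus E}W_k\le\lim_{n\to\mathcal F}\int|F_k(X_n)-F_{k-1}(X_n)|.$$
Q Q
It follows that $\sum_{k=0}^\infty\mathbb E(|W_k|)$ is finite. P P For any $m\ge1$,
$$\sum_{k=0}^m\mathbb E(|W_k|)\le\sum_{k=1}^m\lim_{n\to\mathcal F}\mathbb E(|F_k(X_n)-F_{k-1}(X_n)|)=\lim_{n\to\mathcal F}\mathbb E\Big(\sum_{k=1}^m|F_k(X_n)-F_{k-1}(X_n)|\Big)=\lim_{n\to\mathcal F}\mathbb E(|F_m(X_n)|)\le\sup_{n\in\mathbb N}\mathbb E(|X_n|).$$
So $\sum_{k=0}^\infty\mathbb E(|W_k|)\le\sup_{n\in\mathbb N}\mathbb E(|X_n|)$ is finite. Q Q
By B. Levi’s theorem (123A), $\lim_{m\to\infty}\sum_{k=0}^m|W_k|$ is finite a.e., so that
$$Y=\lim_{m\to\infty}Y_m=\sum_{k=0}^\infty W_k$$
is defined a.e.; and moreover
$$\mathbb E(|Y-Y_k|)\le\lim_{m\to\infty}\mathbb E\Big(\sum_{j=k+1}^m|W_j|\Big)\to0$$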
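as $k\to\infty$.

*276H Komlós’ theorem (Komlós 67) Let $(\Omega,\Sigma,\mu)$ be any measure space, and $\langle X_n\rangle_{n\in\mathbb N}$ a sequence of integrable real-valued functions on Ω such that $\sup_{n\in\mathbb N}\int|X_n|$ is finite. Then there are a subsequence $\langle X'_n\rangle_{n\in\mathbb N}$ of $\langle X_n\rangle_{n\in\mathbb N}$ and an integrable function Y such that $Y=_{\text{a.e.}}\lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^nX''_i$ whenever $\langle X''_n\rangle_{n\in\mathbb N}$ is a subsequence of $\langle X'_n\rangle_{n\in\mathbb N}$.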
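The truncations $F_k$ of 276G are reused in the proof that follows. The bound on $\sum_k\mathbb E(|W_k|)$ in 276G(c) rests on the pointwise telescoping identity $\sum_{k=1}^m|F_k(x)-F_{k-1}(x)|=|F_m(x)|$, which holds because $F_k(x)-F_{k-1}(x)$ is x when $k-1<|x|\le k$ and 0 otherwise. A short Python check of the identity on sample points (the function names are illustrative, not from the text):

```python
def truncate(k: int, x: float) -> float:
    """F_k(x) = x if |x| <= k, and 0 otherwise (the truncation of 276G)."""
    return x if abs(x) <= k else 0.0


def telescoped_sum(m: int, x: float) -> float:
    """sum_{k=1}^{m} |F_k(x) - F_{k-1}(x)|; at most one term is non-zero."""
    return sum(abs(truncate(k, x) - truncate(k - 1, x)) for k in range(1, m + 1))


samples = [0.0, 0.3, -0.999, 1.0, -2.5, 3.7, 10.0, -42.0]
for m in range(12):
    for x in samples:
        # The identity used in 276G(c): the sum collapses to |F_m(x)|.
        assert telescoped_sum(m, x) == abs(truncate(m, x))
```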
proof Since neither the hypothesis nor the conclusion is affected by changing the $X_n$ on a negligible set, we may suppose throughout that every $X_n$ is measurable and defined on the whole of Ω. In addition, to begin with (down to the end of (e) below), let us suppose that $\mu\Omega=1$. As in 276G, set $F_k(x)=x$ for $|x|\le k$, 0 for $|x|>k$.

(a) Let $\mathcal F$ be any non-principal ultrafilter on $\mathbb N$ (2A1O). For $j\in\mathbb N$ set $p_j=\lim_{n\to\mathcal F}\Pr(|X_n|>j)$. Then $\sum_{j=0}^\infty p_j$ is finite. P P For any $k\in\mathbb N$,
$$\sum_{j=0}^kp_j=\sum_{j=0}^k\lim_{n\to\mathcal F}\Pr(|X_n|>j)=\lim_{n\to\mathcal F}\sum_{j=0}^k\Pr(|X_n|>j)\le\lim_{n\to\mathcal F}\Big(1+\int|X_n|\Big)\le1+\sup_{n\in\mathbb N}\int|X_n|,$$
because $\sum_{j=0}^k\chi\{\omega:|X_n(\omega)|>j\}\le1+|X_n|$. So $\sum_{j=0}^\infty p_j\le1+\sup_{n\in\mathbb N}\int|X_n|$ is finite. Q Q
Setting $p'_j=p_j-p_{j+1}=\lim_{n\to\mathcal F}\Pr(j<|X_n|\le j+1)$ for each j, we have
$$\sum_{j=0}^\infty(j+1)p'_j=\lim_{m\to\infty}\Big(\sum_{j=0}^m(j+1)p_j-\sum_{j=0}^m(j+1)p_{j+1}\Big)=\lim_{m\to\infty}\Big(\sum_{j=0}^mp_j-(m+1)p_{m+1}\Big)\le\sum_{j=0}^\infty p_j<\infty.$$
Next, $\lim_{n\to\mathcal F}\int F_k(X_n)^2\le\sum_{j=0}^k(j+1)^2p'_j$ for each k. P P Setting $E_{jn}=\{\omega:j<|X_n(\omega)|\le j+1\}$ for $j,n\in\mathbb N$, $F_k(X_n)^2\le\sum_{j=0}^k(j+1)^2\chi E_{jn}$, so
$$\lim_{n\to\mathcal F}\int F_k(X_n)^2\le\lim_{n\to\mathcal F}\sum_{j=0}^k(j+1)^2\mu E_{jn}=\sum_{j=0}^k(j+1)^2p'_j.$$
Q Q

(b) Define $\langle Y_k\rangle_{k\in\mathbb N}$ and $Y=_{\text{a.e.}}\lim_{k\to\infty}Y_k$ from $\langle X_n\rangle_{n\in\mathbb N}$ and $\mathcal F$ as in Lemma 276G. Then
$$J_k=\Big\{n:n\in\mathbb N,\ \int(F_k(X_n)-Y_k)^2\le1+\sum_{j=0}^k(j+1)^2p'_j\Big\}$$
belongs to $\mathcal F$ for every $k\in\mathbb N$. P P By (a) above and 276Gb,
$$\lim_{n\to\mathcal F}\int(F_k(X_n)-Y_k)^2\le\lim_{n\to\mathcal F}\int F_k(X_n)^2\le\sum_{j=0}^k(j+1)^2p'_j.$$
Q Q
Also, of course, $K_k=\{n:n\in\mathbb N,\ \Pr(F_j(X_n)\ne X_n)\le p_j+2^{-j}\text{ for every }j\le k\}$ belongs to $\mathcal F$ for every k.
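The summation by parts in (a), which turns $\sum_{j\le m}(j+1)p'_j$ into $\sum_{j\le m}p_j-(m+1)p_{m+1}$, is a finite identity that can be checked exactly. A sketch with exact rational arithmetic; the sample sequence is an arbitrary non-increasing choice, not data from the text.

```python
from fractions import Fraction


def weighted_differences(p: list, m: int) -> Fraction:
    """sum_{j=0}^{m} (j+1) * (p_j - p_{j+1}); requires len(p) >= m + 2."""
    return sum(((j + 1) * (p[j] - p[j + 1]) for j in range(m + 1)), Fraction(0))


def abel_right_side(p: list, m: int) -> Fraction:
    """sum_{j=0}^{m} p_j - (m+1) * p_{m+1}, the other side of the identity in (a)."""
    return sum((p[j] for j in range(m + 1)), Fraction(0)) - (m + 1) * p[m + 1]


# A hypothetical non-increasing, non-negative sequence p_j = 1/(j+1)^2.
p = [Fraction(1, (j + 1) ** 2) for j in range(50)]
for m in range(len(p) - 1):
    assert weighted_differences(p, m) == abel_right_side(p, m)
    # Since (m+1) p_{m+1} >= 0, the weighted sum is at most sum_{j<=m} p_j.
    assert weighted_differences(p, m) <= sum(p[: m + 1])
```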
(c) For $n,k\in\mathbb N$ let $Z_{kn}$ be a simple function such that $|Z_{kn}|\le|F_k(X_n)-Y_k|$ and $\int|F_k(X_n)-Y_k-Z_{kn}|\le2^{-k}$. For $m\in\mathbb N$ let $\Sigma_m$ be the algebra of subsets of Ω generated by sets of the form $\{\omega:Z_{kn}(\omega)=\alpha\}$ for $k,n\le m$ and $\alpha\in\mathbb R$. Because each $Z_{kn}$ takes only finitely many values, $\Sigma_m$ is finite (and is therefore a σ-subalgebra of Σ); and of course $\Sigma_m\subseteq\Sigma_{m+1}$ for every m.

We need to look at conditional expectations on the $\Sigma_m$, and because $\Sigma_m$ is always finite these have a particularly straightforward expression. Let $\mathcal A_m$ be the set of ‘atoms’, or minimal non-empty sets, in $\Sigma_m$; that is, the set of equivalence classes in Ω under the relation $\omega\sim\omega'$ if $Z_{kn}(\omega)=Z_{kn}(\omega')$ for all $k,n\le m$. For any integrable random variable X on Ω, define $E_m(X)$ by setting
$$E_m(X)(\omega)=\frac{1}{\mu A}\int_AX\quad\text{if }\omega\in A\in\mathcal A_m\text{ and }\mu A>0,\qquad E_m(X)(\omega)=0\quad\text{if }\omega\in A\in\mathcal A_m\text{ and }\mu A=0.$$
Then $E_m(X)$ is a conditional expectation of X on $\Sigma_m$. Now
$$\begin{aligned}\lim_{n\to\mathcal F}\int|E_m(F_k(X_n)-Y_k)|&=\lim_{n\to\mathcal F}\sum_{A\in\mathcal A_m}\int_A|E_m(F_k(X_n)-Y_k)|\\&=\lim_{n\to\mathcal F}\sum_{A\in\mathcal A_m}\Big|\int_AE_m(F_k(X_n)-Y_k)\Big|\qquad(\text{because }E_m(F_k(X_n)-Y_k)\text{ is constant on each }A\in\mathcal A_m)\\&=\lim_{n\to\mathcal F}\sum_{A\in\mathcal A_m}\Big|\int_AF_k(X_n)-Y_k\Big|\\&=\sum_{A\in\mathcal A_m}\lim_{n\to\mathcal F}\Big|\int_AF_k(X_n)-Y_k\Big|=0\end{aligned}$$
by the choice of $Y_k$. So if we set
$$I_m=\Big\{n:n\in\mathbb N,\ \int|E_m(F_k(X_n)-Y_k)|\le2^{-k}\text{ for every }k\le m\Big\},$$
then $I_m\in\mathcal F$ for every m.

(d) Suppose that $\langle r(n)\rangle_{n\in\mathbb N}$ is any strictly increasing sequence in $\mathbb N$ such that $r(0)>0$, $r(n)\in J_n\cap K_n$ for every n and $r(n)\in I_{r(n-1)}$ for $n\ge1$. Then $\frac{1}{n+1}\sum_{i=0}^nX_{r(i)}\to Y$ a.e. as $n\to\infty$. P P Express $X_{r(n)}$ as
$$(X_{r(n)}-F_n(X_{r(n)}))+(F_n(X_{r(n)})-Y_n-Z_{n,r(n)})+Y_n+Z_{n,r(n)}$$
for each n. Taking these pieces in turn:
(i)
$$\sum_{n=0}^\infty\Pr(X_{r(n)}\ne F_n(X_{r(n)}))\le\sum_{n=0}^\infty(p_n+2^{-n})$$
(because $r(n)\in K_n$ for every n)

[…] 0, $s(n)\in J_n\cap K_n$ for every n and $s(n)\in I_{s(n-1)}$ for $n\ge1$; such a sequence exists because $J_n\cap K_n\cap I_{s(n-1)}$ belongs to $\mathcal F$, so is infinite, for every $n\ge1$. Set $X'_n=X_{s(n)}$ for every n. If $\langle X''_n\rangle_{n\in\mathbb N}$ is a subsequence of $\langle X'_n\rangle_{n\in\mathbb N}$, then it is of the form $\langle X_{s(r(n))}\rangle_{n\in\mathbb N}$ for some strictly increasing sequence $\langle r(n)\rangle_{n\in\mathbb N}$. In this case $s(r(0))\ge s(0)>0$, $s(r(n))\in J_{r(n)}\cap K_{r(n)}\subseteq J_n\cap K_n$ for every n, and $s(r(n))\in I_{s(r(n)-1)}\subseteq I_{s(r(n-1))}$ for every $n\ge1$. So (d) tells us that
$$\frac{1}{n+1}\sum_{i=0}^nX''_i\to Y\text{ a.e.}$$
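In (c) the conditional expectation on a finite algebra is just averaging over atoms. A minimal sketch on a finite sample space, assuming the space is encoded (hypothetically) as parallel lists of point masses, values of X and atom labels; it implements the two-case formula for $E_m(X)$ and checks the defining property $\int_AE_m(X)=\int_AX$ on every atom, including a negligible one.

```python
from collections import defaultdict
from fractions import Fraction


def conditional_expectation_on_atoms(prob, values, labels):
    """Return E_m(X) pointwise: the average of X over the atom containing w,
    or 0 when that atom is negligible (the two cases of the formula in (c)).

    prob[w]   -- mass of sample point w (non-negative Fractions summing to 1)
    values[w] -- X(w)
    labels[w] -- label of the atom A in A_m that contains w
    """
    mass = defaultdict(Fraction)      # mu(A) for each atom A
    integral = defaultdict(Fraction)  # integral of X over A
    for p, x, a in zip(prob, values, labels):
        mass[a] += p
        integral[a] += p * x
    return [integral[a] / mass[a] if mass[a] > 0 else Fraction(0) for a in labels]


def integral_over(atom, prob, func_values, labels):
    """Integral over one atom of a function given by its values at sample points."""
    return sum((p * v for p, v, a in zip(prob, func_values, labels) if a == atom),
               Fraction(0))


prob = [Fraction(1, 8), Fraction(1, 8), Fraction(1, 4), Fraction(1, 2), Fraction(0)]
values = [1, 3, -2, 4, 7]
labels = ["A", "A", "B", "B", "C"]  # atom "C" is negligible
cond = conditional_expectation_on_atoms(prob, values, labels)
for atom in set(labels):
    assert integral_over(atom, prob, cond, labels) == integral_over(atom, prob, values, labels)
```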
(f) Thus the theorem is proved in the case in which $(\Omega,\Sigma,\mu)$ is a probability space. Now suppose that µ is σ-finite and $\mu\Omega>0$. In this case there is a strictly positive measurable function $f:\Omega\to\mathbb R$ such that $\int f\,d\mu=1$ (215B(ix)). Let ν be the corresponding indefinite-integral measure (234B), so that ν is a probability measure on Ω, and $\langle\frac1f\times X_n\rangle_{n\in\mathbb N}$ is a sequence of ν-integrable functions such that $\sup_{n\in\mathbb N}\int|\frac1f\times X_n|\,d\nu$ is finite (235M). From (a)-(e) we see that there must be a ν-integrable function Y and a subsequence $\langle X'_n\rangle_{n\in\mathbb N}$ of $\langle X_n\rangle_{n\in\mathbb N}$ such that $\frac{1}{n+1}\sum_{i=0}^n\frac1f\times X''_i\to Y$ ν-a.e. for every subsequence $\langle X''_n\rangle_{n\in\mathbb N}$ of $\langle X'_n\rangle_{n\in\mathbb N}$. But µ and ν have the same negligible sets (234Dc), so $\frac{1}{n+1}\sum_{i=0}^nX''_i\to f\times Y$ µ-a.e. for every subsequence $\langle X''_n\rangle_{n\in\mathbb N}$ of $\langle X'_n\rangle_{n\in\mathbb N}$.
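(g) Since the result is trivial if $\mu\Omega=0$, the theorem is true whenever µ is σ-finite. For the general case, set
$$\tilde\Omega=\bigcup_{n\in\mathbb N}\{\omega:X_n(\omega)\ne0\}=\bigcup_{m,n\in\mathbb N}\{\omega:|X_n(\omega)|\ge2^{-m}\},$$
so that the subspace measure $\mu_{\tilde\Omega}$ is σ-finite. Then there are a $\mu_{\tilde\Omega}$-integrable function $\tilde Y$ and a subsequence $\langle X'_n\rangle_{n\in\mathbb N}$ of $\langle X_n\rangle_{n\in\mathbb N}$ such that $\frac{1}{n+1}\sum_{i=0}^nX''_i\restriction\tilde\Omega\to\tilde Y$ $\mu_{\tilde\Omega}$-a.e. for every subsequence $\langle X''_n\rangle_{n\in\mathbb N}$ of $\langle X'_n\rangle_{n\in\mathbb N}$. Setting $Y(\omega)=\tilde Y(\omega)$ if $\omega\in\tilde\Omega$, 0 for $\omega\in\Omega\setminus\tilde\Omega$, we see that Y is µ-integrable and that $\frac{1}{n+1}\sum_{i=0}^nX''_i\to Y$ µ-a.e. whenever $\langle X''_n\rangle_{n\in\mathbb N}$ is a subsequence of $\langle X'_n\rangle_{n\in\mathbb N}$. This completes the proof.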
276X Basic exercises > (a) Let $\langle X_n\rangle_{n\in\mathbb N}$ be a martingale adapted to a sequence $\langle\Sigma_n\rangle_{n\in\mathbb N}$ of σ-algebras. Show that $\int_EX_n^2\le\int_EX_{n+1}^2$ for every $n\in\mathbb N$, $E\in\Sigma_n$ (allowing ∞ as a value of an integral). (Hint: see the proof of 276B.)

> (b) Let $\langle X_n\rangle_{n\in\mathbb N}$ be a martingale. Show that for any $\epsilon>0$,
$$\Pr(\sup_{n\in\mathbb N}|X_n|\ge\epsilon)\le\frac{1}{\epsilon^2}\sup_{n\in\mathbb N}\mathbb E(X_n^2).$$
(Hint: put 276Xa together with the argument for 275D.)

(c) When does 276Xb give a sharper result than 275Xb?

(d) Let $\langle X_n\rangle_{n\in\mathbb N}$ be a martingale difference sequence and set $Y_n=\frac{1}{n+1}(X_0+\ldots+X_n)$ for each $n\in\mathbb N$. Show that if $\langle X_n\rangle_{n\in\mathbb N}$ is uniformly integrable then $\lim_{n\to\infty}\|Y_n\|_1=0$. (Hint: use the argument of 273Na,
with 276C in place of 273D, and setting $\tilde X_n=X'_n-Z_n$, where $Z_n$ is an appropriate conditional expectation of $X'_n$.)

(e) Strong law of large numbers: fifth form A sequence $\langle X_n\rangle_{n\in\mathbb N}$ of random variables is exchangeable if $(X_{n_0},\ldots,X_{n_k})$ has the same joint distribution as $(X_0,\ldots,X_k)$ whenever $n_0,\ldots,n_k$ are distinct. Show that if $\langle X_n\rangle_{n\in\mathbb N}$ is an exchangeable sequence of random variables with finite expectation, then $\langle\frac{1}{n+1}\sum_{i=0}^nX_i\rangle_{n\in\mathbb N}$ converges a.e. (Hint: 276H.)
276Y Further exercises (a) Let $\langle X_n\rangle_{n\in\mathbb N}$ be a martingale difference sequence such that $\sup_{n\in\mathbb N}\|X_n\|_p$ is finite, where $p\in\,]1,\infty[$. Show that $\lim_{n\to\infty}\|\frac{1}{n+1}\sum_{i=0}^nX_i\|_p=0$. (Hint: 273Nb.)

(b) Let $\langle X_n\rangle_{n\in\mathbb N}$ be a uniformly integrable martingale difference sequence and Y a bounded random variable. Show that $\lim_{n\to\infty}\mathbb E(X_n\times Y)=0$. (Compare 272Yd.)

(c) Use 275Yg to prove 276Xa.

(d) Let $\langle X_n\rangle_{n\in\mathbb N}$ be a sequence of random variables such that, for some $\delta>0$, $\sup_{n\in\mathbb N}n^\delta\mathbb E(|X_n|)$ is finite. Set $S_n=\frac{1}{n+1}(X_0+\ldots+X_n)$ for each n. Show that $\lim_{n\to\infty}S_n=_{\text{a.e.}}0$. (Hint: set $Z_k=2^{-k}(|X_0|+\ldots+|X_{2^k-1}|)$. Show that $\sum_{k=0}^\infty\mathbb E(Z_k)<\infty$. Show that $S_n\le2Z_{k+1}$ if $2^k<n\le2^{k+1}$.)

(e) Strong law of large numbers: sixth form Let $\langle X_n\rangle_{n\in\mathbb N}$ be a martingale difference sequence such that, for some $\delta>0$, $\sup_{n\in\mathbb N}\mathbb E(|X_n|^{1+\delta})$ is finite. Set $S_n=\frac{1}{n+1}(X_0+\ldots+X_n)$ for each n. Show that $\lim_{n\to\infty}S_n=_{\text{a.e.}}0$. (Hint: take a non-decreasing sequence $\langle\Sigma_n\rangle_{n\in\mathbb N}$ to which $\langle X_n\rangle_{n\in\mathbb N}$ is adapted. Set $Y_n=X_n$ when $|X_n|\le n$, 0 otherwise. Let $U_n$ be a conditional expectation of $Y_n$ on $\Sigma_{n-1}$ and set $Z_n=Y_n-U_n$. Use ideas from 273H, 276C and 276Yd above to show that $\frac{1}{n+1}\sum_{i=0}^nV_i\to0$ a.e. for $V_i=Z_i$, $V_i=U_i$, $V_i=X_i-Y_i$.)

(f) Show that there is a martingale $\langle X_n\rangle_{n\in\mathbb N}$ which converges in measure but is not convergent a.e. (Compare 273Ba.) (Hint: arrange that $\{\omega:X_{n+1}(\omega)\ne0\}=E_n\subseteq\{\omega:|X_{n+1}(\omega)-X_n(\omega)|\ge1\}$, where $\langle E_n\rangle_{n\in\mathbb N}$ is an independent sequence of sets and $\mu E_n=\frac{1}{n+1}$ for each n.)
(g) Give an example of an identically distributed martingale difference sequence $\langle X_n\rangle_{n\in\mathbb N}$ such that $\langle\frac{1}{n+1}(X_0+\ldots+X_n)\rangle_{n\in\mathbb N}$ does not converge to 0 almost everywhere. (Hint: start by devising a uniformly bounded sequence $\langle U_n\rangle_{n\in\mathbb N}$ such that $\lim_{n\to\infty}\mathbb E(|U_n|)=0$ but $\langle\frac{1}{n+1}(U_0+\ldots+U_n)\rangle_{n\in\mathbb N}$ does not converge to 0 almost everywhere. Now repeat your construction in such a context that the $U_n$ can be derived from an identically distributed martingale difference sequence by the formulae of 276Ye.)

(h) Construct a proof of Komlós’ theorem which does not involve ultrafilters, or any other use of the full axiom of choice, but proceeds throughout by selecting appropriate sub-subsequences. Remember to check that you can prove any fact you use about weakly convergent sequences in $L^1$ on the same rules.

276 Notes and comments I include two more versions of the strong law of large numbers (276C, 276Ye) not because I have any applications in mind but because I think that if you know the strong law for $\|\,\|_{1+\delta}$-bounded independent sequences, and what a martingale difference sequence is, then there is something missing if you do not know the strong law for $\|\,\|_{1+\delta}$-bounded martingale difference sequences. And then, of course, I have to add 276Yf and 276Yg, lest you be tempted to think that the strong law is ‘really’ about martingale difference sequences rather than about independent sequences.

Komlós’ theorem is rather outside the scope of this volume; it is quite hard work and surely much less important, to most probabilists, than many results I have omitted. It does provide a quick proof of 276Xe. However it is relevant to questions arising in some topics treated in Volumes 3 and 4, and the proof fits naturally into this section.
Chapter 28 Fourier analysis

For the last chapter of this volume, I attempt a brief account of one of the most important topics in analysis. This is a bold enterprise, and I cannot hope to satisfy the reasonable demands of anyone who knows and loves the subject as it deserves. But I also cannot pass it by without being false to my own subject, since problems contributed by the study of Fourier series and transforms have led measure theory throughout its history. What I will try to do, therefore, is to give versions of those results which everyone ought to know in language unifying them with the rest of this treatise, aiming to open up a channel for the transfer of intuitions and techniques between the abstract general study of measure spaces, which is the centre of our work, and this particular family of applications of the theory of integration.

I have divided the material of this chapter, conventionally enough, into three parts: Fourier series, Fourier transforms and the characteristic functions of probability theory. While it will be obvious that many ideas are common to all three, I do not think it useful, at this stage, to try to formulate an explicit generalization to unify them; that belongs to a more general theory of harmonic analysis on groups, which must wait until Volume 4. I begin however with a section on the Stone-Weierstrass theorem (§281), which is one of the basic tools of functional analysis, as well as being useful for this chapter. The final section (§286), a proof of Carleson’s theorem, is at a rather different level from the rest.
281 The Stone-Weierstrass theorem

Before we begin work on the real subject of this chapter, it will be helpful to have a reasonably general statement of a fundamental theorem on the approximation of continuous functions. In fact I give a variety of forms (281A, 281E, 281F and 281G, together with 281Ya, 281Yd and 281Yg), all of which are sometimes useful. I end the section with a version of Weyl’s Equidistribution Theorem (281M-281N).

281A Stone-Weierstrass theorem: first form Let X be a topological space and K a compact subset of X. Write $C_b(X)$ for the space of all bounded continuous real-valued functions on X, so that $C_b(X)$ is a linear space over $\mathbb R$. Let $A\subseteq C_b(X)$ be such that
A is a linear subspace of $C_b(X)$;
$|f|\in A$ for every $f\in A$;
$\chi X\in A$;
whenever x, y are distinct points of K there is an $f\in A$ such that $f(x)\ne f(y)$.
Then for every continuous $h:K\to\mathbb R$ and $\epsilon>0$ there is an $f\in A$ such that
$|f(x)-h(x)|\le\epsilon$ for every $x\in K$,
if $K\ne\emptyset$, $\inf_{x\in X}f(x)\ge\inf_{x\in K}h(x)$ and $\sup_{x\in X}f(x)\le\sup_{x\in K}h(x)$.

Remark I have stated this theorem in its natural context, that of general topological spaces. But if these are unfamiliar to you, you do not in fact need to know what they are. If you read ‘let X be a topological space’ as ‘let X be a subset of $\mathbb R^r$’ and ‘K is a compact subset of X’ as ‘K is a subset of X which is closed and bounded in $\mathbb R^r$’, you will have enough for all the applications in this chapter. In order to follow the proof, of course, you will need to know a little about compactness in $\mathbb R^r$; I have written out the necessary facts in §2A2.

proof (a) If K is empty, then we can take f to be the constant function 0. So henceforth let us suppose that $K\ne\emptyset$.

(b) The first point to note is that if $f,g\in A$ then $f\wedge g$ and $f\vee g$ belong to A, where $(f\wedge g)(x)=\min(f(x),g(x))$ and $(f\vee g)(x)=\max(f(x),g(x))$ for every $x\in X$; this is because
$$f\wedge g=\tfrac12(f+g-|f-g|),\qquad f\vee g=\tfrac12(f+g+|f-g|).$$
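The identities in (b) are what let a linear space closed under $|\cdot|$ inherit $\wedge$ and $\vee$. Pointwise they are elementary arithmetic; a quick Python check on a few arbitrarily chosen values (dyadic, so that the floating-point arithmetic here is exact):

```python
def meet(f_val: float, g_val: float) -> float:
    """(f ∧ g)(x) computed as (f(x) + g(x) - |f(x) - g(x)|) / 2."""
    return (f_val + g_val - abs(f_val - g_val)) / 2


def join(f_val: float, g_val: float) -> float:
    """(f ∨ g)(x) computed as (f(x) + g(x) + |f(x) - g(x)|) / 2."""
    return (f_val + g_val + abs(f_val - g_val)) / 2


samples = [-3.5, -1.0, 0.0, 0.25, 2.0, 7.75]
for a in samples:
    for b in samples:
        assert meet(a, b) == min(a, b)
        assert join(a, b) == max(a, b)
```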
It follows by induction on n that $f_0\wedge\ldots\wedge f_n$ and $f_0\vee\ldots\vee f_n$ belong to A for all $f_0,\ldots,f_n\in A$.

(c) If x, y are distinct points of K, and $a,b\in\mathbb R$, there is an $f\in A$ such that $f(x)=a$ and $f(y)=b$. P P Start from $g\in A$ such that $g(x)\ne g(y)$; this is the point at which we use the last of the list of four hypotheses on A. Set
$$\alpha=\frac{a-b}{g(x)-g(y)},\qquad\beta=\frac{bg(x)-ag(y)}{g(x)-g(y)},\qquad f=\alpha g+\beta\chi X\in A.$$
Q Q
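The formulas for α and β in (c) amount to solving two linear equations. A sketch with exact rationals, where the numbers g(x), g(y), a, b are arbitrary illustrative inputs and the function name is hypothetical:

```python
from fractions import Fraction


def interpolate(gx, gy, a, b):
    """Return (alpha, beta) such that alpha*g + beta*chi_X equals a at x and b at y.

    gx, gy are the values g(x), g(y); they must differ, which is exactly the
    separation hypothesis used in step (c) of 281A.
    """
    if gx == gy:
        raise ValueError("g must take different values at x and y")
    alpha = Fraction(a - b) / (gx - gy)
    beta = Fraction(b * gx - a * gy) / (gx - gy)
    return alpha, beta


alpha, beta = interpolate(3, -1, 5, 2)
assert alpha * 3 + beta == 5   # f(x) = a
assert alpha * -1 + beta == 2  # f(y) = b
```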
(d) (The heart of the proof lies in the next two paragraphs.) Let $h:K\to[0,\infty[$ be a continuous function and x any point of K. For any $\epsilon>0$, there is an $f\in A$ such that $f(x)=h(x)$ and $f(y)\le h(y)+\epsilon$ for every $y\in K$. P P Let $\mathcal G_x$ be the family of those open sets $G\subseteq X$ for which there is some $f\in A$ such that $f(x)=h(x)$ and $f(w)\le h(w)+\epsilon$ for every $w\in K\cap G$. I claim that $K\subseteq\bigcup\mathcal G_x$. To see this, take any $y\in K$. By (c), there is an $f\in A$ such that $f(x)=h(x)$ and $f(y)=h(y)$. Now $h-f\restriction K:K\to\mathbb R$ is a continuous function, taking the value 0 at y, so there is an open subset G of X, containing y, such that $(h-f\restriction K)(w)\ge-\epsilon$ for every $w\in G\cap K$, that is, $f(w)\le h(w)+\epsilon$ for every $w\in G\cap K$. Thus $G\in\mathcal G_x$ and $y\in\bigcup\mathcal G_x$, as required. Because K is compact, $\mathcal G_x$ has a finite subcover $G_0,\ldots,G_n$ say. For each $i\le n$, take $f_i\in A$ such that $f_i(x)=h(x)$ and $f_i(w)\le h(w)+\epsilon$ for every $w\in G_i\cap K$. Then $f=f_0\wedge f_1\wedge\ldots\wedge f_n\in A$, by (b), and evidently $f(x)=h(x)$, while if $y\in K$ there is some $i\le n$ such that $y\in G_i$, so that $f(y)\le f_i(y)\le h(y)+\epsilon$. Q Q

(e) If $h:K\to\mathbb R$ is any continuous function and $\epsilon>0$, there is an $f\in A$ such that $|f(y)-h(y)|\le\epsilon$ for every $y\in K$. P P This time, let $\mathcal G$ be the set of those open subsets G of X for which there is some $f\in A$ such that $f(y)\le h(y)+\epsilon$ for every $y\in K$ and $f(x)\ge h(x)-\epsilon$ for every $x\in G\cap K$. Once again, $\mathcal G$ is an open cover of K. To see this, take any $x\in K$. By (d), there is an $f\in A$ such that $f(x)=h(x)$ and $f(y)\le h(y)+\epsilon$ for every $y\in K$. Now $h-f\restriction K:K\to\mathbb R$ is a continuous function which is zero at x, so there is an open subset G of X, containing x, such that $(h-f\restriction K)(w)\le\epsilon$ for every $w\in G\cap K$, that is, $f(w)\ge h(w)-\epsilon$ for every $w\in G\cap K$. Thus $G\in\mathcal G$ and $x\in\bigcup\mathcal G$, as required. Because K is compact, $\mathcal G$ has a finite subcover $G_0,\ldots,G_m$ say. For each $j\le m$, take $f_j\in A$ such that $f_j(y)\le h(y)+\epsilon$ for every $y\in K$ and $f_j(w)\ge h(w)-\epsilon$ for every $w\in G_j\cap K$. Then $f=f_0\vee f_1\vee\ldots\vee f_m\in A$, by (b), and evidently $f(y)\le h(y)+\epsilon$ for every $y\in K$, while if $x\in K$ there is some $j\le m$ such that $x\in G_j$, so that $f(x)\ge f_j(x)\ge h(x)-\epsilon$. Thus $|f(x)-h(x)|\le\epsilon$ for every $x\in K$, as required. Q Q

(f) Thus we have an f satisfying the first of the two requirements of the theorem. But for the second, set $M_0=\inf_{x\in K}h(x)$ and $M_1=\sup_{x\in K}h(x)$, and $f_1=(f\vee M_0\chi X)\wedge M_1\chi X$; since $M_0\le h(x)\le M_1$ for every $x\in K$, this truncation does not increase $|f(x)-h(x)|$ for $x\in K$, so $f_1$ satisfies the second condition as well as the first. (I am tacitly assuming here what is in fact the case, that $M_0$ and $M_1$ are finite; this is because K is compact – see 2A2G or 2A3N.)

281B We need some simple tools, belonging to the basic theory of normed spaces; but I hope they will be accessible even if you have not encountered ‘normed spaces’ before, if you keep a finger at the beginning of §2A4 as you read the next lemma.

Lemma Let X be any set. Write $\ell^\infty(X)$ for the set of bounded functions from X to $\mathbb R$. For $f\in\ell^\infty(X)$, set $\|f\|_\infty=\sup_{x\in X}|f(x)|$, counting the supremum as 0 if X is empty. Then
(a) $\ell^\infty(X)$ is a normed space.
(b) Let $A\subseteq\ell^\infty(X)$ be a subset and $\overline A$ its closure (2A3D).
(i) If A is a linear subspace of $\ell^\infty(X)$, so is $\overline A$.
(ii) If $f\times g\in A$ whenever $f,g\in A$, then $f\times g\in\overline A$ whenever $f,g\in\overline A$.
(iii) If $|f|\in A$ whenever $f\in A$, then $|f|\in\overline A$ whenever $f\in\overline A$.

proof (a) This is a routine verification. To confirm that $\ell^\infty(X)$ is a linear space over $\mathbb R$, we have to check that $f+g$, $cf$ belong to $\ell^\infty(X)$ whenever $f,g\in\ell^\infty(X)$ and $c\in\mathbb R$; simultaneously we can confirm that $\|\,\|_\infty$ is a norm on $\ell^\infty(X)$ by observing that
$$|(f+g)(x)|\le|f(x)|+|g(x)|\le\|f\|_\infty+\|g\|_\infty,\qquad|cf(x)|=|c||f(x)|\le|c|\|f\|_\infty$$
whenever $f,g\in\ell^\infty(X)$ and $c\in\mathbb R$. It is worth noting at the same time that if $f,g\in\ell^\infty(X)$, then $|(f\times g)(x)|=|f(x)||g(x)|\le\|f\|_\infty\|g\|_\infty$ for every $x\in X$, so that $\|f\times g\|_\infty\le\|f\|_\infty\|g\|_\infty$. (Of course all these remarks are very elementary special cases of parts of §243; see 243Xl.)

(b) Recall that $\overline A=\{f:f\in\ell^\infty(X),\ \forall\,\epsilon>0\ \exists\,f_1\in A,\ \|f-f_1\|_\infty\le\epsilon\}$ (2A3Kb). Take $f,g\in\overline A$ and $c\in\mathbb R$, and let $\epsilon>0$. Set
$$\eta=\min\Big(1,\frac{\epsilon}{2+|c|+\|f\|_\infty+\|g\|_\infty}\Big)>0.$$
Then there are $f_1,g_1\in A$ such that $\|f-f_1\|_\infty\le\eta$, $\|g-g_1\|_\infty\le\eta$. Now
$$\|(f+g)-(f_1+g_1)\|_\infty\le\|f-f_1\|_\infty+\|g-g_1\|_\infty\le2\eta\le\epsilon,$$
$$\|cf-cf_1\|_\infty=|c|\|f-f_1\|_\infty\le|c|\eta\le\epsilon,$$
$$\begin{aligned}\|(f\times g)-(f_1\times g_1)\|_\infty&=\|(f-f_1)\times g+f\times(g-g_1)-(f-f_1)\times(g-g_1)\|_\infty\\&\le\|(f-f_1)\times g\|_\infty+\|f\times(g-g_1)\|_\infty+\|(f-f_1)\times(g-g_1)\|_\infty\\&\le\|f-f_1\|_\infty\|g\|_\infty+\|f\|_\infty\|g-g_1\|_\infty+\|f-f_1\|_\infty\|g-g_1\|_\infty\\&\le\eta(\|g\|_\infty+\|f\|_\infty+\eta)\le\eta(\|g\|_\infty+\|f\|_\infty+1)\le\epsilon,\end{aligned}$$
$$\||f|-|f_1|\|_\infty\le\|f-f_1\|_\infty\le\eta\le\epsilon.$$
(i) If A is a linear subspace, then $f_1+g_1$ and $cf_1$ belong to A. As ε is arbitrary, $f+g$ and $cf$ belong to $\overline A$. As f, g and c are arbitrary, $\overline A$ is a linear subspace of $\ell^\infty(X)$.
(ii) If A is closed under multiplication, then $f_1\times g_1\in A$. As ε is arbitrary, $f\times g\in\overline A$.
(iii) If the absolute values of functions in A belong to A, then $|f_1|\in A$. As ε is arbitrary, $|f|\in\overline A$.

281C Lemma There is a sequence $\langle p_n\rangle_{n\in\mathbb N}$ of real polynomials such that $\lim_{n\to\infty}p_n(x)=|x|$ uniformly for $x\in[-1,1]$.

proof (a) By the Binomial Theorem we have
$$(1-x)^{1/2}=1-\tfrac12x-\frac{1}{4\cdot2!}x^2-\frac{1\cdot3}{2^3\cdot3!}x^3-\ldots=-\sum_{n=0}^\infty\frac{(2n)!}{(2n-1)(2^nn!)^2}x^n$$
whenever $|x|<1$, with the convergence being uniform on any interval $[-a,a]$ with $0\le a<1$. (For a proof of this, see almost any book on real or complex analysis. If you have no favourite text to hand, you can try to construct a proof from the following facts: (i) the radius of convergence of the series is 1, so on any interval
$[-a,a]$, with $0\le a<1$, it is uniformly absolutely summable; (ii) writing $f(x)$ for the sum of the series for $|x|<1$, use Lebesgue’s Dominated Convergence Theorem to find expressions for the indefinite integrals $\int_0^xf$, $-\int_{-x}^0f$ and show that these are $\frac23(1-(1-x)f(x))$, $\frac23(1-(1+x)f(-x))$ for $0\le x<1$; (iii) use the Fundamental Theorem of Calculus to show that $f(x)+2(1-x)f'(x)=0$; (iv) show that $\frac{d}{dx}\Big(\frac{f(x)^2}{1-x}\Big)=0$ and hence (v) that $f(x)^2=1-x$ whenever $|x|<1$. Finally, show that because f is continuous and non-zero in $]-1,1[$, $f(x)$ must be the positive square root of $1-x$ throughout.)

We have a further fragment of information. If we set
$$q_0(x)=1,\qquad q_1(x)=1-\tfrac12x,\qquad q_n(x)=-\sum_{k=0}^n\frac{(2k)!}{(2k-1)(2^kk!)^2}x^k$$
for $n\ge2$ and $x\in[0,1]$, so that $q_n$ is the nth partial sum of the binomial series for $(1-x)^{1/2}$, then we have $\lim_{n\to\infty}q_n(x)=(1-x)^{1/2}$ for every $x\in[0,1[$. But also every $q_n$ is non-increasing on $[0,1]$, and $\langle q_n(x)\rangle_{n\in\mathbb N}$ is also a non-increasing sequence for each $x\in[0,1]$. So we must have
$$\sqrt{1-x}\le q_n(x)\quad\forall\,n\in\mathbb N,\ x\in[0,1[,$$
and therefore, because all the $q_n$ are continuous,
$$\sqrt{1-x}\le q_n(x)\quad\forall\,n\in\mathbb N,\ x\in[0,1].$$
Moreover, given $\epsilon>0$, set $a=1-\frac14\epsilon^2$, so that $\sqrt{1-a}=\frac\epsilon2$. Then there is an $n_0\in\mathbb N$ such that $q_n(x)-\sqrt{1-x}\le\frac\epsilon2$ for every $x\in[0,a]$ and $n\ge n_0$. In particular, $q_n(a)\le\epsilon$, so $q_n(x)\le\epsilon$ and $q_n(x)-\sqrt{1-x}\le\epsilon$ for every $x\in[a,1]$, $n\ge n_0$. This means that
$$0\le q_n(x)-\sqrt{1-x}\le\epsilon\quad\forall\,n\ge n_0,\ x\in[0,1];$$
as ε is arbitrary, $\langle q_n(x)\rangle_{n\in\mathbb N}\to\sqrt{1-x}$ uniformly on $[0,1]$.

(b) Now set $p_n(x)=q_n(1-x^2)$ for $x\in\mathbb R$. Because each $q_n$ is a real polynomial of degree n, each $p_n$ is a real polynomial of degree 2n. Next,
$$\sup_{|x|\le1}|p_n(x)-|x||=\sup_{|x|\le1}\Big|q_n(1-x^2)-\sqrt{1-(1-x^2)}\Big|=\sup_{y\in[0,1]}|q_n(y)-\sqrt{1-y}|\to0$$
as $n\to\infty$, so $\lim_{n\to\infty}p_n(x)=|x|$ uniformly for $|x|\le1$, as required.

281D Corollary Let X be a set, and A a norm-closed linear subspace of $\ell^\infty(X)$ containing χX and such that $f\times g\in A$ whenever $f,g\in A$. Then $|f|\in A$ for every $f\in A$.

proof Set $f_1=\frac{1}{1+\|f\|_\infty}f$, so that $f_1\in A$ and $\|f_1\|_\infty\le1$. Because A contains χX and is closed under multiplication, $p\circ f_1\in A$ for every polynomial p with real coefficients. In particular, $g_n=p_n\circ f_1\in A$ for every n, where $\langle p_n\rangle_{n\in\mathbb N}$ is the sequence of 281C. Now, because $|f_1(x)|\le1$ for every $x\in X$,
$$\|g_n-|f_1|\|_\infty=\sup_{x\in X}|p_n(f_1(x))-|f_1(x)||\le\sup_{|y|\le1}|p_n(y)-|y||\to0$$
as $n\to\infty$. Because A is $\|\,\|_\infty$-closed, $|f_1|\in A$; consequently $|f|\in A$, as claimed.

281E Stone-Weierstrass theorem: second form Let X be a topological space and K a compact subset of X. Write $C_b(X)$ for the space of all bounded continuous real-valued functions on X. Let $A\subseteq C_b(X)$ be such that
A is a linear subspace of $C_b(X)$;
$f\times g\in A$ for every $f,g\in A$;
$\chi X\in A$;
whenever x, y are distinct points of K there is an $f\in A$ such that $f(x)\ne f(y)$.
Then for every continuous $h:K\to\mathbb R$ and $\epsilon>0$ there is an $f\in A$ such that
$|f(x)-h(x)|\le\epsilon$ for every $x\in K$,
if $K\ne\emptyset$, $\inf_{x\in X}f(x)\ge\inf_{x\in K}h(x)$ and $\sup_{x\in X}f(x)\le\sup_{x\in K}h(x)$.

proof Let $\overline A$ be the $\|\,\|_\infty$-closure of A in $\ell^\infty(X)$. It is helpful to know that $\overline A\subseteq C_b(X)$; this is because the uniform limit of continuous functions is continuous. (But if this is new to you, or your memory has faded, don’t take time to look it up now; just read ‘$\overline A\cap C_b(X)$’ in place of ‘$\overline A$’ in the rest of this argument.) By 281B-281D, $\overline A$ is a linear subspace of $C_b(X)$ and $|f|\in\overline A$ for every $f\in\overline A$, so the conditions of 281A apply to $\overline A$.

Take a continuous $h:K\to\mathbb R$ and an $\epsilon>0$. The cases in which $K=\emptyset$ or h is constant are trivial, because all constant functions belong to A; so I suppose that $M_0=\inf_{x\in K}h(x)$ and $M_1=\sup_{x\in K}h(x)$ are defined and distinct. As observed at the end of the proof of 281A, $M_0$ and $M_1$ are finite. Set
$$\eta=\min\big(\tfrac13\epsilon,\tfrac12(M_1-M_0)\big)>0,\qquad\tilde h(x)=\min(\max(h(x),M_0+\eta),M_1-\eta)\text{ for }x\in K,$$
so that $\tilde h:K\to\mathbb R$ is continuous and $M_0+\eta\le\tilde h(x)\le M_1-\eta$ for every $x\in K$. By 281A, there is an $f_0\in\overline A$ such that $|f_0(x)-\tilde h(x)|\le\eta$ for every $x\in K$ and $M_0+\eta\le f_0(x)\le M_1-\eta$ for every $x\in X$. Now there is an $f\in A$ such that $\|f-f_0\|_\infty\le\eta$, so that
$$|f(x)-h(x)|\le|f(x)-f_0(x)|+|f_0(x)-\tilde h(x)|+|\tilde h(x)-h(x)|\le3\eta\le\epsilon$$
for every $x\in K$, while
$$M_0\le f_0(x)-\eta\le f(x)\le f_0(x)+\eta\le M_1$$
for every $x\in X$.

281F Corollary: Weierstrass’ theorem Let K be any closed bounded subset of $\mathbb R$. Then every continuous $h:K\to\mathbb R$ can be uniformly approximated on K by polynomials.

proof Apply 281E with X = K (noting that K, being closed and bounded, is compact), and A the set of polynomials with real coefficients, regarded as functions from K to $\mathbb R$.

281G Stone-Weierstrass theorem: third form Let X be a topological space and K a compact subset of X. Write $C_b(X;\mathbb C)$ for the space of all bounded continuous complex-valued functions on X, so that $C_b(X;\mathbb C)$ is a linear space over $\mathbb C$. Let $A\subseteq C_b(X;\mathbb C)$ be such that
A is a linear subspace of $C_b(X;\mathbb C)$;
$f\times g\in A$ for every $f,g\in A$;
$\chi X\in A$;
the complex conjugate $\bar f$ of f belongs to A for every $f\in A$;
whenever x, y are distinct points of K there is an $f\in A$ such that $f(x)\ne f(y)$.
Then for every continuous $h:K\to\mathbb C$ and $\epsilon>0$ there is an $f\in A$ such that
$|f(x)-h(x)|\le\epsilon$ for every $x\in K$,
if $K\ne\emptyset$, $\sup_{x\in X}|f(x)|\le\sup_{x\in K}|h(x)|$.

proof If $K=\emptyset$, or h is identically zero, we can take f = 0. So let us suppose that $M=\sup_{x\in K}|h(x)|>0$.

(a) Set $A_{\mathbb R}=\{f:f\in A,\ f(x)\text{ is real for every }x\in X\}$. Then $A_{\mathbb R}$ satisfies the conditions of 281E. P P (i) Evidently $A_{\mathbb R}$ is a subset of $C_b(X)=C_b(X;\mathbb R)$, is closed under addition, multiplication by real scalars and pointwise multiplication of functions, and contains χX. If x, y are distinct points of K, there is an $f\in A$ such that $f(x)\ne f(y)$. Now
$$\operatorname{Re}f=\tfrac12(f+\bar f),\qquad\operatorname{Im}f=\tfrac1{2i}(f-\bar f)$$
both belong to A and are real-valued, so belong to $A_{\mathbb R}$, and at least one of them takes different values at x and y. Q Q
(b) Consequently, given a continuous function $h:K\to\mathbb C$ and $\epsilon>0$, we may apply 281E twice to find $f_1,f_2\in A_{\mathbb R}$ such that
$$|f_1(x)-\operatorname{Re}(h(x))|\le\eta,\qquad|f_2(x)-\operatorname{Im}(h(x))|\le\eta$$
for every $x\in K$, where $\eta=\min\big(\frac12,\frac{\epsilon M}{6M+4}\big)>0$. Setting $g=f_1+if_2$, we have $g\in A$ and $|g(x)-h(x)|\le2\eta$ for every $x\in K$.
(c) Set $L=\|g\|_\infty$. If $L\le M$ we can take f = g and stop. Otherwise, consider the function
$$\phi(t)=\frac{M-\eta}{\max(M,\sqrt t)}$$
for $t\in[0,L^2]$. By Weierstrass’ theorem (281F), there is a real polynomial p such that $|\phi(t)-p(t)|\le\frac\eta L$ whenever $0\le t\le L^2$. Note that $|g|^2=g\times\bar g\in A$, so that
$$f=g\times p(|g|^2)\in A.$$
Now
$$|p(t)|\le\phi(t)+\frac\eta L\le\phi(t)+\frac{\eta}{\max(M,\sqrt t)}=\frac{M}{\max(M,\sqrt t)}$$
whenever $0\le t\le L^2$, so
$$|f(x)|\le|g(x)|\frac{M}{\max(M,|g(x)|)}\le M$$
for every $x\in X$. Next, if $0\le t\le\min(L,M+2\eta)^2$,
$$|1-p(t)|\le\frac\eta L+1-\phi(t)\le\frac\eta M+1-\frac{M-\eta}{M+2\eta}\le\frac{4\eta}M.$$
Consequently, if $x\in K$, so that $|g(x)|\le\min(L,|h(x)|+2\eta)\le\min(L,M+2\eta)$, we shall have
$$|1-p(|g(x)|^2)|\le\frac{4\eta}M,$$
and
$$|f(x)-h(x)|\le|g(x)-h(x)|+|g(x)||1-p(|g(x)|^2)|\le2\eta+\frac{4\eta}M(M+2\eta)\le2\eta+\frac{4\eta}M(M+1)\le\epsilon,$$
as required.

Remark Of course we could have saved ourselves effort by settling for $\sup_{x\in X}|f(x)|\le2\sup_{x\in K}|h(x)|$, which would be quite good enough for the applications below.

281H Corollary Let $[a,b]\subseteq\mathbb R$ be a non-empty bounded closed interval and $h:[a,b]\to\mathbb C$ a continuous function. Then for any $\epsilon>0$ there are $y_0,\ldots,y_n\in\mathbb R$ and $c_0,\ldots,c_n\in\mathbb C$ such that
$|h(x)-\sum_{k=0}^nc_ke^{iy_kx}|\le\epsilon$ for every $x\in[a,b]$,
$\sup_{x\in\mathbb R}|\sum_{k=0}^nc_ke^{iy_kx}|\le\sup_{x\in[a,b]}|h(x)|$.

proof Apply 281G with $X=\mathbb R$, $K=[a,b]$ and A the linear span of the functions $x\mapsto e^{iyx}$ as y runs over $\mathbb R$.

281I Corollary Let $S^1$ be the unit circle $\{z:|z|=1\}\subseteq\mathbb C$. Then for any continuous function $h:S^1\to\mathbb C$ and $\epsilon>0$, there are $n\in\mathbb N$ and $c_{-n},c_{-n+1},\ldots,c_0,\ldots,c_n\in\mathbb C$ such that $|h(z)-\sum_{k=-n}^nc_kz^k|\le\epsilon$ for every $z\in S^1$.
proof Apply 281G with $X=K=S^1$ and A the linear span of the functions $z\mapsto z^k$ for $k\in\mathbb Z$.

281J Corollary Let $h:[-\pi,\pi]\to\mathbb C$ be a continuous function such that $h(\pi)=h(-\pi)$. Then for any $\epsilon>0$ there are $n\in\mathbb N$, $c_{-n},\ldots,c_n\in\mathbb C$ such that $|h(x)-\sum_{k=-n}^nc_ke^{ikx}|\le\epsilon$ for every $x\in[-\pi,\pi]$.

proof The point is that $\tilde h:S^1\to\mathbb C$ is continuous on $S^1$, where $\tilde h(z)=h(\arg z)$; this is because arg is continuous everywhere except at −1, and $\lim_{x\downarrow-\pi}h(x)=h(-\pi)=h(\pi)=\lim_{x\uparrow\pi}h(x)$, so
$$\lim_{z\in S^1,z\to-1}\tilde h(z)=h(\pi)=\tilde h(-1).$$
Now by 281I there are $c_{-n},\ldots,c_n\in\mathbb C$ such that $|\tilde h(z)-\sum_{k=-n}^nc_kz^k|\le\epsilon$ for every $z\in S^1$, and these coefficients serve equally for h.

281K Corollary Suppose that $r\ge1$ and that $K\subseteq\mathbb R^r$ is a non-empty closed bounded set. Let $h:K\to\mathbb C$ be a continuous function, and $\epsilon>0$. Then there are $y_0,\ldots,y_n\in\mathbb Q^r$ and $c_0,\ldots,c_n\in\mathbb C$ such that
$|h(x)-\sum_{k=0}^nc_ke^{iy_k\cdot x}|\le\epsilon$ for every $x\in K$,
$\sup_{x\in\mathbb R^r}|\sum_{k=0}^nc_ke^{iy_k\cdot x}|\le\sup_{x\in K}|h(x)|$,
writing $y\cdot x=\sum_{j=1}^r\eta_j\xi_j$ when $y=(\eta_1,\ldots,\eta_r)$ and $x=(\xi_1,\ldots,\xi_r)$ belong to $\mathbb R^r$.

proof Apply 281G with $X=\mathbb R^r$ and A the linear span of the functions $x\mapsto e^{iy\cdot x}$ as y runs over $\mathbb Q^r$.

281L Corollary Suppose that $r\ge1$ and that $K\subseteq\mathbb R^r$ is a non-empty closed bounded set. Let $h:K\to\mathbb R$ be a continuous function, and $\epsilon>0$. Then there are $y_0,\ldots,y_n\in\mathbb R^r$ and $c_0,\ldots,c_n\in\mathbb C$ such that, writing $g(x)=\sum_{k=0}^nc_ke^{iy_k\cdot x}$, g is real-valued and
$|h(x)-g(x)|\le\epsilon$ for every $x\in K$,
$\inf_{y\in K}h(y)\le g(x)\le\sup_{y\in K}h(y)$ for every $x\in\mathbb R^r$.

proof Apply 281E with $X=\mathbb R^r$ and A the set of real-valued functions on $\mathbb R^r$ which are complex linear combinations of the functions $x\mapsto e^{iy\cdot x}$; as remarked in part (a) of the proof of 281G, A satisfies the conditions of 281E.

281M Weyl’s Equidistribution Theorem We are now ready for one of the basic results of number theory.
I shall actually apply it to provide an example in §285 below, but (at least in the one-variable case) it is surely on the (rather long) list of things which every pure mathematician should know. For the sake of the application I have in mind, I give the full r-dimensional version, but you may wish to take it in the first place with r = 1.

It will be helpful to have a notation for ‘fractional part’. For any real number x, write <x> for that number in [0, 1[ such that x − <x> is an integer. Now for the theorem.

281N Theorem Let $\eta_1,\ldots,\eta_r$ be real numbers such that $1,\eta_1,\ldots,\eta_r$ are linearly independent over $\mathbb Q$. Then whenever $0\le\alpha_j\le\beta_j\le1$ for each $j\le r$,
$$\lim_{n\to\infty}\frac{1}{n+1}\#(\{m:m\le n,\ {<}m\eta_j{>}\in[\alpha_j,\beta_j]\text{ for every }j\le r\})=\prod_{j=1}^r(\beta_j-\alpha_j).$$
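The case r = 1 of the theorem is easy to probe numerically. A sketch, assuming that floating-point fractional parts of mη are accurate enough for moderate n (they are only approximations, so this is illustration rather than evidence). With η = √2 the proportion should approach β − α; with a rational η, where 1 and η are linearly dependent over ℚ, it need not.

```python
import math


def proportion_in_interval(eta: float, alpha: float, beta: float, n: int) -> float:
    """(1/(n+1)) * #{m <= n : <m*eta> in [alpha, beta]}, where <x> = x - floor(x)."""
    count = 0
    for m in range(n + 1):
        t = m * eta
        frac = t - math.floor(t)
        if alpha <= frac <= beta:
            count += 1
    return count / (n + 1)


irrational = proportion_in_interval(math.sqrt(2), 0.2, 0.5, 100_000)
rational = proportion_in_interval(0.25, 0.2, 0.5, 100_000)
print(irrational, rational)  # the first is close to 0.3, the second close to 0.5
```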
Remark Thus the theorem says that the long-term proportion of the r-tuples $({<}m\eta_1{>},\ldots,{<}m\eta_r{>})$ which belong to the interval $[a,b]\subseteq[0,1]^r$, where $a=(\alpha_1,\ldots,\alpha_r)$ and $b=(\beta_1,\ldots,\beta_r)$, is just the Lebesgue measure $\mu[a,b]$ of the interval. Of course the condition ‘$1,\eta_1,\ldots,\eta_r$ are linearly independent over $\mathbb Q$’ is necessary as well as sufficient (281Xg).

proof (a) Write $y=(\eta_1,\ldots,\eta_r)\in\mathbb R^r$,
${<}my{>}=({<}m\eta_1{>},\ldots,{<}m\eta_r{>})\in[0,1[^r$ for each $m\in\mathbb N$. Set $I=[0,1]^r$, and for any function $f:I\to\mathbb R$ write
$$\overline L(f)=\limsup_{n\to\infty}\frac1{n+1}\sum_{m=0}^nf({<}my{>}),\qquad\underline L(f)=\liminf_{n\to\infty}\frac1{n+1}\sum_{m=0}^nf({<}my{>});$$
and for $f:I\to\mathbb C$ write
$$L(f)=\lim_{n\to\infty}\frac1{n+1}\sum_{m=0}^nf({<}my{>})$$
if the limit exists. It will be worth noting that for non-negative functions $f,g,h:I\to\mathbb R$ such that $h\le f+g$, $\overline L(h)\le\overline L(f)+\overline L(g)$, and that $L(cf+g)=cL(f)+L(g)$ for any two functions $f,g:I\to\mathbb C$ such that $L(f)$ and $L(g)$ exist, and any $c\in\mathbb C$.

(b) I mean to show that $L(f)$ exists and is equal to $\int_If$ for (many) continuous functions f. The key step is to consider functions of the form $f(x)=e^{2\pi ik\cdot x}$, where $k=(\kappa_1,\ldots,\kappa_r)\in\mathbb Z^r$. In this case, if $k\ne0$,
$$k\cdot y=\sum_{j=1}^r\kappa_j\eta_j\notin\mathbb Z$$
because $1,\eta_1,\ldots,\eta_r$ are linearly independent over $\mathbb Q$. So
$$\begin{aligned}L(f)&=\lim_{n\to\infty}\frac1{n+1}\sum_{m=0}^ne^{2\pi ik\cdot{<}my{>}}=\lim_{n\to\infty}\frac1{n+1}\sum_{m=0}^ne^{2\pi imk\cdot y}\\&\qquad\Big(\text{because }mk\cdot y-k\cdot{<}my{>}=\sum_{j=1}^r\kappa_j(m\eta_j-{<}m\eta_j{>})\text{ is an integer}\Big)\\&=\lim_{n\to\infty}\frac{1-e^{2\pi i(n+1)k\cdot y}}{(n+1)(1-e^{2\pi ik\cdot y})}\qquad(\text{because }e^{2\pi ik\cdot y}\ne1)\\&=0,\end{aligned}$$
because $|1-e^{2\pi i(n+1)k\cdot y}|\le2$ for every n. Of course we can also calculate the integral of f over I, which is
$$\begin{aligned}\int_If(x)dx&=\int_Ie^{2\pi ik\cdot x}dx=\int_I\prod_{j=1}^re^{2\pi i\kappa_j\xi_j}dx\qquad(\text{writing }x=(\xi_1,\ldots,\xi_r))\\&=\int_0^1\ldots\int_0^1\prod_{j=1}^re^{2\pi i\kappa_j\xi_j}d\xi_r\ldots d\xi_1=\int_0^1e^{2\pi i\kappa_r\xi_r}d\xi_r\ldots\int_0^1e^{2\pi i\kappa_1\xi_1}d\xi_1=0\end{aligned}$$
because at least one $\kappa_j$ is non-zero, and for this j we must have
$$\int_0^1e^{2\pi i\kappa_j\xi_j}d\xi_j=\frac{1}{2\pi i\kappa_j}(e^{2\pi i\kappa_j}-1)=0.$$
So we have $L(f)=\int_If=0$ when $k\ne0$. On the other hand, if k = 0, then f is constant with value 1, so
$$L(f)=\lim_{n\to\infty}\frac1{n+1}\sum_{m=0}^nf({<}my{>})=\lim_{n\to\infty}1=1=\int_If(x)dx.$$
281N
The Stone-Weierstrass theorem
409
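The key estimate in (b) is that a Weyl average $\frac{1}{n+1}\sum_{m=0}^{n} e^{2\pi i\mathbf k\cdot\langle m\mathbf y\rangle}$ is a geometric sum. It is therefore bounded by $2/((n+1)|1 - e^{2\pi i\mathbf k\cdot\mathbf y}|)$. The Python sketch below checks this numerically for a few $\mathbf k$. As before, $\mathbf y = (\sqrt2, \sqrt3)$ and the helper names are illustrative assumptions, and floating-point arithmetic is assumed adequate at these sizes.

```python
import cmath
import math

def frac(x):
    return x - math.floor(x)

def weyl_average(k, y, n):
    """(1/(n+1)) * sum_{m=0}^{n} exp(2 pi i k . <m y>), computed from the fractional parts."""
    total = 0j
    for m in range(n + 1):
        phase = sum(kj * frac(m * yj) for kj, yj in zip(k, y))
        total += cmath.exp(2j * math.pi * phase)
    return total / (n + 1)

def geometric_bound(k, y, n):
    """2 / ((n+1) |1 - exp(2 pi i k . y)|), from the closed form of the geometric sum."""
    ky = sum(kj * yj for kj, yj in zip(k, y))
    return 2 / ((n + 1) * abs(1 - cmath.exp(2j * math.pi * ky)))

y = (math.sqrt(2), math.sqrt(3))   # illustrative choice
for k in [(1, 0), (1, -1), (3, 2)]:
    for n in (10, 100, 1000):
        print(k, n, abs(weyl_average(k, y, n)), geometric_bound(k, y, n))

print(weyl_average((0, 0), y, 1000))   # k = 0: every term is 1, prints (1+0j)
```

Replacing $\mathbf k\cdot\langle m\mathbf y\rangle$ by $m\mathbf k\cdot\mathbf y$ changes each phase by an integer, which is why the closed-form bound applies.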
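(c) Now write $\partial I = [\mathbf 0, \mathbf 1] \setminus\ ]\mathbf 0, \mathbf 1[$, the boundary of $I$. If $f : I \to \mathbb C$ is continuous and $f(\mathbf x) = 0$ for $\mathbf x \in \partial I$, then $L(f) = \int_I f$. $\mathbf{P}$ As in 281I, let $S^1$ be the unit circle $\{z : z \in \mathbb C, |z| = 1\}$, and set $K = (S^1)^r \subseteq \mathbb C^r$. If we think of $K$ as a subset of $\mathbb R^{2r}$, it is closed and bounded. Let $\phi : K \to I$ be given by
$$\phi(\zeta_1, \dots, \zeta_r) = \Bigl(\frac12 + \frac{\arg\zeta_1}{2\pi}, \dots, \frac12 + \frac{\arg\zeta_r}{2\pi}\Bigr)$$
for $\zeta_1, \dots, \zeta_r \in S^1$. Then $h = f\phi : K \to \mathbb C$ is continuous, because $\phi$ is continuous on $(S^1 \setminus \{-1\})^r$ and $\lim_{w\to z} f\phi(w) = f\phi(z) = 0$ for any $z \in K \setminus (S^1 \setminus \{-1\})^r$. (Compare 281J.) Now apply 281G with $X = K$ and $A$ the set of polynomials in $\zeta_1, \dots, \zeta_r, \zeta_1^{-1}, \dots, \zeta_r^{-1}$. This shows that, given $\epsilon > 0$, there is a function of the form
$$g(z) = \sum_{\mathbf k\in J} c_{\mathbf k}\,\zeta_1^{\kappa_1}\cdots\zeta_r^{\kappa_r},$$
for some finite set $J \subseteq \mathbb Z^r$ and constants $c_{\mathbf k} \in \mathbb C$ for $\mathbf k \in J$, such that $|g(z) - h(z)| \le \epsilon$ for every $z \in K$. Set
$$\tilde g(\mathbf x) = g(e^{\pi i(2\xi_1 - 1)}, \dots, e^{\pi i(2\xi_r - 1)}) = \sum_{\mathbf k\in J} c_{\mathbf k}\,e^{\pi i\mathbf k\cdot(2\mathbf x - \mathbf 1)} = \sum_{\mathbf k\in J}(-1)^{\mathbf k\cdot\mathbf 1}c_{\mathbf k}\,e^{2\pi i\mathbf k\cdot\mathbf x},$$
so that $\tilde g\phi = g$, and see that $\sup_{\mathbf x\in I}|\tilde g(\mathbf x) - f(\mathbf x)| = \sup_{z\in K}|g(z) - h(z)| \le \epsilon$.

Now $\tilde g$ is a linear combination of functions of the form dealt with in (b), so (by the linearity noted in (a)) we must have $L(\tilde g) = \int_I\tilde g$. Let $n_0$ be such that
$$\Bigl|\int_I\tilde g - \frac{1}{n+1}\sum_{m=0}^{n}\tilde g(\langle m\mathbf y\rangle)\Bigr| \le \epsilon$$
for every $n \ge n_0$. Then
$$\Bigl|\int_I f - \int_I\tilde g\Bigr| \le \int_I|f - \tilde g| \le \epsilon$$
and
$$\Bigl|\frac{1}{n+1}\sum_{m=0}^{n}\tilde g(\langle m\mathbf y\rangle) - \frac{1}{n+1}\sum_{m=0}^{n} f(\langle m\mathbf y\rangle)\Bigr| \le \frac{1}{n+1}\sum_{m=0}^{n}|\tilde g(\langle m\mathbf y\rangle) - f(\langle m\mathbf y\rangle)| \le \frac{1}{n+1}(n+1)\epsilon = \epsilon$$
for every $n \in \mathbb N$. So for $n \ge n_0$ we must have
$$\Bigl|\frac{1}{n+1}\sum_{m=0}^{n} f(\langle m\mathbf y\rangle) - \int_I f\Bigr| \le 3\epsilon.$$
As $\epsilon$ is arbitrary, $L(f) = \int_I f$, as required. $\mathbf{Q}$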
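Part (c) says that orbit averages of a continuous function vanishing on $\partial I$ converge to its integral. The Python sketch below tests this for the illustrative choice $f(\mathbf x) = \prod_j\sin\pi\xi_j$ on $[0,1]^2$. This function is zero on $\partial I$ and has integral $(2/\pi)^2$. The test function, $\mathbf y = (\sqrt2, \sqrt3)$ and the helper names are assumptions made only for the example.

```python
import math

def frac(x):
    return x - math.floor(x)

def orbit_average(f, etas, n):
    """(1/(n+1)) * sum_{m=0}^{n} f(<m y>), where y = etas."""
    total = 0.0
    for m in range(n + 1):
        total += f([frac(m * eta) for eta in etas])
    return total / (n + 1)

def bump(x):
    """Continuous on I = [0,1]^r and zero on the boundary of I (an illustrative choice)."""
    return math.prod(math.sin(math.pi * xi) for xi in x)

etas = (math.sqrt(2), math.sqrt(3))
exact = (2 / math.pi) ** len(etas)   # the integral of bump over [0,1]^2
for n in (10**2, 10**3, 10**4, 10**5):
    print(n, orbit_average(bump, etas, n), exact)
```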
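(d) Observe next that if $\mathbf a, \mathbf b \in\ ]\mathbf 0, \mathbf 1[\ =\ ]0, 1[^r$ and $\epsilon > 0$, there are continuous functions $f_1$, $f_2$ such that
$$f_1 \le \chi[\mathbf a, \mathbf b] \le f_2 \le \chi\,]\mathbf 0, \mathbf 1[, \qquad \int_I f_2 - \int_I f_1 \le \epsilon.$$
$\mathbf{P}$ This is elementary. For $n \in \mathbb N$, define $h_n : \mathbb R \to [0, 1]$ by setting $h_n(\xi) = 0$ if $\xi \le 0$, $2^n\xi$ if $0 \le \xi \le 2^{-n}$ and $1$ if $\xi \ge 2^{-n}$. Set
$$f_{1n}(\mathbf x) = \prod_{j=1}^{r} h_n(\xi_j - \alpha_j)\,h_n(\beta_j - \xi_j), \qquad f_{2n}(\mathbf x) = \prod_{j=1}^{r}(1 - h_n(\alpha_j - \xi_j))(1 - h_n(\xi_j - \beta_j))$$
for $\mathbf x = (\xi_1, \dots, \xi_r) \in \mathbb R^r$. (Compare the proof of 242O.) Then:

- $f_{1n} \le \chi[\mathbf a, \mathbf b] \le f_{2n}$ for each $n$;
- $f_{2n} \le \chi\,]\mathbf 0, \mathbf 1[$ for all $n$ so large that $2^{-n} \le \min(\min_{j\le r}\alpha_j, \min_{j\le r}(1 - \beta_j))$;
- $\lim_{n\to\infty} f_{2n}(\mathbf x) - f_{1n}(\mathbf x) = 0$ for every $\mathbf x$ not on the boundary of $[\mathbf a, \mathbf b]$, that is, for almost every $\mathbf x$.

Since $0 \le f_{2n} - f_{1n} \le \chi I$ for such $n$, Lebesgue's Dominated Convergence Theorem gives
$$\lim_{n\to\infty}\int_I f_{2n} - \int_I f_{1n} = 0.$$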
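Thus we can take $f_1 = f_{1n}$, $f_2 = f_{2n}$ for any $n$ large enough. $\mathbf{Q}$

(e) It follows that if $\mathbf a, \mathbf b \in\ ]\mathbf 0, \mathbf 1[$ and $\mathbf a \le \mathbf b$, then $L(\chi[\mathbf a, \mathbf b]) = \mu[\mathbf a, \mathbf b]$. $\mathbf{P}$ Let $\epsilon > 0$. Take $f_1$, $f_2$ as in (d). Then, using (c),
$$\overline{L}(\chi[\mathbf a, \mathbf b]) \le \overline{L}(f_2) = L(f_2) = \int_I f_2 \le \int_I f_1 + \epsilon \le \mu[\mathbf a, \mathbf b] + \epsilon,$$
$$\underline{L}(\chi[\mathbf a, \mathbf b]) \ge \underline{L}(f_1) = L(f_1) = \int_I f_1 \ge \int_I f_2 - \epsilon \ge \mu[\mathbf a, \mathbf b] - \epsilon,$$
so $\mu[\mathbf a, \mathbf b] - \epsilon \le \underline{L}(\chi[\mathbf a, \mathbf b]) \le \overline{L}(\chi[\mathbf a, \mathbf b]) \le \mu[\mathbf a, \mathbf b] + \epsilon$. As $\epsilon$ is arbitrary, $\mu[\mathbf a, \mathbf b] = \underline{L}(\chi[\mathbf a, \mathbf b]) = \overline{L}(\chi[\mathbf a, \mathbf b]) = L(\chi[\mathbf a, \mathbf b])$, as required. $\mathbf{Q}$

(f) To complete the proof, take any $\mathbf a, \mathbf b \in I$ with $\mathbf a \le \mathbf b$. For $0 < \epsilon \le \frac12$, set $I_\epsilon = [\epsilon\mathbf 1, (1 - \epsilon)\mathbf 1]$. Then $I_\epsilon$ is a closed interval included in $]\mathbf 0, \mathbf 1[$ and $\mu I_\epsilon = (1 - 2\epsilon)^r$. Of course $L(\chi I) = \mu I = 1$, so $L(\chi(I \setminus I_\epsilon)) = L(\chi I) - L(\chi I_\epsilon) = 1 - \mu I_\epsilon$, and
$$\begin{aligned}
\mu[\mathbf a, \mathbf b] - 1 + \mu I_\epsilon &\le \mu[\mathbf a, \mathbf b] + \mu I_\epsilon - \mu([\mathbf a, \mathbf b] \cup I_\epsilon) = \mu([\mathbf a, \mathbf b] \cap I_\epsilon) = L(\chi([\mathbf a, \mathbf b] \cap I_\epsilon))\\
&\le \underline{L}(\chi[\mathbf a, \mathbf b]) \le \overline{L}(\chi[\mathbf a, \mathbf b]) \le \overline{L}(\chi([\mathbf a, \mathbf b] \cap I_\epsilon)) + \overline{L}(\chi(I \setminus I_\epsilon))\\
&= L(\chi([\mathbf a, \mathbf b] \cap I_\epsilon)) + 1 - \mu I_\epsilon = \mu([\mathbf a, \mathbf b] \cap I_\epsilon) + 1 - \mu I_\epsilon \le \mu[\mathbf a, \mathbf b] + 1 - \mu I_\epsilon.
\end{aligned}$$
As $\epsilon$ is arbitrary (and $\mu I_\epsilon \to 1$ as $\epsilon \downarrow 0$), $\mu[\mathbf a, \mathbf b] = \underline{L}(\chi[\mathbf a, \mathbf b]) = \overline{L}(\chi[\mathbf a, \mathbf b]) = L(\chi[\mathbf a, \mathbf b])$, as stated.

281X Basic exercises (a) Let $A$ be the set of those bounded continuous functions $f : \mathbb R^r \times \mathbb R^r \to \mathbb R$ which are expressible in the form $f(\mathbf x, \mathbf y) = \sum_{k=0}^{n} g_k(\mathbf x)g'_k(\mathbf y)$, where all the $g_k$, $g'_k$ are continuous functions from $\mathbb R^r$ to $\mathbb R$. Let $h : \mathbb R^r \times \mathbb R^r \to \mathbb R$ be any bounded continuous function, $K \subseteq \mathbb R^r \times \mathbb R^r$ any bounded set, and $\epsilon > 0$. Show that there is an $f \in A$ such that $|f(\mathbf x, \mathbf y) - h(\mathbf x, \mathbf y)| \le \epsilon$ for every $(\mathbf x, \mathbf y) \in K$ and $\sup_{\mathbf x,\mathbf y\in\mathbb R^r}|f(\mathbf x, \mathbf y)| \le \sup_{\mathbf x,\mathbf y\in\mathbb R^r}|h(\mathbf x, \mathbf y)|$.

(b) Let $K$ be a closed bounded set in $\mathbb R^r$, where $r \ge 1$, and $h : K \to \mathbb R$ a continuous function. Show that for any $\epsilon > 0$ there is a polynomial $p$ in $r$ variables such that $|h(\mathbf x) - p(\mathbf x)| \le \epsilon$ for every $\mathbf x \in K$.

>(c) Let $[a, b]$ be a non-empty closed interval of $\mathbb R$ and $h : [a, b] \to \mathbb R$ a continuous function. Show that for any $\epsilon > 0$ there are $y_0, \dots, y_n, a_0, \dots, a_n, b_0, \dots, b_n \in \mathbb R$ such that
$$\Bigl|h(x) - \sum_{k=0}^{n}(a_k\cos y_kx + b_k\sin y_kx)\Bigr| \le \epsilon \text{ for every } x \in [a, b], \qquad \sup_{x\in\mathbb R}\Bigl|\sum_{k=0}^{n}(a_k\cos y_kx + b_k\sin y_kx)\Bigr| \le \sup_{x\in[a,b]}|h(x)|.$$

(d) Let $h$ be a complex-valued function on $]-\pi, \pi]$ such that $|h|^p$ is integrable, where $1 \le p < \infty$. Show that for every $\epsilon > 0$ there is a function of the form $x \mapsto f(x) = \sum_{k=-n}^{n} c_k e^{ikx}$, where $c_{-n}, \dots, c_n \in \mathbb C$, such that $\int_{-\pi}^{\pi}|h - f|^p \le \epsilon$. (Compare 244H.)

>(e) Let $h : [-\pi, \pi] \to \mathbb R$ be a continuous function such that $h(\pi) = h(-\pi)$, and $\epsilon > 0$. Show that there are $a_0, \dots, a_n, b_1, \dots, b_n \in \mathbb R$ such that
$$\Bigl|h(x) - \frac12 a_0 - \sum_{k=1}^{n}(a_k\cos kx + b_k\sin kx)\Bigr| \le \epsilon$$
for every $x \in [-\pi, \pi]$.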
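(f) Let $K$ be a non-empty closed bounded set in $\mathbb R^r$, where $r \ge 1$, and $h : K \to \mathbb R$ a continuous function. Show that for any $\epsilon > 0$ there are $\mathbf y_0, \dots, \mathbf y_n \in \mathbb R^r$, $a_0, \dots, a_n, b_0, \dots, b_n \in \mathbb R$ such that
$$\Bigl|h(\mathbf x) - \sum_{k=0}^{n}(a_k\cos\mathbf y_k\cdot\mathbf x + b_k\sin\mathbf y_k\cdot\mathbf x)\Bigr| \le \epsilon \text{ for every } \mathbf x \in K, \qquad \sup_{\mathbf x\in\mathbb R^r}\Bigl|\sum_{k=0}^{n}(a_k\cos\mathbf y_k\cdot\mathbf x + b_k\sin\mathbf y_k\cdot\mathbf x)\Bigr| \le \sup_{\mathbf x\in K}|h(\mathbf x)|,$$
interpreting $\mathbf y\cdot\mathbf x$ as in 281K.

(g) Let $\eta_1, \dots, \eta_r$ be real numbers such that $1, \eta_1, \dots, \eta_r$ are not linearly independent over $\mathbb Q$. Show that there is a non-trivial interval $[\mathbf a, \mathbf b] \subseteq [\mathbf 0, \mathbf 1] \subseteq \mathbb R^r$ such that $(\langle k\eta_1\rangle, \dots, \langle k\eta_r\rangle) \notin [\mathbf a, \mathbf b]$ for every $k \in \mathbb Z$.

(h) Let $\eta_1, \dots, \eta_r$ be real numbers such that $1, \eta_1, \dots, \eta_r$ are linearly independent over $\mathbb Q$. Suppose that $0 \le \alpha_j \le \beta_j \le 1$ for each $j \le r$. Show that for every $\epsilon > 0$ there is an $n_0 \in \mathbb N$ such that
$$\Bigl|\prod_{j=1}^{r}(\beta_j - \alpha_j) - \frac{1}{n+1}\#(\{m : k \le m \le k + n,\ \langle m\eta_j\rangle \in [\alpha_j, \beta_j] \text{ for every } j \le r\})\Bigr| \le \epsilon$$
whenever $n \ge n_0$ and $k \in \mathbb N$. (Hint: in 281N, set $\overline{L}(f) = \limsup_{n\to\infty}\sup_{k\in\mathbb N}\frac{1}{n+1}\sum_{m=k}^{k+n} f(\langle m\mathbf y\rangle)$.)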
281Y Further exercises (a) Show that under the hypotheses of 281A, there is an $f \in \overline{A}$, the $\|\ \|_\infty$-closure of $A$ in $C_b(X)$, such that $f{\restriction}K = h$. (Hint: take $f = \lim_{n\to\infty} f_n$ where $\|f_{n+1} - f_n\|_\infty \le \sup_{x\in K}|f_n(x) - h(x)| \le 2^{-n}$ for every $n \in \mathbb N$.)

(b) Let $X$ be a topological space and $K \subseteq X$ a compact subset. Suppose that for any distinct points $x$, $y$ of $K$ there is a continuous function $f : X \to \mathbb R$ such that $f(x) \ne f(y)$. Show that for any $r \in \mathbb N$ and any continuous $h : K \to \mathbb R^r$ there is a continuous $f : X \to \mathbb R^r$ extending $h$. (Hint: consider $r = 1$ first.)

(c) Let $\langle X_i\rangle_{i\in I}$ be any family of compact Hausdorff spaces, and $X$ their product as topological spaces. For each $i$, write $C(X_i)$ for the set of continuous functions from $X_i$ to $\mathbb R$, and $\pi_i : X \to X_i$ for the coordinate map. Show that the subalgebra of $C(X)$ generated by $\{f\pi_i : i \in I, f \in C(X_i)\}$ is $\|\ \|_\infty$-dense in $C(X)$. (Note: you will need to know two facts. First, $X$ is compact. Second, if $Z$ is any compact Hausdorff space then for any distinct $z$, $w \in Z$ there is an $f \in C(Z)$ such that $f(z) \ne f(w)$. For references see 3A3J and 3A3Bc in the next volume.)

(d) Let $X$ be a topological space and $K$ a compact subset of $X$. Let $A$ be a linear subspace of the space $C_b(X)$ of real-valued continuous functions on $X$ such that $|f| \in A$ for every $f \in A$. Let $h : K \to \mathbb R$ be a continuous function such that whenever $x$, $y \in K$ there is an $f \in A$ such that $f(x) = h(x)$ and $f(y) = h(y)$. Show that for every $\epsilon > 0$ there is an $f \in A$ such that $|f(x) - h(x)| \le \epsilon$ for every $x \in K$.

(e) Let $X$ be a compact topological space and write $C(X)$ for the set of continuous functions from $X$ to $\mathbb R$. Suppose that $h \in C(X)$, and let $A \subseteq C(X)$ be such that:

- $A$ is a linear subspace of $C(X)$;
- either $|f| \in A$ for every $f \in A$, or $f \times g \in A$ for every $f$, $g \in A$, or $f \times f \in A$ for every $f \in A$;
- whenever $x$, $y \in X$ and $\delta > 0$ there is an $f \in A$ such that $|f(x) - h(x)| \le \delta$, $|f(y) - h(y)| \le \delta$.

Show that for every $\epsilon > 0$ there is an $f \in A$ such that $|h(x) - f(x)| \le \epsilon$ for every $x \in X$.

(f) Let $X$ be a compact topological space and $A$ a $\|\ \|_\infty$-closed linear subspace of the space $C(X)$ of continuous functions from $X$ to $\mathbb R$. Show that the following are equiveridical: (i) $|f| \in A$ for every $f \in A$; (ii) $f \times f \in A$ for every $f \in A$; (iii) $f \times g \in A$ for all $f$, $g \in A$. Show also that in this case $A$ is closed in $C(X)$ for the topology defined by the pseudometrics $(f, g) \mapsto |f(x) - g(x)| : C(X) \times C(X) \to [0, \infty[$ as $x$ runs over $X$ (the ‘topology of pointwise convergence’ on $C(X)$).

(g) Show that under the hypotheses of 281G there is an $f \in \overline{A}$, the $\|\ \|_\infty$-closure of $A$ in $C_b(X; \mathbb C)$, such that $f{\restriction}K = h$ and (if $K \ne \emptyset$) $\|f\|_\infty = \sup_{x\in K}|h(x)|$.

(h) Let $y \in \mathbb R$ be irrational. Show that for any Riemann integrable function $f : [0, 1] \to \mathbb R$,
$$\int_0^1 f(x)\,dx = \lim_{n\to\infty}\frac{1}{n+1}\sum_{m=0}^{n} f(\langle my\rangle),$$
writing $\langle my\rangle$ for the fractional part of $my$. (Hint: recall Riemann’s criterion: for any $\epsilon > 0$, there are $a_0, \dots, a_n$ with $0 = a_0 \le a_1 \le \dots \le a_n = 1$ and $\sum\{a_j - a_{j-1} : j \le n,\ \sup_{x\in[a_{j-1},a_j]} f(x) - \inf_{x\in[a_{j-1},a_j]} f(x) \ge \epsilon\} \le \epsilon$.)

(i) Let $\langle t_n\rangle_{n\in\mathbb N}$ be a sequence in $[0, 1]$. Show that the following are equiveridical:

(i) $\lim_{n\to\infty}\frac{1}{n+1}\sum_{k=0}^{n} f(t_k) = \int_0^1 f$ for every continuous function $f : [0, 1] \to \mathbb R$;

(ii) $\lim_{n\to\infty}\frac{1}{n+1}\sum_{k=0}^{n} f(t_k) = \int_0^1 f$ for every Riemann integrable function $f : [0, 1] \to \mathbb R$;

(iii) $\liminf_{n\to\infty}\frac{1}{n+1}\#(\{k : k \le n, t_k \in G\}) \ge \mu G$ for every open set $G \subseteq [0, 1]$;

(iv) $\lim_{n\to\infty}\frac{1}{n+1}\#(\{k : k \le n, t_k \le \alpha\}) = \alpha$ for every $\alpha \in [0, 1]$;

(v) $\lim_{n\to\infty}\frac{1}{n+1}\#(\{k : k \le n, t_k \in E\}) = \mu E$ for every $E \subseteq [0, 1]$ such that $\mu(\operatorname{int}E) = \mu\overline{E}$;

(vi) $\lim_{n\to\infty}\frac{1}{n+1}\sum_{k=0}^{n} e^{2\pi imt_k} = 0$ for every $m \ge 1$.

(Cf. 273J. Such sequences $\langle t_n\rangle_{n\in\mathbb N}$ are called equidistributed or uniformly distributed.)

(j) Show that the sequence $\langle\langle\ln(n + 1)\rangle\rangle_{n\in\mathbb N}$ is not equidistributed.

(k) Give $[0, 1]^{\mathbb N}$ its product measure $\lambda$. Show that $\lambda$-almost every sequence $\langle t_n\rangle_{n\in\mathbb N} \in [0, 1]^{\mathbb N}$ is equidistributed in the sense of 281Yi. (Hint: 273J.)

(l) Let $f : [0, 1]^2 \to \mathbb C$ be a continuous function. Show that if $\gamma \in \mathbb R$ is irrational then $\int_{[0,1]^2} f = \lim_{a\to\infty}\frac1a\int_0^a f(\langle t\rangle, \langle\gamma t\rangle)\,dt$. (Hint: consider first functions of the form $\mathbf x \mapsto e^{2\pi i\mathbf k\cdot\mathbf x}$.)

281 Notes and comments I have given three statements (281A, 281E and 281G) of the Stone-Weierstrass theorem, with an acknowledgement (281F) of Weierstrass’ own version, and three further forms (281Ya, 281Yd, 281Yg) in the exercises. Yet another will appear in §4A6 in Volume 4. Faced with such a multiplicity, you may wish to try your own hand at writing out theorems which will cover some or all of these versions. I myself see no way of doing it without setting up a confusing list of alternative hypotheses and conclusions.
At which point, I ask ‘what is a theorem, anyway?’, and answer, it is a stopping-place on our journey; it is a place where we can rest, and congratulate ourselves on our achievement; it is a place which we can learn to recognise, and use as a starting point for new adventures; it is a place we can describe, and share with others. For some theorems, like Fermat’s last theorem, there is a canonical statement, an exactly locatable point. For others, like the Stone-Weierstrass theorem here, we reach a mass of closely related results. They all depend on some arrangement of the arguments laid out in 281A-281G and 281Ya (which introduces a new idea), and all are useful in different ways. I suppose, indeed, that most authors would prefer the versions 281Ya and 281Yg, which eliminate the variable $\epsilon$ which appears in 281A, 281E and 281G, at the expense of taking a closed subspace A. But I find that the corollaries which will be useful later (281H-281L) are more naturally expressed in terms of linear subspaces which are not closed. The applications of the theorem, or the theorems, or the method – choose your own expression – are legion; only a few of them are here. An apparently innocent one is in 281Xa and, in a different variant, in 281Yc; these are enormously important in their own domains. In this volume the principal application will be to 285L below, depending on 281K, and it is perhaps right to note that there is an alternative approach to this particular result, based on ideas in 282G. But I offer Weyl’s equidistribution theorem (281M-281N) as evidence that we can expect to find good use for these ideas in almost any branch of mathematics.
282 Fourier series

Out of the enormous theory of Fourier series, I extract a few results which may at least provide a foundation for further study. I give the definitions of Fourier and Fejér sums (282A), with five of the most important results concerning their convergence (282G, 282H, 282J, 282L, 282O). On the way I include the Riemann-Lebesgue lemma (282E). I end by mentioning convolutions (282Q).

282A Definition Let $f$ be an integrable complex-valued function defined almost everywhere on $]-\pi, \pi]$.

(a) The Fourier coefficients of $f$ are the complex numbers
$$c_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-ikx}\,dx$$
for $k \in \mathbb Z$.

(b) The Fourier sums of $f$ are the functions
$$s_n(x) = \sum_{k=-n}^{n} c_k e^{ikx}$$
for $x \in\ ]-\pi, \pi]$, $n \in \mathbb N$.

(c) The Fourier series of $f$ is the series $\sum_{k=-\infty}^{\infty} c_k e^{ikx}$, or (because we ordinarily consider the symmetric partial sums $s_n$) the series $c_0 + \sum_{k=1}^{\infty}(c_k e^{ikx} + c_{-k}e^{-ikx})$.

(d) The Fejér sums of $f$ are the functions
$$\sigma_m(x) = \frac{1}{m+1}\sum_{n=0}^{m} s_n(x)$$
for $x \in\ ]-\pi, \pi]$, $m \in \mathbb N$.

(e) It will be convenient to have a further phrase available. If $f$ is any function with $\operatorname{dom}f \subseteq\ ]-\pi, \pi]$, its periodic extension is the function $\tilde f$, with domain $\bigcup_{k\in\mathbb Z}(\operatorname{dom}f + 2k\pi)$, such that $\tilde f(x) = f(x - 2k\pi)$ whenever $k \in \mathbb Z$, $x \in \operatorname{dom}f + 2k\pi$.

282B Remarks I have made two more or less arbitrary choices here.

(a) I have chosen to express Fourier series in their ‘complex’ form rather than their ‘real’ form. From the point of view of pure measure theory (and, indeed, from the point of view of the nineteenth-century origins of the subject) there are gains in elegance from directing attention to real functions $f$ and looking at the real coefficients
$$a_k = \frac1\pi\int_{-\pi}^{\pi} f(x)\cos kx\,dx \text{ for } k \in \mathbb N, \qquad b_k = \frac1\pi\int_{-\pi}^{\pi} f(x)\sin kx\,dx \text{ for } k \ge 1.$$
If we do this we have $c_0 = \frac12 a_0$, and for $k \ge 1$ we have
$$c_k = \tfrac12(a_k - ib_k), \quad c_{-k} = \tfrac12(a_k + ib_k), \quad a_k = c_k + c_{-k}, \quad b_k = i(c_k - c_{-k}),$$
so that the Fourier sums become
$$s_n(x) = \frac12 a_0 + \sum_{k=1}^{n}(a_k\cos kx + b_k\sin kx).$$
The advantage of this is that real functions $f$ correspond to real coefficients $a_k$, $b_k$, so that it is obvious that if $f$ is real-valued so are its Fourier and Fejér sums. The disadvantages are twofold. First, we have to use a variety of trigonometric equalities which are rather more complicated than the properties of the complex exponential function which they reflect. Second, we are farther away from the natural generalizations to locally compact abelian groups. So both electrical engineers and harmonic analysts tend to prefer the coefficients $c_k$.

(b) I have taken the functions $f$ to be defined on the interval $]-\pi, \pi]$ rather than on the circle $S^1 = \{z : z \in \mathbb C, |z| = 1\}$. There would be advantages in elegance of language in using $S^1$, though I do not recall often seeing the formula
$$c_k = \int\bar z^k f(z)\,\nu(dz)$$
which is the natural translation of $c_k = \frac{1}{2\pi}\int e^{-ikx}f(x)\,dx$ under the substitution $x = \arg z$, $dx = 2\pi\nu(dz)$. However, applications of the theory tend to deal with periodic functions on the real line, so I work with $]-\pi, \pi]$. I accept the fact that its group operation $+_{2\pi}$ is less familiar than multiplication on $S^1$; here $x +_{2\pi} y$ is whichever of $x + y$, $x + y + 2\pi$, $x + y - 2\pi$ belongs to $]-\pi, \pi]$.

(c) The remarks in (b) are supposed to remind you of §255.

(d) Observe that if $f =_{\text{a.e.}} g$ then $f$ and $g$ have the same Fourier coefficients, Fourier sums and Fejér sums. This means that we could, if we wished, regard the $c_k$, $s_n$ and $\sigma_m$ as associated with a member of $L^1_{\mathbb C}$, the space of equivalence classes of integrable functions (§242), rather than as associated with a particular function $f$. However, the $s_n$ and $\sigma_m$ appear as actual functions, and many of the questions we are interested in refer to their values at particular points. So it is more natural to express the theory in terms of integrable functions $f$ rather than in terms of members of $L^1_{\mathbb C}$.
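The definitions in 282A and the real/complex dictionary in 282Ba can be tried out numerically. The Python sketch below approximates $c_k$ by the midpoint rule, which is only an approximation. It then forms $s_n$ and $\sigma_m$ exactly as defined and recovers $a_k$, $b_k$ from $c_k$. The sample function $f(x) = x$ and all helper names are illustrative assumptions. For this $f$, integrating by parts gives $c_k = i(-1)^k/k$ for $k \ne 0$, so $a_k = 0$ and $b_k = 2(-1)^{k+1}/k$.

```python
import cmath
import math

def fourier_coefficient(f, k, n_points=4000):
    """Approximate c_k = (1/2pi) * integral_{-pi}^{pi} f(x) e^{-ikx} dx by the midpoint rule."""
    h = 2 * math.pi / n_points
    total = 0j
    for j in range(n_points):
        x = -math.pi + (j + 0.5) * h
        total += f(x) * cmath.exp(-1j * k * x)
    return total * h / (2 * math.pi)

def fourier_sum(c, n, x):
    """s_n(x) = sum_{k=-n}^{n} c_k e^{ikx}, where c maps k to c_k."""
    return sum(c[k] * cmath.exp(1j * k * x) for k in range(-n, n + 1))

def fejer_sum(c, m, x):
    """sigma_m(x) = (1/(m+1)) * sum_{n=0}^{m} s_n(x)."""
    return sum(fourier_sum(c, n, x) for n in range(m + 1)) / (m + 1)

def f(x):
    return x   # illustrative sample function on ]-pi, pi]

N = 20
c = {k: fourier_coefficient(f, k) for k in range(-N, N + 1)}

# Real form (282Ba): a_k = c_k + c_{-k}, b_k = i(c_k - c_{-k}).
for k in range(1, 4):
    a_k = c[k] + c[-k]
    b_k = 1j * (c[k] - c[-k])
    print(k, a_k, b_k, 2 * (-1) ** (k + 1) / k)

print(fourier_sum(c, N, 1.0), fejer_sum(c, N, 1.0), f(1.0))
```

Averaging the $s_n$ is the same as damping the coefficients: $\sigma_m(x) = \sum_{|k|\le m}(1 - \frac{|k|}{m+1})c_k e^{ikx}$. This rearrangement is handy when computing Fejér sums directly.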
282C The problems (a) Under what conditions, and in what senses, do the Fourier and Fejér sums $s_n$ and $\sigma_m$ of a function $f$ converge to $f$?

(b) How do the properties of the double-ended sequence $\langle c_k\rangle_{k\in\mathbb Z}$ reflect the properties of $f$, and vice versa?

Remark The theory of Fourier series has been one of the leading topics of analysis for nearly two hundred years, and innumerable further problems have contributed greatly to our understanding. (For instance: can one characterize those sequences $\langle c_k\rangle_{k\in\mathbb Z}$ which are the Fourier coefficients of some integrable function?) But in this outline I will concentrate on question (a) above, with one result (282K) addressing (b); this will give us more than enough material to work on. Most people would feel that the Fourier sums are somehow closer to what we really want to know. Nevertheless, it turns out that the Fejér sums are easier to analyse, and there are advantages in dealing with them first. So while you may wish to look ahead to the statements of 282J, 282L and 282O for an idea of where we are going, the first half of this section will be largely about Fejér sums. Often we know that the Fourier sums converge (this is quite common; see, for instance, the examples in 282Xh and 282Xo). In any such case, if we also know that the Fejér sums converge to $f$, we can deduce that the Fourier sums converge to $f$ as well, by 273Ca.

The first step is a basic lemma. It shows that both the Fourier and Fejér sums of a function $f$ can be thought of as convolutions of $f$ with kernels describable by familiar functions.

282D Lemma Let $f$ be a complex-valued function which is integrable over $]-\pi, \pi]$, and
$$c_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-ikx}\,dx, \qquad s_n(x) = \sum_{k=-n}^{n} c_k e^{ikx}, \qquad \sigma_m(x) = \frac{1}{m+1}\sum_{n=0}^{m} s_n(x)$$
its Fourier coefficients, Fourier sums and Fejér sums. Write $\tilde f$ for the periodic extension of $f$ (282Ae). For $m \in \mathbb N$, write
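$$\psi_m(t) = \frac{1 - \cos(m+1)t}{2\pi(m+1)(1 - \cos t)}$$
for $0 < |t| \le \pi$. (If you like, you can set $\psi_m(0) = \frac{m+1}{2\pi}$ to make $\psi_m$ continuous on $[-\pi, \pi]$.)

(a) For each $n \in \mathbb N$, $x \in\ ]-\pi, \pi]$,
$$s_n(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)\frac{\sin(n+\frac12)(x-t)}{\sin\frac12(x-t)}\,dt = \frac{1}{2\pi}\int_{-\pi}^{\pi}\tilde f(x+t)\frac{\sin(n+\frac12)t}{\sin\frac12 t}\,dt = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x -_{2\pi} t)\frac{\sin(n+\frac12)t}{\sin\frac12 t}\,dt,$$
writing $x -_{2\pi} t$ for whichever of $x - t$, $x - t - 2\pi$, $x - t + 2\pi$ belongs to $]-\pi, \pi]$.

(b) For each $m \in \mathbb N$, $x \in\ ]-\pi, \pi]$,
$$\sigma_m(x) = \int_{-\pi}^{\pi}\tilde f(x+t)\psi_m(t)\,dt = \int_0^{\pi}(\tilde f(x+t) + \tilde f(x-t))\psi_m(t)\,dt = \int_{-\pi}^{\pi} f(x -_{2\pi} t)\psi_m(t)\,dt.$$

(c) For any $n \in \mathbb N$,
$$\frac{1}{2\pi}\int_{-\pi}^{0}\frac{\sin(n+\frac12)t}{\sin\frac12 t}\,dt = \frac{1}{2\pi}\int_0^{\pi}\frac{\sin(n+\frac12)t}{\sin\frac12 t}\,dt = \frac12, \qquad \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\sin(n+\frac12)t}{\sin\frac12 t}\,dt = 1.$$

(d) For any $m \in \mathbb N$,

(i) $0 \le \psi_m(t) \le \frac{m+1}{2\pi}$ for every $t$;

(ii) for any $\delta > 0$, $\lim_{m\to\infty}\psi_m(t) = 0$ uniformly on $\{t : \delta \le |t| \le \pi\}$;

(iii) $\int_{-\pi}^{0}\psi_m(t)\,dt = \int_0^{\pi}\psi_m(t)\,dt = \frac12$, $\int_{-\pi}^{\pi}\psi_m(t)\,dt = 1$.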
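The identities in 282D can be spot-checked numerically. The Python sketch below checks the closed form of the Dirichlet kernel and the expression of $\psi_m$ as an average of Dirichlet kernels. It also checks the bound $0 \le \psi_m \le \frac{m+1}{2\pi}$ and the half-masses $\frac12$. The parameter values $m = 7$, $t = 0.9$ and the helper names are illustrative assumptions. The code writes $1 - \cos u = 2\sin^2\frac u2$, which is mathematically the same formula but avoids cancellation near $t = 0$.

```python
import cmath
import math

def dirichlet(n, t):
    """sin((n + 1/2) t) / sin(t / 2), for t not a multiple of 2 pi."""
    return math.sin((n + 0.5) * t) / math.sin(0.5 * t)

def fejer_psi(m, t):
    """psi_m(t) of 282D, using 1 - cos u = 2 sin^2(u/2); psi_m(0) = (m+1)/(2 pi)."""
    if t == 0:
        return (m + 1) / (2 * math.pi)
    return math.sin((m + 1) * t / 2) ** 2 / (2 * math.pi * (m + 1) * math.sin(t / 2) ** 2)

def midpoint_integral(g, a, b, n_points=2000):
    h = (b - a) / n_points
    return h * sum(g(a + (j + 0.5) * h) for j in range(n_points))

m, t = 7, 0.9
# (a): the symmetric geometric sum equals Dirichlet's kernel.
geometric = sum(cmath.exp(-1j * k * t) for k in range(-m, m + 1))
print(geometric, dirichlet(m, t))
# (b): psi_m is the average of D_0, ..., D_m divided by 2 pi.
average = sum(dirichlet(n, t) for n in range(m + 1)) / (2 * math.pi * (m + 1))
print(average, fejer_psi(m, t))
# (d)(i): 0 <= psi_m <= (m+1)/(2 pi) on a grid of [-pi, pi].
grid = [-math.pi + 2 * math.pi * j / 1000 for j in range(1001)]
print(min(fejer_psi(m, s) for s in grid), max(fejer_psi(m, s) for s in grid), (m + 1) / (2 * math.pi))
# (d)(iii): each half of [-pi, pi] carries mass 1/2.
left = midpoint_integral(lambda s: fejer_psi(m, s), -math.pi, 0.0)
right = midpoint_integral(lambda s: fejer_psi(m, s), 0.0, math.pi)
print(left, right)
```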
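proof Really all that these amount to is summing geometric series.

(a) For (a), we have
$$\sum_{k=-n}^{n} e^{-ikt} = \frac{e^{int} - e^{-i(n+1)t}}{1 - e^{-it}} = \frac{e^{i(n+\frac12)t} - e^{-i(n+\frac12)t}}{e^{\frac12 it} - e^{-\frac12 it}} = \frac{\sin(n+\frac12)t}{\sin\frac12 t}.$$
So
$$\begin{aligned}
s_n(x) &= \sum_{k=-n}^{n} c_k e^{ikx} = \sum_{k=-n}^{n} e^{ikx}\frac{1}{2\pi}\int_{-\pi}^{\pi} f(s)e^{-iks}\,ds
= \frac{1}{2\pi}\int_{-\pi}^{\pi} f(s)\Bigl(\sum_{k=-n}^{n} e^{ik(x-s)}\Bigr)ds
= \frac{1}{2\pi}\int_{-\pi}^{\pi}\tilde f(s)\frac{\sin(n+\frac12)(x-s)}{\sin\frac12(x-s)}\,ds\\
&= \frac{1}{2\pi}\int_{-\pi-x}^{\pi-x}\tilde f(x+t)\frac{\sin(n+\frac12)t}{\sin\frac12 t}\,dt
= \frac{1}{2\pi}\int_{-\pi}^{\pi}\tilde f(x+t)\frac{\sin(n+\frac12)t}{\sin\frac12 t}\,dt.
\end{aligned}$$
The last step holds because $\tilde f$ and $t \mapsto \frac{\sin(n+\frac12)t}{\sin\frac12 t}$ are periodic with period $2\pi$, so that the integral from $-\pi - x$ to $-\pi$ must be the same as the integral from $\pi - x$ to $\pi$. For the expression in terms of $f(x -_{2\pi} t)$, we have
$$\begin{aligned}
s_n(x) &= \frac{1}{2\pi}\int_{-\pi}^{\pi}\tilde f(x+t)\frac{\sin(n+\frac12)t}{\sin\frac12 t}\,dt
= \frac{1}{2\pi}\int_{-\pi}^{\pi}\tilde f(x-t)\frac{\sin(n+\frac12)(-t)}{\sin\frac12(-t)}\,dt \qquad\text{(substituting } -t \text{ for } t\text{)}\\
&= \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x -_{2\pi} t)\frac{\sin(n+\frac12)t}{\sin\frac12 t}\,dt.
\end{aligned}$$
This is because (for $x$, $t \in\ ]-\pi, \pi]$) $f(x -_{2\pi} t) = \tilde f(x - t)$ whenever either is defined, and $\sin$ is an odd function.

(b) In the same way, we have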
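$$\begin{aligned}
\sum_{n=0}^{m}\sin(n+\tfrac12)t &= \operatorname{Im}\Bigl(\sum_{n=0}^{m} e^{i(n+\frac12)t}\Bigr) = \operatorname{Im}\Bigl(e^{\frac12 it}\sum_{n=0}^{m} e^{int}\Bigr)
= \operatorname{Im}\Bigl(e^{\frac12 it}\frac{1 - e^{i(m+1)t}}{1 - e^{it}}\Bigr) = \operatorname{Im}\Bigl(\frac{1 - e^{i(m+1)t}}{e^{-\frac12 it} - e^{\frac12 it}}\Bigr)\\
&= \operatorname{Im}\Bigl(\frac{1 - e^{i(m+1)t}}{-2i\sin\frac12 t}\Bigr) = \operatorname{Im}\Bigl(\frac{i(1 - e^{i(m+1)t})}{2\sin\frac12 t}\Bigr) = \frac{1 - \cos(m+1)t}{2\sin\frac12 t}.
\end{aligned}$$
So
$$\sum_{n=0}^{m}\frac{\sin(n+\frac12)t}{\sin\frac12 t} = \frac{1 - \cos(m+1)t}{2\sin^2\frac12 t} = \frac{1 - \cos(m+1)t}{1 - \cos t} = 2\pi(m+1)\psi_m(t).$$
Accordingly,
$$\begin{aligned}
\sigma_m(x) &= \frac{1}{m+1}\sum_{n=0}^{m} s_n(x) = \frac{1}{m+1}\sum_{n=0}^{m}\frac{1}{2\pi}\int_{-\pi}^{\pi}\tilde f(x+t)\frac{\sin(n+\frac12)t}{\sin\frac12 t}\,dt
= \frac{1}{2\pi}\int_{-\pi}^{\pi}\tilde f(x+t)\Bigl(\frac{1}{m+1}\sum_{n=0}^{m}\frac{\sin(n+\frac12)t}{\sin\frac12 t}\Bigr)dt\\
&= \int_{-\pi}^{\pi}\tilde f(x+t)\psi_m(t)\,dt = \int_{-\pi}^{\pi} f(x -_{2\pi} t)\psi_m(t)\,dt
\end{aligned}$$
as in (a), because $\cos$ and $\psi_m$ are even functions. For the same reason,
$$\int_0^{\pi}\tilde f(x-t)\psi_m(t)\,dt = \int_{-\pi}^{0}\tilde f(x+t)\psi_m(t)\,dt,$$
so
$$\sigma_m(x) = \int_0^{\pi}(\tilde f(x+t) + \tilde f(x-t))\psi_m(t)\,dt.$$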
(c) We need only look at where the formula $\frac{\sin(n+\frac12)t}{\sin\frac12 t}$ came from to see that
$$\frac{1}{2\pi}\int_I\frac{\sin(n+\frac12)t}{\sin\frac12 t}\,dt = \frac{1}{2\pi}\int_I\sum_{k=-n}^{n} e^{ikt}\,dt = \frac{1}{2\pi}\int_I\Bigl(1 + 2\sum_{k=1}^{n}\cos kt\Bigr)dt = \frac12$$
for both $I = [-\pi, 0]$ and $I = [0, \pi]$, because $\int_I\cos kt\,dt = 0$ for every $k \ne 0$.
(d)(i) $\psi_m(t) \ge 0$ for every $t$ because $1 - \cos(m+1)t$ and $1 - \cos t$ are always greater than or equal to 0. For the upper bound, we have, using the constructions in (a) and (b),
$$\Bigl|\frac{\sin(n+\frac12)t}{\sin\frac12 t}\Bigr| = \Bigl|\sum_{k=-n}^{n} e^{ikt}\Bigr| \le 2n + 1$$
for every $n$, so
$$\psi_m(t) = \frac{1}{2\pi(m+1)}\sum_{n=0}^{m}\frac{\sin(n+\frac12)t}{\sin\frac12 t} \le \frac{1}{2\pi(m+1)}\sum_{n=0}^{m}(2n + 1) = \frac{m+1}{2\pi}.$$

(ii) If $\delta \le |t| \le \pi$,
$$\psi_m(t) \le \frac{1}{\pi(m+1)(1 - \cos t)} \le \frac{1}{\pi(m+1)(1 - \cos\delta)} \to 0$$
as $m \to \infty$.

(iii) also follows from the construction in (b), because
$$\int_I\psi_m(t)\,dt = \frac{1}{2\pi(m+1)}\sum_{n=0}^{m}\int_I\frac{\sin(n+\frac12)t}{\sin\frac12 t}\,dt = \frac{1}{m+1}\sum_{n=0}^{m}\frac12 = \frac12$$
for both $I = [-\pi, 0]$ and $I = [0, \pi]$, using (c).

Remarks For a discussion of substitution in integrals, if you feel any need to justify the manipulations in part (a) of the proof, see 263I. The functions
$$t \mapsto \frac{\sin(n+\frac12)t}{\sin\frac12 t}, \qquad t \mapsto \frac{1 - \cos(m+1)t}{(m+1)(1 - \cos t)}$$
are called respectively Dirichlet’s kernel and Fejér’s kernel. I give the formulae in terms of $f(x -_{2\pi} t)$ in (a) and (b) in order to provide a link with the work of 255O.

282E The next step is a vital lemma, with a suitably distinguished name which (you will be glad to know) reflects its importance rather than its difficulty.

The Riemann-Lebesgue lemma Let $f$ be a complex-valued function which is integrable over $\mathbb R$. Then
$$\lim_{y\to\infty}\int f(x)e^{-iyx}\,dx = \lim_{y\to-\infty}\int f(x)e^{-iyx}\,dx = 0.$$

proof (a) Consider first the case in which $f = \chi\,]a, b[$, where $a < b$. Then
$$\Bigl|\int f(x)e^{-iyx}\,dx\Bigr| = \Bigl|\int_a^b e^{-iyx}\,dx\Bigr| = \Bigl|\frac{1}{-iy}(e^{-iyb} - e^{-iya})\Bigr| \le \frac{2}{|y|}$$
if $y \ne 0$. So in this case the result is obvious.

(b) It follows at once that the result is true if $f$ is a step-function with bounded support, that is, if there are $a_0 \le a_1 \le \dots \le a_n$ such that $f$ is constant on every interval $]a_{j-1}, a_j[$ and zero outside $[a_0, a_n]$.
(c) Now, for a given integrable $f$ and $\epsilon > 0$, there is a step-function $g$ such that $\int|f - g| \le \epsilon$ (242O). So
$$\Bigl|\int f(x)e^{-iyx}\,dx - \int g(x)e^{-iyx}\,dx\Bigr| \le \int|f(x) - g(x)|\,dx \le \epsilon$$
for every $y$. Since the result holds for $g$ by (b),
$$\limsup_{y\to\infty}\Bigl|\int f(x)e^{-iyx}\,dx\Bigr| \le \epsilon, \qquad \limsup_{y\to-\infty}\Bigl|\int f(x)e^{-iyx}\,dx\Bigr| \le \epsilon.$$
As $\epsilon$ is arbitrary, we have the result.

282F Corollary (a) Let $f$ be a complex-valued function which is integrable over $]-\pi, \pi]$, and $\langle c_k\rangle_{k\in\mathbb Z}$ its sequence of Fourier coefficients. Then $\lim_{k\to\infty} c_k = \lim_{k\to-\infty} c_k = 0$.

(b) Let $f$ be a complex-valued function which is integrable over $\mathbb R$. Then $\lim_{y\to\infty}\int f(x)\sin yx\,dx = 0$.

proof (a) We need only identify
$$c_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-ikx}\,dx$$
with $\int g(x)e^{-ikx}\,dx$, where $g(x) = f(x)/2\pi$ for $x \in \operatorname{dom}f$ and $0$ for $|x| > \pi$.

(b) This is just because
$$\int f(x)\sin yx\,dx = \frac{1}{2i}\Bigl(\int f(x)e^{iyx}\,dx - \int f(x)e^{-iyx}\,dx\Bigr).$$
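Part (a) of the proof of 282E rests on the closed form $\int_a^b e^{-iyx}\,dx = \frac{1}{-iy}(e^{-iyb} - e^{-iya})$ and the bound $2/|y|$. The Python sketch below evaluates the closed form and compares it with a midpoint-rule approximation as an independent check. It then confirms the bound for several $y$ of both signs. The interval $[-0.3, 1.7]$ and the helper names are illustrative assumptions.

```python
import cmath
import math

def indicator_transform(a, b, y):
    """integral_a^b e^{-iyx} dx = (e^{-iyb} - e^{-iya}) / (-iy), for y != 0."""
    return (cmath.exp(-1j * y * b) - cmath.exp(-1j * y * a)) / (-1j * y)

def midpoint_transform(a, b, y, n_points=20000):
    """The same integral by the midpoint rule, as an independent check of the closed form."""
    h = (b - a) / n_points
    return h * sum(cmath.exp(-1j * y * (a + (j + 0.5) * h)) for j in range(n_points))

a, b = -0.3, 1.7   # an illustrative interval
for y in (1.0, 10.0, 100.0, -100.0, 1000.0):
    exact = indicator_transform(a, b, y)
    print(y, abs(exact), 2 / abs(y), abs(exact - midpoint_transform(a, b, y)))
```

The magnitudes shrink like $1/|y|$. Part (b) of the proof extends this to step-functions by linearity.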
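282G We are now ready for theorems on the convergence of Fejér sums. I start with an easy one, almost a warming-up exercise.

Theorem Let $f : ]-\pi, \pi] \to \mathbb C$ be a continuous function such that $\lim_{t\downarrow-\pi} f(t) = f(\pi)$. Then its sequence $\langle\sigma_m\rangle_{m\in\mathbb N}$ of Fejér sums converges uniformly to $f$ on $]-\pi, \pi]$.

proof The conditions on $f$ amount just to saying that its periodic extension $\tilde f$ is defined and continuous everywhere on $\mathbb R$. Consequently it is bounded and uniformly continuous on any bounded interval, in particular, on the interval $[-2\pi, 2\pi]$. Set $K = \sup_{|t|\le2\pi}|\tilde f(t)| = \sup_{t\in]-\pi,\pi]}|f(t)|$. We may suppose that $K > 0$, since otherwise $f$ and all its Fejér sums are zero. Write
$$\psi_m(t) = \frac{1 - \cos(m+1)t}{2\pi(m+1)(1 - \cos t)}$$
for $m \in \mathbb N$, $0 < |t| \le \pi$, as in 282D.

Given $\epsilon > 0$ we can find a $\delta \in\ ]0, \pi]$ such that $|\tilde f(x+t) - \tilde f(x)| \le \epsilon$ whenever $x \in [-\pi, \pi]$, $|t| \le \delta$. Next, we can find an $m_0 \in \mathbb N$ such that $M_m \le \frac{\epsilon}{4\pi K}$ for every $m \ge m_0$, where $M_m = \sup_{\delta\le|t|\le\pi}\psi_m(t)$ (282D(d-ii)). Now suppose that $m \ge m_0$ and $x \in\ ]-\pi, \pi]$. Set $g(t) = \tilde f(x+t) - f(x)$ for $|t| \le \pi$. Then $|g(t)| \le 2K$ for all $t \in [-\pi, \pi]$ and $|g(t)| \le \epsilon$ if $|t| \le \delta$, so
$$\begin{aligned}
\Bigl|\int_{-\pi}^{\pi} g(t)\psi_m(t)\,dt\Bigr| &\le \int_{-\pi}^{-\delta}|g(t)|\psi_m(t)\,dt + \int_{-\delta}^{\delta}|g(t)|\psi_m(t)\,dt + \int_{\delta}^{\pi}|g(t)|\psi_m(t)\,dt\\
&\le 2M_mK(\pi - \delta) + \epsilon\int_{-\delta}^{\delta}\psi_m(t)\,dt + 2M_mK(\pi - \delta)\\
&\le 4\pi M_mK + \epsilon \le 2\epsilon.
\end{aligned}$$
Consequently, using 282Db and 282D(d-iii),
$$|\sigma_m(x) - f(x)| = \Bigl|\int_{-\pi}^{\pi}(\tilde f(x+t) - f(x))\psi_m(t)\,dt\Bigr| \le 2\epsilon$$
for every $m \ge m_0$; and this is true for every $x \in\ ]-\pi, \pi]$. As $\epsilon$ is arbitrary, $\langle\sigma_m\rangle_{m\in\mathbb N}$ converges to $f$ uniformly on $]-\pi, \pi]$.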
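To see 282G in action, take the illustrative function $f(x) = |x|$. It is continuous on $]-\pi, \pi]$ with $\lim_{t\downarrow-\pi}f(t) = f(\pi)$. Its coefficients, computed by hand, are $c_0 = \pi/2$ and $c_k = ((-1)^k - 1)/(\pi k^2)$ for $k \ne 0$. The Python sketch below evaluates $\sigma_m$ through the equivalent damped form $\sigma_m(x) = \sum_{|k|\le m}(1 - \frac{|k|}{m+1})c_k e^{ikx}$. It then records the largest error on a grid. The function, the grid and the helper names are assumptions made for the example.

```python
import math

def abs_coefficient(k):
    """c_k for f(x) = |x| on ]-pi, pi]: pi/2 if k = 0, else ((-1)^k - 1) / (pi k^2)."""
    if k == 0:
        return math.pi / 2
    return ((-1) ** k - 1) / (math.pi * k * k)

def fejer_sum_abs(m, x):
    """sigma_m(x) = sum_{|k|<=m} (1 - |k|/(m+1)) c_k e^{ikx}; here c_{-k} = c_k, so it is real."""
    total = abs_coefficient(0)
    for k in range(1, m + 1):
        total += 2 * (1 - k / (m + 1)) * abs_coefficient(k) * math.cos(k * x)
    return total

grid = [-math.pi + 2 * math.pi * (j + 1) / 400 for j in range(400)]   # 400 points of ]-pi, pi]
errors = {}
for m in (4, 16, 64, 256):
    errors[m] = max(abs(fejer_sum_abs(m, x) - abs(x)) for x in grid)
    print(m, errors[m])
```

The sup-error decreases steadily, as the theorem predicts. Because $\psi_m \ge 0$ and $\int\psi_m = 1$, every $\sigma_m$ also stays within $[\min f, \max f] = [0, \pi]$. There is no overshoot of the kind Fourier sums show at discontinuities.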
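282H I come now to a theorem describing the behaviour of the Fejér sums of general functions $f$. The hypothesis of the theorem may take a little bit of digesting; you can get an idea of its intended scope by glancing at Corollary 282I.

Theorem Let $f$ be a complex-valued function which is integrable over $]-\pi, \pi]$, and $\langle\sigma_m\rangle_{m\in\mathbb N}$ its sequence of Fejér sums. Suppose that $x \in\ ]-\pi, \pi]$, $c \in \mathbb C$ are such that
$$\lim_{\delta\downarrow0}\frac1\delta\int_0^{\delta}|\tilde f(x+t) + \tilde f(x-t) - 2c|\,dt = 0,$$
writing $\tilde f$ for the periodic extension of $f$, as usual; then $\lim_{m\to\infty}\sigma_m(x) = c$.

proof Set $\phi(t) = |\tilde f(x+t) + \tilde f(x-t) - 2c|$ when this is defined, which is almost everywhere. Set $\Phi(t) = \int_0^t\phi(s)\,ds$; this is defined for every $t \ge 0$, because $\tilde f$ is integrable over $]-\pi, \pi]$ and therefore over every bounded interval. As in 282D, set
$$\psi_m(t) = \frac{1 - \cos(m+1)t}{2\pi(m+1)(1 - \cos t)}$$
for $m \in \mathbb N$, $0 < |t| \le \pi$. We have
$$|\sigma_m(x) - c| = \Bigl|\int_0^{\pi}(\tilde f(x+t) + \tilde f(x-t) - 2c)\psi_m(t)\,dt\Bigr| \le \int_0^{\pi}\phi(t)\psi_m(t)\,dt$$
by (b) and (d) of 282D.

Let $\epsilon > 0$. By hypothesis, $\lim_{t\downarrow0}\Phi(t)/t = 0$; let $\delta \in\ ]0, \pi]$ be such that $\Phi(t) \le \epsilon t$ for every $t \in [0, \delta]$. Take any $m \ge \pi/\delta$. I break the integral $\int_0^{\pi}\phi(t)\psi_m(t)\,dt$ up into three parts.

(i) For the integral from $0$ to $1/m$, we have
$$\int_0^{1/m}\phi(t)\psi_m(t)\,dt \le \int_0^{1/m}\frac{m+1}{2\pi}\phi(t)\,dt = \frac{m+1}{2\pi}\Phi\bigl(\tfrac1m\bigr) \le \frac{\epsilon(m+1)}{2\pi m} \le \epsilon,$$
because $\psi_m(t) \le \frac{m+1}{2\pi}$ for every $t$ (282D(d-i)).

(ii) For the integral from $1/m$ to $\delta$, we have
$$\begin{aligned}
\int_{1/m}^{\delta}\phi(t)\psi_m(t)\,dt &\le \frac{1}{\pi(m+1)}\int_{1/m}^{\delta}\frac{\phi(t)}{1 - \cos t}\,dt \qquad\text{(because } 1 - \cos(m+1)t \le 2\text{)}\\
&\le \frac{\pi}{2(m+1)}\int_{1/m}^{\delta}\frac{\phi(t)}{t^2}\,dt \qquad\Bigl(\text{because } 1 - \cos t \ge \frac{2t^2}{\pi^2} \text{ for } |t| \le \pi\Bigr)\\
&= \frac{\pi}{2(m+1)}\Bigl(\frac{\Phi(\delta)}{\delta^2} - \frac{\Phi(\frac1m)}{(\frac1m)^2} + \int_{1/m}^{\delta}\frac{2\Phi(t)}{t^3}\,dt\Bigr) \qquad\text{(integrating by parts; see 225F)}\\
&\le \frac{\pi}{2(m+1)}\Bigl(\frac{\epsilon}{\delta} + \int_{1/m}^{\delta}\frac{2\epsilon}{t^2}\,dt\Bigr) \qquad\text{(because } \Phi(t) \le \epsilon t \text{ for } 0 \le t \le \delta\text{)}\\
&\le \frac{\pi}{2(m+1)}\Bigl(\frac{\epsilon}{\delta} + 2\epsilon m\Bigr) \le \frac{\pi\epsilon}{2(m+1)\delta} + \pi\epsilon \le \frac{\epsilon}{2} + \pi\epsilon \le 4\epsilon.
\end{aligned}$$

(iii) For the integral from $\delta$ to $\pi$, we have
$$\int_{\delta}^{\pi}\phi(t)\psi_m(t)\,dt = \int_{\delta}^{\pi}\phi(t)\frac{1 - \cos(m+1)t}{2\pi(m+1)(1 - \cos t)}\,dt \le \int_{\delta}^{\pi}\frac{\phi(t)}{\pi(m+1)(1 - \cos\delta)}\,dt \to 0 \text{ as } m \to \infty,$$
because $\phi$ is integrable over $[-\pi, \pi]$. There must therefore be an $m_0 \in \mathbb N$ such that
$$\int_{\delta}^{\pi}\phi(t)\psi_m(t)\,dt \le \epsilon$$
for every $m \ge m_0$.

Putting these together, we see that
$$\int_0^{\pi}\phi(t)\psi_m(t)\,dt \le \epsilon + 4\epsilon + \epsilon = 6\epsilon$$
for every $m \ge \max(m_0, \frac\pi\delta)$. As $\epsilon$ is arbitrary, $\lim_{m\to\infty}\sigma_m(x) = c$, as claimed.

282I Corollary Let $f$ be a complex-valued function which is integrable over $]-\pi, \pi]$, and $\langle\sigma_m\rangle_{m\in\mathbb N}$ its sequence of Fejér sums.

(a) $f(x) = \lim_{m\to\infty}\sigma_m(x)$ for almost every $x \in\ ]-\pi, \pi]$.

(b) $\lim_{m\to\infty}\int_{-\pi}^{\pi}|f(x) - \sigma_m(x)|\,dx = 0$.

(c) If $g$ is another integrable function with the same Fourier coefficients, then $f =_{\text{a.e.}} g$.

(d) If $x \in\ ]-\pi, \pi[$ is such that $a = \lim_{t\in\operatorname{dom}f, t\uparrow x} f(t)$ and $b = \lim_{t\in\operatorname{dom}f, t\downarrow x} f(t)$ are both defined in $\mathbb C$, then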
$$\lim_{m\to\infty}\sigma_m(x) = \tfrac12(a + b).$$

(e) If $a = \lim_{t\in\operatorname{dom}f, t\uparrow\pi} f(t)$ and $b = \lim_{t\in\operatorname{dom}f, t\downarrow-\pi} f(t)$ are both defined in $\mathbb C$, then
$$\lim_{m\to\infty}\sigma_m(\pi) = \tfrac12(a + b).$$

(f) If $f$ is defined and continuous at $x \in\ ]-\pi, \pi[$, then $\lim_{m\to\infty}\sigma_m(x) = f(x)$.

(g) If $\tilde f$, the periodic extension of $f$, is defined and continuous at $\pi$, then $\lim_{m\to\infty}\sigma_m(\pi) = f(\pi)$.

proof (a) We have only to recall that by 223D
$$\begin{aligned}
\limsup_{\delta\downarrow0}\frac1\delta\int_0^{\delta}|f(x+t) + f(x-t) - 2f(x)|\,dt
&\le \limsup_{\delta\downarrow0}\frac1\delta\Bigl(\int_0^{\delta}|f(x+t) - f(x)|\,dt + \int_0^{\delta}|f(x-t) - f(x)|\,dt\Bigr)\\
&= \limsup_{\delta\downarrow0}\frac1\delta\int_{-\delta}^{\delta}|f(x+t) - f(x)|\,dt = 0
\end{aligned}$$
for almost every $x \in\ ]-\pi, \pi[$. At every such $x$ the theorem applies with $c = f(x)$.

(b) Next observe that, in the language of 255O, $\sigma_m = f * \psi_m$, by the last formula in 282Db. Consequently, by 255Oe and 282D(d-iii),
$$\|\sigma_m\|_1 \le \|f\|_1\|\psi_m\|_1 = \|f\|_1,$$
writing $\|\sigma_m\|_1 = \int_{-\pi}^{\pi}|\sigma_m(x)|\,dx$. But this means that we have
$$f(x) = \lim_{m\to\infty}\sigma_m(x) \text{ for almost every } x, \qquad \limsup_{m\to\infty}\|\sigma_m\|_1 \le \|f\|_1;$$
and it follows from 245H that $\lim_{m\to\infty}\|f - \sigma_m\|_1 = 0$.

(c) If $g$ has the same Fourier coefficients as $f$, then it has the same Fourier and Fejér sums, so we have $g(x) = \lim_{m\to\infty}\sigma_m(x) = f(x)$ almost everywhere.

(d)-(e) Both of these amount to considering $x \in\ ]-\pi, \pi]$ such that
$$\lim_{t\in\operatorname{dom}\tilde f, t\uparrow x}\tilde f(t) = a, \qquad \lim_{t\in\operatorname{dom}\tilde f, t\downarrow x}\tilde f(t) = b.$$
Set $c = \frac12(a + b)$ and $\phi(t) = |\tilde f(x+t) + \tilde f(x-t) - 2c|$ whenever this is defined. Then $\lim_{t\in\operatorname{dom}\phi, t\downarrow0}\phi(t) = 0$, so surely $\lim_{\delta\downarrow0}\frac1\delta\int_0^{\delta}\phi = 0$, and the theorem applies.

(f)-(g) are special cases of (d) and (e).

282J I now turn to conditions for the convergence of Fourier sums. Probably the easiest result, one which is both striking and satisfying, is the following.

Theorem Let $f$ be a complex-valued function which is square-integrable over $]-\pi, \pi]$. Let $\langle c_k\rangle_{k\in\mathbb Z}$ be its Fourier coefficients and $\langle s_n\rangle_{n\in\mathbb N}$ its Fourier sums (282A). Then

(i) $\sum_{k=-\infty}^{\infty}|c_k|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x)|^2\,dx$,

(ii) $\lim_{n\to\infty}\int_{-\pi}^{\pi}|f(x) - s_n(x)|^2\,dx = 0$.

proof (a) I recall some notation from 244N. Let $\mathcal L^2_{\mathbb C}$ be the space of square-integrable complex-valued functions on $]-\pi, \pi]$. For $g$, $h \in \mathcal L^2_{\mathbb C}$, write
$$(g|h) = \int_{-\pi}^{\pi} g(x)\overline{h(x)}\,dx, \qquad \|g\|_2 = \sqrt{(g|g)}.$$
Recall that $\|g + h\|_2 \le \|g\|_2 + \|h\|_2$ for all $g$, $h \in \mathcal L^2_{\mathbb C}$ (244Fb). For $k \in \mathbb Z$, $x \in\ ]-\pi, \pi]$ set $e_k(x) = e^{ikx}$, so that
$$(f|e_k) = \int_{-\pi}^{\pi} f(x)e^{-ikx}\,dx = 2\pi c_k.$$
Moreover, if $|k| \le n$,
$$(s_n|e_k) = \sum_{j=-n}^{n} c_j\int_{-\pi}^{\pi} e^{ijx}e^{-ikx}\,dx = 2\pi c_k,$$
because
$$\int_{-\pi}^{\pi} e^{ijx}e^{-ikx}\,dx = 2\pi \text{ if } j = k, \qquad = 0 \text{ if } j \ne k.$$
So $(f - s_n|e_k) = 0$ whenever $|k| \le n$; in particular,
$$(f - s_n|s_n) = \sum_{k=-n}^{n}\bar c_k(f - s_n|e_k) = 0$$
for every $n \in \mathbb N$.

(b) Fix $\epsilon > 0$. The next element of the proof is the fact that there are $m \in \mathbb N$, $a_{-m}, \dots, a_m \in \mathbb C$ such that $\|f - h\|_2 \le \epsilon$, where $h = \sum_{k=-m}^{m} a_k e_k$. $\mathbf{P}$ By 244Hb we know that there is a continuous function $g : [-\pi, \pi] \to \mathbb C$ such that $\|f - g\|_2 \le \frac\epsilon3$. Next, modifying $g$ on a suitably short interval $]\pi - \delta, \pi]$, we can find a continuous function $g_1 : [-\pi, \pi] \to \mathbb C$ such that $\|g - g_1\|_2 \le \frac\epsilon3$ and $g_1(-\pi) = g_1(\pi)$. (Set $M = \sup_{x\in[-\pi,\pi]}|g(x)|$ and take $\delta \in\ ]0, 2\pi[$ such that $(2M)^2\delta \le (\epsilon/3)^2$. Then set $g_1(\pi - t\delta) = tg(\pi - \delta) + (1 - t)g(-\pi)$ for $t \in [0, 1]$, and $g_1(x) = g(x)$ for $x \in [-\pi, \pi - \delta]$.) Either by the Stone-Weierstrass theorem (281J), or by 282G above, there are $a_{-m}, \dots, a_m$ such that $|g_1(x) - \sum_{k=-m}^{m} a_k e^{ikx}| \le \frac{\epsilon}{3\sqrt{2\pi}}$ for every $x \in [-\pi, \pi]$. Setting $h = \sum_{k=-m}^{m} a_k e_k$, we have $\|g_1 - h\|_2 \le \frac13\epsilon$, so that
$$\|f - h\|_2 \le \|f - g\|_2 + \|g - g_1\|_2 + \|g_1 - h\|_2 \le \epsilon. \quad \mathbf{Q}$$

(c) Now take any $n \ge m$. Then $s_n - h$ is a linear combination of $e_{-n}, \dots, e_n$, so $(f - s_n|s_n - h) = 0$. Consequently
$$\begin{aligned}
\epsilon^2 \ge (f - h|f - h) &= (f - s_n|f - s_n) + (f - s_n|s_n - h) + (s_n - h|f - s_n) + (s_n - h|s_n - h)\\
&= \|f - s_n\|_2^2 + \|s_n - h\|_2^2 \ge \|f - s_n\|_2^2.
\end{aligned}$$
Thus $\|f - s_n\|_2 \le \epsilon$ for every $n \ge m$. As $\epsilon$ is arbitrary, $\lim_{n\to\infty}\|f - s_n\|_2^2 = 0$, which proves (ii).

(d) As for (i), we have
|ck |2 =
k=−n
But of course
1 2π
n X
c¯k (sn |ek ) =
k=−n
1 (sn |sn ) 2π
=
1 ksn k22 . 2π
¯ ¯ ¯ksn k2 − kf k2 ¯ ≤ ksn − f k2 → 0
as n → ∞, so ∞ X k=−∞
|ck |2 =
1 lim ksn k22 2π n→∞
=
1 kf k22 2π
=
1 2π
Z
π
|f (x)|2 dx,
−π
as required.

282K Corollary Let $L^2_{\mathbb C}$ be the Hilbert space of equivalence classes of square-integrable complex-valued functions on $]-\pi,\pi]$, with the inner product
$$(f^\bullet|g^\bullet) = \int_{-\pi}^{\pi}f(x)\overline{g(x)}\,dx$$
and norm
$$\|f^\bullet\|_2 = \Bigl(\int_{-\pi}^{\pi}|f(x)|^2\,dx\Bigr)^{1/2},$$
writing $f^\bullet\in L^2_{\mathbb C}$ for the equivalence class of a square-integrable function $f$. Let $\ell^2_{\mathbb C}(\mathbb Z)$ be the Hilbert space of square-summable double-ended complex sequences, with the inner product
$$(\boldsymbol c|\boldsymbol d) = \sum_{k=-\infty}^{\infty}c_k\bar d_k$$
and norm
$$\|\boldsymbol c\|_2 = \Bigl(\sum_{k=-\infty}^{\infty}|c_k|^2\Bigr)^{1/2}$$
282K
Fourier series
423
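for $\boldsymbol c = \langle c_k\rangle_{k\in\mathbb Z}$, $\boldsymbol d = \langle d_k\rangle_{k\in\mathbb Z}$ in $\ell^2_{\mathbb C}(\mathbb Z)$. Then we have an inner-product-space isomorphism $S:L^2_{\mathbb C}\to\ell^2_{\mathbb C}(\mathbb Z)$ defined by saying that
$$S(f^\bullet)(k) = \frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}f(x)e^{-ikx}\,dx$$
for every square-integrable function $f$ and every $k\in\mathbb Z$.

proof (a) As in 282J, write $\mathcal L^2_{\mathbb C}$ for the space of square-integrable functions. If $f$, $g\in\mathcal L^2_{\mathbb C}$ and $f^\bullet = g^\bullet$, then $f =_{\text{a.e.}} g$, so
$$\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}f(x)e^{-ikx}\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}g(x)e^{-ikx}\,dx$$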
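for every $k\in\mathbb Z$. Thus $S$ is well-defined.

(b) $S$ is linear. P P This is elementary. If $f$, $g\in\mathcal L^2_{\mathbb C}$ and $c\in\mathbb C$,
$$\begin{aligned}S(f^\bullet+g^\bullet)(k) &= \frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}(f(x)+g(x))e^{-ikx}\,dx\\ &= \frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}f(x)e^{-ikx}\,dx+\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}g(x)e^{-ikx}\,dx\\ &= S(f^\bullet)(k)+S(g^\bullet)(k)\end{aligned}$$
for every $k\in\mathbb Z$, so that $S(f^\bullet+g^\bullet) = S(f^\bullet)+S(g^\bullet)$. Similarly,
$$S(cf^\bullet)(k) = \frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi}cf(x)e^{-ikx}\,dx = \frac{c}{\sqrt{2\pi}}\int_{-\pi}^{\pi}f(x)e^{-ikx}\,dx = cS(f^\bullet)(k)$$
for every $k\in\mathbb Z$, so that $S(cf^\bullet) = cS(f^\bullet)$. Q Q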
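(c) If $f\in\mathcal L^2_{\mathbb C}$ has Fourier coefficients $c_k$, then $S(f^\bullet) = \langle\sqrt{2\pi}\,c_k\rangle_{k\in\mathbb Z}$, so by 282J(i)
$$\|S(f^\bullet)\|_2^2 = 2\pi\sum_{k=-\infty}^{\infty}|c_k|^2 = \int_{-\pi}^{\pi}|f(x)|^2\,dx = \|f^\bullet\|_2^2.$$
Thus $Su\in\ell^2_{\mathbb C}(\mathbb Z)$ and $\|Su\|_2 = \|u\|_2$ for every $u\in L^2_{\mathbb C}$. Because $S$ is linear and norm-preserving, it is surely injective.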
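For a concrete instance of part (c), take $f(x) = x$ on $]-\pi,\pi]$; integrating by parts gives $c_0 = 0$ and $c_k = i(-1)^k/k$ for $k\ne 0$. The sketch below is illustrative only (the truncation level `K` is an arbitrary choice): it compares the partial sums of $\|S(f^\bullet)\|_2^2 = 2\pi\sum_k|c_k|^2$ with $\|f^\bullet\|_2^2 = \int_{-\pi}^{\pi}x^2\,dx = 2\pi^3/3$. The part omitted by truncating at $|k|\le K$ is $4\pi\sum_{k>K}k^{-2}$, which lies strictly between $0$ and $4\pi/K$.

```python
import math

def coefficient_of_identity(k):
    """c_k = (1/2pi) int_{-pi}^{pi} x e^{-ikx} dx for f(x) = x."""
    return 0j if k == 0 else 1j * (-1) ** k / k

def truncated_norm_sq(K):
    """Sum of |S(f.)(k)|^2 = 2*pi*|c_k|^2 over |k| <= K (fsum limits rounding)."""
    return math.fsum(2 * math.pi * abs(coefficient_of_identity(k)) ** 2
                     for k in range(-K, K + 1))

norm_sq_f = 2 * math.pi ** 3 / 3        # ||f.||_2^2 = int_{-pi}^{pi} x^2 dx
K = 10_000
gap = norm_sq_f - truncated_norm_sq(K)
print(0 < gap < 4 * math.pi / K)        # True: only the tail |k| > K is missing
```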
(d) It now follows that $(Sv|Su) = (v|u)$ for every $u$, $v\in L^2_{\mathbb C}$. P P (This is of course a standard fact about Hilbert spaces.) We know that for any $t\in\mathbb R$
$$\begin{aligned}\|u\|_2^2+2\operatorname{Re}(e^{it}(v|u))+\|v\|_2^2 &= (u|u)+e^{it}(v|u)+e^{-it}(u|v)+(v|v) = (u+e^{it}v|u+e^{it}v)\\ &= \|u+e^{it}v\|_2^2 = \|S(u+e^{it}v)\|_2^2\\ &= \|Su\|_2^2+2\operatorname{Re}(e^{it}(Sv|Su))+\|Sv\|_2^2 = \|u\|_2^2+2\operatorname{Re}(e^{it}(Sv|Su))+\|v\|_2^2,\end{aligned}$$
so that $\operatorname{Re}(e^{it}(Sv|Su)) = \operatorname{Re}(e^{it}(v|u))$. As $t$ is arbitrary, $(Sv|Su) = (v|u)$. Q Q

(e) Finally, $S$ is surjective. P P Let $\boldsymbol c = \langle c_k\rangle_{k\in\mathbb Z}$ be any member of $\ell^2_{\mathbb C}(\mathbb Z)$. Set $c^{(n)}_k = c_k$ if $|k|\le n$, $0$ otherwise, and $\boldsymbol c^{(n)} = \langle c^{(n)}_k\rangle_{k\in\mathbb Z}$. Consider
$$s_n = \sum_{k=-n}^{n}c_ke_k,\qquad u_n = s_n^\bullet,$$
where I write $e_k(x) = \frac{1}{\sqrt{2\pi}}e^{ikx}$ for $x\in\ ]-\pi,\pi]$. Then $Su_n = \boldsymbol c^{(n)}$, by the same calculations as in part (a) of the proof of 282J. Now
$$\|\boldsymbol c^{(n)}-\boldsymbol c\|_2 = \sqrt{\sum_{|k|>n}|c_k|^2}\to 0$$
as $n\to\infty$, so $\|u_m-u_n\|_2 = \|\boldsymbol c^{(m)}-\boldsymbol c^{(n)}\|_2\to 0$ as $m$, $n\to\infty$, and $\langle u_n\rangle_{n\in\mathbb N}$ is a Cauchy sequence in $L^2_{\mathbb C}$. Because $L^2_{\mathbb C}$ is complete (244G), it has a limit $u\in L^2_{\mathbb C}$, and now $Su = \lim_{n\to\infty}Su_n = \lim_{n\to\infty}\boldsymbol c^{(n)} = \boldsymbol c$. Q Q

Thus $S:L^2_{\mathbb C}\to\ell^2_{\mathbb C}(\mathbb Z)$ is an inner-product-space isomorphism.

Remark In the language of Hilbert spaces, all that is happening here is that $\langle e_k^\bullet\rangle_{k\in\mathbb Z}$ is a ‘Hilbert space basis’ or ‘complete orthonormal sequence’ in $L^2_{\mathbb C}$, which is matched by $S$ with the standard basis of $\ell^2_{\mathbb C}(\mathbb Z)$. The only step which calls on non-trivial real analysis, as opposed to the general theory of Hilbert spaces, is the check that the linear subspace generated by $\{e_k^\bullet:k\in\mathbb Z\}$ is dense; this is part (b) of the proof of 282J.

Observe that while $S:L^2\to\ell^2$ is readily described, its inverse is more of a problem. If $\boldsymbol c\in\ell^2$, we should like to say that $S^{-1}\boldsymbol c$ is the equivalence class of $f$, where $f(x) = \frac{1}{\sqrt{2\pi}}\sum_{k=-\infty}^{\infty}c_ke^{ikx}$ for every $x$. This works very well if $\{k:c_k\ne 0\}$ is finite, but for the general case it is less clear how to interpret the sum. It is in fact the case that if $\boldsymbol c\in\ell^2$ then
$$g(x) = \frac{1}{\sqrt{2\pi}}\lim_{n\to\infty}\sum_{k=-n}^{n}c_ke^{ikx}$$
is defined for almost every $x\in\ ]-\pi,\pi]$, and that $S^{-1}\boldsymbol c = g^\bullet$ in $L^2$; this is, in effect, Carleson’s theorem (286V). A proof of Carleson’s theorem is out of our reach for the moment. What is covered by the results of this section is that
$$h(x) = \frac{1}{\sqrt{2\pi}}\lim_{m\to\infty}\frac{1}{m+1}\sum_{n=0}^{m}\sum_{k=-n}^{n}c_ke^{ikx}$$
is defined for almost every $x\in\ ]-\pi,\pi]$, and that $h^\bullet = S^{-1}\boldsymbol c$. (The point is that we know from the result just proved that there is some square-integrable $f$ such that $\boldsymbol c$ is the sequence of Fourier coefficients of $f$; now 282Ia declares that the Fejér sums of $f$ converge to $f$ almost everywhere, that is, that $h =_{\text{a.e.}}\frac{1}{\sqrt{2\pi}}f$.)
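To see Fejér and Fourier sums recovering a function from its coefficients in a simple case, the sketch below (illustrative only; the test points and the order 2000 are arbitrary choices) again takes $f(x) = x$, with $c_0 = 0$ and $c_k = i(-1)^k/k$ for $k\ne 0$, and forms $s_n(x)$ and $\sigma_m(x) = \frac{1}{m+1}\sum_{n=0}^{m}s_n(x)$ directly from those definitions. At the interior point $x = 1$, where $f$ is differentiable, both are close to $f(1) = 1$; at $x = \pi$ every $s_n(\pi)$ is $0$, the average of the one-sided limits $\pi$ and $-\pi$ of the periodic extension.

```python
import cmath
import math

def c(k):
    """Fourier coefficients of f(x) = x on ]-pi, pi]."""
    return 0j if k == 0 else 1j * (-1) ** k / k

def fourier_and_fejer(x, m):
    """Return (s_m(x), sigma_m(x)), building s_0, ..., s_m one term at a time."""
    s = c(0)                      # s_0(x) = c_0
    running_total = s             # s_0(x) + ... + s_n(x)
    for n in range(1, m + 1):
        s += c(n) * cmath.exp(1j * n * x) + c(-n) * cmath.exp(-1j * n * x)
        running_total += s
    return s, running_total / (m + 1)

s, sigma = fourier_and_fejer(1.0, 2000)
print(abs(s - 1.0) < 0.01, abs(sigma - 1.0) < 0.01)   # True True
s_at_pi, _ = fourier_and_fejer(math.pi, 2000)
print(abs(s_at_pi) < 1e-9)                            # True
```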
282L The next result is the easiest, and one of the most useful, theorems concerning pointwise convergence of Fourier sums.

Theorem Let $f$ be a complex-valued function which is integrable over $]-\pi,\pi]$, with Fourier coefficients $c_k$ and Fourier sums $s_n$.

(i) If $f$ is differentiable at $x\in\ ]-\pi,\pi[$, then $f(x) = \lim_{n\to\infty}s_n(x)$.

(ii) If the periodic extension $\tilde f$ of $f$ is differentiable at $\pi$, then $f(\pi) = \lim_{n\to\infty}s_n(\pi)$.

proof (a) Take $x\in\ ]-\pi,\pi]$ such that $\tilde f$ is differentiable at $x$; of course this covers both parts. We have
$$s_n(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\tilde f(x+t)\frac{\sin(n+\frac12)t}{\sin\frac12 t}\,dt$$
for each $n$, by 282Da.

(b) Next,
$$\int_{-\pi}^{\pi}\frac{\tilde f(x+t)-\tilde f(x)}{t}\,dt$$
exists in $\mathbb C$, because there is surely some $\delta\in\ ]0,\pi]$ such that $(\tilde f(x+t)-\tilde f(x))/t$ is bounded on $\{t:0<|t|\le\delta\}$, while
$$\int_{-\pi}^{-\delta}\frac{\tilde f(x+t)-\tilde f(x)}{t}\,dt,\qquad\int_{\delta}^{\pi}\frac{\tilde f(x+t)-\tilde f(x)}{t}\,dt$$
exist because $1/t$ is bounded on those intervals. It follows that
$$\int_{-\pi}^{\pi}\frac{\tilde f(x+t)-\tilde f(x)}{\sin\frac12 t}\,dt$$
exists, because $|t|\le\pi|\sin\frac12 t|$ if $|t|\le\pi$. So by the Riemann-Lebesgue lemma (282Fb),
$$\lim_{n\to\infty}\int_{-\pi}^{\pi}\frac{\tilde f(x+t)-\tilde f(x)}{\sin\frac12 t}\sin(n+\tfrac12)t\,dt = 0.$$
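(c) Because
$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\tilde f(x)\frac{\sin(n+\frac12)t}{\sin\frac12 t}\,dt = \tilde f(x)$$
for every $n$ (282Dc),
$$s_n(x) = \tilde f(x)+\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\tilde f(x+t)-\tilde f(x)}{\sin\frac12 t}\sin(n+\tfrac12)t\,dt\to\tilde f(x)$$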
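as $n\to\infty$, as required.

282M Lemma Suppose that $f$ is a complex-valued function, defined almost everywhere and of bounded variation on $]-\pi,\pi]$. Then $\sup_{k\in\mathbb Z}|kc_k|<\infty$, where $c_k$ is the $k$th Fourier coefficient of $f$, as in 282A.

proof Set $M = \lim_{x\in\operatorname{dom}f,\,x\uparrow\pi}|f(x)|+\operatorname{Var}_{]-\pi,\pi[}(f)$. By 224J,
$$\begin{aligned}|kc_k| &= \frac{1}{2\pi}\Bigl|\int_{-\pi}^{\pi}kf(t)e^{-ikt}\,dt\Bigr|\le\frac{M}{2\pi}\sup_{c\in[-\pi,\pi]}\Bigl|\int_{-\pi}^{c}ke^{-ikt}\,dt\Bigr|\\ &= \frac{M}{2\pi}\sup_{c\in[-\pi,\pi]}|e^{-ikc}-e^{ik\pi}|\le\frac{M}{\pi}\end{aligned}$$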
for every $k$.

282N I give another lemma, extracting the technical part of the proof of the next theorem. (Its most natural application is in 282Xn.)

Lemma Let $\langle d_k\rangle_{k\in\mathbb N}$ be a complex sequence, and set $t_n = \sum_{k=0}^{n}d_k$, $\tau_m = \frac{1}{m+1}\sum_{n=0}^{m}t_n$ for $n$, $m\in\mathbb N$. Suppose that $\sup_{k\in\mathbb N}|kd_k| = M<\infty$. Then for any $j\ge 1$ and any $c\in\mathbb C$,
$$|t_n-c|\le\frac Mj+(2j+3)\sup_{m\ge n-n/j}|\tau_m-c|$$
for every $n\ge j^2$.

proof (a) The first point to note is that for any $n$, $n'\in\mathbb N$,
$$|t_n-t_{n'}|\le\frac{M|n-n'|}{1+\min(n,n')}.$$
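P P If $n = n'$ this is trivial. Suppose that $n'<n$. Then
$$|t_n-t_{n'}| = \Bigl|\sum_{k=n'+1}^{n}d_k\Bigr|\le\sum_{k=n'+1}^{n}\frac Mk\le\frac{M(n-n')}{n'+1} = \frac{M|n-n'|}{1+\min(n',n)}.$$
Of course the case $n<n'$ is identical. Q Q

(b) Now take any $n\ge j^2$. Set $\eta = \sup_{m\ge n-n/j}|\tau_m-c|$. Let $m\ge j$ be such that $jm\le n<j(m+1)$; then $n<jm+m$; also
$$n(1-\tfrac1j)\le m(j+1)(1-\tfrac1j)\le mj,$$
so that $jm\ge n-n/j$, and both $|\tau_{jm}-c|$ and $|\tau_{jm+m}-c|$ are at most $\eta$. Set
$$\tau^* = \frac1m\sum_{n'=jm+1}^{jm+m}t_{n'} = \frac{jm+m+1}{m}\tau_{jm+m}-\frac{jm+1}{m}\tau_{jm}.$$
Then
$$\begin{aligned}|\tau^*-c| &= \Bigl|\frac{jm+m+1}{m}\tau_{jm+m}-\frac{jm+1}{m}\tau_{jm}-c\Bigr| = \Bigl|\frac{jm+m+1}{m}(\tau_{jm+m}-c)-\frac{jm+1}{m}(\tau_{jm}-c)\Bigr|\\ &\le\frac{jm+m+1}{m}\eta+\frac{jm+1}{m}\eta\le(2j+3)\eta.\end{aligned}$$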
On the other hand,
$$\begin{aligned}|\tau^*-t_n| &= \Bigl|\frac1m\sum_{n'=jm+1}^{jm+m}(t_{n'}-t_n)\Bigr|\le\frac1m\sum_{n'=jm+1}^{jm+m}\frac{M|n-n'|}{1+\min(n,n')}\\ &\le\frac1m\sum_{n'=jm+1}^{jm+m}\frac{Mm}{1+jm} = \frac{Mm}{1+jm}\le\frac Mj.\end{aligned}$$
Putting these together, we have
$$|t_n-c|\le|t_n-\tau^*|+|\tau^*-c|\le\frac Mj+(2j+3)\eta = \frac Mj+(2j+3)\sup_{m\ge n-n/j}|\tau_m-c|,$$
as required.

282O Theorem Let $f$ be a complex-valued function of bounded variation, defined almost everywhere on $]-\pi,\pi]$, and let $\langle s_n\rangle_{n\in\mathbb N}$ be its sequence of Fourier sums (282Ab).

(i) If $x\in\ ]-\pi,\pi[$, then $\lim_{n\to\infty}s_n(x) = \frac12\bigl(\lim_{t\in\operatorname{dom}f,\,t\uparrow x}f(t)+\lim_{t\in\operatorname{dom}f,\,t\downarrow x}f(t)\bigr)$.

(ii) $\lim_{n\to\infty}s_n(\pi) = \frac12\bigl(\lim_{t\in\operatorname{dom}f,\,t\uparrow\pi}f(t)+\lim_{t\in\operatorname{dom}f,\,t\downarrow-\pi}f(t)\bigr)$.

(iii) If $f$ is defined throughout $]-\pi,\pi]$, is continuous, and $\lim_{t\downarrow-\pi}f(t) = f(\pi)$, then $s_n(x)\to f(x)$ uniformly on $]-\pi,\pi]$.

proof (a) Note first that 224F shows that the limits $\lim_{t\in\operatorname{dom}f,\,t\downarrow x}f(t)$, $\lim_{t\in\operatorname{dom}f,\,t\uparrow x}f(t)$ required in the formulae above always exist. We know also from 282M that $M = \sup_{k\in\mathbb Z}|kc_k|<\infty$, where $c_k$ is the $k$th Fourier coefficient of $f$. Take any $x\in\ ]-\pi,\pi]$, and set $c = \frac12\bigl(\lim_{t\in\operatorname{dom}\tilde f,\,t\uparrow x}\tilde f(t)+\lim_{t\in\operatorname{dom}\tilde f,\,t\downarrow x}\tilde f(t)\bigr)$, writing $\tilde f$ for the periodic extension of $f$, as usual. We know from 282Id-282Ie that $c = \lim_{m\to\infty}\sigma_m(x)$, writing $\sigma_m$ for the Fejér sums of $f$. Let $\epsilon>0$. Take any $j\ge\max(2,2M/\epsilon)$. Take $m_0\ge 1$ such that $|\sigma_m(x)-c|\le\epsilon/(2j+3)$ for every $m\ge m_0$. Now if $n\ge\max(j^2,2m_0)$, apply Lemma 282N with
$$d_0 = c_0,\qquad d_k = c_ke^{ikx}+c_{-k}e^{-ikx}\text{ for }k\ge 1,$$
so that $t_n = s_n(x)$, $\tau_m = \sigma_m(x)$ and $|kd_k|\le 2M$ for every $k$, $n$, $m\in\mathbb N$. We have $n-n/j\ge\frac12 n\ge m_0$, so
$$\eta = \sup_{m\ge n-n/j}|\tau_m-c|\le\sup_{m\ge m_0}|\tau_m-c|\le\frac{\epsilon}{2j+3}.$$
So 282N, applied with $2M$ in place of $M$, tells us that
$$|s_n(x)-c| = |t_n-c|\le\frac{2M}{j}+(2j+3)\sup_{m\ge n-n/j}|\tau_m-c|\le\epsilon+(2j+3)\eta\le 2\epsilon.$$
As $\epsilon$ is arbitrary, $\lim_{n\to\infty}s_n(x) = c$, as required.

(b) This proves (i) and (ii) of this theorem. Finally, for (iii), observe that under these conditions $\sigma_m(x)\to f(x)$ uniformly as $m\to\infty$, by 282G. So given $\epsilon>0$ we choose $j\ge\max(2,2M/\epsilon)$ and $m_0\in\mathbb N$ such that $|\sigma_m(x)-f(x)|\le\epsilon/(2j+3)$ for every $m\ge m_0$, $x\in\ ]-\pi,\pi]$. By the same calculation as before, $|s_n(x)-f(x)|\le 2\epsilon$ for every $n\ge\max(j^2,2m_0)$ and every $x\in\ ]-\pi,\pi]$. As $\epsilon$ is arbitrary, $\lim_{n\to\infty}s_n(x) = f(x)$ uniformly for $x\in\ ]-\pi,\pi]$.

282P Corollary Let $f$ be a complex-valued function which is integrable over $]-\pi,\pi]$, and $\langle s_n\rangle_{n\in\mathbb N}$ its sequence of Fourier sums.

(i) Suppose that $x\in\ ]-\pi,\pi[$ is such that $f$ is of bounded variation on some neighbourhood of $x$. Then
$$\lim_{n\to\infty}s_n(x) = \tfrac12\bigl(\lim_{t\in\operatorname{dom}f,\,t\uparrow x}f(t)+\lim_{t\in\operatorname{dom}f,\,t\downarrow x}f(t)\bigr).$$
(ii) If there is a $\delta>0$ such that $f$ is of bounded variation on both $]-\pi,-\pi+\delta]$ and $[\pi-\delta,\pi]$, then
$$\lim_{n\to\infty}s_n(\pi) = \tfrac12\bigl(\lim_{t\in\operatorname{dom}f,\,t\uparrow\pi}f(t)+\lim_{t\in\operatorname{dom}f,\,t\downarrow-\pi}f(t)\bigr).$$
proof In case (i), take $\delta>0$ such that $f$ is of bounded variation on $[x-\delta,x+\delta]$ and set $f_1(t) = f(t)$ if $t\in\operatorname{dom}f\cap[x-\delta,x+\delta]$, $0$ for other $t\in\ ]-\pi,\pi]$; in case (ii), set $f_1(t) = f(t)$ if $t\in\operatorname{dom}f$ and $|t|\ge\pi-\delta$, $0$ for other $t\in\ ]-\pi,\pi]$, and say that $x = \pi$. In either case, $f_1$ is of bounded variation, so by 282O the Fourier sums $\langle s'_n\rangle_{n\in\mathbb N}$ of $f_1$ converge at $x$ to the value given by the formulae above. But now observe that, writing $\tilde f$ and $\tilde f_1$ for the periodic extensions of $f$ and $f_1$, $\tilde f-\tilde f_1 = 0$ on a neighbourhood of $x$, so
$$\int_{-\pi}^{\pi}\frac{\tilde f(x+t)-\tilde f_1(x+t)}{\sin\frac12 t}\,dt$$
exists in $\mathbb C$, and by 282Fb
$$\lim_{n\to\infty}\int_{-\pi}^{\pi}\frac{\tilde f(x+t)-\tilde f_1(x+t)}{\sin\frac12 t}\sin(n+\tfrac12)t\,dt = 0,$$
that is, $\lim_{n\to\infty}(s_n(x)-s'_n(x)) = 0$. So $\langle s_n\rangle_{n\in\mathbb N}$ also converges to the right limit.

282Q I cannot leave this section without mentioning one of the most important facts about Fourier series, even though I have no space here to discuss its consequences.

Theorem Let $f$ and $g$ be complex-valued functions which are integrable over $]-\pi,\pi]$, and $\langle c_k\rangle_{k\in\mathbb Z}$, $\langle d_k\rangle_{k\in\mathbb Z}$ their Fourier coefficients. Let $f*g$ be their convolution, defined by the formula
$$(f*g)(x) = \int_{-\pi}^{\pi}f(x-_{2\pi}t)g(t)\,dt = \int_{-\pi}^{\pi}\tilde f(x-t)g(t)\,dt,$$
as in 255O, writing $\tilde f$ for the periodic extension of $f$. Then the Fourier coefficients of $f*g$ are $\langle 2\pi c_kd_k\rangle_{k\in\mathbb Z}$.

proof By 255O(d-i),
$$\begin{aligned}\frac{1}{2\pi}\int_{-\pi}^{\pi}(f*g)(x)e^{-ikx}\,dx &= \frac{1}{2\pi}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}e^{-ik(t+u)}f(t)g(u)\,dt\,du\\ &= \frac{1}{2\pi}\int_{-\pi}^{\pi}e^{-ikt}f(t)\,dt\int_{-\pi}^{\pi}e^{-iku}g(u)\,du = 2\pi c_kd_k.\end{aligned}$$
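The identity of 282Q can be checked, up to rounding, on trigonometric polynomials: for these the uniform periodic rule with $N$ points computes every integral involved without discretization error, as long as all frequencies stay below $N$ in absolute value. The sketch below is illustrative only; the two coefficient families `c`, `d` and the choice `N = 32` are hypothetical.

```python
import cmath
import math

N = 32                                  # exact for frequencies |q| < N
STEP = 2 * math.pi / N
GRID = [-math.pi + m * STEP for m in range(1, N + 1)]   # points of ]-pi, pi]

def trig_poly(coeffs):
    """x -> sum_k coeffs[k] e^{ikx}, a 2pi-periodic function on R."""
    return lambda x: sum(a * cmath.exp(1j * k * x) for k, a in coeffs.items())

def convolution(f, g):
    """(f*g)(x) = int_{-pi}^{pi} f(x - t) g(t) dt; f is already periodic here."""
    return lambda x: STEP * sum(f(x - t) * g(t) for t in GRID)

def fourier_coefficient(h, k):
    """(1/2pi) int_{-pi}^{pi} h(x) e^{-ikx} dx."""
    return STEP * sum(h(x) * cmath.exp(-1j * k * x) for x in GRID) / (2 * math.pi)

c = {0: 1.0, 1: 2 - 1j, -2: 0.5}        # hypothetical c_k, zero for other k
d = {1: 3.0, -2: 1j, 4: -1.0}           # hypothetical d_k, zero for other k
f_conv_g = convolution(trig_poly(c), trig_poly(d))
for k in range(-4, 5):
    predicted = 2 * math.pi * c.get(k, 0) * d.get(k, 0)   # coefficient from 282Q
    assert abs(fourier_coefficient(f_conv_g, k) - predicted) < 1e-9
```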
*282R In my hurry to get to the theorems on convergence of Fejér and Fourier sums, I have rather neglected the elementary manipulations which are essential when applying the theory. One basic result is the following.

Proposition (a) Let $f:[-\pi,\pi]\to\mathbb C$ be an absolutely continuous function such that $f(-\pi) = f(\pi)$, and $\langle c_k\rangle_{k\in\mathbb Z}$ its sequence of Fourier coefficients. Then the Fourier coefficients of $f'$ are $\langle ikc_k\rangle_{k\in\mathbb Z}$.

(b) Let $f:\mathbb R\to\mathbb C$ be a differentiable function such that $f(-\pi) = f(\pi)$, $f'$ is absolutely continuous on $[-\pi,\pi]$ and $f'(-\pi) = f'(\pi)$. If $\langle c_k\rangle_{k\in\mathbb Z}$ are the Fourier coefficients of $f\restriction\,]-\pi,\pi]$, then $\sum_{k=-\infty}^{\infty}|c_k|$ is finite.

proof (a) By 225Cb, $f'$ is integrable over $[-\pi,\pi]$; by 225E, $f$ is an indefinite integral of $f'$. So 225F tells us that
$$\frac{1}{2\pi}\int_{-\pi}^{\pi}f'(x)e^{-ikx}\,dx = \frac{1}{2\pi}\bigl(f(\pi)e^{-ik\pi}-f(-\pi)e^{ik\pi}\bigr)+\frac{ik}{2\pi}\int_{-\pi}^{\pi}f(x)e^{-ikx}\,dx = ikc_k$$
for every $k\in\mathbb Z$, since $e^{-ik\pi} = e^{ik\pi}$ and $f(\pi) = f(-\pi)$.

(b) By (a), applied twice, the Fourier coefficients of $f''$ are $\langle -k^2c_k\rangle_{k\in\mathbb Z}$, so $\sup_{k\in\mathbb Z}k^2|c_k|$ is finite; because $\sum_{k=1}^{\infty}\frac{1}{k^2}<\infty$, $\sum_{k=-\infty}^{\infty}|c_k|<\infty$.
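Part (a) can be illustrated with the smooth periodic function $f(x) = e^{\cos x}$, which certainly satisfies $f(-\pi) = f(\pi)$. The sketch below is illustrative only (the choice of $f$ and of `N` are assumptions): it computes Fourier coefficients of $f$ and of $f'(x) = -\sin x\,e^{\cos x}$ with the uniform periodic rule, which for an analytic periodic function of this kind is accurate to rounding error, and compares those of $f'$ with $ikc_k$.

```python
import cmath
import math

N = 64
GRID = [-math.pi + m * 2 * math.pi / N for m in range(1, N + 1)]

def coefficient(h, k):
    """(1/2pi) int_{-pi}^{pi} h(x) e^{-ikx} dx by the uniform periodic rule."""
    return sum(h(x) * cmath.exp(-1j * k * x) for x in GRID) / N

f = lambda x: math.exp(math.cos(x))
f_prime = lambda x: -math.sin(x) * math.exp(math.cos(x))

for k in range(-6, 7):
    c_k = coefficient(f, k)
    # 282Ra: the k-th Fourier coefficient of f' is i*k*c_k
    assert abs(coefficient(f_prime, k) - 1j * k * c_k) < 1e-12
```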
282X Basic exercises > (a) Suppose that $\langle c_k\rangle_{k\in\mathbb Z}$ is an absolutely summable double-ended sequence of complex numbers. Show that $f(x) = \sum_{k=-\infty}^{\infty}c_ke^{ikx}$ exists for every $x\in\mathbb R$, that $f$ is continuous and periodic, and that its Fourier coefficients are the $c_k$.

(c) Set $\phi_n(t) = \frac2t\sin(n+\frac12)t$ for $t\ne 0$. (This is sometimes called the modified Dirichlet kernel.) Show that for any integrable function $f$ on $]-\pi,\pi]$, with Fourier sums $\langle s_n\rangle_{n\in\mathbb N}$ and periodic extension $\tilde f$,
$$\lim_{n\to\infty}\Bigl|s_n(x)-\frac{1}{2\pi}\int_{-\pi}^{\pi}\phi_n(t)\tilde f(x+t)\,dt\Bigr| = 0$$
for every $x\in\ ]-\pi,\pi]$. (Hint: show that $\frac2t-\frac{1}{\sin\frac12 t}$ is bounded, and use 282E.)
(d) Give a proof of 282Ib from 242O, 255Oe and 282G.

(e) Give another proof of 282Ic, based on 242O and 281J instead of on 282H.

(f) Use the idea of 255Yi to shorten one of the steps in the proof of 282H, taking
$$g_m(t) = \min\Bigl(\frac{m+1}{2\pi},\frac{\pi}{4(m+1)t^2}\Bigr)$$
for $|t|\le\delta$, so that $g_m\ge\psi_m$ on $[-\delta,\delta]$.

> (g)(i) Let $f$ be a real square-integrable function on $]-\pi,\pi]$, and $\langle a_k\rangle_{k\in\mathbb N}$, $\langle b_k\rangle_{k\ge 1}$ its real Fourier coefficients (282Ba). Show that $\frac12a_0^2+\sum_{k=1}^{\infty}(a_k^2+b_k^2) = \frac1\pi\int_{-\pi}^{\pi}|f(x)|^2\,dx$. (ii) Show that $f^\bullet\mapsto(\sqrt{\pi/2}\,a_0,\sqrt\pi\,a_1,\sqrt\pi\,b_1,\dots)$ defines an inner-product-space isomorphism between the real Hilbert space $L^2_{\mathbb R}$ of equivalence classes of real square-integrable functions on $]-\pi,\pi]$ and the real Hilbert space $\ell^2_{\mathbb R}$ of square-summable sequences.

(h) Show that $\frac\pi4 = 1-\frac13+\frac15-\frac17+\dots$. (Hint: find the Fourier series of $f$ where $f(x) = x/|x|$, and compute the sum of the series at $\frac\pi2$. Of course there are other methods, e.g., examining the Taylor series for arctan.)

(i) Let $f$ be an integrable complex-valued function on $]-\pi,\pi]$, and $\langle s_n\rangle_{n\in\mathbb N}$ its sequence of Fourier sums. Suppose that $x\in\ ]-\pi,\pi[$, $a\in\mathbb C$ are such that $\int_{-\pi}^{\pi}\frac{f(t)-a}{t-x}\,dt$ exists and is finite. Show that $\lim_{n\to\infty}s_n(x) = a$. Explain how this generalizes 282L. What modification is appropriate to obtain a limit $\lim_{n\to\infty}s_n(\pi)$?

(j) Suppose that $\alpha>0$, $K\ge 0$ and $f:\ ]-\pi,\pi[\ \to\mathbb C$ are such that $|f(x)-f(y)|\le K|x-y|^\alpha$ for all $x$, $y\in\ ]-\pi,\pi[$. Show that the Fourier sums of $f$ converge to $f$ everywhere on $]-\pi,\pi[$. (Hint: use 282Xi.) (Compare 282Yb.)
(k) In 282L, show that it is enough if $\tilde f$ is differentiable with respect to its domain at $x$ or $\pi$ (see 262Fb), rather than differentiable in the strict sense.

(l) Show that $\lim_{a\to\infty}\int_0^a\frac{\sin t}{t}\,dt$ exists and is finite. (Hint: use 224J to estimate $\int_a^b\frac{\sin t}{t}\,dt$ for $0<a\le b$.)

(m) Show that $\int_0^\infty\bigl|\frac{\sin t}{t}\bigr|\,dt = \infty$. (Hint: show that $\sup_{a\ge 0}\bigl|\int_1^a\frac{\cos 2t}{t}\,dt\bigr|<\infty$, and therefore that $\sup_{a\ge 0}\int_1^a\frac{\sin^2 t}{t}\,dt = \infty$.)
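> (n) Let $\langle d_k\rangle_{k\in\mathbb N}$ be a sequence in $\mathbb C$ such that $\sup_{k\in\mathbb N}|kd_k|<\infty$ and
$$\lim_{m\to\infty}\frac{1}{m+1}\sum_{n=0}^{m}\sum_{k=0}^{n}d_k = c\in\mathbb C.$$
Show that $c = \sum_{k=0}^{\infty}d_k$. (Hint: 282N.)

> (o) Show that $\sum_{n=1}^{\infty}\frac{1}{n^2} = \frac{\pi^2}{6}$. (Hint: find the Fourier series of $f$ where $f(x) = |x|$, and compute the sum of the series at $0$.)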
(p) Let $f$ be an integrable complex-valued function on $]-\pi,\pi]$, and $\langle s_n\rangle_{n\in\mathbb N}$ its sequence of Fourier sums. Suppose that $x\in\ ]-\pi,\pi[$ is such that

(i) there is an $a\in\mathbb C$ such that either $\int_{-\pi}^{x}\frac{a-f(t)}{x-t}\,dt$ exists in $\mathbb C$ or there is some $\delta>0$ such that $f$ is of bounded variation on $[x-\delta,x]$, and $a = \lim_{t\in\operatorname{dom}f,\,t\uparrow x}f(t)$;

(ii) there is a $b\in\mathbb C$ such that either $\int_{x}^{\pi}\frac{f(t)-b}{t-x}\,dt$ exists in $\mathbb C$ or there is some $\delta>0$ such that $f$ is of bounded variation on $[x,x+\delta]$, and $b = \lim_{t\in\operatorname{dom}f,\,t\downarrow x}f(t)$.

Show that $\lim_{n\to\infty}s_n(x) = \frac12(a+b)$. What modification is appropriate to obtain a limit $\lim_{n\to\infty}s_n(\pi)$?

> (q) Let $f$, $g$ be integrable complex-valued functions on $]-\pi,\pi]$, and $\boldsymbol c = \langle c_k\rangle_{k\in\mathbb Z}$, $\boldsymbol d = \langle d_k\rangle_{k\in\mathbb Z}$ their sequences of Fourier coefficients. Suppose that either $\sum_{k=-\infty}^{\infty}|c_k|<\infty$ or $\sum_{k=-\infty}^{\infty}|c_k|^2+|d_k|^2<\infty$. Show that the sequence of Fourier coefficients of $f\times g$ is just the convolution $\boldsymbol c*\boldsymbol d$ of $\boldsymbol c$ and $\boldsymbol d$ (255Xe).

(r) In 282Ra, what happens if $f(\pi)\ne f(-\pi)$?

(s) Suppose that $\langle c_k\rangle_{k\in\mathbb Z}$ is a double-ended sequence of complex numbers such that $\sum_{k=-\infty}^{\infty}|kc_k|<\infty$. Show that $f(x) = \sum_{k=-\infty}^{\infty}c_ke^{ikx}$ exists for every $x\in\mathbb R$ and that $f$ is differentiable everywhere.

(t) Let $\langle c_k\rangle_{k\in\mathbb Z}$ be a double-ended sequence of complex numbers such that $\sup_{k\in\mathbb Z}|kc_k|<\infty$. Show that there is a square-integrable function $f$ on $]-\pi,\pi]$ such that the $c_k$ are the Fourier coefficients of $f$, that $f$ is the limit almost everywhere of its Fourier sums, and that $f*f*f$ is differentiable. (Hint: use 282K to show that there is an $f$, and 282Xn to show that its Fourier sums converge wherever its Fejér sums do; use 282Q and 282Xs to show that $f*f*f$ is differentiable.)

282Y Further exercises (a) Let $f$ be a non-negative integrable function on $]-\pi,\pi]$, with Fourier coefficients $\langle c_k\rangle_{k\in\mathbb Z}$. Show that
$$\sum_{j=0}^{n}\sum_{k=0}^{n}a_j\bar a_kc_{j-k}\ge 0$$
for all complex numbers $a_0,\dots,a_n$. (See also 285Xr below.)

(b) Let $f:\ ]-\pi,\pi]\to\mathbb C$, $K\ge 0$, $\alpha>0$ be such that $|f(x)-f(y)|\le K|x-y|^\alpha$ for all $x$, $y\in\ ]-\pi,\pi]$. Let $c_k$, $s_n$ be the Fourier coefficients and sums of $f$. (i) Show that $\sup_{k\in\mathbb Z}|k|^\alpha|c_k|<\infty$. (Hint: show that $c_k = \frac{1}{4\pi}\int_{-\pi}^{\pi}(f(x)-\tilde f(x+\frac\pi k))e^{-ikx}\,dx$.) (ii) Show that if $f(\pi) = \lim_{x\downarrow-\pi}f(x)$ then $s_n\to f$ uniformly. (Compare 282Xj.)
(c) Let $f$ be a measurable complex-valued function on $]-\pi,\pi]$, and suppose that $p\in[1,\infty[$ is such that $\int_{-\pi}^{\pi}|f(x)|^p\,dx<\infty$. Let $\langle\sigma_m\rangle_{m\in\mathbb N}$ be the sequence of Fejér sums of $f$. Show that $\lim_{m\to\infty}\int_{-\pi}^{\pi}|f(x)-\sigma_m(x)|^p\,dx = 0$. (Hint: use 245Xk, 255Yl and the ideas in 282Ib.)

(d) Construct a continuous function $h:[-\pi,\pi]\to\mathbb R$ such that $h(\pi) = h(-\pi)$ but the Fourier sums of $h$ are unbounded at $0$, as follows. Set
$$\alpha(m,n) = \int_0^\pi\frac{\sin(m+\frac12)t\,\sin(n+\frac12)t}{\sin\frac12 t}\,dt.$$
Show that $\lim_{n\to\infty}\alpha(m,n) = 0$ for every $m$, but $\lim_{n\to\infty}\alpha(n,n) = \infty$. Set $h_0(x) = \sum_{k=0}^{\infty}\delta_k\sin(m_k+\frac12)x$ for $0\le x\le\pi$, $0$ for $-\pi\le x\le 0$, where $\delta_k>0$, $m_k\in\mathbb N$ are such that

($\alpha$) $\delta_k\le 2^{-k}$, $\delta_k|\alpha(m_k,m_n)|\le 2^{-k}$ for every $n<k$ (choosing $\delta_k$)

($\beta$) $\delta_k\alpha(m_k,m_k)\ge k$, $\delta_n|\alpha(m_k,m_n)|\le 2^{-n}$ for every $n<k$ (choosing $m_k$).

Now modify $h_0$ on $[-\pi,0[$ by adding a function of bounded variation.

(e)(i) Show that $\lim_{n\to\infty}\int_{-\pi}^{\pi}\bigl|\frac{\sin(n+\frac12)t}{\sin\frac12 t}\bigr|\,dt = \infty$. (Hint: 282Xm.) (ii) Show that for any $\delta>0$ there are $n\in\mathbb N$, $f\ge 0$ such that $\int_{-\pi}^{\pi}f\le\delta$, $\int_{-\pi}^{\pi}|s_n|\ge 1$, where $s_n$ is the $n$th Fourier sum of $f$. (Hint: take $n$ such that $\frac{1}{2\pi}\int_{-\pi}^{\pi}\bigl|\frac{\sin(n+\frac12)t}{\sin\frac12 t}\bigr|\,dt>\frac1\delta$ and set $f(x) = \frac\delta\eta$ for $0\le x\le\eta$, $0$ otherwise, for small $\eta$.) (iii) Show that there
is an integrable function $f:\ ]-\pi,\pi]\to\mathbb R$ such that $\sup_{n\in\mathbb N}\|s_n\|_1$ is infinite, where $\langle s_n\rangle_{n\in\mathbb N}$ is the sequence of Fourier sums of $f$. (Hint: it helps to know the ‘Uniform Boundedness Theorem’ of functional analysis, but $f$ can also be constructed bare-handed by the method of 282Yd.)

282 Notes and comments This has been a long section with a potentially confusing collection of results, so perhaps I should recapitulate. Associated with any integrable function on $]-\pi,\pi]$ we have the corresponding Fourier sums, being the symmetric partial sums $\sum_{k=-n}^{n}c_ke^{ikx}$ of the complex series $\sum_{k=-\infty}^{\infty}c_ke^{ikx}$, or, equally, the partial sums $\frac12a_0+\sum_{k=1}^{n}(a_k\cos kx+b_k\sin kx)$ of the real series $\frac12a_0+\sum_{k=1}^{\infty}(a_k\cos kx+b_k\sin kx)$. The Fourier coefficients $c_k$, $a_k$, $b_k$ are the only natural ones, because if the series is to converge with any regularity at all then
$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\Bigl(\sum_{k=-\infty}^{\infty}c_ke^{ikx}\Bigr)e^{-ilx}\,dx$$
ought to be simultaneously
$$\sum_{k=-\infty}^{\infty}\frac{1}{2\pi}\int_{-\pi}^{\pi}c_ke^{ikx}e^{-ilx}\,dx = c_l$$
and
$$\frac{1}{2\pi}\int_{-\pi}^{\pi}f(x)e^{-ilx}\,dx.$$
(Compare the calculations in 282J.) The effect of taking Fejér sums $\sigma_m(x)$ rather than the Fourier sums $s_n(x)$ is to smooth the sequence out; recall that if $\lim_{n\to\infty}s_n(x) = c$ then $\lim_{m\to\infty}\sigma_m(x) = c$, by 273Ca in the last chapter.

Most of the work above is concerned with the question of when Fourier or Fejér sums converge, in some sense, to the original function $f$. As has happened before, in §245 and elsewhere, we have more than one kind of convergence to consider. Norm convergence, for $\|\ \|_1$ or $\|\ \|_2$ or $\|\ \|_\infty$, is the simplest; the three theorems 282G, 282Ib and 282J at least are relatively straightforward. (I have given 282Ib as a corollary of 282Ia; but there is an easier proof from 282G. See 282Xd.) Respectively, we have

if $f$ is continuous (and matches at $\pm\pi$, that is, $f(\pi) = \lim_{t\downarrow-\pi}f(t)$) then $\sigma_m\to f$ uniformly, that is, for $\|\ \|_\infty$ (282G);

if $f$ is any integrable function, then $\sigma_m\to f$ for $\|\ \|_1$ (282Ib);

if $f$ is a square-integrable function, then $s_n\to f$ for $\|\ \|_2$ (282J);

if $f$ is continuous and of bounded variation (and matches at $\pm\pi$), then $s_n\to f$ uniformly (282O).

There are some similar results for other $\|\ \|_p$ (282Yc); but note that the Fourier sums need not converge for $\|\ \|_1$ (282Ye).

Pointwise convergence is harder. The results I give are

if $f$ is any integrable function, then $\sigma_m\to f$ almost everywhere (282Ia);
this relies on some careful calculations in 282H, and also on the deep result 223D. Next we have the results which look at the average of the limits of $f$ from the two sides. Suppose I write
$$f^\pm(x) = \tfrac12\bigl(\lim_{t\uparrow x}f(t)+\lim_{t\downarrow x}f(t)\bigr)$$
whenever this is defined, taking $f^\pm(\pi) = \frac12\bigl(\lim_{t\uparrow\pi}f(t)+\lim_{t\downarrow-\pi}f(t)\bigr)$. Then we have

if $f$ is any integrable function, $\sigma_m\to f^\pm$ wherever $f^\pm$ is defined (282Id);

if $f$ is of bounded variation, $s_n\to f^\pm$ everywhere (282O).

Of course these apply at any point at which $f$ is continuous, in which case $f(x) = f^\pm(x)$. Yet another result of this type is

if $f$ is any integrable function, $s_n\to f$ at any point at which $f$ is differentiable (282L);

in fact, this can be usefully extended for very little extra labour (282Xi, 282Xp). I cannot leave this list without mentioning the theorem I have not given. This is Carleson’s theorem:

if $f$ is square-integrable, $s_n\to f$ almost everywhere

(Carleson 66). I will come to this in §286. There is an elementary special case in 282Xt. The result is in fact valid for many other $f$ (see the notes to §286).

The next glaring lacuna in the exposition here is the absence of any examples to show how far these results are best possible. There is no suggestion, indeed, that there are any natural necessary and sufficient conditions for $s_n\to f$ at every point. Nevertheless, we have to make an effort to find a continuous function for which this is not so, and the construction of an example by du Bois-Reymond (Bois-Reymond 1876) was an important moment in the history of analysis, not least because it forced mathematicians to realise that some comfortable assumptions about the classification of functions – essentially, that functions are either ‘good’ or so bad that one needn’t trouble with them – were false. The example is instructive but I have had to omit it for lack of space; I give an outline of a possible method in 282Yd. (You can find a detailed construction in Körner 88, chapter 18, and a proof that such a function exists in Dudley 89, 7.4.3.)
If you allow general integrable functions, then you can do much better, or perhaps I should say much worse; there is an integrable $f$ such that $\sup_{n\in\mathbb N}|s_n(x)| = \infty$ for every $x\in\ ]-\pi,\pi]$ (Kolmogorov 26; see Zygmund 59, §§VIII.3-4).

In 282C I mentioned two types of problem. The first – when is a Fourier series summable? – has at least been treated at length, even though I cannot pretend to have given more than a sample of what is known. The second – how do properties of the $c_k$ reflect properties of $f$? – I have hardly touched on. I do give what seem to me to be the three most important results in this area. The first is

if $f$ and $g$ have the same Fourier coefficients, they are equal almost everywhere (282Ic, 282Xe).

This at least tells us that we ought in principle to be able to learn almost anything about $f$ by looking at its Fourier series. (For instance, 282Ya describes a necessary and sufficient condition for $f$ to be non-negative almost everywhere.) The second is

$f$ is square-integrable iff $\sum_{k=-\infty}^{\infty}|c_k|^2<\infty$; in fact, $\sum_{k=-\infty}^{\infty}|c_k|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x)|^2\,dx$ (282J).
Of course this is fundamental, since it shows that Fourier coefficients provide a natural Hilbert space isomorphism between $L^2$ and $\ell^2$ (282K). I should perhaps remark that while the real Hilbert spaces $L^2_{\mathbb R}$, $\ell^2_{\mathbb R}$ are isomorphic as inner product spaces (282Xg), they are certainly not isomorphic as Banach lattices; for instance, $\ell^2_{\mathbb R}$ has ‘atomic’ elements $\boldsymbol c$ such that if $0\le\boldsymbol d\le\boldsymbol c$ then $\boldsymbol d$ is a multiple of $\boldsymbol c$, while $L^2_{\mathbb R}$ does not. Perhaps even more important is

the Fourier coefficients of a convolution $f*g$ are just a scalar multiple of the products of the Fourier coefficients of $f$ and $g$ (282Q);

but to use this effectively we need to study the Banach algebra structure of $L^1$, and I have no choice but to abandon this path immediately. (It will form a conspicuous part of Chapter 44 in Volume 4.) 282Xt gives an elementary consequence, and 282Xq a very partial description of the relationship between a product $f\times g$ of two functions and the convolution product of their sequences of Fourier coefficients.
I end these notes with a remark on the number $2\pi$. This enters nearly every formula involving Fourier series, but could I think be removed totally from the present section, at least, by re-normalizing the measure of $]-\pi,\pi]$. If instead of Lebesgue measure $\mu$ we took the measure $\nu = \frac{1}{2\pi}\mu$ throughout, then every $2\pi$ would disappear. (Compare the remark in 282Bb concerning the possibility of doing integrals over $S^1$.) But I think most of us would prefer to remember the location of a $2\pi$ in every formula than to deal with an unfamiliar measure.
283 Fourier transforms I

I turn now to the theory of Fourier transforms on $\mathbb R$. In the first of two sections on the subject, I present those parts of the elementary theory which can be dealt with using the methods of the previous section on Fourier series. I find no way of making sense of the theory, however, without introducing a fragment of L. Schwartz’s theory of distributions, which I present in §284. As in §282, of course, this treatment also is nothing but a start in the topic. The whole theory can also be done in $\mathbb R^r$. I leave this extension to the exercises, however, since there are few new ideas, the formulae are significantly more complicated, and I shall not, in this volume at least, have any use for the multidimensional versions of these particular theorems, though some of the same ideas will appear, in multidimensional form, in §285.

283A Definitions Let $f$ be a complex-valued function which is integrable over $\mathbb R$.
(a) The Fourier transform of $f$ is the function $\hat f:\mathbb R\to\mathbb C$ defined by setting
$$\hat f(y) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}f(x)\,dx$$
for every $y\in\mathbb R$. (Of course the integral is always defined because $x\mapsto e^{-iyx}$ is bounded and continuous, therefore measurable.)
(b) The inverse Fourier transform of $f$ is the function $\check f:\mathbb R\to\mathbb C$ defined by setting
$$\check f(y) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{iyx}f(x)\,dx$$
for every $y\in\mathbb R$.

283B Remarks (a) It is a mildly vexing feature of the theory of Fourier transforms – vexing, that is, for outsiders like myself – that there is in fact no standard definition of ‘Fourier transform’. The commonest definitions are, I think,
$$\hat f(y) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{\mp iyx}f(x)\,dx,\qquad\hat f(y) = \int_{-\infty}^{\infty}e^{\mp iyx}f(x)\,dx,\qquad\hat f(y) = \int_{-\infty}^{\infty}e^{\mp 2\pi iyx}f(x)\,dx,$$
corresponding to inverse transforms
$$\check f(y) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{\pm iyx}f(x)\,dx,\qquad\check f(y) = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{\pm iyx}f(x)\,dx,\qquad\check f(y) = \int_{-\infty}^{\infty}e^{\pm 2\pi iyx}f(x)\,dx.$$
I leave it to you to check that the whole theory can be carried through with any of these six pairs, and to investigate other possibilities (see 283Xa-283Xb below).
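As a check on the normalizations, the Gaussian $e^{-x^2/2}$ is its own Fourier transform under the convention of 283A, while $e^{-\pi x^2}$ is its own transform under the third convention listed in (a); these are standard facts. The sketch below is illustrative only (the truncation `L`, the panel count and the test points are assumptions): it approximates the integrals by the trapezoid rule on $[-L,L]$, which is extremely accurate for smooth, rapidly decreasing integrands of this kind.

```python
import cmath
import math

def trapezoid(func, a, b, n):
    """Composite trapezoid rule for func on [a, b] with n panels."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for m in range(1, n):
        total += func(a + m * h)
    return total * h

def transform_283A(f, y, L=12.0, n=4800):
    """(1/sqrt(2pi)) int e^{-iyx} f(x) dx, truncated to [-L, L]."""
    integrand = lambda x: cmath.exp(-1j * y * x) * f(x)
    return trapezoid(integrand, -L, L, n) / math.sqrt(2 * math.pi)

def transform_2pi(f, y, L=12.0, n=4800):
    """int e^{-2 pi i y x} f(x) dx, truncated to [-L, L]."""
    integrand = lambda x: cmath.exp(-2j * math.pi * y * x) * f(x)
    return trapezoid(integrand, -L, L, n)

gauss = lambda x: math.exp(-x * x / 2)
gauss_pi = lambda x: math.exp(-math.pi * x * x)
for y in (0.0, 0.7, 2.5):
    assert abs(transform_283A(gauss, y) - gauss(y)) < 1e-9
    assert abs(transform_2pi(gauss_pi, y) - gauss_pi(y)) < 1e-9
```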
(b) The phrases ‘Fourier transform’, ‘inverse Fourier transform’ make it plain that $(\hat f)^\vee$ is supposed to be $f$, at least some of the time. This is indeed the case, but the class of $f$ for which this is true in the literal sense is somewhat constrained, and we shall have to wait a little while before investigating it.

(c) No amount of juggling with constants, in the manner of (a) above, can make $\hat f$ and $\check f$ quite the same. However, on the definitions I have chosen, we do have $\check f(y) = \hat f(-y)$ for every $y$, so that $\hat f$ and $\check f$ will share essentially all the properties of interest to us here; in particular, everything in the next proposition will be valid with $\vee$ in place of $\wedge$, if you change signs at the right points in parts (c), (d), (h) and (i).

283C Proposition Let $f$ and $g$ be complex-valued functions which are integrable over $\mathbb R$.

(a) $(f+g)^\wedge = \hat f+\hat g$.

(b) $(cf)^\wedge = c\hat f$ for every $c\in\mathbb C$.

(c) If $c\in\mathbb R$ and $h(x) = f(x+c)$ whenever this is defined, then $\hat h(y) = e^{icy}\hat f(y)$ for every $y\in\mathbb R$.

(d) If $c\in\mathbb R$ and $h(x) = e^{icx}f(x)$ for every $x\in\operatorname{dom}f$, then $\hat h(y) = \hat f(y-c)$ for every $y\in\mathbb R$.

(e) If $c>0$ and $h(x) = f(cx)$ whenever this is defined, then $\hat h(y) = \frac1c\hat f(\frac yc)$ for every $y\in\mathbb R$.

(f) $\hat f:\mathbb R\to\mathbb C$ is continuous.

(g) $\lim_{y\to\infty}\hat f(y) = \lim_{y\to-\infty}\hat f(y) = 0$.

(h) If $\int_{-\infty}^{\infty}|xf(x)|\,dx<\infty$, then $\hat f$ is differentiable, and its derivative is
$$\hat f'(y) = -\frac{i}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}xf(x)\,dx$$
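for every $y\in\mathbb R$.

(i) If $f$ is absolutely continuous on every bounded interval and $f'$ is integrable, then $(f')^\wedge(y) = iy\hat f(y)$ for every $y\in\mathbb R$.

proof (a) and (b) are trivial, and (c), (d) and (e) are elementary substitutions.

(f) If $\langle y_n\rangle_{n\in\mathbb N}$ is any convergent sequence in $\mathbb R$ with limit $y$, then
$$\hat f(y) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\lim_{n\to\infty}e^{-iy_nx}f(x)\,dx = \lim_{n\to\infty}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iy_nx}f(x)\,dx = \lim_{n\to\infty}\hat f(y_n)$$
by Lebesgue’s Dominated Convergence Theorem, because $|e^{-iy_nx}f(x)|\le|f(x)|$ for every $n\in\mathbb N$, $x\in\operatorname{dom}f$. As $\langle y_n\rangle_{n\in\mathbb N}$ is arbitrary, $\hat f$ is continuous.

(g) This is just the Riemann-Lebesgue lemma (282E).

(h) The point is that $\bigl|\frac{\partial}{\partial y}e^{-iyx}f(x)\bigr| = |xf(x)|$ for every $x\in\operatorname{dom}f$, $y\in\mathbb R$. So by 123D
$$\begin{aligned}\hat f'(y) &= \frac{1}{\sqrt{2\pi}}\frac{d}{dy}\int_{-\infty}^{\infty}e^{-iyx}f(x)\,dx = \frac{1}{\sqrt{2\pi}}\frac{d}{dy}\int_{\operatorname{dom}f}e^{-iyx}f(x)\,dx\\ &= \frac{1}{\sqrt{2\pi}}\int_{\operatorname{dom}f}\frac{\partial}{\partial y}e^{-iyx}f(x)\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}-ixe^{-iyx}f(x)\,dx\\ &= -\frac{i}{\sqrt{2\pi}}\int_{-\infty}^{\infty}xe^{-iyx}f(x)\,dx.\end{aligned}$$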
(i) Because $f$ is absolutely continuous on every bounded interval,
$$f(x) = f(0)+\int_0^xf'\text{ for }x\ge 0,\qquad f(x) = f(0)-\int_x^0f'\text{ for }x\le 0.$$
Because $f'$ is integrable,
$$\lim_{x\to\infty}f(x) = f(0)+\int_0^\infty f',\qquad\lim_{x\to-\infty}f(x) = f(0)-\int_{-\infty}^0f'$$
both exist. Because $f$ also is integrable, both limits must be zero. Now we can integrate by parts (225F) to see that
$$\begin{aligned}(f')^\wedge(y) &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}f'(x)\,dx = \frac{1}{\sqrt{2\pi}}\lim_{a\to\infty}\int_{-a}^{a}e^{-iyx}f'(x)\,dx\\ &= \frac{1}{\sqrt{2\pi}}\Bigl(\lim_{a\to\infty}e^{-iya}f(a)-\lim_{a\to-\infty}e^{-iya}f(a)+iy\int_{-\infty}^{\infty}e^{-iyx}f(x)\,dx\Bigr)\\ &= iy\hat f(y).\end{aligned}$$

283D Lemma (a) $\lim_{a\to\infty}\int_0^a\frac{\sin x}{x}\,dx = \frac\pi2$, $\lim_{a\to\infty}\int_{-a}^{a}\frac{\sin x}{x}\,dx = \pi$.

(b) There is a $K<\infty$ such that $\bigl|\int_a^b\frac{\sin cx}{x}\,dx\bigr|\le K$ whenever $a\le b$ and $c\in\mathbb R$.
proof (a)(i) Set
$$F(a) = \int_0^a\frac{\sin x}{x}\,dx\text{ if }a\ge 0,\qquad F(a) = -\int_a^0\frac{\sin x}{x}\,dx\text{ if }a\le 0,$$
so that $F(a) = -F(-a)$ and $\int_a^b\frac{\sin x}{x}\,dx = F(b)-F(a)$ for all $a\le b$. If $0<a\le b$, then by 224J
$$\Bigl|\int_a^b\frac{\sin x}{x}\,dx\Bigr|\le\Bigl(\frac1b+\bigl(\frac1a-\frac1b\bigr)\Bigr)\sup_{c\in[a,b]}\Bigl|\int_a^c\sin x\,dx\Bigr|\le\frac1a\sup_{c\in[a,b]}|\cos c-\cos a|\le\frac2a.$$
In particular, $|F(n)-F(m)|\le\frac2m$ if $1\le m\le n$ in $\mathbb N$, and $\langle F(n)\rangle_{n\in\mathbb N}$ is a Cauchy sequence with limit $\gamma$ say; now
$$|\gamma-F(a)| = \lim_{n\to\infty}|F(n)-F(a)|\le\frac2a$$
for every $a>0$, so $\lim_{a\to\infty}F(a) = \gamma$. Of course we also have
$$\lim_{a\to\infty}\int_{-a}^{a}\frac{\sin x}{x}\,dx = \lim_{a\to\infty}(F(a)-F(-a)) = \lim_{a\to\infty}2F(a) = 2\gamma.$$
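(ii) So now I have to calculate $\gamma$. For this, observe first that
$$2\gamma = \lim_{a\to\infty}\int_{-\pi a}^{\pi a}\frac{\sin x}{x}\,dx = \lim_{a\to\infty}\int_{-\pi}^{\pi}\frac{\sin at}{t}\,dt$$
(substituting $x = at$). Next,
$$\lim_{t\to 0}\Bigl(\frac1t-\frac{1}{2\sin\frac12 t}\Bigr) = \lim_{u\to 0}\frac{\sin u-u}{2u\sin u} = 0,$$
so
$$\int_{-\pi}^{\pi}\Bigl|\frac1t-\frac{1}{2\sin\frac12 t}\Bigr|\,dt<\infty,$$
and by the Riemann-Lebesgue lemma (282Fb)
$$\lim_{a\to\infty}\int_{-\pi}^{\pi}\Bigl(\frac1t-\frac{1}{2\sin\frac12 t}\Bigr)\sin at\,dt = 0.$$
But we know that
$$\int_{-\pi}^{\pi}\frac{\sin(n+\frac12)t}{2\sin\frac12 t}\,dt = \pi$$
for every $n$ (using 282Dc), so we must have
$$\begin{aligned}\lim_{a\to\infty}\int_{-a}^{a}\frac{\sin t}{t}\,dt &= \lim_{a\to\infty}\int_{-\pi}^{\pi}\frac{\sin at}{t}\,dt = \lim_{a\to\infty}\int_{-\pi}^{\pi}\frac{\sin at}{2\sin\frac12 t}\,dt\\ &= \lim_{n\to\infty}\int_{-\pi}^{\pi}\frac{\sin(n+\frac12)t}{2\sin\frac12 t}\,dt = \pi,\end{aligned}$$
and $\gamma = \pi/2$, as claimed.

(b) Because $F$ is continuous and
$$\lim_{a\to\infty}F(a) = \gamma = \frac\pi2,\qquad\lim_{a\to-\infty}F(a) = -\gamma = -\frac\pi2,$$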
F is bounded; say |F (a)| ≤ K1 for all a ∈ R. Try K = 2K1 . Now suppose that a < b and c ∈ R. If c > 0, then |
Rb a
sin cx dx| x
=|
R bc ac
sin t dt| t
= |F (bc) − F (ac)| ≤ 2K1 = K,
substituting x = t/c. If c < 0, then |
Rb
sin cx dx| a x
while if c = 0 then |
Rb a
=|−
Rb a
sin cx dx| x
sin(−c)x dx| x
≤ K;
= 0 ≤ K.
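As an informal numerical sanity check on 283D (not part of the argument above), the Python sketch below approximates $F(a) = \int_0^a\frac{\sin x}{x}dx$ by the composite Simpson rule and compares it with $\frac\pi2$ against the bound $|F(a)-\gamma|\le\frac2a$ obtained in part (a)(i). The helper names `simpson`, `sinc` and `F` are our own, and the number of subintervals is an arbitrary choice that is ample for the ranges used.

```python
import math


def simpson(f, lo, hi, n=20000):
    """Composite Simpson rule for f on [lo, hi] with n subintervals (n forced even)."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3


def sinc(x):
    """sin(x)/x, with the removable singularity at 0 filled in."""
    return 1.0 if x == 0 else math.sin(x) / x


def F(a, n=20000):
    """F(a) = integral of sin(x)/x over [0, a] for a >= 0, as in 283D."""
    return simpson(sinc, 0.0, a, n)


for a in (10.0, 50.0, 200.0):
    gap = abs(F(a) - math.pi / 2)
    print(f"a={a:6.1f}  F(a)={F(a):.6f}  |F(a)-pi/2|={gap:.2e}  bound 2/a={2 / a:.2e}")
```

The observed gaps behave roughly like $|\cos a|/a$, comfortably inside the $2/a$ bound.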
283E The hardest work of this section will lie in the ‘pointwise inversion theorems’ 283I and 283L below. I begin however with a relatively easy, and at least equally important, result, showing (among other things) that an integrable function $f$ can (essentially) be recovered from its Fourier transform.
Lemma Whenever $c<d$ in $\mathbb R$,
$$\lim_{a\to\infty}\int_{-a}^a e^{-iyx}\frac{e^{idy}-e^{icy}}{y}dy = \begin{cases}2\pi i&\text{if }c<x<d,\\ \pi i&\text{if }x = c\text{ or }x = d,\\ 0&\text{if }x<c\text{ or }x>d.\end{cases}$$
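proof We know that for any $b>0$
$$\lim_{a\to\infty}\int_{-a}^a\frac{\sin bx}{x}dx = \lim_{a\to\infty}\int_{-ab}^{ab}\frac{\sin t}{t}dt = \pi$$
(substituting $x = t/b$), and therefore that for any $b<0$
$$\lim_{a\to\infty}\int_{-a}^a\frac{\sin bx}{x}dx = -\lim_{a\to\infty}\int_{-a}^a\frac{\sin(-b)x}{x}dx = -\pi.$$
Now consider, for $x\in\mathbb R$, $\lim_{a\to\infty}\int_{-a}^a e^{-iyx}\frac{e^{idy}-e^{icy}}{y}dy$. First note that all the integrals $\int_{-a}^a e^{-iyx}\frac{e^{idy}-e^{icy}}{y}dy$ exist, because $\lim_{y\to0}\frac{e^{idy}-e^{icy}}{y} = i(d-c)$ is finite, and the integrand is certainly continuous except at 0. Now we have
$$\begin{aligned}\int_{-a}^a e^{-iyx}\frac{e^{idy}-e^{icy}}{y}dy &= \int_{-a}^a\frac{e^{i(d-x)y}-e^{i(c-x)y}}{y}dy\\ &= \int_{-a}^a\frac{\cos(d-x)y-\cos(c-x)y}{y}dy + i\int_{-a}^a\frac{\sin(d-x)y-\sin(c-x)y}{y}dy\\ &= i\int_{-a}^a\frac{\sin(d-x)y-\sin(c-x)y}{y}dy\end{aligned}$$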
Fourier analysis
because cos is an even function, so
$$\int_{-a}^a\frac{\cos(d-x)y-\cos(c-x)y}{y}dy = 0$$
for every $a\ge0$. (Once again, this integral exists because $\lim_{y\to0}\frac{\cos(d-x)y-\cos(c-x)y}{y} = 0$.)
Accordingly
$$\lim_{a\to\infty}\int_{-a}^a e^{-iyx}\frac{e^{idy}-e^{icy}}{y}dy = i\lim_{a\to\infty}\int_{-a}^a\frac{\sin(d-x)y}{y}dy - i\lim_{a\to\infty}\int_{-a}^a\frac{\sin(c-x)y}{y}dy$$
$$\begin{aligned}&= i\pi-i\pi = 0&&\text{if }x<c,\\ &= i\pi-0 = \pi i&&\text{if }x = c,\\ &= i\pi+i\pi = 2\pi i&&\text{if }c<x<d,\\ &= 0+i\pi = \pi i&&\text{if }x = d,\\ &= -i\pi+i\pi = 0&&\text{if }x>d.\end{aligned}$$
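The three cases of 283E can be watched emerging numerically. The sketch below (our own illustration; the function name `truncated_limit` and all parameters are hypothetical choices) evaluates the integral over $[-a,a]$ for $c = 0$, $d = 1$ and a large but finite $a$, using Simpson's rule with complex arithmetic. Truncating at finite $a$ is the only real approximation; by 283D the error decays roughly like $1/a$, so the tolerance in the checks is deliberately loose.

```python
import cmath
import math


def simpson(f, lo, hi, n):
    """Composite Simpson rule; works for complex-valued f as well."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3


def truncated_limit(x, c, d, a, n=40000):
    """Integral over [-a, a] of e^{-iyx} (e^{idy} - e^{icy}) / y dy, the quantity in 283E."""
    def integrand(y):
        if abs(y) < 1e-12:
            return 1j * (d - c)  # limit of the integrand as y -> 0
        return cmath.exp(-1j * y * x) * (cmath.exp(1j * d * y) - cmath.exp(1j * c * y)) / y
    return simpson(integrand, -a, a, n)


c, d, a = 0.0, 1.0, 400.0
for x in (-1.0, 0.0, 0.5, 1.0, 2.0):
    value = truncated_limit(x, c, d, a)
    print(f"x={x:5.2f}  value/(pi*i) = {(value / (math.pi * 1j)).real:.4f}")
```

Dividing by $\pi i$ makes the pattern $0, 1, 2, 1, 0$ predicted by the lemma easy to read off.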
283F Theorem Let $f$ be a complex-valued function which is integrable over $\mathbb R$, and $\hat f$ its Fourier transform. Then whenever $c\le d$ in $\mathbb R$,
$$\int_c^d f(x)dx = \frac{i}{\sqrt{2\pi}}\lim_{a\to\infty}\int_{-a}^a\frac{e^{icy}-e^{idy}}{y}\hat f(y)dy.$$
proof If $c = d$ this is trivial; let us suppose that $c<d$.
(a) Writing
$$\theta_a(x) = \int_{-a}^a e^{-iyx}\frac{e^{idy}-e^{icy}}{y}dy$$
for $x\in\mathbb R$, $a\ge0$, 283E tells us that $\lim_{a\to\infty}\theta_a(x) = 2\pi i\theta(x)$, where $\theta = \frac12(\chi[c,d]+\chi\,]c,d[)$ takes the value 1 inside the interval $[c,d]$, 0 outside and the value $\frac12$ at the endpoints. At the same time,
$$|\theta_a(x)| = \Bigl|\int_{-a}^a\frac{\sin(d-x)y-\sin(c-x)y}{y}dy\Bigr| \le \Bigl|\int_{-a}^a\frac{\sin(d-x)y}{y}dy\Bigr| + \Bigl|\int_{-a}^a\frac{\sin(c-x)y}{y}dy\Bigr| \le 2K$$
for all $a\ge0$, $x\in\mathbb R$, where $K$ is the constant of 283Db. Consequently $|f\times\theta_a|\le2K|f|$ everywhere on $\mathrm{dom}\,f$, for every $a\ge0$, and (applying Lebesgue’s Dominated Convergence Theorem to sequences $\langle f\times\theta_{a_n}\rangle_{n\in\mathbb N}$, where $a_n\to\infty$)
$$\lim_{a\to\infty}\int f\times\theta_a = 2\pi i\int f\times\theta = 2\pi i\int_c^d f.$$
(b) Now consider the limit in the statement of the theorem. We have
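$$\begin{aligned}\int_{-a}^a\frac{e^{icy}-e^{idy}}{y}\hat f(y)dy &= \frac1{\sqrt{2\pi}}\int_{-a}^a\int_{-\infty}^\infty\frac{e^{icy}-e^{idy}}{y}e^{-iyx}f(x)dxdy\\ &= \frac1{\sqrt{2\pi}}\int_{-\infty}^\infty\int_{-a}^a\frac{e^{icy}-e^{idy}}{y}e^{-iyx}f(x)dydx\\ &= -\frac1{\sqrt{2\pi}}\int_{-\infty}^\infty f(x)\theta_a(x)dx,\end{aligned}$$
by Fubini’s and Tonelli’s theorems (252H), using the fact that $(e^{icy}-e^{idy})/y$ is bounded to see that
$$\int_{-\infty}^\infty\int_{-a}^a\Bigl|\frac{e^{icy}-e^{idy}}{y}e^{-iyx}f(x)\Bigr|dydx$$
is finite. Accordingly
$$\frac{i}{\sqrt{2\pi}}\lim_{a\to\infty}\int_{-a}^a\frac{e^{icy}-e^{idy}}{y}\hat f(y)dy = -\frac{i}{2\pi}\lim_{a\to\infty}\int_{-\infty}^\infty f(x)\theta_a(x)dx = -\frac{i}{2\pi}\cdot2\pi i\int_c^d f(x)dx = \int_c^d f(x)dx,$$
as required.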
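Here is a numerical illustration of 283F. It assumes the fact (proved below in 283N) that the standard Gaussian $\psi_1(x) = \frac1{\sqrt{2\pi}}e^{-x^2/2}$ is its own Fourier transform. The right-hand side of the theorem, truncated at a finite $a$, is compared with $\int_c^d\psi_1$ computed from the error function. Because $\hat\psi_1$ decays so fast, a modest cut-off $a = 12$ is enough. The function names are ours.

```python
import cmath
import math


def simpson(f, lo, hi, n):
    """Composite Simpson rule; works for complex-valued f as well."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3


def psi1(y):
    """Standard Gaussian density; by 283N it equals its own Fourier transform."""
    return math.exp(-y * y / 2) / math.sqrt(2 * math.pi)


def interval_integral_from_transform(c, d, fhat, a=12.0, n=4000):
    """(i / sqrt(2 pi)) * integral over [-a, a] of (e^{icy} - e^{idy}) / y * fhat(y) dy."""
    def integrand(y):
        if abs(y) < 1e-12:
            return 1j * (c - d) * fhat(0.0)  # limit of (e^{icy} - e^{idy}) / y is i(c - d)
        return (cmath.exp(1j * c * y) - cmath.exp(1j * d * y)) / y * fhat(y)
    return 1j / math.sqrt(2 * math.pi) * simpson(integrand, -a, a, n)


c, d = -1.0, 2.0
recovered = interval_integral_from_transform(c, d, psi1)
exact = 0.5 * (math.erf(d / math.sqrt(2)) - math.erf(c / math.sqrt(2)))
print(recovered, exact)
```

The recovered value comes out essentially real, as it must, since $\int_c^d\psi_1$ is real.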
283G Corollary If $f$ and $g$ are complex-valued functions which are integrable over $\mathbb R$, then $\hat f = \hat g$ iff $f =_{\text{a.e.}} g$.
proof If $f =_{\text{a.e.}} g$ then of course
$$\hat f(y) = \frac1{\sqrt{2\pi}}\int_{-\infty}^\infty e^{-iyx}f(x)dx = \frac1{\sqrt{2\pi}}\int_{-\infty}^\infty e^{-iyx}g(x)dx = \hat g(y)$$
for every $y\in\mathbb R$. Conversely, if $\hat f = \hat g$, then by the last theorem $\int_c^d f = \int_c^d g$ for all $c\le d$, so $f = g$ almost everywhere, by 222D.
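283H Lemma Let $f$ be a complex-valued function which is integrable over $\mathbb R$, and $\hat f$ its Fourier transform. Then for any $a>0$, $x\in\mathbb R$,
$$\frac1{\sqrt{2\pi}}\int_{-a}^a e^{ixy}\hat f(y)dy = \frac1\pi\int_{-\infty}^\infty\frac{\sin a(x-t)}{x-t}f(t)dt = \frac1\pi\int_{-\infty}^\infty\frac{\sin at}{t}f(x-t)dt.$$
proof We have
$$\int_{-a}^a\int_{-\infty}^\infty|e^{ixy}e^{-iyt}f(t)|dtdy\le2a\int_{-\infty}^\infty|f(t)|dt<\infty,$$
so (because the function $(t,y)\mapsto e^{ixy}e^{-iyt}f(t)$ is surely jointly measurable) we may reverse the order of integration, and get
$$\begin{aligned}\frac1{\sqrt{2\pi}}\int_{-a}^a e^{ixy}\hat f(y)dy &= \frac1{2\pi}\int_{-a}^a\int_{-\infty}^\infty e^{ixy}e^{-iyt}f(t)dtdy = \frac1{2\pi}\int_{-\infty}^\infty f(t)\int_{-a}^a e^{i(x-t)y}dydt\\ &= \frac1{2\pi}\int_{-\infty}^\infty\frac{2\sin(x-t)a}{x-t}f(t)dt = \frac1\pi\int_{-\infty}^\infty\frac{\sin au}{u}f(x-u)du,\end{aligned}$$
substituting $t = x-u$.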
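The identity of 283H can be spot-checked numerically. The sketch below takes $f = \chi[0,1]$, whose Fourier transform is $\hat f(y) = \frac{1-e^{-iy}}{iy\sqrt{2\pi}}$ (with $\hat f(0) = \frac1{\sqrt{2\pi}}$). It compares the left-hand side, computed from $\hat f$, with the right-hand side, which for this $f$ reduces to $\frac1\pi\int_{x-1}^{x}\frac{\sin at}{t}dt$. The choice of $f$, the value $a = 5$ and the quadrature settings are our own illustrative assumptions.

```python
import cmath
import math


def simpson(f, lo, hi, n):
    """Composite Simpson rule; works for complex-valued f as well."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3


def fhat(y):
    """Fourier transform of the indicator of [0, 1] (normalisation 1/sqrt(2 pi))."""
    if y == 0:
        return 1 / math.sqrt(2 * math.pi)
    return (1 - cmath.exp(-1j * y)) / (1j * y * math.sqrt(2 * math.pi))


def lhs(x, a, n=4000):
    """(1/sqrt(2 pi)) * integral over [-a, a] of e^{ixy} fhat(y) dy."""
    return simpson(lambda y: cmath.exp(1j * x * y) * fhat(y), -a, a, n) / math.sqrt(2 * math.pi)


def rhs(x, a, n=4000):
    """(1/pi) * integral of sin(at)/t * f(x - t) dt; f(x - t) = 1 exactly when x - 1 <= t <= x."""
    def kernel(t):
        return a if t == 0 else math.sin(a * t) / t
    return simpson(kernel, x - 1.0, x, n) / math.pi


for x in (0.3, 1.7):
    print(x, lhs(x, 5.0), rhs(x, 5.0))
```

Since $f$ is real, the left-hand side is real up to rounding, matching the real right-hand side.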
283I Theorem Let $f$ be a complex-valued function which is integrable over $\mathbb R$, and suppose that $f$ is differentiable at $x\in\mathbb R$. Then
$$f(x) = \frac1{\sqrt{2\pi}}\lim_{a\to\infty}\int_{-a}^a e^{ixy}\hat f(y)dy = \frac1{\sqrt{2\pi}}\lim_{a\to\infty}\int_{-a}^a e^{-ixy}\check f(y)dy.$$
proof Set $g(u) = f(x)$ if $|u|\le1$, 0 otherwise, and observe that $\lim_{u\to0}\frac1u(f(x-u)-g(u)) = -f'(x)$ is finite, so that there is a $\delta\in\,]0,1]$ such that
$$K = \sup_{0<|u|\le\delta}\Bigl|\frac{f(x-u)-g(u)}{u}\Bigr|<\infty.$$
… $>0$. The hypothesis is that there is some $\delta>0$ such that $\mathrm{Var}_{[x-\delta,x+\delta]}(f)<\infty$. Consequently $\lim_{\eta\downarrow0}\mathrm{Var}_{]x,x+\eta]}(f) = \lim_{\eta\downarrow0}\mathrm{Var}_{[x-\eta,x[}(f) = 0$ (224E). There is therefore an $\eta>0$ such that $\max(\mathrm{Var}_{[x-\eta,x[}(f),\mathrm{Var}_{]x,x+\eta]}(f))\le\epsilon$. Of course $|f(t)-f(u)|\le\mathrm{Var}_{[x-\eta,x[}(f)\le\epsilon$ whenever $t,u\in\mathrm{dom}\,f$ and $x-\eta\le t\le u<x$, so we shall have $|f(t)-a|\le\epsilon$ for every $t\in\mathrm{dom}\,f\cap[x-\eta,x[$, and similarly $|f(t)-b|\le\epsilon$ whenever $t\in\mathrm{dom}\,f\cap\,]x,x+\eta]$.
(c) Now set
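$$g_1(t) = f(t)\text{ when }t\in\mathrm{dom}\,f\text{ and }|x-t|>\eta,\quad0\text{ otherwise},$$
$$g_2(t) = a\text{ when }x-\eta\le t<x,\quad b\text{ when }x<t\le x+\eta,\quad0\text{ otherwise},$$
$$g_3 = f-g_1-g_2.$$
Then $f = g_1+g_2+g_3$; each $g_j$ is integrable; $g_1$ is zero on a neighbourhood of $x$;
$$\sup_{t\in\mathrm{dom}\,g_3,\,t\ne x}|g_3(t)|\le\epsilon,\qquad\mathrm{Var}_{[x-\eta,x[}(g_3)\le\epsilon,\qquad\mathrm{Var}_{]x,x+\eta]}(g_3)\le\epsilon.$$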
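(d) Consider the three parts $g_1$, $g_2$, $g_3$ separately.
(i) For the first, we have
$$\lim_{\gamma\to\infty}\frac1{\sqrt{2\pi}}\int_{-\gamma}^\gamma e^{ixy}\hat g_1(y)dy = 0$$
by 283I.
(ii) Next,
$$\begin{aligned}\frac1{\sqrt{2\pi}}\int_{-\gamma}^\gamma e^{ixy}\hat g_2(y)dy &= \frac1\pi\int_{-\infty}^\infty\frac{\sin(x-t)\gamma}{x-t}g_2(t)dt\qquad\text{(by 283H)}\\ &= \frac a\pi\int_{x-\eta}^x\frac{\sin(x-t)\gamma}{x-t}dt+\frac b\pi\int_x^{x+\eta}\frac{\sin(x-t)\gamma}{x-t}dt\\ &= \frac a\pi\int_0^{\gamma\eta}\frac{\sin u}{u}du+\frac b\pi\int_0^{\gamma\eta}\frac{\sin u}{u}du\end{aligned}$$
(substituting $t = x-\frac1\gamma u$ in the first integral, $t = x+\frac1\gamma u$ in the second)
$$\to\frac{a+b}2\quad\text{as }\gamma\to\infty$$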
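by 283Da.
(iii) As for the third, we have, for any $\gamma>0$,
$$\begin{aligned}\Bigl|\frac1{\sqrt{2\pi}}\int_{-\gamma}^\gamma e^{ixy}\hat g_3(y)dy\Bigr| &= \frac1\pi\Bigl|\int_{-\infty}^\infty\frac{\sin(x-t)\gamma}{x-t}g_3(t)dt\Bigr| = \frac1\pi\Bigl|\int_{-\infty}^\infty\frac{\sin t\gamma}{t}g_3(x-t)dt\Bigr|\\ &\le\frac1\pi\Bigl|\int_{-\eta}^0\frac{\sin t\gamma}{t}g_3(x-t)dt\Bigr|+\frac1\pi\Bigl|\int_0^\eta\frac{\sin t\gamma}{t}g_3(x-t)dt\Bigr|\\ &\le\frac K\pi\Bigl(\sup_{t\in\mathrm{dom}\,g_3\cap\,]x-\eta,x[}|g_3(t)|+\mathrm{Var}_{]x-\eta,x[}(g_3)\Bigr)\\ &\quad+\frac K\pi\Bigl(\sup_{t\in\mathrm{dom}\,g_3\cap\,]x,x+\eta[}|g_3(t)|+\mathrm{Var}_{]x,x+\eta[}(g_3)\Bigr)\\ &\le\frac{4K\epsilon}\pi,\end{aligned}$$
using 224J to bound the integrals in terms of the variation and supremum of $g_3$ and integrals of $\frac{\sin\gamma t}{t}$ over subintervals.
(e) We therefore have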
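$$\begin{aligned}\limsup_{\gamma\to\infty}\Bigl|\frac1{\sqrt{2\pi}}\int_{-\gamma}^\gamma e^{ixy}\hat f(y)dy-\frac{a+b}2\Bigr| &\le\limsup_{\gamma\to\infty}\frac1{\sqrt{2\pi}}\Bigl|\int_{-\gamma}^\gamma e^{ixy}\hat g_1(y)dy\Bigr|\\ &\quad+\limsup_{\gamma\to\infty}\Bigl|\frac1{\sqrt{2\pi}}\int_{-\gamma}^\gamma e^{ixy}\hat g_2(y)dy-\frac{a+b}2\Bigr|\\ &\quad+\limsup_{\gamma\to\infty}\frac1{\sqrt{2\pi}}\Bigl|\int_{-\gamma}^\gamma e^{ixy}\hat g_3(y)dy\Bigr|\\ &\le0+0+\frac{4K\epsilon}\pi\end{aligned}$$
by the calculations in (d). As $\epsilon$ is arbitrary,
$$\lim_{\gamma\to\infty}\frac1{\sqrt{2\pi}}\int_{-\gamma}^\gamma e^{ixy}\hat f(y)dy = \frac{a+b}2.$$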
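(f) This is the first half of the theorem. But of course the second half follows at once, because
$$\frac1{\sqrt{2\pi}}\lim_{\gamma\to\infty}\int_{-\gamma}^\gamma e^{-ixy}\check f(y)dy = \frac1{\sqrt{2\pi}}\lim_{\gamma\to\infty}\int_{-\gamma}^\gamma e^{-ixy}\hat f(-y)dy = \frac1{\sqrt{2\pi}}\lim_{\gamma\to\infty}\int_{-\gamma}^\gamma e^{ixy}\hat f(y)dy = \frac{a+b}2.$$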
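283L predicts that at a jump the symmetric inversion integrals converge to the mid-point of the one-sided limits. For $f = \chi[0,1]$ the one-sided limits at $x = 0$ are $0$ and $1$, so the limit there should be $\frac12$; the same holds at $x = 1$. The sketch below evaluates the inversion integral through the formula of 283H, $\frac1\pi\int_{x-1}^{x}\frac{\sin\gamma u}{u}du$, for a large $\gamma$. The example function, the value $\gamma = 400$ and the name `partial_inversion` are our own choices.

```python
import math


def simpson(f, lo, hi, n):
    """Composite Simpson rule for f on [lo, hi] with n subintervals (n forced even)."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3


def partial_inversion(x, gamma, n=20000):
    """(1/sqrt(2 pi)) * integral over [-gamma, gamma] of e^{ixy} fhat(y) dy for f = indicator of [0, 1],
    evaluated via 283H as (1/pi) * integral over [x - 1, x] of sin(gamma u)/u du."""
    def kernel(u):
        return gamma if u == 0 else math.sin(gamma * u) / u
    return simpson(kernel, x - 1.0, x, n) / math.pi


for x in (-0.5, 0.0, 0.5, 1.0, 2.0):
    print(f"x={x:5.2f}  value={partial_inversion(x, 400.0):.4f}")
```

Away from the jumps the values approach $f(x)$; at $x = 0$ and $x = 1$ they approach $\frac12$, as 283L asserts.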
Remark You will see that this argument uses some of the same ideas as those in 282O-282P. It is more direct because (i) I am not using any concept corresponding to Fejér sums (though a very suitable one is available; see 283Xf); (ii) I do not trouble to give the result concerning uniform convergence of the Fourier integrals when $f$ is continuous and of bounded variation (283Xj); (iii) I do not give any pointer to the significance of the fact that if $f$ is of bounded variation then $\sup_{y\in\mathbb R}|y\hat f(y)|<\infty$ (283Xk).
283M
Corresponding to 282Q, we have the following.
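Theorem Let $f$ and $g$ be complex-valued functions which are integrable over $\mathbb R$, and $f*g$ their convolution product, defined by setting $(f*g)(x) = \int_{-\infty}^\infty f(t)g(x-t)dt$ whenever this is defined (255E). Then
$$(f*g)^\wedge(y) = \sqrt{2\pi}\,\hat f(y)\hat g(y),\qquad(f*g)^\vee(y) = \sqrt{2\pi}\,\check f(y)\check g(y)$$
for every $y\in\mathbb R$.
proof For any $y$,
$$\begin{aligned}(f*g)^\wedge(y) &= \frac1{\sqrt{2\pi}}\int_{-\infty}^\infty e^{-iyx}(f*g)(x)dx = \frac1{\sqrt{2\pi}}\int_{-\infty}^\infty\int_{-\infty}^\infty e^{-iy(t+u)}f(t)g(u)dtdu\qquad\text{(using 255G)}\\ &= \frac1{\sqrt{2\pi}}\int_{-\infty}^\infty e^{-iyt}f(t)dt\int_{-\infty}^\infty e^{-iyu}g(u)du = \sqrt{2\pi}\,\hat f(y)\hat g(y).\end{aligned}$$
Now, of course,
$$(f*g)^\vee(y) = (f*g)^\wedge(-y) = \sqrt{2\pi}\,\hat f(-y)\hat g(-y) = \sqrt{2\pi}\,\check f(y)\check g(y).$$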
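Here is a concrete instance of 283M. With $f = g = \chi[0,1]$ the convolution $f*g$ is the ‘tent’ function $x\mapsto\max(0,1-|x-1|)$, and the theorem says that its Fourier transform is $\sqrt{2\pi}\,\hat f^2$. The sketch below computes the transform of the tent by quadrature and compares the two sides. It assumes the elementary closed form $\hat f(y) = \frac{1-e^{-iy}}{iy\sqrt{2\pi}}$ and is meant only as a check, not as part of the proof.

```python
import cmath
import math


def simpson(f, lo, hi, n):
    """Composite Simpson rule; works for complex-valued f as well."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3


def fhat(y):
    """Fourier transform of the indicator of [0, 1]."""
    if y == 0:
        return 1 / math.sqrt(2 * math.pi)
    return (1 - cmath.exp(-1j * y)) / (1j * y * math.sqrt(2 * math.pi))


def tent(x):
    """(f * f)(x) for f the indicator of [0, 1]."""
    return max(0.0, 1.0 - abs(x - 1.0))


def transform_on(func, lo, hi, y, n=2000):
    """(1/sqrt(2 pi)) * integral over [lo, hi] of e^{-iyx} func(x) dx.
    With n = 2000 on [0, 2] the kink of the tent at x = 1 falls on a Simpson panel boundary."""
    return simpson(lambda x: cmath.exp(-1j * y * x) * func(x), lo, hi, n) / math.sqrt(2 * math.pi)


for y in (0.0, 0.7, 3.0, -2.5):
    print(y, transform_on(tent, 0.0, 2.0, y), math.sqrt(2 * math.pi) * fhat(y) ** 2)
```

The factor $\sqrt{2\pi}$ is exactly the price of the symmetric normalisation $\frac1{\sqrt{2\pi}}$ used in the definition of $\hat f$.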
283N I show how to compute a special Fourier transform, which will be used repeatedly in the next section.
Lemma For $\sigma>0$, set $\psi_\sigma(x) = \frac1{\sigma\sqrt{2\pi}}e^{-x^2/2\sigma^2}$ for $x\in\mathbb R$. Then its Fourier transform and inverse Fourier transform are
$$\hat\psi_\sigma = \check\psi_\sigma = \frac1\sigma\psi_{1/\sigma}.$$
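In particular, $\hat\psi_1 = \psi_1$.
proof (a) I begin with the special case $\sigma = 1$, using the Maclaurin series
$$e^{-iyx} = \sum_{k=0}^\infty\frac{(-iyx)^k}{k!}$$
and the expressions for $\int_{-\infty}^\infty x^ke^{-x^2/2}dx$ from §263. Fix $y\in\mathbb R$. Writing
$$g_k(x) = \frac{(-iyx)^k}{k!}e^{-x^2/2},\qquad h_n(x) = \sum_{k=0}^n g_k(x),\qquad h(x) = e^{|yx|-x^2/2},$$
we see that $|g_k(x)|\le\frac{|yx|^k}{k!}e^{-x^2/2}$, so that
$$|h_n(x)|\le\sum_{k=0}^\infty|g_k(x)|\le e^{|yx|}e^{-x^2/2} = h(x)$$
for every $n$; moreover, $h$ is integrable, because $|h(x)|\le e^{-|x|}$ whenever $|x|\ge2(1+|y|)$. Consequently, using Lebesgue’s Dominated Convergence Theorem,
$$\begin{aligned}\hat\psi_1(y) &= \frac1{2\pi}\int_{-\infty}^\infty\lim_{n\to\infty}h_n(x)dx = \frac1{2\pi}\lim_{n\to\infty}\int_{-\infty}^\infty h_n(x)dx\\ &= \frac1{2\pi}\sum_{k=0}^\infty\int_{-\infty}^\infty g_k(x)dx = \frac1{2\pi}\sum_{k=0}^\infty\frac{(-iy)^k}{k!}\int_{-\infty}^\infty x^ke^{-x^2/2}dx\\ &= \frac1{2\pi}\sum_{j=0}^\infty\frac{(-iy)^{2j}}{(2j)!}\cdot\frac{\sqrt{2\pi}\,(2j)!}{2^jj!}\qquad\text{(by 263H)}\\ &= \frac1{\sqrt{2\pi}}\sum_{j=0}^\infty\frac{(-y^2)^j}{2^jj!} = \frac1{\sqrt{2\pi}}e^{-y^2/2} = \psi_1(y),\end{aligned}$$
as claimed.
(b) For the general case, $\psi_\sigma(x) = \frac1\sigma\psi_1(\frac x\sigma)$, so that
$$\hat\psi_\sigma(y) = \frac1\sigma\cdot\sigma\hat\psi_1(\sigma y) = \frac1\sigma\psi_{1/\sigma}(y)$$
by 283Ce. Of course we now have
$$\check\psi_\sigma(y) = \hat\psi_\sigma(-y) = \frac1\sigma\psi_{1/\sigma}(y)$$
because $\psi_{1/\sigma}$ is an even function.
283O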
To lead into the ideas of the next section, I give the following very simple fact.
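Proposition Let $f$ and $g$ be two complex-valued functions which are integrable over $\mathbb R$. Then
$$\int_{-\infty}^\infty\hat f\times g = \int_{-\infty}^\infty f\times\hat g\quad\text{and}\quad\int_{-\infty}^\infty\check f\times g = \int_{-\infty}^\infty f\times\check g.$$
proof Of course
$$\int_{-\infty}^\infty\int_{-\infty}^\infty|e^{-ixy}f(y)g(x)|dxdy = \int_{-\infty}^\infty|f|\int_{-\infty}^\infty|g|<\infty,$$
so
$$\begin{aligned}\int_{-\infty}^\infty\hat f\times g &= \frac1{\sqrt{2\pi}}\int_{-\infty}^\infty\Bigl(\int_{-\infty}^\infty f(y)e^{-iyx}dy\Bigr)g(x)dx\\ &= \frac1{\sqrt{2\pi}}\int_{-\infty}^\infty f(y)\Bigl(\int_{-\infty}^\infty e^{-ixy}g(x)dx\Bigr)dy = \int_{-\infty}^\infty f\times\hat g.\end{aligned}$$
For the other half of the proposition, replace every $e^{-ixy}$ in the argument by $e^{ixy}$.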
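Here is a numerical check of 283O. It assumes two standard facts: for $f = \chi[-1,1]$ one has $\hat f(y) = \frac{2\sin y}{y\sqrt{2\pi}}$, and for the Gaussian $g = \psi_1$ of 283N one has $\hat g = \psi_1$. Then $\int\hat f\times g$ is computed by quadrature, truncated at $|y|\le12$ where $\psi_1$ is negligible. The other side, $\int f\times\hat g = \int_{-1}^1\psi_1$, has the closed form $\operatorname{erf}(1/\sqrt2)$. The example functions and cut-off are our own choices.

```python
import math


def simpson(f, lo, hi, n):
    """Composite Simpson rule for f on [lo, hi] with n subintervals (n forced even)."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3


def psi1(x):
    """Standard Gaussian density, its own Fourier transform by 283N."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def fhat(y):
    """Fourier transform of the indicator of [-1, 1]."""
    return (2.0 if y == 0 else 2 * math.sin(y) / y) / math.sqrt(2 * math.pi)


left = simpson(lambda y: fhat(y) * psi1(y), -12.0, 12.0, 4000)  # integral of fhat times g
right = math.erf(1 / math.sqrt(2))                               # integral of f times ghat
print(left, right)
```

The point of the proposition is visible here: one side needs a genuine oscillatory integral, the other only an elementary one.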
283W Higher dimensions I offer a series of exercises designed to provide hints on how the work of this section may be done in the $r$-dimensional case, where $r\ge1$.
(a) Let $f$ be an integrable complex-valued function defined almost everywhere in $\mathbb R^r$. Its Fourier transform is the function $\hat f:\mathbb R^r\to\mathbb C$ defined by the formula
$$\hat f(y) = \frac1{(\sqrt{2\pi})^r}\int e^{-iy\cdot x}f(x)dx,$$
writing $y\cdot x = \eta_1\xi_1+\ldots+\eta_r\xi_r$ for $x = (\xi_1,\ldots,\xi_r)$ and $y = (\eta_1,\ldots,\eta_r)\in\mathbb R^r$, and $\int\ldots dx$ for integration with respect to Lebesgue measure on $\mathbb R^r$. Similarly, the inverse Fourier transform of $f$ is the function $\check f$ given by
$$\check f(y) = \frac1{(\sqrt{2\pi})^r}\int e^{iy\cdot x}f(x)dx = \hat f(-y).$$
Show that, for any integrable complex-valued function $f$ on $\mathbb R^r$,
(i) $\hat f:\mathbb R^r\to\mathbb C$ is continuous;
(ii) $\lim_{\|y\|\to\infty}\hat f(y) = 0$, writing $\|y\| = \sqrt{y\cdot y}$ as usual;
(iii) if $\int\|x\||f(x)|dx<\infty$, then $\hat f$ is differentiable, and
$$\frac{\partial}{\partial\eta_j}\hat f(y) = -\frac{i}{(\sqrt{2\pi})^r}\int e^{-iy\cdot x}\xi_jf(x)dx$$
for $j\le r$, $y\in\mathbb R^r$, always taking $\xi_j$ to be the $j$th coordinate of $x\in\mathbb R^r$;
(iv) if $j\le r$ and $\frac{\partial f}{\partial\xi_j}$ is defined everywhere and is integrable, and if $\lim_{\|x\|\to\infty}f(x) = 0$, then $(\frac{\partial f}{\partial\xi_j})^\wedge(y) = i\eta_j\hat f(y)$ for every $y\in\mathbb R^r$.
(b) Let $f$ be an integrable complex-valued function on $\mathbb R^r$, and $\hat f$ its Fourier transform. If $c\le d$ in $\mathbb R^r$, show that
$$\int_{[c,d]}f = \Bigl(\frac{i}{\sqrt{2\pi}}\Bigr)^r\lim_{\alpha_1,\ldots,\alpha_r\to\infty}\int_{[-a,a]}\prod_{j=1}^r\frac{e^{i\gamma_j\eta_j}-e^{i\delta_j\eta_j}}{\eta_j}\hat f(y)dy,$$
setting $a = (\alpha_1,\ldots)$, $c = (\gamma_1,\ldots)$, $d = (\delta_1,\ldots)$.
(c) Let $f$ be an integrable complex-valued function on $\mathbb R^r$, and $\hat f$ its Fourier transform. Show that if we write $B_\infty(0,a) = \{y:|\eta_j|\le a\text{ for every }j\le r\}$, then
$$\frac1{(\sqrt{2\pi})^r}\int_{B_\infty(0,a)}e^{ix\cdot y}\hat f(y)dy = \int\phi_a(t)f(x-t)dt$$
for every $a\ge0$, where
$$\phi_a(t) = \frac1{\pi^r}\prod_{j=1}^r\frac{\sin a\tau_j}{\tau_j}$$
for $t = (\tau_1,\ldots,\tau_r)\in\mathbb R^r$.
(d) Let $f$ and $g$ be integrable complex-valued functions on $\mathbb R^r$. Show that $\check f*g = (\sqrt{2\pi})^r(f\times\hat g)^\vee$.
(e) For $\sigma>0$, define $\psi_\sigma:\mathbb R^r\to\mathbb C$ by setting
$$\psi_\sigma(x) = \frac1{(\sigma\sqrt{2\pi})^r}e^{-x\cdot x/2\sigma^2}$$
for every $x\in\mathbb R^r$. Show that
$$\hat\psi_\sigma = \check\psi_\sigma = \frac1{\sigma^r}\psi_{1/\sigma}.$$
(f) Defining $\psi_\sigma$ as in (e), show that $\lim_{\sigma\to0}(f*\psi_\sigma)(x) = f(x)$ for every continuous integrable $f:\mathbb R^r\to\mathbb C$, $x\in\mathbb R^r$.
(g) Show that if $f:\mathbb R^r\to\mathbb C$ is continuous and integrable, and $\hat f$ is also integrable, then $f = (\hat f)^\vee$. (Hint: Show that both are equal at every point to $\lim_{\sigma\to\infty}(\sigma\sqrt{2\pi})^r(\hat f\times\psi_\sigma)^\vee = \lim_{\sigma\to\infty}f*\psi_{1/\sigma}$.)
(h) Show that $\int_{\mathbb R^r}\frac1{1+\|x\|^{r+1}}dx<\infty$.
(i) Show that if $f:\mathbb R^r\to\mathbb C$ can be partially differentiated $r+1$ times, and $f$ and all its partial derivatives $\frac{\partial^kf}{\partial\xi_{j_1}\partial\xi_{j_2}\ldots\partial\xi_{j_k}}$ are integrable for $k\le r+1$, then $\hat f$ is integrable.
(j) Show that if $f$ and $g$ are integrable complex-valued functions on $\mathbb R^r$, then $(f*g)^\wedge = (\sqrt{2\pi})^r\hat f\times\hat g$.
(k) Show that if $f$ and $g$ are integrable complex-valued functions on $\mathbb R^r$, then $\int\hat f\times g = \int f\times\hat g$.
(l) Show that if $f_1,\ldots,f_r$ are integrable complex-valued functions on $\mathbb R$ with Fourier transforms $g_1,\ldots,g_r$, and we write $f(x) = f_1(\xi_1)\ldots f_r(\xi_r)$ for $x = (\xi_1,\ldots,\xi_r)\in\mathbb R^r$, then the Fourier transform of $f$ is $y\mapsto g_1(\eta_1)\ldots g_r(\eta_r)$.
(m)(i) Show that $\int_{2k\pi}^{2(k+1)\pi}\frac{\sin t}{t\sqrt t}dt>0$ for every $k\in\mathbb N$, and hence that $\int_0^\infty\frac{\sin t}{t\sqrt t}dt>0$.
(ii) Set $f_1(\xi) = 1/\sqrt{|\xi|}$ for $0<|\xi|\le1$, 0 for other $\xi$. Show that $\lim_{a\to\infty}\frac1{\sqrt a}\int_{-a}^a\hat f_1(\eta)d\eta$ exists in $\mathbb R$ and is greater than 0.
(iii) Construct an integrable function $f_2$, zero on some neighbourhood of 0, such that there are infinitely many $m\in\mathbb N$ for which $\bigl|\int_{-m}^m\hat f_2(\eta)d\eta\bigr|\ge\frac1{\sqrt m}$. (Hint: take $f_2(\xi) = 2^{-k}\sin m_k\xi$ for $k+1\le\xi<k+2$, for a sufficiently rapidly increasing sequence $\langle m_k\rangle_{k\in\mathbb N}$.)
(iv) Set $f(x) = f_1(\xi_1)f_2(\xi_2)$ for $x\in\mathbb R^2$. Show that $f$ is integrable, that $f$ is zero in a neighbourhood of 0, but that
$$\limsup_{a\to\infty}\frac1{2\pi}\Bigl|\int_{B_\infty(0,a)}\hat f(y)dy\Bigr|>0,$$
defining $B_\infty$ as in (c).
283X Basic exercises (a) Confirm that the six alternative definitions of the transforms $\hat f$, $\check f$ offered in 283B all lead to the same theory; find the constants involved in the new versions of 283Ch, 283Ci, 283L, 283M and 283N.
(b) If we redefined $\hat f(y)$ to be $\alpha\int_{-\infty}^\infty e^{i\beta xy}f(x)dx$, what would $\check f(y)$ be?
(c) Show that nearly every $2\pi$ would disappear from the theorems of this section if we defined a measure $\nu$ on $\mathbb R$ by saying that $\nu E = \frac1{\sqrt{2\pi}}\mu E$ for every Lebesgue measurable set $E$, where $\mu$ is Lebesgue measure, and wrote
$$\hat f(y) = \int_{-\infty}^\infty e^{-iyx}f(x)\nu(dx),\qquad\check f(y) = \int_{-\infty}^\infty e^{iyx}f(x)\nu(dx),$$
$$(f*g)(x) = \int_{-\infty}^\infty f(t)g(x-t)\nu(dt).$$
What is $\lim_{a\to\infty}\int_{-a}^a\frac{\sin t}{t}\nu(dt)$?
>(d) Let $f$ be an integrable complex-valued function on $\mathbb R$, with Fourier transform $\hat f$. Show that (i) if $g(x) = f(-x)$ whenever this is defined, then $\hat g(y) = \hat f(-y)$ for every $y\in\mathbb R$; (ii) if $g(x) = \overline{f(x)}$ whenever this is defined, then $\hat g(y) = \overline{\hat f(-y)}$ for every $y$.
(e) Let $f$ be an integrable complex-valued function on $\mathbb R$, with Fourier transform $\hat f$. Show that
$$\int_c^d\hat f(y)dy = \frac{i}{\sqrt{2\pi}}\int_{-\infty}^\infty\frac{e^{-idx}-e^{-icx}}{x}f(x)dx$$
whenever $c\le d$ in $\mathbb R$.
>(f) For an integrable complex-valued function $f$ on $\mathbb R$, let its Fejér integrals be
$$\sigma_c(x) = \frac1{c\sqrt{2\pi}}\int_0^c\Bigl(\int_{-a}^a e^{ixy}\hat f(y)dy\Bigr)da$$
for $c>0$. Show that
$$\sigma_c(x) = \frac1\pi\int_{-\infty}^\infty\frac{1-\cos ct}{ct^2}f(x-t)dt.$$
(g) Show that $\int_{-\infty}^\infty\frac{1-\cos at}{at^2}dt = \pi$ for every $a>0$. (Hint: integrate by parts and use 283Da.) Show that
$$\lim_{a\to\infty}\int_\delta^\infty\frac{1-\cos at}{at^2}dt = \lim_{a\to\infty}\sup_{t\ge\delta}\frac{1-\cos at}{at^2} = 0$$
for every $\delta>0$.
(h) Let $f$ be an integrable complex-valued function on $\mathbb R$, and define its Fejér integrals $\sigma_a$ as in 283Xf above. Show that if $x\in\mathbb R$, $c\in\mathbb C$ are such that
$$\lim_{\delta\downarrow0}\frac1\delta\int_0^\delta|f(x+t)+f(x-t)-2c|dt = 0,$$
then $\lim_{a\to\infty}\sigma_a(x) = c$. (Hint: adapt the argument of 282H.)
>(i) Let $f$ be an integrable complex-valued function on $\mathbb R$, and define its Fejér integrals $\sigma_a$ as in 283Xf above. Show that $f(x) = \lim_{a\to\infty}\sigma_a(x)$ for almost every $x\in\mathbb R$.
(j) Let $f:\mathbb R\to\mathbb C$ be a continuous integrable complex-valued function of bounded variation, and define its Fejér integrals $\sigma_a$ as in 283Xf above. Show that $f(x) = \lim_{a\to\infty}\sigma_a(x)$ uniformly for $x\in\mathbb R$.
>(k) Let $f$ be an integrable complex-valued function of bounded variation on $\mathbb R$, and $\hat f$ its Fourier transform. Show that $\sup_{y\in\mathbb R}|y\hat f(y)|<\infty$.
(l) Let $f$ and $g$ be integrable complex-valued functions on $\mathbb R$. Show that $\check f*g = \sqrt{2\pi}(f\times\hat g)^\vee$.
(m) Let $f$ be an integrable complex-valued function on $\mathbb R$, and fix $x\in\mathbb R$. Set $\tilde f_x(y) = \int_{-\infty}^\infty f(t)\cos y(x-t)dt$ for $y\in\mathbb R$. Show that
(i) if $f$ is differentiable at $x$, $f(x) = \frac1\pi\lim_{a\to\infty}\int_0^a\tilde f_x(y)dy$;
(ii) if there is a neighbourhood of $x$ in which $f$ has bounded variation, then
$$\frac1\pi\lim_{a\to\infty}\int_0^a\tilde f_x(y)dy = \frac12\Bigl(\lim_{t\in\mathrm{dom}f,\,t\uparrow x}f(t)+\lim_{t\in\mathrm{dom}f,\,t\downarrow x}f(t)\Bigr);$$
(iii) if $f$ is twice differentiable and $f'$, $f''$ are integrable then $\tilde f_x$ is integrable and $f(x) = \frac1\pi\int_0^\infty\tilde f_x$. (The formula
$$f(x) = \frac1\pi\int_0^\infty\Bigl(\int_{-\infty}^\infty f(t)\cos y(x-t)dt\Bigr)dy,$$
valid for such functions $f$, is called Fourier’s integral formula.)
(n) Show that if $f$ is a complex-valued function of bounded variation, defined almost everywhere in $\mathbb R$, and converging to 0 at $\pm\infty$, then
$$g(y) = \frac1{\sqrt{2\pi}}\lim_{a\to\infty}\int_{-a}^ae^{-iyx}f(x)dx$$
is defined in $\mathbb C$ for every $y\ne0$, and that the limit is uniform in any region bounded away from 0.
(o) Let $f$ be an integrable complex-valued function on $\mathbb R$. Set
$$\hat f_c(y) = \frac1{\sqrt{2\pi}}\int_{-\infty}^\infty\cos yx\,f(x)dx,\qquad\hat f_s(y) = \frac1{\sqrt{2\pi}}\int_{-\infty}^\infty\sin yx\,f(x)dx$$
for $y\in\mathbb R$. Show that
$$\frac1{\sqrt{2\pi}}\int_{-a}^ae^{ixy}\hat f(y)dy = \sqrt{\frac2\pi}\int_0^a\cos xy\,\hat f_c(y)dy+\sqrt{\frac2\pi}\int_0^a\sin xy\,\hat f_s(y)dy$$
for every $x\in\mathbb R$, $a\ge0$.
(p) Use the fact that $\int_0^a\int_0^\infty e^{-xy}\sin y\,dxdy = \int_0^\infty\int_0^ae^{-xy}\sin y\,dydx$ whenever $a\ge0$ to show that $\lim_{a\to\infty}\int_0^a\frac{\sin y}{y}dy = \int_0^\infty\frac1{1+x^2}dx$.
(q) Let $f:\mathbb R\to\mathbb C$ be an integrable function which is absolutely continuous on every bounded interval, and suppose that its derivative $f'$ is of bounded variation on $\mathbb R$. Show that $\hat f$ is integrable and that $f = (\hat f)^\vee$. (Hint: 283Ci, 283Xk.)
>(r) Show that if $f(x) = e^{-\sigma|x|}$, where $\sigma>0$, then $\hat f(y) = \frac{2\sigma}{\sqrt{2\pi}(\sigma^2+y^2)}$. Hence, or otherwise, find the Fourier transform of $y\mapsto\frac1{1+y^2}$.
(s) Find the inverse Fourier transform of the characteristic function of a bounded interval in $\mathbb R$. Show that in a formal sense 283F can be regarded as a special case of 283O.
(t) Let $f$ be a non-negative integrable function on $\mathbb R$, with Fourier transform $\hat f$. Show that
$$\sum_{j=0}^n\sum_{k=0}^na_j\bar a_k\hat f(y_j-y_k)\ge0$$
whenever $y_0,\ldots,y_n\in\mathbb R$ and $a_0,\ldots,a_n\in\mathbb C$.
(u) Let $f$ be an integrable complex-valued function on $\mathbb R$. Show that $\tilde f(x) = \sum_{n=-\infty}^\infty f(x+2\pi n)$ is defined in $\mathbb C$ for almost every $x$. (Hint: $\sum_{n=-\infty}^\infty\int_{-\pi}^\pi|f(x+2\pi n)|dx<\infty$.) Show that $\tilde f$ is periodic. Show that the Fourier coefficients of $\tilde f\restriction\,]-\pi,\pi]$ are $\langle\frac1{\sqrt{2\pi}}\hat f(k)\rangle_{k\in\mathbb Z}$.
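If you want to test a candidate answer to 283Xr numerically, the Python sketch below may help. It is our own construction, not part of the exercises. It evaluates $\frac1{\sqrt{2\pi}}\int e^{-iyx}e^{-\sigma|x|}dx$ as $\frac2{\sqrt{2\pi}}\int_0^X e^{-\sigma x}\cos yx\,dx$, using the evenness of the integrand and a cut-off $X = 40/\sigma$ beyond which the integrand is negligible, and compares the result with the closed form stated in the exercise.

```python
import math


def simpson(f, lo, hi, n):
    """Composite Simpson rule for f on [lo, hi] with n subintervals (n forced even)."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3


def transform_exp_abs(y, sigma, n=20000):
    """Fourier transform of x -> exp(-sigma |x|) at y, using evenness to integrate over [0, X]."""
    cutoff = 40.0 / sigma  # beyond this point exp(-sigma x) < exp(-40), negligible here
    integral = simpson(lambda x: math.exp(-sigma * x) * math.cos(y * x), 0.0, cutoff, n)
    return 2 * integral / math.sqrt(2 * math.pi)


def closed_form(y, sigma):
    """The formula stated in 283Xr."""
    return 2 * sigma / (math.sqrt(2 * math.pi) * (sigma ** 2 + y ** 2))


for y in (0.0, 0.5, 2.0):
    print(y, transform_exp_abs(y, 1.5), closed_form(y, 1.5))
```

Reading the closed form as a function of $y$ and using the symmetry between $\hat{\ }$ and $\check{\ }$ is one route to the second half of the exercise.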
283Y Further exercises (a) Show that if $f:\mathbb R\to\mathbb C$ is absolutely continuous in every bounded interval, $f'$ is of bounded variation on $\mathbb R$, and $\lim_{x\to\infty}f(x) = \lim_{x\to-\infty}f(x) = 0$, then
$$g(y) = \frac1{\sqrt{2\pi}}\lim_{a\to\infty}\int_{-a}^ae^{-iyx}f(x)dx = -\frac{i}{y\sqrt{2\pi}}\lim_{a\to\infty}\int_{-a}^ae^{-iyx}f'(x)dx$$
is defined, with
$$y^2|g(y)|\le\frac4{\sqrt{2\pi}}\mathrm{Var}_{\mathbb R}(f'),$$
for every $y\ne0$.
(b) Let $f:\mathbb R\to[0,\infty[$ be an even function such that $f$ is convex on $[0,\infty[$ (see §233 for notes on convex functions) and $\lim_{x\to\infty}f(x) = 0$.
(i) Show that, for any $y>0$, $k\in\mathbb N$, $\int_{-2k\pi/y}^{2k\pi/y}e^{-iyx}f(x)dx\ge0$.
(ii) Show that $g(y) = \frac1{\sqrt{2\pi}}\lim_{a\to\infty}\int_{-a}^ae^{-iyx}f(x)dx$ exists in $[0,\infty[$ for every $y\ne0$.
(iii) For $n\in\mathbb N$, set $f_n(x) = e^{-|x|/(n+1)}f(x)$ for every $x$. Show that $f_n$ is integrable and convex on $[0,\infty[$.
(iv) Show that $g(y) = \lim_{n\to\infty}\hat f_n(y)$ for every $y\ne0$.
(vi) Show that if $f$ is integrable then
$$\int_{-a}^a\hat f = \frac4{\sqrt{2\pi}}\int_0^\infty\frac{\sin at}{t}f(t)dt\le\frac{4a}{\sqrt{2\pi}}\int_0^{\pi/a}f(t)dt\le2\sqrt{2\pi}f(0)$$
for every $a\ge0$. Hence show that whether $f$ is integrable or not, $g$ is integrable and $f_n = (\hat f_n)^\vee$ for every $n$.
(vii) Show that $\lim_{a\downarrow0}\sup_{n\in\mathbb N}\int_{-a}^a\hat f_n = 0$.
(viii) Show that if $f'$ is bounded (on its domain) then $\{\hat f_n:n\in\mathbb N\}$ is uniformly integrable (hint: use (vii) and 283Ya), so that $\lim_{n\to\infty}\|\hat f_n-g\|_1 = 0$ and $f = \check g$.
(ix) Show that if $f'$ is unbounded then for every $\epsilon>0$ we can find $h_1,h_2:\mathbb R\to[0,\infty[$, both even, convex and converging to 0 at $\infty$, such that $f = h_1+h_2$, $h_1'$ is bounded, $\int h_2\le\epsilon$ and $h_2(0)\le\epsilon$. Hence show that in this case also $f = \check g$.
(c) Suppose that $f:\mathbb R\to\mathbb R$ is even, twice differentiable and convergent to 0 at $\infty$, that $f''$ is continuous and that $\{x:f''(x) = 0\}$ is bounded in $\mathbb R$. Show that $f$ is the Fourier transform of an integrable function. (Hint: use 283Yb and 283Xq.)
(d) Let $g:\mathbb R\to\mathbb R$ be an odd function of bounded variation such that $\int_1^\infty\frac1xg(x)dx = \infty$. Show that $g$ cannot be the Fourier transform of any integrable function $f$. (Hint: show that if $g = \hat f$ then $\int_0^1f = \frac{2i}{\sqrt{2\pi}}\lim_{a\to\infty}\int_0^a\frac{1-\cos x}{x}g(x)dx = \infty$.)
283 Notes and comments I have tried in this section to give the elementary theory of Fourier transforms of integrable functions on $\mathbb R$, with an eye to the extension of the concept which will be attempted in the next section. Following §282, I have given prominence to two theorems (283I and 283L) describing conditions for the inversion of the Fourier transform to return to the original function; we find ourselves looking at improper integrals $\lim_{a\to\infty}\int_{-a}^a$, just as earlier we needed to look at symmetric sums $\lim_{n\to\infty}\sum_{k=-n}^n$. I do not go quite so far as in §282, and in particular I leave the study of square-integrable functions for the moment, since their Fourier transforms may not be describable by the simple formulae used here.
One of the most fundamental obstacles in the subject is the lack of any effective criteria for determining which functions are the Fourier transforms of integrable functions. (Happily, things are better for square-integrable functions; see 284O-284P.) In 283Yb-283Yc I sketch an argument showing that ‘ordinary’ non-oscillating even functions which converge to 0 at $\pm\infty$ are Fourier transforms of integrable functions. Strikingly, this is not true of odd functions; thus $y\mapsto\frac1{\ln(e+y^2)}$ is the Fourier transform of an integrable function, but $y\mapsto\frac{\arctan y}{\ln(e+y^2)}$ is not (283Yd).
In 283W I sketch the corresponding theory of Fourier transforms in $\mathbb R^r$. There are few surprises. One point to note is that where in the one-dimensional case we ask for a well-behaved second derivative, in the $r$-dimensional case we may need to differentiate $r+1$ times (283Wi). Another is that we lose the ‘localization principle’. In the one-dimensional case, if $f$ is integrable and zero on an interval $]c,d[$, then $\lim_{a\to\infty}\int_{-a}^ae^{ixy}\hat f(y)dy = 0$ for every $x\in\,]c,d[$; this is immediate from either 283I or 283L. But in higher dimensions the most natural formulation of a corresponding result is false (283Wm).
284 Fourier transforms II
The basic paradox of Fourier transforms is the fact that while for certain functions (see 283J-283K) we have $(\hat f)^\vee = f$, ‘ordinary’ integrable functions $f$ (for instance, the characteristic functions of non-trivial intervals) give rise to non-integrable Fourier transforms $\hat f$ for which there is no direct definition available for $(\hat f)^\vee$, making it a puzzle to decide in what sense the formula $f = (\hat f)^\vee$ might be true. What now seems by far the most natural resolution of the problem lies in declaring the Fourier transform to be an operation on distributions rather than on functions. I shall not attempt to describe this theory properly (almost any book on ‘Distributions’ will cover the ground better than I can possibly do here), but will try to convey the fundamental ideas, so far as they are relevant to the questions dealt with here, in language which will make the transition to a fuller treatment straightforward. At the same time, these methods make it easy to prove strong versions of the ‘classical’ theorems concerning Fourier transforms.
284A Test functions: Definition Throughout this section, a rapidly decreasing test function or Schwartz function will be a function $h:\mathbb R\to\mathbb C$ such that $h$ is smooth, that is, differentiable everywhere any finite number of times, and moreover $\sup_{x\in\mathbb R}|x|^k|h^{(m)}(x)|<\infty$ for all $k,m\in\mathbb N$, writing $h^{(m)}$ for the $m$th derivative of $h$.
284B
The following elementary facts will be useful.
Lemma (a) If $g$ and $h$ are rapidly decreasing test functions, so are $g+h$ and $ch$, for any $c\in\mathbb C$.
(b) If $h$ is a rapidly decreasing test function and $y\in\mathbb R$, then $x\mapsto h(y-x)$ is a rapidly decreasing test function.
(c) If $h$ is any rapidly decreasing test function, then $h$ and $h^2$ are integrable.
(d) If $h$ is a rapidly decreasing test function, so is its derivative $h'$.
(e) If $h$ is a rapidly decreasing test function, so is the function $x\mapsto xh(x)$.
(f) For any $\epsilon>0$, the function $x\mapsto e^{-\epsilon x^2}$ is a rapidly decreasing test function.
proof (a) is trivial.
(b) Write $g(x) = h(y-x)$ for $x\in\mathbb R$. Then $g^{(m)}(x) = (-1)^mh^{(m)}(y-x)$ for every $m$, so $g$ is smooth. For any $k\in\mathbb N$, $|x|^k\le2^k(|y|^k+|y-x|^k)$ for every $x$, so
$$\begin{aligned}\sup_{x\in\mathbb R}|x|^k|g^{(m)}(x)| &= \sup_{x\in\mathbb R}|x|^k|h^{(m)}(y-x)|\\ &\le2^k|y|^k\sup_{x\in\mathbb R}|h^{(m)}(y-x)|+2^k\sup_{x\in\mathbb R}|y-x|^k|h^{(m)}(y-x)|\\ &= 2^k|y|^k\sup_{x\in\mathbb R}|h^{(m)}(x)|+2^k\sup_{x\in\mathbb R}|x|^k|h^{(m)}(x)|<\infty.\end{aligned}$$
(c) Because $M = \sup_{x\in\mathbb R}(|h(x)|+x^2|h(x)|)$ is finite, we have
$$\int|h(x)|dx\le\int\frac M{1+x^2}dx<\infty.$$
Of course we now have $|h^2|\le M|h|$, so $h^2$ is also integrable.
(d) This is immediate from the definition, as every derivative of $h'$ is a derivative of $h$.
(e) Setting $g(x) = xh(x)$, $g^{(m)}(x) = xh^{(m)}(x)+mh^{(m-1)}(x)$ for $m\ge1$, so
$$\sup_{x\in\mathbb R}|x^kg^{(m)}(x)|\le\sup_{x\in\mathbb R}|x^{k+1}h^{(m)}(x)|+m\sup_{x\in\mathbb R}|x^kh^{(m-1)}(x)|$$
is finite, for all $k\in\mathbb N$, $m\ge1$.
(f) If $h(x) = e^{-\epsilon x^2}$, then for each $m\in\mathbb N$ we have $h^{(m)}(x) = p_m(x)h(x)$, where $p_0(x) = 1$ and $p_{m+1}(x) = p_m'(x)-2\epsilon xp_m(x)$, so that $p_m$ is a polynomial. Because $e^{\epsilon x^2}\ge\epsilon^{k+1}x^{2k+2}/(k+1)!$ for all $x$, $k\ge0$,
$$\lim_{|x|\to\infty}|x|^kh(x) = \lim_{x\to\infty}x^k/e^{\epsilon x^2} = 0$$
for every $k$, and $\lim_{|x|\to\infty}p(x)h(x) = 0$ for every polynomial $p$; consequently $\lim_{|x|\to\infty}x^kh^{(m)}(x) = \lim_{|x|\to\infty}x^kp_m(x)h(x) = 0$ for all $k$, $m$, and $h$ is a rapidly decreasing test function.
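The recursion $p_{m+1} = p_m'-2\epsilon xp_m$ in the proof of (f) is easy to mechanise. The sketch below is an illustration of ours; the function names are hypothetical and polynomials are stored as coefficient lists in increasing degree. It generates $p_m$, checks $h^{(m)} = p_mh$ against central finite differences, and evaluates $|x|^k|h^{(m)}(x)|$ far out, to show the super-polynomial decay that makes $h$ a rapidly decreasing test function.

```python
import math


def next_poly(p, eps):
    """Return p' - 2*eps*x*p, with polynomials as coefficient lists (index = degree)."""
    deriv = [j * p[j] for j in range(1, len(p))]
    times_x = [0.0] + [-2.0 * eps * cj for cj in p]
    out = [0.0] * max(len(deriv), len(times_x))
    for j, cj in enumerate(deriv):
        out[j] += cj
    for j, cj in enumerate(times_x):
        out[j] += cj
    return out


def polys(eps, m_max):
    """[p_0, p_1, ..., p_{m_max}] for h(x) = exp(-eps x^2)."""
    ps = [[1.0]]
    for _ in range(m_max):
        ps.append(next_poly(ps[-1], eps))
    return ps


def evaluate(p, x):
    """Value of the polynomial with coefficient list p at x."""
    return sum(cj * x ** j for j, cj in enumerate(p))


def h_deriv(m, x, eps):
    """h^{(m)}(x) = p_m(x) * exp(-eps x^2)."""
    return evaluate(polys(eps, m)[m], x) * math.exp(-eps * x * x)


eps = 0.5
print(polys(eps, 2))  # coefficient lists of p_0, p_1, p_2
for x in (5.0, 10.0, 30.0):
    print(x, abs(x) ** 6 * abs(h_deriv(3, x, eps)))
```

For $\epsilon = \frac12$ one recovers the familiar $p_2(x) = x^2-1$ and $p_3(x) = 3x-x^3$, the Hermite-type factors of the derivatives of $e^{-x^2/2}$.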
∨
284C Proposition Let h : R → C be a rapidly decreasing test function. Then h : R → C and h : R → C ∧∨ ∨∧ are rapidly decreasing test functions, and h = h = h. R∞ proof (a) Let k, m ∈ N. Then supx∈R (|x|m + |x|m+2 )|h(k) (x)| < ∞ and −∞ |xm h(k) (x)|dx < ∞. We may ∧
therefore use 283Ch-283Ci to see that y 7→ ik+m y k h(m) (y) is the Fourier transform of x 7→ xm h(k) (x), and ∧
∧
∧
therefore that lim|y|→∞ y k h(m) (y) = 0, by 283Cg, so that (because h(m) is continuous) supy∈R |y k h(m) (y)| is ∧
finite. As k and m are arbitrary, h is a rapidly decreasing test function. ∨
∧
∨
(b) Since h(y) = h(−y) for every y, it follows at once that h is a rapidly decreasing test function. ∧∨
∨∧
(c) By 283J, it follows from (a) and (b) that h = h = h. 284D Definition I will use the phrase tempered function on R to mean a measurable complex-valued function f , defined almost everywhere on R, such that
R∞
1
−∞ 1+|x|k
|f (x)|dx < ∞
for some k ∈ N. 284E
As in 284B I spell out some elementary facts.
Lemma (a) If $f$ and $g$ are tempered functions, so are $|f|$, $f+g$ and $cf$, for any $c\in\mathbb C$.
(b) If $f$ is a tempered function then it is integrable over any bounded interval.
(c) If $f$ is a tempered function and $x\in\mathbb R$, then $t\mapsto f(x+t)$ and $t\mapsto f(x-t)$ are both tempered functions.
proof (a) is elementary; if
$$\int_{-\infty}^\infty\frac1{1+|x|^j}|f(x)|dx<\infty,\qquad\int_{-\infty}^\infty\frac1{1+|x|^k}|g(x)|dx<\infty,$$
then
$$\int_{-\infty}^\infty\frac1{1+|x|^{j+k}}|(f+g)(x)|dx<\infty,$$
because
$$1+|x|^{j+k}\ge\max(1,|x|^{j+k})\ge\max(1,|x|^j,|x|^k)\ge\frac12\max(1+|x|^j,1+|x|^k)$$
for all $x$.
(b) If
$$\int_{-\infty}^\infty\frac1{1+|x|^k}|f(x)|dx = M<\infty,$$
then for any $a\le b$
$$\int_a^b|f(x)|dx\le M(1+|a|^k+|b|^k)<\infty.$$
(c) The idea is the same as in 284Bb. If $k\in\mathbb N$ is such that
$$\int_{-\infty}^\infty\frac1{1+|t|^k}|f(t)|dt = M<\infty,$$
then we have $1+|x+t|^k\le2^k(1+|x|^k)(1+|t|^k)$, so that
$$\frac1{1+|t|^k}\le2^k(1+|x|^k)\frac1{1+|x+t|^k}$$
for every $t$, and
$$\int_{-\infty}^\infty\frac{|f(x+t)|}{1+|t|^k}dt\le2^k(1+|x|^k)\int_{-\infty}^\infty\frac{|f(x+t)|}{1+|x+t|^k}dt\le2^k(1+|x|^k)M<\infty.$$
Similarly,
$$\int_{-\infty}^\infty\frac{|f(x-t)|}{1+|t|^k}dt\le2^k(1+|x|^k)M<\infty.$$
284F
Linking the two concepts, we have the following.
Lemma Let $f$ be a tempered function on $\mathbb R$ and $h$ a rapidly decreasing test function. Then $f\times h$ is integrable.
proof Of course $f\times h$ is measurable. Let $k\in\mathbb N$ be such that $\int_{-\infty}^\infty\frac1{1+|x|^k}|f(x)|dx<\infty$. There is an $M$ such that $(1+|x|^k)|h(x)|\le M$ for every $x\in\mathbb R$, so that
$$\int_{-\infty}^\infty|f(x)h(x)|dx\le M\int_{-\infty}^\infty\frac1{1+|x|^k}|f(x)|dx<\infty.$$
284G Lemma Suppose that $f_1$ and $f_2$ are tempered functions and that $\int f_1\times h = \int f_2\times h$ for every rapidly decreasing test function $h$. Then $f_1 =_{\text{a.e.}} f_2$.
proof (a) Set $g = f_1-f_2$; then $\int g\times h = 0$ for every rapidly decreasing test function $h$. Of course $g$ is a tempered function, so is integrable over any bounded interval. By 222D, it will be enough if I can show that $\int_a^bg = 0$ whenever $a<b$, since then we shall have $g = 0$ a.e. on every bounded interval and $f_1 =_{\text{a.e.}} f_2$.
(b) Consider the function $\phi_0(x) = e^{-1/x}$ for $x>0$. Then $\phi_0$ is differentiable arbitrarily often everywhere in $]0,\infty[$, $0<\phi_0(x)<1$ for every $x>0$, and $\lim_{x\to\infty}\phi_0(x) = 1$. Moreover, writing $\phi_0^{(m)}$ for the $m$th derivative of $\phi_0$,
$$\lim_{x\downarrow0}\phi_0^{(m)}(x) = \lim_{x\downarrow0}\frac1x\phi_0^{(m)}(x) = 0$$
for every $m\in\mathbb N$. PPP (Compare 284Bf.) We have $\phi_0^{(m)}(x) = p_m(\frac1x)\phi_0(x)$, where $p_0(t) = 1$ and $p_{m+1}(t) = t^2(p_m(t)-p_m'(t))$, so that $p_m$ is a polynomial for each $m\in\mathbb N$. Now for any $k\in\mathbb N$,
$$\lim_{t\to\infty}t^ke^{-t}\le\lim_{t\to\infty}\frac{(k+1)!\,t^k}{t^{k+1}} = 0,$$
so
$$\lim_{x\downarrow0}\phi_0^{(m)}(x) = \lim_{t\to\infty}p_m(t)e^{-t} = 0,\qquad\lim_{x\downarrow0}\frac1x\phi_0^{(m)}(x) = \lim_{t\to\infty}tp_m(t)e^{-t} = 0.$$
QQQ
(c) Consequently, setting $\phi(x) = 0$ for $x\le0$, $e^{-1/x}$ for $x>0$, $\phi$ is smooth, with $m$th derivative
$$\phi^{(m)}(x) = 0\text{ for }x\le0,\qquad\phi^{(m)}(x) = \phi_0^{(m)}(x)\text{ for }x>0.$$
(The proof is an easy induction on $m$.) Also $0\le\phi(x)\le1$ for every $x\in\mathbb R$, and $\lim_{x\to\infty}\phi(x) = 1$.
(d) Now take any $a<b$, and for $n\in\mathbb N$ set $\psi_n(x) = \phi(n(x-a))\phi(n(b-x))$. Then $\psi_n$ will be smooth and $\psi_n(x) = 0$ if $x\notin\,]a,b[$, so surely $\psi_n$ is a rapidly decreasing test function, and $\int_{-\infty}^\infty g\times\psi_n = 0$. Next, $0\le\psi_n(x)\le1$ for every $x$, $n$, and if $a<x<b$ then $\lim_{n\to\infty}\psi_n(x) = 1$. So
$$\int_a^bg = \int g\times\chi(]a,b[) = \int g\times\bigl(\lim_{n\to\infty}\psi_n\bigr) = \lim_{n\to\infty}\int g\times\psi_n = 0,$$
using Lebesgue’s Dominated Convergence Theorem. As $a$ and $b$ are arbitrary, $g =_{\text{a.e.}} 0$, as required.
284H Definition Let $f$ and $g$ be tempered functions in the sense of 284D. Then I will say that $g$ represents the Fourier transform of $f$ if
$$\int_{-\infty}^\infty g\times h = \int_{-\infty}^\infty f\times\hat h$$
for every rapidly decreasing test function $h$.
284I Remarks (a) As usual, when shifting definitions in this way, we have some checking to do. If $f$ is an integrable complex-valued function on $\mathbb R$, $\hat f$ its Fourier transform, then surely $\hat f$ is a tempered function, being a bounded continuous function; and if $h$ is any rapidly decreasing test function, then $\int\hat f\times h = \int f\times\hat h$ by 283O. Thus $\hat f$ ‘represents the Fourier transform of $f$’ in the sense of 284H above.
(b) Note also that 284G assures us that if $g_1$, $g_2$ are two tempered functions both representing the Fourier transform of $f$, then $g_1 =_{\text{a.e.}} g_2$, since we must have
$$\int g_1\times h = \int f\times\hat h = \int g_2\times h$$
for every rapidly decreasing test function $h$.
(c) Of course the value of this indirect approach is that we can assign Fourier transforms, in a sense, to many more functions. But we must note at once that if $g$ ‘represents the Fourier transform of $f$’ then so will any function equal almost everywhere to $g$; we can no longer expect to be able to speak of ‘the’ Fourier transform of $f$ as a function. We could say that ‘the’ Fourier transform of $f$ is a functional $\phi$ on the space of rapidly decreasing test functions, defined by setting $\phi(h) = \int f\times\hat h$; alternatively, we could say that ‘the’ Fourier transform of $f$ is a member of $L^0_{\mathbb C}$, the space of equivalence classes of almost-everywhere-defined measurable functions (§241).
(d) It is now natural to say that $g$ represents the inverse Fourier transform of $f$ just when $f$ represents the Fourier transform of $g$; that is, when $\int f\times h = \int g\times\hat h$ for every rapidly decreasing test function $h$. Because $(\hat h)^\vee = (\check h)^\wedge = h$ for every such $h$, this is the same thing as saying that $\int f\times\check h = \int g\times h$ for every rapidly decreasing test function $h$, which is the other natural expression of what it might mean to say that ‘$g$ represents the inverse Fourier transform of $f$’.
(e) If $f$, $g$ are tempered functions and we write $g^*(x) = g(-x)$ whenever this is defined, then $g^*$ will also be a tempered function, and we shall always have
$$\int g^*\times\hat h = \int g(-x)\hat h(x)dx = \int g(x)\hat h(-x)dx = \int g\times\check h,$$
so that
$g$ represents the Fourier transform of $f$
$\iff\int g\times h = \int f\times\hat h$ for every test function $h$
$\iff\int g\times\check h = \int f\times(\check h)^\wedge$ for every $h$
$\iff\int g^*\times\hat h = \int f\times h$ for every $h$
$\iff g^*$ represents the inverse Fourier transform of $f$.
Combining this with (d), we get
$g$ represents the Fourier transform of $f$
$\iff f^{**} = f$ represents the inverse Fourier transform of $g$
$\iff f^*$ represents the Fourier transform of $g$.
(f) Yet again, I ought to spell out the check: if $f$ is integrable and $\check f$ is its inverse Fourier transform as defined in 283Ab, then
$$\int\check f\times\hat h = \int f\times(\hat h)^\vee = \int f\times h$$
for every rapidly decreasing test function $h$, so $\check f$ ‘represents the inverse Fourier transform of $f$’ in the sense given here.

284J Lemma Let $f$ be any tempered function and $h$ a rapidly decreasing test function. Then $f*h$, defined by the formula
$$(f*h)(y) = \int_{-\infty}^{\infty}f(t)h(y-t)\,dt,$$
is defined everywhere.

proof Take any $y\in\mathbb R$. By 284Bb, $t\mapsto h(y-t)$ is a rapidly decreasing test function, so the integral is always defined in $\mathbb C$, by 284F.

284K Proposition Let $f$ and $g$ be tempered functions such that $g$ represents the Fourier transform of $f$, and $h$ a rapidly decreasing test function.

(a) The Fourier transform of the integrable function $f\times h$ is $\frac{1}{\sqrt{2\pi}}\,g*\hat h$, where $g*\hat h$ is the convolution of $g$ and $\hat h$.

(b) The Fourier transform of the continuous function $f*h$ is represented by the product $\sqrt{2\pi}\,g\times\hat h$.
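proof (a) Of course $f\times h$ is integrable, by 284F, while $g*\hat h$ is defined everywhere, by 284C and 284J.

Fix $y\in\mathbb R$. Set $\phi(x) = \hat h(y-x)$ for $x\in\mathbb R$; then $\phi$ is a rapidly decreasing test function because $\hat h$ is (284Bb). Now
$$\hat\phi(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-itx}\hat h(y-x)\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-it(y-x)}\hat h(x)\,dx = e^{-ity}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{itx}\hat h(x)\,dx = e^{-ity}(\hat h)^\vee(t) = e^{-ity}h(t),$$
using 284C. Accordingly
$$\begin{aligned}(f\times h)^\wedge(y) &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-ity}f(t)h(t)\,dt = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f(t)\hat\phi(t)\,dt\\ &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}g(t)\phi(t)\,dt\quad\text{(because $g$ represents the Fourier transform of $f$)}\\ &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}g(t)\hat h(y-t)\,dt = \frac{1}{\sqrt{2\pi}}(g*\hat h)(y).\end{aligned}$$
As $y$ is arbitrary, $\frac{1}{\sqrt{2\pi}}g*\hat h$ is the Fourier transform of $f\times h$.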
(b) Write $\psi$ for the Fourier transform of $g\times\hat h$, $f^*(x) = f(-x)$ when this is defined, and $h^*(x) = h(-x)$ for every $x$, so that $f^*$ represents the Fourier transform of $g$, by 284Ie, and $h^*$ is the Fourier transform of $\hat h$. By (a), we have $\psi = \frac{1}{\sqrt{2\pi}}f^* * h^*$. This means that the inverse Fourier transform of $\sqrt{2\pi}\,g\times\hat h$ must be $\sqrt{2\pi}\,\psi^* = (f^* * h^*)^*$; and as
$$\begin{aligned}(f^* * h^*)^*(y) &= (f^* * h^*)(-y) = \int_{-\infty}^{\infty}f^*(t)h^*(-y-t)\,dt\\ &= \int_{-\infty}^{\infty}f(-t)h(y+t)\,dt = \int_{-\infty}^{\infty}f(t)h(y-t)\,dt = (f*h)(y),\end{aligned}$$
the inverse Fourier transform of $\sqrt{2\pi}\,g\times\hat h$ is $f*h$ (which is therefore continuous), and $\sqrt{2\pi}\,g\times\hat h$ must represent the Fourier transform of $f*h$.
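Both parts of 284K can be checked numerically in a case where everything is explicit. The sketch below assumes the illustrative choices $f = \chi[-1,1]$, $g(y) = \sqrt{2/\pi}\,\sin y/y$ and $h(x) = e^{-x^2/2}$ (so $\hat h = h$), and uses the closed form $(f*h)(x) = \sqrt{\pi/2}\,\bigl(\operatorname{erf}\frac{1+x}{\sqrt2}+\operatorname{erf}\frac{1-x}{\sqrt2}\bigr)$. The helper names `check_a`, `check_b` and `conv_f_h` are hypothetical, and integrals over $\mathbb R$ are truncated where the integrands are negligible.

```python
import math

def simpson(func, a, b, n=20000):
    """Composite Simpson rule for the integral of func over [a, b] (n made even)."""
    n += n % 2
    step = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * step)
    return total * step / 3

SQRT_2PI = math.sqrt(2 * math.pi)

def g(y):
    """Transform of f = indicator of [-1, 1] (1/sqrt(2*pi) convention)."""
    return math.sqrt(2 / math.pi) * (math.sin(y) / y if y != 0 else 1.0)

def h(x):
    """Gaussian test function; h_hat = h."""
    return math.exp(-x * x / 2)

def conv_f_h(x):
    """(f * h)(x) = integral of h(x - t) over t in [-1, 1], in closed form."""
    r = math.sqrt(2)
    return math.sqrt(math.pi / 2) * (math.erf((1 + x) / r) + math.erf((1 - x) / r))

def check_a(y):
    """Both sides of 284Ka at y: (f x h)^(y) and (1/sqrt(2 pi)) (g * h_hat)(y)."""
    # f x h is real and even, so only the cosine part of e^{-iyx} contributes.
    lhs = simpson(lambda x: math.cos(y * x) * h(x), -1.0, 1.0) / SQRT_2PI
    rhs = simpson(lambda t: g(t) * h(y - t), y - 40.0, y + 40.0) / SQRT_2PI
    return lhs, rhs

def check_b(y):
    """Both sides of 284Kb at y: (f * h)^(y) and sqrt(2 pi) g(y) h_hat(y)."""
    # f * h is real and even as well.
    lhs = simpson(lambda x: math.cos(y * x) * conv_f_h(x), -40.0, 40.0) / SQRT_2PI
    rhs = SQRT_2PI * g(y) * h(y)
    return lhs, rhs

for y in (0.0, 0.7, 2.5):
    print(y, check_a(y), check_b(y))
```

Because $f$ is integrable here, $\sqrt{2\pi}\,g\times\hat h$ is an honest transform of $f*h$, so the two columns of each pair should agree to quadrature accuracy.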
Remark Compare 283M. It is typical of the theory of Fourier transforms that we have formulae valid in a wide variety of contexts, each requiring a different interpretation and a different proof.

284L We are now ready for a result corresponding to 282H. I use a different method, or at least a different arrangement of the ideas, through the following fact, which is important in other ways.

Proposition Let $f$ be any tempered function. Writing
$$\psi_\sigma(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-x^2/2\sigma^2}$$
for $x\in\mathbb R$ and $\sigma>0$, we have $\lim_{\sigma\downarrow0}(f*\psi_\sigma)(x) = c$ whenever $x\in\mathbb R$ and $c\in\mathbb C$ are such that
$$\lim_{\delta\downarrow0}\frac1\delta\int_0^\delta|f(x+t)+f(x-t)-2c|\,dt = 0.$$
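To see 284L at work on a jump, take the illustrative tempered function $f(t) = \cos t$ for $t\ge0$, $f(t) = 0$ for $t<0$. At $x = 0$ the hypothesis holds with $c = \frac12$ (since $|f(t)+f(-t)-1| = 1-\cos t\le t^2/2$ for $t>0$), and in fact $(f*\psi_\sigma)(0) = \frac12e^{-\sigma^2/2}$ exactly; at the continuity point $x = 1$ the limit should be $\cos1$. The Python sketch below evaluates the convolution by Simpson's rule, integrating only where $f(x-t)\ne0$ so the jump never falls inside a quadrature panel; the function names and the cut-off at $12\sigma$ are assumptions of the sketch.

```python
import math

def simpson(func, a, b, n=20000):
    """Composite Simpson rule for the integral of func over [a, b] (n made even)."""
    n += n % 2
    step = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * step)
    return total * step / 3

def psi(sigma, t):
    """psi_sigma(t) = exp(-t^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    return math.exp(-t * t / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def smoothed(x, sigma):
    """(f * psi_sigma)(x) for f(t) = cos t (t >= 0), f(t) = 0 (t < 0).

    f(x - t) = cos(x - t) when t <= x and 0 otherwise, so only t <= x is
    integrated; psi_sigma is negligible outside [-12 sigma, 12 sigma].
    """
    lo, hi = -12 * sigma, min(x, 12 * sigma)
    if hi <= lo:
        return 0.0
    return simpson(lambda t: math.cos(x - t) * psi(sigma, t), lo, hi)

for sigma in (1.0, 0.1, 0.01):
    # At the jump x = 0 the limit should be (f(0+) + f(0-)) / 2 = 1/2.
    print(sigma, smoothed(0.0, sigma), smoothed(1.0, sigma))
```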
proof (a) By 284Bf, every $\psi_\sigma$ is a rapidly decreasing test function, so that $f*\psi_\sigma$ is defined everywhere, by 284J. We need to know that $\int_{-\infty}^{\infty}\psi_\sigma = 1$; this is because (substituting $u = x/\sigma$)
$$\int_{-\infty}^{\infty}\psi_\sigma(x)\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-u^2/2}\,du = 1,$$
by 263G. The argument now follows the lines of 282H. Set $\phi(t) = |f(x+t)+f(x-t)-2c|$ when this is defined, which is almost everywhere, and $\Phi(t) = \int_0^t\phi$, defined for all $t\ge0$ because $f$ is integrable over every bounded interval (284Eb). We have
$$\begin{aligned}|(f*\psi_\sigma)(x)-c| &= \Bigl|\int_{-\infty}^{\infty}f(x-t)\psi_\sigma(t)\,dt-c\int_{-\infty}^{\infty}\psi_\sigma(t)\,dt\Bigr|\\ &= \Bigl|\int_{-\infty}^0(f(x-t)-c)\psi_\sigma(t)\,dt+\int_0^\infty(f(x-t)-c)\psi_\sigma(t)\,dt\Bigr|\\ &= \Bigl|\int_0^\infty(f(x+t)-c)\psi_\sigma(t)\,dt+\int_0^\infty(f(x-t)-c)\psi_\sigma(t)\,dt\Bigr|\quad\text{(because $\psi_\sigma$ is an even function)}\\ &= \Bigl|\int_0^\infty(f(x+t)+f(x-t)-2c)\psi_\sigma(t)\,dt\Bigr|\\ &\le \int_0^\infty|f(x+t)+f(x-t)-2c|\,\psi_\sigma(t)\,dt = \int_0^\infty\phi(t)\psi_\sigma(t)\,dt.\end{aligned}$$
(b) I should explain why this last integral is finite. Because $f$ is a tempered function, so are the functions $t\mapsto f(x+t)$, $t\mapsto f(x-t)$ (284Ec); of course constant functions are tempered, so $t\mapsto\phi(t) = |f(x+t)+f(x-t)-2c|$ is tempered, and because $\psi_\sigma$ is a rapidly decreasing test function we may apply 284F to see that the product is integrable.

(c) Let $\epsilon>0$. By hypothesis, $\lim_{t\downarrow0}\Phi(t)/t = 0$; let $\delta>0$ be such that $\Phi(t)\le\epsilon t$ for every $t\in[0,\delta]$. Take any $\sigma\in\,]0,\delta]$. I break the integral $\int_0^\infty\phi(t)\psi_\sigma(t)\,dt$ up into three parts.

(i) For the integral from $0$ to $\sigma$, we have
$$\int_0^\sigma\phi(t)\psi_\sigma(t)\,dt\le\frac{1}{\sigma\sqrt{2\pi}}\int_0^\sigma\phi(t)\,dt = \frac{1}{\sigma\sqrt{2\pi}}\Phi(\sigma)\le\frac{\epsilon\sigma}{\sigma\sqrt{2\pi}}\le\epsilon,$$
because $\psi_\sigma(t)\le\frac{1}{\sigma\sqrt{2\pi}}$ for every $t$.
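(ii) For the integral from $\sigma$ to $\delta$, we have
$$\begin{aligned}\int_\sigma^\delta\phi(t)\psi_\sigma(t)\,dt &\le \frac{1}{\sigma\sqrt{2\pi}}\int_\sigma^\delta\phi(t)\frac{2\sigma^2}{t^2}\,dt\\ &\qquad\text{(because $e^{-t^2/2\sigma^2} = 1/e^{t^2/2\sigma^2}\le1/(t^2/2\sigma^2) = 2\sigma^2/t^2$ for every $t\ne0$)}\\ &= \sigma\sqrt{\frac2\pi}\int_\sigma^\delta\frac{\phi(t)}{t^2}\,dt = \sigma\sqrt{\frac2\pi}\Bigl(\frac{\Phi(\delta)}{\delta^2}-\frac{\Phi(\sigma)}{\sigma^2}+\int_\sigma^\delta\frac{2\Phi(t)}{t^3}\,dt\Bigr)\\ &\qquad\text{(integrating by parts; see 225F)}\\ &\le \sigma\Bigl(\frac\epsilon\delta+\int_\sigma^\delta\frac{2\epsilon}{t^2}\,dt\Bigr)\\ &\qquad\text{(because $\Phi(t)\le\epsilon t$ for $0\le t\le\delta$ and $\sqrt{2/\pi}\le1$)}\\ &\le \sigma\Bigl(\frac\epsilon\delta+\frac{2\epsilon}{\sigma}\Bigr)\le3\epsilon.\end{aligned}$$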
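(iii) For the integral from $\delta$ to $\infty$, we have
$$\int_\delta^\infty\phi(t)\psi_\sigma(t)\,dt = \frac{1}{\sqrt{2\pi}}\int_\delta^\infty\phi(t)\frac1\sigma e^{-t^2/2\sigma^2}\,dt.$$
Now for any $t\ge\delta$,
$$\sigma\mapsto\frac1\sigma e^{-t^2/2\sigma^2}:\,]0,\delta]\to\mathbb R$$
is monotonically increasing, because its derivative
$$\frac{d}{d\sigma}\Bigl(\frac1\sigma e^{-t^2/2\sigma^2}\Bigr) = \frac{1}{\sigma^2}\Bigl(\frac{t^2}{\sigma^2}-1\Bigr)e^{-t^2/2\sigma^2}$$
is non-negative for $\sigma\le\delta\le t$, and
$$\lim_{\sigma\downarrow0}\frac1\sigma e^{-t^2/2\sigma^2} = \lim_{a\to\infty}a\,e^{-a^2t^2/2} = 0.$$
So we may apply Lebesgue’s Dominated Convergence Theorem, with dominating function $t\mapsto\phi(t)\frac1\delta e^{-t^2/2\delta^2} = \sqrt{2\pi}\,\phi(t)\psi_\delta(t)$ (integrable by (b)), to see that
$$\lim_{n\to\infty}\int_\delta^\infty\phi(t)\frac{1}{\sigma_n}e^{-t^2/2\sigma_n^2}\,dt = 0$$
whenever $\langle\sigma_n\rangle_{n\in\mathbb N}$ is a sequence in $]0,\delta]$ converging to $0$, so that
$$\lim_{\sigma\downarrow0}\int_\delta^\infty\phi(t)\frac1\sigma e^{-t^2/2\sigma^2}\,dt = 0.$$
There must therefore be a $\sigma_0\in\,]0,\delta]$ such that $\int_\delta^\infty\phi(t)\psi_\sigma(t)\,dt\le\epsilon$ for every $\sigma\le\sigma_0$. Putting these together, we see that
$$|(f*\psi_\sigma)(x)-c|\le\int_0^\infty\phi(t)\psi_\sigma(t)\,dt\le\epsilon+3\epsilon+\epsilon = 5\epsilon$$
whenever $0<\sigma\le\sigma_0$. As $\epsilon$ is arbitrary, $\lim_{\sigma\downarrow0}(f*\psi_\sigma)(x) = c$, as claimed.

284M Theorem Let $f$ and $g$ be tempered functions such that $g$ represents the Fourier transform of $f$. Then

(a)(i) $g(y) = \lim_{\epsilon\downarrow0}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}e^{-\epsilon x^2}f(x)\,dx$ for almost every $y\in\mathbb R$.

(ii) If $y\in\mathbb R$ is such that $a = \lim_{t\in\operatorname{dom}g,\,t\uparrow y}g(t)$ and $b = \lim_{t\in\operatorname{dom}g,\,t\downarrow y}g(t)$ are both defined in $\mathbb C$, then
$$\lim_{\epsilon\downarrow0}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}e^{-\epsilon x^2}f(x)\,dx = \tfrac12(a+b).$$

(b)(i) $f(x) = \lim_{\epsilon\downarrow0}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{ixy}e^{-\epsilon y^2}g(y)\,dy$ for almost every $x\in\mathbb R$.

(ii) If $x\in\mathbb R$ is such that $a = \lim_{t\in\operatorname{dom}f,\,t\uparrow x}f(t)$ and $b = \lim_{t\in\operatorname{dom}f,\,t\downarrow x}f(t)$ are both defined in $\mathbb C$, then
$$\lim_{\epsilon\downarrow0}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{ixy}e^{-\epsilon y^2}g(y)\,dy = \tfrac12(a+b).$$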
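Part (b)(ii) can be watched numerically at a jump. With the illustrative pair $f = \chi[-1,1]$ and $g(y) = \sqrt{2/\pi}\,\sin y/y$, take $x = 1$, where $a = 1$ and $b = 0$; the Gaussian-regularized inverse integral should tend to $\frac12$. For this pair the regularized integral has the closed form $\frac12\bigl(\operatorname{erf}\frac{1+x}{2\sqrt\epsilon}+\operatorname{erf}\frac{1-x}{2\sqrt\epsilon}\bigr)$, which the Python sketch below uses as a cross-check; the helper names and the truncation at $\sqrt{30/\epsilon}$ are assumptions of the sketch.

```python
import math

def simpson(func, a, b, n=20000):
    """Composite Simpson rule for the integral of func over [a, b] (n made even)."""
    n += n % 2
    step = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * step)
    return total * step / 3

def g(y):
    """Transform of f = indicator of [-1, 1] (1/sqrt(2*pi) convention)."""
    return math.sqrt(2 / math.pi) * (math.sin(y) / y if y != 0 else 1.0)

def gauss_inverse(x, eps, n=40000):
    """(1/sqrt(2 pi)) * integral of e^{ixy} e^{-eps y^2} g(y) dy.

    g is real and even, so the sine part cancels and the integrand is even in y;
    the Gaussian factor makes the tail beyond sqrt(30/eps) negligible.
    """
    L = math.sqrt(30 / eps)
    integrand = lambda y: math.cos(x * y) * math.exp(-eps * y * y) * g(y)
    return 2 * simpson(integrand, 0.0, L, n) / math.sqrt(2 * math.pi)

def closed_form(x, eps):
    """Exact value of gauss_inverse(x, eps) for this particular pair f, g."""
    r = 2 * math.sqrt(eps)
    return 0.5 * (math.erf((1 + x) / r) + math.erf((1 - x) / r))

for x in (0.0, 1.0, 2.0):
    # Expected limits: f(0) = 1, the midpoint value 1/2 at the jump, f(2) = 0.
    print(x, gauss_inverse(x, 1e-3), closed_form(x, 1e-3))
```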
proof (a)(i) By 223D,
$$\lim_{\delta\downarrow0}\frac{1}{2\delta}\int_{-\delta}^{\delta}|g(y+t)-g(y)|\,dt = 0$$
for almost every $y\in\mathbb R$, because $g$ is integrable over any bounded interval. Fix any such $y$. Set $\phi(t) = |g(y+t)+g(y-t)-2g(y)|$ whenever this is defined. Then, as in 282Ia,
$$\int_0^\delta\phi\le\int_{-\delta}^{\delta}|g(y+t)-g(y)|\,dt,$$
so $\lim_{\delta\downarrow0}\frac1\delta\int_0^\delta\phi = 0$. Consequently, by 284L, $g(y) = \lim_{\sigma\to\infty}(g*\psi_{1/\sigma})(y)$. We know from 283N that the Fourier transform of $\psi_\sigma$ is $\frac1\sigma\psi_{1/\sigma}$ for any $\sigma>0$. Accordingly, by 284K, $g*\psi_{1/\sigma}$ is the Fourier transform of $\sigma\sqrt{2\pi}\,f\times\psi_\sigma$, that is,
$$(g*\psi_{1/\sigma})(y) = \int_{-\infty}^{\infty}e^{-iyx}\sigma\psi_\sigma(x)f(x)\,dx.$$
So
$$\begin{aligned}g(y) &= \lim_{\sigma\to\infty}\int_{-\infty}^{\infty}e^{-iyx}\sigma\psi_\sigma(x)f(x)\,dx = \lim_{\sigma\to\infty}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}e^{-x^2/2\sigma^2}f(x)\,dx\\ &= \lim_{\epsilon\downarrow0}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}e^{-\epsilon x^2}f(x)\,dx.\end{aligned}$$
And this is true for almost every $y$.

(ii) Again, setting $c = \frac12(a+b)$, $\phi(t) = |g(y+t)+g(y-t)-2c|$ whenever this is defined, we have $\lim_{t\in\operatorname{dom}\phi,\,t\downarrow0}\phi(t) = 0$, so of course $\lim_{\delta\downarrow0}\frac1\delta\int_0^\delta\phi = 0$, and
$$c = \lim_{\sigma\to\infty}(g*\psi_{1/\sigma})(y) = \lim_{\epsilon\downarrow0}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}e^{-\epsilon x^2}f(x)\,dx$$
as before.

(b) This can be shown by similar arguments; or it may be actually deduced from (a), by observing that $x\mapsto f^*(x) = f(-x)$ represents the Fourier transform of $g$ (see 284Ie), and applying (a) to $g$ and $f^*$.

284N $L^2$ spaces We are now ready for results corresponding to 282J-282K.

Lemma Let $\mathcal L^2_{\mathbb C}$ be the space of square-integrable complex-valued functions on $\mathbb R$, and $S$ the space of rapidly decreasing test functions. Then for every $f\in\mathcal L^2_{\mathbb C}$ and $\epsilon>0$ there is an $h\in S$ such that $\|f-h\|_2\le\epsilon$.

proof Set $\phi(x) = e^{-1/x}$ for $x>0$, zero for $x\le0$; recall from the proof of 284G that $\phi$ is smooth. For any $a<b$, the functions $x\mapsto\psi_n(x) = \phi(n(x-a))\phi(n(b-x))$ provide a sequence of test functions converging to $\chi\,]a,b[$ from below, so (as in 284G)
$$\inf_{h\in S}\|\chi\,]a,b[\,-h\|_2^2\le\lim_{n\to\infty}\int_a^b|1-\psi_n(x)|^2\,dx = 0.$$
Because $S$ is a linear space (284Ba), it follows that for every step-function $g$ with bounded support and every $\epsilon>0$ there is an $h\in S$ such that $\|g-h\|_2\le\frac12\epsilon$. But we know from 244H that for every $f\in\mathcal L^2_{\mathbb C}$ and $\epsilon>0$ there is a step-function $g$ with bounded support such that $\|f-g\|_2\le\frac12\epsilon$; so there must be an $h\in S$ such that $\|f-h\|_2\le\|f-g\|_2+\|g-h\|_2\le\epsilon$. As $f$ and $\epsilon$ are arbitrary, we have the result.

284O Theorem (a) Let $f$ be any complex-valued function which is square-integrable over $\mathbb R$. Then $f$ is a tempered function and its Fourier transform is represented by another square-integrable function $g$, and $\|g\|_2 = \|f\|_2$.

(b) If $f_1$ and $f_2$ are complex-valued functions, square-integrable over $\mathbb R$, with Fourier transforms represented by functions $g_1$, $g_2$, then
$$\int_{-\infty}^{\infty}f_1(x)\overline{f_2(x)}\,dx = \int_{-\infty}^{\infty}g_1(y)\overline{g_2(y)}\,dy.$$

(c) If $f_1$ and $f_2$ are complex-valued functions, square-integrable over $\mathbb R$, with Fourier transforms represented by functions $g_1$, $g_2$, then the integrable function $f_1\times f_2$ has Fourier transform $\frac{1}{\sqrt{2\pi}}g_1*g_2$.
(d) If $f_1$ and $f_2$ are complex-valued functions, square-integrable over $\mathbb R$, with Fourier transforms represented by functions $g_1$, $g_2$, then $\sqrt{2\pi}\,g_1\times g_2$ represents the Fourier transform of the continuous function $f_1*f_2$.

proof (a)(i) Consider first the case in which $f$ is a rapidly decreasing test function and $g$ is its Fourier transform; we know that $g$ is also a rapidly decreasing test function, and that $f$ is the inverse Fourier transform of $g$ (284C). Now the complex conjugate $\bar g$ of $g$ is given by the formula
$$\bar g(y) = \overline{\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}f(x)\,dx} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{iyx}\bar f(x)\,dx,$$
so that $\bar g$ is the inverse Fourier transform of $\bar f$. Accordingly
$$\int f\times\bar f = \int\check g\times\bar f = \int g\times(\bar f)^\vee = \int g\times\bar g,$$
using 283O for the middle equality.

(ii) Now suppose that $f\in\mathcal L^2_{\mathbb C}$. I said that $f$ is a tempered function; this is simply because
$$\int_{-\infty}^{\infty}\Bigl(\frac{1}{1+|x|}\Bigr)^2dx<\infty,\quad\text{so}\quad\int_{-\infty}^{\infty}\frac{|f(x)|}{1+|x|}\,dx<\infty$$
(244Eb). By 284N, there is a sequence $\langle f_n\rangle_{n\in\mathbb N}$ of rapidly decreasing test functions such that $\lim_{n\to\infty}\|f-f_n\|_2 = 0$. By (i),
$$\lim_{m,n\to\infty}\|\hat f_m-\hat f_n\|_2 = \lim_{m,n\to\infty}\|f_m-f_n\|_2 = 0,$$
and the sequence $\langle\hat f_n^\bullet\rangle_{n\in\mathbb N}$ of equivalence classes is a Cauchy sequence in $L^2_{\mathbb C}$. Because $L^2_{\mathbb C}$ is complete (244G), $\langle\hat f_n^\bullet\rangle_{n\in\mathbb N}$ has a limit in $L^2_{\mathbb C}$, which is representable as $g^\bullet$ for some $g\in\mathcal L^2_{\mathbb C}$. Like $f$, $g$ must be a tempered function. Of course
$$\|g\|_2 = \lim_{n\to\infty}\|\hat f_n\|_2 = \lim_{n\to\infty}\|f_n\|_2 = \|f\|_2.$$
Now if $h$ is any rapidly decreasing test function, $h\in\mathcal L^2_{\mathbb C}$ (284Bc), so we shall have
$$\int g\times h = \lim_{n\to\infty}\int\hat f_n\times h = \lim_{n\to\infty}\int f_n\times\hat h = \int f\times\hat h.$$
So $g$ represents the Fourier transform of $f$.

(b) Of course any functions representing the Fourier transforms of $f_1$ and $f_2$ must be equal almost everywhere to square-integrable functions, and therefore square-integrable, with the right norms. It follows as in 282K (part (d) of the proof) that if $g_1$, $g_2$ represent the Fourier transforms of $f_1$, $f_2$, so that $ag_1+bg_2$ represents the Fourier transform of $af_1+bf_2$ and $\|ag_1+bg_2\|_2 = \|af_1+bf_2\|_2$ for all $a,b\in\mathbb C$, we must have
$$\int f_1\times\bar f_2 = (f_1|f_2) = (g_1|g_2) = \int g_1\times\bar g_2.$$

(c) Of course $f_1\times f_2$ is integrable because it is the product of two square-integrable functions (244E).

(i) Let $y\in\mathbb R$ and set $f(x) = \overline{f_2(x)}e^{iyx}$ for $x\in\mathbb R$. Then $f\in\mathcal L^2_{\mathbb C}$. We need to know that the Fourier transform of $f$ is represented by $g$, where $g(u) = \overline{g_2(y-u)}$.

P P Let $h$ be a rapidly decreasing test function. Then
$$\int g\times h = \int\overline{g_2(y-u)}h(u)\,du = \int\overline{g_2(u)}h(y-u)\,du = \overline{\int g_2\times h_1} = \overline{\int f_2\times\hat h_1},$$
where $h_1(u) = \overline{h(y-u)}$. To compute $\hat h_1$, we have
$$\hat h_1(v) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-ivu}h_1(u)\,du = \overline{\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{ivu}h(y-u)\,du} = \overline{\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{iv(y-u)}h(u)\,du} = \overline{e^{ivy}\hat h(v)}.$$
So
$$\int g\times h = \overline{\int f_2\times\hat h_1} = \overline{\int f_2(v)\overline{e^{ivy}\hat h(v)}\,dv} = \int\overline{f_2(v)}e^{ivy}\hat h(v)\,dv = \int f\times\hat h:$$
as $h$ is arbitrary, $g$ represents the Fourier transform of $f$. Q Q

(ii) We now have
$$\begin{aligned}(f_1\times f_2)^\wedge(y) &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}f_1(x)f_2(x)\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f_1(x)\overline{f(x)}\,dx\\ &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}g_1(u)\overline{g(u)}\,du\quad\text{(using part (b))}\\ &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}g_1(u)g_2(y-u)\,du = \frac{1}{\sqrt{2\pi}}(g_1*g_2)(y).\end{aligned}$$
As $y$ is arbitrary, $(f_1\times f_2)^\wedge = \frac{1}{\sqrt{2\pi}}g_1*g_2$, as claimed.

(d) By (c), the Fourier transform of $\sqrt{2\pi}\,g_1\times g_2$ is $f_1^* * f_2^*$, writing $f_1^*(x) = f_1(-x)$ (and similarly for $f_2$), so that $f_1^*$ represents the Fourier transform of $g_1$. So the inverse Fourier transform of $\sqrt{2\pi}\,g_1\times g_2$ is $(f_1^* * f_2^*)^*$. But, just as in the proof of 284Kb, $(f_1^* * f_2^*)^* = f_1*f_2$, so $f_1*f_2$ is the inverse Fourier transform of $\sqrt{2\pi}\,g_1\times g_2$, and $\sqrt{2\pi}\,g_1\times g_2$ represents the Fourier transform of $f_1*f_2$, as claimed. Also $f_1*f_2$, being the Fourier transform of an integrable function, is continuous (283Cf; see also 255K).

284P Corollary Writing $L^2_{\mathbb C}$ for the Hilbert space of equivalence classes of square-integrable complex-valued functions on $\mathbb R$, we have a linear isometry $T:L^2_{\mathbb C}\to L^2_{\mathbb C}$ given by saying that $T(f^\bullet) = g^\bullet$ whenever $f,g\in\mathcal L^2_{\mathbb C}$ and $g$ represents the Fourier transform of $f$.

284Q Remarks (a) 284P corresponds, of course, to 282K, where the similar isometry between $\ell^2_{\mathbb C}(\mathbb Z)$ and $L^2_{\mathbb C}(\,]-\pi,\pi]\,)$ is described. In that case there was a marked asymmetry which is absent from the present situation; because the relevant measure on $\mathbb Z$, counting measure, gives non-zero mass to every point, members of $\ell^2_{\mathbb C}$ are true functions, and it is not surprising that we have a straightforward formula for $S(f^\bullet)\in\ell^2_{\mathbb C}$ for every $f\in\mathcal L^2_{\mathbb C}(\,]-\pi,\pi]\,)$. The difficulty of describing $S^{-1}:\ell^2_{\mathbb C}(\mathbb Z)\to L^2_{\mathbb C}(\,]-\pi,\pi]\,)$ is very similar to the difficulty of describing $T:L^2_{\mathbb C}(\mathbb R)\to L^2_{\mathbb C}(\mathbb R)$ and its inverse. 284Yg and 286U-286V show just how close this similarity is.

(b) I have spelt out parts (c) and (d) of 284O in detail, perhaps in unnecessary detail, because they give me an opportunity to insist on the difference between ‘$\sqrt{2\pi}\,g_1\times g_2$ represents the Fourier transform of $f_1*f_2$’ and ‘$\frac{1}{\sqrt{2\pi}}g_1*g_2$ is the Fourier transform of $f_1\times f_2$’. The actual functions $g_1$ and $g_2$ are not well-defined by the hypothesis that they represent the Fourier transforms of $f_1$ and $f_2$, though their equivalence classes $g_1^\bullet$, $g_2^\bullet\in L^2_{\mathbb C}$ are. So the product $g_1\times g_2$ is also not uniquely defined as a function, though its equivalence class $(g_1\times g_2)^\bullet = g_1^\bullet\times g_2^\bullet$ is well-defined as a member of $L^1_{\mathbb C}$. However the continuous function $g_1*g_2$ is unaffected by changes to $g_1$ and $g_2$ on negligible sets, so is well defined as a function; and since $f_1\times f_2$ is integrable, and has a true Fourier transform, it is to be expected that $(f_1\times f_2)^\wedge$ should be exactly equal to $\frac{1}{\sqrt{2\pi}}g_1*g_2$.

(c) Of course 284Oc-284Od also exhibit a characteristic feature of arguments involving Fourier transforms, the extension by continuity of relations valid for test functions.
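As a concrete instance of 284Oa, take the illustrative function $f(x) = e^{-|x|}$, whose transform is $\hat f(y) = \sqrt{2/\pi}/(1+y^2)$; both have squared $L^2$-norm $1$. The Python sketch below computes $\hat f$ at a few points by quadrature and compares the two squared norms. The truncation points (40 for $f$, 200 for $\hat f$, where the neglected tail is about $10^{-7}$) and the helper names are assumptions of the sketch.

```python
import math

def simpson(func, a, b, n=20000):
    """Composite Simpson rule for the integral of func over [a, b] (n made even)."""
    n += n % 2
    step = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * step)
    return total * step / 3

def f_hat_numeric(y):
    """(1/sqrt(2 pi)) * integral of e^{-iyx} e^{-|x|} dx, by quadrature.

    f is even, so this is (2/sqrt(2 pi)) * integral_0^inf cos(yx) e^{-x} dx,
    truncated at x = 40 where e^{-x} is about 4e-18.
    """
    integral = simpson(lambda x: math.cos(y * x) * math.exp(-x), 0.0, 40.0)
    return 2 * integral / math.sqrt(2 * math.pi)

def f_hat_exact(y):
    return math.sqrt(2 / math.pi) / (1 + y * y)

norm_f_sq = 2 * simpson(lambda x: math.exp(-2 * x), 0.0, 40.0)        # ||f||_2^2
norm_g_sq = 2 * simpson(lambda y: f_hat_exact(y) ** 2, 0.0, 200.0)   # ||f_hat||_2^2
print(norm_f_sq, norm_g_sq)
```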
(d) 284Oa is a version of Plancherel’s theorem. The formula $\|\hat f\|_2 = \|f\|_2$ is Parseval’s identity.

284R Dirac’s delta function Consider the tempered function $\mathbf 1$ with constant value 1. In what sense, if any, can we assign a Fourier transform to $\mathbf 1$? If we examine $\int\mathbf 1\times\hat h$, as suggested in 284H, we get
$$\int_{-\infty}^{\infty}\mathbf 1\times\hat h = \int_{-\infty}^{\infty}\hat h = \sqrt{2\pi}\,(\hat h)^\vee(0) = \sqrt{2\pi}\,h(0)$$
for every rapidly decreasing test function $h$. Of course there is no function $g$ such that $\int g\times h = \sqrt{2\pi}\,h(0)$ for every rapidly decreasing test function $h$, since (using the arguments of 284G) we should have to have $\int_a^bg = \sqrt{2\pi}$ whenever $a<0<b$, so that the indefinite integral of $g$ could not be continuous at $0$. However there is a measure on $\mathbb R$ with exactly the right property. If we set $\delta_0E = 1$ when $0\in E\subseteq\mathbb R$, $0$ when $0\notin E\subseteq\mathbb R$, then $\delta_0$ is a (Radon) probability measure; it is indeed a ‘distribution’ in the sense of 271C, and $\int h\,d\delta_0 = h(0)$ for every function $h$ defined at $0$. So we shall have
$$\int_{-\infty}^{\infty}\mathbf 1\times\hat h = \sqrt{2\pi}\int h\,d\delta_0$$
for every rapidly decreasing test function $h$, and we can reasonably say that the measure $\nu = \sqrt{2\pi}\,\delta_0$ ‘represents the Fourier transform of $\mathbf 1$’. We note with pleasure at this point that
$$\frac{1}{\sqrt{2\pi}}\int e^{ixy}\,\nu(dy) = 1$$
for every $x\in\mathbb R$, so that $\mathbf 1$ can be called the inverse Fourier transform of $\nu$. If we look at the formulae of Theorem 284M, we get ideas consistent with this pairing of $\mathbf 1$ with $\nu$. We have
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}e^{-\epsilon x^2}\mathbf 1(x)\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}e^{-\epsilon x^2}\,dx = \frac{1}{\sqrt{2\epsilon}}e^{-y^2/4\epsilon}$$
for every $y\in\mathbb R$, using 283N with $\sigma = 1/\sqrt{2\epsilon}$. So
$$\lim_{\epsilon\downarrow0}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}e^{-\epsilon x^2}\mathbf 1(x)\,dx = 0$$
for every $y\ne0$, and the Fourier transform of $\mathbf 1$ should be zero everywhere except at $0$. On the other hand, the functions $y\mapsto\frac{1}{\sqrt{2\epsilon}}e^{-y^2/4\epsilon}$ all have integral $\sqrt{2\pi}$, concentrated more and more closely about $0$ as $\epsilon$ decreases to $0$, and so they also point us directly to $\nu$, the measure which gives mass $\sqrt{2\pi}$ to $0$. Thus allowing measures, as well as functions, enables us to extend the notion of Fourier transform. Of course we can go very much farther than this. If $h$ is any rapidly decreasing test function, then
$$\int_{-\infty}^{\infty}x\hat h(x)\,dx = -i\sqrt{2\pi}\,h'(0),$$
so that the identity function $x\mapsto x$ can be assigned, as a Fourier transform, the operator $h\mapsto-i\sqrt{2\pi}\,h'(0)$. At this point we are entering the true theory of (Schwartzian) distributions or ‘generalized functions’, and I had better stop. The ‘Dirac delta function’ is most naturally regarded as the measure $\delta_0$ above; alternatively, as $\frac{1}{\sqrt{2\pi}}\hat{\mathbf 1}$.
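The computation in 284R with the kernel $e^{-\epsilon x^2}$ is easy to reproduce. The Python sketch below, a minimal illustration with hypothetical helper names and truncation choices, evaluates $\frac{1}{\sqrt{2\pi}}\int e^{-iyx}e^{-\epsilon x^2}\,dx$ by quadrature, compares it with $\frac{1}{\sqrt{2\epsilon}}e^{-y^2/4\epsilon}$, and checks that the total mass of the latter is $\sqrt{2\pi}$ for each $\epsilon$, so that it concentrates towards the measure $\sqrt{2\pi}\,\delta_0$ as $\epsilon\downarrow0$.

```python
import math

def simpson(func, a, b, n=20000):
    """Composite Simpson rule for the integral of func over [a, b] (n made even)."""
    n += n % 2
    step = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * step)
    return total * step / 3

def gauss_transform(y, eps):
    """(1/sqrt(2 pi)) * integral over R of e^{-iyx} e^{-eps x^2} dx, by quadrature.

    The imaginary part of the integrand is odd, so only the cosine part is
    integrated, over [0, sqrt(40/eps)], beyond which e^{-eps x^2} < e^{-40}.
    """
    L = math.sqrt(40 / eps)
    integral = simpson(lambda x: math.cos(y * x) * math.exp(-eps * x * x), 0.0, L)
    return 2 * integral / math.sqrt(2 * math.pi)

def closed_form(y, eps):
    """(1 / sqrt(2 eps)) * exp(-y^2 / (4 eps)), the value given by 283N."""
    return math.exp(-y * y / (4 * eps)) / math.sqrt(2 * eps)

def total_mass(eps):
    """Integral over R of y -> closed_form(y, eps); it should equal sqrt(2 pi)."""
    w = 20 * math.sqrt(eps)  # beyond w the integrand is below e^{-100} times its peak
    return simpson(lambda y: closed_form(y, eps), -w, w)

for eps in (1.0, 0.1, 0.01):
    print(eps, gauss_transform(0.5, eps), closed_form(0.5, eps), total_mass(eps))
```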
284W The multidimensional case As in §283, I give exercises designed to point the way to the $r$-dimensional generalization.

(a) A rapidly decreasing test function on $\mathbb R^r$ is a function $h:\mathbb R^r\to\mathbb C$ such that
(i) $h$ is smooth, that is, all repeated partial derivatives $\frac{\partial^mh}{\partial\xi_{j_1}\ldots\partial\xi_{j_m}}$ are defined and continuous everywhere in $\mathbb R^r$;
(ii) $\sup_{x\in\mathbb R^r}\|x\|^k|h(x)|<\infty$, $\sup_{x\in\mathbb R^r}\|x\|^k\Bigl|\frac{\partial^mh}{\partial\xi_{j_1}\ldots\partial\xi_{j_m}}(x)\Bigr|$
0 there is a rapidly decreasing test function $h$ such that $\|f-h\|_2\le\epsilon$.

(i) Let $\mathcal L^2_{\mathbb C}$ be the space of square-integrable complex-valued functions on $\mathbb R^r$. Show that (i) for every $f\in\mathcal L^2_{\mathbb C}$ there is a $g\in\mathcal L^2_{\mathbb C}$ which represents the Fourier transform of $f$, and in this case $\|g\|_2 = \|f\|_2$; (ii) if $g_1,g_2\in\mathcal L^2_{\mathbb C}$ represent the Fourier transforms of $f_1,f_2\in\mathcal L^2_{\mathbb C}$, then $\frac{1}{(\sqrt{2\pi})^r}g_1*g_2$ is the Fourier transform of $f_1\times f_2$, and $(\sqrt{2\pi})^rg_1\times g_2$ represents the Fourier transform of $f_1*f_2$.

284X Basic exercises (a) Show that if $g$ and $h$ are rapidly decreasing test functions, so is $g\times h$.

(b) Show that there are non-zero continuous integrable functions $f,g:\mathbb R\to\mathbb C$ such that $f*g = 0$ everywhere. (Hint: take them to be Fourier transforms of suitable test functions.)

(c) Suppose that $f:\mathbb R\to\mathbb C$ is a differentiable function such that its derivative $f'$ is a tempered function and, for some $k\in\mathbb N$,
$$\lim_{x\to\infty}x^{-k}f(x) = \lim_{x\to-\infty}x^{-k}f(x) = 0.$$
(i) Show that $\int f\times h' = -\int f'\times h$ for every rapidly decreasing test function $h$. (ii) Show that if $g$ is a tempered function representing the Fourier transform of $f$, then $y\mapsto iyg(y)$ represents the Fourier transform of $f'$.

(d) Show that if $h$ is a rapidly decreasing test function and $f$ is any measurable complex-valued function, defined almost everywhere on $\mathbb R$, such that $\int_{-\infty}^{\infty}|x|^k|f(x)|\,dx<\infty$ for every $k\in\mathbb N$, then the convolution $f*h$ is a rapidly decreasing test function. (Hint: show that the Fourier transform of $f*h$ is a test function.)

> (e) Let $f$ be a tempered function such that $\lim_{a\to\infty}\int_{-a}^af(x)\,dx$ exists in $\mathbb C$. Show that this limit is also equal to $\lim_{\epsilon\downarrow0}\int_{-\infty}^{\infty}e^{-\epsilon x^2}f(x)\,dx$. (Hint: set $g(x) = f(x)+f(-x)$. Use 224J to show that if $0\le a\le b$ then $\bigl|\int_a^bg(x)e^{-\epsilon x^2}dx\bigr|\le\sup_{c\in[a,b]}\bigl|\int_a^cg\bigr|$, so that $\lim_{a\to\infty}\int_0^ag(x)e^{-\epsilon x^2}dx$ exists uniformly in $\epsilon$, while $\lim_{\epsilon\downarrow0}\int_0^ag(x)e^{-\epsilon x^2}dx = \int_0^ag(x)\,dx$ for every $a\ge0$.)

> (f) Let $f$ and $g$ be tempered functions on $\mathbb R$ such that $g$ represents the Fourier transform of $f$. Show that
$$g(y) = \lim_{a\to\infty}\frac{1}{\sqrt{2\pi}}\int_{-a}^ae^{-iyx}f(x)\,dx$$
at almost all points $y$ for which the limit exists. (Hint: 284Xe, 284M.)

> (g) Let $f$ be an integrable complex-valued function on $\mathbb R$ such that $\hat f$ is also integrable. Show that $(\hat f)^\vee = f$ at any point at which $f$ is continuous.

(h) Show that for every $p\in[1,\infty[$, $f\in\mathcal L^p_{\mathbb C}$ and $\epsilon>0$ there is a rapidly decreasing test function $h$ such that $\|f-h\|_p\le\epsilon$.
> (i) Let $f$ and $g$ be square-integrable complex-valued functions on $\mathbb R$ such that $g$ represents the Fourier transform of $f$. Show that
$$\int_c^df(x)\,dx = \frac{i}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\frac{e^{icy}-e^{idy}}{y}g(y)\,dy$$
whenever $c<d$ in $\mathbb R$.

(j) Let $f$ be a measurable complex-valued function, defined almost everywhere in $\mathbb R$, such that $\int|f|^p<\infty$, where $1<p\le2$. Show that $f$ is a tempered function and that there is a tempered function $g$ representing the Fourier transform of $f$. (Hint: express $f$ as $f_1+f_2$, where $f_1$ is integrable and $f_2$ is square-integrable.) (Remark Defining $\|f\|_p$, $\|g\|_q$ as in 244D, where $q = p/(p-1)$, we have $\|g\|_q\le(2\pi)^{(p-2)/2p}\|f\|_p$; see Zygmund 59, XVI.3.2.)

(k) Let $f$, $g$ be square-integrable complex-valued functions on $\mathbb R$ such that $g$ represents the Fourier transform of $f$.
(i) Show that
$$\frac{1}{\sqrt{2\pi}}\int_{-a}^ae^{ixy}g(y)\,dy = \frac1\pi\int_{-\infty}^{\infty}\frac{\sin at}{t}f(x-t)\,dt$$
for every $x\in\mathbb R$, $a>0$. (Hint: find the inverse Fourier transform of $y\mapsto e^{-ixy}\chi[-a,a](y)$, and use 284Ob.)
(ii) Show that if $f(x) = 0$ for $x\in\,]c,d[$ then
$$\lim_{a\to\infty}\frac{1}{\sqrt{2\pi}}\int_{-a}^ae^{ixy}g(y)\,dy = 0$$
for $x\in\,]c,d[$.
(iii) Show that if $f$ is differentiable at $x\in\mathbb R$, then
$$\lim_{a\to\infty}\frac{1}{\sqrt{2\pi}}\int_{-a}^ae^{ixy}g(y)\,dy = f(x).$$
(iv) Show that if $f$ has bounded variation over some interval properly containing $x$, then
$$\lim_{a\to\infty}\frac{1}{\sqrt{2\pi}}\int_{-a}^ae^{ixy}g(y)\,dy = \frac12\Bigl(\lim_{t\in\operatorname{dom}f,\,t\uparrow x}f(t)+\lim_{t\in\operatorname{dom}f,\,t\downarrow x}f(t)\Bigr).$$
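The formula in 284Xi can be sanity-checked numerically for one illustrative pair, $f(x) = e^{-|x|}$ and $g(y) = \sqrt{2/\pi}/(1+y^2)$. Because $g$ is real and even, the right-hand side reduces to $\frac{2}{\sqrt{2\pi}}\int_0^\infty\frac{\sin dy-\sin cy}{y}g(y)\,dy$. The Python sketch below is a plausibility check under stated assumptions (truncation of the $y$-integral at 1000, composite Simpson rule, hypothetical helper names), not a solution of the exercise.

```python
import math

def simpson(func, a, b, n=20000):
    """Composite Simpson rule for the integral of func over [a, b] (n made even)."""
    n += n % 2
    step = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * step)
    return total * step / 3

def g(y):
    """Transform of f(x) = exp(-|x|) in the 1/sqrt(2*pi) convention."""
    return math.sqrt(2 / math.pi) / (1 + y * y)

def rhs(c, d, L=1000.0, n=200000):
    """(i / sqrt(2 pi)) * integral over R of (e^{icy} - e^{idy}) / y * g(y) dy.

    The imaginary part cancels (odd integrand), leaving
    (2 / sqrt(2 pi)) * integral_0^inf (sin(dy) - sin(cy)) / y * g(y) dy.
    The integrand decays like 1/y^3; truncating at L costs at most about 0.64 / L^2.
    """
    def integrand(y):
        if y == 0.0:
            return (d - c) * g(0.0)  # limit of (sin(dy) - sin(cy)) / y as y -> 0
        return (math.sin(d * y) - math.sin(c * y)) / y * g(y)
    return 2 * simpson(integrand, 0.0, L, n) / math.sqrt(2 * math.pi)

def lhs(c, d):
    """Exact integral of exp(-|x|) over [c, d], valid when c <= 0 <= d."""
    return (1 - math.exp(c)) + (1 - math.exp(-d))

print(lhs(-0.5, 2.0), rhs(-0.5, 2.0))
```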
(l) Let $f$ be an integrable complex function on $\mathbb R$. Show that if $f$ is square-integrable, so is $\hat f$.

(m) Let $f_1$, $f_2$ be square-integrable complex-valued functions on $\mathbb R$ with Fourier transforms represented by $g_1$, $g_2$. Show that $\int_{-\infty}^{\infty}f_1(t)f_2(-t)\,dt = \int_{-\infty}^{\infty}g_1(t)g_2(t)\,dt$.

(n) Write $\delta_x$ for the measure on $\mathbb R$ which assigns a mass of 1 to the point $x$ and a mass of 0 to the rest of $\mathbb R$. Describe a sense in which $\sqrt{2\pi}\,\delta_x$ can be regarded as the Fourier transform of the function $t\mapsto e^{ixt}$.

(o) For any tempered function $f$ and $x\in\mathbb R$, define $\delta_x$ as in 284Xn, and set
$$(\delta_x*f)(u) = \int f(u-t)\,\delta_x(dt) = f(u-x)$$
for every $u$ for which $u-x\in\operatorname{dom}f$ (cf. 257Xe). If $g$ represents the Fourier transform of $f$, find a corresponding representation of the Fourier transform of $\delta_x*f$, and relate it to the product of $g$ with the Fourier transform of $\delta_x$.

(p)(i) Show that
$$\lim_{\delta\downarrow0,\,a\to\infty}\Bigl(\int_{-a}^{-\delta}\frac1xe^{-iyx}\,dx+\int_\delta^a\frac1xe^{-iyx}\,dx\Bigr) = -\pi i\operatorname{sgn}y$$
for every $y\in\mathbb R$, writing $\operatorname{sgn}y = y/|y|$ if $y\ne0$ and $\operatorname{sgn}0 = 0$. (Hint: 283Da.)
(ii) Show that
$$\lim_{c\to\infty}\frac1c\int_0^c\int_{-a}^ae^{ixy}\operatorname{sgn}y\,dy\,da = \frac{2i}{x}$$
for every $x\ne0$.
(iii) Show that for any rapidly decreasing test function $h$,
$$\int_0^\infty\frac1x\bigl(\hat h(x)-\hat h(-x)\bigr)\,dx = \lim_{\delta\downarrow0,\,a\to\infty}\Bigl(\int_{-a}^{-\delta}\frac1x\hat h(x)\,dx+\int_\delta^a\frac1x\hat h(x)\,dx\Bigr) = -\frac{i\pi}{\sqrt{2\pi}}\int_{-\infty}^{\infty}h(y)\operatorname{sgn}y\,dy.$$
(iv) Show that for any rapidly decreasing test function $h$,
$$\frac{i\pi}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\hat h(x)\operatorname{sgn}x\,dx = \int_0^\infty\frac1y\bigl(h(y)-h(-y)\bigr)\,dy.$$
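The limit in 284Xp(i) can be approximated numerically: substituting $x\mapsto-x$ in the first integral folds the two together into $-2i\int_\delta^a\frac{\sin yx}{x}\,dx$, which the Python sketch below evaluates for a few values of $y$ with $\delta = 10^{-6}$ and $a = 1000$ (both assumptions of the sketch, as are the helper names). The truncation error is roughly $2/(a|y|)$, so agreement with $-\pi i\operatorname{sgn}y$ is only to about three decimal places.

```python
import math

def simpson(func, a, b, n=20000):
    """Composite Simpson rule for the integral of func over [a, b] (n made even)."""
    n += n % 2
    step = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * step)
    return total * step / 3

def pv_integral(y, delta=1e-6, a=1000.0, n=200000):
    """int_{-a}^{-delta} e^{-iyx}/x dx + int_{delta}^{a} e^{-iyx}/x dx.

    Folding the first integral onto [delta, a] gives
    int_delta^a (e^{-iyx} - e^{iyx}) / x dx = -2i int_delta^a sin(yx) / x dx.
    """
    s = simpson(lambda x: math.sin(y * x) / x, delta, a, n)
    return complex(0.0, -2.0 * s)

def sgn(y):
    return (y > 0) - (y < 0)

for y in (1.0, -2.5, 0.0):
    print(y, pv_integral(y), complex(0.0, -math.pi * sgn(y)))
```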
(q) Let $\langle h_n\rangle_{n\in\mathbb N}$ be a sequence of rapidly decreasing test functions such that $\phi(f) = \lim_{n\to\infty}\int_{-\infty}^{\infty}h_n\times f$ is defined for every rapidly decreasing test function $f$. Show that $\lim_{n\to\infty}\int_{-\infty}^{\infty}h_n'\times f$, $\lim_{n\to\infty}\int_{-\infty}^{\infty}\hat h_n\times f$ and $\lim_{n\to\infty}\int_{-\infty}^{\infty}(h_n*g)\times f$ are defined for all rapidly decreasing test functions $f$ and $g$, and are zero if $\phi$ is identically zero. (Hint: 255G will help with the last.)

284Y Further exercises (a) Let $f$ be an integrable complex-valued function on $]-\pi,\pi]$, and $\tilde f$ its periodic extension, as in 282Ae. Show that $\tilde f$ is a tempered function. Show that for any rapidly decreasing test function $h$,
$$\int\tilde f\times\hat h = \sqrt{2\pi}\sum_{k=-\infty}^{\infty}c_kh(k),$$
where $\langle c_k\rangle_{k\in\mathbb Z}$ is the sequence of Fourier coefficients of $f$. (Hint: begin with the case $f(x) = e^{inx}$. Next show that
$$M = \sum_{k=-\infty}^{\infty}|h(k)|+\sum_{k=-\infty}^{\infty}\sup_{x\in[(2k-1)\pi,(2k+1)\pi]}|\hat h(x)|<\infty,$$
and that
$$\Bigl|\int\tilde f\times\hat h-\sqrt{2\pi}\sum_{k=-\infty}^{\infty}c_kh(k)\Bigr|\le M\|f\|_1.$$
Finally apply 282Ib.)

(b) Let $f$ be a complex-valued function, defined almost everywhere on $\mathbb R$, such that $f\times h$ is integrable for every rapidly decreasing test function $h$. Show that $f$ is tempered.

(c) Let $f$ and $g$ be tempered functions on $\mathbb R$ such that $g$ represents the Fourier transform of $f$. Show that
$$\int_c^df(x)\,dx = \frac{i}{\sqrt{2\pi}}\lim_{\sigma\to\infty}\int_{-\infty}^{\infty}\frac{e^{icy}-e^{idy}}{y}e^{-y^2/2\sigma^2}g(y)\,dy$$
whenever $c\le d$ in $\mathbb R$. (Hint: set $\theta = \chi[c,d]$. Show that both sides are $\lim_{\sigma\to\infty}\int f\times(\theta*\psi_{1/\sigma})$, defining $\psi_\sigma$ as in 283N.)
(d) Show that if $g:\mathbb R\to\mathbb R$ is an odd function of bounded variation such that $\int_1^\infty\frac1xg(x)\,dx = \infty$, then $g$ does not represent the Fourier transform of any tempered function. (Hint: 283Yd, 284Yc.)

(e) Let $S$ be the space of rapidly decreasing test functions. For $k,m\in\mathbb N$ set $\tau_{km}(h) = \sup_{x\in\mathbb R}|x|^k|h^{(m)}(x)|$ for every $h\in S$, writing $h^{(m)}$ for the $m$th derivative of $h$ as usual. (i) Show that each $\tau_{km}$ is a seminorm (2A5D) and that $S$ is complete and separable for the metrizable linear space topology $\mathfrak T$ they define (2A5B). (ii) Show that $h\mapsto\hat h:S\to S$ is continuous for $\mathfrak T$. (iii) Show that if $f$ is any tempered function, then $h\mapsto\int f\times h$ is $\mathfrak T$-continuous. (iv) Show that if $f$ is an integrable function such that $\int|x^kf(x)|\,dx<\infty$ for every $k\in\mathbb N$, then $h\mapsto f*h:S\to S$ is $\mathfrak T$-continuous.

(f) Show that if $f$ is a tempered function on $\mathbb R$ and
$$\gamma = \lim_{c\to\infty}\frac1c\int_0^c\int_{-a}^af(x)\,dx\,da$$
is defined in $\mathbb C$, then $\gamma$ is also $\lim_{\epsilon\downarrow0}\int_{-\infty}^{\infty}f(x)e^{-\epsilon|x|}\,dx$.

(g) Let $f$, $g$ be square-integrable complex-valued functions on $\mathbb R$ such that $g$ represents the Fourier transform of $f$. Suppose that $m\in\mathbb Z$ and that $(2m-1)\pi<x<(2m+1)\pi$. Set $\tilde f(t) = f(t+2m\pi)$ for those $t\in\,]-\pi,\pi]$ such that $t+2m\pi\in\operatorname{dom}f$. Let $\langle c_k\rangle_{k\in\mathbb Z}$ be the sequence of Fourier coefficients of $\tilde f$. Show that
$$\lim_{a\to\infty}\frac{1}{\sqrt{2\pi}}\int_{-a}^ae^{ixy}g(y)\,dy = \lim_{n\to\infty}\sum_{k=-n}^nc_ke^{ikx}$$
in the sense that if one limit exists in $\mathbb C$ so does the other, and they are then equal. (Hint: 284Xk(i), 282Da.)
(h) Show that if $f$ is integrable over $\mathbb R$ and there is some $M\ge0$ such that $f(x) = \hat f(x) = 0$ for $|x|\ge M$, then $f =_{\text{a.e.}} 0$. (Hint: reduce to the case $M = \pi$. Looking at the Fourier series of $f\restriction\,]-\pi,\pi]$, show that $f$ is expressible in the form $f(x) = \sum_{k=-m}^mc_ke^{ikx}$ for almost every $x\in\,]-\pi,\pi]$. Now compute $\hat f(2n+\frac12)$ for large $n$.)

(i) Let $\nu$ be a Radon measure on $\mathbb R$ (definition: 256A) which is ‘tempered’ in the sense that
$$\int_{-\infty}^{\infty}\frac{1}{1+|x|^k}\,\nu(dx)$$
is finite for some $k\in\mathbb N$. (i) Show that every rapidly decreasing test function is $\nu$-integrable. (ii) Show that if $\nu$ has bounded support (definition: 256Xf), and $h$ is a rapidly decreasing test function, then $\nu*h$ is a rapidly decreasing test function, where $(\nu*h)(x) = \int_{-\infty}^{\infty}h(x-y)\,\nu(dy)$ for $x\in\mathbb R$. (iii) Show that there is a sequence $\langle h_n\rangle_{n\in\mathbb N}$ of rapidly decreasing test functions such that $\lim_{n\to\infty}\int_{-\infty}^{\infty}h_n\times f = \int_{-\infty}^{\infty}f\,d\nu$ for every rapidly decreasing test function $f$.

(j) Let $\phi:S\to\mathbb R$ be a functional defined by the formula of 284Xq. Show that $\phi$ is continuous for the topology of 284Ye. (Note: it helps to know a little more about metrizable linear topological spaces than is covered in §2A5.)

284 Notes and comments Yet again I must warn you that the material above gives a very restricted view of the subject. I have tried to indicate how the theory of Fourier transforms of ‘good’ functions, here taken to be the rapidly decreasing test functions, may be extended, through a kind of duality, to a very much wider class of functions, the ‘tempered functions’. Evidently, writing $S$ for the linear space of rapidly decreasing test functions, we can seek to investigate a Fourier transform of any linear functional $\phi:S\to\mathbb C$, writing $\hat\phi(h) = \phi(\hat h)$ for any $h\in S$. (It is actually commoner at this point to restrict attention to functionals $\phi$ which are continuous for the standard topology on $S$, described in 284Ye; these are called tempered distributions.) By 284F-284G, we can identify some of these functionals with equivalence classes of tempered functions, and then set out to investigate those tempered functions whose Fourier transforms can again be represented by tempered functions. I suppose the structure of the theory of Fourier transforms is best laid out through the formulae involved. Our aim is to set up pairs $(f,g) = (f,\hat f) = (\check g,g)$ in such a way that we have
Inversion: $(\hat h)^\vee = (\check h)^\wedge = h$;
Reversal: $\check h(y) = \hat h(-y)$;
Linearity: $(h_1+h_2)^\wedge = \hat h_1+\hat h_2$, $(ch)^\wedge = c\hat h$;
Differentiation: $(h')^\wedge(y) = iy\hat h(y)$;
Shift: if $h_1(x) = h(x+c)$ then $\hat h_1(y) = e^{iyc}\hat h(y)$;
Modulation: if $h_1(x) = e^{icx}h(x)$ then $\hat h_1(y) = \hat h(y-c)$;
Symmetry: if $h_1(x) = h(-x)$ then $\hat h_1(y) = \hat h(-y)$;
Complex Conjugate: $(\bar h)^\wedge(y) = \overline{\hat h(-y)}$;
Dilation: if $h_1(x) = h(cx)$, where $c>0$, then $\hat h_1(y) = \frac1c\hat h(\frac yc)$;
Convolution: $(h_1*h_2)^\wedge = \sqrt{2\pi}\,\hat h_1\times\hat h_2$, $(h_1\times h_2)^\wedge = \frac{1}{\sqrt{2\pi}}\hat h_1*\hat h_2$;
Duality: $\int_{-\infty}^{\infty}\hat h_1\times h_2 = \int_{-\infty}^{\infty}h_1\times\hat h_2$;
Parseval: $\int_{-\infty}^{\infty}h_1\times\bar h_2 = \int_{-\infty}^{\infty}\hat h_1\times\overline{\hat h_2}$;
and, of course,
$$\hat h(y) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}h(x)\,dx,\qquad\int_c^d\hat h(y)\,dy = \frac{i}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\frac{e^{-idy}-e^{-icy}}{y}h(y)\,dy.$$
(I have used the letter $h$ in the list above to suggest what is in fact the case, that all the formulae here are valid for rapidly decreasing test functions.) On top of all this, it is often important that the operation $h\mapsto\hat h$ should be continuous in some sense. The challenge of the ‘pure’ theory of Fourier transforms is to find the widest possible variety of objects $h$ for which the formulae above will be valid, subject to appropriate interpretations of $\wedge$, $*$ and $\int_{-\infty}^{\infty}$. I must of course remark here that from the very beginnings, the subject has been enriched by its applications in other parts of mathematics, the physical sciences and the social sciences, and that again and again these have suggested further possible pairs $(f,\hat f)$, making new demands on our power to interpret the rules we seek to follow. Even the theory of distributions does not seem to give a full canonical account of what can be done. First, there are great difficulties in interpreting the ‘product’ of two arbitrary distributions, making several of the formulae above problematic; and second, it is not obvious that only one kind of distribution need be considered. In this section I have looked at just one space of ‘test functions’, the space $S$ of rapidly decreasing test functions; but at least two others are significant, the space $D$ of smooth functions with bounded support and the space $Z$ of Fourier transforms of functions in $D$. The advantage of starting with $S$ is that it gives a symmetric theory, since $\hat h\in S$ for every $h\in S$; but it is easy to find objects (e.g., the function $x\mapsto e^{x^2}$, or the function $x\mapsto1/|x|$) which cannot be interpreted as functionals on $S$, so that their Fourier transforms must be investigated by other methods, if at all. In 284Xp I sketch some of the arguments which can be used to justify the assertion that the Fourier transform of the function $x\mapsto1/x$ is, or can be represented by, the function $y\mapsto-i\sqrt{\pi/2}\,\operatorname{sgn}y$; the general principle in this case being that we approach both $0$ and $\infty$ symmetrically.
For a variety of such matching pairs, established by arguments based on the idea in 284Xq, see Lighthill 59, chap. 3. Accordingly it seems that, after nearly two centuries, we must still proceed by carefully examining particular classes of function, and checking appropriate interpretations of the formulae. In the work above I have repeatedly used the concepts Ra R∞ 2 lima→∞ −a f (x)dx, lim²↓0 −∞ e−²x f (x)dx R∞ as alternative interpretations of −∞ f . (Of course they are closely related; see 284Xe.) The reasons for using 2
the particular kernel e−²x are that it belongs to S, it is an even function, its Fourier transform is calculable 2 2 and easy to manipulate, and it is associated with the normal probability density function σ√12π e−x /2σ , so that any miscellaneous facts we gather have a chance of being valuable elsewhere. But there are applications in which alternative kernels are more manageable – e.g., e−²|x| (283Xr, 283Yb, 284Yf). One of the guiding principles here is that purely formal manipulations, along the lines of those in the list above, and (especially) changes in the order of integration, with other exchanges of limit, again and again give rise to formulae which, suitably interpreted, are valid. First courses in analysis are often inhibitory; students are taught to distrust any manipulation which they cannot justify. To my own eye, the delight of this subject lies chiefly in the variety of the arguments demanded by a rigorous approach, the ground constantly shifting with the context; but there is no doubt that cheerful sanguinity is often the best guide to the manipulations which it will be right to try to justify. This being a book on measure theory, I am of course particularly interested in the possibility of a measure appearing as a Fourier transform. This is what happens if we seek the Fourier transform of the constant function 1 (284R). More generally, any periodic tempered function f with period 2π can be assigned a Fourier transform which is a ‘signed measure’ (for our present purposes, a complex linear combination of measures) concentrated on Z, the mass at each k ∈ Z being determined by the corresponding Fourier coefficient of f ¹ ]−π, π] (284Xn, 284Ya). In the next section I will go farther in this direction, with particular reference to probability distributions on R r . But the reason why positive measures have not forced themselves on our attention so far is that we do not expect to get a positive function as a Fourier transform unless some very
special conditions are satisfied, as in 283Yb.

As in §282, I have used the Hilbert space structure of $L^2_{\mathbb C}$ as the basis of the discussion of Fourier transforms of functions in $L^2_{\mathbb C}$ (284O-284P). But as with Fourier series, Carleson's theorem (286U) provides a more direct description.
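As a small numerical sketch, not taken from the text, here is how the two interpretations of $\int_{-\infty}^{\infty}f$ discussed above compare for $f(x)=\frac{\sin x}{x}$, whose improper integral is $\pi$. For this $f$, differentiating under the integral sign with respect to a frequency parameter gives $\int_{-\infty}^{\infty}e^{-\epsilon x^2}\frac{\sin x}{x}dx=\pi\,\mathrm{erf}(1/(2\sqrt\epsilon))$. The helper names, the Simpson-rule quadrature and its step sizes are illustrative assumptions.

```python
import math


def sinc(x: float) -> float:
    """f(x) = sin(x)/x, extended continuously by f(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def simpson(f, lo: float, hi: float, n: int) -> float:
    """Composite Simpson rule for the integral of f over [lo, hi] with n subintervals."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3


def truncated_integral(a: float) -> float:
    """First interpretation: the integral of f over [-a, a]."""
    return simpson(sinc, -a, a, int(200 * a))


def gaussian_damped_integral(eps: float) -> float:
    """Second interpretation: the integral of exp(-eps x^2) f(x) over R.

    The integrand is below exp(-64) outside |x| <= 8/sqrt(eps),
    so the range of integration is truncated there.
    """
    cutoff = 8.0 / math.sqrt(eps)
    return simpson(lambda x: math.exp(-eps * x * x) * sinc(x),
                   -cutoff, cutoff, int(200 * cutoff))


if __name__ == "__main__":
    for a in (10.0, 100.0, 200.0):
        print("a =", a, truncated_integral(a))
    for eps in (1.0, 0.1, 0.01):
        print("eps =", eps, gaussian_damped_integral(eps))
    print("pi =", math.pi)
```

The truncated integrals oscillate about $\pi$ with an error of order $1/a$, while the damped integrals increase steadily to $\pi$ as $\epsilon\downarrow0$.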
285 Characteristic functions

I come now to one of the most effective applications of Fourier transforms, the use of 'characteristic functions' to analyse probability distributions. It turns out not only that the Fourier transform of a probability distribution determines the distribution (285M) but that many of the things we want to know about a distribution are easily calculated from its transform (285G, 285Xf). Even more strikingly, pointwise convergence of Fourier transforms corresponds (for sequences) to convergence for the vague topology in the space of distributions, so they provide a new and extremely powerful method for proving such results as the Central Limit Theorem and Poisson's theorem (285Q). As the applications of the ideas here mostly belong to probability theory, I return to probabilists' terminology, as in Chapter 27.

285A Definition (a) Let $\nu$ be a Radon probability measure on $\mathbb R^r$ (256A). Then the characteristic function of $\nu$ is the function $\phi_\nu:\mathbb R^r\to\mathbb C$ given by the formula
$$\phi_\nu(y)=\int e^{iy\cdot x}\,\nu(dx)$$
for every $y\in\mathbb R^r$, writing $y\cdot x=\eta_1\xi_1+\ldots+\eta_r\xi_r$ if $y=(\eta_1,\ldots,\eta_r)$, $x=(\xi_1,\ldots,\xi_r)$.

(b) Let $X_1,\ldots,X_r$ be real-valued random variables on the same probability space. The characteristic function of $\boldsymbol X=(X_1,\ldots,X_r)$ is the characteristic function $\phi_{\boldsymbol X}=\phi_{\nu_{\boldsymbol X}}$ of their joint probability distribution $\nu_{\boldsymbol X}$ as defined in 271C.

285B Remarks (a) By one of the ordinary accidents of history, the definitions of 'characteristic function' and 'Fourier transform' have evolved independently. In 283Ba I remarked that the definition of the Fourier transform remains unfixed, and that the formulae
$$\hat f(y)=\int_{-\infty}^{\infty}e^{iyx}f(x)dx,\qquad\check f(y)=\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-iyx}f(x)dx$$
are sometimes used. On the other hand, I think that nearly all authors agree on the definition of the characteristic function as given above. You may feel therefore that I should have followed their lead, and chosen the definition of Fourier transform which best matches the definition of characteristic function. I did not do so largely because I wished to emphasise the symmetry between the Fourier transform and the inverse Fourier transform, and the correspondence between Fourier transforms and Fourier series. The principal advantage of matching the definitions up would be to make the constants in such theorems as 283F and 285Xh the same, but this would be balanced by the need to remember different constants for $\hat f$ and $\check f$ in such results as 283M.

(b) A secondary reason for not trying too hard to make the formulae of this section match directly those of §§283-284 is that the $r$-dimensional case is at the heart of some of the most important applications of characteristic functions, so that it seems right to introduce it from the beginning; and consequently the formulae of this section will necessarily have new features compared with those in the body of the work so far.

285C Of course there is a direct way to describe the characteristic function of a family $(X_1,\ldots,X_r)$ of random variables, as follows.
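Proposition Let $X_1,\ldots,X_r$ be real-valued random variables on the same probability space $(\Omega,\Sigma,\mu)$, and $\nu_{\boldsymbol X}$ their joint distribution. Then their characteristic function $\phi_{\nu_{\boldsymbol X}}$ is given by
$$\phi_{\nu_{\boldsymbol X}}(y)=\mathbb E(e^{iy\cdot\boldsymbol X})=\mathbb E(e^{i\eta_1X_1}e^{i\eta_2X_2}\ldots e^{i\eta_rX_r})$$
for every $y=(\eta_1,\ldots,\eta_r)\in\mathbb R^r$.

proof Apply 271E to the functions $h_1,h_2:\mathbb R^r\to\mathbb R$ defined by $h_1(x)=\cos(y\cdot x)$, $h_2(x)=\sin(y\cdot x)$, to see that
$$\phi_{\nu_{\boldsymbol X}}(y)=\int h_1(x)\nu_{\boldsymbol X}(dx)+i\int h_2(x)\nu_{\boldsymbol X}(dx)=\mathbb E(h_1(\boldsymbol X))+i\,\mathbb E(h_2(\boldsymbol X))=\mathbb E(e^{iy\cdot\boldsymbol X}).$$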
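For a measure with finitely many atoms the defining integral of 285A becomes a finite sum, and 285C identifies it with the expectation $\mathbb E(e^{iy\cdot\boldsymbol X})$. The sketch below uses an assumed example, $\boldsymbol X$ uniform on $\{0,1\}\times\{0,2\}$, in which $X_1$ and $X_2$ happen to be independent; it computes the sum exactly, compares it with a Monte Carlo estimate of the expectation, and (only because of that independence) with the product $\mathbb E(e^{i\eta_1X_1})\mathbb E(e^{i\eta_2X_2})$. All function names are hypothetical.

```python
import cmath
import random


def dot(y, x):
    """The inner product y . x = eta_1 xi_1 + ... + eta_r xi_r."""
    return sum(eta * xi for eta, xi in zip(y, x))


def char_fn(points, weights, y):
    """phi(y) for nu = sum_k weights[k] * (unit point mass at points[k])."""
    return sum(w * cmath.exp(1j * dot(y, x)) for x, w in zip(points, weights))


def empirical_char_fn(samples, y):
    """Monte Carlo estimate of E(exp(i y . X)) from a list of sampled vectors."""
    return sum(cmath.exp(1j * dot(y, s)) for s in samples) / len(samples)


# Assumed example: X = (X1, X2) uniform on {0, 1} x {0, 2}.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (1.0, 2.0)]
weights = [0.25] * 4
rng = random.Random(1)
samples = [rng.choice(points) for _ in range(20000)]
```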
285D I ought to spell out the correspondence between Fourier transforms, as defined in 283A, and characteristic functions.

Proposition Let $\nu$ be a Radon probability measure on $\mathbb R$. Write
$$\hat\nu(y)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}\nu(dx)$$
for every $y\in\mathbb R$, and $\phi_\nu$ for the characteristic function of $\nu$.

(a) $\hat\nu(y)=\frac{1}{\sqrt{2\pi}}\phi_\nu(-y)$ for every $y\in\mathbb R$.
(b) For any Lebesgue integrable complex-valued function $h$ defined almost everywhere in $\mathbb R$,
$$\int_{-\infty}^{\infty}\hat\nu(y)h(y)dy=\int_{-\infty}^{\infty}\hat h(x)\nu(dx).$$
(c) For any rapidly decreasing test function $h$ on $\mathbb R$ (see §284),
$$\int_{-\infty}^{\infty}h(x)\nu(dx)=\int_{-\infty}^{\infty}\check h(y)\hat\nu(y)dy.$$
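(d) If $\nu$ is an indefinite-integral measure over Lebesgue measure, with Radon-Nikodým derivative $f$, then $\hat\nu$ is the Fourier transform of $f$.

proof (a) This is immediate from the definitions of $\phi_\nu$ and $\hat\nu$.

(b) Because
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}|h(y)|\nu(dx)dy=\int_{-\infty}^{\infty}|h(y)|dy<\infty,$$
we may change the order of integration to see that
$$\int_{-\infty}^{\infty}\hat\nu(y)h(y)dy=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-iyx}h(y)\nu(dx)dy=\int_{-\infty}^{\infty}\Big(\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}h(y)dy\Big)\nu(dx)=\int_{-\infty}^{\infty}\hat h(x)\nu(dx).$$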
(c) This follows immediately from (b), because $\check h$ is integrable and $\hat{\check h}=h$ (284C).

(d) The point is just that
$$\int h\,d\nu=\int h(x)f(x)dx$$
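for every bounded Borel measurable $h:\mathbb R\to\mathbb R$ (235M), and therefore for the functions $x\mapsto e^{-iyx}:\mathbb R\to\mathbb C$. Now
$$\hat\nu(y)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}\nu(dx)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iyx}f(x)dx=\hat f(y)$$
for every $y$.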
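A quick numerical check of 285D(b), under assumed choices: $\nu$ has four atoms, and $h(y)=e^{-y^2/2}$, whose Fourier transform in the normalisation of 283A is $\hat h(x)=e^{-x^2/2}$ (283N). The left-hand side is approximated by the midpoint rule on $[-12,12]$; the right-hand side is the finite sum $\sum_kp_ke^{-x_k^2/2}$. The names below are illustrative only.

```python
import cmath
import math

SQRT_2PI = math.sqrt(2 * math.pi)

# Assumed example: nu has mass masses[k] at atoms[k].
atoms = [-1.5, 0.0, 0.4, 2.0]
masses = [0.1, 0.4, 0.3, 0.2]


def nu_hat(y):
    """(1/sqrt(2 pi)) * integral of exp(-iyx) nu(dx), as in 285D."""
    return sum(p * cmath.exp(-1j * y * x) for x, p in zip(atoms, masses)) / SQRT_2PI


def h(y):
    """Test function h(y) = exp(-y^2/2); its Fourier transform is the same function."""
    return math.exp(-y * y / 2)


def lhs(n=4000, width=12.0):
    """Midpoint-rule approximation of the integral of nu_hat(y) h(y) dy."""
    step = 2 * width / n
    total = 0j
    for k in range(n):
        y = -width + (k + 0.5) * step
        total += nu_hat(y) * h(y)
    return total * step


def rhs():
    """Integral of h_hat(x) nu(dx), which here is sum_k p_k exp(-x_k^2/2)."""
    return sum(p * h(x) for x, p in zip(atoms, masses))
```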
285E Lemma Let $X$ be a normal random variable with expectation $a$ and variance $\sigma^2$, where $\sigma>0$. Then the characteristic function of $X$ is given by
$$\phi(y)=e^{iya}e^{-\sigma^2y^2/2}.$$
proof This is just 283N with the constants changed. We have
$$\phi(y)=\mathbb E(e^{iyX})=\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{iyx}e^{-(x-a)^2/2\sigma^2}dx$$
(taking the density function for $X$ given in 274Ad, and applying 271Ic)
$$=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{iy(\sigma t+a)}e^{-t^2/2}dt$$
(substituting $x=\sigma t+a$)
$$=e^{iya}\sqrt{2\pi}\,\hat\psi_1(-y\sigma)$$
(setting $\psi_1(x)=\frac{1}{\sqrt{2\pi}}e^{-x^2/2}$, as in 283N)
$$=e^{iya}e^{-\sigma^2y^2/2}.$$
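The closed form in 285E can be checked against the integral obtained after the substitution $x=\sigma t+a$ used in the proof. This is a sketch with an assumed midpoint rule on $|t|\le10$, where the Gaussian factor has become negligible; the function names are hypothetical.

```python
import cmath
import math


def normal_char_fn_numeric(y, a, sigma, n=4000, width=10.0):
    """E(exp(iyX)) for X ~ N(a, sigma^2), by the midpoint rule in t = (x - a)/sigma."""
    step = 2 * width / n
    total = 0j
    for k in range(n):
        t = -width + (k + 0.5) * step
        total += cmath.exp(1j * y * (sigma * t + a)) * math.exp(-t * t / 2)
    return total * step / math.sqrt(2 * math.pi)


def normal_char_fn_closed(y, a, sigma):
    """The closed form of 285E: exp(iya) * exp(-sigma^2 y^2 / 2)."""
    return cmath.exp(1j * y * a) * math.exp(-(sigma * y) ** 2 / 2)
```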
285F I now give results corresponding to parts of 283C, with an extra refinement concerning independent random variables (285I).

Proposition Let $\nu$ be a Radon probability measure on $\mathbb R^r$, and $\phi$ its characteristic function.
(a) $\phi(0)=1$.
(b) $\phi:\mathbb R^r\to\mathbb C$ is uniformly continuous.
(c) $\phi(-y)=\overline{\phi(y)}$ and $|\phi(y)|\le1$ for every $y\in\mathbb R^r$.
(d) If $r=1$ and $\int|x|\nu(dx)<\infty$, then $\phi'(y)$ exists and is equal to $i\int xe^{ixy}\nu(dx)$ for every $y\in\mathbb R$.
(e) If $r=1$ and $\int x^2\nu(dx)<\infty$, then $\phi''(y)$ exists and is equal to $-\int x^2e^{ixy}\nu(dx)$ for every $y\in\mathbb R$.

proof (a) $\phi(0)=\int1\,\nu(dx)=\nu(\mathbb R^r)=1$.

(b) Let $\epsilon>0$. Let $M>0$ be such that $\nu\{x:\|x\|\ge M\}\le\epsilon$, writing $\|x\|=\sqrt{x\cdot x}$ as usual. Let $\delta>0$ be such that $|e^{ia}-1|\le\epsilon$ whenever $|a|\le\delta$. Now suppose that $y,y'\in\mathbb R^r$ are such that $\|y-y'\|\le\delta/M$. Then whenever $\|x\|\le M$,
$$|e^{iy\cdot x}-e^{iy'\cdot x}|=|e^{iy'\cdot x}||e^{i(y-y')\cdot x}-1|=|e^{i(y-y')\cdot x}-1|\le\epsilon$$
because $|(y-y')\cdot x|\le\|y-y'\|\|x\|\le\delta$. Consequently
$$|\phi(y)-\phi(y')|\le\int_{\|x\|\le M}|e^{iy\cdot x}-e^{iy'\cdot x}|\nu(dx)+\int_{\|x\|>M}|e^{iy\cdot x}|\nu(dx)+\int_{\|x\|>M}|e^{iy'\cdot x}|\nu(dx)\le\epsilon+\epsilon+\epsilon=3\epsilon.$$
As $\epsilon$ is arbitrary, $\phi$ is uniformly continuous.

(c) This is elementary;
$$\phi(-y)=\int e^{-iy\cdot x}\nu(dx)=\int\overline{e^{iy\cdot x}}\,\nu(dx)=\overline{\phi(y)},$$
$$|\phi(y)|=\Big|\int e^{iy\cdot x}\nu(dx)\Big|\le\int|e^{iy\cdot x}|\nu(dx)=\int1\,\nu(dx)=1.$$
(d) The point is that $\big|\frac{\partial}{\partial y}e^{iyx}\big|=|x|$ for every $x,y\in\mathbb R$. So by 123D (applied, strictly speaking, to the real and imaginary parts of the function)
$$\phi'(y)=\frac{d}{dy}\int e^{iyx}\nu(dx)=\int\frac{\partial}{\partial y}e^{iyx}\nu(dx)=\int ixe^{iyx}\nu(dx).$$
(e) Since we now have $\big|\frac{\partial}{\partial y}xe^{iyx}\big|=x^2$ for every $x,y$, we can repeat the argument to get
$$\phi''(y)=i\frac{d}{dy}\int xe^{iyx}\nu(dx)=i\int\frac{\partial}{\partial y}xe^{iyx}\nu(dx)=-\int x^2e^{iyx}\nu(dx).$$
285G Corollary (a) Let $X$ be a real-valued random variable with finite expectation, and $\phi$ its characteristic function. Then $\phi'(0)=i\,\mathbb E(X)$.
(b) Let $X$ be a real-valued random variable with finite variance, and $\phi$ its characteristic function. Then $\phi''(0)=-\mathbb E(X^2)$.

proof We have only to match $X$ to its distribution $\nu$, and say that '$X$ has finite expectation' corresponds to '$\int|x|\nu(dx)=\mathbb E(|X|)<\infty$', so that
$$\phi'(0)=i\int x\,\nu(dx)=i\,\mathbb E(X),$$
and that '$X$ has finite variance' corresponds to '$\int x^2\nu(dx)=\mathbb E(X^2)<\infty$', so that
$$\phi''(0)=-\int x^2\nu(dx)=-\mathbb E(X^2),$$
as in 271E.
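285G can be seen numerically for a distribution with three atoms (an assumed example): central differences of $\phi$ at 0 should recover $i\,\mathbb E(X)$ and $-\mathbb E(X^2)$, with errors of order $h^2$ in the step $h$. The step size $10^{-3}$ is an illustrative choice balancing truncation error against rounding error.

```python
import cmath

# Assumed example distribution: Pr(X = values[k]) = probs[k].
values = [-1.0, 0.5, 2.0]
probs = [0.2, 0.5, 0.3]


def phi(y):
    """Characteristic function E(exp(iyX)) of the discrete random variable X."""
    return sum(p * cmath.exp(1j * y * v) for v, p in zip(values, probs))


def derivatives_at_zero(step=1e-3):
    """Central finite-difference estimates of phi'(0) and phi''(0)."""
    first = (phi(step) - phi(-step)) / (2 * step)
    second = (phi(step) - 2 * phi(0.0) + phi(-step)) / step ** 2
    return first, second


mean = sum(p * v for v, p in zip(values, probs))               # E(X) = 0.65
second_moment = sum(p * v * v for v, p in zip(values, probs))  # E(X^2) = 1.525
```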
285H Remark Observe that there is no result corresponding to 283Cg ('$\lim_{|y|\to\infty}\hat f(y)=0$'). If $\nu$ is the Radon probability measure giving mass 1 to the point 0 (that is, the 'Dirac delta function' of 284R, or equivalently the distribution of a random variable which is zero almost everywhere), then $\phi(y)=1$ for every $y$.

285I Proposition Let $X_1,\ldots,X_n$ be independent real-valued random variables, with characteristic functions $\phi_1,\ldots,\phi_n$. Let $\phi$ be the characteristic function of their sum $X=X_1+\ldots+X_n$. Then
$$\phi(y)=\prod_{j=1}^n\phi_j(y)$$
for every $y\in\mathbb R$.

proof Let $y\in\mathbb R$. By 272E, the variables $Y_j=e^{iyX_j}$ are independent, so by 272Q
$$\phi(y)=\mathbb E(e^{iyX})=\mathbb E(e^{iy(X_1+\ldots+X_n)})=\mathbb E\Big(\prod_{j=1}^nY_j\Big)=\prod_{j=1}^n\mathbb E(Y_j)=\prod_{j=1}^n\phi_j(y),$$
as required.

Remark See also 285R below.
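A concrete instance of 285I, sketched under the assumption that each $X_k$ is $\{0,1\}$-valued with $\Pr(X_k=1)=p_k$, so that $\phi_k(y)=1+p_k(e^{iy}-1)$ (the same computation appears in 285Q). The distribution of the sum is built by repeated convolution, the discrete form of 285R, and its characteristic function is compared with the product of the $\phi_k$. Function names are hypothetical.

```python
import cmath


def sum_pmf(ps):
    """pmf of X_1 + ... + X_n for independent {0,1}-valued X_k with Pr(X_k = 1) = ps[k].

    Each pass convolves the current pmf with the distribution of one more X_k.
    """
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for m, q in enumerate(pmf):
            new[m] += q * (1 - p)
            new[m + 1] += q * p
        pmf = new
    return pmf


def char_fn_from_pmf(pmf, y):
    """E(exp(iyX)) computed directly from the distribution of X."""
    return sum(q * cmath.exp(1j * y * m) for m, q in enumerate(pmf))


def product_of_char_fns(ps, y):
    """The product over k of phi_k(y) = 1 + p_k (exp(iy) - 1), as 285I predicts."""
    result = 1 + 0j
    for p in ps:
        result *= 1 + p * (cmath.exp(1j * y) - 1)
    return result
```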
285J There is an inversion theorem for characteristic functions, corresponding to 283F; I give it in 285Xh, with an $r$-dimensional version in 285Yb. However, this does not seem to be as useful as the following group of results.

Lemma Let $\nu$ be a Radon probability measure on $\mathbb R^r$, and $\phi$ its characteristic function. Then for any $j\le r$, $a>0$,
$$\nu\{x:|\xi_j|\ge a\}\le7a\int_0^{1/a}(1-\operatorname{Re}\phi(te_j))dt,$$
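where $e_j\in\mathbb R^r$ is the $j$th unit vector.

proof We have
$$\begin{aligned}7a\int_0^{1/a}(1-\operatorname{Re}\phi(te_j))dt&=7a\int_0^{1/a}\Big(1-\operatorname{Re}\int_{\mathbb R^r}e^{it\xi_j}\nu(dx)\Big)dt\\&=7a\int_0^{1/a}\int_{\mathbb R^r}(1-\cos(t\xi_j))\nu(dx)dt\\&=7a\int_{\mathbb R^r}\int_0^{1/a}(1-\cos(t\xi_j))dt\,\nu(dx)\end{aligned}$$
(because $(x,t)\mapsto1-\cos(t\xi_j)$ is bounded and $\nu\mathbb R^r\cdot\frac1a$ is finite)
$$\begin{aligned}&=7a\int_{\mathbb R^r}\Big(\frac1a-\frac{1}{\xi_j}\sin\frac{\xi_j}{a}\Big)\nu(dx)\\&\ge7a\int_{|\xi_j|\ge a}\Big(\frac1a-\frac{1}{\xi_j}\sin\frac{\xi_j}{a}\Big)\nu(dx)\end{aligned}$$
(interpreting $\frac{1}{\xi_j}\sin\frac{\xi_j}{a}$ as $\frac1a$ when $\xi_j=0$, and because $\frac1\xi\sin\frac\xi a\le\frac1a$ for every $\xi\ne0$)
$$\ge\nu\{x:|\xi_j|\ge a\},$$
because
$$\frac{\sin\eta}{\eta}\le\frac{\sin1}{1}\le\frac67\text{ if }\eta\ge1,$$
so
$$a\Big(\frac1a-\frac{1}{\xi_j}\sin\frac{\xi_j}{a}\Big)\ge\frac17$$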
if $|\xi_j|\ge a$.

285K Characteristic functions and the vague topology The time has come to return to ideas mentioned briefly in 274L. Fix $r\ge1$ and let $P$ be the set of all Radon probability measures on $\mathbb R^r$. For any bounded continuous function $h:\mathbb R^r\to\mathbb R$, define $\rho_h:P\times P\to\mathbb R$ by setting
$$\rho_h(\nu,\nu')=\Big|\int h\,d\nu-\int h\,d\nu'\Big|$$
for $\nu,\nu'\in P$. Then the vague topology on $P$ is the topology generated by the pseudometrics $\rho_h$ (274Ld).

285L Theorem Let $\nu,\langle\nu_n\rangle_{n\in\mathbb N}$ be Radon probability measures on $\mathbb R^r$, with characteristic functions $\phi,\langle\phi_n\rangle_{n\in\mathbb N}$. Then the following are equiveridical:
(i) $\nu=\lim_{n\to\infty}\nu_n$ for the vague topology;
(ii) $\int h\,d\nu=\lim_{n\to\infty}\int h\,d\nu_n$ for every bounded continuous $h:\mathbb R^r\to\mathbb R$;
(iii) $\lim_{n\to\infty}\phi_n(y)=\phi(y)$ for every $y\in\mathbb R^r$.

proof (a) The equivalence of (i) and (ii) is virtually the definition of the vague topology; we have
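$$\begin{aligned}\lim_{n\to\infty}\nu_n=\nu\text{ for the vague topology}&\iff\lim_{n\to\infty}\rho_h(\nu_n,\nu)=0\text{ for every bounded continuous }h\qquad(2A3Mc)\\&\iff\lim_{n\to\infty}\Big|\int h\,d\nu_n-\int h\,d\nu\Big|=0\text{ for every bounded continuous }h.\end{aligned}$$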
(b) Next, (ii) obviously implies (iii), because
$$\operatorname{Re}\phi(y)=\int h_y\,d\nu=\lim_{n\to\infty}\int h_y\,d\nu_n=\lim_{n\to\infty}\operatorname{Re}\phi_n(y),$$
setting $h_y(x)=\cos(x\cdot y)$ for each $x$, and similarly $\operatorname{Im}\phi(y)=\lim_{n\to\infty}\operatorname{Im}\phi_n(y)$ for every $y\in\mathbb R^r$.

(c) So we are left to prove that (iii)$\Rightarrow$(ii). I start by showing that, given $\epsilon>0$, there is a closed bounded set $K$ such that $\nu_n(\mathbb R^r\setminus K)\le\epsilon$ for every $n\in\mathbb N$.

P P We know that $\phi(0)=1$ and that $\phi$ is continuous at 0 (285Fb). Let $a>0$ be so large that for every $j\le r$, $|t|\le1/a$ we have
$$1-\operatorname{Re}\phi(te_j)\le\frac{\epsilon}{14r},$$
writing $e_j$ for the $j$th unit vector, as in 285J. Then
$$7a\int_0^{1/a}(1-\operatorname{Re}\phi(te_j))dt\le\frac{\epsilon}{2r}$$
for each $j\le r$. By Lebesgue's Dominated Convergence Theorem (since of course the functions $t\mapsto1-\operatorname{Re}\phi_n(te_j)$ are uniformly bounded on $[0,\frac1a]$), there is an $n_0\in\mathbb N$ such that
$$7a\int_0^{1/a}(1-\operatorname{Re}\phi_n(te_j))dt\le\frac{\epsilon}{r}$$
for every $j\le r$, $n\ge n_0$. But 285J tells us that now
$$\nu_n\{x:|\xi_j|\ge a\}\le\frac{\epsilon}{r}$$
for every $j\le r$, $n\ge n_0$. On the other hand, there is surely a $b\ge a$ such that
$$\nu_n\{x:|\xi_j|\ge b\}\le\frac{\epsilon}{r}$$
for every $j\le r$, $n<n_0$. So, setting $K=\{x:|\xi_j|\le b\text{ for every }j\le r\}$, $\nu_n(\mathbb R^r\setminus K)\le\epsilon$ for every $n\in\mathbb N$, as required. Q Q

(d) Now take any bounded continuous $h:\mathbb R^r\to\mathbb R$ and $\epsilon>0$. Set $M=1+\sup_{x\in\mathbb R^r}|h(x)|$, and let $K$ be a bounded closed set such that
$$\nu_n(\mathbb R^r\setminus K)\le\frac{\epsilon}{M}\text{ for every }n\in\mathbb N,\qquad\nu(\mathbb R^r\setminus K)\le\frac{\epsilon}{M},$$
using (c) just above. By the Stone-Weierstrass theorem (281K) there are $y_0,\ldots,y_m\in\mathbb Q^r$ and $c_0,\ldots,c_m\in\mathbb C$ such that
$$|h(x)-g(x)|\le\epsilon\text{ for every }x\in K,\qquad|g(x)|\le M\text{ for every }x\in\mathbb R^r,$$
writing $g(x)=\sum_{k=0}^mc_ke^{iy_k\cdot x}$ for $x\in\mathbb R^r$. Now
$$\lim_{n\to\infty}\int g\,d\nu_n=\lim_{n\to\infty}\sum_{k=0}^mc_k\phi_n(y_k)=\sum_{k=0}^mc_k\phi(y_k)=\int g\,d\nu.$$
On the other hand, for every $n\in\mathbb N$,
$$\Big|\int g\,d\nu_n-\int h\,d\nu_n\Big|\le\int_K|g-h|d\nu_n+2M\nu_n(\mathbb R^r\setminus K)\le3\epsilon,$$
and similarly $\big|\int g\,d\nu-\int h\,d\nu\big|\le3\epsilon$. Consequently
$$\limsup_{n\to\infty}\Big|\int h\,d\nu_n-\int h\,d\nu\Big|\le6\epsilon.$$
As $\epsilon$ is arbitrary,
$$\lim_{n\to\infty}\int h\,d\nu_n=\int h\,d\nu,$$
and (ii) is true.

285M Corollary (a) Let $\nu$, $\nu'$ be two Radon probability measures on $\mathbb R^r$ with the same characteristic function. Then they are equal.
(b) Let $(X_1,\ldots,X_r)$ and $(Y_1,\ldots,Y_r)$ be two families of real-valued random variables. If $\mathbb E(e^{i\eta_1X_1+\ldots+i\eta_rX_r})=\mathbb E(e^{i\eta_1Y_1+\ldots+i\eta_rY_r})$ for all $\eta_1,\ldots,\eta_r\in\mathbb R$, then $(X_1,\ldots,X_r)$ has the same joint distribution as $(Y_1,\ldots,Y_r)$.

proof (a) Applying 285L with $\nu_n=\nu'$ for every $n$, we see that $\int h\,d\nu'=\int h\,d\nu$ for every bounded continuous $h:\mathbb R^r\to\mathbb R$. By 256D(iv), $\nu=\nu'$.
(b) Apply (a) with $\nu$, $\nu'$ the two joint distributions.

285N Remarks Probably the most important application of this theorem is to the standard proof of the Central Limit Theorem. I sketch the ideas in 285Xn and 285Yj-285Ym; details may be found in most serious probability texts; two on my shelf are Shiryayev 84, §III.4, and Feller 66, §XV.6. However, to get the full strength of Lindeberg's version of the Central Limit Theorem we have to work quite hard, and I therefore propose to illustrate the method with a version of Poisson's theorem (285Q) instead. I begin with two lemmas which are very frequently used in results of this kind.

285O Lemma Let $c_0,\ldots,c_n,d_0,\ldots,d_n$ be complex numbers of modulus at most 1. Then
$$\Big|\prod_{k=0}^nc_k-\prod_{k=0}^nd_k\Big|\le\sum_{k=0}^n|c_k-d_k|.$$

proof Induce on $n$. The case $n=0$ is trivial. For the case $n=1$ we have
$$|c_0c_1-d_0d_1|=|c_0(c_1-d_1)+(c_0-d_0)d_1|\le|c_0||c_1-d_1|+|c_0-d_0||d_1|\le|c_1-d_1|+|c_0-d_0|,$$
which is what we need. For the inductive step to $n+1$, we have
$$\Big|\prod_{k=0}^{n+1}c_k-\prod_{k=0}^{n+1}d_k\Big|\le\Big|\prod_{k=0}^nc_k-\prod_{k=0}^nd_k\Big|+|c_{n+1}-d_{n+1}|$$
(by the case just done, because $c_{n+1}$, $d_{n+1}$, $\prod_{k=0}^nc_k$ and $\prod_{k=0}^nd_k$ all have modulus at most 1)
$$\le\sum_{k=0}^n|c_k-d_k|+|c_{n+1}-d_{n+1}|=\sum_{k=0}^{n+1}|c_k-d_k|$$
(by the inductive hypothesis), so the induction continues.
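285O can be probed by random sampling. This proves nothing, but it shows the inequality holding in random trials and shows that the hypothesis $|c_k|,|d_k|\le1$ cannot simply be dropped. The sampler, the seed and the trial counts below are illustrative assumptions.

```python
import cmath
import math
import random


def product(zs):
    """The product of a list of complex numbers."""
    result = 1 + 0j
    for z in zs:
        result *= z
    return result


def random_point_in_disc(rng):
    """A random complex number of modulus at most 1."""
    return cmath.rect(math.sqrt(rng.random()), rng.uniform(0.0, 2 * math.pi))


def max_excess(trials, n, seed=0):
    """Largest value of |prod c - prod d| - sum |c_k - d_k| over random trials.

    By 285O this should never be positive (up to rounding) when all moduli are <= 1.
    """
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(trials):
        c = [random_point_in_disc(rng) for _ in range(n + 1)]
        d = [random_point_in_disc(rng) for _ in range(n + 1)]
        lhs = abs(product(c) - product(d))
        rhs = sum(abs(ck - dk) for ck, dk in zip(c, d))
        worst = max(worst, lhs - rhs)
    return worst
```

For instance, with $c=(2,2)$ and $d=(2,1)$ the left-hand side is $|4-2|=2$ while the right-hand side is only $1$.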
285P Lemma Let $M,\epsilon>0$. Then there are $\eta>0$ and $y_0,\ldots,y_n\in\mathbb R$ such that whenever $X$, $Z$ are two real-valued random variables with $\mathbb E(|X|)\le M$, $\mathbb E(|Z|)\le M$ and $|\phi_X(y_j)-\phi_Z(y_j)|\le\eta$ for every $j\le n$, then
$$F_X(a)\le F_Z(a+\epsilon)+\epsilon$$
for every $a\in\mathbb R$, where I write $\phi_X$ for the characteristic function of $X$ and $F_X$ for the distribution function of $X$.

proof Set $\delta=\frac{\epsilon}{7}>0$, $b=M/\delta$.
(a) Define $h_0:\mathbb R\to[0,1]$ by setting
$$h_0(x)=1\text{ if }x\le0,\qquad h_0(x)=1-x/\delta\text{ if }0\le x\le\delta,\qquad h_0(x)=0\text{ if }x\ge\delta.$$
Then $h_0$ is continuous. Let $m$ be the integral part of $b/\delta$, and for $-m\le k\le m+1$ set $h_k(x)=h_0(x-k\delta)$. By the Stone-Weierstrass theorem (281K), there are $y_0,\ldots,y_n\in\mathbb R$ and $c_0,\ldots,c_n\in\mathbb C$ such that, writing $g_0(x)=\sum_{j=0}^nc_je^{iy_jx}$,
$$|h_0(x)-g_0(x)|\le\delta\text{ for every }x\in[-b-(m+1)\delta,\,b+m\delta],\qquad|g_0(x)|\le1\text{ for every }x\in\mathbb R.$$
For $-m\le k\le m+1$, set
$$g_k(x)=g_0(x-k\delta)=\sum_{j=0}^nc_je^{-iy_jk\delta}e^{iy_jx}.$$
Set $\eta=\delta/(1+\sum_{j=0}^n|c_j|)>0$.
(b) Now suppose that $X$, $Z$ are random variables such that $\mathbb E(|X|)\le M$, $\mathbb E(|Z|)\le M$ and $|\phi_X(y_j)-\phi_Z(y_j)|\le\eta$ for every $j\le n$. Then for any $k$ we have
$$\mathbb E(g_k(X))=\mathbb E\Big(\sum_{j=0}^nc_je^{-iy_jk\delta}e^{iy_jX}\Big)=\sum_{j=0}^nc_je^{-iy_jk\delta}\phi_X(y_j),$$
and similarly $\mathbb E(g_k(Z))=\sum_{j=0}^nc_je^{-iy_jk\delta}\phi_Z(y_j)$, so
$$|\mathbb E(g_k(X))-\mathbb E(g_k(Z))|\le\sum_{j=0}^n|c_j||\phi_X(y_j)-\phi_Z(y_j)|\le\sum_{j=0}^n|c_j|\eta\le\delta.$$
Next,
$$|h_k(x)-g_k(x)|\le\delta\text{ for every }x\in[-b-(m+1)\delta+k\delta,\,b+m\delta+k\delta]\supseteq[-b,b],$$
$$|h_k(x)-g_k(x)|\le2\text{ for every }x,\qquad\Pr(|X|\ge b)\le\frac{M}{b}=\delta,$$
so $\mathbb E(|h_k(X)-g_k(X)|)\le3\delta$; and similarly $\mathbb E(|h_k(Z)-g_k(Z)|)\le3\delta$. Putting these together, $|\mathbb E(h_k(X))-\mathbb E(h_k(Z))|\le7\delta=\epsilon$ whenever $-m\le k\le m+1$.

(c) Now suppose that $-b\le a\le b$. Then there is a $k$ such that $-m\le k\le m+1$ and $a\le k\delta\le a+\delta$. Since
$$\chi\,]-\infty,a]\le\chi\,]-\infty,k\delta]\le h_k\le\chi\,]-\infty,(k+1)\delta]\le\chi\,]-\infty,a+2\delta],$$
we must have
$$\Pr(X\le a)\le\mathbb E(h_k(X)),\qquad\mathbb E(h_k(Z))\le\Pr(Z\le a+2\delta)\le\Pr(Z\le a+\epsilon).$$
But this means that
$$\Pr(X\le a)\le\mathbb E(h_k(X))\le\mathbb E(h_k(Z))+\epsilon\le\Pr(Z\le a+\epsilon)+\epsilon$$
whenever $a\in[-b,b]$.
(d) As for the cases $a\ge b$, $a\le-b$, we surely have $b(1-F_Z(b))=b\Pr(Z>b)\le\mathbb E(|Z|)\le M$, so if $a\ge b$ then
$$F_X(a)\le1\le F_Z(a)+1-F_Z(b)\le F_Z(a)+\frac{M}{b}=F_Z(a)+\delta\le F_Z(a+\epsilon)+\epsilon.$$
Similarly, $bF_X(-b)\le\mathbb E(|X|)\le M$, so $F_X(a)\le\delta\le F_Z(a+\epsilon)+\epsilon$ for every $a\le-b$. This completes the proof.

285Q Law of Rare Events: Theorem For any $M\ge0$, $\epsilon>0$ there is a $\delta>0$ such that whenever $X_0,\ldots,X_n$ are independent $\{0,1\}$-valued random variables with $\Pr(X_k=1)=p_k\le\delta$ for every $k\le n$, and $\sum_{k=0}^np_k=\lambda\le M$, and $X=X_0+\ldots+X_n$, then
$$\Big|\Pr(X=m)-\frac{\lambda^m}{m!}e^{-\lambda}\Big|\le\epsilon$$
for every $m\in\mathbb N$.

proof (a) We should begin by calculating some characteristic functions. First, the characteristic function $\phi_k$ of $X_k$ will be given by
$$\phi_k(y)=(1-p_k)e^{iy\cdot0}+p_ke^{iy\cdot1}=1+p_k(e^{iy}-1).$$
Next, if $Z$ is a Poisson random variable with parameter $\lambda$ (that is, if $\Pr(Z=m)=\lambda^me^{-\lambda}/m!$ for every $m\in\mathbb N$; all you need to know at this point about the Poisson distribution is that $\sum_{m=0}^\infty\lambda^me^{-\lambda}/m!=1$), then its characteristic function $\phi_Z$ is given by
$$\phi_Z(y)=\sum_{m=0}^\infty\frac{\lambda^m}{m!}e^{-\lambda}e^{iym}=e^{-\lambda}\sum_{m=0}^\infty\frac{(\lambda e^{iy})^m}{m!}=e^{-\lambda}e^{\lambda e^{iy}}=e^{\lambda(e^{iy}-1)}.$$
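The quantities in (a) are easy to compute, and a sketch like the following (with hypothetical helper names) lets one watch the theorem at work: it computes the exact distribution of $X$ by repeated convolution, compares it with the Poisson distribution of the same parameter $\lambda$, and checks the bound $|\phi_X(y)-\phi_Z(y)|\le2\sum_kp_k^2$ derived in (b)-(c) below. The Poisson probabilities are evaluated through logarithms (an implementation choice) to avoid floating-point overflow for large $m$; the example parameters are assumptions.

```python
import cmath
import math


def bernoulli_sum_pmf(ps):
    """Exact pmf of X = X_0 + ... + X_n, X_k independent {0,1}-valued, Pr(X_k = 1) = ps[k]."""
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for m, q in enumerate(pmf):
            new[m] += q * (1 - p)
            new[m + 1] += q * p
        pmf = new
    return pmf


def poisson_pmf(lam, m):
    """Pr(Z = m) = lam^m e^{-lam} / m! for lam > 0, computed in log space."""
    return math.exp(m * math.log(lam) - lam - math.lgamma(m + 1))


def max_pmf_gap(ps, extra=30):
    """max over m of |Pr(X = m) - Pr(Z = m)|, Z Poisson with parameter lam = sum(ps)."""
    pmf = bernoulli_sum_pmf(ps)
    lam = sum(ps)
    worst = 0.0
    for m in range(len(pmf) + extra):
        px = pmf[m] if m < len(pmf) else 0.0
        worst = max(worst, abs(px - poisson_pmf(lam, m)))
    return worst


def char_fn_gap(ps, y):
    """|phi_X(y) - phi_Z(y)|, using the two formulas of part (a)."""
    lam = sum(ps)
    phi_x = 1 + 0j
    for p in ps:
        phi_x *= 1 + p * (cmath.exp(1j * y) - 1)
    phi_z = cmath.exp(lam * (cmath.exp(1j * y) - 1))
    return abs(phi_x - phi_z)
```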
(b) Before getting down to $\delta$'s and $\eta$'s, I show how to estimate $\phi_X(y)-\phi_Z(y)$. We know that
$$\phi_X(y)=\prod_{k=0}^n\phi_k(y)$$
(using 285I), while
$$\phi_Z(y)=\prod_{k=0}^ne^{p_k(e^{iy}-1)}.$$
Because $\phi_k(y)$ and $e^{p_k(e^{iy}-1)}$ all have modulus at most 1 (we have $|e^{p_k(e^{iy}-1)}|=e^{-p_k(1-\cos y)}\le1$), 285O tells us that
$$|\phi_X(y)-\phi_Z(y)|\le\sum_{k=0}^n|\phi_k(y)-e^{p_k(e^{iy}-1)}|=\sum_{k=0}^n|e^{p_k(e^{iy}-1)}-1-p_k(e^{iy}-1)|.$$
(c) So we have a little bit of analysis to do. To estimate $|e^z-1-z|$ where $\operatorname{Re}z\le0$, consider the function $g(t)=\operatorname{Re}(c(e^{tz}-1-tz))$ where $|c|=1$. We have $g(0)=g'(0)=0$ and
$$|g''(t)|=|\operatorname{Re}(cz^2e^{tz})|\le|c||z^2||e^{tz}|\le|z|^2$$
for every $t\ge0$, so that $|g(1)|\le\frac12|z|^2$ by the (real-valued) Taylor theorem with remainder, or otherwise. As $c$ is arbitrary,
$$|e^z-1-z|\le\frac12|z|^2\text{ whenever }\operatorname{Re}z\le0.$$
In particular,
$$|e^{p_k(e^{iy}-1)}-1-p_k(e^{iy}-1)|\le\frac12p_k^2|e^{iy}-1|^2\le2p_k^2$$
for each $k$, and
$$|\phi_X(y)-\phi_Z(y)|\le\sum_{k=0}^n|e^{p_k(e^{iy}-1)}-1-p_k(e^{iy}-1)|\le2\sum_{k=0}^np_k^2$$
for each $y\in\mathbb R$.

(d) Now for the detailed estimates. Given $M\ge0$ and $\epsilon>0$, let $\eta>0$ and $y_0,\ldots,y_l\in\mathbb R$ be such that
$$\Pr(X\le a)\le\Pr(Z\le a+\tfrac12)+\tfrac{\epsilon}{2}$$
whenever $X$, $Z$ are real-valued random variables, $\mathbb E(|X|)\le M$, $\mathbb E(|Z|)\le M$ and $|\phi_X(y_j)-\phi_Z(y_j)|\le\eta$ for every $j\le l$ (285P). Take $\delta=\eta/(2M+1)$ and suppose that $X_0,\ldots,X_n$ are independent $\{0,1\}$-valued random variables with $\Pr(X_k=1)=p_k\le\delta$ for every $k\le n$, $\lambda=\sum_{k=0}^np_k\le M$. Set $X=X_0+\ldots+X_n$ and let $Z$ be a Poisson random variable with parameter $\lambda$; then by the arguments of (a)-(c),
$$|\phi_X(y)-\phi_Z(y)|\le2\sum_{k=0}^np_k^2\le2\delta\sum_{k=0}^np_k=2\delta\lambda\le\eta$$
for every $y\in\mathbb R$. Also
$$\mathbb E(|X|)=\mathbb E(X)=\sum_{k=0}^np_k=\lambda\le M,$$
$$\mathbb E(|Z|)=\mathbb E(Z)=\sum_{m=0}^\infty m\frac{\lambda^m}{m!}e^{-\lambda}=e^{-\lambda}\sum_{m=1}^\infty\frac{\lambda^m}{(m-1)!}=e^{-\lambda}\sum_{m=0}^\infty\frac{\lambda^{m+1}}{m!}=\lambda\le M.$$
So
$$\Pr(X\le a)\le\Pr(Z\le a+\tfrac12)+\tfrac\epsilon2,\qquad\Pr(Z\le a)\le\Pr(X\le a+\tfrac12)+\tfrac\epsilon2$$
for every $a$. But as both $X$ and $Z$ take all their values in $\mathbb N$,
$$|\Pr(X\le m)-\Pr(Z\le m)|\le\frac\epsilon2$$
for every $m\in\mathbb N$, and
$$\Big|\Pr(X=m)-\frac{\lambda^m}{m!}e^{-\lambda}\Big|=|\Pr(X=m)-\Pr(Z=m)|\le\epsilon$$
for every $m\in\mathbb N$, as required.

285R Convolutions Recall from 257A that if $\nu$, $\tilde\nu$ are Radon probability measures on $\mathbb R^r$ then they have a convolution $\nu*\tilde\nu$ defined by writing
$$(\nu*\tilde\nu)(E)=(\nu\times\tilde\nu)\{(x,y):x+y\in E\}$$
for every Borel set $E\subseteq\mathbb R^r$, which is also a Radon probability measure. We can readily compute the characteristic function $\phi_{\nu*\tilde\nu}$ from 257B: we have
$$\begin{aligned}\phi_{\nu*\tilde\nu}(y)&=\int e^{iy\cdot x}(\nu*\tilde\nu)(dx)=\iint e^{iy\cdot(x+x')}\nu(dx)\tilde\nu(dx')\\&=\iint e^{iy\cdot x}e^{iy\cdot x'}\nu(dx)\tilde\nu(dx')=\int e^{iy\cdot x}\nu(dx)\int e^{iy\cdot x'}\tilde\nu(dx')=\phi_\nu(y)\phi_{\tilde\nu}(y).\end{aligned}$$
(Thus convolution of measures corresponds to pointwise multiplication of characteristic functions, just as convolution of functions corresponds to pointwise multiplication of Fourier transforms.) Recalling that the sum of independent random variables corresponds to convolution of their distributions (272S), this gives
another way of looking at 285I. Remember also that if $\nu$, $\tilde\nu$ have Radon-Nikodým derivatives $f$, $\tilde f$ with respect to Lebesgue measure then $f*\tilde f$ is a Radon-Nikodým derivative of $\nu*\tilde\nu$ (257F).

285S The vague topology and pointwise convergence of characteristic functions In 285L we saw that a sequence $\langle\nu_n\rangle_{n\in\mathbb N}$ of Radon probability measures on $\mathbb R^r$ converges in the vague topology to a Radon probability measure $\nu$ if and only if
$$\lim_{n\to\infty}\int e^{iy\cdot x}\nu_n(dx)=\int e^{iy\cdot x}\nu(dx)$$
for every $y\in\mathbb R^r$; that is, iff $\lim_{n\to\infty}\rho'_y(\nu_n,\nu)=0$ for every $y\in\mathbb R^r$, writing
$$\rho'_y(\nu,\nu')=\Big|\int e^{iy\cdot x}\nu(dx)-\int e^{iy\cdot x}\nu'(dx)\Big|$$
for Radon probability measures $\nu$, $\nu'$ on $\mathbb R^r$ and $y\in\mathbb R^r$. It is natural to ask whether the pseudometrics $\rho'_y$ actually define the vague topology. Writing $T$ for the vague topology and $S$ for the topology defined by $\{\rho'_y:y\in\mathbb R^r\}$, we surely have $S\subseteq T$, just because every $\rho'_y$ is dominated by $\rho_h+\rho_{h'}$, where $h(x)=\cos(y\cdot x)$ and $h'(x)=\sin(y\cdot x)$ are among the bounded continuous functions used in the definition of $T$. Also we know that $S$ and $T$ give the same convergent sequences, and incidentally that $T$ is metrizable (see 285Xq). But all this does not quite amount to saying that the two topologies are the same, and indeed they are not, as the next result shows.

285T Proposition Let $y_0,\ldots,y_n\in\mathbb R$ and $\eta>0$. Then there are infinitely many $m\in\mathbb N$ such that $|1-e^{iy_km}|\le\eta$ for every $k\le n$.

proof Let $\eta_1,\ldots,\eta_r\in\mathbb R$ be such that $1=\eta_0,\eta_1,\ldots,\eta_r$ are linearly independent over $\mathbb Q$ and every $y_k/2\pi$ is a linear combination of the $\eta_j$ over $\mathbb Q$; say $y_k=2\pi\sum_{j=0}^rq_{kj}\eta_j$ where every $q_{kj}\in\mathbb Q$. Express the $q_{kj}$ as $p_{kj}/p$ where each $p_{kj}\in\mathbb Z$ and $p\in\mathbb N\setminus\{0\}$. Set $M=\max_{k\le n}\sum_{j=0}^r|p_{kj}|$. Take any $m_0\in\mathbb N$ and let $\delta>0$ be such that $|1-e^{ix}|\le\eta$ whenever $|x|\le2\pi M\delta$. By Weyl's Equidistribution Theorem (281N), there are infinitely many $m$ such that $\langle m\eta_j\rangle\le\delta$ whenever $1\le j\le r$; in particular, there is such an $m\ge m_0$. Let $m_j$ be the integral part of $m\eta_j$, so that $|m\eta_j-m_j|\le\delta$ for $0\le j\le r$. Then
$$\Big|mpy_k-2\pi\sum_{j=0}^rp_{kj}m_j\Big|\le2\pi\sum_{j=0}^r|p_{kj}||m\eta_j-m_j|\le2\pi M\delta,$$
so that
$$|1-e^{iy_kmp}|=\Big|1-\exp\Big(i\big(mpy_k-2\pi\sum_{j=0}^rp_{kj}m_j\big)\Big)\Big|\le\eta$$
for every $k\le n$. As $mp\ge m_0$ and $m_0$ is arbitrary, this proves the result.

285U Corollary The topologies $S$ and $T$ on the space of Radon probability measures on $\mathbb R$, as described in 285S, are different.

proof Let $\delta_x$ be the Radon probability measure on $\mathbb R$ which gives mass 1 to the singleton $\{x\}$, so that $\delta_x(E)=1$ if $x\in E\subseteq\mathbb R$. By 285T, every member of $S$ which contains $\delta_0$ also contains $\delta_m$ for infinitely many $m\in\mathbb N$. On the other hand, the set
$$G=\Big\{\nu:\int e^{-x^2}\nu(dx)>\frac12\Big\}$$
is a member of $T$, containing $\delta_0$, which does not contain $\delta_m$ for any integer $m\ne0$ (because $e^{-m^2}\le e^{-1}<\frac12$). So $G\in T\setminus S$ and $T\ne S$.

285X Basic exercises (a) Let $\nu$ be a Radon probability measure on $\mathbb R^r$, where $r\ge1$, and suppose that $\int\|x\|\nu(dx)<\infty$. Show that the characteristic function $\phi$ of $\nu$ is differentiable (in the full sense of 262Fa) and that $\frac{\partial\phi}{\partial\eta_j}(y)=i\int\xi_je^{iy\cdot x}\nu(dx)$ for every $j\le r$, $y\in\mathbb R^r$, using $\xi_j$, $\eta_j$ to represent the coordinates of $x$ and $y$ as usual.

> (b) Let $\boldsymbol X=(X_1,\ldots,X_r)$ be a family of real-valued random variables, with characteristic function $\phi_{\boldsymbol X}$. Show that the characteristic function $\phi_{X_j}$ of $X_j$ is given by
$$\phi_{X_j}(y)=\phi_{\boldsymbol X}(ye_j)\text{ for every }y\in\mathbb R,$$
where $e_j$ is the $j$th unit vector of $\mathbb R^r$.

> (c) Let $X$ be a real-valued random variable and $\phi_X$ its characteristic function. Show that $\phi_{aX+b}(y)=e^{iyb}\phi_X(ay)$ for any $a,b,y\in\mathbb R$.

(d) Let $X$ be a real-valued random variable and $\phi$ its characteristic function.
(i) Show that for any integrable complex-valued function $h$ on $\mathbb R$,
$$\mathbb E(\hat h(X))=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\phi(-y)h(y)dy,$$
writing $\hat h$ for the Fourier transform of $h$.
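(ii) Show that for any rapidly decreasing test function $h$,
$$\mathbb E(h(X))=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\phi(y)\hat h(y)dy.$$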
(e) Let $\nu$ be a Radon probability measure on $\mathbb R$, and suppose that its characteristic function $\phi$ is square-integrable. Show that $\nu$ is an indefinite-integral measure over Lebesgue measure and that its Radon-Nikodým derivatives are also square-integrable. (Hint: use 284O to find a square-integrable $f$ such that $\int h\times f=\int\frac{1}{\sqrt{2\pi}}\phi\times\hat h$ for every rapidly decreasing test function $h$, and ideas from the proof of 284G to show that $\int_a^bf=\nu\,]a,b[$ whenever $a<b$ in $\mathbb R$.)

> (f) Let $\boldsymbol X=(X_1,\ldots,X_r)$ be a family of real-valued random variables with characteristic function $\phi_{\boldsymbol X}$. Suppose that $\phi_{\boldsymbol X}$ is expressible in the form
$$\phi_{\boldsymbol X}(y)=\prod_{j=1}^r\phi_j(\eta_j)$$
for some functions $\phi_1,\ldots,\phi_r$, writing $y=(\eta_1,\ldots,\eta_r)$ as usual. Show that $X_1,\ldots,X_r$ are independent. (Hint: show that the $\phi_j$ must be the characteristic functions of the $X_j$; now show that the distribution of $\boldsymbol X$ has the same characteristic function as the product of the distributions of the $X_j$.)

(g) Let $X_1$, $X_2$ be independent real-valued random variables with the same distribution, and $\phi$ the characteristic function of $X_1-X_2$. Show that $\phi(t)=\phi(-t)\ge0$ for every $t\in\mathbb R$.

(h) Let $\nu$ be a Radon probability measure on $\mathbb R$, with characteristic function $\phi$. Show that
$$\frac12(\nu[c,d]+\nu\,]c,d[)=\frac{i}{2\pi}\lim_{a\to\infty}\int_{-a}^{a}\frac{e^{-idy}-e^{-icy}}{y}\phi(y)dy$$
whenever $c\le d$ in $\mathbb R$. (Hint: use part (a) of the proof of 283F.)

(i) Let $X$ be a real-valued random variable and $\phi_X$ its characteristic function. Show that
$$\Pr(|X|\ge a)\le7a\int_0^{1/a}(1-\operatorname{Re}\phi_X(y))dy$$
for every $a>0$.

(j) We say that a set $Q$ of Radon probability measures on $\mathbb R$ is uniformly tight if for every $\epsilon>0$ there is an $M\ge0$ such that $\nu(\mathbb R\setminus[-M,M])\le\epsilon$ for every $\nu\in Q$. Show that if $Q$ is any uniformly tight family of Radon probability measures on $\mathbb R$, and $\epsilon>0$, then there are $\eta>0$ and $y_0,\ldots,y_n\in\mathbb R$ such that $\nu\,]-\infty,a]\le\nu'\,]-\infty,a+\epsilon]+\epsilon$ whenever $\nu,\nu'\in Q$, $a\in\mathbb R$ and $|\phi_\nu(y_j)-\phi_{\nu'}(y_j)|\le\eta$ for every $j\le n$, writing $\phi_\nu$ for the characteristic function of $\nu$.
(k) Let $\langle\nu_n\rangle_{n\in\mathbb N}$ be a sequence of Radon probability measures on $\mathbb R$, and suppose that it converges for the vague topology to a Radon probability measure $\nu$. Show that $\{\nu\}\cup\{\nu_n:n\in\mathbb N\}$ is uniformly tight in the sense of 285Xj.

> (l) Let $\nu$, $\nu'$ be two totally finite Radon measures on $\mathbb R^r$ which agree on all closed half-spaces, that is, sets of the form $\{x:x\cdot y\ge c\}$ for $c\in\mathbb R$, $y\in\mathbb R^r$. Show that $\nu=\nu'$. (Hint: reduce to the case $\nu\mathbb R^r=\nu'\mathbb R^r=1$ and use 285M.)

> (m) For $\gamma>0$, the Cauchy distribution with centre 0 and scale parameter $\gamma$ is the Radon probability measure $\nu_\gamma$ defined by the formula
$$\nu_\gamma(E)=\frac{\gamma}{\pi}\int_E\frac{1}{\gamma^2+t^2}dt.$$
(i) Show that if $X$ is a random variable with distribution $\nu_\gamma$ then $\Pr(X\ge0)=\Pr(|X|\ge\gamma)=\frac12$.
(ii) Show that the characteristic function of $\nu_\gamma$ is $y\mapsto e^{-\gamma|y|}$. (Hint: 283Xr.)
(iii) Show that if $X$ and $Y$ are independent random variables with Cauchy distributions, both centred at 0 and with scale parameters $\gamma$, $\delta$ respectively, and $\alpha$, $\beta$ are not both 0, then $\alpha X+\beta Y$ has a Cauchy distribution centred at 0 and with scale parameter $|\alpha|\gamma+|\beta|\delta$.
(iv) Show that if $X$ and $Y$ are independent normally distributed random variables with expectation 0 then $X/Y$ has a Cauchy distribution.

> (n) Let $X_1,X_2,\ldots$ be an independent identically distributed sequence of random variables, all of zero expectation and variance 1; let $\phi$ be their common characteristic function. For each $n\ge1$, set $S_n=\frac{1}{\sqrt n}(X_1+\ldots+X_n)$.
(i) Show that the characteristic function $\phi_n$ of $S_n$ is given by the formula $\phi_n(y)=\big(\phi(\frac{y}{\sqrt n})\big)^n$ for each $n$.
(ii) Show that $|\phi_n(y)-e^{-y^2/2}|\le n\,|\phi(\frac{y}{\sqrt n})-e^{-y^2/2n}|$.
(iii) Setting $h(y)=\phi(y)-e^{-y^2/2}$, show that $h(0)=h'(0)=h''(0)=0$ and therefore that $\lim_{n\to\infty}nh(y/\sqrt n)=0$, so that $\lim_{n\to\infty}\phi_n(y)=e^{-y^2/2}$ for every $y\in\mathbb R$.
(iv) Show that $\lim_{n\to\infty}\Pr(S_n\le a)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^ae^{-x^2/2}dx$ for every $a\in\mathbb R$.
> (o) A random variable $X$ has a Poisson distribution with parameter $\lambda>0$ if $\Pr(X=n)=e^{-\lambda}\lambda^n/n!$ for every $n\in\mathbb N$. (i) Show that in this case $\mathbb E(X)=\operatorname{Var}(X)=\lambda$. (ii) Show that if $X$ and $Y$ are independent random variables with Poisson distributions then $X+Y$ has a Poisson distribution. (iii) Find a proof of (ii) based on 285Q.

> (p) For $x\in\mathbb R^r$, let $\delta_x$ be the Radon probability measure on $\mathbb R^r$ which gives mass 1 to the singleton $\{x\}$, so that $\delta_x(E)=1$ whenever $x\in E\subseteq\mathbb R^r$. Show that $\delta_x*\delta_y=\delta_{x+y}$ for all $x,y\in\mathbb R^r$.

(q) Let $P$ be the set of Radon probability measures on $\mathbb R^r$. For $y\in\mathbb R^r$, set $\rho'_y(\nu,\nu')=|\phi_\nu(y)-\phi_{\nu'}(y)|$ for all $\nu,\nu'\in P$, writing $\phi_\nu$ for the characteristic function of $\nu$. Set $\psi(x)=\frac{1}{(\sqrt{2\pi})^r}e^{-x\cdot x/2}$ for $x\in\mathbb R^r$. Show that the vague topology on $P$ is defined by the family $\{\rho_\psi\}\cup\{\rho'_y:y\in\mathbb Q^r\}$, defining $\rho_\psi$ as in 285K, and is therefore metrizable. (Hint: 281K; cf. 285Xj.)

> (r) Let $\phi:\mathbb R^r\to\mathbb C$ be the characteristic function of a Radon probability measure on $\mathbb R^r$. Show that $\phi(0)=1$ and that $\sum_{j=0}^n\sum_{k=0}^nc_j\bar c_k\phi(a_j-a_k)\ge0$ whenever $a_0,\ldots,a_n\in\mathbb R^r$ and $c_0,\ldots,c_n\in\mathbb C$. ('Bochner's theorem' states that these conditions are sufficient, as well as necessary, for $\phi$ to be a characteristic function; see 445N in Volume 4.)

(s) Let $\langle X_n\rangle_{n\in\mathbb N}$ be an independent sequence of real-valued random variables and set $S_n=\sum_{j=0}^nX_j$ for each $n\in\mathbb N$. Suppose that the sequence $\langle\nu_{S_n}\rangle_{n\in\mathbb N}$ of distributions is convergent for the vague topology to a distribution. Show that $\langle S_n\rangle_{n\in\mathbb N}$ converges in measure, therefore a.e. (Hint: 285J, 273B.)
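A minimal numerical illustration of 285Xn, assuming $\Pr(X_k=1)=\Pr(X_k=-1)=\frac12$, so that $\phi(y)=\cos y$ and $\phi_n(y)=\cos(y/\sqrt n)^n$. The sketch checks the inequality of part (ii), which follows from 285O because every factor has modulus at most 1, and watches $\phi_n(y)$ approach $e^{-y^2/2}$. It is an illustration, not a solution of the exercise; the function names are hypothetical.

```python
import math


def phi(y):
    """Characteristic function of X with Pr(X = 1) = Pr(X = -1) = 1/2 (mean 0, variance 1)."""
    return math.cos(y)


def phi_n(y, n):
    """Characteristic function of S_n = (X_1 + ... + X_n)/sqrt(n), as in 285Xn(i)."""
    return phi(y / math.sqrt(n)) ** n


def gap(y, n):
    """|phi_n(y) - exp(-y^2/2)|."""
    return abs(phi_n(y, n) - math.exp(-y * y / 2))


def bound(y, n):
    """The right-hand side n |phi(y/sqrt(n)) - exp(-y^2/(2n))| of 285Xn(ii)."""
    return n * abs(phi(y / math.sqrt(n)) - math.exp(-y * y / (2 * n)))
```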
285Y Further exercises (a) Let $\nu$ be a Radon probability measure on $\mathbb R^r$. Write
$$\hat\nu(y)=\frac{1}{(\sqrt{2\pi})^r}\int e^{-iy\cdot x}\nu(dx)$$
for every $y\in\mathbb R^r$.
(i) Writing $\phi_\nu$ for the characteristic function of $\nu$, show that $\hat\nu(y)=\frac{1}{(\sqrt{2\pi})^r}\phi_\nu(-y)$ for every $y\in\mathbb R^r$.
(ii) Show that $\int\hat\nu(y)h(y)dy=\int\hat h(x)\nu(dx)$ for any Lebesgue integrable complex-valued function $h$ on $\mathbb R^r$, defining the Fourier transform $\hat h$ as in 283Wa.
(iii) Show that $\int h(x)\nu(dx)=\int\check h(y)\hat\nu(y)dy$ for any rapidly decreasing test function $h$ on $\mathbb R^r$.
(iv) Show that if $\nu$ is an indefinite-integral measure over Lebesgue measure, with Radon-Nikodým derivative $f$, then $\hat\nu$ is the Fourier transform of $f$.

(b) Let $\nu$ be a Radon probability measure on $\mathbb R^r$, with characteristic function $\phi$. Show that whenever $c\le d$ in $\mathbb R^r$ then
$$\Big(\frac{i}{2\pi}\Big)^r\lim_{\alpha_1,\ldots,\alpha_r\to\infty}\int_{-\alpha_1}^{\alpha_1}\ldots\int_{-\alpha_r}^{\alpha_r}\Big(\prod_{j=1}^r\frac{e^{-i\delta_j\eta_j}-e^{-i\gamma_j\eta_j}}{\eta_j}\Big)\phi(y)dy$$
exists and lies between $\nu\,]c,d[$ and $\nu[c,d]$, writing $]c,d[\,=\prod_{j\le r}\,]\gamma_j,\delta_j[$ if $c=(\gamma_1,\ldots,\gamma_r)$ and $d=(\delta_1,\ldots,\delta_r)$.
(c) Let $\langle X_n\rangle_{n\in\mathbb N}$ be an independent identically distributed sequence of (not-essentially-constant) random variables, and set $S_n=\sum_{k=0}^nX_{2k+1}-X_{2k}$ for each $n\in\mathbb N$. Show that $\lim_{n\to\infty}\Pr(|S_n|\ge\alpha)=1$ for every $\alpha\in\mathbb R$. (Hint: 285Xg, proof of 285J.) Hence, or otherwise, show that $\lim_{n\to\infty}\Pr(|\sum_{k=0}^nX_k|\ge\alpha)=1$ for every $\alpha\in\mathbb R$.

(d) For Radon probability measures $\nu$, $\nu'$ on $\mathbb R^r$ set
$$\rho(\nu,\nu')=\inf\{\epsilon:\epsilon\ge0,\ \nu\,]-\infty,a]\le\nu'\,]-\infty,a+\epsilon\mathbf 1]+\epsilon\le\nu\,]-\infty,a+2\epsilon\mathbf 1]+2\epsilon\text{ for every }a\in\mathbb R^r\},$$
writing $]-\infty,a]=\{(\xi_1,\ldots,\xi_r):\xi_j\le\alpha_j\text{ for every }j\le r\}$ when $a=(\alpha_1,\ldots,\alpha_r)$, and $\mathbf 1=(1,\ldots,1)\in\mathbb R^r$. Show that $\rho$ is a metric on the set of Radon probability measures on $\mathbb R^r$, and that the topology it defines is the vague topology. (Cf. 274Ya.)

(e) Let $r\ge1$. We say that a set $Q$ of Radon probability measures on $\mathbb R^r$ is uniformly tight if for every $\epsilon>0$ there is a compact set $K\subseteq\mathbb R^r$ such that $\nu(\mathbb R^r\setminus K)\le\epsilon$ for every $\nu\in Q$. Show that if $Q$ is any uniformly tight family of Radon probability measures on $\mathbb R^r$, and $\epsilon>0$, then there are $\eta>0$, $y_0,\ldots,y_n\in\mathbb R^r$ such that $\nu\,]-\infty,a]\le\nu'\,]-\infty,a+\epsilon\mathbf 1]+\epsilon$ whenever $\nu,\nu'\in Q$ and $a\in\mathbb R^r$ and $|\phi_\nu(y_j)-\phi_{\nu'}(y_j)|\le\eta$ for every $j\le n$, writing $\phi_\nu$ for the characteristic function of $\nu$.

(f) Show that for any $M\ge0$ the set of Radon probability measures $\nu$ on $\mathbb R^r$ such that $\int\|x\|\nu(dx)\le M$ is uniformly tight in the sense of 285Ye.
(g) Let $C_b(\mathbb{R}^r)$ be the Banach space of bounded continuous real-valued functions on $\mathbb{R}^r$. (i) Show that any Radon probability measure $\nu$ on $\mathbb{R}^r$ corresponds to a continuous linear functional $h_\nu:C_b(\mathbb{R}^r)\to\mathbb{R}$, writing $h_\nu(f)=\int f\,d\nu$ for $f\in C_b(\mathbb{R}^r)$. (ii) Show that if $h_\nu=h_{\nu'}$ then $\nu=\nu'$. (iii) Show that the vague topology on the set of Radon probability measures corresponds to the weak* topology on the dual $(C_b(\mathbb{R}^r))^*$ of $C_b(\mathbb{R}^r)$ (2A5Ig).

(h) Let $r\ge1$ and let $P$ be the set of Radon probability measures on $\mathbb{R}^r$. For $m\in\mathbb{N}$ let $\rho^*_m$ be the pseudometric on $P$ defined by setting $\rho^*_m(\nu,\nu')=\sup_{\|y\|\le m}|\phi_\nu(y)-\phi_{\nu'}(y)|$ for $\nu,\nu'\in P$, writing $\phi_\nu$ for the characteristic function of $\nu$. Show that $\{\rho^*_m:m\in\mathbb{N}\}$ defines the vague topology on $P$.
(i) Let $r\ge1$ and let $P$ be the set of Radon probability measures on $\mathbb{R}^r$. For $m\in\mathbb{N}$ let $\tilde\rho^*_m$ be the pseudometric on $P$ defined by setting
$$\tilde\rho^*_m(\nu,\nu')=\int_{\{y:\|y\|\le m\}}|\phi_\nu(y)-\phi_{\nu'}(y)|\,dy$$
for $\nu,\nu'\in P$, writing $\phi_\nu$ for the characteristic function of $\nu$. Show that $\{\tilde\rho^*_m:m\in\mathbb{N}\}$ defines the vague topology on $P$.

(j) Let $X$ be a real-valued random variable with finite variance. Show that for any $\eta\ge0$,
$$\bigl|\phi(y)-1-iyE(X)+\tfrac12y^2E(X^2)\bigr|\le\tfrac16\eta|y^3|E(X^2)+y^2E(\psi_\eta(X)),$$
writing $\phi$ for the characteristic function of $X$ and $\psi_\eta(x)=0$ for $|x|\le\eta$, $x^2$ for $|x|>\eta$.

(k) Suppose that $\epsilon\ge\delta>0$ and that $X_0,\dots,X_n$ are independent real-valued random variables such that
$$E(X_k)=0\text{ for every }k\le n,\qquad\sum_{k=0}^n\operatorname{Var}(X_k)=1,\qquad\sum_{k=0}^nE(\psi_\delta(X_k))\le\delta$$
(writing $\psi_\delta(x)=0$ if $|x|\le\delta$, $x^2$ if $|x|>\delta$). Set $\gamma=\epsilon/\sqrt{\delta^2+\delta}$, and let $Z$ be a standard normal random variable. Show that
$$|\phi(y)-e^{-y^2/2}|\le\tfrac13\epsilon|y|^3+y^2\bigl(\delta+E(\psi_\gamma(Z))\bigr)$$
for every $y\in\mathbb{R}$, writing $\phi$ for the characteristic function of $X=\sum_{k=0}^nX_k$. (Hint: write $\phi_k$ for the characteristic function of $X_k$ and $\tilde\phi_k$ for the characteristic function of $\sigma_kZ$, where $\sigma_k=\sqrt{\operatorname{Var}(X_k)}$. Show that
$$|\phi_k(y)-\tilde\phi_k(y)|\le\tfrac13\epsilon|y^3|\sigma_k^2+y^2\bigl(E(\psi_\epsilon(X_k))+\sigma_k^2E(\psi_\gamma(Z))\bigr).$$
)

(l) Show that for every $\epsilon>0$ there is a $\delta>0$ such that whenever $X_0,\dots,X_n$ are independent real-valued random variables such that $E(X_k)=0$ for every $k\le n$, $\sum_{k=0}^n\operatorname{Var}(X_k)=1$, $\sum_{k=0}^nE(\psi_\delta(X_k))\le\delta$ (writing $\psi_\delta(x)=0$ if $|x|\le\delta$, $x^2$ if $|x|>\delta$), then $|\phi(y)-e^{-y^2/2}|\le\epsilon(y^2+|y^3|)$ for every $y\in\mathbb{R}$, writing $\phi$ for the characteristic function of $X=X_0+\dots+X_n$.
(m) Use 285Yl to prove Lindeberg’s theorem (274F).

(n) Let $r\ge1$ and let $P$ be the set of Radon probability measures on $\mathbb{R}^r$. Show that convolution, regarded as a map from $P\times P$ to $P$, is continuous when $P$ is given the vague topology. (Hint: 281Xa and 257B will help.)

(o) Let $S$ be the topology on $\mathbb{R}$ defined by $\{\rho'_y:y\in\mathbb{R}\}$, where $\rho'_y(x,x')=|e^{iyx}-e^{iyx'}|$ (compare 285S). Show that addition and subtraction are continuous for $S$ in the sense of 2A5A.

285 Notes and comments Just as with Fourier transforms, the power of methods which use the characteristic functions of distributions is based on three points: (i) the characteristic function of a distribution determines the distribution (285M); (ii) the properties of interest in a distribution are reflected in accessible properties of its characteristic function (285G, 285I, 285J); and (iii) these properties of the characteristic function are actually different from the corresponding properties of the distribution, and are amenable to different kinds of investigation. Above all, the fact that (for sequences!) convergence in the vague topology of distributions corresponds to pointwise convergence for characteristic functions (285L) provides us with a path to the classic limit theorems, as in 285Q and 285Xn. In 285S-285U I show that this result for sequences does not correspond immediately to any alternative characterization of the vague topology, though it can be adapted in more than one way to give such a characterization (see 285Yh-285Yi).

Concerning the Central Limit Theorem there is one conspicuous difference between the method suggested here and that of §274. The previous approach offered at least a theoretical possibility of giving an explicit
formula for $\delta$ in 274F as a function of $\epsilon$, and hence an estimate of the rate of convergence to be expected in the Central Limit Theorem. The arguments in the present chapter, involving as they do an entirely non-constructive compactness argument in 281A, leave us with no way of achieving such an estimate. But in fact the method of characteristic functions, suitably refined, is the basis of the best estimates known, such as the Berry-Esseen theorem (274Hc).

In 285D I try to show how the characteristic function $\phi_\nu$ of a Radon probability measure can be related to a ‘Fourier transform’ $\hat\nu$ of $\nu$ which corresponds directly to the Fourier transforms of functions discussed in §§283-284. If $f$ is a non-negative Lebesgue integrable function and we take $\nu$ to be the corresponding indefinite-integral measure, then $\hat\nu=\hat f$. Thus the concept of ‘Fourier transform of a measure’ is a natural extension of the Fourier transform of an integrable function. Looking at it from the other side, the formula of 285Dc shows that $\hat\nu$ can be thought of as representing the inverse Fourier transform of $\nu$ in the sense of 284H-284I. Taking $\nu$ to be the measure which assigns a mass 1 to the point 0, we get the Dirac delta function, with Fourier transform the constant function 1. These ideas can be extended without difficulty to handle convolutions of measures (285R).

It is a striking fact that while there is no satisfactory characterization of the functions which are Fourier transforms of integrable functions, there is a characterization of the characteristic functions of probability distributions. This is ‘Bochner’s theorem’. I give the condition in 285Xr, asking you to prove its necessity as an exercise; we already have three-quarters of the machinery to prove its sufficiency, but the last step will have to wait for Volume 4.
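The inequality of exercise 285Yj can be tested on a concrete distribution. The sketch below is only a numerical illustration under stated assumptions, not a proof: it takes a hypothetical random variable $X$ with finitely many values (the values and probabilities are made up), computes $\phi(y)=E(e^{iyX})$ exactly, and compares the two sides for a few $y$ and $\eta$. The helper names `psi` and `yj_sides` are illustrative only.

```python
import cmath

def psi(x, eta):
    """psi_eta(x) = 0 if |x| <= eta, x**2 if |x| > eta (the function of 285Yj)."""
    return 0.0 if abs(x) <= eta else x * x

def yj_sides(values, probs, y, eta):
    """Left and right sides of the 285Yj inequality for a random variable X
    taking the given values with the given probabilities."""
    def E(g):
        # expectation of g(X) for the finite distribution
        return sum(p * g(x) for x, p in zip(values, probs))
    phi = E(lambda x: cmath.exp(1j * y * x))  # characteristic function of X at y
    lhs = abs(phi - 1 - 1j * y * E(lambda x: x) + 0.5 * y * y * E(lambda x: x * x))
    rhs = eta * abs(y) ** 3 * E(lambda x: x * x) / 6 + y * y * E(lambda x: psi(x, eta))
    return lhs, rhs

values, probs = [-1.0, 0.5, 2.0], [0.25, 0.5, 0.25]
for y in (-3.0, -0.7, 0.4, 1.0, 5.0):
    for eta in (0.0, 0.3, 1.0, 3.0):
        lhs, rhs = yj_sides(values, probs, y, eta)
        print(f"y={y:5.1f} eta={eta:3.1f}  lhs={lhs:.4f}  rhs={rhs:.4f}")
```

The check rests on the pointwise estimate $|e^{it}-1-it+\frac12t^2|\le\min(\frac16|t|^3,t^2)$, applied with the cubic bound where $|X|\le\eta$ and the quadratic bound where $|X|>\eta$.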
286 Carleson’s theorem

Carleson’s theorem (Carleson 66) was the (unexpected) solution to a long-standing problem. Remarkably, it can be proved by ‘elementary’ arguments. The hardest part of the work below, in 286I-286L, involves only the laborious verification of inequalities. How the inequalities were chosen is a different matter; for once, some of the ideas of the proof lie in the statements of the lemmas. The argument here is a greatly expanded version of Lacey & Thiele 00.

The Hardy-Littlewood Maximal Theorem (286A) is important, and worth learning even if you leave the rest of the section as an unexamined monument. I bring 286B-286D forward to the beginning of the section, even though they are little more than worked exercises, because they also have potential uses in other contexts.

The complexity of the argument is such that it is useful to introduce a substantial number of special notations. Rather than include these in the general index, I give a list in 286W. Among them are ten constants $C_1,\dots,C_{10}$. The values of these numbers are of no significance. The method of proof here is quite inappropriate if we want to estimate rates of convergence. I give recipes for the calculation of the $C_n$ only for the sake of the linear logic in which this treatise is written, and because they occasionally offer clues concerning the tactics being used.

In this section all integrals are with respect to Lebesgue measure $\mu$ on $\mathbb{R}$ unless otherwise stated.

286A The Maximal Theorem Suppose that $1<p<\infty$ and that $f\in\mathcal L^p_{\mathbb C}(\mu)$ (definition: 244O). Set
$$f^*(x)=\sup\Bigl\{\frac1{b-a}\int_a^b|f|:a\le x\le b,\ a<b\Bigr\}$$
for $x\in\mathbb{R}$. Then $\|f^*\|_p\le2^{1/p}\dfrac{p}{p-1}\|f\|_p$.
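A discrete analogue of the Maximal Theorem can be checked numerically. In the sketch below (an illustration under assumptions, not part of the proof) a non-negative sequence plays the role of $f$, `right_maximal` and `left_maximal` are hypothetical discrete counterparts of the one-sided maximal functions $f_1^*$, $f_2^*$ used in the proof, and counting measure replaces Lebesgue measure. For sequences the same weak-type estimate ($t\cdot\#\{f_1^*>t\}\le\sum_{f_1^*>t}f$) followed by the layer-cake computation of part (c) gives the constant $\frac{p}{p-1}$ for each one-sided function and $2^{1/p}\frac{p}{p-1}$ for their maximum; the code only tests this on one random example.

```python
import random

def right_maximal(f):
    """Discrete analogue of f_1^*: entry i is max over j >= i of mean(f[i..j])."""
    n = len(f)
    prefix = [0.0]
    for v in f:
        prefix.append(prefix[-1] + v)
    return [max((prefix[j + 1] - prefix[i]) / (j - i + 1) for j in range(i, n))
            for i in range(n)]

def left_maximal(f):
    """Discrete analogue of f_2^*: entry i is max over j <= i of mean(f[j..i])."""
    return right_maximal(f[::-1])[::-1]

def lp_norm(g, p):
    return sum(abs(v) ** p for v in g) ** (1.0 / p)

rng = random.Random(0)
f = [10 * rng.random() ** 3 for _ in range(200)]   # non-negative test data
for p in (1.5, 2.0, 4.0):
    bound = p / (p - 1)
    R, L = right_maximal(f), left_maximal(f)
    one_sided = lp_norm(R, p) / lp_norm(f, p)
    two_sided = lp_norm([max(a, b) for a, b in zip(R, L)], p) / lp_norm(f, p)
    print(p, round(one_sided, 3), "<=", round(bound, 3), "|",
          round(two_sided, 3), "<=", round(2 ** (1 / p) * bound, 3))
```

The prefix sums make each average an $O(1)$ lookup, so the brute-force maximum over $j$ costs $O(n^2)$, which is enough for a few hundred points.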
proof (a) It is enough to consider the case $f\ge0$. Note that if $E\subseteq\mathbb{R}$ has finite measure, then
$$\int_Ef=\int(f\times\chi E)\times\chi E\le\|f\times\chi E\|_p(\mu E)^{1/q}\le\|f\|_p(\mu E)^{1/q}$$
is finite, where $q=\frac{p}{p-1}$, by Hölder’s inequality (244Eb). Consequently, if $t>0$ and $\int_Ef\ge t\mu E$, we must have $t\mu E\le\|f\times\chi E\|_p(\mu E)^{1/q}$ and
$$\mu E=(\mu E)^{p-p/q}\le\frac1{t^p}\|f\times\chi E\|_p^p=\frac1{t^p}\int_Ef^p.$$
(b) For $t>0$, set $G_t=\{x:\int_x^af>(a-x)t\text{ for some }a>x\}$.

(i) $G_t$ is an open set. P P For any $a\in\mathbb{R}$,
$$G_{ta}=\Bigl\{x:x<a,\ \int_x^af>(a-x)t\Bigr\}$$
is open, because $x\mapsto\int_x^af$ and $x\mapsto(a-x)t$ are continuous (225A); so $G_t=\bigcup_{a\in\mathbb{R}}G_{ta}$ is open. Q Q
(ii) By 2A2I, there is a partition $\mathcal C$ of $G_t$ into open intervals. Now $C$ is bounded and $t\mu C\le\int_Cf$ for every $C\in\mathcal C$. P P Express $C$ as $]a,b[$ (for the moment, we have to allow for the possibility that one or both of $a$, $b$ is infinite).

($\alpha$) If $x\in C$, there is some (finite) $c>x$ such that $\int_x^cf>(c-x)t$. Set $d=\min(b,c)>x$. If $d=c$, then of course $\int_x^df>(d-x)t$. If $d=b<c$, then (because $b\notin G_t$) $\int_b^cf\le(c-b)t$, so again
$$\int_x^df=\int_x^bf=\int_x^cf-\int_b^cf>(c-x)t-(c-b)t=(b-x)t=(d-x)t.$$
Thus we always have some $d\in\,]x,b]$ such that $\int_x^df>(d-x)t$.

($\beta$) Now take any $z\in C$, and consider
$$A_z=\Bigl\{x:z\le x\le b,\ \int_z^xf\ge(x-z)t\Bigr\}.$$
Then $z\in A_z$, and $A_z$ is closed, again because the functions $x\mapsto\int_z^xf$ and $x\mapsto(x-z)t$ are continuous. Moreover, $A_z$ is bounded, because $x-z\le\frac1{t^p}\|f\|_p^p$ for every $x\in A_z$, by (a). ?? If $\sup A_z=x_0<b$, then $x_0\in A_z$, and there is a $d\in\,]x_0,b]$ such that $\int_{x_0}^df\ge t(d-x_0)$, by ($\alpha$); but in this case $d\in A_z$, which is impossible. X X Thus $b=\sup A_z\in A_z$ (in particular, $b<\infty$), and $\int_z^bf\ge(b-z)t$.

($\gamma$) Letting $z$ decrease to $a$, we see that $b-a\le\frac1{t^p}\|f\|_p^p$, so $a$ is finite, and also
$$\int_a^bf=\lim_{z\downarrow a}\int_z^bf\ge\lim_{z\downarrow a}(b-z)t=(b-a)t,$$
as required. Q Q

(iii) Accordingly, because $\mathcal C$ is countable and $f$ is non-negative,
$$\mu G_t=\sum_{C\in\mathcal C}\mu C\le\sum_{C\in\mathcal C}\frac1{t^p}\int_Cf^p\le\frac1{t^p}\int_{-\infty}^\infty f^p$$
is finite, and
$$\int_{G_t}f=\sum_{C\in\mathcal C}\int_Cf\ge\sum_{C\in\mathcal C}t\mu C=t\mu G_t.$$
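(c) All this is true for every $t>0$. Now if we set
$$f_1^*(x)=\sup_{a>x}\frac1{a-x}\int_x^af$$
for $x\in\mathbb{R}$, we have $\{x:f_1^*(x)>t\}=G_t$ for every $t>0$. For any $t>0$,
$$\frac1pt\mu G_t=\Bigl(1-\frac1q\Bigr)t\mu G_t\le\int_{G_t}\Bigl(f-\frac1qt\mathbf 1\Bigr)\le\int_{-\infty}^\infty\Bigl(f-\frac1qt\mathbf 1\Bigr)^+,$$
writing $\mathbf 1$ for the function with constant value 1. So
$$\begin{aligned}\int_{-\infty}^\infty(f_1^*)^p&=\int_0^\infty\mu\{x:f_1^*(x)^p>t\}\,dt\qquad\text{(see 252O)}\\&=p\int_0^\infty u^{p-1}\mu\{x:f_1^*(x)>u\}\,du\qquad(\text{substituting }t=u^p)\\&=p\int_0^\infty u^{p-1}\mu G_u\,du\le p^2\int_0^\infty u^{p-2}\Bigl(\int_{-\infty}^\infty\bigl(f-\tfrac1qu\mathbf 1\bigr)^+\Bigr)du\\&=p^2\int_{-\infty}^\infty\int_0^\infty\max\bigl(0,f(x)-\tfrac1qu\bigr)u^{p-2}\,du\,dx\end{aligned}$$
(by Fubini’s theorem, 252B, because $(x,u)\mapsto u^{p-2}\max(0,f(x)-\frac1qu)$ is measurable and non-negative)
$$=p^2\int_{-\infty}^\infty\int_0^{qf(x)}u^{p-2}\Bigl(f(x)-\frac1qu\Bigr)du\,dx=\frac{p^2q^{p-1}}{p(p-1)}\int_{-\infty}^\infty f(x)^p\,dx=\Bigl(\frac p{p-1}\Bigr)^p\|f\|_p^p.$$
(d) Similarly, setting
$$f_2^*(x)=\sup_{a<x}\frac1{x-a}\int_a^xf$$
for $x\in\mathbb{R}$,
$$\int_{-\infty}^\infty(f_2^*)^p\le\Bigl(\frac p{p-1}\Bigr)^p\|f\|_p^p.$$
But $f^*=\max(f_1^*,f_2^*)$.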
P P Of course $f_1^*\le f^*$ and $f_2^*\le f^*$. But also, if $f^*(x)>t$, there must be a non-trivial interval $I$ containing $x$ such that $\int_If>t\mu I$; if $a=\inf I$ and $b=\sup I$, then either $\int_a^xf>(x-a)t$ and $f_2^*(x)>t$, or $\int_x^bf>(b-x)t$ and $f_1^*(x)>t$. As $x$ and $t$ are arbitrary, $f^*=\max(f_1^*,f_2^*)$. Q Q Accordingly
$$\|f^*\|_p^p=\int_{-\infty}^\infty(f^*)^p=\int_{-\infty}^\infty\max((f_1^*)^p,(f_2^*)^p)\le\int_{-\infty}^\infty(f_1^*)^p+(f_2^*)^p\le2\Bigl(\frac p{p-1}\Bigr)^p\|f\|_p^p.$$
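Taking $p$th roots, we have the inequality we seek.

286B Lemma Let $g:\mathbb{R}\to[0,\infty[$ be a function which is non-decreasing on $]-\infty,\alpha]$, non-increasing on $[\beta,\infty[$ and constant on $[\alpha,\beta]$, where $\alpha\le\beta$. Then for any measurable function $f:\mathbb{R}\to[0,\infty]$,
$$\int_{-\infty}^\infty f\times g\le\int_{-\infty}^\infty g\cdot\sup_{a\le\alpha,\,b\ge\beta,\,a<b}\frac1{b-a}\int_a^bf.$$
[…] $0$ such that $|\phi(x)|\le C_1\min(w(3),w(x)^2)$ for every $x\in\mathbb{R}$ (because $\lim_{x\to\infty}x^6\phi(x)=\lim_{x\to-\infty}x^6\phi(x)=0$). For $\sigma\in Q$, set $w_\sigma=2^{k_\sigma}S_{-x_\sigma}D_{2^{k_\sigma}}w$, so that $w_\sigma(x)=2^{k_\sigma}w(2^{k_\sigma}(x-x_\sigma))$ for every $x$. Elementary calculations show that

(i) $w_\sigma$ depends only on $I_\sigma$;

(ii) $\int_{-\infty}^\infty w_\sigma=\int_{-\infty}^\infty w=1$ for every $\sigma$;

(iii) $|\phi_\sigma(x)|\le C_1\min(2^{-k_\sigma/2}w_\sigma(x),\,2^{-3k_\sigma/2}w_\sigma(x)^2)$ for every $x$ and $\sigma$ (because $|\phi(x)|\le C_1w(x)^2\le C_1w(x)$ for every $x\in\mathbb{R}$).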
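286F Two partial orders (a) For $\sigma,\tau\in Q$ say that $\sigma\le\tau$ if $I_\sigma\subseteq I_\tau$ and $J_\tau\subseteq J_\sigma$. Then $\le$ is a partial order on $Q$. We have the following elementary facts.

(i) If $\sigma\le\tau$, then $k_\sigma\ge k_\tau$.

(ii) If $\sigma$ and $\tau$ are incomparable (that is, $\sigma\not\le\tau$ and $\tau\not\le\sigma$), then $(I_\sigma\times J_\sigma)\cap(I_\tau\times J_\tau)$ is empty. P P We may suppose that $k_\sigma\le k_\tau$. If $J_\sigma\cap J_\tau\ne\emptyset$, then $J_\sigma\subseteq J_\tau$, because both are dyadic intervals, and $J_\sigma$ is the shorter; but as $\tau\not\le\sigma$, this means that $I_\tau\not\subseteq I_\sigma$ and $I_\tau\cap I_\sigma=\emptyset$. Q Q

(iii) If $\sigma$, $\sigma'$ are incomparable and both less than or equal to $\tau$, then $I_\sigma\cap I_{\sigma'}=\emptyset$, because $J_\tau\subseteq J_\sigma\cap J_{\sigma'}$.

(iv) If $\sigma\le\tau$ and $k_\sigma\ge k\ge k_\tau$, then there is a (unique) $\sigma'$ such that $\sigma\le\sigma'\le\tau$ and $k_{\sigma'}=k$. (The point is that there is a unique $I\in\mathcal I$ such that $I_\sigma\subseteq I\subseteq I_\tau$ and $\mu I=2^{-k}$; and similarly there is just one candidate for $J_{\sigma'}$.)

(b) For $\sigma,\tau\in Q$ say that $\sigma\le_r\tau$ if $I_\sigma\subseteq I_\tau$ and $J_\tau^r\subseteq J_\sigma^r$ (that is, either $\tau=\sigma$ or $J_\tau\subseteq J_\sigma^r$), so that, in particular, $\sigma\le\tau$. Note that if $\sigma,\sigma'\le_r\tau$ and $k_\sigma\ne k_{\sigma'}$ then $J_\sigma^r\cap J_{\sigma'}^r\ne\emptyset$, so $(\phi_\sigma|\phi_{\sigma'})=0$ (286E(b-iii)).

(c) It will be convenient to have a shorthand for the following: if $P,R\subseteq Q$, say that $P\preccurlyeq R$ if for every $\sigma\in P$ there is a $\tau\in R$ such that $\sigma\le\tau$.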
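Facts (i) and (ii) of 286Fa can be confirmed by brute force on a small family of tiles. The sketch assumes a hypothetical encoding, not taken from the text: a tile is a triple $(k,m,n)$ with $I=[2^{-k}m,2^{-k}(m+1)[$ and $J=[2^kn,2^k(n+1)[$, so that $\mu I\cdot\mu J=1$. Exact rational arithmetic (`fractions.Fraction`) avoids rounding in the interval comparisons; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical encoding (an assumption): the tile with parameters (k, m, n) has
# I = [m 2^-k, (m+1) 2^-k[ and J = [n 2^k, (n+1) 2^k[.
def tile(k, m, n):
    s = Fraction(2) ** k
    return (k, (m / s, (m + 1) / s), (n * s, (n + 1) * s))

def contains(outer, inner):
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def meets(a, b):
    # half-open intervals [a0, a1[ and [b0, b1[ have a common point
    return max(a[0], b[0]) < min(a[1], b[1])

def leq(s, t):
    # sigma <= tau iff I_sigma is inside I_tau and J_tau is inside J_sigma (286Fa)
    return contains(t[1], s[1]) and contains(s[2], t[2])

def tiles_in_box(ks, side=4):
    """All tiles with k in ks whose I and J both lie in [0, side[."""
    out = []
    for k in ks:
        s = Fraction(2) ** k
        out += [tile(k, m, n) for m in range(int(side * s)) for n in range(int(side / s))]
    return out

def check_286F(tiles):
    for s, t in product(tiles, repeat=2):
        if leq(s, t):
            assert s[0] >= t[0]  # (i): k_sigma >= k_tau
        elif not leq(t, s):
            # (ii): incomparable tiles have disjoint rectangles I x J
            assert not (meets(s[1], t[1]) and meets(s[2], t[2]))
    return True

print(check_286F(tiles_in_box([-1, 0, 1, 2])))  # True
```

The check is exhaustive only for the 64 tiles generated; the general statement rests on the nested-or-disjoint property of dyadic intervals used in the proof above.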
286G We shall need the results of some elementary calculations. The first three are nearly trivial.

Lemma (a) For any $m\in\mathbb{N}$, $\sum_{n=m}^\infty w(n+\frac12)\le\frac1{2(1+m)^2}$.

(b) Suppose that $\sigma\in P$ and that $I$ is an interval not containing $x_\sigma$ in its interior. Then $\int_Iw_\sigma\ge w_\sigma(x)\mu I$, where $x$ is the midpoint of $I$.

(c) For any $x\in\mathbb{R}$, $\sum_{n=-\infty}^\infty w(x-n)\le2$.

(d) There is a constant $C_2\ge0$ such that $\int_{-\infty}^\infty w(x)w(\alpha x+\beta)\,dx\le C_2w(\beta)$ whenever $0\le\alpha\le1$ and $\beta\in\mathbb{R}$.

(e) There is a constant $C_3\ge0$ such that $|(\phi_\sigma|\phi_\tau)|\le2^{-k_\sigma/2}2^{k_\tau/2}C_3\int_{I_\tau}w_\sigma(x)\,dx$ whenever $\sigma,\tau\in Q$ and $k_\sigma\le k_\tau$.

(f) There is a constant $C_4\ge0$ such that whenever $\tau\in Q$ and $k\in\mathbb{Z}$, then $\sum_{\sigma\in Q,\,\sigma\le\tau,\,k_\sigma=k}\int_{\mathbb{R}\setminus I_\tau}w_\sigma\le C_4$.

proof (a) The point is just that $w$ is convex on $]-\infty,0]$ and $[0,\infty[$. So we can apply 233Ib with $f(x)=x$, or argue directly from the fact that $w(n+\frac12)\le\frac12(w(n+\frac12+x)+w(n+\frac12-x))$ for $|x|\le\frac12$, to see that $w(n+\frac12)\le\int_n^{n+1}w$ for every $n\ge0$. Accordingly
$$\sum_{n=m}^\infty w(n+\tfrac12)\le\int_m^\infty w=\frac1{2(1+m)^2}.$$
(b) Similarly, because $I$ lies all on the same side of $x_\sigma$, $w_\sigma$ is convex on $I$, so the same inequality yields $w_\sigma(x)\mu I\le\int_Iw_\sigma$.

(c) Let $m$ be such that $|x-m|\le\frac12$. Then, using the same inequalities as before to estimate $w(x-n)$ for $n\ne m$, we have
$$\sum_{n=-\infty}^\infty w(x-n)\le w(x-m)+\int_{-\infty}^{x-m-\frac12}w(t)\,dt+\int_{x-m+\frac12}^\infty w(t)\,dt\le1+\int_{-\infty}^\infty w(x)\,dx=2.$$
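Parts (a) and (c) of 286G can be spot-checked numerically. The sketch assumes $w(x)=(1+|x|)^{-3}$; that formula is not restated in this part of the text, but it is the form consistent with the identities used in 286G, namely $\int_\alpha^\infty w=\frac1{2(1+\alpha)^2}$ and $w(\frac12(1+\beta))/w(\beta)=8(1+\beta)^3/(3+\beta)^3$. Because every term is positive, the truncated sums computed here are lower bounds for the infinite ones, so finding them below the stated bounds is consistent with the lemma (though it does not prove it).

```python
def w(x):
    # Assumption: w(x) = (1 + |x|)^(-3), consistent with the formulas used in 286G
    return (1.0 + abs(x)) ** -3

def tail_sum(m, terms=20000):
    """Partial sum of sum_{n >= m} w(n + 1/2), for comparison with 286G(a)."""
    return sum(w(n + 0.5) for n in range(m, m + terms))

def periodized(x, width=5000):
    """Partial sum of sum_{n in Z} w(x - n) over |n - round(x)| <= width (286G(c))."""
    c = round(x)
    return sum(w(x - n) for n in range(c - width, c + width + 1))

for m in range(5):
    print(m, round(tail_sum(m), 4), "<=", 1 / (2 * (1 + m) ** 2))
print(round(periodized(0.0), 4), round(periodized(0.5), 4))  # both should be <= 2
```

At $x=0$ the periodized sum is $1+2(\zeta(3)-1)\approx1.404$, comfortably below the bound 2 obtained in (c).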
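(d)(i) The first step is to note that
$$\frac{w(\frac12(1+\beta))}{w(\beta)}=\frac{8(1+\beta)^3}{(3+\beta)^3}\le8$$
for every $\beta\ge0$. Now $\alpha w(\alpha+\alpha\beta)\le4w(\beta)$ whenever $\beta\ge0$ and $\alpha\ge\frac12$. P P For $t\ge\frac12$,
$$\frac d{dt}\,tw(t+t\beta)=\frac{1-2t(1+\beta)}{(1+t+t\beta)^4}\le0,$$
so $\alpha w(\alpha+\alpha\beta)\le\frac12w(\frac12+\frac12\beta)\le4w(\beta)$. Q Q Of course this means that
$$\frac1\alpha w\Bigl(\frac{1+\beta}{2\alpha}\Bigr)\le8w(\beta)$$
whenever $\beta\ge0$ and $0<\alpha\le1$.

(ii) Try $C_2=16$. If $0<\alpha\le1$ and $\beta\ge0$, set $\gamma=\frac{1+\beta}{2\alpha}$. Then, for any $x\ge-\gamma$,
$$1+\alpha x+\beta=(1+\beta)\Bigl(1+\frac{\alpha x}{1+\beta}\Bigr)\ge\frac12(1+\beta),$$
so $w(\alpha x+\beta)\le8w(\beta)$ and
$$\int_{-\gamma}^\infty w(x)w(\alpha x+\beta)\,dx\le8w(\beta)\int_{-\gamma}^\infty w\le8w(\beta).$$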
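On the other hand,
$$\int_{-\infty}^{-\gamma}w(x)w(\alpha x+\beta)\,dx\le w(\gamma)\int_{-\infty}^\infty w(\alpha x+\beta)\,dx=\frac1\alpha w\Bigl(\frac{1+\beta}{2\alpha}\Bigr)\int_{-\infty}^\infty w\le8w(\beta).$$
Putting these together, $\int_{-\infty}^\infty w(x)w(\alpha x+\beta)\,dx\le16w(\beta)$; and this is true whenever $0<\alpha\le1$ and $\beta\ge0$.

(iii) If $\alpha=0$, then
$$\int_{-\infty}^\infty w(x)w(\alpha x+\beta)\,dx=w(\beta)\int_{-\infty}^\infty w(x)\,dx=w(\beta)\le C_2w(\beta)$$
for any $\beta$. If $0<\alpha\le1$ and $\beta<0$, then
$$\int_{-\infty}^\infty w(x)w(\alpha x+\beta)\,dx=\int_{-\infty}^\infty w(-x)w(-\alpha x-\beta)\,dx$$
(because $w$ is an even function)
$$=\int_{-\infty}^\infty w(x)w(\alpha x-\beta)\,dx\le C_2w(-\beta)$$
(by (ii) above) $=C_2w(\beta)$. So we have the required inequality in all cases.

(e) Set $C_3=\max\bigl(C_1^2C_2,\ \|\phi\|_2^2/\int_{-1/2}^{1/2}w\bigr)$.

(i) It is worth disposing immediately of the case $\sigma=\tau$. In this case, $|(\phi_\sigma|\phi_\tau)|=\|\phi_\sigma\|_2^2=\|\phi\|_2^2$, while
$$\int_{I_\tau}w_\sigma=2^{k_\sigma}\int_{x_\sigma-2^{-k_\sigma-1}}^{x_\sigma+2^{-k_\sigma-1}}w(2^{k_\sigma}(x-x_\sigma))\,dx=\int_{-1/2}^{1/2}w(x)\,dx,$$
so certainly $|(\phi_\sigma|\phi_\tau)|\le C_3\int_{I_\tau}w_\sigma$.

(ii) Now suppose that $I_\sigma\ne I_\tau$. In this case, because $k_\sigma\le k_\tau$, $I_\tau$ must all lie on the same side of $x_\sigma$, so $\int_{I_\tau}w_\sigma\ge w_\sigma(x_\tau)\mu I_\tau$, by (b). We know from 286E(c-iii) that $|\phi_\sigma(x)|\le2^{-k_\sigma/2}C_1w_\sigma(x)$ for every $x$. So
$$\begin{aligned}|(\phi_\sigma|\phi_\tau)|&\le2^{-k_\sigma/2}2^{-k_\tau/2}C_1^2\int_{-\infty}^\infty w_\sigma(x)w_\tau(x)\,dx\\&=2^{k_\sigma/2}2^{k_\tau/2}C_1^2\int_{-\infty}^\infty w(2^{k_\sigma}(x-x_\sigma))w(2^{k_\tau}(x-x_\tau))\,dx\\&=2^{k_\sigma/2}2^{-k_\tau/2}C_1^2\int_{-\infty}^\infty w(2^{k_\sigma-k_\tau}x+2^{k_\sigma}(x_\tau-x_\sigma))w(x)\,dx\\&\le2^{k_\sigma/2}2^{-k_\tau/2}C_1^2C_2\,w(2^{k_\sigma}(x_\tau-x_\sigma))\end{aligned}$$
(by (d), since $2^{k_\sigma-k_\tau}\le1$)
$$\le2^{-k_\sigma/2}2^{-k_\tau/2}C_3w_\sigma(x_\tau)\le2^{-k_\sigma/2}2^{k_\tau/2}C_3\int_{I_\tau}w_\sigma,$$
as required.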
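The constant $C_2=16$ of 286G(d) can be compared with numerically computed values of $\int w(x)w(\alpha x+\beta)\,dx$. The sketch again assumes $w(x)=(1+|x|)^{-3}$ and uses a plain midpoint rule on $[-L,L]$; the neglected tails contribute less than $\int_{|x|>L}w=(1+L)^{-2}$. It is a sanity check of the inequality on a few sample points, not a verification of the proof; the helper name `overlap` is illustrative.

```python
def w(x):
    # Same assumed form as in 286G: w(x) = (1 + |x|)^(-3)
    return (1.0 + abs(x)) ** -3

def overlap(alpha, beta, L=200.0, steps=20000):
    """Midpoint-rule approximation to the integral of w(x) w(alpha x + beta)
    over [-L, L]; the omitted tails are smaller than 1/(1+L)^2."""
    h = 2 * L / steps
    total = 0.0
    for i in range(steps):
        x = -L + (i + 0.5) * h
        total += w(x) * w(alpha * x + beta)
    return h * total

C2 = 16
for alpha in (0.0, 0.1, 0.5, 1.0):
    for beta in (-20.0, -3.0, 0.0, 2.5, 50.0):
        ratio = overlap(alpha, beta) / w(beta)
        print(f"alpha={alpha:3.1f} beta={beta:6.1f}  integral/w(beta) = {ratio:.3f}")
```

In these samples the ratio stays far below 16, which suggests that the constant produced by the splitting argument in (ii) is generous; only its finiteness matters for what follows.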
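(f) Set $C_4=2\sum_{j=0}^\infty\int_{j+\frac12}^\infty w(x)\,dx$; this is finite because $\int_\alpha^\infty w(x)\,dx=\frac1{2(1+\alpha)^2}$ for every $\alpha\ge0$.

If $k<k_\tau$ then $k_\sigma\ne k$ for any $\sigma\le\tau$, so the result is trivial. If $k\ge k_\tau$, then for each dyadic subinterval $I$ of $I_\tau$ of length $2^{-k}$ there is exactly one $\sigma\le\tau$ such that $I_\sigma=I$. List these as $\sigma_0,\dots$ in ascending order of the centres $x_{\sigma_j}$, so that if $I_\tau=[2^{-k_\tau}m,2^{-k_\tau}(m+1)[$ then $x_{\sigma_j}=2^{-k_\tau}m+2^{-k}(j+\frac12)$, for $j<2^{k-k_\tau}$. Now
$$\begin{aligned}\sum_{j=0}^{2^{k-k_\tau}-1}\int_{-\infty}^{2^{-k_\tau}m}w_{\sigma_j}(x)\,dx&=2^k\sum_{j=0}^{2^{k-k_\tau}-1}\int_{-\infty}^{2^{-k_\tau}m}w\bigl(2^k(x-2^{-k_\tau}m)-j-\tfrac12\bigr)\,dx\\&=\sum_{j=0}^{2^{k-k_\tau}-1}\int_{-\infty}^0w(x-j-\tfrac12)\,dx\le\sum_{j=0}^\infty\int_{j+\frac12}^\infty w(x)\,dx=\tfrac12C_4.\end{aligned}$$
Similarly (since $w$ is an even function, so the whole picture is symmetric about $x_\tau$)
$$\sum_{j=0}^{2^{k-k_\tau}-1}\int_{2^{-k_\tau}(m+1)}^\infty w_{\sigma_j}(x)\,dx\le\tfrac12C_4,$$
and
$$\sum_{\sigma\le\tau,\,k_\sigma=k}\int_{\mathbb{R}\setminus I_\tau}w_\sigma\le C_4,$$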
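as required.

286H ‘Mass’ and ‘energy’ (Lacey & Thiele 00) If $P$ is a subset of $Q$, $E\subseteq\mathbb{R}$ is measurable, $h:\mathbb{R}\to\mathbb{R}$ is measurable, and $f\in\mathcal L^2_{\mathbb C}$, set
$$\operatorname{mass}_{Eh}(P)=\sup_{\sigma\in P,\,\tau\in Q,\,\sigma\le\tau}\int_{E\cap h^{-1}[J_\tau]}w_\tau\le\sup_{\tau\in Q}\int_{-\infty}^\infty w_\tau=1,$$
$$\operatorname{energy}_f(P)=\sup_{\tau\in Q}2^{k_\tau/2}\sqrt{\sum_{\sigma\in P,\,\sigma\le_r\tau}|(f|\phi_\sigma)|^2}.$$
If $P'\subseteq P$ then $\operatorname{mass}_{Eh}(P')\le\operatorname{mass}_{Eh}(P)$ and $\operatorname{energy}_f(P')\le\operatorname{energy}_f(P)$. Note that $\operatorname{energy}_f(\{\sigma\})=2^{k_\sigma/2}|(f|\phi_\sigma)|$ for any $\sigma\in Q$, since if $\sigma\le_r\tau$ then $k_\tau\le k_\sigma$.

286I Lemma Set $C_5=2^{12}$. If $P\subseteq Q$ is finite, $E\subseteq\mathbb{R}$ is measurable, $h:\mathbb{R}\to\mathbb{R}$ is measurable, and $\gamma\ge\operatorname{mass}_{Eh}(P)$, then we can find sets $P_1\subseteq P$, $P_2\subseteq Q$ such that $\operatorname{mass}_{Eh}(P_1)\le\frac14\gamma$, $\gamma\sum_{\tau\in P_2}\mu I_\tau\le C_5\mu E$ and $P\setminus P_1\preccurlyeq P_2$ (in the notation of 286Fc).

proof (a) Set $P_1=\{\sigma:\sigma\in P,\ \operatorname{mass}_{Eh}(\{\sigma\})\le\frac14\gamma\}$. Then $\operatorname{mass}_{Eh}(P_1)\le\frac14\gamma$. If $\gamma=0$ we can stop here, as $P_1=P$. Otherwise, for each $\sigma\in P\setminus P_1$ let $\sigma'\in Q$ be such that $\sigma\le\sigma'$ and $\int_{E\cap h^{-1}[J_{\sigma'}]}w_{\sigma'}>\frac14\gamma$. Let $P_2$ be the set of elements of $\{\sigma':\sigma\in P\setminus P_1\}$ which are maximal for $\le$; then $P\setminus P_1\preccurlyeq P_2$.

(b) For $k\in\mathbb{N}$ set
$$R_k=\{\tau:\tau\in P_2,\ 2^{k_\tau}\mu(E\cap h^{-1}[J_\tau]\cap I_\tau^{(k)})\ge2^{2k-9}\gamma\},$$
where $I_\tau^{(k)}$ is the interval with the same centre as $I_\tau$ and $2^k$ times its length. Now $P_2=\bigcup_{k\in\mathbb{N}}R_k$. P P Take $\tau\in P_2$. If $k\in\mathbb{N}$ and $x\in\mathbb{R}\setminus I_\tau^{(k)}$, then $|x-x_\tau|\ge2^{k-k_\tau-1}$, so
$$w_\tau(x)=2^{k_\tau}w(2^{k_\tau}(x-x_\tau))\le2^{k_\tau}w(2^{k-1})=2^{k_\tau}(1+2^{k-1})^{-3}.$$
So
$$\begin{aligned}\tfrac14\gamma&<\int_{E\cap h^{-1}[J_\tau]}w_\tau=\int_{E\cap h^{-1}[J_\tau]\cap I_\tau}w_\tau+\sum_{k=0}^\infty\int_{E\cap h^{-1}[J_\tau]\cap I_\tau^{(k+1)}\setminus I_\tau^{(k)}}w_\tau\\&\le2^{k_\tau}\mu(E\cap h^{-1}[J_\tau]\cap I_\tau)+\sum_{k=0}^\infty2^{k_\tau}\mu(E\cap h^{-1}[J_\tau]\cap I_\tau^{(k+1)})(1+2^{k-1})^{-3}.\end{aligned}$$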
It follows that either $2^{k_\tau}\mu(E\cap h^{-1}[J_\tau]\cap I_\tau)\ge\frac18\gamma$ and $\tau\in R_0$, or there is some $k\in\mathbb{N}$ such that
$$2^{k_\tau}\mu(E\cap h^{-1}[J_\tau]\cap I_\tau^{(k+1)})(1+2^{k-1})^{-3}\ge2^{-k-4}\gamma$$
and
$$2^{k_\tau}\mu(E\cap h^{-1}[J_\tau]\cap I_\tau^{(k+1)})\ge(1+2^{k-1})^32^{-k-4}\gamma\ge2^{2k-7}\gamma,$$
so that $\tau\in R_{k+1}$. Q Q
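The alternatives in (b) come from a pigeonhole argument: since $\sum_{k\ge0}2^{-k-4}=\frac18$, if $\frac14\gamma<A+\sum_kB_k$ and $A<\frac18\gamma$ then some $B_k$ must exceed $2^{-k-4}\gamma$. The short exact-arithmetic sketch below (an illustration only, covering finitely many $k$) checks this sum, the comparison $(1+2^{k-1})^32^{-k-4}\ge2^{2k-7}$ that places $\tau$ in $R_{k+1}$, and the sum $\sum_k2^{11-k}=2^{12}=C_5$ used in part (d).

```python
from fractions import Fraction

def check_286Ib(K=60):
    """Exact checks, for k < K, of the arithmetic behind 286I(b)-(d)."""
    # partial sums of 2^(-k-4) stay below 1/8 (the full series sums to exactly 1/8)
    assert sum(Fraction(1, 2 ** (k + 4)) for k in range(K)) < Fraction(1, 8)
    # the R_0 threshold 2^(-9) gamma lies below gamma/8
    assert Fraction(1, 8) >= Fraction(1, 2 ** 9)
    for k in range(K):
        # (1 + 2^(k-1))^3 2^(-k-4) >= 2^(2k-7) = 2^(2(k+1)-9), the R_{k+1} threshold
        lhs = (1 + Fraction(2) ** (k - 1)) ** 3 * Fraction(2) ** (-k - 4)
        assert lhs >= Fraction(2) ** (2 * k - 7)
    # part (d): partial sums of 2^(11-k) stay below 2^12 = C_5
    assert sum(Fraction(2) ** (11 - k) for k in range(K)) < 2 ** 12
    return True

print(check_286Ib())  # True
```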
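(c) For every $k\in\mathbb{N}$, $\gamma\sum_{\tau\in R_k}\mu I_\tau\le2^{11-k}\mu E$. P P If $R_k=\emptyset$, this is trivial. Otherwise, enumerate $R_k$ as $\langle\tau_j\rangle_{j\le n}$ in such a way that $k_{\tau_j}\le k_{\tau_l}$ if $j\le l\le n$. Define $q:\{0,\dots,n\}\to\{0,\dots,n\}$ inductively by the rule
$$q(l)=\min\bigl(\{l\}\cup\{q(j):j<l,\ (I_{\tau_{q(j)}}^{(k)}\times J_{\tau_{q(j)}})\cap(I_{\tau_l}^{(k)}\times J_{\tau_l})\ne\emptyset\}\bigr)$$
for each $l\le n$. A simple induction shows that $q(q(l))=q(l)\le l$ for every $l\le n$. Note that, for $l\le n$, $I_{\tau_{q(l)}}^{(k)}\cap I_{\tau_l}^{(k)}\ne\emptyset$, so that
$$I_{\tau_l}\subseteq I_{\tau_l}^{(k)}\subseteq I_{\tau_{q(l)}}^{(k+2)},$$
because $\mu I_{\tau_l}^{(k)}\le\mu I_{\tau_{q(l)}}^{(k)}$. Moreover, if $j<l\le n$ and $q(j)=q(l)$, then both $J_{\tau_j}$ and $J_{\tau_l}$ meet $J_{\tau_{q(j)}}$, therefore include it, and $J_{\tau_j}\subseteq J_{\tau_l}$. But as $\tau_j$ and $\tau_l$ are distinct members of $P_2$, $\tau_l\not\le\tau_j$ and $I_{\tau_j}\cap I_{\tau_l}$ must be empty. Set $M=\{q(j):j\le n\}$. We have
$$\begin{aligned}\gamma\sum_{\tau\in R_k}\mu I_\tau&=\gamma\sum_{m\in M}\sum_{j\le n,\,q(j)=m}\mu I_{\tau_j}\le\gamma\sum_{m\in M}\mu I_{\tau_m}^{(k+2)}=2^{k+2}\gamma\sum_{m\in M}\mu I_{\tau_m}\\&\le2^{k+2}\sum_{m\in M}2^{9-2k}\mu(E\cap h^{-1}[J_{\tau_m}]\cap I_{\tau_m}^{(k)})\le2^{k+2}\cdot2^{9-2k}\mu E=2^{11-k}\mu E\end{aligned}$$
because if $l,m\in M$ and $l<m$ then $I_{\tau_l}^{(k)}\times J_{\tau_l}$ and $I_{\tau_m}^{(k)}\times J_{\tau_m}$ are disjoint (since otherwise $q(m)\le l$ and there can be no $j$ such that $q(j)=m$), so that $h^{-1}[J_{\tau_l}]\cap I_{\tau_l}^{(k)}$ and $h^{-1}[J_{\tau_m}]\cap I_{\tau_m}^{(k)}$ are disjoint. Q Q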
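The inductive rule defining $q$ in (c) is a greedy selection of representatives, and it can be run on arbitrary finite lists of rectangles. In the sketch below, generic half-open rectangles stand in for the sets $I_\tau^{(k)}\times J_\tau$ (an assumption made for illustration; the ordering by $k_\tau$ is not needed for the properties checked). The two facts used in the proof are that $q(q(l))=q(l)\le l$, and that the rectangles indexed by $M=\{q(j)\}$ are pairwise disjoint. The function names are hypothetical.

```python
from random import Random

def meets(a, b):
    # half-open intervals [a0, a1[ and [b0, b1[ have a common point
    return max(a[0], b[0]) < min(a[1], b[1])

def rect_meets(r, s):
    return meets(r[0], s[0]) and meets(r[1], s[1])

def select_representatives(rects):
    """rects[l] = (I_l, J_l), listed in a fixed order.
    q[l] = min({l} U {q[j] : j < l, rects[q[j]] meets rects[l]})."""
    q = []
    for l, r in enumerate(rects):
        candidates = [l] + [q[j] for j in range(l) if rect_meets(rects[q[j]], r)]
        q.append(min(candidates))
    return q

rng = Random(1)
rects = []
for _ in range(40):
    a, b = rng.uniform(0, 10), rng.uniform(0, 10)
    rects.append(((a, a + rng.uniform(0.1, 2)), (b, b + rng.uniform(0.1, 2))))
q = select_representatives(rects)
M = sorted(set(q))
print(len(M), "representatives for", len(rects), "rectangles")
```

Disjointness of the representatives follows exactly as in the text: if $l<m$ are both in $M$ and their rectangles met, then $l=q(l)$ would be a candidate for $q(m)$, forcing $q(m)\le l<m$.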
P τ ∈P2
µIτ ≤ γ
P∞ P k=0
τ ∈Rk
µIτ ≤ 212 µE,
as required. 286J Lemma If P ⊆ Q is finite and f ∈ L2C , then X X ¯ ¯ ¯(f |φσ )(φσ |φτ )(φτ |f )¯ ≤ C3 |(f |φσ )|2 σ,τ ∈P,Jσ =Jτ
σ∈P
≤ C3 k
X
(f |φσ )φσ k2 kf k2 .
σ∈P
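proof
$$\sum_{\sigma,\tau\in P,\,J_\sigma=J_\tau}|(f|\phi_\sigma)(\phi_\sigma|\phi_\tau)(\phi_\tau|f)|\le\sum_{\sigma,\tau\in P,\,J_\sigma=J_\tau}\tfrac12\bigl(|(f|\phi_\sigma)|^2+|(f|\phi_\tau)|^2\bigr)|(\phi_\sigma|\phi_\tau)|$$
(because $|\xi\zeta|\le\frac12(|\xi|^2+|\zeta|^2)$ for all complex numbers $\xi$, $\zeta$)
$$=\sum_{\sigma\in P}|(f|\phi_\sigma)|^2\sum_{\tau\in P,\,J_\sigma=J_\tau}|(\phi_\sigma|\phi_\tau)|\le\sum_{\sigma\in P}|(f|\phi_\sigma)|^2C_3\sum_{\tau\in P,\,J_\sigma=J_\tau}\int_{I_\tau}w_\sigma$$
(by 286Ge, since $k_\sigma=k_\tau$ if $J_\sigma=J_\tau$)
$$\le\sum_{\sigma\in P}|(f|\phi_\sigma)|^2C_3\int_{-\infty}^\infty w_\sigma$$
(because if $\tau$, $\tau'$ are distinct members of $P$ and $J_\tau=J_{\tau'}$, then $I_\tau$ and $I_{\tau'}$ are disjoint)
$$=C_3\sum_{\sigma\in P}|(f|\phi_\sigma)|^2=C_3\sum_{\sigma\in P}(f|\phi_\sigma)(\phi_\sigma|f)=C_3\Bigl(\sum_{\sigma\in P}(f|\phi_\sigma)\phi_\sigma\Bigm|f\Bigr)\le C_3\Bigl\|\sum_{\sigma\in P}(f|\phi_\sigma)\phi_\sigma\Bigr\|_2\|f\|_2$$
by Cauchy’s inequality (244Eb).

286K Lemma Set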
$$C_6=4(C_3+4C_3\sqrt{2C_4}).$$
Let $P\subseteq Q$ be a finite set, $f\in\mathcal L^2_{\mathbb C}$ and $\|f\|_2=1$. Suppose that $\gamma\ge\operatorname{energy}_f(P)$. Then we can find finite sets $P_1\subseteq P$ and $P_2\subseteq Q$ such that $\operatorname{energy}_f(P_1)\le\frac12\gamma$, $\gamma^2\sum_{\tau\in P_2}\mu I_\tau\le C_6$, and $P\setminus P_1\preccurlyeq P_2$.

proof (a) We may suppose that $\gamma>0$ and that $P\ne\emptyset$, since otherwise we can take $P_1=P$ and $P_2=\emptyset$.

(i) For $\tau\in Q$, $A\subseteq Q$ set
$$T_\tau=\{\sigma:\sigma\in P,\ \sigma\le_r\tau\},\qquad\Delta(A)=\sum_{\sigma\in A}|(f|\phi_\sigma)|^2.$$
There are only finitely many sets of the form $T_\tau$; let $R\subseteq Q$ be a non-empty finite set such that whenever $\tau\in Q$ and $T_\tau$ is not empty, there is a $\tau'\in R$ such that $T_\tau=T_{\tau'}$ and $k_{\tau'}\ge k_\tau$; this is possible because if $A\subseteq P$ is not empty then $k_\tau\le\min_{\sigma\in A}k_\sigma$ whenever $A=T_\tau$.

(ii) Choose $\tau_0,\tau_1,\dots,P'_0,P'_1,\dots$ inductively, as follows. $P'_0=P$. Given that $P'_j\subseteq P$ is not empty, consider
$$R_j=\{\tau:\tau\in R,\ 2^{k_\tau}\Delta(P'_j\cap T_\tau)\ge\tfrac14\gamma^2\}.$$
If $R_j=\emptyset$, stop the induction and set $n=j$, $P_2=\{\tau_l:l<j\}$, $P_1=P'_j$. Otherwise, among the members of $R_j$ take one with $y_\tau$ as far to the left as possible, and call it $\tau_j$; set $P'_{j+1}=P'_j\setminus\{\sigma:\sigma\in P,\ \sigma\le\tau_j\}$, and continue. Note that as $R_{j+1}\subseteq R_j$ for every $j$, $y_{\tau_{j+1}}\ge y_{\tau_j}$ for every $j$. The induction must stop at a finite stage because if it does not stop with $n=j$ then $\Delta(P'_j\cap T_{\tau_j})>0$, so $P'_j\cap T_{\tau_j}$ is not empty and $P'_{j+1}\subseteq P'_j\setminus T_{\tau_j}$ is a proper subset of $P'_j$, while $P'_0=P$ is finite. Since $R_n=\emptyset$,
$$\operatorname{energy}_f(P_1)=\operatorname{energy}_f(P'_n)=\sup_{\tau\in Q}2^{k_\tau/2}\sqrt{\Delta(P'_n\cap T_\tau)}=\max_{\tau\in R}2^{k_\tau/2}\sqrt{\Delta(P'_n\cap T_\tau)}\le\tfrac12\gamma.$$
We also have $P\setminus P_1\preccurlyeq\{\tau_j:j<n\}$.

(iii) Set $P''_j=P'_j\cap T_{\tau_j}\subseteq P'_j\setminus P'_{j+1}$ for $j<n$, so that $\langle P''_j\rangle$ […] $0$ and $y,z,\beta\in\mathbb{R}$, set $\theta'_{z\alpha\beta}(y)=\theta_{\alpha z+\beta}(\alpha y+\beta)$. Then

(a) the function $(\alpha,\beta,y,z)\mapsto\theta'_{z\alpha\beta}(y):\,]0,\infty[\times\mathbb{R}^3\to[0,1]$ is Borel measurable;

(b) for any rapidly decreasing test function $f$,
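$$2\pi|(\hat f\times\theta'_{z\alpha\beta})^\vee|\le D_{1/\alpha}A(M_\beta D_\alpha f)$$
(in the notation of 286C) at every point.

proof (a) We need only observe that $(y,z)\mapsto\theta_z(y):\mathbb{R}^2\to\mathbb{R}$ is Borel measurable, and that $(\alpha,\beta,y,z)\mapsto\theta'_{z\alpha\beta}(y)$ is built up from this, $+$ and $\times$.

(b) Set $v=\alpha z+\beta$, so that $\theta'_{z\alpha\beta}=D_\alpha S_\beta\theta_v$. Then
$$\begin{aligned}\hat f\times\theta'_{z\alpha\beta}&=\hat f\times D_\alpha S_\beta\theta_v=D_\alpha S_\beta(S_{-\beta}D_{1/\alpha}\hat f\times\theta_v)\\&=\alpha D_\alpha S_\beta(S_{-\beta}(D_\alpha f)^\wedge\times\theta_v)=\alpha D_\alpha S_\beta((M_\beta D_\alpha f)^\wedge\times\theta_v),\end{aligned}$$
so
$$\begin{aligned}(\hat f\times\theta'_{z\alpha\beta})^\vee&=\alpha\bigl(D_\alpha S_\beta((M_\beta D_\alpha f)^\wedge\times\theta_v)\bigr)^\vee=D_{1/\alpha}\bigl(S_\beta((M_\beta D_\alpha f)^\wedge\times\theta_v)\bigr)^\vee\\&=D_{1/\alpha}M_{-\beta}\bigl((M_\beta D_\alpha f)^\wedge\times\theta_v\bigr)^\vee\end{aligned}$$
and
$$2\pi|(\hat f\times\theta'_{z\alpha\beta})^\vee|=2\pi D_{1/\alpha}\bigl|\bigl((M_\beta D_\alpha f)^\wedge\times\theta_v\bigr)^\vee\bigr|\le D_{1/\alpha}A(M_\beta D_\alpha f)$$
by 286P.

286R Lemma For any $y,z\in\mathbb{R}$,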
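$$\tilde\theta_z(y)=\int_1^2\frac1\alpha\Bigl(\lim_{n\to\infty}\frac1n\int_0^n\theta'_{z\alpha\beta}(y)\,d\beta\Bigr)d\alpha$$
is defined, and $\tilde\theta_z(y)=\tilde\theta_1(0)>0$ if $y<z$, $=0$ if $y\ge z$.

proof (a) The case $y\ge z$ is trivial; because if $y\ge z$ then $\alpha y+\beta\ge\alpha z+\beta$ for all $\alpha>0$ and $\beta\in\mathbb{R}$, so that $\theta'_{z\alpha\beta}(y)=0$ for every $\alpha>0$, $\beta\in\mathbb{R}$ and $\tilde\theta_z(y)=0$. For the rest of the proof, therefore, I look at the case $y<z$.

(b)(i) Given $y<z\in\mathbb{R}$ and $\alpha>0$, set $l=\lfloor\log_2(20\alpha(z-y))\rfloor$. Then $\theta'_{z,\alpha,\beta+2^l}(y)=\theta'_{z\alpha\beta}(y)$ for every $\beta\in\mathbb{R}$. P P If $\theta'_{z\alpha\beta}(y)=\theta_{\alpha z+\beta}(\alpha y+\beta)$ is non-zero, there must be $k,m\in\mathbb{Z}$ such that
$$2^k(m+\tfrac12)\le\alpha z+\beta<2^k(m+1)\quad\text{and}\quad\hat\phi\bigl(2^{-k}(\alpha y+\beta)-(m+\tfrac14)\bigr)^2=\theta'_{z\alpha\beta}(y)\ne0,$$
so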
$$2^km\le\alpha y+\beta\le2^k(m+\tfrac9{20})$$
because $\hat\phi$ is zero outside $[-\frac15,\frac15]$. In this case, $\frac1{20}\cdot2^k<\alpha(z-y)$, so that $k\le l$. We therefore have
$$2^k(m+2^{l-k}+\tfrac12)\le\alpha z+\beta+2^l<2^k(m+2^{l-k}+1),$$
$$2^k(m+2^{l-k})\le\alpha y+\beta+2^l<2^k(m+2^{l-k}+\tfrac12),$$
so
$$\theta'_{z,\alpha,\beta+2^l}(y)=\hat\phi\bigl(2^{-k}(\alpha y+\beta+2^l)-(m+2^{l-k}+\tfrac14)\bigr)^2=\theta'_{z\alpha\beta}(y).$$
Similarly,
$$2^k(m-2^{l-k}+\tfrac12)\le\alpha z+\beta-2^l<2^k(m-2^{l-k}+1),$$
$$2^k(m-2^{l-k})\le\alpha y+\beta-2^l<2^k(m-2^{l-k}+\tfrac12),$$
so
$$\theta'_{z,\alpha,\beta-2^l}(y)=\hat\phi\bigl(2^{-k}(\alpha y+\beta-2^l)-(m-2^{l-k}+\tfrac14)\bigr)^2=\theta'_{z\alpha\beta}(y).$$
What this shows is that $\theta'_{z,\alpha,\beta+2^l}(y)=\theta'_{z\alpha\beta}(y)$ if either is non-zero, so we have the equality in any case. Q Q
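(ii) It follows that
$$g(\alpha,y,z)=\lim_{b\to\infty}\frac1b\int_0^b\theta'_{z\alpha\beta}(y)\,d\beta$$
is defined. P P Set
$$\gamma=2^{-l}\int_0^{2^l}\theta'_{z\alpha\beta}(y)\,d\beta.$$
From (i) we see that $\gamma=2^{-l}\int_{2^lm}^{2^l(m+1)}\theta'_{z\alpha\beta}(y)\,d\beta$ for every $m\in\mathbb{Z}$, and therefore that
$$\gamma=\frac1{2^lm}\int_0^{2^lm}\theta'_{z\alpha\beta}(y)\,d\beta$$
for every $m\ge1$. Now $\theta'_{z\alpha\beta}(y)$ is always greater than or equal to 0, so if $2^lm\le b\le2^l(m+1)$ then
$$\frac m{m+1}\gamma=\frac1{2^l(m+1)}\int_0^{2^lm}\theta'_{z\alpha\beta}(y)\,d\beta\le\frac1b\int_0^b\theta'_{z\alpha\beta}(y)\,d\beta\le\frac1{2^lm}\int_0^{2^l(m+1)}\theta'_{z\alpha\beta}(y)\,d\beta=\frac{m+1}m\gamma,$$
which approach $\gamma$ as $b\to\infty$. Q Q

(c) Because $(\alpha,\beta)\mapsto\theta'_{z\alpha\beta}(y)$ is Borel measurable, each of the functions $\alpha\mapsto\frac1n\int_0^n\theta'_{z\alpha\beta}(y)\,d\beta$, for $n\ge1$, is Borel measurable (putting 251L and 252P together), and $\alpha\mapsto g(\alpha,y,z):\,]0,\infty[\to\mathbb{R}$ is Borel measurable; at the same time, since $0\le\theta'_{z\alpha\beta}(y)\le1$ for all $\alpha$ and $\beta$, $0\le g(\alpha,y,z)\le1$ for every $\alpha$, and $\tilde\theta_z(y)=\int_1^2\frac1\alpha g(\alpha,y,z)\,d\alpha$ is defined in $[0,1]$.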
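(d) For any $y<z$, $\gamma\in\mathbb{R}$ and $\alpha>0$, $g(\alpha,y+\gamma,z+\gamma)=g(\alpha,y,z)$. P P It is enough to consider the case $\gamma\ge0$. In this case
$$\begin{aligned}g(\alpha,y+\gamma,z+\gamma)&=\lim_{b\to\infty}\frac1b\int_0^b\theta'_{z+\gamma,\alpha,\beta}(y+\gamma)\,d\beta=\lim_{b\to\infty}\frac1b\int_0^b\theta_{\alpha z+\alpha\gamma+\beta}(\alpha y+\alpha\gamma+\beta)\,d\beta\\&=\lim_{b\to\infty}\frac1b\int_{\alpha\gamma}^{b+\alpha\gamma}\theta_{\alpha z+\beta}(\alpha y+\beta)\,d\beta=\lim_{b\to\infty}\frac1b\int_{\alpha\gamma}^{b+\alpha\gamma}\theta'_{z\alpha\beta}(y)\,d\beta,\end{aligned}$$
so
$$|g(\alpha,y+\gamma,z+\gamma)-g(\alpha,y,z)|=\lim_{b\to\infty}\frac1b\Bigl|\int_{\alpha\gamma}^{b+\alpha\gamma}\theta'_{z\alpha\beta}(y)\,d\beta-\int_0^b\theta'_{z\alpha\beta}(y)\,d\beta\Bigr|\le\lim_{b\to\infty}\frac{2\alpha\gamma}b=0.$$
Q Q It follows that whenever $y<z$ and $\gamma\in\mathbb{R}$,
$$\tilde\theta_{z+\gamma}(y+\gamma)=\int_1^2\frac1\alpha g(\alpha,y+\gamma,z+\gamma)\,d\alpha=\int_1^2\frac1\alpha g(\alpha,y,z)\,d\alpha=\tilde\theta_z(y).$$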
(e) The next essential fact to note is that $\theta_{2z}(2y)$ is always equal to $\theta_z(y)$. P P If $\theta_z(y)\ne0$, then (as in (b) above) there are $k,m\in\mathbb{Z}$ such that
$$2^k(m+\tfrac12)\le z<2^k(m+1),\qquad2^km\le y\le2^k(m+\tfrac12),\qquad\theta_z(y)=\hat\phi\bigl(2^{-k}y-(m+\tfrac14)\bigr)^2.$$
In this case,
$$2^{k+1}(m+\tfrac12)\le2z<2^{k+1}(m+1),\qquad2^{k+1}m\le2y\le2^{k+1}(m+\tfrac12),$$
so
$$\theta_{2z}(2y)=\hat\phi\bigl(2^{-k-1}\cdot2y-(m+\tfrac14)\bigr)^2=\theta_z(y).$$
Similarly,
$$2^{k-1}(m+\tfrac12)\le\tfrac12z<2^{k-1}(m+1),\qquad2^{k-1}m\le\tfrac12y\le2^{k-1}(m+\tfrac12),$$
so
$$\theta_{\frac12z}(\tfrac12y)=\hat\phi\bigl(2^{-k+1}\cdot\tfrac12y-(m+\tfrac14)\bigr)^2=\theta_z(y).$$
This shows that $\theta_{2z}(2y)=\theta_z(y)$ if either is non-zero, and therefore in all cases. Q Q Accordingly
$$\theta'_{z,2\alpha,2\beta}(y)=\theta_{2\alpha z+2\beta}(2\alpha y+2\beta)=\theta_{\alpha z+\beta}(\alpha y+\beta)=\theta'_{z\alpha\beta}(y)$$
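for all $y,z,\beta\in\mathbb{R}$ and all $\alpha>0$.

(f) Consequently
$$\begin{aligned}g(2\alpha,y,z)&=\lim_{b\to\infty}\frac1b\int_0^b\theta'_{z,2\alpha,\beta}(y)\,d\beta=\lim_{b\to\infty}\frac2b\int_0^{b/2}\theta'_{z,2\alpha,2\beta}(y)\,d\beta\\&=\lim_{b\to\infty}\frac2b\int_0^{b/2}\theta'_{z\alpha\beta}(y)\,d\beta=\lim_{b\to\infty}\frac1b\int_0^b\theta'_{z\alpha\beta}(y)\,d\beta=g(\alpha,y,z)\end{aligned}$$
whenever $\alpha>0$ and $y,z\in\mathbb{R}$. It follows that
$$\int_\gamma^\delta\frac1\alpha g(\alpha,y,z)\,d\alpha=\int_\gamma^\delta\frac1\alpha g(2\alpha,y,z)\,d\alpha=\int_{2\gamma}^{2\delta}\frac1\alpha g(\alpha,y,z)\,d\alpha$$
whenever $0<\gamma\le\delta$, and therefore that
$$\int_\gamma^{2\gamma}\frac1\alpha g(\alpha,y,z)\,d\alpha=\int_1^2\frac1\alpha g(\alpha,y,z)\,d\alpha$$
for every $\gamma>0$. P P Take $k\in\mathbb{Z}$ such that $2^k\le\gamma<2^{k+1}$. Then
$$\begin{aligned}\int_\gamma^{2\gamma}\frac1\alpha g(\alpha,y,z)\,d\alpha&=\int_{2^k}^{2^{k+1}}\frac1\alpha g(\alpha,y,z)\,d\alpha-\int_{2^k}^\gamma\frac1\alpha g(\alpha,y,z)\,d\alpha+\int_{2^{k+1}}^{2\gamma}\frac1\alpha g(\alpha,y,z)\,d\alpha\\&=\int_{2^k}^{2^{k+1}}\frac1\alpha g(\alpha,y,z)\,d\alpha=\int_1^2\frac1\alpha g(\alpha,y,z)\,d\alpha.\end{aligned}$$
Q Q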
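(g) Now if $\alpha,\gamma>0$ and $y<z$,
$$g(\alpha,\gamma y,\gamma z)=\lim_{b\to\infty}\frac1b\int_0^b\theta_{\alpha\gamma z+\beta}(\alpha\gamma y+\beta)\,d\beta=g(\alpha\gamma,y,z).$$
So if $\gamma>0$ and $y<z$,
$$\begin{aligned}\tilde\theta_{\gamma z}(\gamma y)&=\int_1^2\frac1\alpha g(\alpha,\gamma y,\gamma z)\,d\alpha=\int_1^2\frac1\alpha g(\alpha\gamma,y,z)\,d\alpha\\&=\int_\gamma^{2\gamma}\frac1\alpha g(\alpha,y,z)\,d\alpha=\int_1^2\frac1\alpha g(\alpha,y,z)\,d\alpha=\tilde\theta_z(y).\end{aligned}$$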
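Putting this together with (d), we see that if $y<z$ then $\tilde\theta_z(y)=\tilde\theta_{z-y}(0)=\tilde\theta_1(0)$.

(h) I have still to check that $\tilde\theta_1(0)$ is not zero. But suppose that $1\le\alpha<\frac76$ and that there is some $m\in\mathbb{Z}$ such that $2(m+\frac1{12})\le\beta\le2(m+\frac5{12})$. Then $2(m+\frac12)\le\alpha+\beta<2(m+1)$, while $|\frac12\beta-(m+\frac14)|\le\frac16$, so
$$\theta_{\alpha+\beta}(\beta)=\hat\phi\bigl(\tfrac12\beta-(m+\tfrac14)\bigr)^2=1.$$
What this means is that, for $1\le\alpha<\frac76$,
$$g(\alpha,0,1)=\lim_{m\to\infty}\frac1{2m}\int_0^{2m}\theta_{\alpha+\beta}(\beta)\,d\beta\ge\lim_{m\to\infty}\frac1{2m}\sum_{j=0}^{m-1}\mu\bigl[2(j+\tfrac1{12}),2(j+\tfrac5{12})\bigr]=\frac13.$$
So
$$\tilde\theta_1(0)=\int_1^2\frac1\alpha g(\alpha,0,1)\,d\alpha\ge\int_1^{7/6}\frac1{3\alpha}\,d\alpha>0.$$
This completes the proof.

286S Lemma Suppose that $f\in\mathcal L^2_{\mathbb C}$.

(a) For every $x\in\mathbb{R}$,
$$(\tilde Af)(x)=\liminf_{n\to\infty}\frac1n\int_1^2\frac1\alpha\int_0^n(D_{1/\alpha}AM_\beta D_\alpha f)(x)\,d\beta\,d\alpha$$
is defined in $[0,\infty]$, and $\tilde Af:\mathbb{R}\to[0,\infty]$ is Borel measurable.

(b) $\int_F\tilde Af\le C_9\|f\|_2\sqrt{\mu F}$ whenever $\mu F<\infty$.

(c) If $f$ is a rapidly decreasing test function and $z\in\mathbb{R}$, $2\pi|(\hat f\times\tilde\theta_z)^\vee|\le\tilde Af$ at every point.

proof (a) The point here is that the function
$$(\alpha,\beta,x)\mapsto(D_{1/\alpha}AM_\beta D_\alpha f)(x):\,]0,\infty[\times\mathbb{R}^2\to[0,\infty]$$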
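is Borel measurable. P P
$$(D_{1/\alpha}AM_\beta D_\alpha f)(x)=(AM_\beta D_\alpha f)\Bigl(\frac x\alpha\Bigr)=\sup_{z\in\mathbb{R},\,P\subseteq Q\text{ is finite}}\ \sum_{\sigma\in P,\,z\in J_\sigma^r}\Bigl|(M_\beta D_\alpha f|\phi_\sigma)\phi_\sigma\Bigl(\frac x\alpha\Bigr)\Bigr|.$$
Look at the central term in this formula. For any $\sigma\in Q$, we have
$$(M_\beta D_\alpha f|\phi_\sigma)=\int_{-\infty}^\infty e^{i\beta t}f(\alpha t)\overline{\phi_\sigma(t)}\,dt=\frac1\alpha\int_{-\infty}^\infty e^{i\beta t/\alpha}f(t)\overline{\phi_\sigma(t/\alpha)}\,dt.$$
Now $\phi_\sigma$ is a rapidly decreasing test function, so there is some $\gamma\ge0$ such that $|\phi_\sigma(t)|\le\gamma/(1+t^2)$ for every $t\in\mathbb{R}$. This means that if $\alpha>0$ and $\langle\alpha_n\rangle_{n\in\mathbb{N}}$ is a sequence in $[\frac12\alpha,\infty[$ and we set $g(t)=\sup_{n\in\mathbb{N}}|\phi_\sigma(t/\alpha_n)|$ for $t\in\mathbb{R}$, then $g(t)\le4\gamma/(4+t^2)$ for every $t$ and $g$ is integrable. So Lebesgue’s Dominated Convergence Theorem tells us that if $\langle\alpha_n\rangle_{n\in\mathbb{N}}\to\alpha$ and $\langle\beta_n\rangle_{n\in\mathbb{N}}\to\beta$,
$$\frac1{\alpha_n}\int_{-\infty}^\infty e^{i\beta_nt/\alpha_n}f(t)\overline{\phi_\sigma(t/\alpha_n)}\,dt\to\frac1\alpha\int_{-\infty}^\infty e^{i\beta t/\alpha}f(t)\overline{\phi_\sigma(t/\alpha)}\,dt.$$
Thus $(\alpha,\beta)\mapsto(M_\beta D_\alpha f|\phi_\sigma):\,]0,\infty[\times\mathbb{R}\to\mathbb{C}$ is continuous; and this is true for every $\sigma\in Q$. Accordingly
$$(\alpha,\beta,x)\mapsto\sum_{\sigma\in P,\,z\in J_\sigma^r}\Bigl|(M_\beta D_\alpha f|\phi_\sigma)\phi_\sigma\Bigl(\frac x\alpha\Bigr)\Bigr|$$
is continuous for every $z\in\mathbb{R}$ and every finite $P\subseteq Q$, and $(\alpha,\beta,x)\mapsto(D_{1/\alpha}AM_\beta D_\alpha f)(x)$ is Borel measurable by 256Ma. Q Q It follows that the repeated integrals
$$\int_1^2\frac1\alpha\int_0^n(D_{1/\alpha}AM_\beta D_\alpha f)(x)\,d\beta\,d\alpha$$
are defined in $[0,\infty]$ and are Borel measurable functions of $x$ (252P again), so that $\tilde Af$ is Borel measurable.

(b) For any $n\in\mathbb{N}$,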
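$$\begin{aligned}\frac1n\int_F\int_1^2\frac1\alpha\int_0^n(D_{1/\alpha}AM_\beta D_\alpha f)(x)\,d\beta\,d\alpha\,dx&=\frac1n\int_1^2\frac1\alpha\int_0^n\int_F(D_{1/\alpha}AM_\beta D_\alpha f)(x)\,dx\,d\beta\,d\alpha\qquad\text{(by Fubini’s theorem, 252H)}\\&=\frac1n\int_1^2\int_0^n\int_F\frac1\alpha(AM_\beta D_\alpha f)\Bigl(\frac x\alpha\Bigr)dx\,d\beta\,d\alpha\\&=\frac1n\int_1^2\int_0^n\int_{\alpha^{-1}F}(AM_\beta D_\alpha f)(x)\,dx\,d\beta\,d\alpha\\&\le\frac1n\int_1^2\int_0^nC_9\|M_\beta D_\alpha f\|_2\sqrt{\mu(\alpha^{-1}F)}\,d\beta\,d\alpha\qquad\text{(286O)}\\&=C_9\cdot\frac1n\int_1^2\int_0^n\frac1{\sqrt\alpha}\|f\|_2\cdot\frac1{\sqrt\alpha}\sqrt{\mu F}\,d\beta\,d\alpha\\&=C_9\|f\|_2\sqrt{\mu F}\cdot\frac1n\int_1^2\frac1\alpha\int_0^nd\beta\,d\alpha=C_9\|f\|_2\sqrt{\mu F}\ln2\le C_9\|f\|_2\sqrt{\mu F}.\end{aligned}$$
So
$$\begin{aligned}\int_F\tilde Af&=\int_F\liminf_{n\to\infty}\frac1n\int_1^2\frac1\alpha\int_0^n(D_{1/\alpha}AM_\beta D_\alpha f)(x)\,d\beta\,d\alpha\,dx\\&\le\liminf_{n\to\infty}\frac1n\int_F\int_1^2\frac1\alpha\int_0^n(D_{1/\alpha}AM_\beta D_\alpha f)(x)\,d\beta\,d\alpha\,dx\end{aligned}$$
(by Fatou’s lemma) $\le C_9\|f\|_2\sqrt{\mu F}$.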
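(c) For any $x\in\mathbb{R}$,
$$\int_{-\infty}^\infty|\hat f(y)|\int_1^2\frac1\alpha\Bigl(\sup_{n\in\mathbb{N}}\frac1n\int_0^n\theta'_{z\alpha\beta}(y)\,d\beta\Bigr)d\alpha\,dy\le\ln2\cdot\int_{-\infty}^\infty|\hat f(y)|\,dy$$
is finite. So
$$\begin{aligned}(\hat f\times\tilde\theta_z)^\vee(x)&=\frac1{\sqrt{2\pi}}\int_{-\infty}^\infty e^{ixy}\hat f(y)\tilde\theta_z(y)\,dy\\&=\frac1{\sqrt{2\pi}}\int_{-\infty}^\infty e^{ixy}\hat f(y)\lim_{n\to\infty}\int_1^2\frac1{\alpha n}\int_0^n\theta'_{z\alpha\beta}(y)\,d\beta\,d\alpha\,dy\\&=\frac1{\sqrt{2\pi}}\lim_{n\to\infty}\int_{-\infty}^\infty e^{ixy}\hat f(y)\int_1^2\frac1{\alpha n}\int_0^n\theta'_{z\alpha\beta}(y)\,d\beta\,d\alpha\,dy\end{aligned}$$
(by Lebesgue’s Dominated Convergence Theorem)
$$=\lim_{n\to\infty}\int_1^2\frac1{\alpha n}\int_0^n\frac1{\sqrt{2\pi}}\int_{-\infty}^\infty e^{ixy}\hat f(y)\theta'_{z\alpha\beta}(y)\,dy\,d\beta\,d\alpha$$
(by Fubini’s theorem)
$$=\lim_{n\to\infty}\int_1^2\frac1{\alpha n}\int_0^n(\hat f\times\theta'_{z\alpha\beta})^\vee(x)\,d\beta\,d\alpha,$$
and
$$\begin{aligned}2\pi|(\hat f\times\tilde\theta_z)^\vee(x)|&=2\pi\Bigl|\lim_{n\to\infty}\int_1^2\frac1{\alpha n}\int_0^n(\hat f\times\theta'_{z\alpha\beta})^\vee(x)\,d\beta\,d\alpha\Bigr|\\&\le2\pi\liminf_{n\to\infty}\int_1^2\frac1{\alpha n}\int_0^n|(\hat f\times\theta'_{z\alpha\beta})^\vee(x)|\,d\beta\,d\alpha\\&\le\liminf_{n\to\infty}\int_1^2\frac1{\alpha n}\int_0^n(D_{1/\alpha}AM_\beta D_\alpha f)(x)\,d\beta\,d\alpha\qquad\text{(286Qb)}\\&=(\tilde Af)(x).\end{aligned}$$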
508
Fourier analysis
286T
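286T Lemma Set $C_{10}=C_9/(\pi\tilde\theta_1(0))$. For $f\in L^2_{\mathbb{C}}$, define $\hat Af:\mathbb{R}\to[0,\infty]$ by setting
$$(\hat Af)(y)=\sup_{a\le b}\frac{1}{\sqrt{2\pi}}\Bigl|\int_a^b e^{-ixy}f(x)\,dx\Bigr|$$
for each y ∈ R. Then
$$\int_F\hat Af\le C_{10}\|f\|_2\sqrt{\mu F}\quad\text{whenever }\mu F<\infty.$$
proof (a) As usual, the first step is to confirm that $\hat Af$ is measurable. P For a ≤ b,
$$y\mapsto\Bigl|\frac{1}{\sqrt{2\pi}}\int_a^b e^{-ixy}f(x)\,dx\Bigr|$$
is continuous (by 283Cf, since f × χ[a, b] is integrable), so 256M gives the result. Q
(b) Suppose that f is a rapidly decreasing test function. Then
$$(\hat Af)(y)\le\frac{1}{\pi\tilde\theta_1(0)}(\tilde Af^{\vee})(-y)$$
for every y ∈ R. P If a ∈ R then
$$
\begin{aligned}
\frac{1}{\sqrt{2\pi}}\Bigl|\int_{-\infty}^a e^{-ixy}f(x)\,dx\Bigr| &= \frac{1}{\tilde\theta_1(0)}\cdot\frac{1}{\sqrt{2\pi}}\Bigl|\int_{-\infty}^{\infty}e^{-ixy}\tilde\theta_a(x)f(x)\,dx\Bigr| \quad\text{(286R)}\\
&= \frac{1}{\tilde\theta_1(0)}|(f\times\tilde\theta_a)^{\vee}(-y)| = \frac{1}{\tilde\theta_1(0)}|((f^{\vee})^{\wedge}\times\tilde\theta_a)^{\vee}(-y)| \quad\text{(284C)}\\
&\le \frac{1}{2\pi\tilde\theta_1(0)}(\tilde Af^{\vee})(-y) \quad\text{(286Sc)}.
\end{aligned}
$$
So if a ≤ b in R (writing $\int_a^b=\int_{-\infty}^b-\int_{-\infty}^a$),
$$\frac{1}{\sqrt{2\pi}}\Bigl|\int_a^b e^{-ixy}f(x)\,dx\Bigr|\le\frac{1}{\pi\tilde\theta_1(0)}(\tilde Af^{\vee})(-y);$$
taking the supremum over a and b, we have the result. Q It follows that
$$\int_F\hat Af\le\frac{1}{\pi\tilde\theta_1(0)}\int_{-F}\tilde Af^{\vee}\le\frac{1}{\pi\tilde\theta_1(0)}C_9\|f^{\vee}\|_2\sqrt{\mu(-F)}\quad\text{(286Sb, 284Oa)}\quad=C_{10}\|f\|_2\sqrt{\mu F}.$$
(c) For general square-integrable f, take any ε > 0 and any n ∈ N. Set
$$(\hat A_nf)(y)=\sup_{-n\le a\le b\le n}\frac{1}{\sqrt{2\pi}}\Bigl|\int_a^b e^{-ixy}f(x)\,dx\Bigr|$$
for each y ∈ R. Let g be a rapidly decreasing test function such that $\|f-g\|_2\le\epsilon$ (284N). Then
$$\hat Ag\ge\hat A_ng\ge\hat A_nf-\frac{\sqrt{2n}}{\sqrt{2\pi}}\,\epsilon$$
(using Cauchy’s inequality), so
$$\int_F\hat A_nf\le\int_F\hat Ag+\sqrt{\frac{n}{\pi}}\,\epsilon\,\mu F\le C_{10}(\|f\|_2+\epsilon)\sqrt{\mu F}+\sqrt{\frac{n}{\pi}}\,\epsilon\,\mu F.$$
As ε is arbitrary, $\int_F\hat A_nf\le C_{10}\|f\|_2\sqrt{\mu F}$; letting n → ∞, we get
$$\int_F\hat Af\le C_{10}\|f\|_2\sqrt{\mu F}.$$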
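The only estimate in part (c) beyond part (b) is Cauchy’s inequality on $[-n,n]$: if $\|f-g\|_2\le\epsilon$ then $|\hat A_nf-\hat A_ng|\le\sqrt{n/\pi}\,\epsilon$. The sketch below models this with Riemann sums, which is an assumption of the illustration rather than the Lebesgue integral of the text: integrals become left-endpoint sums on a uniform grid, and the supremum over $a\le b$ becomes a maximum over grid points. In that setting the discrete Cauchy-Schwarz inequality gives exactly the same bound. The function names and the sample f, g are hypothetical.

```python
import cmath
import math


def partial_sums(f, y, n, steps_per_unit=50):
    """Left-endpoint sums S[k] approximating the integral of e^{-ixy} f(x) over [-n, -n + k*dx]."""
    dx = 1.0 / steps_per_unit
    sums = [0j]
    for k in range(2 * n * steps_per_unit):
        x = -n + k * dx
        sums.append(sums[-1] + cmath.exp(-1j * x * y) * f(x) * dx)
    return sums, dx


def truncated_max(f, y, n, steps_per_unit=50):
    """Discrete model of (A_n f)(y): max over grid points a <= b of |S(b) - S(a)| / sqrt(2 pi)."""
    s, _ = partial_sums(f, y, n, steps_per_unit)
    best = max(abs(s[j] - s[i]) for i in range(len(s)) for j in range(i, len(s)))
    return best / math.sqrt(2 * math.pi)


def discrete_l2_distance(f, g, n, steps_per_unit=50):
    """Riemann-sum version of the L^2 norm of f - g over [-n, n], on the same grid."""
    dx = 1.0 / steps_per_unit
    pts = (-n + k * dx for k in range(2 * n * steps_per_unit))
    return math.sqrt(sum(abs(f(x) - g(x)) ** 2 for x in pts) * dx)


f = lambda x: math.exp(-x * x)
g = lambda x: math.exp(-x * x) + 0.1 * math.cos(3 * x)   # a hypothetical perturbation of f
n, y = 2, 0.7
gap = abs(truncated_max(f, y, n) - truncated_max(g, y, n))
bound = math.sqrt(n / math.pi) * discrete_l2_distance(f, g, n)
print(gap, bound)                                        # gap never exceeds bound
print(truncated_max(f, y, 1), truncated_max(f, y, 2))    # A_n f grows with n
```

The second printed pair reflects the fact, used when letting $n\to\infty$, that $\hat A_nf$ increases with $n$ to $\hat Af$.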
286U
Carleson’s theorem
509
286U Theorem If $f\in L^2_{\mathbb{C}}$ then
$$g(y)=\lim_{a\to-\infty,\,b\to\infty}\frac{1}{\sqrt{2\pi}}\int_a^b e^{-ixy}f(x)\,dx$$
is defined in C for almost every y ∈ R, and g represents the Fourier transform of f.
proof (a) For n ∈ N, y ∈ R set
$$\gamma_n(y)=\sup_{a\le-n,\,b\ge n}\frac{1}{\sqrt{2\pi}}\Bigl|\int_a^b e^{-ixy}f(x)\,dx-\int_{-n}^n e^{-ixy}f(x)\,dx\Bigr|.$$
Then g(y) is defined whenever $\inf_{n\in\mathbb{N}}\gamma_n(y)=0$. P If $\inf_{n\in\mathbb{N}}\gamma_n(y)=0$ and ε > 0, take m ∈ N such that $\gamma_m(y)\le\frac12\epsilon$; then
$$\frac{1}{\sqrt{2\pi}}\Bigl|\int_a^b e^{-ixy}f(x)\,dx-\int_{-n}^n e^{-ixy}f(x)\,dx\Bigr|\le\epsilon$$
whenever n ≥ m and a ≤ −n, b ≥ n. But this means, first, that $\langle\int_{-n}^n e^{-ixy}f(x)\,dx\rangle_{n\in\mathbb{N}}$ is a Cauchy sequence, so has a limit ζ say, and, second, that $\zeta=\lim_{a\to-\infty,b\to\infty}\int_a^b e^{-ixy}f(x)\,dx$, so that $g(y)=\zeta/\sqrt{2\pi}$ is defined. Q
Also each $\gamma_n$ is a measurable function (cf. part (a) of the proof of 286T).
(b) ?? Suppose, if possible, that $\{y:\inf_{n\in\mathbb{N}}\gamma_n(y)>0\}$ is not negligible. Then
$$\lim_{m\to\infty}\mu\{y:|y|\le m,\ \inf_{n\in\mathbb{N}}\gamma_n(y)\ge\tfrac1m\}>0,$$
so there is an ε > 0 such that
$$F=\{y:|y|\le\tfrac1\epsilon,\ \inf_{n\in\mathbb{N}}\gamma_n(y)\ge\epsilon\}$$
has measure greater than ε. Let n ∈ N be such that
$$\int_{-\infty}^{\infty}|f(x)|^2dx-\int_{-n}^n|f(x)|^2dx<\frac{\epsilon^3}{4C_{10}^2},$$
and set $f_1=f-f\times\chi[-n,n]$; then $2C_{10}\|f_1\|_2\le\epsilon^{3/2}$. We have
$$
\begin{aligned}
\gamma_n(y) &= \sup_{a\le-n,\,b\ge n}\frac{1}{\sqrt{2\pi}}\Bigl|\int_a^b e^{-ixy}f_1(x)\,dx-\int_{-n}^n e^{-ixy}f_1(x)\,dx\Bigr|\\
&\le 2\sup_{a\le b}\frac{1}{\sqrt{2\pi}}\Bigl|\int_a^b e^{-ixy}f_1(x)\,dx\Bigr| = 2(\hat Af_1)(y),
\end{aligned}
$$
so that
$$\epsilon\mu F\le\int_F\gamma_n\le2\int_F\hat Af_1\le2C_{10}\|f_1\|_2\sqrt{\mu F}\quad\text{(286T)}\quad\le\epsilon^{3/2}\sqrt{\mu F}$$
and µF ≤ ε; but we chose ε so that µF would be greater than ε. X
(c) Thus g(y) is defined for almost every y ∈ R. Now g represents the Fourier transform of f. P Let h be a rapidly decreasing test function. Then the restriction of $\hat Af$ to the set on which it is finite is a tempered function, by 286D, so $\int_{-\infty}^{\infty}(\hat Af)(y)|h(y)|\,dy$ is finite, by 284F. Now
$$
\begin{aligned}
\int_{-\infty}^{\infty}g\times h &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\Bigl(\lim_{n\to\infty}\int_{-n}^n e^{-ixy}f(x)\,dx\Bigr)h(y)\,dy\\
&= \frac{1}{\sqrt{2\pi}}\lim_{n\to\infty}\int_{-\infty}^{\infty}\int_{-n}^n e^{-ixy}f(x)h(y)\,dx\,dy
\end{aligned}
$$
(because $\frac{1}{\sqrt{2\pi}}|\int_{-n}^n e^{-ixy}f(x)\,dx|\le(\hat Af)(y)$ for every n and y, so we can use Lebesgue’s Dominated Convergence Theorem)
$$=\frac{1}{\sqrt{2\pi}}\lim_{n\to\infty}\int_{-n}^n\int_{-\infty}^{\infty}e^{-ixy}f(x)h(y)\,dy\,dx$$
(because $\int_{-\infty}^{\infty}\int_{-n}^n|f(x)h(y)|\,dx\,dy$ is finite for each n)
$$=\lim_{n\to\infty}\int_{-n}^n f(x)h^{\wedge}(x)\,dx=\int_{-\infty}^{\infty}f(x)h^{\wedge}(x)\,dx$$
because $f\times h^{\wedge}$ is certainly integrable. As h is arbitrary, g represents the Fourier transform of f. Q
286V Theorem If $f\in L^2_{\mathbb{C}}(\mu_{]-\pi,\pi]})$ then its sequence of Fourier sums converges to it almost everywhere.
proof Set $f_1(x)=f(x)$ for x ∈ dom f, 0 for x ∈ R \ ]−π, π]; then $f_1\in L^2_{\mathbb{C}}(\mu)$. Let $g\in L^2_{\mathbb{C}}(\mu)$ represent the inverse Fourier transform of $f_1$ (284O). Then 286U tells us that
$$f_2(x)=\lim_{a\to\infty}\frac{1}{\sqrt{2\pi}}\int_{-a}^a e^{-ixy}g(y)\,dy$$
is defined for almost every x, and that $f_2$ represents the Fourier transform of g, so is equal almost everywhere to $f_1$ (284Ib). Now, for any a ≥ 0, x ∈ R,
$$
\begin{aligned}
\int_{-a}^a e^{-ixy}g(y)\,dy &= (g|h_{ax}) \quad(\text{where }h_{ax}(y)=e^{ixy}\text{ if }|y|\le a,\ 0\text{ otherwise})\\
&= (f_2|h_{ax}^{\wedge}) \quad\text{(284Ob)}\\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f_2(t)\int_{-\infty}^{\infty}e^{-ity}h_{ax}(y)\,dy\,dt\\
&= \frac{2}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f_2(t)\frac{\sin(x-t)a}{x-t}\,dt=\frac{2}{\sqrt{2\pi}}\int_{-\pi}^{\pi}f(t)\frac{\sin(x-t)a}{x-t}\,dt
\end{aligned}
$$
(the inner integral is $\int_{-a}^a e^{i(x-t)y}dy=2\sin((x-t)a)/(x-t)$, which is real, so no complex conjugate is needed). So
$$f(x)=f_2(x)=\lim_{a\to\infty}\frac1\pi\int_{-\pi}^{\pi}f(t)\frac{\sin(x-t)a}{x-t}\,dt$$
for almost every x ∈ ]−π, π]. On the other hand, writing $\langle s_n\rangle_{n\in\mathbb{N}}$ for the sequence of Fourier sums of f, we have, for any x ∈ ]−π, π[,
$$s_n(x)=\frac{1}{2\pi}\int_{-\pi}^{\pi}f(t)\frac{\sin(n+\frac12)(x-t)}{\sin\frac12(x-t)}\,dt$$
for each n, by 282Da. Now
$$
\begin{aligned}
\frac{1}{2\pi}\int_{-\pi}^{\pi}&f(t)\frac{\sin(n+\frac12)(x-t)}{\sin\frac12(x-t)}\,dt-\frac1\pi\int_{-\pi}^{\pi}f(t)\frac{\sin(n+\frac12)(x-t)}{x-t}\,dt\\
&=\frac1\pi\int_{-\pi}^{\pi}f(t)\Bigl(\frac{\sin(n+\frac12)(x-t)}{2\sin\frac12(x-t)}-\frac{\sin(n+\frac12)(x-t)}{x-t}\Bigr)dt\\
&=\frac1\pi\int_{x-\pi}^{x+\pi}f(x-t)\sin(n+\tfrac12)t\,\Bigl(\frac{1}{2\sin\frac12t}-\frac1t\Bigr)dt.
\end{aligned}
$$
But if we look at the function
$$p_x(t)=f(x-t)\Bigl(\frac{1}{2\sin\frac12t}-\frac1t\Bigr)\ \text{ if }x-\pi<t<x+\pi\text{ and }t\ne0,\qquad p_x(t)=0\ \text{ otherwise},$$
$p_x$ is integrable, because f is integrable over ]−π, π] and $\lim_{t\to0}\bigl(\frac{1}{2\sin\frac12t}-\frac1t\bigr)=0$, so
$$\sup_{t\ne0,\ x-\pi\le t\le x+\pi}\Bigl|\frac{1}{2\sin\frac12t}-\frac1t\Bigr|$$
is finite. (This is where we need to know that |x| < π.) So
$$\lim_{n\to\infty}\Bigl(s_n(x)-\frac1\pi\int_{-\pi}^{\pi}f(t)\frac{\sin(n+\frac12)(x-t)}{x-t}\,dt\Bigr)=\lim_{n\to\infty}\frac1\pi\int_{-\infty}^{\infty}p_x(t)\sin(n+\tfrac12)t\,dt=0$$
by the Riemann-Lebesgue lemma (282Fb). But this means that $\lim_{n\to\infty}s_n(x)=f(x)$ for any x ∈ ]−π, π[ such that
$$f(x)=\lim_{a\to\infty}\frac1\pi\int_{-\pi}^{\pi}f(t)\frac{\sin(x-t)a}{x-t}\,dt,$$
which is almost every x ∈ ]−π, π].
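Two elementary facts carry the last step of 286V: the closed form $\sum_{k=-n}^{n}e^{ikt}=\sin(n+\frac12)t/\sin\frac12t$ of the Dirichlet kernel, which lies behind the formula for $s_n(x)$ quoted from 282Da, and the boundedness of $\frac{1}{2\sin\frac12t}-\frac1t$ on $[x-\pi,x+\pi]\setminus\{0\}$ when |x| < π. The following is a quick numerical sketch of both (grid sampling, not a proof; the helper names are hypothetical).

```python
import cmath
import math


def dirichlet_sum(n, t):
    """Direct evaluation of sum_{k=-n}^{n} e^{ikt}."""
    return sum(cmath.exp(1j * k * t) for k in range(-n, n + 1))


def dirichlet_closed(n, t):
    """Closed form sin((n + 1/2) t) / sin(t / 2), valid when sin(t / 2) != 0."""
    return math.sin((n + 0.5) * t) / math.sin(t / 2)


def bracket(t):
    """The factor 1/(2 sin(t/2)) - 1/t in p_x(t); it behaves like t/24 near t = 0."""
    return 1 / (2 * math.sin(t / 2)) - 1 / t


def sup_bracket(x, samples=10000):
    """Largest |bracket(t)| over a grid on [x - pi, x + pi], skipping points very close to 0."""
    lo, hi = x - math.pi, x + math.pi
    values = []
    for k in range(samples + 1):
        t = lo + (hi - lo) * k / samples
        if abs(t) > 1e-6:
            values.append(abs(bracket(t)))
    return max(values)


print(abs(dirichlet_sum(7, 0.9) - dirichlet_closed(7, 0.9)))  # rounding error only
print(sup_bracket(3.0))              # moderate: |x| < pi keeps t away from +-2 pi
print(sup_bracket(math.pi - 1e-3))   # large: the bound degenerates as |x| -> pi
```

The last line shows why the argument needs |x| < π: as x approaches π the interval $[x-\pi,x+\pi]$ approaches $2\pi$, where $\sin\frac12t$ vanishes.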
286W Glossary The following special notations are used in more than one paragraph of this section: µ for Lebesgue measure on R. 286A: f ∗ . 286C: Sα f , Mα f , Dα f . 286Ea: I, Q, Iσ , Jσ , kσ , xσ , yσ , Jσl , Jσr , yσl . 286Eb: φ, φσ , (f |g). 286Ec: w, wσ , C1 . 286F: ≤, ≤r , 4.
286G: C2 , C3 , C4 . 286H: mass, energy. 286I: C5 . 286K: C6 . 286L: C7 . 286M: C8 . 286N: C9 .
286O: Af. 286P: θ_z(y). 286Q: θ′_{zαβ}(y). 286R: θ̃_z(y). 286S: Ãf. 286T: C10, Âf.
286X Basic exercises (a) Use 284Oa and 284Xf to shorten part (c) of the proof of 286U.
(b) Show that if $\langle c_k\rangle_{k\in\mathbb{N}}$ is a sequence of complex numbers such that $\sum_{k=0}^{\infty}|c_k|^2$ is finite, then $\sum_{k=0}^{\infty}c_ke^{ikx}$ is defined in C for almost all x ∈ R.
286Y Further exercises (a) Show that if f is a square-integrable function on $\mathbb{R}^r$, where r ≥ 2, then
$$g(y)=\frac{1}{(\sqrt{2\pi})^r}\lim_{\alpha_1,\ldots,\alpha_r\to-\infty,\ \beta_1,\ldots,\beta_r\to\infty}\int_a^b e^{-iy\cdot x}f(x)\,dx,$$
where $a=(\alpha_1,\ldots,\alpha_r)$ and $b=(\beta_1,\ldots,\beta_r)$, is defined in C for almost every $y\in\mathbb{R}^r$, and that g represents the Fourier transform of f.
286 Notes and comments This is not quite the longest single section in this treatise as a whole, but it is by a substantial margin the longest in the present volume, and thirty pages of sub-superscripts must tax the endurance of the most enthusiastic. You will easily understand why Carleson’s theorem is not usually presented at this level. But I am trying in this book to present complete proofs of the principal theorems, there is no natural place for Carleson’s theorem in later volumes as at present conceived, and it is (just) accessible at this point; so I take the space to do it here.
The proof here divides naturally into two halves: the ‘combinatorial’ part in 286E-286M, up to the Lacey-Thiele lemma, followed by the ‘analytic’ part in 286N-286V, in which the averaging process
$$\int_1^2\frac{1}{\alpha}\lim_{b\to\infty}\frac{1}{b}\int_0^b\ldots\,d\beta\,d\alpha$$
is used to transform the geometrically coherent, but analytically irregular, functions $\theta_z$ into the characteristic functions $\frac{1}{\tilde\theta_1(0)}\tilde\theta_z$. From the standpoint of ordinary Fourier analysis, this second part is essentially routine; there are many paths we could follow, and we have only to take the ordinary precautions against illegitimate operations.
Carleson (Carleson 66) stated his theorem in the Fourier-series form of 286V; but it had long been understood that this was equiveridical with the Fourier-transform version in 286U.
There are of course many ways of extending the theorem. In particular, there are corresponding results for functions in $L^p$ for any p > 1, and even for functions f such that f × ln(1 + |f|) × ln ln(2 + |f|) is integrable (Sjölin 71). The methods here do not seem to reach so far. I ought also to remark that if we define $\hat Af$ as in 286T, then there is for every p > 1 a constant C such that $\|\hat Af\|_p\le C\|f\|_p$ for every $f\in L^p_{\mathbb{C}}$ (Hunt 67, Mozzochi 71, Jørsboe & Mejlbro 82).
Note that the point of Carleson’s theorem, in either form, is that we take special limits. In the formulae
$$f^{\wedge}(y)=\frac{1}{\sqrt{2\pi}}\lim_{a\to-\infty,\,b\to\infty}\int_a^b e^{-ixy}f(x)\,dx,\qquad f(x)=\lim_{n\to\infty}\sum_{k=-n}^{n}c_ke^{ikx},$$
valid almost everywhere for square-integrable functions f, we are not taking the ordinary integral $\int_{-\infty}^{\infty}e^{-ixy}f(x)\,dx$ or the unconditional sum $\sum_{k\in\mathbb{Z}}c_ke^{ikx}$. If f is not integrable, or $\sum_{k=-\infty}^{\infty}|c_k|$ is infinite, these will not be defined at even one point. Carleson’s theorem makes sense only because we have a natural preference for particular kinds of improper integral and conditional sum. So when we return, in Chapter 44 of Volume 4, to Fourier analysis on general topological groups, there will simply be no language in which to express the theorem, and while versions have been proved for other groups (e.g., Schipp 78), they necessarily depend on some structure beyond the simple notion of ‘locally compact Hausdorff abelian topological group’.
Even in $\mathbb{R}^2$, I understand that it is still unknown whether
$$\lim_{a\to\infty}\frac{1}{2\pi}\int_{B(0,a)}e^{-iy\cdot x}f(x)\,dx$$
will be defined a.e. for any square-integrable function f, if we use ordinary Euclidean balls B(0, a) in place of the rectangles in 286Ya.
2A1A
Set theory
513
Appendix to Volume 2 Useful Facts In the course of writing this volume, I have found that a considerable number of concepts and facts from various branches of mathematics are necessary to us. Nearly all of them are embedded in important and well-established theories for which many excellent textbooks are available and which I very much hope that you will one day study in depth. Nevertheless, I am reluctant to send you off immediately to courses in general topology, functional analysis and set theory, as if these were essential prerequisites for our work here, along with real analysis and basic linear algebra. For this reason I have written this Appendix, setting out those results which we actually need at some point in this volume. The great majority of them really are elementary – indeed, some are so elementary that they are not always spelt out in detail in orthodox treatments of their subjects. While I do not put this book forward as the proper place to learn any of these topics, I have tried to set them out in a way that you will find easy to integrate into regular approaches. I do not expect anybody to read systematically through this work, and I hope that the references given in the main chapters of this volume will be adequate to guide you to the particular items you need.
2A1 Set theory Especially for the examples in Chapter 21, we need some non-trivial set theory, which is best approached through the standard theory of cardinals and ordinals; and elsewhere in this volume I make use of Zorn’s Lemma. Here I give a very brief outline of the results involved, largely omitting proofs. Most of this material should be in any sound introduction to set theory. The references I give are to books which happen to have come my way and which I can recommend as reasonably suitable for beginners. I do not discuss axiom systems or logical foundations. The set theory I employ is ‘naive’ in the sense that I rely on my understanding of the collective experience of the last ninety years, rather than on any attempt at formal description, to distinguish legitimate from unsafe arguments. There are, however, points in Volume 5 at which such a relaxed philosophy becomes inappropriate, and I therefore use arguments which can, I believe, be translated into standard Zermelo-Fraenkel set theory without new ideas being invoked. Although in this volume I use the axiom of choice without scruple whenever appropriate, I will divide this section into two parts, starting with ideas and results not dependent on the axiom of choice (2A1A-2A1I) and continuing with the remainder (2A1J-2A1P). I believe that even at this level it helps us to understand the nature of the arguments better if we maintain a degree of separation.
2A1A Ordered sets (a) Recall that a partially ordered set is a set P together with a relation ≤ on P such that
if p ≤ q and q ≤ r then p ≤ r,
p ≤ p for every p ∈ P,
if p ≤ q and q ≤ p then p = q.
In this context, I will write p ≥ q to mean q ≤ p, and p < q or q > p to mean ‘p ≤ q and p ≠ q’. ≤ is a partial order on P.
(b) Let (P, ≤) be a partially ordered set, and A ⊆ P. A maximal element of A is a p ∈ A such that p ≮ a for any a ∈ A. Note that A may have more than one maximal element.
An upper bound for A is a p ∈ P such that a ≤ p for every a ∈ A; a supremum or least upper bound is an upper bound p such that p ≤ q for every upper bound q of A. There can be at most one such, because if p, p′ are both least upper bounds then p ≤ p′ and p′ ≤ p. Accordingly we may safely write p = sup A if p is the least upper bound of A. Similarly, a minimal element of A is a p ∈ A such that p ≯ a for every a ∈ A; a lower bound of A is a p ∈ P such that p ≤ a for every a ∈ A; and inf A = a means that, for every q ∈ P, a ≥ q ⇐⇒ p ≥ q for every p ∈ A. A subset A of P is order-bounded if it has both an upper bound and a lower bound.
514
Appendix
2A1A
A subset A of P is upwards-directed if for any p, p′ ∈ A there is a q ∈ A such that p ≤ q and p′ ≤ q; that is, if any non-empty finite subset of A has an upper bound in A. Similarly, A ⊆ P is downwards-directed if for any p, p′ ∈ A there is a q ∈ A such that q ≤ p and q ≤ p′; that is, if any non-empty finite subset of A has a lower bound in A. It is sometimes convenient to adapt the notation for closed intervals to arbitrary partially ordered sets: [p, q] will be {r : p ≤ r ≤ q}.
(c) A totally ordered set is a partially ordered set (P, ≤) such that for any p, q ∈ P, either p ≤ q or q ≤ p. ≤ is a total or linear order on P.
(d) A lattice is a partially ordered set (P, ≤) such that for any p, q ∈ P, p ∨ q = sup{p, q} and p ∧ q = inf{p, q} are defined in P.
(e) A well-ordered set is a totally ordered set (P, ≤) such that inf A exists and belongs to A for every non-empty set A ⊆ P; that is, every non-empty subset of P has a least element. In this case ≤ is a well-ordering of P.
2A1B Transfinite Recursion: Theorem Let (P, ≤) be a well-ordered set and X any class. For p ∈ P write L_p for the set {q : q ∈ P, q < p} and X^{L_p} for the class of all functions from L_p to X. Let F : ⋃_{p∈P} X^{L_p} → X be any function. Then there is a unique function f : P → X such that f(p) = F(f↾L_p) for every p ∈ P.
proof There are versions of this result in Enderton 77 (p. 175) and Halmos 60 (§18). Nevertheless I write out a proof, since it seems to me that most elementary books on set theory do not give it its proper place at the very beginning of the theory of well-ordered sets.
(a) Let Φ be the class of all functions φ such that
(α) dom φ is a subset of P, and L_p ⊆ dom φ for every p ∈ dom φ;
(β) φ(p) ∈ X for every p ∈ dom φ, and φ(p) = F(φ↾L_p) for every p ∈ dom φ.
(b) If φ, ψ ∈ Φ then φ and ψ agree on dom φ ∩ dom ψ. P ?? If not, then A = {q : q ∈ dom φ ∩ dom ψ, φ(q) ≠ ψ(q)} is non-empty. Because P is well-ordered, A has a least element p say.
Now L_p ⊆ dom φ ∩ dom ψ and L_p ∩ A = ∅, so φ(p) = F(φ↾L_p) = F(ψ↾L_p) = ψ(p), which is impossible. X Q
(c) It follows that Φ is a set, since the function φ ↦ dom φ is an injective function from Φ to 𝒫P, and its inverse is a surjection from a subset of 𝒫P onto Φ. We can therefore, without inhibitions, define a function f by writing
dom f = ⋃_{φ∈Φ} dom φ, f(p) = φ(p) whenever φ ∈ Φ, p ∈ dom φ.
(If you think that a function φ is just the set of ordered pairs {(p, φ(p)) : p ∈ dom φ}, then f becomes ⋃Φ.) Then f ∈ Φ. P Of course f is a function from a subset of P to X. If p ∈ dom f, then there is a φ ∈ Φ such that p ∈ dom φ, in which case L_p ⊆ dom φ ⊆ dom f,
f(p) = φ(p) = F(φ↾L_p) = F(f↾L_p). Q
(d) f is defined everywhere in P. P ?? Otherwise, P \ dom f is non-empty and has a least element r say. Now L_r ⊆ dom f. Define a function ψ by saying that dom ψ = {r} ∪ dom f, ψ(p) = f(p) for p ∈ dom f and ψ(r) = F(f↾L_r). Then ψ ∈ Φ, because if p ∈ dom ψ, either p ∈ dom f, so L_p ⊆ dom f ⊆ dom ψ and ψ(p) = f(p) = F(f↾L_p) = F(ψ↾L_p), or p = r, so L_p = L_r ⊆ dom f ⊆ dom ψ and ψ(p) = F(f↾L_r) = F(ψ↾L_r).
Accordingly ψ ∈ Φ and r ∈ dom ψ ⊆ dom f. X Q
(e) Thus f : P → X is a function such that f(p) = F(f↾L_p) for every p. To see that f is unique, observe that any function of this type must belong to Φ, so must agree with f on their common domain, which is the whole of P.
Remark If you have been taught to distinguish between the words ‘set’ and ‘class’, you will observe that my naive set theory is a relatively tolerant one in that it is willing to allow class variables in its theorems.
2A1C Ordinals An ordinal (sometimes called a ‘von Neumann ordinal’) is a set ξ such that
if η ∈ ξ then η is a set and η ∉ η,
if η ∈ ζ ∈ ξ then η ∈ ξ,
writing ‘η ≤ ζ’ to mean ‘η ∈ ζ or η = ζ’, (ξ, ≤) is well-ordered
(Enderton 77, p. 191; Halmos 60, §19; Henle 86, p. 27; Krivine 71, p. 24; Roitman 90, 3.2.8. Of course many set theories do not allow sets to belong to themselves, and/or take it for granted that every object of discussion is a set, but I prefer not to take a view on such points in general.)
2A1D Basic facts about ordinals (a) If ξ is an ordinal, then every member of ξ is an ordinal. (Enderton 77, p. 192; Henle 86, 6.4; Krivine 71, p. 14; Roitman 90, 3.2.10.)
(b) If ξ, η are ordinals then either ξ ∈ η or ξ = η or η ∈ ξ (and no two of these can occur together). (Enderton 77, p. 192; Henle 86, 6.4; Krivine 71, p. 14; Lipschutz 64, 11.12; Roitman 90, 3.2.13.) It is customary, in this case, to write η < ξ if η ∈ ξ and η ≤ ξ if either η ∈ ξ or η = ξ. Note that η ≤ ξ iff η ⊆ ξ.
(c) If A is any non-empty class of ordinals, then there is an α ∈ A such that α ≤ ξ for every ξ ∈ A. (Henle 86, 6.7; Krivine 71, p. 15.)
(d) If ξ is an ordinal, so is ξ ∪ {ξ}; call it ‘ξ + 1’. If ξ < η then ξ + 1 ≤ η; ξ + 1 is the least ordinal greater than ξ. (Enderton 77, p. 193; Henle 86, 6.3; Krivine 71, p. 15.) For any ordinal ξ, either there is a greatest ordinal η < ξ, in which case ξ = η + 1 and we call ξ a successor ordinal, or ξ = ⋃ξ, in which case we call ξ a limit ordinal.
(e) The first few ordinals are
0 = ∅, 1 = 0 + 1 = {0} = {∅}, 2 = 1 + 1 = {0, 1} = {∅, {∅}}, 3 = 2 + 1 = {0, 1, 2}, . . . .
The first infinite ordinal is ω = {0, 1, 2, . . . }, which may be identified with N.
(f) The union of any set of ordinals is an ordinal. (Enderton 77, p. 193; Henle 86, 6.8; Krivine 71, p. 15; Roitman 90, 3.2.19.)
(g) If (P, ≤) is any well-ordered set, there is a unique ordinal ξ such that P is order-isomorphic to ξ, and the order-isomorphism is unique. (Enderton 77, pp. 187-189; Henle 86, 6.13; Halmos 60, §20.)
2A1E Initial ordinals An initial ordinal is an ordinal κ such that there is no bijection between κ and any member of κ. (Enderton 77, p. 197; Halmos 60, §25; Henle 86, p. 34; Krivine 71, p. 24; Roitman 90, 5.1.10, p. 79.)
2A1F Basic facts about initial ordinals (a) All finite ordinals, and the first infinite ordinal ω, are initial ordinals.
(b) For every well-ordered set P there is a unique initial ordinal κ such that there is a bijection between P and κ.
(c) For every ordinal ξ there is a least initial ordinal greater than ξ. (Enderton 77, p. 195; Henle 86, 7.2.1.) If κ is an initial ordinal, write κ⁺ for the least initial ordinal greater than κ. We write ω1 for ω⁺, ω2 for ω1⁺, and so on.
(d) For any initial ordinal κ ≥ ω there is a bijection between κ × κ and κ; consequently there are bijections between κ and κ^r for every r ≥ 1.
2A1G Schröder-Bernstein theorem I remind you of the following fundamental result: if X and Y are sets and there are injections f : X → Y, g : Y → X then there is a bijection h : X → Y. (Enderton 77, p. 147; Halmos 60, §22; Henle 86, 7.4; Lipschutz 64, p. 145; Roitman 90, 5.1.2. It is also a special case of 344D in Volume 3.)
2A1H Countable subsets of 𝒫N The following results will be needed below.
(a) There is a bijection between 𝒫N and R. (Enderton 77, p. 149; Lipschutz 64, p. 146.)
(b) Suppose that X is any set such that there is an injection from X into 𝒫N. Let 𝒞 be the set of countable subsets of X. Then there is a surjection from 𝒫N onto 𝒞. P Let f : X → 𝒫N be an injection. Set f1(x) = {0} ∪ {i + 1 : i ∈ f(x)}; then f1 : X → 𝒫N is injective and f1(x) ≠ ∅ for every x ∈ X. Define g : 𝒫N → 𝒫X by setting
g(A) = {x : ∃ n ∈ N, f1(x) = {i : 2^n(2i + 1) ∈ A}}
for each A ⊆ N. Then g(A) is countable, since we have an injection x ↦ min{n : f1(x) = {i : 2^n(2i + 1) ∈ A}} from g(A) to N. Thus g is a function from 𝒫N to 𝒞. To see that g is surjective, observe that ∅ = g(∅), while if C ⊆ X is countable and not empty there is a surjection h : N → C; now set A = {2^n(2i + 1) : n ∈ N, i ∈ f1(h(n))}, and see that g(A) = C. Q
(c) Again suppose that X is a set such that there is an injection from X to 𝒫N, and write ℋ for the set of functions h such that dom h is a countable subset of X and h takes values in {0, 1}. Then there is a surjection from 𝒫N onto ℋ. P Let 𝒞 be the set of countable subsets of X and let g : 𝒫N → 𝒞 be a surjection, as in (b). For A ⊆ N set
g0(A) = g({i : 2i ∈ A}), g1(A) = g({i : 2i + 1 ∈ A}),
so that g0(A), g1(A) are countable subsets of X, and A ↦ (g0(A), g1(A)) is a surjection from 𝒫N onto 𝒞 × 𝒞.
Let h_A be the function with domain g0(A) ∪ g1(A) such that h_A(x) = 1 if x ∈ g1(A), 0 if x ∈ g0(A) \ g1(A). Then A ↦ h_A is a surjection from 𝒫N onto ℋ. Q
2A1I Filters I pause for a moment to discuss a construction which is of great value in investigating topological spaces, but has other uses, and in its nature belongs to elementary set theory (much more elementary, indeed, than the work above).
(a) Let X be a non-empty set. A filter on X is a family ℱ of subsets of X such that
X ∈ ℱ, ∅ ∉ ℱ,
E ∩ F ∈ ℱ whenever E, F ∈ ℱ,
E ∈ ℱ whenever X ⊇ E ⊇ F ∈ ℱ.
The second condition implies (inducing on n) that F0 ∩ . . . ∩ Fn ∈ ℱ whenever F0, . . . , Fn ∈ ℱ.
(b) Let X, Y be non-empty sets, ℱ a filter on X and f : D → Y a function, where D ∈ ℱ. Then {E : E ⊆ Y, f⁻¹[E] ∈ ℱ} is a filter on Y (because f⁻¹[Y] = D, f⁻¹[∅] = ∅, f⁻¹[E ∩ F] = f⁻¹[E] ∩ f⁻¹[F], X ⊇ f⁻¹[E] ⊇ f⁻¹[F] whenever Y ⊇ E ⊇ F); I will call it f[[ℱ]], the image filter of ℱ under f.
Remark Of course there is a hidden variable in this notation. Ordinarily in this book I regard a function f as being defined by its domain dom f and its values on its domain; that is, it is determined by its graph {(x, f(x)) : x ∈ dom f}, and indeed I normally do not distinguish between a function and its graph. This
means that when I write ‘f : D → Y is a function’ then the class D = dom f can be recovered from the function, but the class Y cannot; all I promise is that Y includes the class f[D] of values of f. Now in the notation f[[ℱ]] above we do actually need to know which set Y it is to be a filter on, even though this cannot be discovered from knowledge of f and ℱ. So you will always have to infer it from the context.
2A1J The Axiom of Choice I come now to the second half of this section, in which I discuss concepts and theorems dependent on the Axiom of Choice. Let me remind you of the statement of this axiom:
(AC) ‘whenever I is a set and ⟨X_i⟩_{i∈I} is a family of non-empty sets indexed by I, there is a function f, with domain I, such that f(i) ∈ X_i for every i ∈ I’.
The function f is a choice function; it picks out one member of each of the given family of non-empty sets X_i. I believe that one’s attitude to this principle is a matter for individual choice. It is an indispensable foundation for very large parts of twentieth-century pure mathematics, including a substantial fraction of the present volume; but there are also significant areas in which principles actually contradictory to it can be employed to striking effect, leading – in my view – to equally valid mathematics. At present it is the case that more current mathematical activity, by volume, depends on asserting the axiom of choice than on all its rivals put together; but it is a matter of judgement and taste where the most important, or exciting, ideas are to be found. For the present volume I follow standard practice in twentieth-century abstract analysis, using the axiom of choice whenever necessary; but in Volume 5 I hope to look at alternatives.
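The coding used in 2A1Hb can be tried out on finite data. In the sketch below (a finite model, so X is finite, the injection f is a Python dict of sets of naturals, and all function names are hypothetical), `encode` builds A = {2^n(2i + 1) : n ∈ N, i ∈ f1(h(n))} from a finite enumeration h of a non-empty C, and `decode` computes g(A) = {x : ∃ n, f1(x) = {i : 2^n(2i + 1) ∈ A}}. Only finitely many n need to be tried, since 2^n(2i + 1) ≥ 2^n; the layers beyond the enumeration are empty, which is why f1(x) is made non-empty.

```python
def shift(f):
    """f1(x) = {0} ∪ {i + 1 : i in f(x)}: still injective, and never empty."""
    return {x: frozenset({0} | {i + 1 for i in s}) for x, s in f.items()}


def encode(f1, enumeration):
    """A = {2^n (2i+1) : n < len(enumeration), i in f1(enumeration[n])}."""
    return {2 ** n * (2 * i + 1) for n, x in enumerate(enumeration) for i in f1[x]}


def layer(A, n):
    """{i : 2^n (2i+1) in A}: the members of A with exactly n factors of 2, decoded."""
    return {(a // 2 ** n - 1) // 2 for a in A if a % 2 ** n == 0 and (a // 2 ** n) % 2 == 1}


def decode(f1, A):
    """g(A) = {x : f1(x) = layer(A, n) for some n}; layers with 2^n > max(A) are empty."""
    if not A:
        return set()
    layers = [layer(A, n) for n in range(max(A).bit_length() + 1)]
    return {x for x, code in f1.items() if code in layers}


f = {"a": set(), "b": {0}, "c": {1, 3}, "d": {0, 1}}   # an injection X -> P(N), finite model
f1 = shift(f)
A = encode(f1, ["c", "a", "c"])                        # h enumerates C = {"a", "c"} with a repeat
print(sorted(A), decode(f1, A))
```

Repetitions in the enumeration do no harm, matching the fact that h in 2A1Hb need only be a surjection onto C.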
2A1K Zermelo’s Well-Ordering Theorem (a) The Axiom of Choice is equiveridical with each of the statements
‘for every set X there is a well-ordering of X’,
‘for every set X there is a bijection between X and some ordinal’,
‘for every set X there is a unique initial ordinal κ such that there is a bijection between X and κ.’
(Enderton 77, p. 196 et seq.; Halmos 60, §17; Henle 86, 9.1-9.3; Krivine 71, p. 20; Lipschutz 64, 12.1; Roitman 90, 3.6.38.)
(b) When assuming the axiom of choice, as I do nearly everywhere in this treatise, I write #(X) for that initial ordinal κ such that there is a bijection between κ and X; I call this the cardinal of X.
2A1L Fundamental consequences of the Axiom of Choice (a) For any two sets X and Y, there is a bijection between X and Y iff #(X) = #(Y). More generally, there is an injection from X to Y iff #(X) ≤ #(Y), and a surjection from X onto Y iff either #(X) ≥ #(Y) > 0 or #(X) = #(Y) = 0.
(b) In particular, #(𝒫N) = #(R); write c for this common value, the cardinal of the continuum. Cantor’s theorem that 𝒫N and R are uncountable becomes the result ω < c, that is, ω1 ≤ c.
(c) If X is any infinite set, and r ≥ 1, then there is a bijection between X^r and X. (Enderton 77, p. 162; Halmos 60, §24.) (I note that we need some form of the axiom of choice to prove the result in this generality. But of course for most of the infinite sets arising naturally in mathematics – sets like N and 𝒫R – it is easy to prove the result without appeal to the axiom of choice.)
(d) Suppose that κ is an infinite cardinal. If I is a set of cardinal at most κ and ⟨A_i⟩_{i∈I} is a family of sets with #(A_i) ≤ κ for every i ∈ I, then #(⋃_{i∈I} A_i) ≤ κ. Consequently #(⋃𝒜) ≤ κ whenever 𝒜 is a family of sets such that #(𝒜) ≤ κ and #(A) ≤ κ for every A ∈ 𝒜. In particular, ω1 cannot be expressed as a countable union of countable sets, and ω2 cannot be expressed as a countable union of sets of cardinal at most ω1.
(e) Now we can rephrase 2A1Hc as: if #(X) ≤ c, then #(ℋ) ≤ c, where ℋ is the set of functions from a countable subset of X to {0, 1}. P For we have an injection from X into 𝒫N, and therefore a surjection from 𝒫N onto ℋ. Q
(f) Any non-empty class of cardinals has a least member (by 2A1Dc).
2A1M Zorn’s Lemma In 2A1K I described the well-ordering principle. I come now to another proposition which is equiveridical with the axiom of choice:
‘Let (P, ≤) be a non-empty partially ordered set such that every non-empty totally ordered subset of P has an upper bound in P. Then P has a maximal element.’
This is Zorn’s Lemma. For the proof that the axiom of choice implies, and is implied by, Zorn’s Lemma, see Enderton 77, p. 151; Halmos 60, §16; Henle 86, 9.1-9.3; Roitman 90, 3.6.38.
2A1N Ultrafilters A filter ℱ on a set X is an ultrafilter if for every A ⊆ X either A ∈ ℱ or X \ A ∈ ℱ. If ℱ is an ultrafilter on X and f : D → Y is a function, where D ∈ ℱ, then f[[ℱ]] is an ultrafilter on Y (because f⁻¹[Y \ A] = D \ f⁻¹[A] for every A ⊆ Y).
One type of ultrafilter can be described easily: if x is any point of a set X, then ℱ = {F : x ∈ F ⊆ X} is an ultrafilter on X. (You need only read the definitions. Ultrafilters of this type are called principal ultrafilters.) But it is not obvious that there are any further ultrafilters, and indeed it is not possible to prove that there are any, without using a strong form of the axiom of choice, as follows.
2A1O The Ultrafilter Theorem As an example of the use of Zorn’s lemma which will be of great value in studying compact topological spaces (2A3N et seq., and §247), I give the following result.
Theorem Let X be any non-empty set, and ℱ a filter on X. Then there is an ultrafilter ℋ on X such that ℱ ⊆ ℋ.
proof (Cf. Henle 86, 9.4; Roitman 90, 3.6.37.) Let 𝒫 be the set of all filters on X including ℱ, and order 𝒫 by inclusion, so that, for 𝒢1, 𝒢2 ∈ 𝒫, 𝒢1 ≤ 𝒢2 in 𝒫 iff 𝒢1 ⊆ 𝒢2. It is easy to see that 𝒫 is a partially ordered set, and it is non-empty because ℱ ∈ 𝒫. If 𝒬 is any non-empty totally ordered subset of 𝒫, then ℋ_𝒬 = ⋃𝒬 ∈ 𝒫. P Of course ℋ_𝒬 is a family of subsets of X.
(i) Take any 𝒢0 ∈ 𝒬; then X ∈ 𝒢0 ⊆ ℋ_𝒬.
If 𝒢 ∈ 𝒬, then 𝒢 is a filter, so ∅ ∉ 𝒢; accordingly ∅ ∉ ℋ_𝒬.
(ii) If E, F ∈ ℋ_𝒬, then there are 𝒢1, 𝒢2 ∈ 𝒬 such that E ∈ 𝒢1 and F ∈ 𝒢2. Because 𝒬 is totally ordered, either 𝒢1 ⊆ 𝒢2 or 𝒢2 ⊆ 𝒢1. In either case, 𝒢 = 𝒢1 ∪ 𝒢2 ∈ 𝒬. Now 𝒢 is a filter containing both E and F, so it contains E ∩ F, and E ∩ F ∈ ℋ_𝒬.
(iii) If X ⊇ E ⊇ F ∈ ℋ_𝒬, there is a 𝒢 ∈ 𝒬 such that F ∈ 𝒢; and E ∈ 𝒢 ⊆ ℋ_𝒬.
This shows that ℋ_𝒬 is a filter on X.
(iv) Finally, ℋ_𝒬 ⊇ 𝒢0 ⊇ ℱ, so ℋ_𝒬 ∈ 𝒫. Q
Now ℋ_𝒬 is evidently an upper bound for 𝒬 in 𝒫. We may therefore apply Zorn’s Lemma to find a maximal element ℋ of 𝒫. This ℋ is surely a filter on X including ℱ. Now let A ⊆ X be such that A ∉ ℋ. Consider
ℋ1 = {E : E ⊆ X, E ∪ A ∈ ℋ}.
This is a filter on X. P Of course it is a family of subsets of X.
(i) X ∪ A = X ∈ ℋ, so X ∈ ℋ1. ∅ ∪ A = A ∉ ℋ, so ∅ ∉ ℋ1.
(ii) If E, F ∈ ℋ1 then (E ∩ F) ∪ A = (E ∪ A) ∩ (F ∪ A) ∈ ℋ, so E ∩ F ∈ ℋ1.
(iii) If X ⊇ E ⊇ F ∈ ℋ1 then E ∪ A ⊇ F ∪ A ∈ ℋ, so E ∪ A ∈ ℋ and E ∈ ℋ1. Q
Also ℋ1 ⊇ ℋ, so ℋ1 ∈ 𝒫. But ℋ is a maximal element of 𝒫, so ℋ1 = ℋ. Since (X \ A) ∪ A = X ∈ ℋ, X \ A ∈ ℋ1 and X \ A ∈ ℋ. As A is arbitrary, ℋ is an ultrafilter, as required.
2A1P I come now to a result from infinitary combinatorics for which I give a detailed proof, not because it cannot be found in many textbooks, but because it is usually given in enormously greater generality, to the point indeed that it may be harder to understand why the stated theorem covers the present result than to prove the latter from first principles.
Theorem (a) Let ⟨K_α⟩_{α∈A} be a family of countable sets, with #(A) strictly greater than c, the cardinal of the continuum. Then there are a set M, of cardinal at most c, and a set B ⊆ A, of cardinal strictly greater than c, such that K_α ∩ K_β ⊆ M whenever α, β are distinct members of B.
§2A2 intro.
The topology of Euclidean space
519
(b) Let I be a set, and ⟨f_α⟩_{α∈A} a family in {0, 1}^I, the set of functions from I to {0, 1}, with #(A) > c. If ⟨K_α⟩_{α∈A} is any family of countable subsets of I, then there is a set B ⊆ A, of cardinal greater than c, such that f_α and f_β agree on K_α ∩ K_β for all α, β ∈ B.
(c) In particular, under the conditions of (b), there are distinct α, β ∈ A such that f_α and f_β agree on K_α ∩ K_β.
proof (a) Choose inductively a family ⟨M_ξ⟩_ξ 0 such that U(x, δ) ⊆ R^r \ Ā. But now there is an n such that ‖x_n − x‖ < δ, in which case x_n ∈ U(x, δ) ∩ A ⊆ U(x, δ) ∩ Ā. X
2A2C Continuous functions (a) I begin with a characterization of continuous functions in terms of open sets. If r, s ≥ 1, D ⊆ R^r and φ : D → R^s is a function, we say that φ is continuous if for every x ∈ D, ε > 0 there is a δ > 0 such that ‖φ(y) − φ(x)‖ ≤ ε whenever y ∈ D and ‖y − x‖ ≤ δ. Now φ is continuous iff for every open set G ⊆ R^s there is an open set H ⊆ R^r such that φ⁻¹[G] = D ∩ H. P (i) Suppose that φ is continuous and that G ⊆ R^s is open. Set
H = ⋃{U : U ⊆ R^r is open, φ[U ∩ D] ⊆ G}.
Then H is a union of open sets, therefore open (1A2Bd), and H ∩ D ⊆ φ⁻¹[G]. If x ∈ φ⁻¹[G], then φ(x) ∈ G, so there is an ε > 0 such that U(φ(x), ε) ⊆ G; now there is a δ > 0 such that ‖φ(y) − φ(x)‖ ≤ ½ε whenever y ∈ D, ‖y − x‖ ≤ δ, so that φ[U(x, δ) ∩ D] ⊆ U(φ(x), ε) ⊆ G and x ∈ U(x, δ) ⊆ H. As x is arbitrary, φ⁻¹[G] = H ∩ D. As G is arbitrary, φ satisfies the condition.
(ii) Now suppose that φ satisfies the condition. Take x ∈ D, ε > 0. Then U(φ(x), ε) is open, so there is an open H ⊆ R^r such that H ∩ D = φ⁻¹[U(φ(x), ε)]; we see that x ∈ H, so there is a δ > 0 such that U(x, δ) ⊆ H; now if y ∈ D and ‖y − x‖ ≤ ½δ then y ∈ D ∩ H, φ(y) ∈ U(φ(x), ε) and ‖φ(y) − φ(x)‖ ≤ ε. As x and ε are arbitrary, φ is continuous. Q
(b) Using the ε-δ definition of continuity, it is easy to see that a function φ from a subset D of R^r to R^s is continuous iff all its components φ_i are continuous, writing φ(x) = (φ_1(x), . . . , φ_s(x)) for x ∈ D. P (i) If φ is continuous, i ≤ s, x ∈ D and ε > 0, then there is a δ > 0 such that |φ_i(y) − φ_i(x)| ≤ ‖φ(y) − φ(x)‖ ≤ ε whenever y ∈ D and ‖y − x‖ ≤ δ. (ii) If every φ_i is continuous, x ∈ D and ε > 0, then there are δ_i > 0 such that |φ_i(y) − φ_i(x)| ≤ ε/√s whenever y ∈ D and ‖y − x‖ ≤ δ_i; setting δ = min_{1≤i≤s} δ_i > 0, we have ‖φ(y) − φ(x)‖ ≤ ε whenever y ∈ D and ‖y − x‖ ≤ δ. Q
2A2D Compactness in R^r: Definition A subset F of R^r is called compact if whenever 𝒢 is a family of open sets covering F then there is a finite subset 𝒢0 of 𝒢 still covering F.
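A standard example, not taken from the text, shows how the definition in 2A2D is used. Take r = 1: the open intervals ]1/n, 1[ (n ≥ 2) cover ]0, 1[, but no finite subfamily does, since finitely many of them lie inside ]1/N, 1[ for the largest index N used and so miss the point 1/(2N); hence ]0, 1[ is not compact. The hypothetical helpers below (exact arithmetic via `fractions.Fraction`) return such a missed point for any finite choice of indices, and a covering index for any point of ]0, 1[.

```python
from fractions import Fraction


def covered(point, indices):
    """True if point lies in the union of the open intervals ]1/n, 1[ for n in indices."""
    return any(Fraction(1, n) < point < 1 for n in indices)


def missed_point(indices):
    """A point of ]0, 1[ outside ]1/n, 1[ for every n in the finite set indices (all n >= 2)."""
    return Fraction(1, 2 * max(indices))


def witness(point):
    """For 0 < point < 1, an index n >= 2 with point in ]1/n, 1[: the whole family covers ]0, 1[."""
    return int(1 / point) + 1


for indices in ([2], [3, 10], [5, 7, 1000]):
    p = missed_point(indices)
    print(indices, p, covered(p, indices))   # no finite subfamily covers p
print(witness(Fraction(3, 7)))
```

By contrast, the closed interval [0, 1] is compact, so no such missed point can be produced for covers of it.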
2A2F
The topology of Euclidean space
521
2A2E Elementary properties of compact sets Take any r ≥ 1.
(a) If F ⊆ R^r is compact and E ⊆ R^r is closed, then F ∩ E is compact. ▶ Let 𝒢 be an open cover of E ∩ F. Then 𝒢 ∪ {R^r \ E} is an open cover of F, so has a finite subcover 𝒢_0 say. Now 𝒢_0 \ {R^r \ E} is a finite subset of 𝒢 covering F ∩ E. As 𝒢 is arbitrary, F ∩ E is compact. ◀
(b) If F ⊆ R^r is compact and φ : D → R^s is a continuous function, where F ⊆ D ⊆ R^r, then φ[F] is compact. ▶ Let 𝒢 be an open cover of φ[F]. Let ℋ be {H : H ⊆ R^r is open, ∃ G ∈ 𝒢, φ^{-1}[G] = D ∩ H}. If x ∈ F, then φ(x) ∈ φ[F], so there is a G ∈ 𝒢 such that φ(x) ∈ G; now there is an H ∈ ℋ such that x ∈ φ^{-1}[G] = D ∩ H (2A2Ca); as x is arbitrary, F ⊆ ⋃ℋ. Let ℋ_0 be a finite subset of ℋ covering F. For each H ∈ ℋ_0, let G_H ∈ 𝒢 be such that φ^{-1}[G_H] = D ∩ H; then {G_H : H ∈ ℋ_0} is a finite subset of 𝒢 covering φ[F]. As 𝒢 is arbitrary, φ[F] is compact. ◀
(c) Any compact subset of R^r is closed. ▶ Let F ⊆ R^r be compact and write G = R^r \ F. Take any x ∈ G. Then G_n = R^r \ B(x, 2^{-n}) is open for every n ∈ N (1A2G). Also ⋃_{n∈N} G_n = {y : y ∈ R^r, ‖y − x‖ > 0} = R^r \ {x} ⊇ F. So there is some finite set 𝒢_0 ⊆ {G_n : n ∈ N} which covers F. There must be an n such that 𝒢_0 ⊆ {G_i : i ≤ n}, so that F ⊆ ⋃𝒢_0 ⊆ ⋃_{i≤n} G_i = G_n, and B(x, 2^{-n}) ⊆ G. As x is arbitrary, G is open and F is closed. ◀
(d) If F ⊆ R^r is compact, G ⊆ R^r is open and F ⊆ G, then there is a δ > 0 such that F + B(0, δ) ⊆ G. ▶ If F = ∅, this is trivial, as then F + B(0, 1) = {x + y : x ∈ F, y ∈ B(0, 1)} = ∅. Otherwise, set 𝒢 = {U(x, δ) : x ∈ R^r, δ > 0, U(x, 2δ) ⊆ G}. Then 𝒢 is a family of open sets and ⋃𝒢 = G (because G is open), so 𝒢 is an open cover of F and has a finite subcover 𝒢_0. Express 𝒢_0 as {U(x_0, δ_0), . . . , U(x_n, δ_n)} where U(x_i, 2δ_i) ⊆ G for each i. Set δ = min_{i≤n} δ_i > 0. If x ∈ F and y ∈ B(0, δ), then there is an i ≤ n such that x ∈ U(x_i, δ_i); now ‖(x + y) − x_i‖ ≤ ‖x − x_i‖ + ‖y‖ < δ_i + δ ≤ 2δ_i, so x + y ∈ U(x_i, 2δ_i) ⊆ G. As x and y are arbitrary, F + B(0, δ) ⊆ G.
◀ Remark This result is a simple form of the Lebesgue covering lemma.

2A2F The value of the concept of ‘compactness’ is greatly increased by the fact that there is an effective characterization of the compact subsets of R^r.
Theorem For any r ≥ 1, a subset F of R^r is compact iff it is closed and bounded.
proof (a) Suppose that F is compact. By 2A2Ec, it is closed. To see that it is bounded, consider 𝒢 = {U(0, n) : n ∈ N}. 𝒢 consists entirely of open sets, and ⋃𝒢 = R^r ⊇ F, so there is a finite 𝒢_0 ⊆ 𝒢 covering F. There must be an n such that 𝒢_0 ⊆ {U(0, i) : i ≤ n}, so that F ⊆ ⋃𝒢_0 ⊆ ⋃_{i≤n} U(0, i) = U(0, n), and F is bounded.
(b) Thus we are left with the converse; I have to show that a closed bounded set is compact. The main part of the argument is a proof by induction on r that the closed interval [−𝐧, 𝐧] is compact for all n ∈ N, writing 𝐧 = (n, . . . , n) ∈ R^r.
(i) If r = 1 and n ∈ N and 𝒢 is a family of open sets in R covering [−n, n], set
522
Appendix
2A2F
A = {x : x ∈ [−n, n], there is a finite 𝒢_0 ⊆ 𝒢 such that [−n, x] ⊆ ⋃𝒢_0}.
Then −n ∈ A, because if −n ∈ G ∈ 𝒢 then [−n, −n] ⊆ ⋃{G}; and A is bounded above by n, so c = sup A exists and belongs to [−n, n]. Next, c ∈ [−n, n] ⊆ ⋃𝒢, so there is a G ∈ 𝒢 containing c. Let δ > 0 be such that U(c, δ) ⊆ G. There is an x ∈ A such that x ≥ c − δ. Let 𝒢_0 be a finite subset of 𝒢 covering [−n, x]. Then 𝒢_1 = 𝒢_0 ∪ {G} is a finite subset of 𝒢 covering [−n, c + ½δ]. But c + ½δ > sup A, so c + ½δ ∉ A; this can only be because c + ½δ > n, and then 𝒢_1 is a finite subset of 𝒢 covering [−n, n]. As 𝒢 is arbitrary, [−n, n] is compact and the induction starts.
(ii) For the inductive step to r + 1, regard the closed interval F = [−𝐧, 𝐧], taken in R^{r+1}, as the product of the closed interval E = [−𝐧, 𝐧], taken in R^r, with the closed interval [−n, n] ⊆ R; by the inductive hypothesis and (i), both E and [−n, n] are compact. Let 𝒢 be a family of open subsets of R^{r+1} covering F. Write ℋ for the family of open subsets H of R^r such that H × [−n, n] is covered by a finite subfamily of 𝒢. Then E ⊆ ⋃ℋ. ▶ Take x ∈ E. Set 𝒰_x = {U : U ⊆ R is open, and there are G ∈ 𝒢 and an open H ⊆ R^r such that x ∈ H and H × U ⊆ G}. Then 𝒰_x is a family of open subsets of R. If ξ ∈ [−n, n], there is a G ∈ 𝒢 containing (x, ξ); there is a δ > 0 such that U((x, ξ), δ) ⊆ G; now U(x, ½δ) and U(ξ, ½δ) are open sets in R^r, R respectively and U(x, ½δ) × U(ξ, ½δ) ⊆ U((x, ξ), δ) ⊆ G (because ‖(y, η) − (x, ξ)‖² = ‖y − x‖² + |η − ξ|² < ½δ² for (y, η) in the product), so U(ξ, ½δ) ∈ 𝒰_x. As ξ is arbitrary, 𝒰_x is an open cover of [−n, n] in R. By (i), it has a finite subcover U_0, . . . , U_k say. For each j ≤ k we can find H_j, G_j such that H_j is an open subset of R^r containing x and H_j × U_j ⊆ G_j ∈ 𝒢. Now set H = ⋂_{j≤k} H_j. This is an open subset of R^r containing x, and H × [−n, n] ⊆ ⋃_{j≤k} G_j is covered by a finite subfamily of 𝒢. So x ∈ H ∈ ℋ. As x is arbitrary, ℋ covers E. ◀
(iii) Now the inductive hypothesis tells us that E is compact, so there is a finite subfamily ℋ_0 of ℋ covering E.
For each H ∈ ℋ_0 let 𝒢_H be a finite subfamily of 𝒢 covering H × [−n, n]. Then ⋃_{H∈ℋ_0} 𝒢_H is a finite subfamily of 𝒢 covering E × [−n, n] = F. As 𝒢 is arbitrary, F is compact and the induction proceeds.
(iv) Thus the interval [−𝐧, 𝐧] is compact in R^r for every r and n. Now suppose that F is a closed bounded set in R^r. Then there is an n ∈ N such that F ⊆ [−𝐧, 𝐧], that is, F = F ∩ [−𝐧, 𝐧]. As F is closed and [−𝐧, 𝐧] is compact, F is compact, by 2A2Ea. This completes the proof.

2A2G Corollary If φ : D → R is continuous, where D ⊆ R^r, and F ⊆ D is a non-empty compact set, then φ is bounded and attains its bounds on F.
proof By 2A2Eb, φ[F] is compact; by 2A2F it is closed and bounded. To say that φ[F] is bounded is just to say that φ is bounded on F. Because φ[F] is a non-empty bounded set, it has an infimum a and a supremum b; both belong to the closure of φ[F] (by the criterion 2A2B(ii), or otherwise); because φ[F] is closed, both belong to φ[F], that is, φ attains its bounds.

2A2H Lim sup and lim inf revisited In §1A3 I briefly discussed lim sup_{n→∞} a_n, lim inf_{n→∞} a_n for real sequences ⟨a_n⟩_{n∈N}. In this volume we also need the notions lim sup_{δ↓0} f(δ), lim inf_{δ↓0} f(δ) for real functions f. I say that lim sup_{δ↓0} f(δ) = u ∈ [−∞, ∞] if
(i) for every v > u there is an η > 0 such that f(δ) is defined and less than or equal to v for every δ ∈ ]0, η],
(ii) for every v < u and η > 0 there is a δ ∈ ]0, η] such that f(δ) is defined and greater than or equal to v.
Similarly, lim inf_{δ↓0} f(δ) = u ∈ [−∞, ∞] if
(i) for every v < u there is an η > 0 such that f(δ) is defined and greater than or equal to v for every δ ∈ ]0, η],
(ii) for every v > u and η > 0 there is a δ ∈ ]0, η] such that f(δ) is defined and less than or equal to v.

2A2I
In the one-dimensional case, we have a particularly simple description of the open sets.
Proposition If G ⊆ R is any open set, it is expressible as the union of a countable disjoint family of open intervals.
proof For x, y ∈ G write x ∼ y if either x ≤ y and [x, y] ⊆ G or y ≤ x and [y, x] ⊆ G. It is easy to check that ∼ is an equivalence relation on G. Let 𝒞 be the set of equivalence classes under ∼. Then 𝒞 is a
2A3Bd
General topology
523
partition of G. Now every C ∈ 𝒞 is an open interval. ▶ Set a = inf C, b = sup C (allowing a = −∞ and/or b = ∞ if C is unbounded). If a < x < b, there are y, z ∈ C such that y ≤ x ≤ z, so that [y, x] ⊆ [y, z] ⊆ G (because y ∼ z), y ∼ x and x ∈ C; thus ]a, b[ ⊆ C. If x ∈ C, there is an open interval I containing x and included in G; since x ∼ y for every y ∈ I, I ⊆ C; so a ≤ inf I < x < sup I ≤ b and x ∈ ]a, b[. Thus C = ]a, b[ is an open interval. ◀ To see that 𝒞 is countable, observe that every member of 𝒞 contains a member of Q, and distinct members of 𝒞 are disjoint, so that we have a surjective function from a subset of Q onto 𝒞, and 𝒞 is countable (1A1E).
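When G happens to be given as a finite union of open intervals, the ∼-classes in the proof above can be computed directly. The following is a minimal Python sketch under that assumption (the function name `components` is ours, not the text's). It sorts the intervals and merges two of them only when they genuinely overlap, so ]a, b[ and ]b, c[ stay separate: b ∉ G, and so no point of the first interval is ∼-related to a point of the second.

```python
import math


def components(intervals):
    """Components of G = union of the open intervals ]a, b[ listed in `intervals`.

    Each component is returned as a pair (a, b) standing for ]a, b[; endpoints
    may be -math.inf or math.inf.  Empty intervals (a >= b) are ignored.
    """
    result = []
    for a, b in sorted((a, b) for a, b in intervals if a < b):
        if result and a < result[-1][1]:
            # ]a, b[ overlaps the current component, so its points are ~-related to it
            last_a, last_b = result[-1]
            result[-1] = (last_a, max(last_b, b))
        else:
            # a >= right end of the current component: that right end is not in G,
            # so a new equivalence class starts here
            result.append((a, b))
    return result


print(components([(0, 1), (0.5, 2), (2, 3), (5, 6), (-math.inf, -4)]))
# [(-inf, -4), (0, 2), (2, 3), (5, 6)]
```

Note that the point 2 separates ]0, 2[ from ]2, 3[, matching the fact that [x, y] ⊄ G whenever x < 2 < y.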
2A3 General topology
At various points (principally §§245-247, but also for certain ideas in Chapter 27) we need to know something about non-metrizable topologies. I must say that you should probably take the time to look at some book on elementary functional analysis which has the phrases ‘weak compactness’ or ‘weakly compact’ in the index. But I can list here the concepts actually used in this volume, in a good deal less space than any orthodox, complete treatment would employ.

2A3A Topologies First we need to know what a ‘topology’ is. If X is any set, a topology on X is a family T of subsets of X such that
(i) ∅, X ∈ T,
(ii) if G, H ∈ T then G ∩ H ∈ T,
(iii) if 𝒢 ⊆ T then ⋃𝒢 ∈ T
(cf. 1A2B). The pair (X, T) is now a topological space. In this context, members of T are called open and their complements (in X) are called closed (cf. 1A2E-1A2F).

2A3B Continuous functions (a) If (X, T) and (Y, S) are topological spaces, a function φ : X → Y is continuous if φ^{-1}[G] ∈ T for every G ∈ S. (By 2A2Ca above, this is consistent with the ε-δ definition of continuity for functions from one Euclidean space to another.)
(b) If (X, T), (Y, S) and (Z, U) are topological spaces and φ : X → Y and ψ : Y → Z are continuous, then ψφ : X → Z is continuous. ▶ If G ∈ U then ψ^{-1}[G] ∈ S, so (ψφ)^{-1}[G] = φ^{-1}[ψ^{-1}[G]] ∈ T. ◀
(c) If (X, T) is a topological space, a function f : X → R is continuous iff {x : a < f(x) < b} is open whenever a < b in R. ▶ (i) Every interval ]a, b[ is open in R, so if f is continuous its inverse image {x : a < f(x) < b} must be open. (ii) Suppose that f^{-1}[ ]a, b[ ] is open whenever a < b, and let H ⊆ R be any open set. By the definition of ‘open’ set in R (1A2A),
H = ⋃{]y − δ, y + δ[ : y ∈ R, δ > 0, ]y − δ, y + δ[ ⊆ H},
so
f^{-1}[H] = ⋃{f^{-1}[ ]y − δ, y + δ[ ] : y ∈ R, δ > 0, ]y − δ, y + δ[ ⊆ H}
is a union of open sets in X, therefore open. ◀
(d) If r ≥ 1, (X, T) is a topological space, and φ : X → R^r is a function, then φ is continuous iff φ_i : X → R is continuous for each i ≤ r, where φ(x) = (φ_1(x), . . . , φ_r(x)) for each x ∈ X. ▶ (i) Suppose that φ is continuous. For i ≤ r and y = (η_1, . . . , η_r) ∈ R^r, set π_i(y) = η_i. Then |π_i(y) − π_i(z)| ≤ ‖y − z‖ for all y, z ∈ R^r, so π_i : R^r → R is continuous. Consequently φ_i = π_iφ is continuous, by 2A3Bb. (ii) Suppose that every φ_i is continuous, and that H ⊆ R^r is open. Set
𝒢 = {G : G ⊆ X is open, G ⊆ φ^{-1}[H]}.
Then G_0 = ⋃𝒢 is open, and G_0 ⊆ φ^{-1}[H]. But suppose that x_0 is any point of φ^{-1}[H]. Then there is a δ > 0 such that U(φ(x_0), δ) ⊆ H, because H is open and contains φ(x_0). For 1 ≤ i ≤ r set V_i = {x : φ_i(x_0) − δ/√r < φ_i(x) < φ_i(x_0) + δ/√r}; then V_i is the inverse image of an open set under the continuous map φ_i, so is open. Set G = ⋂_{1≤i≤r} V_i. Then G is open (using (ii) of the definition 2A3A), x_0 ∈ G, and ‖φ(x) − φ(x_0)‖ < δ for every x ∈ G, so G ⊆ φ^{-1}[H], G ∈ 𝒢 and x_0 ∈ G_0. This shows that φ^{-1}[H] = G_0 is open. As H is arbitrary, φ is continuous. ◀
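On a finite set the axioms of 2A3A and the definition of continuity in 2A3B(a) can be checked mechanically. The sketch below is only an illustration under that finiteness assumption; the helper names (`is_topology`, `preimage`, `is_continuous`) and the two example topologies are ours. For a finite family, closure under pairwise intersections and unions already gives (ii) and (iii) of 2A3A, since the empty union is ∅, which (i) supplies.

```python
from itertools import combinations


def is_topology(X, T):
    """Check 2A3A (i)-(iii) for a family T of frozensets on a finite set X."""
    X, T = frozenset(X), set(T)
    if frozenset() not in T or X not in T or any(not G <= X for G in T):
        return False
    # for a finite family, pairwise closure gives closure under all finite unions
    return all(G & H in T and G | H in T for G, H in combinations(T, 2))


def preimage(phi, X, G):
    """phi^{-1}[G] as a frozenset."""
    return frozenset(x for x in X if phi(x) in G)


def is_continuous(phi, X, T, S):
    """2A3B(a): phi is (T, S)-continuous iff phi^{-1}[G] belongs to T for every G in S."""
    T = set(T)
    return all(preimage(phi, X, G) in T for G in S)


def identity(x):
    return x


X = {0, 1}
sierpinski = {frozenset(), frozenset({1}), frozenset(X)}
discrete = {frozenset(s) for k in range(3) for s in combinations(X, k)}
print(is_topology(X, sierpinski), is_topology(X, discrete))  # True True
print(is_continuous(identity, X, discrete, sierpinski))      # True
print(is_continuous(identity, X, sierpinski, discrete))      # False
```

The last line fails because the preimage {0} of the open set {0} is not open in the Sierpiński topology.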
(e) If (X, T) is a topological space, f_1, . . . , f_r are continuous functions from X to R, and h : R^r → R is continuous, then h(f_1, . . . , f_r) : X → R is continuous. ▶ Set φ(x) = (f_1(x), . . . , f_r(x)) ∈ R^r for x ∈ X. By (d), φ is continuous, so by 2A3Bb h(f_1, . . . , f_r) = hφ is continuous. ◀ In particular, f + g, f × g and f − g are continuous for all continuous functions f, g : X → R.
(f) If (X, T) and (Y, S) are topological spaces and φ : X → Y is a continuous function, then φ^{-1}[F] is closed in X for every closed set F ⊆ Y. (For X \ φ^{-1}[F] = φ^{-1}[Y \ F] is open.)

2A3C Subspace topologies If (X, T) is a topological space and D ⊆ X, then T_D = {G ∩ D : G ∈ T} is a topology on D. ▶ (i) ∅ = ∅ ∩ D and D = X ∩ D belong to T_D. (ii) If G, H ∈ T_D there are G′, H′ ∈ T such that G = G′ ∩ D, H = H′ ∩ D; now G ∩ H = G′ ∩ H′ ∩ D ∈ T_D. (iii) If 𝒢 ⊆ T_D set ℋ = {H : H ∈ T, H ∩ D ∈ 𝒢}; then ⋃𝒢 = (⋃ℋ) ∩ D ∈ T_D. ◀ T_D is called the subspace topology on D, or the topology on D induced by T. If (Y, S) is another topological space, and φ : X → Y is (T, S)-continuous, then φ↾D : D → Y is (T_D, S)-continuous. (For if H ∈ S then (φ↾D)^{-1}[H] = D ∩ φ^{-1}[H] ∈ T_D.)

2A3D Closures and interiors (a) In the proof of 2A3Bd I have already used the following idea. Let (X, T) be any topological space and A any subset of X. Write
int A = ⋃{G : G ∈ T, G ⊆ A}.
Then int A is an open set, being a union of open sets, and is of course included in A; it must be the largest open set included in A, and is called the interior of A.
(b) Because a set is closed iff its complement is open, we have a complementary notion:
Ā = ⋂{F : F is closed, A ⊆ F}
  = X \ ⋃{X \ F : F is closed, A ⊆ F}
  = X \ ⋃{G : G is open, A ∩ G = ∅}
  = X \ ⋃{G : G is open, G ⊆ X \ A} = X \ int(X \ A).
Ā is closed (being the complement of an open set) and is the smallest closed set including A; it is called the closure of A. (Compare 2A2A.) Because the union of two closed sets is closed (cf. 1A2F), the closure of A ∪ B is Ā ∪ B̄ for all A, B ⊆ X (Ā ∪ B̄ is a closed set including A ∪ B, while each of Ā, B̄ is included in the closure of A ∪ B).
(c) There are innumerable ways of looking at these concepts; a useful description of the closure of a set is
x ∈ Ā ⟺ x ∉ int(X \ A) ⟺ there is no open set containing x and included in X \ A ⟺ every open set containing x meets A.
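The formulae of 2A3D can likewise be tried out on a finite space. In this sketch (hypothetical helper names; the three-point topology is an arbitrary choice of ours) the closure is computed both as X \ int(X \ A), as in (b), and by the neighbourhood test of (c), and the rule that the closure of A ∪ B is Ā ∪ B̄ is checked over all pairs of subsets.

```python
from itertools import combinations


def interior(A, T):
    """int A = union of all open sets included in A (2A3D(a))."""
    A = frozenset(A)
    return frozenset().union(*(G for G in T if G <= A))


def closure(A, X, T):
    """Closure of A computed as X \\ int(X \\ A) (2A3D(b))."""
    X = frozenset(X)
    return X - interior(X - frozenset(A), T)


def closure_by_neighbourhoods(A, X, T):
    """2A3D(c): x is in the closure of A iff every open set containing x meets A."""
    A = frozenset(A)
    return frozenset(x for x in X if all(G & A for G in T if x in G))


X = {0, 1, 2}
T = {frozenset(), frozenset({0}), frozenset({0, 1}), frozenset(X)}  # a nested chain of open sets
subsets = [frozenset(s) for k in range(len(X) + 1) for s in combinations(sorted(X), k)]

print(sorted(closure({1}, X, T)))  # [1, 2]
print(all(closure(A | B, X, T) == closure(A, X, T) | closure(B, X, T)
          for A in subsets for B in subsets))  # True
```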
2A3E Hausdorff topologies (a) The concept of ‘topological space’ is so widely drawn, and so widely applicable, that a vast number of different types of topological space have been studied. For this volume we shall not need much of the (very extensive) vocabulary which has been developed to describe this variety. But one useful word (naming one of the most important concepts) is ‘Hausdorff space’; a topological space X is Hausdorff if for all distinct x, y ∈ X there are disjoint open sets G, H ⊆ X such that x ∈ G and y ∈ H.
(b) In a Hausdorff space X, finite sets are closed. ▶ If z ∈ X, then for any x ∈ X \ {z} there is an open set containing x but not z, so X \ {z}, being the union of such sets, is open, and {z} is closed. So a finite set is a finite union of closed sets and is therefore closed. ◀
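For a finite set X, 2A3E(b) says that a Hausdorff topology must make every subset closed, that is, it must be the discrete topology. The following sketch (hypothetical helper names, two example topologies on {0, 1} chosen by us) checks the definition in (a) directly and confirms that the Sierpiński topology is neither Hausdorff nor has all finite sets closed.

```python
from itertools import combinations


def is_hausdorff(X, T):
    """2A3E(a): any two distinct points lie in disjoint open sets."""
    return all(
        any(x in G and y in H and not (G & H) for G in T for H in T)
        for x, y in combinations(X, 2)
    )


def all_finite_sets_closed(X, T):
    """Conclusion of 2A3E(b) for finite X: every subset has an open complement."""
    X = frozenset(X)
    subsets = (frozenset(s) for k in range(len(X) + 1) for s in combinations(X, k))
    return all(X - F in T for F in subsets)


X = {0, 1}
sierpinski = {frozenset(), frozenset({1}), frozenset(X)}
discrete = {frozenset(), frozenset({0}), frozenset({1}), frozenset(X)}
print(is_hausdorff(X, sierpinski), all_finite_sets_closed(X, sierpinski))  # False False
print(is_hausdorff(X, discrete), all_finite_sets_closed(X, discrete))      # True True
```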
2A3F Pseudometrics Many important topologies (not all!) can be defined by families of pseudometrics; it will be useful to have a certain amount of technical skill with these.
(a) Let X be a set. A pseudometric on X is a function ρ : X × X → [0, ∞[ such that
ρ(x, z) ≤ ρ(x, y) + ρ(y, z) for all x, y, z ∈ X (the ‘triangle inequality’);
ρ(x, y) = ρ(y, x) for all x, y ∈ X;
ρ(x, x) = 0 for all x ∈ X.
A metric is a pseudometric ρ satisfying the further condition
if ρ(x, y) = 0 then x = y.
(b) Examples (i) For x, y ∈ R, set ρ(x, y) = |x − y|; then ρ is a metric on R (the ‘usual metric’ on R).
(ii) For x, y ∈ R^r, where r ≥ 1, set ρ(x, y) = ‖x − y‖, defining ‖z‖ = √(ζ_1² + . . . + ζ_r²) for z = (ζ_1, . . . , ζ_r) ∈ R^r, as usual. Then ρ is a metric, the Euclidean metric on R^r. (The triangle inequality for ρ comes from Cauchy’s inequality in 1A2C: if x, y, z ∈ R^r, then
ρ(x, z) = ‖x − z‖ = ‖(x − y) + (y − z)‖ ≤ ‖x − y‖ + ‖y − z‖ = ρ(x, y) + ρ(y, z).
The other required properties of ρ are elementary. Compare 2A4Bb below.)
(iii) For an example of a pseudometric which is not a metric, take r ≥ 2 and define ρ : R^r × R^r → [0, ∞[ by setting ρ(x, y) = |ξ_1 − η_1| whenever x = (ξ_1, . . . , ξ_r), y = (η_1, . . . , η_r) ∈ R^r.
(c) Now let X be a set and P a non-empty family of pseudometrics on X. Let T be the family of those subsets G of X such that for every x ∈ G there are ρ_0, . . . , ρ_n ∈ P and δ > 0 such that
U(x; ρ_0, . . . , ρ_n; δ) = {y : y ∈ X, max_{i≤n} ρ_i(y, x) < δ} ⊆ G.
Then T is a topology on X. ▶ (Compare 1A2B.) (i) ∅ ∈ T because the condition is vacuously satisfied. X ∈ T because U(x; ρ; 1) ⊆ X for any x ∈ X and ρ ∈ P. (ii) If G, H ∈ T and x ∈ G ∩ H, take ρ_0, . . . , ρ_m, ρ′_0, . . . , ρ′_n ∈ P and δ, δ′ > 0 such that
U(x; ρ_0, . . . , ρ_m; δ) ⊆ G, U(x; ρ′_0, . . . , ρ′_n; δ′) ⊆ H;
then
U(x; ρ_0, . . . , ρ_m, ρ′_0, . . . , ρ′_n; min(δ, δ′)) ⊆ G ∩ H.
As x is arbitrary, G ∩ H ∈ T. (iii) If 𝒢 ⊆ T and x ∈ ⋃𝒢, there is a G ∈ 𝒢 such that x ∈ G; now there are ρ_0, . . . , ρ_n ∈ P and δ > 0 such that
U(x; ρ_0, . . . , ρ_n; δ) ⊆ G ⊆ ⋃𝒢.
As x is arbitrary, ⋃𝒢 ∈ T. ◀ T is the topology defined by P.
(d) You may wish to have a convention to deal with the case in which P is the empty set; in this case the topology on X defined by P is {∅, X}.
(e) In many important cases, P is upwards-directed in the sense that for any ρ_1, ρ_2 ∈ P there is a ρ ∈ P such that ρ_i(x, y) ≤ ρ(x, y) for all x, y ∈ X and both i. In this case, of course, any set U(x; ρ_0, . . . , ρ_n; δ), where ρ_0, . . . , ρ_n ∈ P, includes some set of the form U(x; ρ; δ), where ρ ∈ P. Consequently, for instance, a set G ⊆ X is open iff for every x ∈ G there are ρ ∈ P and δ > 0 such that U(x; ρ; δ) ⊆ G.
(f) A topology T is metrizable if it is the topology defined by a family P consisting of a single metric. Thus the Euclidean topology on R^r is the metrizable topology defined by {ρ}, where ρ is the metric of (b-ii) above.

2A3G Proposition Let X be a set with a topology defined by a non-empty set P of pseudometrics on X. Then U(x; ρ_0, . . . , ρ_n; ε) is open for all x ∈ X, ρ_0, . . . , ρ_n ∈ P and ε > 0.
proof (Compare 1A2D.) Take y ∈ U(x; ρ_0, . . . , ρ_n; ε). Set
η = max_{i≤n} ρ_i(y, x), δ = ε − η > 0.
If z ∈ U(y; ρ_0, . . . , ρ_n; δ) then ρ_i(z, x) ≤ ρ_i(z, y) + ρ_i(y, x) < δ + η = ε for each i ≤ n, so U(y; ρ_0, . . . , ρ_n; δ) ⊆ U(x; ρ_0, . . . , ρ_n; ε). As y is arbitrary, U(x; ρ_0, . . . , ρ_n; ε) is open.

2A3H Now we have a result corresponding to 2A2Ca, describing continuous functions between topological spaces defined by families of pseudometrics.
Proposition Let X and Y be sets; let P be a non-empty family of pseudometrics on X, and Θ a non-empty family of pseudometrics on Y; let T and S be the corresponding topologies. Then a function φ : X → Y is continuous iff whenever x ∈ X, θ ∈ Θ and ε > 0, there are ρ_0, . . . , ρ_n ∈ P and δ > 0 such that θ(φ(y), φ(x)) ≤ ε whenever y ∈ X and max_{i≤n} ρ_i(y, x) ≤ δ.
proof (a) Suppose that φ is continuous; take x ∈ X, θ ∈ Θ and ε > 0. By 2A3G, U(φ(x); θ; ε) ∈ S. So G = φ^{-1}[U(φ(x); θ; ε)] ∈ T. Now x ∈ G, so there are ρ_0, . . . , ρ_n ∈ P and δ > 0 such that U(x; ρ_0, . . . , ρ_n; δ) ⊆ G. In this case θ(φ(y), φ(x)) ≤ ε whenever y ∈ X and max_{i≤n} ρ_i(y, x) ≤ ½δ. As x, θ and ε are arbitrary, φ satisfies the condition.
(b) Suppose φ satisfies the condition. Take H ∈ S and consider G = φ^{-1}[H]. If x ∈ G, then φ(x) ∈ H, so there are θ_0, . . . , θ_n ∈ Θ and ε > 0 such that U(φ(x); θ_0, . . . , θ_n; ε) ⊆ H. For each i ≤ n there are ρ_{i0}, . . . , ρ_{im_i} ∈ P and δ_i > 0 such that θ_i(φ(y), φ(x)) ≤ ½ε whenever y ∈ X and max_{j≤m_i} ρ_{ij}(y, x) ≤ δ_i. Set δ = min_{i≤n} δ_i > 0; then
U(x; ρ_{00}, . . . , ρ_{0m_0}, . . . , ρ_{n0}, . . . , ρ_{nm_n}; δ) ⊆ G.
As x is arbitrary, G ∈ T. As H is arbitrary, φ is continuous.

2A3I Remarks (a) If P is upwards-directed, the condition simplifies to: for every x ∈ X, θ ∈ Θ and ε > 0, there are ρ ∈ P and δ > 0 such that θ(φ(y), φ(x)) ≤ ε whenever y ∈ X and ρ(y, x) ≤ δ.
(b) Suppose we have a set X and two non-empty families P, Θ of pseudometrics on X, generating topologies T and S on X.
Then S ⊆ T iff the identity map φ from X to itself is a continuous function when regarded as a map from (X, T) to (X, S), because this will mean that G = φ^{-1}[G] belongs to T whenever G ∈ S. Applying the proposition above to φ, we see that this happens iff for every θ ∈ Θ, x ∈ X and ε > 0 there are ρ_0, . . . , ρ_n ∈ P and δ > 0 such that θ(y, x) ≤ ε whenever y ∈ X and max_{i≤n} ρ_i(y, x) ≤ δ. Similarly, reversing the roles of P and Θ, we get a criterion for when T ⊆ S, and putting the two together we obtain a criterion to determine when T = S.

2A3J Subspaces: Proposition If X is a set, P a non-empty family of pseudometrics on X defining a topology T on X, and D ⊆ X, then
(a) for every ρ ∈ P, the restriction ρ^{(D)} of ρ to D × D is a pseudometric on D;
(b) the topology defined by P_D = {ρ^{(D)} : ρ ∈ P} on D is precisely the subspace topology T_D described in 2A3C.
proof (a) is just a matter of reading through the definition in 2A3Fa. For (b), we have to think for a moment.
(i) Suppose that G belongs to the topology defined by P_D. Set
ℋ = {H : H ∈ T, H ∩ D ⊆ G}, H* = ⋃ℋ ∈ T, G* = H* ∩ D ∈ T_D;
then G* ⊆ G. On the other hand, if x ∈ G, then there are ρ_0, . . . , ρ_n ∈ P and δ > 0 such that
U(x; ρ_0^{(D)}, . . . , ρ_n^{(D)}; δ) = {y : y ∈ D, max_{i≤n} ρ_i^{(D)}(y, x) < δ} ⊆ G.
Consider
H = U(x; ρ_0, . . . , ρ_n; δ) = {y : y ∈ X, max_{i≤n} ρ_i(y, x) < δ} ⊆ X.
Evidently
H ∩ D = U(x; ρ_0^{(D)}, . . . , ρ_n^{(D)}; δ) ⊆ G.
Also H ∈ T, by 2A3G. So H ∈ ℋ and x ∈ H ∩ D ⊆ H* ∩ D = G*. Thus G = G* ∈ T_D.
(ii) Now suppose that G ∈ T_D, and let H ∈ T be such that G = H ∩ D. Consider the identity (inclusion) map φ : D → X, defined by saying that φ(x) = x for every x ∈ D. φ obviously satisfies the criterion of 2A3H, if we endow D with P_D and X with P, because ρ(φ(x), φ(y)) = ρ^{(D)}(x, y) whenever x, y ∈ D and ρ ∈ P; so φ must be continuous for the associated topologies, and φ^{-1}[H] must belong to the topology defined by P_D. But φ^{-1}[H] = H ∩ D = G. Thus every set in T_D belongs to the topology defined by P_D, and the two topologies are the same, as claimed.

2A3K Closures and interiors Let X be a set, P a non-empty family of pseudometrics on X and T the topology defined by P.
(a) For any A ⊆ X and x ∈ X,
x ∈ int A ⟺ there is an open set included in A containing x ⟺ there are ρ_0, . . . , ρ_n ∈ P and δ > 0 such that U(x; ρ_0, . . . , ρ_n; δ) ⊆ A.
(b) For any A ⊆ X and x ∈ X, x ∈ Ā iff U(x; ρ_0, . . . , ρ_n; δ) ∩ A ≠ ∅ for every ρ_0, . . . , ρ_n ∈ P and δ > 0. (Compare 2A2B(ii), 2A3Dc.)

2A3L Hausdorff topologies Recall that a topology T is Hausdorff if any two distinct points can be separated by open sets (2A3E). Now a topology defined on a set X by a non-empty family P of pseudometrics is Hausdorff iff for any two different points x, y of X there is a ρ ∈ P such that ρ(x, y) > 0. ▶ (i) Suppose that the topology is Hausdorff and that x, y are distinct points in X. Then there is an open set G containing x but not containing y. Now there are ρ_0, . . . , ρ_n ∈ P and δ > 0 such that U(x; ρ_0, . . . , ρ_n; δ) ⊆ G, in which case ρ_i(y, x) ≥ δ > 0 for some i ≤ n. (ii) If P satisfies the condition, and x, y are distinct points of X, take ρ ∈ P such that ρ(x, y) > 0, and set δ = ½ρ(x, y). Then U(x; ρ; δ) and U(y; ρ; δ) are disjoint (because if z ∈ X, then ρ(z, x) + ρ(z, y) ≥ ρ(x, y) = 2δ, so at least one of ρ(z, x), ρ(z, y) is greater than or equal to δ), and they are open sets (2A3G) containing x, y respectively.
As x and y are arbitrary, the topology is Hausdorff. ◀ In particular, metrizable topologies are Hausdorff.

2A3M Convergence of sequences (a) If (X, T) is any topological space, and ⟨x_n⟩_{n∈N} is a sequence in X, we say that ⟨x_n⟩_{n∈N} converges to x ∈ X, or that x is a limit of ⟨x_n⟩_{n∈N}, or ⟨x_n⟩_{n∈N} → x, if for every open set G containing x there is an n_0 ∈ N such that x_n ∈ G for every n ≥ n_0.
(b) Warning In general topological spaces, it is possible for a sequence to have more than one limit, and we cannot safely write x = lim_{n→∞} x_n. But in Hausdorff spaces, this does not occur. ▶ If T is Hausdorff, and x, y are distinct points of X, there are disjoint open sets G, H such that x ∈ G and y ∈ H. If now ⟨x_n⟩_{n∈N} converges to x, there is an n_0 such that x_n ∈ G for every n ≥ n_0, so x_n ∉ H for every n ≥ n_0, and ⟨x_n⟩_{n∈N} cannot converge to y. ◀ In particular, a sequence in a metric space can have at most one limit.
(c) Let X be a set, and P a non-empty family of pseudometrics on X, generating a topology T; let ⟨x_n⟩_{n∈N} be a sequence in X and x ∈ X. Then ⟨x_n⟩_{n∈N} converges to x iff lim_{n→∞} ρ(x_n, x) = 0 for every ρ ∈ P. ▶ (i) Suppose that ⟨x_n⟩_{n∈N} → x and that ρ ∈ P. Then for any ε > 0 the set G = U(x; ρ; ε) is an open set containing x (2A3G), so there is an n_0 such that x_n ∈ G for every n ≥ n_0, that is, ρ(x_n, x) < ε for every n ≥ n_0. As ε is arbitrary, lim_{n→∞} ρ(x_n, x) = 0. (ii) If the condition is satisfied, take any open set G containing x. Then there are ρ_0, . . . , ρ_k ∈ P and δ > 0 such that U(x; ρ_0, . . . , ρ_k; δ) ⊆ G. For each i ≤ k there is an n_i ∈ N such that ρ_i(x_n, x) < δ for every n ≥ n_i. Set n* = max(n_0, . . . , n_k); then x_n ∈ U(x; ρ_0, . . . , ρ_k; δ) ⊆ G for every n ≥ n*. As G is arbitrary, ⟨x_n⟩_{n∈N} → x. ◀
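The pseudometric of 2A3F(b)(iii) gives a concrete case where the criterion of 2A3L fails and the warning in 2A3M(b) bites. In this minimal sketch (helper names are hypothetical; points of R² are modelled as Python tuples and P as a list of functions), the single pseudometric ρ(x, y) = |ξ_1 − η_1| cannot separate (0, 0) from (0, 5), and the sequence x_n = (1/(n+1), n) satisfies ρ(x_n, ·) → 0 at both points, so by 2A3M(c) it converges to each of them.

```python
Point = tuple[float, ...]


def first_coordinate(x: Point, y: Point) -> float:
    """Pseudometric of 2A3F(b)(iii): rho(x, y) = |xi_1 - eta_1| on R^r, r >= 2."""
    return abs(x[0] - y[0])


def in_basic_nbhd(y: Point, x: Point, rhos, delta: float) -> bool:
    """Is y in U(x; rho_0, ..., rho_n; delta) as defined in 2A3F(c)?"""
    return max(rho(y, x) for rho in rhos) < delta


def separates(rhos, x: Point, y: Point) -> bool:
    """Criterion of 2A3L: some rho in P has rho(x, y) > 0."""
    return any(rho(x, y) > 0 for rho in rhos)


P = [first_coordinate]
a, b = (0.0, 0.0), (0.0, 5.0)
xs = [(1.0 / (n + 1), float(n)) for n in range(1000)]  # x_n = (1/(n+1), n)

print(separates(P, a, b))  # False: the topology defined by P is not Hausdorff
# rho(x_n, a) = rho(x_n, b) = 1/(n+1), so for n >= 100 every x_n lies in both neighbourhoods
print(all(in_basic_nbhd(x, a, P, 0.01) and in_basic_nbhd(x, b, P, 0.01) for x in xs[100:]))  # True
```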
(d) Let (X, ρ) be a metric space, A a subset of X and x ∈ X. Then x ∈ Ā iff there is a sequence in A converging to x. ▶ (i) If x ∈ Ā, then for every n ∈ N there is a point x_n ∈ A ∩ U(x; ρ; 2^{-n}) (2A3Kb); now ⟨x_n⟩_{n∈N} → x. (ii) If ⟨x_n⟩_{n∈N} is a sequence in A converging to x, then for every open set G containing x there is an n such that x_n ∈ G, so that A ∩ G ≠ ∅; by 2A3Dc, x ∈ Ā. ◀

2A3N Compactness The next concept we need is the idea of ‘compactness’ in general topological spaces.
(a) If (X, T) is any topological space, a subset K of X is compact if whenever 𝒢 is a family in T covering K, then there is a finite 𝒢_0 ⊆ 𝒢 covering K. (Cf. 2A2D. A warning: many authors reserve the term ‘compact’ for Hausdorff spaces.) A set A ⊆ X is relatively compact in X if there is a compact subset of X including A.
(b) Just as in 2A2E-2A2G (and the proofs are the same in the general case), we have the following results.
(i) If K is compact and E is closed, then K ∩ E is compact.
(ii) If K ⊆ X is compact and φ : K → Y is continuous, where (Y, S) is another topological space, then φ[K] is a compact subset of Y.
(iii) If K ⊆ X is a non-empty compact set and φ : K → R is continuous, then φ is bounded and attains its bounds.

2A3O Cluster points (a) If (X, T) is a topological space, and ⟨x_n⟩_{n∈N} is a sequence in X, then a cluster point of ⟨x_n⟩_{n∈N} is an x ∈ X such that whenever G is an open set containing x and n ∈ N then there is a k ≥ n such that x_k ∈ G.
(b) Now if (X, T) is a topological space and A ⊆ X is relatively compact, every sequence ⟨x_n⟩_{n∈N} in A has a cluster point in X. ▶ Let K be a compact subset of X including A. Set 𝒢 = {G : G ∈ T, {n : x_n ∈ G} is finite}. Suppose, if possible, that 𝒢 covers K. Then there is a finite 𝒢_0 ⊆ 𝒢 covering K. Now
N = {n : x_n ∈ A} = {n : x_n ∈ ⋃𝒢_0} = ⋃_{G∈𝒢_0} {n : x_n ∈ G}
is a finite union of finite sets, which is absurd. Thus 𝒢 does not cover K. Take any x ∈ K \ ⋃𝒢. If G ∈ T, x ∈ G and n ∈ N, then G ∉ 𝒢, so {k : x_k ∈ G} is infinite and there is a k ≥ n such that x_k ∈ G. Thus x is a cluster point of ⟨x_n⟩_{n∈N}, as required. ◀

2A3P Filters In R^r, and more generally in all metrizable spaces, topological ideas can be effectively discussed in terms of convergent sequences. (To be sure, this occasionally necessitates the use of a weak form of the axiom of choice, in order to choose a sequence; but as measure theory without such choices is eviscerated, there is no point in fussing about them here.) For topological spaces in general, however, sequences are quite inadequate, for very interesting reasons which I shall not enlarge upon. Instead we need to use ‘nets’ or ‘filters’. The latter take a moment’s more effort at the beginning, but are then (in my view) much easier to work with, so I describe this method now.

2A3Q Convergent filters (a) Let (X, T) be a topological space, 𝓕 a filter on X (see 2A1I), and x ∈ X. We say that 𝓕 is convergent to x, or that x is a limit of 𝓕, and write 𝓕 → x, if every open set containing x belongs to 𝓕.
(b) Let (X, T) and (Y, S) be topological spaces, φ : X → Y a continuous function, x ∈ X and 𝓕 a filter on X converging to x. Then φ[[𝓕]] (as defined in 2A1Ib) converges to φ(x) (because φ^{-1}[G] is an open set containing x whenever G is an open set containing φ(x)).

2A3R
Now we have the following characterization of compactness.
Theorem Let X be a topological space, and K a subset of X. Then K is compact iff every ultrafilter on X containing K has a limit in K.
proof (a) Suppose that K is compact and that 𝓕 is an ultrafilter on X containing K. Set 𝒢 = {G : G ⊆ X is open, X \ G ∈ 𝓕}. Then the union of any two members of 𝒢 belongs to 𝒢, so the union of any finite number of members of 𝒢 belongs to 𝒢; also no member of 𝒢 can include K, because X \ K ∉ 𝓕. Because K is compact, it follows that 𝒢 cannot cover K. Let x be any point of K \ ⋃𝒢. If G is any open set containing x, then G ∉ 𝒢, so X \ G ∉ 𝓕; but this means that G must belong to 𝓕, because 𝓕 is an ultrafilter. As G is arbitrary, 𝓕 → x. Thus every ultrafilter on X containing K has a limit in K.
(b) Now suppose that every ultrafilter on X containing K has a limit in K. Let 𝒢 be a cover of K by open sets in X. Suppose, if possible, that 𝒢 has no finite subcover. Set
𝓕 = {F : there is a finite 𝒢_0 ⊆ 𝒢 such that F ∪ ⋃𝒢_0 ⊇ K}.
Then 𝓕 is a filter on X. ▶ (i) X ∪ ⋃∅ ⊇ K, so X ∈ 𝓕. ∅ ∪ ⋃𝒢_0 = ⋃𝒢_0 ⊉ K for any finite 𝒢_0 ⊆ 𝒢, by hypothesis, so ∅ ∉ 𝓕. (ii) If E, F ∈ 𝓕 there are finite sets 𝒢_1, 𝒢_2 ⊆ 𝒢 such that E ∪ ⋃𝒢_1 and F ∪ ⋃𝒢_2 both include K; now (E ∩ F) ∪ ⋃(𝒢_1 ∪ 𝒢_2) ⊇ K, so E ∩ F ∈ 𝓕. (iii) If X ⊇ E ⊇ F ∈ 𝓕 then there is a finite 𝒢_0 ⊆ 𝒢 such that F ∪ ⋃𝒢_0 ⊇ K; now E ∪ ⋃𝒢_0 ⊇ K and E ∈ 𝓕. ◀ By the Ultrafilter Theorem (2A1O), there is an ultrafilter 𝓕* on X including 𝓕. Of course K itself belongs to 𝓕, so K ∈ 𝓕*. By hypothesis, 𝓕* has a limit x ∈ K. But now there is a set G ∈ 𝒢 containing x, and (X \ G) ∪ G ⊇ K, so X \ G ∈ 𝓕 ⊆ 𝓕*; which means that G cannot belong to 𝓕*, and x cannot be a limit of 𝓕*. This contradiction shows that 𝒢 has a finite subcover. As 𝒢 is arbitrary, K must be compact.
Remark Note that this theorem depends vitally on the Ultrafilter Theorem and therefore on the axiom of choice.

2A3S Further calculations with filters (a) In general, it is possible for a filter to have more than one limit; but in Hausdorff spaces this does not occur. ▶ (Compare 2A3Mb.)
If (X, T) is Hausdorff, and x, y are distinct points of X, there are disjoint open sets G, H such that x ∈ G and y ∈ H. If now a filter 𝓕 on X converges to x, then G ∈ 𝓕, so H ∉ 𝓕 (otherwise G ∩ H = ∅ would belong to 𝓕) and 𝓕 does not converge to y. ◀ Accordingly we can safely write x = lim 𝓕 when 𝓕 → x in a Hausdorff space.
(b) Now suppose that X is a set, 𝓕 is a filter on X, (Y, S) is a Hausdorff space, D ∈ 𝓕 and φ : D → Y is a function. Then we write lim_{x→𝓕} φ(x) for lim φ[[𝓕]] if this is defined in Y; that is, lim_{x→𝓕} φ(x) = y iff φ^{-1}[H] ∈ 𝓕 for every open set H containing y. In the special case Y = R, lim_{x→𝓕} φ(x) = a iff {x : |φ(x) − a| ≤ ε} ∈ 𝓕 for every ε > 0 (because every open set containing a includes a set of the form [a − ε, a + ε], which in turn includes the open set ]a − ε, a + ε[).
(c) Suppose that X and Y are sets, 𝓕 is a filter on X, Θ is a non-empty family of pseudometrics on Y defining a topology S on Y, and φ : X → Y is a function. Then the image filter φ[[𝓕]] converges to y ∈ Y iff lim_{x→𝓕} θ(φ(x), y) = 0 in R for every θ ∈ Θ. ▶ (i) Suppose that φ[[𝓕]] → y. For every θ ∈ Θ and ε > 0, U(y; θ; ε) = {z : θ(z, y) < ε} is an open set containing y (2A3G), so belongs to φ[[𝓕]], and its inverse image {x : 0 ≤ θ(φ(x), y) < ε} belongs to 𝓕. As ε is arbitrary, lim_{x→𝓕} θ(φ(x), y) = 0. As θ is arbitrary, φ satisfies the condition. (ii) Now suppose that lim_{x→𝓕} θ(φ(x), y) = 0 for every θ ∈ Θ. Let G be any open set in Y containing y. Then there are θ_0, . . . , θ_n ∈ Θ and ε > 0 such that
U(y; θ_0, . . . , θ_n; ε) = ⋂_{i≤n} U(y; θ_i; ε) ⊆ G.
For each i ≤ n, φ^{-1}[U(y; θ_i; ε)] = {x : θ_i(φ(x), y) < ε} belongs to 𝓕, since it includes {x : θ_i(φ(x), y) ≤ ½ε}; because 𝓕 is closed under finite intersections, so do φ^{-1}[U(y; θ_0, . . . , θ_n; ε)] and its superset φ^{-1}[G]. Thus G ∈ φ[[𝓕]]. As G is arbitrary, φ[[𝓕]] → y. ◀
(d) In particular, taking X = Y and φ the identity map, if X has a topology T defined by a non-empty family P of pseudometrics, then a filter 𝓕 on X converges to x ∈ X iff lim_{y→𝓕} ρ(y, x) = 0 for every ρ ∈ P.
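The definition of convergence in 2A3Q(a), and the uniqueness of limits in 2A3S(a), can be watched on finite spaces. In this sketch a filter is modelled, by our own choice, as a membership predicate on subsets of X, and only principal ultrafilters {A : p ∈ A} are used (on a finite set every ultrafilter has this form); the helper names are hypothetical. In the non-Hausdorff Sierpiński space the ultrafilter at 1 has two limits, while in the discrete space the limit is unique.

```python
def limits(in_filter, X, T):
    """All x with F -> x in the sense of 2A3Q(a): every open set containing x is in F.

    in_filter is a predicate on subsets of X describing the filter F.
    """
    return {x for x in X if all(in_filter(G) for G in T if x in G)}


def principal_ultrafilter(p):
    """Membership test for the ultrafilter {A : p in A}."""
    return lambda A: p in A


X = {0, 1}
sierpinski = {frozenset(), frozenset({1}), frozenset(X)}
discrete = {frozenset(), frozenset({0}), frozenset({1}), frozenset(X)}
print(limits(principal_ultrafilter(1), X, sierpinski))  # {0, 1}: two limits, not Hausdorff
print(limits(principal_ultrafilter(0), X, sierpinski))  # {0}
print(limits(principal_ultrafilter(1), X, discrete))    # {1}
```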
(e)(i) If X is any set, F is an ultrafilter on X, (Y, S) is a topological space, and h : X → Y is a function such that h[F] is relatively compact in Y for some F ∈ F, then lim_{x→F} h(x) is defined in Y. ▶ Let K ⊆ Y be a compact set including h[F]. Then K ∈ h[[F]], which is an ultrafilter (2A1N), so h[[F]] has a limit in Y (2A3R), which is lim_{x→F} h(x). ◀

(ii) If X is any set, F is an ultrafilter on X, and h : X → R is a function such that h[F] is bounded in R for some set F ∈ F, then lim_{x→F} h(x) exists in R. ▶ The closure of h[F] is closed and bounded, therefore compact (2A2F), so h[F] is relatively compact and we can use (i). ◀

(f) The concepts of lim sup, lim inf can be applied to filters. Suppose that F is a filter on a set X, and that f : X → R is any function. Then

$$\limsup_{x\to\mathcal F} f(x) = \inf\{u : u \in [-\infty,\infty],\ \{x : f(x) \le u\} \in \mathcal F\} = \inf_{F\in\mathcal F}\,\sup_{x\in F} f(x) \in [-\infty,\infty],$$

$$\liminf_{x\to\mathcal F} f(x) = \sup\{u : u \in [-\infty,\infty],\ \{x : f(x) \ge u\} \in \mathcal F\} = \sup_{F\in\mathcal F}\,\inf_{x\in F} f(x).$$
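For the Fréchet filter F0 of (g) below, every member of F0 includes a tail {n : n ≥ m}. The formula inf_{F∈F} sup_{x∈F} f(x) can therefore be evaluated on tails. The Python sketch below approximates lim sup and lim inf of a sequence by inf_{m ≤ M} sup_{m ≤ n < N} a_n and its mirror image, taking M much smaller than N. The helper names and the truncation to a finite list are assumptions of this sketch. It is a heuristic only, because a finite computation never sees a whole tail.

```python
from typing import List

def tail_limsup(values: List[float], m_max: int) -> float:
    """inf over m <= m_max of sup{values[n] : m <= n < len(values)}."""
    return min(max(values[m:]) for m in range(m_max + 1))

def tail_liminf(values: List[float], m_max: int) -> float:
    """sup over m <= m_max of inf{values[n] : m <= n < len(values)}."""
    return max(min(values[m:]) for m in range(m_max + 1))

# a_n = (-1)^n (1 + 1/(n+1)), whose lim sup is 1 and lim inf is -1
values = [(-1) ** n * (1 + 1 / (n + 1)) for n in range(1000)]
ls, li = tail_limsup(values, 200), tail_liminf(values, 200)
print(ls, li)  # close to 1 and -1; truncating at m = 200 leaves an error of about 1/200
```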
It is easy to see that, for any two functions f, g : X → R,

lim_{x→F} f(x) = a iff a = lim sup_{x→F} f(x) = lim inf_{x→F} f(x),

and

$$\limsup_{x\to\mathcal F}(f(x)+g(x)) \le \limsup_{x\to\mathcal F} f(x) + \limsup_{x\to\mathcal F} g(x),\qquad \liminf_{x\to\mathcal F}(f(x)+g(x)) \ge \liminf_{x\to\mathcal F} f(x) + \liminf_{x\to\mathcal F} g(x),$$

$$\liminf_{x\to\mathcal F}(-f(x)) = -\limsup_{x\to\mathcal F} f(x),\qquad \limsup_{x\to\mathcal F}(-f(x)) = -\liminf_{x\to\mathcal F} f(x),$$

$$\liminf_{x\to\mathcal F} cf(x) = c\liminf_{x\to\mathcal F} f(x),\qquad \limsup_{x\to\mathcal F} cf(x) = c\limsup_{x\to\mathcal F} f(x),$$

whenever the right-hand sides are defined in [−∞, ∞] and c ≥ 0. So if a = lim_{x→F} f(x) and b = lim_{x→F} g(x) exist in R, then lim_{x→F}(f(x) + g(x)) exists and is equal to a + b, and lim_{x→F} cf(x) exists and is equal to c·lim_{x→F} f(x) for every c ∈ R. We also see that if f : X → R is such that for every ε > 0 there is an F ∈ F such that sup_{x∈F} f(x) ≤ ε + inf_{x∈F} f(x), then lim sup_{x→F} f(x) ≤ ε + lim inf_{x→F} f(x) for every ε > 0, so that lim_{x→F} f(x) is defined in [−∞, ∞].

(g) Note that the standard limits of real analysis can be represented in the form described here. For instance, lim_{n→∞}, lim sup_{n→∞}, lim inf_{n→∞} correspond to lim_{n→F0}, lim sup_{n→F0}, lim inf_{n→F0}, where F0 is the Fréchet filter on N, the filter {N \ A : A ⊆ N is finite} of cofinite subsets of N. Similarly, lim_{δ↓a}, lim sup_{δ↓a}, lim inf_{δ↓a} correspond to lim_{δ→F}, lim sup_{δ→F}, lim inf_{δ→F}, where F = {A : A ⊆ R, ∃ h > 0 such that ]a, a + h] ⊆ A}.

2A3T Product topologies We need some brief remarks concerning topologies on product spaces.

(a) Let (X, T) and (Y, S) be topological spaces. Let U be the set of subsets U of X × Y such that for every (x, y) ∈ U there are G ∈ T, H ∈ S such that (x, y) ∈ G × H ⊆ U. Then U is a topology on X × Y. ▶ (i) ∅ ∈ U because the condition for membership of U is vacuously satisfied. X × Y ∈ U because X ∈ T, Y ∈ S and (x, y) ∈ X × Y ⊆ X × Y for every (x, y) ∈ X × Y. (ii) If U, V ∈ U and (x, y) ∈ U ∩ V, then there are G, G′ ∈ T, H, H′ ∈ S such that (x, y) ∈ G × H ⊆ U and (x, y) ∈ G′ × H′ ⊆ V;
now G ∩ G′ ∈ T, H ∩ H′ ∈ S and (x, y) ∈ (G ∩ G′) × (H ∩ H′) ⊆ U ∩ V. As (x, y) is arbitrary, U ∩ V ∈ U. (iii) If 𝒱 ⊆ U and (x, y) ∈ ⋃𝒱, then there is a V ∈ 𝒱 such that (x, y) ∈ V; now there are G ∈ T, H ∈ S such that (x, y) ∈ G × H ⊆ V ⊆ ⋃𝒱. As (x, y) is arbitrary, ⋃𝒱 ∈ U. ◀ U is called the product topology on X × Y.

(b) Suppose, in (a), that T and S are defined by non-empty families P, Θ of pseudometrics in the manner of 2A3F. Then U is defined by the family Υ = {ρ̃ : ρ ∈ P} ∪ {θ̄ : θ ∈ Θ} of pseudometrics on X × Y, where

ρ̃((x, y), (x′, y′)) = ρ(x, x′),  θ̄((x, y), (x′, y′)) = θ(y, y′)

whenever x, x′ ∈ X, y, y′ ∈ Y, ρ ∈ P and θ ∈ Θ. ▶ (i) Of course you should check that every ρ̃, θ̄ is a pseudometric on X × Y. (ii) If U ∈ U and (x, y) ∈ U, then there are G ∈ T, H ∈ S such that (x, y) ∈ G × H ⊆ U. There are ρ0, …, ρm ∈ P, θ0, …, θn ∈ Θ and δ, δ′ > 0 such that (in the language of 2A3Fc) U(x; ρ0, …, ρm; δ) ⊆ G and U(y; θ0, …, θn; δ′) ⊆ H. Now U((x, y); ρ̃0, …, ρ̃m, θ̄0, …, θ̄n; min(δ, δ′)) ⊆ G × H ⊆ U. As (x, y) is arbitrary, U is open for the topology generated by Υ. (iii) If U ⊆ X × Y is open for the topology defined by Υ, take any (x, y) ∈ U. Then there are υ0, …, υk ∈ Υ and δ > 0 such that U((x, y); υ0, …, υk; δ) ⊆ U. Take ρ0, …, ρm ∈ P and θ0, …, θn ∈ Θ such that {υ0, …, υk} ⊆ {ρ̃0, …, ρ̃m, θ̄0, …, θ̄n}; then G = U(x; ρ0, …, ρm; δ) ∈ T (2A3G), H = U(y; θ0, …, θn; δ) ∈ S, and G × H = U((x, y); ρ̃0, …, ρ̃m, θ̄0, …, θ̄n; δ) ⊆ U((x, y); υ0, …, υk; δ) ⊆ U. As (x, y) is arbitrary, U ∈ U. This completes the proof that U is the topology defined by Υ. ◀

(c) In particular, the product topology on R^r × R^s is the Euclidean topology if we identify R^r × R^s with R^{r+s}. ▶ The product topology is defined by the two pseudometrics υ1, υ2, where for x, x′ ∈ R^r and y, y′ ∈ R^s I write
υ1((x, y), (x′, y′)) = ‖x − x′‖,  υ2((x, y), (x′, y′)) = ‖y − y′‖

(2A3F(b-ii)). Similarly, the Euclidean topology on R^r × R^s ≅ R^{r+s} is defined by the metric ρ, where

$$\rho((x,y),(x',y')) = \|(x,y)-(x',y')\| = \sqrt{\|x-x'\|^2 + \|y-y'\|^2}.$$

Now if (x, y) ∈ R^r × R^s and ε > 0, then U((x, y); ρ; ε) ⊆ U((x, y); υj; ε) for both j, while

$$U\bigl((x,y);\upsilon_1,\upsilon_2;\tfrac{\varepsilon}{\sqrt2}\bigr) \subseteq U((x,y);\rho;\varepsilon).$$

Thus, as remarked in 2A3Ib, each topology is included in the other, and they are the same. ◀

2A3U Dense sets (a) If X is a topological space, a set D ⊆ X is dense in X if D̄ = X, that is, if every non-empty open set meets D. More generally, if D ⊆ A ⊆ X, then D is dense in A if it is dense for the subspace topology of A (2A3C), that is, if A ⊆ D̄.

(b) If T is defined by a non-empty family P of pseudometrics on X, then D ⊆ X is dense iff U(x; ρ0, …, ρn; δ) ∩ D ≠ ∅ whenever x ∈ X, ρ0, …, ρn ∈ P and δ > 0.

(c) If (X, T), (Y, S) are topological spaces, of which Y is Hausdorff, and f, g : X → Y are continuous functions which agree on some dense subset D of X, then f = g. ▶?? Suppose, if possible, that there is an x ∈ X such that f(x) ≠ g(x). Then there are open sets G, H ⊆ Y such that f(x) ∈ G, g(x) ∈ H and G ∩ H = ∅. Now f⁻¹[G] ∩ g⁻¹[H] is an open set, containing x and therefore not empty, but it cannot meet D, so x ∉ D̄ and D is not dense. ✗ ◀ In particular, this is the case if (X, ρ) and (Y, θ) are metric spaces.
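The two inclusions used in 2A3Tc reduce to the elementary inequalities max(a, b) ≤ √(a² + b²) ≤ √2·max(a, b) for a, b ≥ 0. The Python sketch below is a quick sanity check, not a proof. It samples random points of R^r × R^s and tests both inclusions. The choice r = 2, s = 3, the sampling scheme and the helper names are arbitrary assumptions of the sketch.

```python
import math
import random

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def check_inclusions(r=2, s=3, eps=0.5, trials=10_000, seed=0):
    """Sample (x', y') near (x, y) and test the two inclusions of 2A3Tc."""
    rng = random.Random(seed)
    x = [rng.uniform(-1, 1) for _ in range(r)]
    y = [rng.uniform(-1, 1) for _ in range(s)]
    for _ in range(trials):
        x2 = [t + rng.uniform(-eps, eps) for t in x]
        y2 = [t + rng.uniform(-eps, eps) for t in y]
        u1 = norm([a - b for a, b in zip(x, x2)])   # upsilon_1
        u2 = norm([a - b for a, b in zip(y, y2)])   # upsilon_2
        rho = math.sqrt(u1 ** 2 + u2 ** 2)          # Euclidean metric on R^(r+s)
        # U((x,y); rho; eps) is included in U((x,y); upsilon_j; eps) for both j
        if rho < eps and not (u1 < eps and u2 < eps):
            return False
        # U((x,y); upsilon_1, upsilon_2; eps/sqrt 2) is included in U((x,y); rho; eps)
        if u1 < eps / math.sqrt(2) and u2 < eps / math.sqrt(2) and not rho < eps:
            return False
    return True

print(check_inclusions())  # expected: True
```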
(d) A topological space is called separable if it has a countable dense subset. For instance, R^r is separable for every r ≥ 1, since Q^r is dense.
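Density of Q^r in R^r (Euclidean metric) can be seen constructively by approximating each coordinate by a fraction. The Python sketch below uses the standard-library method `fractions.Fraction.limit_denominator`, which returns a closest fraction with denominator at most the given bound. By Dirichlet's approximation theorem, each coordinate error is then below 1/N for bound N, so the Euclidean error is below √r/N. The function name `rational_approx` is hypothetical, chosen for illustration.

```python
import math
from fractions import Fraction

def rational_approx(x, max_den):
    """Return a point of Q^r, coordinatewise closest with denominators <= max_den."""
    return [Fraction(t).limit_denominator(max_den) for t in x]

x = [math.pi, math.e, math.sqrt(2)]
q = rational_approx(x, 1000)
err = math.sqrt(sum((float(a) - b) ** 2 for a, b in zip(q, x)))
print(q, err)  # err is below sqrt(3)/1000
```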
2A4 Normed spaces In Chapter 24 I discuss the spaces L^p, for 1 ≤ p ≤ ∞, and describe their most basic properties. These spaces form a group of fundamental examples for the general theory of ‘normed spaces’, the basis of functional analysis. This is not the book from which you should learn that theory, but once again it may save you trouble if I briefly outline those parts of the general theory which are essential if you are to make sense of the ideas here.

2A4A The real and complex fields While the most important parts of the theory, from the point of view of measure theory, are most effectively dealt with in terms of real linear spaces, there are many applications in which complex linear spaces are essential. I will therefore use the phrase ‘U is a linear space over R/C’ to mean that U is either a linear space over the field R or a linear space over the field C; it being understood that in any particular context all linear spaces considered will be over the same field. In the same way, I will write ‘α ∈ R/C’ to mean that α belongs to whichever is the current underlying field.

2A4B Definitions (a) A normed space is a linear space U over R/C together with a norm, that is, a functional ‖ ‖ : U → [0, ∞[ such that
‖u + v‖ ≤ ‖u‖ + ‖v‖ for all u, v ∈ U,
‖αu‖ = |α|‖u‖ for u ∈ U, α ∈ R/C,
‖u‖ = 0 only when u = 0, the zero vector of U.
(Observe that if u = 0 (the zero vector) then 0u = u (where this 0 is the zero scalar) so that ‖u‖ = |0|‖u‖ = 0.)

(b) If U is a normed space, then we have a metric ρ on U defined by saying that ρ(u, v) = ‖u − v‖ for u, v ∈ U. ▶ ρ(u, v) ∈ [0, ∞[ for all u, v because ‖u‖ ∈ [0, ∞[ for every u. ρ(u, v) = ρ(v, u) for all u, v because ‖v − u‖ = |−1|‖u − v‖ = ‖u − v‖ for all u, v. If u, v, w ∈ U then ρ(u, w) = ‖u − w‖ = ‖(u − v) + (v − w)‖ ≤ ‖u − v‖ + ‖v − w‖ = ρ(u, v) + ρ(v, w). If ρ(u, v) = 0 then ‖u − v‖ = 0 so u − v = 0 and u = v.
◀ We therefore have a corresponding topology, with open and closed sets, closures, convergent sequences and so on.

(c) If U is a normed space, a set A ⊆ U is bounded (for the norm) if {‖u‖ : u ∈ A} is bounded in R; that is, there is some M ≥ 0 such that ‖u‖ ≤ M for every u ∈ A.

2A4C Linear subspaces (a) If U is any normed space and V is a linear subspace of U, then V is also a normed space, if we take the norm of V to be just the restriction to V of the norm of U; the verification is trivial.

(b) If V is a linear subspace of U, so is its closure V̄. ▶ Take u, u′ ∈ V̄ and α ∈ R/C. If ε > 0, set δ = ε/(2 + |α|) > 0; then there are v, v′ ∈ V such that ‖u − v‖ ≤ δ, ‖u′ − v′‖ ≤ δ. Now v + v′, αv ∈ V and
‖(u + u′) − (v + v′)‖ ≤ ‖u − v‖ + ‖u′ − v′‖ ≤ ε,  ‖αu − αv‖ ≤ |α|‖u − v‖ ≤ ε.
As ε is arbitrary, u + u′ and αu belong to V̄; as u, u′ and α are arbitrary, and 0 surely belongs to V ⊆ V̄, V̄ is a linear subspace of U. ◀

2A4D Banach spaces (a) If U is a normed space, a sequence ⟨un⟩n∈N in U is Cauchy if ‖um − un‖ → 0 as m, n → ∞, that is, for every ε > 0 there is an n0 ∈ N such that ‖um − un‖ ≤ ε for all m, n ≥ n0.
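In R with the usual norm, the partial sums u_n = ∑_{i=1}^{n} 1/i² form a Cauchy sequence. Indeed, for m > n ≥ 1, 0 ≤ u_m − u_n ≤ ∑_{i>n} 1/i² < 1/n. This bound suggests the choice n_k = 4^k in the thinning-out step of the proof of Lemma 2A4E below, which gives |v_{k+1} − v_k| ≤ 4^{−k} for v_k = u_{n_k}. The Python sketch below checks this numerically for small k, in floating point. The sequence and the helper name are choices made for illustration.

```python
def thinned_differences(k_max):
    """Return |v_{k+1} - v_k| for v_k = u_{4^k}, k = 0, ..., k_max - 1,
    where u_n = sum_{i=1}^n 1/i^2 (computed in floating point)."""
    targets = {4 ** k for k in range(k_max + 1)}
    total, v = 0.0, []
    for i in range(1, 4 ** k_max + 1):
        total += 1.0 / (i * i)
        if i in targets:          # record v_k = u_{4^k}
            v.append(total)
    return [abs(v[k + 1] - v[k]) for k in range(k_max)]

diffs = thinned_differences(6)
for k, d in enumerate(diffs):
    print(k, d, 4.0 ** -k)  # each d should lie below 4^-k
```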
(b) A normed space U is complete if every Cauchy sequence has a limit; a complete normed space is called a Banach space.

2A4E It is helpful to know the following result.
Lemma Let U be a normed space such that ⟨un⟩n∈N is convergent (that is, has a limit) in U whenever ⟨un⟩n∈N is a sequence in U such that ‖u_{n+1} − un‖ ≤ 4^{−n} for every n ∈ N. Then U is complete.

proof Let ⟨un⟩n∈N be any Cauchy sequence in U. For each k ∈ N, let n_k ∈ N be such that ‖um − un‖ ≤ 4^{−k} whenever m, n ≥ n_k. Set v_k = u_{n_k} for each k. Then ‖v_{k+1} − v_k‖ ≤ 4^{−k} (whether n_k ≤ n_{k+1} or n_{k+1} ≤ n_k). So ⟨v_k⟩k∈N has a limit v ∈ U. I seek to show that v is the required limit of ⟨un⟩n∈N. Given ε > 0, let l ∈ N be such that ‖v_k − v‖ ≤ ε for every k ≥ l; let k ≥ l be such that 4^{−k} ≤ ε; then if n ≥ n_k,
‖un − v‖ = ‖(un − v_k) + (v_k − v)‖ ≤ ‖un − v_k‖ + ‖v_k − v‖ ≤ ‖un − u_{n_k}‖ + ε ≤ 2ε.
As ε is arbitrary, v is a limit of ⟨un⟩n∈N. As ⟨un⟩n∈N is arbitrary, U is complete.

2A4F Bounded linear operators (a) Let U, V be two normed spaces. A linear operator T : U → V is bounded if {‖Tu‖ : u ∈ U, ‖u‖ ≤ 1} is bounded. (Warning! In this context, we do not ask for the whole set of values T[U] to be bounded; a ‘bounded linear operator’ need not be what we ordinarily call a ‘bounded function’.) Write B(U; V) for the space of all bounded linear operators from U to V, and for T ∈ B(U; V) write ‖T‖ = sup{‖Tu‖ : u ∈ U, ‖u‖ ≤ 1}.

(b) A useful fact: ‖Tu‖ ≤ ‖T‖‖u‖ for every T ∈ B(U; V), u ∈ U. ▶ If |α| > ‖u‖ then ‖(1/α)u‖ = ‖u‖/|α| ≤ 1, so
‖Tu‖ = ‖αT((1/α)u)‖ = |α|‖T((1/α)u)‖ ≤ |α|‖T‖;
as α is arbitrary, ‖Tu‖ ≤ ‖T‖‖u‖. ◀

(c) A linear operator T : U → V is bounded iff it is continuous for the norm topologies on U and V. ▶ (i) If T is bounded, u0 ∈ U and ε > 0, then
‖Tu − Tu0‖ = ‖T(u − u0)‖ ≤ ‖T‖‖u − u0‖ ≤ ε whenever ‖u − u0‖ ≤ ε/(1 + ‖T‖);
by 2A3H, T is continuous. (ii) If T is continuous, then there is some δ > 0 such that ‖Tu‖ = ‖Tu − T0‖ ≤ 1 whenever ‖u‖ = ‖u − 0‖ ≤ δ. If now ‖u‖ ≤ 1,
‖Tu‖ = (1/δ)‖T(δu)‖ ≤ 1/δ,
so T is a bounded operator. ◀

2A4G Theorem B(U; V) is a linear space over R/C, and ‖ ‖ is a norm on B(U; V).

proof I am rather supposing that you are aware, but in any case you will find it easy to check, that if S : U → V and T : U → V are linear operators, and α ∈ R/C, then we have linear operators S + T and αT from U to V defined by the formulae
(S + T)(u) = Su + Tu,  (αT)(u) = α(Tu)
for every u ∈ U; moreover, that under these definitions of addition and scalar multiplication the space of all linear operators from U to V is a linear space. Now we see that whenever S, T ∈ B(U; V), α ∈ R/C, u ∈ U and ‖u‖ ≤ 1,
‖(S + T)(u)‖ = ‖Su + Tu‖ ≤ ‖Su‖ + ‖Tu‖ ≤ ‖S‖ + ‖T‖,
‖(αT)u‖ = ‖α(Tu)‖ = |α|‖Tu‖ ≤ |α|‖T‖;
so that S + T and αT belong to B(U; V), with ‖S + T‖ ≤ ‖S‖ + ‖T‖ and ‖αT‖ ≤ |α|‖T‖. This shows that B(U; V) is a linear subspace of the space of all linear operators and is therefore a linear space over R/C in its own right.

To check that the given formula for ‖T‖ defines a norm, most of the work has just been done; I suppose I should remark, for the sake of form, that ‖T‖ ∈ [0, ∞[ for every T; if α = 0, then of course ‖αT‖ = 0 = |α|‖T‖; for other α,
|α|‖T‖ = |α|‖α^{−1}αT‖ ≤ |α||α^{−1}|‖αT‖ = ‖αT‖,
so ‖αT‖ = |α|‖T‖. Finally, if ‖T‖ = 0 then ‖Tu‖ ≤ ‖T‖‖u‖ = 0 for every u ∈ U, so Tu = 0 for every u and T is the zero operator (in the space of all linear operators, and therefore in its subspace B(U; V)).

2A4H Dual spaces The most important case of B(U; V) is when V is the scalar field R/C itself (of course we can think of R/C as a normed space over itself, writing ‖α‖ = |α| for each scalar α). In this case we call B(U; R/C) the dual of U; it is commonly denoted U′ or U*; I use the latter.

2A4I Extensions of bounded operators: Theorem Let U be a normed space and V ⊆ U a dense linear subspace. Let W be a Banach space and T0 : V → W a bounded linear operator; then there is a unique bounded linear operator T : U → W extending T0, and ‖T‖ = ‖T0‖.

proof (a) For any u ∈ U, there is a sequence ⟨vn⟩n∈N in V converging to u. Now
‖T0vm − T0vn‖ = ‖T0(vm − vn)‖ ≤ ‖T0‖‖vm − vn‖ ≤ ‖T0‖(‖vm − u‖ + ‖u − vn‖) → 0
as m, n → ∞, so ⟨T0vn⟩n∈N is Cauchy and w = lim_{n→∞} T0vn is defined in W. If ⟨v′n⟩n∈N is another sequence in V converging to u, then
‖w − T0v′n‖ ≤ ‖w − T0vn‖ + ‖T0(vn − v′n)‖ ≤ ‖w − T0vn‖ + ‖T0‖(‖vn − u‖ + ‖u − v′n‖) → 0
as n → ∞, so w is also the limit of ⟨T0v′n⟩n∈N.

(b) We may therefore define T : U → W by setting Tu = lim_{n→∞} T0vn whenever ⟨vn⟩n∈N is a sequence in V converging to u. If v ∈ V, then we can set vn = v for every n to see that Tv = T0v; thus T extends T0.
If u, u′ ∈ U and α ∈ R/C, take sequences ⟨vn⟩n∈N, ⟨v′n⟩n∈N in V converging to u, u′ respectively; in this case
‖(u + u′) − (vn + v′n)‖ ≤ ‖u − vn‖ + ‖u′ − v′n‖ → 0,  ‖αu − αvn‖ = |α|‖u − vn‖ → 0
as n → ∞, so that
T(u + u′) = lim_{n→∞} T0(vn + v′n),  T(αu) = lim_{n→∞} T0(αvn),
and
‖T(u + u′) − Tu − Tu′‖ ≤ ‖T(u + u′) − T0(vn + v′n)‖ + ‖T0vn − Tu‖ + ‖T0v′n − Tu′‖ → 0,
‖T(αu) − αTu‖ ≤ ‖T(αu) − T0(αvn)‖ + |α|‖T0vn − Tu‖ → 0
as n → ∞. This means that ‖T(u + u′) − Tu − Tu′‖ = 0 and ‖T(αu) − αTu‖ = 0, so T(u + u′) = Tu + Tu′ and T(αu) = αTu; as u, u′ and α are arbitrary, T is linear.

(c) For any u ∈ U, let ⟨vn⟩n∈N be a sequence in V converging to u. Then
‖Tu‖ ≤ ‖T0vn‖ + ‖Tu − T0vn‖ ≤ ‖T0‖‖vn‖ + ‖Tu − T0vn‖ ≤ ‖T0‖(‖u‖ + ‖vn − u‖) + ‖Tu − T0vn‖ → ‖T0‖‖u‖
as n → ∞, so ‖Tu‖ ≤ ‖T0‖‖u‖. As u is arbitrary, T is bounded and ‖T‖ ≤ ‖T0‖. Of course ‖T‖ ≥ ‖T0‖ just because T extends T0.

(d) Finally, let T̃ be any other bounded linear operator from U to W extending T0. If u ∈ U, there is a sequence ⟨vn⟩n∈N in V converging to u; now, because T̃vn = T0vn = Tvn for every n,
‖T̃u − Tu‖ ≤ ‖T̃(u − vn)‖ + ‖T(vn − u)‖ ≤ (‖T̃‖ + ‖T‖)‖u − vn‖ → 0
as n → ∞, so ‖T̃u − Tu‖ = 0 and T̃u = Tu. As u is arbitrary, T̃ = T. Thus T is unique.
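The construction in 2A4I can be watched in a standard example that is supplied here for illustration and is not taken from the text. Let U be the space ℓ¹ of absolutely summable sequences, let V be its dense subspace of finitely supported sequences, and let T0 be the summation functional on V, which satisfies |T0v| ≤ ‖v‖₁. For u = (1, 1/2, 1/4, …), the truncations v_N of u lie in V, converge to u with ‖u − v_N‖₁ = 2^{−N}, and T0v_N converges to the value 2 of the extension. In the Python sketch below, u is represented by a function n ↦ u_n. The helper names are hypothetical, and only finitely many terms are ever evaluated.

```python
from fractions import Fraction

def truncate(u, N):
    """v_N: the finitely supported sequence agreeing with u on 0..N (as a list)."""
    return [u(n) for n in range(N + 1)]

def T0(v):
    """Summation functional on finitely supported sequences; |T0 v| <= ||v||_1."""
    return sum(v)

u = lambda n: Fraction(1, 2 ** n)       # u = (1, 1/2, 1/4, ...) lies in l^1
approximations = [T0(truncate(u, N)) for N in range(10)]
print(approximations[-1])   # → 1023/512, i.e. 2 - 2^-9
```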
2A4J Normed algebras (a) A normed algebra is a normed space (U, ‖ ‖) together with a multiplication, a binary operator × on U, such that
u × (v × w) = (u × v) × w,
u × (v + w) = (u × v) + (u × w),
(u + v) × w = (u × w) + (v × w),
(αu) × v = u × (αv) = α(u × v),
‖u × v‖ ≤ ‖u‖‖v‖
for all u, v, w ∈ U and α ∈ R/C.

(b) A Banach algebra is a normed algebra which is a Banach space. A normed algebra U is commutative if its multiplication is commutative, that is, u × v = v × u for all u, v ∈ U.
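A familiar finite-dimensional example, assumed here rather than taken from the text, is the algebra of real r × r matrices acting on R^r with the Euclidean norm. Under matrix multiplication, with each matrix T given the operator norm ‖T‖ of 2A4F, it is a normed algebra. NumPy's `numpy.linalg.norm(T, 2)` returns this norm (the largest singular value). The Python sketch below checks ‖Tu‖ ≤ ‖T‖‖u‖ (2A4Fb) and ‖ST‖ ≤ ‖S‖‖T‖ on random matrices. It also compares ‖T‖ with a crude estimate of sup{‖Tu‖ : ‖u‖ ≤ 1} taken from sampled unit vectors; that estimate can only approach the norm from below.

```python
import numpy as np

rng = np.random.default_rng(0)
r = 4
S = rng.standard_normal((r, r))
T = rng.standard_normal((r, r))

def op_norm(A):
    # operator norm for the Euclidean norm on R^r = largest singular value
    return np.linalg.norm(A, 2)

# crude lower estimate of sup{||Tu|| : ||u|| <= 1} from random unit vectors (columns)
U = rng.standard_normal((r, 20000))
U /= np.linalg.norm(U, axis=0)
sampled = np.linalg.norm(T @ U, axis=0).max()

# 2A4Fb: ||Tu|| <= ||T|| ||u||, checked for every sampled unit vector
bound_ok = bool(np.all(np.linalg.norm(T @ U, axis=0) <= op_norm(T) + 1e-12))
# 2A4J: submultiplicativity ||S T|| <= ||S|| ||T||
submult_ok = bool(op_norm(S @ T) <= op_norm(S) * op_norm(T) + 1e-12)
print(sampled, op_norm(T), bound_ok, submult_ok)
```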
2A5 Linear topological spaces The principal objective of §2A3 is in fact the study of certain topologies on the linear spaces of Chapter 24. I give some fragments of the general theory.

2A5A Linear space topologies Something which is not covered in detail by every introduction to functional analysis is the general concept of ‘linear topological space’. The ideas needed for the work of §245 are reasonably briefly expressed.

Definition A linear topological space or topological vector space over R/C is a linear space U over R/C together with a topology T such that the maps
(u, v) ↦ u + v : U × U → U,  (α, u) ↦ αu : R/C × U → U
are both continuous, where the product spaces U × U and R/C × U are given their product topologies (2A3T). Given a linear space U, a topology on U satisfying the conditions above is a linear space topology. Note that (u, v) ↦ u − v = u + (−1)v : U × U → U will also be continuous.

2A5B All the linear topological spaces we need turn out to be readily presentable in the following terms.
Proposition Suppose that U is a linear space over R/C, and T is a family of functionals τ : U → [0, ∞[ such that
(i) τ(u + v) ≤ τ(u) + τ(v) for all u, v ∈ U, τ ∈ T;
(ii) τ(αu) ≤ τ(u) if u ∈ U, |α| ≤ 1, τ ∈ T;
(iii) lim_{α→0} τ(αu) = 0 for every u ∈ U, τ ∈ T.
For τ ∈ T, define ρτ : U × U → [0, ∞[ by setting ρτ(u, v) = τ(u − v) for all u, v ∈ U. Then each ρτ is a pseudometric on U, and the topology defined by P = {ρτ : τ ∈ T} renders U a linear topological space.

proof (a) It is worth noting immediately that τ(0) = lim_{α→0} τ(α0) = 0 for every τ ∈ T.

(b) To see that every ρτ is a pseudometric, argue as follows. (i) ρτ takes values in [0, ∞[ because τ does.
(ii) If u, v, w ∈ U then
ρτ(u, w) = τ(u − w) = τ((u − v) + (v − w)) ≤ τ(u − v) + τ(v − w) = ρτ(u, v) + ρτ(v, w).
(iii) If u, v ∈ U, then ρτ(v, u) = τ(v − u) = τ((−1)(u − v)) ≤ τ(u − v) = ρτ(u, v), and similarly ρτ(u, v) ≤ ρτ(v, u), so the two are equal. (iv) If u ∈ U then ρτ(u, u) = τ(0) = 0.

(c) Let 𝔗 be the topology on U defined by {ρτ : τ ∈ T} (2A3F).

(i) Addition is continuous because, given τ ∈ T, we have
ρτ(u′ + v′, u + v) = τ((u′ + v′) − (u + v)) ≤ τ(u′ − u) + τ(v′ − v) = ρτ(u′, u) + ρτ(v′, v)
for all u, v, u′, v′ ∈ U. This means that, given ε > 0 and (u, v) ∈ U × U, we shall have ρτ(u′ + v′, u + v) ≤ ε whenever (u′, v′) ∈ U((u, v); ρ̃τ, ρ̄τ; ε/2), using the language of 2A3Tb. Because ρ̃τ, ρ̄τ are two of the pseudometrics defining the product topology of U × U (2A3Tb), (u, v) ↦ u + v is continuous, by the criterion of 2A3H.

(ii) Scalar multiplication is continuous because if u ∈ U, n ∈ N then τ(nu) ≤ nτ(u) for every τ ∈ T (induce on n). Consequently, if τ ∈ T,
τ(αu) ≤ nτ((α/n)u) ≤ nτ(u) whenever |α| < n ∈ N, τ ∈ T.
Now, given (α, u) ∈ R/C × U and ε > 0, take n > |α| and δ > 0 such that δ ≤ min(n − |α|, ε/(2n)) and τ(γu) ≤ ε/2 whenever |γ| ≤ δ; then
ρτ(α′u′, αu) = τ(α′u′ − αu) ≤ τ((α′ − α)u) + τ(α′(u′ − u)) ≤ τ((α′ − α)u) + nτ(u′ − u)
whenever u′ ∈ U, α′ ∈ R/C and |α′| < n. Accordingly, setting θ(α′, α) = |α′ − α| for α′, α ∈ R/C,
ρτ(α′u′, αu) ≤ ε/2 + nδ ≤ ε
whenever (α′, u′) ∈ U((α, u); θ̃, ρ̄τ; δ). Because θ̃ and ρ̄τ are among the pseudometrics defining the topology of R/C × U, the map (α, u) ↦ αu satisfies the criterion of 2A3H and is continuous. Thus 𝔗 is a linear space topology on U.

*2A5C
We do not need it for Chapter 24, but the following is worth knowing.
Theorem Let U be a linear space and 𝔗 a linear space topology on U.
(a) There is a family T of functionals satisfying the conditions (i)-(iii) of 2A5B and defining 𝔗.
(b) If 𝔗 is metrizable, we can take T to consist of a single functional.
proof (a) Kelley & Namioka 76, p. 50.
(b) Köthe 69, §15.11.

2A5D Definition Let U be a linear space over R/C. Then a seminorm on U is a functional τ : U → [0, ∞[ such that
(i) τ(u + v) ≤ τ(u) + τ(v) for all u, v ∈ U;
(ii) τ(αu) = |α|τ(u) if u ∈ U, α ∈ R/C.
Observe that a norm is always a seminorm, and that a seminorm is always a functional of the type described in 2A5B. In particular, the association of a metric with a norm (2A4Bb) is a special case of 2A5B.
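The converse of the last remark fails: not every functional of the type described in 2A5B is a seminorm. On U = R, for instance, τ(u) = |u|/(1 + |u|) satisfies (i)-(iii) of 2A5B but is not homogeneous. This standard example is supplied here and is not taken from the text. The Python sketch below spot-checks (i) and (ii) on random samples, together with the inequality τ(βu) ≤ nτ(u) for |β| ≤ n used in 2A5B(c-ii), and it shows that 2A5D(ii) fails. The names and the sampling ranges are illustrative assumptions.

```python
import random

def tau(u: float) -> float:
    """tau(u) = |u| / (1 + |u|) on U = R (illustrative example)."""
    return abs(u) / (1.0 + abs(u))

def spot_check(trials: int = 10_000, seed: int = 1) -> bool:
    rng = random.Random(seed)
    tol = 1e-12  # allowance for floating-point rounding
    for _ in range(trials):
        u, v = rng.uniform(-100, 100), rng.uniform(-100, 100)
        alpha = rng.uniform(-1, 1)
        n = rng.randint(1, 50)
        beta = rng.uniform(-n, n)
        if tau(u + v) > tau(u) + tau(v) + tol:      # condition (i)
            return False
        if tau(alpha * u) > tau(u) + tol:           # condition (ii), |alpha| <= 1
            return False
        if tau(beta * u) > n * tau(u) + tol:        # tau(beta u) <= n tau(u) for |beta| <= n
            return False
    return True

print(spot_check())              # expected: True
print(tau(2.0), 2 * tau(1.0))    # 0.666... versus 1.0, so tau is not a seminorm
```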
2A5E Convex sets (a) Let U be a linear space over R/C. A subset C of U is convex if αu + (1 − α)v ∈ C whenever u, v ∈ C and α ∈ [0, 1]. The intersection of any family of convex sets is convex, so for every set A ⊆ U there is a smallest convex set including A; this is just the set of vectors expressible as ∑_{i=0}^{n} αi ui where u0, …, un ∈ A, α0, …, αn ∈ [0, 1] and ∑_{i=0}^{n} αi = 1 (Bourbaki 87, II.2.3); it is the convex hull of A.

(b) If U is a linear topological space, the closure of any convex set is convex (Bourbaki 87, II.2.6). It follows that, for any A ⊆ U, the closure of the convex hull of A is the smallest closed convex set including A; this is the closed convex hull of A.

(c) I note for future reference that in a linear topological space, the closure of any linear subspace is a linear subspace. (Bourbaki 87, I.1.3; Köthe 69, §15.2. Compare 2A4Cb.)

2A5F Completeness in linear topological spaces In normed spaces, completeness can be described in terms of Cauchy sequences (2A4D). In general linear topological spaces this is inadequate. The true theory of ‘completeness’ demands the concept of ‘uniform space’ (see §3A4 in the next volume, or Kelley 55, chap. 6; Engelking 89, §8.1; Bourbaki 66, chap. II); I shall not describe this here, but will give a version adapted to linear spaces. I mention this only because you will, I hope, some day come to the general theory (in Volume 3 of this treatise, if not before), and you should be aware that the special case described here gives a misleading emphasis at some points.

Definitions Let U be a linear space over R/C, and T a linear space topology on U. A filter F on U is Cauchy if for every open set G in U containing 0 there is an F ∈ F such that F − F = {u − v : u, v ∈ F} ⊆ G. U is complete if every Cauchy filter on U is convergent.

2A5G Cauchy filters have a simple description when a linear space topology is defined by the method of 2A5B.
Lemma Let U be a linear space over R/C, and let T be a family of functionals defining a linear space topology on U, as in 2A5B. Then a filter F on U is Cauchy iff for every τ ∈ T, ε > 0 there is an F ∈ F such that τ(u − v) ≤ ε for all u, v ∈ F.

proof (a) Suppose that F is Cauchy, τ ∈ T and ε > 0. Then G = U(0; ρτ; ε) is open (using the language of 2A3F-2A3G), so there is an F ∈ F such that F − F ⊆ G; but this just means that τ(u − v) < ε for all u, v ∈ F.

(b) Suppose that F satisfies the criterion, and that G is an open set containing 0. Then there are τ0, …, τn ∈ T and ε > 0 such that U(0; ρτ0, …, ρτn; ε) ⊆ G. For each i ≤ n there is an Fi ∈ F such that τi(u − v) ≤ ε/2 for all u, v ∈ Fi; now F = ⋂_{i≤n} Fi ∈ F and u − v ∈ G for all u, v ∈ F.

2A5H Normed spaces and sequential completeness I had better point out that for normed spaces the definition of 2A5F agrees with that of 2A4D.

Proposition Let (U, ‖ ‖) be a normed space over R/C, and let 𝔗 be the linear space topology on U defined by the method of 2A5B from the set T = {‖ ‖}. Then U is complete in the sense of 2A5F iff it is complete in the sense of 2A4D.

proof (a) Suppose first that U is complete in the sense of 2A5F. Let ⟨un⟩n∈N be a sequence in U which is Cauchy in the sense of 2A4Da. Set F = {F : F ⊆ U, {n : un ∉ F} is finite}. Then it is easy to check that F is a filter on U, the image of the Fréchet filter under the map n ↦ un : N → U. If ε > 0, take m ∈ N such that ‖uj − uk‖ ≤ ε whenever j, k ≥ m; then F = {uj : j ≥ m} belongs to F, and ‖u − v‖ ≤ ε for all u, v ∈ F. So F is Cauchy in the sense of 2A5F, and has a limit u say. Now, for any ε > 0, the set {v : ‖v − u‖ < ε} = U(u; ρ‖ ‖; ε) is an open set containing u, so belongs to F, and {n : ‖un − u‖ ≥ ε} is finite, that is, there is an m ∈ N such that ‖un − u‖ < ε whenever n ≥ m. As ε is arbitrary, u = lim_{n→∞} un in the sense of 2A3M. As ⟨un⟩n∈N is arbitrary, U is complete in the sense of 2A4D.
(b) Now suppose that U is complete in the sense of 2A4D. Let F be a Cauchy filter on U. For each n ∈ N, choose a set Fn ∈ F such that ‖u − v‖ ≤ 2^{−n} for all u, v ∈ Fn. For each n ∈ N, F′n = ⋂_{i≤n} Fi belongs to F, so is not empty; choose un ∈ F′n. If m ∈ N and j, k ≥ m, then both uj and uk belong to Fm, so ‖uj − uk‖ ≤ 2^{−m}; thus ⟨un⟩n∈N is a Cauchy sequence in the sense of 2A4Da, and has a limit u say. Now take any ε > 0 and m ∈ N such that 2^{−m+1} ≤ ε. There is surely a k ≥ m such that ‖uk − u‖ ≤ 2^{−m}; now uk ∈ Fm, so
Fm ⊆ {v : ‖v − uk‖ ≤ 2^{−m}} ⊆ {v : ‖v − u‖ ≤ 2^{−m+1}} ⊆ {v : ρ‖ ‖(v, u) ≤ ε},
and {v : ρ‖ ‖(v, u) ≤ ε} ∈ F. As ε is arbitrary, F converges to u, by 2A3Sd. As F is arbitrary, U is complete.

Thus the two definitions coincide, provided at least that we allow the countably many simultaneous choices of the un in part (b) of the proof.

2A5I Weak topologies I come now to brief notes on ‘weak’ topologies on normed spaces; from the point of view of this volume, these are in fact the primary examples of linear space topologies. Let U be a normed linear space over R/C.

(a) Write U* for its dual B(U; R/C) (2A4H). Let T be the set {|h| : h ∈ U*}; then T satisfies the conditions of 2A5B, so defines a linear space topology on U; this is called the weak topology of U.
(b) A filter F on U converges to u ∈ U for the weak topology of U iff lim_{v→F} ρ|h|(v, u) = 0 for every h ∈ U* (2A3Sd), that is, iff lim_{v→F} |h(v − u)| = 0 for every h ∈ U*, that is, iff lim_{v→F} h(v) = h(u) for every h ∈ U*.

(c) A set C ⊆ U is called weakly compact if it is compact for the weak topology of U. So (subject to the axiom of choice) a set C ⊆ U is weakly compact iff for every ultrafilter F on U containing C there is a u ∈ C such that lim_{v→F} h(v) = h(u) for every h ∈ U* (put 2A3R together with (b) above).

(d) A subset A of U is called relatively weakly compact if it is a subset of some weakly compact subset of U.

(e) If h ∈ U*, then h : U → R/C is continuous for the weak topology on U and the usual topology of R/C; this is obvious if we apply the criterion of 2A3H. So if A ⊆ U is relatively weakly compact, h[A] must be bounded in R/C. ▶ Let C ⊇ A be a weakly compact set. Then h[C] is compact in R/C, by 2A3Nb, so is bounded, by 2A2F (noting that if the underlying field is C, then it can be identified, as metric space, with R²). Accordingly h[A] is also bounded. ◀
(f) If V is another normed space and T : U → V is a bounded linear operator, then T is continuous for the respective weak topologies. ▶ If h ∈ V* then the composition hT belongs to U*. Now, for any u, v ∈ U,
ρ|h|(Tu, Tv) = |h(Tu − Tv)| = |hT(u − v)| = ρ|hT|(u, v),
taking ρ|h|, ρ|hT| to be the pseudometrics on V, U respectively defined by the formula of 2A5B. By 2A3H, T is continuous. ◀

(g) Corresponding to the weak topology on a normed space U, we have the weak* or w*-topology on its dual U*, defined by the set T = {|û| : u ∈ U}, where I write û(f) = f(u) for every f ∈ U*, u ∈ U. As in 2A5Ia, this is a linear space topology on U*. (It is essential to distinguish between the ‘weak*’ topology and the ‘weak’ topology on U*. The former depends only on the action of U on U*, the latter on the action of U** = (U*)*. You will have no difficulty in checking that û ∈ U** for every u ∈ U, but the point is that there may be members of U** not representable in this way, leading to open sets for the weak topology which are not open for the weak* topology.)

*2A5J Angelic spaces I do not rely on the following ideas, but they may throw light on some results in §§246-247. First, a topological space X is regular if whenever G ⊆ X is open and x ∈ G then there is an open set H such that x ∈ H ⊆ H̄ ⊆ G. Next, a regular Hausdorff space X is angelic if whenever A ⊆ X is such that every sequence in A has a cluster point in X, then Ā is compact and every point of Ā is the
limit of a sequence in A. What this means is that compactness in X, and the topologies of compact subsets of X, can be effectively described in terms of sequences. Now the theorem (due to Eberlein and Šmulian) is that any normed space is angelic in its weak topology. (462D in Volume 4; Köthe 69, §24; Dunford & Schwartz 57, V.6.1.) In particular, this is true of L¹ spaces, which makes it less surprising that there should be criteria for weak compactness in L¹ spaces which deal only with sequences.
2A6 Factorization of matrices I spend a couple of pages on the linear algebra of R^r required for Chapter 26. I give only one proof, because this is material which can be found in any textbook of elementary linear algebra; but I think it may be helpful to run through the basic ideas in the language which I use for this treatise.

2A6A Determinants We need to know the following things about determinants.
(i) Every r × r real matrix T has a real determinant det T.
(ii) For any r × r matrices S and T, det ST = det S det T.
(iii) If T is a diagonal matrix, its determinant is just the product of its diagonal entries.
(iv) For any r × r matrix T, det T′ = det T, where T′ is the transpose of T.
(v) det T is a continuous function of the coefficients of T.
There are so many routes through this topic that I avoid even a definition of ‘determinant’; I invite you to check your memory, or your favourite text, to confirm that you are indeed happy with the facts above.

2A6B Orthonormal families For x = (ξ1, …, ξr), y = (η1, …, ηr) ∈ R^r, write x . y = ∑_{i=1}^{r} ξi ηi; of course ‖x‖, as defined in 1A2A, is √(x . x). Recall that x1, …, xk are orthonormal if xi . xj = 0 for i ≠ j, 1 for i = j. The results we need here are:
(i) If x1, …, xk are orthonormal vectors in R^r, where k < r, then there are vectors x_{k+1}, …, xr in R^r such that x1, …, xr are orthonormal.
(ii) An r × r matrix P is orthogonal if P′P is the identity matrix; equivalently, if the columns of P are orthonormal.
(iii) For an orthogonal matrix P, det P must be ±1 (put (ii)-(iv) of 2A6A together).
(iv) If P is orthogonal, then Px . Py = P′Px . y = x . y for all x, y ∈ R^r.
(v) If P is orthogonal, so is P′ = P^{−1}.
(vi) If P and Q are orthogonal, so is PQ.

2A6C I now give a proposition which is not always included in elementary presentations. Of course there are many approaches to this; I offer a direct one.

Proposition Let T be any real r × r matrix.
Then T is expressible as PDQ where P and Q are orthogonal matrices and D is a diagonal matrix with non-negative coefficients.

proof I induce on r.

(a) If r = 1, then T = (τ11). Set D = (|τ11|), P = (1) and Q = (1) if τ11 ≥ 0, (−1) otherwise.

(b)(i) For the inductive step to r + 1 ≥ 2, consider the unit ball B = {x : x ∈ R^{r+1}, ‖x‖ ≤ 1}. This is a closed bounded set in R^{r+1}, so is compact (2A2F). The maps x ↦ Tx : R^{r+1} → R^{r+1} and x ↦ ‖x‖ : R^{r+1} → R are continuous, so the function x ↦ ‖Tx‖ : B → R is bounded and attains its bounds (2A2G), and there is a u ∈ B such that ‖Tu‖ ≥ ‖Tx‖ for every x ∈ B. Observe that ‖Tu‖ must be the norm ‖T‖ of T as defined in 262H. Set δ = ‖T‖ = ‖Tu‖. If δ = 0, then T must be the zero matrix, and the result is trivial; so let us suppose that δ > 0. In this case ‖u‖ must be exactly 1, since otherwise we should have u = ‖u‖u′ where ‖u′‖ = 1 and ‖Tu′‖ > ‖Tu‖.

(ii) If x ∈ R^{r+1} and x . u = 0, then Tx . Tu = 0. ▶?? If not, set γ = Tx . Tu ≠ 0. Consider y = u + ηγx for small η > 0. We have
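$$\|y\|^2 = y \cdot y = u \cdot u + 2\eta\gamma\, u \cdot x + \eta^2\gamma^2\, x \cdot x = \|u\|^2 + \eta^2\gamma^2\|x\|^2 = 1 + \eta^2\gamma^2\|x\|^2,$$
while
$$\|Ty\|^2 = Ty \cdot Ty = Tu \cdot Tu + 2\eta\gamma\, Tu \cdot Tx + \eta^2\gamma^2\, Tx \cdot Tx = \delta^2 + 2\eta\gamma^2 + \eta^2\gamma^2\|Tx\|^2.$$
But also $\|Ty\|^2 \le \delta^2\|y\|^2$ (2A4Fb), so
$$\delta^2 + 2\eta\gamma^2 + \eta^2\gamma^2\|Tx\|^2 \le \delta^2(1 + \eta^2\gamma^2\|x\|^2)$$
and $2\eta\gamma^2 \le \delta^2\eta^2\gamma^2\|x\|^2 - \eta^2\gamma^2\|Tx\|^2$. Dividing by $\eta\gamma^2 > 0$, this says $2 \le \eta(\delta^2\|x\|^2 - \|Tx\|^2)$. But this fails for every sufficiently small $\eta > 0$, so we have a contradiction.

(iii) Set $v = \delta^{-1}Tu$, so that $\|v\| = 1$. Let $u_1, \dots, u_{r+1}$ be orthonormal vectors such that $u_{r+1} = u$. Let $Q_0$ be the orthogonal $(r+1) \times (r+1)$ matrix with columns $u_1, \dots, u_{r+1}$. Writing $e_1, \dots, e_{r+1}$ for the standard orthonormal basis of $\mathbb{R}^{r+1}$, we then have $Q_0 e_i = u_i$ for each $i$, and $Q_0 e_{r+1} = u$. Similarly, there is an orthogonal matrix $P_0$ such that $P_0 e_{r+1} = v$. Set $T_1 = P_0^{-1} T Q_0$. Then $T_1 e_{r+1} = P_0^{-1}Tu = \delta P_0^{-1}v = \delta e_{r+1}$. Also, if $x \cdot e_{r+1} = 0$ then $Q_0 x \cdot u = 0$ (2A6B(iv)), so that $T_1 x \cdot e_{r+1} = P_0 T_1 x \cdot P_0 e_{r+1} = TQ_0 x \cdot v = 0$, by (ii). The first fact says that the last column of $T_1$ is $\delta e_{r+1}$. The second says that the last row of $T_1$ vanishes apart from its final entry. This means that $T_1$ must be of the form
$$T_1 = \begin{pmatrix} S & 0 \\ 0 & \delta \end{pmatrix},$$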
where $S$ is an $r \times r$ matrix.

(iv) By the inductive hypothesis, $S$ is expressible as $\tilde P \tilde D \tilde Q$, where $\tilde P$ and $\tilde Q$ are orthogonal $r \times r$ matrices and $\tilde D$ is a diagonal $r \times r$ matrix with non-negative coefficients. Set
$$P_1 = \begin{pmatrix} \tilde P & 0 \\ 0 & 1 \end{pmatrix}, \qquad D = \begin{pmatrix} \tilde D & 0 \\ 0 & \delta \end{pmatrix}, \qquad Q_1 = \begin{pmatrix} \tilde Q & 0 \\ 0 & 1 \end{pmatrix}.$$
Then $P_1$ and $Q_1$ are orthogonal, $D$ is diagonal with non-negative coefficients, and $P_1 D Q_1 = T_1$. Now set $P = P_0 P_1$ and $Q = Q_1 Q_0^{-1}$. Then $P$ and $Q$ are orthogonal and
$$PDQ = P_0 P_1 D Q_1 Q_0^{-1} = P_0 T_1 Q_0^{-1} = T.$$
Thus the induction proceeds.
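The following is a small numerical illustration, in plain Python, of steps (ii)-(iv) in the case $r + 1 = 2$. It is only a sketch under stated assumptions. The proof finds the maximizing vector $u$ by a compactness argument; in its place the sketch uses power iteration on $T^{\top}T$, a swapped-in numerical technique which assumes that the largest eigenvalue of $T^{\top}T$ is simple. The particular matrix `T` and the helper names `maximizing_unit_vector` and `perp2` are hypothetical choices made for illustration only. The base case (a) is then applied to the $1 \times 1$ block $S = (s)$.

```python
import math

# Matrices are plain lists of rows; these small helpers are hypothetical, for illustration.
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def maximizing_unit_vector(T, iterations=200):
    """Approximate a unit u maximizing ||Tu|| by power iteration on T^T T.

    Assumption: the largest eigenvalue of T^T T is simple, so the iteration converges.
    """
    TtT = matmul(transpose(T), T)
    u = [1.0] * len(T[0])
    for _ in range(iterations):
        w = matvec(TtT, u)
        length = norm(w)
        u = [wi / length for wi in w]
    return u

def perp2(x):
    """Rotate a vector of R^2 through a right angle (a unit x gives a unit vector orthogonal to x)."""
    return [-x[1], x[0]]

T = [[3.0, 1.0], [0.0, 2.0]]            # hypothetical example matrix, r + 1 = 2

# Step (i): u maximizes ||Tx|| on the unit ball, and delta = ||T|| = ||Tu||.
u = maximizing_unit_vector(T)
Tu = matvec(T, u)
delta = norm(Tu)

# Step (ii): x . u = 0 should force Tx . Tu = 0 (up to rounding).
x = perp2(u)
orthogonality_defect = dot(matvec(T, x), Tu)

# Step (iii): Q0 has columns u1, u and P0 has columns v1, v, where v = Tu / delta.
v = [t / delta for t in Tu]
Q0 = transpose([perp2(u), u])
P0 = transpose([perp2(v), v])
T1 = matmul(matmul(transpose(P0), T), Q0)   # P0^{-1} = P0^T since P0 is orthogonal

# Step (iv), using case (a) for the 1 x 1 block S = (s).
s = T1[0][0]
sign = 1.0 if s >= 0 else -1.0
P1 = [[1.0, 0.0], [0.0, 1.0]]               # P~ = (1)
D = [[abs(s), 0.0], [0.0, delta]]           # D~ = (|s|)
Q1 = [[sign, 0.0], [0.0, 1.0]]              # Q~ = (1) or (-1)
P = matmul(P0, P1)
Q = matmul(Q1, transpose(Q0))               # Q0^{-1} = Q0^T

reconstruction = matmul(matmul(P, D), Q)
max_error = max(abs(reconstruction[i][j] - T[i][j]) for i in range(2) for j in range(2))
print(delta, abs(s), orthogonality_defect, max_error)
```

For this choice of $T$ one has $T^{\top}T = \begin{pmatrix} 9 & 3 \\ 3 & 5 \end{pmatrix}$, with eigenvalues $7 \pm \sqrt{13}$. So $\delta$ should come out as $\sqrt{7 + \sqrt{13}}$ and $|s|$ as $\sqrt{7 - \sqrt{13}}$. The off-diagonal entries of `T1`, the quantity `orthogonality_defect` and `max_error` should all be of the order of rounding error. For larger matrices, `perp2` would have to be replaced by some completion of $u$ to an orthonormal basis, for instance by Gram-Schmidt.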
Concordance

I list here the section and paragraph numbers which have (to my knowledge) appeared in print in references to this volume, and which have since been changed.

215Yc Measurable envelopes. This exercise, referred to in the May 2000 edition of Volume 1, has been moved to 216Yc.

252Yf Ordinate sets. This exercise, referred to in the May 2000 edition of Volume 1, has been moved to 252Yh.
References for Volume 2

Alexits G. [78] (ed.) Fourier Analysis and Approximation Theory. North-Holland, 1978 (Colloq. Math. Soc. Janos Bolyai 19).
Bergelson V., March P. & Rosenblatt J. [96] (eds.) Convergence in Ergodic Theory and Probability. de Gruyter, 1996.
du Bois-Reymond P. [1876] ‘Untersuchungen über die Convergenz und Divergenz der Fouriersche Darstellungformeln’, Abh. Akad. München 12 (1876) 1-103. [§282 notes.]
Bourbaki N. [66] General Topology. Hermann/Addison-Wesley, 1968. [2A5F.]
Bourbaki N. [87] Topological Vector Spaces. Springer, 1987. [2A5E.]
Carleson L. [66] ‘On convergence and growth of partial sums of Fourier series’, Acta Math. 116 (1966) 135-157. [§282 notes, §286 intro., §286 notes.]
Defant A. & Floret K. [93] Tensor Norms and Operator Ideals. North-Holland, 1993. [§253 notes.]
Dudley R.M. [89] Real Analysis and Probability. Wadsworth & Brooks/Cole, 1989. [§282 notes.]
Dunford N. & Schwartz J.T. [57] Linear Operators I. Wiley, 1957 (reprinted 1988). [§244 notes, 2A5J.]
Enderton H.B. [77] Elements of Set Theory. Academic, 1977. [§2A1.]
Engelking R. [89] General Topology. Heldermann, 1989 (Sigma Series in Pure Mathematics 6). [2A5F.]
Etemadi N. [96] ‘On convergence of partial sums of independent random variables’, pp. 137-144 in Bergelson March & Rosenblatt 96. [272U.]
Evans L.C. & Gariepy R.F. [92] Measure Theory and Fine Properties of Functions. CRC Press, 1992. [263Ec, §265 notes.]
Federer H. [69] Geometric Measure Theory. Springer, 1969 (reprinted 1996). [262C, 263Ec, §264 notes, §265 notes.]
Feller W. [66] An Introduction to Probability Theory and its Applications, vol. II. Wiley, 1966. [Chap. 27 intro., 274H, 275Xc, 285N.]
Fremlin D.H. [74] Topological Riesz Spaces and Measure Theory. Cambridge U.P., 1974. [§232 notes, 241F, §244 notes, §245 notes, §247 notes.]
Fremlin D.H. [93] ‘Real-valued-measurable cardinals’, pp. 151-304 in Judah 93. [232Hc.]
Haimo D.T. [67] (ed.) Orthogonal Expansions and their Continuous Analogues. Southern Illinois University Press, 1967.
Hall P. [82] Rates of Convergence in the Central Limit Theorem. Pitman, 1982. [274Hc.]
Halmos P.R. [50] Measure Theory. Van Nostrand, 1950. [§251 notes, §252 notes, 255Yn.]
Halmos P.R. [60] Naive Set Theory. Van Nostrand, 1960. [§2A1.]
Henle J.M. [86] An Outline of Set Theory. Springer, 1986. [§2A1.]
Hunt R.A. [67] ‘On the convergence of Fourier series’, pp. 235-255 in Haimo 67. [§286 notes.]
Jorsbøe O.G. & Mejlbro L. [82] The Carleson-Hunt Theorem on Fourier Series. Springer, 1982 (Lecture Notes in Mathematics 911). [§286 notes.]
Judah H. [93] (ed.) Proceedings of the Bar-Ilan Conference on Set Theory and the Reals, 1991. Amer. Math. Soc. (Israel Mathematical Conference Proceedings 6), 1993.
Kakutani S. [41] ‘Concrete representation of abstract L-spaces and the mean ergodic theorem’, Ann. of Math. 42 (1941) 523-537. [§242 notes.]
Kelley J.L. [55] General Topology. Van Nostrand, 1955. [2A5F.]
Kelley J.L. & Namioka I. [76] Linear Topological Spaces. Springer, 1976. [2A5C.]
Kirzbraun M.D. [34] ‘Über die zusammenziehenden und Lipschitzian Transformationen’, Fund. Math. 22 (1934) 77-108. [262C.]
Kolmogorov A.N. [26] ‘Une série de Fourier-Lebesgue divergente partout’, C. R. Acad. Sci. Paris 183 (1926) 1327-1328. [§282 notes.]
Komlós J. [67] ‘A generalization of a problem of Steinhaus’, Acta Math. Acad. Sci. Hung. 18 (1967) 217-229. [276H.]
Körner T.W. [88] Fourier Analysis. Cambridge U.P., 1988. [§282 notes.]
Köthe G. [69] Topological Vector Spaces I. Springer, 1969. [2A5C, 2A5J.]
Krivine J.-L. [71] Introduction to Axiomatic Set Theory. D. Reidel, 1971. [§2A1.]
Lacey M. & Thiele C. [00] ‘A proof of boundedness of the Carleson operator’, Math. Research Letters 7 (2000) 1-10. [§286 intro., 286H.]
Lebesgue H. [72] Oeuvres Scientifiques. L’Enseignement Mathématique, Institut de Mathématiques, Univ. de Genève, 1972. [Chap. 27 intro.]
Liapounoff A. [1901] ‘Nouvelle forme du théorème sur la limite de probabilité’, Mém. Acad. Imp. Sci. St-Pétersbourg 12(5) (1901) 1-24. [274Xg.]
Lighthill M.J. [59] Introduction to Fourier Analysis and Generalised Functions. Cambridge U.P., 1959. [§284 notes.]
Lindeberg J.W. [22] ‘Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung’, Math. Zeitschrift 15 (1922) 211-225. [274Ha, §274 notes.]
Lipschutz S. [64] Set Theory and Related Topics. McGraw-Hill, 1964 (Schaum’s Outline Series). [§2A1.]
Loève M. [77] Probability Theory I. Springer, 1977. [Chap. 27 intro., 274H.]
Luxemburg W.A.J. & Zaanen A.C. [71] Riesz Spaces I. North-Holland, 1971. [241F.]
Mozzochi C.J. [71] On the Pointwise Convergence of Fourier Series. Springer, 1971 (Lecture Notes in Mathematics 199). [§286 notes.]
Rényi A. [70] Probability Theory. North-Holland, 1970. [274Hc.]
Roitman J. [90] An Introduction to Set Theory. Wiley, 1990. [§2A1.]
Schipp F. [78] ‘On Carleson’s method’, pp. 679-695 in Alexits 78. [§286 notes.]
Semadeni Z. [71] Banach spaces of continuous functions I. Polish Scientific Publishers, 1971. [§253 notes.]
Shiryayev A. [84] Probability. Springer, 1984. [285N.]
Sjölin P. [71] ‘Convergence almost everywhere of certain singular integrals and multiple Fourier series’, Arkiv för Math. 9 (1971) 65-90. [§286 notes.]
Zaanen A.C. [83] Riesz Spaces II. North-Holland, 1983. [241F.]
Zygmund A. [59] Trigonometric Series. Cambridge U.P., 1959. [§244 notes, §282 notes, 284Xj.]
Index to volumes 1 and 2

Principal topics and results

The general index below is intended to be comprehensive. Inevitably the entries are voluminous to the point that they are often unhelpful. I am therefore preparing a shorter, better-annotated index which will, I hope, help readers to focus on particular areas. It does not mention definitions, as the bold-type entries in the main index are supposed to lead efficiently to these; and if you draw a blank here you should always, of course, try again in the main index. Entries in the form of mathematical assertions frequently omit essential hypotheses and should be checked against the formal statements in the body of the work.

absolutely continuous real functions §225
—– as indefinite integrals 225E
absolutely continuous additive functionals §232
—– characterization 232B
atomless measure spaces
—– have elements of all possible measures 215D
Borel sets in ℝ^r 111G
—– and Lebesgue measure 114G, 115G, 134F
bounded variation, real functions of §224
—– as differences of monotonic functions 224D
—– integrals of their derivatives 224I
—– Lebesgue decomposition 226C
Cantor set and function 134G, 134H
Carathéodory’s construction of measures from outer measures 113C
Carleson’s theorem (Fourier series of square-integrable functions converge a.e.) §286
Central Limit Theorem (sum of independent random variables approximately normal) §274
—– Lindeberg’s condition 274F, 274G
change of variable in the integral §235
—– $\int J \times g\phi \,d\mu = \int g \,d\nu$ 235A, 235E, 235L
—– finding J 235O;
—– —– $J = |\det T|$ for linear operators T 263A; $J = |\det \phi'|$ for differentiable operators φ 263D
—– —– —– when the measures are Hausdorff measures 265B, 265E
—– when φ is inverse-measure-preserving 235I
—– $\int g\phi \,d\mu = \int J \times g \,d\nu$ 235T
characteristic function of a probability distribution §285
—– sequences of distributions converge in vague topology iff characteristic functions converge pointwise 285L
complete measure space §212
completion of a measure 212C et seq.
concentration of measure 264H
conditional expectation
—– of a function §233
—– as operator on L^1(µ) 242J
construction of measures
—– image measures 112E
—– from outer measures (Carathéodory’s method) 113C
—– subspace measures 131A, 214A
—– product measures 251C, 251F, 251W, 254C
—– as pull-backs 132G
convergence theorems (B.Levi, Fatou, Lebesgue) §123
convergence in measure (linear space topology on L^0)
—– on L^0(µ) §245
—– when Hausdorff/complete/metrizable 245E
convex functions 233G et seq.
convolution of functions
—– (on ℝ^r) §255
—– $\int h \times (f * g) = \iint h(x+y) f(x) g(y)\,dx\,dy$ 255G
—– $f * (g * h) = (f * g) * h$ 255J
convolution of measures
—– (on ℝ^r) §257
—– $\int h \,d(\nu_1 * \nu_2) = \iint h(x+y)\,\nu_1(dx)\,\nu_2(dy)$ 257B
—– of absolutely continuous measures 257F
countable sets 111F, 1A1C et seq.
countable-cocountable measure 211R
counting measure 112Bd
differentiable functions (from ℝ^r to ℝ^s) §262, §263
direct sum of measure spaces 214K
distribution of a finite family of random variables §271
—– as a Radon measure 271B
—– of φ(X) 271J
—– of an independent family 272G
—– determined by characteristic functions 285M
Doob’s Martingale Convergence Theorem 275G
exhaustion, principle of 215A
extended real line §135
Fatou’s Lemma ($\int \liminf \le \liminf \int$ for sequences of non-negative functions) 123B
Féjer sums (running averages of Fourier sums) converge to local averages of f 282H
—– uniformly if f is continuous 282G
Fourier series §282
—– norm-converge in L^2 282J
—– converge at points of differentiability 282L
—– converge to midpoints of jumps, if f of bounded variation 282O
—– and convolutions 282Q
—– converge a.e. for square-integrable functions 286V
Fourier transforms
—– on ℝ §283, §284
—– formula for c f in terms of f 283F
—– and convolutions 283M
—– in terms of action on test functions 284H et seq.
—– of square-integrable functions 284O, 286U
—– inversion formulae for differentiable functions 283I; for functions of bounded variation 283L
—– $f^\wedge(y) = \lim_{\epsilon\downarrow0}\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-iyx}e^{-\epsilon x^2}f(x)\,dx$ a.e. 284M
Fubini’s theorem §252
—– $\int f \,d(\mu\times\nu) = \iint f(x,y)\,dx\,dy$ 252B
—– when both factors σ-finite 252C, 252H
—– for characteristic functions 252D, 252F
Fundamental Theorem of the Calculus ($\frac{d}{dx}\int_a^x f = f(x)$ a.e.) §222; ($\int_a^b F'(x)\,dx = F(b) - F(a)$) 225E
Hahn decomposition of a countably additive functional 231E
Hardy-Littlewood Maximal Theorem 286A
Hausdorff measures (on ℝ^r) §264
—– are topological measures 264E
—– r-dimensional Hausdorff measure on ℝ^r a multiple of Lebesgue measure 264I
—– (r − 1)-dimensional measure on ℝ^r 265F-265H
image measures 112E
indefinite integrals
—– differentiate back to original function 222E, 222H
—– to construct measures §234
independent random variables §272
—– joint distributions are product measures 272G
—– limit theorems §273, §274
inner regularity of measures
—– (with respect to compact sets) Lebesgue measure 134F; Radon measures §256
integration of real-valued functions, construction §122
—– as a positive linear functional 122O
—– —– acting on L^1(µ) 242B
—– by parts 225F
—– characterization of integrable functions 122P, 122R
—– over subsets 131D, 214E
—– functions and integrals with values in [−∞, ∞] §133
Jensen’s inequality 233I-233J
—– expressed in L^1(µ) 242K
Komlós’ subsequence theorem 276H
Lebesgue’s Density Theorem (in ℝ) §223
—– $\lim_{h\downarrow0}\frac{1}{h}\int_x^{x+h} f = f(x)$ a.e. 223A
—– $\lim_{h\downarrow0}\frac{1}{2h}\int_{x-h}^{x+h}|f(y) - f(x)|\,dy = 0$ a.e. 223D
—– (in ℝ^r) 261C, 261E
Lebesgue measure, construction of §114, §115
—– further properties §134
Lebesgue’s Dominated Convergence Theorem ($\int\lim = \lim\int$ for dominated sequences of functions) 123C
B.Levi’s theorem ($\int\lim = \lim\int$ for monotonic sequences of functions) 123A; ($\int\sum = \sum\int$) 226E
Lipschitz functions §262
—– differentiable a.e. 262Q
localizable measure space
—– assembling partial measurable functions 213N
—– which is not strictly localizable 216E
Lusin’s theorem (measurable functions are almost continuous)
—– (on ℝ^r) 256F
martingales §275
—– L^1-bounded martingales converge a.e. 275G
—– when of form E(X|Σ_n) 275H, 275I
measurable envelopes
—– elementary properties 132E
measurable functions
—– (real-valued) §121
—– —– sums, products and other operations on finitely many functions 121E
—– —– limits, infima, suprema 121F
Monotone Class Theorem 136B
monotonic functions
—– are differentiable a.e. 222A
non-measurable set (for Lebesgue measure) 134B
outer measures constructed from measures §132
—– elementary properties 132A
outer regularity of Lebesgue measure 134F
Plancherel Theorem (on Fourier series and transforms of square-integrable functions) 282K, 284O
Poisson’s theorem (a count of independent rare events has an approximately Poisson distribution) 285Q
product of two measure spaces §251
—– basic properties of c.l.d. product measure 251I
—– Lebesgue measure on ℝ^{r+s} as a product measure 251M
—– more than two spaces 251W
—– Fubini’s theorem 252B
—– Tonelli’s theorem 252G
—– and L^1 spaces §253
—– continuous bilinear maps on L^1(µ) × L^1(ν) 253F
—– conditional expectation on a factor 253H
product of any number of probability spaces §254
—– basic properties of the (completed) product 254F
—– characterization with inverse-measure-preserving functions 254G
—– products of subspaces 254L
—– products of products 254N
—– determination by countable subproducts 254O
—– subproducts and conditional expectations 254R
Rademacher’s theorem (Lipschitz functions are differentiable a.e.) 262Q
Radon measures
—– on ℝ^r §256
—– as completions of Borel measures 256C
—– indefinite-integral measures 256E
—– image measures 256G
—– product measures 256K
Radon-Nikodým theorem (truly continuous additive set-functions have densities) 232E
—– in terms of L^1(µ) 242I
Stone-Weierstrass theorem §281
—– for Riesz subspaces of C_b(X) 281A
—– for subalgebras of C_b(X) 281E
—– for *-subalgebras of C_b(X; ℂ) 281G
strictly localizable measures
—– sufficient condition for strict localizability 213O
strong law of large numbers ($\lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^{n}(X_i - E(X_i)) = 0$ a.e.) §273
—– when $\sum_{n=0}^{\infty}\frac{1}{(n+1)^2}\mathrm{Var}(X_n) < \infty$ 273D
—– when $\sup_{n\in\mathbb{N}} E(|X_n|^{1+\delta}) < \infty$ 273H
—– for identically distributed X_n 273I
—– for martingale difference sequences 276C, 276F
—– convergence of averages for $\|\ \|_1$, $\|\ \|_p$ 273N
subspace measures
—– for measurable subspaces §131
—– for arbitrary subspaces §214
surface measure in ℝ^r §265
tensor products of L^1 spaces §253
Tonelli’s theorem (f is integrable if $\iint |f(x,y)|\,dx\,dy < \infty$) 252G
uniformly integrable sets in L^1 §246
—– criteria for uniform integrability 246G
—– and convergence in measure 246J
—– and weak compactness 247C
Vitali’s theorem (for coverings by intervals in ℝ) 221A
—– (for coverings by balls in ℝ^r) 261B
weak compactness in L^1(µ) §247
Weyl’s Equidistribution Theorem 281N
c.l.d. version 213D et seq.
L^0(µ) (space of equivalence classes of measurable functions) §241
—– as Riesz space 241E
L^1(µ) (space of equivalence classes of integrable functions) §242
—– norm-completeness 242F
—– density of simple functions 242M
—– (for Lebesgue measure) density of continuous functions and step functions 242O
L^p(µ) (space of equivalence classes of pth-power-integrable functions, where 1 < p < ∞) §244
—– is a Banach lattice 244G
—– has dual L^q(µ), where $\frac{1}{p} + \frac{1}{q} = 1$ 244K
—– and conditional expectations 244M
L^∞(µ) (space of equivalence classes of bounded measurable functions) §243
—– duality with L^1 243F-243G
—– norm-completeness 243E
—– order-completeness 243H
σ-algebras of sets §111
—– generated by given families 136B, 136G
σ-finite measures 215B

General index

References in bold type refer to definitions; references in italics are passing references. Definitions marked with > are those in which my usage is dangerously at variance with that of some other author.
Abel’s theorem 224Yi
absolute summability 226Ac
absolutely continuous additive functional 232Aa, 232B, 232D, 232F-232I, 232Xa, 232Xb, 232Xd, 232Xf, 232Xh, 234Ce, 256J, 257Xf
absolutely continuous function 225B, 225C-225G, 225K-225O, 225Xa-225Xh, 225Xn, 225Xo, 225Ya, 225Yc, 232Xb, 233Xc, 244Yh, 252Yj, 256Xg, 262Bc, 263I, 264Yp, 265Ya, 282R, 283Ci
Absoluteness Theorem see Shoenfield’s Absoluteness Theorem
additive functional on an algebra of sets see finitely additive (136Xg, 231A), countably additive (231C)
adjoint operator 243Fc, 243 notes
algebra see algebra of sets (231A), Banach algebra (2A4Jb), normed algebra (2A4J)
algebra of sets 113Yi, 136E, 136F, 136G, 136Xg, 136Xh, 136Xk, 136Ya, 136Yb, 231A, 231B, 231Xa; see also σ-algebra (111A)
almost continuous function 256F
almost every, almost everywhere 112Dd
almost surely 112De
analytic (complex) function 133Xc
angelic topological space 2A5J
Archimedean Riesz space 241F, 241Yb, 242Xc
area see surface measure
asymptotic density 273G
asymptotically equidistributed see equidistributed (281Yi)
atom (in a measure space) 211I, 211Xb, 246G
atomic see purely atomic (211K)
atomless measure (space) 211J, 211Md, 211Q, 211Xb, 211Yd, 211Ye, 212G, 213He, 214Xe, 215D, 215Xe, 215Xf, 216A, 216Ya, 234Ff, 234Yd, 251T, 251Wo, 251Xq, 251Xs, 251Yc, 252Yp, 252Yr, 252Yt, 254V, 254Yf, 256Xd, 264Yg
automorphism see measure space automorphism
axiom see Banach-Ulam problem, choice (2A1J), countable choice
Coarea Theorem 265 notes
cofinite see finite-cofinite algebra (231Xa)
commutative (ring, algebra) 2A4Jb
commutative Banach algebra 224Yb, 243D, 255Xc, 257Ya
commutative group see abelian group
compact set (in a topological space) 247Xc, 2A2D, 2A2E-2A2G, >2A3N, 2A3R; see also relatively compact (2A3N), relatively weakly compact (2A5Id), weakly compact (2A5Ic)
compact support, function with 242O, 242Pd, 242O, 242Xh, 244H, 244Yi, 256Be, 256D, 256Xh, 262Yd-262Yg
compact support, measure with 284Yi
compact topological space >2A3N
complete linear topological space 245Ec, 2A5F, 2A5H
complete locally determined measure (space) 213C, 213D, 213H, 213J, 213L, 213O, 213Xg, 213Xi, 213Xl, 213Yd, 213Ye, 214I, 214Xc, 216D, 216E, 234Fb, 251Yb, 252B, 252D, 252E, 252N, 252Xb, 252Yh, 252Yj, 252Yr-252Yt, 253Yj, 253Yk; see also c.l.d. version (213E)
complete measure (space) 112Df, 113Xa, 122Ya, 211A, 211M, 211N, 211R, 211Xc, 211Xd, §212, 214I, 214J, 216A, 216C, 216Ya, 234A, 234Dd, 254Fd, 254G, 254J, 264Dc; see also complete locally determined measure
complete metric space 224Ye; see also Banach space (2A4D)
complete normed space see Banach space (2A4D), Banach lattice (242G)
complete Riesz space see Dedekind complete (241Fc)
completed indefinite-integral measure see indefinite-integral measure (234B)
completion (of a measure (space)) §212 (212C), 213Fa, 213Xa, 213Xb, 213Xk, 214Xb, 214Xj, 232Xe, 234Db, 235D, 235Hc, 235Xe, 241Xb, 242Xb, 243Xa, 244Xa, 245Xb, 251S, 251Wn, 251Xp, 252Ya, 254I, 256C
complex-valued function §133
component (in a topological space) 111Ye
conditional expectation 233D, 233E, 233J, 233K, 233Xg, 233Yc, 235Yc, 242J, 246Ea, 253H, 253Le, 275Ba, 275H, 275I, 275K, 275Ne, 275Xi, 275Ya, 275Yk, 275Yl
conditional expectation operator 242Jf, 242K, 242L, 242Xe, 242Yk, 243J, 244M, 244Yk, 246D, 254R, 254Xp, 275Xd, 275Xe
conegligible set 112Dc, 214Cc
connected set 222Yb
continuity, points of 224H, 224Ye, 225J
continuous function 121D, 121Yf, 262Ia, 2A2C, 2A2G, 2A3B, 2A3H, 2A3Nb, 2A3Qb
continuous linear functional 284Yj; see also dual linear space (2A4H)
continuous linear operator 2A4Fc; see also bounded linear operator (2A4F)
continuous see also semi-continuous (225H)
continuum see c (2A1L)
convergence in mean (in L^1(µ) or 𝓛^1(µ)) 245Ib, 246J, 246Yc, 247Ya
convergence in measure (in L^0(µ)) §245 (>245A), 271Yd, 274Yd, 285Xs
—– (in 𝓛^0(µ)) §245 (>245A)
—– (in the algebra of measurable sets) 232Ya
—– (of sequences) §245, 246J, 246Xh, 246Xi, 253Xe, 255Yf, 271L, 273Ba, 275Xk, 275Yp, 276Yf
convergent almost everywhere 245C, 245K, 273Ba, 276G, 276H; see also strong law
convergent filter 2A3Q, 2A3S, 2A5Ib
—– sequence 135D, 245Yi, 2A3M, 2A3Sg; see also convergence in measure (245Ad)
convex function 233G, 233H-233J, 233Xb-233Xf (233Xd), 233Xh, 233Ya, 233Yc, 233Yd, 242K, 242Yi, 242Yj, 242Yk, 244Xm, 244Yg, 255Yk, 275Yg; see also mid-convex (233Ya)
convex hull 2A5E; see also closed convex hull (2A5E)
convex set 233Xd, 244Yj, 262Xh, 2A5E
convolution in L^0 255Fc, 255Xc, 255Yf, 255Yk
convolution of functions 255E, 255F-255K, 255O, 255Xa-255Xc, 255Xf-255Xj, 255Ya, 255Yb, 255Yd, 255Ye, 255Yi, 255Yl, 255Ym, 255Yn, 262Xj, 262Yd, 262Ye, 262Yh, 263Ya, 282Q, 282Xt, 283M, 283Wd, 283Wf, 283Wj, 283Xl, 284J, 284K, 284O, 284Wf, 284Wi, 284Xb, 284Xd
convolution of measures §257 (257A), 272S, 285R, 285Yn
convolution of measures and functions 257Xe, 284Xo, 284Yi
convolution of sequences 255Xe, 255Yo, 282Xq
countable (set) 111F, 114G, 115G, §1A1, 226Yc
countable choice (axiom of) 134C
countable-cocountable algebra 211R, 211Ya, 232Hb
countable-cocountable measure 211R, 232Hb, 252L
countable sup property (in a Riesz space) 241Yd, 242Yd, 242Ye, 244Yb
countably additive functional (on a σ-algebra of sets) §231 (231C), §232, 246Yg, 246Yi
counting measure 112Bd, 122Xd, 122 notes, 211N, 211Xa, 213Xb, 226A, 241Xa, 242Xa, 243Xl, 244Xi, 244Xn, 245Xa, 246Xc, 251Xb, 251Xh, 252K, 255Yo, 264Db
cover see measurable envelope (132D)
covering theorem 221A, 261B, 261F, 261Xc, 261Ya, 261Yi, 261Yk
cylinder (in ‘measurable cylinder’) 254Aa, 254F, 254G, 254Q, 254Xa
decimal expansions 273Xf
decomposable measure (space) see strictly localizable (211E)
decomposition (of a measure space) 211E, 211Ye, 213O, 213Xh, 214Ia, 214K, 214M, 214Xi; see also Hahn decomposition of an additive functional (231F), Jordan decomposition of an additive functional (231F), Lebesgue decomposition of a countably additive functional (232I), Lebesgue decomposition of a function of bounded variation (§226)
decreasing rearrangement (of a function) 252Yp
Dedekind complete partially ordered set 135Ba
Dedekind complete Riesz space 241Fc, 241G, 241Xf, 242H, 242Yc, 243H, 243Xj, 244L
Dedekind σ-complete Riesz space 241Fb, 241G, 241Xe, 241Yb, 241Yh, 242Yg, 243H, 243Xb
delta function see Dirac’s delta function (284R)
delta system see ∆-system
dense set in a topological space 242Mb, 242Ob, 242Pd, 242Xi, 243Ib, 244H, 244Ob, 244Yi, 254Xo, 281Yc, 2A3U, 2A4I
density function (of a random variable) 271H, 271I-271K, 271Xc-271Xe, 272T, 272Xd, 272Xj; see also Radon-Nikodým derivative (232If, 234B)
density point 223B, 223Xi, 223Yb
density topology 223Yb, 223Yc, 223Yd, 261Yf
density see also asymptotic density (273G), Lebesgue’s Density Theorem (223A)
derivative of a function (of one variable) 222C, 222E, 222F, 222G, 222H, 222I, 222Yd, 225J, 225L, 225Of, 225Xc, 226Be, 282R; (of many variables) 262F, 262G, 262P; see partial derivative
determinant of a matrix 2A6A
determined see locally determined measure space (211H), locally determined negligible sets (213I)
determined by coordinates (in ‘W is determined by coordinates in J’) 254M, 254O, 254R-254T, 254Xp, 254Xr
Devil’s Staircase see Cantor function (134H)
differentiability, points of 222H, 225J
differentiable function (of one variable) 123D, 222A, 224I, 224Kg, 224Yc, 225L, 225Of, 225Xc, 225Xn, 233Xc, 252Yj, 255Xg, 255Xh, 262Xk, 265Xd, 274E, 282L, 282Xs, 283I-283K, 283Xm, 284Xc, 284Xk; (of many variables) 262Fa, 262Gb, 262I, 262Xg, 262Xi, 262Xj; see also derivative
‘differentiable relative to its domain’ 262Fb, 262I, 262M-262Q, 262Xd-262Xf, 262Yc, 263D, 263Xc, 263Xd, 263Yc, 265E, 282Xk
diffused measure see atomless measure (211J)
dilation 286C
dimension see Hausdorff dimension (264 notes)
Dirac’s delta function 257Xa, 284R, 284Xn, 284Xo, 285H, 285Xp
direct image (of a set under a function or relation) 1A1B
direct sum of measure spaces 214K, 214L, 214Xi-214Xl, 241Xg, 242Xd, 243Xe, 244Xg, 245Yh, 251Xh, 251Xi
directed set see downwards-directed (2A1Ab), upwards-directed (2A1Ab)
Dirichlet kernel 282D; see also modified Dirichlet kernel (282Xc)
disjoint family (of sets) 112Bb
disjoint sequence theorem 246G, 246Ha, 246Yd, 246Ye, 246Yf, 246Yj
distribution see Schwartzian distribution, tempered distribution
distribution of a random variable 241Xc, 271E, 271F, 271Ga, 272G, 272S, 272Xe, 272Yc, 272Yf, 272Yg, 285H, 285Xg; see also Cauchy distribution (285Xm), empirical distribution (273 notes), Poisson distribution (285Xo)
—– of a finite family of random variables 271B, 271C, 271D-271G, 272G, 272Ye, 272Yf, 285Ab, 285C, 285Mb
distribution function of a random variable >271G, 271L, 271Xb, 271Yb, 271Yc, 271Yd, 272Xe, 272Yc, 273Xg, 273Xh, 274F-274L, 274Xd, 274Xg, 274Xh, 274Ya, 274Yc, 285P
Dominated Convergence Theorem see Lebesgue’s Dominated Convergence Theorem (123C)
Doob’s Martingale Convergence Theorem 275G
downwards-directed partially ordered set 2A1Ab
dual linear space (of a normed space) 243G, 244K, 2A4H
Dynkin class 136A, 136B, 136Xb
Eberlein’s theorem 2A5J
Egorov’s theorem 131Ya, 215Yb
empirical distribution 273Xh, 273 notes
envelope see measurable envelope (132D)
equidistributed sequence (in a topological probability space) 281N, 281Yi, 281Yj, 281Yk
Equidistribution Theorem see Weyl’s Equidistribution Theorem (281N)
equiveridical 121B, 212B
essential supremum of a family of measurable sets 211G, 213K, 215B, 215C; of a real-valued function 243D, 243I
essentially bounded function 243A
Etemadi’s lemma 272U
Euclidean metric (on ℝ^r) 2A3Fb
Euclidean topology §1A2, §2A2, 2A3Ff, 2A3Tc
even function 255Xb, 283Yb, 283Yc
exchangeable sequence of random variables 276Xe
exhaustion, principle of 215A, 215C, 215Xa, 215Xb, 232E, 246Hc
expectation of a random variable 271Ab, 271E, 271F, 271I, 271Xa, 272Q, 272Xb, 272Xi, 285Ga, 285Xo; see also conditional expectation (233D)
extended real line 121C, §135
extension of measures 132Yd, 212Xk; see also completion (212C), c.l.d. version (213E)
fair-coin probability 254J
Fatou’s Lemma 123B, 133K, 135G, 135Hb
Fatou norm on a Riesz space 244Yf
Féjer integral 283Xf, 283Xh-283Xj
Féjer kernel 282D
Féjer sums 282Ad, 282B-282D, 282G-282I, 282Yc
Feller, W. Chap. 27 intro.
field (of sets) see algebra (136E)
filter 2A1I, 2A1N, 2A1O, 2A5F; see also convergent filter (2A3Q), ultrafilter (2A1N)
finite-cofinite algebra 231Xa, 231Xc
finitely additive functional on an algebra of sets 136Xg, 136Ya, 136Yb, 231A, 231B, 231Xb-231Xe, 231Ya-231Yh, 232A, 232B, 232E, 232G, 232Ya, 232Ye, 232Yg, 243Xk; see also countably additive
Fourier coefficients 282Aa, 282B, 282Cb, 282F, 282I, 282J, 282M, 282Q, 282R, 282Xa, 282Xg, 282Xq, 282Xt, 282Ya, 283Xu, 284Ya, 284Yg
Fourier’s integral formula 283Xm
Fourier series 121G, §282 (282Ac)
Fourier sums 282Ab, 282B-282D, 282J, 282L, 282O, 282P, 282Xi-282Xk, 282Xp, 282Xt, 282Yd, 286V, 286Xb
Fourier transform 133Xd, 133Yc, §283 (283A, 283Wa), §284 (284H, 284Wd), 285Ba, 285D, 285Xd, 285Ya, 286U, 286Ya
Fourier-Stieltjes transform see characteristic function (285A)
Fréchet filter 2A3Sg
Fubini’s theorem 252B, 252C, 252H, 252R
full outer measure 132F, 132G, 132Xk, 132Yd, 133Yf, 134D, 134Yt, 214F, 254Yf
function 1A1B
Fundamental Theorem of Calculus 222E, 222H, 222I, 225E
Fundamental Theorem of Statistics 273Xh, 273 notes
Gamma function see Γ-function (225Xj)
Gaussian distribution see standard normal distribution (274Aa)
Gaussian random variable see normal random variable (274Ad)
generated (σ-)algebra of sets 111Gb, 111Xe, 111Xf, 121J, 121Xd, 136B, 136C, 136G, 136Xc, 136X, 136Xl, 136Yb
Glivenko-Cantelli theorem 273 notes
group 255Yn, 255Yo; see also circle group
Hahn decomposition of an additive functional 231Eb, 231F
—– see also Vitali-Hahn-Saks theorem (246Yg)
half-open interval (in ℝ or ℝ^r) 114Aa, 114G, 114Xe, 114Yj, 115Ab, 115Xa, 115Xc, 115Yd
Hardy-Littlewood Maximal Theorem 286A
Hausdorff dimension 264 notes
Hausdorff measure §264 (264C, 264Db, 264K, 264Yo); see also normalized Hausdorff measure (265A)
Hausdorff metric (on a space of closed subsets) 246Yb
Hausdorff outer measure §264 (264A, 264K, 264Yo)
Hausdorff topology 2A3E, 2A3L, 2A3Mb, 2A3S
Hilbert space 244N, 244Yj
Hölder’s inequality 244Eb
hull see convex hull (2A5E), closed convex hull (2A5E)
ideal in an algebra of sets 232Xc; see also σ-ideal (112Db)
identically distributed random variables 273I, 273Xh, 274 notes, 276Yg, 285Xn, 285Yc; see also exchangeable sequence (276Xe)
image filter 2A1Ib, 2A3Qb, 2A3S
image measure 112E, 112F, 112Xd, 112Xg, 123Ya, 132G, 132Xk, 132Yb, 132Yf, 211Xd, 212Bd, 212Xg, 235L, 254Oa, 256G
image measure catastrophe 235J
indefinite integral 131Xa, 222D-222F, 222H, 222I, 222Xa-222Xc, 222Yc, 224Xg, 225E, 225Od, 225Xh, 232D, 232E, 232Yf, 232Yi; see also indefinite-integral measure
indefinite-integral measure §234 (234A), 235M, 235P, 235Xi, 253I, 256E, 256J, 256L, 256Xe, 256Yd, 257F, 257Xe, 263Ya, 275Yi, 275Yj, 285Dd, 285Xe, 285Ya; see also uncompleted indefinite-integral measure
independence §272 (272A)
independent random variables 272Ac, 272D-272I, 272L, 272M, 272P-272U, 272Xb, 272Xd, 272Xh-272Xj, 272Ya-272Yd, 272Yf, 272Yg, 273B, 273D, 273E, 273H, 273I, 273L-273N, 273Xh, 273Xi, 273Xk, 274B-274D, 274F-274K, 274Xc, 274Xd, 274Xg, 275B, 275Yh, 276Af, 285I, 285Xf, 285Xg, 285Xm-285Xo, 285Xs, 285Yc, 285Yk, 285Yl
independent sets 272Aa, 272Bb, 272F, 272N, 273F, 273K
independent σ-algebras 272Ab, 272B, 272D, 272F, 272J, 272K, 272M, 272O, 275Ym
induced topology see subspace topology (2A3C)
inductive definitions 2A1B
infinity 112B, 133A, §135
initial ordinal 2A1E, 2A1F, 2A1K
inner measure 113Yh, 212Yc, 213Xe, 213Yc
inner product space 244N, 253Xe
inner regular measure 256A, 256B
—– —– with respect to closed sets 256Ya
integrable function §122 (122M), 123Ya, 133B, 133Db, 133Dc, 133F, 133J, 133Xa, 135Fa, 212B, 212F, 213B, 213G; see also Bochner integrable function (253Yf), L^1(µ) (242A)
integral §122 (122E, 122K, 122M); see also integrable function, Lebesgue integral (122Nb), lower integral (133I), Riemann integral (134K), upper integral (133I)
integration by parts 225F, 225Oe, 252Xi
integration by substitution see change of variable in integration
interior of a set 2A3D, 2A3Ka
interpolation see Riesz Convexity Theorem
interval see half-open interval (114Aa, 115Ab), open interval (111Xb)
inverse Fourier transform 283Ab, 283B, 283Wa, 283Xb, 284I; see also Fourier transform
inverse image (of a set under a function or relation) 1A1B
inverse-measure-preserving function 132G, 134Yl, 134Ym, 134Yn, 235G, 235H, 235I, 235Xe, 241Xh, 242Xf, 243Xn, 244Xo, 246Xf, 254G, 254H, 254Ka, 254O, 254Xc-254Xf, 254Xh, 254Yb; see also image measure (112E)
Inversion Theorem (for Fourier series and transforms) 282G-282I, 282L, 282O, 282P, 283I, 283L, 284C, 284M; see also Carleson’s theorem
isodiametric inequality 264H, 264 notes
isomorphism see measure space isomorphism
Jacobian 263Ea
Jensen’s inequality 233I, 233J, 242Yi
joint distribution see distribution (271C)
Jordan decomposition of an additive functional 231F, 231Ya, 232C
kernel see Dirichlet kernel (282D), Féjer kernel (282D), modified Dirichlet kernel (282Xc)
Kirzbraun’s theorem 262C
Kolmogorov’s Strong Law of Large Numbers 273I, 275Yn
Komlós’ theorem 276H, 276Yh
Kronecker’s lemma 273Cb
Lacey-Thiele Lemma 286M
Laplace’s central limit theorem 274Xe
Laplace transform 123Xc, 123Yb, 133Xc, 225Xe
lattice 2A1Ad; see also Banach lattice (242G)
—– norm see Riesz norm (242Xg)
law of a random variable see distribution (271C)
law of large numbers see strong law (§273)
law of rare events 285Q
Lebesgue, H. Vol. 1 intro., Chap. 27 intro.
Lebesgue Covering Lemma 2A2Ed
Lebesgue decomposition of a countably additive functional 232I, 232Yb, 232Yg, 256Ib
Lebesgue decomposition of a function of bounded variation 226C, 226Dc, 226Ya, 232Yb
Lebesgue’s Density Theorem §223, 261C, 275Xg
Lebesgue’s Dominated Convergence Theorem 123C, 133G
Lebesgue extension see completion (212C)
Lebesgue integrable function 122Nb, 122Yb, 122Ye, 122Yf
Lebesgue integral 122Nb
Lebesgue measurable function 121C, 121D, 134Xd, 225H, 233Yd, 262K, 262P, 262Yc
Lebesgue measurable set 114E, 114F, 114G, 114Xe, 114Ye, 115E, 115F, 115G, 115Yc
Lebesgue measure (on R) §114 (114E), 131Xb, 133Xc, 133Xd, 134G-134L, 212Xc, 216A, Chap. 22, 242Xi, 246Yd, 246Ye, 252N, 252O, 252Xf, 252Xg, 252Yj, 252Yp, §255
—– —– (on R^r) §115 (115E), 132C, 132Ef, 133Yc, §134, 211M, 212Xd, 245Yj, 251M, 251Wi, 252Q, 252Xh, 252Yu, 254Xk, 255A, 255K, 255L, 255Xd, 255Yc, 255Yd, 256Ha, 256J-256L, 264H, 264I
—– —– (on [0, 1], [0, 1[) 211Q, 216A, 252Yq, 254K, 254Xh, 254Xj-254Xl
—– —– (on other subsets of R^r) 242O, 244Hb, 244I, 244Yh, 246Yf, 246Yl, 251Q, 252Ym, 255M, 255N, 255O, 255Ye, 255Yh
Lebesgue negligible set 114E, 115E, 134Yk
Lebesgue outer measure 114C, 114D, 114Xc, 114Yd, 115C, 115D, 115Xb, 115Xd, 115Yb, 132C, 134A, 134D, 134Fa
Lebesgue set of a function 223D, 223Xf, 223Xg, 223Xh, 223Yg, 261E, 261Ye
Lebesgue-Stieltjes measure 114Xa, 114Xb, 114Yb, 114Yc, 114Yf, 131Xc, 132Xg, 134Xc, 211Xb, 212Xd, 212Xi, 225Xf, 232Xb, 232Yb, 235Xb, 235Xg, 235Xh, 252Xi, 256Xg, 271Xb, 224Yh
length of a curve 264Yl, 265Xd, 265Ya
length of an interval 114Ab
B.Levi’s theorem 123A, 123Xa, 133K, 135G, 135Hb, 226E, 242E
Levi property of a normed Riesz space 242Yb, 244Ye
Lévy’s martingale convergence theorem 275I
Lévy’s metric 274Ya, 285Yd
Liapounoff’s central limit theorem 274Xg
limit of a filter 2A3Q, 2A3R, 2A3S
limit of a sequence 2A3M, 2A3Sg
limit ordinal 2A1Dd
Lindeberg’s central limit theorem 274F-274H, 285Ym
Lindeberg’s condition 274H
linear operator 262Gc, 263A, 265B, 265C, §2A6; see also bounded linear operator (2A4F), continuous linear operator
linear order see totally ordered set (2A1Ac)
linear space topology see linear topological space (2A5A), weak topology (2A5I), weak* topology (2A5I)
linear subspace (of a normed space) 2A4C; (of a linear topological space) 2A5Ec
linear topological space 245D, 284Ye, 2A5A, 2A5B, 2A5C, 2A5Eb, 2A5F, 2A5G, 2A5H, 2A5I
Lipschitz constant 262A, 262C, 262Yi, 264Yj
Lipschitz function 225Yc, 262A, 262B-262E, 262N, 262Q, 262Xa-262Xc, 262Xh, 262Yi, 263F, 264Yj, 282Yb
local convergence in measure see convergence in measure (245A)
localizable measure (space) 211G, 211L, 211Ya, 211Yb, 212G, 213Hb, 213L-213N, 213Xl, 213Xm, 214Id, 214J, 214Xa, 214Xd, 214Xf, 216C, 216E, 216Ya, 216Yb, 234Fc, 234G, 234Ye, 241G, 241Ya, 243G, 243H, 245Ec, 245Yf, 252Yq, 252Yr, 252Yt, 254U; see also strictly localizable (211E)
locally determined measure (space) 211H, 211L, 211Ya, 216Xb, 216Ya, 216Yb, 251Xc, 252Ya; see also complete locally determined measure
locally determined negligible sets 213I, 213J-213L, 213Xj-213Xl, 214Ib, 214Xg, 214Xh, 216Yb, 234Yb, 252Yb
locally finite measure 256A, 256C, 256G, 256Xa, 256Ya
locally integrable function 242Xi, 255Xh, 255Xi, 256E, 261Xa, 262Yg
Loève, M. Chap. 27 intro.
lower integral 133I, 133J, 133Xe, 135H
lower Lebesgue density 223Yf
lower Riemann integral 134Ka
lower semi-continuous function 225H, 225I, 225Xl, 225Xm, 225Yd, 225Ye
Lusin’s theorem 134Yc, 256F
Maharam measure (space) see localizable (211G)
Markov time see stopping time (275L)
martingale §275 (275A, 275Cc, 275Cd, 275Ce); see also reverse martingale
martingale convergence theorems 275G-275I, 275K, 275Xf
martingale difference sequence 276A, 276B, 276C, 276E, 276Xd, 276Ya, 276Yb, 276Ye, 276Yg
martingale inequalities 275D, 275F, 275Xb, 275Yc-275Ye, 276Xb
maximal element in a partially ordered set 2A1Ab
maximal theorems 275D, 275Yc, 275Yd, 276Xb, 286A, 286T
mean (of a random variable) see expectation (271Ab)
Mean Ergodic Theorem see convergence in mean (245Ib)
measurable cover see measurable envelope (132D)
measurable envelope 132D, 132E, 132F, 132Xf, 132Xg, 134Fc, 134Xc, 213K-213M, 214G, 216Yc; see also full outer measure (132F)
measurable envelope property 213Xl, 214Xl
measurable function (taking values in R) §121 (121C), 122Ya, 212B, 212F, 213Yd, 214La, 214Ma, 235C, 235K, 252O, 252P, 256F, 256Yb, 256Yc
—– —– (taking values in R^r) 121Yf, 256G
—– —– (taking values in other spaces) 133Da, 133E, 133Yb, 135E, 135Xd, 135Yf
—– —– ((Σ, T)-measurable function) 121Yb, 235Xc, 251Ya, 251Yc
—– —– see also Borel measurable, Lebesgue measurable
measurable set 112A; µ-measurable set 212Cd; see also relatively measurable (121A)
measurable space 111Bc
measurable transformation §235; see also inverse-measure-preserving function
measure 112A
—– (in ‘µ measures E’, ‘E is measured by µ’) 112Be
measure algebra 211Yb, 211Yc
measure-preserving function see inverse-measure-preserving function (235G), measure space automorphism, measure space isomorphism
measure space §112 (112A), 113C, 113Yi
measure space automorphism 255A, 255Ca, 255N, 255Ya, 255 notes
measure space isomorphism 254K, 254Xj-254Xl, 255Ca, 255Mb
metric 2A3F, 2A4Fb; Euclidean metric (2A3Fb), Hausdorff metric (246Yb), Lévy’s metric (274Ya), pseudometric (2A3F)
metric outer measure 264Xb, 264Yc
metric space 224Ye, 261Yi; see also complete metric space, metrizable space (2A3Ff)
metrizable (topological) space 2A3Ff, 2A3L; see also metric space, separable metrizable space
mid-convex function 233Ya, 233Yd
minimal element in a partially ordered set 2A1Ab
modified Dirichlet kernel 282Xc
modulation 286C
Monotone Class Theorem 136B
Monotone Convergence Theorem see B.Levi’s theorem (133A)
monotonic function 121D, 222A, 222C, 222Yb, 224D
Monte Carlo integration 273J, 273Ya
multilinear map 253Xc
negligible set 112D, 131Ca, 214Cb; see also Lebesgue negligible (114E, 115E)
non-decreasing sequence of sets 112Ce
non-increasing sequence of sets 112Cf
non-measurable set 134B, 134D, 134Xg
norm 2A4B; (of a linear operator) 2A4F, 2A4G, 2A4I; (of a matrix) 262H, 262Ya; (norm topology) 242Xg, 2A4Bb
normal density function 274A, 283N, 283We, 283Wf
normal distribution function 274Aa, 274F-274K, 274M, 274Xe, 274Xg
normal random variable 274A, 274B, 285E, 285Xm, 285Xn
normalized Hausdorff measure 264 notes, §265 (265A)
normed algebra 2A4J
normed space 224Yf, §2A4 (2A4Ba); see also Banach space (2A4D)
null set see negligible (112Da)
odd function 255Xb, 283Yd
open interval 111Xb, 114G, 115G, 1A1A, 2A2I
open set (in R^r) 111Gc, 111Yc, 114Yd, 115G, 115Yb, 133Xb, 134Fa, 134Yj, 135Xa, 1A2A, 1A2B, 1A2D, 256Ye, 2A3A, 2A3G; (in R) 111Gc, 111Ye, 114G, 134Xc, 2A2I; see also topology (2A3A)
optional time see stopping time (275L)
order-bounded set (in a partially ordered space) 2A1Ab
order-complete see Dedekind complete (241Ec)
order-continuous norm (on a Riesz space) 242Yc, 242Ye, 244Yd
order*-convergent sequence in a partially ordered set 245Xc; (in L0(µ)) 245C, 245K, 245L, 245Xc, 245Xd
order unit (in a Riesz space) 243C
ordered set see partially ordered set (2A1Aa), totally ordered set (2A1Ac), well-ordered set (2A1Ae)
ordinal 2A1C, 2A1D-2A1F, 2A1K
ordinate set 252N, 252Yg, 252Yh
orthogonal matrix 2A6B, 2A6C
orthogonal projection in Hilbert space 244Nb, 244Yj, 244Yk
orthonormal vectors 2A6B
outer measure §113 (113A), 114Xd, 132B, 132Xg, 136Ya, 212Ea, 212Xa, 212Xb, 212Xg, 213C, 213Xa, 213Xg, 213Xk, 213Ya, 251B, 251Wa, 251Xd, 254B, 264B, 264Xa, 264Ya, 264Yo; see also Lebesgue outer measure (114C, 115C), metric outer measure (264Yc), regular outer measure (132Xa)
—– —– defined from a measure 113Ya, §132 (132B), 213C, 213F, 213Xa, 213Xg-213Xj, 213Xk, 213Yd, 214Cd, 215Yc, 251O, 251R, 251Wk, 251Wm, 251Xm, 251Xo, 252Yh, 254G, 254L, 254S, 254Xb, 254Xq, 254Yd, 264F, 264Yd
outer regular measure 256Xi
Parseval’s identity 284Qd
partial derivative 123D, 252Yj, 262I, 262J, 262Xh, 262Yb, 262Yc
partial order see partially ordered set (2A1Aa)
partially ordered linear space 241E, 241Yg
partially ordered set 2A1Aa
Peano curve 134Yl-134Yo
periodic extension of a function on ]−π, π] 282Ae
Plancherel Theorem (on Fourier series and transforms of square-integrable functions) 282K, 284O, 284Qd
point-supported measure 112Bd, 112Xg, 211K, 211O, 211Qb, 211Rc, 211Xb, 211Xf, 213Xo, 234Xc, 215Xr, 256Hb
pointwise convergence (topology on a space of functions) 281Yf
pointwise convergent see order*-convergent (245Cb)
pointwise topology see pointwise convergence
Poisson distribution 285Q, 285Xo
Poisson’s theorem 285Q
polar coordinates 263G, 263Xf
Pólya’s urn scheme 275Xc
polynomial (on R^r) 252Yu
porous set 223Ye, 261Yg, 262L
positive cone 253G, 253Xi, 253Yd
positive definite function 283Xt, 285Xr
predictable sequence 276Ec
presque partout 112De
primitive product measure 251C, 251E, 251F, 251H, 251K, 251Wa, 251Xa-251Xc, 251Xe, 251Xf, 251Xj, 251Xl-251Xp, 252Yc, 252Yd, 252Yg, 253Ya-253Yc, 253Yg
principal ultrafilter 2A1N
probability density function see density (271H)
probability space 211B, 211L, 211Q, 211Xb, 211Xc, 211Xd, 212G, 213Ha, 215B, 243Xi, 253H, 253Xh, §254, Chap. 27
product measure Chap. 25; see also c.l.d. product measure (251F, 251W), primitive product measure (251C), product probability measure (254C)
product probability measure §254 (254C), 272G, 272J, 272M, 275J, 275Yi, 275Yj, 281Yk
product topology 281Yc, 2A3T
—– see also inner product space
pseudometric 2A3F, 2A3G, 2A3H, 2A3I, 2A3J, 2A3K, 2A3L, 2A3Mc, 2A3S, 2A3T, 2A3Ub, 2A5B
pseudo-simple function 122Ye, 133Ye
pull-back measures 132G
purely atomic measure (space) 211K, 211N, 211R, 211Xb, 211Xc, 211Xd, 212G, 213He, 214Xe, 234Xb, 251Xr
purely infinite measurable set 213 notes
push-forward measure see image measure (112F)
quasi-Radon measure (space) 256Ya, 263Ya
quasi-simple function 122Yd, 133Yd
quotient partially ordered linear space 241Yg; see also quotient Riesz space
quotient Riesz space 241Yg, 241Yh, 242Yg
quotient topology 245Ba
Rademacher’s theorem 262Q
Radon measure (on R or R^r) §256 (256A), 284R, 284Yi; (on ]−π, π]) 257Yb
Radon-Nikodým derivative 232Hf, 232Yj, 234B, 234Ca, 234Yd, 234Yf, 235Xi, 256J, 257F, 257Xe, 257Xf, 272Xe, 272Yc, 275Ya, 275Yi, 285Dd, 285Xe, 285Ya
Radon-Nikodým theorem 232E-232G, 234G, 235Xk, 242I, 244Yk
Radon probability measure (on R or R^r) 271B, 271C, 271Xb, 285Aa, 285M; (on other spaces) 256Ye, 271Ya
Radon product measure (of finitely many spaces) 256K
random variable 271Aa
rapidly decreasing test function §284 (284A, 284Wa), 285Dc, 285Xd, 285Ya
rearrangement see decreasing rearrangement (252Yp)
recursion 2A1B
regular measure see inner regular (256Ac)
regular outer measure 132C, 132Xa, 213C, 214Hb, 251Xm, 254Xb, 264Fb
regular topological space 2A5J
relation 1A1B
relatively compact set 2A3Na, 2A3Ob
relatively measurable set 121A
relatively weakly compact set (in a normed space) 247C, 2A5I
repeated integral §252 (252A); see also Fubini’s theorem, Tonelli’s theorem
reverse martingale 275K
Riemann integrable function 134K, 134L, 281Yh, 281Yi
Riemann integral 134K, 242 notes
Riemann-Lebesgue lemma 282E
Riesz Convexity Theorem 244 notes
Riesz norm 242Xg
Riesz space (= vector lattice) 231Yc, 241Ed, 241F, 241Yc, 241Yg
Saks see Vitali-Hahn-Saks theorem (246Yg)
saltus function 226B, 226Db, 226Xa
Schröder-Bernstein theorem 2A1G
Schwartz function see rapidly decreasing test function (284A)
Schwartzian distribution 284R, 284 notes; see also tempered distribution (284 notes)
self-supporting set (in a topological measure space) 256Xf
semi-continuous function see lower semi-continuous (225H)
semi-finite measure (space) 211F, 211L, 211Xf, 211Ya, 212G, 213A, 213B, 213Hc, 213Xc, 213Xd, 213Xj, 213Xl, 213Xm, 213Ya-213Yc, 214Xe, 214Xh, 215B, 216Xa, 216Yb, 234Fa, 235O, 235Xd, 235Xe, 241G, 241Ya, 241Yd, 243G, 245Ea, 245J, 245Xd, 245Xj, 245Xl, 246J, 246Xh, 251J, 251Xc, 252P, 252Yf, 253Xf, 253Xg
semi-finite version of a measure 213Xc, 213Xd
semi-martingale see submartingale (275Yf)
seminorm 2A5D
semi-ring of sets 115Ye
separable (topological) space 2A3Ud
separable Banach space 244I, 254Yc
separable metrizable space 245Yj, 264Yb, 284Ye
shift operators (on function spaces based on topological groups) 286C
Sierpiński Class Theorem see Monotone Class Theorem (136B)
signed measure see countably additive functional (231C)
simple function §122 (122A), 242M
singular additive functional 232Ac, 232I, 232Yg
smooth function (on R or R^r) 242Xi, 255Xi, 262Yd-262Yg, 284A, 284Wa
smoothing by convolution 261Ye
solid hull (of a subset of a Riesz space) 247Xa
space-filling curve 134Yl
sphere, surface measure on 265F-265H, 265Xa-265Xc, 265Xe
spherical polar coordinates 263Xf, 265F
square-integrable function 244Na; see also L2
standard normal distribution, standard normal random variable 274A
Steiner symmetrization 264H, 264 notes
step-function 226Xa
Stieltjes measure see Lebesgue-Stieltjes measure (114Xa)
Stirling’s formula 252Yn
stochastically independent see independent (272A)
Stone-Weierstrass theorem 281A, 281E, 281G, 281Ya, 281Yg
stopping time 275L, 275M-275O, 275Xi, 275Xj
strictly localizable measure (space) 211E, 211L, 211N, 211Xf, 211Ye, 212G, 213Ha, 213J, 213L, 213O, 213Xa, 213Xh, 213Xn, 213Ye, 214Ia, 214J, 215Xf, 216E, 234Fd, 235P, 251N, 251P, 251Xn, 252B, 252D, 252E, 252Ys, 252Yt
strong law of large numbers 273D, 273H, 273I, 273Xh, 275Yn, 276C, 276F, 276Ye, 276Yg
subalgebra see σ-subalgebra (233A)
submartingale 275Yf, 275Yg
subspace measure 113Yb, 214A, 214B, 214C, 214H, 214I, 214Xb-214Xh, 216Xa, 216Xb, 241Ye, 242Yf, 243Ya, 244Yc, 245Yb, 251P, 251Q, 251Wl, 251Xn, 251Yb, 254La, 254Ye, 264Yf; (on a measurable subset) 131A, 131B, 131C, 132Xb, 214J, 214K, 214Xa, 214Xi, 241Yf, 247A; (integration with respect to a subspace measure) 131D, 131E-131H, 131Xa-131Xc, 133Dc, 133Xa, 214D, 214E-214G, 214M
subspace of a normed space 2A4C
subspace topology 2A3C, 2A3J
subspace σ-algebra 121A, 214Ce
substitution see change of variable in integration
successor cardinal 2A1Fc
—– ordinal 2A1Dd
sum over arbitrary index set 112Bd, 226A
sum of measures 112Xe, 112Ya, 212Xe, 212Xh, 212Xi, 212Xj, 212Yd, 212Ye
summable family of real numbers 226A, 226Xf
support of a topological measure 256Xf, 257Xd
support see also bounded support, compact support
supported see point-supported (112Bd)
supporting see self-supporting set (256Xf), support
supremum 2A1Ab
surface measure see normalized Hausdorff measure (265A)
symmetric distribution 272Ye
symmetrization see Steiner symmetrization
tempered distribution 284 notes
tempered function §284 (284D), 286D
tempered measure 284Yi
tensor product of linear spaces 253 notes
test function 242Xi, 284 notes; see also rapidly decreasing test function (284A)
thick set see full outer measure (132F)
tight see uniformly tight (285Xj)
Tonelli’s theorem 252G, 252H, 252R
topological measure (space) 256A
topological space §2A3 (2A3A)
topological vector space see linear topological space (2A5A)
topology §2A2, §2A3 (2A3A); see also convergence in measure (245A), linear space topology (2A5A)
total order see totally ordered set (2A1Ac)
total variation (of an additive functional) 231Yh; (of a function) see variation (224A)
totally finite measure (space) 211C, 211L, 211Xb, 211Xc, 211Xd, 212G, 213Ha, 214Ia, 214Ja, 215Yc, 232Bd, 232G, 243I, 243Xk, 245Fd, 245Xe, 245Ye, 246Xi, 246Ya
totally ordered set 135Ba, 2A1Ac
trace (of a σ-algebra) see subspace σ-algebra (121A)
transfinite recursion 2A1B
translation-invariant measure 114Xf, 115Xd, 134A, 134Ye, 134Yf, 255A, 255Ba, 255Yn
truly continuous additive functional 232Ab, 232B-232E, 232H, 232I, 232Xa, 232Xb, 232Xf, 232Xh, 232Ya, 232Ye, 234Ce
Ulam S. see Banach-Ulam problem
ultrafilter 254Yd, 2A1N, 2A1O, 2A3R, 2A3Se; see also principal ultrafilter (2A1N)
Ultrafilter Theorem 2A1O
uncompleted indefinite-integral measure 234Cc
uniform space 2A5F
uniformly continuous function 224Xa, 255K
uniformly distributed sequence see equidistributed (281Yi)
uniformly integrable set (in L1) §246 (246A), 252Yp, 272Yd, 273Na, 274J, 275H, 275Xi, 275Yl, 276Xd, 276Yb; (in L1(µ)) §246 (246A), 247C, 247D, 247Xe, 253Xd
uniformly tight (set of measures) 285Xj, 285Xk, 285Ye, 285Yf
unit ball in R^r 252Q
universal mapping theorems 253F, 254G
up-crossing 275E, 275F
upper integral 133I, 133J, 133K, 133Xe, 133Yf, 135H, 252Ye, 252Yh, 252Yi, 253J, 253K
upper Riemann integral 134Ka
upwards-directed partially ordered set 2A1Ab
usual measure on {0, 1}^I 254J; see under {0, 1}^I
usual measure on PX 254J; see under PX
vague topology (on a space of signed measures) 274Ld, 274Xh, 274Ya-274Yd, 275Yp, 285K, 285L, 285S, 285U, 285Xk, 285Xq, 285Xs, 285Yd, 285Yg-285Yi, 285Yn
variance of a random variable 271Ac, 271Xa, 272R, 272Xf, 285Gb, 285Xo
variation of a function §224 (224A, 224K, 224Yd, 224Ye), 226B, 226Db, 226Xc, 226Xd, 226Yb; see also bounded variation (224A)
—– of a measure see total variation (231Yh)
vector integration see Bochner integral (253Yf)
vector lattice see Riesz space (241E)
virtually measurable function 122Q, 122Xe, 122Xf, 212Bb, 212F, 241A, 252E
Vitali cover 261Ya
Vitali’s theorem 221A, 221Ya, 221Yc, 221Yd, 261B, 261Yk
Vitali-Hahn-Saks theorem 246Yg
volume 115Ac
—– of a ball in R^r 252Q, 252Xh
Wald’s equation 272Xh
weak topology (of a normed space) 247Ya, 2A5I
—– see also (relatively) weakly compact, weakly convergent
weak* topology on a dual space 253Yd, 285Yg, 2A5Ig; see also vague topology (274Ld)
weakly compact set (in a linear topological space) 247C, 247Xa, 247Xc, 247Xd, 2A5I; see also relatively weakly compact (2A5Id)
weakly convergent sequence in a normed space 247Yb
Weierstrass’ approximation theorem 281F; see also Stone-Weierstrass theorem
well-distributed sequence 281Xh
well-ordered set 2A1Ae, 2A1B, 2A1Dg, 2A1Ka; see also ordinal (2A1C)
Well-ordering Theorem 2A1Ka
Weyl’s Equidistribution Theorem 281M, 281N, 281Xh
Young’s inequality 255Ym
Zermelo’s Well-ordering Theorem 2A1Ka
zero-one law 254S, 272O, 272Xf, 272Xg
Zorn’s lemma 2A1M
a.e. (‘almost everywhere’) 112Dd
a.s. (‘almost surely’) 112De
B (in B(x, δ), closed ball) 261A, 2A2B
B (in B(U; V), space of bounded linear operators) 253Xb, 253Yj, 253Yk, 2A4F, 2A4G, 2A4H
c (the cardinal of R or PN) 2A1H, 2A1L
C (in C(X), where X is a topological space) 243Xo, 281Yc, 281Ye, 281Yf
C([0, 1]) 242 notes
Cb (in Cb(X), where X is a topological space) 281A, 281E, 281G, 281Ya, 281Yd, 281Yg, 285Yg
c.l.d. product measure §§251-253 (251F, 251W), 254Db, 254U, 254Ye, 256K, 256L
c.l.d. version of a measure (space) 213E, 213F-213H, 213M, 213Xb-213Xe, 213Xg, 213Xj, 213Xk, 213Xn, 213Xo, 213Yb, 214Xf, 214Xj, 232Ye, 234Yf, 241Ya, 242Yh, 244Ya, 245Yc, 251Ic, 251S, 251Wn, 251Xd, 251Xj, 251Xk, 252Ya
diam (in diam A) = diameter
dom (in dom f): the domain of a function f
ess sup see essential supremum (243Da)
E (in E(X), expectation of a random variable) 271Ab
f-algebra 241H, 241 notes
Gδ set 264Xe
ℓ1 (in ℓ1(X)) 242Xa, 243Xl, 246Xd, 247Xc, 247Xd
ℓ1 (= ℓ1(N)) 246Xc
ℓ2 244Xn, 282K, 282Xg
ℓp (in ℓp(X)) 244Xn
ℓ∞ (in ℓ∞(X)) 243Xl, 281B, 281D
ℓ∞ (= ℓ∞(N)) 243Xl
L0 (in L0(µ)) 121Xb, 121Ye, §241 (241A), §245, 253C, 253Ya; see also L0 (241C), L0strict (241Yh), L0C (241J)
L0strict 241Yh
L0C (in L0C(µ)) 241J, 253L
L0 (in L0(µ)) §241 (241A), 242B, 242J, 243A, 243B, 243D, 243Xe, 243Xj, §245, 253Xe, 253Xf, 253Xg, 271De, 272H; (in L0C(µ)) 241J; see also L0 (241A)
L1 (in L1(µ)) 122Xc, 242A, 242Da, 242Pa, 242Xb; (in L1strict(µ)) 242Yg; (in L1C(µ)) 242P, 255Yn; (in L1V(µ)) 253Yf; see also L1, ‖ ‖1
L1 (in L1(µ)) §242 (242A), 243De, 243F, 243G, 243J, 243Xf, 243Xg, 243Xh, 245H, 245J, 245Xh, 245Xi, §246, §247, §253, 254R, 254Xp, 254Ya, 254Yc, 255Xc, 257Ya, 282Bd; (in L1V(µ)) 253Yf, 253Yi; see also L1, L1C, ‖ ‖1
L1C(µ) 242P, 243K, 246K, 246Yl, 247E, 255Xc; see also convolution of functions
L2 (in L2(µ)) 244Ob, 253Yj, §286; (in L2C(µ)) 284N, 284O, 284Wh, 284Wi, 284Xi, 284Xk-284Xm, 284Yg; see also L2, Lp, ‖ ‖2
L2 (in L2(µ)) 244N, 244Yk, 247Xe, 253Xe; (in L2C(µ)) 282K, 282Xg, 284P; see also L2, Lp, ‖ ‖2
Lp (in Lp(µ)) §244 (244A), 246Xg, 252Ym, 253Xh, 255K, 255Og, 255Yc, 255Yd, 255Yl, 255Ym, 261Xa, 263Xa, 273M, 273Nb, 281Xd, 282Yc, 284Xj, 286A; see also Lp, L2, ‖ ‖p
Lp (in Lp(µ), 1 < p < ∞) §244 (244A), 245G, 245Xj, 245Xk, 245Yg, 246Xh, 247Ya, 253Xe, 253Xi, 253Yk, 255Yf; see also Lp, ‖ ‖p
L∞ (in L∞(µ)) 243A, 243D, 243I, 243Xa, 243Xl, 243Xn; see also L∞
L∞C 243K
L∞strict 243Xb
L∞ (in L∞(µ)) §243 (243A), 253Yd; see also L∞, L∞C, ‖ ‖∞
L∞C 243K, 243Xm
L (in L(U; V), space of linear operators) 253A, 253Xa
lim (in lim F) 2A3S; (in lim x→F) 2A3S
lim inf (in lim inf n→∞) §1A3 (1A3Aa), 2A3Sg; (in lim inf δ↓0) 2A2H; (in lim inf x→F) 2A3S
lim sup (in lim sup n→∞) §1A3 (1A3Aa), 2A3Sg; (in lim sup δ↓0) 2A2H, 2A3Sg; (in lim sup x→F f(x)) 2A3S
ln+ 275Yd
M0,∞ 252Yp
M1,∞ (in M1,∞(µ)) 234Yd, 244Xl, 244Xm, 244Xo, 244Yc
N see PN
N × N 111Fb
P (usual measure on PX) 254J, 254Xf, 254Xq, 254Yd
PN 1A1Hb, 2A1Ha, 2A1Lb; (usual measure on) 273G, 273Xd, 273Xe
p.p. (‘presque partout’) 112De
Pr(X > a), Pr(X ∈ E) etc. 271Ad
Q (the set of rational numbers) 111Eb, 1A1Ef
R (the set of real numbers) 111Fe, 1A1Ha, 2A1Ha, 2A1Lb
RX 245Xa, 256Ye; see also Euclidean metric, Euclidean topology
R C 2A4A
R̄ see extended real line (§135)
S (in S(A)) 243I; (in S^f ≅ S(A^f)) 242M, 244H
S see rapidly decreasing test function (284A)
S^1 (the unit circle, as topological group) see circle group
S^{r−1} (the unit sphere in R^r) see sphere
∗ sf (in µsf) see semi-finite version of a measure (213Xc); (in µsf) 213Xf, 213Xg, 213Xk
T2 topology see Hausdorff (2A3E)
T (in Tµ̄,ν̄) 244Xm, 244Xo, 244Yc, 246Yc
Tm see convergence in measure (245A)
Ts (in Ts(U, V)) see weak topology (2A5I), weak* topology (2A5Ig)
U (in U(x, δ)) 1A2A
Var (in Var(X)) see variance (271Ac); (in VarD f, Var f) see variation (224A)
w∗-topology see weak* topology 2A5Ig
Z (the set of integers) 111Eb, 1A1Ee; (as topological group) 255Xe
ZFC see Zermelo-Fraenkel set theory
βr (volume of unit ball in R^r) 252Q, 252Xh, 265F, 265H, 265Xa, 265Xb, 265Xe
Γ-function 225Xj, 225Xk, 252Xh, 252Yk, 252Yn, 255Xj
∆-system 2A1Pa
θ-refinable see hereditarily weakly θ-refinable
µG (standard normal distribution) 274Aa
νX see distribution of a random variable (271C)
π-λ Theorem see Monotone Class Theorem (136B)
σ-additive see countably additive (231C)
σ-algebra of sets §111 (111A), 136Xb, 136Xi, 212Xk; see also Borel σ-algebra (111G)
σ-algebra defined by a random variable 272C, 272D
σ-complete see Dedekind σ-complete (241Fb)
σ-field see σ-algebra (111A)
σ-finite measure (space) 211D, 211L, 211M, 211Xe, 212G, 213Ha, 213Ma, 214Ia, 214Ja, 215B, 215C, 215Xe, 215Ya, 215Yb, 216A, 232B, 232F, 234Fe, 235O, 235R, 235Xe, 235Xk, 241Yd, 243Xi, 245Eb, 245K, 245L, 245Xe, 251K, 251Wg, 252B-252E, 252H, 252P, 252R, 252Xc, 252Yb, 252Yl
σ-ideal (of sets) 112Db, 211Xc, 212Xf, 212Xk
σ-subalgebra of sets §233 (233A)
∑ (in ∑_{i∈I} ai) 112Bd, 222Ba, 226A
τ-additive measure 256M, 256Xb, 256Xc
Φ see normal distribution function (274Aa)
χ (in χA, where A is a set) 122Aa
ω (the first infinite ordinal) 2A1Fa
ω1 (the first uncountable ordinal) 2A1Fc
—– see also Pω1
ω2 2A1Fc
¯ (in Ā) see closure (2A2A, 2A3Db)
¯ (in h̄(u), where h is a Borel function and u ∈ L0) 241I, 241Xd, 241Xi, 245Dd
=a.e. 112Dg, 112Xh, 222E, 241C
≤a.e. 112Dg, 112Xh, 212B
≥a.e. 112Dg, 112Xh, 233I
′ (in T′) see adjoint operator
∗ (in f ∗ g, u ∗ v, λ ∗ ν, ν ∗ f, f ∗ ν) see convolution (255E, 255O, 255Xe, 255Yn)
* (in weak*) see weak* topology (2A5Ig); (in U∗ = B(U; R), linear topological space dual) see dual (2A4H); (in µ∗) see outer measure defined by a measure (132B)
∗ (in µ_∗) see inner measure defined by a measure (113Yh)
\ (in E \ F, ‘set difference’) 111C
△ (in E △ F, ‘symmetric difference’) 111C
⋃ (in ⋃_{n∈N} En) 111C; (in ⋃𝒜) 1A1F
⋂ (in ⋂_{n∈N} En) 111C; (in ⋂ℰ) 1A2F
∫ (in ∫ f, ∫ f dµ, ∫ f(x)µ(dx)) 122E, 122K, 122M, 122Nb; (in ∫_A f) 131D, 214D, 235Xf; see also subspace measure; (in ∫ u) 242Ab, 242B, 242D; (in ∫_A u) 242Ac; see also upper integral, lower integral (133I)
∫̄ see upper integral (133I)
∫̲ see lower integral (133I)
∫ (Riemann) see Riemann integral (134K)
↾ (in f↾A, the restriction of a function to a set) 121Eh
| | (in a Riesz space) 241Ee, 242G
‖ ‖1 (on L1(µ)) §242 (242D), 246F, 253E, 275Xd, 282Ye; (on L1(µ)) 242D, 242Yg, 273Na, 273Xi
‖ ‖2 244Da, 273Xj; see also L2, ‖ ‖p
‖ ‖p (for 1 < p < ∞) §244 (244Da), 245Xm, 246Xb, 246Xh, 246Xi, 252Ym, 252Yp, 253Xe, 253Xh, 273M, 273Nb, 275Xe, 275Xf, 275Xh, 276Ya; see also Lp, Lp
‖ ‖∞ 243D, 243Xb, 243Xo, 244Xh, 273Xk, 281B; see also essential supremum (243D), L∞, L∞, ℓ∞
⊗ (in f ⊗ g) 253B, 253C, 253J, 253L, 253Ya, 253Yb; (in u ⊗ v) §253 (253E)
⊗̂ (in Σ⊗̂T) 251D, 251K, 251L, 251Xk, 251Ya, 252P, 252Xd, 252Xe, 253C
⨂̂ (in ⨂̂_{i∈I} Σi) 251Wb, 251Wf, 254E, 254F, 254Mc, 254Xc, 254Xi
∏ (in ∏_{i∈I} αi) 254F; (in ∏_{i∈I} Xi) 254Aa
# (in #(X), the cardinal of X) 2A1Kb
+ (in κ+, successor cardinal) 2A1Fc; (in f+, where f is a function) 121Xa, 241Ef; (in u+, where u belongs to a Riesz space) 241Ef; (in F(x+), where F is a real function) 226Bb
− (in f−, where f is a function) 121Xa, 241Ef; (in u−, in a Riesz space) 241Ef; (in F(x−), where F is a real function) 226Bb
∨, ∧ (in a lattice) 121Xa, 2A1Ad
∧, ∨ (in f^∧, f^∨) see Fourier transform, inverse Fourier transform (283A)
{0, 1}^I (usual measure on) 254J, 254Xd, 254Xe, 254Yc, 272N, 273Xb; (when I = N) 254K, 254Xj, 254Xq, 256Xk, 261Yd; see also PX
≪ (in ν ≪ µ) see absolutely continuous (232Aa)
∞ see infinity
[ ] (in [a, b]) see closed interval (115G, 1A1A, 2A1Ab); (in f[A], f^{−1}[B], R[A], R^{−1}[B]) 1A1B
[[ ]] (in f[[F]]) see image filter (2A1Ib)
[ [ (in [a, b[) see half-open interval (115Ab, 1A1A)
] ] (in ]a, b]) see half-open interval (1A1A)
] [ (in ]a, b[) see open interval (115G, 1A1A)
(in µ E) 234E, 235Xf