Measures, Integrals and Martingales


Author: René L. Schilling


MEASURES, INTEGRALS AND MARTINGALES

This is a concise and elementary introduction to measure and integration theory as it is nowadays needed in many parts of analysis and probability theory. The basic theory – measures, integrals, convergence theorems, $L^p$-spaces and multiple integrals – is explored in the first part of the book. The second part then uses the notion of martingales to develop the theory further, covering topics such as Jacobi’s general transformation theorem, the Radon–Nikodým theorem, differentiation of measures, Hardy–Littlewood maximal functions or general Fourier series. Undergraduate calculus and an introductory course on rigorous analysis in $\mathbb{R}$ are the only essential prerequisites, making this text suitable for both lecture courses and for self-study. Numerous illustrations and exercises are included, and these are not merely drill problems but are there to consolidate what has already been learnt and to discover variants, sideways and extensions to the main material. Hints and solutions will be available on the internet.

René Schilling is Professor of Stochastics at the University of Marburg.

MEASURES, INTEGRALS AND MARTINGALES RENÉ L. SCHILLING

CAMBRIDGE UNIVERSITY PRESS

Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9780521850155

© Cambridge University Press 2005

This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published in print format 2005

eBook (EBL): ISBN-13 978-0-511-34456-5, ISBN-10 0-511-34456-2
hardback: ISBN-13 978-0-521-85015-5, ISBN-10 0-521-85015-0
paperback: ISBN-13 978-0-521-61525-9, ISBN-10 0-521-61525-9

Cambridge University Press has no responsibility for the persistence or accuracy of urls for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

Contents

Prelude
Dependence chart

1 Prologue
   Problems
2 The pleasures of counting
   Problems
3 $\sigma$-algebras
   Problems
4 Measures
   Problems
5 Uniqueness of measures
   Problems
6 Existence of measures
   Problems
7 Measurable mappings
   Problems
8 Measurable functions
   Problems
9 Integration of positive functions
   Problems
10 Integrals of measurable functions and null sets
   Null sets and the ‘a.e.’
   Problems
11 Convergence theorems and their applications
   Parameter-dependent integrals
   Riemann vs. Lebesgue integration
   Examples
   Problems
12 The function spaces $\mathcal{L}^p$, $1 \leq p \leq \infty$
   Problems
13 Product measures and Fubini’s theorem
   More on measurable functions
   Distribution functions
   Minkowski’s inequality for integrals
   Problems
14 Integrals with respect to image measures
   Convolutions
   Problems
15 Integrals of images and Jacobi’s transformation rule
   Jacobi’s transformation formula
   Spherical coordinates and the volume of the unit ball
   Continuous functions are dense in $\mathcal{L}^p(\lambda^n)$
   Regular measures
   Problems
16 Uniform integrability and Vitali’s convergence theorem
   Different forms of uniform integrability
   Problems
17 Martingales
   Problems
18 Martingale convergence theorems
   Problems
19 The Radon–Nikodým theorem and other applications of martingales
   The Radon–Nikodým theorem
   Martingale inequalities
   The Hardy–Littlewood maximal theorem
   Lebesgue’s differentiation theorem
   The Calderón–Zygmund lemma
   Problems
20 Inner product spaces
   Problems
21 Hilbert space
   Problems
22 Conditional expectations in $L^2$
   On the structure of subspaces of $L^2$
   Problems
23 Conditional expectations in $L^p$
   Classical conditional expectations
   Separability criteria for the spaces $L^p(X, \mathcal{A}, \mu)$
   Problems
24 Orthonormal systems and their convergence behaviour
   Orthogonal polynomials
   The trigonometric system and Fourier series
   The Haar system
   The Haar wavelet
   The Rademacher functions
   Well-behaved orthonormal systems
   Problems

Appendix A: lim inf and lim sup
Appendix B: Some facts from point-set topology
   Topological spaces
   Metric spaces
   Normed spaces
Appendix C: The volume of a parallelepiped
Appendix D: Non-measurable sets
Appendix E: A summary of the Riemann integral
   The (proper) Riemann integral
   The fundamental theorem of integral calculus
   Integrals and limits
   Improper Riemann integrals

Further reading
References
Notation index
Name and subject index

Prelude

The purpose of this book is to give a straightforward and yet elementary introduction to measure and integration theory that is within the grasp of second or third year undergraduates. Indeed, apart from interest in the subject, the only prerequisites for Chapters 1–13 are a course on rigorous $\epsilon$-$\delta$-analysis on the real line and basic notions of linear algebra and calculus in $\mathbb{R}^n$. The first few chapters form a concise (not to say minimalist) introduction to Lebesgue’s approach to measure and integration, based on a 10-week, 30-hour lecture course for Sussex University mathematics undergraduates.

Chapters 14–24 are more advanced and contain a selection of results from measure theory, probability theory and analysis. This material can be read linearly but it is also possible to select certain topics; see the dependence chart at the end of this Prelude. Although more challenging than the first part, the prerequisites stay essentially the same and a reader who has worked through and understood Chapters 1–13 will be well prepared for all that follows. At some points, one or another concept from point-set topology will be (mostly superficially) needed; those readers who are not familiar with the topic can look up the basic results in Appendix B whenever the need arises.

Each chapter is followed by a section of Problems. They are not just drill exercises but contain variants, excursions from and extensions of the material presented in the text. The proofs of the core material do not depend on any of the problems and it is an exception that I refer to a problem in one of the proofs. Nevertheless I do advise you to attempt as many problems as possible. The material in the Appendices – on upper and lower limits, basic topology and the Riemann integral – is primarily intended as back-up, for when you want to look something up.

Unlike many textbooks this is not an introduction to integration for analysts or a probabilistic measure theory.
I want to reach both (future) analysts and (future) probabilists, and to provide a foundation which will be useful for both
communities and for further, more specialized, studies. It goes without saying that I have to leave out many pet choices of each discipline. On the other hand, I try to intertwine the subjects as far as possible, resulting – mostly in the latter part of the book – in the consequent use of the martingale machinery which gives ‘probabilistic’ proofs of ‘analytic’ results. Measure and integration theory is often seen as an abstract and dry subject, disliked by many students. There are several reasons for this. One of them is certainly the fact that measure theory has traditionally been based on a thorough knowledge of real analysis in one and several dimensions. Many excellent textbooks are written for such an audience but today’s undergraduates find it increasingly hard to follow such tracts, which are often more aptly labelled graduate texts. Another reason lies within the subject: measure theory has come a long way and is, in its modern purist form, stripped of its motivating roots. If, for example, one starts out with the basic definition of measures, it takes unreasonably long until one arrives at interesting examples of measures – the proof of existence and uniqueness of something as basic as Lebesgue measure already needs the full abstract machinery – and it is not easy to entertain by constantly referring to examples made up of delta functions and artificial discrete measures. I try to alleviate this by postulating the existence and properties of Lebesgue measure early on, then justifying the claims as we proceed with the abstract theory. Technically, measure and integration theory is no more difficult than, say, complex function theory or vector calculus. Most proofs are even shorter and have a very clear structure. 
The one big exception, Carathéodory’s extension theorem, can be safely stated without proof since an understanding of the technique is not really needed at the beginning; we will refer to the details of it only in Chapter 14 in connection with regularity questions. The other exception is the (classical proof of the) Radon–Nikodým theorem, but we will follow a different route in this book and use martingales to prove this and other results. I am grateful to all students who went to my classes, challenged me to write, rewrite and improve this text and who drew my attention – sometimes unbeknownst to them – to many weaknesses. I owe a great debt to the patience and interest of my colleagues, in particular to Niels Jacob, Nick Bingham, David Edmunds and Alexei Tyukov who read the whole text, and to Charles Goldie and Alex Sobolev who commented on large parts of the manuscript. Without their encouragement and help there would be more obscure passages, blunders and typos in the pages to follow. It is a pleasure to acknowledge the interest and skill of the Cambridge University Press and its editor, Roger Astley, in the preparation of this book.

A few words on notation before getting started. I tried to keep unusual and special notation to a minimum. However, a few remarks are in order: $\mathbb{N}$ means the natural numbers $\{1, 2, 3, \ldots\}$ and $\mathbb{N}_0 := \mathbb{N} \cup \{0\}$. Positive or negative is always understood in the non-strict sense $\geq 0$ or $\leq 0$; to exclude 0, I say strictly positive/negative. A ‘+’ as sub- or superscript refers to the positive part of a function or the positive members of a set. Finally, $a \vee b$ resp. $a \wedge b$ denote the maximum resp. minimum of the numbers $a, b \in \mathbb{R}$. For any other general notation there is a comprehensive index of notation at the end of the book.

In some statements I indicate alternatives using square brackets, i.e., ‘if A [B] … then P [Q]’ should be read as ‘if A … then P’ and ‘if B … then Q’. The end of a proof is marked by Halmos’ ‘tombstone’ symbol ∎, and Bourbaki’s ‘dangerous bend’ symbol ☡ in the margin identifies a passage which requires some attention. As with every book, one cannot give all the details at every instance. On the other hand, the less experienced reader might glide over these places without even noticing that some extra effort is needed; for these readers – and, hopefully, not to the annoyance of all others – I use the symbol [] to indicate where some little verification is appropriate.

Cross-referencing. Throughout the text chapters are numbered with arabic numerals and appendices with capital letters. Formulae are numbered (n.k), referring to formula k from Chapter n. For theorems and the like I write n.m for Theorem m from Chapter n. The abbreviation Tn.m is sometimes used for Theorem n.m (with D standing for Definition, L for Lemma, P for Proposition and C for Corollary).

Dependence Chart

Chapters 2–12 contain core material which is needed in all later chapters. Prerequisites within Chapters 13–24 are shown by arrows; dashed arrows indicate a minor dependence.

[Figure: dependence chart with the following boxes]
§13.1–9 Product measure, Fubini T.
§13.10
§13.11–13 Distribution functs.
§13.14
§14.1–3 Image measure & integrals
§14.4–8 Convolution
§15.1–4 Integrals of direct images
§15.5–15 Jacobi’s Transformation T. (needs pf. of T. 5.1)
§15.16–17 $C_c$ is dense in $L^p$
§15.18–20 Regularity of measures
§16.1–7 Uniform integrability, Vitali
§16.8–9 Different forms of UI
§17 Martingales
§18 Martingale Convergence
§19.1–9 Radon–Nikodým Theorem
§19.11–12 Martingale ineq.
§19.14–18 Maximal functions
§19.20–21 Leb. Differentiation T.
§19.22 Calderón–Zygmund Lemma
§§20, 21 Inner products, Hilbert space
§22.1–4 Cond. Expectation in $L^2$
§22.5 Structure of subspaces of $L^2$
§23.1–13 Cond. Expectation in $L^p$
§23.14–18 Martingales & Cond. Expectation
§23.19–21 Separability of $L^p$
§24.1–15 Orth. polynomials, Fourier series
§24.16–18 Haar functions
§24.19–20 Haar wavelets
§24.21–23 Rademacher fns.
§24.24–28 Well-behaved ONSs
§24.29 Brownian motion

1 Prologue

The theme of this book is the problem of how to assign a size, a content, a probability, etc. to certain sets. In everyday life this is usually pretty straightforward; we
• count: $\{a, b, c, \ldots, x, y, z\}$ has 26 letters;
• take measurements: length (in one dimension), area (in two dimensions), volume (in three dimensions) or time;
• calculate: rates of radioactive decay or the odds to win the lottery.

In each case we compare (and express the outcome) with respect to some base unit; most of the measurements just mentioned are intuitively clear. Nevertheless, let’s have a closer look at areas:

[Figure: rectangle with length $\ell$ and width $w$]

$$\text{area} = \text{length}\,(\ell) \times \text{width}\,(w) \tag{1.1}$$

An even more flexible shape than the rectangle is the triangle:

[Figure: triangle with base $b$ and height $h$]

$$\text{area} = \tfrac{1}{2} \times \text{base}\,(b) \times \text{height}\,(h) \tag{1.2}$$

Triangles are indeed more basic than rectangles since we can represent every rectangle, and actually any odd-shaped quadrangle, as the ‘sum’ of two non-overlapping triangles:

[Figure: quadrangle split into a shaded and a white triangle]

$$\text{area} = \text{area of shaded triangle} + \text{area of white triangle} \tag{1.3}$$

In doing so we have tacitly assumed a few things. In (1.2) we have chosen a particular base line and the corresponding height arbitrarily. But the concept of area should not depend on such a choice and the calculation this choice entails. Independence of the area from the way we calculate it is called well-definedness. Plainly,

[Figure: triangle with sides $b_1, b_2, b_3$ and corresponding heights $h_1, h_2, h_3$]

$$\text{area} = \tfrac{1}{2} \times h_1 \times b_1 = \tfrac{1}{2} \times h_2 \times b_2 = \tfrac{1}{2} \times h_3 \times b_3 \tag{1.4}$$

Notice that (1.4) allows us to pick the most convenient method to work out the area. In (1.3) we actually used two facts:
• the area of non-overlapping (disjoint) sets can be added, i.e.
$$\operatorname{area}(A) = \alpha,\quad \operatorname{area}(B) = \beta,\quad A \cap B = \emptyset \implies \operatorname{area}(A \cup B) = \alpha + \beta;$$
• congruent triangles have the same area, i.e. $\operatorname{area}(\triangle) = \operatorname{area}(\triangledown)$.

This shows that the least we should expect from a reasonable measure $\mu$ is that it is

well-defined, takes values in $[0, \infty]$, and $\mu(\emptyset) = 0$; (1.5)

additive, i.e. $\mu(A \cup B) = \mu(A) + \mu(B)$ whenever $A \cap B = \emptyset$. (1.6)

The additional property that the measure $\mu$

is invariant under congruences (1.7)

turns out to be a very special property of length, area and volume, i.e. of Lebesgue measure on $\mathbb{R}^n$.

The above rules allow us to measure arbitrarily odd-looking polygons using the following recipe: dissect the polygon into non-overlapping triangles and add their areas. But what about curved or even more complicated shapes?

[Figure: examples of curved shapes]

Here is one possibility for the circle: inscribe a regular $2^j$-gon, $j \in \mathbb{N}$, into the circle, subdivide it into congruent triangles, find the area of each of these slices and then add all $2^j$ pieces. In the next step increase $j \rightsquigarrow j + 1$ by doubling the number of points on the circumference and repeat the above procedure. Eventually,

[Figure: circle with an inscribed triangular slice of central angle $2\pi / 2^j$ rad]

$$\text{area of circle} = \lim_{j \to \infty} 2^j \times \operatorname{area}(\text{triangle at step } j) \tag{1.8}$$
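The limit in (1.8) can be watched numerically. The following is only a sketch: it assumes the elementary formula $\tfrac12 r^2 \sin\theta$ for the area of a triangle with two sides of length $r$ enclosing the angle $\theta$ (a fact from trigonometry, not stated in the text), and the function name `inscribed_polygon_area` is made up for this illustration.

```python
import math


def inscribed_polygon_area(j: int, r: float = 1.0) -> float:
    """Area of the regular 2**j-gon inscribed in a circle of radius r.

    Assumption: each of the 2**j congruent triangles has two sides of
    length r enclosing the angle 2*pi / 2**j, so its area is
    0.5 * r**2 * sin(2*pi / 2**j).
    """
    n = 2 ** j
    return n * 0.5 * r * r * math.sin(2.0 * math.pi / n)


if __name__ == "__main__":
    # The areas increase with j and approach pi for the unit circle.
    for j in range(2, 13, 2):
        print(j, inscribed_polygon_area(j))
    print("pi =", math.pi)
```

The numbers suggest that the limit exists and equals $\pi r^2$, but a computation of this kind does not answer the questions about existence and independence of the subdivision that the text raises next.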

Again, there are a few problems: does the limit exist? Is it admissible to subdivide a set into arbitrarily many subsets? Is the procedure independent of the particular subdivision? In fact, nothing would have prevented us from paving the circle with ever smaller squares! For a reasonable notion of measure the answer to all of these questions should be yes and the way we pave the circle should not lead to different results, as long as our tiles are disjoint. However, finite additivity (1.6) is not enough for this and we have to use instead $\sigma$-additivity:

$$\operatorname{area}\Bigl(\dot{\bigcup_{j \in \mathbb{N}}} A_j\Bigr) = \sum_{j \in \mathbb{N}} \operatorname{area}(A_j) \tag{1.9}$$

where the notation $\dot{\bigcup}_j A_j$ means the disjoint union of the sets $A_j$, i.e. the union where the sets $A_j$ are pairwise disjoint: $A_j \cap A_k = \emptyset$ if $j \neq k$; a corresponding notation is used for unions of finitely many sets.

We will see that conditions (1.5) and (1.9) lead to the notion of measure which is powerful enough to cater for all our everyday measuring needs and for much more. We will also see that a good notion of measure allows us to introduce integrals, basically starting with the naïve idea that the integral of a positive function should stand for the area of the set between the graph of the function and the abscissa.

Problems

1.1. Consider the two figures below.

[Figure: two arrangements of the same pieces; the second shows an extra square]

They seem to indicate that there is no conclusive way to exhaust an area by squares (see the extra square in the second figure). Can that be?

1.2. Use (1.8) to find the area of a circle with radius $r$.

2 The pleasures of counting

Set algebra and countability play a major rôle in measure theory. In this chapter we briefly review notation and manipulations with sets and then introduce the notion of countability. If you are not already acquainted with set algebra, you should verify all statements in this chapter and work through the exercises.

Throughout this chapter $X$ and $Y$ denote two arbitrary sets. For any two sets $A, B$ (which are not necessarily subsets of a common set) we write

$$\begin{aligned}
A \cup B &= \{x : x \in A \text{ or } x \in B \text{ or } x \in A \text{ and } B\},\\
A \cap B &= \{x : x \in A \text{ and } x \in B\},\\
A \setminus B &= \{x : x \in A \text{ and } x \notin B\};
\end{aligned}$$

in particular we write $A \mathbin{\dot\cup} B$ for the disjoint union, i.e. for $A \cup B$ if $A \cap B = \emptyset$. $A \subset B$ means that $A$ is contained in $B$ including the possibility that $A = B$; to exclude the latter we write $A \subsetneq B$. If $A \subset X$, we set $A^c := X \setminus A$ for the complement of $A$ (relative to $X$). Recall also the distributive laws for $A, B, C \subset X$

$$\begin{aligned}
A \cap (B \cup C) &= (A \cap B) \cup (A \cap C),\\
A \cup (B \cap C) &= (A \cup B) \cap (A \cup C)
\end{aligned} \tag{2.1}$$

and de Morgan’s identities

$$(A \cap B)^c = A^c \cup B^c, \qquad (A \cup B)^c = A^c \cap B^c \tag{2.2}$$

which also hold for arbitrarily many sets $A_i \subset X$, $i \in I$ ($I$ stands for an arbitrary index set),

$$\Bigl(\bigcap_{i \in I} A_i\Bigr)^c = \bigcup_{i \in I} A_i^c, \qquad \Bigl(\bigcup_{i \in I} A_i\Bigr)^c = \bigcap_{i \in I} A_i^c. \tag{2.3}$$

A map $f : X \to Y$ is called
• injective (or one-one) $\iff$ $f(x) = f(x') \implies x = x'$;
• surjective (or onto) $\iff$ $f(X) := \{f(x) \in Y : x \in X\} = Y$;
• bijective $\iff$ $f$ is injective and surjective.

Set operations and direct images under a map $f$ are not necessarily compatible: indeed, we have, in general,

$$\begin{aligned}
f(A \cup B) &= f(A) \cup f(B),\\
f(A \cap B) &\neq f(A) \cap f(B),\\
f(A \setminus B) &\neq f(A) \setminus f(B).
\end{aligned} \tag{2.4}$$

Inverse images and set operations are, however, always compatible. For $C, C_i, D \subset Y$ one has

$$\begin{aligned}
f^{-1}\Bigl(\bigcup_{i \in I} C_i\Bigr) &= \bigcup_{i \in I} f^{-1}(C_i),\\
f^{-1}\Bigl(\bigcap_{i \in I} C_i\Bigr) &= \bigcap_{i \in I} f^{-1}(C_i),\\
f^{-1}(C \setminus D) &= f^{-1}(C) \setminus f^{-1}(D).
\end{aligned} \tag{2.5}$$

If we have more information about $f$ we can, of course, say more.

2.1 Lemma $f : X \to Y$ is injective if, and only if, $f(A \cap B) = f(A) \cap f(B)$ for all $A, B \subset X$.

Proof ‘⇒’: Since $f(A \cap B) \subset f(A)$ and $f(A \cap B) \subset f(B)$, we have always $f(A \cap B) \subset f(A) \cap f(B)$. Let us check the converse inclusion ‘⊃’. If $y \in f(A) \cap f(B)$, we have $y = f(a)$ and $y = f(b)$ for some $a \in A$, $b \in B$. So, $f(a) = y = f(b)$ and, by injectivity, $a = b$. This means that $a = b \in A \cap B$, hence $y \in f(A \cap B)$ and $f(A) \cap f(B) \subset f(A \cap B)$ follows.

‘⇐’: Take $x, x' \in X$ with $f(x) = f(x')$ and set $A := \{x\}$, $B := \{x'\}$. Then $\emptyset \neq f(\{x\}) \cap f(\{x'\}) = f(\{x\} \cap \{x'\})$ which is only possible if $\{x\} \cap \{x'\} \neq \emptyset$, i.e. if $x = x'$. This shows that $f$ is injective.

2.2 Lemma $f : X \to Y$ is injective if, and only if, $f(X \setminus A) = f(X) \setminus f(A)$ for all $A \subset X$.

Proof ‘⇒’: Assume that $f$ is injective. We show first that $f(x) \notin f(A)$ if, and only if, $x \notin A$. Indeed, if $f(x) \notin f(A)$, then $x \notin A$; if $x \notin A$ but $f(x) \in f(A)$, then we can find some $a \in A$ such that $f(a) = f(x) \in f(A)$. Since $f$ is injective, $x = a \in A$ and we have found a contradiction. Thus

$$f(X) \setminus f(A) = \{y \in Y : y = f(x),\ f(x) \notin f(A)\} = \{y \in Y : y = f(x),\ x \notin A\} = f(X \setminus A).$$

‘⇐’: Let $f(x) = f(x')$ and assume that $x \neq x'$. Then $f(x) \in f(X \setminus \{x'\}) = f(X) \setminus f(\{x'\})$ which cannot happen as $f(x) = f(x') \in f(\{x'\})$.

We can now start with the main topic of this chapter: counting.

2.3 Definition Two sets $X, Y$ have the same cardinality if there exists a bijection $f : X \to Y$. In this case we write $\#X = \#Y$. If there is an injection $g : X \to Y$, we say that the cardinality of $X$ is less than or equal to the cardinality of $Y$ and write $\#X \leq \#Y$. If $\#X \leq \#Y$ but $\#X \neq \#Y$, we say that $X$ is of strictly smaller cardinality than $Y$ and write $\#X < \#Y$ (in this case, no injection $g : X \to Y$ can be surjective).

That Definition 2.3 is indeed counting becomes clear if we choose $Y = \mathbb{N}$ since in this case $\#X = \#\mathbb{N}$ or $\#X \leq \#\mathbb{N}$ just means that we can label each $x \in X$ with a unique tag from the set $\{1, 2, 3, \ldots\}$, i.e. we are numbering $X$. This particular example is, in fact, of central importance.

2.4 Definition A set $X$ is countable if $\#X \leq \#\mathbb{N}$. If $\#\mathbb{N} < \#X$, the set $X$ is said to be uncountable. The cardinality of $\mathbb{N}$ is called $\aleph_0$, aleph null.

Plainly, Definition 2.4 requires that we can find for every countable set some enumeration $X = (x_1, x_2, x_3, \ldots)$ which may or may not be finite (and which may contain any $x_j$ more than once).

Caution: Some authors reserve the word countable for the situation where $\#X = \#\mathbb{N}$ while sets where $\#X \leq \#\mathbb{N}$ are called at most countable or finite or countable. This has the effect that a countable set is always infinite. We do not adopt this convention.
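Returning briefly to (2.4), (2.5) and Lemma 2.1: the statements can be tried out on a small finite example. The sketch below is illustrative only; the helper names `image` and `preimage`, the set `X` and the maps `square` and `shift` are choices made for this example and do not come from the text.

```python
def image(f, A):
    """Direct image f(A) = {f(x) : x in A}."""
    return {f(x) for x in A}


def preimage(f, X, C):
    """Inverse image f^{-1}(C) = {x in X : f(x) in C}."""
    return {x for x in X if f(x) in C}


X = {-2, -1, 0, 1, 2}
square = lambda x: x * x   # not injective on X
shift = lambda x: x + 10   # injective on X

A, B = {-2, -1, 0}, {0, 1, 2}

# (2.4): unions are always respected by direct images ...
union_ok = image(square, A | B) == image(square, A) | image(square, B)

# ... but intersections and differences can fail for a non-injective map.
inter_lhs = image(square, A & B)                  # {0}
inter_rhs = image(square, A) & image(square, B)   # {0, 1, 4}
diff_lhs = image(square, A - B)                   # {1, 4}
diff_rhs = image(square, A) - image(square, B)    # empty set

# Lemma 2.1: for an injective map the intersection is respected.
injective_ok = image(shift, A & B) == image(shift, A) & image(shift, B)

# (2.5): inverse images are compatible even for the non-injective map.
C, D = {0, 1}, {1, 4}
pre_union_ok = preimage(square, X, C | D) == preimage(square, X, C) | preimage(square, X, D)
pre_inter_ok = preimage(square, X, C & D) == preimage(square, X, C) & preimage(square, X, D)
pre_diff_ok = preimage(square, X, C - D) == preimage(square, X, C) - preimage(square, X, D)
```

In both failing cases one inclusion still holds (here `inter_lhs` is a subset of `inter_rhs`, and `diff_rhs` is a subset of `diff_lhs`), which is the point of Problem 2.4(i).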

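The following examples show that (countable) sets with infinitely many elements can behave strangely.

2.5 Examples (i) Finite sets are countable: $\{a, b, \ldots, z\} \to \{1, 2, \ldots, 26\}$ where $a \leftrightarrow 1, \ldots, z \leftrightarrow 26$, is bijective and $\{1, 2, 3, \ldots, 26\} \to \mathbb{N}$ is clearly an injection. Thus

$$\#\{a, b, c, \ldots, z\} = \#\{1, 2, 3, \ldots, 26\} \leq \#\mathbb{N}.$$

(ii) The even numbers are countable. This follows from the fact that the map

$$f : \{2, 4, 6, \ldots, 2j, \ldots\} \to \mathbb{N}, \qquad k \mapsto \frac{k}{2}$$

is an injection and even a bijection.[] This means that there are ‘as many’ even numbers as there are natural numbers.

(iii) The set of integers $\mathbb{Z} = \{0, \pm 1, \pm 2, \ldots\}$ is countable. The counting scheme is shown in the figure (run through in clockwise orientation starting from 0) or, more formally,

[Figure: spiral counting scheme through the integers $\ldots, -2, -1, 0, 1, 2, \ldots$]

$$g : k \in \mathbb{Z} \mapsto \begin{cases} 2k & \text{if } k > 0,\\ 2|k| + 1 & \text{if } k \leq 0, \end{cases}$$

hence $\#\mathbb{Z} \leq \#\mathbb{N}$.[]

(iv) The Cartesian product $\mathbb{N} \times \mathbb{N} = \{(j, k) : j, k \in \mathbb{N}\}$ is countable. To see this, arrange the pairs $(j, k)$ in an array and count along the diagonals:

$$\begin{array}{cccccc}
(1,1) & (1,2) & (1,3) & (1,4) & (1,5) & \cdots\\
(2,1) & (2,2) & (2,3) & (2,4) & (2,5) & \cdots\\
(3,1) & (3,2) & (3,3) & (3,4) & (3,5) & \cdots\\
(4,1) & (4,2) & (4,3) & (4,4) & (4,5) & \cdots\\
(5,1) & (5,2) & (5,3) & (5,4) & (5,5) & \cdots\\
\vdots & \vdots & \vdots & \vdots & \vdots &
\end{array}$$

[Figure: the diagonals of the array are labelled 1, 2, 3, 4, 5, …]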

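Notice that each line contains only finitely many elements, so that each diagonal can be dealt with in finitely many steps. The map for the above counting scheme is given by

$$h : (j, k) \mapsto \frac{(j + k)(j + k - 1)}{2} - k + 1 \in \mathbb{N}, \qquad (j, k) \in \mathbb{N} \times \mathbb{N}. \tag{2.6}$$

(v) The rational numbers $\mathbb{Q}$ are countable. To see this, set $Q^{\pm} := \{q \in \mathbb{Q} : \pm q > 0\}$. Every element $\frac{m}{n} \in Q^+$ can be identified with at least one pair $(m, n) \in \mathbb{N} \times \mathbb{N}$, so that

$$Q^+ \subset \Bigl\{ \tfrac{1}{1},\ \tfrac{1}{2},\ \tfrac{2}{1},\ \tfrac{1}{3},\ \tfrac{2}{2},\ \tfrac{3}{1},\ \tfrac{1}{4},\ \tfrac{2}{3},\ \tfrac{3}{2},\ \tfrac{4}{1},\ \ldots \Bigr\};$$

in the set on the right we distinguish between cancelled and uncancelled forms of a rational, i.e. $\tfrac{6}{18}, \tfrac{1}{3}, \tfrac{2}{6}, \tfrac{3}{9}$ etc. are counted whenever they appear. (In the original display the fractions are grouped by numbered diagonals; the numbers refer to the corresponding diagonals in the counting scheme in part (iv).) This shows that we can find injections

$$Q^+ \xrightarrow{\ i\ } \Bigl\{ \tfrac{1}{1}, \tfrac{1}{2}, \tfrac{2}{1}, \tfrac{1}{3}, \ldots \Bigr\} \xrightarrow{\ j\ } \mathbb{N} \times \mathbb{N};$$

the set $\mathbb{N} \times \mathbb{N}$ is countable, thus $Q^+$ is countable[] and so is $Q^-$. Finally, $\mathbb{Q} = Q^- \mathbin{\dot\cup} \{0\} \mathbin{\dot\cup} Q^+ = \{r_1, r_2, r_3, \ldots\} \mathbin{\dot\cup} \{0\} \mathbin{\dot\cup} \{q_1, q_2, q_3, \ldots\}$ and $p_1 := 0$, $p_{2k} := q_k$, $p_{2k+1} := r_k$ gives an enumeration $\{p_1, p_2, p_3, \ldots\}$ of $\mathbb{Q}$.

2.6 Theorem Let $A_1, A_2, A_3, \ldots$ be countably many countable sets. Then $A = \bigcup_{j \in \mathbb{N}} A_j$ is countable, i.e. countable unions of countable sets are countable.

Proof Since each $A_j$ is countable we can find an enumeration $A_j = \{a_{j1}, a_{j2}, \ldots, a_{jk}, \ldots\}$ (if $A_j$ is a finite set, we repeat the last element of the list infinitely often), so that

$$A = \bigcup_{j \in \mathbb{N}} A_j = \{a_{jk} : (j, k) \in \mathbb{N} \times \mathbb{N}\}.$$

Using Example 2.5(iv) we can relabel $\mathbb{N} \times \mathbb{N}$ by $\mathbb{N}$ and (after deleting all duplicates) we have found an enumeration.

It is not hard to see that for cardinalities ‘$\leq$’ is reflexive ($\#A \leq \#A$) and transitive ($\#A \leq \#B$, $\#B \leq \#C \implies \#A \leq \#C$). Antisymmetry, which makes ‘$\leq$’ into a partial order relation, is less obvious. The proof of the following important result is somewhat technical and can be left out at first reading.

2.7 Theorem (Cantor–Bernstein) Let $X, Y$ be two sets. If $\#X \leq \#Y$ and $\#Y \leq \#X$, then $\#X = \#Y$.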

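Proof By assumption,

$$\begin{aligned}
\#X \leq \#Y &\iff \text{there exists an injection } f : X \to Y,\\
\#Y \leq \#X &\iff \text{there exists an injection } g : Y \to X.
\end{aligned}$$

In order to prove $\#X = \#Y$ we have to construct a bijection $h : X \to Y$.

Step 1. Without loss of generality we may assume that $Y \subset X$. Indeed, since $g : Y \to g(Y)$ is a bijection, we know that $\#Y = \#g(Y)$ and it is enough to show $\#g(Y) = \#X$. As $g(Y) \subset X$ we can simplify things and identify $g(Y)$ with $Y$, i.e. assume that $g = \mathrm{id}$ or, equivalently, $Y \subset X$.

Step 2. Let $Y \subset X$ and $g = \mathrm{id}$. Recursively we define

$$X_0 := X,\quad X_{j+1} := f(X_j), \qquad Y_0 := Y,\quad Y_{j+1} := f(Y_j).$$

As usual, we write $f^j := \underbrace{f \circ f \circ \cdots \circ f}_{j \text{ times}}$ and $f^0 := \mathrm{id}$. Then

$$X_{j+1} = f^{j+1}(X) = f^j(f(X)) \overset{f(X) \subset Y}{\subset} f^j(Y) = Y_j \overset{Y \subset X}{\subset} f^j(X) = X_j$$

and we can define a map $h : X \to Y$ by

$$h(x) := \begin{cases} f(x) & \text{if } x \in X_j \setminus Y_j \text{ for some } j \in \mathbb{N}_0,\\ x & \text{if } x \notin \bigcup_{j \in \mathbb{N}_0} X_j \setminus Y_j. \end{cases}$$

Step 3. The map $h$ is surjective: $h(X) = Y$. Indeed, we have by definition

$$\begin{aligned}
h(X) &= \bigcup_{j \in \mathbb{N}_0} f(X_j \setminus Y_j) \cup \Bigl(\bigcup_{j \in \mathbb{N}_0} X_j \setminus Y_j\Bigr)^c\\
&= \bigcup_{j \in \mathbb{N}_0} \bigl(f(X_j) \setminus f(Y_j)\bigr) \cup \Bigl((X \setminus Y) \cup \bigcup_{j \in \mathbb{N}} X_j \setminus Y_j\Bigr)^c && \text{(by Lemma 2.2)}\\
&= \underbrace{\bigcup_{j \in \mathbb{N}_0} \bigl(X_{j+1} \setminus Y_{j+1}\bigr)}_{=\,A} \cup \Bigl((X \setminus Y)^c \cap \Bigl(\bigcup_{j \in \mathbb{N}_0} X_{j+1} \setminus Y_{j+1}\Bigr)^c\Bigr)\\
&= A \cup \bigl((X^c \cup Y) \cap A^c\bigr) = A \cup (Y \cap A^c) = Y \cap X = Y
\end{aligned}$$

where we used that $A = \bigcup_{j \in \mathbb{N}_0} X_{j+1} \setminus Y_{j+1} \subset \bigcup_{j \in \mathbb{N}_0} X_{j+1} \subset X_1 = f(X) \subset Y$.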

Step 4. The map $h$ is injective. To see this, let $x, x' \in X$ and $h(x) = h(x')$. We have four possibilities:

(a) $x, x' \in X_j \setminus Y_j$ for some $j \in \mathbb{N}_0$. Then $f(x) = h(x) = h(x') = f(x')$ so that $x = x'$ since $f$ is injective.

(b) $x, x' \notin \bigcup_{j \in \mathbb{N}_0} X_j \setminus Y_j$. Then $x = h(x) = h(x') = x'$.

(c) $x \in X_j \setminus Y_j$ for some $j \in \mathbb{N}_0$ and $x' \notin X_k \setminus Y_k$ for all $k \in \mathbb{N}_0$. As $f(x) = h(x) = h(x') = x'$ we see, using Lemma 2.2 for the second equality,

$$x' = f(x) \in f(X_j \setminus Y_j) = f(X_j) \setminus f(Y_j) = X_{j+1} \setminus Y_{j+1}$$

which is impossible, i.e. (c) cannot occur.

(d) $x' \in X_j \setminus Y_j$ for some $j \in \mathbb{N}_0$ and $x \notin X_k \setminus Y_k$ for all $k \in \mathbb{N}_0$. This is analogous to (c).

Theorem 2.7 says that $\#X < \#Y$ and $\#Y < \#X$ cannot occur at the same time; it does not claim that we can compare the cardinality of any two sets $X$ and $Y$, i.e. that ‘$\leq$’ is a linear ordering. This is indeed true but its proof requires the axiom of choice, see Hewitt and Stromberg [20, p. 19].

Not all sets are countable. The following proof goes back to G. Cantor and is called Cantor’s diagonal method.

2.8 Theorem The interval $(0, 1)$ is uncountable; its cardinality $\mathfrak{c} := \#(0, 1)$ is called the continuum.

Proof Recall that we can write each $x \in (0, 1)$ as a decimal fraction, i.e. $x = 0.y_1 y_2 y_3 \ldots$ with $y_j \in \{0, 1, \ldots, 9\}$. If $x$ has a finite decimal representation, say $x = 0.y_1 y_2 y_3 \ldots y_n$, $y_n \neq 0$, we replace the last digit $y_n$ by $y_n - 1$ and fill it up with trailing 9s. For example, $0.24 = 0.2399\ldots$ This yields a unique representation of $x$ by an infinite decimal expansion. Assume that $(0, 1)$ were countable and let $\{x_1, x_2, \ldots\}$ be an enumeration (containing no element more than once!). Then we can write

$$\begin{aligned}
x_1 &= 0.a_{11} a_{12} a_{13} a_{14} \ldots\\
x_2 &= 0.a_{21} a_{22} a_{23} a_{24} \ldots\\
x_3 &= 0.a_{31} a_{32} a_{33} a_{34} \ldots\\
x_4 &= 0.a_{41} a_{42} a_{43} a_{44} \ldots\\
&\ \ \vdots
\end{aligned} \tag{2.7}$$

and construct a new number $x = 0.y_1 y_2 y_3 \ldots \in (0, 1)$ with digits

$$y_j := \begin{cases} 1 & \text{if } a_{jj} = 5,\\ 5 & \text{if } a_{jj} \neq 5. \end{cases} \tag{2.8}$$

By construction, $x \neq x_j$ for any $x_j$ from the list (2.7): $x$ and $x_j$ differ at the $j$th decimal. But then we have found a number $x \in (0, 1)$ which is not contained in our supposedly complete enumeration of $(0, 1)$ and we have arrived at a contradiction.

By $(0, 1)^{\mathbb{N}}$ we denote the set of all sequences $(x_j)_{j \in \mathbb{N}}$ where $x_j \in (0, 1)$.

2.9 Theorem We have $\#(0, 1)^{\mathbb{N}} = \mathfrak{c}$.

Proof We have to assign to every sequence $(x_j)_{j \in \mathbb{N}} \subset (0, 1)$ a unique number $x \in (0, 1)$ – and vice versa. For this we write, as in the proof of Theorem 2.8, each $x_j$ as a unique infinite decimal fraction

$$x_j = 0.a_{j1} a_{j2} a_{j3} a_{j4} \ldots, \qquad j \in \mathbb{N},$$

and we organize the array $(a_{jk})_{j,k \in \mathbb{N}}$ into one sequence with the help of the counting scheme of Example 2.5(iv):

$$x := 0.\underbrace{a_{11}}_{1}\ \underbrace{a_{12}\, a_{21}}_{2}\ \underbrace{a_{13}\, a_{22}\, a_{31}}_{3}\ \underbrace{a_{14}\, a_{23}\, a_{32}\, a_{41}}_{4} \ldots$$

(The numbers under the braces refer to the corresponding diagonals in the counting scheme of Example 2.5(iv).) Since the counting scheme was bijective, this procedure is reversible, i.e. we can start with the decimal expansion of $x \in (0, 1)$ and get a unique sequence of $x_j$s. We have thus found a bijection between $(0, 1)^{\mathbb{N}}$ and $(0, 1)$.
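The counting scheme used in this proof is the map $h$ from (2.6). As a rough sketch, the snippet below evaluates $h$, inverts it by searching for the diagonal, and lists the order in which the digits $a_{jk}$ are read off. The name `h_inverse` and the search-based inversion are choices made for this illustration; the text only asserts that the scheme is bijective.

```python
from typing import Tuple


def h(j: int, k: int) -> int:
    """Counting map (2.6): position of the pair (j, k) along the diagonals."""
    return (j + k) * (j + k - 1) // 2 - k + 1


def h_inverse(n: int) -> Tuple[int, int]:
    """Recover (j, k) from its position n (illustrative helper).

    The pairs with j + k <= d occupy the positions 1, ..., d*(d-1)/2,
    so we first look for the smallest d whose diagonal reaches n.
    """
    d = 2
    while d * (d - 1) // 2 < n:
        d += 1
    k = d * (d - 1) // 2 - n + 1
    return d - k, k


# Order in which the digits a_jk appear in the expansion of x.
order = ["a%d%d" % h_inverse(n) for n in range(1, 11)]

if __name__ == "__main__":
    print(order)
```

On the pairs with $j + k \leq 11$ the map $h$ takes each of the values $1, \ldots, 55$ exactly once, which is a finite instance of the bijectivity claimed in Example 2.5(iv).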

We write $\mathcal{P}(X)$ for the power set $\{A : A \subset X\}$ which is the family of all subsets of a given set $X$. For finite sets it is clear that the power set is of strictly larger cardinality than $X$. This is still true for infinite sets.

2.10 Theorem For any set $X$ we have $\#X < \#\mathcal{P}(X)$.

Proof We have to show that no injection $\Phi : X \to \mathcal{P}(X)$ can be surjective. Fix such an injection and define $B := \{x \in X : x \notin \Phi(x)\}$ (mind: $\Phi(x)$ is a set!). Clearly $B \in \mathcal{P}(X)$. If $\Phi$ were surjective, $B = \Phi(z)$ for some element $z \in X$. Then, however,

$$z \in B \overset{\text{def}}{\iff} z \notin \Phi(z) \overset{\text{since } \Phi(z) = B}{\iff} z \notin B$$

which is impossible. Thus $\Phi$ cannot be surjective.
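For a finite set the argument of Theorem 2.10 can be checked exhaustively. The sketch below runs through every map from a three-element set into its power set and confirms that the set $B$ from the proof is never among the values. All function names are invented for this illustration, and the check covers arbitrary maps, not only injections.

```python
from itertools import combinations, product


def power_set(X):
    """All subsets of the finite set X, as frozensets."""
    xs = sorted(X)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


def diagonal_set(X, phi):
    """B = {x in X : x not in phi(x)} from the proof of Theorem 2.10."""
    return frozenset(x for x in X if x not in phi[x])


def all_maps_miss_diagonal_set(X):
    """Check that B is not a value of phi, for every map phi from X to P(X)."""
    xs = sorted(X)
    subsets = power_set(X)
    for values in product(subsets, repeat=len(xs)):
        phi = dict(zip(xs, values))
        if diagonal_set(X, phi) in values:
            return False
    return True


if __name__ == "__main__":
    print(all_maps_miss_diagonal_set({0, 1, 2}))
```

The finite check is of course no substitute for the proof; its only purpose is to make the construction of $B$ concrete.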

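Problems

2.1. Let $A, B, C \subset X$ be sets. Show that
(i) $A \setminus B = A \cap B^c$;
(ii) $(A \setminus B) \setminus C = A \setminus (B \cup C)$;
(iii) $A \setminus (B \setminus C) = (A \setminus B) \cup (A \cap C)$;
(iv) $A \setminus (B \cap C) = (A \setminus B) \cup (A \setminus C)$;
(v) $A \setminus (B \cup C) = (A \setminus B) \cap (A \setminus C)$.

2.2. Let $A, B, C \subset X$. The symmetric difference of $A$ and $B$ is defined as $A \,\triangle\, B := (A \setminus B) \cup (B \setminus A)$. Verify that
$$(A \cup B \cup C) \setminus (A \cap B \cap C) = (A \,\triangle\, B) \cup (B \,\triangle\, C).$$

2.3. Prove de Morgan’s identities (2.2) and (2.3).

2.4. (i) Find examples which illustrate that $f(A \cap B) \neq f(A) \cap f(B)$ and $f(A \setminus B) \neq f(A) \setminus f(B)$. In both relations one inclusion ‘⊂’ or ‘⊃’ is always true. Which one?
(ii) Prove (2.5).

2.5. The indicator function of a set $A \subset X$ is defined by
$$\mathbf{1}_A(x) := \begin{cases} 1 & \text{if } x \in A,\\ 0 & \text{if } x \notin A. \end{cases}$$
Check that
(i) $\mathbf{1}_{A \cap B} = \mathbf{1}_A \mathbf{1}_B$;
(ii) $\mathbf{1}_{A \cup B} = \min\{\mathbf{1}_A + \mathbf{1}_B, 1\}$;
(iii) $\mathbf{1}_{A \setminus B} = \mathbf{1}_A - \mathbf{1}_{A \cap B}$;
(iv) $\mathbf{1}_{A \cup B} = \mathbf{1}_A + \mathbf{1}_B - \mathbf{1}_{A \cap B}$;
(v) $\mathbf{1}_{A \cup B} = \max\{\mathbf{1}_A, \mathbf{1}_B\}$;
(vi) $\mathbf{1}_{A \cap B} = \min\{\mathbf{1}_A, \mathbf{1}_B\}$.

2.6. Let $A, B, C \subset X$ and denote by $A \,\triangle\, B$ the symmetric difference as in Problem 2.2. Show that
(i) $\mathbf{1}_{A \triangle B} = \mathbf{1}_A + \mathbf{1}_B - 2\,\mathbf{1}_A \mathbf{1}_B = \mathbf{1}_A + \mathbf{1}_B \bmod 2$;
(ii) $A \,\triangle\, (B \,\triangle\, C) = (A \,\triangle\, B) \,\triangle\, C$;
(iii) $\mathcal{P}(X)$ is a commutative ring (in the usual algebraists’ sense) with ‘addition’ $\triangle$ and ‘multiplication’ $\cap$.
[Hint: use indicator functions for (ii) and (iii).]

2.7. Let $f : X \to Y$ be a map, $A \subset X$ and $B \subset Y$. Show that, in general,
$$f \circ f^{-1}(B) \subsetneq B \qquad \text{and} \qquad f^{-1} \circ f(A) \supsetneq A.$$
When does ‘=’ hold in these relations? Provide an example showing that the above inclusions are strict.

2.8. Let $f$ and $g$ be two injective maps. Show that $f \circ g$, if it exists, is injective.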

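2.9. Show that the following sets have the same cardinality as $\mathbb{N}$: $\{m \in \mathbb{N} : m \text{ is odd}\}$, $\mathbb{N} \times \mathbb{Z}$, $\mathbb{Q}^m$ ($m \in \mathbb{N}$), $\bigcup_{m \in \mathbb{N}} \mathbb{Q}^m$.

2.10. Use Theorem 2.7 to show that $\#\mathbb{N} \times \mathbb{N} = \#\mathbb{N}$.
[Hint: $\#\mathbb{N} = \#\mathbb{N} \times \{1\}$ and $\mathbb{N} \times \{1\} \subset \mathbb{N} \times \mathbb{N}$.]

2.11. Show that if $E \subset F$ we have $\#E \leq \#F$. In particular, subsets of countable sets are again countable.

2.12. Show that $\{0, 1\}^{\mathbb{N}} = \{\text{all infinite sequences consisting of 0 and 1}\}$ is uncountable.
[Hint: diagonal method.]

2.13. Show that the set $\mathbb{R}$ is uncountable and that $\#(0, 1) = \#\mathbb{R}$.
[Hint: find a bijection $f : (0, 1) \to \mathbb{R}$.]

2.14. Let $(A_j)_{j \in \mathbb{N}}$ be a sequence of sets of cardinality $\mathfrak{c}$. Show that $\#\bigcup_{j \in \mathbb{N}} A_j = \mathfrak{c}$.
[Hint: map $A_j$ bijectively onto $(j - 1, j)$ and use that $(0, 1) \subset \bigcup_{j=1}^{\infty} (j - 1, j) \subset \mathbb{R}$.]

2.15. Adapt the proof of Theorem 2.8 to show that $\#\{1, 2\}^{\mathbb{N}} \leq \#(0, 1) \leq \#\{0, 1\}^{\mathbb{N}}$ and conclude that $\#(0, 1) = \#\{0, 1\}^{\mathbb{N}}$.
Remark. This is the reason for writing $\mathfrak{c} = 2^{\aleph_0}$.
[Hint: interpret $\{0, 1\}^{\mathbb{N}}$ as base-2 expansions of all numbers in $(0, 1)$ while $\{1, 2\}^{\mathbb{N}}$ are all infinite base-3 expansions lacking the digit 0.]

2.16. Extend Problem 2.15 to deduce $\#\{0, 1, 2, \ldots, n\}^{\mathbb{N}} = \#(0, 1)$ for all $n \in \mathbb{N}$.

2.17. Mimic the proof of Theorem 2.9 to show that $\#(0, 1)^2 = \mathfrak{c}$. Use the fact that $\#\mathbb{R} = \#(0, 1)$ to conclude that $\#\mathbb{R}^2 = \mathfrak{c}$.

2.18. Show that the set of all infinite sequences of natural numbers $\mathbb{N}^{\mathbb{N}}$ has cardinality $\mathfrak{c}$.
[Hint: use that $\#\{0, 1\}^{\mathbb{N}} = \#\{1, 2\}^{\mathbb{N}}$, $\{1, 2\}^{\mathbb{N}} \subset \mathbb{N}^{\mathbb{N}} \subset \mathbb{R}^{\mathbb{N}}$ and $\#\mathbb{R}^{\mathbb{N}} = \#(0, 1)^{\mathbb{N}}$.]

2.19. Let $\mathcal{F} := \{F \subset \mathbb{N} : \#F < \infty\}$. Show that $\#\mathcal{F} = \#\mathbb{N}$.
[Hint: embed $\mathcal{F}$ into $\bigcup_{k \in \mathbb{N}} \mathbb{N}^k$ or show that $F \mapsto \sum_{j \in F} 2^j$ is a bijection between $\mathcal{F}$ and $\mathbb{N}$.]

2.20. Show – not using Theorem 2.10 – that $\#\mathcal{P}(\mathbb{N}) > \#\mathbb{N}$. Conclude that there are more than countably many maps $f : \mathbb{N} \to \mathbb{N}$.
[Hint: diagonal method.]

2.21. If $A \subset \mathbb{N}$ we can identify the indicator function $\mathbf{1}_A : \mathbb{N} \to \{0, 1\}$ with the 0-1-sequence $(\mathbf{1}_A(j))_{j \in \mathbb{N}}$, i.e., $\mathbf{1}_A \in \{0, 1\}^{\mathbb{N}}$. Show that the map $\mathcal{P}(\mathbb{N}) \ni A \mapsto \mathbf{1}_A \in \{0, 1\}^{\mathbb{N}}$ is a bijection and conclude that $\#\mathcal{P}(\mathbb{N}) = \mathfrak{c}$.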

3 $\sigma$-algebras

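We have seen in the prologue that a reasonable measure should be able to deal with countable partitions. Therefore, a measure function should be defined on a system of sets which is stable whenever we repeat any of the basic set operations – $\cup$, $\cap$, ${}^c$ – countably many times.

3.1 Definition A $\sigma$-algebra $\mathcal{A}$ on a set $X$ is a family of subsets of $X$ with the following properties:
    $X \in \mathcal{A}$    ($\Sigma_1$)
    $A \in \mathcal{A} \implies A^c \in \mathcal{A}$    ($\Sigma_2$)
    $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A} \implies \bigcup_{j \in \mathbb{N}} A_j \in \mathcal{A}$    ($\Sigma_3$)
A set $A \in \mathcal{A}$ is said to be ($\mathcal{A}$-)measurable.

3.2 Properties (of a $\sigma$-algebra)
(i) $\emptyset \in \mathcal{A}$. Indeed: $\emptyset = X^c \in \mathcal{A}$ by ($\Sigma_1$), ($\Sigma_2$).
(ii) $A, B \in \mathcal{A} \implies A \cup B \in \mathcal{A}$. Indeed: set $A_1 = A$, $A_2 = B$, $A_3 = A_4 = \ldots = \emptyset$. Then $A \cup B = \bigcup_{j \in \mathbb{N}} A_j \in \mathcal{A}$ by ($\Sigma_3$).
(iii) $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A} \implies \bigcap_{j \in \mathbb{N}} A_j \in \mathcal{A}$. Indeed: if $A_j \in \mathcal{A}$, then $A_j^c \in \mathcal{A}$ by ($\Sigma_2$), hence $\bigcup_{j \in \mathbb{N}} A_j^c \in \mathcal{A}$ by ($\Sigma_3$) and, again by ($\Sigma_2$), $\bigcap_{j \in \mathbb{N}} A_j = \big(\bigcup_{j \in \mathbb{N}} A_j^c\big)^c \in \mathcal{A}$.

3.3 Examples
(i) $\mathcal{P}(X)$ is a $\sigma$-algebra (the maximal $\sigma$-algebra in $X$).
(ii) $\{\emptyset, X\}$ is a $\sigma$-algebra (the minimal $\sigma$-algebra in $X$).
(iii) $\{\emptyset, B, B^c, X\}$, $B \subset X$, is a $\sigma$-algebra.
(iv) $\{\emptyset, B, X\}$ is no $\sigma$-algebra (unless $B = \emptyset$ or $B = X$).
(v) $\mathcal{A} := \{A \subset X : \#A \le \#\mathbb{N} \text{ or } \#A^c \le \#\mathbb{N}\}$ is a $\sigma$-algebra.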


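Proof: Let us verify ($\Sigma_1$)–($\Sigma_3$).
($\Sigma_1$): $X^c = \emptyset$ which is certainly countable.
($\Sigma_2$): if $A \in \mathcal{A}$, either $A$ or $A^c$ is by definition countable, so $A^c \in \mathcal{A}$.
($\Sigma_3$): if $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$, then two cases can occur:
• All $A_j$ are countable. Then $A = \bigcup_{j \in \mathbb{N}} A_j$ is a countable union of countable sets which is, by T2.6, itself countable.
• At least one $A_{j_0}$ is uncountable. Then $A_{j_0}^c$ must be countable, so that
    $\big(\bigcup_{j \in \mathbb{N}} A_j\big)^c = \bigcap_{j \in \mathbb{N}} A_j^c \subset A_{j_0}^c$.
Hence $\big(\bigcup_{j \in \mathbb{N}} A_j\big)^c$ is countable (Problem 2.11) and so $\bigcup_{j \in \mathbb{N}} A_j \in \mathcal{A}$.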

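(vi) (trace $\sigma$-algebra) Let $E \subset X$ be any set and let $\mathcal{A}$ be some $\sigma$-algebra in $X$. Then
    $\mathcal{A}_E := E \cap \mathcal{A} := \{E \cap A : A \in \mathcal{A}\}$    (3.1)
is a $\sigma$-algebra in $E$.
(vii) (pre-image $\sigma$-algebra) Let $f : X \to X'$ be a map and let $\mathcal{A}'$ be a $\sigma$-algebra in $X'$. Then
    $\mathcal{A} := f^{-1}(\mathcal{A}') := \{f^{-1}(A') : A' \in \mathcal{A}'\}$
is a $\sigma$-algebra in $X$.

3.4 Theorem (and Definition)
(i) The intersection $\bigcap_{i \in I} \mathcal{A}_i$ of arbitrarily many $\sigma$-algebras $\mathcal{A}_i$ in $X$ is again a $\sigma$-algebra in $X$.
(ii) For every system of sets $\mathcal{G} \subset \mathcal{P}(X)$ there exists a smallest (also: minimal, coarsest) $\sigma$-algebra containing $\mathcal{G}$. This is the $\sigma$-algebra generated by $\mathcal{G}$, denoted by $\sigma(\mathcal{G})$, and $\mathcal{G}$ is called its generator.

Proof (i) We check ($\Sigma_1$)–($\Sigma_3$). ($\Sigma_1$): since $X \in \mathcal{A}_i$ for all $i \in I$, $X \in \bigcap_i \mathcal{A}_i$. ($\Sigma_2$): if $A \in \bigcap_i \mathcal{A}_i$, then $A^c \in \mathcal{A}_i$ for all $i \in I$, so $A^c \in \bigcap_i \mathcal{A}_i$. ($\Sigma_3$): let $(A_k)_{k \in \mathbb{N}} \subset \bigcap_i \mathcal{A}_i$. Then $A_k \in \mathcal{A}_i$ for all $k \in \mathbb{N}$ and all $i \in I$, hence $\bigcup_{k \in \mathbb{N}} A_k \in \mathcal{A}_i$ for each $i \in I$ and so $\bigcup_{k \in \mathbb{N}} A_k \in \bigcap_{i \in I} \mathcal{A}_i$.
(ii) Consider the family
    $\mathcal{A} := \bigcap \{\mathcal{C} : \mathcal{C} \text{ is a } \sigma\text{-algebra},\ \mathcal{C} \supset \mathcal{G}\}$.
Since $\mathcal{G} \subset \mathcal{P}(X)$ and since $\mathcal{P}(X)$ is a $\sigma$-algebra, the above intersection is nonvoid. This means that the definition of $\mathcal{A}$ makes sense and yields, by part (i), a $\sigma$-algebra containing $\mathcal{G}$. If $\mathcal{A}'$ is a further $\sigma$-algebra with $\mathcal{A}' \supset \mathcal{G}$, then $\mathcal{A}'$ would be included in the intersection used for the definition of $\mathcal{A}$, hence $\mathcal{A} \subset \mathcal{A}'$. In this sense, $\mathcal{A}$ is the smallest $\sigma$-algebra containing $\mathcal{G}$.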
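On a finite set the construction of Theorem 3.4 can be carried out by brute force. The following Python sketch is an illustration only: the function names are hypothetical, and it relies on the assumption that $X$ is finite, so that countable unions reduce to pairwise unions. It checks the axioms ($\Sigma_1$)–($\Sigma_3$), builds $\sigma(\mathcal{G})$ once by closing the generator under complements and unions and once as the intersection of all $\sigma$-algebras containing $\mathcal{G}$, and compares the two.

```python
from itertools import combinations


def power_set(X):
    """All subsets of the finite set X, as frozensets."""
    items = list(X)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}


def is_sigma_algebra(family, X):
    """Check (Sigma1)-(Sigma3) for a family of subsets of a finite set X.

    Assumption: X is finite, so countable unions reduce to pairwise unions.
    """
    X = frozenset(X)
    fam = {frozenset(A) for A in family}
    if X not in fam:                                   # (Sigma1)
        return False
    if any(X - A not in fam for A in fam):             # (Sigma2)
        return False
    return all(A | B in fam for A in fam for B in fam)  # (Sigma3)


def generated_sigma_algebra(generator, X):
    """Close the generator under complements and pairwise unions."""
    X = frozenset(X)
    fam = {X} | {frozenset(G) for G in generator}
    while True:
        new = {X - A for A in fam} | {A | B for A in fam for B in fam}
        if new <= fam:
            return fam
        fam |= new


def generated_by_intersection(generator, X):
    """Intersect all sigma-algebras containing the generator (Theorem 3.4(ii)).

    Only feasible for very small X: there are 2**(2**len(X)) candidate families.
    """
    subsets = list(power_set(X))
    gen = {frozenset(G) for G in generator}
    result = set(subsets)  # P(X) is a sigma-algebra containing the generator
    for r in range(len(subsets) + 1):
        for fam in combinations(subsets, r):
            fam = set(fam)
            if gen <= fam and is_sigma_algebra(fam, X):
                result &= fam
    return result


X = {1, 2, 3}
print(is_sigma_algebra([set(), {1}, {2, 3}, X], X))  # Example 3.3(iii): True
print(is_sigma_algebra([set(), {1}, X], X))          # Example 3.3(iv): False
print(generated_sigma_algebra([{1}], X) == generated_by_intersection([{1}], X))
```

For the generator $\{\{1\}\}$ both constructions return $\{\emptyset, \{1\}, \{2,3\}, X\}$, in line with Example 3.3(iii). The closure procedure only works because $X$ is finite; for infinite $X$ no such step-by-step construction is available in general.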



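3.5 Remarks
(i) If $\mathcal{G}$ is a $\sigma$-algebra, then $\mathcal{G} = \sigma(\mathcal{G})$.
(ii) For $A \subset X$ we have $\sigma(\{A\}) = \{\emptyset, A, A^c, X\}$.
(iii) If $\mathcal{F} \subset \mathcal{G} \subset \mathcal{A}$, then $\sigma(\mathcal{F}) \subset \sigma(\mathcal{G}) \subset \sigma(\mathcal{A}) = \mathcal{A}$, where the last equality holds by 3.5(i).

On the Euclidean space $\mathbb{R}^n$ there is a canonical $\sigma$-algebra, which is generated by the open sets. Recall that
    $U \subset \mathbb{R}^n$ is open $\iff \forall\, x \in U\ \exists\, \epsilon > 0 : B_\epsilon(x) \subset U$,
where $B_\epsilon(x) := \{y \in \mathbb{R}^n : |x - y| < \epsilon\}$ is the open ball with centre $x$ and radius $\epsilon$. A set is closed if its complement is open. The system of open sets $\mathcal{O}^n$ in $X = \mathbb{R}^n$ has the following properties:
    $\emptyset, X \in \mathcal{O}^n$    ($\mathcal{O}_1$)
    $U, V \in \mathcal{O}^n \implies U \cap V \in \mathcal{O}^n$    ($\mathcal{O}_2$)
    $U_i \in \mathcal{O}^n,\ i \in I$ (arbitrary) $\implies \bigcup_{i \in I} U_i \in \mathcal{O}^n$    ($\mathcal{O}_3$)
Note, however, that countable or arbitrary intersections of open sets need not be open[✓]. A family $\mathcal{O}$ of subsets of a general space $X$ satisfying the conditions ($\mathcal{O}_1$)–($\mathcal{O}_3$) is called a topology, and the pair $(X, \mathcal{O})$ is called a topological space; in analogy to $\mathbb{R}^n$, $U \in \mathcal{O}$ is said to be open while closed sets are exactly the complements of open sets; see Appendix B.

3.6 Definition The $\sigma$-algebra $\sigma(\mathcal{O}^n)$ generated by the open sets $\mathcal{O}^n$ of $\mathbb{R}^n$ is called Borel $\sigma$-algebra, and its members are the Borel sets or Borel measurable sets. We write $\mathcal{B}^n$ or $\mathcal{B}(\mathbb{R}^n)$ for the Borel sets in $\mathbb{R}^n$.

The Borel sets are fundamental for the study of measures on $\mathbb{R}^n$. Since the Borel $\sigma$-algebra depends on the topology of $\mathbb{R}^n$, $\mathcal{B}(\mathbb{R}^n)$ is often also called the topological $\sigma$-algebra.

3.7 Theorem Denote by $\mathcal{O}^n$, $\mathcal{C}^n$ and $\mathcal{K}^n$ the families of open, closed and compact¹ sets in $\mathbb{R}^n$. Then
    $\mathcal{B}(\mathbb{R}^n) = \sigma(\mathcal{O}^n) = \sigma(\mathcal{C}^n) = \sigma(\mathcal{K}^n)$.

Proof Since compact sets are closed, we have $\mathcal{K}^n \subset \mathcal{C}^n$ and by Remark 3.5(iii), $\sigma(\mathcal{K}^n) \subset \sigma(\mathcal{C}^n)$. On the other hand, if $C \in \mathcal{C}^n$, then $C_k := C \cap \overline{B_k(0)}$ is² closed and bounded, hence $C_k \in \mathcal{K}^n$. By construction $C = \bigcup_{k \in \mathbb{N}} C_k$, thus $\mathcal{C}^n \subset \sigma(\mathcal{K}^n)$ and also $\sigma(\mathcal{C}^n) \subset \sigma(\mathcal{K}^n)$.

¹ i.e. closed and bounded.
² $B_k(0)$, $\overline{B_k(0)}$ denote the open, resp., closed balls with centre 0 and radius $k$.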


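Since $(\mathcal{O}^n)^c := \{U^c : U \in \mathcal{O}^n\} = \mathcal{C}^n$ (and $(\mathcal{C}^n)^c = \mathcal{O}^n$) we have $\mathcal{C}^n = (\mathcal{O}^n)^c \subset \sigma(\mathcal{O}^n)$, hence $\sigma(\mathcal{C}^n) \subset \sigma(\mathcal{O}^n)$ and the converse inclusion is similar.

The Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R}^n)$ is generated by many different systems of sets. For our purposes the most interesting generators are the families of open rectangles
    $\mathcal{J}^o = \mathcal{J}^{o,n} = \mathcal{J}^o(\mathbb{R}^n) = \{(a_1, b_1) \times \cdots \times (a_n, b_n) : a_j, b_j \in \mathbb{R}\}$
and (from the right) half-open rectangles
    $\mathcal{J} = \mathcal{J}^n = \mathcal{J}(\mathbb{R}^n) = \{[a_1, b_1) \times \cdots \times [a_n, b_n) : a_j, b_j \in \mathbb{R}\}$.
We use the convention that $[a_j, b_j) = (a_j, b_j) = \emptyset$ if $b_j \le a_j$ and, of course, that $[a_1, b_1) \times \cdots \times \emptyset \times \cdots \times [a_n, b_n) = \emptyset$. Sometimes we use the shorthand $[[a, b)) = [a_1, b_1) \times \cdots \times [a_n, b_n)$ for vectors $a = (a_1, \ldots, a_n)$, $b = (b_1, \ldots, b_n)$ from $\mathbb{R}^n$. Finally, we write $\mathcal{J}_{rat}$, $\mathcal{J}^o_{rat}$ for the (half-)open rectangles with only rational endpoints. Notice that the half-open rectangles are intervals in $\mathbb{R}$, rectangles in $\mathbb{R}^2$, cuboids in $\mathbb{R}^3$, and hypercubes in dimensions $n > 3$.

[Figure: half-open rectangles $[[a, b))$ in dimensions 1, 2 and 3.]

3.8 Theorem We have $\mathcal{B}(\mathbb{R}^n) = \sigma(\mathcal{J}^n_{rat}) = \sigma(\mathcal{J}^{o,n}_{rat}) = \sigma(\mathcal{J}^n) = \sigma(\mathcal{J}^{o,n})$.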

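Proof We begin with open rectangles having rational endpoints. Since the open rectangle $((a, b))$ is an open set[✓], we find $\sigma(\mathcal{O}^n) \supset \sigma(\mathcal{J}^o) \supset \sigma(\mathcal{J}^o_{rat})$. Conversely, if $U \in \mathcal{O}^n$, we have
    $U = \bigcup_{I \in \mathcal{J}^o_{rat},\ I \subset U} I$.    (3.2)

[Figure: a point $x \in U$, the ball $B_\epsilon(x) \subset U$ and an inscribed rectangle $I(x)$.]

Here '$\supset$' is clear from the definition and for the other direction '$\subset$' we fix $x \in U$. Since $U$ is open, there is some ball $B_\epsilon(x) \subset U$ – see the picture – and we can inscribe a square into $B_\epsilon(x)$ and then shrink this square to get a rectangle $I = I(x) \in \mathcal{J}^o_{rat}$ containing $x$. Since every rectangle $I$ is uniquely determined by its main diagonal, there are at most $\#(\mathbb{Q}^n \times \mathbb{Q}^n) = \#\mathbb{N}$ many $I$ in the union (3.2). Thus
    $U \in \mathcal{O}^n \implies U \in \sigma(\mathcal{J}^o_{rat})$,
proving the other inclusion $\sigma(\mathcal{O}^n) \subset \sigma(\mathcal{J}^o_{rat})$, and so $\mathcal{B}(\mathbb{R}^n) = \sigma(\mathcal{J}^o) = \sigma(\mathcal{J}^o_{rat})$.

Every half-open rectangle (with rational endpoints) can be written as
    $[a_1, b_1) \times \cdots \times [a_n, b_n) = \bigcap_{j \in \mathbb{N}} \big(a_1 - \tfrac{1}{j}, b_1\big) \times \cdots \times \big(a_n - \tfrac{1}{j}, b_n\big)$
while every open rectangle (with rational endpoints) can be represented as
    $(c_1, d_1) \times \cdots \times (c_n, d_n) = \bigcup_{j \in \mathbb{N}} \big[c_1 + \tfrac{1}{j}, d_1\big) \times \cdots \times \big[c_n + \tfrac{1}{j}, d_n\big)$.
These formulae imply that $\mathcal{J} \subset \sigma(\mathcal{J}^o)$ and $\mathcal{J}^o \subset \sigma(\mathcal{J})$ resp. $\mathcal{J}_{rat} \subset \sigma(\mathcal{J}^o_{rat})$ and $\mathcal{J}^o_{rat} \subset \sigma(\mathcal{J}_{rat})$, hence by Remark 3.5(iii), $\sigma(\mathcal{J}) = \sigma(\mathcal{J}^o)$ resp. $\sigma(\mathcal{J}_{rat}) = \sigma(\mathcal{J}^o_{rat})$, and the proof follows since we know $\sigma(\mathcal{J}^o_{rat}) = \sigma(\mathcal{J}^o) = \mathcal{B}(\mathbb{R}^n)$ from the first part.

3.9 Remark The Borel sets of the real line $\mathbb{R}$ are also generated by any of the following systems:
    $\{(-\infty, a) : a \in \mathbb{Q}\}$,  $\{(-\infty, a) : a \in \mathbb{R}\}$,
    $\{(-\infty, b] : b \in \mathbb{Q}\}$,  $\{(-\infty, b] : b \in \mathbb{R}\}$,
    $\{(c, \infty) : c \in \mathbb{Q}\}$,  $\{(c, \infty) : c \in \mathbb{R}\}$,
    $\{[d, \infty) : d \in \mathbb{Q}\}$,  $\{[d, \infty) : d \in \mathbb{R}\}$.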

3.10 Remark One might think that $\sigma(\mathcal{G})$ can be explicitly constructed for any given $\mathcal{G}$ by adding to the family $\mathcal{G}$ all possible countable unions of its members and their complements:
    $\mathcal{G}^{\sigma c} := \Big\{\bigcup_{j \in \mathbb{N}} G_j,\ \Big(\bigcup_{j \in \mathbb{N}} G_j\Big)^c : G_j \in \mathcal{G}\Big\}$.
But $\mathcal{G}^{\sigma c}$ is not necessarily a $\sigma$-algebra.[✓] Even if we repeat this procedure countably often, i.e.
    $\mathcal{G}_n := (\cdots(\mathcal{G}^{\sigma c})^{\sigma c} \cdots)^{\sigma c}$ ($n$ times),  $\hat{\mathcal{G}} := \bigcup_{n \in \mathbb{N}} \mathcal{G}_n$,
we end up, in general, with a set that is too small: $\hat{\mathcal{G}} \subsetneq \sigma(\mathcal{G})$.³ This shows that the $\sigma$-operation produces a pretty big family; so big, in fact, that no approach using countably many countable set operations will give the whole of $\sigma(\mathcal{G})$. On the other hand, it is rather typical that a $\sigma$-algebra is given through its generator. In order to deal with these cases, we need the notion of Dynkin systems which will be introduced in Chapter 5.

³ A 'constructive' approach along these lines is nevertheless possible if we use transfinite induction, see Hewitt and Stromberg [20, Theorem 10.23] or Appendix D.

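Problems

3.1. Let $\mathcal{A}$ be a $\sigma$-algebra. Show that
(i) if $A_1, A_2, \ldots, A_N \in \mathcal{A}$, then $A_1 \cap A_2 \cap \ldots \cap A_N \in \mathcal{A}$;
(ii) $A \in \mathcal{A}$ if, and only if, $A^c \in \mathcal{A}$;
(iii) if $A, B \in \mathcal{A}$, then $A \setminus B \in \mathcal{A}$ and $A \triangle B \in \mathcal{A}$.

3.2. Prove the assertions made in Example 3.3 (iv), (vi) and (vii). [Hint: use (2.5) for (vii).]

3.3. Verify the assertions made in Remark 3.5.

3.4. Let $X = [0, 1]$. Find the $\sigma$-algebra generated by the sets
(i) $(0, \tfrac{1}{2})$;
(ii) $[0, \tfrac{1}{4})$, $(\tfrac{3}{4}, 1]$;
(iii) $[0, \tfrac{3}{4}]$, $[\tfrac{1}{4}, 1]$.

3.5. Let $A_1, A_2, \ldots, A_N$ be subsets of $X$.
(i) If the $A_j$ are disjoint and $\dot\bigcup_j A_j = X$, then $\#\sigma(A_1, A_2, \ldots, A_N) = 2^N$. Remark. A set $A$ in a $\sigma$-algebra $\mathcal{A}$ is called an atom, if there is no proper subset $\emptyset \ne B \subsetneq A$ such that $B \in \mathcal{A}$. In this sense all $A_j$ are atoms.
(ii) Show that $\sigma(A_1, A_2, \ldots, A_N)$ consists of finitely many sets. [Hint: show that $\sigma(A_1, A_2, \ldots, A_N)$ has only finitely many atoms.]

3.6. Verify the properties ($\mathcal{O}_1$)–($\mathcal{O}_3$) for open sets in $\mathbb{R}^n$. Is $\mathcal{O}^n$ a $\sigma$-algebra?

3.7. Find an example (e.g. in $\mathbb{R}$) showing that $\bigcap_{j \in \mathbb{N}} U_j$ need not be open even if all $U_j$ are open sets.

3.8. Prove any one of the assertions made in Remark 3.9.

3.9. Is this still true for the family $\{B_r(x) : x \in \mathbb{Q}^n,\ r \in \mathbb{Q}^+\}$? [Hint: mimic the Proof of T3.8.]

3.10. Let $\mathcal{O}^n$ be the collection of open sets (topology) in $\mathbb{R}^n$ and let $A \subset \mathbb{R}^n$ be an arbitrary subset. We can introduce a topology $\mathcal{O}_A$ on $A$ as follows: a set $V \subset A$ is called open (relative to $A$) if $V = U \cap A$ for some $U \in \mathcal{O}^n$. We write $\mathcal{O}_A$ for the open sets relative to $A$.
(i) Show that $\mathcal{O}_A$ is a topology on $A$, i.e. a family satisfying ($\mathcal{O}_1$)–($\mathcal{O}_3$).
(ii) If $A \in \mathcal{B}(\mathbb{R}^n)$, show that the trace $\sigma$-algebra $A \cap \mathcal{B}(\mathbb{R}^n)$ coincides with $\sigma(\mathcal{O}_A)$ (the latter is usually denoted by $\mathcal{B}(A)$: the Borel sets relative to $A$).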



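3.11. Monotone classes. A family $\mathcal{M} \subset \mathcal{P}(X)$ is called a monotone class if it is stable under countable unions and countable intersections, i.e.
    $(A_j)_{j \in \mathbb{N}} \subset \mathcal{M} \implies \bigcup_{j \in \mathbb{N}} A_j,\ \bigcap_{j \in \mathbb{N}} A_j \in \mathcal{M}$.
(i) Mimic the proof of T3.4 to show that for every $\mathcal{G} \subset \mathcal{P}(X)$ there is a smallest monotone class $\mathfrak{m}(\mathcal{G})$ containing $\mathcal{G}$.
(ii) Assume that $\emptyset \in \mathcal{G}$ and that $E \in \mathcal{G} \implies E^c \in \mathcal{G}$. Show that the system $\Sigma := \{B \in \mathfrak{m}(\mathcal{G}) : B^c \in \mathfrak{m}(\mathcal{G})\}$ is a $\sigma$-algebra.
(iii) Show that in (ii) $\mathcal{G} \subset \Sigma \subset \mathfrak{m}(\mathcal{G}) \subset \sigma(\mathcal{G})$ holds and conclude that $\mathfrak{m}(\mathcal{G}) = \sigma(\mathcal{G})$.

3.12. Alternative characterization of $\mathcal{B}(\mathbb{R}^n)$. In older books the Borel sets are often introduced as the smallest family $\mathcal{M}$ of sets which is stable under countable intersections and countable unions and which contains all open sets $\mathcal{O}^n$. The purpose of this exercise is to verify that $\mathcal{M} = \mathcal{B}(\mathbb{R}^n)$. Show that
(i) $\mathcal{M}$ is well-defined and $\mathcal{M} \subset \mathcal{B}(\mathbb{R}^n)$;
(ii) $U \in \mathcal{O}^n \implies U^c \in \mathcal{M}$, i.e. $\mathcal{M}$ contains all closed sets;
(iii) $\{B \in \mathcal{M} : B^c \in \mathcal{M}\}$ is a $\sigma$-algebra;
(iv) $\mathcal{O}^n \subset \{B \in \mathcal{M} : B^c \in \mathcal{M}\} \subset \mathcal{M}$.
[Hints: (i) – mimic T3.4(ii); (ii) – every closed set $F$ is the intersection of the open sets $U_n := F + B_{1/n}(0) := \{y : |x - y| < 1/n \text{ for some } x \in F\}$, $n \in \mathbb{N}$.]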

4 Measures

We are now ready to introduce one of the central concepts of measure and integration theory: measures. As before, $X$ is some set and $\mathcal{A}$ is a $\sigma$-algebra on $X$.

4.1 Definition A (positive) measure $\mu$ on $X$ is a mapping $\mu : \mathcal{A} \to [0, \infty]$ defined on a $\sigma$-algebra $\mathcal{A}$ satisfying
    $\mu(\emptyset) = 0$    ($M_1$)
and, for any countable family of pairwise disjoint sets $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$, $\sigma$-additivity
    $\mu\Big(\dot\bigcup_{j \in \mathbb{N}} A_j\Big) = \sum_{j \in \mathbb{N}} \mu(A_j)$.    ($M_2$)
If ($M_1$), ($M_2$) hold, but $\mathcal{A}$ is not a $\sigma$-algebra, $\mu$ is said to be a pre-measure.

Caution: ($M_2$) requires implicitly that $\dot\bigcup_j A_j$ is again in $\mathcal{A}$ – this is clearly the case for $\sigma$-algebras, but needs special attention if one deals with pre-measures.

4.2 Definition Let $X$ be a set and $\mathcal{A}$ be a $\sigma$-algebra on $X$. The pair $(X, \mathcal{A})$ is called measurable space. If $\mu$ is a measure on $X$, $(X, \mathcal{A}, \mu)$ is called measure space. A finite measure is a measure with $\mu(X) < \infty$ and a probability measure is a measure with $\mu(X) = 1$. The corresponding measure spaces are called finite measure space resp. probability space. An exhausting sequence $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ is an increasing sequence of sets $A_1 \subset A_2 \subset A_3 \subset \ldots$ such that $\bigcup_{j \in \mathbb{N}} A_j = X$. A measure $\mu$ is said to be $\sigma$-finite and $(X, \mathcal{A}, \mu)$ is called a $\sigma$-finite measure space, if $\mathcal{A}$ contains an exhausting sequence $(A_j)_{j \in \mathbb{N}}$ such that $\mu(A_j) < \infty$ for all $j \in \mathbb{N}$.

Let us derive some immediate properties of (pre-)measures.



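4.3 Proposition Let $(X, \mathcal{A}, \mu)$ be a measure space and $A, B \in \mathcal{A}$. Then
(i) $A \cap B = \emptyset \implies \mu(A \mathbin{\dot\cup} B) = \mu(A) + \mu(B)$ (finitely additive)
(ii) $A \subset B \implies \mu(A) \le \mu(B)$ (monotone)
(iii) $A \subset B,\ \mu(A) < \infty \implies \mu(B \setminus A) = \mu(B) - \mu(A)$
(iv) $\mu(A \cup B) + \mu(A \cap B) = \mu(A) + \mu(B)$ (strongly additive)
(v) $\mu(A \cup B) \le \mu(A) + \mu(B)$ (subadditive)

Proof (i) Set $A_1 := A$, $A_2 := B$, $A_3 = A_4 = \ldots := \emptyset$. Then $(A_j)_{j \in \mathbb{N}}$ is a family of pairwise disjoint sets from $\mathcal{A}$. Moreover $A \mathbin{\dot\cup} B = \dot\bigcup_{j \in \mathbb{N}} A_j$ and by ($M_2$)
    $\mu(A \mathbin{\dot\cup} B) = \mu\Big(\dot\bigcup_{j \in \mathbb{N}} A_j\Big) = \sum_{j \in \mathbb{N}} \mu(A_j) = \mu(A) + \mu(B) + \mu(\emptyset) + \ldots = \mu(A) + \mu(B)$.
(ii) If $A \subset B$, we have $B = A \mathbin{\dot\cup} (B \setminus A)$, and by (i)
    $\mu(B) = \mu(A \mathbin{\dot\cup} (B \setminus A)) = \mu(A) + \mu(B \setminus A)$    (4.1)
    $\ge \mu(A)$.    (4.2)
(iii) If $A \subset B$, we can subtract the finite number $\mu(A)$ from both sides of (4.1) to get $\mu(B) - \mu(A) = \mu(B \setminus A)$.
(iv) For all $A, B \in \mathcal{A}$ we have
    $A \cup B = (A \setminus (A \cap B)) \mathbin{\dot\cup} (A \cap B) \mathbin{\dot\cup} (B \setminus (A \cap B))$
and using (i) twice we get
    $\mu(A \cup B) = \mu(A \setminus (A \cap B)) + \mu(A \cap B) + \mu(B \setminus (A \cap B))$.
Adding $\mu(A \cap B)$ (which may assume the value $+\infty$) on both sides and using again (4.1) yields
    $\mu(A \cup B) + \mu(A \cap B) = \mu(A \setminus (A \cap B)) + \mu(A \cap B) + \mu(B \setminus (A \cap B)) + \mu(A \cap B) = \mu(A) + \mu(B)$.
(v) From (iv) we get $\mu(A) + \mu(B) = \mu(A \cup B) + \mu(A \cap B) \ge \mu(A \cup B)$ for all $A, B \in \mathcal{A}$.

So far we have not really used the $\sigma$-additivity of $\mu$ in its full strength. The next theorem shows that $\sigma$-additivity is, in fact, some kind of continuity condition for (pre-)measures.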
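The identities of Proposition 4.3 can be checked mechanically on a small example. The sketch below is not part of the text: it assumes a hypothetical weighted measure on the four-point set $\{1, 2, 3, 4\}$ (the weights are made up for illustration) and tests (i)–(v) for every pair of subsets, using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical point weights of a discrete measure on X = {1, 2, 3, 4}.
WEIGHTS = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 8), 4: Fraction(1, 8)}


def mu(A):
    """Measure of A: the sum of the weights of its points."""
    return sum((WEIGHTS[x] for x in A), Fraction(0))


def subsets(X):
    """All subsets of the finite set X, as frozensets."""
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def check_proposition_4_3(X):
    """Return True if (i)-(v) of Proposition 4.3 hold for all A, B in P(X)."""
    for A in subsets(X):
        for B in subsets(X):
            if not (A & B) and mu(A | B) != mu(A) + mu(B):   # (i)
                return False
            if A <= B and not mu(A) <= mu(B):                 # (ii)
                return False
            if A <= B and mu(B - A) != mu(B) - mu(A):         # (iii)
                return False
            if mu(A | B) + mu(A & B) != mu(A) + mu(B):        # (iv)
                return False
            if not mu(A | B) <= mu(A) + mu(B):                # (v)
                return False
    return True


print(check_proposition_4_3(WEIGHTS))  # True
```

Such a check is of course no substitute for the proof; it only confirms the statements on one finite example where all values are finite.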


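We call a sequence of sets $(A_j)_{j \in \mathbb{N}}$ increasing, if $A_1 \subset A_2 \subset A_3 \subset \ldots$ and we write in this case $A_j \uparrow A$ with limit $A = \bigcup_j A_j$. Decreasing sequences of sets are defined accordingly and we write $A_j \downarrow A$ with limit $A = \bigcap_j A_j$. All $\sigma$-algebras are stable under increasing or decreasing limits of their members.

4.4 Theorem Let $(X, \mathcal{A})$ be a measurable space. A map $\mu : \mathcal{A} \to [0, \infty]$ is a measure if, and only if,
(i) $\mu(\emptyset) = 0$,
(ii) $\mu(A \mathbin{\dot\cup} B) = \mu(A) + \mu(B)$ for all $A, B \in \mathcal{A}$ with $A \cap B = \emptyset$,
(iii) (continuity of measures from below) for any increasing sequence $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ with $A_j \uparrow A \in \mathcal{A}$ we have
    $\mu(A) = \lim_{j \to \infty} \mu(A_j) = \sup_{j \in \mathbb{N}} \mu(A_j)$.
If $\mu(A) < \infty$ for all $A \in \mathcal{A}$, (iii) can be replaced by either of the following equivalent conditions
(iii′) (continuity of measures from above) for any decreasing sequence $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ with $A_j \downarrow A \in \mathcal{A}$ we have
    $\mu(A) = \lim_{j \to \infty} \mu(A_j) = \inf_{j \in \mathbb{N}} \mu(A_j)$;
(iii″) (continuity of measures at $\emptyset$) for any decreasing sequence $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ with $A_j \downarrow \emptyset$ we have
    $\lim_{j \to \infty} \mu(A_j) = 0$.

4.5 Remark With some obvious rewordings, P4.3 and T4.4 are still valid for pre-measures, i.e. for families $\mathcal{A}$ which are not $\sigma$-algebras. Of course, one has to make sure that $\emptyset \in \mathcal{A}$ and that $\mathcal{A}$ is stable under finite unions, intersections and differences of sets¹ (for P4.3) and, for T4.4, that increasing and decreasing sequences of the sets under consideration have their limits in $\mathcal{A}$. The proofs are literally the same.

¹ Such a family is called a ring of sets.

Proof (of Theorem 4.4) Let us, first of all, check that every measure $\mu$ enjoys all the properties (i)–(iii) and (iii′), (iii″). Property (i) is clear from the definition of a measure and (ii) follows from P4.3(i). Let $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ be an increasing sequence of sets $A_j \uparrow A$ and set
    $B_1 := A_1$,  $B_{j+1} := A_{j+1} \setminus A_j$.

[Figure: the pairwise disjoint sets $B_1$, $B_2$, $B_3$.]

Obviously, $B_j \in \mathcal{A}$, the $B_j$ are pairwise disjoint, $A_k = \dot\bigcup_{j=1}^{k} B_j$ and $\bigcup_{k \in \mathbb{N}} A_k = \dot\bigcup_{j \in \mathbb{N}} B_j = A$. Thus
    $\mu(A) = \mu\Big(\dot\bigcup_{j \in \mathbb{N}} B_j\Big) = \sum_{j \in \mathbb{N}} \mu(B_j) = \lim_{k \to \infty} \sum_{j=1}^{k} \mu(B_j) = \lim_{k \to \infty} \mu(B_1 \mathbin{\dot\cup} \ldots \mathbin{\dot\cup} B_k) = \lim_{k \to \infty} \mu(A_k)$.
If $A_j \downarrow A$ we see easily that $A_1 \setminus A_j \uparrow A_1 \setminus A$ as $j \to \infty$. Since $\mu(A_1) < \infty$, the previous argument shows that
    $\mu(A_1 \setminus A) = \lim_{j \to \infty} \mu(A_1 \setminus A_j) = \lim_{j \to \infty} \big(\mu(A_1) - \mu(A_j)\big) = \mu(A_1) - \lim_{j \to \infty} \mu(A_j)$.
This means that $\mu(A_1) - \mu(A) = \mu(A_1) - \lim_{j \to \infty} \mu(A_j)$ and (iii′) follows. If we take, in particular, $A = \emptyset$, the above calculation also proves (iii″).

Let us now assume that (i)–(iii) hold for the set-function $\mu : \mathcal{A} \to [0, \infty]$. In order to see that $\mu$ is a measure, we have to check ($M_2$). For this take a sequence $(B_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ of pairwise disjoint sets and define
    $A_k := B_1 \mathbin{\dot\cup} \ldots \mathbin{\dot\cup} B_k \in \mathcal{A}$,  $A := \bigcup_{k \in \mathbb{N}} A_k = \dot\bigcup_{j \in \mathbb{N}} B_j$.    (4.3)
Clearly $A_k \uparrow A$, and using repeatedly property (ii) we get $\mu(A_k) = \mu(B_1) + \cdots + \mu(B_k)$. From (iii) we conclude
    $\mu(A) = \lim_{k \to \infty} \mu(A_k) = \lim_{k \to \infty} \sum_{j=1}^{k} \mu(B_j) = \sum_{j \in \mathbb{N}} \mu(B_j)$.
Finally assume that $\mu(A) < \infty$ for all $A \in \mathcal{A}$ and that (i), (ii) and (iii′) or (iii″) hold. We will show that under the finiteness assumption (iii′)⇒(iii″)⇒(iii); the assertion follows then from the considerations of the first part of the proof.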


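For (iii′)⇒(iii″) there is nothing to show. For the remaining implication take a sequence $(B_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ of pairwise disjoint sets and define sets $A_k$ and $A$ as in (4.3). Then $A \setminus A_k \downarrow \emptyset$ and from (iii″) we conclude that $\lim_{k \to \infty} \mu(A \setminus A_k) = 0$. Since $\mu(A_k) < \infty$ we have $\mu(A \setminus A_k) = \mu(A) - \mu(A_k)$, so we get $\mu(A) = \lim_{k \to \infty} \mu(A_k)$ and (iii) follows.

4.6 Corollary Every measure [pre-measure] $\mu$ is $\sigma$-subadditive, i.e.
    $\mu\Big(\bigcup_{j \in \mathbb{N}} A_j\Big) \le \sum_{j \in \mathbb{N}} \mu(A_j)$    (4.4)
holds for all sequences $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ of not necessarily disjoint sets [such that² $\bigcup_{j \in \mathbb{N}} A_j \in \mathcal{A}$].

² This is automatically fulfilled for a measure on a $\sigma$-algebra.

Proof Since the arguments are virtually the same, we may assume that $\mathcal{A}$ is a $\sigma$-algebra, so that $\mu$ becomes a measure. Set $B_k := A_1 \cup \ldots \cup A_k \uparrow \bigcup_{j \in \mathbb{N}} A_j$ as $k \to \infty$. By T4.4(iii) and repeated applications of P4.3(v),
    $\mu\Big(\bigcup_{j \in \mathbb{N}} A_j\Big) = \lim_{k \to \infty} \mu(A_1 \cup \ldots \cup A_k) \le \lim_{k \to \infty} \big(\mu(A_1) + \cdots + \mu(A_k)\big) = \sum_{j \in \mathbb{N}} \mu(A_j)$.

It is about time to give some examples of measures. At this stage this is, unfortunately, a somewhat difficult task! The main problem is that we have to explain for every set of the $\sigma$-algebra what its measure $\mu(A)$ shall be. Since $\mathcal{A}$ can be very large – see Remark 3.10 – this is, in general, only (explicitly!) possible if either $\mu$ or $\mathcal{A}$ is very simple. Nevertheless ...

4.7 Examples
(i) (Dirac measure, unit mass) Let $(X, \mathcal{A})$ be any measurable space and let $x \in X$ be some point. Then $\delta_x : \mathcal{A} \to \{0, 1\}$, defined for $A \in \mathcal{A}$ by
    $\delta_x(A) := 0$ if $x \notin A$,  $\delta_x(A) := 1$ if $x \in A$,
is a measure. It is called Dirac's delta measure or unit mass at the point $x$.
(ii) Consider $(\mathbb{R}, \mathcal{A})$ with $\mathcal{A}$ from Example 3.3(v) (i.e. $A \in \mathcal{A}$ if $A$ or $A^c$ is countable). Then $\gamma : \mathcal{A} \to \{0, 1\}$, defined for $A \in \mathcal{A}$ by
    $\gamma(A) := 0$ if $A$ is countable,  $\gamma(A) := 1$ if $A^c$ is countable,
is a measure.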



(iii) (Counting measure) Let $(X, \mathcal{A})$ be a measurable space. Then
    $|A| := \#A$ if $A$ is finite,  $|A| := +\infty$ if $A$ is infinite,
defines a measure. It is called counting measure.
(iv) (Discrete probability measure) Let $\Omega = \{\omega_1, \omega_2, \ldots\}$ be a countable set and $(p_j)_{j \in \mathbb{N}}$ be a sequence of real numbers $p_j \in [0, 1]$ such that $\sum_{j \in \mathbb{N}} p_j = 1$. On $(\Omega, \mathcal{P}(\Omega))$ the set-function
    $P(A) := \sum_{j : \omega_j \in A} p_j = \sum_{j \in \mathbb{N}} p_j\, \delta_{\omega_j}(A)$,  $A \subset \Omega$,
defines a probability measure. The triplet $(\Omega, \mathcal{P}(\Omega), P)$ is called discrete probability space.
(v) (Trivial measures) Let $(X, \mathcal{A})$ be a measurable space. Then
    $\mu(A) := 0$ if $A = \emptyset$, $\mu(A) := +\infty$ if $A \ne \emptyset$,  and  $\nu(A) := 0$, $A \in \mathcal{A}$,
are measures.

Note that our list of examples does not include the most familiar of all measures: length, area and volume.

4.8 Definition The set-function $\lambda^n$ on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ that assigns every half-open rectangle $[[a, b)) = [a_1, b_1) \times \cdots \times [a_n, b_n) \in \mathcal{J}$ the value
    $\lambda^n([[a, b))) := \prod_{j=1}^{n} (b_j - a_j)$
is called n-dimensional Lebesgue measure.

The problem here is that we do not know whether $\lambda^n$ is a measure in the sense of Definition 4.1: $\lambda^n$ is only explicitly given on the half-open rectangles $\mathcal{J}$ and it is not obvious at all that $\lambda^n$ is a pre-measure on $\mathcal{J}$; much less clear is the question if and how we can extend this pre-measure from $\mathcal{J}$ to a proper measure on $\sigma(\mathcal{J})$. Over the next few chapters we will see that such an extension is indeed possible. But this requires some extra work and a more abstract approach. One of the main obstacles is, of course, that $\sigma(\mathcal{J})$ cannot be obtained by a bare-hands construction from $\mathcal{J}$. Let us, meanwhile, note the upshot of what will be proved in the next chapters.
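On the rectangles themselves the formula of Definition 4.8 is easy to compute with. The following Python sketch is an illustration with hypothetical function names: it evaluates $\lambda^n([[a, b)))$ with the convention that a rectangle is empty as soon as $b_j \le a_j$ for some $j$, and checks on an example that cutting a rectangle into two disjoint half-open pieces preserves the total volume, which is what one expects from a pre-measure on $\mathcal{J}$.

```python
from fractions import Fraction


def volume(a, b):
    """Volume of the half-open rectangle [a_1, b_1) x ... x [a_n, b_n).

    Follows the convention that the rectangle is empty (volume 0)
    as soon as b_j <= a_j for some coordinate j.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same dimension")
    result = Fraction(1)
    for aj, bj in zip(a, b):
        if bj <= aj:
            return Fraction(0)
        result *= Fraction(bj) - Fraction(aj)
    return result


def split(a, b, axis, c):
    """Cut [[a, b)) at coordinate c along the given axis.

    Returns two rectangles (a, b') and (a', b) that are disjoint and whose
    union is [[a, b)), provided a[axis] <= c <= b[axis].
    """
    left_b = list(b)
    left_b[axis] = c
    right_a = list(a)
    right_a[axis] = c
    return (list(a), left_b), (right_a, list(b))


a, b = (0, 0), (2, 3)
(la, lb), (ra, rb) = split(a, b, axis=0, c=Fraction(1, 2))
print(volume(a, b))                      # 6
print(volume(la, lb) + volume(ra, rb))   # 6
```

Half-open rectangles are convenient here precisely because the two pieces $[a_1, c) \times \cdots$ and $[c, b_1) \times \cdots$ share no boundary points.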


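4.9 Theorem Lebesgue measure $\lambda^n$ exists, is a measure on the Borel sets $\mathcal{B}(\mathbb{R}^n)$ and is unique. Moreover, $\lambda^n$ enjoys the following additional properties for $B \in \mathcal{B}(\mathbb{R}^n)$:
(i) $\lambda^n$ is invariant under translations: $\lambda^n(x + B) = \lambda^n(B)$, $x \in \mathbb{R}^n$;
(ii) $\lambda^n$ is invariant under motions: $\lambda^n(R^{-1}(B)) = \lambda^n(B)$ where $R$ is a motion, i.e. a combination of translations, rotations and reflections;
(iii) $\lambda^n(M^{-1}(B)) = |\det M|^{-1} \lambda^n(B)$ for any invertible matrix $M \in \mathbb{R}^{n \times n}$.

The attentive reader will have noticed that the sets $x + B := \{x + y : y \in B\}$, $R^{-1}(B) := \{R^{-1}(y) : y \in B\}$ and $M^{-1}(B)$ must again be Borel sets, otherwise the statement of T4.9 would be senseless, cf. T5.8 and Chapter 7.

Problems

4.1. Extend Proposition 4.3(i), (iv) and (v) to finitely many sets $A_1, A_2, \ldots, A_N \in \mathcal{A}$.

4.2. Check that the set-functions defined in Example 4.7 are measures in the sense of Definition 4.1.

4.3. Is the set-function $\gamma$ of 4.7 (ii) still a measure on the measurable space $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$? And on the measurable space $(\mathbb{Q}, \mathbb{Q} \cap \mathcal{B}(\mathbb{R}))$?

4.4. Let $X = \mathbb{R}$. For which $\sigma$-algebras are the following set-functions measures:
(i) $\mu(A) = 0$ if $A = \emptyset$, $\mu(A) = 1$ if $A \ne \emptyset$;
(ii) $\nu(A) = 0$ if $A$ is finite, $\nu(A) = 1$ if $A^c$ is finite?

4.5. Find an example showing that the finiteness condition in Theorem 4.4 (iii′) or (iii″) is essential. [Hint: use Lebesgue measure or the counting measure on infinite tails $[k, \infty) \downarrow \emptyset$.]

4.6. Let $(X, \mathcal{A})$ be a measurable space.
(i) Let $\mu, \nu$ be two measures on $(X, \mathcal{A})$. Show that for all $a, b \ge 0$ the set-function $\rho(A) := a\mu(A) + b\nu(A)$, $A \in \mathcal{A}$, is again a measure.
(ii) Let $\mu_1, \mu_2, \ldots$ be countably many measures on $(X, \mathcal{A})$ and let $(\alpha_j)_{j \in \mathbb{N}}$ be a sequence of positive numbers. Show that $\mu(A) := \sum_{j=1}^{\infty} \alpha_j \mu_j(A)$, $A \in \mathcal{A}$, is again a measure.
[Hint: to show $\sigma$-additivity use (and prove) the following helpful lemma: for any double sequence $\beta_{ij}$, $i, j \in \mathbb{N}$, of real numbers we have
    $\sup_{i \in \mathbb{N}} \sup_{j \in \mathbb{N}} \beta_{ij} = \sup_{j \in \mathbb{N}} \sup_{i \in \mathbb{N}} \beta_{ij}$.
Thus $\lim_{i \to \infty} \lim_{j \to \infty} \beta_{ij} = \lim_{j \to \infty} \lim_{i \to \infty} \beta_{ij}$ if $i \mapsto \beta_{ij}$, and $j \mapsto \beta_{ij}$ increases when the other index is fixed.]

4.7. Let $(X, \mathcal{A}, \mu)$ be a measure space and $F \in \mathcal{A}$. Show that $\mathcal{A} \ni A \mapsto \mu(A \cap F)$ defines a measure.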



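4.8. Let $(\Omega, \mathcal{A}, P)$ be a probability space and $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ a sequence of sets with $P(A_j) = 1$ for all $j \in \mathbb{N}$. Show that $P\big(\bigcap_{j \in \mathbb{N}} A_j\big) = 1$.

4.9. Let $(X, \mathcal{A}, \mu)$ be a finite measure space and $(A_j)_{j \in \mathbb{N}}, (B_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ such that $A_j \supset B_j$ for all $j \in \mathbb{N}$. Show that
    $\mu\Big(\bigcup_{j \in \mathbb{N}} A_j\Big) - \mu\Big(\bigcup_{j \in \mathbb{N}} B_j\Big) \le \sum_{j \in \mathbb{N}} \big(\mu(A_j) - \mu(B_j)\big)$.
[Hint: show first that $\bigcup_j A_j \setminus \bigcup_k B_k \subset \bigcup_j (A_j \setminus B_j)$ then use C4.6.]

4.10. Null sets. Let $(X, \mathcal{A}, \mu)$ be a measure space. A set $N \in \mathcal{A}$ is called a null set or $\mu$-null set if $\mu(N) = 0$. We write $\mathcal{N}_\mu$ for the family of all $\mu$-null sets. Check that $\mathcal{N}_\mu$ has the following properties:
(i) $\emptyset \in \mathcal{N}_\mu$;
(ii) if $N \in \mathcal{N}_\mu$, $M \in \mathcal{A}$ and $M \subset N$ then $M \in \mathcal{N}_\mu$;
(iii) if $(N_j)_{j \in \mathbb{N}} \subset \mathcal{N}_\mu$, then $\bigcup_{j \in \mathbb{N}} N_j \in \mathcal{N}_\mu$.

4.11. Let $\lambda$ be one-dimensional Lebesgue measure.
(i) Show that for all $x \in \mathbb{R}$ the set $\{x\}$ is a Borel set with $\lambda(\{x\}) = 0$. [Hint: consider the intervals $[x - 1/k, x + 1/k)$, $k \in \mathbb{N}$ and use Theorem 4.4.]
(ii) Prove that $\mathbb{Q}$ is a Borel set and that $\lambda(\mathbb{Q}) = 0$ in two ways: a) by using the first part of the problem; b) by considering the set $C(\epsilon) := \bigcup_{k \in \mathbb{N}} [q_k - \epsilon 2^{-k}, q_k + \epsilon 2^{-k})$, where $(q_k)_{k \in \mathbb{N}}$ is an enumeration of $\mathbb{Q}$, and letting $\epsilon \to 0$.
(iii) Use the trivial fact that $[0, 1] = \bigcup_{0 \le x \le 1} \{x\}$ to show that a non-countable union of null sets (here: $\{x\}$) is not necessarily a null set.

4.12. Determine all null sets of the measure $\delta_a + \delta_b$, $a, b \in \mathbb{R}$, on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$.

4.13. Completion (1). We have seen in Problem 4.10 that measurable subsets of null sets are again null sets: $M \in \mathcal{A}$, $M \subset N \in \mathcal{A}$, $\mu(N) = 0$ then $\mu(M) = 0$; but there might be subsets of $N$ which are not in $\mathcal{A}$. This motivates the following definition: a measure space $(X, \mathcal{A}^*, \mu)$ (or a measure $\mu$) is complete if all subsets of $\mu$-null sets are again in $\mathcal{A}^*$. In other words: if all subsets of a null set are null sets. The following exercise shows that a measure space $(X, \mathcal{A}, \mu)$ which is not yet complete can be completed.
(i) $\mathcal{A}^* := \{A \cup N : A \in \mathcal{A},\ N \text{ is a subset of some } \mathcal{A}\text{-measurable null set}\}$ is a $\sigma$-algebra satisfying $\mathcal{A} \subset \mathcal{A}^*$.
(ii) $\bar\mu(A^*) := \mu(A)$ for $A^* = A \cup N \in \mathcal{A}^*$ is well-defined, i.e. it is independent of the way we can write $A^*$, say as $A^* = A \cup N = B \cup M$ where $A, B \in \mathcal{A}$ and $M, N$ are subsets of null sets.
(iii) $\bar\mu$ is a measure on $\mathcal{A}^*$ and $\bar\mu(A) = \mu(A)$ for all $A \in \mathcal{A}$.
(iv) $(X, \mathcal{A}^*, \bar\mu)$ is complete.
(v) We have $\mathcal{A}^* = \{A^* \subset X : \exists\, A, B \in \mathcal{A},\ A \subset A^* \subset B,\ \mu(B \setminus A) = 0\}$.

4.14. Restriction. Let $(X, \mathcal{A}, \mu)$ be a measure space and let $\mathcal{B} \subset \mathcal{A}$ be a sub-$\sigma$-algebra. Denote by $\nu := \mu|_{\mathcal{B}}$ the restriction of $\mu$ to $\mathcal{B}$.
(i) Show that $\nu$ is again a measure.
(ii) Assume that $\mu$ is a finite measure [a probability measure]. Is $\nu$ still a finite measure [a probability measure]?
(iii) Does $\nu$ inherit $\sigma$-finiteness from $\mu$?

4.15. Show that a measure space $(X, \mathcal{A}, \mu)$ is $\sigma$-finite if, and only if, there exists a sequence of measurable sets $(E_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ such that $\bigcup_{j \in \mathbb{N}} E_j = X$ and $\mu(E_j) < \infty$ for all $j \in \mathbb{N}$.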

5 Uniqueness of measures

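Before we embark on the proof of the existence of measures in the following chapter, let us first check whether it is enough to consider measures on some generator $\mathcal{G}$ of a $\sigma$-algebra $\mathcal{A}$ – otherwise our construction of Lebesgue measure would be flawed from the start. As mentioned in Remark 3.10 a major problem is that, apart from trivial cases, $\sigma(\mathcal{G})$ cannot be constructively obtained from $\mathcal{G}$. To overcome this obstacle we need a new concept.

5.1 Definition A family $\mathcal{D} \subset \mathcal{P}(X)$ is a Dynkin system if
    $X \in \mathcal{D}$    ($\Delta_1$)
    $D \in \mathcal{D} \implies D^c \in \mathcal{D}$    ($\Delta_2$)
    $(D_j)_{j \in \mathbb{N}} \subset \mathcal{D}$ pairwise disjoint $\implies \dot\bigcup_{j \in \mathbb{N}} D_j \in \mathcal{D}$    ($\Delta_3$)

5.2 Remark As for $\sigma$-algebras, cf. Properties 3.2, one sees that $\emptyset \in \mathcal{D}$ and that finite disjoint unions are again in $\mathcal{D}$: $D, E \in \mathcal{D}$, $D \cap E = \emptyset \implies D \mathbin{\dot\cup} E \in \mathcal{D}$. Of course, every $\sigma$-algebra is a Dynkin system, but the converse is, in general, wrong[✓], Problem 5.2.

5.3 Proposition Let $\mathcal{G} \subset \mathcal{P}(X)$. Then there is a smallest (also minimal, coarsest) Dynkin system $\delta(\mathcal{G})$ containing $\mathcal{G}$. $\delta(\mathcal{G})$ is called the Dynkin system generated by $\mathcal{G}$. Moreover, $\mathcal{G} \subset \delta(\mathcal{G}) \subset \sigma(\mathcal{G})$.

Proof The proof that $\delta(\mathcal{G})$ exists parallels the proof of T3.4(ii). As in the case of $\sigma$-algebras, $\delta(\mathcal{D}) = \mathcal{D}$ if $\mathcal{D}$ is a Dynkin system (by minimality) and so $\delta(\sigma(\mathcal{G})) = \sigma(\mathcal{G})$. Hence, $\mathcal{G} \subset \sigma(\mathcal{G})$ implies that $\delta(\mathcal{G}) \subset \delta(\sigma(\mathcal{G})) = \sigma(\mathcal{G})$.

It is important to know when a Dynkin system is already a $\sigma$-algebra.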
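A small finite example makes the difference between Dynkin systems and $\sigma$-algebras concrete. The Python sketch below (hypothetical helper names; it assumes $X$ is finite, so that countable disjoint unions reduce to pairwise disjoint unions) checks that the subsets of $\{1, 2, 3, 4\}$ with an even number of elements satisfy ($\Delta_1$)–($\Delta_3$) but are not stable under intersections, so they cannot form a $\sigma$-algebra.

```python
from itertools import combinations


def subsets(X):
    """All subsets of the finite set X, as frozensets."""
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_dynkin_system(family, X):
    """Check (Delta1)-(Delta3) on a finite set X.

    Assumption: X is finite, so it suffices to test pairwise disjoint unions.
    """
    X = frozenset(X)
    fam = {frozenset(D) for D in family}
    if X not in fam:                                    # (Delta1)
        return False
    if any(X - D not in fam for D in fam):              # (Delta2)
        return False
    return all(D | E in fam                             # (Delta3)
               for D in fam for E in fam if not (D & E))


def is_intersection_stable(family):
    """True if D, E in the family implies that D & E is in the family."""
    fam = {frozenset(D) for D in family}
    return all(D & E in fam for D in fam for E in fam)


X = {1, 2, 3, 4}
even = [A for A in subsets(X) if len(A) % 2 == 0]
print(is_dynkin_system(even, X))       # True
print(is_intersection_stable(even))    # False: {1, 2} & {2, 3} = {2}
```

The failure of intersection stability is exactly the property that separates the two notions.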


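5.4 Lemma A Dynkin system $\mathcal{D}$ is a $\sigma$-algebra if, and only if, it is stable under finite intersections:¹ $D, E \in \mathcal{D} \implies D \cap E \in \mathcal{D}$.

¹ $\cap$-stable, for short.

Proof Since a $\sigma$-algebra is $\cap$-stable (cf. Properties 3.2, Problem 3.1) as well as a Dynkin system (Remark 5.2) it only remains to show that a $\cap$-stable Dynkin system $\mathcal{D}$ is a $\sigma$-algebra. Let $(D_j)_{j \in \mathbb{N}}$ be a sequence of subsets in $\mathcal{D}$. We have to show that $D := \bigcup_{j \in \mathbb{N}} D_j \in \mathcal{D}$. Set $E_1 := D_1 \in \mathcal{D}$ and
    $E_{j+1} := (\cdots((D_{j+1} \setminus D_j) \setminus D_{j-1}) \setminus \cdots) \setminus D_1 = D_{j+1} \cap D_j^c \cap D_{j-1}^c \cap \cdots \cap D_1^c \in \mathcal{D}$
where we used ($\Delta_2$) and the assumed $\cap$-stability of $\mathcal{D}$. The $E_j$ are obviously mutually disjoint and $D = \dot\bigcup_{j \in \mathbb{N}} E_j \in \mathcal{D}$ by ($\Delta_3$).

Lemma 5.4 is not applicable if $\mathcal{D}$ is given in terms of a generator $\mathcal{G}$, which is often the case. The next theorem is very important for applications as it extends Lemma 5.4 to the much more convenient setting of generators.

5.5 Theorem If $\mathcal{G} \subset \mathcal{P}(X)$ is stable under finite intersections, then $\delta(\mathcal{G}) = \sigma(\mathcal{G})$.

Proof We have already established $\delta(\mathcal{G}) \subset \sigma(\mathcal{G})$ in P5.3. If we knew that $\delta(\mathcal{G})$ were a $\sigma$-algebra, the minimality of $\sigma(\mathcal{G})$ and $\mathcal{G} \subset \delta(\mathcal{G})$ would immediately imply $\sigma(\mathcal{G}) \subset \delta(\mathcal{G})$, hence equality. In view of L5.4 it is enough to show that $\delta(\mathcal{G})$ is $\cap$-stable. For this we fix some $D \in \delta(\mathcal{G})$ and introduce the family
    $\mathcal{D}_D := \{Q \subset X : Q \cap D \in \delta(\mathcal{G})\}$.
Let us check that $\mathcal{D}_D$ is a Dynkin system: ($\Delta_1$) is obviously true. ($\Delta_2$): take $Q \in \mathcal{D}_D$. Then
    $Q^c \cap D = (Q^c \cup D^c) \cap D = (Q \cap D)^c \cap D = \big((Q \cap D) \mathbin{\dot\cup} D^c\big)^c$,    (5.1)
where $Q \cap D \in \delta(\mathcal{G})$ and $D^c \in \delta(\mathcal{G})$, and disjoint unions of sets from $\delta(\mathcal{G})$ are still in $\delta(\mathcal{G})$. Thus $Q^c \in \mathcal{D}_D$. ($\Delta_3$): let $(Q_j)_{j \in \mathbb{N}}$ be a sequence of pairwise disjoint sets from $\mathcal{D}_D$. By definition, $(Q_j \cap D)_{j \in \mathbb{N}}$ is a disjoint sequence in $\delta(\mathcal{G})$ and ($\Delta_3$) for the Dynkin system $\delta(\mathcal{G})$ shows
    $\Big(\dot\bigcup_{j \in \mathbb{N}} Q_j\Big) \cap D = \dot\bigcup_{j \in \mathbb{N}} (Q_j \cap D) \in \delta(\mathcal{G})$
which means that $\dot\bigcup_{j \in \mathbb{N}} Q_j \in \mathcal{D}_D$.

Since $\mathcal{G} \subset \delta(\mathcal{G})$ and since $\mathcal{G}$ is $\cap$-stable, we have $\mathcal{G} \subset \mathcal{D}_G$ for all $G \in \mathcal{G}$.[✓] But $\mathcal{D}_G$ is a Dynkin system and so $\delta(\mathcal{G}) \subset \mathcal{D}_G$ for all $G \in \mathcal{G}$ (use P5.3, Problem 5.4). Consequently, if $D \in \delta(\mathcal{G})$ and $G \in \mathcal{G}$ we find because of $\delta(\mathcal{G}) \subset \mathcal{D}_G$ and the very definition of $\mathcal{D}_G$ that
    $G \cap D \in \delta(\mathcal{G})$  $\forall\, G \in \mathcal{G}$, $\forall\, D \in \delta(\mathcal{G})$,
so
    $\mathcal{G} \subset \mathcal{D}_D$  $\forall\, D \in \delta(\mathcal{G})$
and
    $\delta(\mathcal{G}) \subset \mathcal{D}_D$  $\forall\, D \in \delta(\mathcal{G})$.
The latter just says that $\delta(\mathcal{G})$ is stable under intersections with $D \in \delta(\mathcal{G})$. By Lemma 5.4 $\delta(\mathcal{G})$ is a $\sigma$-algebra and the theorem is proved.

5.6 Remark The technique used in the proof of Theorem 5.5 is an extremely important and powerful tool. We will use it almost exclusively in this chapter to prove the uniqueness of measures theorem and some properties of Lebesgue measure $\lambda^n$.

5.7 Theorem (Uniqueness of measures). Assume that $(X, \mathcal{A})$ is a measurable space and that $\mathcal{A} = \sigma(\mathcal{G})$ is generated by a family $\mathcal{G}$ such that
• $\mathcal{G}$ is stable under finite intersections: $G, H \in \mathcal{G} \implies G \cap H \in \mathcal{G}$;
• there exists an exhausting sequence $(G_j)_{j \in \mathbb{N}} \subset \mathcal{G}$ with $G_j \uparrow X$.
Any two measures $\mu, \nu$ that coincide on $\mathcal{G}$ and are finite for all members of the exhausting sequence, $\mu(G_j) = \nu(G_j) < \infty$, are equal on $\mathcal{A}$, i.e. $\mu(A) = \nu(A)$ for all $A \in \mathcal{A}$.

Proof For $j \in \mathbb{N}$ we define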
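The role of $\cap$-stability in Theorem 5.5 can be observed on a finite set. The following Python sketch is an illustration under the assumption that $X$ is finite, so that $\delta(\mathcal{G})$ and $\sigma(\mathcal{G})$ can be computed by closing the generator under the defining operations; the function names are hypothetical. It compares the two families for a generator that is not $\cap$-stable and for the same generator with the missing intersection added.

```python
def _close(generator, X, union_allowed):
    """Close {X} and the generator under complements and admissible unions."""
    X = frozenset(X)
    fam = {X} | {frozenset(G) for G in generator}
    while True:
        new = {X - A for A in fam}
        new |= {A | B for A in fam for B in fam if union_allowed(A, B)}
        if new <= fam:
            return fam
        fam |= new


def generated_sigma_algebra(generator, X):
    """sigma(G) on a finite X: all pairwise unions are admissible."""
    return _close(generator, X, lambda A, B: True)


def generated_dynkin_system(generator, X):
    """delta(G) on a finite X: only unions of disjoint sets are admissible."""
    return _close(generator, X, lambda A, B: not (A & B))


X = {1, 2, 3, 4}
G1 = [{1, 2}, {2, 3}]          # not stable under intersections
G2 = [{1, 2}, {2, 3}, {2}]     # G1 with the intersection {2} added

print(len(generated_dynkin_system(G1, X)), len(generated_sigma_algebra(G1, X)))
print(generated_dynkin_system(G2, X) == generated_sigma_algebra(G2, X))
```

For `G1` the Dynkin system has 6 members while the $\sigma$-algebra is the whole power set with 16 members; for the $\cap$-stable generator `G2` both families coincide, as the theorem predicts.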

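    $\mathcal{D}_j := \{A \in \mathcal{A} : \mu(G_j \cap A) = \nu(G_j \cap A)\ (< \infty\,!)\}$
and we claim that every $\mathcal{D}_j$ is a Dynkin system. ($\Delta_1$) is clear. ($\Delta_2$): if $A \in \mathcal{D}_j$ we have
    $\mu(G_j \cap A^c) = \mu(G_j \setminus A) = \mu(G_j) - \mu(G_j \cap A) = \nu(G_j) - \nu(G_j \cap A) = \nu(G_j \setminus A) = \nu(G_j \cap A^c)$
so that $A^c \in \mathcal{D}_j$. ($\Delta_3$): if $(A_k)_{k \in \mathbb{N}} \subset \mathcal{D}_j$ are mutually disjoint sets, we get
    $\mu\Big(G_j \cap \dot\bigcup_{k \in \mathbb{N}} A_k\Big) = \mu\Big(\dot\bigcup_{k \in \mathbb{N}} (G_j \cap A_k)\Big) = \sum_{k \in \mathbb{N}} \mu(G_j \cap A_k) = \sum_{k \in \mathbb{N}} \nu(G_j \cap A_k) = \nu\Big(\dot\bigcup_{k \in \mathbb{N}} (G_j \cap A_k)\Big) = \nu\Big(G_j \cap \dot\bigcup_{k \in \mathbb{N}} A_k\Big)$
and $\dot\bigcup_{k \in \mathbb{N}} A_k \in \mathcal{D}_j$ follows.

Since $\mathcal{G}$ is $\cap$-stable, we know from T5.5 that $\delta(\mathcal{G}) = \sigma(\mathcal{G})$; therefore,
    $\mathcal{D}_j \supset \mathcal{G} \implies \mathcal{D}_j \supset \delta(\mathcal{G}) = \sigma(\mathcal{G})$  $\forall\, j \in \mathbb{N}$.
On the other hand, $\mathcal{A} = \sigma(\mathcal{G}) \subset \mathcal{D}_j \subset \mathcal{A}$, which means that $\mathcal{A} = \mathcal{D}_j$ for all $j \in \mathbb{N}$, and so
    $\mu(G_j \cap A) = \nu(G_j \cap A)$  $\forall\, j \in \mathbb{N}$, $\forall\, A \in \mathcal{A}$.    (5.2)
Using T4.4(iii) we can let $j \to \infty$ in (5.2) to get
    $\mu(A) = \lim_{j \to \infty} \mu(G_j \cap A) = \lim_{j \to \infty} \nu(G_j \cap A) = \nu(A)$  $\forall\, A \in \mathcal{A}$.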
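The assumption that the generator is $\cap$-stable cannot simply be dropped from Theorem 5.7. The following Python sketch gives a standard kind of counterexample on the four-point set $\{1, 2, 3, 4\}$; the two weight tables are hypothetical choices made for this illustration. Both measures are probability measures, they agree on the generator $\{\{1,2\}, \{2,3\}, X\}$ (which contains the trivial exhausting sequence $X$), and this generator produces the full power set, yet the measures differ on $\{2\}$.

```python
from fractions import Fraction

X = frozenset({1, 2, 3, 4})

# Two hypothetical probability measures on P(X), given by their point weights.
MU = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 4), 4: Fraction(1, 4)}
NU = {1: Fraction(1, 2), 2: Fraction(0), 3: Fraction(1, 2), 4: Fraction(0)}

# Generator that is NOT stable under intersections: {1,2} & {2,3} = {2} is missing.
GENERATOR = [frozenset({1, 2}), frozenset({2, 3}), X]


def measure(weights, A):
    """Measure of A for the discrete measure with the given point weights."""
    return sum((weights[x] for x in A), Fraction(0))


agree_on_generator = all(measure(MU, G) == measure(NU, G) for G in GENERATOR)
print(agree_on_generator)                      # True
print(measure(MU, {2}), measure(NU, {2}))      # 1/4 0
```

Adding the intersection $\{2\}$ to the generator would make it $\cap$-stable, and then agreement on the generator would already fail, so there is no conflict with the theorem.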

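The following two theorems show why Lebesgue measure (if it exists) plays a very special rôle indeed.

5.8 Theorem (i) n-dimensional Lebesgue measure $\lambda^n$ is invariant under translations, i.e.
    $\lambda^n(x + B) = \lambda^n(B)$  $\forall\, x \in \mathbb{R}^n$, $\forall\, B \in \mathcal{B}(\mathbb{R}^n)$.    (5.3)
(ii) Every measure $\mu$ on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ which is invariant under translations and satisfies $\kappa = \mu([0, 1)^n) < \infty$ is a multiple of Lebesgue measure: $\mu = \kappa \lambda^n$.

Proof First of all we should convince ourselves that
    $B \in \mathcal{B}(\mathbb{R}^n) \implies x + B \in \mathcal{B}(\mathbb{R}^n)$  $\forall\, x \in \mathbb{R}^n$    (5.4)
otherwise the statement of T5.8 would be senseless. For this set
    $\mathcal{A}_x := \{B \in \mathcal{B}(\mathbb{R}^n) : x + B \in \mathcal{B}(\mathbb{R}^n)\} \subset \mathcal{B}(\mathbb{R}^n)$.
It is clear that $\mathcal{A}_x$ is a $\sigma$-algebra and that $\mathcal{J} \subset \mathcal{A}_x$.[✓] Hence, $\mathcal{B}(\mathbb{R}^n) = \sigma(\mathcal{J}) \subset \mathcal{A}_x \subset \mathcal{B}(\mathbb{R}^n)$ and (5.4) follows. We can now start the proof proper.

(i) Set $\nu(B) := \lambda^n(x + B)$ for some fixed $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$. It is easy to check that $\nu$ is a measure on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$[✓]. Take $I = [a_1, b_1) \times \cdots \times [a_n, b_n) \in \mathcal{J}$ and observe that
    $x + I = [a_1 + x_1, b_1 + x_1) \times \cdots \times [a_n + x_n, b_n + x_n) \in \mathcal{J}$
so that
    $\nu(I) = \lambda^n(x + I) = \prod_{j=1}^{n} \big((b_j + x_j) - (a_j + x_j)\big) = \prod_{j=1}^{n} (b_j - a_j) = \lambda^n(I)$.
This means that $\nu|_{\mathcal{J}} = \lambda^n|_{\mathcal{J}}$.² But $\mathcal{J}$ is $\cap$-stable,³ generates $\mathcal{B}(\mathbb{R}^n)$ and admits the exhausting sequence
    $[-k, k)^n \uparrow \mathbb{R}^n$,  $\nu([-k, k)^n) = \lambda^n([-k, k)^n) = (2k)^n < \infty$.
We can now invoke the uniqueness theorem T5.7 to see that $\lambda^n = \nu$ on the whole of $\mathcal{B}(\mathbb{R}^n)$.

(ii) Take $I \in \mathcal{J}$ as in part (i) but with rational endpoints $a_j, b_j \in \mathbb{Q}$. Thus there is some $M \in \mathbb{N}$ and $k(I) \in \mathbb{N}$ and points $x^j \in \mathbb{R}^n$, such that
    $I = \dot\bigcup_{j=1}^{k(I)} \big(x^j + [0, \tfrac{1}{M})^n\big)$,
i.e. we pave the rectangle $I$ by little squares $[0, \tfrac{1}{M})^n$ of side-length $1/M$, where $M$ is, say, the common denominator of all $a_j$ and $b_j$. Using the translation invariance of $\mu$ and $\lambda^n$, we see
    $\mu(I) = k(I)\, \mu\big([0, \tfrac{1}{M})^n\big)$,  $\mu([0, 1)^n) = M^n \mu\big([0, \tfrac{1}{M})^n\big)$,
    $\lambda^n(I) = k(I)\, \lambda^n\big([0, \tfrac{1}{M})^n\big)$,  $\lambda^n([0, 1)^n) = M^n \lambda^n\big([0, \tfrac{1}{M})^n\big)$, where $\lambda^n([0, 1)^n) = 1$,
and dividing the top two and bottom two equalities gives
    $\mu(I) = \frac{k(I)}{M^n}\, \mu([0, 1)^n)$,  $\lambda^n(I) = \frac{k(I)}{M^n}\, \lambda^n([0, 1)^n) = \frac{k(I)}{M^n}$.
Thus $\mu(I) = \mu([0, 1)^n)\, \lambda^n(I) = \kappa \lambda^n(I)$ for all $I \in \mathcal{J}_{rat}$ and, as in part (i), an application of T5.7 finishes the proof.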
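The paving argument in part (ii) is a finite computation for any concrete rectangle with rational endpoints. The Python sketch below (hypothetical function name, exact arithmetic via `fractions`) computes a common denominator $M$ of the endpoints and the number $k(I)$ of cubes of side-length $1/M$ needed to pave $I$, so that $k(I)/M^n$ can be compared with the product formula for $\lambda^n(I)$.

```python
from fractions import Fraction
from math import gcd


def paving(a, b):
    """Return (M, k) for the rectangle [a_1, b_1) x ... x [a_n, b_n).

    M is a common denominator of all (rational) endpoints and k is the number
    of half-open cubes of side-length 1/M that pave the rectangle.
    """
    a = [Fraction(x) for x in a]
    b = [Fraction(x) for x in b]
    M = 1
    for x in a + b:
        M = M * x.denominator // gcd(M, x.denominator)  # least common multiple
    k = 1
    for aj, bj in zip(a, b):
        cells = (bj - aj) * M            # whole number of cells along this axis
        k *= max(int(cells), 0)          # empty rectangle if b_j <= a_j
    return M, k


# I = [0, 1/2) x [1/3, 1) in dimension n = 2
a = (0, Fraction(1, 3))
b = (Fraction(1, 2), 1)
M, k = paving(a, b)
print(M, k)                  # 6 12
print(Fraction(k, M ** 2))   # 1/3, which equals (1/2 - 0) * (1 - 1/3)
```

Here $I$ is paved by $3 \times 4 = 12$ squares of side-length $1/6$, and $12/36 = 1/3$ agrees with $\lambda^2(I)$.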

Incidentally, Theorem 5.8 proves Theorem 4.9(i). Further properties of Lebesgue measure will be studied in the following chapters, but first we concentrate on its existence.

² This is short for $\nu(I) = \lambda^n(I)$ $\forall\, I \in \mathcal{J}$.
³ Use $\times_{j=1}^{n} [a_j, b_j) \cap \times_{j=1}^{n} [a_j', b_j') = \times_{j=1}^{n} [a_j \vee a_j', b_j \wedge b_j')$.[✓]


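Problems

5.1. Verify the claims made in Remark 5.2.

5.2. The following exercise shows that Dynkin systems and $\sigma$-algebras are, in general, different: Let $X = \{1, 2, 3, \ldots, 2k - 1, 2k\}$ for some fixed $k \in \mathbb{N}$. Then the family $\mathcal{D} = \{A \subset X : \#A \text{ is even}\}$ is a Dynkin system, but not a $\sigma$-algebra.

5.3. Let $\mathcal{D}$ be a Dynkin system. Show that for all $A, B \in \mathcal{D}$ with $A \subset B$ the difference $B \setminus A \in \mathcal{D}$. [Hint: use $R \setminus Q = \big((R \cap Q) \mathbin{\dot\cup} R^c\big)^c$ where $R, Q \subset X$.]

5.4. Let $\mathcal{A}$ be a $\sigma$-algebra, $\mathcal{D}$ be a Dynkin system and $\mathcal{G} \subset \mathcal{H} \subset \mathcal{P}(X)$ two collections of subsets of $X$. Show that
(i) $\sigma(\mathcal{A}) = \mathcal{A}$ and $\delta(\mathcal{D}) = \mathcal{D}$;
(ii) $\delta(\mathcal{G}) \subset \delta(\mathcal{H})$;
(iii) $\delta(\mathcal{G}) \subset \sigma(\mathcal{G})$.

5.5. Let $A, B \subset X$. Compare $\delta(\{A, B\})$ and $\sigma(\{A, B\})$. When are they equal?

5.6. Show that Theorem 5.7 is still valid, if $(G_j)_{j \in \mathbb{N}} \subset \mathcal{G}$ is not an increasing sequence but any countable family of sets such that
    (1) $\bigcup_{j \in \mathbb{N}} G_j = X$ and (2) $\mu(G_j) = \nu(G_j) < \infty$.
[Hint: set $F_N := G_1 \cup \ldots \cup G_N = F_{N-1} \cup G_N$ and check by induction that $\mu(F_N) = \nu(F_N)$; use then T5.7.]

5.7. Show that the half-open intervals $\mathcal{J}^n$ in $\mathbb{R}^n$ are stable under finite intersections. [Hint: check that $I = \times_{j=1}^{n} [a_j, b_j)$, $I' = \times_{j=1}^{n} [a_j', b_j')$ satisfy $I \cap I' = \times_{j=1}^{n} [a_j \vee a_j', b_j \wedge b_j')$.]

5.8. Dilations. Mimic the proof of Theorem 5.8(i) and show that $t \cdot B := \{tb : b \in B\}$ is a Borel set for all $B \in \mathcal{B}(\mathbb{R}^n)$ and $t > 0$. Moreover,
    $\lambda^n(t \cdot B) = t^n \lambda^n(B)$  $\forall\, B \in \mathcal{B}(\mathbb{R}^n)$, $\forall\, t > 0$.    (5.5)

5.9. Invariant measures. Let $(X, \mathcal{A}, \mu)$ be a finite measure space where $\mathcal{A} = \sigma(\mathcal{G})$ for some $\cap$-stable generator $\mathcal{G}$. Assume that $\theta : X \to X$ is a map such that $\theta^{-1}(A) \in \mathcal{A}$ for all $A \in \mathcal{A}$. Prove that
    $\mu(G) = \mu(\theta^{-1}(G))$ $\forall\, G \in \mathcal{G}$ $\implies$ $\mu(A) = \mu(\theta^{-1}(A))$ $\forall\, A \in \mathcal{A}$.
(A measure $\mu$ with this property is called invariant w.r.t. the map $\theta$.)

5.10. Independence (1). Let $(\Omega, \mathcal{A}, P)$ be a probability space and let $\mathcal{B}, \mathcal{C} \subset \mathcal{A}$ be two sub-$\sigma$-algebras of $\mathcal{A}$. We call $\mathcal{B}$ and $\mathcal{C}$ independent, if
    $P(B \cap C) = P(B)\, P(C)$  $\forall\, B \in \mathcal{B}$, $C \in \mathcal{C}$.
Assume now that $\mathcal{B} = \sigma(\mathcal{G})$ and $\mathcal{C} = \sigma(\mathcal{H})$ where $\mathcal{G}$, $\mathcal{H}$ are $\cap$-stable collections of sets. Prove that $\mathcal{B}$ and $\mathcal{C}$ are independent if, and only if,
    $P(G \cap H) = P(G)\, P(H)$  $\forall\, G \in \mathcal{G}$, $H \in \mathcal{H}$.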

6 Existence of measures

In Chapter 4 we saw that it is not a trivial task to assign explicitly a $\mu$-value to every set $A$ from a $\sigma$-algebra $\mathcal{A}$. Rather than doing this it is often more natural to assign $\mu$-values to, say, rectangles (in the case of the Borel $\sigma$-algebra) or, in general, to sets from some generator $\mathcal{G}$ of $\mathcal{A}$. Because of Theorem 4.4 (and Remark 4.5) $\mu|_{\mathcal{G}}$ should be a pre-measure. If $\mathcal{G}$ and $\mu$ satisfy the conditions of the uniqueness theorem 5.7, this approach will lead to a unique measure on $\mathcal{A}$, provided we can extend $\mu$ from $\mathcal{G}$ onto $\sigma(\mathcal{G}) = \mathcal{A}$. To get such an automatic extension the following (technically motivated) class of generators is useful. A semi-ring is a family $\mathcal{S} \subset \mathcal{P}(X)$ with the following properties:
    $\emptyset \in \mathcal{S}$    ($S_1$)
    $S, T \in \mathcal{S} \implies S \cap T \in \mathcal{S}$    ($S_2$)
    for $S, T \in \mathcal{S}$ there exist finitely many disjoint $S_1, S_2, \ldots, S_M \in \mathcal{S}$ such that $S \setminus T = \dot\bigcup_{j=1}^{M} S_j$    ($S_3$)
The solution to our problems is the following deep extension theorem for measures which goes back to Carathéodory [9].

6.1 Theorem (Carathéodory) Let $\mathcal{S}$ be a semi-ring of subsets of $X$ and $\mu : \mathcal{S} \to [0, \infty]$ be a pre-measure, i.e. a set-function with
(i) $\mu(\emptyset) = 0$;
(ii) $(S_j)_{j \in \mathbb{N}} \subset \mathcal{S}$, disjoint and $S = \dot\bigcup_{j \in \mathbb{N}} S_j \in \mathcal{S} \implies \mu(S) = \sum_{j \in \mathbb{N}} \mu(S_j)$.

37

38

R.L. Schilling

Then has an extension to a measure on . If, moreover, contains an exhausting sequence Sj j∈ , Sj ↑ X such that Sj < for all j ∈ , then the extension is unique. 6.2 Remark From the Definition 4.1 of a measure it is clear that the conditions 6.1(i) and (ii) are necessary for to become a measure. Theorem 6.1 says that they are even sufficient. Remarkable is the fact that (ii) is only needed relative to – its extension to is then automatic. The proof of Carathéodory’s theorem is a bit involved and not particularly rewarding when read superficially. Therefore we recommend skipping the proof on first reading and resuming on p. 44. Proof (of Theorem 6.1) We begin with the construction of an auxiliary setfunction ∗ X → 0 which will, eventually, extend . Define for each A ⊂ X the family of countable -coverings of A A = Sj j∈ ⊂ j∈ Sj ⊃ A (A = ∅ is possible since we do not require X ∈ ), and set ∗ A = inf Sj Sj j∈ ∈ A

(6.1)

j∈

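where, as usual, $\inf \emptyset := +\infty$.

Step 1: Claim: $\mu^*$ has the following three properties:¹
(OM1) $\mu^*(\emptyset) = 0$;
(OM2) (monotone) $A \subset B \implies \mu^*(A) \le \mu^*(B)$;
(OM3) ($\sigma$-subadditive) $(A_j)_{j \in \mathbb{N}} \subset \mathcal{P}(X) \implies \mu^*\bigl(\bigcup_{j \in \mathbb{N}} A_j\bigr) \le \sum_{j \in \mathbb{N}} \mu^*(A_j)$.

(OM1) is obvious since we can take in (6.1) the constant sequence $S_1 = S_2 = \dots = \emptyset$ which is clearly in $\mathcal{C}(\emptyset)$.
(OM2): if $B \supset A$, then each $\mathcal{S}$-cover of $B$ also covers $A$, i.e. $\mathcal{C}(B) \subset \mathcal{C}(A)$. Therefore,
$$\mu^*(A) = \inf \Bigl\{ \sum_{j \in \mathbb{N}} \mu(S_j) : (S_j)_{j \in \mathbb{N}} \in \mathcal{C}(A) \Bigr\} \le \inf \Bigl\{ \sum_{k \in \mathbb{N}} \mu(T_k) : (T_k)_{k \in \mathbb{N}} \in \mathcal{C}(B) \Bigr\} = \mu^*(B).$$

¹ A set-function $\mu^* : \mathcal{P}(X) \to [0, \infty]$ satisfying (OM1)–(OM3) is called an outer measure.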


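(OM3): without loss of generality we can assume that $\mu^*(A_j) < \infty$ for all $j \in \mathbb{N}$ and so $\mathcal{C}(A_j) \ne \emptyset$. Fix $\epsilon > 0$ and observe that by the very nature of the infimum we find for each $A_j$ a cover $(S_k^j)_{k \in \mathbb{N}} \in \mathcal{C}(A_j)$ with
$$\sum_{k \in \mathbb{N}} \mu(S_k^j) \le \mu^*(A_j) + \frac{\epsilon}{2^j}, \qquad j \in \mathbb{N}. \tag{6.2}$$
The double sequence $(S_k^j)_{j,k \in \mathbb{N}}$ is an $\mathcal{S}$-cover of $A := \bigcup_{j \in \mathbb{N}} A_j$, and so
$$\mu^*(A) \le \sum_{(j,k) \in \mathbb{N} \times \mathbb{N}} \mu(S_k^j) = \sum_{j \in \mathbb{N}} \sum_{k \in \mathbb{N}} \mu(S_k^j) \le \sum_{j \in \mathbb{N}} \Bigl( \mu^*(A_j) + \frac{\epsilon}{2^j} \Bigr) = \sum_{j \in \mathbb{N}} \mu^*(A_j) + \epsilon,$$
where the second '$\le$' follows from (6.2). Letting $\epsilon \to 0$ proves (OM3).

Step 2. Claim: $\mu^*$ extends $\mu$, i.e. $\mu^*(S) = \mu(S)$ $\forall\, S \in \mathcal{S}$.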

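Observe that $\mu$ can be uniquely extended to the set
$$\mathcal{S}_\cup := \bigl\{ S_1 \dot\cup \dots \dot\cup S_M \,:\, M \in \mathbb{N},\ S_j \in \mathcal{S} \bigr\}$$
of all finite unions of disjoint $\mathcal{S}$-sets by
$$\bar\mu(S_1 \dot\cup \dots \dot\cup S_M) := \sum_{j=1}^{M} \mu(S_j). \tag{6.3}$$
Since (6.3) is necessary for an additive set-function on $\mathcal{S}_\cup$, (6.3) implies the uniqueness of the extension [✓] once we know that $\bar\mu$ is well-defined, that is, independent of the particular representation of sets in $\mathcal{S}_\cup$. To see this assume that
$$S_1 \dot\cup \dots \dot\cup S_M = T_1 \dot\cup \dots \dot\cup T_N, \qquad M, N \in \mathbb{N},\ S_j, T_k \in \mathcal{S}.$$
Then
$$S_j = S_j \cap (T_1 \dot\cup \dots \dot\cup T_N) = \dot{\bigcup}_{k=1}^{N} (S_j \cap T_k)$$
and the additivity of $\mu$ on $\mathcal{S}$ shows
$$\mu(S_j) = \sum_{k=1}^{N} \mu(S_j \cap T_k).$$
Summing over $j = 1, 2, \dots, M$ and swapping the rôles of $S_j$ and $T_k$ gives
$$\sum_{j=1}^{M} \mu(S_j) = \sum_{j=1}^{M} \sum_{k=1}^{N} \mu(S_j \cap T_k) = \sum_{k=1}^{N} \mu(T_k)$$
which proves that (6.3) does not depend on the representation of $\mathcal{S}_\cup$-sets.

The family $\mathcal{S}_\cup$ is clearly stable under finite disjoint unions. If $S, T \in \mathcal{S}_\cup$ we find (notation as before)
$$S \cap T = (S_1 \dot\cup \dots \dot\cup S_M) \cap (T_1 \dot\cup \dots \dot\cup T_N) = \dot{\bigcup}_{j,k=1}^{M,N} \underbrace{(S_j \cap T_k)}_{\in \mathcal{S}} \in \mathcal{S}_\cup$$
and, since by (S3) $S_j \setminus T_k \in \mathcal{S}_\cup$, also
$$S \setminus T = (S_1 \dot\cup \dots \dot\cup S_M) \setminus (T_1 \dot\cup \dots \dot\cup T_N) = \dot{\bigcup}_{j=1}^{M} \bigcap_{k=1}^{N} (S_j \cap T_k^c) = \dot{\bigcup}_{j=1}^{M} \bigcap_{k=1}^{N} \underbrace{(S_j \setminus T_k)}_{\in \mathcal{S}_\cup} \in \mathcal{S}_\cup,$$
where we used the $\cap$- and $\dot\cup$-stability of $\mathcal{S}_\cup$. Finally,
$$S \cup T = (S \setminus T) \dot\cup (S \cap T) \dot\cup (T \setminus S) \in \mathcal{S}_\cup,\,{}^2$$
and the prescription (6.3) can be used to extend $\mu$ to finite unions of $\mathcal{S}$-sets.

Let us show that $\bar\mu$ is $\sigma$-additive on $\mathcal{S}_\cup$, i.e. a pre-measure. For this take $(T_k)_{k \in \mathbb{N}} \subset \mathcal{S}_\cup$ such that $T := \dot{\bigcup}_{k \in \mathbb{N}} T_k \in \mathcal{S}_\cup$. By the definition of the family $\mathcal{S}_\cup$ we find a sequence of disjoint sets $(S_j)_{j \in \mathbb{N}} \subset \mathcal{S}$ and a sequence of integers $0 = n_0 \le n_1 \le n_2 \le \dots$ such that
$$T_k = S_{n_{k-1}+1} \dot\cup \dots \dot\cup S_{n_k}, \qquad k \in \mathbb{N},$$
and $T = U_1 \dot\cup \dots \dot\cup U_N$, where $U_\ell = \dot{\bigcup}_{j \in J_\ell} S_j \in \mathcal{S}$ [✓] with disjoint index sets $J_1 \dot\cup J_2 \dot\cup \dots \dot\cup J_N = \mathbb{N}$ partitioning $\mathbb{N}$. Thus
$$\bar\mu(T) \overset{\text{def}}{=} \sum_{\ell=1}^{N} \mu(U_\ell) \overset{6.1(ii)}{=} \sum_{\ell=1}^{N} \sum_{j \in J_\ell} \mu(S_j) = \sum_{k \in \mathbb{N}} \sum_{j = n_{k-1}+1}^{n_k} \mu(S_j) \overset{\text{def}}{=} \sum_{k \in \mathbb{N}} \bar\mu(T_k)$$
which proves $\sigma$-additivity of $\bar\mu$.

² This shows that $\mathcal{S}_\cup$ is the ring generated by $\mathcal{S}$, i.e. the smallest ring containing $\mathcal{S}$.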

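Using the pre-measure $\bar\mu$ we get from Corollary 4.6 for any cover $(S_j)_{j \in \mathbb{N}} \in \mathcal{C}(S)$, $S \in \mathcal{S}$, that
$$\mu(S) = \bar\mu(S) = \bar\mu\Bigl( \bigcup_{j \in \mathbb{N}} (S_j \cap S) \Bigr) \le \sum_{j \in \mathbb{N}} \bar\mu(S_j \cap S) = \sum_{j \in \mathbb{N}} \mu(S_j \cap S) \le \sum_{j \in \mathbb{N}} \mu(S_j)$$
and passing to the infimum over $\mathcal{C}(S)$ shows $\mu(S) \le \mu^*(S)$. The special cover $(S, \emptyset, \emptyset, \dots) \in \mathcal{C}(S)$, on the other hand, yields $\mu^*(S) \le \mu(S)$ and this shows that $\mu|_{\mathcal{S}} = \mu^*|_{\mathcal{S}}$.

Step 3. Claim: $\mathcal{S} \subset \mathcal{A}^*$, where $\mathcal{A}^*$ is given by
$$\mathcal{A}^* := \bigl\{ A \subset X \,:\, \mu^*(Q) = \mu^*(Q \cap A) + \mu^*(Q \setminus A) \quad \forall\, Q \subset X \bigr\}. \tag{6.4}$$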

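Let $S, T \in \mathcal{S}$. From (S3) we get
$$T = (S \cap T) \dot\cup (T \setminus S) = (S \cap T) \dot\cup \dot{\bigcup}_{j=1}^{M} S_j$$
for some mutually disjoint sets $S_j \in \mathcal{S}$, $j = 1, 2, \dots, M$. Since $\mu$ is additive on $\mathcal{S}$ and $\mu^*$ is ($\sigma$-)subadditive by (OM3), we find
$$\mu^*(S \cap T) + \mu^*(T \setminus S) \le \mu(S \cap T) + \sum_{j=1}^{M} \mu(S_j) = \mu(T). \tag{6.5}$$
Take any $B \subset X$ and some $\mathcal{S}$-cover $(T_j)_{j \in \mathbb{N}} \in \mathcal{C}(B)$. Using $\mu^*(T_j) = \mu(T_j)$ and summing the inequality (6.5) for $T = T_j$ over $j \in \mathbb{N}$ yields
$$\sum_{j \in \mathbb{N}} \mu^*(T_j \setminus S) + \sum_{j \in \mathbb{N}} \mu^*(T_j \cap S) \le \sum_{j \in \mathbb{N}} \mu^*(T_j)$$
and the $\sigma$-subadditivity (OM3) and monotonicity (OM2) of $\mu^*$ give (recall that $B \subset \bigcup_{j \in \mathbb{N}} T_j$)
$$\mu^*(B \setminus S) + \mu^*(B \cap S) \le \mu^*\Bigl( \bigcup_{j \in \mathbb{N}} T_j \setminus S \Bigr) + \mu^*\Bigl( \bigcup_{j \in \mathbb{N}} T_j \cap S \Bigr) \le \sum_{j \in \mathbb{N}} \mu^*(T_j) = \sum_{j \in \mathbb{N}} \mu(T_j).$$
We can now pass to the inf over $\mathcal{C}(B)$ and find $\mu^*(B \setminus S) + \mu^*(B \cap S) \le \mu^*(B)$. Since the reverse inequality follows easily from the ($\sigma$-)subadditivity (OM3) of $\mu^*$, $S \in \mathcal{A}^*$ holds for all $S \in \mathcal{S}$.

Step 4. Claim: $\mathcal{A}^*$ is a $\sigma$-algebra and $\mu^*$ is a measure on $(X, \mathcal{A}^*)$.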

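Clearly, $\emptyset \in \mathcal{A}^*$ and by the symmetry (w.r.t. $A$ and $A^c$) of definition (6.4) of $\mathcal{A}^*$ we have $A \in \mathcal{A}^*$ if, and only if, $A^c \in \mathcal{A}^*$. Let us show that $\mathcal{A}^*$ is $\cup$-stable. Using the ($\sigma$-)subadditivity (OM3) of $\mu^*$ we find for $A, A' \in \mathcal{A}^*$ and any $P \subset X$
$$\begin{aligned}
\mu^*(P \cap (A \cup A')) + \mu^*(P \setminus (A \cup A'))
&= \mu^*\bigl(P \cap (A \cup (A' \setminus A))\bigr) + \mu^*(P \setminus (A \cup A')) \\
&\le \mu^*(P \cap A) + \mu^*(P \cap (A' \setminus A)) + \mu^*(P \setminus (A \cup A')) \\
&= \mu^*(P \cap A) + \mu^*((P \setminus A) \cap A') + \mu^*((P \setminus A) \setminus A') \\
&\overset{(6.4)}{=} \mu^*(P \cap A) + \mu^*(P \setminus A) && (6.6) \\
&\overset{(6.4)}{=} \mu^*(P) && (6.6')
\end{aligned}$$
where we used in the last two steps the definition (6.4) of $\mathcal{A}^*$ with $Q := P \setminus A$ and $Q := P$, respectively. The reverse inequality follows from (OM3), hence equality, and we conclude that $A \cup A' \in \mathcal{A}^*$.

If $A, A'$ are disjoint, the equality (6.6) = (6.6') becomes, for $P := (A \dot\cup A') \cap Q$, $Q \subset X$,
$$\mu^*(Q \cap (A \dot\cup A')) = \mu^*(Q \cap A) + \mu^*(Q \cap A') \qquad \forall\, Q \subset X$$
and a simple induction argument yields
$$\mu^*(Q \cap (A_1 \dot\cup \dots \dot\cup A_M)) = \sum_{j=1}^{M} \mu^*(Q \cap A_j) \qquad \forall\, Q \subset X$$
for all mutually disjoint $A_1, A_2, \dots, A_M \in \mathcal{A}^*$. In particular, if $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}^*$ is a sequence of pairwise disjoint sets, we find for their union $A := \dot{\bigcup}_{j \in \mathbb{N}} A_j$ that
$$\mu^*(Q \cap A) \ge \mu^*(Q \cap (A_1 \dot\cup \dots \dot\cup A_M)) = \sum_{j=1}^{M} \mu^*(Q \cap A_j). \tag{6.7}$$
Since $A_1 \cup \dots \cup A_M \in \mathcal{A}^*$, we can use the monotonicity (OM2) and (6.7) to deduce
$$\begin{aligned}
\mu^*(Q) &= \mu^*(Q \cap (A_1 \cup \dots \cup A_M)) + \mu^*(Q \setminus (A_1 \cup \dots \cup A_M)) \\
&\ge \mu^*(Q \cap (A_1 \cup \dots \cup A_M)) + \mu^*(Q \setminus A) \\
&= \sum_{j=1}^{M} \mu^*(Q \cap A_j) + \mu^*(Q \setminus A). && (6.8)
\end{aligned}$$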

The left-hand side is independent of $M$; therefore, we can let $M \to \infty$ and get
$$\mu^*(Q) \ge \sum_{j=1}^{\infty} \mu^*(Q \cap A_j) + \mu^*(Q \setminus A) \ge \mu^*(Q \cap A) + \mu^*(Q \setminus A). \tag{6.9}$$
The reverse inequality $\mu^*(Q) \le \mu^*(Q \cap A) + \mu^*(Q \setminus A)$ follows at once from the subadditivity of $\mu^*$. This means that equality holds throughout (6.9) and we get $A \in \mathcal{A}^*$. If we take $Q := A$ in (6.9) we even see the $\sigma$-additivity of $\mu^*$ on $\mathcal{A}^*$. So far we have seen that $\mathcal{A}^*$ is a $\cup$-stable Dynkin system. Because of $A \cap B = (A^c \cup B^c)^c$ we see that $\mathcal{A}^*$ is also $\cap$-stable and, by L5.4, a $\sigma$-algebra.

Step 5. Claim: $\mu^*$ is a measure on $\sigma(\mathcal{S})$ which extends $\mu$. By step 3, $\mathcal{S} \subset \mathcal{A}^*$ and thus $\sigma(\mathcal{S}) \subset \sigma(\mathcal{A}^*) = \mathcal{A}^*$ since $\mathcal{A}^*$ is itself a $\sigma$-algebra (step 4). Again by step 4, $\mu^*|_{\sigma(\mathcal{S})}$ is a measure which, by step 2, extends $\mu$.

Step 6. Uniqueness of $\mu^*|_{\sigma(\mathcal{S})}$. If there is an exhausting sequence $(S_j)_{j \in \mathbb{N}} \subset \mathcal{S}$, $S_j \uparrow X$ such that $\mu(S_j) < \infty$ for all $j \in \mathbb{N}$, it follows from T5.7 that any two extensions of $\mu$ to $\sigma(\mathcal{S})$ coincide.

6.3 Remark The core of Carathéodory's theorem 6.1 is the definition (6.4) of $\mu^*$-measurable sets, i.e. of the $\sigma$-algebra $\mathcal{A}^*$. The proof shows that, in general, we cannot expect $\mu^*$ to be ($\sigma$-)additive outside $\mathcal{A}^*$. In many situations the $\sigma$-algebra $\mathcal{P}(X)$ is simply too big to support a non-trivial measure. Notable exceptions are countable sets $X$ or Dirac measures [✓]. For $n$-dimensional Lebesgue measure, this was first remarked by Hausdorff [19, pp. 401–402]. The general case depends on the cardinality of $X$ and the behaviour of $\mu$ on one-point sets; see the discussion in Oxtoby [33, Chapter 5]. Put in other words this says that even a household measure like Lebesgue measure cannot assign a content to every set! In $\mathbb{R}^3$ (and higher dimensions) we even have the Banach–Tarski paradox: the open balls $B_1(0)$ and $B_2(0)$ with centre $0$ and radii $1$ resp. $2$ have finite disjoint decompositions $B_1(0) = \dot{\bigcup}_{j=1}^{M} E_j$ and $B_2(0) = \dot{\bigcup}_{j=1}^{M} F_j$ such that for every $j = 1, 2, \dots, M$ the sets $E_j$ and $F_j$ are geometrically congruent (hence, should have the same Lebesgue measure); see Stromberg [49] or Wagon [52]. Of course, not all of the sets $E_j$ and $F_j$ can be Borel sets. This brings us to the question if and how we can construct a non-Borel measurable set, i.e. a set $A \in \mathcal{P}(\mathbb{R}^n) \setminus \mathcal{B}(\mathbb{R}^n)$.
Such constructions are possible but they are based on the axiom of choice, see for example Hewitt and Stromberg [20, pp. 136–7], Oxtoby [33, pp. 22–3] or Appendix D.
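On a finite set the outer measure (6.1) and the measurability condition (6.4) can be evaluated by brute force. The sketch below is only an illustration under assumptions that are not in the text: $X = \{0, 1, 2, 3\}$, the semi-ring $\mathcal{S} = \{\emptyset, \{0,1\}, \{2,3\}, X\}$ and the pre-measure values are made up, and all function names are hypothetical. Since $\mathcal{S}$ is finite, it is enough to take the minimum over sub-families of $\mathcal{S}$ instead of the infimum over countable covers.

```python
from fractions import Fraction
from itertools import combinations

X = frozenset({0, 1, 2, 3})

# Hypothetical pre-measure on the semi-ring S = {∅, {0,1}, {2,3}, X}.
mu = {
    frozenset(): Fraction(0),
    frozenset({0, 1}): Fraction(1, 2),
    frozenset({2, 3}): Fraction(1, 2),
    X: Fraction(1),
}


def subsets(s):
    """All subsets of the finite set s."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def outer_measure(a):
    """mu*(a): cheapest cover of a by sets from S, cf. (6.1)."""
    costs = [sum((mu[s] for s in family), Fraction(0))
             for r in range(len(mu) + 1)
             for family in combinations(mu, r)
             if a <= frozenset().union(*family)]
    return min(costs)  # X belongs to S, so at least one cover always exists


def is_measurable(a):
    """Carathéodory's condition (6.4), tested against every Q ⊂ X."""
    return all(outer_measure(q) == outer_measure(q & a) + outer_measure(q - a)
               for q in subsets(X))


measurable_sets = {a for a in subsets(X) if is_measurable(a)}
```

In this toy setting `measurable_sets` coincides with the four sets of the semi-ring, while a singleton such as `{0}` has outer measure 1/2 but fails the splitting test for `Q = {0, 1}`.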


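Let us now apply Theorem 6.1 to prove the existence of $n$-dimensional Lebesgue measure $\lambda^n$ which was defined for half-open rectangles $\mathcal{J}^n = \mathcal{J}(\mathbb{R}^n)$ in D4.8:
$$\lambda^n([a, b)) = \prod_{j=1}^{n} (b_j - a_j), \qquad [a, b) = \times_{j=1}^{n} [a_j, b_j) \in \mathcal{J}^n.$$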

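6.4 Proposition The family of $n$-dimensional rectangles $\mathcal{J}^n$ is a semi-ring.

Proof (By induction) It is obvious that $\mathcal{J}^1$ satisfies the properties (S1)–(S3) from page 37. Assume that $\mathcal{J}^n$ is a semi-ring for some $n \ge 1$. From the definition of rectangles it is clear that
$$\mathcal{J}^{n+1} = \mathcal{J}^n \times \mathcal{J}^1 := \{ I_n \times I_1 \,:\, I_n \in \mathcal{J}^n,\ I_1 \in \mathcal{J}^1 \}.$$
(S1) is obviously true and (S2) follows from the identity
$$(I_n \times I_1) \cap (J_n \times J_1) = (I_n \cap J_n) \times (I_1 \cap J_1) \tag{6.10}$$
where $I_n, J_n \in \mathcal{J}^n$ and $I_1, J_1 \in \mathcal{J}^1$. Since
$$\begin{aligned}
(J_n \times J_1)^c &= \{ (x, y) : x \notin J_n,\ y \notin J_1 \ \text{ or } \ x \in J_n,\ y \notin J_1 \ \text{ or } \ x \notin J_n,\ y \in J_1 \} \\
&= (J_n^c \times J_1^c) \dot\cup (J_n \times J_1^c) \dot\cup (J_n^c \times J_1)
\end{aligned}$$
we see, using (6.10),
$$\begin{aligned}
(I_n \times I_1) \setminus (J_n \times J_1) &= (I_n \times I_1) \cap (J_n \times J_1)^c \\
&= \bigl((I_n \setminus J_n) \times (I_1 \setminus J_1)\bigr) \dot\cup \bigl((I_n \cap J_n) \times (I_1 \setminus J_1)\bigr) \dot\cup \bigl((I_n \setminus J_n) \times (I_1 \cap J_1)\bigr).
\end{aligned}$$
Both $I_n \setminus J_n$ and $I_1 \setminus J_1$ are made up of finitely many disjoint rectangles from $\mathcal{J}^n$ and $\mathcal{J}^1$, and therefore $(I_n \times I_1) \setminus (J_n \times J_1)$ is a finite union of disjoint rectangles from $\mathcal{J}^n \times \mathcal{J}^1$; thus (S3) holds.

In $\mathbb{R}^2$ it is easy to depict the two typical situations that occur in the proof of (S3) in Proposition 6.4:

[Figure: two sketches showing $I_n \times I_1$, $J_n \times J_1$ and the numbered disjoint rectangles (1–8 in the first sketch, 1–3 in the second) that make up the difference.]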

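The proof of Proposition 6.4 reveals a bit more: the Cartesian product of any two semi-rings is again a semi-ring. [✓]

6.5 Proposition $\lambda^n$ is a pre-measure on $\mathcal{J}^n$.

Proof It is enough to verify (i), (ii) and (iii′) of Theorem 4.4 since $\lambda^n$ assigns finite measure to every rectangle in $\mathcal{J}^n$. We consider only the case $n = 2$, since $n = 1$ is similar but easier and $n \ge 3$ adds only notational complications. Obviously, $\lambda^2(\emptyset) = 0$. To see additivity on $\mathcal{J}^2$, we may as well cut $I = [a_1, b_1) \times [a_2, b_2)$ along one direction (say, along $j = 2$) to get $I_1 = [a_1, b_1) \times [a_2, \gamma)$, $I_2 = [a_1, b_1) \times [\gamma, b_2)$ and reassemble it $I = I_1 \dot\cup I_2$ (if $n \ge 3$ this is accomplished by a hyperplane).

[Figure: the rectangle $I$ with corners $(a_1, a_2)$ and $(b_1, b_2)$, cut at height $\gamma$.]

Thus
$$\lambda^2(I_1) + \lambda^2(I_2) = (b_1 - a_1)(\gamma - a_2) + (b_1 - a_1)(b_2 - \gamma) = (b_1 - a_1)\bigl((b_2 - \gamma) + (\gamma - a_2)\bigr) = (b_1 - a_1)(b_2 - a_2) = \lambda^2(I).$$
Now let $(I_j)_{j \in \mathbb{N}} \subset \mathcal{J}^2$, $I_j = [a^j, b^j)$, be a decreasing sequence of rectangles $I_j \downarrow \emptyset$. We have to show that $\lim_{j \to \infty} \lambda^2(I_j) = 0$. Since $I_j \downarrow \emptyset$, it is clear that at least in one coordinate direction, say $k = 2$, we have $\lim_{j \to \infty} (b_2^j - a_2^j) = 0$, otherwise $\bigcap_{j \in \mathbb{N}} I_j$ would contain a rectangle with side-lengths $\lim_{j \to \infty} (b_k^j - a_k^j) > 0$ for $k = 1, 2$. But then
$$\lambda^2(I_j) = \prod_{k=1}^{2} (b_k^j - a_k^j) \le \Bigl[ \max_{k \ne 2} (b_k^j - a_k^j) \Bigr]^{2-1} (b_2^j - a_2^j) \xrightarrow{\ j \to \infty\ } 0.$$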

6.6 Corollary (Existence of Lebesgue measure) There is a unique extension of $n$-dimensional Lebesgue pre-measure $\lambda^n$ from $\mathcal{J}^n$ (Definition 4.8) to a measure on the Borel sets $\mathcal{B}(\mathbb{R}^n)$. This extension is again denoted by $\lambda^n$ and is called Lebesgue measure.

Proof We know from Theorem 3.8 that $\mathcal{B}(\mathbb{R}^n) = \sigma(\mathcal{J}^n)$. Since $[-k, k)^n \uparrow \mathbb{R}^n$ is an exhausting sequence of cubes and since $\lambda^n([-k, k)^n) = (2k)^n < \infty$, all conditions of Carathéodory's theorem 6.1 are fulfilled, and $\lambda^n$ extends to a measure on $\mathcal{B}(\mathbb{R}^n)$.

6.7 Remark The uniqueness of Lebesgue measure and its properties (cf. Theorem 4.9) show that it is necessarily the familiar elementary-geometric volume (length, area, …) function $\mathrm{vol}^n(\bullet)$ in the sense that $\mathrm{vol}^n$ can in only one way be extended to a measure on the Borel $\sigma$-algebra.

Problems

6.1. Consider on $\mathbb{R}$ the family $\Sigma$ of all Borel sets which are symmetric w.r.t. the origin. Show that $\Sigma$ is a $\sigma$-algebra. Is it possible to extend a pre-measure $\mu$ on $\Sigma$ to a measure on $\mathcal{B}(\mathbb{R})$? If so, is this extension unique? Continues in Problem 9.12.

6.2. Completion (2). Recall from Problem 4.13 that a measure space $(X, \mathcal{A}, \mu)$ is complete, if every subset of a $\mu$-null set is a $\mu$-null set (thus, in particular, measurable). Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space – i.e. there is an exhausting sequence $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ such that $\mu(A_j) < \infty$. As in the proof of Theorem 6.1 we write $\mu^*$ for the outer measure (6.1) – now defined using $\mathcal{A}$-coverings – and $\mathcal{A}^*$ for the $\sigma$-algebra defined by (6.4).
(i) Show that for every $Q \subset X$ there is some $A \in \mathcal{A}$ such that $\mu^*(Q) = \mu(A)$ and that $\mu(N) = 0$ for all $N \subset Q \setminus A$ with $N \in \mathcal{A}$. [Hint: since $\mu^*$ is defined as an infimum, every $Q$ with $\mu^*(Q) < \infty$ admits a sequence $B_k \in \mathcal{A}$ with $B_k \supset Q$ and $\mu(B_k) - \mu^*(Q) \le 1/k$. If $\mu^*(Q) = \infty$, consider for each $j \in \mathbb{N}$ the set $Q \cap A_j$.]
(ii) Show that $(X, \mathcal{A}^*, \mu^*|_{\mathcal{A}^*})$ is a complete measure space.
(iii) Show that $(X, \mathcal{A}^*, \mu^*|_{\mathcal{A}^*})$ is the completion of $(X, \mathcal{A}, \mu)$ in the sense of Problem 4.13.

6.3. (i) Show that non-void open sets in $\mathbb{R}$ (resp. $\mathbb{R}^n$) have always strictly positive Lebesgue measure. [Hint: let $U$ be open. Find a small ball in $U$ and inscribe a cube.]
(ii) Is (i) still true for closed sets?

6.4. (i) Show that $\lambda^1((a, b)) = b - a$ for all $a, b \in \mathbb{R}$, $a \le b$. [Hint: approximate $(a, b)$ by half-open intervals and use Theorem 4.4.]
(ii) Let $H \subset \mathbb{R}^2$ be a hyperplane which is perpendicular to the $x_1$-direction (that is to say: $H$ is a translate of the $x_2$-axis). Show that $H \in \mathcal{B}(\mathbb{R}^2)$ and $\lambda^2(H) = 0$. [Hint: consider the sets $A_k = [-\epsilon 2^{-k}, \epsilon 2^{-k}) \times [-k, k)$ and note that $H \subset y + \bigcup_{k \in \mathbb{N}} A_k$ for some $y$.]
(iii) State and prove the $\mathbb{R}^n$-analogues of (i) and (ii).

6.5. Let $(X, \mathcal{A}, \mu)$ be a measure space such that all singletons $\{x\} \in \mathcal{A}$. A point $x$ is called an atom, if $\mu\{x\} > 0$. A measure is called non-atomic or diffuse, if there are no atoms.
(i) Show that one-dimensional Lebesgue measure $\lambda^1$ is diffuse.
(ii) Give an example of a non-diffuse measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$.
(iii) Show that for a diffuse measure $\mu$ on $(X, \mathcal{A})$ all countable sets are null sets.



(iv) Show that every probability measure P on can be decomposed into a sum of two measures + , where is diffuse and is a measure of the form = j∈ j xj , j > 0, xj ∈ . k

k

k

[Hint: since P = 1, there are at most k points y1 y2 yk such that k 1 > P yj k1 . Find by recursion (in k) all points satisfying such a k−1 k relation. There are at most countably many of these yj . Relabel them as x1 x2 . These are the atoms of P. Now take j = P yj , define as stated and prove that and P − are measures.] 6.6. A set A ⊂ n is called bounded, if it can be contained in a ball Br 0 ⊃ A of finite radius r. A set A ⊂ n is called connected, if we can go along a curve from any point a ∈ A to any other point a ∈ A without ever leaving A, cf. Appendix B. (i) Construct an open and unbounded set in with finite, strictly positive Lebesgue measure. [Hint: try unions of ever smaller open intervals centred around n ∈ .] (ii) Construct an open, unbounded and connected set in 2 with finite, strictly positive Lebesgue measure. [Hint: try a union of adjacent, ever longer, ever thinner rectangles.] (iii) Is there a connected, open and unbounded set in with finite, strictly positive Lebesgue measure? 6.7. Let = 1 01 be Lebesgue measure on 0 1 0 1 . Show that for every > 0 there is a dense open set U ⊂ 0 1 with U . [Hint: take an enumeration qj j∈ of ∩ 0 1 and make each qj the centre of a small open interval.] 6.8. Let = 1 be Lebesgue measure on . Show that N ∈ is a null set if, and only if, for every > 0 there is an open set U = U ⊃ N such that U < . [Hint: sufficiency is trivial, for necessity use ∗ constructed in Theorem 6.1 (6.1) from and observe that by Theorem 6.1 n = ∗ n . This gives the required open cover.] 6.9. Borel–Cantelli lemma (1) – the direct half. Prove the following theorem. Theorem (Borel–Cantelli lemma). Let P be a probability space. For every sequence Aj j∈ ⊂ we have j=1

PAj <

=⇒

P

Aj = 0

(6.11)

n=1 j=n

[Hint: use Theorem 4.4 and the fact that P jn Aj jn PAj .] Remark. This is the ‘easy’ or direct half of the so-called Borel–Cantelli lemma; the more difficult part see T18.9. The condition ∈ Aj means that happens n=1 j=n

to be in infinitely many of the Aj and the lemma gives a simple sufficient condition when certain events happen almost surely not infinitely often, i.e. only finitely often with probability one.


6.10. Non-measurable sets (1). Let be a measure on = ∅ 0 1 1 2 0 2 , X = 0 2, such that 0 1 = 1 2 = 21 and 0 2 = 1. Denote by ∗ and ∗ the outer measure and -algebra which appear in the proof of Theorem 6.1. (i) Find ∗ a b and ∗ a for all 0 a < b < 2 if we use = in T6.1; (ii) Show that 0 1 0 ∈ ∗ . 6.11. Non-measurable sets (2). Consider on X = the -algebra = A ⊂ A or Ac is countable from Example 3.3(v) and the measure A from 4.7(ii) which is 0 or 1 according to A or Ac being countable. Denote by ∗ and ∗ the outer measure and -algebra which appear in the proof of Theorem 6.1. (i) Find ∗ if we use = in T6.1; (ii) Show that no set B ⊂ , such that both B and Bc are uncountable, is in or in ∗ .

7 Measurable mappings

In this chapter we consider maps $T : X \to X'$ between two measurable spaces $(X, \mathcal{A})$ and $(X', \mathcal{A}')$ which respect the measurable structures, that is $\sigma$-algebras, on $X$ and $X'$. Such maps can be used to transport a given measure $\mu$, defined on $(X, \mathcal{A})$, onto $(X', \mathcal{A}')$. We have already used this technique in Theorem 5.8, where we considered shifts of sets: $A \rightsquigarrow x + A$, but it is in probability theory where this concept is truly fundamental: you use it whenever you speak of the 'distribution' of a 'random variable'.

7.1 Definition Let $(X, \mathcal{A})$, $(X', \mathcal{A}')$ be two measurable spaces. A map $T : X \to X'$ is called $\mathcal{A}/\mathcal{A}'$-measurable (or measurable unless this is too ambiguous) if the pre-image of every measurable set is a measurable set:
$$T^{-1}(A') \in \mathcal{A} \qquad \forall\, A' \in \mathcal{A}'. \tag{7.1}$$
A random variable is a measurable map from a probability space to any measurable space. Note that $T^{-1}(\mathcal{A}') \subset \mathcal{A}$ is a common shorthand for (7.1).

In the language of Definition 7.1 the translation (and its inverse) which we used in Theorem 5.8 is a $\mathcal{B}(\mathbb{R}^n)/\mathcal{B}(\mathbb{R}^n)$-measurable map:
$$\tau_x : \mathbb{R}^n \to \mathbb{R}^n,\ y \mapsto y - x \qquad \text{and} \qquad \tau_x^{-1} : \mathbb{R}^n \to \mathbb{R}^n,\ y \mapsto y + x. \tag{7.2}$$
In fact, Theorem 5.8 states that
$$\lambda^n(B) = \lambda^n(x + B) \qquad \forall\, B \in \mathcal{B}(\mathbb{R}^n) \tag{7.3}$$
and this requires $x + B = \tau_{-x}(B) = \tau_x^{-1}(B)$ to be a Borel set! Our proof of T5.8 needed (and proved) this for rectangles $B \in \mathcal{J}^n$ and not for all Borel sets – but

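this is good enough even in the most general case. The following lemma shows that measurability needs only to be checked for the sets of a generator.

7.2 Lemma Let $(X, \mathcal{A})$, $(X', \mathcal{A}')$ be measurable spaces and let $\mathcal{A}' = \sigma(\mathcal{G}')$. Then $T : X \to X'$ is $\mathcal{A}/\mathcal{A}'$-measurable if, and only if, $T^{-1}(\mathcal{G}') \subset \mathcal{A}$, i.e. if
$$T^{-1}(G') \in \mathcal{A} \qquad \forall\, G' \in \mathcal{G}'. \tag{7.4}$$
Proof If $T$ is $\mathcal{A}/\mathcal{A}'$-measurable, we have $T^{-1}(\mathcal{G}') \subset T^{-1}(\mathcal{A}') \subset \mathcal{A}$, and (7.4) is obviously satisfied. Conversely, consider the system $\Sigma' := \{ A' \subset X' : T^{-1}(A') \in \mathcal{A} \}$. By (7.4), $\mathcal{G}' \subset \Sigma'$ and it is not difficult to see that $\Sigma'$ is itself a $\sigma$-algebra since $T^{-1}$ commutes with all set-operations. [✓] Therefore,
$$\mathcal{A}' = \sigma(\mathcal{G}') \subset \sigma(\Sigma') = \Sigma' \implies T^{-1}(A') \in \mathcal{A} \qquad \forall\, A' \in \mathcal{A}'.$$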
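On finite spaces Lemma 7.2 can be checked by exhaustive computation. The sketch below is an illustration only; the spaces, the generator and the maps `T` and `U` are made-up examples, and the helper names are hypothetical. It generates a $\sigma$-algebra by closing a family under complements and pairwise unions (which suffices on a finite set) and compares the test on the generator with the test on the full $\sigma$-algebra.

```python
from itertools import combinations


def generate_sigma_algebra(space, generator):
    """Smallest family containing the generator that is closed under
    complements and unions; on a finite space this is sigma(generator)."""
    space = frozenset(space)
    sets = {frozenset(), space} | {frozenset(g) for g in generator}
    changed = True
    while changed:
        changed = False
        current = list(sets)
        for a in current:
            if space - a not in sets:
                sets.add(space - a)
                changed = True
        for a, b in combinations(current, 2):
            if a | b not in sets:
                sets.add(a | b)
                changed = True
    return sets


def preimage(t, space, target):
    """T^{-1}(target) for a map given as a dict."""
    return frozenset(x for x in space if t[x] in target)


X = {1, 2, 3, 4}
A = generate_sigma_algebra(X, [{1, 2}])       # {∅, {1,2}, {3,4}, X}

Y = {"a", "b", "c"}
G = [{"a"}, {"b"}]                            # generator of the power set of Y
A_prime = generate_sigma_algebra(Y, G)


def measurable_on(t, family):
    return all(preimage(t, X, s) in A for s in family)


T = {1: "a", 2: "a", 3: "b", 4: "b"}          # pre-images of generator sets lie in A
U = {1: "a", 2: "b", 3: "b", 4: "b"}          # pre-image of {"a"} is {1}, not in A
```

For both maps, `measurable_on(t, G)` and `measurable_on(t, A_prime)` return the same answer, which is what the lemma predicts.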

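On a topological space $(X, \mathcal{O})$ – see Appendix B – we consider usually the (topological) Borel $\sigma$-algebra $\mathcal{B}(X) := \sigma(\mathcal{O})$. The interplay between measurability and topology is often quite intricate. One of the simple and extremely useful aspects is the fact that continuous maps are measurable; let us check this for $\mathbb{R}^n$.

7.3 Example Every continuous map $T : \mathbb{R}^n \to \mathbb{R}^m$ is $\mathcal{B}(\mathbb{R}^n)/\mathcal{B}(\mathbb{R}^m)$-measurable. From calculus¹ we know that $T$ is continuous if, and only if,
$$T^{-1}(U) \subset \mathbb{R}^n \ \text{ is open} \qquad \forall \text{ open } U \subset \mathbb{R}^m. \tag{7.5}$$
Since the open sets $\mathcal{O}^m$ in $\mathbb{R}^m$ generate the Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R}^m)$, we can use (7.5) to deduce
$$T^{-1}(\mathcal{O}^m) \subset \mathcal{O}^n \subset \sigma(\mathcal{O}^n) = \mathcal{B}(\mathbb{R}^n).$$
By Lemma 7.2, $T^{-1}(\mathcal{B}(\mathbb{R}^m)) \subset \mathcal{B}(\mathbb{R}^n)$ which means that $T$ is measurable.
Caution: Not every measurable map is continuous, e.g. $x \mapsto 1_{[-1,1]}(x)$.

7.4 Theorem Let $(X_j, \mathcal{A}_j)$, $j = 1, 2, 3$, be measurable spaces and $T : X_1 \to X_2$, $S : X_2 \to X_3$ be $\mathcal{A}_1/\mathcal{A}_2$- resp. $\mathcal{A}_2/\mathcal{A}_3$-measurable maps. Then $S \circ T : X_1 \to X_3$ is $\mathcal{A}_1/\mathcal{A}_3$-measurable.

Proof For $A_3 \in \mathcal{A}_3$ we have
$$(S \circ T)^{-1}(A_3) = T^{-1}\bigl( \underbrace{S^{-1}(A_3)}_{\in \mathcal{A}_2} \bigr) \in T^{-1}(\mathcal{A}_2) \subset \mathcal{A}_1.$$

¹ See also Appendix B, Theorem B.12 and B.19.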

Often we find ourselves in a situation where $T : X \to X'$ is given and where $X'$ is equipped with a natural $\sigma$-algebra $\mathcal{A}'$ – e.g. if $X' = \mathbb{R}$ and $\mathcal{A}' = \mathcal{B}(\mathbb{R})$ – but no $\sigma$-algebra is specified in $X$. Then the question arises: is there a (smallest) $\sigma$-algebra on $X$ which makes $T$ measurable? An obvious, but nevertheless useless, candidate is $\mathcal{P}(X)$, which renders every map measurable. [✓] From Example 3.3(vii) we know that $T^{-1}(\mathcal{A}')$ is a $\sigma$-algebra in $X$ but we cannot remove a single set from it without endangering the measurability of $T$. [✓] Let us formalize this observation.

7.5 Definition (and Lemma) Let $(T_i)_{i \in I}$ be arbitrarily many mappings $T_i : X \to X_i$ from the same space $X$ into measurable spaces $(X_i, \mathcal{A}_i)$. The smallest $\sigma$-algebra on $X$ that makes all $T_i$ simultaneously measurable [✓] is
$$\sigma(T_i : i \in I) := \sigma\Bigl( \bigcup_{i \in I} T_i^{-1}(\mathcal{A}_i) \Bigr). \tag{7.6}$$
We say that $\sigma(T_i : i \in I)$ is generated by the family $(T_i)_{i \in I}$. Although $T_i^{-1}(\mathcal{A}_i)$ is a $\sigma$-algebra this is, in general, no longer true for $\bigcup_{i \in I} T_i^{-1}(\mathcal{A}_i)$ if $\#I > 1$; this explains why we have to use the $\sigma$-hull in (7.6).

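7.6 Theorem Let $(X, \mathcal{A})$, $(X', \mathcal{A}')$ be measurable spaces and $T : X \to X'$ be an $\mathcal{A}/\mathcal{A}'$-measurable map. For every measure $\mu$ on $(X, \mathcal{A})$,
$$\mu'(A') := \mu(T^{-1}(A')), \qquad A' \in \mathcal{A}', \tag{7.7}$$
defines a measure on $(X', \mathcal{A}')$.

Proof If $A' = \emptyset$, then $T^{-1}(\emptyset) = \emptyset$ and $\mu'(\emptyset) = \mu(\emptyset) = 0$. If $(A_j')_{j \in \mathbb{N}} \subset \mathcal{A}'$ is a sequence of mutually disjoint sets, then
$$\mu'\Bigl( \dot{\bigcup}_{j \in \mathbb{N}} A_j' \Bigr) = \mu\Bigl( T^{-1}\Bigl( \dot{\bigcup}_{j \in \mathbb{N}} A_j' \Bigr) \Bigr) \overset{[✓]}{=} \mu\Bigl( \dot{\bigcup}_{j \in \mathbb{N}} T^{-1}(A_j') \Bigr) = \sum_{j \in \mathbb{N}} \mu(T^{-1}(A_j')) = \sum_{j \in \mathbb{N}} \mu'(A_j').$$
Notice that we have seen a special case of Theorem 7.6 in the proof of Theorem 5.8 when considering translates of Lebesgue measure: $\lambda^n(x + B) = \lambda^n(\tau_x^{-1}(B))$.

7.7 Definition The measure $\mu'(\bullet)$ of Theorem 7.6 is called the image measure of $\mu$ under $T$ and is denoted by $T(\mu)(\bullet)$ or $\mu \circ T^{-1}(\bullet)$.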

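7.8 Example Let $(\Omega, \mathcal{A}, P)$ be a probability space and $\xi : \Omega \to \mathbb{R}$ be a random variable, i.e. an $\mathcal{A}/\mathcal{B}(\mathbb{R})$-measurable map. Then²
$$\xi(P)(A) = P(\xi^{-1}(A)) = P(\{ \omega : \xi(\omega) \in A \}) = P(\xi \in A)$$
is again a probability measure, called the law or distribution of the random variable $\xi$.
More concretely, if $(\Omega, \mathcal{A}, P)$ describes throwing two fair dice, i.e. $\Omega := \{ (j, k) : 1 \le j, k \le 6 \}$, $\mathcal{A} = \mathcal{P}(\Omega)$ and $P(\{(j, k)\}) = 1/36$, we could ask for the total number of points thrown: $\xi : \Omega \to \{2, 3, \dots, 12\}$, $\xi((j, k)) := j + k$, which is a measurable map. [✓] The law of $\xi$ is then given in the table below:

j           2     3     4     5     6     7     8     9     10    11    12
P(ξ = j)    1/36  1/18  1/12  1/9   5/36  1/6   5/36  1/9   1/12  1/18  1/36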
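The table can be recomputed directly from the definition of the image measure. The following is a small sketch, not part of the text, using exact fractions; the names `omega`, `xi`, `law` and `image_measure` are chosen here for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Sample space of two fair dice with the uniform probability 1/36 on each pair.
omega = [(j, k) for j in range(1, 7) for k in range(1, 7)]
P = {w: Fraction(1, 36) for w in omega}


def xi(w):
    """Total number of points thrown."""
    return w[0] + w[1]


# Law of xi on single values: P(xi = s) = P(xi^{-1}({s})).
law = defaultdict(Fraction)
for w, p in P.items():
    law[xi(w)] += p


def image_measure(a):
    """xi(P)(a) = P(xi in a) for any set a of possible totals."""
    return sum((p for w, p in P.items() if xi(w) in a), Fraction(0))


for total in sorted(law):
    print(total, law[total])
```

The printed fractions appear in lowest terms, so for instance the value for the total 3 is shown as 1/18 rather than 2/36.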

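We close this section with some transformation formulae for Lebesgue measure. Recall that $\mathcal{O}(n)$ is the set of all orthogonal $n \times n$ matrices: $T \in \mathcal{O}(n)$ if, and only if, ${}^tT \cdot T = \mathrm{id}$. Orthogonal matrices preserve lengths and angles, i.e. we have for all $x, y \in \mathbb{R}^n$
$$\langle x, y \rangle = \langle Tx, Ty \rangle \iff \|x\| = \|Tx\| \tag{7.8}$$
where $\langle x, y \rangle = \sum_{j=1}^{n} x_j y_j$ and $\|x\|^2 = \langle x, x \rangle$ denote the usual Euclidean scalar product, resp. norm.

7.9 Theorem If $T \in \mathcal{O}(n)$, then $\lambda^n = T(\lambda^n)$.

Proof Since $T$ is a linear orthogonal map it is continuous and by (7.8) even an isometry,
$$\|Tx - Ty\| = \|T(x - y)\| = \|x - y\|,$$
hence measurable by Example 7.3. Therefore, the image measure $\mu(B) := \lambda^n(T^{-1}(B))$ is well-defined (by T7.6) and satisfies for all $x \in \mathbb{R}^n$
$$\mu(x + B) = \lambda^n(T^{-1}(x + B)) = \lambda^n(T^{-1}x + T^{-1}(B)) \overset{5.8}{=} \lambda^n(T^{-1}(B)) = \mu(B)$$
and, again by Theorem 5.8, $\mu(B) = \kappa\, \lambda^n(B)$ for all $B \in \mathcal{B}(\mathbb{R}^n)$. To determine the constant $\kappa$ we choose $B = B_1(0)$. Since $T \in \mathcal{O}(n)$, (7.8) implies
$$B_1(0) = \{ x : \|x\| < 1 \} = \{ x : \|Tx\| < 1 \} = T^{-1}(B_1(0))$$
and thus
$$\lambda^n(B_1(0)) = \lambda^n(T^{-1}(B_1(0))) = \mu(B_1(0)) = \kappa\, \lambda^n(B_1(0)).$$
As $0 < \lambda^n(B_1(0)) < \infty$, we have $\kappa = 1$, and the theorem follows.

² We use the shorthand $\{\xi \in A'\}$ for $\xi^{-1}(A')$ and $P(\xi \in A')$ for $P(\{\xi \in A'\})$.

Theorem 7.9 is a particular case of the following general change-of-variable formula for Lebesgue measure. Recall that $GL(n, \mathbb{R})$ is the set of all invertible $n \times n$ matrices, i.e. $S \in GL(n, \mathbb{R}) \iff \det S \ne 0$.

7.10 Theorem Let $S \in GL(n, \mathbb{R})$. Then
$$S(\lambda^n) = |\det S^{-1}|\, \lambda^n = \frac{1}{|\det S|}\, \lambda^n. \tag{7.9}$$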

Proof Since $S$ is invertible, both $S$ and $S^{-1}$ are linear maps on $\mathbb{R}^n$, and as such continuous and measurable (Example 7.3). Set $\mu(B) := \lambda^n(S^{-1}(B))$ for $B \in \mathcal{B}(\mathbb{R}^n)$. Then we have for all $x \in \mathbb{R}^n$
$$\mu(x + B) = \lambda^n(S^{-1}(x + B)) = \lambda^n(S^{-1}x + S^{-1}(B)) \overset{5.8}{=} \lambda^n(S^{-1}(B)) = \mu(B)$$
and from Theorem 5.8 we conclude that
$$\mu(B) = \mu([0, 1)^n)\, \lambda^n(B) = \lambda^n(S^{-1}([0, 1)^n))\, \lambda^n(B).$$
From elementary geometry we know that $S^{-1}([0, 1)^n)$ is a parallelepiped spanned by the vectors $S^{-1}e_j$, $j = 1, 2, \dots, n$, $e_j = (0, \dots, 0, 1, 0, \dots)$ with the $1$ in the $j$-th position. Its geometric volume is
$$\mathrm{vol}^n(S^{-1}([0, 1)^n)) = |\det S^{-1}| = \frac{1}{|\det S|},$$
see also Appendix C. By Remark 6.7, $\mathrm{vol}^n = \lambda^n$ (at least on the Borel sets) and the proof is finished.

Theorem 7.9 or 7.10 allow us to complete the characterization of Lebesgue measure announced earlier in Theorem 4.9. A motion is a linear transformation of the form
$$M = \tau_x \circ T$$
where $\tau_x(y) = y - x$ is a translation and $T \in \mathcal{O}(n)$ is an orthogonal map (${}^tT \cdot T = \mathrm{id}$). In particular, congruent sets are connected by motions.

7.11 Corollary Lebesgue measure is invariant under motions: $\lambda^n = M(\lambda^n)$ for all motions $M$ in $\mathbb{R}^n$. In particular, congruent sets have the same measure.

Proof We know that $M$ is of the form $\tau_x \circ T$. Since $\det T = \pm 1$, we get
$$M(\lambda^n) = \tau_x(T(\lambda^n)) \overset{7.10}{=} \tau_x(\lambda^n) \overset{5.8}{=} \lambda^n.$$

Problems

7.1. Use Lemma 7.2 to show that $\tau_x$ of (7.2), i.e. $\tau_x(y) = y - x$, $x, y \in \mathbb{R}^n$, is $\mathcal{B}(\mathbb{R}^n)/\mathcal{B}(\mathbb{R}^n)$-measurable.

7.2. Show that $\Sigma'$ defined in the proof of Lemma 7.2 is a $\sigma$-algebra.

7.3. Let $X$ be a set, $(X_i, \mathcal{A}_i)$, $i \in I$, be arbitrarily many measurable spaces, and $T_i : X \to X_i$ be a family of maps.
(i) Show that for every $i \in I$ the smallest $\sigma$-algebra in $X$ that makes $T_i$ measurable is given by $T_i^{-1}(\mathcal{A}_i)$.
(ii) Show that $\sigma\bigl( \bigcup_{i \in I} T_i^{-1}(\mathcal{A}_i) \bigr)$ is the smallest $\sigma$-algebra in $X$ that makes all $T_i$, $i \in I$, simultaneously measurable.

7.4. Let $X$ be a set, $(X_i, \mathcal{A}_i)$, $i \in I$, be arbitrarily many measurable spaces, and $T_i : X \to X_i$ be a family of maps. Show that a map $f$ from a measurable space $(F, \mathcal{F})$ to $(X, \sigma(T_i : i \in I))$ is measurable if, and only if, all maps $T_i \circ f$ are $\mathcal{F}/\mathcal{A}_i$-measurable.

7.5. Use Problem 7.4 to show that a function $f : \mathbb{R}^n \to \mathbb{R}^m$, $x \mapsto (f_1(x), \dots, f_m(x))$ is measurable if, and only if, all coordinate maps $f_j : \mathbb{R}^n \to \mathbb{R}$, $j = 1, 2, \dots, m$, are measurable. [Hint: show that the coordinate projections $x = (x_1, \dots, x_n) \mapsto x_j$ are measurable.]

7.6. Let $T : X \to X'$ be a measurable map. Under which circumstances is the family of sets $T(\mathcal{A})$ a $\sigma$-algebra?

7.7. Use image measures to give a new proof of Problem 5.8, i.e. show that
$$\lambda^n(t \cdot B) = t^n \lambda^n(B) \qquad \forall\, B \in \mathcal{B}(\mathbb{R}^n),\ \forall\, t > 0.$$

7.8. Let $T : X \to Y$ be any map. Show that $T^{-1}(\sigma(\mathcal{G})) = \sigma(T^{-1}(\mathcal{G}))$ holds for arbitrary families $\mathcal{G}$ of subsets of $Y$.

7.9. Stieltjes measure (1). Throughout this exercise $(X, \mathcal{A}) = (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ and $\lambda = \lambda^1$ is one-dimensional Lebesgue measure.
(i) Let $\mu$ be a measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Show that
$$F_\mu(x) := \begin{cases} \mu[0, x) & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -\mu[x, 0) & \text{if } x < 0 \end{cases}$$
is a monotonically increasing and left-continuous function $F_\mu : \mathbb{R} \to \mathbb{R}$.
Remark. Increasing and left-continuous functions are called Stieltjes functions.
(ii) Let $F : \mathbb{R} \to \mathbb{R}$ be a Stieltjes function (cf. part (i)). Show that
$$\nu_F([a, b)) := F(b) - F(a) \qquad \forall\, a, b \in \mathbb{R},\ a < b$$
has a unique extension to a measure on $\mathcal{B}(\mathbb{R})$. [Hint: check the assumptions of Theorem 6.1 with $\mathcal{S} = \{ [a, b) : a \le b \}$.]
(iii) Use part (i) to show that every measure $\mu$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ with $\mu[-n, n) < \infty$, $n \in \mathbb{N}$, can be written in the form $\nu_F$ as in (ii) with some Stieltjes function $F = F_\mu$ as in (i).
(iv) Which Stieltjes function $F$ corresponds to $\lambda$?
(v) Which Stieltjes function $F$ corresponds to $\delta_0$?
(vi) Show that $F_\mu$ as in (i) is continuous at $x \in \mathbb{R}$ if, and only if, $\mu\{x\} = 0$.
(vii) Show that every measure $\mu$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ which has no atoms (see Problem 6.5) can be written as image measure of $\lambda$. [Hint: $\mu$ has no atoms implies that $F_\mu$ is continuous. So $G = F_\mu^{-1}$ exists and can be made left-continuous. Finally $\mu[a, b) = F_\mu(b) - F_\mu(a) = \lambda(G^{-1}([a, b)))$.]
(viii) Is (vii) true for measures with atoms, say, $\mu = \delta_0$? [Hint: determine $F_{\delta_0}^{-1}$. Is it measurable?]

7.10. Cantor's ternary set. Let $(X, \mathcal{A}) = ([0, 1], [0, 1] \cap \mathcal{B}(\mathbb{R}))$, $\lambda = \lambda^1|_{[0,1]}$, and set $E_0 = [0, 1]$. Remove the open middle third of $E_0$ to get $E_1 = I_1^1 \dot\cup I_1^2$. Remove the open middle thirds of $I_1^j$, $j = 1, 2$, to get $E_2 = I_2^1 \dot\cup I_2^2 \dot\cup I_2^3 \dot\cup I_2^4$ and so forth.
(i) Make a sketch of $E_0, E_1, E_2, E_3$.
(ii) Prove that each $E_n$ is compact. Conclude that $C := \bigcap_{n \in \mathbb{N}_0} E_n$ is non-void and compact.
(iii) The set $C$ is called the Cantor set or Cantor's discontinuum. It satisfies
$$C \cap \bigcup_{n \in \mathbb{N}} \bigcup_{k \in \mathbb{N}_0} \Bigl( \frac{3k+1}{3^n}, \frac{3k+2}{3^n} \Bigr) = \emptyset.$$
(iv) Find the value of $\lambda(E_n)$ and show that $\lambda(C) = 0$.
(v) Show that $C$ does not contain any open interval. Conclude that the interior (of the closure) of $C$ is empty.
Remark. Sets with empty interior are called nowhere dense.
(vi) We can write $x \in [0, 1)$ as a base-3 ternary fraction, i.e. $x = 0.x_1 x_2 x_3 \dots$ where $x_j \in \{0, 1, 2\}$, which is short for $x = \sum_{j=1}^{\infty} x_j 3^{-j}$. (E.g. $\frac{1}{3} = 0.1 = 0.02222\dots$; note that this representation is not unique [✓], which is important for this exercise.) Show that $x \in C$ if, and only if, $x$ has a ternary representation involving only 0s and 2s. [Hint: the numbers in $(\frac{1}{3}, \frac{2}{3})$, the first interval to be removed, are all of the form $0.1{*}{*}{*}\dots$, i.e. they contain at least one '1', while in $[0, \frac{1}{3}]$ and $[\frac{2}{3}, 1]$ we have numbers of the form $0.0{*}{*}{*}\dots$–$0.022222\dots$ and $0.2{*}{*}{*}\dots$–$0.2222\dots$, respectively. The next step eliminates the $0.01{*}{*}{*}\dots$s and $0.21{*}{*}{*}\dots$s – etc.]
(vii) Use (vi) to show that $C$ is not countable and has even the same cardinality as $[0, 1]$. Nevertheless, $\lambda(C) = 0 \ne 1 = \lambda([0, 1])$.

7.11. Factorization lemma. Let $X$ be a set, $(Y, \mathcal{A})$ be a measurable space and $T : X \to Y$ be a surjective map. Show that a function $f : X \to \mathbb{R}$ is $\sigma(T)/\mathcal{B}(\mathbb{R})$-measurable if, and only if, there exists some $\mathcal{A}/\mathcal{B}(\mathbb{R})$-measurable function $g : Y \to \mathbb{R}$ such that $f = g \circ T$. [Hint: show first that $Tx = Tx'$ implies $f(x) = f(x')$.]
Remark. The result is actually true for any map $T : X \to Y$, but the proof is quite difficult if $T(X) \notin \mathcal{A}$. The problem is that one has to extend the $T(X) \cap \mathcal{A}/\mathcal{B}(\mathbb{R})$-measurable function $g : T(X) \to \mathbb{R}$ to an $\mathcal{A}/\mathcal{B}(\mathbb{R})$-measurable function $g : Y \to \mathbb{R}$.

8 Measurable functions

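A measurable function is a measurable map $u : X \to \mathbb{R}$ from some measurable space $(X, \mathcal{A})$ to $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Measurable functions will play a central rôle in the theory of integration. Recall that $u : X \to \mathbb{R}$ is $\mathcal{A}/\mathcal{B}$-measurable¹ ($\mathcal{B} = \mathcal{B}(\mathbb{R})$) if
$$u^{-1}(B) \in \mathcal{A} \qquad \forall\, B \in \mathcal{B} \tag{8.1}$$
which is, due to Lemma 7.2, equivalent to
$$u^{-1}(G) \in \mathcal{A} \qquad \forall\, G \text{ from a generator } \mathcal{G} \text{ of } \mathcal{B}. \tag{8.2}$$
As we have seen in Remark 3.9, $\mathcal{B}$ is generated by all sets of the form $[a, \infty)$ (or $(b, \infty)$ or $(-\infty, c)$ or $(-\infty, d]$) with $a, b, c, d \in \mathbb{R}$ or $\mathbb{Q}$, and we need
$$u^{-1}([a, \infty)) = \{ x \in X : u(x) \in [a, \infty) \} = \{ x \in X : u(x) \ge a \} \in \mathcal{A} \tag{8.3}$$
with similar expressions for the other types of intervals. Let us introduce the following useful shorthand notation:
$$\{ u \ge v \} := \{ x \in X : u(x) \ge v(x) \} \tag{8.4}$$
and $\{u > v\}$, $\{u \le v\}$, $\{u < v\}$, $\{u = v\}$, $\{u \ne v\}$, $\{u \in B\}$, etc. which are defined in a similar fashion. In this new notation measurability of functions reads as

8.1 Lemma Let $(X, \mathcal{A})$ be a measurable space. The function $u : X \to \mathbb{R}$ is $\mathcal{A}/\mathcal{B}$-measurable if, and only if, one, hence all, of the following conditions hold
(i) $\{u \ge a\} \in \mathcal{A}$ for all $a \in \mathbb{R}$ (or all $a \in \mathbb{Q}$),
(ii) $\{u > a\} \in \mathcal{A}$ for all $a \in \mathbb{R}$ (or all $a \in \mathbb{Q}$),
(iii) $\{u \le a\} \in \mathcal{A}$ for all $a \in \mathbb{R}$ (or all $a \in \mathbb{Q}$),
(iv) $\{u < a\} \in \mathcal{A}$ for all $a \in \mathbb{R}$ (or all $a \in \mathbb{Q}$).

¹ We will frequently drop the $\mathcal{B}$ since $\mathbb{R}$ is naturally equipped with the Borel $\sigma$-algebra and just say that $u$ is $\mathcal{A}$-measurable.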

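Proof Combine Remark 3.9 and Lemma 7.2.

It is sometimes practical to admit the values $+\infty$ and $-\infty$ in some calculations. To do this properly, consider the extended real line $\bar{\mathbb{R}} := [-\infty, +\infty]$. If we agree that $-\infty < x$ and $y < +\infty$ for all $x, y \in \mathbb{R}$, then $\bar{\mathbb{R}}$ inherits the ordering from $\mathbb{R}$ as well as the usual rules of addition and multiplication of elements from $\mathbb{R}$. The latter need to be augmented as follows: for all $x \in \mathbb{R}$ we have
$$x + (+\infty) = (+\infty) + x = +\infty, \qquad (+\infty) + (+\infty) = +\infty,$$
$$x + (-\infty) = (-\infty) + x = -\infty, \qquad (-\infty) + (-\infty) = -\infty,$$
and, if $x \in (0, \infty]$,
$$(\pm x)(+\infty) = (+\infty)(\pm x) = \pm\infty, \qquad (\pm x)(-\infty) = (-\infty)(\pm x) = \mp\infty,$$
$$0 \cdot (\pm\infty) = (\pm\infty) \cdot 0 = 0,\,{}^2 \qquad \frac{1}{\pm\infty} = 0.$$
Caution: $\bar{\mathbb{R}}$ is not a field. Expressions of the form $\infty - \infty$ and $\frac{\pm\infty}{\pm\infty}$ must be avoided.²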
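These conventions differ in one respect from IEEE floating-point arithmetic, where `0 * inf` is `nan` rather than `0`. The helper functions below are a hypothetical sketch, not part of the text, showing how the rules above could be enforced on top of Python floats.

```python
import math


def ext_mul(x: float, y: float) -> float:
    """Product on the extended real line with the convention 0 * (+-inf) = 0."""
    if x == 0 or y == 0:
        return 0.0
    return x * y


def ext_add(x: float, y: float) -> float:
    """Sum on the extended real line; inf - inf is rejected instead of guessed."""
    if math.isinf(x) and math.isinf(y) and x != y:
        raise ValueError("the expression inf - inf is not defined")
    return x + y


plain = 0.0 * math.inf                # IEEE arithmetic yields nan here
convention = ext_mul(0.0, math.inf)   # the convention above yields 0
```

Raising an error for $\infty - \infty$ is a design choice of this sketch: it makes the forbidden expression visible instead of letting a `nan` propagate silently.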

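Functions which take values in $\bar{\mathbb{R}}$ are called numerical functions. The Borel $\sigma$-algebra $\bar{\mathcal{B}} = \mathcal{B}(\bar{\mathbb{R}})$ is defined by
$$B^* \in \bar{\mathcal{B}} \iff B^* = B \cup S \ \text{ for some } B \in \mathcal{B} \text{ and } S \in \bigl\{ \emptyset, \{-\infty\}, \{+\infty\}, \{-\infty, +\infty\} \bigr\} \tag{8.5}$$
and it is not hard to see that $\bar{\mathcal{B}}$ is again a $\sigma$-algebra whose trace w.r.t. $\mathbb{R}$ is $\mathcal{B}$. [✓]

8.2 Lemma $\mathcal{B}(\mathbb{R}) = \mathbb{R} \cap \mathcal{B}(\bar{\mathbb{R}})$.

Moreover,

8.3 Lemma $\bar{\mathcal{B}}$ is generated by all sets of the form $[a, \infty]$ (or $(b, \infty]$ or $[-\infty, c)$ or $[-\infty, d]$) where $a$ (or $b, c, d$) is from $\mathbb{R}$ or $\mathbb{Q}$.

² Conventions are tricky. The rationale behind our definitions is to understand '$\pm\infty$' in every instance as the limit of some (possibly each time different) sequence, and '$0$' as a bona fide zero. Then $0 \cdot (\pm\infty) = 0 \cdot \lim_n a_n = \lim_n (0 \cdot a_n) = \lim_n 0 = 0$ while expressions of the type $\infty - \infty$ or $\frac{\pm\infty}{\pm\infty}$ become $\lim_n a_n - \lim_n b_n$ or $\frac{\lim_n a_n}{\lim_n b_n}$ where two sequences compete and do not lead to unique results.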

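Proof Set $\Sigma := \sigma(\{ [a, \infty] : a \in \mathbb{R} \})$. Since
$$[a, \infty] = [a, \infty) \cup \{+\infty\} \quad \text{and} \quad [a, \infty) \in \mathcal{B}$$
we see that $[a, \infty] \in \bar{\mathcal{B}}$ and $\Sigma \subset \bar{\mathcal{B}}$. On the other hand,
$$[a, b) = [a, \infty] \setminus [b, \infty] \in \Sigma \qquad \forall\, -\infty < a \le b < \infty$$
which means that $\mathcal{B} \subset \Sigma \subset \bar{\mathcal{B}}$. Since also
$$\{+\infty\} = \bigcap_{j \in \mathbb{N}} [j, \infty], \qquad \{-\infty\} = \bigcap_{j \in \mathbb{N}} [-\infty, -j) = \bigcap_{j \in \mathbb{N}} [-j, \infty]^c,$$
we have $\{-\infty\}, \{+\infty\} \in \Sigma$ which entails that all sets of the form
$$B,\quad B \cup \{+\infty\},\quad B \cup \{-\infty\},\quad B \cup \{-\infty, +\infty\} \in \Sigma \qquad \forall\, B \in \mathcal{B};$$
therefore, $\bar{\mathcal{B}} \subset \Sigma$. The proofs for $a \in \mathbb{Q}$ and the other generating systems are similar.

8.4 Definition Let $(X, \mathcal{A})$ be a measurable space. We write $\mathcal{M} := \mathcal{M}(\mathcal{A})$ and $\mathcal{M}_{\bar{\mathbb{R}}} := \mathcal{M}_{\bar{\mathbb{R}}}(\mathcal{A})$ for the families of real-valued $\mathcal{A}/\mathcal{B}$-measurable and numerical $\mathcal{A}/\bar{\mathcal{B}}$-measurable functions on $X$.

8.5 Examples Let $(X, \mathcal{A})$ be a measurable space.
(i) The indicator function $f(x) := 1_A(x)$ is measurable if, and only if, $A \in \mathcal{A}$. This follows easily from Lemma 8.1 and
$$\{ 1_A > \lambda \} = \begin{cases} \emptyset & \text{if } \lambda \ge 1 \\ A & \text{if } 1 > \lambda \ge 0 \\ X & \text{if } \lambda < 0. \end{cases}$$
(ii) Let $A_1, A_2, \dots, A_M \in \mathcal{A}$ be mutually disjoint sets and $y_1, \dots, y_M \in \mathbb{R}$. Then the function
$$g(x) := \sum_{j=1}^{M} y_j 1_{A_j}(x) \tag{8.6}$$
is measurable. This follows from Lemma 8.1 and the fact (compare with the picture!) that
$$\{ g > \lambda \} = \dot{\bigcup}_{j \,:\, y_j > \lambda} A_j \in \mathcal{A}.$$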

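[Figure: graph of a simple function with values $y_1, \ldots, y_4$ on the sets $A_1, \ldots, A_4$ and a level $\lambda$; in the situation shown, $\{f > \lambda\}$ is the disjoint union of $A_2$ and $A_4$.]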

Functions of the form (8.6) are the building blocks for all measurable functions as well as for the definition of the integral.

8.6 Definition A simple function $g : X \to \mathbb{R}$ on a measurable space $(X, \mathcal{A})$ is a function of the form (8.6) with finitely many sets $A_1, \ldots, A_M \in \mathcal{A}$ and $y_1, \ldots, y_M \in \mathbb{R}$. The set of simple functions is denoted by $\mathcal{E}$ or $\mathcal{E}(\mathcal{A})$. If the sets $A_j$, $1 \le j \le M$, are mutually disjoint we call

$$\sum_{j=0}^{M} y_j 1_{A_j}(x) \tag{8.7}$$

with $y_0 = 0$ and $A_0 = (A_1 \cup \ldots \cup A_M)^c$ a standard representation of $g$.

Caution: The representations (8.7) are not unique.

8.7 Examples (continued) (iii) If a measurable function $h : X \to \mathbb{R}$ attains only finitely many values $y_1, y_2, \ldots, y_M \in \mathbb{R}$, then it is a simple function. Indeed: set

$$B_j = \{h = y_j\} = \{h \le y_j\} \setminus \{h < y_j\} \in \mathcal{A}, \qquad j = 1, 2, \ldots, M,$$

and note that the $B_j$ are disjoint. Thus

$$h(x) = \sum_{j=1}^{M} y_j 1_{B_j}(x) = \sum_{j=1}^{M} y_j 1_{\{h = y_j\}}(x).$$

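Since every simple function attains only finitely many values, this shows that every simple function has at least one standard representation. In particular, $\mathcal{E}(\mathcal{A}) \subset \mathcal{M}(\mathcal{A})$ consists of measurable functions.

(iv) $f, g \in \mathcal{E}(\mathcal{A}) \implies f \pm g,\ fg \in \mathcal{E}(\mathcal{A})$.
Indeed: let $f = \sum_{j=0}^{M} y_j 1_{A_j}$ and $g = \sum_{k=0}^{N} z_k 1_{B_k}$ be standard representations of $f$ and $g$.

[Figure: graphs of two simple functions $f$ and $g$ and the common refinement of their partitions.]

It is not hard to see (use the picture!) that

$$f \pm g = \sum_{j=0}^{M} \sum_{k=0}^{N} (y_j \pm z_k)\, 1_{A_j \cap B_k}, \qquad fg = \sum_{j=0}^{M} \sum_{k=0}^{N} y_j z_k\, 1_{A_j \cap B_k},$$

and that $(A_j \cap B_k) \cap (A_{j'} \cap B_{k'}) = \emptyset$ whenever $(j, k) \ne (j', k')$. After relabelling and merging the double indexation into a single index, this shows that $f \pm g,\ fg \in \mathcal{E}(\mathcal{A})$. Notice that $(A_j \cap B_k)_{j,k}$ is the common refinement of the partitions $(A_j)_j$ and $(B_k)_k$ and that inside each of the sets $A_j \cap B_k$ the functions $f$ and $g$ do not change their respective values.

(v) $f \in \mathcal{E}(\mathcal{A}) \implies f^+, f^- \in \mathcal{E}(\mathcal{A})$.
Here we use the following notation: for a function $u : X \to \bar{\mathbb{R}}$ we write

$$u^+(x) = \max\{u(x), 0\}, \qquad u^-(x) = -\min\{u(x), 0\} \tag{8.8}$$

for the positive $u^+$ and negative $u^-$ parts of $u$. Obviously,

$$u = u^+ - u^- \qquad \text{and} \qquad |u| = u^+ + u^-. \tag{8.9}$$

(vi) $f \in \mathcal{E}(\mathcal{A}) \implies |f| \in \mathcal{E}(\mathcal{A})$.

Our next theorem reveals the fundamental rôle of simple functions.

8.8 Theorem Let $(X, \mathcal{A})$ be a measurable space. Every $\mathcal{A}/\bar{\mathcal{B}}$-measurable numerical function $u : X \to \bar{\mathbb{R}}$ is the pointwise limit of simple functions: $u(x) = \lim_{j \to \infty} f_j(x)$, $f_j \in \mathcal{E}(\mathcal{A})$ and $|f_j| \le |u|$. If $u \ge 0$, all $f_j$ can be chosen to be positive and increasing towards $u$ so that $u = \sup_{j \in \mathbb{N}} f_j$.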
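The approximating functions in Theorem 8.8 can be made concrete for a positive $u$ by rounding its values down to the dyadic grid of mesh $2^{-j}$ and capping them at height $j$. The sketch below evaluates such an $f_j$ at a single point; the function name `dyadic_approx` is hypothetical, and the input is assumed to be a non-negative float (possibly `inf`) with $j \ge 1$.

```python
import math

def dyadic_approx(u_value: float, j: int) -> float:
    """Value f_j(x) of the j-th dyadic simple function at a point with u(x) = u_value >= 0."""
    if u_value >= j:
        # top level set {u >= j}: the approximation is capped at height j
        return float(j)
    # on {k 2^-j <= u < (k+1) 2^-j} the function f_j takes the value k 2^-j
    return math.floor(u_value * 2**j) / 2**j
```

For instance, `dyadic_approx(0.3, 2)` gives `0.25` and `dyadic_approx(5.0, 3)` gives `3.0`; increasing `j` refines the grid and raises the cap, so the values increase towards `u_value`.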


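Proof Assume first that $u \ge 0$. Fix $j \in \mathbb{N}$ and define level sets

$$A_k^j = \begin{cases} \{k 2^{-j} \le u < (k+1) 2^{-j}\}, & k = 0, 1, 2, \ldots, j 2^j - 1, \\ \{u \ge j\}, & k = j 2^j, \end{cases}$$

which slice up the graph of $u$ horizontally as shown in the picture. The approximating simple functions are

$$f_j(x) = \sum_{k=0}^{j 2^j} k 2^{-j}\, 1_{A_k^j}(x)$$

[Figure: the graph of $u$ sliced horizontally into strips of height $2^{-j}$, with the step function $f_j$ below it.]

and from the picture it is easy to see that

• $|f_j(x) - u(x)| \le 2^{-j}$ if $x \in \{u < j\}$;
• $A_k^j = \{k 2^{-j} \le u\} \cap \{u < (k+1) 2^{-j}\}$, $\{u \ge j\} \in \mathcal{A}$;
• $0 \le f_j \le u$ and $f_j \uparrow u$.

For a general $u$, we consider its positive and negative parts $u^{\pm}$. Since

$$\{u^+ > \lambda\} = \begin{cases} \{u > \lambda\} & \text{if } \lambda \ge 0, \\ X & \text{if } \lambda < 0, \end{cases}$$

and since $u^- = (-u)^+$, we have $\{u^{\pm} > \lambda\} \in \mathcal{A}$ for all $\lambda \in \mathbb{R}$. Thus $u^{\pm}$ are positive measurable functions, and we can construct, as above, simple functions $g_j \uparrow u^+$ and $h_j \uparrow u^-$. Clearly, $f_j = g_j - h_j \xrightarrow{j \to \infty} u^+ - u^- = u$ as well as $|f_j| = g_j + h_j \le u^+ + u^- = |u|$, and we are done.

8.9 Corollary Let $(X, \mathcal{A})$ be a measurable space. If $u_j : X \to \bar{\mathbb{R}}$, $j \in \mathbb{N}$, are measurable functions, then so are

$$\sup_{j \in \mathbb{N}} u_j, \qquad \inf_{j \in \mathbb{N}} u_j, \qquad \limsup_{j \to \infty} u_j, \qquad \liminf_{j \to \infty} u_j$$

and, whenever it exists, $\lim_{j \to \infty} u_j$.

Before we prove Corollary 8.9 let us stress again that expressions of the type $\sup_{j \in \mathbb{N}} u_j$ or $u_j \xrightarrow{j \to \infty} u$, etc. are always understood in a pointwise, $x$-by-$x$ sense, i.e. they are short for $\sup_{j \in \mathbb{N}} u_j(x) = \sup\{u_j(x) : j \in \mathbb{N}\}$ or $\lim_{j \to \infty} u_j(x) = u(x)$ at each $x$ (or for a specified range). The infimum 'inf' and supremum 'sup' are familiar from calculus. Recall the following useful formula

$$\inf_{j \in \mathbb{N}} u_j(x) = -\sup_{j \in \mathbb{N}} \big(-u_j(x)\big) \tag{8.10}$$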



which allows us to express an inf as a sup, and vice versa. Recall also the definition of the lower resp. upper limits lim inf and lim sup,

$$\liminf_{j \to \infty} u_j(x) = \sup_{k \in \mathbb{N}} \Big(\inf_{j \ge k} u_j(x)\Big) = \lim_{k \to \infty} \Big(\inf_{j \ge k} u_j(x)\Big), \tag{8.11}$$

$$\limsup_{j \to \infty} u_j(x) = \inf_{k \in \mathbb{N}} \Big(\sup_{j \ge k} u_j(x)\Big) = \lim_{k \to \infty} \Big(\sup_{j \ge k} u_j(x)\Big); \tag{8.12}$$

more details can be found in Appendix A. In the extended real line $\bar{\mathbb{R}}$ lim inf and lim sup always exist – but they may attain the values $+\infty$ and $-\infty$ – and we have

$$\liminf_{j \to \infty} u_j(x) \le \limsup_{j \to \infty} u_j(x). \tag{8.13}$$

Moreover, $\lim_{j \to \infty} u_j(x)$ exists [and is finite] if, and only if, upper and lower limits coincide, $\liminf_{j \to \infty} u_j(x) = \limsup_{j \to \infty} u_j(x)$ [and are finite]; in this case all three limits have the same value.

Proof (of Corollary 8.9) We show that $\sup_j u_j$ and $(-1)u = -u$ (for a measurable function $u$) are again measurable. Observe that for all $a \in \mathbb{R}$

$$\Big\{\sup_{j \in \mathbb{N}} u_j > a\Big\} = \bigcup_{j \in \mathbb{N}} \{u_j > a\} \in \mathcal{A}.$$

The inclusion '$\supset$' is trivial since $a < u_j(x) \le \sup_{j \in \mathbb{N}} u_j(x)$ always holds; the direction '$\subset$' follows by contradiction: if $u_j(x) \le a$ for all $j \in \mathbb{N}$, then also $\sup_{j \in \mathbb{N}} u_j(x) \le a$. This proves the measurability of $\sup_{j \in \mathbb{N}} u_j$. If $u$ is measurable, we have for all $a \in \mathbb{R}$

$$\{-u > a\} = \{u < -a\} \in \mathcal{A},$$

which shows that $-u$ is also measurable. The measurability of $\inf_{j \in \mathbb{N}} u_j$, $\liminf_{j \to \infty} u_j$ and $\limsup_{j \to \infty} u_j$ follows now from formulae (8.10)–(8.12), which can be written down in terms of $\sup_j$s and several multiplications by $-1$. If $\lim_{j \to \infty} u_j$ exists, it coincides with $\liminf_{j \to \infty} u_j = \limsup_{j \to \infty} u_j$ and inherits their measurability.

8.10 Corollary Let $u, v$ be $\mathcal{A}/\bar{\mathcal{B}}$-measurable numerical functions. Then the functions

$$u \pm v, \qquad uv, \qquad u \vee v = \max\{u, v\}, \qquad u \wedge v = \min\{u, v\} \tag{8.14}$$

are $\mathcal{A}/\bar{\mathcal{B}}$-measurable (whenever they are defined).

[Figure: graphs of two functions $u$ and $v$ together with $\max\{u, v\}$ and $\min\{u, v\}$.]

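The maximum $u \vee v$ and minimum $u \wedge v$ of two functions is always meant pointwise, i.e.[]

$$(u \vee v)(x) = \max\{u(x), v(x)\} = \tfrac{1}{2}\big(u(x) + v(x) + |u(x) - v(x)|\big),$$

$$(u \wedge v)(x) = \min\{u(x), v(x)\} = \tfrac{1}{2}\big(u(x) + v(x) - |u(x) - v(x)|\big).$$

Proof (of Corollary 8.10) If $u, v \in \mathcal{E}$ are simple functions, all functions in (8.14) are again simple functions[] and, therefore, measurable. For general $u, v \in \mathcal{M}_{\bar{\mathbb{R}}}$ choose sequences $(f_j)_{j \in \mathbb{N}}, (g_j)_{j \in \mathbb{N}} \subset \mathcal{E}$ of simple functions such that $f_j \xrightarrow{j \to \infty} u$ and $g_j \xrightarrow{j \to \infty} v$. The claim now follows from the usual rules for limits.

8.11 Corollary A function $u$ is $\mathcal{A}/\bar{\mathcal{B}}$-measurable if, and only if, $u^{\pm}$ are $\mathcal{A}/\bar{\mathcal{B}}$-measurable.

8.12 Corollary If $u, v$ are $\mathcal{A}/\bar{\mathcal{B}}$-measurable numerical functions, then

$$\{u < v\}, \qquad \{u \le v\}, \qquad \{u = v\}, \qquad \{u \ne v\} \in \mathcal{A}.$$

Let us finally show an interesting result on the structure of $\sigma(T)$-measurable functions; see also Problem 7.11 of the previous chapter.

8.13 Lemma (Factorization lemma) Let $T : (X, \mathcal{A}) \to (X', \mathcal{A}')$ be an $\mathcal{A}/\mathcal{A}'$-measurable map and let $\sigma(T) \subset \mathcal{A}$ be the $\sigma$-algebra generated by $T$. Then $u = w \circ T$ for some $\mathcal{A}'/\bar{\mathcal{B}}$-measurable function $w : X' \to \bar{\mathbb{R}}$ if, and only if, $u : X \to \bar{\mathbb{R}}$ is $\sigma(T)/\bar{\mathcal{B}}$-measurable.

Proof Suppose that $u$ is $\sigma(T)$-measurable. If $u$ is an indicator function, $u = 1_A$ with $A \in \sigma(T)$, we know from the definition of $\sigma(T)$ that $A = T^{-1}(A')$ for some $A' \in \mathcal{A}'$. Thus $u = 1_A = 1_{T^{-1}(A')} = 1_{A'} \circ T$ and $w = 1_{A'}$ will do. This consideration remains true for simple functions $u \in \mathcal{E}(\sigma(T))$ since they are just sums of scalar multiples of indicator functions;[] hence $u = w \circ T$ for a suitable $w \in \mathcal{E}(\mathcal{A}')$.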



We can now use Theorem 8.8 and approximate the $\sigma(T)$-measurable function $u$ by a sequence $(u_j)_{j \in \mathbb{N}} \subset \mathcal{E}(\sigma(T))$. By what was said above, $u_j = w_j \circ T$ for suitable $w_j \in \mathcal{E}(\mathcal{A}')$. Then $w = \limsup_{j \to \infty} w_j$ is measurable by C8.9 and satisfies

$$w \circ T = \Big(\limsup_{j \to \infty} w_j\Big) \circ T \overset{*}{=} \lim_{j \to \infty} (w_j \circ T) = \lim_{j \to \infty} u_j = u,$$

where we used for the equality marked $*$ the fact that the limit $\lim_{j \to \infty} u_j$ exists. The converse, that $u = w \circ T$ is $\sigma(T)$-measurable, is obvious.

Problems

8.1. Show directly that condition (i) of Lemma 8.1 is equivalent to either of (ii), (iii), (iv).

8.2. Verify that $\bar{\mathcal{B}}$ defined in (8.5) is a $\sigma$-algebra. Moreover, prove that $\mathcal{B}(\mathbb{R}) = \mathbb{R} \cap \mathcal{B}(\bar{\mathbb{R}})$.

8.3. Let $(X, \mathcal{A})$ be a measurable space.
(i) Let $f, g : X \to \mathbb{R}$ be measurable functions. Show that for every $A \in \mathcal{A}$ the function $h(x) = f(x)$, if $x \in A$, and $h(x) = g(x)$, if $x \notin A$, is measurable.
(ii) Let $(f_j)_{j \in \mathbb{N}}$ be a sequence of measurable functions and let $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ such that $\bigcup_{j \in \mathbb{N}} A_j = X$. Suppose that $f_j|_{A_j \cap A_k} = f_k|_{A_j \cap A_k}$ for all $j, k \in \mathbb{N}$ and set $f(x) = f_j(x)$ if $x \in A_j$. Show that $f : X \to \mathbb{R}$ is measurable.

8.4. Let $(X, \mathcal{A})$ be a measurable space and let $\mathcal{B} \subsetneq \mathcal{A}$ be a sub-$\sigma$-algebra. Show that $\mathcal{M}(\mathcal{B}) \subsetneq \mathcal{M}(\mathcal{A})$.

8.5. Show that $f \in \mathcal{E}$ implies that $f^{\pm} \in \mathcal{E}$. Is the converse valid?

8.6. Show that for every real-valued function $u = u^+ - u^-$ and $|u| = u^+ + u^-$.

8.7. Scrutinize the proof of Theorem 8.8 and check that bounded [positive] measurable functions $u \in \mathcal{M}$ can be approximated uniformly by an [increasing] sequence $(f_j)_{j \in \mathbb{N}} \subset \mathcal{E}$ of [positive] simple functions.

8.8. Show that every continuous function $u : \mathbb{R} \to \mathbb{R}$ is $\mathcal{B}/\mathcal{B}$-measurable. [Hint: check that for continuous functions $\{f > \alpha\}$ is an open set.]

8.9. Show that $x \mapsto \max\{x, 0\}$ and $x \mapsto \min\{x, 0\}$ are continuous, and by Problem 8.8 or Example 7.3, measurable functions from $\mathbb{R} \to \mathbb{R}$. Conclude that on any measurable space $(X, \mathcal{A})$ positive and negative parts $u^{\pm}$ of a measurable function $u : X \to \mathbb{R}$ are measurable.

8.10. Check that the approximating sequence $(f_j)_{j \in \mathbb{N}}$ for $u$ in Theorem 8.8 consists of $\sigma(u)$-measurable functions.

8.11. Complete the proofs of Corollaries 8.11 and 8.12.

8.12. Let $u : \mathbb{R} \to \mathbb{R}$ be differentiable. Explain why $u$ and $u' = du/dx$ are measurable.

8.13. Find $\sigma(u)$, i.e. the $\sigma$-algebra generated by $u$, for the following functions $f, g, h : \mathbb{R} \to \mathbb{R}$ and $F, G : \mathbb{R}^2 \to \mathbb{R}$:
(i) $f(x) = x$; (ii) $g(x) = x^2$; (iii) $h(x) = |x|$; (iv) $F(x, y) = x + y$; (v) $G(x, y) = x^2 + y^2$.
[Hint: under $f, g, h$ the pre-images of intervals are (unions of) intervals, under $F$ we get strips in the plane, under $G$ annuli and discs.]

8.14. Consider $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ and $u : \mathbb{R} \to \mathbb{R}$. Show that $\{x\} \in \sigma(u)$ for all $x \in \mathbb{R}$ if, and only if, $u$ is injective.

8.15. Let $\lambda$ be one-dimensional Lebesgue measure. Find $\lambda \circ u^{-1}$, if $u(x) = |x|$.

8.16. Let $u : \mathbb{R} \to \mathbb{R}$ be measurable. Which of the following functions are measurable: $u(x - 2)$, $e^{u(x)}$, $\sin(u(x) + 8)$, $u''(x)$, $\operatorname{sgn} u(x - 7)$?

8.17. One can show that there are non-Borel measurable sets $A \subset \mathbb{R}$, cf. Appendix D. Taking this fact for granted, show that measurability of $|u|$ does not, in general, imply the measurability of $u$. (The converse is, of course, true: measurability of $u$ always guarantees that of $|u|$.)

8.18. Show that every increasing function $u : \mathbb{R} \to \mathbb{R}$ is $\mathcal{B}/\mathcal{B}$-measurable. Under which additional condition(s) do we have $\sigma(u) = \mathcal{B}(\mathbb{R})$? [Hint: show that $\{u < \lambda\}$ is an interval by distinguishing three cases: $u$ is continuous and strictly increasing when passing the level $\lambda$, $u$ jumps over the level $\lambda$, $u$ is 'flat' at level $\lambda$. Make a picture of these situations.]

9 Integration of positive functions

Throughout this chapter $(X, \mathcal{A}, \mu)$ will be some measure space. Recall that $\mathcal{M}^+$ [$\mathcal{M}^+_{\bar{\mathbb{R}}}$] are the $\mathcal{A}$-measurable positive real [numerical] functions and $\mathcal{E}$ [$\mathcal{E}^+$] are the [positive] simple functions.

The fundamental idea of integration is to measure the area between the graph of a function and the abscissa. For a positive simple function $f \in \mathcal{E}^+$ in standard representation¹ this is easily done:

$$\text{if} \quad f = \sum_{j=0}^{M} y_j 1_{A_j} \in \mathcal{E}^+ \quad \text{then} \quad \sum_{j=0}^{M} y_j\, \mu(A_j) \tag{9.1}$$

should be the $\mu$-area enclosed by the graph and the abscissa.

[Figure: the area under a simple function $f$; a typical rectangle has height $y_j$ and base of measure $\mu(A_j)$.]

There is only the problem that (9.1) might depend on the particular (standard) representation of $f$ – and this should not happen.

9.1 Lemma Let $\sum_{j=0}^{M} y_j 1_{A_j} = \sum_{k=0}^{N} z_k 1_{B_k}$ be two standard representations of the same function $f \in \mathcal{E}^+$. Then

$$\sum_{j=0}^{M} y_j\, \mu(A_j) = \sum_{k=0}^{N} z_k\, \mu(B_k).$$

¹ In the sense of Definition 8.6. By 8.5(iii) every $f \in \mathcal{E}$ has a standard representation.


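Proof Since $A_0 \cup A_1 \cup \ldots \cup A_M = X = B_0 \cup B_1 \cup \ldots \cup B_N$ (both unions being disjoint) we get the disjoint decompositions

$$A_j = \bigcup_{k=0}^{N} (A_j \cap B_k) \qquad \text{and} \qquad B_k = \bigcup_{j=0}^{M} (B_k \cap A_j).$$

Using the (finite) additivity of $\mu$ we see that

$$\sum_{j=0}^{M} y_j\, \mu(A_j) = \sum_{j=0}^{M} y_j \sum_{k=0}^{N} \mu(A_j \cap B_k) = \sum_{j=0}^{M} \sum_{k=0}^{N} y_j\, \mu(A_j \cap B_k) \tag{9.2}$$

(since all $y_j$ are positive, the above sums always exist in $[0, \infty]$). Similarly,

$$\sum_{k=0}^{N} z_k\, \mu(B_k) = \sum_{k=0}^{N} z_k \sum_{j=0}^{M} \mu(A_j \cap B_k) = \sum_{k=0}^{N} \sum_{j=0}^{M} z_k\, \mu(A_j \cap B_k). \tag{9.3}$$

But $y_j = z_k$ whenever $A_j \cap B_k \ne \emptyset$, while for $A_j \cap B_k = \emptyset$ we have $\mu(A_j \cap B_k) = \mu(\emptyset) = 0$. Thus

$$y_j\, \mu(A_j \cap B_k) = z_k\, \mu(A_j \cap B_k) \qquad \forall\, (j, k),$$

and (9.2) and (9.3) have the same value.

Lemma 9.1 justifies the following definition based on (9.1).

9.2 Definition Let $f = \sum_{j=0}^{M} y_j 1_{A_j} \in \mathcal{E}^+$ be a simple function in standard representation. Then the number

$$I_\mu(f) = \sum_{j=0}^{M} y_j\, \mu(A_j) \in [0, \infty]$$

(which is independent of the representation of $f$) is called the ($\mu$-)integral of $f$.

9.3 Properties (of $I_\mu : \mathcal{E}^+ \to [0, \infty]$). Let $f, g \in \mathcal{E}^+$. Then

(i) $I_\mu(1_A) = \mu(A)$ $\forall\, A \in \mathcal{A}$;
(ii) $I_\mu(\lambda f) = \lambda\, I_\mu(f)$ $\forall\, \lambda \ge 0$; (positive homogeneous)
(iii) $I_\mu(f + g) = I_\mu(f) + I_\mu(g)$; (additive)
(iv) $f \le g \implies I_\mu(f) \le I_\mu(g)$. (monotone)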
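The independence of $I_\mu(f)$ from the chosen standard representation can be checked by hand on a finite set. The sketch below is an illustration under stated assumptions: the measure `mu` on $X = \{0, 1, 2, 3\}$, the two representations and the helper `integral_simple` are hypothetical examples, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def integral_simple(rep, mu):
    """I_mu(f) = sum_j y_j * mu(A_j) for a representation [(y_j, A_j), ...]
    with values y_j >= 0 and pairwise disjoint sets A_j covering X."""
    total = Fraction(0)
    for y, A in rep:
        total += y * sum((mu[x] for x in A), Fraction(0))
    return total

# Hypothetical measure on X = {0, 1, 2, 3}, given by its point masses mu({x}).
mu = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(2), 3: Fraction(1, 6)}

# Two standard representations of f = 3 on {0, 1}, 5 on {2}, 0 on {3}.
rep_a = [(Fraction(0), {3}), (Fraction(3), {0, 1}), (Fraction(5), {2})]
rep_b = [(Fraction(0), {3}), (Fraction(3), {0}), (Fraction(3), {1}), (Fraction(5), {2})]

value_a = integral_simple(rep_a, mu)
value_b = integral_simple(rep_b, mu)
```

Both `value_a` and `value_b` equal $3 \cdot \tfrac{5}{6} + 5 \cdot 2 = \tfrac{25}{2}$, in line with Lemma 9.1: splitting $\{0, 1\}$ into $\{0\}$ and $\{1\}$ only refines the partition.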

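Proof (i) and (ii) are obvious from the definition of $I_\mu$.

(iii): take standard representations

$$f = \sum_{j=0}^{M} y_j 1_{A_j} \qquad \text{and} \qquad g = \sum_{k=0}^{N} z_k 1_{B_k}$$

and observe that, as in Example 8.5(iv),

$$f + g = \sum_{j=0}^{M} \sum_{k=0}^{N} (y_j + z_k)\, 1_{A_j \cap B_k} \in \mathcal{E}^+$$

is a standard representation of $f + g$. Thus

$$\begin{aligned}
I_\mu(f + g) &= \sum_{j=0}^{M} \sum_{k=0}^{N} (y_j + z_k)\, \mu(A_j \cap B_k) \\
&= \sum_{j=0}^{M} y_j \sum_{k=0}^{N} \mu(A_j \cap B_k) + \sum_{k=0}^{N} z_k \sum_{j=0}^{M} \mu(A_j \cap B_k) \\
&\overset{(9.2),(9.3)}{=} \sum_{j=0}^{M} y_j\, \mu(A_j) + \sum_{k=0}^{N} z_k\, \mu(B_k) \\
&= I_\mu(f) + I_\mu(g).
\end{aligned}$$

(iv): If $f \le g$, then $g = f + (g - f)$ where $g - f \in \mathcal{E}^+$, see Examples 8.7(iv). By part (iii) of this proof, $I_\mu(g) = I_\mu(f) + I_\mu(g - f) \ge I_\mu(f)$ since $I_\mu(\bullet)$ is positive.

In Theorem 8.8 we have seen that every $u \in \mathcal{M}^+_{\bar{\mathbb{R}}}$ can be written as an increasing limit of simple functions; by Corollary 8.9, suprema of simple functions are again measurable, so that

$$u \in \mathcal{M}^+_{\bar{\mathbb{R}}} \iff u = \sup_{j \in \mathbb{N}} f_j, \quad f_j \in \mathcal{E}^+, \quad f_j \le f_{j+1} \le \ldots$$

We will use this to 'inscribe' simple functions (which we know how to integrate) below the graph of a positive measurable function $u$ and exhaust the $\mu$-area below $u$.

9.4 Definition Let $(X, \mathcal{A}, \mu)$ be a measure space. The ($\mu$-)integral of a positive numerical function $u \in \mathcal{M}^+_{\bar{\mathbb{R}}}$ is given by

$$\int u\, d\mu = \sup\big\{I_\mu(g) : g \le u,\ g \in \mathcal{E}^+\big\} \in [0, \infty]. \tag{9.4}$$

If we need to emphasize the integration variable, we also write $\int u(x)\, \mu(dx)$ or $\int u(x)\, d\mu(x)$.

The key observation is that the integral $\int \ldots\, d\mu$ extends $I_\mu$, i.e.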


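9.5 Lemma For all $f \in \mathcal{E}^+$ we have $\int f\, d\mu = I_\mu(f)$.

Proof Let $f \in \mathcal{E}^+$. Since $f \le f$, $f$ is an admissible function in the supremum appearing in (9.4), hence

$$I_\mu(f) \le \sup\big\{I_\mu(g) : g \le f,\ g \in \mathcal{E}^+\big\} \overset{\text{def}}{=} \int f\, d\mu.$$

On the other hand, $\mathcal{E}^+ \ni g \le f$ implies that $I_\mu(g) \le I_\mu(f)$ by Properties 9.3(iv), and

$$\int f\, d\mu \overset{\text{def}}{=} \sup\big\{I_\mu(g) : g \le f,\ g \in \mathcal{E}^+\big\} \le I_\mu(f).$$

The next result is the first of several convergence theorems. It shows, in particular, that we could have defined (9.4) using any increasing sequence $f_j \uparrow u$ of simple functions $f_j \in \mathcal{E}^+$.

9.6 Theorem (Beppo Levi) Let $(X, \mathcal{A}, \mu)$ be a measure space. For an increasing sequence of numerical functions $(u_j)_{j \in \mathbb{N}} \subset \mathcal{M}^+_{\bar{\mathbb{R}}}$, $0 \le u_j \le u_{j+1} \le \ldots$, we have $u = \sup_{j \in \mathbb{N}} u_j \in \mathcal{M}^+_{\bar{\mathbb{R}}}$ and

$$\int \sup_{j \in \mathbb{N}} u_j\, d\mu = \sup_{j \in \mathbb{N}} \int u_j\, d\mu. \tag{9.5}$$

Note that we can write $\lim_{j \to \infty}$ instead of $\sup_{j \in \mathbb{N}}$ in (9.5) since the supremum of an increasing sequence is its limit. Moreover, (9.5) holds in $[0, +\infty]$, i.e. the case '$+\infty = +\infty$' is possible.

Proof (of Theorem 9.6) That $u \in \mathcal{M}^+_{\bar{\mathbb{R}}}$ follows from Corollary 8.9.

Step 1. Claim: $u, v \in \mathcal{M}^+_{\bar{\mathbb{R}}}$, $u \le v \implies \int u\, d\mu \le \int v\, d\mu$.
This follows from the monotonicity of the supremum since every simple $f \in \mathcal{E}^+$ with $f \le u$ also satisfies $f \le v$, and so

$$\int u\, d\mu = \sup\big\{I_\mu(f) : f \le u,\ f \in \mathcal{E}^+\big\} \le \sup\big\{I_\mu(f) : f \le v,\ f \in \mathcal{E}^+\big\} = \int v\, d\mu.$$

Step 2. Claim: $\sup_{j \in \mathbb{N}} \int u_j\, d\mu \le \int \sup_{j \in \mathbb{N}} u_j\, d\mu$; this shows '$\ge$' in (9.5).
Because of step 1 and $u_j \le u = \sup_{j \in \mathbb{N}} u_j$ we see

$$\int u_j\, d\mu \le \int u\, d\mu \qquad \forall\, j \in \mathbb{N}.$$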


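The right-hand side is independent of $j$, so that we may take the supremum over all $j \in \mathbb{N}$ on the left.

Step 3. Claim: $f \le u$, $f \in \mathcal{E}^+ \implies I_\mu(f) \le \sup_{j \in \mathbb{N}} \int u_j\, d\mu$.
This will prove '$\le$' in (9.5) since the right-hand side does not depend on $f$ and so we may take the supremum over all $f \in \mathcal{E}^+$ with $f \le u$ on the left (which is, by definition, the integral $\int u\, d\mu$). To prove the claim we fix some $f \in \mathcal{E}^+$, $f \le u$. Since $u = \sup_{j \in \mathbb{N}} u_j$ we can find[] for every $\alpha \in (0, 1)$ and every $x \in X$ some $N(x, \alpha) \in \mathbb{N}$ with

$$\alpha f(x) \le u_j(x) \qquad \forall\, j \ge N(x, \alpha),$$

which means that the sets $B_j = \{\alpha f \le u_j\}$ increase as $j \uparrow \infty$ towards $X$ and are, by Corollary 8.12, measurable as $\alpha f, u_j \in \mathcal{M}^+_{\bar{\mathbb{R}}}$. By the very definition of the $B_j$,

$$\alpha 1_{B_j} f \le 1_{B_j} u_j \le u_j,$$

and, if $f = \sum_{k=0}^{M} y_k 1_{A_k}$, we get from Lemma 9.5 and step 1

$$\alpha \sum_{k=0}^{M} y_k\, \mu(A_k \cap B_j) = I_\mu(\alpha 1_{B_j} f) \le \int u_j\, d\mu \le \sup_{j \in \mathbb{N}} \int u_j\, d\mu. \tag{9.6}$$

At this point we use the $\sigma$-additivity of $\mu$ (in the guise of T4.4(iii)) to get

$$\mu(A_k \cap B_j) \uparrow \mu(A_k \cap X) = \mu(A_k) \qquad \text{as } B_j \uparrow X,\ j \uparrow \infty,$$

which implies (the far right of (9.6) no longer depends on $j$)

$$\alpha\, I_\mu(f) = \alpha \sum_{k=0}^{M} y_k\, \mu(A_k) \le \sup_{j \in \mathbb{N}} \int u_j\, d\mu.$$

Since we were free in our choice of $\alpha \in (0, 1)$, we can make $\alpha \to 1$, and the claim and the theorem follow.

One can see the next corollary just as a special case of Theorem 9.6. Its true meaning, however, is that it allows us to calculate the integral of a measurable function using any approximating sequence of elementary functions, and this is a considerable simplification of the original definition (9.4).

9.7 Corollary Let $u \in \mathcal{M}^+_{\bar{\mathbb{R}}}$. Then

$$\int u\, d\mu = \lim_{j \to \infty} \int f_j\, d\mu$$

holds for every increasing sequence $(f_j)_{j \in \mathbb{N}} \subset \mathcal{E}^+$ with $\lim_{j \to \infty} f_j = u$.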
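Corollary 9.7 can be watched at work on a finite measure space, where every integral is a finite sum. The sketch below is only an illustration: the point masses `alpha`, the function `u` and the helper names are hypothetical, and the approximating sequence is the dyadic one from the proof of Theorem 8.8 (round down to multiples of $2^{-j}$, cap at $j$).

```python
from fractions import Fraction
from math import floor

def f_j(u_value, j):
    """j-th dyadic simple approximation of a positive function, evaluated at one point."""
    if u_value >= j:
        return Fraction(j)
    return Fraction(floor(u_value * 2**j), 2**j)

# Hypothetical finite measure space: X = {1, 2, 3} with mu({k}) = alpha[k].
alpha = {1: Fraction(1), 2: Fraction(1, 2), 3: Fraction(3)}
u = {1: Fraction(1, 3), 2: Fraction(7, 5), 3: Fraction(5, 2)}

def integral_of_approximation(j):
    """I_mu(f_j) = sum_k f_j(k) * mu({k}); f_j is simple because X is finite."""
    return sum(f_j(u[k], j) * alpha[k] for k in u)

exact = sum(u[k] * alpha[k] for k in u)  # integral of u with respect to mu
approximations = [integral_of_approximation(j) for j in range(1, 21)]
```

The list `approximations` is non-decreasing and bounded above by `exact`; once $j$ exceeds the largest value of $u$, the gap is at most $\mu(X) \cdot 2^{-j}$, so the integrals of the $f_j$ converge to $\int u\, d\mu$ as the corollary asserts.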


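9.8 Properties (of the integral) Let $u, v \in \mathcal{M}^+_{\bar{\mathbb{R}}}$. Then

(i) $\int 1_A\, d\mu = \mu(A)$ $\forall\, A \in \mathcal{A}$;
(ii) $\int \alpha u\, d\mu = \alpha \int u\, d\mu$ $\forall\, \alpha \ge 0$; (positive homogeneous)
(iii) $\int (u + v)\, d\mu = \int u\, d\mu + \int v\, d\mu$; (additive)
(iv) $u \le v \implies \int u\, d\mu \le \int v\, d\mu$. (monotone)

Proof (i) follows from Properties 9.3(i) and Lemma 9.5 and (ii), (iii) follow from the corresponding properties of $I_\mu$, Corollary 9.7 and the usual rules for limits. (iv) has been proved in step 1 of the proof of Theorem 9.6.

9.9 Corollary Let $(u_j)_{j \in \mathbb{N}} \subset \mathcal{M}^+_{\bar{\mathbb{R}}}$. Then $\sum_{j=1}^{\infty} u_j$ is measurable and we have

$$\int \sum_{j=1}^{\infty} u_j\, d\mu = \sum_{j=1}^{\infty} \int u_j\, d\mu \tag{9.7}$$

(including the possibility $+\infty = +\infty$).

Proof Set $s_M = u_1 + u_2 + \cdots + u_M$ and apply Properties 9.8(iii) and T9.6.

9.10 Examples Let $(X, \mathcal{A})$ be a measurable space.

(i) Let $\mu = \delta_y$ be the Dirac measure for fixed $y \in X$. Then

$$\int u\, d\delta_y = \int u(x)\, \delta_y(dx) = u(y) \qquad \forall\, u \in \mathcal{M}^+_{\bar{\mathbb{R}}}.$$

Indeed: for any $f \in \mathcal{E}^+$ with standard representation $f = \sum_{j=0}^{M} \phi_j 1_{A_j}$, we know that $y \in X$ lies in exactly one of the $A_j$, say $y \in A_{j_0}$. Then

$$\int f(x)\, \delta_y(dx) = \int \sum_{j=0}^{M} \phi_j 1_{A_j}(x)\, \delta_y(dx) = \sum_{j=0}^{M} \phi_j\, \delta_y(A_j) = \phi_{j_0} = f(y).$$

Now take any sequence of simple functions $f_k \uparrow u$. By Corollary 9.7

$$\int u(x)\, \delta_y(dx) = \lim_{k \to \infty} \int f_k(x)\, \delta_y(dx) = \lim_{k \to \infty} f_k(y) = u(y).$$

(ii) Let $(X, \mathcal{A}, \mu) = \big(\mathbb{N}, \mathcal{P}(\mathbb{N}), \sum_{j=1}^{\infty} \alpha_j \delta_j\big)$. As we have seen in Problem 4.6(ii), $\mu$ is indeed a measure and $\mu(\{k\}) = \alpha_k$. On the other hand, all measurable functions $u \in \mathcal{M}^+_{\bar{\mathbb{R}}}$ are of the form

$$u(k) = \sum_{j=1}^{\infty} u_j 1_{\{j\}}(k) \qquad \forall\, k \in \mathbb{N}$$

for a suitable sequence $(u_j)_{j \in \mathbb{N}} \subset [0, \infty]$.² Thus by Corollary 9.9,

$$\int u\, d\mu = \int \sum_{j=1}^{\infty} u_j 1_{\{j\}}\, d\mu = \sum_{j=1}^{\infty} \int u_j 1_{\{j\}}\, d\mu = \sum_{j=1}^{\infty} u_j\, \mu(\{j\}) = \sum_{j=1}^{\infty} u_j \alpha_j.$$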

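We close this chapter with another convergence theorem due to P. Fatou and which is often called Fatou's lemma.

9.11 Theorem (Fatou) Let $(u_j)_{j \in \mathbb{N}} \subset \mathcal{M}^+_{\bar{\mathbb{R}}}$ be a sequence of positive measurable numerical functions. Then $u = \liminf_{j \to \infty} u_j$ is measurable and

$$\int \liminf_{j \to \infty} u_j\, d\mu \le \liminf_{j \to \infty} \int u_j\, d\mu.$$

Proof Recall that $\liminf_{j \to \infty} u_j = \sup_{k \in \mathbb{N}} \inf_{j \ge k} u_j$ always exists in $\bar{\mathbb{R}}$; the measurability of lim inf was shown in C8.9. Applying T9.6 to the increasing sequence $\big(\inf_{j \ge k} u_j\big)_{k \in \mathbb{N}}$ – which is in $\mathcal{M}^+_{\bar{\mathbb{R}}}$ by C8.9 – we find

$$\int \liminf_{j \to \infty} u_j\, d\mu \overset{9.6}{=} \sup_{k \in \mathbb{N}} \int \Big(\inf_{j \ge k} u_j\Big) d\mu \overset{9.8\text{(iv)}}{\le} \sup_{k \in \mathbb{N}} \inf_{\ell \ge k} \int u_\ell\, d\mu = \liminf_{\ell \to \infty} \int u_\ell\, d\mu,$$

where we used that $\inf_{j \ge k} u_j \le u_\ell$ for all $\ell \ge k$ and the monotonicity of the integral, cf. Properties 9.8(iv).

Problems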

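9.1. Let $f : X \to \mathbb{R}$ be a positive simple function of the form $f(x) = \sum_{j=1}^{m} \xi_j 1_{A_j}(x)$, $\xi_j \ge 0$, $A_j \in \mathcal{A}$, but not necessarily disjoint. Show that $I_\mu(f) = \sum_{j=1}^{m} \xi_j\, \mu(A_j)$. [Hint: use additivity and positive homogeneity of $I_\mu$.]

9.2. Complete the proof of Properties 9.8 (of the integral).

9.3. Find an example showing that an 'increasing sequence of functions' is, in general, different from a 'sequence of increasing functions'.

² This means that we can identify $\mathcal{P}(\mathbb{N})$-measurable functions $f : \mathbb{N} \to [0, \infty]$ and arbitrary sequences $(u_j)_{j \in \mathbb{N}} \subset [0, \infty]$ by $u(k) = u_k$.

9.4. Complete the proof of Corollary 9.9 and show that (9.7) is actually equivalent to (9.5) in Beppo Levi's theorem 9.6.

9.5. Let $(X, \mathcal{A}, \mu)$ be a measure space and $u \in \mathcal{M}^+$. Show that the set-function $A \mapsto \int 1_A u\, d\mu$, $A \in \mathcal{A}$, is a measure.

9.6. Prove: Every function $u : \mathbb{N} \to \mathbb{R}$ on $(\mathbb{N}, \mathcal{P}(\mathbb{N}))$ is measurable.

9.7. Let $(X, \mathcal{A})$ be a measurable space and $(\mu_j)_{j \in \mathbb{N}}$ be a sequence of measures thereon. Set, as in 9.10(ii), $\mu = \sum_{j \in \mathbb{N}} \mu_j$. By Problem 4.6(ii) this is again a measure. Show that

$$\int u\, d\mu = \sum_{j \in \mathbb{N}} \int u\, d\mu_j \qquad \forall\, u \in \mathcal{M}^+.$$

[Instructions: (1) consider $u = 1_A$. (2) consider $u = f \in \mathcal{E}^+$. (3) approximate $u \in \mathcal{M}^+$ by an increasing sequence of simple functions and use Theorem 9.6. To interchange increasing limits/suprema use the hint to Problem 4.6(ii).]

9.8. Reverse Fatou lemma. Let $(X, \mathcal{A}, \mu)$ be a measure space and $(u_j)_{j \in \mathbb{N}} \subset \mathcal{M}^+$. If $u_j \le u$ for all $j \in \mathbb{N}$ and some $u \in \mathcal{M}^+$ with $\int u\, d\mu < \infty$, then

$$\limsup_{j \to \infty} \int u_j\, d\mu \le \int \limsup_{j \to \infty} u_j\, d\mu.$$

9.9. Fatou's lemma for measures. Let $(X, \mathcal{A}, \mu)$ be a measure space and let $(A_j)_{j \in \mathbb{N}}$, $A_j \in \mathcal{A}$, be a sequence of measurable sets. We set

$$\liminf_{j \to \infty} A_j = \bigcup_{k \in \mathbb{N}} \bigcap_{j \ge k} A_j \qquad \text{and} \qquad \limsup_{j \to \infty} A_j = \bigcap_{k \in \mathbb{N}} \bigcup_{j \ge k} A_j. \tag{9.8}$$

(i) Prove that $1_{\liminf_{j \to \infty} A_j} = \liminf_{j \to \infty} 1_{A_j}$ and $1_{\limsup_{j \to \infty} A_j} = \limsup_{j \to \infty} 1_{A_j}$. [Hint: check first that $1_{\bigcap_{j \in \mathbb{N}} A_j} = \inf_{j \in \mathbb{N}} 1_{A_j}$ and $1_{\bigcup_{j \in \mathbb{N}} A_j} = \sup_{j \in \mathbb{N}} 1_{A_j}$.]
(ii) Prove that $\mu\big(\liminf_{j \to \infty} A_j\big) \le \liminf_{j \to \infty} \mu(A_j)$.
(iii) Prove that $\limsup_{j \to \infty} \mu(A_j) \le \mu\big(\limsup_{j \to \infty} A_j\big)$ if $\mu$ is a finite measure.
(iv) Provide an example showing that (iii) fails if $\mu$ is not finite.

9.10. Let $(A_j)_{j \in \mathbb{N}} \subset \mathcal{A}$ be a sequence of disjoint sets such that $\bigcup_{j \in \mathbb{N}} A_j = X$. Show that for every $u \in \mathcal{M}^+$

$$\int u\, d\mu = \sum_{j=1}^{\infty} \int 1_{A_j} u\, d\mu.$$

Use this to construct on a $\sigma$-finite measure space $(X, \mathcal{A}, \mu)$ a function $w$ which satisfies $w(x) > 0$ for all $x \in X$ and $\int w\, d\mu < \infty$.

9.11. Kernels. Let $(X, \mathcal{A}, \mu)$ be a measure space. A map $N : X \times \mathcal{A} \to [0, \infty]$ is called kernel if

$A \mapsto N(x, A)$ is a measure for every $x \in X$,
$x \mapsto N(x, A)$ is a measurable function for every $A \in \mathcal{A}$.

(i) Show that $\mathcal{A} \ni A \mapsto \mu N(A) = \int N(x, A)\, \mu(dx)$ is a measure on $(X, \mathcal{A})$.
(ii) For $u \in \mathcal{M}^+$ define $Nu(x) = \int u(y)\, N(x, dy)$. Show that $u \mapsto Nu$ is additive, positive homogeneous and $Nu(\bullet) \in \mathcal{M}^+$.
(iii) Let $\mu N$ be the measure introduced in (i). Show that $\int u\, d(\mu N) = \int Nu\, d\mu$ for all $u \in \mathcal{M}^+$.
[Hint: consider in each part of this problem first indicator functions $u = 1_A$, then simple functions $u \in \mathcal{E}^+$ and then approximate $u \in \mathcal{M}^+$ by simple functions using 8.8 and 9.6.]

9.12. (Continuation of Problem 6.1) Consider on $\mathbb{R}$ the $\sigma$-algebra $\Sigma$ of all Borel sets which are symmetric w.r.t. the origin. Set $A^+ = A \cap [0, \infty)$, $A^- = (-\infty, 0] \cap A$ and consider their symmetrizations $\widetilde{A}^{\pm} = A^{\pm} \cup (-A^{\pm}) \in \Sigma$. Show that for every $u \in \mathcal{M}^+(\Sigma)$ with $0 \le u \le 1$ and for every measure $\mu$ on $(\mathbb{R}, \Sigma)$ the set-function

$$\mathcal{B}(\mathbb{R}) \ni A \mapsto \int 1_{\widetilde{A}^+} u\, d\mu + \int 1_{\widetilde{A}^-} (1 - u)\, d\mu$$

is a measure on $\mathcal{B}(\mathbb{R})$ that extends $\mu$. Why does this not contradict the uniqueness theorem 5.7 for measures?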

10 Integrals of measurable functions and null sets

Throughout this chapter $(X, \mathcal{A}, \mu)$ will be a measure space. Let us briefly review how we constructed the integral for positive measurable functions $u \in \mathcal{M}^+_{\bar{\mathbb{R}}}$. Guided by the idea that the integral should be the area between the graph of a function and the $x$-axis, we defined for indicator functions $\int 1_A\, d\mu = \mu(A)$ and extended this definition by linearity to all positive simple functions $\mathcal{E}^+$ which are just linear combinations of indicator functions (there was an issue about well-definedness which was addressed in L9.1). Since all positive measurable functions can be obtained as increasing limits of simple functions (Theorem 8.8), we could then define the integral of $u \in \mathcal{M}^+_{\bar{\mathbb{R}}}$ by exhausting the area below $u$ with elementary functions $f \le u$, see Definition 9.4. Beppo Levi's theorem (in the form of C9.7) finally allowed us to replace the sup by an increasing limit. The integral turned out to be positive homogeneous, additive and monotone.

We want to extend this integral now to not necessarily positive measurable functions $u \in \mathcal{M}_{\bar{\mathbb{R}}}$ by linearity. The fundamental observation here is that

$$u \in \mathcal{M}_{\bar{\mathbb{R}}} \iff u = u^+ - u^-, \quad u^+, u^- \in \mathcal{M}^+_{\bar{\mathbb{R}}}$$

(cf. Corollary 8.11). This remark suggests the following definition.

10.1 Definition A function $u : X \to \bar{\mathbb{R}}$ on a measure space $(X, \mathcal{A}, \mu)$ is said to be ($\mu$-)integrable, if it is $\mathcal{A}/\bar{\mathcal{B}}$-measurable and if the integrals $\int u^+\, d\mu$, $\int u^-\, d\mu < \infty$ are finite. In this case we call

$$\int u\, d\mu = \int u^+\, d\mu - \int u^-\, d\mu \in (-\infty, \infty) \tag{10.1}$$

the ($\mu$-)integral of $u$. We write $\mathcal{L}^1(\mu)$ [$\mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$] for the set of all real-valued [numerical] $\mu$-integrable functions.

In case we need to exhibit the integration variable, we write

$$\int u\, d\mu = \int u(x)\, \mu(dx) = \int u(x)\, d\mu(x).$$

If $\mu = \lambda^n$, we call $\int u\, d\lambda^n$ the ($n$-dimensional) Lebesgue integral and $u \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\lambda^n)$ is said to be Lebesgue integrable.¹ Traditionally one writes $\int u(x)\, dx$ or $\int u\, dx$ for the formally more correct $\int u\, d\lambda^n$. If we want to stress $X$ or $\mathcal{A}$, etc., we will also write $\mathcal{L}^1_{\bar{\mathbb{R}}}(X)$ or $\mathcal{L}^1_{\bar{\mathbb{R}}}(\mathcal{A})$, etc.

10.2 Remark In the definition of the integral for positive $u \in \mathcal{M}^+_{\bar{\mathbb{R}}}$ we did allow that $\int u\, d\mu = \infty$. Since we want to avoid the case '$\infty - \infty$' in (10.1), we impose the finiteness condition $\int u^{\pm}\, d\mu < \infty$. In particular, a positive function is said to be integrable only if the integral is finite:

$$u \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu),\ u \ge 0 \iff u \in \mathcal{M}^+_{\bar{\mathbb{R}}} \text{ and } \int u\, d\mu < \infty$$

(which is clear since for positive functions $u^+ = u$ and $u^- = 0$).

Caution: Some authors call $u$ $\mu$-integrable (in the wide sense) whenever $\int u^+\, d\mu - \int u^-\, d\mu$ makes sense in $\bar{\mathbb{R}}$, i.e. whenever it is not of the form '$\infty - \infty$'. We will not use this convention.

Let us briefly summarize the most important integrability criteria.

10.3 Theorem Let $u \in \mathcal{M}_{\bar{\mathbb{R}}}$. Then the following conditions are equivalent:

(i) $u \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$;
(ii) $u^+, u^- \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$;
(iii) $|u| \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$;
(iv) $\exists\, w \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$, $w \ge 0$ such that $|u| \le w$.

Proof (i)⇔(ii): this is just the definition of integrability.
(ii)⇒(iii): since $|u| = u^+ + u^-$, we can use additivity of the integral on $\mathcal{M}^+_{\bar{\mathbb{R}}}$, see 9.8(iii), to get $\int |u|\, d\mu = \int u^+\, d\mu + \int u^-\, d\mu < \infty$.
(iii)⇒(iv): take $w = |u|$.
(iv)⇒(i): we have to show that $u^{\pm} \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$. Since $u^{\pm} \le |u| \le w$ we find by the monotonicity of the integral 9.8(iv) that $\int u^{\pm}\, d\mu \le \int w\, d\mu < \infty$.

¹ The letter $\mathcal{L}$ is in honour of H. Lebesgue who was one of the pioneers of modern integration theory. If $\mu$ is other than $\lambda^n$, $\int \ldots\, d\mu$ is sometimes called the abstract Lebesgue integral.


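It is now easy to see that the properties 9.8 of the integral on $\mathcal{M}^+_{\bar{\mathbb{R}}}$ extend to the set $\mathcal{L}^1_{\bar{\mathbb{R}}}$:

10.4 Theorem Let $(X, \mathcal{A}, \mu)$ be a measure space and $u, v \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$, $\alpha \in \mathbb{R}$. Then

(i) $\alpha u \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$ and $\int \alpha u\, d\mu = \alpha \int u\, d\mu$; (homogeneous)
(ii) $u + v \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$ and $\int (u + v)\, d\mu = \int u\, d\mu + \int v\, d\mu$ (whenever $u + v$ is defined); (additive)
(iii) $\min\{u, v\}, \max\{u, v\} \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$; (lattice property)
(iv) $u \le v \implies \int u\, d\mu \le \int v\, d\mu$; (monotone)
(v) $\big|\int u\, d\mu\big| \le \int |u|\, d\mu$. (triangle inequality)

Proof There are principally two ways to prove this theorem: either we consider positive and negative parts for (i)–(v) and show that their integrals are finite, or we use T10.3(iii), (iv). Doing this we find

(i) $|\alpha u| = |\alpha| \cdot |u| \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$ by 9.8(ii).
(ii) $|u + v| \le |u| + |v| \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$ by 9.8(iii).
(iii) $|\max\{u, v\}| \le |u| + |v| \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$ and $|\min\{u, v\}| \le |u| + |v| \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$ by 9.8(iii).
(iv) If $u \le v$, we find that $u^+ \le v^+$ and $v^- \le u^-$. Thus

$$\int u\, d\mu = \int u^+\, d\mu - \int u^-\, d\mu \overset{9.8\text{(iv)}}{\le} \int v^+\, d\mu - \int v^-\, d\mu = \int v\, d\mu.$$

(v) Using $\pm u \le |u|$ we deduce from (iv) that

$$\Big|\int u\, d\mu\Big| = \max\Big\{\int u\, d\mu,\ -\int u\, d\mu\Big\} \le \max\Big\{\int |u|\, d\mu,\ \int |{-u}|\, d\mu\Big\} = \int |u|\, d\mu.$$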
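On a finite measure space the definition (10.1) and the properties in Theorem 10.4 reduce to statements about finite sums, which makes them easy to check. The following sketch is illustrative only; the measure `mu`, the functions `u`, `v` and the helper `integrate` are hypothetical.

```python
from fractions import Fraction

def integrate(u, mu):
    """Integral of a real-valued u on a finite set, computed as in (10.1)
    from the positive part u^+ and the negative part u^-."""
    plus = sum(max(u[x], 0) * mu[x] for x in u)    # integral of u^+
    minus = sum(-min(u[x], 0) * mu[x] for x in u)  # integral of u^-
    return plus - minus

# Hypothetical measure space X = {'a', 'b', 'c'} with point masses mu({x}).
mu = {'a': Fraction(1, 2), 'b': Fraction(2), 'c': Fraction(1, 4)}
u = {'a': Fraction(3), 'b': Fraction(-1), 'c': Fraction(4)}
v = {'a': Fraction(-2), 'b': Fraction(5), 'c': Fraction(0)}

u_plus_v = {x: u[x] + v[x] for x in u}
abs_u = {x: abs(u[x]) for x in u}
```

Here `integrate(u, mu)` is $\tfrac{1}{2}$ while `integrate(abs_u, mu)` is $\tfrac{9}{2}$, an instance of the triangle inequality (v), and `integrate(u_plus_v, mu)` equals `integrate(u, mu) + integrate(v, mu)` as additivity (ii) requires.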

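10.5 Remark If $\alpha u(x) \pm \beta v(x)$ is defined in $\bar{\mathbb{R}}$ for all $x \in X$ – i.e. if we can exclude '$\infty - \infty$' – then T10.4(i),(ii) just say that the integral is linear:

$$\int (\alpha u + \beta v)\, d\mu = \alpha \int u\, d\mu + \beta \int v\, d\mu, \qquad \alpha, \beta \in \mathbb{R}. \tag{10.2}$$

This is always true for real-valued $u, v \in \mathcal{L}^1(\mu)$, i.e. $\mathcal{L}^1(\mu)$ is a vector space with addition and scalar ($\mathbb{R}$) multiplication defined by

$$(u + v)(x) = u(x) + v(x), \qquad (\alpha \cdot u)(x) = \alpha \cdot u(x),$$

and

$$\int \ldots\, d\mu : \mathcal{L}^1(\mu) \to \mathbb{R}, \qquad u \mapsto \int u\, d\mu$$

is a positive linear functional.

10.6 Examples Let us reconsider the examples from 9.10:

(i) On $(X, \mathcal{A}, \delta_y)$, $y \in X$ fixed, we have $\int u(x)\, \delta_y(dx) = u(y)$ and

$$u \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\delta_y) \iff u \in \mathcal{M}_{\bar{\mathbb{R}}} \text{ and } |u(y)| < \infty.$$

(ii) On $\big(\mathbb{N}, \mathcal{P}(\mathbb{N}), \mu = \sum_{j=1}^{\infty} \alpha_j \delta_j\big)$ every $u : \mathbb{N} \to \mathbb{R}$ is measurable, cf. Problem 9.6. From 9.10(ii) we know that $\int |u|\, d\mu = \sum_{j=1}^{\infty} \alpha_j |u(j)|$, so that

$$u \in \mathcal{L}^1(\mu) \iff \sum_{j=1}^{\infty} \alpha_j |u(j)| < \infty.$$

If $\alpha_1 = \alpha_2 = \ldots = 1$, $\mathcal{L}^1(\mu)$ is called the set of summable sequences and customarily denoted by $\ell^1(\mathbb{N}) = \big\{(x_j)_{j \in \mathbb{N}} \subset \mathbb{R} : \sum_{j=1}^{\infty} |x_j| < \infty\big\}$. This space is important in functional analysis.

(iii) Let $(\Omega, \mathcal{A}, P)$ be a probability space. Then every bounded measurable function ('random variable') $\xi \in \mathcal{M}(\mathcal{A})$, $C = \sup_{\omega \in \Omega} |\xi(\omega)| < \infty$, is integrable. This follows immediately from

$$\int |\xi|\, dP \le \int \sup_{\omega \in \Omega} |\xi(\omega)|\, P(d\omega) = C \int P(d\omega) = C < \infty.$$

Caution: Not every $P$-integrable function is bounded.[]

For $A \in \mathcal{A}$ and $u \in \mathcal{M}^+_{\bar{\mathbb{R}}}$ [or $\mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$] we know from 8.5(i) and C8.10 [and 10.3(iv) using $|1_A u| \le |u|$] that $1_A u$ is again measurable [or integrable].

10.7 Definition Let $(X, \mathcal{A}, \mu)$ be a measure space and $u \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$ or $u \in \mathcal{M}^+_{\bar{\mathbb{R}}}(\mathcal{A})$. Then

$$\int_A u\, d\mu = \int 1_A u\, d\mu = \int 1_A(x) u(x)\, \mu(dx) \qquad \forall\, A \in \mathcal{A}.$$

Of course, $\int_X u\, d\mu = \int u\, d\mu$.

10.8 Lemma On the measure space $(X, \mathcal{A}, \mu)$ let $u \in \mathcal{M}^+$. The set-function

$$\mathcal{A} \ni A \mapsto \nu(A) = \int_A u\, d\mu = \int 1_A u\, d\mu$$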


is a measure on $(X, \mathcal{A})$. It is called the measure with density (function) $u$ with respect to $\mu$ and denoted by $\nu = u\mu$.

Proof Exercise.

If $\nu$ has a density w.r.t. $\mu$, one writes traditionally $d\nu/d\mu$ for the density function. This notation is to be understood in a purely symbolical way; it is motivated by the well-known fundamental theorem of integral and differential calculus (for Riemann integrals)

$$u(b) - u(a) = \int_a^b u'(x)\, dx,$$

where $u' = du/dx$; in our notation, $u' = d(u'\lambda^1)/d\lambda^1$. At least if $u'(x) \ge 0$ one can show that $\nu[a, b) = u(b) - u(a)$ defines a measure and that, taking the fundamental theorem of integral calculus for granted, $\nu = u'\lambda^1 = u'\, dx$, compare with Problem 7.9. A more advanced discussion of derivatives can be found in Chapter 19, Theorem 19.20 and Appendix E.16–E.19.

Null sets and the 'a.e.'

We will now discuss the behaviour of integrable functions on null sets which we have already encountered in Problem 4.10. Let $(X, \mathcal{A}, \mu)$ be a measure space. A ($\mu$-)null set $N \in \mathcal{N}_\mu$ is a measurable set $N \in \mathcal{A}$ satisfying

$$N \in \mathcal{N}_\mu \iff N \in \mathcal{A} \text{ and } \mu(N) = 0. \tag{10.3}$$

If a property $\Pi = \Pi(x)$ is true for all $x \in X$ apart from some $x$ contained in a null set $N \in \mathcal{N}_\mu$, we say that $\Pi(x)$ holds for ($\mu$-)almost all (a.a.) $x \in X$ or that $\Pi$ holds ($\mu$-)almost everywhere (a.e.). In other words,

$$\Pi \text{ holds a.e.} \iff \{x : \Pi(x) \text{ fails}\} \subset N \in \mathcal{N}_\mu,$$

but we do not a priori require that the set $\{\Pi \text{ fails}\}$ is itself measurable. Typically we are interested in properties $\Pi(x)$ of the type: $u(x) = v(x)$, $u(x) \le v(x)$, etc. and we say, for example,

$$u = v \text{ a.e.} \iff \{x : u(x) \ne v(x)\} \text{ is (contained in) a } \mu\text{-null set.}$$
Caution: The assertions ‘u enjoys a property a.e.’ and ‘u is a.e. equal to v which satisfies everywhere’ are, in general, far apart; see in this connection Problem 10.14.


81

10.9 Theorem Let u ∈ 1¯ be a numerical integrable function on a measure space X . Then (i) (ii) N

u d = 0 ⇐⇒ u = 0 a.e. ⇐⇒ u = 0 = 0; u d = 0 ∀ N ∈ .

Proof Let us begin with (ii). Obviously, min u j ↑ u as j ↑ . By Beppo Levi’s theorem 9.6 we find 10.4(v) u d = 1N u d 1N u d N 9.6 = sup 1N min u j d sup j 1N d j∈

j∈

= sup j 1N d = sup j N = 0 j∈ j∈ =0

The second equivalence in (i) is clear since, due to the measurability of u, the set u = 0 is not just a subset of a null set, but measurable, hence a proper null set. In order to see ‘⇐’ of the first equivalence, we use (ii) with N = u = 0:

u d = =

u=0

u=0

u d + u d +

u=0

u=0

u d (ii)

0 d = 0

For ‘⇒’ we use the so-called Markov inequality: for A ∈ and c > 0 we have u c ∩ A =

1 uc∩A x dx

c 1 uc x dx A c 1 ux 1 uc x dx c A 1 ux dx c A =

(10.4)

82

R.L. Schilling

and for A = X this inequality implies that 4.6 [] u > 0 =

u 1j u 1j j∈

j∈

j u d = 0 j∈

=0

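10.10 Corollary Let $u, v \in \mathcal{M}_{\bar{\mathbb{R}}}$ such that $u = v$ $\mu$-almost everywhere. Then

(i) $u, v \ge 0 \implies \int u\, d\mu = \int v\, d\mu$;²
(ii) $u \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu) \implies v \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$ and $\int u\, d\mu = \int v\, d\mu$.

Proof Since $u, v$ are measurable, $N = \{u \ne v\} \in \mathcal{N}_\mu$. Therefore (i) follows from

$$\begin{aligned}
\int u\, d\mu &= \int_{N^c} u\, d\mu + \int_N u\, d\mu \\
&\overset{10.9\text{(i)}}{=} \int_{N^c} u\, d\mu + 0 \\
&= \int_{N^c} v\, d\mu + 0 \qquad \text{(use that } u = v \text{ on } N^c\text{)} \\
&\overset{10.9\text{(i)}}{=} \int_{N^c} v\, d\mu + \int_N v\, d\mu = \int v\, d\mu.
\end{aligned}$$

For (ii) we observe first that $u = v$ a.e. implies that $u^{\pm} = v^{\pm}$ a.e. and then apply (i) to positive and negative parts: $\int v^{\pm}\, d\mu = \int u^{\pm}\, d\mu < \infty$; the claim follows.

10.11 Corollary If $u \in \mathcal{M}_{\bar{\mathbb{R}}}$ and $v \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$, $v \ge 0$, then

$$|u| \le v \text{ a.e.} \implies u \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu).$$

Proof We have $u^{\pm} \le |u| \le v$ a.e., and by C10.10 $\int u^{\pm}\, d\mu \le \int v\, d\mu < \infty$. This shows that $u$ is integrable.

10.12 Proposition (Markov inequality) For all $u \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$, $A \in \mathcal{A}$ and $c > 0$

$$\mu(\{|u| \ge c\} \cap A) \le \frac{1}{c} \int_A |u|\, d\mu, \tag{10.5}$$

and if $A = X$, in particular,

$$\mu(\{|u| \ge c\}) \le \frac{1}{c} \int |u|\, d\mu. \tag{10.6}$$

Proof See (10.4) in the proof of Theorem 10.9(i).

² including, possibly, $+\infty = +\infty$.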
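The Markov inequality (10.6) is easy to test numerically on a finite measure space. The sketch below is an illustration under assumptions: the measure `mu`, the function `u` and the helper `markov_sides` are hypothetical, and exact fractions are used so that the comparison involves no rounding.

```python
from fractions import Fraction

def markov_sides(u, mu, c):
    """Return (mu({|u| >= c}), (1/c) * integral of |u| d mu) on a finite measure space."""
    left = sum(mu[x] for x in u if abs(u[x]) >= c)
    right = sum(abs(u[x]) * mu[x] for x in u) / Fraction(c)
    return left, right

# Hypothetical data: X = {0, 1, 2} with point masses mu({x}) and a signed function u.
mu = {0: Fraction(1), 1: Fraction(2), 2: Fraction(1, 2)}
u = {0: Fraction(-3), 1: Fraction(1, 2), 2: Fraction(4)}

left, right = markov_sides(u, mu, 3)
```

With $c = 3$ the left-hand side is $\mu(\{0, 2\}) = \tfrac{3}{2}$ and the right-hand side is $\tfrac{1}{3} \cdot 6 = 2$, so the bound holds with room to spare; it is typically far from sharp.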


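10.13 Corollary If $u \in \mathcal{L}^1_{\bar{\mathbb{R}}}(\mu)$, then $u$ is almost everywhere $\mathbb{R}$-valued. In particular, we can find a version $\tilde{u} \in \mathcal{L}^1(\mu)$ such that $\tilde{u} = u$ a.e. and $\int \tilde{u}\, d\mu = \int u\, d\mu$.

Proof Set $N = \{|u| = \infty\} = \{u = +\infty\} \cup \{u = -\infty\} \in \mathcal{A}$ (a disjoint union). Now

$$N = \bigcap_{j \in \mathbb{N}} \{|u| \ge j\}$$

and by 3.4(iii′)³ and the Markov inequality we get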

1 u d = 0 N = lim u j lim j→ j→ j 0 and whenever the expressions involved make sense/are finite, then: 1 (i) u > c u d; c 1 (ii) u > c p up d for all 0 < p < ; c



1 u d for an increasing function + → + ; c 1 (iv) u u d ; 1 (v) u < c u d for a decreasing function + → + ; c √ 1 (vi) P X − EX VX 2 , where P is a probability space and, in probabilistic jargon, X is a random variable (i.e. a measurable function X → ), EX = X dP the expectation or mean value and VX = X − EX2 dP the variance. Remark. This is Chebyshev’s inequality. 10.6. Show that up d < implies that u is a.e. real-valued (in the sense − valued!). Is this still true if we have arctanu d < ?. 10.7. Let Aj j∈ ⊂ be a sequence of pairwise disjoint sets. Show that (iii) u c

u 1j Aj ∈ 1 ⇐⇒ u 1An ∈ 1 and

j=1 Aj

u d <

10.8. Generalized Fatou lemma. Assume that uj j∈ ⊂ 1 . Prove: (i) If uj v for all j ∈ and some v ∈ 1 , then lim inf uj d lim inf uj d j→

j→

(ii) If uj w for all j ∈ and some w ∈ 1 , then lim sup uj d lim sup uj d j→

j→

(iii) Find examples that show that the upper and lower bounds in (i) and (ii) are necessary. [Hint: mimic and scrutinize the proof of Fatou’s Lemma 9.11 especially when it comes to the application of Beppo Levi’s theorem. What goes wrong if we do not have this upper/lower bound? Note that we have an ‘invisible’ v = 0 in T9.11.] 10.9. Let P be a probability space. Show that for u ∈ u ∈ 1 P

⇐⇒

P u j <

j=0

10.10. Independence (2). Let P be a probability space. Recall the notion of independence of two -algebras ⊂ introduced in Problem 5.10. Show that u ∈ + and w ∈ + satisfy uw dP = u dP · w dP


R.L. Schilling and that for u ∈ and w ∈ u ∈ 1

and w ∈ 1 ⇒ uw ∈ 1

Find an example proving that this fails if and are not independent. [Hint: start with simple functions and use Beppo Levi’s theorem 9.6.] 10.11. Completion (3). Let X ∗ ¯ be the completion of X – cf. Problems 4.13, 6.2. (i) Show that for every f ∗ ∈ +∗ there are f g ∈ + with f f ∗ g and f = g = 0 as well as f d = f ∗ d ¯ = g d. ∗ ∗ (ii) u X → is -measurable if, and only if, there exist -measurable ¯ with u u∗ w and u = w -a.e. functions u w X → ∗ 1 (iii) If ¯ then u w from (ii) can be chosen from 1 such that u ∈ , u d = u∗ d ¯ = w d. [Hint: (i) use Problem 4.13(v). (ii) for ‘⇒’ consider the sets u∗ > and use 4.13(v). The other direction is harder. For this consider first step functions using again 4.13(v) and then general functions by monotone convergence. (iii) by 4.13(iii), = ¯ on , and thus f d = f d ¯ for -measurable f .] 10.12. Completion (4). Inner measure and outer measure. Let X be a finite measure space. Define for every E ⊂ X the outer resp. inner measure ∗ E = inf A A ∈ A ⊃ E

and

∗ E = sup A A ∈ A ⊂ E

(i) Show that ∗ E ∗ E

∗ E + ∗ E c = X

∗ E ∪ F ∗ E + ∗ F

∗ E + ∗ F ∗ E ∪ F

(ii) For every E ⊂ X there exist sets E∗ E ∗ ∈ such that E∗ = ∗ E and E ∗ = ∗ E. [Hint: use the definition of the infimum to find sets E n ⊃ E such that E n − ∗ E n1 and consider n E n ∈ .] (iii) Show that ∗ = E ⊂ X ∗ E = ∗ E is a -algebra and that it is the completion of w.r.t. . Conclude, in particular, that ∗ ∗ = ∗ ∗ = ¯ if ¯ is the completion of . 10.13. Let X be a measure space and u ∈ . Assume that u ∈ and u = w almost everywhere w.r.t. . When can we say that w ∈ ? 10.14. ‘a.e.’ is a tricky business. When working with ‘a.e.’ properties one has to be extremely careful. For example, the assertions ‘u is continuous a.e.’ and ‘u is a.e. equal to an (everywhere) continuous function’ are far apart! Illustrate this by considering the functions u = 1 and u = 10 . 10.15. Let be a -finite measure on the measurable space X . Show that there exists a finite measure P on X such that = P , i.e. and P have the same null sets.



10.16. Construct an example showing that for u w ∈ + the equality G u d = w d for all G ∈ does not necessarily imply that u = w almost everywhere. G [Hint: In view of 10.14 cannot be -finite. Consider on the measure = m 1 where m = 1 x1 + 1 x>1 , u ≡ 1 and w = 1 x1 +2 1 x>1 . Then all Borel subsets of x > 1 have either -measure 0 or +, thus B u d = w d for all B ∈ while u = w = .] B

11 Convergence theorems and their applications

Throughout this chapter $(X, \mathcal{A}, \mu)$ will be some measure space.

One of the shortfalls of the Riemann integral is the fact that we do not have sufficiently general results that allow us to interchange limits and integrals; typically one has to assume uniform convergence for this. This has partly to do with the fact that the set of Riemann integrable functions is somewhat limited, see Theorem 11.8. The classical counterexample for this defect is Dirichlet's jump function $x \mapsto 1_{\mathbb{Q}\cap[0,1]}(x)$, which is not Riemann integrable since its upper function is $1_{[0,1]}$ while the lower function is $0 \cdot 1_{[0,1]}$.

For the Lebesgue integral on $\mathcal{M}^+_{\bar{\mathbb{R}}}(\mathcal{A})$ we have already seen more powerful convergence results in the form of Beppo Levi's theorem 9.6 or Fatou's lemma 9.11. They can deal with Dirichlet's jump function: for any enumeration $\mathbb{Q} = \{q_j : j \in \mathbb{N}\}$ we get
\[ \int 1_{\mathbb{Q}\cap[0,1]}\,d\lambda^1 = \int \sup_{N\in\mathbb{N}} 1_{\{q_1,\dots,q_N\}\cap[0,1]}\,d\lambda^1 \overset{9.6}{=} \sup_{N\in\mathbb{N}} \int 1_{\{q_1,\dots,q_N\}\cap[0,1]}\,d\lambda^1 = \sup_{N\in\mathbb{N}} \underbrace{\lambda^1\{q_j \in [0,1] : 1 \le j \le N\}}_{=0} = 0. \]

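In this chapter we study systematically convergence theorems for $\mathcal{L}^1(\mu)$ and some of their most important applications. The first is a generalization of Beppo Levi's theorem 9.6.

11.1 Theorem (Monotone convergence). Let $(X, \mathcal{A}, \mu)$ be a measure space.
(i) Let $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^1(\mu)$ be an increasing sequence of integrable functions $u_1 \le u_2 \le \dots$ with limit $u = \sup_{j\in\mathbb{N}} u_j$. Then $u \in \mathcal{L}^1(\mu)$ if, and only if, $\sup_{j\in\mathbb{N}} \int u_j\,d\mu < +\infty$, in which case
\[ \sup_{j\in\mathbb{N}} \int u_j\,d\mu = \int \sup_{j\in\mathbb{N}} u_j\,d\mu. \]
(ii) Let $(v_k)_{k\in\mathbb{N}} \subset \mathcal{L}^1(\mu)$ be a decreasing sequence of integrable functions $v_1 \ge v_2 \ge \dots$ with limit $v = \inf_{k\in\mathbb{N}} v_k$. Then $v \in \mathcal{L}^1(\mu)$ if, and only if, $\inf_{k\in\mathbb{N}} \int v_k\,d\mu > -\infty$, in which case
\[ \inf_{k\in\mathbb{N}} \int v_k\,d\mu = \int \inf_{k\in\mathbb{N}} v_k\,d\mu. \]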

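Proof Obviously, (i) implies (ii) as $u_j = -v_j$ fulfils all the assumptions of (i). To see (i), we remark that $u_j - u_1 \in \mathcal{L}^1(\mu)$ defines an increasing sequence of positive functions
\[ 0 \le u_j - u_1 \le u_{j+1} - u_1 \le \dots \]
for which we may use the Beppo Levi theorem 9.6:
\[ 0 \le \sup_{j\in\mathbb{N}} \int (u_j - u_1)\,d\mu = \int \sup_{j\in\mathbb{N}} (u_j - u_1)\,d\mu. \tag{11.1} \]
Assume that $u \in \mathcal{L}^1(\mu)$. Since the 'sup' in (11.1) stands for an increasing limit, we find that
\[ \sup_{j\in\mathbb{N}} \int u_j\,d\mu = \int (u - u_1)\,d\mu + \int u_1\,d\mu \overset{(10.2)}{=} \int u\,d\mu - \int u_1\,d\mu + \int u_1\,d\mu = \int u\,d\mu < \infty. \]
Conversely, if $\sup_{j\in\mathbb{N}} \int u_j\,d\mu < \infty$, we see from (11.1) that $u - u_1 \in \mathcal{L}^1(\mu)$ and, as $u_1 \in \mathcal{L}^1(\mu)$, $u = (u - u_1) + u_1 \in \mathcal{L}^1(\mu)$ by (10.2). Therefore, (11.1) implies
\[ \int u\,d\mu = \int (u - u_1)\,d\mu + \int u_1\,d\mu = \sup_{j\in\mathbb{N}} \int u_j\,d\mu < \infty. \]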

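One of the most useful and versatile convergence theorems is the following.

11.2 Theorem (Lebesgue. Dominated convergence). Let $(X, \mathcal{A}, \mu)$ be a measure space and $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^1(\mu)$ be a sequence of functions such that $|u_j| \le w$ for all $j \in \mathbb{N}$ and some $w \in \mathcal{L}^1_+(\mu)$. If $u(x) = \lim_{j\to\infty} u_j(x)$ exists for almost every $x \in X$, then $u \in \mathcal{L}^1(\mu)$ and we have
(i) $\lim_{j\to\infty} \int |u_j - u|\,d\mu = 0$;
(ii) $\lim_{j\to\infty} \int u_j\,d\mu = \int \lim_{j\to\infty} u_j\,d\mu = \int u\,d\mu$.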

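Proof Since all $u_j$ are measurable, $N = \{x : \lim_j u_j(x) \text{ does not exist}\}$ is measurable, hence $N \in \mathcal{N}_\mu$, and we can assume that $N = \emptyset$ as the integral over the null set $N$ gives no contribution, cf. Theorem 10.9(ii); alternatively we could consider $1_{N^c} u$ and $1_{N^c} u_j$ instead of $u$ and $u_j$. From $|u_j| \le w$ we get $|u| = \lim_{j\to\infty} |u_j| \le w$, and $u \in \mathcal{L}^1(\mu)$ by C10.11(iv). Therefore,
\[ \Big| \int u_j\,d\mu - \int u\,d\mu \Big| = \Big| \int (u_j - u)\,d\mu \Big| \overset{10.4(v)}{\le} \int |u_j - u|\,d\mu, \]
which means that (i) implies (ii). Since
\[ |u_j - u| \le |u_j| + |u| \le 2w \qquad \forall j \in \mathbb{N}, \]
we get $2w - |u_j - u| \ge 0$ and Fatou's lemma 9.11 tells us that
\[ \int 2w\,d\mu = \int \liminf_{j\to\infty} \big(2w - |u_j - u|\big)\,d\mu \le \liminf_{j\to\infty} \int \big(2w - |u_j - u|\big)\,d\mu = \int 2w\,d\mu - \limsup_{j\to\infty} \int |u_j - u|\,d\mu.^1 \]
Thus $0 \le \liminf_{j\to\infty} \int |u_j - u|\,d\mu \le \limsup_{j\to\infty} \int |u_j - u|\,d\mu \le 0$, and consequently $\lim_{j\to\infty} \int |u_j - u|\,d\mu = 0$.

¹ Recall that $\liminf_{j\to\infty} (-x_j) = -\limsup_{j\to\infty} x_j$.

11.3 Remark The uniform boundedness assumption
\[ |u_j| \le w \qquad \forall j \in \mathbb{N} \text{ and some } w \in \mathcal{L}^1_+(\mu) \tag{11.2} \]
is very important for Theorem 11.2. To see this, consider $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \lambda^1)$ and set
\[ u_j(x) = j\,1_{[0,1/j]}(x) \xrightarrow{\ j\to\infty\ } \infty \cdot 1_{\{0\}}(x) \overset{\text{a.e.}}{=} 0, \]
whereas $\int u_j\,d\lambda = j \cdot \tfrac{1}{j} = 1 \ne 0 = \int \infty \cdot 1_{\{0\}}\,d\lambda$. The only obvious possibility to weaken (11.2) would be to require it to hold only almost everywhere.

Lebesgue's theorem gives merely sufficient (but easily verifiable) conditions for the interchange of limits and integrals; the ultimate version for such a result with necessary and sufficient conditions will be given in the form of Vitali's convergence theorem 16.6 in Chapter 16 below.

∗ ∗ ∗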
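The counterexample of Remark 11.3 can be traced by hand or, as in the following minimal Python sketch, with exact rational arithmetic. The helper names are hypothetical. The last function integrates the envelope $\sup_j u_j$, which equals $\lfloor 1/x \rfloor$ on $(0,1]$, over $(1/(n+1), 1]$; the result is a harmonic sum, which suggests why no integrable majorant $w$ can exist.

```python
from fractions import Fraction


def u(j, x):
    """u_j = j * indicator of [0, 1/j], evaluated at the point x."""
    return j if 0 <= x <= Fraction(1, j) else 0


def integral_u(j):
    """Exact Lebesgue integral of u_j: height j times length 1/j."""
    return j * Fraction(1, j)


def envelope_integral(n):
    """Integral of sup_j u_j over (1/(n+1), 1].

    On (1/(k+1), 1/k] the envelope equals k, so the integral is
    sum_{k=1}^{n} k * (1/k - 1/(k+1)) = sum_{k=1}^{n} 1/(k+1).
    """
    return sum(k * (Fraction(1, k) - Fraction(1, k + 1)) for k in range(1, n + 1))


if __name__ == "__main__":
    x = Fraction(1, 10)
    print([u(j, x) for j in (5, 10, 11, 100)])   # eventually 0 at the fixed point x
    print([integral_u(j) for j in (1, 10, 100)])  # the integrals stay equal to 1
    print(float(envelope_integral(200)))          # grows like log(n)
```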



Let us now have a look at two of the most important applications of the convergence theorems.

Parameter-dependent integrals

Again $(X, \mathcal{A}, \mu)$ is some measure space.

11.4 Theorem (Continuity lemma) Let $\emptyset \ne (a,b) \subset \mathbb{R}$ be a non-degenerate open interval and $u : (a,b) \times X \to \mathbb{R}$ be a function satisfying
(a) $x \mapsto u(t,x)$ is in $\mathcal{L}^1(\mu)$ for every fixed $t \in (a,b)$;
(b) $t \mapsto u(t,x)$ is continuous for every fixed $x \in X$;
(c) $|u(t,x)| \le w(x)$ for all $(t,x) \in (a,b) \times X$ and some $w \in \mathcal{L}^1_+(\mu)$.
Then the function $v : (a,b) \to \mathbb{R}$ given by
\[ t \mapsto v(t) = \int u(t,x)\,\mu(dx) \tag{11.3} \]
is continuous.

Proof Let us, first of all, remark that (11.3) is well-defined thanks to assumption (a). We are going to show that for any $t \in (a,b)$ and every sequence $(t_j)_{j\in\mathbb{N}} \subset (a,b)$ with $\lim_{j\to\infty} t_j = t$ we have $\lim_{j\to\infty} v(t_j) = v(t)$. This proves continuity of $v$ at the point $t$. Because of (b), $u(\bullet, x)$ is continuous and, therefore,
\[ u_j(x) = u(t_j, x) \xrightarrow{\ j\to\infty\ } u(t,x) \quad\text{and}\quad |u_j(x)| \le w(x) \qquad \forall x \in X. \]
Thus we can use Lebesgue's dominated convergence theorem, and conclude
\[ \lim_{j\to\infty} v(t_j) = \lim_{j\to\infty} \int u(t_j, x)\,\mu(dx) = \int \lim_{j\to\infty} u(t_j, x)\,\mu(dx) = \int u(t,x)\,\mu(dx) = v(t). \]

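A very similar consideration leads to

11.5 Theorem (Differentiability lemma) Let $\emptyset \ne (a,b) \subset \mathbb{R}$ be a non-degenerate open interval and $u : (a,b) \times X \to \mathbb{R}$ be a function satisfying
(a) $x \mapsto u(t,x)$ is in $\mathcal{L}^1(\mu)$ for every fixed $t \in (a,b)$;
(b) $t \mapsto u(t,x)$ is differentiable for every fixed $x \in X$;
(c) $|\partial_t u(t,x)| \le w(x)$ for all $(t,x) \in (a,b) \times X$ and some $w \in \mathcal{L}^1_+(\mu)$.
Then the function $v : (a,b) \to \mathbb{R}$ given by
\[ t \mapsto v(t) = \int u(t,x)\,\mu(dx) \tag{11.4} \]
is differentiable and its derivative is²
\[ \partial_t v(t) = \int \partial_t u(t,x)\,\mu(dx). \tag{11.5} \]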

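Proof Let $t \in (a,b)$ and fix some sequence $(t_j)_{j\in\mathbb{N}} \subset (a,b)$ such that $t_j \ne t$ and $\lim_{j\to\infty} t_j = t$. Set
\[ u_j(x) = \frac{u(t_j, x) - u(t,x)}{t_j - t} \xrightarrow{\ j\to\infty\ } \partial_t u(t,x), \]
which shows, in particular, that $x \mapsto \partial_t u(t,x)$ is measurable. By the mean value theorem of differential calculus and (c) we see for some intermediate value $\theta = \theta(j,x) \in (a,b)$
\[ |u_j(x)| = \big| \partial_t u(t,x)\big|_{t=\theta} \big| \le w(x) \qquad \forall j \in \mathbb{N}. \]
Thus $u_j \in \mathcal{L}^1(\mu)$, and the sequence $(u_j)_{j\in\mathbb{N}}$ satisfies all conditions of the dominated convergence theorem 11.2. Finally,
\[ \partial_t v(t) = \lim_{j\to\infty} \frac{v(t_j) - v(t)}{t_j - t} = \lim_{j\to\infty} \int \frac{u(t_j, x) - u(t,x)}{t_j - t}\,\mu(dx) = \lim_{j\to\infty} \int u_j(x)\,\mu(dx) \overset{11.2}{=} \int \lim_{j\to\infty} u_j(x)\,\mu(dx) = \int \partial_t u(t,x)\,\mu(dx). \]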
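To see formula (11.5) at work numerically, consider the hypothetical example $u(t,x) = \sin(tx)$ on $X = [0,1]$ with Lebesgue measure. Here $|\partial_t u(t,x)| = |x\cos(tx)| \le 1$, so the constant function $w \equiv 1$ is an integrable majorant. The Python sketch below compares a finite-difference derivative of $v$ with the integral of $\partial_t u$; the midpoint rule and the step sizes are assumptions made for illustration only.

```python
import math


def midpoint(f, a, b, n=2000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def v(t):
    """v(t) = integral over [0, 1] of sin(t x) dx."""
    return midpoint(lambda x: math.sin(t * x), 0.0, 1.0)


def integral_of_dt_u(t):
    """Integral over [0, 1] of the t-derivative x * cos(t x)."""
    return midpoint(lambda x: x * math.cos(t * x), 0.0, 1.0)


def central_difference(f, t, h=1e-4):
    """Symmetric difference quotient approximating f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)


def exact_derivative(t):
    """Closed form of v'(t) for v(t) = (1 - cos t) / t, t != 0."""
    return (t * math.sin(t) - (1.0 - math.cos(t))) / t ** 2


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0):
        print(t, central_difference(v, t), integral_of_dt_u(t), exact_derivative(t))
```

The three printed columns should agree up to discretization error, which is what (11.5) predicts for this example.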

Later in this chapter we will give examples of how to apply the continuity and differentiability lemmas.

Riemann vs. Lebesgue integration

From here to the end of this chapter we choose $(X, \mathcal{A}, \mu) = (\mathbb{R}, \mathcal{B}(\mathbb{R}), \lambda)$.

² This formula is very effectively remembered as '$\partial_t \int \dots = \int \partial_t \dots$'.



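Let us briefly recall the definition of the Riemann integral (see Appendix E for a more detailed discussion). Consider on the finite interval $[a,b] \subset \mathbb{R}$ the partitions
\[ \pi = \{a = t_0 < t_1 < \dots < t_k = b\}, \]
define for a given function $u : [a,b] \to \mathbb{R}$
\[ m_j = \inf_{x \in [t_{j-1}, t_j]} u(x), \qquad M_j = \sup_{x \in [t_{j-1}, t_j]} u(x), \qquad j = 1, 2, \dots, k, \]
and introduce the lower resp. upper Darboux sums
\[ S_\pi[u] = \sum_{j=1}^k m_j (t_j - t_{j-1}) \quad\text{resp.}\quad S^\pi[u] = \sum_{j=1}^k M_j (t_j - t_{j-1}). \]

11.6 Definition A bounded function $u : [a,b] \to \mathbb{R}$ is said to be Riemann integrable, if the values
\[ \int_* u = \sup_\pi S_\pi[u] \quad\text{and}\quad \int^* u = \inf_\pi S^\pi[u] \]
(sup, inf range over all partitions $\pi$ of $[a,b]$) coincide and are finite. Their common value is called the Riemann integral of $u$ and denoted by $(R)\!\int_a^b u(x)\,dx$ or $\int_a^b u(x)\,dx$.

What is going on here? First of all, it is not difficult to see that lower [upper] Darboux sums increase [decrease] if we add points to the partition $\pi$, i.e. the sup [inf] in Definition 11.6 makes sense. Moreover, to $S_\pi[u]$ and $S^\pi[u]$ there correspond simple functions, namely $\sigma_\pi[u]$ and $\Sigma_\pi[u]$ given by
\[ \sigma_\pi[u](x) = \sum_{j=1}^k m_j\,1_{[t_{j-1}, t_j)}(x) \quad\text{and}\quad \Sigma_\pi[u](x) = \sum_{j=1}^k M_j\,1_{[t_{j-1}, t_j)}(x), \]
which satisfy $\sigma_\pi[u](x) \le u(x) \le \Sigma_\pi[u](x)$ and which increase resp. decrease as $\pi$ refines.

[Figure: the step functions $\Sigma_\pi[u]$ and $\sigma_\pi[u]$ above and below the graph of $u$ over the partition points $t_j, t_{j+1}, t_{j+2}, t_{j+3}$.]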
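The behaviour of lower and upper Darboux sums under refinement can be checked on a small example. The following Python sketch assumes an increasing function, for which $m_j = u(t_{j-1})$ and $M_j = u(t_j)$, and uses the hypothetical choice $u(x) = x^2$ on $[0,1]$ with equidistant partitions; exact fractions keep the arithmetic free of rounding.

```python
from fractions import Fraction


def equidistant(a, b, k):
    """Partition a = t_0 < t_1 < ... < t_k = b with equal mesh."""
    return [a + (b - a) * Fraction(j, k) for j in range(k + 1)]


def darboux_sums_increasing(u, partition):
    """Lower and upper Darboux sums of an increasing function u.

    For increasing u the infimum on [t_{j-1}, t_j] is u(t_{j-1})
    and the supremum is u(t_j).
    """
    lower = Fraction(0)
    upper = Fraction(0)
    for left, right in zip(partition, partition[1:]):
        lower += u(left) * (right - left)
        upper += u(right) * (right - left)
    return lower, upper


if __name__ == "__main__":
    square = lambda x: x * x
    for k in (10, 20, 40):
        lo, up = darboux_sums_increasing(square, equidistant(0, 1, k))
        print(k, float(lo), float(up))
```

Since the partition with $2k$ points refines the one with $k$ points, the lower sums increase and the upper sums decrease, squeezing the value $1/3$.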


11.7 Remark The above construction gives the 'usual' integral which is often introduced as the anti-derivative. Unfortunately, this notion of integration is somewhat insufficient. Nice general convergence theorems (such as monotone or dominated convergence) hold only under unnatural restrictions or are not available at all. Moreover, it cannot deal with functions of the type $x \mapsto 1_{\mathbb{Q}\cap[0,1]}(x)$: the smallest upper function is $1_{[0,1]}$ while the largest lower function is identically 0. Thus the Riemann integral of $1_{\mathbb{Q}\cap[0,1]}$ does not exist, whereas by T10.9(ii) the Lebesgue integral $\int 1_{\mathbb{Q}\cap[0,1]}\,d\lambda = 0$.

Roughly speaking, the reason for this is the fact that the Riemann sums partition the domain of the function without taking into account the shape of the function, thus slicing up the area under the function vertically. Lebesgue's approach is exactly the opposite: the domain is partitioned according to the values of the function at hand, leading to a horizontal decomposition of the area.

[Figure: two decompositions of the area under a graph, labelled 'Lebesgue' and '(equidistant) Riemann'.]

There is a beautifully simple connection with Lebesgue integrals which characterizes at the same time the class of Riemann integrable functions. It may come as a surprise that one needs the notion of Lebesgue null sets to understand Riemann's integral completely.

11.8 Theorem Let $u : [a,b] \to \mathbb{R}$ be a measurable function.
(i) If $u$ is Riemann integrable, then $u$ is in $\mathcal{L}^1(\lambda)$ and the Lebesgue and Riemann integrals coincide: $\int_{[a,b]} u\,d\lambda = (R)\!\int_a^b u(x)\,dx$.
(ii) A bounded function $f : [a,b] \to \mathbb{R}$ is Riemann integrable if, and only if, the points in $(a,b)$ where $f$ is discontinuous are a Lebesgue null set.

Caution: Theorem 11.8(ii) is often phrased in the following way: $f$ is Riemann integrable if, and only if, $f$ is (Lebesgue) a.e. continuous. Although correct, this is a dangerous way of putting things since one is led to read this statement (incorrectly) as 'if $f = \phi$ a.e. with $\phi \in C[a,b]$, then $f$ is Riemann integrable'. That this is wrong is easily seen from $f = 1_{\mathbb{Q}\cap[a,b]}$ and $\phi \equiv 0$; see Problem 10.14 and 11.16.



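Proof (of Theorem 11.8) (i) As $u$ is Riemann integrable, we find a sequence of partitions $\pi(j)$ of $[a,b]$ such that
\[ \lim_{j\to\infty} S_{\pi(j)}[u] = \int_* u = \int^* u = \lim_{j\to\infty} S^{\pi(j)}[u]. \]
Without loss of generality we may assume that the partitions are nested, $\pi(j) \subset \pi(j+1) \subset \dots$; otherwise we could switch to the increasing sequence $\pi(1) \cup \dots \cup \pi(j)$ of partitions, where we also observe that the lower [upper] Riemann sums increase [decrease] as the partitions refine. The corresponding simple functions $\sigma_{\pi(j)}[u]$ and $\Sigma_{\pi(j)}[u]$ increase and decrease towards
\[ \sigma[u] = \sup_{j\in\mathbb{N}} \sigma_{\pi(j)}[u] \le u \le \inf_{j\in\mathbb{N}} \Sigma_{\pi(j)}[u] = \Sigma[u], \]
and from the monotone convergence theorem 11.1 we conclude
\[ \int_* u = \lim_{j\to\infty} S_{\pi(j)}[u] = \lim_{j\to\infty} \int_{[a,b]} \sigma_{\pi(j)}[u]\,d\lambda = \int_{[a,b]} \sigma[u]\,d\lambda \tag{11.6} \]
and also
\[ \int^* u = \lim_{j\to\infty} S^{\pi(j)}[u] = \lim_{j\to\infty} \int_{[a,b]} \Sigma_{\pi(j)}[u]\,d\lambda = \int_{[a,b]} \Sigma[u]\,d\lambda. \tag{11.7} \]
In other words $\sigma[u], \Sigma[u] \in \mathcal{L}^1(\lambda)$. Since $u$ is Riemann integrable,
\[ \int_{[a,b]} \underbrace{\big(\Sigma[u] - \sigma[u]\big)}_{\ge 0}\,d\lambda = \int_{[a,b]} \Sigma[u]\,d\lambda - \int_{[a,b]} \sigma[u]\,d\lambda = \int^* u - \int_* u = 0, \]
which implies by Theorem 10.9(i) that $\Sigma[u] = \sigma[u]$ Lebesgue a.e. Thus
\[ \{u \ne \sigma[u]\} \cup \{u \ne \Sigma[u]\} \subset \{\sigma[u] \ne \Sigma[u]\} \in \mathcal{N}_\lambda \tag{11.8} \]
and by Corollary 10.10(ii) we conclude that $u$ is Lebesgue integrable.

(ii) We continue to use the notation from part (i). The set $\Pi = \bigcup_{j\in\mathbb{N}} \pi(j)$ of all partition points is countable, and by Problem 6.5(i),(iii) a Lebesgue null set. If $f$ is Riemann integrable, we can find for $\epsilon > 0$ and each $x \in [a,b] \setminus \Pi$ some $n = n(\epsilon, x) \in \mathbb{N}$ such that for some suitable $t_{j_0 - 1}, t_{j_0} \in \pi(n)$ we have $x \in (t_{j_0-1}, t_{j_0})$ and
\[ \big|\Sigma_{\pi(j)}[f](x) - \Sigma[f](x)\big| + \big|\sigma_{\pi(j)}[f](x) - \sigma[f](x)\big| \le \epsilon \qquad \forall j \ge n. \]
By construction of the Riemann integral, all $x, y \in (t_{j_0-1}, t_{j_0})$ satisfy
\[ |f(x) - f(y)| \le M_{j_0} - m_{j_0} = \Sigma_{\pi(n)}[f](x) - \sigma_{\pi(n)}[f](x) \le \epsilon + \big(\Sigma[f](x) - \sigma[f](x)\big). \]
This inequality shows on the one hand that
\[ \{x : f(x) \text{ is not continuous}\} \subset \Pi \cup \underbrace{\{\Sigma[f] \ne \sigma[f]\}}_{\in\,\mathcal{N}_\lambda \text{ by (i)}} \]
is a null set if $f$ is Riemann integrable. On the other hand, the above inequality shows also that
\[ \{\Sigma[f] = \sigma[f]\} \subset \{x : f(x) \text{ is continuous}\} \cup \Pi, \]
so that (11.6), (11.7) imply $\int^* f = \int_* f$, i.e. $f$ is Riemann integrable.

∗ ∗ ∗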

Let us finally discuss improper Riemann integrals of the type
\[ (R)\!\int_0^\infty u(x)\,dx = \lim_{a\to\infty} (R)\!\int_0^a u(x)\,dx \tag{11.9} \]
provided the limit exists (cf. Appendix E for other types of improper integrals).

11.9 Corollary Let $u : [0,\infty) \to \mathbb{R}$ be a measurable function which is Riemann integrable for every interval $[0,N]$, $N \in \mathbb{N}$. Then $u \in \mathcal{L}^1[0,\infty)$ if, and only if,
\[ \lim_{N\to\infty} (R)\!\int_0^N |u(x)|\,dx < \infty. \tag{11.10} \]
In this case, $(R)\!\int_0^\infty u(x)\,dx = \int_{[0,\infty)} u\,d\lambda$.

Proof Using Theorem 11.8 we see that Riemann integrability of $u$ implies Riemann integrability of $u^\pm$. Moreover,
\[ (R)\!\int_0^N u^\pm(x)\,dx = \int_{[0,N]} u^\pm(x)\,\lambda(dx) = \int u^\pm\,1_{[0,N]}\,d\lambda. \tag{11.11} \]
If $u$ is Riemann integrable and satisfies (11.9) and (11.10), the limit $N \to \infty$ of the left side of (11.11) exists and guarantees that the right-hand side has also a finite limit. The monotone convergence theorem 11.1 together with Theorem 10.3(ii) shows that $u \in \mathcal{L}^1[0,\infty)$.

Conversely, if $u$ is Lebesgue integrable, then so are $u^\pm$, $u\,1_{[0,a]}$ and $u^\pm 1_{[0,a]}$ for every $a > 0$. Since $u$ is Riemann integrable over each interval $[0,N]$, we see from Theorem 11.8 that $u$ and $u^\pm$ are Riemann integrable over each interval $[0,a]$. The monotone convergence theorem 11.1 shows that for every increasing sequence $a_j \uparrow \infty$
\[ \lim_{j\to\infty} \int u^\pm 1_{[0,a_j]}\,d\lambda = \int u^\pm\,d\lambda < \infty, \]
which yields that the limits (11.9), (11.10) exist.



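11.10 Remark We can avoid in T11.8(i) and C11.9 the assumption that $u$ is Borel measurable. If we admit an arbitrary $u$, our proofs show that $u$ is, outside a subset of a null set, equal to the Borel measurable function $\Sigma[u]$; to wit: $\{u \ne \Sigma[u]\} \subset \{\sigma[u] \ne \Sigma[u]\} \in \mathcal{N}_\lambda$, but $\{u \ne \Sigma[u]\}$ is not necessarily measurable. In other words, $u$ becomes automatically measurable w.r.t. the completed Borel $\sigma$-algebra. This entails, of course, that we have to replace the Borel $\sigma$-algebra, $\lambda$ and $\mathcal{L}^1$ with their completed versions, see Problems 4.13, 6.2, 10.11 and 10.12.

11.11 Remark Lebesgue integration does not allow cancellations, but improper Riemann integrals do. More precisely: the limit (11.9) can make sense even if $\lim_{N\to\infty} (R)\!\int_0^N |u(x)|\,dx = \infty$. This is illustrated by the following example, which is typical in the theory of Fourier series:

The function $x \mapsto s(x) = \dfrac{\sin x}{x}$, $x \in (0,\infty)$, is improperly Riemann integrable but not Lebesgue integrable.

For $a > 0$ we can find $N = N(a) \in \mathbb{N}$ such that $N\pi \le a < (N+1)\pi$. Thus
\[ \lim_{a\to\infty} \int_0^a \frac{\sin x}{x}\,dx = \lim_{a\to\infty} \Big( \int_0^{N\pi} \frac{\sin x}{x}\,dx + \int_{N\pi}^a \frac{\sin x}{x}\,dx \Big) = \lim_{N\to\infty} \sum_{j=0}^{N-1} \underbrace{\int_{j\pi}^{(j+1)\pi} \frac{\sin x}{x}\,dx}_{=\,a_j}, \]
where we used
\[ \Big| \lim_{a\to\infty} \int_{N\pi}^a \frac{\sin x}{x}\,dx \Big| \le \lim_{N\to\infty} \int_{N\pi}^{(N+1)\pi} \Big|\frac{\sin x}{x}\Big|\,dx \le \lim_{N\to\infty} \frac{\pi}{N\pi} = 0. \]
Observe that the $a_j$ have alternating signs since
\[ a_j = \int_{j\pi}^{(j+1)\pi} \frac{\sin x}{x}\,dx = \int_0^\pi \frac{\sin(y + j\pi)}{y + j\pi}\,dy = (-1)^j \int_0^\pi \frac{\sin y}{y + j\pi}\,dy, \]
both as Riemann and Lebesgue integrals, by Theorem 11.8. Further,
\[ |a_j| = \int_0^\pi \frac{\sin y}{y + j\pi}\,dy = \int_0^\pi \frac{\sin y}{y} \cdot \frac{y}{y + j\pi}\,dy \le \frac{1}{j+1} \int_0^\pi \frac{\sin y}{y}\,dy \]
and also
\[ |a_j| = \int_0^\pi \frac{\sin y}{y + j\pi}\,dy \ge \int_0^\pi \frac{\sin y}{y + (j+1)\pi}\,dy = |a_{j+1}| \ge \int_0^\pi \frac{\sin y}{\pi + (j+1)\pi}\,dy = \frac{2}{(j+2)\pi}. \]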


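Since the function $y \mapsto \frac{\sin y}{y}$ is continuous and has a finite limit as $y \downarrow 0$, we see that $C = \int_0^\pi \frac{\sin y}{y}\,dy < \infty$, so that
\[ \frac{2/\pi}{j+2} \le |a_{j+1}| \le |a_j| \le \frac{C}{j+1}. \]
This and Leibniz's convergence test prove that the alternating series $\sum_{j=0}^\infty a_j$ converges conditionally but not absolutely, i.e. we get a finite improper Riemann integral, but the Lebesgue integral does not exist.

Examples

As we have seen in this chapter, the Lebesgue integral provides very powerful tools that justify the interchange of limits and integrals. On the other hand, the Riemann theory is quite handy when it comes to calculating the primitive (anti-derivative) of some concrete integrand. Theorem 11.8 tells us when we can switch between these two notions.

11.12 Example Let $f_\alpha(x) = x^\alpha$, $x > 0$ and $\alpha \in \mathbb{R}$. Then
\[ f_\alpha \in \mathcal{L}^1(0,1) \iff \alpha > -1, \qquad f_\alpha \in \mathcal{L}^1[1,\infty) \iff \alpha < -1. \]
We show only the first assertion; the second follows similarly (or, indeed, from C11.9). Since $f_\alpha$ is continuous, it is Borel measurable, and since $f_\alpha \ge 0$ it is enough to show that $\int_{(0,1)} f_\alpha\,d\lambda < \infty$. We find, for $\alpha \ne -1$,
\[ \int_{(0,1)} x^\alpha\,\lambda(dx) \overset{9.6}{=} \lim_{j\to\infty} \int x^\alpha\,1_{[1/j,1)}(x)\,\lambda(dx) \overset{11.8}{=} \lim_{j\to\infty} (R)\!\int_{1/j}^1 x^\alpha\,dx = \lim_{j\to\infty} \Big[ \frac{x^{\alpha+1}}{\alpha+1} \Big]_{1/j}^1 = \lim_{j\to\infty} \frac{1}{\alpha+1} \Big( 1 - \frac{1}{j^{\alpha+1}} \Big), \]
and the last limit is finite if, and only if, $\alpha > -1$. (For $\alpha = -1$ the primitive is $\ln x$ and the limit is infinite as well.)

11.13 Example The function $f(x) = x^\alpha e^{-\beta x}$, $x > 0$, is Lebesgue integrable over $(0,\infty)$ for all $\alpha > -1$ and $\beta > 0$.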



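Measurability of $f$ follows from its continuity. Using the exponential series, we find for all $N \in \mathbb{N}$ and $x > 0$
\[ \frac{(\beta x)^N}{N!} \le \sum_{j=0}^\infty \frac{(\beta x)^j}{j!} = e^{\beta x} \implies e^{-\beta x} \le \frac{N!}{\beta^N}\,x^{-N}. \]
As $e^{-\beta x} \le 1$ for $x > 0$, we obtain the following majorization:
\[ f(x) = x^\alpha e^{-\beta x} \le \underbrace{x^\alpha\,1_{(0,1)}(x)}_{\in\,\mathcal{L}^1(0,1) \text{ if } \alpha > -1 \text{ by Example 11.12}} + \underbrace{\frac{N!}{\beta^N}\,x^{\alpha - N}\,1_{[1,\infty)}(x)}_{\in\,\mathcal{L}^1[1,\infty) \text{ if } \alpha - N < -1 \text{ by Example 11.12}} \in \mathcal{L}^1(0,\infty). \tag{11.12} \]

[…]

\[ \Gamma(t) = \int_{(0,\infty)} x^{t-1} e^{-x}\,dx, \qquad t > 0, \tag{11.13} \]
is called the Gamma function. It has the following properties:
(i) $\Gamma$ is continuous;
(ii) $\Gamma$ is arbitrarily often differentiable; (see Problem 11.13(i))
(iii) $t\,\Gamma(t) = \Gamma(t+1)$, in particular $\Gamma(n+1) = n!$; (see Problem 11.13(ii))
(iv) $\ln \Gamma(t)$ is convex. (see Problem 11.13(iii))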
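Properties (iii) and (iv) are easy to probe numerically before proving them. The sketch below relies on `math.gamma` and `math.lgamma` from the Python standard library as a reference implementation of $\Gamma$ and $\ln\Gamma$; the sample points and helper names are arbitrary assumptions, and a numerical check of this kind is of course no substitute for the proofs.

```python
import math


def functional_equation_defect(t):
    """Relative size of t * Gamma(t) - Gamma(t + 1); should be near 0."""
    return abs(t * math.gamma(t) - math.gamma(t + 1)) / math.gamma(t + 1)


def midpoint_convexity_gap(s, t):
    """(ln Gamma(s) + ln Gamma(t)) / 2 - ln Gamma((s + t) / 2).

    For a convex function this gap is non-negative.
    """
    return 0.5 * (math.lgamma(s) + math.lgamma(t)) - math.lgamma(0.5 * (s + t))


if __name__ == "__main__":
    for t in (0.1, 0.5, 1.7, 3.0, 10.25):
        print(t, functional_equation_defect(t))
    for n in range(6):
        print(n, math.gamma(n + 1), math.factorial(n))
    print(midpoint_convexity_gap(0.3, 4.0))
```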

Example 11.13 shows that the Gamma function is well-defined for all $t > 0$. We prove (i) and (ii) first for every interval $(a,b)$ where $0 < a < b < \infty$. Since both continuity and differentiability are local properties, i.e. they need to be checked locally at each point, (i) and (ii) follow for the half-line if we let $a \to 0$ and $b \to \infty$.

(i) We apply the continuity lemma T11.4. Set $u(t,x) = x^{t-1} e^{-x}$. We have already seen in Example 11.13 that $u(t, \bullet) \in \mathcal{L}^1(0,\infty)$ for all $t > 0$; the continuity of $u(\bullet, x)$ is clear and all that remains is to find a uniform (for $t \in (a,b)$) dominating function. An argument similar to (11.12) gives for $N > b + 1$
\[ x^{t-1} e^{-x} \le x^{t-1}\,1_{(0,1)}(x) + N!\,x^{t-1-N}\,1_{[1,\infty)}(x) \le x^{a-1}\,1_{(0,1)}(x) + N!\,x^{b-1-N}\,1_{[1,\infty)}(x) \le x^{a-1}\,1_{(0,1)}(x) + N!\,x^{-2}\,1_{[1,\infty)}(x). \]
The expression on the right no longer depends on $t$, and is integrable according to Example 11.12. (Note that $N = N(b)$ depends on the fixed interval $(a,b)$, but not on $t$.) This shows that $\Gamma(t) = \int_{(0,\infty)} u(t,x)\,dx$ is continuous for all $t \in (a,b)$.


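(ii) We apply the differentiability lemma T11.5. The integrand $u(t, \bullet)$ is integrable, and $u(\bullet, x)$ is differentiable for fixed $x > 0$. In fact,
\[ \partial_t u(t,x) = \partial_t\,x^{t-1} e^{-x} = x^{t-1} e^{-x} \ln x. \]
We still have to show that $\partial_t u(t,x)$ has an integrable majorant uniformly for all $t \in (a,b)$. First we observe that $\ln x \le x$, thus
\[ |\partial_t u(t,x)| \le x^{t} e^{-x} \le x^{b} e^{-x} \qquad \forall a < t < b,\ x \ge 1. \]
Since $a > 0$, we find some $\epsilon > 0$ with $a - \epsilon - 1 > -1$, so that
\[ |\partial_t u(t,x)| \le x^{a-1-\epsilon} e^{-x} \underbrace{x^\epsilon \ln\tfrac{1}{x}}_{\to\,0 \text{ as}^3\ x \to 0} \le C_\epsilon\,x^{a-1-\epsilon} e^{-x} \qquad \forall a < t < b,\ 0 < x < 1. \]
Combining these calculations, we arrive at
\[ |\partial_t u(t,x)| \le C_\epsilon\,x^{a-1-\epsilon} e^{-x}\,1_{(0,1)}(x) + x^{b} e^{-x}\,1_{[1,\infty)}(x) \qquad \forall a < t < b, \]
which is an integrable majorant (by Examples 11.12, 11.13) independent of $t \in (a,b)$. This shows that $\Gamma(t)$ is differentiable on $(a,b)$, with derivative
\[ \Gamma'(t) = \int_{(0,\infty)} x^{t-1} e^{-x} \ln x\,dx, \qquad t \in (a,b). \]
A similar calculation proves that $\Gamma^{(n)}$ exists for every $n \in \mathbb{N}$; see Problem 11.13.

³ To see this, use $\lim_{x\to 0} x^\epsilon \ln\frac{1}{x} \overset{x = \exp(-t)}{=} \lim_{t\to\infty} e^{-\epsilon t}\,t = 0$ if $\epsilon > 0$.

Problems

11.1. Adapt the proof of Theorem 11.2 to show that any sequence $(u_j)_{j\in\mathbb{N}} \subset \mathcal{M}(\mathcal{A})$ with $\lim_{j\to\infty} u_j(x) = u(x)$ and $|u_j| \le g$ for some $g$ with $g^p \in \mathcal{L}^1_+(\mu)$ satisfies
\[ \lim_{j\to\infty} \int |u_j - u|^p\,d\mu = 0. \]
[Hint: mimic the proof of 11.2 using $|u_j - u|^p \le (|u_j| + |u|)^p \le 2^p g^p$.]

11.2. Give an alternative proof of Lebesgue's dominated convergence theorem 11.2(ii) using the generalized Fatou theorem from Problem 10.8.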


11.3. Prove the following result of W. H. Young [56]; among statisticians it is also known as Pratt's lemma, cf. J. W. Pratt [36].
Theorem (Young; Pratt): Let $(f_k)_k$, $(g_k)_k$ and $(G_k)_k$ be sequences of integrable functions on a measure space $(X, \mathcal{A}, \mu)$. If
(i) $f_k(x) \to f(x)$, $g_k(x) \to g(x)$, $G_k(x) \to G(x)$ as $k \to \infty$ for all $x \in X$,
(ii) $g_k(x) \le f_k(x) \le G_k(x)$ for all $k \in \mathbb{N}$ and all $x \in X$,
(iii) $\int g_k\,d\mu \to \int g\,d\mu$ and $\int G_k\,d\mu \to \int G\,d\mu$ as $k \to \infty$, with $\int g\,d\mu$ and $\int G\,d\mu$ finite,
then $\lim_{k\to\infty} \int f_k\,d\mu = \int f\,d\mu$ and $\int f\,d\mu$ is finite.
Explain why this generalizes Lebesgue's dominated convergence theorem 11.2(ii).

11.4. Let $(u_j)_{j\in\mathbb{N}}$ be a sequence of integrable functions on $(X, \mathcal{A}, \mu)$. Show that, if $\sum_{j=1}^\infty \int |u_j|\,d\mu < \infty$, the series $\sum_{j=1}^\infty u_j$ converges a.e. to a real-valued function $u(x)$, and that in this case
\[ \int \sum_{j=1}^\infty u_j\,d\mu = \sum_{j=1}^\infty \int u_j\,d\mu. \]
[Hint: use C9.9 to see that the series $\sum_j u_j$ converges absolutely for almost all $x \in X$. The rest is then dominated convergence.]

11.5. Let $(u_j)_{j\in\mathbb{N}}$ be a sequence of positive integrable functions on a measure space $(X, \mathcal{A}, \mu)$. Assume that the sequence decreases to 0: $u_1 \ge u_2 \ge u_3 \ge \dots$ and $u_j \downarrow 0$. Show that $\sum_{j=1}^\infty (-1)^j u_j$ converges, is integrable and that the integral is given by
\[ \int \sum_{j=1}^\infty (-1)^j u_j\,d\mu = \sum_{j=1}^\infty (-1)^j \int u_j\,d\mu. \]
[Hint: mimic the proof of the Leibniz test for alternating series.]

11.6. Give an example of a sequence of integrable functions $(u_j)_{j\in\mathbb{N}}$ with $u_j(x) \to u(x)$ as $j \to \infty$ for all $x$ and an integrable function $u$ but such that $\lim_{j\to\infty} \int u_j\,d\mu \ne \int u\,d\mu$. Does this contradict Lebesgue's dominated convergence theorem 11.2?

11.7. Let $\lambda$ be one-dimensional Lebesgue measure. Show that for every integrable function $u$, the integral function
\[ x \mapsto \int_{(0,x)} u(t)\,\lambda(dt), \qquad x > 0, \]
is continuous. What happens if we exchange $\lambda$ for a general measure $\mu$?

11.8. Consider the functions
(i) $u(x) = \dfrac{1}{x}$, $x \in [1,\infty)$;
(ii) $v(x) = \dfrac{1}{x^2}$, $x \in [1,\infty)$;
(iii) $w(x) = \dfrac{1}{\sqrt{x}}$, $x \in (0,1]$;
(iv) $y(x) = \dfrac{1}{x}$, $x \in (0,1]$;
and check whether they are Lebesgue integrable in the regions given. What would happen if we consider $[\tfrac{1}{2}, 2]$ instead?


[Hint: consider first $u_k = u\,1_{[1,k]}$, resp., $w_k = w\,1_{[1/k,1]}$, etc. and use monotone convergence and the fact that Riemann and Lebesgue integrals coincide if both exist.]

11.9. Show that the function $x \mapsto \exp(-x^\alpha)$ is $\lambda^1(dx)$-integrable over the set $[0,\infty)$ for every $\alpha > 0$.
[Hint: find dominating integrable functions $u$ resp. $w$ if $0 \le x \le 1$ resp. $1 < x < \infty$ and glue them together by $u\,1_{[0,1]} + w\,1_{(1,\infty)}$ to get an overall integrable upper bound.]

11.10. Show that for every parameter $\alpha > 0$ the function $x \mapsto \big(\frac{\sin x}{x}\big)^3 e^{-\alpha x}$ is integrable over $(0,\infty)$ and continuous as a function of the parameter.
[Hint: find piecewise dominating integrable functions like in Problem 11.9; use the continuity lemma 11.4.]

11.11. Show that the function
\[ G : \mathbb{R} \to \mathbb{R}, \qquad G(x) = \int_{\mathbb{R}\setminus\{0\}} \frac{\sin(tx)}{t(1 + t^2)}\,dt \]
is differentiable and find $G(0)$ and $G'(0)$. Use a limit argument, integration by parts for $\int_{(-n,n)} \dots\,dt$ and the formula $t\,\partial_t \sin(tx) = x\,\partial_x \sin(tx)$ to show that
\[ x\,G'(x) = \int_{\mathbb{R}} \frac{2t \sin(tx)}{(1 + t^2)^2}\,dt. \]

11.12. Denote by $\lambda$ one-dimensional Lebesgue measure. Prove that
(i) $\displaystyle \int_{(1,\infty)} e^{-x} \ln(x)\,\lambda(dx) = \lim_{k\to\infty} \int_{(1,k)} \Big(1 - \frac{x}{k}\Big)^k \ln(x)\,\lambda(dx)$.
(ii) $\displaystyle \int_{(0,1)} e^{-x} \ln(x)\,\lambda(dx) = \lim_{k\to\infty} \int_{(0,1)} \Big(1 - \frac{x}{k}\Big)^k \ln(x)\,\lambda(dx)$.

11.13. Euler's Gamma function. Show that the function
\[ \Gamma(t) = \int_{(0,\infty)} e^{-x} x^{t-1}\,dx, \qquad t > 0, \]
(i) is $m$-times differentiable with $\Gamma^{(m)}(t) = \int_{(0,\infty)} e^{-x} x^{t-1} (\log x)^m\,dx$.
[Hint: take $t \in (a,b)$, use induction in $m$. Note that $|e^{-x} x^{t-1} (\log x)^m| \le x^{m+t-1} e^{-x} \le M x^{-2}$ for $x \ge 1$, and $\le M x^{\epsilon - 1}$ for $x < 1$ and some $\epsilon > 0$ because $\lim_{x\to 0} x^{a-\epsilon} |\log x|^m = 0$; use, e.g. the substitution $x = e^{-y}$.]
(ii) satisfies $\Gamma(t+1) = t\,\Gamma(t)$.
[Hint: use integration by parts for $\int_{(1/n,n)} \dots\,dt$ and let $n \to \infty$.]
(iii) and is logarithmically convex, i.e. $t \mapsto \ln \Gamma(t)$ is convex.
[Hint: calculate $(\ln \Gamma(t))''$ and show that this is positive.]

11.14. Show that $x \mapsto x^n f(u,x)$, $f(u,x) = e^{ux}/(e^x + 1)$, $0 < u < 1$, is integrable over $\mathbb{R}$ and that $g(u) = \int x^n f(u,x)\,dx$, $0 < u < 1$, is arbitrarily often differentiable.

11.15. Moment generating function. Let $X$ be a random variable on the probability space $(\Omega, \mathcal{A}, P)$. The function $\phi_X(t) = \int e^{-tX}\,dP$ is called the moment generating function. Show that $\phi_X$ is $m$-times differentiable at $t = 0+$ if the

absolute $m$th moment $\int |X|^m\,dP$ exists. If this is the case, the following formulae hold:

dk

(i) X k dP = −1k k X t

for all 0 k m. t=0+ dt m

X k dP (ii) X t = −1k tk + otm . (ft = otm means that k! k=0 lim ft/tm = 0.)

t→0 m−1 k

tm

X dP k k

−1 t Xm dP. (iii) X t − k! m! k=0 (iv) If Xk dP < for all k ∈ , then k

X dP X t = −1k tk k! k=0 for all t within the convergence radius of the series. 11.16. Consider the functions ux = 1∩01 and vx = 1n−1 n∈ x. Prove or disprove: (i) The function u is 1 on the rationals and 0 otherwise. Thus u is continuous everywhere except the set ∩ 0 1. Since this is a null set, u is a.e. continuous, hence Riemann integrable by Theorem 11.8. (ii) The function v is 0 everywhere but for the values x = 1/n, n ∈ . Thus v is continuous everywhere except a countable set, i.e. a null set, and v is a.e. continuous, hence Riemann integrable by Theorem 11.8. (iii) The functions u and v are Lebesgue integrable and u d = v d = 0. (iv) The function u is not Riemann integrable. 11.17. Construct a sequence of functions uj j∈ which are Riemann integrable but conj→

verge to a limit uj −−→ u which is not Riemann integrable. [Hint: consider, e.g. uj = 1q1 q2 qj where qj j is an enumeration of .] 11.18. Assume that u 0 → is positive and improperly Riemann integrable. Show that u is also Lebesgue integrable. 11.19. Fresnel integrals. Show that the following improper Riemann integrals exist: sin x2 dx and cos x2 dx 0

0

Do they exist as Lebesgue integrals? Remark. The above integrals have the value 21 2 . This can be proved by Cauchy’s theorem or the residue theorem. 11.20. Frullani’s integral. Let f 0 → be a continuous function such that limx→0 fx = m and limx→ fx = M. Show that the two-sided improper Riemann integral s fbx − fax lim dx = M − m ln ab r→0 r x s→

104

R.L. Schilling

exists for all a b > 0. Does this integral have a meaning as Lebesgue integral? [Hint: use the mean value theorem for integrals, E.12.] 11.21. Denote by one-dimensional Lebesgue measure on the interval 0 1. (i) Show that for all k ∈ 0 one has

x ln x dx = −1 k

01

(ii) Use (i) to conclude that 01

k

x−x dx =

1 k+1

k+1 k + 1

k−k .

k=1

[Hint: note that x−x = e−x ln x and use the exponential series.]

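12 The function spaces $\mathcal{L}^p$, $1 \le p \le \infty$

Throughout this chapter $(X, \mathcal{A}, \mu)$ will be some measure space. We will now discuss functions whose (absolute) $p$th power or $p$th (absolute) moment is integrable. More precisely, we are interested in the sets
\[ \mathcal{L}^p(\mu) = \Big\{ u : X \to \mathbb{R} : u \in \mathcal{M}(\mathcal{A}),\ \int |u|^p\,d\mu < \infty \Big\}, \qquad p \in [1,\infty). \tag{12.1} \]
As usual, we suppress $\mu$ if the choice of measure is clear, and we write $\mathcal{L}^p(X)$ or $\mathcal{L}^p(\mathcal{A})$ if we want to stress the underlying space or $\sigma$-algebra. It is convenient to have the following notation:
\[ \|u\|_p = \Big( \int |u(x)|^p\,\mu(dx) \Big)^{1/p}. \tag{12.2} \]
Clearly, $u \in \mathcal{L}^p(\mu) \iff u \in \mathcal{M}(\mathcal{A})$ and $\|u\|_p < \infty$. It is no accident that the notation $\|\bullet\|_p$ resembles the symbol for a norm:¹ indeed, we have because of T10.9(i)
\[ \|u\|_p = 0 \iff u = 0 \ \text{a.e.,} \tag{12.3} \]
and for all $\alpha \in \mathbb{R}$
\[ \|\alpha u\|_p = \Big( \int |\alpha u|^p\,d\mu \Big)^{1/p} = \Big( |\alpha|^p \int |u|^p\,d\mu \Big)^{1/p} = |\alpha| \cdot \|u\|_p. \tag{12.4} \]
The triangle inequality for $\|\bullet\|_p$ and deeper results on $\mathcal{L}^p$ depend much on the following elementary inequality.

12.1 Lemma (Young's inequality) Let $p, q \in (1,\infty)$ be conjugate numbers, i.e. $\frac{1}{p} + \frac{1}{q} = 1$ or $q = \frac{p}{p-1}$. Then
\[ AB \le \frac{A^p}{p} + \frac{B^q}{q} \tag{12.5} \]
holds for all $A, B \ge 0$; equality occurs if, and only if, $B = A^{p-1}$.

¹ See Appendix B.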


Proof There are various different methods to prove (12.5) but probably the most intuitive one is through the following picture.

[Figure: graph of $\eta = \xi^{p-1}$, equivalently $\xi = \eta^{q-1}$, with the regions $S_1$ and $S_2$ and the rectangle with sides $A$ and $B$.]

The shaded area representing the pieces $S_1$ and $S_2$ between the graph and the $\xi$- resp. $\eta$-axis is given by
\[ \int_0^A \xi^{p-1}\,d\xi = \frac{A^p}{p} \quad\text{and}\quad \int_0^B \eta^{q-1}\,d\eta = \frac{B^q}{q}, \]
respectively. The picture shows that their combined area is greater than the area of the darker rectangle, thus $\frac{A^p}{p} + \frac{B^q}{q} \ge AB$. Equality obtains if, and only if, the lighter shaded area vanishes, i.e. if $B = A^{p-1}$.
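Young's inequality (12.5) and its equality case can be spot-checked numerically. The Python sketch below evaluates the gap $\frac{A^p}{p} + \frac{B^q}{q} - AB$ with floating-point arithmetic; the function name and the sample values of $A$, $B$ and $p$ are arbitrary assumptions.

```python
def young_gap(A, B, p):
    """A**p / p + B**q / q - A * B for the conjugate exponent q = p / (p - 1).

    By (12.5) the gap is >= 0 for A, B >= 0 and p > 1.
    """
    q = p / (p - 1.0)
    return A ** p / p + B ** q / q - A * B


if __name__ == "__main__":
    for p in (1.5, 2.0, 3.0):
        # Generic point: strictly positive gap expected.
        print(p, young_gap(2.0, 0.3, p))
        # Equality case B = A**(p - 1): gap should vanish up to rounding.
        print(p, young_gap(2.0, 2.0 ** (p - 1.0), p))
```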

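We can now prove the following fundamental inequality.

12.2 Theorem (Hölder's inequality) Assume that $u \in \mathcal{L}^p(\mu)$ and $v \in \mathcal{L}^q(\mu)$ where $p, q \in (1,\infty)$ are conjugate numbers: $\frac{1}{p} + \frac{1}{q} = 1$. Then $uv \in \mathcal{L}^1(\mu)$, and the following inequality holds:
\[ \Big| \int uv\,d\mu \Big| \le \int |uv|\,d\mu \le \|u\|_p \cdot \|v\|_q. \tag{12.6} \]
Equality occurs if, and only if, $|u(x)|^p / \|u\|_p^p = |v(x)|^q / \|v\|_q^q$ a.e.

Proof The first inequality of (12.6) follows directly from T10.4(v). To see the other inequality we use (12.5) with
\[ A = \frac{|u(x)|}{\|u\|_p} \quad\text{and}\quad B = \frac{|v(x)|}{\|v\|_q} \]
to get
\[ \frac{|u(x) v(x)|}{\|u\|_p \|v\|_q} \le \frac{|u(x)|^p}{p\,\|u\|_p^p} + \frac{|v(x)|^q}{q\,\|v\|_q^q}. \]
Integrating both sides of this inequality over $x$ yields
\[ \frac{\int |u(x) v(x)|\,\mu(dx)}{\|u\|_p \|v\|_q} \le \frac{\|u\|_p^p}{p\,\|u\|_p^p} + \frac{\|v\|_q^q}{q\,\|v\|_q^q} = \frac{1}{p} + \frac{1}{q} = 1. \]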


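Equality can only happen if we have equality in (12.5). Because of our choice of $A$ and $B$, the condition for equality from L12.1 becomes
\[ \frac{|v(x)|}{\|v\|_q} = \Big( \frac{|u(x)|}{\|u\|_p} \Big)^{p-1} \quad\text{a.e.} \]
Raising both sides to the $q$th power gives $|v(x)|^q / \|v\|_q^q = |u(x)|^p / \|u\|_p^p$ since $(p-1)q = p$.

Hölder's inequality with $p = q = 2$ is usually called the Cauchy–Schwarz inequality.

12.3 Corollary (Cauchy–Schwarz inequality) Let $u, v \in \mathcal{L}^2(\mu)$. Then $uv \in \mathcal{L}^1(\mu)$ and
\[ \int |uv|\,d\mu \le \|u\|_2 \cdot \|v\|_2. \tag{12.7} \]
Equality occurs if, and only if, $|u(x)|^2 / \|u\|_2^2 = |v(x)|^2 / \|v\|_2^2$ a.e.

Another consequence of Hölder's inequality is the Minkowski or triangle inequality for $\|\bullet\|_p$.

12.4 Corollary (Minkowski's inequality) Let $u, v \in \mathcal{L}^p(\mu)$, $p \in [1,\infty)$. Then $u + v \in \mathcal{L}^p(\mu)$ and
\[ \|u + v\|_p \le \|u\|_p + \|v\|_p. \tag{12.8} \]

Proof Since
\[ |u + v|^p \le (|u| + |v|)^p \le 2^p \max\{|u|^p, |v|^p\} \le 2^p (|u|^p + |v|^p) \]
we get that $|u + v|^p \in \mathcal{L}^1(\mu)$ or $u + v \in \mathcal{L}^p(\mu)$. Now
\[ \int |u + v|^p\,d\mu = \int |u + v| \cdot |u + v|^{p-1}\,d\mu \le \int |u| \cdot |u + v|^{p-1}\,d\mu + \int |v| \cdot |u + v|^{p-1}\,d\mu \]
(if $p = 1$ the proof stops here)
\[ \overset{12.2}{\le} \|u\|_p \cdot \big\| |u + v|^{p-1} \big\|_q + \|v\|_p \cdot \big\| |u + v|^{p-1} \big\|_q. \]
Dividing both sides by $\big\| |u + v|^{p-1} \big\|_q$ proves our claim since
\[ \big\| |u + v|^{p-1} \big\|_q = \Big( \int |u + v|^{(p-1)q}\,d\mu \Big)^{1/q} = \Big( \int |u + v|^p\,d\mu \Big)^{1 - 1/p}, \]
where we also used that $q = \frac{p}{p-1}$.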
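For a measure concentrated on finitely many points the integrals in (12.6) and (12.8) become weighted sums, which makes both inequalities easy to test. The following Python sketch assumes such a hypothetical discrete measure given by a list of non-negative `weights`; the sample vectors are arbitrary.

```python
def lp_norm(u, weights, p):
    """(sum_i w_i |u_i|^p)^(1/p): the L^p norm w.r.t. a discrete measure."""
    return sum(w * abs(x) ** p for x, w in zip(u, weights)) ** (1.0 / p)


def holder_gap(u, v, weights, p):
    """||u||_p ||v||_q - integral of |u v|, with q = p / (p - 1); needs p > 1."""
    q = p / (p - 1.0)
    integral_abs_uv = sum(w * abs(x * y) for x, y, w in zip(u, v, weights))
    return lp_norm(u, weights, p) * lp_norm(v, weights, q) - integral_abs_uv


def minkowski_gap(u, v, weights, p):
    """||u||_p + ||v||_p - ||u + v||_p; valid for p >= 1."""
    total = [x + y for x, y in zip(u, v)]
    return lp_norm(u, weights, p) + lp_norm(v, weights, p) - lp_norm(total, weights, p)


if __name__ == "__main__":
    weights = [0.5, 1.0, 2.0, 0.25]
    u = [1.0, -2.0, 0.5, 3.0]
    v = [-1.5, 0.25, 2.0, 1.0]
    for p in (1.5, 2.0, 4.0):
        print(p, holder_gap(u, v, weights, p), minkowski_gap(u, v, weights, p))
```

Both gaps should be non-negative for every admissible $p$, in line with Theorem 12.2 and Corollary 12.4.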


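12.5 Remarks (i) Formulae (12.4) and (12.8) imply
\[ u, v \in \mathcal{L}^p(\mu) \implies \alpha u + \beta v \in \mathcal{L}^p(\mu) \qquad \forall \alpha, \beta \in \mathbb{R}, \]
which shows that $\mathcal{L}^p(\mu)$ is a vector space.

(ii) Formulae (12.3), (12.4) and (12.8) show that $\|\bullet\|_p$ is a semi-norm for $\mathcal{L}^p(\mu)$: the definiteness of a norm is not fulfilled since
\[ \|u\|_p = 0 \quad\text{only implies that}\quad u(x) = 0 \ \text{for almost every } x \]
but not for all $x$. There is a standard recipe to fix this: since $\mathcal{L}^p$-functions can be altered on null sets without affecting their integration behaviour, we introduce the following equivalence relation: we call $u, v \in \mathcal{L}^p(\mu)$ equivalent if they differ on at most a $\mu$-null set, i.e.
\[ u \sim v \iff \{u \ne v\} \in \mathcal{N}_\mu. \]
The quotient space $L^p(\mu) = \mathcal{L}^p(\mu)/{\sim}$ consists of all equivalence classes of $\mathcal{L}^p$-functions. If $[u]_p \in L^p(\mu)$ denotes the equivalence class induced by the function $u \in \mathcal{L}^p(\mu)$, it is not hard to see that
\[ [\alpha u + \beta v]_p = \alpha [u]_p + \beta [v]_p \quad\text{and}\quad [uv]_1 = [u]_p [v]_q \]
hold, turning $L^p(\mu)$ into a bona fide vector space with the canonical norm
\[ \big\| [u]_p \big\|_p = \inf\{ \|w\|_p : w \in \mathcal{L}^p(\mu),\ w \sim u \} \]
for quotient spaces. Fortunately, $\|[u]_p\|_p = \|u\|_p$ and later on we will often follow the usual abuse of notation and identify $[u]_p$ with $u$.

(iii) All results of this chapter are still valid for $\bar{\mathbb{R}}$-valued numerical functions. Indeed, if $f \in \mathcal{M}_{\bar{\mathbb{R}}}(\mathcal{A})$ and $\int |f|^p\,d\mu < \infty$, then
\[ \mu\{|f| = \infty\} = \mu\{|f|^p = \infty\} = \mu\Big( \bigcap_{j\in\mathbb{N}} \{|f|^p > j\} \Big) \overset{4.4}{=} \lim_{j\to\infty} \mu\{|f|^p > j\} \overset{10.12}{\le} \lim_{j\to\infty} \frac{1}{j} \int |f|^p\,d\mu = 0 \]
by the Markov inequality. This means, however, that $f$ is a.e. $\mathbb{R}$-valued, so sums and products of such functions are always defined outside a $\mu$-null set. In particular there is no need to distinguish between the classes $L^p = L^p_{\mathbb{R}}$ and $L^p_{\bar{\mathbb{R}}}$.

∗ ∗ ∗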


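We will need the concept of convergence of a sequence in the space $\mathcal{L}^p(\mu)$. A sequence $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p(\mu)$ is said to be convergent in $\mathcal{L}^p(\mu)$ with limit $\mathcal{L}^p\text{-}\lim_{j\to\infty} u_j = u$ if, and only if,
\[ \lim_{j\to\infty} \|u_j - u\|_p = 0. \]
Remember, however, that $\mathcal{L}^p$-limits are only almost everywhere unique. If $u, w$ are both $\mathcal{L}^p$-limits of the same sequence $(u_j)_{j\in\mathbb{N}}$, we have
\[ \|u - w\|_p \overset{12.4}{\le} \lim_{j\to\infty} \big( \|u - u_j\|_p + \|u_j - w\|_p \big) = 0, \]
implying only $u = w$ almost everywhere. We call $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p(\mu)$ a ($\mathcal{L}^p$-) Cauchy sequence, if
\[ \forall \epsilon > 0 \quad \exists N_\epsilon \in \mathbb{N} \quad \forall j, k \ge N_\epsilon : \ \|u_j - u_k\|_p < \epsilon. \]
Note that these definitions reduce convergence in $\mathcal{L}^p$ to convergence questions of the semi-norm $\|\bullet\|_p$ in $\mathbb{R}_+$. This means that, apart from uniqueness, many formal properties of limits in $\mathbb{R}$ carry over to $\mathcal{L}^p$, most of them even with the same proof!

Caution: Pointwise convergence of a sequence $u_j(x) \to u(x)$ of $\mathcal{L}^p$-functions $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p(\mu)$ does not guarantee convergence in $\mathcal{L}^p$. In view of Lebesgue's dominated convergence theorem 11.2, however, the additional condition that
\[ |u_j(x)| \le g(x) \quad\text{for some function } g \in \mathcal{L}^p_+(\mu) \]
is sufficient since $|u_j - u|^p \le (|u_j| + |u|)^p \le 2^p g^p$ and $|u_j(x) - u(x)|^p \to 0$.

Clearly, a convergent sequence $(u_j)_{j\in\mathbb{N}}$ is also a Cauchy sequence,
\[ \|u_j - u_k\|_p \le \|u_j - u\|_p + \|u - u_k\|_p < 2\epsilon \qquad \forall j, k \ge N_\epsilon; \]
the converse of this assertion is also true, but much more difficult to prove. We start with a simple observation:

12.6 Lemma For any sequence $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p(\mu)$, $p \in [1,\infty)$, of positive functions $u_j \ge 0$ we have
\[ \Big\| \sum_{j=1}^\infty u_j \Big\|_p \le \sum_{j=1}^\infty \|u_j\|_p. \tag{12.9} \]

Proof Repeated applications of Minkowski's inequality (12.8) show that
\[ \Big\| \sum_{j=1}^N u_j \Big\|_p \le \sum_{j=1}^N \|u_j\|_p \le \sum_{j=1}^\infty \|u_j\|_p \]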


and since the right-hand side is independent of $N$, the inequality remains valid even if we pass to the $\sup_{N\in\mathbb{N}}$ on the left. By Beppo Levi's theorem 9.6, we find
\[ \sup_{N\in\mathbb{N}} \Big\| \sum_{j=1}^N u_j \Big\|_p^p = \sup_{N\in\mathbb{N}} \int \Big( \sum_{j=1}^N u_j \Big)^p d\mu = \int \sup_{N\in\mathbb{N}} \Big( \sum_{j=1}^N u_j \Big)^p d\mu = \int \Big( \sum_{j=1}^\infty u_j \Big)^p d\mu \]
and the proof follows.

The completeness of $\mathcal{L}^p$ was proved by E. Fischer (for $p = 2$) and F. Riesz (for $1 \leq p < \infty$).

12.7 Theorem (Riesz–Fischer) The spaces $\mathcal{L}^p(\mu)$, $p \in [1,\infty)$, are complete, i.e. every Cauchy sequence $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p$ converges to some limit $u \in \mathcal{L}^p$.

Proof The main difficulty here is to identify the limit $u$. By the definition of a Cauchy sequence we find numbers $1 < n_1 < n_2 < \cdots < n_k < \cdots$ such that

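\[ \|u_{n_{k+1}} - u_{n_k}\|_p < 2^{-k}, \quad k \in \mathbb{N}. \]
To find $u$, we turn the sequence into a series by
\[ u_{n_{k+1}} = \sum_{j=0}^k \big(u_{n_{j+1}} - u_{n_j}\big), \quad u_{n_0} := 0, \tag{12.10} \]
and the limit as $k \to \infty$ would formally be $u = \sum_{j=0}^\infty (u_{n_{j+1}} - u_{n_j})$ – if we can make sense of this infinite sum. Since
\[ \Big\| \sum_{j=0}^\infty |u_{n_{j+1}} - u_{n_j}| \Big\|_p \overset{(12.9)}{\leq} \sum_{j=0}^\infty \|u_{n_{j+1}} - u_{n_j}\|_p \leq \|u_{n_1}\|_p + \sum_{j=1}^\infty \frac{1}{2^j}, \tag{12.11} \]
we conclude with C10.13 that $\sum_{j=0}^\infty |u_{n_{j+1}} - u_{n_j}| < \infty$ a.e., so that $u = \sum_{j=0}^\infty (u_{n_{j+1}} - u_{n_j})$ is a.e. (absolutely) convergent.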



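Let us show that $u = \mathcal{L}^p\text{-}\lim_{k\to\infty} u_{n_k}$. For this, observe that by the (ordinary) triangle inequality and (12.11),
\[ \|u - u_{n_k}\|_p \overset{\text{def}}{=} \Big\| \sum_{j=k}^\infty \big(u_{n_{j+1}} - u_{n_j}\big) \Big\|_p \leq \Big\| \sum_{j=k}^\infty |u_{n_{j+1}} - u_{n_j}| \Big\|_p \overset{(12.9)}{\leq} \sum_{j=k}^\infty \|u_{n_{j+1}} - u_{n_j}\|_p \xrightarrow{k\to\infty} 0. \]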

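Finally, using that $(u_j)_{j\in\mathbb{N}}$ is a Cauchy sequence, we get, for all $\epsilon > 0$ and suitable $N_\epsilon \in \mathbb{N}$,
\[ \|u - u_j\|_p \leq \|u - u_{n_k}\|_p + \|u_{n_k} - u_j\|_p \leq \|u - u_{n_k}\|_p + \epsilon \quad \forall j, n_k \geq N_\epsilon. \]
Letting $k \to \infty$ shows $\|u - u_j\|_p \leq \epsilon$ if $j \geq N_\epsilon$.

The proof of Theorem 12.7 shows even a weak form of pointwise convergence:

12.8 Corollary Let $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p(\mu)$, $p \in [1,\infty)$, with $\mathcal{L}^p\text{-}\lim_{j\to\infty} u_j = u$. Then there exists a subsequence $(u_{n_k})_{k\in\mathbb{N}}$ such that $\lim_{k\to\infty} u_{n_k}(x) = u(x)$ holds for almost every $x \in X$.

Proof Since $(u_j)_{j\in\mathbb{N}}$ converges in $\mathcal{L}^p$, it is also an $\mathcal{L}^p$-Cauchy sequence and the claim follows from (12.11).

As we have already remarked, pointwise convergence alone does not guarantee convergence in $\mathcal{L}^p$, not even of a subsequence, see Problem 12.7. Let us repeat the following sufficient criterion, which we have already proved on page 109.

12.9 Theorem Let $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p(\mu)$, $p \in [1,\infty)$, be a sequence of functions such that $|u_j| \leq w$ for all $j \in \mathbb{N}$ and some $w \in \mathcal{L}^p_+(\mu)$. If $u(x) = \lim_{j\to\infty} u_j(x)$ exists for almost every $x \in X$, then
\[ u \in \mathcal{L}^p \quad\text{and}\quad \lim_{j\to\infty} \|u - u_j\|_p = 0. \]

Of a different flavour is the next result which is sometimes called F. Riesz's convergence theorem.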


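12.10 Theorem (Riesz) Let $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p(\mu)$, $p \in [1,\infty)$, be a sequence such that $\lim_{j\to\infty} u_j(x) = u(x)$ for almost every $x \in X$ and some $u \in \mathcal{L}^p(\mu)$. Then
\[ \lim_{j\to\infty} \|u_j - u\|_p = 0 \iff \lim_{j\to\infty} \|u_j\|_p = \|u\|_p. \tag{12.12} \]

Proof The direction '⇒' in (12.12) follows from the lower triangle inequality$^2$ $\big| \|u_j\|_p - \|u\|_p \big| \leq \|u_j - u\|_p$ for $\|\cdot\|_p$. For '⇐' we observe that
\[ |u_j - u|^p \leq (|u_j| + |u|)^p \leq 2^p \max\{|u_j|^p, |u|^p\} \leq 2^p (|u_j|^p + |u|^p) \]
and we can apply Fatou's lemma 9.11 to the sequence $2^p(|u_j|^p + |u|^p) - |u_j - u|^p \geq 0$ to get
\begin{align*}
2^{p+1} \int |u|^p \, d\mu
  &= \int \liminf_{j\to\infty} \big( 2^p (|u_j|^p + |u|^p) - |u_j - u|^p \big) \, d\mu \\
  &\leq \liminf_{j\to\infty} \Big( 2^p \int |u_j|^p \, d\mu + 2^p \int |u|^p \, d\mu - \int |u_j - u|^p \, d\mu \Big) \\
  &= 2^{p+1} \int |u|^p \, d\mu - \limsup_{j\to\infty} \int |u_j - u|^p \, d\mu,
\end{align*}
where we used that $\lim_{j\to\infty} \int |u_j|^p \, d\mu = \int |u|^p \, d\mu$. This shows that
\[ \limsup_{j\to\infty} \int |u_j - u|^p \, d\mu = 0, \quad\text{hence}\quad \lim_{j\to\infty} \int |u_j - u|^p \, d\mu = 0. \]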
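A quick worked example shows how the norm condition in (12.12) detects a failure of $\mathcal{L}^p$-convergence. The setting (Lebesgue measure $\lambda$ on $\mathbb{R}$) and the "moving bump" $u_j$ are assumptions made for illustration only.

```latex
% Illustration (assumed setting): Lebesgue measure \lambda on \mathbb{R},
% u_j := 1_{(j,j+1)}, u := 0, 1 \le p < \infty. Requires amsmath.
\begin{align*}
  \lim_{j\to\infty} u_j(x) &= 0 = u(x) \quad\text{for every } x \in \mathbb{R}, \\
  \|u_j\|_p &= \lambda\big((j,j+1)\big)^{1/p} = 1 \neq 0 = \|u\|_p, \\
  \|u_j - u\|_p &= 1 \quad\text{for all } j \in \mathbb{N}.
\end{align*}
% Neither side of (12.12) holds: the norms do not converge to \|u\|_p,
% and accordingly u_j does not converge to u in L^p.
```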

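Let us note the following structural result on $\mathcal{L}^p$, which will become important later on.

12.11 Corollary The simple $p$-integrable functions $\mathcal{E} \cap \mathcal{L}^p(\mu)$, $p \in [1,\infty)$, are a dense subset of $\mathcal{L}^p(\mu)$, i.e. for every $u \in \mathcal{L}^p(\mu)$ one can find a sequence $(f_j)_{j\in\mathbb{N}} \subset \mathcal{E} \cap \mathcal{L}^p(\mu)$ such that $\lim_{j\to\infty} \|f_j - u\|_p = 0$.

Proof Assume first that $u \in \mathcal{L}^p_+$ is positive. By Theorem 8.8 we find an increasing sequence $(f_j)_{j\in\mathbb{N}}$ of positive simple functions with $\sup_{j\in\mathbb{N}} f_j = u$. Since $0 \leq f_j \leq u$, we have $f_j \in \mathcal{L}^p$ as well as $\sup_{j\in\mathbb{N}} \int f_j^p \, d\mu = \int u^p \, d\mu$.[✓] We can now apply Theorem 12.10 and deduce that $\lim_{j\to\infty} \|f_j - u\|_p = 0$.

For a general $u \in \mathcal{L}^p$, we consider its positive and negative parts $u^\pm$ and construct, as before, sequences $g_j, h_j \in \mathcal{E} \cap \mathcal{L}^p$ with $g_j \to u^+$ and $h_j \to u^-$ in $\mathcal{L}^p$. But then $g_j - h_j \in \mathcal{E} \cap \mathcal{L}^p$, and
\[ \|u - (g_j - h_j)\|_p \leq \|u^+ - g_j\|_p + \|u^- - h_j\|_p \xrightarrow{j\to\infty} 0 \]
finishes the proof.

$^2$ Follows exactly as $\big||a| - |b|\big| \leq |a - b|$ follows from $|a + b| \leq |a| + |b|$, $a, b \in \mathbb{R}$.

With a special choice of $(X, \mathcal{A}, \mu)$ we can see that integrals generalize infinite series.

12.12 Example Consider the counting measure $\mu = \sum_{j=1}^\infty \delta_j$, cf. Example 4.7(iii), on the measurable space $(\mathbb{N}, \mathcal{P}(\mathbb{N}))$. As we have seen in Examples 9.10(ii) and 10.6(ii), a function $u : \mathbb{N} \to \mathbb{R}$ is $\mu$-integrable if, and only if,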

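\[ \sum_{j=1}^\infty |u(j)| < \infty, \quad\text{in which case}\quad \int u \, d\mu = \sum_{j=1}^\infty u(j). \]
In a similar way one shows that $v \in \mathcal{L}^p(\mu)$ if, and only if, $\sum_{j=1}^\infty |v(j)|^p < \infty$. Functions $u : \mathbb{N} \to \mathbb{R}$ are determined by their values $(u(1), u(2), u(3), \ldots)$ and every sequence $(a_j)_{j\in\mathbb{N}} \subset \mathbb{R}$ defines a function $u$ by $u(j) := a_j$. This means that we can identify the function $u$ with the sequence $(u(j))_{j\in\mathbb{N}}$ of real numbers. Thus
\[ \mathcal{L}^p(\mu) = \Big\{ u : \mathbb{N} \to \mathbb{R} \,:\, \sum_{j=1}^\infty |u(j)|^p < \infty \Big\} = \Big\{ (a_j)_{j\in\mathbb{N}} \subset \mathbb{R} \,:\, \sum_{j=1}^\infty |a_j|^p < \infty \Big\} =: \ell^p(\mathbb{N}), \]
the latter being a so-called sequence space. Note that in this context Hölder's and Minkowski's inequalities become
\[ \sum_{j=1}^\infty |a_j b_j| \leq \Big( \sum_{j=1}^\infty |a_j|^p \Big)^{1/p} \Big( \sum_{j=1}^\infty |b_j|^q \Big)^{1/q} \tag{12.13} \]
if $p, q \in (1,\infty)$ are conjugate numbers, and
\[ \Big( \sum_{j=1}^\infty |a_j \pm b_j|^p \Big)^{1/p} \leq \Big( \sum_{j=1}^\infty |a_j|^p \Big)^{1/p} + \Big( \sum_{j=1}^\infty |b_j|^p \Big)^{1/p}. \tag{12.14} \]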
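As a sanity check of (12.13), one can evaluate both sides for two geometric sequences. The sequences $a_j = 2^{-j}$, $b_j = 3^{-j}$ and the choice $p = q = 2$ are assumptions picked here only because all sums are explicit.

```latex
% Illustration (assumed data): a_j = 2^{-j}, b_j = 3^{-j}, p = q = 2. Requires amsmath.
\begin{align*}
  \sum_{j=1}^\infty |a_j b_j| &= \sum_{j=1}^\infty 6^{-j} = \frac{1}{5} = 0.2, \\
  \Big(\sum_{j=1}^\infty |a_j|^2\Big)^{1/2} \Big(\sum_{j=1}^\infty |b_j|^2\Big)^{1/2}
    &= \Big(\sum_{j=1}^\infty 4^{-j}\Big)^{1/2} \Big(\sum_{j=1}^\infty 9^{-j}\Big)^{1/2}
     = \sqrt{\tfrac{1}{3}} \, \sqrt{\tfrac{1}{8}} = \frac{1}{\sqrt{24}} \approx 0.204 .
\end{align*}
% So the left-hand side of (12.13) is 0.2 and the right-hand side is about 0.204.
```

The two sides are close but not equal, which fits the general fact that equality in Hölder's inequality requires $|a_j|^p$ and $|b_j|^q$ to be proportional.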


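We close this chapter with a useful convexity, resp. concavity, inequality. Recall that a function $\Phi : [a,b] \to \mathbb{R}$ on an interval $[a,b] \subset \bar{\mathbb{R}}$ is convex [concave] if
\begin{gather*}
\Phi(tx + (1-t)y) \leq t\Phi(x) + (1-t)\Phi(y), \quad 0 < t < 1, \\
\big[\, \Phi(tx + (1-t)y) \geq t\Phi(x) + (1-t)\Phi(y), \quad 0 < t < 1 \,\big]
\end{gather*}
(12.15)

holds for all $x, y \in [a,b]$. Geometrically this means that the graph of a convex [concave] function between the points $(x, \Phi(x))$ and $(y, \Phi(y))$ lies below [above] the chord linking $(x, \Phi(x))$ and $(y, \Phi(y))$.

[Figure: a concave function $\Phi$, with points $x$, $z$, $y$ marked on the abscissa.]

Convex [concave] functions have nice properties: they are continuous in $(a,b)$ and if $\Phi'$ exists, it is increasing [decreasing]. If $\Phi$ is twice differentiable, convexity [concavity] is equivalent to $\Phi'' \geq 0$ [$\Phi'' \leq 0$]. Further details and proofs can be found in Boas [8]. For our purposes we need the following lemma.

12.13 Lemma A convex [concave] function $\Phi : [a,b] \to \mathbb{R}$ has at every point in the open interval $(a,b)$ a finite right-hand derivative $\Phi'_+$ and satisfies
\begin{gather*}
\Phi(x) \geq \Phi'_+(y)(x - y) + \Phi(y) \quad \forall x, y \in (a,b) \\
\big[\, \Phi(x) \leq \Phi'_+(y)(x - y) + \Phi(y) \quad \forall x, y \in (a,b) \,\big]
\end{gather*}
(12.16)

In particular, a convex [concave] function is the upper [lower] envelope of all linear functions below [above] its graph
\begin{gather*}
\Phi(x) = \sup\{ \ell(x) : \ell(z) = \alpha z + \beta \leq \Phi(z) \ \ \forall z \in (a,b) \} \\
\big[\, \Phi(x) = \inf\{ \ell(x) : \ell(z) = \alpha z + \beta \geq \Phi(z) \ \ \forall z \in (a,b) \} \,\big]
\end{gather*}
(12.17)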

Proof Since the graph of a convex [concave] function looks like a smile [frown], the last statement of the lemma is intuitively clear. A rigorous argument uses (12.16) which says that admits at every point a tangent below [above] its graph. We show (12.16) only for concave functions. Pick numbers z< C = 0

(12.21)

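a norm[✓] which is, for continuous $u$, just $\|u\|_\infty = \sup |u|$. Interpreting $p = 1$ and $q = \infty$ as conjugate numbers, it is not hard to verify T12.2 and C12.4 for these values of $p$ and $q$. The completeness of $\mathcal{L}^\infty$ is much easier to prove than T12.7: if $(u_j)_{j\in\mathbb{N}}$ is a Cauchy sequence in $\mathcal{L}^\infty$, we set
\[ A_{k,\ell} := \{ |u_k| > \|u_k\|_\infty \} \cup \{ |u_k - u_\ell| > \|u_k - u_\ell\|_\infty \}, \qquad A := \bigcup_{k,\ell\in\mathbb{N}} A_{k,\ell}. \]
By definition, $\mu(A_{k,\ell}) = 0$ and $\mu(A) = 0$, so that $\|u_j 1_A\|_\infty = 0$ for all $j \in \mathbb{N}$. On the set $A^c$, however, $(u_j)_{j\in\mathbb{N}}$ converges uniformly to a bounded function $u$, i.e. $u 1_{A^c} \in \mathcal{L}^\infty$ as well as $\|(u_j - u) 1_{A^c}\|_\infty \to 0$. As in Remark 12.5 we write $L^\infty$ for $\mathcal{L}^\infty/\!\sim$, where $u \sim v$ means that $\{u \neq v\} \in \mathcal{N}_\mu$ is a $\mu$-null set.

Note also that T12.10 and C12.11 are no longer true for $p = \infty$. This can be seen on $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \lambda)$ from $u_j(x) := e^{-|x|/j} \xrightarrow{j\to\infty} 1_{\mathbb{R}}(x)$ for the former and from[✓] $u(x) := \sum_{j=-\infty}^\infty 1_{[2j,2j+1]}(x)$ for the latter.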

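Problems

12.1. Let $(X, \mathcal{A}, \mu)$ be a finite measure space and let $1 \leq q < p < \infty$.
(i) Show that $\|u\|_q \leq \mu(X)^{1/q - 1/p} \|u\|_p$. [Hint: use Hölder's inequality for $u \cdot 1$.]
(ii) Conclude that $\mathcal{L}^p(\mu) \subset \mathcal{L}^q(\mu)$ for all $p \geq q \geq 1$ and that a Cauchy sequence in $\mathcal{L}^p$ is also a Cauchy sequence in $\mathcal{L}^q$.
(iii) Is this still true if the measure $\mu$ is not finite?

12.2. Let $(X, \mathcal{A}, \mu)$ be a general measure space and $1 \leq p \leq r \leq q \leq \infty$. Prove that $\mathcal{L}^p(\mu) \cap \mathcal{L}^q(\mu) \subset \mathcal{L}^r(\mu)$ by establishing the inequality
\[ \|u\|_r \leq \|u\|_p^\lambda \cdot \|u\|_q^{1-\lambda} \quad \forall u \in \mathcal{L}^p(\mu) \cap \mathcal{L}^q(\mu) \]
with $\lambda = \big(\tfrac{1}{r} - \tfrac{1}{q}\big) \big/ \big(\tfrac{1}{p} - \tfrac{1}{q}\big)$. [Hint: use Hölder's inequality.]

12.3. Extend the proof of Hölder's inequality 12.2 to $p = 1$ and $q = \infty$, i.e. show that
\[ \int |uv| \, d\mu \leq \|u\|_1 \cdot \|v\|_\infty \tag{12.22} \]
holds for all $u \in \mathcal{L}^1(\mu)$ and $v \in \mathcal{L}^\infty(\mu)$.

12.4. Generalized Hölder inequality. Iterate Hölder's inequality to derive the following generalization:
\[ \int |u_1 \cdot u_2 \cdots u_N| \, d\mu \leq \|u_1\|_{p_1} \cdot \|u_2\|_{p_2} \cdots \|u_N\|_{p_N} \tag{12.23} \]
for all $p_j \in (1,\infty)$ such that $\sum_{j=1}^N p_j^{-1} = 1$ and all measurable $u_j \in \mathcal{M}(\mathcal{A})$.

12.5. Young functions. Let $\phi : [0,\infty) \to [0,\infty)$ be a strictly increasing continuous function such that $\phi(0) = 0$ and $\lim_{\xi\to\infty} \phi(\xi) = \infty$. Denote by $\psi(\eta) := \phi^{-1}(\eta)$ the inverse function. The functions
\[ \Phi(A) := \int_{[0,A)} \phi(\xi) \, \lambda^1(d\xi) \quad\text{and}\quad \Psi(B) := \int_{[0,B)} \psi(\eta) \, \lambda^1(d\eta) \tag{12.24} \]
are called conjugate Young functions. Adapt the proof of L12.1 to show the following general Young's inequality:
\[ AB \leq \Phi(A) + \Psi(B) \quad \forall A, B \geq 0. \tag{12.25} \]
[Hint: interpret $\Phi(A)$ and $\Psi(B)$ as areas below the graph of $\phi(\xi)$, resp. $\psi(\eta)$.]

12.6. Let $1 \leq p < \infty$ and $u, u_k \in \mathcal{L}^p(\mu)$ such that $\sum_{k=1}^\infty \|u - u_k\|_p < \infty$. Show that $\lim_{k\to\infty} u_k(x) = u(x)$ almost everywhere. [Hint: mimic the proof of the Riesz–Fischer theorem using $\sum_j (u_{j+1} - u_j)$.]

12.7. Consider one-dimensional Lebesgue measure on $[0,1]$. Verify that the sequence $u_n(x) := n 1_{(0,1/n)}(x)$, $n \in \mathbb{N}$, converges pointwise to the function $u \equiv 0$, but that no subsequence of $u_n$ converges in $\mathcal{L}^p$-sense for any $p \geq 1$.

12.8. Let $p, q \in (1,\infty)$ be conjugate indices, i.e. $p^{-1} + q^{-1} = 1$, and assume that $(u_k)_{k\in\mathbb{N}} \subset \mathcal{L}^p$ and $(w_k)_{k\in\mathbb{N}} \subset \mathcal{L}^q$ are sequences with limits $u$ and $w$ in $\mathcal{L}^p$, resp. $\mathcal{L}^q$-sense. Show that $u_k w_k$ converges in $\mathcal{L}^1$ to the function $uw$.

12.9. Prove that $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^2$ converges in $\mathcal{L}^2$ if, and only if, $\lim_{n,m\to\infty} \int u_n u_m \, d\mu$ exists. [Hint: verify and use the identity $\|u - w\|_2^2 = \|u\|_2^2 + \|w\|_2^2 - 2 \int uw \, d\mu$.]

12.10. Let $(X, \mathcal{A}, \mu)$ be a finite measure space. Show that every measurable $u \geq 0$ with $\int \exp(h u(x)) \, \mu(dx) < \infty$ for some $h > 0$ is in $\mathcal{L}^p(\mu)$ for every $p \geq 1$. [Hint: check that $|t|^N / N! \leq e^{|t|}$ implies $u \in \mathcal{L}^N$, $N \in \mathbb{N}$; then use Problem 12.1.]

12.11. Let $\lambda$ be Lebesgue measure in $(0,\infty)$ and $p, q \geq 1$ arbitrary.
(i) Show that $u_n(x) := n^\alpha (x + n)^{-\beta}$ ($\alpha \in \mathbb{R}$, $\beta > 1$) is for every $n \in \mathbb{N}$ in $\mathcal{L}^p(\lambda)$.
(ii) Show that $v_n(x) := n^\gamma e^{-nx}$ ($\gamma \in \mathbb{R}$) is for every $n \in \mathbb{N}$ in $\mathcal{L}^q(\lambda)$.

12.12. Let $u(x) = (x^\alpha + x^\beta)^{-1}$, $x > 0$. For which $p \geq 1$ is $u \in \mathcal{L}^p(\lambda^1, (0,\infty))$?

12.13. Consider the measure space $(\Omega = \{1, 2, \ldots, n\}, \mathcal{P}(\Omega), \mu)$, $n \geq 2$, where $\mu$ is the counting measure. Show that $\big( \sum_{j=1}^n |x_j|^p \big)^{1/p}$ is a norm if $p \in [1,\infty)$, but not for $p \in (0,1)$. [Hint: you can identify $\mathcal{L}^p(\mu)$ with $\mathbb{R}^n$.]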


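12.14. Let $(X, \mathcal{A}, \mu)$ be a measure space. The space $\mathcal{L}^p(\mu)$ is called separable, if there exists a countable dense subset $\mathcal{D}_p \subset \mathcal{L}^p(\mu)$. Show that $\mathcal{L}^p(\mu)$, $p \in (1,\infty)$, is separable if, and only if, $\mathcal{L}^1(\mu)$ is separable. [Hint: use Riesz's convergence theorem 12.10.]

12.15. Let $u_n \in \mathcal{L}^p$, $p \geq 1$, for all $n \in \mathbb{N}$. What can you say about $u$ and $w$ if you know that $\lim_{n\to\infty} \int |u_n - u|^p \, d\mu = 0$ and $\lim_{n\to\infty} u_n(x) = w(x)$ for almost every $x$?

12.16. Let $(X, \mathcal{A}, \mu)$ be a finite measure space and let $u \in \mathcal{L}^1(\mu)$ be strictly positive with $\int u \, d\mu = 1$. Show that
\[ \int \log u \, d\mu \leq \mu(X) \log \frac{1}{\mu(X)}. \]

12.17. Let $u$ be a positive measurable function on $[0,1]$. Which of the following is larger:
\[ \int_{(0,1)} u(x) \log u(x) \, \lambda(dx) \quad\text{or}\quad \int_{(0,1)} u(s) \, \lambda(ds) \cdot \int_{(0,1)} \log u(t) \, \lambda(dt)\,? \]
[Hint: show that $\log x \leq x \log x$, $x > 0$, and assume first that $\int u \, d\lambda = 1$, then consider $u / \int u \, d\lambda$.]

12.18. Let $(X, \mathcal{A}, \mu)$ be a measure space and $p \in (0,1)$. The conjugate index is given by $q := p/(p-1) < 0$. Prove for all measurable $u, v, w : X \to (0,\infty)$ with $\int u^p \, d\mu, \int v^p \, d\mu < \infty$ and $0 < \int w^q \, d\mu < \infty$ the inequalities
\[ \int uw \, d\mu \geq \Big( \int u^p \, d\mu \Big)^{1/p} \Big( \int w^q \, d\mu \Big)^{1/q} \]
and
\[ \Big( \int (u + v)^p \, d\mu \Big)^{1/p} \geq \Big( \int u^p \, d\mu \Big)^{1/p} + \Big( \int v^p \, d\mu \Big)^{1/p}. \]
[Hint: consider Hölder's inequality for $u$ and $1/w$.]

12.19. Let $(X, \mathcal{A}, \mu)$ be a finite measure space and $u \in \mathcal{M}(\mathcal{A})$ be a bounded function with $\|u\|_\infty > 0$. Prove that for all $n \in \mathbb{N}$:
(i) $M_n := \int |u|^n \, d\mu \in (0,\infty)$;
(ii) $M_{n+1} M_{n-1} \geq M_n^2$;
(iii) $\mu(X)^{-1/n} \|u\|_n \leq M_{n+1}/M_n \leq \|u\|_\infty$;
(iv) $\lim_{n\to\infty} M_{n+1}/M_n = \|u\|_\infty$.
[Hint: (ii) – use Hölder's inequality; (iii) – use Jensen's inequality for the lower estimate, Hölder's inequality for the upper estimate; (iv) – observe that $\int |u|^n \, d\mu \geq \int_{\{|u| > \|u\|_\infty - \epsilon\}} (\|u\|_\infty - \epsilon)^n \, d\mu = \mu(\{|u| > \|u\|_\infty - \epsilon\}) (\|u\|_\infty - \epsilon)^n$, take the $n$th root and let $n \to \infty$.]

12.20. Let $(X, \mathcal{A}, \mu)$ be a general measure space and let $u \in \bigcap_{p \geq 1} \mathcal{L}^p(\mu)$. Then
\[ \lim_{p\to\infty} \|u\|_p = \|u\|_\infty \]
where $\|u\|_\infty = \infty$ if $u$ is unbounded.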



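[Hint: start with $\|u\|_\infty < \infty$. Show that for any sequence $q_n \to \infty$ one has $\|u\|_{p+q_n} \leq \|u\|_\infty^{q_n/(p+q_n)} \cdot \|u\|_p^{p/(p+q_n)}$ and conclude that $\limsup_{p\to\infty} \|u\|_p \leq \|u\|_\infty$. The other estimate follows from $\|u\|_p \geq \mu(\{|u| > (1-\epsilon)\|u\|_\infty\})^{1/p} (1-\epsilon) \|u\|_\infty$ and $p \to \infty$, $\epsilon \to 0$, see also the hint to Problem 12.19, where $\mu(\{|u| > (1-\epsilon)\|u\|_\infty\})$ is finite in view of the Markov inequality. If $\|u\|_\infty = \infty$, use part one of the hint and observe that
\[ \liminf_{p\to\infty} \|u\|_p \geq \sup_{k\in\mathbb{N}} \lim_{p\to\infty} \| |u| \wedge k \|_p = \sup_{k\in\mathbb{N}} \| |u| \wedge k \|_\infty = \|u\|_\infty = \infty.] \]

12.21. Let $(X, \mathcal{A}, \mu)$ be a measure space and $1 \leq p < \infty$. Show that $f \in \mathcal{E} \cap \mathcal{L}^p(\mu)$ if, and only if, $f \in \mathcal{E}$ and $\mu(\{f \neq 0\}) < \infty$. In particular, $\mathcal{E} \cap \mathcal{L}^p(\mu) = \mathcal{E} \cap \mathcal{L}^1(\mu)$.

12.22. Use Jensen's inequality (12.18) to derive Hölder's and Minkowski's inequalities. Instructions: use
\[ \Phi(x) = x^{1/q}, \ x \geq 0, \qquad w = |f|^p \qquad\text{and}\qquad u = |g|^q |f|^{-p} 1_{\{f \neq 0\}} \]
for Hölder's inequality and
\[ \Phi(x) = (x^{1/p} + 1)^p, \ x \geq 0, \qquad w = |f|^p 1_{\{f \neq 0\}} \qquad\text{and}\qquad u = |f|^{-p} |g|^p 1_{\{f \neq 0\}} \]
for Minkowski's inequality.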

13 Product measures and Fubini’s theorem

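Lebesgue measure on $\mathbb{R}^n$ has, inherent in its definition, an interesting additional property: if $n > d \geq 1$,
\begin{align*}
\lambda^n\big([a_1,b_1) \times \cdots \times [a_n,b_n)\big)
  &= (b_1 - a_1) \cdots (b_d - a_d) \cdot (b_{d+1} - a_{d+1}) \cdots (b_n - a_n) \\
  &= \lambda^d\big([a_1,b_1) \times \cdots \times [a_d,b_d)\big) \cdot \lambda^{n-d}\big([a_{d+1},b_{d+1}) \times \cdots \times [a_n,b_n)\big),
\end{align*}
(13.1)

i.e. it is – at least for rectangles – the product of Lebesgue measures in lower-dimensional spaces. In this chapter we will see that (13.1) remains true for any product $A \times B$ of sets $A \in \mathcal{B}(\mathbb{R}^d)$ and $B \in \mathcal{B}(\mathbb{R}^{n-d})$. More importantly, we will prove the following version of Cavalieri's principle
\begin{align*}
\lambda^n(E) = \int 1_E \, d\lambda^n
  &= \int_{\mathbb{R}^{n-d}} \Big[ \int_{\mathbb{R}^d} 1_E(x, y_0) \, \lambda^d(dx) \Big] \lambda^{n-d}(dy_0) \\
  &= \int_{\mathbb{R}^d} \Big[ \int_{\mathbb{R}^{n-d}} 1_E(x_0, y) \, \lambda^{n-d}(dy) \Big] \lambda^d(dx_0)
\end{align*}

[Figure: a set $E \subset \mathbb{R}^n$ with a horizontal slice $x \mapsto 1_E(x, y_0)$ at height $y_0$ and a vertical slice $y \mapsto 1_E(x_0, y)$ at $x_0$; axes $x \in \mathbb{R}^d$ and $y \in \mathbb{R}^{n-d}$.]

which just says that we carve up the set $E \subset \mathbb{R}^n$ horizontally or vertically, measure the volume of the slices and 'sum' them up along the other direction to get the volume of the whole set $E$.

Clearly, we should be careful about the measurability of products of sets. Recall the following simple rules for Cartesian products of sets $A, A', A_i \subset X$, $i \in I$, and $B, B' \subset Y$: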

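\begin{align*}
\Big( \bigcup_{i\in I} A_i \Big) \times B &= \bigcup_{i\in I} (A_i \times B), \\
\Big( \bigcap_{i\in I} A_i \Big) \times B &= \bigcap_{i\in I} (A_i \times B), \\
(A \times B) \cap (A' \times B') &= (A \cap A') \times (B \cap B'), \\
A^c \times B &= (X \times B) \setminus (A \times B), \\
A \times B \subset A' \times B' &\iff A \subset A' \text{ and } B \subset B',
\end{align*}
(13.2)

which are easily derived from the formula $A \times B = (A \times Y) \cap (X \times B) = \pi_1^{-1}(A) \cap \pi_2^{-1}(B)$, where $\pi_1 : X \times Y \to X$ and $\pi_2 : X \times Y \to Y$ are the coordinate projections, and the compatibility of inverse mappings and set operations.

To treat measurability, we assume throughout this chapter that $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ are $\sigma$-finite measure spaces. Following (13.1) we want to define a measure $\rho$ on rectangles of the form $A \times B$ such that $\rho(A \times B) = \mu(A) \nu(B)$ for $A \in \mathcal{A}$ and $B \in \mathcal{B}$. The first problem which we encounter is that the family
\[ \mathcal{A} \times \mathcal{B} := \{ A \times B : A \in \mathcal{A},\ B \in \mathcal{B} \} \tag{13.3} \]
is, in general, no $\sigma$-algebra.

13.1 Lemma Let $\mathcal{A}$ and $\mathcal{B}$ be two $\sigma$-algebras (or only semi-rings). Then $\mathcal{A} \times \mathcal{B}$ is a semi-ring.$^1$

Proof Literally the same as the induction step in the proof of P6.4.

13.2 Definition Let $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ be two measurable spaces. Then $\mathcal{A} \otimes \mathcal{B} := \sigma(\mathcal{A} \times \mathcal{B})$ is called a product $\sigma$-algebra, and $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ is the product of measurable spaces.

The following lemma is quite useful since it allows us to reduce considerations for $\mathcal{A} \otimes \mathcal{B}$ to generators $\mathcal{F}$ and $\mathcal{G}$ of $\mathcal{A}$ and $\mathcal{B}$ – just as we did in (13.1).

$^1$ See $(S_1)$–$(S_3)$ on p. 37 for the definition of a semi-ring.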


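13.3 Lemma If $\mathcal{A} = \sigma(\mathcal{F})$ and $\mathcal{B} = \sigma(\mathcal{G})$ and if $\mathcal{F}, \mathcal{G}$ contain exhausting sequences $(F_j)_{j\in\mathbb{N}} \subset \mathcal{F}$, $F_j \uparrow X$ and $(G_j)_{j\in\mathbb{N}} \subset \mathcal{G}$, $G_j \uparrow Y$, then
\[ \sigma(\mathcal{F} \times \mathcal{G}) = \sigma(\mathcal{A} \times \mathcal{B}) \overset{\text{def}}{=} \mathcal{A} \otimes \mathcal{B}. \]

Proof Since $\mathcal{F} \times \mathcal{G} \subset \mathcal{A} \times \mathcal{B}$ we have $\sigma(\mathcal{F} \times \mathcal{G}) \subset \mathcal{A} \otimes \mathcal{B}$. On the other hand, the system
\[ \Sigma := \{ A \in \mathcal{A} : A \times G \in \sigma(\mathcal{F} \times \mathcal{G}) \ \ \forall G \in \mathcal{G} \} \]
is a $\sigma$-algebra: Let $A, A_j \in \Sigma$, $j \in \mathbb{N}$, and $G \in \mathcal{G}$; $(\Sigma_1)$ follows from
\[ X \times G = \bigcup_{j\in\mathbb{N}} (F_j \times G) \in \sigma(\mathcal{F} \times \mathcal{G}) \quad\text{since each } F_j \times G \in \mathcal{F} \times \mathcal{G}, \]
$(\Sigma_2)$ from $A^c \times G = (X \times G) \setminus (A \times G) \in \sigma(\mathcal{F} \times \mathcal{G})$, and $(\Sigma_3)$ from
\[ \Big( \bigcup_{j\in\mathbb{N}} A_j \Big) \times G = \bigcup_{j\in\mathbb{N}} (A_j \times G) \in \sigma(\mathcal{F} \times \mathcal{G}) \quad\text{since each } A_j \times G \in \sigma(\mathcal{F} \times \mathcal{G}). \]
Obviously, $\mathcal{F} \subset \Sigma \subset \mathcal{A}$, and therefore $\Sigma = \mathcal{A}$; by the very definition of $\Sigma$ we conclude that $\mathcal{A} \times \mathcal{G} \subset \sigma(\mathcal{F} \times \mathcal{G})$. A similar consideration shows $\mathcal{F} \times \mathcal{B} \subset \sigma(\mathcal{F} \times \mathcal{G})$. This means that for all $A \in \mathcal{A}$ and $B \in \mathcal{B}$
\[ A \times B = (A \times Y) \cap (X \times B) = \bigcup_{j,k\in\mathbb{N}} (A \times G_k) \cap (F_j \times B), \]
where $A \times G_k$ and $F_j \times B$ belong to $\sigma(\mathcal{F} \times \mathcal{G})$, so that $\mathcal{A} \times \mathcal{B} \subset \sigma(\mathcal{F} \times \mathcal{G})$ and thus $\mathcal{A} \otimes \mathcal{B} \subset \sigma(\mathcal{F} \times \mathcal{G})$.

If the generators $\mathcal{F}, \mathcal{G}$ are rich enough, we have not too many choices of measures $\rho$ with $\rho(F \times G) = \mu(F) \nu(G)$. In fact,

13.4 Theorem (Uniqueness of product measures) Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be two measure spaces and assume that $\mathcal{A} = \sigma(\mathcal{F})$ and $\mathcal{B} = \sigma(\mathcal{G})$. If
• $\mathcal{F}, \mathcal{G}$ are $\cap$-stable,
• $\mathcal{F}, \mathcal{G}$ contain exhausting sequences $F_j \uparrow X$ and $G_k \uparrow Y$ with $\mu(F_j) < \infty$ and $\nu(G_k) < \infty$ for all $j, k \in \mathbb{N}$,
then there is at most one measure $\rho$ on $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ satisfying
\[ \rho(F \times G) = \mu(F) \nu(G) \quad \forall F \in \mathcal{F},\ G \in \mathcal{G}. \]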

123

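Proof By Lemma 13.3 $\mathcal{F} \times \mathcal{G}$ generates $\mathcal{A} \otimes \mathcal{B}$. Moreover, $\mathcal{F} \times \mathcal{G}$ inherits the $\cap$-stability of $\mathcal{F}$ and $\mathcal{G}$,[✓] the sequence $F_j \times G_j$ increases towards $X \times Y$ and $\rho(F_j \times G_j) = \mu(F_j) \nu(G_j) < \infty$. These were the assumptions of the uniqueness theorem 5.7, showing that there is at most one such product measure $\rho$.

As so often, it is the existence which is more difficult than uniqueness.

13.5 Theorem (Existence of product measures) Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be $\sigma$-finite measure spaces. Then the set-function
\[ \rho : \mathcal{A} \times \mathcal{B} \to [0,\infty], \qquad \rho(A \times B) := \mu(A) \nu(B) \]
extends uniquely to a $\sigma$-finite measure on $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ such that
\[ \rho(E) = \iint 1_E(x,y) \, \mu(dx) \, \nu(dy) = \iint 1_E(x,y) \, \nu(dy) \, \mu(dx) \tag{13.4} \]
holds$^2$ for all $E \in \mathcal{A} \otimes \mathcal{B}$. In particular, the functions
\[ x \mapsto 1_E(x,y), \quad y \mapsto 1_E(x,y), \quad x \mapsto \int 1_E(x,y) \, \nu(dy), \quad y \mapsto \int 1_E(x,y) \, \mu(dx) \]
are $\mathcal{A}$, resp. $\mathcal{B}$-measurable for every fixed $y \in Y$, resp. $x \in X$.

Proof Uniqueness of $\rho$ follows from T13.4.
Existence: Let $(A_j)_{j\in\mathbb{N}}$, $(B_j)_{j\in\mathbb{N}}$ be sequences in $\mathcal{A}$ resp. $\mathcal{B}$ with $A_j \uparrow X$, $B_j \uparrow Y$ and $\mu(A_j), \nu(B_j) < \infty$. Clearly, $E_j := A_j \times B_j \uparrow X \times Y$. For every $j \in \mathbb{N}$ we consider the family $\mathcal{D}_j$ of all subsets $D \subset X \times Y$ satisfying the following conditions:
• $x \mapsto 1_{D \cap E_j}(x,y)$ and $y \mapsto 1_{D \cap E_j}(x,y)$ are measurable,
• $x \mapsto \int 1_{D \cap E_j}(x,y) \, \nu(dy)$ and $y \mapsto \int 1_{D \cap E_j}(x,y) \, \mu(dx)$ are measurable,
• $\iint 1_{D \cap E_j}(x,y) \, \mu(dx) \, \nu(dy) = \iint 1_{D \cap E_j}(x,y) \, \nu(dy) \, \mu(dx)$.
That $\mathcal{A} \times \mathcal{B} \subset \mathcal{D}_j$ follows from
\begin{align*}
\iint 1_{(A \times B) \cap E_j}(x,y) \, \mu(dx) \, \nu(dy)
  &= \iint 1_{A \cap A_j}(x) 1_{B \cap B_j}(y) \, \mu(dx) \, \nu(dy) \\
  &= \mu(A \cap A_j) \int 1_{B \cap B_j}(y) \, \nu(dy) \\
  &= \mu(A \cap A_j) \, \nu(B \cap B_j) \\
  &= \cdots = \iint 1_{(A \times B) \cap E_j}(x,y) \, \nu(dy) \, \mu(dx)
\end{align*}

$^2$ We use the symbols $\int \cdots d$ like brackets, i.e. $\iint \cdots \, d\mu \, d\nu = \int \big( \int \cdots \, d\mu \big) d\nu$.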


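where the ellipsis $\cdots$ stands for the same calculations run through backwards. In each step the measurability conditions needed to perform the integrations are fulfilled because of the product structure.[✓] In particular, $X \times Y, \emptyset, E_k \in \mathcal{D}_j$. If $D \in \mathcal{D}_j$, then $1_{D^c \cap E_j} = 1_{E_j} - 1_{E_j \cap D}$ and
\begin{align*}
\iint 1_{D^c \cap E_j}(x,y) \, \mu(dx) \, \nu(dy)
  &= \int \Big( \int 1_{E_j}(x,y) \, \mu(dx) - \int 1_{E_j \cap D}(x,y) \, \mu(dx) \Big) \nu(dy) \\
  &= \iint 1_{E_j}(x,y) \, \mu(dx) \, \nu(dy) - \iint 1_{E_j \cap D}(x,y) \, \mu(dx) \, \nu(dy) \\
  &= \iint 1_{E_j}(x,y) \, \nu(dy) \, \mu(dx) - \iint 1_{E_j \cap D}(x,y) \, \nu(dy) \, \mu(dx) \\
  &\qquad \text{(by definition, since } E_j, D \in \mathcal{D}_j\text{)} \\
  &= \cdots = \iint 1_{D^c \cap E_j}(x,y) \, \nu(dy) \, \mu(dx).
\end{align*}
Again, in each step the measurability conditions hold since measurable functions form a vector space. If $(D_k)_{k\in\mathbb{N}} \subset \mathcal{D}_j$ are mutually disjoint sets, $D := \bigcup_{k\in\mathbb{N}} D_k$ (a disjoint union), the linearity of the integral and Beppo Levi's theorem in the form of C9.9 show that
\begin{align*}
\iint 1_{D \cap E_j}(x,y) \, \mu(dx) \, \nu(dy)
  &= \iint \sum_{k=1}^\infty 1_{D_k \cap E_j}(x,y) \, \mu(dx) \, \nu(dy) \\
  &= \sum_{k=1}^\infty \iint 1_{D_k \cap E_j}(x,y) \, \mu(dx) \, \nu(dy) \\
  &= \sum_{k=1}^\infty \iint 1_{D_k \cap E_j}(x,y) \, \nu(dy) \, \mu(dx)
     \qquad \text{(by definition, since } D_k \in \mathcal{D}_j\text{)} \\
  &= \cdots = \iint 1_{D \cap E_j}(x,y) \, \nu(dy) \, \mu(dx)
\end{align*}
and the measurability conditions hold since measurability is preserved under sums and increasing limits. The last three calculations show that $\mathcal{D}_j$ is a Dynkin system containing the $\cap$-stable family $\mathcal{A} \times \mathcal{B}$. By Theorem 5.5, $\mathcal{A} \otimes \mathcal{B} \subset \mathcal{D}_j$ for every $j \in \mathbb{N}$. Since $E_j \uparrow X \times Y$, Beppo Levi's theorem 9.6 proves (13.4) along with the measurability of the functions $1_E(\cdot, y)$, $1_E(x, \cdot)$, $\int 1_E(\cdot, y) \, \nu(dy)$ and $\int 1_E(x, \cdot) \, \mu(dx)$ since $\mathcal{M}$ is stable under pointwise limits.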



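Replacing in the above calculations $E_j$ by $X \times Y$ finally proves that
\[ E \mapsto \rho(E) := \iint 1_E(x,y) \, \mu(dx) \, \nu(dy) \]
is indeed a measure on $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ with $\rho(A \times B) = \mu(A) \nu(B)$.

13.6 Definition Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be $\sigma$-finite measure spaces. The unique measure $\rho$ constructed in Theorem 13.5 is called the product of the measures $\mu$ and $\nu$, denoted by $\mu \times \nu$. $(X \times Y, \mathcal{A} \otimes \mathcal{B}, \mu \times \nu)$ is called the product measure space.

Returning to the example considered at the beginning we find

13.7 Corollary If $n > d \geq 1$,
\[ (\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n), \lambda^n) = \big( \mathbb{R}^d \times \mathbb{R}^{n-d},\ \mathcal{B}(\mathbb{R}^d) \otimes \mathcal{B}(\mathbb{R}^{n-d}),\ \lambda^d \times \lambda^{n-d} \big). \]

The next step is to see how we can integrate w.r.t. $\mu \times \nu$. The following two results are often stated together as the Fubini or Fubini–Tonelli theorem. We prefer to distinguish between them since the first result, Theorem 13.8, says that we can always swap iterated integrals of positive functions (even if we get $+\infty$), whereas 13.9 applies to more general functions but requires the (iterated) integrals to be finite.

13.8 Theorem (Tonelli) Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be $\sigma$-finite measure spaces and let $u : X \times Y \to [0,\infty]$ be $\mathcal{A} \otimes \mathcal{B}$-measurable. Then
(i) $x \mapsto u(x,y)$, $y \mapsto u(x,y)$ are $\mathcal{A}$, resp. $\mathcal{B}$-measurable for all $y \in Y$, resp. $x \in X$;
(ii) $x \mapsto \int_Y u(x,y) \, \nu(dy)$, $y \mapsto \int_X u(x,y) \, \mu(dx)$ are $\mathcal{A}$, resp. $\mathcal{B}$-measurable;
(iii) $\displaystyle \int_{X \times Y} u \, d(\mu \times \nu) = \int_Y \int_X u(x,y) \, \mu(dx) \, \nu(dy) = \int_X \int_Y u(x,y) \, \nu(dy) \, \mu(dx)$ with values in $[0,\infty]$.

Proof Since $u$ is positive and $\mathcal{A} \otimes \mathcal{B}$-measurable, we find an increasing sequence of simple functions $f_j \in \mathcal{E}_+(\mathcal{A} \otimes \mathcal{B})$ with $\sup_{j\in\mathbb{N}} f_j = u$. Each $f_j$ is of the form $f_j(x,y) = \sum_{k=0}^{N(j)} \alpha_k 1_{E_k}(x,y)$, where $\alpha_k \geq 0$ and the $E_k \in \mathcal{A} \otimes \mathcal{B}$, $0 \leq k \leq N(j)$,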


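are disjoint. By Theorem 13.5, the fact that $\mathcal{M}(\mathcal{A} \otimes \mathcal{B})$ is a vector space and the linearity of the integral we conclude that
\[ x \mapsto f_j(x,y), \quad y \mapsto f_j(x,y), \quad x \mapsto \int_Y f_j(x,y) \, \nu(dy), \quad y \mapsto \int_X f_j(x,y) \, \mu(dx) \]
are measurable functions and (i), (ii) follow from the usual Beppo-Levi argument since $\mathcal{M}(\mathcal{A})$ and $\mathcal{M}(\mathcal{B})$ are stable under increasing limits, cf. C8.9. Linearity of the integral and Theorem 13.5 also show
\[ \int_{X \times Y} f_j \, d(\mu \times \nu) = \int_Y \int_X f_j \, d\mu \, d\nu = \int_X \int_Y f_j \, d\nu \, d\mu \quad \forall j \in \mathbb{N} \]
and (iii) follows from several applications of Beppo Levi's theorem 9.6.

13.9 Corollary (Fubini's theorem) Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be $\sigma$-finite measure spaces and let $u : X \times Y \to \bar{\mathbb{R}}$ be $\mathcal{A} \otimes \mathcal{B}$-measurable. If at least one of the following three integrals is finite
\[ \int_{X \times Y} |u| \, d(\mu \times \nu), \quad \int_Y \int_X |u(x,y)| \, \mu(dx) \, \nu(dy), \quad \int_X \int_Y |u(x,y)| \, \nu(dy) \, \mu(dx), \]
then all three integrals are finite, $u \in \mathcal{L}^1(\mu \times \nu)$, and
(i) $x \mapsto u(x,y)$ is in $\mathcal{L}^1(\mu)$ for $\nu$-a.e. $y \in Y$;
(ii) $y \mapsto u(x,y)$ is in $\mathcal{L}^1(\nu)$ for $\mu$-a.e. $x \in X$;
(iii) $y \mapsto \int_X u(x,y) \, \mu(dx)$ is in $\mathcal{L}^1(\nu)$;
(iv) $x \mapsto \int_Y u(x,y) \, \nu(dy)$ is in $\mathcal{L}^1(\mu)$;
(v) $\displaystyle \int_{X \times Y} u \, d(\mu \times \nu) = \int_Y \int_X u(x,y) \, \mu(dx) \, \nu(dy) = \int_X \int_Y u(x,y) \, \nu(dy) \, \mu(dx)$.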
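Before turning to the proof, it may help to see that the finiteness assumption cannot simply be dropped. The following standard double-series example is an illustration added here, not part of the text; it takes $X = Y = \mathbb{N}$ with counting measure, so that iterated integrals are iterated sums.

```latex
% Illustration (assumed example): X = Y = \mathbb{N} with counting measure,
% u(j,k) := 1 if k = j,  -1 if k = j+1,  0 otherwise. Requires amsmath.
\begin{align*}
  \sum_{j=1}^\infty \sum_{k=1}^\infty u(j,k)
    &= \sum_{j=1}^\infty \big( u(j,j) + u(j,j+1) \big) = \sum_{j=1}^\infty (1 - 1) = 0, \\
  \sum_{k=1}^\infty \sum_{j=1}^\infty u(j,k)
    &= u(1,1) + \sum_{k=2}^\infty \big( u(k,k) + u(k-1,k) \big) = 1 + \sum_{k=2}^\infty (1 - 1) = 1, \\
  \sum_{j=1}^\infty \sum_{k=1}^\infty |u(j,k)|
    &= \sum_{j=1}^\infty 2 = \infty .
\end{align*}
% Both iterated sums exist but differ; none of the three integrals of |u| is finite,
% so the corollary does not apply.
```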

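Proof Tonelli's theorem 13.8 shows that in $[0,\infty]$
\[ \int_{X \times Y} |u| \, d(\mu \times \nu) = \int_Y \int_X |u| \, d\mu \, d\nu = \int_X \int_Y |u| \, d\nu \, d\mu. \tag{13.5} \]
If one of the integrals is finite, all of them are finite and $u \in \mathcal{L}^1(\mu \times \nu)$ follows. Again by Tonelli's theorem, $x \mapsto u^\pm(x,y)$ is $\mathcal{A}$-measurable and $y \mapsto \int u^\pm(x,y) \, \mu(dx)$ is $\mathcal{B}$-measurable. Since $u^\pm \leq |u|$, (13.5) and C10.13 show that
\[ \int_X u^\pm(x,y) \, \mu(dx) \leq \int_X |u(x,y)| \, \mu(dx) < \infty \quad\text{for } \nu\text{-a.e. } y \in Y \]
and
\[ \int_Y \int_X u^\pm(x,y) \, \mu(dx) \, \nu(dy) \leq \int_Y \int_X |u(x,y)| \, \mu(dx) \, \nu(dy) < \infty. \]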
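This proves (i) and (iii); (ii) and (iv) are shown in a similar way. Finally, (v) follows for $u^+$ and $u^-$ from Theorem 13.8 and for $u = u^+ - u^-$ by linearity, since (i)–(iv) exclude the possibility of '$\infty - \infty$'.

More on measurable functions

There is an alternative way to introduce the product $\sigma$-algebra $\mathcal{A} \otimes \mathcal{B}$. Recall that the coordinate projections
\[ \pi_j : X_1 \times X_2 \to X_j, \quad (x_1, x_2) \mapsto x_j, \quad j = 1, 2, \]
induce the $\sigma$-algebra $\sigma(\pi_1, \pi_2)$ on $X_1 \times X_2$ which is by Definition 7.5 the smallest $\sigma$-algebra such that both $\pi_1$ and $\pi_2$ are measurable maps.

13.10 Theorem Let $(X_j, \mathcal{A}_j)$, $j = 1, 2$, and $(Z, \mathcal{C})$ be measurable spaces. Then
(i) $\mathcal{A}_1 \otimes \mathcal{A}_2 = \sigma(\pi_1, \pi_2)$;
(ii) $T : Z \to X_1 \times X_2$ is $\mathcal{C}/\mathcal{A}_1 \otimes \mathcal{A}_2$-measurable if, and only if, $\pi_j \circ T$ is $\mathcal{C}/\mathcal{A}_j$-measurable ($j = 1, 2$);
(iii) if $S : X_1 \times X_2 \to Z$ is measurable, then $S(x_1, \cdot)$ and $S(\cdot, x_2)$ are $\mathcal{A}_2/\mathcal{C}$- resp. $\mathcal{A}_1/\mathcal{C}$-measurable for every $x_1 \in X_1$, resp. $x_2 \in X_2$.

Proof (i) Since $\pi_1^{-1}(x) = \{x\} \times X_2$, $\pi_2^{-1}(y) = X_1 \times \{y\}$ and $A_1 \times A_2 = (A_1 \times X_2) \cap (X_1 \times A_2)$, we have
\[ \sigma(\pi_1, \pi_2) \overset{7.5}{=} \sigma\big( \pi_1^{-1}(\mathcal{A}_1), \pi_2^{-1}(\mathcal{A}_2) \big) = \sigma\big( \{ A_1 \times X_2,\ X_1 \times A_2 : A_j \in \mathcal{A}_j \} \big) \]
which shows that $\mathcal{A}_1 \times \mathcal{A}_2 \subset \sigma(\pi_1, \pi_2) \subset \mathcal{A}_1 \otimes \mathcal{A}_2$, hence $\sigma(\pi_1, \pi_2) = \mathcal{A}_1 \otimes \mathcal{A}_2$.

(ii) If $T : Z \to X_1 \times X_2$ is measurable, then so is $\pi_j \circ T$ by part (i) and T7.4. Conversely, if $\pi_j \circ T$, $j = 1, 2$, are measurable we find
\begin{align*}
T^{-1}(A_1 \times A_2) &= T^{-1}\big( \pi_1^{-1}(A_1) \cap \pi_2^{-1}(A_2) \big) \\
  &= T^{-1}\big( \pi_1^{-1}(A_1) \big) \cap T^{-1}\big( \pi_2^{-1}(A_2) \big) \\
  &= (\pi_1 \circ T)^{-1}(A_1) \cap (\pi_2 \circ T)^{-1}(A_2) \in \mathcal{C}.
\end{align*}
Since $\mathcal{A}_1 \times \mathcal{A}_2$ generates $\mathcal{A}_1 \otimes \mathcal{A}_2$, $T$ is measurable by L7.2.

(iii) Fix $x_1 \in X_1$ and consider $y \mapsto S(x_1, y)$. Then $S(x_1, \cdot) = S \circ i_{x_1}(\cdot)$, where $i_{x_1} : X_2 \to X_1 \times X_2$, $y \mapsto (x_1, y)$. By part (ii), $i_{x_1}$ is $\mathcal{A}_2/\mathcal{A}_1 \otimes \mathcal{A}_2$-measurable since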


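the maps $\pi_j \circ i_{x_1}(x_2) = x_j$ are $\mathcal{A}_2/\mathcal{A}_j$-measurable ($j = 1, 2$). The claim follows now from T7.4.

Distribution functions

Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space. For $u \in \mathcal{M}_+(\mathcal{A})$ the decreasing, left-continuous[✓] numerical function $t \mapsto \mu(\{u \geq t\})$ is called the distribution function of $u$ (under $\mu$). The next theorem shows that Lebesgue integrals still represent the area between the graph of a function and the abscissa.

13.11 Theorem Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and $u : X \to [0,\infty)$ be $\mathcal{A}$-measurable. Then
\[ \int u \, d\mu = \int_{[0,\infty)} \mu(\{u \geq t\}) \, \lambda^1(dt) \in [0,\infty]. \tag{13.6} \]

Proof Consider the function $U(x,t) := (u(x), t)$ on $X \times [0,\infty)$. By Theorem 13.10(ii), $U$ is $\mathcal{A} \otimes \mathcal{B}[0,\infty)$-measurable, thus $E := \{ (x,t) : u(x) \geq t \} \in \mathcal{A} \otimes \mathcal{B}[0,\infty)$. An application of Tonelli's theorem 13.8 shows
\begin{align*}
\int u(x) \, \mu(dx)
  &= \int_X \int_{[0,\infty)} 1_{[0,u(x)]}(t) \, \lambda^1(dt) \, \mu(dx) \\
  &= \int_X \int_{[0,\infty)} 1_E(x,t) \, \lambda^1(dt) \, \mu(dx) \\
  &= \int_{[0,\infty)} \int_X 1_E(x,t) \, \mu(dx) \, \lambda^1(dt) \\
  &= \int_{[0,\infty)} \mu(\{u \geq t\}) \, \lambda^1(dt).
\end{align*}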
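Formula (13.6) is easy to verify by hand in a simple case. The function $u(x) = x^2$ on $[0,1]$ with Lebesgue measure $\lambda$ is an assumption chosen here for illustration; it does not come from the text.

```latex
% Illustration (assumed data): X = [0,1], \mu = \lambda (Lebesgue measure), u(x) = x^2.
% Requires amsmath.
\begin{align*}
  \lambda(\{u \geq t\}) &= \lambda(\{x \in [0,1] : x \geq \sqrt{t}\})
     = \begin{cases} 1 - \sqrt{t}, & 0 \leq t \leq 1, \\ 0, & t > 1, \end{cases} \\
  \int_{[0,\infty)} \lambda(\{u \geq t\}) \, \lambda^1(dt)
     &= \int_0^1 \big(1 - \sqrt{t}\big) \, dt = 1 - \frac{2}{3} = \frac{1}{3}
      = \int_0^1 x^2 \, dx .
\end{align*}
% Both sides of (13.6) equal 1/3: integrating the distribution function over the
% "heights" t gives the same area as integrating u over the base [0,1].
```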

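If $\phi : [0,\infty) \to [0,\infty)$ is continuously differentiable, increasing and $\phi(0) = 0$, we even have in the setting of Theorem 13.11
\begin{align*}
\int \phi(u) \, d\mu
  &= \int_{[0,\infty)} \mu(\{\phi(u) \geq t\}) \, \lambda^1(dt) \\
  &\overset{(*)}{=} \int_0^\infty \mu(\{\phi(u) \geq t\}) \, dt \\
  &\overset{t = \phi(s)}{=} \int_0^\infty \mu(\{\phi(u) \geq \phi(s)\}) \, \phi'(s) \, ds \\
  &= \int_0^\infty \phi'(s) \, \mu(\{u \geq s\}) \, ds.
\end{align*}
The problem with this calculation is the step marked $(*)$ where we equate a Lebesgue integral with a Riemann integral. By Theorem 11.8(ii) we can do this if $t \mapsto \mu(\{\phi(u) \geq t\})$ is Lebesgue a.e. continuous and bounded. Boundedness is not a problem since we may consider $\mu(\{\phi(u) \geq t\}) \wedge N$, $N \in \mathbb{N}$, and let $N \to \infty$ using T9.6. For the a.e. continuity we need

13.12 Lemma Every monotone function $\phi : \mathbb{R} \to \mathbb{R}$ has at most countably many discontinuities and is, in particular, Lebesgue a.e. continuous.

Proof Without loss of generality we may assume that $\phi$ increases. Therefore, the one-sided limits $\lim_{s \uparrow t} \phi(s) = \phi(t-) \leq \phi(t+) = \lim_{s \downarrow t} \phi(s)$ exist in $\mathbb{R}$, so that $\phi$ can only have jump discontinuities where $\phi(t-) < \phi(t+)$. Define for all $\epsilon > 0$
\[ J^\epsilon := \{ t \in \mathbb{R} : \Delta\phi(t) := \phi(t+) - \phi(t-) \geq \epsilon \}. \]
Since on every compact interval $[a,b]$ and for every $\epsilon > 0$
\[ 0 \leq \sum_{t \in [a,b] \cap J^\epsilon} \Delta\phi(t) \leq \phi(b+) - \phi(a-) < \infty, \]
we can have at most $(\phi(b+) - \phi(a-))/\epsilon$ jumps of size $\epsilon$ or larger in the interval $[a,b]$, that is $\#([a,b] \cap J^\epsilon) < \infty$. Therefore, the set of all discontinuities of $\phi$
\[ J := \{ t \in \mathbb{R} : \Delta\phi(t) > 0 \} = \bigcup_{j,k\in\mathbb{N}} [-j,j] \cap J^{1/k} \]
is a countable set, hence a Lebesgue null set.

Since $t \mapsto \mu(\{u \geq t\})$ is decreasing, we finally have

13.13 Corollary Let $(X, \mathcal{A}, \mu)$ be $\sigma$-finite and let $\phi : [0,\infty) \to [0,\infty)$ with $\phi(0) = 0$ be increasing and continuously differentiable. Then
\[ \int \phi(u) \, d\mu = \int_0^\infty \phi'(s) \, \mu(\{u \geq s\}) \, ds \tag{13.7} \]
holds for all $u \in \mathcal{M}_+(\mathcal{A})$; the right-hand side is an improper Riemann integral. Moreover, $\phi(u) \in \mathcal{L}^1(\mu)$ if, and only if, this Riemann integral is finite.

In the important special case where $\phi(t) = t^p$, $p \geq 1$, (13.7) reads
\[ \|u\|_p^p = \int u^p \, d\mu = \int_0^\infty p s^{p-1} \mu(\{u \geq s\}) \, ds. \tag{13.8} \]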
130

R.L. Schilling

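Minkowski's inequality for integrals

The following inequality is a generalization of Minkowski's inequality C12.4 to double integrals. In some sense it is also a theorem on the change of the order of iterated integrals, but equality is only obtained if $p = 1$.

13.14 Theorem (Minkowski's inequality for integrals) Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be $\sigma$-finite measure spaces and $u : X \times Y \to \bar{\mathbb{R}}$ be $\mathcal{A} \otimes \mathcal{B}$-measurable. Then
\[ \Big( \int_X \Big( \int_Y |u(x,y)| \, \nu(dy) \Big)^p \mu(dx) \Big)^{1/p} \leq \int_Y \Big( \int_X |u(x,y)|^p \, \mu(dx) \Big)^{1/p} \nu(dy) \]
holds for all $p \in [1,\infty)$, with equality for $p = 1$.

Proof If $p = 1$, the assertion follows directly from Tonelli's theorem 13.8. If $p > 1$ we set
\[ U_k(x) := \Big( \int_Y |u(x,y)| \, \nu(dy) \wedge k \Big) 1_{A_k}(x) \]
where $A_k \in \mathcal{A}$ is a sequence with $A_k \uparrow X$ and $\mu(A_k) < \infty$. Without loss of generality we may assume that $U_k(x) > 0$ on a set of positive $\mu$-measure, otherwise the left-hand side of the above inequality would be $0$ (using Beppo Levi's theorem 9.6) and there would be nothing to prove. By Tonelli's theorem and Hölder's inequality T12.2 with $\frac{1}{p} + \frac{1}{q} = 1$ or $q = \frac{p}{p-1}$, we find
\begin{align*}
\int_X U_k^p(x) \, \mu(dx)
  &\leq \int_X U_k^{p-1}(x) \Big( \int_Y |u(x,y)| \, \nu(dy) \Big) \mu(dx) \\
  &= \int_Y \Big( \int_X U_k^{p-1}(x) |u(x,y)| \, \mu(dx) \Big) \nu(dy) \\
  &\leq \int_Y \Big( \int_X U_k^p(x) \, \mu(dx) \Big)^{1 - 1/p} \Big( \int_X |u(x,y)|^p \, \mu(dx) \Big)^{1/p} \nu(dy).
\end{align*}
The claim follows upon dividing both sides by $\big( \int_X U_k^p(x) \, \mu(dx) \big)^{1 - 1/p}$ and letting $k \to \infty$ with Beppo Levi's theorem 9.6.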

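Problems

13.1. Prove the rules (13.2) for Cartesian products.

13.2. Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be two $\sigma$-finite measure spaces. Show that $A \times N$, where $A \in \mathcal{A}$ and $N \in \mathcal{B}$, $\nu(N) = 0$, is a $\mu \times \nu$-null set.

13.3. Denote by $\lambda$ Lebesgue measure on $(0,\infty)$. Prove that the following iterated integrals exist and that
\[ \int_{(0,\infty)} \int_{(0,\infty)} e^{-xy} \sin x \sin y \, \lambda(dx) \, \lambda(dy) = \int_{(0,\infty)} \int_{(0,\infty)} e^{-xy} \sin x \sin y \, \lambda(dy) \, \lambda(dx). \]
Does this imply that the double integral exists?

13.4. Denote by $\lambda$ Lebesgue measure on $(0,1)$. Show that the following iterated integrals exist, but yield different values:
\[ \int_{(0,1)} \int_{(0,1)} \frac{x^2 - y^2}{(x^2 + y^2)^2} \, \lambda(dx) \, \lambda(dy) \neq \int_{(0,1)} \int_{(0,1)} \frac{x^2 - y^2}{(x^2 + y^2)^2} \, \lambda(dy) \, \lambda(dx). \]
What does this tell about the double integral?

13.5. Denote by $\lambda$ Lebesgue measure on $(-1,1)$. Show that the iterated integrals exist, coincide,
\[ \int_{(-1,1)} \int_{(-1,1)} \frac{xy}{(x^2 + y^2)^2} \, \lambda(dx) \, \lambda(dy) = \int_{(-1,1)} \int_{(-1,1)} \frac{xy}{(x^2 + y^2)^2} \, \lambda(dy) \, \lambda(dx), \]
but that the double integral does not exist.

13.6. (i) Prove that $\int_{(0,\infty)} e^{-tx} \, \lambda(dt) = \frac{1}{x}$ for all $x > 0$.
(ii) Use (i) and Fubini's theorem to show that the sine integral
\[ \lim_{n\to\infty} \int_{(0,n)} \frac{\sin x}{x} \, \lambda(dx) = \frac{\pi}{2}. \]

13.7. Let $\mu(A) := \#A$ be the counting measure and $\lambda$ be Lebesgue measure on the measurable space $([0,1], \mathcal{B}[0,1])$. Denote by $\Delta := \{ (x,y) \in [0,1]^2 : x = y \}$ the diagonal in $[0,1]^2$. Check that
\[ \int_{[0,1]} \int_{[0,1]} 1_\Delta(x,y) \, \lambda(dx) \, \mu(dy) \neq \int_{[0,1]} \int_{[0,1]} 1_\Delta(x,y) \, \mu(dy) \, \lambda(dx). \]
Does this contradict Tonelli's theorem?

13.8. (i) State Tonelli's and Fubini's theorems for spaces of sequences, i.e. for the measure space $(\mathbb{N}, \mathcal{P}(\mathbb{N}), \mu)$ where $\mu := \sum_{j\in\mathbb{N}} \delta_j$, and obtain criteria when one can interchange two infinite summations.
(ii) Using similar considerations as in part (i) deduce the following.
Lemma Let $(A_j)_j$ be countably many (i.e. a finite or countably infinite number of) mutually disjoint sets whose union is $\mathbb{N}$, and let $(x_k)_{k\in\mathbb{N}} \subset \mathbb{R}$ be a sequence. Then
\[ \sum_{k\in\mathbb{N}} x_k = \sum_j \sum_{k \in A_j} x_k \]
in the sense that if either side converges absolutely, so does the other, in which case both sides are equal.

13.9. Let $u : \mathbb{R} \to [0,\infty)$ be a Borel measurable function. Denote by $S[u] := \{ (x,y) : 0 \leq y \leq u(x) \}$ the set above the abscissa and below the graph $\Gamma[u] := \{ (x, u(x)) : x \in \mathbb{R} \}$ of $u$.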

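(i) Show that $S[u] \in \mathcal{B}(\mathbb{R}^2)$.
(ii) Is it true that $\lambda^2(S[u]) = \int u \, d\lambda^1$?
(iii) Show that $\Gamma[u] \in \mathcal{B}(\mathbb{R}^2)$ and that $\lambda^2(\Gamma[u]) = 0$.
[Hint: (i) – use T8.8 to approximate $u$ by simple functions $f_j \uparrow u$. Thus $S[u] = \bigcup_j S[f_j]$ and $S[f_j] \in \mathcal{B}(\mathbb{R}^2)$ is easy to see; alternatively, use T13.10, set $U(x,y) := (u(x), y)$ and observe that $S[u] = U^{-1}(C)$ for the closed set $C := \{ (x,y) : x \geq y \}$; (ii) – use Tonelli's theorem; (iii) – use $\Gamma[u] \subset S[u] \setminus S[(u - \epsilon)^+]$ or $\Gamma[u] = U^{-1}(\{ (x,y) : x = y \})$; show first that $\lambda^2(\Gamma[u] \cap [-n,n]^2) = 0$ for every $n \in \mathbb{N}$ and observe that $\Gamma[u] \cap [-n,n]^2 = \Gamma[u 1_{[-n,n]} \wedge n]$.]

13.10. Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and let $u \in \mathcal{M}_+(\mathcal{A})$ be a $[0,\infty]$-valued measurable function. Show that the set
\[ Y := \{ y \in \mathbb{R} : \mu(\{ x : u(x) = y \}) \neq 0 \} \subset \mathbb{R} \]
is countable. [Hint: assume that $u \in \mathcal{L}^1_+(\mu)$. Set $Y_{\epsilon,\eta} := \{ y > \epsilon : \mu(\{u = y\}) > \eta \}$ and observe that for $t_1, \ldots, t_N \in Y_{\epsilon,\eta}$ we have $N \epsilon \eta \leq \sum_{j=1}^N t_j \mu(\{u = t_j\}) \leq \int u \, d\mu$. Thus $Y_{\epsilon,\eta}$ is a finite set, and $Y = \bigcup_{k,n\in\mathbb{N}} Y_{\frac{1}{n},\frac{1}{k}}$ is countable. If $u$ is not integrable, consider $(u \wedge m) 1_{A_m}$, $m \in \mathbb{N}$, where $A_m \uparrow X$ is an exhaustion.]

13.11. Completion (5). Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be any two measure spaces such that $\mathcal{A} \neq \mathcal{P}(X)$ and such that $\mathcal{B}$ contains non-empty null sets.
(i) Show that $\mu \times \nu$ on $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ is not complete, even if both $\mu$ and $\nu$ were complete.
(ii) Conclude from (i) that neither $(\mathbb{R}^2, \mathcal{B}(\mathbb{R}) \otimes \mathcal{B}(\mathbb{R}), \lambda \times \lambda)$ nor the product of the completed spaces $(\mathbb{R}^2, \mathcal{B}^*(\mathbb{R}) \otimes \mathcal{B}^*(\mathbb{R}), \bar{\lambda} \times \bar{\lambda})$ are complete.
[Hint: you may assume in (ii) that $\mathcal{B}^*(\mathbb{R}) \neq \mathcal{P}(\mathbb{R})$.]

13.12. Let $\mu$ be a bounded measure on the measure space $([0,\infty), \mathcal{B}[0,\infty))$.
(i) Show that $A \in \mathcal{B}[0,\infty) \otimes \mathcal{P}(\mathbb{N})$ if, and only if, $A = \bigcup_{j\in\mathbb{N}} B_j \times \{j\}$, where $(B_j)_{j\in\mathbb{N}} \subset \mathcal{B}[0,\infty)$.
(ii) Show that there exists a unique measure $\pi$ on $\mathcal{B}[0,\infty) \otimes \mathcal{P}(\mathbb{N})$ satisfying
\[ \pi(B \times \{n\}) = \int_B e^{-t} \frac{t^n}{n!} \, \mu(dt). \]

13.13. Stieltjes measure (2). Stieltjes integrals. This continues Problem 7.9. Let $\mu$ and $\nu$ be two measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ such that $\mu((-n,n]), \nu((-n,n]) < \infty$ for all $n \in \mathbb{N}$, and denote by
\[ F(x) := \begin{cases} \mu((0,x]) & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ -\mu((x,0]) & \text{if } x < 0, \end{cases} \qquad\text{and}\qquad G(x) := \begin{cases} \nu((0,x]) & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ -\nu((x,0]) & \text{if } x < 0, \end{cases} \]
the associated right-continuous distribution functions (in Problem 7.9 we considered left-continuous distribution functions). Moreover, set $\Delta F(x) = F(x) - F(x-)$ and $\Delta G(x) = G(x) - G(x-)$.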



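(i) Show that $F, G$ are increasing, right-continuous and that $\Delta F(x) = 0$ if, and only if, $\mu(\{x\}) = 0$. Moreover, $F$ and $\mu$ are in one-to-one correspondence.
(ii) Since measures and distribution functions are in one-to-one correspondence, it is customary to write $\int u \, d\mu = \int u \, dF$, etc. If $a < b$ we set $B := \{ (x,y) : a < x \leq b,\ x \leq y \leq b \}$. Show that $B$ is measurable and that
\[ \mu \times \nu(B) = \int_{(a,b]} F(s) \, dG(s) - F(a)\big( G(b) - G(a) \big). \]
(iii) Integration by parts. Show that
\begin{align*}
F(b) G(b) - F(a) G(a)
  &= \int_{(a,b]} F(s) \, dG(s) + \int_{(a,b]} G(s-) \, dF(s) \\
  &= \int_{(a,b]} F(s-) \, dG(s) + \int_{(a,b]} G(s-) \, dF(s) + \sum_{a < s \leq b} \Delta F(s) \Delta G(s).
\end{align*}
[Hint: expand $\mu \times \nu((a,b]^2)$ in two different ways, using (ii). Note that the sum in the second part of the formula is at most countable because of L13.12.]
(iv) Change of variable formula. Let $\phi$ be a $C^1$-function. Then
\[ \phi(F(b)) - \phi(F(a)) = \int_{(a,b]} \phi'(F(s-)) \, dF(s) + \sum_{a < s \leq b} \big[ \phi(F(s)) - \phi(F(s-)) - \phi'(F(s-)) \Delta F(s) \big]. \]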

[Hint: use (iii) to show the change of variable formula for polynomials and then use the fact that continuous functions can be uniformly approximated by a sequence of polynomials – cf. Weierstraß' approximation theorem 24.6.]

13.14. Rearrangements. Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space and let $f \in \mathcal{L}^p(\mu)$ for some $p \in [1,\infty)$. The distribution function of $f$ is given by $\mu_f(t) := \mu(\{|f| \geq t\})$ and the decreasing rearrangement of $f$ is the generalized inverse of $\mu_f$,
\[ f^*(\xi) := \inf\{ t : \mu_f(t) \leq \xi \}, \quad \xi \geq 0 \qquad (\inf \emptyset = +\infty). \]
(i) Let $f = 2 \cdot 1_{[1,3]} + 4 \cdot 1_{[4,5]} + 3 \cdot 1_{[6,9]}$. Make a sketch of the graphs of $f(x)$, $\mu_f(t)$ and $f^*(\xi)$.
(ii) Show that for $f \in \mathcal{L}^p(\mu)$
\[ \int |f|^p \, d\mu = p \int_0^\infty t^{p-1} \mu_f(t) \, dt = \int_0^\infty (f^*(\xi))^p \, \lambda(d\xi). \]
In other words: $\|f\|_p = \|f^*\|_p$. Because of this the space $\mathcal{L}^p$ is said to be rearrangement invariant.

14 Integrals with respect to image measures

Let $(X, \mathcal{A}, \mu)$ be a measure space and $(X', \mathcal{A}')$ be a measurable space. As we have seen in T7.6 and D7.7, any $\mathcal{A}/\mathcal{A}'$-measurable map $T : X \to X'$ can be used to transport the measure $\mu$, defined on $(X, \mathcal{A})$, to a measure on $(X', \mathcal{A}')$:
\[ T(\mu)(A') := \mu(T^{-1}(A')) \quad \forall A' \in \mathcal{A}'. \tag{14.1} \]
Let us see how (14.1) extends to integrals. To make sense of the integral $\int \cdots \, dT(\mu)$ w.r.t. the image measure $T(\mu)$, we use again the recipe from Chapters 9, 10 when we introduced the integral. First, we calculate the image integrals for indicator functions and, by linearity, for (positive) simple functions. By monotone convergence T9.6 we extend the resulting formula to all positive measurable functions and, finally, considering positive and negative parts, to the whole class $\mathcal{L}^1(T(\mu))$. This is the blueprint for the proof of

14.1 Theorem Let $T : X \to X'$ be a measurable map between the measure space $(X, \mathcal{A}, \mu)$ and the measurable space $(X', \mathcal{A}')$. For every $\mathcal{A}'/\bar{\mathcal{B}}$-measurable and $T(\mu)$-integrable function $u : X' \to \bar{\mathbb{R}}$ we find that $u \circ T : X \to \bar{\mathbb{R}}$ is $\mathcal{A}/\bar{\mathcal{B}}$-measurable, $\mu$-integrable and satisfies
\[ \int_{X'} u \, dT(\mu) = \int_X u \circ T \, d\mu. \tag{14.2} \]
If $u \geq 0$ is positive, (14.2) remains valid without assuming $T(\mu)$-integrability.

Proof Since $u$ and $T$ are measurable, so is $u \circ T$, see T7.4, and the integrals in (14.2) are well-defined.


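Let us assume that $u \ge 0$ but not necessarily integrable, i.e. $\int u\,dT(\mu) = +\infty$ is allowed. For a simple function $f \in \mathcal{E}^+(\mathcal{A}')$,

\[ f = \sum_{j=0}^M y_j 1_{A_j}, \qquad A_j \in \mathcal{A}',\ y_j \ge 0, \]

the identity (14.2) follows from (14.1) by linearity:

\[
\begin{aligned}
\int_{X'} f\,dT(\mu) &= \sum_{j=0}^M y_j \int_{X'} 1_{A_j}\,dT(\mu) = \sum_{j=0}^M y_j\, T(\mu)(A_j) \\
&\overset{(14.1)}{=} \sum_{j=0}^M y_j\, \mu\big(T^{-1}(A_j)\big) = \sum_{j=0}^M y_j \int_X 1_{T^{-1}(A_j)}(x)\,\mu(dx) \\
&= \sum_{j=0}^M y_j \int_X 1_{A_j}(T(x))\,\mu(dx) = \int_X f(T(x))\,\mu(dx)
\end{aligned}
\tag{14.3}
\]

where we used that $1_{T^{-1}(A')}(x) = 1_{A'}(T(x))$ for all $A' \in \mathcal{A}'$. Since every $u \in \mathcal{M}^+(\mathcal{A}')$ is the limit of an increasing sequence of positive simple functions $f_j \in \mathcal{E}^+(\mathcal{A}')$, see T8.8, we can use Beppo Levi's theorem 9.6 and (14.3) to get

\[ \int_{X'} u\,dT(\mu) \overset{9.6}{=} \sup_{j\in\mathbb{N}} \int_{X'} f_j\,dT(\mu) \overset{(14.3)}{=} \sup_{j\in\mathbb{N}} \int_X f_j\circ T\,d\mu \overset{9.6}{=} \int_X u\circ T\,d\mu. \]

If $u$ is $T(\mu)$-integrable, we write $u = u^+ - u^-$ and apply (14.2) to $u^\pm$ separately. All we have to do is to observe that $u^\pm\circ T = (u\circ T)^\pm$ and that, due to the integrability assumption, $\int (u\circ T)^\pm\,d\mu \overset{(14.2)}{=} \int u^\pm\,dT(\mu) < \infty$.

Often we are in the situation where $T : X \to X'$ is invertible with an inverse map $T^{-1} : X' \to X$. In this case we can strengthen Theorem 14.1.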

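14.2 Corollary If in the situation of Theorem 14.1 the measurable map $T : X \to X'$ has a measurable inverse $T^{-1} : X' \to X$, then $u : X' \to \bar{\mathbb{R}}$ is $T(\mu)$-integrable (and, a fortiori, $\mathcal{A}'/\bar{\mathcal{B}}$-measurable) if, and only if, $u\circ T$ is $\mu$-integrable (and, a fortiori, $\mathcal{A}/\bar{\mathcal{B}}$-measurable). In this case (14.2) holds.

Proof Apply Theorem 14.1 to $u$ and $u\circ T$ using the measurable transformations $T$ and $T^{-1}$ respectively.

14.3 Examples We will frequently encounter the following particular situation of Corollary 14.2: let $(X,\mathcal{A},\mu)$ be $(\mathbb{R}^n,\mathcal{B}(\mathbb{R}^n),\lambda^n)$ where $\lambda^n$ is n-dimensional Lebesgue measure and let $(X',\mathcal{A}') = (\mathbb{R}^n,\mathcal{B}(\mathbb{R}^n))$. The maps

\[ \sigma : \mathbb{R}^n \to \mathbb{R}^n,\ y \mapsto -y \qquad\text{and}\qquad \tau_x : \mathbb{R}^n \to \mathbb{R}^n,\ y \mapsto y - x \]

are continuous, hence measurable (Example 7.3) and so are their inverses $\sigma^{-1} = \sigma$ and $\tau_x^{-1} = \tau_{-x}$.

(i) By Corollary 7.11, Lebesgue measure $\lambda^n$ is invariant under reflections and translations, so that $\lambda^n = \sigma(\lambda^n)$ and $\lambda^n = \tau_x(\lambda^n)$ for all $x \in \mathbb{R}^n$. But then (14.2) becomes

\[ \int u(-y)\,\lambda^n(dy) = \int u(\sigma(y))\,\lambda^n(dy) = \int u(y)\,\sigma(\lambda^n)(dy) = \int u(y)\,\lambda^n(dy) \tag{14.4} \]

and, for all $x \in \mathbb{R}^n$,

\[ \int u(y \mp x)\,\lambda^n(dy) = \int u(\tau_{\pm x}(y))\,\lambda^n(dy) = \int u(y)\,\tau_{\pm x}(\lambda^n)(dy) = \int u(y)\,\lambda^n(dy). \tag{14.5} \]

(ii) If we consider Lebesgue measure with a density $f \ge 0$, $f\lambda^n$, cf. L10.8, we find

\[
\begin{aligned}
\int u(y)\,\tau_x(f\lambda^n)(dy) &= \int u(\tau_x(y))\,f(y)\,\lambda^n(dy) \\
&= \int u(\tau_x(y))\,f\big(\tau_{-x}(\tau_x(y))\big)\,\lambda^n(dy) = \int u(y)\,f(y+x)\,\lambda^n(dy)
\end{aligned}
\tag{14.6}
\]

which also proves that $\tau_x(f\lambda^n) = (f\circ\tau_{-x})\,\lambda^n$.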
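The image integral formula (14.2) can be checked by hand on a finite set, where every integral is a finite sum. The following is only an illustrative sketch and not part of the text: the measure `mu`, the map `T` and the function `u` are made-up examples, and a discrete measure is represented as a dictionary from points to masses.

```python
from collections import defaultdict
from fractions import Fraction


def image_measure(mu, T):
    """Image measure T(mu): the mass of a point y is mu(T^{-1}({y}))."""
    nu = defaultdict(Fraction)  # missing points start with mass 0
    for x, mass in mu.items():
        nu[T(x)] += mass
    return dict(nu)


def integrate(u, mu):
    """Integral of u w.r.t. a discrete measure: sum of u(x) * mu({x})."""
    return sum(u(x) * mass for x, mass in mu.items())


# Hypothetical example data: a finite measure on {-3, ..., 3}.
mu = {x: Fraction(1, 2 ** abs(x) + 1) for x in range(-3, 4)}
T = lambda x: x * x        # measurable but not injective
u = lambda y: 3 * y - 1    # takes both signs, so u+ and u- both matter

lhs = integrate(u, image_measure(mu, T))   # integral of u w.r.t. T(mu)
rhs = integrate(lambda x: u(T(x)), mu)     # integral of u o T w.r.t. mu
print(lhs, rhs)
```

Exact rational arithmetic is used so that both sides can be compared with `==` rather than up to rounding.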


Convolutions

The convolution or Faltung of functions and measures on $(\mathbb{R}^n,\mathcal{B}(\mathbb{R}^n))$ appears naturally in functional analysis, Fourier analysis, probability theory and other branches of mathematics. One can understand it as an averaging process that respects translations and results in a gain of smoothness.

14.4 Definition Let $\mu$ and $\nu$ be measures on $(\mathbb{R}^n,\mathcal{B}(\mathbb{R}^n))$ and $u, v : \mathbb{R}^n \to \bar{\mathbb{R}}$ be measurable numerical functions. The convolution of …

• …two functions $u$ and $v$ is the function

\[ u\star v(x) := \int u(x-y)\,v(y)\,\lambda^n(dy) \tag{14.7} \]

provided $u(x-\bullet)\,v$ is positive or contained in $\mathcal{L}^1(\lambda^n)$;

• …of the function $u$ and the measure $\mu$ is the function

\[ u\star\mu(x) := \int u(x-y)\,\mu(dy) \tag{14.8} \]

provided $u(x-\bullet)$ is positive or contained in $\mathcal{L}^1(\mu)$;

• …of two measures $\mu$ and $\nu$ is the measure

\[ \mu\star\nu(B) := \iint 1_B(x+y)\,\mu(dx)\,\nu(dy), \qquad B \in \mathcal{B}(\mathbb{R}^n). \tag{14.9} \]

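14.5 Remarks (i) The convolution of two functions (or of a function with a measure or of two measures) is linear in each of its arguments, e.g.

\[ (\alpha u + \beta v)\star w = \alpha\, u\star w + \beta\, v\star w, \qquad \alpha,\beta \in \mathbb{R}. \]

Similar formulae hold in the second argument and for the other cases.

(ii) The function $\pi : \mathbb{R}^{2n} \to \mathbb{R}^n$, $\pi(x,y) := x + y$, is Borel measurable and

\[ \mu\star\nu(B) = \iint 1_B(x+y)\,\mu(dx)\,\nu(dy) = \int 1_{\pi^{-1}(B)}(x,y)\,d(\mu\times\nu)(x,y) = \pi(\mu\times\nu)(B). \]

If $\mu$ and $\nu$ have densities $u, v \ge 0$ w.r.t. Lebesgue measure, that is $\mu = u\lambda^n$ and $\nu = v\lambda^n$, we find for all $B \in \mathcal{B}(\mathbb{R}^n)$ that

\[ (u\lambda^n)\star(v\lambda^n)(B) = \iint 1_B(x+y)\,u(x)\,v(y)\,\lambda^n(dx)\,\lambda^n(dy) = \int_B \int u(x-y)\,v(y)\,\lambda^n(dy)\,\lambda^n(dx) \]

where we used Tonelli's theorem 13.8 and (14.6). Thus $(u\lambda^n)\star(v\lambda^n) = (u\star v)\,\lambda^n$; in a similar way one shows that $(u\star\mu)\,\lambda^n = (u\lambda^n)\star\mu$. Interpreting (14.9) as $\int 1_B\,d(\mu\star\nu) = \iint 1_B(x+y)\,\mu(dx)\,\nu(dy)$, we easily see that

\[ \int \phi\,d(\mu\star\nu) = \iint \phi(x+y)\,\mu(dx)\,\nu(dy) \]

first for simple functions, then by T9.6 for positive measurable functions, and finally by linearity for general $\phi \in \mathcal{L}^1(\mu\star\nu)$.

Note that the definition of $u\star v$ is not really straightforward since we require $u(x-\bullet)\,v$ to be positive or integrable. Here is a much handier criterion due to W.H. Young.

14.6 Theorem (Young's inequality) Let $u \in \mathcal{L}^1(\lambda^n)$ and $v \in \mathcal{L}^p(\lambda^n)$, $p \in [1,\infty)$. Then the convolution $u\star v$ defines a function in $\mathcal{L}^p(\lambda^n)$, satisfies $u\star v = v\star u$ and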

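\[ \|u\star v\|_p \le \|u\|_1 \cdot \|v\|_p. \tag{14.10} \]

Proof Assume first that $u, v \ge 0$. Let $\delta(x,y) := x - y$. Then $\delta$ is Borel measurable and so are $(x,y) \mapsto u(x-y)v(y)$ and $(x,y) \mapsto u(y)v(x-y)$. Since $\lambda^n$ is invariant under translations, we see using (14.4)–(14.6)

\[ u\star v(x) = \int u(x-y)\,v(y)\,\lambda^n(dy) = \int u(y)\,v(x-y)\,\lambda^n(dy) = v\star u(x). \]

Moreover,

\[
\begin{aligned}
\|u\star v\|_p^p &= \int \Big(\int u(y)\,v(x-y)\,\lambda^n(dy)\Big)^p \lambda^n(dx) \\
&= \|u\|_1^p \int \Big(\int v(x-y)\,\frac{u(y)}{\|u\|_1}\,\lambda^n(dy)\Big)^p \lambda^n(dx) \\
&\overset{12.14}{\le} \|u\|_1^p \int\!\!\int v(x-y)^p\,\frac{u(y)}{\|u\|_1}\,\lambda^n(dy)\,\lambda^n(dx) \\
&\overset{13.8}{=} \|u\|_1^p \int \frac{u(y)}{\|u\|_1} \underbrace{\int v(x-y)^p\,\lambda^n(dx)}_{=\ \|v\|_p^p \text{ by (14.5)}}\,\lambda^n(dy) \\
&= \|u\|_1^p\,\|v\|_p^p
\end{aligned}
\]

which implies that $u\star v \in \mathcal{L}^p(\lambda^n)$. The general case follows now from considering $u = u^+ - u^-$ and $v = v^+ - v^-$ and the fact that $(u^+ - u^-)\star v^\pm = u^+\star v^\pm - u^-\star v^\pm$ where the difference is a.e. defined since $u^\pm\star v^\pm \in \mathcal{L}^p(\lambda^n)$ is a.e. finite.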
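Young's inequality can be tried out numerically in a discrete setting. The sketch below is an assumption-laden analogue, not the theorem itself: it replaces Lebesgue measure on $\mathbb{R}^n$ by counting measure on the integers, so that functions become finitely supported sequences and the convolution integral becomes a finite sum. The sequences `u` and `v` are arbitrary sample data.

```python
def convolve(u, v):
    """Discrete convolution (u * v)[k] = sum_j u[j] * v[k - j]."""
    out = [0.0] * (len(u) + len(v) - 1)
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            out[i + j] += a * b
    return out


def norm(w, p):
    """l^p norm of a finite sequence, 1 <= p < infinity."""
    return sum(abs(t) ** p for t in w) ** (1.0 / p)


# Hypothetical sample sequences with mixed signs.
u = [0.5, -1.0, 2.0, 0.25]
v = [1.0, 3.0, -2.0, 0.0, 4.0]

for p in (1.0, 1.5, 2.0, 3.0):
    left = norm(convolve(u, v), p)
    right = norm(u, 1.0) * norm(v, p)
    print(f"p={p}: ||u*v||_p = {left:.6f} <= ||u||_1 ||v||_p = {right:.6f}")
```

The same loop also makes the symmetry $u\star v = v\star u$ visible, since swapping the arguments only reorders the products that are summed.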


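The convolution $u\star v$ is a hybrid of $u$ and $v$ which inherits those properties which are preserved under translations and averages, cf. Problems 14.6–14.8. In general, $u\star v$ is smoother than $u$ and $v$. To see this, we need the following result which, although similar to Corollary 12.11, is much deeper and uses the topological structure of $(\mathbb{R}^n,\mathcal{B}(\mathbb{R}^n),\lambda^n)$.

14.7 Lemma The continuous functions with compact support $C_c(\mathbb{R}^n)$ are a dense subset of $\mathcal{L}^p(\lambda^n)$, $p \in [1,\infty)$.

Proof Postponed to Chapter 15, Theorem 15.17.

14.8 Theorem Let $u \in \mathcal{L}^p(\lambda^n)$, $p \in [1,\infty)$.

(i) The map $x \mapsto \int |u(x+y) - u(y)|^p\,\lambda^n(dy)$ is uniformly continuous.

(ii) If $u \in \mathcal{L}^1(\lambda^n)$, $v \in \mathcal{L}^\infty(\lambda^n)$, then $u\star v$ is bounded and continuous.

Proof (i) Because of Lemma 14.7 we find for every $\epsilon > 0$ some $\phi \in C_c(\mathbb{R}^n)$ such that $\|u - \phi\|_p \le \epsilon$. By the lower triangle inequality for $\|\bullet\|_p$ and the translation invariance of $\lambda^n$ we find for any two $x, x' \in \mathbb{R}^n$

\[ \Big|\, \|u(x+\bullet) - u\|_p - \|u(x'+\bullet) - u\|_p \Big| \le \|u(x+\bullet) - u(x'+\bullet)\|_p \overset{(14.5)}{=} \|u(x-x'+\bullet) - u\|_p. \]

Using again the triangle inequality and translation invariance we get for every $R > 0$ and all $x, x'$ with $|x - x'| < R/2$

\[
\begin{aligned}
\|u(x-x'+\bullet) - u\|_p &\le \big\|(u(x-x'+\bullet) - u)\,1_{B_R(0)}\big\|_p + \big\|(u(x-x'+\bullet) - u)\,1_{B_R^c(0)}\big\|_p \\
&\overset{[✓]}{\le} \big\|(u(x-x'+\bullet) - u)\,1_{B_R(0)}\big\|_p + 2\Big(\int_{B_{R/2}^c(0)} |u|^p\,d\lambda^n\Big)^{1/p}.
\end{aligned}
\]

Since $u \in \mathcal{L}^p(\lambda^n)$, it follows from the monotone convergence theorem 11.1 that $\lim_{R\to\infty} \int_{B_R^c(0)} |u|^p\,d\lambda^n = 0$, so that we can achieve

\[ \Big(\int_{B_{R/2}^c(0)} |u|^p\,d\lambda^n\Big)^{1/p} \le \epsilon \qquad \forall\, R > R_\epsilon. \]

Since $\phi$ is continuous with compact support, it is uniformly continuous, which means that there is a $\delta = \delta_R > 0$ such that for all $y \in \mathbb{R}^n$, $|x| < \delta$, and any fixed $R > R_\epsilon$ we have $|\phi(x+y) - \phi(y)| \le \epsilon / \lambda^n(B_R(0))^{1/p}$.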


Another application of the triangle inequality for • p and translation invariance yields ux−x + • − u 1B 0 R p ux − x + • − x − x + • 1BR 0 p + u − 1BR 0 p + x − x + • − 1BR 0 p 1/p

p n

2 u − p + x − x + y − y dy

BR 0 p /n BR 0 if x−x 0 define the function x = −n x/. The function u is called the Friedrichs mollifier of u ∈ p , 1 p < . (i) Show that x = exp1/x2 − 1 1B1 0 x has, for a suitable > 0, the properties mentioned above. Determine . (ii) Show that ∈ Cc n , supp = B 0, and 1 = 1. (iii) Show that supp u ⊂ supp u + supp = y ∀ x ∈ supp u x − y . (iv) Show that u is in C ∩ p and

u p u p

∀ > 0

(v) Show that Lp -lim→0 u = u. [Hint: split the region of integration as in the proof Theorem 14.8 and use the uniform boundedness shown in (iv).] 14.11. Define → by x = 1−cos x 102 x and let ux = 1, vx = x, and wx = −x t dt. Then

(i) u vx = 0 for all x ∈ ; (ii) u wx = x > 0 for all x ∈ 0 4; (iii) u v w ≡ 0 = u v w. Does this contradict the commutativity of the convolution which was used in Theorem 14.6?

15 Integrals of images and Jacobi’s transformation rule¹

The previous chapter dealt with image measures and, by their definition, with measures of pre-images of sets. Sometimes one needs to know the measure of the direct image of a set under T X → Y . If T −1 exists and is measurable, we can apply the results of Chapter 14 to S = T −1 and we are done. If, however, T −1 is not measurable, the direct image TA of a set A ∈ need not be -measurable; in particular, an expression of the type TA – here is any measure on X – may not be well-defined, let alone a measure. Let us consider this problem in a very particular setting, where X ⊂ n X ⊂ d

and are Lebesgue measures n resp. d

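We need some notation: if $\Phi : X \to \mathbb{R}^d$, we write $\Phi = (\Phi_1, \Phi_2, \ldots, \Phi_d)$ for its components and we set for vectors $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$ and matrices $A = (a_{jk})_{j=1,\ldots,n;\ k=1,\ldots,d}$

\[ |x|_\infty := \max_{1\le j\le n} |x_j|, \qquad |A|_\infty := \max_{1\le j\le n,\ 1\le k\le d} |a_{jk}|. \tag{15.1} \]

A set $F \subset \mathbb{R}^n$ [$G \subset \mathbb{R}^n$] is called an $F_\sigma$-set [$G_\delta$-set] if it is the countable union of closed sets [countable intersection of open sets], i.e. if

\[ F = \bigcup_{j\in\mathbb{N}} C_j \qquad \Big[\, G = \bigcap_{j\in\mathbb{N}} U_j \,\Big] \tag{15.2} \]

for closed sets $C_j$ [open sets $U_j$]. Obviously, both $F_\sigma$- and $G_\delta$-sets are Borel sets; but, in general, neither are $F_\sigma$-sets closed nor are $G_\delta$-sets open.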

1

The proofs in this chapter can be left out at first reading.


15.1 Theorem Let F ⊂ n be an F -set and F → d be an -Hölder continuous map, that is x − y L x − y

∀ x y ∈ F

(15.3)

with constant L and exponent ∈ 0 1. For every F -set E ⊂ n , F ∩ E is an F -set in d , hence Borel measurable. If d n, we have d F ∩ E Ld n E

(15.4)

Proof Since E F are F -sets, they have representations E = F = j∈ Cj with closed sets j Cj ⊂ n . Moreover, E ∩ Bk 0 = j ∩ Bk 0 = K E= k∈

kj∈

j∈ j

and

∈

where K ∈ is an enumeration of the family j ∩ Bk 0jk∈ of closed and bounded, hence compact, sets. Thus Cj ∩ K is a compact set, and since images of compact sets under continuous maps are compact, we see that Cj ∩ K is compact and, in particular, closed. So, (2.4) F ∩ E = Cj ∩ K j ∈

is an F -set. Assume now that d n. If n E = , (15.4) is trivial and we will consider only the case n E < . The proof of Carathéodory’s extension theorem 6.1 – in particular (6.1) – for = n and the semi-ring = n of n-dimensional half-open rectangles (cf. P6.4) shows that we can find for every > 0 a sequence Jj j∈ ⊂ n with n Jj and Jj n E + (15.5) E⊂ j∈

j∈

Without loss of generality we can assume that all Jj are squares, i.e. have sides of equal length sj < s < 1, otherwise we could subdivide each Jj into finitely many non-overlapping squares of this type.[] So, 4.6

F ∩ Jj d F ∩ Jj (15.6) d F ∩ E d j∈

j∈

which means that it is enough to check (15.4) for a square E = J of side-length s < 1 and centre c ∈ n . Because of (15.3), n d J = × ck − 21 s ck + 21 s ⊂ × k c − L2 s1/ k c + L2 s1/ k=1

k=1


and (notice that n J 1 and d/ n 1)

d/ n d F ∩ J Ls1/ d = Ld n J Ld n J From (15.5), (15.6) we conclude d F ∩ E

Ld n Jj Ld n E +

j∈

and the claim follows upon letting → 0. Theorem 15.1 can be improved if we use the completed Borel- -algebra cf. Problems 4.13, 6.2, 10.11 and 10.12. Recall that

∗ n ,

B∗ ∈ ∗ n ⇐⇒

B∗ = B ∪ N for some B ∈ n and a subset N of a Borel null set.

The advantage of ∗ n over n is that -Hölder continuous maps n → d map ∗ n -measurable sets into ∗ d -sets if d n; this is not true for n . To see this we need a few preparations. 15.2 Lemma Let B ∈ n be a Borel set. Then there exists an F -set F and a G -set G such that F ⊂B⊂G

and

n F = n B = n G

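Proof The proof consists of three stages:

Step 1: Construction of the set G. If $\lambda^n(B) = \infty$, we take $G = \mathbb{R}^n$. If $\lambda^n(B) < \infty$, we find as in the proof of Theorem 15.1 (or as in Carathéodory's extension theorem 6.1, (6.1), with $\mu = \lambda^n$ and $\mathcal{S} = \mathcal{J}^n$) for every $k \in \mathbb{N}$ a sequence of half-open squares $(J_j^k)_{j\in\mathbb{N}} \subset \mathcal{J}^n$ of side-length $s_j$ such that

\[ B \subset \bigcup_{j\in\mathbb{N}} J_j^k \qquad\text{and}\qquad \sum_{j\in\mathbb{N}} \lambda^n(J_j^k) \le \lambda^n(B) + \frac{1}{k}. \]

We can now enlarge $J_j^k$ by moving the lower left corner by $\epsilon_j := \big(s_j^n + 2^{-j}/k\big)^{1/n} - s_j$ units 'to the left' in each coordinate direction. The new open square $\tilde J_j^k$ has volume

\[ \lambda^n(\tilde J_j^k) = \lambda^n(J_j^k) + \frac{2^{-j}}{k} \]

[Figure: the square $J_j^k$ of side-length $s_j$ inside the enlarged square $\tilde J_j^k$, which extends it by $\epsilon_j$.]

and we see that the open sets $G^k := \bigcup_{j\in\mathbb{N}} \tilde J_j^k \supset B$ satisfy

\[ \lambda^n(G^k) \overset{4.6}{\le} \sum_{j\in\mathbb{N}} \lambda^n(\tilde J_j^k) = \sum_{j\in\mathbb{N}} \lambda^n(J_j^k) + \frac{1}{k} \le \lambda^n(B) + \frac{1}{k} + \frac{1}{k}. \]

Thus $G := \bigcap_{k\in\mathbb{N}} G^k$ is a $G_\delta$-set with $G \supset B$, and

\[ \lambda^n(B) \le \lambda^n(G) \overset{4.4}{=} \lim_{k\to\infty} \lambda^n(G^k) \le \lim_{k\to\infty}\Big(\lambda^n(B) + \frac{2}{k}\Big) = \lambda^n(B). \]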
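The choice of $\epsilon_j$ in Step 1 is plain arithmetic: a square of side $s_j + \epsilon_j$ has volume $s_j^n + 2^{-j}/k$, and the extra volumes sum to $1/k$ over $j \in \mathbb{N}$. A small sketch that checks both facts in floating point follows; the function names and the sample values of `s`, `j`, `k`, `n` are chosen purely for illustration, and the index set is assumed to start at $j = 1$.

```python
def enlargement(s, j, k, n):
    """epsilon_j = (s^n + 2^{-j}/k)^{1/n} - s, as in Step 1 of the proof."""
    return (s ** n + 2.0 ** (-j) / k) ** (1.0 / n) - s


def enlarged_volume(s, j, k, n):
    """Volume of the n-dimensional square with side s + epsilon_j."""
    return (s + enlargement(s, j, k, n)) ** n


# Hypothetical sample values.
s, j, k, n = 0.3, 4, 7, 3
print(enlarged_volume(s, j, k, n), s ** n + 2.0 ** (-j) / k)

# The added volumes 2^{-j}/k sum (up to rounding) to 1/k.
total_extra = sum(2.0 ** (-i) / k for i in range(1, 60))
print(total_extra, 1.0 / k)
```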

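Step 2: Construction of the set F if $\lambda^n(B) < \infty$. Denote by $\bar B$ the closure² of $B$. Since $\bar B \setminus B$ is a Borel set, we find as in step 1 open sets $U^k$ with

\[ \bar B \setminus B \subset U^k \qquad\text{and}\qquad \lambda^n(U^k) \le \lambda^n(\bar B \setminus B) + \frac{1}{k}. \tag{15.7} \]

Observe that

\[ B \subset (B \setminus U^k) \cup (U^k \cap B) \subset (\bar B \setminus U^k) \cup \big(U^k \setminus (\bar B \setminus B)\big) \]

so that by the subadditivity of measures

\[
\begin{aligned}
\lambda^n(B) &\le \lambda^n(\bar B \setminus U^k) + \lambda^n\big(U^k \setminus (\bar B \setminus B)\big) \\
&= \lambda^n(\bar B \setminus U^k) + \lambda^n(U^k) - \lambda^n(\bar B \setminus B) \\
&\overset{(15.7)}{\le} \lambda^n(\bar B \setminus U^k) + \frac{1}{k}.
\end{aligned}
\]

By construction, $C_k := \bar B \setminus U^k \subset \bar B \setminus (\bar B \setminus B) = B$ is a closed set and $F := \bigcup_{k\in\mathbb{N}} C_k \subset B$ is an $F_\sigma$-set satisfying

\[ \lambda^n(B) - \frac{1}{k} \le \lambda^n(C_k) \le \lambda^n\Big(\bigcup_{j\in\mathbb{N}} C_j\Big) = \lambda^n(F) \le \lambda^n(B). \]

The claim follows as $k \to \infty$.

2 i.e. the smallest closed set containing $B$, cf. Appendix B, Definition B.3(iii).

Step 3: Construction of the set F if $\lambda^n(B) = \infty$. Setting

\[ B_j := B \cap \big(B_j(0) \setminus B_{j-1}(0)\big), \qquad j \in \mathbb{N}, \]

we get a disjoint partitioning of $B = \bigcup_{j\in\mathbb{N}} B_j$ where each set $B_j$ is a Borel set with finite volume. Applying step 2 to each $B_j$, we find $F_\sigma$-sets $F_j \subset B_j$ with $\lambda^n(F_j) = \lambda^n(B_j)$, $j \in \mathbb{N}$. Since the $B_j$ are mutually disjoint, so are the $F_j$, and since $F := \bigcup_{j\in\mathbb{N}} F_j$ is again an $F_\sigma$-set (cf. Problem 15.1) we end up with $F \subset B$ and

\[ \lambda^n(F) = \sum_{j\in\mathbb{N}} \lambda^n(F_j) = \sum_{j\in\mathbb{N}} \lambda^n(B_j) = \lambda^n(B). \]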

The proof of the lemma is now complete. 15.3 Lemma Let n → d be an -Hölder continuous map with ∈ 0 1 and d n. If N ∗ is a subset of a Borel null set N ∈ n , then N ∗ is a subset of a Borel null set M ∈ d . Proof Since N ∗ ⊂ N ∈ n where n N = 0, we can repeat the argument of the proof of Theorem 15.1 to find for k ∈ a covering of N by half-open squares Jjk ∈ n such that N⊂

Jjk

and

n

j∈

1 Jjk n Jjk k j∈ j∈

Since n Jjk = n J¯jk , J¯jk is the closed square, we have also n

1 J¯jk n J¯jk k j∈ j∈

Applying T15.1 to the F -set F k = well as

¯k j∈ Jj

shows that

d F k Ld n F k Since

k∈ F

k

k∈ F

k ∈ n

as

Ld k

⊃ N ⊃ N ∗ , we conclude

d

Ld k→ F d F k −−−→ 0 k ∈

Lemma 15.3 is just a special case of the following theorem which has already been announced above. 15.4 Theorem Let F ⊂ n be an F -set, F → d be an -Hölder continuous map with exponent ∈ 0 1. If d n, then maps the completed Borel

-algebra F ∩ ∗ n into ∗ d , and the inequality (15.4) holds for all B ∈ ∗ n with the completed Lebesgue measures3 ¯n and ¯d . 3

See Problems 4.13, 6.2, 10.11, 10.12, 13.3 for the completion of measures and their properties.



Proof Pick B∗ ∈ ∗ n and write B∗ = B ∪ N ∗ where B ∈ n and N ∗ is a subset of a Borel null set N ∈ n . According to L15.2 we have B∗ = E ∪ M ∗ ∪ N ∗ = E ∪ N ∗∗ where E is an F -set, n E = n B, and M ∗ N ∗∗ = N ∗ ∪ M ∗ are subsets of Borel null sets. Thus B∗ = E ∪ N ∗∗ = E ∪ N ∗∗ and E is an F -set, see T15.1, and N ∗∗ is contained in a Borel null set ⊂ d , see L15.3, hence B∗ ∈ ∗ d . Finally, by T15.1,

¯ d F ∩ B∗ = ¯ d F ∩ E ∪ N ∗∗

¯ d F ∩ E ∪ N ∗∗

= d F ∩ E 15.1

Ld n E = Ld n B = Ld ¯ n B∗

Let us stress that both Hölder continuity of and the condition d n are crucial for Theorem 15.4; one can find counterexamples if we have only ∈ CF d or d < n.

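Jacobi's transformation formula

One of the most interesting situations arises if $\Phi = (\Phi_1, \ldots, \Phi_n) : \mathbb{R}^n_x \to \mathbb{R}^n_y$ (we write $\mathbb{R}^n_x$ if we want to indicate the generic variable in order to distinguish between the domain and range of $\Phi$) is a $C^1$-map with everywhere defined inverse $\Phi^{-1} : \mathbb{R}^n_y \to \mathbb{R}^n_x$ which is again a $C^1$-map. Such maps are called $C^1$-diffeomorphisms. As usual, we write $D\Phi(x) = \big(\tfrac{\partial}{\partial x_j}\Phi_k(x)\big)_{j,k=1,\ldots,n}$ for the Jacobian at the point $x \in \mathbb{R}^n_x$. By Taylor's theorem we find for all $x, x' \in K$ from a compact set $K \subset \mathbb{R}^n_x$ (with a suitable intermediate point $\xi$)

\[ |\Phi_k(x) - \Phi_k(x')| \le \sum_{j=1}^n \Big|\frac{\partial \Phi_k}{\partial x_j}(\xi)\Big| \cdot |x_j - x_j'| \le n \sup_{\xi\in K} |D\Phi(\xi)|_\infty \cdot |x - x'|_\infty \tag{15.8} \]

i.e. $\Phi$ is locally Lipschitz (1-Hölder) continuous with Lipschitz constant $L = L_K = n \sup_{\xi\in K} |D\Phi(\xi)|_\infty$.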


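15.5 Theorem (Jacobi's transformation theorem) Let $\Phi : \mathbb{R}^n_x \to \mathbb{R}^n_y$ be a $C^1$-diffeomorphism. Then

\[ \lambda^n(\Phi(B)) = \int_B |\det D\Phi(x)|\,\lambda^n(dx) \tag{15.9} \]

holds for all Borel sets $B \in \mathcal{B}(\mathbb{R}^n_x)$.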

The proof of Theorem 15.5 is based on two auxiliary results. 15.6 Lemma Let and be two measures on space X and 4 the measurable let be a semi-ring such that = . If and if there is a sequence Sj j∈ ⊂ with Sj ↑ X, then . Proof It is clear from the properties of and that = − → 0 is a pre-measure. By T6.1, has a unique extension to a measure ˜ on and + S S = + S =

∀S ∈

+ is the unique extension of the pre-measure + to a measure where on . But the measures ˜ + and satisfy S = ˜ S + S = S + S = + S

∀S ∈

and we conclude from the uniqueness of the extensions that = ˜ + on , i.e. A − A = ˜ A 0 for all A ∈ . Caution: Lemma 15.6 fails if is not a semi-ring; see Problem 15.4. 15.7 Lemma For every C 1 -diffeomorphism nx → ny we have ∀ J ∈ nx n J det D x n dx J

Proof Let J = a b, a b ∈ nx , and note that J¯ = a b is a compact set. Since D −1 is continuous, we find on the compact set J¯ L = sup D x−1 sup D −1 y x∈J

y∈ J¯

(15.10)

where we used the inverse function theorem.[] Since D is uniformly continuous on J¯, we find for a given > 0 some > 0 such that sup D x − D x (15.11) L xx ∈ab x−x

4

This is short for S S for all S ∈ .



Partition J into N disjoint half-open squares J1 JN ∈ nx of the same side-length < . Since D and det D are continuous functions[] , we can find for each = 1 2 N a point x ∈ J¯

such that det D x = inf det D x x∈J¯

Set T = D x ∈ n×n and observe that DT −1 x = T −1 D x = idn +T −1 D x − D x (idn is the identity matrix in n×n ). The estimates (15.10), (15.11) show that sup DT −1 x 1 + L

x∈J¯

= 1+ L

∀ 1 N

i.e. T −1 is Lipschitz (1-Hölder) continuous with constant 1 + , see (15.8). Therefore, the special transformation rule T6.10 for Lebesgue measure and T15.1 show

n J = n T T −1 J

= det T · n T −1 J det T 1 + n n J N Since J = · =1 J and det T det D x for all x ∈ J , we get n J

N

n J 1 + n

=1

N

det T n J

=1

1 + n = 1 + n

N =1 J

J

det D x n dx

det D x n dx

and the proof is finished by letting → 0. We can finally proceed to the proof of Theorem 15.5. Proof (of Theorem 15.5) Set = −1 . Since is continuous, = d = d is a measure on nx , compare T7.6 and D7.7. The determinant det D is also continuous, thus A = A det D x n dx defines a measure on nx , see L10.8. From Lemma 15.7 we know that J J < for all


rectangles J ∈ nx , and Lemma 15.6 shows that holds on the whole of nx , i.e. n X det D x n dx ∀ X ∈ nx (15.12) X

This proves ‘’ of (15.9). For the other direction our strategy is to apply Lemma 15.7 to the inverse function = −1 . If X = −1 Y , Y ∈ ny , (15.12) becomes 1Y y n dy = n Y 1 −1 Y x det D x n dx = 1Y x det D x n dx and with exactly the same arguments which we used to prove Theorem 14.1, this inequality is easily extended from indicator functions to all u ∈ + ny : uy n dy u x det D x n dx (15.13) ny

nx

Switching in (15.13) the rôles of nx ↔ ny , x ↔ y and considering the C 1 -diffeomorphism ny → nx (instead of ) and the measurable[] function ux = 1 A x det D x for some A ∈ nx yields 1 A x det D x n dx nx

= = =

ny

1 A det D −1 y det D −1 y n dy

ny

ny

ny

1 A y · detD −1 y · det D −1 y n dy

1 A y · det

D −1 y · D −1 y

idn =Didn =D −1 =D −1 ·D −1

1 A y n dy = n A

This proves that for all A ∈ nx 1A x det D x n dx = nx

n dy

nx

1 A x det D x n dx

n A and, together with the converse inequality (15.12), the theorem follows.



If X ⊂ nx Y ⊂ ny are open sets and X → Y is a C 1 -diffeomorphism, we still can apply Theorem 15.5 to A = −1 B, A ∈ X ∩ nx , B ∈ Y ∩ ny to get n Y = Y = X • = det D x n dx (15.14) • ∩X

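i.e. Theorem 14.1 yields the following important result.

15.8 Corollary (General transformation theorem) Let $X, Y \subset \mathbb{R}^n$ be open sets and $\Phi : X \to Y$ be a $C^1$-diffeomorphism. A function $u : Y \to \bar{\mathbb{R}}$ is integrable w.r.t. $\lambda^n$ if, and only if, the function $u\circ\Phi\cdot|\det D\Phi| : X \to \bar{\mathbb{R}}$ is integrable w.r.t. $\lambda^n$. In this case

\[ \int_Y u(y)\,\lambda^n(dy) = \int_X u(\Phi(x))\,|\det D\Phi(x)|\,\lambda^n(dx). \tag{15.15} \]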
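Formula (15.15) is easy to test numerically in one dimension, where $|\det D\Phi(x)| = |\Phi'(x)|$. The sketch below makes several assumptions that are not in the text: the diffeomorphism $\Phi(x) = x + x^3$ from $X = (0,1)$ onto $Y = (0,2)$, the integrand $u = \cos$, and a simple midpoint rule standing in for the Lebesgue integral (legitimate here because both integrands are continuous and bounded).

```python
import math


def midpoint(f, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of f over (a, b)."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def phi(x):
    """Hypothetical C^1-diffeomorphism of (0, 1) onto (0, 2)."""
    return x + x ** 3


def dphi(x):
    """Derivative of phi; it is >= 1, so the inverse is C^1 as well."""
    return 1.0 + 3.0 * x ** 2


u = math.cos

lhs = midpoint(u, 0.0, 2.0)                                   # integral over Y
rhs = midpoint(lambda x: u(phi(x)) * abs(dphi(x)), 0.0, 1.0)  # integral over X
print(lhs, rhs, math.sin(2.0))
```

Both values should agree with the exact value $\sin 2$ up to the discretisation error of the midpoint rule.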

For many applications we need a somewhat reinforced version of C15.8 since is often only almost everywhere a diffeomorphism. The following simple generalization takes care of that. Recall that ¯ n is the completed Lebesgue measure, cf. Problems 4.13, 6.2, 10.11, 10.12, 13.11. 15.9 Corollary Let X → ny be a C 1 -map on a measurable set X ∈ ∗ nx whose open interior is denoted by X . If X \ X is a ¯ n -null set5 and X is a C 1 -diffeomorphism onto X , then uy ¯ n dy = u x det D x ¯ n dx (15.16) X

X

holds for all ∗ -measurable positive functions u X → 0 . Moreover, ¯ is ¯ n u X → is ¯ n integrable if, and only if, u · det D X → integrable; in this case (15.16) remains valid. Proof The argument proving C15.8 remains literally valid for ¯ n , i.e. the difficulty of C15.9 is not the completion of the measure but the fact that is only almost everywhere a diffeomorphism. Since ¯ n X \ X = 0, we get X \ X ⊂ X \ X , cf. Chapter 2, which is again a ¯ n -null set by Lemma 15.3. In view of C10.10 we can alter 1 -functions on null sets, which means that the equality

u d¯ n = 1 X · u · det D d¯ n X

from C15.8 immediately implies (15.16). 5

i.e. a subset of a Borel null set.


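15.10 Remark Formulae (15.9) and (15.15) have the following interesting interpretation in connection with the Radon–Nikodým theorem 19.2 and Lebesgue's differentiation theorem for measures T19.20, in particular C19.21:

\[ \frac{d\,\Phi^{-1}(\lambda^n)}{d\lambda^n}(x) = |\det D\Phi(x)| = \lim_{r\to 0} \frac{\lambda^n(\Phi(B_r(x)))}{\lambda^n(B_r(x))}. \]

Spherical coordinates and the volume of the unit ball

Some of the most interesting applications of Corollaries 15.8 and 15.9 are coordinate changes.

15.11 Example (Planar polar coordinates) Consider the map

\[ P : (0,\infty)\times(0,2\pi) \to \mathbb{R}^2 \setminus \big([0,\infty)\times\{0\}\big), \qquad P(r,\theta) := (r\cos\theta,\ r\sin\theta) \]

which introduces polar coordinates $(r,\theta)$ in $\mathbb{R}^2$. It is not hard to see that $P$ is bijective and even a $C^1$-diffeomorphism. The determinant of the Jacobian is given by

\[ \det\frac{\partial P(r,\theta)}{\partial(r,\theta)} = \det\begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix} = r\cos^2\theta + r\sin^2\theta = r. \]

Since $[0,\infty)\times\{0\}$ is a $\lambda^2$-null set, we can apply Corollary 15.8 (or 15.9) and find for every $u : \mathbb{R}^2 \to \mathbb{R}$, $u \in \mathcal{L}^1(\mathbb{R}^2)$

\[
\begin{aligned}
\int_{\mathbb{R}^2} u(x,y)\,d\lambda^2(x,y) &= \int_{(0,\infty)\times(0,2\pi)} r\,u(r\cos\theta, r\sin\theta)\,d\lambda^2(r,\theta) \\
&= \int_{(0,\infty)} \int_{(0,2\pi)} r\,u(r\cos\theta, r\sin\theta)\,\lambda^1(d\theta)\,\lambda^1(dr)
\end{aligned}
\]

where we used Fubini's theorem 13.9 for the last equality. This shows, in particular, that

\[ u \in \mathcal{L}^1(\mathbb{R}^2) \iff (r,\theta) \mapsto r\,u(r\cos\theta, r\sin\theta) \in \mathcal{L}^1\big((0,\infty)\times(0,2\pi)\big). \]

A simple but quite interesting application of planar polar coordinates is the following formula which plays a central rôle in probability theory: this is where the norming factor $\frac{1}{\sqrt{2\pi}}$ for the Gaussian distribution comes from.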


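15.12 Example We have

\[ \int_{\mathbb{R}} e^{-x^2}\,d\lambda^1(x) = \sqrt{\pi}. \tag{15.17} \]

Proof: We use the following trick: by Tonelli's theorem 13.8

\[
\begin{aligned}
\Big(\int_{\mathbb{R}} e^{-x^2}\,d\lambda^1(x)\Big)^2 &= \int_{\mathbb{R}}\int_{\mathbb{R}} e^{-x^2} e^{-y^2}\,d\lambda^1(x)\,d\lambda^1(y) \\
&= \int_{\mathbb{R}^2} e^{-(x^2+y^2)}\,d\lambda^2(x,y) \\
&= \int_{(0,2\pi)}\int_{(0,\infty)} r\,e^{-r^2}\,d\lambda^1(r)\,d\lambda^1(\theta).
\end{aligned}
\]

Since $r\,e^{-r^2}$ is positive and improperly Riemann integrable [✓], we know that Lebesgue and Riemann integrals coincide (cf. 11.8, 11.18), and therefore

\[ \Big(\int_{\mathbb{R}} e^{-x^2}\,d\lambda^1(x)\Big)^2 = \lambda^1\big((0,2\pi)\big) \int_0^\infty r\,e^{-r^2}\,dr = 2\pi\Big[-\tfrac12 e^{-r^2}\Big]_0^\infty = \pi. \]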
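The computation in Example 15.12 can be sanity-checked numerically. The following is only a sketch under stated assumptions: a midpoint rule replaces the integrals, and the unbounded domains are truncated at $\pm 10$ respectively $10$, where the integrands are already of order $e^{-100}$.

```python
import math


def midpoint(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over (a, b)."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# One-dimensional Gaussian integral, truncated to (-10, 10).
gauss_1d = midpoint(lambda x: math.exp(-x * x), -10.0, 10.0)

# Radial integral from the polar-coordinate computation, truncated to (0, 10).
radial = midpoint(lambda r: r * math.exp(-r * r), 0.0, 10.0)
polar = 2.0 * math.pi * radial   # factor lambda^1((0, 2*pi)) from the angle

print(gauss_1d, math.sqrt(math.pi))   # both close to sqrt(pi)
print(gauss_1d ** 2, polar, math.pi)  # all three close to pi
```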

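Polar coordinates also exist in higher dimensions but, unfortunately, the formulae become quite messy. The idea here is that we parametrize $\mathbb{R}^n$ by the radius $r \in [0,\infty)$, and $n-1$ angles $\theta \in [0,2\pi)$ and $\omega \in (-\pi/2,\pi/2)^{n-2}$, so that $x = P(r,\theta,\omega)$. The Jacobian is now of the form $r^{n-1} J(\theta,\omega)$ and, if we denote by $v = u\circ P$ the function $u$ expressed in polar coordinates, the transformation formula gives

\[ \int_{\mathbb{R}^n} u\,d\lambda^n = \int_{(0,\infty)\times(0,2\pi)\times(-\pi/2,\pi/2)^{n-2}} r^{n-1}\,v(r,\theta,\omega)\,|\det J(\theta,\omega)|\;d\lambda^1(r)\,d\lambda^1(\theta)\,d\lambda^{n-2}(\omega). \]

We will not give further details but settle for the slightly simpler case of spherical coordinates which will lead to a similar formula. Let $S^{n-1} := \{x \in \mathbb{R}^n : |x|_2 = 1\}$ be the unit sphere of $\mathbb{R}^n$ ($|x|_2^2 = x_1^2 + \cdots + x_n^2$ is the Euclidean norm) and set

\[ \Phi : \mathbb{R}^n \setminus \{0\} \to (0,\infty)\times S^{n-1}, \qquad x \mapsto (|x|_2,\ s_x) \]

where $s_x := x/|x|_2 \in S^{n-1}$ is the directional unit vector for $x$. Obviously, $\Phi$ is bijective, differentiable and has a differentiable inverse $\Phi^{-1}(r,s) = r\cdot s$.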


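15.13 Theorem On $(S^{n-1}, \mathcal{B}(S^{n-1}))$, $\mathcal{B}(S^{n-1}) = S^{n-1}\cap\mathcal{B}(\mathbb{R}^n)$, there exists a measure $\sigma^{n-1}$ which is invariant under rotations and satisfies

\[ \int_{\mathbb{R}^n} u\,d\lambda^n = \iint_{(0,\infty)\times S^{n-1}} r^{n-1}\,u(rs)\,\lambda^1(dr)\,\sigma^{n-1}(ds) \tag{15.18} \]

for all $u \in \mathcal{L}^1(\lambda^n)$. In other words, $\Phi(\lambda^n) = \mu\times\sigma^{n-1}$ where $\mu(dr) = r^{n-1}\,1_{(0,\infty)}(r)\,\lambda^1(dr)$; in particular

\[ u \in \mathcal{L}^1(\mathbb{R}^n, \lambda^n) \iff r^{n-1}u(rs) \in \mathcal{L}^1\big((0,\infty)\times S^{n-1},\ \lambda^1\times\sigma^{n-1}\big). \]

Proof We define $\sigma^{n-1}$ by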

n−1 A = n n −1 A ∩ B1 0 ∀ A ∈ S n−1 which is an image measure, hence a measure, cf. T7.6. Since −1 and n are invariant w.r.t. rotations around the origin, see T7.9, it is obvious that n−1 inherits this property, too. Both and −1 are continuous, hence measurable. Therefore,

−1 ⊗ S n−1 ⊂ n

n ⊂ ⊗ S n−1

and

which shows that n = −1 ⊗ S n−1 . To see (15.18), fix A ∈ S n−1 and consider first the set B = x ∈ n x ∈ a b x ∈ A = −1 A ∩ x a x < b, which is clearly a Borel set of n . Thus

n B = n −1 A ∩ x a x < b

= n −1 A ∩ Bb 0 − n −1 A ∩ Ba 0

= bn n −1 A ∩ B1 0 − an n −1 A ∩ B1 0

= bn − an n −1 A ∩ B1 0 where we used that n a · B = an n B, cf. T7.10 or Problems 5.8, 7.7, and that −1 is invariant under dilations. This shows n B = n1 bn − an n−1 A =

ab

r n−1 n−1 A 1 dr

= × n−1 a b × A



Since the family a b × A a < b A ∈ S n−1 generates ⊗ S n−1 , see Lemma 13.3, and satisfies the conditions of the uniqueness theorem 5.7, the above relation extends to all sets B ∈ ⊗ S n−1 . Since n = −1 ⊗ S n−1 , we have B = B for some B ∈ n , so that n B = n −1 B = n −1 B = × n−1 B All other assertions follow now from Theorem 14.1 on image integrals and Fubini’s theorem 13.9. Let us note the particularly interesting case where ux = fx is rotationally invariant. 15.14 Corollary If ux = fx is a rotationally invariant function, then u ∈ 1 n n if, and only if, r → r n−1 fr ∈ 1 0 1 . In this case n

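\[ \int_{\mathbb{R}^n} f(|x|_2)\,\lambda^n(dx) = n\,\omega_n \int_{(0,\infty)} r^{n-1} f(r)\,\lambda^1(dr) \]

where $\omega_n = \lambda^n(B_1(0))$ denotes the volume of the unit ball in $\mathbb{R}^n$. In particular, we get for the functions $f_\alpha(x) := |x|_2^\alpha$, $\alpha \in \mathbb{R}$,

\[ f_\alpha \in \mathcal{L}^1\big(B_1(0)\setminus\{0\}\big) \iff \alpha > -n, \qquad f_\alpha \in \mathcal{L}^1\big(\mathbb{R}^n \setminus B_1(0)\big) \iff \alpha < -n. \]

Proof The integral formula follows from (15.18) where the constant $n\,\omega_n = \sigma^{n-1}(S^{n-1})$.⁶ That $\omega_n$ must be the volume of $B_1(0)$ is immediately clear if we choose $u(x) = 1_{B_1(0)}(x)$. The integrability of $f_\alpha$ follows now from Example 11.12.

6 This is, actually, the surface area of the unit ball $B_1(0)$ in $\mathbb{R}^n$.

Let us finally determine $\omega_n$, the volume of the unit ball in $\mathbb{R}^n$. For this we use the same method which we employed in Example 15.12:

\[
\begin{aligned}
(\sqrt{\pi})^n &\overset{(15.17)}{=} \Big(\int_{\mathbb{R}} e^{-t^2}\,\lambda^1(dt)\Big)^n = \int\cdots\int e^{-(x_1^2+\cdots+x_n^2)}\,\lambda^1(dx_1)\cdots\lambda^1(dx_n) \\
&\overset{15.14}{=} n\,\omega_n \int_{(0,\infty)} r^{n-1} e^{-r^2}\,\lambda^1(dr).
\end{aligned}
\]

Since $r^{n-1}e^{-r^2}$ is positive and improperly Riemann integrable [✓], Riemann and Lebesgue integrals coincide (use 11.8, 11.18), and we find after a change of variables according to $s = r^2$

\[ (\sqrt{\pi})^n = n\,\omega_n \int_0^\infty r^{n-1} e^{-r^2}\,dr = \frac{n}{2}\,\omega_n \int_0^\infty s^{n/2-1} e^{-s}\,ds = \omega_n\,\frac{n}{2}\,\Gamma\Big(\frac{n}{2}\Big), \]

see Example 11.14. Since $\frac{n}{2}\,\Gamma\big(\frac{n}{2}\big) = \Gamma\big(\frac{n}{2}+1\big)$, we have finally established

15.15 Corollary

\[ \omega_n = \lambda^n(B_1(0)) = \frac{\pi^{n/2}}{\Gamma\big(\frac{n}{2}+1\big)}. \]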
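The closed formula of Corollary 15.15 can be evaluated directly and compared with the familiar low-dimensional values. This is an illustrative sketch; the function names are chosen here and only the standard library function `math.gamma` is used for Euler's Gamma function.

```python
import math


def unit_ball_volume(n):
    """omega_n = pi^(n/2) / Gamma(n/2 + 1), the volume of the unit ball in R^n."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def unit_sphere_area(n):
    """n * omega_n, the total mass of the surface measure on S^(n-1)."""
    return n * unit_ball_volume(n)


for n in range(1, 7):
    print(n, unit_ball_volume(n), unit_sphere_area(n))
```

For $n = 1, 2, 3$ this reproduces the length $2$ of the interval $(-1,1)$, the area $\pi$ of the unit disc and the volume $\frac43\pi$ of the unit ball in space.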

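Continuous functions are dense in $\mathcal{L}^p(\lambda^n)$

We will now establish a result that is closely related to Lemma 15.2: we show that the continuous functions with compact support $C_c(\mathbb{R}^n)$ are dense in the space of Lebesgue p-integrable functions $\mathcal{L}^p(\lambda^n)$, $1 \le p < \infty$, that is, if $u \in \mathcal{L}^p(\lambda^n)$, then

\[ \forall\,\epsilon > 0 \quad \exists\,\phi = \phi_{\epsilon,u} \in C_c(\mathbb{R}^n) : \quad \|u - \phi\|_p \le \epsilon. \]

Since every compact set $K \subset \mathbb{R}^n$ is bounded, we find for some sufficiently large $R > 0$ that $K \subset [-R,R]^n$, hence $\lambda^n(K) \le (2R)^n$. Thus for $\phi \in C_c(\mathbb{R}^n)$ with support $\operatorname{supp}\phi = \overline{\{\phi \ne 0\}} \subset K$,

\[ \|\phi\|_p^p = \int |\phi|^p\,d\lambda^n = \int_K |\phi|^p\,d\lambda^n \le \sup_{x\in K} |\phi(x)|^p\,(2R)^n < \infty \]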
so that Cc n ⊂ p n (measurability is clear because of continuity). Our strategy will be to approximate first indicator functions of Borel sets and simple functions. For this we need the following 15.16 Lemma (Urysohn) Let K ⊂ n be a compact set and U ⊃ K be an open set. Then there exists a continuous function = KU ∈ Cn such that 1 K 1U . Proof Let dx A = inf y∈A x − y be the distance of the point x ∈ n from the set A ⊂ n . For x x ∈ n we have

dx A = inf x − y inf x − x + x − y = x − x + dx A y∈A

y∈A


157

which shows, due to the symmetry in x and x , that dx A − dx A x − x , or, in other words, that x → dx A is continuous. It is now easy to see that the function dx U c x = dx K + dx U c is continuous and satisfies 1K 1U . 15.17 Theorem Cc n is a dense subset of p n , 1 p < . Proof We have already verified that Cc n ⊂ p n . Step 1: Cn ∩ p n is dense in n ∩ p n . Let B ∈ n such that 1B ∈ p n (i.e. n B < ). In steps 1,2 of the proof of Lemma 15.2 we constructed for such sets open sets U and closed sets C such that C ⊂ B ⊂ U

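and

\[ \big(\lambda^n(U) - \lambda^n(B)\big) + \big(\lambda^n(B) - \lambda^n(C)\big) \le \epsilon^p. \]

By the continuity of measures T4.4(iii) we find for the closed and bounded, hence compact, sets $\bar B_j(0)\cap C \uparrow C$ that $\lim_{j\to\infty} \lambda^n(\bar B_j(0)\cap C) = \lambda^n(C)$. This means that we can replace $C$ by a compact set $K \subset C$ and still have

\[ K \subset B \subset U \qquad\text{and}\qquad \lambda^n(U) - \lambda^n(K) \le 2\epsilon^p. \]

Using Lemma 15.16 we find a continuous function $\phi = \phi_{K,U} \in C(\mathbb{R}^n)$ with $1_K \le \phi \le 1_U$. As $1_K \le 1_B \le 1_U$ we have, in particular,

\[ \|1_B - \phi\|_p \le \|1_B - 1_K\|_p + \|1_K - \phi\|_p \le 2\,\|1_U - 1_K\|_p \le 4\epsilon \]

which also shows that $\phi \in \mathcal{L}^p(\lambda^n)$. Since any $f \in \mathcal{E}(\mathcal{B}(\mathbb{R}^n))\cap\mathcal{L}^p(\lambda^n)$ has a standard representation of the form $f = \sum_{j=0}^M y_j 1_{B_j}$ where $y_0 = 0$ and $B_1, \ldots, B_M$ are Borel sets of finite volume, it is clear that $C(\mathbb{R}^n)\cap\mathcal{L}^p(\lambda^n)$ is dense in the set of all pth power integrable simple functions.

Step 2: $C(\mathbb{R}^n)\cap\mathcal{L}^p(\lambda^n)$ is dense in $\mathcal{L}^p(\lambda^n)$. Fix $\epsilon > 0$. Since $\mathcal{E}(\mathcal{B}(\mathbb{R}^n))\cap\mathcal{L}^p(\lambda^n)$ is dense in $\mathcal{L}^p(\lambda^n)$, cf. C12.11, there exists some $f \in \mathcal{E}(\mathcal{B}(\mathbb{R}^n))\cap\mathcal{L}^p(\lambda^n)$ such that $\|f - u\|_p \le \epsilon$. Using step 1 we find some $\phi \in C(\mathbb{R}^n)\cap\mathcal{L}^p(\lambda^n)$ with $\|\phi - f\|_p \le \epsilon$ and the claim follows from Minkowski's inequality for $\|\bullet\|_p$:

\[ \|\phi - u\|_p \le \|\phi - f\|_p + \|f - u\|_p \le 2\epsilon. \]

Step 3: $C_c(\mathbb{R}^n)$ is dense in $\mathcal{L}^p(\lambda^n)$. Let $\phi \in C(\mathbb{R}^n)$ be the function constructed in step 2. Using Lemma 15.16 we obtain a sequence of functions $\chi_j$ such that $1_{\bar B_j(0)} \le \chi_j \le 1_{B_{j+1}(0)}$. Obviously, $\chi_j\phi \to \phi$ as $j \to \infty$, $|\chi_j\phi| \le |\phi|$ and $\chi_j\phi \in C_c(\mathbb{R}^n)$. Lebesgue's dominated convergence theorem 11.2 (or 12.9) therefore shows that

\[ \lim_{j\to\infty} \|u - \chi_j\phi\|_p = \|u - \phi\|_p \le 2\epsilon \]

and the theorem is proved.

Regular measures

The seemingly innocuous question whether the continuous functions are a dense subset of $\mathcal{L}^p$ is, even for Lebesgue measure in $\mathbb{R}^n$, quite hard to answer, as we have seen in Theorem 15.17. In general measure spaces, such results require a connection between measure and topology that reaches further than just considering the Borel (= topological) $\sigma$-algebra on a topological space $(X,\mathcal{O})$. This connection is made in the following

15.18 Definition Let $(X,\mathcal{O})$ be a topological space, denote by $\mathcal{K}$ the compact subsets of $X$ and let $\mu$ be a measure on $(X,\mathcal{B}(X))$, $\mathcal{B}(X) = \sigma(\mathcal{O})$. The measure $\mu$ is called outer regular if

\[ \mu(B) = \inf\{\mu(U) : U \in \mathcal{O},\ U \supset B\} \qquad \forall\, B \in \mathcal{B}(X) \]

and (compact) inner regular if

\[ \mu(B) = \sup\{\mu(K) : K \in \mathcal{K},\ K \subset B\} \qquad \forall\, B \in \mathcal{B}(X). \]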

For Lebesgue measure n on n n we have proved outer and inner regularity in Lemma 15.2, see also step 1 in the proof of Theorem 15.17 and Problem 15.2. Let us note, without proof, the following characterization of outer regular measures. 15.19 Theorem Let X be a complete separable metric space7 and denote the open sets by and the compact sets by . Every measure on X X which is locally finite, i.e. every x ∈ X has an open neighbourhood U = Ux of finite measure U < , is both outer regular and inner regular, i.e. B = infU U ∈ U ⊃ B = supK K ∈ K ⊂ B 7

cf. Appendix B.



A proof can be found in Bauer [6, §26]. Note the analogy to Lemma 15.2 and the proof of Theorem 15.17 where we (essentially) verified Theorem 15.19 for Lebesgue measure. Also note that the measure in Theorem 15.19 is -finite: since X is separable, there is a countable dense subset D ⊂ X, and the collection = Br d r ∈ + d ∈ D Br d ⊂ Ud Ud as in T15.19 is a countable family of open balls with finite -measure. Moreover, since every U ∈ can be written in the form8 U= Br d

Br d⊂U

N

we find that X = N =1 j=1 Brj dj with Brj dj < . Almost the same argument that was used in the proof of Theorem 15.17 is valid in the abstract setting. 15.20 Theorem Let X be a topological space and be an outer regular measure on X X. Then the set Cfin X = u X → u is continuous u = 0 < is dense in Lp X , 1 p < . Proof Let A ∈ be a set with A < . Since is outer regular, we find for every > 0 some U ∈ such that A⊂U

and U − p A U

Literally as in step 2 of the proof of Lemma 15.2 we can find some closed set F with F ⊂A

and

F A F + p

and, consequently, U − F 2p . The rest of the proof is now as in T15.17. Problems 15.1. Let F F1 F2 F3 be F -sets in n . Show that (i) F1 ∩ F2 ∩ ∩ FN is for every N ∈ an F -set; (ii) Fj is an F -set; j∈ 8

This is similar to (3.2) in the proof of T3.8: the inclusion ‘⊂’ is obvious, for ‘⊃’ fix x ∈ U . Then there exists some r ∈ + with Br x ⊂ U . Since D is dense, x ∈ Br/2 d for some d ∈ D with d x < r/4, so that x ∈ Br/2 d ⊂ U .


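(iii) $F^c$ and $\bigcap_{j\in\mathbb{N}} F_j^c$ are $G_\delta$-sets; (iv) all closed sets are $F_\sigma$-sets.

15.2. Prove the following corollary to Lemma 15.2: Lebesgue measure $\lambda^n$ on $\mathbb{R}^n$ is outer regular, i.e.

\[ \lambda^n(B) = \inf\{\lambda^n(U) : U \supset B,\ U \text{ open}\} \qquad \forall\, B \in \mathcal{B}(\mathbb{R}^n) \]

and inner regular, i.e.

\[ \lambda^n(B) = \sup\{\lambda^n(F) : F \subset B,\ F \text{ closed}\} = \sup\{\lambda^n(K) : K \subset B,\ K \text{ compact}\} \qquad \forall\, B \in \mathcal{B}(\mathbb{R}^n). \]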

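15.3. Completion (6). Combine Problems 15.2 and 10.12 to show that the completion $\bar\lambda^n$ of $n$-dimensional Lebesgue measure is again inner and outer regular.

15.4. Consider the Borel $\sigma$-algebra $\mathcal{B}[0,\infty)$ and write $\lambda = \lambda^1|_{[0,\infty)}$ for Lebesgue measure on the half-line $[0,\infty)$. (i) Show that $\mathcal{G} = \{[a,\infty) : a \ge 0\}$ generates $\mathcal{B}[0,\infty)$. (ii) Show that $\mu(B) = \int_B 1_{[2,4]}\,\lambda(dx)$ and $\rho(B) = \mu(5\cdot B)$, $B \in \mathcal{B}[0,\infty)$, are measures on $\mathcal{B}[0,\infty)$ such that $\rho|_{\mathcal{G}} \le \mu|_{\mathcal{G}}$ but not $\rho \le \mu$ in general. Why does this not contradict Lemma 15.6?

15.5. Use Jacobi's transformation formula to recover Theorem 5.8(i), Problem 5.8 and Theorem 7.10. Show, in particular, that for all integrable functions $u : \mathbb{R}^n \to [0,\infty)$
$$\int u(x+y)\,\lambda^n(dx) = \int u(x)\,\lambda^n(dx) \qquad \forall y \in \mathbb{R}^n,$$
$$\int u(tx)\,\lambda^n(dx) = \frac{1}{t^n}\int u(x)\,\lambda^n(dx) \qquad \forall t > 0,$$
$$\int u(Ax)\,\lambda^n(dx) = \frac{1}{|\det A|}\int u(x)\,\lambda^n(dx) \qquad \forall A \in GL(n,\mathbb{R}).$$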

In particular, the l.h.s. of the above equalities exists and is finite if, and only if, the r.h.s. exists and is finite. Why can’t we use 15.5 and 15.8 to prove these formulae? 15.6. Arc-length. Let f → be a twice continuously differentiable function and denote by f = t ft t ∈ its graph. Define a function → 2 by x = x fx. Then (i) → f is a C 1 -diffeomorphism and det D x = 1 + f x2 . (ii) = det D 1 is a measure on f . (iii) f ux y d x y = ut ft 1 + f t2 d1 t with the understanding that whenever one side of the equality makes sense (measurability!) and is finite, so does the other.



The measure is called canonical surface measure on f . This name is justified by the following compatibility property w.r.t. 2 : Let nx be a unit normal vector ˜ × → 2 by x ˜ to f at the point x fx and define a map r = x + r nx. Then ˜ r = 1+f x2 −r f x. (iv) nx = −f x 1/ 1 + f x2 and det D x Conclude that for every compact interval c d there exists some > 0 such ˜ cd×− is a C 1 -diffeomorphism. that (v) Let C ⊂ f cd and r < with as in (iv). Make a sketch of the set

˜ −1 C × −r r and show that it is Borel measurable. Cr = (vi) Use dominated convergence to show that for every x ∈ c d 1 det D x ˜ ˜ 0 r 1 dr = det D x lim r↓0 2r −rr (vii) Use the general transformation theorem 15.8, Tonelli’s theorem 13.8, (vi) and dominated convergence to show that det D x 1 dx lim 2 Cr = −1 C

r↓0

(viii) Conclude that

1 + f t2 dt is the arc-length of the graph of f .

15.7. Let d → M ⊂ n , d n, be a C 1 -diffeomorphism.

(i) Show that M = det D d is a measure on M. Find a formula for u dM . M (ii) Show that for a dilation r n → n , x → r x, r > 0, we have ur r n dM = u dM M

r M

(iii) Let M = x = 1 = S be the unit sphere in n , so that d = n − 1. Show that for every integrable u ∈ 1 n and = M uxn dx = ux dx 1 dr n−1

0 x=r

=

ur x dx 1 dr

0 x=1

Remark. With somewhat more effort it is possible to show the analogue of the approximation formula in Problem 15.6(vii) for $M$; all that changes are technical details, the idea of the proof is the same, cf. Stroock [50, pp. 94–101] for a nice presentation.

15.8. In Example 11.14 we introduced Euler's Gamma function:
$$\Gamma(t) = \int_{(0,\infty)} x^{t-1} e^{-x}\,\lambda^1(dx).$$
Show that $\Gamma\big(\tfrac12\big) = \sqrt{\pi}$.

15.9. 3-d polar coordinates. Define $\Phi : [0,\infty) \times [0,2\pi) \times [-\pi/2, \pi/2) \to \mathbb{R}^3$ by
$$\Phi(r,\theta,\omega) = (r\cos\theta\cos\omega,\ r\sin\theta\cos\omega,\ r\sin\omega).$$
Show that $|\det D\Phi(r,\theta,\omega)| = r^2\cos\omega$ and find the integral formula for the coordinate change from Cartesian to polar coordinates $(x,y,z) \rightsquigarrow (r,\theta,\omega)$.

15.10. Compute for $m, n \in \mathbb{N}$ the integral $\int_{B_1(0)} x^m y^n\,d\lambda^2(x,y)$.

16 Uniform integrability and Vitali’s convergence theorem

Lebesgue's dominated convergence theorem 11.2 gives sufficient conditions which allow us to interchange limits and integrals. A crucial ingredient is the assumption that $|u_j| \le w$ a.e. for all $j \in \mathbb{N}$ and some $w \in \mathcal{L}^1_+$. This condition is not necessary, but a slightly weaker one is indeed necessary and sufficient in order to swap limits and integrals. The key idea is to control the size of the sets where the $|u_j|$ exceed a given reference function. This is the rationale behind the next definition.

16.1 Definition Let $(X,\mathcal{A},\mu)$ be a measure space and $\mathcal{F} \subset \mathcal{M}(\mathcal{A})$ be a family of measurable functions. We call $\mathcal{F}$ uniformly integrable (also: equi-integrable) if
$$\forall \epsilon > 0\ \ \exists w_\epsilon \in \mathcal{L}^1_+(\mathcal{A}) : \quad \sup_{u\in\mathcal{F}} \int_{\{|u| > w_\epsilon\}} |u|\,d\mu < \epsilon. \tag{16.1}$$

Note that there are other (but for $\mu(X) < \infty$ usually equivalent) definitions of uniform integrability, see Theorem 16.8 below for a discussion. We follow the universal formulation due to G. A. Hunt [21, p. 33].
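The tail quantity in (16.1) can be made concrete with a small numerical sketch. The setting is an assumption made for illustration and is not taken from the text: Lebesgue measure on $[0,1]$, constant reference functions $w = R$ (integrable because the measure is finite), and the two spike families $g_n = n\,1_{(0,1/n)}$ and $h_n = \sqrt{n}\,1_{(0,1/n)}$, truncated at a hypothetical cut-off `n_max`. For a spike the tail integral is available in closed form, so no quadrature is needed.

```python
import math

def tail_integral(n, height, R):
    """Integral of u over {u > R} for the spike u = height * 1_(0, 1/n) on [0, 1]."""
    # The spike exceeds R either everywhere on (0, 1/n) or nowhere.
    return height / n if height > R else 0.0

def sup_tail(height_of, R, n_max=10_000):
    """Largest tail integral among the family members n = 1, ..., n_max."""
    return max(tail_integral(n, height_of(n), R) for n in range(1, n_max + 1))

def g_height(n):
    """Height of g_n = n * 1_(0, 1/n); every g_n has integral 1."""
    return float(n)

def h_height(n):
    """Height of h_n = sqrt(n) * 1_(0, 1/n); the integral is n**-0.5."""
    return math.sqrt(n)

for R in (2, 10, 50):
    # g: the tail stays at 1 however large R is; h: the tail is below 1/R.
    print(R, sup_tail(g_height, R), sup_tail(h_height, R))
```

Within this truncated model the supremum for the $g_n$ does not shrink as $R$ grows, while the supremum for the $h_n$ stays below $1/R$; this is the behaviour one would expect from a family that fails (16.1) and one that satisfies it, respectively.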

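The other key assumption in Theorem 11.2 was that $u_j(x) \xrightarrow{j\to\infty} u(x)$ for (almost) all $x \in X$; we can weaken this assumption, too.

16.2 Definition Let $(X,\mathcal{A},\mu)$ be a measure space. A sequence of $\mathcal{A}$-measurable numerical functions $u_j : X \to \bar{\mathbb{R}}$ converges in measure¹ if
$$\forall \epsilon > 0\ \ \forall A \in \mathcal{A},\ \mu(A) < \infty : \quad \lim_{j\to\infty} \mu\big(\{|u_j - u| > \epsilon\} \cap A\big) = 0 \tag{16.2}$$
holds for some $u \in \mathcal{M}(\mathcal{A})$. We write $\mu\text{-}\lim_{j\to\infty} u_j = u$ or $u_j \xrightarrow{\mu} u$.

¹ If $\mu$ is a probability measure one usually speaks of convergence in probability.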

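16.3 Example Convergence in measure is strictly weaker than pointwise convergence. To see this, take $(X,\mathcal{A},\mu) = ([0,1], \mathcal{B}[0,1], \lambda^1|_{[0,1]})$ and set
$$u_n(x) = 1_{[j2^{-k},\,(j+1)2^{-k}]}(x), \qquad n = j + 2^k,\quad 0 \le j < 2^k.$$
This is a sequence of rectangular functions of width $2^{-k}$ which move in $2^k$ steps through $[0,1]$, jump back to $x = 0$, halve their width and start moving again. Obviously,
$$\lambda^1(|u_n| > \epsilon) = 2^{-k} \xrightarrow{\ n = n(k)\to\infty\ } 0 \qquad \forall \epsilon \in (0,1),$$
so that $u_n \xrightarrow{\lambda^1} 0$ in measure, but the pointwise limit $\lim_{n\to\infty} u_n(x)$ does not exist anywhere. [✓]

16.4 Lemma Let $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p(\mathcal{A})$, $p \in [1,\infty)$, and $(w_k)_{k\in\mathbb{N}} \subset \mathcal{M}(\mathcal{A})$. Then

(i) $\lim_{j\to\infty} \|u_j - u\|_p = 0$ implies $u_j \xrightarrow{\mu} u$;

(ii) $\lim_{k\to\infty} w_k(x) = w(x)$ a.e. implies $w_k \xrightarrow{\mu} w$.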

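Proof (i) follows immediately from the Markov inequality P10.12,
$$\mu\big(\{|u_j - u| > \epsilon\} \cap A\big) \le \mu\big(|u_j - u| > \epsilon\big) = \mu\big(|u_j - u|^p > \epsilon^p\big) \le \frac{1}{\epsilon^p}\,\|u_j - u\|_p^p.$$

(ii) Observe that for all $\epsilon > 0$
$$\{|w_k - w| > \epsilon\} \subset \{\epsilon \wedge |w_k - w| \ge \epsilon\}.$$
An application of the Markov inequality P10.12 yields
$$\mu\big(\{|w_k - w| > \epsilon\} \cap A\big) \le \mu\big(\{\epsilon \wedge |w_k - w| \ge \epsilon\} \cap A\big) \le \frac{1}{\epsilon}\int_A \epsilon \wedge |w_k - w|\,d\mu = \frac{1}{\epsilon}\int \big(\epsilon \wedge |w_k - w|\big)\,1_A\,d\mu.$$
If $\mu(A) < \infty$, the function $\epsilon\,1_A \in \mathcal{L}^1_+$ is integrable, dominates the integrand $(\epsilon \wedge |w_k - w|)\,1_A$, and Lebesgue's dominated convergence theorem 11.2 implies that $\lim_{k\to\infty} \int_A \epsilon \wedge |w_k - w|\,d\mu = 0$.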
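The moving-block sequence of Example 16.3 is easy to inspect by machine. The following sketch uses exact rational arithmetic; the function names are chosen here for illustration, and the blocks are taken as closed dyadic intervals, which is one possible reading of the example. It shows the two effects side by side: the measure of $\{u_n > \epsilon\}$ shrinks like $2^{-k}$, while at a fixed point such as $x = 1/3$ the values $0$ and $1$ both recur in every sweep.

```python
from fractions import Fraction

def block(n):
    """Return (k, j) with n = j + 2**k and 0 <= j < 2**k, for n >= 1."""
    k = n.bit_length() - 1
    return k, n - 2**k

def u(n, x):
    """Indicator of the dyadic interval [j 2^-k, (j+1) 2^-k] evaluated at x."""
    k, j = block(n)
    return 1 if Fraction(j, 2**k) <= x <= Fraction(j + 1, 2**k) else 0

def measure_exceeds(n):
    """Lebesgue measure of {u_n > eps} for any 0 < eps < 1: the interval length."""
    k, _ = block(n)
    return Fraction(1, 2**k)

x = Fraction(1, 3)                       # not a dyadic rational
for k in range(1, 6):
    sweep = [u(n, x) for n in range(2**k, 2**(k + 1))]
    # One block of each sweep covers x, all the others miss it.
    print(k, measure_exceeds(2**k), sweep.count(1), sweep.count(0))
```

Since every sweep contains exactly one index with $u_n(x) = 1$ and at least one with $u_n(x) = 0$, the sequence $u_n(x)$ cannot converge, although the measures printed in the second column tend to zero.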

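16.5 Lemma Assume that $(X,\mathcal{A},\mu)$ is $\sigma$-finite and that $(u_j)_{j\in\mathbb{N}} \subset \mathcal{M}(\mathcal{A})$ converges in measure to $u$. Then $u$ is a.e. unique.

Proof Let $(A_k)_{k\in\mathbb{N}} \subset \mathcal{A}$ be a sequence with $A_k \uparrow X$ and $\mu(A_k) < \infty$. Suppose that $u$ and $w$ are two measurable functions such that $u_j \xrightarrow{\mu} u$ and $u_j \xrightarrow{\mu} w$. Because of $|u - w| \le |u - u_j| + |u_j - w|$ we find for all $j, n \in \mathbb{N}$ that
$$\big\{|u - w| > \tfrac{2}{n}\big\} \subset \big\{|u - u_j| > \tfrac{1}{n}\big\} \cup \big\{|u_j - w| > \tfrac{1}{n}\big\}.$$
Therefore,
$$\mu\big(A_k \cap \{|u - w| > \tfrac{2}{n}\}\big) \le \mu\big(A_k \cap \{|u - u_j| > \tfrac{1}{n}\}\big) + \mu\big(A_k \cap \{|u_j - w| > \tfrac{1}{n}\}\big) \xrightarrow{j\to\infty} 0$$
holds for all $k, n \in \mathbb{N}$, i.e. $A_k \cap \{|u - w| > \frac{2}{n}\}$ is a null set for all $k, n \in \mathbb{N}$; but then
$$\{u \neq w\} \subset \bigcup_{n\in\mathbb{N}} \big\{|u - w| > \tfrac{2}{n}\big\} = \bigcup_{k,n\in\mathbb{N}} A_k \cap \big\{|u - w| > \tfrac{2}{n}\big\}$$
is also a null set, and we are done.

Caution: Limits in measure on a non-$\sigma$-finite measure space $(X,\mathcal{A},\mu)$ need not be unique, see Problem 16.6.

We are now ready for the main result of this chapter, which generalizes Lebesgue's dominated convergence theorem 11.2.

16.6 Theorem (Vitali) Let $(X,\mathcal{A},\mu)$ be $\sigma$-finite and let $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^p(\mathcal{A})$, $p \in [1,\infty)$, be a sequence which converges in measure to some measurable function $u \in \mathcal{M}(\mathcal{A})$. Then the following assertions are equivalent:

(i) $\lim_{j\to\infty} \|u_j - u\|_p = 0$;

(ii) $(|u_j|^p)_{j\in\mathbb{N}}$ is a uniformly integrable family;

(iii) $\lim_{j\to\infty} \int |u_j|^p\,d\mu = \int |u|^p\,d\mu$.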

Proof (iii)⇒(ii): Since lim uj p d = up d, there exists some constant j→ C < such that supj∈ uj p d C, and for every > 0 there is some N ∈ such that ∀ j N

uj p d − up d p p

Setting w = maxu1 u2 uN u , we have w ∈ + [] and we see for every ∈ 0 1 that uj > 1 w = ∅ ∀ j N uj > 1 w ⊂ uj > u ∀ j ∈


This implies for all j ∈ that p p p p

uj d

uj − u u d d +

1 1 uj > 1 w uj > w uj > w up d p + uj >u p + p sup uj p d 1 + C p

j∈

Since

uj > 1 w = uj p > 1p wp (16.3) we have established the uniform integrability of uj p j∈ . (ii)⇒(i): Let us first check that the double sequence uj − uk p jk∈ is again uniformly integrable. In view of (16.3), our assumption reads uj p d < ∀j ∈ (16.4) p

w ∈ + ⇔ wp ∈ 1+

and

uj >w p

for some suitable w = w ∈ + . From a − b a + b 2 maxa b we deduce p p p uj − uk d 2 uj ∨ uk d uj −uk >2w

uj −uk >2w

and since uj − uk uj + uk we get uj − uk > 2w ⊂ uj > w ∪ uk > w

Consequently, uj − uk p d uj −uk >2w

2

p

+

uj >w ∩uk >w

2p

uj >wuk

uj p d

uj >w ∩uk >w

+ 2p

uj p d + 2p

uj >w 16 4

4 · 2p = 2p+2

+

+

uk >wuj

uj >w ∩uk >w

uk >w

uk p d

uj p ∨ uk p d

uk p d



p

From this we conclude that for W = 2w ∈ + and large R > 0 uj − uk p d uj − uk p d + uj − uk p d = uj −uk >W

uj −uk W

2p+2 +

uj −uk W ∧

2

p+2

+

2p+2 +

p

W d + p

uj −uk > ∩ W>R

p ∧ W p d +

uj − uk p d

∩ <W R

W p d

W>R

+ R uj − uk > ∩ < W R

p

Letting first j k → we find because of uj −→ u that[] lim sup uj − uk p d 2p+2 + p ∧ W p d + jk→

W p d

W>R

The last two terms vanish as $\epsilon \to 0$ and $R \to \infty$ by the dominated convergence theorem 12.9, so that $\lim_{j,k\to\infty} \int |u_j - u_k|^p\,d\mu = 0$. Since $\mathcal{L}^p$ is complete (cf. T12.7), $(u_j)_{j\in\mathbb{N}}$ converges in $\mathcal{L}^p$ to a limit $\tilde u \in \mathcal{L}^p$.

Due to Lemma 16.4, $\mathcal{L}^p$-convergence also implies $u_j \xrightarrow{\mu} \tilde u$ and, by Lemma 16.5, we have $u = \tilde u$ a.e., hence $\mathcal{L}^p\text{-}\lim_{j\to\infty} u_j = u$.

(i)⇒(iii): is a consequence of the lower triangle inequality for the $\mathcal{L}^p$-norm, cf. the first part of the proof of Theorem 12.10.

16.7 Remark Vitali's theorem 16.6 still holds for measure spaces $(X,\mathcal{A},\mu)$ which are not $\sigma$-finite. In this case, however, we can no longer identify the $\mathcal{L}^p$-limit and the theorem reads: If $u_j \xrightarrow{\mu} u$, then the following are equivalent: (i) $(u_j)_{j\in\mathbb{N}}$ converges in $\mathcal{L}^p$; (ii) $(|u_j|^p)_{j\in\mathbb{N}}$ is uniformly integrable; (iii) $(\|u_j\|_p)_{j\in\mathbb{N}}$ converges in $\mathbb{R}$. The reason for this is evident from the proof of T16.6: the last few lines of the step (ii)⇒(i) require $\sigma$-finiteness of $(X,\mathcal{A},\mu)$.

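Different forms of uniform integrability²

² This section can be left out at first reading.

In view of Vitali's convergence theorem 16.6 one is led to suspect that uniform integrability is essentially a sufficient (and also necessary, if $(X,\mathcal{A},\mu)$ is $\sigma$-finite) condition for weak sequential relative compactness in $\mathcal{L}^1$, i.e.
$$(16.1) \implies \text{every } (u_j)_{j\in\mathbb{N}} \subset \mathcal{F} \text{ has a subsequence } (u_{j_k})_{k\in\mathbb{N}} \text{ such that } \lim_{k\to\infty} \int u_{j_k}\cdot\phi\,d\mu \text{ exists for all } \phi \in \mathcal{L}^\infty$$
(see Dunford and Schwartz [15, pp. 289–90, 386–7]). In $\mathcal{L}^p$, $1 < p < \infty$, uniform boundedness of $\mathcal{F} \subset \mathcal{L}^p$ is enough for this:
$$\sup_{u\in\mathcal{F}} \|u\|_p < \infty \iff \text{every } (u_j)_{j\in\mathbb{N}} \subset \mathcal{F} \text{ has a subsequence } (u_{j_k})_{k\in\mathbb{N}} \text{ such that } \lim_{k\to\infty} \int u_{j_k}\cdot\phi\,d\mu \text{ exists for all } \phi \in \mathcal{L}^q,\ \tfrac1p + \tfrac1q = 1.$$
This is a consequence of the reflexivity of the spaces $\mathcal{L}^p$, $p > 1$. Let us give various equivalent conditions for uniform integrability.

16.8 Theorem Let $(X,\mathcal{A},\mu)$ be some measure space and $\mathcal{F} \subset \mathcal{L}^1(\mathcal{A})$. Then the following statements (i)–(iv) are equivalent:

(i) $\mathcal{F}$ is uniformly integrable, i.e. (16.1) holds;

(ii) a) $\sup_{u\in\mathcal{F}} \int |u|\,d\mu < \infty$;
 b) $\forall \epsilon > 0\ \exists w_\epsilon \in \mathcal{L}^1_+(\mathcal{A}),\ \delta > 0\ \forall B \in \mathcal{A}$: $\int_B w_\epsilon\,d\mu < \delta \implies \sup_{u\in\mathcal{F}} \int_B |u|\,d\mu < \epsilon$;

(iii) a) $\sup_{u\in\mathcal{F}} \int |u|\,d\mu < \infty$;
 b) $\forall \epsilon > 0\ \exists K_\epsilon \in \mathcal{A},\ \mu(K_\epsilon) < \infty$: $\sup_{u\in\mathcal{F}} \int_{K_\epsilon^c} |u|\,d\mu < \epsilon$;
 c) $\forall \epsilon > 0\ \exists \delta > 0\ \forall B \in \mathcal{A}$: $\mu(B) < \delta \implies \sup_{u\in\mathcal{F}} \int_B |u|\,d\mu < \epsilon$;

(iv) a) $\forall \epsilon > 0\ \exists K_\epsilon \in \mathcal{A},\ \mu(K_\epsilon) < \infty$: $\sup_{u\in\mathcal{F}} \int_{K_\epsilon^c} |u|\,d\mu < \epsilon$;
 b) $\lim_{R\to\infty} \sup_{u\in\mathcal{F}} \int_{\{|u| > R\}} |u|\,d\mu = 0$.

If $(X,\mathcal{A},\mu)$ is a $\sigma$-finite measure space, (i)–(iv) are also equivalent to

(v) a) $\sup_{u\in\mathcal{F}} \int |u|\,d\mu < \infty$;
 b) $\lim_{j\to\infty} \sup_{u\in\mathcal{F}} \big|\int_{A_j} u\,d\mu\big| = 0$ for every decreasing sequence $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}$, $A_j \downarrow \emptyset$. [Note: $\mu(A_j) < \infty$ is not assumed.]

If $(X,\mathcal{A},\mu)$ is a finite measure space, (i)–(v) are also equivalent to

(vi) $\lim_{R\to\infty} \sup_{u\in\mathcal{F}} \int_{\{|u| > R\}} |u|\,d\mu = 0$;

(vii) $\sup_{u\in\mathcal{F}} \int \Phi(|u|)\,d\mu < \infty$ for some increasing, convex function $\Phi : [0,\infty) \to [0,\infty)$ such that $\lim_{t\to\infty} \frac{\Phi(t)}{t} = \infty$.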

16.9 Remark Almost any combination of the above criteria appears in the literature as uniform integrability or under different names. Here is a short list: (ii-a) – uniform boundedness (iii-b) – tightness (iii-c) – uniform absolute continuity (v-b) – uniform -additivity (vii) – de la Vallée Poussin’s condition (iii) – Dieudonné’s condition (weak seq. relative compactness) (v) – Dunford–Pettis condition (weak seq. relative compactness) Proof (of Theorem 16.8) First we show (iv)⇒(iii)⇒(ii)⇒(i)⇒(iv) for general measure spaces, then (ii)⇒(v)⇒(i) for -finite measure spaces and, finally, for finite measure spaces (iv)⇒(vi)⇒(vii)⇒(i). (iv)⇒(iii): Condition (iii-b) is clear. Given > 0 we can pick K = K/2 ∈ and R = R/2 > 0 such that u d + u d + u d u d = K∩u>R

Kc

K∩uR

+ RK + < 2 2 uniformly for all u ∈ . Setting = 2R we see for every B ∈ with B < that u d = u d + u d

B

and (iii) follows.

B∩u>R

u>R

B∩uR

u d + R B

+ R = 2


(iii)⇒(ii): Condition (ii-a) is clear. Given > 0 we pick K = K ∈ with K < and = > 0 and set w = 1K . If B ∈ is such that B ∩ K = w d < , we get from (iii-c) and (iii-b) that B u d = u d + u d + B

B∩Kc

B∩K

uniformly for all u ∈ which is just (ii-b). (ii)⇒(i): Take w = w and = > 0 as in (ii). If R > and so

u d

u>Rw

u>Rw

u d R

w d

1

sup u d we see u∈

w d u>Rw

1 sup u d

R u∈

From (ii-b) we infer that supu∈ u>Rw u d . (i)⇒(iv): Let w = w be as in (i) resp. (16.1). Since u w ∩ u R ⊂ w R , we have u d = u d + u d u>R

u>w ∩u>R

u>w

+

u d +

uw ∩u>R

(16.5)

w d w>R

w 1w>R d

From the dominated convergence theorem 11.2 we see that the right-hand side tends (uniformly for all u ∈ ) to as R → and (iv-b) follows. To see (iv-a) we choose r = r > 0 so small that wr w d w ∧ r d ; this is possible since by Lebesgue’s convergence theorem 11.2 limr→0 w ∧ r d = 0. By the Markov inequality P10.12 we see w > r 1r w d < , and we get for K = w > r u d = sup u d + u d sup u∈ K c

wr ∩u>w

u∈ (16.1)

+ sup

u∈ wr ∩uw

+ 2

w d wr

wr ∩uw

u d



This proves (iv). Assume for the rest of the proof that is -finite (ii)⇒(v): (v-a) is clear. If A j ↓ ∅ we see from the monotone convergence theorem 11.1 that limj→ A w d = 0, so that for we have by (ii-b) j supu∈ A u d supu∈ A u d < for sufficiently large j ∈ . j

j

(v)⇒(i): Note that for the positive, resp. negative parts u± of u u± d = ±u d and Aj ∩ ±u 0 ↓ ∅ Aj ∩±u0

Aj

which implies that we may replace u in (v-b) by u. Since is -finite, we can find an exhausting sequence Ek ∈ , Ek ↑ X, Ek < . The function w =

2−k 1 1 + Ek Ek k∈

is clearly positive and ∈ 1+ . Assume (i) false; in particular, u d >

∃ > 0 ∀ j ∈ sup u∈ u>j w

But Aj = u > j w ↓ ∅ and (v) (with the above discussed modification) will then lead to a contradiction. Assume for the rest of the proof that is finite (iv)⇒(vi): is trivial. (vi)⇒(vii): For u ∈ we set n = n u = u > n and define t =

s ds

s =

0t

n 1nn+1 s

n=1

We will now determine the numbers 1 2 3 . Clearly, t =

n

n=1

0t

1nn+1 s ds =

n t − n+ ∧ 1

n=1

and

u d =

n=1

n

n u > n

u − n+ ∧ 1 d n=1

(16.6)


If we can construct n n∈ such that it increases to and (16.6) is finite (uniformly for all u ∈ ), then we are done: s will increase to , t will be convex3 and satisfy t 1 1 s ds = s ds 21 2t ↑

t t 0t t t/2t By assumption we sequence rj j∈ ⊂ such that can find an increasing −j limj→ rj = and u>r u d 2 . Thus j

u > k =

k=rj

 k

j=1 k=rj

< u + 1

j=1 =rj

=

j=1 =rj rj

2−j by assumption 3

Usually one argues that 0 a.e., but for this we need to know that the monotone function = is almost everywhere differentiable – and this requires Lebesgue’s differentiation theorem 19.20. Here is an alternative elementary argument: it is not hard to see that a b → is convex if, and only if, y−x z−x holds for all a < x < y < z < b, use e.g. the technique of the proof of Lemma 12.13. y−x z−x x Since x = 0 s ds (by L13.12 and T11.8), this is the same as 1 y 1 z 1 y 1 z s ds s ds ⇐⇒ s ds s ds y−x x z−x x y−x x z−y y 1 1 ⇐⇒ sy − x + x ds sz − y + y ds

0

0

The latter inequality follows from the fact that is increasing and sy − x + x ∈ x y while sz − y + y ∈ y z for 0 s 1.



and interchange the order of summation in the first double sum on the left: u > k = 11k rj u > k 1

j=1 k=rj

k=1

j=1

= k

This finishes the construction of the sequence k k∈ . (vii)⇒(i): Since X < , constants are integrable and we may take w x = r for all x ∈ X. Fix > 0 and choose r so big that t−1 t > 1/ for all t > r . Then u d u d u d u>r

u>r

and (i) follows. Problems 16.1. Let X be a finite measure space and uj j∈ ⊂ . Prove that

j→ lim sup uj > = 0 ∀ > 0 =⇒ fj −−→ 0 a.e. k→

jk

[Hint: uj → 0 a.e. if, and only if, jk uj > is small for all > 0 and big k k .] 16.2. Show that for a sequence uj j∈ of measurable functions on a finite measure space

lim sup uj > = lim sup uj > ∀ > 0 k→

j→

jk

and combine this with Problem 16.1 to give a new criterion for a.e. convergence. j→

16.3. Let X be a measure space and uj j∈ ⊂ . Show that uj −−→ u in jk→

measure if, and only if, uj − uk −−−→ 0 in measure. 16.4. Consider one-dimensional Lebesgue measure on 0 1 0 1 . Compare the convergence behaviour (a.e., p , in measure) of the following sequences: (i) fnj = n 1j−1/nj/n , n ∈ 1 j n run through in lexicographical order; (ii) gn = n 101/n , n ∈ ; (iii) hn = an 1 − nx+ , n ∈ , x ∈ 0 1 and a sequence an n∈ ⊂ + . 16.5. Let uj j∈ wj j∈ be two sequences of measurable functions on X . Sup

→ u and wj − → w. Show that auj + bwj , a b ∈ , maxuj wj , pose that uj − minuj wj and uj converge in measure and find their limits. 16.6. Let X be a measure space which is not -finite. Construct an example of a sequence uj j∈ ⊂ which converges in measure but whose limit is not unique. Can this happen in a -finite measure space?


[Hint: let Xf = F F < be the -finite part of X. Show that X \ Xf = ∅, that every measurable E ⊂ X \ Xf satisfies E = and that we can change every limit of uj j∈ outside Xf .] 16.7. (i) Prove, without using Vitali’s convergence theorem, the following Theorem (Bounded convergence). Let X be a measure space, A ∈ be a set with A < and uj j∈ be a sequence of measurable functions. Suppose that all uj vanish on Ac , that uj C for all j ∈ and some constant

→ u. Then L1 -limj uj = u. C > 0 and that uj − (ii) Use one-dimensional Lebesgue measure and the sequence uj = 1jj+1 to show that the assumption A < is really needed in (i). (iii) As L1 -limit the function u is unique but, as we have seen in Problem 16.6, this is not the case for limits in measure. Why does the uniqueness of the limit in (i) not contradict Problem 16.6? 16.8. Let P be a probability space. Define for two random variables X Y X Y = inf > 0 PX − Y

(i) is a pseudo-metric on the space of random variables , i.e. satisfies properties d2 , d3 of a metric, cf. Appendix B, Definition B.15. (ii) A sequence Xj j∈ ⊂ converges in probability to a random variable j→

X if, and only if, Xj X −−→ 0. (iii) is a complete pseudo-metric on , i.e. every -Cauchy sequence converges in probability to some limit in . (iv) Show that g X Y =

X − Y dP 1 + X − Y

and

X Y =

X − Y ∧ 1 dP

are pseudo-metrics on which have the same Cauchy sequences as . 16.9. Let X be a -finite measure space. Suppose that Aj j∈ ⊂ satisfies j→

Aj −−→ 0. Show that lim

j→ Aj

u d = 0

∀ u ∈ 1

[Hint: use Vitali’s convergence theorem 16.6.] 16.10. Let X be a measure space and un n∈ ⊂ . n→

(i) Let xn n∈ ⊂ . Show that xn −−→ 0 if, and only if, every subsequence k→

xnk k∈ satisfies xnk −−→ 0.

→ u if, and only if, every subsequence unk k∈ has a sub(ii) Show that un − subsequence ˜unk k∈ which converges a.e. to u on every set A ∈ of finite -measure.



[Hint: use L16.4 for necessity. For sufficiency show that u˜ nk → u in measure, hence the sequence of reals A ∩ unk − u > has a subsequence converging to 0; use (i) to conclude that A ∩ un − u > → 0.] → u entails that un − → u for every (iii) Use part (ii) to show that un − continuous function → . 16.11. Let and be two families of uniformly integrable functions on an arbitrary measure space X . Show that (i) every finite collection of functions f1 fn ⊂ 1 is uniformly integrable. (ii) ∪ f1 fn , f1 fn ∈ 1 is uniformly integrable. (iii) + = f + g f ∈ g ∈ is uniformly integrable. (iv) c.h. = tf + 1 − t f ∈ 0 t 1 (‘c.h.’ stands for convex hull) is uniformly integrable. (v) the closure of c.h. in the space 1 is uniformly integrable. 16.12. Assume that uj j∈ is uniformly integrable. Show that 1 lim sup uj d = 0

k→ k jk 16.13. Let P be a probability space. Adapt the proof of Theorem 16.8 to show that a sequence uj j∈ ⊂ 1 is uniformly integrable if it is bounded in some space p P with p > 1, i.e. if supj∈ uj p < . Use Vitali’s convergence theorem 16.6 to construct an example illustrating that 1 -boundedness of uj j∈ does not guarantee uniform integrability. 16.14. Let X be a finite measure space and ⊂ 1 be a family of integrable functions. Show that is uniformly integrable if, and only if, j=1 j j < f j + 1 converges uniformly for all f ∈ . [Hint: compare (vi)⇒(vii) of the proof of Theorem 16.8.]

17 Martingales

Martingales are a key tool of modern probability theory, in particular, when it comes to a.e. convergence assertions and related limit theorems. The origins of martingale techniques can be traced back to analysis papers by Kac, Marcinkiewicz, Paley, Steinhaus, Wiener and Zygmund from the early 1930s on independent (or orthogonal) functions and the convergence of certain series of functions, see e.g. the paper by Marcinkiewicz and Zygmund [28] which contains many references. The theory of martingales as we know it now goes back to Doob and most of the material of this and the following chapter can be found in his seminal monograph [13] from 1953.

We want to understand martingales as an analysis tool which will be useful for the study of $L^p$- and almost everywhere convergence and, in particular, for the further development of measure and integration theory. Our presentation differs somewhat from the standard way to introduce martingales (conditional expectations will be defined later in Chapter 22), but the results and their proofs are pretty much the usual ones. The only difference is that we develop the theory for $\sigma$-finite measure spaces rather than just for probability spaces. We ask those readers who are familiar with martingales and the language of conditional expectations for patience until Chapter 23, in particular Theorem 23.9, when we catch up with these notions.

Throughout this chapter $(X,\mathcal{A},\mu)$ is a measure space which admits a filtration, i.e. an increasing sequence
$$\mathcal{A}_0 \subset \mathcal{A}_1 \subset \ldots \subset \mathcal{A}_j \subset \ldots \subset \mathcal{A}$$
of sub-$\sigma$-algebras of $\mathcal{A}$. If $(X,\mathcal{A}_0,\mu)$ is $\sigma$-finite¹ we call $(X,\mathcal{A},\mathcal{A}_j,\mu)$ a $\sigma$-finite filtered measure space. This will always be the case from now on. Finally, we write $\mathcal{A}_\infty = \sigma(\mathcal{A}_j : j = 0,1,2,\ldots)$ for the smallest $\sigma$-algebra generated by all $\mathcal{A}_j$.

¹ i.e. there is a sequence $(A_j)_{j\in\mathbb{N}} \subset \mathcal{A}_0$ with $A_j \uparrow X$ and $\mu(A_j) < \infty$.

17.1 Definition Let $(X,\mathcal{A},\mathcal{A}_j,\mu)$ be a $\sigma$-finite filtered measure space. A sequence of $\mathcal{A}$-measurable functions $(u_j)_{j\in\mathbb{N}}$ is called a martingale (w.r.t. the filtration $(\mathcal{A}_j)_{j\in\mathbb{N}}$), if $u_j \in \mathcal{L}^1(\mathcal{A}_j)$ for each $j \in \mathbb{N}$ and if
$$\int_A u_{j+1}\,d\mu = \int_A u_j\,d\mu \qquad \forall A \in \mathcal{A}_j. \tag{17.1}$$
We say that $(u_j)_{j\in\mathbb{N}}$ is a submartingale (w.r.t. $(\mathcal{A}_j)_{j\in\mathbb{N}}$) if $u_j \in \mathcal{L}^1(\mathcal{A}_j)$ and
$$\int_A u_{j+1}\,d\mu \ge \int_A u_j\,d\mu \qquad \forall A \in \mathcal{A}_j, \tag{17.2}$$
and a supermartingale (w.r.t. $(\mathcal{A}_j)_{j\in\mathbb{N}}$) if $u_j \in \mathcal{L}^1(\mathcal{A}_j)$ and
$$\int_A u_{j+1}\,d\mu \le \int_A u_j\,d\mu \qquad \forall A \in \mathcal{A}_j. \tag{17.3}$$

If we want to emphasize the underlying filtration, we write uj j j∈ . 17.2 Remark (i) It is enough to assume instead of (17.1) that G uj+1 d = G uj d for all G ∈ j where j is a generator of j containing an exhausting sequence Gk k∈ ⊂ j with Gk ↑ X. This follows from the fact that + − uj+1 d = uj d ⇐⇒ u+ + u d = u− j j+1 + uj d j+1 A A A A = A

= A

where are finite measures on j and from the uniqueness theorem 5.7: j = j implies – under our assumptions on j – that = on j . (For sub- or supermartingales we need, in addition, that j is a semi-ring, cf. Lemma 15.6.) (ii) Set j = A ∈ j A < . It is not hard to see that j is a semiring and that, because of -finiteness, j = j . Therefore (ii) means that it is enough to assume (17.1)–(17.3) for all sets in j , i.e. for all sets with finite -measure. (iii) Condition (17.2) in Definition 17.1 is equivalent to

uj+1 d uj d ∀ ∈ (17.2 ) + j

Indeed: Since = 1A ∈ + j for all A ∈ j , (17.2 ) implies (17.2). Conversely, if ∈ + j is a simple function, (17.2 ) follows from (17.2) by linearity. For general ∈ + j , we find by T8.8 a sequence of j -measurable


simple functions k such that k and k ↑ . Since uj uj+1 ∈ 1 , we can use Lebesgue’s dominated convergence theorem 11.2 and get

uj+1 d = lim

k→

17.2’

k uj+1 d

lim

k→

k uj d =

uj d

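Similar statements hold for martingales (17.1) and supermartingales (17.3). (iv) With some obvious (notational) changes in Definition 17.1 we can also consider other index sets such as $\mathbb{N}_0$, $\mathbb{Z}$ or $-\mathbb{N}$.

17.3 Examples Let $(X,\mathcal{A},\mathcal{A}_j,\mu)$ be a $\sigma$-finite filtered measure space.

(i) $(u_j)_{j\in\mathbb{N}}$ is a martingale if, and only if, it is both a sub- and a supermartingale.

(ii) $(u_j)_{j\in\mathbb{N}}$ is a supermartingale if, and only if, $(-u_j)_{j\in\mathbb{N}}$ is a submartingale.

(iii) Let $(u_j)_{j\in\mathbb{N}}$ and $(w_j)_{j\in\mathbb{N}}$ be [sub-]martingales and let $\alpha, \beta$ be [positive] real numbers. Then $(\alpha u_j + \beta w_j)_{j\in\mathbb{N}}$ is a [sub-]martingale.

(iv) Let $(u_j)_{j\in\mathbb{N}}$ be a submartingale. Then $(u_j^+)_{j\in\mathbb{N}}$ is a submartingale. Indeed: Take $A \in \mathcal{A}_j$ and observe that $\{u_j \ge 0\} \in \mathcal{A}_j$. Then
$$\int_A u_{j+1}^+\,d\mu \ge \int_{A\cap\{u_j\ge0\}} u_{j+1}^+\,d\mu \ge \int_{A\cap\{u_j\ge0\}} u_{j+1}\,d\mu \overset{(17.2)}{\ge} \int_{A\cap\{u_j\ge0\}} u_j\,d\mu = \int_A u_j^+\,d\mu.$$

(v) Let $(u_j)_{j\in\mathbb{N}}$ be a martingale. Then $(|u_j|)_{j\in\mathbb{N}}$ is a submartingale. This follows from $|u_j| = 2u_j^+ - u_j$, (iii) and (iv).

(vi) Let $(u_j)_{j\in\mathbb{N}}$ be a martingale. If $u_j \in \mathcal{L}^p(\mathcal{A}_j)$ for some $p \in [1,\infty)$, then $(|u_j|^p)_{j\in\mathbb{N}}$ is a submartingale. Indeed: Note that
$$|y|^p - |x|^p = \int_{|x|}^{|y|} p\,t^{p-1}\,dt \ge p\,|x|^{p-1}\big(|y| - |x|\big) \qquad \text{for all } x, y \in \mathbb{R},$$
where we set, as usual, $\int_x^y = -\int_y^x$ if $x > y$. If we take $y = u_{j+1}$ and $x = u_j$ and integrate over $A \in \mathcal{A}_j$, we find by dominated convergence T11.2
$$\int_A \big(|u_{j+1}|^p - |u_j|^p\big)\,d\mu \ge p\int 1_A\,|u_j|^{p-1}\big(|u_{j+1}| - |u_j|\big)\,d\mu = \lim_{N\to\infty} p\int \underbrace{\big(1_A\,|u_j|^{p-1} \wedge N\big)}_{\in\,\mathcal{L}^\infty_+(\mathcal{A}_j)}\big(|u_{j+1}| - |u_j|\big)\,d\mu \overset{(17.2'),(v)}{\ge} 0$$
since $(|u_j|)_{j\in\mathbb{N}}$ is, by (v), a submartingale.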


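(vii) Let $u_j \in \mathcal{L}^1(\mathcal{A}_j)$, $j \in \mathbb{N}$, and $u_1 \le u_2 \le u_3 \le \ldots$. Then $(u_j)_{j\in\mathbb{N}}$ is a submartingale.

(viii) Let $(X,\mathcal{A},\mu) = ([0,1), \mathcal{B}[0,1), \lambda = \lambda^1|_{[0,1)})$ and consider the finite ($\sigma$-)algebras generated by all dyadic intervals of $[0,1)$ of length $2^{-j}$, $j \in \mathbb{N}_0$:
$$\mathcal{A}_j = \sigma\big([0,2^{-j}),\ \ldots,\ [k2^{-j},(k+1)2^{-j}),\ \ldots,\ [(2^j-1)2^{-j},1)\big).$$
Obviously, $\mathcal{A}_0 \subset \mathcal{A}_1 \subset \ldots \subset \mathcal{B}[0,1)$ and $([0,1), \mathcal{B}[0,1), \mathcal{A}_j, \lambda)$ is a ($\sigma$-)finite filtered measure space. Then $(u_j)_{j\in\mathbb{N}_0}$, $u_j = 2^j\,1_{[0,2^{-j})}$, is a martingale.

Indeed: Since the sets $[k2^{-j},(k+1)2^{-j})$, $k = 0,1,\ldots,2^j-1$, are a disjoint partition of $[0,1)$, every $A \in \mathcal{A}_j$ consists of a (finite) disjoint union of such sets. If $[0,2^{-j}) \subset A$, we have
$$\int_A u_{j+1}\,d\lambda = 2^{j+1}\int 1_{A\cap[0,2^{-(j+1)})}\,d\lambda = 2^{j+1}\,2^{-(j+1)} = 2^j\,2^{-j} = \int 2^j\,1_{A\cap[0,2^{-j})}\,d\lambda = \int_A u_j\,d\lambda$$
and, otherwise,
$$\int_A u_{j+1}\,d\lambda = 2^{j+1}\int_A 1_{[0,2^{-(j+1)})}\,d\lambda = 0 = 2^j\int_A 1_{[0,2^{-j})}\,d\lambda = \int_A u_j\,d\lambda.$$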
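The computation in Example 17.3(viii) can be checked mechanically. The sketch below works with exact fractions and only tests (17.1) on the generating dyadic intervals of $\mathcal{A}_j$, which suffices because every set in $\mathcal{A}_j$ is a finite disjoint union of them; the helper names are illustrative and not part of the text.

```python
from fractions import Fraction

def integral_u(j, a, b):
    """Integral over [a, b) of u_j = 2**j * 1_[0, 2**-j), for 0 <= a <= b <= 1."""
    right = min(b, Fraction(1, 2**j))
    # Length of [a, b) intersected with [0, 2^-j), scaled by the height 2^j.
    return 2**j * max(right - a, Fraction(0))

def atoms(j):
    """The dyadic intervals [k 2^-j, (k+1) 2^-j) which generate A_j."""
    return [(Fraction(k, 2**j), Fraction(k + 1, 2**j)) for k in range(2**j)]

def martingale_gap(j):
    """Largest |int_A u_{j+1} - int_A u_j| over the generating intervals A of A_j."""
    return max(abs(integral_u(j + 1, a, b) - integral_u(j, a, b))
               for a, b in atoms(j))

print([martingale_gap(j) for j in range(6)])

# A set from the finer algebra A_2 that is not in A_1: here (17.1) fails for j = 1.
a, b = Fraction(1, 4), Fraction(1, 2)
print(integral_u(1, a, b), integral_u(2, a, b))
```

The second print statement illustrates why the test sets in (17.1) must come from $\mathcal{A}_j$ and not from a later $\sigma$-algebra: on $[\tfrac14,\tfrac12)$ the integrals of $u_1$ and $u_2$ differ.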

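(ix) Let $(X,\mathcal{A},\mu) = ([0,\infty)^n, \mathcal{B}([0,\infty)^n), \lambda = \lambda^n|_{[0,\infty)^n})$ and consider the $\sigma$-algebras $\mathcal{A}_j$ generated by the lattice of half-open dyadic squares of sidelength $2^{-j}$, $j \in \mathbb{N}_0$,
$$\mathcal{A}_j = \sigma\big(z + [0,2^{-j})^n : z \in 2^{-j}\mathbb{N}_0^n\big), \qquad j \in \mathbb{N}_0.$$
Then $\mathcal{A}_0 \subset \mathcal{A}_1 \subset \ldots \subset \mathcal{B}([0,\infty)^n)$, and $([0,\infty)^n, \mathcal{B}([0,\infty)^n), \mathcal{A}_j, \lambda)$ is a $\sigma$-finite filtered measure space. For every real-valued function $u \in \mathcal{L}^1([0,\infty)^n, \lambda)$ we can define an $\mathcal{A}_j$-measurable step function $u_j$ on the dyadic squares in $\mathcal{A}_j$ by
$$u_j(x) = \sum_{z\in2^{-j}\mathbb{N}_0^n} \frac{\int_{z+[0,2^{-j})^n} u\,d\lambda}{\lambda\big(z+[0,2^{-j})^n\big)}\,1_{z+[0,2^{-j})^n}(x) = \sum_{z\in2^{-j}\mathbb{N}_0^n} \int u\,\frac{1_{z+[0,2^{-j})^n}}{\lambda\big(z+[0,2^{-j})^n\big)}\,d\lambda \cdot 1_{z+[0,2^{-j})^n}(x). \tag{17.4}$$
Then $(u_j, \mathcal{A}_j)_{j\in\mathbb{N}_0}$ is a martingale.

Indeed: Since the sets $z + [0,2^{-j})^n$ are disjoint for different $z \in 2^{-j}\mathbb{N}_0^n$, the sums in (17.4) are actually finite sums. That $u_j \in \mathcal{L}^1(\mathcal{A}_j)$ is clear from the construction. To see (17.1), fix $z' \in 2^{-j}\mathbb{N}_0^n$ and $j \in \mathbb{N}_0$ and observe that for all $k = j, j+1, j+2, \ldots$
$$\begin{aligned}
\int_{z'+[0,2^{-j})^n} u_k(x)\,dx
&= \sum_{z\in2^{-k}\mathbb{N}_0^n} \int u\,\frac{1_{z+[0,2^{-k})^n}}{\lambda\big(z+[0,2^{-k})^n\big)}\,d\lambda \cdot \int 1_{z+[0,2^{-k})^n}\,1_{z'+[0,2^{-j})^n}\,d\lambda \\
&= \sum_{\substack{z\in2^{-k}\mathbb{N}_0^n \\ z+[0,2^{-k})^n \subset z'+[0,2^{-j})^n}} \int u\,\frac{1_{z+[0,2^{-k})^n}}{\lambda\big(z+[0,2^{-k})^n\big)}\,d\lambda \cdot \lambda\big(z+[0,2^{-k})^n\big) \\
&= \sum_{\substack{z\in2^{-k}\mathbb{N}_0^n \\ z+[0,2^{-k})^n \subset z'+[0,2^{-j})^n}} \int_{z+[0,2^{-k})^n} u(x)\,dx \\
&= \int_{z'+[0,2^{-j})^n} u(x)\,dx.
\end{aligned}$$
The r.h.s. is independent of $k$ and, therefore, we get
$$\int_{z'+[0,2^{-j})^n} u_j\,d\lambda = \int_{z'+[0,2^{-j})^n} u\,d\lambda = \int_{z'+[0,2^{-j})^n} u_{j+1}\,d\lambda.$$
Since $\mathcal{A}_j$ is generated by (disjoint unions of) squares of the form $z' + [0,2^{-j})^n$, $z' \in 2^{-j}\mathbb{N}_0^n$, the claim follows from Remark 17.2(i).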
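The averaging construction (17.4) can be tried out in dimension $n = 1$. The sketch below fixes, purely as an assumption for illustration, the integrable function $u(x) = x^2\,1_{[0,1)}(x)$, whose integrals over intervals are known exactly, and compares the integrals of $u_j$ and $u_{j+1}$ over the dyadic intervals of generation $j$. All names are hypothetical helpers.

```python
from fractions import Fraction

def integral_u(a, b):
    """Exact integral of u(x) = x**2 * 1_[0,1)(x) over [a, b), for 0 <= a <= b."""
    a, b = min(a, Fraction(1)), min(b, Fraction(1))
    return (b**3 - a**3) / 3

def u_level(j, x):
    """Value at x of the step function u_j from (17.4) in dimension 1:
    the mean of u over the dyadic interval of length 2^-j containing x."""
    side = Fraction(1, 2**j)
    z = (x // side) * side               # left end point of that interval
    return integral_u(z, z + side) / side

def integral_u_level(k, z, j):
    """Integral of u_k over z + [0, 2^-j), for k >= j and z a multiple of 2^-j."""
    side_k = Fraction(1, 2**k)
    cells = 2**(k - j)                   # number of generation-k cells inside
    return sum(u_level(k, z + i * side_k) * side_k for i in range(cells))

j = 2
for m in range(6):                       # the intervals [m/4, (m+1)/4)
    z = Fraction(m, 2**j)
    print(z, integral_u_level(j, z, j), integral_u_level(j + 1, z, j))
```

Both columns agree with $\int_{z+[0,2^{-j})} u\,d\lambda$, which is the identity used in the text to verify (17.1) on the generator.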

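(x) Assume that $(X,\mathcal{A},\mu)$ is a probability space, i.e. a measure space where $\mu(X) = 1$. A family of real functions $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^1(\mathcal{A})$ is called independent, if
$$\mu\Big(\bigcap_{j=1}^M u_j^{-1}(B_j)\Big) = \prod_{j=1}^M \mu\big(u_j^{-1}(B_j)\big) \tag{17.5}$$
holds for all $M \in \mathbb{N}$ and any choice of $B_1, B_2, \ldots, B_M \in \mathcal{B}(\mathbb{R})$. If $\mathcal{A}_k = \sigma(u_1, u_2, \ldots, u_k)$ is the $\sigma$-algebra generated by $u_1, u_2, \ldots, u_k$, then the sequence of partial sums
$$s_k = u_1 + u_2 + \cdots + u_k, \qquad k \in \mathbb{N},$$
is an $(\mathcal{A}_k)_{k\in\mathbb{N}}$-submartingale if, and only if, $\int u_j\,d\mu \ge 0$ for all $j$.

To see this we need an auxiliary result which is of some interest on its own: If $u_1, u_2, \ldots, u_{k+1}$ are independent integrable functions, then
$$\int_A u_{k+1}\,d\mu = \mu(A)\int u_{k+1}\,d\mu \qquad \forall A \in \sigma(u_1, u_2, \ldots, u_k) \tag{17.6}$$
and
$$\int \phi\,u_{k+1}\,d\mu = \int \phi\,d\mu \cdot \int u_{k+1}\,d\mu \qquad \forall \phi \in \mathcal{L}^1\big(\sigma(u_1, \ldots, u_k)\big). \tag{17.7}$$
In particular, integrable independent functions satisfy
$$\int \prod_{j=1}^k u_j\,d\mu = \prod_{j=1}^k \int u_j\,d\mu.$$
The proof of (17.6) and (17.7) will be given in Scholium 17.4 below. Returning to the original problem, we find for all $A \in \mathcal{A}_k$ that
$$\int_A s_{k+1}\,d\mu = \int_A (s_k + u_{k+1})\,d\mu = \int_A s_k\,d\mu + \int_A u_{k+1}\,d\mu \overset{(17.6)}{=} \int_A s_k\,d\mu + \mu(A)\int u_{k+1}\,d\mu.$$
Thus $\int u_{k+1}\,d\mu \ge 0$ is necessary and sufficient for $(s_k)_{k\in\mathbb{N}}$ to be a submartingale.

(xi) Let $(u_j)_{j\in\mathbb{N}} \subset \mathcal{L}^1_+ \cap \mathcal{L}^\infty_+$ be independent functions (in the sense of (x)). Then $p_k = u_0 \cdot u_1 \cdot \ldots \cdot u_k$, $k \in \mathbb{N}$, is a submartingale w.r.t. the filtration $\mathcal{A}_k = \sigma(u_0, u_1, \ldots, u_k)$ if, and only if, $\int u_j\,d\mu \ge 1$ for all $j$. This follows directly from
$$\int_A p_{k+1}\,d\mu = \int 1_A\,p_k\,u_{k+1}\,d\mu \overset{(17.7)}{=} \int 1_A\,p_k\,d\mu \cdot \int u_{k+1}\,d\mu = \int_A p_k\,d\mu \cdot \int u_{k+1}\,d\mu \qquad \forall A \in \mathcal{A}_k.$$

17.4 Scholium (on independent functions) (i) Let $u_1, u_2, \ldots, u_{k+1}$ be independent integrable functions on the probability space $(X,\mathcal{A},\mu)$. Then
$$\int_A u_{k+1}\,d\mu = \mu(A)\int u_{k+1}\,d\mu \qquad \forall A \in \sigma(u_1, u_2, \ldots, u_k) \tag{17.6}$$
and
$$\int \phi\,u_{k+1}\,d\mu = \int \phi\,d\mu \cdot \int u_{k+1}\,d\mu \qquad \forall \phi \in \mathcal{L}^1\big(\sigma(u_1, \ldots, u_k)\big). \tag{17.7}$$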

−1 Proof. We begin with (17.6). Pick a set AM = M j=1 uj Bj , B1 BM ∈

, M k, from the generator of k = u1 u2 uk . Because of Theorem 8.8 (and Problem 8.10) we find a sequence of simple functions f ∈ ⊂ uk+1 such that f uk+1 and lim→ f = uk+1 . For the standard repreN sentations f = j=0 yj 1Hj , Hj ∈ uk+1 , we get using dominated convergence T11.2

11.2

AM

uk+1 d = lim

N

→ AM j=0

yj 1Hj d

N

= lim

→

yj AM ∩ Hj

j=0

N

(17.5)

= lim

→

yj AM Hj

j=0

11.2

= AM

uk+1 d

where we applied (17.5) for Hj ∈ uk+1 ⇐⇒ Hj = u−1 k+1 Cj with some suitable Cj ∈ and AM . This proves (17.6) for a generator of k which satisfies the conditions stated in Remark 17.2(i); a similar argument as the one in this remark now proves that (17.6) holds for all A ∈ k . For (17.7) let us first assume that is bounded. Set k = u1 uk . By Theorem 8.8 (and Problem 8.10) we find a sequence of simple functions f ∈ ⊂ k such that f and lim→ f = . For the standard N representations f = j=0 yj 1Aj , Aj ∈ k , we get using dominated convergence T11.2 and (17.6)

11.2

uk+1 d = lim

→

N j=0

yj 1Aj uk+1 d

N

= lim

→

j=0

yj Aj

uk+1 d


= lim

→

11.2

=

N j=0

d ·

yj 1Aj d ·


uk+1 d

uk+1 d

If is integrable but not bounded, we apply the previous calculation to the bounded functions = ∧ and use dominated convergence on the right and monotone convergence on the left to get

9.6

· uk+1 d = lim

→

· uk+1 d = lim

d ·

→

11.2

=

d ·

uk+1 d

uk+1 d

This shows, in particular, that uk+1 ∈ 1 . We can therefore apply dominated convergence to = − ∨ ∧ to derive

uk+1 d = lim

→

uk+1 d = lim

→

=

d ·

d ·

uk+1 d

uk+1 d

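(ii) In Example 17.3(x) we assumed the existence of infinitely many independent functions. As a matter of fact, this is a not completely trivial matter. If we want to construct finitely many independent functions $u_1, u_2, \ldots, u_n$, we can proceed as follows. Replace the probability space $(X,\mathcal{A},\mu)$ by the $n$-fold product measure space $(X^n, \mathcal{A}^{\otimes n}, \mu^{\times n})$ (which is again a probability space [✓]) and define $\tilde u_j(x_1, \ldots, x_n) = u_j(x_j)$ for $j = 1, 2, \ldots, n$. Since each of the new functions $\tilde u_j$ depends only on the variable $x_j$, their independence follows from a simple Fubini-type argument. A similar argument can be applied to countably many functions, provided we know how to construct infinite-dimensional products.

We will not follow this route but construct instead countably many independent functions $(X_j)_{j\in\mathbb{N}}$ on the probability space $([0,1], \mathcal{B}[0,1], \lambda = \lambda^1|_{[0,1]})$ which are identically distributed, i.e. the image measures satisfy $X_1(\lambda) = X_j(\lambda)$ for all $j \in \mathbb{N}$, with a Bernoulli distribution $X_1(\lambda) = p\,\delta_1 + (1-p)\,\delta_0$, $p \in (0,1)$. Consider the interval map $\beta_p : [0,1] \to [0,1]$,
$$\beta_p(x) = \frac{x}{p}\,1_{[0,p)}(x) + \frac{x-p}{1-p}\,1_{[p,1]}(x),$$
and its iterates $\beta_p^n = \underbrace{\beta_p \circ \cdots \circ \beta_p}_{n \text{ times}}$, see the pictures for the graphs of $\beta_p$ and $\beta_p^2$.

[Figure: graphs of $\beta_p$ (break point at $p$) and of $\beta_p^2$ (break points at $p^2$, $p$ and $2p - p^2$) on $[0,1]$.]

Define
$$X_n(x) = 1_{[0,p)}\big(\beta_p^{n-1}(x)\big), \qquad n \in \mathbb{N}.$$
In the first step the interval $[0,1]$ is split according to $p : (1-p)$ into two intervals $[0,p)$ and $[p,1]$, and $X_1$ is 1 on the left segment and 0 on the right. The subsequent iterations split each of the intervals of the previous step (say, step $n-1$) into two new sub-intervals according to the ratio $p : (1-p)$, and we define $X_n$ to be 1 on each new left subinterval and 0 otherwise, see the picture for $n = 1, 2$. Thus $\lambda(X_n = 1) = p$ and $\lambda(X_n = 0) = 1-p$, which means that the $X_n$ are identically Bernoulli distributed.

To see independence, fix $\epsilon_j \in \{0,1\}$, and observe that $\{X_1 = \epsilon_1\} \cap \{X_2 = \epsilon_2\} \cap \ldots \cap \{X_{n-1} = \epsilon_{n-1}\}$ exactly determines the segment before the $n$th split. Since each split preserves the proportion between $p$ and $1-p$, we find
$$\lambda\big(\{X_1 = \epsilon_1\} \cap \ldots \cap \{X_{n-1} = \epsilon_{n-1}\} \cap \{X_n = 1\}\big) = \lambda\big(\{X_1 = \epsilon_1\} \cap \ldots \cap \{X_{n-1} = \epsilon_{n-1}\}\big) \cdot p$$
so that
$$\lambda\big(\{X_1 = \epsilon_1\} \cap \ldots \cap \{X_{n-1} = \epsilon_{n-1}\} \cap \{X_n = \epsilon_n\}\big) = p^{\epsilon_1+\cdots+\epsilon_n}(1-p)^{n-\epsilon_1-\cdots-\epsilon_n} = \prod_{j=1}^n \lambda(X_j = \epsilon_j).$$
This shows that the $X_j$ are all independent.

For later reference purposes let us derive some formulae for the arithmetic means $\frac1n S_n = \frac1n(X_1 + X_2 + \cdots + X_n)$. The mean value is
$$\int \frac1n S_n\,d\lambda = \frac1n\int (X_1 + \cdots + X_n)\,d\lambda = \int X_1\,d\lambda = 1\cdot p + 0\cdot(1-p) = p$$
while the variance is given by
$$\begin{aligned}
\int \Big(\frac1n S_n - p\Big)^2 d\lambda
&= \frac{1}{n^2}\int \Big(\sum_{j=1}^n (X_j - p)\Big)^2 d\lambda \\
&= \frac{1}{n^2}\sum_{j,k=1}^n \int (X_j - p)(X_k - p)\,d\lambda \\
&= \frac{1}{n^2}\sum_{j=1}^n \int (X_j - p)^2\,d\lambda && \text{(independence)} \\
&= \frac1n \int (X_1 - p)^2\,d\lambda && \text{(identical distr.)} \\
&= \frac1n\big((1-p)^2 p + p^2(1-p)\big) \\
&= \frac1n\,p(1-p).
\end{aligned}$$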
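The interval-splitting construction above lends itself to an exact check. The following sketch uses rational arithmetic; the function names are illustrative, and `p` is assumed to be a `Fraction` strictly between 0 and 1. It builds the interval on which $X_1, \ldots, X_n$ take prescribed values, confirms with the interval map that the digits are as claimed, and recomputes the mean and variance of $\frac1n S_n$ by summing over all $2^n$ such intervals.

```python
from fractions import Fraction
from itertools import product

def beta(p, x):
    """The interval map: [0, p) and [p, 1] are each stretched linearly."""
    return x / p if x < p else (x - p) / (1 - p)

def X(n, p, x):
    """n-th digit X_n(x) = 1_[0,p)(beta_p^(n-1)(x)), n >= 1."""
    for _ in range(n - 1):
        x = beta(p, x)
    return 1 if x < p else 0

def cylinder(eps, p):
    """End points (a, b) of the segment where X_1 = eps[0], ..., X_n = eps[n-1]."""
    a, b = Fraction(0), Fraction(1)
    for e in eps:
        cut = a + p * (b - a)            # split the segment in the ratio p : (1 - p)
        a, b = (a, cut) if e == 1 else (cut, b)
    return a, b

def mean_and_variance(n, p):
    """Mean and variance of S_n / n, summed exactly over all 2**n segments."""
    mean = var = Fraction(0)
    for eps in product((0, 1), repeat=n):
        a, b = cylinder(eps, p)
        avg = Fraction(sum(eps), n)
        mean += (b - a) * avg
        var += (b - a) * (avg - p) ** 2
    return mean, var

p = Fraction(1, 3)
print(mean_and_variance(5, p))           # compare with p and p * (1 - p) / 5
```

The end point $1$ of the right-most segment has measure zero and is ignored in `cylinder`; this does not affect any of the computed measures.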

In the next chapter we study the convergence behaviour of a martingale uj j∈ ; therefore, it is natural to ask questions of the type from which index j onwards does uj x exceed a certain threshold, etc. This means that we must be able to admit indices which may depend on the argument x of uj x: ux x. The problem is measurability. 17.5 Definition Let X j be a -finite filtered measure space. A stopping time is a map X → ∪ which satisfies j ∈ j for all j ∈ . The associated -algebra is given by = A ∈ A ∩ j ∈ j ∀ j ∈ As usual, we write u x instead of the more correct ux x. 17.6 Lemma Let be stopping times on a -finite filtered measure space X j . (i) ∧ , ∨ , + k, k ∈ 0 are stopping times. (ii) < ∈ ∩ and ⊂ if . (iii) If uj is a sequence of real functions such that uj ∈ j , then u is / -measurable.


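Proof (i) follows immediately from the identities

$$\{\sigma\wedge\tau\le j\} = \{\sigma\le j\}\cup\{\tau\le j\}\in\mathcal A_j,$$
$$\{\sigma\vee\tau\le j\} = \{\sigma\le j\}\cap\{\tau\le j\}\in\mathcal A_j,$$
$$\{\sigma+k\le j\} = \{\sigma\le j-k\}\in\mathcal A_{(j-k)\vee 0}\subset\mathcal A_j.$$

(ii) Since for all $j\in\mathbb N$

$$\{\sigma<\tau\}\cap\{\sigma\le j\} = \bigcup_{k=1}^j\{\sigma=k\}\cap\{k<\tau\} = \bigcup_{k=1}^j\underbrace{\{\sigma\le k\}}_{\in\mathcal A_k}\cap\underbrace{\{\sigma\le k-1\}^c}_{\in\mathcal A_k}\cap\underbrace{\{\tau\le k\}^c}_{\in\mathcal A_k}\in\mathcal A_j,$$

we find that $\{\sigma<\tau\}\in\mathcal A_\sigma$, while a similar calculation for $\{\sigma<\tau\}\cap\{\tau\le j\}$ yields $\{\sigma<\tau\}\in\mathcal A_\tau$. If $\sigma\le\tau$ we find for $A\in\mathcal A_\sigma$

$$A\cap\{\tau\le j\} = \underbrace{A\cap\{\sigma\le j\}}_{\in\mathcal A_j}\cap\underbrace{\{\tau\le j\}}_{\in\mathcal A_j}\in\mathcal A_j,$$

i.e. $A\in\mathcal A_\tau$, hence $\mathcal A_\sigma\subset\mathcal A_\tau$.

(iii) We have for all $B\in\mathcal B(\mathbb R)$ and $j\in\mathbb N$

$$\{u_\sigma\in B\}\cap\{\sigma\le j\} = \bigcup_{k=1}^j\{u_k\in B\}\cap\{\sigma=k\} = \bigcup_{k=1}^j\underbrace{\{u_k\in B\}}_{\in\mathcal A_k}\cap\underbrace{\{\sigma\le k\}}_{\in\mathcal A_k}\cap\underbrace{\{\sigma\le k-1\}^c}_{\in\mathcal A_k}\in\mathcal A_j.$$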
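The defining condition $\{\tau\le j\}\in\mathcal A_j$ can be made concrete on a finite model. The sketch below is an illustration under assumptions that are not in the text: the space consists of all $\pm1$ paths of length $n$, and $\mathcal A_j$ is taken to be generated by the first $j$ steps, so that a set lies in $\mathcal A_j$ exactly when membership is decided by the prefix of length $j$. The names `first_hit`, `last_max` and `is_stopping_time` are hypothetical helpers.

```python
from itertools import product


def first_hit(path, level):
    """First index j (1-based) at which the partial sum reaches `level`; inf if never."""
    total = 0
    for j, step in enumerate(path, start=1):
        total += step
        if total >= level:
            return j
    return float("inf")


def last_max(path):
    """Last index at which the partial sums attain their maximum (needs the whole path)."""
    best, best_j, total = None, None, 0
    for j, step in enumerate(path, start=1):
        total += step
        if best is None or total >= best:
            best, best_j = total, j
    return best_j


def is_stopping_time(tau, n):
    """Check that {tau <= j} is decided by the first j steps, for every j = 1..n."""
    for j in range(1, n + 1):
        decided = {}
        for path in product((-1, 1), repeat=n):
            flag = tau(path) <= j
            # Two paths with the same prefix of length j must agree on {tau <= j}.
            if decided.setdefault(path[:j], flag) != flag:
                return False
    return True


n = 5
hit = lambda path: first_hit(path, 2)
print(is_stopping_time(hit, n))                              # first hitting time
print(is_stopping_time(lambda path: min(hit(path), 3), n))   # minimum with a constant
print(is_stopping_time(last_max, n))                         # looks into the future
```

In this model the first hitting time and its minimum with a constant pass the check, in line with Lemma 17.6(i), whereas the time of the last maximum fails because it cannot be decided without knowing the future of the path.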

The next result is a very useful characterization of (sub-)martingales.

17.7 Theorem Let $(X,\mathcal A,\mathcal A_j,\mu)$ be a $\sigma$-finite filtered measure space. For a sequence $(u_j)_{j\in\mathbb N}$, $u_j\in\mathcal L^1(\mathcal A_j)$, the following assertions are equivalent:
(i) $(u_j,\mathcal A_j)_{j\in\mathbb N}$ is a submartingale;
(ii) $\int u_\sigma\,d\mu\le\int u_\tau\,d\mu$ for all bounded stopping times $\sigma\le\tau$;
(iii) $\int_A u_\sigma\,d\mu\le\int_A u_\tau\,d\mu$ for all bounded stopping times $\sigma\le\tau$ and $A\in\mathcal A_\sigma$.

Proof (i)⇒(ii): Let $\sigma\le\tau\le N$ be two stopping times. By Lemma 17.6 $u_\sigma$ is measurable, and since […] we see

u d =

=

(17.2)

=

=

=

=

u d + u d + u d +

N −1 j=1 k uj b ∧ N

(as usual we set inf ∅ = +). Then 0 = 0 < 1 1 2 N = N = N

By the very definition of an upcrossing we find b − a Ua b N u 1 − a + u 2 − u2 + · · · + u N − uN b−a

b−a

and integrating both sides of this inequality over A ∈ 0 yields, after some simple rearrangements, b − a Ua b N d A

−

17.7

A

A

a d +

A

0

u 1 − u2 d + · · · +

u N − a d

A

A

0

u N −1 − uN d +

A

u N d

u N − a+ d

The upcrossing lemma is the basis for all martingale convergence theorems.

18.2 Theorem (Submartingale convergence) Let $(u_j,\mathcal A_j)_{j\in\mathbb N}$ be a submartingale on the $\sigma$-finite filtered measure space $(X,\mathcal A,\mathcal A_j,\mu)$. If $\sup_{j\in\mathbb N}\int u_j^+\,d\mu<\infty$, then $u_\infty(x) = \lim_{j\to\infty}u_j(x)$ exists for almost all $x\in X$ and defines an $\mathcal A_\infty$-measurable function.

Before we give the details of the proof, let us note some immediate consequences.


18.3 Corollary Under any of the following conditions the pointwise limit $\lim_{j\to\infty}u_j$ exists a.e. in $\mathbb R$:
(i) $(u_j)_{j\in\mathbb N}$ is a supermartingale and $\sup_{j\in\mathbb N}\int u_j^-\,d\mu<\infty$.
(ii) $(u_j)_{j\in\mathbb N}$ is a positive supermartingale.
(iii) $(u_j)_{j\in\mathbb N}$ is a martingale and $\sup_{j\in\mathbb N}\int|u_j|\,d\mu<\infty$.

Proof (of Theorem 18.2) In view of (18.1) we have

$$\Big\{x : \lim_{j\to\infty}u_j(x)\text{ does not exist}\Big\} = \Big\{x : \limsup_{j\to\infty}u_j(x) > \liminf_{j\to\infty}u_j(x)\Big\}$$

x sup Ua b N x =

=

aw

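18.6 Theorem (Convergence of UI submartingales) Let $(u_j,\mathcal A_j)_{j\in\mathbb N}$ be a submartingale on the $\sigma$-finite filtered measure space $(X,\mathcal A,\mathcal A_j,\mu)$. Then the following assertions are equivalent:
(i) $u_\infty(x) = \lim_{j\to\infty}u_j(x)$ exists a.e., $u_\infty\in\mathcal L^1(\mathcal A_\infty)$, $\lim_{j\to\infty}\int u_j\,d\mu = \int u_\infty\,d\mu$, and $(u_j,\mathcal A_j)_{j\in\mathbb N\cup\{\infty\}}$ is a submartingale.
(ii) $(u_j)_{j\in\mathbb N}$ is uniformly integrable.
(iii) $(u_j)_{j\in\mathbb N}$ converges in $\mathcal L^1(\mathcal A_\infty)$.

Proof (i)⇒(ii): Since $\mu|_{\mathcal A_0}$ is $\sigma$-finite, we can fix an exhausting sequence $(A_k)_{k\in\mathbb N}\subset\mathcal A_0$ with $A_k\uparrow X$ and $\mu(A_k)<\infty$. It is not hard to see that the function $w := \sum_{k=1}^\infty 2^{-k}(1+\mu(A_k))^{-1}\mathbf 1_{A_k}$ is strictly positive, $w>0$, and integrable, $w\in\mathcal L^1(\mathcal A_0)$. Because of $u_\infty\in\mathcal L^1(\mathcal A_\infty)$, we find for every $\epsilon>0$ some $\beta>0$ and some $N\in\mathbb N$ such that

$$\int_{\{u_\infty^+>\beta\}}u_\infty^+\,d\mu + \int_{A_j^c}u_\infty^+\,d\mu < \epsilon\qquad\text{for all } j\ge N.$$

Example 17.3(iv) shows that $(u_j^+)_{j\in\mathbb N\cup\{\infty\}}$ is still a submartingale, so that for every $L>0$

$$\begin{aligned}
\int_{\{u_j^+>Lw\}}u_j^+\,d\mu
&\le \int_{\{u_j^+>Lw\}}u_\infty^+\,d\mu \\
&\le \int_{\{u_j^+>Lw\}\cap\{u_\infty^+\le\beta\}\cap A_N}u_\infty^+\,d\mu + \int_{\{u_\infty^+>\beta\}\cup A_N^c}u_\infty^+\,d\mu \\
&\le \beta\,\mu\big(\{u_j^+>Lw\}\cap A_N\big) + \epsilon \\
&\le \beta\,\mu\big(u_j^+>L\,2^{-N}(1+\mu(A_N))^{-1}\big) + \epsilon,
\end{aligned}$$

where we used that $w\ge 2^{-N}(1+\mu(A_N))^{-1}$ on $A_N$. The Markov inequality P10.12 and the submartingale property imply

$$\sup_{j\in\mathbb N}\int_{\{u_j^+>Lw\}}u_j^+\,d\mu \le \beta\,\frac{2^N(1+\mu(A_N))}{L}\,\sup_{j\in\mathbb N}\int u_j^+\,d\mu + \epsilon \le \beta\,\frac{2^N(1+\mu(A_N))}{L}\int u_\infty^+\,d\mu + \epsilon.$$

Since we may choose $L>0$ arbitrarily large, we have found that $(u_j^+)_{j\in\mathbb N}$ is uniformly integrable. From $\lim_{j\to\infty}u_j = u_\infty$ a.e., we conclude $\lim_{j\to\infty}u_j^+ = u_\infty^+$, and Vitali’s convergence theorem 16.6 shows that $\lim_{j\to\infty}\int u_j^+\,d\mu = \int u_\infty^+\,d\mu$. Thus

$$\int|u_j|\,d\mu = \int(2u_j^+ - u_j)\,d\mu \xrightarrow{\,j\to\infty\,} \int(2u_\infty^+ - u_\infty)\,d\mu = \int|u_\infty|\,d\mu$$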

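and another application of Vitali’s theorem proves that $(u_j)_{j\in\mathbb N}$ is uniformly integrable.

(ii)⇒(iii): Because of uniform integrability we have for some $\epsilon>0$ and a suitable $w_\epsilon\in\mathcal L^1_+$

$$\int|u_j|\,d\mu = \int_{\{|u_j|>w_\epsilon\}}|u_j|\,d\mu + \int_{\{|u_j|\le w_\epsilon\}}|u_j|\,d\mu \le \epsilon + \int w_\epsilon\,d\mu < \infty,$$

and the martingale convergence theorem 18.2 guarantees that the pointwise limit $u_\infty = \lim_{j\to\infty}u_j$ exists a.e.; $\mathcal L^1$-convergence follows from Vitali’s convergence theorem 16.6.

(iii)⇒(i): Since $u = \mathcal L^1\text{-}\lim_{j\to\infty}u_j$ exists we find (e.g. as in Theorem 12.10) that $\sup_{j\in\mathbb N}\int|u_j|\,d\mu<\infty$. By the martingale convergence theorem 18.2, the pointwise limit $u_\infty = \lim_{j\to\infty}u_j$ exists a.e. On the other hand, by Corollary 12.8, $u = \lim_{k\to\infty}u_{j_k}$ a.e. for some subsequence. This implies that $u = u_\infty$ a.e. and, in particular, that $u_\infty = \mathcal L^1\text{-}\lim_{j\to\infty}u_j$; this entails $\lim_{j\to\infty}\int_A u_j\,d\mu = \int_A u_\infty\,d\mu$ for all $A\in\mathcal A_\infty$.[✓] Since $(u_j)_{j\in\mathbb N}$ is a submartingale, we find for all $k>j$ and $A\in\mathcal A_j$

$$\int_A u_j\,d\mu \le \int_A u_k\,d\mu \xrightarrow{\,k\to\infty\,} \int_A u_\infty\,d\mu,$$

so that $(u_j)_{j\in\mathbb N\cup\{\infty\}}$ is also a submartingale.

Again, $\mathcal L^1$-convergence of backwards (sub-)martingales holds under much weaker assumptions.

18.7 Theorem Let $(w_\nu,\mathcal A_\nu)_{\nu\in-\mathbb N}$ be a backwards submartingale and assume that $\mu|_{\mathcal A_{-\infty}}$ is $\sigma$-finite. Then
(i) $\lim_{j\to\infty}w_{-j} = w_{-\infty}\in[-\infty,\infty)$ exists a.e.
(ii) $\mathcal L^1\text{-}\lim_{j\to\infty}w_{-j} = w_{-\infty}$ if, and only if, $\inf_{j\in\mathbb N}\int w_{-j}\,d\mu>-\infty$. In this case, $(w_\nu,\mathcal A_\nu)_{\nu\in-\mathbb N\cup\{-\infty\}}$ is a submartingale and $w_{-\infty}$ is a.e. real-valued.

For a backwards martingale, the condition in (ii) is automatically satisfied.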


Proof Part (i) has already been proved in Corollary 18.5. For (ii) we start with the observation that for a backwards submartingale

$$\sup_{j\in\mathbb N}\int|w_{-j}|\,d\mu<\infty \iff \inf_{j\in\mathbb N}\int w_{-j}\,d\mu>-\infty \iff \lim_{j\to\infty}\int w_{-j}\,d\mu\in\mathbb R.$$

Indeed: the second equivalence follows from the submartingale property, $\int w_{-j-1}\,d\mu\le\int w_{-j}\,d\mu\le\int w_{-1}\,d\mu$, while ‘⇐’ of the first equivalence derives from the fact that $(w_\nu^+)_{\nu\in-\mathbb N}$ is again a submartingale, cf. Example 17.3(iv), and

$$\int|w_{-j}|\,d\mu = \int(2w_{-j}^+ - w_{-j})\,d\mu \le 2\int w_{-1}^+\,d\mu - \int w_{-j}\,d\mu;$$

the other direction ‘⇒’ is obvious. With exactly the same reasoning which was used in the proof of T18.6, (i)⇒(ii), we can now show that $(w_\nu^+)_{\nu\in-\mathbb N}$ and $(w_\nu)_{\nu\in-\mathbb N}$ are uniformly integrable (of course, the function $w$ used as a bound for uniform integrability is now $\mathcal A_{-\infty}$-measurable). The submartingale property of $(w_\nu)_{\nu\in-\mathbb N\cup\{-\infty\}}$ follows literally with the same arguments as the corresponding assertion in (iii)⇒(i) of T18.6.

We close this chapter with a simple but far-reaching application of the (backwards) martingale convergence theorem.

18.8 Example (Kolmogorov’s strong law of large numbers) For every sequence $(X_j)_{j\in\mathbb N}$ of identically distributed independent random variables on the probability space $(\Omega,\mathcal A,P)$ – that is, all $X_j:\Omega\to\mathbb R$ are measurable, independent functions (in the sense of Example 17.3(x) and Scholium 17.4) such that $X_j(P) = X_1(P)$ for all $j\in\mathbb N$ – the strong law of large numbers holds, i.e. the limit

$$\lim_{n\to\infty}\frac1n\big(X_1(\omega)+\dots+X_n(\omega)\big)$$

exists and is finite for a.e. $\omega\in\Omega$ if, and only if, the $X_j$ are integrable. If this is the case, the above limit is given by $\int X_1\,dP$.

Sufficiency: Suppose the $X_j$ are integrable. Then $Y_j := X_j - \int X_j\,dP$ are again independent identically distributed random variables with zero mean: $\int Y_j\,dP = 0$. Set $S_n := Y_1+Y_2+\dots+Y_n$ and $\mathcal A_{-n} := \sigma(S_n,S_{n+1},S_{n+2},\dots)$; then $\big(\frac1n S_n,\mathcal A_{-n}\big)_{n\in\mathbb N}$ is a backwards martingale. In fact, any function of $(Y_1,Y_2,\dots,Y_n,S_n)$ is independent of $(Y_{n+1},Y_{n+2},\dots)$, and (17.6) yields for every set of the form $A = \bigcap_{j=1}^N\{Y_{n+j}\in B_j\}\cap\{S_n\in B_0\}$, $B_0,\dots,B_N\in\mathcal B(\mathbb R)$, $N\in\mathbb N$, and all $k = 1,2,\dots,n$

$$\begin{aligned}
\int_A Y_k\,dP
&= \int_{\bigcap_{j=1}^N\{Y_{n+j}\in B_j\}}\mathbf 1_{\{S_n\in B_0\}}\,Y_k\,dP \\
&= \int_{\{S_n\in B_0\}}Y_k\,dP\cdot P\Big(\bigcap_{j=1}^N\{Y_{n+j}\in B_j\}\Big) && \text{(by (17.6))} \\
&= \int_{\{S_n\in B_0\}}Y_1\,dP\cdot P\Big(\bigcap_{j=1}^N\{Y_{n+j}\in B_j\}\Big),
\end{aligned}$$

noting that the $Y_k$ are identically distributed. Summing over $k = 1,\dots,n$ gives

$$\int_A S_n\,dP = n\int_{\{S_n\in B_0\}}Y_1\,dP\cdot P\Big(\bigcap_{j=1}^N\{Y_{n+j}\in B_j\}\Big) = n\int_A Y_1\,dP.$$

This means that $\int_A Y_1\,dP = \int_A\frac1n S_n\,dP$ for all $n\in\mathbb N$ and all sets $A$ from a generator of $\mathcal A_{-n}$ which clearly satisfies the conditions of Remark 17.2(i), proving that $\big(\frac1n S_n,\mathcal A_{-n}\big)_{n\in\mathbb N}$ is a backwards martingale. Theorem 18.7 now guarantees that

$$L = \lim_{n\to\infty}\frac{S_n}{n} = \lim_{n\to\infty}\frac{S_{n^2}}{n^2}\qquad\text{exists a.e. and in }\mathcal L^1.$$

It remains to show that L = 0 a.e. Note that limn→ Sn /n2 = 0 a.e.; since e− x 1 and since constants are integrable, the dominated convergence theorem 11.2 and independence (17.7) show 2 S −S e− L dP = lim exp − Snn exp − n2n2 n dP n→ S −S = lim exp − Snn exp − n2n2 n dP n→

S −S exp − Snn dP exp − n2n2 n dP = lim n→

=

e− L dP

2

Thus

2 2 2 e− L − e− L dP dP = e− L dP − e− L dP = 0


and we conclude with Theorem 10.9(i) that e− L = e− L dP a.e.; as a consequence, L is almost everywhere constant. Using L = L1 - limn→ Sn /n, we get S n L = L dP = lim dP = 0 a.e. n→ n =0

Necessity: Suppose the a.e. limit $L = \lim_{n\to\infty}\frac1n(X_1+\dots+X_n)$ exists and is finite. If all $X_j$ were positive, we could argue as follows: the truncated random variables $X_j^c := X_j\wedge c$ are still independent and identically distributed. Since they are also integrable, the sufficiency direction of Kolmogorov’s law shows that for all $c>0$

$$\int X_1^c\,dP = \lim_{n\to\infty}\frac{X_1^c+\dots+X_n^c}{n} \le \lim_{n\to\infty}\frac{X_1+\dots+X_n}{n} = L.$$

Letting $c\to\infty$, Beppo Levi’s theorem 9.6 proves $\int X_1\,dP<\infty$. Such a simple argument is not available in the general case. For this we need the converse or ‘difficult’ half of the Borel–Cantelli lemma (cf. Problem 6.9).

18.9 Theorem (Borel–Cantelli) Let $(\Omega,\mathcal A,P)$ be a probability space and $(A_j)_{j\in\mathbb N}\subset\mathcal A$. Then

$$\sum_{j=1}^\infty P(A_j)<\infty \implies P\big(\limsup_{j\to\infty}A_j\big) = 0;$$

if the sets $A_j$ are pairwise independent,¹ then

$$\sum_{j=1}^\infty P(A_j) = \infty \implies P\big(\limsup_{j\to\infty}A_j\big) = 1.$$

¹ i.e. $P(A_j\cap A_k) = P(A_j)P(A_k)$ for all $j\ne k$.

Proof Recall that $\limsup_j A_j = \bigcap_k\bigcup_{j\ge k}A_j$. Thus $\omega\in\limsup_j A_j$ if, and only if, $\omega$ appears in infinitely many of the $A_j$. This shows that $\limsup_j A_j = \big\{\sum_{j=1}^\infty\mathbf 1_{A_j} = \infty\big\}$. The first of the two implications follows thus: by the Beppo Levi theorem for series C9.9, we see

$$\int\sum_{j=1}^\infty\mathbf 1_{A_j}\,dP = \sum_{j=1}^\infty\int\mathbf 1_{A_j}\,dP = \sum_{j=1}^\infty P(A_j)<\infty.$$

Corollary 10.13 then shows $\sum_{j=1}^\infty\mathbf 1_{A_j}<\infty$ a.e., and $P(\limsup_{j\to\infty}A_j) = 0$ follows.


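For the second implication we set $S_n := \sum_{j=1}^n\mathbf 1_{A_j}$ and $S := \sum_{j=1}^\infty\mathbf 1_{A_j}$. Then $m_n := \int S_n\,dP = \sum_{j=1}^n P(A_j)$ and, by pairwise independence,

$$\begin{aligned}
\int(S_n-m_n)^2\,dP
&= \sum_{j,k=1}^n\int\big(\mathbf 1_{A_j}-P(A_j)\big)\big(\mathbf 1_{A_k}-P(A_k)\big)\,dP \\
&= \sum_{j=1}^n\int\big(\mathbf 1_{A_j}-P(A_j)\big)^2\,dP \\
&= \sum_{j=1}^n P(A_j)\big(1-P(A_j)\big) \le m_n.
\end{aligned}$$

Since $S_n\le S$, we can use Markov’s inequality P10.12 to get

$$\begin{aligned}
P\big(S\le\tfrac12 m_n\big) \le P\big(S_n\le\tfrac12 m_n\big)
&= P\big(S_n-m_n\le-\tfrac12 m_n\big) \\
&\le P\big(|S_n-m_n|\ge\tfrac12 m_n\big) = P\big((S_n-m_n)^2\ge\tfrac14 m_n^2\big) \\
&\le \frac4{m_n^2}\int(S_n-m_n)^2\,dP \le \frac4{m_n}.
\end{aligned}$$

By assumption $m_n\xrightarrow{n\to\infty}\infty$, hence $P(S<\infty) = \lim_{n\to\infty}P(S\le\frac12 m_n) = 0$.

18.8 Example (continued) We can now continue with the proof of the necessity part of Kolmogorov’s strong law of large numbers. Since the a.e. limit exists, we get (writing now $S_n := X_1+\dots+X_n$)

$$\frac{X_n}{n} = \frac{S_n}{n} - \frac{n-1}{n}\cdot\frac{S_{n-1}}{n-1} \xrightarrow{\,n\to\infty\,} 0,$$

which shows that $\omega\in A_n := \{|X_n|\ge n\}$ happens only for finitely many $n$. In other words, $P\big(\sum_{j=1}^\infty\mathbf 1_{A_j} = \infty\big) = 0$; since the $A_n$ are all independent, the Borel–Cantelli lemma T18.9 shows that $\sum_{j=1}^\infty P(A_j)<\infty$. Thus

$$\begin{aligned}
\int|X_1|\,dP
&= \sum_{j=1}^\infty\int_{\{j-1\le|X_1|<j\}}|X_1|\,dP
\le \sum_{j=1}^\infty j\,P\big(j-1\le|X_1|<j\big) \\
&= \sum_{j=1}^\infty\sum_{n=1}^j P\big(j-1\le|X_1|<j\big)
= \sum_{n=1}^\infty\sum_{j=n}^\infty P\big(j-1\le|X_1|<j\big) \\
&= 1 + \sum_{n=1}^\infty P\big(|X_1|\ge n\big)
= 1 + \sum_{n=1}^\infty P\big(|X_n|\ge n\big) < \infty,
\end{aligned}$$

since $X_1$ and $X_n$ have the same distribution.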
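The statement of the strong law can be illustrated numerically. The following sketch is not part of the original example; it simulates independent Bernoulli($p$) variables with Python's standard pseudo-random generator and records the running means $\frac1n(X_1+\dots+X_n)$ along one path. The sample size, the parameter `p`, the seed and the function name `running_means` are assumptions made for this illustration, and a simulation can of course only suggest, not prove, almost sure convergence.

```python
import random


def running_means(n, p, seed=0):
    """Running means (X_1 + ... + X_k) / k, k = 1..n, of simulated Bernoulli(p) variables."""
    rng = random.Random(seed)      # fixed seed so that the run is reproducible
    total = 0
    means = []
    for k in range(1, n + 1):
        total += 1 if rng.random() < p else 0
        means.append(total / k)
    return means


p = 0.3
means = running_means(200_000, p)
for k in (10, 1_000, 200_000):
    print(k, means[k - 1])
```

Along the simulated path the running means settle near $\int X_1\,dP = p$, which is the limit identified in Example 18.8 for integrable $X_j$.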


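We will see more applications of the martingale convergence theorems in the following chapters.

Problems

Unless otherwise stated $(X,\mathcal A,\mathcal A_j,\mu)$ will be a $\sigma$-finite filtered measure space.

18.1. Verify that the random times $\sigma_k$ and $\tau_k$ defined in the proof of Lemma 18.1 are stopping times.

18.2. Let $(\mathcal A_{-j})_{j\in\mathbb N}$ be a decreasing filtration such that $\mu|_{\mathcal A_{-\infty}}$ is $\sigma$-finite. Assume that $(u_{-j},\mathcal A_{-j})_{j\in\mathbb N}$ is a backwards supermartingale which converges a.e. to a real-valued function $u_{-\infty}\in\mathcal L^1(\mathcal A_{-\infty})$ which closes the supermartingale to the left, i.e. such that $(u_{-j},\mathcal A_{-j})_{j\in\mathbb N\cup\{\infty\}}$ is still a supermartingale. Then

$$\lim_{j\to\infty}\int u_{-j}\,d\mu = \int u_{-\infty}\,d\mu.$$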

18.3. Let uj j j∈ be a supermartingale such that uj 0 and limj→ uj d = 0. j→

Then uj −−→ 0 pointwise a.e. and in 1 . Remark: Positive supermartingales with limj→ uj d = 0 are called potentials. 18.4. Let uj j j∈ be a martingale. If 1 -limj→ uj exists, then the pointwise limit limj→ uj x exists for almost every x. 18.5. Let P be a probability space. Find a martingale uj j∈ for which 0 < Puj converges < 1. [Hint: take a sequence Xk k∈0 of independent Bernoulli 21 21 -distributed random variables with values ±1; try uj = 21 X0 + 1X1 + X2 + · · · + Xj .] 18.6. The followingexercise furnishes an example of a martingale Mj j∈ on the probability space 0 1 0 1 = 1 01 such that -limj→ Mj exists but the pointwise limit limj→ Mj x doesn’t. Compare this with Problem 18.4. (i) Construct a sequence Xj j∈ of independent, identically Bernoulli distributed random variables with X1 = 1 = X1 = −1 = 21 . (ii) Let n = X1 X2n . Show that An = X2n−1 +1 + · · · + X2n = 0 is for each n ∈ contained in n and lim An = 0

n→

and

lim sup An = 1

n→

Conclude that the set of all x for which limn 1An x exists is a null set. [Hint: use the ‘difficult’ direction √ of the Borel–Cantelli lemma T18.9. Moreover, Stirling’s formula n! ∼ 2n n/en might come in handy.] (iii) The sequence M0 = 0 and Mn+1 = Mn 1 + X2n +1 + 1An X2n +1 , n ∈ 0 , defines a martingale Mn n n1 . (iv) Show that Mn+1 = 0 21 Mn = 0 + An . (v) Show that for every x ∈ limn Mn exists the limit limn 1An x exists, too. Conclude that limn Mn = 0 = 1 and that limn Mn exists = 0.



1 18.7. Consider the probability space P with Pj = 1j − j+1 . Set n = 1 2 n n + 1 ∩

and show that Xn = n + 11n+1∩ , n ∈ , is a positive martingale such that Xn dP = 1, limn→ Xn = 0 but supn∈ Xn = . 2 18.8. martingales A martingale uj j j∈ is called 2 -bounded, if supj∈ -bounded 2 uj d < . For ease of notation set u0 = 0. (i) Show that uj j∈ is 2 -bounded if, and only if, uj − uj−1 2 d < . j=1

[Hint: use Problem 17.6.] Assume from now on that uj j∈ is 2 -bounded. (ii) Show that lim uj = u exists a.e. j→

[Hint: argue that u2j j∈ is a submartingale.] (iii) Show that lim u − uj 2 d = 0. j→ 2 [Hint: check that uj+k − uj 2 d = j+k =j+1 u − u−1 d and apply Fatou’s lemma T9.11.] (iv) Assume now that X < . Show that uj j∈ is uniformly integrable, that j→

uj −−→ u in 1 and that u = u closes the martingale to the right, i.e. that uj j∈∪ is again a martingale. 18.9. Let P be a probability space. (i) Let j j∈ be a sequence of independent identically Bernoulli 21 21 -distributed random variables with values ±1. Show that for any sequence yj j∈

yj2 <

⇐⇒

j=1

j yj

converges a.e.

j=1

(ii) Generalize (i) to a sequence of independent random variables Xj j∈ with zero mean Xj dP = 0 and finite variances Xj2 dP = j2 < and prove

j2 <

=⇒

j=1

Xj

converges a.e.

j=1

[Hint: consider the martingale Sn = X1 + · · · + Xn and use Problem 18.8.] (iii) If Xj C for all j ∈ , the converse of (ii) is also true, i.e. j=1

j2 <

⇐⇒

Xj

converges a.e.

j=1

[Hint: show that Mn = X1 + · · · + Xn 2 − 12 + · · · + n2 = Sn2 − An is a martingale, use optional sampling 17.8 for Mn with = infj Mj > , observe that Mn∧ C + and that An∧ dP K + c2 .]

19 The Radon–Nikodým theorem and other applications of martingales

After our excursion into the theory of martingales we want to apply martingales to continue the development of measure and integration theory. The central topics of this chapter are
• the Radon–Nikodým theorem 19.2 and Lebesgue’s decomposition theorem 19.9;
• the Hardy–Littlewood maximal theorem 19.17;
• Lebesgue’s differentiation theorem 19.20.
For the last two we need (maximal) inequalities for martingales. These will be treated in a short interlude which is also of independent interest.

The Radon–Nikodým theorem

Let $(X,\mathcal A,\mu)$ be a measure space. We have seen in Lemma 10.8 that for any $f\in\mathcal L^1_+(\mathcal A)$ – or indeed for $f\in\mathcal M^+(\mathcal A)$ – the set-function $\nu := f\mu$ given by $\nu(A) := \int_A f(x)\,\mu(dx)$ is again a measure. From Theorem 10.9(ii) we know that

$$N\in\mathcal A,\quad \mu(N) = 0 \implies \nu(N) = 0. \tag{19.1}$$

This observation motivates the following

19.1 Definition Let $\mu,\nu$ be two measures on the measurable space $(X,\mathcal A)$. If (19.1) holds, we call $\nu$ absolutely continuous w.r.t. $\mu$ and write $\nu\ll\mu$.

Measures with densities are always absolutely continuous w.r.t. their base measure: $f\mu\ll\mu$. Remarkably, the converse is also true.

19.2 Theorem (Radon–Nikodým) Let $\mu,\nu$ be two measures on the measurable space $(X,\mathcal A)$. If $\mu$ is $\sigma$-finite, then the following assertions are equivalent:
(i) $\nu(A) = \int_A f(x)\,\mu(dx)$ for some a.e. unique $f\in\mathcal M^+(\mathcal A)$;
(ii) $\nu\ll\mu$.
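On a finite set the theorem reduces to elementary arithmetic, which makes it easy to experiment with. The sketch below is an illustration only and makes assumptions that are not in the text: $X$ is a four-point set, every subset is measurable, and the measures are given by hypothetical point masses `mu` and `nu`. The density is then the quotient of the point masses wherever $\mu$ charges a point, and its value on $\mu$-null points is irrelevant (here it is set to 0).

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical point masses on X = {"a", "b", "c", "d"}.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4), "d": Fraction(0)}
nu = {"a": Fraction(1, 3), "b": Fraction(2, 3), "c": Fraction(0), "d": Fraction(0)}


def is_absolutely_continuous(nu, mu):
    """nu << mu: every mu-null point is nu-null (enough on a finite set)."""
    return all(nu[x] == 0 for x in mu if mu[x] == 0)


def density(nu, mu):
    """f = nu/mu where mu > 0; the value on mu-null points does not matter."""
    return {x: nu[x] / mu[x] if mu[x] > 0 else Fraction(0) for x in mu}


def measure(m, A):
    return sum((m[x] for x in A), Fraction(0))


def integral(f, m, A):
    """Integral of f over A with respect to m, i.e. a finite weighted sum."""
    return sum((f[x] * m[x] for x in A), Fraction(0))


f = density(nu, mu)
points = list(mu)
subsets = [A for r in range(len(points) + 1) for A in combinations(points, r)]
print(is_absolutely_continuous(nu, mu))
print(all(measure(nu, A) == integral(f, mu, A) for A in subsets))
```

In this toy model $\nu\ll\mu$ holds and $\nu(A)=\int_A f\,d\mu$ for every subset $A$, while $\mu\ll\nu$ fails because $\nu$ does not charge the point `"c"`.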

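The unique function $f$ is called the Radon–Nikodým derivative and (traditionally) denoted by $f = d\nu/d\mu$. Above we have just verified that (i)⇒(ii). The converse direction is less obvious and we want to use a martingale argument for its proof. For this we need a few more preparations which extend the notion of martingale to directed index sets.

Let $I$ be any partially ordered index set. We call $I$ upwards filtering or upwards directed if

$$\alpha,\beta\in I \implies \exists\,\gamma\in I : \alpha\le\gamma,\ \beta\le\gamma. \tag{19.2}$$

A family $(\mathcal A_\alpha)_{\alpha\in I}$ of sub-$\sigma$-algebras of $\mathcal A$ is called a filtration if

$$\alpha,\beta\in I,\ \alpha\le\beta \implies \mathcal A_\alpha\subset\mathcal A_\beta;$$

as before, we set $\mathcal A_\infty := \sigma\big(\bigcup_{\alpha\in I}\mathcal A_\alpha\big)$, and we treat $\infty$ as the biggest element of $I\cup\{\infty\}$, i.e. $\alpha<\infty$ for all $\alpha\in I$. If a $\sigma$-algebra $\mathcal A_0\subset\mathcal A_\alpha$ for all $\alpha\in I$ and if $\mu|_{\mathcal A_0}$ is $\sigma$-finite, we call $(X,\mathcal A,\mathcal A_\alpha,\mu)$ a $\sigma$-finite filtered measure space.

19.3 Definition Let $(X,\mathcal A,\mathcal A_\alpha,\mu)$ be a $\sigma$-finite filtered measure space. A family of measurable functions $(u_\alpha)_{\alpha\in I}$ is called a martingale (w.r.t. the filtration $(\mathcal A_\alpha)_{\alpha\in I}$), if $u_\alpha\in\mathcal L^1(\mathcal A_\alpha)$ for each $\alpha\in I$ and if

$$\int_A u_\beta\,d\mu = \int_A u_\alpha\,d\mu\qquad\forall\,\alpha\le\beta,\ \forall\,A\in\mathcal A_\alpha. \tag{19.3}$$

The notion of convergence along an upwards filtering set is slightly more complicated than for the index set $\mathbb N$. We say

$$u = \mathcal L^1\text{-}\lim_{\alpha\in I}u_\alpha \iff \forall\,\epsilon>0\ \ \exists\,\gamma_\epsilon\in I\ \ \forall\,\alpha\ge\gamma_\epsilon : \|u-u_\alpha\|_1<\epsilon.$$

We can now extend Theorem 18.6.

19.4 Theorem Let $I$ be an upwards filtering index set, $(X,\mathcal A,\mathcal A_\alpha,\mu)$ be a $\sigma$-finite filtered measure space and $(u_\alpha,\mathcal A_\alpha)_{\alpha\in I}$ be a martingale. Then the following assertions are equivalent.
(i) There exists a unique $u_\infty\in\mathcal L^1(\mathcal A_\infty)$ such that $(u_\alpha,\mathcal A_\alpha)_{\alpha\in I\cup\{\infty\}}$ is a martingale. In this case $u_\infty = \mathcal L^1\text{-}\lim_{\alpha\in I}u_\alpha$.
(ii) $(u_\alpha)_{\alpha\in I}$ is uniformly integrable.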


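Proof (i)⇒(ii): (compare with T18.6) Denote by $(A_j)_{j\in\mathbb N}$ an exhausting sequence in $\mathcal A_0$. Since $u_\infty\in\mathcal L^1(\mathcal A_\infty)$, we find for every $\epsilon>0$ some $\beta>0$ and $N\in\mathbb N$ such that

$$\int_{\{|u_\infty|>\beta\}}|u_\infty|\,d\mu + \int_{A_j^c}|u_\infty|\,d\mu \le \epsilon\qquad\forall\,j\ge N.$$

Clearly, the function $w(x) := \sum_{j\in\mathbb N}2^{-j}(1+\mu(A_j))^{-1}\mathbf 1_{A_j}(x)$ is in $\mathcal L^1_+(\mathcal A_0)$, $w>0$ and, as $(|u_\alpha|)_{\alpha\in I\cup\{\infty\}}$ is a submartingale (cf. Example 17.3(v)), we find for every $L>0$

$$\begin{aligned}
\sup_{\alpha\in I}\int_{\{|u_\alpha|>Lw\}}|u_\alpha|\,d\mu
&\le \sup_{\alpha\in I}\int_{\{|u_\alpha|>Lw\}}|u_\infty|\,d\mu \\
&\le \sup_{\alpha\in I}\int_{\{|u_\alpha|>Lw\}\cap A_N\cap\{|u_\infty|\le\beta\}}|u_\infty|\,d\mu + \int_{A_N^c}|u_\infty|\,d\mu + \int_{\{|u_\infty|>\beta\}}|u_\infty|\,d\mu \\
&\le \beta\,\sup_{\alpha\in I}\mu\big(|u_\alpha|>L\,2^{-N}(1+\mu(A_N))^{-1}\big) + \epsilon
\end{aligned}$$

(use for the last step that $w(x)\ge 2^{-N}(1+\mu(A_N))^{-1}$ for $x\in A_N$). By Markov’s inequality P10.12 and the submartingale property we get

$$\sup_{\alpha\in I}\int_{\{|u_\alpha|>Lw\}}|u_\alpha|\,d\mu \le \beta\,\frac{2^N(1+\mu(A_N))}{L}\,\sup_{\alpha\in I}\int|u_\alpha|\,d\mu + \epsilon \le \beta\,\frac{2^N(1+\mu(A_N))}{L}\int|u_\infty|\,d\mu + \epsilon,$$

and (ii) follows since we can choose $L>0$ as large as we want.

(ii)⇒(i): Step 1: uniqueness. Assume that $u,w\in\mathcal L^1(\mathcal A_\infty)$ are two functions which close the martingale $(u_\alpha)_{\alpha\in I}$, i.e. functions satisfying

$$\int_A u\,d\mu = \int_A w\,d\mu = \int_A u_\alpha\,d\mu\qquad\forall\,A\in\mathcal A_\alpha,\ \alpha\in I.$$

Since $u$ and $w$ are integrable functions, the family

$$\Sigma := \Big\{A\in\mathcal A_\infty : \int_A u\,d\mu = \int_A w\,d\mu\Big\}$$

is a $\sigma$-algebra which satisfies $\bigcup_{\alpha\in I}\mathcal A_\alpha\subset\Sigma\subset\mathcal A_\infty$. Since $\mathcal A_\infty$ is generated by the $\mathcal A_\alpha$, we get $\Sigma = \mathcal A_\infty$, which means that $\int_A u\,d\mu = \int_A w\,d\mu$ holds for all $A\in\mathcal A_\infty$. Now Corollary 10.14 applies and we get $u = w$ almost everywhere.

Step 2: existence of the limit. We claim that

$$\forall\,\epsilon>0\ \ \exists\,\gamma_\epsilon\in I\ \ \forall\,\alpha,\beta\ge\gamma_\epsilon : \int|u_\alpha-u_\beta|\,d\mu<\epsilon. \tag{19.4}$$

Otherwise we could find a sequence $(\alpha_j)_{j\in\mathbb N}\subset I$ such that $\int|u_{\alpha_{j+1}}-u_{\alpha_j}|\,d\mu>\epsilon$ for all $j\in\mathbb N$. Since $I$ is upwards filtering, we can assume that $(\alpha_j)_{j\in\mathbb N}$ is an increasing sequence.[✓] Because of (ii), $(u_{\alpha_j},\mathcal A_{\alpha_j})_{j\in\mathbb N}$ is a uniformly integrable martingale with index set $\mathbb N$ which is, by construction, not an $\mathcal L^1$-Cauchy sequence. This contradicts Theorem 18.6.

We will now prove the existence of the $\mathcal L^1$-limit. Pick in (19.4) $\epsilon = \frac1n$ and choose $\gamma_{1/n}$. Since $I$ is upwards directed, we can assume that $\gamma_{1/n}$ increases as $n\to\infty$;[✓] thus $(u_{\gamma_{1/n}})_{n\in\mathbb N}\subset\mathcal L^1(\mathcal A_\infty)$ is an $\mathcal L^1$-Cauchy sequence. By Theorem 18.6 it converges in $\mathcal L^1$ and a.e. to some $u_\infty = \lim_{n\to\infty}u_{\gamma_{1/n}}\in\mathcal L^1(\mathcal A_\infty)$. Moreover, for all $A\in\mathcal A_\infty$ and $\alpha>\gamma_{1/n}$ we have

$$\int_A|u_\alpha-u_\infty|\,d\mu \le \underbrace{\int_A|u_\alpha-u_{\gamma_{1/n}}|\,d\mu}_{\le 1/n\text{ by (19.4)}} + \underbrace{\int_A|u_{\gamma_{1/n}}-u_\infty|\,d\mu}_{\le 1/n} \le \frac2n.$$

This shows, in particular, that $\int\mathbf 1_A u_\alpha\,d\mu\to\int\mathbf 1_A u_\infty\,d\mu$ for all $A\in\mathcal A_\infty$, and in view of step 1, $u_\infty$ is the only possible limit. The same argument that we used in (iii)⇒(i) of T18.6 now yields that $(u_\alpha)_{\alpha\in I\cup\{\infty\}}$ is still a martingale.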

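Theorem 19.4 does not claim that $u_\alpha\to u_\infty$ a.e. along $I$. This is, in general, false for non-linearly ordered index sets $I$, see e.g. Dieudonné [12]. That uncountable, partially ordered index sets are not at all artificial is shown by the following example which will be essential for the proof of Theorem 19.2.

19.5 Example Let $(X,\mathcal A,\mu)$ be a finite measure space and assume that $\nu$ is a measure such that $\nu\ll\mu$. Set

$$I := \Big\{\alpha = \{A_1,A_2,\dots,A_n\} : n\in\mathbb N,\ A_j\in\mathcal A\ \text{and}\ \dot{\bigcup}_{j=1}^n A_j = X\Big\}$$

and define an order relation ‘$\le$’ on $I$ through

$$\alpha\le\alpha' \iff \forall\,A\in\alpha : A = A'_1\,\dot\cup\,\dots\,\dot\cup\,A'_\ell\ \ \text{where } A'_k\in\alpha',\ \ell\in\mathbb N.$$

Since the common refinement of any two elements $\alpha,\alpha'\in I$, $\beta := \{A\cap A' : A\in\alpha,\ A'\in\alpha'\}$, is again in $I$ and satisfies $\alpha\le\beta$ and $\alpha'\le\beta$, it is clear that $I$ is upwards filtering. In particular,

$$(\mathcal A_\alpha)_{\alpha\in I}\qquad\text{where}\qquad\mathcal A_\alpha := \sigma(A : A\in\alpha)$$

is a filtration as $\mathcal A_\alpha\subset\mathcal A_{\alpha'}$ whenever $\alpha\le\alpha'$. Moreover, $(f_\alpha,\mathcal A_\alpha)_{\alpha\in I}$ defined by

$$f_\alpha := \sum_{A\in\alpha}\frac{\nu(A)}{\mu(A)}\,\mathbf 1_A,\qquad\text{where } \frac{\nu(A)}{\mu(A)} := 0 \text{ if } \mu(A) = 0,$$

is a martingale. Indeed, if $\alpha\le\alpha'$, $\alpha,\alpha'\in I$, then for $A\in\alpha$

$$\int_A f_\alpha\,d\mu = \begin{cases}\dfrac{\nu(A)}{\mu(A)}\,\mu(A) & \text{if } \mu(A)>0\\[1ex] 0 & \text{if } \mu(A) = 0\end{cases}\ \Bigg\} = \nu(A).$$

Similarly, for $A\in\alpha$ with $A = B_1\,\dot\cup\,\dots\,\dot\cup\,B_\ell$ and $B_1,\dots,B_\ell\in\alpha'$ as $\alpha\le\alpha'$,

$$\int_A f_{\alpha'}\,d\mu = \sum_{k=1}^\ell\int_{B_k}f_{\alpha'}\,d\mu = \sum_{k\,:\,\mu(B_k)>0}\frac{\nu(B_k)}{\mu(B_k)}\,\mu(B_k) \overset{*}{=} \sum_{k=1}^\ell\nu(B_k) = \nu(A),$$

where we used in $*$ that $\nu\ll\mu$, i.e. $\nu(B_k) = 0$ if $\mu(B_k) = 0$. Thus $\int_A f_\alpha\,d\mu = \int_A f_{\alpha'}\,d\mu$ for all $A\in\alpha$, hence on $\mathcal A_\alpha$ since all $A\in\alpha$ are disjoint and generate $\mathcal A_\alpha$ ([✓], cf. also Remark 17.2(i)).

What Example 19.5 really says is that

$$\nu(A) = \int_A f_\alpha\,d\mu\qquad\forall\,A\in\mathcal A_\alpha \tag{19.5}$$

or $\nu|_{\mathcal A_\alpha}\ll\mu|_{\mathcal A_\alpha}$ and $d\nu|_{\mathcal A_\alpha}/d\mu|_{\mathcal A_\alpha} = f_\alpha$. Heuristically we should expect that,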

if f −−−→ f exists, f is the Radon–Nikodým derivative d/d = f . This idea can be made rigorous and is the basis for the Proof (of Theorem 19.2 (ii)⇒(i)) Let us first assume that and are finite measures Denote by f ∈I the martingale of Example 19.5. It is enough to show that f = 1 - lim f exists and that =

∈I

Indeed, (19.6) combined with (19.5) implies A = f d ∀A ∈ A

∈I

(19.6)



and theorem 5.7 for measures extends this equality to = the uniqueness ∈I . Since A ∈ is trivially contained in where = A Ac – at this point we use the finiteness of the measure – we see ⊃ = ⊃ ⊃

∈I

∈I

and all that remains is to prove the existence of the limit in (19.6). In view of Theorem 19.4 we have to show that f ∈I is uniformly integrable. We claim that sup ∈I f > R for all large enough R = R > 0. Otherwise we could find some 0 > 0 with f > n > 0 for all n ∈ , so that n∈ f > n > 0 by the continuity of measures, T4.4. Since is a finite measure, 4.4 10.12 1 X f d = lim = 0 f > n = inf f > n lim n→ n n→ n n∈ n∈ which contradicts the fact that . Finally, f d = f d = f > R f >R

f >R

if R = R > 0 is sufficiently large, and uniform integrability follows since the constant function R ∈ 1 . The uniqueness of f follows also from Theorem 19.4. Assume that is finite and X = Denote by = F ∈ F < the sets with finite -measure. Obviously, is ∪-stable, and the constant c = sup F X < F ∈

can be approximated by an increasing sequence Fj j∈ ⊂ such that c = j∈ Fj = supj∈ Fj .[] When restricted to the set F = j∈ Fj , is by c , A ∈ , we have definition -finite, while for A ⊂ F either

A = A = 0

or

0 < A < A =

(19.7)

In fact, if A < , then Fj ∪ A ∈ for all j ∈ , which implies that · A = F ∪ · A = F + A = c + A Fj ∪ c j∈

that is A = 0, hence A = 0 by absolute continuity; if, however, A = we have again by absolute continuity that A > 0. Define now F0 = ∅ j = • ∩ Fj \ Fj−1 j = • ∩ Fj \ Fj−1


and it is clear that j j for every j ∈ . Since j j are finite measures, the first part of this proof shows that j = fj j . Obviously, the function f x if x ∈ Fj \ Fj−1 (19.8) fx = j c if x ∈ F fulfils = f . By construction, f is unique on the set F . But since every density f˜ of with respect to satisfies c c f˜ n ∩ F f˜ d n f˜ n ∩ F = < c fñ∩F c = f˜ n ∩ F c = 0 for all the alternative (19.7) reveals that f˜ n ∩ F n ∈ , i.e. that f˜ Fc = . In other words: f , as defined in (19.8), is also unique c. on F Assume that is -finite and X Let Aj j∈ ⊂ be an exhausting sequence with Aj ↑ X and Aj < . Then the measures 2−j h and where hx = 1 x 1 + Aj Aj j=1 have the same null sets.[] Therefore if, and only if, h . Since h is a finite measure[] , the first two parts of the proof show that = f · h = fh for a suitable density f ∈ + . The last equality needs proof: if f= M j=0 yj 1Aj is a positive simple function, A =

M A j=0

yj 1Aj dh =

M

yj

1Aj ∩A h d =

j=0

fh d A

and the general case follows from Beppo Levi’s theorem 9.6. Uniqueness is clear as f is h -a.e. unique, which implies that fh is -a.e. unique since h > 0. 19.6 Corollary Let X be a -finite measure space and = f . Then (i) X < ⇐⇒ f ∈ 1 ; (ii) is -finite ⇐⇒ f = = 0. Proof The first assertion (i) is obvious. For (ii) assume first that f = = 0. Since is -finite, we find an exhausting sequence Aj j∈ ⊂ with Aj ↑ X and Aj < . The sets Bk = 0 f k

B = f =


obviously satisfy

k∈ Bk ∪ B

Bk ∩ Aj =

Bk ∩Aj


= X as well as B = 0 and f d k d = k Aj < Aj

This shows that Aj ∩ Bk ∪ B jk∈ is an exhausting sequence for which means that is -finite. Conversely, let be -finite and assume that f = > 0. As we can find one exhausting sequence Ck k∈ ⊂ for both and [] , we see that f = = f = ∩ Ck ⊃ f = ∩ Ck0 k∈

for some fixed k0 ∈ with Ck0 > 0. But then Ck0 f d = f =∩Ck0

which is impossible. It is clear that not all measures are absolutely continuous with respect to each other. In some sense, the next notion is the opposite of absolute continuity. 19.7 Definition Two measures on a measurable space X are called (mutually) singular if there is a set N ∈ such that N = 0 = N c . We write in this case ⊥ (or ⊥ as ‘⊥’ is symmetric). 19.8 Examples Let X = n n . Then (i) x ⊥ n for all x ∈ n ; (ii) f ⊥ g if supp f ∩ supp g = ∅.1 The measures and are singular, if they have disjoint ‘supports’, that is, if lives in a region of X which is not charged by and vice versa. In this sense, Example 19.8(ii) is the model case for singular measures. In general, however, two measures are neither purely absolutely continuous nor purely singular, but are a mixture of both. 19.9 Theorem (Lebesgue decomposition) Let be two -finite measures on a measurable space X . Then there exists a (up to null sets) unique decomposition = + ⊥ where and ⊥ ⊥ . 1

supp f = f = 0.


Proof Obviously + is still a -finite measure[] , and + . In this situation Theorem 19.2 applies and shows that = f + = f + f

(19.9)

For any > 0 we conclude, in particular, that f 1 + = f d + f 1+

1 + f 1 + + 1 + f 1 + i.e. f 1 + = f 1 + = 0 for all , hence f > 1 = f > 1 = 0. Without loss of generality we may therefore assume that 0 f 1. In this case (19.9) can be rewritten as 1 − f = f

(19.10)

and on the set N = f = 1 we have N =

f =1

d =

(19.10)

f =1

f d =

f =1

1 − f d = 0

Therefore, ⊥ ⊥ where ⊥ = • ∩ f = 1, and for = • ∩ f < 1 we get from (19.10) A = A ∩ f < 1 =

A∩f 0 p = 1

(19.16)

1 < p <

(19.17)

+ 1.

Proof If we could show that the square maximal function u∗0 satisfies u∗0 u∗ , then (19.16), (19.17) would immediately follow from Doob’s inequalities C19.13, compare Example 19.14. The problem, however, is that a ball Br of radius r ∈ 41 2−k−1 41 2−k , k ∈ , need not entirely fall into any single square of our 0

lattice k :


r

r ∋

2

2

–k

0

∋

0

1

But if we move our lattice by 2 · 41 2−k = 21 2−k in certain (combinations of) coordinate directions, we can ‘catch’ Br inside a single cube Q of the shifted lattice.[] More precisely, if j ∈ 0 21 2−k e = 1 n then

e k = e + Qk z z ∈ 2−k n k ∈

e uk k∈

u∗e

are 2n filtrations with corresponding martingales and square maximal functions. As in Example 19.14 we find that 1 n u∗e s u 1 s

s > 0

(19.18)

Combining Corollary 15.15 with the translation invariance and scaling behaviour of Lebesgue measure we see that the volume of a ball Br of radius 41 2−k−1 r < 1 −k and arbitrary centre is 42 n n/2 r n n/2 41 2−k−1 15.15 n n n Br = r B1 = n2 + 1 n2 + 1 hence we get from x ∈ Br ⊂ Q and n Q = 2−k n that n Q 1 1 n u d u dn n Br Br n Br n Q Q −k n n 2 +1 2 1 u dn 1 n n n/2 −k Q Q 82 8 n √ n2 + 1 max u∗e x e

= n



This shows that u∗ n max u∗e and e

n u∗ s n

e

n

u∗e

e (19.18)

2n n

s n s

max u∗e

n

1 u n ds s

p u p for all shifts e, and Doob’s A very similar argument yields u∗e p p−1 ∗ inequality (19.14) applied to each ue finally shows

u∗ p n max u∗e p n u∗e p 2n n e

e

p u p p−1

All that remains to be done is to call cn = 2n n . The proof of Theorem 19.17 extends with very little effort to maximal functions of finite measures. 19.18 Definition Let be a locally finite2 measure on n n . maximal function is given by ∗ x = sup B Bx

The

B n B

where B ⊂ n stands for a generic open ball of any radius. If we replace in the proof of Theorem 19.17 the expression B u dn by B and u∗e x by

e Q ∗ k x ∈ Q Q∈ e x = sup n Q k∈ we arrive at the following generalization of (19.16) 19.19 Corollary Let be a finite measure on n n with total mass and maximal function ∗ . Then c (19.19) n ∗ s n s > 0 s n with the universal constant cn = √16 n2 + 1. 2

i.e. every point x ∈ n has a neighbourhood U = Ux such that U < . In n this is clearly equivalent to saying that B < for every open ball B.


Lebesgue’s differentiation theorem

Let us return once again to the Radon–Nikodým theorem 19.2. There we have seen that $\nu\ll\mu$ implies $\nu = f\mu$. The proof, though, shows even more, namely

$$\nu|_{\mathcal A_\alpha} = f_\alpha\,\mu|_{\mathcal A_\alpha}\qquad\text{and}\qquad\mathcal L^1\text{-}\lim_{\alpha\in I}f_\alpha = f$$

(notation as in T19.2). Let us consider a concrete measure space $(X,\mathcal A,\mu) = (\mathbb R^n,\mathcal B(\mathbb R^n),\lambda^n)$. In this case we could reduce our consideration to a countable sequence of $\sigma$-algebras (instead of $(\mathcal A_\alpha)_{\alpha\in I}$) – cf. Problem 19.1 – and use Theorem 18.6 instead of 19.4. In fact, this would even allow us to get $f(x)$ as pointwise limit. This is one way to prove Lebesgue’s differentiation theorem.

19.20 Theorem (Lebesgue) Let $u\in\mathcal L^1(\lambda^n)$. Then

$$\lim_{r\to0}\frac1{\lambda^n(B_r(x))}\int_{B_r(x)}|u(y)-u(x)|\,\lambda^n(dy) = 0 \tag{19.20}$$

for (Lebesgue) almost all $x\in\mathbb R^n$. In particular,

$$u(x) = \lim_{r\to0}\frac1{\lambda^n(B_r(x))}\int_{B_r(x)}u(y)\,\lambda^n(dy). \tag{19.21}$$
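A one-dimensional example shows both the statement and the role of the exceptional null set. The sketch below is an illustration, not taken from the text: it uses the indicator function $u = \mathbf 1_{[a,b]}$ on $\mathbb R$, for which the average over the ball $B_r(x) = (x-r,x+r)$ can be written down exactly as the length of the overlap divided by $2r$. The function name `ball_average_indicator` and the default endpoints are assumptions for this example.

```python
def ball_average_indicator(x, r, a=0.0, b=1.0):
    """Average of the indicator of [a, b] over the interval (x - r, x + r)."""
    overlap = max(0.0, min(b, x + r) - max(a, x - r))   # length of [a,b] ∩ (x-r, x+r)
    return overlap / (2 * r)


for r in (0.5, 0.1, 0.001):
    inside = ball_average_indicator(0.5, r)     # interior point of [0, 1]
    boundary = ball_average_indicator(0.0, r)   # endpoint of [0, 1]
    outside = ball_average_indicator(2.0, r)    # point outside [0, 1]
    print(r, inside, boundary, outside)
```

At the interior point and at the outside point the averages agree with $u(x)$ for small $r$, as (19.21) predicts. At the endpoint $x=0$ the averages stay at $\frac12$ although $u(0)=1$, so this point belongs to the exceptional set, which here consists of the two endpoints only and is therefore a Lebesgue null set.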

We will not follow the route laid out above, but use instead the Hardy–Littlewood maximal theorem 19.17 to prove T19.20. The reason is mainly a didactic one since this is a beautiful example of how weak-type maximal inequalities (i.e. inequalities like (19.16) or (19.13)) can be used to get a.e. convergence. More on this theme can be found in Krantz [25, pp. 27–30] and Garsia [16, pp. 1–4]. Our proof will also show that the limits in (19.20) and (19.21) can be strengthened to B ↓ x where B is any ball containing x and, in the limit, shrinking to x. Proof (of Theorem 19.20) We know from Theorem 15.17 that the continuous functions with compact support Cc n are dense in 1 n . Since ∈ Cc n is uniformly continuous, we find for every > 0 some > 0 such that x − y <

∀ x − y r r <

Thus

1 y − x n dy r→0 n Br x Br x lim

∀ ∈ Cc n

(19.22)

and (19.20) is true for Cc n . For a general u ∈ 1 n we pick a sequence j j∈ ⊂ Cc n with limj→ u − j 1 = 0. Denote by 1 w dn w x = sup n B x B x 0 3

x lim sup

>0

x u − ux x > 3 = n x u − j + j − j x + j x − ux x > 3 n u − j ∗ > + n x j − j x x > + n x j x − ux >

n

cn 1 u − j 1 + 0 + j − u 1

where we used Theorem 19.17, resp., (19.22) with → 0, resp., the Markov inequality 10.12 to deal with each of the above three terms respectively. The assertion now follows by letting first j → and then → 0. Let us now investigate the connection between ordinary derivatives and the Radon–Nikodým derivative. For this the following auxiliary notation will be useful. If is a measure on n n that assigns finite volume to any ball, we set ¯ Dx = lim sup r→0

Br x Br x = lim sup n Br x k→ 0 − . Setting 1 • = K ∩ •

and

2 = − 1

we obtain two measures 1 2 with = 1 + 2 and 2 . Since K c is open, ¯ 1 x = 0 for all x ∈ K c , we conclude from the definition of the derivative that D so that ¯ 2 x = D ¯ 2 x 2∗ x ¯ ¯ 1 x + D Dx = D

∀ x ∈ Kc

where 2∗ denotes the maximal function for the measure 2 . This shows that ¯ > s ⊂ K ∪ 2∗ > s D

∀ s > 0

Using that n K n N = 0 and the maximal inequality Corollary 19.19 we find c c ¯ > s n 2∗ > s n 2 n n D s s 4

See the footnote on p. 217.



¯ = 0 Lebesgue a.e. Since > 0 and s > 0 were arbitrary, we conclude that D If is not finite, we choose an exhausting sequence of open balls Bk 0, k ∈ ¯ = D ¯ k on Bk 0, and there, and set k • = Bk 0 ∩ • . Obviously, D ¯ fore the first part of the proof shows that Dx = 0 for Lebesgue almost all x ∈ ¯ B 0. Denoting the exceptional set by Mk we see that Dx = 0 for all x ∈ M = k n k∈ Mk ; the latter, however, is an -null set, and the theorem follows. The Calderón–Zygmund lemma Our last topic is the famous Calderón–Zygmund decomposition lemma which is the heart of many further developments in the theory of singular integral operators. We take the proof from Stein’s book [47, p. 17] and rephrase it a little to bring out the martingale connection. 19.23 Lemma (Calderón–Zygmund decomposition) Let u ∈ 1+ n and > 0. Then there exists a decomposition of n such that (i) n = F ∪ and F ∩ = ∅; (ii) u almost everywhere on F ; (iii) = k∈ Qk with mutually disjoint half-open axis-parallel squares Qk such that for each Qk 1

< n u dn 2n Qk Qk 0

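Proof Let $(\mathcal{A}_k)_{k\in\mathbb{Z}}$ be the dyadic filtration of Example 19.14 and let $(u_k)_{k\in\mathbb{Z}}$ be the corresponding martingale (19.15). Introduce a stopping time

$$\tau = \inf\{k \in \mathbb{Z} : u_k > \alpha\}, \qquad \inf\emptyset = +\infty,$$

and set $F = \{\tau = +\infty\} \cup \{\tau = -\infty\}$ and $\Omega = \{-\infty < \tau < +\infty\}$. By the very definition of the martingale $(u_k)_{k\in\mathbb{Z}}$ we see

$$u_k(x) \le \frac{1}{\lambda^n(Q_k)} \int u \, d\lambda^n = 2^{nk}\, \|u\|_1 \xrightarrow{\;k\to-\infty\;} 0$$

so that $\lim_{k\to-\infty} u_k(x) = 0$ and $\{\tau = -\infty\} = \emptyset$. If $x \in \{\tau = +\infty\}$, we have $u_k(x) \le \alpha$ for all $k$, and so $u(x) = \lim_{k\to\infty} u_k(x) \le \alpha$ a.e., as the almost everywhere pointwise limit exists by Corollary 18.3 (note that $\int u_k \, d\lambda^n = \int u \, d\lambda^n < \infty$). This settles (i) and (ii). Since $\tau$ is a stopping time, $\{\tau = k\} = \{\tau \le k\} \setminus \{\tau \le k-1\} \in \mathcal{A}_k$, hence $\{\tau = k\}$ as well as the disjoint union $\Omega = \bigcup_{k\in\mathbb{Z}} \{\tau = k\}$ are unions of disjoint half-open squares. The estimate in (iii) can be written as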

 is clear. For the upper estimate we note that every square Qk−1 ∈ k−1 contains 2n squares Qk ∈ k , so that 1 n u d 1Qk y n Qk y Qk y uk Qk y⊂Qk−1 z z∈2−k+1 n = 1 uk−1 u dn 1Qk−1 z n Q z Qk−1 z k−1 −k+1 n z∈2 2−n n u d 1Qk−1 z n Qk y Qk y −k+1 n Q y⊂Q z z∈2 k k−1 2n 1 u dn 1Qk−1 z n Q z Qk−1 z k−1 z∈2−k+1 n = 2n Finally, by the definition of , 1− 0 such that N Fyj − Fxj for all points x1 < y1 < x2 < y2 < < xN < yN with j=1 N j=1 yj − xj < . (3) F3 is continuous and singular, i.e. 3 ⊥ 1 .

[Hint: use in (19.2) the characterization of null sets of Problem 6.1.]

19.10. The devil's staircase. Recall the construction of Cantor's ternary set from Problem 7.10. In each step of the construction $E_k = I_k^1 \cup \dots \cup I_k^{2^k}$ (a disjoint union). Denote by $J_k^1, \dots, J_k^{2^k-1}$ the intervals which make up $[0,1] \setminus E_k$ arranged in increasing order of their endpoints. We construct a sequence of functions $F_k : [0,1] \to [0,1]$ by

$$F_k(x) = \begin{cases} 0 & \text{if } x = 0, \\ j\,2^{-k} & \text{if } x \in J_k^j,\ 1 \le j \le 2^k - 1, \\ 1 & \text{if } x = 1, \end{cases}$$

and interpolate linearly between these values to get $F_k(x)$ for all other $x$.
(i) Sketch the first three functions $F_1, F_2, F_3$.
(ii) Show that the limit $F(x) = \lim_{k\to\infty} F_k(x)$ exists. Remark. $F$ is usually called the Cantor function.
(iii) Show that $F$ is continuous and increasing.
(iv) Show that $F'$ exists a.e. and equals 0.
(v) Show that $F$ is not absolutely continuous (in the sense of Problem 19.9(2)) but singular, i.e. the corresponding measure $\mu$ with distribution function $F$ is singular w.r.t. Lebesgue measure $\lambda^1|_{[0,1]}$.

19.11. Kolmogorov’s inequality. Let Xj j∈ be a sequence of independent, identically distributed random variables on a probability space P. Then we have the following generalization of Chebyshev’s inequality, cf. Problem 10.5 (vi), # n # n # # 1 # # P max # Xj − EXj # t 2 VX 1jn t j=1 j j=1 where, in probabilistic notation, EY = Y dP is the expectation or mean value and VY = Y − EY 2 dP the variance of the random variable (i.e. measurable function) Y → . 19.12. Let u w 0 be measurable functions on a -finite measure space X . (i) Show that t u t ut w d for all t > 0 implies that

up d

p p−1 u w d p−1

∀ p > 1

(ii) Assume now that u w ∈ Lp . Conclude from (i) that u p

p p−1

w p for p > 1.

[Hint: use the technique of the proof of Theorem 19.12; for (ii) use Hölder’s inequality.] 19.13. Show the following slight improvement of Doob’s maximal inequality T19.12: Let uj j∈ be a martingale or uj p j∈ , 1 < p < , be a submartingale on a -finite filtered measure space. Then max uj p u∗N p jN

p p uN p max u p−1 p − 1 1jN j p



19.14. $L^p$-bounded martingales. A martingale $(u_j, \mathcal{A}_j)_{j\in\mathbb{N}}$ is called $L^p$-bounded, if $\sup_{j\in\mathbb{N}} \int |u_j|^p \, d\mu < \infty$ for some $p > 1$. Show that the sequence $(u_j)_{j\in\mathbb{N}}$ converges a.e. and in $L^p$-sense to a function $u \in L^p$. [Hint: compare with Problem 18.8]

19.15. Use Theorem 18.6 to show that the martingale of Example 19.14 is uniformly integrable.

19.16. Let $u : [a,b] \to \mathbb{R}$ be a continuous function. Show that $x \mapsto \int_a^x u(t)\,dt$ is everywhere differentiable and find its derivative. What happens if we only assume that $u \in L^1(dt)$? [Hint: Theorem 19.20.]

19.17. Let $f : \mathbb{R} \to \mathbb{R}$ be a bounded increasing function. Show that $f'$ exists Lebesgue almost everywhere and that $f(b) - f(a) \ge \int_a^b f'(x)\,dx$. When do we have equality? [Hint: assume first that $f$ is left- or right-continuous. Then you can interpret $f$ as distribution function of a Stieltjes measure $\mu$. Use Lebesgue's decomposition theorem 19.9 to write $\mu = \mu^\circ + \mu^\perp$ and use Corollaries 19.21 and 19.22 to find $f'$. If $f$ is not one-sided continuous in the first place, use Lemma 13.12 to find a version $\phi$ of $f$ which is left- or right-continuous such that $\{\phi \ne f\}$ is at most countable, hence a Lebesgue null set.]

19.18. Fubini's 'other' theorem. Let $(f_j)_{j\in\mathbb{N}}$ be a sequence of monotone increasing functions $f_j : [a,b] \to \mathbb{R}$. If the series $s(x) = \sum_{j=1}^\infty f_j(x)$ converges, then $s'(x)$ exists a.e. and is given by $s'(x) = \sum_{j=1}^\infty f_j'(x)$ a.e. [Hint: the partial sums $s_n(x)$ and $s(x)$ are again increasing functions and, by Problem 19.17, $s'(x)$ and $s_n'(x)$ exist a.e.; the latter can be calculated through term-by-term differentiation. Since the $f_j$ are increasing functions, the limits of the difference quotients show that $0 \le s_n' \le s_{n+1}' \le s'$ a.e., hence $\sum_j f_j'$ converges a.e. To identify this series with $s'$, show that $\sum_k (s(x) - s_{n_k}(x))$ converges on $[a,b]$ for some suitable subsequence. The first part of the proof applied to this series implies that $\sum_k (s'(x) - s_{n_k}'(x))$ converges, thus $s' - s_{n_k}' \to 0$.]

20 Inner product spaces

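Up to now we have only considered functions with values in $\mathbb{R}$ or $\bar{\mathbb{R}}$. Often it is necessary to admit complex-valued functions, too. In what follows $\mathbb{K}$ will stand for $\mathbb{R}$ or $\mathbb{C}$. Recall that a $\mathbb{K}$-vector space is a set $V$ with a vector addition '$+$': $V \times V \to V$, $(v,w) \mapsto v + w$ and a multiplication of a vector with a scalar '$\cdot$': $\mathbb{K} \times V \to V$, $(\alpha, v) \mapsto \alpha \cdot v$ which are defined in such a way that $(V, +)$ is an Abelian group and that for all $\alpha, \beta \in \mathbb{K}$ and $v, w \in V$ the relations

$$(\alpha + \beta) v = \alpha v + \beta v, \qquad \alpha (v + w) = \alpha v + \alpha w, \qquad (\alpha\beta) v = \alpha(\beta v), \qquad 1 \cdot v = v$$

hold. Typical examples of $\mathbb{K}$-vector spaces are the spaces $\mathcal{L}^p_{\mathbb{K}}$ or $L^p_{\mathbb{K}}$ (see Remark 12.5) and, in particular, the sequence spaces $\ell^p_{\mathbb{K}}$ from Example 12.12. For the $\mathbb{C}$-versions we first need to know how to integrate complex functions.

20.1 Scholium (integral of complex functions) It is often necessary to consider complex-valued functions $u : X \to \mathbb{C}$ on a measurable space $(X, \mathcal{A})$. Since $\mathbb{C}$ is a normed space, we have a natural topology on $\mathbb{C}$ and we may consider the Borel $\sigma$-algebra $\mathcal{B}(\mathbb{C})$ on $\mathbb{C}$. Since we can (even topologically) identify the complex plane $\mathbb{C}$ with $\mathbb{R}^2$, the Borel sets in $\mathbb{C}$ are generated by the half-open rectangles

$$[\![z, w)\!) = \{x + iy : \operatorname{Re} z \le x < \operatorname{Re} w,\ \operatorname{Im} z \le y < \operatorname{Im} w\}, \qquad z, w \in \mathbb{C}.$$

The correspondence $\mathbb{C} \leftrightarrow \mathbb{R}^2$ is accomplished by the map $\mathbb{C} \to \mathbb{R}^2$, $z \mapsto (\operatorname{Re} z, \operatorname{Im} z) = \big(\tfrac12 (z + \bar z), \tfrac{1}{2i}(z - \bar z)\big)$, which is, along with its inverse $\mathbb{R}^2 \to \mathbb{C}$, $(x,y) \mapsto x + iy$, continuous, hence measurable.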


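Consequently, we have

$$f : X \to \mathbb{C} \text{ is } \mathcal{A}/\mathcal{B}(\mathbb{C})\text{-measurable} \iff \operatorname{Re} f,\ \operatorname{Im} f : X \to \mathbb{R} \text{ are } \mathcal{A}/\mathcal{B}(\mathbb{R})\text{-measurable.} \tag{20.1}$$

To see '⇒' note that the maps $z \mapsto \operatorname{Re} z = \tfrac12(z + \bar z)$ and $z \mapsto \operatorname{Im} z = \tfrac{1}{2i}(z - \bar z)$ are continuous, hence measurable, and so are by Theorem 7.4 the compositions $\operatorname{Re} f$ and $\operatorname{Im} f$. Conversely, '⇐' follows – if we write $f = u + iv$ – from the formula

$$f^{-1}\big([\![z, w)\!)\big) = u^{-1}\big([\operatorname{Re} z, \operatorname{Re} w)\big) \cap v^{-1}\big([\operatorname{Im} z, \operatorname{Im} w)\big) \in \mathcal{A}$$

(both sets on the right are in $\mathcal{A}$) and the fact that the rectangles of the form $[\![z, w)\!)$ generate $\mathcal{B}(\mathbb{C})$. This means that we can define the integral of a $\mathbb{C}$-valued measurable function by linearity

$$\int f \, d\mu = \int \operatorname{Re} f \, d\mu + i \int \operatorname{Im} f \, d\mu \tag{20.2}$$

and we call $f : X \to \mathbb{C}$ integrable and write $f \in \mathcal{L}^1_{\mathbb{C}}(\mu)$ if $\operatorname{Re} f, \operatorname{Im} f : X \to \mathbb{R}$ are integrable in the usual sense. The following rules for $f \in \mathcal{L}^1_{\mathbb{C}}(\mu)$ are readily checked:

$$\operatorname{Re} \int f \, d\mu = \int \operatorname{Re} f \, d\mu, \qquad \operatorname{Im} \int f \, d\mu = \int \operatorname{Im} f \, d\mu, \qquad \overline{\int f \, d\mu} = \int \bar f \, d\mu \tag{20.3}$$

$$f \in \mathcal{L}^1_{\mathbb{C}}(\mu) \iff f \text{ is } \mathcal{A}/\mathcal{B}(\mathbb{C})\text{-measurable and } |f| \in \mathcal{L}^1_{\mathbb{R}}(\mu) \tag{20.4}$$

In (20.4) the direction '⇒' follows since $|f| = \big((\operatorname{Re} f)^2 + (\operatorname{Im} f)^2\big)^{1/2}$ is measurable and $|f| \le |\operatorname{Re} f| + |\operatorname{Im} f|$, while '⇐' is implied by $|\operatorname{Re} f|, |\operatorname{Im} f| \le |f|$. The equivalence (20.4) can be used to show that $\mathcal{L}^1_{\mathbb{C}}(\mu)$ is a $\mathbb{C}$-vector space: for $f, g \in \mathcal{L}^1_{\mathbb{C}}(\mu)$ and $\alpha, \beta \in \mathbb{C}$ we have $\alpha f + \beta g \in \mathcal{L}^1_{\mathbb{C}}(\mu)$, in which case

$$\int (\alpha f + \beta g) \, d\mu = \alpha \int f \, d\mu + \beta \int g \, d\mu; \tag{20.5}$$

moreover, we have the following standard estimate:

$$\Big| \int f \, d\mu \Big| \le \int |f| \, d\mu. \tag{20.6}$$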


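Only (20.6) is not entirely straightforward. Since $\int f \, d\mu \in \mathbb{C}$, we can find some $\theta \in [0, 2\pi)$ such that

$$\Big| \int f \, d\mu \Big| = e^{i\theta} \int f \, d\mu = \operatorname{Re}\Big( e^{i\theta} \int f \, d\mu \Big) \overset{(20.3),(20.5)}{=} \int \operatorname{Re}\big( e^{i\theta} f \big) \, d\mu \le \int \big| e^{i\theta} f \big| \, d\mu = \int |f| \, d\mu.$$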
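The rotation argument behind (20.6) can be tried out on a finite measure space, where the integral is just a weighted sum. The following is a minimal sketch under that assumption; the weights `mu`, the values `f` and the helper `integral` are hypothetical illustrations, not notation from the text. Note that `cmath.phase` returns an angle in $(-\pi, \pi]$ rather than $[0, 2\pi)$, which makes no difference for the argument.

```python
import cmath

# Hypothetical finite measure space: four points with weights mu[k] >= 0.
mu = [0.5, 1.0, 2.0, 0.25]
# A complex-valued function on these four points.
f = [1 + 1j, -2j, cmath.exp(2.0j), -3 + 0.5j]

def integral(g):
    """Integral of g with respect to mu, i.e. a weighted sum."""
    return sum(gk * mk for gk, mk in zip(g, mu))

I = integral(f)

# (20.2): integrate real and imaginary parts separately.
by_parts = integral([z.real for z in f]) + 1j * integral([z.imag for z in f])

# Choose theta with e^{i theta} * I = |I|, as in the proof of (20.6).
theta = -cmath.phase(I)
rotated = integral([(cmath.exp(1j * theta) * z).real for z in f])

# Right-hand side of (20.6).
bound = integral([abs(z) for z in f])

print(abs(I), rotated, bound)
```

Here `rotated` reproduces $|\int f\,d\mu|$ up to rounding, and it is bounded by `bound` because $\operatorname{Re}(e^{i\theta} f) \le |f|$ pointwise.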

The spaces $\mathcal{L}^p_{\mathbb{C}}(\mu)$, $1 < p \le \infty$, are now defined by

$$\mathcal{L}^p_{\mathbb{C}}(\mu) = \big\{ f : X \to \mathbb{C} \text{ measurable} : |f| \in \mathcal{L}^p_{\mathbb{R}}(\mu) \big\} \tag{20.7}$$

and it is obvious that all assertions from Chapter 12 remain valid. In particular, $L^p_{\mathbb{C}}$ stands for the set of all equivalence classes of $\mathcal{L}^p_{\mathbb{C}}$-functions if we identify functions which coincide outside some $\mu$-null set. Note also that most of our results on $\mathbb{R}$-valued integrands carry over to $\mathbb{C}$-valued functions by considering real and imaginary parts separately.

As we have seen in Chapter 12, cf. Remark 12.5, the spaces $\mathcal{L}^p_{\mathbb{K}}$, resp. $L^p_{\mathbb{K}}$ are semi-normed, resp. normed vector spaces. The same and more is true for $\mathbb{R}^n$ and $\mathbb{C}^n$: here we can even define a product of two vectors which, however, results in a scalar. It is this notion which we want to study in greater detail.

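20.2 Definition A $\mathbb{K}$-vector space $V$ is an inner product space if it supports a scalar or inner product, i.e. a map $\langle \bullet, \bullet \rangle : V \times V \to \mathbb{K}$ with the following properties: for all $u, v, w \in V$ and $\alpha, \beta \in \mathbb{K}$

(SP1) definiteness: $\langle v, v \rangle > 0 \iff v \ne 0$
(SP2) skew-symmetry: $\langle v, w \rangle = \overline{\langle w, v \rangle}$
(SP3) linearity: $\langle \alpha u + \beta v, w \rangle = \alpha \langle u, w \rangle + \beta \langle v, w \rangle$

If $\mathbb{K} = \mathbb{R}$, (SP2) becomes symmetry and (SP2), (SP3) together show that both $v \mapsto \langle v, w \rangle$ and $w \mapsto \langle v, w \rangle$ are $\mathbb{R}$-linear; therefore we call $(v, w) \mapsto \langle v, w \rangle$ bilinear. If $\mathbb{K} = \mathbb{C}$, (SP2), (SP3) give

$$\langle u, \alpha v + \beta w \rangle \overset{\text{SP2}}{=} \overline{\langle \alpha v + \beta w, u \rangle} \overset{\text{SP3}}{=} \overline{\alpha \langle v, u \rangle + \beta \langle w, u \rangle} = \bar\alpha\, \overline{\langle v, u \rangle} + \bar\beta\, \overline{\langle w, u \rangle} \overset{\text{SP2}}{=} \bar\alpha \langle u, v \rangle + \bar\beta \langle u, w \rangle$$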


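i.e. $w \mapsto \langle v, w \rangle$ is skew-linear. We call $\langle \bullet, \bullet \rangle$ in this case a sesqui-linear form. Since $\mathbb{K} = \mathbb{C}$ always includes $\mathbb{K} = \mathbb{R}$, we will restrict ourselves to $\mathbb{K} = \mathbb{C}$.

20.3 Lemma (Cauchy–Schwarz inequality) Let $(V, \langle \bullet, \bullet \rangle)$ be an inner product space. Then

$$|\langle v, w \rangle|^2 \le \langle v, v \rangle \langle w, w \rangle \qquad \forall\, v, w \in V. \tag{20.8}$$

Equality holds if, and only if, $v = \alpha w$ for some $\alpha \in \mathbb{C}$.

Proof If $v = 0$ or $w = 0$, there is nothing to show. For all other $v, w \in V$ and $\alpha \in \mathbb{C}$ we have

$$0 \le \langle v - \alpha w, v - \alpha w \rangle = \langle v, v \rangle - \alpha \langle w, v \rangle - \bar\alpha \langle v, w \rangle + \alpha \bar\alpha \langle w, w \rangle = \langle v, v \rangle - 2 \operatorname{Re}\big( \alpha \langle w, v \rangle \big) + |\alpha|^2 \langle w, w \rangle$$

where we used that $z + \bar z = 2 \operatorname{Re} z$. If $\langle w, v \rangle = 0$, (20.8) is trivial; otherwise we set $\alpha = \langle v, v \rangle / \langle w, v \rangle$ and get

$$0 \le \langle v, v \rangle - 2 \operatorname{Re} \langle v, v \rangle + \frac{\langle v, v \rangle^2 \langle w, w \rangle}{|\langle w, v \rangle|^2}$$

which implies (20.8). Since $\langle v - \alpha w, v - \alpha w \rangle = 0$ only if $v = \alpha w$, this is necessary for equality in (20.8), too. If, indeed, $v = \alpha w$, we see

$$|\langle v, w \rangle|^2 = |\alpha \langle w, w \rangle|^2 = \alpha \langle w, w \rangle\, \bar\alpha \langle w, w \rangle = \langle \alpha w, \alpha w \rangle \langle w, w \rangle = \langle v, v \rangle \langle w, w \rangle$$

showing that $v = \alpha w$ is also sufficient for equality in (20.8).

Lemma 20.3 is an abstract version of the Cauchy–Schwarz inequality for integrals C12.3. Just as in Chapter 12 we will use it to show that in an inner product space $(V, \langle \bullet, \bullet \rangle)$

$$\|v\| = \sqrt{\langle v, v \rangle}, \qquad v \in V, \tag{20.9}$$

defines a norm, i.e. a map $\|\bullet\| : V \to [0, \infty)$ satisfying for all $v, w \in V$ and $\alpha \in \mathbb{K}$

(N1) definiteness: $\|v\| > 0 \iff v \ne 0$
(N2) pos. homogeneity: $\|\alpha v\| = |\alpha| \cdot \|v\|$
(N3) triangle inequality: $\|v + w\| \le \|v\| + \|w\|$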
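The Cauchy–Schwarz inequality (20.8), its equality case and the triangle inequality (N3) can be checked numerically in $\mathbb{C}^5$. This is only an illustrative sketch: the helper names `inner`, `norm` and `rand_vec` are assumptions made for this example, and the inner product is taken linear in the first and conjugate-linear in the second argument, as in (SP2), (SP3).

```python
import math
import random

def inner(v, w):
    """Standard inner product on C^n: linear in v, conjugate-linear in w."""
    return sum(a * b.conjugate() for a, b in zip(v, w))

def norm(v):
    """Norm induced by the inner product, cf. (20.9)."""
    return math.sqrt(inner(v, v).real)

random.seed(0)

def rand_vec(n):
    return [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]

v, w = rand_vec(5), rand_vec(5)

# (20.8): <v,v><w,w> - |<v,w>|^2 is non-negative.
gap = inner(v, v).real * inner(w, w).real - abs(inner(v, w)) ** 2

# Equality case: v_par = alpha * w is a multiple of w.
alpha = complex(2, -3)
v_par = [alpha * b for b in w]
gap_par = inner(v_par, v_par).real * inner(w, w).real - abs(inner(v_par, w)) ** 2

# (N3): triangle inequality for the induced norm.
v_plus_w = [a + b for a, b in zip(v, w)]
triangle_slack = norm(v) + norm(w) - norm(v_plus_w)

print(gap, gap_par, triangle_slack)
```

For generic $v, w$ the gap is strictly positive, while for the proportional pair it vanishes up to rounding.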


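20.4 Lemma $(V, \langle \bullet, \bullet \rangle^{1/2})$ is a normed space.

Proof Because of (SP1) the map $\|\bullet\| : V \to [0, \infty)$, $\|v\| = \sqrt{\langle v, v \rangle}$, is well-defined. All we have to do is to check the properties (N1)–(N3). Obviously (SP1) ⇔ (N1), and (N2) follows from (SP2), (SP3):

$$\|\alpha v\|^2 = \langle \alpha v, \alpha v \rangle = \alpha \bar\alpha \langle v, v \rangle = |\alpha|^2 \cdot \|v\|^2,$$

and the triangle inequality (N3) is a consequence of the Cauchy–Schwarz inequality (20.8):

$$\|v + w\|^2 = \langle v + w, v + w \rangle = \langle v, v \rangle + \langle v, w \rangle + \langle w, v \rangle + \langle w, w \rangle = \|v\|^2 + 2 \operatorname{Re} \langle v, w \rangle + \|w\|^2$$
$$\le \|v\|^2 + 2 |\langle v, w \rangle| + \|w\|^2 \overset{(20.8)}{\le} \|v\|^2 + 2 \|v\| \cdot \|w\| + \|w\|^2 = \big( \|v\| + \|w\| \big)^2.$$

20.5 Examples (i) The typical finite-dimensional inner product spaces are

$\mathbb{R}^n$ ($\mathbb{R}$-vector space): $\langle x, y \rangle = \sum_{j=1}^n x_j y_j$, $\quad \|x\| = \Big( \sum_{j=1}^n x_j^2 \Big)^{1/2}$;
$\mathbb{C}^n$ ($\mathbb{C}$-vector space): $\langle z, w \rangle = \sum_{j=1}^n z_j \bar w_j$, $\quad \|z\| = \Big( \sum_{j=1}^n |z_j|^2 \Big)^{1/2}$.

(ii) The typical separable¹ infinite-dimensional inner product spaces are

$\ell^2_{\mathbb{R}}(\mathbb{N})$ ($\mathbb{R}$-vector space): for $x = (x_j)_{j\in\mathbb{N}}$, $y = (y_j)_{j\in\mathbb{N}}$, $\langle x, y \rangle = \langle x, y \rangle_2 = \sum_{j=1}^\infty x_j y_j$, $\quad \|x\| = \|x\|_2 = \Big( \sum_{j=1}^\infty x_j^2 \Big)^{1/2}$;
$\ell^2_{\mathbb{C}}(\mathbb{N})$ ($\mathbb{C}$-vector space): for $z = (z_j)_{j\in\mathbb{N}}$, $w = (w_j)_{j\in\mathbb{N}}$, $\langle z, w \rangle = \langle z, w \rangle_2 = \sum_{j=1}^\infty z_j \bar w_j$, $\quad \|z\| = \|z\|_2 = \Big( \sum_{j=1}^\infty |z_j|^2 \Big)^{1/2}$.

¹ Separable means that the space contains a countable dense subset, see Definition 21.14 below.

(iii) Let $(X, \mathcal{A}, \mu)$ be a measure space. The typical general (finite and infinite-dimensional) inner product spaces are

$L^2_{\mathbb{R}}(\mu)$ ($\mathbb{R}$-vector space): $\langle u, v \rangle = \langle u, v \rangle_2 = \int u v \, d\mu$, $\quad \|u\| = \|u\|_2 = \Big( \int u^2 \, d\mu \Big)^{1/2}$;
$L^2_{\mathbb{C}}(\mu)$ ($\mathbb{C}$-vector space): $\langle f, g \rangle = \langle f, g \rangle_2 = \int f \bar g \, d\mu$, $\quad \|f\| = \|f\|_2 = \Big( \int |f|^2 \, d\mu \Big)^{1/2}$.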

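Every inner product space becomes a normed space with norm given by (20.9), but not every normed space is necessarily an inner product space. In fact, $L^p$ or $\ell^p$ are for all $1 \le p \le \infty$ normed spaces, but only for $p = 2$ inner product spaces. The reason for this is that in $L^p$, $p \ne 2$, the parallelogram law does not hold.

20.6 Lemma (Parallelogram identity) Let $(V, \langle \bullet, \bullet \rangle)$ be an inner product space. Then

$$\Big\| \frac{v + w}{2} \Big\|^2 + \Big\| \frac{v - w}{2} \Big\|^2 = \frac12 \big( \|v\|^2 + \|w\|^2 \big) \qquad \forall\, v, w \in V. \tag{20.10}$$

Proof Obvious.

Geometrically $v + w$ and $v - w$ are the diagonals of the parallelogram spanned by $v$ and $w$. The proof of (20.10) in $\mathbb{R}^n$ would show the cosine law for the angle $\angle(x, y)$ between the vectors $x, y \in \mathbb{R}^n$:

$$\frac{\langle x, y \rangle}{\|x\| \cdot \|y\|} = \cos \angle(x, y). \tag{20.11}$$

In fact, inner products induce a natural geometry on $V$ which resembles in many aspects the Euclidean geometry on $\mathbb{R}^n$ and $\mathbb{C}^n$.

20.7 Definition Let $(V, \langle \bullet, \bullet \rangle)$ be an inner product space. We call $v, w \in V$ orthogonal and write $v \perp w$ if $\langle v, w \rangle = 0$.

20.8 Remark (i) If $\|\bullet\|$ derives from a scalar product, we can recover $\langle \bullet, \bullet \rangle$ from $\|\bullet\|$ with the help of the so-called polarization identities: if $\mathbb{K} = \mathbb{R}$,

$$\langle v, w \rangle = \tfrac14 \big( \|v + w\|^2 - \|v - w\|^2 \big) = \tfrac12 \big( \|v + w\|^2 - \|v\|^2 - \|w\|^2 \big) \tag{20.12}$$

and if $\mathbb{K} = \mathbb{C}$ (recall that $\langle \bullet, \bullet \rangle$ is linear in the first and skew-linear in the second argument),

$$\langle v, w \rangle = \tfrac14 \big( \|v + w\|^2 - \|v - w\|^2 + i \|v + iw\|^2 - i \|v - iw\|^2 \big). \tag{20.13}$$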


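(ii) One can show that a norm $\|\bullet\|$ derives from a scalar product if, and only if, $\|\bullet\|$ satisfies the parallelogram identity (20.10). For a proof we refer to Yosida [55, p. 39], see also Problem 20.2.

(iii) Let $V$ be an $\mathbb{R}$-inner product space with scalar product $\langle \bullet, \bullet \rangle$. Then we can turn $V$ into a $\mathbb{C}$-inner product space using the following complexification procedure: $V_{\mathbb{C}} = V \oplus iV = \{v + iw : v, w \in V\}$ with the following

addition: $(v + iw) + (v' + iw') = (v + v') + i(w + w')$, $\quad v, v', w, w' \in V$;
scalar multiplication: $(\alpha + i\beta)(v + iw) = (\alpha v - \beta w) + i(\beta v + \alpha w)$, $\quad \alpha, \beta \in \mathbb{R}$, $v, w \in V$;
inner product: $\langle v + iw, v' + iw' \rangle_{\mathbb{C}} = \langle v, v' \rangle + i \langle w, v' \rangle - i \langle v, w' \rangle + \langle w, w' \rangle$, $\quad v, v', w, w' \in V$;
and norm $\|\cdot\|_{\mathbb{C}} = \langle \bullet, \bullet \rangle_{\mathbb{C}}^{1/2}$.

Problems

20.1. Show that the examples given in 20.5 are indeed inner product spaces.

20.2. This exercise shows the following
Theorem (Fréchet–von Neumann–Jordan). A norm $\|\bullet\|$ on the $\mathbb{R}$-vector space $V$ derives from an inner product $\langle \bullet, \bullet \rangle$ if, and only if, the parallelogram identity (20.10) holds.
(i) Necessity: prove Lemma 20.6.
Assume from now on that $\|\bullet\|$ is a norm satisfying (20.10) and set $\langle v, w \rangle = \tfrac14 \big( \|v + w\|^2 - \|v - w\|^2 \big)$.
(ii) Show that $\langle v, w \rangle$ satisfies the properties (SP1) and (SP2) of Definition 20.2.
(iii) Prove that $\langle u + v, w \rangle = \langle u, w \rangle + \langle v, w \rangle$.
(iv) Use (iii) to prove that $\langle q v, w \rangle = q \langle v, w \rangle$ for all dyadic numbers $q = j\,2^{-k}$, $j \in \mathbb{Z}$, $k \in \mathbb{N}_0$, and conclude that (SP3) holds for dyadic $\alpha, \beta$.
(v) Prove that the maps $t \mapsto \|tv + w\|$ and $t \mapsto \|tv - w\|$ ($t \in \mathbb{R}$, $v, w \in V$) are continuous and conclude that $t \mapsto \langle tv, w \rangle$ is continuous. Use this and (iv) to show that (SP3) holds for all $\alpha, \beta \in \mathbb{R}$.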



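20.3. (Continuation of Problem 20.2) Assume now that $W$ is a $\mathbb{C}$-vector space with norm $\|\bullet\|$ satisfying the parallelogram identity (20.10) and let $\langle v, w \rangle_{\mathbb{R}} = \tfrac14 \big( \|v + w\|^2 - \|v - w\|^2 \big)$. Then $\langle v, w \rangle_{\mathbb{C}} = \langle v, w \rangle_{\mathbb{R}} + i \langle v, iw \rangle_{\mathbb{R}}$ is a complex-valued inner product.

20.4. Does the norm $\|\bullet\|_1$ on $L^1\big([0,1], \mathcal{B}[0,1], \lambda^1|_{[0,1]}\big)$ derive from an inner product?

20.5. Let $(V, \langle \bullet, \bullet \rangle)$ be a $\mathbb{C}$-inner product space, $n \in \mathbb{N}$ and set $\theta = e^{2\pi i/n}$.
(i) Show that $\dfrac1n \sum_{j=1}^n \theta^{jk} = \begin{cases} 1 & \text{if } k = 0, \\ 0 & \text{if } 1 \le k \le n - 1. \end{cases}$
(ii) Use (i) to prove for all $n \ge 3$ the following generalization of (20.12) and (20.13):
$$\langle v, w \rangle = \frac1n \sum_{j=1}^n \theta^j\, \|v + \theta^j w\|^2.$$
(iii) Prove the following continuous version of (ii):
$$\langle v, w \rangle = \frac{1}{2\pi} \int_{(-\pi, \pi]} e^{i\phi}\, \|v + e^{i\phi} w\|^2 \, d\phi.$$

20.6. Let $V$ be an inner product space. Show that $v \perp w$ if, and only if, Pythagoras' theorem $\|v + w\|^2 = \|v\|^2 + \|w\|^2$ holds.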
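Looking back at Lemma 20.6 and Remark 20.8(i), a small numerical experiment shows both statements side by side: the parallelogram identity (20.10) holds for the Euclidean norm but fails for the 1-norm, and the polarization identity (20.13) recovers the inner product of $\mathbb{C}^3$ from the 2-norm alone. This is a sketch under the convention that the inner product is linear in its first argument; the helper names `inner`, `p_norm`, `add`, `parallelogram_defect` and `polarize` are hypothetical.

```python
def inner(v, w):
    """Inner product on C^n, linear in v and conjugate-linear in w."""
    return sum(a * complex(b).conjugate() for a, b in zip(v, w))

def p_norm(v, p):
    """The l^p norm of a finite vector, 1 <= p < infinity."""
    return sum(abs(a) ** p for a in v) ** (1.0 / p)

def add(v, w, c=1):
    """Return v + c*w componentwise."""
    return [a + c * b for a, b in zip(v, w)]

def parallelogram_defect(v, w, p):
    """Left-hand side minus right-hand side of (20.10) for the p-norm."""
    half_sum = [a / 2 for a in add(v, w)]
    half_diff = [a / 2 for a in add(v, w, -1)]
    lhs = p_norm(half_sum, p) ** 2 + p_norm(half_diff, p) ** 2
    rhs = 0.5 * (p_norm(v, p) ** 2 + p_norm(w, p) ** 2)
    return lhs - rhs

def polarize(v, w):
    """Right-hand side of (20.13), computed from 2-norms only."""
    n2 = lambda x: p_norm(x, 2) ** 2
    return 0.25 * (n2(add(v, w)) - n2(add(v, w, -1))
                   + 1j * n2(add(v, w, 1j)) - 1j * n2(add(v, w, -1j)))

e1, e2 = [1.0, 0.0], [0.0, 1.0]
v = [1 + 2j, -1j, 0.5]
w = [2 - 1j, 3, 1j]

print(parallelogram_defect(e1, e2, 2))  # zero up to rounding
print(parallelogram_defect(e1, e2, 1))  # non-zero: the 1-norm violates (20.10)
print(polarize(v, w), inner(v, w))      # both values agree
```

The unit vectors `e1`, `e2` already witness the failure of (20.10) for $p = 1$, which is the finite-dimensional shadow of Problem 20.4.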

21 Hilbert space

Let $(V, \langle \bullet, \bullet \rangle)$ be an inner product space. As we have seen in Chapter 20, $(V, \|\bullet\| = \langle \bullet, \bullet \rangle^{1/2})$ is a normed space and the norm resembles in many aspects the Euclidean, resp. unitary norm in $\mathbb{R}^n$ and $\mathbb{C}^n$. In particular, we have a notion of convergence: a sequence $(v_j)_{j\in\mathbb{N}} \subset V$ converges to an element $v \in V$ if $(\|v - v_j\|)_{j\in\mathbb{N}}$ converges to 0 in $\mathbb{R}_+$,

$$\lim_{j\to\infty} v_j = v \iff \lim_{j\to\infty} \|v - v_j\| = 0.$$

But it is completeness and the study of Cauchy sequences in $V$,

$$(v_j)_{j\in\mathbb{N}} \subset V \text{ Cauchy sequence} \iff \lim_{j,k\to\infty} \|v_j - v_k\| = 0,$$

that gets analysis really going. This leads to the very natural

21.1 Definition A Hilbert space is a complete inner product space, i.e. an inner product space where every Cauchy sequence converges. We will usually write $\mathcal{H}$ for a Hilbert space.

21.2 Example The spaces $\mathbb{R}^n$, $\mathbb{C}^n$, $\ell^2_{\mathbb{K}}(\mathbb{N})$ and $L^2_{\mathbb{K}}(\mu)$ over any measure space $(X, \mathcal{A}, \mu)$ are Hilbert spaces and, indeed, the 'typical' ones. This follows from Example 20.5 and the Riesz–Fischer theorem 12.7.

Since every Hilbert space is an inner product space, we have the notion of orthogonality of $g, h \in \mathcal{H}$, see Definition 20.7: $g \perp h \iff \langle g, h \rangle = 0$.



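21.3 Definition Let $\mathcal{H}$ be a Hilbert space. The orthogonal complement $M^\perp$ of a subset $M \subset \mathcal{H}$ is by definition

$$M^\perp = \{h \in \mathcal{H} : h \perp m \ \ \forall\, m \in M\} = \{h \in \mathcal{H} : \langle h, m \rangle = 0 \ \ \forall\, m \in M\}. \tag{21.1}$$

21.4 Lemma Let $\mathcal{H}$ be a Hilbert space and $M \subset \mathcal{H}$ be any subset. The orthogonal complement $M^\perp$ is a closed linear subspace of $\mathcal{H}$ and $M \subset (M^\perp)^\perp$.

Proof If $g, h \in M^\perp$ we find for all $\alpha, \beta \in \mathbb{K}$ that

$$\langle \alpha g + \beta h, m \rangle = \alpha \langle g, m \rangle + \beta \langle h, m \rangle = 0 \qquad \forall\, m \in M,$$

i.e. $\alpha g + \beta h \in M^\perp$ and $M^\perp$ is a linear subspace of $\mathcal{H}$. To see the closedness we take a sequence $(h_k)_{k\in\mathbb{N}} \subset M^\perp$ such that $\lim_{k\to\infty} h_k = h$. Then, for all $m \in M$, using $\langle h_k, m \rangle = 0$,

$$|\langle h, m \rangle| = |\langle h, m \rangle - \langle h_k, m \rangle| = |\langle h - h_k, m \rangle| \overset{20.3}{\le} \|h - h_k\| \cdot \|m\| \xrightarrow{\;k\to\infty\;} 0;$$

this shows that $M^\perp$ is closed since $h \in M^\perp$. Finally, if $m \in M$ we get

$$0 = \langle h, m \rangle = \overline{\langle m, h \rangle} \quad \forall\, h \in M^\perp \implies m \in (M^\perp)^\perp.$$

The next theorem is central for the study of (the geometry of) Hilbert spaces. Recall that a set $C \subset \mathcal{H}$ is convex if

$$u, w \in C \implies t u + (1 - t) w \in C \qquad \forall\, t \in (0, 1).$$

21.5 Theorem (Projection theorem) Let $C \ne \emptyset$ be a closed convex subset of the Hilbert space $\mathcal{H}$. For every $h \in \mathcal{H}$ there is a unique minimizer $u \in C$ such that

$$\|h - u\| = \inf_{w \in C} \|h - w\| = d(h, C). \tag{21.2}$$

This element $u = P_C h$ is called (orthogonal) projection of $h$ onto $C$ and is equally characterized by the property

$$P_C h \in C \quad \text{and} \quad \operatorname{Re} \langle h - P_C h, w - P_C h \rangle \le 0 \quad \forall\, w \in C. \tag{21.3}$$

Proof Existence: Let $d = \inf_{w\in C} \|h - w\|$. By the very definition of the infimum, there is a sequence $(w_k)_{k\in\mathbb{N}} \subset C$ such that

$$\lim_{k\to\infty} \|h - w_k\| = d.$$

If we can show that $(w_k)_{k\in\mathbb{N}}$ is a Cauchy sequence, we know that the limit $u = \lim_{k\to\infty} w_k$ exists because of the completeness of $\mathcal{H}$ and is in $C$ since $C$ is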


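closed. Applying the parallelogram law (20.10) with $v = h - w_k$ and $w = h - w_\ell$ gives

$$\Big\| h - \frac{w_k + w_\ell}{2} \Big\|^2 + \Big\| \frac{w_k - w_\ell}{2} \Big\|^2 = \frac12 \big( \|h - w_k\|^2 + \|h - w_\ell\|^2 \big).$$

Since $C$ is convex, $\tfrac12 w_k + \tfrac12 w_\ell \in C$, thus $d \le \big\| h - \tfrac12 (w_k + w_\ell) \big\|$ and

$$d^2 + \tfrac14 \|w_k - w_\ell\|^2 \le \tfrac12 \big( \|h - w_k\|^2 + \|h - w_\ell\|^2 \big) \xrightarrow{\;k, \ell \to \infty\;} d^2.$$

This proves that $(w_k)_{k\in\mathbb{N}}$ is a Cauchy sequence.

Uniqueness: Assume that $u, \tilde u \in C$ satisfy both (21.2), i.e. $\|u - h\| = d = \|\tilde u - h\|$. Since by convexity $\tfrac12 u + \tfrac12 \tilde u \in C$, the parallelogram law (20.10) gives

$$d^2 \le \big\| h - \big( \tfrac12 u + \tfrac12 \tilde u \big) \big\|^2 + \big\| \tfrac12 (u - \tilde u) \big\|^2 = \tfrac12 \big( \|h - u\|^2 + \|h - \tilde u\|^2 \big) = \tfrac12 (d^2 + d^2) = d^2,$$

and, since the first norm on the left is already at least $d$, we conclude that $\|u - \tilde u\|^2 = 0$ or $u = \tilde u$.

Equivalence of (21.2), (21.3): Assume that $u \in C$ satisfies (21.2) and let $w \in C$. By convexity, $(1 - t) u + t w \in C$ for all $t \in (0, 1)$ and by (21.2)

$$\|h - u\|^2 \le \|h - (1 - t) u - t w\|^2 = \|(h - u) - t (w - u)\|^2 = \|h - u\|^2 - 2t \operatorname{Re} \langle h - u, w - u \rangle + t^2 \|w - u\|^2.$$

Hence, $2 \operatorname{Re} \langle h - u, w - u \rangle \le t \|w - u\|^2$ and (21.3) follows as $t \to 0$. Conversely, if (21.3) holds, we have for $u = P_C h \in C$

$$\|h - u\|^2 - \|h - w\|^2 = 2 \operatorname{Re} \langle h - u, w - u \rangle - \|u - w\|^2 \le 0 \qquad \forall\, w \in C$$

which implies (21.2).

We will now study the properties of the projection operator $P_C$. If $V, W \subset \mathcal{H}$ are two subspaces with $V \cap W = \{0\}$, we call $V + W = \{v + w : v \in V, w \in W\}$ the direct sum and write $V \oplus W$.

21.6 Corollary (i) Let $\emptyset \ne C \subset \mathcal{H}$ be a closed convex subset. The projection $P_C : \mathcal{H} \to C$ is a contraction, i.e.

$$\|P_C g - P_C h\| \le \|g - h\| \qquad \forall\, g, h \in \mathcal{H}. \tag{21.4}$$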


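(ii) If $\emptyset \ne C = F$ is a closed linear subspace of $\mathcal{H}$, $P_F$ is a linear operator and $f = P_F h$ is the unique element with

$$f \in F \quad \text{and} \quad h - f \in F^\perp. \tag{21.5}$$

In particular, $\mathcal{H} = F \oplus F^\perp$.

(iii) If $F$ is not closed, then $\mathcal{H} = \bar F \oplus F^\perp$ or, equivalently, $\bar F = (F^\perp)^\perp$.

Proof (i) follows from the inequality

$$\begin{aligned} \|P_C g - P_C h\|^2 &= \operatorname{Re} \big( \langle P_C g, P_C g - P_C h \rangle - \langle P_C h, P_C g - P_C h \rangle \big) \\ &= \operatorname{Re} \big( \langle P_C g - g, P_C g - P_C h \rangle + \langle P_C h - h, P_C h - P_C g \rangle + \langle g - h, P_C g - P_C h \rangle \big) \\ &\overset{(21.3)}{\le} \operatorname{Re} \langle g - h, P_C g - P_C h \rangle \le \|g - h\| \cdot \|P_C g - P_C h\| \end{aligned}$$

where we used the Cauchy–Schwarz inequality L20.3 for the last estimate.

(ii) Since $F$ is a linear subspace, $v \in F \implies \alpha v \in F$ for all $\alpha \in \mathbb{C}$ and (21.3) reads in this case

$$\operatorname{Re} \langle h - P_F h, \alpha v - P_F h \rangle \le 0 \qquad \forall\, \alpha \in \mathbb{C},\ v \in F$$

or, equivalently,

$$\operatorname{Re} \langle h - P_F h, \alpha v \rangle \le \operatorname{Re} \langle h - P_F h, P_F h \rangle \qquad \forall\, \alpha \in \mathbb{C},\ v \in F$$

which is only possible if $\langle h - P_F h, v \rangle = 0$ for all $v \in F$ and for $v = P_F h$, in particular, $\langle h - P_F h, P_F h \rangle = 0$; this shows (21.5). If, on the other hand, (21.5) is true, we get for all $v \in F$

$$0 = \operatorname{Re} \langle h - f, v \rangle - \operatorname{Re} \langle h - f, f \rangle = \operatorname{Re} \langle h - f, v - f \rangle$$

and $f = P_F h$ follows by the uniqueness of the projection. The decomposition $\mathcal{H} = F \oplus F^\perp$ follows immediately as $h = P_F h + (h - P_F h)$ and $h \in F \cap F^\perp \iff \langle h, h \rangle = 0 \iff h = 0$. The decomposition also proves the linearity of $P_F$: for all $g, h \in \mathcal{H}$ and $\alpha, \beta \in \mathbb{C}$ we have $\alpha P_F g + \beta P_F h \in F$ and

$$(\alpha g + \beta h) - (\alpha P_F g + \beta P_F h) = \alpha (g - P_F g) + \beta (h - P_F h) \in F^\perp,$$

which implies, again by uniqueness of the projection, that $P_F(\alpha g + \beta h) = \alpha P_F g + \beta P_F h$.

(iii) We know from Lemma 21.4 that $F \subset (F^\perp)^\perp$ and that $(F^\perp)^\perp$ is closed; therefore, $\bar F \subset (F^\perp)^\perp$. Moreover, $F \subset \bar F$ implies $\bar F^\perp \subset F^\perp$, [✓] showing that

$$\mathcal{H} \overset{21.6\text{(ii)}}{=} \bar F \oplus \bar F^\perp \subset \bar F + F^\perp \subset (F^\perp)^\perp \oplus F^\perp \overset{21.6\text{(ii)}}{=} \mathcal{H}$$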

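and $\mathcal{H} = \bar F \oplus F^\perp$ or $\bar F = (F^\perp)^\perp$ follows.

21.7 Remarks (i) It is easy to show that the projection $P_F$ onto a subspace $F \subset \mathcal{H}$ is symmetric, i.e. that

$$\langle P_F g, h \rangle = \langle g, P_F h \rangle \qquad \forall\, g, h \in \mathcal{H} \tag{21.6}$$

and that $P_F^2 = P_F$, i.e.

$$\langle P_F^2 g, h \rangle = \langle P_F g, P_F h \rangle = \langle P_F g, h \rangle \qquad \forall\, g, h \in \mathcal{H}. \tag{21.7}$$

In fact, (21.7) implies (21.6). Since $P_F g \in F$, $P_F(P_F g) = P_F g$ by the uniqueness of the projection and $\langle P_F^2 g, h \rangle = \langle P_F g, h \rangle$ follows. Finally,

$$\langle P_F g, h \rangle = \langle P_F g, P_F h \rangle + \underbrace{\langle P_F g, h - P_F h \rangle}_{=0} = \langle P_F g, P_F h \rangle.$$

(ii) Pythagoras' theorem has a particularly nice form for projections:

$$\|h\|^2 = \|P_F h\|^2 + \|h - P_F h\|^2 \qquad \forall\, h \in \mathcal{H}. \tag{21.8}$$

(iii) A very useful interpretation of C21.6(iii) is the following: a linear subspace $F \subset \mathcal{H}$ is dense in $\mathcal{H}$ if, and only if, $F^\perp = \{0\}$. In other words,

$$F \subset \mathcal{H} \text{ is dense} \iff \big( \langle f, h \rangle = 0 \ \ \forall\, f \in F \ \text{ entails } \ h = 0 \big).$$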
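To see Theorem 21.5 and Corollary 21.6(i) at work, one can take $\mathcal{H} = \mathbb{R}^3$ and let $C$ be the closed unit ball, where the projection has the explicit form $P_C h = h / \max(1, \|h\|)$. The sketch below checks the variational inequality (21.3), the minimizing property (21.2) and the contraction property (21.4) on random samples; the choice of $C$ and all helper names are assumptions made for this illustration.

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def sub(x, y):
    return [a - b for a, b in zip(x, y)]

def project_ball(h):
    """Projection onto the closed unit ball C = {x : |x| <= 1} of R^n."""
    r = norm(h)
    return list(h) if r <= 1 else [a / r for a in h]

random.seed(1)

def rand_point(n, scale):
    return [random.uniform(-scale, scale) for _ in range(n)]

g, h = rand_point(3, 3.0), rand_point(3, 3.0)
Pg, Ph = project_ball(g), project_ball(h)

# Sample points of C (projecting random points is just a way to land in C).
samples = [project_ball(rand_point(3, 2.0)) for _ in range(1000)]

# (21.3): <h - P_C h, w - P_C h> <= 0 for all w in C.
worst = max(dot(sub(h, Ph), sub(w, Ph)) for w in samples)

# (21.2): P_C h is at least as close to h as any sampled point of C.
best_sampled = min(norm(sub(h, w)) for w in samples)

print(worst, norm(sub(h, Ph)), best_sampled)
print(norm(sub(Pg, Ph)), norm(sub(g, h)))  # (21.4): first <= second
```

Random sampling can of course only support, not prove, the inequalities; the proof is the one given in the text.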

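Let us briefly discuss two important consequences of the projection theorem 21.5: F. Riesz' representation theorem on the structure of continuous linear functionals on $\mathcal{H}$ and the problem of finding a basis in $\mathcal{H}$.

21.8 Definition A continuous linear functional on $\mathcal{H}$ is a map $\Lambda : \mathcal{H} \to \mathbb{K}$, $h \mapsto \Lambda(h)$ which is linear,

$$\Lambda(\alpha g + \beta h) = \alpha \Lambda(g) + \beta \Lambda(h) \qquad \forall\, \alpha, \beta \in \mathbb{K},\ \forall\, g, h \in \mathcal{H},$$

and satisfies

$$|\Lambda(g) - \Lambda(h)| \le c\, \|g - h\| \qquad \forall\, g, h \in \mathcal{H}$$

with a constant $c \ge 0$ independent of $g, h \in \mathcal{H}$.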



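It is easy to find examples of continuous linear functionals on $\mathcal{H}$. Just fix some $g \in \mathcal{H}$ and set

$$\Lambda_g(h) = \langle h, g \rangle, \qquad h \in \mathcal{H}. \tag{21.9}$$

Linearity is clear and the Cauchy–Schwarz inequality L20.3 shows

$$|\Lambda_g(h) - \Lambda_g(\tilde h)| = |\langle h - \tilde h, g \rangle| \le \|g\| \cdot \|h - \tilde h\|, \qquad \text{i.e. } c = \|g\|.$$

That, in fact, all continuous linear functionals of $\mathcal{H}$ arise in this way is the content of the next theorem, due to F. Riesz.

21.9 Theorem (Riesz representation theorem) Each continuous linear functional $\Lambda$ on the Hilbert space $\mathcal{H}$ is of the form (21.9), i.e. there exists a unique $g \in \mathcal{H}$ such that

$$\Lambda(h) = \Lambda_g(h) = \langle h, g \rangle \qquad \forall\, h \in \mathcal{H}.$$

Proof Set $F = \Lambda^{-1}(\{0\})$ which is, due to the continuity and linearity of $\Lambda$, a closed linear subspace of $\mathcal{H}$. [✓] If $F = \mathcal{H}$, $\Lambda \equiv 0$ and $g = 0 \in \mathcal{H}$ does the job. Otherwise we can pick some $g_0 \in \mathcal{H} \setminus F$ and set

$$g = \frac{g_0 - P_F g_0}{\|g_0 - P_F g_0\|} \overset{(21.5)}{\in} F^\perp \implies \Lambda(g) \ne 0.$$

Since $\mathcal{H} = F \oplus F^\perp$, we can write every $h \in \mathcal{H}$ in the form

$$h = \frac{\Lambda(h)}{\Lambda(g)}\, g + \Big( h - \frac{\Lambda(h)}{\Lambda(g)}\, g \Big) \in F^\perp \oplus F,$$

hence

$$\Big\langle h - \frac{\Lambda(h)}{\Lambda(g)}\, g,\ g \Big\rangle = 0 \iff \langle h, g \rangle = \frac{\Lambda(h)}{\Lambda(g)} \underbrace{\langle g, g \rangle}_{=1} \iff \Lambda(h) = \langle h, g \rangle\, \Lambda(g) = \big\langle h, \overline{\Lambda(g)}\, g \big\rangle,$$

so that $\overline{\Lambda(g)}\, g$ is the representing element, and the proof is finished.

We will finally see how to represent elements of a Hilbert space using an orthonormal base (ONB, for short). We begin with a definition.

21.10 Definition Let $\mathcal{H}$ be a Hilbert space.
(i) The (linear) span of a family $\{e_k : k = 1, 2, \dots, N\} \subset \mathcal{H}$, $N \in \mathbb{N} \cup \{\infty\}$, is the set of all finite linear combinations of the $e_k$, i.e.

$$\operatorname{span}\{e_1, e_2, \dots, e_N\} = \Big\{ \sum_{k=1}^n \alpha_k e_k : \alpha_1, \dots, \alpha_n \in \mathbb{K},\ n \in \mathbb{N},\ n \le N \Big\}.$$

(ii) A sequence $(e_k)_{k\in\mathbb{N}} \subset \mathcal{H}$ is called a (countable) orthonormal system (ONS, for short) if

$$\langle e_k, e_\ell \rangle = \begin{cases} 0 & \text{if } k \ne \ell, \\ 1 & \text{if } k = \ell, \end{cases}$$

that is, $\|e_k\| = 1$ and $e_k \perp e_\ell$ whenever $k \ne \ell$.

21.11 Theorem Let $(e_k)_{k\in\mathbb{N}}$ be an ONS in the Hilbert space $\mathcal{H}$ and denote by $E = E_N = \operatorname{span}\{e_1, \dots, e_N\}$ the linear span of $e_1, \dots, e_N$, $N \in \mathbb{N}$.

(i) $E = E_N$ is a closed linear subspace, $P_E g = \sum_{k=1}^N \langle g, e_k \rangle e_k$ and

$$\Big\| g - \sum_{k=1}^N \langle g, e_k \rangle e_k \Big\| < \|g - f\| \qquad \forall\, f \in E,\ f \ne P_E g,$$

and also $\|P_E g\|^2 = \sum_{k=1}^N |\langle g, e_k \rangle|^2$.

(ii) (Pythagoras' theorem) For $g \in \mathcal{H}$

$$\|g\|^2 = \|g - P_E g\|^2 + \|P_E g\|^2 = \Big\| g - \sum_{k=1}^N \langle g, e_k \rangle e_k \Big\|^2 + \sum_{k=1}^N |\langle g, e_k \rangle|^2.$$

(iii) (Bessel's inequality) For $g \in \mathcal{H}$

$$\sum_{k=1}^\infty |\langle g, e_k \rangle|^2 \le \|g\|^2.$$

(iv) (Parseval's identity) The sequence $\big( \sum_{k=1}^m c_k e_k \big)_{m\in\mathbb{N}}$, $c_k \in \mathbb{K}$, converges to an element $g \in \mathcal{H}$ if, and only if, $\sum_{k=1}^\infty |c_k|^2 < \infty$. In this case, Parseval's identity holds:

$$\sum_{k=1}^\infty |c_k|^2 = \sum_{k=1}^\infty |\langle g, e_k \rangle|^2 = \|g\|^2.$$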


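Proof (i) That $E_N$ is a linear subspace is due to the very definition of 'span'. The closedness follows from the fact that $E_N$ is generated by finitely many $e_k$: if $f \in E_N$ is of the form $f = \sum_{j=1}^N c_j e_j$, $c_j \in \mathbb{K}$, then

$$\langle f, e_k \rangle = \Big\langle \sum_{j=1}^N c_j e_j, e_k \Big\rangle = \sum_{j=1}^N c_j \langle e_j, e_k \rangle = c_k.$$

Let $(f_n)_{n\in\mathbb{N}} \subset E_N$ be a sequence with $f_n \to f \in \mathcal{H}$ as $n \to \infty$. Then

$$\Big\| f_n - \sum_{j=1}^N \langle f, e_j \rangle e_j \Big\| = \Big\| \sum_{j=1}^N \langle f_n - f, e_j \rangle e_j \Big\| \le \sum_{j=1}^N |\langle f_n - f, e_j \rangle| \cdot \|e_j\| \le \sum_{j=1}^N \|f_n - f\| = N\, \|f_n - f\| \xrightarrow{\;n\to\infty\;} 0$$

(using L20.3 and $\|e_j\| = 1$), which shows that $\lim_{n\to\infty} f_n = \sum_{j=1}^N \langle f, e_j \rangle e_j \in E_N$.

If $g \in \mathcal{H}$, we observe that $g - \sum_{j=1}^N \langle g, e_j \rangle e_j \perp e_k$ for all $k = 1, 2, \dots, N$, since for these $k$

$$\Big\langle g - \sum_{j=1}^N \langle g, e_j \rangle e_j, e_k \Big\rangle = \langle g, e_k \rangle - \sum_{j=1}^N \langle g, e_j \rangle \langle e_j, e_k \rangle = \langle g, e_k \rangle - \langle g, e_k \rangle = 0.$$

Since $\mathcal{H} = E_N \oplus E_N^\perp$, we get $P_{E_N} g = \sum_{j=1}^N \langle g, e_j \rangle e_j$, while (21.2) implies $\big\| g - \sum_{j=1}^N \langle g, e_j \rangle e_j \big\| \le \|g - f\|$ for $f \in E_N$, with equality holding only if $f = P_{E_N} g$ because of uniqueness of $P_{E_N} g$. Finally,

$$\|P_{E_N} g\|^2 = \langle P_{E_N} g, P_{E_N} g \rangle = \Big\langle \sum_{j=1}^N \langle g, e_j \rangle e_j, \sum_{k=1}^N \langle g, e_k \rangle e_k \Big\rangle = \sum_{j,k=1}^N \langle g, e_j \rangle \overline{\langle g, e_k \rangle} \langle e_j, e_k \rangle = \sum_{j=1}^N |\langle g, e_j \rangle|^2,$$

where we used that $(e_j)_{j\in\mathbb{N}}$ is an ONS.

(ii) follows from (21.8) and (i).

(iii) From (ii) we get for all $N \in \mathbb{N}$

$$\sum_{j=1}^N |\langle g, e_j \rangle|^2 = \|g\|^2 - \|g - P_E g\|^2 \le \|g\|^2.$$

Since the right-hand side is independent of $N \in \mathbb{N}$, we can let $N \to \infty$ and the claim follows.

(iv) Since $\mathcal{H}$ is complete, it is enough to show that $\big( \sum_{k=1}^m c_k e_k \big)_{m\in\mathbb{N}}$ is a Cauchy sequence. Because of the orthogonality of the $e_k$ we see (as in (i)) for $n > m$

$$\Big\| \sum_{k=m+1}^n c_k e_k \Big\|^2 = \sum_{k=m+1}^n |c_k|^2 \|e_k\|^2 = \sum_{k=m+1}^n |c_k|^2,$$

which means that $\big( \sum_{k=1}^m c_k e_k \big)_{m\in\mathbb{N}}$ is a Cauchy sequence in $\mathcal{H}$ if, and only if, $\sum_{k=1}^\infty |c_k|^2$ converges. In the latter case, Parseval's identity follows from (iii): for $g = \sum_{k=1}^\infty c_k e_k$ we have $P_{E_N} g = \sum_{k=1}^N c_k e_k$ and $c_k = \langle g, e_k \rangle$ by (i). Thus by (ii),

$$\|g\|^2 = \|g - P_{E_N} g\|^2 + \sum_{j=1}^N |\langle g, e_j \rangle|^2 \xrightarrow{\;N\to\infty\;} \sum_{j=1}^\infty |\langle g, e_j \rangle|^2 = \sum_{j=1}^\infty |c_j|^2.$$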
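The statements of Theorem 21.11(i)–(iii) are easy to observe in $\mathcal{H} = \mathbb{C}^4$ with a three-element orthonormal system. The following sketch computes the coefficients $\langle g, e_k \rangle$, the projection $P_E g$ and the residual $g - P_E g$; the particular vectors `e` and `g` are hypothetical choices, and `inner` is the standard inner product, linear in the first argument.

```python
def inner(v, w):
    """Inner product on C^n, linear in v and conjugate-linear in w."""
    return sum(a * complex(b).conjugate() for a, b in zip(v, w))

def norm_sq(v):
    return inner(v, v).real

# An orthonormal system of three vectors in C^4 (not a basis).
e = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
    [0.5, 0.5j, -0.5, -0.5j],
]
g = [1 + 1j, 2, -1j, 3]

# Coefficients <g, e_k> and the projection P_E g = sum_k <g, e_k> e_k.
coeffs = [inner(g, ek) for ek in e]
Pg = [sum(c * ek[i] for c, ek in zip(coeffs, e)) for i in range(4)]
residual = [a - b for a, b in zip(g, Pg)]

bessel_sum = sum(abs(c) ** 2 for c in coeffs)

print(bessel_sum, norm_sq(residual), norm_sq(g))
```

Up to rounding, `bessel_sum + norm_sq(residual)` equals `norm_sq(g)` (Pythagoras), so `bessel_sum` cannot exceed $\|g\|^2$ (Bessel); the inequality is strict here because the system does not span $\mathbb{C}^4$ and $g$ has a component outside $E$.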

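Two questions remain: can we always find a countable ONS? If so, can we use it to represent all elements of $\mathcal{H}$? The answer to the first question is 'yes', while the second question has to be answered by 'no', unless we are looking at separable Hilbert spaces, see Definition 21.14 below. Here we will restrict ourselves to the latter situation but we will point towards references where the general case is treated.

21.12 Definition An ONS $(e_k)_{k\in\mathbb{N}}$ in the Hilbert space $\mathcal{H}$ is said to be maximal (also complete, total, an orthonormal basis) if for every $g \in \mathcal{H}$

$$\langle g, e_k \rangle = 0 \quad \forall\, k \in \mathbb{N} \implies g = 0.$$

The idea behind maximality is that we can obtain $\mathcal{H}$ as limit of finite-dimensional projections, '$\mathcal{H} = \lim_N P_{\operatorname{span}\{e_1, \dots, e_N\}} \mathcal{H}$' or '$\mathcal{H} = \bigoplus_{k\in\mathbb{N}} \mathbb{K} e_k$', if the limits and summations are understood in the right way. Here we see that the countability of the ONS entails that $\mathcal{H}$ can be represented as closure of the span of countably many elements – and that this is indeed a restriction should be obvious. Let us make all this more precise.

21.13 Theorem Let $(e_k)_{k\in\mathbb{N}}$ be an ONS in the Hilbert space $\mathcal{H}$. Then the following assertions are equivalent.
(i) $(e_k)_{k\in\mathbb{N}}$ is maximal;
(ii) $\bigcup_{N\in\mathbb{N}} \operatorname{span}\{e_1, \dots, e_N\}$ is dense in $\mathcal{H}$;
(iii) $g = \sum_{j=1}^\infty \langle g, e_j \rangle e_j \quad \forall\, g \in \mathcal{H}$;
(iv) $\sum_{j=1}^\infty |\langle g, e_j \rangle|^2 = \|g\|^2 \quad \forall\, g \in \mathcal{H}$;
(v) $\sum_{j=1}^\infty \langle g, e_j \rangle \overline{\langle h, e_j \rangle} = \langle g, h \rangle \quad \forall\, g, h \in \mathcal{H}$.

Proof (i)⇒(ii): Since $F = \bigcup_{N\in\mathbb{N}} \operatorname{span}\{e_1, \dots, e_N\} = \operatorname{span}\{e_j : j \in \mathbb{N}\}$ is a linear subspace of $\mathcal{H}$, the assertion follows from the definition of maximality and Remark 21.7(iii).

(ii)⇒(iii) is obvious since

$$\sum_{j=1}^\infty \langle g, e_j \rangle e_j = \lim_{N\to\infty} \sum_{j=1}^N \langle g, e_j \rangle e_j = \lim_{N\to\infty} P_{E_N} g.$$

(iii)⇒(iv) follows from Theorem 21.11(iv).

(iv)⇒(v) follows from the polarization identity (20.13).

(v)⇒(i): If $\langle u, e_k \rangle = 0$ for some $u \in \mathcal{H}$ and all $k \in \mathbb{N}$, we get from (v) with $g = h = u$ that

$$0 = \sum_{j=1}^\infty \langle u, e_j \rangle \overline{\langle u, e_j \rangle} = \langle u, u \rangle = \|u\|^2$$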

and therefore $u = 0$.

Theorem 21.13 solves the representation issue. To find an ONS, we recall first what we do in a finite-dimensional vector space $V$ to get a basis. If $V = \operatorname{span}\{v_1, \dots, v_N\}$, we remove recursively all $v_1, \dots, v_k$ such that still $V = \operatorname{span}\big( \{v_1, \dots, v_N\} \setminus \{v_1, \dots, v_k\} \big)$. This procedure gives us in at most $N$ steps a minimal system $\{w_1, \dots, w_n\} \subset \{v_1, \dots, v_N\}$, $N = n + k$, with the property that $V = \operatorname{span}\{w_1, \dots, w_n\}$. Note that this is, at the same time, a maximally independent system of vectors in $V$. We can now rebuild $\{w_1, \dots, w_n\}$ into an ONS by the Gram–Schmidt orthonormalization procedure:

$$e_1 = \frac{w_1}{\|w_1\|} \quad \text{and recursively} \quad \tilde e_{j+1} = w_{j+1} - P_{\operatorname{span}\{e_1, \dots, e_j\}} w_{j+1} = w_{j+1} - \sum_{\ell=1}^j \langle w_{j+1}, e_\ell \rangle e_\ell, \qquad e_{j+1} = \frac{\tilde e_{j+1}}{\|\tilde e_{j+1}\|}. \tag{21.10}$$

Another interpretation of (21.10) is this: If we had unleashed the Gram–Schmidt procedure on the set $\{v_1, \dots, v_N\}$, we would have obtained again $n$ orthonormal vectors [✓], say, $f_1, \dots, f_n$ (which are, in general, different from $e_1, \dots, e_n$ constructed from $w_1, \dots, w_n$). A close inspection of (21.10) shows that at each step $V = \operatorname{span}\{f_1, \dots, f_j, v_{j+1}, \dots, v_N\}$, so that (21.10) extends a partially existing basis $\{f_1, \dots, f_j\}$ to a full ONB $\{f_1, \dots, f_n\}$. This means that (21.10) is also a 'basis extension procedure'.

To get (21.10) to work in infinite dimensions we must make sure that $\mathcal{H}$ is the closure of the span of countably many vectors. This motivates the following convenient (but somewhat restrictive)

21.14 Definition A Hilbert space $\mathcal{H}$ is said to be separable if $\mathcal{H}$ contains a countable dense subset $G \subset \mathcal{H}$.

21.15 Theorem Every separable Hilbert space $\mathcal{H}$ has a maximal ONS.

Proof Let $G = (g_j)_{j\in\mathbb{N}}$ be an enumeration of some countable dense subset of $\mathcal{H}$ and consider the subspaces $F_k = \operatorname{span}\{g_1, \dots, g_k\}$. Note that $F_k \subseteq F_{k+1}$, $\dim F_k \le k$ and that $\bigcup_{k\in\mathbb{N}} F_k$ is dense in $\mathcal{H}$. Now construct an ONB in the finite-dimensional space $F_k$ and extend this ONB using (21.10) to an ONB in $F_{k+1}$, etc. This produces a sequence $(e_j)_{j\in\mathbb{N}}$ of orthonormal elements such that $\operatorname{span}\{e_j : j \in \mathbb{N}\} = \bigcup_{k\in\mathbb{N}} F_k \supset G$ is dense in $\mathcal{H}$ and T21.13 completes the proof.
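The recursion (21.10) translates almost literally into code. The sketch below works in $\mathbb{C}^3$ and, in the spirit of the 'basis extension' reading, is applied directly to a spanning set that contains a linearly dependent vector: such vectors produce $\tilde e_{j+1} = 0$ and are skipped. The function name `gram_schmidt`, the tolerance `tol` and the sample vectors are assumptions made for this illustration.

```python
import math

def inner(v, w):
    """Inner product on C^n, linear in v and conjugate-linear in w."""
    return sum(a * complex(b).conjugate() for a, b in zip(v, w))

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalise `vectors` following (21.10).

    Vectors that are (numerically) linear combinations of their
    predecessors give a zero remainder and are skipped.
    """
    basis = []
    for w in vectors:
        # e~ = w - P_span(basis) w = w - sum_l <w, e_l> e_l
        coeffs = [inner(w, e) for e in basis]
        e_tilde = [wi - sum(c * e[i] for c, e in zip(coeffs, basis))
                   for i, wi in enumerate(w)]
        length = math.sqrt(inner(e_tilde, e_tilde).real)
        if length > tol:
            basis.append([x / length for x in e_tilde])
    return basis

# The third vector is the sum of the first two, hence redundant.
vs = [[1, 1, 0], [1, 0, 1], [2, 1, 1], [0, 1, 1j]]
onb = gram_schmidt(vs)

print(len(onb))
for ej in onb:
    print([round(abs(inner(ej, ek)), 12) for ek in onb])
```

In floating-point arithmetic the classical recursion can lose orthogonality for badly conditioned input; that numerical issue is outside the scope of the text and is not addressed by this sketch.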



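21.16 Remarks (i) Assume that $\mathcal{H}$ is separable. Then we have the following 'algebraic' interpretation of the results in 21.11–21.15. Consider the maps

coordinate projection $\Pi : \mathcal{H} \to \ell^2_{\mathbb{K}}(\mathbb{N})$, $\quad g \mapsto (\langle g, e_j \rangle)_{j\in\mathbb{N}}$;
(re-)construction map $\widetilde\Pi : \ell^2_{\mathbb{K}}(\mathbb{N}) \to \mathcal{H}$, $\quad (c_j)_{j\in\mathbb{N}} \mapsto \sum_{j=1}^\infty c_j e_j$.

Because of Theorem 21.11(iv), both $\Pi$ and $\widetilde\Pi$ are well-defined maps, and Theorem 21.11 shows that Diagram 1 commutes, i.e. $\Pi \circ \widetilde\Pi = \operatorname{id}_{\ell^2}$.

[Diagram 1: $\ell^2_{\mathbb{K}}(\mathbb{N}) \to \mathcal{H} \to \ell^2_{\mathbb{K}}(\mathbb{N})$ via $\widetilde\Pi$ and then $\Pi$, with the identity on $\ell^2_{\mathbb{K}}(\mathbb{N})$ closing the triangle. Diagram 2: $\mathcal{H} \to \ell^2_{\mathbb{K}}(\mathbb{N}) \to \mathcal{H}$ via $\Pi$ and then $\widetilde\Pi$, with the identity on $\mathcal{H}$ closing the triangle.]

This means that, if we start with a square-summable sequence, associate an element from $\mathcal{H}$ with it and project to the coordinates, we get the original sequence back. The converse operation, if we start with some $h \in \mathcal{H}$, project $h$ down to its coordinates, and then try to reconstruct $h$ from the (square-summable) coordinate sequence, is much more difficult, as we have seen in Theorems 21.13 and 21.15. Nevertheless, it can be done in every separable Hilbert space, and Diagram 2 becomes commutative, i.e. $\widetilde\Pi \circ \Pi = \operatorname{id}_{\mathcal{H}}$.

This shows that every separable Hilbert space $\mathcal{H}$ can be isometrically mapped onto $\ell^2_{\mathbb{K}}(\mathbb{N})$. The isometry is given by Parseval's identity 21.11(iv):

$$\sum_{j\in\mathbb{N}} |\langle h, e_j \rangle|^2 = \|\Pi h\|_{\ell^2}^2 = \|h\|_{\mathcal{H}}^2.$$

(ii) If $\mathcal{H}$ is not separable, we can still construct an ONB but now we need transfinite induction or Zorn's lemma. A reasonably short account is given in Rudin's book [40, pp. 83–88]. The results 21.11–21.13 carry over to this case if one makes some technical (what is an uncountable sum? etc.) modifications.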


Problems 21.1. Show that every convergent sequence in is a Cauchy sequence. 21.2. Show that g → g h, h ∈ , is continuous.

1/p 21.3. Show that g h = gp + hp is for every p 1 a norm on × . For which values of p does × become a Hilbert space? 21.4. Show that g h → g h and t h → t h are continuous on × resp. × . 21.5. Show that a Hilbert space is separable if, and only if, contains a countable maximal orthonormal system. " 21.6. Show that for = L2 X and w ∈ L2 the set Mw⊥ = u ∈ L2 u w d = 0 ⊥ is either 0 or a one-dimensional subspace of . 21.7. Let ej j∈ ⊂ be an orthonormal system. (i) Show that no subsequence of ej j∈ converges. However, limj→ ej h = 0 for every h ∈ . [Hint: show that it can’t be a Cauchy sequence. Use Bessel’s inequality.] 1 (ii) The Hilbert cube Q = h ∈ h = cj ej cj j j ∈ is closed, j=1

bounded and compact (i.e. every sequence has a convergent subsequence). (iii) The set R = B1/j ej is closed, bounded but not compact (cf. (ii)). j=1

(iv) The set S = h ∈ h =

cj ej cj j j ∈ 2 compact (cf. (ii)) if, and only if, j=1 j < .

is closed, bounded and

j=1

21.8. Let be a real Hilbert space.

g h = sup g h = sup g h g g=0 g1 g=1 (ii) Can we replace in (i) • • by • •? (iii) Is it enough to take g in (i) from a dense subset rather than from (resp. B1 0 or k ∈ k = 1 )? (i) Show that

h = sup

21.9. Show that the linear span of a sequence ek k∈ ⊂ , span ek ek ∈ k ∈ , is a linear subspace of . 21.10. A weak form of the uniform boundedness principle. Consider the real Hilbert 2 space 2 = and let a = aj j∈ and b = bj j∈ be two sequences of real numbers. 2 (i) Assume that j=1 aj = . Construct a sequence jk k∈ such that j1 = 0 and 2 jk <jjk+1 aj > 1 for all k ∈ . (ii) Define bj = k aj for all jk < j jk+1 , k ∈ and show that one can determine 2 the k in such a way that j=1 aj bj = while j=1 bj < . (iii) Conclude that if a b < for all a ∈ 2 , we have necessarily b ∈ 2 . (iv) State and prove the analogue of (iii) for all separable Hilbert spaces.


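Remark. The general uniform boundedness principle states that in every Hilbert space $\mathcal H$ and for any $H \subset \mathcal H$ one has
$$\sup_{h \in H} |\langle h, g\rangle| < \infty \quad \forall\, g \in \mathcal H \implies \sup_{h\in H} \|h\| < \infty.$$
Interpreting $h : g \mapsto \langle g,h\rangle$ as linear map, this says that the boundedness of the orbits $\{h(g) : h \in H\}$ for all $g \in \mathcal H$ implies that the set $H$ is bounded. This formulation remains valid even in Banach spaces. The proof is normally based on Baire's category theorem, cf. Rudin [40].

21.11. Let $F, G \subset \mathcal H$ be linear subspaces. An operator $P$ defined on $G$ is called ($\mathbb K$-)linear, if $P(\alpha f + \beta g) = \alpha Pf + \beta Pg$ holds for all $\alpha, \beta \in \mathbb K$ and $f, g \in G$.
(i) Assume that $F$ is closed and that $P : \mathcal H \to F$ is the orthogonal projection. Then
$$P^2 = P \quad\text{and}\quad \langle Pg, h\rangle = \langle g, Ph\rangle \quad \forall\, g,h \in \mathcal H. \tag{21.11}$$
(ii) If $P : \mathcal H \to \mathcal H$ is a map satisfying (21.11), then $P$ is linear and $P$ is the orthogonal projection onto the closed subspace $P(\mathcal H)$.
(iii) If $P : \mathcal H \to \mathcal H$ is a linear map satisfying
$$P^2 = P \quad\text{and}\quad \|Ph\| \leq \|h\| \quad \forall\, h \in \mathcal H,$$
then $P$ is the orthogonal projection onto the closed subspace $P(\mathcal H)$.

21.12. Let $(X,\mathscr A,\mu)$ be a measure space and $(A_j)_{j\in\mathbb N} \subset \mathscr A$ be mutually disjoint sets such that $X = \bigcup_{j\in\mathbb N} A_j$ (disjoint union). Set
$$Y_j := \Big\{u \in L^2(\mu) : \int_{A_j^c} |u|^2\,d\mu = 0\Big\}, \quad j \in \mathbb N.$$
(i) Show that $Y_j \perp Y_k$ if $j \neq k$.
(ii) Show that $\operatorname{span}\big(\bigcup_{j\in\mathbb N} Y_j\big)$ (i.e. the set of all linear combinations of finitely many elements from $\bigcup_{j\in\mathbb N} Y_j$) is dense in $L^2(\mu)$.
(iii) Find the projection $P_j : L^2(\mu) \to Y_j$.

21.13. Let $(X,\mathscr A,\mu)$ be a measure space and assume that $(A_j)_{j\in\mathbb N} \subset \mathscr A$ is a sequence of pairwise disjoint sets such that $\bigcup_{j\in\mathbb N} A_j = X$ and $0 < \mu(A_j) < \infty$. Denote by $\mathscr A_n := \sigma(A_1, A_2, \ldots, A_n)$ and by $\mathscr A_\infty := \sigma(A_j : j \in \mathbb N)$.
(i) Show that $L^2(\mathscr A_n) \subset L^2(\mathscr A)$ and that $L^2(\mathscr A_n)$ is a closed subspace.
(ii) Find an explicit formula for $E^{\mathscr A_n} u$ where $E^{\mathscr A_n}$ is the orthogonal projection $E^{\mathscr A_n} : L^2(\mathscr A) \to L^2(\mathscr A_n)$.
(iii) Determine the orthogonal complement of $L^2(\mathscr A_n)$.
(iv) Show that $(E^{\mathscr A_n} u, \mathscr A_n)_{n \in \mathbb N \cup \{\infty\}}$ is a martingale.
(v) Show that $E^{\mathscr A_n} u \to E^{\mathscr A_\infty} u$ as $n \to \infty$, a.e. and in $L^2$.
(vi) Conclude that $L^2(\mathscr A_\infty)$ is separable.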

22 Conditional expectations in L2

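Throughout this chapter $(X,\mathscr A,\mu)$ will be some measure space. We have seen in Chapter 20 that $L^2_{\mathbb C}(\mathscr A) = L^2_{\mathbb R}(\mathscr A) \oplus i\,L^2_{\mathbb R}(\mathscr A)$. By considering real and imaginary parts separately, we can reduce many assertions concerning $L^2_{\mathbb C}(\mathscr A)$ to $L^2_{\mathbb R}(\mathscr A)$. From Chapter 21 we know that $L^2_{\mathbb R}(\mathscr A)$ is a Hilbert space with inner product, resp. norm
$$\langle u, v\rangle = \langle u,v\rangle_2 = \int u\,v\,d\mu \quad\text{resp.}\quad \|u\| = \|u\|_2 = \Big(\int u^2\,d\mu\Big)^{1/2}.$$
Since a function¹ $u \in L^2_{\bar{\mathbb R}}(\mathscr A)$ is only almost everywhere defined and since (square-)integrable functions with values in $\bar{\mathbb R}$ are almost everywhere finite, hence $\mathbb R$-valued, cf. Remark 12.5, we can identify $L^2_{\mathbb R}(\mathscr A)$ and $L^2_{\bar{\mathbb R}}(\mathscr A)$. We will do so and simply write $L^2(\mathscr A)$.

¹ Strictly speaking we should call it an equivalence class of functions, cf. Remark 12.5.

Caution: Note that for functions $u, v \in L^2(\mathscr A)$ expressions of the type $u = v$, $u \leq v$ always mean $u(x) = v(x)$, $u(x) \leq v(x)$ for all $x$ outside some $\mu$-null set.

In this chapter we are mainly interested in linear subspaces of $L^2(\mathscr A)$ and projections onto them. One particularly important class arises in the following way: if $\mathscr G \subset \mathscr A$ is a sub-$\sigma$-algebra of $\mathscr A$, then any $\mathscr G$-measurable function is certainly $\mathscr A$-measurable. Since $(X,\mathscr G,\mu|_{\mathscr G})$ is also a measure space, it seems natural to interpret $L^2(\mathscr G)$ (with norm $\|\cdot\|_{L^2(\mathscr G)}$) as a subspace of $L^2(\mathscr A)$ (with norm $\|\cdot\|_{L^2(\mathscr A)}$). This can indeed be done.

22.1 Lemma Let $\mathscr G \subset \mathscr A$ be a sub-$\sigma$-algebra of $\mathscr A$. Then
$$\imath : \mathcal L^2(\mathscr G) \to \mathcal L^2(\mathscr A) \quad\text{and}\quad j : L^2(\mathscr G) \to L^2(\mathscr A)$$
are isometric imbeddings, i.e. linear maps satisfying $\|\imath(u)\|_{\mathcal L^2(\mathscr A)} = \|u\|_{\mathcal L^2(\mathscr G)}$ and $\|j(w)\|_{L^2(\mathscr A)} = \|w\|_{L^2(\mathscr G)}$ for all $u \in \mathcal L^2(\mathscr G)$, resp. $w \in L^2(\mathscr G)$. In particular, $\mathcal L^2(\mathscr G)$ [$L^2(\mathscr G)$] is a closed linear subspace of $\mathcal L^2(\mathscr A)$ [$L^2(\mathscr A)$].

Proof Since $\mathscr G \subset \mathscr A$ and since $\mu$ and $\mu|_{\mathscr G}$ coincide on $\mathscr G$, we have $\mathcal E(\mathscr G) \subset \mathcal E(\mathscr A)$ for the simple functions. The map
$$\imath : \mathcal E(\mathscr G) \to \mathcal E(\mathscr A), \qquad f \mapsto \imath(f) := f = \sum_{j=0}^N \alpha_j 1_{G_j},$$
where the latter is a standard representation of $f$ with $\alpha_j \in \mathbb R$ and $G_j \in \mathscr G$, clearly satisfies
$$\|f\|^2_{\mathcal L^2(\mathscr G)} = \sum_{j=0}^N \alpha_j^2\,\mu|_{\mathscr G}(G_j) = \sum_{j=0}^N \alpha_j^2\,\mu(G_j) = \|f\|^2_{\mathcal L^2(\mathscr A)} \quad \forall\, f \in \mathcal E(\mathscr G).$$
According to Corollary 12.11 we can find for every $u \in \mathcal L^2(\mathscr G)$ a sequence $(f_k)_{k\in\mathbb N} \subset \mathcal E(\mathscr G) \cap \mathcal L^2(\mathscr G)$ such that $\lim_{k\to\infty} \|u - f_k\|_{\mathcal L^2(\mathscr G)} = 0$. Therefore,
$$\|\imath(f_k) - \imath(f_\ell)\|_{\mathcal L^2(\mathscr A)} = \|\imath(f_k - f_\ell)\|_{\mathcal L^2(\mathscr A)} = \|f_k - f_\ell\|_{\mathcal L^2(\mathscr G)} \xrightarrow{k,\ell \to \infty} 0,$$
which shows because of the completeness of $\mathcal L^2(\mathscr A)$ (cf. Theorem 12.7) that $\imath(u) := \mathcal L^2(\mathscr A)\text{-}\lim_{k\to\infty} \imath(f_k)$ is a linear isometry from $\mathcal L^2(\mathscr G)$ to $\mathcal L^2(\mathscr A)$. Since $\mathcal L^2(\mathscr G)$ is complete, $\imath(\mathcal L^2(\mathscr G))$ is a closed linear subspace of $\mathcal L^2(\mathscr A)$.

Denote by $[u] \in L^2(\mathscr G)$ the equivalence class containing the function $u \in \mathcal L^2(\mathscr G)$. Since for any two $u, w \in \mathcal L^2(\mathscr G)$ with $u = w$ a.e. we also have $\imath(u) = \imath(w)$ a.e., the map $j([u]) := [\imath(u)]$ is independent of the chosen representative for $[u]$ and defines a linear isometry $j : L^2(\mathscr G) \to L^2(\mathscr A)$. As before, $j(L^2(\mathscr G))$ is a closed linear subspace of $L^2(\mathscr A)$.

It is customary to identify $[u] \in L^2(\mathscr G)$ with $j([u]) \in L^2(\mathscr A)$ and we will do so in the sequel. Unless we want to stress the $\sigma$-algebra, we will write $\mu$ instead of $\mu|_{\mathscr G}$ and $\|\cdot\|_2$ for the norm in $L^2(\mathscr G)$ and $L^2(\mathscr A)$.

A key observation is that the choice of $\mathscr G \subset \mathscr A$ determines our knowledge about a function $u$.

22.2 Example Consider a finite measure space $(X,\mathscr A,\mu)$ and the sub-$\sigma$-algebra $\mathscr G := \{\emptyset, G, G^c, X\}$ where $G \in \mathscr A$ and $\mu(G) > 0$, $\mu(G^c) > 0$. Let $f \in \mathcal E(\mathscr A)$ be a simple function in standard representation:
$$f = \sum_{j=0}^m y_j 1_{A_j}, \qquad y_j \in \mathbb R,\ A_j \in \mathscr A.$$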

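Then
$$\int_G f\,d\mu = \sum_{j=0}^m y_j \int_G 1_{A_j}\,d\mu = \underbrace{\sum_{j=0}^m y_j \frac{\mu(A_j \cap G)}{\mu(G)}}_{=:\,\eta_1}\,\mu(G) = \int \eta_1 1_G\,d\mu \tag{22.1}$$
and similarly,
$$\int_{G^c} f\,d\mu = \underbrace{\sum_{j=0}^m y_j \frac{\mu(A_j \cap G^c)}{\mu(G^c)}}_{=:\,\eta_2}\,\mu(G^c) = \int \eta_2 1_{G^c}\,d\mu. \tag{22.2}$$
This indicates that we could have obtained the same results in the integrations (22.1) and (22.2) if we had not used $f \in \mathcal E(\mathscr A)$ but the $\mathscr G$-simple function
$$g := \eta_1 1_G + \eta_2 1_{G^c} \in \mathcal E(\mathscr G) \tag{22.3}$$
with $\eta_1, \eta_2$ from above. In other words, $f$ and $g$ are indistinguishable, if we evaluate (i.e. integrate) both of them on sets of the $\sigma$-algebra $\mathscr G$. Note that $g$ is much simpler than $f$, but we have lost nearly all information of what $f$ looks like on sets from $\mathscr A$ save $\mathscr G$: if we take a set from the standard representation of $f$, say, $A_{j_0} \subset G$, $A_{j_0} \in \mathscr A$, then
$$\int_{A_{j_0}} f\,d\mu = y_{j_0}\,\mu(A_{j_0}) \neq \int_{A_{j_0}} g\,d\mu = \eta_1\,\mu(A_{j_0} \cap G) = \sum_{j=0}^m y_j \frac{\mu(A_j \cap G)}{\mu(G)} \cdot \mu(A_{j_0}),$$
i.e. we would get a weighted average of the $y_j$ rather than precisely $y_{j_0}$.

Let us extend the process sketched in Example 22.2 to $\sigma$-finite measures and general square-integrable functions. Our starting point is the observation that, with the notation of 22.2,
$$\int f\,g\,d\mu = \eta_1 \int_G f\,d\mu + \eta_2 \int_{G^c} f\,d\mu = \eta_1^2\,\mu(G) + \eta_2^2\,\mu(G^c) = \int g^2\,d\mu,$$
that is, $\langle f - g, g\rangle = 0$ or $(f - g) \perp g$ in the space $L^2(\mathscr A)$.

22.3 Definition Let $(X,\mathscr A,\mu)$ be a measure space and $\mathscr G \subset \mathscr A$ be a sub-$\sigma$-algebra. The conditional expectation of $u \in L^2(\mathscr A)$ relative to $\mathscr G$ is the orthogonal projection onto the closed subspace $L^2(\mathscr G)$
$$E^{\mathscr G} : L^2(\mathscr A) \to L^2(\mathscr G), \qquad u \mapsto E^{\mathscr G} u.$$
Sometimes one writes $E(u \mid \mathscr G)$ instead of $E^{\mathscr G} u$.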
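The projection picture can be checked by hand on a finite set. The following is a minimal sketch, assuming a six-point space with uniform weights and the sub-$\sigma$-algebra generated by a two-block partition; the helper names `cond_exp` and `inner` are hypothetical and chosen only for this illustration. It averages a function over each block and verifies that the residual is orthogonal to the projected function.

```python
from fractions import Fraction

def cond_exp(u, mu, partition):
    """Average u over each block of the partition, weighted by mu."""
    projected = {}
    for block in partition:
        mass = sum(mu[x] for x in block)            # assumed to be > 0
        mean = sum(u[x] * mu[x] for x in block) / mass
        for x in block:
            projected[x] = mean
    return projected

def inner(u, v, mu):
    """L^2 inner product on a finite set with point masses mu."""
    return sum(u[x] * v[x] * mu[x] for x in mu)

points = range(6)
mu = {x: Fraction(1, 6) for x in points}            # uniform finite measure
f = {x: Fraction(x * x) for x in points}            # the function to project
G, Gc = {0, 1, 2}, {3, 4, 5}                        # partition generating the sub-sigma-algebra

g = cond_exp(f, mu, [G, Gc])                        # constant on G and on Gc
residual = {x: f[x] - g[x] for x in points}         # f - g, should be orthogonal to g
```

Exact rational arithmetic is used so that orthogonality can be compared with zero exactly instead of up to rounding.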


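The terminology 'conditional expectation' comes from probability theory where this notion is widely used and where $(X,\mathscr A,\mu)$ is usually a probability space. In slight abuse of language we continue to call $E^{\mathscr G}$ conditional expectation even if $\mu$ is not a probability measure. Let us collect some properties of $E^{\mathscr G}$.

22.4 Theorem Let $(X,\mathscr A,\mu)$ be a measure space and $\mathscr G \subset \mathscr A$ a sub-$\sigma$-algebra. The conditional expectation $E^{\mathscr G}$ has the following properties ($u, w \in L^2(\mathscr A)$):
(i) $E^{\mathscr G} u \in L^2(\mathscr G)$;
(ii) $\|E^{\mathscr G} u\|_{L^2(\mathscr G)} \leq \|u\|_{L^2(\mathscr A)}$;
(iii) $\langle E^{\mathscr G} u, w\rangle = \langle u, E^{\mathscr G} w\rangle = \langle E^{\mathscr G} u, E^{\mathscr G} w\rangle$;
(iv) $E^{\mathscr G} u$ is the unique minimizer in $L^2(\mathscr G)$ such that $\|u - E^{\mathscr G} u\|_{L^2(\mathscr A)} = \inf_{g \in L^2(\mathscr G)} \|u - g\|_{L^2(\mathscr A)}$;
(v) $u = w \implies E^{\mathscr G} u = E^{\mathscr G} w$;
(vi) $E^{\mathscr G}(\alpha u + \beta w) = \alpha E^{\mathscr G} u + \beta E^{\mathscr G} w$ for all $\alpha, \beta \in \mathbb R$;
(vii) If $\mathscr H \subset \mathscr G$ is a further sub-$\sigma$-algebra, then $E^{\mathscr H} E^{\mathscr G} u = E^{\mathscr H} u$;
(viii) $E^{\mathscr G}(g u) = g\,E^{\mathscr G} u$ for all $g \in L^\infty(\mathscr G)$;
(ix) $E^{\mathscr G} g = g$ for all $g \in L^2(\mathscr G)$;
(x) $0 \leq u \leq 1 \implies 0 \leq E^{\mathscr G} u \leq 1$;
(xi) $u \leq w \implies E^{\mathscr G} u \leq E^{\mathscr G} w$;
(xii) $|E^{\mathscr G} u| \leq E^{\mathscr G} |u|$;
(xiii) $E^{\{\emptyset, X\}} u = \frac{1}{\mu(X)} \int u\,d\mu$ for all $u \in L^1(\mathscr A) \cap L^2(\mathscr A)$ (with $\frac1\infty := 0$).

Before we turn to the proof of the above properties let us stress again that all (in-)equalities in (i)–(xiii) are between $L^2$-functions, i.e. they hold only $\mu$-almost everywhere. In particular, $E^{\mathscr G} u$ is itself only determined up to a $\mu$-null set $N \in \mathscr G$.

Proof (of Theorem 22.4) Properties (i)–(vi) and (ix) follow directly from Theorem 21.5, Corollary 21.6 and Remark 21.7.

(vii): For all $u, w \in L^2(\mathscr A)$ we find because of (iii)
$$\langle E^{\mathscr H} E^{\mathscr G} u, w\rangle \overset{\text{(iii)}}{=} \langle E^{\mathscr G} u, E^{\mathscr H} w\rangle \overset{\text{(iii)}}{=} \langle u, E^{\mathscr G} E^{\mathscr H} w\rangle \overset{\text{(ix)}}{=} \langle u, E^{\mathscr H} w\rangle = \langle E^{\mathscr H} u, w\rangle$$
as $E^{\mathscr H} w \in L^2(\mathscr H) \subset L^2(\mathscr G)$. Since $w \in L^2(\mathscr A)$ was arbitrary, we conclude that $E^{\mathscr H} E^{\mathscr G} u = E^{\mathscr H} u$.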

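(viii): Writing $u = f + f^\perp \in L^2(\mathscr G) \oplus L^2(\mathscr G)^\perp$, we get $g u = g f + g f^\perp$. Moreover, we have for any $\phi \in L^2(\mathscr G)$ and $g \in L^\infty(\mathscr G)$ that $g\phi \in L^2(\mathscr G)$, thus
$$\langle g f^\perp, \phi\rangle = \langle f^\perp, g\phi\rangle = 0$$
and from the uniqueness of the orthogonal decomposition we infer that
$$g f^\perp = (g u)^\perp \quad\text{or}\quad E^{\mathscr G}(g u) = g f = g\,E^{\mathscr G} u.$$

(x): Let $0 \leq u \leq 1$. Since $E^{\mathscr G} u \in L^2(\mathscr G)$, the Markov inequality P10.12 implies
$$\mu\big\{|E^{\mathscr G} u| > \tfrac1n\big\} \leq n^2 \|E^{\mathscr G} u\|^2_{L^2} \leq n^2 \|u\|^2_{L^2} < \infty \tag{22.4}$$
and so $\phi_n := 1_{\{E^{\mathscr G} u < -1/n\}} \in L^2(\mathscr G)$. Therefore, by 22.4(iii), (ix),
$$0 \leq \langle u, \phi_n\rangle = \langle E^{\mathscr G} u, \phi_n\rangle = \int E^{\mathscr G} u\; 1_{\{E^{\mathscr G} u < -1/n\}}\,d\mu \leq -\tfrac1n\,\mu\big\{E^{\mathscr G} u < -\tfrac1n\big\},$$
which is only possible if $\mu\{E^{\mathscr G} u < -1/n\} = 0$ for all $n \in \mathbb N$. Thus $\mu\{E^{\mathscr G} u < 0\} = \mu\big(\bigcup_{n\in\mathbb N} \{E^{\mathscr G} u < -1/n\}\big) = 0$, hence $E^{\mathscr G} u \geq 0$. With very similar arguments we see that $1_{\{E^{\mathscr G} u > 1\}} \in L^2(\mathscr G)$, and since $u \leq 1$ we have
$$\int E^{\mathscr G} u\; 1_{\{E^{\mathscr G} u > 1\}}\,d\mu \overset{22.4\text{(iii),(ix)}}{=} \int u\; 1_{\{E^{\mathscr G} u > 1\}}\,d\mu \leq \mu\{E^{\mathscr G} u > 1\},$$
which entails $\mu\{E^{\mathscr G} u > 1\} = 0$ or $E^{\mathscr G} u \leq 1$.

(xi): Using that $w - u \geq 0$, the first part of the proof of (x) shows $E^{\mathscr G}(w - u) \geq 0$, so that by linearity $E^{\mathscr G} w \geq E^{\mathscr G} u$.

(xii): Again by the proof of (x) we find for $|u| \pm u \geq 0$ that $E^{\mathscr G}(|u| \pm u) \geq 0$, and by linearity $\pm E^{\mathscr G} u \leq E^{\mathscr G} |u|$. This proves $|E^{\mathscr G} u| \leq E^{\mathscr G} |u|$.

(xiii): If $\mu(X) = \infty$, we have $L^2(\{\emptyset, X\}) = \{0\}$, thus $E^{\{\emptyset,X\}} u = 0$ and the formula clearly holds. If $\mu(X) < \infty$, we have $L^2(\{\emptyset, X\}) \simeq \mathbb R$, and $E^{\{\emptyset,X\}} u = c$ is a constant. By (iv), $c = E^{\{\emptyset,X\}} u$ minimizes $\|u - c\|_{L^2}$, and
$$\int (u - c)^2\,d\mu = \int u^2\,d\mu - 2c \int u\,d\mu + c^2\,\mu(X) = \int u^2\,d\mu - \frac{1}{\mu(X)}\Big(\int u\,d\mu\Big)^2 + \mu(X)\Big(c - \frac{1}{\mu(X)} \int u\,d\mu\Big)^2$$
shows that $c = \frac{1}{\mu(X)} \int u\,d\mu$ is the unique minimizer.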
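Several properties of Theorem 22.4 can be observed numerically on a finite set. The sketch below assumes an eight-point space with non-uniform point masses and two nested partitions (the finer one generating $\mathscr G$, the coarser one generating $\mathscr H \subset \mathscr G$); `cond_exp` and `inner` are hypothetical helpers for this illustration only. It checks the symmetry (iii), the contraction property (ii), the tower property (vii) and the inequality (xii) in exact arithmetic.

```python
from fractions import Fraction

def cond_exp(u, mu, partition):
    """Block-wise weighted average: projection onto functions constant on blocks."""
    out = {}
    for block in partition:
        mass = sum(mu[x] for x in block)
        mean = sum(u[x] * mu[x] for x in block) / mass
        for x in block:
            out[x] = mean
    return out

def inner(u, v, mu):
    return sum(u[x] * v[x] * mu[x] for x in mu)

points = list(range(8))
mu = {x: Fraction(x + 1, 36) for x in points}              # non-uniform weights
u = {x: Fraction((-1) ** x * (x + 2)) for x in points}     # sign-changing function
w = {x: Fraction(x * x - 10) for x in points}
fine = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]                     # generates G
coarse = [{0, 1, 2, 3}, {4, 5, 6, 7}]                       # generates H, a sub-sigma-algebra of G

Eu, Ew = cond_exp(u, mu, fine), cond_exp(w, mu, fine)
abs_u = {x: abs(u[x]) for x in points}
E_abs_u = cond_exp(abs_u, mu, fine)

symmetric = inner(Eu, w, mu) == inner(u, Ew, mu) == inner(Eu, Ew, mu)   # (iii)
contraction = inner(Eu, Eu, mu) <= inner(u, u, mu)                       # (ii), squared norms
tower = cond_exp(Eu, mu, coarse) == cond_exp(u, mu, coarse)              # (vii)
abs_bound = all(abs(Eu[x]) <= E_abs_u[x] for x in points)                # (xii)
```

This is of course only a sanity check on one example, not a substitute for the proof above.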

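In Chapter 23 we will extend the operator $E^{\mathscr G}$ to $\bigcup_{p \geq 1} L^p(\mathscr A)$ and add a few further properties.

On the structure of subspaces of $L^2$ ²

² This section can be left out at first reading.

In the rest of this chapter we want to address a different question. As we have seen, $E^{\mathscr G} : L^2(\mathscr A) \to L^2(\mathscr G)$ is a symmetric orthogonal projection onto the closed subspace $L^2(\mathscr G)$ of the Hilbert space $L^2(\mathscr A)$. It is natural to ask whether every orthogonal projection $\pi : L^2(\mathscr A) \to \mathcal H$ onto a closed subspace $\mathcal H \subset L^2(\mathscr A)$ is a conditional expectation. Equivalently we could ask under which conditions a closed subspace $\mathcal H$ of $L^2(\mathscr A)$ is of the form $L^2(\mathscr G) = \mathcal H$ for a suitable sub-$\sigma$-algebra $\mathscr G \subset \mathscr A$.

22.5 Theorem Let $(X,\mathscr A,\mu)$ be a $\sigma$-finite measure space. For a closed linear subspace $\mathcal H \subset L^2(\mathscr A)$ and its orthogonal projection $\pi = P_{\mathcal H} : L^2(\mathscr A) \to \mathcal H$, the following assertions are equivalent.
(i) $\mathcal H = L^2(\mathscr G)$ and $\pi = E^{\mathscr G}$ for some sub-$\sigma$-algebra $\mathscr G \subset \mathscr A$ containing an exhausting sequence $(G_j)_{j\in\mathbb N} \subset \mathscr G$ with $G_j \uparrow X$ and $\mu(G_j) < \infty$.
(ii) $\pi$ is a sub-Markovian operator, i.e. $0 \leq u \leq 1 \implies 0 \leq \pi u \leq 1$, $u \in L^2(\mathscr A)$, and for some $u_0 \in L^2(\mathscr A)$ with $u_0 > 0$ we have $\pi u_0 > 0$.
(iii) $\mathcal H \cap L^\infty(\mathscr A)$ is an algebra (i.e. it is closed under pointwise products: $f, h \in \mathcal H \cap L^\infty(\mathscr A) \implies f h \in \mathcal H \cap L^\infty(\mathscr A)$) which is $L^2$-dense in $\mathcal H$ and contains an (everywhere) strictly positive function $h_0 > 0$.
(iv) $\mathcal H$ is a lattice (i.e. $f, h \in \mathcal H \implies f \wedge h \in \mathcal H$) containing an (everywhere) strictly positive function $h_0 > 0$, and for all $h \in \mathcal H$ also $h \wedge 1 \in \mathcal H$.

Proof We show that (i)⇒(ii)⇒(iv)⇒(i)⇒(iii)⇒(iv).

(i)⇒(ii) The sub-Markov property of $\pi = E^{\mathscr G}$ follows from Theorem 22.4(x), while
$$u_0 := \sum_{j=1}^\infty \frac{2^{-j}}{\mu(G_j) + 1}\,1_{G_j} \in \mathcal M^+(\mathscr G)$$
clearly satisfies $0 < u_0 \leq 1$,
$$\|u_0\|_2 = \Big\|\sum_{j=1}^\infty \frac{2^{-j}}{\mu(G_j)+1}\,1_{G_j}\Big\|_2 \leq \sum_{j=1}^\infty \Big\|\frac{2^{-j}}{\mu(G_j)+1}\,1_{G_j}\Big\|_2 = \sum_{j=1}^\infty 2^{-j}\,\frac{\sqrt{\mu(G_j)}}{\mu(G_j)+1} \leq 1,$$
so that $u_0 \in L^2(\mathscr G)$ and, therefore, $0 < u_0 = \pi u_0 \leq 1$.

(ii)⇒(iv) Since $\pi$ preserves positivity, we find for all $u \in L^2(\mathscr A)$ that $\pi(u^+) \geq 0$ and $\pi u = \pi(u^+) - \pi(u^-) \leq \pi(u^+)$, thus $(\pi u) \vee 0 \leq \pi(u^+)$. On the other hand, $\mathcal H = \{h \in L^2(\mathscr A) : \pi h = h\}$ and the above calculation shows for $h \in \mathcal H$
$$h^+ = (\pi h)^+ = (\pi h) \vee 0 \leq \pi(h^+). \tag{22.5}$$
Since $\pi$ is a contraction, see (21.4), we find also
$$\|h^+\|_2 \overset{(22.5)}{\leq} \|\pi(h^+)\|_2 \leq \|h^+\|_2,$$
which implies $\langle h^+, h^+\rangle = \langle \pi(h^+), \pi(h^+)\rangle = \langle \pi(h^+), h^+\rangle$. Because of (22.5) we get $\langle \pi(h^+) - h^+, h^+\rangle = 0$ or $\pi(h^+) = h^+$ on the set $\{h^+ > 0\}$. But then
$$\|\pi(h^+)\|_2 = \|h^+\|_2 \implies \int_{\{h^+ = 0\}} \big(\pi(h^+)\big)^2\,d\mu = \int_{\{h^+ = 0\}} (h^+)^2\,d\mu = 0,$$
which shows that $\pi(h^+) = 0$ on $\{h^+ = 0\}$ or $\mu\{h^+ = 0\} = 0$. In either case, $\pi(h^+) = h^+$ (almost everywhere) and $h^+ \in \mathcal H$. Consequently, $f \wedge h = f - (f - h)^+ \in \mathcal H$.

Similarly, $h \wedge 1 = h - (h - 1)^+$ and, if $h \in \mathcal H$, we see $\pi(h \wedge 1) \leq (\pi h) \wedge 1 = h \wedge 1$. Further,
$$(h - 1)^+ = h - h \wedge 1 \leq \pi h - \pi(h \wedge 1) = \pi\big((h-1)^+\big)$$
and since $\pi$ is a contraction, the same argument which we used to get $\pi(h^+) = h^+$ yields $\pi((h-1)^+) = (h-1)^+$, hence $(h-1)^+, h \wedge 1 \in \mathcal H$. Finally, $h_0 := \pi u_0$, $u_0$ as in (ii), satisfies $h_0 \in \mathcal H$ and $h_0 > 0$.

(iv)⇒(i) We set
$$\mathscr G := \{G \in \mathscr A : h \wedge 1_G \in \mathcal H \quad \forall\, h \in \mathcal H\}.$$
Let us first show that $\mathscr G$ is a $\sigma$-algebra. Clearly, $\emptyset, X \in \mathscr G$. If $G \in \mathscr G$, then
$$h \wedge 1_{G^c} + \underbrace{h \wedge 1_G}_{\in\,\mathcal H} = h \wedge 1 + h \wedge 0 \in \mathcal H \quad \forall\, h \in \mathcal H,$$
which means that $h \wedge 1_{G^c} \in \mathcal H$ and $G^c \in \mathscr G$. For any two sets $G, H \in \mathscr G$ we see³
$$h \wedge 1_{G \cup H} = h \wedge (1_G \vee 1_H) = (h \wedge 1_G) \vee (h \wedge 1_H) \in \mathcal H \quad \forall\, h \in \mathcal H,$$
so that $G \cup H \in \mathscr G$. Finally, let $(G_j)_{j\in\mathbb N} \subset \mathscr G$; since $\mathscr G$ is $\cup$-stable, we may assume that $G_j \uparrow G := \bigcup_{j\in\mathbb N} G_j$. Then
$$(h \wedge 1_{G_j})_{j\in\mathbb N} \subset \mathcal H \quad\text{and}\quad \lim_{j\to\infty} h \wedge 1_{G_j} = h \wedge 1_G \in L^2(\mathscr A) \quad \forall\, h \in \mathcal H.$$
Since $|h \wedge 1_{G_j}| \leq |h \wedge 1_G| \in L^2(\mathscr A)$, an application of the dominated convergence theorem 12.9 shows that in $L^2$ $\lim_{j\to\infty} h \wedge 1_{G_j} = h \wedge 1_G$ for all $h \in \mathcal H$. Since $\mathcal H$ is a closed subspace, we conclude that $h \wedge 1_G \in \mathcal H$ for all $h \in \mathcal H$, thus $G \in \mathscr G$.

³ Use that $\mathcal H$ is a vector space and $a \vee b = -((-a) \wedge (-b))$.

We will now show that $L^2(\mathscr G) = \mathcal H$. If $f \in \mathcal H$ we know from our assumptions that $(\pm f) \wedge 0 \in \mathcal H$, so that $f^+ = -((-f) \wedge 0)$, $f^- = -(f \wedge 0) \in \mathcal H$. Thus $\mathcal H = \mathcal H^+ - \mathcal H^+$, and since also $L^2(\mathscr G) = L^2_+(\mathscr G) - L^2_+(\mathscr G)$ it is clearly enough to show that $\mathcal H^+ = L^2_+(\mathscr G)$.

Assume that $f \in \mathcal H^+$. Then for $a > 0$, $f \wedge a = a\big(\tfrac{f}{a} \wedge 1\big) \in \mathcal H$, and by monotone convergence T11.1 and the closedness of $\mathcal H$,
$$h \wedge 1_{\{f > a\}} = h \wedge \sup_{n\in\mathbb N} \big[(n f - n (f \wedge a)) \wedge 1\big] \in \mathcal H \quad \forall\, h \in \mathcal H,$$
proving that $\{f > a\} \in \mathscr G$ for all $a > 0$. Moreover, $\{f > a\} = X$ if $a < 0$ and $\{f > 0\} = \bigcup_{k\in\mathbb N} \{f > 1/k\}$, which shows that $\{f > a\} \in \mathscr G$ for all $a \in \mathbb R$ and, consequently, $\mathcal H^+ \subset L^2_+(\mathscr G)$.

Conversely, if $g \in L^2_+(\mathscr G)$, we can write $g$ as limit of simple functions, $g_n = \sum_{j=1}^{N(n)} y_j^{(n)} 1_{G_j^{(n)}}$ with disjoint sets $G_1^{(n)}, \ldots, G_{N(n)}^{(n)} \in \mathscr G$ and $y_j^{(n)} > 0$. For all $h \in \mathcal H^+$ we find
$$g_n \wedge h = \sum_{j=1}^{N(n)} \big(y_j^{(n)} 1_{G_j^{(n)}}\big) \wedge h = \sum_{j=1}^{N(n)} y_j^{(n)} \Big(1_{G_j^{(n)}} \wedge \frac{h}{y_j^{(n)}}\Big) \in \mathcal H$$
and dominated convergence T12.9 and the closedness of $\mathcal H$ imply that $g \wedge h \in \mathcal H$. Choosing, in particular, $h = n\,h_0$ for some a.e. strictly positive function $h_0$ and letting $n \to \infty$ gives
$$g = L^2\text{-}\lim_{n\to\infty} (n\,h_0) \wedge g \in \mathcal H^+,$$
where we again used monotone convergence T11.1 and the closedness of $\mathcal H$. This proves $L^2_+(\mathscr G) \subset \mathcal H^+$.

Finally, the sets $G_j := \{h_0 > 1/j\} \in \mathscr G$ satisfy $G_j \uparrow X$ and, because of the Markov inequality P10.12, $\mu(G_j) = \mu\{h_0 > 1/j\} \leq j^2 \int h_0^2\,d\mu < \infty$.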

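(i)⇒(iii): Note that $L^2(\mathscr G) \cap L^\infty(\mathscr A) = L^2(\mathscr G) \cap L^\infty(\mathscr G)$. An application of the dominated convergence theorem 12.9 shows that the sequence $f_n := (-n) \vee f \wedge n$, $n \in \mathbb N$, $f \in L^2(\mathscr G)$, converges in $L^2$ to $f$, i.e. $L^2(\mathscr G) \cap L^\infty(\mathscr G)$ is a dense subset of $L^2(\mathscr G)$. The element $h_0 > 0$ is now constructed as in the proof of (i)⇒(ii). That $L^2(\mathscr G) \cap L^\infty(\mathscr G)$ is an algebra is trivial.

(iii)⇒(iv): Let us show, first of all, that $\mathcal H \cap L^\infty(\mathscr A)$ is stable under minima. To this end we define recursively a sequence of polynomials in $\mathbb R$,
$$p_0(x) := 0, \qquad p_{n+1}(x) := p_n(x) + \tfrac12\big(x^2 - p_n^2(x)\big), \quad n \in \mathbb N_0.$$
By induction it is easy to see that $p_n(0) = 0$ for all $n \in \mathbb N_0$ and that
$$0 \leq p_n(x) \leq p_{n+1}(x) \leq |x| \quad \forall\, x \in [-1, 1].$$
For $n = 0$ there is nothing to show. Otherwise we can use the induction assumption $p_n(x) \leq p_{n+1}(x) \leq |x|$ to get
$$0 \leq p_{n+1}(x) \leq p_{n+1}(x) + \tfrac12\big(x^2 - p_{n+1}^2(x)\big) \overset{\text{def}}{=} p_{n+2}(x) = |x| - \underbrace{\big(|x| - p_{n+1}(x)\big)}_{\geq\,0} \cdot \underbrace{\big(1 - \tfrac12(|x| + p_{n+1}(x))\big)}_{\geq\,0 \text{ for } x \in [-1,1]} \leq |x|.$$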
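The behaviour of this recursion is easy to observe numerically. The following is a small sketch in floating-point arithmetic (so comparisons use a tolerance, which is an assumption of the sketch and not part of the proof); the function name `p` and the sample grid are chosen only for illustration. It iterates $p_{n+1} = p_n + \frac12(x^2 - p_n^2)$ from $p_0 = 0$ and records how far $p_n(x)$ stays below $|x|$ on $[-1,1]$.

```python
def p(n, x):
    """n-th iterate of p_{k+1} = p_k + (x^2 - p_k^2) / 2, starting from p_0 = 0."""
    value = 0.0
    for _ in range(n):
        value += 0.5 * (x * x - value * value)
    return value

grid = [k / 50 - 1 for k in range(101)]             # sample points in [-1, 1]
n = 200
worst_gap = max(abs(x) - p(n, x) for x in grid)     # largest shortfall |x| - p_n(x) on the grid

# p_k(x) <= p_{k+1}(x) on the grid, up to a rounding tolerance
monotone = all(p(k, x) <= p(k + 1, x) + 1e-12 for x in grid for k in range(20))
```

On the grid the iterates increase with $n$ and the shortfall $|x| - p_n(x)$ becomes small, which is consistent with the monotone convergence $p_n(x) \uparrow |x|$ used in the proof.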

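Therefore, $\lim_{n\to\infty} p_n(x) = \sup_{n\in\mathbb N} p_n(x)$ exists for all $|x| \leq 1$ and, according to the recursion relation, the limit equals $|x|$. Since $\mathcal H \cap L^\infty(\mathscr A)$ is a linear subspace which is stable under products, we get for every $h \in \mathcal H \cap L^\infty(\mathscr A)$ that $p_n(h/\|h\|_\infty) \in \mathcal H$, and monotone convergence T11.1 and the closedness of $\mathcal H$ show
$$\sup_{n\in\mathbb N} p_n\Big(\frac{h}{\|h\|_\infty}\Big) = \frac{|h|}{\|h\|_\infty} \in \mathcal H \implies |h| \in \mathcal H.$$
As $\mathcal H \cap L^\infty(\mathscr A)$ is dense in $\mathcal H$, we find for $h \in \mathcal H$ a sequence $(h_k)_{k\in\mathbb N} \subset \mathcal H \cap L^\infty(\mathscr A)$ such that $L^2\text{-}\lim_{k\to\infty} h_k = h$. From above we know, however, that $|h_k| \in \mathcal H$ and $|h_k| \to |h|$ in $L^2$ as $k \to \infty$, thus $|h| \in \mathcal H$. This shows, in particular, that
$$f \wedge h = \tfrac12\big(f + h - |f - h|\big) \in \mathcal H \quad \forall\, f, h \in \mathcal H.$$
Since $0 < h_0 \leq \|h_0\|_\infty$, we get for all $n > \|h_0\|_\infty$ that
$$\frac{n}{n - h_0} = \sum_{j=0}^\infty \Big(\frac{h_0}{n}\Big)^j = \sup_{N\in\mathbb N} \sum_{j=0}^N \Big(\frac{h_0}{n}\Big)^j$$
and for $h \in \mathcal H$,
$$h \wedge \frac{n}{n - h_0} = \lim_{N\to\infty} \Big(\sum_{j=0}^N \Big(\frac{h_0}{n}\Big)^j\Big) \wedge h.$$
By monotone convergence T11.1 we conclude that $h \wedge \frac{n}{n-h_0} \in \mathcal H$. Finally, as $\frac{n}{n - h_0} \downarrow 1$ and $\big|h \wedge \frac{n}{n-h_0}\big|^2 \leq h^2$, we can use the dominated convergence theorem 12.9 and the closedness of $\mathcal H$ to see that $h \wedge 1 \in \mathcal H$.

Problems

22.1. Let $(X,\mathscr A,\mu)$ be a measure space and $\mathscr H \subset \mathscr G$ be two sub-$\sigma$-algebras of $\mathscr A$. Show that
$$E^{\mathscr H} E^{\mathscr G} u = E^{\mathscr G} E^{\mathscr H} u = E^{\mathscr H} u \quad \forall\, u \in L^2(\mathscr A).$$

22.2. Let $(X,\mathscr A,\mu)$ be a measure space, $\mathscr G \subset \mathscr A$ be a sub-$\sigma$-algebra and let $\nu := f\mu$ where $f \in \mathcal M^+(\mathscr A)$ is a density $f > 0$.
(i) Denote by $E^{\mathscr G}_\mu$ resp. $E^{\mathscr G}_\nu$ the projections in the spaces $L^2(\mathscr A, \mu)$, resp. $L^2(\mathscr A, \nu)$. Express $E^{\mathscr G}_\nu$ in terms of $E^{\mathscr G}_\mu$. [Hint: $E^{\mathscr G}_\nu u = \big(E^{\mathscr G}_\mu(f u)/E^{\mathscr G}_\mu f\big)\,1_{\{E^{\mathscr G}_\mu f > 0\}}$.]
(ii) Under which condition do we have $E^{\mathscr G}_\nu u = E^{\mathscr G}_\mu u$ for all $u \in L^2(\mathscr A,\mu) \cap L^2(\mathscr A,\nu)$?
Remark. The above result allows us to study conditional expectations for finite measures only and to define for more general measures a conditional expectation by
$$E^{\mathscr G}_\nu u := \frac{E^{\mathscr G}_\mu(f u)}{E^{\mathscr G}_\mu f}\,1_{\{E^{\mathscr G}_\mu f > 0\}}.$$

22.3. Let $(X,\mathscr A,\mu)$ be a finite measure space, $G_1, \ldots, G_n \in \mathscr A$ such that $\bigcup_{j=1}^n G_j = X$ (disjoint union) and $\mu(G_j) > 0$ for all $j = 1, 2, \ldots, n$. Then
$$E^{\sigma(G_1,\ldots,G_n)} u = \sum_{j=1}^n \Big[\int_{G_j} u(x)\,\frac{\mu(dx)}{\mu(G_j)}\Big]\,1_{G_j}.$$
Remark. The measure $1_{G_j}\mu/\mu(G_j) = \mu(\bullet \cap G_j)/\mu(G_j)$ is often called the conditional probability given $G_j$.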
23 Conditional expectations in Lp

Throughout this chapter $(X,\mathscr A,\mu)$ will be some measure space. Our aim is to extend the operator $E^{\mathscr G}$ from $L^2(\mathscr A)$ to a wider class of functions including the spaces $L^p(\mathscr A)$, $1 \leq p \leq \infty$. We will use the same technique that allowed us in Chapters 9 and 10 to extend the integral from the simple functions $\mathcal E$ to the positive measurable functions $\mathcal M^+$ and integrable functions $\mathcal L^1$. Since we are now considering the spaces $L^p$ of (equivalence classes of) $p$th power integrable functions, it is convenient to have an analogous notion for measurable functions.

23.1 Definition Let $(X,\mathscr A,\mu)$ be a measure space. Two functions $u, v \in \mathcal M(\mathscr A)$ are called equivalent, $u \sim v$, if $\{u \neq v\} \in \mathscr A$ is a $\mu$-null set. We write $M(\mathscr A) := \mathcal M(\mathscr A)/\!\sim$ for the set of all equivalence classes of measurable functions $u \in \mathcal M(\mathscr A)$.

As with $L^p$-functions, all (in-)equalities between elements from $M(\mathscr A)$ hold pointwise almost everywhere.

23.2 Lemma Let $(X,\mathscr A,\mu)$ be a $\sigma$-finite measure space. Then $u \in M^+(\mathscr A)$ if, and only if, there exists an increasing sequence $(u_j)_{j\in\mathbb N} \subset L^2_+(\mathscr A)$ such that $u = \sup_{j\in\mathbb N} u_j$.

Proof The 'if' part '⇐' is trivial since suprema of countably many measurable functions are again measurable (C8.9). For '⇒' let $(A_j)_{j\in\mathbb N} \subset \mathscr A$ be an exhausting sequence such that $A_j \uparrow X$ and $\mu(A_j) < \infty$. If $u \in M^+(\mathscr A)$, then $u_k := (u \wedge k)\,1_{A_k} \in L^2_+(\mathscr A)$ and $\sup_{k\in\mathbb N} u_k = u$.

23.3 Remark Lemma 23.2 is no longer true if $(X,\mathscr A,\mu)$ is not $\sigma$-finite. In fact, if $1 \in M^+(\mathscr A)$ can be approximated by an increasing sequence $(u_k)_{k\in\mathbb N} \subset L^2_+(\mathscr A)$, $1 = \sup_{k\in\mathbb N} u_k$, the sets $A_k := \{u_k > 1/k\}$ would form an increasing sequence $A_k \uparrow X$ with $\mu(A_k) = \mu\{u_k > 1/k\} \leq k^2 \int u_k^2\,d\mu < \infty$ by the Markov inequality P10.12.
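The key technical point is the following result.

23.4 Lemma Let $(X,\mathscr A,\mu)$ be a measure space, $\mathscr G \subset \mathscr A$ be a sub-$\sigma$-algebra, and $(u_j)_{j\in\mathbb N}, (w_j)_{j\in\mathbb N} \subset L^2(\mathscr A)$ be two increasing sequences. Then
$$\sup_{j\in\mathbb N} u_j = \sup_{j\in\mathbb N} w_j \implies \sup_{j\in\mathbb N} E^{\mathscr G} u_j = \sup_{j\in\mathbb N} E^{\mathscr G} w_j. \tag{23.1}$$
If $u := \sup_{j\in\mathbb N} u_j$ is in $L^2(\mathscr A)$, the following conditional monotone convergence property holds:
$$\sup_{j\in\mathbb N} E^{\mathscr G} u_j = E^{\mathscr G}\big(\sup_{j\in\mathbb N} u_j\big) = E^{\mathscr G} u \tag{23.2}$$
in $L^2$ and almost everywhere.

Proof Let us first of all assume that $u_j \uparrow u$ and $u \in L^2(\mathscr A)$. Monotone convergence 11.1 and Theorem 12.7 show that $u_j \to u$ as $j \to \infty$ also in $L^2$-sense. By Theorem 22.4(xi), $(E^{\mathscr G} u_j)_{j\in\mathbb N}$ is again an increasing sequence and $E^{\mathscr G} u_j \leq E^{\mathscr G} u$. From 22.4(ii), (vi) we get
$$\|E^{\mathscr G} u - E^{\mathscr G} u_j\|_2 = \|E^{\mathscr G}(u - u_j)\|_2 \leq \|u - u_j\|_2,$$
i.e. $L^2\text{-}\lim_{j\to\infty} E^{\mathscr G} u_j = E^{\mathscr G} u$. For a subsequence $(u_{j_k})_{k\in\mathbb N} \subset (u_j)_{j\in\mathbb N}$ we even have $\lim_{k\to\infty} E^{\mathscr G} u_{j_k} = E^{\mathscr G} u$ a.e., cf. Corollary 12.8. Because of the monotonicity of the sequence $(E^{\mathscr G} u_j)_{j\in\mathbb N}$, we get for all $j > j_k$
$$|E^{\mathscr G} u - E^{\mathscr G} u_j| = E^{\mathscr G} u - E^{\mathscr G} u_j \leq E^{\mathscr G} u - E^{\mathscr G} u_{j_k}$$
and letting first $j \to \infty$ and then $k \to \infty$ gives $E^{\mathscr G} u_j \uparrow E^{\mathscr G} u$ a.e. This finishes the proof of (23.2).

If $(u_j)_{j\in\mathbb N}, (w_j)_{j\in\mathbb N} \subset L^2(\mathscr A)$ are any two increasing sequences¹ such that $\sup_{j\in\mathbb N} u_j = \sup_{j\in\mathbb N} w_j$, we can apply (23.2) to the increasing sequences $u_j \wedge w_k \uparrow u_j$ (as $k \to \infty$ and for fixed $j$) and $u_j \wedge w_k \uparrow w_k$ (as $j \to \infty$ and for fixed $k$). This shows
$$\sup_{j\in\mathbb N} E^{\mathscr G} u_j \overset{(23.2)}{=} \sup_{j\in\mathbb N} \sup_{k\in\mathbb N} E^{\mathscr G}(u_j \wedge w_k) = \sup_{k\in\mathbb N} \sup_{j\in\mathbb N} E^{\mathscr G}(u_j \wedge w_k) \overset{(23.2)}{=} \sup_{k\in\mathbb N} E^{\mathscr G} w_k.$$

¹ We do not assume that $\sup_{j\in\mathbb N} u_j, \sup_{j\in\mathbb N} w_j \in L^2(\mathscr A)$.

A combination of Lemmata 23.2 and 23.4 allows us to define conditional expectations for positive measurable functions in a $\sigma$-finite measure space.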

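23.5 Definition Let $(X,\mathscr A,\mu)$ be a $\sigma$-finite measure space and $\mathscr G \subset \mathscr A$ be a sub-$\sigma$-algebra. Let $u \in M^+(\mathscr A)$ and let $(u_j)_{j\in\mathbb N} \subset L^2_+(\mathscr A)$ be an increasing sequence such that $u = \sup_{j\in\mathbb N} u_j$. Then
$$\mathbf E^{\mathscr G} u := \sup_{j\in\mathbb N} E^{\mathscr G} u_j \tag{23.3}$$
is called the conditional expectation of $u$ with respect to $\mathscr G$. If $u \in M(\mathscr A)$ and $\mathbf E^{\mathscr G} u^\pm \in \mathbb R$ almost everywhere, we define (almost everywhere)
$$\mathbf E^{\mathscr G} u := \mathbf E^{\mathscr G} u^+ - \mathbf E^{\mathscr G} u^- = \lim_{j\to\infty} \big(E^{\mathscr G} u_j^+ - E^{\mathscr G} u_j^-\big) \tag{23.4}$$
where $u_j^\pm \uparrow u^\pm$ are suitable approximating sequences from $L^2_+(\mathscr A)$. We write $L^{\mathscr G}(\mathscr A)$ for the set of all functions $u \in M(\mathscr A)$ such that (almost everywhere) $\mathbf E^{\mathscr G} u$ exists and is finite.

23.6 Theorem Let $(X,\mathscr A,\mu)$ be a $\sigma$-finite measure space. The conditional expectation $\mathbf E^{\mathscr G}$ extends $E^{\mathscr G}$, i.e. $L^2(\mathscr A) \subset L^{\mathscr G}(\mathscr A)$ and $\mathbf E^{\mathscr G} u = E^{\mathscr G} u$ for all $u \in L^2(\mathscr A)$.

Proof Applying (23.2) to $u^+$ and $u^-$ shows $\mathbf E^{\mathscr G} u^\pm = E^{\mathscr G} u^\pm$ and, in particular, $\mathbf E^{\mathscr G} u^\pm \in L^2(\mathscr G)$. As such, $\mathbf E^{\mathscr G} u^\pm$ is a.e. real-valued, so that (23.4) is always defined in $M(\mathscr G)$, resp. $M(\mathscr A)$.

23.7 Theorem Let $(X,\mathscr A,\mu)$ be a $\sigma$-finite measure space. Then $L^{\mathscr G}(\mathscr A)$ is a vector space and $L^p(\mathscr A) \subset L^{\mathscr G}(\mathscr A)$ for all $1 \leq p \leq \infty$.

Proof Let $1$ … the sets $G_n := \{|E^{\mathscr G} u| > 1/n\} \in \mathscr G$ have finite $\mu$-measure,
$$\mu(G_n) = \mu\{|E^{\mathscr G} u| > 1/n\} \leq n^2 \int (E^{\mathscr G} u)^2\,d\mu < \infty.$$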

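Moreover,
$$\|E^{\mathscr G} u\; 1_{G_n}\|_p^p = \big\langle E^{\mathscr G} u,\ |E^{\mathscr G} u|^{p-1} \operatorname{sgn}(E^{\mathscr G} u)\,1_{G_n}\big\rangle = \big\langle u,\ |E^{\mathscr G} u|^{p-1} \operatorname{sgn}(E^{\mathscr G} u)\,1_{G_n}\big\rangle \leq C_q\,\|u\|_p$$
(by T22.4(iii), (ix)), where we used Hölder's inequality T12.2 with $p^{-1} + q^{-1} = 1$ and
$$C_q = \Big(\int |E^{\mathscr G} u|^{(p-1)q}\,1_{G_n}\,d\mu\Big)^{1/q} = \Big(\int |E^{\mathscr G} u|^p\,1_{G_n}\,d\mu\Big)^{1/q}.$$
Dividing the above inequality by $C_q$ (if $C_q = 0$ there is nothing to show since in this instance $E^{\mathscr G} u = 0$ on $G_n$) gives
$$\|E^{\mathscr G} u\; 1_{G_n}\|_p \leq \|u\|_p.$$
As we have seen above, $\{E^{\mathscr G} u \neq 0\} = \bigcup_{n\in\mathbb N} G_n$, so we can use Beppo Levi's theorem 9.6 to find for all $u \in L^2(\mathscr A) \cap L^p(\mathscr A)$
$$\|E^{\mathscr G} u\|_p = \|E^{\mathscr G} u\; 1_{\{E^{\mathscr G} u \neq 0\}}\|_p$$

… $\delta > 0$ such that
$$\int_A w\,d\mu < \delta \implies \int_A |u|\,d\mu < \epsilon \quad \forall\, A \in \mathscr A. \tag{23.13}$$
Since for all $j \in \mathbb N$ and $c > 0$
$$c \int_{\{|\mathbf E^{\mathscr A_j} u| > c\,w\}} w\,d\mu \leq \int |\mathbf E^{\mathscr A_j} u|\,d\mu \leq \int \mathbf E^{\mathscr A_j} |u|\,d\mu = \int |u|\,d\mu,$$
we may choose $c = c_0 := \delta^{-1} \int |u|\,d\mu$, which implies that for a given $\epsilon > 0$
$$\int_{\{|\mathbf E^{\mathscr A_j} u| > c_0 w\}} w\,d\mu \leq \delta.$$
Since $\{|\mathbf E^{\mathscr A_j} u| > c_0 w\} \in \mathscr A_j$, the martingale property implies
$$\int_{\{|\mathbf E^{\mathscr A_j} u| > c_0 w\}} |\mathbf E^{\mathscr A_j} u|\,d\mu \overset{23.8\text{(xi)}}{\leq} \int_{\{|\mathbf E^{\mathscr A_j} u| > c_0 w\}} \mathbf E^{\mathscr A_j} |u|\,d\mu \overset{23.9}{=} \int_{\{|\mathbf E^{\mathscr A_j} u| > c_0 w\}} |u|\,d\mu \overset{(23.13)}{\leq} \epsilon \quad \forall\, j \in \mathbb N,$$
which is but uniform integrability of the family $(\mathbf E^{\mathscr A_j} u)_{j\in\mathbb N}$.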


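The convergence assertions follow now from the convergence theorem for UI submartingales T18.6. (ii) follows directly from Theorem 18.6.

Since the conditional Jensen inequality needs fewer assumptions than the classical Jensen inequality we can improve Example 17.3(v), (vi).

23.16 Corollary Let $(X,\mathscr A,\mathscr A_j,\mu)$ be a $\sigma$-finite filtered measure space and $(u_j)_{j\in\mathbb N}$ be a family of measurable functions $u_j \in L^{\mathscr A_j}$ which satisfies the [sub-]martingale property²
$$u_j = \mathbf E^{\mathscr A_j} u_{j+1} \quad\text{resp.}\quad u_j \leq \mathbf E^{\mathscr A_j} u_{j+1}.$$
If $V : \mathbb R \to \mathbb R$ is a [monotone increasing] convex function such that $V(u_j) \in L^1(\mathscr A_j)$, then $(V(u_j))_{j\in\mathbb N}$ is a submartingale.

Proof Since $(u_j)_{j\in\mathbb N}$ satisfies the [sub-]martingale property, we find from Jensen's inequality C23.13 [and the monotonicity of $V$] that
$$V(u_j) \leq V\big(\mathbf E^{\mathscr A_j} u_{j+1}\big) \leq \mathbf E^{\mathscr A_j} V(u_{j+1}).$$

23.17 Example In Example 17.3(ix) we introduced a dyadic filtration on the measure space $([0,\infty)^n, \mathscr B[0,\infty)^n, \lambda = \lambda^n|_{[0,\infty)^n})$ given by
$$\mathscr A_j := \sigma\big(z + [0, 2^{-j})^n : z \in 2^{-j}\mathbb N_0^n\big), \quad j \in \mathbb N_0.$$
For $u \in L^1([0,\infty)^n, \lambda)$ and all $j \in \mathbb N_0$ we can now rewrite (17.4) as
$$\mathbf E^{\mathscr A_j} u(x) = \sum_{z \in 2^{-j}\mathbb N_0^n} \int u\,\frac{1_{z + [0,2^{-j})^n}}{\lambda\big(z + [0,2^{-j})^n\big)}\,d\lambda\;\; 1_{z + [0,2^{-j})^n}(x).$$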
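In one dimension and on the unit interval, the dyadic conditional expectation is simply a block average. The following minimal sketch assumes a function that is already constant on the $2^J$ cells $[k2^{-J}, (k+1)2^{-J})$ of $[0,1)$ and is stored as a list of cell values; the name `dyadic_cond_exp` and this discretization are assumptions made for illustration, not the setting of Example 23.17 itself. It computes the averages over dyadic intervals of length $2^{-j}$ and checks the martingale (tower) relation between consecutive levels.

```python
from fractions import Fraction

def dyadic_cond_exp(values, j):
    """Average a function given on 2**J equal cells of [0, 1) over dyadic
    intervals of length 2**-j (requires 0 <= j <= J); the result is again
    a list of 2**J cell values."""
    J = len(values).bit_length() - 1
    block = 2 ** (J - j)                       # number of fine cells per dyadic interval
    out = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        mean = sum(chunk, Fraction(0)) / block
        out.extend([mean] * block)
    return out

J = 4
u = [Fraction(k * k, 4 ** J) for k in range(2 ** J)]    # x^2 sampled at left endpoints
levels = [dyadic_cond_exp(u, j) for j in range(J + 1)]  # levels[j] plays the role of E^{A_j} u

# Martingale property: averaging level j + 1 over level-j blocks gives level j.
is_martingale = all(dyadic_cond_exp(levels[j + 1], j) == levels[j] for j in range(J))
```

At the finest level the function is reproduced, and at level $0$ one obtains the overall mean, in line with Theorem 22.4(xiii) for the trivial $\sigma$-algebra on a probability space.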

23.18 Remark In Theorem 22.5 we found necessary and sufficient conditions that a projection in $L^2$ is a conditional expectation. This result has a counterpart in the spaces $L^p$, $p \neq 2$, which we want to mention here without proof. Details can be found in the monograph by Neveu [31, pp. 12–16]. Let $(X,\mathscr A,\mu)$ be a finite measure space. Then
(i) Let $p \in [1,\infty)$. Every bounded linear operator $T : L^p(\mathscr A) \to L^p(\mathscr A)$ such that $\int Tf\,d\mu = \int f\,d\mu$, $f \in L^p(\mathscr A)$, and $T(f \cdot Tg) = Tf \cdot Tg$, $f \in L^\infty(\mathscr A)$, $g \in L^p(\mathscr A)$, is a conditional expectation w.r.t. some sub-$\sigma$-algebra $\mathscr G \subset \mathscr A$.
(ii) Let $p \in [1,\infty)$, $p \neq 2$. Every linear contraction $T : L^p(\mathscr A) \to L^p(\mathscr A)$ such that $T^2 = T$ and $T1 = 1$ is a conditional expectation w.r.t. some sub-$\sigma$-algebra $\mathscr G \subset \mathscr A$.

² This is slightly more general than assuming that $(u_j)_{j\in\mathbb N}$ is a [sub-]martingale since [sub-]martingales are, by definition, integrable.

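Separability criteria for the spaces $L^p(X,\mathscr A,\mu)$

Let $(X,\mathscr A,\mu)$ be a measure space. Recall that $L^p(\mathscr A)$ is separable if it contains a countable dense subset $(d_j)_{j\in\mathbb N} \subset L^p(\mathscr A)$. We have seen in Chapter 21 that the Hilbert space $L^2(\mathscr A)$ is separable if we can find a countable complete ONS $(e_j)_{j\in\mathbb N} \subset L^2(\mathscr A)$ since the system $\{q_1 e_1 + \cdots + q_N e_N : N \in \mathbb N,\ q_j \in \mathbb Q\}$ is both countable and dense. Conversely, using any countable dense subset $(d_j)_{j\in\mathbb N}$ as input for the Gram–Schmidt orthonormalization procedure (21.10), produces a complete countable ONS. Here is a simple sufficient criterion for the separability of $L^p$.

23.19 Lemma Let $(X,\mathscr A,\mu)$ be a $\sigma$-finite measure space and assume that the $\sigma$-algebra $\mathscr A$ is countably generated, i.e. $\mathscr A = \sigma(A_j : j \in \mathbb N)$, $A_j \subset X$. Then $L^p(X,\mathscr A,\mu)$, $1 \leq p < \infty$, is separable.

Proof Step 1: Let us first assume that $\mu$ is a finite measure. Consider the $\sigma$-algebras $\mathscr A_n := \sigma(A_1, \ldots, A_n)$; then $\mathscr A_1 \subset \mathscr A_2 \subset \ldots \subset \mathscr A_\infty = \sigma(\mathscr A_j : j \in \mathbb N)$ is a filtration, $\mu|_{\mathscr A_j}$ is trivially $\sigma$-finite for every $j \in \mathbb N$ and $\mathscr A_\infty = \mathscr A$.

Set $u_j := \mathbf E^{\mathscr A_j} u$ for $u \in L^1(\mathscr A)$. By Theorem 23.15 $(u_j, \mathscr A_j)_{j\in\mathbb N}$ is a uniformly integrable martingale, hence $u_j \to u$ as $j \to \infty$ in $L^1$ and a.e. If $v \in L^p(\mathscr A)$, we set $v_j := |\mathbf E^{\mathscr A_j} v|^p \leq \mathbf E^{\mathscr A_j}(|v|^p)$ (by Corollary 23.13) and observe that $(v_j, \mathscr A_j)_{j\in\mathbb N}$ is a submartingale, cf. Theorem 23.15, which is uniformly integrable. The latter follows easily from
$$\int_{\{v_j > w\}} v_j\,d\mu \leq \int_{\{v_j > w\}} \mathbf E^{\mathscr A_j}(|v|^p)\,d\mu \leq \int_{\{\mathbf E^{\mathscr A_j}(|v|^p) > w\}} \mathbf E^{\mathscr A_j}(|v|^p)\,d\mu$$
and the uniform integrability of the family $(\mathbf E^{\mathscr A_j}(|v|^p))_{j\in\mathbb N}$, see Theorem 23.15. From the (sub-)martingale convergence Theorem 18.6 we conclude that $v_j \to |\mathbf E^{\mathscr A_\infty} v|^p = |v|^p$ in $L^1$ and a.e., and Riesz' theorem 12.10 shows $v_j^{1/p} = |\mathbf E^{\mathscr A_j} v| \to |v|$ in $L^p$. Consequently, $\mathbf E^{\mathscr A_j} v \to v$ in $L^p$ as $j \to \infty$.

Since the $\sigma$-algebra $\mathscr A_j$ is generated by finitely many sets, $\mathbf E^{\mathscr A_j} u$, resp. $\mathbf E^{\mathscr A_j} v$ are simple functions with canonical representations of the form
$$s = \sum_{k=1}^N y_k 1_{B_k}, \qquad y_k \neq 0,\ B_1, \ldots, B_N \in \mathscr A_j \text{ disjoint};$$
as $j$ is kept fixed, we suppress the dependence of $y_k, B_k, N$ on $j$. If $y_k \in \mathbb R$, we find for every $\epsilon > 0$ numbers $y_k^\epsilon \in \mathbb Q$ such that
$$|y_k - y_k^\epsilon| \leq \frac{\epsilon}{N\,\mu(X)^{1/p}}.$$
The triangle inequality now shows
$$\Big\|s - \sum_{k=1}^N y_k^\epsilon 1_{B_k}\Big\|_p \leq \sum_{k=1}^N |y_k - y_k^\epsilon|\,\mu(B_k)^{1/p} \leq \epsilon,$$
which proves that the system
$$D := \bigcup_{j\in\mathbb N} \Big\{\sum_{k=1}^N q_k 1_{B_k} : N \in \mathbb N,\ q_k \in \mathbb Q,\ B_k \in \mathscr A_j\Big\}$$
is a countable dense subset of the space $L^p(X,\mathscr A,\mu)$, $1 \leq p < \infty$.

Step 2: If $\mu$ is $\sigma$-finite but not finite, we choose an exhausting sequence $(C_j)_{j\in\mathbb N} \subset \mathscr A$ such that $C_j \uparrow X$ and $\mu(C_j) < \infty$ and consider the finite measures $\mu_j := \mu(\bullet \cap C_j)$, $j \in \mathbb N$, on $C_j \cap \mathscr A$. Since every $u \in L^p(\mu_j) = L^p(C_j, C_j \cap \mathscr A, \mu_j)$ can be extended by $0$ on the set $X \setminus C_j$ and becomes an element of $L^p(\mu_{j+1})$, we can interpret the sets $L^p(\mu_j)$ as a chain of increasing subspaces of each other and of $L^p(\mu)$:
$$L^p(C_j, C_j \cap \mathscr A, \mu_j) \subset L^p(C_{j+1}, C_{j+1} \cap \mathscr A, \mu_{j+1}) \subset \ldots \subset L^p(X,\mathscr A,\mu).$$
Applying the construction from step 1 to each of the sets $L^p(\mu_j)$ furnishes countable dense subsets $D_j$. Obviously, $D := \bigcup_{j\in\mathbb N} D_j$ is a countable set but it is also dense in $L^p(\mu)$. To see this, fix $\epsilon > 0$ and $u \in L^p(\mu)$. Since $X \setminus C_j \downarrow \emptyset$, we find by Lebesgue's dominated convergence theorem 12.9 some $N_\epsilon \in \mathbb N$ such that $\int_{X \setminus C_j} |u|^p\,d\mu < \epsilon^p$ for all $j \geq N_\epsilon$. Since $D_j$ is dense in $L^p(\mu_j)$ and since $u\,1_{C_j} \in L^p(\mu_j)$, there is some $d_j \in D_j \subset D$ with $\|u\,1_{C_j} - d_j\|_{L^p(\mu_j)} \leq \epsilon$, and altogether we get for large $j \geq N_\epsilon$
$$\|u - d_j\|_p \leq \|u\,1_{C_j} - d_j\|_{L^p(\mu_j)} + \|u\,1_{X \setminus C_j}\|_{L^p(\mu)} \leq 2\epsilon.$$

If the underlying set $X$ is a separable metric space (cf. Appendix B), the criterion of Lemma 23.19 becomes particularly simple.

23.20 Corollary Let $(X,\rho)$ be a separable metric space equipped with its Borel $\sigma$-algebra $\mathscr B = \mathscr B(X)$. Then $L^p(X,\mathscr B,\mu)$, $1 \leq p < \infty$, is separable for every $\sigma$-finite measure $\mu$ on $(X,\mathscr B)$. If $\mu$ is not $\sigma$-finite, $L^p(X,\mathscr B,\mu)$ need not be separable.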

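Proof Denote by $D \subset X$ a countable dense subset and consider the countable system of open balls $B_r(d) := \{x \in X : \rho(x,d) < r\}$
$$\mathscr D := \{B_r(d) : d \in D,\ r \in \mathbb Q^+\} \subset \mathscr O(X).$$
Every open set $U \in \mathscr O(X)$ can be written as³
$$U = \bigcup_{B_r(d) \subset U,\ B_r(d) \in \mathscr D} B_r(d),$$
which shows that $\mathscr O(X) \subset \sigma(\mathscr D) \subset \sigma(\mathscr O(X)) = \mathscr B(X)$. Thus the Borel sets $\mathscr B(X) = \sigma(\mathscr D)$ are countably generated, and the assertion follows from Lemma 23.19.

³ The inclusion '⊃' is obvious, for '⊂' fix $x \in U$. Then there exists some $r \in \mathbb Q^+$ with $B_r(x) \subset U$. Since $D$ is dense, $x \in B_{r/2}(d)$ for some $d \in D$ with $\rho(d,x) < r/4$, so that $x \in B_{r/2}(d) \subset U$.

If $\mu$ is not $\sigma$-finite, we have the following counterexample: take $X = [0,1]$ with its natural Euclidean metric $\rho(x,y) = |x - y|$ and let $\mu$ be the counting measure on $([0,1], \mathscr B[0,1])$, i.e. $\mu(B) := \#B$. Obviously, $\mu$ is not $\sigma$-finite. The $p$th power $\mu$-integrable simple functions are of the form
$$\mathcal E \cap L^p(\mu) = \Big\{\sum_{j=1}^N y_j 1_{A_j} : N \in \mathbb N,\ y_j \in \mathbb R,\ A_j \in \mathscr B[0,1],\ \#A_j < \infty\Big\}$$
so that
$$L^p(\mu) = \Big\{u : [0,1] \to \mathbb R : \exists\,(x_j)_{j\in\mathbb N} \subset [0,1],\ u(x) = 0\ \ \forall\, x \neq x_j \ \text{ and } \sum_{j=1}^\infty |u(x_j)|^p < \infty\Big\}.$$
Obviously, $(1_{\{x\}})_{x \in [0,1]} \subset L^p(\mu)$, but no single countable system can approximate this family since
$$\|1_{\{x\}} - 1_{\{y\}}\|_p^p = 0 \quad\text{or}\quad 2$$
according to whether $x = y$ or $x \neq y$.

With somewhat more effort we can show that the conditions of Lemma 23.19 are even necessary.

23.21 Theorem Let $(X,\mathscr A,\mu)$ be a $\sigma$-finite measure space. Then the following assertions are equivalent.
(i) $\mathscr A$ is (almost) separable, i.e. there exists a countable family $\mathscr F \subset \mathscr A$ such that $\mu(F) < \infty$ for all $F \in \mathscr F$ and $\sigma(\mathscr F) \approx \mathscr A$ in the sense that every set in $\mathscr A$ has, up to a null set, a version in $\sigma(\mathscr F)$.
(ii) $(\mathscr A, \mu)$ is separable,⁴ i.e. there exists a countable family $\mathscr F \subset \mathscr A$ such that $\mu(F) < \infty$ and for every $A \in \mathscr A$ with $\mu(A) < \infty$ we have
$$\forall\, \epsilon > 0 \quad \exists\, F_\epsilon \in \mathscr F : \quad \mu(A \setminus F_\epsilon) + \mu(F_\epsilon \setminus A) \leq \epsilon.$$
(iii) $L^p(X,\mathscr A,\mu)$ is separable, $1 \leq p < \infty$.

⁴ This notion derives from the fact that $(\mathscr A, \rho_\mu)$, $\rho_\mu(A,B) := \mu(A \setminus B) + \mu(B \setminus A)$, $A, B \in \mathscr A$, becomes a separable pseudo-metric space in the usual sense, cf. Appendix B.

Proof (i)⇒(iii): The proof of Lemma 23.19 shows that $L^p(X, \sigma(\mathscr F), \mu)$ is separable. Since for each $A \in \mathscr A$ there is an $A^* \in \sigma(\mathscr F)$ with
$$\mu\big((A \setminus A^*) \cup (A^* \setminus A)\big) = 0 \iff \int |1_A - 1_{A^*}|\,d\mu = 0,$$
every simple function $\phi \in \mathcal E(\mathscr A)$ has a version $\phi^* \in \mathcal E(\sigma(\mathscr F))$ such that $\int |\phi - \phi^*|\,d\mu = 0$. This proves that $L^p(X,\sigma(\mathscr F),\mu) \supset L^p(X,\mathscr A,\mu)$ (we have, in fact, equality since $\sigma(\mathscr F) \subset \mathscr A$), and we see that $L^p(X,\mathscr A,\mu)$ is separable.

(iii)⇒(ii): Denote by $(d_j)_{j\in\mathbb N}$ a countable dense subset of $L^p(\mathscr A)$. Since $\mathcal E(\mathscr A) \cap L^p(\mathscr A)$ is dense in $L^p(\mathscr A)$, cf. Lemma 12.11, we find for each $d_j$ a sequence $(f_{jk})_{k\in\mathbb N} \subset \mathcal E(\mathscr A) \cap L^p(\mathscr A)$ such that $L^p\text{-}\lim_{k\to\infty} f_{jk} = d_j$. Thus $(f_{jk})_{j,k\in\mathbb N}$ is also dense in $L^p(\mathscr A)$, and the system of subsets
$$\mathscr F := \Big\{\bigcup_{\ell=1}^N \{f_{jk} = r_\ell\} : N \in \mathbb N,\ j,k \in \mathbb N,\ r_\ell \in \mathbb R \setminus \{0\}\Big\}$$
is countable since each $f_{jk}$ attains only finitely many values. For every $A \in \mathscr A$, $\mu(A) < \infty$, we have $1_A \in L^p(\mathscr A)$, and we find a subsequence $(f_\ell^A)_{\ell\in\mathbb N} \subset (f_{jk})_{j,k\in\mathbb N}$ with $\lim_{\ell\to\infty} \|f_\ell^A - 1_A\|_p^p = 0$. Set $F_\ell := \{|f_\ell^A - 1_A| \leq 1/2\} \cap \{f_\ell^A > 1/2\}$. Obviously $F_\ell \in \mathscr F$, and $F_\ell \subset A$ since
$$A^c \cap F_\ell = A^c \cap \{|f_\ell^A - 1_A| \leq 1/2\} \cap \{f_\ell^A > 1/2\} = A^c \cap \{|f_\ell^A| \leq 1/2\} \cap \{f_\ell^A > 1/2\} = \emptyset.$$
Thus $\mu(F_\ell \setminus A) = 0$, while
$$\mu(A \setminus F_\ell) \leq \mu\big(A \cap \{|f_\ell^A - 1_A| > 1/2\}\big) + \mu\big(A \cap \{f_\ell^A \leq 1/2\}\big).$$
Using the triangle inequality, we infer
$$A \cap \{f_\ell^A \leq 1/2\} \subset A \cap \{|f_\ell^A - 1| \geq 1/2\} = A \cap \{|f_\ell^A - 1_A| \geq 1/2\}$$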


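and with the above calculation and an application of Markov's inequality we conclude that
$$\mu(A \setminus F_\ell) + \mu(F_\ell \setminus A) \leq 2\,\mu\big(A \cap \{|f_\ell^A - 1_A| \geq 1/2\}\big) \overset{10.12}{\leq} 2^{p+1}\,\|f_\ell^A - 1_A\|_p^p.$$
The right-hand side of the above inequality tends to $0$ as $\ell \to \infty$, and (ii) follows.

(ii)⇒(i): Fix $A \in \mathscr A$ with $\mu(A) < \infty$. Then we find, by assumption, sets $F_n \in \mathscr F$ with $\mu(A \setminus F_n) + \mu(F_n \setminus A) \leq 2^{-n}$. Consider the sets
$$F^* := \bigcap_{k=1}^\infty \bigcup_{n=k}^\infty F_n \quad\text{and}\quad F_* := \bigcup_{k=1}^\infty \bigcap_{n=k}^\infty F_n.$$
Then using the continuity of measures T4.4 and $\sigma$-subadditivity C4.6,
$$\begin{aligned}
\mu(F^* \setminus A) + \mu(A \setminus F_*) &= \mu\Big(\bigcap_{k=1}^\infty \bigcup_{n=k}^\infty F_n \setminus A\Big) + \mu\Big(A \cap \bigcap_{k=1}^\infty \bigcup_{n=k}^\infty F_n^c\Big) \\
&= \mu\Big(\bigcap_{k=1}^\infty \bigcup_{n=k}^\infty (F_n \setminus A)\Big) + \mu\Big(\bigcap_{k=1}^\infty \bigcup_{n=k}^\infty (A \cap F_n^c)\Big) \\
&= \lim_{k\to\infty} \mu\Big(\bigcup_{n=k}^\infty (F_n \setminus A)\Big) + \lim_{k\to\infty} \mu\Big(\bigcup_{n=k}^\infty (A \setminus F_n)\Big) \\
&\leq \lim_{k\to\infty} \sum_{n=k}^\infty \big(\mu(F_n \setminus A) + \mu(A \setminus F_n)\big) \\
&\leq \lim_{k\to\infty} \sum_{n=k}^\infty 2^{-n} = 0.
\end{aligned}$$
This shows that for all $A \in \mathscr A$ with $\mu(A) < \infty$
$$\exists\, F_*, F^* \in \sigma(\mathscr F), \quad F_* \subset F^* \quad\text{and}\quad \mu(F^* \setminus A) + \mu(A \setminus F_*) = 0,$$
implying that $\mu(F^* \setminus A) + \mu(A \setminus F^*) = 0$ also.

If $\mu(A) = \infty$ we pick some exhausting sequence $(A_k)_{k\in\mathbb N} \subset \mathscr A$ with $A_k \uparrow X$ and $\mu(A_k) < \infty$. Then the sets $A \cap A_k$ have finite $\mu$-measure and we can construct, as before, sets $F_k^*$ and $F_*^k$. Setting $F^* := \bigcup_{k\in\mathbb N} F_k^*$ we find
$$\bigcup_{k\in\mathbb N} F_k^* \setminus \bigcup_{j\in\mathbb N} (A \cap A_j) = \bigcup_{k\in\mathbb N} \Big(F_k^* \setminus \bigcup_{j\in\mathbb N} (A \cap A_j)\Big) \subset \bigcup_{k\in\mathbb N} \big(F_k^* \setminus (A \cap A_k)\big)$$
and so
$$\mu(F^* \setminus A) \leq \mu\Big(\bigcup_{k\in\mathbb N} F_k^* \setminus (A \cap A_k)\Big) \leq \sum_{k=1}^\infty \underbrace{\mu\big(F_k^* \setminus (A \cap A_k)\big)}_{=\,0} = 0.$$
The expression $\mu(A \setminus F^*)$ is handled analogously. This shows that sets from $\sigma(\mathscr F)$ and $\mathscr A$ differ by at most a null set.

Problems

23.1. Complete the proof of Theorem 23.8.

23.2. Show that $\mathbf E^{\mathscr G} 1 = 1$ if, and only if $\mu|_{\mathscr G}$ is $\sigma$-finite. Find a counterexample showing that $\mathbf E^{\mathscr G} 1 \leq 1$ is, in general, best possible. [Hint: use $p = 2$ and $\mathbf E^{\mathscr G} = E^{\mathscr G}$.]

23.3. Let $\mathscr G$ be a sub-$\sigma$-algebra of $\mathscr A$. Show that $\mathbf E^{\mathscr G} g = g$ for all $g \in L^p(\mathscr G)$. [Hint: observe that, a.e., $g = g\,1_{\bigcup_j \{|g| > 1/j\}}$ and $\mu\{|g| > 1/j\} < \infty$. This emulates $\sigma$-finiteness.]

23.4. Let $\mathscr H \subset \mathscr G$ be two sub-$\sigma$-algebras of $\mathscr A$. Show that $\mathbf E^{\mathscr H} \mathbf E^{\mathscr G} u = \mathbf E^{\mathscr G} \mathbf E^{\mathscr H} u = \mathbf E^{\mathscr H} u$ for all $u \in L^p(\mathscr A)$, resp. for all $u \in M^+(\mathscr A)$, provided $\mu|_{\mathscr H}$ is $\sigma$-finite. [Hint: if $\mu|_{\mathscr H}$ is not $\sigma$-finite, the set $L^p(\mathscr H)$ can be very small ….]

23.5. Consider on the measure space $([0,\infty), \mathscr B[0,\infty), \lambda = \lambda^1|_{[0,\infty)})$ the filtration $\mathscr A_n := \sigma\big([0,1), [1,2), \ldots, [n-1,n), [n,\infty)\big)$. Find $\mathbf E^{\mathscr A_n} u$ for $u \in L^p$.

23.6. Let $(X,\mathscr A,\mu)$ be a measure space and $\mathscr G \subset \mathscr A$ be a sub-$\sigma$-algebra. Show that, in general,
$$\int \mathbf E^{\mathscr G} u\,d\mu \leq \int u\,d\mu, \quad u \in L^1_+(\mathscr A),$$
with equality holding only if $\mu|_{\mathscr G}$ is $\sigma$-finite.

23.7. Prove Corollaries 23.11 and 23.12.

23.8. Let $(X,\mathscr A,\mathscr A_j,\mu)$ be a $\sigma$-finite filtered measure space and denote by $\langle u, \phi\rangle$ the canonical dual pairing between $u \in L^p$ and $\phi \in L^q$, $p^{-1} + q^{-1} = 1$, i.e. $\langle u, \phi\rangle := \int u\,\phi\,d\mu$. A sequence $(u_j)_{j\in\mathbb N} \subset L^p$ is weakly relatively compact if there exists a subsequence $(u_{j_k})_{k\in\mathbb N}$ such that $\langle u_{j_k} - u, \phi\rangle \to 0$ as $k \to \infty$ holds for all $\phi \in L^q$ and some $u \in L^p$. Show that for a martingale $(u_j)_{j\in\mathbb N}$ and every $p \in (1,\infty)$ the following assertions are equivalent:
(i) there exists some $u \in L^p(\mathscr A_\infty)$ such that $\lim_{j\to\infty} \|u_j - u\|_p = 0$;
(ii) there exists some $u_\infty \in L^p(\mathscr A_\infty)$ such that $u_j = \mathbf E^{\mathscr A_j} u_\infty$;
(iii) the sequence $(u_j)_{j\in\mathbb N}$ is weakly relatively compact.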


23.9. Let $(X,\mathscr A,\mu)$ be a measure space and $(u_j)_{j\in\mathbb N} \subset L^1(\mathscr A)$. Show that
$$m_1 := u_1, \qquad m_{j+1} - m_j := u_{j+1} - \mathbf E^{\mathscr A_j} u_{j+1}$$
is a martingale under the filtration $\mathscr A_j := \sigma(u_1, \ldots, u_j)$.

23.10. (Continuation of Problem 23.9). If $\int u_1\,d\mu = 0$ and $\mathbf E^{\mathscr A_j} u_{j+1} = 0$ then $(u_j)_{j\in\mathbb N}$ is called a martingale difference sequence. Assume that $u_j \in L^2(\mathscr A)$ and denote by $s_k := u_1 + \cdots + u_k$ the partial sums. Show that $(s_j^2, \mathscr A_j)_{j\in\mathbb N}$ is a submartingale satisfying
$$\int s_k^2\,d\mu = \sum_{j=1}^k \int u_j^2\,d\mu.$$

23.11. Doob decomposition. Let $(X,\mathscr A,\mathscr A_j,\mu)$ be a $\sigma$-finite filtered measure space and let $(s_j, \mathscr A_j)_{j\in\mathbb N}$ be a submartingale. Show that there exists an a.e. unique martingale $(m_j, \mathscr A_j)_{j\in\mathbb N}$ and an increasing sequence of functions $(a_j)_{j\in\mathbb N}$ such that $a_j \in L^1(\mathscr A_{j-1})$ for all $j \geq 2$ and
$$s_j = m_j + a_j, \quad j \in \mathbb N.$$
[Hint: set $m_0 := u_0$, $m_{j+1} - m_j := u_{j+1} - \mathbf E^{\mathscr A_j} u_{j+1}$ and $a_0 := 0$, $a_{j+1} - a_j := \mathbf E^{\mathscr A_j} u_{j+1} - u_j$. For uniqueness assume $\tilde m_j + \tilde a_j$ is a further Doob decomposition and study the measurability properties of the martingale $M_j := m_j - \tilde m_j = \tilde a_j - a_j$.]

23.12. Let $(\Omega, \mathscr A, P)$ be a probability space and let $(X_j)_{j\in\mathbb N}$ be a sequence of independent identically distributed random variables such that $P(X_j = 0) = P(X_j = 2) = \tfrac12$. Set $M_k := \prod_{j=1}^k X_j$. Show that there does not exist any filtration $(\mathscr A_j)_{j\in\mathbb N}$ and no random variable $M$ such that $M_k = \mathbf E^{\mathscr A_k} M$. [Hint: compare with Example 17.3(xi).]
Remark. This example shows that not all martingales can be obtained as conditional expectations of a single function.
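The phenomenon behind Problem 23.12 can be made visible by computing the law of the product $M_k$ exactly. The sketch below assumes the two-point distribution from the problem and enumerates all $2^k$ outcomes; the function name `law_of_M` is hypothetical. It shows that the mean of $M_k$ stays equal to $1$ while the probability of $\{M_k \neq 0\}$ is $2^{-k}$, so all the mass concentrates on an event of vanishing probability. This is only an illustration and not a solution of the problem.

```python
from fractions import Fraction
from itertools import product

def law_of_M(k):
    """Exact distribution of M_k = X_1 * ... * X_k for independent X_j
    with P(X_j = 0) = P(X_j = 2) = 1/2, as a dict value -> probability."""
    law = {}
    for outcome in product((0, 2), repeat=k):
        m = 1
        for x in outcome:
            m *= x
        law[m] = law.get(m, Fraction(0)) + Fraction(1, 2 ** k)
    return law

def mean(law):
    return sum(value * prob for value, prob in law.items())

means = [mean(law_of_M(k)) for k in range(1, 8)]        # expectation of M_k for k = 1..7
nonzero = [1 - law_of_M(k)[0] for k in range(1, 8)]     # probability that M_k is not 0
```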

24 Orthonormal systems and their convergence behaviour

In Chapter 21 we discussed the importance of orthonormal systems (ONSs) in Hilbert spaces. In particular, countable complete ONSs turned out to be bases of separable Hilbert spaces. We have also seen that a countable ONS gives rise to a family of finite-dimensional subspaces and a sequence of orthogonal projections onto these spaces. In the present chapter we are concerned with the following topics:

• to give concrete examples of (complete) ONSs;
• to see when the associated canonical projections are conditional expectations;
• to understand the $L^p$ ($p \neq 2$) and a.e. convergence behaviour of series expansions with respect to certain ONSs.

The latter is, in general, not a trivial matter. Here we will see how we can use the powerful martingale machinery of Chapters 17 and 18 to get $L^p$ ($1 \le p < \infty$) and a.e. convergence.

Throughout this chapter we will consider the Hilbert space $L^2(I, \mathcal{B}(I), \rho\lambda)$ where $I \subset \mathbb{R}$ is a finite or infinite interval of the real line, $\mathcal{B}(I) = I \cap \mathcal{B}(\mathbb{R})$ are the Borel sets in $I$, $\lambda = \lambda^1|_I$ is Lebesgue measure on $I$ and $\rho(x)$ is a density function. We will usually write $\rho(x)\,dx$ and $dx$ instead of $\rho\lambda$ and $d\lambda$.

One of the most important techniques to construct ONSs is the Gram–Schmidt orthonormalization procedure (21.10), which we can use to turn any countable family $(f_k)_{k\in\mathbb{N}}$ into an orthonormal sequence $(e_k)_{k\in\mathbb{N}}$. Something of a problem, however, is to find a reasonable sequence $(f_k)_{k\in\mathbb{N}}$ which can be used as input to the orthonormalization procedure.

Orthogonal polynomials

For many practical applications, such as interpolation, approximation or numerical integration, a natural set of $f_k$ to begin with is given by the polynomials on $I$.

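Usually one applies (21.10) to the sequence of monomials $1, t, t^2, t^3, \dots = (t^j)_{j\in\mathbb{N}_0}$ to construct an ONS consisting of polynomials. Of course, this depends heavily on the underlying measure space where polynomials should be square integrable. With some (partly pretty tedious) calculations¹ one can get the following important classes of orthogonal polynomials in $L^2(I, \mathcal{B}(I), \rho(x)\,dx)$.

24.1 Jacobi polynomials $(J_k^{(\alpha,\beta)})_{k\in\mathbb{N}_0}$, $\alpha, \beta > -1$. We choose

$$I = [-1, 1], \qquad \rho(x)\,dx = (1-x)^{\alpha}(1+x)^{\beta}\,dx, \qquad \alpha, \beta > -1,$$

and we get

$$J_k^{(\alpha,\beta)}(x) = \frac{(-1)^k}{k!\,2^k\,(1-x)^{\alpha}(1+x)^{\beta}}\,\frac{d^k}{dx^k}\Big[(1-x)^{\alpha+k}(1+x)^{\beta+k}\Big] = \frac{1}{2^k}\sum_{j=0}^{k}\binom{k+\alpha}{j}\binom{k+\beta}{k-j}(x-1)^{k-j}(x+1)^{j},$$

$$\big\|J_k^{(\alpha,\beta)}\big\|_2^2 = \frac{2^{\alpha+\beta+1}}{2k+\alpha+\beta+1}\cdot\frac{\Gamma(k+\alpha+1)\,\Gamma(k+\beta+1)}{\Gamma(k+1)\,\Gamma(k+\alpha+\beta+1)}.$$

Choosing in 24.1 particular values for $\alpha$ and $\beta$ yields other important families.

24.2 Chebyshev polynomials (of the first kind) $(T_k)_{k\in\mathbb{N}_0}$. We choose

$$I = [-1, 1], \qquad \rho(x)\,dx = (1-x^2)^{-1/2}\,dx$$

and we get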

⎧ ⎨ 2 cosk arccos x if k ∈ −1/2−1/2

Tk x = Jk x = ⎩ √1 if k = 0

1 k + 21 2 2 Tk 2 = 2 k + 1

The first few Chebyshev polynomials are

$$1, \quad x, \quad 2x^2 - 1, \quad 4x^3 - 3x, \quad 8x^4 - 8x^2 + 1, \quad 16x^5 - 20x^3 + 5x,$$

and the following recursion formula holds:

$$T_{k+1}(x) = 2x\,T_k(x) - T_{k-1}(x), \qquad k \in \mathbb{N}.$$

¹ The material in Sections 24.1–24.5 below is taken from Alexits [1, pp. 30–37], Gradshteyn-Ryzhik [17, §8.9] and Kaczmarz-Steinhaus [22, §§IV.1–2, 8–9]. Another classic is the book by Szegö [51, §§1–5], and a good modern reference is the monograph by Andrews et al. [2, §§5.1, 6.1–6.3].
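The three-term recursion of 24.2 is easy to check by machine. The sketch below is an illustration only; the helper names `chebyshev_coefficients` and `evaluate` are ours. It builds the coefficient lists of $T_0, \dots, T_5$ from $T_0 = 1$, $T_1 = x$ and compares one value with the classical identity $\cos(k \arccos x)$ for the polynomials listed above.

```python
import math

def chebyshev_coefficients(n_max):
    """Coefficient lists (lowest degree first) of T_0, ..., T_{n_max},
    generated by T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x)."""
    polys = [[1], [0, 1]]
    for k in range(1, n_max):
        prev, cur = polys[k - 1], polys[k]
        nxt = [0] + [2 * c for c in cur]      # multiply T_k by 2x
        for i, c in enumerate(prev):          # subtract T_{k-1}
            nxt[i] -= c
        polys.append(nxt)
    return polys[: n_max + 1]

def evaluate(coeffs, x):
    """Evaluate a polynomial given by its coefficient list at the point x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))

T = chebyshev_coefficients(5)
print(T[5])                                   # coefficients of 16x^5 - 20x^3 + 5x
print(evaluate(T[4], 0.3), math.cos(4 * math.acos(0.3)))
```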

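24.3 Legendre polynomials $(P_k)_{k\in\mathbb{N}_0}$. We choose

$$I = [-1, 1], \qquad \rho(x)\,dx = dx$$

and we get

$$P_k(x) = J_k^{(0,0)}(x) = \frac{(-1)^k}{k!\,2^k}\,\frac{d^k}{dx^k}(1-x^2)^k, \qquad \|P_k\|_2^2 = \frac{2}{2k+1}.$$

The first few Legendre polynomials are

$$1, \quad x, \quad \tfrac12(3x^2 - 1), \quad \tfrac12(5x^3 - 3x), \quad \tfrac18(35x^4 - 30x^2 + 3), \quad \tfrac18(63x^5 - 70x^3 + 15x),$$

and the following recursion formula holds:

$$(k+1)\,P_{k+1}(x) = (2k+1)\,x\,P_k(x) - k\,P_{k-1}(x), \qquad k \in \mathbb{N}.$$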
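Orthogonality and the norm formula $\|P_k\|_2^2 = 2/(2k+1)$ of 24.3 can be verified exactly, since monomials integrate in closed form over $[-1, 1]$. The following sketch (helper names `legendre` and `inner` are our own) uses rational arithmetic, so no rounding is involved.

```python
from fractions import Fraction

def legendre(n_max):
    """P_0, ..., P_{n_max} as coefficient lists (lowest degree first), using
    (k+1) P_{k+1}(x) = (2k+1) x P_k(x) - k P_{k-1}(x)."""
    polys = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for k in range(1, n_max):
        nxt = [Fraction(0)] + [(2 * k + 1) * c for c in polys[k]]
        for i, c in enumerate(polys[k - 1]):
            nxt[i] -= k * c
        polys.append([c / (k + 1) for c in nxt])
    return polys[: n_max + 1]

def inner(p, q):
    """Exact integral of p(x) q(x) over [-1, 1] with respect to dx."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:              # odd powers integrate to zero
                total += a * b * Fraction(2, i + j + 1)
    return total

P = legendre(5)
for k in range(6):
    print(k, inner(P[k], P[k]), Fraction(2, 2 * k + 1))
```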

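24.4 Laguerre polynomials $(L_k)_{k\in\mathbb{N}_0}$. We choose

$$I = [0, \infty), \qquad \rho(x)\,dx = e^{-x}\,dx$$

and we get

$$L_k(x) = e^{x}\,\frac{d^k}{dx^k}\big(x^k e^{-x}\big) = k!\sum_{j=0}^{k}(-1)^j\binom{k}{j}\frac{x^j}{j!}, \qquad \|L_k\|_2^2 = (k!)^2.$$

The first few Laguerre polynomials are

$$1, \quad 1 - x, \quad x^2 - 4x + 2, \quad -x^3 + 9x^2 - 18x + 6, \quad x^4 - 16x^3 + 72x^2 - 96x + 24,$$

and the following recursion formula holds:

$$L_{k+1}(x) = (2k + 1 - x)\,L_k(x) - k^2\,L_{k-1}(x), \qquad k \in \mathbb{N}.$$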
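As a consistency check for 24.4, the explicit sum and the recursion should produce the same integer coefficients. A minimal sketch, with function names chosen by us for illustration:

```python
from math import comb, factorial

def laguerre_closed_form(k):
    """Coefficients (lowest degree first) of L_k = k! * sum_j (-1)^j C(k, j) x^j / j!."""
    return [(-1) ** j * comb(k, j) * (factorial(k) // factorial(j))
            for j in range(k + 1)]

def laguerre_recursion(n_max):
    """L_0, ..., L_{n_max} via L_{k+1}(x) = (2k + 1 - x) L_k(x) - k^2 L_{k-1}(x)."""
    polys = [[1], [1, -1]]
    for k in range(1, n_max):
        cur, prev = polys[k], polys[k - 1]
        nxt = [(2 * k + 1) * c for c in cur] + [0]
        for i, c in enumerate(cur):           # subtract x * L_k
            nxt[i + 1] -= c
        for i, c in enumerate(prev):          # subtract k^2 * L_{k-1}
            nxt[i] -= k * k * c
        polys.append(nxt)
    return polys[: n_max + 1]

print(laguerre_closed_form(4))                # x^4 - 16x^3 + 72x^2 - 96x + 24
print(laguerre_recursion(4)[4])
```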

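24.5 Hermite polynomials $(H_k)_{k\in\mathbb{N}_0}$. We choose

$$I = (-\infty, \infty), \qquad \rho(x)\,dx = e^{-x^2}\,dx$$

and we get

$$H_k(x) = (-1)^k e^{x^2}\,\frac{d^k}{dx^k}\,e^{-x^2}, \qquad \|H_k\|_2^2 = 2^k\,k!\,\sqrt{\pi}.$$

The first few Hermite polynomials are

$$1, \quad 2x, \quad 4x^2 - 2, \quad 8x^3 - 12x, \quad 16x^4 - 48x^2 + 12,$$

and the following recursion formula holds:

$$H_{k+1}(x) = 2x\,H_k(x) - 2k\,H_{k-1}(x), \qquad k \in \mathbb{N}.$$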
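For the Hermite polynomials of 24.5 the norm formula can also be checked exactly if one works in units of $\sqrt{\pi}$. The sketch below relies on the standard Gaussian moments $\int x^{2m} e^{-x^2}\,dx = \sqrt{\pi}\,(2m-1)!!/2^m$ (odd moments vanish), a fact we bring in from outside the text; the helper names are ours.

```python
from fractions import Fraction
from math import factorial

def hermite(n_max):
    """H_0, ..., H_{n_max} (lowest degree first) via H_{k+1} = 2x H_k - 2k H_{k-1}."""
    polys = [[1], [0, 2]]
    for k in range(1, n_max):
        nxt = [0] + [2 * c for c in polys[k]]
        for i, c in enumerate(polys[k - 1]):
            nxt[i] -= 2 * k * c
        polys.append(nxt)
    return polys[: n_max + 1]

def gaussian_moment(n):
    """Integral of x^n e^{-x^2} over the real line, divided by sqrt(pi)."""
    if n % 2 == 1:
        return Fraction(0)
    value = Fraction(1)
    for odd in range(1, n, 2):                # builds (n-1)!! / 2^(n/2)
        value *= Fraction(odd, 2)
    return value

def inner_over_sqrt_pi(p, q):
    """Weighted inner product of p and q, divided by sqrt(pi)."""
    return sum(a * b * gaussian_moment(i + j)
               for i, a in enumerate(p) for j, b in enumerate(q))

H = hermite(4)
for k in range(5):
    print(k, inner_over_sqrt_pi(H[k], H[k]), 2 ** k * factorial(k))
```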

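In order to decide if a family of polynomials $(p_k)_{k\in\mathbb{N}_0} \subset L^2(I, \rho(x)\,dx)$ is a complete ONS we have to show that

$$\int_I u(x)\,p_k(x)\,\rho(x)\,dx = 0 \quad \forall\, k \in \mathbb{N}_0 \implies u = 0 \text{ a.e.}$$

The key technical result is the Weierstraß approximation theorem.

24.6 Theorem (Weierstraß) Polynomials are dense in $C[0, 1]$ w.r.t. uniform convergence.

Proof (S.N. Bernstein) Take a sequence $(X_j)_{j\in\mathbb{N}}$ of independent² measurable functions on $([0, 1], \mathcal{B}[0, 1], dx)$ which are all Bernoulli $(p, 1-p)$-distributed, $0 < p < 1$, i.e.

$$\lambda\{X_j = 1\} = p \quad\text{and}\quad \lambda\{X_j = 0\} = 1 - p \qquad \forall\, j \in \mathbb{N},$$

cf. 17.4 for the construction of such a sequence.

² In the sense of Example 17.3(x).

Write $S_n := X_1 + \dots + X_n$ for the partial sum and observe that, due to independence,

$$\lambda\{S_n = k\} = \sum_{1 \le j_1 < \dots < j_k \le n} \lambda\big(\{X_{j_1} = 1\} \cap \dots \cap \{X_{j_k} = 1\} \cap \{X_{j_{k+1}} = 0\} \cap \dots \cap \{X_{j_n} = 0\}\big) = \binom{n}{k}\,p^k (1-p)^{n-k},$$

which shows that

$$\int u\Big(\frac{S_n(x)}{n}\Big)\,dx = \sum_{k=0}^{n} u\Big(\frac{k}{n}\Big)\binom{n}{k}\,p^k (1-p)^{n-k} = B_n(u; p)$$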

where Bn u p stands for the nth Bernstein polynomial.3 From 17.4 we also know that 2 S x 1 p1 − p n (24.1) n − p dx = n 4n since the function p → p1 − p attains its maximum at p = 1/2. As u ∈ C0 1 is uniformly continuous, ux − uy < whenever x − y < is sufficiently small. Thus Bn u p − up u Snn − up d = S u Snn − up d + S

n n −p 0 we get uxx xj dx = 0 =⇒ u = 0 a.e. =⇒ u = 0 a.e. −11

This does not quite work for the Hermite and Laguerre polynomials, which are defined on infinite intervals. For the latter we take $u \in L^2([0, \infty), e^{-x}\,dx)$, and find for all $s \ge 1$

$$\int_{[0,\infty)} u(x)\,e^{-sx}\,dx = \int_{[0,\infty)} u(x)\,e^{(1-s)x}\,e^{-x}\,dx = \sum_{k=0}^{\infty} \frac{(1-s)^k}{k!} \underbrace{\int_{[0,\infty)} u(x)\,x^k\,e^{-x}\,dx}_{=0} = 0$$

(note that the integral and the sum can be interchanged by dominated convergence). Using Jacobi's formula C15.8 to change coordinates according to $t = e^{-x}$, $dt/dx = -e^{-x}$, we get

$$0 = \int_{[0,\infty)} u(x)\,e^{-sx}\,dx = \int_{(0,1]} u(-\ln t)\,t^{s-1}\,dt \qquad \forall\, s \ge 1,$$

and for $s \in \mathbb{N}$ the above equality reduces to the case covered by Corollary 24.8. A very similar calculation can be used for the Hermite polynomials since

$$\int_{(-\infty,\infty)} u(x)\,e^{-sx^2}\,dx = \int_{[0,\infty)} \big(u(x) + u(-x)\big)\,e^{-sx^2}\,dx = \int_{(0,\infty)} \big(u(\sqrt{t}) + u(-\sqrt{t})\big)\,e^{-st}\,\frac{dt}{2\sqrt{t}}$$

where we used the obvious substitution $x = \sqrt{t}$.

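The trigonometric system and Fourier series

We consider now $L^2 = L^2\big((-\pi, \pi), \mathcal{B}(-\pi, \pi), \lambda = \lambda^1|_{(-\pi,\pi)}\big)$. As before we use $dx$ as a shorthand for $\lambda(dx)$. The trigonometric system consists of the functions

$$\frac{1}{\sqrt{2\pi}}, \quad \frac{\cos x}{\sqrt{\pi}}, \quad \frac{\sin x}{\sqrt{\pi}}, \quad \frac{\cos 2x}{\sqrt{\pi}}, \quad \frac{\sin 2x}{\sqrt{\pi}}, \quad \dots, \quad \frac{\cos kx}{\sqrt{\pi}}, \quad \frac{\sin kx}{\sqrt{\pi}}, \quad \dots \tag{24.2}$$

or, equivalently,

$$\frac{1}{\sqrt{2\pi}}\,e^{ikx}, \qquad k \in \mathbb{Z}, \quad i = \sqrt{-1}. \tag{24.3}$$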

Since eix = cos x + i sin x, we can see that (24.2) and (24.3) are equivalent, and from now on we will only consider (24.2). Orthogonality of the functions in (24.2) follows easily from the classical result that ⎧ if k = ⎪ ⎨0 (24.4) cos kx sin x dx = if k = 1 ⎪ − ⎩ 2 if k = = 0 which we leave as an exercise for the reader, see Problem 24.4. 24.9 Definition A trigonometric polynomial (of order n) is an expression of the form n Tx = 0 +

j cos jx + j sin jx (24.5) j=1

where n ∈ 0 , j j ∈ and 2n + 2n > 0. It is not hard to see that the representation (24.5) of Tx is equivalent to Tx =

n

jk cosj x sink x

jk=0

with coefficients jk ∈ , cf. Problem 24.5. It is this way of writing Tx that justifies the name trigonometric polynomial. 24.10 Theorem The trigonometric system (24.2) is a complete ONS in L2 = L2 − dx. Proof We have to show that ux cos kx dx = 0 −

−

ux sin x dx = 0

⎫ ∀ k ∈ 0 ⎪ ⎪ ⎬ ⎪ ⎪ ∀ ∈ ⎭

=⇒

u=0

a.e.

(24.6)


Assume first that u is continuous and that, contrary to (24.6), ux0 = c = 0 for some x0 ∈ − . Without loss of generality we may assume that c > 0. Since the trigonometric functions are 2 -periodic, we can extend u periodically onto the whole real line. Then wx = c−1 ux + x0 is continuous around x = 0, orthogonal on − to any of the functions in (2)[] , and satisfies w0 = 1. As w is continuous, there is some 0 < < such that wx >

1 2

∀ x ∈ −

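Consider the trigonometric polynomial

$$t(x) = 1 - \cos\delta + \cos x.$$

[Figure: graph of $t(x) = 1 - \cos\delta + \cos x$ on $[-\pi, \pi]$, with the points $\pm\delta$ and the levels $1$ and $2 - \cos\delta$ marked.]

Obviously, $t(x)$ and all powers $t^N(x)$ are polynomials in $\cos x$. From de Moivre's formula

$$e^{ikx} = (\cos x + i \sin x)^k$$

it is easy to see that $\cos^k x = \sum_{j=0}^{k} c_j \cos jx$,[✓] see also Gradshteyn and Ryzhik [17, 1.32]. We can thus write $t^N(x)$ as linear combination of $\cos kx$, $k = 0, 1, \dots, N$. By assumption, $w$ is orthogonal to all of them, and so

$$0 = \int_{-\pi}^{\pi} w(x)\,t^N(x)\,dx = \Big(\int_{-\pi}^{-\delta} + \int_{-\delta}^{\delta} + \int_{\delta}^{\pi}\Big)\, w(x)\,t^N(x)\,dx. \tag{24.7}$$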

On − we have wx > −

1 2

as well as tx > 1, hence

wx tN x dx

1 N → tN x dx −−−→ 2 −

by monotone convergence T9.6. On the other hand, tx 1 for x ∈ − − ∪ and N wx t x dx − w < ∀ N ∈

which means that (24.7) is impossible, i.e. w ≡ 0 and u ≡ 0. An arbitrary function u ∈ L2 − dx is, due to the finiteness of the measure, integrable[] , and we may consider the primitive Ux = ut dt − x



which is a continuous function, cf. Problem 11.7. Moreover, U− = 0 = ut dt = U −

because of the assumption √ that u is orthogonal to every function from (24.2) and, in particular, to t → 1/ 2 . By Fubini’s theorem 13.9 we get Ux cos kx dx = 1− x t ut cos kx dt dx −

− −

=

−

=−

−

ut −

1t x cos kx dx ut dt

sin kt dt = 0 k

and we conclude from the first part of the proof that $U \equiv 0$. Lebesgue's differentiation theorem 19.20 finally shows that $u(x) = U'(x) = 0$ a.e.

Since the trigonometric system is one of the most important ONSs, we provide a further proof of the completeness theorem which gives some more insight into Fourier series and yields even an independent proof of Weierstraß' approximation theorem 24.6 for trigonometric polynomials, cf. Corollary 24.12 below. We begin with an elementary but fundamental consideration which goes back to Fejér. If $u \in L^2((-\pi, \pi), dt)$, we write

$$a_j := \frac{1}{\pi}\int_{-\pi}^{\pi} u(t)\cos jt\,dt, \qquad b_k := \frac{1}{\pi}\int_{-\pi}^{\pi} u(t)\sin kt\,dt \tag{24.8}$$

($j \in \mathbb{N}_0$, $k \in \mathbb{N}$) for the Fourier cosine and sine coefficients of $u$ and set

$$\begin{aligned}
s_N(u; x) &:= \frac{a_0}{2} + \sum_{j=1}^{N}\big(a_j \cos jx + b_j \sin jx\big) \\
&= \frac{1}{\pi}\int_{-\pi}^{\pi} u(t)\,\Big(\frac12 + \sum_{j=1}^{N}\big(\cos jt \cos jx + \sin jt \sin jx\big)\Big)\,dt \\
&= \frac{1}{\pi}\int_{-\pi}^{\pi} u(t)\,\underbrace{\Big(\frac12 + \sum_{j=1}^{N} \cos j(t - x)\Big)}_{=:\,D_N(t-x)}\,dt
\end{aligned} \tag{24.9}$$

where we used the trigonometric formula

$$\cos a \cos b + \sin a \sin b = \cos(a - b). \tag{24.10}$$

The function $D_N(\cdot)$ is called the Dirichlet kernel. In Problem 24.6 we will see that $D_N(\cdot)$ has the following closed-form expression:

$$D_N(x) = \frac{\sin\big((N + \tfrac12)x\big)}{2 \sin\tfrac{x}{2}} \tag{24.11}$$

but we do not need this formula in the sequel. Now we introduce the Cesàro (C,1) mean

$$\sigma_N(u; x) := \frac{1}{N+1}\big(s_0(u; x) + s_1(u; x) + \dots + s_N(u; x)\big) \tag{24.12}$$

and in view of (24.9) we want to compute what is known as the Fejér kernel

$$K_N(x) := \frac{1}{N+1}\big(D_0(x) + D_1(x) + \dots + D_N(x)\big).$$

Using again (24.10) and observing that the cosine is even, we find for every $k = 0, 1, \dots, N$

$$\begin{aligned}
(1 - \cos x)\,D_k(x) &= \frac12\sum_{j=-k}^{k}(1 - \cos x)\cos jx \\
&\overset{(24.10)}{=} \frac12\sum_{j=-k}^{k}\big(\cos jx - \cos (j-1)x + \sin x \sin jx\big) \\
&= \frac12\big(\cos kx - \cos (k+1)x\big)
\end{aligned}$$

since $\sin jx = -\sin(-jx)$ is an odd function which cancels if we sum over $-k \le j \le k$. Summing over all values of $k = 0, 1, \dots, N$ shows

$$K_N(x) = \frac{1}{N+1}\big(D_0(x) + D_1(x) + \dots + D_N(x)\big) = \frac{1}{2(N+1)}\cdot\frac{1 - \cos (N+1)x}{1 - \cos x}. \tag{24.13}$$
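The closed form (24.13) can be compared numerically with the defining average of Dirichlet kernels. The snippet below is a small sanity check rather than part of the argument; the function names are our own, and $D_k$ is computed directly from its definition $\frac12 + \sum_{j=1}^{k} \cos jx$ in (24.9).

```python
import math

def dirichlet(k, x):
    """Dirichlet kernel D_k(x) = 1/2 + sum_{j=1}^k cos(jx), as defined in (24.9)."""
    return 0.5 + sum(math.cos(j * x) for j in range(1, k + 1))

def fejer_by_average(n, x):
    """Fejer kernel as the arithmetic mean of D_0, ..., D_n."""
    return sum(dirichlet(k, x) for k in range(n + 1)) / (n + 1)

def fejer_closed_form(n, x):
    """Right-hand side of (24.13); requires cos(x) != 1."""
    return (1 - math.cos((n + 1) * x)) / (2 * (n + 1) * (1 - math.cos(x)))

for x in (0.3, 1.0, 2.5):
    print(x, fejer_by_average(8, x), fejer_closed_form(8, x))
```

Unlike the Dirichlet kernel, the closed form makes it evident that $K_N \ge 0$, which is the property exploited in the proof of Fejér's lemma.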

24.11 Lemma (Féjer) If u ∈ C− , then limN → N u − u p = 0 for all 1 p . Proof From (24.9), (24.12) and (24.13) we get after a change of variables in the integrals 1 N u x = ut KN x − t dt

− 1 − cosN + 1t 1 ux − t dt = 2N + 1 − 1 − cos t


Since

1

− KN t dt

N u − u p


= 1[] , we see for all > 0 and sufficiently small > 0

1 1 − cosN + 1t =

u• − t − u dt

2N + 1 −

1 − cos t

p

1 − cosN + 1t

u• − t − u dt p 1 − cos t −

1 1 − cosN + 1t

u• − t − u dt p 2N + 1 − 1 − cos t u p 1 − cosN + 1t dt + N + 1 − − ∪ 1 − cos t

12.14

1 2N + 1

+

u p 4 N + 1 1 − cos

where we used Jensen’s inequality and the fact that limt→0 u• − t − u p = 0 by dominated convergence (p < ), resp. uniform continuity (p = ). Letting first N → and then → 0 finishes the proof. 24.12 Corollary (Weierstraß) The trigonometric polynomials are dense in C− under • and dense in Lp − dt, w.r.t. • p , 1 p < . Proof From (24.9), (24.12) it is obvious that N u • is a trigonometric polynomial. The density of the trigonometric polynomials in C− is just Lemma 24.11. Since C− is dense in Lp − dt, cf. Theorem 15.17, we can find for every > 0 and u ∈ Lp − some g ∈ C− with u − g p and a trigonometric polynomial t such that g − t 2 −1/p . This shows u − t p u − g p + g − t p + 2 1/p g − t 2 For the last estimate we also used that w p 2 1/p w . 24.13 Corollary The trigonometric system (24.2) is a complete ONS in L2 = L2 − dt Proof (of C24.13 and, again, of T24.10) Let u ∈ L2 − and pick a trigonometric polynomial t such that u − t 2 , cf. Corollary 24.12. Let n = degreet . As in the proof of Theorem 24.10 we use de Moivre’s formula to see that cosk x and sink x can be represented as linear combinations of 1 cos x cos kx and sin x sin kx.[]


Recall that the partial sum sn u x = a0 /2 + nj=1 aj cos jx + bj sin jx is the projection of u onto span1 cos x sin x cos nx sin nx. Therefore, Theorem 21.11(i) applies and u − sn u 2 u − t 2 proves completeness. The above proof of the completeness of the trigonometric system has a further advantage as it allows a glimpse into other modes of convergence of Fourier series. We have 24.14 Corollary (M. Riesz’ theorem) Let u ∈ Lp − dt and 1 p < . lim u − sn u p = 0

n→

⇐⇒

sn u p Cp u p ∀ n ∈

(24.14)

with an absolute constant Cp not depending on u or n ∈ . Proof The ‘only if’ part is a consequence of the uniform boundedness principle (Banach–Steinhaus theorem) from functional analysis, see e.g. Rudin [40, §5.8] or Problem 21.10. The ‘if’ part follows from the observation that every trigonometric polynomial T of degree n satisfies sn T = T .[] Choosing for u ∈ Lp the polynomial T = t with u − t p , cf. Corollary 24.12, we infer that for sufficiently large n > degreet u − sn u p u − t p + t − sn t p + sn t − sn u p =0

1 + Cp u − t p Establishing the estimate sn u p Cp u p is an altogether different matter and so is the whole Lp - and pointwise convergence theory for Fourier series. Here we want to mention only a few facts: • Lp -convergence (1 p < ) of the Cesàro means n u follows immediately from Lemma 24.11. This is in stark contrast to… • Lp -convergence (1 p < , p = 2) of the partial sums sn u requires the estimate (24.14); see Corollary 24.14 and, for more details, Wheeden and Zygmund [53, §12.88].


• Pointwise a.e. convergence of the partial sums $s_n(u) \xrightarrow{n\to\infty} u$ when $u \in L^2$ or $u \in L^p$, $1 < p < \infty$, which had been an open problem until 1966. A.N. Kolmogorov constructed in 1922/23 a function $u \in L^1$ whose Fourier series diverges a.e. In his famous 1966 paper L. Carleson proved that a.e. convergence holds for $u \in L^2$, and R.A. Hunt extended this result in 1968 to $u \in L^p$, $1 < p < \infty$.

All these deep results depend on estimates of the type (24.14) and, more importantly, on estimates for $\max_{0 \le j \le n} |s_j(u)|$ which resemble the maximal martingale estimates which we have encountered in Chapter 19, e.g. T19.12. But there is a catch.

24.15 Lemma The subspace $\Sigma_n := \operatorname{span}\{1, \cos x, \sin x, \dots, \cos nx, \sin nx\}$ of $L^2((-\pi, \pi), dx)$ is not of the form $L^2(\mathcal{G}_n)$ where $\mathcal{G}_n$ is a sub-$\sigma$-algebra of the Borel sets $\mathcal{B}(-\pi, \pi)$.

Proof The space $L^2(\mathcal{G}_n)$ is a lattice, i.e. if $f \in L^2(\mathcal{G}_n)$, then $|f| \in L^2(\mathcal{G}_n)$. Take $f(x) = \sin x$. Unfortunately,

$$|\sin x| = \frac{2}{\pi} - \frac{4}{\pi}\Big(\frac{\cos 2x}{1 \cdot 3} + \frac{\cos 4x}{3 \cdot 5} + \frac{\cos 6x}{5 \cdot 7} + \cdots\Big)$$

so that $\sin(\cdot) \in \Sigma_n$ but $|\sin(\cdot)| \notin \Sigma_n$. (You might also want to have a look at Theorem 22.5 for a more systematic treatment.)

This means that martingale methods are not (immediately) applicable to Fourier series.

The Haar system

In contrast to Fourier series, the Haar system allows a complete martingale treatment. Throughout this section we consider $L^2 = L^2([0, 1), \mathcal{B}[0, 1), \lambda)$, $\lambda = \lambda^1|_{[0,1)}$.

24.16 Definition The Haar system consists of the functions

$$\begin{aligned}
\chi_{0,0}(x) &:= \mathbf{1}_{[0,1)}(x), \\
\chi_{j,k}(x) &:= 2^{k/2}\Big(\mathbf{1}_{\left[\frac{2j-2}{2^{k+1}}, \frac{2j-1}{2^{k+1}}\right)}(x) - \mathbf{1}_{\left[\frac{2j-1}{2^{k+1}}, \frac{2j}{2^{k+1}}\right)}(x)\Big), \qquad 1 \le j \le 2^k, \ k \in \mathbb{N}_0.
\end{aligned} \tag{24.15}$$

Obviously, each Haar function is normalized to give $\|\chi_{j,k}\|_2 = 1$. The first few Haar functions are

[Figure: graphs of the Haar functions $\chi_{0,0}$, $\chi_{1,0}$, $\chi_{1,1}$, $\chi_{2,1}$ (first row) and $\chi_{1,2}$, $\chi_{2,2}$, $\chi_{3,2}$, $\chi_{4,2}$ (second row) on $[0, 1)$.]
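The Haar functions are step functions on dyadic intervals, so their inner products can be computed by summing over a sufficiently fine dyadic grid. The sketch below assumes the usual form of Definition 24.16, namely $\chi_{0,0} = \mathbf{1}_{[0,1)}$ and $\chi_{j,k} = 2^{k/2}$ on $[(2j-2)/2^{k+1}, (2j-1)/2^{k+1})$, $-2^{k/2}$ on $[(2j-1)/2^{k+1}, 2j/2^{k+1})$ and $0$ elsewhere; the function names `haar` and `inner` are ours.

```python
def haar(j, k, x):
    """Haar function chi_{j,k} at x in [0, 1); the index (0, 0) is the constant 1."""
    if (j, k) == (0, 0):
        return 1.0
    scaled = x * 2 ** (k + 1)                 # position in units of 2^-(k+1)
    if 2 * j - 2 <= scaled < 2 * j - 1:
        return 2 ** (k / 2)
    if 2 * j - 1 <= scaled < 2 * j:
        return -(2 ** (k / 2))
    return 0.0

def inner(f, g, levels=6):
    """Integral of f*g over [0, 1) for step functions that are constant
    on the dyadic intervals of length 2^-levels."""
    n = 2 ** levels
    return sum(f(i / n) * g(i / n) for i in range(n)) / n

# chi_{0,0} and all chi_{j,k} with k = 0, 1, 2
indices = [(0, 0)] + [(j, k) for k in range(3) for j in range(1, 2 ** k + 1)]
gram = {(a, b): inner(lambda x: haar(*a, x), lambda x: haar(*b, x))
        for a in indices for b in indices}
print(all(abs(gram[a, b] - (a == b)) < 1e-12 for a in indices for b in indices))
```

The Gram matrix is the identity up to rounding, which is the orthonormality claimed for the Haar system.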

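It is often more convenient to arrange the double sequence (24.15) in lexicographical order: $\chi_{0,0}$; $\chi_{1,0}$; $\chi_{1,1}, \chi_{2,1}$; $\chi_{1,2}, \chi_{2,2}, \chi_{3,2}, \chi_{4,2}$; … and to relabel them in the following way

$$H_0 := \chi_{0,0}, \qquad H_n = H_{2^k + \ell} := \chi_{\ell+1,k}, \quad 0 \le \ell \le 2^k - 1 \tag{24.16}$$

(note that the representation $n = 2^k + \ell$, $0 \le \ell \le 2^k - 1$, is unique). We can now associate with the sequence $(H_n)_{n\in\mathbb{N}_0}$ a canonical filtration

$$\mathcal{A}_n^H := \sigma(H_0, H_1, \dots, H_n), \qquad n \in \mathbb{N}_0,$$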

which is the smallest -algebra that makes all functions H0 Hn measurable, cf. Definition 7.5. 24.17 Theorem The Haar functions are a complete ONS in L2 0 1 dx. Moreover, N MN = an Hn N ∈ 0 an ∈ n=0 p is a martingale w.r.t. the filtration H N N ∈0 , and for every u ∈ L 0 1 dx, 1 p < , the Haar–Fourier series

sN u x =

N n=0

u Hn Hn



converges to u in Lp and almost everywhere, and the maximal inequality

p

sup s u u p

n p−1 p n∈ holds for all u ∈ Lp and 1 < p < . Proof Step 1. Orthonormality: That jk 2 = 1 is obvious. If the functions jk = m satisfy jk = 0 ∩ m = 0 = ∅, it is clear that jk m d = 0. Otherwise, we can assume that k < m, so that either

m = 0 ⊂ jk = +1

or

m = 0 ⊂ jk = −1

obtains. In either case,

m jk d = ±

m d = 0

Step 2. Martingale property: Let n = 2k + . Then, for all n ∈ , H n = 00 10 11 2k−1 k−1 1k 2k +1k 2 + 1 2 + 2 + 1 + 2 2k − 1 1 = 0 k+1 k k+1 1 k k+1 2 2 2k 2 2 2 = n

= n

where we used that the dyadic intervals are nested and refine. Assume, for simplicity, that < 2k − 1. Then Hn+1 = 0 ∈ n , and so Hn+1 x dx = 0 ∀ J ∈ n or J ∈ n J

(If = 2k − 1 we get an analogous conclusion with a rollover as H n is just the dyadic -algebra generated by all disjoint half-open intervals of length 2−k−1 in H 0 1.) By Theorem 23.5 we have En Hn+1 = 0, and by Theorem 23.8

EN MN +1 = EN MN + aN +1 HN +1 = MN + aN +1 EN HN +1 = MN This shows that MN H N N ∈ is indeed a martingale, cf. Corollary 23.14. H

H

H

Step 3. Convergence in L1 and a.e. if u ∈ L1 ∩ L : Set ak = u Hk , so that MN = sN u becomes the Haar–Fourier partial sum. Using Bessel’s inequality (Theorem 21.11) we see sN u 22 =

N k=0

u Hk 2 u 22

(24.17)


where the right-hand side is finite since L1 ∩ L ⊂ L2 ,[] and from the Cauchy– Schwarz C12.3 and Markov P10.12 inequalities we get for all R > 0 1/2 sN u d sN u 2 sN u > R sN u>R

1 1 sN u 22 u 22 R R

Since the constant function R is in L2 0 1 dx, the martingale sN uN ∈ is uniformly integrable in the sense of Definition 16.1, and we conclude from Theorems 18.6 and 23.15 that N →

sN u −−−→ u

in L1 and almost everywhere

Since H n n∈ contains the sequence k k∈ of dyadic -algebras – we have H H H indeed n = 2n −1 – we know that = n n ∈ = 0 1. Just as in Example 23.17 we see that

E2n −1 u = En u = s2n −1 u H

and in view of Theorem 23.15 we conclude that u = u a.e. Step 4. Convergence in Lp if u ∈ L1 ∩ L : Observe that L1 ∩ L ⊂ Lp for all 1 < p < .[] Applying the inequality b p p p p p−1 a − b a − b = p t dt a

p a − b max ap−1 bp−1 p a − b max ap−1 bp−1 a b ∈ , 1 1 H

H where we also used that sN u = EN u u as u u < , cf. Theorem 23.8(ix). From Riesz’ convergence theorem T12.10 we conclude that N →

sN u −−−→ u in Lp for all 1 < p < and all u ∈ L1 ∩ L . Step 5. Convergence in Lp if u ∈ Lp : If u ∈ Lp , 1 p < , is not bounded, we set uk = −k ∨ u ∧ k. Since we have a finite measure space, uk ∈ Lp ∩ L ⊂



L1 ∩ L , and we see from the triangle inequality and Theorem 23.8(v),(ii) sN u − u p sN u − sN uk p + sN uk − uk p + uk − u p sN uk − uk p + 2 uk − u p The claim follows as N → and then k → . Step 6. A.e. convergence if u ∈ Lp : Since sN u± = EN u± 0, we know from Corollary 23.16 that sN u± p N ∈ are submartingales which satisfy, by Theorem 23.8(ii), H

H

p p sN u± p d EN u± d = EN u± p u± pp H

Therefore, the submartingale convergence theorem 18.2 applies and shows that limN → sN u± xp exists a.e., hence, limN → sN u x exists a.e. Since step 5 and Corollary 12.8 already imply limj→ sNj u x = ux a.e. for some subsequence, we can identify the limit and get limN → sN u x = ux a.e. Step 7. Completeness follows from lim sN u − u 2 = 0 and T21.13. N →

Step 8. The maximal inequality is just Doob’s maximal Lp -inequality for martingales T19.12 since sn un∈ is a uniformly integrable martingale which is, by step 5 and Theorem 23.15, closed by s u = u. 24.18 Remark As a matter of fact, ordering the Haar functions in a sequence like Hn n∈0 does play a rôle. If p = 1, we can find (after some elementary but very tedious calculations) that

2n √

k/2

00 + 10 + 2 1k

2 1

k=1

while the lacunary series satisfies

n

k

10 + 2 12k

cn

k=1

1

for some absolute constant c > 0. Therefore, we can rearrange n=0 an Hn in such a way that it becomes a divergent series n=0 an Hn for some necessarily infinite permutation 0 → 0 . This phenomenon does not happen if 1 < p < . In fact, Hn n∈0 is what one calls an unconditional basis of Lp , 1 < p < , which means that every p rearrangement of the series n=0 an Hn converges in L and leads to the same limit. The Haar system is even the litmus test for the existence of unconditional bases: every Banach space B where Hn n∈0 is a basis has an unconditional


basis if, and only if, the basis Hn n∈0 is unconditional, cf. Olevski˘ı [32, p. 73, Corollary] or Lindenstrauss and Tzafriri [27, vol. II, p. 161, Corollary 2.c.11]. Since the unconditionality of Hn n∈0 rests on a martingale argument, we include a sketch of its proof. First we need the following Burkholder–Davis– Gundy inequalities for a martingale uj j∈0 on a probability space X :

p sup uj u• u• N Kp sup uj

(BDG) 0jN

p

p

0jN

p

for all N ∈ 0 , all 0 0. The expression u• u• N stands for the quadratic variation of the martingale u• u• N = u0 2 +

N −1

uj+1 − uj 2

j=0

A proof of (BDG) can be found in Rogers and Williams [38, vol. 2, pp. 94–6]. If we combine (BDG) with Doob’s maximal Lp -inequality 19.12 we get

p Kp

p uN p u• u• N u (BDG ) p−1 N p p for all N ∈ 0 and 1 < p < – mind the different range for p in (BDG ) compared to (BDG). Obviously, uN =

N

u Hk Hk

and

k=0

wN =

N

k u Hk Hk

k=0

k ∈ −1 +1, are uniformly integrable martingales (use the argument of the proof of Theorem 24.17) and their quadratic variations u• u• N = w• w• N coincide. Therefore, (BDG ) shows that the martingales uN − un N n and wN − wn N n satisfy

1/2

1/2

uN − un p ∼ u• − un u• − un N p = w• − wn w• − wn N p ∼ wN − wn p where a ∼ b means that a b K a for some absolute constants K > 0, so that either both sequences converge or diverge. Let us assume that uN N ∈0 converges. Then every lacunary series

u Hkj Hkj

converges

(24.18)

j=1

since we can produce its partial sums by adding and subtracting uN and wN with suitable ±1-sequences k k∈ . This entails that for every fixed permutation


0 → 0

N

u Hk Hk

N>n


sufficiently large

p

k=n

Otherwise, we could find finite sets 0 1 2 ⊂ 0 with kj j∈ = and

u Hk Hk

∀ n ∈

>

!

n∈ n

p

k∈n

contradicting (24.18). For more on this topic we refer to Lindenstrauss and Tzafriri [27].
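Returning to the martingale picture behind Theorem 24.17: the Haar–Fourier partial sum of order $2^m - 1$ is the orthogonal projection onto the step functions on dyadic intervals of length $2^{-m}$, that is, it replaces $u$ by its averages over these intervals. The following sketch illustrates this for a step function $u$ given by its values on 64 dyadic cells. It assumes the relabelling $H_0 = \chi_{0,0}$, $H_{2^k+\ell} = \chi_{\ell+1,k}$ of (24.16); all function names and the sample data are our own.

```python
import math

def haar_n(n, x):
    """H_n(x) for x in [0, 1): H_0 = 1 and H_{2^k + l} = chi_{l+1,k}, 0 <= l < 2^k."""
    if n == 0:
        return 1.0
    k = n.bit_length() - 1
    l = n - 2 ** k
    scaled = x * 2 ** (k + 1)
    if 2 * l <= scaled < 2 * l + 1:
        return 2 ** (k / 2)
    if 2 * l + 1 <= scaled < 2 * l + 2:
        return -(2 ** (k / 2))
    return 0.0

def partial_sum(values, big_n):
    """Haar-Fourier partial sum s_N(u) on the grid, where u is the step function
    taking the given values on equally long dyadic cells of [0, 1)."""
    size = len(values)
    grid = [i / size for i in range(size)]
    result = [0.0] * size
    for n in range(big_n + 1):
        h = [haar_n(n, x) for x in grid]
        coeff = sum(u * hv for u, hv in zip(values, h)) / size   # <u, H_n>
        result = [r + coeff * hv for r, hv in zip(result, h)]
    return result

def dyadic_average(values, m):
    """Replace u by its mean over each dyadic interval of length 2^-m."""
    block = len(values) // 2 ** m
    out = []
    for start in range(0, len(values), block):
        mean = sum(values[start:start + block]) / block
        out.extend([mean] * block)
    return out

values = [math.sin(3 * i / 64) + (i % 5) for i in range(64)]   # arbitrary sample data
s7 = partial_sum(values, 2 ** 3 - 1)
avg = dyadic_average(values, 3)
print(max(abs(a - b) for a, b in zip(s7, avg)))
```

The printed maximal deviation is at the level of rounding error, in line with the identification of the partial sums with conditional expectations.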

The Haar wavelet Let us now consider a Haar system on the whole real line, i.e. in L2 = L2 dx. We begin with the remark that the functions 00 = 101 and 10 = 101/2 − 11/21 are the two basic Haar functions, since we can reconstruct all Haar functions jk from them by scaling and shifting: jk x = 2k/2 10 2k x − j + 1

k ∈ 0 j = 1 2 2k

(24.19)

The advantage of (24.19) over the definition (24.15) is that (24.19) easily extends to all pairs j k ∈ 2 and, thus to a system of functions on . 24.19 Definition The Haar wavelets are the system jk jk∈ where the mother wavelet is x = 101/2 x − 11/21 x and jk x = 2k/2 2k x − j = 2k/2 1 2j 2j+1 x − 1 2j+1 2j+2 x 2k+1 2k+1

2k+1 2k+1

for all j k ∈ . Note that = 10 = 10 , j−1k = jk for all j = 1 2 2k and k ∈ 0 while −10 x = 2−1/2 00 x for 0 x < 1. The Haar wavelets can be treated by martingale methods. To do so, we introduce the two-sided dyadic filtration j j+1 = n ∈ j ∈ = jn j ∈ n+1 2n+1 2n+1 (24.20) " = = ∅ = = − n n n∈

n∈


The last assertion follows from the fact that D = j2−n−1 j ∈ n ∈ is a dense subset of and that is generated by all intervals of the form a b where a b ∈ D (or, indeed, any other dense subset).[] In what follows we have to consider double summations. To keep notation simple, we write # $ ajk as a shorthand for ajk k=− j=−

and call kconst. the double sum.

j=−

the right tail and

k=− − R we find

E−M uR = 2−M

−R0

ux dx 1−2M 0 + 2−M

0R

ux dx 102M

where we used that E−M projects onto the intervals j2M j + 12M , and we find from the Hölder inequality T12.2 with p−1 + q −1 = 1 that E −M uR x 2−M R1/q u p 1−2M 2M x

which implies

E −M uR 2−M R1/q u p 2 · 2M 1/p = cR 2−M1−1/p u p p


Finally, by Theorem 23.8(v),(ii),

E −M u E−M u − uR + E−M uR

p p p

u − uR p + cR 2−M1−1/p u p and we get limM→ E−M u p = 0 for all u ∈ Lp , 1 < p < , letting first M → and then R → . MN → This shows that u−MN −−−−−→ u in Lp , 1 < p < , and the proof of the convergence of (24.21) in Lp , 1 < p < , is complete.

Step 6. Completeness of the Haar wavelets in L2 follows if we apply (24.21) in the case p = 2, cf. Theorem 21.13. Step 7. A.e. convergence of the left tail of (24.21): Observe that A = E−M u > for infinitely many M ∈ " E−j u > ∈ −

=

M=1 j=M

∈ −M

By the martingale maximal inequality, Lemma 19.11, for the reversed martingale E −j u j∈ and Theorem 23.8(ii) we see E −j u > A j=M

sup E−j u > j∈

1

E−1 u 1 u p p p p

This shows that A < . Since − = ∅ is the trivial -algebra, we M→

conclude that A = 0 or A = ∅. Therefore, E−M u −−−→ 0 almost everywhere

MN →

and so uN−M −−−−−→ u almost everywhere. Step 8: The maximal inequality (24.22): From step 2 we know that

N

sup uN−M = sup

u jk jk

p NM∈

NM∈ k=−M j=−

= sup EN +1 u − inf E−M u

N ∈

M∈

p u p + E−1 u p p−1

p

p



The last estimate follows from a combination of Minkowski’s inequality, Doob’s maximal Lp -inequality for martingales T19.12 applied to the closed (by u) martin p gale EN u , cf. step 3 and Theorem 23.15, and the fact that E−M u N ∈∪

M∈

is a reversed submartingale, cf. Example 17.3(vi) or Corollary 23.16, which entails E−M u p E−1 u p . Since by T23.8(ii) conditional expectations are contractions on Lp , we have E−1 u p u p , and the proof is completed.

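A nice introduction to the Haar and other wavelets is Pinsky [35].

The Rademacher functions

Let $L^2 = L^2([0, 1), \mathcal{B}[0, 1), \lambda)$, $\lambda = \lambda^1|_{[0,1)}$. The Rademacher functions $(R_k)_{k\in\mathbb{N}_0}$ are functions in $L^2$ defined by

$$R_0 := \mathbf{1}_{[0,1)}, \qquad R_1 := \mathbf{1}_{[0,\frac12)} - \mathbf{1}_{[\frac12,1)}, \qquad R_2 := \mathbf{1}_{[0,\frac14)} - \mathbf{1}_{[\frac14,\frac12)} + \mathbf{1}_{[\frac12,\frac34)} - \mathbf{1}_{[\frac34,1)}, \quad \dots$$

The graphs of the first four Rademacher functions are

[Figure: graphs of $R_0$, $R_1$, $R_2$, $R_3$ on $[0, 1)$.]

In terms of Haar functions we have

$$R_0 = \chi_{0,0}, \qquad R_{k+1} = \frac{1}{2^{k/2}}\sum_{j=1}^{2^k} \chi_{j,k}, \quad k \in \mathbb{N}_0. \tag{24.25}$$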

Another equivalent definition of the Rademacher system is the following: expand −j with ∈ 0 1 – we exclude each x ∈ 0 1 as binary series, x = j j=1 j 2 expansions terminating with a string of 1s to enforce uniqueness – and set R0 x = 101 x

Rk x = 2k − 1

Yet another way to think of the functions Rk is as right-continuous versions of sign changes: Rk x ≈ sgn sin2k x, k ∈ 0 . 24.21 Lemma The system of Rademacher functions Rk k∈ is an ONS of independent4 functions in L2 0 1 dx which is not complete. 4

In the sense of Example 17.3(x) and Scholium 17.4.
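Both claims of Lemma 24.21 (orthonormality and incompleteness) can be checked exactly on a dyadic grid. The sketch below takes $R_k$, $k \ge 1$, to be $+1$ on the dyadic intervals $[i/2^k, (i+1)/2^k)$ with even $i$ and $-1$ on those with odd $i$, which agrees with the formulas for $R_1$ and $R_2$; the helper names are ours. It then verifies that $R_1 R_2$ is a non-trivial function orthogonal to every $R_k$ considered.

```python
from fractions import Fraction

def rademacher(k, x):
    """R_0 = 1; for k >= 1, R_k(x) = +1 if x lies in a dyadic interval
    [i/2^k, (i+1)/2^k) with i even, and -1 if i is odd."""
    if k == 0:
        return 1
    return 1 if int(x * 2 ** k) % 2 == 0 else -1

def integral(f, levels):
    """Exact integral over [0, 1) of a step function that is constant
    on the dyadic intervals of length 2^-levels."""
    n = 2 ** levels
    return Fraction(sum(f(Fraction(i, n)) for i in range(n)), n)

K = 5
gram = [[integral(lambda x: rademacher(j, x) * rademacher(k, x), K)
         for k in range(K)] for j in range(K)]
print(gram)                                   # identity matrix

product_12 = lambda x: rademacher(1, x) * rademacher(2, x)
print([integral(lambda x: rademacher(k, x) * product_12(x), K) for k in range(K)])
print(integral(lambda x: product_12(x) ** 2, K))
```

The second line of output consists of zeros while the last value is 1, so $R_1 R_2 \neq 0$ is orthogonal to $R_0, \dots, R_4$.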


Proof Orthonormality follows since $\int_{\{R_k = \pm 1\}} R_\ell\,d\lambda = 0$ for all $k < \ell$, thus $\int R_k R_\ell\,d\lambda = 0$, while $\int R_k^2\,d\lambda = 1$ is obvious. In very much the same way we deduce that $\int R_k\,(R_1 R_2)\,d\lambda = 0$ for all $k \in \mathbb{N}_0$, which shows that the system $(R_k)_{k\in\mathbb{N}_0}$ is not complete. Independence is a special case of Scholium 17.4 with $p = q = 1/2$.

Although $(R_k)_{k\in\mathbb{N}_0}$ is not complete in $L^2$, it still has good a.e. convergence properties. The reason for this is formula (24.25) and independence.

24.22 Theorem The Rademacher series $\sum_{k=1}^{\infty} c_k R_k$, $c_k \in \mathbb{R}$, converges almost everywhere if, and only if, $\sum_{k=0}^{\infty} c_k^2 < \infty$.

2 −k/2 c Proof Assume first that k k=0 ck < . In view of (24.25) we set cjk = 2 and rearrange the absolutely convergent series as

2 k

ck2

=

k=0

2 cjk <

k=0 j=1

We can now interpret the double sequence cjk 1 j 2k k ∈ 0 as coefficients of the complete (!) Haar ONS jk 1 j 2k k ∈ 0 . From Parseval’s identity T21.11(iv) we then conclude that the series

2 k

ck Rk =

k=0

cjk jk

k=0 j=1

converges almost everywhere and in L2 to some element u ∈ L2 . Conversely, assume that the series k=0 ck Rk converges to a finite limit sx for all x ∈ E ∈ 0 1 such that E > 0. Writing sN for the N th partial sum of this series, we see that AN =

x ∈ E sj x − sx >

1 2

and

"

AN = ∅

N ∈

j=N

By the continuity of measures T4.4 we find for every > 0 some N = N ∈ such that AN < < 21 E

and

E \ AN > 0

In particular, if E ∗ = E \ AN , sj x − sk x sj x − sx + sx − sk x 1

∀ j k > N x ∈ E ∗



and an application of the Cauchy–Schwarz inequality for (double) series, cf. (12.13), shows 2 N ∗ E ck Rk d E∗

k=M+1

N

= E ∗

1

k=M+1

E = E

Rj Rk d

k=M+1

M<j k sn u = E u =

k∈

sn u − En u >

1 k

= 0

k∈

Therefore, sn u = En u a.e.

(iii)⇒(iv) For u ∈ L1 ∩L Theorem 23.15 shows that En u n∈ is a uniformly n→

integrable martingale and that En u −−−→ u in L1 and a.e. As in step 4 of the proof of Theorem 24.17, we use the inequality ap − bp p a − b maxap−1 bp−1 to deduce that ±

a b ∈ p > 1

En up − up d p En u − u · u p−1 1 n→

and, by Riesz’ convergence theorem 12.10, that En u −−−→ u in Lp . If u ∈ Lp is not bounded, we take an with exhausting sequence Ak k∈ ⊂ 1 Ak ↑ X and Ak < and set uk = −k ∨ u ∧ k 1Ak . Clearly, uk ∈ L ∩ L , and we see using Theorem 23.8(v),(ii),

E n u − u En u − En uk + En uk − uk + uk − u p p p p

E n uk − uk p + 2 uk − u p The claim follows if we let first n → and then k → . (iv)⇒(i) is just p = 2 combined with Theorem 21.13. If we know that the elements of the ONS are independent, we obtain the following necessary and sufficient conditions for pointwise convergence which generalize Theorem 24.22.


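24.27 Theorem Let $(X, \mathcal{A}, P)$ be a probability space and $(e_j)_{j\in\mathbb{N}_0} \subset L^2(P)$ be independent random variables such that $\int e_j\,dP = 0$ and $\int e_j^2\,dP = 1$, and let $(c_j)_{j\in\mathbb{N}_0} \subset \mathbb{R}$ be a sequence of real numbers. Then

(i) The family $(e_j)_{j\in\mathbb{N}_0}$ is an ONS of martingale differences;
(ii) If $\sum_{j=0}^{\infty} c_j^2 < \infty$, then $\sum_{j=0}^{\infty} c_j e_j$ converges in $L^2(P)$ and a.e.;
(iii) If $\sup_{j\in\mathbb{N}_0} \|e_j\|_\infty < \infty$ and if $\sum_{j=0}^{\infty} c_j e_j$ converges almost everywhere, then $\sum_{j=0}^{\infty} c_j^2 < \infty$.

Proof (i) We set

$$\mathcal{A}_n := \sigma(e_0, e_1, \dots, e_n) \quad\text{and}\quad u_n := \sum_{j=0}^{n} c_j e_j.$$

Since $P$ is a probability measure, $u_j \in L^2(P) \subset L^1(P)$ and under our assumptions it is clear that $(u_j, \mathcal{A}_j)_{j\in\mathbb{N}_0}$ is a martingale.[✓] By independence we have

$$\int e_j e_k\,dP = \begin{cases} \int e_j\,dP \cdot \int e_k\,dP = 0 & \text{if } j \neq k, \\[2pt] \int e_j^2\,dP = 1 & \text{if } j = k, \end{cases}$$

which entails

$$\int u_n^2\,dP = \sum_{j,k=0}^{n} c_j c_k \int e_j e_k\,dP = \sum_{j=0}^{n} c_j^2 \tag{24.29}$$

and also

$$\int u_n u_{n+k}\,dP = \int E^{\mathcal{A}_n}(u_n u_{n+k})\,dP = \int u_n\,E^{\mathcal{A}_n} u_{n+k}\,dP = \sum_{j=0}^{n} c_j^2. \tag{24.30}$$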
(ii) Because of (24.29) we see that un 21 un 22 =

n j=0

cj2

cj2 <

j=0 n→

and the martingale convergence theorem C18.3 shows that un −−−→ u a.e. Using (24.30) we conclude that un+k − un 2 dP = u2n+k − 2un un+k + u2n dP =

u2n+k dP −

u2n dP =

n+k j=n+1

cj2

j=n+1

cj2



Thus, by Fatou’s lemma 9.11, n→ cj2 −−−→ 0 u − un 2 dP lim inf un+k − un 2 dP k→

j=n+1

n→

and un −−−→ u follows in the L2 -sense. (iii) Since ej and j−1 are independent, we find for all A ∈ j−1 176 uj − uj−1 2 dP = cj2 ej2 dP = cj2 PA = cj2 dP A

A

(24.31)

A

Essentially the same calculation that was used in (24.30) also yields uj − uj−1 2 dP = 1A uj − 1A uj−1 2 dP A

= =

1A uj 2 dP −

A

1A uj−1 2 dP

u2j − u2j−1 dP

which can be combined with (24.31) to give n n−1 2 2 2 2 un − un−1 − cj dP = cj dP A

A

j=0

∀ A ∈ n−1

j=0

This means, however, that wn = u2n − nj=0 cj2 is a martingale. Consider the stopping time = = infn ∈ 0 un > , inf ∅ = . Since the series j=0 cj ej converges a.e., we can choose > 0 in such a way that 2 P < < 21 P = Without loss of generality we may also take 2 > w0 dP + u20 dP . The optional sampling theorem 17.8 proves that wn∧ n∈ is again a martingale and, therefore, n∧ 2 w0 dP = wn∧ dP = u2n∧ dP − cj dP (24.32) j=0

Taking into account the very definition of we find furthermore 2 2 2 un∧ dP + un∧ dP + u2n∧ dP un∧ dP = >n

2 2 +

1n

1n

u2 dP

=0


= 2 2 +

c e + u−1 2 dP

1n

2 2 + 2

c2 e2 + u2−1 dP

1n

where we used the elementary inequality a + b2 2a2 + 2b2 in the last line. Since the ej are uniformly bounded by and since u−1 , we get u2n∧ dP 4 2 + 2 c2 dP n

4 2 + 2 P n

n

cj2

(24.33)

j=0

4 2 + 21 P =

n

cj2

j=0

since, by construction, 2 P n 2 P < < 21 P = . Rearranging (24.32) and combining this with the above estimates we obtain n∧ n∧ n 2 2 cj2 = cj dP cj dP P = j=0

=

j=0

j=0 (24.32)

=

u2n∧ dP −

w0 dP

(24.33)

4 2 + 21 P =

n

cj2 + 2

j=0

uniformly for all n ∈ . Since, by assumption, P = > 0 for sufficiently 2 large , we conclude that j=0 cj < . Theorem 24.27 has an astonishing corollary if we apply the Burkholder–Davis– Gundy inequalities (BDG) from p. 294 to the martingale n+k

wn = un+k − uk =

cj ej

j=k+1

w.r.t. the filtration n = n+k = e0 e1 en+k . The part of the inequalities which is important for our purposes reads

p wn p p sup wn w• w• n (24.34) 0jn

p

p



where n ∈ 0 , 0 < p < , and the quadratic variation is given by w• w• n = u•+k − uk u•+k − uk n =

n−1

n+k

uj+k+1 − uj+k 2 =

j=0

cj2 ej2

j=k+1

If we happen to know that supj∈ ej < , we even find n+k 1/2

2

cj

w• w• n

j=k+1

and we conclude from (24.34) that for all n k ∈ and 0 < p <

p un+k − uk p u•+k − uk u•+k − uk 1/2 n p

u•+k − uk u•+k − uk 1/2 n

n+k

1/2 cj2

j=k+1

holds. This proves immediately the following

24.28 Corollary Let $(X,\mathcal{A},P)$ be a probability space and let $(e_j)_{j\in\mathbb{N}_0}$ be a sequence of independent random variables such that
$$\sup_{j\in\mathbb{N}_0}\|e_j\|_\infty < \infty, \qquad \int e_j\,dP = 0 \qquad\text{and}\qquad \int e_j^2\,dP = 1.$$
Then $u_n = \sum_{j=0}^{n} c_j e_j$ converges in $L^2$ and a.e. to some $u\in L^2$ if, and only if, $\sum_{j=0}^{\infty} c_j^2 < \infty$. If the latter is the case, $u\in L^p$ and the convergence takes place in $L^p$-sense for all $0<p<\infty$.

Unfortunately, many ONSs of martingale differences are incomplete and seem to behave more often like Rademacher functions than Haar functions. More on this topic can be found in the paper by Gundy [18] and the book by Garsia [16].

24.29 Epilogue The combination of martingale methods and orthogonal expansions opens up a whole new world. Let us illustrate this by a rapid construction of one of the most prominent stochastic processes: the Wiener process or Brownian motion. Choose in Theorem 24.27 $(X,\mathcal{A},P) = ([0,1],\mathcal{B}[0,1],\lambda)$ where $\lambda$ is one-dimensional Lebesgue measure on $[0,1]$; denoting points in $[0,1]$ by $\omega$, we will often write $d\omega$ instead of $\lambda(d\omega)$. Assume that the independent, identically distributed random variables $e_j$ are all standard normal Gaussian random variables, i.e.
$$P(e_j\in B) = \frac{1}{\sqrt{2\pi}}\int_B e^{-x^2/2}\,dx, \qquad B\in\mathcal{B}(\mathbb{R}),$$
and consider the series expansion
$$W_t(\omega) = \sum_{n=0}^{\infty} e_n(\omega)\,\langle 1_{[0,t]},H_n\rangle, \qquad \omega\in[0,1].$$
Here $t\in[0,1]$ is a parameter, $\langle u,v\rangle = \int_0^1 u(x)v(x)\,dx$, and $H_n$, $n = 2^k+j$, $0\le j<2^k$, denote the lexicographically ordered Haar functions (24.16). A short calculation confirms for $n\ge 1$
$$\langle 1_{[0,t]},H_n\rangle = \int_0^t H_n(x)\,dx = 2^{k/2}\int_0^t H_1(2^kx-j)\,dx = \tfrac12\,2^{-k/2}F_n(t)$$
where
$$F_1(t) = 2\int_0^t H_1(x)\,dx\;1_{[0,1]}(t) = 2t\,1_{[0,\frac12)}(t) - (2t-2)\,1_{[\frac12,1]}(t)$$
is a tent function and $F_n(t) = F_1(2^kt-j)$. Since $0\le F_n\le 1$, we see

10t Hn 2

n=0

2

1 1 2−k = 4 n=0 2

and Theorem 24.27(ii) guarantees that $W_t(\omega)$ exists, for each $t\in[0,1]$, both in $L^2(d\omega)$-sense and $d\omega$-almost everywhere. More is true. Since the $e_n$ are independent Gaussian random variables, so are their finite linear combinations (e.g. Bauer [5, §24]) and, in particular, the partial sums
$$S_N(t,\omega) = \sum_{n=0}^{N} e_n(\omega)\,\langle 1_{[0,t]},H_n\rangle.$$

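Gaussianity is preserved under $L^2$-limits;⁶ we conclude that $W_t(\omega)$ has a Gaussian distribution for each $t$. The mean is given by
$$\int_0^1 W_t(\omega)\,d\omega = \sum_{n=0}^{\infty}\int_0^1 e_n(\omega)\,d\omega\;\langle 1_{[0,t]},H_n\rangle = 0$$
(to change integration and summation use that $L^2(d\omega)$-convergence entails $L^1(d\omega)$-convergence on a finite measure space). Since $\int e_ne_m\,d\omega = 0$ or $1$ according to $n\neq m$ or $n=m$, we can calculate for $0\le s<t\le 1$ the variance by
$$\begin{aligned}
\int_0^1 (W_t(\omega)-W_s(\omega))^2\,d\omega
&= \sum_{n,m=0}^{\infty}\int_0^1 e_n(\omega)e_m(\omega)\,d\omega\;\langle 1_{[0,t]}-1_{[0,s]},H_n\rangle\,\langle 1_{[0,t]}-1_{[0,s]},H_m\rangle\\
&= \sum_{n=0}^{\infty}\langle 1_{(s,t]},H_n\rangle^2
\overset{24.17,\,21.13}{=} \langle 1_{(s,t]},1_{(s,t]}\rangle = t-s.
\end{aligned}$$
In particular, the increment $W_t-W_s$ has the same probability distribution as $W_{t-s}$. In the same vein we find for $0\le s<t\le u<v\le 1$ that
$$\int_0^1 (W_t(\omega)-W_s(\omega))(W_v(\omega)-W_u(\omega))\,d\omega = \langle 1_{(s,t]},1_{(u,v]}\rangle = 0.$$

⁶ (cf. [5, §§23, 24]) If $X_n$ is normally distributed with mean $0$ and variance $\sigma_n^2$, its Fourier transform is $\int e^{i\xi X_n}\,dP = e^{-\sigma_n^2\xi^2/2}$. If $X_n \xrightarrow{n\to\infty} X$ in $L^2$-sense, we have $\sigma_n^2\to\sigma^2$ and, by dominated convergence, $\int e^{i\xi X}\,dP = \lim_n\int e^{i\xi X_n}\,dP = \lim_n e^{-\sigma_n^2\xi^2/2} = e^{-\sigma^2\xi^2/2}$; the claim follows from the uniqueness of the Fourier transform.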

Since $W_t-W_s$ is Gaussian, this proves already the independence of the two increments $W_t-W_s$ and $W_v-W_u$, cf. [5, §24]. By induction, we conclude that $W_{t_n}-W_{t_{n-1}},\ldots,W_{t_1}-W_{t_0}$ are independent for all $0\le t_0\le\cdots\le t_n\le 1$.

Let us finally turn to the dependence of $W_t(\omega)$ on $t$. Note that for $M<N$
$$\begin{aligned}
\int_0^1\sup_{t\in[0,1]}|S_N(t,\omega)-S_M(t,\omega)|\,d\omega
&= \int_0^1\sup_{t\in[0,1]}\Big|\sum_{n=M+1}^{N} e_n(\omega)\,\langle 1_{[0,t]},H_n\rangle\Big|\,d\omega\\
&\le \sum_{n=M+1}^{N}\underbrace{\int_0^1|e_n(\omega)|\,d\omega}_{=\,\text{const.}}\;\sup_{t\in[0,1]}|\langle 1_{[0,t]},H_n\rangle|\\
&\le C\sum_{n=M+1}^{N}\tfrac12\,2^{-k/2} < \infty
\end{aligned}$$
which means that the partial sums $S_N(t,\omega)$ of $W_t(\omega)$ converge in $L^1(d\omega)$ uniformly for all $t\in[0,1]$. By C12.8 we can extract a subsequence, which converges (uniformly in $t$) for $d\omega$-almost all $\omega$ to $W_t(\omega)$; since for fixed $\omega$ the partial sums $t\mapsto S_N(t,\omega)$ are continuous functions of $t$, this property is inherited by the a.e. limit $W_t(\omega)$.

The above construction is a variation of a theme by Lévy [26, Chap. I.1, pp. 15–20] and Ciesielski [10]. In one or another form it can be found in many probability textbooks, e.g. Bass [3, pp. 11–13] or Steele [45, pp. 35–39]. A related construction of Wiener, see Paley and Wiener [34, Chapter XI], using random Fourier series, is discussed in Kahane [23, §16.1–3].
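The construction of the Epilogue can be tried out numerically. The sketch below is an illustration only, not the book's method of proof: it evaluates the coefficients $\langle 1_{[0,t]},H_n\rangle$ through the tent-function formula, forms one partial sum $S_N(t,\omega)$ with pseudo-random Gaussian weights, and checks the Parseval-type identity $\sum_n \langle 1_{[0,s]},H_n\rangle\langle 1_{[0,t]},H_n\rangle = \min(s,t)$ at dyadic points. The function names, the seed, and the truncation levels are assumptions made for the example.

```python
import random


def schauder(n, t):
    """Coefficient <1_[0,t], H_n> for t in [0, 1], H_n the Haar functions."""
    if n == 0:
        return t                     # H_0 is identically 1 on [0, 1)
    k = n.bit_length() - 1           # n = 2**k + j with 0 <= j < 2**k
    j = n - (1 << k)
    s = (1 << k) * t - j             # rescaled argument 2^k t - j
    if s <= 0.0 or s >= 1.0:
        return 0.0                   # t lies outside the open support
    tent = s if s < 0.5 else 1.0 - s # this is F_1(s) / 2
    return 2.0 ** (-k / 2) * tent


def brownian_partial_sum(t, gaussians):
    """S_N(t) = sum_{n<=N} e_n <1_[0,t], H_n> for given weights e_n."""
    return sum(e * schauder(n, t) for n, e in enumerate(gaussians))


def covariance_partial_sum(s, t, terms):
    """Partial sum of sum_n <1_[0,s], H_n> <1_[0,t], H_n>."""
    return sum(schauder(n, s) * schauder(n, t) for n in range(terms))


rng = random.Random(0)               # fixed seed, purely for reproducibility
gaussians = [rng.gauss(0.0, 1.0) for _ in range(2 ** 8)]
path = [brownian_partial_sum(i / 64, gaussians) for i in range(65)]
```

For dyadic $s,t$ the covariance sum is already exact after finitely many terms, because $1_{[0,t]}$ then lies in the span of finitely many Haar functions; for general $t$ the partial sums only approximate $\min(s,t)$.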


Problems

24.1. Prove the orthogonality relation for the Jacobi polynomials 24.1.

24.2. Use the Gram–Schmidt orthonormalization procedure to verify the formulae for the first few Chebyshev, Legendre, Laguerre and Hermite polynomials given in 24.1–24.5.

24.3. State and prove Theorem 24.6 and Corollary 24.8 for an arbitrary compact interval $[a,b]$.

24.4. Prove the orthogonality relations (24.4) for the trigonometric system. [Hint: observe that $\operatorname{Im}\big(e^{i(x+y)} + e^{i(x-y)}\big) = 2\sin x\cos y$.]

24.5. (i) Show that for suitable constants $c_j, s_j\in\mathbb{R}$ and all $k\in\mathbb{N}_0$
$$\cos^k x = \sum_{j=0}^{k} c_j\cos jx \qquad\text{and}\qquad \sin^{k+1}x = \sum_{j=1}^{k+1} s_j\sin jx.$$
(ii) Show that for suitable constants $a_j, b_j\in\mathbb{R}$ and all $k\in\mathbb{N}$
$$\cos kx = \sum_{j=0}^{k} a_j\cos^{k-j}x\,\sin^j x \qquad\text{and}\qquad \sin kx = \sum_{j=1}^{k} b_j\cos^{k-j}x\,\sin^j x.$$
(iii) Deduce that every trigonometric polynomial $T_n(x)$ of order $n$ can be written in the form
$$U_n(x) = \sum_{j,k=0}^{n}\gamma_{j,k}\cos^j x\,\sin^k x$$
and vice versa.

24.6. Use the formula $\sin a - \sin b = 2\cos\frac{a+b}{2}\sin\frac{a-b}{2}$ to show that $D_N(x)\sin\frac{x}{2} = \frac12\sin\big(N+\frac12\big)x$. This proves (24.11).

24.7. Find the Fourier series expansion for the function $|\sin x|$.

24.8. Let $u(x) = 1_{[0,1)}(x)$. Show that the Haar–Fourier series for $u$ converges for all $1\le p<\infty$ in $L^p$-sense to $u$. Is this also true for the Haar wavelet expansion?

24.9. Show that the Haar–Fourier series for $u\in C_c(\mathbb{R})$ converges uniformly for every $x\in\mathbb{R}$ to $u(x)$. Show that this remains true for functions $u\in C_\infty(\mathbb{R})$, i.e. the set of continuous functions such that $\lim_{|x|\to\infty}u(x) = 0$. [Hint: use the fact that $u\in C_c(\mathbb{R})$ is uniformly continuous. For $u\in C_\infty(\mathbb{R})$ observe that $C_\infty = \overline{C_c}^{\,\|\bullet\|_\infty}$ (closure in sup-norm) and check that $\|s_N(u;\cdot)\|_\infty\le\|u\|_\infty$.]

24.10. Extend Problem 24.9 to the Haar wavelet expansion.

N → [Hint: use Problem 24.9 and show that E−N u −−−→ 0 for all u ∈ Cc .]

24.11. Let $u(x) = 1_{[0,1/3)}(x)$. Prove that the Haar–Fourier series diverges at $x = \frac13$. [Hint: verify $\liminf_{N\to\infty}s_N(u;\frac13) < \limsup_{N\to\infty}s_N(u;\frac13)$.]

Appendix A lim inf and lim sup

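For a sequence of real numbers $(a_j)_{j\in\mathbb{N}}\subset\mathbb{R}$ the limes inferior or lower limit is defined as
$$\liminf_{j\to\infty}a_j := \sup_{k\in\mathbb{N}}\,\inf_{j\ge k}a_j \tag{A.1}$$
and the limes superior or upper limit is defined as
$$\limsup_{j\to\infty}a_j := \inf_{k\in\mathbb{N}}\,\sup_{j\ge k}a_j. \tag{A.2}$$
Lower and upper limits of a sequence are always defined as numbers in $[-\infty,+\infty]$. This is due to the fact that the sequences $\big(\inf_{j\ge k}a_j\big)_{k\in\mathbb{N}}\subset[-\infty,+\infty)$ and $\big(\sup_{j\ge k}a_j\big)_{k\in\mathbb{N}}\subset(-\infty,+\infty]$ are in- resp. decreasing, so that the $\sup_{k\in\mathbb{N}}$ and $\inf_{k\in\mathbb{N}}$ in (A.1) and (A.2) are actually (improper) limits $\lim_{k\to\infty}$. Let us collect a few simple properties of lim inf and lim sup.

A.1 Properties (of lim inf and lim sup). Let $(a_j)_{j\in\mathbb{N}}$ and $(b_j)_{j\in\mathbb{N}}$ be sequences of real numbers.

(i) $\liminf_{j\to\infty}a_j = \lim_{k\to\infty}\inf_{j\ge k}a_j$ and $\limsup_{j\to\infty}a_j = \lim_{k\to\infty}\sup_{j\ge k}a_j$.

(ii) $\liminf_{j\to\infty}a_j = -\limsup_{j\to\infty}(-a_j)$.

(iii) $\liminf_{j\to\infty}a_j \le \limsup_{j\to\infty}a_j$.

(iv) $\liminf_{j\to\infty}a_j$ and $\limsup_{j\to\infty}a_j$ are limits of subsequences of $(a_j)_{j\in\mathbb{N}}$ and all other limits $L$ of subsequences of $(a_j)_{j\in\mathbb{N}}$ satisfy $\liminf_{j\to\infty}a_j\le L\le\limsup_{j\to\infty}a_j$.

(v) $\lim_{j\to\infty}a_j\in\mathbb{R}$ exists $\iff$ $-\infty<\liminf_{j\to\infty}a_j = \limsup_{j\to\infty}a_j<+\infty$. In this case $\lim_{j\to\infty}a_j = \liminf_{j\to\infty}a_j = \limsup_{j\to\infty}a_j$.

(vi) $\liminf_{j\to\infty}a_j + \liminf_{j\to\infty}b_j \le \liminf_{j\to\infty}(a_j+b_j)$,
$\limsup_{j\to\infty}(a_j+b_j) \le \limsup_{j\to\infty}a_j + \limsup_{j\to\infty}b_j$.

(vii) If $a_j, b_j\ge 0$ for all $j\in\mathbb{N}$, then
$\liminf_{j\to\infty}a_j\,\liminf_{j\to\infty}b_j \le \liminf_{j\to\infty}a_jb_j$,
$\limsup_{j\to\infty}a_jb_j \le \limsup_{j\to\infty}a_j\,\limsup_{j\to\infty}b_j$.

(viii) $\liminf_{j\to\infty}(a_j+b_j) \le \liminf_{j\to\infty}a_j + \limsup_{j\to\infty}b_j \le \limsup_{j\to\infty}(a_j+b_j)$.

(ix) If, for all $j\in\mathbb{N}$, $a_j, b_j\ge 0$, then
$\liminf_{j\to\infty}a_jb_j \le \liminf_{j\to\infty}a_j\,\limsup_{j\to\infty}b_j \le \limsup_{j\to\infty}a_jb_j$.

(x) If the limit $\lim_{j\to\infty}a_j$ exists, then
$\liminf_{j\to\infty}(a_j+b_j) = \lim_{j\to\infty}a_j + \liminf_{j\to\infty}b_j$,
$\limsup_{j\to\infty}(a_j+b_j) = \lim_{j\to\infty}a_j + \limsup_{j\to\infty}b_j$.

(xi) If $a_j, b_j\ge 0$ for all $j\in\mathbb{N}$ and if $\lim_{j\to\infty}a_j$ exists, then
$\liminf_{j\to\infty}a_jb_j = \lim_{j\to\infty}a_j\,\liminf_{j\to\infty}b_j$,
$\limsup_{j\to\infty}a_jb_j = \lim_{j\to\infty}a_j\,\limsup_{j\to\infty}b_j$.

(xii) $\limsup_{j\to\infty}|a_j| = 0 \implies \lim_{j\to\infty}a_j = 0$.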
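The definitions (A.1) and (A.2) can be explored numerically. The following sketch is an illustration under an obvious assumption: only finitely many terms of the sequence are available, so the tail infima and suprema are taken over a finite list and are reliable only for indices $k$ well below the list length. The sequence $a_j = (-1)^j(1+1/j)$, with $\liminf = -1$ and $\limsup = +1$, and the function names are illustrative choices.

```python
def tail_infs(a):
    """inf_{j >= k} a_j for every start index k of the finite list a."""
    out, cur = [], float("inf")
    for x in reversed(a):
        cur = min(cur, x)
        out.append(cur)
    return out[::-1]


def tail_sups(a):
    """sup_{j >= k} a_j for every start index k of the finite list a."""
    out, cur = [], float("-inf")
    for x in reversed(a):
        cur = max(cur, x)
        out.append(cur)
    return out[::-1]


# a_j = (-1)^j (1 + 1/j), j = 1, ..., 2000; list index k corresponds to j = k + 1.
a = [(-1) ** j * (1 + 1 / j) for j in range(1, 2001)]
infs, sups = tail_infs(a), tail_sups(a)
```

The tail infima increase and the tail suprema decrease in $k$, which is exactly why the $\sup_k$ and $\inf_k$ in (A.1) and (A.2) are limits.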

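Proof (i) follows from the remark preceding A.1, (ii) is clear since $\inf_j a_j = -\sup_j(-a_j)$, and (iii) follows from the inequality $\inf_{j\ge k}a_j\le\sup_{j\ge k}a_j$ where we can pass to the limit $k\to\infty$ on both sides.

Notice that (ii) reduces any statement about lim sup to a dual statement for lim inf. This means that we need to show (iv)–(xi) for the lower limit only.

(iv): Let $(a_{n(j)})_{j\in\mathbb{N}}\subset(a_j)_{j\in\mathbb{N}}$ be some subsequence with (improper) limit $L = \lim_{j\to\infty}a_{n(j)}$. Then
$$\inf_{j\ge k}a_j \le \inf_{j\ge k}a_{n(j)} \le L \implies \lim_{k\to\infty}\inf_{j\ge k}a_j \le L,$$
i.e. $\liminf_{j\to\infty}a_j$ is smaller than any limit of any subsequence. Let us now construct a subsequence which has $L^* := \liminf_{j\to\infty}a_j > -\infty$ as its limit. By the very definition of $L^*$ and the infimum we find for all $\epsilon>0$ some $N_\epsilon\in\mathbb{N}$ such that
$$L^* - \epsilon \le \inf_{j\ge k}a_j \qquad \forall\,k\ge N_\epsilon.$$
Since then $\inf_{j\ge k}a_j > -\infty$, we find by the definition of the infimum some $\ell\ge k\ge N_\epsilon$, $\ell = \ell_{k,\epsilon}$, and $a_\ell$ with
$$a_\ell - \epsilon \le \inf_{j\ge k}a_j.$$
Specializing $\epsilon = \frac1n$, $n\in\mathbb{N}$, we obtain an infinite family of $a_{\ell(n)}$ from which we can extract a subsequence with limit $L^*$. If $L^* = -\infty$, the sequence $(a_j)_{j\in\mathbb{N}}$ is unbounded from below and it is obvious that there must exist a subsequence tending to $-\infty$.

(v): If $\lim_{j\to\infty}a_j$ exists, then all subsequences converge and have the same limit, thus $\liminf_{j\to\infty}a_j = \lim_{j\to\infty}a_j = \limsup_{j\to\infty}a_j$ by (iv). Conversely, if $L = \liminf_{j\to\infty}a_j = \limsup_{j\to\infty}a_j$, we get for all $k\in\mathbb{N}$
$$0 \le a_k - \inf_{j\ge k}a_j \le \sup_{j\ge k}a_j - \inf_{j\ge k}a_j \xrightarrow{k\to\infty} 0$$
and $\lim_{k\to\infty}a_k = \lim_{k\to\infty}\inf_{j\ge k}a_j = L$ follows from a sandwiching argument.

(vi) follows immediately from
$$\inf_{j\ge k}a_j + \inf_{j\ge k}b_j \le a_\ell + b_\ell \quad\forall\,\ell\ge k \implies \inf_{j\ge k}a_j + \inf_{j\ge k}b_j \le \inf_{\ell\ge k}(a_\ell+b_\ell)$$
if we pass to the limit $k\to\infty$ on both sides.

(vii): We have $0\le\inf_{j\ge k}b_j\le b_\ell$ for all $\ell\ge k$ and multiplying this inequality with $0\le\inf_{j\ge k}a_j\le a_\ell$, $\ell\ge k$, gives
$$\inf_{j\ge k}a_j\,\inf_{j\ge k}b_j \le a_\ell b_\ell \quad\forall\,\ell\ge k \implies \inf_{j\ge k}a_j\,\inf_{j\ge k}b_j \le \inf_{\ell\ge k}a_\ell b_\ell.$$
The assertion follows as we go to the limit $k\to\infty$ on both sides.

(viii): We have
$$\inf_{j\ge k}(a_j+b_j) \le a_\ell + b_\ell \le a_\ell + \sup_{j\ge k}b_j \qquad\forall\,\ell\ge k$$
so that $\inf_{j\ge k}(a_j+b_j)\le\inf_{j\ge k}a_j + \sup_{j\ge k}b_j$, and the assertion follows as we go to the limit $k\to\infty$ on both sides.

(ix) is similar to (viii) taking into account the precautions set out in (vii).

(x): If $\lim_{j\to\infty}a_j$ exists, we know from (v) that $\lim_{j\to\infty}a_j = \liminf_{j\to\infty}a_j = \limsup_{j\to\infty}a_j$. Thus
$$\begin{aligned}
\lim_{j\to\infty}a_j + \liminf_{j\to\infty}b_j
&\overset{\text{A.1(v)}}{=} \liminf_{j\to\infty}a_j + \liminf_{j\to\infty}b_j\\
&\overset{\text{A.1(vi)}}{\le} \liminf_{j\to\infty}(a_j+b_j)\\
&\overset{\text{A.1(viii)}}{\le} \limsup_{j\to\infty}a_j + \liminf_{j\to\infty}b_j\\
&\overset{\text{A.1(v)}}{=} \lim_{j\to\infty}a_j + \liminf_{j\to\infty}b_j.
\end{aligned}$$

(xi) is similar to (x) using (v), (vii) and (ix).

(xii): Since $|a_j|\ge 0$,
$$0 \le \liminf_{j\to\infty}|a_j| \overset{\text{A.1(iii)}}{\le} \limsup_{j\to\infty}|a_j| = 0$$
and we conclude from (v) that
$$\lim_{j\to\infty}|a_j| = \liminf_{j\to\infty}|a_j| = \limsup_{j\to\infty}|a_j| = 0.$$
Thus $\lim_{j\to\infty}a_j = 0$.

∗ ∗ ∗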

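Sometimes the following definitions for upper and lower limits of a sequence of sets $(A_j)_{j\in\mathbb{N}}$, $A_j\subset X$, are used:
$$\liminf_{j\to\infty}A_j := \bigcup_{k\in\mathbb{N}}\bigcap_{j\ge k}A_j \qquad\text{and}\qquad \limsup_{j\to\infty}A_j := \bigcap_{k\in\mathbb{N}}\bigcup_{j\ge k}A_j. \tag{A.3}$$
The connection between set-theoretic and numerical upper and lower limits is given by

A.2 Lemma For all $x\in X$ we have
$$\liminf_{j\to\infty}1_{A_j}(x) = 1_{\liminf_{j\to\infty}A_j}(x), \tag{A.4}$$
$$\limsup_{j\to\infty}1_{A_j}(x) = 1_{\limsup_{j\to\infty}A_j}(x). \tag{A.5}$$

Proof Note that
$$1_{\bigcap_{k\in\mathbb{N}}B_k} = \inf_{k\in\mathbb{N}}1_{B_k} \qquad\text{and}\qquad 1_{\bigcup_{k\in\mathbb{N}}B_k} = \sup_{k\in\mathbb{N}}1_{B_k}$$
which follows from
$$1_{\bigcap_{k\in\mathbb{N}}B_k}(x) = 1 \iff x\in\bigcap_{k\in\mathbb{N}}B_k \iff \forall\,k\in\mathbb{N}: x\in B_k \iff \forall\,k\in\mathbb{N}: 1_{B_k}(x) = 1 \iff \inf_{k\in\mathbb{N}}1_{B_k}(x) = 1.$$
A similar argument proves the assertion for $\sup_{k\in\mathbb{N}}1_{B_k}$. Hence,
$$1_{\liminf_{j\to\infty}A_j} = 1_{\bigcup_{k\in\mathbb{N}}\bigcap_{j\ge k}A_j} = \sup_{k\in\mathbb{N}}1_{\bigcap_{j\ge k}A_j} = \sup_{k\in\mathbb{N}}\inf_{j\ge k}1_{A_j} = \liminf_{j\to\infty}1_{A_j}$$
and (A.5) follows analogously.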
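Lemma A.2 can be checked by hand on a small example. The sketch below makes a simplifying assumption that is not in the text: the sequence of sets is periodic, so every tail $\{A_j : j\ge k\}$ contains exactly the sets of one period. In that case the set-theoretic lower limit is the intersection over one period and the upper limit is the union, while the numerical $\liminf$ and $\limsup$ of the indicators are the minimum and maximum over one period. All names and the sample sets are illustrative.

```python
def periodic_liminf(period):
    """liminf of a periodic sequence of sets: intersection over one period."""
    return set.intersection(*period)


def periodic_limsup(period):
    """limsup of a periodic sequence of sets: union over one period."""
    return set.union(*period)


def indicator(A, x):
    """Indicator function 1_A(x)."""
    return 1 if x in A else 0


# One period of the sequence A_1, A_2, A_3, A_1, A_2, A_3, ...
period = [{0, 1, 2}, {1, 2, 3}, {2, 4}]
lower = periodic_liminf(period)
upper = periodic_limsup(period)

# (A.4) and (A.5), checked pointwise on a finite ground set X = {0, ..., 5}.
a4_holds = all(
    min(indicator(A, x) for A in period) == indicator(lower, x) for x in range(6)
)
a5_holds = all(
    max(indicator(A, x) for A in period) == indicator(upper, x) for x in range(6)
)
```

In words: $x$ belongs to the lower limit if it lies in all but finitely many $A_j$, and to the upper limit if it lies in infinitely many $A_j$.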

Appendix B Some facts from point-set topology

The following diagram gives a survey of various types of abstract spaces used in this book. The arrows ‘⟶’ indicate how the spaces are connected. In brackets we mention the key concepts that define the notion of convergence in these spaces.

[Diagram: $\mathbb{R}^n$; Hilbert space (scalar product, complete); Banach space (norm, complete); inner product space (scalar product); normed space (norm); metric space (distance); topological space (open set).]

Note that due to the Riesz–Fischer theorem 12.7 the space $(L^2,\langle\bullet,\bullet\rangle)$ is a Hilbert space and all $(L^p,\|\bullet\|_p)$, $1\le p<\infty$, are Banach spaces. The material below can be found in many introductory texts on general topology and real analysis. For this compilation we used the books by Willard [54], Steen and Seebach [46] and Rudin [39]. Complete proofs are given in [54] and in the first few chapters of [39].


Topological spaces

Topological spaces are characterized by the notion of openness of sets.

B.1 Definition A topological space $(X,\mathcal{O})$ consists of a set $X$ and a system $\mathcal{O} = \mathcal{O}(X)$ of subsets of $X$, called a topology, which satisfies the following properties:
$$\emptyset,\,X\in\mathcal{O} \tag{$\mathcal{O}_1$}$$
$$U,V\in\mathcal{O} \implies U\cap V\in\mathcal{O} \tag{$\mathcal{O}_2$}$$
$$U_i\in\mathcal{O},\ i\in I\ (\text{arbitrary}) \implies \bigcup_{i\in I}U_i\in\mathcal{O} \tag{$\mathcal{O}_3$}$$
A set $U\in\mathcal{O}$ is called an open set. A set $F\subset X$ is closed, if its complement $F^c$ is open. We write $\mathcal{C} = \mathcal{C}(X)$ for the family of closed sets in $X$.

From de Morgan’s identities (2.2) it is not hard to see that
• $X$ and $\emptyset$ are closed sets,
• unions of finitely many closed sets are again closed,
• intersections of arbitrarily many closed sets are again closed.

B.2 Examples Let $X$ be an arbitrary set.
(i) $\{\emptyset,X\}$ is a topology on $X$.
(ii) The power set $\mathcal{P}(X)$ is a topology on $X$.
(iii) Let $U$ be a ‘classical’ open set in $\mathbb{R}^n$, i.e. for every $x\in U$ one can find some $\epsilon>0$ such that $B_\epsilon(x)\subset U$. The classical open sets $\mathcal{O}(\mathbb{R}^n)$ are a topology in $\mathbb{R}^n$. Unless otherwise stated, we will always consider this natural topology on $\mathbb{R}^n$.
(iv) (Trace topology) Let $(X,\mathcal{O}(X))$ be a topological space and $A\subset X$ be any subset. Then the relatively open subsets of $A$,
$$\mathcal{O}(A) := A\cap\mathcal{O}(X) := \{A\cap U : U\in\mathcal{O}(X)\},$$
turn $(A,\mathcal{O}(A))$ into a topological space.
(v) (Product topology) Let $(X,\mathcal{O}(X))$ and $(Y,\mathcal{O}(Y))$ be topological spaces. Then $X\times Y$ becomes a topological space under the product topology $\mathcal{O}(X\times Y)$: by definition, a set $W\in\mathcal{O}(X\times Y)$ if $W\subset X\times Y$ and if for each $w = (x,y)\in W$ there exist $U\in\mathcal{O}(X)$ and $V\in\mathcal{O}(Y)$ such that
$$w = (x,y)\in U\times V\subset W.$$
This makes $\mathcal{O}(X\times Y)$ the smallest topology containing $\mathcal{O}(X)\times\mathcal{O}(Y)$.

B.3 Definition Let $(X,\mathcal{O})$ be a topological space.
(i) An open neighbourhood of a point $x\in X$ is an open set $U = U(x)$ containing $x$. A neighbourhood of $x$ is any set containing an open neighbourhood of $x$.


(ii) The space $X$ is called separated or a Hausdorff space if any two different points $x,y\in X$ have disjoint neighbourhoods.
(iii) Let $A\subset X$. The closure of $A$, denoted by $\bar A$, is the smallest closed set containing $A$, i.e. $\bar A = \bigcap_{F\in\mathcal{C},\,F\supset A}F$.
(iv) Let $A\subset X$. The (open) interior of $A$, denoted by $A^\circ$, is the largest open set inside $A$, i.e. $A^\circ = \bigcup_{U\in\mathcal{O},\,U\subset A}U$.
(v) A set $A\subset X$ is dense in $X$, if $\bar A = X$.
(vi) The space $X$ is separable if it contains a countable dense subset.

B.4 Examples
(i) The space $(\mathbb{R}^n,\mathcal{O}(\mathbb{R}^n))$ is a Hausdorff space.
(ii) The space $(X,\mathcal{P}(X))$ and all spaces mentioned in the diagram at the beginning of the section are Hausdorff spaces.
(iii) The space $(X,\{\emptyset,X\})$ is not separated.
(iv) A set $U$ in a topological space $(X,\mathcal{O})$ is open if, and only if, it is a neighbourhood of each of its points.
(v) The open ball $B_r(x) = \{y\in\mathbb{R}^n : |x-y|<r\}$ in $\mathbb{R}^n$ is an open neighbourhood of $x$. The closed ball $K_r(x) = \{y\in\mathbb{R}^n : |x-y|\le r\}$ is the closure of $B_r(x)$, thus $\overline{B_r(x)} = K_r(x)$.
(vi) The set of rational numbers $\mathbb{Q}$ is dense in $\mathbb{R}$. Therefore $\mathbb{R}$ is separable. The same is true for $\mathbb{R}^n$ when we consider the countable dense set $\mathbb{Q}^n$.

Density assertions are often expressed through approximation theorems such as Corollary 12.11, Theorem 24.6 or Corollary 24.12.

B.5 Definition A subset $K$ of a Hausdorff space $(X,\mathcal{O})$ is called compact, if every cover of $K$ by open sets, $K\subset\bigcup_{i\in I}U_i$, $U_i\in\mathcal{O}$, $I$ an arbitrary index set, admits a finite sub-cover, i.e. if there are finitely many $U_{i_1},\ldots,U_{i_n}$ such that $K\subset U_{i_1}\cup\ldots\cup U_{i_n}$. A set $L$ is relatively compact if $\bar L$ is compact.

B.6 Proposition Let $(X,\mathcal{O})$ be a Hausdorff space.
(i) Every compact set $K$ is closed.
(ii) Closed subsets of compact sets are compact.
(iii) A family $(K_i)_{i\in I}$ of compact sets (indexed by an arbitrary set $I$) has non-empty intersection $\bigcap_{i\in I}K_i\neq\emptyset$ if, and only if, every finite subcollection $(K_{i_j})_{j=1}^{n}$ has non-empty intersection $K_{i_1}\cap K_{i_2}\cap\ldots\cap K_{i_n}\neq\emptyset$.

B.7 Example A set $K\subset\mathbb{R}^n$ is compact if, and only if, it is closed and bounded. This is also equivalent to saying that every sequence $(x_j)_{j\in\mathbb{N}}\subset K$ has a convergent subsequence. Such a simple characterization of compactness fails in infinite-dimensional spaces, notably in the Hilbert space $L^2$ or the Banach spaces $L^p$, $1\le p<\infty$, see Theorem B.22 and B.27.



Proposition B.6(iii) is an abstract version of the well known interval principle in $\mathbb{R}$: a sequence of nested closed intervals $[a_j,b_j]\subset\mathbb{R}$, $j\in\mathbb{N}$, has non-empty intersection $\bigcap_{j\in\mathbb{N}}[a_j,b_j]\neq\emptyset$. If, in addition, $\lim_{j\to\infty}(b_j-a_j) = 0$ then $\bigcap_{j\in\mathbb{N}}[a_j,b_j] = \{L\}$ where $L = \lim_{j\to\infty}a_j = \lim_{j\to\infty}b_j$.

B.8 Definition Let $(X,\mathcal{O}(X))$ and $(Y,\mathcal{O}(Y))$ be two topological spaces. A map $f: X\to Y$ is called continuous at $x\in X$, if for every neighbourhood $V = V(f(x))$ we can find a neighbourhood $U = U(x)$ of $x$ such that $f(U)\subset V$. If $f$ is continuous at every $x\in X$, we call $f$ continuous.

B.9 Example Definition B.8 coincides on Euclidean spaces with the classical notion of continuity, i.e. a map $f:\mathbb{R}^n\to\mathbb{R}^m$ is continuous at $x\in\mathbb{R}^n$ if, and only if, for every convergent sequence $x_j\xrightarrow{j\to\infty}x$ we have $f(x_j)\xrightarrow{j\to\infty}f(x)$, cf. Theorem B.19.

B.10 Definition Let $(X,\mathcal{O})$ be a topological space. A set $A\subset X$ is called connected, if $A$ cannot be written in the form $A = U\cup V$ where $U,V\in\mathcal{O}(A)$ are non-empty and $U\cap V = \emptyset$. The set $A$ is called pathwise connected, if for any two points $x,y\in A$ there is a continuous curve or path $\gamma:[0,1]\to A$ such that $\gamma(0) = x$ and $\gamma(1) = y$.

B.11 Examples (i) The only connected sets in $\mathbb{R}$ are finite or infinite intervals. The set $[a,b]\cup[c,d]$ where $a$ […] $>0$ is connected, but no path can be found from $(0,0)$ to any point $(x,\sin\frac1x)$.

B.12 Theorem Let $f: X\to Y$ be a map between the topological spaces $(X,\mathcal{O}(X))$ and $(Y,\mathcal{O}(Y))$.
(i) The map $f$ is continuous if, and only if, for all open $V\in\mathcal{O}(Y)$ the pre-image $f^{-1}(V)\in\mathcal{O}(X)$ is open.
(ii) Let $f$ be continuous. The image $f(K)\subset Y$ of a compact [connected, pathwise connected] set $K\subset X$ is again compact [connected, pathwise connected].
(iii) Let $K\subset X$ be a compact set. A continuous map¹ $g: K\to\mathbb{R}$ attains its maximum and minimum.
(iv) Let $K\subset X$ be a compact set and $f: K\to\mathbb{R}$ be an injective and continuous map. Then the inverse map $f^{-1}: f(K)\to K$ exists and is continuous.

¹ We consider here the trace topology $\mathcal{O}(K)$, cf. Example B.2.

Since for our purposes the characterization of continuity by open sets is of central importance, cf. Example 7.3, we include the short


Proof (of Theorem B.12(i)) ‘⇐’ Assume first that $f^{-1}(\mathcal{O}(Y))\subset\mathcal{O}(X)$. Every neighbourhood $\tilde V$ of $f(x)$ contains by definition an open set $V\subset\tilde V$ with $f(x)\in V$. By assumption, $U := f^{-1}(V)$ is open, and since $x\in U$, $U$ is an (open) neighbourhood of $x$ with $f(U) = f\circ f^{-1}(V)\subset V$.

‘⇒’ Assume now that $f$ is continuous. Take any open set $B\subset Y$, set $A := f^{-1}(B)$ and fix some $x\in A$. Since $B$ is open, there is some open neighbourhood $V = V(f(x))\subset B$ and by continuity we find some neighbourhood $U = U(x)\subset X$ of $x$ with $f(U)\subset V$. Thus
$$U\subset f^{-1}\circ f(U)\subset f^{-1}(V)\subset f^{-1}(B) \overset{\text{def}}{=} A$$
which shows that $A$ contains for every of its points a whole neighbourhood. This is to say that $A$ is open.

B.13 Example Let $g:[a,b]\to\mathbb{R}$ be a continuous function. Since $[a,b]$ is compact, $g$ attains its maximum $M = \sup g([a,b]) = g(x_{\max})$ and minimum $m = \inf g([a,b]) = g(x_{\min})$ at some points $x_{\max},x_{\min}\in[a,b]$. Since $[a,b]$ is compact and pathwise connected, so is $g([a,b])$, hence it is of the form $[m,M]$. In particular, we have recovered the intermediate value theorem for functions of a real variable.

B.14 Definition Let $(x_j)_{j\in\mathbb{N}}\subset X$ be a sequence in the topological space $(X,\mathcal{O})$. We say that $x_j$ converges to $x\in X$ and write $\lim_{j\to\infty}x_j = x$ or $x_j\xrightarrow{j\to\infty}x$ if for every open neighbourhood $U = U(x)$ there is some $N = N_U\in\mathbb{N}$ such that $x_j\in U$ for all $j\ge N_U$.

This is also the ‘usual’ convergence in the spaces $\mathbb{R}$ and $\mathbb{R}^n$. Note that limits are only unique if $X$ is a Hausdorff space. Sometimes we can use limits of sequences to give an equivalent description of the topology. This is always the case if every point $x\in X$ has a countable system of open neighbourhoods $(U_n)_{n\in\mathbb{N}}$ with the property that for every neighbourhood $V = V(x)$ of $x$ there is at least one $U_{n_0}\subset V$; this is always true in metric spaces, cf. B.19.

Metric spaces

In metric spaces we have a notion of distance between any two points.

B.15 Definition A metric space $(X,d)$ is a set $X$ with a distance function or metric $d: X\times X\to[0,\infty)$ such that for all $x,y,z\in X$
(definiteness) $d(x,y) = 0 \iff x = y$ $\quad(d_1)$
(symmetry) $d(x,y) = d(y,x)$ $\quad(d_2)$
(triangle inequality) $d(x,y)\le d(x,z)+d(z,y)$ $\quad(d_3)$


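B.16 Examples (i) Let $(X,d)$ be a metric space. Then $(A,d_A)$, $d_A := d|_{A\times A}$, is again a metric space for all $A\subset X$.
(ii) The real line $\mathbb{R}$ is a metric space with $d(x,y) = |x-y|$. The space $\mathbb{R}^n$ becomes a metric space with each of the following metrics:
$$d_p(x,y) := \begin{cases}\Big(\sum_{j=1}^{n}|x_j-y_j|^p\Big)^{1/p} & \text{if } 1\le p<\infty,\\[4pt] \max_{1\le j\le n}|x_j-y_j| & \text{if } p = \infty.\end{cases}$$
(iii) The topological space $(X,\mathcal{P}(X))$ is a metric space with metric
$$d(x,y) = \begin{cases}1 & \text{if } x\neq y,\\ 0 & \text{if } x = y.\end{cases}$$
(iv) Let $(X_j,d_j)$, $j = 1,2$, be two metric spaces. Then $X_1\times X_2$ becomes a metric space for any of the following metrics ($x_j,y_j\in X_j$, $1\le p<\infty$):
$$\rho_p\big((x_1,x_2),(y_1,y_2)\big) := \big(d_1(x_1,y_1)^p + d_2(x_2,y_2)^p\big)^{1/p}$$
or
$$\rho_\infty\big((x_1,x_2),(y_1,y_2)\big) := \max_{j=1,2}d_j(x_j,y_j).$$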
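The metrics $d_p$ of Example B.16(ii) are easy to experiment with. The following sketch, with illustrative function names and sample points, computes $d_p$ for finite $p$ and for $p=\infty$, checks the triangle inequality on a few points, and verifies the elementary comparison $d_\infty\le d_p\le n^{1/p}d_\infty$, which is one concrete instance of the equivalence of metrics asserted in Theorem B.22.

```python
import math
from itertools import permutations


def d_p(x, y, p):
    """The metric d_p on R^n from Example B.16(ii); p may be math.inf."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


points = [(0.0, 0.0), (3.0, 4.0), (-1.0, 2.5), (2.0, -7.0)]


def triangle_inequality_holds(p, tol=1e-12):
    """Check d(x, y) <= d(x, z) + d(z, y) on all triples of sample points."""
    return all(
        d_p(x, y, p) <= d_p(x, z, p) + d_p(z, y, p) + tol
        for x, y, z in permutations(points, 3)
    )


def comparable_with_max_metric(p, tol=1e-12):
    """Check d_inf <= d_p <= n**(1/p) * d_inf on all pairs of sample points."""
    n = len(points[0])
    return all(
        d_p(x, y, math.inf) <= d_p(x, y, p) + tol
        and d_p(x, y, p) <= n ** (1.0 / p) * d_p(x, y, math.inf) + tol
        for x, y in permutations(points, 2)
    )
```

A finite check like this cannot prove the inequalities, of course; it only illustrates them on sample data.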

B.17 Definition Let $(X,d)$ be a metric space. We call
$$B_r(x) := \{y\in X : d(x,y)<r\} \qquad\text{resp.}\qquad K_r(x) := \{y\in X : d(x,y)\le r\}$$
an open resp. closed ball with centre $x$ and radius $r>0$. An open set is a set $U\subset X$ such that for every $x\in U$ there is some $\epsilon>0$ and $B_\epsilon(x)\subset U$. Closed sets arise as complements of open sets.

Using the triangle inequality it is easy to see that open balls in $X$ are also open sets and that closed balls are closed sets. Mind, however, that in general $\overline{B_r(x)}\neq K_r(x)$.

B.18 Lemma The family $\mathcal{O}$ of open sets of a metric space $(X,d)$ is a topology in the sense of Definition B.1. $(X,\mathcal{O})$ is a separated topological space.

The converse of Lemma B.18 is wrong: the topology $\{\emptyset,\{a\},X\}$ of the space $X = \{a,b\}$ cannot be generated by any metric. The topology of metric spaces can be described by sequences.
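For finite sets the axioms of Definition B.1 and the Hausdorff property of Definition B.3 can be checked mechanically. The sketch below (function names are illustrative) confirms that $\{\emptyset,\{a\},X\}$ is a topology on $X=\{a,b\}$ which is not separated; by Lemma B.18 it can therefore not come from a metric. For a finite family, closure under pairwise unions already gives closure under arbitrary unions, which is what the code relies on.

```python
from itertools import combinations


def is_topology(X, opens):
    """Check the topology axioms for a finite family of subsets of a finite set X."""
    family = {frozenset(U) for U in opens}
    if frozenset() not in family or frozenset(X) not in family:
        return False
    # Pairwise intersections and unions suffice because the family is finite.
    return all(
        (U & V) in family and (U | V) in family
        for U, V in combinations(family, 2)
    )


def is_hausdorff(X, opens):
    """Check that any two different points have disjoint open neighbourhoods."""
    family = [frozenset(U) for U in opens]
    return all(
        any(x in U and y in V and not (U & V) for U in family for V in family)
        for x, y in combinations(sorted(X), 2)
    )


X = {"a", "b"}
sierpinski = [set(), {"a"}, {"a", "b"}]           # the topology from the text
discrete = [set(), {"a"}, {"b"}, {"a", "b"}]      # the power set of X
```

The discrete topology, by contrast, is generated by the metric of Example B.16(iii).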


B.19 Theorem Let $(X,d)$, $(Y,\rho)$ be metric spaces.
(i) A sequence $(x_j)_{j\in\mathbb{N}}\subset X$ converges to $x$, $x_j\xrightarrow{j\to\infty}x$, if, and only if, $d(x_j,x)\xrightarrow{j\to\infty}0$. Moreover, the limit $x$ is unique.
(ii) A set $F\subset X$ is closed if, and only if, every convergent sequence $(x_j)_{j\in\mathbb{N}}\subset F$ has its limit $\lim_{j\to\infty}x_j\in F$.
(iii) A set $K\subset X$ is compact if, and only if, every sequence $(x_j)_{j\in\mathbb{N}}\subset K$ has a convergent subsequence whose limit is in $K$.
(iv) A set $A\subset X$ is dense if, and only if, for every $x\in X$ there is a sequence $(a_j)_{j\in\mathbb{N}}\subset A$ with $d(a_j,x)\xrightarrow{j\to\infty}0$.
(v) A function $f: X\to Y$ is continuous at $x\in X$ if, and only if, for every sequence $x_j\xrightarrow{j\to\infty}x$ we have $f(x_j)\xrightarrow{j\to\infty}f(x)$.

Since for our purposes the characterization of continuity is of central importance, cf. Example 7.3, we include the short

Proof (of Theorem B.19(v)) We begin with the observation that every neighbourhood $\tilde U = \tilde U(x)$ of a point $x\in X$ contains some open set $U\subset\tilde U$ with $x\in U$. Since $U$ is open, we find by definition some $\epsilon>0$ such that $B_\epsilon(x)\subset U$. This shows that we can restate the definition of continuity B.8 at a point $x$ in the following form:
$$\forall\,\epsilon>0 \quad \exists\,\delta>0 : \quad f(B_\delta(x))\subset B_\epsilon(f(x))$$
(mind that the balls are taken in $X$ and $Y$, respectively).

‘⇒’: If $x_j\xrightarrow{j\to\infty}x$, we know from the definition of convergence that for every $\delta>0$ there is some $N = N_\delta$ such that $x_j\in B_\delta(x)$ for all $j\ge N_\delta$. Since $f$ is continuous at $x$, we can choose for every $\epsilon>0$ some $\delta = \delta_\epsilon>0$ such that $f(B_\delta(x))\subset B_\epsilon(f(x))$. Thus
$$f(x_k)\in\{f(x_j) : j\ge N_\delta\}\subset f(B_\delta(x))\subset B_\epsilon(f(x)) \qquad\forall\,k\ge N_\delta$$
which shows that $f(x_k)\xrightarrow{k\to\infty}f(x)$.

‘⇐’: Assume that $x_j\xrightarrow{j\to\infty}x$ implies $f(x_j)\xrightarrow{j\to\infty}f(x)$ but that $f$ is not continuous at $x$. Thus there is some $\epsilon>0$, such that for all $n\in\mathbb{N}$ the set $f(B_{1/n}(x))$ is not (entirely) contained in $B_\epsilon(f(x))$. Thus we can pick for each $n\in\mathbb{N}$ some $x_n\in B_{1/n}(x)$, such that $f(x_n)\notin B_\epsilon(f(x))$. This means, however, that $x_n\xrightarrow{n\to\infty}x$ while $\rho(f(x_n),f(x))\ge\epsilon>0$ for all $n\in\mathbb{N}$, contradicting that $f(x_n)$ converges to $f(x)$.



B.20 Definition Let $(X,d)$ be a metric space. A sequence $(x_j)_{j\in\mathbb{N}}$ is a Cauchy sequence, if
$$\forall\,\epsilon>0 \quad \exists\,N = N_\epsilon\in\mathbb{N} \quad \forall\,j,k\ge N_\epsilon : \quad d(x_j,x_k)<\epsilon.$$
A metric space is complete if every Cauchy sequence converges. An isometry is a surjective map $j: X\to Y$ between two metric spaces $(X,d)$ and $(Y,\rho)$ which satisfies $d(x,x') = \rho(j(x),j(x'))$.

B.21 Theorem (Completion) For every metric space $(X,d)$ there exists a complete metric space $(\hat X,\hat d)$ such that $\hat d|_{X\times X} = d$ and $X\subset(\hat X,\hat d)$ is a dense subset. Any two completions of $X$ are, up to isometries, identical.

By covering a compact set $K$ with the open sets $(B_1(x))_{x\in K}$ and extracting a finite subcover we can easily see that $K$ has finite diameter $\operatorname{diam}K := \sup_{x,y\in K}d(x,y)$ and is, therefore, bounded. Thus compact sets are closed and bounded. The converse is, in general, not true; however,

B.22 Theorem (Heine–Borel) A subset of $\mathbb{R}^n$ is compact if, and only if, it is closed and bounded. Moreover, all metrics on $\mathbb{R}^n$ that are induced by norms are equivalent in the sense that for any two such metrics $d$ and $\rho$ there are absolute constants $c,C>0$ such that
$$c\,d(x,y)\le\rho(x,y)\le C\,d(x,y) \qquad\forall\,x,y\in\mathbb{R}^n.$$

Normed spaces

B.23 Definition A normed space $(X,\|\bullet\|)$ is a $\mathbb{K}$-vector space² $X$ with a norm $\|\bullet\|$, i.e. a map $\|\bullet\|: X\to[0,\infty)$ which satisfies for $x,y\in X$ and $\alpha\in\mathbb{K}$ the following properties:
(definiteness) $\|x\|>0 \iff x\neq 0$ $\quad(N_1)$
(pos. homogeneity) $\|\alpha x\| = |\alpha|\cdot\|x\|$ $\quad(N_2)$
(triangle inequality) $\|x+y\|\le\|x\|+\|y\|$ $\quad(N_3)$
If we drop the definiteness $(N_1)$, $\|\bullet\|$ is called a semi-norm and $(X,\|\bullet\|)$ is a semi-normed space.

² $\mathbb{K}$ stands for either $\mathbb{R}$ or $\mathbb{C}$.

B.24 Examples
(i) The spaces $\mathbb{R}^n$, $\mathbb{C}^n$ equipped with
$$\|x\|_p := \Big(\sum_{j=1}^{n}|x_j|^p\Big)^{1/p} \qquad\text{or}\qquad \|x\|_\infty := \max_{1\le j\le n}|x_j|,$$
$1\le p<\infty$, are normed spaces.
(ii) Let $(X_j,\|\bullet\|_j)$, $j = 1,2$, be two normed spaces. Then $X_1\times X_2$ becomes a normed space under any of the following norms ($x_j\in X_j$, $1\le p<\infty$):
$$\|(x_1,x_2)\|_p := \big(\|x_1\|_1^p + \|x_2\|_2^p\big)^{1/p} \qquad\text{or}\qquad \|(x_1,x_2)\|_\infty := \max_{j=1,2}\|x_j\|_j.$$
(iii) Every normed space is a metric space with metric given by $d(x,y) := \|x-y\|$. Therefore, all notions and results for metric spaces carry over to normed spaces. In particular, open and closed balls are given by
$$B_r(x) = \{y\in X : \|x-y\|<r\} \qquad\text{and}\qquad K_r(x) = \{y\in X : \|x-y\|\le r\}.$$
Since $X$ is a vector space, we have now $\overline{B_r(x)} = K_r(x)$. However, not every metric space arises from a normed space, e.g. the metric $d(x,y) = 1$ or $0$ according to $x\neq y$ or $x = y$ on $\mathbb{R}^n$ cannot be realized by any norm.

B.25 Lemma Let $X$ be a normed space. Then the following maps are continuous:
$$X\ni x\mapsto\|x\|, \qquad X\times X\ni(x,y)\mapsto x+y, \qquad \mathbb{K}\times X\ni(\alpha,x)\mapsto\alpha x.$$

B.26 Definition A Banach space is a complete normed space.

The following result, due to F. Riesz, says that the Heine–Borel theorem B.22 holds if, and only if, the underlying space is finite-dimensional.

B.27 Theorem (Riesz). In a normed space $V$ closed and bounded sets are compact if, and only if, $V$ is finite-dimensional.

Let $\sim$ be an equivalence relation on the normed space $X$. We write $[x] = \{y\in X : x\sim y\}$ for the equivalence class with representative $x$. The quotient space $X/_\sim$ consists of all equivalence classes. It is not hard to see that $X/_\sim$ is again a vector space and that
$$[\alpha x+\beta y] = \alpha[x]+\beta[y] \qquad\forall\,\alpha,\beta\in\mathbb{K},\ x,y\in X.$$

B.28 Theorem Let $(X,\|\bullet\|)$ be a (complete) normed space. Then $X/_\sim$ is a (complete) normed space under the quotient norm given by
$$\|[x]\|_\sim := \inf\{\|y\| : y\in[x]\}.$$


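Essentially the same procedure allows us to turn any semi-normed space $(X,\|\bullet\|)$ into a normed space. We use the following equivalence relation for $x,y\in X$:
$$x\approx y \iff \|x-y\| = 0$$
and observe that $\inf\{\|y\| : y\in[x]\} = \|x\|$.

B.29 Corollary Let $(X,\|\bullet\|)$ be a (complete) semi-normed space. Then $X/_\approx$ is a (complete) normed space with norm given by $\|[x]\|_\approx := \|x\|$.

B.30 Example Denote by $\mathcal{L}^p(X,\mathcal{A},\mu)$, $1\le p<\infty$, the $p$th power integrable functions of the measure space $(X,\mathcal{A},\mu)$. Then
$$\|u\|_p := \Big(\int|u|^p\,d\mu\Big)^{1/p}$$
is a semi-norm on $\mathcal{L}^p(X,\mathcal{A},\mu)$, and $L^p(X,\mathcal{A},\mu) = \mathcal{L}^p(X,\mathcal{A},\mu)/_\sim$ is a Banach space if we identify $u,w\in\mathcal{L}^p(X,\mathcal{A},\mu)$ whenever $\|u-w\|_p = 0$.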

Appendix C The volume of a parallelepiped

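In this appendix we give a simple derivation for the volume of the parallelepiped
$$A\big([0,1)^n\big) := \{Ax\in\mathbb{R}^n : x\in[0,1)^n\}, \qquad A\in GL(n,\mathbb{R}),$$
for a non-degenerate $n\times n$ matrix $A\in\mathbb{R}^{n\times n}$.

C.1 Theorem $\lambda^n\big(A([0,1)^n)\big) = |\det A|$ for all $A\in GL(n,\mathbb{R})$.

The proof of Theorem C.1 requires two auxiliary results.

C.2 Lemma If $D = \operatorname{diag}[\lambda_1,\ldots,\lambda_n]$, $\lambda_j>0$, is a diagonal $n\times n$ matrix, then $\lambda^n(D(B)) = \det D\,\lambda^n(B)$ for all Borel sets $B\in\mathcal{B}(\mathbb{R}^n)$.

Proof Since both $D$ and $D^{-1}$ are continuous maps, $D(B)$ is a Borel set if $B\in\mathcal{B}(\mathbb{R}^n)$, cf. Example 7.3. In view of the uniqueness theorem 5.7 for measures it is enough to prove the lemma for half-open rectangles $[a,b)$, $a,b\in\mathbb{R}^n$. Obviously,
$$D[a,b) = \mathop{\times}_{j=1}^{n}[\lambda_ja_j,\lambda_jb_j)$$
and
$$\lambda^n\big(D[a,b)\big) = \prod_{j=1}^{n}(\lambda_jb_j-\lambda_ja_j) = \lambda_1\cdot\ldots\cdot\lambda_n\prod_{j=1}^{n}(b_j-a_j) = \det D\,\lambda^n[a,b).$$

C.3 Lemma Every $A\in GL(n,\mathbb{R})$ can be written as $A = SDT$, where $S,T\in O(n)$ are orthogonal $n\times n$ matrices and $D = \operatorname{diag}[\lambda_1,\ldots,\lambda_n]$ is a diagonal matrix with positive entries $\lambda_j>0$.

Proof The matrix ${}^t\!AA$ is symmetric and so we can find some orthogonal matrix $U\in O(n)$ such that
$${}^tU\,{}^t\!AAU = \tilde D = \operatorname{diag}[\mu_1,\ldots,\mu_n].$$
Since for $e_j := {}^t(0,\ldots,0,1,0,\ldots,0)$ (with the entry $1$ in position $j$) and the Euclidean norm $\|\bullet\|$
$$\mu_j = {}^te_j\tilde De_j = {}^te_j\,{}^tU\,{}^t\!AAUe_j = \|AUe_j\|^2>0,$$
we can define $D := \sqrt{\tilde D} = \operatorname{diag}[\lambda_1,\ldots,\lambda_n]$ where $\lambda_j := \sqrt{\mu_j}$. Thus
$$D^{-1}\,{}^tU\,{}^t\!AAUD^{-1} = \mathrm{id}_n$$
and this proves that $S := AUD^{-1}\in O(n)$. Since $T := {}^tU\in O(n)$, we easily see that
$$SDT = AUD^{-1}D\,{}^tU = A.$$

Proof (of Theorem C.1) We have for $A\in GL(n,\mathbb{R})$
$$\begin{aligned}
\lambda^n\big(A[0,1)^n\big) &\overset{\text{C.3}}{=} \lambda^n\big(SDT[0,1)^n\big)\\
&\overset{7.9}{=} \lambda^n\big(DT[0,1)^n\big)\\
&\overset{\text{C.2}}{=} \det D\,\lambda^n\big(T[0,1)^n\big)\\
&\overset{\text{C.3}}{=} \det D\,\lambda^n\big([0,1)^n\big).
\end{aligned}$$
Since $S,T\in O(n)$, their determinants are either $+1$ or $-1$, and we conclude that $|\det A| = |\det(SDT)| = |\det S|\cdot|\det D|\cdot|\det T| = \det D$.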
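In dimension $n=2$ the statement of Theorem C.1 can be verified directly. The sketch below is an elementary cross-check, not the argument of the proof: it maps the corners of the unit square by a $2\times2$ matrix and computes the area of the resulting parallelogram with the shoelace formula, which should equal $|\det A|$. The sample matrices and function names are illustrative.

```python
def det2(A):
    """Determinant of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = A
    return a * d - b * c


def image_of_unit_square(A):
    """Corners of A([0,1]^2), listed in the order inherited from the square."""
    (a, b), (c, d) = A
    corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
    return [(a * x + b * y, c * x + d * y) for x, y in corners]


def shoelace_area(vertices):
    """Area of a simple polygon from its vertices in cyclic order."""
    twice_area = 0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


A = ((2, 1), (1, 3))     # det = 5
B = ((1, 2), (3, 4))     # det = -2, orientation-reversing
area_A = shoelace_area(image_of_unit_square(A))
area_B = shoelace_area(image_of_unit_square(B))
```

The half-open cube $[0,1)^n$ and the closed cube differ only by a Lebesgue null set, so using closed corners here does not affect the volume.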

Appendix D Non-measurable sets

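Let $(X,\mathcal{A},\mu)$ be a measure space and denote by $(X,\mathcal{A}^*,\bar\mu)$ its completion, cf. Problem 4.13 for the definition and Problems 6.2, 10.11, 10.12, 13.11 and 15.3 for various properties. Here we only need that
$$\mathcal{A}^* = \{A\cup N : A\in\mathcal{A},\ N \text{ is a subset of some } \mathcal{A}\text{-measurable } \mu\text{-null set}\}$$
is the completion of $\mathcal{A}$ with respect to the measure $\mu$. It is a natural question to ask how big $\mathcal{A}$ and $\mathcal{A}^*$ are and whether $\mathcal{A}\subset\mathcal{A}^*\subset\mathcal{P}(X)$ are proper inclusions. Sometimes, see Problems 6.10 or 6.11, these questions are easy to answer. For the Borel $\sigma$-algebra $\mathcal{A} = \mathcal{B}(\mathbb{R}^n)$ and Lebesgue measure $\mu = \lambda^n$ this is more difficult. The following definition helps to distinguish between sets in $\mathcal{B}(\mathbb{R}^n)$ and the completion $\mathcal{B}^*(\mathbb{R}^n)$ w.r.t. Lebesgue measure.

D.1 Definition The Lebesgue $\sigma$-algebra is the completion $\mathcal{B}^*(\mathbb{R}^n)$ of the Borel $\sigma$-algebra w.r.t. Lebesgue measure $\lambda^n$. A set $B\in\mathcal{B}^*(\mathbb{R}^n)$ is called Lebesgue measurable.

The next theorem shows that there are ‘as many’ Lebesgue measurable sets as there are subsets of $\mathbb{R}^n$.

D.2 Theorem We have $\#\mathcal{B}^*(\mathbb{R}^n) = \#\mathcal{P}(\mathbb{R}^n)$ for all $n\in\mathbb{N}$.

Proof Since $\mathcal{B}^*(\mathbb{R}^n)\subset\mathcal{P}(\mathbb{R}^n)$ we have that $\#\mathcal{B}^*(\mathbb{R}^n)\le\#\mathcal{P}(\mathbb{R}^n)$. On the other hand, we have seen in Problem 7.10 that the Cantor ternary set $C$ is an uncountable Borel measurable $\lambda^1$-null set of cardinality $\#C = \mathfrak{c}$. Consequently, $\mathbb{R}^{n-1}\times C$ is a $\lambda^n$-null set. By definition of the Lebesgue $\sigma$-algebra, all sets in $\mathcal{P}(\mathbb{R}^{n-1}\times C)$ are Lebesgue measurable (null) sets, i.e. $\mathcal{P}(\mathbb{R}^{n-1}\times C)\subset\mathcal{B}^*(\mathbb{R}^n)$, and therefore $\#\mathcal{P}(\mathbb{R}^{n-1}\times C)\le\#\mathcal{B}^*(\mathbb{R}^n)$. Using the fact that there is a bijection between $C$ and $\mathbb{R}$ we also get $\#\mathcal{P}(\mathbb{R}^n)\le\#\mathcal{P}(\mathbb{R}^{n-1}\times C)\le\#\mathcal{B}^*(\mathbb{R}^n)$, and the Cantor–Bernstein theorem 2.7 proves that $\#\mathcal{P}(\mathbb{R}^n) = \#\mathcal{B}^*(\mathbb{R}^n)$.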



Unfortunately, we cannot use Theorem D.2 to decide whether there are sets which are not Lebesgue measurable. To answer this question we need the axiom of choice.

D.3 Axiom of choice (AC) Let $\{M_i : i \in I\}$ be a collection of non-empty and mutually disjoint subsets of $X$. Then there exists a set $L \subset \bigcup_{i \in I} M_i$ which contains exactly one element from each set $M_i$, $i \in I$.

Note that AC only asserts the existence of the set $L$ but does not tell us how or if the set $L$ can be constructed at all. (This problem is at the heart of the controversy over whether one should or should not accept AC.)

D.4 Theorem Assuming the axiom of choice, there exist non-Lebesgue measurable sets in $\mathbb{R}^n$.

Proof Assume first that $n = 1$. We will construct a non-Lebesgue measurable subset of $\Omega = [0,1)$. We call any two $x, y \in \Omega$ equivalent if
$$x \sim y \iff x - y \in \mathbb{Q}.$$
The equivalence class containing $x$ is given by $[x] = \{y \in \Omega : x - y \in \mathbb{Q}\} = (x + \mathbb{Q}) \cap \Omega$. By construction, $\Omega$ is partitioned by a family of mutually disjoint equivalence classes $[x_j]$, $j \in J$. By the axiom of choice¹ there exists a set $L$ which contains exactly one element, say $m_j$, from each of the classes $[x_j]$, $j \in J$. We will show that $L$ cannot be Lebesgue measurable. Assume $L$ were Lebesgue measurable. Since for every $x \in \Omega$ we have $[x] \cap L = \{m_{j_0}\}$, $j_0 = j_0(x) \in J$, we can find some $q \in \mathbb{Q}$ such that $x = m_{j_0} + q$. Obviously, $-1 < q < 1$. Thus $\Omega \subset L + (\mathbb{Q} \cap (-1,1)) \subset \Omega + (-1,1) = (-1,2)$ which we can rewrite as
$$[0,1) \subset \bigcup_{q \in \mathbb{Q} \cap (-1,1)} (q + L) \subset (-1,2).$$
Moreover, $(r + L) \cap (q + L) = \emptyset$ for all $r \neq q$, $r, q \in \mathbb{Q}$. Otherwise $r + x = q + y$ for $x, y \in L$, so that $x \sim y$ which is impossible since $L$ contains only one representative

¹ We have to use the axiom of choice since $J$ is uncountable. This follows from the observation that the uncountable set $\Omega = \bigcup_{j \in J} [x_j]$ is the disjoint union of countable sets $[x_j] = (x_j + \mathbb{Q}) \cap \Omega$. It is known that all proofs for Theorem D.4 must use the axiom of choice or some equivalent statement, cf. Solovay [44].


R.L. Schilling

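of each equivalence class. Therefore we can use the $\sigma$-additivity of the measure $\bar\lambda^1$ to find
$$1 = \bar\lambda^1([0,1)) \le \sum_{q \in \mathbb{Q} \cap (-1,1)} \bar\lambda^1(q + L) \le \bar\lambda^1((-1,2)) = 3.$$
Since $\bar\lambda^1$ is invariant under translations, we get $\bar\lambda^1(q + L) = \bar\lambda^1(L)$ for all $q \in \mathbb{Q} \cap (-1,1)$. We conclude that
$$1 \le \sum_{q \in \mathbb{Q} \cap (-1,1)} \bar\lambda^1(L) \le 3,$$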

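which is not possible. This proves that $L$ cannot be Lebesgue measurable. If $n > 1$, a similar argument shows that $[0,1)^{n-1} \times L$ is not Lebesgue measurable.

The question whether there are Lebesgue measurable sets which are not Borel measurable can be answered constructively. Since this is quite tedious, we content ourselves with the fact that there are ‘fewer’ Borel sets than there are Lebesgue measurable sets.

D.5 Theorem We have $\#\mathcal{B}(\mathbb{R}^n) = \mathfrak{c}$.

D.6 Corollary There are Lebesgue measurable sets which are not Borel measurable.

Proof (of D.6) We know from Theorem D.2 that $\#\mathcal{B}^*(\mathbb{R}^n) = \#\mathcal{P}(\mathbb{R}^n)$ and from Theorem D.5 that $\#\mathcal{B}(\mathbb{R}^n) = \mathfrak{c}$. Since by Theorem 2.9 and Problem 2.17 $\#\mathcal{P}(\mathbb{R}^n) > \#\mathbb{R}^n = \mathfrak{c}$, we conclude that $\mathcal{B}(\mathbb{R}^n) \subsetneq \mathcal{B}^*(\mathbb{R}^n)$.

To prove Theorem D.5 we show that the Borel sets are contained in a family of sets which has cardinality $\mathfrak{c}$. Let $\mathcal{F} = \bigcup_{k=1}^\infty \mathbb{N}^k$ be the set of all finite sequences of natural numbers and write $\mathcal{U}$ for the family of open balls $B_r(x) \subset \mathbb{R}^n$ with radius $r \in \mathbb{Q}^+$ and centre $x \in \mathbb{Q}^n$. We have seen in Problems 2.19 and 2.9 that $\#\mathcal{F} = \#\mathbb{N}$ and $\#\mathcal{U} = \#(\mathbb{Q}^+ \times \mathbb{Q}^n) = \#\mathbb{N}$. Therefore, the collection of all Souslin schemes
$$s : \mathcal{F} \to \mathcal{U}, \qquad (i_1, i_2, \ldots, i_k) \mapsto C_{i_1 i_2 \ldots i_k}$$
has cardinality $\#\mathcal{U}^{\mathcal{F}} = \#\mathbb{N}^{\mathbb{N}} = \mathfrak{c}$, cf. Problem 2.18. With each Souslin scheme $s$ we can associate a set $A \subset \mathbb{R}^n$ in the following way: take any sequence $(i_j)_{j \in \mathbb{N}}$ of natural numbers and consider the sequence of finite tuples $(i_1), (i_1, i_2), (i_1, i_2, i_3), \ldots, (i_1, i_2, \ldots, i_k), \ldots$ formed by the first $1, 2, \ldots, k, \ldots$ members of the sequence $(i_j)_{j \in \mathbb{N}}$. Using the Souslin scheme $s$ we pick for each tuple $(i_1, i_2, \ldots, i_k)$ the corresponding set $C_{i_1 i_2 \ldots i_k} \in \mathcal{U}$ to get a sequence of sets $C_{i_1}, C_{i_1 i_2}, C_{i_1 i_2 i_3}, \ldots, C_{i_1 i_2 \ldots i_k}, \ldots$ from $\mathcal{U}$. Finally, we form the intersection of all these sets $C_{i_1} \cap C_{i_1 i_2} \cap C_{i_1 i_2 i_3} \cap \ldots \cap C_{i_1 i_2 \ldots i_k} \cap \ldots$ and consider the union over all possible sequences $(i_j)_{j \in \mathbb{N}}$ of natural numbers:
$$A = A(s) = \bigcup_{(i_j)_{j \in \mathbb{N}} \in \mathbb{N}^{\mathbb{N}}} \ \bigcap_{k=1}^{\infty} C_{i_1 i_2 \ldots i_k}.$$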

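Note that this union is uncountable, so that $A$ is not necessarily a Borel set. It is often helpful to visualize this construction as a tree:

[Figure: tree diagram of a Souslin scheme $s$. First generation: $C_1, C_2, C_3, \ldots$; second generation: $C_{11}, C_{12}, C_{13}, \ldots$ below $C_1$, $C_{21}, C_{22}, C_{23}, \ldots$ below $C_2$, $C_{31}, C_{32}, C_{33}, \ldots$ below $C_3$; third generation, e.g. $C_{211}, C_{212}, C_{213}, \ldots$ below $C_{21}$.]

where the $C_{i_1}, C_{i_1 i_2}, C_{i_1 i_2 i_3}, \ldots \in \mathcal{U}$ are the sets of the 1st, 2nd, 3rd, etc. generation. We will also call $C_{i_1 i_2}$ or $C_{i_1 i_2 i_3}$ children or grandchildren of $C_{i_1}$.

D.7 Definition (Souslin) Let $\mathcal{F}$, $\mathcal{U}$ and $A(s)$ be as above. The sets in $\mathcal{A}(\mathcal{U}) = \{A(s) : s \in \mathcal{U}^{\mathcal{F}}\}$ are called analytic or Souslin sets (generated by $\mathcal{U}$).

D.8 Lemma Let $\mathcal{A}(\mathcal{U})$ and $\mathcal{U}$ be as before.
(i) $\mathcal{A}(\mathcal{U})$ is stable under countable unions and countable intersections;
(ii) $\mathcal{A}(\mathcal{U})$ contains all open and all closed subsets of $\mathbb{R}^n$;
(iii) $\mathcal{B}(\mathbb{R}^n) = \sigma(\mathcal{U}) \subset \mathcal{A}(\mathcal{U})$;
(iv) $\#\mathcal{A}(\mathcal{U}) \le \mathfrak{c}$.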

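Proof (i) Let $A_\ell \in \mathcal{A}(\mathcal{U})$, $\ell \in \mathbb{N}$, be a sequence of analytic sets
$$A_\ell = \bigcup_{(i_j)_{j \in \mathbb{N}} \in \mathbb{N}^{\mathbb{N}}} \ \bigcap_{k=1}^{\infty} C^{\ell}_{i_1 i_2 \ldots i_k}.$$
Since
$$A = \bigcup_{\ell \in \mathbb{N}} A_\ell = \bigcup_{\ell \in \mathbb{N}} \ \bigcup_{(i_j)_{j \in \mathbb{N}} \in \mathbb{N}^{\mathbb{N}}} \ \bigcap_{k=1}^{\infty} C^{\ell}_{i_1 i_2 \ldots i_k}$$
it is obvious that $A$ can be obtained from a Souslin scheme which arises by the juxtaposition of the Souslin schemes belonging to the $A_\ell$: arrange the double sequence $C^{\ell}_{i_1}$, $(i_1, \ell) \in \mathbb{N} \times \mathbb{N}$, in one sequence – e.g. using the counting scheme of Example 2.5(iv) – to get the first generation of sets while all other generations follow suit in genealogical order. Thus $A \in \mathcal{A}(\mathcal{U})$.

For the countable intersection of the $A_\ell$ we observe first that
$$B = \bigcap_{\ell \in \mathbb{N}} A_\ell = \bigcap_{\ell \in \mathbb{N}} \ \bigcup_{(i_j)_{j \in \mathbb{N}} \in \mathbb{N}^{\mathbb{N}}} \ \bigcap_{k=1}^{\infty} C^{\ell}_{i_1 i_2 \ldots i_k} = \bigcup_{\substack{(i^m_j)_{j \in \mathbb{N}} \in \mathbb{N}^{\mathbb{N}} \\ m = 1, 2, 3, \ldots}} \ \bigcap_{\ell=1}^{\infty} \ \bigcap_{k=1}^{\infty} C^{\ell}_{i^\ell_1 i^\ell_2 \ldots i^\ell_k}$$
and then we merge the two infinite intersections indexed by $(\ell, k) \in \mathbb{N} \times \mathbb{N}$ into a single infinite intersection. Once again this can be achieved through the counting scheme of Example 2.5(iv): the sets
$$C^1_{i^1_1} \cap C^1_{i^1_1 i^1_2} \cap C^2_{i^2_1} \cap C^1_{i^1_1 i^1_2 i^1_3} \cap C^2_{i^2_1 i^2_2} \cap C^3_{i^3_1} \cap \ldots$$
are listed in the order of their index pairs $(\ell, k)$,
$$(1,1) \to (1,2) \to (2,1) \to (1,3) \to (2,2) \to (3,1) \to \ldots$$
and so
$$B = \bigcup_{\substack{(i^m_j)_{j \in \mathbb{N}} \in \mathbb{N}^{\mathbb{N}} \\ m = 1, 2, 3, \ldots}} C^1_{i^1_1} \cap C^1_{i^1_1 i^1_2} \cap C^2_{i^2_1} \cap C^1_{i^1_1 i^1_2 i^1_3} \cap C^2_{i^2_1 i^2_2} \cap C^3_{i^3_1} \cap \ldots$$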
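The counting scheme used above to merge the two intersections can be sketched in a few lines of Python. This is only an illustration: the generator name `diagonal_pairs` is hypothetical, and it is assumed that the counting scheme of Example 2.5(iv) is the usual enumeration of $\mathbb{N} \times \mathbb{N}$ along anti-diagonals, which is what the order $(1,1) \to (1,2) \to (2,1) \to \ldots$ displayed above suggests.

```python
from itertools import count, islice

def diagonal_pairs():
    """Enumerate N x N along anti-diagonals (assumed counting scheme).

    On each anti-diagonal the sum l + k is constant; within a diagonal
    the first index l increases.
    """
    for total in count(2):            # total = l + k, starting with (1, 1)
        for l in range(1, total):
            yield (l, total - l)

# The first six index pairs (l, k), in the order used in the proof.
first_six = list(islice(diagonal_pairs(), 6))
print(first_six)
```

Every pair $(\ell, k)$ appears exactly once in this enumeration, which is what allows the double intersection to be rewritten as a single countable one.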

We will now construct a Souslin scheme which produces $B$ by arranging the sets $C^{\ell}_{i^\ell_1 \ldots i^\ell_k}$ in a tree:
• The first generation are the sets $C^1_{i^1_1}$, $i^1_1 \in \mathbb{N}$.
• The second generation are the sets $C^1_{i^1_1 i^1_2}$, $i^1_2 \in \mathbb{N}$, such that they are for fixed $i^1_1$ the children of $C^1_{i^1_1}$.
• Each $C^1_{i^1_1 i^1_2}$ has the same offspring, namely the sets $C^2_{i^2_1}$, $i^2_1 \in \mathbb{N}$, which form jointly the third generation.
• The fourth generation are the sets $C^1_{i^1_1 i^1_2 i^1_3}$, $i^1_3 \in \mathbb{N}$, such that they are for fixed $i^1_1, i^1_2$ the grandchildren of $C^1_{i^1_1 i^1_2}$.
• The fifth generation are the sets $C^2_{i^2_1 i^2_2}$, $i^2_2 \in \mathbb{N}$, such that they are for fixed $i^2_1$ the grandchildren of $C^2_{i^2_1}$.
• Each $C^2_{i^2_1 i^2_2}$ has the same offspring, namely the sets $C^3_{i^3_1}$, $i^3_1 \in \mathbb{N}$, which form jointly the sixth generation.
• …
This shows that $B \in \mathcal{A}(\mathcal{U})$.

(ii) Every open set $U$ can be written as countable union of $\mathcal{U}$-sets $B_r(x)$:
$$U = \bigcup_{\substack{B_r(x) \subset U \\ B_r(x) \in \mathcal{U}}} B_r(x).$$

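Indeed, the inclusion ‘⊃’ is obvious, for ‘⊂’ fix $x \in U$. Then there exists some $r \in \mathbb{Q}^+$ with $B_r(x) \subset U$. Since $\mathbb{Q}^n$ is dense in $\mathbb{R}^n$, $x \in B_{r/2}(y)$ for some $y \in \mathbb{Q}^n$ with $|x - y| < r/4$, so that $x \in B_{r/2}(y) \subset U$. Since there are only countably many sets in $\mathcal{U}$, the union is a fortiori countable. By part (i) we then get that $U \in \mathcal{A}(\mathcal{U})$, i.e. $\mathcal{A}(\mathcal{U})$ contains all open sets. For a closed set $F$ we know that
$$F = \bigcap_{j \in \mathbb{N}} U_j \quad\text{where}\quad U_j = F + B_{1/j}(0) = \{y : x \in F,\ |x - y| < \tfrac{1}{j}\}$$
is a countable intersection of open sets $U_j$. Since open sets are analytic, part (i) implies that $F \in \mathcal{A}(\mathcal{U})$.

(iii) Consider the system $\Sigma = \{A \in \mathcal{A}(\mathcal{U}) : A^c \in \mathcal{A}(\mathcal{U})\}$. We claim that $\Sigma$ is a $\sigma$-algebra. Obviously, $\Sigma$ satisfies conditions $(\Sigma_1)$, $(\Sigma_2)$ – i.e. contains $\mathbb{R}^n$ and is stable under complementation. To see $(\Sigma_3)$ we take a sequence $(A_j)_{j \in \mathbb{N}} \subset \Sigma$ and observe that, by part (i),
$$\bigcup_{j \in \mathbb{N}} A_j \in \mathcal{A}(\mathcal{U}) \quad\text{and}\quad \Big(\bigcup_{j \in \mathbb{N}} A_j\Big)^c = \bigcap_{j \in \mathbb{N}} A_j^c \in \mathcal{A}(\mathcal{U}),$$
so that $\bigcup_j A_j \in \Sigma$. Because of (ii) we have $\mathcal{U} \subset \Sigma \subset \mathcal{A}(\mathcal{U})$ and this implies that $\sigma(\mathcal{U}) \subset \mathcal{A}(\mathcal{U})$. Since, by (ii), all open sets are countable unions of sets from $\mathcal{U}$, we get $\mathcal{U} \subset \mathcal{O} \subset \sigma(\mathcal{U})$ ($\mathcal{O}$ denotes the family of open sets) or $\sigma(\mathcal{U}) = \sigma(\mathcal{O}) = \mathcal{B}(\mathbb{R}^n)$, the last equality being the definition of the Borel sets.

(iv) follows immediately from the fact that there are $\#\mathcal{U}^{\mathcal{F}} = \#\mathbb{N}^{\mathbb{N}} = \mathfrak{c}$ Souslin schemes, cf. Definition D.7.

The Proof of Theorem D.5 is now easy: By Lemma D.8 there are at most $\mathfrak{c}$ analytic sets. Since each singleton $\{x\}$, $x \in \mathbb{R}^n$, is a Borel set, there are at least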


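$\mathfrak{c}$ Borel sets (use Problem 2.17 to see $\#\mathbb{R}^n = \mathfrak{c}$). So, $\mathfrak{c} \le \#\mathcal{B}(\mathbb{R}^n) \le \#\mathcal{A}(\mathcal{U}) \le \mathfrak{c}$ and an application of Theorem 2.7 finishes the proof.

D.9 Remark Our approach to analytic sets follows the original construction of Souslin [42], which makes it easy to determine the cardinality of $\mathcal{A}(\mathcal{U})$. This, however, comes at a price: if one wants to work with this definition, things become messy, as we have seen in the proof of Lemma D.8(i). Nowadays analytic sets are often introduced by one of the following characterizations. A set $A \subset \mathbb{R}^n$ is analytic if, and only if, one of the following equivalent conditions holds:
(i) $A = f(\mathbb{R})$ for some left-continuous function $f : \mathbb{R} \to \mathbb{R}^n$;
(ii) $A = g(\mathbb{N}^{\mathbb{N}})$ for some Borel measurable function $g : \mathbb{N}^{\mathbb{N}} \to \mathbb{R}^n$;
(iii) $A = h(B)$ for some Borel set $B \in \mathcal{B}(X)$, some Polish space² $X$ and some Borel measurable function $h : B \to \mathbb{R}^n$;
(iv) $A = \pi_2(B)$ where $\pi_2 : Y \times \mathbb{R}^n \to \mathbb{R}^n$ is the coordinate projection onto $\mathbb{R}^n$, $Y$ is a compact Hausdorff space³ and $B \subset Y \times \mathbb{R}^n$ is a $\mathcal{K}_{\sigma\delta}$-set, i.e. $B$ can be written as countable intersection (‘$\delta$’) of countable unions (‘$\sigma$’) of compact subsets (‘$\mathcal{K}$’) of $Y \times \mathbb{R}^n$.
For a proof we refer to Srivastava [43] which is also our main reference for analytic sets.

The Souslin operation $\mathcal{A}$ can be applied to other systems of sets than $\mathcal{U}$. Without proof we mention the following facts:
$$\mathcal{A}(\mathcal{U}) = \mathcal{A}(\{\text{open sets}\}) = \mathcal{A}(\{\text{closed sets}\}) = \mathcal{A}(\{\text{compact sets}\})$$
and also
$$\mathcal{A}(\mathcal{A}(\mathcal{U})) = \mathcal{A}(\mathcal{U}) \quad\text{and}\quad \mathcal{B}(\mathbb{R}^n) \subsetneq \mathcal{A}(\mathcal{U}) \subsetneq \mathcal{B}^*(\mathbb{R}^n).$$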

Most constructions of sets which are not Borel but still Lebesgue measurable are actually constructions of non-Borel analytic sets, cf. Dudley [14, §13.2].

² i.e. a space $X$ which can be endowed with a metric for which $X$ is complete and separable.
³ cf. Appendix B, Definition B.3.

Appendix E A summary of the Riemann integral

In this appendix we give a brief outline of the Riemann integral on the real line. The notion of integration was well known for a long time and ever since the creation of differential calculus by Newton and Leibniz, integration was perceived as anti-derivative. Several attempts to make this precise were made, but the problem with these approaches was partly that the notion of integral was implicit – i.e. given axiomatically rather than constructively – partly that the choice of possible integrands was rather limited and partly that some fundamental points were unclear. Out of the need to overcome these insufficiencies and to have a sound foundation, Bernhard Riemann asked in his Habilitationsschrift Über die Darstellbarkeit einer Function durch eine trigonometrische Reihe¹ the question Also zuerst: Was hat man unter $\int_a^b f(x)\,dx$ zu verstehen?² (p. 239) and proposed a general way to define an integral which is constructive, which is (at least for continuous integrands) the anti-derivative, and which can deal with a wider range of integrands than all its predecessors. We do not follow Riemann’s original approach but use the Darboux technique of upper and lower integrals. Riemann’s original definition will be recovered in Theorem E.5(iv).

The (proper) Riemann integral

Riemann integrals are defined only for bounded functions on compact intervals $[a,b] \subset \mathbb{R}$; this avoids all sorts of complications arising when either the domain or the range of the integrand is infinite. Both cases can be dealt with by various extensions of the Riemann integral, one of which – the so-called improper Riemann integral – we will discuss later on.

¹ On the representability of a function by a trigonometric series.
² First of all: what is the meaning of $\int_a^b f(x)\,dx$?


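A partition $\pi$ of the interval $[a,b]$ consists of finitely many points satisfying
$$\pi = \{a = t_0 < t_1 < \ldots < t_{k-1} < t_k = b\}, \qquad k = k(\pi).$$
We call $\operatorname{mesh} \pi = \max_{1 \le j \le k} (t_j - t_{j-1})$ the mesh or fineness of the partition. Given a partition $\pi$ and a bounded function $u : [a,b] \to \mathbb{R}$ we define
$$m_j = \inf_{x \in [t_{j-1}, t_j]} u(x) \quad\text{and}\quad M_j = \sup_{x \in [t_{j-1}, t_j]} u(x)$$
for all $j = 1, 2, \ldots, k$, and introduce the lower, resp. upper Darboux sums
$$S_\pi[u] = \sum_{j=1}^{k} m_j (t_j - t_{j-1}) \quad\text{resp.}\quad S^\pi[u] = \sum_{j=1}^{k} M_j (t_j - t_{j-1}).$$
Obviously, $S_\pi[\bullet]$ and $S^\pi[\bullet]$ are positively homogeneous, $S_\pi[\bullet]$ is superadditive and $S^\pi[\bullet]$ is subadditive (they are not linear in general, since the infimum, resp. supremum, of a sum need not be the sum of the infima, resp. suprema), and if $|u(x)| \le M$, they satisfy
$$|S_\pi[u]| \le S_\pi[|u|] \le M(b - a), \qquad |S^\pi[u]| \le S^\pi[|u|] \le M(b - a). \tag{E.1}$$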

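E.1 Lemma Let $\pi$ be a partition of $[a,b]$ and $\pi' \supset \pi$ be a refinement of $\pi$. Then
$$S_\pi[u] \le S_{\pi'}[u] \le S^{\pi'}[u] \le S^\pi[u]$$
holds for all bounded functions $u : [a,b] \to \mathbb{R}$.

Proof Since $S_\pi[u] = -S^\pi[-u]$ and since $S_{\pi'}[u] \le S^{\pi'}[u]$ is trivially fulfilled, it is enough to show $S_\pi[u] \le S_{\pi'}[u]$. The partitions contain only finitely many points and we may assume that $\pi' = \pi \cup \{\tau\}$ where $t_{j_0 - 1} < \tau < t_{j_0}$ for some index $1 \le j_0 \le k$. The rest follows by iteration. Clearly,
$$S_\pi[u] = \sum_{j \ne j_0} m_j (t_j - t_{j-1}) + m_{j_0} (t_{j_0} - \tau) + m_{j_0} (\tau - t_{j_0 - 1})$$
$$\le \sum_{j \ne j_0} m_j (t_j - t_{j-1}) + \inf_{x \in [\tau, t_{j_0}]} u(x)\,(t_{j_0} - \tau) + \inf_{x \in [t_{j_0 - 1}, \tau]} u(x)\,(\tau - t_{j_0 - 1}) = S_{\pi'}[u].$$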
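The behaviour described in Lemma E.1 can be checked on a small example. The sketch below is an illustration only: the helper names `darboux_sums_increasing` and `equidistant` are hypothetical, and the code assumes an increasing integrand, so that the infimum and supremum on each subinterval are simply the values at the left and right endpoint. Exact rational arithmetic is used to avoid rounding effects.

```python
from fractions import Fraction

def darboux_sums_increasing(u, points):
    """Lower and upper Darboux sums of an INCREASING function u.

    For increasing u the infimum on [s, t] is u(s) and the supremum is u(t);
    this shortcut is an assumption of the sketch, not a general recipe.
    """
    pairs = list(zip(points, points[1:]))
    lower = sum(u(s) * (t - s) for s, t in pairs)
    upper = sum(u(t) * (t - s) for s, t in pairs)
    return lower, upper

def equidistant(a, b, k):
    """Equidistant partition of [a, b] with k subintervals."""
    return [a + (b - a) * Fraction(j, k) for j in range(k + 1)]

u = lambda x: x * x                      # increasing on [0, 1]
coarse = equidistant(Fraction(0), Fraction(1), 4)
fine = equidistant(Fraction(0), Fraction(1), 8)   # a refinement of `coarse`

lo4, up4 = darboux_sums_increasing(u, coarse)
lo8, up8 = darboux_sums_increasing(u, fine)
print(lo4, lo8, up8, up4)
```

Refining the partition raises the lower sum and lowers the upper sum, and for this monotone example the gap between the two sums on $k$ equidistant subintervals equals $(u(1) - u(0))/k$.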

Lemma E.1 shows that the following definition makes sense.

E.2 Definition Let $u : [a,b] \to \mathbb{R}$ be a bounded function. The lower and upper integrals of $u$ are given by
$$\int_{*a}^{b} u = \sup_\pi S_\pi[u] \quad\text{and}\quad \int_{a}^{*b} u = \inf_\pi S^\pi[u]$$
where sup and inf range over all finite partitions $\pi$ of $[a,b]$.


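E.3 Lemma $\int_{*a}^{b} u \le \int_{a}^{*b} u$ and $\int_{*a}^{b} u = -\int_{a}^{*b} (-u)$.

E.4 Definition A bounded function $u : [a,b] \to \mathbb{R}$ is said to be (Riemann) integrable, if the upper and lower integrals coincide. Their common value is denoted by
$$\int_a^b u(x)\,dx = \int_{*a}^{b} u = \int_{a}^{*b} u$$
and is called the (Riemann) integral of $u$. The collection of all Riemann integrable functions in $[a,b]$ is denoted by $\mathcal{R}[a,b]$.

E.5 Theorem (Characterization of $\mathcal{R}[a,b]$) Let $u : [a,b] \to \mathbb{R}$ be a bounded function. Then the following assertions are equivalent.
(i) $u \in \mathcal{R}[a,b]$.
(ii) For every $\epsilon > 0$ there is some partition $\pi$ such that $S^\pi[u] - S_\pi[u] \le \epsilon$.
(iii) For every $\epsilon > 0$ there is some $\delta > 0$ such that $S^\pi[u] - S_\pi[u] \le \epsilon$ for all partitions $\pi$ with $\operatorname{mesh} \pi < \delta$.
(iv) The limit $I = \lim_{\operatorname{mesh} \pi \to 0} \sum_{j :\, t_j \in \pi} u(\xi_j)(t_j - t_{j-1})$ exists for every choice of intermediate values $t_{j-1} \le \xi_j \le t_j$; this means that for all $\epsilon > 0$ there exists a $\delta > 0$ such that for all partitions $\pi$ with $\operatorname{mesh} \pi < \delta$
$$\Big| I - \sum_{j :\, t_j \in \pi} u(\xi_j)(t_j - t_{j-1}) \Big| \le \epsilon$$
independently of the intermediate points.
If the limit exists, $I = \int_{a}^{*b} u = \int_{*a}^{b} u$.

Proof We show the implications (i)⇒(ii)⇒(iii)⇒(iv)⇒(i).
(i)⇒(ii): By the very definition of the lower and upper integrals in terms of sup and inf, we find for every $\epsilon > 0$ partitions $\pi$ and $\pi'$ such that
$$\int_{*a}^{b} u - S_\pi[u] \le \frac{\epsilon}{2} \quad\text{and}\quad S^{\pi'}[u] - \int_{a}^{*b} u \le \frac{\epsilon}{2}.$$
Using the common refinement $\pi'' = \pi \cup \pi'$ we get from Lemma E.1 and the integrability of $u$
$$S^{\pi''}[u] - S_{\pi''}[u] \le S^{\pi'}[u] - S_\pi[u] = S^{\pi'}[u] - \int_{a}^{*b} u + \int_{*a}^{b} u - S_\pi[u] \le \epsilon.$$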


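(ii)⇒(iii): This is the most intricate step in the proof. Fix $\epsilon > 0$ and denote by $\pi_\epsilon = \{a = t_0 < t_1 < \ldots < t_k = b\}$ the partition in (ii). We choose $\delta > 0$ in such a way that […]

[…] $> 0$ and choose $\delta > 0$ as in (iii). Then we have for any partition $\pi = \{a = t_0 < \ldots < t_k = b\}$ with $\operatorname{mesh} \pi < \delta$ and any choice of intermediate points $\xi_j \in [t_{j-1}, t_j]$,
$$S^\pi[u] - \epsilon \le S_\pi[u] \le \sum_{j=1}^{k} u(\xi_j)(t_j - t_{j-1}) \le S^\pi[u] \le S_\pi[u] + \epsilon.$$
Since $\int_{*a}^{b} u \le \int_{a}^{*b} u \le S^\pi[u]$ and $S_\pi[u] \le \int_{*a}^{b} u \le \int_{a}^{*b} u$, this implies
$$\int_{a}^{*b} u - \epsilon \le \sum_{j=1}^{k} u(\xi_j)(t_j - t_{j-1}) \le \int_{a}^{*b} u + \epsilon$$
and
$$\int_{*a}^{b} u - \epsilon \le \sum_{j=1}^{k} u(\xi_j)(t_j - t_{j-1}) \le \int_{*a}^{b} u + \epsilon$$
which means that $\sum_{j=1}^{k} u(\xi_j)(t_j - t_{j-1}) \to I = \int_{*a}^{b} u = \int_{a}^{*b} u$ as $\operatorname{mesh} \pi \to 0$.

(iv)⇒(i): Assume that $\sum_{j=1}^{k} u(\xi_j)(t_j - t_{j-1}) \to I$ as $\operatorname{mesh} \pi \to 0$ exists for any choice of intermediate values. We have to show that $I = \int_{a}^{*b} u = \int_{*a}^{b} u$. By definition of the limit, there is for every $\epsilon > 0$ some $\delta > 0$ and some partition $\pi$ with $\operatorname{mesh} \pi < \delta$ such that
$$I - \epsilon \le \sum_{j=1}^{k} u(\xi_j)(t_j - t_{j-1}) \le I + \epsilon.$$
Since this must hold uniformly for any choice of intermediate values, we can pass to the infimum and supremum of these values and get
$$I - \epsilon \le \sum_{j=1}^{k} \inf_{\xi \in [t_{j-1}, t_j]} u(\xi)\,(t_j - t_{j-1}) \le \sum_{j=1}^{k} \sup_{\xi \in [t_{j-1}, t_j]} u(\xi)\,(t_j - t_{j-1}) \le I + \epsilon.$$
Thus $I - \epsilon \le S_\pi[u] \le S^\pi[u] \le I + \epsilon$, and
$$I - \epsilon \le S_\pi[u] \le \int_{*a}^{b} u \le \int_{a}^{*b} u \le S^\pi[u] \le I + \epsilon.$$
Since $\epsilon > 0$ is arbitrary, the lower and upper integrals coincide with $I$.

Once we know that $u$ is Riemann integrable, we can work out the value of the integral by particular Riemann sums: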


E.6 Corollary If $u : [a,b] \to \mathbb{R}$ is Riemann integrable, then the integral is the limit of Riemann sums
$$\int_a^b u(t)\,dt = \lim_{n \to \infty} \sum_{j=1}^{k_n} u(\xi^n_j)(t^n_j - t^n_{j-1})$$
where $\pi_n = \{a = t^n_0 < t^n_1 < \ldots < t^n_{k_n} = b\}$ is any sequence of partitions with $\operatorname{mesh} \pi_n \to 0$ as $n \to \infty$ and where $\xi^n_j \in [t^n_{j-1}, t^n_j]$ are some intermediate points.

The existence of the limit of Riemann sums for some particular sequence of partitions does not guarantee integrability.

E.7 Example The Dirichlet jump function $u(x) = \mathbf{1}_{[0,1] \cap \mathbb{Q}}(x)$ on $[0,1]$ is not Riemann integrable, since for each partition $\pi$ of $[0,1]$ we have $M_j = 1$ and $m_j = 0$, so that $\int_{*0}^{1} u = S_\pi[u] = 0$ while $\int_{0}^{*1} u = S^\pi[u] = 1$. On the other hand, the equidistant Riemann sum
$$\sum_{j=1}^{k} u(\xi_j) \Big( \frac{j}{k} - \frac{j-1}{k} \Big) = \frac{1}{k} \sum_{j=1}^{k} u(\xi_j)$$
takes the value $\frac{n}{k}$, $0 \le n \le k$, if we choose $\xi_1, \ldots, \xi_n$ rational and $\xi_{n+1}, \ldots, \xi_k$ irrational. This allows us to construct sequences of Riemann sums which converge to any value in $[0,1]$.

Let us now find concrete functions which are Riemann integrable. A step function on $[a,b]$ is a function $f : [a,b] \to \mathbb{R}$ of the form
$$f(x) = \sum_{j=1}^{N} y_j \mathbf{1}_{I_j}(x)$$
where $N \in \mathbb{N}$, $y_j \in \mathbb{R}$ and $I_j$ are (open, half-open, closed, even degenerate) adjacent intervals such that $I_1 \cup \ldots \cup I_N = [a,b]$ and $I_j \cap I_k$, $j \ne k$, intersect in at most one point. We denote by $\mathcal{T}[a,b]$ the family of all step functions on $[a,b]$.

E.8 Theorem Continuous functions, monotone functions, and step functions on $[a,b]$ are Riemann integrable.

Proof Notice that the functions from all three classes are bounded on $[a,b]$.
Continuous functions: Let $u : [a,b] \to \mathbb{R}$ be continuous. Since $[a,b]$ is compact, $u$ is uniformly continuous and we find for all $\epsilon > 0$ some $\delta > 0$ such that
$$|u(x) - u(y)| \le \epsilon \qquad \forall\, x, y \in [a,b],\ |x - y| < \delta.$$


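If $\pi$ is a partition of $[a,b]$ with $\operatorname{mesh} \pi < \delta$ we find
$$S^\pi[u] - S_\pi[u] = \sum_{t_j \in \pi} (M_j - m_j)(t_j - t_{j-1}) \le \epsilon \sum_{t_j \in \pi} (t_j - t_{j-1}) = \epsilon (b - a)$$
since, by uniform continuity,
$$M_j - m_j = \sup u([t_{j-1}, t_j]) - \inf u([t_{j-1}, t_j]) = \sup_{\xi, \eta \in [t_{j-1}, t_j]} \big( u(\xi) - u(\eta) \big) \le \epsilon.$$
Thus $u \in \mathcal{R}[a,b]$ by Theorem E.5(iii).

Monotone functions: We can safely assume that $u : [a,b] \to \mathbb{R}$ is monotone increasing, otherwise we would consider $-u$. For the equidistant partition $\pi_k$ with points $t_j = a + j\,\frac{b-a}{k}$, $0 \le j \le k$, we get
$$S^{\pi_k}[u] - S_{\pi_k}[u] = \sum_{j=1}^{k} \big( u(t_j) - u(t_{j-1}) \big)(t_j - t_{j-1}) = \frac{b-a}{k} \sum_{j=1}^{k} \big( u(t_j) - u(t_{j-1}) \big) = \frac{b-a}{k} \big( u(b) - u(a) \big)$$
where we used that $\sup u([t_{j-1}, t_j]) = u(t_j)$ and $\inf u([t_{j-1}, t_j]) = u(t_{j-1})$ because of monotonicity. Since $\frac{b-a}{k} (u(b) - u(a))$ can be made arbitrarily small, $u \in \mathcal{R}[a,b]$ by Theorem E.5(ii).

Step functions: Let $u$ be a step function which has value $y_j$ on the interval $I_j$, $j = 1, \ldots, k$. The endpoints of the non-degenerate intervals form a partition of $[a,b]$, $\pi = \{a = t_0 < t_1 < \ldots < t_N = b\}$, $N \le k$, and we set for every $\epsilon > 0$
$$\pi_\epsilon = \{a = s_0 < s'_1 < s''_1 < s'_2 < \ldots < s'_{N-1} < s''_{N-1} < s_N = b\}$$
where $s'_j < t_j < s''_j$, $1 \le j \le N - 1$, and $s''_j - s'_j < \epsilon / (2N \|u\|_\infty)$. Since $u$ is constant with value $y_j$ on each interval $[s''_{j-1}, s'_j]$ (with the convention $s''_0 = s_0$ and $s'_N = s_N$), we find
$$S^{\pi_\epsilon}[u] - S_{\pi_\epsilon}[u] = \sum_{j=1}^{N} (y_j - y_j)(s'_j - s''_{j-1}) + \sum_{j=1}^{N-1} \big( \sup u([s'_j, s''_j]) - \inf u([s'_j, s''_j]) \big)(s''_j - s'_j) \le \sum_{j=1}^{N-1} 2 \|u\|_\infty \frac{\epsilon}{2N \|u\|_\infty} \le \epsilon.$$
Therefore Theorem E.5(ii) proves that $u \in \mathcal{R}[a,b]$.

With somewhat more effort one can prove the following general theorem.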


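E.9 Theorem Any bounded function $u : [a,b] \to \mathbb{R}$ with at most countably many points of discontinuity is Riemann integrable.

An elementary proof of this based on a compactness argument can be found in Strichartz [48, §6.2.3], but since Theorem 11.8 supersedes this result anyway, we do not include a proof here. A combination of Theorems E.8 and E.5 yields the following quite useful criterion for integrability.

E.10 Corollary $u \in \mathcal{R}[a,b]$ if, and only if, for every $\epsilon > 0$ there are $f, g \in \mathcal{T}[a,b]$ such that $f \le u \le g$ and $\int_a^b (g - f)\,dt \le \epsilon$.

E.11 Theorem The Riemann integral is a positive linear form on the vector lattice $\mathcal{R}[a,b]$, that is, for all $\alpha, \beta \in \mathbb{R}$ and $u, w \in \mathcal{R}[a,b]$ one has
(i) $\alpha u + \beta w \in \mathcal{R}[a,b]$ and $\int_a^b (\alpha u + \beta w)\,dt = \alpha \int_a^b u\,dt + \beta \int_a^b w\,dt$;
(ii) $u \le w \implies \int_a^b u\,dt \le \int_a^b w\,dt$;
(iii) $u \vee w,\ u \wedge w,\ u^+,\ u^-,\ |u| \in \mathcal{R}[a,b]$ and $\big| \int_a^b u\,dt \big| \le \int_a^b |u|\,dt$;
(iv) $|u|^p,\ uw \in \mathcal{R}[a,b]$, $1 \le p < \infty$.

Proof (i) follows immediately from the linearity of the limit criterion in Theorem E.5(iv).
(ii): In view of (i) it is enough to show that $v = w - u \ge 0$ entails $\int_a^b v\,dt \ge 0$. This, however, is clear since $v \in \mathcal{R}[a,b]$ and
$$0 \le \int_{*a}^{b} v = \int_a^b v\,dt.$$
(iii): Since $u \vee w = -((-u) \wedge (-w))$, $u^+ = u \vee 0$, $u^- = (-u) \vee 0$ and $|u| = u^+ + u^-$, it is enough to prove that $u \wedge w \in \mathcal{R}[a,b]$. By Corollary E.10 there are for every $\epsilon > 0$ step functions $f, g, \phi, \gamma \in \mathcal{T}[a,b]$ such that $f \le u \le g$, $\phi \le w \le \gamma$ and $\int_a^b (g - f)\,dt + \int_a^b (\gamma - \phi)\,dt \le \epsilon$. Obviously, $f \wedge \phi$, $g \wedge \gamma$ are again step functions with $f \wedge \phi \le u \wedge w \le g \wedge \gamma$ and
$$\int_a^b (g \wedge \gamma - f \wedge \phi)\,dt \le \int_a^b \big( (g - f) + (\gamma - \phi) \big)\,dt \le \epsilon$$
where we used (ii) and the elementary inequality for $a, b, A, B \in \mathbb{R}$
$$|a \wedge A - b \wedge B| \le \max\{|a - b|, |A - B|\} \le |a - b| + |A - B|.$$
Finally, since $\pm u \le |u|$ we find by parts (i), (ii) that $\pm \int_a^b u\,dt \le \int_a^b |u|\,dt$ which implies $\big| \int_a^b u\,dt \big| \le \int_a^b |u|\,dt$.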

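(iv): By (iii), $|u| \in \mathcal{R}[a,b]$ and, by Corollary E.10, we find for each $\epsilon > 0$ step functions $f \le |u| \le g$ such that $\int_a^b (g - f)\,dt \le \epsilon$. Without loss of generality, we may assume that $f \ge 0$ and $g \le \|u\|_\infty$ – otherwise we could consider $f^+$ and $g \wedge \|u\|_\infty$ and note that $f^+, g \wedge \|u\|_\infty \in \mathcal{T}[a,b]$, $f^+ \le |u| \le g \wedge \|u\|_\infty$ and
$$\int_a^b \big( g \wedge \|u\|_\infty - f^+ \big)\,dt \le \int_a^b (g - f)\,dt \le \epsilon.$$
Thus $f^p \le |u|^p \le g^p$, where $f^p, g^p \in \mathcal{T}[a,b]$. By the mean value theorem of differential calculus we get
$$g^p - f^p \le p\, g^{p-1} (g - f) \le p\, \|u\|_\infty^{p-1} (g - f).$$
Thus, by (ii),
$$\int_a^b (g^p - f^p)\,dt \le p\, \|u\|_\infty^{p-1} \int_a^b (g - f)\,dt \le p\, \|u\|_\infty^{p-1} \epsilon,$$
so that $|u|^p \in \mathcal{R}[a,b]$ by Corollary E.10. Since $uw = \frac{1}{4} \big( (u + w)^2 - (u - w)^2 \big)$, we conclude that $uw \in \mathcal{R}[a,b]$ from (i) and the fact that $u^2 = |u|^2 \in \mathcal{R}[a,b]$.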

Note that Theorem E.11(iii) has no converse: $|u| \in \mathcal{R}[a,b]$ does not imply that $u \in \mathcal{R}[a,b]$ (as is the case for the Lebesgue integral, cf. T10.3). This can be seen by the modified Dirichlet jump function $u = \mathbf{1}_{[0,1] \cap \mathbb{Q}} - \mathbf{1}_{[0,1] \setminus \mathbb{Q}}$ which is not Riemann integrable but whose modulus $|u| = \mathbf{1}_{[0,1]}$ is Riemann integrable.

E.12 Corollary (Mean value theorem for integrals) Let $u \in \mathcal{R}[a,b]$ be either positive or negative and let $v \in C[a,b]$. Then there exists some $\xi \in [a,b]$ such that
$$\int_a^b u(t) v(t)\,dt = v(\xi) \int_a^b u(t)\,dt. \tag{E.5}$$

Proof The case $u \le 0$ being similar, we may assume that $u \ge 0$. By Theorem E.8 and E.11(iv), $uv$ is integrable and because of E.11(ii) we have
$$\inf v([a,b]) \int_a^b u(t)\,dt \le \int_a^b u(t) v(t)\,dt \le \sup v([a,b]) \int_a^b u(t)\,dt.$$
Since $v$ is continuous on $[a,b]$, the intermediate value theorem guarantees the existence of some $\xi \in [a,b]$ such that (E.5) holds.

E.13 Theorem Let $[c,d] \subset [a,b]$. Then $\mathcal{R}[a,b] \subset \mathcal{R}[c,d]$ in the sense that $u \in \mathcal{R}[a,b]$ satisfies $u|_{[c,d]} \in \mathcal{R}[c,d]$. Moreover, for any $u \in \mathcal{R}[a,b]$ and $c \in [a,b]$
$$\int_a^b u\,dt = \int_a^c u\,dt + \int_c^b u\,dt.$$

Proof By Theorem E.8 and E.11 we find that $\mathbf{1}_{[c,d]} u \in \mathcal{R}[a,b]$. Since we can always add the points $c$ and $d$ to any of the partitions appearing in one of the


criteria of Theorem E.5, we see that $u|_{[c,d]} = (\mathbf{1}_{[c,d]} u)|_{[c,d]} \in \mathcal{R}[c,d]$ and
$$\int_a^b \mathbf{1}_{[c,d]} u\,dt = \int_c^d u\,dt.$$
Considering $u = \mathbf{1}_{[a,c]} u + \mathbf{1}_{(c,b]} u$ proves also the formula in the statement of the theorem.

The fundamental theorem of integral calculus

Since $[a,x] \subset [a,b]$, Theorem E.13 allows us to treat $\int_a^x u(t)\,dt$, $u \in \mathcal{R}[a,b]$, as a function of its upper limit $x \in [a,b]$.

E.14 Lemma For every $u \in \mathcal{R}[a,b]$ the function $U(x) = \int_a^x u(t)\,dt$ is continuous for all $x \in [a,b]$.

Proof Since $u$ is bounded, $M = \sup_{x \in [a,b]} |u(x)| < \infty$. For all $x, y \in [a,b]$, $x < y$, we have by Theorem E.13 and E.11
$$|U(y) - U(x)| = \Big| \int_a^y u(t)\,dt - \int_a^x u(t)\,dt \Big| = \Big| \int_x^y u(t)\,dt \Big| \le \int_x^y |u(t)|\,dt \le M (y - x) \xrightarrow{\ x - y \to 0\ } 0$$
showing even uniform continuity.

We can now discuss the connection between differentiation and integration. Let us begin with a few examples.

E.15 Example (i) Let $[0,1] \subset (a,b)$. Then $u(x) = \mathbf{1}_{[0,1]}(x)$ is an integrable function and
$$U(x) = \int_a^x u(t)\,dt = \begin{cases} 0 & \text{if } x \le 0 \\ x & \text{if } 0 < x < 1 \\ 1 & \text{if } x \ge 1 \end{cases} \;=\; x^+ \wedge 1.$$
Note that $U'(x)$ does not exist at $x = 0$ or $x = 1$, so that $u(x)$ cannot be the derivative of any function (at every point).

(ii) Let $[a,b] = [0,1]$ and take an enumeration $(q_j)_{j \in \mathbb{N}}$ of $[0,1] \cap \mathbb{Q}$. Then the function
$$u(x) = \sum_{j :\, q_j \le x} 2^{-j} = \sum_{j=1}^{\infty} 2^{-j} \mathbf{1}_{[q_j, 1]}(x), \qquad x \in [0,1],$$
is increasing, satisfies $0 \le u \le 1$ and its discontinuities are jumps at the points $q_j$ of height $u(q_j+) - u(q_j-) = 2^{-j}$ – this is as bad as it can get for


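a monotone function, cf. Lemma 13.12. By Theorem E.8 $u$ is integrable, and since $(q_j)_{j \in \mathbb{N}}$ is dense, there is no interval $[c,d] \subset [0,1]$ such that $U'(x) = u(x)$ for all $x \in [c,d]$ for any function $U(x)$.

(iii) Consider on $[-1,1]$ the function
$$u(x) = \begin{cases} x^2 \sin \frac{1}{x^2} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0. \end{cases}$$
It is an elementary exercise to show that $u'(x)$ exists on $[-1,1]$ and
$$u'(x) = \begin{cases} 2x \sin \frac{1}{x^2} - \frac{2}{x} \cos \frac{1}{x^2} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0. \end{cases}$$
Thus $u'$ exists everywhere, but it is not Riemann integrable in any neighbourhood of $x = 0$ since $u'$ is unbounded.

(iv) Let $(q_n)_{n \in \mathbb{N}}$ be an enumeration of $(0,1) \cap \mathbb{Q}$. The function
$$u(x) = \begin{cases} 2^{-n} & \text{if } x = q_n,\ n \in \mathbb{N} \\ 0 & \text{if } x \in ((0,1) \setminus \mathbb{Q}) \cup \{0, 1\} \end{cases}$$
is discontinuous for every $x \in (0,1) \cap \mathbb{Q}$ and continuous otherwise. Moreover, $u \in \mathcal{R}[0,1]$ which follows from Theorem E.9 or directly from the following argument: fix $\epsilon > 0$ and $n \in \mathbb{N}$ such that $2^{-n} < \epsilon$. Choose a partition $\pi = \{0 = t_0 < t_1 < \ldots < t_N = 1\}$ with $\operatorname{mesh} \pi = \delta < \epsilon / n$ in such a way that each $q_k$ from $Q_n = \{q_1, q_2, \ldots, q_n\}$ is the midpoint of some $[t_{j-1}, t_j]$, $j = 1, 2, \ldots, N$. Therefore, if $M_j$ denotes $\sup u([t_{j-1}, t_j])$,
$$0 \le S_\pi[u] \le S^\pi[u] = \sum_{j=1}^{N} M_j (t_j - t_{j-1}) = \sum_{j :\, [t_{j-1}, t_j] \cap Q_n \ne \emptyset} M_j (t_j - t_{j-1}) + \sum_{j :\, [t_{j-1}, t_j] \cap Q_n = \emptyset} M_j (t_j - t_{j-1})$$
$$\le n \delta + 2^{-n} \sum_{j :\, [t_{j-1}, t_j] \cap Q_n = \emptyset} (t_j - t_{j-1}) \le \epsilon + \epsilon \sum_{j=1}^{N} (t_j - t_{j-1}) = 2 \epsilon.$$
This proves $u \in \mathcal{R}[0,1]$ and $0 \le \int_0^x u(t)\,dt \le \int_0^1 u(t)\,dt = 0$. Thus $U'(x) = 0 = u(x)$ for all $x$ from a dense subset.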


The above examples show that the Riemann integral is not always the antiderivative, nor is the antiderivative an extension of the Riemann integral. The two concepts, however, coincide on a large class of functions.

E.16 Definition Let $u : [a,b] \to \mathbb{R}$ be a bounded function. Every function $U \in C[a,b]$ such that $U'(x) = u(x)$ for all but possibly finitely many $x \in [a,b]$ is called a primitive of $u$.

Obviously, primitives are only unique up to constants: for every constant $c$, $U + c$ is again a primitive of $u$. On the other hand, if $U, W$ are two primitives of $u$, we have $U' - W' = 0$ at all but finitely many points $a = x_0 < x_1 < \ldots < x_n = b$. Thus the mean value theorem of differential calculus shows $U = W + \text{const.}$ (cf. Rudin [39, Thm. 5.11]), first on each interval $(x_{j-1}, x_j)$, $j = 1, 2, \ldots, n$, and then, by continuity, on the whole interval $[a,b]$.

E.17 Proposition Every $u \in C[a,b]$ has $U(x) = \int_a^x u(t)\,dt$ as a primitive. Moreover,
$$U(b) - U(a) = \int_a^b u(t)\,dt.$$

Proof Since continuous functions are integrable, $U(x)$ is well-defined by Theorem E.13 and continuous by Lemma E.14. Fix $\epsilon > 0$. For $a < x < x + h < b$ and sufficiently small $h$ we find
$$|U(x + h) - U(x) - h\,u(x)| = \Big| \int_a^{x+h} u(t)\,dt - \int_a^x u(t)\,dt - \int_x^{x+h} u(x)\,dt \Big| = \Big| \int_x^{x+h} (u(t) - u(x))\,dt \Big|$$
$$\le \int_x^{x+h} |u(t) - u(x)|\,dt \le \epsilon \int_x^{x+h} dt = \epsilon h$$
where we used that $u(t)$ is continuous at $t = x$. With a similar calculation we get $|U(x) - U(x - h) - h\,u(x)| \le \epsilon h$ and a combination of both inequalities shows that
$$\lim_{y \to x} \frac{U(y) - U(x)}{y - x} = u(x).$$
The formula $U(b) - U(a) = \int_a^b u(t)\,dt$ follows from the fact that $U(a) = 0$.
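Proposition E.17 can be illustrated numerically. The following sketch is not part of the text: the helper `midpoint_integral` is a hypothetical name for a plain midpoint Riemann sum on an equidistant partition, and the integrand $u = \cos$ on $[0,1]$ is chosen only because its primitive $\sin$ is known in closed form.

```python
import math

def midpoint_integral(u, a, b, k=2000):
    """Riemann sum of u on [a, b] with k equidistant subintervals,
    using the midpoints as intermediate points."""
    h = (b - a) / k
    return h * sum(u(a + (j + 0.5) * h) for j in range(k))

u = math.cos
U = lambda x: midpoint_integral(u, 0.0, x)   # approximates U(x) = int_0^x u(t) dt

x, h = 1.0, 1e-3
quotient = (U(x + h) - U(x)) / h             # difference quotient of U at x

print(quotient, u(x))                        # close to each other
print(U(1.0), math.sin(1.0))                 # U(b) - U(a) versus the closed form
```

The difference quotient of the numerically computed $U$ is close to $u(x)$, in line with $U' = u$, and $U(1) - U(0)$ agrees with $\sin 1$ up to the discretization error of the Riemann sum.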


E.18 Theorem (Fundamental theorem of calculus) Assume that $U$ is a primitive of $u \in \mathcal{R}[a,b]$. Then
$$U(b) - U(a) = \int_a^b u(t)\,dt.$$

Proof Let $C$ be some finite set such that $U'(x) = u(x)$ if $x \in [a,b] \setminus C$. Fix $\epsilon > 0$. Since $u$ is integrable, we find by E.5(ii) a partition $\pi$ of $[a,b]$ such that $S^\pi[u] - S_\pi[u] \le \epsilon$. Because of Lemma E.1 this inequality still holds for the partition $\pi' = \pi \cup C$ whose points we denote by $a = t_0 < t_1 < \ldots < t_k = b$. Since
$$U(b) - U(a) = \sum_{j=1}^{k} \big( U(t_j) - U(t_{j-1}) \big)$$
and since $U$ is differentiable in each segment $(t_{j-1}, t_j)$ and continuous on $[a,b]$, we can use the mean value theorem of differential calculus to find points $\xi_j \in (t_{j-1}, t_j)$ with
$$U(t_j) - U(t_{j-1}) = U'(\xi_j)(t_j - t_{j-1}) = u(\xi_j)(t_j - t_{j-1}), \qquad 1 \le j \le k.$$
Using $m_j = \inf u([t_{j-1}, t_j]) \le u(\xi_j) \le \sup u([t_{j-1}, t_j]) = M_j$ we can sum the above equality over $j = 1, \ldots, k$ and get
$$S^{\pi'}[u] - \epsilon \le S_{\pi'}[u] \le U(b) - U(a) \le S^{\pi'}[u] \le S_{\pi'}[u] + \epsilon.$$
By integrability, $S_{\pi'}[u] \le \int_a^b u\,dt \le S^{\pi'}[u]$, and this shows
$$\int_a^b u\,dt - \epsilon \le U(b) - U(a) \le \int_a^b u\,dt + \epsilon \qquad \forall\, \epsilon > 0$$
which proves our claim.

E.19 Remark There is not much room to improve the fundamental theorem E.18. On one hand, Example E.15(ii) shows that an integrable function need not have a primitive and E.15(iv) gives an example where $\int_a^x u\,dt$ exists, but is not a primitive in any interval; on the other hand, E.15(iii) provides an example of a function $u'$ which has a primitive $u$ but which is itself not Riemann integrable since it is unbounded. Volterra even constructed an example of a bounded but not Riemann integrable function with a primitive, see Sz.-Nagy [30, pp. 155–7]. To overcome this phenomenon was one of the motivations for Lebesgue when he introduced the Lebesgue integral. And, in fact, every bounded function $f$ on the interval $[a,b]$ with a primitive $F$ is Lebesgue integrable: indeed, since $F$ is continuous, it is measurable in the sense of Chapter 8 and so is the limit $f(x) = \lim_{n \to \infty} \big( F(x + \frac{1}{n}) - F(x) \big) / \frac{1}{n}$, cf. Corollary 8.9 – the finitely many points where the limit does not exist are a Lebesgue null set and pose no problem. Since


$f$ is dominated by the (Lebesgue) integrable function $M \mathbf{1}_{[a,b]}$, $M = \sup |f|([a,b])$, we conclude that $f \in \mathcal{L}^1[a,b]$.

An immediate consequence of the integral as antiderivative are the following integration formulae which are easily proved by ‘integrating up’ the corresponding differentiation rules.

E.20 Theorem (Integration by parts) Let $u'$ and $v'$ be integrable functions on $[a,b]$ with primitives $u$ and $v$. Then $uv$ is a primitive of $u'v + uv'$ and, in particular,
$$\int_a^b u'(t) v(t)\,dt = u(b) v(b) - u(a) v(a) - \int_a^b u(t) v'(t)\,dt.$$

E.21 Theorem (Integration by substitution) Let $u \in \mathcal{R}[a,b]$ and assume that $\phi : [c,d] \to [a,b]$ is a strictly increasing differentiable function such that $\phi(c) = a$ and $\phi(d) = b$. If $\phi' \in \mathcal{R}[c,d]$ and if $u$ has a primitive $U$, then $U \circ \phi$ is a primitive of $(u \circ \phi) \cdot \phi'$ as well as
$$\int_a^b u(t)\,dt = \int_{\phi^{-1}(a)}^{\phi^{-1}(b)} u(\phi(s)) \phi'(s)\,ds = \int_c^d u(\phi(s)) \phi'(s)\,ds.$$

E.22 Corollary (Bonnet’s mean value theorem³) Let $u, v \in \mathcal{R}[a,b]$ have primitives $U$ and $V$. If $u \le 0$ [resp. $u \ge 0$] and $U \ge 0$, then there exists some $\xi \in [a,b]$ such that
$$\int_a^b U(t) v(t)\,dt = U(a) \int_a^\xi v(t)\,dt \tag{E.6}$$
$$\Big[ \text{resp.} \quad \int_a^b U(t) v(t)\,dt = U(b) \int_\xi^b v(t)\,dt \Big]. \tag{E.6$'$}$$

Proof By subtracting a suitable constant from $V$ we may assume that $V(a) = 0$ and, by the fundamental theorem E.18, $V(x) = \int_a^x v(t)\,dt$. Integration by parts now shows
$$\int_a^b U(t) v(t)\,dt = U(b) V(b) - \int_a^b u(t) V(t)\,dt.$$
Since $u \le 0$ we get
$$\int_a^b U(t) v(t)\,dt \le U(b) V(b) - \sup V([a,b]) \int_a^b u(t)\,dt = U(b) V(b) - \sup V([a,b]) \big( U(b) - U(a) \big)$$
$$= U(b) \big( V(b) - \sup V([a,b]) \big) + \sup V([a,b])\, U(a) \le \sup V([a,b])\, U(a)$$

³ Also known as the second mean value theorem of integral calculus.


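and a similar calculation yields the other inequality below:
$$\inf V([a,b])\, U(a) \le \int_a^b U(t) v(t)\,dt \le \sup V([a,b])\, U(a).$$
Applying the intermediate value theorem to the continuous function $V$ furnishes some $\xi \in [a,b]$ such that (E.6) holds.

Integrals and limits

One of the strengths of Lebesgue integration is the fact that we have fairly general theorems that allow interchanging pointwise limits and Lebesgue integrals. Similar results for the Riemann integral regularly require uniform convergence. Recall that a sequence of functions $(u_n(\bullet))_{n \in \mathbb{N}}$ on $[a,b]$ converges uniformly (in $x$) to $u$, if
$$\forall\, \epsilon > 0 \quad \exists\, N_\epsilon \in \mathbb{N} \quad \forall\, x \in [a,b] \quad \forall\, n \ge N_\epsilon : \quad |u_n(x) - u(x)| \le \epsilon.$$
The basic convergence result for the Riemann integral is the following.

E.23 Theorem Let $(u_n)_{n \in \mathbb{N}} \subset \mathcal{R}[a,b]$ be a sequence which converges uniformly to a function $u$. Then $u \in \mathcal{R}[a,b]$ and
$$\lim_{n \to \infty} \int_a^b u_n\,dt = \int_a^b \lim_{n \to \infty} u_n\,dt = \int_a^b u\,dt.$$

Proof Let $\pi$ be a partition of $[a,b]$ and let $\epsilon > 0$ be given. Since $u_n \to u$ uniformly, we can find some $N_\epsilon \in \mathbb{N}$ such that $|u(x) - u_n(x)| \le \epsilon / (b - a)$ uniformly in $x \in [a,b]$ for all $n \ge N_\epsilon$. Because of (E.1) and the sub- and superadditivity of the upper and lower Darboux sums we find for all $n \ge N_\epsilon$
$$S^\pi[u] - S_\pi[u] \le S^\pi[u - u_n] + S^\pi[u_n] - S_\pi[u_n] - S_\pi[u - u_n] \le 2\epsilon + S^\pi[u_n] - S_\pi[u_n]$$
thus
$$\int_{a}^{*b} u - \int_{*a}^{b} u \le 2\epsilon + S^\pi[u_n] - S_\pi[u_n] \qquad \forall\, n \ge N_\epsilon.$$
Fixing some $n_0 \ge N_\epsilon$ we can use that $u_{n_0}$ is integrable and choose $\pi$ in such a way that $S^\pi[u_{n_0}] - S_\pi[u_{n_0}] \le \epsilon$. This shows that $\int_{a}^{*b} u - \int_{*a}^{b} u \le 3\epsilon$ and $u \in \mathcal{R}[a,b]$.

Once $u$ is known to be integrable, we get for all $n \ge N_\epsilon$
$$\Big| \int_a^b (u - u_n)\,dt \Big| \le \int_a^b |u - u_n|\,dt \le \frac{\epsilon}{b - a} (b - a) = \epsilon \xrightarrow{\ \epsilon \to 0\ } 0.$$

We can now consider Riemann integrals which depend on a parameter.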
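A small numerical experiment may help to see why Theorem E.23 insists on uniform convergence. The sketch below is an illustration under stated assumptions: the sequences `u_n` (uniformly convergent to $u(t) = t$) and the tent functions `tent` (pointwise convergent to $0$, but not uniformly) are examples chosen here and do not appear in the text, and `midpoint_integral` is a hypothetical helper computing a midpoint Riemann sum.

```python
import math

def midpoint_integral(f, a, b, k=1000):
    """Midpoint Riemann sum of f on [a, b] with k equidistant subintervals."""
    h = (b - a) / k
    return h * sum(f(a + (j + 0.5) * h) for j in range(k))

def u_n(n):
    """u_n(t) = t + sin(n t)/n converges uniformly to u(t) = t: sup |u_n - u| <= 1/n."""
    return lambda t: t + math.sin(n * t) / n

def tent(n):
    """Tent of height n on [0, 2/n]; tends to 0 pointwise on [0, 1], not uniformly."""
    def f(t):
        if t <= 1.0 / n:
            return n * n * t
        if t <= 2.0 / n:
            return 2.0 * n - n * n * t
        return 0.0
    return f

n = 10
int_uniform = midpoint_integral(u_n(n), 0.0, 1.0)   # close to int_0^1 t dt = 1/2
int_tent = midpoint_integral(tent(n), 0.0, 1.0)     # stays at 1 although the limit is 0

print(int_uniform, int_tent)
```

For the uniformly convergent sequence the integrals differ from $\int_0^1 t\,dt = \tfrac12$ by at most $(b - a) \sup |u_n - u| \le 1/n$, as in the last estimate of the proof. The tent functions all have integral $1$ although their pointwise limit has integral $0$.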


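E.24 Theorem (Continuity theorem) Let $u : [a,b] \times \mathbb{R} \to \mathbb{R}$ be a continuous function. Then
$$w(y) = \int_a^b u(t, y)\,dt$$
is continuous for all $y \in \mathbb{R}$.

Proof Since $u(\bullet, y)$ is continuous, the above Riemann integral exists. Fix $y \in \mathbb{R}$ and consider any sequence $(y_n)_{n \in \mathbb{N}}$ with limit $y$. Without loss of generality we can assume that $(y_n)_{n \in \mathbb{N}} \subset I = [y - 1, y + 1]$. Since $[a,b] \times I$ is compact, $u|_{[a,b] \times I}$ is uniformly continuous, and we can find for all $\epsilon > 0$ some $\delta > 0$ such that
$$\sqrt{(t - \tau)^2 + (y - \eta)^2} < \delta \implies |u(t, y) - u(\tau, \eta)| < \epsilon.$$
As $y_n \to y$, there is some $N_\epsilon \in \mathbb{N}$ with
$$|u(t, y_n) - u(t, y)| < \epsilon \qquad \forall\, t \in [a,b] \quad \forall\, n \ge N_\epsilon,$$
i.e. $u(t, y_n) \to u(t, y)$ uniformly in $t \in [a,b]$. Theorem E.23 and the continuity of $u(t, \bullet)$ therefore show
$$\lim_{n \to \infty} w(y_n) = \lim_{n \to \infty} \int_a^b u(t, y_n)\,dt = \int_a^b \lim_{n \to \infty} u(t, y_n)\,dt = \int_a^b u(t, y)\,dt = w(y)$$
which is but the continuity of $w$ at $y$.

E.25 Theorem (Differentiation theorem) Let $u : [a,b] \times \mathbb{R} \to \mathbb{R}$ be a continuous function with continuous partial derivative $\frac{\partial}{\partial y} u(t, y)$. Then
$$w(y) = \int_a^b u(t, y)\,dt$$
is continuously differentiable and
$$w'(y) = \frac{d}{dy} \int_a^b u(t, y)\,dt = \int_a^b \frac{\partial}{\partial y} u(t, y)\,dt.$$

Proof Since $u(\bullet, y)$ and $\frac{\partial}{\partial y} u(\bullet, y)$ are continuous, the above integrals exist. Fix $y \in \mathbb{R}$ and consider any sequence $(y_n)_{n \in \mathbb{N}}$ with limit $y$. Without loss of generality we can assume that $(y_n)_{n \in \mathbb{N}} \subset I = [y - 1, y + 1]$. We introduce the following auxiliary function
$$h(t, z) = u(t, z) - u(t, y) - \frac{\partial}{\partial y} u(t, y)\,(z - y).$$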

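Clearly, $h(t, y) = 0$ and $\frac{\partial}{\partial z} h(t, z) = \frac{\partial}{\partial z} u(t, z) - \frac{\partial}{\partial y} u(t, y)$ is continuous and uniformly continuous on $[a,b] \times I$, i.e. for all $\epsilon > 0$ there is some $\delta > 0$ such that
$$\sqrt{(t - \tau)^2 + (z - \eta)^2} < \delta \implies \Big| \frac{\partial}{\partial z} h(t, z) - \frac{\partial}{\partial z} h(\tau, \eta) \Big| < \epsilon.$$
From the mean value theorem of differential calculus we infer that for some $\zeta$ between $z$ and $y$
$$|h(t, z)| = |h(t, z) - h(t, y)| = \Big| \frac{\partial}{\partial z} h(t, \zeta) \Big| \cdot |z - y| = \Big| \frac{\partial}{\partial z} h(t, \zeta) - \frac{\partial}{\partial z} h(t, y) \Big| \cdot |z - y| \le \epsilon\, |z - y|$$
whenever $z, y \in I$ and $|z - y| < \delta$. This shows that for some $N_\epsilon \in \mathbb{N}$
$$\Big| u(t, y_n) - u(t, y) - \frac{\partial}{\partial y} u(t, y)(y_n - y) \Big| \le \epsilon\, |y_n - y| \qquad \forall\, t \in [a,b] \quad \forall\, n \ge N_\epsilon.$$
Theorem E.23 now shows that
$$w'(y) = \lim_{n \to \infty} \frac{w(y_n) - w(y)}{y_n - y} = \lim_{n \to \infty} \int_a^b \frac{u(t, y_n) - u(t, y)}{y_n - y}\,dt = \int_a^b \lim_{n \to \infty} \frac{u(t, y_n) - u(t, y)}{y_n - y}\,dt = \int_a^b \frac{\partial}{\partial y} u(t, y)\,dt.$$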
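The differentiation theorem E.25 can be checked numerically on a concrete parameter integral. The sketch below is an illustration only: the integrand $u(t, y) = \sin(ty)$ on $[0,1]$ is an example chosen here, and `midpoint_integral` is a hypothetical helper computing a midpoint Riemann sum.

```python
import math

def midpoint_integral(f, a, b, k=2000):
    """Midpoint Riemann sum of f on [a, b] with k equidistant subintervals."""
    h = (b - a) / k
    return h * sum(f(a + (j + 0.5) * h) for j in range(k))

def w(y):
    """w(y) = int_0^1 sin(t y) dt, the parameter integral of u(t, y) = sin(t y)."""
    return midpoint_integral(lambda t: math.sin(t * y), 0.0, 1.0)

y, h = 2.0, 1e-4

# Left-hand side of E.25: derivative of w, approximated by a central difference.
dw_quotient = (w(y + h) - w(y - h)) / (2 * h)

# Right-hand side of E.25: integral of the partial derivative t * cos(t y).
dw_integral = midpoint_integral(lambda t: t * math.cos(t * y), 0.0, 1.0)

# Closed form for comparison: w(y) = (1 - cos y)/y, so w'(y) = (y sin y - 1 + cos y)/y^2.
dw_exact = (y * math.sin(y) - 1.0 + math.cos(y)) / (y * y)

print(dw_quotient, dw_integral, dw_exact)
```

All three numbers agree up to discretization error, which is what the interchange of $\frac{d}{dy}$ and $\int_a^b$ asserts for this example.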

Improper Riemann integrals

Let us finally have a glance at various extensions of the Riemann integral to unbounded intervals and/or unbounded integrands. The following cases can occur:
A. the interval of integration is $[a, +\infty)$ or $(-\infty, b]$;
B. the interval of integration is $[a, b)$ or $(a, b]$, and the integrand $u(t)$ is unbounded as $t \uparrow b$ resp. $t \downarrow a$;
C. the interval of integration is $(a, b)$ with $-\infty \le a < b \le +\infty$ and the integrand may or may not be unbounded.


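A. Improper Riemann integrals of the type $\int_a^\infty u\,dt$ or $\int_{-\infty}^b u\,dt$

E.26 Definition If $u\in\mathcal{R}[a,b]$ for all $b\in(a,\infty)$ [resp. $a\in(-\infty,b)$] and if the limit
$$\lim_{b\to\infty}\int_a^b u\,dt \qquad \Big[\text{resp. } \lim_{a\to-\infty}\int_a^b u\,dt\Big]$$
exists and is finite, we call $u$ improperly Riemann integrable and write $u\in\mathcal{R}[a,\infty)$ [resp. $u\in\mathcal{R}(-\infty,b]$]. The value of the above limit is called the (improper Riemann) integral and denoted by $\int_a^\infty u\,dt$ [resp. $\int_{-\infty}^b u\,dt$].

The typical examples of improper integrals of this kind are expressions of the type $\int_1^\infty t^\lambda\,dt$ if $\lambda<0$. In fact, if $\lambda\ne-1$,
$$\int_1^\infty t^\lambda\,dt = \lim_{b\to\infty}\int_1^b t^\lambda\,dt = \lim_{b\to\infty}\frac{1}{\lambda+1}\big(b^{\lambda+1}-1\big) = \begin{cases} \dfrac{-1}{\lambda+1} & \text{if } \lambda<-1,\\[1ex] \infty & \text{if } \lambda>-1, \end{cases}$$
and a similar calculation confirms that $\int_1^\infty t^{-1}\,dt=\infty$. Thus $t^\lambda\in\mathcal{R}[1,\infty)$ if, and only if, $\lambda<-1$.

From now on we will only consider integrals of the type $\int_a^\infty u\,dt$; the case of a finite upper and infinite lower limit is very similar. The following Cauchy criterion for improper integrals is quite useful.

E.27 Lemma $u\in\mathcal{R}[a,\infty)$ if, and only if, $u\in\mathcal{R}[a,b]$ for all $b\in(a,\infty)$ and $\lim_{x,y\to\infty}\int_x^y u\,dt=0$ ($x,y\to\infty$ simultaneously).

Proof This is just Cauchy's convergence criterion for $U(z)=\int_a^z u(t)\,dt$ as $z\to\infty$.

It is not hard to see that Lemma E.27 implies, in particular, that

• $\mathcal{R}[a,\infty)$ is a vector space, i.e. for all $\alpha,\beta\in\mathbb{R}$ and $u,w\in\mathcal{R}[a,\infty)$,
$$\int_a^\infty(\alpha u+\beta w)\,dt = \alpha\int_a^\infty u\,dt + \beta\int_a^\infty w\,dt;$$
• $u\in\mathcal{R}[a,\infty)$ if, and only if, $\int_b^\infty u\,dt$ exists for all $b>a$.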
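The Cauchy criterion of Lemma E.27 can be checked by hand on the power functions discussed above. The following is only an illustrative sketch: the exponents $\lambda=-2$ and $\lambda=-1$ and the choice $y=2x$ are ours, and the `aligned` environment assumes the amsmath package.

```latex
% Illustration only; aligned and \text assume amsmath in the preamble.
\[
\begin{aligned}
\int_x^y t^{-2}\,dt &= \frac{1}{x}-\frac{1}{y}
  \;\longrightarrow\; 0 \quad (x,y\to\infty \text{ simultaneously}),\\
\int_x^{2x} t^{-1}\,dt &= \log(2x)-\log x = \log 2
  \quad\text{for all } x\ge 1 .
\end{aligned}
\]
```

The first line shows that $t^{-2}$ satisfies the criterion on $[1,\infty)$. The second line shows that $t^{-1}$ does not, since the integral over $[x,2x]$ stays equal to $\log 2$ however large $x$ is. Both agree with the condition $\lambda<-1$ found above.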

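E.28 Corollary Let $u,w:[a,\infty)\to\mathbb{R}$ be two functions such that $|u|\le w$. If $w\in\mathcal{R}[a,\infty)$, and if $u\in\mathcal{R}[a,b]$ for all $b>a$, then $u,|u|\in\mathcal{R}[a,\infty)$. In particular, $|u|\in\mathcal{R}[a,\infty)$ implies that $u\in\mathcal{R}[a,\infty)$.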



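Proof For all $y>x>a$ we find using Theorem E.11 and Lemma E.27 that
$$\Big|\int_x^y u\,dt\Big| \le \int_x^y |u|\,dt \le \int_x^y w\,dt \xrightarrow{\;x,y\to\infty\;} 0,$$
which shows, again by E.27, that $u,|u|\in\mathcal{R}[a,\infty)$.

Note that, unlike Lebesgue integrals, improper Riemann integrals are not absolute integrals since improper integrability of $u$ does NOT imply improper integrability of $|u|$, see e.g. Remark 11.11 where $\int_0^\infty \sin t/t\,dt$ is discussed. This means that the following convergence theorems for improper Riemann integrals are not necessarily covered by Lebesgue's theory.

E.29 Theorem Let $(u_n)_{n\in\mathbb{N}}\subset\mathcal{R}[a,\infty)$. If for some $u:[a,\infty)\to\mathbb{R}$

• $u_n(t)\xrightarrow{\;n\to\infty\;}u(t)$ uniformly in $t\in[a,b]$ and for every $b>a$,
• $\lim_{b\to\infty}\int_a^b u_n\,dt$ exists uniformly for all $n\in\mathbb{N}$, i.e. for every $\epsilon>0$ there is some $N_\epsilon\in\mathbb{N}$ such that
$$\sup_{n\in\mathbb{N}}\Big|\int_x^y u_n\,dt\Big|<\epsilon \qquad \forall\, y>x>N_\epsilon,$$

then $u\in\mathcal{R}[a,\infty)$ and
$$\lim_{n\to\infty}\int_a^\infty u_n\,dt = \int_a^\infty \lim_{n\to\infty}u_n\,dt = \int_a^\infty u\,dt .$$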

Proof That $u\in\mathcal{R}[a,b]$ for all $b>a$ follows from Theorem E.23. Fix $\epsilon>0$ and choose $N_\epsilon$ as in the above statement. For all $y>x>N_\epsilon$
$$\Big|\int_x^y u\,dt\Big| \le \int_x^y |u-u_n|\,dt + \Big|\int_x^y u_n\,dt\Big| \le (y-x)\sup_{t\in[x,y]}|u(t)-u_n(t)| + \epsilon,$$
and as $n\to\infty$ we find $\big|\int_x^y u\,dt\big|\le\epsilon$ for all $y>x>N_\epsilon$, hence $u\in\mathcal{R}[a,\infty)$ by Lemma E.27.

In pretty much the same way as we derived Theorems E.24, E.25 from the basic convergence result E.23 we now get from E.29 the following continuity and differentiability theorems for improper integrals.

E.30 Theorem Let $I\subset\mathbb{R}$ be an open interval and $u:[a,\infty)\times I\to\mathbb{R}$ be continuous such that $u(\bullet,y)\in\mathcal{R}[a,\infty)$ for all $y\in I$ and
$$\lim_{b\to\infty}\int_a^b u(t,y)\,dt \quad\text{exists uniformly for all } y\in[c,d]\subset I.$$
Then $U(y)=\int_a^\infty u(t,y)\,dt$ is continuous for all $y\in(c,d)$.


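Proof (sketch) Fix $y\in(c,d)$ and choose any sequence $(y_n)_{n\in\mathbb{N}}\subset(c,d)$ with limit $y$. By the assumptions $u_n(t)=u(t,y_n)\xrightarrow{\;n\to\infty\;}u(t,y)$ uniformly for all $t\in[a,b]$. Now the basic convergence theorem for improper integrals E.29 applies and shows $U(y_n)\xrightarrow{\;n\to\infty\;}U(y)$.

E.31 Theorem Let $I\subset\mathbb{R}$ be an open interval and $u:[a,\infty)\times I\to\mathbb{R}$ be continuous with continuous partial derivative $\frac{\partial}{\partial y}u(t,y)$. If $u(\bullet,y),\ \frac{\partial}{\partial y}u(\bullet,y)\in\mathcal{R}[a,\infty)$ for all $y\in I$, and if
$$\lim_{b\to\infty}\int_a^b u(t,y)\,dt \quad\text{and}\quad \lim_{b\to\infty}\int_a^b \frac{\partial}{\partial y}u(t,y)\,dt$$
exist uniformly for all $y\in[c,d]\subset I$, then $W(y)=\int_a^\infty u(t,y)\,dt$ exists and is differentiable on $(c,d)$ with derivative
$$W'(y) = \frac{d}{dy}\int_a^\infty u(t,y)\,dt = \int_a^\infty \frac{\partial}{\partial y}u(t,y)\,dt .$$

Proof (sketch) Set $U_x(y)=\int_a^x u(t,y)\,dt$. By Theorem E.25 $\frac{\partial}{\partial y}U_x(y)$ exists and equals $\int_a^x \frac{\partial}{\partial y}u(t,y)\,dt$. By assumption,
$$U_x(y)\xrightarrow{\;x\to\infty\;}\int_a^\infty u(t,y)\,dt \quad\text{pointwise for all } y\in(c,d),$$
$$\frac{\partial}{\partial y}U_x(y)\xrightarrow{\;x\to\infty\;}\int_a^\infty \frac{\partial}{\partial y}u(t,y)\,dt \quad\text{uniformly for all } y\in(c,d).$$
By a standard theorem on uniform convergence and differentiability, cf. Rudin [39, Theorem 7.17], we now conclude
$$\frac{d}{dy}\int_a^\infty u(t,y)\,dt = \int_a^\infty \frac{\partial}{\partial y}u(t,y)\,dt .$$

E.32 Theorem Let $u,w\in\mathcal{R}[a,b]$ for all $b\in(a,\infty)$ and assume that $u,w\ge0$ and that $\lim_{x\to\infty}u(x)/w(x)=A>0$ exists. Then $u\in\mathcal{R}[a,\infty)$ if, and only if, $w\in\mathcal{R}[a,\infty)$.

Proof By assumption we find for every $\epsilon\in(0,A)$ some $N_\epsilon\in\mathbb{N}$ such that
$$0<A-\epsilon\le\frac{u(x)}{w(x)}\le A+\epsilon \qquad \forall\, x\ge N_\epsilon>a.$$
Thus $(A-\epsilon)\,w(x)\le u(x)\le(A+\epsilon)\,w(x)$ for all $x\ge N_\epsilon$. Hence, if $w\in\mathcal{R}[a,\infty)$, we get $(A+\epsilon)\,w\in\mathcal{R}[a,\infty)$ (cf. the remark following Lemma E.27) and, by Corollary E.28, $u\in\mathcal{R}[a,\infty)$. Similarly, if $u\in\mathcal{R}[a,\infty)$, we have $u/(A-\epsilon)\in\mathcal{R}[a,\infty)$ and, again by E.28, $w\in\mathcal{R}[a,\infty)$.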
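To see the comparison test E.32 at work, here is a small worked example. It is a sketch under our own choice of functions ($u(t)=1/(t^2+t)$ and $w(t)=t^{-2}$ on $[1,\infty)$ are not taken from the text); the `aligned` environment assumes amsmath.

```latex
% Illustration only; the functions u and w are hypothetical choices.
\[
\begin{aligned}
u(t) &= \frac{1}{t^2+t}, \qquad w(t) = \frac{1}{t^2}, \qquad t \ge 1,\\
\lim_{x\to\infty}\frac{u(x)}{w(x)} &= \lim_{x\to\infty}\frac{x}{x+1} = 1 =: A > 0,\\
\int_1^b u\,dt &= \int_1^b\Big(\frac1t-\frac1{t+1}\Big)\,dt
  = \log\frac{b}{b+1}+\log 2 \;\longrightarrow\; \log 2 \quad (b\to\infty).
\end{aligned}
\]
```

Since $w=t^{-2}\in\mathcal{R}[1,\infty)$, Theorem E.32 predicts $u\in\mathcal{R}[1,\infty)$ without any computation; the last line confirms this directly and gives the value $\log 2$.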



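We will finally study the interplay of series and improper integrals.

E.33 Theorem Let $a=b_0<b_1<b_2<\cdots$ be a strictly increasing sequence with $b_k\to\infty$.

(i) If $u\in\mathcal{R}[a,\infty)$, then $\sum_{k=1}^\infty\int_{b_{k-1}}^{b_k}u\,dt$ converges.

(ii) If $u\ge0$ and $u\in\mathcal{R}[b_{k-1},b_k]$ for all $k\in\mathbb{N}$, then the convergence of $\sum_{k=1}^\infty\int_{b_{k-1}}^{b_k}u\,dt$ implies $u\in\mathcal{R}[a,\infty)$.

Proof (i): Since $u\in\mathcal{R}[a,\infty)$,
$$\int_a^\infty u\,dt = \lim_{n\to\infty}\int_a^{b_n}u\,dt = \lim_{n\to\infty}\sum_{k=1}^n\int_{b_{k-1}}^{b_k}u\,dt = \sum_{k=1}^\infty\int_{b_{k-1}}^{b_k}u\,dt .$$

(ii): Define $S=\sum_{k=1}^\infty\int_{b_{k-1}}^{b_k}u\,dt$. Since $b_k$ increases to $\infty$, we find for all $b>a$ some $N\in\mathbb{N}$ such that $b_N>b$. Consequently,
$$\int_a^b u\,dt \le \int_a^{b_N}u\,dt = \sum_{k=1}^N\int_{b_{k-1}}^{b_k}u\,dt \le S,$$
which shows that the limit $\lim_{b\to\infty}\int_a^b u\,dt=\sup_{b>a}\int_a^b u\,dt\le S$ exists.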
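The positivity assumption in part (ii) cannot simply be dropped. A minimal counterexample sketch, with our own choice $a=0$, $b_k=k$ and $u(t)=\sin(2\pi t)$ (none of which appear in the text; `aligned` and `\mathbb` assume amsmath and amssymb):

```latex
% Illustration only; u(t) = sin(2 pi t) and b_k = k are hypothetical choices.
\[
\begin{aligned}
\int_{k-1}^{k}\sin(2\pi t)\,dt
  &= \Big[-\frac{\cos(2\pi t)}{2\pi}\Big]_{k-1}^{k} = 0
  \quad\text{for all } k\in\mathbb{N},
  \qquad\text{so}\quad
  \sum_{k=1}^\infty\int_{k-1}^{k}\sin(2\pi t)\,dt = 0,\\
\int_0^b \sin(2\pi t)\,dt &= \frac{1-\cos(2\pi b)}{2\pi}
  \quad\text{has no limit as } b\to\infty .
\end{aligned}
\]
```

The series converges trivially, yet $u\notin\mathcal{R}[0,\infty)$ because the partial integrals oscillate between $0$ and $1/\pi$. In the proof of (ii) it is exactly the monotonicity of $b\mapsto\int_a^b u\,dt$, which comes from $u\ge0$, that fails here.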

E.34 Theorem (Integral test for series) Let $u\in C[0,\infty)$, $u\ge0$, be a decreasing function. Then
$$\int_0^\infty u\,dt \quad\text{and}\quad \sum_{k=0}^\infty u(k)$$
either both converge or both diverge.

Proof Note that by Theorem E.8 $u\in\mathcal{R}[0,b]$ for all $b>0$, so that the improper integral can be defined. Since $u$ is decreasing,
$$u(k+1)\le\int_k^{k+1}u(t)\,dt\le u(k),$$
cf. Theorem E.11, and summing these inequalities over $k=0,1,\ldots,N$ yields
$$\sum_{k=1}^{N+1}u(k) = \sum_{k=0}^{N}u(k+1) \le \int_0^{N+1}u(t)\,dt \le \sum_{k=0}^{N}u(k).$$
Since $u$ is positive and since the series has only positive terms, it is obvious that $\int_0^\infty u\,dt$ converges if, and only if, the series $\sum_{k=0}^\infty u(k)$ is finite.
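The two-sided estimate in the proof can be made concrete. The following sketch uses our own choice $u(t)=(1+t)^{-2}$, which is continuous, positive and decreasing on $[0,\infty)$; the value $\sum_{n\ge1}n^{-2}=\pi^2/6$ is the classical Basel sum and is not derived in the text (`aligned` assumes amsmath).

```latex
% Illustration only; u(t) = (1+t)^{-2} is a hypothetical choice.
\[
\begin{aligned}
\sum_{k=1}^{N+1}\frac{1}{(1+k)^2}
  \;\le\; \int_0^{N+1}\frac{dt}{(1+t)^2} &= 1-\frac{1}{N+2}
  \;\le\; \sum_{k=0}^{N}\frac{1}{(1+k)^2},\\
\text{and, letting } N\to\infty:\qquad
\frac{\pi^2}{6}-1 \;\le\; 1 &\;\le\; \frac{\pi^2}{6}.
\end{aligned}
\]
```

Numerically this reads $0.644\ldots\le1\le1.644\ldots$, so integral and series converge together. For $u(t)=(1+t)^{-1}$ the same inequalities give $\log(N+2)\le\sum_{k=0}^N\frac{1}{1+k}$, which recovers the divergence of the harmonic series.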


B. Improper Riemann integrals with unbounded integrands

E.35 Definition If $u\in\mathcal{R}[a,c]$ [resp. $u\in\mathcal{R}[c,b]$] for all $c\in(a,b)$ and if the limit
$$\lim_{c\uparrow b}\int_a^c u\,dt \qquad \Big[\text{resp. } \lim_{c\downarrow a}\int_c^b u\,dt\Big]$$
exists and is finite, we call $u$ improperly Riemann integrable and write $u\in\mathcal{R}[a,b)$ [resp. $u\in\mathcal{R}(a,b]$]. The value of the limit is called the (improper Riemann) integral and denoted by $\int_a^b u\,dt$.

Notice that the function $u$ in E.35 need not be bounded in $[a,b)$. If it is, the improper integral coincides with the ordinary Riemann integral.

E.36 Lemma If the function $u\in\mathcal{R}[a,b)$ [or $u\in\mathcal{R}(a,b]$] has an extension to $[a,b]$ which is bounded, then the extension is Riemann integrable over $[a,b]$, and proper and improper Riemann integrals coincide.

Proof We consider only $[a,b)$, since the other case is similar. Denote, for notational simplicity, the extension of $u$ again by $u$. Let $M=\sup|u([a,b])|$, fix $\epsilon>0$ and pick $c<b$ with $b-c\le\epsilon/M$. Since $u\in\mathcal{R}[a,c]$, we can find a partition $\pi$ of $[a,c]$ such that $S^{\pi}[u]-S_{\pi}[u]\le\epsilon$. For the partition $\pi'=\pi\cup\{b\}$ of $[a,b]$ we get
$$\big|S^{\pi'}[u]-S^{\pi}[u]\big| = \big|\sup u([c,b])\big|\,(b-c) \le M\,\frac{\epsilon}{M} = \epsilon$$
and
$$\big|S_{\pi'}[u]-S_{\pi}[u]\big| = \big|\inf u([c,b])\big|\,(b-c) \le M\,\frac{\epsilon}{M} = \epsilon,$$
which implies that $S^{\pi'}[u]-S_{\pi'}[u]\le3\epsilon$ and $u\in\mathcal{R}[a,b]$ by Theorem E.5. The claim now follows from Lemma E.14.

Many of the results for improper integrals of the form $\int_a^\infty u\,dt$ resp. $\int_{-\infty}^b u\,dt$ carry over with minor notational changes to the case of half-open bounded intervals. Note, however, that in the convergence theorems some assertions involving uniform convergence are senseless in the presence of unbounded integrands. We leave the details to the reader.

The typical examples of improper integrals of this kind are expressions of the type $\int_0^1 t^\lambda\,dt$ if $\lambda<0$. In fact, if $\lambda\ne-1$,
$$\int_0^1 t^\lambda\,dt = \lim_{\epsilon\to0}\int_\epsilon^1 t^\lambda\,dt = \lim_{\epsilon\to0}\frac{1}{\lambda+1}\big(1-\epsilon^{\lambda+1}\big) = \begin{cases} \dfrac{1}{\lambda+1} & \text{if } \lambda>-1,\\[1ex] \infty & \text{if } \lambda<-1, \end{cases}$$
and a similar calculation confirms that $\int_0^1 t^{-1}\,dt=\infty$. Thus $t^\lambda\in\mathcal{R}(0,1]$ if, and only if, $\lambda>-1$.
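The difference between a genuinely improper integral and one covered by Lemma E.36 can be seen on two examples over $(0,1]$. This is a sketch with our own choice of functions; the Cauchy-type estimate in the first line relies on the half-open analogue of Lemma E.27, which the text leaves to the reader (`aligned` assumes amsmath).

```latex
% Illustration only; sin(1/t) and t^{-1/2} are hypothetical choices.
\[
\begin{aligned}
\Big|\int_{c}^{c'}\sin\frac1t\,dt\Big| &\le c'-c \;\longrightarrow\; 0
  \quad (0<c<c'\to 0),
  &&\text{and } \big|\sin\tfrac1t\big|\le 1 \text{ on } (0,1],\\
\int_0^1 t^{-1/2}\,dt &= \lim_{\epsilon\to 0}\,2\big(1-\epsilon^{1/2}\big) = 2,
  &&\text{but } t^{-1/2}\to\infty \text{ as } t\downarrow 0 .
\end{aligned}
\]
```

The function $\sin(1/t)$ lies in $\mathcal{R}(0,1]$ and any extension to $t=0$, say by the value $0$, is bounded, so by Lemma E.36 it is an ordinary Riemann integrable function on $[0,1]$. The function $t^{-1/2}$ also lies in $\mathcal{R}(0,1]$, but it has no bounded extension to $[0,1]$, so its integral exists only in the improper sense.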

C. Improper Riemann integrals where both limits are critical

Assume now that the integration interval is $(a,b)$ and that both endpoints $a$ and $b$, $-\infty\le a<b\le+\infty$, are critical, i.e. that the integrand is unbounded at one or both endpoints and/or that one or both endpoints are infinite. Let $u\in\mathcal{R}(a,c]\cap\mathcal{R}[c,b)$ for some point $a<c<b$ and suppose that $d$ satisfies $c<d<b$. By the remark following Lemma E.27 and Theorem E.13 we find
$$\begin{aligned}
\int_a^c u\,dt+\int_c^b u\,dt &= \lim_{x\downarrow a}\int_x^c u\,dt+\lim_{y\uparrow b}\int_c^y u\,dt\\
&= \lim_{x\downarrow a}\int_x^c u\,dt+\int_c^d u\,dt+\lim_{y\uparrow b}\int_d^y u\,dt\\
&= \lim_{x\downarrow a}\int_x^d u\,dt+\lim_{y\uparrow b}\int_d^y u\,dt\\
&= \int_a^d u\,dt+\int_d^b u\,dt,
\end{aligned}$$
which shows that $u\in\mathcal{R}(a,d]\cap\mathcal{R}[d,b)$. Therefore, the following definition makes sense.

E.37 Definition Let $-\infty\le a<b\le+\infty$ and let $(a,b)\subset\mathbb{R}$ be a bounded or unbounded open interval. Then $u:(a,b)\to\mathbb{R}$ is said to be improperly integrable if for some (hence, all) $c\in(a,b)$ the function $u$ is improperly integrable both over $(a,c]$ and $[c,b)$, i.e. we define $\mathcal{R}(a,b)=\mathcal{R}(a,c]\cap\mathcal{R}[c,b)$. The (improper Riemann) integral is then given by
$$\int_a^b u\,dt = \int_a^c u\,dt+\int_c^b u\,dt = \lim_{x\downarrow a}\int_x^c u\,dt+\lim_{y\uparrow b}\int_c^y u\,dt .$$

The typical example of an improper integral of this kind is Euler's Gamma function
$$\Gamma(x)=\int_0^\infty t^{x-1}e^{-t}\,dt, \qquad x>0,$$
which is treated in Example 10.14 in the framework of Lebesgue theory, but the arguments are essentially similar. The Gamma function is a two-sided improper integral only for $0<x<1$, since for $x\ge1$ it can be interpreted as a one-sided improper integral over $[0,\infty)$, cf. Lemma E.36.
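How Definition E.37 applies to the Gamma function can be sketched as follows. The splitting point $c=1$, the constant $C_x$ and the comparison functions are our own choices for illustration; the bound near $0$ uses the half-open analogue of Corollary E.28, which the text leaves to the reader (`aligned` assumes amsmath).

```latex
% Illustration only; the split at c = 1 and the bounds are hypothetical choices.
\[
\begin{aligned}
0 \le t^{x-1}e^{-t} &\le t^{x-1} \quad (0<t\le 1),
  &\int_0^1 t^{x-1}\,dt &= \frac1x < \infty \quad\text{since } x-1>-1,\\
0 \le t^{x-1}e^{-t} &\le C_x\,e^{-t/2} \quad (t\ge 1),
  &\int_1^\infty e^{-t/2}\,dt &= 2e^{-1/2} < \infty,
\end{aligned}
\qquad C_x := \sup_{t\ge 1} t^{x-1}e^{-t/2} < \infty .
\]
```

The first line handles the critical endpoint $0$ and the second the critical endpoint $\infty$, so $t^{x-1}e^{-t}\in\mathcal{R}(0,1]\cap\mathcal{R}[1,\infty)$ for every $x>0$. For $x\ge1$ the integrand is bounded near $0$, which is why only the upper endpoint is critical in that case.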

Further reading

Measure theory is used in many mathematical disciplines. We have touched on a few of them in this book, and the purpose of this section is to point towards literature which treats these subjects in depth. The choice of books and topics is certainly not comprehensive. On the contrary, it is very personal, limited by my knowledge of the literature and, of course, my own mathematical taste. I decided to include only books which are in English and which I thought would be accessible to readers of the present text.

Real analysis (in particular measure and integration theory for analysts)

Bass, R. F., Probabilistic Techniques in Analysis, New York: Springer 1995.
Dudley, R. M., Real Analysis and Probability (2nd edn), Cambridge: Cambridge University Press, Studies in Adv. Math. vol. 74, 2002.
Hewitt, E. and K. R. Stromberg, Real and Abstract Analysis, New York: Springer, Grad. Texts in Math. vol. 25, 1975.
Kolmogorov, A. N. and S. V. Fomin, Introductory Real Analysis, Mineola (NY): Dover, 1975.
Lieb, E. H. and M. Loss, Analysis (2nd edn), Am. Mathematical Society, Grad. Studies in Math. vol. 14, Providence (RI) 2001.
Rudin, W., Real and Complex Analysis (3rd edn), McGraw-Hill, New York 1987.
Saks, S., Theory of the Integral (2nd revised edn), Hafner, Monografie Matematyczne Tom VII, New York 1937. [Reprinted by Dover, 1964. Free online edition in the Wirtualna Biblioteka Nauki: http://matwbn.icm.edu.pl/kstresc.php?tom=7&wyd=10]
Stroock, D., A Concise Introduction to the Theory of Integration (3rd edn), Birkhäuser, Boston 1999.
Sz.-Nagy, B., Introduction to Real Functions and Orthogonal Expansions, Oxford University Press, Univ. Texts in the Math. Sci., New York 1965.
Wheeden, R. L. and A. Zygmund, Measure and Integral. An Introduction to Real Analysis, Marcel Dekker, Pure Appl. Math. vol. 43, New York 1977.


Functional analysis

Bollobás, B., Linear Analysis. An Introductory Course (2nd edn), Cambridge University Press, Cambridge 1999.
Hirsch, F. and G. Lacombe, Elements of Functional Analysis, Springer, Grad. Texts in Math. vol. 192, New York 1999.
Kolmogorov, A. N. and S. V. Fomin, Introductory Real Analysis, Mineola (NY): Dover, 1975.
Yosida, K., Functional Analysis (6th edn), Springer, Grundlehren math. Wiss. Bd. 123, Berlin 1980.
Zaanen, A. C., Integration (completely revised edn. of An Introduction to the Theory of Integration), North-Holland, Amsterdam 1967.

Fourier series, harmonic analysis, orthonormal systems, wavelets

Alexits, G., Convergence Problems of Orthogonal Series, Pergamon, Int. Ser. Monogr. Pure Appl. Math. vol. 20, Oxford 1961.
Andrews, G. E., Askey, R. and R. Roy, Special Functions, Cambridge University Press, Encycl. Math. Appl. vol. 71, Cambridge 1999.
Garsia, A. M., Topics in Almost Everywhere Convergence, Markham, Chicago 1970.
Helson, H., Harmonic Analysis, Addison-Wesley, London, 1983.
Kahane, J.-P., Some Random Series of Functions (2nd edn), Cambridge University Press, Stud. Adv. Math. vol. 5, Cambridge 1985.
Krantz, S. G., A Panorama of Harmonic Analysis, Mathematical Association of America, Carus Math. Monogr. vol. 27, Washington 1999.
Pinsky, M. A., Introduction to Fourier Analysis and Wavelets, Brooks/Cole, Ser. Adv. Math., Pacific Grove (CA) 2002.
Schipp, F., Wade, W. R. and P. Simon, Walsh Series. An Introduction to Dyadic Harmonic Analysis, Adam Hilger, Bristol 1990.
Stein, E. M., Singular Integrals and Differentiability Properties of Functions, Princeton University Press, Math. Ser. vol. 30, Princeton (NJ) 1970.
Stein, E. M. and R. Shakarchi, Fourier Analysis: An Introduction, Princeton University Press, Princeton (NJ) 2003.
Sz.-Nagy, B., Introduction to Real Functions and Orthogonal Expansions, Oxford University Press, Univ. Texts in the Math. Sci., New York 1965.
Wojtaszczyk, P., A Mathematical Introduction to Wavelets, Cambridge University Press, London Math. Society Student Texts vol. 37, Cambridge 1997.
Zygmund, A., Trigonometric Series (2nd edn), Cambridge University Press, Cambridge 1959. [Almost unaltered softcover editions: Cambridge: Cambridge University Press, 1969, 1988 and 2003.]


Geometric measure theory, Hausdorff measure, fine properties of functions

Evans, L. C. and R. F. Gariepy, Measure Theory and Fine Properties of Functions, CRC Press, Boca Raton (FL) 1992.
Mattila, P., Geometry of Sets and Measures in Euclidean Spaces, Cambridge University Press, Studies in Adv. Math. vol. 44, Cambridge 1995.
Morgan, F., Geometric Measure Theory: A Beginner’s Guide (3rd edn), Academic Press, San Diego, 2000.
Rogers, C. A., Hausdorff Measures, Cambridge University Press, Cambridge Math. Library, Cambridge 1970.
Ziemer, W. P., Weakly Differentiable Functions, Springer, Grad. Texts in Math. vol. 120, New York 1989.

Topological measure theory, functional analytic aspects of integration and measure

Bauer, H., Measure and Integration Theory, de Gruyter, Studies in Math. vol. 26, Berlin 2001.
Choquet, G., Lectures on Analysis. vol. 1: Integration and Topological Vector Spaces, W. A. Benjamin, New York 1969.
Dieudonné, J., Treatise on Analysis, vol. II, Academic Press, Pure Appl. Math. vol. 10-II, New York 1969.
Hewitt, E. and K. A. Ross, Abstract Harmonic Analysis, vol. 1, Springer, Grundlehren math. Wiss. Bd. 115, Berlin 1963.
Malliavin, P., Integration and Probability, Springer, Grad. Texts in Math. 157, New York 1995.
Oxtoby, J. C., Measure and Category (2nd edn), Springer, Grad. Texts Math. vol. 2, New York 1980.
Weir, A. J., General Integration and Measure, Cambridge University Press, Cambridge 1974.

Borel and analytic sets

Rogers, C. A. et al., Analytic Sets, Academic Press, London 1980.
Srivastava, S. M., A Course on Borel Sets, Springer, Grad. Texts Math. vol. 180, New York 1998.

Probability theory (in particular probabilistic measure theory)

Ash, R. B. and C. A. Doléans-Dade, Probability and Measure Theory (2nd edn), Academic Press, San Diego (CA) 2000.
Billingsley, P., Probability and Measure (3rd edn), Wiley, Ser. Probab. Math. Stat., New York 1995.
Chow, Y. S. and H. Teicher, Probability Theory. Independence, Interchangeability, Martingales (3rd edn), Springer, Texts in Stat., New York 1997.
Durrett, R., Probability: Theory and Examples (3rd edn), Thomson Brooks/Cole, Duxbury Adv. Studies, Belmont (CA) 2004.
Kallenberg, O., Foundations of Modern Probability, Springer, New York 2001.
Malliavin, P., Integration and Probability, Springer, Grad. Texts in Math. 157, New York 1995.
Neveu, J., Mathematical Foundations of the Calculus of Probability, Holden Day, San Francisco (CA) 1965.
Stromberg, K., Probability for Analysts, Chapman and Hall, Probab. Ser., New York 1994.

Martingales and their applications

Ash, R. B. and C. A. Doléans-Dade, Probability and Measure Theory (2nd edn), Academic Press, San Diego (CA) 2000.
Chow, Y. S. and H. Teicher, Probability Theory. Independence, Interchangeability, Martingales (3rd edn), Springer, Texts in Stat., New York 1997.
Dellacherie, C. and P. A. Meyer, Probabilities and Potential Pt. B: Theory of Martingales, North Holland, Math. Studies, Amsterdam 1982. [Note that Probabilities and Potential Pt. A, Amsterdam 1979, by the same authors is a prerequisite for this text.]
Garsia, A. M., Topics in Almost Everywhere Convergence, Markham, Chicago 1970.
Meyer, P. A., Probabilities and Potentials, Blaisdell, London 1966.
Neveu, J., Discrete-parameter Martingales, North Holland, Math. Libr. vol. 10, Amsterdam 1975.
Rogers, L. C. G. and D. Williams, Diffusions, Markov Processes and Martingales (2 vols., 2nd edn), Cambridge Math. Library, Cambridge 2000.

References

[1] Alexits, G., Convergence Problems of Orthogonal Series, Oxford: Pergamon, Int. Ser. Monogr. Pure Appl. Math. vol. 20, 1961.
[2] Andrews, G. E., Askey, R. and R. Roy, Special Functions, Cambridge: Cambridge University Press, Encycl. Math. Appl. vol. 71, 1999.
[3] Bass, R. F., Probabilistic Techniques in Analysis, New York: Springer, 1995.
[4] Bauer, H., Approximation and abstract boundaries, Am. Math. Monthly 85 (1978), 632–647. Also in: H. Bauer, Selecta, Berlin: de Gruyter, 2003, 436–451.
[5] Bauer, H., Probability Theory, Berlin: de Gruyter, Studies in Math. vol. 23, 1996.
[6] Bauer, H., Measure and Integration Theory, Berlin: de Gruyter, Studies in Math. vol. 26, 2001.
[7] Benyamini, Y. and J. Lindenstrauss, Geometric Nonlinear Functional Analysis, vol. 1, Providence (RI): Am. Math. Soc., Coll. Publ. vol. 48, 2000.
[8] Boas, R. P., A Primer of Real Functions, Math. Association of America, Carus Math. Monogr. vol. 13, 1960.
[9] Carathéodory, C., Über das lineare Maß von Punktmengen – eine Verallgemeinerung des Längenbegriffs, Nachr. Kgl. Ges. Wiss. Göttingen Math.-Phys. Kl. (1914), 404–426. Also in: C. Carathéodory, Gesammelte mathematische Schriften (5 Bde.), München: C.H. Beck, 1954–57, Bd. 4, 249–275.
[10] Ciesielski, Z., Hölder condition for realizations of Gaussian processes, Trans. Am. Math. Soc. 99 (1961), 403–413.
[11] Diestel, J. and J. J. Uhl Jr., Vector Measures, Providence (RI): American Mathematical Society, Math. Surveys no. 15, 1977.
[12] Dieudonné, J., Sur un théorème de Jessen, Fundam. Math. 37 (1950), 242–248. Also in: J. Dieudonné, Choix d’œuvres mathématiques (2 tomes), Paris: Hermann, 1981, t. 1, 369–275.
[13] Doob, J. L., Stochastic Processes, New York: Wiley, Ser. Probab. Math. Stat., 1953.
[14] Dudley, R. M., Real Analysis and Probability, Pacific Grove (CA): Wadsworth & Brooks/Cole, Math. Ser., 1989.
[15] Dunford, N. and J. T. Schwartz, Linear Operators I, New York: Pure Appl. Math. vol. 7, Interscience, 1957.
[16] Garsia, A. M., Topics in Almost Everywhere Convergence, Chicago: Markham, 1970.
[17] Gradshteyn, I. and I. Ryzhik, Tables of Integrals, Series, and Products (4th corrected and enlarged edn), San Diego (CA): Academic Press, 1992.


[18] Gundy, R. F., Martingale theory and pointwise convergence of certain orthogonal series, Trans. Am. Math. Soc. 124 (1966), 228–248.
[19] Hausdorff, F., Grundzüge der Mengenlehre, Leipzig: Veit & Comp., 1914 (1st edn). Reprint of the original edn, New York: Chelsea, 1949.
[20] Hewitt, E. and K. R. Stromberg, Real and Abstract Analysis, New York: Springer, Grad. Texts Math. vol. 25, 1975.
[21] Hunt, G. A., Martingales et processus de Markov, Paris: Dunod, Monogr. Soc. Math. France t. 1, 1966.
[22] Kaczmarz, S. and H. Steinhaus, Theorie der Orthogonalreihen (2nd corr. reprint), New York: Chelsea, 1951. First edition appeared under the same title with PWN, Warsaw: Monogr. Mat. Warszawa vol. VI, 1935.
[23] Kahane, J.-P., Some Random Series of Functions (2nd edn), Cambridge: Cambridge University Press, Stud. Adv. Math. vol. 5, 1985.
[24] Korovkin, P. P., Linear Operators and Approximation Theory, Delhi: Hindustan Publ. Corp., 1960.
[25] Krantz, S. G., A Panorama of Harmonic Analysis, Washington: Mathematical Association of America, Carus Math. Monogr. vol. 27, 1999.
[26] Lévy, P., Processus stochastiques et mouvement Brownien, Paris: Gauthier-Villars, Monographies des Probabilités Fasc. VI, 1948.
[27] Lindenstrauss, J. and L. Tzafriri, Classical Banach Spaces I, II, Berlin: Springer, Ergeb. Math. Grenzgeb. 2. Ser. Bde. 92, 97, 1977–79.
[28] Marcinkiewicz, J. and A. Zygmund, Sur les fonctions indépendantes, Fundam. Math. 29 (1937), 309–335. Also in: J. Marcinkiewicz, Collected Papers, Warsaw: PWN, 1964, 233–259.
[29] Métivier, M., Semimartingales. A Course on Stochastic Processes, Berlin: de Gruyter, Stud. Math. vol. 2, 1982.
[30] Sz.-Nagy, B., Introduction to Real Functions and Orthogonal Expansions, New York: Oxford University Press, Univ. Texts in the Math. Sci., 1965.
[31] Neveu, J., Discrete-parameter Martingales, Amsterdam: North Holland, Math. Libr. vol. 10, 1975. Slightly updated version of the French original: Martingales à temps discret, Paris: Masson, 1972.
[32] Olevskiĭ, A. M., Fourier Series with Respect to General Orthogonal Systems, Berlin: Springer, Ergeb. Math. Grenzgeb. Bd. 2. Ser. 86, 1975.
[33] Oxtoby, J. C., Measure and Category (2nd edn), New York: Springer, Grad. Texts Math. vol. 2, 1980.
[34] Paley, R. E. A. C. and N. Wiener, Fourier Transforms in the Complex Domain, Providence (RI): American Mathematical Society, Coll. Publ. vol. 19, 1934.
[35] Pinsky, M. A., Introduction to Fourier Analysis and Wavelets, Pacific Grove (CA): Brooks/Cole, Ser. Adv. Math., 2002.
[36] Pratt, J. W., On interchanging limits and integrals, Ann. Math. Stat. 31 (1960), 74–77. [Acknowledgement of Priority, Ann. Math. Stat. 37 (1966), 1407.]
[37] Riemann, B., Über die Darstellbarkeit einer Function durch eine trigonometrische Reihe, Nachr. Kgl. Ges. Wiss. Göttingen 13 (1867), 227–271. Also in: Bernhard Riemann, Collected Papers, Berlin: Springer, 1990, 259–303.
[38] Rogers, L. C. G. and D. Williams, Diffusions, Markov Processes and Martingales (2 vols., 2nd edn), Cambridge: Cambridge Mathematical Library, 2000.
[39] Rudin, W., Principles of Mathematical Analysis (3rd edn), New York: McGraw-Hill, 1976.
[40] Rudin, W., Real and Complex Analysis (3rd edn), New York: McGraw-Hill, 1987.


[41] Schipp, F., Wade, W. R. and P. Simon, Walsh Series. An Introduction to Dyadic Harmonic Analysis, Bristol: Adam Hilger, 1990.
[42] Souslin, M. Y., Sur une définition des ensembles mesurables B sans nombres transfinis, C. R. Acad. Sci. Paris 164 (1917), 88–91.
[43] Srivastava, S. M., A Course on Borel Sets, New York: Springer, Grad. Texts Math. vol. 180, 1998.
[44] Solovay, R. M., A model of set theory in which every set of reals is Lebesgue measurable, Ann. Math. 92 (1970), 1–56.
[45] Steele, J. M., Stochastic Calculus and Financial Applications, New York: Springer, Appl. Math. vol. 45, 2000.
[46] Steen, L. A. and J. A. Seebach, Counterexamples in Topology, New York: Dover, 1995.
[47] Stein, E. M., Singular Integrals and Differentiability Properties of Functions, Princeton (NJ): Princeton University Press, Math. Ser. vol. 30, 1970.
[48] Strichartz, R. S., The Way of Analysis (rev. edn), Sudbury (MA): Jones and Bartlett, 2000.
[49] Stromberg, K., The Banach–Tarski paradox, Am. Math. Monthly 86 (1979), 151–161.
[50] Stroock, D. W., A Concise Introduction to the Theory of Integration (3rd edn), Boston: Birkhäuser, 1999.
[51] Szegö, G., Orthogonal Polynomials, Providence (RI): Am. Math. Soc., Coll. Publ. vol. 23, 1939.
[52] Wagon, S., The Banach–Tarski Paradox, Cambridge: Cambridge University Press, Encycl. Math. Appl. vol. 24, 1985.
[53] Wheeden, R. L. and A. Zygmund, Measure and Integral. An Introduction to Real Analysis, New York: Marcel Dekker, Pure Appl. Math. vol. 43, 1977.
[54] Willard, S., General Topology, Reading (MA): Addison-Wesley, 1970.
[55] Yosida, K., Functional Analysis (6th edn), Berlin: Springer, Grundlehren Math. Wiss. Bd. 123, 1980.
[56] Young, W. H., On semi-integrals and oscillating successions of functions, Proc. London Math. Soc. (2) 9 (1910/11), 286–324.

Notation index

This is intended to aid cross-referencing, so notation that is specific to a single section is generally not listed. Some symbols are used locally, without ambiguity, in senses other than those given below. Numbers following entries are page numbers with the occasional (Pr m.n) referring to Problem m.n on the respective page. Unless otherwise stated, binary operations between functions such as f ± g, f · g, f ∧ g, f ∨ g, comparisons f ≤ g, f < g or limiting relations f_j → f (as j → ∞), lim_j f_j, lim inf_j f_j, lim sup_j f_j, sup_i f_i or inf_i f_i are always understood pointwise. Alternatives are indicated by square brackets, i.e., ‘if A [B] … then P [Q]’ should be read as ‘if A … then P’ and ‘if B … then Q’.

Abbreviations and shorthand notation

a.a.      almost all, 80
a.e.      almost every(where), 80
ONB       orthonormal basis, 239
ONS       orthonormal system, 239
UI        uniformly integrable, 163, 194
w.r.t.    with respect to
negative  always in the sense ≤ 0
positive  always in the sense ≥ 0

∪-stable ∩-stable []

stable under finite unions stable under finite intersections, 32 end of proof, x indicates that a small intermediate step is required, x (in the margin) caution, x

Special labels, defining properties 1 2 3 M1 M2

Dynkin system, 31 measure, 22

S1 S2 S3 1 2 3

semi-ring, 37 -algebra, 15

Mathematical symbols Sub- and superscripts +

positive part, positive elements

⊥ b c


orthogonal complement, 235 bounded compact support


Symbols, binary operations

∀            for all, for every
∃            there exists, there is
#            cardinality, 7
→            converges to
→ (with μ)   convergence in measure, 163
↑            increases to
↓            decreases to
:=           defining equality
= (def)      equal by definition
≡            identically equal
∨, [f ∨ g]   maximum [of f and g], 64
∧, [f ∧ g]   minimum [of f and g], 64
≪            absolutely continuous, 202
⊥            - measures: singular, 209
             - Hilbert space: orthogonal, 231, 235
⋆            convolution, 137
⊕            direct sum, 236
×            - Cartesian product of sets;
             - Cartesian product of σ-algebras, 121;
             - product of measures, 125
⊗            product of σ-algebras, 121

A⊂B AB A¯ A Aj ↑ A Aj ↓ A A×B An A #A

a b a b a b a b open, closed, half-open intervals a b, ((a,b)) rectangles in n , 18

u v u = v u ∈ B etc. 57 Functions, norms, measures & integrals fA f −1 f g f+ f− 1A

sgn

•

Set operations ∅ A∪B ·B A∪ A∩B A\B Ac AB

#A #B, #A < #B 7 t·A = ta a ∈ A, 36 (Pr 5.8) x+A = x + a a ∈ A, 28 E∩ = E ∩ A A ∈ , 16

empty set union 5 union of disjoint sets, 3, 5 intersection, 5 set-theoretic difference, 5 complement of A, 5 symmetric difference, 13 (Pr 2.2) subset, 5 proper subset, 5 closure of A, 320 open interior of A, 320 24 24 Cartesian product n-fold Cartesian product infinite sequences with values in A cardinality of A, 7

•p • • • ¯

x ∈ A, 6 = fx = f −1 B B ∈ , 16 composition: f gx = fgx = f ∨ 0 positive part, 61 = −f ∧ 0 negative part, 61 indicatorfunction of A 1 x ∈ A 1A x = 0 x ∈ A sign function ⎧ x>0 ⎨1 sgnx = 0 x=0 ⎩ −1 x < 0 maximum-norm in n and n×n , 142 Lp -norm, 105, 108 L -norm, 116 scalar product, 228

completion of the measure , 29 (Pr 4.13) restriction of the measure to the family of sets restriction of the measure X to the canonical -algebra on X T −1 image measure, 51 - limj→ convergence in measure, 163 T image measure, 51 u· measure with density, 79–80 ux d x 69, 76 u d , ux dx, u d = 1 u d , 79 A A

, X,

Notation index 77 u dx, ux dx u dT = u T d , 134 b b ux dx, R a ux dx Riemann a integral, 93, 339 Other notation in alphabetical order ℵ0 ∗ j j , − ,

cardinality of , 7 completion, 29 (Pr 4.13) filtration, 176 =

i i ∈ I, 177, 203 = ∈− , 193 185

Br x

open ball with radius r and centre x, 17, 323 Borel sets in A, 20 (Pr 3.10) Borel sets in , 226 Borel sets in n , 17 completion of the Borel sets, 132 (Pr 13.11), 144, 330 ¯ 58 Borel sets in ,

A n n ∗ n ¯ ¯ CU Cc U C U

x det Dx d d

E , E• E

cardinality of 0 1, 11 complex numbers continuous functions f U → continuous functions f U → with compact support functions f U → differentiable arbitrarily often Dynkin system generated by , 31 unit mass at x, Dirac measure at x, 26 determinant (of a matrix) Jacobian, 147 Radon-Nikodým derivative, 203 conditional expectation, 250, 263 = E conditional expectation, 260, 263 simple functions, 60


GLn

invertible n×n -matrices

id

identity map or matrix

n n half-open rectangles in n , 18 n …with rational endpoints, rat rat 18 o on o n open rectangles in n , 18 on o rat rat …with rational endpoints, 18

or

n 1 p

1 , 1¯

1

1 -lim∈I Lp

p , Lp

p

p -limj→

, L L lim inf j aj lim supj aj lim inf j Aj lim supj Aj

Lebesgue measure in n , 27 79 113 76 227 203 108 228 105 109 116 260 = supk inf jk aj , 313 = inf sup a , 313 k jk j = A , 316 k∈ jk j = k∈ jk Aj , 316

, ¯ M

59 258

0

natural numbers: 1 2 3 positive integers: 0 1 2 -null sets, 29 (Pr 4.10), 80

n X n , n

volume of the unit ball in n , 156 topology, open sets, 17 topology, open sets in n , 17

PC , PF X

(orthogonal) projection, 235 all subsets of X, 12

rational numbers


¯

real numbers extended real line − +, 58 n Euclidean n-space nx , ny 147 n×n real n × n-matrices a b 339 a , − b 354 a b, a b 358 a b, − 359 ,

stopping times, 185 -algebra generated by , 16

T Ti i ∈ I -algebra generated by the map(s) T , resp., Ti , 51 span all finite linear combinations of the elements in , 239 supp f = f = 0 support of f , x

stopping times, 185 shift x y = y − x, 49

X measurable space, 22 X measure space, 22 X j , X filtered measure space, 176, 203

integers: 0 ±1 ±2

Name and subject index

This should be used in conjunction with the Bibliography and the Index of Notation. Numbers following entries are page numbers which, if accompanied by (Pr n.m), refer to Problem n.m on that page; a number with a trailing ‘n’ indicates that a footnote is being referenced. Unless otherwise stated, ‘integral’, ‘integrability’ etc. always refer to the (abstract) Lebesgue integral. Within the index we use ‘L-…’ and ‘R-…’ as a shorthand for ‘(abstract) Lebesgue-…’ and ‘Riemann-…’.

ℵ0 aleph null, 7
absolutely continuous, 202
  uniformly absolutely continuous, 169
Alexits, György, 277n, 302
almost all (a.a.), 80
almost everywhere (a.e.), 80
analytic set, 333
Andrews, George, 277n
arc-length, 160–161 (Pr 15.6)
Askey, Richard, 277n
atom, 20 (Pr 3.5), 46 (Pr 6.5)
axiom of choice, 331
Banach, Stephan, 43
Banach space, 326
Banach–Tarski paradox, 43
basic convergence result
  for improper R-integrals, 355
  for R-integrals, 351
basis, 242
  unconditional basis, 293–295
Bass, Richard, 311
Bauer, Heinz, 159, 281, 310
Benyamini, Yoav, 210
Bernoulli distribution, 183
Bernstein, Sergeĭ N., 279

Bernstein polynomials, 280
bijective map, 6
Boas, Ralph, 114
Borel, Emile
  Borel measurable, 17
  Borel set, 17
  Borel σ-algebra, 17
    alternative definition, 21 (Pr 3.12)
    cardinality, 332
    completion, 330
    generator of, 18, 19
    in a subset, 20 (Pr 3.10)
    in ℝ̄, 58
Brownian motion, 309–311
continuum, 11
Calderón, Alberto
  Calderón–Zygmund decomposition, 221
Cantor, Georg, 11
  Cantor’s diagonal method, 11
  Cantor discontinuum, 55 (Pr 7.10), 223–224 (Pr 19.10)
  Cantor function, 224 (Pr 19.10)
  Cantor (ternary) set, 55 (Pr 7.10), 223–224 (Pr 19.10)
Carathéodory, Constantin, 37


cardinality, 7
  of the Borel σ-algebra, 332
  of the Lebesgue σ-algebra, 330
Carleson, Lennart, 289
Cartesian product
  rules for Cartesian products, 121
Cauchy sequence
  in ℒ^p, 109
  in metric spaces, 325
  in normed spaces, 234
Cavalieri’s principle, 120
Cesàro mean, 286
change of variable formula
  for Lebesgue integrals, 151
  for Riemann integrals, 350
  for Stieltjes integrals, 133 (Pr 13.13)
Chebyshev, Pafnuti L., 85 (Pr 10.5)
  Chebyshev polynomials (first kind), 277
Ciesielski, Z., 311
closed ball, 323
compactness (weak sequential), 169
  in ℒ^1, 168
  in ℒ^p, 168, 274 (Pr 23.8)
  and uniform integrability, 169
completeness
  of ℒ^p, 1 ≤ p < ∞, 110
  of ℒ^∞, 116
  in normed spaces, 234
completion, 29 (Pr 4.13)
  and Hölder maps, 146
  and inner measure, 86 (Pr 10.12)
  and inner/outer regularity, 160 (Pr 15.6)
  integration w.r.t. complete measures, 86 (Pr 10.11)
  of metric spaces, 325
  and outer measure, 46 (Pr 6.2), 86 (Pr 10.12)
  and product measures, 132 (Pr 13.11)
  and submartingales, 187 (Pr 16.3)
complexification, 232
conditional
  conditional Beppo Levi Theorem, 264
  conditional dominated convergence Theorem, 266
  conditional Fatou’s Lemma, 265
  conditional Jensen inequality, 266
  conditional monotone convergence property, 259
  conditional probability, 257 (Pr 22.3)

conditional expectation
  in L^p and L^∞, 260
  in L^1, 263–264
  in L^2, 250
  properties (in L^2), 251
  properties (in L^p), 261–262
  via Radon–Nikodým Theorem, 223 (Pr 19.3)
conjugate numbers (also conj. indices), 105
conjugate Young functions, 117 (Pr 12.5)
continuity
  implies measurability, 50
  of measures at ∅, 24
  of measures from above, 24
  of measures from below, 24
  in metric spaces, 324
  in topological spaces, 321
continuous function
  is measurable, 50
  is Riemann integrable, 342
continuous linear functional
  in Hilbert space, 238
  representation of continuous linear functionals, 239
convergence
  along an upwards filtering set, 203
  criteria for a.e. convergence, 173 (Pr 16.1, 16.2)
  in ℒ^p, 109
  in ℒ^p implies in measure, 164
  in measure, 163
    criterion, 174 (Pr 16.10)
    is metrizable, 174 (Pr 16.8)
    no unique limit, 173 (Pr 16.6)
    weaker than pointwise, 164
  in metric spaces, 323–324
  in normed spaces, 234
  pointwise implies in measure, 164
  pointwise vs. ℒ^p, 109
  in probability, 163n
  of series of random variables, 201 (Pr 16.9)
  in topological spaces, 322
  uniform convergence, 351
convex function, 114–115, 172n
convex set, 235
convolution
  formula for integrals, 138
  of a function and a measure, 137
  of functions, 137

as image measure, 137 of measures, 137 cosine law, 231 countable set, 7 counting measure, 27 Darboux, Gaston, 337 Darboux sum, 93, 338 de la Vallée Poussin, Charles de la Vallée Poussin’s condition, 169 de Morgan’s identities, 5, 6 dense subset, 320 of C, 279, 287 in Hilbert space, 238 of p , 157, 159, 139 of L2 , 282 density (function), 80, 202 derivative of a measure, 152, 219 of a measure singular to n , 220 of a monotone function, 225 (Pr 19.17) Radon-Nikodým derivative, 203 of a series of monotone functions, 225 (Pr 19.18) diagonal method, 11 Diestel, Joseph, 210 Dieudonné, Jean, 205 Dieudonné’s condition, 169 diffeomorphism, 147 diffuse measure, 46 (Pr 6.5) Dirac measure, 26 direct sum, 236 Dirichlet (Lejeune-D.), Gustav Dirichlet’s jump function, 88 not Riemann integrable, 342 Dirichlet kernel, 286 disjoint union, 3, 5 distribution distribution function, 128 of a random variable, 52 Doob, Joseph, 176, 213 Doob decomposition, 275 (Pr 23.11) Doob’s upcrossing estimate, 191 Dudley, Richard, 336 Dunford, Nelson, 168 Dunford–Pettis condition, 169 dyadic interval, 179 dyadic square, 179 Dynkin system, 31 conditions to be σ-algebra, 32


generated by a family, 31 minimal Dynkin system, 31 not σ-algebra, 36 (Pr 5.2) enumeration, 7 equi-integrable, see uniformly integrable exhausting sequence, 22 factorization lemma, 56 (Pr 7.11), 64 Faltung, see convolution Fatou, Pierre, 73 Fejér, Lipót, 285, 286 Fejér kernel, 286 filtration, 176, 203 dyadic filtration, 213, 221, 268, 295, 302 finite additivity, 23 Fischer, Ernst, 110 Fourier, Jean Baptiste Fourier coefficients, 285 Fourier series, a.e.-convergence, 288 Fourier series, Kolmogorov’s example of a nowhere convergent Fourier series, 288 Fourier series, Lp-convergence, 288 Fréchet, Maurice, 232 (Pr 20.2) Fresnel integral, 103 (Pr 11.19) Friedrichs mollifier, 141 (Pr 14.10) Frullani integral, 103 (Pr 11.20) Fσ set, 142, 159 (Pr 15.1) Fubini, Guido, 125 function absolutely continuous function, 223 (Pr 19.9) concave function, 114–115 convex function, 114–115, 172n distribution function, see distribution function independent function, see independent functions indicator function, see indicator function integrable function, 76 measurable function, see measurable function(s) moment generating function, 102 (Pr 11.15) monotone function, see monotone function negative part of, 61 numerical function, 58



function (cont.) positive part of, 61 Riemann integrable function, 339 simple function, see simple functions Γ (Gamma) function, 99, 161 (Pr 15.8) Garsia, Adriano, 218, 309 Gaussian distribution, 152, 310 Gδ set, 142 generator of the Borel σ-algebra, 18, 19 of a Dynkin system, 31 of a σ-algebra, 16 Gradshteyn, Izrail S., 277n, 284 Gram-Schmidt orthonormalization, 243–244 Gundy, Richard, 309 Haar, Alfréd Haar–Fourier series, 290 a.e.-convergence, 291 Lp-convergence, 291 Haar functions, 289, 310 Haar system, 289, 310 complete ONS, 290 Haar wavelet, 295 a.e.-convergence, 296 complete ONS, 296 Lp-convergence, 296 Hausdorff, Felix, 43 Hausdorff space, 320 Hermite polynomials, 278 Hewitt, Edwin, 11, 19n, 43 Hilbert cube, 246 (Pr 21.7) Hilbert space, 234 isomorphic to ℓ2 , 244–245 separable Hilbert space, 230, 244–245, 245 (Pr 21.5) Hölder continuity, 143 Hunt, G., 163 Hunt, R., 289 image measure, 51, 134 integral w.r.t. image measure, 134 of measure with density, 140 (Pr 14.1) independence and integrability, 85 (Pr 10.10) of σ-algebras, 36 (Pr 5.10) independent functions, 180–184, 279, 299, 302

existence of independent functions, 183 independent random variables, 188 (Pr 16.8, 16.9), 196, 201 (Pr 18.9), 224 (Pr 19.11), 275 (Pr 23.12), 306 convergence of independent random variables, 309 indicator function, 13 (Pr 2.5), 59 measurability, 59 rules for indicator functions, 74 (Pr 9.9), 316 inequality Bessel inequality, 240 Burkholder–Davis–Gundy inequality, 294 Cauchy–Schwarz inequality, 107, 229 Chebyshev inequality, 85 (Pr 10.5) conditional Jensen inequality, 266 Doob’s maximal Lp inequality, 211, 224 (Pr 19.13) generalized Hölder inequality, 117 (Pr 12.4) Hardy–Littlewood maximal inequality, 215 Hölder inequality, 106 for series, 113 for 0 < p < 1, 118 (Pr 12.18) Jensen inequality, 115 Kolmogorov’s inequality, 224 (Pr 19.11) Markov inequality, 82 variants thereof, 84–85 (Pr 10.5) Minkowski’s inequality, 107 for integrals, 130 for series, 113 for 0 < p < 1, 118 (Pr 12.18) moment inequality, 118 (Pr 12.19) strong-type inequality, 212 weak-type maximal inequality, 212 Young inequality, 105, 117 (Pr 12.5), 138, 141 (Pr 14.9) injective map, 6 inner product, 228 inner product space, 228 integrability comparison test, 85 (Pr 10.9) of complex functions, 227 integrability criterion

for improper R-integrals, 354, 356, 357 for L-integrals, 77 for L-integrals of image measures, 134, 135 for R-integrals, 94, 339, 344 of exponentials, 98, 102 (Pr 11.9) w.r.t. image measures, 135 of measurable functions, 76 of positive functions, 77 of (fractional) powers, 98, 155 Riemann integrability, 93, 339 integrable function, see also 1 P etc. improperly R-, not L-integrable function, 97 is a.e. ℝ-valued, 83 Riemann integrable function, 93, 339 integral, see also Lebesgue integral, Riemann integral, Stieltjes integral and alternating series, 101 (Pr 11.5) of complex functions, 226–228 examples, 72–73, 79 generalizing series, 113 w.r.t. image measures, 134 and infinite series, 101 (Pr 11.4) iterated vs. double, 130–131 (Pr 13.3–13.5) lattice property, 78 of measurable functions, 76 over a null set, 81 of positive functions, 69 examples, 72–73 properties, 71–72 is positive linear functional, 79 properties, 78–79 of rotationally invariant functions, 155 of simple functions, 68 sine integral, 131 (Pr 13.6) over a subset, 79 integral test for series, 357 integration by parts for Riemann integrals, 350 for Stieltjes integrals, 133 (Pr 13.13) integration by substitution, 350 see also change of variable formula isometry, 245, 325 Jacobi polynomials, 277 Jacobian, 147 Jordan, Pascual, 232 (Pr 20.2)

Kac, Mark, 176 Kaczmarz, Stefan, 277n Kahane, Jean-Pierre, 311 kernel, 74 (Pr 9.11) Dirichlet kernel, 286 Fejér kernel, 286 Kolmogorov, Andrei N., 196, 288 Kolmogorov’s law of large numbers, 196–200 Korovkin, Pavel P., 281 Krantz, Steven, 218, 222 ℓ1 (summable sequences), 79 ℓ2 being isomorphic to separable Hilbert spaces, 245 1 , 1¯ (integrable functions), 76 1 , 227 p , 113 p , 105 dense subset of p , 140, 157, 159 p , Lp , 228 Lp , 108 being not separable, 271 Lp = Lp¯ , 108 separability criterion, 269, 271, 272 Laguerre polynomials, 278 lattice, 253 law of large numbers, 196–200 , L , 116 Lebesgue, Henri, 77, 77n, 349 Lebesgue integrable, 77 Lebesgue measurable set, 330 Lebesgue pre-measure, 45 Lebesgue σ-algebra, 330 cardinality, 330 Lebesgue integral, 77 abstract Lebesgue integral, 77n invariant under reflections, 136 invariant under translations, 136 transformation formula, 151 and differentiation, 152 Lebesgue measure, 27 change of variable formula, 53, 148 and differentiation, 152 characterized by translation invariance, 34 is diffuse, 46 (Pr 6.5) dilations, 36 (Pr 5.8) existence, 28, 45 and Hölder maps, 143



Lebesgue measure (cont.) is inner/outer regular, 158 invariant under motions, 54 invariant under orthogonal maps, 52 invariant under translations, 34 null sets, 29 (Pr 4.11), 47 (Pr 6.8) under Hölder maps, 146 as product measure, 125 properties of Lebesgue measure, 28, 46 (Pr 6.3) transformation formula, 148 and differentiation, 152 uniqueness, 28 Legendre polynomials, 278 lemma Borel-Cantelli lemma, 48 (Pr 6.9), 198 Calderón-Zygmund lemma, 221 lemma conditional Fatou’s lemma, 265 continuity lemma (L-integral), 91 differentiability lemma (L-integral), 91 Doob’s upcrossing lemma, 191 factorization lemma, 56 (Pr 7.11), 64 Fatou’s lemma, 73 Fatou’s lemma for measures, 74 (Pr 9.9) generalized Fatou’s lemma, 85 (Pr 10.8) Pratt’s lemma, 101 (Pr 11.3) reverse Fatou lemma, 74 (Pr 9.8) Urysohn’s lemma, 156 Lévy, Paul, 311 lim inf, lim sup (limit inferior/superior) of a numerical sequence, 63, 313–314 of a sequence of sets, 74 (Pr 9.9), 316 Lindenstrauss, Joram, 210, 294, 295 linear span, 240, 246 (Pr 21.9) lower integral, 338 map bijective map, 6 continuity in metric spaces, 324 continuous map, 321 Hölder continuous map, 143 and completion, 146 injective map, 6 measurable map, 49, 54 (Pr 7.5) surjective map, 6 Marcinkiewicz, Jozef, 176 martingale, 177 see also submartingale backwards martingale, 193 characterization of martingales, 186

closure of a martingale, 266–267 and conditional expectation, 266 and convex functions, 178, 268 martingale difference sequence, 188 (Pr 17.9), 275 (Pr 23.10), 302 a.e.-convergence, 303, 304, 306 L2-convergence, 306 Lp-convergence, 304 of independent functions, 306 ONS, 303 with directed index set, 203 1 -convergence, 203 example of non-closable martingale, 275 (Pr 23.12) martingale inequality, 210–213, 294 L1-convergence, 267 2 -bounded martingale, 201 (Pr 18.8) p -bounded martingale, 225 (Pr 19.14) quadratic variation, 294 reverse martingale, see backwards martingale martingale transform, 188 (Pr 16.7) uniformly integrable (UI) martingale, 194, 267 maximal function Hardy–Littlewood maximal function, 214 of a measure, 217 square maximal function, 214 measurability of continuous maps, 50 of coordinate functions, 54 (Pr 7.5) of indicator functions, 59 μ∗-measurability, 43 measurable function(s), 57 complex valued measurable function, 227 stable under limits, 62 vector lattice, 63 measurable map, 49, 54 (Pr 7.5) measurable set, 15, 17 measurable space, 22 measure, 22, see also Lebesgue measure, Stieltjes measure complete measure, 29 (Pr 4.13) continuous at ∅, 24 continuous from above, 24 continuous from below, 24 counting measure, 27 δ-measure, 26

diffuse measure, 46 (Pr 6.5) Dirac measure, 26 discrete probability measure, 27 equivalent measures, 223 (Pr 19.5) examples of measures, 26–28 finite measure, 22 inner regular measure, 158 invariant measure, 36 (Pr 5.9) locally finite measure, 218, 217n non-atomic measure, 46 (Pr 6.5) outer measure, 38n outer regular measure, 158 pre-measure, 22, 24, 45 probability measure, 22 product measure, 122–123 properties of measures, 23 separable measure, 272 σ-additivity, 22 σ-finite measure, 22, 30 (Pr 4.15) σ-subadditivity, 26 singular measure, 209 on S n−1 , 153–156 strong additivity, 23 subadditivity, 23 surface measure, 153–156, 161 (Pr 15.6) uniqueness, 33 measure with density, 80, 202 measure space, 22 complete measure space, 29 (Pr 4.13) finite measure space, 22 probability space, 22 σ-finite measure space, 22, 30 (Pr 4.15) σ-finite filtered measure space, 176, 203 mesh, 338 Métivier, Michel, 210 metric (distance function), 322 metric space, 322 monotone class, 21 (Pr 3.11) monotone function discontinuities of monotone functions, 129 is Lebesgue a.e. continuous, 129 is Lebesgue a.e. differentiable, 225 (Pr 19.17) is Riemann integrable, 342 monotonicity monotonicity of the integral, 78, 344 monotonicity of measures, 23


neighbourhood, 319 open neighbourhood, 319 von Neumann, John, 232 (Pr 20.2) Neveu, Jacques, 268 non-measurable set, 48 (Pr 6.10), 48 (Pr 6.11) for the Borel -algebra, 332 for the Lebesgue -algebra, 331 norm, 105, 325 normed space, 325 and inner products, 229 Lp , 108 p , 105 , L , 116 quotient norm, 326 null set, 29 (Pr 4.10), 47 (Pr 6.8), 80 subsets of a null set, 29 (Pr 4.13) under Hölder map, 146 Olevski˘ı, A.M., 294 open ball, 323 optional sampling, 187 orthogonal orthogonal complement, 235 orthogonal elements of a Hilbert space, 234 orthogonal projection, 235, 246–247 (Pr 21.1) as conditional expectation, 253 orthogonal vectors, 231 orthogonal polynomials, 277–279 Chebyshev polynomials, 277 complete ONS, 282 dense in L2 , 282 Hermite polynomials, 278 Jacobi polynomials, 277 Laguerre polynomials, 278 Legendre polynomials, 278 orthonormal basis (ONB), 239, 242 characterization of, 242 orthonormal system (ONS), 240 complete orthogonal system, 242 maximal orthogonal system, 242 total orthogonal system, 242 orthonormalization procedure, 243–244 Oxtoby, John, 43 Paley, Raymond, 176, 302, 311 parallelogram identity, 231 parameter-dependent



parameter-dependent (cont.) improper R-integrals, 103 (Pr 11.20), 355–356 L-integrals, 92, 99 R-integrals, 352–353 Parseval’s identity, 240 partial order, 9 partition, 338 Pettis, B.J., 169 Pinsky, Mark, 299 polar coordinates 3-dimensional, (Pr 15.9) 162 n-dimensional, 153 planar, 152 polarization identity, 231 generalized polarization identity, 233 (Pr 20.5) Polish space, 336, 336n power set, 12 Pratt, John, 101 (Pr 11.3) pre-measure, 22, 24, 45 extension of a pre-measure, 37 -subadditivity, 26 primitive, 346, 348 bounded functions with primitive are L-integrable, 349 of a continuous function, 225 (Pr 19.16) differentiability of a primitive, 225 (Pr 19.16) probability space, 22 product of measurable spaces, 121 product measure space, 125 product measures, 122–123 product -algebra, 121, 127 projection, 235, 246–247 (Pr 21.11) orthogonal projection, 238 Pythagoras’ theorem, 233 (Pr 20.6), 238, 240 quadratic variation (of a martingale), 294 quotient norm, 326 quotient space, 326 Rademacher, Hans Rademacher functions, 299 are an incomplete ONS, 299

completion of, 301–302 Rademacher series, a.e.-convergence, 300 Radon–Nikodým derivative, 203 random variable, 49 see also independent random variables distribution of a random variable, 52 ¯ (extended real line), 58 ¯ 58 arithmetic of , rearrangement decreasing rearrangement, 133 (Pr 13.14) rearrangement invariant, 133 (Pr 13.14) rectangle, 18 Riemann, Bernhard, 337 Riemann integrability, 339 criteria for Riemann integrability, 94, 339, 344 Riemann sum, 342 Riemann integral, 93, 339, 339 vs. antiderivative, 348 coincides with Lebesgue integral, 94 and completed Borel -algebra, 97 function of upper limit, 346 improper Riemann integral, 96, 129, 353–359 improper Riemann integral and infinite series, 357 properties of Riemann integral, 344 Riesz, Frigyes, 110, 111, 238, 239, 326 Riesz, Marcel, 288 ring of sets, 24n, 40n Rogers, Chris (L.C.G.), 294 Roy, Ranjan, 277n Rudin, Walter, 245, 246 (Pr 21.10), 288, 318, 348, 356 Ryzhik, Iosif, 277n, 284 scalar product, see inner product Schipp, Ferenc, 302 Schwartz, Jacob, 168 Seebach, J. Arthur, 318 semi-norm, 325 in p , 108 semi-ring (of sets), 37 n is semi-ring, 44 separable separable Hilbert space, 244, 230, 244–245, 245 (Pr 21.5) separable Lp -space, 272

separable measure, 272 separable σ-algebra, 272 separable space, 320 sesqui-linear form, 229 set analytic set, 333 Borel set, 17 cardinality of, 7 closed set, 319 closed in metric spaces, 323 closed in ℝn , 17 closure, 320 compact set, 320, 324–326, see also compactness connected set, 321 convex set, 235 countable set, 7 dense set, 320, see also dense subset Fσ set, 159 (Pr 15.1), 142 Gδ set, 142 (open) interior, 320 measurable set, 15 μ∗-measurable set, 43 non-measurable set, see non-measurable set nowhere dense set, 55 (Pr 7.10) open set, 319 open in metric spaces, 323 open in ℝn , 17, 319 pathwise connected set, 321 relatively compact set, 320 relatively open set, 319 Souslin set, 333 uncountable set, 7 upwards filtering index set, 203 σ-additivity, 3, 22 σ-algebra, 15 Borel set, 17 examples, 15–16 generated by a family of maps, 51 generated by a family of sets, 16 generated by a map, 51 generator of, 16 inverse image, 16, 49 minimal σ-algebra, 16, 51 product σ-algebra, 121, 127 properties, 15, 20 (Pr 3.1) separable σ-algebra, 272 topological σ-algebra, 17 trace σ-algebra, 16, 20 (Pr 3.10)


σ-finite filtered measure space, 176 σ-finite measure, 22 σ-finite measure space, 22 σ-subadditivity, 26 Simon, Peter, 302 simple functions, 60 dense in p , 1 ≤ p < ∞, 112 dense in , 61 integral of simple functions, 68 not dense in , 116 standard representation, 60 uniformly dense in b , 65 (Pr 8.7) singleton, 46 (Pr 6.5) Souslin, Michel, 333, 336 Souslin operation, 336 Souslin scheme, 332 Souslin set, 333 span, 240, 246 (Pr 21.9) spherical coordinates, 153 Srivastava, Sashi, 336 standard representation, 60 Steele, Michael, 311 Steen, Lynn, 318 Stein, Elias, 221 Steinhaus, Hugo, 176, 277n step function, 342, see also simple functions is Riemann integrable, 342 Stieltjes, Thomas Stieltjes function, 55 (Pr 7.9) Stieltjes integral, 132 (Pr 13.13) change of variable, 133 (Pr 13.13) integration by parts, 133 (Pr 13.13) Stieltjes measure, 54 (Pr 7.9), 132 (Pr 13.13) Lebesgue decomposition of Stieltjes measure, 223 (Pr 19.9) stopping time, 184 characterization of, 189 (Pr 17.9) Strichartz, Robert, 344 Stromberg, Karl, 11, 19n, 43 strong additivity, 23 Stroock, Daniel, 161 (Pr 15.7) subadditivity, 23 submartingale, 177 backwards submartingale, 193 convergence theorem, 193 1 -convergence, 195 change of filtration, 187 (Pr 16.2) characterization of submartingales, 186



submartingale (cont.) w.r.t. completed filtration, 187 (Pr 16.3) and conditional expectation, 266 and convex functions, 178, 268 Doob decomposition, 275 (Pr 23.11) Doob’s maximal inequality, 211, 224 (Pr 19.13) examples, 178–181, 200 (Pr 18.6) inequalities for, 210–213 1 -convergence, 194 pointwise convergence, 191 reversed submartingale, see backwards martingale uniformly integrable (UI) martingale, 193, 194 upcrossing estimate, 191 supermartingale, 177 see also submartingale surface measure, 153–156, 161 (Pr 15.6) surjective map, 6 symmetric difference, 13 (Pr 2.2) Sz.-Nagy, Béla, 349 Szegö, Gabor, 277n Tarski, Alfréd, 43 theorem, see also lemma or inequality backwards convergence theorem, 193 Beppo Levi theorem, 70 Bonnet’s mean value theorem, 350 bounded convergence theorem, 174 (Pr 16.7) Cantor–Bernstein theorem, 9 Carathéodory’s existence theorem, 37 completion of metric spaces, 325 conditional Beppo Levi theorem, 264 conditional dominated convergence theorem, 266 continuity theorem (improper R-integral), 355 continuity theorem (R-integral), 352 continuity lemma (L-integral), 91 convergence of UI submartingales, 194 differentiability lemma (L-integral), 91 differentiation theorem (improper R-integral), 356 differentiation theorem (R-integral), 352 dominated convergence theorem, 89 p -version, 100 (Pr 11.1), 111 Doob’s theorem, 222 (Pr 19.2) existence of product measures, 123

extension of measures, 37 Fréchet-v. Neumann-Jordan theorem, 232 (Pr 20.2) Fubini’s theorem, 126 Fubini’s theorem on series, 225 (Pr 19.18) fundamental theorem of calculus, 349 general transformation theorem, 151 Hardy–Littlewood maximal inequalities, 215 Heine–Borel theorem, 325, 326 integral test for series, 357 integration by parts, 350 integration by substitution, 350 Jacobi’s transformation theorem, 148 Korovkin’s theorem, 280 Lebesgue’s convergence theorem, 89 p -version, 100 (Pr 11.1), 111 Lebesgue decomposition, 209 Lebesgue’s differentiation theorem, 218 mean value theorem for integrals, 345 monotone convergence theorem, 88 optional sampling theorem, 187 projection theorem, 235 Pythagoras’ theorem, 233 (Pr 20.6), 238, 240 Radon-Nikodým theorem, 202 Riesz representation theorem, 239 Riesz’ convergence theorem, 112 Riesz–Fischer theorem, 110 M. Riesz’ theorem, 288 second mean value theorem for integrals, 350n submartingale convergence theorem, 191 Tonelli’s theorem, 125 uniqueness of measures, 33 alternative statement, 36 (Pr 5.6) uniqueness of product measures, 122 Vitali’s convergence theorem, 165 non--finite case, 167 Weierstrass approximation theorem, 279, 287 tightness (of measures), 169 Tonelli, Leonida, 125 topological -algebra, 17 topological space, 17, 319 topology, 17, 319 examples, 319 trace -algebra, 16, 20 (Pr 3.10)

transformation formula for Lebesgue integrals, 151 for Lebesgue measure, 53, 148 and differentiation, 152 trigonometric polynomial, 283 trigonometric polynomials are dense in C − , 287 trigonometric system, 283 complete in L2 , 283, 287 Tzafriri, Lior, 294, 295 Uhl, John, 210 unconditional basis, 293–295 uncountable, 7 uniform boundedness principle, 246 (Pr 21.10) uniformly integrable, 163, 175 (Pr 16.11) vs. compactness, 169 equivalent conditions, 168 uniformly σ-additive, 169 unit mass, 26 upcrossing, 190 upcrossing estimate, 191 upper integral, 338 upwards filtering, upwards directed, 203


vector space, 226 Volterra, Vito, 349 volume of unit ball, 155–156 Wade, William, 302 Wagon, Stan, 43 Walsh system, 302 wavelet, see Haar wavelet Weierstrass, Karl, 279 Wheeden, Richard, 288 Wiener, Norbert, 176, 311 Wiener process, 309, 311 Willard, Stephen, 318 Williams, David, 294 Yosida, Kôsaku, 232 Young, William, 101 (Pr 11.3), 117 (Pr 12.5), 138 Young function, 117 (Pr 12.5) Young’s inequality, 105, 117 (Pr 12.5), 138, 141 (Pr 14.9) Zygmund, Antoni, 176, 221, 288
