MEASURE THEORY Volume 2
D.H.Fremlin
By the same author: Topological Riesz Spaces and Measure Theory, Cambridge University Press, 1974. Consequences of Martin’s Axiom, Cambridge University Press, 1982. Companions to the present volume: Measure Theory, vol. 1, Torres Fremlin, 2000. Measure Theory, vol. 3, Torres Fremlin, 2002.
First printing May 2001 Second printing April 2003
MEASURE THEORY Volume 2 Broad Foundations
D.H.Fremlin Reader in Mathematics, University of Essex
Dedicated by the Author to the Publisher
This book may be ordered from the publisher at the address below. For price and means of payment see the author’s Web page http://www.essex.ac.uk/maths/staff/fremlin/mtsales.htm, or enquire from
[email protected] First published in 2001 by Torres Fremlin, 25 Ireton Road, Colchester CO3 3AT, England
© D.H.Fremlin 2001
The right of D.H.Fremlin to be identified as author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988. This work is issued under the terms of the Design Science License as published in http://dsl.org/copyleft/dsl.txt. For the source files see http://www.essex.ac.uk/maths/staff/fremlin/mt2.2003/index.htm.
Library of Congress classification QA312.F72
AMS 2000 classification 28A99
ISBN 0953812928
Typeset by AMS-TeX
Printed in England by Biddles Short Run Books, King’s Lynn
Contents
General Introduction
Introduction to Volume 2
*Chapter 21: Taxonomy of measure spaces
Introduction
211 Definitions
Complete, totally finite, σ-finite, strictly localizable, semi-finite, localizable, locally determined measure spaces; atoms; elementary relationships; countable-cocountable measures.
212 Complete spaces
Measurable and integrable functions on complete spaces; completion of a measure.
213 Semi-finite, locally determined and localizable spaces
Integration on semi-finite spaces; c.l.d. versions; measurable envelopes; characterizing localizability, strict localizability, σ-finiteness.
214 Subspaces
Subspace measures on arbitrary subsets; integration; direct sums of measure spaces.
215 σ-finite spaces and the principle of exhaustion
The principle of exhaustion; characterizations of σ-finiteness; the intermediate value theorem for atomless measures.
*216 Examples
A complete localizable non-locally-determined space; a complete locally determined non-localizable space; a complete locally determined localizable space which is not strictly localizable.
Chapter 22: The fundamental theorem of calculus
Introduction
221 Vitali’s theorem in $\mathbb{R}$
Vitali’s theorem for intervals in $\mathbb{R}$.
222 Differentiating an indefinite integral
Monotonic functions are differentiable a.e., and their derivatives are integrable; $\frac{d}{dx}\int_a^x f = f$ a.e.
223 Lebesgue’s density theorems
$f(x) = \lim_{h\downarrow 0}\frac{1}{h}\int_x^{x+h} f$ a.e. $(x)$; density points; $\lim_{h\downarrow 0}\frac{1}{2h}\int_{x-h}^{x+h}|f - f(x)| = 0$ a.e. $(x)$; the Lebesgue set of a function.
224 Functions of bounded variation
Variation of a function; differences of monotonic functions; sums and products, limits, continuity and differentiability for b.v. functions; an inequality for $\int f\times g$.
225 Absolutely continuous functions
Absolute continuity of indefinite integrals; absolutely continuous functions on $\mathbb{R}$; integration by parts; lower semi-continuous functions; *direct images of negligible sets; the Cantor function.
*226 The Lebesgue decomposition of a function of bounded variation
Sums over arbitrary index sets; saltus functions; the Lebesgue decomposition.
Chapter 23: The Radon-Nikodým theorem
Introduction
231 Countably additive functionals
Additive and countably additive functionals; Jordan and Hahn decompositions.
232 The Radon-Nikodým theorem
Absolutely and truly continuous additive functionals; truly continuous functionals are indefinite integrals; *the Lebesgue decomposition of a countably additive functional.
233 Conditional expectations
σ-subalgebras; conditional expectations of integrable functions; convex functions; Jensen’s inequality.
234 Indefinite-integral measures
Measures f µ and their basic properties.
235 Measurable transformations
The formula $\int g(y)\,\nu(dy) = \int J(x)g(\phi(x))\,\mu(dx)$; detailed conditions of applicability; inverse-measure-preserving functions; the image measure catastrophe; using the Radon-Nikodým theorem.
Chapter 24: Function spaces
Introduction
241 $\mathcal{L}^0$ and $L^0$
The linear, order and multiplicative structure of $L^0$; action of Borel functions; Dedekind completeness and localizability.
242 $L^1$
The normed lattice $L^1$; integration as a linear functional; completeness and Dedekind completeness; the Radon-Nikodým theorem and conditional expectations; convex functions; dense subspaces.
243 $L^\infty$
The normed lattice $L^\infty$; completeness; the duality between $L^1$ and $L^\infty$; localizability, Dedekind completeness and the identification $L^\infty \cong (L^1)^*$.
244 $L^p$
The normed lattices $L^p$, for $1 < p < \infty$; Hölder’s inequality; completeness and Dedekind completeness; $(L^p)^* \cong L^q$; conditional expectations.
245 Convergence in measure
The topology of (local) convergence in measure on $L^0$; pointwise convergence; localizability and Dedekind completeness; embedding $L^p$ in $L^0$; $\|\ \|_1$-convergence and convergence in measure; σ-finite spaces, metrizability and sequential convergence.
246 Uniform integrability
Uniformly integrable sets in $\mathcal{L}^1$ and $L^1$; elementary properties; disjoint-sequence characterizations; $\|\ \|_1$ and convergence in measure on uniformly integrable sets.
247 Weak compactness in $L^1$
A subset of $L^1$ is uniformly integrable iff it is relatively weakly compact.
Chapter 25: Product measures
Introduction
251 Finite products
Primitive and c.l.d. products; basic properties; Lebesgue measure on $\mathbb{R}^{r+s}$ as a product measure; products of direct sums and subspaces; c.l.d. versions.
252 Fubini’s theorem
When $\iint f(x,y)\,dx\,dy$ and $\int f(x,y)\,d(x,y)$ are equal; measures of ordinate sets; *the volume of a ball in $\mathbb{R}^r$.
253 Tensor products
$L^1(\mu\times\nu)$ as a completion of $L^1(\mu)\otimes L^1(\nu)$; bounded bilinear maps; the ordering of $L^1(\mu\times\nu)$; conditional expectations.
254 Infinite products
Products of arbitrary families of probability spaces; basic properties; inverse-measure-preserving functions; usual measure on $\{0,1\}^I$; $\{0,1\}^{\mathbb{N}}$ isomorphic, as measure space, to $[0,1]$; subspaces of full outer measure; sets determined by coordinates in a subset of the index set; generalized associative law for products of measures; subproducts as image measures; factoring functions through subproducts; conditional expectations on subalgebras corresponding to subproducts.
255 Convolutions of functions
Shifts in $\mathbb{R}^2$ as measure space automorphisms; convolutions of functions on $\mathbb{R}$; $\int h\times(f*g) = \int h(x+y)f(x)g(y)\,d(x,y)$; $f*(g*h) = (f*g)*h$; $\|f*g\|_1 \le \|f\|_1\|g\|_1$; the groups $\mathbb{R}^r$ and $]-\pi,\pi]$.
256 Radon measures on $\mathbb{R}^r$
Definition of Radon measures on $\mathbb{R}^r$; completions of Borel measures; Lusin measurability; image measures; products of two Radon measures; semi-continuous functions.
257 Convolutions of measures
Convolution of totally finite Radon measures on $\mathbb{R}^r$; $\int h\,d(\nu_1*\nu_2) = \iint h(x+y)\,\nu_1(dx)\,\nu_2(dy)$; $\nu_1*(\nu_2*\nu_3) = (\nu_1*\nu_2)*\nu_3$.
Chapter 26: Change of variable in the integral
Introduction
261 Vitali’s theorem in $\mathbb{R}^r$
Vitali’s theorem for balls in $\mathbb{R}^r$; Lebesgue’s Density Theorem.
262 Lipschitz and differentiable functions
Lipschitz functions; elementary properties; differentiable functions from $\mathbb{R}^r$ to $\mathbb{R}^s$; differentiability and partial derivatives; approximating a differentiable function by piecewise affine functions; *Rademacher’s theorem.
263 Differentiable transformations in $\mathbb{R}^r$
In the formula $\int g(y)\,dy = \int J(x)g(\phi(x))\,dx$, find $J$ when $\phi$ is (i) linear (ii) differentiable; detailed conditions of applicability; polar coordinates; the one-dimensional case.
264 Hausdorff measures
$r$-dimensional Hausdorff measure on $\mathbb{R}^s$; Borel sets are measurable; Lipschitz functions; if $s = r$, we have a multiple of Lebesgue measure; *Cantor measure as a Hausdorff measure.
265 Surface measures
Normalized Hausdorff measure; action of linear operators and differentiable functions; surface measure on a sphere.
Chapter 27: Probability theory
Introduction
271 Distributions
Terminology; distributions as Radon measures; distribution functions; densities; transformations of random variables.
272 Independence
Independent families of random variables; characterizations of independence; joint distributions of (finite) independent families, and product measures; the zero-one law; $E(X\times Y)$, $\mathrm{Var}(X+Y)$; distribution of a sum as convolution of distributions; *Etemadi’s inequality.
273 The strong law of large numbers
$\frac{1}{n+1}\sum_{i=0}^{n} X_i \to 0$ a.e. if the $X_n$ are independent with zero expectation and either (i) $\sum_{n=0}^{\infty}\frac{1}{(n+1)^2}\mathrm{Var}(X_n) < \infty$ or (ii) $\sum_{n=0}^{\infty} E(|X_n|^{1+\delta}) < \infty$ for some $\delta > 0$ or (iii) the $X_n$ are identically distributed.
274 The Central Limit Theorem
Normally distributed r.v.s; Lindeberg’s conditions for the Central Limit Theorem; corollaries; estimating $\int_\alpha^\infty e^{-x^2/2}\,dx$.
275 Martingales
Sequences of σ-algebras, and martingales adapted to them; up-crossings; Doob’s Martingale Convergence Theorem; uniform integrability, $\|\ \|_1$-convergence and martingales as sequences of conditional expectations; reverse martingales; stopping times.
276 Martingale difference sequences
Martingale difference sequences; strong law of large numbers for m.d.ss.; Komlós’ theorem.
Chapter 28: Fourier analysis
Introduction
281 The Stone-Weierstrass theorem
Approximating a function on a compact set by members of a given lattice or algebra of functions; real and complex cases; approximation by polynomials and trigonometric functions; Weyl’s Equidistribution Theorem in $[0,1]^r$.
282 Fourier series
Fourier and Fejér sums; Dirichlet and Fejér kernels; Riemann-Lebesgue lemma; uniform convergence of Fejér sums of a continuous function; a.e. convergence of Fejér sums of an integrable function; $\|\ \|_2$-convergence of Fourier sums of a square-integrable function; convergence of Fourier sums of a differentiable or b.v. function; convolutions and Fourier coefficients.
283 Fourier Transforms I
Fourier and inverse Fourier transforms; elementary properties; $\int_0^\infty \frac{1}{x}\sin x\,dx = \frac{1}{2}\pi$; the formula $f^{\wedge\vee} = f$ for differentiable and b.v. $f$; convolutions; $e^{-x^2/2}$; $\int f\times g^{\wedge} = \int f^{\wedge}\times g$.
284 Fourier Transforms II
Test functions; $h^{\wedge\vee} = h$; tempered functions; tempered functions which represent each other’s transforms; convolutions; square-integrable functions; Dirac’s delta function.
285 Characteristic functions
The characteristic function of a distribution; independent r.v.s; the normal distribution; the vague topology on the space of distributions, and sequential convergence of characteristic functions; Poisson’s theorem.
286 Carleson’s theorem
The Hardy-Littlewood Maximal Theorem; the Lacey-Thiele proof of Carleson’s theorem.
Appendix to Volume 2
Introduction
2A1 Set theory
Ordered sets; transfinite recursion; ordinals; initial ordinals; Schröder-Bernstein theorem; filters; Axiom of Choice; Zermelo’s Well-Ordering Theorem; Zorn’s Lemma; ultrafilters.
2A2 The topology of Euclidean space
Closures; compact sets; open sets in $\mathbb{R}$.
2A3 General topology
Topologies; continuous functions; subspace topologies; Hausdorff topologies; pseudometrics; convergence of sequences; compact spaces; cluster points of sequences; convergence of filters; product topologies; dense subsets.
2A4 Normed spaces
Normed spaces; linear subspaces; Banach spaces; bounded linear operators; dual spaces; extending a linear operator from a dense subspace; normed algebras.
2A5 Linear topological spaces
Linear topological spaces; topologies defined by functionals; completeness; weak topologies.
2A6 Factorization of matrices
Determinants; orthonormal families; $T = PDQ$ where $D$ is diagonal and $P$, $Q$ are orthogonal.
Concordance
References for Volume 2
Index to Volumes 1 and 2
Principal topics and results
General index
General introduction
In this treatise I aim to give a comprehensive description of modern abstract measure theory, with some indication of its principal applications. The first two volumes are set at an introductory level; they are intended for students with a solid grounding in the concepts of real analysis, but possibly with rather limited detailed knowledge. As the book proceeds, the level of sophistication and expertise demanded will increase; thus for the volume on topological measure spaces, familiarity with general topology will be assumed. The emphasis throughout is on the mathematical ideas involved, which in this subject are mostly to be found in the details of the proofs.
My intention is that the book should be usable both as a first introduction to the subject and as a reference work. For the sake of the first aim, I try to limit the ideas of the early volumes to those which are really essential to the development of the basic theorems. For the sake of the second aim, I try to express these ideas in their full natural generality, and in particular I take care to avoid suggesting any unnecessary restrictions in their applicability. Of course these principles are to some extent contradictory. Nevertheless, I find that most of the time they are very nearly reconcilable, provided that I indulge in a certain degree of repetition. For instance, right at the beginning, the puzzle arises: should one develop Lebesgue measure first on the real line, and then in spaces of higher dimension, or should one go straight to the multidimensional case? I believe that there is no single correct answer to this question. Most students will find the one-dimensional case easier, and it therefore seems more appropriate for a first introduction, since even in that case the technical problems can be daunting.
But certainly every student of measure theory must at a fairly early stage come to terms with Lebesgue area and volume as well as length; and with the correct formulations, the multidimensional case differs from the one-dimensional case only in a definition and a (substantial) lemma. So what I have done is to write them both out (§§114-115). In the same spirit, I have been uninhibited, when setting out exercises, by the fact that many of the results I invite students to look for will appear in later chapters; I believe that throughout mathematics one has a better chance of understanding a theorem if one has previously attempted something similar alone.
As I write this Introduction (March 2003), the plan of the work is as follows:
Volume 1: The Irreducible Minimum
Volume 2: Broad Foundations
Volume 3: Measure Algebras
Volume 4: Topological Measure Spaces
Volume 5: Set-theoretic Measure Theory.
Volume 1 is intended for those with no prior knowledge of measure theory, but competent in the elementary techniques of real analysis. I hope that it will be found useful by undergraduates meeting Lebesgue measure for the first time. Volume 2 aims to lay out some of the fundamental results of pure measure theory (the Radon-Nikodým theorem, Fubini’s theorem), but also gives short introductions to some of the most important applications of measure theory (probability theory, Fourier analysis). While I should like to believe that most of it is written at a level accessible to anyone who has mastered the contents of Volume 1, I should not myself have the courage to try to cover it in an undergraduate course, though I would certainly attempt to include some parts of it. Volumes 3 and 4 are set at a rather higher level, suitable to postgraduate courses; while Volume 5 will assume a wide-ranging competence over large parts of analysis and set theory.
There is a disclaimer which I ought to make in a place where you might see it in time to avoid paying for this book. I make no attempt to describe the history of the subject. This is not because I think the history uninteresting or unimportant; rather, it is because I have no confidence of saying anything which would not be seriously misleading. Indeed I have very little confidence in anything I have ever read concerning the history of ideas. So while I am happy to honour the names of Lebesgue and Kolmogorov and Maharam in more or less appropriate places, and I try to include in the bibliographies the works which I have myself consulted, I leave any consideration of the details to those bolder and better qualified than myself.
The work as a whole is not yet complete; and when it is finished, it will undoubtedly be too long to be printed as a single volume in any reasonable format. I am therefore publishing it one part at a time.
However, drafts of most of the rest are available on the Internet; see http://www.essex.ac.uk/maths/staff/fremlin/mt.htm for detailed instructions. For the time being, at least, printing will be in short runs. I hope that readers will be energetic in commenting on errors and omissions, since it should be possible to correct these relatively promptly. An inevitable consequence of this is that paragraph references may go out of date rather quickly. I shall be most flattered if anyone chooses to rely on this book as a source
for basic material; and I am willing to attempt to maintain a concordance to such references, indicating where migratory results have come to rest for the moment, if authors will supply me with copies of papers which use them. Two such items can already be found in the concordance to the present volume.
I mention some minor points concerning the layout of the material. Most sections conclude with lists of ‘basic exercises’ and ‘further exercises’, which I hope will be generally instructive and occasionally entertaining. How many of these you should attempt must be for you and your teacher, if any, to decide, as no two students will have quite the same needs. I mark with a > those which seem to me to be particularly important. But while you may not need to write out solutions to all the ‘basic exercises’, if you are in any doubt as to your capacity to do so you should take this as a warning to slow down a bit. The ‘further exercises’ are unbounded in difficulty, and are unified only by a presumption that each has at least one solution based on ideas already introduced. Occasionally I add a final ‘problem’, a question to which I do not know the answer and which seems to arise naturally in the course of the work.
The impulse to write this book is in large part a desire to present a unified account of the subject. Cross-references are correspondingly abundant and wide-ranging. In order to be able to refer freely across the whole text, I have chosen a reference system which gives the same code name to a paragraph wherever it is being called from. Thus 132E is the fifth paragraph in the second section of the third chapter of Volume 1, and is referred to by that name throughout. Let me emphasize that cross-references are supposed to help the reader, not distract her. Do not take the interpolation ‘(121A)’ as an instruction, or even a recommendation, to lift Volume 1 off the shelf and hunt for §121.
If you are happy with an argument as it stands, independently of the reference, then carry on. If, however, I seem to have made rather a large jump, or the notation has suddenly become opaque, local cross-references may help you to fill in the gaps.
Each volume will have an appendix of ‘useful facts’, in which I set out material which is called on somewhere in that volume, and which I do not feel I can take for granted. Typically the arrangement of material in these appendices is directed very narrowly at the particular applications I have in mind, and is unlikely to be a satisfactory substitute for conventional treatments of the topics touched on. Moreover, the ideas may well be needed only on rare and isolated occasions. So as a rule I recommend you to ignore the appendices until you have some direct reason to suppose that a fragment may be useful to you.
During the extended gestation of this project I have been helped by many people, and I hope that my friends and colleagues will be pleased when they recognise their ideas scattered through the pages below. But I am especially grateful to those who have taken the trouble to read through earlier drafts and comment on obscurities and errors.
Introduction to Volume 2
For this second volume I have chosen seven topics through which to explore the insights and challenges offered by measure theory. Some, like the Radon-Nikodým theorem (Chapter 23) are necessary for any understanding of the structure of the subject; others, like Fourier analysis (Chapter 28) and the discussion of function spaces (Chapter 24) demonstrate the power of measure theory to attack problems in general real and functional analysis. But all have applications outside measure theory, and all have influenced its development. These are the parts of measure theory which any analyst may find himself using.
Every topic is one which ideally one would wish undergraduates to have seen, but the length of this volume makes it plain that no ordinary undergraduate course could include very much of it. It is directed rather at graduate level, where I hope it will be found adequate to support all but the most ambitious courses in measure theory, though it is perhaps a bit too solid to be suitable for direct use as a course text, except with careful selection of the parts to be covered. If you are using it to teach yourself measure theory, I strongly recommend an eclectic approach, looking for particular subjects and theorems that seem startling or useful, and working backwards from them. My other objective, of course, is to provide an account of the central ideas at this level in measure theory, rather fuller than can easily be found in one volume elsewhere. I cannot claim that it is ‘definitive’, but I do think I cover a good deal of ground in ways that provide a firm foundation for further study. As in Volume 1, I usually do not shrink from giving ‘best’ results, like Lindeberg’s conditions for the Central Limit Theorem (§274), or the theory of products of arbitrary measure spaces (§251).
If I were teaching this material to students in a PhD programme I would rather accept a limitation in the breadth of the course than leave them unaware of what could be done in the areas discussed.
The topics interact in complex ways – one of the purposes of this book is to exhibit their relationships. There is no canonical linear ordering in which they should be taken. Nor do I think organization charts are very helpful, not least because it may be only two or three paragraphs in a section which are needed for a given chapter later on. I do at least try to lay the material of each section out in an order which makes initial segments useful by themselves. But the order in which to take the chapters is to a considerable extent for you to choose, perhaps after a glance at their individual introductions. I have done my best to pitch the exposition at much the same level throughout the volume, sometimes allowing gradients to steepen in the course of a chapter or a section, but always trying to return to something which anyone who has mastered Volume 1 ought to be able to cope with. (Though perhaps the main theorems of Chapter 26 are harder work than the principal results elsewhere, and §286 is only for the most determined.) I said there were seven topics, and you will see eight chapters ahead of you. This is because Chapter 21 is rather different from the rest. It is the purest of pure measure theory, and is here only because there are places later in the volume where (in my view) the theorems make sense only in the light of some abstract concepts which are not particularly difficult, but are also not obvious. However it is fair to say that the most important ideas of this volume do not really depend on the work of Chapter 21. As always, it is a puzzle to know how much prior knowledge to assume in this volume. I do of course call on the results of Volume 1 of this treatise whenever they seem to be relevant. I do not doubt, however, that there will be readers who have learnt the elementary theory from other sources. 
Provided you can, from first principles, construct Lebesgue measure and prove the basic convergence theorems for integrals on arbitrary measure spaces, you ought to be able to embark on the present volume. Perhaps it would be helpful to have in hand the results-only version of Volume 1, since that includes the most important definitions as well as statements of the theorems.
There is also the question of how much material from outside measure theory is needed. Chapter 21 calls for some non-trivial set theory (given in §2A1), but the more advanced ideas are needed only for the counter-examples in §216, and can be passed over to begin with. The problems become acute in Chapter 24. Here we need a variety of results from functional analysis, some of them depending on non-trivial ideas in general topology. For a full understanding of this material there is no substitute for a course in normed spaces up to and including a study of weak compactness. But I do not like to insist on such a preparation, because it is likely to be simultaneously too much and too little. Too much, because I hardly mention linear operators at this stage; too little, because I do ask for some of the theory of non-locally-convex spaces, which is often omitted in first courses on functional analysis. At the risk, therefore, of wasting paper, I have written out condensed accounts of the essential facts (§§2A3-2A5).
Note on second printing
For the second printing of this volume, I have made two substantial corrections to inadequate proofs and a large number of minor amendments; I am most grateful to T.D.Austin for his careful reading of the first printing. In addition, I have added a dozen exercises and a handful of straightforward results which turn out to be relevant to the work of later volumes and fit naturally here. The regular process of revision of this work has led me to make a couple of notational innovations not described explicitly in the early editions of Volume 1.
I trust that most readers will find these immediately comprehensible. If, however, you find that there is a puzzling cross-reference which you are unable to match with anything in the version of Volume 1 which you are using, it may be worth while checking the errata pages in http://www.essex.ac.uk/maths/staff/fremlin/mterr.htm.
*Chapter 21
Taxonomy of measure spaces
I begin this volume with a ‘starred chapter’. The point is that I do not really recommend this chapter for beginners. It deals with a variety of technical questions which are of great importance for the later development of the subject, but are likely to be both abstract and obscure for anyone who has not encountered the problems these techniques are designed to solve. On the other hand, if (as is customary) this work is omitted, and the ideas are introduced only when urgently needed, the student is likely to finish with very vague ideas on which theorems can be expected to apply to which types of measure space, and with no vocabulary in which to express those ideas. I therefore take a few pages to introduce the terminology and concepts which can be used to distinguish ‘good’ measure spaces from others, with a few of the basic relationships. The only paragraphs which are immediately relevant to the theory set out in Volume 1 are those on ‘complete’, ‘σ-finite’ and ‘semi-finite’ measure spaces (211A, 211D, 211F, 211Lc, §212, 213A-213B, 215B), and on Lebesgue measure (211M). For the rest, I think that a newcomer to the subject can very well pass over this chapter for the time being, and return to it for particular items when the text of later chapters refers to it. On the other hand, it can also be used as an introduction to the flavour of the ‘purest’ kind of measure theory, the study of measure spaces for their own sake, with a systematic discussion of a few of the elementary constructions.
211 Definitions
I start with a list of definitions, corresponding to the concepts which I have found to be of value in distinguishing different types of measure space. Necessarily, the significance of many of these ideas is likely to be obscure until you have encountered some of the obstacles which arise later on. Nevertheless, you will I hope be able to deal with these definitions on a formal, abstract basis, and to follow the elementary arguments involved in establishing the relationships between them (211L). In §216 I give three substantial examples to demonstrate the rich variety of objects which the definition of ‘measure space’ encompasses. In the present section, therefore, I content myself with very brief descriptions of sufficient cases to show at least that each of the definitions here discriminates between different spaces (211M-211R).
211A Definition Let (X, Σ, µ) be a measure space. Then µ, or (X, Σ, µ), is (Carathéodory) complete if whenever A ⊆ E ∈ Σ and µE = 0 then A ∈ Σ; that is, if every negligible subset of X is measurable.
211B Definition Let (X, Σ, µ) be a measure space. Then (X, Σ, µ) is a probability space if µX = 1. In this case µ is called a probability or probability measure.
211C Definition Let (X, Σ, µ) be a measure space. Then µ, or (X, Σ, µ), is totally finite if µX < ∞.
211D Definition Let (X, Σ, µ) be a measure space. Then µ, or (X, Σ, µ), is σ-finite if there is a sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ of measurable sets of finite measure such that $X = \bigcup_{n\in\mathbb{N}} E_n$.
Remark Note that in this case we can set $F_n = E_n \setminus \bigcup_{i<n} E_i$ [...]
[...] µE > 0 and whenever F ∈ Σ, F ⊆ E one of F, E \ F is negligible.
211J Definition Let (X, Σ, µ) be a measure space. Then µ, or (X, Σ, µ), is atomless or diffused if there are no atoms for µ. (Note that this is not the same thing as saying that all finite sets are negligible; see 211R below.)
211K Definition Let (X, Σ, µ) be a measure space. Then µ, or (X, Σ, µ), is purely atomic if whenever E ∈ Σ and E is not negligible there is an atom for µ included in E.
Remark Recall that a measure µ on a set X is point-supported if µ measures every subset of X and $\mu E = \sum_{x\in E}\mu\{x\}$ for every E ⊆ X (112Bd). Every point-supported measure is purely atomic, because {x} must be an atom whenever µ{x} > 0, but not every purely atomic measure is point-supported (211R).
211L The relationships between the concepts above are in a sense very straightforward; all the direct implications in which one property implies another are given in the next theorem.
Theorem (a) A probability space is totally finite.
(b) A totally finite measure space is σ-finite.
(c) A σ-finite measure space is strictly localizable.
(d) A strictly localizable measure space is localizable and locally determined.
(e) A localizable measure space is semi-finite.
(f) A locally determined measure space is semi-finite.
proof (a), (b), (e) and (f) are trivial.
(c) Let (X, Σ, µ) be a σ-finite measure space; let $\langle F_n\rangle_{n\in\mathbb{N}}$ be a disjoint sequence of measurable sets of finite measure covering X (see the remark in 211D). If E ∈ Σ, then of course $E\cap F_n\in\Sigma$ for every n ∈ ℕ, and
$\mu E = \sum_{n=0}^{\infty}\mu(E\cap F_n) = \sum_{n\in\mathbb{N}}\mu(E\cap F_n)$.
If E ⊆ X and $E\cap F_n\in\Sigma$ for every n ∈ ℕ, then $E = \bigcup_{n\in\mathbb{N}} E\cap F_n\in\Sigma$. So $\langle F_n\rangle_{n\in\mathbb{N}}$ is a decomposition of X and (X, Σ, µ) is strictly localizable.
(d) Let (X, Σ, µ) be a strictly localizable measure space; let $\langle X_i\rangle_{i\in I}$ be a decomposition of X.
(i) Let $\mathcal{E}$ be a family of measurable subsets of X. Let $\mathcal{F}$ be the family of measurable sets F ⊆ X such that µ(F ∩ E) = 0 for every $E\in\mathcal{E}$. Note that $\emptyset\in\mathcal{F}$ and, if $\langle F_n\rangle_{n\in\mathbb{N}}$ is any sequence in $\mathcal{F}$, then $\bigcup_{n\in\mathbb{N}} F_n\in\mathcal{F}$. For each i ∈ I, set $\gamma_i = \sup\{\mu(F\cap X_i) : F\in\mathcal{F}\}$ and choose a sequence $\langle F_{in}\rangle_{n\in\mathbb{N}}$ in $\mathcal{F}$ such that $\lim_{n\to\infty}\mu(F_{in}\cap X_i) = \gamma_i$; set $F_i = \bigcup_{n\in\mathbb{N}} F_{in}\in\mathcal{F}$. Set
$F = \bigcup_{i\in I} F_i\cap X_i \subseteq X$
and H = X \ F. We see that $F\cap X_i = F_i\cap X_i$ for each i ∈ I (because $\langle X_i\rangle_{i\in I}$ is disjoint), so F ∈ Σ and H ∈ Σ. For any $E\in\mathcal{E}$,
$\mu(E\setminus H) = \mu(E\cap F) = \sum_{i\in I}\mu(E\cap F\cap X_i) = \sum_{i\in I}\mu(E\cap F_i\cap X_i) = 0$
because every $F_i$ belongs to $\mathcal{F}$. Thus $F\in\mathcal{F}$. If G ∈ Σ and µ(E \ G) = 0 for every $E\in\mathcal{E}$, then X \ G and $F' = F\cup(X\setminus G)$ belong to $\mathcal{F}$. So $\mu(F'\cap X_i)\le\gamma_i$ for each i ∈ I. But also $\mu(F\cap X_i)\ge\sup_{n\in\mathbb{N}}\mu(F_{in}\cap X_i) = \gamma_i$, so $\mu(F\cap X_i) = \mu(F'\cap X_i)$ for each i. Because $\mu X_i$ is finite, it follows that $\mu((F'\setminus F)\cap X_i) = 0$, for each i. Summing over i, $\mu(F'\setminus F) = 0$, that is, µ(H \ G) = 0. Thus H is an essential supremum for $\mathcal{E}$ in Σ. As $\mathcal{E}$ is arbitrary, (X, Σ, µ) is localizable.
(ii) If E ∈ Σ and µE = ∞, then there is some i ∈ I such that $0 < \mu(E\cap X_i)\le\mu X_i < \infty$; so (X, Σ, µ) is semi-finite. If E ⊆ X and E ∩ F ∈ Σ whenever µF < ∞, then, in particular, $E\cap X_i\in\Sigma$ for every i ∈ I, so E ∈ Σ; thus (X, Σ, µ) is locally determined.
211M Example: Lebesgue measure Let us consider Lebesgue measure in the light of the concepts above. Write µ for Lebesgue measure on $\mathbb{R}^r$ and Σ for its domain.
(a) µ is complete, because it is constructed by Carathéodory’s method; if A ⊆ E and µE = 0, then $\mu^*A = \mu^*E = 0$ (writing $\mu^*$ for Lebesgue outer measure), so, for any $B\subseteq\mathbb{R}^r$,
$\mu^*(B\cap A) + \mu^*(B\setminus A) \le 0 + \mu^*B = \mu^*B$,
and A must be measurable.
(b) µ is σ-finite, because $\mathbb{R}^r = \bigcup_{n\in\mathbb{N}}[-\mathbf{n},\mathbf{n}]$, writing $\mathbf{n}$ for the vector (n, . . . , n), and $\mu[-\mathbf{n},\mathbf{n}] = (2n)^r < \infty$ for every n. Of course µ is neither totally finite nor a probability measure.
(c) Because µ is σ-finite, it is strictly localizable (211Lc), localizable (211Ld), locally determined (211Ld) and semi-finite (211Le-f).
(d) µ is atomless. P Suppose that E ∈ Σ and µE > 0. Consider the function
$a\mapsto f(a) = \mu(E\cap[-\mathbf{a},\mathbf{a}]) : [0,\infty[\ \to\mathbb{R}$.
We have
$f(a)\le f(b)\le f(a) + \mu[-\mathbf{b},\mathbf{b}] - \mu[-\mathbf{a},\mathbf{a}] = f(a) + (2b)^r - (2a)^r$
whenever a ≤ b in [0, ∞[, so f is continuous. Now f(0) = 0 and $\lim_{n\to\infty} f(n) = \mu E > 0$.
By the Intermediate Value Theorem there is an a ∈ [0, ∞[ such that 0 < f (a) < µE. So we have
0 < µ(E ∩ [−a, a]) < µE. As E is arbitrary, µ is atomless. Q

(e) It is now a trivial observation that µ cannot be purely atomic, because ℝ^r itself is a set of positive measure not including any atom.

211N Counting measure Take X to be any uncountable set (e.g., X = ℝ), and µ to be counting measure on X (112Bd).

(a) µ is complete, because if A ⊆ E and µE = 0 then A = E = ∅ ∈ Σ.

(b) µ is not σ-finite, because if ⟨E_n⟩_{n∈ℕ} is any sequence of sets of finite measure then every E_n is finite, therefore countable, and ⋃_{n∈ℕ} E_n is countable (1A1F), so cannot be X. A fortiori, µ is neither a probability measure nor totally finite.

(c) µ is strictly localizable. P Set X_x = {x} for every x ∈ X. Then ⟨X_x⟩_{x∈X} is a partition of X, and for any E ⊆ X, µ(E ∩ X_x) = 1 if x ∈ E, 0 otherwise. By the definition of µ,

µE = ∑_{x∈X} µ(E ∩ X_x)

for every E ⊆ X, and ⟨X_x⟩_{x∈X} is a decomposition of X. Q Consequently µ is localizable, locally determined and semi-finite.

(d) µ is purely atomic. P {x} is an atom for every x ∈ X, and if µE > 0 then surely E includes {x} for some x. Q Obviously, µ is not atomless.

211O A non-semi-finite space Set X = {0}, Σ = {∅, X}, µ∅ = 0 and µX = ∞. Then µ is not semi-finite, as µX = ∞ but X has no subset of non-zero finite measure. It follows that µ cannot be localizable, locally determined, σ-finite, totally finite nor a probability measure. Because Σ = 𝒫X, µ is complete. X is an atom for µ, so µ is purely atomic (indeed, it is point-supported).

211P A non-complete space Write ℬ for the σ-algebra of Borel subsets of ℝ (111G), and µ for the restriction of Lebesgue measure to ℬ (recall that by 114G every Borel subset of ℝ is Lebesgue measurable). Then (ℝ, ℬ, µ) is atomless, σ-finite and not complete.

proof (a) To see that µ is not complete, recall that there is a continuous, strictly increasing bijection g : [0, 1] → [0, 1] such that µg[C] > 0, where C is the Cantor set, so that there is a set A ⊆ g[C] which is not Lebesgue measurable (134Ib). Now g^{-1}[A] ⊆ C cannot be a Borel set, since χA = χ(g^{-1}[A]) ∘ g^{-1} is not Lebesgue measurable, therefore not Borel measurable, and the composition of two Borel measurable functions is Borel measurable (121Eg); so g^{-1}[A] is a non-measurable subset of the negligible set C.

(b) The rest of the arguments of 211M apply to µ just as well as to true Lebesgue measure, so µ is σ-finite and atomless.

*Remark The argument offered in (a) could give rise to a seriously false impression. The set A referred to there can be constructed only with the use of a strong form of the axiom of choice. No such device is necessary for the result here. There are many methods of constructing non-Borel subsets of the Cantor set, all illuminating in different ways, and some do not need the axiom of choice at all; I hope to return to this question in Volumes 4 and 5.
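The failure of semi-finiteness in 211O can be checked mechanically on a finite σ-algebra. The following is a minimal sketch, not part of the original text: the function name `is_semifinite` and the dictionary representation of a measure are assumptions made purely for illustration, and the test applies only to finite families of sets.

```python
import math
from itertools import combinations


def powerset(xs):
    """All subsets of a finite collection, as frozensets."""
    xs = sorted(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


def is_semifinite(sigma, mu):
    """Check the defining condition of semi-finiteness on a finite sigma-algebra.

    sigma: list of frozensets; mu: dict mapping each member of sigma to a value
    in [0, inf].  Every set of infinite measure must include a member of sigma
    of finite non-zero measure.
    """
    for E in sigma:
        if mu[E] == math.inf:
            if not any(F <= E and 0 < mu[F] < math.inf for F in sigma):
                return False
    return True


# The space of 211O: X = {0}, Sigma = {empty set, X}, mu(X) = infinity.
X = frozenset({0})
sigma_O = [frozenset(), X]
mu_O = {frozenset(): 0, X: math.inf}

# Counting measure on a three-point set (hypothetical finite comparison case).
Y = frozenset({0, 1, 2})
sigma_c = powerset(Y)
mu_c = {E: len(E) for E in sigma_c}

print(is_semifinite(sigma_O, mu_O))
print(is_semifinite(sigma_c, mu_c))
```

On a finite σ-algebra the check can succeed only when no set has infinite measure, since a minimal non-empty member of infinite measure has no candidate subsets; so the sketch is mainly useful for seeing exactly where 211O fails.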
211Q Some probability spaces Two obvious constructions of probability spaces, restricting myself to the methods described in Volume 1, are (a) the subspace measure induced by Lebesgue measure on [0, 1] (131B);
(b) the point-supported measure induced on a set X by a function h : X → [0, 1] such that ∑_{x∈X} h(x) = 1, writing µE = ∑_{x∈E} h(x) for every E ⊆ X; for instance, if X is a singleton {x} and h(x) = 1, or if X = ℕ and h(n) = 2^{−n−1}.

Of these two, (a) gives an atomless probability measure and (b) gives a purely atomic probability measure.

211R The countable-cocountable measure The following is one of the basic constructions to keep in mind when considering abstract measure spaces.

(a) Let X be any set. Let Σ be the family of those sets E ⊆ X such that either E or X \ E is countable. Then Σ is a σ-algebra of subsets of X. P (i) ∅ is countable, so belongs to Σ. (ii) The condition for E to belong to Σ is symmetric between E and X \ E, so X \ E ∈ Σ for every E ∈ Σ. (iii) Let ⟨E_n⟩_{n∈ℕ} be any sequence in Σ, and set E = ⋃_{n∈ℕ} E_n. If every E_n is countable, then E is countable, so belongs to Σ. Otherwise, there is some n such that X \ E_n is countable, in which case X \ E ⊆ X \ E_n is countable, so again E ∈ Σ. Q Σ is called the countable-cocountable σ-algebra of X.

(b) Now consider the function µ : Σ → {0, 1} defined by writing µE = 0 if E is countable, µE = 1 if E ∈ Σ and E is not countable. Then µ is a measure. P (i) ∅ is countable so µ∅ = 0. (ii) Let ⟨E_n⟩_{n∈ℕ} be a disjoint sequence in Σ, and E its union. (α) If every E_m is countable, then so is E, so

µE = 0 = ∑_{n=0}^{∞} µE_n.

(β) If some E_m is uncountable, then E ⊇ E_m is also uncountable, and µE = µE_m = 1. But in this case, because E_m ∈ Σ, X \ E_m is countable, so E_n, being a subset of X \ E_m, is countable for every n ≠ m; thus µE_n = 0 for every n ≠ m, and

µE = 1 = ∑_{n=0}^{∞} µE_n.

As ⟨E_n⟩_{n∈ℕ} is arbitrary, µ is a measure. Q (µ is called the countable-cocountable measure on X.)

(c) If X is any uncountable set and µ is the countable-cocountable measure on X, then µ is a complete, purely atomic probability measure, but is not point-supported.
P (i) If A ⊆ E and µE = 0, then E is countable, so A is also countable and belongs to Σ. Thus µ is complete. (ii) Because X is uncountable, µX = 1 and µ is a probability measure. (iii) If µE > 0, then µF = µE = 1 whenever F is a non-negligible measurable subset of E, so E is itself an atom; thus µ is purely atomic. (iv) µX = 1 > 0 = ∑_{x∈X} µ{x}, so µ is not point-supported. Q

211X Basic exercises

> (a) Let µ be counting measure on a set X. Show that µ is always strictly localizable and purely atomic, and that it is σ-finite iff X is countable, totally finite iff X is finite, a probability measure iff X is a singleton, and atomless iff X is empty.

> (b) Let g : ℝ → ℝ be a non-decreasing function and µ_g the associated Lebesgue-Stieltjes measure (114Xa). Show that µ_g is complete and σ-finite. Show that (i) µ_g is totally finite iff g is bounded; (ii) µ_g is a probability measure iff lim_{x→∞} g(x) − lim_{x→−∞} g(x) = 1; (iii) µ_g is atomless iff g is continuous; (iv) if E is any atom for µ_g, there is a point x ∈ E such that µ_g E = µ_g{x}; (v) µ_g is purely atomic iff it is point-supported.

(c) Let X be a set. Show that for any σ-ideal ℐ of subsets of X (definition: 112Db), the set Σ = ℐ ∪ {X \ A : A ∈ ℐ} is a σ-algebra of subsets of X, and that there is a measure µ : Σ → {0, 1} given by setting

µE = 0 if E ∈ ℐ, µE = 1 if E ∈ Σ \ ℐ.

Show that ℐ is precisely the ideal of µ-negligible sets, that µ is complete, totally finite and purely atomic, and is a probability measure iff X ∉ ℐ.
> (d) Let (X, Σ, µ) be a measure space, Y any set and φ : X → Y a function. Let µφ^{-1} be the image measure as defined in 112E. Show that (i) µφ^{-1} is complete whenever µ is; (ii) µφ^{-1} is a probability measure iff µ is; (iii) µφ^{-1} is totally finite iff µ is; (iv) µ is σ-finite if µφ^{-1} is; (v) if µ is purely atomic and σ-finite, then µφ^{-1} is purely atomic; (vi) if µ is purely atomic and µφ^{-1} is semi-finite, then µφ^{-1} is purely atomic.

> (e) Let (X, Σ, µ) be a measure space. Show that µ is σ-finite iff there is a totally finite measure ν on X with the same measurable sets and the same negligible sets as µ.

(f) Show that a point-supported measure is strictly localizable iff it is semi-finite.

211Y Further exercises

(a) Let Σ be the countable-cocountable σ-algebra of ℝ. Show that [0, ∞[ ∉ Σ. Let µ be the restriction of counting measure to Σ. Show that (ℝ, Σ, µ) is complete, semi-finite and purely atomic, but not localizable nor locally determined.

(b) Let (X, Σ, µ) be a measure space, and for E, F ∈ Σ write E ∼ F if µ(E △ F) = 0. Show that ∼ is an equivalence relation on Σ. Let 𝔄 be the set of equivalence classes in Σ for ∼; for E ∈ Σ, write E^• ∈ 𝔄 for its equivalence class. Show that there is a partial ordering ⊆ on 𝔄 defined by saying that, for E, F ∈ Σ,

E^• ⊆ F^• ⇐⇒ µ(E \ F) = 0.

Show that µ is localizable iff for every A ⊆ 𝔄 there is an h ∈ 𝔄 such that (i) a ⊆ h for every a ∈ A (ii) whenever g ∈ 𝔄 is such that a ⊆ g for every a ∈ A, then h ⊆ g.

(c) Let (X, Σ, µ) be a measure space, and construct 𝔄 as in (b) above. Show that there are operations ∪, ∩, \ on 𝔄 defined by saying that

E^• ∩ F^• = (E ∩ F)^•, E^• ∪ F^• = (E ∪ F)^•, E^• \ F^• = (E \ F)^•

for all E, F ∈ Σ. Show that if A ⊆ 𝔄 is any countable set, then there is certainly an h ∈ 𝔄 such that (i) a ⊆ h for every a ∈ A (ii) whenever g ∈ 𝔄 is such that a ⊆ g for every a ∈ A, then h ⊆ g. Show that there is a functional µ̄ : 𝔄 → [0, ∞] defined by saying µ̄(E^•) = µE for every E ∈ Σ. ((𝔄, µ̄) is called the measure algebra of (X, Σ, µ).)

(d) Let (X, Σ, µ) be a semi-finite measure space. Show that it is atomless iff whenever ε > 0, E ∈ Σ and µE < ∞, then there is a finite partition of E into measurable sets of measure at most ε.

(e) Let (X, Σ, µ) be a strictly localizable measure space. Show that it is atomless iff for every ε > 0 there is a decomposition of X consisting of sets of measure at most ε.

211 Notes and comments The list of definitions in 211A-211K probably strikes you as quite long enough, even though I have omitted many occasionally useful ideas. The concepts here vary widely in importance, and the importance of each varies widely with context. My own view is that it is absolutely necessary, when studying any measure space, to know its classification under the eleven discriminating features listed here, and to be able to describe any atoms which are present. Fortunately, for most ‘ordinary’ measure spaces, the classification is fairly quick, because if (for instance) the space is σ-finite, and you know the measure of the whole space, the only remaining questions concern completeness and atoms. The distinctions between
spaces which are, or are not, strictly localizable, semi-finite, localizable and locally determined are relevant only for spaces which are not σ-finite, and do not arise in elementary applications. I think it is also fair to say that the notions of ‘complete’ and ‘locally determined’ measure space are technical; I mean, that they do not correspond to significant features of the essential structure of a space, though there are some interesting problems concerning incomplete measures. One manifestation of this is the existence of canonical constructions for rendering spaces complete or complete and locally determined (212C, 213D-213E). In addition, measure spaces which are not semi-finite do not really belong to measure theory, but rather to the more general study of σ-algebras and σ-ideals. The most important classifications, in terms of the behaviour of a measure space, seem to me to be ‘σ-finite’, ‘localizable’ and ‘strictly localizable’; these are the critical features which cannot be forced by elementary constructions.

If you know anything about Borel subsets of the real line, the argument of part (a) of the proof of 211P must look very clumsy. But ‘better’ proofs rely on ideas which we shall not need until Volume 4, and the proof here is based on a construction which we have to understand for other reasons.
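As a concrete companion to the point-supported probability measure of 211Q(b), with X = ℕ and h(n) = 2^{−n−1}, here is a small sketch in exact rational arithmetic. The names `h` and `mu` are chosen for illustration only, and the code handles finite subsets of ℕ, so it can only approximate µℕ = 1 from below.

```python
from fractions import Fraction


def h(n):
    """Point mass at n, following the example h(n) = 2^(-n-1) of 211Q(b)."""
    return Fraction(1, 2 ** (n + 1))


def mu(E):
    """Measure of a finite set E of natural numbers: the sum of its point masses."""
    return sum((h(n) for n in E), Fraction(0))


# The first ten point masses sum to 1 - 2^(-10); the full series sums to 1.
print(mu(range(10)))

# Finite additivity on disjoint sets.
print(mu({0, 2}) + mu({1}) == mu({0, 1, 2}))
```

Every singleton {n} has positive measure here, so each one is an atom, which is the sense in which such a measure is purely atomic.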
212 Complete spaces

In the next two sections of this chapter I give brief accounts of the theory of measure spaces possessing certain of the properties described in §211. I begin with ‘completeness’. I give the elementary facts about complete measure spaces in 212A-212B; then I turn to the notion of ‘completion’ of a measure (212C) and its relationships with the other concepts of measure theory introduced so far (212D-212H).

212A Proposition Any measure space constructed by Carathéodory's method is complete.

proof Recall that ‘Carathéodory's method’ starts from an arbitrary outer measure θ : 𝒫X → [0, ∞] and sets

Σ = {E : E ⊆ X, θA = θ(A ∩ E) + θ(A \ E) for every A ⊆ X}, µ = θ↾Σ

(113C). In this case, if B ⊆ E ∈ Σ and µE = 0, then θB = θE = 0 (113A(ii)), so for any A ⊆ X we have

θ(A ∩ B) + θ(A \ B) = θ(A \ B) ≤ θA ≤ θ(A ∩ B) + θ(A \ B),

and B ∈ Σ.

212B Proposition (a) If (X, Σ, µ) is a complete measure space, then any conegligible subset of X is measurable.

(b) Let (X, Σ, µ) be a complete measure space, and f a real-valued function defined on a subset of X. If f is virtually measurable (that is, there is a conegligible set E ⊆ X such that f↾E is measurable), then f is measurable.

(c) Let (X, Σ, µ) be a complete measure space, and f a real-valued function defined on a conegligible subset of X. Then the following are equiveridical, that is, if one is true so are the others: (i) f is integrable; (ii) f is measurable and |f| is integrable; (iii) f is measurable and there is an integrable function g such that |f| ≤_{a.e.} g (that is, |f| ≤ g almost everywhere).

(d) Let (X, Σ, µ) be a complete measure space, Y a set and f : X → Y a function. Then the image measure µf^{-1} (112E) is complete.

proof (a) If E is conegligible, then X \ E is negligible, therefore measurable, and E is measurable.

(b) Let a ∈ ℝ. Then there is an H ∈ Σ such that

{x : (f↾E)(x) ≤ a} = H ∩ dom(f↾E) = H ∩ E ∩ dom f.

Now F = {x : x ∈ dom f \ E, f(x) ≤ a} is a subset of the negligible set X \ E, so is measurable, and

{x : f(x) ≤ a} = (F ∪ H) ∩ dom f ∈ Σ_{dom f},
writing Σ_D = {D ∩ E : E ∈ Σ}, as in 121A. As a is arbitrary, f is measurable.

(c) A real-valued function f on a general measure space (X, Σ, µ) is integrable iff (α) there is a conegligible set E ⊆ dom f such that f↾E is measurable (β) there is a non-negative integrable function g such that |f| ≤_{a.e.} g (122P(iii)). But in view of (b), we can in the present context restate (α) as ‘f is defined a.e. and measurable’; which is the version here. (The shift from ‘non-negative integrable g’ to ‘integrable g’ is trivial, because if g is integrable and |f| ≤_{a.e.} g, then |g| is non-negative and integrable and |f| ≤_{a.e.} |g|.)

(d) If B ⊆ F ⊆ Y and (µf^{-1})(F) = 0, then f^{-1}[B] ⊆ f^{-1}[F] and µ(f^{-1}[F]) = 0, so (because µ is complete) f^{-1}[B] is measurable and (µf^{-1})(B) = µ(f^{-1}[B]) = 0.

212C The completion of a measure Let (X, Σ, µ) be any measure space.

(a) Let Σ̂ be the family of those sets E ⊆ X such that there are E′, E″ ∈ Σ with E′ ⊆ E ⊆ E″ and µ(E″ \ E′) = 0. Then Σ̂ is a σ-algebra of subsets of X. P (i) Of course ∅ belongs to Σ̂, because we can take E′ = E″ = ∅. (ii) If E ∈ Σ̂, take E′, E″ ∈ Σ such that E′ ⊆ E ⊆ E″ and µ(E″ \ E′) = 0. Then

X \ E″ ⊆ X \ E ⊆ X \ E′, µ((X \ E′) \ (X \ E″)) = µ(E″ \ E′) = 0,

so X \ E ∈ Σ̂. (iii) If ⟨E_n⟩_{n∈ℕ} is a sequence in Σ̂, then for each n choose E′_n, E″_n ∈ Σ such that E′_n ⊆ E_n ⊆ E″_n and µ(E″_n \ E′_n) = 0. Set

E = ⋃_{n∈ℕ} E_n, E′ = ⋃_{n∈ℕ} E′_n, E″ = ⋃_{n∈ℕ} E″_n;

then E′ ⊆ E ⊆ E″ and E″ \ E′ ⊆ ⋃_{n∈ℕ} (E″_n \ E′_n) is negligible, so E ∈ Σ̂. Q

(b) For E ∈ Σ̂, set

µ̂E = µ*E = inf{µF : E ⊆ F ∈ Σ}.

It is worth remarking at once that if E ∈ Σ̂, E′, E″ ∈ Σ, E′ ⊆ E ⊆ E″ and µ(E″ \ E′) = 0, then µE′ = µ̂E = µE″; this is because

µE′ = µ*E′ ≤ µ*E ≤ µ*E″ = µE″ = µE′ + µ(E″ \ E′) = µE′

(recalling from 132A, or noting now, that µ*A ≤ µ*B whenever A ⊆ B ⊆ X, and that µ* agrees with µ on Σ).

(c) We now find that (X, Σ̂, µ̂) is a measure space. P (i) Of course µ̂, like µ, takes values in [0, ∞]. (ii) µ̂∅ = µ∅ = 0. (iii) Let ⟨E_n⟩_{n∈ℕ} be a disjoint sequence in Σ̂, with union E. For each n ∈ ℕ choose E′_n, E″_n ∈ Σ such that E′_n ⊆ E_n ⊆ E″_n and µ(E″_n \ E′_n) = 0. Set E′ = ⋃_{n∈ℕ} E′_n, E″ = ⋃_{n∈ℕ} E″_n. Then (as in (a-iii) above) E′ ⊆ E ⊆ E″ and µ(E″ \ E′) = 0, so

µ̂E = µE′ = ∑_{n=0}^{∞} µE′_n = ∑_{n=0}^{∞} µ̂E_n

because ⟨E′_n⟩_{n∈ℕ}, like ⟨E_n⟩_{n∈ℕ}, is disjoint. Q

(d) The measure space (X, Σ̂, µ̂) is called the completion of the measure space (X, Σ, µ); equally, I will call µ̂ the completion of µ, and occasionally (if it is plain which ideal of negligible sets is under consideration) I will call Σ̂ the completion of Σ. Members of Σ̂ are sometimes called µ-measurable.

212D There is something I had better check at once.

Proposition Let (X, Σ, µ) be any measure space. Then (X, Σ̂, µ̂), as defined in 212C, is a complete measure space and µ̂ is an extension of µ; and (X, Σ̂, µ̂) = (X, Σ, µ) iff (X, Σ, µ) is complete.

proof (a) Suppose that A ⊆ E ∈ Σ̂ and µ̂E = 0. Then (by 212Cb) there is an E″ ∈ Σ such that E ⊆ E″ and µE″ = 0. Accordingly we have

∅ ⊆ A ⊆ E″, µ(E″ \ ∅) = 0,

so A ∈ Σ̂. As A is arbitrary, µ̂ is complete.
(b) If E ∈ Σ, then of course E ∈ Σ̂, because E ⊆ E ⊆ E and µ(E \ E) = 0; and µ̂E = µ*E = µE. Thus Σ ⊆ Σ̂ and µ̂ extends µ.

(c) If µ = µ̂ then of course µ must be complete. If µ is complete, and E ∈ Σ̂, then there are E′, E″ ∈ Σ such that E′ ⊆ E ⊆ E″ and µ(E″ \ E′) = 0. But now E \ E′ ⊆ E″ \ E′, so (because (X, Σ, µ) is complete) E \ E′ ∈ Σ and E = E′ ∪ (E \ E′) ∈ Σ. As E is arbitrary, Σ̂ ⊆ Σ and Σ̂ = Σ and µ = µ̂.

212E The importance of this construction is such that it is worth spelling out some further elementary properties.

Proposition Let (X, Σ, µ) be a measure space, and (X, Σ̂, µ̂) its completion.

(a) The outer measures µ̂*, µ* defined from µ̂ and µ coincide.

(b) µ, µ̂ give rise to the same negligible and conegligible sets.

(c) µ̂ is the only measure with domain Σ̂ which agrees with µ on Σ.

(d) A subset of X belongs to Σ̂ iff it is expressible as F △ A where F ∈ Σ and A is µ-negligible.

proof (a) Take any A ⊆ X. (i) If A ⊆ F ∈ Σ, then F ∈ Σ̂ and µF = µ̂F, so µ̂*A ≤ µ̂F = µF; as F is arbitrary, µ̂*A ≤ µ*A. (ii) If A ⊆ E ∈ Σ̂, there is an E″ ∈ Σ such that E ⊆ E″ and µE″ = µ̂E, so µ*A ≤ µE″ = µ̂E; as E is arbitrary, µ*A ≤ µ̂*A.

(b) Now, for A ⊆ X,

A is µ-negligible ⇐⇒ µ*A = 0 ⇐⇒ µ̂*A = 0 ⇐⇒ A is µ̂-negligible,

A is µ-conegligible ⇐⇒ µ*(X \ A) = 0 ⇐⇒ µ̂*(X \ A) = 0 ⇐⇒ A is µ̂-conegligible.

(c) If µ̃ is any measure with domain Σ̂ extending µ, we must have

µE′ ≤ µ̃E ≤ µE″, µE′ = µ̂E = µE″,

so µ̃E = µ̂E, whenever E′, E″ ∈ Σ, E′ ⊆ E ⊆ E″ and µ(E″ \ E′) = 0.

(d)(i) If E ∈ Σ̂, take E′, E″ ∈ Σ such that E′ ⊆ E ⊆ E″ and µ(E″ \ E′) = 0. Then E \ E′ ⊆ E″ \ E′, so E \ E′ is µ-negligible, and E = E′ △ (E \ E′) is the symmetric difference of a member of Σ and a negligible set. (ii) If E = F △ A, where F ∈ Σ and A is µ-negligible, take G ∈ Σ such that µG = 0 and A ⊆ G; then F \ G ⊆ E ⊆ F ∪ G and µ((F ∪ G) \ (F \ G)) = µG = 0, so E ∈ Σ̂.
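On a finite set the two descriptions of Σ̂, the ‘sandwich’ definition of 212Ca and the symmetric-difference form of 212Ed, can be compared by brute force. What follows is only an illustrative sketch: the four-point space, its atoms and weights, and all function names are hypothetical choices, not taken from the text.

```python
from itertools import combinations


def powerset(xs):
    """All subsets of a finite collection, as frozensets."""
    xs = sorted(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


def generated_algebra(atoms):
    """All unions of pairwise disjoint atoms: the finite sigma-algebra they generate."""
    return [frozenset().union(*c)
            for r in range(len(atoms) + 1)
            for c in combinations(atoms, r)]


def completion(X, sigma, mu):
    """Sets E with E1 <= E <= E2 for some E1, E2 in sigma with mu(E2 - E1) = 0 (212Ca)."""
    return {E for E in powerset(X)
            if any(E1 <= E <= E2 and mu[E2 - E1] == 0
                   for E1 in sigma for E2 in sigma)}


def by_symmetric_difference(sigma, mu):
    """Sets F ^ A with F in sigma and A included in a null member of sigma (212Ed)."""
    null_subsets = {A for G in sigma if mu[G] == 0 for A in powerset(G)}
    return {F ^ A for F in sigma for A in null_subsets}


def outer(E, sigma, mu):
    """Outer measure: the least mu(F) over members F of sigma including E (212Cb)."""
    return min(mu[F] for F in sigma if E <= F)


# Hypothetical example: X = {0, 1, 2, 3} with atoms {0}, {1, 2}, {3};
# the atom {1, 2} is given measure 0, so its subsets are negligible.
X = frozenset({0, 1, 2, 3})
atoms = {frozenset({0}): 1, frozenset({1, 2}): 0, frozenset({3}): 2}
sigma = generated_algebra(list(atoms))
mu = {E: sum(w for a, w in atoms.items() if a <= E) for E in sigma}

sigma_hat = completion(X, sigma, mu)
print(len(sigma), len(sigma_hat))
print(sigma_hat == by_symmetric_difference(sigma, mu))
print(outer(frozenset({0, 1}), sigma, mu))
```

In this example the completion turns out to be the whole power set, because every subset of the null atom {1, 2} becomes measurable; the completed measure of {0, 1} is then the outer measure computed by `outer`.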
212F Now let us consider integration with respect to the completion of a measure. ˆ µ Proposition Let (X, Σ, µ) be a measure space and (X, Σ, ˆ) its completion. ˆ (a) A [−∞, ∞]valued function f defined on a subset of X is Σmeasurable R iff it isRµvirtually measurable. (b) Let f be a [−∞, ∞]valued function defined on a subset of X. Then f dµ = f dˆ µ if either is defined in [−∞, ∞]; in particular, f is µintegrable iff it is µ ˆintegrable. ˆ ˆ be such proof (a) (i) Suppose that f is a [−∞, ∞]valued Σmeasurable function. For q ∈ Q let Eq ∈ Σ that {x : f (x)S≤ q} = dom f ∩ Eq , and choose Eq0 , Eq00 ∈ Σ such that Eq0 ⊆ Eq ⊆ Eq00 and µ(Eq00 \ Eq0 ) = 0. Set H = X \ q∈Q (Eq00 \ Eq0 ); then H is conegligible. For a ∈ R set S Ga = q∈Q,q 0. If F ∈ Σ and F ⊆ H, let F ⊆ F be such that F ∈ Σ and µ ˆ(F \ F ) = 0. Then E ∩ F 0 ⊆ E
and µ ˆ(F 4(E ∩ F 0 )) = 0, so either µ ˆF = µ(E ∩ F 0 ) = 0 or µ ˆ(H \ F ) = µ(E \ F ) = 0. As F is arbitrary, H is an atom for µ ˆ. ˆ µ (iii) It follows at once that (X, Σ, ˆ) is atomless iff (X, Σ, µ) is. α) On the other hand, if (X, Σ, µ) is purely atomic and µ (iv)(α ˆH > 0, there is an E ∈ Σ such that E ⊆ H and µE > 0, and an atom F for µ such that F ⊆ E; but F is also an atom for µ ˆ. As H is arbitrary, ˆ µ (X, Σ, ˆ) is purely atomic. ˆ µ β ) And if (X, Σ, (β ˆ) is purely atomic and µE > 0, then there is an H ⊆ E which is an atom for µ ˆ; now let F ∈ Σ be such that F ⊆ H and µ ˆ(H \ F ) = 0, so that F is an atom for µ and F ⊆ E. As E is arbitary, (X, Σ, µ) is purely atomic. 212X Basic exercises > (a) Let (X, Σ, µ) be a complete measure space. Suppose that A ⊆ E ∈ Σ and that µ∗ A + µ∗ (E \ A) = µE < ∞. Show that A ∈ Σ. > (b) Let µ and ν be two measures on a set X, with completions µ ˆ and νˆ. Show that the following are ∗ ∗ equiveridical: (i) the outerR measures µ , ν defined from µ and ν coincide; (ii) µ ˆE = νˆE whenever either R is defined and finite; (iii) f dµ = f dν whenever f is a realvalued function such that either integral is defined and finite. (Hint: for (i)⇒(ii), if µ ˆE < ∞, take a measurable envelope F of E for ν and calculate ν ∗ E + ν ∗ (F \ E).) (c) Let µ be the restriction of Lebesgue measure to the Borel σalgebra of R, as in 211P. Show that its completion is Lebesgue measure itself. (Hint: 134F.) (d) Repeat 212Xc for (i) Lebesgue measure on R r (ii) LebesgueStieltjes measures on R (114Xa). (e) Let X be a set and µ1 , µ2 two measures on X, with domains Σ1 , Σ2 respectively. Let µ = µ1 + µ2 be their sum, with domain Σ = Σ1 ∩ Σ2 (112Xe). Show that if (X, Σ1 , µ1 ) and (X, Σ2 , µ2 ) are complete, so is (X, Σ, µ). (f ) Let X be a set and Σ a σalgebra of subsets of X. Let I be a σideal of subsets of X (112Db). (i) Show that Σ1 = {E4A : E ∈ Σ, A ∈ I} is a σalgebra of subsets of X. 
(ii) Let Σ2 be the family of sets E ⊆ X such that there are E 0 , E 00 ∈ Σ with E 0 ⊆ E ⊆ E 00 and E 00 \ E 0 ∈ I. Show that Σ2 is a σalgebra of subsets of X and that Σ2 ⊆ Σ1 . (iii) Show that Σ2 = Σ1 iff every member of I is included in a member of Σ ∩ I. (g) Let (X, Σ, µ) be a measure space, Y any set and φ : X → Y a function. Set θB = µ∗ φ−1 [B] for every B ⊆ Y . (i) Show that θ is an outer measure on Y . (ii) Let ν be the measure defined from θ by Carath´eodory’s method, and T its domain. Show that if C ⊆ Y and φ−1 [C] ∈ Σ then C ∈ T. (iii) Suppose that (X, Σ, µ) is complete and totally finite. Show that ν is the image measure µφ−1 . (h) Let X be a set and µ1 , µ2 two complete measures on X, with domains Σ1 , Σ2 . Let µ be their sum, with domain Σ1 ∩ Σ2 , as in 212Xe. Show that a realvalued function f defined on a subset of X is R R R µintegrable iff it is µi integrable for both i, and in this case f dµ = f dµ1 + f dµ2 . (Compare 212Yd.) (i) Let g, h be two nondecreasing functions from R to itself, and µg , µh the associated LebesgueStieltjes measures. Show that a realvalued function f R defined on Ra subset Rof R is µg+h integrable iff it is both µg integrable and µh integrable, and that then f dµg+h = f dµg + f dµh . (Hint: 114Yb). T (j) Let XPbe a set and hµi ii∈I a family of measures on X; write Σi for the domain of µi . Set Σ = i∈I Σi and µE = i∈I µi E for E ∈ Σ, as in 112Ya. (i) Show that if every µi is complete, so is µ. (ii) Suppose that µi is complete. Show that a realvalued function f defined on a subset of X is µintegrable iff R P every f dµ i is defined and finite. (Compare 212Ye.) i∈I (k) Let (X, Σ, µ) be a measure space, and I a σideal of subsets of X. (i) Show that Σ0 = {E ∩ A : E ∈ Σ, A ∈ I} is a σalgebra of subsets of X. (ii) Show that if every member of Σ ∩ I is µnegligible, then there is a unique extension of µ to a measure µ0 with domain Σ0 such that µ0 A = 0 for every A ∈ I.
212Y Further exercises (a) Let µ be the restriction of counting measure to the countablecocountable σalgebra of R, as in 211Ya. Let µ ˆ be the completion of µ, µ∗ the outer measure defined from µ, and µ ˇ the measure defined by Carath´eodory’s method from µ∗ . Show that µ = µ ˆ and that µ ˇ = µ∗ is counting measure on R, so that µ ˆ 6= µ ˇ. P P (b) i∈I µi E = i∈I µi E for T Repeat 212Xe for sums of arbitrary families of measures, saying that E ∈ i∈I dom µi , as in 112Ya, 212Xj. (c) Let X be a set and φ an inner measure on X, that is, a functional from X to [0, ∞] such that φ∅ = 0, φ(A ∪ B) ≥ φA + φB if A ∩ B = ∅, T φ( n∈N An ) = limn→∞ φAn whenever hAn in∈N is a nonincreasing sequence of subsets of X and φA0 < ∞; if φA = ∞, a ∈ R there is a B ⊆ A such that a ≤ φB < ∞. Let µ be the measure defined from φ, that is, µ = φ¹Σ, where Σ = {E : φ(A) = φ(A ∩ E) + φ(A \ E) ∀ A ⊆ X} (113Yg). Show that µ must be complete. (d) Write µL for Lebesgue measure on [0, 1], and ΣL for its domain. Set Σ1 = {E × [0, 1] : E ∈ ΣL }, Σ2 = {[0, 1] × E : E ∈ ΣL } and µ1 (E × [0, 1]) = µ2 ([0, 1] × E) = µL E for every E ∈ ΣL . Let µ = µ1 + µ2 : Σ1 ∩ Σ2 → [0, 2] be Rtheir sum Ras defined in 212Xe and 212Xh above. Show that there is a function f : [0, 1]2 → {0, 1} such that f dµ1 = f dµ2 = 0 but f is not µintegrable. (e) Set X = ω1 + 1 = ω1 ∪ {ω1 } (2A1Dd), and set I = {E : E ⊆ ω1 is countable},
Σ = I ∪ {X \ E : E ∈ I}.
For each ξ < ω1 P define µξ : Σ → {0, 1} by setting µξ E = 1 if ξ ∈ E, 0 otherwise. Show that µξ is a measure on X. Set µ = ξ 0. Set E² = {x : x ∈ E, f (x) ≥ ²},
R
c = sup{ g : g is a simple function, g ≤a.e. f }; we are supposing that c is finite. If F ⊆ E² is measurable and µF < ∞, then ²χF is a simple function and ²χF ≤a.e. f , so ²µF =
R
²χF ≤ c,
µF ≤ c/².
As F is arbitrary, 213A tells us that µE² ≤ c/² is finite. As ² is arbitrary, (γ) is satisfied. As for (δ), if F = {x : x ∈ E, f (x) = ∞} then µF is finite (by (γ)) and nχF ≤a.e. f , so nµF ≤ c, for every n ∈ N, so µF = 0. Q Q (b) Now R suppose that f : D → [0, ∞] is a µvirtually measurable function, where D ⊆ X is conegligible, so that f is defined in [0, ∞] (135F). Then (a) tells us that Z
Z f=
sup
g
g is simple,g≤f a.e.
(if either is finite, and therefore also if either is infinite)
Z =
sup g is simple,g≤f a.e.,µF 0 (213Fc). This E belongs to E and µ(E \ F ) = µE > 0; which is impossible if F is an essential supremum of E. X X ˜ such that µ (ii) Thus µ ˜(H \ F ) = 0 for every H ∈ H. Now take any G ∈ Σ ˜(H \ G) = 0 for every H ∈ H. Let E0 ∈ Σ be such that E0 ⊆ F \ G and µE0 = µ ˜(F \ G); note that F \ E0 ⊇ F ∩ G. If E ∈ E, there is an H ∈ H such that E ⊆ H, so that µ(E \ (F \ E0 )) ≤ µ ˜(H \ (F ∩ G)) ≤ µ ˜(H \ F ) + µ ˜(H \ G) = 0. Because F is an essential supremum for E in Σ, 0 = µ(F \ (F \ E0 )) = µE0 = µ ˜(F \ G). ˜ As H is arbitrary, (X, Σ, ˜ µ This shows that F is an essential supremum for H in Σ. ˜) is localizable. ˜ then H has an essential supremum F in Σ ˜ (iii) The argument of (i)(ii) shows in fact that if H ⊆ Σ ˜ there is an F ∈ Σ such that such that F actually belongs to Σ. Taking H = {H}, we see that if H ∈ Σ µ(H4F ) = 0. (c) We already know that µ ˜E ≤ µE for every E ∈ Σ, with equality if µE < ∞, by 213Fa. (i) If (X, Σ, µ) is semifinite, then for any F ∈ Σ we have µF = sup{µE : E ∈ Σ, E ⊆ F, µE < ∞} = sup{˜ µE : E ∈ Σ, E ⊆ F, µE < ∞} ≤ µ ˜F ≤ µF, so that µ ˜F = µF . (ii) Suppose that µ ˜F = µF for every F ∈ Σ. If µF = ∞, then there must be an E ∈ Σ such that µE < ∞, µ ˆ(F ∩ E) > 0; in which case F ∩ E ∈ Σ and 0 < µ(F ∩ E) < ∞. As F is arbitrary, (X, Σ, µ) is semifinite. R ˜ (iii) If f is nonnegative and f dµ = ∞, then f is µvirtually measurable, therefore Σmeasurable (213Ga), and defined µalmost everywhere, therefore µ ˜almost everywhere. Now Z Z f d˜ µ = sup{ g d˜ µ : g is µ ˜simple, 0 ≤ g ≤ f µ ˜a.e.} Z ≥ sup{ g dµ : g is µsimple, 0 ≤ g ≤ f µa.e.} = ∞ R R R by 213B. With 213Gb, this shows that f d˜ µ = f dµ whenever f is nonnegative R R and f dµ is defined in [0, ∞]. Applying this to the positive and negative parts of f , we see that f d˜ µ = f dµ whenever the latter is defined in [−∞, ∞]. 
˜ is an atom for µ ˜ such that (d)(i) If H ∈ Σ ˜, then (because µ ˜ is semifinite) there is surely an H 0 ∈ Σ 0 0 0 H ⊆ H and 0 < µ ˜H < ∞, and we must have µ ˜(H \ H ) = 0, so that µ ˜H < ∞. Accordingly there is an E ∈ Σ such that E ⊆ H and µ ˜(H \ E) = 0 (213Fc above). We have µE = µ ˜H > 0. If F ∈ Σ and F ⊆ E,
then either µF = µ ˜F = 0 or µ(E \ F ) = µ ˜(H \ F ) = 0. Thus E ∈ Σ is an atom for µ with µ ˜(H4E) = 0 and µE = µ ˜H < ∞. ˜ and there is an atom E for µ such that µE < ∞ and µ ˜ be a subset (ii) If H ∈ Σ ˜(H4E) = 0, let G ∈ Σ of H. We have µ ˜G ≤ µ ˜H = µE < ∞, so there is an F ∈ Σ such that F ⊆ G and µ ˜(G \ F ) = 0. Now either µ ˜G = µ(E ∩ F ) = 0 or µ ˜(H \ G) = ˜ and G ⊆ H; also µ µ(E \ F ) = 0. This is true whenever G ∈ Σ ˜H = µE > 0. So H is an atom for µ ˜. ˜ µ (e) If (X, Σ, µ) is atomless, then (X, Σ, ˜) must be atomless, by (d). ˜ µ If (X, Σ, µ) is purely atomic and H ∈ Σ, ˜H > 0, then there is an E ∈ Σ such that 0 < µ ˆ(H ∩ E) < ∞. Let E1 ∈ Σ be such that E1 ⊆ H ∩ E and µE1 > 0. There is an atom F for µ such that F ⊆ E1 ; now ˜ µ µF < ∞ so F is an atom for µ ˜, by (d). Also F ⊆ H. As H is arbitrary, (X, Σ, ˜) is purely atomic. ˜ µ (f ) If µ = µ ˜, then of course (X, Σ, µ) must be complete and locally determined, because (X, Σ, ˜) is. If ˜ (X, Σ, µ) is complete and locally determined, then µ ˆ = µ so (using the definition in 213D) Σ ⊆ Σ and µ ˜ = µ, by (c) above. 213I Locally determined negligible sets The following simple idea is occasionally useful. Definition A measure space (X, Σ, µ) has locally determined negligible sets if for every nonnegligible A ⊆ X there is an E ∈ Σ such that µE < ∞ and A ∩ E is not negligible. 213J Proposition If a measure space (X, Σ, µ) is either strictly localizable or complete and locally determined, it has locally determined negligible sets. proof Let A ⊆ X be a set such that A ∩ E is negligible whenever µE < ∞; I need to show that A is negligible. (i) If µ is strictly localizable, let hXi ii∈I be a decomposition ∩ Xi is negligible, so S of X. For each i ∈ I, A P there is a negligible Ei ∈ Σ such that A ∩ Xi ⊆ Ei . Set E = i∈I Ei ∩ Xi . Then µE = i∈I µ(Ei ∩ Xi ) = 0 and A ⊆ E, so A is negligible. (ii) If µ is complete and locally determined, take any measurable set E of finite measure. 
Then A ∩ E is negligible, therefore measurable; as E is arbitrary, A is measurable; as µ is semi-finite, A is negligible.

213K Lemma If a measure space (X, Σ, µ) has locally determined negligible sets, and ℰ ⊆ Σ has an essential supremum H ∈ Σ in the sense of 211G, then H \ ⋃ℰ is negligible.

proof Set A = H \ ⋃ℰ. Take any F ∈ Σ such that µF < ∞. Then F ∩ A has a measurable envelope V say (132Ee). If E ∈ ℰ, then

µ(E \ (X \ V)) = µ(E ∩ V) = µ*(E ∩ F ∩ A) = 0,

so H ∩ V = H \ (X \ V) is negligible and F ∩ A is negligible. As F is arbitrary and µ has locally determined negligible sets, A is negligible, as claimed.

213L Proposition Let (X, Σ, µ) be a localizable measure space with locally determined negligible sets; for instance, µ might be either complete, locally determined and localizable or strictly localizable (213J, 211Ld). Then every subset A of X has a measurable envelope.

proof Set

ℰ = {E : E ∈ Σ, µ*(A ∩ E) = µE < ∞}.

Let G be an essential supremum for ℰ in Σ.

(i) A \ G is negligible. P Let F be any set of finite measure for µ. Let E be a measurable envelope of A ∩ F. Then E ∈ ℰ so E \ G is negligible. But F ∩ A \ G ⊆ E \ G, so F ∩ A \ G is negligible. Because µ has locally determined negligible sets, this is enough to show that A \ G is negligible. Q
(ii) Let E_0 be a negligible measurable set including A \ G, and set G̃ = E_0 ∪ G, so that G̃ ∈ Σ, A ⊆ G̃ and µ(G̃ \ G) = 0. ?? Suppose, if possible, that there is an F ∈ Σ such that µ^*(A ∩ F) < µ(G̃ ∩ F). Let F_1 ⊆ F be a measurable envelope of A ∩ F. Set H = X \ (F \ F_1); then A ⊆ H. If E ∈ ℰ then
µE = µ^*(A ∩ E) ≤ µ(H ∩ E),
so E \ H is negligible; as E is arbitrary, G \ H is negligible and G̃ \ H is negligible. But G̃ ∩ F \ F_1 ⊆ G̃ \ H and
µ(G̃ ∩ F \ F_1) = µ(G̃ ∩ F) − µ^*(A ∩ F) > 0. X X
This shows that G̃ is a measurable envelope of A, as required.
213M Corollary (a) If (X, Σ, µ) is σ-finite, then every subset of X has a measurable envelope for µ.
(b) If (X, Σ, µ) is localizable, then every subset of X has a measurable envelope for the c.l.d. version of µ.
proof (a) Use 132Ee, or 213L and the fact that σ-finite spaces are strictly localizable (211Lc).
(b) Use 213L and the fact that the c.l.d. version of µ is localizable as well as being complete and locally determined (213Hb).
213N When we come to use the concept of ‘localizability’, it will frequently be through the following characterization.
Theorem Let (X, Σ, µ) be a localizable measure space. Suppose that Φ is a family of measurable real-valued functions, all defined on measurable subsets of X, such that whenever f, g ∈ Φ then f = g almost everywhere on dom f ∩ dom g. Then there is a measurable function h : X → R such that every f ∈ Φ agrees with h almost everywhere on dom f.
proof For q ∈ Q, f ∈ Φ set
E_{fq} = {x : x ∈ dom f, f(x) ≥ q} ∈ Σ.
For each q ∈ Q, let E_q be an essential supremum of {E_{fq} : f ∈ Φ} in Σ. Set
h^*(x) = sup{q : q ∈ Q, x ∈ E_q} ∈ [−∞, ∞]
for x ∈ X, taking sup ∅ = −∞ if necessary.
If f, g ∈ Φ and q ∈ Q, then
E_{fq} \ (X \ (dom g \ E_{gq})) = E_{fq} ∩ dom g \ E_{gq} ⊆ {x : x ∈ dom f ∩ dom g, f(x) ≠ g(x)}
is negligible; as f is arbitrary,
E_q ∩ dom g \ E_{gq} = E_q \ (X \ (dom g \ E_{gq}))
is negligible. Also E_{gq} \ E_q is negligible, so E_{gq} △ (E_q ∩ dom g) is negligible. Set H_g = ⋃_{q∈Q} E_{gq} △ (E_q ∩ dom g); then H_g is negligible. But if x ∈ dom g \ H_g, then, for every q ∈ Q, x ∈ E_q ⟺ x ∈ E_{gq}; it follows that for such x, h^*(x) = g(x). Thus h^* = g almost everywhere on dom g; and this is true for every g ∈ Φ.
The function h^* is not necessarily real-valued. But it is measurable, because
{x : h^*(x) > a} = ⋃{E_q : q ∈ Q, q > a} ∈ Σ
for every real a. So if we modify it by setting
h(x) = h^*(x) if h^*(x) ∈ R,
h(x) = 0 if h^*(x) ∈ {−∞, ∞},
we shall get a measurable real-valued function h : X → R; and for any g ∈ Φ, h(x) will be equal to g(x) at least whenever h^*(x) = g(x), which is true for almost every x ∈ dom g. Thus h is a suitable function.
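As a sanity check on this construction (an illustration added here, not part of the original argument), suppose that Φ consists of just two functions f and g. An essential supremum of a two-element family {A, B} may be taken to be A ∪ B, so under that assumption the sets E_q and the function h^* can be written out explicitly:

```latex
% Illustration only: \Phi = \{f, g\}, with f = g a.e. on dom f \cap dom g.
% Requires amsmath (align*, cases) and amssymb (\mathbb).
\begin{align*}
E_q &= E_{fq} \cup E_{gq}
     = \{x \in \operatorname{dom} f : f(x) \ge q\}
       \cup \{x \in \operatorname{dom} g : g(x) \ge q\}, \\
h^*(x) &= \sup\{q \in \mathbb{Q} : x \in E_q\}
     = \begin{cases}
         \max(f(x), g(x)) & \text{if } x \in \operatorname{dom} f \cap \operatorname{dom} g, \\
         f(x) & \text{if } x \in \operatorname{dom} f \setminus \operatorname{dom} g, \\
         g(x) & \text{if } x \in \operatorname{dom} g \setminus \operatorname{dom} f, \\
         -\infty & \text{otherwise.}
       \end{cases}
\end{align*}
```

Since max(f, g) = f = g except on a negligible subset of dom f ∩ dom g, h^* agrees with each of f, g almost everywhere on its domain, and the final modification only replaces the value −∞ by 0 outside dom f ∪ dom g.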
213O There is an interesting and useful criterion for a space to be strictly localizable which I introduce at this point, though it will be used rarely in this volume.
Proposition Let (X, Σ, µ) be a complete locally determined space.
(a) Suppose that there is a disjoint family ℰ ⊆ Σ such that (α) µE < ∞ for every E ∈ ℰ (β) whenever F ∈ Σ and µF > 0 then there is an E ∈ ℰ such that µ(E ∩ F) > 0. Then (X, Σ, µ) is strictly localizable, ⋃ℰ is conegligible, and ℰ ∪ {X \ ⋃ℰ} is a decomposition of X.
(b) Suppose that ⟨X_i⟩_{i∈I} is a partition of X into measurable sets of finite measure such that whenever E ∈ Σ and µE > 0 there is an i ∈ I such that µ(E ∩ X_i) > 0. Then (X, Σ, µ) is strictly localizable, and ⟨X_i⟩_{i∈I} is a decomposition of X.
proof (a)(i) The first thing to note is that if F ∈ Σ and µF < ∞, there is a countable ℰ′ ⊆ ℰ such that µ(F \ ⋃ℰ′) = 0. P P Set
ℰ′_n = {E : E ∈ ℰ, µ(F ∩ E) ≥ 2^{−n}} for each n ∈ N,
ℰ′ = ⋃_{n∈N} ℰ′_n = {E : E ∈ ℰ, µ(F ∩ E) > 0}.
Because ℰ is disjoint, we must have #(ℰ′_n) ≤ 2^n µF for every n ∈ N, so that every ℰ′_n is finite and ℰ′, being the union of a sequence of countable sets, is countable. Set E′ = ⋃ℰ′ and F′ = F \ E′, so that both E′ and F′ belong to Σ. If E ∈ ℰ′, then E ⊆ E′ so µ(E ∩ F′) = µ∅ = 0; if E ∈ ℰ \ ℰ′, then µ(E ∩ F′) = µ(E ∩ F) = 0. Thus µ(E ∩ F′) = 0 for every E ∈ ℰ. By the hypothesis (β) on ℰ, µF′ = 0, so µ(F \ ⋃ℰ′) = 0, as required. Q Q
(ii) Now suppose that H ⊆ X is such that H ∩ E ∈ Σ for every E ∈ ℰ. In this case H ∈ Σ. P P Let F ∈ Σ be such that µF < ∞. Let ℰ′ ⊆ ℰ be a countable set such that µ(F \ E′) = 0, where E′ = ⋃ℰ′. Then H ∩ (F \ E′) ∈ Σ because (X, Σ, µ) is complete. But also H ∩ E′ = ⋃_{E∈ℰ′} H ∩ E ∈ Σ. So
H ∩ F = (H ∩ (F \ E′)) ∪ (F ∩ (H ∩ E′)) ∈ Σ.
As F is arbitrary and (X, Σ, µ) is locally determined, H ∈ Σ. Q Q
(iii) We find also that µH = ∑_{E∈ℰ} µ(H ∩ E) for every H ∈ Σ. P P (α) Because ℰ is disjoint, we must have ∑_{E∈ℰ′} µ(H ∩ E) ≤ µH for every finite ℰ′ ⊆ ℰ, so
∑_{E∈ℰ} µ(H ∩ E) = sup{∑_{E∈ℰ′} µ(H ∩ E) : ℰ′ ⊆ ℰ is finite} ≤ µH.
(β) For the reverse inequality, consider first the case µH < ∞. By (i), there is a countable ℰ′ ⊆ ℰ such that µ(H \ ⋃ℰ′) = 0, so that
µH = µ(H ∩ ⋃ℰ′) = ∑_{E∈ℰ′} µ(H ∩ E) ≤ ∑_{E∈ℰ} µ(H ∩ E).
(γ) In general, because (X, Σ, µ) is semi-finite,
µH = sup{µF : F ⊆ H, µF < ∞}
≤ sup{∑_{E∈ℰ} µ(F ∩ E) : F ⊆ H, µF < ∞} ≤ ∑_{E∈ℰ} µ(H ∩ E).
So in all cases we have µH ≤ ∑_{E∈ℰ} µ(H ∩ E), and the two are equal. Q Q
(iv) In particular, setting E_0 = X \ ⋃ℰ, E_0 ∈ Σ and µE_0 = 0; that is, ⋃ℰ is conegligible. Consider ℰ^* = ℰ ∪ {E_0}. This is a disjoint cover of X by sets of finite measure (now using the hypothesis (α) on ℰ). If H ⊆ X is such that H ∩ E ∈ Σ for every E ∈ ℰ^*, then H ∈ Σ and
µH = ∑_{E∈ℰ} µ(H ∩ E) = ∑_{E∈ℰ^*} µ(H ∩ E).
Thus ℰ^* (or, if you prefer, the indexed family ⟨E⟩_{E∈ℰ^*}) is a decomposition witnessing that (X, Σ, µ) is strictly localizable.
(b) Apply (a) with ℰ = {X_i : i ∈ I}, noting that E_0 in (iv) is empty, so can be dropped.
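For a concrete instance of (b) (an illustration chosen here, not taken from the text), let X be any set, Σ = 𝒫X and µ counting measure on X, which is complete and locally determined. Writing X_x = {x} for x ∈ X, the hypotheses and the resulting formula read as follows:

```latex
% Illustration only: counting measure \mu on \Sigma = \mathcal{P}X, with X_x = \{x\}.
% Requires amsmath (align*).
\begin{align*}
&\mu X_x = 1 < \infty \quad \text{for every } x \in X, \\
&\mu E > 0 \;\Longrightarrow\; E \neq \emptyset
   \;\Longrightarrow\; \mu(E \cap X_x) = 1 > 0 \text{ for any } x \in E, \\
&\mu H = \#(H) = \sum_{x \in X} \mu(H \cap X_x)
   \quad \text{for every } H \subseteq X
   \text{ (both sides being } \infty \text{ if } H \text{ is infinite).}
\end{align*}
```

In this case step (a)(i) of the proof is transparent: a set F of finite measure is a finite set, and the countable subfamily can be taken to be {{x} : x ∈ F}.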
213X Basic exercises (a) Let (X, Σ, µ) be any measure space, µ^* the outer measure defined from µ, and µ̌ the measure defined by Carathéodory’s method from µ^*; write Σ̌ for the domain of µ̌. Show that (i) µ̌ extends the completion µ̂ of µ; (ii) if H ⊆ X is such that H ∩ F ∈ Σ̌ whenever F ∈ Σ and µF < ∞, then H ∈ Σ̌; (iii) (µ̌)^* = µ^*, so that the integrable functions for µ̌ and µ are the same (212Xb); (iv) if µ is strictly localizable then µ̌ = µ̂; (v) if µ is defined by Carathéodory’s method from another outer measure, then µ = µ̌.
> (b) Let µ be counting measure restricted to the countable-cocountable σ-algebra of a set X (211R, 211Ya). (i) Show that the c.l.d. version µ̃ of µ is just counting measure on X. (ii) Show that µ̌, as defined in 213Xa, is equal to µ̃, and in particular strictly extends the completion of µ.
(c) Let (X, Σ, µ) be any measure space. For E ∈ Σ set
µ_sf E = sup{µ(E ∩ F) : F ∈ Σ, µF < ∞}.
(i) Show that (X, Σ, µ_sf) is a semi-finite measure space, and is equal to (X, Σ, µ) iff (X, Σ, µ) is semi-finite. (ii) Show that a µ-integrable real-valued function f is µ_sf-integrable, with the same integral. (iii) Show that if E ∈ Σ and µ_sf E < ∞, then E can be expressed as E_1 ∪ E_2 where E_1, E_2 ∈ Σ, µE_1 = µ_sf E_1 and µ_sf E_2 = 0. (iv) Show that if f is a µ_sf-integrable real-valued function on X, it is equal µ_sf-almost everywhere to a µ-integrable function. (v) Show that if (X, Σ, µ_sf) is complete, so is (X, Σ, µ). (vi) Show that µ and µ_sf have identical c.l.d. versions.
(d) Let (X, Σ, µ) be any measure space. Define µ̌ as in 213Xa. Show that (µ̌)_sf, as constructed in 213Xc, is precisely the c.l.d. version µ̃ of µ, so that µ̌ = µ̃ iff µ̌ is semi-finite.
(e) Let (X, Σ, µ) be a measure space. For A ⊆ X set
µ_*A = sup{µE : E ∈ Σ, µE < ∞, E ⊆ A},
as in 113Yh. (i) Show that the measure constructed from µ_* by the method of 113Yg is just the c.l.d. version µ̃ of µ. (ii) Show that µ̃_* = µ_*. (iii) Show that if ν is another measure on X, with domain T, then µ̃ = ν̃ iff µ_* = ν_*.
(f) Let X be a set and θ an outer measure on X. Show that θ_sf, defined by writing
θ_sf A = sup{θB : B ⊆ A, θB < ∞}
is also an outer measure on X. Show that the measures defined by Carathéodory’s method from θ, θ_sf have the same domains.
(g) Let (X, Σ, µ) be any measure space. Set
µ^*_sf A = sup{µ^*(A ∩ E) : E ∈ Σ, µE < ∞}
for every A ⊆ X. (i) Show that
µ^*_sf A = sup{µ^*B : B ⊆ A, µ^*B < ∞}
for every A. (ii) Show that µ^*_sf is an outer measure. (iii) Show that if A ⊆ X and µ^*_sf A < ∞, there is an E ∈ Σ such that µ^*_sf A = µ^*(A ∩ E) = µE, µ^*_sf(A \ E) = 0. (Hint: take a non-decreasing sequence ⟨E_n⟩_{n∈N} of measurable sets of finite measure such that µ^*_sf A = lim_{n→∞} µ^*(A ∩ E_n), and let E ⊆ ⋃_{n∈N} E_n be a measurable envelope of A ∩ ⋃_{n∈N} E_n.) (iv) Show that the measure defined from µ^*_sf by Carathéodory’s method is precisely the c.l.d. version µ̃ of µ. (v) Show that µ^*_sf = µ̃^*, so that if µ is complete and locally determined then µ^*_sf = µ^*.
> (h) Let (X, Σ, µ) be a strictly localizable measure space with a decomposition ⟨X_i⟩_{i∈I}. Show that µ^*A = ∑_{i∈I} µ^*(A ∩ X_i) for every A ⊆ X.
> (i) Let (X, Σ, µ) be a complete locally determined measure space, and let A ⊆ X be such that max(µ^*(E ∩ A), µ^*(E \ A)) < µE whenever E ∈ Σ and 0 < µE < ∞. Show that A ∈ Σ. (Hint: given µF < ∞, consider the intersection E of measurable envelopes of F ∩ A, F \ A to see that µ^*(F ∩ A) + µ^*(F \ A) = µF.)
> (j) Let (X, Σ, µ) be a measure space, µ̃ its c.l.d. version, and µ̌ the measure defined by Carathéodory’s method from µ^*. (i) Show that the following are equiveridical: (α) µ has locally determined negligible sets; (β) µ and µ̃ have the same negligible sets; (γ) µ̌ = µ̃. (ii) Show that in this case µ is semi-finite.
(k) Let (X, Σ, µ) be a measure space. Show that the following are equiveridical: (i) (X, Σ, µ) has locally determined negligible sets; (ii) the completion µ̂ and c.l.d. version µ̃ of µ have the same sets of finite measure; (iii) µ and µ̃ have the same integrable functions; (iv) µ̃^* = µ^*; (v) the outer measure µ^*_sf of 213Xg is equal to µ^*.
(l) Let us say that a measure space (X, Σ, µ) has the measurable envelope property if every subset of X has a measurable envelope. (i) Show that a semi-finite space with the measurable envelope property has locally determined negligible sets. (ii) Show that a complete semi-finite space with the measurable envelope property is locally determined.
(m) Let (X, Σ, µ) be a semi-finite measure space, and suppose that it satisfies the conclusion of Theorem 213N. Show that it is localizable. (Hint: given ℰ ⊆ Σ, set ℱ = {F : F ∈ Σ, E ∩ F is negligible for every E ∈ ℰ}. Let Φ be the set of functions f from subsets of X to {0, 1} such that f^{−1}[{1}] ∈ ℰ and f^{−1}[{0}] ∈ ℱ.)
(n) Let (X, Σ, µ) be a measure space. Show that its c.l.d. version is strictly localizable iff there is a disjoint family ℰ ⊆ Σ such that µE < ∞ for every E ∈ ℰ and whenever F ∈ Σ, 0 < µF < ∞ there is an E ∈ ℰ such that µ(E ∩ F) > 0.
(o) Show that the c.l.d. version of any point-supported measure is point-supported.
213Y Further exercises (a) Set X = N, and for A ⊆ X set
θA = √(#(A)) if A is finite, ∞ if A is infinite.
Show that θ is an outer measure on X, that θA = sup{θB : B ⊆ A, θB < ∞} for every A ⊆ X, but that the measure µ defined from θ by Carathéodory’s method is not semi-finite. Show that if µ̌ is the measure defined by Carathéodory’s method from µ^* (213Xa), then µ̌ ≠ µ.
(b) Set X = [0, 1] × {0, 1}, and let Σ be the family of those subsets E of X such that
{x : x ∈ [0, 1], E[{x}] ≠ ∅, E[{x}] ≠ {0, 1}}
is countable, writing E[{x}] = {y : (x, y) ∈ E} for each x ∈ [0, 1]. Show that Σ is a σ-algebra of subsets of X. For E ∈ Σ, set µE = #({x : (x, 1) ∈ E}) if this is finite, ∞ otherwise. Show that µ is a complete semi-finite measure. Show that the measure µ̌ defined from µ^* by Carathéodory’s method (213Xa) is not semi-finite. Show that the domain of the c.l.d. version of µ is the whole of 𝒫X.
(c) Set X = N, and for A ⊆ X set
φA = #(A)^2 if A is finite, ∞ if A is infinite.
Show that φ satisfies the conditions of 113Yg, but that the measure defined from φ by the method of 113Yg is not semi-finite.
(d) Let (X, Σ, µ) be a complete locally determined measure space. Suppose that D ⊆ X and that f : D → R is a function. Show that the following are equiveridical: (i) f is measurable; (ii)
µ^*{x : x ∈ D ∩ E, f(x) ≤ a} + µ^*{x : x ∈ D ∩ E, f(x) ≥ b} ≤ µE
whenever a < b in R, E ∈ Σ and µE < ∞; (iii)
max(µ^*{x : x ∈ D ∩ E, f(x) ≤ a}, µ^*{x : x ∈ D ∩ E, f(x) ≥ b}) < µE
whenever a < b in R and 0 < µE < ∞. (Hint: for (iii)⇒(i), show that if E ⊆ X then
µ^*{x : x ∈ D ∩ E, f(x) > a} = sup_{b>a} µ^*{x : x ∈ D ∩ E, f(x) ≥ b},
and use 213Xi above.)
(e) Let (X, Σ, µ) be a complete locally determined measure space and suppose that ℰ ⊆ Σ is such that µE < ∞ for every E ∈ ℰ and whenever F ∈ Σ and µF < ∞ there is a countable ℰ_0 ⊆ ℰ such that F \ ⋃ℰ_0, F ∩ ⋃(ℰ \ ℰ_0) are negligible. Show that (X, Σ, µ) is strictly localizable.
213 Notes and comments I think it is fair to say that if the definition of ‘measure space’ were rewritten to exclude all spaces which are not semi-finite, nothing significant would be lost from the theory. There are solid reasons for not taking such a drastic step, starting with the fact that it would confuse everyone (if you say to an unprepared audience ‘let (X, Σ, µ) be a measure space’, there is a danger that some will imagine that you mean ‘σ-finite measure space’, but very few will suppose that you mean ‘semi-finite measure space’). But the whole point of measure theory is that we distinguish between sets by their measures, and if every subset of E is either non-measurable, or negligible, or of infinite measure, the classification is too crude to support most of the usual ideas, starting, of course, with ordinary integration.
Let us say that a measurable set E is purely infinite if E itself and all its non-negligible measurable subsets have infinite measure. On the definition of the integral which I chose in Volume 1, every simple function, and therefore every integrable function, must be zero almost everywhere on E. This means that the whole theory of integration will ignore E entirely. Looking at the definition of ‘c.l.d. version’ (213D-213E), you will see that the c.l.d. version of the measure will render E negligible, as does the ‘semi-finite version’ described in 213Xc. These amendments do not, however, affect sets of finite measure, and consequently leave integrable functions integrable, with the same integrals.
The strongest reason we have yet seen for admitting non-semi-finite spaces into consideration is that Carathéodory’s method does not always produce semi-finite spaces. (I give examples in 213Ya-213Yb; more important ones are the Hausdorff measures of §§264-265 below.) In practice the right thing to do is often to take the c.l.d. version of the measure produced by Carathéodory’s construction.
It is a reasonable general philosophy, in measure theory, to say that we wish to measure as many sets, and integrate as many functions, as we can manage in a canonical way – I mean, without making blatantly arbitrary choices about the values we assign to our measure or integral. The revision of a measure µ to its c.l.d. version µ̃ is about as far as we can go with an arbitrary measure space in which we have no other structure to guide our choices.
You will observe that µ̃ is not as close to µ as the completion µ̂ of µ is; naturally so, because if E ∈ Σ is purely infinite for µ then we have to choose between setting µ̃E = 0 ≠ µE and finding some way of fitting many sets of finite measure into E; which if E is a singleton will be actually impossible, and in any case would be an arbitrary process. However the integrable functions for µ̃, while not always the same as those for µ (since µ̃ turns purely infinite sets into negligible ones, so that their characteristic functions become integrable), are ‘nearly’ the same, in the sense that any µ̃-integrable function can be changed into a µ-integrable function by adjusting it on a µ̃-negligible set. This corresponds, of course, to the fact that any set of finite measure for µ̃ is the symmetric difference of a set of finite measure for µ and a µ̃-negligible set. For sets of infinite measure this can fail, unless µ is localizable (213Hb, 213Xb).
If (X, Σ, µ) is semi-finite, or localizable, or strictly localizable, then of course it is correspondingly closer to (X, Σ̃, µ̃), as detailed in 213Ha-c.
It is worth noting that while the measure µ̌ obtained by Carathéodory’s method directly from the outer measure µ^* defined from µ may fail to be semi-finite, even when µ is (213Yb), a simple modification of µ^* (213Xg) yields the c.l.d. version µ̃ of µ, which can also be obtained from an appropriate inner measure (213Xe). The measure µ̌ is of course related in other ways to µ̃; see 213Xd.
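The smallest purely infinite example makes these remarks concrete. What follows is a sketch only: the one-point space is chosen here for illustration, and the formulae assumed for the c.l.d. version are those of 213D, namely that its domain is {H : H ∩ E ∈ Σ̂ whenever E ∈ Σ and µE < ∞} and that µ̃H = sup{µ̂(H ∩ E) : E ∈ Σ, µE < ∞}.

```latex
% Illustration only: a one-point space carrying an infinite mass.
% Requires amsmath (align*).
\begin{align*}
&X = \{0\}, \qquad \Sigma = \{\emptyset, X\}, \qquad
  \mu\emptyset = 0, \quad \mu X = \infty, \\
&\{E \in \Sigma : \mu E < \infty\} = \{\emptyset\}, \qquad \hat\mu = \mu, \\
&\tilde\mu X = \sup\{\hat\mu(X \cap E) : E \in \Sigma,\ \mu E < \infty\}
  = \hat\mu\emptyset = 0 \neq \mu X, \\
&\mu_{\mathrm{sf}} X = \sup\{\mu(X \cap F) : F \in \Sigma,\ \mu F < \infty\} = 0 .
\end{align*}
```

Thus χX is µ̃-integrable, with integral 0, but is not µ-integrable; it is equal µ̃-almost everywhere to the zero function, which is µ-integrable.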
214 Subspaces
In §131 I described a construction for subspace measures on measurable subsets. It is now time to give the generalization to subspace measures on arbitrary subsets of a measure space. The relationship between this construction and the properties listed in §211 is not quite as straightforward as one might imagine, and in this section I try to give a full account of what can be expected of subspaces in general. I think that for the present volume only (i) general subspaces of σ-finite spaces and (ii) measurable subspaces of general measure spaces will be needed in any essential way, and these do not give any difficulty; but in later volumes we shall need the full theory.
I begin with a general construction for ‘subspace measures’ (214A-214C), with an account of integration with respect to a subspace measure (214E-214G); these (with 131E-131H) give a solid foundation for the concept of ‘integration over a subset’ (214D). I give this work in its full natural generality, which will eventually be essential, but even for Lebesgue measure alone it is important to be aware of the ideas here. I continue with answers to some obvious questions concerning subspace measures and the properties of measure spaces so far considered, both for general subspaces (214I) and for measurable subspaces (214J), and I mention a basic construction for assembling measure spaces side-by-side, the ‘direct sums’ of 214K-214L.
214A Proposition Let (X, Σ, µ) be a measure space, and Y any subset of X. Let µ^* be the outer measure defined from µ (132A-132B), and set Σ_Y = {E ∩ Y : E ∈ Σ}; let µ_Y be the restriction of µ^* to Σ_Y. Then (Y, Σ_Y, µ_Y) is a measure space.
proof (a) I have noted in 121A that Σ_Y is a σ-algebra of subsets of Y.
(b) Of course µ_Y F ∈ [0, ∞] for every F ∈ Σ_Y.
(c) µ_Y ∅ = µ^*∅ = 0.
(d) If ⟨F_n⟩_{n∈N} is a disjoint sequence in Σ_Y with union F, then choose E_n, E′_n, E ∈ Σ such that F_n = Y ∩ E_n for each n, F_n ⊆ E′_n, µ_Y F_n = µE′_n for each n, F ⊆ E and µ_Y F = µE (using 132Aa repeatedly). Set Gn = En ∩ En ∩ E \ m 0. Then there is an E ∈ Σ such that µE < ∞ and µ^*(E ∩ U) > 0. P P ?? Otherwise, E ∩ U is µ-negligible whenever µE < ∞; because µ has locally determined negligible sets, U is µ-negligible and µ_Y U = µ^*U = 0. X X Q Q Now E ∩ U ∈ Σ_Y and 0 < µ^*(E ∩ U) = µ_Y(E ∩ U) ≤ µE < ∞.
(c) By (a), µ_Y is complete; by (b) and 213J, it is semi-finite.
(d) By (c), µ_Y is complete and semi-finite. To see that it is locally determined, take any U ⊆ Y such that U ∩ V ∈ Σ_Y whenever V ∈ Σ_Y and µ_Y V < ∞. By 213L, there is a measurable envelope E of U for µ; of course E ∩ Y ∈ Σ_Y. I claim that µ(E ∩ Y \ U) = 0. P P Take any F ∈ Σ with µF < ∞. Then F ∩ U ∈ Σ_Y, so
µ_Y(F ∩ E ∩ Y) ≤ µ(F ∩ E) = µ^*(F ∩ U) = µ_Y(F ∩ U) ≤ µ_Y(F ∩ E ∩ Y);
thus µ_Y(F ∩ E ∩ Y) = µ_Y(F ∩ U) and
µ^*(F ∩ E ∩ Y \ U) = µ_Y(F ∩ E ∩ Y \ U) = 0.
Because µ is complete, µ(F ∩ E ∩ Y \ U) = 0; because µ is locally determined and F is arbitrary, µ(E ∩ Y \ U) = 0. Q Q But this means that E ∩ Y \ U ∈ Σ_Y and U ∈ Σ_Y. As U is arbitrary, µ_Y is locally determined.
To see that µ_Y is localizable, let 𝒰 be any family in Σ_Y. Set
ℰ = {E : E ∈ Σ, µE < ∞, µE = µ^*(E ∩ U) for some U ∈ 𝒰},
and let G ∈ Σ be an essential supremum for ℰ in Σ. I claim that G ∩ Y is an essential supremum for 𝒰 in Σ_Y. P P (i) ?? If U ∈ 𝒰 and U \ (G ∩ Y) is not negligible, then (because µ_Y is semi-finite) there is a V ∈ Σ_Y such that V ⊆ U \ G and 0 < µ_Y V < ∞. Now there is an E ∈ Σ such that V ⊆ E and µE = µ^*V. We have µ^*(E ∩ U) ≥ µ^*V = µE, so E ∈ ℰ and E \ G must be negligible; but V ⊆ E \ G is not negligible. X X Thus U \ (G ∩ Y) is negligible for every U ∈ 𝒰. (ii) If W ∈ Σ_Y is such that U \ W is negligible for every U ∈ 𝒰, express W as H ∩ Y where H ∈ Σ. If E ∈ ℰ, there is a U ∈ 𝒰 such that µE = µ^*(E ∩ U); now µ^*(E ∩ U \ W) = 0, so µE = µ^*(E ∩ U ∩ W) ≤ µ(E ∩ H) and E \ H is negligible. As E is arbitrary, H is an essential upper bound for ℰ and G \ H is negligible; but this means that G ∩ Y \ W is negligible. As W is arbitrary, G ∩ Y is an essential supremum for 𝒰. Q Q As 𝒰 is arbitrary, µ_Y is localizable.
214J Measurable subspaces: Proposition Let (X, Σ, µ) be a measure space.
(a) Let E ∈ Σ and let µ_E be the subspace measure, with Σ_E its domain. If (X, Σ, µ) is complete, or totally finite, or σ-finite, or strictly localizable, or semi-finite, or localizable, or locally determined, or atomless, or purely atomic, so is (E, Σ_E, µ_E).
(b) Suppose that ⟨X_i⟩_{i∈I} is a disjoint cover of X by measurable sets (not necessarily of finite measure) such that
Σ = {E : E ⊆ X, E ∩ X_i ∈ Σ ∀ i ∈ I},
µE = ∑_{i∈I} µ(E ∩ X_i) for every E ∈ Σ.
Then (X, Σ, µ) is complete, or strictly localizable, or semi-finite, or localizable, or locally determined, or atomless, or purely atomic, iff (X_i, Σ_{X_i}, µ_{X_i}) has that property for every i ∈ I.
proof I really think that if you have read attentively up to this point, you ought to find this easy. If you are in any doubt, this makes a very suitable set of sixteen exercises to do.
214K Direct sums Let ⟨(X_i, Σ_i, µ_i)⟩_{i∈I} be any indexed family of measure spaces. Set X = ⋃_{i∈I} (X_i × {i}); for E ⊆ X, i ∈ I set E_i = {x : (x, i) ∈ E}. Write
Σ = {E : E ⊆ X, E_i ∈ Σ_i ∀ i ∈ I},
µE = ∑_{i∈I} µ_i E_i for every E ∈ Σ.
Then it is easy to check that (X, Σ, µ) is a measure space; I will call it the direct sum of the family ⟨(X_i, Σ_i, µ_i)⟩_{i∈I}. Note that if (X, Σ, µ) is any decomposable measure space, with decomposition ⟨X_i⟩_{i∈I}, then we have a natural isomorphism between (X, Σ, µ) and the direct sum (X′, Σ′, µ′) = ⊕_{i∈I} (X_i, Σ_{X_i}, µ_{X_i}) of the subspace measures, if we match (x, i) ∈ X′ with x ∈ X for every i ∈ I, x ∈ X_i.
For some of the elementary properties (to put it plainly, I know of no properties which are not elementary) of direct sums, see 214L and 214Xi-214Xl.
214L Proposition Let ⟨(X_i, Σ_i, µ_i)⟩_{i∈I} be a family of measure spaces, with direct sum (X, Σ, µ). Let f be a real-valued function defined on a subset of X. For each i ∈ I, set f_i(x) = f(x, i) whenever (x, i) ∈ dom f.
(a) f is measurable iff f_i is measurable for every i ∈ I.
(b) If f is non-negative, then ∫ f dµ = ∑_{i∈I} ∫ f_i dµ_i if either is defined in [0, ∞].
proof (a) For a ∈ R, set F_a = {(x, i) : (x, i) ∈ dom f, f(x, i) ≥ a}. (i) If f is measurable, i ∈ I and a ∈ R, then there is an E ∈ Σ such that F_a = E ∩ dom f; now
{x : f_i(x) ≥ a} = dom f_i ∩ {x : (x, i) ∈ E}
belongs to the subspace σ-algebra on dom f_i induced by Σ_i. As a is arbitrary, f_i is measurable. (ii) If every f_i is measurable and a ∈ R, then for each i ∈ I there is an E_i ∈ Σ_i such that {x : (x, i) ∈ F_a} = E_i ∩ dom f_i; setting E = {(x, i) : i ∈ I, x ∈ E_i}, F_a = dom f ∩ E belongs to the subspace σ-algebra on dom f. As a is arbitrary, f is measurable.
(b)(i) Suppose first that f is measurable and defined everywhere. Set F_{nk} = {(x, i) : (x, i) ∈ X, f(x, i) ≥ 2^{−n}k} for k, n ∈ N, g_n = ∑_{k=1}^{4^n} 2^{−n} χF_{nk} for n ∈ N, F_{nki} = {x : (x, i) ∈ F_{nk}} for k, n ∈ N and i ∈ I, g_{ni}(x) = g_n(x, i) for i ∈ I, x ∈ X_i. Then
∫ f dµ = lim_{n→∞} ∫ g_n dµ = sup_{n∈N} ∑_{k=1}^{4^n} 2^{−n} µF_{nk}
= sup_{n∈N} ∑_{k=1}^{4^n} ∑_{i∈I} 2^{−n} µ_i F_{nki}
= ∑_{i∈I} sup_{n∈N} ∑_{k=1}^{4^n} 2^{−n} µ_i F_{nki}
= ∑_{i∈I} sup_{n∈N} ∫ g_{ni} dµ_i = ∑_{i∈I} ∫ f_i dµ_i.
(ii) Generally, if ∫ f dµ is defined, there are a measurable g : X → [0, ∞[ and a conegligible measurable set E ⊆ dom f such that g = f on E. Now E_i = {x : (x, i) ∈ E} belongs to Σ_i for each i, and ∑_{i∈I} µ_i(X_i \ E_i) = µ(X \ E) = 0, so E_i is µ_i-conegligible for every i. Setting g_i(x) = g(x, i) for x ∈ X_i, (i) tells us that
∑_{i∈I} ∫ f_i dµ_i = ∑_{i∈I} ∫ g_i dµ_i = ∫ g dµ = ∫ f dµ.
(iii) On the other hand, if ∫ f_i dµ_i is defined for every i ∈ I, then for each i ∈ I we can find a measurable function g_i : X_i → [0, ∞[ and a µ_i-conegligible measurable set E_i ⊆ dom f_i such that g_i = f_i on E_i. Setting g(x, i) = g_i(x) for i ∈ I, x ∈ X_i, (a) tells us that g is measurable, while g = f on {(x, i) : i ∈ I, x ∈ E_i}, which is conegligible (by the calculation in (ii) just above); so
∫ f dµ = ∫ g dµ = ∑_{i∈I} ∫ g_i dµ_i = ∑_{i∈I} ∫ f_i dµ_i,
again using (i) for the middle step.
214M Corollary Let (X, Σ, µ) be a measure space with a decomposition ⟨X_i⟩_{i∈I}. If f is a real-valued function defined on a subset of X, then
(a) f is measurable iff f↾X_i is measurable for every i ∈ I,
(b) if f ≥ 0, then ∫ f = ∑_{i∈I} ∫_{X_i} f if either is defined in [0, ∞].
proof Apply 214L to the direct sum of ⟨(X_i, Σ_{X_i}, µ_{X_i})⟩_{i∈I}, identified with (X, Σ, µ) as in 214K.
214X Basic exercises (a) Let (X, Σ, µ) be a localizable measure space. Show that there is an E ∈ Σ such that the subspace measure µ_E is purely atomic and µ_{X\E} is atomless.
(b) Let (X, Σ, µ) be a measure space, and let µ̂ be the completion of µ. Show that for any Y ⊆ X, the subspace measure µ̂_Y on Y defined from µ̂ is equal to the completion of the subspace measure µ_Y. (Cf. 214Cb.)
(c) Let X be a set, θ a regular outer measure on X, and Y a subset of X. Let µ be the measure on X defined by Carathéodory’s method from θ, µ_Y the subspace measure on Y, and ν the measure on Y defined by Carathéodory’s method from θ↾𝒫Y. (i) Show that ν is an extension of µ_Y. (ii) Show that if F ∈ dom ν and νF < ∞ then F ∈ Σ_Y. (iii) Show that if µ_Y is locally determined (in particular, if µ is either strictly localizable or complete, locally determined and localizable) then ν = µ_Y.
(d) Let (X, Σ, µ) be a localizable measure space, and Y a subset of X such that the subspace measure µ_Y is semi-finite. Show that µ_Y is localizable.
(e) Let (X, Σ, µ) be a measure space, and Y a subset of X such that the subspace measure µ_Y is semi-finite. Show that if µ is atomless or purely atomic, so is µ_Y.
(f) Let (X, Σ, µ) be a localizable measure space, and Y any subset of X. Show that the c.l.d. version of the subspace measure on Y is localizable.
(g) Let (X, Σ, µ) be a measure space with locally determined negligible sets in the sense of 213I, and Y a subset of X, with its subspace measure µ_Y. (i) Show that µ_Y has locally determined negligible sets; in particular, it is semi-finite. (ii) Show that if µ is localizable, so is µ_Y. (iii) Show that a set U ⊆ Y is an atom for µ_Y iff it is expressible as F ∩ Y where F is an atom for µ and µ^*(F ∩ Y) > 0. (iv) Show that if µ is purely atomic, so is µ_Y. (v) Show that if µ is atomless, so is µ_Y.
> (h) Let (X, Σ, µ) be a measure space. Show that (X, Σ, µ) has locally determined negligible sets iff the subspace measure µ_Y is semi-finite for every Y ⊆ X.
> (i) Let ⟨(X_i, Σ_i, µ_i)⟩_{i∈I} be a family of measure spaces, with direct sum (X, Σ, µ) (214K). Set X′_i = X_i × {i} ⊆ X for each i ∈ I. Show that X′_i, with the subspace measure, is isomorphic to (X_i, Σ_i, µ_i). Under what circumstances is ⟨X′_i⟩_{i∈I} a decomposition of X? Show that µ is complete, or strictly localizable, or localizable, or locally determined, or semi-finite, or atomless, or purely atomic iff every µ_i is. Show that a measure space is strictly localizable iff it is isomorphic to a direct sum of totally finite spaces.
> (j) Let ⟨(X_i, Σ_i, µ_i)⟩_{i∈I} be a family of measure spaces, and (X, Σ, µ) their direct sum. Show that the completion of (X, Σ, µ) can be identified with the direct sum of the completions of the (X_i, Σ_i, µ_i), and that the c.l.d. version of (X, Σ, µ) can be identified with the direct sum of the c.l.d. versions of the (X_i, Σ_i, µ_i).
(k) Let ⟨(X_i, Σ_i, µ_i)⟩_{i∈I} be a family of measure spaces. Show that their direct sum has locally determined negligible sets iff every µ_i has.
(l) Let ⟨(X_i, Σ_i, µ_i)⟩_{i∈I} be a family of measure spaces, and (X, Σ, µ) their direct sum. Show that (X, Σ, µ) has the measurable envelope property (213Xl) iff every (X_i, Σ_i, µ_i) has.
214 Notes and comments I take the first part of the section, down to 214H, slowly and carefully, because while none of the arguments are deep (214Eb is the longest) the patterns formed by the results are not always easy to predict. There is a counter-example to a tempting extension of 214H/214Xc in 216Xb.
The message of the second part of the section (214I-214K) is that subspaces inherit many, but not all, of the properties of a measure space; and in particular there is a difficulty with semi-finiteness, unless we have locally determined negligible sets (214Xh). (I give an example in 216Xa.) Of course 213Hb shows that if we start with a localizable space, we can convert it into a complete locally determined localizable space without doing great violence to the structure of the space, so the difficulty is ordinarily superable.
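A small worked instance of the direct sum construction of 214K and the integral formula of 214Lb may help to fix the notation. It is an illustration only: the two summands and the function f are chosen here, and µ_L is assumed to denote Lebesgue measure on [0, 1].

```latex
% Illustration only: direct sum of two copies of ([0,1], \mu_L), index set I = \{1, 2\}.
% Requires amsmath (align*).
\begin{align*}
&X = ([0,1] \times \{1\}) \cup ([0,1] \times \{2\}), \qquad
  \mu E = \mu_L E_1 + \mu_L E_2, \quad E_i = \{x : (x,i) \in E\}, \\
&f(x,1) = x, \quad f(x,2) = 2
  \quad\Longrightarrow\quad f_1(x) = x, \quad f_2(x) = 2
  \quad \text{for } x \in [0,1], \\
&\int f \, d\mu = \int f_1 \, d\mu_L + \int f_2 \, d\mu_L
  = \tfrac{1}{2} + 2 = \tfrac{5}{2} .
\end{align*}
```

Here ⟨[0, 1] × {i}⟩_{i∈I} is a decomposition of X, so the same computation also illustrates 214Mb.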
215 σ-finite spaces and the principle of exhaustion
I interpolate a short section to deal with some useful facts which might get lost if buried in one of the longer sections of this chapter. The great majority of the applications of measure theory involve σ-finite spaces, to the point that many authors skim over any others. I myself prefer to signal the importance of such concepts by explicitly stating just which theorems apply only to the restricted class of spaces. But undoubtedly some facts about σ-finite spaces need to be grasped early on. In 215B I give a list of properties characterizing σ-finite spaces. Some of these make better sense in the light of the principle of exhaustion (215A). I take the opportunity to include a fundamental fact about atomless measure spaces (215D).
215A The principle of exhaustion The following is an example of the use of one of the most important methods in measure theory.
Lemma Let (X, Σ, µ) be any measure space and ℰ ⊆ Σ a non-empty set such that sup_{n∈N} µF_n is finite for every non-decreasing sequence ⟨F_n⟩_{n∈N} in ℰ.
(a) There is a non-decreasing sequence ⟨F_n⟩_{n∈N} in ℰ such that, for every E ∈ Σ, either there is an n ∈ N such that E ∪ F_n is not included in any member of ℰ or, setting F = ⋃_{n∈N} F_n,
lim_{n→∞} µ(E \ F_n) = µ(E \ F) = 0.
In particular, if E ∈ ℰ and E ⊇ F, then E \ F is negligible.
(b) If ℰ is upwards-directed, then there is a non-decreasing sequence ⟨F_n⟩_{n∈N} in ℰ such that, setting F = ⋃_{n∈N} F_n, µF = sup_{E∈ℰ} µE and E \ F is negligible for every E ∈ ℰ, so that F is an essential supremum of ℰ in Σ in the sense of 211G.
(c) If the union of any non-decreasing sequence in ℰ belongs to ℰ, then there is an F ∈ ℰ such that E \ F is negligible whenever E ∈ ℰ and F ⊆ E.
proof (a) Choose ⟨F_n⟩_{n∈N}, ⟨ℰ_n⟩_{n∈N} and ⟨u_n⟩_{n∈N} inductively, as follows. Take F_0 to be any member of ℰ. Given F_n ∈ ℰ, set ℰ_n = {E : F_n ⊆ E ∈ ℰ} and u_n = sup{µE : E ∈ ℰ_n} in [0, ∞], and choose F_{n+1} ∈ ℰ_n such that µF_{n+1} ≥ min(n, u_n − 2^{−n}); continue.
Observe that this construction yields a non-decreasing sequence ⟨F_n⟩_{n∈N} in ℰ. Since ℰ_{n+1} ⊆ ℰ_n for every n, ⟨u_n⟩_{n∈N} is non-increasing, and has a limit u in [0, ∞]. Since min(n, u − 2^{−n}) ≤ µF_{n+1} ≤ u_n for every n, lim_{n→∞} µF_n = u. Our hypothesis on ℰ now tells us that u is finite.
If E ∈ Σ is such that for every n ∈ N there is an E_n ∈ ℰ such that E ∪ F_n ⊆ E_n, then E_n ∈ ℰ_n, so
µF_n ≤ µ(E ∪ F_n) ≤ µE_n ≤ u_n
for every n, and lim_{n→∞} µ(E ∪ F_n) = u. But this means that
µ(E \ F) ≤ lim_{n→∞} µ(E \ F_n) = lim_{n→∞} µ(E ∪ F_n) − µF_n = 0,
as stated. In particular, this is so if E ∈ ℰ and E ⊇ F.
(b) Take ⟨F_n⟩_{n∈N} from (a). If E ∈ ℰ, then (because ℰ is upwards-directed) E ∪ F_n is included in some member of ℰ for every n ∈ N; so we must have the second alternative of (a), and E \ F is negligible. It follows that
sup_{E∈ℰ} µE ≤ µF = lim_{n→∞} µF_n ≤ sup_{E∈ℰ} µE,
so µF = sup_{E∈ℰ} µE. If G is any measurable set such that E \ G is negligible for every E ∈ ℰ, then F_n \ G is negligible for every n, so that F \ G is negligible; thus F is an essential supremum for ℰ.
(c) Again take ⟨F_n⟩_{n∈N} from (a), and set F = ⋃_{n∈N} F_n. Our hypothesis now is that F ∈ ℰ, so has both the properties declared.
215B σ-finite spaces are so important that I think it is worth spelling out the following facts.
Proposition Let (X, Σ, µ) be a semi-finite measure space. Write 𝒩 for the family of µ-negligible sets and Σ^f for the family of measurable sets of finite measure. Then the following are equiveridical:
(i) (X, Σ, µ) is σ-finite;
(ii) every disjoint family in Σ^f \ 𝒩 is countable;
(iii) every disjoint family in Σ \ 𝒩 is countable;
(iv) for every ℰ ⊆ Σ there is a countable set ℰ_0 ⊆ ℰ such that E \ ⋃ℰ_0 is negligible for every E ∈ ℰ;
(v) for every non-empty upwards-directed ℰ ⊆ Σ there is a non-decreasing sequence ⟨F_n⟩_{n∈ℕ} in ℰ such that E \ ⋃_{n∈ℕ} F_n is negligible for every E ∈ ℰ;
(vi) for every non-empty ℰ ⊆ Σ, there is a non-decreasing sequence ⟨F_n⟩_{n∈ℕ} in ℰ such that E \ ⋃_{n∈ℕ} F_n is negligible whenever E ∈ ℰ and E ⊇ F_n for every n ∈ ℕ;
(vii) either µX = 0 or there is a probability measure ν on X with the same domain and the same negligible sets as µ;
(viii) there is a measurable integrable function f : X → ]0, 1];
(ix) either µX = 0 or there is a measurable function f : X → ]0, ∞[ such that ∫ f dµ = 1.
proof (i)⇒(vii) and (viii) If µX = 0, (vii) is trivial and we can take f = χX in (viii). Otherwise, let ⟨E_n⟩_{n∈ℕ} be a disjoint sequence in Σ^f covering X. Then it is easy to see that there is a sequence ⟨α_n⟩_{n∈ℕ} of strictly positive real numbers such that ∑_{n=0}^∞ α_n µE_n = 1. Set νE = ∑_{n=0}^∞ α_n µ(E ∩ E_n) for E ∈ Σ; then ν is a probability measure with domain Σ and the same negligible sets as µ. Also f = ∑_{n=0}^∞ min(1, α_n)χE_n is a strictly positive measurable integrable function.
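The existence of the weights α_n is asserted above without details. One possible explicit choice, offered only as a sketch (the auxiliary symbols β_n and s are introduced here and do not appear in the text), might be written as follows.

```latex
% Sketch under stated assumptions: \beta_n and s are hypothetical helper names.
% Setting: <E_n> is a disjoint cover of X by sets of finite measure, and \mu X > 0.
\[
  \beta_n = \frac{2^{-n-1}}{1+\mu E_n} > 0, \qquad
  s = \sum_{n=0}^{\infty} \beta_n\,\mu E_n, \qquad
  0 < s \le \sum_{n=0}^{\infty} 2^{-n-1} = 1 .
\]
% s > 0 because \mu X > 0 forces \mu E_n > 0 for at least one n.
\[
  \alpha_n = \frac{\beta_n}{s}
  \quad\Longrightarrow\quad
  \sum_{n=0}^{\infty} \alpha_n\,\mu E_n
  = \frac{1}{s}\sum_{n=0}^{\infty} \beta_n\,\mu E_n = 1 .
\]
```

With this choice every α_n is strictly positive, which is what makes ν and µ share the same negligible sets.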
(vii)⇒(vi) and (v) Assume (vii), and let ℰ be a non-empty family of measurable sets. If µX = 0 then (vi) and (v) are certainly true. Otherwise, let ν be a probability measure with domain Σ and the same negligible sets as µ. Since sup_{E∈ℰ} νE ≤ 1 is finite, we can apply 215Aa to find a non-decreasing sequence ⟨F_n⟩_{n∈ℕ} in ℰ such that E \ ⋃_{n∈ℕ} F_n is negligible whenever E ∈ ℰ includes ⋃_{n∈ℕ} F_n; and if ℰ is upwards-directed, E \ ⋃_{n∈ℕ} F_n will be negligible for every E ∈ ℰ, as in 215Ab.
(vi)⇒(iv) Assume (vi), and let ℰ be any subset of Σ. Set
ℋ = {⋃ℰ_0 : ℰ_0 ⊆ ℰ is countable}.
By (vi), there is a sequence ⟨H_n⟩_{n∈ℕ} in ℋ such that H \ ⋃_{n∈ℕ} H_n is negligible whenever H ∈ ℋ and H ⊇ H_n for every n ∈ ℕ. Now we can express each H_n as ⋃ℰ′_n, where ℰ′_n ⊆ ℰ is countable; setting ℰ_0 = ⋃_{n∈ℕ} ℰ′_n, ℰ_0 is countable. If E ∈ ℰ, then E ∪ ⋃_{n∈ℕ} H_n = ⋃({E} ∪ ℰ_0) belongs to ℋ and includes every H_n, so that E \ ⋃ℰ_0 = E \ ⋃_{n∈ℕ} H_n is negligible. So ℰ_0 has the property we need, and (iv) is true.
(iv)⇒(iii) Assume (iv). If ℰ is a disjoint family in Σ \ 𝒩, take a countable ℰ_0 ⊆ ℰ such that E \ ⋃ℰ_0 is negligible for every E ∈ ℰ. Then E = E \ ⋃ℰ_0 is negligible for every E ∈ ℰ \ ℰ_0; but this just means that ℰ \ ℰ_0 is empty, so that ℰ = ℰ_0 is countable.
(iii)⇒(ii) is trivial.
(ii)⇒(i) Assume (ii). Let 𝒫 be the set of all disjoint subsets of Σ^f \ 𝒩, ordered by ⊆. Then 𝒫 is a partially ordered set, not empty (as ∅ ∈ 𝒫), and if 𝒬 ⊆ 𝒫 is non-empty and totally ordered then it has an upper bound in 𝒫. P P Set ℰ = ⋃𝒬, the union of all the disjoint families belonging to 𝒬. If E ∈ ℰ then E ∈ 𝒞 for some 𝒞 ∈ 𝒬, so E ∈ Σ^f \ 𝒩. If E, F ∈ ℰ and E ≠ F, then there are 𝒞, 𝒟 ∈ 𝒬 such that E ∈ 𝒞, F ∈ 𝒟; now 𝒬 is totally ordered, so one of 𝒞, 𝒟 is larger than the other, and in either case 𝒞 ∪ 𝒟 is a member of 𝒬 containing both E and F. But since any member of 𝒬 is a disjoint collection of sets, E ∩ F = ∅. As E and F are arbitrary, ℰ is a disjoint family of sets and belongs to 𝒫. And of course 𝒞 ⊆ ℰ for every 𝒞 ∈ 𝒬, so ℰ is an upper bound for 𝒬 in 𝒫. Q Q
By Zorn’s Lemma (2A1M), 𝒫 has a maximal element ℰ say. By (ii), ℰ must be countable, so ⋃ℰ ∈ Σ. Now H = X \ ⋃ℰ is negligible. P P ?? Suppose, if possible, otherwise. Because (X, Σ, µ) is semi-finite, there is a set G of finite measure such that G ⊆ H and µG > 0, that is, G ∈ Σ^f \ 𝒩 and G ∩ E = ∅ for every E ∈ ℰ. But this means that {G} ∪ ℰ is a member of 𝒫 strictly larger than ℰ, which is supposed to be impossible. X X Q Q Let ⟨X_n⟩_{n∈ℕ} be a sequence running over ℰ ∪ {H}. Then ⟨X_n⟩_{n∈ℕ} is a cover of X by a sequence of measurable sets of finite measure, so (X, Σ, µ) is σ-finite.
(v)⇒(i) If (v) is true, then we have a sequence ⟨E_n⟩_{n∈ℕ} in Σ^f such that E \ ⋃_{n∈ℕ} E_n is negligible for every E ∈ Σ^f. Because µ is semi-finite, X \ ⋃_{n∈ℕ} E_n must be negligible, so X is covered by a countable family of sets of finite measure and µ is σ-finite.
(viii)⇒(ix) If µX = 0 this is trivial. Otherwise, if f is a strictly positive measurable integrable function, then c = ∫ f > 0 (122Rc), so (1/c)f is a strictly positive measurable function with integral 1.
(ix)⇒(i) If f : X → ]0, ∞[ is measurable and integrable, ⟨{x : f(x) ≥ 2^{−n}}⟩_{n∈ℕ} is a sequence of sets of finite measure covering X.

215C Corollary Let (X, Σ, µ) be a σ-finite measure space, and suppose that ℰ ⊆ Σ is any non-empty set.
(a) There is a non-decreasing sequence ⟨F_n⟩_{n∈ℕ} in ℰ such that, for every E ∈ Σ, either there is an n ∈ ℕ such that E ∪ F_n is not included in any member of ℰ or E \ ⋃_{n∈ℕ} F_n is negligible.
(b) If ℰ is upwards-directed, then there is a non-decreasing sequence ⟨F_n⟩_{n∈ℕ} in ℰ such that ⋃_{n∈ℕ} F_n is an essential supremum of ℰ in Σ.
(c) If the union of any non-decreasing sequence in ℰ belongs to ℰ, then there is an F ∈ ℰ such that E \ F is negligible whenever E ∈ ℰ and F ⊆ E.
proof By 215B, there is a totally finite measure ν on X with the same measurable sets and the same negligible sets as µ. Since sup_{E∈ℰ} νE is finite, we can apply 215A to ν to obtain the results.
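The step (ix)⇒(i) in the proof of 215B rests on an elementary estimate that may be worth writing out. The following is only a sketch; the name A_n is introduced here for convenience and is not used in the text.

```latex
% A_n is a hypothetical label for the sets appearing in (ix) => (i).
\[
  A_n = \{x : f(x) \ge 2^{-n}\}, \qquad
  2^{-n}\,\mu A_n \;\le\; \int_{A_n} f \,d\mu \;\le\; \int f \,d\mu \;<\; \infty ,
\]
\[
  \text{so}\quad \mu A_n \le 2^{n}\int f\,d\mu < \infty,
  \qquad
  X = \bigcup_{n\in\mathbb{N}} A_n
  \ \text{ because } f(x) > 0 \text{ for every } x \in X .
\]
```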
215D As a further example of the use of the principle of exhaustion, I give a fundamental fact about atomless measure spaces.
Proposition Let (X, Σ, µ) be an atomless measure space. If E ∈ Σ and 0 ≤ α ≤ µE < ∞, there is an F ∈ Σ such that F ⊆ E and µF = α.
proof (a) We need to know that if G ∈ Σ is non-negligible and n ∈ ℕ, then there is an H ⊆ G such that 0 < µH ≤ 2^{−n}µG. P P Induce on n. For n = 0 this is trivial. For the inductive step to n + 1, use the inductive hypothesis to find H ⊆ G such that 0 < µH ≤ 2^{−n}µG. Because µ is atomless, there is an H′ ⊆ H such that µH′, µ(H \ H′) are both defined and non-zero. Now at least one of them has measure less than or equal to ½µH, so gives us a subset of G of non-zero measure less than or equal to 2^{−n−1}µG. Q Q
It follows that if G ∈ Σ has non-zero finite measure and ε > 0, there is a measurable set H ⊆ G such that 0 < µH ≤ ε.
(b) Let ℋ be the family of all those H ∈ Σ such that H ⊆ E and µH ≤ α. If ⟨H_n⟩_{n∈ℕ} is any non-decreasing sequence in ℋ, then µ(⋃_{n∈ℕ} H_n) = lim_{n→∞} µH_n ≤ α, so ⋃_{n∈ℕ} H_n ∈ ℋ. So 215Ac tells us that there is an F ∈ ℋ such that H \ F is negligible whenever H ∈ ℋ and F ⊆ H. ?? Suppose, if possible, that µF < α. By (a), there is an H ⊆ E \ F such that 0 < µH ≤ α − µF. But in this case H ∪ F ∈ ℋ and µ((H ∪ F) \ F) > 0, which is impossible. X X So we have found an appropriate set F.

215X Basic exercises (a) Let (X, Σ, µ) be a measure space and Φ a non-empty set of µ-integrable real-valued functions from X to ℝ. Suppose that sup_{n∈ℕ} ∫ f_n is finite for every sequence ⟨f_n⟩_{n∈ℕ} in Φ such that f_n ≤a.e. f_{n+1} for every n. Show that there is a sequence ⟨f_n⟩_{n∈ℕ} in Φ such that f_n ≤a.e. f_{n+1} for every n and, for every integrable real-valued function f on X, either f ≤a.e. sup_{n∈ℕ} f_n or there is an n ∈ ℕ such that no member of Φ is greater than or equal to max(f, f_n) almost everywhere.
> (b) Let (X, Σ, µ) be a measure space. (i) Suppose that ℰ is a non-empty upwards-directed subset of Σ such that c = sup_{E∈ℰ} µE is finite. Show that E \ ⋃_{n∈ℕ} F_n is negligible whenever E ∈ ℰ and ⟨F_n⟩_{n∈ℕ} is a sequence in ℰ such that lim_{n→∞} µF_n = c. (ii) Let Φ be a non-empty set of integrable functions on X which is upwards-directed in the sense that for all f, g ∈ Φ there is an h ∈ Φ such that max(f, g) ≤a.e. h, and suppose that c = sup_{f∈Φ} ∫ f is finite. Show that f ≤a.e. sup_{n∈ℕ} f_n whenever f ∈ Φ and ⟨f_n⟩_{n∈ℕ} is a sequence in Φ such that lim_{n→∞} ∫ f_n = c.
(c) Use 215A to shorten the proof of 211Ld.
(d) Give an example of a (non-semi-finite) measure space (X, Σ, µ) satisfying conditions (ii)-(iv) of 215B, but not (i).
> (e) Let (X, Σ, µ) be an atomless σ-finite measure space. Show that for any ε > 0 there is a disjoint sequence ⟨E_n⟩_{n∈ℕ} of measurable sets with measure at most ε such that X = ⋃_{n∈ℕ} E_n.
(f) Let (X, Σ, µ) be an atomless strictly localizable measure space. Show that for any ε > 0 there is a decomposition ⟨X_i⟩_{i∈I} of X such that µX_i ≤ ε for every i ∈ I.

215Y Further exercises (a) Let (X, Σ, µ) be a σ-finite measure space and ⟨f_{mn}⟩_{m,n∈ℕ}, ⟨f_m⟩_{m∈ℕ}, f measurable real-valued functions defined almost everywhere on X and such that ⟨f_{mn}⟩_{n∈ℕ} → f_m a.e. for each m and ⟨f_m⟩_{m∈ℕ} → f a.e. Show that there is a strictly increasing sequence ⟨n_m⟩_{m∈ℕ} in ℕ such that ⟨f_{m,n_m}⟩_{m∈ℕ} → f a.e. (Compare 134Yb.)
(b) Let (X, Σ, µ) be a σ-finite measure space. Let ⟨f_n⟩_{n∈ℕ} be a sequence of measurable real-valued functions such that f = lim_{n→∞} f_n is defined almost everywhere on X. Show that there is a non-decreasing sequence ⟨X_k⟩_{k∈ℕ} of measurable subsets of X such that ⋃_{k∈ℕ} X_k is conegligible in X and ⟨f_n⟩_{n∈ℕ} → f uniformly on every X_k, in the sense that for any ε > 0 there is an m ∈ ℕ such that |f_j(x) − f(x)| is defined and less than or equal to ε whenever j ≥ m, x ∈ X_k. (This is a version of Egorov’s theorem.)
(c) Let (X, Σ, µ) be a totally finite measure space and ⟨f_n⟩_{n∈ℕ}, f measurable real-valued functions defined almost everywhere on X. Show that ⟨f_n⟩_{n∈ℕ} → f a.e. iff there is a sequence ⟨ε_n⟩_{n∈ℕ} of strictly positive real numbers, converging to 0, such that
lim_{n→∞} µ*(⋃_{k≥n} {x : x ∈ dom f_k ∩ dom f, |f_k(x) − f(x)| ≥ ε_n}) = 0.
(d) Find a direct proof of (v)⇒(vi) in 215B. (Hint: given ℰ ⊆ Σ, use Zorn’s Lemma to find a maximal totally ordered ℰ′ ⊆ ℰ such that E△F ∉ 𝒩 for any distinct E, F ∈ ℰ′, and apply (v) to ℰ′.)

215 Notes and comments The common ground of 215A, 215B(vi), 215C and 215Xa is actually one of the most fundamental ideas in measure theory. It appears in such various forms that it is often easier to prove an application from first principles than to explain how it can be reduced to the versions here. But I will try henceforth to signal such applications as they arise, calling the method (the proof of 215Aa or 215Xa) the ‘principle of exhaustion’. One point which is perhaps worth noting here is the inductive construction of the sequence ⟨F_n⟩_{n∈ℕ} in the proof of 215Aa. Each F_{n+1} is chosen after the preceding one. It is this which makes it possible, in the proof of 215B(vii)⇒(vi), to extract a suitable sequence ⟨F_n⟩_{n∈ℕ} directly. In many applications (starting with what is surely the most important one in the elementary theory, the Radon-Nikodým theorem of §232, or with part (i) of the proof of 211Ld), this refinement is not needed; we are dealing with an upwards-directed set, as in 215B(v), and can choose the whole sequence ⟨F_n⟩_{n∈ℕ} at once, no term interacting with any other, as in 215Xb. The axiom of ‘dependent choice’, which asserts that we can construct sequences term-by-term, is known to be stronger than the axiom of ‘countable choice’, which asserts only that we can choose countably many objects simultaneously.
In 215B I try to indicate the most characteristic properties of σ-finiteness; in particular, the properties which distinguish σ-finite measures from other strictly localizable measures. This result is in a way more abstract than the manipulations in the rest of the section. Note that it makes an essential use of the axiom of choice in the form of Zorn’s Lemma. I spent a paragraph in 134C commenting on the distinction between ‘countable choice’, which is needed for anything which looks like the standard theory of Lebesgue measure, and the full axiom of choice, which is relatively little used in the elementary theory. The implication (ii)⇒(i) of 215B is one of the points where we do need something beyond countable choice. (I should perhaps remark that the whole theory of non-σ-finite measure spaces looks very odd without the general axiom of choice.) Note also that in 215B the proofs of (i)⇒(vii) and (vii)⇒(vi) are the only points where anything so vulgar as a number appears. The conditions (iii), (iv), (v) and (vi) are linked in ways that have nothing to do with measure theory, and involve only the structure (X, Σ, 𝒩). (See 215Yd here, and 316D-316E in Volume 3.) There are similar conditions relating to measurable functions rather than measurable sets; for a fairly abstract example, see 241Yd. In 215Ya-215Yc are three more standard theorems on almost-everywhere-convergent sequences which depend on σ- or total finiteness.
216 Examples
It is common practice – and, in my view, good practice – in books on pure mathematics, to provide discriminating examples; I mean that whenever we are given a list of new concepts, we expect to be provided with examples to show that we have a fair picture of the relationships between them, and in particular that we are not being kept ignorant of some startling implication. Concerning the concepts listed in 211A-211K, we have ten different properties which some, but not all, measure spaces possess, giving a conceivable total of 2^10 different types of measure space, classified according to which of these ten properties they have. The list of basic relationships in 211L reduces these 1024 possibilities to 72. Observing that a space can be simultaneously atomless and purely atomic only when the measure of the whole space is 0, we find ourselves with 56 possibilities, being two trivial cases with µX = 0 (because such a measure may or may not be complete) together with 9 × 2 × 3 cases, corresponding to the nine classes
    probability spaces,
    spaces which are totally finite, but not probability spaces,
    spaces which are σ-finite, but not totally finite,
    spaces which are strictly localizable, but not σ-finite,
    spaces which are localizable and locally determined, but not strictly localizable,
    spaces which are localizable, but not locally determined,
    spaces which are locally determined, but not localizable,
    spaces which are semi-finite, but neither locally determined nor localizable,
    spaces which are not semi-finite;
the two classes
    spaces which are complete,
    spaces which are not complete;
and the three classes
    spaces which are atomless, not of measure 0,
    spaces which are purely atomic, not of measure 0,
    spaces which are neither atomless nor purely atomic.
I do not propose to give a complete set of fifty-six examples, particularly as rather fewer than fifty-six different ideas are required. However, I do think that for a proper understanding of abstract measure spaces it is necessary to have seen realizations of some of the critical combinations of properties. I therefore take a few paragraphs to describe three special examples to add to those of 211M-211R.

216A Lebesgue measure Before turning to the new ideas, let me mention Lebesgue measure again. As remarked in 211M, 211P and 211Q,
(a) Lebesgue measure µ on ℝ is complete, atomless and σ-finite, therefore strictly localizable, localizable and locally determined.
(b) The subspace measure µ_{[0,1]} on [0, 1] is a complete, atomless probability measure.
(c) The restriction µ↾ℬ of µ to the algebra ℬ of Borel sets in ℝ is atomless, σ-finite and not complete.

216B I now embark on the description of three ‘counter-examples’; meaning spaces built specifically for the purpose of showing that there are no unexpected implications among the ten properties under consideration here. Even by the standards of this chapter these must be regarded as dispensable by the student who wants to get on with the real business of understanding the big theorems of the subject.
Neither the existence of these examples, nor the techniques needed in constructing them, are vital for anything else we shall look at before Volume 5. But if you are going to take abstract measure theory seriously at all, sooner or later you will need to form some kind of mental picture of the nature of the spaces possessing the different properties here, and a minimal requirement of such a picture is that it should include the discriminations witnessed by these examples.

*216C A complete, localizable, non-locally-determined space The first example hardly needs an idea beyond what we already have, but it does call for more manipulations than it seems fair to set as an exercise, and may therefore be useful as a demonstration of technique.
(a) Let I be any uncountable set, and set X = {0, 1} × I. For E ⊆ X, y ∈ {0, 1} set E[{y}] = {i : (y, i) ∈ E} ⊆ I. Set
Σ = {E : E ⊆ X, E[{0}]△E[{1}] is countable}.
Then Σ is a σ-algebra of subsets of X. P P (i) ∅[{0}]△∅[{1}] = ∅ is countable, so ∅ ∈ Σ. (ii) If E ∈ Σ then
(X \ E)[{0}]△(X \ E)[{1}] = E[{0}]△E[{1}]
is countable. (iii) If ⟨E_n⟩_{n∈ℕ} is a sequence in Σ and E = ⋃_{n∈ℕ} E_n, then
E[{0}]△E[{1}] ⊆ ⋃_{n∈ℕ} (E_n[{0}]△E_n[{1}])
is countable. Q Q
For E ∈ Σ, set µE = #(E[{0}]) if this is finite, ∞ otherwise; then (X, Σ, µ) is a measure space.
(b) (X, Σ, µ) is complete. P P If A ⊆ E ∈ Σ and µE = 0, then (0, i) ∉ E for every i. So
A[{0}]△A[{1}] = A[{1}] ⊆ E[{1}] = E[{1}]△E[{0}]
must be countable, and A ∈ Σ. Q Q
(c) (X, Σ, µ) is semi-finite. P P If E ∈ Σ and µE > 0, there is an i ∈ I such that (0, i) ∈ E; now F = {(0, i)} ⊆ E and µF = 1. Q Q
(d) (X, Σ, µ) is localizable. P P Let ℰ be any subset of Σ. Set
J = ⋃_{E∈ℰ} E[{0}], G = {0, 1} × J.
Then G ∈ Σ. If H ∈ Σ, then
µ(E \ H) = 0 for every E ∈ ℰ
⇐⇒ E[{0}] ⊆ H[{0}] for every E ∈ ℰ
⇐⇒ (0, i) ∈ H for every i ∈ J
⇐⇒ µ(G \ H) = 0.
Thus G is an essential supremum for ℰ in Σ; as ℰ is arbitrary, µ is localizable. Q Q
(e) (X, Σ, µ) is not locally determined. P P Consider H = {0} × I. Then H ∉ Σ because H[{0}]△H[{1}] = I is uncountable. But let E ∈ Σ be any set such that µE < ∞. Then
(E ∩ H)[{0}]△(E ∩ H)[{1}] = (E ∩ H)[{0}] ⊆ E[{0}]
is finite, so E ∩ H ∈ Σ. As E is arbitrary, H witnesses that µ is not locally determined. Q Q
(f) (X, Σ, µ) is purely atomic. P P Let E ∈ Σ be any set of non-zero measure. Let i ∈ I be such that (0, i) ∈ E. Then F = {(0, i)} is a set of measure 1, included in E; because F is a singleton set, it must be an atom for µ; as E is arbitrary, µ is purely atomic. Q Q
(g) Thus the construction here yields a complete, localizable, purely atomic, non-locally-determined space.

*216D A complete, locally determined space which is not localizable The next construction requires a little set theory. We need two sets I, J such that I is uncountable (more strictly, I cannot be expressed as the union of countably many countable sets), I ⊆ J and J cannot be expressed as ⋃_{i∈I} K_i where every K_i is countable. The most natural way of doing this, subject to the axiom of choice, is to take I = ω_1, the first uncountable ordinal, and J to be ω_2, the first ordinal from which there is no injection into ω_1 (see 2A1Fc); but in case you prefer other formulations (e.g., I = {{x} : x ∈ ℝ} and J = 𝒫ℝ), I will write the following argument in terms of I and J, and you can pick your own pair.
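To see why the suggested pair I = ω_1, J = ω_2 meets the two requirements, one can argue roughly as follows. This is a sketch only, assuming the axiom of choice and standard cardinal arithmetic; the sets L_n are a hypothetical name for an arbitrary sequence of countable sets, and the K_i are as in the paragraph above.

```latex
% (1) I = \omega_1 is not a countable union of countable sets L_n:
\[
  \#\Bigl(\bigcup_{n\in\mathbb{N}} L_n\Bigr)
  \;\le\; \#(\mathbb{N}\times\mathbb{N}) \;=\; \omega \;<\; \omega_1 = \#(I).
\]
% (2) J = \omega_2 is not a union of \omega_1 countable sets K_i:
\[
  \#\Bigl(\bigcup_{i\in I} K_i\Bigr)
  \;\le\; \#(I\times\mathbb{N}) \;=\; \max(\omega_1,\omega) \;=\; \omega_1 \;<\; \omega_2 = \#(J).
\]
```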
(a) Let T be the countable-cocountable σ-algebra of J and ν the countable-cocountable measure on J (211R). Set X = J × J and for E ⊆ X set
E[{ξ}] = {η : (ξ, η) ∈ E}, E^{−1}[{ξ}] = {η : (η, ξ) ∈ E}
for every ξ ∈ J. Set
Σ = {E : E[{ξ}] and E^{−1}[{ξ}] belong to T for every ξ ∈ J},
µE = ∑_{ξ∈J} νE[{ξ}] + ∑_{ξ∈J} νE^{−1}[{ξ}]
for every E ∈ Σ. It is easy to check that Σ is a σ-algebra and that µ is a measure.
(b) (X, Σ, µ) is complete. P P If A ⊆ E ∈ Σ and µE = 0, then all the sets E[{ξ}], E^{−1}[{ξ}] are countable, so the same is true of all the sets A[{ξ}], A^{−1}[{ξ}], and A ∈ Σ. Q Q
(d) (X, Σ, µ) is semi-finite. P P For each ζ ∈ J, set
G_ζ = {ζ} × J, G̃_ζ = J × {ζ}.
Then all the sections G_ζ[{ξ}], G_ζ^{−1}[{ξ}], G̃_ζ[{ξ}], G̃_ζ^{−1}[{ξ}] are either J or ∅ or {ζ}, so belong to T, and all the G_ζ, G̃_ζ belong to Σ, with µ-measure 1.
Suppose that E ∈ Σ is a set of strictly positive measure. Then there must be some ξ ∈ J such that
0 < νE[{ξ}] + νE^{−1}[{ξ}] = µ(E ∩ G_ξ) + µ(E ∩ G̃_ξ) < ∞,
and one of the sets E ∩ G_ξ, E ∩ G̃_ξ is a set of non-zero finite measure included in E. Q Q
(e) (X, Σ, µ) is locally determined. P P Suppose that H ⊆ X is such that H ∩ E ∈ Σ whenever E ∈ Σ and µE < ∞. Then, in particular, H ∩ G_ζ and H ∩ G̃_ζ belong to Σ, so
H[{ζ}] = (H ∩ G_ζ)[{ζ}] ∈ T, H^{−1}[{ζ}] = (H ∩ G̃_ζ)^{−1}[{ζ}] ∈ T,
for every ζ ∈ J. This shows that H ∈ Σ. As H is arbitrary, µ is locally determined. Q Q
(f) (X, Σ, µ) is not localizable. P P Set ℰ = {G_ζ : ζ ∈ J}. ?? Suppose, if possible, that G ∈ Σ is an essential supremum for ℰ. Then
ν(J \ G[{ξ}]) = µ(G_ξ \ G) = 0
and J \ G[{ξ}] is countable, for every ξ ∈ J. Consequently J ≠ ⋃_{ξ∈I} (J \ G[{ξ}]), and there is an η belonging to J \ ⋃_{ξ∈I} (J \ G[{ξ}]) = ⋂_{ξ∈I} G[{ξ}]. This means just that (ξ, η) ∈ G for every ξ ∈ I, that is, that I ⊆ G^{−1}[{η}]. Accordingly G^{−1}[{η}] is uncountable, so that
νG^{−1}[{η}] = µ(G ∩ G̃_η) = 1.
But observe that µ(G_ξ ∩ G̃_η) = µ{(ξ, η)} = 0 for every ξ ∈ J. This means that, setting H = X \ G̃_η, E \ H is negligible, for every E ∈ ℰ; so that we must have 0 = µ(G \ H) = µ(G ∩ G̃_η) = 1, which is absurd. X X Thus ℰ has no essential supremum in Σ, and µ cannot be localizable. Q Q
(g) (X, Σ, µ) is purely atomic. P P If E ∈ Σ has non-zero measure, there must be some ξ ∈ J such that one of E[{ξ}], E^{−1}[{ξ}] is not countable; that is, such that one of E ∩ G_ξ, E ∩ G̃_ξ is not negligible. But if now H ⊆ E ∩ G_ξ, either H[{ξ}] is countable, and µH = 0, or J \ H[{ξ}] is countable, and µ(G_ξ \ H) = 0; similarly, if H ⊆ E ∩ G̃_ξ, one of µH, µ(G̃_ξ \ H) must be 0, according to whether H^{−1}[{ξ}] is countable or not. Thus E ∩ G_ξ, E ∩ G̃_ξ, if not negligible, must be atoms, and E must include an atom. As E is arbitrary, µ is purely atomic.
Q Q
(h) Thus (X, Σ, µ) is complete, locally determined and purely atomic, but is not localizable.

*216E A complete, locally determined, localizable space which is not strictly localizable For the last, and most interesting, construction, we need a non-trivial result in infinitary combinatorics, which I have written out in 2A1P: if I is any set, and ⟨f_α⟩_{α∈A} is a family in {0, 1}^I, the set of functions from I to {0, 1}, with #(A) strictly greater than 𝔠, the cardinal of the continuum, and if ⟨K_α⟩_{α∈A} is any family of countable subsets of I, then there must be distinct α, β ∈ A such that f_α and f_β agree on K_α ∩ K_β. Armed with this fact, I proceed as follows.
(a) Let C be any set of cardinal greater than 𝔠. Set I = 𝒫C, the set of subsets of C, and write X = {0, 1}^I. For γ ∈ C, define f_γ ∈ X by saying that f_γ(Γ) = 1 if γ ∈ Γ ⊆ C and f_γ(Γ) = 0 if γ ∉ Γ ⊆ C. Let 𝒦 be the family of countable subsets of I, and for K ∈ 𝒦, γ ∈ C set
F_{γK} = {x : x ∈ X, x↾K = f_γ↾K} ⊆ X.
Let
Σ_γ = {E : E ⊆ X, either there is a K ∈ 𝒦 such that F_{γK} ⊆ E or there is a K ∈ 𝒦 such that F_{γK} ⊆ X \ E}.
Then Σ_γ is a σ-algebra of subsets of X. P P (i) F_{γ∅} ⊆ X \ ∅ so ∅ ∈ Σ_γ. (ii) The definition of Σ_γ is symmetric between E and X \ E, so X \ E ∈ Σ_γ whenever E ∈ Σ_γ. (iii) Let ⟨E_n⟩_{n∈ℕ} be a sequence in Σ_γ, with union E. (α) If there are n ∈ ℕ, K ∈ 𝒦 such that F_{γK} ⊆ E_n, then F_{γK} ⊆ E, so E ∈ Σ_γ. (β) Otherwise, there is for each n ∈ ℕ a K_n ∈ 𝒦 such that F_{γ,K_n} ⊆ X \ E_n. Set K = ⋃_{n∈ℕ} K_n ∈ 𝒦. Then
F_{γK} = {x : x↾K = f_γ↾K} = {x : x↾K_n = f_γ↾K_n for every n ∈ ℕ} = ⋂_{n∈ℕ} F_{γ,K_n} ⊆ ⋂_{n∈ℕ} (X \ E_n) = X \ E,
so again E ∈ Σ_γ. As ⟨E_n⟩_{n∈ℕ} is arbitrary, Σ_γ is a σ-algebra. Q Q
(b) Set
Σ = ⋂_{γ∈C} Σ_γ;
then Σ, being an intersection of σ-algebras, is a σ-algebra of subsets of X (see 111Ga). Define µ : Σ → [0, ∞] by setting
µE = #({γ : f_γ ∈ E}) if this is finite,
   = ∞ otherwise;
then µ is a measure.
(c) It will be convenient later to know something about the sets
G_D = {x : x ∈ X, x(D) = 1}
for D ⊆ C. In particular, every G_D belongs to Σ. P P If γ ∈ D, then f_γ(D) = 1 so G_D = F_{γ,{D}} ∈ Σ_γ. If γ ∈ C \ D, then f_γ(D) = 0 so G_D = X \ F_{γ,{D}} ∈ Σ_γ. Q Q Also, of course, {γ : f_γ ∈ G_D} = D.
(d) (X, Σ, µ) is complete. P P Suppose that A ⊆ E ∈ Σ and that µE = 0. For every γ ∈ C, E ∈ Σ_γ and f_γ ∉ E, so F_{γK} ⊄ E for any K ∈ 𝒦 and there is a K ∈ 𝒦 such that F_{γK} ⊆ X \ E ⊆ X \ A. Thus A ∈ Σ_γ; as γ is arbitrary, A ∈ Σ. As A is arbitrary, µ is complete. Q Q
(e) (X, Σ, µ) is semi-finite. P P Let E ∈ Σ be a set of positive measure. Then there must be some γ ∈ C such that f_γ ∈ E. Consider E′ = E ∩ G_{{γ}}. As f_γ ∈ E′, µE′ ≥ 1 > 0. On the other hand, µG_{{γ}} = #({δ : δ ∈ {γ}}) = 1, so µE′ = 1. As E is arbitrary, µ is semi-finite. Q Q
(f) (X, Σ, µ) is localizable. P P Let ℰ be any subset of Σ. Set D = {δ : δ ∈ C, f_δ ∈ ⋃ℰ}. Consider G_D. For H ∈ Σ,
µ(E \ H) = 0 for every E ∈ ℰ
⇐⇒ f_γ ∉ E \ H for every E ∈ ℰ, γ ∈ C
⇐⇒ f_γ ∈ H for every γ ∈ D
⇐⇒ f_γ ∉ G_D \ H for every γ ∈ C
⇐⇒ µ(G_D \ H) = 0.
Thus G_D is an essential supremum for ℰ in Σ. As ℰ is arbitrary, µ is localizable. Q Q
(g) (X, Σ, µ) is not strictly localizable. P P ?? Suppose, if possible, that ⟨X_j⟩_{j∈J} is a decomposition of (X, Σ, µ). Set J′ = {j : j ∈ J, µX_j > 0}. For each j ∈ J′, the set C_j = {γ : f_γ ∈ X_j} must be finite and non-empty. Moreover, for each γ ∈ C, there must be some j ∈ J such that µ(G_{{γ}} ∩ X_j) > 0, and in this case j ∈ J′ and γ ∈ C_j. Thus C = ⋃_{j∈J′} C_j. Because #(C) > 𝔠, #(J′) > 𝔠 (2A1Ld). For each j ∈ J′, choose γ_j ∈ C_j. Then f_{γ_j} ∈ X_j ∈ Σ ⊆ Σ_{γ_j}, so there must be a K_j ∈ 𝒦 such that F_{γ_j,K_j} ⊆ X_j.
At this point I finally turn to the result cited at the start of this example. Because #(J′) > 𝔠, there must be distinct j, k ∈ J′ such that f_{γ_j} and f_{γ_k} agree on K_j ∩ K_k. We may therefore define x ∈ X by saying that
x(δ) = f_{γ_j}(δ) if δ ∈ K_j,
     = f_{γ_k}(δ) if δ ∈ K_k,
     = 0 if δ ∈ I \ (K_j ∪ K_k).
Now
x ∈ F_{γ_j,K_j} ∩ F_{γ_k,K_k} ⊆ X_j ∩ X_k,
and X_j ∩ X_k ≠ ∅; contradicting the assumption that the X_j formed a decomposition of X. X X Q Q
(h) (X, Σ, µ) is purely atomic. P P If E ∈ Σ and µE > 0, then (as remarked in (e) above) there is a γ ∈ C such that µ(E ∩ G_{{γ}}) = 1; now E ∩ G_{{γ}} must be an atom. Q Q
(i) Accordingly (X, Σ, µ) is a complete, locally determined, localizable, purely atomic measure space which is not strictly localizable.

216X Basic exercises (a) In the construction of 216C, show that the subspace measure on {1} × I is not semi-finite.
(b) Suppose, in 216D, that I = ω_1. (i) Show that the set {(ξ, η) : ξ ≤ η < ω_1} is measurable for the measure constructed by Carathéodory’s method from µ*↾𝒫(I × I), but not for the subspace measure on I × I. (ii) Hence, or otherwise, show that the subspace measure on I × I is not locally determined.
(c) In 216Ya, 252Yr and 252Yt below, I indicate how to construct atomless versions of 216C, 216D and 216E, that is, atomless complete measure spaces of which the first is localizable but not locally determined, the second is locally determined but not localizable, and the third is locally determined and localizable but not strictly localizable. Show how direct sums of these, together with counting measure and the examples described in this chapter, can be assembled to provide all 56 examples called for by the discussion in the introduction to this section.

216Y Further exercises (a) Let λ be Lebesgue measure on [0, 1], and Λ its domain. Set Y = [0, 1] × {0, 1} and write
T = {F : F ⊆ Y, F^{−1}[{0}] ∈ Λ},
νF = λ{z : (z, 0) ∈ F} for every F ∈ T.
Set
T_0 = {F : F ∈ T, F^{−1}[{0}]△F^{−1}[{1}] is λ-negligible}.
Let I be an uncountable set. Set X = Y × I,
Σ = {E : E ⊆ X, E^{−1}[{i}] ∈ T for every i ∈ I, {i : E^{−1}[{i}] ∉ T_0} is countable},
µE = ∑_{i∈I} νE^{−1}[{i}] for E ∈ Σ.
(i) Show that (Y, T, ν) and (Y, T_0, ν↾T_0) are complete probability spaces, and that for every F ∈ T there is an F′ ∈ T_0 such that ν(F△F′) = 0. (ii) Show that (X, Σ, µ) is an atomless complete localizable measure space which is not locally determined.
(b) Define a measure µ on X = ω_2 × ω_2 as follows. Take Σ to be the σ-algebra of subsets of X generated by
{A × ω_2 : A ⊆ ω_2} ∪ {ω_2 × α : α < ω_2}.
For E ∈ Σ set
W(E) = {ξ : ξ < ω_2, sup E[{ξ}] = ω_2},
and set µE = #(W(E)) if this is finite, ∞ otherwise. Show that µ is a measure on X, is localizable and locally determined, but does not have locally determined negligible sets. Find a subspace Y of X such that the subspace measure on Y is not semi-finite.
(c) Show that in the space described in 216E every set has a measurable envelope, but that this is not true in the spaces of 216C and 216D.
216 Notes and comments The examples 216C-216E are designed to form, with Lebesgue measure, a basis for constructing a complete set of examples for the concepts listed in 211A-211K. One does not really expect to encounter these phenomena in applications, but a clear understanding of the possibilities demonstrated by these examples is part of a proper appreciation of their rarity. Of course, if we add further properties to our list – for instance, the property of having locally determined negligible sets (213I), or the property that every subset should have a measurable envelope (213Xl) – then there are further positive results to complement 211L, and more examples to hunt for, like 216Yb. But it is time, perhaps past time, that we returned to the classical theorems which apply to the measure spaces at the centre of the subject.
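As a check on the arithmetic behind the count of examples in the introduction to this section, the tally can be laid out as below. This is just a restatement of figures already given there (the reduction from 1024 to 72 via 211L is taken on trust, not re-derived).

```latex
\[
  2^{10} = 1024
  \quad\text{(ten yes/no properties, before using 211L)},
\]
\[
  \underbrace{2}_{\mu X = 0:\ \text{complete or not}}
  \;+\;
  \underbrace{9}_{\text{finiteness classes}}
  \times
  \underbrace{2}_{\text{complete or not}}
  \times
  \underbrace{3}_{\text{atomicity classes}}
  \;=\; 2 + 54 \;=\; 56 .
\]
```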
Chapter 22
The fundamental theorem of calculus
In this chapter I address one of the most important properties of the Lebesgue integral. Given an integrable function f : [a, b] → ℝ, we can form its indefinite integral F(x) = ∫_a^x f(t)dt for x ∈ [a, b]. Two questions immediately present themselves. (i) Can we expect to have the derivative F′ of F equal to f? (ii) Can we identify which functions F will appear as indefinite integrals? Reasonably satisfactory answers may be found for both of these questions: F′ = f almost everywhere (222E) and indefinite integrals are the absolutely continuous functions (225E). In the course of dealing with them, we need to develop a variety of techniques which lead to many striking results both in the theory of Lebesgue measure and in other, apparently unrelated, topics in real analysis.
The first step is ‘Vitali’s theorem’ (§221), a remarkable argument – it is more a method than a theorem – which uses the geometric nature of the real line to extract disjoint subfamilies from collections of intervals. It is the foundation stone not only of the results in §222 but of all geometric measure theory, that is, measure theory on spaces with a geometric structure. I use it here to show that monotonic functions are differentiable almost everywhere (222A). Following this, Fatou’s Lemma and Lebesgue’s Dominated Convergence Theorem are enough to show that the derivative of an indefinite integral is almost everywhere equal to the integrand. We now find that some innocent-looking manipulations of this fact take us surprisingly far; I present these in §223.
I begin the second half of the chapter with a discussion of functions ‘of bounded variation’, that is, expressible as the difference of bounded monotonic functions (§224). This is one of the least measure-theoretic sections in the volume; only in 224I and 224J are measure and integration even mentioned. But this material is needed for Chapter 28 as well as for the next section, and is also one of the basic topics of twentieth-century real analysis. §225 deals with the characterization of indefinite integrals as the ‘absolutely continuous’ functions. In fact this is now quite easy; it helps to call on Vitali’s theorem again, but everything else is a straightforward application of methods previously used. The second half of the section introduces some new ideas in an attempt to give a deeper intuition into the essential nature of absolutely continuous functions. §226 returns to functions of bounded variation and their decomposition into ‘saltus’ and ‘absolutely continuous’ and ‘singular’ parts, the first two being relatively manageable and the last looking something like the Cantor function.
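A small worked instance may help fix why question (i) can only be expected to have an answer ‘almost everywhere’. The function below is chosen purely for illustration and is not taken from the text.

```latex
% Illustrative example (an assumption of this sketch): a step integrand on [0,1].
\[
  f = \chi_{[\frac12,\,1]} \ \text{on } [0,1], \qquad
  F(x) = \int_0^x f(t)\,dt = \max\bigl(0,\; x - \tfrac12\bigr) \quad (x \in [0,1]).
\]
\[
  F'(x) = 0 = f(x) \ \text{ for } 0 < x < \tfrac12, \qquad
  F'(x) = 1 = f(x) \ \text{ for } \tfrac12 < x < 1, \qquad
  F'(\tfrac12) \ \text{does not exist.}
\]
```

Here F′ = f except on the negligible set {½}, and F, being Lipschitz, is absolutely continuous, in line with the two answers quoted above.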
221 Vitali’s theorem in ℝ
I give the first theorem of this chapter a section to itself. It occupies a position between measure theory and geometry (it is, indeed, one of the fundamental results of ‘geometric measure theory’), and its proof involves both the measure and the geometry of the real line.

221A Vitali’s theorem Let A be a bounded subset of ℝ and ℐ a family of non-singleton closed intervals in ℝ such that every point of A belongs to arbitrarily short members of ℐ. Then there is a countable set ℐ_0 ⊆ ℐ such that (i) ℐ_0 is disjoint, that is, I ∩ I′ = ∅ for all distinct I, I′ ∈ ℐ_0 (ii) µ(A \ ⋃ℐ_0) = 0, where µ is Lebesgue measure on ℝ.
proof (a) If there is a finite disjoint set ℐ_0 ⊆ ℐ such that A ⊆ ⋃ℐ_0 (including the possibility that A = ℐ_0 = ∅), we can stop. So let us suppose henceforth that there is no such ℐ_0.
Let µ* be Lebesgue outer measure on ℝ. Suppose that |x| < M for every x ∈ A, and set
ℐ′ = {I : I ∈ ℐ, I ⊆ [−M, M]}.
(b) In this case, if ℐ_0 is any finite disjoint subset of ℐ′, there is a J ∈ ℐ′ which is disjoint from any member of ℐ_0. P P Take x ∈ A \ ⋃ℐ_0. Now there is a δ > 0 such that [x − δ, x + δ] does not meet any member of ℐ_0, and as |x| < M we can suppose that [x − δ, x + δ] ⊆ [−M, M]. Let J be a member of ℐ, containing x, and of length at most δ; then J ∈ ℐ′ and J ∩ ⋃ℐ_0 = ∅. Q Q
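(c) We can now choose a sequence $\langle\gamma_n\rangle_{n\in\mathbb{N}}$ of real numbers and a disjoint sequence $\langle I_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{I}'$ inductively, as follows. Given $\langle I_j\rangle_{j}$ [...] $0$ we have an $h$ with $0 < |h| \le \delta$ and $\frac{1}{h}(f(x + h) - f(x)) > m$, so that

$[x, x + h] \in \mathcal{I}$ if $h > 0$, $\quad [x + h, x] \in \mathcal{I}$ if $h < 0$;

thus every member of $A_m$ belongs to arbitrarily small intervals in $\mathcal{I}$. By Vitali’s theorem (221A), there is a countable disjoint set $\mathcal{I}_0 \subseteq \mathcal{I}$ such that $\mu(A \setminus \bigcup\mathcal{I}_0) = 0$. Now, because $f$ is non-decreasing, $\langle f^*(J)\rangle_{J\in\mathcal{I}_0}$ is disjoint, and all the $f^*(J)$ are included in $[-M, M]$, so $\sum_{J\in\mathcal{I}_0} \mu f^*(J) \le 2M$ and $\sum_{J\in\mathcal{I}_0} \mu J \le 2M/m$. Because $A_m \setminus \bigcup\mathcal{I}_0$ is negligible,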
$\mu^*A \le \mu^*A_m \le 2M/m$. As $m$ is arbitrary, $\mu^*A = 0$ and $A$ is negligible.

(c) Now consider $B = \{x : x \in I,\ D^*f(x) > D_*f(x)\}$. For $q, q' \in \mathbb{Q}$ with $0 \le q < q'$, set $B_{qq'} = \{x : x \in I,\ D_*f(x) < q,\ D^*f(x) > q'\}$. Fix such $q$, $q'$ for the moment, and write $\gamma = \mu^*B_{qq'}$. Take any $\epsilon > 0$, and let $G$ be an open set including $B_{qq'}$ such that $\mu G \le \gamma + \epsilon$ (134Fa). Let $\mathcal{J}$ be the set of non-singleton closed intervals $[a, b] \subseteq I \cap G$ such that $f(b) - f(a) \le q(b - a)$; this time $\mu f^*(J) \le q\mu J$ for $J \in \mathcal{J}$. Then every member of $B_{qq'}$ is included in arbitrarily small members of $\mathcal{J}$, so there is a countable disjoint $\mathcal{J}_0 \subseteq \mathcal{J}$ such that $B_{qq'} \setminus \bigcup\mathcal{J}_0$ is negligible. Let $L$ be the set of endpoints of members of $\mathcal{J}_0$; then $L$ is a countable union of doubleton sets, so is countable, therefore negligible. Set $C = B_{qq'} \cap \bigcup\mathcal{J}_0 \setminus L$; then $\mu^*C = \gamma$. Let $\mathcal{I}$ be the set of non-singleton closed intervals $J = [a, b]$ such that (i) $J$ is included in one of the members of $\mathcal{J}_0$ (ii) $f(b) - f(a) \ge q'(b - a)$; now $\mu f^*(J) \ge q'\mu J$ for every $J \in \mathcal{I}$. Once again, because every member of $C$ is an interior point of some member of $\mathcal{J}_0$, every point of $C$ belongs to arbitrarily small members of $\mathcal{I}$; so there is a countable disjoint $\mathcal{I}_0 \subseteq \mathcal{I}$ such that $\mu(C \setminus \bigcup\mathcal{I}_0) = 0$. As in (b) above,

$\gamma q' \le q'\mu(\bigcup\mathcal{I}_0) = \sum_{I\in\mathcal{I}_0} q'\mu I \le \sum_{I\in\mathcal{I}_0} \mu f^*(I) = \mu(\bigcup_{I\in\mathcal{I}_0} f^*(I))$.

On the other hand,

$\mu(\bigcup_{J\in\mathcal{J}_0} f^*(J)) = \sum_{J\in\mathcal{J}_0} \mu f^*(J) \le q\sum_{J\in\mathcal{J}_0} \mu J = q\mu(\bigcup\mathcal{J}_0) \le q\mu(\bigcup\mathcal{J}) \le q\mu G \le q(\gamma + \epsilon)$.
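But $\bigcup_{I\in\mathcal{I}_0} f^*(I) \subseteq \bigcup_{J\in\mathcal{J}_0} f^*(J)$, because every member of $\mathcal{I}_0$ is included in a member of $\mathcal{J}_0$, so $\gamma q' \le q(\gamma + \epsilon)$ and $\gamma \le \epsilon q/(q' - q)$. As $\epsilon$ is arbitrary, $\gamma = 0$.

Thus every $B_{qq'}$ is negligible. Consequently $B = \bigcup_{q,q'\in\mathbb{Q},\,0\le q<q'} B_{qq'}$ is negligible. [...] Then there is a $\delta > 0$ such that $x + h \in [a, b]$ and $|F(x + h) - F(x) - hF'(x)| \le \epsilon|h|$ whenever $|h| \le \delta$. Let $n \in \mathbb{N}$ be such that $2^{-n}(b - a) \le \delta$. Let $k < 2^n$ be such that $x \in I_{nk}$. Then

$x - \delta \le a_{nk} \le x < b_{nk} \le x + \delta$, $\quad g_n(x) = \frac{2^n}{b-a}\bigl(F(b_{nk}) - F(a_{nk})\bigr)$.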
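Now we have

$|g_n(x) - F'(x)| = \bigl|\frac{2^n}{b-a}(F(b_{nk}) - F(a_{nk})) - F'(x)\bigr|$

$= \frac{2^n}{b-a}\bigl|F(b_{nk}) - F(a_{nk}) - (b_{nk} - a_{nk})F'(x)\bigr|$

$\le \frac{2^n}{b-a}\bigl(|F(b_{nk}) - F(x) - (b_{nk} - x)F'(x)| + |F(x) - F(a_{nk}) - (x - a_{nk})F'(x)|\bigr)$

$\le \frac{2^n}{b-a}\bigl(\epsilon|b_{nk} - x| + \epsilon|x - a_{nk}|\bigr) = \epsilon$.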
And this is true whenever $2^{-n}(b - a) \le \delta$, that is, for all $n$ large enough. As $\epsilon$ is arbitrary, $F'(x) = \lim_{n\to\infty} g_n(x)$. $\mathbf{Q}$

(d) Thus $g_n \to F'$ almost everywhere on $[a, b]$. By Fatou’s Lemma,

$\int_a^b F' = \int_a^b \liminf_{n\to\infty} g_n \le \liminf_{n\to\infty} \int_a^b g_n = \lim_{n\to\infty} \int_a^b g_n = F(b) - F(a)$,
as required.

Remark There is a generalization of this result in 224I.

222D Lemma Suppose that $a < b$ in $\mathbb{R}$, and that $f$, $g$ are real-valued functions, both integrable over $[a, b]$, such that $\int_a^x f = \int_a^x g$ for every $x \in [a, b]$. Then $f = g$ almost everywhere on $[a, b]$.

proof The point is that

$\int_E f = \int_a^b f \times \chi E = \int_a^b g \times \chi E = \int_E g$

for any measurable set $E \subseteq [a, b[$. $\mathbf{P}$ (i) If $E = [c, d[$ where $a \le c \le d \le b$, then

$\int_E f = \int_a^d f - \int_a^c f = \int_a^d g - \int_a^c g = \int_E g$.
(ii) If $E = [a, b[ \cap G$ for some open set $G \subseteq \mathbb{R}$, then for each $n \in \mathbb{N}$ set

$K_n = \{k : k \in \mathbb{Z},\ |k| \le 4^n,\ [2^{-n}k, 2^{-n}(k + 1)[ \subseteq G\}$, $\quad H_n = \bigcup_{k\in K_n} [2^{-n}k, 2^{-n}(k + 1)[ \cap [a, b[$;

then $\langle H_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence of measurable sets with union $E$, so $f \times \chi E = \lim_{n\to\infty} f \times \chi H_n$, and (by Lebesgue’s Dominated Convergence Theorem, because $|f \times \chi H_n| \le |f|$ almost everywhere for every $n$, and $|f|$ is integrable)

$\int_E f = \lim_{n\to\infty} \int_{H_n} f$.

At the same time, each $H_n$ is a finite disjoint union of half-open intervals in $[a, b[$, so

$\int_{H_n} f = \sum_{k\in K_n} \int_{[2^{-n}k, 2^{-n}(k+1)[ \cap [a,b[} f = \sum_{k\in K_n} \int_{[2^{-n}k, 2^{-n}(k+1)[ \cap [a,b[} g = \int_{H_n} g$,

and

$\int_E g = \lim_{n\to\infty} \int_{H_n} g = \lim_{n\to\infty} \int_{H_n} f = \int_E f$.
(iii) For general measurable $E \subseteq [a, b[$, we can choose for each $n \in \mathbb{N}$ an open set $G_n \supseteq E$ such that $\mu G_n \le \mu E + 2^{-n}$ (134Fa). Set $G'_n = \bigcap_{m\le n} G_m$, $E_n = [a, b[ \cap G'_n$ for each $n$,

$F = [a, b[ \cap \bigcap_{n\in\mathbb{N}} G_n = \bigcap_{n\in\mathbb{N}} [a, b[ \cap G'_n = \bigcap_{n\in\mathbb{N}} E_n$.

Then $E \subseteq F$ and

$\mu F \le \inf_{n\in\mathbb{N}} \mu G_n = \mu E$,

so $F \setminus E$ is negligible and $f \times \chi(F \setminus E)$ is zero almost everywhere; consequently $\int_{F\setminus E} f = 0$ and $\int_F f = \int_E f$. On the other hand,

$f \times \chi F = \lim_{n\to\infty} f \times \chi E_n$,

so by Lebesgue’s Dominated Convergence Theorem again

$\int_E f = \int_F f = \lim_{n\to\infty} \int_{E_n} f$.

Similarly

$\int_E g = \lim_{n\to\infty} \int_{E_n} g$.

But by part (ii) we have $\int_{E_n} g = \int_{E_n} f$ for every $n$, so $\int_E g = \int_E f$, as required. $\mathbf{Q}$

By 131Hb, $f = g$ almost everywhere on $[a, b[$, and therefore almost everywhere on $[a, b]$.
222E Theorem Suppose that $a \le b$ in $\mathbb{R}$ and that $f$ is a real-valued function which is integrable over $[a, b]$. Then $F(x) = \int_a^x f$ exists in $\mathbb{R}$ for every $x \in [a, b]$, and the derivative $F'(x)$ exists and is equal to $f(x)$ for almost every $x \in [a, b]$.

proof (a) For most of this proof (down to the end of (c) below) I suppose that $f$ is non-negative. In this case,

$F(y) = F(x) + \int_x^y f \ge F(x)$

whenever $a \le x \le y \le b$; thus $F$ is non-decreasing and therefore differentiable almost everywhere in $[a, b]$, by 222A. By 222C we know also that $\int_a^x F'$ exists and is less than or equal to $F(x) - F(a) = F(x)$ for every $x \in [a, b]$.

(b) Now suppose, in addition, that $f$ is bounded; say $0 \le f(t) \le M$ for every $t \in \operatorname{dom} f$. Then $M - f$ is integrable over $[a, b]$; let $G$ be its indefinite integral, so that $G(x) = M(x - a) - F(x)$ for every $x \in [a, b]$. Applying (a) to $M - f$ and $G$, we have $\int_a^x G' \le G(x)$ for every $x \in [a, b]$; but of course $G' = M - F'$, so $M(x - a) - \int_a^x F' \le M(x - a) - F(x)$, that is, $\int_a^x F' \ge F(x)$ for every $x \in [a, b]$. Thus $\int_a^x F' = \int_a^x f$ for every $x \in [a, b]$. Now 222D tells us that $F' = f$ almost everywhere on $[a, b]$.

(c) Thus for bounded, non-negative $f$ we are done. For unbounded $f$, let $\langle f_n\rangle_{n\in\mathbb{N}}$ be a non-decreasing sequence of non-negative simple functions converging to $f$ almost everywhere on $[a, b]$, and let $\langle F_n\rangle_{n\in\mathbb{N}}$ be the corresponding indefinite integrals. Then for any $n$ and any $x$, $y$ with $a \le x \le y \le b$, we have

$F(y) - F(x) = \int_x^y f \ge \int_x^y f_n = F_n(y) - F_n(x)$,

so that $F'(x) \ge F_n'(x)$ for any $x \in\ ]a, b[$ where both are defined, and $F'(x) \ge f_n(x)$ for almost every $x \in [a, b]$. This is true for every $n$, so $F' \ge f$ almost everywhere, and $F' - f \ge 0$ almost everywhere. On the other hand, as noted in (a),

$\int_a^b F' \le F(b) - F(a) = \int_a^b f$,

so $\int_a^b F' - f \le 0$. It follows that $F' =_{\text{a.e.}} f$ (that is, that $F' = f$ almost everywhere) (122Rc).
(d) This completes the proof for non-negative $f$. For general $f$, we can express $f$ as $f_1 - f_2$ where $f_1$, $f_2$ are non-negative integrable functions; now $F = F_1 - F_2$ where $F_1$, $F_2$ are the corresponding indefinite integrals, so $F' =_{\text{a.e.}} F_1' - F_2' =_{\text{a.e.}} f_1 - f_2$, and $F' =_{\text{a.e.}} f$.

222F Corollary Suppose that $f$ is any real-valued function which is integrable over $\mathbb{R}$, and set $F(x) = \int_{-\infty}^x f$ for every $x \in \mathbb{R}$. Then $F'(x)$ exists and is equal to $f(x)$ for almost every $x \in \mathbb{R}$.

proof For each $n \in \mathbb{N}$, set

$F_n(x) = \int_{-n}^x f$

for $x \in [-n, n]$. Then $F_n'(x) = f(x)$ for almost every $x \in [-n, n]$. But $F(x) = F(-n) + F_n(x)$ for every $x \in [-n, n]$, so $F'(x)$ exists and is equal to $F_n'(x)$ for every $x \in\ ]-n, n[$ for which $F_n'(x)$ is defined; and $F'(x) = f(x)$ for almost every $x \in [-n, n]$. As $n$ is arbitrary, $F'(x) = f(x)$ for almost every $x \in \mathbb{R}$.
222G Corollary Suppose that $E \subseteq \mathbb{R}$ is a measurable set and that $f$ is a real-valued function which is integrable over $E$. Set $F(x) = \int_{E\cap]-\infty,x[} f$ for $x \in \mathbb{R}$. Then $F'(x) = f(x)$ for almost every $x \in E$, and $F'(x) = 0$ for almost every $x \in \mathbb{R} \setminus E$.

proof Apply 222F to $\tilde f$, where $\tilde f(x) = f(x)$ for $x \in E \cap \operatorname{dom} f$ and $\tilde f(x) = 0$ for $x \in \mathbb{R} \setminus E$, so that $F(x) = \int_{-\infty}^x \tilde f$ for every $x \in \mathbb{R}$.
222H The result that $\frac{d}{dx}\int_a^x f = f(x)$ for almost every $x$ is satisfying, but is no substitute for the more elementary result that this equality is valid at any point at which $f$ is continuous.

Proposition Suppose that $a \le b$ in $\mathbb{R}$ and that $f$ is a real-valued function which is integrable over $[a, b]$. Set $F(x) = \int_a^x f$ for $x \in [a, b]$. Then $F'(x)$ exists and is equal to $f(x)$ at any point $x \in \operatorname{dom}(f) \cap\ ]a, b[$ at which $f$ is continuous.

proof Set $c = f(x)$. Let $\epsilon > 0$. Let $\delta > 0$ be such that $\delta \le \min(b - x, x - a)$ and $|f(t) - c| \le \epsilon$ whenever $t \in \operatorname{dom} f$ and $|t - x| \le \delta$. If $x < y \le x + \delta$, then

$\bigl|\frac{F(y) - F(x)}{y - x} - c\bigr| = \frac{1}{y - x}\bigl|\int_x^y f - c\bigr| \le \frac{1}{y - x}\int_x^y |f - c| \le \epsilon$.

Similarly, if $x - \delta \le y < x$,

$\bigl|\frac{F(y) - F(x)}{y - x} - f(x)\bigr| = \frac{1}{x - y}\bigl|\int_y^x f - c\bigr| \le \frac{1}{x - y}\int_y^x |f - c| \le \epsilon$.

As $\epsilon$ is arbitrary,

$F'(x) = \lim_{y\to x} \frac{F(y) - F(x)}{y - x} = c$,
as required.

222I Complex-valued functions In the work above, I have taken $f$ to be real-valued throughout. The extension to complex-valued $f$ is just a matter of applying the above results to the real and imaginary parts of $f$. Specifically, we have the following.

(a) If $a \le b$ in $\mathbb{R}$ and $f$ is a complex-valued function which is integrable over $[a, b]$, then $F(x) = \int_a^x f$ is defined in $\mathbb{C}$ for every $x \in [a, b]$, and its derivative $F'(x)$ exists and is equal to $f(x)$ for almost every $x \in [a, b]$; moreover, $F'(x) = f(x)$ whenever $x \in \operatorname{dom}(f) \cap\ ]a, b[$ and $f$ is continuous at $x$.

(b) If $f$ is a complex-valued function which is integrable over $\mathbb{R}$, and $F(x) = \int_{-\infty}^x f$ for each $x \in \mathbb{R}$, then $F'$ exists and is equal to $f$ almost everywhere in $\mathbb{R}$.

(c) If $E \subseteq \mathbb{R}$ is a measurable set and $f$ is a complex-valued function which is integrable over $E$, and $F(x) = \int_{E\cap]-\infty,x[} f$ for each $x \in \mathbb{R}$, then $F'(x) = f(x)$ for almost every $x \in E$ and $F'(x) = 0$ for almost every $x \in \mathbb{R} \setminus E$.

222X Basic exercises > (a) Suppose that $a < b$ in $\mathbb{R}$ and that $f$ is a real-valued function which is integrable over $[a, b]$. Show that the indefinite integral $x \mapsto \int_a^x f$ is continuous.

> (b) Suppose that $a < b$ in $\mathbb{R}$ and that $h$ is a real-valued function such that $\int_x^y h$ exists and is non-negative whenever $a \le x \le y \le b$. Show that $h \ge 0$ almost everywhere on $[a, b]$.

> (c) Suppose that $a < b$ in $\mathbb{R}$ and that $f$, $g$ are integrable complex-valued functions on $[a, b]$ such that $\int_a^x f = \int_a^x g$ for every $x \in [a, b]$. Show that $f = g$ almost everywhere on $[a, b]$.

> (d) Let $F : [0, 1] \to [0, 1]$ be the Cantor function (134H). Show that $\int_0^1 F' < F(1) - F(0)$.

222Y Further exercises (a) Let $\langle F_n\rangle_{n\in\mathbb{N}}$ be a sequence of non-negative, non-decreasing functions on $[0, 1]$ such that $F(x) = \sum_{n=0}^\infty F_n(x)$ is finite for every $x \in [0, 1]$. Show that $\sum_{n=0}^\infty F_n'(x) = F'(x)$ for almost every $x \in [0, 1]$. (Hint: take $\langle n_k\rangle_{k\in\mathbb{N}}$ such that $\sum_{k=0}^\infty F(1) - G_k(1) < \infty$, where $G_k = \sum_{j=0}^{n_k} F_j$, and set $H(x) = \sum_{k=0}^\infty F(x) - G_k(x)$. Observe that $\sum_{k=0}^\infty F'(x) - G_k'(x) \le H'(x)$ whenever all the derivatives are defined, so that $F' = \lim_{k\to\infty} G_k'$ almost everywhere.)
(b) Let $F : [0, 1] \to \mathbb{R}$ be a continuous non-decreasing function. (i) Show that if $c \in \mathbb{R}$ then $C = \{(x, y) : x, y \in [0, 1],\ F(y) - F(x) = c\}$ is connected. (Hint: A set $A \subseteq \mathbb{R}^r$ is connected if there is no continuous surjection $h : A \to \{0, 1\}$. Show that if $h : C \to \{0, 1\}$ is continuous then it is of the form $(x, y) \mapsto h_1(x)$ for some continuous function $h_1$.) (ii) Now suppose that $F(0) = 0$, $F(1) = 1$ and that $G : [0, 1] \to [0, 1]$ is a second continuous non-decreasing function with $G(0) = 0$, $G(1) = 1$. Show that for any $n \ge 1$ there are $x, y \in [0, 1]$ such that $F(y) - F(x) = G(y) - G(x) = \frac{1}{n}$.

(c) Let $f$, $g$ be non-negative integrable functions on $\mathbb{R}$, and $n \ge 1$. Show that there are $u < v$ in $[-\infty, \infty]$ such that $\int_u^v f = \frac{1}{n}\int f$, $\int_u^v g = \frac{1}{n}\int g$.

(d) Let $f : \mathbb{R} \to \mathbb{R}$ be measurable. Show that $H = \operatorname{dom} f'$ is a measurable set and that $f'$ is a measurable function.

222 Notes and comments I have relegated to an exercise (222Xa) the fundamental fact that an indefinite integral $x \mapsto \int_a^x f$ is always continuous; this is not strictly speaking needed in this section, and a much stronger result is given in 225E. There is also much more to be said about monotonic functions, to which I will return in §224. What we need here is the fact that they are differentiable almost everywhere (222A), which I prove by applying Vitali’s theorem three times, once in part (b) of the proof and twice in part (c). Following this, the arguments of 222C-222E form a fine series of exercises in the central ideas of Volume 1, using the concept of integration over a (measurable) subset, Fatou’s Lemma (part (d) of the proof of 222C), Lebesgue’s Dominated Convergence Theorem (parts (ii) and (iii) of the proof of 222D) and the approximation of Lebesgue measurable sets by open sets (part (iii) of the proof of 222D).
Of course knowing that $\frac{d}{dx}\int_a^x f = f(x)$ almost everywhere is not at all the same thing as knowing that this holds for any particular $x$, and when we come to differentiate any particular indefinite integral we generally turn to 222H first; the point of 222E is that it applies to wildly discontinuous functions, for which more primitive methods give no information at all.
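To see numerically how the inequality of 222C can be strict (compare 222Xd), here is a small Python sketch, offered only as an illustration and not as part of the formal development. It assumes the standard ternary-digit description of the Cantor function; the names `cantor` and `zero_slope_fraction`, the grid size and the step `h` are hypothetical choices made for this example, and floating-point arithmetic limits the accuracy of the deeper digits.

```python
def cantor(x: float, depth: int = 40) -> float:
    """Approximate the Cantor function on [0, 1] from the ternary digits of x.

    Assumption: ternary digits 0/2 are read as binary digits 0/1, and the
    expansion is cut off at the first digit equal to 1.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    total, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3.0
        digit = int(x)          # next ternary digit of x
        x -= digit
        if digit == 1:          # x lies in a removed middle third
            return total + scale
        total += scale * (digit // 2)
        scale *= 0.5
    return total


def zero_slope_fraction(n_points: int = 2000, h: float = 1e-6) -> float:
    """Fraction of grid midpoints at which the symmetric difference quotient
    (F(x + h) - F(x - h)) / (2h) of the Cantor function is exactly zero."""
    flat = 0
    for i in range(n_points):
        x = (i + 0.5) / n_points
        if cantor(x + h) == cantor(x - h):
            flat += 1
    return flat / n_points


if __name__ == "__main__":
    print("F(1) - F(0) =", cantor(1.0) - cantor(0.0))
    print("fraction of sample points with zero slope:", zero_slope_fraction())
```

The function rises from 0 to 1, yet at the great majority of sample points the difference quotient vanishes, which is consistent with $F' = 0$ almost everywhere and so with $\int_0^1 F' = 0 < F(1) - F(0)$.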
223 Lebesgue’s density theorems

I now turn to a group of results which may be thought of as corollaries of Theorem 222E, but which also have a vigorous life of their own, including the possibility of significant generalizations which will be treated in Chapter 26. The idea is that any measurable function $f$ on $\mathbb{R}^r$ is almost everywhere ‘continuous’ in a variety of very weak senses; for almost every $x$, the value $f(x)$ is determined by the behaviour of $f$ near $x$, in the sense that $f(y) \approx f(x)$ for ‘most’ $y$ near $x$. I should perhaps say that while I recommend this work as a preparation for Chapter 26, and I also rely on it in Chapter 28, I shall not refer to it again in the present chapter, so that readers in a hurry to characterize indefinite integrals may proceed directly to §224.

223A Lebesgue’s Density Theorem: integral form Let $I$ be an interval in $\mathbb{R}$, and let $f$ be a real-valued function which is integrable over $I$. Then

$f(x) = \lim_{h\downarrow 0} \frac{1}{h}\int_x^{x+h} f = \lim_{h\downarrow 0} \frac{1}{h}\int_{x-h}^x f = \lim_{h\downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} f$

for almost every $x \in I$.

proof Setting $F(x) = \int_{I\cap]-\infty,x[} f$, we know from 222G that

$f(x) = F'(x) = \lim_{h\downarrow 0} \frac{1}{h}(F(x + h) - F(x)) = \lim_{h\downarrow 0} \frac{1}{h}\int_x^{x+h} f$

$= \lim_{h\downarrow 0} \frac{1}{h}(F(x) - F(x - h)) = \lim_{h\downarrow 0} \frac{1}{h}\int_{x-h}^x f$

$= \lim_{h\downarrow 0} \frac{1}{2h}(F(x + h) - F(x - h)) = \lim_{h\downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} f$

for almost every $x \in I$.
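The role of ‘almost every’ in 223A can be seen in a simple unbounded example. The following Python sketch is an illustration under stated assumptions, not part of the text’s argument: it takes $f(t) = |t|^{-1/2}$, which is integrable over any bounded interval, uses the explicit antiderivative $F(t) = 2\,\mathrm{sgn}(t)\sqrt{|t|}$ in the role of the indefinite integral from the proof, and evaluates the symmetric averages $\frac{1}{2h}(F(x+h) - F(x-h))$. The function names are hypothetical.

```python
import math


def antiderivative(t: float) -> float:
    """F(t) = 2 * sgn(t) * sqrt(|t|), an indefinite integral of |t|**-0.5."""
    return math.copysign(2.0 * math.sqrt(abs(t)), t)


def symmetric_average(x: float, h: float) -> float:
    """(1/2h) * integral of |t|**-0.5 over [x - h, x + h], computed from F."""
    return (antiderivative(x + h) - antiderivative(x - h)) / (2.0 * h)


if __name__ == "__main__":
    for h in (1e-1, 1e-2, 1e-3, 1e-4):
        # At x = 1 the averages approach f(1) = 1; at x = 0 they grow like 2/sqrt(h).
        print(h, symmetric_average(1.0, h), symmetric_average(0.0, h))
```

At $x = 1$ the averages settle at $f(1) = 1$, while at the single exceptional point $x = 0$ they diverge; one point is negligible, so this is consistent with the theorem.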
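223B Corollary Let $E \subseteq \mathbb{R}$ be a measurable set. Then

$\lim_{h\downarrow 0} \frac{1}{2h}\mu(E \cap [x - h, x + h]) = 1$ for almost every $x \in E$,

$\lim_{h\downarrow 0} \frac{1}{2h}\mu(E \cap [x - h, x + h]) = 0$ for almost every $x \in \mathbb{R} \setminus E$.

proof Take $n \in \mathbb{N}$. Applying 223A to $f = \chi(E \cap [-n, n])$, we see that

$\lim_{h\downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} f = \lim_{h\downarrow 0} \frac{1}{2h}\mu(E \cap [x - h, x + h])$

whenever $x \in\ ]-n, n[$ and either limit exists, so that

$\lim_{h\downarrow 0} \frac{1}{2h}\mu(E \cap [x - h, x + h]) = 1$ for almost every $x \in E \cap [-n, n]$,

$\lim_{h\downarrow 0} \frac{1}{2h}\mu(E \cap [x - h, x + h]) = 0$ for almost every $x \in [-n, n] \setminus E$.

As $n$ is arbitrary, we have the result.

Remark For a measurable set $E \subseteq \mathbb{R}$, a point $x$ such that $\lim_{h\downarrow 0} \frac{1}{2h}\mu(E \cap [x - h, x + h]) = 1$ is sometimes called a density point of $E$.

223C Corollary Let $f$ be a measurable real-valued function defined almost everywhere on $\mathbb{R}$. Then for almost every $x \in \mathbb{R}$,

$\lim_{h\downarrow 0} \frac{1}{2h}\mu\{y : y \in \operatorname{dom} f,\ |y - x| \le h,\ |f(y) - f(x)| \le \epsilon\} = 1$,

$\lim_{h\downarrow 0} \frac{1}{2h}\mu\{y : y \in \operatorname{dom} f,\ |y - x| \le h,\ |f(y) - f(x)| \ge \epsilon\} = 0$

for every $\epsilon > 0$.

proof For $q, q' \in \mathbb{Q}$, set

$D_{qq'} = \{x : x \in \operatorname{dom} f,\ q \le f(x) < q'\}$,

so that $D_{qq'}$ is measurable,

$C_{qq'} = \{x : x \in D_{qq'},\ \lim_{h\downarrow 0} \frac{1}{2h}\mu(D_{qq'} \cap [x - h, x + h]) = 1\}$,

so that $D_{qq'} \setminus C_{qq'}$ is negligible, by 223B; now set

$C = \operatorname{dom} f \setminus \bigcup_{q,q'\in\mathbb{Q}} (D_{qq'} \setminus C_{qq'})$,

so that $\mathbb{R} \setminus C$ is negligible. If $x \in C$ and $\epsilon > 0$, then there are $q, q' \in \mathbb{Q}$ such that $f(x) - \epsilon \le q \le f(x) < q' \le f(x) + \epsilon$, so that $x$ belongs to $D_{qq'}$ and therefore to $C_{qq'}$, and now

$\liminf_{h\downarrow 0} \frac{1}{2h}\mu\{y : y \in \operatorname{dom} f \cap [x - h, x + h],\ |f(y) - f(x)| \le \epsilon\} \ge \liminf_{h\downarrow 0} \frac{1}{2h}\mu(D_{qq'} \cap [x - h, x + h]) = 1$,

so

$\lim_{h\downarrow 0} \frac{1}{2h}\mu\{y : y \in \operatorname{dom} f \cap [x - h, x + h],\ |f(y) - f(x)| \le \epsilon\} = 1$.

It follows at once that

$\lim_{h\downarrow 0} \frac{1}{2h}\mu\{y : y \in \operatorname{dom} f \cap [x - h, x + h],\ |f(y) - f(x)| > \epsilon\} = 0$

for almost every $x$; but since $\epsilon$ is arbitrary, this is also true of $\frac{1}{2}\epsilon$, so in fact

$\lim_{h\downarrow 0} \frac{1}{2h}\mu\{y : y \in \operatorname{dom} f \cap [x - h, x + h],\ |f(y) - f(x)| \ge \epsilon\} = 0$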
for almost every $x$.

223D Theorem Let $I$ be an interval in $\mathbb{R}$, and let $f$ be a real-valued function which is integrable over $I$. Then

$\lim_{h\downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} |f(y) - f(x)|\,dy = 0$

for almost every $x \in I$.

proof (a) Suppose first that $I$ is a bounded open interval $]a, b[$. For each $q \in \mathbb{Q}$, set $g_q(x) = |f(x) - q|$ for $x \in I \cap \operatorname{dom} f$; then $g_q$ is integrable over $I$, and

$\lim_{h\downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} g_q = g_q(x)$

for almost every $x \in I$, by 223A. Setting

$E_q = \{x : x \in I \cap \operatorname{dom} f,\ \lim_{h\downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} g_q = g_q(x)\}$,

we have $I \setminus E_q$ negligible, so $I \setminus E$ is negligible, where $E = \bigcap_{q\in\mathbb{Q}} E_q$. Now

$\lim_{h\downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} |f(y) - f(x)|\,dy = 0$

for every $x \in E$. $\mathbf{P}$ Take $x \in E$ and $\epsilon > 0$. Then there is a $q \in \mathbb{Q}$ such that $|f(x) - q| \le \epsilon$, so that

$|f(y) - f(x)| \le |f(y) - q| + \epsilon = g_q(y) + \epsilon$

for every $y \in I \cap \operatorname{dom} f$, and

$\limsup_{h\downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} |f(y) - f(x)|\,dy \le \limsup_{h\downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} g_q(y) + \epsilon\,dy = \epsilon + g_q(x) \le 2\epsilon$.

As $\epsilon$ is arbitrary,

$\lim_{h\downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} |f(y) - f(x)|\,dy = 0$,

as required. $\mathbf{Q}$

(b) If $I$ is an unbounded open interval, apply (a) to the intervals $I_n = I \cap\ ]-n, n[$ to see that the limit is zero almost everywhere on every $I_n$, and therefore on $I$. If $I$ is an arbitrary interval, note that it differs by at most two points from an open interval, and that since we are looking only for something to happen almost everywhere we can ignore these points.

Remark The set

$\{x : x \in \operatorname{dom} f,\ \lim_{h\downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} |f(y) - f(x)|\,dy = 0\}$
is sometimes called the Lebesgue set of $f$.

223E Complex-valued functions I have expressed the results above in terms of real-valued functions, this being the most natural vehicle for the ideas. However there are applications of great importance in which the functions involved are complex-valued, so I spell out the relevant statements here. In all cases the proof is elementary, being nothing more than applying the corresponding result (223A, 223C or 223D) to the real and imaginary parts of the function $f$.

(a) Let $I$ be an interval in $\mathbb{R}$, and let $f$ be a complex-valued function which is integrable over $I$. Then

$f(x) = \lim_{h\downarrow 0} \frac{1}{h}\int_x^{x+h} f = \lim_{h\downarrow 0} \frac{1}{h}\int_{x-h}^x f = \lim_{h\downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} f$

for almost every $x \in I$.

(b) Let $f$ be a measurable complex-valued function defined almost everywhere on $\mathbb{R}$. Then for almost every $x \in \mathbb{R}$,

$\lim_{h\downarrow 0} \frac{1}{2h}\mu\{y : y \in \operatorname{dom} f,\ |y - x| \le h,\ |f(y) - f(x)| \ge \epsilon\} = 0$

for every $\epsilon > 0$.

(c) Let $I$ be an interval in $\mathbb{R}$, and let $f$ be a complex-valued function which is integrable over $I$. Then

$\lim_{h\downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} |f(y) - f(x)|\,dy = 0$
for almost every $x \in I$.

223X Basic exercises > (a) Let $E \subseteq [0, 1]$ be a measurable set for which there is an $\alpha > 0$ such that $\mu(E \cap [a, b]) \ge \alpha(b - a)$ whenever $0 \le a \le b \le 1$. Show that $\mu E = 1$.

(b) Let $A \subseteq \mathbb{R}$ be any set. Show that $\lim_{h\downarrow 0} \frac{1}{2h}\mu^*(A \cap [x - h, x + h]) = 1$ for almost every $x \in A$. (Hint: apply 223B to a measurable envelope $E$ of $A$.)

(c) Let $f$ be any real-valued function defined almost everywhere in $\mathbb{R}$. Show that $\lim_{h\downarrow 0} \frac{1}{2h}\mu^*\{y : y \in \operatorname{dom} f,\ |y - x| \le h,\ |f(y) - f(x)| \le \epsilon\} = 1$ for almost every $x \in \mathbb{R}$. (Hint: use the argument of 223C, but with 223Xb in place of 223B.)

> (d) Let $I$ be an interval in $\mathbb{R}$, and let $f$ be a real-valued function which is integrable over $I$. Show that $\lim_{h\downarrow 0} \frac{1}{h}\int_x^{x+h} |f(y) - f(x)|\,dy = 0$ for almost every $x \in I$.

(e) Let $E, F \subseteq \mathbb{R}$ be measurable sets, and suppose that $F$ is bounded and of non-zero measure. Let $x \in \mathbb{R}$ be such that $\lim_{h\downarrow 0} \frac{1}{2h}\mu(E \cap [x - h, x + h]) = 1$. Show that $\lim_{h\downarrow 0} \frac{\mu(E \cap (x + hF))}{h\mu F} = 1$. (Hint: it helps to know that $\mu(hF) = h\mu F$ (134Ya, 263A). Show that if $F \subseteq [-M, M]$, then

$\frac{1}{2hM}\mu(E \cap [x - hM, x + hM]) \le 1 - \frac{\mu F}{2M}\Bigl(1 - \frac{\mu(E \cap (x + hF))}{h\mu F}\Bigr)$.)

(Compare 223Ya.)

(f) Let $f$ be a real-valued function which is integrable over $\mathbb{R}$, and let $E$ be the Lebesgue set of $f$. Show that $\lim_{h\downarrow 0} \frac{1}{2h}\int_{x-h}^{x+h} |f(t) - c|\,dt = |f(x) - c|$ for every $x \in E$ and $c \in \mathbb{R}$.

(g) Let $f$ be an integrable real-valued function defined almost everywhere in $\mathbb{R}$. Let $x \in \operatorname{dom} f$ be such that $\lim_{n\to\infty} \frac{n}{2}\int_{x-1/n}^{x+1/n} |f(y) - f(x)|\,dy = 0$. Show that $x$ belongs to the Lebesgue set of $f$.

(h) Let $f$ be an integrable real-valued function defined almost everywhere in $\mathbb{R}$, and $x$ any point of the Lebesgue set of $f$. Show that for every $\epsilon > 0$ there is a $\delta > 0$ such that whenever $I$ is a non-trivial interval and $x \in I \subseteq [x - \delta, x + \delta]$, then $|f(x) - \frac{1}{\mu I}\int_I f| \le \epsilon$.

(i) Let $E, F \subseteq \mathbb{R}$ be measurable sets, and $x \in \mathbb{R}$ a point which is a density point of both. Show that $x$ is a density point of $E \cap F$.
(j) Let $E \subseteq \mathbb{R}$ be a non-negligible measurable set. Show that for any $n \in \mathbb{N}$ there is a $\delta > 0$ such that $\bigcap_{i\le n} (E + x_i)$ is non-empty whenever $x_0, \ldots, x_n \in \mathbb{R}$ are such that $|x_i - x_j| \le \delta$ for all $i, j \le n$. (Hint: find a non-trivial interval $I$ such that $\mu(E \cap I) > \frac{n}{n+1}\mu I$.)

223Y Further exercises (a) Let $E, F \subseteq \mathbb{R}$ be measurable sets, and suppose that $0 < \mu F < \infty$. Let $x \in \mathbb{R}$ be such that $\lim_{h\downarrow 0} \frac{1}{2h}\mu(E \cap [x - h, x + h]) = 1$. Show that $\lim_{h\downarrow 0} \frac{\mu(E \cap (x + hF))}{h\mu F} = 1$. (Hint: apply 223Xe to sets of the form $F \cap [-M, M]$.)

(b) Let $\mathfrak{T}$ be the family of measurable sets $G \subseteq \mathbb{R}$ such that every point of $G$ is a density point of $G$. (i) Show that $\mathfrak{T}$ is a topology on $\mathbb{R}$. (Hint: take $\mathcal{G} \subseteq \mathfrak{T}$. By 215B(iv) there is a countable $\mathcal{G}_0 \subseteq \mathcal{G}$ such that $\mu(G \setminus \bigcup\mathcal{G}_0) = 0$ for every $G \in \mathcal{G}$. Show that

$\bigcup\mathcal{G} \subseteq \{x : \limsup_{h\downarrow 0} \frac{1}{2h}\mu(\bigcup\mathcal{G}_0 \cap [x - h, x + h]) > 0\}$,

so that $\mu(\bigcup\mathcal{G} \setminus \bigcup\mathcal{G}_0) = 0$.) (ii) Show that a function $f : \mathbb{R} \to \mathbb{R}$ is measurable iff it is $\mathfrak{T}$-continuous at almost every $x \in \mathbb{R}$. ($\mathfrak{T}$ is the density topology on $\mathbb{R}$. See 414P in Volume 4.)

(c) Show that if $f : [a, b] \to \mathbb{R}$ is bounded and continuous for the density topology on $\mathbb{R}$, then $f(x) = \frac{d}{dx}\int_a^x f$ for every $x \in\ ]a, b[$.

(d) Show that a function $f : \mathbb{R} \to \mathbb{R}$ is continuous for the density topology at $x \in \mathbb{R}$ iff $\lim_{h\downarrow 0} \frac{1}{2h}\mu^*\{y : y \in [x - h, x + h],\ |f(y) - f(x)| \ge \epsilon\} = 0$ for every $\epsilon > 0$.

(e) A set $A \subseteq \mathbb{R}$ is porous at a point $x \in \mathbb{R}$ if $\limsup_{y\to x} \frac{\rho(y, A)}{\|y - x\|} > 0$, where $\rho(y, A) = \inf_{a\in A} \|y - a\|$. Show that if $A$ is porous at every $x \in A$ then $A$ is negligible.

(f) For a measurable set $E \subseteq \mathbb{R}$ write $\psi E$ for the set of its density points. Show that (i) $\psi(E \cap F) = \psi E \cap \psi F$ for all measurable sets $E$, $F$ (ii) for measurable sets $E$ and $F$, $\psi E \subseteq \psi F$ iff $\mu(E \setminus F) = 0$ (iii) $\mu(E \triangle \psi E) = 0$ for every measurable set $E$ (iv) $\psi(\psi E) = \psi E$ for every measurable set $E$ (v) for every compact set $K \subseteq \psi E$ there is a compact set $L \subseteq K \cup E$ such that $K \subseteq \psi L$.

(g) Let $f$ be an integrable real-valued function defined almost everywhere in $\mathbb{R}$, and $x$ any point of the Lebesgue set of $f$. Show that for every $\epsilon > 0$ there is a $\delta > 0$ such that $|f(x)\int g - \int f \times g| \le \epsilon\int g$ whenever $g : \mathbb{R} \to [0, \infty[$ is such that $g$ is non-decreasing on $]-\infty, x]$, non-increasing on $[x, \infty[$ and zero outside $[x - \delta, x + \delta]$. (Hint: express $g$ as a limit almost everywhere of functions of the form $\frac{g(x)}{n+1}\sum_{i=0}^n \chi\,]a_i, b_i[$, where $x - \delta \le a_0 \le \ldots \le a_n \le x \le b_n \le \ldots \le b_0 \le x + \delta$.)

223 Notes and comments The results of this section can be thought of as saying that a measurable function is in some sense ‘almost continuous’; indeed, 223Yb is an attempt to make this notion precise. For an integrable function we have stronger results, of which the furthest-reaching seems to be 223D/223Ec. There are $r$-dimensional versions of all these theorems, using balls centred on $x$ in place of intervals $[x - h, x + h]$; I give these in 261C-261E. A new idea is needed for the $r$-dimensional version of Lebesgue’s density theorem (261C), but the rest of the generalization is straightforward. A less natural, and less important, extension, also in §261, involves functions defined on non-measurable sets (compare 223Xa-223Xc). In 223D, and again in 223Xf, the essential idea is just that the intersection of countably many conegligible sets is conegligible. Put like this, it should by now be almost second nature to you. But applications of this kind tend to hinge on selecting the right family of conegligible sets to look at the intersection of. And the guiding principle is, that you need not be economical. If you have any countable family of conegligible sets in hand, you are entitled to work within its intersection. So, for instance, once we have established that every
measurable function has a conegligible Lebesgue set, then henceforth we can work within the intersections of the Lebesgue sets of all the functions we have names for – provided that in this context we restrict ourselves to names based on some countable language. Thus in 223Xf I suggest looking at functions of the form $x \mapsto |f(x) - q|$ where $q \in \mathbb{Q}$. The countable language used here has a name for the given function $f$ and for every rational number, but not for all real numbers. Of course we could certainly have larger countable sets in place of $\mathbb{Q}$ if they helped. For once, it is worth trying to develop a restrictable imagination: you want to take the intersection of ‘all the conegligible sets you can think of’, but in so doing you must shift temporarily into a frame of mind which encompasses only a countable universe. (The objects of that countable universe can of course be uncountable sets; it is the universe which should be countable, when seen from outside, not its members.) The countable universe you use can in principle contain all the individual objects which appear in the statement of the problem; thus in 223Xf, for instance, the function $f$ itself surely belongs to the relevant universe, like the set $\mathbb{R}$ (but not most of its members) and the operation of subtraction.
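For readers who like to experiment, the density quotients of 223B can be computed exactly for an interval. The Python sketch below is only an illustration, with the hypothetical helper `interval_density` and the assumed choice $E = [0, 1]$: interior points are density points, the two endpoints have density $\frac12$, and points outside the closed interval have density $0$, so the exceptional set is the negligible set $\{0, 1\}$.

```python
def interval_density(x: float, h: float, a: float = 0.0, b: float = 1.0) -> float:
    """Return (1/2h) * mu([a, b] ∩ [x - h, x + h]) for the interval E = [a, b]."""
    overlap = max(0.0, min(b, x + h) - max(a, x - h))
    return overlap / (2.0 * h)


if __name__ == "__main__":
    for h in (1e-1, 1e-2, 1e-3):
        # interior point, endpoint, exterior point
        print(h, interval_density(0.5, h), interval_density(0.0, h), interval_density(2.0, h))
```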
224 Functions of bounded variation

I turn now to the second of the two problems to which this chapter is devoted: the identification of those real functions which are indefinite integrals. I take the opportunity to offer a brief introduction to the theory of functions of bounded variation, which are interesting in themselves and will be important in Chapter 28. I give the basic characterization of these functions as differences of monotonic functions (224D), with a representative sample of their elementary properties.

224A Definition Let $f$ be a real-valued function and $D$ a subset of $\mathbb{R}$. I define $\operatorname{Var}_D(f)$, the (total) variation of $f$ on $D$, as follows. If $D \cap \operatorname{dom} f = \emptyset$, $\operatorname{Var}_D(f) = 0$. Otherwise, $\operatorname{Var}_D(f)$ is

$\sup\{\sum_{i=1}^n |f(a_i) - f(a_{i-1})| : a_0, a_1, \ldots, a_n \in D \cap \operatorname{dom} f,\ a_0 \le a_1 \le \ldots \le a_n\}$,

allowing $\operatorname{Var}_D(f) = \infty$. If $\operatorname{Var}_D(f)$ is finite, we say that $f$ is of bounded variation on $D$. If the context seems clear, I may write $\operatorname{Var} f$ for $\operatorname{Var}_{\operatorname{dom} f}(f)$, and say that $f$ is simply ‘of bounded variation’ if this is finite.

224B Remarks (a) In the present chapter, we shall virtually exclusively be concerned with the case in which $D$ is a bounded closed interval included in $\operatorname{dom} f$. The general formulation will be useful for some technical questions arising in Chapter 28; but if it makes you more comfortable, you will lose nothing by supposing for the moment that $D$ is an interval.

(b) Clearly $\operatorname{Var}_D(f) = \operatorname{Var}_{D\cap\operatorname{dom} f}(f) = \operatorname{Var}(f{\upharpoonright}D)$ for all $D$, $f$.

224C Proposition (a) If $f$, $g$ are two real-valued functions and $D \subseteq \mathbb{R}$, then $\operatorname{Var}_D(f + g) \le \operatorname{Var}_D(f) + \operatorname{Var}_D(g)$.

(b) If $f$ is a real-valued function, $D \subseteq \mathbb{R}$ and $c \in \mathbb{R}$ then $\operatorname{Var}_D(cf) = |c| \operatorname{Var}_D(f)$.

(c) If $f$ is a real-valued function, $D \subseteq \mathbb{R}$ and $x \in \mathbb{R}$ then $\operatorname{Var}_D(f) \ge \operatorname{Var}_{D\cap]-\infty,x]}(f) + \operatorname{Var}_{D\cap[x,\infty[}(f)$, with equality if $x \in D \cap \operatorname{dom} f$.

(d) If $f$ is a real-valued function and $D \subseteq D' \subseteq \mathbb{R}$ then $\operatorname{Var}_D(f) \le \operatorname{Var}_{D'}(f)$.

(e) If $f$ is a real-valued function and $D \subseteq \mathbb{R}$, then $|f(x) - f(y)| \le \operatorname{Var}_D(f)$ for all $x, y \in D \cap \operatorname{dom} f$; so if $f$ is of bounded variation on $D$ then $f$ is bounded on $D \cap \operatorname{dom} f$ and (if $D \cap \operatorname{dom} f \ne \emptyset$) $\sup_{y\in D\cap\operatorname{dom} f} |f(y)| \le |f(x)| + \operatorname{Var}_D(f)$ for every $x \in D \cap \operatorname{dom} f$.
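(f) If $f$ is a monotonic real-valued function and $D \subseteq \mathbb{R}$ meets $\operatorname{dom} f$, then $\operatorname{Var}_D(f) = \sup_{x\in D\cap\operatorname{dom} f} f(x) - \inf_{x\in D\cap\operatorname{dom} f} f(x)$.

proof (a) If $D \cap \operatorname{dom}(f + g) = \emptyset$ this is trivial, because $\operatorname{Var}_D(f)$ and $\operatorname{Var}_D(g)$ are surely non-negative. Otherwise, if $a_0 \le \ldots \le a_n$ in $D \cap \operatorname{dom}(f + g)$, then

$\sum_{i=1}^n |(f + g)(a_i) - (f + g)(a_{i-1})| \le \sum_{i=1}^n |f(a_i) - f(a_{i-1})| + \sum_{i=1}^n |g(a_i) - g(a_{i-1})| \le \operatorname{Var}_D(f) + \operatorname{Var}_D(g)$;

as $a_0, \ldots, a_n$ are arbitrary, $\operatorname{Var}_D(f + g) \le \operatorname{Var}_D(f) + \operatorname{Var}_D(g)$.

(b) $\sum_{i=1}^n |(cf)(a_i) - (cf)(a_{i-1})| = |c| \sum_{i=1}^n |f(a_i) - f(a_{i-1})|$ whenever $a_0 \le \ldots \le a_n$ in $D \cap \operatorname{dom} f$.

(c)(i) If either $D \cap\ ]-\infty, x] \cap \operatorname{dom} f$ or $D \cap [x, \infty[ \cap \operatorname{dom} f$ is empty, this is trivial. If $a_0 \le \ldots \le a_m$ in $D \cap\ ]-\infty, x] \cap \operatorname{dom} f$, $b_0 \le \ldots \le b_n$ in $D \cap [x, \infty[ \cap \operatorname{dom} f$, then

$\sum_{i=1}^m |f(a_i) - f(a_{i-1})| + \sum_{j=1}^n |f(b_j) - f(b_{j-1})| \le \sum_{i=1}^{m+n+1} |f(a_i) - f(a_{i-1})| \le \operatorname{Var}_D(f)$,

if we write $a_i = b_{i-m-1}$ for $m + 1 \le i \le m + n + 1$. So $\operatorname{Var}_{D\cap]-\infty,x]}(f) + \operatorname{Var}_{D\cap[x,\infty[}(f) \le \operatorname{Var}_D(f)$.

(ii) Now suppose that $x \in D \cap \operatorname{dom} f$. If $a_0 \le \ldots \le a_n$ in $D \cap \operatorname{dom} f$, and $a_0 \le x \le a_n$, let $k$ be such that $x \in [a_{k-1}, a_k]$; then

$\sum_{i=1}^n |f(a_i) - f(a_{i-1})| \le \sum_{i=1}^{k-1} |f(a_i) - f(a_{i-1})| + |f(x) - f(a_{k-1})| + |f(a_k) - f(x)| + \sum_{i=k+1}^n |f(a_i) - f(a_{i-1})|$

$\le \operatorname{Var}_{D\cap]-\infty,x]}(f) + \operatorname{Var}_{D\cap[x,\infty[}(f)$

(counting empty sums $\sum_{i=1}^0$, $\sum_{i=n+1}^n$ as $0$). If $x \le a_0$ then $\sum_{i=1}^n |f(a_i) - f(a_{i-1})| \le \operatorname{Var}_{D\cap[x,\infty[}(f)$; if $x \ge a_n$ then $\sum_{i=1}^n |f(a_i) - f(a_{i-1})| \le \operatorname{Var}_{D\cap]-\infty,x]}(f)$. Thus

$\sum_{i=1}^n |f(a_i) - f(a_{i-1})| \le \operatorname{Var}_{D\cap]-\infty,x]}(f) + \operatorname{Var}_{D\cap[x,\infty[}(f)$

in all cases; as $a_0, \ldots, a_n$ are arbitrary, $\operatorname{Var}_D(f) \le \operatorname{Var}_{D\cap]-\infty,x]}(f) + \operatorname{Var}_{D\cap[x,\infty[}(f)$. So the two sides are equal.

(d) is trivial.

(e) If $x, y \in D \cap \operatorname{dom} f$ and $x \le y$ then $|f(x) - f(y)| = |f(y) - f(x)| \le \operatorname{Var}_D(f)$ by the definition of $\operatorname{Var}_D$; and the same is true if $y \le x$. So of course $|f(y)| \le |f(x)| + \operatorname{Var}_D(f)$.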
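(f) If $f$ is non-decreasing, then

$\operatorname{Var}_D(f) = \sup\{\sum_{i=1}^n |f(a_i) - f(a_{i-1})| : a_0, a_1, \ldots, a_n \in D \cap \operatorname{dom} f,\ a_0 \le a_1 \le \ldots \le a_n\}$

$= \sup\{\sum_{i=1}^n f(a_i) - f(a_{i-1}) : a_0, a_1, \ldots, a_n \in D \cap \operatorname{dom} f,\ a_0 \le a_1 \le \ldots \le a_n\}$

$= \sup\{f(b) - f(a) : a, b \in D \cap \operatorname{dom} f,\ a \le b\} = \sup_{b\in D\cap\operatorname{dom} f} f(b) - \inf_{a\in D\cap\operatorname{dom} f} f(a)$.

If $f$ is non-increasing then

$\operatorname{Var}_D(f) = \sup\{\sum_{i=1}^n |f(a_i) - f(a_{i-1})| : a_0, a_1, \ldots, a_n \in D \cap \operatorname{dom} f,\ a_0 \le a_1 \le \ldots \le a_n\}$

$= \sup\{\sum_{i=1}^n f(a_{i-1}) - f(a_i) : a_0, a_1, \ldots, a_n \in D \cap \operatorname{dom} f,\ a_0 \le a_1 \le \ldots \le a_n\}$

$= \sup\{f(a) - f(b) : a, b \in D \cap \operatorname{dom} f,\ a \le b\} = \sup_{a\in D\cap\operatorname{dom} f} f(a) - \inf_{b\in D\cap\operatorname{dom} f} f(b)$.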
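The sums appearing in Definition 224A are easy to compute for a finite sample, and doing so gives some feeling for 224C. The following Python sketch is an illustration under stated assumptions rather than a computation of $\operatorname{Var}_D(f)$ itself: each value of the hypothetical helper `partition_sum` is only one of the sums over which the supremum is taken, hence a lower bound for the variation, and `grid` is an assumed way of choosing the points $a_0 \le \ldots \le a_n$.

```python
import math


def partition_sum(f, points):
    """Sum of |f(a_i) - f(a_{i-1})| along a non-decreasing list of points."""
    values = [f(a) for a in points]
    return sum(abs(v1 - v0) for v0, v1 in zip(values, values[1:]))


def grid(a, b, n):
    """n + 1 equally spaced points a = a_0 <= ... <= a_n = b."""
    return [a + (b - a) * i / n for i in range(n + 1)]


if __name__ == "__main__":
    # sin on [0, 2*pi]: the sums increase towards the total variation 4.
    for n in (3, 10, 1000):
        print(n, partition_sum(math.sin, grid(0.0, 2 * math.pi, n)))

    # 224Cf: for a monotonic function the sum telescopes to f(b) - f(a).
    pts = grid(-5.0, 5.0, 50)
    print(partition_sum(math.atan, pts), math.atan(5.0) - math.atan(-5.0))

    # 224Cc: splitting the sample at a point it contains adds the two pieces.
    pts = grid(0.0, 2 * math.pi, 1000)
    left, right = pts[:501], pts[500:]
    print(partition_sum(math.sin, left) + partition_sum(math.sin, right),
          partition_sum(math.sin, pts))
```

Refining the sample can only increase such a sum when points are added, which is why the definition takes a supremum rather than a limit along a particular sequence of partitions.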
224D Theorem For any real-valued function $f$ and any set $D \subseteq \mathbb{R}$, the following are equiveridical:

(i) there are two bounded non-decreasing functions $f_1, f_2 : \mathbb{R} \to \mathbb{R}$ such that $f = f_1 - f_2$ on $D \cap \operatorname{dom} f$;

(ii) $f$ is of bounded variation on $D$;

(iii) there are bounded non-decreasing functions $f_1, f_2 : \mathbb{R} \to \mathbb{R}$ such that $f = f_1 - f_2$ on $D \cap \operatorname{dom} f$ and $\operatorname{Var}_D(f) = \operatorname{Var} f_1 + \operatorname{Var} f_2$.

proof (i)⇒(ii) If $f : \mathbb{R} \to \mathbb{R}$ is bounded and non-decreasing, then $\operatorname{Var} f = \sup_{x\in\mathbb{R}} f(x) - \inf_{x\in\mathbb{R}} f(x)$ is finite. So if $f$ agrees on $D \cap \operatorname{dom} f$ with $f_1 - f_2$ where $f_1$ and $f_2$ are bounded and non-decreasing, then

$\operatorname{Var}_D(f) = \operatorname{Var}_{D\cap\operatorname{dom} f}(f) \le \operatorname{Var}_{D\cap\operatorname{dom} f}(f_1) + \operatorname{Var}_{D\cap\operatorname{dom} f}(f_2) \le \operatorname{Var} f_1 + \operatorname{Var} f_2 < \infty$,

using (a), (b) and (d) of 224C.

(ii)⇒(iii) Suppose that $f$ is of bounded variation on $D$. Set $D' = D \cap \operatorname{dom} f$. If $D' = \emptyset$ we can take both $f_j$ to be the zero function, so henceforth suppose that $D' \ne \emptyset$. Write

$g(x) = \operatorname{Var}_{D\cap]-\infty,x]}(f)$

for $x \in D'$. Then $g_1 = g + f$ and $g_2 = g - f$ are both non-decreasing. $\mathbf{P}$ If $a, b \in D'$ and $a \le b$, then

$g(b) \ge g(a) + \operatorname{Var}_{D\cap[a,b]}(f) \ge g(a) + |f(b) - f(a)|$.

So

$g_1(b) - g_1(a) = g(b) - g(a) + f(b) - f(a)$, $\quad g_2(b) - g_2(a) = g(b) - g(a) - f(b) + f(a)$

are both non-negative. $\mathbf{Q}$

Now there are non-decreasing functions $h_1, h_2 : \mathbb{R} \to \mathbb{R}$, extending $g_1$, $g_2$ respectively, such that $\operatorname{Var} h_j = \operatorname{Var} g_j$ for both $j$. $\mathbf{P}$ $f$ is bounded on $D$, by 224Ce, and $g$ is bounded just because $\operatorname{Var}_D(f) < \infty$, so that $g_j$ is bounded. Set $c_j = \inf_{x\in D'} g_j(x)$ and

$h_j(x) = \sup(\{c_j\} \cup \{g_j(y) : y \in D',\ y \le x\})$

for every $x \in \mathbb{R}$; this works. $\mathbf{Q}$ Observe that for $x \in D'$,

$h_1(x) + h_2(x) = g_1(x) + g_2(x) = g(x) + f(x) + g(x) - f(x) = 2g(x)$,
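\[ h_1(x) - h_2(x) = 2f(x). \]
Now, because $g_1$ and $g_2$ are nondecreasing,
\[ \sup_{x \in D_0} g_1(x) + \sup_{x \in D_0} g_2(x) = \sup_{x \in D_0} \bigl(g_1(x) + g_2(x)\bigr) = 2 \sup_{x \in D_0} g(x), \]
\[ \inf_{x \in D_0} g_1(x) + \inf_{x \in D_0} g_2(x) = \inf_{x \in D_0} \bigl(g_1(x) + g_2(x)\bigr) = 2 \inf_{x \in D_0} g(x) \ge 0. \]
But this means that
\[ \operatorname{Var} h_1 + \operatorname{Var} h_2 = \operatorname{Var} g_1 + \operatorname{Var} g_2 = 2 \operatorname{Var} g \le 2 \operatorname{Var}_D(f), \]
using 224Cf three times. So if we set $f_j(x) = \frac{1}{2} h_j(x)$ for j ∈ {1, 2}, x ∈ R, we shall have nondecreasing functions such that $f_1(x) - f_2(x) = f(x)$ for $x \in D_0$,
\[ \operatorname{Var} f_1 + \operatorname{Var} f_2 = \tfrac{1}{2} \operatorname{Var} h_1 + \tfrac{1}{2} \operatorname{Var} h_2 \le \operatorname{Var}_D(f). \]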
Since we surely also have VarD (f ) ≤ VarD (f1 ) + VarD (f2 ) ≤ Var f1 + Var f2 , we see that VarD (f ) = Var f1 + Var f2 , and (iii) is true. (iii)⇒(i) is trivial. 224E Corollary Let f be a realvalued function and D any subset of R. If f is of bounded variation on D, then limx↓a VarD∩]a,x] (f ) = limx↑a VarD∩[x,a[ (f ) = 0 for every a ∈ R, and lima→−∞ VarD∩]−∞,a] (f ) = lima→∞ VarD∩[a,∞[ (f ) = 0. proof (a) Consider first the case in which D = dom f = R and f is a bounded nondecreasing function. Then VarD∩]a,x] (f ) = supy∈]a,x] f (x) − f (y) = f (x) − inf y>a f (y) = f (x) − limy↓a f (y), so of course limx↓a VarD∩]a,x] (f ) = limx↓a f (x) − limy↓a f (y) = 0. In the same way limx↑a VarD∩[x,a[ (f ) = limy↑a f (y) − limx↑a f (x) = 0, lima→−∞ VarD∩]−∞,a] (f ) = lima→−∞ f (a) − limy→−∞ f (y) = 0, lima→∞ VarD∩[a,∞[ (f ) = limy→∞ f (y) − lima→∞ f (a) = 0. (b) For the general case, define f1 , f2 from f and D as in 224D. Then for every interval I we have VarD∩I (f ) ≤ VarI (f1 ) + VarI (f2 ), so the results for f follow from those for f1 and f2 as established in part (a) of the proof. 224F Corollary Let f be a realvalued function of bounded variation on [a, b], where a < b. If dom f meets every interval ]a, a + δ] with δ > 0, then limt∈dom f,t↓a f (t) is defined in R. If dom f meets [b − δ, b[ for every δ > 0, then limt∈dom f,t↑b f (t) is defined in R. proof Let f1 , f2 : R → R be nondecreasing functions such that f = f1 − f2 on [a, b] ∩ dom f . Then
limt∈dom f,t↓a f (t) = limt↓a f1 (t) − limt↓a f2 (t) = inf t>a f1 (t) − inf t>a f2 (t), limt∈dom f,t↑b f (t) = limt↑b f1 (t) − limt↑b f2 (t) = supt n Var f . Order these so that x0 < x1 < . . . < xk . Set δ = 12 min1≤i≤k xi − xi−1 > 0. For each i, there is a yi ∈ D ∩ [xi − δ, xi + δ] such that f (yi ) − f (xi ) ≥ n1 . Take x0i , yi0 to be xi , yi in order, so that x0i < yi0 . Now x00 ≤ y00 ≤ x01 ≤ y10 ≤ . . . ≤ x0k ≤ yk0 , and Var f ≥
\[ \sum_{i=0}^k |f(y_i') - f(x_i')| = \sum_{i=0}^k |f(y_i) - f(x_i)| \ge \frac{1}{n}(k+1) > \operatorname{Var} f, \]
which is impossible. X X Q Q It follows that $A = \bigcup_{n \in N} A_n$ is countable, being a countable union of finite sets. But A is exactly the set of points of D at which f is not continuous.

224I Theorem Let I ⊆ R be an interval, and f : I → R a function of bounded variation. Then f is differentiable almost everywhere in I, and f′ is integrable over I, with
\[ \int_I |f'| \le \operatorname{Var}_I(f). \]
proof (a) Let $f_1$ and $f_2$ be nondecreasing functions such that $f = f_1 - f_2$ everywhere on I (224D). Then $f_1$ and $f_2$ are differentiable almost everywhere (222A). At any point of I except possibly its endpoints, if any, f will be differentiable if $f_1$ and $f_2$ are, so f′(x) is defined for almost every x ∈ I.

(b) Set $F(x) = \operatorname{Var}_{I \cap {]-\infty,x]}} f$ for x ∈ R. If x, y ∈ I and x ≤ y, then
\[ F(y) - F(x) = \operatorname{Var}_{[x,y]} f \ge |f(y) - f(x)|, \]
by 224Cc; so $F'(x) \ge |f'(x)|$ whenever x is an interior point of I and both derivatives exist, which is almost everywhere. So $\int_I |f'| \le \int_I F'$. But if a, b ∈ I and a ≤ b,
\[ \int_a^b F' \le F(b) - F(a) \le F(b) \le \operatorname{Var} f. \]
Now I is expressible as $\bigcup_{n \in N} [a_n, b_n]$ where $a_{n+1} \le a_n \le b_n \le b_{n+1}$ for every n. So
\[
\begin{aligned}
\int_I |f'| \le \int_I F' &= \int F' \times \chi I = \int \sup_{n \in N} F' \times \chi[a_n, b_n] \\
&= \sup_{n \in N} \int F' \times \chi[a_n, b_n] \quad \text{(by B.Levi's theorem)} \\
&= \sup_{n \in N} \int_{a_n}^{b_n} F' \le \operatorname{Var}_I(f).
\end{aligned}
\]
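The following worked example is not part of the original text; it is a minimal sketch, using a step function chosen purely for illustration, of how the inequality of 224I can be strict.

```latex
% Illustrative example (an assumption, not from the source):
% a step function on the interval I = [-1, 1].
\[
  f(x) =
  \begin{cases}
    0 & \text{if } -1 \le x < 0, \\
    1 & \text{if } 0 \le x \le 1.
  \end{cases}
\]
% f is nondecreasing, so its variation is the difference of its
% supremum and infimum (the formula for nondecreasing functions
% used in the proof of 224D).
\[
  \operatorname{Var}_I(f) = \sup_{x \in I} f(x) - \inf_{x \in I} f(x) = 1 .
\]
% f is constant on [-1,0[ and on ]0,1], so f'(x) = 0 for every
% x in I other than 0; thus f' = 0 almost everywhere.
\[
  \int_I |f'| = 0 < 1 = \operatorname{Var}_I(f).
\]
```

So the theorem bounds the integral of |f′| by the variation, but a jump contributes to the variation without contributing to the integral.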
224J The next result is not needed in this chapter, but is one of the most useful properties of functions of bounded variation, and will be used repeatedly in Chapter 28.

Proposition Let f, g be real-valued functions defined on subsets of R, and suppose that g is integrable over an interval [a, b], where a < b, and f is of bounded variation on ]a, b[ and defined almost everywhere in ]a, b[. Then f × g is integrable over [a, b], and
\[ \Bigl|\int_a^b f \times g\Bigr| \le \Bigl(\lim_{x \in \operatorname{dom} f,\, x \uparrow b} |f(x)| + \operatorname{Var}_{]a,b[}(f)\Bigr) \sup_{c \in [a,b]} \Bigl|\int_a^c g\Bigr|. \]
proof (a) By 224F, $l = \lim_{x \in \operatorname{dom} f,\, x \uparrow b} f(x)$ is defined. Write $M = |l| + \operatorname{Var}_{]a,b[}(f)$. Note that if y is any point of dom f ∩ ]a, b[,
\[ |f(y)| \le |f(x)| + |f(x) - f(y)| \le |f(x)| + \operatorname{Var}_{]a,b[}(f) \to M \]
as x ↑ b in dom f, so |f(y)| ≤ M. Moreover, f is measurable on ]a, b[, because there are bounded monotonic functions $f_1$, $f_2$ : R → R such that $f = f_1 - f_2$ everywhere on ]a, b[ ∩ dom f. So f × g is measurable and dominated by M|g|, and is integrable over ]a, b[ or [a, b].

(b) For n ∈ N, $k \le 2^n$ set $a_{nk} = a + 2^{-n}k(b - a)$, and for $1 \le k \le 2^n$ choose $x_{nk} \in \operatorname{dom} f \cap {]a_{n,k-1}, a_{nk}]}$. Define $f_n$ : ]a, b] → R by setting $f_n(x) = f(x_{nk})$ if $1 \le k \le 2^n$, $x \in {]a_{n,k-1}, a_{nk}]}$. Then $f(x) = \lim_{n \to \infty} f_n(x)$ whenever x ∈ ]a, b[ ∩ dom f and f is continuous at x, which must be almost everywhere (224H). Note next that all the $f_n$ are measurable, and that they are uniformly bounded, in modulus, by M. So $\{f_n \times g : n \in N\}$ is dominated by the integrable function M|g|, and Lebesgue's Dominated Convergence Theorem tells us that
\[ \int_a^b f \times g = \lim_{n \to \infty} \int_a^b f_n \times g. \]

(c) Fix n ∈ N for the moment. Set $K = \sup_{c \in [a,b]} |\int_a^c g|$. (Note that K is finite because $c \mapsto \int_a^c g$ is continuous.) Then
\[
\begin{aligned}
\Bigl|\int_a^b f_n \times g\Bigr| &= \Bigl|\sum_{k=1}^{2^n} \int_{a_{n,k-1}}^{a_{nk}} f_n \times g\Bigr| \\
&= \Bigl|\sum_{k=1}^{2^n} f(x_{nk}) \Bigl(\int_a^{a_{nk}} g - \int_a^{a_{n,k-1}} g\Bigr)\Bigr| \\
&= \Bigl|\sum_{k=1}^{2^n - 1} \bigl(f(x_{nk}) - f(x_{n,k+1})\bigr) \int_a^{a_{nk}} g + f(x_{n,2^n}) \int_a^b g\Bigr| \\
&\le |f(x_{n,2^n})| \Bigl|\int_a^b g\Bigr| + \sum_{k=1}^{2^n - 1} |f(x_{n,k+1}) - f(x_{nk})| \Bigl|\int_a^{a_{nk}} g\Bigr| \\
&\le \bigl(|f(x_{n,2^n})| + \operatorname{Var}_{]a,b[}(f)\bigr) K \to MK
\end{aligned}
\]
as n → ∞.

(d) Now
\[ \Bigl|\int_a^b f \times g\Bigr| = \lim_{n \to \infty} \Bigl|\int_a^b f_n \times g\Bigr| \le MK, \]
as required.

224K Complex-valued functions So far I have taken all functions to be real-valued. This is adequate for the needs of the present chapter, but in Chapter 28 we shall need to look at complex-valued functions of bounded variation, and I should perhaps spell out the (elementary) adaptations involved in the extension to the complex case.

(a) Let D be a subset of R and f a complex-valued function. The variation of f on D, $\operatorname{Var}_D(f)$, is zero if D ∩ dom f = ∅, and otherwise is
\[ \sup\Bigl\{\sum_{j=1}^n |f(a_j) - f(a_{j-1})| : a_0 \le a_1 \le \dots \le a_n \text{ in } D \cap \operatorname{dom} f\Bigr\}, \]
allowing ∞. If $\operatorname{Var}_D(f)$ is finite, we say that f is of bounded variation on D.

(b) Just as in the real case, a complex-valued function of bounded variation must be bounded, and
\[ \operatorname{Var}_D(f + g) \le \operatorname{Var}_D(f) + \operatorname{Var}_D(g), \qquad \operatorname{Var}_D(cf) = |c| \operatorname{Var}_D(f), \]
\[ \operatorname{Var}_D(f) \ge \operatorname{Var}_{D \cap {]-\infty,x]}}(f) + \operatorname{Var}_{D \cap [x,\infty[}(f) \]
for every x ∈ R, with equality if x ∈ D ∩ dom f,
\[ \operatorname{Var}_D(f) \le \operatorname{Var}_{D'}(f) \text{ whenever } D \subseteq D'; \]
the arguments of 224C go through unchanged.

(c) A complex-valued function is of bounded variation iff its real and imaginary parts are both of bounded variation (because
\[ \max(\operatorname{Var}_D(\operatorname{Re} f), \operatorname{Var}_D(\operatorname{Im} f)) \le \operatorname{Var}_D(f) \le \operatorname{Var}_D(\operatorname{Re} f) + \operatorname{Var}_D(\operatorname{Im} f).) \]
So a complex-valued function f is of bounded variation on D iff there are bounded nondecreasing functions $f_1, \dots, f_4$ : R → R such that $f = f_1 - f_2 + i f_3 - i f_4$ on D (224D).

(d) Let f be a complex-valued function and D any subset of R. If f is of bounded variation on D, then
\[ \lim_{x \downarrow a} \operatorname{Var}_{D \cap {]a,x]}}(f) = \lim_{x \uparrow a} \operatorname{Var}_{D \cap [x,a[}(f) = 0 \]
for every a ∈ R, and
\[ \lim_{a \to -\infty} \operatorname{Var}_{D \cap {]-\infty,a]}}(f) = \lim_{a \to \infty} \operatorname{Var}_{D \cap [a,\infty[}(f) = 0. \]
(Apply 224E to the real and imaginary parts of f.)

(e) Let f be a complex-valued function of bounded variation on [a, b], where a < b. If dom f meets every interval ]a, a + δ] with δ > 0, then $\lim_{t \in \operatorname{dom} f,\, t \downarrow a} f(t)$ is defined in C. If dom f meets [b − δ, b[ for every δ > 0, then $\lim_{t \in \operatorname{dom} f,\, t \uparrow b} f(t)$ is defined in C. (Apply 224F to the real and imaginary parts of f.)

(f) Let f, g be complex functions and D a subset of R. If f and g are of bounded variation on D, so is f × g. (For f × g is expressible as a linear combination of the four products Re f × Re g, . . . , Im f × Im g, to each of which we can apply 224G.)
(g) Let I ⊆ R be an interval, and f : I → C a function of bounded variation. Then f is differentiable almost everywhere on I, and
\[ \int_I |f'| \le \operatorname{Var}_I(f). \]
(As 224I.)

(h) Let f, g be complex-valued functions defined on subsets of R, and suppose that g is integrable over an interval [a, b], where a < b, and f is of bounded variation on ]a, b[ and defined almost everywhere in ]a, b[. Then f × g is integrable over [a, b], and
\[ \Bigl|\int_a^b f \times g\Bigr| \le \Bigl(\lim_{x \in \operatorname{dom} f,\, x \uparrow b} |f(x)| + \operatorname{Var}_{]a,b[}(f)\Bigr) \sup_{c \in [a,b]} \Bigl|\int_a^c g\Bigr|. \]
(The argument of 224J applies virtually unchanged.)

224X Basic exercises >(a) Set $f(x) = x^2 \sin \frac{1}{x^2}$ for x ≠ 0, f(0) = 0. Show that f : R → R is differentiable everywhere and uniformly continuous, but is not of bounded variation on any nontrivial interval containing 0.

(b) Give an example of a nonnegative function g : [0, 1] → [0, 1], of bounded variation, such that $\sqrt{g}$ is not of bounded variation.
(c) Show that if f is any real-valued function defined on a subset of R, there is a function $\tilde f$ : R → R, extending f, such that $\operatorname{Var} \tilde f = \operatorname{Var} f$. Under what circumstances is $\tilde f$ unique?

(d) Let f : D → R be a function of bounded variation, where D ⊆ R is a nonempty set. Show that if $\inf_{x \in D} |f(x)| > 0$ then 1/f is of bounded variation.

(e) Let f : [a, b] → R be a continuous function, where a ≤ b in R. Show that if c < Var f then there is a δ > 0 such that $\sum_{i=1}^n |f(a_i) - f(a_{i-1})| \ge c$ whenever $a = a_0 \le a_1 \le \dots \le a_n = b$ and $\max_{1 \le i \le n} (a_i - a_{i-1}) \le \delta$.

(f) Let $\langle f_n \rangle_{n \in N}$ be a sequence of real functions, and set $f(x) = \lim_{n \to \infty} f_n(x)$ whenever the limit is defined. Show that $\operatorname{Var} f \le \liminf_{n \to \infty} \operatorname{Var} f_n$.

(g) Let f be a real-valued function which is integrable over an interval [a, b] ⊆ R. Set $F(x) = \int_a^x f$ for x ∈ [a, b]. Show that $\operatorname{Var}_{[a,b]}(F) = \int_a^b |f|$. (Hint: start by checking that $\operatorname{Var} F \le \int |f|$; for the reverse inequality, consider the case f ≥ 0 first.)

(h) Show that if f is a real-valued function defined on a set D ⊆ R, then
\[ \operatorname{Var}_D(f) = \sup\Bigl\{\Bigl|\sum_{i=1}^n (-1)^i \bigl(f(a_i) - f(a_{i-1})\bigr)\Bigr| : a_0 \le a_1 \le \dots \le a_n \text{ in } D\Bigr\}. \]

(i) Let f be a real-valued function which is integrable over a bounded interval [a, b] ⊆ R. Show that
\[ \int_a^b |f| = \sup\Bigl\{\Bigl|\sum_{i=1}^n (-1)^i \int_{a_{i-1}}^{a_i} f\Bigr| : a = a_0 \le a_1 \le a_2 \le \dots \le a_n = b\Bigr\}. \]
(Hint: put 224Xg and 224Xh together.)

(j) Let f and g be real-valued functions defined on subsets of R, and suppose that g is integrable over an interval [a, b], where a < b, and f is of bounded variation on ]a, b[ and defined almost everywhere on ]a, b[. Show that
\[ \Bigl|\int_a^b f \times g\Bigr| \le \Bigl(\lim_{x \in \operatorname{dom} f,\, x \downarrow a} |f(x)| + \operatorname{Var}_{]a,b[}(f)\Bigr) \sup_{c \in [a,b]} \Bigl|\int_c^b g\Bigr|. \]
224Y Further exercises (a) Show that if f is any complexvalued function defined on a subset of R, there is a function f˜ : R → C, extending f , such that Var f˜ = Var f . Under what circumstances is f˜ unique? (b) Let D be any nonempty subset of R, and let V be the space of functions f : D → R of bounded variation. For f ∈ V set Pn kf k = sup{f (t0 ) + i=1 f (ti ) − f (ti−1 ) : t0 ≤ . . . ≤ tn ∈ D}. Show that (i) k k is a norm on V (ii) V is complete under k k (iii) kf × gk ≤ kf kkgk for all f , g ∈ V, so that V is a Banach algebra. (c) Let f : R → R be a function of bounded variation. Show that there is a Rsequence hfn in∈N of differentiable functions such that limn→∞ fn (x) = f (x) for every x ∈ R, limn→∞ fn − f  = 0, and Var(fn ) ≤ Var(f ) for every n ∈ N. (Hint: start with nondecreasing f .) (d) For any partially ordered set X and any function f : X → R, say that VarX (f ) = 0 if X = ∅ and otherwise Pn VarX (f ) = sup{ i=1 f (ai ) − f (ai−1 ) : a0 , a1 , . . . , an ∈ X, a0 ≤ a1 ≤ . . . ≤ an }. State and prove results in this framework generalizing 224D and 224Yb. (Hints: f will be ‘nondecreasing’ if f (x) ≤ f (y) whenever x ≤ y; interpret ]−∞, x] as {y : y ≤ x}.) (e) Let (X, ρ) be a metric space and f : [a, b] → X a function, where a ≤ b in R. Set Var[a,b] (f ) = Pn sup{ i=1 ρ(f (ai ), f (ai−1 )) : a ≤ a0 ≤ . . . ≤ an ≤ b}. (i) Show that Var[a,b] (f ) = Var[a,c] (f ) + Var[c,b] (f ) for every c ∈ [a, b]. (ii) Show that if Var[a,b] (f ) is finite then f is continuous at all but countably many points of [a, b]. (iii) Show that if X is complete and Var[a,b] (f ) < ∞ then limt↑x f (t) is defined for every x ∈ ]a, b]. (f ) Let U be a normed space and a ≤ b in R. For functions f : [a, b] → U define Var[a,b] (f ) as in 224Ye, using the standard metric ρ(x, y) = kx − yk for x, y ∈ U . (i) Show that Var[a,b] (f + g) ≤ Var[a,b] (f ) + Var[a,b] (g), Var[a,b] (cf ) = c Var[a,b] (f ) for all f , g : [a, b] → U and all c ∈ R. 
(ii) Show that if V is another normed space and T : U → V is a bounded linear operator then $\operatorname{Var}_{[a,b]}(Tf) \le \|T\| \operatorname{Var}_{[a,b]}(f)$ for every f : [a, b] → U.

(g) Let f : [0, 1] → R be a continuous function. For y ∈ R set $h(y) = \#(f^{-1}[\{y\}])$ if this is finite, ∞ otherwise. Show that (if we allow ∞ as a value of the integral) $\operatorname{Var}_{[0,1]}(f) = \int h$. (Hint: for n ∈ N, $i < 2^n$ set
\[ c_{ni} = \sup\{f(x) - f(y) : x, y \in [2^{-n}i, 2^{-n}(i+1)]\}, \]
\[ h_{ni}(y) = 1 \text{ if } y \in f[\,[2^{-n}i, 2^{-n}(i+1)[\,], \quad 0 \text{ otherwise.} \]
Show that $c_{ni} = \int h_{ni}$, $\lim_{n \to \infty} \sum_{i=0}^{2^n - 1} c_{ni} = \operatorname{Var} f$, $\lim_{n \to \infty} \sum_{i=0}^{2^n - 1} h_{ni} = h$.) (See also 226Yb.)
(h) Let ν be any LebesgueStieltjes measure on R, I ⊆ R an interval (which may be either open or closed, bounded or unbounded), and D ⊆ I a nonempty set. Let V be the space of functions of bounded variation R from D to R, and k k the norm of 224Yb on V. Let g : D → R be a function such that [a,b]∩D g dν exists R R whenever a ≤ b in I, and K = supa,b∈I,a≤b  [a,b]∩D g dν < ∞. Show that  D f × g dν ≤ Kkf k for every f ∈ V. (i) Explain how to apply 224Yh with D = N to obtain Abel’s theorem that the product of a monotonic sequence converging to 0 with a series which has bounded partial sums is summable. 224 Notes and comments I have taken the ideas above rather farther than we need immediately; for the present chapter, it is enough to consider the case in which D = dom f = [a, b] for some interval [a, b] ⊆ R. The extension to functions with irregular domains will be useful in Chapter 28, and the extension to irregular sets D, while not important to us here, is of some interest – for instance, taking D = N, we obtain the notion of ‘sequence of bounded variation’, which is surely relevant to problems of convergence and summability. The central result of the section is of course the fact that a function of bounded variation can be expressed as the difference of monotonic functions (224D); indeed, one of the objects of the concept is to characterize the linear span of the monotonic functions. Nearly everything else here can be derived as an easy consequence of this, as in 224E224G. In 224I and 224Xg we go a little deeper, and indeed some measure theory appears;
this is where the ideas here begin to connect with the real business of this chapter, to be continued in the next section. Another result which is easy enough in itself, but contains the germs of important ideas, is 224Yg. In 224Yb I mention a natural development in functional analysis, and in 224Yd and 224Ye-224Yf I suggest further wide-ranging generalizations.
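As a supplement to these notes (not part of the original text), here is a minimal worked sketch of the decomposition constructed in the proof of 224D. The choice D = dom f = [−1, 1], f(x) = |x| is an assumption made only for illustration; the symbols g, g_1, g_2, f_1, f_2 follow that proof.

```latex
% Illustrative example (an assumption, not from the source):
% D = dom f = [-1,1], f(x) = |x|.
% Step 1: the running variation g(x) = Var_{D \cap ]-\infty,x]}(f).
% f falls from 1 to |x| on [-1,x] for x <= 0, then rises on [0,x].
\[
  g(x) = 1 + x \quad \text{for } -1 \le x \le 1 .
\]
% Step 2: the two nondecreasing functions g_1 = g + f, g_2 = g - f.
\[
  g_1(x) = 1 + x + |x|, \qquad g_2(x) = 1 + x - |x| .
\]
% Step 3: halve them to get f_1, f_2 on D (extend by constants
% outside [-1,1] to obtain functions on the whole real line).
\[
  f_1(x) = \tfrac12 (1 + x + |x|), \qquad
  f_2(x) = \tfrac12 (1 + x - |x|), \qquad
  f_1(x) - f_2(x) = |x| = f(x).
\]
% Step 4: check the variations; both f_j are nondecreasing.
\[
  \operatorname{Var} f_1 = f_1(1) - f_1(-1) = \tfrac32 - \tfrac12 = 1, \qquad
  \operatorname{Var} f_2 = f_2(1) - f_2(-1) = \tfrac12 + \tfrac12 = 1,
\]
\[
  \operatorname{Var}_D(f) = 2 = \operatorname{Var} f_1 + \operatorname{Var} f_2 .
\]
```

Here f_1 carries the increase of f on [0, 1] and f_2 the decrease on [−1, 0], which is why the variations add up exactly as in 224D(iii).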
225 Absolutely continuous functions
We are now ready for a full characterization of the functions that can appear as indefinite integrals (225E, 225Xh). The essential idea is that of ‘absolute continuity’ (225B). In the second half of the section (225G-225N) I describe some of the relationships between this concept and those we have already seen.

225A Absolute continuity of the indefinite integral I begin with an easy fundamental result from general measure theory.

Theorem Let (X, Σ, µ) be any measure space and f an integrable real-valued function defined on a conegligible subset of X. Then for any ε > 0 there are a measurable set E of finite measure and a real number δ > 0 such that $\int_F |f| \le \varepsilon$ whenever F ∈ Σ and µ(F ∩ E) ≤ δ.

proof There is a nondecreasing sequence $\langle g_n \rangle_{n \in N}$ of nonnegative simple functions such that $|f| =_{\text{a.e.}} \lim_{n \to \infty} g_n$ and $\int |f| = \lim_{n \to \infty} \int g_n$. Take n ∈ N such that $\int g_n \ge \int |f| - \frac{1}{2}\varepsilon$. Let M > 0, E ∈ Σ be such that µE < ∞ and $g_n \le M\chi E$; set δ = ε/2M. If F ∈ Σ and µ(F ∩ E) ≤ δ, then
\[ \int_F g_n = \int g_n \times \chi F \le M\mu(F \cap E) \le \tfrac{1}{2}\varepsilon; \]
consequently
\[ \int_F |f| = \int_F g_n + \int_F (|f| - g_n) \le \tfrac{1}{2}\varepsilon + \int (|f| - g_n) \le \varepsilon. \]
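To see the theorem at work on an unbounded integrand, here is a small worked sketch that is not in the original text. The function and the resulting choice δ = ε²/4 are assumptions made for illustration; the first inequality uses the fact that a decreasing function has its largest integral, among sets of a given measure, over an initial interval.

```latex
% Illustrative example (an assumption, not from the source):
% X = ]0,1] with Lebesgue measure \mu, f(x) = x^{-1/2}.
% f is integrable although unbounded:
\[
  \int_0^1 x^{-1/2}\,dx = 2 .
\]
% Because f is decreasing, for measurable F with \mu F = m the
% integral over F is at most the integral over ]0,m]:
\[
  \int_F f \le \int_0^{m} x^{-1/2}\,dx = 2\sqrt{m} .
\]
% So, taking E = ]0,1] and \delta = \varepsilon^2/4,
\[
  \mu F \le \delta \implies \int_F |f| \le 2\sqrt{\delta} = \varepsilon .
\]
```

In this example δ has to shrink like ε² rather than like ε, which is the kind of behaviour the simple-function approximation in the proof is designed to absorb.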
225B Absolutely continuous functions on R: Definition If [a, b] is a nonempty closed interval in R and f : [a, b] → R is a function, we say that f is absolutely continuous if for every ε > 0 there is a δ > 0 such that $\sum_{i=1}^n |f(b_i) - f(a_i)| \le \varepsilon$ whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \dots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta$.

Remark The phrase ‘absolutely continuous’ is used in various senses in measure theory, closely related (if you look at them in the right way) but not identical; you will need to keep the context of each definition in clear focus.

225C Proposition Let [a, b] be a nonempty closed interval in R.
(a) If f : [a, b] → R is absolutely continuous, it is uniformly continuous.
(b) If f : [a, b] → R is absolutely continuous it is of bounded variation on [a, b], so is differentiable almost everywhere on [a, b], and its derivative is integrable over [a, b].
(c) If f, g : [a, b] → R are absolutely continuous, so are f + g and cf, for every c ∈ R.
(d) If f, g : [a, b] → R are absolutely continuous so is f × g.
(e) If g : [a, b] → [c, d] and f : [c, d] → R are absolutely continuous, and g is nondecreasing, then the composition fg : [a, b] → R is absolutely continuous.

proof (a) Let ε > 0. Then there is a δ > 0 such that $\sum_{i=1}^n |f(b_i) - f(a_i)| \le \varepsilon$ whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \dots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta$; but of course now |f(y) − f(x)| ≤ ε whenever x, y ∈ [a, b] and |x − y| ≤ δ. As ε is arbitrary, f is uniformly continuous.

(b) Let δ > 0 be such that $\sum_{i=1}^n |f(b_i) - f(a_i)| \le 1$ whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \dots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta$. If $a \le c = c_0 \le c_1 \le \dots \le c_n \le d \le \min(b, c + \delta)$, then $\sum_{i=1}^n |f(c_i) - f(c_{i-1})| \le 1$, so
$\operatorname{Var}_{[c,d]}(f) \le 1$; accordingly (inducing on k, using 224Cc for the inductive step) $\operatorname{Var}_{[a,\min(a+k\delta,b)]}(f) \le k$ for every k, and $\operatorname{Var}_{[a,b]}(f) \le \lceil (b - a)/\delta \rceil < \infty$. It follows that f′ is integrable, by 224I.

(c)(i) Let ε > 0. Then there are $\delta_1$, $\delta_2$ > 0 such that
\[ \sum_{i=1}^n |f(b_i) - f(a_i)| \le \tfrac{1}{2}\varepsilon \]
whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \dots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta_1$,
\[ \sum_{i=1}^n |g(b_i) - g(a_i)| \le \tfrac{1}{2}\varepsilon \]
whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \dots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta_2$. Set $\delta = \min(\delta_1, \delta_2) > 0$, and suppose that $a \le a_1 \le b_1 \le a_2 \le b_2 \le \dots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta$. Then
\[ \sum_{i=1}^n |(f+g)(b_i) - (f+g)(a_i)| \le \sum_{i=1}^n |f(b_i) - f(a_i)| + \sum_{i=1}^n |g(b_i) - g(a_i)| \le \varepsilon. \]
As ε is arbitrary, f + g is absolutely continuous.

(ii) Let ε > 0. Then there is a δ > 0 such that
\[ \sum_{i=1}^n |f(b_i) - f(a_i)| \le \frac{\varepsilon}{1 + |c|} \]
whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \dots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta$. Now
\[ \sum_{i=1}^n |(cf)(b_i) - (cf)(a_i)| \le \varepsilon \]
whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \dots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta$. As ε is arbitrary, cf is absolutely continuous.

(d) By either (a) or (b), f and g are bounded; set $M = \sup_{x \in [a,b]} |f(x)|$, $M' = \sup_{x \in [a,b]} |g(x)|$. Let ε > 0. Then there are $\delta_1$, $\delta_2$ > 0 such that
$\sum_{i=1}^n |f(b_i) - f(a_i)| \le \varepsilon$ whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \dots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta_1$,
$\sum_{i=1}^n |g(b_i) - g(a_i)| \le \varepsilon$ whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \dots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta_2$.
Set $\delta = \min(\delta_1, \delta_2) > 0$ and suppose that $a \le a_1 \le b_1 \le \dots \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta$. Then
\[
\begin{aligned}
\sum_{i=1}^n |f(b_i)g(b_i) - f(a_i)g(a_i)| &= \sum_{i=1}^n \bigl|(f(b_i) - f(a_i))g(b_i) + f(a_i)(g(b_i) - g(a_i))\bigr| \\
&\le \sum_{i=1}^n \bigl(|f(b_i) - f(a_i)||g(b_i)| + |f(a_i)||g(b_i) - g(a_i)|\bigr) \\
&\le \sum_{i=1}^n \bigl(|f(b_i) - f(a_i)| M' + M |g(b_i) - g(a_i)|\bigr) \\
&\le \varepsilon M' + M \varepsilon = \varepsilon(M + M').
\end{aligned}
\]
As ε is arbitrary, f × g is absolutely continuous.
(e) Let ε > 0. Then there is a δ > 0 such that $\sum_{i=1}^n |f(d_i) - f(c_i)| \le \varepsilon$ whenever $c \le c_1 \le d_1 \le \dots \le c_n \le d_n \le d$ and $\sum_{i=1}^n (d_i - c_i) \le \delta$; and there is an η > 0 such that $\sum_{i=1}^n |g(b_i) - g(a_i)| \le \delta$ whenever $a \le a_1 \le b_1 \le \dots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \eta$. Now suppose that $a \le a_1 \le b_1 \le \dots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \eta$. Because g is nondecreasing, we have $c \le g(a_1) \le \dots \le g(b_n) \le d$ and $\sum_{i=1}^n (g(b_i) - g(a_i)) \le \delta$, so $\sum_{i=1}^n |f(g(b_i)) - f(g(a_i))| \le \varepsilon$; as ε is arbitrary, fg is absolutely continuous.

225D Lemma Let [a, b] be a nonempty closed interval in R and f : [a, b] → R an absolutely continuous function which has zero derivative almost everywhere on [a, b]. Then f is constant on [a, b].
proof Let x ∈ [a, b], ε > 0. Let δ > 0 be such that $\sum_{i=1}^n |f(b_i) - f(a_i)| \le \varepsilon$ whenever $a \le a_1 \le b_1 \le a_2 \le b_2 \le \dots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta$. Set A = {t : a < t < x, f′(t) exists = 0}; then µA = x − a, writing µ for Lebesgue measure as usual. Let $\mathcal{I}$ be the set of nonempty nonsingleton closed intervals [c, d] ⊆ [a, x] such that |f(d) − f(c)| ≤ ε(d − c); then every member of A belongs to arbitrarily short members of $\mathcal{I}$. By Vitali's theorem (221A), there is a countable disjoint family $\mathcal{I}_0 \subseteq \mathcal{I}$ such that $\mu(A \setminus \bigcup \mathcal{I}_0) = 0$, that is,
\[ x - a = \mu\Bigl(\bigcup \mathcal{I}_0\Bigr) = \sum_{I \in \mathcal{I}_0} \mu I. \]
Now there is a finite $\mathcal{I}_1 \subseteq \mathcal{I}_0$ such that
\[ \mu\Bigl(\bigcup \mathcal{I}_1\Bigr) = \sum_{I \in \mathcal{I}_1} \mu I \ge x - a - \delta. \]
If $\mathcal{I}_1 = \emptyset$, then x ≤ a + δ and |f(x) − f(a)| ≤ ε. Otherwise, express $\mathcal{I}_1$ as $\{[c_0, d_0], \dots, [c_n, d_n]\}$, where $a \le c_0 < d_0 < c_1 < d_1 < \dots < c_n < d_n \le x$. Then
\[ (c_0 - a) + \sum_{i=1}^n (c_i - d_{i-1}) + (x - d_n) = \mu\Bigl([a, x] \setminus \bigcup \mathcal{I}_1\Bigr) \le \delta, \]
so
\[ |f(c_0) - f(a)| + \sum_{i=1}^n |f(c_i) - f(d_{i-1})| + |f(x) - f(d_n)| \le \varepsilon. \]
On the other hand, $|f(d_i) - f(c_i)| \le \varepsilon(d_i - c_i)$ for each i, so
\[ \sum_{i=0}^n |f(d_i) - f(c_i)| \le \varepsilon \sum_{i=0}^n (d_i - c_i) \le \varepsilon(x - a). \]
Putting these together,
\[
\begin{aligned}
|f(x) - f(a)| &\le |f(c_0) - f(a)| + |f(d_0) - f(c_0)| + |f(c_1) - f(d_0)| + \dots + |f(d_n) - f(c_n)| + |f(x) - f(d_n)| \\
&= |f(c_0) - f(a)| + \sum_{i=1}^n |f(c_i) - f(d_{i-1})| + |f(x) - f(d_n)| + \sum_{i=0}^n |f(d_i) - f(c_i)| \\
&\le \varepsilon + \varepsilon(x - a) = \varepsilon(1 + x - a).
\end{aligned}
\]
As ε is arbitrary, f(x) = f(a). As x is arbitrary, f is constant.

225E Theorem Let [a, b] be a nonempty closed interval in R and F : [a, b] → R a function. Then the following are equiveridical:
(i) there is an integrable real-valued function f such that $F(x) = F(a) + \int_a^x f$ for every x ∈ [a, b];
(ii) $\int_a^x F'$ exists and is equal to F(x) − F(a) for every x ∈ [a, b];
(iii) F is absolutely continuous.

proof (i)⇒(iii) Assume (i). Let ε > 0. By 225A, there is a δ > 0 such that $\int_H |f| \le \varepsilon$ whenever H ⊆ [a, b] and µH ≤ δ, writing µ for Lebesgue measure as usual. Now suppose that $a \le a_1 \le b_1 \le a_2 \le b_2 \le \dots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta$. Consider $H = \bigcup_{1 \le i \le n} [a_i, b_i[$. Then µH ≤ δ and
\[ \sum_{i=1}^n |F(b_i) - F(a_i)| = \sum_{i=1}^n \Bigl|\int_{[a_i,b_i[} f\Bigr| \le \sum_{i=1}^n \int_{[a_i,b_i[} |f| = \int_H |f| \le \varepsilon. \]
As ε is arbitrary, F is absolutely continuous.

(iii)⇒(ii) If F is absolutely continuous, then it is of bounded variation (by 225Cb), so $\int_a^b F'$ exists (224I). Set $G(x) = \int_a^x F'$ for x ∈ [a, b]; then G′ =a.e. F′ (222E) and G is absolutely continuous (by (i)⇒(iii) just proved). Accordingly G − F is absolutely continuous (225Cc) and is differentiable, with zero derivative, almost everywhere. It follows that G − F must be constant (225D). But as G(a) = 0, G = F − F(a); just as required by (ii).

(ii)⇒(i) is trivial.
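Two short worked illustrations of 225E follow; they are not part of the original text. The square-root example is computed directly; the Cantor function is not discussed in this part of the text and its properties are quoted here as standard facts, so that half should be read as an assumption-laden sketch.

```latex
% Illustrative example 1 (an assumption, not from the source):
% F(x) = \sqrt{x} on [0,1] satisfies condition (i) of 225E.
\[
  f(t) = \frac{1}{2\sqrt{t}} \ \ (0 < t \le 1), \qquad
  \int_0^1 f = 1 < \infty, \qquad
  F(x) = F(0) + \int_0^x f = \sqrt{x} .
\]
% Hence F is absolutely continuous by (i) => (iii), even though
% F' is unbounded near 0.

% Illustrative example 2 (standard facts, quoted without proof):
% the Cantor function C : [0,1] -> [0,1] is continuous and
% nondecreasing with C(0) = 0, C(1) = 1 and C' = 0 a.e.
\[
  \int_0^1 C' = 0 \ne 1 = C(1) - C(0),
\]
% so condition (ii) fails and C is not absolutely continuous,
% although it is continuous and of bounded variation.
```

The contrast shows that continuity together with bounded variation is not enough for (ii); absolute continuity is exactly the extra condition needed.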
225F Integration by parts As an application of this result, I give a justification of a familiar formula.

Theorem Let f be a real-valued function which is integrable over an interval [a, b] ⊆ R, and g : [a, b] → R an absolutely continuous function. Suppose that F is an indefinite integral of f, so that $F(x) - F(a) = \int_a^x f$ for x ∈ [a, b]. Then
\[ \int_a^b f \times g = F(b)g(b) - F(a)g(a) - \int_a^b F \times g'. \]

proof Set h = F × g. Because F is absolutely continuous (225E), so is h (225Cd). Consequently $h(b) - h(a) = \int_a^b h'$, by (iii)⇒(ii) of 225E. But h′ = F′ × g + F × g′ wherever F′ and g′ are defined, which is almost everywhere, and F′ =a.e. f, by 222E; so h′ =a.e. f × g + F × g′. Finally, g and F are continuous, therefore measurable, and bounded, while f and g′ are integrable (using 225E yet again), so f × g and F × g′ are integrable, and
\[ F(b)g(b) - F(a)g(a) = h(b) - h(a) = \int_a^b h' = \int_a^b f \times g + \int_a^b F \times g', \]
as required.

225G I come now to a group of results at a rather deeper level than most of the work of this chapter, being closer to the ideas of Chapter 26.

Proposition Let [a, b] be a nonempty closed interval in R and f : [a, b] → R an absolutely continuous function. Then f[A] is negligible for every negligible set A ⊆ R.

proof Let ε > 0. Then there is a δ > 0 such that $\sum_{i=1}^n |f(b_i) - f(a_i)| \le \varepsilon$ whenever $a \le a_1 \le b_1 \le \dots \le a_n \le b_n \le b$ and $\sum_{i=1}^n (b_i - a_i) \le \delta$. Now there is a sequence $\langle I_k \rangle_{k \in N}$ of closed intervals, covering A, with $\sum_{k=0}^\infty \mu I_k < \delta$. For each m ∈ N, let $F_m$ be $[a, b] \cap \bigcup_{k \le m} I_k$. Then $\mu f[F_m] \le \varepsilon$. P P $F_m$ must be expressible as $\bigcup_{i \le n} [c_i, d_i]$ where n ≤ m and $a \le c_0 \le d_0 \le \dots \le c_n \le d_n \le b$. For each i ≤ n choose $x_i$, $y_i$ such that $c_i \le x_i, y_i \le d_i$ and
\[ f(x_i) = \min_{x \in [c_i,d_i]} f(x), \qquad f(y_i) = \max_{x \in [c_i,d_i]} f(x); \]
such exist because f is continuous, so is bounded and attains its bounds on $[c_i, d_i]$. Set $a_i = \min(x_i, y_i)$, $b_i = \max(x_i, y_i)$, so that $c_i \le a_i \le b_i \le d_i$. Then
\[ \sum_{i=0}^n (b_i - a_i) \le \sum_{i=0}^n (d_i - c_i) = \mu F_m \le \mu\Bigl(\bigcup_{k \in N} I_k\Bigr) \le \delta, \]
so
\[
\begin{aligned}
\mu f[F_m] = \mu\Bigl(\bigcup_{i \le n} f[\,[c_i, d_i]\,]\Bigr) &\le \sum_{i=0}^n \mu(f[\,[c_i, d_i]\,]) \\
&= \sum_{i=0}^n \mu[f(x_i), f(y_i)] = \sum_{i=0}^n |f(b_i) - f(a_i)| \le \varepsilon. \quad \text{Q Q}
\end{aligned}
\]
But $\langle f[F_m] \rangle_{m \in N}$ is a nondecreasing sequence covering f[A], so
\[ \mu^* f[A] \le \mu\Bigl(\bigcup_{m \in N} f[F_m]\Bigr) = \sup_{m \in N} \mu f[F_m] \le \varepsilon. \]
As ε is arbitrary, f[A] is negligible, as claimed.

225H Semicontinuous functions In preparation for the last main result of this section, I give a general result concerning measurable real-valued functions on subsets of R. It will be convenient here, for once, to consider functions taking values in [−∞, ∞]. If $D \subseteq \mathbb{R}^r$, a function g : D → [−∞, ∞] is lower semicontinuous if {x : g(x) > u} is an open subset of D (for the subspace topology, see 2A3C) for every u ∈ [−∞, ∞]. Any lower semicontinuous function is Borel measurable, therefore Lebesgue measurable (121B-121D). Now we have the following result.

225I Proposition Suppose that r ≥ 1 and that f is a real-valued function, defined on a subset D of $\mathbb{R}^r$, which is integrable over D. Then for any ε > 0 there is a lower semicontinuous function $g : \mathbb{R}^r \to [-\infty, \infty]$ such that g(x) ≥ f(x) for every x ∈ D and $\int_D g$ is defined and not greater than $\varepsilon + \int_D f$.
Remarks This is a result of great general importance, so I give it in a fairly general form; but for the present chapter all we need is the case r = 1, D = [a, b] where a ≤ b.

proof (a) We can enumerate Q as $\langle q_n \rangle_{n \in N}$. By 225A, there is a δ > 0 such that $\int_F |f| \le \frac{1}{2}\varepsilon$ whenever $\mu_D F \le \delta$, where $\mu_D$ is the subspace measure on D, so that $\mu_D F = \mu^* F$, the outer Lebesgue measure of F, for every $F \in \Sigma_D$, the domain of $\mu_D$ (214A-214B). For each n ∈ N, set
\[ \delta_n = 2^{-n-1} \min\Bigl(\frac{\varepsilon}{1 + 2|q_n|}, \delta\Bigr), \]
so that $\sum_{n=0}^\infty \delta_n |q_n| \le \frac{1}{2}\varepsilon$ and $\sum_{n=0}^\infty \delta_n \le \delta$. For each n ∈ N, let $E_n \subseteq \mathbb{R}^r$ be a Lebesgue measurable set such that $\{x : f(x) \ge q_n\} = D \cap E_n$, and choose an open set $G_n \supseteq E_n \cap B(0, n)$ such that $\mu G_n \le \mu(E_n \cap B(0, n)) + \delta_n$ (134Fa), writing B(0, n) for the ball {x : ‖x‖ ≤ n}. For $x \in \mathbb{R}^r$, set
\[ g(x) = \sup\{q_n : x \in G_n\}, \]
allowing −∞ as sup ∅ and ∞ as the supremum of a set with no upper bound in R.

(b) Now check the properties of g.

(i) g is lower semicontinuous. P P If u ∈ [−∞, ∞], then
\[ \{x : g(x) > u\} = \bigcup \{G_n : q_n > u\} \]
is a union of open sets, therefore open. Q Q

(ii) g(x) ≥ f(x) for every x ∈ D. P P If x ∈ D and η > 0, there is an n ∈ N such that ‖x‖ ≤ n and $f(x) - \eta \le q_n \le f(x)$; now $x \in E_n \subseteq G_n$ so $g(x) \ge q_n \ge f(x) - \eta$. As η is arbitrary, g(x) ≥ f(x). Q Q

(iii) Consider the functions $h_1$, $h_2$ : D → ]−∞, ∞] defined by setting
\[ h_1(x) = |f(x)| \text{ if } x \in D \cap \bigcup_{n \in N} (G_n \setminus E_n), \quad = 0 \text{ for other } x \in D, \]
\[ h_2(x) = \sum_{n=0}^\infty |q_n| \chi(G_n \setminus E_n)(x) \text{ for every } x \in D. \]
Setting $F = \bigcup_{n \in N} G_n \setminus E_n$,
\[ \mu F \le \sum_{n=0}^\infty \mu(G_n \setminus E_n) \le \delta, \]
so
\[ \int_D h_1 \le \int_{D \cap F} |f| \le \tfrac{1}{2}\varepsilon \]
by the choice of δ. As for $h_2$, we have (by B.Levi's theorem)
\[ \int_D h_2 = \sum_{n=0}^\infty |q_n| \mu_D(D \cap G_n \setminus E_n) \le \sum_{n=0}^\infty |q_n| \mu(G_n \setminus E_n) \le \tfrac{1}{2}\varepsilon; \]
because this is finite, $h_2(x) < \infty$ for almost every x ∈ D. Thus $\int_D h_1 + h_2 \le \varepsilon$.

(iv) The point is that $g \le f + h_1 + h_2$ everywhere in D. P P Take any x ∈ D. If n ∈ N and $x \in G_n$, then either $x \in E_n$, in which case
\[ f(x) + h_1(x) + h_2(x) \ge f(x) \ge q_n, \]
or $x \in G_n \setminus E_n$, in which case
\[ f(x) + h_1(x) + h_2(x) \ge f(x) + |f(x)| + |q_n| \ge q_n. \]
Thus
\[ f(x) + h_1(x) + h_2(x) \ge \sup\{q_n : x \in G_n\} \ge g(x). \quad \text{Q Q} \]
So $g \le f + h_1 + h_2$ everywhere in D.

(v) Putting (iii) and (iv) together,
\[ \int_D g \le \int_D f + h_1 + h_2 \le \varepsilon + \int_D f, \]
as required.

225J We need some results on Borel measurable sets and functions which are of independent interest.
Theorem Let D be a subset of R and f : D → R any function. Then E = {x : x ∈ D, f is continuous at x} is relatively Borel measurable in D, and F = {x : x ∈ D, f is differentiable at x} is actually Borel measurable; moreover, f′ : F → R is Borel measurable.

proof (a) For k ∈ N set
\[ \mathcal{G}_k = \{\,]a, b[ \; : a, b \in R,\ |f(x) - f(y)| \le 2^{-k} \text{ for all } x, y \in D \cap {]a, b[}\,\}. \]
Then $G_k = \bigcup \mathcal{G}_k$ is an open set, so $E_0 = \bigcap_{k \in N} G_k$ is a Borel set. But $E = D \cap E_0$, so E is a relatively Borel subset of D.

(b)(i) I should perhaps say at once that when interpreting the formula $f'(x) = \lim_{h \to 0} (f(x+h) - f(x))/h$, I insist on the restrictive definition
\[ a = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \text{ if for every } \varepsilon > 0 \text{ there is a } \delta > 0 \text{ such that } \frac{f(x+h) - f(x)}{h} \text{ is defined and } \Bigl|\frac{f(x+h) - f(x)}{h} - a\Bigr| \le \varepsilon \text{ whenever } 0 < |h| \le \delta. \]
So f′(x) can be defined only if there is some δ > 0 such that the whole interval [x − δ, x + δ] lies within the domain D of f.

(ii) For p, q, q′ ∈ Q and k ∈ N set
H(k, p, q, q′) = ∅ if ]q, q′[ ⊄ D,
H(k, p, q, q′) = {x : x ∈ E ∩ ]q, q′[, $|f(y) - f(x) - p(y - x)| \le 2^{-k}|y - x|$ for every y ∈ ]q, q′[} if ]q, q′[ ⊆ D.
Then $H(k, p, q, q') = E \cap {]q, q'[} \cap \overline{H(k, p, q, q')}$. P P If $x \in E \cap {]q, q'[} \cap \overline{H(k, p, q, q')}$ there is a sequence $\langle x_n \rangle_{n \in N}$ in H(k, p, q, q′) converging to x. Because f is continuous at x,
\[ |f(y) - f(x) - p(y - x)| = \lim_{n \to \infty} |f(y) - f(x_n) - p(y - x_n)| \le \lim_{n \to \infty} 2^{-k}|y - x_n| = 2^{-k}|y - x| \]
for every y ∈ ]q, q′[, so that x ∈ H(k, p, q, q′). Q Q Since E is a Borel set, by (a), so is H(k, p, q, q′).

(iii) Now
\[ F = \bigcap_{k \in N} \bigcup_{p,q,q' \in Q} H(k, p, q, q'). \]
P P (α) Suppose x ∈ F, that is, f′(x) is defined; say f′(x) = a. Take any k ∈ N. Then there are p ∈ Q, δ ∈ ]0, 1] such that $|p - a| \le 2^{-k-1}$ and [x − δ, x + δ] ⊆ D and $\bigl|\frac{f(x+h) - f(x)}{h} - a\bigr| \le 2^{-k-1}$ whenever 0 < |h| ≤ δ; now take q ∈ Q ∩ [x − δ, x[, q′ ∈ Q ∩ ]x, x + δ] and see that x ∈ H(k, p, q, q′). As x is arbitrary, $F \subseteq \bigcap_{k \in N} \bigcup_{p,q,q' \in Q} H(k, p, q, q')$. (β) If $x \in \bigcap_{k \in N} \bigcup_{p,q,q' \in Q} H(k, p, q, q')$, then for each k ∈ N choose $p_k, q_k, q_k' \in Q$ such that $x \in H(k, p_k, q_k, q_k')$. If h ≠ 0, $x + h \in {]q_k, q_k'[}$ then $\bigl|\frac{f(x+h) - f(x)}{h} - p_k\bigr| \le 2^{-k}$. But this means, first, that $|p_k - p_l| \le 2^{-k} + 2^{-l}$ for every k, l (since surely there is some h ≠ 0 such that $x + h \in {]q_k, q_k'[} \cap {]q_l, q_l'[}$), so that $\langle p_k \rangle_{k \in N}$ is a Cauchy sequence, with limit a say; and, second, that $\bigl|\frac{f(x+h) - f(x)}{h} - a\bigr| \le 2^{-k} + |a - p_k|$ whenever h ≠ 0 and $x + h \in {]q_k, q_k'[}$, so that f′(x) is defined and equal to a. Q Q

(iv) Because Q is countable, all the unions $\bigcup_{p,q,q' \in Q} H(k, p, q, q')$ are Borel sets, so F also is.
S
0 (v) Now enumerate Q 3 as h(pi , qi , qi0 )ii∈N , and set Hki = H(k, pi , qi , qi0 ) \ 0 0 k, i ∈ N. Every Hki is Borel measurable, hHki ii∈N is disjoint, and S S 0 0 i∈N Hki = i∈N H(k, pi , qi , qi ) ⊇ F
j
H(k, pj , qj , qj0 ) for each
for each $k$. Note that $|f'(x)-p|\le 2^{-k}$ whenever $x\in F\cap H(k,p,q,q')$, so if we set $f_k(x)=p_i$ for every $x\in H'_{ki}$ we shall have a Borel function $f_k$ such that $|f'(x)-f_k(x)|\le 2^{-k}$ for every $x\in F$. Accordingly $f'=\lim_{k\to\infty}f_k\upharpoonright F$ is Borel measurable.

225K Proposition Let $[a,b]$ be a non-empty closed interval in $\mathbb{R}$, and $f:[a,b]\to\mathbb{R}$ a function. Set $F=\{x:x\in\,]a,b[\,,\ f'(x)\text{ is defined}\}$. Then $f$ is absolutely continuous iff (i) it is continuous (ii) $f'$ is integrable over $F$ (iii) $f[\,[a,b]\setminus F\,]$ is negligible.

proof (a) Suppose first that $f$ is absolutely continuous. Then $f$ is surely continuous (225Ca) and $f'$ is integrable over $[a,b]$, therefore over $F$ (225E); also $[a,b]\setminus F$ is negligible, so $f[\,[a,b]\setminus F\,]$ is negligible, by 225G.

(b) So now suppose that $f$ satisfies the conditions. Set $f^*(x)=|f'(x)|$ for $x\in F$, $0$ for $x\in[a,b]\setminus F$. Then $f(b)\le f(a)+\int_a^b f^*$.

$\mathbf{P}$ (i) Because $F$ is a Borel set and $f'$ is a Borel measurable function (225J), $f^*$ is measurable. Let $\epsilon>0$. Let $G$ be an open subset of $\mathbb{R}$ such that $f[\,[a,b]\setminus F\,]\subseteq G$ and $\mu G\le\epsilon$ (134Fa). Let $g:\mathbb{R}\to[0,\infty]$ be a lower semi-continuous function such that $f^*(x)\le g(x)$ for every $x\in[a,b]$ and $\int_a^b g\le\int_a^b f^*+\epsilon$ (225I). Consider
$$A=\Bigl\{x:a\le x\le b,\ \mu([f(a),f(x)]\setminus G)\le 2\epsilon(x-a)+\int_a^x g\Bigr\},$$
interpreting $[f(a),f(x)]$ as $\emptyset$ if $f(x)<f(a)$. Then $a\in A\subseteq[a,b]$, so $c=\sup A$ is defined and belongs to $[a,b]$. Because $f$ is continuous, the function $x\mapsto\mu([f(a),f(x)]\setminus G)$ is continuous; also $x\mapsto 2\epsilon(x-a)+\int_a^x g$ is certainly continuous, so $c\in A$.

(ii) $\mathbf{?}$ If $c\in F$, so that $f^*(c)=|f'(c)|$, then there is a $\delta>0$ such that
$$a\le c-\delta\le c+\delta\le b,$$
$$g(x)\ge g(c)-\epsilon\ge|f'(c)|-\epsilon\text{ whenever }|x-c|\le\delta,$$
$$\Bigl|\frac{f(x)-f(c)}{x-c}-f'(c)\Bigr|\le\epsilon\text{ whenever }|x-c|\le\delta.$$
Consider $x=c+\delta$. Then $c<x\le b$ and
$$\mu([f(a),f(x)]\setminus G)\le\mu([f(a),f(c)]\setminus G)+|f(x)-f(c)|$$
$$\le 2\epsilon(c-a)+\int_a^c g+\epsilon(x-c)+|f'(c)|(x-c)$$
$$\le 2\epsilon(c-a)+\int_a^c g+\epsilon(x-c)+\int_c^x(g+\epsilon)$$
(because $g(t)\ge|f'(c)|-\epsilon$ whenever $c\le t\le x$)
$$=2\epsilon(x-a)+\int_a^x g,$$
so $x\in A$; but $c$ is supposed to be an upper bound of $A$. $\mathbf{X}$ Thus $c\in[a,b]\setminus F$.

(iii) $\mathbf{?}$ Now suppose, if possible, that $c<b$. We know that $f(c)\in G$, so there is an $\eta>0$ such that $[f(c)-\eta,f(c)+\eta]\subseteq G$; now there is a $\delta>0$ such that $|f(x)-f(c)|\le\eta$ whenever $x\in[a,b]$ and $|x-c|\le\delta$. Set $x=\min(c+\delta,b)$; then $c<x\le b$ and $[f(c),f(x)]\subseteq G$, so
$$\mu([f(a),f(x)]\setminus G)=\mu([f(a),f(c)]\setminus G)\le 2\epsilon(c-a)+\int_a^c g\le 2\epsilon(x-a)+\int_a^x g$$
and once again $x\in A$, even though $x>\sup A$. $\mathbf{X}$

(iv) We conclude that $c=b$, so that $b\in A$. But this means that
$$f(b)-f(a)\le\mu([f(a),f(b)])\le\mu([f(a),f(b)]\setminus G)+\mu G$$
$$\le 2\epsilon(b-a)+\int_a^b g+\epsilon\le 2\epsilon(b-a)+\int_a^b f^*+\epsilon+\epsilon$$
$$=2\epsilon(1+b-a)+\int_a^b f^*.$$
As $\epsilon$ is arbitrary, $f(b)-f(a)\le\int_a^b f^*$, as claimed. $\mathbf{Q}$
(c) Similarly, or applying (b) to $-f$, $f(a)-f(b)\le\int_a^b f^*$, so that $|f(b)-f(a)|\le\int_a^b f^*$. Of course the argument applies equally to any subinterval of $[a,b]$, so $|f(d)-f(c)|\le\int_c^d f^*$ whenever $a\le c\le d\le b$. Now let $\epsilon>0$. By 225A once more, there is a $\delta>0$ such that $\int_E f^*\le\epsilon$ whenever $E\subseteq[a,b]$ and $\mu E\le\delta$. Suppose that $a\le a_1\le b_1\le\ldots\le a_n\le b_n\le b$ and $\sum_{i=1}^n(b_i-a_i)\le\delta$. Then
$$\sum_{i=1}^n|f(b_i)-f(a_i)|\le\sum_{i=1}^n\int_{a_i}^{b_i}f^*=\int_{\bigcup_{i\le n}[a_i,b_i]}f^*\le\epsilon.$$
So $f$ is absolutely continuous, as claimed.

225L Corollary Let $[a,b]$ be a non-empty closed interval in $\mathbb{R}$. Let $f:[a,b]\to\mathbb{R}$ be a continuous function which is differentiable on the open interval $]a,b[$. If its derivative $f'$ is integrable over $[a,b]$, then $f$ is absolutely continuous, and $f(b)-f(a)=\int_a^b f'$.

proof $f[\,[a,b]\setminus F\,]=\{f(a),f(b)\}$ is surely negligible, so $f$ is absolutely continuous, by 225K; consequently $f(b)-f(a)=\int_a^b f'$, by 225E.

225M Corollary Let $[a,b]$ be a non-empty closed interval in $\mathbb{R}$, and $f:[a,b]\to\mathbb{R}$ a continuous function. Then $f$ is absolutely continuous iff it is continuous and of bounded variation and $f[A]$ is negligible for every negligible $A\subseteq[a,b]$.

proof (a) Suppose that $f$ is absolutely continuous. By 225C(a-b) it is continuous and of bounded variation, and by 225G we have $f[A]$ negligible for every negligible $A\subseteq[a,b]$.

(b) So now suppose that $f$ satisfies the conditions. Set $F=\{x:x\in\,]a,b[\,,\ f'(x)\text{ is defined}\}$. By 224I, $[a,b]\setminus F$ is negligible, so $f[\,[a,b]\setminus F\,]$ is negligible. Moreover, also by 224I, $f'$ is integrable over $[a,b]$ or $F$. So the conditions of 225K are satisfied and $f$ is absolutely continuous.

225N The Cantor function I should mention the standard example of a continuous function of bounded variation which is not absolutely continuous. Let $C\subseteq[0,1]$ be the Cantor set (134G). Recall that the 'Cantor function' is a non-decreasing continuous function $f:[0,1]\to[0,1]$ such that $f'(x)$ is defined and equal to zero for every $x\in[0,1]\setminus C$, but $f(0)=0<1=f(1)$ (134H). Of course $f$ is of bounded variation and not absolutely continuous. $C$ is negligible and $f[C]=[0,1]$ is not. If $x\in C$, then for every $n\in\mathbb{N}$ there is an interval of length $3^{-n}$, containing $x$, on which $f$ increases by $2^{-n}$; so $f$ cannot be differentiable at $x$, and the set $F=\operatorname{dom}f'$ of 225K is precisely $[0,1]\setminus C$, so that $f[\,[0,1]\setminus F\,]=[0,1]$.
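The properties of the Cantor function quoted in 225N can be checked on a small computation. What follows is only a sketch, assuming the usual ternary-digit description of the function; the name `cantor` and the truncation parameter `depth` are inventions of the sketch and do not appear in the text. It works in exact rational arithmetic, so values on the removed middle thirds are exact, while at points of $C$ with infinite ternary expansions the value is approximated to within $2^{-\mathrm{depth}}$.

```python
from fractions import Fraction


def cantor(x, depth=40):
    """Approximate the Cantor function at x in [0, 1] from `depth` ternary digits.

    Hypothetical helper for illustration: reads the ternary digits of x,
    stops at the first digit 1, and turns each digit 2 into a binary digit 1.
    """
    x = Fraction(x)
    if x >= 1:
        return Fraction(1)
    value, scale = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)            # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:            # x is in (the closure of) a removed middle third
            return value + scale  # the function is constant there
        value += scale * (digit // 2)
        scale /= 2
    return value


# On [0, 3^-n] the function increases by 2^-n, so the difference
# quotient at the point 0 of C is (3/2)^n, which is unbounded in n.
for n in (1, 5, 10):
    h = Fraction(1, 3**n)
    print(n, cantor(h) / h)
```

The loop at the end illustrates, at the single point $0\in C$, the remark that $f$ increases by $2^{-n}$ on an interval of length $3^{-n}$; it is a numerical illustration, not a proof of non-differentiability on the whole of $C$.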
225O Complex-valued functions As usual, I spell out the results above in the forms applicable to complex-valued functions.

(a) Let $(X,\Sigma,\mu)$ be any measure space and $f$ an integrable complex-valued function defined on a conegligible subset of $X$. Then for any $\epsilon>0$ there are a measurable set $E$ of finite measure and a real number $\delta>0$ such that $\int_F|f|\le\epsilon$ whenever $F\in\Sigma$ and $\mu(F\cap E)\le\delta$. (Apply 225A to $|f|$.)

(b) If $[a,b]$ is a non-empty closed interval in $\mathbb{R}$ and $f:[a,b]\to\mathbb{C}$ is a function, we say that $f$ is absolutely continuous if for every $\epsilon>0$ there is a $\delta>0$ such that $\sum_{i=1}^n|f(b_i)-f(a_i)|\le\epsilon$ whenever $a\le a_1\le b_1\le a_2\le b_2\le\ldots\le a_n\le b_n\le b$ and $\sum_{i=1}^n(b_i-a_i)\le\delta$. Observe that $f$ is absolutely continuous iff its real and imaginary parts are both absolutely continuous.
(c) Let $[a,b]$ be a non-empty closed interval in $\mathbb{R}$. (i) If $f:[a,b]\to\mathbb{C}$ is absolutely continuous it is of bounded variation on $[a,b]$, so is differentiable almost everywhere on $[a,b]$, and its derivative is integrable over $[a,b]$. (ii) If $f$, $g:[a,b]\to\mathbb{C}$ are absolutely continuous, so are $f+g$ and $\zeta f$, for any $\zeta\in\mathbb{C}$, and $f\times g$. (iii) If $g:[a,b]\to[c,d]$ is monotonic and absolutely continuous, and $f:[c,d]\to\mathbb{C}$ is absolutely continuous, then $fg:[a,b]\to\mathbb{C}$ is absolutely continuous.

(d) Let $[a,b]$ be a non-empty closed interval in $\mathbb{R}$ and $F:[a,b]\to\mathbb{C}$ a function. Then the following are equiveridical:
(i) there is an integrable complex-valued function $f$ such that $F(x)=F(a)+\int_a^x f$ for every $x\in[a,b]$;
(ii) $\int_a^x F'$ exists and is equal to $F(x)-F(a)$ for every $x\in[a,b]$;
(iii) $F$ is absolutely continuous.
(Apply 225E to the real and imaginary parts of $F$.)

(e) Let $f$ be an integrable complex-valued function on an interval $[a,b]\subseteq\mathbb{R}$, and $g:[a,b]\to\mathbb{C}$ an absolutely continuous function. Set $F(x)=\int_a^x f$ for $x\in[a,b]$. Then
$$\int_a^b f\times g=F(b)g(b)-F(a)g(a)-\int_a^b F\times g'.$$
(Apply 225F to the real and imaginary parts of $f$ and $g$.)

(f) Let $f$ be a continuous complex-valued function on a closed interval $[a,b]\subseteq\mathbb{R}$, and suppose that $f$ is differentiable at every point of the open interval $]a,b[$, with $f'$ integrable over $[a,b]$. Then $f$ is absolutely continuous. (Apply 225L to the real and imaginary parts of $f$.)

(g) For a result corresponding to 225M, see 264Yp.

225X Basic exercises (a) Show directly from the definition in 225B (without appealing to 225E) that any absolutely continuous real-valued function on a closed interval $[a,b]$ is expressible as the difference of non-decreasing absolutely continuous functions.

(b) Let $f:[a,b]\to\mathbb{R}$ be an absolutely continuous function, where $a\le b$. (i) Show that $|f|:[a,b]\to\mathbb{R}$ is absolutely continuous. (ii) Show that $gf$ is absolutely continuous whenever $g:\mathbb{R}\to\mathbb{R}$ is a differentiable function with bounded derivative.

(c) Show directly from the definition in 225B and the Mean Value Theorem (without appealing to 225K) that if a function $f$ is continuous on a closed interval $[a,b]$, differentiable on the open interval $]a,b[$, and has bounded derivative in $]a,b[$, then $f$ is absolutely continuous, so that $f(x)=f(a)+\int_a^x f'$ for every $x\in[a,b]$.

(d) Show that if $f:[a,b]\to\mathbb{R}$ is absolutely continuous, then $\operatorname{Var}_{[a,b]}(f)=\int_a^b|f'|$. (Hint: put 224I and 225E together.)
(e) Let $f:[0,\infty[\,\to\mathbb{C}$ be a function which is absolutely continuous on $[0,a]$ for every $a\in[0,\infty[$ and has Laplace transform $F(s)=\int_0^\infty e^{-sx}f(x)\,dx$ defined on $\{s:\operatorname{Re}s>S\}$. Suppose also that $\lim_{x\to\infty}e^{-Sx}f(x)=0$. Show that $f'$ has Laplace transform $sF(s)-f(0)$ defined whenever $\operatorname{Re}s>S$. (Hint: show that
$$f(x)e^{-sx}-f(0)=\int_0^x\frac{d}{dt}\bigl(f(t)e^{-st}\bigr)\,dt$$
for every $x\ge 0$.)

(f) Let $g:\mathbb{R}\to\mathbb{R}$ be a non-decreasing function which is absolutely continuous on every bounded interval; let $\mu_g$ be the associated Lebesgue-Stieltjes measure (114Xa), and $\Sigma_g$ its domain. Show that $\int_E g'=\mu_g E$ for any $E\in\Sigma_g$, if we allow $\infty$ as a value of the integral. (Hint: start with intervals $E$.)

(g) Let $g:[a,b]\to\mathbb{R}$ be a non-decreasing absolutely continuous function, and $f:[g(a),g(b)]\to\mathbb{R}$ a continuous function. Show that $\int_{g(a)}^{g(b)}f(t)\,dt=\int_a^b f(g(t))g'(t)\,dt$. (Hint: set $F(x)=\int_{g(a)}^x f$, $G=Fg$ and consider $\int_a^b G'(t)\,dt$. See also 263I.)
(h) Suppose that $I\subseteq\mathbb{R}$ is any non-trivial interval (bounded or unbounded, open, closed or half-open, but not empty or a singleton), and $f:I\to\mathbb{R}$ a function. Show that $f$ is absolutely continuous on every closed bounded subinterval of $I$ iff there is a function $g$ such that $\int_a^b g=f(b)-f(a)$ whenever $a\le b$ in $I$, and in this case $g$ is integrable iff $f$ is of bounded variation on $I$.

(i) Show that $\int_0^1\frac{\ln x}{x-1}\,dx=\sum_{n=1}^\infty\frac{1}{n^2}$. (Hint: use 225F to find $\int_0^1 x^n\ln x\,dx$, and recall that $\frac{1}{1-x}=\sum_{n=0}^\infty x^n$ for $0\le x<1$.)

(j) (i) Show that $\int_0^1 t^a\,dt$ is finite for every $a>-1$. (ii) Show that $\int_1^\infty t^a e^{-t}\,dt$ is finite for every $a\in\mathbb{R}$. (Hint: show that there is an $M$ such that $t^a\le Me^{t/2}$ for $t\ge M$.) (iii) Show that $\Gamma(a)=\int_0^\infty t^{a-1}e^{-t}\,dt$ is defined for every $a>0$. (iv) Show that $\Gamma(a+1)=a\Gamma(a)$ for every $a>0$. (v) Show that $\Gamma(n+1)=n!$ for every $n\in\mathbb{N}$. ($\Gamma$ is of course the $\Gamma$-function.)

(k) Show that if $b>0$ then $\int_0^\infty u^{b-1}e^{-u^2/2}\,du=2^{(b-2)/2}\Gamma(\frac{b}{2})$. (Hint: consider $f(t)=t^{(b-2)/2}e^{-t}$, $g(u)=u^2/2$ in 225Xg.)
(l) Suppose that $f$, $g$ are lower semi-continuous functions, defined on subsets of $\mathbb{R}^r$, and taking values in $]-\infty,\infty]$. (i) Show that $f+g$, $f\wedge g$ and $f\vee g$ are lower semi-continuous, and that $\alpha f$ is lower semi-continuous for every $\alpha\ge 0$. (ii) Show that if $f$ and $g$ are non-negative, then $f\times g$ is lower semi-continuous. (iii) Show that if $f$ is non-negative and $g$ is continuous, then $f\times g$ is lower semi-continuous. (iv) Show that if $f$ is non-decreasing then the composition $fg$ is lower semi-continuous.

(m) Let $A$ be a non-empty family of lower semi-continuous functions defined on subsets of $\mathbb{R}^r$ and taking values in $[-\infty,\infty]$. Set $g(x)=\sup\{f(x):f\in A,\ x\in\operatorname{dom}f\}$ for $x\in D=\bigcup_{f\in A}\operatorname{dom}f$. Show that $g$ is lower semi-continuous.

(n) Suppose that $f:[a,b]\to\mathbb{R}$ is continuous, and differentiable at all but countably many points of $[a,b]$. Show that $f$ is absolutely continuous iff it is of bounded variation.

(o) Show that if $f:[a,b]\to\mathbb{R}$ is absolutely continuous, then $f[E]$ is Lebesgue measurable for every Lebesgue measurable set $E\subseteq[a,b]$.

225Y Further exercises (a) Show that the composition of two absolutely continuous functions need not be absolutely continuous. (Hint: 224Xb.)

(b) Let $f:[a,b]\to\mathbb{R}$ be a continuous function, where $a<b$. Set $G=\{x:x\in\,]a,b[\,,\ \exists\,y\in\,]x,b]\text{ such that }f(x)<f(y)\}$. Show that $G$ is open and is expressible as a disjoint union of intervals $]c,d[$ where $f(c)\le f(d)$. Use this to prove 225D without calling on Vitali's theorem.

(c) Let $f:[a,b]\to\mathbb{R}$ be a function of bounded variation and $\gamma>0$. Show that there is an absolutely continuous function $g:[a,b]\to\mathbb{R}$ such that $|g'(x)|\le\gamma$ wherever the derivative is defined and $\{x:x\in[a,b],\ f(x)\ne g(x)\}$ has measure at most $\gamma^{-1}\operatorname{Var}_{[a,b]}(f)$. (Hint: reduce to the case of non-decreasing $f$. Apply 225Yb to the function $x\mapsto f(x)-\gamma x$ and show that $\gamma\mu G\le\operatorname{Var}_{[a,b]}(f)$. Set $g(x)=f(x)$ for $x\in\,]a,b[\,\setminus G$.)

(d) Let $f$ be a non-negative measurable real-valued function defined on a subset $D$ of $\mathbb{R}^r$, where $r\ge 1$. Show that for any $\epsilon>0$ there is a lower semi-continuous function $g:\mathbb{R}^r\to[-\infty,\infty]$ such that $g(x)\ge f(x)$ for every $x\in D$ and $\int_D g-f\le\epsilon$.

(e) Let $f$ be a measurable real-valued function defined on a subset $D$ of $\mathbb{R}^r$, where $r\ge 1$. Show that for any $\epsilon>0$ there is a lower semi-continuous function $g:\mathbb{R}^r\to[-\infty,\infty]$ such that $g(x)\ge f(x)$ for every $x\in D$ and $\mu^*\{x:x\in D,\ g(x)>f(x)\}\le\epsilon$. (Hint: 134Yd, 134Fb.)
(f) Let $f$ be a real-valued function defined on a set $D\subseteq\mathbb{R}$. For $x\in D$, set
$$(D^*f)(x)=\inf\{u:u\in[-\infty,\infty],\ \exists\,\delta>0,\ f(y)\le f(x)+u(y-x)\text{ whenever }y\in D\text{ and }x\le y\le x+\delta\}.$$
Show that $D^*f:D\to[-\infty,\infty]$ is Lebesgue measurable, in the sense that $\{x:(D^*f)(x)\ge u\}$ is relatively measurable in $D$ for every $u\in[-\infty,\infty]$, if $f$ is Lebesgue measurable, and is Borel measurable if $f$ is Borel measurable.

225 Notes and comments There is a good deal more to say about absolutely continuous functions; I will return to the topic in the next section and in Chapter 26. I shall not make direct use of any of the results from 225H on, but it seems to me that this kind of investigation is necessary for any clear picture of the relationships between such concepts as absolute continuity and bounded variation. Of course, in order to apply these results, we do need a store of simple kinds of absolutely continuous function, differentiable functions with bounded derivative forming the most important class (225Xc). A larger family of the same kind is the class of 'Lipschitz' functions (262Bc).

The definition of 'absolutely continuous function' is ordinarily set out for closed bounded intervals, as in 225B. The point is that for other intervals the simplest generalizations of this formulation do not seem quite appropriate. In 225Xh I try to suggest the kind of demands one might make on functions defined on other types of interval.

I should remark that the real prize is still not quite within our grasp. I have been able to give a reasonably satisfactory formulation of simple integration by parts (225F), at least for bounded intervals; a further limiting process is necessary to deal with unbounded intervals. But a companion method from advanced calculus, integration by substitution, remains elusive. The best I think we can do at this point is 225Xg, which insists on a continuous integrand $f$. It is the case that the result is valid for general integrable $f$, but there are some further subtleties to be mastered on the way; the necessary ideas are given in the much more general results 235A and 263D below, and applied to the one-dimensional case in 263I.

On the way to the characterization of absolutely continuous functions in 225K, I find myself calling on one of the fundamental relationships between Lebesgue measure and the topology of $\mathbb{R}^r$ (225I). The technique here can be adapted to give many variations of the result; see 225Yd-225Ye. If you have not seen semi-continuous functions before, 225Xl-225Xm give a partial idea of their properties. In 225J I give a fragment of 'descriptive set theory', the study of the kinds of set which can arise from the formulae of analysis. These ideas too will resurface elsewhere; compare 225Yf and also the proof of 262M below.
226 The Lebesgue decomposition of a function of bounded variation I end this chapter with some notes on a method of analysing a general function of bounded variation which may help to give a picture of what such functions can be, though it is not directly necessary for anything of great importance dealt with in this volume.

226A Sums over arbitrary index sets To get a full picture of this fragment of real analysis, a bit of preparation will be helpful. This concerns the notion of a sum over an arbitrary index set, which I have rather been skirting around so far.

(a) If $I$ is any set and $\langle a_i\rangle_{i\in I}$ any family in $[0,\infty]$, we set
$$\sum_{i\in I}a_i=\sup\Bigl\{\sum_{i\in K}a_i:K\text{ is a finite subset of }I\Bigr\},$$
with the convention that $\sum_{i\in\emptyset}a_i=0$. (See 112Bd, 222Ba.) For general $a_i\in[-\infty,\infty]$, we can set
$$\sum_{i\in I}a_i=\sum_{i\in I}a_i^+-\sum_{i\in I}a_i^-$$
if this is defined in $[-\infty,\infty]$, that is, at least one of $\sum_{i\in I}a_i^+$, $\sum_{i\in I}a_i^-$ is finite, where $a^+=\max(a,0)$ and $a^-=\max(-a,0)$ for each $a$. If $\sum_{i\in I}a_i$ is defined and finite, we say that $\langle a_i\rangle_{i\in I}$ is summable.
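The two-stage definition in (a) can be tried out on finite families, where every sum is elementary. The following is a sketch only, under the assumption that a family is stored as a Python dictionary from indices to values; the names `sup_over_finite_subsets` and `generalized_sum` are hypothetical and chosen for this illustration. It makes visible that no ordering of the index set is used.

```python
import math
from itertools import combinations


def sup_over_finite_subsets(family):
    """Supremum of the sums over all finite subsets K, for a finite family in [0, inf].

    Brute force: for a finite non-negative family the supremum is attained at K = I.
    """
    keys = list(family)
    return max(
        sum(family[i] for i in K)
        for r in range(len(keys) + 1)
        for K in combinations(keys, r)
    )


def generalized_sum(family):
    """Sum of a finite family in [-inf, inf] as (sum of a+) - (sum of a-).

    Returns None when both parts are infinite, i.e. when the sum is undefined.
    """
    pos = sum(max(a, 0.0) for a in family.values())
    neg = sum(max(-a, 0.0) for a in family.values())
    if math.isinf(pos) and math.isinf(neg):
        return None
    return pos - neg


# The indices need not be numbers, and no order on them is involved.
family = {"x": 0.5, ("y", 2): 0.25, frozenset(): 2.0}
print(sup_over_finite_subsets(family), generalized_sum(family))
```

For an infinite index set neither function applies as written; the sketch only illustrates the conventions for the empty family and for the undefined case $\infty-\infty$.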
(b) Since this is a book on measure theory, I will immediately describe the relationship between this kind of summability and an appropriate notion of integration. For any set $I$, we have the corresponding 'counting measure' $\mu$ on $I$ (112Bd). Every subset of $I$ is measurable, so every family $\langle a_i\rangle_{i\in I}$ of real numbers is a measurable real-valued function on $I$. A subset of $I$ has finite measure iff it is finite; so a real-valued function $f$ on $I$ is 'simple' if $K=\{i:f(i)\ne 0\}$ is finite. In this case,
$$\int f\,d\mu=\sum_{i\in K}f(i)=\sum_{i\in I}f(i)$$
as defined in part (a). The measure $\mu$ is semi-finite (211Nc) so a non-negative function $f$ is integrable iff $\int f=\sup_{\mu K<\infty}$ … $\epsilon>0$ there is a finite $K\subseteq I$ such that $\sum_{i\in I\setminus K}|a_i|\le\epsilon$.

$\mathbf{P}$ This is nothing but a special case of 225A; there is a set $K$ with $\mu K<\infty$ such that $\int_{I\setminus K}|a_i|\,\mu(di)\le\epsilon$, but
$$\int_{I\setminus K}|a_i|\,\mu(di)=\sum_{i\in I\setminus K}|a_i|.\ \mathbf{Q}$$
(Of course there are 'direct' proofs of this result from the definition in (a), not mentioning measures or integrals. But I think you will see that they rely on the same idea as that in the proof of 225A.)

Consequently, for any family $\langle a_i\rangle_{i\in I}$ of real numbers and any $s\in\mathbb{R}$, the following are equiveridical:
(i) $\sum_{i\in I}a_i=s$;
(ii) for every $\epsilon>0$ there is a finite $K\subseteq I$ such that $|s-\sum_{i\in J}a_i|\le\epsilon$ whenever $J$ is finite and $K\subseteq J\subseteq I$.

$\mathbf{P}$ (i)$\Rightarrow$(ii) Take $K$ such that $\sum_{i\in I\setminus K}|a_i|\le\epsilon$. If $K\subseteq J\subseteq I$, then
$$\Bigl|s-\sum_{i\in J}a_i\Bigr|=\Bigl|\sum_{i\in I\setminus J}a_i\Bigr|\le\sum_{i\in I\setminus K}|a_i|\le\epsilon.$$
(ii)$\Rightarrow$(i) Let $\epsilon>0$, and let $K\subseteq I$ be as described in (ii). If $J\subseteq I\setminus K$ is any finite set, then set $J_1=\{i:i\in J,\ a_i\ge 0\}$, $J_2=J\setminus J_1$. We have
$$\sum_{i\in J}|a_i|=\Bigl|\sum_{i\in J_1\cup K}a_i-\sum_{i\in J_2\cup K}a_i\Bigr|\le\Bigl|s-\sum_{i\in J_1\cup K}a_i\Bigr|+\Bigl|s-\sum_{i\in J_2\cup K}a_i\Bigr|\le 2\epsilon.$$
As $J$ is arbitrary, $\sum_{i\in I\setminus K}|a_i|\le 2\epsilon$ and
$$\sum_{i\in I}|a_i|\le\sum_{i\in K}|a_i|+2\epsilon<\infty.$$
Accordingly $\sum_{i\in I}a_i$ is well-defined in $\mathbb{R}$. Also
$$\Bigl|s-\sum_{i\in I}a_i\Bigr|\le\Bigl|s-\sum_{i\in K}a_i\Bigr|+\Bigl|\sum_{i\in I\setminus K}a_i\Bigr|\le\epsilon+\sum_{i\in I\setminus K}|a_i|\le 3\epsilon.$$
As $\epsilon$ is arbitrary, $\sum_{i\in I}a_i=s$, as required. $\mathbf{Q}$

In this way, we express $\sum_{i\in I}a_i$ directly as a limit; we could write it as
$$\sum_{i\in I}a_i=\lim_{K\uparrow I}\sum_{i\in K}a_i,$$
on the understanding that we look at finite sets $K$ in the right-hand formula.

(e) Yet another approach is through the following fact. If $\sum_{i\in I}|a_i|<\infty$, then for any $\delta>0$ the set $\{i:|a_i|\ge\delta\}$ is finite, indeed can have at most $\frac{1}{\delta}\sum_{i\in I}|a_i|$ members; consequently
$$J=\{i:a_i\ne 0\}=\bigcup_{n\in\mathbb{N}}\{i:|a_i|\ge 2^{-n}\}$$
is countable (1A1F). If $J$ is finite, then of course $\sum_{i\in I}a_i=\sum_{i\in J}a_i$ reduces to a finite sum. Otherwise, we can enumerate $J$ as $\langle j_n\rangle_{n\in\mathbb{N}}$, and we shall have
$$\sum_{i\in I}a_i=\sum_{i\in J}a_i=\lim_{n\to\infty}\sum_{k=0}^n a_{j_k}=\sum_{n=0}^\infty a_{j_n}$$
(using (d) to reduce the sum $\sum_{i\in J}a_i$ to a limit of finite sums). Conversely, if $\langle a_i\rangle_{i\in I}$ is such that there is a countably infinite $J\supseteq\{i:a_i\ne 0\}$ enumerated as $\langle j_n\rangle_{n\in\mathbb{N}}$, and if $\sum_{n=0}^\infty|a_{j_n}|<\infty$, then $\sum_{i\in I}a_i$ will be $\sum_{n=0}^\infty a_{j_n}$.

226B Saltus functions Now we are ready for a special type of function of bounded variation on $\mathbb{R}$. Suppose that $a<b$ in $\mathbb{R}$.

(a) A (real) saltus function on $[a,b]$ is a function $F:[a,b]\to\mathbb{R}$ expressible in the form
$$F(x)=\sum_{t\in[a,x[}u_t+\sum_{t\in[a,x]}v_t$$
for $x\in[a,b]$, where $\langle u_t\rangle_{t\in[a,b[}$, $\langle v_t\rangle_{t\in[a,b]}$ are real-valued families such that $\sum_{t\in[a,b[}|u_t|$ and $\sum_{t\in[a,b]}|v_t|$ are finite.

(b) For any function $F:[a,b]\to\mathbb{R}$ we can write
$$F(x^+)=\lim_{y\downarrow x}F(y)\text{ if }x\in[a,b[\text{ and the limit exists},$$
$$F(x^-)=\lim_{y\uparrow x}F(y)\text{ if }x\in\,]a,b]\text{ and the limit exists}.$$
(I hope that this will not lead to confusion with the alternative interpretation of $x^+$ as $\max(x,0)$.) Observe that if $F$ is a saltus function, as defined in (a), with associated families $\langle u_t\rangle_{t\in[a,b[}$ and $\langle v_t\rangle_{t\in[a,b]}$, then $v_a=F(a)$, $v_x=F(x)-F(x^-)$ for $x\in\,]a,b]$, $u_x=F(x^+)-F(x)$ for $x\in[a,b[$.

$\mathbf{P}$ Let $\epsilon>0$. As remarked in 226Ad, there is a finite $K\subseteq[a,b]$ such that
$$\sum_{t\in[a,b[\setminus K}|u_t|+\sum_{t\in[a,b]\setminus K}|v_t|\le\epsilon.$$
Given $x\in[a,b]$, let $\delta>0$ be such that $[x-\delta,x+\delta]$ contains no point of $K$ except perhaps $x$. In this case, if $\max(a,x-\delta)\le y<x$, we must have
$$|F(y)-(F(x)-v_x)|=\Bigl|\sum_{t\in[y,x[}u_t+\sum_{t\in]y,x[}v_t\Bigr|\le\sum_{t\in[a,b[\setminus K}|u_t|+\sum_{t\in[a,b]\setminus K}|v_t|\le\epsilon,$$
while if $x<y\le\min(b,x+\delta)$ we shall have
$$|F(y)-(F(x)+u_x)|=\Bigl|\sum_{t\in]x,y[}u_t+\sum_{t\in]x,y]}v_t\Bigr|\le\sum_{t\in[a,b[\setminus K}|u_t|+\sum_{t\in[a,b]\setminus K}|v_t|\le\epsilon.$$
As $\epsilon$ is arbitrary, we get $F(x^-)=F(x)-v_x$ (if $x>a$) and $F(x^+)=F(x)+u_x$ (if $x<b$). $\mathbf{Q}$

It follows that $F$ is continuous at $x\in\,]a,b[$ iff $u_x=v_x=0$, while $F$ is continuous at $a$ iff $u_a=0$ and $F$ is continuous at $b$ iff $v_b=0$. In particular, $\{x:x\in[a,b],\ F\text{ is not continuous at }x\}$ is countable (see 226Ae).

(c) If $F$ is a saltus function defined on $[a,b]$, with associated families $\langle u_t\rangle_{t\in[a,b[}$, $\langle v_t\rangle_{t\in[a,b]}$, then $F$ is of bounded variation on $[a,b]$, and
$$\operatorname{Var}_{[a,b]}(F)\le\sum_{t\in[a,b[}|u_t|+\sum_{t\in]a,b]}|v_t|.$$
$\mathbf{P}$ If $a\le x<y\le b$, then
$$F(y)-F(x)=u_x+\sum_{t\in]x,y[}(u_t+v_t)+v_y,$$
so
$$|F(y)-F(x)|\le\sum_{t\in[x,y[}|u_t|+\sum_{t\in]x,y]}|v_t|.$$
If $a\le a_0\le a_1\le\ldots\le a_n\le b$, then
$$\sum_{i=1}^n|F(a_i)-F(a_{i-1})|\le\sum_{i=1}^n\Bigl(\sum_{t\in[a_{i-1},a_i[}|u_t|+\sum_{t\in]a_{i-1},a_i]}|v_t|\Bigr)\le\sum_{t\in[a,b[}|u_t|+\sum_{t\in]a,b]}|v_t|.$$
Consequently
$$\operatorname{Var}_{[a,b]}(F)\le\sum_{t\in[a,b[}|u_t|+\sum_{t\in]a,b]}|v_t|<\infty.\ \mathbf{Q}$$
(d) The inequality in (c) is actually an equality. To see this, note first that if $a\le x<y\le b$, then $\operatorname{Var}_{[x,y]}(F)\ge|u_x|+|v_y|$. $\mathbf{P}$ I noted in (b) that $u_x=\lim_{t\downarrow x}F(t)-F(x)$ and $v_y=F(y)-\lim_{t\uparrow y}F(t)$. So, given $\epsilon>0$, we can find $t_1$, $t_2$ such that $x<t_1\le t_2\le y$ and
$$|F(t_1)-F(x)|\ge|u_x|-\epsilon,\quad|F(y)-F(t_2)|\ge|v_y|-\epsilon.$$
Now
$$\operatorname{Var}_{[x,y]}(F)\ge|F(t_1)-F(x)|+|F(t_2)-F(t_1)|+|F(y)-F(t_2)|\ge|u_x|+|v_y|-2\epsilon.$$
As $\epsilon$ is arbitrary, we have the result. $\mathbf{Q}$

Now, given $a\le t_0<t_1<\ldots<t_n\le b$, we must have
$$\operatorname{Var}_{[a,b]}(F)\ge\sum_{i=1}^n\operatorname{Var}_{[t_{i-1},t_i]}(F)$$
(using 224Cc)
$$\ge\sum_{i=1}^n\bigl(|u_{t_{i-1}}|+|v_{t_i}|\bigr).$$
As $t_0,\ldots,t_n$ are arbitrary,
$$\operatorname{Var}_{[a,b]}(F)\ge\sum_{t\in[a,b[}|u_t|+\sum_{t\in]a,b]}|v_t|,$$
as required.

(e) Because a saltus function is of bounded variation ((c) above), it is differentiable almost everywhere (224I). In fact its derivative is zero almost everywhere. $\mathbf{P}$ Let $F:[a,b]\to\mathbb{R}$ be a saltus function, with associated families $\langle u_t\rangle_{t\in[a,b[}$, $\langle v_t\rangle_{t\in[a,b]}$. Let $\epsilon>0$. Let $K\subseteq[a,b]$ be a finite set such that
$$\sum_{t\in[a,b[\setminus K}|u_t|+\sum_{t\in[a,b]\setminus K}|v_t|\le\epsilon.$$
Set
$$u_t'=u_t\text{ if }t\in[a,b[\,\cap K,\quad u_t'=0\text{ if }t\in[a,b[\,\setminus K,$$
$$v_t'=v_t\text{ if }t\in K,\quad v_t'=0\text{ if }t\in[a,b]\setminus K,$$
$$u_t''=u_t-u_t'\text{ for }t\in[a,b[\,,\quad v_t''=v_t-v_t'\text{ for }t\in[a,b].$$
Let $G$, $H$ be the saltus functions corresponding to $\langle u_t'\rangle_{t\in[a,b[}$, $\langle v_t'\rangle_{t\in[a,b]}$ and $\langle u_t''\rangle_{t\in[a,b[}$, $\langle v_t''\rangle_{t\in[a,b]}$, so that $F=G+H$. Then $G'(t)=0$ for every $t\in\,]a,b[\,\setminus K$, since $]a,b[\,\setminus K$ comprises a finite number of open intervals on each of which $G$ is constant. So $G'=_{\text{a.e.}}0$ and $F'=_{\text{a.e.}}H'$. On the other hand,
$$\int_a^b|H'|\le\operatorname{Var}_{[a,b]}(H)=\sum_{t\in[a,b[\setminus K}|u_t|+\sum_{t\in]a,b]\setminus K}|v_t|\le\epsilon,$$
using 224I and (d) above. So
$$\int_a^b|F'|=\int_a^b|H'|\le\epsilon.$$
As $\epsilon$ is arbitrary, $\int_a^b|F'|=0$ and $F'=_{\text{a.e.}}0$, as claimed. $\mathbf{Q}$
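The formulae of 226B can be tested on a saltus function with only finitely many non-zero jumps. The sketch below is an illustration under assumptions of my own: the families are stored as dictionaries of exact rationals keyed by the jump points (with the keys of `u` taken in $[a,b[$ and those of `v` in $[a,b]$), and the names `saltus`, `variation_on_partition`, `jump_partition` and `jump_total` are hypothetical. The sum over a partition is in general only a lower bound for the variation; for this step-like example it already attains the value $\sum_{t\in[a,b[}|u_t|+\sum_{t\in]a,b]}|v_t|$ of (c)-(d).

```python
from fractions import Fraction


def saltus(u, v):
    """F(x) = sum of u[t] over t < x, plus sum of v[t] over t <= x."""
    def F(x):
        left = sum(ut for t, ut in u.items() if t < x)
        upto = sum(vt for t, vt in v.items() if t <= x)
        return left + upto
    return F


def variation_on_partition(F, points):
    """Sum of |F(q) - F(p)| over consecutive points of the sorted partition."""
    pts = sorted(points)
    return sum(abs(F(q) - F(p)) for p, q in zip(pts, pts[1:]))


def jump_partition(a, b, u, v):
    """Endpoints, all jump points, and a midpoint between each consecutive pair."""
    pts = sorted({a, b, *u, *v})
    mids = [(p + q) / 2 for p, q in zip(pts, pts[1:])]
    return pts + mids


def jump_total(a, u, v):
    """Sum of |u_t| over all t, plus sum of |v_t| over t > a."""
    return sum(abs(x) for x in u.values()) + sum(abs(x) for t, x in v.items() if t > a)


# Example data (assumed for illustration): jumps at 0, 1/2 and 1 on [0, 1].
a, b = Fraction(0), Fraction(1)
u = {Fraction(0): Fraction(1), Fraction(1, 2): Fraction(-2)}
v = {Fraction(1, 2): Fraction(3), Fraction(1): Fraction(-1)}
F = saltus(u, v)
print(variation_on_partition(F, jump_partition(a, b, u, v)), jump_total(a, u, v))
```

The midpoints are included so that the partition sees the right-hand jump $u_t$ and the left-hand jump $v_t$ at each point separately, which is the idea behind the choice of $t_1$, $t_2$ in (d).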
226C The Lebesgue decomposition of a function of bounded variation Take $a$, $b\in\mathbb{R}$ with $a<b$.

(a) If $F:[a,b]\to\mathbb{R}$ is non-decreasing, set $v_a=0$, $v_t=F(t)-F(t^-)$ for $t\in\,]a,b]$, $u_t=F(t^+)-F(t)$ for $t\in[a,b[$, defining $F(t^+)$, $F(t^-)$ as in 226Bb. Then all the $v_t$, $u_t$ are non-negative, and if $a<t_0<t_1<\ldots<t_n<b$, then
$$\sum_{i=0}^n(u_{t_i}+v_{t_i})=\sum_{i=0}^n\bigl(F(t_i^+)-F(t_i^-)\bigr)\le F(b)-F(a).$$
Accordingly $\sum_{t\in[a,b[}u_t$ and $\sum_{t\in[a,b]}v_t$ are both finite. Let $F_p$ be the corresponding saltus function, as defined in 226Ba, so that
$$F_p(x)=F(a^+)-F(a)+\sum_{t\in]a,x[}\bigl(F(t^+)-F(t^-)\bigr)+F(x)-F(x^-)$$
if $a<x\le b$. If $a\le x<y\le b$ then
$$F_p(y)-F_p(x)=F(x^+)-F(x)+\sum_{t\in]x,y[}\bigl(F(t^+)-F(t^-)\bigr)+F(y)-F(y^-)\le F(y)-F(x)$$
because if $x=t_0<t_1<\ldots<t_n<t_{n+1}=y$ then
$$F(x^+)-F(x)+\sum_{i=1}^n\bigl(F(t_i^+)-F(t_i^-)\bigr)+F(y)-F(y^-)$$
$$=F(y)-F(x)-\sum_{i=1}^{n+1}\bigl(F(t_i^-)-F(t_{i-1}^+)\bigr)\le F(y)-F(x).$$
Accordingly both $F_p$ and $F_c=F-F_p$ are non-decreasing. Also, because
$$F_p(a)=0=v_a,$$
$$F_p(t)-F_p(t^-)=v_t=F(t)-F(t^-)\text{ for }t\in\,]a,b],$$
$$F_p(t^+)-F_p(t)=u_t=F(t^+)-F(t)\text{ for }t\in[a,b[\,,$$
we shall have
$$F_c(a)=F(a),$$
$$F_c(t)=F_c(t^-)\text{ for }t\in\,]a,b],$$
$$F_c(t)=F_c(t^+)\text{ for }t\in[a,b[\,,$$
and $F_c$ is continuous. Clearly this expression of $F=F_p+F_c$ as the sum of a saltus function and a continuous function is unique, except that we can freely add a constant to one if we subtract it from the other.

(b) Still taking $F:[a,b]\to\mathbb{R}$ to be non-decreasing, we know that $F'$ is integrable (222C); moreover, $F'=_{\text{a.e.}}F_c'$, by 226Be. Set $F_{ac}(x)=F(a)+\int_a^x F'$ for each $x\in[a,b]$. We have
$$F_{ac}(y)-F_{ac}(x)=\int_x^y F_c'\le F_c(y)-F_c(x)$$
for $a\le x\le y\le b$ (222C again), so $F_{cs}=F_c-F_{ac}$ is still non-decreasing; $F_{ac}$ is continuous (225A), so $F_{cs}$ is continuous; $F_{ac}'=_{\text{a.e.}}F'=_{\text{a.e.}}F_c'$ (222E), so $F_{cs}'=_{\text{a.e.}}0$.

Again, the expression of $F_c=F_{ac}+F_{cs}$ as the sum of an absolutely continuous function and a function with zero derivative almost everywhere is unique, except for the possibility of moving a constant from one to the other, because two absolutely continuous functions whose derivatives are equal almost everywhere must differ by a constant (225D).

(c) Putting all these together: if $F:[a,b]\to\mathbb{R}$ is any non-decreasing function, it is expressible as $F_p+F_{ac}+F_{cs}$, where $F_p$ is a saltus function, $F_{ac}$ is absolutely continuous, and $F_{cs}$ is continuous and differentiable, with zero derivative, almost everywhere; all three components are non-decreasing; and the expression is unique if we say that $F_{ac}(a)=F(a)$, $F_p(a)=F_{cs}(a)=0$.

The Cantor function $f:[0,1]\to[0,1]$ (134H) is continuous and $f'=_{\text{a.e.}}0$ (134Hb), so $f_p=f_{ac}=0$ and $f=f_{cs}$. Setting $g(x)=\frac12(x+f(x))$ for $x\in[0,1]$, as in 134I, we get $g_p(x)=0$, $g_{ac}(x)=\frac{x}{2}$ and $g_{cs}(x)=\frac12 f(x)$.

(d) Now suppose that $F:[a,b]\to\mathbb{R}$ is of bounded variation. Then it is expressible as a difference $G-H$ of non-decreasing functions (224D). So writing $F_p=G_p-H_p$, etc., we can express $F$ as a sum $F_p+F_{cs}+F_{ac}$, where $F_p$ is a saltus function, $F_{ac}$ is absolutely continuous, $F_{cs}$ is continuous, $F_{cs}'=_{\text{a.e.}}0$, $F_{ac}(a)=F(a)$, $F_{cs}(a)=F_p(a)=0$. Under these conditions the expression is unique, because (for instance) $F_p(t^+)-F_p(t)=F(t^+)-F(t)$ for $t\in[a,b[$, while $F_{ac}'=_{\text{a.e.}}(F-F_p)'=_{\text{a.e.}}F'$.

This is a Lebesgue decomposition of the function $F$. (I have to say 'a' Lebesgue decomposition because of course the assignment $F_{ac}(a)=F(a)$, $F_p(a)=F_{cs}(a)=0$ is arbitrary.)

226D Complex functions The modifications needed to deal with complex functions are elementary.
(a) If $I$ is any set and $\langle a_j\rangle_{j\in I}$ is a family of complex numbers, then the following are equiveridical:
(i) $\sum_{j\in I}|a_j|<\infty$;
(ii) there is an $s\in\mathbb{C}$ such that for every $\epsilon>0$ there is a finite $K\subseteq I$ such that $|s-\sum_{j\in J}a_j|\le\epsilon$ whenever $J$ is finite and $K\subseteq J\subseteq I$.
In this case
$$s=\sum_{j\in I}\operatorname{Re}(a_j)+i\sum_{j\in I}\operatorname{Im}(a_j)=\int_I a_j\,\mu(dj),$$
where $\mu$ is counting measure on $I$, and we write $s=\sum_{j\in I}a_j$.

(b) If $a<b$ in $\mathbb{R}$, a complex saltus function on $[a,b]$ is a function $F:[a,b]\to\mathbb{C}$ expressible in the form
$$F(x)=\sum_{t\in[a,x[}u_t+\sum_{t\in[a,x]}v_t$$
for $x\in[a,b]$, where $\langle u_t\rangle_{t\in[a,b[}$, $\langle v_t\rangle_{t\in[a,b]}$ are complex-valued families such that $\sum_{t\in[a,b[}|u_t|$ and $\sum_{t\in[a,b]}|v_t|$ are finite; that is, if the real and imaginary parts of $F$ are saltus functions. In this case $F$ is continuous except at countably many points and differentiable, with zero derivative, almost everywhere on $[a,b]$, and
$$u_x=\lim_{t\downarrow x}F(t)-F(x)\text{ for every }x\in[a,b[\,,$$
$$v_x=\lim_{t\uparrow x}F(x)-F(t)\text{ for every }x\in\,]a,b]$$
(apply the results of 226B to the real and imaginary parts of $F$). $F$ is of bounded variation, and its variation is
$$\operatorname{Var}_{[a,b]}(F)=\sum_{t\in[a,b[}|u_t|+\sum_{t\in]a,b]}|v_t|$$
(repeat the arguments of 226Bc-d).

(c) If $F:[a,b]\to\mathbb{C}$ is a function of bounded variation, where $a<b$ in $\mathbb{R}$, it is uniquely expressible as $F=F_p+F_{cs}+F_{ac}$, where $F_p$ is a saltus function, $F_{ac}$ is absolutely continuous, $F_{cs}$ is continuous and has zero derivative almost everywhere, and $F_{ac}(a)=F(a)$, $F_p(a)=F_{cs}(a)=0$. (Apply 226C to the real and imaginary parts of $F$.)

226E As an elementary exercise in the language of 226A, I interpolate a version of a theorem of B.Levi which is sometimes useful.

Proposition Let $(X,\Sigma,\mu)$ be a measure space, $I$ a countable set, and $\langle f_i\rangle_{i\in I}$ a family of $\mu$-integrable real- or complex-valued functions such that $\sum_{i\in I}\int|f_i|\,d\mu$ is finite. Then $f(x)=\sum_{i\in I}f_i(x)$ is defined almost everywhere and $\int f\,d\mu=\sum_{i\in I}\int f_i\,d\mu$.

proof If $I$ is finite this is elementary. Otherwise, since there must be a bijection between $I$ and $\mathbb{N}$, we may take it that $I=\mathbb{N}$. Setting $g_n=\sum_{i=0}^n|f_i|$ for each $n$, we have a non-decreasing sequence $\langle g_n\rangle_{n\in\mathbb{N}}$ of integrable functions such that $\int g_n\le\sum_{i\in\mathbb{N}}\int|f_i|$ for every $n$, so that $g=\sup_{n\in\mathbb{N}}g_n$ is integrable, by B.Levi's theorem as stated in 123A. In particular, $g$ is finite almost everywhere. Now if $x\in X$ is such that $g(x)$ is defined and finite, $\sum_{i\in J}|f_i(x)|\le g(x)$ for every finite $J\subseteq\mathbb{N}$, so $\sum_{i\in\mathbb{N}}|f_i(x)|$ and $\sum_{i\in\mathbb{N}}f_i(x)$ are defined. In this case, of course, $\sum_{i\in\mathbb{N}}f_i(x)=\lim_{n\to\infty}\sum_{i=0}^n f_i(x)$. But $|\sum_{i=0}^n f_i|\le_{\text{a.e.}}g$ for each $n$, so Lebesgue's Dominated Convergence Theorem tells us that
$$\int\sum_{i\in\mathbb{N}}f_i=\lim_{n\to\infty}\int\sum_{i=0}^n f_i=\lim_{n\to\infty}\sum_{i=0}^n\int f_i=\sum_{i\in\mathbb{N}}\int f_i.$$

226X Basic exercises > (a) A step-function on an interval $[a,b]$ is a function $F$ such that, for suitable $t_0,\ldots,t_n$ with $a=t_0\le\ldots\le t_n=b$, $F$ is constant on each interval $]t_{i-1},t_i[$. Show that $F:[a,b]\to\mathbb{R}$ is a saltus function iff for every $\epsilon>0$ there is a step-function $G:[a,b]\to\mathbb{R}$ such that $\operatorname{Var}_{[a,b]}(F-G)\le\epsilon$.

(b) Let $F$, $G$ be real-valued functions of bounded variation defined on an interval $[a,b]\subseteq\mathbb{R}$. Show that, in the language of 226C,
$$(F+G)_p=F_p+G_p,\quad(F+G)_c=F_c+G_c,$$
$$(F+G)_{cs}=F_{cs}+G_{cs},\quad(F+G)_{ac}=F_{ac}+G_{ac}.$$

> (c) Let $F$ be a real-valued function of bounded variation on an interval $[a,b]\subseteq\mathbb{R}$. Show that, in the language of 226C,
$$\operatorname{Var}_{[a,b]}(F)=\operatorname{Var}_{[a,b]}(F_p)+\operatorname{Var}_{[a,b]}(F_c)=\operatorname{Var}_{[a,b]}(F_p)+\operatorname{Var}_{[a,b]}(F_{cs})+\operatorname{Var}_{[a,b]}(F_{ac}).$$

(d) Let $F$ be a real-valued function of bounded variation on an interval $[a,b]\subseteq\mathbb{R}$. Show that $F$ is absolutely continuous iff $\operatorname{Var}_{[a,b]}(F)=\int_a^b|F'|$.

(e) Consider the function $g$ of 134I/226Cc. Show that $g^{-1}:[0,1]\to[0,1]$ is differentiable almost everywhere on $[0,1]$, and find $\mu\{x:(g^{-1})'(x)\le a\}$ for each $a\in\mathbb{R}$.

> (f) Suppose that $I$ and $J$ are sets and that $\langle a_i\rangle_{i\in I}$ is a summable family of real numbers. (i) Show that if $f:J\to I$ is injective then $\langle a_{f(j)}\rangle_{j\in J}$ is summable. (ii) Show that if $g:I\to J$ is any function, then $\sum_{j\in J}\sum_{i\in g^{-1}[\{j\}]}a_i$ is defined and equal to $\sum_{i\in I}a_i$.

226Y Further exercises (a) Explain what modifications are appropriate in the description of the Lebesgue decomposition of a function of bounded variation if we wish to consider functions on open or half-open intervals, including unbounded intervals.
(b) Suppose that $F:[a,b]\to\mathbb{R}$ is a function of bounded variation, and set $h(y)=\#(F^{-1}[\{y\}])$ for $y\in\mathbb{R}$. Show that $\int h=\operatorname{Var}_{[a,b]}(F_c)$, where $F_c$ is the 'continuous part' of $F$ as defined in 226Ca/226Cd.

(c) Show that a set $I$ is countable iff there is a summable family $\langle a_i\rangle_{i\in I}$ of non-zero real numbers.

226 Notes and comments In 232I and 232Yb below I will revisit these ideas, linking them to a decomposition of the Lebesgue-Stieltjes measure corresponding to a non-decreasing real function, and thence to more general measures. All this work is peripheral to the main concerns of this volume, but I think it is illuminating, and certainly it is part of the basic knowledge assumed of anyone working in real analysis.
Chapter 23 The RadonNikod´ ym Theorem In Chapter 22, I discussed the indefinite integrals of integrable functions on R, and gave what I hope you feel are satisfying descriptions of both the functions which are indefinite integrals (the absolutely continuous functions) and of how to find which functions they are indefinite integrals of (you differentiate them). For general measure spaces, we have no structure present which can give such simple formulations; but nevertheless the same questions can be asked and, up to a point, answered. The first section of this chapter introduces the basic machinery needed, the concept of ‘countably additive’ functional and its decomposition into positive and negative parts. The main theorem takes up the second section: indefinite integrals are the ‘truly continuous’ additive functionals; on σfinite spaces, these are the ‘absolutely continuous’ countably additive functionals. In §233 I discuss the most important single application of the theorem, its use in providing a concept of ‘conditional expectation’. This is one of the central concepts of probability theory – as you very likely know; but the form here is a dramatic generalization of the elementary concept of the conditional probability of one event given another, and needs the whole strength of the general theory of measure and integration as developed in Volume 1 and this chapter. I include some notes on convex functions, up to and including versions of Jensen’s inequality (233I233J). While we are in the area of ‘pure’ measure theory, I take the opportunity to discuss two further topics. The first is an essentially elementary construction, the ‘indefiniteintegral’ measure defined from a nonnegative measurable function on a measure space; I think the details need a little attention, and I work through them in §234. Rather deeper ideas are needed to deal with ‘measurable transformations’. 
In §235 I set out the techniques necessary to provide an abstract basis for a general method of integration-by-substitution, with a detailed account of sufficient conditions for a formula of the type
∫ g(y) dy = ∫ g(φ(x)) J(x) dx
to be valid.
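As a purely numerical illustration of a formula of this type, the following sketch compares the two sides in the simplest one-dimensional case. Everything concrete here is an assumption made for the example and is not taken from the text: g = cos, φ(x) = x² on [0, 2] (so that y ranges over [0, 4]), J = |φ′|, and a midpoint-rule helper `midpoint_integral` standing in for the integral.

```python
import math

def midpoint_integral(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] by the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Hypothetical choices for the illustration (not from the source text).
def g(y):
    return math.cos(y)

def phi(x):
    return x * x

def J(x):
    # Assumed Jacobian factor |phi'(x)| for a smooth monotone phi.
    return abs(2.0 * x)

def both_sides():
    """Return approximations to (integral of g(y) dy, integral of g(phi(x)) J(x) dx)."""
    lhs = midpoint_integral(g, phi(0.0), phi(2.0))
    rhs = midpoint_integral(lambda x: g(phi(x)) * J(x), 0.0, 2.0)
    return lhs, rhs
```

Calling `both_sides()` returns two approximations which should agree with each other, and with sin 4, up to the discretization error of the midpoint rule.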
231 Countably additive functionals
I begin with an abstract description of the objects which will, in appropriate circumstances, correspond to the indefinite integrals of general integrable functions. In this section I give those parts of the theory which do not involve a measure, but only a set with a distinguished σ-algebra of subsets. The basic concepts are those of ‘finitely additive’ and ‘countably additive’ functional, and there is one substantial theorem, the ‘Hahn decomposition’ (231E).
231A Definition Let X be a set and Σ an algebra of subsets of X (136E). A functional ν : Σ → R is finitely additive, or just additive, if ν(E ∪ F) = νE + νF whenever E, F ∈ Σ and E ∩ F = ∅.
231B Elementary facts Let X be a set, Σ an algebra of subsets of X, and ν : Σ → R a finitely additive functional.
(a) ν∅ = 0. (For ν∅ = ν(∅ ∪ ∅) = ν∅ + ν∅.)
(b) If E_0, . . . , E_n are disjoint members of Σ then ν(⋃_{i≤n} E_i) = ∑_{i=0}^n νE_i.
(c) If E, F ∈ Σ and E ⊆ F then νF = νE + ν(F \ E). More generally, for any E, F ∈ Σ, νF = ν(F ∩ E) + ν(F \ E).
(d) If E, F ∈ Σ then νE − νF = ν(E \ F) + ν(E ∩ F) − ν(E ∩ F) − ν(F \ E) = ν(E \ F) − ν(F \ E).
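The facts in 231B can be checked mechanically on a small example. The sketch below is only an illustration under assumptions of my own: X is a four-point set, Σ is its full power set, and ν is given by hypothetical signed point weights (the dictionary `WEIGHTS`), so that νE is the sum of the weights of the points of E.

```python
from itertools import combinations

# Hypothetical example data (not from the text): signed point weights on X.
WEIGHTS = {"a": 2.0, "b": -3.0, "c": 0.5, "d": -1.0}
X = frozenset(WEIGHTS)

def nu(E):
    """The additive functional: sum of the point weights over E."""
    return sum(WEIGHTS[x] for x in E)

def subsets(S):
    """All subsets of the finite set S, as frozensets."""
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def check_231B():
    """Verify 231A and 231B(a), (c), (d) for every pair E, F in the power set."""
    sigma = subsets(X)
    assert nu(frozenset()) == 0                                   # 231B(a)
    for E in sigma:
        for F in sigma:
            if not (E & F):
                assert nu(E | F) == nu(E) + nu(F)                 # 231A
            assert nu(F) == nu(F & E) + nu(F - E)                 # 231B(c)
            assert nu(E) - nu(F) == nu(E - F) - nu(F - E)         # 231B(d)
    return True
```

The weights are chosen to be exactly representable in binary floating point, so the equalities can be tested exactly; with arbitrary real weights a tolerance would be needed.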
231C Definition Let X be a set and Σ an algebra of subsets of X. A function ν : Σ → R is countably additive or σ-additive if ∑_{n=0}^∞ νE_n exists in R and is equal to ν(⋃_{n∈N} E_n) for every disjoint sequence ⟨E_n⟩_{n∈N} in Σ such that ⋃_{n∈N} E_n ∈ Σ.
Remark Note that when I use the phrase ‘countably additive functional’ I mean to exclude the possibility of ∞ as a value of the functional. Thus a measure is a countably additive functional iff it is totally finite (211C). You will sometimes see the phrase ‘signed measure’ used to mean what I call a countably additive functional.
231D Elementary facts Let X be a set, Σ a σ-algebra of subsets of X and ν : Σ → R a countably additive functional.
(a) ν is finitely additive. P P (i) Setting E_n = ∅ for every n ∈ N, ∑_{n=0}^∞ ν∅ must be defined in R so ν∅ must be 0. (ii) Now if E, F ∈ Σ and E ∩ F = ∅ we can set E_0 = E, E_1 = F, E_n = ∅ for n ≥ 2 and get
ν(E ∪ F) = ν(⋃_{n∈N} E_n) = ∑_{n=0}^∞ νE_n = νE + νF. Q Q
(b) If ⟨E_n⟩_{n∈N} is a non-decreasing sequence in Σ, with union E ∈ Σ, then
νE = νE_0 + ∑_{n=0}^∞ ν(E_{n+1} \ E_n) = lim_{n→∞} νE_n.
(c) If ⟨E_n⟩_{n∈N} is a non-increasing sequence in Σ with intersection E ∈ Σ, then
νE = νE_0 − lim_{n→∞} ν(E_0 \ E_n) = lim_{n→∞} νE_n.
(d) If ν′ : Σ → R is another countably additive functional, and c ∈ R, then ν + ν′ : Σ → R and cν : Σ → R are countably additive.
(e) If H ∈ Σ, then ν_H : Σ → R is countably additive, where ν_H E = ν(E ∩ H) for every E ∈ Σ. P P If ⟨E_n⟩_{n∈N} is a disjoint sequence in Σ with union E ∈ Σ then ⟨E_n ∩ H⟩_{n∈N} is disjoint, with union E ∩ H, so
ν_H E = ν(H ∩ E) = ν(⋃_{n∈N} (H ∩ E_n)) = ∑_{n=0}^∞ ν(H ∩ E_n) = ∑_{n=0}^∞ ν_H E_n. Q Q
Remark For the time being, we shall be using the notion of ‘countably additive functional’ only on σ-algebras Σ, in which case we can take it for granted that the unions and intersections above belong to Σ.
231E All the ideas above amount to minor modifications of ideas already needed at the very beginning of the theory of measure spaces. We come now to something more substantial.
Theorem Let X be a set, Σ a σ-algebra of subsets of X, and ν : Σ → R a countably additive functional. Then
(a) ν is bounded;
(b) there is a set H ∈ Σ such that
νF ≥ 0 whenever F ∈ Σ and F ⊆ H,
νF ≤ 0 whenever F ∈ Σ and F ∩ H = ∅.
proof (a) ?? Suppose, if possible, otherwise. For E ∈ Σ, set M(E) = sup{|νF| : F ∈ Σ, F ⊆ E}; then M(X) = ∞. Moreover, whenever E_1, E_2, F ∈ Σ and F ⊆ E_1 ∪ E_2, then
|νF| = |ν(F ∩ E_1) + ν(F \ E_1)| ≤ |ν(F ∩ E_1)| + |ν(F \ E_1)| ≤ M(E_1) + M(E_2),
so M(E_1 ∪ E_2) ≤ M(E_1) + M(E_2). Choose a sequence ⟨E_n⟩_{n∈N} in Σ as follows. E_0 = X. Given that M(E_n) = ∞, where n ∈ N, then surely there is an F_n ⊆ E_n such that |νF_n| ≥ 1 + |νE_n|, in which case |ν(E_n \ F_n)| ≥ 1. Now at least one of M(F_n), M(E_n \ F_n) is infinite; if M(F_n) = ∞, set E_{n+1} = F_n;
otherwise, set E_{n+1} = E_n \ F_n; in either case, note that |ν(E_n \ E_{n+1})| ≥ 1 and M(E_{n+1}) = ∞, so that the induction will continue. On completing this induction, set G_n = E_n \ E_{n+1} for n ∈ N. Then ⟨G_n⟩_{n∈N} is a disjoint sequence in Σ, so ∑_{n=0}^∞ νG_n is defined in R and lim_{n→∞} νG_n = 0; but |νG_n| ≥ 1 for every n. X X
(b)(i) By (a), γ = sup{νE : E ∈ Σ} < ∞. Choose a sequence ⟨E_n⟩_{n∈N} in Σ such that νE_n ≥ γ − 2^{−n} for every n ∈ N. For m ≤ n ∈ N, set F_{mn} = ⋂_{m≤i≤n} E_i. Then νF_{mn} ≥ γ − 2·2^{−m} + 2^{−n} for every n ≥ m. P P Induce on n. For n = m, this is due to the choice of E_m = F_{mm}. For the inductive step, we have F_{m,n+1} = F_{mn} ∩ E_{n+1}, while surely γ ≥ ν(E_{n+1} ∪ F_{mn}), so
γ + νF_{m,n+1} ≥ ν(E_{n+1} ∪ F_{mn}) + νF_{m,n+1}
= νE_{n+1} + ν(F_{mn} \ E_{n+1}) + νF_{m,n+1}
= νE_{n+1} + νF_{mn}
≥ γ − 2^{−n−1} + γ − 2·2^{−m} + 2^{−n}
(by the choice of E_{n+1} and the inductive hypothesis)
= 2γ − 2·2^{−m} + 2^{−n−1}.
Subtracting γ from both sides, νF_{m,n+1} ≥ γ − 2·2^{−m} + 2^{−n−1} and the induction proceeds. Q Q
(ii) For m ∈ N, set
F_m = ⋂_{n≥m} F_{mn} = ⋂_{n≥m} E_n.
Then
νF_m = lim_{n→∞} νF_{mn} ≥ γ − 2·2^{−m},
by 231Dc. Next, ⟨F_m⟩_{m∈N} is non-decreasing, so setting H = ⋃_{m∈N} F_m we have
νH = lim_{m→∞} νF_m ≥ γ;
since νH is surely less than or equal to γ, νH = γ.
If F ∈ Σ and F ⊆ H, then
νH − νF = ν(H \ F) ≤ γ = νH,
so νF ≥ 0. If F ∈ Σ and F ∩ H = ∅ then
νH + νF = ν(H ∪ F) ≤ γ = νH
so νF ≤ 0. This completes the proof.
231F Corollary Let X be a set, Σ a σ-algebra of subsets of X, and ν : Σ → R a countably additive functional. Then ν can be expressed as the difference of two totally finite measures with domain Σ.
proof Take H ∈ Σ as described in 231Eb. Set ν_1 E = ν(E ∩ H), ν_2 E = −ν(E \ H) for E ∈ Σ. Then, as in 231Dd-e, both ν_1 and ν_2 are countably additive functionals on Σ, and of course ν = ν_1 − ν_2. But also, by the choice of H, both ν_1 and ν_2 are non-negative, so are totally finite measures.
Remark This is called the ‘Jordan decomposition’ of ν. The expression of 231Eb is a ‘Hahn decomposition’.
231X Basic exercises (a) Let Σ be the family of subsets A of N such that one of A, N \ A is finite. Show that Σ is an algebra of subsets of N. (This is the finite-cofinite algebra of subsets of N; compare 211R.)
(b) Let X be a set, Σ an algebra of subsets of X and ν : Σ → R a finitely additive functional. Show that ν(E ∪ F ∪ G) + ν(E ∩ F) + ν(E ∩ G) + ν(F ∩ G) = νE + νF + νG + ν(E ∩ F ∩ G) for all E, F, G ∈ Σ. Generalize this result to longer sequences of sets.
> (c) Let Σ be the finite-cofinite algebra of subsets of N, as in 231Xa. Define ν : Σ → Z by setting
νE = lim_{n→∞} ( #({i : i ≤ n, 2i ∈ E}) − #({i : i ≤ n, 2i + 1 ∈ E}) )
for every E ∈ Σ. Show that ν is well-defined and finitely additive and unbounded.
(d) Let X be a set and Σ an algebra of subsets of X. (i) Show that if ν : Σ → R and ν′ : Σ → R are finitely additive, so are ν + ν′ and cν for any c ∈ R. (ii) Show that if ν : Σ → R is finitely additive and H ∈ Σ, then ν_H is finitely additive, where ν_H(E) = ν(H ∩ E) for every E ∈ Σ.
(e) Let X be a set, Σ an algebra of subsets of X and ν : Σ → R a finitely additive functional. Let S be the linear space of those real-valued functions on X expressible in the form ∑_{i=0}^n a_i χE_i where E_i ∈ Σ for each i. (i) Show that we have a linear functional ∫ : S → R given by writing
∫ ∑_{i=0}^n a_i χE_i = ∑_{i=0}^n a_i νE_i
whenever a_0, . . . , a_n ∈ R and E_0, . . . , E_n ∈ Σ. (ii) Show that if νE ≥ 0 for every E ∈ Σ then ∫ f ≥ 0 whenever f ∈ S and f(x) ≥ 0 for every x ∈ X. (iii) Show that if ν is bounded and X ≠ ∅ then
sup{|∫ f| : f ∈ S, ‖f‖_∞ ≤ 1} = sup_{E,F∈Σ} |νE − νF|,
writing ‖f‖_∞ = sup_{x∈X} |f(x)|.
> (f) Let X be a set, Σ a σ-algebra of subsets of X and ν : Σ → R a finitely additive functional. Show that the following are equiveridical:
(i) ν is countably additive;
(ii) lim_{n→∞} νE_n = 0 whenever ⟨E_n⟩_{n∈N} is a non-increasing sequence in Σ and ⋂_{n∈N} E_n = ∅;
(iii) lim_{n→∞} νE_n = 0 whenever ⟨E_n⟩_{n∈N} is a sequence in Σ and ⋂_{n∈N} ⋃_{m≥n} E_m = ∅;
(iv) lim_{n→∞} νE_n = νE whenever ⟨E_n⟩_{n∈N} is a sequence in Σ and
E = ⋂_{n∈N} ⋃_{m≥n} E_m = ⋃_{n∈N} ⋂_{m≥n} E_m.
(Hint: for (i)⇒(iv), consider non-negative ν first.)
(g) Let X be a set and Σ a σ-algebra of subsets of X, and let ν : Σ → [−∞, ∞[ be a function which is countably additive in the sense that ν∅ = 0 and whenever ⟨E_n⟩_{n∈N} is a disjoint sequence in Σ, ∑_{n=0}^∞ νE_n = lim_{n→∞} ∑_{i=0}^n νE_i is defined in [−∞, ∞[ and is equal to ν(⋃_{n∈N} E_n). Show that ν is bounded above and attains its upper bound (that is, there is an H ∈ Σ such that νH = sup_{F∈Σ} νF). Hence, or otherwise, show that ν is expressible as the difference of a totally finite measure and a measure, both with domain Σ.
231Y Further exercises (a) Let X be a set, Σ an algebra of subsets of X, and ν : Σ → R a bounded finitely additive functional. Set
ν⁺E = sup{νF : F ∈ Σ, F ⊆ E},
ν⁻E = − inf{νF : F ∈ Σ, F ⊆ E},
|ν|E = sup{νF_1 − νF_2 : F_1, F_2 ∈ Σ, F_1, F_2 ⊆ E}.
Show that ν⁺, ν⁻ and |ν| are all bounded finitely additive functionals on Σ and that ν = ν⁺ − ν⁻, |ν| = ν⁺ + ν⁻. Show that if ν is countably additive so are ν⁺, ν⁻ and |ν|. (|ν| is sometimes called the variation of ν.)
(b) Let X be a set and Σ an algebra of subsets of X. Let ν_1, ν_2 be two bounded finitely additive functionals defined on Σ. Set
(ν_1 ∨ ν_2)(E) = sup{ν_1 F + ν_2(E \ F) : F ∈ Σ, F ⊆ E},
(ν_1 ∧ ν_2)(E) = inf{ν_1 F + ν_2(E \ F) : F ∈ Σ, F ⊆ E}.
Show that ν_1 ∨ ν_2 and ν_1 ∧ ν_2 are finitely additive functionals, and that ν_1 + ν_2 = ν_1 ∨ ν_2 + ν_1 ∧ ν_2. Show that, in the language of 231Ya,
ν⁺ = ν ∨ 0, ν⁻ = (−ν) ∨ 0 = −(ν ∧ 0), |ν| = ν ∨ (−ν) = ν⁺ ∨ ν⁻ = ν⁺ + ν⁻,
ν_1 ∨ ν_2 = ν_1 + (ν_2 − ν_1)⁺, ν_1 ∧ ν_2 = ν_1 − (ν_1 − ν_2)⁺,
so that ν_1 ∨ ν_2 and ν_1 ∧ ν_2 are countably additive if ν_1 and ν_2 are.
(c) Let X be a set and Σ an algebra of subsets of X. Let M be the set of all bounded finitely additive functionals from Σ to R. Show that M is a linear space under the natural definitions of addition and scalar multiplication. Show that M has a partial ordering ≤ defined by saying that ν ≤ ν′ iff νE ≤ ν′E for every E ∈ Σ, and that for this partial ordering ν_1 ∨ ν_2, ν_1 ∧ ν_2, as defined in 231Yb, are sup{ν_1, ν_2}, inf{ν_1, ν_2}.
(d) Let X be a set and Σ an algebra of subsets of X. Let ν_0, . . . , ν_n be bounded finitely additive functionals on Σ and set
ν̌E = sup{∑_{i=0}^n ν_i F_i : F_0, . . . , F_n ∈ Σ, ⋃_{i≤n} F_i = E, F_i ∩ F_j = ∅ for i ≠ j},
ν̂E = inf{∑_{i=0}^n ν_i F_i : F_0, . . . , F_n ∈ Σ, ⋃_{i≤n} F_i = E, F_i ∩ F_j = ∅ for i ≠ j}
for E ∈ Σ. Show that ν̌ and ν̂ are finitely additive and are, respectively, sup{ν_0, . . . , ν_n} and inf{ν_0, . . . , ν_n} in the partially ordered set of finitely additive functionals on Σ.
(e) Let X be a set and Σ a σ-algebra of subsets of X; let M be the partially ordered set of all bounded finitely additive functionals from Σ to R. (i) Show that if A ⊆ M is non-empty and bounded above in M, then A has a supremum ν̌ in M, given by the formula
ν̌E = sup{∑_{i=0}^n ν_i F_i : ν_0, . . . , ν_n ∈ A, F_0, . . . , F_n ∈ Σ, ⋃_{i≤n} F_i = E, F_i ∩ F_j = ∅ for i ≠ j}.
(ii) Show that if A ⊆ M is non-empty and bounded below in M then it has an infimum ν̂ ∈ M, given by the formula
ν̂E = inf{∑_{i=0}^n ν_i F_i : ν_0, . . . , ν_n ∈ A, F_0, . . . , F_n ∈ Σ, ⋃_{i≤n} F_i = E, F_i ∩ F_j = ∅ for i ≠ j}.
(f) Let X be a set, Σ an algebra of subsets of X, and ν : Σ → R a non-negative finitely additive functional. For E ∈ Σ set
ν_ca(E) = inf{sup_{n∈N} νF_n : ⟨F_n⟩_{n∈N} is a non-decreasing sequence in Σ with union E}.
Show that ν_ca is a countably additive functional on Σ and that if ν′ is any countably additive functional with ν′ ≤ ν then ν′ ≤ ν_ca. Show that ν_ca ∧ (ν − ν_ca) = 0.
(g) Let X be a set, Σ an algebra of subsets of X, and ν : Σ → R a bounded finitely additive functional. Show that ν is uniquely expressible as ν_ca + ν_pfa, where ν_ca is countably additive, ν_pfa is finitely additive and if 0 ≤ ν′ ≤ |ν_pfa| and ν′ is countably additive then ν′ = 0.
(h) Let X be a set and Σ an algebra of subsets of X. Let M be the linear space of bounded finitely additive functionals on Σ, and for ν ∈ M set ‖ν‖ = |ν|(X), defining |ν| as in 231Ya. (‖ν‖ is the total variation of ν.) Show that ‖ ‖ is a norm on M under which M is a Banach space. Show that the space of bounded countably additive functionals on Σ is a closed linear subspace of M.
(i) Repeat as many as possible of the results of this section for complex-valued functionals.
231 Notes and comments The real purpose of this section has been to describe the Hahn decomposition of a countably additive functional (231E). The very leisurely exposition in 231A-231D is intended as a review of the most elementary properties of measures, in the slightly more general context of ‘signed measures’, with those properties corresponding to ‘additivity’ alone separated from those which depend on ‘countable additivity’. In 231Xf I set out necessary and sufficient conditions for a finitely additive functional on a σ-algebra to be countably additive, designed to suggest that a finitely additive functional is countably additive iff it is ‘sequentially order-continuous’ in some sense. The fact that a countably additive functional can be expressed as the difference of non-negative countably additive functionals (231F) has an important counterpart in the theory of finitely additive functionals: a finitely additive functional can be expressed as the difference of non-negative finitely additive functionals if (and only if) it is bounded (231Ya). But I do not think that this, or the further properties of bounded finitely additive functionals described in 231Xe and 231Y, will be important to us before Volume 3.
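On a finite set the Hahn and Jordan decompositions of 231E-231F can be written down directly, which may help to fix ideas. The following is a minimal sketch under assumptions of my own, not a construction from the text: X is a four-point set with the power set as Σ, ν is given by hypothetical signed point weights `WEIGHTS`, and H is taken to be the set of points of non-negative weight.

```python
from itertools import combinations

# Hypothetical example data (not from the text): signed point weights on X.
WEIGHTS = {"a": 2.0, "b": -3.0, "c": 0.5, "d": -1.0}
X = frozenset(WEIGHTS)

def nu(E):
    """Countably (here: finitely) additive functional given by point weights."""
    return sum(WEIGHTS[x] for x in E)

def subsets(S):
    """All subsets of the finite set S, as frozensets."""
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def hahn_set():
    """A set H as in 231Eb: nu >= 0 inside H, nu <= 0 on sets disjoint from H."""
    return frozenset(x for x in X if WEIGHTS[x] >= 0)

def jordan_parts(E):
    """The pair (nu_1 E, nu_2 E) of 231F, with nu = nu_1 - nu_2."""
    H = hahn_set()
    return nu(E & H), -nu(E - H)

def check_hahn_jordan():
    """Brute-force check of 231Eb and 231F over the whole power set."""
    H = hahn_set()
    sigma = subsets(X)
    for F in sigma:
        if F <= H:
            assert nu(F) >= 0
        if not (F & H):
            assert nu(F) <= 0
        nu1, nu2 = jordan_parts(F)
        assert nu1 >= 0 and nu2 >= 0
        assert nu(F) == nu1 - nu2
    # As in the proof of 231E(b), nu attains its supremum at H.
    assert nu(H) == max(nu(F) for F in sigma)
    return True
```

In this finite setting H can be read off from the signs of the weights; the content of 231E is that a set with the same two properties exists for every countably additive functional on a σ-algebra, where no such pointwise description is available.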
232 The Radon-Nikodým theorem
I come now to the chief theorem of this chapter, one of the central results of measure theory, relating countably additive functionals to indefinite integrals. The objective is to give a complete description of the functionals which can arise as indefinite integrals of integrable functions (232E). These can be characterized as the ‘truly continuous’ additive functionals (232Ab). A more commonly used concept, and one adequate in many cases, is that of ‘absolutely continuous’ additive functional (232Aa); I spend the first few paragraphs (232B-232D) on elementary facts about truly continuous and absolutely continuous functionals. I end the section with a discussion of the decomposition of general countably additive functionals (232I).
232A Absolutely continuous functionals Let (X, Σ, µ) be a measure space and ν : Σ → R a finitely additive functional.
(a) ν is absolutely continuous with respect to µ (sometimes written ‘ν ≪ µ’) if for every ε > 0 there is a δ > 0 such that |νE| ≤ ε whenever E ∈ Σ and µE ≤ δ.
(b) ν is truly continuous with respect to µ if for every ε > 0 there are E ∈ Σ, δ > 0 such that µE is finite and |νF| ≤ ε whenever F ∈ Σ and µ(E ∩ F) ≤ δ.
(c) For reference, I add another definition here. If ν is countably additive, it is singular with respect to µ if there is a set F ∈ Σ such that µF = 0 and νE = 0 whenever E ∈ Σ and E ⊆ X \ F.
232B Proposition Let (X, Σ, µ) be a measure space and ν : Σ → R a finitely additive functional.
(a) If ν is countably additive, it is absolutely continuous with respect to µ iff νE = 0 whenever µE = 0.
(b) ν is truly continuous with respect to µ iff (α) it is countably additive (β) it is absolutely continuous with respect to µ (γ) whenever E ∈ Σ and νE ≠ 0 there is an F ∈ Σ such that µF < ∞ and ν(E ∩ F) ≠ 0.
(c) If (X, Σ, µ) is σ-finite, then ν is truly continuous with respect to µ iff it is countably additive and absolutely continuous with respect to µ.
(d) If (X, Σ, µ) is totally finite, then ν is truly continuous with respect to µ iff it is absolutely continuous with respect to µ.
proof (a)(i) If ν is absolutely continuous with respect to µ and µE = 0, then µE ≤ δ for every δ > 0, so |νE| ≤ ε for every ε > 0 and νE = 0.
(ii) ?? Suppose, if possible, that νE = 0 whenever µE = 0, but ν is not absolutely continuous. Then there is an ε > 0 such that for every δ > 0 there is an E ∈ Σ such that µE ≤ δ but |νE| ≥ ε. For each n ∈ N we may choose an F_n ∈ Σ such that µF_n ≤ 2^{−n} and |νF_n| ≥ ε. Consider F = ⋂_{n∈N} ⋃_{k≥n} F_k. Then we have
µF ≤ inf_{n∈N} µ(⋃_{k≥n} F_k) ≤ inf_{n∈N} ∑_{k=n}^∞ 2^{−k} = 0,
so µF = 0.
Now recall that by 231Eb there is an H ∈ Σ such that νG ≥ 0 when G ∈ Σ and G ⊆ H, and νG ≤ 0 when G ∈ Σ and G ∩ H = ∅. As in 231F, set ν_1 G = ν(G ∩ H), ν_2 G = −ν(G \ H) for G ∈ Σ, so that ν_1 and ν_2 are totally finite measures, and ν_1 F = ν_2 F = 0 because µ(F ∩ H) = µ(F \ H) = 0. Consequently
0 = ν_i F = lim_{n→∞} ν_i(⋃_{m≥n} F_m) ≥ lim sup_{n→∞} ν_i F_n
for both i, and
0 = lim_{n→∞} (ν_1 F_n + ν_2 F_n) ≥ lim inf_{n→∞} |νF_n| ≥ ε > 0,
which is absurd. X X
(b)(i) Suppose that ν is truly continuous with respect to µ. It is obvious from the definitions that ν is absolutely continuous with respect to µ. If νE ≠ 0, there must be an F of finite measure such that |νG| < |νE| whenever G ∩ F = ∅, so that |ν(E \ F)| < |νE| and ν(E ∩ F) ≠ 0. This deals with the conditions (β) and (γ).
To check that ν is countably additive, let ⟨E_n⟩_{n∈N} be a disjoint sequence in Σ, with union E, and ε > 0. Let δ > 0, F ∈ Σ be such that µF < ∞ and |νG| ≤ ε whenever G ∈ Σ and µ(F ∩ G) ≤ δ. Then
∑_{n=0}^∞ µ(E_n ∩ F) ≤ µF < ∞,
so there is an n ∈ N such that ∑_{i=n}^∞ µ(E_i ∩ F) ≤ δ. Take any m ≥ n and consider E*_m = ⋃_{i≤m} E_i. We have
|νE − ∑_{i=0}^m νE_i| = |νE − νE*_m| = |ν(E \ E*_m)| ≤ ε,
because
µ(F ∩ E \ E*_m) = ∑_{i=m+1}^∞ µ(F ∩ E_i) ≤ δ.
As ε is arbitrary,
νE = ∑_{i=0}^∞ νE_i;
as ⟨E_n⟩_{n∈N} is arbitrary, ν is countably additive.
(ii) Now suppose that ν satisfies the three conditions. By 231F, ν can be expressed as the difference of two non-negative countably additive functionals ν_1, ν_2; set ν′ = ν_1 + ν_2, so that ν′ is a non-negative countably additive functional and |νF| ≤ ν′F for every F ∈ Σ. Set
γ = sup{ν′F : F ∈ Σ, µF < ∞} ≤ ν′X < ∞,
and choose a sequence ⟨F_n⟩_{n∈N} of sets of finite measure such that lim_{n→∞} ν′F_n = γ; set F* = ⋃_{n∈N} F_n. If G ∈ Σ and G ∩ F* = ∅ then νG = 0. P P ?? Otherwise, by condition (γ), there is an F ∈ Σ such that µF < ∞ and ν(G ∩ F) ≠ 0. It follows that
ν′(F \ F*) ≥ ν′(F ∩ G) ≥ |ν(F ∩ G)| > 0,
and there must be an n ∈ N such that
γ < ν′F_n + ν′(F \ F*) = ν′(F_n ∪ (F \ F*)) ≤ ν′(F ∪ F_n) ≤ γ
because µ(F ∪ F_n) < ∞; but this is impossible. X X Q Q
Setting F*_n = ⋃_{k≤n} F_k for each n, we have lim_{n→∞} ν′(F* \ F*_n) = 0. Take any ε > 0, and (using condition (β)) let δ > 0 be such that |νE| ≤ ½ε whenever µE ≤ δ. Let n be such that ν′(F* \ F*_n) ≤ ½ε. Now if F ∈ Σ and µ(F ∩ F*_n) ≤ δ then
|νF| ≤ |ν(F ∩ F*_n)| + |ν(F ∩ F* \ F*_n)| + |ν(F \ F*)|
≤ ½ε + ν′(F ∩ F* \ F*_n) + 0
≤ ½ε + ν′(F* \ F*_n) ≤ ½ε + ½ε = ε.
And µF*_n < ∞. As ε is arbitrary, ν is truly continuous.
(c) Now suppose that (X, Σ, µ) is σ-finite and that ν is countably additive and absolutely continuous with respect to µ. Let ⟨X_n⟩_{n∈N} be a non-decreasing sequence of sets of finite measure covering X (211D).
If νE ≠ 0, then lim_{n→∞} ν(E ∩ X_n) ≠ 0, so ν(E ∩ X_n) ≠ 0 for some n. This shows that ν satisfies condition (γ) of (b), so is truly continuous.
Of course the converse of this fact is already covered by (b).
(d) Finally, suppose that µX < ∞ and that ν is absolutely continuous with respect to µ. Then it must be truly continuous, because we can take E = X in the definition 232Ab.
232C Lemma Let (X, Σ, µ) be a measure space and ν, ν′ two countably additive functionals on Σ which are truly continuous with respect to µ. Take c ∈ R and H ∈ Σ, and set ν_H E = ν(E ∩ H), as in 231De. Then ν + ν′, cν and ν_H are all truly continuous with respect to µ, and ν is expressible as the difference of non-negative countably additive functionals which are truly continuous with respect to µ.
proof Let ε > 0. Set η = ε/(2 + |c|) > 0. Then there are δ, δ′ > 0 and E, E′ ∈ Σ such that µE < ∞, µE′ < ∞ and |νF| ≤ η whenever µ(F ∩ E) ≤ δ, |ν′F| ≤ η whenever µ(F ∩ E′) ≤ δ′. Set δ* = min(δ, δ′) > 0, E* = E ∪ E′ ∈ Σ; then µE* ≤ µE + µE′ < ∞. Suppose that F ∈ Σ and µ(F ∩ E*) ≤ δ*; then
µ(F ∩ H ∩ E) ≤ µ(F ∩ E) ≤ δ* ≤ δ, µ(F ∩ E′) ≤ δ* ≤ δ′
so
|(ν + ν′)F| ≤ |νF| + |ν′F| ≤ η + η ≤ ε,
|(cν)F| = |c||νF| ≤ |c|η ≤ ε,
|ν_H F| = |ν(F ∩ H)| ≤ η ≤ ε.
As ε is arbitrary, ν + ν′, cν and ν_H are all truly continuous.
Now, taking H from 231Eb, we see that ν_1 = ν_H and ν_2 = −ν_{X\H} are truly continuous and non-negative, and ν = ν_1 − ν_2 is the difference of truly continuous measures.
232D Proposition Let (X, Σ, µ) be a measure space, and f a µ-integrable real-valued function. For E ∈ Σ set νE = ∫_E f. Then ν : Σ → R is a countably additive functional and is truly continuous with respect to µ, therefore absolutely continuous with respect to µ.
proof Recall that ∫_E f = ∫ f × χE is defined for every E ∈ Σ (131F). So ν : Σ → R is well-defined. If E, F ∈ Σ are disjoint then
ν(E ∪ F) = ∫ f × χ(E ∪ F) = ∫ (f × χE) + (f × χF)
= ∫ f × χE + ∫ f × χF = νE + νF,
so ν is finitely additive. Now 225A, without using the phrase ‘truly continuous’, proved exactly that ν is truly continuous with respect to µ. It follows from 232Bb that ν is countably additive and absolutely continuous.
Remark The functional E ↦ ∫_E f is called the indefinite integral of f.
232E We are now at last ready for the theorem.
The Radon-Nikodým theorem Let (X, Σ, µ) be a measure space and ν : Σ → R a function. Then the following are equiveridical:
(i) there is a µ-integrable function f such that νE = ∫_E f for every E ∈ Σ;
(ii) ν is finitely additive and truly continuous with respect to µ.
proof (a) If f is a µ-integrable real-valued function and νE = ∫_E f for every E ∈ Σ, then 232D tells us that ν is finitely additive and truly continuous.
(b) In the other direction, suppose that ν is finitely additive and truly continuous; note that (by 232B(a-b)) νE = 0 whenever µE = 0. To begin with, suppose that ν is non-negative and not zero.
In this case, there is a non-negative simple function f such that ∫ f > 0 and ∫_E f ≤ νE for every E ∈ Σ. P P Let H ∈ Σ be such that νH > 0; set ε = ⅓νH > 0. Let E ∈ Σ, δ > 0 be such that µE < ∞ and νF ≤ ε whenever F ∈ Σ and µ(F ∩ E) ≤ δ; then ν(H \ E) ≤ ε so νE ≥ ν(H ∩ E) ≥ 2ε and µE ≥ µ(H ∩ E) > 0. Set µ_E F = µ(F ∩ E) for every F ∈ Σ; then µ_E is a countably additive functional on Σ. Set ν′ = ν − αµ_E, where α = ε/µE; then ν′ is a countably additive functional and ν′E > 0. By 231Eb, as usual, there is a set G ∈ Σ such that ν′F ≥ 0 if F ∈ Σ, F ⊆ G, but ν′F ≤ 0 if F ∈ Σ and F ∩ G = ∅. As ν′(E \ G) ≤ 0,
0 < ν′E ≤ ν′(E ∩ G) ≤ ν(E ∩ G)
and µ(E ∩ G) > 0. Set f = αχ(E ∩ G); then f is a non-negative simple function and
∫ f = αµ(E ∩ G) > 0.
If F ∈ Σ then ν′(F ∩ G) ≥ 0, that is,
ν(F ∩ G) ≥ αµ_E(F ∩ G) = αµ(F ∩ E ∩ G) = ∫_F f.
So
νF ≥ ν(F ∩ G) ≥ ∫_F f,
as required. Q Q
(c) Still supposing that ν is a non-negative, truly continuous additive functional, let Φ be the set of non-negative simple functions f : X → R such that ∫_E f ≤ νE for every E ∈ Σ; then the constant function 0 belongs to Φ, so Φ is not empty.
If f, g ∈ Φ then f ∨ g ∈ Φ, where (f ∨ g)(x) = max(f(x), g(x)) for x ∈ X. P P Set H = {x : (f − g)(x) ≥ 0} ∈ Σ; then f ∨ g = (f × χH) + (g × χ(X \ H)) is a non-negative simple function, and for any E ∈ Σ,
∫_E f ∨ g = ∫_{E∩H} f + ∫_{E\H} g ≤ ν(E ∩ H) + ν(E \ H) = νE. Q Q
Set
γ = sup{∫ f : f ∈ Φ} ≤ νX < ∞.
Choose a sequence ⟨f_n⟩_{n∈N} in Φ such that lim_{n→∞} ∫ f_n = γ. For each n, set g_n = f_0 ∨ f_1 ∨ . . . ∨ f_n; then g_n ∈ Φ and ∫ f_n ≤ ∫ g_n ≤ γ for each n, so lim_{n→∞} ∫ g_n = γ. By B.Levi’s theorem, f = lim_{n→∞} g_n is integrable and ∫ f = γ. Note that if E ∈ Σ then
∫_E f = lim_{n→∞} ∫_E g_n ≤ νE.
?? Suppose, if possible, that there is an H ∈ Σ such that ∫_H f ≠ νH. Set
ν_1 F = νF − ∫_F f ≥ 0
for every F ∈ Σ; then by (a) of this proof and 232C, ν_1 is a truly continuous finitely additive functional, and we are supposing that ν_1 ≠ 0. By (b) of this proof, there is a non-negative simple function g such that ∫_F g ≤ ν_1 F for every F ∈ Σ and ∫ g > 0. Take n ∈ N such that ∫ f_n + ∫ g > γ. Then f_n + g is a non-negative simple function and
∫_F (f_n + g) = ∫_F f_n + ∫_F g ≤ ∫_F f + ν_1 F = νF
for any F ∈ Σ, so f_n + g ∈ Φ, and
γ < ∫ f_n + ∫ g = ∫ (f_n + g) ≤ γ,
which is absurd. X X Thus we have
∫_H f = νH
for every H ∈ Σ.
0, then (because ∑_{n=0}^∞ νE_n = νE < ∞) there is an n ∈ N such that ∑_{k=n+1}^∞ νE_k ≤ ε. Now, for each k ≤ n, there is an F_k ∈ Σ such that µF_k = 0 and ν(E_k ∩ F_k) ≥ ν_s E_k − ε/(n+1). In this case, F = ⋃_{k≤n} F_k ∈ Σ, µF = 0 and
ν_s E ≥ ν(E ∩ F) ≥ ∑_{k=0}^n ν(E_k ∩ F_k) ≥ ∑_{k=0}^n ν_s E_k − ε ≥ ∑_{k=0}^∞ ν_s E_k − 2ε,
because
∑_{k=n+1}^∞ ν_s E_k ≤ ∑_{k=n+1}^∞ νE_k ≤ ε.
As ε is arbitrary,
ν_s E ≥ ∑_{k=0}^∞ ν_s E_k.
(δ) Similarly, for each k ≤ n, there is an F′_k ∈ Σ such that µF′_k < ∞ and ν(E_k ∩ F′_k) ≥ ν_t E_k − ε/(n+1). In this case, F′ = ⋃_{k≤n} F′_k ∈ Σ, µF′ < ∞ and
ν_t E ≥ ν(E ∩ F′) ≥ ∑_{k=0}^n ν(E_k ∩ F′_k) ≥ ∑_{k=0}^n ν_t E_k − ε ≥ ∑_{k=0}^∞ ν_t E_k − 2ε,
because
∑_{k=n+1}^∞ ν_t E_k ≤ ∑_{k=n+1}^∞ νE_k ≤ ε.
As ε is arbitrary,
ν_t E ≥ ∑_{k=0}^∞ ν_t E_k.
(ε) Putting these together, ν_s E = ∑_{n=0}^∞ ν_s E_n and ν_t E = ∑_{n=0}^∞ ν_t E_n. As ⟨E_n⟩_{n∈N} is arbitrary, ν_s and ν_t are countably additive. Q Q
(ii) Still supposing that ν is non-negative, if we choose a sequence ⟨F_n⟩_{n∈N} in Σ such that µF_n = 0 for each n and lim_{n→∞} νF_n = ν_s X, then F* = ⋃_{n∈N} F_n has µF* = 0, νF* = ν_s X; so that ν_s(X \ F*) = 0, and ν_s is singular with respect to µ in the sense of 232Ac.
Note that ν_s F = νF whenever µF = 0. So if we write ν_ac = ν − ν_s, then ν_ac is a countably additive functional and ν_ac F = 0 whenever µF = 0; that is, ν_ac is absolutely continuous with respect to µ.
If we write ν_tc = ν_t − ν_s, then ν_tc is a non-negative countably additive functional; ν_tc F = 0 whenever µF = 0, and if ν_tc E > 0 there is a set F with µF < ∞ and ν_tc(E ∩ F) > 0. So ν_tc is truly continuous with respect to µ, by 232Bb. Set ν_e = ν − ν_t = ν_ac − ν_tc.
Thus for any non-negative countably additive functional ν, we have expressions
ν = ν_s + ν_ac, ν_ac = ν_tc + ν_e
where ν_s, ν_ac, ν_tc and ν_e are all non-negative countably additive functionals, ν_s is singular with respect to µ, ν_ac and ν_e are absolutely continuous with respect to µ, ν_tc is truly continuous with respect to µ, and ν_e F = 0 whenever µF < ∞.
(iii) For general countably additive functionals ν : Σ → R, we can express ν as ν′ − ν″, where ν′ and ν″ are non-negative countably additive functionals. If we define ν′_s, ν″_s, . . . , ν″_e as in (i)-(ii), we get countably additive functionals
ν_s = ν′_s − ν″_s, ν_ac = ν′_ac − ν″_ac, ν_tc = ν′_tc − ν″_tc, ν_e = ν′_e − ν″_e
such that ν_s is singular with respect to µ (if F′, F″ are such that µF′ = µF″ = ν′_s(X \ F′) = ν″_s(X \ F″) = 0, then µ(F′ ∪ F″) = 0 and ν_s E = 0 whenever E ⊆ X \ (F′ ∪ F″)), ν_ac is absolutely continuous with respect to µ, ν_tc is truly continuous with respect to µ, and ν_e F = 0 whenever µF < ∞, while ν = ν_s + ν_ac = ν_s + ν_tc + ν_e.
(iv) Moreover, these decompositions are unique. P P (α) If, for instance, ν = ν̃_s + ν̃_ac, where ν̃_s is singular and ν̃_ac is absolutely continuous with respect to µ, let F, F̃ be such that µF = µF̃ = 0 and ν̃_s E = 0 whenever E ∩ F̃ = ∅, ν_s E = 0 whenever E ∩ F = ∅; then we must have
ν_ac(E ∩ (F ∪ F̃)) = ν̃_ac(E ∩ (F ∪ F̃)) = 0
for every E ∈ Σ, so
ν_s E = ν(E ∩ (F ∪ F̃)) = ν̃_s E
for every E ∈ Σ. Thus ν̃_s = ν_s and ν̃_ac = ν_ac.
(β) Similarly, if ν_ac = ν̃_tc + ν̃_e where ν̃_tc is truly continuous with respect to µ and ν̃_e F = 0 whenever µF < ∞, then there are sequences ⟨F_n⟩_{n∈N}, ⟨F̃_n⟩_{n∈N} of sets of finite measure such that ν_tc F = 0 whenever F ∩ ⋃_{n∈N} F_n = ∅ and ν̃_tc F = 0 whenever F ∩ ⋃_{n∈N} F̃_n = ∅. Write F* = ⋃_{n∈N} (F_n ∪ F̃_n); then ν̃_e E = ν_e E = 0 whenever E ⊆ F* and ν̃_tc E = ν_tc E = 0 whenever E ∩ F* = ∅, so ν_e E = ν_ac(E \ F*) = ν̃_e E for every E ∈ Σ, and ν_e = ν̃_e, ν_tc = ν̃_tc. Q Q
(b) In this case, µ is σ-finite (cf. 211P), so every absolutely continuous countably additive functional is truly continuous (232Bc), and we shall always have ν_e = 0, ν_ac = ν_tc. But in the other direction we know that singleton sets, and therefore countable sets, are all measurable. We therefore have a further decomposition ν_s = ν_p + ν_cs, where there is a countable set K ⊆ R^r with ν_p E = 0 whenever E ∈ Σ, E ∩ K = ∅, and ν_cs is singular with respect to µ and zero on countable sets. P P (i) If ν ≥ 0, set
ν_p E = sup{ν(E ∩ K) : K ⊆ R^r is countable};
just as with ν_s, dealt with in (a) above, ν_p is countably additive and there is a countable K ⊆ R^r such that ν_p E = ν(E ∩ K) for every E ∈ Σ. (ii) For general ν, we can express ν as ν′ − ν″ where ν′ and ν″ are non-negative, and write ν_p = ν′_p − ν″_p. (iii) ν_p is characterized by saying that there is a countable set K such that ν_p E = ν(E ∩ K) for every E ∈ Σ and ν{x} = 0 for every x ∈ R^r \ K. (iv) So if we set ν_cs = ν_s − ν_p, ν_cs will be singular with respect to µ and zero on countable sets. Q Q
Now, for any E ∈ Σ,
ν_p E = ν(E ∩ K) = ∑_{x∈K∩E} ν{x} = ∑_{x∈E} ν{x}.
Remark The expression ν = ν_p + ν_cs + ν_ac of (b) is the Lebesgue decomposition of ν.
232X Basic exercises (a) Let (X, Σ, µ) be a measure space and ν : Σ → R a countably additive functional which is absolutely continuous with respect to µ. Show that the following are equiveridical: (i) ν is truly continuous with respect to µ; (ii) there is a sequence ⟨E_n⟩_{n∈N} in Σ such that µE_n < ∞ for every n ∈ N and νF = 0 whenever F ∈ Σ and F ∩ ⋃_{n∈N} E_n = ∅.
> (b) Let g : R → R be a bounded non-decreasing function and µ_g the associated Lebesgue-Stieltjes measure (114Xa). Show that µ_g is absolutely continuous (equivalently, truly continuous) with respect to Lebesgue measure iff the restriction of g to any closed bounded interval is absolutely continuous in the sense of 225B.
(c) Let X be a set and Σ a σ-algebra of subsets of X; let ν : Σ → R be a countably additive functional. Let I be an ideal of Σ, that is, a subset of Σ such that (α) ∅ ∈ I (β) E ∪ F ∈ I for all E, F ∈ I (γ) if E ∈ Σ, F ∈ I and E ⊆ F then E ∈ I. Show that ν has a unique decomposition as ν = ν_I + ν′_I, where ν_I and ν′_I are countably additive functionals, ν′_I E = 0 for every E ∈ I, and whenever E ∈ Σ, ν_I E ≠ 0 there is an F ∈ I such that ν_I(E ∩ F) ≠ 0.
> (d) Let X be a non-empty set and Σ a σ-algebra of subsets of X. Show that for any sequence ⟨ν_n⟩_{n∈N} of countably additive functionals on Σ there is a probability measure µ on X, with domain Σ, such that every ν_n is absolutely continuous with respect to µ. (Hint: start with the case ν_n ≥ 0.)
(e) Let (X, Σ, µ) be a measure space and (X, Σ̂, µ̂) its completion (212C). Let ν : Σ → R be an additive functional such that νE = 0 whenever µE = 0. Show that ν has a unique extension to an additive functional ν̂ : Σ̂ → R such that ν̂E = 0 whenever µ̂E = 0.
(f) Let F be an ultrafilter on N including the filter {N \ I : I ⊆ N is finite} (2A1O). Define ν : PN → {0, 1} by setting νE = 1 if E ∈ F, 0 for E ∈ PN \ F. (i) Let µ_1 be counting measure on PN. Show that ν is additive and absolutely continuous with respect to µ_1, but is not truly continuous. (ii) Define µ_2 : PN → [0, 1] by setting µ_2 E = ∑_{n∈E} 2^{−n−1}. Show that ν is zero on µ_2-negligible sets, but is not absolutely continuous with respect to µ_2.
(g) Rewrite this section in terms of complex-valued additive functionals.
(h) Let (X, Σ, µ) be a measure space, and ν and λ additive functionals on Σ of which ν is positive and countably additive, so that (X, Σ, ν) is also a measure space. (i) Show that if ν is absolutely continuous with respect to µ and λ is absolutely continuous with respect to ν, then λ is absolutely continuous with respect to µ. (ii) Show that if ν is truly continuous with respect to µ and λ is absolutely continuous with respect to ν then λ is truly continuous with respect to µ.
232Y Further exercises (a) Let (X, Σ, µ) be a measure space and ν : Σ → R a finitely additive functional. If E, F, H ∈ Σ and µH < ∞ set ρ_H(E, F) = µ(H ∩ (E △ F)). (i) Show that ρ_H is a pseudometric on Σ (2A3Fa). (ii) Let T be the topology on Σ generated by {ρ_H : H ∈ Σ, µH < ∞} (2A3Fc). Show that ν is continuous for T iff it is truly continuous in the sense of 232Ab. (T is the topology of convergence in measure on Σ.)
(b) For a non-decreasing function F : [a, b] → R, where a < b, let ν_F be the corresponding Lebesgue-Stieltjes measure. Show that if we define (ν_F)_ac, etc., with regard to Lebesgue measure on [a, b], as in 232I, then
(ν_F)_p = ν_{F_p}, (ν_F)_ac = ν_{F_ac}, (ν_F)_cs = ν_{F_cs},
where F_p, F_cs and F_ac are defined as in 226C.
(c) Extend the idea of (b) to general functions F of bounded variation.
(d) Extend the ideas of (b) and (c) to open, half-open and unbounded intervals (cf. 226Ya).
(e) Let (X, Σ, µ) be a measure space and (X, Σ̃, µ̃) its c.l.d. version (213E). Let ν : Σ → R be an additive functional which is truly continuous with respect to µ. Show that ν has a unique extension to a functional ν̃ : Σ̃ → R which is truly continuous with respect to µ̃.
(f) Let (X, Σ, µ) be a measure space and f a µ-integrable real-valued function. Show that the indefinite integral of f is the unique countably additive functional ν : Σ → R such that whenever E ∈ Σ and f(x) ∈ [a, b] for almost every x ∈ E, then aµE ≤ νE ≤ bµE.
(g) Say that two bounded additive functionals ν1, ν2 on an algebra Σ of sets are mutually singular if for any ε > 0 there is an H ∈ Σ such that sup{|ν1 F| : F ∈ Σ, F ⊆ H} ≤ ε, sup{|ν2 F| : F ∈ Σ, F ∩ H = ∅} ≤ ε. (i) Show that ν1 and ν2 are mutually singular iff, in the language of 231Ya-231Yb, |ν1| ∧ |ν2| = 0. (ii) Show that if Σ is a σ-algebra and ν1 and ν2 are countably additive, then they are mutually singular iff there is an H ∈ Σ such that ν1 F = 0 whenever F ∈ Σ and F ⊆ H, while ν2 F = 0 whenever F ∈ Σ and F ∩ H = ∅. (iii) Show that if νs, νtc and νe are defined from ν and µ as in 232I, then each pair of the three are mutually singular.

(h) Let (X, Σ, µ) be a measure space and f a non-negative real-valued function which is integrable over X; let ν be its indefinite integral. Show that for any function g : X → R, ∫ g dν = ∫ f × g dµ in the sense that if one of these is defined in [−∞, ∞] so is the other, and they are then equal. (Hint: start with simple functions g.)

(i) Let (X, Σ, µ) be a measure space, f an integrable function, and ν : Σ → R the indefinite integral of f. Show that |ν|, as defined in 231Ya, is the indefinite integral of |f|.

(j) Let X be a set, Σ a σ-algebra of subsets of X, and ν : Σ → R a countably additive functional. Show that ν has a Radon-Nikodým derivative with respect to |ν| as defined in 231Ya, and that any such derivative has modulus equal to 1 |ν|-a.e.

232 Notes and comments The Radon-Nikodým theorem must be on any list of the half-dozen most important theorems of measure theory, and not only the theorem itself, but the techniques necessary to prove it, are at the heart of the subject. In my book Fremlin 74 I discussed a variety of more or less abstract versions of the theorem and of the method, to some of which I will return in §§327 and 365 of the next volume. As I have presented it here, the essence of the proof is split between 231E and 232E. I think we can distinguish the following elements. Let ν be a countably additive functional.
(i) ν is bounded (231Ea).
(ii) ν is expressible as the difference of non-negative functionals (231F). (I gave this as a corollary of 231Eb, but it can also be proved by simpler methods, as in 231Ya.)
(iii) If ν > 0, there is an integrable f such that 0 < ν_f ≤ ν, writing ν_f for the indefinite integral of f. (This is the point at which we really do need the Hahn decomposition 231Eb.)
(iv) The set Ψ = {f : ν_f ≤ ν} is closed under countable suprema, so there is an f ∈ Ψ maximising ∫ f. (In part (b) of the proof of 232E, I spoke of simple functions; but this was solely to simplify the technical details, and the same argument works if we apply it to Ψ instead of Φ. Note the use here of B.Levi's theorem.)
(v) Take f from (iv) and use (iii) to show that ν − ν_f = 0.

Each of the steps (i)-(iv) requires a non-trivial idea, and the importance of the theorem lies not only in its remarkable direct consequences in the rest of this chapter and elsewhere, but in the versatility and power of these ideas.

I introduce the idea of 'truly continuous' functional in order to give a reasonably straightforward account of the status of the Radon-Nikodým theorem in non-σ-finite measure spaces. Of course the whole point is that a truly continuous functional, like an indefinite integral, must be concentrated on a σ-finite part of the space (232Xa), so that 232E, as stated, can be deduced easily from the standard form 232F. I dare to use the word 'truly' in this context because this kind of continuity does indeed correspond to a topological notion (232Ya).
There is a possible trap in the definition I give of 'absolutely continuous' functional. Many authors use the condition of 232Ba as a definition, saying that ν is absolutely continuous with respect to µ if νE = 0 whenever µE = 0. For countably additive functionals this coincides with the ε-δ formulation in 232Aa; but for other additive functionals this need not be so (232Xf(ii)). Mostly the distinction is insignificant, but I note that in 232Bd it is critical, since ν there is not assumed to be countably additive.

In 232I I describe one of the many ways of decomposing a countably additive functional into mutually singular parts with special properties. In 231Yf-231Yg I have already suggested a method of decomposing an additive functional into the sum of a countably additive part and a 'purely finitely additive' part. All these results have natural expressions in terms of the ordered linear space of bounded additive functionals on an algebra (231Yc).
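As a purely illustrative sketch (not the construction of 232E or 232I), the ideas of density and singular part can be seen in the simplest possible setting: a finite set X with every subset measurable, where µ and ν are given by point masses. The function name `decompose` and the sample masses are assumptions made for this illustration only. Where µ{x} > 0 the density is the ratio ν{x}/µ{x}; points of µ-measure zero carry the part of ν that µ cannot see.

```python
from fractions import Fraction
from itertools import combinations

def decompose(mu, nu):
    """Toy decomposition on a finite set with all subsets measurable.

    mu, nu: dicts of point masses (mu non-negative).  Returns a density f
    and a singular part nu_s such that, for every subset E,
        nu(E) = sum_{x in E} f(x) * mu{x} + nu_s(E).
    """
    density, singular = {}, {}
    for x in mu:
        if mu[x] > 0:
            density[x] = nu[x] / mu[x]   # pointwise ratio of masses
            singular[x] = Fraction(0)
        else:
            density[x] = Fraction(0)     # value on a mu-negligible set is arbitrary
            singular[x] = nu[x]          # mass carried by a mu-negligible point
    return density, singular

# Hypothetical sample data: "c" is mu-negligible but carries nu-mass.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(0)}
nu = {"a": Fraction(1, 4), "b": Fraction(-1), "c": Fraction(3)}
f, nu_s = decompose(mu, nu)

# Check the identity nu(E) = (integral of f over E) + nu_s(E) on every subset E.
subsets = [set(c) for r in range(len(mu) + 1) for c in combinations(mu, r)]
decomposition_ok = all(
    sum(nu[x] for x in E)
    == sum(f[x] * mu[x] for x in E) + sum(nu_s[x] for x in E)
    for E in subsets
)
```

Here the absolutely continuous part vanishes on every µ-negligible set, in the sense of 232Ba, while the singular part is supported by the µ-negligible set {"c"}.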
233 Conditional expectations

I devote a section to a first look at one of the principal applications of the Radon-Nikodým theorem. It is one of the most vital ideas of measure theory, and will appear repeatedly in one form or another. Here I give the definition and most basic properties of conditional expectations as they arise in abstract probability theory, with notes on convex functions and a version of Jensen's inequality (233I-233J).

233A σ-subalgebras Let X be a set and Σ a σ-algebra of subsets of X. A σ-subalgebra of Σ is a σ-algebra T of subsets of X such that T ⊆ Σ. If (X, Σ, µ) is a measure space and T is a σ-subalgebra of Σ, then (X, T, µ↾T) is again a measure space; this is immediate from the definition (112A). Now we have the following straightforward lemma. It is a special case of 235I below, but I give a separate proof in case you do not wish as yet to embark on the general investigation pursued in §235.

233B Lemma Let (X, Σ, µ) be a measure space and T a σ-subalgebra of Σ. A real-valued function f defined on a subset of X is µ↾T-integrable iff (i) it is µ-integrable (ii) dom f is µ↾T-conegligible (iii) f is µ↾T-virtually measurable; and in this case ∫ f d(µ↾T) = ∫ f dµ.

proof (a) Note first that if f is a µ↾T-simple function, that is, is expressible as Σ_{i=0}^n ai χEi where ai ∈ R, Ei ∈ T and (µ↾T)Ei < ∞ for each i, then f is µ-simple and

\[ \int f\,d\mu = \sum_{i=0}^n a_i\,\mu E_i = \int f\,d(\mu\restriction\mathrm{T}). \]

(b) Let Uµ be the set of non-negative µ-integrable functions and Uµ↾T the set of non-negative µ↾T-integrable functions. Suppose f ∈ Uµ↾T. Then there is a non-decreasing sequence ⟨fn⟩n∈N of µ↾T-simple functions such that f(x) = lim_{n→∞} fn(x) µ↾T-a.e. and

\[ \int f\,d(\mu\restriction\mathrm{T}) = \lim_{n\to\infty} \int f_n\,d(\mu\restriction\mathrm{T}). \]

But now every fn is also µ-simple, and ∫ fn dµ = ∫ fn d(µ↾T) for every n, and f = lim_{n→∞} fn µ-a.e. So f ∈ Uµ and ∫ f dµ = ∫ f d(µ↾T).

(c) Now suppose that f is µ↾T-integrable. Then it is the difference of two members of Uµ↾T, so is µ-integrable, and ∫ f dµ = ∫ f d(µ↾T). Also conditions (ii) and (iii) are satisfied, according to the conventions established in Volume 1 (122Nc, 122P-122Q).

(d) Suppose that f satisfies conditions (i)-(iii). Then |f| ∈ Uµ, and there is a conegligible set E ⊆ dom f such that E ∈ T and f↾E is T-measurable. Accordingly |f|↾E is T-measurable. Now, if ε > 0, then

\[ (\mu\restriction\mathrm{T})\{x : x \in E,\ |f|(x) \ge \epsilon\} = \mu\{x : x \in E,\ |f|(x) \ge \epsilon\} \le \frac{1}{\epsilon}\int |f|\,d\mu < \infty; \]

moreover,
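\[
\begin{aligned}
\sup\Bigl\{\int g\,d(\mu\restriction\mathrm{T}) &: g \text{ is a } \mu\restriction\mathrm{T}\text{-simple function},\ g \le |f|\ \mu\restriction\mathrm{T}\text{-a.e.}\Bigr\} \\
&= \sup\Bigl\{\int g\,d\mu : g \text{ is a } \mu\restriction\mathrm{T}\text{-simple function},\ g \le |f|\ \mu\restriction\mathrm{T}\text{-a.e.}\Bigr\} \\
&\le \sup\Bigl\{\int g\,d\mu : g \text{ is a } \mu\text{-simple function},\ g \le |f|\ \mu\text{-a.e.}\Bigr\} \\
&\le \int |f|\,d\mu < \infty.
\end{aligned}
\]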
By the criterion of 122Ja, |f| ∈ Uµ↾T. Consequently f, being µ↾T-virtually T-measurable, is µ↾T-integrable, by 122P. This completes the proof.

233C Remarks (a) My argument just above is detailed to the point of pedantry. I think, however, that while I can be accused of wasting paper by writing everything down, every element of the argument is necessary to the result. To be sure, some of the details are needed only because I use such a wide notion of 'integrable function'; if you restrict the notion of 'integrability' to measurable functions defined on the whole measure space, there are simplifications at this stage, to be paid for later when you discover that many of the principal applications are to functions defined by formulae which do not apply on the whole underlying space. The essential point which does have to be grasped is that while a µ↾T-negligible set is always µ-negligible, a µ-negligible set need not be µ↾T-negligible.

(b) As the simplest possible example of the problems which can arise, I offer the following. Let (X, Σ, µ) be [0, 1]² with Lebesgue measure. Let T be the set of those members of Σ expressible as F × [0, 1] for some F ⊆ [0, 1]; it is easy to see that T is a σ-subalgebra of Σ. Consider f, g : X → [0, 1] defined by saying that f(t, u) = 1 if u > 0, 0 otherwise, g(t, u) = 1 if t > 0, 0 otherwise. Then both f and g are µ-integrable, being constant µ-a.e. But only g is µ↾T-integrable, because any non-negligible E ∈ T includes a complete vertical section {t} × [0, 1], so that f takes both values 0 and 1 on E. If we set h(t, u) = 1 if u > 0, undefined otherwise, then again (on the conventions I use) h is µ-integrable but not µ↾T-integrable, as there is no conegligible member of T included in the domain of h.

(c) If f is defined everywhere on X, and µ↾T is complete, then of course f is µ↾T-integrable iff it is µ-integrable and T-measurable.
But note that in the example just above, which is one of the archetypes for this topic, µ↾T is not complete, as singleton sets are negligible but not measurable.

233D Conditional expectations Let (X, Σ, µ) be a probability space, that is, a measure space with µX = 1. (Nearly all the ideas here work perfectly well for any totally finite measure space, but there seems nothing to be gained from the extension, and the traditional phrase 'conditional expectation' demands a probability space.) Let T ⊆ Σ be a σ-subalgebra.

(a) For any µ-integrable real-valued function f defined on a conegligible subset of X, we have a corresponding indefinite integral ν_f : Σ → R given by the formula ν_f E = ∫_E f for every E ∈ Σ. We know that ν_f is countably additive and truly continuous with respect to µ, which in the present context is the same as saying that it is absolutely continuous (232Bc-232Bd). Now consider the restrictions µ↾T, ν_f↾T of µ and ν_f to the σ-algebra T. It follows directly from the definitions of 'countably additive' and 'absolutely continuous' that ν_f↾T is countably additive and absolutely continuous with respect to µ↾T, therefore truly continuous with respect to µ↾T. Consequently, the Radon-Nikodým theorem (232E) tells us that there is a µ↾T-integrable function g such that (ν_f↾T)F = ∫_F g d(µ↾T) for every F ∈ T.
(b) Let us define a conditional expectation of f on T to be such a function; that is, a µ↾T-integrable function g such that ∫_F g d(µ↾T) = ∫_F f dµ for every F ∈ T. Looking back at 233B, we see that for such a g we have

\[ \int_F g\,d(\mu\restriction\mathrm{T}) = \int g \times \chi F\,d(\mu\restriction\mathrm{T}) = \int g \times \chi F\,d\mu = \int_F g\,d\mu \]
for every F ∈ T; also, that g is almost everywhere equal to a T-measurable function defined everywhere on X which is also a conditional expectation of f on T (232He).

(c) I set the word 'a' of the phrase 'a conditional expectation' in bold type to emphasize that there is nothing unique about the function g. In 242J I will return to this point, and describe an object which could properly be called 'the' conditional expectation of f on T. g is 'essentially unique' only in the sense that if g1, g2 are both conditional expectations of f on T then g1 = g2 µ↾T-a.e. (131Hb). This does of course mean that a very large number of its properties – for instance, the distribution function G(a) = µ̂{x : g(x) ≤ a}, where µ̂ is the completion of µ (212C) – are independent of which g we take.

(d) A word of explanation of the phrase 'conditional expectation' is in order. This derives from the standard identification of probability with measure, due to Kolmogorov, which I will discuss more fully in Chapter 27. A real-valued random variable may be regarded as a measurable, or virtually measurable, function f on a probability space (X, Σ, µ); its 'expectation' becomes identified with ∫ f dµ, supposing that this exists. If F ∈ Σ and µF > 0 then the 'conditional expectation of f given F' is (1/µF) ∫_F f. If F0, . . . , Fn is a partition of X into measurable sets of non-zero measure, then the function g given by

\[ g(x) = \frac{1}{\mu F_i}\int_{F_i} f \quad\text{if } x \in F_i \]

is a kind of anticipated conditional expectation; if we are one day told that x ∈ Fi, then g(x) will be our subsequent estimate of the expectation of f. In the terms of the definition above, g is a conditional expectation of f on the finite algebra T generated by {F0, . . . , Fn}. An appropriate intuition for general σ-algebras T is that they consist of the events which we shall be able to observe at some stated future time t0, while the whole algebra Σ consists of all events, including those not observable until times later than t0, if ever.

233E I list some of the elementary facts concerning conditional expectations.

Proposition Let (X, Σ, µ) be a probability space and T a σ-subalgebra of Σ. Let ⟨fn⟩n∈N be a sequence of µ-integrable real-valued functions, and for each n let gn be a conditional expectation of fn on T. Then
(a) g1 + g2 is a conditional expectation of f1 + f2 on T;
(b) for any c ∈ R, cg0 is a conditional expectation of cf0 on T;
(c) if f1 ≤a.e. f2 then g1 ≤a.e. g2;
(d) if ⟨fn⟩n∈N is non-decreasing a.e. and f = lim_{n→∞} fn is µ-integrable, then lim_{n→∞} gn is a conditional expectation of f on T;
(e) if f = lim_{n→∞} fn is defined a.e. and there is a µ-integrable function h such that |fn| ≤a.e. h for every n, then lim_{n→∞} gn is a conditional expectation of f on T;
(f) if F ∈ T then g0 × χF is a conditional expectation of f0 × χF on T;
(g) if h is a bounded, µ↾T-virtually measurable real-valued function defined µ↾T-almost everywhere on X, then g0 × h is a conditional expectation of f0 × h on T;
(h) if Υ is a σ-subalgebra of T, then a function h0 is a conditional expectation of f0 on Υ iff it is a conditional expectation of g0 on Υ.

proof (a)-(b) We have only to observe that
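\[ \int_F g_1 + g_2\,d(\mu\restriction\mathrm{T}) = \int_F g_1\,d(\mu\restriction\mathrm{T}) + \int_F g_2\,d(\mu\restriction\mathrm{T}) = \int_F f_1\,d\mu + \int_F f_2\,d\mu = \int_F f_1 + f_2\,d\mu, \]

\[ \int_F c g_0\,d(\mu\restriction\mathrm{T}) = c\int_F g_0\,d(\mu\restriction\mathrm{T}) = c\int_F f_0\,d\mu = \int_F c f_0\,d\mu \]

for every F ∈ T.

(c) If F ∈ T then

\[ \int_F g_1\,d(\mu\restriction\mathrm{T}) = \int_F f_1\,d\mu \le \int_F f_2\,d\mu = \int_F g_2\,d(\mu\restriction\mathrm{T}) \]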
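for every F ∈ T; consequently g1 ≤ g2 µ↾T-a.e. (131Ha).

(d) By (c), ⟨gn⟩n∈N is non-decreasing µ↾T-a.e.; moreover,

\[ \sup_{n\in\mathbb{N}} \int g_n\,d(\mu\restriction\mathrm{T}) = \sup_{n\in\mathbb{N}} \int f_n\,d\mu = \int f\,d\mu < \infty. \]

By B.Levi's theorem, g = lim_{n→∞} gn is defined µ↾T-almost everywhere, and

\[ \int_F g\,d(\mu\restriction\mathrm{T}) = \lim_{n\to\infty} \int_F g_n\,d(\mu\restriction\mathrm{T}) = \lim_{n\to\infty} \int_F f_n\,d\mu = \int_F f\,d\mu \]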
for every F ∈ T, so g is a conditional expectation of f on T.

(e) Set f′n = inf_{m≥n} fm, f″n = sup_{m≥n} fm for each n ∈ N. Then we have −h ≤a.e. f′n ≤ fn ≤ f″n ≤a.e. h, and ⟨f′n⟩n∈N, ⟨f″n⟩n∈N are almost-everywhere-monotonic sequences of functions both converging almost everywhere to f. For each n, let g′n, g″n be conditional expectations of f′n, f″n on T. By (c) and (d), ⟨g′n⟩n∈N and ⟨g″n⟩n∈N are almost-everywhere-monotonic sequences converging almost everywhere to conditional expectations g′, g″ of f. Of course g′ = g″ µ↾T-a.e. (233Dc). Also, for each n, g′n ≤a.e. gn ≤a.e. g″n, so ⟨gn⟩n∈N converges to g′ µ↾T-a.e., and g = lim_{n→∞} gn is defined almost everywhere and is a conditional expectation of f on T.

(f) For any H ∈ T,

\[ \int_H g_0 \times \chi F\,d(\mu\restriction\mathrm{T}) = \int_{H\cap F} g_0\,d(\mu\restriction\mathrm{T}) = \int_{H\cap F} f_0\,d\mu = \int_H f_0 \times \chi F\,d\mu. \]

(g)(i) If h is actually (µ↾T)-simple, say h = Σ_{i=0}^n ai χFi where Fi ∈ T for each i, then

\[ \int_F g_0 \times h\,d(\mu\restriction\mathrm{T}) = \sum_{i=0}^n a_i \int_F g_0 \times \chi F_i\,d(\mu\restriction\mathrm{T}) = \sum_{i=0}^n a_i \int_F f_0 \times \chi F_i\,d\mu = \int_F f_0 \times h\,d\mu \]
for every F ∈ T. (ii) For the general case, if h is µ↾T-virtually measurable and |h(x)| ≤ M µ↾T-almost everywhere, then there is a sequence ⟨hn⟩n∈N of µ↾T-simple functions converging to h almost everywhere, and with |hn(x)| ≤ M for every x, n. Now f0 × hn → f0 × h a.e. and |f0 × hn| ≤a.e. M|f0| for each n, while g0 × hn is a conditional expectation of f0 × hn for every n, so by (e) we see that lim_{n→∞} g0 × hn will be a conditional expectation of f0 × h; but this is equal almost everywhere to g0 × h.

(h) We need note only that ∫_H g0 d(µ↾T) = ∫_H f0 dµ for every H ∈ Υ, so

\[
\begin{aligned}
\int_H h_0\,d(\mu\restriction\Upsilon) &= \int_H g_0\,d(\mu\restriction\mathrm{T}) \text{ for every } H \in \Upsilon \\
\iff \int_H h_0\,d(\mu\restriction\Upsilon) &= \int_H f_0\,d\mu \text{ for every } H \in \Upsilon.
\end{aligned}
\]
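Part (h) of 233E can be checked by hand in the finite setting of 233Dd, where a σ-subalgebra is generated by a partition and a conditional expectation is the cell-by-cell average. The following is a minimal sketch under that assumption; the helper `cond_exp` and the sample data are hypothetical names introduced for this illustration only.

```python
from fractions import Fraction

def cond_exp(f, mu, partition):
    """Conditional expectation of f on the algebra generated by a finite partition.

    f, mu: dicts on a finite set X; mu is a probability whose cells all have
    positive measure.  On each cell F_i the result is the constant
    (1 / mu(F_i)) * sum_{y in F_i} f(y) * mu{y}.
    """
    g = {}
    for cell in partition:
        mass = sum(mu[y] for y in cell)
        average = sum(f[y] * mu[y] for y in cell) / mass
        for x in cell:
            g[x] = average
    return g

X = range(8)
mu = {x: Fraction(1, 8) for x in X}            # uniform probability on 8 points
f0 = {x: Fraction(x * x) for x in X}

fine = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]        # generates T
coarse = [{0, 1, 2, 3}, {4, 5, 6, 7}]          # generates a subalgebra of T

g0 = cond_exp(f0, mu, fine)                    # conditional expectation of f0 on T
h_from_f = cond_exp(f0, mu, coarse)            # conditioning f0 directly on the coarser algebra
h_from_g = cond_exp(g0, mu, coarse)            # conditioning g0 on the coarser algebra
tower_holds = (h_from_f == h_from_g)
```

In this example the two routes to the coarser algebra agree exactly, as (h) predicts; in general they need agree only almost everywhere.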
233F Remarks Of course the results above are individually nearly trivial (though I think (e) and (g) might give you pause for thought if they were offered without previous preparation of the ground). Cumulatively they amount to some quite strong properties. In §242 I will restate them in language which is syntactically more direct, but relies on a deeper level of abstraction. As an illustration of the power of conditional expectations to surprise us, I offer the next proposition, which depends on the concept of 'convex' function.

233G Convex functions Recall that a real-valued function φ defined on an interval I ⊆ R is convex if φ(tb + (1 − t)c) ≤ tφ(b) + (1 − t)φ(c) whenever b, c ∈ I and t ∈ [0, 1].

Examples The formulae |x|, x², e^{±x} ± x define convex functions on R; on ]−1, 1[ we have 1/(1 − x²); on ]0, ∞[ we have 1/x and x ln x; on [0, 1] we have the function which is zero on ]0, 1[ and 1 on {0, 1}.
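The defining inequality of 233G can be tested numerically on a finite grid. This is only a sanity check under stated assumptions (a finite sample of points b, c and weights t, with a small floating-point tolerance), not a proof of convexity; the function name `violates_convexity` is hypothetical.

```python
import math
from itertools import product

def violates_convexity(phi, points, ts, tol=1e-12):
    """Return True if phi(t*b + (1-t)*c) > t*phi(b) + (1-t)*phi(c) + tol
    for some sampled b, c in points and t in ts."""
    for b, c, t in product(points, points, ts):
        if phi(t * b + (1 - t) * c) > t * phi(b) + (1 - t) * phi(c) + tol:
            return True
    return False

ts = [k / 10 for k in range(11)]               # weights t in [0, 1]
grid = [k / 4 for k in range(-12, 13)]         # sample points in [-3, 3]
positive = [k / 4 for k in range(1, 13)]       # sample points in ]0, 3]

# Some of the examples listed in 233G, each with a grid inside its domain.
examples = [
    (abs, grid),
    (lambda x: x * x, grid),
    (lambda x: math.exp(x) - x, grid),
    (lambda x: 1 / x, positive),
    (lambda x: x * math.log(x), positive),
]
examples_ok = all(not violates_convexity(phi, pts, ts) for phi, pts in examples)

# A non-convex function for contrast: sin is concave on [0, pi].
sine_fails = violates_convexity(math.sin, grid, ts)
```

The contrast case shows what a failure looks like: with b = 0, c = 3 and t = 1/2, sin(1.5) exceeds the average of sin(0) and sin(3).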
233H The general theory of convex functions is both extensive and important; I list a few of their more salient properties in 233Xe. For the moment the following lemma covers what we need.

Lemma Let I ⊆ R be a non-empty open interval (bounded or unbounded) and φ : I → R a convex function.
(a) For every a ∈ I there is a b ∈ R such that φ(x) ≥ φ(a) + b(x − a) for every x ∈ I.
(b) If we take, for each q ∈ I ∩ Q, a b_q ∈ R such that φ(x) ≥ φ(q) + b_q(x − q) for every x ∈ I, then φ(x) = sup_{q∈I∩Q} φ(q) + b_q(x − q).
(c) φ is Borel measurable.

proof (a) If c, c′ ∈ I and c < a < c′, then a is expressible as dc + (1 − d)c′ for some d ∈ ]0, 1[, so that φ(a) ≤ dφ(c) + (1 − d)φ(c′) and

\[
\begin{aligned}
\frac{\phi(a)-\phi(c)}{a-c}
&\le \frac{d\phi(c)+(1-d)\phi(c')-\phi(c)}{dc+(1-d)c'-c}
= \frac{(1-d)(\phi(c')-\phi(c))}{(1-d)(c'-c)} \\
&= \frac{d(\phi(c')-\phi(c))}{d(c'-c)}
= \frac{\phi(c')-d\phi(c)-(1-d)\phi(c')}{c'-dc-(1-d)c'}
\le \frac{\phi(c')-\phi(a)}{c'-a}.
\end{aligned}
\]
This means that b = supc

(c) Let I ⊆ R be an open interval and φ : I → R a function. (i) Show that if φ is differentiable then it is convex iff φ′ is non-decreasing. (ii) Show that if φ is absolutely continuous on every bounded closed subinterval of I then φ is convex iff φ′ is non-decreasing on its domain.

(d) For any r ≥ 1, a subset C of R^r is convex if tx + (1 − t)y ∈ C for all x, y ∈ C and t ∈ [0, 1]. If C ⊆ R^r is convex, then a function φ : C → R is convex if φ(tx + (1 − t)y) ≤ tφ(x) + (1 − t)φ(y) for all x, y ∈ C and t ∈ [0, 1]. Let C ⊆ R^r be a convex set and φ : C → R a function. Show that the following are equiveridical: (i) the function φ is convex; (ii) the set {(x, t) : x ∈ C, t ∈ R, t ≥ φ(x)} is convex in R^{r+1}; (iii) the set {x : x ∈ C, φ(x) + b.x ≤ c} is convex in R^r for every b ∈ R^r and c ∈ R, writing b.x = Σ_{i=1}^r βi ξi if b = (β1, . . . , βr) and x = (ξ1, . . . , ξr).

(e) Let I ⊆ R be an interval and φ : I → R a convex function. (i) Show that if a, d ∈ I and a < b ≤ c < d then

\[ \frac{\phi(b)-\phi(a)}{b-a} \le \frac{\phi(d)-\phi(c)}{d-c}. \]

(ii) Show that φ is continuous at every interior point of I. (iii) Show that either φ is monotonic on I or there is a c ∈ I such that φ(c) = min_{x∈I} φ(x) and φ is non-increasing on I ∩ ]−∞, c], monotonic non-decreasing on I ∩ [c, ∞[. (iv) Show that φ is differentiable at all but countably many points of I, and that its derivative is non-decreasing in the sense that φ′(x) ≤ φ′(y) whenever x, y ∈ dom φ′ and x ≤ y. (v) Show that if I is closed and bounded and φ is continuous then φ is absolutely continuous. (vi) Show that if I is closed and bounded and ψ : I → R is absolutely continuous with a non-decreasing derivative then ψ is convex.
(f) Show that if I ⊆ R is an interval and φ, ψ : I → R are convex functions so is aφ + bψ for any a, b ≥ 0.

(g) In the context of 233K, give an example in which g × h is integrable but f × h is not. (Hint: take X, µ, T as in 233Cb, and arrange for g to be 0.)

(h) Let I ⊆ R be an interval and Φ a non-empty family of convex real-valued functions on I such that ψ(x) = sup_{φ∈Φ} φ(x) is finite for every x ∈ I. Show that ψ is convex.

233Y Further exercises (a) If I ⊆ R is an interval, a function φ : I → R is midconvex if φ((x + y)/2) ≤ ½(φ(x) + φ(y)) for all x, y ∈ I. Show that a midconvex function which is bounded on any non-trivial subinterval of I is convex.

(b) Generalize 233Xd to arbitrary normed spaces in place of R^r.

(c) Let (X, Σ, µ) be a probability space and T a σ-subalgebra of Σ. Let φ be a convex real-valued function with domain an interval I ⊆ R, and f an integrable real-valued function on X such that f(x) ∈ I for almost every x ∈ X and φf is integrable. Let g, h be conditional expectations on T of f, φf respectively. Show that g(x) ∈ I for almost every x and that φg ≤a.e. h.

(d) (i) Show that if I ⊆ R is a bounded interval, E ⊆ I is Lebesgue measurable, and µE > (2/3)µI where µ is Lebesgue measure, then for every x ∈ I there are y, z ∈ E such that z = (x + y)/2. (Hint: by 134Ya/263A, µ(x + E) + µ(2E) > µ(2I).) (ii) Show that if f : [0, 1] → R is a midconvex Lebesgue measurable function (definition: 233Ya), a > 0, and E = {x : x ∈ [0, 1], a ≤ f(x) < 2a} is not negligible, then there is a non-trivial interval I ⊆ [0, 1] such that f(x) > 0 for every x ∈ I. (Hint: 223B.) (iii) Suppose that f : [0, 1] → R is a midconvex function such that f ≤ 0 almost everywhere on [0, 1]. Show that f ≤ 0 everywhere on ]0, 1[. (Hint: for every x ∈ ]0, 1[, max(f(x − t), f(x + t)) ≤ 0 for almost every t ∈ [0, min(x, 1 − x)].) (iv) Suppose that f : [0, 1] → R is a midconvex Lebesgue measurable function such that f(0) = f(1) = 0. Show that f(x) ≤ 0 for every x ∈ [0, 1]. (Hint: show that {x : f(x) ≤ 0} is dense in [0, 1], use (ii) to show that it is conegligible in [0, 1] and apply (iii).) (v) Show that if I ⊆ R is an interval and f : I → R is a midconvex Lebesgue measurable function then it is convex.

233 Notes and comments The concept of 'conditional expectation' is fundamental in probability theory, and will reappear in Chapter 27 in its natural context. I hope that even as an exercise in technique, however, it will strike you as worth taking note of. I introduced 233E as a 'list of elementary facts', and they are indeed straightforward. But below the surface there are some remarkable ideas waiting for expression.
If you take T to be the trivial algebra {∅, X}, so that the (unique) conditional expectation of an integrable function f is the constant function (∫ f)χX, then 233Ed and 233Ee become versions of B.Levi's theorem and Lebesgue's Dominated Convergence Theorem. (Fatou's Lemma is in 233Xa.) Even 233Eg can be thought of as a generalization of the result that ∫ cf = c ∫ f, where the constant c has been replaced by a bounded T-measurable function. A recurrent theme in the later parts of this treatise will be the search for 'conditioned' versions of theorems. The proof of 233Ee is a typical example of an argument which has been translated from a proof of the original 'unconditioned' result.

I suggested that 233I-233J are surprising, and I think that most of us find them so, even applied to the list of convex functions given in 233G. But I should remark that in a way 233J has very little to do with conditional expectations. The only properties of conditional expectations used in the proof are (i) that if g is a conditional expectation of f, then aχX + bg is a conditional expectation of aχX + bf for all real a, b (ii) if g1, g2 are conditional expectations of f1, f2 and f1 ≤a.e. f2, then g1 ≤a.e. g2. See 244Xm below. Note that 233Ib can be regarded as the special case of 233J in which T = {∅, X}. In fact 233Ia can be derived from 233Ib applied to the measure ν where νE = ∫_E g for every E ∈ Σ.

Like 233B, 233K seems to have rather a lot of technical detail in the argument. The point of this result is that we can deduce the integrability of f × h from that of g0 × h (but not from the integrability of g × h; see 233Xg). Otherwise it should be routine.
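For readers who like to see numbers, the partition formula of 233Dd and the inequality φg ≤a.e. h of 233Yc can be illustrated on a finite probability space. This is a sketch under simplifying assumptions (a six-point space with uniform measure, T generated by a two-cell partition, φ(t) = t²); the helper `cond_exp` and the variable names are introduced here for illustration only.

```python
from fractions import Fraction

def cond_exp(f, mu, partition):
    """Cell-by-cell average of f: g(x) = (1 / mu(F_i)) * sum_{y in F_i} f(y) * mu{y}
    for x in the cell F_i (each cell is assumed to have positive measure)."""
    g = {}
    for cell in partition:
        mass = sum(mu[y] for y in cell)
        average = sum(f[y] * mu[y] for y in cell) / mass
        for x in cell:
            g[x] = average
    return g

def phi(t):
    """A convex function from the list in 233G."""
    return t * t

X = range(6)
mu = {x: Fraction(1, 6) for x in X}             # a fair die
f = {x: Fraction(x + 1) for x in X}             # the face value 1..6
partition = [{0, 1, 2}, {3, 4, 5}]              # "low" versus "high" outcomes

g = cond_exp(f, mu, partition)                                  # conditional expectation of f
h = cond_exp({x: phi(f[x]) for x in X}, mu, partition)          # conditional expectation of phi(f)

jensen_holds = all(phi(g[x]) <= h[x] for x in X)

# With the trivial partition the conditional expectation is the constant (integral of f).
trivial = cond_exp(f, mu, [set(X)])
```

On the "low" cell g takes the value 2 and h the value 14/3, so φ(g) = 4 ≤ 14/3; on the "high" cell g = 5 and h = 77/3. With the trivial partition the conditional expectation collapses to the constant 7/2, matching the remark about T = {∅, X} above.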
234 Indefinite-integral measures

I take a few pages to describe a standard construction. The idea is straightforward (234A-234B), but a number of details need to be worked out if it is to be securely integrated into the general framework I employ.

234A Theorem Let (X, Σ, µ) be a measure space, and f a non-negative µ-virtually measurable real-valued function defined on a conegligible subset of X. Write νF = ∫ f × χF dµ whenever F ⊆ X is such that the integral is defined in [0, ∞] according to the conventions of 133A. Then ν is a complete measure on X, and its domain includes Σ.

proof (a) Write T for the domain of ν, that is, the family of sets F ⊆ X such that ∫ f × χF dµ is defined in [0, ∞], that is, f × χF is µ-virtually measurable (133A). Then T is a σ-algebra of subsets of X. P P For each F ∈ T let H_F ⊆ X be a µ-conegligible set such that f × χF↾H_F is Σ-measurable. Because f itself is µ-virtually measurable, X ∈ T. If F ∈ T, then

f × χ(X \ F)↾(H_X ∩ H_F) = f↾(H_X ∩ H_F) − (f × χF)↾(H_X ∩ H_F)

is Σ-measurable, while H_X ∩ H_F is µ-conegligible, so X \ F ∈ T. If ⟨Fn⟩n∈N is a sequence in T with union F, set H = ⋂_{n∈N} H_{Fn}; then H is conegligible, f × χFn↾H is Σ-measurable for every n ∈ N, and f × χF = sup_{n∈N} f × χFn, so f × χF↾H is Σ-measurable, and F ∈ T. Thus T is a σ-algebra. If F ∈ Σ, then f × χF↾H_X is Σ-measurable, so F ∈ T. Q Q

(b) Next, ν is a measure. P P Of course νF ∈ [0, ∞] for every F ∈ T. f × χ∅ = 0 wherever it is defined, so ν∅ = 0. If ⟨Fn⟩n∈N is a disjoint sequence in T with union F, then f × χF = Σ_{n=0}^∞ f × χFn. If νFm = ∞ for some m, then we surely have νF = ∞ = Σ_{n=0}^∞ νFn. If νFm < ∞ for each m but Σ_{n=0}^∞ νFn = ∞, then

\[ \int f \times \chi\Bigl(\bigcup_{n\le m} F_n\Bigr) = \sum_{n=0}^m \int f \times \chi F_n \to \infty \]

as m → ∞, so again νF = ∞ = Σ_{n=0}^∞ νFn. If Σ_{n=0}^∞ νFn < ∞ then by B.Levi's theorem

\[ \nu F = \int \sum_{n=0}^\infty f \times \chi F_n = \sum_{n=0}^\infty \int f \times \chi F_n = \sum_{n=0}^\infty \nu F_n. \]

Q Q

(c) Finally, ν is complete. P P If A ⊆ F ∈ T and νF = 0, then f × χF =a.e. 0, so f × χA =a.e. 0 and νA is defined and equal to zero.
Q Q

234B Definition Let (X, Σ, µ) be a measure space, and ν another measure on X with domain T. I will call ν an indefinite-integral measure over µ, or sometimes a completed indefinite-integral measure, if it can be obtained by the method of 234A from some non-negative virtually measurable function f defined almost everywhere on X. In this case I will call f a Radon-Nikodým derivative of ν with respect to µ. As in 232Hf, the phrase density function is also used in this context.

234C Remarks Let (X, Σ, µ) be a measure space, and f a µ-virtually measurable non-negative real-valued function defined almost everywhere on X; let ν be the associated indefinite-integral measure.

(a) There is a Σ-measurable function g : X → [0, ∞[ such that f =a.e. g. P P Let H ⊆ dom f be a measurable conegligible set such that f↾H is measurable, and set g(x) = f(x) for x ∈ H, g(x) = 0 for x ∈ X \ H. Q Q In this case, ∫ f × χE dµ = ∫ g × χE dµ if either is defined. So g is a Radon-Nikodým derivative of ν, and ν has a Radon-Nikodým derivative which is Σ-measurable and defined everywhere.

(b) If E is µ-negligible, then f × χE = 0 µ-a.e., so νE = 0. But it does not follow that ν is absolutely continuous with respect to µ in the ε-δ sense of 232Aa (234Xa).

(c) I have defined 'indefinite-integral measure' in such a way as to produce a complete measure. In my view this is what makes best sense in most applications. There are occasions on which it seems more appropriate to use the measure ν0 : Σ → [0, ∞] defined by setting ν0 E = ∫_E f dµ = ∫ f × χE dµ for E ∈ Σ. I suppose I would call this the uncompleted indefinite-integral measure over µ defined by f. (ν is always the completion of ν0; see 234Db.)
(d) Note the way in which I formulated the definition of ν: 'νE = ∫ f × χE dµ if the integral is defined', rather than 'νE = ∫_E f dµ'. The point is that the longer formula gives a rule for deciding what the domain of ν must be. Of course it is the case that νE = ∫_E f dµ for every E ∈ dom ν (apply 214F to f × χE).

(e) Many authors are prepared to say 'ν is absolutely continuous with respect to µ' in this context. If ν is an indefinite-integral measure with respect to µ, then it is certainly true that νE = 0 whenever µE = 0, as in 232Ba. If ν is not totally finite, however, it does not follow that lim_{µE→0} νE = 0, as required by the definition in 232Aa, and further difficulties can arise if µ or ν is not σ-finite (see 234Ya-234Yc).

234D The domain of an indefinite-integral measure It is sometimes useful to have an explicit description of the domain of a measure constructed in this way.

Proposition Let (X, Σ, µ) be a measure space, f a non-negative µ-virtually measurable function defined almost everywhere in X, and ν the associated indefinite-integral measure. Set G = {x : x ∈ dom f, f(x) > 0}, and let Σ̂ be the domain of the completion µ̂ of µ. Then
(a) the domain T of ν is {E : E ⊆ X, E ∩ G ∈ Σ̂}; in particular, T ⊇ Σ̂ ⊇ Σ.
(b) ν is the completion of its restriction to Σ.
(c) A set A ⊆ X is ν-negligible iff A ∩ G is µ-negligible.
(d) In particular, if µ itself is complete, then T = {E : E ⊆ X, E ∩ G ∈ Σ} and νA = 0 iff µ(A ∩ G) = 0.

proof (a)(i) If E ∈ T, then f × χE is virtually measurable, so there is a conegligible measurable set H ⊆ dom f such that f × χE↾H is measurable. Now E ∩ G ∩ H = {x : x ∈ H, (f × χE)(x) > 0} must belong to Σ, while E ∩ G \ H is negligible, so belongs to Σ̂, and E ∩ G ∈ Σ̂.

(ii) If E ∩ G ∈ Σ̂, let F1, F2 ∈ Σ be such that F1 ⊆ E ∩ G ⊆ F2 and F2 \ F1 is negligible. Let H ⊆ dom f be a conegligible set such that f↾H is measurable. Then H′ = H \ (F2 \ F1) is conegligible and f × χE↾H′ = f × χF1↾H′ is measurable, so f × χE is virtually measurable and E ∈ T.

(b) Thus the given formula does indeed describe T. If E ∈ T, let F1, F2 ∈ Σ be such that F1 ⊆ E ∩ G ⊆ F2 and µ(F2 \ F1) = 0. Because G itself also belongs to Σ̂, there are G1, G2 ∈ Σ such that G1 ⊆ G ⊆ G2 and µ(G2 \ G1) = 0. Set F′2 = F2 ∪ (X \ G1). Then F′2 ∈ Σ and F1 ⊆ E ⊆ F′2. But (F′2 \ F1) ∩ G ⊆ (G2 \ G1) ∪ (F2 \ F1) is negligible, so ν(F′2 \ F1) = 0.

This shows that if ν̃ is the completion of ν↾Σ and T̃ is its domain, then T ⊆ T̃. But as ν is complete, it surely extends ν̃, so ν = ν̃, as claimed.

(c) Now take any A ⊆ X. Because ν is complete,

\[
\begin{aligned}
A \text{ is } \nu\text{-negligible} &\iff \nu A = 0 \\
&\iff \int f \times \chi A\,d\mu = 0 \\
&\iff f \times \chi A = 0\ \mu\text{-a.e.} \\
&\iff A \cap G \text{ is } \mu\text{-negligible}.
\end{aligned}
\]

(d) This is just a restatement of (a) and (c) when µ = µ̂.

234E Corollary If (X, Σ, µ) is a complete measure space and G ∈ Σ, then the indefinite-integral measure over µ defined by χG is just the measure µ⌞G defined by setting (µ⌞G)(F) = µ(F ∩ G) whenever F ⊆ X and F ∩ G ∈ Σ.

proof 234Dd.

*234F The next two results will not be relied on in this volume, but I include them for future reference, and to give an idea of the scope of these ideas.
Proposition Let (X, Σ, µ) be a measure space, and ν an indefinite-integral measure over µ.
(a) If µ is semi-finite, so is ν.
(b) If µ is complete and locally determined, so is ν.
(c) If µ is localizable, so is ν.
(d) If µ is strictly localizable, so is ν.
(e) If µ is σ-finite, so is ν.
(f) If µ is atomless, so is ν.

proof By 234Ca, we may express ν as the indefinite integral of a Σ-measurable function f : X → [0, ∞[. Let T be the domain of ν, and Σ̂ the domain of the completion µ̂ of µ; set G = {x : x ∈ X, f(x) > 0} ∈ Σ.

(a) Suppose that E ∈ T and that νE = ∞. Then E ∩ G cannot be µ-negligible. Because µ is semi-finite, there is a non-negligible F ∈ Σ such that F ⊆ E ∩ G and µF < ∞. Now F = ⋃_{n∈N} {x : x ∈ F, 2^{−n} ≤ f(x) ≤ n}, so there is an n ∈ N such that F′ = {x : x ∈ F, 2^{−n} ≤ f(x) ≤ n} is non-negligible. Because f is measurable, F′ ∈ Σ ⊆ T and 2^{−n}µF′ ≤ νF′ ≤ nµF′. Thus we have found an F′ ⊆ E such that 0 < νF′ < ∞. As E is arbitrary, ν is semi-finite.

(b) We already know that ν is complete (234Db) and semi-finite. Now suppose that E ⊆ X is such that E ∩ F ∈ T, that is, E ∩ F ∩ G ∈ Σ (234Dd), whenever F ∈ T and νF < ∞. Then E ∩ G ∩ F ∈ Σ whenever F ∈ Σ and µF < ∞. P P Set Fn = {x : x ∈ F ∩ G, f(x) ≤ n}. Then νFn ≤ nµF < ∞, so E ∩ G ∩ Fn ∈ Σ for every n. But this means that E ∩ G ∩ F = ⋃_{n∈N} E ∩ G ∩ Fn ∈ Σ. Q Q Because µ is locally determined, E ∩ G ∈ Σ and E ∈ T. As E is arbitrary, ν is locally determined.

(c) Let ℱ ⊆ T be any set. Set ℰ = {F ∩ G : F ∈ ℱ}, so that ℰ ⊆ Σ̂. By 212Ga, µ̂ is localizable, so ℰ has an essential supremum H ∈ Σ̂. But now, for any H′ ∈ T, H′ ∪ (X \ G) = (H′ ∩ G) ∪ (X \ G) belongs to Σ̂, so

\[
\begin{aligned}
\nu(F \setminus H') = 0 \text{ for every } F \in \mathcal{F}
&\iff \hat\mu(F \cap G \setminus H') = 0 \text{ for every } F \in \mathcal{F} \\
&\iff \hat\mu(E \setminus H') = 0 \text{ for every } E \in \mathcal{E} \\
&\iff \hat\mu(E \setminus (H' \cup (X \setminus G))) = 0 \text{ for every } E \in \mathcal{E} \\
&\iff \hat\mu(H \setminus (H' \cup (X \setminus G))) = 0 \\
&\iff \hat\mu(H \cap G \setminus H') = 0 \\
&\iff \nu(H \setminus H') = 0.
\end{aligned}
\]

So H is also an essential supremum of ℱ in T. As ℱ is arbitrary, ν is localizable.
(d) Let ⟨X_i⟩_{i∈I} be a decomposition of X for µ; then it is also a decomposition for µ̂ (212Gb). Set F_0 = X \ G, F_n = {x : x ∈ G, n − 1 < f(x) ≤ n} for n ≥ 1. Then ⟨X_i ∩ F_n⟩_{i∈I,n∈N} is a decomposition for ν. P P (i) ⟨X_i⟩_{i∈I} and ⟨F_n⟩_{n∈N} are disjoint covers of X by members of Σ ⊆ T, so ⟨X_i ∩ F_n⟩_{i∈I,n∈N} also is. (ii) ν(X_i ∩ F_0) = 0, ν(X_i ∩ F_n) ≤ nµX_i < ∞ for i ∈ I, n ≥ 1. (iii) If E ⊆ X and E ∩ X_i ∩ F_n ∈ T for every i ∈ I and n ∈ N then E ∩ X_i ∩ G = ⋃_{n∈N} E ∩ X_i ∩ F_n ∩ G belongs to Σ̂ for every i, so E ∩ G ∈ Σ̂ and E ∈ T. (iv) If E ∈ T, then of course
\[ \sum_{i\in I,\,n\in\mathbb{N}} \nu(E\cap X_i\cap F_n) = \sup_{J\subseteq I\times\mathbb{N}\text{ is finite}} \sum_{(i,n)\in J} \nu(E\cap X_i\cap F_n) \le \nu E. \]
So if ∑_{i∈I,n∈N} ν(E ∩ X_i ∩ F_n) = ∞ it is surely equal to νE. If the sum is finite, then K = {i : i ∈ I, ν(E ∩ X_i) > 0} must be countable. But for i ∈ I \ K, ∫_{E∩X_i} f dµ = 0, so f = 0 µ-a.e. on E ∩ X_i, that is, µ̂(E ∩ G ∩ X_i) = 0. Because ⟨X_i⟩_{i∈I} is a decomposition for µ̂, µ̂(E ∩ G ∩ ⋃_{i∈I\K} X_i) = 0 and ν(E ∩ ⋃_{i∈I\K} X_i) = 0. But this means that
\[ \nu E = \sum_{i\in K} \nu(E\cap X_i) = \sum_{i\in K,\,n\in\mathbb{N}} \nu(E\cap X_i\cap F_n) = \sum_{i\in I,\,n\in\mathbb{N}} \nu(E\cap X_i\cap F_n). \]
As E is arbitrary, ⟨X_i ∩ F_n⟩_{i∈I,n∈N} is a decomposition for ν. Q Q So ν is strictly localizable.

(e) If µ is σ-finite, then in (d) we can take I to be countable, so that I × N is also countable, and ν will be σ-finite.
(f) If µ is atomless, so is µ̂ (212Gd). If E ∈ T and νE > 0, then µ̂(E ∩ G) > 0, so there is an F ∈ Σ̂ such that F ⊆ E ∩ G and neither F nor E ∩ G \ F is µ̂-negligible. But in this case both νF = ∫_F f dµ and ν(E \ F) = ∫_{E\F} f dµ must be greater than 0 (122Rc). As E is arbitrary, ν is atomless.

*234G For localizable measures, there is a straightforward description of the associated indefinite-integral measures.

Theorem Let (X, Σ, µ) be a localizable measure space. Then a measure ν, with domain T ⊇ Σ, is an indefinite-integral measure over µ iff (α) ν is semi-finite and zero on µ-negligible sets (β) ν is the completion of its restriction to Σ (γ) whenever νE > 0 there is an F ⊆ E such that F ∈ Σ and µF < ∞ and νF > 0.

proof (a) If ν is an indefinite-integral measure over µ, then by 234Fa, 234Cb and 234Db it is semi-finite, zero on µ-negligible sets and the completion of its restriction to Σ. Now suppose that E ∈ T and νE > 0. Then there is an E_0 ∈ Σ such that E_0 ⊆ E and νE_0 = νE, by 234Db. If f : X → R is a Σ-measurable Radon-Nikodým derivative of ν, and G = {x : f(x) > 0}, then µ(G ∩ E_0) > 0; because µ is semi-finite, there is an F ∈ Σ such that F ⊆ G ∩ E_0 and 0 < µF < ∞, in which case νF > 0.

(b) So now suppose that ν satisfies the conditions.

(i) Set ℰ = {E : E ∈ Σ, νE < ∞}. For each E ∈ ℰ, consider ν_E : Σ → R, setting ν_E G = ν(G ∩ E) for every G ∈ Σ. Then ν_E is countably additive and truly continuous with respect to µ. P P ν_E is countably additive, just as in 231De. Because ν is zero on µ-negligible sets, ν_E must be absolutely continuous with respect to µ, by 232Ba. Since ν_E clearly satisfies condition (γ) of 232Bb, it must be truly continuous. Q Q By 232E, there is a µ-integrable function f_E such that ν_E G = ∫_G f_E dµ for every G ∈ Σ, and we may suppose that f_E is Σ-measurable (232He). Because ν_E is non-negative, f_E ≥ 0 µ-almost everywhere.

(ii) If E, F ∈ ℰ then f_E = f_F µ-a.e. on E ∩ F, because
\[ \int_G f_E\,d\mu = \nu G = \int_G f_F\,d\mu \]
whenever G ∈ Σ and G ⊆ E ∩ F. Because (X, Σ, µ) is localizable, there is a measurable f : X → R such that f_E = f µ-a.e. on E for every E ∈ ℰ (213N). Because every f_E is non-negative almost everywhere, we may suppose that f is non-negative, since surely f_E = f ∨ 0 µ-a.e. on E for every E ∈ ℰ.

(iii) Let ν̃ be the indefinite-integral measure defined by f. If E ∈ ℰ then
\[ \nu E = \int_E f_E\,d\mu = \int_E f\,d\mu = \tilde\nu E. \]
For E ∈ Σ \ ℰ, we have
ν̃E ≥ sup{ν̃F : F ∈ ℰ, F ⊆ E} = sup{νF : F ∈ ℰ, F ⊆ E} = νE = ∞
because ν is semi-finite. Thus ν̃ and ν agree on Σ. But since each is the completion of its restriction to Σ, they must be equal.

234X Basic exercises (a) Let µ be Lebesgue measure on [0, 1], and set f(x) = 1/x for x > 0. Let ν be the associated indefinite-integral measure. Show that the domain of ν is equal to the domain of µ. Show that for every δ ∈ ]0, 1/2] there is a measurable set E such that µE = δ but νE = 1/δ.
(b) Let (X, Σ, µ) be a measure space, and ν an indefinite-integral measure over µ. Show that if µ is purely atomic, so is ν.
(c) Let µ be a point-supported measure. Show that any indefinite-integral measure over µ is point-supported.

234Y Further exercises (a) Let (X, Σ, µ) be a semi-finite measure space which is not localizable. Show that there is a measure ν : Σ → [0, ∞] such that νE ≤ µE for every E ∈ Σ but there is no measurable function f such that νE = ∫_E f dµ for every E ∈ Σ.
(b) Let (X, Σ, µ) be a localizable measure space with locally determined negligible sets. Show that a measure ν, with domain T ⊇ Σ, is an indefinite-integral measure over µ iff (α) ν is complete and semi-finite and zero on µ-negligible sets (β) whenever νE > 0 there is an F ⊆ E such that F ∈ Σ and µF < ∞ and νF > 0.
(c) Give an example of a localizable measure space (X, Σ, µ) and a complete semi-finite measure ν on X, defined on a σ-algebra T ⊇ Σ, zero on µ-negligible sets, and such that whenever νE > 0 there is an F ⊆ E such that F ∈ Σ and µF < ∞ and νF > 0, but ν is not an indefinite-integral measure over µ. (Hint: 216Yb.)
(d) Let (X, Σ, µ) be an atomless semi-finite measure space and ν an indefinite-integral measure over µ. Show that the following are equiveridical: (i) for every ε > 0 there is a δ > 0 such that νE ≤ ε whenever µE ≤ δ (ii) ν has a Radon-Nikodým derivative expressible as the sum of a bounded function and an integrable function.
(e) Let (X, Σ, µ) be a localizable measure space, and ν a complete localizable measure on X, with domain T ⊇ Σ, which is the completion of its restriction to Σ. Show that if we set ν_1F = sup{ν(F ∩ E) : E ∈ Σ, µE < ∞} for every F ∈ T, then ν_1 is an indefinite-integral measure over µ, and there is a µ-conegligible set H such that ν_1F = ν(F ∩ H) for every F ∈ T.
(f) Let (X, Σ, µ) be a measure space and ν an indefinite-integral measure over µ, with Radon-Nikodým derivative f. Show that the c.l.d. version of ν is the indefinite-integral measure defined by f over the c.l.d. version of µ.

234 Notes and comments I have taken this section very carefully because the ideas I wish to express here, in so far as they extend the work of §232, rely critically on the details of the definition in 234A, and it is easy to make a false step once we have left the relatively sheltered context of complete σ-finite measures. I believe that if we take a little trouble at this point we can develop a theory (234C-234F) which will offer a smooth path to later applications; to see what I have in mind, you can refer to the entries under 'indefinite-integral measure' in the index. For the moment I mention only a kind of Radon-Nikodým theorem for localizable measures (234G).
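The second part of exercise 234Xa can be explored numerically. The sketch below is only an illustration, not part of the text: it assumes that E is taken to be an interval [a, a + δ], for which νE = ln((a + δ)/a), and solves for the left endpoint a. The function names `interval_for` and `nu_of_interval` are hypothetical helpers introduced here.

```python
import math

def interval_for(delta):
    """Return (a, b) with b - a = delta and the integral of 1/x over [a, b] equal to 1/delta.

    Solving ln((a + delta) / a) = 1/delta gives a = delta / (exp(1/delta) - 1).
    """
    a = delta / math.expm1(1.0 / delta)
    return a, a + delta

def nu_of_interval(a, b):
    """nu[a, b] for the indefinite-integral measure with density 1/x, 0 < a < b."""
    return math.log(b / a)

for delta in (0.5, 0.25, 0.1, 0.05):
    a, b = interval_for(delta)
    # mu E = b - a = delta, while nu E should be 1/delta
    print(delta, a, b, nu_of_interval(a, b))
```

For δ ≤ 1/2 the right endpoint a + δ stays inside [0, 1], which is why the exercise restricts δ to ]0, 1/2]. For very small δ the call to `math.expm1` overflows, so the sketch is meant for moderate values only.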
235 Measurable transformations
I turn now to a topic which is separate from the Radon-Nikodým theorem, but which seems to fit better here than in either of the next two chapters. I seek to give results which will generalize the basic formula of calculus
\[ \int g(y)\,dy = \int g(\phi(x))\,\phi'(x)\,dx \]
in the context of a general transformation φ between measure spaces. The principal results are I suppose 235A/235E, which are very similar expressions of the basic idea, and 235L, which gives a general criterion for a stronger result. A formulation from a different direction is in 235T.

235A I start with the basic result, which is already sufficient for a large proportion of the applications I have in mind.

Theorem Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and φ : D_φ → Y, J : D_J → [0, ∞[ functions defined on conegligible subsets D_φ, D_J of X such that
\[ \int J\times\chi(\phi^{-1}[F])\,d\mu \text{ exists} = \nu F \]
whenever F ∈ T and νF < ∞. Then
\[ \int_{\phi^{-1}[H]} J\times g\phi\,d\mu \text{ exists} = \int_H g\,d\nu \]
for every ν-integrable function g taking values in [−∞, ∞] and every H ∈ T, provided that we interpret (J × gφ)(x) as 0 when J(x) = 0 and g(φ(x)) is undefined.

proof (a) If g is a simple function, say g = ∑_{i=0}^n a_i χF_i where νF_i < ∞ for each i, then
\[ \int J\times g\phi\,d\mu = \sum_{i=0}^n a_i \int J\times\chi(\phi^{-1}[F_i])\,d\mu = \sum_{i=0}^n a_i\,\nu F_i = \int g\,d\nu. \]
(b) If νF = 0 then ∫ J × χ(φ⁻¹[F]) = 0 so J = 0 a.e. on φ⁻¹[F]. So if g is defined ν-a.e., J = 0 µ-a.e. on X \ dom(gφ) = (X \ D_φ) ∪ φ⁻¹[Y \ dom g], and, on the convention proposed, J × gφ is defined µ-a.e. Moreover, if lim_{n→∞} g_n = g ν-a.e., then lim_{n→∞} J × g_nφ = J × gφ µ-a.e. So if ⟨g_n⟩_{n∈N} is a non-decreasing sequence of simple functions converging almost everywhere to g, ⟨J × g_nφ⟩_{n∈N} will be a non-decreasing sequence of integrable functions converging almost everywhere to J × gφ; by B.Levi's theorem,
\[ \int J\times g\phi\,d\mu \text{ exists} = \lim_{n\to\infty}\int J\times g_n\phi\,d\mu = \lim_{n\to\infty}\int g_n\,d\nu = \int g\,d\nu. \]

(c) If g = g⁺ − g⁻, where g⁺ and g⁻ are ν-integrable functions, then
\[ \int J\times g\phi\,d\mu = \int J\times g^+\phi\,d\mu - \int J\times g^-\phi\,d\mu = \int g^+\,d\nu - \int g^-\,d\nu = \int g\,d\nu. \]
(d) This deals with the case H = Y. For the general case, we have
\[ \begin{aligned} \int_H g\,d\nu &= \int (g\times\chi H)\,d\nu && \text{(131Fa)} \\ &= \int J\times(g\times\chi H)\phi\,d\mu = \int J\times g\phi\times\chi(\phi^{-1}[H])\,d\mu = \int_{\phi^{-1}[H]} J\times g\phi\,d\mu \end{aligned} \]
by 214F.

235B Remarks (a) Note the particular convention 0 × undefined = 0 which I am applying to the interpretation of J × gφ. This is the first of a number of technical points which will concern us in this section. The point is that if g is defined ν-almost everywhere, then for any extension of g to a function g_1 : Y → R we shall, on this convention, have J × gφ = J × g_1φ except on {x : J(x) > 0, φ(x) ∈ Y \ dom g}, which is negligible; so that
\[ \int J\times g\phi\,d\mu = \int J\times g_1\phi\,d\mu = \int g_1\,d\nu = \int g\,d\nu \]
if g and g_1 are integrable. Thus the convention is appropriate here, and while it adds a phrase to the statements of many of the results of this section, it makes their application smoother. (But I ought to insist that I am using this as a local convention only, and the ordinary rule 0 × undefined = undefined will stand elsewhere in this treatise unless explicitly overruled.)

(b) I have had to take care in the formulation of this theorem to distinguish between the hypothesis
\[ \int J(x)\,\chi(\phi^{-1}[F])(x)\,\mu(dx) \text{ exists} = \nu F \text{ whenever } \nu F < \infty \]
and the perhaps more elegant alternative
\[ \int_{\phi^{-1}[F]} J(x)\,\mu(dx) \text{ exists} = \nu F \text{ whenever } \nu F < \infty, \]
which is not quite adequate for the theorem. (See 235S below.) Recall that by ∫_A f I mean ∫ (f↾A) dµ_A, where µ_A is the subspace measure on A (214D). It is possible for ∫_A (f↾A) dµ_A to be defined even when ∫ f × χA dµ is not; for instance, take µ to be Lebesgue measure on [0, 1], A any non-measurable subset of [0, 1], and f the constant function with value 1; then ∫_A f = µ*A, but f × χA = χA is not µ-integrable. It is however the case that if ∫ f × χA dµ is defined, then so is ∫_A f, and the two are equal; this is a consequence of 214F.

While 235R shows that in most of the cases relevant to the present volume the distinction can be passed over, it is important to avoid assuming that φ⁻¹[F] is measurable for every F ∈ T. A simple example is the following. Set X = Y = [0, 1]. Let µ be Lebesgue measure on [0, 1], and define ν by setting
T = {F : F ⊆ [0, 1], F ∩ [0, 1/2] is Lebesgue measurable},
νF = 2µ(F ∩ [0, 1/2]) for every F ∈ T.
Set φ(x) = x for every x ∈ [0, 1]. Then we have
\[ \nu F = \int_F J\,d\mu = \int J\times\chi(\phi^{-1}[F])\,d\mu \]
for every F ∈ T, where J(x) = 2 for x ∈ [0, 1/2] and J(x) = 0 for x ∈ ]1/2, 1]. But of course there are subsets F of [1/2, 1] which are not Lebesgue measurable (see 134D), and such an F necessarily belongs to T, even though φ⁻¹[F] does not belong to the domain Σ of µ. The point here is that if νF_0 = 0 then we expect to have J = 0 on φ⁻¹[F_0], and it is of no importance whether φ⁻¹[F] is measurable for F ⊆ F_0.
235C Theorem 235A is concerned with integration, and accordingly the hypothesis ∫ J × χ(φ⁻¹[F]) dµ = νF looks only at sets F of finite measure. If we wish to consider measurability of non-integrable functions, we need a slightly stronger hypothesis. I approach this version more gently, with a couple of lemmas.

Lemma Let Σ, T be σ-algebras of subsets of X and Y respectively. Suppose that D ⊆ X and that φ : D → Y is a function such that φ⁻¹[F] ∈ Σ_D, the subspace σ-algebra, for every F ∈ T. Then gφ is Σ-measurable for every [−∞, ∞]-valued T-measurable function g defined on a subset of Y.

proof Set C = dom g and B = dom gφ = φ⁻¹[C]. If a ∈ R, then there is an F ∈ T such that {y : g(y) ≤ a} = F ∩ C. Now there is an E ∈ Σ such that φ⁻¹[F] = E ∩ D. So {x : gφ(x) ≤ a} = B ∩ E ∈ Σ_B. As a is arbitrary, gφ is Σ-measurable.

235D Some of the results below are easier when we can move freely between measure spaces and their completions (212C). The next lemma is what we need.

Lemma Let (X, Σ, µ) and (Y, T, ν) be measure spaces, with completions (X, Σ̂, µ̂) and (Y, T̂, ν̂). Let φ : D_φ → Y, J : D_J → [0, ∞[ be functions defined on conegligible subsets of X.
(a) If ∫ J × χ(φ⁻¹[F]) dµ = νF whenever F ∈ T and νF < ∞, then ∫ J × χ(φ⁻¹[F]) dµ̂ = ν̂F whenever F ∈ T̂ and ν̂F < ∞.
(b) If ∫ J × χ(φ⁻¹[F]) dµ = νF whenever F ∈ T, then ∫ J × χ(φ⁻¹[F]) dµ̂ = ν̂F whenever F ∈ T̂.

proof Both rely on the fact that either hypothesis is enough to ensure that ∫ J × χ(φ⁻¹[F]) dµ = 0 whenever νF = 0. Accordingly, if F is ν-negligible, so that there is an F′ ∈ T such that F ⊆ F′ and νF′ = 0, we shall have
\[ \int J\times\chi(\phi^{-1}[F])\,d\mu = \int J\times\chi(\phi^{-1}[F'])\,d\mu = 0. \]
But now, given any F ∈ T̂, there is an F_0 ∈ T such that F_0 ⊆ F and ν̂(F \ F_0) = 0, so that
\[ \begin{aligned} \int J\times\chi(\phi^{-1}[F])\,d\hat\mu &= \int J\times\chi(\phi^{-1}[F])\,d\mu \\ &= \int J\times\chi(\phi^{-1}[F_0])\,d\mu + \int J\times\chi(\phi^{-1}[F\setminus F_0])\,d\mu = \nu F_0 = \hat\nu F, \end{aligned} \]
provided (for part (a)) that ν̂F < ∞.

Remark Thus if we have the hypotheses of any of the principal results of this section valid for a pair of non-complete measure spaces, we can expect to be able to draw some conclusion by applying the result to the completions of the given spaces.

235E Now I come to the alternative version of 235A.
Theorem Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and φ : D_φ → Y, J : D_J → [0, ∞[ two functions defined on conegligible subsets of X such that
\[ \int J\times\chi(\phi^{-1}[F])\,d\mu = \nu F \]
for every F ∈ T, allowing ∞ as a value of the integral.
(a) J × gφ is µ-virtually measurable for every ν-virtually measurable function g defined on a subset of Y.
(b) Let g be a ν-virtually measurable [−∞, ∞]-valued function defined on a conegligible subset of Y. Then ∫ J × gφ dµ = ∫ g dν whenever either integral is defined in [−∞, ∞], if we interpret (J × gφ)(x) as 0 when J(x) = 0 and g(φ(x)) is undefined.

proof Let (X, Σ̂, µ̂) and (Y, T̂, ν̂) be the completions of (X, Σ, µ) and (Y, T, ν). By 235D,
\[ \int J\times\chi(\phi^{-1}[F])\,d\hat\mu = \hat\nu F \]
for every F ∈ T̂. Recalling that a real-valued function is µ-virtually measurable iff it is Σ̂-measurable (212Fa), and that ∫ f dµ = ∫ f dµ̂ if either is defined in [−∞, ∞] (212Fb), the conclusions we are seeking are
(a)′ J × gφ is Σ̂-measurable for every T̂-measurable function g defined on a subset of Y;
(b)′ ∫ J × gφ dµ̂ = ∫ g dν̂ whenever g is a T̂-measurable function defined almost everywhere in Y and either integral is defined in [−∞, ∞].

(a) When I write
\[ \int J\times\chi D_\phi\,d\mu = \int J\times\chi(\phi^{-1}[Y])\,d\mu = \nu Y, \]
which is part of the hypothesis of this theorem, I mean to imply that J × χD_φ is µ-virtually measurable, that is, is Σ̂-measurable. Because D_φ is conegligible, it follows that J is Σ̂-measurable, and its domain D_J, being conegligible, also belongs to Σ̂. Set G = {x : x ∈ D_J, J(x) > 0} ∈ Σ̂. Then for any set A ⊆ X, J × χA is Σ̂-measurable iff A ∩ G ∈ Σ̂. So the hypothesis is just that G ∩ φ⁻¹[F] ∈ Σ̂ for every F ∈ T̂.

Now let g be a [−∞, ∞]-valued function, defined on a subset C of Y, which is T̂-measurable. Applying 235C to φ↾G, we see that gφ↾G is Σ̂-measurable, so (J × gφ)↾G is Σ̂-measurable. On the other hand, J × gφ is zero almost everywhere on X \ G, so (because G ∈ Σ̂) J × gφ is Σ̂-measurable, as required.

(b)(i) Suppose first that g ≥ 0. Then J × gφ ≥ 0, so (a) tells us that ∫ J × gφ is defined in [0, ∞].

(α) If ∫ g dν̂ < ∞ then ∫ J × gφ dµ̂ = ∫ g dν̂ by 235A.

(β) If there is some ε > 0 such that ν̂H = ∞, where H = {y : g(y) ≥ ε}, then
\[ \int J\times g\phi\,d\hat\mu \ge \epsilon\int J\times\chi(\phi^{-1}[H])\,d\hat\mu = \epsilon\,\hat\nu H = \infty, \]
so
\[ \int J\times g\phi\,d\hat\mu = \infty = \int g\,d\hat\nu. \]

(γ) Otherwise,
\[ \begin{aligned} \int J\times g\phi\,d\hat\mu &\ge \sup\Bigl\{\int J\times h\phi\,d\hat\mu : h \text{ is } \hat\nu\text{-integrable},\ 0\le h\le g\Bigr\} \\ &= \sup\Bigl\{\int h\,d\hat\nu : h \text{ is } \hat\nu\text{-integrable},\ 0\le h\le g\Bigr\} = \int g\,d\hat\nu = \infty, \end{aligned} \]
so once again
\[ \int J\times g\phi\,d\hat\mu = \int g\,d\hat\nu. \]
(ii) For general real-valued g, apply (i) to g⁺ and g⁻ where g⁺ = ½(|g| + g), g⁻ = ½(|g| − g); the point is that (J × gφ)⁺ = J × g⁺φ and (J × gφ)⁻ = J × g⁻φ, so that
\[ \int J\times g\phi = \int J\times g^+\phi - \int J\times g^-\phi = \int g^+ - \int g^- = \int g \]
if either side is defined in [−∞, ∞].

235F Remarks (a) Of course there are two special cases of this theorem which between them carry all its content: the case J ≡ 1 and the case in which X = Y and φ is the identity function. If J = χX we are very close to 235I below, and if φ is the identity function we are close to the indefinite-integral measures of §234.

(b) As in 235A, we can strengthen the conclusion of (b) in 235E to
\[ \int_{\phi^{-1}[F]} J\times g\phi\,d\mu = \int_F g\,d\nu \]
whenever F ∈ T and ∫_F g dν is defined in [−∞, ∞].
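The formula of 235A/235E can be checked numerically in the classical setting. The following is a minimal sketch under assumptions of my own choosing, not taken from the text: X = Y = [0, 1] with Lebesgue measure, φ(x) = x², J(x) = 2x (so that ∫ J × χ(φ⁻¹[0, t]) dµ = t = ν[0, t]) and g = cos. The helper `midpoint_integral` is a hypothetical name, and the midpoint rule stands in for the Lebesgue integral of these continuous integrands.

```python
import math

def midpoint_integral(f, lo, hi, n=200_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

def phi(x):
    # transformation from X = [0, 1] to Y = [0, 1]
    return x * x

def J(x):
    # J = phi', chosen so that the integral of J over phi^{-1}[0, t] = [0, sqrt(t)] is t
    return 2.0 * x

g = math.cos  # a bounded, hence integrable, test function on Y

lhs = midpoint_integral(lambda x: J(x) * g(phi(x)), 0.0, 1.0)  # int J x g(phi) dmu
rhs = midpoint_integral(g, 0.0, 1.0)                           # int g dnu
print(lhs, rhs, math.sin(1.0))
```

Both sides should agree with sin 1 up to the discretization error of the midpoint rule.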
235G Inverse-measure-preserving functions It is high time that I introduced the nearest thing in measure theory to a 'morphism'. If (X, Σ, µ) and (Y, T, ν) are measure spaces, a function φ : X → Y is inverse-measure-preserving if φ⁻¹[F] ∈ Σ and µφ⁻¹[F] = νF for every F ∈ T.

235H Proposition (a) Let (X, Σ, µ), (Y, T, ν) and (Z, Λ, λ) be measure spaces, and φ : X → Y, ψ : Y → Z two inverse-measure-preserving maps. Then ψφ : X → Z is inverse-measure-preserving.
(b) Let (X, Σ, µ) be a measure space and T a σ-subalgebra of Σ. Then the identity map is an inverse-measure-preserving function from (X, Σ, µ) to (X, T, µ↾T).
(c) Let (X, Σ, µ) and (Y, T, ν) be measure spaces with completions (X, Σ̂, µ̂) and (Y, T̂, ν̂). Let φ be an inverse-measure-preserving function from (X, Σ, µ) to (Y, T, ν). Then φ is also inverse-measure-preserving from (X, Σ̂, µ̂) to (Y, T̂, ν̂).

proof (a) For any W ∈ Λ,
µ(ψφ)⁻¹[W] = µφ⁻¹[ψ⁻¹[W]] = νψ⁻¹[W] = λW.
(b) is surely obvious.
(c) When we say that φ is inverse-measure-preserving for µ and ν, we are saying that the image measure µφ⁻¹ extends ν; of course it follows that the image measure µ̂φ⁻¹ extends ν. By 212Bd, µ̂φ⁻¹ is a complete measure, so must extend ν̂; that is, φ is inverse-measure-preserving for µ̂ and ν̂.

Remark Of course (c) here can also be deduced from 235Db.

235I Theorem Let (X, Σ, µ) and (Y, T, ν) be measure spaces and φ : X → Y an inverse-measure-preserving function. Then
(a) if g is a ν-virtually measurable [−∞, ∞]-valued function defined on a subset of Y, gφ is µ-virtually measurable;
(b) if g is a ν-virtually measurable [−∞, ∞]-valued function defined on a conegligible subset of Y, ∫ gφ dµ = ∫ g dν if either integral is defined in [−∞, ∞];
(c) if g is a ν-virtually measurable [−∞, ∞]-valued function defined on a conegligible subset of Y, and F ∈ T, then ∫_{φ⁻¹[F]} gφ dµ = ∫_F g dν if either integral is defined in [−∞, ∞].
proof (a) This follows immediately from 235Hc and 235C; in the notation of 235Hc, φ⁻¹[F] ∈ Σ̂ for every F ∈ T̂, so if g is T̂-measurable then gφ will be Σ̂-measurable.

(b) Apply 235E with J = χX; we have
\[ \int J\times\chi(\phi^{-1}[F])\,d\mu = \mu\phi^{-1}[F] = \nu F \]
for every F ∈ T, so
\[ \int g\phi = \int J\times g\phi = \int g \]
if either integral is defined in [−∞, ∞].

(c) Apply (b) to g × χF.

235J The image measure catastrophe Applications of 235A would run much more smoothly if we could say
'∫ g dν exists and is equal to ∫ J × gφ dµ for every g : Y → R such that J × gφ is µ-integrable'.
Unhappily there is no hope of a universally applicable result in this direction. Suppose, for instance, that ν is Lebesgue measure on Y = [0, 1], that X ⊆ [0, 1] is a non-Lebesgue-measurable set of outer measure 1 (134D), that µ is the subspace measure ν_X on X, and that φ(x) = x for x ∈ X. Then
\[ \mu\phi^{-1}[F] = \nu^*(X\cap F) = \nu F \]
for every Lebesgue measurable set F ⊆ Y, so we can take J = χX and the hypotheses of 235A and 235E will be satisfied. But if we write g = χX : [0, 1] → {0, 1}, then ∫ gφ dµ is defined even though ∫ g dν is not.

The point here is that there is a set A ⊆ Y such that (in the language of 235A/235E) φ⁻¹[A] ∈ Σ but A ∉ T̂. This is the image measure catastrophe. The search for contexts in which we can be sure that it does not occur will be one of the motive themes of Volume 4. For the moment, I will offer some general remarks (235K-235L), and describe one of the important cases in which the problem does not arise (235M).

235K Lemma Let Σ, T be σ-algebras of subsets of X, Y respectively, and φ a function from a subset D of X to Y. Suppose that G ⊆ X and that
T = {F : F ⊆ Y, G ∩ φ⁻¹[F] ∈ Σ}.
Then a real-valued function g, defined on a member of T, is T-measurable iff χG × gφ is Σ-measurable.

proof Because surely Y ∈ T, the hypothesis implies that G ∩ D = G ∩ φ⁻¹[Y] belongs to Σ. Let g : C → R be a function, where C ∈ T. Set B = dom(gφ) = φ⁻¹[C], and for a ∈ R set
F_a = {y : g(y) ≥ a}, E_a = G ∩ φ⁻¹[F_a] = {x : x ∈ G ∩ B, gφ(x) ≥ a}.
Note that G ∩ B ∈ Σ because C ∈ T.

(i) If g is T-measurable, then F_a ∈ T and E_a ∈ Σ for every a. Now
G ∩ {x : x ∈ B, gφ(x) ≥ a} = G ∩ φ⁻¹[F_a] = E_a,
so {x : x ∈ B, (χG × gφ)(x) ≥ a} is either E_a or E_a ∪ (B \ G), and in either case is relatively Σ-measurable in B. As a is arbitrary, χG × gφ is Σ-measurable.

(ii) If χG × gφ is Σ-measurable, then, for any a ∈ R,
E_a = {x : x ∈ G ∩ B, (χG × gφ)(x) ≥ a} ∈ Σ
because G ∩ B ∈ Σ and χG × gφ is Σ-measurable. So F_a ∈ T. As a is arbitrary, g is T-measurable.

235L Theorem Let (X, Σ, µ) and (Y, T, ν) be complete measure spaces. Let φ : D_φ → Y, J : D_J → [0, ∞[ be functions defined on conegligible subsets of X, and set G = {x : x ∈ D_J, J(x) > 0}. Suppose that
\[ \mathrm{T} = \{F : F\subseteq Y,\ G\cap\phi^{-1}[F]\in\Sigma\}, \qquad \nu F = \int J\times\chi(\phi^{-1}[F])\,d\mu \text{ for every } F\in\mathrm{T}. \]
Then, for any real-valued function g defined on a subset of Y, ∫ J × gφ dµ = ∫ g dν whenever either integral is defined in [−∞, ∞], provided that we interpret (J × gφ)(x) as 0 when J(x) = 0 and g(φ(x)) is undefined.
proof If g is T-measurable and defined almost everywhere, this is a consequence of 235E. So I have to show that if J × gφ is measurable and defined almost everywhere, so is g. Set W = Y \ dom g. Then J × gφ is undefined on G ∩ φ⁻¹[W], because gφ is undefined there and we cannot take advantage of the escape clause available when J = 0; so G ∩ φ⁻¹[W] must be negligible, therefore measurable, and W ∈ T. Next,
\[ \nu W = \int J\times\chi(\phi^{-1}[W]) = 0 \]
because J × χ(φ⁻¹[W]) can be non-zero only on the negligible set G ∩ φ⁻¹[W]. So g is defined almost everywhere.

Note that the hypothesis surely implies that J × χD_φ = J × χ(φ⁻¹[Y]) is measurable, so that J is measurable (because D_φ is conegligible) and G ∈ Σ. Writing K(x) = 1/J(x) for x ∈ G, 0 for x ∈ X \ G, the function K : X → R is measurable, and
χG × gφ = K × J × gφ
is measurable. So 235K tells us that g must be measurable, and we're done.

Remark When J = χX, the hypothesis of this theorem becomes
T = {F : F ⊆ Y, φ⁻¹[F] ∈ Σ}, νF = µφ⁻¹[F] for every F ∈ T;
that is, ν is the image measure µφ⁻¹ as defined in 112E.
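On a finite set, where every subset is measurable and integrals are finite sums, the case J = χX of 235L (equivalently 235Ib) can be verified exactly. The sketch below is an illustration under that finiteness assumption; the names `image_measure` and `integrate`, and the particular choice of µ, φ and g, are hypothetical and not part of the text.

```python
from collections import defaultdict
from fractions import Fraction

def image_measure(mu, phi):
    """Point masses of the image measure mu phi^{-1}: nu{y} = mu(phi^{-1}[{y}])."""
    nu = defaultdict(Fraction)
    for x, mass in mu.items():
        nu[phi(x)] += mass
    return dict(nu)

def integrate(g, measure):
    """Integral of g with respect to a measure given by its point masses."""
    return sum(g(point) * mass for point, mass in measure.items())

def phi(x):
    # a map from X = {0, ..., 5} onto Y = {0, 1, 2}
    return x % 3

def g(y):
    return y * y + 1

mu = {x: Fraction(1, 6) for x in range(6)}   # uniform probability on X
nu = image_measure(mu, phi)

lhs = integrate(lambda x: g(phi(x)), mu)   # integral of g(phi) with respect to mu
rhs = integrate(g, nu)                     # integral of g with respect to nu
print(nu, lhs, rhs)
```

Exact rational arithmetic is used so that the two integrals can be compared for equality rather than up to rounding.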
235M Corollary Let (X, Σ, µ) be a complete measure space, and J a non-negative measurable function defined on a conegligible subset of X. Let ν be the associated indefinite-integral measure, and T its domain. Then for any real-valued function g defined on a subset of X, g is T-measurable iff J × g is Σ-measurable, and ∫ g dν = ∫ J × g dµ if either integral is defined in [−∞, ∞], provided that we interpret (J × g)(x) as 0 when J(x) = 0 and g(x) is undefined.

proof Put 235L, taking Y = X and φ the identity function, together with 234Dd.

235N Applying the Radon-Nikodým theorem In order to use 235A-235L effectively, we need to be able to find suitable functions J. This can be difficult; some very special examples will take up most of Chapter 26 below. But there are many circumstances in which we can be sure that such J exist, even if we do not know what they are. A minimal requirement is that if νF < ∞ and µ*φ⁻¹[F] = 0 then νF = 0, because ∫ J × χ(φ⁻¹[F]) dµ will be zero for any J. A sufficient condition, in the special case of indefinite-integral measures, is in 234G. Another is the following.

235O Theorem Let (X, Σ, µ) be a σ-finite measure space, (Y, T, ν) a semi-finite measure space, and φ : D → Y a function such that
(i) D is a conegligible subset of X,
(ii) φ⁻¹[F] ∈ Σ for every F ∈ T;
(iii) µφ⁻¹[F] > 0 whenever F ∈ T and νF > 0.
Then there is a Σ-measurable function J : X → [0, ∞[ such that ∫ J × χφ⁻¹[F] dµ = νF for every F ∈ T.

proof (a) To begin with (down to the end of (c) below) let us suppose that D = X and that ν is totally finite.

Set T̃ = {φ⁻¹[F] : F ∈ T} ⊆ Σ. Then T̃ is a σ-algebra of subsets of X. P P (i) ∅ = φ⁻¹[∅] ∈ T̃. (ii) If E ∈ T̃, take F ∈ T such that E = φ⁻¹[F], so that X \ E = φ⁻¹[Y \ F] ∈ T̃. (iii) If ⟨E_n⟩_{n∈N} is any sequence in T̃, then for each n ∈ N choose F_n ∈ T such that E_n = φ⁻¹[F_n]; then ⋃_{n∈N} E_n = φ⁻¹[⋃_{n∈N} F_n] ∈ T̃. Q Q

Next, we have a totally finite measure ν̃ : T̃ → [0, νY] given by setting
ν̃(φ⁻¹[F]) = νF for every F ∈ T.
P P (i) If F, F′ ∈ T and φ⁻¹[F] = φ⁻¹[F′], then φ⁻¹[F △ F′] = ∅, so µ(φ⁻¹[F △ F′]) = 0 and ν(F △ F′) = 0; consequently νF = νF′. This shows that ν̃ is well-defined. (ii) Now ν̃∅ = ν̃(φ⁻¹[∅]) = ν∅ = 0. (iii) If ⟨E_n⟩_{n∈N} is a disjoint sequence in T̃, let ⟨F_n⟩_{n∈N} be a sequence in T such that E_n = φ⁻¹[F_n] for each n; set F′_n = F_n \ ⋃_{m<n} F_m for each n; then E_n = φ⁻¹[F′_n] for each n, because the E_n are disjoint, so ν̃(⋃_{n∈N} E_n) = ν(⋃_{n∈N} F′_n) = ∑_{n=0}^∞ νF′_n = ∑_{n=0}^∞ ν̃E_n. Q Q

Finally, observe that if E ∈ T̃ and ν̃E > 0 then µE > 0, because E = φ⁻¹[F] where νF > 0.

(b) By 215B(ix) there is a Σ-measurable function h : X → ]0, ∞[ such that ∫ h dµ is finite. Define µ̃ : T̃ → [0, ∞[ by setting µ̃E = ∫_E h dµ for every E ∈ T̃; then µ̃ is a totally finite measure. If E ∈ T̃ and µ̃E = 0, then (because h is strictly positive) µE = 0 and ν̃E = 0. Accordingly we may apply the Radon-Nikodým theorem to µ̃ and ν̃ to see that there is a T̃-measurable function g : X → R such that ∫_E g dµ̃ = ν̃E for every E ∈ T̃. Because ν̃ is non-negative, we may suppose that g ≥ 0.

(c) Applying 235A to µ, µ̃, h and the identity function from X to itself, we see that
\[ \int_E g\times h\,d\mu = \int_E g\,d\tilde\mu = \tilde\nu E \]
for every E ∈ T̃, that is, that
\[ \int J\times\chi(\phi^{-1}[F])\,d\mu = \nu F \]
for every F ∈ T, writing J = g × h.

(d) This completes the proof when ν is totally finite and D = X. For the general case, if Y = ∅ then µX = 0 and the result is trivial. Otherwise, let φ̂ be any extension of φ to a function from X to Y which is constant on X \ D; then φ̂⁻¹[F] ∈ Σ for every F ∈ T, because D = φ⁻¹[Y] ∈ Σ and φ̂⁻¹[F] is always either φ⁻¹[F] or (X \ D) ∪ φ⁻¹[F]. Now ν must be σ-finite. P P Use the criterion of 215B(ii). If ℱ is a disjoint family in {F : F ∈ T, 0 < νF < ∞}, then ℰ = {φ̂⁻¹[F] : F ∈ ℱ} is a disjoint family in {E : µE > 0}, so ℰ and ℱ are countable. Q Q Let ⟨Y_n⟩_{n∈N} be a partition of Y into sets of finite ν-measure, and for each n ∈ N set ν_nF = ν(F ∩ Y_n) for every F ∈ T. Then ν_n is a totally finite measure on Y, and if ν_nF > 0 then νF > 0 so
µφ̂⁻¹[F] = µφ⁻¹[F] > 0.
Accordingly µ, φ̂ and ν_n satisfy the assumptions of the theorem together with those of (a) above, and there is a Σ-measurable function J_n : X → [0, ∞[ such that
\[ \nu_n F = \int J_n\times\chi(\phi^{-1}[F])\,d\mu \]
for every F ∈ T. Now set J = ∑_{n=0}^∞ J_n × χ(φ⁻¹[Y_n]), so that J : X → [0, ∞[ is Σ-measurable. If F ∈ T, then
\[ \begin{aligned} \int J\times\chi(\phi^{-1}[F])\,d\mu &= \sum_{n=0}^\infty \int J_n\times\chi(\phi^{-1}[Y_n])\times\chi(\phi^{-1}[F])\,d\mu \\ &= \sum_{n=0}^\infty \int J_n\times\chi(\phi^{-1}[F\cap Y_n])\,d\mu = \sum_{n=0}^\infty \nu(F\cap Y_n) = \nu F, \end{aligned} \]
as required.

235P Remark Theorem 235O can fail if µ is only strictly localizable rather than σ-finite. P P Let X = Y be an uncountable set, Σ = PX, µ counting measure on X (112Bd), T the countable-cocountable σ-algebra of Y, ν the countable-cocountable measure on Y (211R), φ : X → Y the identity map. Then φ⁻¹[F] ∈ Σ and µφ⁻¹[F] > 0 whenever νF > 0. But if J is any µ-integrable function on X, then F = {x : J(x) ≠ 0} is countable and
\[ \nu(Y\setminus F) = 1 \ne 0 = \int_{\phi^{-1}[Y\setminus F]} J\,d\mu. \]
Q Q
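For finite X and Y the function J of 235O can be written down explicitly: J is constant on each fibre φ⁻¹[{y}], with value ν{y} divided by the µ-mass of the fibre. The sketch below illustrates this special case only (it is not the construction used in the proof, which goes through the Radon-Nikodým theorem); the function names and the sample data are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations

def density(mu, nu, phi):
    """Return J on X with sum over phi^{-1}[F] of J(x) * mu{x} equal to nu F for all F."""
    fibre_mass = {}
    for x, mass in mu.items():
        fibre_mass[phi[x]] = fibre_mass.get(phi[x], Fraction(0)) + mass
    # condition (iii) of 235O: nu{y} > 0 must force mu(phi^{-1}[{y}]) > 0
    for y, mass in nu.items():
        if mass > 0 and fibre_mass.get(y, 0) == 0:
            raise ValueError(f"condition (iii) fails at {y!r}")
    J = {}
    for x in mu:
        m = fibre_mass[phi[x]]
        J[x] = nu.get(phi[x], Fraction(0)) / m if m > 0 else Fraction(0)
    return J

def integral_over_preimage(J, mu, phi, F):
    """Integral of J x chi(phi^{-1}[F]) with respect to mu, as a finite sum."""
    return sum(J[x] * mu[x] for x in mu if phi[x] in F)

def all_subsets(points):
    points = list(points)
    return chain.from_iterable(combinations(points, r) for r in range(len(points) + 1))

mu = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1), 4: Fraction(2)}
phi = {0: "a", 1: "a", 2: "b", 3: "b", 4: "c"}
nu = {"a": Fraction(3), "b": Fraction(0), "c": Fraction(1, 2)}

J = density(mu, nu, phi)
for F in all_subsets(nu):
    print(set(F), integral_over_preimage(J, mu, phi, set(F)), sum(nu[y] for y in F))
```

The `ValueError` branch corresponds to the failure of hypothesis (iii); the example of 235P shows that in the infinite case (iii) alone is not enough without σ-finiteness of µ.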
*235Q There are some simplifications in the case of σ-finite spaces; in particular, 235A and 235E become conflated. I will give an adaptation of the hypotheses of 235A which may be used in the σ-finite case. First a lemma.

Lemma Let (X, Σ, µ) be a measure space and f a non-negative integrable function on X. If A ⊆ X is such that ∫_A f + ∫_{X\A} f = ∫ f, then f × χA is integrable.

proof By 214Eb, there are µ-integrable functions f_1, f_2 such that f_1 extends f↾A, f_2 extends f↾X \ A, and
\[ \int_E f_1 = \int_{E\cap A} f, \qquad \int_E f_2 = \int_{E\setminus A} f \]
for every E ∈ Σ. Because f is non-negative, ∫_E f_1 and ∫_E f_2 are non-negative for every E ∈ Σ, and f_1, f_2 are non-negative a.e. Accordingly we have f × χA ≤a.e. f_1 and f × χ(X \ A) ≤a.e. f_2, so that f ≤a.e. f_1 + f_2. But also
\[ \int f_1 + f_2 = \int_X f_1 + \int_X f_2 = \int_A f + \int_{X\setminus A} f = \int f, \]
so f =a.e. f_1 + f_2. Accordingly
f_1 =a.e. f − f_2 ≤a.e. f − f × χ(X \ A) = f × χA ≤a.e. f_1
and f × χA =a.e. f_1 is integrable.
*235R Proposition Let (X, Σ, µ) be a complete measure space and (Y, T, ν) a complete σ-finite measure space. Suppose that φ : D_φ → Y, J : D_J → [0, ∞[ are functions defined on conegligible subsets D_φ, D_J of X such that ∫_{φ⁻¹[F]} J dµ exists and is equal to νF whenever F ∈ T and νF < ∞.
(a) J × gφ is Σ-measurable for every T-measurable real-valued function g defined on a subset of Y.
(b) If g is a T-measurable real-valued function defined almost everywhere in Y, then ∫ J × gφ dµ = ∫ g dν whenever either integral is defined in [−∞, ∞], interpreting (J × gφ)(x) as 0 when J(x) = 0, g(φ(x)) is undefined.

proof The point is that the hypotheses of 235E are satisfied. To see this, let us write Σ_C = {E ∩ C : E ∈ Σ} and µ_C = µ*↾Σ_C for the subspace measure on C, for each C ⊆ X. Let ⟨Y_n⟩_{n∈N} be a non-decreasing sequence of sets with union Y and with νY_n < ∞ for every n ∈ N, starting from Y_0 = ∅.

(i) Take any F ∈ T with νF < ∞, and set F_n = F ∪ Y_n for each n ∈ N; write C_n = φ⁻¹[F_n]. Fix n for the moment. Then our hypothesis implies that
\[ \int_{C_0} J\,d\mu + \int_{C_n\setminus C_0} J\,d\mu = \nu F + \nu(F_n\setminus F) = \nu F_n = \int_{C_n} J\,d\mu. \]
If we regard the subspace measures on C_0 and C_n \ C_0 as derived from the measure µ_{C_n} of C_n (214Ce), then 235Q tells us that J × χC_0 is µ_{C_n}-integrable, and there is a µ-integrable function h_n such that h_n extends (J × χC_0)↾C_n.

Let E be a µ-conegligible set, included in the domain D_φ of φ, such that h_n↾E is Σ-measurable for every n. Because ⟨C_n⟩_{n∈N} is a non-decreasing sequence with union φ⁻¹[⋃_{n∈N} F_n] = D_φ,
(J × χC_0)(x) = lim_{n→∞} h_n(x) for every x ∈ E,
and (J × χC_0)↾E is measurable. At the same time, we know that there is a µ-integrable h extending J↾C_0, and 0 ≤a.e. J × χC_0 ≤a.e. h. Accordingly J × χC_0 is integrable, and (using 214F)
\[ \int J\times\chi\phi^{-1}[F]\,d\mu = \int J\times\chi C_0\,d\mu = \int_{C_0} J{\restriction}C_0\,d\mu_{C_0} = \nu F. \]

(ii) This deals with F of finite measure. For general F ∈ T,
\[ \int J\times\chi(\phi^{-1}[F])\,d\mu = \lim_{n\to\infty}\int J\times\chi(\phi^{-1}[F\cap Y_n])\,d\mu = \lim_{n\to\infty}\nu(F\cap Y_n) = \nu F. \]
So the hypotheses of 235E are satisfied, and the result follows at once.
*235S I remarked in 235Bb that a difficulty can arise in 235A, for general measure spaces, if we speak of ∫_{φ⁻¹[F]} J dµ in the hypothesis, in place of ∫ J × χ(φ⁻¹[F]) dµ. Here is an example.

Example Set X = Y = [0, 2]. Write Σ_L for the algebra of Lebesgue measurable subsets of R, and µ_L for Lebesgue measure; write µ_c for counting measure on R. Set
Σ = T = {E : E ⊆ [0, 2], E ∩ [0, 1[ ∈ Σ_L};
of course this is a σ-algebra of subsets of [0, 2]. For E ∈ Σ = T, set
µE = νE = µ_L(E ∩ [0, 1[) + µ_c(E ∩ [1, 2]);
then µ is a complete measure; in effect, it is the direct sum of Lebesgue measure on [0, 1[ and counting measure on [1, 2] (see 214K). It is easy to see that
µ*B = µ*_L(B ∩ [0, 1[) + µ_c(B ∩ [1, 2])
for every B ⊆ [0, 2]. Let A ⊆ [0, 1[ be a non-Lebesgue-measurable set such that µ*_L(E \ A) = µ_L E for every Lebesgue measurable E ⊆ [0, 1[ (see 134D). Define φ : [0, 2] → [0, 2] by setting φ(x) = x + 1 if x ∈ A, φ(x) = x if x ∈ [0, 2] \ A.

If F ∈ Σ, then µ*(φ⁻¹[F]) = µF. P P (i) If F ∩ [1, 2] is finite, then µF = µ_L(F ∩ [0, 1[) + #(F ∩ [1, 2]). Now
φ⁻¹[F] = (F ∩ [0, 1[ \ A) ∪ (F ∩ [1, 2]) ∪ {x : x ∈ A, x + 1 ∈ F};
as the last set is finite, therefore µ-negligible,
µ*(φ⁻¹[F]) = µ*_L(F ∩ [0, 1[ \ A) + #(F ∩ [1, 2]) = µ_L(F ∩ [0, 1[) + #(F ∩ [1, 2]) = µF.
(ii) If F ∩ [1, 2] is infinite, so is φ−1 [F ] ∩ [1, 2], so µ∗ (φ−1 [F ]) = ∞ = µF . Q Q This means that if we set J(x) = 1 for every x ∈ [0, 2],
$$\int_{\phi^{-1}[F]}J\,d\mu=\mu_{\phi^{-1}[F]}(\phi^{-1}[F])=\mu^*(\phi^{-1}[F])=\mu F$$
for every F ∈ Σ, and φ, J satisfy the amended hypotheses for 235A. But if we set g = χ[0, 1[, then g is µ-integrable, with ∫ g dµ = 1, while J(x)g(φ(x)) = 1 if x ∈ [0, 1[ \ A, 0 otherwise, so, because A ∉ Σ, J × gφ is not measurable, and therefore (since µ is complete) not µ-integrable.

235T Reversing the burden Throughout the work above, I have been using the formula

$$\int J\times g\phi=\int g,$$

as being the natural extension of the formula

$$\int g=\int g\phi\times\phi'$$

of ordinary advanced calculus. But we can also move the ‘derivative’ J to the other side of the equation, as follows.

Theorem Let (X, Σ, µ), (Y, T, ν) be measure spaces and φ : X → Y, J : Y → [0, ∞[ functions such that ∫_F J dν and µφ⁻¹[F] are defined in [0, ∞] and equal for every F ∈ T. Then ∫ gφ dµ = ∫ J × g dν whenever g is ν-virtually measurable and defined ν-almost everywhere and either integral is defined in [−∞, ∞].

proof Let ν1 be the indefinite-integral measure over ν defined by J, and µ̂ the completion of µ. Then φ is inverse-measure-preserving for µ̂ and ν1. P P If F ∈ T, then ν1F = ∫_F J dν = µφ⁻¹[F]; that is, φ is inverse-measure-preserving for µ and ν1↾T. Since ν1 is the completion of ν1↾T (234Db), φ is inverse-measure-preserving for µ̂ and ν1 (235Hc). Q Q Of course we can also regard ν1 as being an indefinite-integral measure over the completion ν̂ of ν (212Fb). So if g is ν-virtually measurable and defined ν-almost everywhere,
$$\int J\times g\,d\nu=\int J\times g\,d\hat\nu=\int g\,d\nu_1=\int g\phi\,d\hat\mu=\int g\phi\,d\mu$$
if any of the five integrals is defined in [−∞, ∞], by 235M, 235Ib and 212Fb again.

235X Basic exercises (a) Explain what 235A tells us when X = Y, T = Σ, φ is the identity function and νE = αµE for every E ∈ Σ.

(b) Let (X, Σ, µ) be a measure space, J an integrable non-negative real-valued function on X, and φ : Dφ → R a measurable function, where Dφ is a conegligible subset of X. Set

$$g(a)=\int_{\{x:\phi(x)\le a\}}J$$

for a ∈ R, and let µg be the Lebesgue-Stieltjes measure associated with g. Show that

$$\int J\times f\phi\,d\mu=\int f\,d\mu_g$$

for every µg-integrable real function f.
(c) Let Σ, T and Λ be σ-algebras of subsets of X, Y and Z respectively. Let us say that a function φ : A → Y, where A ⊆ X, is (Σ, T)-measurable if φ⁻¹[F] ∈ Σ_A, the subspace σ-algebra of A, for every F ∈ T. Suppose that A ⊆ X, B ⊆ Y, φ : A → Y is (Σ, T)-measurable and ψ : B → Z is (T, Λ)-measurable. Show that ψφ is (Σ, Λ)-measurable. Deduce 235C.

(d) Let (X, Σ, µ) be a measure space and (Y, T, ν) a semi-finite measure space. Let φ : Dφ → Y and J : DJ → [0, ∞[ be functions defined on conegligible subsets Dφ, DJ of X such that ∫ J × χ(φ⁻¹[F]) dµ exists = νF whenever F ∈ T and νF < ∞. Let g be a T-measurable real-valued function, defined on a conegligible subset of Y. Show that J × gφ is µ-integrable iff g is ν-integrable, and the integrals are then equal, provided we interpret (J × gφ)(x) as 0 when J(x) = 0 and g(φ(x)) is undefined.
> (e) Let (X, Σ, µ) and (Y, T, ν) be measure spaces and φ : X → Y an inverse-measure-preserving function. (i) Show that ν is totally finite, or a probability measure, iff µ is. (ii) Show that if ν is σ-finite, so is µ. (iii) Show that if ν is semi-finite and µ is σ-finite, then ν is σ-finite. (Hint: use 215B(iii).) (iv) Show that if µ is purely atomic and ν is semi-finite then ν is purely atomic. (vi) Show that if ν is σ-finite and atomless then µ is atomless.

(f) Let (X, Σ, µ) be a measure space and E ∈ Σ. Define a measure µ⌞E on X by setting (µ⌞E)(F) = µ(E ∩ F) whenever F ⊆ X is such that F ∩ E ∈ Σ. Show that, for any function f from a subset of X to [−∞, ∞], ∫ f d(µ⌞E) = ∫_E f dµ if either is defined in [−∞, ∞].

> (g) Let g : R → R be a non-decreasing function which is absolutely continuous on every closed bounded interval, and let µg be the associated Lebesgue-Stieltjes measure (114Xa, 225Xf). Write µ for Lebesgue measure on R, and let f : R → R be a function. Show that ∫ f × g′ dµ = ∫ f dµg in the sense that if one of the integrals exists, finite or infinite, so does the other, and they are then equal.

(h) Let g : R → R be a non-decreasing function and J a non-negative real-valued µg-integrable function, where µg is the Lebesgue-Stieltjes measure defined from g. Set h(x) = ∫_{]−∞,x]} J dµg for each x ∈ R, and let µh be the Lebesgue-Stieltjes measure associated with h. Show that, for any f : R → R, ∫ f × J dµg = ∫ f dµh, in the sense that if one of the integrals is defined in [−∞, ∞] so is the other, and they are then equal.

> (i) Let X be a set and λ, µ, ν three measures on X such that µ is an indefinite-integral measure over λ, with Radon-Nikodým derivative f, and ν is an indefinite-integral measure over µ, with Radon-Nikodým derivative g. Show that ν is an indefinite-integral measure over λ, and that f × g is a Radon-Nikodým derivative of ν with respect to λ, provided we interpret (f × g)(x) as 0 when f(x) = 0 and g(x) is undefined.

(j) In 235O, if ν is not semi-finite, show that we can still find a J such that ∫_{φ⁻¹[F]} J dµ = νF for every set F of finite measure. (Hint: use the ‘semi-finite version’ of ν, as described in 213Xc.)

(k) Let (X, Σ, µ) be a σ-finite measure space, and T a σ-subalgebra of Σ. Let ν : T → R be a countably additive functional such that νF = 0 whenever F ∈ T and µF = 0. Show that there is a µ-integrable function f such that ∫_F f dµ = νF for every F ∈ T. (Hint: use the method of 235O, applied to the positive and negative parts of ν.)

(l) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, with completions (X, Σ̂, µ̂) and (Y, T̂, ν̂). Let φ : Dφ → Y, J : DJ → [0, ∞[ be functions defined on conegligible subsets of X. Show that if ∫_{φ⁻¹[F]} J dµ = νF whenever F ∈ T and νF < ∞, then ∫_{φ⁻¹[F]} J dµ̂ = ν̂F whenever F ∈ T̂ and ν̂F < ∞. Hence, or otherwise, show that 235Rb is valid for non-complete spaces (X, Σ, µ) and (Y, T, ν).

235Y Further exercises (a) Write ν for Lebesgue measure on Y = [0, 1], and T for its domain. Let A ⊆ [0, 1] be a set such that ν*A = ν*([0, 1] \ A) = 1, and set X = [0, 1] ∪ {x + 1 : x ∈ A} ∪ {x + 2 : x ∈ [0, 1] \ A}. Let µLX be the subspace measure induced on X by Lebesgue measure, and set µE = (1/3)µLX E for E ∈ Σ = dom µLX. Define φ : X → Y by writing φ(x) = x if x ∈ [0, 1], φ(x) = x − 1 if x ∈ X ∩ ]1, 2] and φ(x) = x − 2 if x ∈ X ∩ ]2, 3]. Show that T = {F : F ⊆ Y, φ⁻¹[F] ∈ Σ}, that νF = µφ⁻¹[F] for every F ∈ T, but that ν*A > µ*φ⁻¹[A].

(b) Write T for the algebra of Borel subsets of Y = [0, 1], and ν for the restriction of Lebesgue measure to T. Let A ⊆ [0, 1] be a set such that both A and [0, 1] \ A have Lebesgue outer measure 1, and set X = A ∪ [1, 2]. Let Σ be the algebra of relatively Borel subsets of X, and set µE = µA(A ∩ E) for E ∈ Σ, where µA is the subspace measure induced on A by Lebesgue measure. Define φ : X → Y by setting φ(x) = x if x ∈ A, x − 1 if x ∈ X \ A. Show that T = {F : F ⊆ Y, φ⁻¹[F] ∈ Σ} and that φ is inverse-measure-preserving, but that, setting g = χ([0, 1] \ A), gφ is µ-integrable while g is not ν-integrable.

(c) Let (X, Σ, µ) be a probability space and T a σ-subalgebra of Σ. Let f be a non-negative µ-integrable function with ∫ f dµ = 1, so that its indefinite-integral measure ν is a probability measure. Let g be a ν-integrable real-valued function and set h = f × g, interpreting h(x) as 0 if f(x) = 0 and g(x) is undefined. Let f1, h1 be conditional expectations of f, h on T with respect to the measure µ, and set g1 = h1/f1, interpreting g1(x) as 0 if h1(x) = 0 and f1(x) is either 0 or undefined. Show that g1 is a conditional expectation of g on T with respect to the measure ν.
235 Notes and comments I see that I have taken up a great deal of space in this section with technicalities; the hypotheses of the theorems vary erratically, with completeness, in particular, being invoked at apparently arbitrary intervals, and ideas repeat themselves in a haphazard pattern. There is nothing deep, and most of the work consists in laboriously verifying details. The trouble with this topic is that it is useful. The results here are abstract expressions of integration-by-substitution; they have applications all over measure theory. I cannot therefore content myself with theorems which will elegantly express the underlying ideas, but must seek formulations which I can quote in later arguments. I hope that the examples in 235Bb, 235J, 235P, 235S, 235Ya and 235Yb will go some way to persuade you that there are real traps for the unwary, and that the careful verifications written out at such length are necessary. On the other hand, it is happily the case that in simple contexts, in which the measures µ, ν are σ-finite and the transformations φ are Borel isomorphisms, no insuperable difficulties arise, and in particular the image measure catastrophe does not trouble us. But for further work in this direction I refer you to the applications in §263, §265 and §271, and to Volume 4.

One of the striking features of measure theory, compared with other comparably abstract branches of pure mathematics, is the relative unimportance of any notion of ‘morphism’. The theory of groups, for instance, is dominated by the concept of ‘homomorphism’, and general topology gives a similar place to ‘continuous function’. In my view, the nearest equivalent in measure theory is the idea of ‘inverse-measure-preserving function’ (235G). I mean in Volumes 3 and 4 to explore this concept more thoroughly. In this volume I will content myself with signalling such functions when they arise, and with the basic facts listed in 235H-235I.
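The two placements of the ‘derivative’ (on the domain side, as in 235A, and on the codomain side, as in 235T) can be compared on a very small case. The following is only an illustrative sketch and not one of the numbered results: it assumes X = Y = [0, 1] with Lebesgue measure for both µ and ν and φ(x) = x², and it checks the hypotheses on intervals [0, b] only, taking the extension to all measurable F for granted.

```latex
% Illustrative sketch, not a numbered result of the text.
% Assumptions: X = Y = [0,1], \mu = \nu = Lebesgue measure, \phi(x) = x^2.
% Hypotheses are verified on intervals [0,b] only.
\begin{align*}
&\text{235A form, with } J(x)=\phi'(x)=2x \text{ on } X:\\
&\qquad \int J\times\chi(\phi^{-1}[0,b])\,d\mu=\int_0^{\sqrt b}2x\,dx=b=\nu[0,b],\\
&\qquad \int J\times g\phi\,d\mu=\int_0^1 2x\,g(x^2)\,dx=\int_0^1 g(y)\,dy=\int g\,d\nu.\\
&\text{235T form, with } J(y)=\tfrac{1}{2\sqrt y}\ (y>0),\ J(0)=0 \text{ on } Y:\\
&\qquad \int_{[0,b]}J\,d\nu=\int_0^b\frac{dy}{2\sqrt y}=\sqrt b=\mu\phi^{-1}[0,b],\\
&\qquad \int g\phi\,d\mu=\int_0^1 g(x^2)\,dx=\int_0^1\frac{g(y)}{2\sqrt y}\,dy=\int J\times g\,d\nu.\\
&\text{Check with } g(y)=y:\quad
\int_0^1 2x\cdot x^2\,dx=\tfrac12=\int_0^1 y\,dy,\qquad
\int_0^1 x^2\,dx=\tfrac13=\int_0^1\tfrac{\sqrt y}{2}\,dy.
\end{align*}
```

In this σ-finite, smooth setting both forms are available and agree with ordinary substitution; the traps recorded in the examples of this section arise only when measurability or semi-finiteness fails.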
Chapter 24
Function spaces

The extraordinary power of Lebesgue’s theory of integration is perhaps best demonstrated by its ability to provide structures relevant to questions quite different from those to which it was at first addressed. In this chapter I give the constructions, and elementary properties, of some of the fundamental spaces of functional analysis. I do not feel called on here to justify the study of normed spaces; if you have not met them before, I hope that the introduction here will show at least that they offer a basis for a remarkable fusion of algebra and analysis. The fragments of the theory of metric spaces, normed spaces and general topology which we shall need are sketched in §§2A2-2A5. The principal ‘function spaces’ described in this chapter in fact combine three structural elements: they are (infinite-dimensional) linear spaces, they are metric spaces, with associated concepts of continuity and convergence, and they are ordered spaces, with corresponding notions of supremum and infimum. The interactions between these three types of structure provide an inexhaustible wealth of ideas. Furthermore, many of these ideas are directly applicable to a wide variety of problems in more or less applied mathematics, particularly in differential and integral equations, but more generally in any system with infinitely many degrees of freedom.

I have laid out the chapter with sections on L⁰ (the space of equivalence classes of all real-valued measurable functions, in which all the other spaces of the chapter are embedded), L¹ (equivalence classes of integrable functions), L^∞ (equivalence classes of bounded measurable functions) and L^p (equivalence classes of pth-power-integrable functions). While ordinary functional analysis gives much more attention to the Banach spaces L^p for 1 ≤ p ≤ ∞ than to L⁰, from the special point of view of this book the space L⁰ is at least as important and interesting as any of the others. Following these four sections, I return to a study of the standard topology on L⁰, the topology of ‘convergence in measure’ (§245), and then to two linked sections on uniform integrability and weak compactness in L¹ (§§246-247).

There is a technical point here which must never be lost sight of. While it is customary and natural to call L¹, L² and the others ‘function spaces’, their elements are not in fact functions, but equivalence classes of functions. As you see from the language of the preceding paragraph, my practice is to scrupulously maintain the distinction; I give my reasons in the notes to §241.
241 ℒ⁰ and L⁰

The chief aim of this chapter is to discuss the spaces L¹, L^∞ and L^p of the following three sections. However it will be convenient to regard all these as subspaces of a larger space L⁰ of equivalence classes of (virtually) measurable functions, and I have collected in this section the basic facts concerning the ordered linear space L⁰.

It is almost the first principle of measure theory that sets of measure zero can often be ignored; the phrase ‘negligible set’ itself asserts this principle. Accordingly, two functions which agree almost everywhere may often (not always!) be treated as identical. A suitable expression of this idea is to form the space of equivalence classes of functions, saying that two functions are equivalent if they agree on a conegligible set. This is the basis of all the constructions of this chapter. It is a remarkable fact that the spaces of equivalence classes so constructed are actually better adapted to certain problems than the spaces of functions from which they are derived, so that once the technique has been mastered it is easier to do one’s thinking in the more abstract spaces.

241A The space ℒ⁰: Definition It is time to give a name to a set of functions which has already been used more than once. Let (X, Σ, µ) be any measure space. I write ℒ⁰, or ℒ⁰(µ), for the space of real-valued functions f defined on conegligible subsets of X which are virtually measurable, that is, such that f↾E is measurable for some conegligible set E ⊆ X. Recall that f is µ-virtually measurable iff it is measurable for the completion of µ (212Fa).
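A small example may help to fix what ‘defined on a conegligible set’ and ‘virtually measurable’ allow. It is a sketch under assumptions of my choosing, not taken from the text: µ is Lebesgue measure restricted to the Borel σ-algebra Σ of R, C is the Cantor set, and A ⊆ C is a non-Borel set (such sets exist by a cardinality count, since C has more subsets than there are Borel sets).

```latex
% Assumptions (not from the text): \mu = Lebesgue measure restricted to the
% Borel \sigma-algebra \Sigma of \mathbb{R}; C = the Cantor set, so \mu C = 0;
% A \subseteq C a non-Borel set.
\begin{align*}
&f(x)=\tfrac1x \text{ for } x\in\mathbb{R}\setminus\{0\}:
 &&\operatorname{dom}f \text{ is conegligible and } f \text{ is measurable, so } f\in\mathcal{L}^0(\mu);\\
&g=\chi A:
 &&\{x:g(x)>0\}=A\notin\Sigma, \text{ so } g \text{ is not } \Sigma\text{-measurable};\\
& &&\text{but } E=\mathbb{R}\setminus C \text{ is conegligible and } g\restriction E=0
 \text{ is measurable, so } g\in\mathcal{L}^0(\mu).
\end{align*}
```

Here g is measurable for the completion of µ, in line with the remark citing 212Fa.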
241B Basic properties If (X, Σ, µ) is any measure space, then we have the following facts, corresponding to the fundamental properties of measurable functions listed in §121 of Volume 1. I work through them in order, so that if you have Volume 1 to hand you can see what has to be missed out.

(a) A constant real-valued function defined almost everywhere in X belongs to ℒ⁰ (121Ea).
(b) f + g ∈ ℒ⁰ for all f, g ∈ ℒ⁰ (for if f↾F and g↾G are measurable, then (f + g)↾(F ∩ G) = (f↾F) + (g↾G) is measurable) (121Eb).
(c) cf ∈ ℒ⁰ for all f ∈ ℒ⁰, c ∈ R (121Ec).
(d) f × g ∈ ℒ⁰ for all f, g ∈ ℒ⁰ (121Ed).
(e) If f ∈ ℒ⁰ and h : R → R is Borel measurable, then hf ∈ ℒ⁰ (121Eg).
(f) If ⟨fn⟩_{n∈N} is a sequence in ℒ⁰ and f = lim_{n→∞} fn is defined (as a real-valued function) almost everywhere in X, then f ∈ ℒ⁰ (121Fa).
(g) If ⟨fn⟩_{n∈N} is a sequence in ℒ⁰ and f = sup_{n∈N} fn is defined (as a real-valued function) almost everywhere in X, then f ∈ ℒ⁰ (121Fb).
(h) If ⟨fn⟩_{n∈N} is a sequence in ℒ⁰ and f = inf_{n∈N} fn is defined (as a real-valued function) almost everywhere in X, then f ∈ ℒ⁰ (121Fc).
(i) If ⟨fn⟩_{n∈N} is a sequence in ℒ⁰ and f = lim sup_{n→∞} fn is defined (as a real-valued function) almost everywhere in X, then f ∈ ℒ⁰ (121Fd).
(j) If ⟨fn⟩_{n∈N} is a sequence in ℒ⁰ and f = lim inf_{n→∞} fn is defined (as a real-valued function) almost everywhere in X, then f ∈ ℒ⁰ (121Fe).
(k) ℒ⁰ is just the set of real-valued functions, defined on subsets of X, which are equal almost everywhere to some Σ-measurable function from X to R. P P (i) If g : X → R is Σ-measurable and f =a.e. g, then F = {x : x ∈ dom f, f(x) = g(x)} is conegligible and f↾F = g↾F is measurable (121Eh), so f ∈ ℒ⁰. (ii) If f ∈ ℒ⁰, let E ⊆ X be a conegligible set such that f↾E is measurable. Then D = E ∩ dom f is conegligible and f↾D is measurable, so there is a measurable h : X → R agreeing with f on D (121I); and h agrees with f almost everywhere. Q Q

241C The space L⁰: Definition Let (X, Σ, µ) be any measure space. Then =a.e. is an equivalence relation on ℒ⁰. Write L⁰, or L⁰(µ), for the set of equivalence classes in ℒ⁰ under =a.e.. For f ∈ ℒ⁰, write f• for its equivalence class in L⁰.

241D The linear structure of L⁰ Let (X, Σ, µ) be any measure space, and set ℒ⁰ = ℒ⁰(µ), L⁰ = L⁰(µ).

(a) If f1, f2, g1, g2 ∈ ℒ⁰ and f1 =a.e. f2, g1 =a.e. g2 then f1 + g1 =a.e. f2 + g2. Accordingly we may define addition on L⁰ by setting f• + g• = (f + g)• for all f, g ∈ ℒ⁰.
(b) If f1, f2 ∈ ℒ⁰ and f1 =a.e. f2, then cf1 =a.e. cf2 for every c ∈ R. Accordingly we may define scalar multiplication on L⁰ by setting c.f• = (cf)• for all f ∈ ℒ⁰, c ∈ R.
(c) Now L⁰ is a linear space over R, with zero 0•, where 0 is the function with domain X and constant value 0, and negatives −f• = (−f)•. P P (i) f + (g + h) = (f + g) + h for all f, g, h ∈ ℒ⁰, so u + (v + w) = (u + v) + w for all u, v, w ∈ L⁰. (ii) f + 0 = 0 + f = f for every f ∈ ℒ⁰, so u + 0• = 0• + u = u for every u ∈ L⁰. (iii) f + (−f) =a.e. 0 for every f ∈ ℒ⁰,
so f• + (−f)• = 0• for every f ∈ ℒ⁰. (iv) f + g = g + f for all f, g ∈ ℒ⁰, so u + v = v + u for all u, v ∈ L⁰. (v) c(f + g) = cf + cg for all f, g ∈ ℒ⁰ and c ∈ R, so c(u + v) = cu + cv for all u, v ∈ L⁰ and c ∈ R. (vi) (a + b)f = af + bf for all f ∈ ℒ⁰, a, b ∈ R, so (a + b)u = au + bu for all u ∈ L⁰, a, b ∈ R. (vii) (ab)f = a(bf) for all f ∈ ℒ⁰, a, b ∈ R, so (ab)u = a(bu) for all u ∈ L⁰, a, b ∈ R. (viii) 1f = f for all f ∈ ℒ⁰, so 1u = u for all u ∈ L⁰. Q Q

241E The order structure of L⁰ Let (X, Σ, µ) be any measure space and set ℒ⁰ = ℒ⁰(µ), L⁰ = L⁰(µ).

(a) If f1, f2, g1, g2 ∈ ℒ⁰, f1 =a.e. f2, g1 =a.e. g2 and f1 ≤a.e. g1, then f2 ≤a.e. g2. Accordingly we may define a relation ≤ on L⁰ by saying that f• ≤ g• iff f ≤a.e. g.
(b) Now ≤ is a partial ordering on L⁰. P P (i) If f, g, h ∈ ℒ⁰ and f ≤a.e. g and g ≤a.e. h, then f ≤a.e. h. Accordingly u ≤ w whenever u, v, w ∈ L⁰ and u ≤ v, v ≤ w. (ii) If f ∈ ℒ⁰ then f ≤a.e. f; so u ≤ u for every u ∈ L⁰. (iii) If f, g ∈ ℒ⁰ and f ≤a.e. g and g ≤a.e. f, then f =a.e. g, so if u ≤ v and v ≤ u then u = v. Q Q
(c) In fact L⁰, with ≤, is a partially ordered linear space, that is, a (real) linear space with a partial ordering ≤ such that
if u ≤ v then u + w ≤ v + w for every w,
if 0 ≤ u then 0 ≤ cu for every c ≥ 0.
P P (i) If f, g, h ∈ ℒ⁰ and f ≤a.e. g, then f + h ≤a.e. g + h. (ii) If f ∈ ℒ⁰ and f ≥a.e. 0, then cf ≥a.e. 0 for every c ≥ 0. Q Q
(d) More: L⁰ is a Riesz space or vector lattice, that is, a partially ordered linear space such that u ∨ v = sup{u, v}, u ∧ v = inf{u, v} are defined for all u, v ∈ L⁰. P P Take f, g ∈ ℒ⁰ such that f• = u, g• = v. Then f ∨ g, f ∧ g ∈ ℒ⁰, writing
(f ∨ g)(x) = max(f(x), g(x)), (f ∧ g)(x) = min(f(x), g(x))
for x ∈ dom f ∩ dom g. (Compare 241Bg-h.) Now, for any h ∈ ℒ⁰, we have
f ∨ g ≤a.e. h ⇐⇒ f ≤a.e. h and g ≤a.e. h,
h ≤a.e. f ∧ g ⇐⇒ h ≤a.e. f and h ≤a.e. g,
so for any w ∈ L⁰ we have
(f ∨ g)• ≤ w ⇐⇒ u ≤ w and v ≤ w,
w ≤ (f ∧ g)• ⇐⇒ w ≤ u and w ≤ v.
Thus we have
(f ∨ g)• = sup{u, v} = u ∨ v, (f ∧ g)• = inf{u, v} = u ∧ v
in L⁰. Q Q

(e) In particular, for any u ∈ L⁰ we can speak of |u| = u ∨ (−u); if f ∈ ℒ⁰ then |f•| = |f|•. If f, g ∈ ℒ⁰, c ∈ R then

$$|cf|=|c||f|,\qquad f\vee g=\tfrac12(f+g+|f-g|),\qquad f\wedge g=\tfrac12(f+g-|f-g|),\qquad |f+g|\le_{\text{a.e.}}|f|+|g|,$$

so

$$|cu|=|c||u|,\qquad u\vee v=\tfrac12(u+v+|u-v|),\qquad u\wedge v=\tfrac12(u+v-|u-v|),\qquad |u+v|\le|u|+|v|$$

for all u, v ∈ L⁰.

(f) A special notation is often useful. If f is a real-valued function, set f⁺(x) = max(f(x), 0), f⁻(x) = max(−f(x), 0) for x ∈ dom f, so that
f = f⁺ − f⁻, |f| = f⁺ + f⁻ = f⁺ ∨ f⁻,
all these functions being defined on dom f. In L⁰, the corresponding operations are u⁺ = u ∨ 0, u⁻ = (−u) ∨ 0, and we have
u = u⁺ − u⁻, |u| = u⁺ + u⁻ = u⁺ ∨ u⁻, u⁺ ∧ u⁻ = 0.

(g) It is perhaps obvious, but I say it anyway: if u ≥ 0 in L⁰, then there is an f ≥ 0 in ℒ⁰ such that f• = u. P P Take any g ∈ ℒ⁰ such that u = g•, and set f = g ∨ 0. Q Q
241F Riesz spaces There is an extensive abstract theory of Riesz spaces, which I think it best to leave aside for the moment; a general account may be found in Luxemburg & Zaanen 71 and Zaanen 83; my own book Fremlin 74 covers the elementary material, and Chapter 35 in the next volume repeats the most essential ideas. For our purposes here we need only a few definitions and some simple results which are most easily proved for the special cases in which we need them, without reference to the general theory.

(a) A Riesz space U is Archimedean if whenever u ∈ U, u > 0 (that is, u ≥ 0 and u ≠ 0), and v ∈ U, there is an n ∈ N such that nu ≰ v.
(b) A Riesz space U is Dedekind σ-complete (or σ-order-complete, or σ-complete) if every non-empty countable set A ⊆ U which is bounded above has a least upper bound in U.
(c) A Riesz space U is Dedekind complete (or order complete, or complete) if every non-empty set A ⊆ U which is bounded above in U has a least upper bound in U.
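To see that the Archimedean and Dedekind σ-completeness conditions have content, it may help to look at a Riesz space which fails both. The example below is a standard one chosen for illustration, not taken from the text: the plane R² with its lexicographic ordering, which is totally ordered and therefore a Riesz space.

```latex
% Assumption (illustration only): U = \mathbb{R}^2 with the lexicographic ordering,
% (a,b) \le (c,d) iff a < c, or a = c and b \le d.
% U is a totally ordered linear space, hence a Riesz space.
\begin{align*}
&u=(0,1),\quad v=(1,0):\qquad u>0 \text{ and } nu=(0,n)\le(1,0)=v \text{ for every } n\in\mathbb{N},\\
&\text{so } U \text{ is not Archimedean.}\\
&\text{The countable set } \{nu:n\in\mathbb{N}\} \text{ is bounded above by } v,\\
&\text{and its upper bounds are exactly the points } (a,b) \text{ with } a>0,\\
&\text{among which there is no least one, since } (a,b-1)<(a,b);\\
&\text{so } U \text{ is not Dedekind } \sigma\text{-complete either.}
\end{align*}
```

That the two failures occur together is no accident: exercise 241Yb asks for a proof that every Dedekind σ-complete Riesz space is Archimedean.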
241G Now we have the following important properties of L⁰.
Theorem Let (X, Σ, µ) be a measure space. Set L⁰ = L⁰(µ).
(a) L⁰ is Archimedean and Dedekind σ-complete.
(b) If (X, Σ, µ) is semi-finite, then L⁰ is Dedekind complete iff (X, Σ, µ) is localizable.

proof Set ℒ⁰ = ℒ⁰(µ).

(a)(i) If u, v ∈ L⁰ and u > 0, express u as f• and v as g• where f, g ∈ ℒ⁰. Then E = {x : x ∈ dom f, f(x) > 0} is not negligible. So there is an n ∈ N such that
En = {x : x ∈ dom f ∩ dom g, nf(x) > g(x)}
is not negligible, since E ∩ dom g ⊆ ⋃_{n∈N} En. But now nu ≰ v. As u and v are arbitrary, L⁰ is Archimedean.

(ii) Now let A ⊆ L⁰ be a non-empty countable set with an upper bound w in L⁰. Express A as {fn• : n ∈ N} where ⟨fn⟩_{n∈N} is a sequence in ℒ⁰, and w as h• where h ∈ ℒ⁰. Set f = sup_{n∈N} fn. Then we have f(x) defined in R at any point x ∈ dom h ∩ ⋂_{n∈N} dom fn such that fn(x) ≤ h(x) for every n ∈ N, that is, for almost every x ∈ X; so f ∈ ℒ⁰ (241Bg). Set u = f• ∈ L⁰. If v ∈ L⁰, say v = g• where g ∈ ℒ⁰, then, writing un = fn• for each n,
un ≤ v for every n ∈ N
⇐⇒ for every n ∈ N, fn ≤a.e. g
⇐⇒ for almost every x ∈ X, fn(x) ≤ g(x) for every n ∈ N
⇐⇒ f ≤a.e. g ⇐⇒ u ≤ v.
Thus u = sup_{n∈N} un in L⁰. As A is arbitrary, L⁰ is Dedekind σ-complete.

(b)(i) Suppose that (X, Σ, µ) is localizable. Let A ⊆ L⁰ be any non-empty set with an upper bound w0 ∈ L⁰. Set
𝒜 = {f : f is a measurable function from X to R, f• ∈ A};
then every member of A is of the form f• for some f ∈ 𝒜 (241Bk). For each q ∈ Q, let ℰq be the family of subsets of X expressible in the form {x : f(x) ≥ q} for some f ∈ 𝒜; then ℰq ⊆ Σ. Because (X, Σ, µ) is localizable, there is a set Fq ∈ Σ which is an essential supremum for ℰq. For x ∈ X, set
g*(x) = sup{q : q ∈ Q, x ∈ Fq},
allowing ∞ as the supremum of a set which is not bounded above, and −∞ as sup ∅. Then
{x : g*(x) > a} = ⋃_{q∈Q, q>a} Fq ∈ Σ
for every a ∈ R.

If f ∈ 𝒜, then f ≤a.e. g*. P P For each q ∈ Q, set
Eq = {x : f(x) ≥ q} ∈ ℰq;
then Eq \ Fq is negligible. Set H = ⋃_{q∈Q} (Eq \ Fq). If x ∈ X \ H, then
f(x) ≥ q ⟹ g*(x) ≥ q,
so f(x) ≤ g*(x); thus f ≤a.e. g*. Q Q

If h : X → R is measurable and u ≤ h• for every u ∈ A, then g* ≤a.e. h. P P Set Gq = {x : h(x) ≥ q} for each q ∈ Q. If E ∈ ℰq, there is an f ∈ 𝒜 such that E = {x : f(x) ≥ q}; now f ≤a.e. h, so E \ Gq ⊆ {x : f(x) > h(x)} is negligible. Because Fq is an essential supremum for ℰq, Fq \ Gq is negligible; and this is true for every q ∈ Q. Consequently
{x : h(x) < g*(x)} ⊆ ⋃_{q∈Q} Fq \ Gq
is negligible, and g* ≤a.e. h. Q Q

Now recall that we are assuming that A ≠ ∅ and that A has an upper bound w0 ∈ L⁰. Take any f0 ∈ 𝒜 and a measurable h0 : X → R such that h0• = w0; then f ≤a.e. h0 for every f ∈ 𝒜, so f0 ≤a.e. g* ≤a.e. h0, and g* must be finite a.e. Setting g(x) = g*(x) when g*(x) ∈ R, we have g ∈ ℒ⁰ and g =a.e. g*, so that
f ≤a.e. g ≤a.e. h
whenever f, h are measurable functions from X to R, f• ∈ A and h• is an upper bound for A; that is,
u ≤ g• ≤ w
whenever u ∈ A and w is an upper bound for A. But this means that g• is a least upper bound for A in L⁰. As A is arbitrary, L⁰ is Dedekind complete.

(ii) Suppose that L⁰ is Dedekind complete. We are assuming that (X, Σ, µ) is semi-finite. Let ℰ be any subset of Σ. Set
A = {0} ∪ {(χE)• : E ∈ ℰ} ⊆ L⁰.
Then A is bounded above by (χX)• so has a least upper bound w ∈ L⁰. Express w as h• where h : X → R is measurable, and set F = {x : h(x) > 0}. Then F is an essential supremum for ℰ in Σ. P P (α) If E ∈ ℰ, then (χE)• ≤ w so χE ≤a.e. h, that is, h(x) ≥ 1 for almost every x ∈ E, and E \ F ⊆ {x : x ∈ E, h(x) < 1} is negligible. (β) If G ∈ Σ and E \ G is negligible for every E ∈ ℰ, then χE ≤a.e. χG for every E ∈ ℰ, that is, (χE)• ≤ (χG)• for every E ∈ ℰ; so w ≤ (χG)•, that is, h ≤a.e. χG. Accordingly F \ G ⊆ {x : h(x) > (χG)(x)} is negligible. Q Q As ℰ is arbitrary, (X, Σ, µ) is localizable.

241H The multiplicative structure of L⁰ Let (X, Σ, µ) be any measure space; write L⁰ = L⁰(µ), ℒ⁰ = ℒ⁰(µ).

(a) If f1, f2, g1, g2 ∈ ℒ⁰ and f1 =a.e. f2, g1 =a.e. g2 then f1 × g1 =a.e. f2 × g2. Accordingly we may define multiplication on L⁰ by setting f• × g• = (f × g)• for all f, g ∈ ℒ⁰.

(b) It is now easy to check that, for all u, v, w ∈ L⁰ and c ∈ R,
u × (v × w) = (u × v) × w,
u × 1• = 1• × u = u, where 1 is the function with constant value 1,
c(u × v) = cu × v = u × cv,
u × (v + w) = (u × v) + (u × w),
(u + v) × w = (u × w) + (v × w),
u × v = v × u,
|u × v| = |u| × |v|,
u × v = 0 iff |u| ∧ |v| = 0,
|u| ≤ |v| iff there is a w such that |w| ≤ 1• and u = v × w.

241I The action of Borel functions on L⁰ Let (X, Σ, µ) be a measure space and h : R → R a Borel measurable function. Then hf ∈ ℒ⁰ = ℒ⁰(µ) for every f ∈ ℒ⁰ (241Be) and hf =a.e. hg whenever f =a.e. g. So we have a function h̄ : L⁰ → L⁰ defined by setting h̄(f•) = (hf)• for every f ∈ ℒ⁰. For instance, if u ∈ L⁰ and p ≥ 1, we can consider |u|^p = h̄(u) where h(x) = |x|^p for x ∈ R.

241J Complex L⁰ The ideas of this chapter, like those of Chapters 22-23, are often applied to spaces based on complex-valued functions instead of real-valued functions. Let (X, Σ, µ) be a measure space.

(a) We may write ℒ⁰_C = ℒ⁰_C(µ) for the space of complex-valued functions f such that dom f is a conegligible subset of X and there is a conegligible subset E ⊆ X such that f↾E is measurable; that is, such that the real and imaginary parts of f both belong to ℒ⁰(µ). Next, L⁰_C = L⁰_C(µ) will be the space of equivalence classes in ℒ⁰_C under the equivalence relation =a.e..

(b) Using just the same formulae as in 241D, it is easy to describe addition and scalar multiplication rendering L⁰_C a linear space over C. We no longer have quite the same kind of order structure, but we can identify a ‘real part’, being {f• : f ∈ ℒ⁰_C is real a.e.}, obviously identifiable with the real linear space L⁰, and corresponding maps u ↦ Re(u), u ↦ Im(u) : L⁰_C → L⁰ such that u = Re(u) + i Im(u) for every u. Moreover, we have a notion of ‘modulus’, writing
|f•| = |f|• ∈ L⁰ for every f ∈ ℒ⁰_C, satisfying the basic relations |cu| = |c||u|, |u + v| ≤ |u| + |v| for u, v ∈ L⁰_C and c ∈ C, as in 241Ef. We do of course still have a multiplication on L⁰_C, for which all the formulae in 241H are still valid.

(c) The following fact is useful. For any u ∈ L⁰_C, |u| is the supremum in L⁰ of {Re(ζu) : ζ ∈ C, |ζ| = 1}. P P (i) If |ζ| = 1, then Re(ζu) ≤ |ζu| = |u|. So |u| is an upper bound of {Re(ζu) : |ζ| = 1}. (ii) If v ∈ L⁰ and Re(ζu) ≤ v whenever |ζ| = 1, then express u, v as f•, g• where f : X → C and g : X → R are measurable. For any q ∈ Q, x ∈ X set fq(x) = Re(e^{iq} f(x)). Then fq ≤a.e. g. Accordingly H = {x : fq(x) ≤ g(x) for every q ∈ Q} is conegligible. But of course H = {x : |f(x)| ≤ g(x)}, so |f| ≤a.e. g and |u| ≤ v. As v is arbitrary, |u| is the least upper bound of {Re(ζu) : |ζ| = 1}. Q Q

241X Basic exercises > (a) Let X be a set, and let µ be counting measure on X (112Bd). Show that L⁰(µ) can be identified with ℒ⁰(µ) = R^X.

> (b) Let (X, Σ, µ) be a measure space and µ̂ the completion of µ (212C). Show that ℒ⁰(µ) = ℒ⁰(µ̂), L⁰(µ) = L⁰(µ̂).

(c) Let (X, Σ, µ) be a measure space. (i) Show that for every u ∈ L⁰(µ) we may define an outer measure θu : PR → [0, ∞] by writing θu(A) = µ*f⁻¹[A] whenever A ⊆ R and f ∈ ℒ⁰(µ) is such that f• = u. (ii) Show that every Borel subset of R is measurable for the measure defined from θu by Carathéodory’s method.

> (d) Let (X, Σ, µ) be a measure space. Suppose that r ≥ 1 and that h : R^r → R is a Borel measurable function. Show that there is a function h̄ : L⁰(µ)^r → L⁰(µ) defined by writing

$$\bar h(f_1^\bullet,\ldots,f_r^\bullet)=(h(f_1,\ldots,f_r))^\bullet$$

for f1, … , fr ∈ ℒ⁰(µ).

(e) Let U be a Dedekind σ-complete Riesz space and A ⊆ U a non-empty countable set which is bounded below in U. Show that inf A is defined in U.

(f) Let U be a Dedekind complete Riesz space and A ⊆ U a non-empty set which is bounded below in U. Show that inf A is defined in U.

(g) Let ⟨(Xi, Σi, µi)⟩_{i∈I} be a family of measure spaces, with direct sum (X, Σ, µ) (214K). (i) Writing φi : Xi → X for the canonical maps (in the construction of 214K, φi(x) = (x, i) for x ∈ Xi), show that f ↦ ⟨fφi⟩_{i∈I} is a bijection between ℒ⁰(µ) and ∏_{i∈I} ℒ⁰(µi). (ii) Show that this corresponds to a bijection between L⁰(µ) and ∏_{i∈I} L⁰(µi).

(h) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and φ : X → Y an inverse-measure-preserving function. (i) Show that we have a map T : L⁰(ν) → L⁰(µ) defined by setting T g• = (gφ)• for every g ∈ ℒ⁰(ν). (ii) Show that T is linear, that T(v × w) = Tv × Tw for all v, w ∈ L⁰(ν), and that T(sup_{n∈N} vn) = sup_{n∈N} Tvn whenever ⟨vn⟩_{n∈N} is a sequence in L⁰(ν) with an upper bound in L⁰(ν).

(i) Let (X, Σ, µ) be a measure space and g, h, ⟨gn⟩_{n∈N} Borel measurable functions from R to itself; write ḡ, h̄, ḡn for the corresponding functions from L⁰ = L⁰(µ) to itself (241I). (i) Show that

$$\bar g(u)+\bar h(u)=\overline{g+h}(u),\qquad \bar g(u)\times\bar h(u)=\overline{g\times h}(u),\qquad \bar g(\bar h(u))=\overline{gh}(u)$$

for every u ∈ L⁰. (ii) Show that if g(t) ≤ h(t) for every t ∈ R, then ḡ(u) ≤ h̄(u) for every u ∈ L⁰. (iii) Show that if g is non-decreasing, then ḡ(u) ≤ ḡ(v) whenever u ≤ v in L⁰. (iv) Show that if h(t) = sup_{n∈N} gn(t) for every t ∈ R, then h̄(u) = sup_{n∈N} ḡn(u) in L⁰ for every u ∈ L⁰.

241Y Further exercises (a) Let (X, Σ, µ) be a measure space and µ̃ the c.l.d. version of µ (213E). (i) Show that ℒ⁰(µ) ⊆ ℒ⁰(µ̃). (ii) Show that this inclusion defines a linear operator T : L⁰(µ) → L⁰(µ̃) such that T(u × v) = Tu × Tv for all u, v ∈ L⁰(µ). (iii) Show that whenever v > 0 in L⁰(µ̃) there is a u ≥ 0 in L⁰(µ) such that 0 < Tu ≤ v. (iv) Show that T(sup A) = sup T[A] whenever A ⊆ L⁰(µ) is a non-empty set with a least upper bound in L⁰(µ). (v) Show that T is injective iff µ is semi-finite. (vi) Show that if µ is localizable, then T is an isomorphism for the linear and order structures of L⁰(µ) and L⁰(µ̃). (Hint: 213Hb.)
(b) Show that any Dedekind σ-complete Riesz space is Archimedean.

(c) Let U be any Riesz space. For u ∈ U write |u| = u ∨ (−u), u⁺ = u ∨ 0, u⁻ = (−u) ∨ 0. Show that, for any u, v ∈ U,

$$u=u^+-u^-,\qquad |u|=u^++u^-=u^+\vee u^-,\qquad u^+\wedge u^-=0,$$
$$u\vee v=\tfrac12(u+v+|u-v|)=u+(v-u)^+,$$
$$u\wedge v=\tfrac12(u+v-|u-v|)=u-(u-v)^+,$$
$$|u+v|\le|u|+|v|.$$

(d) A Riesz space U is said to have the countable sup property if for every A ⊆ U with a least upper bound in U, there is a countable B ⊆ A such that sup B = sup A. Show that if (X, Σ, µ) is a semi-finite measure space, then it is σ-finite iff L⁰(µ) has the countable sup property.

(e) Let (X, Σ, µ) be a measure space and Y any subset of X; let µY be the subspace measure on Y. (i) Show that ℒ⁰(µY) = {f↾Y : f ∈ ℒ⁰(µ)}. (ii) Show that there is a canonical surjection T : L⁰(µ) → L⁰(µY) defined by setting T(f•) = (f↾Y)• for every f ∈ ℒ⁰(µ), which is linear and multiplicative and preserves finite suprema and infima, so that (in particular) T(|u|) = |Tu| for every u ∈ L⁰(µ).

(f) Suppose, in 241Ye, that Y ∈ Σ. Explain how L⁰(µY) may be identified (as ordered linear space) with the subspace {u : u × χ(X \ Y)• = 0} of L⁰(µ).

(g) Let U be a partially ordered linear space and N a linear subspace of U such that whenever u, u′ ∈ N and u′ ≤ v ≤ u then v ∈ N. (i) Show that the linear space quotient U/N is a partially ordered linear space if we say that u• ≤ v• in U/N iff there is a w ∈ N such that u ≤ v + w in U. (ii) Show that in this case U/N is a Riesz space if U is a Riesz space and |u| ∈ N for every u ∈ N.

(h) Let (X, Σ, µ) be a measure space. Write ℒ⁰_strict for the space of all measurable functions from X to R, and N for the subspace of ℒ⁰_strict consisting of measurable functions which are zero almost everywhere. (i) Show that ℒ⁰_strict is a Dedekind σ-complete Riesz space. (ii) Show that L⁰(µ) can be identified, as ordered linear space, with the quotient ℒ⁰_strict/N as defined in 241Yg above.

(i) Let (X, Σ, µ) be a measure space, and h : R → R a non-decreasing function which is continuous on the left. Show that if A ⊆ L⁰ = L⁰(µ) is a non-empty set with a supremum v ∈ L⁰, then h̄(v) = sup_{u∈A} h̄(u), where h̄ : L⁰ → L⁰ is the function described in 241I.

241 Notes and comments As hinted in 241Yb-241Yc, the elementary properties of the space L⁰ which take up most of this section are strongly interdependent; it is not difficult to develop a theory of ‘Riesz algebras’ to incorporate the ideas of 241H into the rest. (Indeed, I sketch such a theory in §352 in the next volume.) If we write ℒ⁰_strict for the space of measurable functions from X to R, then ℒ⁰_strict is also a Dedekind σ-complete Riesz space, and L⁰ can be identified with the quotient ℒ⁰_strict/N, writing N for the set of functions in ℒ⁰_strict which are zero almost everywhere. (To do this properly, we need a theory of quotients of ordered linear spaces; see 241Yg-241Yh above.) Of course ℒ⁰, as I define it, is not quite a linear space. I choose the slightly more awkward description of L⁰ as a space of equivalence classes in ℒ⁰ rather than in ℒ⁰_strict because it frequently happens in practice that a member of L⁰ arises from a member of ℒ⁰ which is either not defined at every point of the underlying space, or not quite measurable; and to adjust such a function so that it becomes a member of ℒ⁰_strict, while trivial, is an arbitrary process which to my mind is liable to distort
the true nature of such a construction. Of course the same argument could be used in favour of a slightly larger space, the space $\mathcal{L}^0_\infty$ of µ-virtually measurable $[-\infty, \infty]$-valued functions defined and finite almost everywhere, relying on 135E rather than on 121E-121F. But I maintain that the operation of restricting a function in $\mathcal{L}^0_\infty$ to the set on which it is finite is not arbitrary, but canonical and entirely natural.

Reading the exposition above – or, for that matter, scanning the rest of this chapter – you are sure to notice a plethora of $^\bullet$s, adding a distinctive character to the pages which, I expect you will feel, is disagreeable to the eye and daunting, or at any rate wearisome, to the spirit. Many, perhaps most, authors prefer to simplify the typography by using the same symbol for a function in $\mathcal{L}^0$ or $\mathcal{L}^0_{\text{strict}}$ and for its equivalence class in $L^0$; and indeed it is common to use syntax which does not distinguish between them either, so that an object which has been defined as a member of $L^0$ will suddenly become a function with actual values at points of the underlying measure space. I prefer to maintain a rigid distinction; you must choose for yourself whether to follow me. Since I have chosen the more cumbersome form, I suppose the burden of proof is on me, to justify my decision.

(i) Anyone would agree that there is at least a formal difference between a function and a set of functions. This by itself does not justify insisting on the difference in every sentence; mathematical exposition would be impossible if we always insisted on consistency in such questions as whether (for instance) the number 3 belonging to the set $\mathbb{N}$ of natural numbers is exactly the same object as the number 3 belonging to the set $\mathbb{C}$ of complex numbers, or the ordinal 3. But the difference between an object and a set to which it belongs is a sufficient difference in kind to make any confusion extremely dangerous, and while I agree that you can study this topic without using different symbols for $f$ and $f^\bullet$, I do not think you can ever safely escape a mental distinction for more than a few lines of argument.

(ii) As a teacher, I have to say that quite a few students, encountering this material for the first time, are misled by any failure to make the distinction between $f$ and $f^\bullet$ into believing that no distinction need be made; and – as a teacher – I always insist on a student convincing me, by correctly writing out the more pedantic forms of the arguments for a few weeks, that he understands the manipulations necessary, before I allow him to go his own way.

(iii) The reason why it is possible to evade the distinction in certain types of argument is just that the Dedekind σ-complete Riesz space $\mathcal{L}^0_{\text{strict}}$ parallels the Dedekind σ-complete Riesz space $L^0$ so closely that any proposition involving only countably many members of these spaces is likely to be valid in one if and only if it is valid in the other. In my view, the implications of this correspondence are at the very heart of measure theory. I prefer therefore to keep it constantly conspicuous, reminding myself through symbolism that every theorem has a Siamese twin, and rising to each challenge to express the twin theorem in an appropriate language.

(iv) There are ways in which $\mathcal{L}^0_{\text{strict}}$ and $L^0$ are actually very different, and many interesting ideas can be expressed only in a language which keeps them clearly separated.

For more than half my life now I have felt that these points between them are sufficient reason for being consistent in maintaining the formal distinction between $f$ and $f^\bullet$. You may feel that in (iii) and (iv) of the last paragraph I am trying to have things both ways; I am arguing that both the similarities and the differences between $\mathcal{L}^0$ and $L^0$ support my case. Indeed that is exactly my position. If they were totally different, using the same language for both would not give rise to confusion; if they were essentially the same, it would not matter if we were sometimes unclear which we were talking about.
242 $L^1$

While the space $L^0$ treated in the previous section is of very great intrinsic interest, its chief use in the elementary theory is as a space in which some of the most important spaces of functional analysis are embedded. In the next few sections I introduce these one at a time. The first is the space $L^1$ of equivalence classes of integrable functions. The importance of this space is not only that it offers a language in which to express those many theorems about integrable functions which do not depend on the differences between two functions which are equal almost everywhere. It can also appear as the natural space in which to seek solutions to a wide variety of integral equations, and as the completion of a space of continuous functions.
242A The space $L^1$ Let $(X, \Sigma, \mu)$ be any measure space.

(a) Let $\mathcal{L}^1 = \mathcal{L}^1(\mu)$ be the set of real-valued functions, defined on subsets of $X$, which are integrable over $X$. Then $\mathcal{L}^1 \subseteq \mathcal{L}^0 = \mathcal{L}^0(\mu)$, as defined in §241, and, for $f \in \mathcal{L}^0$, we have $f \in \mathcal{L}^1$ iff there is a $g \in \mathcal{L}^1$ such that $|f| \le_{\text{a.e.}} g$; if $f \in \mathcal{L}^1$, $g \in \mathcal{L}^0$ and $f =_{\text{a.e.}} g$, then $g \in \mathcal{L}^1$. (See 122P-122R.)

(b) Let $L^1 = L^1(\mu) \subseteq L^0 = L^0(\mu)$ be the set of equivalence classes of members of $\mathcal{L}^1$. If $f, g \in \mathcal{L}^1$ and $f =_{\text{a.e.}} g$ then $\int f = \int g$ (122Rb). Accordingly we may define a functional $\int$ on $L^1$ by writing $\int f^\bullet = \int f$ for every $f \in \mathcal{L}^1$.

(c) It will be convenient to be able to write $\int_A u$ for $u \in L^1$, $A \subseteq X$; this may be defined by saying that $\int_A f^\bullet = \int_A f$ for every $f \in \mathcal{L}^1$, where the integral is defined in 214D. $\mathbf{P}$ I have only to check that if $f =_{\text{a.e.}} g$ then $\int_A f = \int_A g$; and this is because $f \restriction A = g \restriction A$ almost everywhere on $A$. $\mathbf{Q}$

If $E \in \Sigma$, $u \in L^1$ then $\int_E u = \int u \times (\chi E)^\bullet$; this is because $\int_E f = \int f \times \chi E$ for every integrable function $f$ (131Fa).

(d) If $u \in L^1$, there is a Σ-measurable, µ-integrable function $f : X \to \mathbb{R}$ such that $f^\bullet = u$. $\mathbf{P}$ As noted in 241Bk, there is a measurable $f : X \to \mathbb{R}$ such that $f^\bullet = u$; but of course $f$ is integrable because it is equal almost everywhere to some integrable function. $\mathbf{Q}$

242B Theorem Let $(X, \Sigma, \mu)$ be any measure space. Then $L^1(\mu)$ is a linear subspace of $L^0(\mu)$ and $\int : L^1 \to \mathbb{R}$ is a linear functional.

proof If $u, v \in L^1 = L^1(\mu)$ and $c \in \mathbb{R}$ let $f$, $g$ be integrable functions such that $u = f^\bullet$, $v = g^\bullet$; then $f + g$ and $cf$ are integrable, so $u + v = (f + g)^\bullet$ and $cu = (cf)^\bullet$ belong to $L^1$. Also

$\int u + v = \int f + g = \int f + \int g = \int u + \int v$

and

$\int cu = \int cf = c \int f = c \int u$.
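The content of 242Ab and 242B can be checked by hand on a toy example. The following is a minimal sketch, assuming a hypothetical four-point measure space with one negligible point; the names `MU`, `integral` and `equal_ae` are illustrative and do not come from the text. It shows that two functions which are equal almost everywhere have the same integral, so that $\int$ is well defined on equivalence classes, and that the resulting functional is linear.

```python
from fractions import Fraction

# Hypothetical toy measure space (an assumption for illustration):
# X = {0, 1, 2, 3}, every subset measurable, point masses MU[x];
# the point 3 is negligible.
MU = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6), Fraction(0)]

def integral(f):
    """Integral of f : X -> R with respect to the point masses MU."""
    return sum(fx * m for fx, m in zip(f, MU))

def equal_ae(f, g):
    """True if f and g differ only on a set of measure zero."""
    return all(m == 0 or fx == gx for fx, gx, m in zip(f, g, MU))

f = [Fraction(2), Fraction(-3), Fraction(6), Fraction(10)]
g = [Fraction(2), Fraction(-3), Fraction(6), Fraction(-7)]  # differs from f only at the null point
h = [Fraction(1), Fraction(1), Fraction(0), Fraction(5)]
c = Fraction(4)

f_plus_h = [a + b for a, b in zip(f, h)]
c_times_f = [c * a for a in f]

# f and g represent the same class, so they have the same integral.
same_class_same_integral = equal_ae(f, g) and integral(f) == integral(g)
# Linearity of the integral, as in 242B.
additive = integral(f_plus_h) == integral(f) + integral(h)
homogeneous = integral(c_times_f) == c * integral(f)
```

Exact rational arithmetic is used so that the equalities are tested exactly rather than up to rounding.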
242C The order structure of $L^1$ Let $(X, \Sigma, \mu)$ be any measure space.

(a) $L^1 = L^1(\mu)$ has an order structure derived from that of $L^0 = L^0(\mu)$ (241E); that is, $f^\bullet \le g^\bullet$ iff $f \le g$ a.e. Being a linear subspace of $L^0$, $L^1$ must be a partially ordered linear space; the two conditions of 241Ec are obviously inherited by linear subspaces.

Note also that if $u, v \in L^1$ and $u \le v$ then $\int u \le \int v$, because if $f$, $g$ are integrable functions and $f \le_{\text{a.e.}} g$ then $\int f \le \int g$ (122Od).

(b) If $u \in L^0$, $v \in L^1$ and $|u| \le v$ then $u \in L^1$. $\mathbf{P}$ Let $f \in \mathcal{L}^0 = \mathcal{L}^0(\mu)$, $g \in \mathcal{L}^1 = \mathcal{L}^1(\mu)$ be such that $u = f^\bullet$, $v = g^\bullet$; then $g$ is integrable and $|f| \le_{\text{a.e.}} g$, so $f$ is integrable and $u \in L^1$. $\mathbf{Q}$

(c) In particular, $|u| \in L^1$ whenever $u \in L^1$, and

$|\int u| = \max(\int u, \int (-u)) \le \int |u|$,

because $u, -u \le |u|$.

(d) Because $|u| \in L^1$ for every $u \in L^1$,

$u \vee v = \tfrac12(u + v + |u - v|)$, $u \wedge v = \tfrac12(u + v - |u - v|)$

belong to $L^1$ for all $u, v \in L^1$. But if $w \in L^1$ we surely have

$w \le u$ & $w \le v \iff w \le u \wedge v$,

$w \ge u$ & $w \ge v \iff w \ge u \vee v$

because these are true for all $w \in L^0$, so $u \vee v = \sup\{u, v\}$, $u \wedge v = \inf\{u, v\}$ in $L^1$. Thus $L^1$ is, in itself, a Riesz space.
(e) Note that if $u \in L^1$, then $u \ge 0$ iff $\int_E u \ge 0$ for every $E \in \Sigma$; this is because if $f$ is an integrable function on $X$ and $\int_E f \ge 0$ for every $E \in \Sigma$, then $f \ge_{\text{a.e.}} 0$ (131Fb). More generally, if $u, v \in L^1$ and $\int_E u \le \int_E v$ for every $E \in \Sigma$, then $u \le v$. It follows at once that if $u, v \in L^1$ and $\int_E u = \int_E v$ for every $E \in \Sigma$, then $u = v$ (cf. 131Fc).

(f) If $u \ge 0$ in $L^1$, there is a non-negative $f \in \mathcal{L}^1$ such that $f^\bullet = u$ (compare 241Eg).

242D The norm of $L^1$ Let $(X, \Sigma, \mu)$ be any measure space.

(a) For $f \in \mathcal{L}^1 = \mathcal{L}^1(\mu)$ I write $\|f\|_1 = \int |f| \in [0, \infty[$. For $u \in L^1 = L^1(\mu)$ set $\|u\|_1 = \int |u|$, so that $\|f^\bullet\|_1 = \|f\|_1$ for every $f \in \mathcal{L}^1$. Then $\|\ \|_1$ is a norm on $L^1$. $\mathbf{P}$ (i) If $u, v \in L^1$ then $|u + v| \le |u| + |v|$, by 241Ee, so

$\|u + v\|_1 = \int |u + v| \le \int |u| + |v| = \int |u| + \int |v| = \|u\|_1 + \|v\|_1$.

(ii) If $u \in L^1$ and $c \in \mathbb{R}$ then

$\|cu\|_1 = \int |cu| = \int |c||u| = |c| \int |u| = |c| \|u\|_1$.

(iii) If $u \in L^1$ and $\|u\|_1 = 0$, express $u$ as $f^\bullet$, where $f \in \mathcal{L}^1$; then $\int |f| = \int |u| = 0$. Because $|f|$ is non-negative, it must be zero almost everywhere (122Rc), so $f =_{\text{a.e.}} 0$ and $u = 0$ in $L^1$. $\mathbf{Q}$

(b) Thus $L^1$, with $\|\ \|_1$, is a normed space and $\int : L^1 \to \mathbb{R}$ is a linear operator; observe that $\|\int\| \le 1$, because

$|\int u| \le \int |u| = \|u\|_1$

for every $u \in L^1$.

(c) If $u, v \in L^1$ and $|u| \le |v|$, then

$\|u\|_1 = \int |u| \le \int |v| = \|v\|_1$.

In particular, $\||u|\|_1 = \|u\|_1$ for every $u \in L^1$.

(d) Note the following property of the normed Riesz space $L^1$: if $u, v \in L^1$ and $u, v \ge 0$, then

$\|u + v\|_1 = \int u + v = \int u + \int v = \|u\|_1 + \|v\|_1$.

(e) The set $(L^1)^+ = \{u : u \ge 0\}$ is closed in $L^1$. $\mathbf{P}$ If $v \in L^1$, $u \in (L^1)^+$ then $\|u - v\|_1 \ge \|v \wedge 0\|_1$; this is because if $f, g \in \mathcal{L}^1$ and $f \ge_{\text{a.e.}} 0$, $|f(x) - g(x)| \ge |\min(g(x), 0)|$ whenever $f(x)$ and $g(x)$ are both defined and $f(x) \ge 0$, which is almost everywhere, so

$\|u - v\|_1 = \int |f - g| \ge \int |g \wedge 0| = \|v \wedge 0\|_1$.

Now this means that if $v \in L^1$ and $v \not\ge 0$, the ball $\{w : \|w - v\|_1 < \delta\}$ does not meet $(L^1)^+$, where $\delta = \|v \wedge 0\|_1 > 0$ because $v \wedge 0 \ne 0$. Thus $L^1 \setminus (L^1)^+$ is open and $(L^1)^+$ is closed. $\mathbf{Q}$

242E For the next result we need a variant of B. Levi's theorem.
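Lemma Let $(X, \Sigma, \mu)$ be a measure space and $\langle f_n \rangle_{n \in \mathbb{N}}$ a sequence of µ-integrable real-valued functions such that $\sum_{n=0}^\infty \int |f_n| < \infty$. Then $f = \sum_{n=0}^\infty f_n$ is integrable and

$\int f = \sum_{n=0}^\infty \int f_n$, $\int |f| \le \sum_{n=0}^\infty \int |f_n|$.

proof (a) Suppose first that every $f_n$ is non-negative. Set $g_n = \sum_{k=0}^n f_k$ for each $n$; then $\langle g_n \rangle_{n \in \mathbb{N}}$ is increasing a.e. and

$\lim_{n \to \infty} \int g_n = \sum_{k=0}^\infty \int f_k$

is finite, so by B. Levi's theorem (123A) $f = \lim_{n \to \infty} g_n$ is integrable and

$\int f = \lim_{n \to \infty} \int g_n = \sum_{k=0}^\infty \int f_k$.

In this case, of course,

$\int |f| = \int f = \sum_{n=0}^\infty \int f_n = \sum_{n=0}^\infty \int |f_n|$.

(b) For the general case, set $f_n^+ = \tfrac12(|f_n| + f_n)$, $f_n^- = \tfrac12(|f_n| - f_n)$, as in 241Ef; then $f_n^+$ and $f_n^-$ are non-negative integrable functions, and

$\sum_{n=0}^\infty \int f_n^+ + \sum_{n=0}^\infty \int f_n^- = \sum_{n=0}^\infty \int |f_n| < \infty$.

So $h_1 = \sum_{n=0}^\infty f_n^+$ and $h_2 = \sum_{n=0}^\infty f_n^-$ are both integrable. Now $f =_{\text{a.e.}} h_1 - h_2$, so

$\int f = \int h_1 - \int h_2 = \sum_{n=0}^\infty \int f_n^+ - \sum_{n=0}^\infty \int f_n^- = \sum_{n=0}^\infty \int f_n$.

Finally

$\int |f| \le \int |h_1| + \int |h_2| = \sum_{n=0}^\infty \int f_n^+ + \sum_{n=0}^\infty \int f_n^- = \sum_{n=0}^\infty \int |f_n|$.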
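The lemma can be seen at work in a case where every quantity has a closed form. The following is a minimal sketch, assuming the example $f_n(x) = (-x)^n$ on $[0, \tfrac12]$ with Lebesgue measure (this example, the truncation level `N_TERMS` and the helper names are illustrative choices, not taken from the text). Here $\sum_n f_n(x) = 1/(1 + x)$, $\sum_n \int |f_n| = \ln 2$ is finite, and the two sides of $\int f = \sum_n \int f_n$ are both $\ln \tfrac32$.

```python
import math

# Assumed example (not from the text): f_n(x) = (-x)**n on [0, 1/2],
# so that sum_n f_n(x) = 1/(1 + x) pointwise.
N_TERMS = 60  # truncation of the infinite series; the neglected tail is below 2**-60

def int_fn(n):
    """Closed form of the integral of f_n over [0, 1/2]."""
    return (-1) ** n * 0.5 ** (n + 1) / (n + 1)

def int_abs_fn(n):
    """Closed form of the integral of |f_n| over [0, 1/2]."""
    return 0.5 ** (n + 1) / (n + 1)

# Hypothesis of the lemma: the sum of the integrals of |f_n| is finite (it is ln 2).
sum_of_abs_integrals = sum(int_abs_fn(n) for n in range(N_TERMS))

# Conclusion: the integral of the sum equals the sum of the integrals.
sum_of_integrals = sum(int_fn(n) for n in range(N_TERMS))
integral_of_sum = math.log(1.5)  # integral of 1/(1 + x) over [0, 1/2]
```

Since $f \ge 0$ here, $\int |f| = \ln \tfrac32 \le \ln 2$, which is the second inequality of the lemma.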
242F Theorem For any measure space $(X, \Sigma, \mu)$, $L^1(\mu)$ is complete under its norm $\|\ \|_1$.

proof Let $\langle u_n \rangle_{n \in \mathbb{N}}$ be a sequence in $L^1$ such that $\|u_{n+1} - u_n\|_1 \le 4^{-n}$ for every $n \in \mathbb{N}$. Choose integrable functions $f_n$ such that $f_0^\bullet = u_0$, $f_{n+1}^\bullet = u_{n+1} - u_n$ for each $n \in \mathbb{N}$. Then

$\sum_{n=0}^\infty \int |f_n| = \|u_0\|_1 + \sum_{n=0}^\infty \|u_{n+1} - u_n\|_1 < \infty$.

So $f = \sum_{n=0}^\infty f_n$ is integrable, by 242E, and $u = f^\bullet \in L^1$. Set $g_n = \sum_{j=0}^n f_j$ for each $n$; then $g_n^\bullet = u_n$, and $\int |f_j| = \|u_j - u_{j-1}\|_1 \le 4^{-(j-1)}$ for $j \ge 1$, so

$\|u - u_n\|_1 = \int |f - g_n| \le \int \sum_{j=n+1}^\infty |f_j| \le \sum_{j=n+1}^\infty 4^{-(j-1)} = 4^{1-n}/3$

for each $n$. Thus $u = \lim_{n \to \infty} u_n$ in $L^1$. As $\langle u_n \rangle_{n \in \mathbb{N}}$ is arbitrary, $L^1$ is complete (2A4E).

242G Definition It will be convenient, for later reference, to introduce the following phrase. A Banach lattice is a Riesz space $U$ together with a norm $\|\ \|$ on $U$ such that (i) $\|u\| \le \|v\|$ whenever $u, v \in U$ and $|u| \le |v|$, writing $|u|$ for $u \vee (-u)$, as in 241Ee (ii) $U$ is complete under $\|\ \|$. Thus 242Dc and 242F amount to saying that the normed Riesz space $(L^1, \|\ \|_1)$ is a Banach lattice.

242H $L^1$ as a Riesz space We can discuss the ordered linear space $L^1$ in the language already used in 241E-241G for $L^0$.

Theorem Let $(X, \Sigma, \mu)$ be any measure space. Then $L^1 = L^1(\mu)$ is Dedekind complete.

proof (a) Let $A \subseteq L^1$ be any non-empty set which is bounded above in $L^1$. Set

$A' = \{u_0 \vee \ldots \vee u_n : u_0, \ldots, u_n \in A\}$.

Then $A \subseteq A'$, $A'$ has the same upper bounds as $A$ and $u \vee v \in A'$ for all $u, v \in A'$. Taking $w_0$ to be any upper bound of $A$ and $A'$, we have $\int u \le \int w_0$ for every $u \in A'$, so $\gamma = \sup_{u \in A'} \int u$ is defined in $\mathbb{R}$. For each $n \in \mathbb{N}$, choose $u_n \in A'$ such that $\int u_n \ge \gamma - 2^{-n}$. Because $L^0 = L^0(\mu)$ is Dedekind σ-complete (241Ga), $u^* = \sup_{n \in \mathbb{N}} u_n$ is defined in $L^0$, and $u_0 \le u^* \le w_0$ in $L^0$. Consequently

$0 \le u^* - u_0 \le w_0 - u_0$

in $L^0$. But $w_0 - u_0 \in L^1$, so $u^* - u_0 \in L^1$ (242Cb) and $u^* \in L^1$.

(b) The point is that $u^*$ is an upper bound for $A$. $\mathbf{P}$ If $u \in A$, then $u \vee u_n \in A'$ for every $n$, so

$\|u - u \wedge u^*\|_1 = \int u - u \wedge u^* \le \int u - u \wedge u_n$

(because $u \wedge u_n \le u_n \le u^*$, so $u \wedge u_n \le u \wedge u^*$)

$= \int u \vee u_n - u_n$

(because $u \vee u_n + u \wedge u_n = u + u_n$ – see the formulae in 242Cd)

$= \int u \vee u_n - \int u_n \le \gamma - (\gamma - 2^{-n}) = 2^{-n}$
for every $n$; so $\|u - u \wedge u^*\|_1 = 0$. But this means that $u = u \wedge u^*$, that is, that $u \le u^*$. As $u$ is arbitrary, $u^*$ is an upper bound for $A$. $\mathbf{Q}$

(c) On the other hand, any upper bound for $A$ is surely an upper bound for $\{u_n : n \in \mathbb{N}\}$, so is greater than or equal to $u^*$. Thus $u^* = \sup A$ in $L^1$. As $A$ is arbitrary, $L^1$ is Dedekind complete.

Remark Note that the order-completeness of $L^1$, unlike that of $L^0$, does not depend on any particular property of the measure space $(X, \Sigma, \mu)$.

242I The Radon-Nikodým theorem I think it is worth re-writing the Radon-Nikodým theorem (232E) in the language of this chapter.

Theorem Let $(X, \Sigma, \mu)$ be a measure space. Then there is a canonical bijection between $L^1 = L^1(\mu)$ and the set of truly continuous additive functionals $\nu : \Sigma \to \mathbb{R}$, given by the formula

$\nu F = \int_F u$ for $F \in \Sigma$, $u \in L^1$.

Remark Recall that if $\mu$ is σ-finite, then the truly continuous additive functionals are just the absolutely continuous countably additive functionals; and that if $\mu$ is totally finite, then all absolutely continuous (finitely) additive functionals are truly continuous (232B).

proof For $u \in L^1$, $F \in \Sigma$ set $\nu_u F = \int_F u$. If $u \in L^1$, there is an integrable function $f$ such that $f^\bullet = u$, in which case

$F \mapsto \nu_u F = \int_F f : \Sigma \to \mathbb{R}$

is additive and truly continuous, by 232D. If $\nu : \Sigma \to \mathbb{R}$ is additive and truly continuous, then by 232E there is an integrable function $f$ such that $\nu F = \int_F f$ for every $F \in \Sigma$; setting $u = f^\bullet$ in $L^1$, $\nu = \nu_u$. Finally, if $u$, $v$ are distinct members of $L^1$, there is an $F \in \Sigma$ such that $\int_F u \ne \int_F v$ (242Ce), so that $\nu_u \ne \nu_v$; thus $u \mapsto \nu_u$ is injective as well as surjective.

242J Conditional expectations revisited We now have the machinery necessary for a new interpretation of some of the ideas of §233.

(a) Let $(X, \Sigma, \mu)$ be a measure space, and $\mathrm{T}$ a σ-subalgebra of $\Sigma$, as in 233A. Then $(X, \mathrm{T}, \mu \restriction \mathrm{T})$ is a measure space, and $\mathcal{L}^0(\mu \restriction \mathrm{T}) \subseteq \mathcal{L}^0(\mu)$; moreover, if $f, g \in \mathcal{L}^0(\mu \restriction \mathrm{T})$, then $f = g$ $(\mu \restriction \mathrm{T})$-a.e. iff $f = g$ µ-a.e. $\mathbf{P}$ There are $\mu \restriction \mathrm{T}$-conegligible sets $F, G \in \mathrm{T}$ such that $f \restriction F$ and $g \restriction G$ are $\mathrm{T}$-measurable; set

$E = \{x : x \in F \cap G, f(x) \ne g(x)\} \in \mathrm{T}$;

then

$f = g$ $(\mu \restriction \mathrm{T})$-a.e. $\iff (\mu \restriction \mathrm{T})(E) = 0 \iff \mu E = 0 \iff f = g$ µ-a.e. $\mathbf{Q}$

Accordingly we have a canonical map $S : L^0(\mu \restriction \mathrm{T}) \to L^0(\mu)$ defined by saying that if $u \in L^0(\mu \restriction \mathrm{T})$ is the equivalence class of $f \in \mathcal{L}^0(\mu \restriction \mathrm{T})$, then $Su$ is the equivalence class of $f$ in $L^0(\mu)$. It is easy to check, working through the operations described in 241D, 241E and 241H, that $S$ is linear, injective and order-preserving, and that $|Su| = S|u|$, $S(u \vee v) = Su \vee Sv$, $S(u \times v) = Su \times Sv$ for $u, v \in L^0(\mu \restriction \mathrm{T})$.

(b) Next, if $f \in \mathcal{L}^1(\mu \restriction \mathrm{T})$, then $f \in \mathcal{L}^1(\mu)$ and $\int f\,d\mu = \int f\,d(\mu \restriction \mathrm{T})$ (233B); so $Su \in L^1(\mu)$ and $\|Su\|_1 = \|u\|_1$ for every $u \in L^1(\mu \restriction \mathrm{T})$.

Observe also that every member of $L^1(\mu) \cap S[L^0(\mu \restriction \mathrm{T})]$ is actually in $S[L^1(\mu \restriction \mathrm{T})]$. $\mathbf{P}$ Take $u \in L^1(\mu) \cap S[L^0(\mu \restriction \mathrm{T})]$. Then $u$ is expressible both as $f^\bullet$ where $f \in \mathcal{L}^1(\mu)$, and as $g^\bullet$ where $g \in \mathcal{L}^0(\mu \restriction \mathrm{T})$. So $g =_{\text{a.e.}} f$, and $g$ is µ-integrable, therefore $(\mu \restriction \mathrm{T})$-integrable. $\mathbf{Q}$

This means that $S : L^1(\mu \restriction \mathrm{T}) \to L^1(\mu) \cap S[L^0(\mu \restriction \mathrm{T})]$ is a bijection.

(c) Now suppose that $\mu X = 1$, so that $(X, \Sigma, \mu)$ is a probability space. Recall that $g$ is a conditional expectation of $f$ on $\mathrm{T}$ if $g$ is $\mu \restriction \mathrm{T}$-integrable, $f$ is µ-integrable and $\int_F g = \int_F f$ for every $F \in \mathrm{T}$; and that every µ-integrable function has such a conditional expectation (233D). If $g$ is a conditional expectation of $f$ and $f_1 = f$ µ-a.e. then $g$ is a conditional expectation of $f_1$, because $\int_F f_1 = \int_F f$ for every $F$; and I have already remarked in 233Dc that if $g$, $g_1$ are conditional expectations of $f$ on $\mathrm{T}$ then $g = g_1$ $\mu \restriction \mathrm{T}$-a.e.
(d) This means that we have an operator $P : L^1(\mu) \to L^1(\mu \restriction \mathrm{T})$ defined by saying that $P(f^\bullet) = g^\bullet$ whenever $g \in \mathcal{L}^1(\mu \restriction \mathrm{T})$ is a conditional expectation of $f \in \mathcal{L}^1(\mu)$ on $\mathrm{T}$; that is, that $\int_F Pu = \int_F u$ whenever $u \in L^1(\mu)$, $F \in \mathrm{T}$. If we identify $L^1(\mu)$, $L^1(\mu \restriction \mathrm{T})$ with the sets of absolutely continuous additive functionals defined on $\Sigma$ and $\mathrm{T}$, as in 242I, then $P$ corresponds to the operation $\nu \mapsto \nu \restriction \mathrm{T}$.

(e) Because $Pu$ is uniquely defined in $L^1(\mu \restriction \mathrm{T})$ by the requirement $\int_F Pu = \int_F u$ for every $F \in \mathrm{T}$ (242Ce), we see that $P$ must be linear. $\mathbf{P}$ If $u, v \in L^1(\mu)$ and $c \in \mathbb{R}$, then

$\int_F Pu + Pv = \int_F Pu + \int_F Pv = \int_F u + \int_F v = \int_F u + v = \int_F P(u + v)$,

$\int_F P(cu) = \int_F cu = c \int_F u = c \int_F Pu = \int_F cPu$

for every $F \in \mathrm{T}$. $\mathbf{Q}$ Also, if $u \ge 0$, then $\int_F Pu = \int_F u \ge 0$ for every $F \in \mathrm{T}$, so $Pu \ge 0$ (242Ce again). It follows at once that $P$ is order-preserving, that is, that $Pu \le Pv$ whenever $u \le v$. Consequently

$|Pu| = Pu \vee (-Pu) = Pu \vee P(-u) \le P|u|$

for every $u \in L^1(\mu)$, because $u \le |u|$ and $-u \le |u|$.

(f) We may legitimately regard $Pu \in L^1(\mu \restriction \mathrm{T})$ as 'the' conditional expectation of $u \in L^1(\mu)$ on $\mathrm{T}$; $P$ is the conditional expectation operator.

(g) If $u \in L^1(\mu \restriction \mathrm{T})$, then we have a corresponding $Su \in L^1(\mu)$, as in (b); now $PSu = u$. $\mathbf{P}$ $\int_F PSu = \int_F Su = \int_F u$ for every $F \in \mathrm{T}$. $\mathbf{Q}$ Consequently $SPSP = SP : L^1(\mu) \to L^1(\mu)$.

(h) The distinction drawn above between $u = f^\bullet \in L^0(\mu \restriction \mathrm{T})$ and $Su = f^\bullet \in L^0(\mu)$ is of course pedantic. I believe it is necessary to be aware of such distinctions, even though for nearly all purposes it is safe as well as convenient to regard $L^0(\mu \restriction \mathrm{T})$ as actually a subset of $L^0(\mu)$. If we do so, then (b) tells us that we can identify $L^1(\mu \restriction \mathrm{T})$ with $L^1(\mu) \cap L^0(\mu \restriction \mathrm{T})$, while (g) becomes '$P^2 = P$'.

242K The language just introduced allows the following reformulations of 233J-233K.
Theorem Let $(X, \Sigma, \mu)$ be a probability space and $\mathrm{T}$ a σ-subalgebra of $\Sigma$. Let $\phi : \mathbb{R} \to \mathbb{R}$ be a convex function and $\bar\phi : L^0(\mu) \to L^0(\mu)$ the corresponding operator defined by setting $\bar\phi(f^\bullet) = (\phi f)^\bullet$ (241I). If $P : L^1(\mu) \to L^1(\mu \restriction \mathrm{T})$ is the conditional expectation operator, then $\bar\phi(Pu) \le P(\bar\phi u)$ whenever $u \in L^1(\mu)$ is such that $\bar\phi(u) \in L^1(\mu)$.

proof This is just a restatement of 233J.

242L Proposition Let $(X, \Sigma, \mu)$ be a probability space, and $\mathrm{T}$ a σ-subalgebra of $\Sigma$. Let $P : L^1(\mu) \to L^1(\mu \restriction \mathrm{T})$ be the corresponding conditional expectation operator. If $u \in L^1 = L^1(\mu)$ and $v \in L^0(\mu \restriction \mathrm{T})$, then $u \times v \in L^1$ iff $P|u| \times v \in L^1$, and in this case $P(u \times v) = Pu \times v$; in particular, $\int u \times v = \int Pu \times v$.
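Before turning to the proof, it may help to see 242J-242L on a finite probability space, where the conditional expectation is simply averaging over the atoms of $\mathrm{T}$. The following is a minimal sketch, assuming a hypothetical six-point space with uniform measure and a σ-subalgebra generated by a partition; the names `MU`, `ATOMS`, `integral` and `cond_exp` are illustrative and do not come from the text.

```python
from fractions import Fraction

# Hypothetical finite probability space (an assumption for illustration):
# X = {0, ..., 5} with uniform measure, T generated by the partition ATOMS.
MU = [Fraction(1, 6)] * 6
ATOMS = [[0, 1], [2, 3, 4], [5]]

def integral(f, points=range(6)):
    """Integral of f over the given set of points."""
    return sum(f[x] * MU[x] for x in points)

def cond_exp(f):
    """Conditional expectation on T: replace f by its mean over each atom."""
    g = [None] * 6
    for atom in ATOMS:
        mass = sum(MU[x] for x in atom)
        mean = integral(f, atom) / mass
        for x in atom:
            g[x] = mean
    return g

u = [Fraction(k) for k in (3, -1, 4, 1, -5, 9)]
v = [Fraction(k) for k in (2, 2, -1, -1, -1, 7)]  # constant on atoms, hence T-measurable

Pu = cond_exp(u)

# 242Jd: the integrals of Pu and u agree over every atom of T.
defining_property = all(integral(Pu, F) == integral(u, F) for F in ATOMS)
# 242Jg-h: P is idempotent.
idempotent = cond_exp(Pu) == Pu
# 242Je: |Pu| <= P|u|.
abs_bound = all(abs(p) <= q for p, q in zip(Pu, cond_exp([abs(a) for a in u])))
# 242K with the convex function phi(t) = t*t.
jensen = all(p * p <= q for p, q in zip(Pu, cond_exp([a * a for a in u])))
# 242L: P(u x v) = Pu x v when v is T-measurable.
uv = [a * b for a, b in zip(u, v)]
Pu_times_v = [p * b for p, b in zip(Pu, v)]
product_rule = cond_exp(uv) == Pu_times_v
```

On a finite space every function is integrable, so the integrability hypotheses of 242K and 242L are automatic here; the sketch only illustrates the identities and inequalities.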
proof (I am here using the identification of $L^0(\mu \restriction \mathrm{T})$ as a subspace of $L^0(\mu)$, as suggested in 242Jh.) Express $u$ as $f^\bullet$, $v$ as $h^\bullet$, where $f \in \mathcal{L}^1 = \mathcal{L}^1(\mu)$ and $h \in \mathcal{L}^0(\mu \restriction \mathrm{T})$. Let $g, g_0 \in \mathcal{L}^1(\mu \restriction \mathrm{T})$ be conditional expectations of $f$, $|f|$ respectively, so that $Pu = g^\bullet$ and $P|u| = g_0^\bullet$. Then, using 233K,

$u \times v \in L^1 \iff f \times h \in \mathcal{L}^1 \iff g_0 \times h \in \mathcal{L}^1 \iff P|u| \times v \in L^1$,

and in this case $g \times h$ is a conditional expectation of $f \times h$, that is, $Pu \times v = P(u \times v)$.

242M $L^1$ as a completion I mentioned in the introduction to this section that $L^1$ appears in functional analysis as a completion of some important spaces; put another way, some dense subspaces of $L^1$ are significant. The first is elementary.

Proposition Let $(X, \Sigma, \mu)$ be any measure space, and write $\mathbf{S}$ for the space of µ-simple functions on $X$. Then
(a) whenever $f$ is a µ-integrable real-valued function and $\epsilon > 0$, there is an $h \in \mathbf{S}$ such that $\int |f - h| \le \epsilon$;
(b) $S = \{f^\bullet : f \in \mathbf{S}\}$ is a dense linear subspace of $L^1 = L^1(\mu)$.
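Before the proof, here is a numerical illustration of part (a) in the simplest case. It is a minimal sketch, assuming the example $f(x) = x$ on $[0, 1[$ with Lebesgue measure and the dyadic simple functions $h_n = 2^{-n}\lfloor 2^n f \rfloor$ below $f$; the helper names are hypothetical. Since $h_n \le f$, the error $\int |f - h_n|$ is the difference of the integrals, and it can be driven below any prescribed $\epsilon$.

```python
from fractions import Fraction

def simple_below(n):
    """Dyadic simple function h_n <= f for f(x) = x on [0, 1[,
    represented as a list of (value, measure of the set where it is taken)."""
    step = Fraction(1, 2 ** n)
    return [(k * step, step) for k in range(2 ** n)]

def integral_simple(h):
    """Integral of a simple function given as (value, measure) pairs."""
    return sum(value * measure for value, measure in h)

INTEGRAL_F = Fraction(1, 2)  # integral of f(x) = x over [0, 1[

def l1_error(n):
    # h_n <= f everywhere, so the integral of |f - h_n| is the
    # difference of the two integrals.
    return INTEGRAL_F - integral_simple(simple_below(n))

def level_for(eps):
    """Smallest n with l1_error(n) <= eps."""
    n = 0
    while l1_error(n) > eps:
        n += 1
    return n
```

For this particular $f$ the error at level $n$ is exactly $2^{-(n+1)}$, so for example `level_for(Fraction(1, 100))` stops at the first $n$ with $2^{-(n+1)} \le 1/100$.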
proof (a)(i) If $f$ is non-negative, then there is a simple function $h$ such that $h \le_{\text{a.e.}} f$ and $\int h \ge \int f - \tfrac12\epsilon$ (122K), in which case

$\int |f - h| = \int f - h = \int f - \int h \le \tfrac12\epsilon$.

(ii) In the general case, $f$ is expressible as a difference $f_1 - f_2$ of non-negative integrable functions. Now there are $h_1, h_2 \in \mathbf{S}$ such that $\int |f_j - h_j| \le \tfrac12\epsilon$ for both $j$ and, setting $h = h_1 - h_2 \in \mathbf{S}$,

$\int |f - h| \le \int |f_1 - h_1| + \int |f_2 - h_2| \le \epsilon$.

(b) Because $\mathbf{S}$ is a linear subspace of $\mathbb{R}^X$ included in $\mathcal{L}^1 = \mathcal{L}^1(\mu)$, $S$ is a linear subspace of $L^1$. If $u \in L^1$ and $\epsilon > 0$, there are an $f \in \mathcal{L}^1$ such that $f^\bullet = u$ and an $h \in \mathbf{S}$ such that $\int |f - h| \le \epsilon$; now $v = h^\bullet \in S$ and

$\|u - v\|_1 = \int |f - h| \le \epsilon$.

As $u$ and $\epsilon$ are arbitrary, $S$ is dense in $L^1$.

242N As always, Lebesgue measure on $\mathbb{R}^r$ and its subsets is by far the most important example; and in this case we have further classes of dense subspace of $L^1$. If you have reached this point without yet troubling to master multi-dimensional Lebesgue measure, just take $r = 1$. If you feel uncomfortable with general subspace measures, take $X$ to be $\mathbb{R}^r$ or $[0, 1] \subseteq \mathbb{R}$ or some other particular subset which you find interesting. The following term will be useful.

Definition If $f$ is a real- or complex-valued function defined on a subset of $\mathbb{R}^r$, say that the support of $f$ is $\overline{\{x : x \in \operatorname{dom} f, f(x) \ne 0\}}$.

242O Theorem Let $X$ be any subset of $\mathbb{R}^r$, where $r \ge 1$, and let $\mu$ be Lebesgue measure on $X$, that is, the subspace measure on $X$ induced by Lebesgue measure on $\mathbb{R}^r$. Write $C_k$ for the space of bounded continuous functions $f : \mathbb{R}^r \to \mathbb{R}$ which have bounded support, and $S_0$ for the space of linear combinations of functions of the form $\chi I$ where $I \subseteq \mathbb{R}^r$ is a bounded half-open interval. Then
(a) whenever $f \in \mathcal{L}^1 = \mathcal{L}^1(\mu)$ and $\epsilon > 0$, there are $g \in C_k$, $h \in S_0$ such that $\int_X |f - g| \le \epsilon$, $\int_X |f - h| \le \epsilon$;
(b) $\{(g \restriction X)^\bullet : g \in C_k\}$ and $\{(h \restriction X)^\bullet : h \in S_0\}$ are dense linear subspaces of $L^1 = L^1(\mu)$.

Remark Of course there is a redundant 'bounded' in the description of $C_k$; see 242Xh.

proof (a) I argue in turn that the result is valid for each of an increasing number of members $f$ of $\mathcal{L}^1 = \mathcal{L}^1(\mu)$. Write $\mu_r$ for Lebesgue measure on $\mathbb{R}^r$, so that $\mu$ is the subspace measure $(\mu_r)_X$.
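(i) Suppose first that $f = \chi I \restriction X$ where $I \subseteq \mathbb{R}^r$ is a bounded half-open interval. Of course $\chi I$ is already in $S_0$, so I have only to show that it is approximated by members of $C_k$. If $I = \emptyset$ the result is trivial; we can take $g = 0$. Otherwise, express $I$ as $[a - b, a + b[$ where $a = (\alpha_1, \ldots, \alpha_r)$, $b = (\beta_1, \ldots, \beta_r)$ and $\beta_j > 0$ for each $j$. Let $\delta > 0$ be such that

$2^r \prod_{j \le r} (\beta_j + \delta) \le \epsilon + 2^r \prod_{j \le r} \beta_j$.

For $\xi \in \mathbb{R}$ set

$g_j(\xi) = 1$ if $|\xi - \alpha_j| \le \beta_j$,
$g_j(\xi) = (\beta_j + \delta - |\xi - \alpha_j|)/\delta$ if $\beta_j \le |\xi - \alpha_j| \le \beta_j + \delta$,
$g_j(\xi) = 0$ if $|\xi - \alpha_j| \ge \beta_j + \delta$.

[Figure: the function $g_j$, with $\alpha_j$, $\beta_j$, $\delta$ and the value 1 marked.]

For $x = (\xi_1, \ldots, \xi_r) \in \mathbb{R}^r$ set

$g(x) = \prod_{j \le r} g_j(\xi_j)$.

Then $g \in C_k$ and $\chi I \le g \le \chi J$, where $J = [a - b - \delta\mathbf{1}, a + b + \delta\mathbf{1}]$ (writing $\mathbf{1} = (1, \ldots, 1)$), so that (by the choice of $\delta$) $\mu_r J \le \mu_r I + \epsilon$, and

$\int_X |g - f| \le \int (\chi(J \cap X) - \chi(I \cap X))\,d\mu = \mu((J \setminus I) \cap X)$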
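$\le \mu_r(J \setminus I) = \mu_r J - \mu_r I \le \epsilon$,

as required.

(ii) Now suppose that $f = \chi(X \cap E)$ where $E \subseteq \mathbb{R}^r$ is a set of finite measure. Then there is a disjoint family $I_0, \ldots, I_n$ of half-open intervals such that $\mu_r(E \triangle \bigcup_{j \le n} I_j) \le \tfrac12\epsilon$. $\mathbf{P}$ There is an open set $G \supseteq E$ such that $\mu_r(G \setminus E) \le \tfrac14\epsilon$ (134Fa). For each $m \in \mathbb{N}$, let $\mathcal{I}_m$ be the family of half-open intervals in $\mathbb{R}^r$ of the form $[a, b[$ where $a = (2^{-m}k_1, \ldots, 2^{-m}k_r)$, $k_1, \ldots, k_r$ being integers, and $b = a + 2^{-m}\mathbf{1}$; then $\mathcal{I}_m$ is a disjoint family. Set $H_m = \bigcup\{I : I \in \mathcal{I}_m, I \subseteq G\}$; then $\langle H_m \rangle_{m \in \mathbb{N}}$ is a non-decreasing family with union $G$, so that there is an $m$ such that $\mu_r(G \setminus H_m) \le \tfrac14\epsilon$ and $\mu_r(E \triangle H_m) \le \tfrac12\epsilon$. But now $H_m$ is expressible as a disjoint union $\bigcup_{j \le n} I_j$ where $I_0, \ldots, I_n$ enumerate the members of $\mathcal{I}_m$ included in $H_m$. (The last sentence derails if $H_m$ is empty. But if $H_m = \emptyset$ then we can take $n = 0$, $I_0 = \emptyset$.) $\mathbf{Q}$ Accordingly $h = \sum_{j=0}^n \chi I_j \in S_0$ and

$\int_X |f - h| = \mu(X \cap (E \triangle \bigcup_{j \le n} I_j)) \le \tfrac12\epsilon$.

As for $C_k$, (i) tells us that there is for each $j \le n$ a $g_j \in C_k$ such that $\int_X |g_j - \chi I_j| \le \frac{\epsilon}{2(n+1)}$, so that $g = \sum_{j=0}^n g_j \in C_k$ and

$\int_X |f - g| \le \int_X |f - h| + \int_X |h - g| \le \frac{\epsilon}{2} + \sum_{j=0}^n \int_X |g_j - \chi I_j| \le \epsilon$.

(iii) If $f$ is a simple function, express $f$ as $\sum_{k=0}^n a_k \chi E_k$ where each $E_k$ is of finite measure in $X$. Each $E_k$ is expressible as $X \cap F_k$ where $\mu_r F_k = \mu E_k$ (214Ca). By (ii), we can find $g_k \in C_k$, $h_k \in S_0$ such that

$|a_k| \int_X |g_k - \chi F_k| \le \frac{\epsilon}{n+1}$, $|a_k| \int_X |h_k - \chi F_k| \le \frac{\epsilon}{n+1}$

for each $k$. Set $g = \sum_{k=0}^n a_k g_k$, $h = \sum_{k=0}^n a_k h_k$; then $g \in C_k$, $h \in S_0$ and

$\int_X |f - g| \le \int_X \sum_{k=0}^n |a_k| |\chi F_k - g_k| = \sum_{k=0}^n |a_k| \int_X |\chi F_k - g_k| \le \epsilon$,

$\int_X |f - h| \le \sum_{k=0}^n |a_k| \int_X |\chi F_k - h_k| \le \epsilon$,

as required.

(iv) If $f$ is any integrable function on $X$, then by 242Ma we can find a simple function $f_0$ such that $\int |f - f_0| \le \tfrac12\epsilon$, and now by (iii) there are $g \in C_k$, $h \in S_0$ such that $\int_X |f_0 - g| \le \tfrac12\epsilon$, $\int_X |f_0 - h| \le \tfrac12\epsilon$; so that

$\int_X |f - g| \le \int_X |f - f_0| + \int_X |f_0 - g| \le \epsilon$,

$\int_X |f - h| \le \int_X |f - f_0| + \int_X |f_0 - h| \le \epsilon$.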
(b)(i) We must check first that if $g \in C_k$ then $g \restriction X$ is actually µ-integrable. The point here is that if $g \in C_k$ and $a \in \mathbb{R}$ then $\{x : x \in X, g(x) > a\}$ is the intersection of $X$ with an open subset of $\mathbb{R}^r$, and is therefore in the domain of $\mu$, because all open sets are Lebesgue measurable (115G). Next, $g$ is bounded and the set $E = \{x : x \in X, g(x) \ne 0\}$ is bounded in $\mathbb{R}^r$, therefore of finite outer measure for $\mu_r$ and of finite measure for $\mu$. Thus there is an $M \ge 0$ such that $|g \restriction X| \le M\chi E$, which is µ-integrable. Accordingly $g \restriction X$ is µ-integrable.
Of course $h \restriction X$ is µ-integrable for every $h \in S_0$ because (by the definition of subspace measure) $\mu(I \cap X)$ is defined and finite for every bounded half-open interval $I$.

(ii) Now the rest follows by just the same arguments as in 242Mb. Because $\{g \restriction X : g \in C_k\}$ and $\{h \restriction X : h \in S_0\}$ are linear subspaces of $\mathbb{R}^X$ included in $\mathcal{L}^1(\mu)$, their images $C_k^\#$ and $S_0^\#$ are linear subspaces of $L^1$. If $u \in L^1$ and $\epsilon > 0$, there are an $f \in \mathcal{L}^1$ such that $f^\bullet = u$, and $g \in C_k$, $h \in S_0$ such that $\int_X |f - g|$, $\int_X |f - h| \le \epsilon$; now $v = (g \restriction X)^\bullet \in C_k^\#$ and $w = (h \restriction X)^\bullet \in S_0^\#$ and

$\|u - v\|_1 = \int_X |f - g| \le \epsilon$, $\|u - w\|_1 = \int_X |f - h| \le \epsilon$.

As $u$ and $\epsilon$ are arbitrary, $C_k^\#$ and $S_0^\#$ are dense in $L^1$.

242P Complex $L^1$ As you would, I hope, expect, we can repeat the work above with $\mathcal{L}^1_{\mathbb{C}}$, the space of complex-valued integrable functions, in place of $\mathcal{L}^1$, to construct a complex Banach space $L^1_{\mathbb{C}}$. The required changes, based on the ideas of 241J, are minor.

(a) In 242Aa, it is perhaps helpful to remark that, for $f \in \mathcal{L}^0_{\mathbb{C}}$,

$f \in \mathcal{L}^1_{\mathbb{C}} \iff |f| \in \mathcal{L}^1 \iff \operatorname{Re}(f), \operatorname{Im}(f) \in \mathcal{L}^1$.

Consequently, for $u \in L^0_{\mathbb{C}}$,

$u \in L^1_{\mathbb{C}} \iff |u| \in L^1 \iff \operatorname{Re}(u), \operatorname{Im}(u) \in L^1$.

(b) To prove a complex version of 242E, observe that if $\langle f_n \rangle_{n \in \mathbb{N}}$ is a sequence in $\mathcal{L}^1_{\mathbb{C}}$ such that $\sum_{n=0}^\infty \int |f_n| < \infty$, then $\sum_{n=0}^\infty \int |\operatorname{Re}(f_n)|$ and $\sum_{n=0}^\infty \int |\operatorname{Im}(f_n)|$ are both finite, so we may apply 242E twice and see that

$\int (\sum_{n=0}^\infty f_n) = \int (\sum_{n=0}^\infty \operatorname{Re}(f_n)) + i \int (\sum_{n=0}^\infty \operatorname{Im}(f_n)) = \sum_{n=0}^\infty \int f_n$.

Accordingly we can prove that $L^1_{\mathbb{C}}$ is complete under $\|\ \|_1$ by the argument of 242F.

(c) Similarly, little change is needed to adapt 242J to give a description of a conditional expectation operator $P : L^1_{\mathbb{C}}(\mu) \to L^1_{\mathbb{C}}(\mu \restriction \mathrm{T})$ when $(X, \Sigma, \mu)$ is a probability space and $\mathrm{T}$ is a σ-subalgebra of $\Sigma$. In the formula

$|Pu| \le P|u|$

of 242Je, we need to know that

$|Pu| = \sup_{|\zeta| = 1} \operatorname{Re}(\zeta Pu)$

in $L^0(\mu \restriction \mathrm{T})$ (241Jc), while

$\operatorname{Re}(\zeta Pu) = \operatorname{Re}(P(\zeta u)) = P(\operatorname{Re}(\zeta u)) \le P|u|$

whenever $|\zeta| = 1$.

(d) In 242M, we need to replace $\mathbf{S}$ by $\mathbf{S}_{\mathbb{C}}$, the space of 'complex-valued simple functions' of the form $\sum_{k=0}^n a_k \chi E_k$ where each $a_k$ is a complex number and each $E_k$ is a measurable set of finite measure; then we get a dense linear subspace $S_{\mathbb{C}} = \{f^\bullet : f \in \mathbf{S}_{\mathbb{C}}\}$ of $L^1_{\mathbb{C}}$. In 242O, we must replace $C_k$ by $C_k(\mathbb{R}^r; \mathbb{C})$, the space of bounded continuous complex-valued functions of bounded support, and $S_0$ by the linear span over $\mathbb{C}$ of $\{\chi(I \cap X) : I$ is a bounded half-open interval$\}$.

242X Basic exercises

>(a) Let $X$ be a set, and let $\mu$ be counting measure on $X$. Show that $L^1(\mu)$ can be identified with the space $\ell^1(X)$ of absolutely summable real-valued functions on $X$ (see 226A). In particular, the space $\ell^1 = \ell^1(\mathbb{N})$ of absolutely summable real-valued sequences is an $L^1$ space. Write out proofs of 242F adapted to these special cases.

>(b) Let $(X, \Sigma, \mu)$ be any measure space, and $\hat\mu$ the completion of $\mu$ (212C, 241Xb). Show that $\mathcal{L}^1(\hat\mu) = \mathcal{L}^1(\mu)$ and $L^1(\hat\mu) = L^1(\mu)$.
(c) Show that any Banach lattice must be an Archimedean Riesz space (241Fa). (d) Let h(Xi , Σi , µi )ii∈I be anyQfamily of measure spaces, and (X, Σ, µ) their direct sum. Show that the isomorphism between L0 (µ) and i∈I L0 (µi ) (241Xg) induces an identification between L1 (µ) and Q P Q {u : u ∈ i∈I L1 (µi ), kuk = i∈I ku(i)k1 < ∞} ⊆ i∈I L1 (µi ). (e) Let (X, Σ, µ) be a probability space, and T a σsubalgebra of Σ, Υ a σsubalgebra of T. Let P1 : L1 (µ) → L1 (µ¹ T), P2 : L1 (µ¹ T) → L1 (µ¹ Υ) and P : L1 (µ) → L1 (µ¹ Υ) be the corresponding conditional expectation operators. Show that P = P2 P1 . (f ) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and φ : X → Y an inversemeasurepreserving function. Show that g 7→ gφ : L1 (ν) → L1 (µ) (235I) induces a linear operator T : L1 (ν) → L1 (µ) such that kT vk1 = kvk1 for every v ∈ L1 (ν). (g) Let U be a Riesz space (definition: 241E). A Riesz norm on U is a norm k k such that kuk ≤ kvk whenever u ≤ v. Show that if U is given its norm topology (2A4Bb) for such a norm, then (i) u 7→ u : U → U , (u, v) 7→ u ∨ v : U × U → U are continuous (ii) {u : u ≥ 0} is closed. (h) Show that if g : R r → R is continuous and has bounded support it is bounded. (Hint: 2A2F2A2G.) (i) Let µ be Lebesgue measure on R. (i) Show that if φδ (x) = exp(−
1 ) δ 2 −x2
for x < δ, 0 for x ≥ δ then Rx φ is smooth, that is, differentiable arbitrarily often. (ii) Show that if Fδ (x) = −∞ φδ dµ for x ∈ R then F is smooth. (iii) Show that if a < b < c < d in R there is a smooth function h such that χ[b, c] ≤ h ≤ χ[a, d]. (iv) Write D for the space of smooth functions h : R → R such that {x : h(x) 6= 0} is bounded. Show that {h• : h ∈ D} is dense in L1 (µ). (v) Let f be a realvalued function whichR is integrable over every bounded subset of R. Show that f × h is integrable for every h ∈ D, and that if f × h = 0 for every h ∈ D then f =a.e. 0. (Hint: 222D.) 242Y Further exercises (a) Let (X, Σ, µ) be a measure space. Let A ⊆ L1 = L1 (µ) be a nonempty downwardsdirected set, and suppose that inf A = 0 in L1 . (i) Show that inf u∈A kuk1 = 0. (Hint: set γ = inf u∈A kuk1 ; find a nonincreasing sequence hun in∈N in A such that limn→∞ kun k1 = γ; set v = inf n∈N un and show that u ∧ v = v for every u ∈ A, so that v = 0.) (ii) Show that if U is any open set containing 0, there is a u ∈ A such that v ∈ U whenever 0 ≤ v ≤ u. (b) Let (X, Σ, µ) be a measure space, and A ⊆ L1 = L1 (µ) a nonempty upwardsdirected set. Suppose that A is bounded for the norm k k1 . (i) Show that there is a nondecreasing sequence hun in∈N in A such R R that limn→∞ un = supu∈A u, and that hun in∈N is Cauchy. (ii) Show that w = sup A is defined in L1 and belongs to the normclosure of A in L1 , so that, in particular, kwk1 ≤ supu∈A kuk1 . (c) The norm on a Banach lattice U is ordercontinuous if inf u∈A kuk = 0 whenever A ⊆ U is a nonempty downwardsdirected set with infimum 0. (Thus 242Ya tells us that the norms k k1 are all ordercontinuous.) Show that in this case (i) any nondecreasing sequence in U which has an upper bound in U must be Cauchy (ii) U is Dedekind complete. 
(Hint for (i): if $\langle u_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence with an upper bound in $U$, let $B$ be the set of upper bounds of $\{u_n:n\in\mathbb{N}\}$ and show that $A=\{v-u_n:v\in B,\ n\in\mathbb{N}\}$ has infimum $0$ because $U$ is Archimedean.)
(d) Let $(X,\Sigma,\mu)$ be any measure space. Show that $L^1(\mu)$ has the countable sup property (241Yd).
(e) More generally, show that any Banach lattice with an order-continuous norm has the countable sup property.
(f) Let $(X,\Sigma,\mu)$ be a measure space and $Y$ any subset of $X$; let $\mu_Y$ be the subspace measure on $Y$ and $T:L^0(\mu)\to L^0(\mu_Y)$ the canonical map described in 241Ye. (i) Show that $Tu\in L^1(\mu_Y)$ and $\|Tu\|_1\le\|u\|_1$ for every $u\in L^1(\mu)$. (ii) Show that if $u\in L^1(\mu)$ then $\|Tu\|_1=\|u\|_1$ iff $\int_E u=\int_{Y\cap E}Tu$ for every $E\in\Sigma$. (iii) Show that $T$ is surjective and that $\|v\|_1=\min\{\|u\|_1:u\in L^1(\mu),\ Tu=v\}$ for every $v\in L^1(\mu_Y)$. (Hint: 214Eb.) (See also 244Yc below.)
(g) Let $(X,\Sigma,\mu)$ be a measure space. Write $\mathcal{L}^1_{\text{strict}}$ for the space of all integrable $\Sigma$-measurable functions from $X$ to $\mathbb{R}$, and $\mathbf{N}$ for the subspace of $\mathcal{L}^1_{\text{strict}}$ consisting of measurable functions which are zero almost everywhere. (i) Show that $\mathcal{L}^1_{\text{strict}}$ is a Dedekind σ-complete Riesz space. (ii) Show that $L^1(\mu)$ can be identified, as ordered linear space, with the quotient $\mathcal{L}^1_{\text{strict}}/\mathbf{N}$ as defined in 241Yg. (iii) Show that $\|\ \|_1$ is a seminorm on $\mathcal{L}^1_{\text{strict}}$ (definition: 2A5D). (iv) Show that $f\mapsto|f|:\mathcal{L}^1_{\text{strict}}\to\mathcal{L}^1_{\text{strict}}$ is continuous if $\mathcal{L}^1_{\text{strict}}$ is given the topology defined from $\|\ \|_1$ (2A5B). (v) Show that $\{f:f\ge_{\text{a.e.}}0\}$ is closed in $\mathcal{L}^1_{\text{strict}}$, but that $\{f:f\ge 0\}$ need not be.
(h) Let $(X,\Sigma,\mu)$ be a measure space, and $\tilde\mu$ the c.l.d. version of $\mu$ (213E). Show that the inclusion $\mathcal{L}^1(\mu)\subseteq\mathcal{L}^1(\tilde\mu)$ induces an isomorphism, as ordered normed linear spaces, between $L^1(\tilde\mu)$ and $L^1(\mu)$.
(i) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces and $U\subseteq L^0(\mu)$ a linear subspace. Let $T:U\to L^0(\nu)$ be a linear operator such that $Tu\ge 0$ in $L^0(\nu)$ whenever $u\in U$ and $u\ge 0$ in $L^0(\mu)$. Suppose that $w\in U$ is such that $w\ge 0$ and $Tw=(\chi Y)^\bullet$. Show that whenever $\phi:\mathbb{R}\to\mathbb{R}$ is a convex function and $u\in L^0(\mu)$ is such that $w\times u$ and $w\times\bar\phi(u)\in U$, defining $\bar\phi:L^0(\mu)\to L^0(\mu)$ as in 241I, then $\bar\phi T(w\times u)\le T(w\times\bar\phi u)$. Explain how this result may be regarded as a common generalization of Jensen's inequality, as stated in 233I, and 242K above. See also 244M below.
(j) (i) A function $\phi:\mathbb{C}\to\mathbb{R}$ is convex if $\phi(ab+(1-a)c)\le a\phi(b)+(1-a)\phi(c)$ for all $b$, $c\in\mathbb{C}$ and $a\in[0,1]$. (ii) Show that such a function must be bounded on any bounded subset of $\mathbb{C}$. (iii) If $\phi:\mathbb{C}\to\mathbb{R}$ is convex and $c\in\mathbb{C}$, show that there is a $b\in\mathbb{C}$ such that $\phi(x)\ge\phi(c)+\mathrm{Re}(b(x-c))$ for every $x\in\mathbb{C}$. (iv) If $\langle b_c\rangle_{c\in\mathbb{C}}$ is such that $\phi(x)\ge\phi_c(x)=\phi(c)+\mathrm{Re}(b_c(x-c))$ for all $x$, $c\in\mathbb{C}$, show that $\{b_c:c\in I\}$ is bounded for any bounded $I\subseteq\mathbb{C}$. (v) Show that if $D\subseteq\mathbb{C}$ is any dense set, $\phi(x)=\sup_{c\in D}\phi_c(x)$ for every $x\in\mathbb{C}$.
(k) Let $(X,\Sigma,\mu)$ be a probability space and $\mathrm{T}$ a σ-subalgebra of $\Sigma$. Let $P:L^1_{\mathbb{C}}(\mu)\to L^1_{\mathbb{C}}(\mu\restriction\mathrm{T})$ be the conditional expectation operator. Show that if $\phi:\mathbb{C}\to\mathbb{R}$ is any convex function, and we define $\bar\phi(f^\bullet)=(\phi f)^\bullet$ for every $f\in\mathcal{L}^0_{\mathbb{C}}(\mu)$, then $\bar\phi(Pu)\le P(\bar\phi(u))$ whenever $u\in L^1_{\mathbb{C}}(\mu)$ is such that $\bar\phi(u)\in L^1(\mu)$.
(l) Let $(X,\Sigma,\mu)$ be a measure space and $u_0,\ldots,u_n\in L^1(\mu)$. (i) Suppose $k_0,\ldots,k_n\in\mathbb{Z}$ are such that $\sum_{i=0}^n k_i=1$. Show that $\sum_{i=0}^n\sum_{j=0}^n k_ik_j\|u_i-u_j\|_1\le 0$. (Hint: $\sum_{i=0}^n\sum_{j=0}^n k_ik_j|\alpha_i-\alpha_j|\le 0$ for all $\alpha_0,\ldots,\alpha_n\in\mathbb{R}$.) (ii) Suppose $\gamma_0,\ldots,\gamma_n\in\mathbb{R}$ are such that $\sum_{i=0}^n\gamma_i=0$. Show that $\sum_{i=0}^n\sum_{j=0}^n\gamma_i\gamma_j\|u_i-u_j\|_1\le 0$.
242 Notes and comments Of course $L^1$ spaces compose one of the most important classes of Riesz space, and accordingly their properties have great prominence in the general theory; 242Xc, 242Xg and 242Ya-242Ye outline some of the interrelations between these properties. I will return to these questions in Chapter 35 in the next volume. I have mentioned in passing (242Dd) the additivity of the norm of $L^1$ on the positive elements. This elementary fact actually characterizes $L^1$ spaces among Banach lattices (Kakutani 41); see 369E in the next volume.
Just as $L^0(\mu)$ can be regarded as a quotient of a linear space $\mathcal{L}^0_{\text{strict}}$, so can $L^1(\mu)$ be regarded as a quotient of a linear space $\mathcal{L}^1_{\text{strict}}$ (242Yg). I have discussed this question in the notes to §241; all I try to do here is to be consistent.
We now have a language in which we can speak of 'the' conditional expectation of a function $f$, the equivalence class in $L^1(\mu\restriction\mathrm{T})$ consisting precisely of all the conditional expectations of $f$ on $\mathrm{T}$. If we think of $L^1(\mu\restriction\mathrm{T})$ as identified with its image in $L^1(\mu)$, then the conditional expectation operator $P:L^1(\mu)\to L^1(\mu\restriction\mathrm{T})$ becomes a projection (242Jh). We therefore have re-statements of 233J-233K, as in 242K, 242L and 242Yi.
I give 242O in a fairly general form; but its importance already appears if we take $X$ to be $[0,1]$ with one-dimensional Lebesgue measure. In this case, we have a natural norm on $C([0,1])$, the space of all continuous real-valued functions on $[0,1]$, given by setting
$$\|f\|_1=\int_0^1|f(x)|\,dx$$
for every $f\in C([0,1])$. The integral here can, of course, be taken to be the Riemann integral; we do not need the Lebesgue theory to show that $\|\ \|_1$ is a norm on $C([0,1])$. It is easy to check that $C([0,1])$ is not complete for this norm (if we set $f_n(x)=\min(1,2^nx^n)$ for $x\in[0,1]$, then $\langle f_n\rangle_{n\in\mathbb{N}}$ is a $\|\ \|_1$-Cauchy sequence
with no $\|\ \|_1$-limit in $C([0,1])$). We can use the abstract theory of normed spaces to construct a completion of $C([0,1])$; but it is much more satisfactory if this completion can be given a relatively concrete form, and this is what the identification of $L^1$ with the completion of $C([0,1])$ can do. (Note that the remark that $\|\ \|_1$ is a norm on $C([0,1])$, that is, that $\|f\|_1\ne 0$ for every non-zero $f\in C([0,1])$, means just that the map $f\mapsto f^\bullet:C([0,1])\to L^1$ is injective, so that $C([0,1])$ can be identified, as ordered normed space, with its image in $L^1$.) It would be even better if we could find a realization of the completion of $C([0,1])$ as a space of functions on some set $Z$, rather than as a space of equivalence classes of functions on $[0,1]$. Unfortunately this is not practical; such realizations do exist, but necessarily involve either a thoroughly unfamiliar base set $Z$, or an intolerably arbitrary embedding map from $C([0,1])$ into $\mathbb{R}^Z$.
You can get an idea of the obstacle to realizing the completion of $C([0,1])$ as a space of functions on $[0,1]$ itself by considering $f_n(x)=\frac1n x^n$ for $n\ge 1$. An easy calculation shows that $\sum_{n=1}^\infty\|f_n\|_1<\infty$, so that $\sum_{n=1}^\infty f_n$ must exist in the completion of $C([0,1])$; but there is no natural value to assign to it at the point $1$. Adaptations of this idea can give rise to indefinitely complicated phenomena – indeed, 242O shows that every integrable function is associated with some appropriate sequence from $C([0,1])$. In §245 I shall have more to say about what $\|\ \|_1$-convergent sequences look like.
From the point of view of measure theory, narrowly conceived, most of the interesting ideas appear most clearly with real functions and real linear spaces. But some of the most important applications of measure theory – important not only as mathematics in general, but also for the measure-theoretic questions they inspire – deal with complex functions and complex linear spaces.
I therefore continue to offer sketches of the complex theory, as in 242P. I note that at irregular intervals we need ideas not already spelt out in the real theory, as in 242Pb and 242Yk.
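The sequence $f_n(x)=\min(1,2^nx^n)$ mentioned in the notes above can be examined numerically. What follows is a minimal sketch in Python, using only the standard library; the function names are illustrative and not part of the text. It relies on the closed form $\int_0^1 f_n=\frac12+\frac{1}{2(n+1)}$, which holds because $f_n(x)=(2x)^n$ on $[0,\frac12]$ and $f_n(x)=1$ on $[\frac12,1]$, and compares it with a midpoint Riemann sum.

```python
from fractions import Fraction


def f(n, x):
    """f_n(x) = min(1, 2^n x^n) for x in [0, 1]."""
    return min(1.0, (2.0 * x) ** n)


def integral_exact(n):
    """Exact integral of f_n over [0, 1].

    The piece on [0, 1/2] contributes 1/(2(n+1)); the piece on
    [1/2, 1], where f_n = 1, contributes 1/2.
    """
    return Fraction(1, 2 * (n + 1)) + Fraction(1, 2)


def dist1(m, n):
    """||f_m - f_n||_1.

    For m <= n we have f_n <= f_m pointwise on [0, 1], so the norm of
    the difference is the difference of the two integrals.
    """
    lo, hi = min(m, n), max(m, n)
    return integral_exact(lo) - integral_exact(hi)


def integral_midpoint(n, steps=10_000):
    """Midpoint Riemann sum for the integral of f_n (sanity check only)."""
    h = 1.0 / steps
    return h * sum(f(n, (k + 0.5) * h) for k in range(steps))


if __name__ == "__main__":
    for m, n in [(1, 2), (10, 20), (100, 200)]:
        print(m, n, dist1(m, n))
```

Here `dist1(m, n)` equals $\frac{1}{2(m+1)}-\frac{1}{2(n+1)}$ for $m\le n$, which tends to $0$, while the pointwise limit of $f_n$ is $0$ on $[0,\frac12[$ and $1$ on $[\frac12,1]$, which is not continuous.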
243 $L^\infty$
The second of the classical Banach spaces of measure theory which I treat is the space $L^\infty$. As will appear below, $L^\infty$ is the polar companion of $L^1$, the linked opposite; for 'ordinary' measure spaces it is actually the dual of $L^1$ (243F-243G).
243A Definitions Let $(X,\Sigma,\mu)$ be any measure space. Let $\mathcal{L}^\infty=\mathcal{L}^\infty(\mu)$ be the set of functions $f\in\mathcal{L}^0=\mathcal{L}^0(\mu)$ which are essentially bounded, that is, such that there is some $M\ge 0$ such that $\{x:x\in\operatorname{dom}f,\ |f(x)|\le M\}$ is conegligible, and write
$$L^\infty=L^\infty(\mu)=\{f^\bullet:f\in\mathcal{L}^\infty(\mu)\}\subseteq L^0(\mu).$$
Note that if $f\in\mathcal{L}^\infty$, $g\in\mathcal{L}^0$ and $g=f$ a.e., then $g\in\mathcal{L}^\infty$; thus $\mathcal{L}^\infty=\{f:f\in\mathcal{L}^0,\ f^\bullet\in L^\infty\}$.
243B Theorem Let $(X,\Sigma,\mu)$ be any measure space. Then
(a) $L^\infty=L^\infty(\mu)$ is a linear subspace of $L^0=L^0(\mu)$.
(b) If $u\in L^\infty$, $v\in L^0$ and $|v|\le|u|$ then $v\in L^\infty$. Consequently $|u|$, $u\vee v$, $u\wedge v$, $u^+=u\vee 0$ and $u^-=(-u)\vee 0$ belong to $L^\infty$ for all $u$, $v\in L^\infty$.
(c) Writing $e=\mathbf{1}^\bullet$, the equivalence class in $L^0$ of the constant function with value $1$, then an element $u$ of $L^0$ belongs to $L^\infty$ iff there is an $M\ge 0$ such that $|u|\le Me$.
(d) If $u$, $v\in L^\infty$ then $u\times v\in L^\infty$.
(e) If $u\in L^\infty$, $v\in L^1=L^1(\mu)$ then $u\times v\in L^1$.
proof (a) If $f$, $g\in\mathcal{L}^\infty=\mathcal{L}^\infty(\mu)$ and $c\in\mathbb{R}$, then $f+g$, $cf\in\mathcal{L}^\infty$. $\mathbf{P}$ We have $M_1$, $M_2\ge 0$ such that $|f|\le M_1$ a.e. and $|g|\le M_2$ a.e. Now
$$|f+g|\le|f|+|g|\le M_1+M_2\ \text{a.e.},\qquad |cf|\le|c|M_1\ \text{a.e.},$$
so $f+g$, $cf\in\mathcal{L}^\infty$. $\mathbf{Q}$ It follows at once that $u+v$, $cu\in L^\infty$ whenever $u$, $v\in L^\infty$ and $c\in\mathbb{R}$.
(b)(i) Take $f\in\mathcal{L}^\infty$, $g\in\mathcal{L}^0=\mathcal{L}^0(\mu)$ such that $u=f^\bullet$, $v=g^\bullet$. Then $|g|\le|f|$ a.e. Let $M\ge 0$ be such that $|f|\le M$ a.e.; then $|g|\le M$ a.e., so $g\in\mathcal{L}^\infty$ and $v\in L^\infty$.
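(ii) Now $|\,|u|\,|=|u|$ so $|u|\in L^\infty$ whenever $u\in L^\infty$. Also $u\vee v=\frac12(u+v+|u-v|)$, $u\wedge v=\frac12(u+v-|u-v|)$ belong to $L^\infty$ for all $u$, $v\in L^\infty$.
(c)(i) If $u\in L^\infty$, take $f\in\mathcal{L}^\infty$ such that $f^\bullet=u$. Then there is an $M\ge 0$ such that $|f|\le M$ a.e., so that $|f|\le M\mathbf{1}$ a.e. and $|u|\le Me$. (ii) Of course $\mathbf{1}\in\mathcal{L}^\infty$, so $e\in L^\infty$, and if $u\in L^0$ and $|u|\le Me$ then $u\in L^\infty$ by (b).
(d) $f\times g\in\mathcal{L}^\infty$ whenever $f$, $g\in\mathcal{L}^\infty$. $\mathbf{P}$ If $|f|\le M_1$ a.e. and $|g|\le M_2$ a.e., then $|f\times g|=|f|\times|g|\le M_1M_2$ a.e. $\mathbf{Q}$ So $u\times v\in L^\infty$ for all $u$, $v\in L^\infty$.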
(e) If $f\in\mathcal{L}^\infty$ and $g\in\mathcal{L}^1=\mathcal{L}^1(\mu)$, then there is an $M\ge 0$ such that $|f|\le M$ a.e., so $|f\times g|\le M|g|$ almost everywhere; because $M|g|$ is integrable and $f\times g$ is virtually measurable, $f\times g$ is integrable and $u\times v\in L^1$.
243C The order structure of $L^\infty$ Let $(X,\Sigma,\mu)$ be any measure space. Then $L^\infty=L^\infty(\mu)$, being a linear subspace of $L^0=L^0(\mu)$, inherits a partial order which renders it a partially ordered linear space (compare 242Ca). Because $|u|\in L^\infty$ whenever $u\in L^\infty$ (243Bb), $u\wedge v$ and $u\vee v$ belong to $L^\infty$ whenever $u$, $v\in L^\infty$, and $L^\infty$ is a Riesz space (compare 242Cd).
The behaviour of $L^\infty$ as a Riesz space is dominated by the fact that it has an order unit $e=\mathbf{1}^\bullet$ with the property that for every $u\in L^\infty$ there is an $M\ge 0$ such that $|u|\le Me$ (243Bc).
243D The norm of $L^\infty$ Let $(X,\Sigma,\mu)$ be any measure space.
(a) For $f\in\mathcal{L}^\infty=\mathcal{L}^\infty(\mu)$, say that the essential supremum of $|f|$ is
$$\operatorname{ess\,sup}|f|=\inf\{M:M\ge 0,\ \{x:x\in\operatorname{dom}f,\ |f(x)|\le M\}\text{ is conegligible}\}.$$
Then $|f|\le\operatorname{ess\,sup}|f|$ a.e. $\mathbf{P}$ Set $M=\operatorname{ess\,sup}|f|$. For each $n\in\mathbb{N}$, there is an $M_n\le M+2^{-n}$ such that $|f|\le M_n$ a.e. Now
$$\{x:|f(x)|\le M\}=\bigcap_{n\in\mathbb{N}}\{x:|f(x)|\le M_n\}$$
is conegligible, so $|f|\le M$ a.e. $\mathbf{Q}$
(b) If $f$, $g\in\mathcal{L}^\infty$ and $f=g$ a.e., then $\operatorname{ess\,sup}|f|=\operatorname{ess\,sup}|g|$. Accordingly we may define a functional $\|\ \|_\infty$ on $L^\infty=L^\infty(\mu)$ by setting $\|u\|_\infty=\operatorname{ess\,sup}|f|$ whenever $u=f^\bullet$.
(c) From (a), we see that, for any $u\in L^\infty$, $\|u\|_\infty=\min\{\gamma:|u|\le\gamma e\}$, where, as before, $e=\mathbf{1}^\bullet\in L^\infty$. Consequently $\|\ \|_\infty$ is a norm on $L^\infty$. $\mathbf{P}$ (i) If $u$, $v\in L^\infty$ then
$$|u+v|\le|u|+|v|\le(\|u\|_\infty+\|v\|_\infty)e$$
so $\|u+v\|_\infty\le\|u\|_\infty+\|v\|_\infty$. (ii) If $u\in L^\infty$ and $c\in\mathbb{R}$ then
$$|cu|=|c||u|\le|c|\|u\|_\infty e,$$
so $\|cu\|_\infty\le|c|\|u\|_\infty$. (iii) If $\|u\|_\infty=0$, there is an $f\in\mathcal{L}^\infty$ such that $f^\bullet=u$ and $|f|\le\|u\|_\infty$ a.e.; now $f=0$ a.e. so $u=0$. $\mathbf{Q}$
(d) Note also that if $u\in L^0$, $v\in L^\infty$ and $|u|\le|v|$ then $|u|\le\|v\|_\infty e$ so $u\in L^\infty$ and $\|u\|_\infty\le\|v\|_\infty$; similarly,
$$\|u\times v\|_\infty\le\|u\|_\infty\|v\|_\infty,\qquad\|u\vee v\|_\infty\le\max(\|u\|_\infty,\|v\|_\infty)$$
for all $u$, $v\in L^\infty$. Thus $L^\infty$ is a commutative Banach algebra (2A4J).
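The difference between a supremum and an essential supremum in 243Da is easiest to see on a finite set carrying point masses, some of which are zero. The following minimal Python sketch makes that assumption (a finite set $\{0,\ldots,n-1\}$ with $\mu\{i\}$ given by a list of masses); the names `ess_sup_abs` and `sup_abs` are illustrative only.

```python
def ess_sup_abs(values, masses):
    """ess sup |f| on a finite set X = {0, ..., n-1} with mu{i} = masses[i].

    {x : |f(x)| > M} is negligible exactly when every point of positive
    mass satisfies |f(x)| <= M, so the infimum of such M is the maximum
    of |f| over the points of positive mass (0 if X itself is negligible).
    """
    return max((abs(y) for y, m in zip(values, masses) if m > 0), default=0)


def sup_abs(values):
    """The literal supremum of |f| over all points, null or not."""
    return max((abs(y) for y in values), default=0)


if __name__ == "__main__":
    f = [1, -100, 3]
    mu = [0.5, 0.0, 0.5]          # the middle point is negligible
    print(sup_abs(f), ess_sup_abs(f, mu))
```

In this example the value $-100$ sits on a negligible point, so it affects the supremum but not the essential supremum.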
(e) Moreover,
$$\Bigl|\int u\times v\Bigr|\le\int|u\times v|=\|u\times v\|_1\le\|u\|_1\|v\|_\infty$$
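whenever $u\in L^1$ and $v\in L^\infty$, because
$$|u\times v|=|u|\times|v|\le|u|\times\|v\|_\infty e=\|v\|_\infty|u|.$$
(f) Observe that if $u$, $v$ are non-negative members of $L^\infty$ then
$$\|u\vee v\|_\infty=\max(\|u\|_\infty,\|v\|_\infty);$$
this is because, for any $\gamma\ge 0$,
$$u\vee v\le\gamma e\iff u\le\gamma e\text{ and }v\le\gamma e.$$
243E Theorem For any measure space $(X,\Sigma,\mu)$, $L^\infty=L^\infty(\mu)$ is a Banach lattice under $\|\ \|_\infty$.
proof (a) We already know that $\|u\|_\infty\le\|v\|_\infty$ whenever $|u|\le|v|$ (243Dd); so we have just to check that $L^\infty$ is complete under $\|\ \|_\infty$. Let $\langle u_n\rangle_{n\in\mathbb{N}}$ be a Cauchy sequence in $L^\infty$. For each $n\in\mathbb{N}$ choose $f_n\in\mathcal{L}^\infty=\mathcal{L}^\infty(\mu)$ such that $f_n^\bullet=u_n$ in $L^\infty$. For all $m$, $n\in\mathbb{N}$, $(f_m-f_n)^\bullet=u_m-u_n$. Consequently
$$E_{mn}=\{x:|f_m(x)-f_n(x)|>\|u_m-u_n\|_\infty\}$$
is negligible, by 243Da. This means that
$$E=\bigcap_{n\in\mathbb{N}}\{x:x\in\operatorname{dom}f_n,\ |f_n(x)|\le\|u_n\|_\infty\}\setminus\bigcup_{m,n\in\mathbb{N}}E_{mn}$$
is conegligible. But for every $x\in E$, $|f_m(x)-f_n(x)|\le\|u_m-u_n\|_\infty$ for all $m$, $n\in\mathbb{N}$, so that $\langle f_n(x)\rangle_{n\in\mathbb{N}}$ is a Cauchy sequence, with a limit in $\mathbb{R}$. Thus $f=\lim_{n\to\infty}f_n$ is defined almost everywhere. Also, at least for $x\in E$,
$$|f(x)|\le\sup_{n\in\mathbb{N}}\|u_n\|_\infty<\infty,$$
so $f\in\mathcal{L}^\infty$ and $u=f^\bullet\in L^\infty$. If $m\in\mathbb{N}$, then, for every $x\in E$,
$$|f(x)-f_m(x)|\le\sup_{n\ge m}|f_n(x)-f_m(x)|\le\sup_{n\ge m}\|u_n-u_m\|_\infty,$$
so
$$\|u-u_m\|_\infty\le\sup_{n\ge m}\|u_n-u_m\|_\infty\to 0$$
as $m\to\infty$, and $u=\lim_{m\to\infty}u_m$ in $L^\infty$. As $\langle u_n\rangle_{n\in\mathbb{N}}$ is arbitrary, $L^\infty$ is complete.
243F The duality between $L^\infty$ and $L^1$ Let $(X,\Sigma,\mu)$ be any measure space.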
(a) I have already remarked that if $u\in L^1=L^1(\mu)$ and $v\in L^\infty=L^\infty(\mu)$, then $u\times v\in L^1$ and $|\int u\times v|\le\|u\|_1\|v\|_\infty$ (243Be, 243De).
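The inequality in (a) can be checked by hand on a finite measure space. The sketch below is a minimal Python illustration under that assumption: $X=\{0,\ldots,n-1\}$ with point masses `masses[i]`, functions given as lists of values, and helper names (`norm1`, `norm_inf`, `pairing`, `sign`) that are hypothetical, chosen only for this example.

```python
from fractions import Fraction


def norm1(u, masses):
    """||u||_1 = integral of |u| with respect to the point masses."""
    return sum(abs(a) * m for a, m in zip(u, masses))


def norm_inf(v, masses):
    """||v||_inf = ess sup |v|: the maximum of |v| over points of positive mass."""
    return max((abs(b) for b, m in zip(v, masses) if m > 0), default=0)


def pairing(u, v, masses):
    """The duality pairing: the integral of u x v."""
    return sum(a * b * m for a, b, m in zip(u, v, masses))


def sign(a):
    """Sign of a real number: -1, 0 or 1."""
    return (a > 0) - (a < 0)


if __name__ == "__main__":
    masses = [Fraction(1), Fraction(2), Fraction(0)]
    u = [3, -4, 2]
    v = [sign(a) for a in u]              # |v| <= 1 everywhere
    print(pairing(u, v, masses), norm1(u, masses), norm_inf(v, masses))
```

Taking $v$ to be the sign of $u$ gives $\int u\times v=\|u\|_1$ with $\|v\|_\infty\le 1$, so on this toy space the bound $|\int u\times v|\le\|u\|_1\|v\|_\infty$ is attained.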
(b) Consequently we have a bounded linear operator $T$ from $L^\infty$ to the normed space dual $(L^1)^*$ of $L^1$, given by writing
$$(Tv)(u)=\int u\times v\quad\text{for all }u\in L^1,\ v\in L^\infty.$$
$\mathbf{P}$ (i) By (a), $(Tv)(u)$ is well-defined for $u\in L^1$, $v\in L^\infty$. (ii) If $v\in L^\infty$, $u$, $u_1$, $u_2\in L^1$ and $c\in\mathbb{R}$, then
$$(Tv)(u_1+u_2)=\int(u_1+u_2)\times v=\int(u_1\times v)+(u_2\times v)=\int u_1\times v+\int u_2\times v=(Tv)(u_1)+(Tv)(u_2),$$
$$(Tv)(cu)=\int cu\times v=\int c(u\times v)=c\int u\times v=c(Tv)(u).$$
This shows that $Tv:L^1\to\mathbb{R}$ is a linear functional for each $v\in L^\infty$. (iii) Next, for any $u\in L^1$ and $v\in L^\infty$,
$$|(Tv)(u)|=\Bigl|\int u\times v\Bigr|\le\|u\times v\|_1\le\|u\|_1\|v\|_\infty,$$
as remarked in (a). This means that $Tv\in(L^1)^*$ and $\|Tv\|\le\|v\|_\infty$ for every $v\in L^\infty$. (iv) If $v$, $v_1$, $v_2\in L^\infty$, $u\in L^1$ and $c\in\mathbb{R}$, then
$$T(v_1+v_2)(u)=\int(v_1+v_2)\times u=\int(v_1\times u)+(v_2\times u)=\int v_1\times u+\int v_2\times u=(Tv_1)(u)+(Tv_2)(u)=(Tv_1+Tv_2)(u),$$
$$T(cv)(u)=\int cv\times u=c\int v\times u=c(Tv)(u)=(cTv)(u).$$
As $u$ is arbitrary, $T(v_1+v_2)=Tv_1+Tv_2$ and $T(cv)=c(Tv)$; thus $T:L^\infty\to(L^1)^*$ is linear. (v) Recalling from (iii) that $\|Tv\|\le\|v\|_\infty$ for every $v\in L^\infty$, we see that $\|T\|\le 1$. $\mathbf{Q}$
(c) Exactly the same arguments show that we have a linear operator $T':L^1\to(L^\infty)^*$, given by writing
$$(T'u)(v)=\int u\times v\quad\text{for all }u\in L^1,\ v\in L^\infty,$$
and that $\|T'\|$ is also at most $1$.
243G Theorem Let $(X,\Sigma,\mu)$ be a measure space, and $T:L^\infty(\mu)\to(L^1(\mu))^*$ the canonical map described in 243F. Then
(a) $T$ is injective iff $(X,\Sigma,\mu)$ is semi-finite, and in this case is norm-preserving;
(b) $T$ is bijective iff $(X,\Sigma,\mu)$ is localizable, and in this case is a normed space isomorphism.
proof (a)(i) Suppose that $T$ is injective, and that $E\in\Sigma$ has $\mu E=\infty$. Then $\chi E$ is not equal a.e. to $0$, so $(\chi E)^\bullet\ne 0$ in $L^\infty$, and $T(\chi E)^\bullet\ne 0$; let $u\in L^1$ be such that $T(\chi E)^\bullet(u)\ne 0$, that is, $\int u\times(\chi E)^\bullet\ne 0$. Express $u$ as $f^\bullet$ where $f$ is integrable; then $\int_E f\ne 0$ so $\int_E|f|\ne 0$. Let $g$ be a simple function such that $0\le g\le_{\text{a.e.}}|f|$ and $\int g>\int|f|-\int_E|f|$; then $\int_E g\ne 0$. Express $g$ as $\sum_{i=0}^n a_i\chi E_i$ where $\mu E_i<\infty$ for each $i$; then $0\ne\sum_{i=0}^n a_i\mu(E_i\cap E)$, so there is an $i\le n$ such that $\mu(E\cap E_i)\ne 0$, and now $E\cap E_i$ is a measurable subset of $E$ of non-zero finite measure.
As $E$ is arbitrary, this shows that $(X,\Sigma,\mu)$ must be semi-finite if $T$ is injective.
(ii) Now suppose that $(X,\Sigma,\mu)$ is semi-finite, and that $v\in L^\infty$ is non-zero. Express $v$ as $g^\bullet$ where $g:X\to\mathbb{R}$ is measurable; then $g\in\mathcal{L}^\infty$. Take any $a\in\,]0,\|v\|_\infty[$; then $E=\{x:|g(x)|\ge a\}$ has non-zero measure. Let $F\subseteq E$ be a measurable set of non-zero finite measure, and set $f(x)=|g(x)|/g(x)$ if $x\in F$, $0$ otherwise; then $f\in\mathcal{L}^1$ and $(f\times g)(x)\ge a$ for $x\in F$, so, setting $u=f^\bullet\in L^1$, we have
$$(Tv)(u)=\int u\times v=\int f\times g\ge a\mu F=a\int|f|=a\|u\|_1>0.$$
This shows that $\|Tv\|\ge a$; as $a$ is arbitrary, $\|Tv\|\ge\|v\|_\infty$.
We know already from 243F that $\|Tv\|\le\|v\|_\infty$, so $\|Tv\|=\|v\|_\infty$ for every non-zero $v\in L^\infty$; the same is surely true for $v=0$, so $T$ is norm-preserving and injective.
(b)(i) Using (a) and the definition of 'localizable', we see that under either of the conditions proposed $(X,\Sigma,\mu)$ is semi-finite and $T$ is injective and norm-preserving. I therefore have to show just that it is surjective iff $(X,\Sigma,\mu)$ is localizable.
(ii) Suppose that $T$ is surjective and that $\mathcal{E}\subseteq\Sigma$. Let $\mathcal{F}$ be the family of finite unions of members of $\mathcal{E}$, counting $\emptyset$ as the union of no members of $\mathcal{E}$, so that $\mathcal{F}$ is closed under finite unions and, for any $G\in\Sigma$, $E\setminus G$ is negligible for every $E\in\mathcal{E}$ iff $E\setminus G$ is negligible for every $E\in\mathcal{F}$.
If $u\in L^1$, then $h(u)=\lim_{E\in\mathcal{F},E\uparrow}\int_E u$ exists in $\mathbb{R}$. $\mathbf{P}$ If $u$ is non-negative, then
$$h(u)=\sup\{\textstyle\int_E u:E\in\mathcal{F}\}\le\int u<\infty.$$
For other $u$, we can express $u$ as $u_1-u_2$, where $u_1$ and $u_2$ are non-negative, and now $h(u)=h(u_1)-h(u_2)$. $\mathbf{Q}$
Evidently $h:L^1\to\mathbb{R}$ is linear, being a limit of the linear functionals $u\mapsto\int_E u$, and also
$$|h(u)|\le\sup_{E\in\mathcal{F}}\Bigl|\int_E u\Bigr|\le\int|u|$$
for every $u$, so $h\in(L^1)^*$. Since we are supposing that $T$ is surjective, there is a $v\in L^\infty$ such that $Tv=h$. Express $v$ as $g^\bullet$ where $g:X\to\mathbb{R}$ is measurable and essentially bounded. Set $G=\{x:g(x)>0\}\in\Sigma$.
If $F\in\Sigma$ and $\mu F<\infty$, then
$$\int_F g=\int(\chi F)^\bullet\times g^\bullet=(Tv)(\chi F)^\bullet=h(\chi F)^\bullet=\sup_{E\in\mathcal{F}}\mu(E\cap F).$$
$\mathbf{?}$ If $E\in\mathcal{E}$ and $E\setminus G$ is not negligible, then there is a set $F\subseteq E\setminus G$ such that $0<\mu F<\infty$; now
$$\mu F=\mu(E\cap F)\le\int_F g\le 0,$$
as $g(x)\le 0$ for $x\in F$. $\mathbf{X}$ Thus $E\setminus G$ is negligible for every $E\in\mathcal{E}$.
Let $H\in\Sigma$ be such that $E\setminus H$ is negligible for every $E\in\mathcal{E}$. $\mathbf{?}$ If $G\setminus H$ is not negligible, there is a set $F\subseteq G\setminus H$ of non-zero finite measure. Now
$$\mu(E\cap F)\le\mu(H\cap F)=0$$
for every $E\in\mathcal{E}$, so $\mu(E\cap F)=0$ for every $E\in\mathcal{F}$, and $\int_F g=0$; but $g(x)>0$ for every $x\in F$, so $\mu F=0$, which is impossible. $\mathbf{X}$ Thus $G\setminus H$ is negligible.
Accordingly $G$ is an essential supremum of $\mathcal{E}$ in $\Sigma$. As $\mathcal{E}$ is arbitrary, $(X,\Sigma,\mu)$ is localizable.
(iii) For the rest of this proof, I will suppose that $(X,\Sigma,\mu)$ is localizable and seek to show that $T$ is surjective.
Take $h\in(L^1)^*$ such that $\|h\|=1$. Write $\Sigma^f=\{F:F\in\Sigma,\ \mu F<\infty\}$, and for $F\in\Sigma^f$ define $\nu_F:\Sigma\to\mathbb{R}$ by setting
$$\nu_FE=h(\chi(E\cap F)^\bullet)$$
for every $E\in\Sigma$. Then $\nu_F\emptyset=h(0)=0$, and if $\langle E_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$ with union $E$,
$$\chi(E\cap F)^\bullet=\sum_{n=0}^\infty\chi(E_n\cap F)^\bullet$$
in $L^1$. $\mathbf{P}$
$$\Bigl\|\chi(E\cap F)^\bullet-\sum_{k=0}^n\chi(E_k\cap F)^\bullet\Bigr\|_1=\mu\Bigl(F\cap E\setminus\bigcup_{k\le n}E_k\Bigr)\to 0$$
as $n\to\infty$. $\mathbf{Q}$ So
$$\nu_FE=h(\chi(E\cap F)^\bullet)=\sum_{n=0}^\infty h(\chi(E_n\cap F)^\bullet)=\sum_{n=0}^\infty\nu_FE_n.$$
Thus $\nu_F$ is countably additive. Also
$$|\nu_FE|\le\|\chi(E\cap F)^\bullet\|_1=\mu(E\cap F)$$
for every $E\in\Sigma$, so $\nu_F$ is truly continuous in the sense of 232Ab. By the Radon-Nikodým theorem (232E), there is an integrable function $g_F$ such that $\int_E g_F=\nu_FE$ for every $E\in\Sigma$; we may take it that $g_F$ is measurable and has domain $X$ (232He).
(iv) It is worth noting that $|g_F|\le_{\text{a.e.}}1$. $\mathbf{P}$ If $G=\{x:g_F(x)>1\}$, then
$$\int_G g_F=\nu_FG\le\mu(F\cap G)\le\mu G;$$
but this is possible only if $\mu G=0$. Similarly, if $G'=\{x:g_F(x)<-1\}$, then
$$\int_{G'}g_F=\nu_FG'\ge-\mu G',$$
so again $\mu G'=0$. $\mathbf{Q}$
(v) If $F$, $F'\in\Sigma^f$, then $g_F=g_{F'}$ almost everywhere on $F\cap F'$. $\mathbf{P}$ If $E\in\Sigma$ and $E\subseteq F\cap F'$, then
$$\int_E g_F=h(\chi(E\cap F)^\bullet)=h(\chi(E\cap F')^\bullet)=\int_E g_{F'}.$$
So 131H gives the result. $\mathbf{Q}$
213N (applied to $\{g_F\restriction F:F\in\Sigma^f\}$) now tells us that, because $\mu$ is localizable, there is a measurable function $g:X\to\mathbb{R}$ such that $g=g_F$ almost everywhere on $F$, for every $F\in\Sigma^f$.
(vi) For any $F\in\Sigma^f$, the set
$$\{x:x\in F,\ |g(x)|>1\}\subseteq\{x:|g_F(x)|>1\}\cup\{x:x\in F,\ g(x)\ne g_F(x)\}$$
is negligible; because $\mu$ is semi-finite, $\{x:|g(x)|>1\}$ is negligible, and $g\in\mathcal{L}^\infty$, with $\operatorname{ess\,sup}|g|\le 1$. Accordingly $v=g^\bullet\in L^\infty$, and we may speak of $Tv\in(L^1)^*$.
(vii) If $F\in\Sigma^f$, then
$$\int_F g=\int_F g_F=\nu_FX=h(\chi F^\bullet).$$
It follows at once that
$$(Tv)(f^\bullet)=\int f\times g=h(f^\bullet)$$
for every simple function $f:X\to\mathbb{R}$. Consequently $Tv=h$, because both $Tv$ and $h$ are continuous and the equivalence classes of simple functions form a dense subset of $L^1$ (242Mb, 2A3Uc). Thus $h=Tv$ is a value of $T$.
(viii) The argument as written above has assumed that $\|h\|=1$. But of course any non-zero member of $(L^1)^*$ is a scalar multiple of an element of norm $1$, so is a value of $T$. So $T:L^\infty\to(L^1)^*$ is indeed surjective, and is therefore an isometric isomorphism, as claimed.
243H Recall that $L^0$ is always Dedekind σ-complete and sometimes Dedekind complete (241G), while $L^1$ is always Dedekind complete (242H). In this respect $L^\infty$ follows $L^0$.
Theorem Let $(X,\Sigma,\mu)$ be a measure space.
(a) $L^\infty(\mu)$ is Dedekind σ-complete.
(b) If $\mu$ is localizable, $L^\infty(\mu)$ is Dedekind complete.
proof These are both consequences of 241G. If $A\subseteq L^\infty=L^\infty(\mu)$ is bounded above in $L^\infty$, fix $u_0\in A$ and an upper bound $w_0$ of $A$ in $L^\infty$. If $B$ is the set of upper bounds for $A$ in $L^0=L^0(\mu)$, then $B\cap L^\infty$ is the set of upper bounds for $A$ in $L^\infty$. Moreover, if $B$ has a least member $v_0$, then we must have $u_0\le v_0\le w_0$, so that
$$0\le v_0-u_0\le w_0-u_0\in L^\infty$$
and $v_0-u_0$, $v_0$ belong to $L^\infty$. (Compare part (a) of the proof of 242H.) Thus $v_0=\sup A$ in $L^\infty$.
Now we know that $L^0$ is Dedekind σ-complete; if $A\subseteq L^\infty$ is a non-empty countable set which is bounded above in $L^\infty$, it is surely bounded above in $L^0$, so has a supremum in $L^0$ which is also its supremum in $L^\infty$. As $A$ is arbitrary, $L^\infty$ is Dedekind σ-complete. While if $\mu$ is localizable, we can argue in the same way with arbitrary non-empty subsets of $L^\infty$ to see that $L^\infty$ is Dedekind complete because $L^0$ is.
243I A dense subspace of $L^\infty$ In 242M-242O I described a couple of important dense linear subspaces of $L^1$ spaces. The position concerning $L^\infty$ is a little different. However I can describe one important dense subspace.
Proposition Let $(X,\Sigma,\mu)$ be a measure space.
(a) Write $\mathbf{S}$ for the space of '$\Sigma$-simple' functions on $X$, that is, the space of functions from $X$ to $\mathbb{R}$ expressible as $\sum_{k=0}^n a_k\chi E_k$ where $a_k\in\mathbb{R}$ and $E_k\in\Sigma$ for every $k\le n$. Then for every $f\in\mathcal{L}^\infty=\mathcal{L}^\infty(\mu)$ and every $\epsilon>0$, there is a $g\in\mathbf{S}$ such that $\operatorname{ess\,sup}|f-g|\le\epsilon$.
(b) $S=\{f^\bullet:f\in\mathbf{S}\}$ is a $\|\ \|_\infty$-dense linear subspace of $L^\infty=L^\infty(\mu)$.
(c) If $(X,\Sigma,\mu)$ is totally finite, then $\mathbf{S}$ is the space of $\mu$-simple functions, so $S$ becomes just the space of equivalence classes of simple functions, as in 242Mb.
proof (a) Let $\tilde f:X\to\mathbb{R}$ be a bounded measurable function such that $f=_{\text{a.e.}}\tilde f$. Let $n\in\mathbb{N}$ be such that $|\tilde f(x)|\le n\epsilon$ for every $x\in X$. For $-n\le k\le n$ set
$$E_k=\{x:k\epsilon\le\tilde f(x)<(k+1)\epsilon\}.$$
Set
$$g=\sum_{k=-n}^n k\epsilon\chi E_k\in\mathbf{S};$$
then $0\le\tilde f(x)-g(x)\le\epsilon$ for every $x\in X$, so
$$\operatorname{ess\,sup}|f-g|=\operatorname{ess\,sup}|\tilde f-g|\le\epsilon.$$
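The construction in part (a) of this proof amounts to rounding $\tilde f$ down to the nearest multiple of $\epsilon$. A minimal Python sketch follows, assuming a bounded function given by finitely many sample values and exact rational arithmetic; the name `simple_approx` is illustrative only.

```python
from fractions import Fraction
from math import floor


def simple_approx(f_values, eps):
    """Pointwise values of g = sum_k k*eps*chi(E_k), where
    E_k = {x : k*eps <= f(x) < (k+1)*eps}.

    On E_k the function g takes the value k*eps, and k = floor(f(x)/eps),
    so g(x) = floor(f(x)/eps) * eps.
    """
    return [floor(y / eps) * eps for y in f_values]


if __name__ == "__main__":
    f_values = [Fraction(-7, 3), Fraction(0), Fraction(5, 4), Fraction(2)]
    eps = Fraction(1, 2)
    g_values = simple_approx(f_values, eps)
    # Each difference f - g lies in [0, eps[.
    print([y - z for y, z in zip(f_values, g_values)])
```

The function $g$ takes only the finitely many values $k\epsilon$ with $|k|\le n$, which is what makes it a member of the space of $\Sigma$-simple functions when $\tilde f$ is measurable and bounded by $n\epsilon$.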
(b) This follows immediately, as in 242Mb.
(c) is also elementary.
243J Conditional expectations Conditional expectations are so important that it is worth considering their interaction with every new concept.
(a) If $(X,\Sigma,\mu)$ is any measure space, and $\mathrm{T}$ is a σ-subalgebra of $\Sigma$, then the canonical embedding $S:L^0(\mu\restriction\mathrm{T})\to L^0(\mu)$ (242Ja) embeds $L^\infty(\mu\restriction\mathrm{T})$ as a subspace of $L^\infty(\mu)$, and $\|Su\|_\infty=\|u\|_\infty$ for every $u\in L^\infty(\mu\restriction\mathrm{T})$. As in 242Jb, we can identify $L^\infty(\mu\restriction\mathrm{T})$ with its image in $L^\infty(\mu)$.
(b) Now suppose that $\mu X=1$, and let $P:L^1(\mu)\to L^1(\mu\restriction\mathrm{T})$ be the conditional expectation operator (242Jd). Then $L^\infty(\mu)$ is actually a linear subspace of $L^1(\mu)$. Setting $e=\mathbf{1}^\bullet\in L^\infty(\mu)$, we see that $\int_F e=(\mu\restriction\mathrm{T})(F)$ for every $F\in\mathrm{T}$, so
$$Pe=\mathbf{1}^\bullet\in L^\infty(\mu\restriction\mathrm{T}).$$
If $u\in L^\infty(\mu)$, then setting $M=\|u\|_\infty$ we have $-Me\le u\le Me$, so $-MPe\le Pu\le MPe$, because $P$ is order-preserving (242Je); accordingly $\|Pu\|_\infty\le M=\|u\|_\infty$. Thus $P\restriction L^\infty(\mu):L^\infty(\mu)\to L^\infty(\mu\restriction\mathrm{T})$ is an operator of norm $1$.
243K Complex $L^\infty$ All the ideas needed to adapt the work above to complex $L^\infty$ spaces have already appeared in 241J and 242P. Let $\mathcal{L}^\infty_{\mathbb{C}}$ be
$$\{f:f\in\mathcal{L}^0_{\mathbb{C}},\ \operatorname{ess\,sup}|f|<\infty\}=\{f:\mathrm{Re}(f)\in\mathcal{L}^\infty,\ \mathrm{Im}(f)\in\mathcal{L}^\infty\}.$$
Then
$$L^\infty_{\mathbb{C}}=\{f^\bullet:f\in\mathcal{L}^\infty_{\mathbb{C}}\}=\{u:u\in L^0_{\mathbb{C}},\ \mathrm{Re}(u)\in L^\infty,\ \mathrm{Im}(u)\in L^\infty\}.$$
Setting
$$\|u\|_\infty=\|\,|u|\,\|_\infty=\operatorname{ess\,sup}|f|\text{ whenever }f^\bullet=u,$$
we have a norm on $L^\infty_{\mathbb{C}}$ rendering it a Banach space. We still have $u\times v\in L^\infty_{\mathbb{C}}$ and $\|u\times v\|_\infty\le\|u\|_\infty\|v\|_\infty$ for all $u$, $v\in L^\infty_{\mathbb{C}}$.
We now have a duality between $L^1_{\mathbb{C}}$ and $L^\infty_{\mathbb{C}}$ giving rise to a linear operator $T:L^\infty_{\mathbb{C}}\to(L^1_{\mathbb{C}})^*$ of norm at most $1$, defined by the formula
$$(Tv)(u)=\int u\times v\quad\text{for every }u\in L^1_{\mathbb{C}},\ v\in L^\infty_{\mathbb{C}}.$$
$T$ is injective iff the underlying measure space is semi-finite, and is a bijection iff the underlying measure space is localizable. (This can of course be proved by re-working the arguments of 243G; but it is perhaps easier to note that $T(\mathrm{Re}(v))=\mathrm{Re}(Tv)$, $T(\mathrm{Im}(v))=\mathrm{Im}(Tv)$ for every $v$, so that the result for complex spaces can be deduced from the result for real spaces.) To check that $T$ is norm-preserving when it is injective, the quickest route seems to be to imitate the argument of (a-ii) of the proof of 243G.
243X Basic exercises (a) Let $(X,\Sigma,\mu)$ be any measure space, and $\hat\mu$ the completion of $\mu$ (212C, 241Xb). Show that $\mathcal{L}^\infty(\hat\mu)=\mathcal{L}^\infty(\mu)$, $L^\infty(\hat\mu)=L^\infty(\mu)$.
> (b) Let $(X,\Sigma,\mu)$ be a non-empty measure space. Write $\mathcal{L}^\infty_{\text{strict}}$ for the space of bounded $\Sigma$-measurable real-valued functions with domain $X$. (i) Show that $L^\infty(\mu)=\{f^\bullet:f\in\mathcal{L}^\infty_{\text{strict}}\}\subseteq L^0=L^0(\mu)$. (ii) Show that $\mathcal{L}^\infty_{\text{strict}}$ is a Dedekind σ-complete Banach lattice if we give it the norm
$$\|f\|_\infty=\sup_{x\in X}|f(x)|\text{ for every }f\in\mathcal{L}^\infty_{\text{strict}}.$$
(iii) Show that for every $u\in L^\infty=L^\infty(\mu)$, $\|u\|_\infty=\min\{\|f\|_\infty:f\in\mathcal{L}^\infty_{\text{strict}},\ f^\bullet=u\}$.
> (c) Let $(X,\Sigma,\mu)$ be any measure space, and $A$ a subset of $L^\infty(\mu)$. Show that $A$ is bounded for the norm $\|\ \|_\infty$ iff it is bounded above and below for the ordering of $L^\infty$.
(d) Let $(X,\Sigma,\mu)$ be any measure space, and $A\subseteq L^\infty(\mu)$ a non-empty set with a least upper bound $w$ in $L^\infty(\mu)$. Show that $\|w\|_\infty\le\sup_{u\in A}\|u\|_\infty$.
(e) Let $\langle(X_i,\Sigma_i,\mu_i)\rangle_{i\in I}$ be a family of measure spaces, and $(X,\Sigma,\mu)$ their direct sum (214K). Show that the canonical isomorphism between $L^0(\mu)$ and $\prod_{i\in I}L^0(\mu_i)$ (241Xg) induces an isomorphism between $L^\infty(\mu)$ and the subspace
$$\{u:u\in\prod_{i\in I}L^\infty(\mu_i),\ \|u\|=\sup_{i\in I}\|u(i)\|_\infty<\infty\}$$
of $\prod_{i\in I}L^\infty(\mu_i)$.
(f) Let $(X,\Sigma,\mu)$ be any measure space, and $u\in L^1(\mu)$. Show that there is a $v\in L^\infty(\mu)$ such that $\|v\|_\infty\le 1$ and $\int u\times v=\|u\|_1$.
(g) Let $(X,\Sigma,\mu)$ be a semi-finite measure space and $v\in L^\infty(\mu)$. Show that
$$\|v\|_\infty=\sup\{\textstyle\int u\times v:u\in L^1,\ \|u\|_1\le 1\}=\sup\{\|u\times v\|_1:u\in L^1,\ \|u\|_1\le 1\}.$$
(h) Give an example of a probability space $(X,\Sigma,\mu)$ and a $v\in L^\infty(\mu)$ such that $\|u\times v\|_1<\|v\|_\infty$ whenever $u\in L^1(\mu)$ and $\|u\|_1\le 1$.
(i) Write out proofs of 243G adapted to the special cases (i) $\mu X=1$ (ii) $(X,\Sigma,\mu)$ is σ-finite.
(j) Let $(X,\Sigma,\mu)$ be any measure space. Show that $L^0(\mu)$ is Dedekind complete iff $L^\infty(\mu)$ is Dedekind complete.
(k) Let $(X,\Sigma,\mu)$ be a totally finite measure space and $\nu:\Sigma\to\mathbb{R}$ a functional. Show that the following are equiveridical: (i) there is a continuous linear functional $h:L^1(\mu)\to\mathbb{R}$ such that $h((\chi E)^\bullet)=\nu E$ for every $E\in\Sigma$ (ii) $\nu$ is additive and there is an $M\ge 0$ such that $|\nu E|\le M\mu E$ for every $E\in\Sigma$.
> (l) Let $X$ be any set, and let $\mu$ be counting measure on $X$. In this case it is customary to write $\ell^\infty(X)$ for $\mathcal{L}^\infty(\mu)$, and to identify it with $L^\infty(\mu)$. Write out statements and proofs of the results of this chapter adapted to this special case – if you like, with $X=\mathbb{N}$. In particular, write out a direct proof that $(\ell^1)^*$ can be identified with $\ell^\infty$. What happens when $X$ has just two members? or three?
(m) Show that if $(X,\Sigma,\mu)$ is any measure space and $u\in L^\infty_{\mathbb{C}}(\mu)$, then
$$\|u\|_\infty=\sup\{\|\mathrm{Re}(\zeta u)\|_\infty:\zeta\in\mathbb{C},\ |\zeta|=1\}.$$
(n) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $\phi:X\to Y$ an inverse-measure-preserving function. Show that $g\phi\in\mathcal{L}^\infty(\mu)$ for every $g\in\mathcal{L}^\infty(\nu)$, and that the map $g\mapsto g\phi$ induces a linear operator $T:L^\infty(\nu)\to L^\infty(\mu)$ defined by setting $T(g^\bullet)=(g\phi)^\bullet$ for every $g\in\mathcal{L}^\infty(\nu)$. (Compare 241Xh.) Show that $\|Tv\|_\infty=\|v\|_\infty$ for every $v\in L^\infty(\nu)$.
(o) On $C=C([0,1])$, the space of continuous real-valued functions on the unit interval $[0,1]$, say
$$f\le g\text{ iff }f(x)\le g(x)\text{ for every }x\in[0,1],\qquad\|f\|_\infty=\sup_{x\in[0,1]}|f(x)|.$$
Show that $C$ is a Banach lattice, and that moreover
$$\|f\vee g\|_\infty=\max(\|f\|_\infty,\|g\|_\infty)\text{ whenever }f,\ g\ge 0,$$
$$\|f\times g\|_\infty\le\|f\|_\infty\|g\|_\infty\text{ for all }f,\ g\in C,$$
$$\|f\|_\infty=\min\{\gamma:|f|\le\gamma\mathbf{1}\}\text{ for every }f\in C.$$
243Y Further exercises (a) Let $(X,\Sigma,\mu)$ be a measure space, and $Y$ a subset of $X$; write $\mu_Y$ for the subspace measure on $Y$. Show that the canonical map from $L^0(\mu)$ onto $L^0(\mu_Y)$ (241Ye) induces a canonical map from $L^\infty(\mu)$ onto $L^\infty(\mu_Y)$, which is norm-preserving iff it is injective.
243 Notes and comments I mention the formula
$$\|u\vee v\|_\infty=\max(\|u\|_\infty,\|v\|_\infty)\text{ for }u,\ v\ge 0$$
(243Df) because while it does not characterize $L^\infty$ spaces among Banach lattices (see 243Xo), it is in a sense dual to the characteristic property
$$\|u+v\|_1=\|u\|_1+\|v\|_1\text{ for }u,\ v\ge 0$$
of the norm of $L^1$. (I will return to this in Chapter 35 in the next volume.)
The particular set $\mathcal{L}^\infty$ I have chosen (243A) is somewhat arbitrary. The space $L^\infty$ can very well be described entirely as a subspace of $L^0$, without going back to functions at all; see 243Bc, 243Dc. Just as with $\mathcal{L}^0$ and $\mathcal{L}^1$, there are occasions when it would be simpler to work with the linear space of essentially bounded measurable functions from $X$ to $\mathbb{R}$; and we now have a third obvious candidate, the linear space $\mathcal{L}^\infty_{\text{strict}}$ of measurable functions from $X$ to $\mathbb{R}$ which are literally, rather than essentially, bounded, which is itself a Banach lattice (243Xb).
I suppose the most important theorem of this section is 243G, identifying $L^\infty$ with $(L^1)^*$. This identification is the chief reason for setting 'localizable' measure spaces apart. The proof of 243Gb is long because it depends on two separate ideas. The Radon-Nikodým theorem deals, in effect, with the totally finite case, and then in parts (b-v) and (b-vi) of the proof localizability is used to link the partial solutions $g_F$ together. Exercise 243Xi is supposed to help you to distinguish the two operations. The map $T':L^1\to(L^\infty)^*$ (243Fc) is also very interesting in its way, but I shall leave it for Chapter 36.
243G gives another way of looking at conditional expectation operators. If $(X,\Sigma,\mu)$ is a probability space and $\mathrm{T}$ is a σ-subalgebra of $\Sigma$, of course both $\mu$ and $\mu\restriction\mathrm{T}$ are localizable, so $L^\infty(\mu)$ can be identified with $(L^1(\mu))^*$ and $L^\infty(\mu\restriction\mathrm{T})$ can be identified with $(L^1(\mu\restriction\mathrm{T}))^*$. Now we have the canonical embedding $S:L^1(\mu\restriction\mathrm{T})\to L^1(\mu)$ (242Jb) which is a norm-preserving linear operator, so gives rise to an adjoint operator $S':L^1(\mu)^*\to L^1(\mu\restriction\mathrm{T})^*$ defined by the formula
$$(S'h)(v)=h(Sv)\text{ for all }v\in L^1(\mu\restriction\mathrm{T}),\ h\in L^1(\mu)^*.$$
Writing $T_\mu:L^\infty(\mu)\to L^1(\mu)^*$ and $T_{\mu\restriction\mathrm{T}}:L^\infty(\mu\restriction\mathrm{T})\to L^1(\mu\restriction\mathrm{T})^*$ for the canonical maps, we get a map $Q=T_{\mu\restriction\mathrm{T}}^{-1}S'T_\mu:L^\infty(\mu)\to L^\infty(\mu\restriction\mathrm{T})$, defined by saying that
$$\int Qu\times v=(T_{\mu\restriction\mathrm{T}}Qu)(v)=(S'T_\mu u)(v)=(T_\mu u)(Sv)=\int u\times Sv=\int u\times v$$
whenever $v\in L^1(\mu\restriction\mathrm{T})$ and $u\in L^\infty(\mu)$. But this agrees with the formula of 242L: we have
$$\int Qu\times v=\int u\times v=\int P(u\times v)=\int Pu\times v.$$
Because $v$ is arbitrary, we must have $Qu=Pu$ for every $u\in L^\infty(\mu)$. Thus a conditional expectation operator is, in a sense, the adjoint of the appropriate embedding operator.
The discussion in the last paragraph applies, of course, only to the restriction $P\restriction L^\infty(\mu)$ of the conditional expectation operator to the $L^\infty$ space. Because $\mu$ is totally finite, $L^\infty(\mu)$ is a subspace of $L^1(\mu)$, and the real qualities of the operator $P$ are related to its behaviour on the whole space $L^1$. $P:L^1(\mu)\to L^1(\mu\restriction\mathrm{T})$ can also be expressed as an adjoint operator, but the expression needs more of the theory of Riesz spaces than I have space for here. I will return to this topic in Chapter 36.
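The identity $\int Pu\times v=\int u\times v$ for $\mathrm{T}$-measurable $v$, which underlies the adjoint description above, can be checked on a finite probability space where $\mathrm{T}$ is generated by a partition. The following is a minimal Python sketch under those assumptions (every block of the partition has positive mass); `cond_exp` and `integral` are illustrative names, not notation from the text.

```python
from fractions import Fraction


def cond_exp(u, mu, blocks):
    """Conditional expectation of u on the sigma-algebra generated by a
    partition `blocks` of {0, ..., n-1}: on each block, the mu-weighted
    average of u.  Each block is assumed to have positive mass."""
    out = [None] * len(u)
    for block in blocks:
        mass = sum(mu[i] for i in block)
        avg = sum(u[i] * mu[i] for i in block) / mass
        for i in block:
            out[i] = avg
    return out


def integral(f, mu):
    """Integral of f with respect to the point masses mu."""
    return sum(a * m for a, m in zip(f, mu))


if __name__ == "__main__":
    mu = [Fraction(1, 4)] * 4
    blocks = [[0, 1], [2, 3]]
    u = [1, 3, -2, 6]
    v = [5, 5, -1, -1]                    # constant on each block
    Pu = cond_exp(u, mu, blocks)
    lhs = integral([a * b for a, b in zip(Pu, v)], mu)
    rhs = integral([a * b for a, b in zip(u, v)], mu)
    print(Pu, lhs, rhs)
```

In this example the two integrals agree, and the supremum of $|Pu|$ does not exceed that of $|u|$, in line with 243Jb.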
244 $L^p$
Continuing with our tour of the classical Banach spaces, we come to the $L^p$ spaces for $1<p<\infty$. The case $p=2$ is more important than all the others put together, and it would be reasonable, perhaps even advisable, to read this section first with this case alone in mind. But the other spaces provide instructive examples and remain a basic part of the education of any functional analyst.
244A Definitions Let $(X,\Sigma,\mu)$ be any measure space, and $p\in\,]1,\infty[$. Write $\mathcal{L}^p=\mathcal{L}^p(\mu)$ for the set of functions $f\in\mathcal{L}^0=\mathcal{L}^0(\mu)$ such that $|f|^p$ is integrable, and $L^p(\mu)$ for $\{f^\bullet:f\in\mathcal{L}^p(\mu)\}\subseteq L^0=L^0(\mu)$. Note that if $f\in\mathcal{L}^p$, $g\in\mathcal{L}^0$ and $f=_{\text{a.e.}}g$, then $|f|^p=_{\text{a.e.}}|g|^p$ so $|g|^p$ is integrable and $g\in\mathcal{L}^p$; thus $\mathcal{L}^p=\{f:f\in\mathcal{L}^0,\ f^\bullet\in L^p\}$.
Alternatively, we can define up whenever u ∈ L0 , u ≥ 0 by writing (f • )p = (f p )• for every f ∈ L0 such that f (x) ≥ 0 for every x ∈ dom f (compare 241I), and say that Lp = {u : u ∈ L0 , up ∈ L1 (µ)}. 244B Theorem Let (X, Σ, µ) be any measure space, and p ∈ [1, ∞]. (a) Lp = Lp (µ) is a linear subspace of L0 = L0 (µ). (b) If u ∈ Lp , v ∈ L0 and v ≤ u, then v ∈ Lp . Consequently u, u ∨ v and u ∧ v belong to Lp for all u, v ∈ Lp . proof The cases p = 1, p = ∞ are covered by 242B, 242C and 243B; so I suppose that 1 < p < ∞. (a)(i) Suppose that f , g ∈ Lp = Lp (µ). If a, b ∈ R then a + bp ≤ 2p max(ap , bp ), so f + gp ≤a.e. 2 (f p ∨ gp ); now f + gp ∈ L0 and 2p (f p ∨ gp ) ∈ L1 so f + gp ∈ L1 . Thus f + g ∈ Lp for all f , g ∈ Lp ; it follows at once that u + v ∈ Lp for all u, v ∈ Lp . p
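(ii) If $f \in \mathcal{L}^p$ and $c \in \mathbb{R}$ then $|cf|^p = |c|^p |f|^p \in \mathcal{L}^1$, so $cf \in \mathcal{L}^p$. Accordingly $cu \in L^p$ whenever $u \in L^p$ and $c \in \mathbb{R}$.

(b)(i) Express $u$ as $f^\bullet$ and $v$ as $g^\bullet$, where $f \in \mathcal{L}^p$ and $g \in \mathcal{L}^0$. Then $|g| \le_{\text{a.e.}} |f|$, so $|g|^p \le_{\text{a.e.}} |f|^p$ and $|g|^p$ is integrable; accordingly $g \in \mathcal{L}^p$ and $v \in L^p$.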
(ii) Now $\big|\,|u|\,\big| = |u|$ so $|u| \in L^p$ whenever $u \in L^p$. Finally $u \vee v = \frac12(u + v + |u - v|)$, $u \wedge v = \frac12(u + v - |u - v|)$ belong to $L^p$ for all $u$, $v \in L^p$.

244C The order structure of $L^p$ Let $(X, \Sigma, \mu)$ be any measure space, and $p \in [1, \infty]$. Then 244B is enough to ensure that the partial ordering inherited from $L^0(\mu)$ makes $L^p(\mu)$ a Riesz space (compare 242C, 243C).

244D The norm of $L^p$ Let $(X, \Sigma, \mu)$ be a measure space, $p \in \,]1, \infty[$.
(a) For $f \in \mathcal{L}^p = \mathcal{L}^p(\mu)$, set $\|f\|_p = (\int |f|^p)^{1/p}$. If $f$, $g \in \mathcal{L}^p$ and $f =_{\text{a.e.}} g$ then $|f|^p =_{\text{a.e.}} |g|^p$ so $\|f\|_p = \|g\|_p$. Accordingly we may define $\|\cdot\|_p : L^p(\mu) \to [0, \infty[$ by writing $\|f^\bullet\|_p = \|f\|_p$ for every $f \in \mathcal{L}^p$. Alternatively, we can say just that $\|u\|_p = (\int |u|^p)^{1/p}$ for every $u \in L^p = L^p(\mu)$.
(b) The notation $\|\cdot\|_p$ carries a promise that it is a norm on $L^p$; this is indeed so, but I hold the proof over to 244F below. For the moment, however, let us note just that $\|cu\|_p = |c|\,\|u\|_p$ for all $u \in L^p$, $c \in \mathbb{R}$, and that if $\|u\|_p = 0$ then $\int |u|^p = 0$ so $|u|^p = 0$ and $u = 0$.
(c) If $|u| \le |v|$ in $L^p$ then $\|u\|_p \le \|v\|_p$; this is because $|u|^p \le |v|^p$.

244E I now work through the lemmas required to show that $\|\cdot\|_p$ is a norm on $L^p$ and, eventually, that the normed space dual of $L^p$ may be identified with a suitable $L^q$.

Lemma Suppose $(X, \Sigma, \mu)$ is a measure space, and that $p$, $q \in \,]1, \infty[$ are such that $\frac1p + \frac1q = 1$. Then
(a) $ab \le \frac1p a^p + \frac1q b^q$ for all real $a$, $b \ge 0$.
(b)(i) $f \times g$ is integrable and
$$\Big|\int f \times g\Big| \le \int |f \times g| \le \|f\|_p \|g\|_q$$
for all $f \in \mathcal{L}^p = \mathcal{L}^p(\mu)$, $g \in \mathcal{L}^q = \mathcal{L}^q(\mu)$;
(ii) $u \times v \in L^1 = L^1(\mu)$ and
$$\Big|\int u \times v\Big| \le \|u \times v\|_1 \le \|u\|_p \|v\|_q$$
for all $u \in L^p = L^p(\mu)$, $v \in L^q = L^q(\mu)$.

proof (a) If either $a$ or $b$ is $0$, this is trivial. If both are non-zero, we may argue as follows. The function $x \mapsto x^{1/p} : [0, \infty[ \to \mathbb{R}$ is concave, with second derivative strictly less than $0$, so lies entirely below any of its tangents; in particular, below its tangent at the point $(1, 1)$, which has equation $y = 1 + \frac1p(x - 1)$. Thus we have
$$x^{1/p} \le \frac1p x + 1 - \frac1p = \frac1p x + \frac1q$$
for every $x \in [0, \infty[$. So if $c$, $d > 0$, then
$$\Big(\frac{c}{d}\Big)^{1/p} \le \frac1p \frac{c}{d} + \frac1q;$$
multiplying both sides by $d$,
$$c^{1/p} d^{1/q} \le \frac1p c + \frac1q d;$$
setting $c = a^p$, $d = b^q$, we get
$$ab \le \frac1p a^p + \frac1q b^q,$$
as claimed.

(b)(i)(α) Suppose first that $\|f\|_p = \|g\|_q = 1$. For every $x \in \operatorname{dom} f \cap \operatorname{dom} g$ we have
$$|f(x) g(x)| \le \frac1p |f(x)|^p + \frac1q |g(x)|^q$$
by (a). So
$$|f \times g| \le \frac1p |f|^p + \frac1q |g|^q \in \mathcal{L}^1(\mu)$$
and $f \times g$ is integrable; also
$$\int |f \times g| \le \frac1p \int |f|^p + \frac1q \int |g|^q = \frac1p \|f\|_p^p + \frac1q \|g\|_q^q = \frac1p + \frac1q = 1.$$

(β) If $\|f\|_p = 0$, then $\int |f|^p = 0$ so $|f|^p =_{\text{a.e.}} 0$, $f =_{\text{a.e.}} 0$, $f \times g =_{\text{a.e.}} 0$ and
$$\int |f \times g| = 0 = \|f\|_p \|g\|_q.$$
Similarly, if $\|g\|_q = 0$, then $g =_{\text{a.e.}} 0$ and again
$$\int |f \times g| = 0 = \|f\|_p \|g\|_q.$$

(γ) Finally, for general $f \in \mathcal{L}^p$, $g \in \mathcal{L}^q$ such that $c = \|f\|_p$ and $d = \|g\|_q$ are both non-zero, we have $\|\frac1c f\|_p = \|\frac1d g\|_q = 1$ so
$$f \times g = cd\Big(\frac1c f \times \frac1d g\Big)$$
is integrable, and
$$\int |f \times g| = cd \int \Big|\frac1c f \times \frac1d g\Big| \le cd,$$
as required.

(ii) Now if $u \in L^p$, $v \in L^q$ take $f \in \mathcal{L}^p$, $g \in \mathcal{L}^q$ such that $u = f^\bullet$, $v = g^\bullet$; $f \times g$ is integrable, so $u \times v \in L^1$, and
$$\Big|\int u \times v\Big| \le \|u \times v\|_1 = \int |f \times g| \le \|f\|_p \|g\|_q = \|u\|_p \|v\|_q.$$

Remark Part (b) is 'Hölder's inequality'. In the case $p = q = 2$ it is 'Cauchy's inequality'.

244F Proposition Let $(X, \Sigma, \mu)$ be a measure space and $p \in \,]1, \infty[$. Set $q = p/(p-1)$, so that $\frac1p + \frac1q = 1$.
(a) For every $u \in L^p = L^p(\mu)$, $\|u\|_p = \max\{\int u \times v : v \in L^q(\mu), \|v\|_q \le 1\}$.
(b) $\|\cdot\|_p$ is a norm on $L^p$.

proof (a) For $u \in L^p$, set
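$$\tau(u) = \sup\Big\{\int u \times v : v \in L^q(\mu), \|v\|_q \le 1\Big\}.$$

(i) If $u \in L^p$, then $\|u\|_p \ge \tau(u)$, by 244E. If $\|u\|_p = 0$ then surely
$$0 = \|u\|_p = \tau(u) = \max\Big\{\int u \times v : v \in L^q(\mu), \|v\|_q \le 1\Big\}.$$
If $\|u\|_p = c > 0$, consider
$$v = c^{-p/q} \operatorname{sgn} u \times |u|^{p/q},$$
where for $a \in \mathbb{R}$ I write $\operatorname{sgn} a = |a|/a$ if $a \ne 0$, $0$ if $a = 0$, so that $\operatorname{sgn} : \mathbb{R} \to \mathbb{R}$ is a Borel measurable function; for $f \in \mathcal{L}^0$ I write $(\operatorname{sgn} f)(x) = \operatorname{sgn}(f(x))$ for $x \in \operatorname{dom} f$, so that $\operatorname{sgn} f \in \mathcal{L}^0$; and for $f \in \mathcal{L}^0$ I write $\operatorname{sgn}(f^\bullet) = (\operatorname{sgn} f)^\bullet$ to define a function $\operatorname{sgn} : L^0 \to L^0$ (cf. 241I). Then $v \in L^q$ and
$$\|v\|_q = \Big(\int |v|^q\Big)^{1/q} = c^{-p/q}\Big(\int |u|^p\Big)^{1/q} = c^{-p/q} c^{p/q} = 1.$$
So
$$\tau(u) \ge \int u \times v = c^{-p/q} \int \operatorname{sgn} u \times |u| \times \operatorname{sgn} u \times |u|^{p/q} = c^{-p/q} \int |u|^{1 + \frac{p}{q}} = c^{-p/q} \int |u|^p = c^{p - \frac{p}{q}} = c,$$
recalling that $1 + \frac{p}{q} = p$, $p - \frac{p}{q} = 1$. Thus $\tau(u) \ge \|u\|_p$ and
$$\tau(u) = \|u\|_p = \int u \times v.$$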
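The inequality of 244E(b) and the maximizing element used in the proof of 244F(a) can be checked numerically. The following is a minimal sketch, assuming a finite set carrying point masses `weights` (so that integrals are weighted sums); the helper names `p_norm`, `integral` and `extremal_v` and the sample data are illustrative only and do not come from the text.

```python
def p_norm(u, weights, p):
    """(sum_i w_i |u_i|^p)^(1/p): the p-norm for a measure with point masses w_i."""
    return sum(w * abs(x) ** p for x, w in zip(u, weights)) ** (1.0 / p)

def integral(u, weights):
    """Integral of u with respect to the same point-mass measure."""
    return sum(w * x for x, w in zip(u, weights))

def sgn(a):
    """sgn a = |a|/a for a != 0, and 0 for a = 0."""
    return (a > 0) - (a < 0)

def extremal_v(u, weights, p):
    """v = c^(-p/q) sgn(u) |u|^(p/q) with c = ||u||_p, as in the proof of 244F(a)."""
    q = p / (p - 1.0)
    c = p_norm(u, weights, p)
    if c == 0:
        return [0.0 for _ in u]
    return [c ** (-p / q) * sgn(x) * abs(x) ** (p / q) for x in u]

# Hypothetical sample data (assumptions, chosen only for illustration).
weights = [0.5, 2.0, 1.0, 0.25]
u = [3.0, -1.0, 0.0, 4.0]
w = [1.0, 2.0, -5.0, 0.5]
p = 3.0
q = p / (p - 1.0)

# Hoelder: |int u*w| <= ||u||_p ||w||_q.
lhs = abs(integral([a * b for a, b in zip(u, w)], weights))
rhs = p_norm(u, weights, p) * p_norm(w, weights, q)
holder_ok = lhs <= rhs + 1e-12

# The extremal v has ||v||_q = 1 and int u*v = ||u||_p.
v = extremal_v(u, weights, p)
v_norm = p_norm(v, weights, q)
pairing = integral([a * b for a, b in zip(u, v)], weights)
```

Here `v_norm` should agree with 1 and `pairing` with `p_norm(u, weights, p)` up to rounding, which is the finite analogue of $\tau(u) = \|u\|_p$.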
(b) In view of the remarks in 244Db, I have only to check that $\|u + v\|_p \le \|u\|_p + \|v\|_p$ for all $u$, $v \in L^p$. But given $u$ and $v$, let $w \in L^q$ be such that $\|w\|_q = 1$ and $\int (u + v) \times w = \|u + v\|_p$. Then
$$\|u + v\|_p = \int (u + v) \times w = \int u \times w + \int v \times w \le \|u\|_p + \|v\|_p,$$
as required.

244G Theorem Let $(X, \Sigma, \mu)$ be any measure space, and $p \in [1, \infty]$. Then $L^p = L^p(\mu)$ is a Banach lattice under its norm $\|\cdot\|_p$.

proof The cases $p = 1$, $p = \infty$ are covered by 242F-242G and 243E, so let us suppose that $1 < p < \infty$. We know already that $\|u\|_p \le \|v\|_p$ whenever $|u| \le |v|$, so that it remains only to show that $L^p$ is complete.

Let $\langle u_n \rangle_{n \in \mathbb{N}}$ be a sequence in $L^p$ such that $\|u_{n+1} - u_n\|_p \le 4^{-n}$ for every $n \in \mathbb{N}$. Note that
$$\|u_n\|_p \le \|u_0\|_p + \sum_{k=0}^{n-1} \|u_{k+1} - u_k\|_p \le \|u_0\|_p + \sum_{k=0}^{\infty} 4^{-k} \le \|u_0\|_p + 2$$
for every $n$. For each $n \in \mathbb{N}$, choose $f_n \in \mathcal{L}^p$ such that $f_0^\bullet = u_0$, $f_n^\bullet = u_n - u_{n-1}$ for $n \ge 1$; do this in such a way that $\operatorname{dom} f_k = X$ and $f_k$ is $\Sigma$-measurable (241Bk). Then $\|f_n\|_p \le 4^{-n+1}$ for $n \ge 1$.

For $m$, $n \in \mathbb{N}$, set
$$E_{mn} = \{x : |f_m(x)| \ge 2^{-n}\} \in \Sigma.$$
Then $|f_m(x)|^p \ge 2^{-np}$ for $x \in E_{mn}$, so
$$2^{-np} \mu E_{mn} \le \int |f_m|^p < \infty$$
and $\mu E_{mn} < \infty$. So $\chi E_{mn} \in \mathcal{L}^q = \mathcal{L}^q(\mu)$ and
$$\int_{E_{mn}} |f_k| = \int |f_k| \times \chi E_{mn} \le \|f_k\|_p \|\chi E_{mn}\|_q$$
for each $k$, by 244E(bi). Accordingly
$$\sum_{k=0}^{\infty} \int_{E_{mn}} |f_k| \le \|\chi E_{mn}\|_q \sum_{k=0}^{\infty} \|f_k\|_p < \infty,$$
and $\sum_{k=0}^{\infty} f_k(x)$ exists for almost every $x \in E_{mn}$, by 242E. This is true for all $m$, $n \in \mathbb{N}$. But if $x \in X \setminus \bigcup_{m,n \in \mathbb{N}} E_{mn}$, $f_n(x) = 0$ for every $n$, so $\sum_{k=0}^{\infty} f_k(x)$ certainly exists. Thus $g(x) = \sum_{k=0}^{\infty} f_k(x)$ is defined in $\mathbb{R}$ for almost every $x \in X$.

Set $g_n = \sum_{k=0}^{n} f_k$; then $g_n^\bullet = u_n \in L^p$ for each $n$, and $g(x) = \lim_{n \to \infty} g_n(x)$ is defined a.e. in $X$. Now consider $|g|^p =_{\text{a.e.}} \lim_{n \to \infty} |g_n|^p$. We know that
$$\liminf_{n \to \infty} \int |g_n|^p = \liminf_{n \to \infty} \|u_n\|_p^p \le (2 + \|u_0\|_p)^p < \infty,$$
so by Fatou's Lemma
$$\int |g|^p \le \liminf_{k \to \infty} \int |g_k|^p < \infty.$$
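Thus $u = g^\bullet \in L^p$. Moreover, for any $m \in \mathbb{N}$ and $n > m$ we have $\|u_n - u_m\|_p \le \sum_{k=m}^{n-1} \|u_{k+1} - u_k\|_p \le \sum_{k=m}^{n-1} 4^{-k}$, so that, applying Fatou's Lemma again,
$$\int |g - g_m|^p \le \liminf_{n \to \infty} \int |g_n - g_m|^p = \liminf_{n \to \infty} \|u_n - u_m\|_p^p \le \Big(\sum_{k=m}^{\infty} 4^{-k}\Big)^p = \Big(\tfrac43 \cdot 4^{-m}\Big)^p.$$
So
$$\|u - u_m\|_p = \Big(\int |g - g_m|^p\Big)^{1/p} \le \tfrac43 \cdot 4^{-m} \to 0$$
as $m \to \infty$. Thus $u = \lim_{m \to \infty} u_m$ in $L^p$. As $\langle u_n \rangle_{n \in \mathbb{N}}$ is arbitrary, $L^p$ is complete.

244H Following 242M-242O, I note that $L^p$ behaves like $L^1$ in respect of certain dense subspaces.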
Proposition (a) Let $(X, \Sigma, \mu)$ be any measure space, and $p \in [1, \infty[$. Then the space $S$ of equivalence classes of $\mu$-simple functions is a dense linear subspace of $L^p = L^p(\mu)$.
(b) Let $X$ be any subset of $\mathbb{R}^r$, where $r \ge 1$, and let $\mu$ be the subspace measure on $X$ induced by Lebesgue measure on $\mathbb{R}^r$. Write $C_k$ for the set of (bounded) continuous functions $g : \mathbb{R}^r \to \mathbb{R}$ such that $\{x : g(x) \ne 0\}$ is bounded, and $S_0$ for the space of linear combinations of functions of the form $\chi I$, where $I \subseteq \mathbb{R}^r$ is a bounded half-open interval. Then $\{(g\restriction X)^\bullet : g \in C_k\}$ and $\{(h\restriction X)^\bullet : h \in S_0\}$ are dense in $L^p(\mu)$.

proof (a) I repeat the argument of 242M with a tiny modification.

(i) Suppose that $u \in L^p(\mu)$, $u \ge 0$ and $\epsilon > 0$. Express $u$ as $f^\bullet$ where $f : X \to [0, \infty[$ is a measurable function. Let $g : X \to \mathbb{R}$ be a simple function such that $0 \le g \le f^p$ and $\int g \ge \int f^p - \epsilon^p$. Set $h = g^{1/p}$. Then $h$ is a simple function and $h \le f$. Because $p \ge 1$, $(f - h)^p + h^p \le f^p$ and
$$\int (f - h)^p \le \int f^p - g \le \epsilon^p,$$
so
$$\|u - h^\bullet\|_p = \Big(\int |f - h|^p\Big)^{1/p} \le \epsilon,$$
while $h^\bullet \in S$.

(ii) For general $u \in L^p$, $\epsilon > 0$, $u$ can be expressed as $u^+ - u^-$ where $u^+ = u \vee 0$, $u^- = (-u) \vee 0$ belong to $L^p$ and are non-negative. By (i), we can find $v_1$, $v_2 \in S$ such that $\|u^+ - v_1\|_p \le \frac12\epsilon$, $\|u^- - v_2\|_p \le \frac12\epsilon$, so that $v = v_1 - v_2 \in S$ and $\|u - v\|_p \le \epsilon$. As $u$ and $\epsilon$ are arbitrary, $S$ is dense.

(b) Again, all the ideas are to be found in 242O; the changes needed are in the formulae, not in the method. They will go more easily if I note at the outset that whenever $f_1$, $f_2 \in \mathcal{L}^p(\mu)$ and $\int |f_1|^p \le \epsilon^p$, $\int |f_2|^p \le \delta^p$ (where $\epsilon$, $\delta \ge 0$), then $\int |f_1 + f_2|^p \le (\epsilon + \delta)^p$; this is just the triangle inequality for $\|\cdot\|_p$ (244Fb). Also I will regularly express the target relationships in the form '$\int_X |f - g|^p \le \epsilon^p$', '$\int_X |f - h|^p \le \epsilon^p$'. Now let me run through the argument of 242Oa, rather more briskly than before.

(i) Suppose first that $f = \chi I \restriction X$ where $I \subseteq \mathbb{R}^r$ is a bounded half-open interval. As before, we can set $h = \chi I$ and get $\int_X |f - h|^p = 0$. This time, use the same construction to find an interval $J$ and a function $g \in C_k$ such that $\chi I \le g \le \chi J$ and $\mu_r(J \setminus I) \le \epsilon^p$; this will ensure that $\int_X |f - g|^p \le \epsilon^p$.

(ii) Now suppose that $f = \chi(X \cap E)$ where $E \subseteq \mathbb{R}^r$ is a set of finite measure. Then, for the same reasons as before, there is a disjoint family $I_0, \ldots, I_n$ of half-open intervals such that $\mu_r(E \triangle \bigcup_{j \le n} I_j) \le (\frac12\epsilon)^p$. Accordingly $h = \sum_{j=0}^{n} \chi I_j \in S_0$ and $\int_X |f - h|^p \le (\frac12\epsilon)^p$. And (i) tells us that there is for each $j \le n$ a $g_j \in C_k$ such that $\int_X |g_j - \chi I_j|^p \le (\epsilon/2(n+1))^p$, so that $g = \sum_{j=0}^{n} g_j \in C_k$ and $\int_X |f - g|^p \le \epsilon^p$.

(iii) The move to simple functions, and thence to arbitrary members of $\mathcal{L}^p(\mu)$, is just as before, but using $\|f\|_p$ in place of $\int_X |f|$. Finally, the translation from $\mathcal{L}^p$ to $L^p$ is again direct, remembering, as before, to check that $g \restriction X$, $h \restriction X$ belong to $\mathcal{L}^p(\mu)$ for every $g \in C_k$, $h \in S_0$.
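The construction in part (a)(i) of the proof can be tried out on a discretized example. The sketch below is only an illustration under stated assumptions: the measure space is replaced by `n` sample points each of mass `1/n`, the function `f`, the exponent `p` and the quantization `step` are arbitrary choices, and `quantize_down` is a hypothetical helper that produces a function with values in a fixed grid lying below $f^p$ (standing in for the simple function $g$).

```python
import math

def quantize_down(y, step):
    """Largest multiple of `step` not exceeding y, for y >= 0 (up to rounding)."""
    return math.floor(y / step) * step

p = 2.5
n = 1000
mass = 1.0 / n          # each sample point carries measure 1/n, total mass 1
step = 1e-3             # plays the role of epsilon^p

# A non-negative function on the sample points (an arbitrary illustrative choice).
f = [1.0 + math.sin(7.0 * k / n) + (k / n) ** 2 for k in range(n)]

# g takes values in {0, step, 2*step, ...} with 0 <= g <= f^p and f^p - g < step.
g = [quantize_down(x ** p, step) for x in f]
# h = g^(1/p), so h <= f (up to rounding).
h = [y ** (1.0 / p) for y in g]

# Pointwise inequality (f - h)^p + h^p <= f^p; abs() guards against rounding.
pointwise_ok = all(
    abs(x - hx) ** p + hx ** p <= x ** p + 1e-9 for x, hx in zip(f, h)
)

# Integrated form: int |f - h|^p <= int (f^p - g) <= step * (total mass).
err_p = sum(mass * abs(x - hx) ** p for x, hx in zip(f, h))
gap = sum(mass * (x ** p - gy) for x, gy in zip(f, g))
```

The quantity `err_p ** (1 / p)` is then the discrete analogue of $\|u - h^\bullet\|_p$, bounded by `step ** (1 / p)`.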
*244I Corollary In the context of 244Hb, $L^p(\mu)$ is separable.

proof Let $A$ be the set
$$\Big\{\Big(\sum_{j=0}^{n} q_j \chi([a_j, b_j[ \,\cap X)\Big)^\bullet : n \in \mathbb{N},\ q_0, \ldots, q_n \in \mathbb{Q},\ a_0, \ldots, a_n, b_0, \ldots, b_n \in \mathbb{Q}^r\Big\}.$$
Then $A$ is a countable subset of $L^p(\mu)$, and its closure must contain $(\sum_{j=0}^{n} c_j \chi([a_j, b_j[ \,\cap X))^\bullet$ whenever $c_0, \ldots, c_n \in \mathbb{R}$ and $a_0, \ldots, a_n, b_0, \ldots, b_n \in \mathbb{R}^r$; that is, $\overline{A}$ is a closed set including $\{(h \restriction X)^\bullet : h \in S_0\}$, and is the whole of $L^p(\mu)$, by 244Hb.

244J Duality in $L^p$ spaces Let $(X, \Sigma, \mu)$ be any measure space, and $p \in \,]1, \infty[$. Set $q = p/(p-1)$; note that $\frac1p + \frac1q = 1$ and that $p = q/(q-1)$; the relation between $p$ and $q$ is symmetric. Now $u \times v \in L^1(\mu)$ and $\|u \times v\|_1 \le \|u\|_p \|v\|_q$ whenever $u \in L^p = L^p(\mu)$ and $v \in L^q = L^q(\mu)$ (244E). Consequently we have a bounded linear operator $T$ from $L^q$ to the normed space dual $(L^p)^*$ of $L^p$, given by writing
$$(Tv)(u) = \int u \times v$$
for all $u \in L^p$, $v \in L^q$, exactly as in 243F.

244K Theorem Let $(X, \Sigma, \mu)$ be a measure space, and $p \in \,]1, \infty[$; set $q = p/(p-1)$. Then the canonical map $T : L^q(\mu) \to L^p(\mu)^*$, described in 244J, is a normed space isomorphism.

Remark I should perhaps remind anyone who is reading this chapter to learn about $L^2$ that the general theory of Hilbert spaces yields this theorem in the case $p = q = 2$ without any need for the more generally applicable argument given below (see 244N, 244Yj).

proof We know that $T$ is a bounded linear operator of norm at most $1$; I need to show (i) that $T$ is actually an isometry (that is, that $\|Tv\| = \|v\|_q$ for every $v \in L^q$), which will show incidentally that $T$ is injective (ii) that $T$ is surjective, which is the really substantial part of the theorem.

(a) If $v \in L^q$, then by 244Fa (recalling that $p = q/(q-1)$) there is a $u \in L^p$ such that $\|u\|_p \le 1$ and $\int u \times v = \|v\|_q$; thus $\|Tv\| \ge (Tv)(u) = \|v\|_q$. As we know already that $\|Tv\| \le \|v\|_q$, we have $\|Tv\| = \|v\|_q$ for every $v$, and $T$ is an isometry.
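(b) The rest of the proof, therefore, will be devoted to showing that $T : L^q \to (L^p)^*$ is surjective. Fix $h \in (L^p)^*$ with $\|h\| = 1$.

I need to show that $h$ 'lives on' a countable union of sets of finite measure in $X$, in the following sense: there is a non-decreasing sequence $\langle E_n \rangle_{n \in \mathbb{N}}$ of sets of finite measure such that $h(f^\bullet) = 0$ whenever $f \in \mathcal{L}^p$ and $f(x) = 0$ for $x \in \bigcup_{n \in \mathbb{N}} E_n$. $\mathbf{P}$ Choose a sequence $\langle u_n \rangle_{n \in \mathbb{N}}$ in $L^p$ such that $\|u_n\|_p \le 1$ for every $n$ and $\lim_{n \to \infty} h(u_n) = \|h\| = 1$. For each $n$, express $u_n$ as $f_n^\bullet$, where $f_n : X \to \mathbb{R}$ is a measurable function. Set
$$E_n = \Big\{x : \sum_{k=0}^{n} |f_k(x)|^p \ge 2^{-n}\Big\}$$
for $n \in \mathbb{N}$; because $|f_k|^p$ is measurable and integrable and has domain $X$ for every $k$, $E_n \in \Sigma$ and $\mu E_n < \infty$ for each $n$.

Now suppose that $f \in \mathcal{L}^p(\mu)$ and that $f(x) = 0$ for $x \in \bigcup_{n \in \mathbb{N}} E_n$; set $u = f^\bullet \in L^p$. $\mathbf{?}$ Suppose, if possible, that $h(u) \ne 0$, and consider $h(cu)$, where
$$\operatorname{sgn} c = \operatorname{sgn} h(u), \qquad 0 < |c| < \big(p\,|h(u)|\,\|u\|_p^{-p}\big)^{1/(p-1)}.$$
(Of course $\|u\|_p \ne 0$ if $h(u) \ne 0$.) For each $n$, we have
$$\{x : f_n(x) \ne 0\} \subseteq \bigcup_{m \in \mathbb{N}} E_m \subseteq \{x : f(x) = 0\},$$
so $|f_n + cf|^p = |f_n|^p + |cf|^p$ and
$$h(u_n + cu) \le \|u_n + cu\|_p = \big(\|u_n\|_p^p + \|cu\|_p^p\big)^{1/p} \le \big(1 + |c|^p \|u\|_p^p\big)^{1/p}.$$
Letting $n \to \infty$,
$$1 + c\,h(u) \le \big(1 + |c|^p \|u\|_p^p\big)^{1/p}.$$
Because $\operatorname{sgn} c = \operatorname{sgn} h(u)$, $c\,h(u) = |c|\,|h(u)|$ and we have
$$1 + p\,|c|\,|h(u)| \le (1 + c\,h(u))^p \le 1 + |c|^p \|u\|_p^p,$$
so that
$$p\,|h(u)| \le |c|^{p-1} \|u\|_p^p < p\,|h(u)|$$
by the choice of $c$; which is impossible. $\mathbf{X}$

This means that $h(f^\bullet) = 0$ whenever $f : X \to \mathbb{R}$ is measurable, belongs to $\mathcal{L}^p$, and is zero on $\bigcup_{n \in \mathbb{N}} E_n$. $\mathbf{Q}$

(c) Set $H_n = E_n \setminus \bigcup_{k < n} E_k$ […] $\ldots > 0\} \subseteq \bigcup_{n \in \mathbb{N}} E_n$. If $F \subseteq G$ and $\mu F < \infty$, then
$$\lim_{n \to \infty} \int g \times \chi(F \cap E_n) \le \sup_{n \in \mathbb{N}} h(\chi(F \cap E_n)^\bullet) \le \sup_{n \in \mathbb{N}} \|\chi(F \cap E_n)\|_p = (\mu F)^{1/p},$$
so by B. Levi's theorem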
$$\int_F g = \int g \times \chi F = \lim_{n \to \infty} \int g \times \chi(F \cap E_n)$$
exists. Similarly, $\int_F g$ exists if $F \subseteq \{x : g(x) < 0\}$ has finite measure; while obviously $\int_F g$ exists if $F \subseteq \{x : g(x) = 0\}$. Accordingly $\int_F g$ exists for every set $F$ of finite measure. Moreover, by Lebesgue's Dominated Convergence Theorem,
$$\int_F g = \lim_{n \to \infty} \int_{F \cap E_n} g = \lim_{n \to \infty} h(\chi(F \cap E_n)^\bullet) = \sum_{n=0}^{\infty} h(\chi(F \cap H_n)^\bullet) = h(\chi F^\bullet)$$
for such $F$, by (c) above. It follows at once that
$$\int g \times f = h(f^\bullet)$$
for every simple function $f : X \to \mathbb{R}$.

(f) Now $g \in \mathcal{L}^q$. $\mathbf{P}$ (i) We already know that $|g|^q : X \to \mathbb{R}$ is measurable, because $g$ is measurable and $a \mapsto |a|^q$ is continuous.

(ii) Suppose that $f$ is a non-negative simple function and $f \le_{\text{a.e.}} |g|^q$. Then $f^{1/p}$ is a simple function, and $\operatorname{sgn} g$ is measurable and takes only the values $0$, $1$ and $-1$, so $f_1 = f^{1/p} \times \operatorname{sgn} g$ is simple. We see that $\int |f_1|^p = \int f$, so $\|f_1\|_p = (\int f)^{1/p}$. Accordingly
$$\Big(\int f\Big)^{1/p} \ge h(f_1^\bullet) = \int g \times f_1 = \int |g \times f^{1/p}| \ge \int f^{1/q} \times f^{1/p}$$
(because $0 \le f^{1/q} \le_{\text{a.e.}} |g|$)
$$= \int f,$$
and we must have $\int f \le 1$.

(iii) Thus
$$\sup\Big\{\int f : f \text{ is a non-negative simple function, } f \le_{\text{a.e.}} |g|^q\Big\} \le 1 < \infty.$$
But now observe that if $\epsilon > 0$ then
$$\{x : |g(x)|^q \ge \epsilon\} = \bigcup_{n \in \mathbb{N}} \{x : x \in E_n, |g(x)|^q \ge \epsilon\},$$
and for each $n \in \mathbb{N}$
$$\mu\{x : x \in E_n, |g(x)|^q \ge \epsilon\} \le \frac{1}{\epsilon},$$
because $f = \epsilon\chi\{x : x \in E_n, |g(x)|^q \ge \epsilon\}$ is a simple function less than or equal to $|g|^q$, so has integral at most $1$. Accordingly
$$\mu\{x : |g(x)|^q \ge \epsilon\} = \sup_{n \in \mathbb{N}} \mu\{x : x \in E_n, |g(x)|^q \ge \epsilon\} \le \frac{1}{\epsilon} < \infty.$$
Thus $|g|^q$ is integrable, by the criterion in 122Ja. $\mathbf{Q}$

(g) We may therefore speak of $h_1 = T(g^\bullet) \in (L^p)^*$, and we know that it agrees with $h$ on members of $L^p$ of the form $f^\bullet$ where $f$ is a simple function. But these form a dense subset of $L^p$, by 244Ha, and both $h$ and $h_1$ are continuous, so $h = h_1$ is a value of $T$, by 2A3Uc. The argument as written so far has assumed that $\|h\| = 1$. But of course any non-zero member of $(L^p)^*$ is a scalar multiple of an element of norm $1$, so is a value of $T$. So $T : L^q \to (L^p)^*$ is indeed surjective, and is therefore an isometric isomorphism, as claimed.

244L Continuing with the same topics as in §§242 and 243, I turn to the order-completeness of $L^p$.
Theorem Let $(X, \Sigma, \mu)$ be any measure space, and $p \in [1, \infty[$. Then $L^p = L^p(\mu)$ is Dedekind complete.

proof I use 242H. Let $A \subseteq L^p$ be any non-empty set which is bounded above in $L^p$. Fix $u_0 \in A$ and set
$$A' = \{u_0 \vee u : u \in A\},$$
so that $A'$ has the same upper bounds as $A$ and is bounded below by $u_0$. Fixing an upper bound $w_0$ of $A$ in $L^p$, then $u_0 \le u \le w_0$ for every $u \in A'$. Set
$$B = \{(u - u_0)^p : u \in A'\}.$$
Then
$$0 \le v \le (w_0 - u_0)^p \in L^1 = L^1(\mu)$$
for every $v \in B$, so $B$ is a non-empty subset of $L^1$ which is bounded above in $L^1$, and therefore has a least upper bound $v_1$ in $L^1$. Now $v_1^{1/p} \in L^p$; consider $w_1 = u_0 + v_1^{1/p}$. If $u \in A'$ then $(u - u_0)^p \le v_1$ so $u - u_0 \le v_1^{1/p}$ and $u \le w_1$; thus $w_1$ is an upper bound for $A'$. If $w \in L^p$ is an upper bound for $A'$, then $u - u_0 \le w - u_0$ and $(u - u_0)^p \le (w - u_0)^p$ for every $u \in A'$, so $(w - u_0)^p$ is an upper bound for $B$ and $v_1 \le (w - u_0)^p$, $v_1^{1/p} \le w - u_0$ and $w_1 \le w$. Thus $w_1 = \sup A' = \sup A$ in $L^p$. As $A$ is arbitrary, $L^p$ is Dedekind complete.

244M As in the last two sections, the theory of conditional expectations is worth revisiting.
Theorem Let $(X, \Sigma, \mu)$ be a probability space, and $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$. Take $p \in [1, \infty]$. Regard $L^0(\mu\restriction\mathrm{T})$ as a subspace of $L^0(\mu)$, so that $L^p(\mu\restriction\mathrm{T})$ becomes a subspace of $L^p(\mu)$ (cf. 242Jb). Let $P : L^1(\mu) \to L^1(\mu\restriction\mathrm{T})$ be the conditional expectation operator, as described in 242Jd. Then whenever $u \in L^p(\mu)$, $|Pu|^p \le P(|u|^p)$, so $Pu \in L^p(\mu\restriction\mathrm{T})$ and $\|Pu\|_p \le \|u\|_p$.

proof For $p = \infty$, this is 243Jb, so I assume henceforth that $p < \infty$. Set $\phi(t) = |t|^p$ for $t \in \mathbb{R}$; then $\phi$ is a convex function (because it is absolutely continuous on any bounded interval, and its derivative is non-decreasing), and $|u|^p = \bar\phi(u)$ for every $u \in L^0 = L^0(\mu)$, where $\bar\phi$ is defined as in 241I. Now if $u \in L^p = L^p(\mu)$, we surely have $u \in L^1$ (because $|u| \le |u|^p \vee (\chi X)^\bullet$, or otherwise); so 242K tells us that $|Pu|^p \le P|u|^p$. But this means that $Pu \in L^p \cap L^1(\mu\restriction\mathrm{T}) = L^p(\mu\restriction\mathrm{T})$, and
$$\|Pu\|_p = \Big(\int |Pu|^p\Big)^{1/p} \le \Big(\int P|u|^p\Big)^{1/p} = \Big(\int |u|^p\Big)^{1/p} = \|u\|_p,$$
as claimed.

244N The space $L^2$ (a) As I have already remarked, the really important function spaces are $L^0$, $L^1$, $L^2$ and $L^\infty$. $L^2$ has the special property of being an inner product space; if $(X, \Sigma, \mu)$ is any measure space and $u$, $v \in L^2(\mu)$ then $u \times v \in L^1$, by 244Eb, and we may write $(u|v) = \int u \times v$. This makes $L^2$ a real inner product space (because
$$(u_1 + u_2|v) = (u_1|v) + (u_2|v), \quad (cu|v) = c(u|v), \quad (u|v) = (v|u),$$
$$(u|u) \ge 0, \quad u = 0 \text{ whenever } (u|u) = 0$$
for all $u$, $u_1$, $u_2$, $v \in L^2$ and $c \in \mathbb{R}$) and its norm $\|\cdot\|_2$ is the associated norm (because $\|u\|_2 = \sqrt{(u|u)}$ whenever $u \in L^2$). Because $L^2$ is complete (244G), it is a real Hilbert space. The fact that it may be identified with its own dual (244K) can of course be deduced from this.

I will use the phrase 'square-integrable' to describe functions in $\mathcal{L}^2$.

(b) Conditional expectations take a special form in the case of $L^2$. Let $(X, \Sigma, \mu)$ be a probability space, $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$, and $P : L^1 = L^1(\mu) \to L^1(\mu\restriction\mathrm{T}) \subseteq L^1$ the corresponding conditional expectation operator. Then $P[L^2] \subseteq L^2$, where $L^2 = L^2(\mu)$ (244M), so we have an operator $P_2 = P\restriction L^2$ from $L^2$ to itself. Now $P_2$ is an orthogonal projection and its kernel is $\{u : u \in L^2, \int_F u = 0 \text{ for every } F \in \mathrm{T}\}$. $\mathbf{P}$ (i) If $u \in L^1$ then $Pu = 0$ iff $\int_F u = 0$ for every $F \in \mathrm{T}$ (cf. 242Je); so surely the kernel of $P_2$ is the set described. (ii) Since $P^2 = P$, $P_2$ is also a projection; because $P_2$ has norm at most $1$ (244M), and is therefore continuous,
$$U = P_2[L^2] = L^2(\mu\restriction\mathrm{T}) = \{u : u \in L^2, P_2 u = u\}, \qquad V = \{u : P_2 u = 0\}$$
are closed linear subspaces of $L^2$ such that $U \oplus V = L^2$. (iii) Now suppose that $u \in U$, $v \in V$. Then $Pv \in L^2$, so $u \times Pv \in L^1$ and $P(u \times v) = u \times Pv$, by 242L. Accordingly
$$(u|v) = \int u \times v = \int P(u \times v) = \int u \times Pv = 0.$$
Thus $U$ and $V$ are orthogonal subspaces of $L^2$, which is what we mean by saying that $P_2$ is an orthogonal projection. (Some readers will know that every projection of norm at most $1$ on an inner product space is orthogonal.) $\mathbf{Q}$
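The statements of 244M and 244N(b) can be illustrated on a finite probability space, where conditional expectation with respect to the σ-algebra generated by a partition is just cell-wise averaging. The following is a minimal sketch under that assumption; `cond_exp`, `inner`, `p_norm` and the sample data are illustrative names and values, not taken from the text, and every cell of the partition is assumed to have positive probability.

```python
def cond_exp(u, prob, cells):
    """Average u over each cell of the partition, weighted by prob."""
    out = [0.0] * len(u)
    for cell in cells:
        cell_mass = sum(prob[i] for i in cell)   # assumed > 0
        avg = sum(prob[i] * u[i] for i in cell) / cell_mass
        for i in cell:
            out[i] = avg
    return out

def inner(u, v, prob):
    """(u|v) = int u*v with respect to prob."""
    return sum(pr * a * b for pr, a, b in zip(prob, u, v))

def p_norm(u, prob, p):
    """(int |u|^p)^(1/p) with respect to prob."""
    return sum(pr * abs(a) ** p for pr, a in zip(prob, u)) ** (1.0 / p)

# Hypothetical five-point probability space and partition.
prob = [0.1, 0.2, 0.3, 0.15, 0.25]
cells = [[0, 1], [2], [3, 4]]
u = [2.0, -1.0, 0.5, 4.0, -3.0]

Pu = cond_exp(u, prob, cells)
PPu = cond_exp(Pu, prob, cells)
residual = [a - b for a, b in zip(u, Pu)]        # lies in the kernel V

# P is a projection: P(Pu) = Pu.
idempotent = all(abs(a - b) < 1e-12 for a, b in zip(Pu, PPu))
# U and V are orthogonal: (Pu | u - Pu) = 0.
orthogonal = abs(inner(Pu, residual, prob)) < 1e-12
# 244M: ||Pu||_p <= ||u||_p for several exponents p.
contractive = all(
    p_norm(Pu, prob, p) <= p_norm(u, prob, p) + 1e-12
    for p in (1.0, 1.5, 2.0, 3.0)
)
```

In this finite setting the decomposition `u = Pu + residual` is the splitting of $u$ into its components in $U$ and $V$.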
244O Complex $L^p$ Let $(X, \Sigma, \mu)$ be any measure space.

(a) For any $p \in \,]1, \infty[$, set
$$\mathcal{L}^p_{\mathbb{C}} = \mathcal{L}^p_{\mathbb{C}}(\mu) = \{f : f \in \mathcal{L}^0_{\mathbb{C}}(\mu), |f|^p \text{ is integrable}\},$$
$$L^p_{\mathbb{C}}(\mu) = \{f^\bullet : f \in \mathcal{L}^p_{\mathbb{C}}\} = \{u : u \in L^0_{\mathbb{C}}(\mu), \operatorname{Re}(u) \in L^p(\mu) \text{ and } \operatorname{Im}(u) \in L^p(\mu)\} = \{u : u \in L^0_{\mathbb{C}}, |u| \in L^p\}.$$
Then $L^p_{\mathbb{C}}$ is a linear subspace of $L^0_{\mathbb{C}}$. Set $\|u\|_p = (\int |u|^p)^{1/p}$ for $u \in L^p_{\mathbb{C}}$.

(b) The proof of 244E(bi) applies unchanged to complex-valued functions, so taking $q = p/(p-1)$ we get
$$\|u \times v\|_1 \le \|u\|_p \|v\|_q$$
for all $u \in L^p_{\mathbb{C}}$, $v \in L^q_{\mathbb{C}}$. 244Fa becomes

for every $u \in L^p_{\mathbb{C}}$ there is a $v \in L^q_{\mathbb{C}}$ such that $\|v\|_q \le 1$ and
$$\int u \times v = \Big|\int u \times v\Big| = \|u\|_p;$$

the same proof works, if you allow me to write $\operatorname{sgn} a = |a|/a$ for all non-zero complex numbers; it would perhaps be more natural to write $\overline{\operatorname{sgn}}(a)$ in place of $\operatorname{sgn} a$. So, just as before, we find that $\|\cdot\|_p$ is a norm. We can use the argument of 244G to show that $L^p_{\mathbb{C}}$ is complete. The space $S_{\mathbb{C}}$ of equivalence classes of 'complex-valued simple functions' is dense in $L^p_{\mathbb{C}}$. If $X$ is a subset of $\mathbb{R}^r$ and $\mu$ is Lebesgue measure on $X$, then the space of equivalence classes of continuous complex-valued functions on $X$ with bounded support is dense in $L^p_{\mathbb{C}}$.

(c) The canonical map $T : L^q_{\mathbb{C}} \to (L^p_{\mathbb{C}})^*$, defined by writing $(Tv)(u) = \int u \times v$, is surjective because $T\restriction L^q : L^q \to (L^p)^*$ is surjective; and it is an isometry by the remarks in (b) just above. Thus we can still identify $L^q_{\mathbb{C}}$ with $(L^p_{\mathbb{C}})^*$.

(d) When we come to the complex form of Jensen's inequality, it seems that a new idea is needed. I have relegated this to 242Yj-242Yk. But for the complex form of 244M a simpler argument will suffice. If we have a probability space $(X, \Sigma, \mu)$, a $\sigma$-subalgebra $\mathrm{T}$ of $\Sigma$, and the corresponding conditional expectation operator $P : L^1_{\mathbb{C}}(\mu) \to L^1_{\mathbb{C}}(\mu\restriction\mathrm{T})$, then for any $u \in L^p_{\mathbb{C}}(\mu)$ we shall have
$$|Pu|^p \le (P|u|)^p \le P(|u|^p),$$
applying 242Pc and 244M. So $\|Pu\|_p \le \|u\|_p$, as before.

(e) There is a special point arising with $L^2_{\mathbb{C}}$. We now have to define
$$(u|v) = \int u \times \bar v$$
for $u$, $v \in L^2_{\mathbb{C}}$, so that $(u|u) = \int |u|^2 = \|u\|_2^2$ for every $u$; this means that $(v|u)$ is the complex conjugate of $(u|v)$.

244X Basic exercises

> (a) Let $(X, \Sigma, \mu)$ be a measure space, and $(X, \hat\Sigma, \hat\mu)$ its completion. Show that $\mathcal{L}^p(\hat\mu) = \mathcal{L}^p(\mu)$ and $L^p(\hat\mu) = L^p(\mu)$ for every $p \in [1, \infty]$.
(b) Let $(X, \Sigma, \mu)$ be a measure space, and $1 \le p \le r \le \infty$. Set $e = \mathbf{1}^\bullet$ in $L^0(\mu)$. (i) Show that if $u \in L^p(\mu)$ and $|u| \le e$ then $u \in L^r(\mu)$ and $\|u\|_r \le \|u\|_p$. (Hint: look first at the case $\|u\|_p = 1$.) (ii) Show that if $v \in L^r(\mu)$ then $(|v| - e)^+ \in L^p(\mu)$ and $\|(|v| - e)^+\|_p \le \|v\|_r$.

(c) Let $(X, \Sigma, \mu)$ be a measure space, and $1 \le p \le q \le r \le \infty$. Show that $L^p(\mu) \cap L^r(\mu) \subseteq L^q(\mu) \subseteq L^p(\mu) + L^r(\mu) \subseteq L^0(\mu)$. (See also 244Yg.)

(d) Let $(X, \Sigma, \mu)$ be a measure space. Suppose that $p$, $q$, $r \in [1, \infty]$ and that $\frac1p + \frac1q = \frac1r$, setting $\frac{1}{\infty} = 0$ as usual. Show that $u \times v \in L^r(\mu)$ and $\|u \times v\|_r \le \|u\|_p \|v\|_q$ for every $u \in L^p(\mu)$, $v \in L^q(\mu)$. (Hint: if $r < \infty$ apply Hölder's inequality to $|u|^r \in L^{p/r}$, $|v|^r \in L^{q/r}$.)
(e) (i) Let $(X, \Sigma, \mu)$ be a probability space. Show that if $1 \le p \le r \le \infty$ then $\|f\|_p \le \|f\|_r$ for every $f \in \mathcal{L}^r(\mu)$. (Hint: use Hölder's inequality to show that $\int |f|^p \le \||f|^p\|_{r/p}$.) In particular, $\mathcal{L}^p(\mu) \supseteq \mathcal{L}^r(\mu)$. (ii) Let $(X, \Sigma, \mu)$ be a measure space such that $\mu E \ge 1$ whenever $E \in \Sigma$ and $\mu E > 0$. (This happens, for instance, when $\mu$ is 'counting measure' on $X$.) Show that if $1 \le p \le r \le \infty$ then $L^p(\mu) \subseteq L^r(\mu)$ and $\|u\|_p \ge \|u\|_r$ for every $u \in L^p(\mu)$. (Hint: 244Xb.)

(f) Let $(X, \Sigma, \mu)$ be a semi-finite measure space, and $p$, $q \in [1, \infty]$ such that $\frac1p + \frac1q = 1$. Show that if $u \in L^0(\mu) \setminus L^p(\mu)$ then there is a $v \in L^q(\mu)$ such that $u \times v \notin L^1(\mu)$. (Hint: reduce to the case $u \ge 0$. Show that in this case there is for each $n \in \mathbb{N}$ a $u_n \le u$ such that $4^n \le \|u_n\|_p < \infty$; take $v_n \in L^q$ such that $\|v_n\|_q \le 2^{-n}$, $\int u_n \times v_n \ge 2^n$, and set $v = \sum_{n=0}^{\infty} v_n$.)

(g) Let $\langle (X_i, \Sigma_i, \mu_i) \rangle_{i \in I}$ be a family of measure spaces, and $(X, \Sigma, \mu)$ their direct sum (214K). Take any $p \in [1, \infty[$. Show that the canonical isomorphism between $L^0(\mu)$ and $\prod_{i \in I} L^0(\mu_i)$ (241Xg) induces an isomorphism between $L^p(\mu)$ and the subspace
$$\Big\{u : u \in \prod_{i \in I} L^p(\mu_i),\ \|u\| = \Big(\sum_{i \in I} \|u(i)\|_p^p\Big)^{1/p} < \infty\Big\}$$
of $\prod_{i \in I} L^p(\mu_i)$.

(h) Let $(X, \Sigma, \mu)$ be a measure space. Set $M^{\infty,1} = L^1(\mu) \cap L^\infty(\mu)$. Show that for $u \in M^{\infty,1}$ the function $p \mapsto \|u\|_p : [1, \infty[ \to [0, \infty[$ is continuous, and that $\|u\|_\infty = \lim_{p \to \infty} \|u\|_p$. (Hint: consider first the case in which $u$ is the equivalence class of a simple function.)

(i) Let $\mu$ be counting measure on $X = \{1, 2\}$, so that $\mathcal{L}^0(\mu) = \mathbb{R}^2$ and $L^p(\mu) = L^0(\mu)$ can be identified with $\mathbb{R}^2$ for every $p \in [1, \infty]$. Sketch the unit balls $\{u : \|u\|_p \le 1\}$ in $\mathbb{R}^2$ for $p = 1$, $\frac32$, $2$, $3$ and $\infty$.

(j) Let $\mu$ be counting measure on $X = \{1, 2, 3\}$, so that $\mathcal{L}^0(\mu) = \mathbb{R}^3$ and $L^p(\mu) = L^0(\mu)$ can be identified with $\mathbb{R}^3$ for every $p \in [1, \infty]$. Describe the unit balls $\{u : \|u\|_p \le 1\}$ in $\mathbb{R}^3$ for $p = 1$, $2$ and $\infty$.

(k) At which point does the argument of 244Hb break down if we try to apply it to $L^\infty$ with $\|\cdot\|_\infty$?

(l) For any measure space $(X, \Sigma, \mu)$ write $M^{1,\infty} = M^{1,\infty}(\mu)$ for $\{v + w : v \in L^1(\mu), w \in L^\infty(\mu)\} \subseteq L^0(\mu)$. Show that $M^{1,\infty}$ is a linear subspace of $L^0$ including $L^p$ for every $p \in [1, \infty]$, and that if $u \in L^0$, $v \in M^{1,\infty}$ and $|u| \le |v|$ then $u \in M^{1,\infty}$. (Hint: $u = v \times w$ where $|w| \le \mathbf{1}^\bullet$.)

(m) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be two measure spaces, and let $\mathcal{T}^+$ be the set of linear operators $T : M^{1,\infty}(\mu) \to M^{1,\infty}(\nu)$ such that (α) $Tu \ge 0$ whenever $u \ge 0$ in $M^{1,\infty}(\mu)$ (β) $Tu \in L^1(\nu)$ and $\|Tu\|_1 \le \|u\|_1$ whenever $u \in L^1(\mu)$ (γ) $Tu \in L^\infty(\nu)$ and $\|Tu\|_\infty \le \|u\|_\infty$ whenever $u \in L^\infty(\mu)$. (i) Show that if $\phi : \mathbb{R} \to \mathbb{R}$ is a convex function such that $\phi(0) = 0$, and $u \in M^{1,\infty}(\mu)$ is such that $\bar\phi(u) \in M^{1,\infty}(\mu)$ (interpreting $\bar\phi : L^0(\mu) \to L^0(\mu)$ as in 241I), then $\bar\phi(Tu) \in M^{1,\infty}(\nu)$ and $\bar\phi(Tu) \le T(\bar\phi(u))$ for every $T \in \mathcal{T}^+$. (ii) Hence show that if $p \in [1, \infty]$ and $u \in L^p(\mu)$, $Tu \in L^p(\nu)$ and $\|Tu\|_p \le \|u\|_p$ for every $T \in \mathcal{T}^+$.

> (n) Let $X$ be any set, and let $\mu$ be counting measure on $X$. In this case it is customary (at least for $p \in [1, \infty]$) to write $\ell^p(X)$ for $\mathcal{L}^p(\mu)$, and to identify it with $L^p(\mu)$. In particular, $L^2(\mu)$ becomes identified with $\ell^2(X)$, the space of square-summable functions on $X$. Write out statements and proofs of the results of this chapter adapted to this special case.

(o) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces and $\phi : X \to Y$ an inverse-measure-preserving function. Show that the map $g \mapsto g\phi : \mathcal{L}^0(\nu) \to \mathcal{L}^0(\mu)$ (241Xh) induces a norm-preserving map from $L^p(\nu)$ to $L^p(\mu)$ for every $p \in [1, \infty]$, and also a map from $M^{1,\infty}(\nu)$ to $M^{1,\infty}(\mu)$ which belongs to the class $\mathcal{T}^+$ of 244Xm.

244Y Further exercises

(a) Let $(X, \Sigma, \mu)$ be a measure space, and $(X, \tilde\Sigma, \tilde\mu)$ its c.l.d. version. Show that $\mathcal{L}^p(\mu) \subseteq \mathcal{L}^p(\tilde\mu)$ and that this embedding induces a Banach lattice isomorphism between $L^p(\mu)$ and $L^p(\tilde\mu)$, for every $p \in [1, \infty[$.

(b) Let $(X, \Sigma, \mu)$ be any measure space, and $p \in [1, \infty[$. Show that $L^p(\mu)$ has the countable sup property in the sense of 241Yd. (Hint: 242Yd.)
(c) Let $(X, \Sigma, \mu)$ be a measure space, and $Y$ a subset of $X$; write $\mu_Y$ for the subspace measure on $Y$. Show that the canonical map $T$ from $L^0(\mu)$ onto $L^0(\mu_Y)$ (241Ye) includes a surjection from $L^p(\mu)$ onto $L^p(\mu_Y)$ for every $p \in [1, \infty]$, and also a map from $M^{1,\infty}(\mu)$ to $M^{1,\infty}(\mu_Y)$ which belongs to the class $\mathcal{T}^+$ of 244Xm. Show that the following are equiveridical: (i) there is some $p \in [1, \infty[$ such that $T\restriction L^p(\mu)$ is injective; (ii) $T : L^p(\mu) \to L^p(\mu_Y)$ is norm-preserving for every $p \in [1, \infty[$; (iii) $F \cap Y \ne \emptyset$ whenever $F \in \Sigma$ and $0 < \mu F < \infty$.

(d) Let $(X, \Sigma, \mu)$ be any measure space, and $p \in [1, \infty[$. Show that the norm $\|\cdot\|_p$ on $L^p(\mu)$ is order-continuous in the sense of 242Yc.

(e) Let $(X, \Sigma, \mu)$ be any measure space, and $p \in [1, \infty]$. Show that if $A \subseteq L^p(\mu)$ is upwards-directed and norm-bounded, then it is bounded above. (Hint: 242Yb.)

(f) Let $(X, \Sigma, \mu)$ be any measure space, and $p \in [1, \infty]$. Show that if a non-empty set $A \subseteq L^p(\mu)$ is upwards-directed and has a supremum in $L^p(\mu)$, then $\|\sup A\|_p \le \sup_{u \in A} \|u\|_p$. (Hint: consider first the case $0 \in A$.)

(g) Let $(X, \Sigma, \mu)$ be a measure space and $u \in L^0(\mu)$. Show that $I = \{p : p \in [0, \infty[,\ u \in L^p(\mu)\}$ is an interval. Give examples to show that it may be open, closed or half-open. Show that $p \mapsto p \ln \|u\|_p : I \to \mathbb{R}$ is convex. Hence show that if $p \le q \le r \in I$, $\|u\|_q \le \max(\|u\|_p, \|u\|_r)$.

(h) Let $[a, b]$ be a non-trivial closed interval in $\mathbb{R}$ and $F : [a, b] \to \mathbb{R}$ a function; take $p \in \,]1, \infty[$. Show that the following are equiveridical: (i) $F$ is absolutely continuous and its derivative $F'$ belongs to $\mathcal{L}^p(\mu)$, where $\mu$ is Lebesgue measure on $[a, b]$ (ii)
$$c = \sup\Big\{\sum_{i=1}^{n} \frac{|F(a_i) - F(a_{i-1})|^p}{(a_i - a_{i-1})^{p-1}} : a \le a_0 < a_1 < \ldots < a_n \le b\Big\}$$
is finite, and that in this case $c = \|F'\|_p$. (Hint: (i) if $F$ is absolutely continuous and $F' \in \mathcal{L}^p$, use Hölder's inequality to show that $|F(b') - F(a')|^p \le (b' - a')^{p-1} \int_{a'}^{b'} |F'|^p$ whenever $a \le a' \le b' \le b$. (ii) If $F$ satisfies the conditions, show that $(\sum_{i=0}^{n} |F(b_i) - F(a_i)|)^p \le c(\sum_{i=0}^{n} (b_i - a_i))^{p-1}$ whenever $a \le a_0 \le b_0 \le a_1 \le \ldots \le b_n \le b$, so that $F$ is absolutely continuous. Take a sequence $\langle F_n \rangle_{n \in \mathbb{N}}$ of polygonal functions approximating $F$; use 223Xh to show that $F_n' \to F'$ a.e., so that $\int |F'|^p \le \sup_{n \in \mathbb{N}} \int |F_n'|^p \le c^p$.)

(i) Let $G$ be an open set in $\mathbb{R}^r$ and write $\mu$ for Lebesgue measure on $G$. Let $C_k(G)$ be the set of continuous functions $f : G \to \mathbb{R}$ such that $\inf\{\|x - y\| : x \in G, f(x) \ne 0, y \in \mathbb{R}^r \setminus G\} > 0$ (counting $\inf \emptyset$ as $\infty$). Show that for any $p \in [1, \infty[$ the set $\{f^\bullet : f \in C_k(G)\}$ is a dense linear subspace of $L^p(\mu)$.

(j) Let $U$ be any Hilbert space. (i) Show that if $C \subseteq U$ is convex (that is, $tu + (1-t)v \in C$ whenever $u$, $v \in C$ and $t \in [0, 1]$; see 233Xd) and closed, and $u \in U$, then there is a unique $v \in C$ such that $\|u - v\| = \inf_{w \in C} \|u - w\|$, and that $(u - v|v - w) \ge 0$ for every $w \in C$. (ii) Show that if $h \in U^*$ there is a unique $v \in U$ such that $h(w) = (w|v)$ for every $w \in U$. (Hint: apply (i) with $C = \{w : h(w) = 1\}$, $u = 0$.) (iii) Show that if $V \subseteq U$ is a closed linear subspace then there is a unique linear projection $P$ on $U$ such that $P[U] = V$ and $(u - Pu|v) = 0$ for all $u \in U$, $v \in V$ ($P$ is 'orthogonal'). (Hint: take $Pu$ to be the point of $V$ nearest to $u$.)

(k) Let $(X, \Sigma, \mu)$ be a probability space, and $\mathrm{T}$ a $\sigma$-subalgebra of $\Sigma$. Use part (iii) of 244Yj to show that there is an orthogonal projection $P : L^2(\mu) \to L^2(\mu\restriction\mathrm{T})$ such that $\int_F Pu = \int_F u$ for every $u \in L^2(\mu)$, $F \in \mathrm{T}$. Show that $Pu \ge 0$ whenever $u \ge 0$ and that $\int Pu = \int u$ for every $u$, so that $P$ has a unique extension to a continuous operator from $L^1(\mu)$ onto $L^1(\mu\restriction\mathrm{T})$. Use this to develop the theory of conditional expectations without using the Radon-Nikodým theorem.
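For readers who like to experiment before attempting 244Xe(ii) and 244Xh, the behaviour of $p \mapsto \|u\|_p$ under counting measure can be observed numerically. This is a small sketch, assuming counting measure on a four-point set; the vector `u`, the list of exponents and the helper `p_norm_counting` are arbitrary illustrative choices and prove nothing.

```python
def p_norm_counting(u, p):
    """(sum_i |u_i|^p)^(1/p): the p-norm for counting measure on a finite set."""
    return sum(abs(x) ** p for x in u) ** (1.0 / p)

u = [3.0, -4.0, 1.0, 0.5]
exponents = [1.0, 1.5, 2.0, 4.0, 8.0, 16.0, 64.0]
norms = [p_norm_counting(u, p) for p in exponents]

# The essential supremum norm is just the largest absolute value here.
sup_norm = max(abs(x) for x in u)

# For counting measure the p-norms decrease as p increases (244Xe(ii)) ...
nonincreasing = all(a >= b - 1e-12 for a, b in zip(norms, norms[1:]))
# ... and approach the sup norm from above as p grows (244Xh).
gap_to_sup = norms[-1] - sup_norm
```

Plotting the level sets $\{u : \|u\|_p \le 1\}$ in $\mathbb{R}^2$ for the same exponents gives the pictures asked for in 244Xi.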
244 Notes and comments At this point I feel we must leave the investigation of further function spaces. The next stage would have to be a systematic abstract analysis of general Banach lattices. The $L^p$ spaces give a solid foundation for such an analysis, since they introduce the basic themes of norm-completeness, order-completeness and identification of dual spaces. I have tried in the exercises to suggest the importance of the next layer of concepts: order-continuity of norms and the relationship between norm-boundedness and order-boundedness. What I have not had space to discuss is the subject of order-preserving linear operators between Riesz spaces, which is the key to understanding the order structure of the dual spaces here. (But you can make a start by re-reading the theory of conditional expectation operators in 242J-242L, 243J and 244M.) All these topics are treated in Fremlin 74 and in Chapters 35 and 36 of the next volume.

I remember that one of my early teachers of analysis said that the $L^p$ spaces (for $p \ne 1, 2, \infty$) had somehow got into the syllabus and had never been got out again. I would myself call them classics, in the sense that they have been part of the common experience of all functional analysts since functional analysis began; and while you are at liberty to dislike them, you can no more ignore them than you can ignore Milton if you are studying English poetry. Hölder's inequality, in particular, has a wealth of applications; not only 244F and 244K, but also 244Xd, 244Xe and 244Yh, for instance.

The $L^p$ spaces, for $1 \le p \le \infty$, form a kind of continuum. In terms of the concepts dealt with here, there is no distinction to be drawn between different $L^p$ spaces for $1 < p < \infty$ except the observation that the norm of $L^2$ is an inner product norm, corresponding to a Euclidean geometry on its finite-dimensional subspaces. To discriminate between the other $L^p$ spaces we need much more refined concepts in the geometry of normed spaces.

In terms of the theorems given here, $L^1$ seems closer to the middle range of $L^p$ for $1 < p < \infty$ than $L^\infty$ does; thus, for all $1 \le p < \infty$, we have $L^p$ Dedekind complete (independent of the measure space involved), the space $S$ of equivalence classes of simple functions is dense in $L^p$ (again, for every measure space), and the dual $(L^p)^*$ is (almost) identifiable as another function space.
All of these should be regarded as consequences in one way or another of the order-continuity of the norm of $L^p$ for $p < \infty$. The chief obstacle to the universal identification of $(L^1)^*$ with $L^\infty$ is that for non-$\sigma$-finite measure spaces the space $L^\infty$ can be inadequate, rather than any pathology in the $L^1$ space itself. (This point, at least, I mean to return to in Volume 3.) There is also the point that for a non-semi-finite measure space the purely infinite sets can contribute to $L^\infty$ without any corresponding contribution to $L^1$. For $1 < p < \infty$, neither of these problems can arise. Any member of any such $L^p$ is supported entirely by a $\sigma$-finite part of the measure space, and the same applies to the dual; see part (c) of the proof of 244K.

Of course $L^1$ does have a markedly different geometry from the other $L^p$ spaces. The first sign of this is that it is not reflexive as a Banach space (except when it is finite-dimensional), whereas for $1 < p < \infty$ the identifications of $(L^p)^*$ with $L^q$ and of $(L^q)^*$ with $L^p$, where $q = p/(p-1)$, show that the canonical embedding of $L^p$ in $(L^p)^{**}$ is surjective, that is, that $L^p$ is reflexive. But even when $L^1$ is finite-dimensional the unit balls of $L^1$ and $L^\infty$ are clearly different in kind from the unit balls of $L^p$ for $1 < p < \infty$; they have corners instead of being smoothly rounded (244Xi-244Xj).

The proof of 244K, identifying $(L^p)^*$, is a fairly long haul, and it is natural to ask whether we really have to work so hard, especially since in the case of $L^2$ we have a much easier argument (244Yj). Of course we can go faster if we know a bit more about Banach lattices (§369 in Volume 3 has the relevant facts), though this route uses some theorems quite as hard as 244K as given. There are alternative routes using the geometry of the $L^p$ spaces, following the ideas of 244Yj, but I do not think they are any easier, and the argument I have presented here at least has the virtue of using some of the same ideas as the identification of $(L^1)^*$ in 243G.
The difference is that whereas in 243G we may have to piece together a large family of functions $g_F$ (part (b-v) of the proof), in 244K there are only countably many $g_n$; consequently we can make the argument work for arbitrary measure spaces, not just localizable ones.

The geometry of Hilbert space gives us an approach to conditional expectations which does not depend on the Radon-Nikodým theorem (244Yk). To turn these ideas into a proof of the Radon-Nikodým theorem itself, however, requires qualities of determination and ingenuity which can be better employed elsewhere.

The convexity arguments of 233J/242K can be used on many operators besides conditional expectations (see 244Xm). The class ‘$\mathcal{T}^+$’ described there is not in fact the largest for which these arguments work; I take the ideas farther in Chapter 37. There is also a great deal more to be said if you put an arbitrary pair of $L^p$ spaces in place of $L^1$ and $L^\infty$ in 244Xl. 244Yg is a start, but for the real thing (the ‘Riesz convexity theorem’) I refer you to Zygmund 59, XII.1.11 or Dunford & Schwartz 57, VI.10.11.
245 Convergence in measure

I come now to an important and interesting topology on the spaces $\mathcal{L}^0$ and $L^0$. I start with the definition (245A) and with properties which echo those of the $L^p$ spaces for $p \ge 1$ (245D-245E). In 245G-245J I describe the most useful relationships between this topology and the norm topologies of the $L^p$ spaces. For $\sigma$-finite spaces, it is metrizable (245Eb) and sequential convergence can be described in terms of pointwise convergence of sequences of functions (245K-245L).

245A Definitions Let $(X,\Sigma,\mu)$ be a measure space.

(a) For any measurable set $F \subseteq X$ of finite measure, we have a functional $\tau_F$ on $\mathcal{L}^0 = \mathcal{L}^0(\mu)$ defined by setting
$$\tau_F(f) = \int |f| \wedge \chi F$$
for every $f \in \mathcal{L}^0$. (The integral exists in $\mathbb{R}$ because $|f| \wedge \chi F$ belongs to $\mathcal{L}^0$ and is dominated by the integrable function $\chi F$.) Now $\tau_F(f+g) \le \tau_F(f) + \tau_F(g)$ whenever $f$, $g \in \mathcal{L}^0$. $\mathbf{P}$ We need only observe that
$$\min(|(f+g)(x)|, (\chi F)(x)) \le \min(|f(x)|, (\chi F)(x)) + \min(|g(x)|, (\chi F)(x))$$
for every $x \in \operatorname{dom} f \cap \operatorname{dom} g$, which is almost every $x \in X$. $\mathbf{Q}$ Consequently, setting $\rho_F(f,g) = \tau_F(f-g)$, we have
$$\rho_F(f,h) = \tau_F((f-g)+(g-h)) \le \tau_F(f-g) + \tau_F(g-h) = \rho_F(f,g) + \rho_F(g,h),$$
$$\rho_F(f,g) = \tau_F(f-g) \ge 0,$$
$$\rho_F(f,g) = \tau_F(f-g) = \tau_F(g-f) = \rho_F(g,f)$$
for all $f$, $g$, $h \in \mathcal{L}^0$; that is, $\rho_F$ is a pseudometric.

(b) The family
$$\{\rho_F : F \in \Sigma,\ \mu F < \infty\}$$
now defines a topology on $\mathcal{L}^0$ (2A3F); I will call it the topology of convergence in measure on $\mathcal{L}^0$.

(c) If $f$, $g \in \mathcal{L}^0$ and $f =_{\text{a.e.}} g$, then $|f| \wedge \chi F =_{\text{a.e.}} |g| \wedge \chi F$ and $\tau_F(f) = \tau_F(g)$, for every set $F$ of finite measure. Consequently we have functionals $\bar\tau_F$ on $L^0 = L^0(\mu)$ defined by writing
$$\bar\tau_F(f^\bullet) = \tau_F(f)$$
whenever $f \in \mathcal{L}^0$, $F \in \Sigma$ and $\mu F < \infty$. Corresponding to these we have pseudometrics $\bar\rho_F$ defined by either of the formulae
$$\bar\rho_F(u,v) = \bar\tau_F(u-v), \qquad \bar\rho_F(f^\bullet, g^\bullet) = \rho_F(f,g)$$
for $u$, $v \in L^0$, $f$, $g \in \mathcal{L}^0$ and $F$ of finite measure. The family of these pseudometrics defines the topology of convergence in measure on $L^0$.

(d) I shall allow myself to say that a sequence (in $\mathcal{L}^0$ or $L^0$) converges in measure if it converges for the topology of convergence in measure (in the sense of 2A3M).

245B Remarks (a) Of course the topologies of $\mathcal{L}^0$, $L^0$ are about as closely related as it is possible for them to be. Not only is the topology of $L^0$ the quotient of the topology on $\mathcal{L}^0$ (that is, a set $G \subseteq L^0$ is open iff $\{f : f^\bullet \in G\}$ is open in $\mathcal{L}^0$), but every open set in $\mathcal{L}^0$ is the inverse image under the quotient map of an open set in $L^0$.

(b) It is convenient to note that if $F_0, \ldots, F_n$ are measurable sets of finite measure with union $F$, then, in the notation of 245A, $\tau_{F_i} \le \tau_F$ for every $i$; this means that a set $G \subseteq \mathcal{L}^0$ is open for the topology of convergence in measure iff for every $f \in G$ we can find a single set $F$ of finite measure and a $\delta > 0$ such that
$$\rho_F(g,f) \le \delta \implies g \in G.$$
Similarly, a set $G \subseteq L^0$ is open for the topology of convergence in measure iff for every $u \in G$ we can find a set $F$ of finite measure and a $\delta > 0$ such that
$$\bar\rho_F(v,u) \le \delta \implies v \in G.$$

(c) The phrase ‘topology of convergence in measure’ agrees well enough with standard usage when $(X,\Sigma,\mu)$ is totally finite. But a warning! the phrase ‘topology of convergence in measure’ is also used for the topology defined by the metric of 245Ye below, even when $\mu X = \infty$. I have seen the phrase local convergence in measure used for the topology of 245A. Most authors ignore non-$\sigma$-finite spaces in this context. However I hold that 245D-245E below are of sufficient interest to make the extension worth while.

245C Pointwise convergence The topology of convergence in measure is almost definable in terms of ‘pointwise convergence’, which is one of the roots of measure theory. The correspondence is closest in $\sigma$-finite measure spaces (see 245K), but there is still a very important relationship in the general case, as follows. Let $(X,\Sigma,\mu)$ be a measure space, and write $\mathcal{L}^0 = \mathcal{L}^0(\mu)$, $L^0 = L^0(\mu)$.

(a) If $\langle f_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\mathcal{L}^0$ converging almost everywhere to $f \in \mathcal{L}^0$, then $\langle f_n\rangle_{n\in\mathbb{N}} \to f$ in measure. $\mathbf{P}$ By 2A3Mc, I have only to show that $\lim_{n\to\infty} \rho_F(f_n,f) = 0$ whenever $\mu F < \infty$. But $\langle |f_n - f| \wedge \chi F\rangle_{n\in\mathbb{N}}$ converges to $0$ a.e. and is dominated by the integrable function $\chi F$, so by Lebesgue's Dominated Convergence Theorem
$$\lim_{n\to\infty} \rho_F(f_n,f) = \lim_{n\to\infty} \int |f_n - f| \wedge \chi F = 0. \ \mathbf{Q}$$

(b) To formulate a corresponding result applicable to $L^0$, we need the following concept. If $\langle f_n\rangle_{n\in\mathbb{N}}$, $\langle g_n\rangle_{n\in\mathbb{N}}$ are sequences in $\mathcal{L}^0$ such that $f_n^\bullet = g_n^\bullet$ for every $n$, and $f$, $g \in \mathcal{L}^0$ are such that $f^\bullet = g^\bullet$, and $\langle f_n\rangle_{n\in\mathbb{N}} \to f$ a.e., then $\langle g_n\rangle_{n\in\mathbb{N}} \to g$ a.e., because
$$\Bigl\{x : x \in \operatorname{dom} f \cap \operatorname{dom} g \cap \bigcap_{n\in\mathbb{N}} \operatorname{dom} f_n \cap \bigcap_{n\in\mathbb{N}} \operatorname{dom} g_n,\ g(x) = f(x) = \lim_{n\to\infty} f_n(x),\ f_n(x) = g_n(x)\ \forall\, n \in \mathbb{N}\Bigr\}$$
is conegligible. Consequently we have a definition applicable to sequences in $L^0$; we can say that, for $f$, $f_n \in \mathcal{L}^0$, $\langle f_n^\bullet\rangle_{n\in\mathbb{N}}$ is order*-convergent, or order*-converges, to $f^\bullet$ iff $f =_{\text{a.e.}} \lim_{n\to\infty} f_n$. In this case, of course, $\langle f_n\rangle_{n\in\mathbb{N}} \to f$ in measure. Thus, in $L^0$, a sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ which order*-converges to $u \in L^0$ also converges to $u$ in measure.

Remark I suggest alternative descriptions of order-convergence in 245Xc; the conditions (iii)-(vi) there are in forms adapted to more general structures.

(c) For a typical example of a sequence which is convergent in measure without being order-convergent, consider the following. Take $\mu$ to be Lebesgue measure on $[0,1]$, and set $f_n(x) = 2^m$ if $x \in [2^{-m}k, 2^{-m}(k+1)]$, $0$ otherwise, where $k = k(n) \in \mathbb{N}$, $m = m(n) \in \mathbb{N}$ are defined by saying that $n + 1 = 2^m + k$ and $0 \le k < 2^m$. Then $\langle f_n\rangle_{n\in\mathbb{N}} \to 0$ for the topology of convergence in measure (since $\rho_F(f_n,0) \le 2^{-m}$ if $F \subseteq [0,1]$ is measurable and $2^m - 1 \le n$), though $\langle f_n\rangle_{n\in\mathbb{N}}$ is not convergent to $0$ almost everywhere (indeed, $\limsup_{n\to\infty} f_n = \infty$ everywhere).

245D Proposition Let $(X,\Sigma,\mu)$ be any measure space.

(a) The topology of convergence in measure is a linear space topology on $L^0 = L^0(\mu)$.

(b) The maps $\vee$, $\wedge : L^0 \times L^0 \to L^0$, and $u \mapsto |u|$, $u \mapsto u^+$, $u \mapsto u^- : L^0 \to L^0$ are all continuous.

(c) The map $\times : L^0 \times L^0 \to L^0$ is continuous.

(d) For any continuous function $h : \mathbb{R} \to \mathbb{R}$, the corresponding function $\bar h : L^0 \to L^0$ (241I) is continuous.

proof (a) The point is that the functionals $\bar\tau_F$, as defined in 245Ac, satisfy the conditions of 2A5B below. $\mathbf{P}$ Fix a set $F \in \Sigma$ of finite measure. We have already seen that
$$\bar\tau_F(u+v) \le \bar\tau_F(u) + \bar\tau_F(v) \text{ for all } u,\ v \in L^0.$$
Next,
$$\bar\tau_F(cu) \le \bar\tau_F(u) \text{ whenever } u \in L^0,\ |c| \le 1 \qquad (*)$$
because $|cf| \wedge \chi F \le_{\text{a.e.}} |f| \wedge \chi F$ whenever $f \in \mathcal{L}^0$, $|c| \le 1$. Finally, given $u \in L^0$ and $\epsilon > 0$, let $f \in \mathcal{L}^0$ be such that $f^\bullet = u$. Then
$$\lim_{n\to\infty} 2^{-n}|f| \wedge \chi F =_{\text{a.e.}} 0,$$
so by Lebesgue's Dominated Convergence Theorem
$$\lim_{n\to\infty} \bar\tau_F(2^{-n}u) = \lim_{n\to\infty} \int 2^{-n}|f| \wedge \chi F = 0,$$
and there is an $n$ such that $\bar\tau_F(2^{-n}u) \le \epsilon$. It follows (by (*) just above) that $\bar\tau_F(cu) \le \epsilon$ whenever $|c| \le 2^{-n}$. As $\epsilon$ is arbitrary, $\lim_{c\to 0} \bar\tau_F(cu) = 0$ for every $u \in L^0$; which is the third condition in 2A5B. $\mathbf{Q}$ Now 2A5B tells us that the topology defined by the $\bar\tau_F$ is a linear space topology.

(b) For any $u$, $v \in L^0$, $\bigl||u| - |v|\bigr| \le |u - v|$, so $\bar\rho_F(|u|,|v|) \le \bar\rho_F(u,v)$ for every set $F$ of finite measure. By 2A3H, $|\ | : L^0 \to L^0$ is continuous. Now
$$u \vee v = \tfrac12(u + v + |u - v|), \qquad u \wedge v = \tfrac12(u + v - |u - v|),$$
$$u^+ = u \vee 0, \qquad u^- = (-u) \vee 0.$$
As addition and subtraction are continuous, so are $\vee$, $\wedge$, ${}^+$ and ${}^-$.
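(c) Take $u_0$, $v_0 \in L^0$ and $F \in \Sigma$ a set of finite measure and $\epsilon > 0$. Represent $u_0$ and $v_0$ as $f_0^\bullet$, $g_0^\bullet$ respectively, where $f_0$, $g_0 : X \to \mathbb{R}$ are $\Sigma$-measurable (241Bk). If we set
$$F_m = \{x : x \in F,\ |f_0(x)| + |g_0(x)| \le m\},$$
then $\langle F_m\rangle_{m\in\mathbb{N}}$ is a non-decreasing sequence of sets with union $F$, so there is an $m \in \mathbb{N}$ such that $\mu(F \setminus F_m) \le \tfrac12\epsilon$. Let $\delta > 0$ be such that $(2m + \mu F)\delta^2 + 2\delta \le \tfrac12\epsilon$.

Now suppose that $u$, $v \in L^0$ are such that $\bar\rho_F(u,u_0) \le \delta^2$ and $\bar\rho_F(v,v_0) \le \delta^2$. Let $f$, $g : X \to \mathbb{R}$ be measurable functions such that $f^\bullet = u$ and $g^\bullet = v$. Then
$$\mu\{x : x \in F,\ |f(x) - f_0(x)| \ge \delta\} \le \delta, \qquad \mu\{x : x \in F,\ |g(x) - g_0(x)| \ge \delta\} \le \delta,$$
so that
$$\mu\{x : x \in F,\ |f(x) - f_0(x)||g(x) - g_0(x)| \ge \delta^2\} \le 2\delta$$
and
$$\int_F \min(1, |f - f_0| \times |g - g_0|) \le \delta^2\mu F + 2\delta.$$
Also
$$|f \times g - f_0 \times g_0| \le |f - f_0| \times |g - g_0| + |f_0| \times |g - g_0| + |f - f_0| \times |g_0|,$$
so that
$$\begin{aligned}
\bar\rho_F(u \times v, u_0 \times v_0) &= \int_F \min(1, |f \times g - f_0 \times g_0|) \\
&\le \tfrac12\epsilon + \int_{F_m} \min(1, |f \times g - f_0 \times g_0|) \\
&\le \tfrac12\epsilon + \int_{F_m} \min(1, |f - f_0| \times |g - g_0| + m|g - g_0| + m|f - f_0|) \\
&\le \tfrac12\epsilon + \int_F \min(1, |f - f_0| \times |g - g_0|) + m\int_F \min(1, |g - g_0|) + m\int_F \min(1, |f - f_0|) \\
&\le \tfrac12\epsilon + \delta^2\mu F + 2\delta + 2m\delta^2 \le \epsilon.
\end{aligned}$$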
As $F$ and $\epsilon$ are arbitrary, $\times$ is continuous at $(u_0,v_0)$; as $u_0$ and $v_0$ are arbitrary, $\times$ is continuous.

(d) Take $u \in L^0$, $F \in \Sigma$ of finite measure and $\epsilon > 0$. Then there is a $\delta > 0$ such that $\bar\rho_F(\bar h(v), \bar h(u)) \le \epsilon$ whenever $\bar\rho_F(v,u) \le \delta$. $\mathbf{P}$ $\mathbf{?}$ Otherwise, we can find, for each $n \in \mathbb{N}$, a $v_n$ such that $\bar\rho_F(v_n,u) \le 4^{-n}$ but $\bar\rho_F(\bar h(v_n), \bar h(u)) > \epsilon$. Express $u$ as $f^\bullet$ and $v_n$ as $g_n^\bullet$ where $f$, $g_n : X \to \mathbb{R}$ are measurable. Set
$$E_n = \{x : x \in F,\ |g_n(x) - f(x)| \ge 2^{-n}\}$$
for each $n$. Then $\bar\rho_F(v_n,u) \ge 2^{-n}\mu E_n$, so $\mu E_n \le 2^{-n}$ for each $n$, and $E = \bigcap_{n\in\mathbb{N}} \bigcup_{m\ge n} E_m$ is negligible. But $\lim_{n\to\infty} g_n(x) = f(x)$ for every $x \in F \setminus E$, so (because $h$ is continuous) $\lim_{n\to\infty} h(g_n(x)) = h(f(x))$ for every $x \in F \setminus E$. Consequently (by Lebesgue's Dominated Convergence Theorem, as always)
$$\lim_{n\to\infty} \bar\rho_F(\bar h(v_n), \bar h(u)) = \lim_{n\to\infty} \int_F \min(1, |h(g_n(x)) - h(f(x))|)\,\mu(dx) = 0,$$
which is impossible. $\mathbf{X}$ $\mathbf{Q}$

By 2A3H, $\bar h$ is continuous.

Remark I cannot say that the topology of convergence in measure on $\mathcal{L}^0$ is a linear space topology solely because (on the definitions I have chosen) $\mathcal{L}^0$ is not in general a linear space.

245E I turn now to the principal theorem relating the properties of the topological linear space $L^0(\mu)$ to the classification of measure spaces in Chapter 21.

Theorem Let $(X,\Sigma,\mu)$ be a measure space. Let $\mathfrak{T}$ be the topology of convergence in measure on $L^0 = L^0(\mu)$, as described in 245A.

(a) $(X,\Sigma,\mu)$ is semi-finite iff $\mathfrak{T}$ is Hausdorff.

(b) $(X,\Sigma,\mu)$ is $\sigma$-finite iff $\mathfrak{T}$ is metrizable.

(c) $(X,\Sigma,\mu)$ is localizable iff $\mathfrak{T}$ is Hausdorff and $L^0$ is complete under $\mathfrak{T}$.

proof I use the pseudometrics $\rho_F$ on $\mathcal{L}^0 = \mathcal{L}^0(\mu)$, $\bar\rho_F$ on $L^0$ described in 245A.

(a)(i) Suppose that $(X,\Sigma,\mu)$ is semi-finite and that $u$, $v$ are distinct members of $L^0$. Express them as $f^\bullet$ and $g^\bullet$ where $f$ and $g$ are measurable functions from $X$ to $\mathbb{R}$. Then $\mu\{x : f(x) \ne g(x)\} > 0$ so, because $(X,\Sigma,\mu)$ is semi-finite, there is a set $F \in \Sigma$ of finite measure such that $\mu\{x : x \in F,\ f(x) \ne g(x)\} > 0$. Now
$$\bar\rho_F(u,v) = \int_F \min(|f(x) - g(x)|, 1)\,dx > 0$$
(see 122Rc). As $u$ and $v$ are arbitrary, $\mathfrak{T}$ is Hausdorff (2A3L).

(ii) Suppose that $\mathfrak{T}$ is Hausdorff and that $E \in \Sigma$, $\mu E > 0$. Then $u = \chi E^\bullet \ne 0$ so there is an $F \in \Sigma$ such that $\mu F < \infty$ and $\bar\rho_F(u,0) \ne 0$, that is, $\mu(E \cap F) > 0$. Now $E \cap F$ is a non-negligible set of finite measure included in $E$. As $E$ is arbitrary, $(X,\Sigma,\mu)$ is semi-finite.

(b)(i) Suppose that $(X,\Sigma,\mu)$ is $\sigma$-finite. Let $\langle E_n\rangle_{n\in\mathbb{N}}$ be a non-decreasing sequence of sets of finite measure covering $X$. Set
$$\bar\rho(u,v) = \sum_{n=0}^\infty \frac{\bar\rho_{E_n}(u,v)}{1 + 2^n\mu E_n}$$
for $u$, $v \in L^0$. Then $\bar\rho$ is a metric on $L^0$. $\mathbf{P}$ Because every $\bar\rho_{E_n}$ is a pseudometric, so is $\bar\rho$. If $\bar\rho(u,v) = 0$, express $u$ as $f^\bullet$, $v$ as $g^\bullet$ where $f$, $g \in \mathcal{L}^0(\mu)$; then
$$\int |f - g| \wedge \chi E_n = \bar\rho_{E_n}(u,v) = 0,$$
so $f = g$ almost everywhere on $E_n$, for every $n$. Because $X = \bigcup_{n\in\mathbb{N}} E_n$, $f =_{\text{a.e.}} g$ and $u = v$. $\mathbf{Q}$

If $F \in \Sigma$ and $\mu F < \infty$ and $\epsilon > 0$, take $n$ such that $\mu(F \setminus E_n) \le \tfrac12\epsilon$. If $u$, $v \in L^0$ and $\bar\rho(u,v) \le \epsilon/\bigl(2(1 + 2^n\mu E_n)\bigr)$, then $\bar\rho_F(u,v) \le \epsilon$. $\mathbf{P}$ Express $u$ as $f^\bullet$, $v$ as $g^\bullet$ where $f$, $g \in \mathcal{L}^0$. Then
$$\int |f - g| \wedge \chi E_n = \bar\rho_{E_n}(u,v) \le (1 + 2^n\mu E_n)\bar\rho(u,v) \le \frac{\epsilon}{2},$$
while
$$\int |f - g| \wedge \chi(F \setminus E_n) \le \mu(F \setminus E_n) \le \frac{\epsilon}{2},$$
so
$$\bar\rho_F(u,v) = \int |f - g| \wedge \chi F \le \int |f - g| \wedge \chi E_n + \int |f - g| \wedge \chi(F \setminus E_n) \le \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon. \ \mathbf{Q}$$

In the other direction, given $\epsilon > 0$, take $n \in \mathbb{N}$ such that $2^{-n} \le \tfrac12\epsilon$; then $\bar\rho(u,v) \le \epsilon$ whenever $\bar\rho_{E_n}(u,v) \le \epsilon/\bigl(2(n+1)\bigr)$.

These show that $\bar\rho$ defines the same topology as the $\bar\rho_F$ (2A3Ib), so that $\mathfrak{T}$, the topology defined by the $\bar\rho_F$, is metrizable.

(ii) Suppose that $\mathfrak{T}$ is metrizable. Let $\bar\rho$ be a metric defining $\mathfrak{T}$. For each $n \in \mathbb{N}$ there must be a measurable set $F_n$ of finite measure and a $\delta_n > 0$ such that
$$\bar\rho_{F_n}(u,0) \le \delta_n \implies \bar\rho(u,0) \le 2^{-n}.$$
Set $E = X \setminus \bigcup_{n\in\mathbb{N}} F_n$. $\mathbf{?}$ If $E$ is not negligible, then $u = \chi E^\bullet \ne 0$; because $\bar\rho$ is a metric, there is an $n \in \mathbb{N}$ such that $\bar\rho(u,0) > 2^{-n}$; now
$$\mu(E \cap F_n) = \bar\rho_{F_n}(u,0) > \delta_n.$$
But $E \cap F_n = \emptyset$. $\mathbf{X}$

So $\mu E = 0 < \infty$. Now $X = E \cup \bigcup_{n\in\mathbb{N}} F_n$ is a countable union of sets of finite measure, so $\mu$ is $\sigma$-finite.

(c) By (a), either hypothesis ensures that $\mu$ is semi-finite and that $\mathfrak{T}$ is Hausdorff.

(i) Suppose that $(X,\Sigma,\mu)$ is localizable; then $\mathfrak{T}$ is Hausdorff, by (a). Let $\mathcal{F}$ be a Cauchy filter on $L^0$ (2A5F). For each measurable set $F$ of finite measure, choose a sequence $\langle A_n(F)\rangle_{n\in\mathbb{N}}$ of members of $\mathcal{F}$ such that $\bar\rho_F(u,v) \le 4^{-n}$ for every $u$, $v \in A_n(F)$ and every $n$ (2A5G). Choose $u_{Fn} \in \bigcap_{k\le n} A_k(F)$ for each $n$; then $\bar\rho_F(u_{F,n+1}, u_{Fn}) \le 4^{-n}$ for each $n$. Express each $u_{Fn}$ as $f_{Fn}^\bullet$ where $f_{Fn}$ is a measurable function from $X$ to $\mathbb{R}$. Then
$$\mu\{x : x \in F,\ |f_{F,n+1}(x) - f_{Fn}(x)| \ge 2^{-n}\} \le 2^n\bar\rho_F(u_{F,n+1}, u_{Fn}) \le 2^{-n}$$
for each $n$. It follows that $\langle f_{Fn}\rangle_{n\in\mathbb{N}}$ must converge almost everywhere on $F$. $\mathbf{P}$ Set
$$H_n = \{x : x \in F,\ |f_{F,n+1}(x) - f_{Fn}(x)| \ge 2^{-n}\}.$$
Then $\mu H_n \le 2^{-n}$ for each $n$, so
$$\mu\Bigl(\bigcap_{n\in\mathbb{N}} \bigcup_{m\ge n} H_m\Bigr) \le \inf_{n\in\mathbb{N}} \sum_{m=n}^\infty 2^{-m} = 0.$$
If $x \in F \setminus \bigcap_{n\in\mathbb{N}} \bigcup_{m\ge n} H_m$, then there is some $k$ such that $x \in F \setminus \bigcup_{m\ge k} H_m$, so that $|f_{F,m+1}(x) - f_{Fm}(x)| \le 2^{-m}$ for every $m \ge k$, and $\langle f_{Fn}(x)\rangle_{n\in\mathbb{N}}$ is Cauchy, therefore convergent. $\mathbf{Q}$

Set $f_F(x) = \lim_{n\to\infty} f_{Fn}(x)$ for every $x \in F$ such that the limit is defined in $\mathbb{R}$, so that $f_F$ is measurable and defined almost everywhere in $F$.

If $E$, $F$ are two sets of finite measure and $E \subseteq F$, then $\bar\rho_E(u_{En}, u_{Fn}) \le 2 \cdot 4^{-n}$ for each $n$. $\mathbf{P}$ $A_n(E)$ and $A_n(F)$ both belong to $\mathcal{F}$, so must have a point $w$ in common; now
$$\bar\rho_E(u_{En}, u_{Fn}) \le \bar\rho_E(u_{En}, w) + \bar\rho_E(w, u_{Fn}) \le \bar\rho_E(u_{En}, w) + \bar\rho_F(w, u_{Fn}) \le 4^{-n} + 4^{-n}. \ \mathbf{Q}$$
Consequently
$$\mu\{x : x \in E,\ |f_{Fn}(x) - f_{En}(x)| \ge 2^{-n}\} \le 2^n\bar\rho_E(u_{Fn}, u_{En}) \le 2^{-n+1}$$
for each $n$, and $\lim_{n\to\infty} f_{Fn} - f_{En} = 0$ almost everywhere on $E$; so that $f_E = f_F$ a.e. on $E$.

Consequently, if $E$ and $F$ are any two sets of finite measure, $f_E = f_F$ a.e. on $E \cap F$, because both are equal almost everywhere on $E \cap F$ to $f_{E\cup F}$. Because $\mu$ is localizable, it follows that there is an $f \in \mathcal{L}^0$ such that $f = f_E$ a.e. on $E$ for every measurable set $E$ of finite measure (213N). Consider $u = f^\bullet \in L^0$. For any set $E$ of finite measure,
$$\bar\rho_E(u, u_{En}) = \int_E \min(1, |f(x) - f_{En}(x)|)\,dx = \int_E \min(1, |f_E(x) - f_{En}(x)|)\,dx \to 0$$
as $n \to \infty$, using Lebesgue's Dominated Convergence Theorem. Now
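$$\begin{aligned}
\inf_{A\in\mathcal{F}} \sup_{v\in A} \bar\rho_E(v,u) &\le \inf_{n\in\mathbb{N}} \sup_{v\in A_n(E)} \bar\rho_E(v,u) \\
&\le \inf_{n\in\mathbb{N}} \sup_{v\in A_n(E)} \bigl(\bar\rho_E(v, u_{En}) + \bar\rho_E(u, u_{En})\bigr) \\
&\le \inf_{n\in\mathbb{N}} \bigl(4^{-n} + \bar\rho_E(u, u_{En})\bigr) = 0.
\end{aligned}$$
As $E$ is arbitrary, $\mathcal{F} \to u$. As $\mathcal{F}$ is arbitrary, $L^0$ is complete under $\mathfrak{T}$.

(ii) Now suppose that $L^0$ is complete under $\mathfrak{T}$ and let $\mathcal{E}$ be any family of sets in $\Sigma$. Let $\mathcal{E}'$ be
$$\Bigl\{\bigcup \mathcal{E}_0 : \mathcal{E}_0 \text{ is a finite subset of } \mathcal{E}\Bigr\}.$$
Then the union of any two members of $\mathcal{E}'$ belongs to $\mathcal{E}'$. Let $\mathcal{F}$ be the set
$$\{A : A \subseteq L^0,\ \exists\, E \in \mathcal{E}' \text{ such that } A \supseteq A_E\},$$
where for $E \in \mathcal{E}'$ I write
$$A_E = \{\chi F^\bullet : F \in \mathcal{E}',\ F \supseteq E\}.$$
Then $\mathcal{F}$ is a filter on $L^0$, because $A_E \cap A_F = A_{E\cup F}$ for all $E$, $F \in \mathcal{E}'$.

In fact $\mathcal{F}$ is a Cauchy filter. $\mathbf{P}$ Let $H$ be any set of finite measure and $\epsilon > 0$. Set $\gamma = \sup_{E\in\mathcal{E}'} \mu(H \cap E)$ and take $E \in \mathcal{E}'$ such that $\mu(H \cap E) \ge \gamma - \epsilon$. Consider $A_E \in \mathcal{F}$. If $F$, $G \in \mathcal{E}'$ and $F \supseteq E$, $G \supseteq E$ then
$$\bar\rho_H(\chi F^\bullet, \chi G^\bullet) = \mu(H \cap (F \triangle G)) = \mu(H \cap (F \cup G)) - \mu(H \cap F \cap G) \le \gamma - \mu(H \cap E) \le \epsilon.$$
Thus $\bar\rho_H(u,v) \le \epsilon$ for all $u$, $v \in A_E$. As $H$ and $\epsilon$ are arbitrary, $\mathcal{F}$ is Cauchy. $\mathbf{Q}$

Because $L^0$ is complete under $\mathfrak{T}$, $\mathcal{F}$ has a limit $w$ say. Express $w$ as $h^\bullet$, where $h : X \to \mathbb{R}$ is measurable, and consider $G = \{x : h(x) > \tfrac12\}$.

$\mathbf{?}$ If $E \in \mathcal{E}$ and $\mu(E \setminus G) > 0$, let $F \subseteq E \setminus G$ be a set of non-zero finite measure. Then $\{u : \bar\rho_F(u,w) < \tfrac12\mu F\}$ belongs to $\mathcal{F}$, so meets $A_E$; let $H \in \mathcal{E}'$ be such that $E \subseteq H$ and $\bar\rho_F(\chi H^\bullet, w) < \tfrac12\mu F$. Then
$$\int_F \min(1, |1 - h(x)|) = \bar\rho_F(\chi H^\bullet, w) < \tfrac12\mu F;$$
but because $F \cap G = \emptyset$, $1 - h(x) \ge \tfrac12$ for every $x \in F$, so this is impossible. $\mathbf{X}$

Thus $E \setminus G$ is negligible for every $E \in \mathcal{E}$.

Now suppose that $H \in \Sigma$ and $\mu(G \setminus H) > 0$. Then there is an $E \in \mathcal{E}$ such that $\mu(E \setminus H) > 0$. $\mathbf{P}$ Let $F \subseteq G \setminus H$ be a set of non-zero finite measure. Let $u \in A_\emptyset$ be such that $\bar\rho_F(u,w) < \tfrac12\mu F$. Then $u$ is of the form $\chi C^\bullet$ for some $C \in \mathcal{E}'$, and
$$\int_F \min(1, |\chi C(x) - h(x)|) < \tfrac12\mu F.$$
As $h(x) \ge \tfrac12$ for every $x \in F$, $\mu(C \cap F) > 0$. But $C$ is a finite union of members of $\mathcal{E}$, so there is an $E \in \mathcal{E}$ such that $\mu(E \cap F) > 0$, and now $\mu(E \setminus H) > 0$. $\mathbf{Q}$

As $H$ is arbitrary, $G$ is an essential supremum of $\mathcal{E}$ in $\Sigma$. As $\mathcal{E}$ is arbitrary, $(X,\Sigma,\mu)$ is localizable.

245F Alternative description of the topology of convergence in measure Let us return to arbitrary measure spaces $(X,\Sigma,\mu)$.

(a) For any $F \in \Sigma$ of finite measure and $\epsilon > 0$ define $\tau_{F\epsilon} : \mathcal{L}^0 \to [0,\infty[$ by
$$\tau_{F\epsilon}(f) = \mu^*\{x : x \in F \cap \operatorname{dom} f,\ |f(x)| > \epsilon\}$$
for $f \in \mathcal{L}^0$, taking $\mu^*$ to be the outer measure associated with $\mu$ (132B). If $f$, $g \in \mathcal{L}^0$ and $f =_{\text{a.e.}} g$, then
$$\{x : x \in F \cap \operatorname{dom} f,\ |f(x)| > \epsilon\} \triangle \{x : x \in F \cap \operatorname{dom} g,\ |g(x)| > \epsilon\}$$
is negligible, so $\tau_{F\epsilon}(f) = \tau_{F\epsilon}(g)$; accordingly we have a functional from $L^0$ to $[0,\infty[$, given by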
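$$\bar\tau_{F\epsilon}(u) = \tau_{F\epsilon}(f)$$
whenever $f \in \mathcal{L}^0$ and $u = f^\bullet \in L^0$.

(b) Now $\tau_{F\epsilon}$ is not (except in trivial cases) subadditive, so does not define a pseudometric on $\mathcal{L}^0$ or $L^0$. But we can say that, for $f \in \mathcal{L}^0$,
$$\tau_F(f) \le \epsilon\min(1,\epsilon) \implies \tau_{F\epsilon}(f) \le \epsilon \implies \tau_F(f) \le \epsilon(1 + \mu F).$$
(The point is that if $E \subseteq \operatorname{dom} f$ is a measurable conegligible set such that $f\restriction E$ is measurable, then
$$\tau_F(f) = \int_{E\cap F} \min(|f(x)|, 1)\,dx, \qquad \tau_{F\epsilon}(f) = \mu\{x : x \in E \cap F,\ |f(x)| > \epsilon\}.)$$
This means that a set $G \subseteq \mathcal{L}^0$ is open for the topology of convergence in measure iff for every $f \in G$ we can find a set $F$ of finite measure and $\epsilon$, $\delta > 0$ such that
$$\tau_{F\epsilon}(g - f) \le \delta \implies g \in G.$$
Of course $\tau_{F\delta}(f) \ge \tau_{F\epsilon}(f)$ whenever $\delta \le \epsilon$, so we can equally say: $G \subseteq \mathcal{L}^0$ is open for the topology of convergence in measure iff for every $f \in G$ we can find a set $F$ of finite measure and $\epsilon > 0$ such that
$$\tau_{F\epsilon}(g - f) \le \epsilon \implies g \in G.$$
Similarly, $G \subseteq L^0$ is open for the topology of convergence in measure on $L^0$ iff for every $u \in G$ we can find a set $F$ of finite measure and $\epsilon > 0$ such that
$$\bar\tau_{F\epsilon}(v - u) \le \epsilon \implies v \in G.$$

(c) It follows at once that a sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{L}^0 = \mathcal{L}^0(\mu)$ converges in measure to $f \in \mathcal{L}^0$ iff
$$\lim_{n\to\infty} \mu^*\{x : x \in F \cap \operatorname{dom} f \cap \operatorname{dom} f_n,\ |f_n(x) - f(x)| > \epsilon\} = 0$$
whenever $F \in \Sigma$, $\mu F < \infty$ and $\epsilon > 0$. Similarly, a sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $L^0$ converges in measure to $u$ iff $\lim_{n\to\infty} \bar\tau_{F\epsilon}(u - u_n) = 0$ whenever $\mu F < \infty$ and $\epsilon > 0$.

(d) In particular, if $(X,\Sigma,\mu)$ is totally finite, $\langle f_n\rangle_{n\in\mathbb{N}} \to f$ in $\mathcal{L}^0$ iff
$$\lim_{n\to\infty} \mu^*\{x : x \in \operatorname{dom} f \cap \operatorname{dom} f_n,\ |f(x) - f_n(x)| > \epsilon\} = 0$$
for every $\epsilon > 0$, and $\langle u_n\rangle_{n\in\mathbb{N}} \to u$ in $L^0$ iff
$$\lim_{n\to\infty} \bar\tau_{X\epsilon}(u - u_n) = 0$$
for every $\epsilon > 0$.

245G Embedding $L^p$ in $L^0$: Proposition Let $(X,\Sigma,\mu)$ be any measure space. Then for any $p \in [1,\infty]$, the embedding of $L^p = L^p(\mu)$ in $L^0 = L^0(\mu)$ is continuous for the norm topology of $L^p$ and the topology of convergence in measure on $L^0$.

proof Suppose that $u$, $v \in L^p$ and that $\mu F < \infty$. Then $(\chi F)^\bullet$ belongs to $L^q$, where $q = p/(p-1)$ (taking $q = 1$ if $p = \infty$, $q = \infty$ if $p = 1$ as usual), and
$$\bar\rho_F(u,v) \le \int |u - v| \times (\chi F)^\bullet \le \|u - v\|_p \|\chi F^\bullet\|_q$$
(244E). By 2A3H, this is enough to ensure that the embedding $u \mapsto u : L^p \to L^0$ is continuous.

245H The case of $L^1$ is so important that I go farther with it.

Proposition Let $(X,\Sigma,\mu)$ be a measure space.

(a)(i) If $f \in \mathcal{L}^1 = \mathcal{L}^1(\mu)$ and $\epsilon > 0$, there are a $\delta > 0$ and a set $F \in \Sigma$ of finite measure such that $\int |f - g| \le \epsilon$ whenever $g \in \mathcal{L}^1$, $\int |g| \le \int |f| + \delta$ and $\rho_F(f,g) \le \delta$.

(ii) For any sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{L}^1$ and any $f \in \mathcal{L}^1$, $\lim_{n\to\infty} \int |f - f_n| = 0$ iff $\langle f_n\rangle_{n\in\mathbb{N}} \to f$ in measure and $\limsup_{n\to\infty} \int |f_n| \le \int |f|$.

(b)(i) If $u \in L^1 = L^1(\mu)$ and $\epsilon > 0$, there are a $\delta > 0$ and a set $F \in \Sigma$ of finite measure such that $\|u - v\|_1 \le \epsilon$ whenever $v \in L^1$, $\|v\|_1 \le \|u\|_1 + \delta$ and $\bar\rho_F(u,v) \le \delta$.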
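(ii) For any sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $L^1$ and any $u \in L^1$, $\langle u_n\rangle_{n\in\mathbb{N}} \to u$ for $\|\ \|_1$ iff $\langle u_n\rangle_{n\in\mathbb{N}} \to u$ in measure and $\limsup_{n\to\infty} \|u_n\|_1 \le \|u\|_1$.

proof (a)(i) We know that there are a set $F$ of finite measure and an $\eta > 0$ such that $\int_E |f| \le \tfrac15\epsilon$ whenever $\mu(E \cap F) \le \eta$ (225A). Take $\delta > 0$ such that $\delta(\epsilon + 5\mu F) \le \epsilon\eta$ and $\delta \le \tfrac15\epsilon$. Then if $\int |g| \le \int |f| + \delta$ and $\rho_F(f,g) \le \delta$, let $G \subseteq \operatorname{dom} f \cap \operatorname{dom} g$ be a conegligible measurable set such that $f\restriction G$ and $g\restriction G$ are both measurable. Set
$$E = \Bigl\{x : x \in F \cap G,\ |f(x) - g(x)| \ge \frac{\epsilon}{\epsilon + 5\mu F}\Bigr\};$$
then
$$\delta \ge \rho_F(f,g) \ge \frac{\epsilon}{\epsilon + 5\mu F}\,\mu E,$$
so $\mu E \le \eta$. Set $H = F \setminus E$, so that $\mu(F \setminus H) \le \eta$ and $\int_{X\setminus H} |f| \le \tfrac15\epsilon$. On the other hand, for almost every $x \in H$, $|f(x) - g(x)| \le \frac{\epsilon}{\epsilon + 5\mu F}$, so $\int_H |f - g| \le \tfrac15\epsilon$ and
$$\int_H |g| \ge \int_H |f| - \tfrac15\epsilon \ge \int |f| - \int_{X\setminus H} |f| - \tfrac15\epsilon \ge \int |f| - \tfrac25\epsilon.$$
Since $\int |g| \le \int |f| + \tfrac15\epsilon$, $\int_{X\setminus H} |g| \le \tfrac35\epsilon$. Now this means that
$$\int |g - f| \le \int_{X\setminus H} |g| + \int_{X\setminus H} |f| + \int_H |g - f| \le \tfrac35\epsilon + \tfrac15\epsilon + \tfrac15\epsilon = \epsilon,$$
as required.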
(ii) If $\lim_{n\to\infty} \int |f - f_n| = 0$, that is, $\lim_{n\to\infty} f_n^\bullet = f^\bullet$ in $L^1$, then by 245G we must have $\langle f_n^\bullet\rangle_{n\in\mathbb{N}} \to f^\bullet$ in $L^0$, that is, $\langle f_n\rangle_{n\in\mathbb{N}} \to f$ for the topology of convergence in measure. Also, of course, $\lim_{n\to\infty} \int |f_n| = \int |f|$.

In the other direction, if $\limsup_{n\to\infty} \int |f_n| \le \int |f|$ and $\langle f_n\rangle_{n\in\mathbb{N}} \to f$ for the topology of convergence in measure, then whenever $\delta > 0$ and $\mu F < \infty$ there must be an $m \in \mathbb{N}$ such that $\int |f_n| \le \int |f| + \delta$, $\rho_F(f,f_n) \le \delta$ for every $n \ge m$; so (i) tells us that $\lim_{n\to\infty} \int |f_n - f| = 0$.

(b) This now follows immediately if we express $u$ as $f^\bullet$, $v$ as $g^\bullet$ and $u_n$ as $f_n^\bullet$.

245I Remarks (a) I think the phenomenon here is so important that it is worth looking at some elementary examples.

(i) If $\mu$ is counting measure on $\mathbb{N}$, and we set $f_n(n) = 1$, $f_n(i) = 0$ for $i \ne n$, then $\langle f_n\rangle_{n\in\mathbb{N}} \to 0$ in measure, while $\int |f_n| = 1$ for every $n$.

(ii) If $\mu$ is Lebesgue measure on $[0,1]$, and we set $f_n(x) = 2^n$ for $0 < x \le 2^{-n}$, $0$ for other $x$, then again $\langle f_n\rangle_{n\in\mathbb{N}} \to 0$ in measure, while $\int |f_n| = 1$ for every $n$.

(iii) In 245Cc we have another sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ converging to $0$ in measure, while $\int |f_n| = 1$ for every $n$.

In all these cases (as required by Fatou's Lemma, at least in (i) and (ii)) we have $\int |f| \le \liminf_{n\to\infty} \int |f_n|$. (The next proposition shows that this applies to any sequence which is convergent in measure.)

The common feature of these examples is the way in which somehow the $f_n$ escape to infinity, either laterally (in (i)) or vertically (in (iii)) or both (in (ii)). Note that in all three examples we can set $f'_n = 2^n f_n$ to obtain a sequence still converging to $0$ in measure, but with $\lim_{n\to\infty} \int |f'_n| = \infty$.

(b) In 245H, I have used the explicit formulations ‘$\lim_{n\to\infty} \int |f_n - f| = 0$’ (for sequences of functions), ‘$\langle u_n\rangle_{n\in\mathbb{N}} \to u$ for $\|\ \|_1$’ (for sequences in $L^1$). These are often expressed by saying that $\langle f_n\rangle_{n\in\mathbb{N}}$, $\langle u_n\rangle_{n\in\mathbb{N}}$ are convergent in mean to $f$, $u$ respectively.

245J For semi-finite spaces we have a further relationship.

Proposition Let $(X,\Sigma,\mu)$ be a semi-finite measure space. Write $\mathcal{L}^0 = \mathcal{L}^0(\mu)$, etc.

(a)(i) For any $a \ge 0$, the set $\{f : f \in \mathcal{L}^1,\ \int |f| \le a\}$ is closed in $\mathcal{L}^0$ for the topology of convergence in measure.

(ii) If $\langle f_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\mathcal{L}^1$ which is convergent in measure to $f \in \mathcal{L}^0$, and $\liminf_{n\to\infty} \int |f_n| < \infty$, then $f$ is integrable and $\int |f| \le \liminf_{n\to\infty} \int |f_n|$.
(b)(i) For any $a \ge 0$, the set $\{u : u \in L^1,\ \|u\|_1 \le a\}$ is closed in $L^0$ for the topology of convergence in measure.

(ii) If $\langle u_n\rangle_{n\in\mathbb{N}}$ is a sequence in $L^1$ which is convergent in measure to $u \in L^0$, and $\liminf_{n\to\infty} \|u_n\|_1 < \infty$, then $u \in L^1$ and $\|u\|_1 \le \liminf_{n\to\infty} \|u_n\|_1$.

proof (a)(i) Set $A = \{f : f \in \mathcal{L}^1,\ \int |f| \le a\}$, and let $g$ be any member of the closure of $A$ in $\mathcal{L}^0$. Let $h$ be any simple function such that $0 \le h \le_{\text{a.e.}} |g|$, and $\epsilon > 0$. If $h = 0$ then of course $\int h \le a$. Otherwise, setting $F = \{x : h(x) > 0\}$, $M = \sup_{x\in X} h(x)$, there is an $f \in A$ such that
$$\mu^*\{x : x \in F \cap \operatorname{dom} f \cap \operatorname{dom} g,\ |f(x) - g(x)| \ge \epsilon\} \le \epsilon$$
(245F); let $E \supseteq \{x : x \in F \cap \operatorname{dom} f \cap \operatorname{dom} g,\ |f(x) - g(x)| \ge \epsilon\}$ be a measurable set of measure at most $\epsilon$. Then $h \le_{\text{a.e.}} |f| + \epsilon\chi F + M\chi E$, so $\int h \le a + \epsilon(M + \mu F)$. As $\epsilon$ is arbitrary, $\int h \le a$. But we are supposing that $\mu$ is semi-finite, so this is enough to ensure that $g$ is integrable and that $\int |g| \le a$ (213B), that is, that $g \in A$. As $g$ is arbitrary, $A$ is closed.

(ii) Now if $\langle f_n\rangle_{n\in\mathbb{N}}$ is convergent in measure to $f$, and $\liminf_{n\to\infty} \int |f_n| = a$, then for any $\epsilon > 0$ there is a subsequence $\langle f_{n(k)}\rangle_{k\in\mathbb{N}}$ such that $\int |f_{n(k)}| \le a + \epsilon$ for every $k$; since $\langle f_{n(k)}\rangle_{k\in\mathbb{N}}$ still converges in measure to $f$, $\int |f| \le a + \epsilon$. As $\epsilon$ is arbitrary, $\int |f| \le a$.

(b) As in 245H, this is just a translation of part (a) into the language of $L^1$ and $L^0$.

245K For $\sigma$-finite measure spaces, the topology of convergence in measure on $L^0$ is metrizable, so can be described effectively in terms of convergent sequences; it is therefore important that we have, in this case, a sharp characterisation of sequential convergence in measure.

Proposition Let $(X,\Sigma,\mu)$ be a $\sigma$-finite measure space. Then

(a) a sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{L}^0$ converges in measure to $f \in \mathcal{L}^0$ iff every subsequence of $\langle f_n\rangle_{n\in\mathbb{N}}$ has a sub-subsequence converging to $f$ almost everywhere;

(b) a sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $L^0$ converges in measure to $u \in L^0$ iff every subsequence of $\langle u_n\rangle_{n\in\mathbb{N}}$ has a sub-subsequence which order*-converges to $u$.

proof (a)(i) Suppose that $\langle f_n\rangle_{n\in\mathbb{N}} \to f$, that is, that $\lim_{n\to\infty} \int |f - f_n| \wedge \chi F = 0$ for every set $F$ of finite measure. Let $\langle E_k\rangle_{k\in\mathbb{N}}$ be a non-decreasing sequence of sets of finite measure covering $X$, and let $\langle n(k)\rangle_{k\in\mathbb{N}}$ be a strictly increasing sequence in $\mathbb{N}$ such that $\int |f - f_{n(k)}| \wedge \chi E_k \le 4^{-k}$ for every $k \in \mathbb{N}$. Then $\sum_{k=0}^\infty |f - f_{n(k)}| \wedge \chi E_k < \infty$ a.e. (242E); but $\lim_{k\to\infty} f_{n(k)}(x) = f(x)$ whenever $\sum_{k=0}^\infty \min(1, |f(x) - f_{n(k)}(x)|) < \infty$, so $\langle f_{n(k)}\rangle_{k\in\mathbb{N}} \to f$ a.e.

(ii) The same applies to every subsequence of $\langle f_n\rangle_{n\in\mathbb{N}}$, so that every subsequence of $\langle f_n\rangle_{n\in\mathbb{N}}$ has a sub-subsequence converging to $f$ almost everywhere.

(iii) Now suppose that $\langle f_n\rangle_{n\in\mathbb{N}}$ does not converge to $f$. Then there is an open set $U$ containing $f$ such that $\{n : f_n \notin U\}$ is infinite, that is, $\langle f_n\rangle_{n\in\mathbb{N}}$ has a subsequence $\langle f'_n\rangle_{n\in\mathbb{N}}$ with $f'_n \notin U$ for every $n$. But now no sub-subsequence of $\langle f'_n\rangle_{n\in\mathbb{N}}$ converges to $f$ in measure, so no such sub-subsequence can converge almost everywhere, by 245Ca.

(b) This follows immediately from (a) if we express $u$ as $f^\bullet$, $u_n$ as $f_n^\bullet$.

245L Corollary Let $(X,\Sigma,\mu)$ be a $\sigma$-finite measure space.

(a) A subset $A$ of $\mathcal{L}^0 = \mathcal{L}^0(\mu)$ is closed for the topology of convergence in measure iff $f \in A$ whenever $f \in \mathcal{L}^0$ and there is a sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ in $A$ such that $f =_{\text{a.e.}} \lim_{n\to\infty} f_n$.

(b) A subset $A$ of $L^0 = L^0(\mu)$ is closed for the topology of convergence in measure iff $u \in A$ whenever $u \in L^0$ and there is a sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $A$ order*-converging to $u$.

proof (a)(i) If $A$ is closed for the topology of convergence in measure, and $\langle f_n\rangle_{n\in\mathbb{N}}$ is a sequence in $A$ converging to $f$ almost everywhere, then $\langle f_n\rangle_{n\in\mathbb{N}}$ converges to $f$ in measure, so surely $f \in A$ (since otherwise all but finitely many of the $f_n$ would have to belong to the open set $\mathcal{L}^0 \setminus A$).

(ii) If $A$ is not closed, there is an $f \in \overline{A} \setminus A$. The topology can be defined by a metric $\rho$ (245Eb), and we can choose a sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ in $A$ such that $\rho(f_n,f) \le 2^{-n}$ for every $n$, so that $\langle f_n\rangle_{n\in\mathbb{N}} \to f$ in measure. By 245K, $\langle f_n\rangle_{n\in\mathbb{N}}$ has a subsequence $\langle f'_n\rangle_{n\in\mathbb{N}}$ converging a.e. to $f$, and this witnesses that $A$ fails to satisfy the condition.

(b) This follows immediately, because $A \subseteq L^0$ is closed iff $\{f : f^\bullet \in A\}$ is closed in $\mathcal{L}^0$.
245M Complex $L^0$ In 241J I briefly discussed the adaptations needed to construct the complex linear space $L^0_{\mathbb{C}}$. The formulae of 245A may be used unchanged to define topologies of convergence in measure on $\mathcal{L}^0_{\mathbb{C}}$ and $L^0_{\mathbb{C}}$. I think that every word of 245B-245L still applies if we replace each $\mathcal{L}^0$ or $L^0$ with $\mathcal{L}^0_{\mathbb{C}}$ or $L^0_{\mathbb{C}}$. Alternatively, to relate the ‘real’ and ‘complex’ forms of 245E, for instance, we can observe that because
$$\max\bigl(\bar\rho_F(\operatorname{Re}(u), \operatorname{Re}(v)),\ \bar\rho_F(\operatorname{Im}(u), \operatorname{Im}(v))\bigr) \le \bar\rho_F(u,v) \le \bar\rho_F(\operatorname{Re}(u), \operatorname{Re}(v)) + \bar\rho_F(\operatorname{Im}(u), \operatorname{Im}(v))$$
for all $u$, $v \in L^0_{\mathbb{C}}$ and all sets $F$ of finite measure, $L^0_{\mathbb{C}}$ can be identified, as uniform space, with $L^0 \times L^0$, so is Hausdorff, or metrizable, or complete iff $L^0$ is.

245X Basic exercises

> (a) Let $X$ be any set, and $\mu$ counting measure on $X$. Show that the topology of convergence in measure on $\mathcal{L}^0(\mu) = \mathbb{R}^X$ is just the product topology on $\mathbb{R}^X$ regarded as a product of copies of $\mathbb{R}$.

> (b) Let $(X,\Sigma,\mu)$ be any measure space, and $(X,\hat\Sigma,\hat\mu)$ its completion. Show that the topologies of convergence in measure on $\mathcal{L}^0(\mu) = \mathcal{L}^0(\hat\mu)$ (241Xb), corresponding to the families $\{\rho_F : F \in \Sigma,\ \mu F < \infty\}$, $\{\rho_F : F \in \hat\Sigma,\ \hat\mu F < \infty\}$ are the same.

> (c) Let $(X,\Sigma,\mu)$ be any measure space; set $L^0 = L^0(\mu)$. Let $u$, $u_n \in L^0$ for $n \in \mathbb{N}$. Show that the following are equiveridical:
(i) $\langle u_n\rangle_{n\in\mathbb{N}}$ order*-converges to $u$ in the sense of 245C;
(ii) there are measurable functions $f$, $f_n : X \to \mathbb{R}$ such that $f^\bullet = u$, $f_n^\bullet = u_n$ for every $n \in \mathbb{N}$, and $f(x) = \lim_{n\to\infty} f_n(x)$ for every $x \in X$;
(iii) $u = \inf_{n\in\mathbb{N}} \sup_{m\ge n} u_m = \sup_{n\in\mathbb{N}} \inf_{m\ge n} u_m$, the infima and suprema being taken in $L^0$;
(iv) $\inf_{n\in\mathbb{N}} \sup_{m\ge n} |u - u_m| = 0$ in $L^0$;
(v) there is a non-increasing sequence $\langle v_n\rangle_{n\in\mathbb{N}}$ in $L^0$ such that $\inf_{n\in\mathbb{N}} v_n = 0$ in $L^0$ and $u - v_n \le u_n \le u + v_n$ for every $n \in \mathbb{N}$;
(vi) there are sequences $\langle v_n\rangle_{n\in\mathbb{N}}$, $\langle w_n\rangle_{n\in\mathbb{N}}$ in $L^0$ such that $\langle v_n\rangle_{n\in\mathbb{N}}$ is non-decreasing, $\langle w_n\rangle_{n\in\mathbb{N}}$ is non-increasing, $\sup_{n\in\mathbb{N}} v_n = u = \inf_{n\in\mathbb{N}} w_n$ and $v_n \le u_n \le w_n$ for every $n \in \mathbb{N}$.

(d) Let $(X,\Sigma,\mu)$ be a semi-finite measure space. Show that a sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $L^0 = L^0(\mu)$ is order*-convergent to $u \in L^0$ iff $\{|u_n| : n \in \mathbb{N}\}$ is bounded above in $L^0$ and $\langle \sup_{m\ge n} |u_m - u|\rangle_{n\in\mathbb{N}} \to 0$ for the topology of convergence in measure.

(e) Write out proofs that $L^0(\mu)$ is complete (as linear topological space) adapted to the special cases (i) $\mu X = 1$ (ii) $\mu$ is $\sigma$-finite, taking advantage of any simplifications you can find.

(f) Let $(X,\Sigma,\mu)$ be a measure space and $r \ge 1$; let $h : \mathbb{R}^r \to \mathbb{R}$ be a continuous function. (i) Suppose that for $1 \le k \le r$ we are given a sequence $\langle f_{kn}\rangle_{n\in\mathbb{N}}$ in $\mathcal{L}^0 = \mathcal{L}^0(\mu)$ converging in measure to $f_k \in \mathcal{L}^0$. Show that $\langle h(f_{1n}, \ldots, f_{rn})\rangle_{n\in\mathbb{N}}$ converges in measure to $h(f_1, \ldots, f_r)$. (ii) Generally, show that $(f_1, \ldots, f_r) \mapsto h(f_1, \ldots, f_r) : (\mathcal{L}^0)^r \to \mathcal{L}^0$ is continuous for the topology of convergence in measure. (iii) Show that the corresponding function $\bar h : (L^0)^r \to L^0$ (241Xd) is continuous for the topology of convergence in measure.

(g) Let $(X,\Sigma,\mu)$ be a measure space and $u \in L^1(\mu)$. Show that $v \mapsto \int u \times v : L^\infty \to \mathbb{R}$ is continuous for the topology of convergence in measure on the unit ball of $L^\infty$, but not, as a rule, on the whole of $L^\infty$.

(h) Let $(X,\Sigma,\mu)$ be a measure space and $v$ a non-negative member of $L^1 = L^1(\mu)$. Show that on the set $A = \{u : u \in L^1,\ |u| \le v\}$ the subspace topologies (2A3C) induced by the norm topology of $L^1$ and the topology of convergence in measure are the same. (Hint: given $\epsilon > 0$, take $F \in \Sigma$ of finite measure and $M \ge 0$ such that $\int (v - M\chi F^\bullet)^+ \le \epsilon$. Show that $\|u - u'\|_1 \le \epsilon + M\bar\rho_F(u,u')$ for all $u$, $u' \in A$.)

(i) Let $(X,\Sigma,\mu)$ be a measure space and $\mathcal{F}$ a filter on $L^1 = L^1(\mu)$ which is convergent, for the topology of convergence in measure, to $u \in L^1$. Show that $\mathcal{F} \to u$ for the norm topology of $L^1$ iff $\inf_{A\in\mathcal{F}} \sup_{v\in A} \|v\|_1 \le \|u\|_1$.
> (j) Let $(X,\Sigma,\mu)$ be a semi-finite measure space and $p \in [1,\infty]$, $a \ge 0$. Show that $\{u : u \in L^p(\mu),\ \|u\|_p \le a\}$ is closed in $L^0(\mu)$ for the topology of convergence in measure.

(k) Let $(X,\Sigma,\mu)$ be a measure space, and $\langle u_n\rangle_{n\in\mathbb{N}}$ a sequence in $L^p = L^p(\mu)$, where $1 \le p < \infty$. Let $u \in L^p$. Show that the following are equiveridical: (i) $u = \lim_{n\to\infty} u_n$ for the norm topology of $L^p$ (ii) $\langle u_n\rangle_{n\in\mathbb{N}} \to u$ for the topology of convergence in measure and $\lim_{n\to\infty} \|u_n\|_p = \|u\|_p$ (iii) $\langle u_n\rangle_{n\in\mathbb{N}} \to u$ for the topology of convergence in measure and $\limsup_{n\to\infty} \|u_n\|_p \le \|u\|_p$.

(l) Let $X$ be a set and $\mu$, $\nu$ two measures on $X$ with the same measurable sets and the same negligible sets. (i) Show that $\mathcal{L}^0(\mu) = \mathcal{L}^0(\nu)$ and $L^0(\mu) = L^0(\nu)$. (ii) Show that if both $\mu$ and $\nu$ are semi-finite, then they define the same topology of convergence in measure on $\mathcal{L}^0$ and $L^0$. (Hint: use 215A to show that if $\mu E < \infty$ then $\mu E = \sup\{\mu F : F \subseteq E,\ \nu F < \infty\}$.)

(m) Let $(X,\Sigma,\mu)$ be a measure space and $p \in [1,\infty[$. Suppose that $\langle u_n\rangle_{n\in\mathbb{N}}$ is a sequence in $L^p(\mu)$ which converges for $\|\ \|_p$ to $u \in L^p(\mu)$. Show that $\langle |u_n|^p\rangle_{n\in\mathbb{N}} \to |u|^p$ for $\|\ \|_1$. (Hint: 245G, 245Xf, 245H.)

245Y Further exercises

(a) Let $(X,\Sigma,\mu)$ be a measure space and give $\Sigma$ the topology described in 232Ya. Show that $\chi : \Sigma \to \mathcal{L}^0(\mu)$ is a homeomorphism between $\Sigma$ and its image $\chi[\Sigma]$ in $\mathcal{L}^0$, if $\mathcal{L}^0$ is given the topology of convergence in measure and $\chi[\Sigma]$ the subspace topology.

(b) Let $(X,\Sigma,\mu)$ be a measure space and $Y$ any subset of $X$; let $\mu_Y$ be the subspace measure on $Y$. Let $T : L^0(\mu) \to L^0(\mu_Y)$ be the canonical map defined by setting $T(f^\bullet) = (f\restriction Y)^\bullet$ for every $f \in \mathcal{L}^0(\mu)$ (241Ye). Show that $T$ is continuous for the topologies of convergence in measure on $L^0(\mu)$ and $L^0(\mu_Y)$.

(c) Let $(X,\Sigma,\mu)$ be a measure space, and $\tilde\mu$ the c.l.d. version of $\mu$. Show that the map $T : L^0(\mu) \to L^0(\tilde\mu)$ induced by the inclusion $\mathcal{L}^0(\mu) \subseteq \mathcal{L}^0(\tilde\mu)$ (241Ya) is continuous for the topologies of convergence in measure.

(d) Let $(X,\Sigma,\mu)$ be a measure space, and give $L^0 = L^0(\mu)$ the topology of convergence in measure. Let $A \subseteq L^0$ be a non-empty downwards-directed set, and suppose that $\inf A = 0$ in $L^0$. (i) Let $F \in \Sigma$ be any set of finite measure, and define $\bar\tau_F$ as in 245A; show that $\inf_{u\in A} \bar\tau_F(u) = 0$. (Hint: set $\gamma = \inf_{u\in A} \bar\tau_F(u)$; find a non-increasing sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $A$ such that $\lim_{n\to\infty} \bar\tau_F(u_n) = \gamma$; set $v = (\chi F)^\bullet \wedge \inf_{n\in\mathbb{N}} u_n$ and show that $u \wedge v = v$ for every $u \in A$, so that $v = 0$.) (ii) Show that if $U$ is any open set containing $0$, there is a $u \in A$ such that $v \in U$ whenever $0 \le v \le u$.

(e) Let $(X,\Sigma,\mu)$ be a measure space. (i) Show that for $u \in L^0 = L^0(\mu)$ we may define $\psi_a(u)$, for $a \ge 0$, by setting $\psi_a(u) = \mu\{x : |f(x)| \ge a\}$ whenever $f : X \to \mathbb{R}$ is a measurable function and $f^\bullet = u$. (ii) Define $\rho : L^0 \times L^0 \to [0,1]$ by setting $\rho(u,v) = \min(\{1\} \cup \{a : a \ge 0,\ \psi_a(u - v) \le a\})$. Show that $\rho$ is a metric on $L^0$, that $L^0$ is complete under $\rho$, and that $+$, $-$, $\wedge$, $\vee : L^0 \times L^0 \to L^0$ are continuous for $\rho$. (iii) Show that $c \mapsto cu : \mathbb{R} \to L^0$ is continuous for every $u \in L^0$ iff $(X,\Sigma,\mu)$ is totally finite, and that in this case $\rho$ defines the topology of convergence in measure on $L^0$.

(f) Let $(X,\Sigma,\mu)$ be a localizable measure space and $A \subseteq L^0 = L^0(\mu)$ a non-empty upwards-directed set which is bounded in the linear topological space sense (i.e., such that for every neighbourhood $U$ of $0$ in $L^0$ there is a $k \in \mathbb{N}$ such that $A \subseteq kU$). Show that $A$ is bounded above in $L^0$, and that its supremum belongs to its closure.

(g) Let $(X,\Sigma,\mu)$ be a measure space, $p \in [1,\infty[$ and $v$ a non-negative member of $L^p = L^p(\mu)$. Show that on the set $A = \{u : u \in L^p,\ |u| \le v\}$ the subspace topologies induced by the norm topology of $L^p$ and the topology of convergence in measure are the same.

(h) Let $S$ be the set of all sequences $s : \mathbb{N} \to \mathbb{N}$ such that $\lim_{n\to\infty} s(n) = \infty$. For every $s \in S$, let $(X_s,\Sigma_s,\mu_s)$ be $[0,1]$ with Lebesgue measure, and let $(X,\Sigma,\mu)$ be the direct sum of $\langle (X_s,\Sigma_s,\mu_s)\rangle_{s\in S}$ (214K). For $s \in S$, $t \in [0,1]$, $n \in \mathbb{N}$ set $h_n(s,t) = f_{s(n)}(t)$, where $\langle f_n\rangle_{n\in\mathbb{N}}$ is the sequence of 245Cc. Show that $\langle h_n\rangle_{n\in\mathbb{N}} \to 0$ for the topology of convergence in measure on $\mathcal{L}^0(\mu)$, but that $\langle h_n\rangle_{n\in\mathbb{N}}$ has no subsequence which is convergent to $0$ almost everywhere.
(i) Let $X$ be a set, and suppose we are given a relation $*$ between sequences in $X$ and members of $X$ such that ($\alpha$) if $x_n=x$ for every $n$ then $\langle x_n\rangle_{n\in\mathbb{N}}*x$ ($\beta$) $\langle x'_n\rangle_{n\in\mathbb{N}}*x$ whenever $\langle x_n\rangle_{n\in\mathbb{N}}*x$ and $\langle x'_n\rangle_{n\in\mathbb{N}}$ is a subsequence of $\langle x_n\rangle_{n\in\mathbb{N}}$ ($\gamma$) if $\langle x_n\rangle_{n\in\mathbb{N}}*x$ and $\langle x_n\rangle_{n\in\mathbb{N}}*y$ then $x=y$. Show that we have a topology $\mathfrak{T}$ on $X$ defined by saying that a subset $G$ of $X$ belongs to $\mathfrak{T}$ iff whenever $\langle x_n\rangle_{n\in\mathbb{N}}$ is a sequence in $X$ and $\langle x_n\rangle_{n\in\mathbb{N}}*x\in G$ then some $x_n$ belongs to $G$. Show that a sequence $\langle x_n\rangle_{n\in\mathbb{N}}$ in $X$ is $\mathfrak{T}$-convergent to $x$ iff every subsequence of $\langle x_n\rangle_{n\in\mathbb{N}}$ has a sub-subsequence $\langle x''_n\rangle_{n\in\mathbb{N}}$ such that $\langle x''_n\rangle_{n\in\mathbb{N}}*x$.

(j) Let $\mu$ be Lebesgue measure on $\mathbb{R}^r$. Show that $L^0(\mu)$ is separable for the topology of convergence in measure. (Hint: 244I.)

245 Notes and comments In this section I am inviting you to regard the topology of (local) convergence in measure as the standard topology on $L^0$, just as the norms define the standard topologies on $L^p$ spaces for $p\ge 1$. The definition I have chosen is designed to make addition and scalar multiplication and the operations $\vee$, $\wedge$ and $\times$ continuous (245D); see also 245Xf. From the point of view of functional analysis these properties are more important than metrizability or even completeness.

Just as the algebraic and order structure of $L^0$ can be described in terms of the general theory of Riesz spaces, the more advanced results 241G and 245E also have interpretations in the general theory. It is not an accident that (for semi-finite measure spaces) $L^0$ is Dedekind complete iff it is complete as uniform space; you may find the relevant generalizations in 23K and 24E of Fremlin 74. Of course it is exactly because the two kinds of completeness are interrelated that I feel it necessary to use the phrase 'Dedekind completeness' to distinguish this particular kind of order-completeness from the more familiar uniformity-completeness described in 2A5F.
The usefulness of the topology of convergence in measure derives in large part from 245G-245J and the $L^p$ versions 245Xj and 245Xk. Some of the ideas here can be related to a question arising out of the basic convergence theorems. If $\langle f_n\rangle_{n\in\mathbb{N}}$ is a sequence of integrable functions converging (pointwise) to a function $f$, in what ways can $\int f$ fail to be $\lim_{n\to\infty}\int f_n$? In the language of this section, this translates into: if we have a sequence (or filter) in $L^1$ converging for the topology of convergence in measure, in what ways can it fail to converge for the norm topology of $L^1$? The first answer is Lebesgue's Dominated Convergence Theorem: this cannot happen if the sequence is dominated, that is, lies within some set of the form $\{u:|u|\le v\}$ where $v\in L^1$. (See 245Xh and 245Yg.) I will return to this in the next section. For the moment, though, 245H tells us that if $\langle u_n\rangle_{n\in\mathbb{N}}$ converges in measure to $u\in L^1$, but not for the topology of $L^1$, it is because $\limsup_{n\to\infty}\|u_n\|_1$ is too big; some of its weight is being lost at infinity, as in the examples of 245I. If $\langle u_n\rangle_{n\in\mathbb{N}}$ actually order*-converges to $u$, then Fatou's Lemma tells us that $\liminf_{n\to\infty}\|u_n\|_1\ge\|u\|_1$, that is, that the limit cannot have greater weight (as measured by $\|\ \|_1$) than the sequence provides. 245J and 245Xj are generalizations of this to convergence in measure. If you want a generalization of B.Levi's theorem, then 242Yb remains the best expression in the language of this chapter; but 245Yf is a version in terms of the concepts of the present section.

In the case of $\sigma$-finite spaces, we have an alternative description of the topology of convergence in measure (245L) which makes no use of any of the functionals or pseudometrics in 245A. This can be expressed, at least in the context of $L^0$, in terms of a standard result from general topology (245Yi). You will see that that result gives a recipe for a topology on $L^0$ which could be applied in any measure space. What is remarkable is that for $\sigma$-finite spaces we get a linear space topology.
246 Uniform integrability The next topic is a fairly specialized one, but it is of great importance, for different reasons, in both probability theory and functional analysis, and it therefore seems worth while giving a proper treatment straight away.
246A Definition Let $(X,\Sigma,\mu)$ be a measure space.

(a) A set $A\subseteq\mathcal{L}^1(\mu)$ is uniformly integrable if for every $\epsilon>0$ we can find a set $E\in\Sigma$, of finite measure, and an $M\ge 0$ such that
\[\int(|f|-M\chi E)^+\le\epsilon\text{ for every }f\in A.\]

(b) A set $A\subseteq L^1(\mu)$ is uniformly integrable if for every $\epsilon>0$ we can find a set $E\in\Sigma$, of finite measure, and an $M\ge 0$ such that
\[\int(|u|-M\chi E^\bullet)^+\le\epsilon\text{ for every }u\in A.\]

246B Remarks (a) Recall the formulae from 241Ef: $u^+=u\vee 0$, so $(u-v)^+=u-u\wedge v$.

(b) The phrase 'uniformly integrable' is not particularly helpful. But of course we can observe that for any particular integrable function $f$, there are simple functions approximating $f$ for $\|\ \|_1$ (242M), and such functions will be bounded (in modulus) by functions of the form $M\chi E$, with $\mu E<\infty$; thus singleton subsets of $\mathcal{L}^1$ and $L^1$ are uniformly integrable. A general uniformly integrable set of functions is one in which $M$ and $E$ can be chosen uniformly over the set.

(c) It will I hope be clear from the definitions that $A\subseteq\mathcal{L}^1$ is uniformly integrable iff $\{f^\bullet:f\in A\}\subseteq L^1$ is uniformly integrable.

(d) There is a useful simplification in the definition if $\mu X<\infty$ (in particular, if $(X,\Sigma,\mu)$ is a probability space). In this case a set $A\subseteq L^1(\mu)$ is uniformly integrable iff
\[\inf_{M\ge 0}\sup_{u\in A}\int(|u|-Me)^+=0\]
iff
\[\lim_{M\to\infty}\sup_{u\in A}\int(|u|-Me)^+=0,\]
writing $e=\mathbf{1}^\bullet\in L^1(\mu)$, where $\mathbf{1}$ is the constant function with value $1$. (For if $\sup_{u\in A}\int(|u|-M\chi E^\bullet)^+\le\epsilon$, then $\int(|u|-M'e)^+\le\epsilon$ for every $M'\ge M$.) Similarly, $A\subseteq\mathcal{L}^1(\mu)$ is uniformly integrable iff
\[\lim_{M\to\infty}\sup_{f\in A}\int(|f|-M\mathbf{1})^+=0\]
iff
\[\inf_{M\ge 0}\sup_{f\in A}\int(|f|-M\mathbf{1})^+=0.\]

Warning! Some authors use the phrase 'uniformly integrable' for sets satisfying the conditions in (d) even when $\mu$ is not totally finite.

246C We have the following wide-ranging stability properties of the class of uniformly integrable sets in $L^1$ or $\mathcal{L}^1$.

Proposition Let $(X,\Sigma,\mu)$ be a measure space and $A$ a uniformly integrable subset of $L^1(\mu)$.
(a) $A$ is bounded for the norm $\|\ \|_1$.
(b) Any subset of $A$ is uniformly integrable.
(c) For any $a\in\mathbb{R}$, $aA=\{au:u\in A\}$ is uniformly integrable.
(d) There is a uniformly integrable $C\supseteq A$ such that $C$ is convex and $\|\ \|_1$-closed, and $v\in C$ whenever $u\in C$ and $|v|\le|u|$.
(e) If $B$ is another uniformly integrable subset of $L^1$, then $A\cup B$, $A+B=\{u+v:u\in A,\,v\in B\}$ are uniformly integrable.

proof Write $\Sigma^f$ for $\{E:E\in\Sigma,\,\mu E<\infty\}$.
(a) There must be $E\in\Sigma^f$, $M\ge 0$ such that $\int(|u|-M\chi E^\bullet)^+\le 1$ for every $u\in A$; now
\[\|u\|_1\le\int(|u|-M\chi E^\bullet)^++\int M\chi E^\bullet\le 1+M\mu E\]
for every $u\in A$, so $A$ is bounded.

(b) This is immediate from the definition 246Ab.

(c) Given $\epsilon>0$, we can find $E\in\Sigma^f$, $M\ge 0$ such that $|a|\int(|u|-M\chi E^\bullet)^+\le\epsilon$ for every $u\in A$; now $\int(|v|-|a|M\chi E^\bullet)^+\le\epsilon$ for every $v\in aA$.
(d) If $A$ is empty, take $C=A$. Otherwise, try
\[C=\{v:v\in L^1,\ \int(|v|-w)^+\le\sup_{u\in A}\int(|u|-w)^+\text{ for every }w\in L^1(\mu)\}.\]
Evidently $A\subseteq C$, and $C$ satisfies the definition 246Ab because $A$ does, considering $w$ of the form $M\chi E^\bullet$ where $E\in\Sigma^f$ and $M\ge 0$. The functionals
\[v\mapsto\int(|v|-w)^+:L^1(\mu)\to\mathbb{R}\]
are all continuous for $\|\ \|_1$ (because the operators $v\mapsto|v|$, $v\mapsto v-w$, $v\mapsto v^+$, $v\mapsto\int v$ are continuous), so $C$ is closed. If $|v'|\le|v|$ and $v\in C$, then
\[\int(|v'|-w)^+\le\int(|v|-w)^+\le\sup_{u\in A}\int(|u|-w)^+\]
for every $w$, and $v'\in C$. If $v=av_1+bv_2$ where $v_1$, $v_2\in C$, $a\in[0,1]$ and $b=1-a$, then $|v|\le a|v_1|+b|v_2|$, so
\[|v|-w\le(a|v_1|-aw)+(b|v_2|-bw)\le(a|v_1|-aw)^++(b|v_2|-bw)^+\]
and
\[(|v|-w)^+\le a(|v_1|-w)^++b(|v_2|-w)^+\]
for every $w$; accordingly
\[\int(|v|-w)^+\le a\int(|v_1|-w)^++b\int(|v_2|-w)^+\le(a+b)\sup_{u\in A}\int(|u|-w)^+=\sup_{u\in A}\int(|u|-w)^+\]
for every $w$, and $v\in C$. Thus $C$ has all the required properties.

(e) I show first that $A\cup B$ is uniformly integrable. $\mathbf{P}$ Given $\epsilon>0$, let $M_1$, $M_2\ge 0$ and $E_1$, $E_2\in\Sigma^f$ be such that
\[\int(|u|-M_1\chi E_1^\bullet)^+\le\epsilon\text{ for every }u\in A,\]
\[\int(|u|-M_2\chi E_2^\bullet)^+\le\epsilon\text{ for every }u\in B.\]
Set $M=\max(M_1,M_2)$, $E=E_1\cup E_2$; then $\mu E<\infty$ and
\[\int(|u|-M\chi E^\bullet)^+\le\epsilon\text{ for every }u\in A\cup B.\]
As $\epsilon$ is arbitrary, $A\cup B$ is uniformly integrable. $\mathbf{Q}$

Now (d) tells us that there is a convex uniformly integrable set $C$ including $A\cup B$, and in this case $A+B\subseteq 2C$, so $A+B$ is also uniformly integrable, using (b) and (c).

246D Proposition Let $(X,\Sigma,\mu)$ be a probability space and $A\subseteq L^1(\mu)$ a uniformly integrable set. Then there is a convex, $\|\ \|_1$-closed uniformly integrable set $C\subseteq L^1$ such that $A\subseteq C$, $w\in C$ whenever $v\in C$ and $|w|\le|v|$, and $Pv\in C$ whenever $v\in C$ and $P$ is the conditional expectation operator associated with a $\sigma$-subalgebra of $\Sigma$.

proof Set
\[C=\{v:v\in L^1(\mu),\ \int(|v|-Me)^+\le\sup_{u\in A}\int(|u|-Me)^+\text{ for every }M\ge 0\},\]
writing $e=\mathbf{1}^\bullet$ as usual. The arguments in the proof of 246Cd make it plain that $C\supseteq A$ is uniformly integrable, convex and closed, and that $w\in C$ whenever $v\in C$ and $|w|\le|v|$. As for the conditional
expectation operators, if $v\in C$, $\mathrm{T}$ is a $\sigma$-subalgebra of $\Sigma$, $P$ is the associated conditional expectation operator, and $M\ge 0$, then
\[|Pv|\le P|v|=P((|v|\wedge Me)+(|v|-Me)^+)\le Me+P((|v|-Me)^+),\]
so
\[(|Pv|-Me)^+\le P((|v|-Me)^+)\]
and
\[\int(|Pv|-Me)^+\le\int P(|v|-Me)^+=\int(|v|-Me)^+\le\sup_{u\in A}\int(|u|-Me)^+;\]
as $M$ is arbitrary, $Pv\in C$.

246E Remarks (a) Of course 246D has an expression in terms of $\mathcal{L}^1$ rather than $L^1$: if $(X,\Sigma,\mu)$ is a probability space and $A\subseteq\mathcal{L}^1(\mu)$ is uniformly integrable, then there is a uniformly integrable set $C\supseteq A$ such that (i) $af+(1-a)g\in C$ whenever $f$, $g\in C$ and $a\in[0,1]$ (ii) $g\in C$ whenever $f\in C$, $g\in\mathcal{L}^0(\mu)$ and $|g|\le_{\text{a.e.}}|f|$ (iii) $f\in C$ whenever there is a sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ in $C$ such that $\lim_{n\to\infty}\int|f-f_n|=0$ (iv) $g\in C$ whenever there is an $f\in C$ such that $g$ is a conditional expectation of $f$ with respect to some $\sigma$-subalgebra of $\Sigma$.

(b) In fact, there are obvious extensions of 246D; the proof there already shows that $T[C]\subseteq C$ whenever $T:L^1(\mu)\to L^1(\mu)$ is an order-preserving linear operator such that $\|Tu\|_1\le\|u\|_1$ for every $u\in L^1(\mu)$ and $\|Tu\|_\infty\le\|u\|_\infty$ for every $u\in L^1(\mu)\cap L^\infty(\mu)$ (246Yc). If we had done a bit more of the theory of operators on Riesz spaces I should be able to take you a good deal farther along this road; for instance, it is not in fact necessary to assume that the operators $T$ of the last sentence are order-preserving. I will return to this in Chapter 37 in the next volume.

(c) Moreover, the main theorem of the next section will show that for any measure spaces $(X,\Sigma,\mu)$, $(Y,\mathrm{T},\nu)$, $T[A]$ will be uniformly integrable in $L^1(\nu)$ whenever $A\subseteq L^1(\mu)$ is uniformly integrable and $T:L^1(\mu)\to L^1(\nu)$ is a continuous linear operator (247D).

246F We shall need an elementary lemma which I have not so far spelt out.
Lemma Let $(X,\Sigma,\mu)$ be a measure space. Then for any $u\in L^1(\mu)$,
\[\|u\|_1\le 2\sup_{E\in\Sigma}\Big|\int_E u\Big|.\]

proof Express $u$ as $f^\bullet$ where $f:X\to\mathbb{R}$ is measurable. Set $F=\{x:f(x)\ge 0\}$. Then
\[\|u\|_1=\int|f|=\Big|\int_F f\Big|+\Big|\int_{X\setminus F}f\Big|\le 2\sup_{E\in\Sigma}\Big|\int_E f\Big|=2\sup_{E\in\Sigma}\Big|\int_E u\Big|.\]

246G Now we come to some of the remarkable alternative descriptions of uniform integrability.

Theorem Let $(X,\Sigma,\mu)$ be any measure space and $A$ a non-empty subset of $L^1(\mu)$. Then the following are equiveridical:
(i) $A$ is uniformly integrable;
(ii) $\sup_{u\in A}|\int_F u|<\infty$ for every $\mu$-atom $F\in\Sigma$, and for every $\epsilon>0$ there are $E\in\Sigma$, $\delta>0$ such that $\mu E<\infty$ and $|\int_F u|\le\epsilon$ whenever $u\in A$, $F\in\Sigma$ and $\mu(F\cap E)\le\delta$;
(iii) $\sup_{u\in A}|\int_F u|<\infty$ for every $\mu$-atom $F\in\Sigma$ and $\lim_{n\to\infty}\sup_{u\in A}|\int_{F_n}u|=0$ whenever $\langle F_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$;
(iv) $\sup_{u\in A}|\int_F u|<\infty$ for every $\mu$-atom $F\in\Sigma$ and $\lim_{n\to\infty}\sup_{u\in A}|\int_{F_n}u|=0$ whenever $\langle F_n\rangle_{n\in\mathbb{N}}$ is a non-increasing sequence in $\Sigma$ with empty intersection.

Remark I use the phrase '$\mu$-atom' to emphasize that I mean an atom in the measure space sense (211I).

proof (a)(i)$\Rightarrow$(iv) Suppose that $A$ is uniformly integrable. Then surely if $F\in\Sigma$ is a $\mu$-atom,
\[\sup_{u\in A}\Big|\int_F u\Big|\le\sup_{u\in A}\|u\|_1<\infty,\]
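by 246Ca. Now suppose that $\langle F_n\rangle_{n\in\mathbb{N}}$ is a non-increasing sequence in $\Sigma$ with empty intersection, and that $\epsilon>0$. Take $E\in\Sigma$, $M\ge 0$ such that $\mu E<\infty$ and $\int(|u|-M\chi E^\bullet)^+\le\frac12\epsilon$ whenever $u\in A$. Then for all $n$ large enough, $M\mu(F_n\cap E)\le\frac12\epsilon$, so that
\[\Big|\int_{F_n}u\Big|\le\int_{F_n}|u|\le\int(|u|-M\chi E^\bullet)^++\int_{F_n}M\chi E^\bullet\le\frac{\epsilon}{2}+M\mu(F_n\cap E)\le\epsilon\]
for every $u\in A$. As $\epsilon$ is arbitrary, $\lim_{n\to\infty}\sup_{u\in A}|\int_{F_n}u|=0$, and (iv) is true.

(b)(iv)$\Rightarrow$(iii) Suppose that (iv) is true. Then of course $\sup_{u\in A}|\int_F u|<\infty$ for every $\mu$-atom $F\in\Sigma$. $\mathbf{?}$ Suppose, if possible, that $\langle F_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$ such that
\[\epsilon=\limsup_{n\to\infty}\sup_{u\in A}\min\Big(1,\tfrac13\Big|\int_{F_n}u\Big|\Big)>0.\]
Set $H_n=\bigcup_{i\ge n}F_i$ for each $n$, so that $\langle H_n\rangle_{n\in\mathbb{N}}$ is non-increasing and has empty intersection, and $\int_{H_n}|u|\to 0$ as $n\to\infty$ for every $u\in L^1(\mu)$. Choose $\langle n_i\rangle_{i\in\mathbb{N}}$, $\langle m_i\rangle_{i\in\mathbb{N}}$, $\langle u_i\rangle_{i\in\mathbb{N}}$ inductively, as follows. $n_0=0$. Given $n_i\in\mathbb{N}$, take $m_i\ge n_i$, $u_i\in A$ such that $|\int_{F_{m_i}}u_i|\ge 2\epsilon$. Take $n_{i+1}>m_i$ such that $\int_{H_{n_{i+1}}}|u_i|\le\epsilon$. Continue.

Set $G_k=\bigcup_{i\ge k}F_{m_i}$ for each $k$. Then $\langle G_k\rangle_{k\in\mathbb{N}}$ is a non-increasing sequence in $\Sigma$ with empty intersection. But $F_{m_i}\subseteq G_i\subseteq F_{m_i}\cup H_{n_{i+1}}$, so
\[\Big|\int_{G_i}u_i\Big|\ge\Big|\int_{F_{m_i}}u_i\Big|-\Big|\int_{G_i\setminus F_{m_i}}u_i\Big|\ge 2\epsilon-\int_{H_{n_{i+1}}}|u_i|\ge\epsilon\]
for every $i$, contradicting the hypothesis (iv). $\mathbf{X}$ This means that $\lim_{n\to\infty}\sup_{u\in A}|\int_{F_n}u|$ must be zero, and (iii) is true.

(c)(iii)$\Rightarrow$(ii) We still have $\sup_{u\in A}|\int_F u|<\infty$ for every $\mu$-atom $F$. $\mathbf{?}$ Suppose, if possible, that there is an $\epsilon>0$ such that for every measurable set $E$ of finite measure and every $\delta>0$ there are $u\in A$, $F\in\Sigma$ such that $\mu(F\cap E)\le\delta$ and $|\int_F u|\ge\epsilon$. Choose a sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ of sets of finite measure, a sequence $\langle G_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$, a sequence $\langle\delta_n\rangle_{n\in\mathbb{N}}$ of strictly positive real numbers and a sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $A$ as follows. Given $u_k$, $E_k$, $\delta_k$ for $k<n$, choose $u_n\in A$ and $G_n\in\Sigma$ such that $\mu(G_n\cap\bigcup_{k<n}E_k)\le 2^{-n}\min(\{1\}\cup\{\delta_k:k<n\})$ and $|\int_{G_n}u_n|\ge\epsilon$; then choose a set $E_n$ of finite measure and a $\delta_n>0$ such that $\int_F|u_n|\le\frac12\epsilon$ whenever $F\in\Sigma$ and $\mu(F\cap E_n)\le\delta_n$ (see 225A). Continue.

On completing the induction, set $F_n=E_n\cap G_n\setminus\bigcup_{k>n}G_k$ for each $n$; then $\langle F_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$. By the choice of $G_k$,
\[\mu\Big(E_n\cap\bigcup_{k>n}G_k\Big)\le\sum_{k=n+1}^\infty 2^{-k}\delta_n\le\delta_n,\]
so $\mu(E_n\cap(G_n\setminus F_n))\le\delta_n$ and $\int_{G_n\setminus F_n}|u_n|\le\frac12\epsilon$. This means that $|\int_{F_n}u_n|\ge|\int_{G_n}u_n|-\frac12\epsilon\ge\frac12\epsilon$. But this is contrary to the hypothesis (iii). $\mathbf{X}$

(d)(ii)$\Rightarrow$(i)($\alpha$) Assume (ii). Let $\epsilon>0$. Then there are $E\in\Sigma$, $\delta>0$ such that $\mu E<\infty$ and $|\int_F u|\le\epsilon$ whenever $u\in A$, $F\in\Sigma$ and $\mu(F\cap E)\le\delta$. Now $\sup_{u\in A}\int_E|u|<\infty$. $\mathbf{P}$ Write $\mathcal{I}$ for the family of those $F\in\Sigma$ such that $F\subseteq E$ and $\sup_{u\in A}\int_F|u|$ is finite. If $F\subseteq E$ is an atom for $\mu$, then $\sup_{u\in A}\int_F|u|=\sup_{u\in A}|\int_F u|<\infty$, so $F\in\mathcal{I}$. (The point is that if $f:X\to\mathbb{R}$ is a measurable function such that $f^\bullet=u$, then one of $F'=\{x:x\in F,\,f(x)\ge 0\}$, $F''=\{x:x\in F,\,f(x)<0\}$ must be negligible, so that $\int_F|u|$ is either $\int_{F'}u=\int_F u$ or $-\int_{F''}u=-\int_F u$.) If $F\in\Sigma$, $F\subseteq E$ and $\mu F\le\delta$ then
\[\sup_{u\in A}\int_F|u|\le 2\sup_{u\in A,\,G\in\Sigma,\,G\subseteq F}\Big|\int_G u\Big|\le 2\epsilon\]
(by 246F), so $F\in\mathcal{I}$. Next, if $F$, $G\in\mathcal{I}$ then $\sup_{u\in A}\int_{F\cup G}|u|\le\sup_{u\in A}\int_F|u|+\sup_{u\in A}\int_G|u|$ is finite, so $F\cup G\in\mathcal{I}$. Finally, if $\langle F_n\rangle_{n\in\mathbb{N}}$ is any sequence in $\mathcal{I}$, and $F=\bigcup_{n\in\mathbb{N}}F_n$, there is some $n\in\mathbb{N}$ such that $\mu(F\setminus\bigcup_{i\le n}F_i)\le\delta$; now $\bigcup_{i\le n}F_i$ and $F\setminus\bigcup_{i\le n}F_i$ both belong to $\mathcal{I}$, so $F\in\mathcal{I}$.

By 215Ab, there is an $F\in\mathcal{I}$ such that $H\setminus F$ is negligible for every $H\in\mathcal{I}$. Now observe that $E\setminus F$ cannot include any non-negligible member of $\mathcal{I}$; in particular, cannot include either an atom or a non-negligible set of measure less than $\delta$. But this means that the subspace measure on $E\setminus F$ is atomless, totally finite and has no non-negligible measurable sets of measure less than $\delta$; by 215D, $\mu(E\setminus F)=0$ and $E\setminus F$ and $E$ belong to $\mathcal{I}$, as required. $\mathbf{Q}$

Since $\int_{X\setminus E}|u|\le 2\epsilon$ for every $u\in A$ (by 246F again, because $\mu(G\cap E)=0$ for every $G\subseteq X\setminus E$), $\gamma=\sup_{u\in A}\int|u|$ is also finite.

($\beta$) Set $M=\gamma/\delta$. If $u\in A$, express $u$ as $f^\bullet$, where $f:X\to\mathbb{R}$ is measurable, and consider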
\[F=\{x:f(x)\ge M\chi E(x)\}.\]
Then
\[M\mu(F\cap E)\le\int_F f=\int_F u\le\gamma,\]
so $\mu(F\cap E)\le\gamma/M=\delta$. Accordingly $\int_F u\le\epsilon$. Similarly, $\int_{F'}(-u)\le\epsilon$, writing $F'=\{x:-f(x)\ge M\chi E(x)\}$. But this means that
\[\int(|u|-M\chi E^\bullet)^+=\int(|f|-M\chi E)^+\le\int_{F\cup F'}|f|=\int_{F\cup F'}|u|\le 2\epsilon,\]
for every $u\in A$. As $\epsilon$ is arbitrary, $A$ is uniformly integrable.

246H Remarks (a) Of course conditions (ii)-(iv) of this theorem, like (i), have direct translations in terms of members of $\mathcal{L}^1$. Thus a non-empty set $A\subseteq\mathcal{L}^1$ is uniformly integrable iff $\sup_{f\in A}|\int_F f|$ is finite for every atom $F\in\Sigma$ and
either for every $\epsilon>0$ we can find $E\in\Sigma$, $\delta>0$ such that $\mu E<\infty$ and $|\int_F f|\le\epsilon$ whenever $f\in A$, $F\in\Sigma$ and $\mu(F\cap E)\le\delta$
or $\lim_{n\to\infty}\sup_{f\in A}|\int_{F_n}f|=0$ for every disjoint sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$
or $\lim_{n\to\infty}\sup_{f\in A}|\int_{F_n}f|=0$ for every non-increasing sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$ with empty intersection.

(b) There are innumerable further equivalent expressions characterizing uniform integrability; every author has his own favourite. Many of them are variants on (i)-(iv) of this theorem, as in 246I and 246Yd-246Yf. For a condition of a quite different kind, see Theorem 247C.

246I Corollary Let $(X,\Sigma,\mu)$ be a probability space. For $f\in\mathcal{L}^0(\mu)$, $M\ge 0$ set $F(f,M)=\{x:x\in\operatorname{dom}f,\,|f(x)|\ge M\}$. Then a non-empty set $A\subseteq\mathcal{L}^1(\mu)$ is uniformly integrable iff
\[\lim_{M\to\infty}\sup_{f\in A}\int_{F(f,M)}|f|=0.\]

proof (a) If $A$ satisfies the condition, then
\[\inf_{M\ge 0}\sup_{f\in A}\int(|f|-M\chi X)^+\le\inf_{M\ge 0}\sup_{f\in A}\int_{F(f,M)}|f|=0,\]
so $A$ is uniformly integrable.

(b) If $A$ is uniformly integrable, and $\epsilon>0$, there is an $M_0\ge 0$ such that $\int(|f|-M_0\chi X)^+\le\epsilon$ for every $f\in A$; also, $\gamma=\sup_{f\in A}\int|f|$ is finite (246Ca). Take any $M\ge M_0\max(1,(1+\gamma)/\epsilon)$. If $f\in A$, then
\[|f|\times\chi F(f,M)\le(|f|-M_0\chi X)^++M_0\chi F(f,M)\le(|f|-M_0\chi X)^++\frac{\epsilon}{\gamma+1}|f|\]
everywhere on $\operatorname{dom}f$, so
\[\int_{F(f,M)}|f|\le\int(|f|-M_0\chi X)^++\frac{\epsilon}{\gamma+1}\int|f|\le 2\epsilon.\]
As $\epsilon$ is arbitrary, $\lim_{M\to\infty}\sup_{f\in A}\int_{F(f,M)}|f|=0$.
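The criterion of 246I lends itself to a small numerical illustration. The following is only a sketch under stated assumptions, not part of the text's argument: the probability space is taken to be $[0,1]$ with Lebesgue measure, the two families $f_n=n\chi[0,1/n]$ and $g_n=\sqrt{n}\,\chi[0,1/n]$ are chosen here for illustration, and the helper names (`tail_integral`, `sup_tail`, `spike`, `sqrt_spike`) are hypothetical. The supremum over the family is truncated at `n_max`, so the computed value is only a lower bound for the true supremum.

```python
import math

def tail_integral(height, width, M):
    """Integral of |f| over F(f, M) = {|f| >= M} when f = height * (indicator
    of a set of measure `width`), with height >= 0."""
    return height * width if height >= M else 0.0

def sup_tail(family, M, n_max=10_000):
    """Supremum over n = 1..n_max of the tail integral (a truncation of the
    supremum over the whole family; n_max is an assumption of this sketch)."""
    return max(tail_integral(*family(n), M) for n in range(1, n_max + 1))

def spike(n):
    """f_n = n * indicator of [0, 1/n]: every f_n has integral 1."""
    return float(n), 1.0 / n

def sqrt_spike(n):
    """g_n = sqrt(n) * indicator of [0, 1/n]: integral n ** -0.5."""
    return math.sqrt(n), 1.0 / n

for M in (1, 10, 100):
    # The first column stays near 1 (the condition of 246I fails);
    # the second shrinks like 1/M (the condition holds).
    print(M, sup_tail(spike, M), sup_tail(sqrt_spike, M))
```

For the family $\langle f_n\rangle$ the tail integrals do not shrink as $M$ grows, so the family is not uniformly integrable, while for $\langle g_n\rangle$ they are bounded by $1/M$.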
246J The next step is to set out some remarkable connexions between uniform integrability and the topology of convergence in measure discussed in the last section.

Theorem Let $(X,\Sigma,\mu)$ be a measure space.
(a) If $\langle f_n\rangle_{n\in\mathbb{N}}$ is a uniformly integrable sequence of real-valued functions on $X$, and $f(x)=\lim_{n\to\infty}f_n(x)$ for almost every $x\in X$, then $f$ is integrable and $\lim_{n\to\infty}\int|f_n-f|=0$; consequently $\int f=\lim_{n\to\infty}\int f_n$.
(b) If $A\subseteq L^1=L^1(\mu)$ is uniformly integrable, then the norm topology of $L^1$ and the topology of convergence in measure of $L^0=L^0(\mu)$ agree on $A$.
(c) For any $u\in L^1$ and any sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $L^1$, the following are equiveridical:
(i) $u=\lim_{n\to\infty}u_n$ for $\|\ \|_1$;
(ii) $\{u_n:n\in\mathbb{N}\}$ is uniformly integrable and $\langle u_n\rangle_{n\in\mathbb{N}}$ converges to $u$ in measure.
(d) If $(X,\Sigma,\mu)$ is semi-finite, and $A\subseteq L^1$ is uniformly integrable, then the closure $\overline{A}$ of $A$ in $L^0$ for the topology of convergence in measure is still a uniformly integrable subset of $L^1$.

proof (a) Note first that because $\sup_{n\in\mathbb{N}}\int|f_n|<\infty$ (246Ca) and $|f|=\liminf_{n\to\infty}|f_n|$ a.e., Fatou's Lemma assures us that $|f|$ is integrable, with $\int|f|\le\limsup_{n\to\infty}\int|f_n|$. It follows immediately that $\{f_n-f:n\in\mathbb{N}\}$ is uniformly integrable, being the sum of two uniformly integrable sets (246Cc, 246Ce). Given $\epsilon>0$, there are $M\ge 0$, $E\in\Sigma$ such that $\mu E<\infty$ and $\int(|f_n-f|-M\chi E)^+\le\epsilon$ for every $n\in\mathbb{N}$. Also $|f_n-f|\wedge M\chi E\to 0$ a.e., so
\[\limsup_{n\to\infty}\int|f_n-f|\le\limsup_{n\to\infty}\int(|f_n-f|-M\chi E)^++\limsup_{n\to\infty}\int|f_n-f|\wedge M\chi E\le\epsilon,\]
by Lebesgue's Dominated Convergence Theorem. As $\epsilon$ is arbitrary, $\lim_{n\to\infty}\int|f_n-f|=0$ and $\lim_{n\to\infty}\int f_n-\int f=0$.
(b) Let $\mathfrak{T}_A$, $\mathfrak{S}_A$ be the topologies on $A$ induced by the norm topology of $L^1$ and the topology of convergence in measure on $L^0$ respectively.

(i) Given $\epsilon>0$, let $F\in\Sigma$, $M\ge 1$ be such that $\mu F<\infty$ and $\int(|v|-M\chi F^\bullet)^+\le\epsilon$ for every $v\in A$, and consider $\bar\rho_F$, defined as in 245A. Then for any $f$, $g\in\mathcal{L}^0$,
\[|f-g|\le(|f|-M\chi F)^++(|g|-M\chi F)^++2M(|f-g|\wedge\chi F)\]
everywhere on $\operatorname{dom}f\cap\operatorname{dom}g$, so
\[|u-v|\le(|u|-M\chi F^\bullet)^++(|v|-M\chi F^\bullet)^++2M(|u-v|\wedge\chi F^\bullet)\]
for all $u$, $v\in L^0$. Consequently
\[\|u-v\|_1\le 2\epsilon+2M\bar\rho_F(u,v)\]
for all $u$, $v\in A$. This means that, given $\epsilon>0$, we can find $F$, $M$ such that, for $u$, $v\in A$,
\[\bar\rho_F(u,v)\le\frac{\epsilon}{1+M}\Longrightarrow\|u-v\|_1\le 4\epsilon.\]
It follows that every subset of $A$ which is open for $\mathfrak{T}_A$ is open for $\mathfrak{S}_A$ (2A3Ib).

(ii) In the other direction, we have $\bar\rho_F(u,v)\le\|u-v\|_1$ for all $u$, $v\in L^1$ and every set $F$ of finite measure, so every subset of $A$ which is open for $\mathfrak{S}_A$ is open for $\mathfrak{T}_A$.

(c) If $\langle u_n\rangle_{n\in\mathbb{N}}\to u$ for $\|\ \|_1$, $A=\{u_n:n\in\mathbb{N}\}$ is uniformly integrable. $\mathbf{P}$ Given $\epsilon>0$, let $m$ be such that $\|u_n-u\|_1\le\epsilon$ whenever $n\ge m$. Set $v=|u|+\sum_{i\le m}|u_i|\in L^1$, and let $M\ge 0$, $E\in\Sigma$ be such that $\mu E<\infty$ and $\int(v-M\chi E^\bullet)^+\le\epsilon$. Then, for $w\in A$,
\[(|w|-M\chi E^\bullet)^+\le(|w|-v)^++(v-M\chi E^\bullet)^+,\]
so
\[\int(|w|-M\chi E^\bullet)^+\le\|(|w|-v)^+\|_1+\int(v-M\chi E^\bullet)^+\le 2\epsilon.\ \mathbf{Q}\]
Thus on either hypothesis we can be sure that $\{u_n:n\in\mathbb{N}\}$ and $A=\{u\}\cup\{u_n:n\in\mathbb{N}\}$ are uniformly integrable, so that the two topologies agree on $A$ (by (b)) and $\langle u_n\rangle_{n\in\mathbb{N}}$ converges to $u$ in one topology iff it converges to $u$ in the other.

(d) Because $A$ is $\|\ \|_1$-bounded (246Ca) and $\mu$ is semi-finite, $\overline{A}\subseteq L^1$ (245J(b-i)). Given $\epsilon>0$, let $M\ge 0$, $E\in\Sigma$ be such that $\mu E<\infty$ and $\int(|u|-M\chi E^\bullet)^+\le\epsilon$ for every $u\in A$. Now the maps $u\mapsto|u|$, $u\mapsto u-M\chi E^\bullet$, $u\mapsto u^+:L^0\to L^0$ are all continuous for the topology of convergence in measure (245D), while $\{u:\|u\|_1\le\epsilon\}$ is closed for the same topology (245J again), so $\{u:u\in L^0,\ \int(|u|-M\chi E^\bullet)^+\le\epsilon\}$ is closed and must include $\overline{A}$. Thus $\int(|u|-M\chi E^\bullet)^+\le\epsilon$ for every $u\in\overline{A}$. As $\epsilon$ is arbitrary, $\overline{A}$ is uniformly integrable.
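Part (c) of 246J can be seen at work in a small computation. This is a sketch under assumptions introduced only for illustration: the space is $[0,1]$ with Lebesgue measure, the functions are the spikes $f_n=n\chi[0,1/n]$ and $g_n=\sqrt{n}\,\chi[0,1/n]$, and the helper names (`l1_norm`, `rho_bar`, `report`) are hypothetical. Here `rho_bar` computes $\int|f|\wedge\chi X$, that is, the distance $\bar\rho_X(f,0)$ of 245A for a spike $f$.

```python
import math

def l1_norm(height, width):
    """L^1 norm of height * (indicator of a set of measure `width`)."""
    return abs(height) * width

def rho_bar(height, width):
    """Integral of min(|f|, 1) for the same spike: the pseudometric of 245A
    taken with F = X = [0, 1], evaluated at (f, 0)."""
    return min(abs(height), 1.0) * width

def report(ns=(1, 10, 100, 1000)):
    """Rows (n, rho_bar(f_n), ||f_n||_1, ||g_n||_1) for the two spike families."""
    rows = []
    for n in ns:
        w = 1.0 / n
        rows.append((n, rho_bar(n, w), l1_norm(n, w), l1_norm(math.sqrt(n), w)))
    return rows

for row in report():
    print(row)
```

Both sequences converge to $0$ in measure (the supports have measure $1/n$), but only $\langle g_n\rangle$ is uniformly integrable, and only $\langle g_n\rangle$ converges to $0$ for $\|\ \|_1$; the norms $\|f_n\|_1$ stay at $1$.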
246K Complex $\mathcal{L}^1$ and $L^1$ The definitions and theorems above can be repeated without difficulty for spaces of (equivalence classes of) complex-valued functions, with just one variation: in the complex equivalent of 246F, the constant must be changed. It is easy to see that, for $u\in L^1_{\mathbb{C}}(\mu)$,
\[\|u\|_1\le\|\mathrm{Re}(u)\|_1+\|\mathrm{Im}(u)\|_1\le 2\sup_{F\in\Sigma}\Big|\int_F\mathrm{Re}(u)\Big|+2\sup_{F\in\Sigma}\Big|\int_F\mathrm{Im}(u)\Big|\le 4\sup_{F\in\Sigma}\Big|\int_F u\Big|.\]
(In fact, $\|u\|_1\le\pi\sup_{F\in\Sigma}|\int_F u|$; see 246Yl and 252Yo.) Consequently some of the arguments of 246G need to be written out with different constants, but the results, as stated, are unaffected.

246X Basic exercises (a) Let $(X,\Sigma,\mu)$ be a measure space and $A$ a subset of $L^1=L^1(\mu)$. Show that the following are equiveridical: (i) $A$ is uniformly integrable; (ii) for every $\epsilon>0$ there is a $w\ge 0$ in $L^1$ such that $\int(|u|-w)^+\le\epsilon$ for every $u\in A$; (iii) $\langle(|u_{n+1}|-\sup_{i\le n}|u_i|)^+\rangle_{n\in\mathbb{N}}\to 0$ in $L^1$ for every sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $A$. (Hint: for (ii)$\Rightarrow$(iii), set $v_n=\sup_{i\le n}|u_i|$ and note that $\langle v_n\wedge w\rangle_{n\in\mathbb{N}}$ is convergent in $L^1$ for every $w\ge 0$.)

> (b) Let $(X,\Sigma,\mu)$ be a totally finite measure space. Show that for any $p>1$, $M\ge 0$ the set $\{f:f\in\mathcal{L}^p(\mu),\,\|f\|_p\le M\}$ is uniformly integrable. (Hint: $\int(|f|-M\chi X)^+\le M^{1-p}\int|f|^p$.)

> (c) Let $\mu$ be counting measure on $\mathbb{N}$. Show that a set $A\subseteq\mathcal{L}^1(\mu)=\ell^1$ is uniformly integrable iff (i) $\sup_{f\in A}|f(n)|<\infty$ for every $n\in\mathbb{N}$ (ii) for every $\epsilon>0$ there is an $m\in\mathbb{N}$ such that $\sum_{n=m}^\infty|f(n)|\le\epsilon$ for every $f\in A$.

(d) Let $X$ be a set, and let $\mu$ be counting measure on $X$. Show that a set $A\subseteq\mathcal{L}^1(\mu)=\ell^1(X)$ is uniformly integrable iff (i) $\sup_{f\in A}|f(x)|<\infty$ for every $x\in X$ (ii) for every $\epsilon>0$ there is a finite set $I\subseteq X$ such that $\sum_{x\in X\setminus I}|f(x)|\le\epsilon$ for every $f\in A$. Show that in this case $A$ is relatively compact for the norm topology of $\ell^1$.

(e) Let $(X,\Sigma,\mu)$ be a measure space, $\delta>0$, and $\mathcal{I}\subseteq\Sigma$ a family such that (i) every atom belongs to $\mathcal{I}$ (ii) $E\in\mathcal{I}$ whenever $E\in\Sigma$ and $\mu E\le\delta$ (iii) $E\cup F\in\mathcal{I}$ whenever $E$, $F\in\mathcal{I}$ and $E\cap F=\emptyset$. Show that every set of finite measure belongs to $\mathcal{I}$.

(f) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces and $\phi:X\to Y$ an inverse-measure-preserving function. Show that a set $A\subseteq\mathcal{L}^1(\nu)$ is uniformly integrable iff $\{g\phi:g\in A\}$ is uniformly integrable in $\mathcal{L}^1(\mu)$. (Hint: use 246G for 'if', 246A for 'only if'.)

> (g) Let $(X,\Sigma,\mu)$ be a measure space and $p\in[1,\infty[$. Let $\langle f_n\rangle_{n\in\mathbb{N}}$ be a sequence in $\mathcal{L}^p=\mathcal{L}^p(\mu)$ such that $\{|f_n|^p:n\in\mathbb{N}\}$ is uniformly integrable and $f_n\to f$ a.e. Show that $f\in\mathcal{L}^p$ and $\lim_{n\to\infty}\int|f_n-f|^p=0$.

(h) Let $(X,\Sigma,\mu)$ be a semi-finite measure space and $p\in[1,\infty[$. Let $\langle u_n\rangle_{n\in\mathbb{N}}$ be a sequence in $L^p=L^p(\mu)$ and $u\in L^0(\mu)$. Show that the following are equiveridical: (i) $u\in L^p$ and $\langle u_n\rangle_{n\in\mathbb{N}}$ converges to $u$ for $\|\ \|_p$ (ii) $\langle u_n\rangle_{n\in\mathbb{N}}$ converges in measure to $u$ and $\{|u_n|^p:n\in\mathbb{N}\}$ is uniformly integrable. (Hint: 245Xk.)

(i) Let $(X,\Sigma,\mu)$ be a totally finite measure space, and $1\le p<r\le\infty$. Let $\langle u_n\rangle_{n\in\mathbb{N}}$ be a $\|\ \|_r$-bounded sequence in $L^r(\mu)$ which converges in measure to $u\in L^0(\mu)$. Show that $\langle u_n\rangle_{n\in\mathbb{N}}$ converges to $u$ for $\|\ \|_p$. (Hint: show that $\{|u_n|^p:n\in\mathbb{N}\}$ is uniformly integrable.)

246Y Further exercises (a) Let $(X,\Sigma,\mu)$ be a totally finite measure space. Show that $A\subseteq\mathcal{L}^1(\mu)$ is uniformly integrable iff there is a convex function $\phi:[0,\infty[\to\mathbb{R}$ such that $\lim_{a\to\infty}\phi(a)/a=\infty$ and $\sup_{f\in A}\int\phi(|f|)<\infty$.
(b) For any metric space $(Z,\rho)$, let $\mathcal{C}_Z$ be the family of closed subsets of $Z$, and for $F$, $F'\in\mathcal{C}_Z\setminus\{\emptyset\}$ set $\tilde\rho(F,F')=\max(\sup_{z\in F}\inf_{z'\in F'}\rho(z,z'),\ \sup_{z'\in F'}\inf_{z\in F}\rho(z,z'))$. Show that $\tilde\rho$ is a metric on $\mathcal{C}_Z\setminus\{\emptyset\}$ (it is the Hausdorff metric). Show that if $(Z,\rho)$ is complete then the family $\mathcal{K}_Z\setminus\{\emptyset\}$ of non-empty compact subsets of $Z$ is closed for $\tilde\rho$. Now let $(X,\Sigma,\mu)$ be any measure space and take $Z=L^1=L^1(\mu)$, $\rho(z,z')=\|z-z'\|_1$ for $z$, $z'\in Z$. Show that the family of non-empty closed uniformly integrable subsets of $L^1$ is a closed subset of $\mathcal{C}_Z\setminus\{\emptyset\}$ including $\mathcal{K}_Z\setminus\{\emptyset\}$.

(c) Let $(X,\Sigma,\mu)$ be a totally finite measure space and $A\subseteq L^1(\mu)$ a uniformly integrable set. Show that there is a uniformly integrable set $C\supseteq A$ such that (i) $C$ is convex and closed in $L^0(\mu)$ for the topology of convergence in measure (ii) if $u\in C$ and $|v|\le|u|$ then $v\in C$ (iii) if $T$ belongs to the set $\mathcal{T}^+$ of operators from $L^1(\mu)=M^{1,\infty}(\mu)$ to itself, as described in 244Xm, then $T[C]\subseteq C$.

(d) Let $\mu$ be Lebesgue measure on $\mathbb{R}$. Show that a set $A\subseteq\mathcal{L}^1(\mu)$ is uniformly integrable iff $\lim_{n\to\infty}\int_{F_n}f_n=0$ for every disjoint sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ of compact sets in $\mathbb{R}$ and every sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ in $A$.

(e) Let $\mu$ be Lebesgue measure on $\mathbb{R}$. Show that a set $A\subseteq\mathcal{L}^1(\mu)$ is uniformly integrable iff $\lim_{n\to\infty}\int_{G_n}f_n=0$ for every disjoint sequence $\langle G_n\rangle_{n\in\mathbb{N}}$ of open sets in $\mathbb{R}$ and every sequence $\langle f_n\rangle_{n\in\mathbb{N}}$ in $A$.

(f) Repeat 246Yd and 246Ye for Lebesgue measure on arbitrary subsets of $\mathbb{R}^r$.

(g) Let $X$ be a set and $\Sigma$ a $\sigma$-algebra of subsets of $X$. Let $\langle\nu_n\rangle_{n\in\mathbb{N}}$ be a sequence of countably additive functionals on $\Sigma$ such that $\nu E=\lim_{n\to\infty}\nu_nE$ is defined for every $E\in\Sigma$. Show that $\lim_{n\to\infty}\nu_nF_n=0$ whenever $\langle F_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$. (Hint: suppose otherwise. By taking suitable subsequences reduce to the case in which $|\nu_nF_i-\nu F_i|\le 2^{-n}\epsilon$ for $i<n$, $|\nu_nF_n|\ge 3\epsilon$, $|\nu_nF_i|\le 2^{-i}\epsilon$ for $i>n$. Set $F=\bigcup_{i\in\mathbb{N}}F_{2i+1}$ and show that $|\nu_{2n+1}F-\nu_{2n}F|\ge\epsilon$ for every $n$.) Hence show that $\nu$ is countably additive. (This is the Vitali-Hahn-Saks theorem.)

(h) Let $(X,\Sigma,\mu)$ be a measure space and $\langle u_n\rangle_{n\in\mathbb{N}}$ a sequence in $L^1=L^1(\mu)$ such that $\lim_{n\to\infty}\int_F u_n$ is defined for every $F\in\Sigma$. Show that $\{u_n:n\in\mathbb{N}\}$ is uniformly integrable. (Hint: suppose not. Then there are a disjoint sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$ and a subsequence $\langle u'_n\rangle_{n\in\mathbb{N}}$ of $\langle u_n\rangle_{n\in\mathbb{N}}$ such that $\inf_{n\in\mathbb{N}}|\int_{F_n}u'_n|=\epsilon>0$. But this contradicts 246Yg.)

(i) In 246Yg, show that $\nu$ is countably additive. (Hint: Set $\mu=\sum_{n=0}^\infty a_n\nu_n$ for a suitable sequence $\langle a_n\rangle_{n\in\mathbb{N}}$ of strictly positive numbers. For each $n$ choose a Radon-Nikodým derivative $f_n$ of $\nu_n$ with respect to $\mu$. Show that $\{f_n:n\in\mathbb{N}\}$ is uniformly integrable, so that $\nu$ is truly continuous.)

(j) Let $(X,\Sigma,\mu)$ be any measure space, and $A\subseteq L^1(\mu)$. Show that the following are equiveridical: (i) $A$ is $\|\ \|_1$-bounded; (ii) $\sup_{u\in A}|\int_F u|<\infty$ for every $\mu$-atom $F\in\Sigma$ and $\limsup_{n\to\infty}\sup_{u\in A}|\int_{F_n}u|<\infty$ for every disjoint sequence $\langle F_n\rangle_{n\in\mathbb{N}}$ of measurable sets of finite measure; (iii) $\sup_{u\in A}|\int_E u|<\infty$ for every $E\in\Sigma$. (Hint: show that $\langle a_nu_n\rangle_{n\in\mathbb{N}}$ is uniformly integrable whenever $\lim_{n\to\infty}a_n=0$ in $\mathbb{R}$ and $\langle u_n\rangle_{n\in\mathbb{N}}$ is a sequence in $A$.)

(k) Let $(X,\Sigma,\mu)$ be a measure space and $A\subseteq L^1(\mu)$ a non-empty set. Show that the following are equiveridical: (i) $A$ is uniformly integrable; (ii) whenever $B\subseteq L^\infty(\mu)$ is non-empty and downwards-directed and has infimum $0$ in $L^\infty(\mu)$ then $\inf_{v\in B}\sup_{u\in A}|\int u\times v|=0$. (Hint: for (i)$\Rightarrow$(ii), note that $\inf_{v\in B}w\times v=0$ for every $w\ge 0$ in $L^0$. For (ii)$\Rightarrow$(i), use 246G(iv).)

(l) Set $f(x)=e^{ix}$ for $x\in[-\pi,\pi]$. Show that $|\int_E f|\le 2$ for every $E\subseteq[-\pi,\pi]$.

246 Notes and comments I am holding over to the next section the most striking property of uniformly integrable sets (they are the relatively weakly compact sets in $L^1$) because this demands some non-trivial ideas from functional analysis and general topology. In this section I give the results which can be regarded as essentially measure-theoretic in inspiration. The most important new concept, or technique, is that of 'disjoint-sequence theorem'. A typical example is in condition (iii) of 246G, relating uniform integrability to the behaviour of functionals on disjoint sequences of sets. I give variants of this in 246Yd-246Yf, and
Weak compactness in L1
247A
193
246Yg246Yj are further results in which similar methods can be used. The central result of the next section (247C) will also use disjoint sequences in the proof, and they will appear more than once in Chapter 35 in the next volume. The phrase ‘uniformly integrable’ ought to mean something like ‘uniformly approximable by simple functions’, and the definition 246A can be forced into such a form, but I do not think it very useful to do so. However condition (ii) of 246G amounts to something like ‘uniformly truly continuous’, if we think of members of L1 as truly continuous functionals on Σ, as in 242I. (See 246Yi.) Note that in each of the statements (ii)(iv) of 246G we need to take special note of any atoms for the measure, since they are not controlled by the main condition imposed. In an atomless measure space, of course, we have a simplification here, as in 246Yd246Yf. Another way of justifying theR‘uniformly’ in ‘uniformly integrable’ is by considering functionals θw where w ≥ 0 in L1 , setting θw (u) = (u − w)+ for u ∈ L1 ; then A ⊆ L1 is uniformly integrable iff θw → 0 uniformly on A as w rises in L1 (246Xa). It is sometimes useful to know that if this is true at all then it is necessarily witnessed by elements w which can be built directly from materials at hand (see (iii) of 246Xa). Furthermore, the sets Aw² = {u : θw (u) ≤ ²} are always convex, k k1 closed and ‘solid’ (if u ∈ Aw² and v ≤ u then v ∈ Aw² )(246Cd); they are closed under pointwise convergence of sequences (246Ja) and in semifinite measure spaces they are closed for the topology of convergence in measure (246Jd); in probability spaces, for level w, they are closed under conditional expectations (246D) and similar operators (246Yc). Consequently we can expect that any uniformly integrable set will be included in a uniformly integrable set which is closed under operations of several different types. Yet another ‘uniform’ property of uniformly integrable sets is in 246Yk. 
The norm $\|\ \|_\infty$ is never (in interesting cases) order-continuous in the way that other $\|\ \|_p$ are (244Yd); but the uniformly integrable subsets of $L^1$ provide interesting order-continuous seminorms on $L^\infty$.
246J supplements results from §245. In the notes to that section I mentioned the question: if $\langle f_n\rangle_{n\in\mathbb{N}} \to f$ a.e., in what ways can $\langle \int f_n\rangle_{n\in\mathbb{N}}$ fail to converge to $\int f$? Here we find that $\langle \int |f_n - f|\rangle_{n\in\mathbb{N}} \to 0$ iff $\{f_n : n \in \mathbb{N}\}$ is uniformly integrable; this is a way of making precise the expression ‘none of the weight of the sequence is lost at infinity’. Generally, for sequences, convergence in $\|\ \|_p$, for $p \in [1, \infty[$, is convergence in measure for $p$th-power-uniformly-integrable sequences (246Xh).
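A minimal worked illustration of weight being ‘lost at infinity’, assuming Lebesgue measure $\mu$ on $[0,1]$ and the definition 246A of uniform integrability; the particular functions $f_n$ are chosen here for illustration and are not taken from the text.

```latex
% Assumption: \mu is Lebesgue measure on [0,1]; f_n is an illustrative choice.
\[
  f_n = n\,\chi[0,\tfrac1n] \quad (n \ge 1), \qquad
  f_n(x) \to 0 \text{ for every } x \in \left]0,1\right], \qquad
  \int |f_n - 0| \, d\mu = n \cdot \tfrac1n = 1 \not\to 0 .
\]
% Failure of uniform integrability: for any M \ge 0, any E of finite measure
% and any n > M, the integrand is at least n - M on [0,1/n], so
\[
  \int (|f_n| - M\chi E)^+ \, d\mu \;\ge\; (n - M)\cdot\tfrac1n
  \;=\; 1 - \tfrac{M}{n} \;\longrightarrow\; 1 \quad (n \to \infty).
\]
```

So no single pair $(M, E)$ can make $\int(|f_n| - M\chi E)^+ \le \frac12$ for every $n$, which is consistent with the equivalence described above: the sequence converges a.e. but not in $\|\ \|_1$, and it is not uniformly integrable.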
247 Weak compactness in $L^1$
I now come to the most striking feature of uniform integrability: it provides a description of the relatively weakly compact subsets of $L^1$ (247C). I have put this into a separate section because it demands some knowledge of functional analysis, in particular, of course, of weak topologies on Banach spaces. I will try to give an account in terms which are accessible to novices in the theory of normed spaces because the result is essentially measure-theoretic, as well as being of vital importance to applications in probability theory. I have written out the essential definitions in §§2A3-2A5.
247A Part of the argument of the main theorem below will run more smoothly if I separate out an idea which is, in effect, a simple special case of a theme which has been running through the exercises of this chapter (241Ye, 242Yf, 243Ya, 244Yc).
Lemma Let $(X, \Sigma, \mu)$ be a measure space, and $G$ any member of $\Sigma$. Let $\mu_G$ be the subspace measure on $G$, so that $\mu_G E = \mu E$ for $E \subseteq G$, $E \in \Sigma$. Set
$$U = \{u : u \in L^1(\mu),\ u \times \chi G^\bullet = u\} \subseteq L^1(\mu).$$
Then we have an isomorphism $S$ between the ordered normed spaces $U$ and $L^1(\mu_G)$, given by writing $S(f^\bullet) = (f\upharpoonright G)^\bullet$ for every $f \in \mathcal{L}^1(\mu)$ such that $f^\bullet \in U$.
proof Of course I should remark explicitly that $U$ is a linear subspace of $L^1(\mu)$. I have discussed integration over subspaces in §§131 and 214; in particular, I noted that $f\upharpoonright G$ is integrable, and that
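$$\int |f\upharpoonright G|\,d\mu_G = \int |f| \times \chi G\,d\mu \le \int |f|\,d\mu$$
for every $f \in \mathcal{L}^1(\mu)$ (131Fa). If $f$, $g \in \mathcal{L}^1(\mu)$ and $f = g$ $\mu$-a.e., then $f\upharpoonright G = g\upharpoonright G$ $\mu_G$-a.e.; so the proposed formula for $S$ does indeed define a map from $U$ to $L^1(\mu_G)$.
Because
$$(f + g)\upharpoonright G = (f\upharpoonright G) + (g\upharpoonright G), \qquad (cf)\upharpoonright G = c(f\upharpoonright G)$$
for all $f$, $g \in \mathcal{L}^1(\mu)$ and all $c \in \mathbb{R}$, $S$ is linear. Because
$$f \le g\ \mu\text{-a.e.} \implies f\upharpoonright G \le g\upharpoonright G\ \mu_G\text{-a.e.},$$
$S$ is order-preserving. Because $\int |f\upharpoonright G|\,d\mu_G \le \int |f|\,d\mu$ for every $f \in \mathcal{L}^1(\mu)$, $\|Su\|_1 \le \|u\|_1$ for every $u \in U$.
To see that $S$ is surjective, take any $v \in L^1(\mu_G)$. Express $v$ as $g^\bullet$ where $g \in \mathcal{L}^1(\mu_G)$. By 131E, $f \in \mathcal{L}^1(\mu)$, where $f(x) = g(x)$ for $x \in \operatorname{dom} g$, $0$ for $x \in X \setminus G$; so that $f^\bullet \in U$ and $f\upharpoonright G = g$ and $v = S(f^\bullet) \in S[U]$.
To see that $S$ is norm-preserving, note that, for any $f \in \mathcal{L}^1(\mu)$,
$$\int |f\upharpoonright G|\,d\mu_G = \int |f| \times \chi G\,d\mu,$$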
so that if $u = f^\bullet \in U$ we shall have
$$\|Su\|_1 = \int |f\upharpoonright G|\,d\mu_G = \int |f| \times \chi G\,d\mu = \|u \times \chi G^\bullet\|_1 = \|u\|_1.$$
247B Corollary Let $(X, \Sigma, \mu)$ be any measure space, and let $G \in \Sigma$ be a measurable set expressible as a countable union of sets of finite measure. Define $U$ as in 247A, and let $h : L^1(\mu) \to \mathbb{R}$ be any continuous linear functional. Then there is a $v \in L^\infty(\mu)$ such that $h(u) = \int u \times v\,d\mu$ for every $u \in U$.
proof Let $S : U \to L^1(\mu_G)$ be the isomorphism described in 247A. Then $S^{-1} : L^1(\mu_G) \to U$ is linear and continuous, so $h_1 = hS^{-1}$ belongs to the normed space dual $(L^1(\mu_G))^*$ of $L^1(\mu_G)$. Now of course $\mu_G$ is σ-finite, therefore localizable (211L), so 243Gb tells us that there is a $v_1 \in L^\infty(\mu_G)$ such that
$$h_1(u) = \int u \times v_1\,d\mu_G$$
for every $u \in L^1(\mu_G)$.
Express $v_1$ as $g_1^\bullet$ where $g_1 : G \to \mathbb{R}$ is a bounded measurable function. Set $g(x) = g_1(x)$ for $x \in G$, $0$ for $x \in X \setminus G$; then $g : X \to \mathbb{R}$ is a bounded measurable function, and $v = g^\bullet \in L^\infty(\mu)$. If $u \in U$, express $u$ as $f^\bullet$ where $f \in \mathcal{L}^1(\mu)$; then
$$h(u) = h(S^{-1}Su) = h_1((f\upharpoonright G)^\bullet) = \int (f\upharpoonright G) \times g_1\,d\mu_G = \int (f \times g)\upharpoonright G\,d\mu_G = \int f \times g \times \chi G\,d\mu = \int f \times g\,d\mu = \int u \times v.$$
As $u$ is arbitrary, this proves the result.
247C Theorem Let $(X, \Sigma, \mu)$ be any measure space and $A$ a subset of $L^1 = L^1(\mu)$. Then $A$ is uniformly integrable iff it is relatively compact in $L^1$ for the weak topology of $L^1$.
proof (a) Suppose that $A$ is relatively compact for the weak topology. I seek to show that it satisfies the condition (iii) of 246G.
(i) If $F \in \Sigma$, then surely $\sup_{u\in A} |\int_F u| < \infty$, because $u \mapsto \int_F u$ belongs to $(L^1)^*$, and if $h \in (L^1)^*$ then the image of any relatively weakly compact set under $h$ must be bounded (2A5Ie).
(ii) Now suppose that $\langle F_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Sigma$. ? Suppose, if possible, that $\langle \sup_{u\in A} |\int_{F_n} u|\rangle_{n\in\mathbb{N}}$ does not converge to $0$. Then there is a strictly increasing sequence $\langle n(k)\rangle_{k\in\mathbb{N}}$ in $\mathbb{N}$ such that
$$\gamma = \tfrac12 \inf_{k\in\mathbb{N}} \sup_{u\in A} \Bigl|\int_{F_{n(k)}} u\Bigr| > 0.$$
For each $k$, choose $u_k \in A$ such that $|\int_{F_{n(k)}} u_k| \ge \gamma$. Because $A$ is relatively compact for the weak topology,
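there is a cluster point $u$ of $\langle u_k\rangle_{k\in\mathbb{N}}$ in $L^1$ for the weak topology (2A3Ob). Set $\eta_j = 2^{-j}\gamma/6 > 0$ for each $j \in \mathbb{N}$.
We can now choose a strictly increasing sequence $\langle k(j)\rangle_{j\in\mathbb{N}}$ inductively so that
$$\int_{F_{n(k(j))}} \Bigl(|u| + \sum_{i=0}^{j-1} |u_{k(i)}|\Bigr) \le \eta_j,$$
$$\sum_{i=0}^{j-1} \Bigl|\int_{F_{n(k(i))}} u - \int_{F_{n(k(i))}} u_{k(j)}\Bigr| \le \eta_j$$
for every $j$, interpreting $\sum_{i=0}^{-1}$ as $0$. P Given $\langle k(i)\rangle_{i<j}$, set $v^* = |u| + \sum_{i=0}^{j-1} |u_{k(i)}|$; then $\lim_{k\to\infty} \int_{F_{n(k)}} v^* = 0$, by Lebesgue's Dominated Convergence Theorem or otherwise, so there is a $k^*$ such that $k^* > k(i)$ for every $i < j$ and $\int_{F_{n(k)}} v^* \le \eta_j$ for every $k \ge k^*$. Next,
$$w \mapsto \sum_{i=0}^{j-1} \Bigl|\int_{F_{n(k(i))}} u - \int_{F_{n(k(i))}} w\Bigr| : L^1 \to \mathbb{R}$$
is continuous for the weak topology of $L^1$ and zero at $u$, and $u$ belongs to every weakly closed set containing $\{u_k : k \ge k^*\}$, so there is a $k(j) \ge k^*$ such that $\sum_{i=0}^{j-1} |\int_{F_{n(k(i))}} u - \int_{F_{n(k(i))}} u_{k(j)}| < \eta_j$, which continues the construction. Q
Let $v$ be any cluster point in $L^1$, for the weak topology, of $\langle u_{k(j)}\rangle_{j\in\mathbb{N}}$. Setting $G_i = F_{n(k(i))}$, we have $|\int_{G_i} u - \int_{G_i} u_{k(j)}| \le \eta_j$ whenever $i < j$, so $\lim_{j\to\infty} \int_{G_i} u_{k(j)}$ exists $= \int_{G_i} u$ for each $i$, and $\int_{G_i} v = \int_{G_i} u$ for every $i$; setting $G = \bigcup_{i\in\mathbb{N}} G_i$,
$$\int_G v = \sum_{i=0}^\infty \int_{G_i} v = \sum_{i=0}^\infty \int_{G_i} u = \int_G u,$$
by 232D, because $\langle G_i\rangle_{i\in\mathbb{N}}$ is disjoint.
For each $j \in \mathbb{N}$,
$$\sum_{i=0}^{j-1} \Bigl|\int_{G_i} u_{k(j)}\Bigr| + \sum_{i=j+1}^\infty \Bigl|\int_{G_i} u_{k(j)}\Bigr| \le \sum_{i=0}^{j-1} \Bigl|\int_{G_i} u\Bigr| + \sum_{i=0}^{j-1} \Bigl|\int_{G_i} u - \int_{G_i} u_{k(j)}\Bigr| + \sum_{i=j+1}^\infty \int_{G_i} |u_{k(j)}|$$
$$\le \sum_{i=0}^{j-1} \eta_i + \eta_j + \sum_{i=j+1}^\infty \eta_i = \sum_{i=0}^\infty \eta_i = \frac{\gamma}{3}.$$
On the other hand, $|\int_{G_j} u_{k(j)}| \ge \gamma$. So
$$\Bigl|\int_G u_{k(j)}\Bigr| = \Bigl|\sum_{i=0}^\infty \int_{G_i} u_{k(j)}\Bigr| \ge \tfrac23\gamma.$$
This is true for every $j$; because every weakly open set containing $v$ meets $\{u_{k(j)} : j \in \mathbb{N}\}$, $|\int_G v| \ge \frac23\gamma$ and $|\int_G u| \ge \frac23\gamma$. On the other hand,
$$\Bigl|\int_G u\Bigr| = \Bigl|\sum_{i=0}^\infty \int_{G_i} u\Bigr| \le \sum_{i=0}^\infty \int_{G_i} |u| \le \sum_{i=0}^\infty \eta_i = \frac{\gamma}{3},$$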
which is absurd. X
This contradiction shows that $\lim_{n\to\infty} \sup_{u\in A} |\int_{F_n} u| = 0$. As $\langle F_n\rangle_{n\in\mathbb{N}}$ is arbitrary, $A$ satisfies the condition 246G(iii) and is uniformly integrable.
(b) Now assume that $A$ is uniformly integrable. I seek a weakly compact set $C \supseteq A$.
(i) For each $n \in \mathbb{N}$, choose $E_n \in \Sigma$, $M_n \ge 0$ such that $\mu E_n < \infty$ and $\int (|u| - M_n \chi E_n^\bullet)^+ \le 2^{-n}$ for every $u \in A$. Set
$$C = \{v : v \in L^1,\ |\textstyle\int_F v| \le M_n \mu(F \cap E_n) + 2^{-n}\ \forall\ n \in \mathbb{N},\ F \in \Sigma\},$$
and note that $A \subseteq C$, because if $u \in A$ and $F \in \Sigma$,
$$\Bigl|\int_F u\Bigr| \le \int_F (|u| - M_n \chi E_n^\bullet)^+ + \int_F M_n \chi E_n^\bullet \le 2^{-n} + M_n \mu(F \cap E_n)$$
for every $n$. Observe also that $C$ is $\|\ \|_1$-bounded, because
$$\|u\|_1 \le 2 \sup_{F\in\Sigma} \Bigl|\int_F u\Bigr| \le 2 \sup_{F\in\Sigma} (1 + M_0 \mu(F \cap E_0)) \le 2(1 + M_0 \mu E_0)$$
for every $u \in C$ (using 246F).
(ii) Because I am seeking to prove this theorem for arbitrary measure spaces $(X, \Sigma, \mu)$, I cannot use 243G to identify the dual of $L^1$. Nevertheless, 247B above shows that 243Gb is ‘nearly’ valid, in the following sense: if $h \in (L^1)^*$, there is a $v \in L^\infty$ such that $h(u) = \int u \times v$ for every $u \in C$. P Set $G = \bigcup_{n\in\mathbb{N}} E_n \in \Sigma$, and define $U \subseteq L^1$ as in 247A-247B. By 247B, there is a $v \in L^\infty$ such that $h(u) = \int u \times v$ for every $u \in U$. But if $u \in C$, we can express $u$ as $f^\bullet$ where $f : X \to \mathbb{R}$ is measurable. If $F \in \Sigma$ and $F \cap G = \emptyset$, then
$$\Bigl|\int_F f\Bigr| = \Bigl|\int_F u\Bigr| \le 2^{-n} + M_n \mu(F \cap E_n) = 2^{-n}$$
for every $n \in \mathbb{N}$, so $\int_F f = 0$; it follows that $f = 0$ a.e. on $X \setminus G$ (131Fc), so that $f \times \chi G =_{\text{a.e.}} f$ and $u = u \times \chi G^\bullet$, that is, $u \in U$, and $h(u) = \int u \times v$, as required. Q
(iii) So we may proceed, having an adequate description, not of $(L^1(\mu))^*$ itself, but of its action on $C$. Let $\mathcal{F}$ be any ultrafilter on $L^1$ containing $C$ (see 2A3R). For each $F \in \Sigma$, set
$$\nu F = \lim_{u\to\mathcal{F}} \int_F u;$$
because
$$\sup_{u\in C} \Bigl|\int_F u\Bigr| \le \sup_{u\in C} \|u\|_1 < \infty,$$
this is well-defined in $\mathbb{R}$ (2A3Se). If $E$, $F$ are disjoint members of $\Sigma$, then $\int_{E\cup F} u = \int_E u + \int_F u$ for every $u \in C$, so
$$\nu(E \cup F) = \lim_{u\to\mathcal{F}} \int_{E\cup F} u = \lim_{u\to\mathcal{F}} \int_E u + \lim_{u\to\mathcal{F}} \int_F u = \nu E + \nu F$$
(2A3Sf). Thus $\nu : \Sigma \to \mathbb{R}$ is additive. Next, it is truly continuous with respect to $\mu$. P Given $\epsilon > 0$, take $n \in \mathbb{N}$ such that $2^{-n} \le \frac12\epsilon$, set $\delta = \epsilon/2(M_n + 1) > 0$ and observe that
$$|\nu F| \le \sup_{u\in C} \Bigl|\int_F u\Bigr| \le 2^{-n} + M_n \mu(F \cap E_n) \le \epsilon$$
whenever $\mu(F \cap E_n) \le \delta$. Q By the Radon-Nikodým theorem (232E), there is an $f_0 \in \mathcal{L}^1$ such that $\int_F f_0 = \nu F$ for every $F \in \Sigma$. Set $u_0 = f_0^\bullet \in L^1$. If $n \in \mathbb{N}$, $F \in \Sigma$ then
$$\Bigl|\int_F u_0\Bigr| = |\nu F| \le \sup_{u\in C} \Bigl|\int_F u\Bigr| \le 2^{-n} + M_n \mu(F \cap E_n),$$
so $u_0 \in C$.
(iv) Of course the point is that $\mathcal{F}$ converges to $u_0$. P Let $h \in (L^1)^*$. Then there is a $v \in L^\infty$ such that $h(u) = \int u \times v$ for every $u \in C$. Express $v$ as $g^\bullet$, where $g : X \to \mathbb{R}$ is bounded and $\Sigma$-measurable. Let $\epsilon > 0$. Take $a_0 \le a_1 \le \ldots \le a_n$ such that $a_{i+1} - a_i \le \epsilon$ for each $i$ while $a_0 \le g(x) < a_n$ for each $x \in X$. Set $F_i = \{x : a_{i-1} \le g(x) < a_i\}$ for $1 \le i \le n$, and set $\tilde g = \sum_{i=1}^n a_i \chi F_i$, $\tilde v = \tilde g^\bullet$; then $\|\tilde v - v\|_\infty \le \epsilon$. We have
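$$\int u_0 \times \tilde v = \sum_{i=1}^n a_i \int_{F_i} u_0 = \sum_{i=1}^n a_i \nu F_i = \sum_{i=1}^n a_i \lim_{u\to\mathcal{F}} \int_{F_i} u = \lim_{u\to\mathcal{F}} \sum_{i=1}^n a_i \int_{F_i} u = \lim_{u\to\mathcal{F}} \int u \times \tilde v.$$
Consequently
$$\limsup_{u\to\mathcal{F}} \Bigl|\int u \times v - \int u_0 \times v\Bigr| \le \Bigl|\int u_0 \times v - \int u_0 \times \tilde v\Bigr| + \sup_{u\in C} \Bigl|\int u \times v - \int u \times \tilde v\Bigr|$$
$$\le \|u_0\|_1 \|v - \tilde v\|_\infty + \sup_{u\in C} \|u\|_1 \|v - \tilde v\|_\infty \le 2\epsilon \sup_{u\in C} \|u\|_1.$$
As $\epsilon$ is arbitrary,
$$\limsup_{u\to\mathcal{F}} |h(u) - h(u_0)| = \limsup_{u\to\mathcal{F}} \Bigl|\int u \times v - \int u_0 \times v\Bigr| = 0.$$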
As $h$ is arbitrary, $u_0$ is a limit of $\mathcal{F}$ in $C$ for the weak topology of $L^1$. As $\mathcal{F}$ is arbitrary, $C$ is weakly compact in $L^1$, and the proof is complete.
247D Corollary Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be any two measure spaces, and $T : L^1(\mu) \to L^1(\nu)$ a continuous linear operator. Then $T[A]$ is a uniformly integrable subset of $L^1(\nu)$ whenever $A$ is a uniformly integrable subset of $L^1(\mu)$.
proof The point is that $T$ is continuous for the respective weak topologies (2A5If). If $A \subseteq L^1(\mu)$ is uniformly integrable, then there is a weakly compact $C \supseteq A$, by 247C; $T[C]$, being the image of a compact set under a continuous map, must be weakly compact (2A3N(b-ii)); so $T[C]$ and $T[A]$ are uniformly integrable by the other half of 247C.
247E Complex $L^1$ There are no difficulties, and no surprises, in proving 247C for $L^1_{\mathbb{C}}$. If we follow the same proof, everything works, but of course we must remember to change the constant when applying 246F, or rather 246K, in part (b-i) of the proof.
247X Basic exercises
>(a) Let $(X, \Sigma, \mu)$ be any measure space. Show that if $A \subseteq L^1 = L^1(\mu)$ is relatively weakly compact, then $\{v : v \in L^1,\ |v| \le |u| \text{ for some } u \in A\}$ is relatively weakly compact.
(b) Let $(X, \Sigma, \mu)$ be a measure space. On $L^1 = L^1(\mu)$ define pseudometrics $\rho_F$, $\rho'_w$ for $F \in \Sigma$, $w \in L^\infty(\mu)$ by setting $\rho_F(u, v) = |\int_F u - \int_F v|$, $\rho'_w(u, v) = |\int u \times w - \int v \times w|$ for $u$, $v \in L^1$. Show that on any $\|\ \|_1$-bounded subset of $L^1$, the topology defined by $\{\rho_F : F \in \Sigma\}$ agrees with the topology generated by $\{\rho'_w : w \in L^\infty\}$.
>(c) Show that for any set $X$ a subset of $\ell^1 = \ell^1(X)$ is compact for the weak topology of $\ell^1$ iff it is compact for the norm topology of $\ell^1$. (Hint: 246Xd.)
(d) Use the argument of (a-ii) in the proof of 247C to show directly that if $A \subseteq \ell^1(\mathbb{N})$ is weakly compact then $\inf_{n\in\mathbb{N}} |u_n(n)| = 0$ for any sequence $\langle u_n\rangle_{n\in\mathbb{N}}$ in $A$.
(e) Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $T : L^2(\nu) \to L^1(\mu)$ any bounded linear operator. Show that $\{Tu : u \in L^2(\nu),\ \|u\|_2 \le 1\}$ is uniformly integrable in $L^1(\mu)$. (Hint: use 244K to see that $\{u : \|u\|_2 \le 1\}$ is weakly compact in $L^2(\nu)$.)
247Y Further exercises
(a) Let $(X, \Sigma, \mu)$ be a measure space. Take $1 < p < \infty$ and $M \ge 0$ and set $A = \{u : u \in L^p = L^p(\mu),\ \|u\|_p \le M\}$. Write $\mathfrak{S}_A$ for the topology of convergence in measure on $A$, that is, the subspace topology induced by the topology of convergence in measure on $L^0(\mu)$. Show that if $h \in (L^p)^*$ then $h\upharpoonright A$ is continuous for $\mathfrak{S}_A$; so that if $\mathfrak{T}$ is the weak topology on $L^p$, then the subspace topology $\mathfrak{T}_A$ is included in $\mathfrak{S}_A$.
(b) Let $(X, \Sigma, \mu)$ be a measure space and $\langle u_n\rangle_{n\in\mathbb{N}}$ a sequence in $L^1 = L^1(\mu)$ such that $\lim_{n\to\infty} \int_F u_n$ is defined for every $F \in \Sigma$. Show that $\{u_n : n \in \mathbb{N}\}$ is weakly convergent. (Hint: 246Yh.) Find an alternative argument relying on 2A5J and the result of 246Yj.
247 Notes and comments In 247D and 247Xa I try to suggest the power of the identification between weak compactness and uniform integrability. That a continuous image of a weakly compact set should be weakly compact is a commonplace of functional analysis; that the solid hull of a uniformly integrable set should be uniformly integrable is immediate from the definition. But I see no simple arguments to show that a continuous image of a uniformly integrable set should be uniformly integrable, or that the solid hull of a weakly compact set should be relatively weakly compact. (Concerning the former, an alternative route does exist; see 371Xf in the next volume.)
I can distinguish two important ideas in the proof of 247C. The first, in (a-ii) of the proof, is a careful manipulation of sequences; it is the argument needed to show that a weakly compact subset of $\ell^1$ is norm-compact. (You may find it helpful to write out a solution to 247Xd.) The $F_{n(k)}$ and $u_k$ are chosen to mimic the situation in which we have a sequence in $\ell^1$ such that $u_k(k) = 1$ for each $k$. The $k(i)$ are chosen so that the ‘hump’ moves sufficiently rapidly along for $u_{k(j)}(k(i))$ to be very small whenever $i \ne j$. But this means that $\sum_{i=0}^\infty u_{k(j)}(k(i))$ (corresponding to $\int_G u_{k(j)}$ in the proof) is always substantial, while $\sum_{i=0}^\infty v(k(i))$ will be small for any putative cluster point $v$ of $\langle u_{k(j)}\rangle_{j\in\mathbb{N}}$. I used similar techniques in §246; compare 246Yg.
In the other half of the proof of 247C, the strategy is clearer. Members of $L^1$ correspond to truly continuous functionals on $\Sigma$; the uniform integrability of $C$ makes the corresponding set of functionals ‘uniformly truly continuous’, so that any limit functional will also be truly continuous and will give us a member of $L^1$ via the Radon-Nikodým theorem. A straightforward approximation argument ((b-iv) in the proof, and 247Xb) shows that $\lim_{u\to\mathcal{F}} \int u \times w = \int v \times w$ for every $w \in L^\infty$. For localizable measures $\mu$, this would complete the proof.
For the general case, we need another step, here done in 247A-247B; a uniformly integrable subset of $L^1$ effectively lives on a σ-finite part of the measure space, so that we can ignore the rest of the measure and suppose that we have a localizable measure space.
The conditions (ii)-(iv) of 246G make it plain that weak compactness in $L^1$ can be effectively discussed in terms of sequences; see also 246Yh. I should remark that this is a general feature of weak compactness in Banach spaces (2A5J). Of course the disjoint-sequence formulations in 246G are characteristic of $L^1$; I mean that while there are similar results applicable elsewhere (see Fremlin 74, chap. 8), the ideas are clearest and most dramatically expressed in their application to $L^1$.
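A minimal worked instance of the ‘moving hump’ described in these notes, assuming counting measure on $\mathbb{N}$ (so that $L^1 = \ell^1$) and condition (iii) of 246G as it is used in the proof of 247C; the unit vectors $e_k$ are an illustrative choice, not taken from the text.

```latex
% Assumption: counting measure on N, so \int_F u = \sum_{n \in F} u(n).
% e_k denotes the k-th unit vector of \ell^1: e_k(n) = 1 if n = k, else 0.
\[
  A = \{e_k : k \in \mathbb{N}\}, \qquad F_n = \{n\}, \qquad
  \sup_{u \in A} \Bigl|\int_{F_n} u\Bigr| = \sup_{k} |e_k(n)| = 1
  \quad \text{for every } n .
\]
% The sets F_n are disjoint and the suprema do not tend to 0, so A fails
% 246G(iii); by 247C it is not relatively weakly compact.  Directly: a weak
% cluster point w of <e_k> would satisfy, testing against coordinates and
% against the constant sequence 1 in \ell^\infty,
\[
  w(n) = \lim_{k\to\infty} e_k(n) = 0 \ \text{ for each } n,
  \qquad\text{but}\qquad
  \sum_{n=0}^\infty w(n) = 1
  \ \text{ since } \sum_{n=0}^\infty e_k(n) = 1 \text{ for every } k,
\]
% which is impossible.
```

Here the functional $u \mapsto \sum_n u(n)$ plays the role of $u \mapsto \int_G u$ in part (a-ii) of the proof.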
Chapter 25
Product Measures
I come now to another chapter on ‘pure’ measure theory, discussing a fundamental construction (or, as you may prefer to consider it, two constructions, since the problems involved in forming the product of two arbitrary measure spaces (§251) are rather different from those arising in the product of arbitrarily many probability spaces (§254)). This work is going to stretch our technique to the utmost, for while the fundamental theorems to which we are moving are natural aims, the proofs are lengthy and there are many pitfalls beside the true paths.
The central idea is that of ‘repeated integration’. You have probably already seen formulae of the type ‘$\iint f(x, y)\,dx\,dy$’ used to calculate the integral of a function of two real variables over a region in the plane. One of the basic techniques of advanced calculus is reversing the order of integration; for instance, we expect $\int_0^1 (\int_y^1 f(x, y)\,dx)\,dy$ to be equal to $\int_0^1 (\int_0^x f(x, y)\,dy)\,dx$. As I have developed the subject, we already have a third calculation to compare with these two: $\int_D f$, where $D = \{(x, y) : 0 \le y \le x \le 1\}$ and the integral is taken with respect to Lebesgue measure on the plane. The first two sections of this chapter are devoted to an analysis of the relationship between one- and two-dimensional Lebesgue measure which makes these operations valid, some of the time; part of the work has to be devoted to a careful description of the exact conditions which must be imposed on $f$ and $D$ if we are to be safe.
Repeated integration, in one form or another, appears everywhere in measure theory, and it is therefore necessary sooner or later to develop the most general possible expression of the idea. The standard method is through the theory of products of general measure spaces. Given measure spaces $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$, the aim is to find a measure $\lambda$ on $X \times Y$ which will, at least, give the right measure $\mu E \cdot \nu F$ to a ‘rectangle’ $E \times F$ where $E \in \Sigma$ and $F \in \mathrm{T}$.
It turns out that there are already difficulties in deciding what ‘the’ product measure is, and to do the job properly I find I need, even at this stage, to describe two related but distinguishable constructions. These constructions and their elementary properties take up the whole of §251. In §252 I turn to integration over the product, with Fubini's and Tonelli's theorems relating $\int f\,d\lambda$ with $\iint f(x, y)\,\mu(dx)\,\nu(dy)$. Because the construction of $\lambda$ is symmetric between the two factors, this automatically provides theorems relating $\iint f(x, y)\,\mu(dx)\,\nu(dy)$ with $\iint f(x, y)\,\nu(dy)\,\mu(dx)$. §253 looks at the space $L^1(\lambda)$ and its relationship with $L^1(\mu)$ and $L^1(\nu)$.
For general measure spaces, there are obstacles in the way of forming an infinite product; to start with, if $\langle (X_n, \mu_n)\rangle_{n\in\mathbb{N}}$ is a sequence of measure spaces, then a product measure $\lambda$ on $X = \prod_{n\in\mathbb{N}} X_n$ ought to set $\lambda X = \prod_{n=0}^\infty \mu_n X_n$, and there is no guarantee that this product will converge, or behave well when it does. But for probability spaces, when $\mu_n X_n = 1$ for every $n$, this problem at least evaporates. It is possible to define the product of any family of probability spaces; this is the burden of §254.
I end the chapter with three sections which are a preparation for Chapters 27 and 28, but are also important in their own right as an investigation of the way in which the group structure of $\mathbb{R}^r$ interacts with Lebesgue and other measures. §255 deals with the ‘convolution’ $f * g$ of two functions, where $(f * g)(x) = \int f(y) g(x - y)\,dy$ (the integration being with respect to Lebesgue measure). In §257 I show that some of the same ideas, suitably transformed, can be used to describe a convolution $\nu_1 * \nu_2$ of two measures on $\mathbb{R}^r$; in preparation for this I include a section on Radon measures on $\mathbb{R}^r$ (§256).
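A small worked check of the reversal of the order of integration over the triangle $D = \{(x, y) : 0 \le y \le x \le 1\}$ mentioned above. The integrand $f(x, y) = xy$ is an illustrative choice made here, not one from the text; it is continuous and bounded on $D$, so no delicate hypotheses are in play.

```latex
% Assumption: f(x,y) = xy, an illustrative integrand on the triangle D.
\[
  \int_0^1 \Bigl( \int_y^1 xy \, dx \Bigr) dy
  = \int_0^1 \frac{y(1 - y^2)}{2} \, dy
  = \frac12 \Bigl( \frac12 - \frac14 \Bigr)
  = \frac18 ,
\]
\[
  \int_0^1 \Bigl( \int_0^x xy \, dy \Bigr) dx
  = \int_0^1 \frac{x^3}{2} \, dx
  = \frac18 .
\]
```

Both repeated integrals agree, and their common value is what one expects $\int_D f$ to be with respect to Lebesgue measure on the plane.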
251 Finite products
The first construction to set up is the product of a pair of measure spaces. It turns out that there are already substantial technical difficulties in the way of finding a canonical universally applicable method. I find myself therefore describing two related, but distinct, constructions, the ‘primitive’ and ‘c.l.d.’ product measures (251C, 251F). After listing the fundamental properties of the c.l.d. product measure (251I-251J), I work through the identification of the product of Lebesgue measure with itself (251M) and a fairly thorough discussion of subspaces (251N-251R).
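A minimal sketch of how the two constructions can differ, assuming the definitions 251A, 251C and 251F given below; the one-point spaces are an illustrative choice made here and do not appear in the text.

```latex
% Assumption: X = {x} with \Sigma = {\emptyset, X} and \mu X = \infty;
%             Y = {y} with T = {\emptyset, Y} and \nu Y = 1.
% Any countable cover of X \times Y by rectangles must use X \times Y itself,
% so the outer measure of 251A and the primitive product measure give
\[
  \lambda_0(X \times Y) = \theta(X \times Y) = \mu X \cdot \nu Y = \infty .
\]
% The only E \in \Sigma with \mu E < \infty is E = \emptyset, so the
% supremum defining the c.l.d. product measure in 251F is
\[
  \lambda(X \times Y)
  = \sup\{\lambda_0((X \times Y) \cap (E \times F)) :
          \mu E < \infty,\ \nu F < \infty\}
  = \lambda_0(\emptyset) = 0 .
\]
```

Here $\mu$ is not semi-finite, which is why the formula $\lambda(E \times F) = \mu E \cdot \nu F$ is asserted in 251I only when $\mu E \cdot \nu F < \infty$, and in 251J only for semi-finite spaces.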
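251A Definition Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be two measure spaces. For $A \subseteq X \times Y$ set
$$\theta(A) = \inf\Bigl\{\sum_{n=0}^\infty \mu E_n \cdot \nu F_n : E_n \in \Sigma,\ F_n \in \mathrm{T}\ \forall\ n \in \mathbb{N},\ A \subseteq \bigcup_{n\in\mathbb{N}} E_n \times F_n\Bigr\}.$$
Remark In the products $\mu E_n \cdot \nu F_n$, $0 \cdot \infty$ is to be taken as $0$, as in §135.
251B Lemma In the context of 251A, $\theta$ is an outer measure on $X \times Y$.
proof (a) Setting $E_n = F_n = \emptyset$ for every $n \in \mathbb{N}$, we see that $\theta\emptyset = 0$.
(b) If $A \subseteq B \subseteq X \times Y$, then whenever $B \subseteq \bigcup_{n\in\mathbb{N}} E_n \times F_n$ we shall have $A \subseteq \bigcup_{n\in\mathbb{N}} E_n \times F_n$; so $\theta A \le \theta B$.
(c) Let $\langle A_n\rangle_{n\in\mathbb{N}}$ be a sequence of subsets of $X \times Y$, with union $A$. For any $\epsilon > 0$, we may choose, for each $n \in \mathbb{N}$, sequences $\langle E_{nm}\rangle_{m\in\mathbb{N}}$ in $\Sigma$ and $\langle F_{nm}\rangle_{m\in\mathbb{N}}$ in $\mathrm{T}$ such that $A_n \subseteq \bigcup_{m\in\mathbb{N}} E_{nm} \times F_{nm}$ and $\sum_{m=0}^\infty \mu E_{nm} \cdot \nu F_{nm} \le \theta A_n + 2^{-n}\epsilon$. Because $\mathbb{N} \times \mathbb{N}$ is countable, we have a bijection $k \mapsto (n_k, m_k) : \mathbb{N} \to \mathbb{N} \times \mathbb{N}$, and now
$$A \subseteq \bigcup_{n,m\in\mathbb{N}} E_{nm} \times F_{nm} = \bigcup_{k\in\mathbb{N}} E_{n_k m_k} \times F_{n_k m_k},$$
so that
$$\theta A \le \sum_{k=0}^\infty \mu E_{n_k m_k} \cdot \nu F_{n_k m_k} = \sum_{n=0}^\infty \sum_{m=0}^\infty \mu E_{nm} \cdot \nu F_{nm} \le \sum_{n=0}^\infty (\theta A_n + 2^{-n}\epsilon) = 2\epsilon + \sum_{n=0}^\infty \theta A_n.$$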
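As $\epsilon$ is arbitrary, $\theta A \le \sum_{n=0}^\infty \theta A_n$. As $\langle A_n\rangle_{n\in\mathbb{N}}$ is arbitrary, $\theta$ is an outer measure.
251C Definition Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces. By the primitive product measure on $X \times Y$ I shall mean the measure $\lambda_0$ derived by Carathéodory's method (113C) from the outer measure $\theta$ defined in 251A.
Remark I ought to point out that there is no general agreement on what ‘the’ product measure on $X \times Y$ should be. Indeed in 251F below I will introduce an alternative one, and in the notes to this section I will mention a third.
251D Definition It is convenient to have a name for a natural construction for σ-algebras. If $X$ and $Y$ are sets with σ-algebras $\Sigma \subseteq \mathcal{P}X$ and $\mathrm{T} \subseteq \mathcal{P}Y$, I will write $\Sigma \widehat{\otimes} \mathrm{T}$ for the σ-algebra of subsets of $X \times Y$ generated by $\{E \times F : E \in \Sigma,\ F \in \mathrm{T}\}$.
251E Proposition Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces; let $\lambda_0$ be the primitive product measure on $X \times Y$, and $\Lambda$ its domain. Then $\Sigma \widehat{\otimes} \mathrm{T} \subseteq \Lambda$ and $\lambda_0(E \times F) = \mu E \cdot \nu F$ for all $E \in \Sigma$, $F \in \mathrm{T}$.
proof Throughout this proof, write $\Sigma^f = \{E : E \in \Sigma,\ \mu E < \infty\}$, $\mathrm{T}^f = \{F : F \in \mathrm{T},\ \nu F < \infty\}$.
(a) Suppose that $E \in \Sigma$ and $A \subseteq X \times Y$. For any $\epsilon > 0$, there are sequences $\langle E_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$ and $\langle F_n\rangle_{n\in\mathbb{N}}$ in $\mathrm{T}$ such that $A \subseteq \bigcup_{n\in\mathbb{N}} E_n \times F_n$ and $\sum_{n=0}^\infty \mu E_n \cdot \nu F_n \le \theta A + \epsilon$. Now
$$A \cap (E \times Y) \subseteq \bigcup_{n\in\mathbb{N}} (E_n \cap E) \times F_n, \qquad A \setminus (E \times Y) \subseteq \bigcup_{n\in\mathbb{N}} (E_n \setminus E) \times F_n,$$
so
$$\theta(A \cap (E \times Y)) + \theta(A \setminus (E \times Y)) \le \sum_{n=0}^\infty \mu(E_n \cap E) \cdot \nu F_n + \sum_{n=0}^\infty \mu(E_n \setminus E) \cdot \nu F_n = \sum_{n=0}^\infty \mu E_n \cdot \nu F_n \le \theta A + \epsilon.$$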
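As $\epsilon$ is arbitrary, $\theta(A \cap (E \times Y)) + \theta(A \setminus (E \times Y)) \le \theta A$. And this is enough to ensure that $E \times Y \in \Lambda$ (see 113D).
(b) Similarly, $X \times F \in \Lambda$ for every $F \in \mathrm{T}$, so $E \times F = (E \times Y) \cap (X \times F) \in \Lambda$ for every $E \in \Sigma$, $F \in \mathrm{T}$. Because $\Lambda$ is a σ-algebra, it must include the smallest σ-algebra containing all the products $E \times F$, that is, $\Lambda \supseteq \Sigma \widehat{\otimes} \mathrm{T}$.
(c) Take $E \in \Sigma$, $F \in \mathrm{T}$. We know that $E \times F \in \Lambda$; setting $E_0 = E$, $F_0 = F$, $E_n = F_n = \emptyset$ for $n \ge 1$ in the definition of $\theta$, we have
$$\lambda_0(E \times F) = \theta(E \times F) \le \mu E \cdot \nu F.$$
We have come to the central idea of the construction. In fact $\theta(E \times F) = \mu E \cdot \nu F$. P Suppose that $E \times F \subseteq \bigcup_{n\in\mathbb{N}} E_n \times F_n$ where $E_n \in \Sigma$ and $F_n \in \mathrm{T}$ for every $n$. Set $u = \sum_{n=0}^\infty \mu E_n \cdot \nu F_n$. If $u = \infty$ or $\mu E = 0$ or $\nu F = 0$ then of course $\mu E \cdot \nu F \le u$. Otherwise, set
$$I = \{n : n \in \mathbb{N},\ \mu E_n = 0\}, \quad J = \{n : n \in \mathbb{N},\ \nu F_n = 0\}, \quad K = \mathbb{N} \setminus (I \cup J),$$
$$E' = E \setminus \bigcup_{n\in I} E_n, \qquad F' = F \setminus \bigcup_{n\in J} F_n.$$
Then $\mu E' = \mu E$ and $\nu F' = \nu F$; $E' \times F' \subseteq \bigcup_{n\in K} E_n \times F_n$; and for $n \in K$, $\mu E_n < \infty$ and $\nu F_n < \infty$, since $\mu E_n \cdot \nu F_n \le u < \infty$ and neither $\mu E_n$ nor $\nu F_n$ is zero. Set
$$f_n = \nu F_n \chi E_n : X \to \mathbb{R}$$
if $n \in K$, and $f_n = 0 : X \to \mathbb{R}$ if $n \in I \cup J$. Then $f_n$ is a simple function and $\int f_n = \nu F_n\,\mu E_n$ for $n \in K$, $0$ otherwise, so
$$\sum_{n=0}^\infty \int f_n(x)\,\mu(dx) = \sum_{n=0}^\infty \mu E_n \cdot \nu F_n \le u.$$
By B.Levi's theorem (123A), applied to $\langle \sum_{k=0}^n f_k\rangle_{n\in\mathbb{N}}$, $g = \sum_{n=0}^\infty f_n$ is integrable and $\int g\,d\mu \le u$. Write $E''$ for $\{x : x \in E',\ g(x) < \infty\}$, so that $\mu E'' = \mu E' = \mu E$. Now take any $x \in E''$ and set $K_x = \{n : n \in K,\ x \in E_n\}$. Because $E' \times F' \subseteq \bigcup_{n\in K} E_n \times F_n$, $F' \subseteq \bigcup_{n\in K_x} F_n$ and
$$\nu F = \nu F' \le \sum_{n\in K_x} \nu F_n = \sum_{n=0}^\infty f_n(x) = g(x).$$
Thus $g(x) \ge \nu F$ for every $x \in E''$. We are supposing that $0 < \mu E = \mu E''$ and $0 < \nu F$, so we must have $\nu F < \infty$, $\mu E'' < \infty$. Now $g \ge \nu F \chi E''$, so
$$\mu E \cdot \nu F = \mu E'' \cdot \nu F = \int \nu F \chi E'' \le \int g \le u = \sum_{n=0}^\infty \mu E_n \cdot \nu F_n.$$
As $\langle E_n\rangle_{n\in\mathbb{N}}$, $\langle F_n\rangle_{n\in\mathbb{N}}$ are arbitrary, $\theta(E \times F) \ge \mu E \cdot \nu F$ and $\theta(E \times F) = \mu E \cdot \nu F$. Q
Thus $\lambda_0(E \times F) = \theta(E \times F) = \mu E \cdot \nu F$ for all $E \in \Sigma$, $F \in \mathrm{T}$.
251F Definition Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\lambda_0$ the primitive product measure defined in 251C. By the c.l.d. product measure on $X \times Y$ I shall mean the function $\lambda : \operatorname{dom} \lambda_0 \to [0, \infty]$ defined by setting
$$\lambda W = \sup\{\lambda_0(W \cap (E \times F)) : E \in \Sigma,\ F \in \mathrm{T},\ \mu E < \infty,\ \nu F < \infty\}$$
for $W \in \operatorname{dom} \lambda_0$.
251G Remark I had better show at once that $\lambda$ is a measure. P Of course its domain $\Lambda = \operatorname{dom} \lambda_0$ is a σ-algebra, and $\lambda\emptyset = \lambda_0\emptyset = 0$. If $\langle W_n\rangle_{n\in\mathbb{N}}$ is a disjoint sequence in $\Lambda$, then for any $E \in \Sigma$, $F \in \mathrm{T}$ of finite measure
$$\lambda_0\Bigl(\bigcup_{n\in\mathbb{N}} W_n \cap (E \times F)\Bigr) = \sum_{n=0}^\infty \lambda_0(W_n \cap (E \times F)) \le \sum_{n=0}^\infty \lambda W_n,$$
so $\lambda(\bigcup_{n\in\mathbb{N}} W_n) \le \sum_{n=0}^\infty \lambda W_n$. On the other hand, if $a < \sum_{n=0}^\infty \lambda W_n$, then we can find $m \in \mathbb{N}$ and $a_0, \ldots, a_m$ such that $a \le \sum_{n=0}^m a_n$ and $a_n < \lambda W_n$ for each $n \le m$; now there are $E_0, \ldots, E_m \in \Sigma$ and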
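$F_0, \ldots, F_m \in \mathrm{T}$, all of finite measure, such that $a_n \le \lambda_0(W_n \cap (E_n \times F_n))$ for each $n$. Setting $E = \bigcup_{n\le m} E_n$ and $F = \bigcup_{n\le m} F_n$, we have $\mu E < \infty$ and $\nu F < \infty$, so
$$\lambda\Bigl(\bigcup_{n\in\mathbb{N}} W_n\Bigr) \ge \lambda_0\Bigl(\bigcup_{n\in\mathbb{N}} W_n \cap (E \times F)\Bigr) = \sum_{n=0}^\infty \lambda_0(W_n \cap (E \times F)) \ge \sum_{n=0}^m \lambda_0(W_n \cap (E_n \times F_n)) \ge \sum_{n=0}^m a_n \ge a.$$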
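As $a$ is arbitrary, $\lambda(\bigcup_{n\in\mathbb{N}} W_n) \ge \sum_{n=0}^\infty \lambda W_n$ and $\lambda(\bigcup_{n\in\mathbb{N}} W_n) = \sum_{n=0}^\infty \lambda W_n$. As $\langle W_n\rangle_{n\in\mathbb{N}}$ is arbitrary, $\lambda$ is a measure. Q
251H We need a simple property of the measure $\lambda_0$.
Lemma Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be two measure spaces; let $\lambda_0$ be the primitive product measure on $X \times Y$, and $\Lambda$ its domain. If $H \subseteq X \times Y$ and $H \cap (E \times F) \in \Lambda$ whenever $\mu E < \infty$ and $\nu F < \infty$, then $H \in \Lambda$.
proof Let $\theta$ be the outer measure described in 251A. Suppose that $A \subseteq X \times Y$ and $\theta A < \infty$. Let $\epsilon > 0$. Let $\langle E_n\rangle_{n\in\mathbb{N}}$, $\langle F_n\rangle_{n\in\mathbb{N}}$ be sequences in $\Sigma$, $\mathrm{T}$ respectively such that $A \subseteq \bigcup_{n\in\mathbb{N}} E_n \times F_n$ and $\sum_{n=0}^\infty \mu E_n \cdot \nu F_n \le \theta A + \epsilon$. Now, for each $n$, the product of the measures $\mu E_n$, $\nu F_n$ is finite, so either one is zero or both are finite. If $\mu E_n = 0$ or $\nu F_n = 0$ then of course
$$\mu E_n \cdot \nu F_n = 0 = \theta((E_n \times F_n) \cap H) + \theta((E_n \times F_n) \setminus H).$$
If $\mu E_n < \infty$ and $\nu F_n < \infty$ then
$$\mu E_n \cdot \nu F_n = \lambda_0(E_n \times F_n) = \lambda_0((E_n \times F_n) \cap H) + \lambda_0((E_n \times F_n) \setminus H) = \theta((E_n \times F_n) \cap H) + \theta((E_n \times F_n) \setminus H).$$
Accordingly, because $\theta$ is an outer measure,
$$\theta(A \cap H) + \theta(A \setminus H) \le \sum_{n=0}^\infty \theta((E_n \times F_n) \cap H) + \sum_{n=0}^\infty \theta((E_n \times F_n) \setminus H) = \sum_{n=0}^\infty \mu E_n \cdot \nu F_n \le \theta A + \epsilon.$$
As $\epsilon$ is arbitrary, $\theta(A \cap H) + \theta(A \setminus H) \le \theta A$. As $A$ is arbitrary, $H \in \Lambda$.
251I Now for the fundamental properties of the c.l.d. product measure.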
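Theorem Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces; let $\lambda$ be the c.l.d. product measure on $X \times Y$, and $\Lambda$ its domain. Then
(a) $\Sigma \widehat{\otimes} \mathrm{T} \subseteq \Lambda$ and $\lambda(E \times F) = \mu E \cdot \nu F$ whenever $E \in \Sigma$, $F \in \mathrm{T}$ and $\mu E \cdot \nu F < \infty$;
(b) for every $W \in \Lambda$ there is a $V \in \Sigma \widehat{\otimes} \mathrm{T}$ such that $V \subseteq W$ and $\lambda V = \lambda W$;
(c) $(X \times Y, \Lambda, \lambda)$ is complete and locally determined, and in fact is the c.l.d. version of $(X \times Y, \Lambda, \lambda_0)$ as described in 213D-213E; in particular, $\lambda W = \lambda_0 W$ whenever $\lambda_0 W < \infty$;
(d) if $W \in \Lambda$ and $\lambda W > 0$ then there are $E \in \Sigma$, $F \in \mathrm{T}$ such that $\mu E < \infty$, $\nu F < \infty$ and $\lambda(W \cap (E \times F)) > 0$;
(e) if $W \in \Lambda$ and $\lambda W < \infty$, then for every $\epsilon > 0$ there are $E_0, \ldots, E_n \in \Sigma$, $F_0, \ldots, F_n \in \mathrm{T}$, all of finite measure, such that $\lambda(W \triangle \bigcup_{i\le n} (E_i \times F_i)) \le \epsilon$.
proof Take $\theta$ to be the outer measure of 251A and $\lambda_0$ the primitive product measure of 251C. Set $\Sigma^f = \{E : E \in \Sigma,\ \mu E < \infty\}$ and $\mathrm{T}^f = \{F : F \in \mathrm{T},\ \nu F < \infty\}$.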
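(a) By 251E, $\Sigma \widehat{\otimes} \mathrm{T} \subseteq \Lambda$. If $E \in \Sigma$ and $F \in \mathrm{T}$ and $\mu E \cdot \nu F < \infty$, either $\mu E \cdot \nu F = 0$ and $\lambda(E \times F) = \lambda_0(E \times F) = 0$ or both $\mu E$ and $\nu F$ are finite and again $\lambda(E \times F) = \lambda_0(E \times F) = \mu E \cdot \nu F$.
(b)(i) Take any $a < \lambda W$. Then there are $E \in \Sigma^f$, $F \in \mathrm{T}^f$ such that $\lambda_0(W \cap (E \times F)) > a$ (251H); now
$$\theta((E \times F) \setminus W) = \lambda_0((E \times F) \setminus W) = \lambda_0(E \times F) - \lambda_0(W \cap (E \times F)) < \lambda_0(E \times F) - a.$$
Let $\langle E_n\rangle_{n\in\mathbb{N}}$, $\langle F_n\rangle_{n\in\mathbb{N}}$ be sequences in $\Sigma$, $\mathrm{T}$ respectively such that $(E \times F) \setminus W \subseteq \bigcup_{n\in\mathbb{N}} E_n \times F_n$ and $\sum_{n=0}^\infty \mu E_n \cdot \nu F_n \le \lambda_0(E \times F) - a$. Consider
$$V = (E \times F) \setminus \bigcup_{n\in\mathbb{N}} E_n \times F_n \in \Sigma \widehat{\otimes} \mathrm{T};$$
then $V \subseteq W$, and
$$\lambda V = \lambda_0 V = \lambda_0(E \times F) - \lambda_0((E \times F) \setminus V) \ge \lambda_0(E \times F) - \lambda_0\Bigl(\bigcup_{n\in\mathbb{N}} E_n \times F_n\Bigr)$$
(because $(E \times F) \setminus V \subseteq \bigcup_{n\in\mathbb{N}} E_n \times F_n$)
$$\ge \lambda_0(E \times F) - \sum_{n=0}^\infty \mu E_n \cdot \nu F_n \ge a$$
(by the choice of the $E_n$, $F_n$).
(ii) Thus for every $a < \lambda W$ there is a $V \in \Sigma \widehat{\otimes} \mathrm{T}$ such that $V \subseteq W$ and $\lambda V \ge a$. Now choose a sequence $\langle a_n\rangle_{n\in\mathbb{N}}$ strictly increasing to $\lambda W$, and for each $a_n$ a corresponding $V_n$; then $V = \bigcup_{n\in\mathbb{N}} V_n$ belongs to the σ-algebra $\Sigma \widehat{\otimes} \mathrm{T}$, is included in $W$, and has measure at least $\sup_{n\in\mathbb{N}} \lambda V_n$ and at most $\lambda W$; so $\lambda V = \lambda W$, as required.
(c)(i) If $H \subseteq X \times Y$ is $\lambda$-negligible, there is a $W \in \Lambda$ such that $H \subseteq W$ and $\lambda W = 0$. If $E \in \Sigma$, $F \in \mathrm{T}$ are of finite measure, $\lambda_0(W \cap (E \times F)) = 0$; but $\lambda_0$, being derived from the outer measure $\theta$ by Carathéodory's method, is complete (212A), so $H \cap (E \times F) \in \Lambda$ and $\lambda_0(H \cap (E \times F)) = 0$. Because $E$ and $F$ are arbitrary, $H \in \Lambda$, by 251H. As $H$ is arbitrary, $\lambda$ is complete.
(ii) If $W \in \Lambda$ and $\lambda W = \infty$, then there must be $E \in \Sigma$, $F \in \mathrm{T}$ such that $\mu E < \infty$, $\nu F < \infty$ and $\lambda_0(W \cap (E \times F)) > 0$; now
$$0 < \lambda(W \cap (E \times F)) \le \mu E \cdot \nu F < \infty.$$
Thus $\lambda$ is semi-finite.
(iii) If $H \subseteq X \times Y$ and $H \cap W \in \Lambda$ whenever $\lambda W < \infty$, then, in particular, $H \cap (E \times F) \in \Lambda$ whenever $\mu E < \infty$ and $\nu F < \infty$; by 251H again, $H \in \Lambda$. Thus $\lambda$ is locally determined.
(iv) If $W \in \Lambda$ and $\lambda_0 W < \infty$, then we have sequences $\langle E_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$, $\langle F_n\rangle_{n\in\mathbb{N}}$ in $\mathrm{T}$ such that $W \subseteq \bigcup_{n\in\mathbb{N}} (E_n \times F_n)$ and $\sum_{n=0}^\infty \mu E_n \cdot \nu F_n < \infty$. Set
$$I = \{n : \mu E_n = \infty\}, \quad J = \{n : \nu F_n = \infty\}, \quad K = \mathbb{N} \setminus (I \cup J);$$
then $\nu(\bigcup_{n\in I} F_n) = \mu(\bigcup_{n\in J} E_n) = 0$, so $\lambda_0(W \setminus W') = 0$, where
$$W' = W \cap \bigcup_{n\in K} (E_n \times F_n) \supseteq W \setminus \Bigl(\Bigl(\bigcup_{n\in J} E_n \times Y\Bigr) \cup \Bigl(X \times \bigcup_{n\in I} F_n\Bigr)\Bigr).$$
Now set $E'_n = \bigcup_{i\in K, i\le n} E_i$, $F'_n = \bigcup_{i\in K, i\le n} F_i$ for each $n$. We have $W' = \bigcup_{n\in\mathbb{N}} W' \cap (E'_n \times F'_n)$, so
$$\lambda W \le \lambda_0 W = \lambda_0 W' = \lim_{n\to\infty} \lambda_0(W' \cap (E'_n \times F'_n)) \le \lambda W' \le \lambda W,$$
and $\lambda W = \lambda_0 W$.
(v) Following the terminology of 213D, let us write
$$\tilde\Lambda = \{W : W \subseteq X \times Y,\ W \cap V \in \Lambda \text{ whenever } V \in \Lambda,\ \lambda_0 V < \infty\},$$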
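$$\tilde\lambda W = \sup\{\lambda_0(W \cap V) : V \in \Lambda,\ \lambda_0 V < \infty\}.$$
Because $\lambda_0(E \times F) < \infty$ whenever $\mu E < \infty$ and $\nu F < \infty$, $\tilde\Lambda \subseteq \Lambda$ and $\tilde\Lambda = \Lambda$. Now for any $W \in \Lambda$ we have
$$\tilde\lambda W = \sup\{\lambda_0(W \cap V) : V \in \Lambda,\ \lambda_0 V < \infty\} \ge \sup\{\lambda_0(W \cap (E \times F)) : E \in \Sigma^f,\ F \in \mathrm{T}^f\} = \lambda W$$
$$\ge \sup\{\lambda(W \cap V) : V \in \Lambda,\ \lambda_0 V < \infty\} = \sup\{\lambda_0(W \cap V) : V \in \Lambda,\ \lambda_0 V < \infty\},$$
using (iv) just above, so that $\lambda = \tilde\lambda$ is the c.l.d. version of $\lambda_0$.
(d) If $W \in \Lambda$ and $\lambda W > 0$, there are $E \in \Sigma^f$ and $F \in \mathrm{T}^f$ such that $\lambda(W \cap (E \times F)) = \lambda_0(W \cap (E \times F)) > 0$.
(e) There are $E \in \Sigma^f$, $F \in \mathrm{T}^f$ such that $\lambda_0(W \cap (E \times F)) \ge \lambda W - \frac13\epsilon$; set $V_1 = W \cap (E \times F)$; then
$$\lambda(W \setminus V_1) = \lambda W - \lambda V_1 = \lambda W - \lambda_0 V_1 \le \tfrac13\epsilon.$$
There are sequences $\langle E'_n\rangle_{n\in\mathbb{N}}$ in $\Sigma$, $\langle F'_n\rangle_{n\in\mathbb{N}}$ in $\mathrm{T}$ such that $V_1 \subseteq \bigcup_{n\in\mathbb{N}} E'_n \times F'_n$ and $\sum_{n=0}^\infty \mu E'_n \cdot \nu F'_n \le \lambda_0 V_1 + \frac13\epsilon$. Replacing $E'_n$, $F'_n$ by $E'_n \cap E$, $F'_n \cap F$ if necessary, we may suppose that $E'_n \in \Sigma^f$, $F'_n \in \mathrm{T}^f$ for every $n$. Set $V_2 = \bigcup_{n\in\mathbb{N}} E'_n \times F'_n$; then
$$\lambda(V_2 \setminus V_1) \le \lambda_0(V_2 \setminus V_1) \le \sum_{n=0}^\infty \mu E'_n \cdot \nu F'_n - \lambda_0 V_1 \le \tfrac13\epsilon.$$
Let $m \in \mathbb{N}$ be such that $\sum_{n=m+1}^\infty \mu E'_n \cdot \nu F'_n \le \frac13\epsilon$, and set
$$V = \bigcup_{n=0}^m E'_n \times F'_n.$$
Then
$$\lambda(V_2 \setminus V) \le \sum_{n=m+1}^\infty \mu E'_n \cdot \nu F'_n \le \tfrac13\epsilon.$$
Putting these together, we have $W \triangle V \subseteq (W \setminus V_1) \cup (V_2 \setminus V_1) \cup (V_2 \setminus V)$, so
$$\lambda(W \triangle V) \le \lambda(W \setminus V_1) + \lambda(V_2 \setminus V_1) + \lambda(V_2 \setminus V) \le \tfrac13\epsilon + \tfrac13\epsilon + \tfrac13\epsilon = \epsilon.$$
And $V$ is of the required form.
251J Proposition If $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ are semi-finite measure spaces and $\lambda$ is the c.l.d. product measure on $X \times Y$, then $\lambda(E \times F) = \mu E \cdot \nu F$ for all $E \in \Sigma$, $F \in \mathrm{T}$.
proof Setting $\Sigma^f = \{E : E \in \Sigma,\ \mu E < \infty\}$, $\mathrm{T}^f = \{F : F \in \mathrm{T},\ \nu F < \infty\}$, we have
$$\lambda(E \times F) = \sup\{\lambda_0((E \cap E_0) \times (F \cap F_0)) : E_0 \in \Sigma^f,\ F_0 \in \mathrm{T}^f\} = \sup\{\mu(E \cap E_0) \cdot \nu(F \cap F_0) : E_0 \in \Sigma^f,\ F_0 \in \mathrm{T}^f\}$$
$$= \sup\{\mu(E \cap E_0) : E_0 \in \Sigma^f\} \cdot \sup\{\nu(F \cap F_0) : F_0 \in \mathrm{T}^f\} = \mu E \cdot \nu F$$
(using 213A).
251K σ-finite spaces Of course most of the measure spaces we shall apply these results to are σ-finite, and in this case there are some useful simplifications.
Proposition Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be σ-finite measure spaces. Then the c.l.d. product measure on $X \times Y$ is equal to the primitive product measure, and is the completion of its restriction to $\Sigma \widehat{\otimes} \mathrm{T}$; moreover, this common product measure is σ-finite.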
proof Write λ_0, λ for the primitive and c.l.d. product measures, as usual, and Λ for their domain. Let ⟨E_n⟩_{n∈N}, ⟨F_n⟩_{n∈N} be non-decreasing sequences of sets of finite measure covering X, Y respectively (see 211D).
(a) For each n ∈ N, λ(E_n × F_n) = µE_n · νF_n is finite, by 251Ia. Since X × Y = ⋃_{n∈N} E_n × F_n, λ is σ-finite.
(b) For any W ∈ Λ, λ_0 W = lim_{n→∞} λ_0(W ∩ (E_n × F_n)) = lim_{n→∞} λ(W ∩ (E_n × F_n)) = λW. So λ = λ_0.
(c) Write λ_B for the restriction of λ = λ_0 to Σ⊗̂T, and λ̂_B for its completion.
(i) Suppose that W ∈ dom λ̂_B. Then there are W′, W″ ∈ Σ⊗̂T such that W′ ⊆ W ⊆ W″ and λ_B(W″ \ W′) = 0 (212C). In this case, λ(W″ \ W′) = 0; as λ is complete, W ∈ Λ and λW = λW′ = λ_B W′ = λ̂_B W. Thus λ extends λ̂_B.
(ii) If W ∈ Λ, then there is a V ∈ Σ⊗̂T such that V ⊆ W and λ(W \ V) = 0. P P For each n ∈ N there is a V_n ∈ Σ⊗̂T such that V_n ⊆ W ∩ (E_n × F_n) and λV_n = λ(W ∩ (E_n × F_n)) (251Ib). But as λ(E_n × F_n) = µE_n · νF_n is finite, this means that λ(W ∩ (E_n × F_n) \ V_n) = 0. So if we set V = ⋃_{n∈N} V_n, we shall have V ∈ Σ⊗̂T, V ⊆ W and
W \ V = ⋃_{n∈N} W ∩ (E_n × F_n) \ V ⊆ ⋃_{n∈N} W ∩ (E_n × F_n) \ V_n
is λ-negligible. Q Q
Similarly, there is a V′ ∈ Σ⊗̂T such that V′ ⊆ (X × Y) \ W and λ(((X × Y) \ W) \ V′) = 0. Setting V″ = (X × Y) \ V′, we have V″ ∈ Σ⊗̂T, W ⊆ V″ and λ(V″ \ W) = 0. So
λ_B(V″ \ V) = λ(V″ \ V) = λ(V″ \ W) + λ(W \ V) = 0,
and W is measured by λ̂_B, with λ̂_B W = λ_B V = λW. As W is arbitrary, λ̂_B = λ.
251L It is time that I gave some examples. Of course the central example is Lebesgue measure. In this case we have the only reasonable result. I pause to describe the leading example of the product Σ⊗̂T introduced in 251D.
Proposition Let r, s ≥ 1 be integers. Then we have a natural bijection φ : R^r × R^s → R^{r+s}, defined by setting
φ((ξ_1, …, ξ_r), (η_1, …, η_s)) = (ξ_1, …, ξ_r, η_1, …, η_s)
for ξ_1, …, ξ_r, η_1, …, η_s ∈ R. If we write B_r, B_s and B_{r+s} for the Borel σ-algebras of R^r, R^s and R^{r+s} respectively, then φ identifies B_{r+s} with B_r⊗̂B_s.
proof (a) Write B for the σ-algebra {φ^{-1}[W] : W ∈ B_{r+s}} copied onto R^r × R^s by the bijection φ; we are seeking to prove that B = B_r⊗̂B_s. We have maps π_1 : R^{r+s} → R^r, π_2 : R^{r+s} → R^s defined by setting π_1(φ(x, y)) = x, π_2(φ(x, y)) = y. Each coordinate of π_1 is continuous, therefore Borel measurable (121Db), so π_1^{-1}[E] ∈ B_{r+s} for every E ∈ B_r, by 121K. Similarly, π_2^{-1}[F] ∈ B_{r+s} for every F ∈ B_s. So φ[E × F] = π_1^{-1}[E] ∩ π_2^{-1}[F] belongs to B_{r+s}, that is, E × F ∈ B, whenever E ∈ B_r and F ∈ B_s. Because B is a σ-algebra, B_r⊗̂B_s ⊆ B.
(b) Now examine sets of the form
{(x, y) : ξ_i ≤ α} = {x : ξ_i ≤ α} × R^s, {(x, y) : η_j ≤ α} = R^r × {y : η_j ≤ α}
for α ∈ R, i ≤ r and j ≤ s, taking x = (ξ_1, …, ξ_r) and y = (η_1, …, η_s). All of these belong to B_r⊗̂B_s. But the σ-algebra they generate is just B, by 121J. So B ⊆ B_r⊗̂B_s and B = B_r⊗̂B_s.
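To see concretely what the proposition is saying, here is a worked instance for r = s = 1. It is an added illustration rather than part of the original argument, and the particular set H and the use of rational cut points are choices made for this sketch. The closed half-plane H is not a rectangle, but it can be built from countably many rectangles with Borel sides, so it lies in the product σ-algebra of the two Borel σ-algebras, as 251L predicts for a closed (hence Borel) subset of the plane.

```latex
% Worked instance of 251L with r = s = 1 (illustrative; H and the rational
% cut points q are assumptions of this sketch, not notation from the text).
Fix $\alpha\in\mathbb{R}$ and set $H=\{(\xi,\eta):\xi+\eta\le\alpha\}$. Then
\[
H=\bigcap_{n\ge 1}\;\bigcup_{q\in\mathbb{Q}}
  \Bigl(\,\left]-\infty,q\right]\times\left]-\infty,\alpha-q+\tfrac1n\right]\Bigr).
\]
% (\subseteq) If \xi+\eta\le\alpha and n\ge 1, choose q\in\mathbb{Q} with
%   \xi\le q\le\xi+1/n; then \eta\le\alpha-\xi\le\alpha-q+1/n.
% (\supseteq) If for every n there is a q with \xi\le q and
%   \eta\le\alpha-q+1/n, then \xi+\eta\le\alpha+1/n for every n,
%   so \xi+\eta\le\alpha.
```

Each set in the union is a product of two closed half-lines, and only countably many unions and intersections are used, which is all that membership of the product σ-algebra requires.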
206
Product measures
251M
251M Theorem Let r, s ≥ 1 be integers. Then the bijection φ : R^r × R^s → R^{r+s} described in 251L identifies Lebesgue measure on R^{r+s} with the c.l.d. product λ of Lebesgue measure on R^r and Lebesgue measure on R^s.
proof Write µ_r, µ_s, µ_{r+s} for the three versions of Lebesgue measure, µ*_r, µ*_s and µ*_{r+s} for the corresponding outer measures, and θ for the outer measure on R^r × R^s derived from µ_r and µ_s by the formula of 251A.
(a) If I ⊆ R^r and J ⊆ R^s are half-open intervals, then φ[I × J] ⊆ R^{r+s} is also a half-open interval, and µ_{r+s}(φ[I × J]) = µ_r I · µ_s J; this is immediate from the definition of the Lebesgue measure of an interval. (I speak of ‘half-open’ intervals here, that is, intervals of the form ∏_{j=1}^{r} [α_j, β_j[, because I used them in the definition of Lebesgue measure in §115. If you prefer to work with open intervals or closed intervals it makes no difference.) Note also that every half-open interval in R^{r+s} is expressible as φ[I × J] for suitable I, J.
(b) For any A ⊆ R^{r+s}, θ(φ^{-1}[A]) ≤ µ*_{r+s}(A). P P For any ε > 0, there is a sequence ⟨K_n⟩_{n∈N} of half-open intervals in R^{r+s} such that A ⊆ ⋃_{n∈N} K_n and ∑_{n=0}^∞ µ_{r+s}(K_n) ≤ µ*_{r+s}(A) + ε. Express each K_n as φ[I_n × J_n], where I_n and J_n are half-open intervals in R^r and R^s respectively; then φ^{-1}[A] ⊆ ⋃_{n∈N} I_n × J_n, so that
θ(φ^{-1}[A]) ≤ ∑_{n=0}^∞ µ_r I_n · µ_s J_n = ∑_{n=0}^∞ µ_{r+s}(K_n) ≤ µ*_{r+s}(A) + ε.
As ε is arbitrary, we have the result. Q Q
(c) If E ⊆ R^r and F ⊆ R^s are measurable, then µ*_{r+s}(φ[E × F]) ≤ µ_r E · µ_s F. P P (i) Consider first the case µ_r E < ∞, µ_s F < ∞. In this case, given ε > 0, there are sequences ⟨I_n⟩_{n∈N}, ⟨J_n⟩_{n∈N} of half-open intervals such that E ⊆ ⋃_{n∈N} I_n, F ⊆ ⋃_{n∈N} J_n,
∑_{n=0}^∞ µ_r I_n ≤ µ*_r E + ε = µ_r E + ε,
∑_{n=0}^∞ µ_s J_n ≤ µ*_s F + ε = µ_s F + ε.
Accordingly E × F ⊆ ⋃_{m,n∈N} I_m × J_n and φ[E × F] ⊆ ⋃_{m,n∈N} φ[I_m × J_n], so that
\[
\mu^*_{r+s}(\phi[E\times F])
 \le \sum_{m,n=0}^{\infty}\mu_{r+s}(\phi[I_m\times J_n])
 = \sum_{m,n=0}^{\infty}\mu_r I_m\cdot\mu_s J_n
 = \sum_{m=0}^{\infty}\mu_r I_m\cdot\sum_{n=0}^{\infty}\mu_s J_n
 \le (\mu_r E+\epsilon)(\mu_s F+\epsilon).
\]
As ε is arbitrary, we have the result.
(ii) Next, if µ_r E = 0, there is a sequence ⟨F_n⟩_{n∈N} of sets of finite measure covering R^s ⊇ F, so that
µ*_{r+s}(φ[E × F]) ≤ ∑_{n=0}^∞ µ*_{r+s}(φ[E × F_n]) ≤ ∑_{n=0}^∞ µ_r E · µ_s F_n = 0 = µ_r E · µ_s F.
(iii) Similarly, µ*_{r+s}(φ[E × F]) ≤ µ_r E · µ_s F if µ_s F = 0.
(iv) The only remaining case is that in which both of µ_r E, µ_s F are strictly positive and one is infinite; but in this case µ_r E · µ_s F = ∞, so surely µ*_{r+s}(φ[E × F]) ≤ µ_r E · µ_s F. Q Q
(d) If A ⊆ R^{r+s}, then µ*_{r+s}(A) ≤ θ(φ^{-1}[A]). P P Given ε > 0, there are sequences ⟨E_n⟩_{n∈N}, ⟨F_n⟩_{n∈N} of measurable sets in R^r, R^s respectively such that φ^{-1}[A] ⊆ ⋃_{n∈N} E_n × F_n and ∑_{n=0}^∞ µ_r E_n · µ_s F_n ≤ θ(φ^{-1}[A]) + ε. Now A ⊆ ⋃_{n∈N} φ[E_n × F_n], so
µ*_{r+s}(A) ≤ ∑_{n=0}^∞ µ*_{r+s}(φ[E_n × F_n]) ≤ ∑_{n=0}^∞ µ_r E_n · µ_s F_n ≤ θ(φ^{-1}[A]) + ε,
using (c) for the middle inequality. As ε is arbitrary, we have the result. Q Q
(e) Putting (b) and (d) together, we have θ(φ^{-1}[A]) = µ*_{r+s}(A) for every A ⊆ R^{r+s}. Thus θ on R^r × R^s corresponds exactly to µ*_{r+s} on R^{r+s}. So the associated measures λ_0, µ_{r+s} must correspond in the same way, writing λ_0 for the primitive product measure. But 251K tells us that λ_0 = λ, so we have the result.
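The phrase ‘as ε is arbitrary’ in case (i) of part (c) hides a one-line expansion, which may be worth writing out. The display below is an added sketch; the numerical choice of E and F is an assumption made purely for illustration and does not come from the text.

```latex
% Why "as \epsilon is arbitrary" finishes case (c)(i): both measures are
% finite there, so the error terms vanish with \epsilon.
\[
(\mu_rE+\epsilon)(\mu_sF+\epsilon)
  =\mu_rE\cdot\mu_sF+\epsilon\,(\mu_rE+\mu_sF)+\epsilon^2
  \;\longrightarrow\;\mu_rE\cdot\mu_sF
  \quad\text{as }\epsilon\downarrow 0 .
\]
% Illustrative numbers (an assumption of this sketch): r=s=1, E=[0,2[,
% F=[0,3[, so \mu_1E=2, \mu_1F=3.
\[
\mu_2^*(\phi[E\times F])\le(2+\epsilon)(3+\epsilon)=6+5\epsilon+\epsilon^2
\quad\text{for every }\epsilon>0,
\qquad\text{hence}\qquad
\mu_2^*(\phi[E\times F])\le 6 .
\]
```

The finiteness of both factors is what makes the cross term ε(µ_r E + µ_s F) harmless, which is why the infinite and negligible cases are treated separately in (ii)-(iv).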
251N In fact, a large proportion of the applications of the constructions here are to subspaces of Euclidean space, rather than to the whole product R^r × R^s. It would not have been especially difficult to write 251M out to deal with arbitrary subspaces, but I prefer to give a more general description of the product of subspace measures, as I feel that it illuminates the method. I start with a straightforward result on strictly localizable spaces.
Proposition Let (X, Σ, µ) and (Y, T, ν) be strictly localizable measure spaces. Then the c.l.d. product measure on X × Y is strictly localizable; moreover, if ⟨X_i⟩_{i∈I} and ⟨Y_j⟩_{j∈J} are decompositions of X and Y respectively, ⟨X_i × Y_j⟩_{(i,j)∈I×J} is a decomposition of X × Y.
proof Let ⟨X_i⟩_{i∈I} and ⟨Y_j⟩_{j∈J} be decompositions of X, Y respectively. Then ⟨X_i × Y_j⟩_{(i,j)∈I×J} is a disjoint cover of X × Y by measurable sets of finite measure. If W ⊆ X × Y and λW > 0, there are sets E ∈ Σ, F ∈ T such that µE < ∞, νF < ∞ and λ(W ∩ (E × F)) > 0. We know that µE = ∑_{i∈I} µ(E ∩ X_i) and νF = ∑_{j∈J} ν(F ∩ Y_j), so there must be finite sets I_0 ⊆ I, J_0 ⊆ J such that
µE · νF − (∑_{i∈I_0} µ(E ∩ X_i))(∑_{j∈J_0} ν(F ∩ Y_j)) < λ(W ∩ (E × F)).
Setting E′ = ⋃_{i∈I_0} X_i, F′ = ⋃_{j∈J_0} Y_j we have
λ((E × F) \ (E′ × F′)) = λ(E × F) − λ((E ∩ E′) × (F ∩ F′)) < λ(W ∩ (E × F)),
so that λ(W ∩ (E′ × F′)) > 0. There must therefore be some i ∈ I_0, j ∈ J_0 such that λ(W ∩ (X_i × Y_j)) > 0. This shows that {X_i × Y_j : i ∈ I, j ∈ J} satisfies the criterion of 213O, so that λ, being complete and locally determined, must be strictly localizable. Because ⟨X_i × Y_j⟩_{(i,j)∈I×J} covers X × Y, it is actually a decomposition of X × Y (213Ob).
251O Lemma Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and λ the c.l.d. product measure on X × Y. Let λ* be the corresponding outer measure (132B). Then
λ*C = sup{θ(C ∩ (E × F)) : E ∈ Σ, F ∈ T, µE < ∞, νF < ∞}
for every C ⊆ X × Y, where θ is the outer measure of 251A.
proof Write Λ for the domain of λ, Σ^f for {E : E ∈ Σ, µE < ∞}, T^f for {F : F ∈ T, νF < ∞}; set u = sup{θ(C ∩ (E × F)) : E ∈ Σ^f, F ∈ T^f}.
(a) If C ⊆ W ∈ Λ, E ∈ Σ^f and F ∈ T^f, then
θ(C ∩ (E × F)) ≤ θ(W ∩ (E × F)) = λ_0(W ∩ (E × F))
(where λ_0 is the primitive product measure)
≤ λW.
As E and F are arbitrary, u ≤ λW; as W is arbitrary, u ≤ λ*C.
(b) If u = ∞, then of course λ*C = u. Otherwise, let ⟨E_n⟩_{n∈N}, ⟨F_n⟩_{n∈N} be sequences in Σ^f, T^f respectively such that u = sup_{n∈N} θ(C ∩ (E_n × F_n)). Consider C′ = C \ (⋃_{n∈N} E_n × ⋃_{n∈N} F_n). If E ∈ Σ^f and F ∈ T^f, then for every n ∈ N we have
u ≥ θ(C ∩ ((E ∪ E_n) × (F ∪ F_n)))
= θ(C ∩ ((E ∪ E_n) × (F ∪ F_n)) ∩ (E_n × F_n)) + θ(C ∩ ((E ∪ E_n) × (F ∪ F_n)) \ (E_n × F_n))
(because E_n × F_n ∈ Λ, by 251E)
≥ θ(C ∩ (E_n × F_n)) + θ(C′ ∩ (E × F)).
Taking the supremum of the right-hand expression as n varies, we have u ≥ u + θ(C′ ∩ (E × F)), so
λ(C′ ∩ (E × F)) = θ(C′ ∩ (E × F)) = 0.
As E and F are arbitrary, λC′ = 0. But this means that
\[
\lambda^*C \le \lambda^*\Bigl(C\cap\Bigl(\bigcup_{n\in\mathbb N}E_n\times\bigcup_{n\in\mathbb N}F_n\Bigr)\Bigr)+\lambda^*C'
 = \lim_{n\to\infty}\lambda^*\Bigl(C\cap\Bigl(\bigcup_{i\le n}E_i\times\bigcup_{i\le n}F_i\Bigr)\Bigr)
\]
(using 132Ae) ≤ u, as required.
251P Proposition Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and A ⊆ X, B ⊆ Y subsets; write µ_A, ν_B for the subspace measures on A, B respectively. Let λ be the c.l.d. product measure on X × Y, and λ# the subspace measure it induces on A × B. Let λ̃ be the c.l.d. product measure of µ_A and ν_B on A × B. Then
(i) λ̃ extends λ#.
(ii) If either (α) A ∈ Σ and B ∈ T or (β) A and B can both be covered by sequences of sets of finite measure or (γ) µ and ν are both strictly localizable, then λ̃ = λ#.
proof Let θ be the outer measure on X × Y defined from µ and ν by the formula of 251A, and θ̃ the outer measure on A × B similarly defined from µ_A and ν_B. Write Λ for the domain of λ, Λ̃ for the domain of λ̃, and Λ# = {W ∩ (A × B) : W ∈ Λ} for the domain of λ#. Set Σ^f = {E : µE < ∞}, T^f = {F : νF < ∞}.
(a) The first point to observe is that θ̃C = θC for every C ⊆ A × B. P P (i) If ⟨E_n⟩_{n∈N} and ⟨F_n⟩_{n∈N} are sequences in Σ, T respectively such that C ⊆ ⋃_{n∈N} E_n × F_n, then
C = C ∩ (A × B) ⊆ ⋃_{n∈N} (E_n ∩ A) × (F_n ∩ B),
so
\[
\tilde\theta C \le \sum_{n=0}^{\infty}\mu_A(E_n\cap A)\cdot\nu_B(F_n\cap B)
 = \sum_{n=0}^{\infty}\mu^*(E_n\cap A)\cdot\nu^*(F_n\cap B)
 \le \sum_{n=0}^{\infty}\mu E_n\cdot\nu F_n .
\]
As ⟨E_n⟩_{n∈N} and ⟨F_n⟩_{n∈N} are arbitrary, θ̃C ≤ θC.
(ii) If ⟨Ẽ_n⟩_{n∈N}, ⟨F̃_n⟩_{n∈N} are sequences in Σ_A = dom µ_A, T_B = dom ν_B respectively such that C ⊆ ⋃_{n∈N} Ẽ_n × F̃_n, then for each n ∈ N we can choose E_n ∈ Σ, F_n ∈ T such that
Ẽ_n ⊆ E_n, µE_n = µ*Ẽ_n = µ_A Ẽ_n,
F̃_n ⊆ F_n, νF_n = ν*F̃_n = ν_B F̃_n,
and now
\[
\theta C \le \sum_{n=0}^{\infty}\mu E_n\cdot\nu F_n
 = \sum_{n=0}^{\infty}\mu_A\tilde E_n\cdot\nu_B\tilde F_n .
\]
As ⟨Ẽ_n⟩_{n∈N}, ⟨F̃_n⟩_{n∈N} are arbitrary, θC ≤ θ̃C. Q Q
(b) It follows that Λ# ⊆ Λ̃. P P Suppose that V ∈ Λ# and that C ⊆ A × B. In this case there is a W ∈ Λ such that V = W ∩ (A × B). So
θ̃(C ∩ V) + θ̃(C \ V) = θ(C ∩ W) + θ(C \ W) = θC = θ̃C.
As C is arbitrary, V ∈ Λ̃. Q Q
Accordingly, for V ∈ Λ#,
λ#V = λ*V = sup{θ(V ∩ (E × F)) : E ∈ Σ^f, F ∈ T^f}
= sup{θ(V ∩ (Ẽ × F̃)) : Ẽ ∈ Σ_A, F̃ ∈ T_B, µ_A Ẽ < ∞, ν_B F̃ < ∞}
= sup{θ̃(V ∩ (Ẽ × F̃)) : Ẽ ∈ Σ_A, F̃ ∈ T_B, µ_A Ẽ < ∞, ν_B F̃ < ∞} = λ̃V,
using 251O twice. This proves part (i) of the proposition.
(c) The next thing to observe is that if V ∈ Λ̃ and V ⊆ E × F where E ∈ Σ^f and F ∈ T^f, then V ∈ Λ#. P P Let W ⊆ E × F be a measurable envelope of V with respect to λ (132Ee). Then
θ(W ∩ (A × B) \ V) = θ̃(W ∩ (A × B) \ V) = λ̃(W ∩ (A × B) \ V)
(because W ∩ (A × B) ∈ Λ# ⊆ Λ̃, V ∈ Λ̃)
= λ̃(W ∩ (A × B)) − λ̃V ≤ θ(W ∩ (A × B)) − θV
(because V ⊆ (E ∩ A) × (F ∩ B), and µ_A(E ∩ A) and ν_B(F ∩ B) are both finite, so λ̃V = θ̃V = θV)
= λ*(W ∩ (A × B)) − λ*V ≤ λW − λ*V = 0.
But this means that W′ = W ∩ (A × B) \ V ∈ Λ and V = (A × B) ∩ (W \ W′) belongs to Λ#. Q Q
(d) Now fix any V ∈ Λ̃, and look at the conditions (α)-(γ) of part (ii) of the proposition.
(α) If A ∈ Σ and B ∈ T, and C ⊆ X × Y, then A × B ∈ Λ (251E), so
θ(C ∩ V) + θ(C \ V) = θ(C ∩ V) + θ((C \ V) ∩ (A × B)) + θ((C \ V) \ (A × B))
= θ̃(C ∩ V) + θ̃(C ∩ (A × B) \ V) + θ(C \ (A × B))
= θ̃(C ∩ (A × B)) + θ(C \ (A × B))
= θ(C ∩ (A × B)) + θ(C \ (A × B)) = θC.
As C is arbitrary, V ∈ Λ, so V = V ∩ (A × B) ∈ Λ#.
(β) If A ⊆ ⋃_{n∈N} E_n and B ⊆ ⋃_{n∈N} F_n where all the E_n, F_n are of finite measure, then V = ⋃_{m,n∈N} V ∩ (E_m × F_n) ∈ Λ#, by (c).
(γ) If ⟨X_i⟩_{i∈I}, ⟨Y_j⟩_{j∈J} are decompositions of X, Y respectively, then for each i ∈ I, j ∈ J we have V ∩ (X_i × Y_j) ∈ Λ#, that is, there is a W_{ij} ∈ Λ such that V ∩ (X_i × Y_j) = W_{ij} ∩ (A × B). Now ⟨X_i × Y_j⟩_{(i,j)∈I×J} is a decomposition of X × Y for λ (251N), so that
W = ⋃_{i∈I, j∈J} W_{ij} ∩ (X_i × Y_j) ∈ Λ,
and V = W ∩ (A × B) ∈ Λ#.
(e) Thus any of the three conditions is sufficient to ensure that Λ̃ = Λ#, in which case (i) tells us that λ̃ = λ#.
251Q Corollary Let r, s ≥ 1 be integers, and φ : R^r × R^s → R^{r+s} the natural bijection. If A ⊆ R^r and B ⊆ R^s, then the restriction of φ to A × B identifies the product of Lebesgue measure on A and Lebesgue measure on B with Lebesgue measure on φ[A × B] ⊆ R^{r+s}.
Remark Note that by ‘Lebesgue measure on A’ I mean the subspace measure µ_{rA} on A induced by r-dimensional Lebesgue measure µ_r on R^r, whether or not A is itself a measurable set.
proof By 251P, using either of the conditions (ii-β) or (ii-γ), the product measure λ̃ on A × B is just the subspace measure λ# on A × B induced by the product measure λ on R^r × R^s. But by 251M we know that φ is an isomorphism between (R^r × R^s, λ) and (R^{r+s}, µ_{r+s}); so it must also identify λ̃ with the subspace measure on φ[A × B].
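The reason condition (ii-β) of 251P is available for every pair of subsets of Euclidean space, measurable or not, can be written in one line. The display is an added sketch; the covering by cubes is one convenient choice among many and is not taken from the text.

```latex
% Every subset of R^r is covered by a sequence of sets of finite measure:
\[
\mathbb{R}^r=\bigcup_{n\ge 1}[-n,n[^{\,r},\qquad
\mu_r\bigl([-n,n[^{\,r}\bigr)=(2n)^r<\infty .
\]
% Hence any A \subseteq R^r satisfies A \subseteq \bigcup_{n\ge 1}[-n,n[^r,
% and likewise any B \subseteq R^s, which is exactly hypothesis (ii-\beta)
% of 251P for the pair (A, B).
```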
251R Corollary Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and λ the c.l.d. product measure on X × Y. If A ⊆ X and B ⊆ Y can be covered by sequences of sets of finite measure, then λ*(A × B) = µ*A · ν*B.
proof In the language of 251P,
λ*(A × B) = λ#(A × B) = λ̃(A × B)
(by 251P(ii-β))
= µ_A A · ν_B B
(by 251K and 251E)
= µ*A · ν*B.
251S The next proposition gives an idea of how the technical definitions here fit together.
Proposition Let (X, Σ, µ) and (Y, T, ν) be measure spaces. Write (X, Σ̂, µ̂) and (X, Σ̃, µ̃) for the completion and c.l.d. version of (X, Σ, µ) (212C, 213E). Let λ, λ̂ and λ̃ be the three c.l.d. product measures on X × Y obtained from the pairs (µ, ν), (µ̂, ν) and (µ̃, ν) of factor measures. Then λ = λ̂ = λ̃.
proof Write Λ, Λ̂ and Λ̃ for the domains of λ, λ̂, λ̃ respectively; and θ, θ̂, θ̃ for the outer measures on X × Y obtained by the formula of 251A from the three pairs of factor measures.
(a) If E ∈ Σ and µE < ∞, then θ, θ̂ and θ̃ agree on subsets of E × Y. P P Take A ⊆ E × Y and ε > 0.
(i) There are sequences ⟨E_n⟩_{n∈N} in Σ, ⟨F_n⟩_{n∈N} in T such that A ⊆ ⋃_{n∈N} E_n × F_n and ∑_{n=0}^∞ µE_n · νF_n ≤ θA + ε. Now µ̃E_n ≤ µE_n for every n (213F), so
θ̃A ≤ ∑_{n=0}^∞ µ̃E_n · νF_n ≤ ∑_{n=0}^∞ µE_n · νF_n ≤ θA + ε.
(ii) There are sequences ⟨Ê_n⟩_{n∈N} in Σ̂, ⟨F̂_n⟩_{n∈N} in T such that A ⊆ ⋃_{n∈N} Ê_n × F̂_n and ∑_{n=0}^∞ µ̂Ê_n · νF̂_n ≤ θ̂A + ε. Now for each n there is an E′_n ∈ Σ such that Ê_n ⊆ E′_n and µE′_n = µ̂Ê_n, so that
θA ≤ ∑_{n=0}^∞ µE′_n · νF̂_n = ∑_{n=0}^∞ µ̂Ê_n · νF̂_n ≤ θ̂A + ε.
(iii) There are sequences ⟨Ẽ_n⟩_{n∈N} in Σ̃, ⟨F̃_n⟩_{n∈N} in T such that A ⊆ ⋃_{n∈N} Ẽ_n × F̃_n and ∑_{n=0}^∞ µ̃Ẽ_n · νF̃_n ≤ θ̃A + ε. Now for each n, Ẽ_n ∩ E ∈ Σ̂, so
θ̂A ≤ ∑_{n=0}^∞ µ̂(Ẽ_n ∩ E) · νF̃_n ≤ ∑_{n=0}^∞ µ̃Ẽ_n · νF̃_n ≤ θ̃A + ε.
(iv) Since A and ε are arbitrary, θ = θ̂ = θ̃ on P(E × Y). Q Q
(b) Consequently, the outer measures λ*, λ̂* and λ̃* are identical. P P Use 251O. Take A ⊆ X × Y, E ∈ Σ, Ê ∈ Σ̂, Ẽ ∈ Σ̃, F ∈ T such that µE, µ̂Ê, µ̃Ẽ and νF are all finite. Then
(i)
θ(A ∩ (E × F)) = θ̂(A ∩ (E × F)) ≤ λ̂*A,
θ(A ∩ (E × F)) = θ̃(A ∩ (E × F)) ≤ λ̃*A
because µ̂E and µ̃E are both finite.
(ii) There is an E′ ∈ Σ such that Ê ⊆ E′ and µE′ < ∞, so that
θ̂(A ∩ (Ê × F)) ≤ θ̂(A ∩ (E′ × F)) = θ(A ∩ (E′ × F)) ≤ λ*A.
(iii) There is an E″ ∈ Σ such that E″ ⊆ Ẽ and µ̃(Ẽ \ E″) = 0 (213Fc), so that θ̃((Ẽ \ E″) × Y) = 0 and µE″ < ∞; accordingly
θ̃(A ∩ (Ẽ × F)) = θ̃(A ∩ (E″ × F)) = θ(A ∩ (E″ × F)) ≤ λ*A.
(iv) Taking the supremum over E, Ê, Ẽ and F, we get
λ*A ≤ λ̂*A, λ*A ≤ λ̃*A, λ̂*A ≤ λ*A, λ̃*A ≤ λ*A.
As A is arbitrary, λ* = λ̂* = λ̃*. Q Q
(c) Now λ, λ̂ and λ̃ are all complete and locally determined, so by 213C are the measures defined by Carathéodory’s method from their own outer measures, and are therefore identical.
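A familiar special case may help to fix what 251S is for. The sketch below is an added illustration; the symbol for the Borel restriction of Lebesgue measure and the functional notation for the c.l.d. product are introduced here for convenience and are not the text's notation. It relies on the standard fact that Lebesgue measure is the completion of its restriction to the Borel sets, and on applying 251S to each factor in turn (the second by the symmetry of the construction).

```latex
% \mu_1 : Lebesgue measure on R.
% \mu_{\mathcal B} : its restriction to the Borel \sigma-algebra (notation
%   assumed for this sketch); \mu_1 is the completion of \mu_{\mathcal B}.
% \lambda(\cdot,\cdot) : c.l.d. product measure of a pair of factor measures.
\[
\lambda(\mu_{\mathcal B},\mu_{\mathcal B})
 =\lambda(\mu_1,\mu_{\mathcal B})
 =\lambda(\mu_1,\mu_1).
\]
% First equality: 251S applied to the first factor.
% Second equality: 251S applied to the second factor, using the symmetry
%   of the product construction.
% By 251M the right-hand side is identified with Lebesgue measure on R^2.
```

So starting from the Borel versions of the factor measures still produces full Lebesgue measure on the plane, because the c.l.d. product completes automatically.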
251T It is ‘obvious’, and an easy consequence of theorems so far proved, that the set {(x, x) : x ∈ R} is negligible for Lebesgue measure on R^2. The corresponding result is true in the square of any atomless measure space.
Proposition Let (X, Σ, µ) be an atomless measure space, and let λ be the c.l.d. product measure on X × X. Then ∆ = {(x, x) : x ∈ X} is λ-negligible.
proof Let E, F ∈ Σ be sets of finite measure, and n ∈ N. Applying 215D repeatedly, we can find a disjoint S µF for each i; setting Fn = F \ i 0. f
(v) If W ∈ Λ and λW < ∞, then for every ε > 0 there are n ∈ N and E_{0i}, …, E_{ni} ∈ Σ_i, for each i ∈ I, such that λ(W △ ⋃_{k≤n} ∏_{i∈I} E_{ki}) ≤ ε.
(g) If each µ_i is σ-finite, so is λ, and λ = λ_0 is the completion of its restriction to ⊗̂_{i∈I} Σ_i.
(h) If ⟨I_j⟩_{j∈J} is any partition of I, then λ can be identified with the c.l.d. product of ⟨λ_j⟩_{j∈J}, where λ_j is the c.l.d. product of ⟨µ_i⟩_{i∈I_j}. (See the arguments in 251M and also in 254N below.)
(i) If I = {1, …, n} and each µ_i is Lebesgue measure on R, then λ can be identified with Lebesgue measure on R^n.
(j) If, for each i ∈ I, we have a decomposition ⟨X_{ij}⟩_{j∈J_i} of X_i, then ⟨∏_{i∈I} X_{i,f(i)}⟩_{f∈∏_{i∈I} J_i} is a decomposition of X.
(k) For any C ⊆ X,
λ*C = sup{θ(C ∩ ∏_{i∈I} E_i) : E_i ∈ Σ_i^f for every i ∈ I}.
(l) Suppose that A_i ⊆ X_i for each i ∈ I. Write λ# for the subspace measure on A = ∏_{i∈I} A_i, and λ̃ for the c.l.d. product of the subspace measures on the A_i. Then λ̃ extends λ#, and if
either A_i ∈ Σ_i for every i
or every A_i can be covered by a sequence of sets of finite measure
or every µ_i is strictly localizable,
then λ̃ = λ#.
(m) If A_i ⊆ X_i can be covered by a sequence of sets of finite measure for each i ∈ I, then λ*(∏_{i∈I} A_i) = ∏_{i∈I} µ_i*A_i.
(n) Writing µ̂_i, µ̃_i for the completion and c.l.d. version of each µ_i, λ is the c.l.d. product of ⟨µ̂_i⟩_{i∈I} and also of ⟨µ̃_i⟩_{i∈I}.
(o) If all the (X_i, Σ_i, µ_i) are the same atomless measure space, then {x : x ∈ X, i ↦ x(i) is injective} is λ-conegligible.
251X Basic exercises (a) Let (X, Σ, µ) and (Y, T, ν) be measure spaces; let λ_0 be the primitive product measure on X × Y, and λ the c.l.d. product measure. Show that λ_0 W < ∞ iff λW < ∞ and W is included in a set of the form
(E × Y) ∪ (X × F) ∪ ⋃_{n∈N} E_n × F_n
where µE = νF = 0 and µE_n < ∞, νF_n < ∞ for every n.
> (b) Show that if X and Y are any sets, with their respective counting measures, then the primitive and c.l.d. product measures on X × Y are both counting measure on X × Y.
(c) Let (X, Σ, µ) and (Y, T, ν) be measure spaces; let λ_0 be the primitive product measure on X × Y, and λ the c.l.d. product measure. Show that
λ_0 is locally determined ⇐⇒ λ_0 is semi-finite ⇐⇒ λ_0 = λ ⇐⇒ λ_0, λ have the same negligible sets.
> (d) (See 251W.) Let ⟨(X_i, Σ_i, µ_i)⟩_{i∈I} be a family of measure spaces, where I is a non-empty finite set. Set X = ∏_{i∈I} X_i. For A ⊆ X, set
θ(A) = inf{∑_{n=0}^∞ ∏_{i∈I} µ_i E_{ni} : E_{ni} ∈ Σ_i ∀ n ∈ N, i ∈ I, A ⊆ ⋃_{n∈N} ∏_{i∈I} E_{ni}}.
Show that θ is an outer measure on X. Let λ_0 be the measure defined from θ by Carathéodory’s method, and for W ∈ dom λ_0 set
λW = sup{λ_0(W ∩ ∏_{i∈I} E_i) : E_i ∈ Σ_i, µ_i E_i < ∞ for every i ∈ I}.
Show that λ is a measure on X, and is the c.l.d. version of λ_0.
> (e) (See 251W.) Let I be a non-empty finite set and ⟨(X_i, Σ_i, µ_i)⟩_{i∈I} a family of measure spaces. For non-empty K ⊆ I set X^{(K)} = ∏_{i∈K} X_i and let λ_0^{(K)}, λ^{(K)} be the measures on X^{(K)} constructed as in 251Xd. Show that if K is a non-empty proper subset of I, then the natural bijection between X^{(I)} and X^{(K)} × X^{(I\K)} identifies λ_0^{(I)} with the primitive product measure of λ_0^{(K)} and λ_0^{(I\K)}, and λ^{(I)} with the c.l.d. product measure of λ^{(K)} and λ^{(I\K)}.
> (f) Using 251Xd-251Xe above, or otherwise, show that if (X_1, Σ_1, µ_1), (X_2, Σ_2, µ_2), (X_3, Σ_3, µ_3) are measure spaces then the primitive and c.l.d. product measures λ_0, λ of (X_1 × X_2) × X_3, constructed by first taking the appropriate product measure on X_1 × X_2 and then taking the product of this with the measure of X_3, are identified with the corresponding product measures on X_1 × (X_2 × X_3) by the canonical bijection between the sets (X_1 × X_2) × X_3 and X_1 × (X_2 × X_3).
(g) (i) What happens in 251Xd when I is a singleton? (ii) Devise an appropriate convention to make 251Xd-251Xe remain valid when one or more of the sets I, K, I \ K there is empty.
> (h) Let (X, Σ, µ) be a complete locally determined measure space, and I any non-empty set; let ν be counting measure on I. Show that the c.l.d. product measure on X × I is equal to (or at any rate identifiable with) the direct sum measure of the family ⟨(X_i, Σ_i, µ_i)⟩_{i∈I}, if we set (X_i, Σ_i, µ_i) = (X, Σ, µ) for every i.
> (i) Let ⟨(X_i, Σ_i, µ_i)⟩_{i∈I} be a family of measure spaces, with direct sum (X, Σ, µ) (214K). Let (Y, T, ν) be any measure space, and give X × Y, X_i × Y their c.l.d. product measures. Show that the natural bijection between X × Y and Z = ⋃_{i∈I} ((X_i × Y) × {i}) is an isomorphism between the measure of X × Y and the direct sum measure on Z.
> (j) Let (X, Σ, µ) be any measure space, and Y a singleton set {y}; let ν be the measure on Y such that νY = 1. Show that the natural bijection between X × {y} and X identifies the primitive product measure on X × {y} with µ̌ as defined in 213Xa, and the c.l.d. product measure with the c.l.d. version of µ. Explain how to put this together with 251Xf and 251Ic to prove 251S.
> (k) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and λ the c.l.d. product measure on X × Y. Show that λ is the c.l.d. version of its restriction to Σ⊗̂T.
(l) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, with primitive and c.l.d. product measures λ_0, λ. Let λ_1 be any measure with domain Σ⊗̂T such that λ_1(E × F) = µE · νF for every E ∈ Σ, F ∈ T. Show that λW ≤ λ_1 W ≤ λ_0 W for every W ∈ Σ⊗̂T.
(m) Let (X, Σ, µ) and (Y, T, ν) be two measure spaces, and λ_0 the primitive product measure on X × Y. Show that the corresponding outer measure λ_0* is just the outer measure θ of 251A.
(n) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and A ⊆ X, B ⊆ Y subsets; write µ_A, ν_B for the subspace measures. Let λ_0 be the primitive product measure on X × Y, and λ_0# the subspace measure it induces on A × B. Let λ̃_0 be the primitive product measure of µ_A and ν_B on A × B. Show that λ̃_0 extends λ_0#. Show that if either (α) A ∈ Σ and B ∈ T or (β) A and B can both be covered by sequences of sets of finite measure or (γ) µ and ν are both strictly localizable, then λ̃_0 = λ_0#.
(o) Let (X, Σ, µ) and (Y, T, ν) be any measure spaces, and λ_0 the primitive product measure on X × Y. Show that λ_0*(A × B) = µ*A · ν*B for any A ⊆ X, B ⊆ Y.
(p) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and µ̂ the completion of µ. Show that µ, ν and µ̂, ν have the same primitive product measures.
(q) Let (X, Σ, µ) be an atomless measure space, and (Y, T, ν) any measure space. Show that the c.l.d. product measure on X × Y is atomless.
(r) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and λ the c.l.d. product measure on X × Y. (i) Show that if µ and ν are purely atomic, so is λ. (ii) Show that if µ and ν are point-supported, so is λ.
(s) Let (X, Σ, µ) be a semi-finite measure space. Show that µ is atomless iff the diagonal {(x, x) : x ∈ X} is negligible for the c.l.d. product measure on X × X.
251Y Further exercises (a) Let X, Y be sets with σ-algebras of subsets Σ, T. Suppose that h : X × Y → R is Σ⊗̂T-measurable and φ : X → Y is (Σ, T)-measurable (121Yb). Show that x ↦ h(x, φ(x)) : X → R is Σ-measurable.
(b) Let (X, Σ, µ) be a complete locally determined measure space with a subspace A whose measure is not locally determined (see 216Xb). Set Y = {0}, νY = 1 and consider the c.l.d. product measures on X × Y and A × Y; write Λ, Λ̃ for their domains. Show that Λ̃ properly includes {W ∩ (A × Y) : W ∈ Λ}.
(c) Let (X, Σ, µ) be any measure space, (Y, T, ν) an atomless measure space, and f : X → Y a (Σ, T)-measurable function. Show that {(x, f(x)) : x ∈ X} is negligible for the c.l.d. product measure on X × Y.
251 Notes and comments There are real difficulties in deciding which construction to declare as ‘the’ product of two arbitrary measures. My phrase ‘primitive product measure’, and notation λ_0, betray a bias; my own preference is for the c.l.d. product λ, for two principal reasons. The first is that λ_0 is likely to be ‘bad’, in particular, not semi-finite, even if µ and ν are ‘good’ (251Xc, 252Yf), while λ inherits some of the most important properties of µ and ν (see 251N); the second is that in the case of topological measure spaces X and Y, there is often a canonical topological measure on X × Y, which is likely to be more closely related to λ than to λ_0. But for elucidation of this point I must ask you to wait until §417 in Volume 4.
It would be possible to remove the ‘primitive’ product measure entirely from the exposition, or at least to relegate it to the exercises. This is indeed what I expect to do in the rest of this treatise, since (in my view) all significant features of product measures on finitely many factors can be expressed in terms of the c.l.d. product measure.
For the first introduction to product measures, however, a direct approach to the c.l.d. product measure (through the description of λ* in 251O, for instance) is an uncomfortably large bite, and I have some sort of duty to present the most natural rival to the c.l.d. product measure prominently enough for you to judge for yourself whether I am right to dismiss it. There certainly are results associated with the primitive product measure (251Xm, 251Xo, 252Yc) which have an agreeable simplicity.
The clash is avoided altogether, of course, if we specialize immediately to σ-finite spaces, in which the two constructions coincide (251K). But even this does not solve all problems. There is a popular alternative measure often called ‘the’ product measure: the restriction λ_{0B} of λ_0 to the σ-algebra Σ⊗̂T. (See, for instance, Halmos 50.) The advantage of this is that if a function f on X × Y is Σ⊗̂T-measurable, then x ↦ f(x, y) is Σ-measurable for every y ∈ Y. (This is because
{W : W ⊆ X × Y, {x : (x, y) ∈ W} ∈ Σ ∀ y ∈ Y}
is a σ-algebra of subsets of X × Y containing E × F for every E ∈ Σ, F ∈ T, and therefore including Σ⊗̂T.) The primary objection, to my mind, is that Lebesgue measure on R^2 is no longer ‘the’ product of Lebesgue measure on R with itself. Generally, it is right to seek measures which measure as many sets as possible, and I prefer to face up to the technical problems (which I acknowledge are off-putting) by seeking appropriate definitions on the approach to major theorems, rather than rely on ad hoc fixes when the time comes to apply them.
I omit further examples of product measures for the moment, because the investigation of particular examples will be much easier with the aid of results from the next section. Of course the leading example, and the one which should come always to mind in response to the words ‘product measure’, is Lebesgue measure on R^2, the case r = s = 1 of 251M and 251Q.
For an indication of what can happen when one of the factors is not σ-finite, you could look ahead to 252K.
I hope that you will see that the definition of the outer measure θ in 251A corresponds to the standard definition of Lebesgue outer measure, with ‘measurable rectangles’ E × F taking the place of intervals, and the functional E × F ↦ µE · νF taking the place of ‘length’ or ‘volume’ of an interval; moreover, thinking of E and F as intervals, there is an obvious relation between Lebesgue measure on R^2 and the product measure on R × R. Of course an ‘obvious relationship’ is not the same thing as a proper theorem with exact hypotheses and conclusions, but Theorem 251M is clearly central. Long before that, however, there is
252A
Fubini’s theorem
215
another parallel between the construction of 251A and that of Lebesgue measure. In both cases, the proof that we have an outer measure comes directly from the defining formula (in 113Yd I gave as an exercise a general result covering 251B), and consequently a very general construction can lead us to a measure. But the measure would be of far less interest and value if it did not measure, and measure correctly, the basic sets, in this case the measurable rectangles. Thus 251E corresponds to the theorem that intervals are Lebesgue measurable, with the right measure (114Db, 114F). This is the real key to the construction, and is one of the fundamental ideas of measure theory.
Yet another parallel is in 251Xm; the outer measure defining the primitive product measure λ_0 is exactly equal to the outer measure defined from λ_0. I described the corresponding phenomenon for Lebesgue measure in 132C.
Any construction which claims the title ‘canonical’ must satisfy a variety of natural requirements; for instance, one expects the canonical bijection between X × Y and Y × X to be an isomorphism between the corresponding product measure spaces. ‘Commutativity’ of the product in this sense is I think obvious from the definitions in 251A-251C. It is obviously desirable – not, I think, obviously true – that the product should be ‘associative’ in that the canonical bijection between (X × Y) × Z and X × (Y × Z) should also be an isomorphism between the corresponding products of product measures. This is in fact valid for both the primitive and c.l.d. product measures (251W, 251Xd-251Xf).
Working through the classification of measure spaces presented in §211, we find that the primitive product measure λ_0 of arbitrary factor measures µ, ν is complete, while the c.l.d. product measure λ is always complete and locally determined. λ_0 may not be semi-finite, even if µ and ν are strictly localizable (252Yf); but λ will be strictly localizable if µ and ν are (251N). Of course this is associated with the fact that the c.l.d. product measure is distributive over direct sums (251Xi). If either µ or ν is atomless, so is λ (251Xq). Both λ and λ_0 are σ-finite if µ and ν are (251K). It is possible for both µ and ν to be localizable but λ not (254U).
At least if you have worked through Chapter 21, you have now done enough ‘pure’ measure theory for this kind of investigation, however straightforward, to raise a good many questions. Apart from direct sums, we also have the constructions of ‘completion’, ‘subspace’, ‘outer measure’ and (in particular) ‘c.l.d. version’ to integrate into the new ideas; I offer some results in 251S and 251Xj. Concerning subspaces, some possibly surprising difficulties arise. The problem is that the product measure on the product of two subspaces can have a larger domain than one might expect. I give a simple example in 251Yb and a more elaborate one in 254Ye. For strictly localizable spaces, there is no problem (251P); but no other criterion drawn from the list of properties considered in §251 seems adequate to remove the possibility of a disconcerting phenomenon.
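Before leaving §251 it may be useful to see the covering idea behind 251T in the one case where everything can be computed by hand. The display is an added sketch for Lebesgue measure on the unit interval only; the equal subdivision into n pieces is an assumption of this illustration, standing in for the repeated splitting that an atomless measure allows in general.

```latex
% The diagonal of the unit square is covered by n small squares of side 1/n.
% (If x \in [0,1[ and i = \lfloor nx \rfloor, then x \in [i/n,(i+1)/n[.)
\[
\Delta\cap[0,1[^{\,2}\;\subseteq\;\bigcup_{i<n}
   \bigl[\tfrac{i}{n},\tfrac{i+1}{n}\bigr[\times
   \bigl[\tfrac{i}{n},\tfrac{i+1}{n}\bigr[ ,
\qquad
\lambda^*\bigl(\Delta\cap[0,1[^{\,2}\bigr)\le n\cdot\frac{1}{n^2}=\frac1n .
\]
% As n is arbitrary, \Delta\cap[0,1[^2 is negligible.  The whole diagonal of
% R^2 is a countable union of translates of this set, so it is negligible too.
```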
252 Fubini’s theorem
Perhaps the most important feature of the concept of ‘product measure’ is the fact that we can use it to discuss repeated integrals. In this section I give versions of Fubini’s theorem and Tonelli’s theorem (252B, 252G) with a variety of corollaries, the most useful ones being versions for σ-finite spaces (252C, 252H). As applications I describe the relationship between integration and measuring ordinate sets (252N) and calculate the r-dimensional volume of a ball in R^r (252Q, 252Xh). I mention counter-examples showing the difficulties which can arise with non-σ-finite measures and non-integrable functions (252K-252L, 252Xf-252Xg).
252A Repeated integrals Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and f a real-valued function defined on a set dom f ⊆ X × Y. We can seek to form the repeated integral
\[
\iint f(x,y)\,\nu(dy)\,\mu(dx)=\int\Bigl(\int f(x,y)\,\nu(dy)\Bigr)\mu(dx),
\]
which should be interpreted as follows: set
\[
D=\Bigl\{x : x\in X,\ \int f(x,y)\,\nu(dy)\text{ is defined in }[-\infty,\infty]\Bigr\},
\qquad
g(x)=\int f(x,y)\,\nu(dy)\ \text{ for } x\in D,
\]
and then write ∫∫ f(x, y)ν(dy)µ(dx) = ∫ g(x)µ(dx) if this is defined. Of course the subset of Y on which y ↦ f(x, y) is defined may vary with x, but it must always be conegligible, as must D.
Similarly, exchanging the roles of X and Y, we can seek a repeated integral
\[
\iint f(x,y)\,\mu(dx)\,\nu(dy)=\int\Bigl(\int f(x,y)\,\mu(dx)\Bigr)\nu(dy).
\]
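A small computation shows the two repeated integrals side by side. It is an added illustration: the choice of both factor spaces as the unit interval with Lebesgue measure, and of the integrand, is an assumption of this sketch and is not drawn from the text.

```latex
% X = Y = [0,1] with Lebesgue measure, f(x,y) = x y^2 (illustrative choice).
% Here D = [0,1] and g(x) = x/3.
\[
\int_0^1 xy^2\,dy=\frac{x}{3},\qquad
\iint f(x,y)\,\nu(dy)\,\mu(dx)=\int_0^1\frac{x}{3}\,dx=\frac16 ,
\]
% Exchanging the roles of X and Y:
\[
\int_0^1 xy^2\,dx=\frac{y^2}{2},\qquad
\iint f(x,y)\,\mu(dx)\,\nu(dy)=\int_0^1\frac{y^2}{2}\,dy=\frac16 .
\]
```

Here the two orders agree, as they must for a bounded measurable function on a space of finite measure; the theorems of this section identify the hypotheses under which such agreement is guaranteed.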
The point is that, under appropriate conditions on µ and ν, we can relate these repeated integrals to each other by connecting them both with the integral of f itself with respect to the product measure on X × Y.
As will become apparent shortly, it is essential here to allow oneself to discuss the integral of a function which is not everywhere defined. It is of less importance whether one allows integrands and integrals to take infinite values, but for definiteness let me say that I shall be following the rules of 135F; that is, ∫f = ∫f⁺ − ∫f⁻ provided that f is defined almost everywhere, takes values in [−∞, ∞] and is virtually measurable, and at most one of ∫f⁺, ∫f⁻ is infinite.
252B Theorem Let (X, Σ, µ) and (Y, T, ν) be measure spaces, with c.l.d. product (X × Y, Λ, λ) (251F). Suppose that ν is σ-finite and that µ is either strictly localizable or complete and locally determined. Let f be a [−∞, ∞]-valued function such that ∫f dλ is defined in [−∞, ∞]. Then ∫∫ f(x, y)ν(dy)µ(dx) is defined and is equal to ∫f dλ.
proof The proof of this result involves substantial technical difficulties. If you have not seen these ideas before, you should almost certainly not go straight to the full generality of the version announced above. I will therefore start by writing out a proof in the case in which both µ and ν are totally finite; this is already lengthy enough. I will present it in such a way that only the central section (part (b) below) needs to be amended in the general case, and then, after completing the proof of the special case, I will give the alternative version of (b) which is required for the full result.
(a) Write L for the family of [0, ∞]-valued functions f such that ∫f dλ and ∫∫ f(x, y)ν(dy)µ(dx) are defined and equal. My aim is to show first that f ∈ L whenever f is non-negative and ∫f dλ is defined, and then to look at differences of functions in L.
To prove that enough functions belong to $\mathcal{L}$, my strategy will be to start with 'elementary' functions and work outwards through progressively larger classes. It is most efficient to begin by describing ways of building new members of $\mathcal{L}$ from old, as follows.

(i) $f_1 + f_2 \in \mathcal{L}$ for all $f_1, f_2 \in \mathcal{L}$, and $cf \in \mathcal{L}$ for all $f \in \mathcal{L}$, $c \in [0,\infty[$; this is because
$$\int (f_1+f_2)(x,y)\,\nu(dy) = \int f_1(x,y)\,\nu(dy) + \int f_2(x,y)\,\nu(dy),$$
$$\int (cf)(x,y)\,\nu(dy) = c\int f(x,y)\,\nu(dy)$$
whenever the right-hand sides are defined, which we are supposing to be the case for almost every $x$, so that
$$\begin{aligned}
\iint (f_1+f_2)(x,y)\,\nu(dy)\mu(dx) &= \iint f_1(x,y)\,\nu(dy)\mu(dx) + \iint f_2(x,y)\,\nu(dy)\mu(dx)\\
&= \int f_1\,d\lambda + \int f_2\,d\lambda = \int (f_1+f_2)\,d\lambda,
\end{aligned}$$
$$\iint (cf)(x,y)\,\nu(dy)\mu(dx) = c\iint f(x,y)\,\nu(dy)\mu(dx) = c\int f\,d\lambda = \int (cf)\,d\lambda.$$
(ii) If $\langle f_n\rangle_{n\in\mathbb{N}}$ is a sequence in $\mathcal{L}$ such that $f_n(x,y) \le f_{n+1}(x,y)$ whenever $n \in \mathbb{N}$ and $(x,y) \in \operatorname{dom} f_n \cap \operatorname{dom} f_{n+1}$, then $\sup_{n\in\mathbb{N}} f_n \in \mathcal{L}$. P Set $f = \sup_{n\in\mathbb{N}} f_n$; for $x \in X$, $n \in \mathbb{N}$ set $g_n(x) = \int f_n(x,y)\,\nu(dy)$ when the integral is defined in $[0,\infty]$. Since here I am allowing $\infty$ as a value of a function, it is natural to regard $f$ as defined on $\bigcap_{n\in\mathbb{N}} \operatorname{dom} f_n$. By B.Levi's theorem, $\int f\,d\lambda = \sup_{n\in\mathbb{N}} \int f_n\,d\lambda$; write $u$ for this common value in $[0,\infty]$. Next, because $f_n \le f_{n+1}$ wherever both are defined, $g_n \le g_{n+1}$ wherever both are defined, for each $n$; we are supposing that $f_n \in \mathcal{L}$, so $g_n$ is defined $\mu$-almost everywhere for each $n$, and
$$\sup_{n\in\mathbb{N}} \int g_n\,d\mu = \sup_{n\in\mathbb{N}} \int f_n\,d\lambda = u.$$
By B.Levi's theorem again, $\int g\,d\mu = u$, where $g = \sup_{n\in\mathbb{N}} g_n$. Now take any $x \in \bigcap_{n\in\mathbb{N}} \operatorname{dom} g_n$, and consider the functions $f_{xn}$ on $Y$, setting $f_{xn}(y) = f_n(x,y)$ whenever this is defined. Each $f_{xn}$ has an integral in $[0,\infty]$, and $f_{xn}(y) \le f_{x,n+1}(y)$ whenever both are defined, and
$$\sup_{n\in\mathbb{N}} \int f_{xn}\,d\nu = g(x);$$
so, using B.Levi's theorem for a third time, $\int (\sup_{n\in\mathbb{N}} f_{xn})\,d\nu$ is defined and equal to $g(x)$, that is,
$$\int f(x,y)\,\nu(dy) = g(x).$$
This is true for almost every $x$, so
$$\iint f(x,y)\,\nu(dy)\mu(dx) = \int g\,d\mu = u = \int f\,d\lambda.$$
Thus $f \in \mathcal{L}$, as claimed. Q

(iii) The expression of the ideas in the next section of the proof will go more smoothly if I introduce another term. Write $\mathcal{W}$ for $\{W : W \subseteq X \times Y,\ \chi W \in \mathcal{L}\}$. Then
(α) if $W, W' \in \mathcal{W}$ and $W \cap W' = \emptyset$, $W \cup W' \in \mathcal{W}$ by (i), because $\chi(W \cup W') = \chi W + \chi W'$,
(β) $\bigcup_{n\in\mathbb{N}} W_n \in \mathcal{W}$ whenever $\langle W_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence in $\mathcal{W}$ and $\sup_{n\in\mathbb{N}} \lambda W_n$ is finite, because $\langle \chi W_n\rangle_{n\in\mathbb{N}} \uparrow \chi W$, where $W = \bigcup_{n\in\mathbb{N}} W_n$, and we can use (ii).
It is also helpful to note that, for any $W \subseteq X \times Y$ and any $x \in X$, $\int \chi W(x,y)\,\nu(dy) = \nu W[\{x\}]$, at least whenever $W[\{x\}] = \{y : (x,y) \in W\}$ is measured by $\nu$. Moreover, because $\lambda$ is complete, a set $W \subseteq X \times Y$ belongs to $\Lambda$ iff $\chi W$ is $\lambda$-virtually measurable iff $\int \chi W\,d\lambda$ is defined in $[0,\infty]$, and in this case $\lambda W = \int \chi W\,d\lambda$.

(iv) Finally, we need to observe that, in appropriate circumstances, the difference of two members of $\mathcal{W}$ will belong to $\mathcal{W}$: if $W, W' \in \mathcal{W}$ and $W \subseteq W'$ and $\lambda W' < \infty$, then $W' \setminus W \in \mathcal{W}$. P We are supposing that $g(x) = \int \chi W(x,y)\,\nu(dy)$ and $g'(x) = \int \chi W'(x,y)\,\nu(dy)$ are defined for almost every $x$, and that $\int g\,d\mu = \lambda W$, $\int g'\,d\mu = \lambda W'$. Because $\lambda W'$ is finite, $g'$ must be finite almost everywhere, and $D = \{x : x \in \operatorname{dom} g \cap \operatorname{dom} g',\ g'(x) < \infty\}$ is conegligible. Now, for any $x \in D$, both $g(x)$ and $g'(x)$ are finite, so $y \mapsto \chi(W' \setminus W)(x,y) = \chi W'(x,y) - \chi W(x,y)$ is the difference of two integrable functions, and
$$\begin{aligned}
\int \chi(W' \setminus W)(x,y)\,\nu(dy) &= \int \chi W'(x,y) - \chi W(x,y)\,\nu(dy)\\
&= \int \chi W'(x,y)\,\nu(dy) - \int \chi W(x,y)\,\nu(dy) = g'(x) - g(x).
\end{aligned}$$
Accordingly
$$\iint \chi(W' \setminus W)(x,y)\,\nu(dy)\mu(dx) = \int g'(x) - g(x)\,\mu(dx) = \lambda W' - \lambda W = \lambda(W' \setminus W),$$
and $W' \setminus W$ belongs to $\mathcal{W}$. Q
(Of course the argument just above can be shortened by a few words if we allow ourselves to assume that $\mu$ and $\nu$ are totally finite, since then $g(x)$ and $g'(x)$ will be finite whenever they are defined; but the key idea, that the difference of integrable functions is integrable, is unchanged.)

(b) Now let us examine the class $\mathcal{W}$, assuming that $\mu$ and $\nu$ are totally finite.

(i) $E \times F \in \mathcal{W}$ for all $E \in \Sigma$, $F \in \mathrm{T}$. P $\lambda(E \times F) = \mu E \cdot \nu F$ (251J), and
$$\int \chi(E \times F)(x,y)\,\nu(dy) = \nu F\,\chi E(x)$$
for each $x$, so
$$\begin{aligned}
\iint \chi(E \times F)(x,y)\,\nu(dy)\mu(dx) &= \int (\nu F\,\chi E(x))\,\mu(dx) = \mu E \cdot \nu F\\
&= \lambda(E \times F) = \int \chi(E \times F)\,d\lambda. \text{ Q}
\end{aligned}$$
(ii) Let $\mathcal{E}$ be $\{E \times F : E \in \Sigma,\ F \in \mathrm{T}\}$. Then $\mathcal{E}$ is closed under finite intersections (because $(E \times F) \cap (E' \times F') = (E \cap E') \times (F \cap F')$) and is included in $\mathcal{W}$. In particular, $X \times Y \in \mathcal{W}$. But this, together with (a-iv) and (a-iii-β) above, means that $\mathcal{W}$ is a Dynkin class (definition: 136A), so includes the σ-algebra of subsets of $X \times Y$ generated by $\mathcal{E}$, by the Monotone Class Theorem (136B); that is, $\mathcal{W} \supseteq \Sigma \widehat{\otimes} \mathrm{T}$ (definition: 251D).

(iii) Next, $W \in \mathcal{W}$ whenever $W \subseteq X \times Y$ is $\lambda$-negligible. P By 251Ib, there is a $V \in \Sigma \widehat{\otimes} \mathrm{T}$ such that $V \subseteq (X \times Y) \setminus W$ and $\lambda V = \lambda((X \times Y) \setminus W)$. Because $\lambda(X \times Y) = \mu X \cdot \nu Y$ is finite, $V' = (X \times Y) \setminus V$ is $\lambda$-negligible, and we have $W \subseteq V' \in \Sigma \widehat{\otimes} \mathrm{T}$. Consequently
$$0 = \lambda V' = \iint \chi V'(x,y)\,\nu(dy)\mu(dx).$$
But this means that
$$D = \Bigl\{x : \int \chi V'(x,y)\,\nu(dy) \text{ is defined and equal to } 0\Bigr\}$$
is conegligible. If $x \in D$, then we must have $\chi V'(x,y) = 0$ for $\nu$-almost every $y$, that is, $V'[\{x\}]$ is negligible; in which case $W[\{x\}] \subseteq V'[\{x\}]$ is also negligible, and $\int \chi W(x,y)\,\nu(dy) = 0$. And this is true for every $x \in D$, so $\int \chi W(x,y)\,\nu(dy)$ is defined and equal to $0$ for almost every $x$, and
$$\iint \chi W(x,y)\,\nu(dy)\mu(dx) = 0 = \lambda W,$$
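as required. Q

(iv) It follows that $\Lambda \subseteq \mathcal{W}$. P If $W \in \Lambda$, then, by 251Ib again, there is a $V \in \Sigma \widehat{\otimes} \mathrm{T}$ such that $V \subseteq W$ and $\lambda V = \lambda W$, so that $\lambda(W \setminus V) = 0$. Now $V \in \mathcal{W}$ by (ii) and $W \setminus V \in \mathcal{W}$ by (iii), so $W \in \mathcal{W}$ by (a-iii-α). Q

(c) I return to the class $\mathcal{L}$.

(i) If $f \in \mathcal{L}$ and $g$ is a $[0,\infty]$-valued function defined and equal to $f$ $\lambda$-a.e., then $g \in \mathcal{L}$. P Set
$$W = (X \times Y) \setminus \{(x,y) : (x,y) \in \operatorname{dom} f \cap \operatorname{dom} g,\ f(x,y) = g(x,y)\},$$
so that $\lambda W = 0$. (Remember that $\lambda$ is complete.) By (b), $\iint \chi W(x,y)\,\nu(dy)\mu(dx) = 0$, that is, $W[\{x\}]$ is $\nu$-negligible for $\mu$-almost every $x$. Let $D$ be $\{x : x \in X,\ W[\{x\}] \text{ is } \nu\text{-negligible}\}$. Then $D$ is $\mu$-conegligible. If $x \in D$, then
$$W[\{x\}] = Y \setminus \{y : (x,y) \in \operatorname{dom} f \cap \operatorname{dom} g,\ f(x,y) = g(x,y)\}$$
is negligible, so that $\int f(x,y)\,\nu(dy) = \int g(x,y)\,\nu(dy)$ if either is defined. Thus the functions
$$x \mapsto \int f(x,y)\,\nu(dy), \qquad x \mapsto \int g(x,y)\,\nu(dy)$$
are equal almost everywhere, and
$$\iint g(x,y)\,\nu(dy)\mu(dx) = \iint f(x,y)\,\nu(dy)\mu(dx) = \int f\,d\lambda = \int g\,d\lambda,$$
so that $g \in \mathcal{L}$. Q

(ii) Now let $f$ be any non-negative function such that $\int f\,d\lambda$ is defined in $[0,\infty]$. Then $f \in \mathcal{L}$. P For $k, n \in \mathbb{N}$ set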
$$W_{nk} = \{(x,y) : (x,y) \in \operatorname{dom} f,\ f(x,y) \ge 2^{-n}k\}.$$
Because $\lambda$ is complete and $f$ is $\lambda$-virtually measurable and $\operatorname{dom} f$ is conegligible, every $W_{nk}$ belongs to $\Lambda$, so $\chi W_{nk} \in \mathcal{L}$, by (b). Set $f_n = \sum_{k=1}^{4^n} 2^{-n} \chi W_{nk}$, so that
$$f_n(x,y) = \begin{cases} 2^{-n}k & \text{if } k \le 4^n \text{ and } 2^{-n}k \le f(x,y) < 2^{-n}(k+1),\\ 2^n & \text{if } f(x,y) \ge 2^n,\\ 0 & \text{if } (x,y) \in (X \times Y) \setminus \operatorname{dom} f. \end{cases}$$
By (a-i), $f_n \in \mathcal{L}$ for every $n \in \mathbb{N}$, while $\langle f_n\rangle_{n\in\mathbb{N}}$ is non-decreasing, so $f' = \sup_{n\in\mathbb{N}} f_n \in \mathcal{L}$, by (a-ii). But $f =_{\text{a.e.}} f'$, so $f \in \mathcal{L}$, by (i) just above. Q

(iii) Finally, let $f$ be any $[-\infty,\infty]$-valued function such that $\int f\,d\lambda$ is defined in $[-\infty,\infty]$. Then $\int f^+\,d\lambda$, $\int f^-\,d\lambda$ are both defined and at most one is infinite. By (ii), both $f^+$ and $f^-$ belong to $\mathcal{L}$. Set $g(x) = \int f^+(x,y)\,\nu(dy)$, $h(x) = \int f^-(x,y)\,\nu(dy)$ whenever these are defined; then $\int g\,d\mu = \int f^+\,d\lambda$ and $\int h\,d\mu = \int f^-\,d\lambda$ are both defined in $[0,\infty]$.

Suppose first that $\int f^-\,d\lambda$ is finite. Then $\int h\,d\mu$ is finite, so $h$ must be finite $\mu$-almost everywhere; set
$$D = \{x : x \in \operatorname{dom} g \cap \operatorname{dom} h,\ h(x) < \infty\}.$$
For any $x \in D$, $\int f^+(x,y)\,\nu(dy)$ and $\int f^-(x,y)\,\nu(dy)$ are defined in $[0,\infty]$, and the latter is finite; so
$$\int f(x,y)\,\nu(dy) = \int f^+(x,y)\,\nu(dy) - \int f^-(x,y)\,\nu(dy) = g(x) - h(x)$$
is defined in $]-\infty,\infty]$. Because $D$ is conegligible,
$$\begin{aligned}
\iint f(x,y)\,\nu(dy)\mu(dx) &= \int g(x) - h(x)\,\mu(dx) = \int g\,d\mu - \int h\,d\mu\\
&= \int f^+\,d\lambda - \int f^-\,d\lambda = \int f\,d\lambda,
\end{aligned}$$
as required.

Thus we have the result when $\int f^-\,d\lambda$ is finite. Similarly, or by applying the argument above to $-f$, $\iint f(x,y)\,\nu(dy)\mu(dx) = \int f\,d\lambda$ if $\int f^+\,d\lambda$ is finite. Thus the theorem is proved, at least when $\mu$ and $\nu$ are totally finite.

(b*) The only point in the argument above where we needed to know anything special about the measures $\mu$ and $\nu$ was in part (b), when showing that $\Lambda \subseteq \mathcal{W}$. I now return to this point under the hypotheses of the theorem as stated, that $\nu$ is σ-finite and $\mu$ is either strictly localizable or complete and locally determined.

(i) It will be helpful to note that the completion $\hat\mu$ of $\mu$ (212C) is identical with its c.l.d. version $\tilde\mu$ (213E). P If $\mu$ is strictly localizable, then $\hat\mu = \tilde\mu$ by 213Ha. If $\mu$ is complete and locally determined, then $\hat\mu = \mu = \tilde\mu$ (212D, 213Hf). Q

(ii) Write $\Sigma^f = \{G : G \in \Sigma,\ \mu G < \infty\}$, $\mathrm{T}^f = \{H : H \in \mathrm{T},\ \nu H < \infty\}$. For $G \in \Sigma^f$, $H \in \mathrm{T}^f$ let $\mu_G$, $\nu_H$ and $\lambda_{G\times H}$ be the subspace measures on $G$, $H$ and $G \times H$ respectively; then $\lambda_{G\times H}$ is the c.l.d. product measure of $\mu_G$ and $\nu_H$ (251P(ii-α)). Now $W \cap (G \times H) \in \mathcal{W}$ for every $W \in \Lambda$. P $W \cap (G \times H)$ belongs to the domain of $\lambda_{G\times H}$, so by (b) of this proof, applied to the totally finite measures $\mu_G$ and $\nu_H$,
$$\begin{aligned}
\lambda(W \cap (G \times H)) &= \lambda_{G\times H}(W \cap (G \times H))\\
&= \int_G \int_H \chi(W \cap (G \times H))(x,y)\,\nu_H(dy)\mu_G(dx)\\
&= \int_G \int_Y \chi(W \cap (G \times H))(x,y)\,\nu(dy)\mu_G(dx)
\end{aligned}$$
(because $\chi(W \cap (G \times H))(x,y) = 0$ if $y \in Y \setminus H$, so we can use 131E)
$$= \int_X \int_Y \chi(W \cap (G \times H))(x,y)\,\nu(dy)\mu(dx)$$
by 131E again, because $\int_Y \chi(W \cap (G \times H))(x,y)\,\nu(dy) = 0$ if $x \in X \setminus G$. So $W \cap (G \times H) \in \mathcal{W}$. Q
(iii) In fact, $W \in \mathcal{W}$ for every $W \in \Lambda$. P Remember that we are supposing that $\nu$ is σ-finite. Let $\langle Y_n\rangle_{n\in\mathbb{N}}$ be a non-decreasing sequence in $\mathrm{T}^f$ covering $Y$, and for each $n \in \mathbb{N}$ set $W_n = W \cap (X \times Y_n)$, $g_n(x) = \int \chi W_n(x,y)\,\nu(dy)$ whenever this is defined. For any $G \in \Sigma^f$,
$$\int_G g_n\,d\mu = \iint \chi(W \cap (G \times Y_n))(x,y)\,\nu(dy)\mu(dx)$$
is defined and equal to $\lambda(W \cap (G \times Y_n))$, by (ii). But this means, first, that $G \setminus \operatorname{dom} g_n$ is negligible, that is, that $\hat\mu(G \setminus \operatorname{dom} g_n) = 0$. Since this is so whenever $\mu G$ is finite, $\tilde\mu(X \setminus \operatorname{dom} g_n) = 0$, and $g_n$ is defined $\tilde\mu$-a.e.; but $\tilde\mu = \hat\mu$, so $g_n$ is defined $\hat\mu$-a.e., that is, $\mu$-a.e. (212Eb). Next, if we set
$$E_{na} = \{x : x \in \operatorname{dom} g_n,\ g_n(x) \ge a\}$$
for $a \in \mathbb{R}$, then $E_{na} \cap G \in \hat\Sigma$ whenever $G \in \Sigma^f$, where $\hat\Sigma$ is the domain of $\hat\mu$; by the definition in 213D, $E_{na}$ is measured by $\tilde\mu = \hat\mu$. As $a$ is arbitrary, $g_n$ is $\mu$-virtually measurable (212Fa). We can therefore speak of $\int g_n\,d\mu$. Now
$$\iint \chi W_n(x,y)\,\nu(dy)\mu(dx) = \int g_n\,d\mu = \sup_{G \in \Sigma^f} \int_G g_n$$
(213B, because $\mu$ is semi-finite)
$$= \sup_{G \in \Sigma^f} \lambda(W \cap (G \times Y_n)) = \lambda(W \cap (X \times Y_n))$$
by the definition in 251F. Thus $W \cap (X \times Y_n) \in \mathcal{W}$. This is true for every $n \in \mathbb{N}$. Because $\langle Y_n\rangle_{n\in\mathbb{N}} \uparrow Y$, $W \in \mathcal{W}$, by (a-iii-β). Q

(iv) We can therefore return to part (c) of the argument above and conclude as before.

252C The theorem above is of course asymmetric, in that different hypotheses are imposed on the two factor measures $\mu$ and $\nu$. If we want a 'symmetric' theorem we have to suppose that they are both σ-finite, as follows.

Corollary Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be two σ-finite measure spaces, and $\lambda$ the c.l.d. product measure on $X \times Y$. If $f$ is $\lambda$-integrable, then $\iint f(x,y)\,\nu(dy)\mu(dx)$ and $\iint f(x,y)\,\mu(dx)\nu(dy)$ are defined, finite and equal.

proof Since $\mu$ and $\nu$ are surely strictly localizable (211Lc), we can apply 252B from either side to conclude that
$$\iint f(x,y)\,\nu(dy)\mu(dx) = \int f\,d\lambda = \iint f(x,y)\,\mu(dx)\nu(dy).$$
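The symmetric statement in 252C can be checked numerically on a simple case. The following is a minimal sketch, not part of the text: it takes Lebesgue measure on the rectangle $[0,1] \times [0,2]$, an illustrative integrand $f(x,y) = x e^{-xy}$ chosen here because its integral has the closed form $(1 + e^{-2})/2$, and midpoint-rule approximations (the helper names `midpoints`, `repeated_dy_dx` and `repeated_dx_dy` are hypothetical) to the two repeated integrals.

```python
import math

def midpoints(a, b, n):
    """Return the midpoints of n equal subintervals of [a, b] and their common width."""
    h = (b - a) / n
    return [a + (i + 0.5) * h for i in range(n)], h

def repeated_dy_dx(f, x_range, y_range, n=200):
    """Midpoint-rule approximation of the repeated integral with y innermost."""
    xs, hx = midpoints(*x_range, n)
    ys, hy = midpoints(*y_range, n)
    return sum(sum(f(x, y) for y in ys) * hy for x in xs) * hx

def repeated_dx_dy(f, x_range, y_range, n=200):
    """Midpoint-rule approximation of the repeated integral with x innermost."""
    xs, hx = midpoints(*x_range, n)
    ys, hy = midpoints(*y_range, n)
    return sum(sum(f(x, y) for x in xs) * hx for y in ys) * hy

def f(x, y):
    # Illustrative integrand (an assumption of this sketch, not taken from the text).
    return x * math.exp(-x * y)

x_range, y_range = (0.0, 1.0), (0.0, 2.0)
dy_dx = repeated_dy_dx(f, x_range, y_range)
dx_dy = repeated_dx_dy(f, x_range, y_range)
# Closed form: the inner y-integral is 1 - exp(-2x), whose integral over [0, 1] is (1 + exp(-2)) / 2.
exact = (1.0 + math.exp(-2.0)) / 2.0
print(dy_dx, dx_dy, exact)
```

Because the integrand is bounded and continuous on a set of finite measure, it is integrable for the product measure, so this is the well-behaved situation covered by the corollary; the examples in 252K and 252L show what can go wrong outside it.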
252D So many applications of Fubini's theorem are to characteristic functions that I take a few lines to spell out the form which 252B takes in this case, as in parts (b)-(b*) of the proof there.

Corollary Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces and $\lambda$ the c.l.d. product measure on $X \times Y$. Suppose that $\nu$ is σ-finite and that $\mu$ is either strictly localizable or complete and locally determined.
(i) If $W \in \operatorname{dom} \lambda$, then $\int \nu^* W[\{x\}]\,\mu(dx)$ is defined in $[0,\infty]$ and equal to $\lambda W$.
(ii) If $\nu$ is complete, we can write $\int \nu W[\{x\}]\,\mu(dx)$ in place of $\int \nu^* W[\{x\}]\,\mu(dx)$.

proof The point is just that $\int \chi W(x,y)\,\nu(dy) = \hat\nu W[\{x\}]$ whenever either is defined, where $\hat\nu$ is the completion of $\nu$ (212Fb). Now 252B tells us that
$$\lambda W = \iint \chi W(x,y)\,\nu(dy)\mu(dx) = \int \hat\nu W[\{x\}]\,\mu(dx).$$
We always have $\hat\nu W[\{x\}] = \nu^* W[\{x\}]$, by the definition of $\hat\nu$ (212C); and if $\nu$ is complete, then $\hat\nu = \nu$ so $\lambda W = \int \nu W[\{x\}]\,\mu(dx)$.

252E Corollary Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, with c.l.d. product $(X \times Y,\Lambda,\lambda)$. Suppose that $\nu$ is σ-finite and that $\mu$ is either strictly localizable or complete and locally determined. Then if $f$ is a $\Lambda$-measurable real-valued function defined on a subset of $X \times Y$, $y \mapsto f(x,y)$ is $\nu$-virtually measurable for $\mu$-almost every $x \in X$.

proof Let $\tilde f$ be a $\Lambda$-measurable extension of $f$ to a real-valued function defined everywhere on $X \times Y$ (121I), and set
$$\tilde f_x(y) = \tilde f(x,y) \text{ for all } x \in X,\ y \in Y, \qquad D = \{x : x \in X,\ \tilde f_x \text{ is } \nu\text{-virtually measurable}\}.$$
If $G \in \Sigma$ and $\mu G < \infty$, then $G \setminus D$ is negligible. P Let $\langle Y_n\rangle_{n\in\mathbb{N}}$ be a non-decreasing sequence of sets of finite measure covering $Y$, and set
$$\tilde f_n(x,y) = \begin{cases} \tilde f(x,y) & \text{if } x \in G,\ y \in Y_n \text{ and } |\tilde f(x,y)| \le n,\\ 0 & \text{for other } (x,y) \in X \times Y. \end{cases}$$
Then each $\tilde f_n$ is $\lambda$-integrable, being bounded and $\Lambda$-measurable and zero off $G \times Y_n$. Consequently, setting $\tilde f_{nx}(y) = \tilde f_n(x,y)$,
$$\int \Bigl(\int \tilde f_{nx}\,d\nu\Bigr)\mu(dx) \text{ exists} = \int \tilde f_n\,d\lambda.$$
But this surely means that $\tilde f_{nx}$ is $\nu$-integrable, therefore $\nu$-virtually measurable, for almost every $x \in X$. Set
$$D_n = \{x : x \in X,\ \tilde f_{nx} \text{ is } \nu\text{-virtually measurable}\};$$
then every $D_n$ is $\mu$-conegligible, so $\bigcap_{n\in\mathbb{N}} D_n$ is conegligible. But for any $x \in G \cap \bigcap_{n\in\mathbb{N}} D_n$, $\tilde f_x = \lim_{n\to\infty} \tilde f_{nx}$ is $\nu$-virtually measurable. Thus $G \setminus D \subseteq X \setminus \bigcap_{n\in\mathbb{N}} D_n$ is negligible. Q

This is true whenever $\mu G < \infty$. By 213J, because $\mu$ is either strictly localizable or complete and locally determined, $X \setminus D$ is negligible and $D$ is conegligible. But, for any $x \in D$, $y \mapsto f(x,y)$ is a restriction of $\tilde f_x$ and must be $\nu$-virtually measurable.

252F As a further corollary we can get some useful information about the c.l.d. product measure for arbitrary measure spaces.

Corollary Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be two measure spaces, $\lambda$ the c.l.d. product measure on $X \times Y$, and $\Lambda$ its domain. Let $W \in \Lambda$ be such that the vertical section $W[\{x\}]$ is $\nu$-negligible for $\mu$-almost every $x \in X$. Then $\lambda W = 0$.

proof Take $E \in \Sigma$, $F \in \mathrm{T}$ of finite measure. Let $\lambda_{E\times F}$ be the subspace measure on $E \times F$. By 251P(ii-α), this is just the product of the subspace measures $\mu_E$ and $\nu_F$. We know that $W \cap (E \times F)$ is measured by $\lambda_{E\times F}$. At the same time, the vertical section $(W \cap (E \times F))[\{x\}] = W[\{x\}] \cap F$ is $\nu_F$-negligible for $\mu_E$-almost every $x \in E$. Applying 252B to $\mu_E$ and $\nu_F$ and $\chi(W \cap (E \times F))$,
$$\lambda(W \cap (E \times F)) = \lambda_{E\times F}(W \cap (E \times F)) = \int \nu_F(W[\{x\}] \cap F)\,\mu_E(dx) = 0.$$
But looking at the definition in 251F, we see that this means that $\lambda W = 0$, as claimed.

252G Theorem 252B and its corollaries depend on the factor measures $\mu$ and $\nu$ belonging to restricted classes. There is a partial result which applies to all c.l.d. product measures, as follows.

Tonelli's theorem Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $(X \times Y,\Lambda,\lambda)$ their c.l.d. product. Let $f$ be a $\Lambda$-measurable $[-\infty,\infty]$-valued function defined on a member of $\Lambda$, and suppose that either $\iint |f(x,y)|\,\mu(dx)\nu(dy)$ or $\iint |f(x,y)|\,\nu(dy)\mu(dx)$ exists in $\mathbb{R}$. Then $f$ is $\lambda$-integrable.

proof Because the construction of the product measure is symmetric in the two factors, it is enough to consider the case in which $\iint |f(x,y)|\,\nu(dy)\mu(dx)$ is defined and finite, as the same ideas will surely deal with the other case also.

(a) The first step is to check that $f$ is defined and finite $\lambda$-a.e. P Set $W = \{(x,y) : (x,y) \in \operatorname{dom} f,\ f(x,y) \text{ is finite}\}$. Then $W \in \Lambda$. The hypothesis
$$\iint |f(x,y)|\,\nu(dy)\mu(dx) \text{ is defined and finite}$$
includes the assertion
$$\int |f(x,y)|\,\nu(dy) \text{ is defined and finite for } \mu\text{-almost every } x,$$
which implies that for $\mu$-almost every $x$, $f(x,y)$ is defined and finite for $\nu$-almost every $y$; that is, that for $\mu$-almost every $x$, $W[\{x\}]$ is $\nu$-conegligible. But by 252F this implies that $(X \times Y) \setminus W$ is $\lambda$-negligible, as required. Q

(b) Let $h$ be any non-negative $\lambda$-simple function such that $h \le |f|$ $\lambda$-a.e. Then $\int h\,d\lambda$ cannot be greater than $\iint |f(x,y)|\,\nu(dy)\mu(dx)$. P Set
$$W = \{(x,y) : (x,y) \in \operatorname{dom} f,\ h(x,y) \le |f(x,y)|\}, \qquad h' = h \times \chi W;$$
then $h'$ is a simple function and $h' =_{\text{a.e.}} h$. Express $h'$ as $\sum_{i=0}^n a_i \chi W_i$ where $a_i \ge 0$ and $\lambda W_i < \infty$ for each $i$. Let $\epsilon > 0$. For each $i \le n$ there are $E_i \in \Sigma$, $F_i \in \mathrm{T}$ such that $\mu E_i < \infty$, $\nu F_i < \infty$ and $\lambda(W_i \cap (E_i \times F_i)) \ge \lambda W_i - \epsilon$. Set $E = \bigcup_{i\le n} E_i$ and $F = \bigcup_{i\le n} F_i$. Consider the subspace measures $\mu_E$ and $\nu_F$ and their product $\lambda_{E\times F}$ on $E \times F$; then $\lambda_{E\times F}$ is the subspace measure on $E \times F$ defined from $\lambda$ (251P). Accordingly, applying 252B to the product $\mu_E \times \nu_F$,
$$\int_{E\times F} h'\,d\lambda = \int_{E\times F} h'\,d\lambda_{E\times F} = \int_E \int_F h'(x,y)\,\nu_F(dy)\mu_E(dx).$$
For any $x$, we know that $h'(x,y) \le |f(x,y)|$ whenever $f(x,y)$ is defined. So we can be sure that
$$\int_F h'(x,y)\,\nu_F(dy) = \int h'(x,y)\chi F(y)\,\nu(dy) \le \int |f(x,y)|\,\nu(dy)$$
at least whenever $\int_F h'(x,y)\,\nu_F(dy)$ and $\int |f(x,y)|\,\nu(dy)$ are both defined, which is the case for almost every $x \in E$. Consequently
$$\begin{aligned}
\int_{E\times F} h'\,d\lambda &= \int_E \int_F h'(x,y)\,\nu_F(dy)\mu_E(dx)\\
&\le \int_E \int |f(x,y)|\,\nu(dy)\mu(dx) \le \iint |f(x,y)|\,\nu(dy)\mu(dx).
\end{aligned}$$
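On the other hand,
$$\int h'\,d\lambda - \int_{E\times F} h'\,d\lambda = \sum_{i=0}^n a_i \lambda(W_i \setminus (E \times F)) \le \sum_{i=0}^n a_i \lambda(W_i \setminus (E_i \times F_i)) \le \epsilon \sum_{i=0}^n a_i.$$
So
$$\int h\,d\lambda = \int h'\,d\lambda \le \iint |f(x,y)|\,\nu(dy)\mu(dx) + \epsilon \sum_{i=0}^n a_i.$$
As $\epsilon$ is arbitrary, $\int h\,d\lambda \le \iint |f(x,y)|\,\nu(dy)\mu(dx)$, as claimed. Q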
(c) This is true whenever $h$ is a $\lambda$-simple function less than or equal to $|f|$ $\lambda$-a.e. But $|f|$ is $\Lambda$-measurable and $\lambda$ is semi-finite (251Ic), so this is enough to ensure that $|f|$ is $\lambda$-integrable (213B), which (because $f$ is supposed to be $\Lambda$-measurable) in turn implies that $f$ is $\lambda$-integrable.

252H Corollary Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be σ-finite measure spaces, $\lambda$ the c.l.d. product measure on $X \times Y$, and $\Lambda$ its domain. Let $f$ be a $\Lambda$-measurable real-valued function defined on a member of $\Lambda$. Then if one of
$$\int_{X\times Y} |f(x,y)|\,\lambda(d(x,y)), \qquad \int_Y \int_X |f(x,y)|\,\mu(dx)\nu(dy), \qquad \int_X \int_Y |f(x,y)|\,\nu(dy)\mu(dx)$$
exists in $\mathbb{R}$, so do the other two, and in this case
$$\int_{X\times Y} f(x,y)\,\lambda(d(x,y)) = \int_Y \int_X f(x,y)\,\mu(dx)\nu(dy) = \int_X \int_Y f(x,y)\,\nu(dy)\mu(dx).$$

proof (a) Suppose that $\int |f|\,d\lambda$ is finite. Because both $\mu$ and $\nu$ are σ-finite, 252B tells us that
$$\iint |f(x,y)|\,\mu(dx)\nu(dy), \qquad \iint |f(x,y)|\,\nu(dy)\mu(dx)$$
both exist and are equal to $\int |f|\,d\lambda$, while
$$\iint f(x,y)\,\mu(dx)\nu(dy), \qquad \iint f(x,y)\,\nu(dy)\mu(dx)$$
both exist and are equal to $\int f\,d\lambda$.

(b) Now suppose that $\iint |f(x,y)|\,\nu(dy)\mu(dx)$ exists in $\mathbb{R}$. Then 252G tells us that $|f|$ is $\lambda$-integrable, so we can use (a) to complete the argument. Exchanging the coordinates, the same argument applies if $\iint |f(x,y)|\,\mu(dx)\nu(dy)$ exists in $\mathbb{R}$.

252I Corollary Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, $\lambda$ the c.l.d. product measure on $X \times Y$, and $\Lambda$ its domain. Take $W \in \Lambda$. If either of the integrals
$$\int \mu^* W^{-1}[\{y\}]\,\nu(dy), \qquad \int \nu^* W[\{x\}]\,\mu(dx)$$
exists and is finite, then $\lambda W < \infty$.

proof Apply 252G with $f = \chi W$, remembering that
$$\mu^* W^{-1}[\{y\}] = \int \chi W(x,y)\,\mu(dx), \qquad \nu^* W[\{x\}] = \int \chi W(x,y)\,\nu(dy)$$
whenever the integrals are defined, as in the proof of 252D.

252J Remarks 252H is the basic form of Fubini's theorem; it is not a coincidence that most authors avoid non-σ-finite spaces in this context. The next two examples exhibit some of the difficulties which can arise if we leave the familiar territory of more-or-less Borel measurable functions on σ-finite spaces. The first is a classic.

252K Example Let $(X,\Sigma,\mu)$ be $[0,1]$ with Lebesgue measure, and let $(Y,\mathrm{T},\nu)$ be $[0,1]$ with counting measure.

(a) Consider the set $W = \{(t,t) : t \in [0,1]\} \subseteq X \times Y$. We observe that $W$ is expressible as
$$\bigcap_{n\in\mathbb{N}} \bigcup_{k=0}^{n} \Bigl[\frac{k}{n+1}, \frac{k+1}{n+1}\Bigr] \times \Bigl[\frac{k}{n+1}, \frac{k+1}{n+1}\Bigr] \in \Sigma \widehat{\otimes} \mathrm{T}.$$
If we look at the sections $W^{-1}[\{t\}] = W[\{t\}] = \{t\}$ for $t \in [0,1]$, we have
$$\iint \chi W(x,y)\,\mu(dx)\nu(dy) = \int \mu W^{-1}[\{y\}]\,\nu(dy) = \int 0\,\nu(dy) = 0,$$
$$\iint \chi W(x,y)\,\nu(dy)\mu(dx) = \int \nu W[\{x\}]\,\mu(dx) = \int 1\,\mu(dx) = 1,$$
so the two repeated integrals differ. It is therefore not generally possible to reverse the order of repeated integration, even for a non-negative measurable function for which both repeated integrals exist and are finite.

(b) Because the set $W$ of part (a) actually belongs to $\Sigma \widehat{\otimes} \mathrm{T}$, we know that it is measured by the c.l.d. product measure $\lambda$, and 252F (applied with the coordinates reversed) tells us that $\lambda W = 0$.

(c) It is in fact easy to give a full description of $\lambda$.

(i) The point is that a set $W \subseteq [0,1] \times [0,1]$ belongs to the domain $\Lambda$ of $\lambda$ iff every horizontal section $W^{-1}[\{y\}]$ is Lebesgue measurable. P (α) If $W \in \Lambda$, then, for every $b \in [0,1]$, $\lambda([0,1] \times \{b\})$ is finite, so $W \cap ([0,1] \times \{b\})$ is a set of finite measure, and
$$\lambda(W \cap ([0,1] \times \{b\})) = \int \mu(W \cap ([0,1] \times \{b\}))^{-1}[\{y\}]\,\nu(dy) = \mu W^{-1}[\{b\}]$$
by 252D, because $\mu$ is σ-finite, $\nu$ is both strictly localizable and complete and locally determined, and
$$(W \cap ([0,1] \times \{b\}))^{-1}[\{y\}] = \begin{cases} W^{-1}[\{b\}] & \text{if } y = b,\\ \emptyset & \text{otherwise.} \end{cases}$$
As $b$ is arbitrary, every horizontal section of $W$ is measurable.
(β) If every horizontal section of $W$ is measurable, let $F \subseteq [0,1]$ be any set of finite measure for $\nu$; then $F$ is finite, so
$$W \cap ([0,1] \times F) = \bigcup_{y\in F} W^{-1}[\{y\}] \times \{y\} \in \Sigma \widehat{\otimes} \mathrm{T} \subseteq \Lambda.$$
But it follows that $W$ itself belongs to $\Lambda$, by 251H. Q

(ii) Now some of the same calculations show that for every $W \in \Lambda$,
$$\lambda W = \sum_{y\in[0,1]} \mu W^{-1}[\{y\}].$$
P For any finite $F \subseteq [0,1]$,
$$\begin{aligned}
\lambda(W \cap ([0,1] \times F)) &= \int \mu(W \cap ([0,1] \times F))^{-1}[\{y\}]\,\nu(dy)\\
&= \int_F \mu W^{-1}[\{y\}]\,\nu(dy) = \sum_{y\in F} \mu W^{-1}[\{y\}].
\end{aligned}$$
So
$$\lambda W = \sup_{F \subseteq [0,1] \text{ is finite}} \sum_{y\in F} \mu W^{-1}[\{y\}] = \sum_{y\in[0,1]} \mu W^{-1}[\{y\}]. \text{ Q}$$
252L Example For the second example, I turn to a problem that can arise if we neglect to check that a function is measurable as a function of two variables.

Let $(X,\Sigma,\mu) = (Y,\mathrm{T},\nu)$ be $\omega_1$, the first uncountable ordinal (2A1Fc), with the countable-cocountable measure (211R). Set
$$W = \{(\xi,\eta) : \xi \le \eta < \omega_1\} \subseteq X \times Y.$$
Then all the horizontal sections $W^{-1}[\{\eta\}] = \{\xi : \xi \le \eta\}$ are countable, so
$$\int \mu W^{-1}[\{\eta\}]\,\nu(d\eta) = \int 0\,\nu(d\eta) = 0,$$
while all the vertical sections $W[\{\xi\}] = \{\eta : \xi \le \eta < \omega_1\}$ are cocountable, so
$$\int \nu W[\{\xi\}]\,\mu(d\xi) = \int 1\,\mu(d\xi) = 1.$$
Because the two repeated integrals are different, they cannot both be equal to the measure of $W$, and the sole resolution is to say that $W$ is not measurable for the product measure.

252M Remark A third kind of difficulty in the formula
$$\iint f(x,y)\,dx\,dy = \iint f(x,y)\,dy\,dx$$
can arise even on probability spaces with $\Sigma \widehat{\otimes} \mathrm{T}$-measurable real-valued functions defined everywhere if we neglect to check that $f$ is integrable with respect to the product measure. In 252H, we do need the hypothesis that one of
$$\int_{X\times Y} |f(x,y)|\,\lambda(d(x,y)), \qquad \int_Y \int_X |f(x,y)|\,\mu(dx)\nu(dy), \qquad \int_X \int_Y |f(x,y)|\,\nu(dy)\mu(dx)$$
is finite. For examples to show this, see 252Xf and 252Xg.

252N Integration through ordinate sets I: Proposition Let $(X,\Sigma,\mu)$ be a complete locally determined measure space, and $\lambda$ the c.l.d. product measure on $X \times \mathbb{R}$, where $\mathbb{R}$ is given Lebesgue measure; write $\Lambda$ for the domain of $\lambda$. For any $[0,\infty]$-valued function $f$ defined on a conegligible subset of $X$, write $\Omega_f$, $\Omega'_f$ for the ordinate sets
$$\Omega_f = \{(x,a) : x \in \operatorname{dom} f,\ 0 \le a \le f(x)\} \subseteq X \times \mathbb{R},$$
$$\Omega'_f = \{(x,a) : x \in \operatorname{dom} f,\ 0 \le a < f(x)\} \subseteq X \times \mathbb{R}.$$
Then
$$\lambda \Omega_f = \lambda \Omega'_f = \int f\,d\mu$$
in the sense that if one of these is defined in $[0,\infty]$, so are the other two, and they are equal.

proof (a) If $\Omega_f \in \Lambda$, then
$$\int f(x)\,\mu(dx) = \int \nu\{y : (x,y) \in \Omega_f\}\,\mu(dx) = \lambda \Omega_f$$
by 252D, writing $\nu$ for Lebesgue measure, because $f$ is defined almost everywhere. Similarly, if $\Omega'_f \in \Lambda$,
$$\int f(x)\,\mu(dx) = \int \nu\{y : (x,y) \in \Omega'_f\}\,\mu(dx) = \lambda \Omega'_f.$$
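(b) If $\int f\,d\mu$ is defined, then $f$ is $\mu$-virtually measurable, therefore measurable (because $\mu$ is complete); again because $\mu$ is complete, $\operatorname{dom} f \in \Sigma$. So
$$\Omega'_f = \bigcup_{q\in\mathbb{Q},\,q>0} \{x : x \in \operatorname{dom} f,\ f(x) > q\} \times [0,q],$$
$$\Omega_f = \bigcap_{n\ge 1} \bigcup_{q\in\mathbb{Q},\,q>0} \Bigl\{x : x \in \operatorname{dom} f,\ f(x) \ge q - \frac{1}{n}\Bigr\} \times [0,q]$$
belong to $\Lambda$, so that $\lambda \Omega_f$ and $\lambda \Omega'_f$ are defined. Now both are equal to $\int f\,d\mu$, by (a).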
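The identity $\lambda \Omega_f = \int f\,d\mu$ can be checked exactly on a toy case. The sketch below is an illustration under assumptions made here, not part of the text: $X$ is a three-point set with every subset measurable (so the measure is complete and totally finite), the point masses and the values of $f$ are arbitrary choices, and the measure of the ordinate set is computed by slicing it horizontally, where the section at height $a$ is $\{x : f(x) \ge a\}$. The function names are hypothetical.

```python
from fractions import Fraction

def integral(weights, heights):
    """Integral of the function `heights` for the measure with point masses `weights`."""
    return sum(heights[x] * weights[x] for x in weights)

def ordinate_set_measure(weights, heights):
    """Measure of {(x, a) : 0 <= a <= heights[x]}, computed through horizontal sections.

    The section at height a is {x : heights[x] >= a}. Its mass is constant for a
    between two consecutive values of the function, so the integral over a reduces
    to a finite sum of (length of level interval) * (mass of section).
    """
    levels = sorted(set(heights.values()) | {Fraction(0)})
    total = Fraction(0)
    for lo, hi in zip(levels, levels[1:]):
        mass = sum(w for x, w in weights.items() if heights[x] >= hi)
        total += (hi - lo) * mass
    return total

# Illustrative data (assumed for this sketch): three atoms of total mass 1.
weights = {"p": Fraction(1, 2), "q": Fraction(1, 3), "r": Fraction(1, 6)}
heights = {"p": Fraction(3), "q": Fraction(0), "r": Fraction(5, 2)}

direct = integral(weights, heights)
by_sections = ordinate_set_measure(weights, heights)
print(direct, by_sections)
```

Exact rational arithmetic is used so that the two computations can be compared with `==` rather than up to rounding. Slicing horizontally rather than vertically is the same idea that 252O states for general measure spaces.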
252O Integration through ordinate sets II: Proposition Let $(X,\Sigma,\mu)$ be a measure space, and $f$ a $\Sigma$-measurable $[0,\infty]$-valued function defined on a measurable conegligible subset of $X$. Then
$$\int f\,d\mu = \int_0^\infty \mu\{x : x \in \operatorname{dom} f,\ f(x) \ge t\}\,dt = \int_0^\infty \mu\{x : x \in \operatorname{dom} f,\ f(x) > t\}\,dt$$
in $[0,\infty]$, where the integrals $\int \ldots dt$ are taken with respect to Lebesgue measure.

proof For $n, k \in \mathbb{N}$ set $E_{nk} = \{x : x \in \operatorname{dom} f,\ f(x) > 2^{-n}k\}$, $g_n = 2^{-n} \sum_{k=1}^{4^n} \chi E_{nk}$. Then $\langle g_n\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence of measurable functions converging to $f$ at every point of $\operatorname{dom} f$, so $\int f\,d\mu = \lim_{n\to\infty} \int g_n\,d\mu$ and $\mu\{x : f(x) > t\} = \lim_{n\to\infty} \mu\{x : g_n(x) > t\}$ for every $t \ge 0$; consequently
$$\int_0^\infty \mu\{x : f(x) > t\}\,dt = \lim_{n\to\infty} \int_0^\infty \mu\{x : g_n(x) > t\}\,dt.$$
On the other hand, $\mu\{x : g_n(x) > t\} = \mu E_{nk}$ if $1 \le k \le 4^n$ and $2^{-n}(k-1) \le t < 2^{-n}k$, and is $0$ if $t \ge 2^n$, so that
$$\int_0^\infty \mu\{x : g_n(x) > t\}\,dt = \sum_{k=1}^{4^n} 2^{-n} \mu E_{nk} = \int g_n\,d\mu$$
for every $n \in \mathbb{N}$. So $\int_0^\infty \mu\{x : f(x) > t\}\,dt = \int f\,d\mu$.

Now $\mu\{x : f(x) \ge t\} = \mu\{x : f(x) > t\}$ for almost all $t$. P Set $C = \{t : \mu\{x : f(x) > t\} < \infty\}$, $h(t) = \mu\{x : f(x) > t\}$ for $t \in C$. Then $h : C \to [0,\infty[$ is monotonic, so is continuous almost everywhere (222A). But at any point of $C$ at which $h$ is continuous,
$$\mu\{x : f(x) \ge t\} = \lim_{s\uparrow t} \mu\{x : f(x) > s\} = \mu\{x : f(x) > t\}.$$
So we have the result, since $\mu\{x : f(x) \ge t\} = \mu\{x : f(x) > t\} = \infty$ for any $t \in [0,\infty[ \setminus C$. Q
Accordingly $\int_0^\infty \mu\{x : f(x) \ge t\}\,dt$ is also equal to $\int f\,d\mu$.

*252P If we work through the ideas of 252B for $\Sigma \widehat{\otimes} \mathrm{T}$-measurable functions, we get the following, which is sometimes useful.

Proposition Let $(X,\Sigma,\mu)$ be a measure space, and $(Y,\mathrm{T},\nu)$ a σ-finite measure space. Then for any $\Sigma \widehat{\otimes} \mathrm{T}$-measurable function $f : X \times Y \to [0,\infty]$, $x \mapsto \int f(x,y)\,\nu(dy) : X \to [0,\infty]$ is $\Sigma$-measurable; and if $\mu$ is semi-finite, $\iint f(x,y)\,\nu(dy)\mu(dx) = \int f\,d\lambda$, where $\lambda$ is the c.l.d. product measure on $X \times Y$.

proof (a) Let $\langle Y_n\rangle_{n\in\mathbb{N}}$ be a non-decreasing sequence of subsets of $Y$ of finite measure with union $Y$. Set
$$\mathcal{A} = \{W : W \subseteq X \times Y,\ W[\{x\}] \in \mathrm{T} \text{ for every } x \in X,\ x \mapsto \nu(Y_n \cap W[\{x\}]) \text{ is } \Sigma\text{-measurable for every } n \in \mathbb{N}\}.$$
Then $\mathcal{A}$ is a Dynkin class of subsets of $X \times Y$ including $\{E \times F : E \in \Sigma,\ F \in \mathrm{T}\}$, so includes $\Sigma \widehat{\otimes} \mathrm{T}$, by the Monotone Class Theorem (136B).
This means that if $W \in \Sigma \widehat{\otimes} \mathrm{T}$, then
$$\nu W[\{x\}] = \sup_{n\in\mathbb{N}} \nu(Y_n \cap W[\{x\}])$$
is defined for every $x \in X$ and is a $\Sigma$-measurable function of $x$.

(b) Now, for $n, k \in \mathbb{N}$, set
$$W_{nk} = \{(x,y) : f(x,y) \ge 2^{-n}k\}, \qquad g_n = \sum_{k=1}^{4^n} 2^{-n} \chi W_{nk}.$$
Then if we set
$$h_n(x) = \int g_n(x,y)\,\nu(dy) = \sum_{k=1}^{4^n} 2^{-n} \nu W_{nk}[\{x\}]$$
for $n \in \mathbb{N}$ and $x \in X$, $h_n : X \to [0,\infty]$ is $\Sigma$-measurable, and
$$\lim_{n\to\infty} h_n(x) = \int \bigl(\lim_{n\to\infty} g_n(x,y)\bigr)\,\nu(dy) = \int f(x,y)\,\nu(dy)$$
for every $x$, because $\langle g_n(x,y)\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence with limit $f(x,y)$ for all $x \in X$, $y \in Y$. So $x \mapsto \int f(x,y)\,\nu(dy)$ is also defined everywhere in $X$ and is $\Sigma$-measurable.

(c) If $E \subseteq X$ is measurable and has finite measure, then $\int_E \int f(x,y)\,\nu(dy)\mu(dx) = \int_{E\times Y} f\,d\lambda$, applying 252B to the product of the subspace measure $\mu_E$ and $\nu$ (and using 251P to check that the product of $\mu_E$ and $\nu$ is the subspace measure on $E \times Y$). Now if $\lambda W$ is defined and finite, there must be a non-decreasing sequence $\langle E_n\rangle_{n\in\mathbb{N}}$ of subsets of $X$ of finite measure such that $\lambda W = \sup_{n\in\mathbb{N}} \lambda(W \cap (E_n \times Y))$, so that $W \setminus \bigcup_{n\in\mathbb{N}} (E_n \times Y)$ is negligible, and
$$\int_W f\,d\lambda = \lim_{n\to\infty} \int_{W \cap (E_n \times Y)} f\,d\lambda$$
(by B.Levi's theorem applied to $\langle f \times \chi(W \cap (E_n \times Y))\rangle_{n\in\mathbb{N}}$)
$$\le \lim_{n\to\infty} \int_{E_n \times Y} f\,d\lambda = \lim_{n\to\infty} \int_{E_n} \int f(x,y)\,\nu(dy)\mu(dx) \le \iint f(x,y)\,\nu(dy)\mu(dx).$$
By 213B, $\int f\,d\lambda = \sup_{\lambda W < \infty} \int_W f\,d\lambda$ …

(c) Let $(X_1,\Sigma_1,\mu_1)$, $(X_2,\Sigma_2,\mu_2)$, $(X_3,\Sigma_3,\mu_3)$ be three σ-finite measure spaces, and $f$ a real-valued function defined almost everywhere on $X_1 \times X_2 \times X_3$ and measurable for the product measure described in 251Xf. Show that if $\iiint |f(x_1,x_2,x_3)|\,dx_1 dx_2 dx_3$ is defined in $\mathbb{R}$, then $\iiint f(x_1,x_2,x_3)\,dx_2 dx_3 dx_1$ and $\iiint f(x_1,x_2,x_3)\,dx_3 dx_1 dx_2$ exist and are equal.

(d) Give an example of strictly localizable measure spaces $(X,\Sigma,\mu)$, $(Y,\mathrm{T},\nu)$ and a $W \in \Sigma \widehat{\otimes} \mathrm{T}$ such that $x \mapsto \nu W[\{x\}]$ is not $\Sigma$-measurable. (Hint: in 252Kb, try $Y$ a proper subset of $[0,1]$.)

> (e) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $f$ a $\Sigma \widehat{\otimes} \mathrm{T}$-measurable function defined on a subset of $X \times Y$. Show that $y \mapsto f(x,y)$ is $\mathrm{T}$-measurable for every $x \in X$.

> (f) Set $f(x,y) = \sin(x-y)$ if $0 \le y \le x \le y + 2\pi$, $0$ for other $x, y \in \mathbb{R}$. Show that $\iint f(x,y)\,dx\,dy = 0$, $\iint f(x,y)\,dy\,dx = 2\pi$, taking all integrals with respect to Lebesgue measure.

> (g) Set $f(x,y) = \dfrac{x^2-y^2}{(x^2+y^2)^2}$ for $x, y \in\ ]0,1]$. Show that $\int_0^1 \int_0^1 f(x,y)\,dy\,dx = \dfrac{\pi}{4}$, $\int_0^1 \int_0^1 f(x,y)\,dx\,dy = -\dfrac{\pi}{4}$.
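(h) Let $r \ge 1$ be an integer, and write $\beta_r$ for the Lebesgue measure of the unit ball in $\mathbb{R}^r$. Set $g_r(t) = r\beta_r t^{r-1}$ for $t \ge 0$. Set $\phi(x) = \|x\|$ for $x \in \mathbb{R}^r$. (i) Writing $\mu_r$ for Lebesgue measure on $\mathbb{R}^r$, show that $\mu_r \phi^{-1}[E] = \int_E r\beta_r t^{r-1}\,\mu_1(dt)$ for every Lebesgue measurable set $E \subseteq [0,\infty[$. (Hint: start with intervals $E$, noting from 115Xe that $\mu_r\{x : \|x\| \le a\} = \beta_r a^r$ for $a \ge 0$, and progress to open sets, negligible sets and general measurable sets.) (ii) Using 235T, show that
$$\begin{aligned}
\int e^{-\|x\|^2/2}\,\mu_r(dx) &= r\beta_r \int_0^\infty t^{r-1} e^{-t^2/2}\,\mu_1(dt) = 2^{(r-2)/2}\, r\beta_r\, \Gamma\bigl(\tfrac{r}{2}\bigr)\\
&= 2^{r/2} \beta_r \Gamma\bigl(1 + \tfrac{r}{2}\bigr) = \bigl(\sqrt{2}\,\Gamma(\tfrac12)\bigr)^r
\end{aligned}$$
where $\Gamma$ is the Γ-function (225Xj). (iii) Show that
$$2\Gamma(\tfrac12)^2 = 2\beta_2 \int_0^\infty t e^{-t^2/2}\,dt = 2\pi,$$
and hence that
$$\beta_r = \frac{\pi^{r/2}}{\Gamma(1 + \frac{r}{2})} \quad\text{and}\quad \int_{-\infty}^{\infty} e^{-t^2/2}\,dt = \sqrt{2\pi}.$$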
(i) Let $f, g : \mathbb{R} \to \mathbb{R}$ be two non-decreasing functions, and $\mu_f$, $\mu_g$ the associated Lebesgue-Stieltjes measures (see 114Xa). Set
$$f(x^+) = \lim_{t\downarrow x} f(t), \qquad f(x^-) = \lim_{t\uparrow x} f(t)$$
for each $x \in \mathbb{R}$, and define $g(x^+)$, $g(x^-)$ similarly. Show that whenever $a \le b$ in $\mathbb{R}$,
$$\begin{aligned}
\int_{[a,b]} f(x^-)\,\mu_g(dx) + \int_{[a,b]} g(x^+)\,\mu_f(dx) &= g(b^+)f(b^+) - g(a^-)f(a^-)\\
&= \int_{[a,b]} \tfrac12\bigl(f(x^-) + f(x^+)\bigr)\,\mu_g(dx) + \int_{[a,b]} \tfrac12\bigl(g(x^-) + g(x^+)\bigr)\,\mu_f(dx).
\end{aligned}$$
(Hint: find two expressions for $(\mu_f \times \mu_g)\{(x,y) : a \le x < y \le b\}$.)

252Y Further exercises (a) Let $(X,\Sigma,\mu)$ be a measure space. Show that the following are equiveridical: (i) the completion of $\mu$ is locally determined; (ii) the completion of $\mu$ coincides with the c.l.d. version of $\mu$; (iii) whenever $(Y,\mathrm{T},\nu)$ is a σ-finite measure space and $\lambda$ the c.l.d. product measure on $X \times Y$ and $f$ is a function such that $\int f\,d\lambda$ is defined in $[-\infty,\infty]$, then $\iint f(x,y)\,\nu(dy)\mu(dx)$ is defined and equal to $\int f\,d\lambda$.

(b) Let $(X,\Sigma,\mu)$ be a measure space. Show that the following are equiveridical: (i) $\mu$ has locally determined negligible sets (213I); (ii) whenever $(Y,\mathrm{T},\nu)$ is a σ-finite measure space and $\lambda$ the c.l.d. product measure on $X \times Y$, then $\iint f(x,y)\,\nu(dy)\mu(dx)$ is defined and equal to $\int f\,d\lambda$ for any $\lambda$-integrable function $f$.

(c) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, and $\lambda_0$ the primitive product measure on $X \times Y$ (251C). Let $f$ be any $\lambda_0$-integrable real-valued function. Show that $\iint f(x,y)\,\nu(dy)\mu(dx) = \int f\,d\lambda_0$. (Hint: show that there are sequences $\langle G_n\rangle_{n\in\mathbb{N}}$, $\langle H_n\rangle_{n\in\mathbb{N}}$ of sets of finite measure such that $f(x,y)$ is defined and equal to $0$ for every $(x,y) \in (X \times Y) \setminus \bigcup_{n\in\mathbb{N}} G_n \times H_n$.)

(d) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces; let $\lambda_0$ be the primitive product measure on $X \times Y$, and $\lambda$ the c.l.d. product measure. Show that if $f$ is a $\lambda_0$-integrable real-valued function, it is $\lambda$-integrable, and $\int f\,d\lambda = \int f\,d\lambda_0$.

(e) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm{T},\nu)$ be measure spaces, with c.l.d. product $(X \times Y,\Lambda,\lambda)$. Let $f$ be a non-negative $\Lambda$-measurable real-valued function defined on a $\lambda$-conegligible set, and suppose that $\int \bigl(\int f(x,y)\,\mu(dx)\bigr)\nu(dy)$ is finite. Show that $f$ is $\lambda$-integrable.
(f) Let $(X,\Sigma,\mu)$ be the unit interval $[0,1]$ with Lebesgue measure, and $(Y,\mathrm T,\nu)$ the interval with counting measure, as in 252K; let $\lambda_0$ be the primitive product measure on $[0,1]^2$. (i) Setting $\Delta=\{(t,t):t\in[0,1]\}$, show that $\lambda_0\Delta=\infty$. (ii) Show that $\lambda_0$ is not semi-finite. (iii) Show that if $W\in\operatorname{dom}\lambda_0$, then $\lambda_0W=\sum_{y\in[0,1]}\mu W^{-1}[\{y\}]$ if there are a countable set $A\subseteq[0,1]$ and a Lebesgue negligible set $E\subseteq[0,1]$ such that $W\subseteq([0,1]\times A)\cup(E\times[0,1])$, $\infty$ otherwise.

(g) Let $(X,\Sigma,\mu)$ be a measure space, and $\lambda_0$ the primitive product measure on $X\times\mathbb R$, where $\mathbb R$ is given Lebesgue measure; write $\Lambda$ for its domain. For any $[0,\infty]$-valued function $f$ defined on a conegligible subset of $X$, write $\Omega_f$, $\Omega'_f$ for the corresponding ordinate sets, as in 252N. Show that if any of $\lambda_0\Omega_f$, $\lambda_0\Omega'_f$, $\int f\,d\mu$ is defined and finite, so are the others, and all three are equal.

(h) Let $(X,\Sigma,\mu)$ be a complete locally determined measure space, and $f$ a non-negative function defined on a conegligible subset of $X$. Write $\Omega_f$, $\Omega'_f$ for the corresponding ordinate sets, as in 252N. Let $\lambda$ be the c.l.d. product measure on $X\times\mathbb R$, where $\mathbb R$ is given Lebesgue measure. Show that $\overline{\int}f\,d\mu=\lambda^*\Omega_f=\lambda^*\Omega'_f$.

(i) Let $(X,\Sigma,\mu)$ be a measure space and $f:X\to[0,\infty[$ a function. Show that $\overline{\int}f\,d\mu=\int_0^\infty\mu^*\{x:f(x)\ge t\}\,dt$.

(j) Let $(X,\Sigma,\mu)$ be a complete locally determined measure space and $a<b$ in $\mathbb R$, endowed with Lebesgue measure; let $\Lambda$ be the domain of the c.l.d. product measure $\lambda$ on $X\times[a,b]$. Let $f:X\times[a,b]\to\mathbb R$ be a $\Lambda$-measurable function such that $t\mapsto f(x,t):[a,b]\to\mathbb R$ is continuous on $[a,b]$ and differentiable on $]a,b[$ for every $x\in X$. (i) Show that the partial derivative $\frac{\partial f}{\partial t}$ with respect to the second variable is $\Lambda$-measurable. (ii) Now suppose that $\frac{\partial f}{\partial t}$ is $\lambda$-integrable and that $\int f(x,t_0)\mu(dx)$ is defined and finite for some $t_0\in\,]a,b[$. Show that $F(t)=\int f(x,t)\mu(dx)$ is defined in $\mathbb R$ for every $t\in[a,b]$, that $F$ is absolutely continuous, and that $F'(t)=\int\frac{\partial f}{\partial t}(x,t)\mu(dx)$ for almost every $t\in\,]a,b[$. (Hint: $F(c)=F(a)+\int_{X\times[a,c]}\frac{\partial f}{\partial t}\,d\lambda$ for every $c\in[a,b]$.)
(k) Show that
\[ \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}=\int_0^1t^{a-1}(1-t)^{b-1}\,dt \]
for all $a$, $b>0$. (Hint: show that
\[ \int_0^\infty t^{a-1}\int_t^\infty e^{-x}(x-t)^{b-1}\,dx\,dt=\int_0^\infty e^{-x}\int_0^xt^{a-1}(x-t)^{b-1}\,dt\,dx.) \]
(l) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be $\sigma$-finite measure spaces and $\lambda$ the c.l.d. product measure on $X\times Y$. Suppose that $f\in\mathcal L^0(\lambda)$ and that $1<p<\infty$. Show that
\[ \Bigl(\int\Bigl|\int f(x,y)\,dx\Bigr|^p\,dy\Bigr)^{1/p}\le\int\Bigl(\int|f(x,y)|^p\,dy\Bigr)^{1/p}\,dx. \]
(Hint: set $q=\frac{p}{p-1}$ and consider the integral $\int|f(x,y)g(y)|\,\lambda(d(x,y))$ for $g\in\mathcal L^q(\nu)$, using 244K.)

(m) Let $\nu$ be Lebesgue measure on $[0,\infty[$; suppose that $f\in\mathcal L^p(\nu)$ where $1<p<\infty$. Set $F(y)=\frac1y\int_0^yf$ for $y>0$. Show that $\|F\|_p\le\frac{p}{p-1}\|f\|_p$. (Hint: $F(y)=\int_0^1f(xy)\,dx$; use 252Yl with $X=[0,1]$, $Y=[0,\infty[$.)
(n) Set $f(t)=t-\ln(t+1)$ for $t>-1$. (i) Show that $\Gamma(a+1)=a^{a+1}e^{-a}\int_{-1}^\infty e^{-af(u)}\,du$ for every $a>0$. (Hint: substitute $u=\frac ta-1$ in 225Xj(iii).) (ii) Show that there is a $\delta>0$ such that $f(t)\ge\frac13t^2$ for $-1<t\le\delta$. (iii) Setting $\alpha=\frac12f(\delta)$, show that (for $a\ge1$)
\[ \sqrt a\int_\delta^\infty e^{-af(t)}\,dt\le\sqrt a\,e^{-a\alpha}\int_0^\infty e^{-f(t)/2}\,dt\to0 \]
as $a\to\infty$. (iv) Set $g_a(t)=e^{-af(t/\sqrt a)}$ if $-\sqrt a<t\le\delta\sqrt a$, $0$ otherwise. Show that $g_a(t)\le e^{-t^2/3}$ for all $a$, $t$ and that $\lim_{a\to\infty}g_a(t)=e^{-t^2/2}$ for all $t$, so that
\[ \lim_{a\to\infty}\frac{e^a\Gamma(a+1)}{a^{a+\frac12}}=\lim_{a\to\infty}\sqrt a\int_{-1}^\infty e^{-af(t)}\,dt=\lim_{a\to\infty}\sqrt a\int_{-1}^\delta e^{-af(t)}\,dt=\lim_{a\to\infty}\int_{-\infty}^\infty g_a(t)\,dt=\int_{-\infty}^\infty e^{-t^2/2}\,dt=\sqrt{2\pi}. \]
(v) Show that
\[ \lim_{n\to\infty}\frac{n!}{e^{-n}n^n\sqrt n}=\sqrt{2\pi}. \]
(This is Stirling's formula.)
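(o) Let $(X,\Sigma,\mu)$ be a measure space and $f$ a $\mu$-integrable complex-valued function. For $\alpha\in\,]-\pi,\pi]$ set $H_\alpha=\{x:x\in\operatorname{dom}f,\ \operatorname{Re}(e^{-i\alpha}f(x))>0\}$. Show that $\int_{-\pi}^{\pi}\operatorname{Re}\bigl(e^{-i\alpha}\int_{H_\alpha}f\bigr)\,d\alpha=2\int|f|$, and hence that there is some $\alpha$ such that $\bigl|\int_{H_\alpha}f\bigr|\ge\frac1\pi\int|f|$. (Compare 246F.)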
(p) Let $(X,\Sigma,\mu)$ be a complete measure space and write $M^{0,\infty}$ for the set $\{f:f\in\mathcal L^0(\mu),\ \mu\{x:|f(x)|\ge a\}$ is finite for some $a\in[0,\infty[\}$. (i) Show that for each $f\in M^{0,\infty}$ there is a non-increasing $f^*:\,]0,\infty[\,\to\mathbb R$ such that $\mu_L\{t:f^*(t)\ge\alpha\}=\mu\{x:|f(x)|\ge\alpha\}$ for every $\alpha>0$, writing $\mu_L$ for Lebesgue measure. (ii) Show that $\int_E|f|\,d\mu\le\int_0^{\mu E}f^*\,d\mu_L$ for every $E\in\Sigma$ (allowing $\infty$). (Hint: $(|f|\times\chi E)^*\le f^*$.) (iii) Show that $\|f^*\|_p=\|f\|_p$ for every $p\in[1,\infty]$, $f\in M^{0,\infty}$. (Hint: $(|f|^p)^*=(f^*)^p$.) (iv) Show that if $f$, $g\in M^{0,\infty}$ then $\int|f\times g|\,d\mu\le\int f^*\times g^*\,d\mu_L$. (Hint: look at simple functions first.) (v) Show that if $\mu$ is atomless then $\int_0^af^*\,d\mu_L=\sup_{E\in\Sigma,\,\mu E\le a}\int_E|f|$ for every $a\ge0$. (Hint: 215D.) (vi) Show that $A\subseteq\mathcal L^1(\mu)$ is uniformly integrable iff $\{f^*:f\in A\}$ is uniformly integrable in $\mathcal L^1(\mu_L)$. ($f^*$ is called the decreasing rearrangement of $f$.)

(q) Let $(X,\Sigma,\mu)$ be a complete locally determined measure space, and write $\nu$ for Lebesgue measure on $[0,1]$. Show that the c.l.d. product measure $\lambda$ on $X\times[0,1]$ is localizable iff $\mu$ is localizable. (Hints: (i) if $\mathcal E\subseteq\Sigma$, show that $F\in\Sigma$ is an essential supremum for $\mathcal E$ in $\Sigma$ iff $F\times[0,1]$ is an essential supremum for $\{E\times[0,1]:E\in\mathcal E\}$ in $\Lambda=\operatorname{dom}\lambda$. (ii) For $W\in\Lambda$, $n\in\mathbb N$, $k<2^n$ set $W_{nk}=\{x:x\in X,\ \nu^*\{t:(x,t)\in W,\ 2^{-n}k\le t\le2^{-n}(k+1)\}\ge2^{-n-1}\}$. Show that if $\mathcal W\subseteq\Lambda$ and $F_{nk}$ is an essential supremum for $\{W_{nk}:W\in\mathcal W\}$ in $\Sigma$ for all $n$, $k$, then $\bigcup_{n\in\mathbb N}\bigcap_{m\ge n}\bigcup_{k}$ […] $[2^{-m}k,2^{-m}(k+1)]$ […] 0. Show that if the c.l.d. product measure on $X\times Y$ is strictly localizable, then $\mu$ is strictly localizable. (Hint: take $F\in\mathrm T$, $0<\nu F<\infty$. Let $\langle W_i\rangle_{i\in I}$ be a decomposition of $X\times Y$. For $i\in I$, $n\in\mathbb N$ set $E_{in}=\{x:\nu^*\{y:y\in F,\ (x,y)\in W_i\}\ge2^{-n}\}$. Apply 213Ye to $\{E_{in}:i\in I,\ n\in\mathbb N\}$.)

(t) Let $(X,\Sigma,\mu)$ be the space of Example 216E, and give Lebesgue measure to $[0,1]$. Show that the c.l.d. product measure on $X\times[0,1]$ is complete, locally determined, atomless and localizable, but not strictly localizable.
(u) Show that if $p$ is any non-zero (real) polynomial in $r$ variables, then $\{x:x\in\mathbb R^r,\ p(x)=0\}$ is Lebesgue negligible.

252 Notes and comments For a volume and a half now I have asked you to accept the idea of integrating partially-defined functions, insisting that sooner or later they would appear at the core of the subject. The moment has now come. If we wish to apply Fubini's and Tonelli's theorems in the most fundamental of all cases, with both factors equal to Lebesgue measure on the unit interval, it is surely natural to look at all functions which are integrable on the square for two-dimensional Lebesgue measure. Now two-dimensional Lebesgue measure is a complete measure, so, in particular, assigns zero measure to any set of the form $\{(x,b):x\in A\}$ or $\{(a,y):y\in A\}$, whether or not the set $A$ is measurable for one-dimensional measure. Accordingly, if $f$ is a function of two variables which is integrable for two-dimensional Lebesgue measure, there is no reason why any particular section $x\mapsto f(x,b)$ or $y\mapsto f(a,y)$ should be measurable, let alone integrable. Consequently, even if $f$ itself is defined everywhere, the outer integral of $\iint f(x,y)\,dx\,dy$ is likely to be applied to a function which is not defined for every $y$. Let me remark that the problem does not concern '$\infty$'; the awkward functions are those with sections so irregular that they cannot be assigned an integral at all.
I have seen many approaches to this particular nettle, generally less whole-hearted than the one I have determined on for this treatise. Part of the difficulty is that Fubini's theorem really is at the centre of measure theory. Over large parts of the subject, it is possible to assert that a result is non-trivial if and only if it depends on Fubini's theorem. I am therefore unwilling to insert any local fix, saying that 'in this chapter, we shall integrate functions which are not defined everywhere'; before long, such a provision would have to be interpolated into the preambles to half the best theorems, or an explanation offered of why it wasn't necessary in their particular contexts. I suppose that one of the commonest responses is (like Halmos 50) to restrict attention to $\Sigma\widehat\otimes\mathrm T$-measurable functions, which eliminates measurability problems for the moment (252Xe, 252P); but unhappily (or rather, to my mind, happily) there are crucial applications in which the functions are not actually $\Sigma\widehat\otimes\mathrm T$-measurable, but belong to some wider class, and this restriction sooner or later leads to undignified contortions as we are forced to adapt limited results to unforeseen contexts. Besides, it leaves unsaid the really rather important information that if $f$ is a measurable function of two variables then (under appropriate conditions) almost all its sections are measurable (252E).

In 252B and its corollaries there is a clumsy restriction: we assume that one of the measures is $\sigma$-finite and the other is either strictly localizable or complete and locally determined. The obvious question is, whether we need these hypotheses. From 252K we see that the hypothesis '$\sigma$-finite' on the second factor can certainly not be abandoned, even when the first factor is a complete probability measure. The requirement '$\mu$ is either strictly localizable or complete and locally determined' is in fact fractionally stronger than what is needed, as well as disagreeably elaborate.
The 'right' hypothesis is that the completion of $\mu$ should be locally determined (see (b*i) of the proof of 252B). The point is that because the product of two measures is the same as the product of their c.l.d. versions (251S), no theorem which leads from the product measure to the factor measures can distinguish between a measure and its c.l.d. version; so that, in 252B, we must expect to need $\mu$ and its c.l.d. version to give rise to the same integrals. The proof of 252B would be better focused if the hypothesis was simplified to '$\nu$ is $\sigma$-finite and $\mu$ is complete and locally determined'. But this would just transfer part of the argument into the proof of 252C.

We also have to work a little harder in 252B in order to cover functions and integrals taking the values $\pm\infty$. Fubini's theorem is so central to measure theory that I believe it is worth taking a bit of extra trouble to state the results in maximal generality. This is especially important because we frequently apply it in multiply repeated integrals, as in 252Xc, in which we have even less control than usual over the intermediate functions to be integrated.

I have expressed all the main results of this section in terms of the 'c.l.d.' product measure. In the case of $\sigma$-finite spaces, of course, which is where the theory works best, we could just as well use the 'primitive' product measure. Indeed, Fubini's theorem itself has a version in terms of the primitive product measure which is rather more elegant than 252B as stated (252Yc), and covers the great majority of applications. (Integrals with respect to the primitive and c.l.d. product measures are of course very closely related; see 252Yd.) But we do sometimes need to look at non-$\sigma$-finite spaces, and in these cases the asymmetric form in 252B is close to the best we can do. Using the primitive product measure does not help at all with the most substantial obstacle, the phenomenon in 252K (see 252Yf).
The pre-calculus concept of an integral as 'the area under a curve' is given expression in 252N: the integral of a non-negative function is the measure of its ordinate set. This is unsatisfactory as a definition of the integral, not just because of the requirement that the base space should be complete and locally determined (which can be dealt with by using the primitive product measure, as in 252Yg), but because the construction of the product measure involves integration (part (c) of the proof of 251E). The idea of 252N is to relate the measure of an ordinate set to the integral of the measures of its vertical sections. Curiously, if instead we integrate the measures of its horizontal sections, as in 252O, we get a more versatile result. (Indeed this one does not involve the concept of 'product measure', and could have appeared at any point after §123.) Note that the integral $\int_0^\infty\ldots dt$ here is applied to a monotonic function, so may be interpreted as an improper Riemann integral. If you think you know enough about the Riemann integral to make this a tempting alternative to the construction in §122, the tricky bit now becomes the proof that the integral is additive.

A different line of argument is to use integration over sections to define a product measure. The difficulty with this approach is that unless we take great care we may find ourselves with an asymmetric construction. My own view is that such an asymmetry is acceptable only when there is no alternative. But in Chapter 43 of Volume 4 I will describe a couple of examples.
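The 'horizontal sections' formula can be checked by hand in the simplest possible setting. What follows is only a minimal sketch, assuming a finite set carrying a weighted counting measure (so every function is measurable and the upper integral is the ordinary integral); the names `integral` and `horizontal_sections_integral` are illustrative and do not come from the text. Since $t\mapsto\mu\{x:f(x)\ge t\}$ is then a step function, the integral $\int_0^\infty\mu\{x:f(x)\ge t\}\,dt$ reduces to a finite sum over the distinct positive values of $f$.

```python
from fractions import Fraction

def integral(f, weights):
    """Integral of f over a finite set whose points x carry mass weights[x]."""
    return sum(f[x] * weights[x] for x in weights)

def horizontal_sections_integral(f, weights):
    """Integral over t in ]0, oo[ of mu{x : f(x) >= t}, for non-negative f.

    On ]previous, level] the function t -> mu{x : f(x) >= t} is constant,
    equal to mu{x : f(x) >= level}, so the integral is a finite sum.
    """
    levels = sorted({f[x] for x in weights if f[x] > 0})
    total = Fraction(0)
    previous = Fraction(0)
    for level in levels:
        tail = sum(weights[x] for x in weights if f[x] >= level)
        total += (level - previous) * tail
        previous = level
    return total

if __name__ == "__main__":
    # Hypothetical three-point measure space, chosen only for illustration.
    weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(2)}
    f = {"a": Fraction(3), "b": Fraction(0), "c": Fraction(5, 4)}
    print(integral(f, weights), horizontal_sections_integral(f, weights))
```

Exact rational arithmetic is used so that the two sides can be compared for equality rather than up to rounding.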
Of the two examples I give here, 252K is supposed to show that when I call for $\sigma$-finite spaces they are really necessary, while 252L is supposed to show that joint measurability is essential in Tonelli's theorem and its corollaries. The factor spaces in 252K, Lebesgue measure and counting measure, are chosen to show that it is only the lack of $\sigma$-finiteness that can be the problem; they are otherwise as regular as one can reasonably ask. In 252L I have used the countable-cocountable measure on $\omega_1$, which you may feel is fit only for counter-examples; and the question does arise, whether the same phenomenon occurs with Lebesgue measure. This leads into deep water, and I will return to it in Volume 5.

I ought perhaps to note explicitly that in Fubini's theorem, we really do need to have a function which is integrable for the product measure. I include 252Xf and 252Xg to remind you that even in the best-regulated circumstances, the repeated integrals $\iint f\,dx\,dy$, $\iint f\,dy\,dx$ may fail to be equal if $f$ is not integrable as a function of two variables.

There are many ways to calculate the volume $\beta_r$ of an $r$-dimensional ball; the one I have used in 252Q follows a line that would have been natural to me before I ever heard of measure theory. In 252Xh I suggest another method. The idea of integration-by-substitution, used in part (b) of the argument, is there supported by an ad hoc argument; I will present a different, more generally applicable, approach in Chapter 26. Elsewhere (252Xh, 252Yk, 252Yl) I find myself taking for granted substitutions of the form $t\mapsto at$, $t\mapsto a+t$; for a systematic justification, see §263. Of course an enormous number of other formulae of advanced calculus are also based on repeated integration of one kind or another, and I give a sample handful of such results (252Xi, 252Yj-252Yn).
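Two of the formulae in that handful, the Beta integral of 252Yk and Stirling's formula of 252Yn, lend themselves to a quick numerical sanity check. The following is only a rough sketch and proves nothing: the function names are illustrative, the quadrature is a plain midpoint rule (an assumption of convenience, and a poor choice when $a<1$ or $b<1$, where the integrand is unbounded), and the Stirling ratio is computed through logarithms so that large $n$ does not overflow.

```python
import math

def beta_integral(a, b, n=100_000):
    """Midpoint-rule estimate of the integral of t^(a-1) (1-t)^(b-1) over [0, 1].

    Intended for a, b >= 1, where the integrand is bounded.
    """
    h = 1.0 / n
    return h * sum(((i + 0.5) * h) ** (a - 1) * (1 - (i + 0.5) * h) ** (b - 1)
                   for i in range(n))

def beta_from_gamma(a, b):
    """Gamma(a) Gamma(b) / Gamma(a + b), the left-hand side in 252Yk."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def stirling_ratio(n):
    """n! / (e^(-n) n^n sqrt(n)), which 252Yn(v) says tends to sqrt(2 pi)."""
    return math.exp(math.lgamma(n + 1) + n - (n + 0.5) * math.log(n))

if __name__ == "__main__":
    print(beta_integral(2.5, 3.0), beta_from_gamma(2.5, 3.0))
    print(stirling_ratio(10 ** 6), math.sqrt(2 * math.pi))
```

The two numbers printed on each line should agree to several decimal places; the ratio in the second line approaches its limit from above.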
253 Tensor products

The theorems of the last section show that the integrable functions on a product of two measure spaces can be effectively studied in terms of integration on each factor space separately. In this section I present a very striking relationship between the $L^1$ space of a product measure and the $L^1$ spaces of its factors, which actually determines the product $L^1$ up to isomorphism as Banach lattice. I start with a brief note on bilinear maps (253A) and a description of the canonical bilinear map from $L^1(\mu)\times L^1(\nu)$ to $L^1(\mu\times\nu)$ (253B-253E). The main theorem of the section is 253F, showing that this canonical map is universal for continuous bilinear maps from $L^1(\mu)\times L^1(\nu)$ to Banach spaces; it also determines the ordering of $L^1(\mu\times\nu)$ (253G). I end with a description of a fundamental type of conditional expectation operator (253H) and notes on products of indefinite-integral measures (253I) and upper integrals of special kinds of function (253J, 253K).

253A Bilinear maps Before looking at any of the measure theory in this section, I introduce a concept from the theory of linear spaces.

(a) Let $U$, $V$ and $W$ be linear spaces over $\mathbb R$ (or, indeed, any other field). A map $\phi:U\times V\to W$ is bilinear if it is linear in each variable separately, that is,
\[ \phi(u_1+u_2,v)=\phi(u_1,v)+\phi(u_2,v), \]
\[ \phi(u,v_1+v_2)=\phi(u,v_1)+\phi(u,v_2), \]
\[ \phi(\alpha u,v)=\alpha\phi(u,v)=\phi(u,\alpha v) \]
for all $u$, $u_1$, $u_2\in U$, $v$, $v_1$, $v_2\in V$ and scalars $\alpha$. Observe that $\phi$ gives rise to, and in turn can be defined by, a linear operator $T:U\to\mathrm L(V;W)$, writing $\mathrm L(V;W)$ for the space of linear operators from $V$ to $W$, where
\[ (Tu)(v)=\phi(u,v) \]
for all $u\in U$, $v\in V$. Hence, or otherwise, we can see, for instance, that $\phi(0,v)=\phi(u,0)=0$ whenever $u\in U$, $v\in V$.

If $W'$ is another linear space over the same field, and $S:W\to W'$ is a linear operator, then $S\phi:U\times V\to W'$ is bilinear.
(b) Now suppose that $U$, $V$ and $W$ are normed spaces, and $\phi:U\times V\to W$ is a bilinear map. Then we say that $\phi$ is bounded if $\sup\{\|\phi(u,v)\|:\|u\|\le1,\ \|v\|\le1\}$ is finite, and in this case we call this supremum the norm $\|\phi\|$ of $\phi$. Note that $\|\phi(u,v)\|\le\|\phi\|\|u\|\|v\|$ for all $u\in U$, $v\in V$ (because
\[ \|\phi(u,v)\|=\alpha\beta\|\phi(\alpha^{-1}u,\beta^{-1}v)\|\le\alpha\beta\|\phi\| \]
whenever $\alpha>\|u\|$, $\beta>\|v\|$).

If $W'$ is another normed space and $S:W\to W'$ is a bounded linear operator, then $S\phi:U\times V\to W'$ is a bounded bilinear map, and $\|S\phi\|\le\|S\|\|\phi\|$.

253B Definition The most important bilinear maps of this section are based on the following idea. Let $f$ and $g$ be real-valued functions. I will write $f\otimes g$ for the function $(x,y)\mapsto f(x)g(y):\operatorname{dom}f\times\operatorname{dom}g\to\mathbb R$.

253C Proposition (a) Let $X$ and $Y$ be sets, and $\Sigma$, $\mathrm T$ $\sigma$-algebras of subsets of $X$, $Y$ respectively. If $f$ is a $\Sigma$-measurable real-valued function defined on a subset of $X$, and $g$ is a $\mathrm T$-measurable real-valued function defined on a subset of $Y$, then $f\otimes g$, as defined in 253B, is $\Sigma\widehat\otimes\mathrm T$-measurable.

(b) Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X\times Y$. If $f\in\mathcal L^0(\mu)$ and $g\in\mathcal L^0(\nu)$, then $f\otimes g\in\mathcal L^0(\lambda)$.

Remark Recall from 241A that $\mathcal L^0(\mu)$ is the space of $\mu$-virtually measurable real-valued functions defined on $\mu$-conegligible subsets of $X$.

proof (a) The point is that $f\otimes\chi Y$ is $\Sigma\widehat\otimes\mathrm T$-measurable, because for any $\alpha\in\mathbb R$ there is an $E\in\Sigma$ such that
\[ \{x:f(x)\ge\alpha\}=E\cap\operatorname{dom}f, \]
so that
\[ \{(x,y):(f\otimes\chi Y)(x,y)\ge\alpha\}=(E\cap\operatorname{dom}f)\times Y=(E\times Y)\cap\operatorname{dom}(f\otimes\chi Y), \]
and of course $E\times Y\in\Sigma\widehat\otimes\mathrm T$. Similarly, $\chi X\otimes g$ is $\Sigma\widehat\otimes\mathrm T$-measurable and $f\otimes g=(f\otimes\chi Y)\times(\chi X\otimes g)$ is $\Sigma\widehat\otimes\mathrm T$-measurable.

(b) Let $E\in\Sigma$, $F\in\mathrm T$ be conegligible subsets of $X$, $Y$ respectively such that $E\subseteq\operatorname{dom}f$, $F\subseteq\operatorname{dom}g$, $f{\restriction}E$ is $\Sigma$-measurable and $g{\restriction}F$ is $\mathrm T$-measurable. Write $\Lambda$ for the domain of $\lambda$. Then $\Sigma\widehat\otimes\mathrm T\subseteq\Lambda$ (251Ia). Also $E\times F$ is $\lambda$-conegligible, because
\[ \lambda((X\times Y)\setminus(E\times F))\le\lambda((X\setminus E)\times Y)+\lambda(X\times(Y\setminus F))=\mu(X\setminus E)\cdot\nu Y+\mu X\cdot\nu(Y\setminus F)=0 \]
(also from 251Ia). So $\operatorname{dom}(f\otimes g)\supseteq E\times F$ is conegligible. Also, by (a), $(f\otimes g){\restriction}(E\times F)=(f{\restriction}E)\otimes(g{\restriction}F)$ is $\Sigma\widehat\otimes\mathrm T$-measurable, therefore $\Lambda$-measurable, and $f\otimes g$ is virtually measurable. Thus $f\otimes g\in\mathcal L^0(\lambda)$, as claimed.

253D Now we can apply the ideas of 253B-253C to integrable functions.
Proposition Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be measure spaces, and write $\lambda$ for the c.l.d. product measure on $X\times Y$. If $f\in\mathcal L^1(\mu)$ and $g\in\mathcal L^1(\nu)$, then $f\otimes g\in\mathcal L^1(\lambda)$ and $\int f\otimes g\,d\lambda=\int f\,d\mu\int g\,d\nu$.

Remark I follow §242 in writing $\mathcal L^1(\mu)$ for the space of $\mu$-integrable real-valued functions.

proof (a) Consider first the case $f=\chi E$, $g=\chi F$ where $E\in\Sigma$, $F\in\mathrm T$ have finite measure; then $f\otimes g=\chi(E\times F)$ is $\lambda$-integrable with integral
\[ \lambda(E\times F)=\mu E\cdot\nu F=\int f\,d\mu\cdot\int g\,d\nu, \]
by 251Ia.

(b) It follows at once that $f\otimes g$ is $\lambda$-simple, with $\int f\otimes g\,d\lambda=\int f\,d\mu\int g\,d\nu$, whenever $f$ is a $\mu$-simple function and $g$ is a $\nu$-simple function.

(c) If $f$ and $g$ are non-negative integrable functions, there are non-decreasing sequences $\langle f_n\rangle_{n\in\mathbb N}$, $\langle g_n\rangle_{n\in\mathbb N}$ of non-negative simple functions converging almost everywhere to $f$, $g$ respectively; now note that if $E\subseteq X$, $F\subseteq Y$ are conegligible, $E\times F$ is conegligible in $X\times Y$, as remarked in the proof of 253C, so the non-decreasing sequence $\langle f_n\otimes g_n\rangle_{n\in\mathbb N}$ of $\lambda$-simple functions converges almost everywhere to $f\otimes g$, and
\[ \int f\otimes g\,d\lambda=\lim_{n\to\infty}\int f_n\otimes g_n\,d\lambda=\lim_{n\to\infty}\int f_n\,d\mu\int g_n\,d\nu=\int f\,d\mu\int g\,d\nu \]
by B.Levi's theorem.

(d) Finally, for general $f$ and $g$, we can express them as the differences $f^+-f^-$, $g^+-g^-$ of non-negative integrable functions, and see that
\[ \int f\otimes g\,d\lambda=\int f^+\otimes g^+-f^+\otimes g^--f^-\otimes g^++f^-\otimes g^-\,d\lambda=\int f\,d\mu\int g\,d\nu. \]
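The identity of 253D can be seen at work in the most elementary case. The sketch below is an illustration only, under the assumption that both factors are finite sets with weighted counting measures, so that the product measure gives the point $(x,y)$ the mass $\mu\{x\}\cdot\nu\{y\}$; the names `integrate`, `tensor` and `product_measure` are hypothetical helpers, not notation from the text.

```python
from fractions import Fraction
from itertools import product

def integrate(f, mu):
    """Integral of f with respect to the measure giving each point x mass mu[x]."""
    return sum(f[x] * mu[x] for x in mu)

def tensor(f, g):
    """The function f (x) g : (x, y) -> f(x) g(y) on dom f x dom g, as in 253B."""
    return {(x, y): f[x] * g[y] for x, y in product(f, g)}

def product_measure(mu, nu):
    """Product of two finite weighted counting measures."""
    return {(x, y): mu[x] * nu[y] for x, y in product(mu, nu)}

if __name__ == "__main__":
    # Hypothetical data, chosen only for illustration.
    mu = {0: Fraction(1, 2), 1: Fraction(3)}
    nu = {"p": Fraction(2), "q": Fraction(1, 3), "r": Fraction(1)}
    f = {0: Fraction(-1), 1: Fraction(2, 3)}
    g = {"p": Fraction(4), "q": Fraction(-6), "r": Fraction(1, 2)}
    lam = product_measure(mu, nu)
    print(integrate(tensor(f, g), lam), integrate(f, mu) * integrate(g, nu))
```

Applying the same computation to $|f|$ and $|g|$ gives the corresponding equality for the integrals of absolute values, again exactly because the arithmetic is rational.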
253E The canonical map $L^1\times L^1\to L^1$ I continue the argument from 253D. Because $E\times F$ is conegligible in $X\times Y$ whenever $E$ and $F$ are conegligible subsets of $X$ and $Y$, $f_1\otimes g_1=f\otimes g$ $\lambda$-a.e. whenever $f=f_1$ $\mu$-a.e. and $g=g_1$ $\nu$-a.e. We may therefore define $u\otimes v\in L^1(\lambda)$, for $u\in L^1(\mu)$ and $v\in L^1(\nu)$, by saying that $u\otimes v=(f\otimes g)^\bullet$ whenever $u=f^\bullet$ and $v=g^\bullet$.

Now if $f$, $f_1$, $f_2\in\mathcal L^1(\mu)$, $g$, $g_1$, $g_2\in\mathcal L^1(\nu)$ and $a\in\mathbb R$,
\[ (f_1+f_2)\otimes g=(f_1\otimes g)+(f_2\otimes g), \]
\[ f\otimes(g_1+g_2)=(f\otimes g_1)+(f\otimes g_2), \]
\[ (af)\otimes g=a(f\otimes g)=f\otimes(ag). \]
It follows at once that the map $(u,v)\mapsto u\otimes v$ is bilinear.

Moreover, if $f\in\mathcal L^1(\mu)$ and $g\in\mathcal L^1(\nu)$, $|f|\otimes|g|=|f\otimes g|$, so $\int|f\otimes g|\,d\lambda=\int|f|\,d\mu\int|g|\,d\nu$. Accordingly
\[ \|u\otimes v\|_1=\|u\|_1\|v\|_1 \]
for all $u\in L^1(\mu)$, $v\in L^1(\nu)$. In particular, the bilinear map $\otimes$ is bounded, with norm $1$ (except in the trivial case in which one of $L^1(\mu)$, $L^1(\nu)$ is $0$-dimensional).

253F We are now ready for the main theorem of this section.
Theorem Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be measure spaces, and let $\lambda$ be the c.l.d. product measure on $X\times Y$. Let $W$ be any Banach space and $\phi:L^1(\mu)\times L^1(\nu)\to W$ a bounded bilinear map. Then there is a unique bounded linear operator $T:L^1(\lambda)\to W$ such that $T(u\otimes v)=\phi(u,v)$ for all $u\in L^1(\mu)$, $v\in L^1(\nu)$, and $\|T\|=\|\phi\|$.

proof (a) The centre of the argument is the following fact: if $E_0,\ldots,E_n$ are measurable sets of finite measure in $X$, $F_0,\ldots,F_n$ are measurable sets of finite measure in $Y$, $a_0,\ldots,a_n\in\mathbb R$ and $\sum_{i=0}^na_i\chi(E_i\times F_i)=0$ $\lambda$-a.e., then $\sum_{i=0}^na_i\phi(\chi E_i^\bullet,\chi F_i^\bullet)=0$ in $W$. $\mathbf P$ We can find a disjoint family $\langle G_j\rangle_{j\le m}$ of measurable sets of finite measure in $X$ such that each $E_i$ is expressible as a union of some subfamily of the $G_j$; so that $\chi E_i$ is expressible in the form $\sum_{j=0}^mb_{ij}\chi G_j$ (see 122Ca). Similarly, we can find a disjoint family $\langle H_k\rangle_{k\le l}$ of measurable sets of finite measure in $Y$ such that each $\chi F_i$ is expressible as $\sum_{k=0}^lc_{ik}\chi H_k$. Now
\[ \sum_{j=0}^m\sum_{k=0}^l\Bigl(\sum_{i=0}^na_ib_{ij}c_{ik}\Bigr)\chi(G_j\times H_k)=\sum_{i=0}^na_i\chi(E_i\times F_i)=0\quad\lambda\text{-a.e.} \]
Because the $G_j\times H_k$ are disjoint, and $\lambda(G_j\times H_k)=\mu G_j\cdot\nu H_k$ for all $j$, $k$, it follows that for every $j\le m$, $k\le l$ we have either $\mu G_j=0$ or $\nu H_k=0$ or $\sum_{i=0}^na_ib_{ij}c_{ik}=0$. In any of these three cases, $\sum_{i=0}^na_ib_{ij}c_{ik}\phi(\chi G_j^\bullet,\chi H_k^\bullet)=0$ in $W$. But this means that
\[ 0=\sum_{j=0}^m\sum_{k=0}^l\Bigl(\sum_{i=0}^na_ib_{ij}c_{ik}\phi(\chi G_j^\bullet,\chi H_k^\bullet)\Bigr)=\sum_{i=0}^na_i\phi(\chi E_i^\bullet,\chi F_i^\bullet), \]
as claimed. $\mathbf Q$

(b) It follows that if $E_0,\ldots,E_n$, $E'_0,\ldots,E'_m$ are measurable sets of finite measure in $X$, $F_0,\ldots,F_n$, $F'_0,\ldots,F'_m$ are measurable sets of finite measure in $Y$, $a_0,\ldots,a_n$, $a'_0,\ldots,a'_m\in\mathbb R$ and $\sum_{i=0}^na_i\chi(E_i\times F_i)=\sum_{i=0}^ma'_i\chi(E'_i\times F'_i)$ $\lambda$-a.e., then
\[ \sum_{i=0}^na_i\phi(\chi E_i^\bullet,\chi F_i^\bullet)=\sum_{i=0}^ma'_i\phi(\chi E_i'^\bullet,\chi F_i'^\bullet) \]
in $W$. Let $M$ be the linear subspace of $L^1(\lambda)$ generated by
\[ \{\chi(E\times F)^\bullet:E\in\Sigma,\ \mu E<\infty,\ F\in\mathrm T,\ \nu F<\infty\}; \]
then we have a unique map $T_0:M\to W$ such that
\[ T_0\Bigl(\sum_{i=0}^na_i\chi(E_i\times F_i)^\bullet\Bigr)=\sum_{i=0}^na_i\phi(\chi E_i^\bullet,\chi F_i^\bullet) \]
whenever $E_0,\ldots,E_n$ are measurable sets of finite measure in $X$, $F_0,\ldots,F_n$ are measurable sets of finite measure in $Y$ and $a_0,\ldots,a_n\in\mathbb R$. Of course $T_0$ is linear.

(c) Some of the same calculations show that $\|T_0u\|\le\|\phi\|\|u\|_1$ for every $u\in M$. $\mathbf P$ If $u\in M$, then, by the arguments of (a), we can express $u$ as $\sum_{j=0}^m\sum_{k=0}^la_{jk}\chi(G_j\times H_k)^\bullet$, where $\langle G_j\rangle_{j\le m}$ and $\langle H_k\rangle_{k\le l}$ are disjoint families of sets of finite measure. Now
\[ \|T_0u\|=\Bigl\|\sum_{j=0}^m\sum_{k=0}^la_{jk}\phi(\chi G_j^\bullet,\chi H_k^\bullet)\Bigr\|\le\sum_{j=0}^m\sum_{k=0}^l|a_{jk}|\,\|\phi(\chi G_j^\bullet,\chi H_k^\bullet)\| \]
\[ \le\sum_{j=0}^m\sum_{k=0}^l|a_{jk}|\,\|\phi\|\,\|\chi G_j^\bullet\|_1\|\chi H_k^\bullet\|_1=\|\phi\|\sum_{j=0}^m\sum_{k=0}^l|a_{jk}|\,\mu G_j\cdot\nu H_k \]
\[ =\|\phi\|\sum_{j=0}^m\sum_{k=0}^l|a_{jk}|\,\lambda(G_j\times H_k)=\|\phi\|\|u\|_1, \]
as claimed. $\mathbf Q$

(d) The next point is to observe that $M$ is dense in $L^1(\lambda)$ for $\|\ \|_1$. $\mathbf P$ Repeating the ideas above once again, we observe that if $E_0,\ldots,E_n$ are sets of finite measure in $X$ and $F_0,\ldots,F_n$ are sets of finite measure in $Y$, then $\chi(\bigcup_{i\le n}E_i\times F_i)^\bullet\in M$; this is because, expressing each $E_i$ as a union of $G_j$, where the $G_j$ are disjoint, we have
\[ \bigcup_{i\le n}E_i\times F_i=\bigcup_{j\le m}G_j\times F'_j, \]
where $F'_j=\bigcup\{F_i:G_j\subseteq E_i\}$ for each $j$; now $\langle G_j\times F'_j\rangle_{j\le m}$ is disjoint, so
\[ \chi\Bigl(\bigcup_{j\le m}G_j\times F'_j\Bigr)^\bullet=\sum_{j=0}^m\chi(G_j\times F'_j)^\bullet\in M. \]
So 251Ie tells us that whenever $\lambda H<\infty$ and $\epsilon>0$ there is a $G$ such that $\lambda(H\triangle G)\le\epsilon$ and $\chi G^\bullet\in M$; now
\[ \|\chi H^\bullet-\chi G^\bullet\|_1=\lambda(G\triangle H)\le\epsilon, \]
so $\chi H^\bullet$ is approximated arbitrarily closely by members of $M$, and belongs to the closure $\overline M$ of $M$ in $L^1(\lambda)$. Because $M$ is a linear subspace of $L^1(\lambda)$, so is $\overline M$ (2A4Cb); accordingly $\overline M$ contains the equivalence classes of all $\lambda$-simple functions; but these are dense in $L^1(\lambda)$ (242M), so $\overline M=L^1(\lambda)$, as claimed. $\mathbf Q$

(e) Because $W$ is a Banach space, it follows that there is a bounded linear operator $T:L^1(\lambda)\to W$ extending $T_0$, with $\|T\|=\|T_0\|\le\|\phi\|$ (2A4I). Now $T(u\otimes v)=\phi(u,v)$ for all $u\in L^1(\mu)$, $v\in L^1(\nu)$. $\mathbf P$ If $u=\chi E^\bullet$, $v=\chi F^\bullet$, where $E$ and $F$ are measurable sets of finite measure, then
\[ T(u\otimes v)=T(\chi(E\times F)^\bullet)=T_0(\chi(E\times F)^\bullet)=\phi(\chi E^\bullet,\chi F^\bullet)=\phi(u,v). \]
Because $\phi$ and $\otimes$ are bilinear and $T$ is linear,
\[ T(f^\bullet\otimes g^\bullet)=\phi(f^\bullet,g^\bullet) \]
whenever $f$ and $g$ are simple functions. Now whenever $u\in L^1(\mu)$, $v\in L^1(\nu)$ and $\epsilon>0$, there are simple functions $f$, $g$ such that $\|u-f^\bullet\|_1\le\epsilon$, $\|v-g^\bullet\|_1\le\epsilon$ (242M again); so that
\[ \|\phi(u,v)-\phi(f^\bullet,g^\bullet)\|\le\|\phi(u-f^\bullet,v-g^\bullet)\|+\|\phi(u,g^\bullet-v)\|+\|\phi(f^\bullet-u,v)\|\le\|\phi\|(\epsilon^2+\epsilon\|u\|_1+\epsilon\|v\|_1). \]
Similarly
\[ \|u\otimes v-f^\bullet\otimes g^\bullet\|_1\le\epsilon(\epsilon+\|u\|_1+\|v\|_1), \]
so
\[ \|T(u\otimes v)-T(f^\bullet\otimes g^\bullet)\|\le\epsilon\|T\|(\epsilon+\|u\|_1+\|v\|_1); \]
because $T(f^\bullet\otimes g^\bullet)=\phi(f^\bullet,g^\bullet)$,
\[ \|T(u\otimes v)-\phi(u,v)\|\le\epsilon(\|T\|+\|\phi\|)(\epsilon+\|u\|_1+\|v\|_1). \]
As $\epsilon$ is arbitrary, $T(u\otimes v)=\phi(u,v)$, as required. $\mathbf Q$

(f) The argument of (e) ensured that $\|T\|\le\|\phi\|$. Because $\|u\otimes v\|_1\le\|u\|_1\|v\|_1$ for all $u\in L^1(\mu)$, $v\in L^1(\nu)$, $\|\phi(u,v)\|\le\|T\|\|u\|_1\|v\|_1$ for all $u$, $v$, and $\|\phi\|\le\|T\|$; so $\|T\|=\|\phi\|$.

(g) Thus $T$ has the required properties. To see that it is unique, we have only to observe that any bounded linear operator $S:L^1(\lambda)\to W$ such that $S(u\otimes v)=\phi(u,v)$ for all $u\in L^1(\mu)$, $v\in L^1(\nu)$ must agree with $T$ on objects of the form $\chi(E\times F)^\bullet$ where $E$ and $F$ are of finite measure, and therefore on every member of $M$; because $M$ is dense and both $S$ and $T$ are continuous, they agree everywhere in $L^1(\lambda)$.

253G The order structure of $L^1$ In 253F I have treated the $L^1$ spaces exclusively as normed linear spaces. In general, however, the order structure of an $L^1$ space (see 242C) is as important as its norm. The map $\otimes:L^1(\mu)\times L^1(\nu)\to L^1(\lambda)$ respects the order structures of the three spaces in the following strong sense.

Proposition Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X\times Y$. Then
(a) $u\otimes v\ge0$ in $L^1(\lambda)$ whenever $u\ge0$ in $L^1(\mu)$ and $v\ge0$ in $L^1(\nu)$.
(b) The positive cone $\{w:w\ge0\}$ of $L^1(\lambda)$ is precisely the closed convex hull $C$ of $\{u\otimes v:u\ge0,\ v\ge0\}$ in $L^1(\lambda)$.
*(c) Let $W$ be any Banach lattice, and $T:L^1(\lambda)\to W$ a bounded linear operator. Then the following are equiveridical:
(i) $Tw\ge0$ in $W$ whenever $w\ge0$ in $L^1(\lambda)$;
(ii) $T(u\otimes v)\ge0$ in $W$ whenever $u\ge0$ in $L^1(\mu)$ and $v\ge0$ in $L^1(\nu)$.

proof (a) If $u$, $v\ge0$ then they are expressible as $f^\bullet$, $g^\bullet$ where $f\in\mathcal L^1(\mu)$, $g\in\mathcal L^1(\nu)$ and $f\ge0$, $g\ge0$. Now $f\otimes g\ge0$ so $u\otimes v=(f\otimes g)^\bullet\ge0$.

(b)(i) Write $L^1(\lambda)^+$ for $\{w:w\in L^1(\lambda),\ w\ge0\}$. Then $L^1(\lambda)^+$ is a closed convex set in $L^1(\lambda)$ (242De); by (a), it contains $u\otimes v$ whenever $u\in L^1(\mu)^+$, $v\in L^1(\nu)^+$, so it must include $C$.
(ii)($\alpha$) Of course $0=0\otimes0\in C$.

($\beta$) If $u\in M$, as defined in the proof of 253F, and $u>0$, then $u$ is expressible as $\sum_{j\le m,k\le l}a_{jk}\chi(G_j\times H_k)^\bullet$, where $G_0,\ldots,G_m$ and $H_0,\ldots,H_l$ are disjoint sequences of sets of finite measure, as in (a) of the proof of 253F. Now $a_{jk}$ can be negative only if $\chi(G_j\times H_k)^\bullet=0$, so replacing every $a_{jk}$ by $\max(0,a_{jk})$ if necessary, we can suppose that $a_{jk}\ge0$ for all $j$, $k$. Not all the $a_{jk}$ can be zero, so $a=\sum_{j\le m,k\le l}a_{jk}>0$, and
\[ u=\sum_{j\le m,k\le l}\frac{a_{jk}}{a}\cdot a\chi(G_j\times H_k)^\bullet=\sum_{j\le m,k\le l}\frac{a_{jk}}{a}\cdot(a\chi G_j^\bullet)\otimes\chi H_k^\bullet\in C. \]

($\gamma$) If $w\in L^1(\lambda)^+$ and $\epsilon>0$, express $w$ as $h^\bullet$ where $h\ge0$ in $\mathcal L^1(\lambda)$. There is a simple function $h_1\ge0$ such that $h_1\le_{\text{a.e.}}h$ and $\int h\le\int h_1+\epsilon$. Express $h_1$ as $\sum_{i=0}^na_i\chi H_i$ where $\lambda H_i<\infty$, $a_i\ge0$ for each $i$, and for each $i\le n$ choose sets $G_{i0},\ldots,G_{im_i}\in\Sigma$, $F_{i0},\ldots,F_{im_i}\in\mathrm T$, all of finite measure, such that $G_{i0},\ldots,G_{im_i}$ are disjoint and $\lambda(H_i\triangle\bigcup_{j\le m_i}G_{ij}\times F_{ij})\le\frac{\epsilon}{(n+1)(a_i+1)}$, as in (d) of the proof of 253F. Set
\[ w_0=\sum_{i=0}^na_i\sum_{j=0}^{m_i}\chi(G_{ij}\times F_{ij})^\bullet. \]
Then $w_0\in C$ because $w_0\in M$ and $w_0\ge0$. Also
\[ \|w-w_0\|_1\le\|w-h_1^\bullet\|_1+\|h_1^\bullet-w_0\|_1\le\int(h-h_1)\,d\lambda+\sum_{i=0}^na_i\int\Bigl|\chi H_i-\sum_{j=0}^{m_i}\chi(G_{ij}\times F_{ij})\Bigr|\,d\lambda \]
\[ \le\epsilon+\sum_{i=0}^na_i\lambda\Bigl(H_i\triangle\bigcup_{j\le m_i}G_{ij}\times F_{ij}\Bigr)\le2\epsilon. \]
As $\epsilon$ is arbitrary and $C$ is closed, $w\in C$. As $w$ is arbitrary, $L^1(\lambda)^+\subseteq C$ and $C=L^1(\lambda)^+$.

(c) Part (a) tells us that (i)$\Rightarrow$(ii). For the reverse implication, we need a fragment from the theory of Banach lattices: $W^+=\{w:w\in W,\ w\ge0\}$ is a closed set in $W$. $\mathbf P$ If $w$, $w'\in W$, then
\[ w=(w-w')+w'\le|w-w'|+w'\le|w-w'|+|w'|, \]
\[ -w=(w'-w)-w'\le|w-w'|-w'\le|w-w'|+|w'|, \]
\[ |w|\le|w-w'|+|w'|,\qquad|w|-|w'|\le|w-w'|, \]
because $|w|=w\vee(-w)$ and the order of $W$ is translation-invariant (241Ec). Similarly, $|w'|-|w|\le|w-w'|$ and $\bigl||w|-|w'|\bigr|\le|w-w'|$, so $\bigl\||w|-|w'|\bigr\|\le\|w-w'\|$, by the definition of Banach lattice (242G). Setting $\phi(w)=|w|-w$, we see that $\|\phi(w)-\phi(w')\|\le2\|w-w'\|$ for all $w$, $w'\in W$, so that $\phi$ is continuous. Now, because the order is invariant under multiplication by positive scalars,
\[ w\ge0\iff2w\ge0\iff w\ge-w\iff w=|w|\iff\phi(w)=0, \]
so $W^+=\{w:\phi(w)=0\}$ is closed. $\mathbf Q$

Now suppose that (ii) is true, and set $C_1=\{w:w\in L^1(\lambda),\ Tw\ge0\}$. Then $C_1$ contains $u\otimes v$ whenever $u$, $v\ge0$; but also it is convex, because $T$ is linear, and closed, because $T$ is continuous and $C_1=T^{-1}[W^+]$. By (b), $C_1$ includes $\{w:w\in L^1(\lambda),\ w\ge0\}$, as required by (i).

253H Conditional expectations The ideas of this section and the preceding one provide us with some of the most important examples of conditional expectations.

Theorem Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be complete probability spaces, with c.l.d. product $(X\times Y,\Lambda,\lambda)$. Set $\Lambda_1=\{E\times Y:E\in\Sigma\}$. Then $\Lambda_1$ is a $\sigma$-subalgebra of $\Lambda$. Given a $\lambda$-integrable real-valued function $f$, set
\[ g(x,y)=\int f(x,z)\nu(dz) \]
whenever this is defined. Then $g$ is a conditional expectation of $f$ on $\Lambda_1$.

proof We know that $\Lambda_1\subseteq\Lambda$, by 251Ia, and $\Lambda_1$ is a $\sigma$-algebra of sets because $\Sigma$ is. Fubini's theorem (252B, 252C) tells us that $f_1(x)=\int f(x,z)\nu(dz)$ is defined for almost every $x$, and therefore that $g=f_1\otimes\chi Y$ is defined almost everywhere in $X\times Y$. $f_1$ is $\mu$-virtually measurable; because $\mu$ is complete, $f_1$ is $\Sigma$-measurable, so $g$ is $\Lambda_1$-measurable (since $\{(x,y):g(x,y)\le\alpha\}=\{x:f_1(x)\le\alpha\}\times Y$ for every $\alpha\in\mathbb R$). Finally, if $W\in\Lambda_1$, then $W=E\times Y$ for some $E\in\Sigma$, so
\[ \int_Wg\,d\lambda=\int(f_1\otimes\chi Y)\times(\chi E\otimes\chi Y)\,d\lambda=\int f_1\times\chi E\,d\mu\int\chi Y\,d\nu \]
(by 253D)
\[ =\iint\chi E(x)f(x,y)\nu(dy)\mu(dx)=\int f\times\chi(E\times Y)\,d\lambda \]
(by Fubini's theorem)
\[ =\int_Wf\,d\lambda. \]
So $g$ is a conditional expectation of $f$.

253I This is a convenient moment to set out a useful result on indefinite-integral measures.
Proposition Let $(X,\Sigma,\mu)$ and $(Y,\mathrm T,\nu)$ be measure spaces, and $f\in\mathcal L^0(\mu)$, $g\in\mathcal L^0(\nu)$ non-negative functions. Let $\mu'$, $\nu'$ be the corresponding indefinite-integral measures (see §234). Let $\lambda$ be the c.l.d. product of $\mu$ and $\nu$, and $\lambda'$ the indefinite-integral measure defined from $\lambda$ and $f\otimes g\in\mathcal L^0(\lambda)$ (253Cb). Then $\lambda'$ is the c.l.d. product of $\mu'$ and $\nu'$.

proof Write $\theta$ for the c.l.d. product of $\mu'$ and $\nu'$.
R (a) If we replace µ by its completion, we do not change the integral dµ (212Fb), so (by the definition in 234A) we do not change µ0 ; at the same time, we do not change λ, by 251S. The same applies to ν. So it will be enough to prove the result on the assumption that µ and ν are complete; in which case f and g are measurable and have measurable domains. Set F = {x : x ∈ dom f, f (x) > 0} and G = {y : y ∈ dom g, g(y) > 0}, so that F × G = {w : w ∈ dom(f ⊗ g), (f ⊗ g)(w) > 0}. Then F is µ0 conegligible and G is ν 0 conegligible, so F × G is θconegligible as well as λ0 conegligible. Because both θ and λ0 are complete (251Ic, 234A), it will be enough to show that the subspace measures θF ×G , λ0F ×G on F × G are equal. But note that θF ×G can be identified with the 0 0 product of µ0F and νG , where µ0F and νG are the subspace measures on F , G respectively (251P(iiα)). At 0 the same time, µF is the indefiniteintegral measure defined from the subspace measure µF on F and the 0 is the indefiniteintegral measure defined from the subspace measure νG on G and g¹G, function f ¹F , νG 0 and λF ×G is defined from the subspace measure λF ×G and (f ¹F ) ⊗ (g¹G). Finally, by 251P again, λF ×G is the product of µF and νG . What all this means is that it will be enough to deal with the case in which F = X and G = Y , that is, f and g are everywhere defined and strictly positive; which is what I will suppose from now on. (b) In this case dom µ0 = Σ and dom ν 0 = T (234Da). Similarly, dom λ0 = Λ is just the domain of λ. Set Fn = {x : x ∈ X, 2−n ≤ f (x) ≤ 2n },
Gn = {y : y ∈ Y, 2−n ≤ g(y) ≤ 2n }
for n ∈ N. (c) Set A = {W : W ∈ dom θ ∩ dom λ0 , θ(W ) = λ0 (W )}. If µ0 E and ν 0 H are defined and finite, then f × χE and g × χH are integrable, so Z Z λ0 (E × H) = (f ⊗ g) × χ(E × H)dλ = (f × χE) ⊗ (g × χH)dλ Z Z = f × χE dµ · g × χH dν = θ(E × H) by 253D and 251Ia, that is, E × H ∈ A. If we now look at AEH = {W : W ⊆ X × Y , W ∩ (E × H) ∈ A}, then we see that AEH contains E 0 × H 0 for every E 0 ∈ Σ, H 0 ∈ T, S if hWn in∈N is a nondecreasing sequence in AEH then n∈N Wn ∈ AEH , if W , W 0 ∈ AEH and W ⊆ W 0 then W 0 \ W ∈ AEH . Thus AEH is a Dynkin class of subsets of X × Y , and by the Monotone Class Theorem (136B) includes the b σalgebra generated by {E 0 × H 0 : E 0 ∈ Σ, H 0 ∈ T}, which is Σ⊗T. (d) Now suppose that W ∈ Λ. In this case W ∈ dom θ and θW ≤ λ0 W . P P Take n ∈ N, and E ∈ dom µ0 , 0 0 0 0 0 H ∈ dom ν such that µ E and ν H are both finite. Set E = E ∩ Fn , H = H ∩ Gn and W 0 = W ∩ (E 0 × H 0 ). b such Then W 0 ∈ Λ, while µE 0 ≤ 2n µ0 E and νH 0 ≤ 2n ν 0 H are finite. By 251Ib there is a V ∈ Σ⊗T b such that V 0 ⊆ (E 0 × H 0 ) \ W 0 and that V ⊆ W 0 and λV = λW 0 . Similarly, there is a V 0 ∈ Σ⊗T λV 0 = λ((E 0 × H 0 ) \ W 0 ). This means that λ((E 0 × H 0 ) \ (V ∪ V 0 )) = 0, so λ0 ((E 0 × H 0 ) \ (V ∪ V 0 )) = 0. But (E 0 × H 0 ) \ (V ∪ V 0 ) ∈ A, by (c), so θ((E 0 × H 0 ) \ (V ∪ V 0 )) = 0 and W 0 ∈ dom θ, while θW 0 = θV = λ0 V ≤ λ0 W . Since E and H are arbitrary, W ∩ (Fn × Gn ) ∈ dom θ (251H) and θ(W ∩ (Fn × Gn )) ≤ λ0 W . Since hEn in∈N , hGn in∈N are nondecreasing sequences with unions X, Y respectively, θW = supn∈N θ(W ∩ (En × Gn )) ≤ λ0 W . Q Q (e) In the same way, λ0 W is defined and less than or equal to θW for every W ∈ dom θ. P P The arguments are very similar, but a refinement seems to be necessary at the last stage. Take n ∈ N, and E ∈ Σ, H ∈ T such that µE and νH are both finite. Set E 0 = E ∩ Fn , H 0 = H ∩ Gn and W 0 = W ∩ (E 0 × H 0 ). Then b such that W 0 ∈ dom θ, while µ0 E 0 ≤ 2n µE and ν 0 H 0 ≤ 2n νH are finite. 
This time, there are $V$, $V' \in \Sigma\widehat{\otimes}\mathrm{T}$ such that $V \subseteq W'$, $V' \subseteq (E' \times H') \setminus W'$, $\theta V = \theta W'$ and $\theta V' = \theta((E' \times H') \setminus W')$. Accordingly

$$\lambda' V + \lambda' V' = \theta V + \theta V' = \theta(E' \times H') = \lambda'(E' \times H'),$$

so that $\lambda' W'$ is defined and equal to $\theta W'$.

What this means is that $W \cap (F_n \times G_n) \cap (E \times H) \in \mathcal{A}$ whenever $\mu E$ and $\nu H$ are finite. So $W \cap (F_n \times G_n) \in \Lambda$, by 251H; as $n$ is arbitrary, $W \in \Lambda$ and $\lambda' W$ is defined.

?? Suppose, if possible, that $\lambda' W > \theta W$. Then there is some $n \in \mathbb{N}$ such that $\lambda'(W \cap (F_n \times G_n)) > \theta W$. Because $\lambda$ is semi-finite, 213B tells us that there is some $\lambda$-simple function $h$ such that $h \le (f \otimes g) \times \chi(W \cap (F_n \times G_n))$ and $\int h\,d\lambda > \theta W$; setting $V = \{(x,y) : h(x,y) > 0\}$, we see that $V \subseteq W \cap (F_n \times G_n)$, $\lambda V$ is defined and finite and $\lambda' V > \theta W$. Now there must be sets $E \in \Sigma$, $H \in \mathrm{T}$ such that $\mu E$ and $\nu H$ are both finite and $\lambda(V \setminus (E \times H)) < 4^{-n}(\lambda' V - \theta W)$. But in this case $V \in \Lambda \subseteq \operatorname{dom}\theta$ (by (d)), so we can apply the argument just above to $V$ and conclude that $V \cap (E \times H) = V \cap (F_n \times G_n) \cap (E \times H)$ belongs to $\mathcal{A}$. And now

$$\lambda' V = \lambda'(V \cap (E \times H)) + \lambda'(V \setminus (E \times H)) \le \theta(V \cap (E \times H)) + 4^n\lambda(V \setminus (E \times H)) < \theta V + \lambda' V - \theta W \le \lambda' V,$$

which is absurd. X X So $\lambda' W$ is defined and not greater than $\theta W$. Q Q

(f) Putting this together with (d), we see that $\lambda' = \theta$, as claimed.

Remark If $\mu'$ and $\nu'$ are totally finite, so that they are 'truly continuous' with respect to $\mu$ and $\nu$ in the sense of 232Ab, then $f$ and $g$ are integrable, so $f \otimes g$ is $\lambda$-integrable, and $\theta = \lambda'$ is truly continuous with respect to $\lambda$.

The proof above can be simplified using a fragment of the general theory of complete locally determined spaces, which will be given in §412 in Volume 4.

*253J Upper integrals The idea of 253D can be repeated in terms of upper integrals, as follows.

Proposition Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be $\sigma$-finite measure spaces, with c.l.d. product measure $\lambda$. Then for any functions $f$ and $g$, defined on conegligible subsets of $X$ and $Y$ respectively, and taking values in $[0, \infty]$,

$$\overline{\int} f \otimes g\,d\lambda = \overline{\int} f\,d\mu \cdot \overline{\int} g\,d\nu.$$

Remark Here $(f \otimes g)(x, y) = f(x)g(y)$ for all $x \in \operatorname{dom} f$, $y \in \operatorname{dom} g$, taking $0 \cdot \infty = 0$, as in §135.
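proof (a) I show first that $\overline{\int} f \otimes g \le \overline{\int} f\,\overline{\int} g$. P P If $\overline{\int} f = 0$, then $f =_{\text{a.e.}} 0$, so $f \otimes g =_{\text{a.e.}} 0$ and the result is immediate. The same argument applies if $\overline{\int} g = 0$. If both $\overline{\int} f$ and $\overline{\int} g$ are non-zero, and either is infinite, the result is trivial. So let us suppose that both are finite. In this case there are integrable $f_0$, $g_0$ such that $f \le_{\text{a.e.}} f_0$, $g \le_{\text{a.e.}} g_0$, $\overline{\int} f = \int f_0$ and $\overline{\int} g = \int g_0$ (133J). So $f \otimes g \le_{\text{a.e.}} f_0 \otimes g_0$, and

$$\overline{\int} f \otimes g \le \int f_0 \otimes g_0 = \int f_0 \int g_0 = \overline{\int} f\,\overline{\int} g,$$

by 253D. Q Q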
(b) For the reverse inequality, we need consider only the case in which $\overline{\int} f \otimes g$ is finite, so that there is a $\lambda$-integrable function $h$ such that $f \otimes g \le_{\text{a.e.}} h$ and $\overline{\int} f \otimes g = \int h$. Set

$$f_0(x) = \int h(x, y)\,\nu(dy)$$

whenever this is defined in $\mathbb{R}$, which is almost everywhere, by Fubini's theorem (252B-252C). Then $f_0(x) \ge f(x)\,\overline{\int} g\,d\nu$ for every $x \in \operatorname{dom} f_0 \cap \operatorname{dom} f$, which is a conegligible set in $X$; so

$$\overline{\int} f \otimes g = \int h\,d\lambda = \int f_0\,d\mu \ge \overline{\int} f\,\overline{\int} g,$$

as required.

*253K A similar argument applies to upper integrals of sums, as follows.
Proposition Let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be probability spaces, with c.l.d. product measure $\lambda$. Then for any real-valued functions $f$, $g$ defined on conegligible subsets of $X$, $Y$ respectively,

$$\overline{\int} f(x) + g(y)\,\lambda(d(x, y)) = \overline{\int} f(x)\,\mu(dx) + \overline{\int} g(y)\,\nu(dy),$$
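at least when the right-hand side is defined in $[-\infty, \infty]$.

proof Set $h(x, y) = f(x) + g(y)$ for $x \in \operatorname{dom} f$, $y \in \operatorname{dom} g$, so that $\operatorname{dom} h$ is $\lambda$-conegligible.

(a) As in 253J, I start by showing that $\overline{\int} h \le \overline{\int} f + \overline{\int} g$. P P If either $\overline{\int} f$ or $\overline{\int} g$ is $\infty$, this is trivial. Otherwise, take integrable functions $f_0$, $g_0$ such that $f \le_{\text{a.e.}} f_0$ and $g \le_{\text{a.e.}} g_0$. Set $h_0 = (f_0 \otimes \chi Y) + (\chi X \otimes g_0)$; then $h \le h_0$ $\lambda$-a.e., so

$$\overline{\int} h\,d\lambda \le \int h_0\,d\lambda = \int f_0\,d\mu + \int g_0\,d\nu.$$

As $f_0$, $g_0$ are arbitrary, $\overline{\int} h \le \overline{\int} f + \overline{\int} g$. Q Q

(b) For the reverse inequality, suppose that $h \le h_0$ for $\lambda$-almost every $(x, y)$, where $h_0$ is $\lambda$-integrable. Set $f_0(x) = \int h_0(x, y)\,\nu(dy)$ whenever this is defined in $\mathbb{R}$. Then $f_0(x) \ge f(x) + \overline{\int} g\,d\nu$ whenever $x \in \operatorname{dom} f \cap \operatorname{dom} f_0$, so

$$\int h_0\,d\lambda = \int f_0\,d\mu \ge \overline{\int} f\,d\mu + \overline{\int} g\,d\nu.$$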
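As $h_0$ is arbitrary, $\overline{\int} h \ge \overline{\int} f + \overline{\int} g$, as required.

253L Complex spaces As usual, the ideas of 253F and 253H apply essentially unchanged to complex $L^1$ spaces. Writing $L^1_{\mathbb{C}}(\mu)$, etc., for the complex $L^1$ spaces involved, we have the following results. Throughout, let $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ be measure spaces, and $\lambda$ the c.l.d. product measure on $X \times Y$.

(a) If $f \in \mathcal{L}^0_{\mathbb{C}}(\mu)$, $g \in \mathcal{L}^0_{\mathbb{C}}(\nu)$ then $f \otimes g$, defined by the formula $(f \otimes g)(x, y) = f(x)g(y)$ for all $x \in \operatorname{dom} f$, $y \in \operatorname{dom} g$, belongs to $\mathcal{L}^0_{\mathbb{C}}(\lambda)$.

(b) If $f \in \mathcal{L}^1_{\mathbb{C}}(\mu)$, $g \in \mathcal{L}^1_{\mathbb{C}}(\nu)$ then $f \otimes g \in \mathcal{L}^1_{\mathbb{C}}(\lambda)$ and $\int f \otimes g\,d\lambda = \int f\,d\mu \int g\,d\nu$.

(c) We have a bilinear map $(u, v) \mapsto u \otimes v : L^1_{\mathbb{C}}(\mu) \times L^1_{\mathbb{C}}(\nu) \to L^1_{\mathbb{C}}(\lambda)$ defined by writing $f^\bullet \otimes g^\bullet = (f \otimes g)^\bullet$ for all $f \in \mathcal{L}^1_{\mathbb{C}}(\mu)$, $g \in \mathcal{L}^1_{\mathbb{C}}(\nu)$.

(d) If $W$ is any complex Banach space and $\phi : L^1_{\mathbb{C}}(\mu) \times L^1_{\mathbb{C}}(\nu) \to W$ is any bounded bilinear map, then there is a unique bounded linear operator $T : L^1_{\mathbb{C}}(\lambda) \to W$ such that $T(u \otimes v) = \phi(u, v)$ for every $u \in L^1_{\mathbb{C}}(\mu)$, $v \in L^1_{\mathbb{C}}(\nu)$, and $\|T\| = \|\phi\|$.

(e) If $\mu$ and $\nu$ are complete probability measures, and $\Lambda_1 = \{E \times Y : E \in \Sigma\}$, then for any $f \in \mathcal{L}^1_{\mathbb{C}}(\lambda)$ we have a conditional expectation $g$ of $f$ on $\Lambda_1$ given by setting $g(x, y) = \int f(x, z)\,\nu(dz)$ whenever this is defined.

253X Basic exercises >(a) Let $U$, $V$ and $W$ be linear spaces. Show that the set of bilinear maps from $U \times V$ to $W$ has a natural linear structure agreeing with those of $\mathrm{L}(U; \mathrm{L}(V; W))$ and $\mathrm{L}(V; \mathrm{L}(U; W))$, writing $\mathrm{L}(U; W)$ for the linear space of linear operators from $U$ to $W$.

>(b) Let $U$, $V$ and $W$ be normed spaces. (i) Show that for a bilinear map $\phi : U \times V \to W$ the following are equiveridical: ($\alpha$) $\phi$ is bounded in the sense of 253Ab; ($\beta$) $\phi$ is continuous; ($\gamma$) $\phi$ is continuous at some point of $U \times V$.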
(ii) Show that the space of bounded bilinear maps from U × V to W is a linear subspace of the space of all bilinear maps from U × V to W , and that the functional k k defined in 253Ab is a norm, agreeing with the norms of B(U ; B(V ; W )) and B(V ; B(U ; W )), writing B(U ; W ) for the normed space of bounded linear operators from U to W . (c) Let (X1 , Σ1 , µ1 ), . . . , (Xn , Σn , µn ) be measure spaces, and λ the c.l.d. product measure on X1 ×. . .×Xn , as described in 251W. Let W be a Banach space, and suppose that φ : L1 (µ1 ) × . . . × L1 (µn ) → W is multilinear (that is, linear in each variable separately) and bounded (that is, kφk = sup{φ(u1 , . . . , un ) : kui k1 ≤ 1 ∀ i ≤ n} < ∞). Show that there is a unique bounded linear operator T : L1 (λ) → W such that T ⊗ = φ, where ⊗ : L1 (µ1 ) × . . . × L1 (µn ) → L1 (λ) is a canonical multilinear map (to be defined).
(d) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and λ the c.l.d. product measure on X × Y . Show that if A ⊆ L1 (µ) and B ⊆ L1 (ν) are both uniformly integrable, then {u ⊗ v : u ∈ A, v ∈ B} is uniformly integrable in L1 (λ). > (e) Let (X, Σ, µ) and (Y, T, ν) be measure spaces and λ the c.l.d. product measure on X × Y . Show that (i) we have a bilinear map (u, v) 7→ u ⊗ v : L0 (µ) × L0 (ν) → L0 (λ) given by setting f • ⊗ g • = (f ⊗ g)• for all f ∈ L0 (µ), g ∈ L0 (ν); (ii) if 1 ≤ p ≤ ∞ then u ⊗ v ∈ Lp (λ) and ku ⊗ vkp = kukp kvkp for all u ∈ Lp (µ), v ∈ Lp (ν); (iii) if u, u0 ∈ L2 (µ) and v, v 0 ∈ L2 (ν) then the inner product (u ⊗ vu0 ⊗ v 0 ), taken in L2 (λ), is just 0 (uu )(vv 0 ); (iv) the map (u, v) 7→ u ⊗ v : L0 (µ) × L0 (ν) → L0 (λ) is continuous if L0 (µ), L0 (ν) and L0 (λ) are all given their topologies of convergence in measure. (f ) In 253Xe, assume that µ and ν are semifinite. Show P that if u0 , . . . , un are linearly independent n members of L0 (µ) and v0 , . . . , vn ∈ L0 (ν) are not all 0, then i=0 ui ⊗ vi 6= 0 in L0 (λ). (Hint: start by • finding sets E ∈ Σ, F ∈ T of finite measure such that u0 × χE , . . . , un × χE • are linearly independent and v0 × χF • , . . . , vn × χF • are not all 0.) (g) In 253Xe, assume that µ and ν are semifinite. If U , V are linear subspaces of L0 (µ) and L0 (ν) respectively, write U ⊗ V for the linear subspace of L0 (λ) generated by {u ⊗ v : u ∈ U, v ∈ V }. Show that if W is any linear space and φ : U ×V → W is a bilinear map, there is a unique linear opeartor T : U ⊗V → W such that T (u ⊗ v) = φ(u, v) P for all u ∈ U , v ∈ V . (Hint: start by showing that if u0 , . . . , un ∈ U and Pn n v0 , . . . , vn ∈ V are such that i=0 ui ⊗ vi = 0, then i=0 φ(ui , vi ) = 0 – do this by expressing the ui as linear combinations of some linearly independent family and applying 253Xf.) >(h) Let (X, Σ, µ) and (Y, T, ν) be completeR probability spaces, with c.l.d. product measure λ. Suppose that p ∈ [1, ∞] and that f ∈ Lp (λ). 
Set g(x) = f (x, y)ν(dy) whenever this is defined. Show that g ∈ Lp (µ) and that kgkp ≤ kf kp . (Hint: 253H, 244M.) (i) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, with c.l.d. product measure λ, and p ∈ [1, ∞[. Show that {w : w ∈ Lp (λ), w ≥ 0} is the closed convex hull in Lp (λ) of {u ⊗ v : u ∈ Lp (µ), v ∈ Lp (ν), u ≥ 0, v ≥ 0} (see 253Xe(ii) above). 253Y Further exercises (a) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and λ0 the primitive product measure on X × Y . Show that if f ∈ L0 (µ) and g ∈ L0 (ν), then f ⊗ g ∈ L0 (λ0 ). (b) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and λ0R the primitive Rproduct R measure on X × Y . Show that if f ∈ L1 (µ) and g ∈ L1 (ν), then f ⊗ g ∈ L1 (λ0 ) and f ⊗ g dλ0 = f dµ g dν. (c) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and λ0 , λ the primitive and c.l.d. product measures on X × Y . Show that the embedding L1 (λ0 ) ⊆ L1 (λ) induces a Banach lattice isomorphism between L1 (λ0 ) and L1 (λ). (d) Let (X, Σ, µ), (Y, T, ν) be strictly localizable measure spaces, with c.l.d. product measure λ. Show that L∞ (λ) can be identified with L1 (λ)∗ . Show that under this identification {w : w ∈ L∞ (λ), w ≥ 0} is the weak*closed convex hull of {u ⊗ v : u ∈ L∞ (µ), v ∈ L∞ (ν), u ≥ 0, v ≥ 0}. (e) Find a version of 253J valid when one of µ, ν is not σfinite. (f ) Let (X, Σ, µ) be any measure space and V any Banach space. Write L1V = L1V (µ) for the set of functions f such that (α) dom f is a conegligible subset of X (β) f takes values in V (γ) there is a conegligible set −1 D R ⊆ dom f such that f [D] is separable and D ∩ f [G] ∈ Σ for every open set G ⊆ V (δ) the integral kf (x)kµ(dx) is finite. (These are the Bochner integrable functions from X to V .) For f , g ∈ L1V write f ∼ g if f = g µa.e.; let L1V be the set of equivalence classes in L1V under ∼. Show that (i) f + g, cf ∈ L1V for all f , g ∈ L1V , c ∈ R;
(ii) L1V has a natural linear space structure, defined by writing f • + g • = (f + g)• , cf • = (cf )• for f , g ∈ L1V and c ∈ R; R (iii) L1V has a norm k k, defined by writing kf • k = kf (x)kµ(dx) for f ∈ L1V ; (iv) L1V is a Banach space under this norm; (v) there is a natural map ⊗ : L1 ×V → L1V defined by writing (f ⊗v)(x) = f (x)v when f ∈ L1 = L1R (µ), v ∈ V , x ∈ dom f ; (vi) there is a canonical bilinear map ⊗ : L1 × V → L1V defined by writing f • ⊗ v = (f ⊗ v)• for f ∈ L1 , v ∈V; (vii) whenever W is a Banach space and φ : L1 × V → W is a bounded bilinear map, there is a unique bounded linear operator T : L1V → W such that T (u ⊗ v) = φ(u, v) for all u ∈ L1 , v ∈ V , and kT k = kφk. (g) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and λ0 the primitive product measure on X × Y . If f is a λ0 integrable function, write fx (y) = f (x, y) whenever this is defined. Show that we have a map x 7→ fx• from a conegligible subset D0 of X to L1 (ν). Show that this map is a Bochner integrable function, as defined in 253Yf. (h) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and suppose that φ is a function from X to a separable subset of L1 (ν) which is measurable in the sense that φ−1 [G] ∈ Σ for every open G ⊆ L1 (ν). Show that there is a Λmeasurable function f from X × Y to R, where Λ is the domain of the c.l.d. product measure on X × Y , such that φ(x) = fx• for every x ∈ X, writing fx (y) = f (x, y) for x ∈ X, y ∈ Y . (i) Let (X, Σ, µ) and (Y, T, ν) be measure spaces, and λ the c.l.d. product measure on X × Y . Show that 253Yg provides a canonical identification between L1 (λ) and L1L1 (ν) (µ). (j) Let (X, Σ, µ) and (Y, T, ν) be complete locally determined measure spaces, with c.l.d. product measure R λ. (i) Suppose that K ∈ L2 (λ), f ∈ L2 (µ). Show that h(y) = K(x, y)f R (x)dx is defined for almost all y ∈ Y and that h ∈ L2 (ν). 
(Hint: to see that h is defined a.e., consider E×F K(x, y)f (x)d(x, y) for µE, R νF < ∞; to see that h ∈ L2 consider h × g where g ∈ L2 (ν).) (ii) Show that the map f 7→ h corresponds to a bounded linear operator TK : L2 (µ) → L2 (ν). (iii) Show that the map K 7→ TK corresponds to a bounded linear operator, of norm at most 1, from L2 (λ) to B(L2 (µ); L2 (ν)). 1 (k) Suppose that p, q ∈ [1, ∞] and that p1 + 1q = 1, interpreting ∞ as 0 as usual. Let (X, Σ, µ), (Y, T, ν) be complete locally determined measure spaces with c.l.d. product measure λ. Show that the ideas of 253Yj can be used to define a bounded linear operator, of norm 1, from Lp (λ) to B(Lq (µ); Lp (ν)).
(l) In 253Xc, suppose that W is a Banach lattice. Show that the following are equiveridical: (i) T u ≥ 0 whenever u ∈ L1 (λ); (ii) φ(u1 , . . . , un ) ≥ 0 whenever ui ≥ 0 in L1 (λi ) for each i ≤ n. 253 Notes and comments Throughout the main arguments of this section, I have written the results in terms of the c.l.d. product measure; of course the isomorphism noted in 253Yc means that they could just as well have been expressed in terms of the primitive product measure. The more restricted notion of integrability with respect to the primitive product measure is indeed the one appropriate for the ideas of 253Yg. Theorem 253F is a ‘universal mapping theorem’; it asserts that every bounded bilinear operator on L1 (µ) × L1 (ν) factors through ⊗ : L1 (µ) × L1 (ν) → L1 (λ), at least if the range space is a Banach space. It is easy to see that this property defines the pair (L1 (λ), ⊗) up to Banach space isomorphism, in the following sense: if V is a Banach space, and ψ : L1 (µ) × L1 (ν) → V is a bounded bilinear map such that for every bounded bilinear map φ from L1 (µ) × L1 (ν) to any Banach space W there is a unique bounded linear operator T : V → W such that T ψ = φ and kT k = kφk, then there is an isometric Banach space isomorphism S : L1 (λ) → V such that S⊗ = ψ. There is of course a general theory of bilinear maps between Banach spaces; in the language of this theory, L1 (λ) is, or is isomorphic to, the ‘projective tensor product’ of L1 (µ) and L1 (ν). For an introduction to this subject, see Defant & Floret 93, §I.3, or Semadeni 71, §20. I should perhaps emphasise, for the sake of those who have not encountered tensor products before, that this theorem is special to L1 spaces. While some of the same ideas can be applied to other function spaces (see 253Xe253Xg), there is no other class to which 253F applies.
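The factorization asserted by 253F can be checked by hand when $X$ and $Y$ are finite, where every function is integrable and the indicator functions of points form a basis. The following Python sketch is only an illustration under that assumption; the weights `mu`, `nu`, the bilinear map `phi` and the helper `induced_operator` are hypothetical names chosen here, not objects from the text. It builds $T$ from the values of $\phi$ on pairs of indicators and compares $T(u \otimes v)$ with $\phi(u, v)$.

```python
from itertools import product

# Hypothetical finite model: point weights for mu on X and nu on Y.
mu = {"a": 0.5, "b": 2.0}
nu = {"p": 1.0, "q": 0.25, "r": 3.0}

def integral(f, m):
    """Integral of f against the weighted counting measure m."""
    return sum(f[k] * m[k] for k in m)

def tensor(u, v):
    """(u (x) v)(x, y) = u(x) v(y)."""
    return {(x, y): u[x] * v[y] for x, y in product(u, v)}

def indicator(keys, k):
    """Indicator function of the single point k."""
    return {j: 1.0 if j == k else 0.0 for j in keys}

def phi(u, v):
    """An arbitrary bilinear map, chosen only for illustration."""
    return integral(u, mu) * v["q"] - 3.0 * u["b"] * integral(v, nu)

def induced_operator(bilinear):
    """Build a linear T with T(u (x) v) = bilinear(u, v), using indicators as a basis."""
    coeff = {(x, y): bilinear(indicator(mu, x), indicator(nu, y))
             for x, y in product(mu, nu)}
    return lambda w: sum(w[xy] * coeff[xy] for xy in coeff)

T = induced_operator(phi)
u = {"a": 1.5, "b": -2.0}
v = {"p": 0.5, "q": 4.0, "r": -1.0}
gap = abs(T(tensor(u, v)) - phi(u, v))  # zero up to rounding error
```

In this finite setting uniqueness of $T$ is immediate, because the tensors of indicators span the whole space; the content of 253F lies in the infinite-dimensional case and in the norm identity, neither of which this sketch addresses.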
There is also a theory of tensor products of Banach lattices, for which I do not think we are quite ready (it needs general ideas about ordered linear spaces for which I mean to wait until Chapter 35 in the next volume). However 253G shows that the ordering, and therefore the Banach lattice structure, of $L^1(\lambda)$ is determined by the ordering of $L^1(\mu)$ and $L^1(\nu)$ and the map $\otimes : L^1(\mu) \times L^1(\nu) \to L^1(\lambda)$.

The conditional expectation operators described in 253H are of very great importance, largely because in this special context we have a realization of the conditional expectation operator as a function $P_0$ from $\mathcal{L}^1(\lambda)$ to $\mathcal{L}^1(\lambda\restriction\Lambda_1)$, not just as a function from $L^1(\lambda)$ to $L^1(\lambda\restriction\Lambda_1)$, as in 242J. As described here, $P_0(f + f')$ need not be equal, in the strict sense, to $P_0 f + P_0 f'$; it can have a larger domain. In applications, however, one might be willing to restrict attention to the linear space $U$ of bounded $\Sigma\widehat{\otimes}\mathrm{T}$-measurable functions defined everywhere on $X \times Y$, so that $P_0$ becomes an operator from $U$ to itself (see 252P).
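The conditional expectation of 253H, obtained by integrating out the second coordinate, can be illustrated on finite probability spaces, where all the integrals are finite sums. The sketch below assumes hypothetical weights `mu`, `nu` and an arbitrary function `f`, and uses exact rational arithmetic; it sets $g(x) = \int f(x, z)\,\nu(dz)$ and prepares the comparison of $\int_{E \times Y} f\,d\lambda$ with $\int_{E \times Y} g\,d\lambda$ for every $E \subseteq X$.

```python
from fractions import Fraction as Fr
from itertools import chain, combinations

# Hypothetical finite probability spaces (X, mu) and (Y, nu).
mu = {0: Fr(1, 5), 1: Fr(4, 5)}
nu = {0: Fr(1, 2), 1: Fr(3, 10), 2: Fr(1, 5)}
lam = {(x, y): mu[x] * nu[y] for x in mu for y in nu}  # product measure

# An arbitrary function on X x Y, chosen only for illustration.
f = {(x, y): Fr(3 * x - y * y + 1) for x in mu for y in nu}

def integrate_out_y(func):
    """g(x) = integral of func(x, z) nu(dz); the result depends on x only."""
    return {x: sum(func[(x, z)] * nu[z] for z in nu) for x in mu}

def integral_over(h, E):
    """Integral over E x Y of a function h on X x Y."""
    return sum(h[(x, y)] * lam[(x, y)] for x in E for y in nu)

g = integrate_out_y(f)
g_on_product = {(x, y): g[x] for x in mu for y in nu}

# All subsets E of X, so that E x Y runs over the sigma-algebra Lambda_1.
subsets = list(chain.from_iterable(combinations(mu, r) for r in range(len(mu) + 1)))
```

Checking `integral_over(f, E) == integral_over(g_on_product, E)` for each `E` in `subsets` confirms the defining property of a conditional expectation on $\Lambda_1$ in this toy case.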
254 Infinite products I come now to the second basic idea of this chapter: the description of a product measure on the product of a (possibly large) family of probability spaces. The section begins with a construction on similar lines to that of §251 (254A-254F) and its defining property in terms of inverse-measure-preserving functions (254G). I discuss the usual measure on $\{0, 1\}^I$ (254J-254K), subspace measures (254L) and various properties of subproducts (254M-254T), including a study of the associated conditional expectation operators (254R-254T).

254A Definitions (a) Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of probability spaces. Set $X = \prod_{i\in I} X_i$, the family of functions $x$ with domain $I$ such that $x(i) \in X_i$ for every $i \in I$. In this context, I will say that a measurable cylinder is a subset of $X$ expressible in the form

$$C = \prod_{i\in I} C_i,$$

where $C_i \in \Sigma_i$ for every $i \in I$ and $\{i : C_i \ne X_i\}$ is finite. Note that for a non-empty $C \subseteq X$ this expression is unique. P P Suppose that $C = \prod_{i\in I} C_i = \prod_{i\in I} C_i'$. For each $i \in I$ set $D_i = \{x(i) : x \in C\}$. Of course $D_i \subseteq C_i$. Because $C \ne \emptyset$, we can fix on some $z \in C$. If $i \in I$ and $\xi \in C_i$, consider $x \in X$ defined by setting

$$x(i) = \xi, \qquad x(j) = z(j) \text{ for } j \ne i;$$

then $x \in C$ so $\xi = x(i) \in D_i$. Thus $D_i = C_i$ for $i \in I$. Similarly, $D_i = C_i'$. Q Q

(b) We can therefore define a functional $\theta_0 : \mathcal{C} \to [0, 1]$, where $\mathcal{C}$ is the set of measurable cylinders, by setting

$$\theta_0 C = \prod_{i\in I} \mu_i C_i$$

whenever $C_i \in \Sigma_i$ for every $i \in I$ and $\{i : C_i \ne X_i\}$ is finite, noting that only finitely many terms in the product can differ from 1, so that it can safely be treated as a finite product. If $C = \emptyset$, one of the $C_i$ must be empty, so $\theta_0 C$ is surely 0, even though the expression of $C$ as $\prod_{i\in I} C_i$ is no longer unique.

(c) Now define $\theta : \mathcal{P}X \to [0, 1]$ by setting

$$\theta A = \inf\Bigl\{\sum_{n=0}^\infty \theta_0 C_n : C_n \in \mathcal{C} \text{ for every } n \in \mathbb{N},\ A \subseteq \bigcup_{n\in\mathbb{N}} C_n\Bigr\}.$$

254B Lemma The functional $\theta$ defined in 254Ac is always an outer measure on $X$.

proof Use exactly the same arguments as those in 251B above.

254C Definition Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be any indexed family of probability spaces, and $X$ the Cartesian product $\prod_{i\in I} X_i$. The product measure on $X$ is the measure defined by Carathéodory's method (113C) from the outer measure $\theta$ defined in 254A.
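Since a measurable cylinder constrains only finitely many coordinates, it can be stored as a finite table of constraints even when $I$ is infinite, and $\theta_0$ is then a finite product. The following Python sketch makes this concrete under an assumption that is not part of the definition above: every factor is taken to be the fair coin on $\{0, 1\}$. The names `mu`, `theta0` and `intersect` are hypothetical helpers introduced for the illustration.

```python
from fractions import Fraction as Fr

def mu(i, C):
    """Hypothetical factor measure: the fair coin on X_i = {0, 1}, for every index i."""
    return Fr(len(set(C) & {0, 1}), 2)

def theta0(cylinder):
    """theta_0 of the cylinder {x : x(i) in C_i for every i listed in `cylinder`}.

    `cylinder` maps finitely many indices i to sets C_i; every index not
    listed has C_i = X_i and contributes a factor 1.
    """
    value = Fr(1)
    for i, C in cylinder.items():
        value *= mu(i, C)
    return value

def intersect(c1, c2):
    """The intersection of two measurable cylinders is again a measurable cylinder."""
    out = {i: set(C) for i, C in c1.items()}
    for i, C in c2.items():
        out[i] = out.get(i, {0, 1}) & set(C)
    return out

whole_space = {}           # no constrained coordinates: C = X
C1 = {0: {1}, 5: {0}}      # x(0) = 1 and x(5) = 0
C2 = {5: {0, 1}, 7: {1}}   # x(7) = 1, coordinate 5 left unconstrained
```

The empty table represents $X$ itself, and a table containing an empty $C_i$ represents the empty set with $\theta_0 = 0$, matching the remark in (b). The outer measure $\theta$ of (c) is an infimum over countable covers and is not computed by this sketch.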
Q 254D Remarks (a) In 254Ab, I asserted that if C ∈ C and no Ci is empty, then nor is C = i∈I Ci . This is the ‘Axiom of Choice’: the product of any family hCi ii∈I of nonempty sets is nonempty, that is, there is a ‘choice function’ x with domain I picking out a distinguished member x(i) of each Ci . In this volume I have not attempted to be scrupulous in indicating uses of the axiom of choice. In fact the use here is not an absolutely vital one; I mean, the theory of infinite products, even uncountable products, of probability spaces does not change character completely in the absence of the full axiom of choice (provided, that is, that we allow ourselves to use Q the countable axiom of choice). The point is that all we really need, in the present context, is that X = i∈I Xi should be nonempty; and in many contexts we can prove this, for the particular cases of interest, without using the axiom of choice, by actually exhibiting a member of X. The simplest case in which this is difficult is when the Xi are uncontrolled Borel subsets of [0, 1]; and even then, if they are presented with coherent descriptions, we may, with appropriate labour, be able to construct a member of X. But clearly such a process is liable to slow us down a good deal, and for the moment I think there is no great virtue in taking so much trouble. (b) I have given this section the title ‘infinite products’, but it is useful to be able to apply the ideas to finite I; I should mention in particular the cases #(I) ≤ 2. (i) If I = ∅, X consists of the unique function with domain I, the empty function. If we identify a function with its graph, then X is actually {∅}; in any case, X is to be a singleton set, with λX = 1. (ii) If I is a singleton {i}, then we can identify X with Xi ; C becomes identified with Σi and θ0 with µi , so that θ can be identified with µ∗i and the ‘product measure’ becomes the measure on Xi defined from µ∗i , that is, the completion of µi (213Xa(iv)). 
(iii) If I is a doubleton {i, j}, then we can identify X with Xi × Xj ; in this case the definitions of 254A, 254C above match exactly with those of 251A and 251C, so that λ here can be identified with the primitive product measure as defined in 251C. Because µi and µj are both totally finite, this agrees with the c.l.d. product measure of 251F. Q 254E Definition Let hXi ii∈I be any family of sets, and X = i∈I Xi . If Σi is a σsubalgebra of subsets N of Xi for each i ∈ I, I write c i∈I Σi for the σalgebra of subsets of X generated by {{x : x ∈ X, x(i) ∈ E} : i ∈ I, E ∈ Σi }. (Compare 251D.) 254FQTheorem Let h(Xi , Σi , µi )ii∈I be a family of probability spaces, and let λ be the product measure on X = i∈I Xi defined as in 254C; let Λ be its domain. (a) λX = 1. Q Q Q (b) If Ei ∈ Σi for every i ∈ I, and {i : Ei 6= Xi } is countable, then i∈I Ei ∈ Λ, and λ( i∈I Ei ) = i∈I µi Ei . In particular, λC = θ0 C for every measurable cylinder C, as defined in 254A, and if j ∈ I then x 7→ x(j) : X → Xj is inversemeasurepreserving. N (c) c i∈I Σi ⊆ Λ. (d) λ is complete. (e) For S every W ∈ Λ and ² > 0 there is a finite family C0 , . . . , Cn of measurable cylinders such that λ(W 4 k≤n Ck ) ≤ ². N (f) For every W ∈ Λ there are W1 , W2 ∈ c i∈I Σi such that W1 ⊆ W ⊆ W2 and λ(W2 \ W1 ) = 0. Q Remark Perhaps I should pause Q to interpret the product i∈I µi Ei . Because all the µi Ei belong to [0, 1], this is simply inf J⊆I,J is finite i∈J µi Ei , taking the empty product to be 1. proof Throughout this proof, define C, θ0 and θ as in 254A. I will write out an argument which applies to finite I as well as infinite I, but you may reasonably prefer to assume that I is infinite on first reading. Q (a) Of course λX = θX, so I have to show that θX = 1. Because X, ∅ ∈ C and θ0 X = i∈I µi Xi = 1 and θ0 ∅ = 0, θX ≤ θ0 X + θ0 ∅ + . . . = 1.
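I therefore have to show that $\theta X \ge 1$. ?? Suppose, if possible, otherwise.

(i) There is a sequence $\langle C_n\rangle_{n\in\mathbb{N}}$ in $\mathcal{C}$, covering $X$, such that $\sum_{n=0}^\infty \theta_0 C_n < 1$. For each $n \in \mathbb{N}$, express $C_n$ as $\{x : x(i) \in E_{ni}\ \forall\, i \in I\}$, where every $E_{ni} \in \Sigma_i$ and $J_n = \{i : E_{ni} \ne X_i\}$ is finite. No $J_n$ can be empty, because $\theta_0 C_n < 1 = \theta_0 X$; set $J = \bigcup_{n\in\mathbb{N}} J_n$. Then $J$ is a countable non-empty subset of $I$. Set $K = \mathbb{N}$ if $J$ is infinite, $\{k : 0 \le k < \#(J)\}$ if $J$ is finite; let $k \mapsto i_k : K \to J$ be a bijection.

For each $k \in K$, set $L_k = \{i_j : j < k\} \subseteq J$, and set $\alpha_{nk} = \prod_{i\in I\setminus L_k} \mu_i E_{ni}$ for $n \in \mathbb{N}$, $k \in K$. If $J$ is finite, then we can identify $L_{\#(J)}$ with $J$, and set $\alpha_{n,\#(J)} = 1$ for every $n$. We have $\alpha_{n0} = \theta_0 C_n$ for each $n$, so $\sum_{n=0}^\infty \alpha_{n0} < 1$. For $n \in \mathbb{N}$, $k \in K$, $t \in X_{i_k}$ set

$$f_{nk}(t) = \alpha_{n,k+1} \text{ if } t \in E_{n,i_k}, \qquad f_{nk}(t) = 0 \text{ otherwise}.$$

Then

$$\int f_{nk}\,d\mu_{i_k} = \alpha_{n,k+1}\,\mu_{i_k} E_{n,i_k} = \alpha_{nk}.$$

(ii) Choose $t_k \in X_{i_k}$ inductively, for $k \in K$, as follows. The inductive hypothesis will be that $\sum_{n\in M_k} \alpha_{nk} < 1$, where $M_k = \{n : n \in \mathbb{N},\ t_j \in E_{n,i_j}\ \forall\, j < k\}$; of course $M_0 = \mathbb{N}$, so the induction starts. Given that

$$1 > \sum_{n\in M_k} \alpha_{nk} = \sum_{n\in M_k} \int f_{nk}\,d\mu_{i_k} = \int \Bigl(\sum_{n\in M_k} f_{nk}\Bigr)\,d\mu_{i_k}$$

(by B.Levi's theorem), there must be a $t_k \in X_{i_k}$ such that $\sum_{n\in M_k} f_{nk}(t_k) < 1$. Now for such a choice of $t_k$, $\alpha_{n,k+1} = f_{nk}(t_k)$ for every $n \in M_{k+1}$, so that $\sum_{n\in M_{k+1}} \alpha_{n,k+1} < 1$, and the induction continues, unless $J$ is finite and $k + 1 = \#(J)$. In this last case we must just have $M_{\#(J)} = \emptyset$, because $\alpha_{n,\#(J)} = 1$ for every $n$.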
(iii) If J is infinite, we obtain a full sequence htk ik∈N ; if J is finite, we obtain just a finite sequence htk ik 0. Then there is a sequence hCn in∈N in C such that S C and n n∈N n=0 θ0 Cn ≤ θA + ². In this case S S A ∩ W ⊆ n∈N Cn ∩ W , A \ W ⊆ n∈N Cn \ W,
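so

$$\theta(A \cap W) \le \sum_{n=0}^\infty \theta_0(C_n \cap W), \qquad \theta(A \setminus W) \le \sum_{n=0}^\infty \theta_0(C_n \setminus W),$$

and

$$\theta(A \cap W) + \theta(A \setminus W) \le \sum_{n=0}^\infty \bigl(\theta_0(C_n \cap W) + \theta_0(C_n \setminus W)\bigr) = \sum_{n=0}^\infty \theta_0 C_n \le \theta A + \epsilon.$$

As $\epsilon$ is arbitrary, $\theta(A \cap W) + \theta(A \setminus W) \le \theta A$; as $A$ is arbitrary, $W \in \Lambda$.

(iii) I show next that if $J \subseteq I$ is finite and $C_i \in \Sigma_i$ for each $i \in J$, and $C = \{x : x \in X,\ x(i) \in C_i\ \forall\, i \in J\}$, then $C \in \Lambda$ and $\lambda C = \prod_{i\in J} \mu_i C_i$. P P Induce on $\#(J)$. If $\#(J) = 0$, that is, $J = \emptyset$, then $C = X$ and this is part (a). For the inductive step to $\#(J) = n + 1$, take any $j \in J$ and set $J' = J \setminus \{j\}$,

$$C' = \{x : x \in X,\ x(i) \in C_i\ \forall\, i \in J'\}, \qquad C'' = C' \setminus C = \{x : x \in C',\ x(j) \in X_j \setminus C_j\}.$$

Then $C$, $C'$, $C''$ all belong to $\mathcal{C}$, and $\theta_0 C' = \prod_{i\in J'} \mu_i C_i = \alpha$ say, $\theta_0 C = \alpha\mu_j C_j$, $\theta_0 C'' = \alpha(1 - \mu_j C_j)$. Moreover, by the inductive hypothesis, $C' \in \Lambda$ and $\alpha = \lambda C' = \theta C'$. So $C = C' \cap \{x : x(j) \in C_j\} \in \Lambda$ by (ii), and $C'' = C' \setminus C \in \Lambda$. We surely have $\lambda C = \theta C \le \theta_0 C$, $\lambda C'' \le \theta_0 C''$; but also

$$\alpha = \lambda C' = \lambda C + \lambda C'' \le \theta_0 C + \theta_0 C'' = \alpha,$$

so in fact

$$\lambda C = \theta_0 C = \alpha\mu_j C_j = \prod_{i\in J} \mu_i C_i,$$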
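and the induction proceeds. Q Q

(iv) Now let us return to the general case of a set $W$ of the form $\prod_{i\in I} E_i$ where $E_i \in \Sigma_i$ for each $i$, and $K = \{i : E_i \ne X_i\}$ is countable. If $K$ is finite then $W = \{x : x(i) \in E_i\ \forall\, i \in K\}$ so $W \in \Lambda$ and

$$\lambda W = \prod_{i\in K} \mu_i E_i = \prod_{i\in I} \mu_i E_i.$$

Otherwise, let $\langle i_n\rangle_{n\in\mathbb{N}}$ be an enumeration of $K$. For each $n \in \mathbb{N}$ set $W_n = \{x : x \in X,\ x(i_k) \in E_{i_k}\ \forall\, k \le n\}$; then we know that $W_n \in \Lambda$ and that $\lambda W_n = \prod_{k\le n} \mu_{i_k} E_{i_k}$. But $\langle W_n\rangle_{n\in\mathbb{N}}$ is a non-increasing sequence with intersection $W$, so $W \in \Lambda$ and

$$\lambda W = \lim_{n\to\infty} \lambda W_n = \prod_{i\in K} \mu_i E_i = \prod_{i\in I} \mu_i E_i.$$

(c) is an immediate consequence of (b) and the definition of $\widehat{\bigotimes}_{i\in I}\Sigma_i$.

(d) Because $\lambda$ is constructed by Carathéodory's method it must be complete.

(e) Let $\langle C_n\rangle_{n\in\mathbb{N}}$ be a sequence in $\mathcal{C}$ such that $W \subseteq \bigcup_{n\in\mathbb{N}} C_n$ and $\sum_{n=0}^\infty \theta_0 C_n \le \theta W + \tfrac12\epsilon$. Set $V = \bigcup_{n\in\mathbb{N}} C_n$; by (b), $V \in \Lambda$. Let $n \in \mathbb{N}$ be such that $\sum_{i=n+1}^\infty \theta_0 C_i \le \tfrac12\epsilon$, and consider $W' = \bigcup_{k\le n} C_k$. Since $V \setminus W' \subseteq \bigcup_{i>n} C_i$,

$$\lambda(W \triangle W') \le \lambda(V \setminus W') + \lambda(V \setminus W) = \lambda V - \lambda W + \lambda(V \setminus W') = \theta V - \theta W + \theta(V \setminus W') \le \sum_{i=0}^\infty \theta_0 C_i - \theta W + \sum_{i=n+1}^\infty \theta_0 C_i \le \tfrac12\epsilon + \tfrac12\epsilon = \epsilon.$$

(f)(i) If $W \in \Lambda$ and $\epsilon > 0$ there is a $V \in \widehat{\bigotimes}_{i\in I}\Sigma_i$ such that $W \subseteq V$ and $\lambda V \le \lambda W + \epsilon$. P P Let $\langle C_n\rangle_{n\in\mathbb{N}}$ be a sequence in $\mathcal{C}$ such that $W \subseteq \bigcup_{n\in\mathbb{N}} C_n$ and $\sum_{n=0}^\infty \theta_0 C_n \le \theta W + \epsilon$. Then $C_n \in \widehat{\bigotimes}_{i\in I}\Sigma_i$ for each $n$, so $V = \bigcup_{n\in\mathbb{N}} C_n \in \widehat{\bigotimes}_{i\in I}\Sigma_i$. Now $W \subseteq V$, and

$$\lambda V = \theta V \le \sum_{n=0}^\infty \theta_0 C_n \le \theta W + \epsilon = \lambda W + \epsilon.$$

Q Q

(ii) Now, given $W \in \Lambda$, let $\langle V_n\rangle_{n\in\mathbb{N}}$ be a sequence of sets in $\widehat{\bigotimes}_{i\in I}\Sigma_i$ such that $W \subseteq V_n$ and $\lambda V_n \le \lambda W + 2^{-n}$ for each $n$; then $W_2 = \bigcap_{n\in\mathbb{N}} V_n$ belongs to $\widehat{\bigotimes}_{i\in I}\Sigma_i$ and $W \subseteq W_2$ and $\lambda W_2 = \lambda W$. Similarly, there is a $W_2' \in \widehat{\bigotimes}_{i\in I}\Sigma_i$ such that $X \setminus W \subseteq W_2'$ and $\lambda W_2' = \lambda(X \setminus W)$, so we may take $W_1 = X \setminus W_2'$ to complete the proof.

254G The following is a fundamental, indeed defining, property of product measures.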
Lemma Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of probability spaces with product $(X, \Lambda, \lambda)$. Let $(Y, \mathrm{T}, \nu)$ be a complete probability space and $\phi : Y \to X$ a function. Suppose that $\nu^*\phi^{-1}[C] \le \lambda C$ for every measurable cylinder $C \subseteq X$. Then $\phi$ is inverse-measure-preserving. In particular, $\phi$ is inverse-measure-preserving iff $\phi^{-1}[C] \in \mathrm{T}$ and $\nu\phi^{-1}[C] = \lambda C$ for every measurable cylinder $C \subseteq X$.

Remark By $\nu^*$ I mean the usual outer measure defined from $\nu$ as in §132.

proof (a) First note that, writing $\theta$ for the outer measure of 254A, $\nu^*\phi^{-1}[A] \le \theta A$ for every $A \subseteq X$. P P Given $\epsilon > 0$, there is a sequence $\langle C_n\rangle_{n\in\mathbb{N}}$ of measurable cylinders such that $A \subseteq \bigcup_{n\in\mathbb{N}} C_n$ and $\sum_{n=0}^\infty \theta_0 C_n \le \theta A + \epsilon$, where $\theta_0$ is the functional of 254A. But we know that $\theta_0 C = \lambda C$ for every measurable cylinder $C$ (254Fb), so

$$\nu^*\phi^{-1}[A] \le \nu^*\Bigl(\bigcup_{n\in\mathbb{N}} \phi^{-1}[C_n]\Bigr) \le \sum_{n=0}^\infty \nu^*\phi^{-1}[C_n] \le \sum_{n=0}^\infty \lambda C_n \le \theta A + \epsilon.$$

As $\epsilon$ is arbitrary, $\nu^*\phi^{-1}[A] \le \theta A$. Q Q

(b) Now take any $W \in \Lambda$. Then there are $F$, $F' \in \mathrm{T}$ such that

$$\phi^{-1}[W] \subseteq F, \qquad \phi^{-1}[X \setminus W] \subseteq F', \qquad \nu F = \nu^*\phi^{-1}[W] \le \theta W = \lambda W, \qquad \nu F' \le \lambda(X \setminus W).$$

We have $F \cup F' \supseteq \phi^{-1}[W] \cup \phi^{-1}[X \setminus W] = Y$, so

$$\nu(F \cap F') = \nu F + \nu F' - \nu(F \cup F') \le \lambda W + \lambda(X \setminus W) - 1 = 0.$$

Now $F \setminus \phi^{-1}[W] \subseteq F \cap \phi^{-1}[X \setminus W] \subseteq F \cap F'$ is $\nu$-negligible. Because $\nu$ is complete, $F \setminus \phi^{-1}[W] \in \mathrm{T}$ and $\phi^{-1}[W] = F \setminus (F \setminus \phi^{-1}[W])$ belongs to $\mathrm{T}$. Moreover,

$$1 = \nu F + \nu F' \le \lambda W + \lambda(X \setminus W) = 1,$$

so we must have $\nu F = \lambda W$; but this means that $\nu\phi^{-1}[W] = \lambda W$. As $W$ is arbitrary, $\phi$ is inverse-measure-preserving.

254H Corollary Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ and $\langle (Y_i, \mathrm{T}_i, \nu_i)\rangle_{i\in I}$ be two families of probability spaces, with products $(X, \Lambda, \lambda)$ and $(Y, \Lambda', \lambda')$. Suppose that for each $i \in I$ we are given an inverse-measure-preserving function $\phi_i : X_i \to Y_i$. Set $\phi(x) = \langle \phi_i(x(i))\rangle_{i\in I}$ for $x \in X$. Then $\phi : X \to Y$ is inverse-measure-preserving.

proof If $C = \prod_{i\in I} C_i$ is a measurable cylinder in $Y$, then $\phi^{-1}[C] = \prod_{i\in I} \phi_i^{-1}[C_i]$ is a measurable cylinder in $X$, and

$$\lambda\phi^{-1}[C] = \prod_{i\in I} \mu_i\phi_i^{-1}[C_i] = \prod_{i\in I} \nu_i C_i = \lambda' C.$$

Since $\lambda$ is a complete probability measure, 254G tells us that $\phi$ is inverse-measure-preserving.

254I Corresponding to 251S we have the following.
Proposition Let $\langle (X_i, \Sigma_i, \mu_i)\rangle_{i\in I}$ be a family of probability spaces, $\lambda$ the product measure on $X = \prod_{i\in I} X_i$, and $\Lambda$ its domain. Then $\lambda$ is also the product of the completions $\hat\mu_i$ of the $\mu_i$ (212C).
proof Write $\hat\lambda$ for the product of the $\hat\mu_i$, and $\hat\Lambda$ for its domain.

(i) The identity map from $X_i$ to itself is inverse-measure-preserving if regarded as a map from $(X_i, \hat\mu_i)$ to $(X_i, \mu_i)$, so the identity map on $X$ is inverse-measure-preserving if regarded as a map from $(X, \hat\lambda)$ to $(X, \lambda)$, by 254H; that is, $\Lambda \subseteq \hat\Lambda$ and $\lambda = \hat\lambda\restriction\Lambda$.

(ii) If $C$ is a measurable cylinder for $\langle \hat\mu_i\rangle_{i\in I}$, that is, $C = \prod_{i\in I} C_i$ where $C_i \in \hat\Sigma_i$ for every $i$ and $\{i : C_i \ne X_i\}$ is finite, then for each $i \in I$ we can find a $C_i' \in \Sigma_i$ such that $C_i \subseteq C_i'$ and $\mu_i C_i' = \hat\mu_i C_i$; setting $C' = \prod_{i\in I} C_i'$, we get

$$\lambda^* C \le \lambda C' = \prod_{i\in I} \mu_i C_i' = \prod_{i\in I} \hat\mu_i C_i = \hat\lambda C.$$

By 254G, $\lambda W$ must be defined and equal to $\hat\lambda W$ whenever $W \in \hat\Lambda$. Putting this together with (i), we see that $\lambda = \hat\lambda$.

254J The product measure on $\{0, 1\}^I$ (a) Perhaps the most important of all examples of infinite product measures is the case in which each factor $X_i$ is just $\{0, 1\}$ and each $\mu_i$ is the 'fair-coin' probability measure, setting

$$\mu_i\{0\} = \mu_i\{1\} = \tfrac12.$$

In this case, the product $X = \{0, 1\}^I$ has a family $\langle E_i\rangle_{i\in I}$ of measurable sets such that, writing $\lambda$ for the product measure on $X$,

$$\lambda\Bigl(\bigcap_{i\in J} E_i\Bigr) = 2^{-\#(J)} \text{ if } J \subseteq I \text{ is finite}.$$

(Just take $E_i = \{x : x(i) = 1\}$ for each $i$.) I will call this $\lambda$ the usual measure on $\{0, 1\}^I$. Observe that if $I$ is finite then $\lambda\{x\} = 2^{-\#(I)}$ for each $x \in X$ (using 254Fb). On the other hand, if $I$ is infinite, then $\lambda\{x\} = 0$ for every $x \in X$ (because, again using 254Fb, $\lambda^*\{x\} \le 2^{-n}$ for every $n$).

(b) There is a natural bijection between $\{0, 1\}^I$ and $\mathcal{P}I$, matching $x \in \{0, 1\}^I$ with $\{i : i \in I,\ x(i) = 1\}$. So we get a standard measure $\tilde\lambda$ on $\mathcal{P}I$, which I will call the usual measure on $\mathcal{P}I$. Note that for any finite $b \subseteq I$ and any $c \subseteq b$ we have

$$\tilde\lambda\{a : a \cap b = c\} = \lambda\{x : x(i) = 1 \text{ for } i \in c,\ x(i) = 0 \text{ for } i \in b \setminus c\} = 2^{-\#(b)}.$$

(c) Of course we can apply 254G to these measures; if $(Y, \mathrm{T}, \nu)$ is a complete probability space, a function $\phi : Y \to \{0, 1\}^I$ is inverse-measure-preserving iff

$$\nu\{y : y \in Y,\ \phi(y)\restriction J = z\} = 2^{-\#(J)}$$

whenever $J \subseteq I$ is finite and $z \in \{0, 1\}^J$; this is because the measurable cylinders in $\{0, 1\}^I$ are precisely the sets of the form $\{x : x\restriction J = z\}$ where $J \subseteq I$ is finite.

254K In the case of countably infinite $I$, we have a very important relationship between the usual product measure of $\{0, 1\}^I$ and Lebesgue measure on $[0, 1]$.

Proposition Let $\lambda$ be the usual measure on $X = \{0, 1\}^{\mathbb{N}}$, and let $\mu$ be Lebesgue measure on $[0, 1]$; write $\Lambda$ for the domain of $\lambda$ and $\Sigma$ for the domain of $\mu$.

(i) For $x \in X$ set $\phi(x) = \sum_{i=0}^\infty 2^{-i-1}x(i)$. Then

$\phi^{-1}[E] \in \Lambda$ and $\lambda\phi^{-1}[E] = \mu E$ for every $E \in \Sigma$;
$\phi[F] \in \Sigma$ and $\mu\phi[F] = \lambda F$ for every $F \in \Lambda$.

(ii) There is a bijection $\tilde\phi : X \to [0, 1]$ which is equal to $\phi$ at all but countably many points, and any such bijection is an isomorphism between $(X, \Lambda, \lambda)$ and $([0, 1], \Sigma, \mu)$.

proof (a) The first point to observe is that $\phi$ is nearly a bijection.
Setting H = {x : x ∈ X, ∃ m ∈ N, x(i) = x(m) ∀ i ≥ m}, H 0 = {2−n k : n ∈ N, k ≤ 2n }, then H and H 0 are countable and φ¹X \ H is a bijection between X \ H and [0, 1] \ H 0 . (For t ∈ [0, 1] \ H 0 , φ−1 (t) is the binary expansion of t.) Because H and H 0 are countably infinite, there is a bijection between them; combining this with φ¹X \ H, we have a bijection between X and [0, 1] equal to φ except at countably many points. For the rest of this proof, let φ˜ be any such bijection. Let M be the countable set {x : x ∈ ˜ ˜ ]; then φ[A]4φ[A] ˜ X, φ(x) 6= φ(x)}, and N the countable set φ[M ] ∪ φ[M ⊆ N for every A ⊆ X. (b) To see that λφ˜−1 [E] exists and is equal to µE for every E ∈ Σ, I consider successively more complex sets E. α) If E = {t} then λφ˜−1 [E] = λ{φ˜−1 (t)} exists and is zero. (α β ) If E is of the form [2−n k, 2−n (k + 1)[, where n ∈ N and 0 ≤ k < 2n , then φ−1 [E] differs by at most (β two points from a set of the form {x : x(i) = z(i) ∀ i < n}, so φ˜−1 [E] differs from this by a countable set, and λφ˜−1 [E] = 2−n = µE. (γγ ) If E is of the form [2−n k, 2−n l[, where n ∈ N and 0 ≤ k < l ≤ 2n , then
E = ⋃_{k ≤ i < l} [2^{−n}i, 2^{−n}(i+1)[ […] ε > 0 there is a sequence ⟨I_n⟩_{n∈ℕ} of half-open subintervals of [0,1[ such that E \ {1} ⊆ ⋃_{n∈ℕ} I_n and ∑_{n=0}^∞ µI_n ≤ µE + ε; now φ̃^{−1}[E] ⊆ {φ̃^{−1}(1)} ∪ ⋃_{n∈ℕ} φ̃^{−1}[I_n], so
λ*φ̃^{−1}[E] ≤ λ(⋃_{n∈ℕ} φ̃^{−1}[I_n]) ≤ ∑_{n=0}^∞ λφ̃^{−1}[I_n] = ∑_{n=0}^∞ µI_n ≤ µE + ε.
As ε is arbitrary, λ*φ̃^{−1}[E] ≤ µE, and there is a V ∈ Λ such that φ̃^{−1}[E] ⊆ V and λV ≤ µE.
(ζ) Similarly, there is a V′ ∈ Λ such that V′ ⊇ φ̃^{−1}[[0,1] \ E] and λV′ ≤ µ([0,1] \ E). Now V ∪ V′ = X, while
λV + λV′ ≤ µE + (1 − µE) = 1 = λ(V ∪ V′),
so λ(V ∩ V′) = 0 and
φ̃^{−1}[E] = (X \ V′) ∪ (V ∩ V′ ∩ φ̃^{−1}[E])
belongs to Λ, with λφ̃^{−1}[E] ≤ λV ≤ µE; at the same time, 1 − λφ̃^{−1}[E] ≤ λV′ ≤ 1 − µE, so λφ̃^{−1}[E] = µE.

(c) Now suppose that C ⊆ X is a measurable cylinder of the special form {x : x(0) = ε_0, …, x(n) = ε_n} for some ε_0, …, ε_n ∈ {0,1}. Then φ[C] = [t, t + 2^{−n−1}] where t = ∑_{i=0}^n 2^{−i−1} ε_i, so that µφ[C] = λC. Since φ[C] △ φ̃[C] ⊆ N is countable, µφ̃[C] = λC.
If C ⊆ X is any measurable cylinder, then it is of the form {x : x↾J = z} for some finite J ⊆ ℕ; taking n so large that J ⊆ {0, …, n}, C is expressible as a disjoint union of 2^{n+1−#(J)} sets of the form just considered, being just those in which ε_i = z(i) for i ∈ J. Summing their measures, we again get µφ̃[C] = λC. Now 254G tells us that φ̃^{−1} : [0,1] → X is inverse-measure-preserving, that is, φ̃[W] is Lebesgue measurable, with measure λW, for every W ∈ Λ.
Putting this together with (b), φ̃ must be an isomorphism between (X, Λ, λ) and ([0,1], Σ, µ), as claimed in (ii) of the proposition.

(d) As for (i), if E ∈ Σ then φ^{−1}[E] △ φ̃^{−1}[E] ⊆ M is countable, so λφ^{−1}[E] = λφ̃^{−1}[E] = µE. While if W ∈ Λ, φ[W] △ φ̃[W] ⊆ N is countable, so µφ[W] = µφ̃[W] = λW.

(e) Finally, if ψ : X → [0,1] is any other bijection which agrees with φ at all but countably many points, set M′ = {x : ψ(x) ≠ φ(x)}, N′ = ψ[M′] ∪ φ[M′]. Then
ψ^{−1}[E] △ φ^{−1}[E] ⊆ M′, λψ^{−1}[E] = λφ^{−1}[E] = µE
for every E ∈ Σ, and
ψ[F] △ φ[F] ⊆ N′, µψ[F] = µφ[F] = λF
for every F ∈ Λ.
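
The key computation in part (c) of the proof of 254K, that φ carries a cylinder fixing the first coordinates onto a dyadic interval of the same measure, can be checked by exact arithmetic on finitely many coordinates. The following Python sketch is only an illustration under that finite truncation; the function names are hypothetical and not part of the text.

```python
from fractions import Fraction
from itertools import product

def phi_prefix(bits):
    """phi(x) for the sequence x that starts with `bits` and is 0 afterwards."""
    return sum((Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits)), Fraction(0))

def cylinder_image(bits):
    """Endpoints of phi[C] for C = {x : x(i) = bits[i] for every i < len(bits)}."""
    t = phi_prefix(bits)
    return t, t + Fraction(1, 2 ** len(bits))

def cylinder_measure(bits):
    """Usual measure of the same cylinder: 2^(-number of fixed coordinates)."""
    return Fraction(1, 2 ** len(bits))

all_bits = list(product((0, 1), repeat=3))
intervals = sorted(cylinder_image(bits) for bits in all_bits)

# Each image interval has Lebesgue measure equal to the measure of its cylinder.
lengths_match = all(hi - lo == cylinder_measure(bits)
                    for bits in all_bits
                    for lo, hi in [cylinder_image(bits)])

# The eight image intervals tile [0, 1], overlapping only at dyadic endpoints.
tiles_unit_interval = (
    intervals[0][0] == 0
    and intervals[-1][1] == 1
    and all(intervals[k][1] == intervals[k + 1][0] for k in range(len(intervals) - 1))
)
```

The shared endpoints are exactly the dyadic rationals of H′, which is why φ fails to be injective only on a countable set.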
254L Subspaces Just as in 251P, we can consider the product of subspace measures. There is a simplification in the form of the result because in the present context we are restricted to probability measures.

Theorem Let ⟨(X_i, Σ_i, µ_i)⟩_{i∈I} be a family of probability spaces, and (X, Λ, λ) their product.
(a) For each i ∈ I, let A_i ⊆ X_i be a set of full outer measure, and write µ̃_i for the subspace measure on A_i (214B). Let λ̃ be the product measure on A = ∏_{i∈I} A_i. Then λ̃ is the subspace measure on A induced by λ.
(b) λ*(∏_{i∈I} A_i) = ∏_{i∈I} µ_i*A_i whenever A_i ⊆ X_i for every i.

proof (a) Write λ_A for the subspace measure on A defined from λ, and Λ_A for its domain; write Λ̃ for the domain of λ̃.

(i) Let φ : A → X be the identity map. If C ⊆ X is a measurable cylinder, say C = ∏_{i∈I} C_i where C_i ∈ Σ_i for each i, then φ^{−1}[C] = ∏_{i∈I} (C_i ∩ A_i) is a measurable cylinder in A, and
λ̃φ^{−1}[C] = ∏_{i∈I} µ̃_i(C_i ∩ A_i) ≤ ∏_{i∈I} µ_iC_i = λC.
By 254G, φ is inverse-measure-preserving, that is, λ̃(A ∩ W) = λW for every W ∈ Λ. But this means that λ̃V is defined and equal to λ_AV = λ*V for every V ∈ Λ_A, since for any such V there is a W ∈ Λ such that V = A ∩ W and λW = λ_AV. In particular, λ_AA = 1.

(ii) Now regard φ as a function from the measure space (A, Λ_A, λ_A) to (A, Λ̃, λ̃). If D is a measurable cylinder in A, we can express it as ∏_{i∈I} D_i where every D_i belongs to the domain of µ̃_i and D_i = A_i for all but finitely many i. Now for each i we can find C_i ∈ Σ_i such that D_i = C_i ∩ A_i and µ_iC_i = µ̃_iD_i, and we can suppose that C_i = X_i whenever D_i = A_i. In this case C = ∏_{i∈I} C_i ∈ Λ and
λC = ∏_{i∈I} µ_iC_i = ∏_{i∈I} µ̃_iD_i = λ̃D.
Accordingly
λ_Aφ^{−1}[D] = λ_A(A ∩ C) ≤ λC = λ̃D.
By 254G again, φ is inverse-measure-preserving in this manifestation, that is, λ_AV is defined and equal to λ̃V for every V ∈ Λ̃.
Putting this together with (i), we have λ_A = λ̃, as claimed.

(b) For each i ∈ I, choose a set E_i ∈ Σ_i such that A_i ⊆ E_i and µ_iE_i = µ_i*A_i; do this in such a way that E_i = X_i whenever µ_i*A_i = 1. Set B_i = A_i ∪ (X_i \ E_i), so that µ_i*B_i = 1 for each i (if F ∈ Σ_i and F ⊇ B_i then F ∩ E_i ⊇ A_i, so
µ_iF = µ_i(F ∩ E_i) + µ_i(F \ E_i) = µ_iE_i + µ_i(X_i \ E_i) = 1).
By (a), we can identify the subspace measure λ_B on B = ∏_{i∈I} B_i with the product of the subspace measures µ̃_i on B_i. In particular, λ*B = λ_BB = 1. Now A_i = B_i ∩ E_i so (writing A = ∏_{i∈I} A_i), A = B ∩ ∏_{i∈I} E_i.
If ∏_{i∈I} µ_i*A_i = 0, then for every ε > 0 there is a finite J ⊆ I such that ∏_{i∈J} µ_i*A_i ≤ ε; consequently (using 254Fb)
λ*A ≤ λ{x : x(i) ∈ E_i for every i ∈ J} = ∏_{i∈J} µ_iE_i ≤ ε.
As ε is arbitrary, λ*A = 0.
If ∏_{i∈I} µ_i*A_i > 0, then for every n ∈ ℕ the set {i : µ_i*A_i ≤ 1 − 2^{−n}} must be finite, so
J = {i : µ_i*A_i < 1} = {i : E_i ≠ X_i}
is countable. By 254Fb again, applied to ⟨E_i ∩ B_i⟩_{i∈I} in the product ∏_{i∈I} B_i,
λ*(∏_{i∈I} A_i) = λ_B(∏_{i∈I} A_i) = λ_B{x : x ∈ B, x(i) ∈ E_i ∩ B_i for every i ∈ J}
= ∏_{i∈J} µ̃_i(E_i ∩ B_i) = ∏_{i∈I} µ_i*A_i,
as required.
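
The identity in 254L(b) can be seen at work in a toy case. The sketch below assumes two finite probability spaces whose measurable sets are the unions of finitely many atoms of positive probability, so that the outer measure of a set is the total mass of the atoms meeting it; the particular spaces and all names are hypothetical choices made for illustration only.

```python
from fractions import Fraction
from itertools import product

def outer_measure(A, atoms):
    """Outer measure of A when the measurable sets are the unions of `atoms`.

    `atoms` maps each atom (a frozenset) to its positive probability; the
    smallest measurable set including A is the union of the atoms meeting A.
    """
    return sum(p for atom, p in atoms.items() if atom & A)

def product_atoms(atoms1, atoms2):
    """Atoms of the product measure: rectangles of atoms, with product mass."""
    return {frozenset(product(a, b)): p * q
            for (a, p), (b, q) in product(atoms1.items(), atoms2.items())}

half, third = Fraction(1, 2), Fraction(1, 3)
atoms1 = {frozenset({0, 1}): half, frozenset({2}): half}
atoms2 = {frozenset({"a"}): third,
          frozenset({"b", "c"}): third,
          frozenset({"d"}): third}

A1, A2 = {0}, {"a", "c"}          # neither set is measurable in its factor
A = set(product(A1, A2))

lhs = outer_measure(A, product_atoms(atoms1, atoms2))             # outer measure of A1 x A2
rhs = outer_measure(A1, atoms1) * outer_measure(A2, atoms2)       # product of outer measures
```

In this finite setting the equality is immediate, because a rectangle of atoms meets A1 × A2 exactly when each side meets the corresponding factor; the content of the theorem is that the same identity survives in arbitrary infinite products.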
254M I now turn to the basic results which make it possible to use these product measures effectively. First, I offer a vocabulary for dealing with subproducts. Let ⟨X_i⟩_{i∈I} be a family of sets, with product X.

(a) For J ⊆ I, write X_J for ∏_{i∈J} X_i. We have a canonical bijection x ↦ (x↾J, x↾I \ J) : X → X_J × X_{I\J}. Associated with this we have the map x ↦ π_J(x) = x↾J : X → X_J. Now I will say that a set W ⊆ X is determined by coordinates in J if there is a V ⊆ X_J such that W = π_J^{−1}[V]; that is, W corresponds to V × X_{I\J} ⊆ X_J × X_{I\J}.
It is easy to see that
W is determined by coordinates in J
⟺ x′ ∈ W whenever x ∈ W, x′ ∈ X and x′↾J = x↾J
⟺ W = π_J^{−1}[π_J[W]].
It follows that if W is determined by coordinates in J, and J ⊆ K ⊆ I, W is also determined by coordinates in K. The family 𝒲_J of subsets of X determined by coordinates in J is closed under complementation and arbitrary unions and intersections. P P If W ∈ 𝒲_J, then
X \ W = X \ π_J^{−1}[π_J[W]] = π_J^{−1}[X_J \ π_J[W]] ∈ 𝒲_J.
If 𝒱 ⊆ 𝒲_J, then
⋃𝒱 = ⋃_{V∈𝒱} π_J^{−1}[π_J[V]] = π_J^{−1}[⋃_{V∈𝒱} π_J[V]] ∈ 𝒲_J. Q Q

(b) It follows that
𝒲 = ⋃{𝒲_J : J ⊆ I is countable},
the family of subsets of X determined by coordinates in some countable set, is a σ-algebra of subsets of X. P P (i) X and ∅ are determined by coordinates in ∅ (recall that X_∅ is a singleton, and that X = π_∅^{−1}[X_∅], ∅ = π_∅^{−1}[∅]). (ii) If W ∈ 𝒲, there is a countable J ⊆ I such that W ∈ 𝒲_J; now
X \ W = π_J^{−1}[X_J \ π_J[W]] ∈ 𝒲_J ⊆ 𝒲.
(iii) If ⟨W_n⟩_{n∈ℕ} is a sequence in 𝒲, then for each n ∈ ℕ there is a countable J_n ⊆ I such that W_n ∈ 𝒲_{J_n}. Now J = ⋃_{n∈ℕ} J_n is a countable subset of I, and every W_n belongs to 𝒲_J, so
⋃_{n∈ℕ} W_n ∈ 𝒲_J ⊆ 𝒲. Q Q

(c) If i ∈ I and E ⊆ X_i then {x : x ∈ X, x(i) ∈ E} is determined by the single coordinate i, so surely belongs to 𝒲; accordingly 𝒲 must include ⨂̂_{i∈I} 𝒫X_i. A fortiori, if Σ_i is a σ-algebra of subsets of X_i for each i, 𝒲 ⊇ ⨂̂_{i∈I} Σ_i; that is, every member of ⨂̂_{i∈I} Σ_i is determined by coordinates in some countable set.
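
For a finite product the criterion W = π_J^{−1}[π_J[W]] of 254Ma can be tested by brute force. The following Python sketch assumes X = {0,1}^3, with points represented as tuples; the helper names are hypothetical and serve only to illustrate the definition.

```python
from itertools import product

def restrict(x, J):
    """The restriction of x to J, recorded as (coordinate, value) pairs."""
    return tuple((i, x[i]) for i in sorted(J))

def is_determined_by(W, X, J):
    """Check W = pi_J^{-1}[pi_J[W]] for a subset W of the finite product X."""
    W = set(W)
    image = {restrict(x, J) for x in W}                      # pi_J[W]
    saturation = {x for x in X if restrict(x, J) in image}   # pi_J^{-1}[pi_J[W]]
    return saturation == W

X = list(product((0, 1), repeat=3))     # X = {0,1}^I with I = {0, 1, 2}
W = {x for x in X if x[0] == 1}         # depends on coordinate 0 only
V = {x for x in X if x[0] == x[2]}      # depends on coordinates 0 and 2

results = {
    "W, J={0}": is_determined_by(W, X, {0}),
    "W, J={0,1}": is_determined_by(W, X, {0, 1}),   # enlarging J is harmless
    "W, J={1}": is_determined_by(W, X, {1}),
    "V, J={0,2}": is_determined_by(V, X, {0, 2}),
    "V, J={0}": is_determined_by(V, X, {0}),
    "X \\ V, J={0,2}": is_determined_by(set(X) - V, X, {0, 2}),
}
```

The last entry illustrates closure under complementation, and taking J empty recovers the remark in (b) that only X and ∅ are determined by coordinates in ∅.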
254N Theorem Let ⟨(X_i, Σ_i, µ_i)⟩_{i∈I} be a family of probability spaces and ⟨K_j⟩_{j∈J} a partition of I. For each j ∈ J let λ_j be the product measure on Z_j = ∏_{i∈K_j} X_i, and write λ for the product measure on X = ∏_{i∈I} X_i. Then the natural bijection
x ↦ φ(x) = ⟨x↾K_j⟩_{j∈J} : X → ∏_{j∈J} Z_j
identifies λ with the product of the family ⟨λ_j⟩_{j∈J}.
In particular, if K ⊆ I is any set, then λ can be identified with the c.l.d. product of the product measures on ∏_{i∈K} X_i and ∏_{i∈I\K} X_i.

proof (Compare 251M.) Write Z = ∏_{j∈J} Z_j and λ̃ for the product measure on Z; let Λ, Λ̃ be the domains of λ and λ̃.

(a) Let C ⊆ Z be a measurable cylinder. Then λ*φ^{−1}[C] ≤ λ̃C. P P Express C as ∏_{j∈J} C_j where C_j ⊆ Z_j belongs to the domain Λ_j of λ_j for each j. Set L = {j : C_j ≠ Z_j}, so that L is finite. Let ε > 0. For each j ∈ L let ⟨C_{jn}⟩_{n∈ℕ} be a sequence of measurable cylinders in Z_j = ∏_{i∈K_j} X_i such that C_j ⊆ ⋃_{n∈ℕ} C_{jn} and ∑_{n=0}^∞ λ_jC_{jn} ≤ λ_jC_j + ε. Express each C_{jn} as ∏_{i∈K_j} C_{jni} where C_{jni} ∈ Σ_i for i ∈ K_j (and {i : C_{jni} ≠ X_i} is finite).
For f ∈ ℕ^L, set
D_f = {x : x ∈ X, x(i) ∈ C_{j,f(j),i} whenever j ∈ L, i ∈ K_j}.
Because ⋃_{j∈L} {i : C_{j,f(j),i} ≠ X_i} is finite, D_f is a measurable cylinder in X, and
λD_f = ∏_{j∈L} ∏_{i∈K_j} µ_iC_{j,f(j),i} = ∏_{j∈L} λ_jC_{j,f(j)}.
Also
⋃{D_f : f ∈ ℕ^L} ⊇ φ^{−1}[C]
because if φ(x) ∈ C then φ(x)(j) ∈ C_j for each j ∈ L, so there must be an f ∈ ℕ^L such that φ(x)(j) ∈ C_{j,f(j)} for every j ∈ L. But (because ℕ^L is countable) this means that
λ*φ^{−1}[C] ≤ ∑_{f∈ℕ^L} λD_f = ∑_{f∈ℕ^L} ∏_{j∈L} λ_jC_{j,f(j)}
= ∏_{j∈L} ∑_{n=0}^∞ λ_jC_{jn} ≤ ∏_{j∈L} (λ_jC_j + ε).
As ε is arbitrary, λ*φ^{−1}[C] ≤ ∏_{j∈L} λ_jC_j = λ̃C. Q Q
By 254G, it follows that λφ^{−1}[W] is defined, and equal to λ̃W, whenever W ∈ Λ̃.

(b) Next, λ̃φ[D] = λD for every measurable cylinder D ⊆ X. P P This is easy. Express D as ∏_{i∈I} D_i where D_i ∈ Σ_i for every i ∈ I and {i : D_i ≠ X_i} is finite. Then φ[D] = ∏_{j∈J} D̃_j, where D̃_j = ∏_{i∈K_j} D_i is a measurable cylinder for each j ∈ J. Because {j : D̃_j ≠ Z_j} must also be finite (in fact, it cannot have more members than the finite set {i : D_i ≠ X_i}), ∏_{j∈J} D̃_j is itself a measurable cylinder in Z, and
λ̃φ[D] = ∏_{j∈J} λ_jD̃_j = ∏_{j∈J} ∏_{i∈K_j} µ_iD_i = λD. Q Q
Applying 254G to φ^{−1} : Z → X, it follows that λ̃φ[W] is defined, and equal to λW, for every W ∈ Λ.
But together with (a) this means that for any W ⊆ X,
if W ∈ Λ then φ[W] ∈ Λ̃ and λ̃φ[W] = λW,
if φ[W] ∈ Λ̃ then W ∈ Λ and λW = λ̃φ[W].
And of course this is just what is meant by saying that φ is an isomorphism between (X, Λ, λ) and (Z, Λ̃, λ̃).

254O Proposition Let ⟨(X_i, Σ_i, µ_i)⟩_{i∈I} be a family of probability spaces. For each J ⊆ I let λ_J be the product probability measure on X_J = ∏_{i∈J} X_i, and Λ_J its domain; write X = X_I, λ = λ_I and Λ = Λ_I. For x ∈ X and J ⊆ I set π_J(x) = x↾J ∈ X_J.
(a) For every J ⊆ I, λ_J is the image measure λπ_J^{−1} (112E); in particular, π_J : X → X_J is inverse-measure-preserving for λ and λ_J.
(b) If J ⊆ I and W ∈ Λ is determined by coordinates in J (254M), then λ_Jπ_J[W] is defined and equal to λW. Consequently there are W_1, W_2 belonging to the σ-algebra of subsets of X generated by
{{x : x(i) ∈ E} : i ∈ J, E ∈ Σ_i}
such that W_1 ⊆ W ⊆ W_2 and λ(W_2 \ W_1) = 0.
(c) For every W ∈ Λ, we can find a countable set J and W_1, W_2 ∈ Λ, both determined by coordinates in J, such that W_1 ⊆ W ⊆ W_2 and λ(W_2 \ W_1) = 0.
(d) For every W ∈ Λ, there is a countable set J ⊆ I such that π_J[W] ∈ Λ_J and λ_Jπ_J[W] = λW; so that W′ = π_J^{−1}[π_J[W]] belongs to Λ, and λ(W′ \ W) = 0.

proof (a)(i) By 254N, we can identify λ with the product of λ_J and λ_{I\J} on X_J × X_{I\J}.
Now π_J^{−1}[E] ⊆ X corresponds to E × X_{I\J} ⊆ X_J × X_{I\J}, so
λ(π_J^{−1}[E]) = λ_JE · λ_{I\J}X_{I\J} = λ_JE,
by 251E or 251Ia, whenever E ∈ Λ_J. This shows that π_J is inverse-measure-preserving.

(ii) To see that λ_J is actually the image measure, suppose that E ⊆ X_J is such that π_J^{−1}[E] ∈ Λ. Identifying π_J^{−1}[E] with E × X_{I\J}, as before, we are supposing that E × X_{I\J} is measurable for the product measure on X_J × X_{I\J}. But this means that for λ_{I\J}-almost every z ∈ X_{I\J}, E_z = {y : (y, z) ∈ E × X_{I\J}} belongs to Λ_J (252D(ii), because λ_J is complete). Since E_z = E for every z, E itself belongs to Λ_J, as claimed.

(b) If W ∈ Λ is determined by coordinates in J, set H = π_J[W]; then π_J^{−1}[H] = W, so H ∈ Λ_J by (a) just above. By 254Ff, there are H_1, H_2 ∈ ⨂̂_{i∈J} Σ_i such that H_1 ⊆ H ⊆ H_2 and λ_J(H_2 \ H_1) = 0.
Let T_J be the σ-algebra of subsets of X generated by sets of the form {x : x(i) ∈ E} where i ∈ J and E ∈ Σ_i. Consider T′_J = {G : G ⊆ X_J, π_J^{−1}[G] ∈ T_J}. This is a σ-algebra of subsets of X_J, and it contains {y : y ∈ X_J, y(i) ∈ E} whenever i ∈ J, E ∈ Σ_i (because
π_J^{−1}[{y : y ∈ X_J, y(i) ∈ E}] = {x : x ∈ X, x(i) ∈ E}
whenever i ∈ J, E ⊆ X_i). So T′_J must include ⨂̂_{i∈J} Σ_i. In particular, H_1 and H_2 both belong to T′_J, that is, W_k = π_J^{−1}[H_k] belongs to T_J for both k. Of course W_1 ⊆ W ⊆ W_2, because H_1 ⊆ H ⊆ H_2, and
λ(W_2 \ W_1) = λ_J(H_2 \ H_1) = 0,
as required.

(c) Now take any W ∈ Λ. By 254Ff, there are W_1 and W_2 ∈ ⨂̂_{i∈I} Σ_i such that W_1 ⊆ W ⊆ W_2 and λ(W_2 \ W_1) = 0. By 254Mc, there are countable sets J_1, J_2 ⊆ I such that, for each k, W_k is determined by coordinates in J_k. Setting J = J_1 ∪ J_2, J is a countable subset of I and both W_1 and W_2 are determined by coordinates in J.

(d) Continuing the argument from (c), π_J[W_1], π_J[W_2] ∈ Λ_J, by (b), and λ_J(π_J[W_2] \ π_J[W_1]) = 0. Since π_J[W_1] ⊆ π_J[W] ⊆ π_J[W_2], it follows that π_J[W] ∈ Λ_J, with λ_Jπ_J[W] = λ_Jπ_J[W_2]; so that, setting W′ = π_J^{−1}[π_J[W]], W′ ∈ Λ, and
λW′ = λ_Jπ_J[W] = λ_Jπ_J[W_2] = λπ_J^{−1}[π_J[W_2]] = λW_2 = λW.
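
The statement of 254Oa, that λ_J is the image of λ under π_J, reduces in a finite product of finite spaces to the fact that the unused coordinates contribute a factor of total mass 1. The Python sketch below checks this on one example; the three weight tables and the names `product_measure`, `coords_I`, `coords_J` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from itertools import product
from math import prod

# Three finite probability spaces, each given by point masses (an assumption
# made so that every subset is measurable and sums replace integrals).
weights = [
    {0: Fraction(1, 2), 1: Fraction(1, 2)},
    {0: Fraction(1, 3), 1: Fraction(2, 3)},
    {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)},
]

def product_measure(points, coords):
    """Product measure of a finite set of points whose entries are indexed by `coords`."""
    return sum(prod(weights[i][v] for i, v in zip(coords, pt)) for pt in points)

coords_I = (0, 1, 2)
coords_J = (0, 2)
X = list(product(*(weights[i] for i in coords_I)))

E = {(0, 2), (1, 0)}                                    # a subset of X_J
preimage = [x for x in X if tuple(x[i] for i in coords_J) in E]   # pi_J^{-1}[E]

lam_preimage = product_measure(preimage, coords_I)     # lambda(pi_J^{-1}[E])
lam_J_E = product_measure(E, coords_J)                  # lambda_J(E)
```

Here both quantities equal 3/8: the middle coordinate ranges freely over its whole space and so contributes 1/3 + 2/3 = 1.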
254P Proposition Let ⟨(X_i, Σ_i, µ_i)⟩_{i∈I} be a family of probability spaces, and for each J ⊆ I let λ_J be the product probability measure on X_J = ∏_{i∈J} X_i, and Λ_J its domain; write X = X_I, Λ = Λ_I and λ = λ_I. For x ∈ X and J ⊆ I set π_J(x) = x↾J ∈ X_J.
(a) If J ⊆ I and g is a real-valued function defined on a subset of X_J, then g is Λ_J-measurable iff gπ_J is Λ-measurable.
(b) Whenever f is a Λ-measurable real-valued function defined on a λ-conegligible subset of X, we can find a countable set J ⊆ I and a Λ_J-measurable function g defined on a λ_J-conegligible subset of X_J such that f extends gπ_J.

proof (a)(i) If g is Λ_J-measurable and a ∈ ℝ, there is an H ∈ Λ_J such that {y : y ∈ dom g, g(y) ≥ a} = H ∩ dom g. Now π_J^{−1}[H] ∈ Λ, by 254Oa, and {x : x ∈ dom gπ_J, gπ_J(x) ≥ a} = π_J^{−1}[H] ∩ dom gπ_J. So gπ_J is Λ-measurable.

(ii) If gπ_J is Λ-measurable and a ∈ ℝ, then there is a W ∈ Λ such that {x : gπ_J(x) ≥ a} = W ∩ dom gπ_J. As in the proof of 254Oa, we may identify λ with the product of λ_J and λ_{I\J}, and 252D(ii) tells us that, if we identify W with the corresponding subset of X_J × X_{I\J}, there is at least one z ∈ X_{I\J} such that W_z = {y : y ∈ X_J, (y, z) ∈ W} belongs to Λ_J. But since (on this convention) gπ_J(y, z) = g(y) for every y ∈ X_J, we see that {y : y ∈ dom g, g(y) ≥ a} = W_z ∩ dom g. As a is arbitrary, g is Λ_J-measurable.

(b) For rational numbers q, set W_q = {x : x ∈ dom f, f(x) ≥ q}. By 254Oc we can find for each q a countable set J_q ⊆ I and sets W′_q, W″_q, both determined by coordinates in J_q, such that W′_q ⊆ W_q ⊆ W″_q and λ(W″_q \ W′_q) = 0. Set J = ⋃_{q∈ℚ} J_q, V = X \ ⋃_{q∈ℚ} (W″_q \ W′_q); then J is a countable subset of I and V is a conegligible subset of X; moreover, V is determined by coordinates in J because all the W′_q, W″_q are.
For every q ∈ ℚ, W_q ∩ V = W′_q ∩ V, because V ∩ (W_q \ W′_q) ⊆ V ∩ (W″_q \ W′_q) = ∅; so W_q ∩ V is determined by coordinates in J. Consequently V ∩ dom f = ⋃_{q∈ℚ} V ∩ W_q is also determined by coordinates in J. Also
{x : x ∈ V ∩ dom f, f(x) ≥ a} = ⋂_{q ≤ a} V ∩ W_q
is determined by coordinates in J. What this means is that if x, x′ ∈ V and π_Jx = π_Jx′, then x ∈ dom f iff x′ ∈ dom f and in this case f(x) = f(x′). Setting H = π_J[V ∩ dom f], we have π_J^{−1}[H] = V ∩ dom f a conegligible subset of X, so (because λ_J = λπ_J^{−1}) H is conegligible in X_J. Also, for y ∈ H, f(x) = f(x′) whenever π_Jx = π_Jx′ = y, so there is a function g : H → ℝ defined by saying that gπ_J(x) = f(x) whenever x ∈ V ∩ dom f. Thus g is defined almost everywhere in X_J and f extends gπ_J. Finally, for any a ∈ ℝ,
π_J^{−1}[{y : g(y) ≥ a}] = {x : x ∈ V ∩ dom f, f(x) ≥ a} ∈ Λ;
by 254Oa, {y : g(y) ≥ a} ∈ Λ_J; as a is arbitrary, g is measurable.

254Q Proposition Let ⟨(X_i, Σ_i, µ_i)⟩_{i∈I} be a family of probability spaces, and for each J ⊆ I let λ_J be the product probability measure on X_J = ∏_{i∈J} X_i; write X = X_I, λ = λ_I. For x ∈ X, J ⊆ I set π_J(x) = x↾J ∈ X_J.
(a) Let S be the linear subspace of ℝ^X spanned by {χC : C ⊆ X is a measurable cylinder}. Then for every λ-integrable real-valued function f and every ε > 0 there is a g ∈ S such that ∫|f − g| dλ ≤ ε.
(b) Whenever J ⊆ I and g is a real-valued function defined on a subset of X_J, then ∫ g dλ_J = ∫ gπ_J dλ if either integral is defined in [−∞, ∞].
(c) Whenever f is a λ-integrable real-valued function, we can find a countable set J ⊆ I and a λ_J-integrable function g such that f extends gπ_J.

proof (a)(i) Write S̄ for the set of functions f satisfying the assertion, that is, such that for every ε > 0 there is a g ∈ S such that ∫|f − g| ≤ ε. Then f_1 + f_2 and cf_1 ∈ S̄ whenever f_1, f_2 ∈ S̄. P P Given ε > 0 there are g_1, g_2 ∈ S such that ∫|f_1 − g_1| ≤ ε/(2 + |c|), ∫|f_2 − g_2| ≤ ε/2; now g_1 + g_2, cg_1 ∈ S and ∫|(f_1 + f_2) − (g_1 + g_2)| ≤ ε, ∫|cf_1 − cg_1| ≤ ε. Q Q Also, of course, f ∈ S̄ whenever f_0 ∈ S̄ and f =_{a.e.} f_0.

(ii) Write 𝒲 for {W : W ⊆ X, χW ∈ S̄}, and 𝒞 for the family of measurable cylinders in X. Then it is plain from the definition in 254A that C ∩ C′ ∈ 𝒞 for all C, C′ ∈ 𝒞, and of course C ∈ 𝒲 for every C ∈ 𝒞, because χC ∈ S. Next, W \ V ∈ 𝒲 whenever W, V ∈ 𝒲 and V ⊆ W, because then χ(W \ V) = χW − χV. Thirdly, ⋃_{n∈ℕ} W_n ∈ 𝒲 for every non-decreasing sequence ⟨W_n⟩_{n∈ℕ} in 𝒲. P P Set W = ⋃_{n∈ℕ} W_n. Given ε > 0, there is an n ∈ ℕ such that λ(W \ W_n) ≤ ε/2. Now there is a g ∈ S such that ∫|χW_n − g| ≤ ε/2, so that ∫|χW − g| ≤ ε. Q Q Thus 𝒲 is a Dynkin class of subsets of X. By the Monotone Class Theorem (136B), 𝒲 must include the σ-algebra of subsets of X generated by 𝒞, which is ⨂̂_{i∈I} Σ_i. But this means that 𝒲 contains every measurable subset of X, since by 254Ff any measurable set differs by a negligible set from some member of ⨂̂_{i∈I} Σ_i.

(iii) Thus S̄ contains the characteristic function of any measurable subset of X. Because it is closed under addition and scalar multiplication, it contains all simple functions. But this means that it must contain all integrable functions. P P If f is a real-valued function which is integrable over X, and ε > 0, there is a simple function h : X → ℝ such that ∫|f − h| ≤ ε/2 (242M), and now there is a g ∈ S such that ∫|h − g| ≤ ε/2, so that ∫|f − g| ≤ ε. Q Q
This proves part (a) of the proposition.
(b) Put 254Oa and 235L together.

(c) By 254Pb, there are a countable J ⊆ I and a real-valued function g defined on a conegligible subset of X_J such that f extends gπ_J. Now dom(gπ_J) = π_J^{−1}[dom g] is conegligible, so f =_{a.e.} gπ_J and gπ_J is λ-integrable. By (b), g is λ_J-integrable.

254R Conditional expectations again Putting the ideas of 253H together with the work above, we obtain some results which are important not only for their direct applications but for the light they throw on the structures here.
Theorem Let ⟨(X_i, Σ_i, µ_i)⟩_{i∈I} be a family of probability spaces with product (X, Λ, λ). For J ⊆ I let Λ_J ⊆ Λ be the σ-subalgebra of sets determined by coordinates in J (254Mb). Then we may regard L^0(λ↾Λ_J) as a subspace of L^0(λ) (242Jh). Let P_J : L^1(λ) → L^1(λ↾Λ_J) ⊆ L^1(λ) be the corresponding conditional expectation operator (242J). Then
(a) for any J, K ⊆ I, P_{K∩J} = P_KP_J;
(b) for any u ∈ L^1(λ), there is a countable set J* ⊆ I such that P_Ju = u iff J ⊇ J*;
(c) for any u ∈ L^0(λ), there is a unique smallest set J* ⊆ I such that u ∈ L^0(λ↾Λ_{J*}), and this J* is countable;
(d) for any W ∈ Λ there is a unique smallest set J* ⊆ I such that W △ W′ is negligible for some W′ ∈ Λ_{J*}, and this J* is countable;
(e) for any Λ-measurable real-valued function f : X → ℝ there is a unique smallest set J* ⊆ I such that f is equal almost everywhere to a Λ_{J*}-measurable function, and this J* is countable.

proof For J ⊆ I, write X_J = ∏_{i∈J} X_i, let λ_J be the product measure on X_J, and set φ_J(x) = x↾J for x ∈ X. Write L^0_J for L^0(λ↾Λ_J), regarded as a subset of L^0 = L^0_I, and L^1_J for L^1(λ↾Λ_J) = L^1(λ) ∩ L^0_J, as in 242Jb; thus L^1_J is the set of values of the projection P_J.

(a)(i) Let C ⊆ X be a measurable cylinder, expressed as ∏_{i∈I} C_i where C_i ∈ Σ_i for every i and L = {i : C_i ≠ X_i} is finite. Set
C′_i = C_i for i ∈ J, X_i for i ∈ I \ J, C′ = ∏_{i∈I} C′_i, α = ∏_{i∈I\J} µ_iC_i.
Then αχC′ is a conditional expectation of χC on Λ_J. P P By 254N, we can identify λ with the product of λ_J and λ_{I\J}. This identifies Λ_J with {E × X_{I\J} : E ∈ dom λ_J}. By 253H we have a conditional expectation g of χC defined by setting
g(y, z) = ∫ χC(y, t) λ_{I\J}(dt)
for y ∈ X_J, z ∈ X_{I\J}. But C is identified with C_J × C_{I\J}, where C_J = ∏_{i∈J} C_i, so that g(y, z) = 0 if y ∉ C_J and otherwise is λ_{I\J}C_{I\J} = α. Thus g = αχ(C_J × X_{I\J}). But the identification between X_J × X_{I\J} and X matches C_J × X_{I\J} with C′, as described above. So g becomes identified with αχC′ and αχC′ is a conditional expectation of χC. Q Q

(ii) Next, setting
C″_i = C′_i for i ∈ K, X_i for i ∈ I \ K, C″ = ∏_{i∈I} C″_i,
β = ∏_{i∈I\K} µ_iC′_i = ∏_{i∈J\K} µ_iC_i,
the same arguments show that βχC″ is a conditional expectation of χC′ on Λ_K. So we have
P_KP_J(χC)• = βα(χC″)•.
But if we look at βα, this is just ∏_{i∈I\(K∩J)} µ_iC_i, while C″_i = C_i if i ∈ K ∩ J, X_i for other i. So βαχC″ is a conditional expectation of χC on Λ_{K∩J}, and
P_KP_J(χC)• = P_{K∩J}(χC)•.

(iii) Thus we see that the operators P_KP_J, P_{K∩J} agree on elements of the form χC• where C is a measurable cylinder. Because they are both linear, they agree on linear combinations of these, that is, P_KP_Jv = P_{K∩J}v whenever v = g• for some g in the space S of 254Q. But if u ∈ L^1(λ) and ε > 0, there is a λ-integrable function f such that f• = u and there is a g ∈ S such that ∫|f − g| ≤ ε (254Qa), so that ‖u − v‖_1 ≤ ε, where v = g•. Since P_J, P_K and P_{K∩J} are all linear operators of norm 1,
‖P_KP_Ju − P_{K∩J}u‖_1 ≤ 2‖u − v‖_1 + ‖P_KP_Jv − P_{K∩J}v‖_1 ≤ 2ε.
As ε is arbitrary, P_KP_Ju = P_{K∩J}u; as u is arbitrary, P_KP_J = P_{K∩J}.

(b) Take u ∈ L^1(λ). Let 𝒥 be the family of all subsets J of I such that P_Ju = u. By (a), J ∩ K ∈ 𝒥 for all J, K ∈ 𝒥. Next, 𝒥 contains a countable set J_0. P P Let f be a λ-integrable function such that f• = u. By 254Qc, we can find a countable set J_0 ⊆ I and a λ_{J_0}-integrable function g such that f =_{a.e.} gπ_{J_0}. Now gπ_{J_0} is Λ_{J_0}-measurable and u = (gπ_{J_0})• belongs to L^1_{J_0}, so J_0 ∈ 𝒥. Q Q
Write J* = ⋂𝒥, so that J* ⊆ J_0 is countable. Then J* ∈ 𝒥. P P Let ε > 0. As in the proof of (a) above, there is a g ∈ S such that ‖u − v‖_1 ≤ ε, where v = g•. But because g is a finite linear combination of characteristic functions of measurable cylinders, each determined by coordinates in some finite set, there is a finite K ⊆ I such that g is Λ_K-measurable, so that P_Kv = v. Because K is finite, there must be J_1, …, J_n ∈ 𝒥 such that J* ∩ K = ⋂_{1≤i≤n} J_i ∩ K; but as 𝒥 is closed under finite intersections, J = J_1 ∩ … ∩ J_n ∈ 𝒥, and J* ∩ K = J ∩ K.
Now we have
P_{J*}v = P_{J*}P_Kv = P_{J*∩K}v = P_{J∩K}v = P_JP_Kv = P_Jv,
using (a) twice. Because both P_J and P_{J*} have norm 1,
‖P_{J*}u − u‖_1 ≤ ‖P_{J*}u − P_{J*}v‖_1 + ‖P_{J*}v − P_Jv‖_1 + ‖P_Jv − P_Ju‖_1 + ‖P_Ju − u‖_1
≤ ‖u − v‖_1 + 0 + ‖u − v‖_1 + 0 ≤ 2ε.
As ε is arbitrary, P_{J*}u = u and J* ∈ 𝒥. Q Q
Now, for any J ⊆ I,
P_Ju = u ⟹ J ∈ 𝒥 ⟹ J ⊇ J* ⟹ P_Ju = P_JP_{J*}u = P_{J∩J*}u = P_{J*}u = u.
Thus J* has the required properties.

(c) Set e = (χX)•, u_n = (−ne) ∨ (u ∧ ne) for each n ∈ ℕ. Then, for any J ⊆ I, u ∈ L^0_J iff u_n ∈ L^0_J for every n. P P (α) If u ∈ L^0_J, then u is expressible as f• for some Λ_J-measurable f; now f_n = (−nχX) ∨ (f ∧ nχX) is Λ_J-measurable, so u_n = f_n• ∈ L^0_J for every n. (β) If u_n ∈ L^0_J for each n, then for each n we can find a Λ_J-measurable function f_n such that f_n• = u_n. But there is also a Λ-measurable function f such that u = f•, and we must have f_n =_{a.e.} (−nχX) ∨ (f ∧ nχX) for each n, so that f =_{a.e.} lim_{n→∞} f_n and u = (lim_{n→∞} f_n)•. Since lim_{n→∞} f_n is Λ_J-measurable, u ∈ L^0_J. Q Q
As every u_n belongs to L^1, we know that
u_n ∈ L^0_J ⟺ u_n ∈ L^1_J ⟺ P_Ju_n = u_n.
By (b), there is for each n a countable J_n* such that P_Ju_n = u_n iff J ⊇ J_n*. So we see that u ∈ L^0_J iff J ⊇ J_n* for every n, that is, J ⊇ ⋃_{n∈ℕ} J_n*. Thus J* = ⋃_{n∈ℕ} J_n* has the property claimed.

(d) Applying (c) to u = (χW)•, we have a (countable) unique smallest J* such that u ∈ L^0_{J*}. But if J ⊆ I, then there is a W′ ∈ Λ_J such that W′ △ W is negligible iff u ∈ L^0_J. So this is the J* we are looking for.

(e) Again apply (c), this time to f•.

254S Proposition Let ⟨(X_i, Σ_i, µ_i)⟩_{i∈I} be a family of probability spaces, with product (X, Λ, λ).
(a) If A ⊆ X is determined by coordinates in I \ {j} for every j ∈ I, then its outer measure λ*A must be either 0 or 1.
(b) If W ∈ Λ and λW > 0, then for every ε > 0 there are a W′ ∈ Λ and a finite set J ⊆ I such that λW′ ≥ 1 − ε and for every x ∈ W′ there is a y ∈ W such that x↾I \ J = y↾I \ J.

proof For J ⊆ I write X_J for ∏_{i∈J} X_i and λ_J for the product measure on X_J.

(a) Let W be a measurable envelope of A. By 254Rd, there is a smallest J ⊆ I for which there is a W′ ∈ Λ, determined by coordinates in J, with λ(W △ W′) = 0. Now J = ∅. P P Take any j ∈ I. Then A is determined by coordinates in I \ {j}, that is, can be regarded as X_j × A′ for some A′ ⊆ X_{I\{j}}. We can also think of λ as the product of λ_{{j}} and λ_{I\{j}} (254N). Let Λ_{I\{j}} be the domain of λ_{I\{j}}. By 251R,
λ*A = λ*_{{j}}X_j · λ*_{I\{j}}A′ = λ*_{I\{j}}A′.
Let V ∈ Λ_{I\{j}} be a measurable envelope of A′. Then W′ = X_j × V belongs to Λ, includes A and has measure λ*A, so λ(W ∩ W′) = λW = λW′ and W △ W′ is negligible. At the same time, W′ is determined by coordinates in I \ {j}. This means that J must be included in I \ {j}. As j is arbitrary, J = ∅. Q Q
But the only subsets of X which are determined by coordinates in ∅ are X and ∅. Since W differs from one of these by a negligible set, λ*A = λW ∈ {0, 1}, as claimed.

(b) Set η = ½ min(ε, 1)λW. By 254Fe, there is a measurable set V, determined by coordinates in a finite subset J of I, such that λ(W △ V) ≤ η. Note that
λV ≥ λW − η ≥ ½λW > 0,
so
λ(W △ V) ≤ ½ελW ≤ ελV.
We may identify λ with the c.l.d. product of λ_J and λ_{I\J} (254N). Let W̃, Ṽ ⊆ X_J × X_{I\J} be the sets corresponding to W, V ⊆ X. Then Ṽ can be expressed as U × X_{I\J} where λ_JU = λV > 0. Set U′ = {z : z ∈ X_{I\J}, λ_JW̃^{−1}[{z}] = 0}. Then U′ is measured by λ_{I\J} (252D(ii) again, because both λ_J and λ_{I\J} are complete), and
λ_JU · λ_{I\J}U′ ≤ ∫ λ_J(W̃^{−1}[{z}] △ U) λ_{I\J}(dz)
(because if z ∈ U′ then λ_J(W̃^{−1}[{z}] △ U) = λ_JU)
= ∫ λ_J((W̃ △ Ṽ)^{−1}[{z}]) λ_{I\J}(dz) = (λ_J × λ_{I\J})(W̃ △ Ṽ)
(252D once more)
= λ(W △ V) ≤ ελV = ελ_JU.
This means that λ_{I\J}U′ ≤ ε. Set W′ = {x : x ∈ X, x↾I \ J ∉ U′}; then λW′ ≥ 1 − ε. If x ∈ W′, then z = x↾I \ J ∉ U′, so W̃^{−1}[{z}] is not empty, that is, there is a y ∈ W such that y↾I \ J = z. So this W′ has the required properties.

254T Remarks It is important to understand that the results above apply to L^0 and L^1 and measurable-sets-up-to-a-negligible set, not to sets and functions themselves. One idea does apply to sets and functions, whether measurable or not.

(a) Let ⟨X_i⟩_{i∈I} be a family of sets with Cartesian product X. For each J ⊆ I let 𝒲_J be the set of subsets of X determined by coordinates in J. Then 𝒲_J ∩ 𝒲_K = 𝒲_{J∩K} for all J, K ⊆ I. P P Of course 𝒲_J ∩ 𝒲_K ⊇ 𝒲_{J∩K}, because 𝒲_J ⊇ 𝒲_{J′} whenever J′ ⊆ J. On the other hand, suppose W ∈ 𝒲_J ∩ 𝒲_K, x ∈ W, y ∈ X and x↾J ∩ K = y↾J ∩ K. Set z(i) = x(i) for i ∈ J, y(i) for i ∈ I \ J. Then z↾J = x↾J so z ∈ W. Also y↾K = z↾K so y ∈ W. As x, y are arbitrary, W ∈ 𝒲_{J∩K}; as W is arbitrary, 𝒲_J ∩ 𝒲_K ⊆ 𝒲_{J∩K}. Q Q
Accordingly, for any W ⊆ X, ℱ = {J : W ∈ 𝒲_J} is a filter on I (unless W = X or W = ∅, in which case ℱ = 𝒫I). But ℱ does not necessarily have a least element, as the following example shows.

(b) Set X = {0,1}^ℕ, W = {x : x ∈ X, lim_{i→∞} x(i) = 0}. Then for every n ∈ ℕ W is determined by coordinates in J_n = {i : i ≥ n}. But W is not determined by coordinates in ⋂_{n∈ℕ} J_n = ∅. Note that
W = ⋃_{n∈ℕ} ⋂_{i≥n} {x : x(i) = 0}
is measurable for the usual measure on X. But it is also negligible (since it is countable); in 254Rd we have J* = ∅, W′ = ∅.

*254U I am now in a position to describe a counterexample answering a natural question arising out of §251.
Example There are a localizable measure space (X, Σ, µ) and a probability space (Y, T, ν) such that the c.l.d. product measure λ on X × Y is not localizable.

proof (a) Take (X, Σ, µ) to be the space of 216E, so that X = {0,1}^I, where I = 𝒫C for some set C of cardinal greater than 𝔠. For each γ ∈ C write E_γ for {x : x ∈ X, x({γ}) = 1} (that is, G_{{γ}} in the notation of 216Ec); then E_γ ∈ Σ and µE_γ = 1; also every measurable set of non-zero measure meets some E_γ in a set of non-zero measure, while E_γ ∩ E_δ is negligible for all distinct γ, δ (see 216Ee).
Let (Y, T, ν) be {0,1}^C with the usual measure (254J). For γ ∈ C, let F_γ be {y : y ∈ Y, y(γ) = 1}, so that νF_γ = ½.
Let λ be the c.l.d. product measure on X × Y, and Λ its domain.

(b) Consider the family 𝒲 = {E_γ × F_γ : γ ∈ C} ⊆ Λ. ?? Suppose, if possible, that V were an essential supremum of 𝒲 in Λ in the sense of 211G. For γ ∈ C write H_γ = {x : V[{x}] △ F_γ is negligible}. Because F_γ △ F_δ is non-negligible, H_γ ∩ H_δ = ∅ for all γ ≠ δ. Now E_γ \ H_γ is µ-negligible for every γ ∈ C. P P λ((E_γ × F_γ) \ V) = 0, so F_γ \ V[{x}] is negligible for almost every x ∈ E_γ, by 252D. On the other hand, if we set F′_γ = Y \ F_γ, W_γ = (X × Y) \ (E_γ × F′_γ), then we see that
(E_γ × F′_γ) ∩ (E_γ × F_γ) = ∅, E_γ × F_γ ⊆ W_γ,
λ((E_δ × F_δ) \ W_γ) = λ((E_γ × F′_γ) ∩ (E_δ × F_δ)) ≤ µ(E_γ ∩ E_δ) = 0
for every δ ≠ γ, so W_γ is an essential upper bound for 𝒲 and V ∩ (E_γ × F′_γ) = V \ W_γ must be λ-negligible. Accordingly V[{x}] \ F_γ = V[{x}] ∩ F′_γ is ν-negligible for µ-almost every x ∈ E_γ. But this means that V[{x}] △ F_γ is ν-negligible for µ-almost every x ∈ E_γ, that is, µ(E_γ \ H_γ) = 0. Q Q
Now consider the family ⟨E_γ ∩ H_γ⟩_{γ∈C}. This is a disjoint family of sets of finite measure in X. If E ∈ Σ has non-zero measure, there is a γ ∈ C such that µ(E_γ ∩ H_γ ∩ E) = µ(E_γ ∩ E) > 0. But this means that ℰ = {E_γ ∩ H_γ : γ ∈ C} satisfies the conditions of 213O, and µ must be strictly localizable; which it isn't. X X

(c) Thus we have found a family 𝒲 ⊆ Λ with no essential supremum in Λ, and λ is not localizable.

Remark If (X, Σ, µ) and (Y, T, ν) are any localizable measure spaces with a non-localizable c.l.d. product measure, then their c.l.d. versions are still localizable (213Hb) and still have a non-localizable product (251S), which cannot be strictly localizable; so that one of the factors is also not strictly localizable (251N). Thus any example of the type here must involve a complete locally determined localizable space which is not strictly localizable, as in 216E.

254V Corresponding to 251T and 251Wo, we have the following result on countable powers of atomless probability spaces.

Proposition Let (X, Σ, µ) be an atomless probability space and I a countable set. Let λ be the product probability measure on X^I. Then {x : x ∈ X^I, x is injective} is λ-conegligible.

proof For any pair {i, j} of distinct elements of I, the set {z : z ∈ X^{{i,j}}, z(i) = z(j)} is negligible for the product measure on X^{{i,j}}, by 251T. By 254Oa, {x : x ∈ X^I, x(i) = x(j)} is λ-negligible. Because I is countable, there are only countably many such pairs {i, j}, so
{x : x ∈ X^I, x(i) = x(j) for some distinct i, j ∈ I}
is negligible, and its complement is conegligible; but this complement is just the set of injective functions from I to X.

254X Basic exercises (a) Let ⟨(X_i, Σ_i, µ_i)⟩_{i∈I} be any family of probability spaces, with product (X, Λ, λ). Write ℰ for the family of subsets of X expressible as the union of a finite disjoint family of measurable cylinders. (i) Show that if C ⊆ X is a measurable cylinder then X \ C ∈ ℰ. (ii) Show that W ∩ V ∈ ℰ for all W, V ∈ ℰ. (iii) Show that X \ W ∈ ℰ for every W ∈ ℰ. (iv) Show that ℰ is an algebra of subsets of X. (v) Show that for any W ∈ Λ, ε > 0 there is a V ∈ ℰ such that λ(W △ V) ≤ ε^2. (vi) Show that for any W ∈ Λ, ε > 0 there are disjoint measurable cylinders C_0, …, C_n such that λ(W ∩ C_j) ≥ (1 − ε)λC_j for every j and λ(W \ ⋃_{j≤n} C_j) ≤ ε. (Hint: select the C_j from the measurable cylinders composing a set V as in (v).) (vii) Show that if f, g are λ-integrable functions and ∫_C f ≤ ∫_C g for every measurable cylinder C ⊆ X, then f ≤_{a.e.} g. (Hint: show that ∫_W f ≤ ∫_W g for every W ∈ Λ.)
> (b) Let $\langle (X_i, \Sigma_i, \mu_i) \rangle_{i \in I}$ be a family of probability spaces, with product $(X, \Lambda, \lambda)$. Show that the outer measure $\lambda^*$ defined by $\lambda$ is exactly the outer measure $\theta$ described in 254A, that is, that $\theta$ is a regular outer measure.
(c) Let $\langle (X_i, \Sigma_i, \mu_i) \rangle_{i \in I}$ be a family of probability spaces, with product $(X, \Lambda, \lambda)$. Write $\lambda_0$ for the restriction of $\lambda$ to $\widehat{\bigotimes}_{i \in I} \Sigma_i$, and $\mathcal{C}$ for the family of measurable cylinders in $X$. Suppose that $(Y, \mathrm{T}, \nu)$ is a probability space and $\phi : Y \to X$ a function. (i) Show that $\phi$ is inverse-measure-preserving when regarded as a function from $(Y, \mathrm{T}, \nu)$ to $(X, \widehat{\bigotimes}_{i \in I} \Sigma_i, \lambda_0)$ iff $\phi^{-1}[C]$ belongs to $\mathrm{T}$ and $\nu\phi^{-1}[C] = \lambda_0 C$ for every $C \in \mathcal{C}$. (ii) Show that $\lambda_0$ is the only measure on $X$ with this property. (Hint: 136C.)
> (d) Let $I$ be a set and $(Y, \mathrm{T}, \nu)$ a complete probability space. Show that a function $\phi : Y \to \{0, 1\}^I$ is inverse-measure-preserving for $\nu$ and the usual measure on $\{0, 1\}^I$ iff $\nu\{y : \phi(y)(i) = 1 \text{ for every } i \in J\} = 2^{-\#(J)}$ for every finite $J \subseteq I$.
> (e) Let $I$ be any set and $\lambda$ the usual measure on $X = \{0, 1\}^I$. Define addition on $X$ by setting $(x + y)(i) = x(i) +_2 y(i)$ for every $i \in I$, $x$, $y \in X$, where $0 +_2 0 = 1 +_2 1 = 0$, $0 +_2 1 = 1 +_2 0 = 1$. (i) Show that for any $y \in X$, the map $x \mapsto x + y : X \to X$ is inverse-measure-preserving. (Hint: Use 254G.) (ii) Show that the map $(x, y) \mapsto x + y : X \times X \to X$ is inverse-measure-preserving, if $X \times X$ is given its product measure.
> (f) Let $I$ be any set and $\lambda$ the usual measure on $\mathcal{P}I$. (i) Show that the map $a \mapsto a \triangle b : \mathcal{P}I \to \mathcal{P}I$ is inverse-measure-preserving for any $b \subseteq I$; in particular, $a \mapsto I \setminus a$ is inverse-measure-preserving. (ii) Show that the map $(a, b) \mapsto a \triangle b : \mathcal{P}I \times \mathcal{P}I \to \mathcal{P}I$ is inverse-measure-preserving.
> (g) Show that for any $q \in [0, 1]$ and any set $I$ there is a measure $\lambda$ on $\mathcal{P}I$ such that $\lambda\{a : J \subseteq a\} = q^{\#(J)}$ for every finite $J \subseteq I$.
> (h) Let $(Y, \mathrm{T}, \nu)$ be a complete probability space, and write $\mu$ for Lebesgue measure on $[0, 1]$. Suppose that $\phi : Y \to [0, 1]$ is a function such that $\nu\phi^{-1}[I]$ exists and is equal to $\mu I$ for every interval $I$ of the form $[2^{-n}k, 2^{-n}(k + 1)]$, where $n \in \mathbb{N}$ and $0 \le k < 2^n$. Show that $\phi$ is inverse-measure-preserving for $\nu$ and $\mu$.
(i) Let $\langle X_i \rangle_{i \in I}$ be a family of sets, and for each $i \in I$ let $\Sigma_i$ be a σ-algebra of subsets of $X_i$. Show that for every $E \in \widehat{\bigotimes}_{i \in I} \Sigma_i$ there is a countable set $J \subseteq I$ such that $E$ is expressible as $\pi_J^{-1}[F]$ for some $F \in \widehat{\bigotimes}_{i \in J} \Sigma_i$, writing $\pi_J(x) = x \restriction J \in \prod_{i \in J} X_i$ for $x \in \prod_{i \in I} X_i$.
(j) (i) Let $\nu$ be the usual measure on $X = \{0, 1\}^{\mathbb{N}}$. Show that for any $k \ge 1$, $(X, \nu)$ is isomorphic to $(X^k, \nu_k)$, where $\nu_k$ is the measure on $X^k$ which is the product measure obtained by giving each factor $X$ the measure $\nu$. (ii) Writing $\mu_{[0,1]}$ for Lebesgue measure on $[0, 1]$, etc., show that for any $k \ge 1$, $([0, 1]^k, \mu_{[0,1]^k})$ is isomorphic to $([0, 1], \mu_{[0,1]})$.
(k) (i) Writing $\mu_{[0,1]}$ for Lebesgue measure on $[0, 1]$, etc., show that $([0, 1], \mu_{[0,1]})$ is isomorphic to $([0, 1[, \mu_{[0,1[})$. (ii) Show that for any $k \ge 1$, $([0, 1[^k, \mu_{[0,1[^k})$ is isomorphic to $([0, 1[, \mu_{[0,1[})$. (iii) Show that for any $k \ge 1$, $(\mathbb{R}, \mu_{\mathbb{R}})$ is isomorphic to $(\mathbb{R}^k, \mu_{\mathbb{R}^k})$.
(l) Let $\mu$ be Lebesgue measure on $[0, 1]$ and $\lambda$ the product measure on $[0, 1]^{\mathbb{N}}$. Show that $([0, 1], \mu)$ and $([0, 1]^{\mathbb{N}}, \lambda)$ are isomorphic.
(m) Let $\langle (X_i, \Sigma_i, \mu_i) \rangle_{i \in I}$ be a family of complete probability spaces and $\lambda$ the product measure on $\prod_{i \in I} X_i$, with domain $\Lambda$. Suppose that $A_i \subseteq X_i$ for each $i \in I$. Show that $\prod_{i \in I} A_i \in \Lambda$ iff either (i) $\prod_{i \in I} \mu_i^* A_i = 0$ or (ii) $A_i \in \Sigma_i$ for every $i$ and $\{i : A_i \ne X_i\}$ is countable. (Hint: assemble ideas from 252Xb, 254F, 254L and 254N.)
(n) Let $\langle (X_i, \Sigma_i, \mu_i) \rangle_{i \in I}$ be a family of probability spaces with product $(X, \Lambda, \lambda)$. (i) Show that, for any $A \subseteq X$,
\[ \lambda^* A = \min\{\lambda_J^* \pi_J[A] : J \subseteq I \text{ is countable}\}, \]
where for $J \subseteq I$ I write $\lambda_J$ for the product probability measure on $X_J = \prod_{i \in J} X_i$ and $\pi_J : X \to X_J$ for the canonical map. (ii) Show that if $J$, $K \subseteq I$ are disjoint and $A$, $B \subseteq X$ are determined by coordinates in $J$, $K$ respectively, then $\lambda^*(A \cap B) = \lambda^* A \cdot \lambda^* B$.
(o) Let $\langle (X_i, \Sigma_i, \mu_i) \rangle_{i \in I}$ be a family of probability spaces with product $(X, \Lambda, \lambda)$. Let $S$ be the linear span of the set of characteristic functions of measurable cylinders in $X$, as in 254Q. Show that $\{f^\bullet : f \in S\}$ is dense in $L^p(\lambda)$ for every $p \in [1, \infty[$.
(p) Let $\langle (X_i, \Sigma_i, \mu_i) \rangle_{i \in I}$ be a family of probability spaces, and $(X, \Lambda, \lambda)$ their product; for $J \subseteq I$ let $\Lambda_J$ be the σ-algebra of members of $\Lambda$ determined by coordinates in $J$ and $P_J : L^1 = L^1(\lambda) \to L^1_J = L^1(\lambda \restriction \Lambda_J)$ the corresponding conditional expectation. (i) Show that if $u \in L^1_J$ and $v \in L^1_{I \setminus J}$ then $u \times v \in L^1$ and $\int u \times v = \int u \cdot \int v$. (Hint: 253D.) (ii) Show that if $u \in L^1$ then $u \in L^1_J$ iff $\int_C u = \lambda C \cdot \int u$ for every measurable cylinder $C \subseteq X$ which is determined by coordinates in $I \setminus J$. (Hint: 254Xa(vii).) (iii) Show that if $\mathcal{J} \subseteq \mathcal{P}I$ is non-empty, with $J^* = \bigcap \mathcal{J}$, then $L^1_{J^*} = \bigcap_{J \in \mathcal{J}} L^1_J$.
(q) (i) Let $I$ be any set and $\lambda$ the usual measure on $\mathcal{P}I$. Let $A \subseteq \mathcal{P}I$ be such that $a \triangle b \in A$ whenever $a \in A$ and $b \subseteq I$ is finite. Show that $\lambda^* A$ must be either 0 or 1. (ii) Let $\lambda$ be the usual measure on $\{0, 1\}^{\mathbb{N}}$, and $\Lambda$ its domain. Let $f : \{0, 1\}^{\mathbb{N}} \to \mathbb{R}$ be a function such that, for $x$, $y \in \{0, 1\}^{\mathbb{N}}$, $f(x) = f(y) \iff \{n : n \in \mathbb{N},\ x(n) \ne y(n)\}$ is finite. Show that $f$ is not $\Lambda$-measurable. (Hint: for any $q \in \mathbb{Q}$, $\lambda^*\{x : f(x) \le q\}$ is either 0 or 1.)
(r) Let $\langle X_i \rangle_{i \in I}$ be any family of sets and $A \subseteq B \subseteq \prod_{i \in I} X_i$. Suppose that $A$ is determined by coordinates in $J \subseteq I$ and that $B$ is determined by coordinates in $K$. Show that there is a set $C$ such that $A \subseteq C \subseteq B$ and $C$ is determined by coordinates in $J \cap K$.
254Y Further exercises (a) Let $\langle (X_i, \Sigma_i, \mu_i) \rangle_{i \in I}$ be a family of probability spaces, and for $J \subseteq I$ let $\lambda_J$ be the product measure on $X_J = \prod_{i \in J} X_i$; write $X = X_I$, $\lambda = \lambda_I$ and $\pi_J(x) = x \restriction J$ for $x \in X$ and $J \subseteq I$. (i) Show that for $K \subseteq J \subseteq I$ we have a natural linear, order-preserving and norm-preserving map $T_{JK} : L^1(\lambda_K) \to L^1(\lambda_J)$ defined by writing $T_{JK}(f^\bullet) = (f\pi_{KJ})^\bullet$ for every $\lambda_K$-integrable function $f$, where $\pi_{KJ}(y) = y \restriction K$ for $y \in X_J$. (ii) Write $\mathcal{K}$ for the set of finite subsets of $I$. Show that if $W$ is any Banach space and $\langle T_K \rangle_{K \in \mathcal{K}}$ is a family such that (α) $T_K$ is a bounded linear operator from $L^1(\lambda_K)$ to $W$ for every $K \in \mathcal{K}$ (β) $T_K = T_J T_{JK}$ whenever $K \subseteq J \in \mathcal{K}$ (γ) $\sup_{K \in \mathcal{K}} \|T_K\| < \infty$, then there is a unique bounded linear operator $T : L^1(\lambda) \to W$ such that $T_K = T T_{IK}$ for every $K \in \mathcal{K}$. (iii) Write $\mathcal{J}$ for the set of countable subsets of $I$. Show that $L^1(\lambda) = \bigcup_{J \in \mathcal{J}} T_{IJ}[L^1(\lambda_J)]$.
(b) Let $\langle (X_i, \Sigma_i, \mu_i) \rangle_{i \in I}$ be a family of probability spaces, and $\lambda$ a complete measure on $X = \prod_{i \in I} X_i$. Suppose that for every complete probability space $(Y, \mathrm{T}, \nu)$ and function $\phi : Y \to X$, $\phi$ is inverse-measure-preserving for $\nu$ and $\lambda$ iff $\nu\phi^{-1}[C]$ is defined and equal to $\theta_0 C$ for every measurable cylinder $C \subseteq X$, writing $\theta_0$ for the functional of 254A. Show that $\lambda$ is the product measure on $X$.
(c) Let $I$ be a set, and $\lambda$ the usual measure on $\{0, 1\}^I$. Show that $L^1(\lambda)$ is separable, in its norm topology, iff $I$ is countable.
(d) Let $I$ be a set, and $\lambda$ the usual measure on $\mathcal{P}I$. Show that if $\mathcal{F}$ is a non-principal ultrafilter on $I$ then $\lambda^* \mathcal{F} = 1$. (Hint: 254Xq, 254Xf.)
(e) Let $(X, \Sigma, \mu)$, $(Y, \mathrm{T}, \nu)$ and $\lambda$ be as in 254U. Set $A = \{f_\gamma : \gamma \in C\}$ as defined in 216E. Let $\mu_A$ be the subspace measure on $A$, and $\tilde{\lambda}$ the c.l.d. product measure of $\mu_A$ and $\nu$ on $A \times Y$. Show that $\tilde{\lambda}$ is a proper extension of the subspace measure $\lambda_{A \times Y}$. (Hint: consider $\tilde{W} = \{(f_\gamma, y) : \gamma \in C,\ y \in F_\gamma\}$, in the notation of 254U.)
(f) Let $(X, \Sigma, \mu)$ be an atomless probability space, $I$ a set with cardinal at most $\#(X)$, and $A$ the set of injective functions from $I$ to $X$. Show that $A$ has full outer measure for the product measure on $X^I$.
254 Notes and comments While there are many reasons for studying infinite products of probability spaces, one stands pre-eminent, from the point of view of abstract measure theory: they provide constructions of essentially new kinds of measure space. I cannot describe the nature of this ‘newness’ effectively without
venturing into the territory of Volume 3. But the function spaces of Chapter 24 do give at least a form of words we can use: these are the first probability spaces $(X, \Lambda, \lambda)$ we have seen for which $L^1(\lambda)$ need not be separable for its norm topology (254Yc).
The formulae of 254A, like those of 251A, lead very naturally to measures; the point at which they become more than a curiosity is when we find that the product measure $\lambda$ is a probability measure (254Fa), which must be regarded as the crucial argument of this section, just as 251E is the essential basis of §251. It is I think remarkable that it makes no difference to the result here whether $I$ is finite, countably infinite or uncountable. If you write out the proof for the case $I = \mathbb{N}$, it will seem natural to expand the sets $J_n$ until they are initial segments of $I$ itself, thereby avoiding altogether the auxiliary set $K$; but this is a misleading simplification, because it hides an essential feature of the argument, which is that any sequence in $\mathcal{C}$ involves only countably many coordinates, so that as long as we are dealing with only one such sequence the uncountability of the whole set $I$ is irrelevant. This general principle naturally permeates the whole of the section; in 254O I have tried to spell out the way in which many of the questions we are interested in can be expressed in terms of countable subproducts of the factor spaces $X_i$. See also the exercises 254Xi, 254Xm and 254Ya(iii).
There is a slightly paradoxical side to this principle: even the best-behaved subsets $E_i$ of $X_i$ may fail to have measurable products $\prod_{i \in I} E_i$ if $E_i \ne X_i$ for uncountably many $i$. For instance, $]0, 1[^I$ is not a measurable subset of $[0, 1]^I$ if $I$ is uncountable (254Xm). It has full outer measure and its own product measure is just the subspace measure (254L), but any measurable subset must have measure zero. The point is that the empty set is the only member of $\widehat{\bigotimes}_{i \in I} \Sigma_i$, where $\Sigma_i$ is the algebra of Lebesgue measurable subsets of $[0, 1]$ for each $i$, which is included in $]0, 1[^I$ (see 254Xi).
As in §251, I use a construction which automatically produces a complete measure on the product space. I am sure that this is the best choice for ‘the’ product measure. But there are occasions when its restriction to the σ-algebra generated by the measurable cylinders is worth looking at; see 254Xc.
Lemma 254G is a result of a type which will be commoner in Volume 3 than in the present volume. It describes the product measure in terms not of what it is but of what it does; specifically, in terms of a property of the associated family of inverse-measure-preserving functions. It is therefore a ‘universal mapping theorem’. (Compare 253F.) Because this description is sufficient to determine the product measure completely (254Yb), it is not surprising that I use it repeatedly.
The ‘usual measure’ on $\{0, 1\}^I$ (254J) is sometimes called ‘coin-tossing measure’ because it can be used to model the concept of tossing a coin arbitrarily many times indexed by the set $I$, taking an $x \in \{0, 1\}^I$ to represent the outcome in which the coin is ‘heads’ for just those $i \in I$ for which $x(i) = 1$. The sets, or ‘events’, in the class $\mathcal{C}$ are just those which can be specified by declaring the outcomes of finitely many tosses, and the probability of any particular sequence of $n$ results is $1/2^n$, regardless of which tosses we look at or in which order. In Chapter 27 I will return to the use of product measures to represent probabilities involving independent events.
In 254K I come to the first case in this treatise of a non-trivial isomorphism between two measure spaces. If you have been brought up on a conventional diet of modern abstract pure mathematics based on algebra and topology, you may already have been struck by the absence of emphasis on any concept of ‘homomorphism’ or ‘isomorphism’. Here indeed I start to speak of ‘isomorphisms’ between measure spaces without even troubling to define them; I hope it really is obvious that an isomorphism between measure spaces $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$ is a bijection $\phi : X \to Y$ such that $\mathrm{T} = \{F : F \subseteq Y,\ \phi^{-1}[F] \in \Sigma\}$ and $\nu F = \mu\phi^{-1}[F]$ for every $F \in \mathrm{T}$, so that $\Sigma$ is necessarily $\{E : E \subseteq X,\ \phi[E] \in \mathrm{T}\}$ and $\mu E = \nu\phi[E]$ for every $E \in \Sigma$. Put like this, you may, if you worked through the exercises of Volume 1, be reminded of some constructions of σ-algebras in 111Xc-111Xd and of the ‘image measures’ in 112E. The result in 254K (see also 134Yo) naturally leads to two distinct notions of ‘homomorphism’ between two measure spaces $(X, \Sigma, \mu)$ and $(Y, \mathrm{T}, \nu)$:
(i) a function $\phi : X \to Y$ such that $\phi^{-1}[F] \in \Sigma$ and $\mu\phi^{-1}[F] = \nu F$ for every $F \in \mathrm{T}$,
(ii) a function $\phi : X \to Y$ such that $\phi[E] \in \mathrm{T}$ and $\nu\phi[E] = \mu E$ for every $E \in \Sigma$.
On either definition, we find that a bijection $\phi : X \to Y$ is an isomorphism iff $\phi$ and $\phi^{-1}$ are both homomorphisms. (Also, of course, the composition of homomorphisms will be a homomorphism.) My own view is that (i) is the more important, and in this treatise I study such functions at length, calling them ‘inverse-measure-preserving’. But both have their uses. The function $\phi$ of 254K not only satisfies both definitions, but is also ‘nearly’ an isomorphism in several different ways, of which possibly the most
important is that there are conegligible sets $X' \subseteq \{0, 1\}^{\mathbb{N}}$, $Y' \subseteq [0, 1]$ such that $\phi \restriction X'$ is an isomorphism between $X'$ and $Y'$ when both are given their subspace measures.
Having once established the isomorphism between $[0, 1]$ and $\{0, 1\}^{\mathbb{N}}$, we are led immediately to many more; see 254Xj-254Xl. In fact Lebesgue measure on $[0, 1]$ is isomorphic to a large proportion of the probability spaces arising in applications. In Volumes 3 and 4 I will discuss these isomorphisms at length.
The general notion of ‘subproduct’ is associated with some of the deepest and most characteristic results in the theory of product measures. Because we are looking at products of arbitrary families of probability spaces, the definition must ignore any possible structure in the index set $I$ of 254A-254C. But many applications, naturally enough, deal with index sets with favoured subsets or partitions, and the first essential step is the ‘associative law’ (254N; compare 251Xd-251Xe). This is, for instance, the tool by which we can apply Fubini’s theorem within infinite products. The natural projection maps from $\prod_{i \in I} X_i$ to $\prod_{i \in J} X_i$, where $J \subseteq I$, are related in a way which has already been used as the basis of theorems in §235; the product measure on $\prod_{i \in J} X_i$ is precisely the image of the product measure on $\prod_{i \in I} X_i$ (254Oa). In 254O-254Q I explore the consequences of this fact and the fact already noted that all measurable sets in the product are ‘essentially’ determined by coordinates in some countable set.
In 254R I go more deeply into this notion of a set $W \subseteq \prod_{i \in I} X_i$ ‘determined by coordinates in’ a set $J \subseteq I$. In its primitive form this is a purely set-theoretic notion (254M, 254Ta). I think that even a three-element set $I$ can give us surprises; I invite you to try to visualize subsets of $[0, 1]^3$ which are determined by pairs of coordinates. But the interactions of this with measure-theoretic ideas, and in particular with a willingness to add or discard negligible sets, lead to much more, and in particular to the unique minimal sets of coordinates associated with measurable sets and functions (254R). Of course these results can be elegantly and effectively described in terms of $L^1$ and $L^0$ spaces, in which negligible sets are swept out of sight as the spaces are constructed. The basis of all this is the fact that the conditional expectation operators associated with subproducts multiply together in the simplest possible way (254Ra); but some further idea is needed to show that if $\mathcal{J}$ is a non-empty family of subsets of $I$, then $L^0_{\bigcap \mathcal{J}} = \bigcap_{J \in \mathcal{J}} L^0_J$ (see part (b) of the proof of 254R, and 254Xp(iii)).
254Sa is a version of the ‘zero-one law’ (272O below). 254Sb is a strong version of the principle that measurable sets in a product must be approximable by sets determined by a finite set of coordinates (254Fe, 254Qa, 254Xa). Evidently it is not a coincidence that the set $W$ of 254Tb is negligible. In §272 I will revisit many of the ideas of 254R-254S and 254Xp, in particular, in the more general context of ‘independent σ-algebras’.
Finally, 254U and 254Ye hardly belong to this section at all; they are unfinished business from §251. They are here because the construction of 254A-254C is the simplest way to produce an adequately complex probability space $(Y, \mathrm{T}, \nu)$.
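The remarks above on coin-tossing measure and on the isomorphism between $\{0, 1\}^{\mathbb{N}}$ and $[0, 1]$ can be illustrated by a small exact computation. The following Python sketch is an illustration only. It assumes that the map of 254K is the binary-expansion map $\phi(x) = \sum_i 2^{-i-1} x(i)$ (254K itself is not reproduced in this section), truncates it to the first $n$ coordinates, and uses half-open dyadic intervals so that the counts are exact; the helper names `phi` and `cylinder_measure` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def phi(bits):
    """Truncated binary-expansion map: sum of bits[i] * 2^-(i+1).

    Assumption: this is the finite analogue of the map phi of 254K.
    """
    return sum(Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits))

def cylinder_measure(n, m, k):
    """Coin-tossing measure of {x in {0,1}^n : k/2^m <= phi(x) < (k+1)/2^m}.

    Each of the 2^n strings of length n has probability 2^-n, so the
    measure is simply a count divided by 2^n.
    """
    lo, hi = Fraction(k, 2 ** m), Fraction(k + 1, 2 ** m)
    hits = sum(1 for bits in product((0, 1), repeat=n) if lo <= phi(bits) < hi)
    return Fraction(hits, 2 ** n)

# Measure of the inverse image of every dyadic interval of level m <= 3,
# computed with n = 8 coin tosses.
results = {(m, k): cylinder_measure(8, m, k)
           for m in range(4) for k in range(2 ** m)}
print(results[(3, 5)])
```

In this finite model every dyadic interval of length $2^{-m}$ has inverse image of measure exactly $2^{-m}$, which is the condition appearing in 254Xh.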
255 Convolutions of functions
I devote a section to a construction which is of great importance (and will in particular be very useful in Chapters 27 and 28) and may also be regarded as a series of exercises on the work so far.
I find it difficult to know how much repetition to indulge in in this section, because the natural unified expression of the ideas is in the theory of topological groups, and I do not think we are yet ready for the general theory (I will come to it in Chapter 44 in Volume 4). The groups we need for this volume are
$\mathbb{R}$;
$\mathbb{R}^r$, for $r \ge 2$;
$S^1 = \{z : z \in \mathbb{C},\ |z| = 1\}$, the ‘circle group’;
$\mathbb{Z}$, the group of integers.
All the ideas already appear in the theory of convolutions on $\mathbb{R}$, and I will therefore present this material in relatively detailed form, before sketching the forms appropriate to the groups $\mathbb{R}^r$ and $S^1$ (or $]-\pi, \pi]$); $\mathbb{Z}$ can I think be safely left to the exercises.
255A This being a book on measure theory, it is perhaps appropriate for me to emphasize, as the basis of the theory of convolutions, certain measure space isomorphisms.
Theorem Let $\mu$ be Lebesgue measure on $\mathbb{R}$ and $\mu_2$ Lebesgue measure on $\mathbb{R}^2$; write $\Sigma$, $\Sigma_2$ for their domains.
(a) For any $a \in \mathbb{R}$, the map $x \mapsto a + x : \mathbb{R} \to \mathbb{R}$ is a measure space automorphism of $(\mathbb{R}, \Sigma, \mu)$.
(b) The map $x \mapsto -x : \mathbb{R} \to \mathbb{R}$ is a measure space automorphism of $(\mathbb{R}, \Sigma, \mu)$.
(c) For any $a \in \mathbb{R}$, the map $x \mapsto a - x : \mathbb{R} \to \mathbb{R}$ is a measure space automorphism of $(\mathbb{R}, \Sigma, \mu)$.
(d) The map $(x, y) \mapsto (x + y, y) : \mathbb{R}^2 \to \mathbb{R}^2$ is a measure space automorphism of $(\mathbb{R}^2, \Sigma_2, \mu_2)$.
(e) The map $(x, y) \mapsto (x - y, y) : \mathbb{R}^2 \to \mathbb{R}^2$ is a measure space automorphism of $(\mathbb{R}^2, \Sigma_2, \mu_2)$.
Remark I ought to remark that (b), (d) and (e) may be regarded as simple special cases of Theorem 263A in the next chapter. I nevertheless feel that it is worth writing out separate proofs here, partly because the general case of linear operators dealt with in 263A requires some extra machinery not needed here, but more because the result here has nothing to do with the linear structure of $\mathbb{R}$ and $\mathbb{R}^2$; it is exclusively dependent on the group structure of $\mathbb{R}$, together with the links between its topology and measure, and the arguments I give now are adaptable to the proper generalizations to abelian topological groups.
proof (a) This is just the translation-invariance of Lebesgue measure, dealt with in §134. There I showed that if $E \in \Sigma$ then $E + a \in \Sigma$ and $\mu(E + a) = \mu E$ (134Ab); that is, writing $\phi(x) = x + a$, $\mu(\phi[E])$ exists and is equal to $\mu E$ for every $E \in \Sigma$. But of course we also have $\mu(\phi^{-1}[E]) = \mu(E + (-a)) = \mu E$ for every $E \in \Sigma$, so $\phi$ is an automorphism.
(b) The point is that $\mu^*(A) = \mu^*(-A)$ for every $A \subseteq \mathbb{R}$. P P (I follow the definitions of Volume 1.) If $\epsilon > 0$, there is a sequence $\langle I_n \rangle_{n \in \mathbb{N}}$ of half-open intervals covering $A$ with $\sum_{n=0}^\infty \mu I_n \le \mu^* A + \epsilon$. Now $-A \subseteq \bigcup_{n \in \mathbb{N}} (-I_n)$. But if $I_n = [a_n, b_n[$ then $-I_n = \,]-b_n, -a_n]$, so
\[ \mu^*(-A) \le \sum_{n=0}^\infty \mu(-I_n) = \sum_{n=0}^\infty \max(0, -a_n - (-b_n)) = \sum_{n=0}^\infty \mu I_n \le \mu^* A + \epsilon. \]
As $\epsilon$ is arbitrary, $\mu^*(-A) \le \mu^* A$. Also of course $\mu^* A = \mu^*(-(-A)) \le \mu^*(-A)$, so $\mu^*(-A) = \mu^* A$. Q Q
This means that, setting $\phi(x) = -x$ this time, $\phi$ is an automorphism of the structure $(\mathbb{R}, \mu^*)$. But since $\mu$ is defined from $\mu^*$ by the abstract procedure of Carathéodory’s method, $\phi$ must also be an automorphism of the structure $(\mathbb{R}, \Sigma, \mu)$.
(c) Put (a) and (b) together; $x \mapsto a - x$ is the composition of the automorphisms $x \mapsto -x$ and $x \mapsto a + x$, and the composition of automorphisms is surely an automorphism.
(d)(i) Write $\mathrm{T}$ for the set $\{E : E \in \Sigma_2,\ \phi[E] \in \Sigma_2\}$, where this time $\phi(x, y) = (x + y, y)$ for $x$, $y \in \mathbb{R}$, so that $\phi : \mathbb{R}^2 \to \mathbb{R}^2$ is a bijection. Then $\mathrm{T}$ is a σ-algebra, being the intersection of the σ-algebras $\Sigma_2$ and $\{E : \phi[E] \in \Sigma_2\} = \{\phi^{-1}[F] : F \in \Sigma_2\}$. Moreover, $\mu_2 E = \mu_2(\phi[E])$ for every $E \in \mathrm{T}$. P P By 252D, we have
\[ \mu_2 E = \int \mu\{x : (x, y) \in E\}\,\mu(dy). \]
But applying the same result to $\phi[E]$ we have
\[ \begin{aligned} \mu_2 \phi[E] &= \int \mu\{x : (x, y) \in \phi[E]\}\,\mu(dy) = \int \mu\{x : (x - y, y) \in E\}\,\mu(dy) \\ &= \int \mu(E^{-1}[\{y\}] + y)\,\mu(dy) = \int \mu E^{-1}[\{y\}]\,\mu(dy) \end{aligned} \]
(because Lebesgue measure is translation-invariant)
\[ = \mu_2 E. \]
Q Q
(ii) Now $\phi$ and $\phi^{-1}$ are clearly continuous, so that $\phi[G]$ is open, and therefore measurable, for every open $G$; consequently all open sets must belong to $\mathrm{T}$. Because $\mathrm{T}$ is a σ-algebra, it contains all Borel sets. Now let $E$ be any measurable set. Then there are Borel sets $H_1$, $H_2$ such that $H_1 \subseteq E \subseteq H_2$ and $\mu_2(H_2 \setminus H_1) = 0$ (134Fb). We have $\phi[H_1] \subseteq \phi[E] \subseteq \phi[H_2]$ and
\[ \mu_2(\phi[H_2] \setminus \phi[H_1]) = \mu_2 \phi[H_2 \setminus H_1] = \mu_2(H_2 \setminus H_1) = 0. \]
Thus $\phi[E] \setminus \phi[H_1]$ must be negligible, therefore measurable, and $\phi[E] = \phi[H_1] \cup (\phi[E] \setminus \phi[H_1])$ is measurable.
This shows that $\phi[E]$ is measurable whenever $E$ is. But now observe that $\mathrm{T}$ can also be expressed as $\{E : E \in \Sigma_2,\ \phi^{-1}[E] \in \Sigma_2\}$, so that we can apply the same argument with $\phi^{-1}$ in the place of $\phi$ to see that $\phi^{-1}[E]$ is measurable whenever $E$ is. So $\phi$ is an automorphism of the structure $(\mathbb{R}^2, \Sigma_2)$, and therefore (by (i) again) of $(\mathbb{R}^2, \Sigma_2, \mu_2)$.
(e) Of course this is an immediate corollary either of the proof of (d) or of (d) itself as stated, since $(x, y) \mapsto (x - y, y)$ is just the inverse of $(x, y) \mapsto (x + y, y)$.
255B Corollary (a) If $a \in \mathbb{R}$, then for any complex-valued function $f$ defined on a subset of $\mathbb{R}$
\[ \int f(x)\,dx = \int f(a + x)\,dx = \int f(-x)\,dx = \int f(a - x)\,dx \]
in the sense that if one of the integrals exists so do the others, and they are then all equal.
(b) If $f$ is a complex-valued function defined on a subset of $\mathbb{R}^2$, then
\[ \int f(x + y, y)\,d(x, y) = \int f(x - y, y)\,d(x, y) = \int f(x, y)\,d(x, y) \]
in the sense that if one of the integrals exists and is finite so do the others, and they are then all equal.
255C Remarks (a) I am not sure whether it ought to be ‘obvious’ that if $(X, \Sigma, \mu)$, $(Y, \mathrm{T}, \nu)$ are measure spaces and $\phi : X \to Y$ is an isomorphism, then for any function $f$ defined on a subset of $Y$
\[ \int f(\phi(x))\,\mu(dx) = \int f(y)\,\nu(dy) \]
in the sense that if one is defined so is the other, and they are then equal. If it is obvious then the obviousness must be contingent on the nature of the definition of integration: integrability with respect to the measure $\mu$ is something which depends on the structure $(X, \Sigma, \mu)$ and on no other properties of $X$. If it is not obvious then it is an easy deduction from Theorem 235A above, applied in turn to $\phi$ and $\phi^{-1}$ and to the real and imaginary parts of $f$. In any case the isomorphisms of 255A are just those needed to prove 255B.
(b) Note that in 255Bb I write $\int f(x, y)\,d(x, y)$ to emphasize that I am considering the integral of $f$ with respect to two-dimensional Lebesgue measure. The fact that
\[ \int \Bigl( \int f(x, y)\,dx \Bigr) dy = \int \Bigl( \int f(x + y, y)\,dx \Bigr) dy = \int \Bigl( \int f(x - y, y)\,dx \Bigr) dy \]
is actually easier, being an immediate consequence of the equality $\int f(a + x)\,dx = \int f(x)\,dx$. But applications of this result often depend essentially on the fact that the functions $(x, y) \mapsto f(x + y, y)$, $(x, y) \mapsto f(x - y, y)$ are measurable as functions of two variables.
(c) I have moved directly to complex-valued functions because these are necessary for the applications in Chapter 28. If however they give you any discomfort, either technically or aesthetically, all the measure-theoretic ideas of this section are already to be found in the real case, and you may wish at first to read it as if only real numbers were involved.
255D A further corollary of 255A will be useful.
Corollary Let $f$ be a complex-valued function defined on a subset of $\mathbb{R}$.
(a) If $f$ is measurable, then the functions $(x, y) \mapsto f(x + y)$, $(x, y) \mapsto f(x - y)$ are measurable.
(b) If $f$ is defined almost everywhere on $\mathbb{R}$, then the functions $(x, y) \mapsto f(x + y)$, $(x, y) \mapsto f(x - y)$ are defined almost everywhere on $\mathbb{R}^2$.
proof Writing $g_1(x, y) = f(x + y)$, $g_2(x, y) = f(x - y)$ whenever these are defined, we have
\[ g_1(x, y) = (f \otimes 1)(\phi(x, y)), \qquad g_2(x, y) = (f \otimes 1)(\phi^{-1}(x, y)), \]
writing $\phi(x, y) = (x + y, y)$ as in 255A(d)-(e), and $(f \otimes 1)(x, y) = f(x)$, following the notation of 253B. By 253C, $f \otimes 1$ is measurable if $f$ is, and defined almost everywhere if $f$ is. Because $\phi$ is a measure space automorphism, $(f \otimes 1)\phi = g_1$ and $(f \otimes 1)\phi^{-1} = g_2$ are measurable, or defined almost everywhere, if $f$ is.
255E The basic formula Let $f$ and $g$ be measurable complex-valued functions defined almost everywhere in $\mathbb{R}$. Write $f * g$ for the function defined by the formula
\[ (f * g)(x) = \int f(x - y)g(y)\,dy \]
whenever the integral exists (with respect to Lebesgue measure, naturally) as a complex number. Then $f * g$ is the convolution of the functions $f$ and $g$.
Observe that $\operatorname{dom}(|f| * |g|) = \operatorname{dom}(f * g)$, and that $|f * g| \le |f| * |g|$ everywhere on their common domain, for all $f$ and $g$.
Remark Note that I am here prepared to contemplate the convolution of $f$ and $g$ for arbitrary members of $\mathcal{L}^0_{\mathbb{C}}$, the space of almost-everywhere-defined measurable complex-valued functions, even though the domain of $f * g$ may be empty.
255F Basic properties (a) Because integration is linear, we surely have
\[ ((f_1 + f_2) * g)(x) = (f_1 * g)(x) + (f_2 * g)(x), \]
\[ (f * (g_1 + g_2))(x) = (f * g_1)(x) + (f * g_2)(x), \]
\[ (cf * g)(x) = (f * cg)(x) = c(f * g)(x) \]
whenever the right-hand sides of the formulae are defined.
(b) If $f$, $g$ are measurable complex-valued functions defined almost everywhere in $\mathbb{R}$, then $f * g = g * f$, in the strict sense that they have the same domain and the same value at each point of that common domain. P P Take $x \in \mathbb{R}$ and apply 255Ba to see that
\[ (f * g)(x) = \int f(x - y)g(y)\,dy = \int f(x - (x - y))g(x - y)\,dy = \int f(y)g(x - y)\,dy = (g * f)(x) \]
if either is defined. Q Q
(c) If $f_1$, $f_2$, $g_1$, $g_2$ are measurable complex-valued functions defined almost everywhere in $\mathbb{R}$, and $f_1 =_{\text{a.e.}} f_2$ and $g_1 =_{\text{a.e.}} g_2$, then for every $x \in \mathbb{R}$ we shall have $f_1(x - y) = f_2(x - y)$ for almost every $y \in \mathbb{R}$, by 255Ac. Consequently $f_1(x - y)g_1(y) = f_2(x - y)g_2(y)$ for almost every $y$, and $(f_1 * g_1)(x) = (f_2 * g_2)(x)$ in the sense that if one of these is defined so is the other, and they are then equal. Accordingly we may regard convolution as a binary operator on $L^0_{\mathbb{C}}$; if $u$, $v \in L^0_{\mathbb{C}}$, we can define $u * v$ as being equal to $f * g$ whenever $f^\bullet = u$ and $g^\bullet = v$. We need to remember, of course, that for general $u$, $v \in L^0_{\mathbb{C}}$ the domain of $u * v$ may vanish.
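As an informal numerical check of 255Fa and 255Fb (an illustration only, not part of the argument), the following Python sketch approximates $(f * g)(x)$ by a midpoint Riemann sum. The test functions `gauss` and `box`, the integration window and the helper `convolve` are all hypothetical choices made for this example; nothing here addresses the questions of domain and measurability treated in the text.

```python
import math

def convolve(f, g, x, lo=-10.0, hi=10.0, n=4000):
    """Midpoint-rule approximation to (f*g)(x) = integral of f(x-y) g(y) dy.

    The window [lo, hi] and the number n of cells are illustrative choices;
    the integrand is assumed to be negligible outside the window.
    """
    step = (hi - lo) / n
    total = 0.0
    for k in range(n):
        y = lo + (k + 0.5) * step
        total += f(x - y) * g(y)
    return total * step

def gauss(t):
    return math.exp(-t * t)

def box(t):
    # indicator function of the interval [0, 1]
    return 1.0 if 0.0 <= t <= 1.0 else 0.0

x = 0.3
fg = convolve(gauss, box, x)
gf = convolve(box, gauss, x)                      # 255Fb: f*g = g*f
lin_lhs = convolve(lambda t: gauss(t) + 2.0 * box(t), box, x)
lin_rhs = fg + 2.0 * convolve(box, box, x)        # 255Fa: linearity in f
print(fg, gf, lin_lhs, lin_rhs)
```

For these particular functions the value can also be written in closed form as $\tfrac{\sqrt{\pi}}{2}(\operatorname{erf}(0.7) + \operatorname{erf}(0.3))$, which gives an independent check on the approximation.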
255G I have grouped 255Fa-255Fc together because they depend only on ideas up to and including 255Ac, 255Ba. Using the second halves of 255A and 255B we get much deeper. I begin with what seems to be the fundamental result.
Theorem Let $f$, $g$ and $h$ be measurable complex-valued functions defined almost everywhere in $\mathbb{R}$.
(a)
\[ \int h(x)(f * g)(x)\,dx = \int h(x + y)f(x)g(y)\,d(x, y) \]
whenever the right-hand side exists in $\mathbb{C}$, provided that in the expression $h(x)(f * g)(x)$ we interpret the product as 0 if $h(x) = 0$ and $(f * g)(x)$ is undefined.
(b) If, on the same interpretation of $|h(x)|(|f| * |g|)(x)$, the integral $\int |h(x)|(|f| * |g|)(x)\,dx$ is finite, then $\int h(x + y)f(x)g(y)\,d(x, y)$ exists in $\mathbb{C}$, so again we shall have
\[ \begin{aligned} \int h(x)(f * g)(x)\,dx &= \int h(x + y)f(x)g(y)\,d(x, y) \\ &= \iint h(x + y)f(x)g(y)\,dx\,dy = \iint h(x + y)f(x)g(y)\,dy\,dx. \end{aligned} \]
proof Consider the functions
\[ k_1(x, y) = h(x)f(x - y)g(y), \qquad k_2(x, y) = h(x + y)f(x)g(y) \]
wherever these are defined. 255D tells us that $k_1$ and $k_2$ are measurable and defined almost everywhere. Now setting $\phi(x, y) = (x + y, y)$, we have $k_2 = k_1\phi$, so that
\[ \int k_1(x, y)\,d(x, y) = \int k_2(x, y)\,d(x, y) \]
if either exists, by 255Bb.
If
\[ \int h(x + y)f(x)g(y)\,d(x, y) = \int k_2 \]
exists, then by Fubini’s theorem we have
\[ \int k_2 = \int k_1(x, y)\,d(x, y) = \int \Bigl( \int h(x)f(x - y)g(y)\,dy \Bigr) dx \]
so $\int h(x)f(x - y)g(y)\,dy$ exists almost everywhere, that is, $(f * g)(x)$ exists for almost every $x$ such that $h(x) \ne 0$; on the interpretation I am using here, $h(x)(f * g)(x)$ exists almost everywhere, and
\[ \begin{aligned} \int h(x)(f * g)(x)\,dx &= \int \Bigl( \int h(x)f(x - y)g(y)\,dy \Bigr) dx = \int k_1 \\ &= \int k_2 = \int h(x + y)f(x)g(y)\,d(x, y). \end{aligned} \]
If (on the same interpretation) $|h| \times (|f| * |g|)$ is integrable, $|k_1(x, y)| = |h(x)||f(x - y)||g(y)|$ is measurable, and
\[ \iint |h(x)||f(x - y)||g(y)|\,dy\,dx = \int |h(x)|(|f| * |g|)(x)\,dx \]
is finite, so by Tonelli’s theorem (252G, 252H) $k_1$ and $k_2$ are integrable, and once again
\[ \begin{aligned} \int h(x)(f * g)(x)\,dx &= \int h(x + y)f(x)g(y)\,d(x, y) \\ &= \iint h(x + y)f(x)g(y)\,dx\,dy = \iint h(x + y)f(x)g(y)\,dy\,dx. \end{aligned} \]
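The identity of 255Ga can be tried out numerically on a smooth example. The sketch below is only an illustration under stated assumptions: it takes the hypothetical choices $f(t) = g(t) = e^{-t^2}$ and $h(t) = \cos t$, for which both sides equal $\pi e^{-1/2}$ (since $\int e^{ix}e^{-x^2}dx = \sqrt{\pi}e^{-1/4}$), and approximates all integrals by midpoint sums on a truncated window.

```python
import math

# Integration window and resolution: illustrative choices only.
LO, HI, N = -6.0, 6.0, 200
STEP = (HI - LO) / N
GRID = [LO + (k + 0.5) * STEP for k in range(N)]   # midpoints of the cells

def f(t):
    return math.exp(-t * t)

def g(t):
    return math.exp(-t * t)

def h(t):
    return math.cos(t)

def conv(x):
    """Midpoint approximation to (f*g)(x)."""
    return STEP * sum(f(x - y) * g(y) for y in GRID)

# Left-hand side of 255Ga: integral of h(x) (f*g)(x) dx.
lhs = STEP * sum(h(x) * conv(x) for x in GRID)

# Right-hand side: integral of h(x+y) f(x) g(y) over the plane.
rhs = STEP * STEP * sum(h(x + y) * f(x) * g(y) for x in GRID for y in GRID)

exact = math.pi * math.exp(-0.5)
print(lhs, rhs, exact)
```

The two sums correspond to integrating $k_1$ and $k_2$ of the proof; the change of variable $\phi(x, y) = (x + y, y)$ is what identifies them.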
255H Certain standard results are now easy.
Corollary If $f$, $g$ are complex-valued functions which are integrable over $\mathbb{R}$, then $f * g$ is integrable, with
\[ \int f * g = \int f \int g, \qquad \int |f * g| \le \int |f| \int |g|. \]
proof In 255G, set $h(x) = 1$ for every $x \in \mathbb{R}$; then
\[ \int h(x + y)f(x)g(y)\,d(x, y) = \int f(x)g(y)\,d(x, y) = \int f \int g \]
by 253D, so
\[ \int f * g = \int h(x)(f * g)(x)\,dx = \int h(x + y)f(x)g(y)\,d(x, y) = \int f \int g, \]
as claimed. Now
\[ \int |f * g| \le \int |f| * |g| = \int |f| \int |g|. \]
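Both formulae of 255H can be checked numerically on an example in which $f$ changes sign, so that the inequality is strict. The following Python sketch is an illustration only; the functions $f(t) = (1 + t)e^{-t^2}$ and $g(t) = e^{-t^2}$, the window and the helper `integral` are hypothetical choices. For these functions $\int f = \int g = \sqrt{\pi}$, so $\int f * g$ should be close to $\pi$.

```python
import math

# Integration window and resolution: illustrative choices only.
LO, HI, N = -8.0, 8.0, 320
STEP = (HI - LO) / N
GRID = [LO + (k + 0.5) * STEP for k in range(N)]   # midpoints of the cells

def f(t):
    return (1.0 + t) * math.exp(-t * t)   # changes sign at t = -1

def g(t):
    return math.exp(-t * t)

def integral(values):
    """Midpoint sum of sampled values over GRID."""
    return STEP * sum(values)

# Samples of f*g on the grid.
conv = [integral(f(x - y) * g(y) for y in GRID) for x in GRID]

int_f = integral(f(t) for t in GRID)
int_g = integral(g(t) for t in GRID)
int_abs_f = integral(abs(f(t)) for t in GRID)
int_conv = integral(conv)
int_abs_conv = integral(abs(c) for c in conv)

print(int_conv, int_f * int_g)          # equality of 255H
print(int_abs_conv, int_abs_f * int_g)  # inequality of 255H (g is non-negative)
```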
255I Corollary For any measurable complex-valued functions $f$, $g$ defined almost everywhere in $\mathbb{R}$, $f * g$ is measurable and has measurable domain.
proof Set $f_n(x) = f(x)$ if $x \in \operatorname{dom} f$, $|x| \le n$, $|f(x)| \le n$, and 0 elsewhere in $\mathbb{R}$; define $g_n$ similarly from $g$. Then $f_n$ and $g_n$ are integrable, $|f_n| \le |f|$ and $|g_n| \le |g|$ almost everywhere, and $f = \lim_{n \to \infty} f_n$, $g = \lim_{n \to \infty} g_n$. Consequently, by Lebesgue’s Dominated Convergence Theorem,
\[ \begin{aligned} (f * g)(x) &= \int f(x - y)g(y)\,dy = \int \lim_{n \to \infty} f_n(x - y)g_n(y)\,dy \\ &= \lim_{n \to \infty} \int f_n(x - y)g_n(y)\,dy = \lim_{n \to \infty} (f_n * g_n)(x) \end{aligned} \]
for every $x \in \operatorname{dom} f * g$. But $f_n * g_n$ is integrable, therefore measurable, for every $n$, so that $f * g$ must be measurable.
As for the domain of $f * g$,
\[ \begin{aligned} x \in \operatorname{dom}(f * g) &\iff \int f(x - y)g(y)\,dy \text{ is defined in } \mathbb{C} \\ &\iff \int |f(x - y)||g(y)|\,dy \text{ is defined in } \mathbb{R} \\ &\iff \int |f_n(x - y)||g_n(y)|\,dy \text{ is defined in } \mathbb{R} \text{ for every } n \\ &\qquad\quad \text{and } \sup_{n \in \mathbb{N}} \int |f_n(x - y)||g_n(y)|\,dy < \infty. \end{aligned} \]
Because every $|f_n| * |g_n|$ is integrable, therefore measurable and with measurable domain,
\[ \operatorname{dom}(f * g) = \{x : x \in \bigcap_{n \in \mathbb{N}} \operatorname{dom}(|f_n| * |g_n|),\ \sup_{n \in \mathbb{N}} (|f_n| * |g_n|)(x) < \infty\} \]
is measurable.
255J Theorem Let $f$, $g$, $h$ be complex-valued measurable functions defined almost everywhere in $\mathbb{R}$, such that $f * g$ and $g * h$ are also defined a.e. Suppose that $x \in \mathbb{R}$ is such that one of $(|f| * (|g| * |h|))(x)$, $((|f| * |g|) * |h|)(x)$ is defined in $\mathbb{R}$. Then $f * (g * h)$ and $(f * g) * h$ are defined and equal at $x$.
proof Set $k(y) = f(x - y)$ when this is defined, so that $k$ is measurable and defined almost everywhere. Now
\[ (|f| * (|g| * |h|))(x) = \int |f(x - y)|(|g| * |h|)(y)\,dy = \iint |k(y)||g(y - z)||h(z)|\,dz\,dy, \]
\[ \begin{aligned} ((|f| * |g|) * |h|)(x) &= \int (|f| * |g|)(x - y)|h(y)|\,dy = \iint |f(x - y - z)||g(z)||h(y)|\,dz\,dy \\ &= \iint |k(y + z)||g(z)||h(y)|\,dz\,dy = \iint |k(y + z)||g(y)||h(z)|\,dy\,dz. \end{aligned} \]
So if either of these is finite, the conditions of 255Gb are satisfied, with $k$, $g$, $h$ in the place of $h$, $f$ and $g$, and
\[ \int k(y)(g * h)(y)\,dy = \int k(y + z)g(y)h(z)\,d(y, z), \]
that is,
\[ \begin{aligned} (f * (g * h))(x) &= \int f(x - y)(g * h)(y)\,dy = \int k(y)(g * h)(y)\,dy \\ &= \int k(y + z)g(y)h(z)\,d(y, z) = \iint k(y + z)g(y)h(z)\,dy\,dz \\ &= \iint f(x - y - z)g(y)h(z)\,dy\,dz = \int (f * g)(x - z)h(z)\,dz \\ &= ((f * g) * h)(x). \end{aligned} \]
255K I do not think we shall need an exhaustive discussion of the question of just when $(f * g)(x)$ is defined; this seems to be complicated. However there is a fundamental case in which we can be sure that $(f * g)(x)$ is defined everywhere.
Proposition Suppose that $f$, $g$ are measurable complex-valued functions defined almost everywhere in $\mathbb{R}$, and that $f \in \mathcal{L}^p_{\mathbb{C}}$, $g \in \mathcal{L}^q_{\mathbb{C}}$ where $p, q \in [1, \infty]$ and $\frac1p + \frac1q = 1$ (writing $\frac{1}{\infty} = 0$ as usual). Then $f * g$ is defined everywhere in $\mathbb{R}$, is uniformly continuous, and $\sup_{x\in\mathbb{R}} |(f * g)(x)| \le \|f\|_p \|g\|_q$.

proof (a) (For an introduction to $\mathcal{L}^p$ spaces, see §244.) For any $x \in \mathbb{R}$, the function $f_x$, defined by setting $f_x(y) = f(x - y)$ whenever $x - y \in \operatorname{dom} f$, must also belong to $\mathcal{L}^p$, because $f_x = f\phi$ for an automorphism $\phi$ of the measure space. Consequently $(f * g)(x) = \int f_x \times g$ is defined, and of modulus at most $\|f\|_p \|g\|_q$, by 243Fa/243K and 244Eb/244Ob.

(b) To see that $f * g$ is uniformly continuous, argue as follows. Suppose first that $p < \infty$. Let $\epsilon > 0$. Let $\eta > 0$ be such that $\eta(2 + 2^{1/p})\|g\|_q \le \epsilon$. Then there is a bounded continuous function $h : \mathbb{R} \to \mathbb{C}$ such that $\{x : h(x) \ne 0\}$ is bounded and $\|f - h\|_p \le \eta$ (244H, 244Ob); let $M \ge 1$ be such that $h(x) = 0$ whenever $|x| \ge M - 1$. Next, $h$ is uniformly continuous, so there is a $\delta \in \,]0, 1]$ such that $|h(x) - h(x')| \le \eta M^{-1/p}$ whenever $|x - x'| \le \delta$. Suppose that $|x - x'| \le \delta$. Defining $h_x(y) = h(x - y)$, as before, we have

$$\begin{aligned}
\int |h_x - h_{x'}|^p &= \int |h(x-y) - h(x'-y)|^p\,dy = \int |h(t) - h(x'-x+t)|^p\,dt \\
&\qquad\text{(substituting } t = x - y\text{)} \\
&= \int_{-M}^{M} |h(t) - h(x'-x+t)|^p\,dt \\
&\qquad\text{(because } h(t) = h(x'-x+t) = 0 \text{ if } |t| \ge M\text{)} \\
&\le 2M(\eta M^{-1/p})^p \\
&\qquad\text{(because } |h(t) - h(x'-x+t)| \le \eta M^{-1/p} \text{ for every } t\text{)} \\
&= 2\eta^p.
\end{aligned}$$

So $\|h_x - h_{x'}\|_p \le 2^{1/p}\eta$. On the other hand,

$$\int |h_x - f_x|^p = \int |h(x-y) - f(x-y)|^p\,dy = \int |h(y) - f(y)|^p\,dy,$$

so $\|h_x - f_x\|_p = \|h - f\|_p \le \eta$, and similarly $\|h_{x'} - f_{x'}\|_p \le \eta$. So

$$\|f_x - f_{x'}\|_p \le \|f_x - h_x\|_p + \|h_x - h_{x'}\|_p + \|h_{x'} - f_{x'}\|_p \le \eta(2 + 2^{1/p}).$$

This means that

$$\begin{aligned}
|(f * g)(x) - (f * g)(x')| &= \Bigl|\int f_x \times g - \int f_{x'} \times g\Bigr| = \Bigl|\int (f_x - f_{x'}) \times g\Bigr| \\
&\le \|f_x - f_{x'}\|_p \|g\|_q \le \eta(2 + 2^{1/p})\|g\|_q \le \epsilon.
\end{aligned}$$

As $\epsilon$ is arbitrary, $f * g$ is uniformly continuous.

The argument here supposes that $p$ is finite. But if $p = \infty$ then $q = 1$ is finite, so we can apply the method with $g$ in place of $f$ to show that $g * f$ is uniformly continuous, and $f * g = g * f$ by 255Fb.

255L The $r$-dimensional case I have written 255A-255K out as theorems about Lebesgue measure on $\mathbb{R}$. However they all apply equally well to Lebesgue measure on $\mathbb{R}^r$ for any $r \ge 1$, and the modifications required are so small that I think I need do no more than ask you to read through the arguments again, turning every $\mathbb{R}$ into an $\mathbb{R}^r$, and every $\mathbb{R}^2$ into an $(\mathbb{R}^r)^2$. In 255A and elsewhere, the measure $\mu_2$ should be read either as Lebesgue measure on $\mathbb{R}^{2r}$ or as the product measure on $(\mathbb{R}^r)^2$; by 251M the two may be identified. There is a trivial modification required in part (b) of the proof; if $I_n = [a_n, b_n[$ then

$$\mu I_n = \mu(-I_n) = \prod_{i\le r} \max(0, \beta_{ni} - \alpha_{ni}),$$

writing $a_n = (\alpha_{n1}, \ldots, \alpha_{nr})$. In the proof of 255I, the functions $f_n$ should be defined by saying that $f_n(x) = f(x)$ if $|f(x)| \le n$ and $\|x\| \le n$, $0$ otherwise. In quoting these results, therefore, I shall be uninhibited in referring to the paragraphs 255A-255K as if they were actually written out for general $r \ge 1$.
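As an informal numerical illustration of the bound in 255K (in the case $r = 1$), the following Python sketch replaces the integrals by Riemann sums on a finite grid. The particular functions $f(x) = e^{-|x|}$, $g(x) = 1/(1+x^2)$, the exponents $p = 3$, $q = 3/2$, the grid, and the helper names `lp_norm` and `convolve_at` are all assumptions chosen for the example; nothing here is part of the formal development. For the discrete sums the inequality is an instance of Hölder's inequality for finite sequences, so the comparison should hold up to rounding.

```python
import math

def lp_norm(samples, step, p):
    """Riemann-sum approximation to the L^p norm (1 <= p < infinity)."""
    return (sum(abs(v) ** p for v in samples) * step) ** (1.0 / p)

def convolve_at(f_samples, g_samples, step, k):
    """Riemann sum for (f*g)(x_k), where x_i = (i - n)*step, i = 0..2n.

    Both functions are treated as zero outside the sampled window.
    """
    n = (len(f_samples) - 1) // 2
    total = 0.0
    for j, g_val in enumerate(g_samples):
        i = k - j + n          # index of the grid point x_k - y_j
        if 0 <= i < len(f_samples):
            total += f_samples[i] * g_val
    return total * step

# Illustrative choices (assumptions, not taken from the text).
step, n = 0.1, 100                     # grid on [-10, 10]
xs = [(i - n) * step for i in range(2 * n + 1)]
f_samples = [math.exp(-abs(x)) for x in xs]
g_samples = [1.0 / (1.0 + x * x) for x in xs]
p, q = 3.0, 1.5                        # conjugate exponents: 1/p + 1/q = 1

sup_conv = max(abs(convolve_at(f_samples, g_samples, step, k))
               for k in range(len(xs)))
bound = lp_norm(f_samples, step, p) * lp_norm(g_samples, step, q)
print(sup_conv, bound, sup_conv <= bound)
```

The sketch checks only the supremum bound; it says nothing about uniform continuity, which is a statement about all pairs of nearby points and is not captured by sampling.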
255M The case of $]-\pi, \pi]$ The same ideas also apply to the circle group $S^1$ and to the interval $]-\pi, \pi]$, but here perhaps rather more explanation is in order.

(a) The first thing to establish is the appropriate group operation. If we think of $S^1$ as the set $\{z : z \in \mathbb{C}, |z| = 1\}$, then the group operation is complex multiplication, and in the formulae above $x + y$ must be rendered as $xy$, while $x - y$ must be rendered as $xy^{-1}$. On the interval $]-\pi, \pi]$, the group operation is $+_{2\pi}$, where for $x, y \in \,]-\pi, \pi]$ I write $x +_{2\pi} y$ for whichever of $x + y$, $x + y + 2\pi$, $x + y - 2\pi$ belongs to $]-\pi, \pi]$. To see that this is indeed a group operation, one method is to note that it corresponds to multiplication on $S^1$ if we use the canonical bijection $x \mapsto e^{ix} : \,]-\pi, \pi] \to S^1$; another, to note that it corresponds to the operation on the quotient group $\mathbb{R}/2\pi\mathbb{Z}$. Thus in this interpretation of the ideas of 255A-255K, we shall wish to replace $x + y$ by $x +_{2\pi} y$, $-x$ by $-_{2\pi} x$, and $x - y$ by $x -_{2\pi} y$, where

$$-_{2\pi} x = -x \text{ if } x \in \,]-\pi, \pi[, \qquad -_{2\pi} \pi = \pi,$$

and $x -_{2\pi} y$ is whichever of $x - y$, $x - y + 2\pi$, $x - y - 2\pi$ belongs to $]-\pi, \pi]$.

(b) As for the measure, the measure to use on $]-\pi, \pi]$ is just Lebesgue measure. Note that because $]-\pi, \pi]$ is Lebesgue measurable, there will be no confusion concerning the meaning of ‘measurable subset’, as the relatively measurable subsets of $]-\pi, \pi]$ are actually measurable for Lebesgue measure on $\mathbb{R}$. Also we can identify the product measure on $]-\pi, \pi] \times \,]-\pi, \pi]$ with the subspace measure induced by Lebesgue measure on $\mathbb{R}^2$ (251Q). On $S^1$, we need the corresponding measure induced by the canonical bijection between $S^1$ and $]-\pi, \pi]$, which indeed is often called ‘Lebesgue measure on $S^1$’. (We shall see in 265E that it is also equal to Hausdorff one-dimensional measure on $S^1$.) We are very close to the level at which it would become reasonable to move to $S^1$ and this measure (or its normalized version, in which it is reduced by a factor of $2\pi$, so as to make $S^1$ a probability space). However, the elementary theory of Fourier series, which will be the principal application of this work in the present volume, is generally done on intervals in $\mathbb{R}$, so that formulae based on $]-\pi, \pi]$ are closer to the standard expressions. Henceforth, therefore, I will express all the work in terms of $]-\pi, \pi]$.

(c) The result corresponding to 255A now takes a slightly different form, so I spell it out.

255N Theorem Let $\mu$ be Lebesgue measure on $]-\pi, \pi]$ and $\mu_2$ Lebesgue measure on $]-\pi, \pi] \times \,]-\pi, \pi]$; write $\Sigma$, $\Sigma_2$ for their domains.

(a) For any $a \in \,]-\pi, \pi]$, the map $x \mapsto a +_{2\pi} x : \,]-\pi, \pi] \to \,]-\pi, \pi]$ is a measure space automorphism of $(]-\pi, \pi], \Sigma, \mu)$.

(b) The map $x \mapsto -_{2\pi} x : \,]-\pi, \pi] \to \,]-\pi, \pi]$ is a measure space automorphism of $(]-\pi, \pi], \Sigma, \mu)$.

(c) For any $a \in \,]-\pi, \pi]$, the map $x \mapsto a -_{2\pi} x : \,]-\pi, \pi] \to \,]-\pi, \pi]$ is a measure space automorphism of $(]-\pi, \pi], \Sigma, \mu)$.

(d) The map $(x, y) \mapsto (x +_{2\pi} y, y) : \,]-\pi, \pi]^2 \to \,]-\pi, \pi]^2$ is a measure space automorphism of $(]-\pi, \pi]^2, \Sigma_2, \mu_2)$.
(e) The map $(x, y) \mapsto (x -_{2\pi} y, y) : \,]-\pi, \pi]^2 \to \,]-\pi, \pi]^2$ is a measure space automorphism of $(]-\pi, \pi]^2, \Sigma_2, \mu_2)$.

proof (a) Set $\phi(x) = a +_{2\pi} x$. Then for any $E \subseteq \,]-\pi, \pi]$,

$$\phi[E] = ((E + a) \cap \,]-\pi, \pi]) \cup (((E + a) \cap \,]\pi, 3\pi]) - 2\pi) \cup (((E + a) \cap \,]-3\pi, -\pi]) + 2\pi),$$

and these three sets are disjoint, so that

$$\begin{aligned}
\mu\phi[E] &= \mu((E + a) \cap \,]-\pi, \pi]) + \mu(((E + a) \cap \,]\pi, 3\pi]) - 2\pi) + \mu(((E + a) \cap \,]-3\pi, -\pi]) + 2\pi) \\
&= \mu_L((E + a) \cap \,]-\pi, \pi]) + \mu_L(((E + a) \cap \,]\pi, 3\pi]) - 2\pi) + \mu_L(((E + a) \cap \,]-3\pi, -\pi]) + 2\pi) \\
&\qquad\text{(writing } \mu_L \text{ for Lebesgue measure on } \mathbb{R}\text{)} \\
&= \mu_L((E + a) \cap \,]-\pi, \pi]) + \mu_L((E + a) \cap \,]\pi, 3\pi]) + \mu_L((E + a) \cap \,]-3\pi, -\pi]) \\
&= \mu_L(E + a) = \mu_L E = \mu E.
\end{aligned}$$

Similarly, $\mu\phi^{-1}[E]$ is defined and equal to $\mu E$ for every $E \in \Sigma$, so that $\phi$ is an automorphism of $(]-\pi, \pi], \Sigma, \mu)$.

(b) Of course this is quicker. Setting $\phi(x) = -_{2\pi} x$ for $x \in \,]-\pi, \pi]$, we have

$$\begin{aligned}
\mu(\phi[E]) &= \mu(\phi[E] \cap \,]-\pi, \pi[) = \mu(-(E \cap \,]-\pi, \pi[)) = \mu_L(-(E \cap \,]-\pi, \pi[)) \\
&= \mu_L(E \cap \,]-\pi, \pi[) = \mu(E \cap \,]-\pi, \pi[) = \mu E
\end{aligned}$$

for every $E \in \Sigma$.

(c) This is just a matter of putting (a) and (b) together, as in 255A.

(d) We can argue as in (a), but with a little more elaboration. If $E \in \Sigma_2$, and $\phi(x, y) = (x +_{2\pi} y, y)$ for $x, y \in \,]-\pi, \pi]$, set $\psi(x, y) = (x + y, y)$ for $x, y \in \mathbb{R}$, and write $c = (2\pi, 0) \in \mathbb{R}^2$, $H = \,]-\pi, \pi]^2$, $H' = H + c$, $H'' = H - c$. Then for any $E \in \Sigma_2$,

$$\phi[E] = (\psi[E] \cap H) \cup ((\psi[E] \cap H') - c) \cup ((\psi[E] \cap H'') + c),$$

so

$$\begin{aligned}
\mu_2\phi[E] &= \mu_2(\psi[E] \cap H) + \mu_2((\psi[E] \cap H') - c) + \mu_2((\psi[E] \cap H'') + c) \\
&= \mu_L(\psi[E] \cap H) + \mu_L((\psi[E] \cap H') - c) + \mu_L((\psi[E] \cap H'') + c) \\
&\qquad\text{(this time writing } \mu_L \text{ for Lebesgue measure on } \mathbb{R}^2\text{)} \\
&= \mu_L(\psi[E] \cap H) + \mu_L(\psi[E] \cap H') + \mu_L(\psi[E] \cap H'') \\
&= \mu_L\psi[E] = \mu_L E = \mu_2 E.
\end{aligned}$$
In the same way, $\mu_2(\phi^{-1}[E]) = \mu_2 E$ for every $E \in \Sigma_2$, so $\phi$ is an automorphism of $(]-\pi, \pi]^2, \Sigma_2, \mu_2)$, as required.

(e) Finally, (e) is just a restatement of (d), as before.

255O Convolutions on $]-\pi, \pi]$ With the fundamental result established, the same arguments as in 255B-255K now yield the following. Write $\mu$ for Lebesgue measure on $]-\pi, \pi]$.

(a) Let $f$ and $g$ be measurable complex-valued functions defined almost everywhere in $]-\pi, \pi]$. Write $f * g$ for the function defined by the formula

$$(f * g)(x) = \int_{-\pi}^{\pi} f(x -_{2\pi} y)g(y)\,dy$$

whenever $x \in \,]-\pi, \pi]$ and the integral exists as a complex number. Then $f * g$ is the convolution of the functions $f$ and $g$.

(b) If $f$, $g$ are measurable complex-valued functions defined almost everywhere in $]-\pi, \pi]$, then $f * g = g * f$, in the strict sense that they have the same domain and the same value at each point of that common domain.

(c) We may regard convolution as a binary operator on $L^0_{\mathbb{C}}$; if $u, v \in L^0_{\mathbb{C}}(\mu)$, we can define $u * v$ as being equal to $(f * g)^\bullet$ whenever $f^\bullet = u$ and $g^\bullet = v$.

(d) Let $f$, $g$ and $h$ be measurable complex-valued functions defined almost everywhere in $]-\pi, \pi]$. Then

(i)

$$\int_{-\pi}^{\pi} h(x)(f * g)(x)\,dx = \int_{]-\pi,\pi]^2} h(x +_{2\pi} y)f(x)g(y)\,d(x, y)$$

whenever the right-hand side exists and is finite, provided that in the expression $h(x)(f * g)(x)$ we interpret the product as $0$ if $h(x) = 0$ and $(f * g)(x)$ is undefined.

(ii) If, on the same interpretation of $|h(x)|(|f| * |g|)(x)$, the integral $\int_{-\pi}^{\pi} |h(x)|(|f| * |g|)(x)\,dx$ is finite, then $\int_{]-\pi,\pi]^2} h(x +_{2\pi} y)f(x)g(y)\,d(x, y)$ exists in $\mathbb{C}$, so again we shall have

$$\int_{-\pi}^{\pi} h(x)(f * g)(x)\,dx = \int_{]-\pi,\pi]^2} h(x +_{2\pi} y)f(x)g(y)\,d(x, y).$$

(e) If $f$, $g$ are complex-valued functions which are integrable over $]-\pi, \pi]$, then $f * g$ is integrable, with

$$\int_{-\pi}^{\pi} f * g = \int_{-\pi}^{\pi} f \int_{-\pi}^{\pi} g, \qquad \int_{-\pi}^{\pi} |f * g| \le \int_{-\pi}^{\pi} |f| \int_{-\pi}^{\pi} |g|.$$

(f) Let $f$, $g$, $h$ be complex-valued measurable functions defined almost everywhere in $]-\pi, \pi]$. Suppose that $x \in \,]-\pi, \pi]$ is such that one of $(|f| * (|g| * |h|))(x)$, $((|f| * |g|) * |h|)(x)$ is defined in $\mathbb{R}$. Then $f * (g * h)$ and $(f * g) * h$ are defined and equal at $x$.

(g) Suppose that $f \in \mathcal{L}^p_{\mathbb{C}}(\mu)$, $g \in \mathcal{L}^q_{\mathbb{C}}(\mu)$ where $p, q \in [1, \infty]$ and $\frac1p + \frac1q = 1$. Then $f * g$ is defined everywhere in $]-\pi, \pi]$, and $\sup_{x\in]-\pi,\pi]} |(f * g)(x)| \le \|f\|_p \|g\|_q$.
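The group operation $+_{2\pi}$ of 255M(a), on which the convolution of 255O is built, can be sketched in Python as follows. The function names `add_2pi`, `neg_2pi` and `sub_2pi` and the sample points are hypothetical choices made for this illustration. The final loop checks, on those sample points only and up to rounding, that the operation corresponds to multiplication on $S^1$ under $x \mapsto e^{ix}$.

```python
import cmath
import math

def add_2pi(x, y):
    """Return whichever of x+y, x+y+2*pi, x+y-2*pi lies in ]-pi, pi]."""
    s = x + y
    if s > math.pi:
        return s - 2 * math.pi
    if s <= -math.pi:
        return s + 2 * math.pi
    return s

def neg_2pi(x):
    """Inverse for add_2pi: -x for x in ]-pi, pi[, and pi for x = pi."""
    return math.pi if x == math.pi else -x

def sub_2pi(x, y):
    """x -_{2pi} y, defined as x +_{2pi} (-_{2pi} y)."""
    return add_2pi(x, neg_2pi(y))

# Sample points in ]-pi, pi] (an arbitrary illustrative choice).
samples = [-3.0, -1.5, 0.0, 0.5, 2.0, math.pi]

for x in samples:
    for y in samples:
        z = add_2pi(x, y)
        # The result stays in ]-pi, pi] and matches multiplication on S^1.
        assert -math.pi < z <= math.pi
        assert abs(cmath.exp(1j * z) - cmath.exp(1j * x) * cmath.exp(1j * y)) < 1e-12
```

Floating-point arithmetic only approximates the half-open interval, so a sum that lands within rounding error of $\pm\pi$ may be assigned to either end; the sketch is meant to convey the definition, not to be a robust implementation.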
255X Basic exercises

>(a) Let $f$, $g$ be complex-valued functions defined almost everywhere in $\mathbb{R}$. Show that for any $x \in \mathbb{R}$, $(f * g)(x) = \int f(x + y)g(-y)\,dy$ if either is defined.

>(b) Let $f$ and $g$ be complex-valued functions defined almost everywhere in $\mathbb{R}$. (i) Show that if $f$ and $g$ are even functions, so is $f * g$. (ii) Show that if $f$ is even and $g$ is odd then $f * g$ is odd. (iii) Show that if $f$ and $g$ are odd then $f * g$ is even.

>(c) Let $\mu$ be Lebesgue measure on $\mathbb{R}$. Show that we have a function $* : L^1_{\mathbb{C}}(\mu) \times L^1_{\mathbb{C}}(\mu) \to L^1_{\mathbb{C}}(\mu)$ given by setting $f^\bullet * g^\bullet = (f * g)^\bullet$ for all $f, g \in \mathcal{L}^1_{\mathbb{C}}(\mu)$. Show that $L^1_{\mathbb{C}}$ is a commutative Banach algebra under $*$ (definition: 2A4J).

(d) (i) Show that if $h$ is an integrable function on $\mathbb{R}^2$, then $(Th)(x) = \int h(x - y, y)\,dy$ exists for almost every $x \in \mathbb{R}$, and that $\int (Th)(x)\,dx = \int h(x, y)\,d(x, y)$. (ii) Write $\mu_2$ for Lebesgue measure on $\mathbb{R}^2$, $\mu$ for Lebesgue measure on $\mathbb{R}$. Show that there is a linear operator $\tilde{T} : L^1(\mu_2) \to L^1(\mu)$ defined by setting $\tilde{T}(h^\bullet) = (Th)^\bullet$ for every integrable function $h$ on $\mathbb{R}^2$. (iii) Show that in the language of 253E and (c) above, $\tilde{T}(u \otimes v) = u * v$ for all $u, v \in L^1(\mu)$.

>(e) For $\mathbf{a}, \mathbf{b} \in \mathbb{C}^{\mathbb{Z}}$ set $(\mathbf{a} * \mathbf{b})(n) = \sum_{i\in\mathbb{Z}} \mathbf{a}(n - i)\mathbf{b}(i)$ whenever $\sum_{i\in\mathbb{Z}} |\mathbf{a}(n - i)\mathbf{b}(i)| < \infty$. Show that
(i) $\mathbf{a} * \mathbf{b} = \mathbf{b} * \mathbf{a}$;
(ii) $\sum_{i\in\mathbb{Z}} \mathbf{c}(i)(\mathbf{a} * \mathbf{b})(i) = \sum_{i,j\in\mathbb{Z}} \mathbf{c}(i + j)\mathbf{a}(i)\mathbf{b}(j)$ if $\sum_{i,j\in\mathbb{Z}} |\mathbf{c}(i + j)\mathbf{a}(i)\mathbf{b}(j)| < \infty$;
(iii) if $\mathbf{a}, \mathbf{b} \in \ell^1(\mathbb{Z})$ then $\mathbf{a} * \mathbf{b} \in \ell^1(\mathbb{Z})$ and $\|\mathbf{a} * \mathbf{b}\|_1 \le \|\mathbf{a}\|_1 \|\mathbf{b}\|_1$;
(iv) if $\mathbf{a}, \mathbf{b} \in \ell^2(\mathbb{Z})$ then $\mathbf{a} * \mathbf{b} \in \ell^\infty(\mathbb{Z})$ and $\|\mathbf{a} * \mathbf{b}\|_\infty \le \|\mathbf{a}\|_2 \|\mathbf{b}\|_2$;
(v) if $\mathbf{a}, \mathbf{b}, \mathbf{c} \in \mathbb{C}^{\mathbb{Z}}$ and $(|\mathbf{a}| * (|\mathbf{b}| * |\mathbf{c}|))(n)$ is well-defined, then $(\mathbf{a} * (\mathbf{b} * \mathbf{c}))(n) = ((\mathbf{a} * \mathbf{b}) * \mathbf{c})(n)$.

(f) Suppose that $f$, $g$ are real-valued measurable functions defined almost everywhere in $\mathbb{R}^r$ and such that $f > 0$ a.e., $g \ge 0$ a.e. and $\{x : g(x) > 0\}$ is not negligible. Show that $f * g > 0$ everywhere in $\operatorname{dom}(f * g)$.

>(g) Suppose that $f : \mathbb{R} \to \mathbb{C}$ is a bounded differentiable function and that $f'$ is bounded. Show that for any integrable complex-valued function $g$ on $\mathbb{R}$, $f * g$ is differentiable and $(f * g)' = f' * g$ everywhere. (Hint: 123D.)

(h) A complex-valued function $g$ defined almost everywhere in $\mathbb{R}$ is locally integrable if $\int_a^b g$ is defined in $\mathbb{C}$ whenever $a < b$ in $\mathbb{R}$. Suppose that $g$ is such a function and that $f : \mathbb{R} \to \mathbb{C}$ is a differentiable function, with continuous derivative, such that $\{x : f(x) \ne 0\}$ is bounded. Show that $(f * g)' = f' * g$ everywhere.

>(i) Set $\phi_\delta(x) = \exp(-\frac{1}{\delta^2 - x^2})$ if $|x| < \delta$, $0$ if $|x| \ge \delta$, as in 242Xi. Set $\alpha_\delta = \int \phi_\delta$, $\psi_\delta = \alpha_\delta^{-1}\phi_\delta$. Let $f$ be a locally integrable complex-valued function on $\mathbb{R}$. (i) Show that $f * \psi_\delta$ is a smooth function defined everywhere on $\mathbb{R}$ for every $\delta > 0$. (ii) Show that $\lim_{\delta\downarrow 0} (f * \psi_\delta)(x) = f(x)$ for almost every $x \in \mathbb{R}$. (Hint: 223Yg.) (iii) Show that if $f$ is integrable then $\lim_{\delta\downarrow 0} \int |f - f * \psi_\delta| = 0$. (Hint: use (ii) and 245H(aii) or look first at the case $f = \chi[a, b]$ and use 242O, noting that $\int |f * \psi_\delta| \le \int |f|$.) (iv) Show that if $f$ is uniformly continuous and defined everywhere on $\mathbb{R}$ then $\lim_{\delta\downarrow 0} \sup_{x\in\mathbb{R}} |f(x) - (f * \psi_\delta)(x)| = 0$.
>(j) For $\alpha > 0$, set $g_\alpha(t) = \frac{1}{\Gamma(\alpha)} t^{\alpha-1}$ for $t > 0$, $0$ for $t \le 0$. Show that $g_\alpha * g_\beta = g_{\alpha+\beta}$ for all $\alpha, \beta > 0$. (Hint: 252Yk.)

255Y Further exercises

(a) Set $f(x) = 1$ for all $x \in \mathbb{R}$, $g(x) = \frac{x}{|x|}$ for $0 < |x| \le 1$ and $0$ otherwise, $h(x) = \tanh x$ for all $x \in \mathbb{R}$. Show that $f * (g * h)$ and $(f * g) * h$ are both defined (and constant) everywhere, and are different.

(b) Discuss what can happen if, in the context of 255J, we know that $(|f| * (|g| * |h|))(x)$ is defined, but have no information on the domain of $f * g$.

(c) Suppose that $p \in [1, \infty[$ and that $f \in \mathcal{L}^p_{\mathbb{C}}(\mu)$, where $\mu$ is Lebesgue measure on $\mathbb{R}^r$. For $a \in \mathbb{R}^r$ set $(S_a f)(x) = f(a + x)$ whenever $a + x \in \operatorname{dom} f$. Show that $S_a f \in \mathcal{L}^p_{\mathbb{C}}(\mu)$, and that for every $\epsilon > 0$ there is a $\delta > 0$ such that $\|S_a f - f\|_p \le \epsilon$ whenever $|a| \le \delta$.

(d) Suppose that $p, q \in \,]1, \infty[$ and $\frac1p + \frac1q = 1$. Let $f \in \mathcal{L}^p_{\mathbb{C}}(\mu)$, $g \in \mathcal{L}^q_{\mathbb{C}}(\mu)$, where $\mu$ is Lebesgue measure on $\mathbb{R}^r$. Show that $\lim_{\|x\|\to\infty} (f * g)(x) = 0$. (Hint: use 244Hb.)

(e) Repeat 255Yc and 255K, this time taking $\mu$ to be Lebesgue measure on $]-\pi, \pi]$, and setting $(S_a f)(x) = f(a +_{2\pi} x)$ for $a \in \,]-\pi, \pi]$; show that in the new version of 255K, $(f * g)(\pi) = \lim_{x\downarrow -\pi} (f * g)(x)$.

(f) Let $\mu$ be Lebesgue measure on $\mathbb{R}$. For $a \in \mathbb{R}$, $f \in \mathcal{L}^0 = \mathcal{L}^0(\mu)$ set $(S_a f)(x) = f(a + x)$ whenever $a + x \in \operatorname{dom} f$.
(i) Show that $S_a f \in \mathcal{L}^0$ for every $f \in \mathcal{L}^0$.
(ii) Show that we have a map $\tilde{S}_a : L^0 \to L^0$ defined by setting $\tilde{S}_a(f^\bullet) = (S_a f)^\bullet$ for every $f \in \mathcal{L}^0$.
(iii) Show that $\tilde{S}_a$ is a Riesz space isomorphism and is a homeomorphism for the topology of convergence in measure; moreover, that $\tilde{S}_a(u \times v) = \tilde{S}_a u \times \tilde{S}_a v$ for all $u, v \in L^0$.
(iv) Show that $\tilde{S}_{a+b} = \tilde{S}_a \tilde{S}_b$ for all $a, b \in \mathbb{R}$.
(v) Show that $\lim_{a\to 0} \tilde{S}_a u = u$ for the topology of convergence in measure, for every $u \in L^0$.
(vi) Show that if $1 \le p \le \infty$ then $\tilde{S}_a \upharpoonright L^p$ is an isometric isomorphism of the Banach lattice $L^p$.
(vii) Show that if $p \in [1, \infty[$ then $\lim_{a\to 0} \|\tilde{S}_a u - u\|_p = 0$ for every $u \in L^p$.
(viii) Show that if $A \subseteq L^1$ is uniformly integrable and $M \ge 0$, then $\{\tilde{S}_a u : u \in A, |a| \le M\}$ is uniformly integrable.
(ix) Show that if $u, v \in L^0$ are such that $u * v$ is defined in $L^0$, then $\tilde{S}_a(u * v) = (\tilde{S}_a u) * v = u * (\tilde{S}_a v)$ for every $a \in \mathbb{R}$.
(g) Prove 255Nd from 255Na by the method used to prove 255Ad from 255Aa, rather than by quoting 255Ad.

(h) Repeat the results of this chapter for the group $(S^1)^r$, where $r \ge 2$, given its product measure.

(i) Let $f$ be a complex-valued function which is integrable over $\mathbb{R}$. (i) Let $x$ be any point of the Lebesgue set of $f$. Show that for any $\epsilon > 0$ there is a $\delta > 0$ such that $|f(x) - (f * g)(x)| \le \epsilon$ whenever $g : \mathbb{R} \to [0, \infty[$ is a function which is non-decreasing on $]-\infty, 0]$, non-increasing on $[0, \infty[$, and has $\int g = 1$ and $\int_{-\delta}^{\delta} g \ge 1 - \delta$. (ii) Show that for any $\epsilon > 0$ there is a $\delta > 0$ such that $\|f - f * g\|_1 \le \epsilon$ whenever $g : \mathbb{R} \to [0, \infty[$ is a function which is non-decreasing on $]-\infty, 0]$, non-increasing on $[0, \infty[$, and has $\int g = 1$ and $\int_{-\delta}^{\delta} g \ge 1 - \delta$.

(j) Let $f$ be a complex-valued function which is integrable over $\mathbb{R}$. Show that, for almost every $x \in \mathbb{R}$,

$$\lim_{a\to\infty} \frac{a}{\pi} \int_{-\infty}^{\infty} \frac{f(y)}{1 + a^2(x-y)^2}\,dy, \qquad \lim_{a\to\infty} a \int_x^{\infty} f(y)e^{-a(y-x)}\,dy, \qquad \lim_{\sigma\downarrow 0} \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} f(y)e^{-(y-x)^2/2\sigma^2}\,dy$$

all exist and are equal to $f(x)$. (Hint: 263G.)
(k) Let $\mu$ be Lebesgue measure on $\mathbb{R}$, and $\phi : \mathbb{R} \to \mathbb{R}$ a convex function such that $\phi(0) = 0$; let $\bar\phi : L^0 \to L^0 = L^0(\mu)$ be the associated operator (see 241I). Show that if $u \in L^1 = L^1(\mu)$, $v \in L^0$ are such that $u, v \ge 0$, $\int u = 1$ and $u * \bar\phi(v)$ is defined in $L^0$, then $\bar\phi(u * v) \le u * \bar\phi(v)$. (Hint: 233I.)

(l) Let $\mu$ be Lebesgue measure on $\mathbb{R}$, and $p \in [1, \infty]$. Let $f \in \mathcal{L}^1_{\mathbb{C}}(\mu)$, $g \in \mathcal{L}^p_{\mathbb{C}}(\mu)$. Show that $f * g \in \mathcal{L}^p_{\mathbb{C}}(\mu)$ and that $\|f * g\|_p \le \|f\|_1 \|g\|_p$. (Hint: argue from 255Yk, as in 244M.)

(m) Suppose that $p, q, r \in \,]1, \infty[$ and that $\frac1p + \frac1q = 1 + \frac1r$. Let $\mu$ be Lebesgue measure on $\mathbb{R}$. (i) Show that

$$\int f \times g \le \|f\|_p^{1-p/r} \|g\|_q^{1-q/r} \Bigl(\int f^p \times g^q\Bigr)^{1/r}$$

whenever $f, g \ge 0$ and $f \in \mathcal{L}^p(\mu)$, $g \in \mathcal{L}^q(\mu)$. (Hint: set $p' = p/(p - 1)$, etc.; $f_1 = f^{p/q'}$, $g_1 = g^{q/p'}$, $h = (f^p \times g^q)^{1/r}$. Use 244Xd to see that $\|f_1 \times g_1\|_{r'} \le \|f_1\|_{q'} \|g_1\|_{p'}$, so that $\int f_1 \times g_1 \times h \le \|f_1\|_{q'} \|g_1\|_{p'} \|h\|_r$.) (ii) Show that $\|f * g\|_r \le \|f\|_p \|g\|_q$ for all $f \in \mathcal{L}^p(\mu)$, $g \in \mathcal{L}^q(\mu)$. (Hint: take $f, g \ge 0$. Use (i) to see that $(f * g)(x)^r \le \|f\|_p^{r-p} \|g\|_q^{r-q} \int f(y)^p g(x - y)^q\,dy$, so that $\|f * g\|_r^r \le \|f\|_p^{r-p} \|g\|_q^{r-q} \int f(y)^p \|g\|_q^q\,dy$.) (This is Young's inequality.)

(n) Let $G$ be a group and $\mu$ a $\sigma$-finite measure on $G$ such that ($\alpha$) for every $a \in G$, the map $x \mapsto ax$ is an automorphism of $(G, \mu)$, ($\beta$) the map $(x, y) \mapsto (x, xy)$ is an automorphism of $(G^2, \mu_2)$, where $\mu_2$ is the c.l.d. product measure on $G \times G$. For $f, g \in \mathcal{L}^0_{\mathbb{C}}(\mu)$ write $(f * g)(x) = \int f(y)g(y^{-1}x)\,dy$ whenever this is defined. Show that
(i) if $f, g, h \in \mathcal{L}^0_{\mathbb{C}}(\mu)$ and $\int h(xy)f(x)g(y)\,d(x, y)$ is defined in $\mathbb{C}$, then $\int h(x)(f * g)(x)\,dx$ exists and is equal to $\int h(xy)f(x)g(y)\,d(x, y)$, provided that in the expression $h(x)(f * g)(x)$ we interpret the product as $0$ if $h(x) = 0$ and $(f * g)(x)$ is undefined;
(ii) if $f, g \in \mathcal{L}^1_{\mathbb{C}}(\mu)$ then $f * g \in \mathcal{L}^1_{\mathbb{C}}(\mu)$ and $\int f * g = \int f \int g$, $\|f * g\|_1 \le \|f\|_1 \|g\|_1$;
(iii) if $f, g, h \in \mathcal{L}^1_{\mathbb{C}}(\mu)$ then $f * (g * h) = (f * g) * h$.
(See Halmos 50, §59.)

(o) Repeat 255Yn for counting measure on any group $G$.

255 Notes and comments I have tried to set this section out in such a way that it will be clear that the basis of all the work here is 255A, and the crucial application is 255G. I hope that if and when you come to look at general topological groups (for instance, in Chapter 44), you will find it easy to trace through the ideas in any abelian topological group for which you can prove a version of 255A. For non-abelian groups, of course, rather more care is necessary, especially as in some important examples we no longer have $\mu\{x^{-1} : x \in E\} = \mu E$ for every $E$; see 255Yn-255Yo for a little of what can be done without using topological ideas.
The critical point in 255A is the move from the one-dimensional results in 255Aa-255Ac, which are just the translation- and reflection-invariance of Lebesgue measure, to the two-dimensional results in 255Ad-255Ae. And the living centre of the argument, as I present it, is the fact that the shear transformation $\phi$ is an automorphism of the structure $(\mathbb{R}^2, \Sigma_2)$. The actual calculation of $\mu_2\phi[E]$, assuming that it is measurable, is an easy application of Fubini's and Tonelli's theorems and the translation-invariance of $\mu$. It is for this step that we absolutely need the topological properties of Lebesgue measure. I should perhaps remind you that the fact that $\phi$ is a homeomorphism is not sufficient; in 134I I described a homeomorphism of the unit interval which does not preserve measurability, and it is easy to adapt this to produce a homeomorphism $\psi : \mathbb{R}^2 \to \mathbb{R}^2$ such that $\psi[E]$ is not always measurable for measurable $E$. The argument of 255A is dependent on the special relationships between all three of the measure, topology and group structure of $\mathbb{R}$.

I have already indulged in a few remarks on what ought, or ought not, to be ‘obvious’ (255C). But perhaps I can add that such results as 255B and the later claim, in the proof of 255K, that a reflected version of a function in $\mathcal{L}^p$ is also in $\mathcal{L}^p$, can only be trivial consequences of results like 255A if every step in the construction of the integral is done in the abstract context of general measure spaces. Even though we are here working exclusively with the Lebesgue integral, the argument will become untrustworthy if we have at any stage in the definition of the integral even mentioned that we are thinking of Lebesgue measure. I advance this as a solid reason for defining ‘integration’ on abstract measure spaces from the beginning, as I did in Volume 1. Indeed, I suggest that generally in pure mathematics there are good reasons for casting arguments into the forms appropriate to the arguments themselves.

I am writing this book for readers who are interested in proofs, and as elsewhere I have written the proofs of this section out in detail. But most of us find it useful to go through some material in ‘advanced calculus’ mode, by which I mean starting with a formula such as

$$(f * g)(x) = \int f(x - y)g(y)\,dy,$$

and then working out consequences by formal manipulations, for instance

$$\int h(x)(f * g)(x)\,dx = \iint h(x)f(x - y)g(y)\,dy\,dx = \iint h(x + y)f(x)g(y)\,dy\,dx,$$

without troubling about the precise applicability of the formulae to begin with. In some ways this formula-driven approach can be more truthful to the structure of the subject than the careful analysis I habitually present. The exact hypotheses necessary to make the theorems strictly true are surely secondary, in such contexts as this section, to the pattern formed by the ensemble of the theorems, which can be adequately and elegantly expressed in straightforward formulae. Of course I do still insist that we cannot properly appreciate the structure, nor safely use it, without mastering the ideas of the proofs; and as I have said elsewhere, I believe that mastery of ideas necessarily includes mastery of the formal details, at least in the sense of being able to reconstruct them fairly fluently on demand.

Throughout the main exposition of this section, I have worked with functions rather than equivalence classes of functions. But all the results here have interpretations of great importance for the theory of the ‘function spaces’ of Chapter 24. It is an interesting point that if $u, v \in L^0$ then $u * v$ is most naturally interpreted as a function, not as a member of $L^0$, even if it is defined almost everywhere. Thus 255H can be regarded as saying that $u * v \in \mathcal{L}^1$ for $u, v \in L^1$. We cannot quite say that convolution is a bilinear operator from $L^1 \times L^1$ to $\mathcal{L}^1$, because $\mathcal{L}^1$ is not strictly speaking a linear space. If we want a bilinear functional, then we have to replace the function $u * v$ by its equivalence class, so that convolution becomes a bilinear map from $L^1 \times L^1$ to $L^1$. But when we look at convolution as a function on $L^2 \times L^2$, for instance, then our functions $u * v$ are defined everywhere (255K), and indeed are continuous functions vanishing at $\infty$ (255Yc-255Yd). So in this case it seems more appropriate to regard convolution as a bilinear operator from $L^2 \times L^2$ to some space of continuous functions, and not as an operator from $L^2 \times L^2$ to $L^\infty$. For an example of an interesting convolution which is not naturally representable in terms of an operator on $L^p$ spaces, see 255Xj.

Because convolution acts as a continuous bilinear operator from $L^1(\mu) \times L^1(\mu)$ to $L^1(\mu)$, where $\mu$ is Lebesgue measure on $\mathbb{R}$, Theorem 253F tells us that it must correspond to a linear operator from $L^1(\mu_2)$ to $L^1(\mu)$, where $\mu_2$ is Lebesgue measure on $\mathbb{R}^2$. This is the operator $\tilde{T}$ of 255Xd.

So far in these notes I have written as though we were concerned only with Lebesgue measure on $\mathbb{R}$. However many applications of the ideas involve $\mathbb{R}^r$ or $]-\pi, \pi]$ or $S^1$. The move to $\mathbb{R}^r$ should be elementary. The move to $S^1$ does require a reformulation of the basic result 255A/255N. It should also be clear that there will be no new difficulties in moving to $]-\pi, \pi]^r$ or $(S^1)^r$. Moreover, we can also go through the whole theory for the groups $\mathbb{Z}$ and $\mathbb{Z}^r$, where the appropriate measure is now counting measure, so that $\mathcal{L}^0_{\mathbb{C}}$ becomes identified with $\mathbb{C}^{\mathbb{Z}}$ or $\mathbb{C}^{\mathbb{Z}^r}$ (255Xe, 255Yo).
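For the group $\mathbb{Z}$ with counting measure, the convolution of 255Xe can be computed exactly when the sequences have finite support. The Python sketch below does this and checks, on three arbitrarily chosen sequences, the commutativity, associativity and norm inequalities listed in that exercise. The representation of a sequence as a dictionary from indices to non-zero values, and the names `convolve`, `norm1`, `norm2` and `norm_sup`, are assumptions made for this illustration only.

```python
from collections import defaultdict
import math

def convolve(a, b):
    """(a*b)(n) = sum_i a(n-i) b(i) for finitely supported a, b : Z -> C.

    A sequence is a dict mapping an integer index to its non-zero value.
    """
    out = defaultdict(complex)
    for i, a_i in a.items():
        for j, b_j in b.items():
            out[i + j] += a_i * b_j   # the term a(i) b(j) contributes at n = i + j
    return {n: v for n, v in out.items() if v != 0}

def norm1(a):
    return sum(abs(v) for v in a.values())

def norm2(a):
    return math.sqrt(sum(abs(v) ** 2 for v in a.values()))

def norm_sup(a):
    return max((abs(v) for v in a.values()), default=0.0)

# Three finitely supported sequences (arbitrary illustrative data).
a = {-1: 1, 0: 2, 3: -1}
b = {0: 1j, 2: 3}
c = {-2: 1, 1: -1}

ab = convolve(a, b)
print(ab == convolve(b, a))                                   # commutativity
print(convolve(a, convolve(b, c)) == convolve(ab, c))         # associativity
print(norm1(ab) <= norm1(a) * norm1(b))                       # l^1 bound
print(norm_sup(ab) <= norm2(a) * norm2(b))                    # l^2 x l^2 -> l^infty bound
```

The entries here are small integers (real or imaginary), so the floating-point arithmetic is exact and dictionary equality is a fair test; with general data one would compare up to a tolerance instead.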
256 Radon measures on $\mathbb{R}^r$

In the next section, and again in Chapters 27 and 28, we need to consider the principal class of measures on Euclidean spaces. For a proper discussion of this class, and the interrelationships between the measures and the topologies involved, we must wait until Volume 4. For the moment, therefore, I present definitions adapted to the case in hand, warning you that the correct generalizations are not quite obvious. I give the definition (256A) and a characterization (256C) of Radon measures on Euclidean spaces, and theorems on the construction of Radon measures as indefinite integrals (256E, 256J), as image measures (256G) and as product measures (256K). In passing I give a version of Lusin's theorem concerning measurable functions on Radon measure spaces (256F).
256A Definitions Let $\nu$ be a measure on $\mathbb{R}^r$, where $r \ge 1$, and $\Sigma$ its domain.

(a) $\nu$ is a topological measure if every open set belongs to $\Sigma$. Note that in this case every Borel set, and in particular every closed set, belongs to $\Sigma$.

(b) $\nu$ is locally finite if every bounded set has finite outer measure.

(c) If $\nu$ is a topological measure, it is inner regular with respect to the compact sets if

$$\nu E = \sup\{\nu K : K \subseteq E \text{ is compact}\}$$

for every $E \in \Sigma$. (Because $\nu$ is a topological measure, and compact sets are closed (2A2Ec), $\nu K$ is defined for every compact set $K$.)

(d) $\nu$ is a Radon measure if it is a complete locally finite topological measure which is inner regular with respect to the compact sets.

256B It will be convenient to be able to call on the following elementary facts.

Lemma Let $\nu$ be a Radon measure on $\mathbb{R}^r$, and $\Sigma$ its domain.

(a) $\nu$ is $\sigma$-finite.

(b) For any $E \in \Sigma$ and any $\epsilon > 0$ there are a closed set $F \subseteq E$ and an open set $G \supseteq E$ such that $\nu(G \setminus F) \le \epsilon$.

(c) For every $E \in \Sigma$ there is a set $H \subseteq E$, expressible as the union of a sequence of compact sets, such that $\nu(E \setminus H) = 0$.

(d) Every continuous real-valued function on $\mathbb{R}^r$ is $\Sigma$-measurable.

(e) If $h : \mathbb{R}^r \to \mathbb{R}$ is continuous and has bounded support, then $h$ is $\nu$-integrable.

proof (a) For each $n \in \mathbb{N}$, $B(0, n) = \{x : \|x\| \le n\}$ is a closed bounded set, therefore Borel. So if $\nu$ is a Radon measure on $\mathbb{R}^r$, $\langle B(0, n)\rangle_{n\in\mathbb{N}}$ is a cover of $\mathbb{R}^r$ by a sequence of sets of finite measure.

(b) Set $E_n = \{x : x \in E, n \le \|x\| < n + 1\}$ for each $n$. Then $\nu E_n < \infty$, so there is a compact set $K_n \subseteq E_n$ such that $\nu K_n \ge \nu E_n - 2^{-n-2}\epsilon$. Set $F = \bigcup_{n\in\mathbb{N}} K_n$; then

$$\nu(E \setminus F) = \sum_{n=0}^{\infty} \nu(E_n \setminus K_n) \le \tfrac12\epsilon.$$

Also $F \subseteq E$ and $F$ is closed because

$$F \cap B(0, n) = \bigcup_{i\le n} K_i \cap B(0, n)$$

is closed for each $n$. In the same way, there is a closed set $F' \subseteq \mathbb{R}^r \setminus E$ such that $\nu((\mathbb{R}^r \setminus E) \setminus F') \le \frac12\epsilon$. Setting $G = \mathbb{R}^r \setminus F'$, we see that $G$ is open, that $G \supseteq E$ and that $\nu(G \setminus E) \le \frac12\epsilon$, so that $\nu(G \setminus F) \le \epsilon$, as required.

(c) By (b), we can choose for each $n \in \mathbb{N}$ a closed set $F_n \subseteq E$ such that $\nu(E \setminus F_n) \le 2^{-n}$. Set $H = \bigcup_{n\in\mathbb{N}} F_n$; then $H \subseteq E$ and $\nu(E \setminus H) = 0$, and also $H = \bigcup_{m,n\in\mathbb{N}} B(0, m) \cap F_n$ is a countable union of compact sets.

(d) If $h : \mathbb{R}^r \to \mathbb{R}$ is continuous, all the sets $\{x : h(x) > a\}$ are open, so belong to $\Sigma$.

(e) By (d), $h$ is measurable. Now we are supposing that there is some $n \in \mathbb{N}$ such that $h(x) = 0$ whenever $x \notin B(0, n)$. Since $B(0, n)$ is compact (2A2F), $h$ is bounded on $B(0, n)$ (2A2G), and we have $|h| \le \gamma\chi B(0, n)$ for some $\gamma$; since $\nu B(0, n)$ is finite, $h$ is $\nu$-integrable.

256C Theorem A measure $\nu$ on $\mathbb{R}^r$ is a Radon measure iff it is the completion of a locally finite measure defined on the $\sigma$-algebra $\mathcal{B}$ of Borel subsets of $\mathbb{R}^r$.

proof (a) Suppose first that $\nu$ is a Radon measure. Write $\Sigma$ for its domain.

(i) Set $\nu_0 = \nu \upharpoonright \mathcal{B}$. Then $\nu_0$ is a measure with domain $\mathcal{B}$, and it is locally finite because $\nu_0 B(0, n) = \nu B(0, n)$ is finite for every $n$. Let $\hat\nu_0$ be the completion of $\nu_0$ (212C).
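(ii) If $\hat\nu_0$ measures $E$, there are $E_1, E_2 \in \mathcal{B}$ such that $E_1 \subseteq E \subseteq E_2$ and $\nu_0(E_2 \setminus E_1) = 0$. Now $E \setminus E_1 \subseteq E_2 \setminus E_1$ must be $\nu$-negligible; as $\nu$ is complete, $E \in \Sigma$ and

$$\nu E = \nu E_1 = \nu_0 E_1 = \hat\nu_0 E.$$

(iii) If $E \in \Sigma$, then by 256Bc there is a Borel set $H \subseteq E$ such that $\nu(E \setminus H) = 0$. Equally, there is a Borel set $H' \subseteq \mathbb{R}^r \setminus E$ such that $\nu((\mathbb{R}^r \setminus E) \setminus H') = 0$, so that we have $H \subseteq E \subseteq \mathbb{R}^r \setminus H'$ and

$$\nu_0((\mathbb{R}^r \setminus H') \setminus H) = \nu((\mathbb{R}^r \setminus H') \setminus H) = 0.$$

So $\hat\nu_0 E$ is defined and equal to $\nu_0 H = \nu E$.

This shows that $\nu = \hat\nu_0$ is the completion of the locally finite Borel measure $\nu \upharpoonright \mathcal{B}$. And this is true for any Radon measure $\nu$ on $\mathbb{R}^r$.

(b) For the rest of the proof, I suppose that $\nu_0$ is a locally finite measure on $\mathbb{R}^r$ and $\nu$ is its completion. Write $\Sigma$ for the domain of $\nu$. We say that a subset of $\mathbb{R}^r$ is a $K_\sigma$ set if it is expressible as the union of a sequence of compact sets. Note that every $K_\sigma$ set is a Borel set, so belongs to $\Sigma$. Set

$$\mathcal{A} = \{E : E \in \Sigma, \text{ there is a } K_\sigma \text{ set } H \subseteq E \text{ such that } \nu(E \setminus H) = 0\},$$

$$\Sigma = \{E : E \in \mathcal{A},\ \mathbb{R}^r \setminus E \in \mathcal{A}\}.$$

(c)(i) Every open set is itself a $K_\sigma$ set, so belongs to $\mathcal{A}$. $\mathbf{P}$ Let $G \subseteq \mathbb{R}^r$ be open. If $G = \emptyset$ then $G$ is compact and the result is trivial. Otherwise, let $\mathcal{I}$ be the set of closed intervals of the form $[q, q']$, where $q, q' \in \mathbb{Q}^r$, which are included in $G$. Then all the members of $\mathcal{I}$ are closed and bounded, therefore compact. If $x \in G$, there is a $\delta > 0$ such that $B(x, \delta) = \{y : \|y - x\| \le \delta\} \subseteq G$; now there is an $I \in \mathcal{I}$ such that $x \in I \subseteq B(x, \delta)$. Thus $G = \bigcup\mathcal{I}$. But $\mathcal{I}$ is countable, so $G$ is $K_\sigma$. $\mathbf{Q}$

(ii) Every closed subset of $\mathbb{R}^r$ is $K_\sigma$, so belongs to $\mathcal{A}$. $\mathbf{P}$ If $F \subseteq \mathbb{R}^r$ is closed, then $F = \bigcup_{n\in\mathbb{N}} F \cap B(0, n)$; but every $F \cap B(0, n)$ is closed and bounded, therefore compact. $\mathbf{Q}$

(iii) If $\langle E_n\rangle_{n\in\mathbb{N}}$ is any sequence in $\mathcal{A}$, then $E = \bigcup_{n\in\mathbb{N}} E_n$ belongs to $\mathcal{A}$. $\mathbf{P}$ For each $n \in \mathbb{N}$ we have a countable family $\mathcal{K}_n$ of compact subsets of $E_n$ such that $\nu(E_n \setminus \bigcup\mathcal{K}_n) = 0$; now $\mathcal{K} = \bigcup_{n\in\mathbb{N}} \mathcal{K}_n$ is a countable family of compact subsets of $E$, and $E \setminus \bigcup\mathcal{K} \subseteq \bigcup_{n\in\mathbb{N}} (E_n \setminus \bigcup\mathcal{K}_n)$ is $\nu$-negligible. $\mathbf{Q}$

(iv) If $\langle E_n\rangle_{n\in\mathbb{N}}$ is any sequence in $\mathcal{A}$, then $F = \bigcap_{n\in\mathbb{N}} E_n \in \mathcal{A}$. $\mathbf{P}$ For each $n \in \mathbb{N}$, let $\langle K_{ni}\rangle_{i\in\mathbb{N}}$ be a sequence of compact subsets of $E_n$ such that $\nu(E_n \setminus \bigcup_{i\in\mathbb{N}} K_{ni}) = 0$. Set $K'_{nj} = \bigcup_{i\le j} K_{ni}$ for each $j$, so that

$$\nu(E_n \cap H) = \lim_{j\to\infty} \nu(K'_{nj} \cap H)$$

for every $H \in \Sigma$. Now, for each $m, n \in \mathbb{N}$, choose $j(m, n)$ such that

$$\nu(E_n \cap B(0, m) \cap K'_{n,j(m,n)}) \ge \nu(E_n \cap B(0, m)) - 2^{-(m+n)}.$$

Set $K_m = \bigcap_{n\in\mathbb{N}} K'_{n,j(m,n)}$; then $K_m$ is closed (being an intersection of closed sets) and bounded (being a subset of $K'_{0,j(m,0)}$), therefore compact. Also $K_m \subseteq F$, because $K'_{n,j(m,n)} \subseteq E_n$ for each $n$, and

$$\nu(F \cap B(0, m) \setminus K_m) \le \sum_{n=0}^{\infty} \nu(E_n \cap B(0, m) \setminus K'_{n,j(m,n)}) \le \sum_{n=0}^{\infty} 2^{-(m+n)} = 2^{-m+1}.$$

Consequently $H = \bigcup_{m\in\mathbb{N}} K_m$ is a $K_\sigma$ subset of $F$ and

$$\nu(F \cap B(0, m) \setminus H) \le \inf_{k\ge m} \nu(F \cap B(0, k) \setminus K_k) = 0$$

for every $m$, so $\nu(F \setminus H) = 0$ and $F \in \mathcal{A}$. $\mathbf{Q}$

(d) $\Sigma$ is a $\sigma$-algebra of subsets of $\mathbb{R}^r$. $\mathbf{P}$ (i) $\emptyset$ and its complement are open, so belong to $\mathcal{A}$ and therefore to $\Sigma$. (ii) If $E \in \Sigma$ then both $\mathbb{R}^r \setminus E$ and $\mathbb{R}^r \setminus (\mathbb{R}^r \setminus E) = E$ belong to $\mathcal{A}$, so $\mathbb{R}^r \setminus E \in \Sigma$. (iii) Let $\langle E_n\rangle_{n\in\mathbb{N}}$ be a sequence in $\Sigma$ with union $E$. By (ciii) and (civ),

$$E \in \mathcal{A}, \qquad \mathbb{R}^r \setminus E = \bigcap_{n\in\mathbb{N}} (\mathbb{R}^r \setminus E_n) \in \mathcal{A},$$

so $E \in \Sigma$. $\mathbf{Q}$

(e) By (ci) and (cii), every open set belongs to $\Sigma$; consequently every Borel set belongs to $\Sigma$ and therefore to $\mathcal{A}$. Now if $E$ is any member of $\Sigma$, there is a Borel set $E_1 \subseteq E$ such that $\nu(E \setminus E_1) = 0$ and a $K_\sigma$ set $H \subseteq E_1$ such that $\nu(E_1 \setminus H) = 0$. Express $H$ as $\bigcup_{n\in\mathbb{N}} K_n$ where every $K_n$ is compact; then

$$\nu E = \nu H = \lim_{n\to\infty} \nu\Bigl(\bigcup_{i\le n} K_i\Bigr) \le \sup_{K \subseteq E \text{ is compact}} \nu K \le \nu E$$

because $\bigcup_{i\le n} K_i$ is a compact subset of $E$ for every $n$.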
(f ) Thus ν is inner regular with respect to the compact sets. But of course it is complete (being the completion of ν0 ) and a locally finite topological measure (because ν0 is); so it is a Radon measure. This completes the proof. 256D Proposition If ν and ν 0 are two Radon measures on R r , the following are equiveridical: (i) ν = ν 0 ; (ii) νK = ν 0 K for every compact set K ⊆ R r ; (iii) RνG = ν 0 GR for every open set G ⊆ R r ; (iv) h dν = h dν 0 for every continuous function h : R r → R with bounded support. proof (a)(i)⇒(iv) is trivial. (b)(iv)⇒(iii) If (iv) is true, and G ⊆ R r is an open set, then for each n ∈ N set hn (x) = min(1, 2n inf y∈R r \(G∩B(0,n)) ky − xk) for x ∈ R r . RThen hn is (in fact hn (x) − hn (x0 ) ≤ 2n kx − x0 k for all x, x0 ∈ R r ) and zero outside R continuous 0 B(0, n), so hn dν = hn dν . Next, hhn (x)in∈N is a nondecreasing sequence converging to χG(x) for every x ∈ R r . So νG = limn→∞
R
hn dν = limn→∞
R
hn dν 0 = ν 0 G,
by 135Ga. As G is arbitrary, (iii) is true. (c)(iii)⇒(ii) If (iii) is true, and K ⊆ R r is compact, let n be so large that kxk < n for every x ∈ K. Set G = {x : kxk < n}, H = G \ K. Then G and H are open and G is bounded, so νG = ν 0 G is finite, and νK = νG − νH = ν 0 G − ν 0 H = ν 0 K. As K is arbitrary, (ii) is true. (d)(ii)⇒(i) If ν, ν 0 agree on the compact sets, then νE = supK⊆E
is compact
νK = supK⊆E
is compact
ν 0K = ν 0E
for every Borel set $E$. So $\nu\restriction\mathcal{B} = \nu'\restriction\mathcal{B}$, where $\mathcal{B}$ is the algebra of Borel sets. But since $\nu$ and $\nu'$ are both the completions of their restrictions to $\mathcal{B}$, they are identical.

256E It is I suppose time I gave some examples of Radon measures. However it will save a few lines if I first establish some basic constructions. You may wish to glance ahead to 256H at this point.

Theorem Let $\nu$ be a Radon measure on $\mathbb{R}^r$, with domain $\Sigma$, and $f$ a non-negative $\Sigma$-measurable function defined on a $\nu$-conegligible subset of $\mathbb{R}^r$. Suppose that $f$ is locally integrable in the sense that $\int_E f\,d\nu < \infty$ for every bounded set $E$. Then the indefinite-integral measure $\nu'$ on $\mathbb{R}^r$ defined by saying that
$$\nu' E = \int_E f\,d\nu \text{ whenever } E \cap \{x : x \in \operatorname{dom} f,\ f(x) > 0\} \in \Sigma$$
is a Radon measure on $\mathbb{R}^r$.

proof For the construction of $\nu'$, see 234A-234D. It is a topological measure because every open set belongs to $\Sigma$ and therefore to the domain $\Sigma'$ of $\nu'$. $\nu'$ is locally finite because $f$ is locally integrable. To see that $\nu'$ is inner regular with respect to the compact sets, take any set $E \in \Sigma'$, and set $E' = \{x : x \in E \cap \operatorname{dom} f,\ f(x) > 0\}$. Then $E' \in \Sigma$, so there is a set $H \subseteq E'$, expressible as the union of a sequence of compact sets, such that $\nu(E' \setminus H) = 0$. In this case
$$\nu'(E \setminus H) = \int_{E \setminus H} f\,d\nu = 0.$$
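Let $\langle K_n\rangle_{n\in\mathbb{N}}$ be a sequence of compact sets with union $H$; then
$$\nu' E = \nu' H = \lim_{n\to\infty} \nu'\bigl(\textstyle\bigcup_{i \le n} K_i\bigr) \le \sup_{K \subseteq E \text{ is compact}} \nu' K \le \nu' E.$$
As $E$ is arbitrary, $\nu'$ is inner regular with respect to the compact sets.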
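256F Theorem Let $\nu$ be a Radon measure on $\mathbb{R}^r$, and $\Sigma$ its domain. Let $f : D \to \mathbb{R}$ be a $\Sigma$-measurable function, where $D \subseteq \mathbb{R}^r$. Then for every $\epsilon > 0$ there is a closed set $F \subseteq \mathbb{R}^r$ such that $\nu(\mathbb{R}^r \setminus F) \le \epsilon$ and $f\restriction F$ is continuous.

proof By 121I, there is a $\Sigma$-measurable function $h : \mathbb{R}^r \to \mathbb{R}$ extending $f$. Enumerate $\mathbb{Q}$ as $\langle q_n\rangle_{n\in\mathbb{N}}$. For each $n \in \mathbb{N}$ set $E_n = \{x : h(x) \le q_n\}$, $E_n' = \{x : h(x) > q_n\}$ and use 256Bb to choose closed sets $F_n \subseteq E_n$, $F_n' \subseteq E_n'$ such that $\nu(E_n \setminus F_n) \le 2^{-n-2}\epsilon$, $\nu(E_n' \setminus F_n') \le 2^{-n-2}\epsilon$. Set $F = \bigcap_{n\in\mathbb{N}} (F_n \cup F_n')$; then $F$ is closed and
$$\nu(\mathbb{R}^r \setminus F) \le \sum_{n=0}^\infty \nu(\mathbb{R}^r \setminus (F_n \cup F_n')) \le \sum_{n=0}^\infty \bigl(\nu(E_n \setminus F_n) + \nu(E_n' \setminus F_n')\bigr) \le \epsilon.$$

I claim that $h\restriction F$ is continuous. P Suppose that $x \in F$ and $\delta > 0$. Then there are $m$, $n \in \mathbb{N}$ such that
$$h(x) - \delta \le q_m < h(x) \le q_n \le h(x) + \delta.$$
This means that $x \in E_m' \cap E_n$; consequently $x \notin F_m \cup F_n'$. Because $F_m \cup F_n'$ is closed, there is an $\eta > 0$ such that $y \notin F_m \cup F_n'$ whenever $\|y - x\| \le \eta$. Now suppose that $y \in F$ and $\|y - x\| \le \eta$. Then $y \in (F_m \cup F_m') \cap (F_n \cup F_n')$ and $y \notin F_m \cup F_n'$, so $y \in F_m' \cap F_n \subseteq E_m' \cap E_n$ and $q_m < h(y) \le q_n$. Consequently $|h(y) - h(x)| \le \delta$. As $x$ and $\delta$ are arbitrary, $h\restriction F$ is continuous. Q

Consequently $f\restriction F = (h\restriction F)\restriction D$ is continuous, as required.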
256G Theorem Let $\nu$ be a Radon measure on $\mathbb{R}^r$, with domain $\Sigma$, and suppose that $\phi : \mathbb{R}^r \to \mathbb{R}^s$ is measurable in the sense that all its coordinates are $\Sigma$-measurable. If the image measure $\nu' = \nu\phi^{-1}$ (112F) is locally finite, it is a Radon measure.

proof Write $\Sigma$ for the domain of $\nu$ and $\Sigma'$ for the domain of $\nu'$. If $\phi = (\phi_1, \ldots, \phi_s)$, then
$$\phi^{-1}[\{y : \eta_j \le \alpha\}] = \{x : \phi_j(x) \le \alpha\} \in \Sigma,$$
so $\{y : \eta_j \le \alpha\} \in \Sigma'$ for every $j \le s$, $\alpha \in \mathbb{R}$, where I write $y = (\eta_1, \ldots, \eta_s)$ for $y \in \mathbb{R}^s$. Consequently every Borel subset of $\mathbb{R}^s$ belongs to $\Sigma'$ (121J), and $\nu'$ is a topological measure. It is complete because if $F$ is $\nu'$-negligible, and $H \subseteq F$, then $\phi^{-1}[H] \subseteq \phi^{-1}[F]$ is $\nu$-negligible, therefore belongs to $\Sigma$ (cf. 211Xd).

The point is of course that $\nu'$ is inner regular with respect to the compact sets. P Suppose that $F \in \Sigma'$ and that $\gamma < \nu' F$. For each $j \le s$, there is a closed set $H_j \subseteq \mathbb{R}^r$ such that $\phi_j\restriction H_j$ is continuous and $\nu(\mathbb{R}^r \setminus H_j) < \frac{1}{s}(\nu' F - \gamma)$, by 256F. Set $H = \bigcap_{j \le s} H_j$; then $H$ is closed and $\phi\restriction H$ is continuous and
$$\nu(\mathbb{R}^r \setminus H) < \nu' F - \gamma = \nu\phi^{-1}[F] - \gamma,$$
so that $\nu(\phi^{-1}[F] \cap H) > \gamma$. Let $K \subseteq \phi^{-1}[F] \cap H$ be a compact set such that $\nu K \ge \gamma$, and set $L = \phi[K]$. Because $K \subseteq H$ and $\phi\restriction H$ is continuous, $L$ is compact (2A2Eb). Of course $L \subseteq F$, and
$$\nu' L = \nu\phi^{-1}[L] \ge \nu K \ge \gamma.$$
As $F$ and $\gamma$ are arbitrary, $\nu'$ is inner regular with respect to the compact sets. Q

Since $\nu'$ is locally finite by the hypothesis of the theorem, it is a Radon measure.

256H Examples I come at last to the promised examples.

(a) Lebesgue measure on $\mathbb{R}^r$ is a Radon measure. (It is a topological measure by 115G, and inner regular with respect to the compact sets by 134Fb.)

(b) Let $\langle t_n\rangle_{n\in\mathbb{N}}$ be any sequence in $\mathbb{R}^r$, and $\langle a_n\rangle_{n\in\mathbb{N}}$ any summable sequence in $[0,\infty[$. For every $E \subseteq \mathbb{R}^r$ set
$$\nu E = \sum \{a_n : t_n \in E\},$$
so that $\nu$ is a totally finite point-supported measure. Then $\nu$ is a (totally finite) Radon measure on $\mathbb{R}^r$. P Clearly $\nu$ is complete and defined on every Borel set and gives finite measure to bounded sets. To see that it is inner regular with respect to the compact sets, observe that for any $E \subseteq \mathbb{R}^r$ the sets
$$K_n = E \cap \{t_i : i \le n\}$$
are compact and $\nu E = \lim_{n\to\infty} \nu K_n$. Q
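The point-supported measure of 256Hb is concrete enough to compute with. What follows is only a minimal sketch, assuming $r = 1$ and sequences truncated to finitely many terms; the names `point_supported_measure`, `in_E` and `nu_K`, and the particular choices $t_n = 1/(n+1)$, $a_n = 2^{-n-1}$, are illustrative assumptions and not taken from the text. It checks that $\nu K_n$ increases to $\nu E$ for the finite (hence compact) sets $K_n = E \cap \{t_i : i \le n\}$.

```python
from fractions import Fraction

def point_supported_measure(points, weights):
    """Return nu with nu(E) = sum of a_n over those n with t_n in E.

    E is passed as a membership predicate on points.
    """
    def nu(indicator):
        return sum(a for t, a in zip(points, weights) if indicator(t))
    return nu

# Hypothetical data: t_n = 1/(n+1), a_n = 2^-(n+1), truncated at N terms.
N = 30
points = [Fraction(1, n + 1) for n in range(N)]
weights = [Fraction(1, 2 ** (n + 1)) for n in range(N)]
nu = point_supported_measure(points, weights)

def in_E(t):
    """Membership in the (illustrative) set E = ]-oo, 1/3]."""
    return t <= Fraction(1, 3)

def nu_K(n):
    """nu(K_n) for the finite set K_n = E ∩ {t_i : i <= n}."""
    K = {t for t in points[: n + 1] if in_E(t)}
    return nu(lambda t: t in K)

# nu(K_n) is non-decreasing in n and reaches nu(E) once all atoms are listed.
approximations = [nu_K(n) for n in range(N)]
```

Exact rational arithmetic is used so that the comparison of $\nu K_n$ with $\nu E$ is not blurred by rounding.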
(c) Now we come to a new idea. Recall that the Cantor set $C$ (134G) is a closed negligible subset of $[0,1]$, and that the Cantor function (134H) is a non-decreasing continuous function $f : [0,1] \to [0,1]$ such that $f(0) = 0$, $f(1) = 1$ and $f$ is constant on each of the intervals composing $[0,1] \setminus C$. It follows that if we set $g(x) = \frac12(x + f(x))$ for $x \in [0,1]$, then $g : [0,1] \to [0,1]$ is a continuous bijection such that the Lebesgue measure of $g[C]$ is $\frac12$ (134I); consequently $g^{-1} : [0,1] \to [0,1]$ is continuous. Now extend $g$ to a bijection $h : \mathbb{R} \to \mathbb{R}$ by setting $h(x) = x$ for $x \in \mathbb{R} \setminus [0,1]$. Then $h$ and $h^{-1}$ are continuous. Note that $h[C] = g[C]$ has Lebesgue measure $\frac12$. Let $\nu_1$ be the indefinite-integral measure defined from Lebesgue measure $\mu$ on $\mathbb{R}$ and the function $2\chi(h[C])$; that is, $\nu_1 E = 2\mu(E \cap h[C])$ whenever this is defined. By 256E, $\nu_1$ is a Radon measure, and $\nu_1 h[C] = \nu_1 \mathbb{R} = 1$. Let $\nu$ be the measure $\nu_1 h$, that is, $\nu E = \nu_1 h[E]$ for just those $E \subseteq \mathbb{R}$ such that $h[E] \in \operatorname{dom} \nu_1$. Then $\nu$ is a Radon probability measure on $\mathbb{R}$, by 256G, and $\nu C = 1$, $\nu(\mathbb{R} \setminus C) = \mu C = 0$.

256I Remarks (a) The measure $\nu$ of 256Hc, sometimes called Cantor measure, is a classic example, and as such has many constructions, some rather more natural than the one I use here (see 256Xk, and also 264Ym below). But I choose the method above because it yields directly, without further investigation or any appeal to more advanced general theory, the fact that $\nu$ is a Radon measure.

(b) The examples above are chosen to represent the extremes under the 'Lebesgue decomposition' described in 232I. If $\nu$ is a (totally finite) Radon measure on $\mathbb{R}^r$, we can use 232Ib to express its restriction $\nu\restriction\mathcal{B}$ to the Borel $\sigma$-algebra as $\nu_p + \nu_{ac} + \nu_{cs}$, where $\nu_p$ is the 'point-mass' or 'atomic' part of $\nu\restriction\mathcal{B}$, $\nu_{ac}$ is the 'absolutely continuous' part (with respect to Lebesgue measure), and $\nu_{cs}$ is the 'atomless singular part'.
In the example of 256Hb, we have $\nu\restriction\mathcal{B} = \nu_p$; in 256E, if we start from Lebesgue measure, we have $\nu\restriction\mathcal{B} = \nu_{ac}$; and in 256Hc we have $\nu\restriction\mathcal{B} = \nu_{cs}$.

256J Absolutely continuous Radon measures It is worth pausing a moment over the indefinite-integral measures described in 256E.

Proposition Let $\nu$ be a Radon measure on $\mathbb{R}^r$, where $r \ge 1$, and write $\mu$ for Lebesgue measure on $\mathbb{R}^r$. Then the following are equiveridical:
(i) $\nu$ is an indefinite-integral measure over $\mu$;
(ii) $\nu E = 0$ whenever $E$ is a Borel subset of $\mathbb{R}^r$ and $\mu E = 0$.
In this case, if $g \in \mathcal{L}^0(\mu)$ and $\int_E g\,d\mu = \nu E$ for every Borel set $E \subseteq \mathbb{R}^r$, then $g$ is a Radon-Nikodým derivative of $\nu$ with respect to $\mu$ in the sense of 234B.

proof (a)(i)$\Rightarrow$(ii) If $f$ is a Radon-Nikodým derivative of $\nu$ with respect to $\mu$, then of course
$$\nu E = \int_E f\,d\mu = 0$$
whenever $\mu E = 0$.

(ii)$\Rightarrow$(i) If $\nu E = 0$ for every $\mu$-negligible Borel set $E$, then $\nu E$ is defined and equal to $0$ for every $\mu$-negligible set $E$, because $\nu$ is complete and any $\mu$-negligible set is included in a $\mu$-negligible Borel set. Consequently $\operatorname{dom} \nu$ includes the domain $\Sigma$ of $\mu$, since every Lebesgue measurable set is expressible as the union of a Borel set and a negligible set.

For each $n \in \mathbb{N}$ set $E_n = \{x : n \le \|x\| < n + 1\}$, so that $\langle E_n\rangle_{n\in\mathbb{N}}$ is a partition of $\mathbb{R}^r$ into bounded Borel sets. Set $\nu_n E = \nu(E \cap E_n)$ for every Lebesgue measurable set $E$ and every $n \in \mathbb{N}$. Now $\nu_n$ is absolutely continuous with respect to $\mu$ (232Ba), so by the Radon-Nikodým theorem (232F) there is a $\mu$-integrable function $f_n$ such that $\int_E f_n\,d\mu = \nu_n E$ for every Lebesgue measurable set $E$. Because $\nu_n E \ge 0$ for every $E \in \Sigma$, $f_n \ge_{\text{a.e.}} 0$; because $\nu_n(\mathbb{R}^r \setminus E_n) = 0$, $f_n = 0$ a.e. on $\mathbb{R}^r \setminus E_n$. Now if we set
$$f = \max\bigl(0, \textstyle\sum_{n=0}^\infty f_n\bigr),$$
$f$ will be defined $\mu$-a.e. and we shall have
$$\int_E f\,d\mu = \sum_{n=0}^\infty \int_E f_n\,d\mu = \sum_{n=0}^\infty \nu(E \cap E_n) = \nu E$$
for every Borel set $E$, so that the indefinite-integral measure $\nu'$ defined by $f$ and $\mu$ agrees with $\nu$ on the Borel sets. Since this ensures that $\nu'$ is locally finite, $\nu'$ is a Radon measure, by 256E, and is equal to $\nu$, by 256D. Accordingly $\nu$ is an indefinite-integral measure over $\mu$.
(b) As in (a-ii) above, $g$ must be locally integrable and the indefinite-integral measure defined by $g$ agrees with $\nu$ on the Borel sets, so is identical with $\nu$.

256K Products The class of Radon measures on Euclidean spaces is stable under a wide variety of operations, as we have already seen; in particular, we have the following.

Theorem Let $\nu_1$, $\nu_2$ be Radon measures on $\mathbb{R}^r$ and $\mathbb{R}^s$ respectively, where $r$, $s \ge 1$. Let $\lambda$ be their c.l.d. product measure on $\mathbb{R}^r \times \mathbb{R}^s$. Then $\lambda$ is a Radon measure.

Remark When I say that $\lambda$ is 'Radon' according to the definition in 256A, I am of course identifying $\mathbb{R}^r \times \mathbb{R}^s$ with $\mathbb{R}^{r+s}$, as in 251L-251M.

proof (a) I hope the following rather voluminous notation will seem natural. Write $\Sigma_1$, $\Sigma_2$ for the domains of $\nu_1$, $\nu_2$; $\mathcal{B}_r$, $\mathcal{B}_s$ for the Borel $\sigma$-algebras of $\mathbb{R}^r$, $\mathbb{R}^s$; $\Lambda$ for the domain of $\lambda$; and $\mathcal{B}$ for the Borel $\sigma$-algebra of $\mathbb{R}^{r+s}$.

Because each $\nu_i$ is the completion of its restriction to the Borel sets (256C), $\lambda$ is the product of $\nu_1\restriction\mathcal{B}_r$ and $\nu_2\restriction\mathcal{B}_s$ (251S). Because $\nu_1\restriction\mathcal{B}_r$ and $\nu_2\restriction\mathcal{B}_s$ are $\sigma$-finite (256Ba, 212Ga), $\lambda$ must be the completion of its restriction to $\mathcal{B}_r \widehat{\otimes} \mathcal{B}_s$, which by 251L is identified with $\mathcal{B}$. Setting $Q_n = \{(x, y) : \|x\| \le n,\ \|y\| \le n\}$ we have
$$\lambda Q_n = \nu_1\{x : \|x\| \le n\} \cdot \nu_2\{y : \|y\| \le n\} < \infty$$
for every $n$, while every bounded subset of $\mathbb{R}^{r+s}$ is included in some $Q_n$. So $\lambda\restriction\mathcal{B}$ is locally finite, and its completion $\lambda$ is a Radon measure, by 256C.

256L Remark We see from 253I that if $\nu_1$ and $\nu_2$ are Radon measures on $\mathbb{R}^r$ and $\mathbb{R}^s$ respectively, and are both indefinite-integral measures over Lebesgue measure, then their product measure on $\mathbb{R}^{r+s}$ is also an indefinite-integral measure over Lebesgue measure.

*256M For the sake of applications in §286 below, I include another result, which is in fact one of the fundamental properties of Radon measures, as will appear in §414.

Proposition Let $\nu$ be a Radon measure on $\mathbb{R}^r$, and $D$ any subset of $\mathbb{R}^r$. Let $\Phi$ be a non-empty upwards-directed family of non-negative continuous functions from $D$ to $\mathbb{R}$.
For $x \in D$ set $g(x) = \sup_{f \in \Phi} f(x)$ in $[0, \infty]$. Then
(a) $g : D \to [0, \infty]$ is lower semi-continuous, therefore Borel measurable;
(b) $\int_D g\,d\nu = \sup_{f \in \Phi} \int_D f\,d\nu$.

proof (a) For any $u \in [-\infty, \infty]$,
$$\{x : x \in D,\ g(x) > u\} = \bigcup_{f \in \Phi} \{x : x \in D,\ f(x) > u\}$$
is an open set for the subspace topology on $D$ (2A3C), so is the intersection of $D$ with a Borel subset of $\mathbb{R}^r$. This is enough to show that $g$ is Borel measurable (121B-121C).

(b) Accordingly $\int_D g\,d\nu$ will be defined in $[0, \infty]$, and of course $\int_D g\,d\nu \ge \sup_{f \in \Phi} \int_D f\,d\nu$.

For the reverse inequality, observe that there is a countable set $\Psi \subseteq \Phi$ such that $g(x) = \sup_{f \in \Psi} f(x)$ for every $x \in D$. P For $a \in \mathbb{Q}$, $q$, $q' \in \mathbb{Q}^r$ set
$$\Phi_{aqq'} = \{f : f \in \Phi,\ f(y) > a \text{ whenever } y \in D \cap [q, q']\},$$
interpreting $[q, q']$ as in 115G. Choose $f_{aqq'} \in \Phi_{aqq'}$ if $\Phi_{aqq'}$ is not empty, and arbitrarily in $\Phi$ otherwise; and set
$$\Psi = \{f_{aqq'} : a \in \mathbb{Q},\ q, q' \in \mathbb{Q}^r\},$$
so that $\Psi$ is a countable subset of $\Phi$. If $x \in D$ and $b < g(x)$, there is an $a \in \mathbb{Q}$ such that $b \le a < g(x)$; there is an $\hat f \in \Phi$ such that $\hat f(x) > a$; because $\hat f$ is continuous, there are $q$, $q' \in \mathbb{Q}^r$ such that $q \le x \le q'$ and $\hat f(y) > a$ whenever $y \in D \cap [q, q']$; so that $\hat f \in \Phi_{aqq'}$, $\Phi_{aqq'} \ne \emptyset$, $f_{aqq'} \in \Phi_{aqq'}$ and $\sup_{f \in \Psi} f(x) \ge f_{aqq'}(x) \ge b$. As $b$ is arbitrary, $g(x) = \sup_{f \in \Psi} f(x)$. Q

Let $\langle f_n\rangle_{n\in\mathbb{N}}$ be a sequence running over $\Psi$. Because $\Phi$ is upwards-directed, we can choose $\langle f_n'\rangle_{n\in\mathbb{N}}$ in $\Phi$ inductively in such a way that $f_{n+1}' \ge \max(f_n', f_n)$ for every $n \in \mathbb{N}$. So $\langle f_n'\rangle_{n\in\mathbb{N}}$ is a non-decreasing sequence in $\Phi$ and $\sup_{n\in\mathbb{N}} f_n'(x) \ge \sup_{f \in \Psi} f(x) = g(x)$ for every $x \in D$. By B.Levi's theorem,
$$\int_D g\,d\nu \le \sup_{n\in\mathbb{N}} \int_D f_n'\,d\nu \le \sup_{f \in \Phi} \int_D f\,d\nu,$$
and we have the required inequality.
256X Basic exercises > (a) Let $\nu$ be a measure on $\mathbb{R}^r$. (i) Show that it is locally finite, in the sense of 256Ab, iff for every $x \in \mathbb{R}^r$ there is a $\delta > 0$ such that $\nu^* B(x, \delta) < \infty$. (Hint: the sets $B(0, n)$ are compact.) (ii) Show that in this case $\nu$ is $\sigma$-finite.

> (b) Let $\nu$ be a Radon measure on $\mathbb{R}^r$ and $\mathcal{G}$ a non-empty upwards-directed family of open sets in $\mathbb{R}^r$. (i) Show that $\nu(\bigcup\mathcal{G}) = \sup_{G \in \mathcal{G}} \nu G$. (Hint: observe that if $K \subseteq \bigcup\mathcal{G}$ is compact, then $K \subseteq G$ for some $G \in \mathcal{G}$.) (ii) Show that $\nu(E \cap \bigcup\mathcal{G}) = \sup_{G \in \mathcal{G}} \nu(E \cap G)$ for every set $E$ which is measured by $\nu$.

> (c) Let $\nu$ be a Radon measure on $\mathbb{R}^r$ and $\mathcal{F}$ a non-empty downwards-directed family of closed sets in $\mathbb{R}^r$ such that $\inf_{F \in \mathcal{F}} \nu F < \infty$. (i) Show that $\nu(\bigcap\mathcal{F}) = \inf_{F \in \mathcal{F}} \nu F$. (Hint: apply 256Xb(ii) to $\mathcal{G} = \{\mathbb{R}^r \setminus F : F \in \mathcal{F}\}$.) (ii) Show that $\nu(E \cap \bigcap\mathcal{F}) = \inf_{F \in \mathcal{F}} \nu(E \cap F)$ for every $E$ in the domain of $\nu$.

> (d) Show that a Radon measure $\nu$ on $\mathbb{R}^r$ is atomless iff $\nu\{x\} = 0$ for every $x \in \mathbb{R}^r$. (Hint: apply 256Xc with $\mathcal{F} = \{F : F \subseteq E \text{ is closed, not negligible}\}$.)

(e) Let $\nu_1$, $\nu_2$ be Radon measures on $\mathbb{R}^r$, and $\alpha_1$, $\alpha_2 \in \,]0, \infty[$. Set $\Sigma = \operatorname{dom} \nu_1 \cap \operatorname{dom} \nu_2$, and for $E \in \Sigma$ set $\nu E = \alpha_1 \nu_1 E + \alpha_2 \nu_2 E$. Show that $\nu$ is a Radon measure on $\mathbb{R}^r$. Show that $\nu$ is an indefinite-integral measure over Lebesgue measure iff $\nu_1$, $\nu_2$ are, and that in this case a linear combination of Radon-Nikodým derivatives of $\nu_1$ and $\nu_2$ is a Radon-Nikodým derivative of $\nu$.

> (f) Let $\nu$ be a Radon measure on $\mathbb{R}^r$. (i) Show that there is a unique closed set $F \subseteq \mathbb{R}^r$ such that, for open sets $G \subseteq \mathbb{R}^r$, $\nu G > 0$ iff $G \cap F \ne \emptyset$. ($F$ is called the support of $\nu$.) (ii) Generally, a set $A \subseteq \mathbb{R}^r$ is called self-supporting if $\nu^*(A \cap G) > 0$ whenever $G \subseteq \mathbb{R}^r$ is an open set meeting $A$. Show that for every closed set $F \subseteq \mathbb{R}^r$ there is a unique self-supporting closed set $F' \subseteq F$ such that $\nu(F \setminus F') = 0$.

> (g) Show that a measure $\nu$ on $\mathbb{R}$ is a Radon measure iff it is a Lebesgue-Stieltjes measure as described in 114Xa. Show that in this case $\nu$ is an indefinite-integral measure over Lebesgue measure iff the function $x \mapsto \nu\,]{-\infty}, x]$ is absolutely continuous on every bounded interval.

(h) Let $\nu$ be a Radon measure on $\mathbb{R}^r$. Let $C_k$ be the space of continuous real-valued functions on $\mathbb{R}^r$ with bounded supports. Show that for every $\nu$-integrable function $f$ and every $\epsilon > 0$ there is a $g \in C_k$ such that $\int |f - g|\,d\nu \le \epsilon$. (Hint: use arguments from 242O, but in (a-i) of the proof there start with closed intervals $I$.)

(i) Let $\nu$ be a Radon measure on $\mathbb{R}^r$. Show that $\nu E = \inf\{\nu G : G \supseteq E \text{ is open}\}$ for every set $E$ in the domain of $\nu$.

(j) Let $\nu$, $\nu'$ be two Radon measures on $\mathbb{R}^r$, and suppose that $\nu I = \nu' I$ for every half-open interval $I \subseteq \mathbb{R}^r$ (definition: 115Ab). Show that $\nu = \nu'$.

(k) Let $\nu$ be Cantor measure (256Hc).
(i) Show that if $C_n$ is the $n$th set used in the construction of the Cantor set, so that $C_n$ consists of $2^n$ intervals of length $3^{-n}$, then $\nu I = 2^{-n}$ for each of the intervals $I$ composing $C_n$. (ii) Let $\lambda$ be the usual measure on $\{0,1\}^{\mathbb{N}}$ (254J). Define $\phi : \{0,1\}^{\mathbb{N}} \to \mathbb{R}$ by setting $\phi(x) = \frac23 \sum_{n=0}^\infty 3^{-n} x(n)$ for each $x \in \{0,1\}^{\mathbb{N}}$. Show that $\phi$ is a bijection between $\{0,1\}^{\mathbb{N}}$ and $C$. (iii) Show that if $\mathcal{B}$ is the Borel $\sigma$-algebra of $\mathbb{R}$, then $\{\phi^{-1}[E] : E \in \mathcal{B}\}$ is precisely the $\sigma$-algebra of subsets of $\{0,1\}^{\mathbb{N}}$ generated by the sets $\{x : x(n) = i\}$ for $n \in \mathbb{N}$, $i \in \{0, 1\}$. (iv) Show that $\phi$ is an isomorphism between $(\{0,1\}^{\mathbb{N}}, \lambda)$ and $(C, \nu_C)$, where $\nu_C$ is the subspace measure on $C$ induced by $\nu$.

(l) Let $\nu$ and $\nu'$ be two Radon measures on $\mathbb{R}^r$. Show that $\nu'$ is an indefinite-integral measure over $\nu$ iff $\nu' E = 0$ whenever $\nu E = 0$, and in this case a function $f$ is a Radon-Nikodým derivative of $\nu'$ with respect to $\nu$ iff $\int_E f\,d\nu = \nu' E$ for every Borel set $E$.
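The map $\phi$ of 256Xk(ii) can be explored numerically. The following is a minimal sketch, not a solution of the exercise: it assumes that a finite 0-1 string `bits` of length $n$ stands for the cylinder set of all sequences extending it, so that $\phi$ of the string followed by zeros is the left endpoint of one of the $2^n$ intervals composing $C_n$. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def phi_prefix(bits):
    """phi applied to the sequence `bits` followed by zeros: (2/3) * sum 3^-n x(n)."""
    return Fraction(2, 3) * sum(Fraction(b, 3 ** n) for n, b in enumerate(bits))

def cantor_intervals(n):
    """The 2^n closed intervals of length 3^-n composing C_n, as sorted (left, right) pairs."""
    length = Fraction(1, 3 ** n)
    return sorted((phi_prefix(bits), phi_prefix(bits) + length)
                  for bits in product((0, 1), repeat=n))

def cylinder_mass(bits):
    """Usual measure on {0,1}^N of the set of sequences extending `bits`."""
    return Fraction(1, 2 ** len(bits))

# Mass that the image measure gives to each interval of C_3, keyed by the interval.
masses_C3 = {(phi_prefix(b), phi_prefix(b) + Fraction(1, 27)): cylinder_mass(b)
             for b in product((0, 1), repeat=3)}
```

Each interval of $C_n$ receives mass $2^{-n}$ from the cylinder mapped onto it, while the total length of $C_n$ is $(2/3)^n$; this is the numerical face of the statement that $\nu C = 1$ although $\mu C = 0$.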
256Y Further exercises (a) Let $\nu$ be a Radon measure on $\mathbb{R}^r$, and $X$ any subset of $\mathbb{R}^r$; let $\nu_X$ be the subspace measure on $X$ and $\Sigma_X$ its domain, and give $X$ its subspace topology (2A3C). Show that $\nu_X$ has the following properties: (i) $\nu_X$ is complete and locally determined; (ii) every open subset of $X$ belongs to $\Sigma_X$; (iii) $\nu_X E = \sup\{\nu_X F : F \subseteq E \text{ is closed in } X\}$ for every $E \in \Sigma_X$; (iv) whenever $\mathcal{G}$ is a non-empty upwards-directed family of open subsets of $X$, $\nu_X(\bigcup\mathcal{G}) = \sup_{G \in \mathcal{G}} \nu_X G$; (v) every point of $X$ belongs to an open set of finite measure.
(b) Let $\nu$ be a Radon measure on $\mathbb{R}^r$, with domain $\Sigma$, and $f : \mathbb{R}^r \to \mathbb{R}$ a function. Show that the following are equiveridical: (i) $f$ is $\Sigma$-measurable; (ii) for every non-negligible set $E \in \Sigma$ there is a non-negligible $F \in \Sigma$ such that $F \subseteq E$ and $f\restriction F$ is continuous; (iii) for every set $E \in \Sigma$, $\nu E = \sup_{K \in \mathcal{K}_f,\, K \subseteq E} \nu K$, where $\mathcal{K}_f = \{K : K \subseteq \mathbb{R}^r \text{ is compact},\ f\restriction K \text{ is continuous}\}$. (Hint: for (ii)$\Rightarrow$(i), take a maximal disjoint family $\mathcal{E} \subseteq \{K : K \in \mathcal{K}_f,\ \nu K > 0\}$; show that $\mathcal{E}$ is countable and that $\bigcup\mathcal{E}$ is conegligible.)

(c) Take $\nu$, $X$, $\nu_X$ and $\Sigma_X$ as in 256Ya. Suppose that $f : X \to \mathbb{R}$ is a function. Show that $f$ is $\Sigma_X$-measurable iff for every non-negligible measurable set $E \subseteq X$ there is a non-negligible measurable $F \subseteq E$ such that $f\restriction F$ is continuous.

(d) Let $\langle \nu_n\rangle_{n\in\mathbb{N}}$ be a sequence of Radon measures on $\mathbb{R}^r$. Show that there is a Radon measure $\nu$ on $\mathbb{R}^r$ such that every $\nu_n$ is an indefinite-integral measure over $\nu$. (Hint: find a sequence $\langle \alpha_n\rangle_{n\in\mathbb{N}}$ of strictly positive numbers such that $\sum_{n=0}^\infty \alpha_n \nu_n B(0, k) < \infty$ for every $k$, and set $\nu = \sum_{n=0}^\infty \alpha_n \nu_n$, using the idea of 256Xe.)
(e) A set $G \subseteq \mathbb{R}^{\mathbb{N}}$ is open if for every $x \in G$ there are $n \in \mathbb{N}$, $\delta > 0$ such that
$$\{y : y \in \mathbb{R}^{\mathbb{N}},\ |y(i) - x(i)| < \delta \text{ for every } i \le n\} \subseteq G.$$
The Borel $\sigma$-algebra of $\mathbb{R}^{\mathbb{N}}$ is the $\sigma$-algebra $\mathcal{B}$ of subsets of $\mathbb{R}^{\mathbb{N}}$ generated, in the sense of 111Gb, by the family $\mathfrak{T}$ of open sets. (i) Show that $\mathfrak{T}$ is a topology (2A3A). (ii) Show that a filter $\mathcal{F}$ on $\mathbb{R}^{\mathbb{N}}$ converges to $x \in \mathbb{R}^{\mathbb{N}}$ iff $\pi_i[[\mathcal{F}]] \to x(i)$ for every $i \in \mathbb{N}$, where $\pi_i(y) = y(i)$ for $i \in \mathbb{N}$, $y \in \mathbb{R}^{\mathbb{N}}$. (iii) Show that $\mathcal{B}$ is the $\sigma$-algebra generated by sets of the form $\{x : x \in \mathbb{R}^{\mathbb{N}},\ x(i) \le a\}$, where $i$ runs through $\mathbb{N}$ and $a$ runs through $\mathbb{R}$. (iv) Show that if $\alpha_i \ge 0$ for every $i \in \mathbb{N}$, then $\{x : |x(i)| \le \alpha_i\ \forall\ i \in \mathbb{N}\}$ is compact. (Hint: 2A3R.) (v) Show that any open set in $\mathbb{R}^{\mathbb{N}}$ is the union of a sequence of closed sets. (Hint: look at sets of the form $\{x : q_i \le x(i) \le q_i'\ \forall\ i \le n\}$, where $q_i$, $q_i' \in \mathbb{Q}$ for $i \le n$.) (vi) Show that if $\nu_0$ is any probability measure with domain $\mathcal{B}$, then its completion $\nu$ is inner regular with respect to the compact sets, and therefore may be called a 'Radon measure on $\mathbb{R}^{\mathbb{N}}$'. (Hint: show that there are compact sets of measure arbitrarily close to 1, and therefore that every open set, and every closed set, includes a $K_\sigma$ set of the same measure.)

256 Notes and comments Radon measures on Euclidean spaces are very special, and the results of this section do not give clear pointers to the direction the theory takes when applied to other kinds of topological space. With the material here you could make a stab at developing a theory of Radon measures on separable complete metric spaces, provided you use 256Xa as the basis for your definition of 'locally finite'. These are the spaces for which a version of 256C is true. (See 256Ye.) But for generalizations to other types of topological space, and for the more interesting parts of the theory on $\mathbb{R}^r$, I must ask you to wait for Volume 4. My purpose in introducing Radon measures here is strictly limited; I wish only to give a basis for §257 and §271 sufficiently solid not to need later revision.
In fact I think that all we really need are the Radon probability measures.

The chief technical difficulty in the definition of 'Radon measure' here lies in the insistence on completeness. It may well be that for everything studied in this volume, it would be simpler to look at locally finite measures with domain the algebra of Borel sets. This would involve us in a number of circumlocutions when dealing with Lebesgue measure itself and its derivates, since Lebesgue measure is defined on a larger $\sigma$-algebra; but the serious objection arises in the more advanced theory, when non-Borel sets of various kinds become central. Since my aim in this book is to provide secure foundations for the study of all aspects of measure theory, I ask you to take a little extra trouble now in order to avoid the possibility of having to rework all your ideas later. The extra trouble arises, for instance, in 256D, 256Xe and 256Xj; since different Radon measures are defined on different $\sigma$-algebras, we have to check that two Radon measures which agree on the compact sets, or on the open sets, have the same domains. On the credit side, some of the power of 256G arises from the fact that the Radon image measure $\nu\phi^{-1}$ is defined on the whole $\sigma$-algebra $\{F : \phi^{-1}[F] \in \operatorname{dom}(\nu)\}$, not just on the Borel sets.

The further technical point that Radon measures are expected to be locally finite gives less difficulty; its effect is that from most points of view there is little difference between a general Radon measure and a totally finite Radon measure. The extra condition which obviously has to be put into the hypotheses of such results as 256E and 256G is no burden on either intuition or memory.
In effect, we have two definitions of Radon measures on Euclidean spaces: they are the inner regular locally finite topological measures, and they are also the completions of the locally finite Borel measures. The equivalence of these definitions is Theorem 256C. The latter definition is the better adapted to 256K, and the former to 256G. The 'inner regularity' of the basic definition refers to compact sets; we also have forms of inner regularity with respect to closed sets (256Bb) and $K_\sigma$ sets (256Bc), and a complementary notion of 'outer regularity' with respect to open sets (256Xi).
257 Convolutions of measures

The ideas of this chapter can be brought together in a satisfying way in the theory of convolutions of Radon measures, which will be useful in §272 and again in §285. I give just the definition (257A) and the central property (257B) of the convolution of totally finite Radon measures, with a few corollaries and a note on the relation between convolution of functions and convolution of measures (257F).

257A Definition Let $r \ge 1$ be an integer and $\nu_1$, $\nu_2$ two totally finite Radon measures on $\mathbb{R}^r$. Let $\lambda$ be the product measure on $\mathbb{R}^r \times \mathbb{R}^r$; then $\lambda$ is also a (totally finite) Radon measure, by 256K. Define $\phi : \mathbb{R}^r \times \mathbb{R}^r \to \mathbb{R}^r$ by setting $\phi(x, y) = x + y$; then $\phi$ is continuous, therefore measurable in the sense of 256G. The convolution of $\nu_1$ and $\nu_2$, $\nu_1 * \nu_2$, is the image measure $\lambda\phi^{-1}$; by 256G, this is a Radon measure.

Note that if $\nu_1$ and $\nu_2$ are Radon probability measures, then $\lambda$ and $\nu_1 * \nu_2$ are also probability measures.

257B Theorem Let $r \ge 1$ be an integer, and $\nu_1$ and $\nu_2$ two totally finite Radon measures on $\mathbb{R}^r$; let $\nu = \nu_1 * \nu_2$ be their convolution, and $\lambda$ their product on $\mathbb{R}^r \times \mathbb{R}^r$. Then for any real-valued function $h$ defined on a subset of $\mathbb{R}^r$,
$$\int h(x + y)\,\lambda(d(x, y)) \text{ exists} = \int h(x)\,\nu(dx)$$
if either integral is defined in $[-\infty, \infty]$.

proof Apply 235L with $J(x, y) = 1$, $\phi(x, y) = x + y$ for all $x$, $y \in \mathbb{R}^r$.

257C Corollary Let $r \ge 1$ be an integer, and $\nu_1$, $\nu_2$ two totally finite Radon measures on $\mathbb{R}^r$; let $\nu = \nu_1 * \nu_2$ be their convolution, and $\lambda$ their product on $\mathbb{R}^r \times \mathbb{R}^r$; write $\Lambda$ for the domain of $\lambda$. Let $h$ be a $\Lambda$-measurable function defined $\lambda$-almost everywhere in $\mathbb{R}^r$. Suppose that any one of the integrals
$$\iint |h(x + y)|\,\nu_1(dx)\,\nu_2(dy), \quad \iint |h(x + y)|\,\nu_2(dy)\,\nu_1(dx), \quad \int |h(x + y)|\,\lambda(d(x, y))$$
exists and is finite. Then $h$ is $\nu$-integrable and
$$\int h(x)\,\nu(dx) = \iint h(x + y)\,\nu_1(dx)\,\nu_2(dy) = \iint h(x + y)\,\nu_2(dy)\,\nu_1(dx).$$

proof Put 257B together with Fubini's and Tonelli's theorems (252H).
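For finitely supported measures the definition in 257A reduces to a finite sum, which makes 257B easy to check by hand or by machine. The following is a minimal sketch under that assumption, with $r = 1$; representing a measure as a dictionary from points to masses, and the names `convolve` and `integrate`, are illustrative choices rather than anything in the text.

```python
from collections import defaultdict
from fractions import Fraction

def convolve(nu1, nu2):
    """Image of the product measure under (x, y) -> x + y.

    nu1, nu2 are finitely supported measures given as {point: mass} dicts.
    """
    out = defaultdict(Fraction)
    for x, a in nu1.items():
        for y, b in nu2.items():
            out[x + y] += a * b
    return dict(out)

def integrate(h, nu):
    """Integral of h with respect to a finitely supported measure."""
    return sum(h(x) * m for x, m in nu.items())

# Two hypothetical probability measures on R.
nu1 = {Fraction(0): Fraction(1, 2), Fraction(1): Fraction(1, 2)}
nu2 = {Fraction(0): Fraction(1, 3), Fraction(2): Fraction(2, 3)}
nu = convolve(nu1, nu2)

def h(x):
    return x * x

# 257B: the integral of h against nu1 * nu2 equals the double integral of h(x + y).
lhs = integrate(h, nu)
rhs = sum(h(x + y) * a * b for x, a in nu1.items() for y, b in nu2.items())
```

In this discrete setting the order of the two sums is visibly immaterial, which is the content of the symmetry between the two iterated integrals in 257C.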
257D Corollary If $\nu_1$ and $\nu_2$ are totally finite Radon measures on $\mathbb{R}^r$, then $\nu_1 * \nu_2 = \nu_2 * \nu_1$.

proof For any Borel set $E \subseteq \mathbb{R}^r$, apply 257C to $h = \chi E$ to see that
$$(\nu_1 * \nu_2)(E) = \iint \chi E(x + y)\,\nu_1(dx)\,\nu_2(dy) = \iint \chi E(x + y)\,\nu_2(dy)\,\nu_1(dx)$$
$$= \iint \chi E(y + x)\,\nu_2(dy)\,\nu_1(dx) = (\nu_2 * \nu_1)(E).$$
Thus $\nu_1 * \nu_2$ and $\nu_2 * \nu_1$ agree on the Borel sets of $\mathbb{R}^r$; because they are both Radon measures, they must be identical (256D).

257E Corollary If $\nu_1$, $\nu_2$ and $\nu_3$ are totally finite Radon measures on $\mathbb{R}^r$, then $(\nu_1 * \nu_2) * \nu_3 = \nu_1 * (\nu_2 * \nu_3)$.

proof For any Borel set $E \subseteq \mathbb{R}^r$, apply 257B to $h = \chi E$ to see that
$$((\nu_1 * \nu_2) * \nu_3)(E) = \iint \chi E(x + z)\,(\nu_1 * \nu_2)(dx)\,\nu_3(dz)$$
$$= \iiint \chi E(x + y + z)\,\nu_1(dx)\,\nu_2(dy)\,\nu_3(dz)$$
(because $x \mapsto \chi E(x + z)$ is Borel measurable for every $z$)
$$= \iint \chi E(x + y)\,\nu_1(dx)\,(\nu_2 * \nu_3)(dy)$$
(because $(x, y) \mapsto \chi E(x + y)$ is Borel measurable, so $y \mapsto \int \chi E(x + y)\,\nu_1(dx)$ is $(\nu_2 * \nu_3)$-integrable)
$$= (\nu_1 * (\nu_2 * \nu_3))(E).$$
Thus $(\nu_1 * \nu_2) * \nu_3$ and $\nu_1 * (\nu_2 * \nu_3)$ agree on the Borel sets of $\mathbb{R}^r$; because they are both Radon measures, they must be identical.

257F Theorem Suppose that $\nu_1$ and $\nu_2$ are totally finite Radon measures on $\mathbb{R}^r$ which are indefinite-integral measures over Lebesgue measure $\mu$. Then $\nu_1 * \nu_2$ is also an indefinite-integral measure over $\mu$; if $f_1$ and $f_2$ are Radon-Nikodým derivatives of $\nu_1$, $\nu_2$ respectively, then $f_1 * f_2$ is a Radon-Nikodým derivative of $\nu_1 * \nu_2$.

proof By 255H (see the remark in 255L), $f_1 * f_2$ is integrable with respect to $\mu$, with $\int f_1 * f_2\,d\mu = \int f_1\,d\mu \cdot \int f_2\,d\mu$, and of course $f_1 * f_2$ is non-negative. If $E \subseteq \mathbb{R}^r$ is a Borel set,
$$\int_E f_1 * f_2\,d\mu = \iint \chi E(x + y) f_1(x) f_2(y)\,\mu(dx)\,\mu(dy)$$
(by 255G)
$$= \iint \chi E(x + y) f_2(y)\,\nu_1(dx)\,\mu(dy)$$
(because $x \mapsto \chi E(x + y)$ is Borel measurable)
$$= \iint \chi E(x + y)\,\nu_1(dx)\,\nu_2(dy)$$
(because $(x, y) \mapsto \chi E(x + y)$ is Borel measurable, so $y \mapsto \int \chi E(x + y)\,\nu_1(dx)$ is $\nu_2$-integrable)
$$= (\nu_1 * \nu_2)(E).$$
So $f_1 * f_2$ is a Radon-Nikodým derivative of $\nu_1 * \nu_2$ with respect to $\mu$, by 256J.

257X Basic exercises > (a) Let $r \ge 1$ be an integer. Let $\delta_0$ be the Radon probability measure on $\mathbb{R}^r$ such that $\delta_0\{0\} = 1$. Show that $\delta_0 * \nu = \nu$ for every totally finite Radon measure $\nu$ on $\mathbb{R}^r$.

(b) Let $\mu$ and $\nu$ be totally finite Radon measures on $\mathbb{R}^r$, and $E$ any set measured by their convolution $\mu * \nu$. Show that $\int \mu(E - y)\,\nu(dy)$ is defined in $[0, \infty]$ and equal to $(\mu * \nu)(E)$.

(c) Let $\nu_1, \ldots, \nu_n$ be totally finite Radon measures on $\mathbb{R}^r$, and let $\nu$ be the convolution $\nu_1 * \ldots * \nu_n$ (using 257E to see that such a bracketless expression is legitimate). Show that
$$\int h(x)\,\nu(dx) = \int \ldots \int h(x_1 + \ldots + x_n)\,\nu_1(dx_1) \ldots \nu_n(dx_n)$$
for every $\nu$-integrable function $h$.

(d) Let $\nu_1$ and $\nu_2$ be totally finite Radon measures on $\mathbb{R}^r$, with supports $F_1$, $F_2$ (256Xf). Show that the support of $\nu_1 * \nu_2$ is the closure of $\{x + y : x \in F_1,\ y \in F_2\}$.

> (e) Let $\nu_1$ and $\nu_2$ be totally finite Radon measures on $\mathbb{R}^r$, and suppose that $\nu_1$ has a Radon-Nikodým derivative $f$ with respect to Lebesgue measure $\mu$. Show that $\nu_1 * \nu_2$ has a Radon-Nikodým derivative $g$, where $g(x) = \int f(x - y)\,\nu_2(dy)$ for $\mu$-almost every $x \in \mathbb{R}^r$.
(f) Suppose that $\nu_1$, $\nu_2$, $\nu_1'$ and $\nu_2'$ are totally finite Radon measures on $\mathbb{R}^r$, and that $\nu_1'$, $\nu_2'$ are absolutely continuous with respect to $\nu_1$, $\nu_2$ respectively. Show that $\nu_1' * \nu_2'$ is absolutely continuous with respect to $\nu_1 * \nu_2$.

257Y Further exercises (a) Let $M$ be the space of countably additive functionals defined on the algebra $\mathcal{B}$ of Borel subsets of $\mathbb{R}$, with its norm $\|\nu\| = |\nu|(\mathbb{R})$ (see 231Yh). (i) Show that we have a unique bilinear operator $* : M \times M \to M$ such that $(\mu_1\restriction\mathcal{B}) * (\mu_2\restriction\mathcal{B}) = (\mu_1 * \mu_2)\restriction\mathcal{B}$ for all totally finite Radon measures $\mu_1$, $\mu_2$ on $\mathbb{R}$. (ii) Show that $*$ is commutative and associative. (iii) Show that $\|\nu_1 * \nu_2\| \le \|\nu_1\|\|\nu_2\|$ for all $\nu_1$, $\nu_2 \in M$, so that $M$ is a Banach algebra under this multiplication. (iv) Show that $M$ has a multiplicative identity. (v) Show that $L^1(\mu)$ can be regarded as a closed subalgebra of $M$, where $\mu$ is Lebesgue measure on $\mathbb{R}^r$ (cf. 255Xc).

(b) Let us say that a Radon measure on $]{-\pi}, \pi]$ is a measure $\nu$, with domain $\Sigma$, on $]{-\pi}, \pi]$ such that (i) every Borel subset of $]{-\pi}, \pi]$ belongs to $\Sigma$ (ii) for every $E \in \Sigma$ there are Borel sets $E_1$, $E_2$ such that $E_1 \subseteq E \subseteq E_2$ and $\nu(E_2 \setminus E_1) = 0$ (iii) every compact subset of $]{-\pi}, \pi]$ has finite measure. Show that for any two totally finite Radon measures $\nu_1$, $\nu_2$ on $]{-\pi}, \pi]$ there is a unique totally finite Radon measure $\nu$ on $]{-\pi}, \pi]$ such that
$$\int h(x)\,\nu(dx) = \int h(x +_{2\pi} y)\,\nu_1(dx)\,\nu_2(dy)$$
for every $\nu$-integrable function $h$, where $+_{2\pi}$ is defined as in 255Ma.

257 Notes and comments Of course convolution of functions and convolution of measures are very closely connected; the obvious link being 257F, but the correspondence between 255G and 257B is also very marked. In effect, they give us the same notion of convolution $u * v$ when $u$, $v$ are positive members of $L^1$ and $u * v$ is interpreted in $L^1$ rather than as a function (257Ya). But we should have to go rather deeper than the arguments here to find ideas in the theory of convolution of measures to correspond to such results as 255K.
I will return to questions of this type in §444 in Volume 4.

All the theorems of this section can be extended to general abelian locally compact Hausdorff topological groups; but for such generality we need much more advanced ideas (see §444), and for the moment I leave only the suggestion in 257Yb that you should try to adapt the ideas here to $]{-\pi}, \pi]$ or $S^1$.
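The link between convolution of measures and convolution of densities in 257F can be seen numerically in one simple case. This is only a rough sketch under assumptions of my own choosing: $\nu$ is taken to be the measure on $\mathbb{R}$ with density $f = \chi[0,1]$, it is approximated by $N$ equal atoms at the midpoints of a partition of $[0,1]$, and the discretization is an illustrative device, not a construction from the text. Since $(f * f)(x) = x$ for $0 \le x \le 1$, 257F predicts $(\nu * \nu)[0, \tfrac12] = \int_0^{1/2} x\,dx = \tfrac18$, and the discrete convolution should come close to this.

```python
from fractions import Fraction

def discretized_uniform(N):
    """N atoms of mass 1/N at the midpoints (2k+1)/(2N) of a partition of [0, 1]."""
    return {Fraction(2 * k + 1, 2 * N): Fraction(1, N) for k in range(N)}

def convolve(nu1, nu2):
    """Image of the product of two finitely supported measures under (x, y) -> x + y."""
    out = {}
    for x, a in nu1.items():
        for y, b in nu2.items():
            out[x + y] = out.get(x + y, 0) + a * b
    return out

N = 100
nu_N = discretized_uniform(N)
conv = convolve(nu_N, nu_N)

# Mass that the discrete convolution gives to [0, 1/2].
approx = sum(m for x, m in conv.items() if x <= Fraction(1, 2))

# Value predicted by 257F: the integral of (f * f)(x) = x over [0, 1/2].
predicted = Fraction(1, 8)
error = abs(approx - predicted)
```

The discrepancy shrinks as $N$ grows, but the sketch makes no claim about the rate; it is meant only to make the statement of 257F tangible.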
Chapter 26

Change of Variable in the Integral

I suppose most courses on basic calculus still devote a substantial amount of time to practice in the techniques of integrating standard functions. Surely the most powerful single technique is that of substitution: replacing $\int g(y)\,dy$ by $\int g(\phi(x))\phi'(x)\,dx$ for an appropriate function $\phi$. At this level one usually concentrates on the skills of guessing at appropriate $\phi$ and getting the formulae right. I will not address such questions here, except for rare special cases; in this book I am concerned rather with validating the process. For functions of one variable, it can usually be justified by an appeal to the fundamental theorem of calculus, and for any particular case I would normally go first to §225 in the hope that the results there would cover it. But for functions of two or more variables some much deeper ideas are necessary.

I have already treated the general problem of integration-by-substitution in abstract measure spaces in §235. There I described conditions under which $\int g(y)\,dy = \int g(\phi(x))J(x)\,dx$ for an appropriate function $J$. The context there gave very little scope for suggestions as to how to compute $J$; at best, it could be presented as a Radon-Nikodým derivative (235O). In this chapter I give a form of the fundamental theorem for the case of Lebesgue measure, in which $\phi$ is a more or less differentiable function between Euclidean spaces, and $J$ is a 'Jacobian', the modulus of the determinant of the derivative of $\phi$ (263D). This necessarily depends on a serious investigation of the relationship between Lebesgue measure and geometry. The first step is to establish a form of Vitali's theorem for $r$-dimensional space, together with $r$-dimensional density theorems; I do this in §261, following closely the scheme of §§221 and 223 above.
We need to know quite a lot about differentiable functions between Euclidean spaces, and it turns out that the theory is intertwined with that of 'Lipschitz' functions; I treat these in §262.

In the last two sections of the chapter, I turn to a separate problem for which some of the same techniques turn out to be appropriate: the description of surface measure on (smooth) surfaces in Euclidean space, like the surface of a cone or sphere. I suppose there is no difficulty in forming a robust intuition as to what is meant by the 'area' of such a surface and of suitably simple regions within it, and there is a very strong presumption that there ought to be an expression for this intuition in terms of measure theory as presented in this book; but the details are not I think straightforward. The first point to note is that for any calculation of the area of a region $G$ in a surface $S$, one would always turn at once to a parametrization of the region, that is, a bijection $\phi : D \to G$ from some subset $D$ of Euclidean space. But obviously one needs to be sure that the result of the calculation is independent of the parametrization chosen, and while it would be possible to base the theory on results showing such independence directly, that does not seem to me to be a true reflection of the underlying intuition, which is that the area of simple surfaces, at least, is something intrinsic to their geometry. I therefore see no acceptable alternative to a theory of '$r$-dimensional measure' which can be described in purely geometric terms. This is the burden of §264, in which I give the definition and most fundamental properties of Hausdorff $r$-dimensional measure in Euclidean spaces. With this established, we find that the techniques of §§261-263 are sufficient to relate it to calculations through parametrizations, which is what I do in §265.
261 Vitali's theorem in ℝ^r
The main aim of this section is to give r-dimensional versions of Vitali's theorem and Lebesgue's Density Theorem, following ideas already presented in §§221 and 223.
261A Notation For most of this chapter, we shall be dealing with the geometry and measure of Euclidean space; it will save space to fix some notation. Throughout this section and the two following, r ≥ 1 will be an integer. I will use Roman letters for members of ℝ^r and Greek letters for their coordinates, so that a = (α_1, …, α_r), etc.; if you see any Greek letter with a subscript you should look first for a nearby vector of which it might be a coordinate. The measure under consideration will nearly always be Lebesgue measure on ℝ^r; so unless otherwise indicated µ should be interpreted as Lebesgue measure, and µ* as Lebesgue outer measure. Similarly, $\int \ldots\,dx$ will always be integration with respect to Lebesgue measure (in a dimension determined by the context).
For x = (ξ_1, …, ξ_r) ∈ ℝ^r, write $\|x\| = \sqrt{\xi_1^2 + \ldots + \xi_r^2}$. Recall that ‖x + y‖ ≤ ‖x‖ + ‖y‖ (1A2C) and that ‖αx‖ = |α|‖x‖ for any vectors x, y and scalar α.
I will use the same notation as in §115 for 'intervals', so that, in particular, [a, b[ = {x : α_i ≤ ξ_i < β_i ∀ i ≤ r}, ]a, b[ = {x : α_i < ξ_i < β_i ∀ i ≤ r}, [a, b] = {x : α_i ≤ ξ_i ≤ β_i ∀ i ≤ r} whenever a, b ∈ ℝ^r.
0 = (0, …, 0) will be the zero vector in ℝ^r, and 1 will be (1, …, 1). If x ∈ ℝ^r and δ > 0, B(x, δ) will be the closed ball with centre x and radius δ, that is, {y : y ∈ ℝ^r, ‖y − x‖ ≤ δ}. Note that B(x, δ) = x + B(0, δ); so that by the translation-invariance of Lebesgue measure we have µB(x, δ) = µB(0, δ) = β_r δ^r, where
$$\beta_r = \begin{cases} \dfrac{1}{k!}\,\pi^k & \text{if } r = 2k \text{ is even},\\[2ex] \dfrac{2^{2k+1}\,k!}{(2k+1)!}\,\pi^k & \text{if } r = 2k+1 \text{ is odd} \end{cases}$$
(252Q).
261B Vitali's theorem in ℝ^r Let A ⊆ ℝ^r be any set, and ℐ a family of closed non-trivial (that is, non-singleton, or, equivalently, non-negligible) balls in ℝ^r such that every point of A is contained in arbitrarily small members of ℐ. Then there is a countable disjoint set ℐ_0 ⊆ ℐ such that µ(A \ ⋃ℐ_0) = 0.
proof (a) To begin with (down to the end of (f) below), suppose that ‖x‖ < M for every x ∈ A, and set ℐ′ = {I : I ∈ ℐ, I ⊆ B(0, M)}. If there is a finite disjoint set ℐ_0 ⊆ ℐ′ such that A ⊆ ⋃ℐ_0 (including the possibility that A = ℐ_0 = ∅), we can stop. So let us suppose henceforth that there is no such ℐ_0.
(b) In this case, if ℐ_0 is any finite disjoint subset of ℐ′, there is a J ∈ ℐ′ which is disjoint from any member of ℐ_0. 𝐏 Take x ∈ A \ ⋃ℐ_0. Because every member of ℐ_0 is closed, there is a δ > 0 such that B(x, δ) does not meet any member of ℐ_0, and as ‖x‖ < M we can suppose that B(x, δ) ⊆ B(0, M). Let J be a member of ℐ, containing x, and of diameter at most δ; then J ∈ ℐ′ and J ∩ ⋃ℐ_0 = ∅. 𝐐
(c) We can therefore choose a sequence ⟨γ_n⟩_{n∈ℕ} of real numbers and a disjoint sequence ⟨I_n⟩_{n∈ℕ} in ℐ′ inductively, as follows. Given ⟨I_j⟩_j […] 0, then there are q, q′ ∈ ℚ such that f(x) − ε ≤ q ≤ f(x) ≤ q′ ≤ f(x) + ε, and now
$$\liminf_{\delta\downarrow 0} \frac{\mu^*\{y : y \in D\cap B(x,\delta),\ |f(y)-f(x)| \le \epsilon\}}{\mu B(x,\delta)} \ \ge\ \liminf_{\delta\downarrow 0} \frac{\mu^*(D_{qq'}\cap B(x,\delta))}{\mu B(x,\delta)} = 1,$$
so
$$\lim_{\delta\downarrow 0} \frac{\mu^*\{y : y \in D\cap B(x,\delta),\ |f(y)-f(x)| \le \epsilon\}}{\mu B(x,\delta)} = 1.$$
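(d) Define C as in (c). We know from (a) that µ(D \ C′) = 0, where
$$C' = \Bigl\{x : x \in D,\ \lim_{\delta\downarrow 0} \frac{\mu^*(D\cap B(x,\delta))}{\mu B(x,\delta)} = 1\Bigr\}.$$
If x ∈ C ∩ C′ and ε > 0, we know from (c) that
$$\lim_{\delta\downarrow 0} \frac{\mu^*\{y : y \in D\cap B(x,\delta),\ |f(y)-f(x)| \le \epsilon/2\}}{\mu B(x,\delta)} = 1.$$
But because f is measurable, we have
$$\mu^*\{y : y \in D\cap B(x,\delta),\ |f(y)-f(x)| \ge \epsilon\} + \mu^*\{y : y \in D\cap B(x,\delta),\ |f(y)-f(x)| \le \tfrac12\epsilon\} \le \mu^*(D\cap B(x,\delta))$$
for every δ > 0. Accordingly
$$\limsup_{\delta\downarrow 0} \frac{\mu^*\{y : y \in D\cap B(x,\delta),\ |f(y)-f(x)| \ge \epsilon\}}{\mu B(x,\delta)} \le \lim_{\delta\downarrow 0} \frac{\mu^*(D\cap B(x,\delta))}{\mu B(x,\delta)} - \lim_{\delta\downarrow 0} \frac{\mu^*\{y : y \in D\cap B(x,\delta),\ |f(y)-f(x)| \le \epsilon/2\}}{\mu B(x,\delta)} = 0,$$
and
$$\lim_{\delta\downarrow 0} \frac{\mu^*\{y : y \in D\cap B(x,\delta),\ |f(y)-f(x)| \ge \epsilon\}}{\mu B(x,\delta)} = 0$$
for every x ∈ C ∩ C′, that is, for almost every x ∈ D.
261E Theorem Let D be a subset of ℝ^r, and f a real-valued function which is integrable over D. Then
$$\lim_{\delta\downarrow 0} \frac{1}{\mu B(x,\delta)} \int_{D\cap B(x,\delta)} |f(y)-f(x)|\,dy = 0$$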
for almost every x ∈ D.
proof (Compare 223D.) (a) Suppose first that D is bounded. For each q ∈ ℚ, set g_q(x) = |f(x) − q| for x ∈ D ∩ dom f; then g_q is integrable over D, and
$$\lim_{\delta\downarrow 0} \frac{1}{\mu B(x,\delta)} \int_{D\cap B(x,\delta)} g_q = g_q(x)$$
for almost every x ∈ D, by 261C. Setting
$$E_q = \Bigl\{x : x \in D\cap\operatorname{dom} f,\ \lim_{\delta\downarrow 0} \frac{1}{\mu B(x,\delta)} \int_{D\cap B(x,\delta)} g_q = g_q(x)\Bigr\},$$
we have D \ E_q negligible for every q, so D \ E is negligible, where E = ⋂_{q∈ℚ} E_q. Now
$$\lim_{\delta\downarrow 0} \frac{1}{\mu B(x,\delta)} \int_{D\cap B(x,\delta)} |f(y)-f(x)|\,dy = 0$$
for every x ∈ E. 𝐏 Take x ∈ E and ε > 0. Then there is a q ∈ ℚ such that |f(x) − q| ≤ ε, so that |f(y) − f(x)| ≤ |f(y) − q| + ε = g_q(y) + ε for every y ∈ D ∩ dom f, and
$$\limsup_{\delta\downarrow 0} \frac{1}{\mu B(x,\delta)} \int_{D\cap B(x,\delta)} |f(y)-f(x)|\,dy \le \limsup_{\delta\downarrow 0} \frac{1}{\mu B(x,\delta)} \int_{D\cap B(x,\delta)} \bigl(g_q(y)+\epsilon\bigr)\,dy = \epsilon + g_q(x) \le 2\epsilon.$$
As ε is arbitrary,
$$\lim_{\delta\downarrow 0} \frac{1}{\mu B(x,\delta)} \int_{D\cap B(x,\delta)} |f(y)-f(x)|\,dy = 0,$$
as required. 𝐐
(b) For unbounded sets D, apply (a) to D ∩ B(0, n) for each n ∈ ℕ.
Remark The set
$$\Bigl\{x : x \in \operatorname{dom} f,\ \lim_{\delta\downarrow 0} \frac{1}{\mu B(x,\delta)} \int_{D\cap B(x,\delta)} |f(y)-f(x)|\,dy = 0\Bigr\}$$
is sometimes called the Lebesgue set of f.
261F
Another very useful consequence of 261B is the following.
Proposition Let A ⊆ ℝ^r be any set, and ε > 0. Then there is a sequence ⟨B_n⟩_{n∈ℕ} of closed balls in ℝ^r, all of radius at most ε, such that A ⊆ ⋃_{n∈ℕ} B_n and $\sum_{n=0}^\infty \mu B_n \le \mu^*A + \epsilon$. Moreover, we may suppose that the balls in the sequence whose centres do not lie in A have measures summing to at most ε.
proof (a) Set β_r = µB(0, 1). The first step is the obvious remark that if x ∈ ℝ^r, δ > 0 then the half-open cube I = [x, x + δ1[ is a subset of the ball B(x, δ√r), which has measure γ_r δ^r = γ_r µI, where γ_r = β_r r^{r/2}. It follows that if G ⊆ ℝ^r is any open set, then G can be covered by a sequence of balls of total measure at most γ_r µG. 𝐏 If G is empty, we can take all the balls to be singletons. Otherwise, for each k ∈ ℕ, set
$$Q_k = \{z : z \in \mathbb{Z}^r,\ [2^{-k}z,\, 2^{-k}(z+\mathbf{1})[\ \subseteq G\}, \qquad E_k = \bigcup_{z\in Q_k} [2^{-k}z,\, 2^{-k}(z+\mathbf{1})[.$$
Then ⟨E_k⟩_{k∈ℕ} is a non-decreasing sequence of sets with union G, and E_0 and each of the differences E_{k+1} \ E_k is expressible as a disjoint union of half-open cubes. Thus G also is expressible as a disjoint union of a sequence ⟨I_n⟩_{n∈ℕ} of half-open cubes. Each I_n is covered by a ball B_n of measure γ_r µI_n; so that G ⊆ ⋃_{n∈ℕ} B_n and
$$\sum_{n=0}^\infty \mu B_n \le \gamma_r \sum_{n=0}^\infty \mu I_n = \gamma_r\,\mu G.\ \mathbf{Q}$$
(b) It follows at once that if µA = 0 then for any ε > 0 there is a sequence ⟨B_n⟩_{n∈ℕ} of balls covering A of measures summing to at most ε, because there is certainly an open set including A with measure at most ε/γ_r.
(c) Now take any set A, and ε > 0. Let G ⊇ A be an open set with µG ≤ µ*A + ½ε. Let ℐ be the family of non-trivial closed balls included in G, of radius at most ε and with centres in A. Then every point of A belongs to arbitrarily small members of ℐ, so there is a countable disjoint ℐ_0 ⊆ ℐ such that µ(A \ ⋃ℐ_0) = 0. Let ⟨B′_n⟩_{n∈ℕ} be a sequence of balls covering A \ ⋃ℐ_0 with $\sum_{n=0}^\infty \mu B'_n \le \min(\tfrac12\epsilon, \beta_r\epsilon^r)$; these surely all have radius at most ε. Let ⟨B_n⟩_{n∈ℕ} be a sequence amalgamating ℐ_0 with ⟨B′_n⟩_{n∈ℕ}; then A ⊆ ⋃_{n∈ℕ} B_n, every B_n has radius at most ε and
$$\sum_{n=0}^\infty \mu B_n = \sum_{B\in\mathcal{I}_0} \mu B + \sum_{n=0}^\infty \mu B'_n \le \mu G + \tfrac12\epsilon \le \mu^*A + \epsilon,$$
while the B_n whose centres do not lie in A must come from the sequence ⟨B′_n⟩_{n∈ℕ}, so their measures sum to at most ½ε ≤ ε.
Remark In fact we can (if A is not empty) arrange that the centre of every B_n belongs to A. This is an easy consequence of Besicovitch's Covering Lemma (see §472 in Volume 4).
261X Basic exercises (a) Show that 261C and 261E are valid for any locally integrable real-valued function f; in particular, for any f ∈ L^p(µ_D) for any p ≥ 1, writing µ_D for the subspace measure on D.
(b) Show that 261C, 261Dc, 261Dd and 261E are valid for complex-valued functions f.
>(c) Take three disks in the plane, each touching the other two, so that they enclose an open region R with three cusps. In R let D be a disk tangent to each of the three original disks, and R_0, R_1, R_2 the three components of R \ D. In each R_j let D_j be a disk tangent to each of the disks bounding R_j, and R_{j0}, R_{j1}, R_{j2} the three components of R_j \ D_j. Continue, obtaining 27 regions at the next step, 81 regions at the next, and so on. Show that the total area of the residual regions converges to zero as the process continues indefinitely. (Hint: compare with the process in the proof of 261B.)
261Y Further exercises (a) Formulate an abstract definition of 'Vitali cover', meaning a family of sets satisfying the conclusion of 261B in some sense, and corresponding generalizations of 261C-261E, covering (at least) (b)-(d) below.
(b) For x ∈ ℝ^r, k ∈ ℕ let C(x, k) be the half-open cube of the form [2^{−k}z, 2^{−k}(z + 1)[, with z ∈ ℤ^r, containing x. Show that if f is an integrable function on ℝ^r then $\lim_{k\to\infty} 2^{kr}\int_{C(x,k)} f = f(x)$ for almost every x ∈ ℝ^r.
(c) Let f be a real-valued function which is integrable over ℝ^r. Show that $\lim_{\delta\downarrow 0} \frac{1}{\delta^r}\int_{[x,\,x+\delta\mathbf{1}[} f = f(x)$ for almost every x ∈ ℝ^r.
(d) Give X = {0, 1}^ℕ its usual measure ν (254J). For x ∈ X, k ∈ ℕ set C(x, k) = {y : y ∈ X, y(i) = x(i) for i < k}. Show that if f is any real-valued function which is integrable over X then $\lim_{k\to\infty} 2^k\int_{C(x,k)} f\,d\nu = f(x)$ and $\lim_{k\to\infty} 2^k\int_{C(x,k)} |f(y)-f(x)|\,\nu(dy) = 0$ for almost every x ∈ X.
(e) Let f be a real-valued function which is integrable over ℝ^r, and x a point in the Lebesgue set of f. Show that for every ε > 0 there is a δ > 0 such that $|f(x) - \int f(x-y)g(\|y\|)\,dy| \le \epsilon$ whenever g : [0, ∞[ → [0, ∞[ is a non-increasing function such that $\int_{\mathbb{R}^r} g(\|y\|)\,dy = 1$ and $\int_{B(0,\delta)} g(\|y\|)\,dy \ge 1 - \delta$. (Hint: 223Yg.)
(f) Let T be the family of those measurable sets G ⊆ ℝ^r such that $\lim_{\delta\downarrow 0} \frac{\mu(G\cap B(x,\delta))}{\mu B(x,\delta)} = 1$ for every x ∈ G. Show that T is a topology on ℝ^r, the density topology of ℝ^r. Show that a function f : ℝ^r → ℝ is measurable iff it is T-continuous at almost every point of ℝ^r.
(g) A set A ⊆ ℝ^r is said to be porous at x ∈ ℝ^r if $\limsup_{y\to x} \frac{\rho(y,A)}{\|y-x\|} > 0$, writing ρ(y, A) = inf_{z∈A} ‖y − z‖ (or ∞ if A is empty). Show that if A is porous at all its points then it is negligible.
(h) Let A ⊆ ℝ^r be a bounded set and ℐ a non-empty family of non-trivial closed balls covering A. Show that for any ε > 0 there are disjoint B_0, …, B_n ∈ ℐ such that $\mu^*A \le (3+\epsilon)^r \sum_{k=0}^n \mu B_k$.
(i) Let (X, ρ) be a metric space and A ⊆ X any set, x ↦ δ_x : A → [0, ∞[ any bounded function. Show that if γ > 3 then there is an A′ ⊆ A such that (i) ρ(x, y) > δ_x + δ_y for all distinct x, y ∈ A′ (ii) ⋃_{x∈A} B(x, δ_x) ⊆ ⋃_{x∈A′} B(x, γδ_x), writing B(x, α) for the closed ball {y : ρ(y, x) ≤ α}.
(j) Show that any union of non-trivial closed balls in ℝ^r is Lebesgue measurable. (Hint: induce on r. Compare 415Ye in Volume 4.)
(k) Suppose that A ⊆ ℝ^r and that ℐ is a family of closed subsets of ℝ^r such that for every x ∈ A there is an η > 0 such that for every ε > 0 there is an I ∈ ℐ such that x ∈ I and 0 < η(diam I)^r ≤ µI ≤ ε. Show that there is a countable disjoint set ℐ_0 ⊆ ℐ such that A \ ⋃ℐ_0 is negligible.
261 Notes and comments In the proofs of 261B-261E above, I have done my best to follow the lines of the one-dimensional case; this section amounts to a series of generalizations of the work of §§221 and 223. It will be clear that the idea of 261A/261B can be used on other shapes than balls. To make it work in the form above, we need a family ℐ such that there is a constant K for which µI′ ≤ KµI for every I ∈ ℐ, where we write I′ = {x : inf_{y∈I} ‖x − y‖ ≤ diam(I)}. Evidently this will be true for many classes ℐ determined by the shapes of the sets involved; for instance, if E ⊆ ℝ^r is any bounded set of strictly positive measure, the family ℐ = {x + δE : x ∈ ℝ^r, δ > 0} will satisfy the condition. In 261Ya I challenge you to find an appropriate generalization of the arguments depending on the conclusion of 261B.
Another way of using 261B is to say that because sets can be essentially covered by disjoint sequences of balls, it ought to be possible to use balls, rather than half-open intervals, in the definition of Lebesgue measure on ℝ^r. This is indeed so (261F). The difficulty in using balls in the basic definition comes right at the start, in proving that if a ball is covered by finitely many balls then the sum of the volumes of the covering balls is at least the volume of the covered ball. (There is a trick, using the compactness of closed balls and the openness of open balls, to extend such a proof to infinite covers.) Of course you could regard this fact as 'elementary', on the ground that Archimedes would have noticed if it weren't true, but nevertheless it would be something of a challenge to prove it, unless you were willing to wait for a version of Fubini's theorem, as some authors do.
I have given the results in 261C-261E for arbitrary subsets D of ℝ^r not because I have any applications in mind in which non-measurable subsets are significant, but because I wish to make it possible to notice when measurability matters. Of course it is necessary to interpret the integrals $\int_D f\,d\mu$ in the way laid down in §214. The game is given away in part (c) of the proof of 261C, where I rely on the fact that if f is integrable over D then there is an integrable f̃ : ℝ^r → ℝ such that $\int_F \tilde f = \int_{D\cap F} f$ for every measurable F ⊆ ℝ^r. In effect, for all the questions dealt with here, we can replace f, D by f̃, ℝ^r. The idea of 261C is that, for almost every x, f(x) is approximated by its mean value on small balls B(x, δ), ignoring the missing values on B(x, δ) \ (D ∩ dom f); 261E is a sharper version of the same idea.
The formulae of 261C-261E mostly involve the expression µB(x, δ). Of course this is just β_r δ^r. But I think that leaving it unexpanded is actually more illuminating, as well as avoiding sub- and superscripts, since it makes it clearer what these density theorems are really about.
In §472 of Volume 4 I will revisit this material, showing that a surprisingly large proportion of the ideas can be applied to arbitrary Radon measures on ℝ^r, even though Vitali's theorem (in the form stated here) is no longer valid.
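As a small numerical aside, the constant β_r of 261A, and hence µB(x, δ) = β_r δ^r, can be evaluated directly from the even/odd formula given there. The sketch below is only an illustration: the function names are hypothetical, and the cross-check against π^{r/2}/Γ(r/2 + 1) relies on a standard closed form which is not quoted in the text.

```python
import math


def unit_ball_volume(r: int) -> float:
    """beta_r: Lebesgue measure of the closed unit ball in R^r (formula of 261A)."""
    if r < 1:
        raise ValueError("r must be a positive integer")
    k, odd = divmod(r, 2)
    if odd:
        # r = 2k + 1: beta_r = 2^(2k+1) k! pi^k / (2k+1)!
        return 2 ** (2 * k + 1) * math.factorial(k) * math.pi ** k / math.factorial(2 * k + 1)
    # r = 2k: beta_r = pi^k / k!
    return math.pi ** k / math.factorial(k)


def ball_volume(r: int, delta: float) -> float:
    """mu B(x, delta) = beta_r * delta^r; independent of the centre x."""
    return unit_ball_volume(r) * delta ** r


def gamma_closed_form(r: int) -> float:
    """Assumed cross-check (not from the text): pi^(r/2) / Gamma(r/2 + 1)."""
    return math.pi ** (r / 2) / math.gamma(r / 2 + 1)


if __name__ == "__main__":
    for r in range(1, 7):
        print(r, unit_ball_volume(r), gamma_closed_form(r))
```

The two columns printed for each r should agree up to rounding, which is a quick way of checking that the even and odd cases have been transcribed correctly.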
262 Lipschitz and differentiable functions
In preparation for the main work of this chapter in §263, I devote a section to two important classes of functions between Euclidean spaces. What we really need is the essentially elementary material down to 262I, together with the technical lemma 262M and its corollaries. Theorem 262Q is not relied on in this volume, though I believe that it makes the patterns which will develop more natural and comprehensible.
262A Lipschitz functions Suppose that r, s ≥ 1 and φ : D → ℝ^s is a function, where D ⊆ ℝ^r. We say that φ is γ-Lipschitz, where γ ∈ [0, ∞[, if
$$\|\phi(x)-\phi(y)\| \le \gamma\|x-y\|$$
for all x, y ∈ D, writing $\|x\| = \sqrt{\xi_1^2+\ldots+\xi_r^2}$ if x = (ξ_1, …, ξ_r) ∈ ℝ^r, $\|z\| = \sqrt{\zeta_1^2+\ldots+\zeta_s^2}$ if z = (ζ_1, …, ζ_s) ∈ ℝ^s. In this case, γ is a Lipschitz constant for φ.
A Lipschitz function is a function φ which is γ-Lipschitz for some γ ≥ 0. Note that in this case φ has a least Lipschitz constant (since if A is the set of Lipschitz constants for φ, and γ_0 = inf A, then γ_0 is a Lipschitz constant for φ).
262B
We need the following easy facts.
Lemma Let D ⊆ ℝ^r be a set and φ : D → ℝ^s a function.
(a) φ is Lipschitz iff φ_i : D → ℝ is Lipschitz for every i, writing φ(x) = (φ_1(x), …, φ_s(x)) for every x ∈ D = dom φ ⊆ ℝ^r.
(b) In this case, there is a Lipschitz function φ̃ : ℝ^r → ℝ^s extending φ.
(c) If r = s = 1 and D = [a, b] is an interval, then φ is Lipschitz iff it is absolutely continuous and has a bounded derivative.
proof (a) For any x, y ∈ D and i ≤ s,
$$|\phi_i(x)-\phi_i(y)| \le \|\phi(x)-\phi(y)\| \le \sqrt{s}\,\sup_{j\le s}|\phi_j(x)-\phi_j(y)|,$$
so any Lipschitz constant for φ will be a Lipschitz constant for every φ_i, and if γ_j is a Lipschitz constant for φ_j for each j, then √s sup_{j≤s} γ_j will be a Lipschitz constant for φ.
(b) By (a), it is enough to consider the case s = 1, for if every φ_i has a Lipschitz extension φ̃_i, we can set φ̃(x) = (φ̃_1(x), …, φ̃_s(x)) for every x to obtain a Lipschitz extension of φ. Taking s = 1, then, note that the case D = ∅ is trivial; so suppose that D ≠ ∅. Let γ be a Lipschitz constant for φ, and write
$$\tilde\phi(z) = \sup_{y\in D}\ \phi(y) - \gamma\|y-z\|$$
for every z ∈ ℝ^r. If x ∈ D, then, for any z ∈ ℝ^r and y ∈ D,
$$\phi(y) - \gamma\|y-z\| \le \phi(x) + \gamma\|y-x\| - \gamma\|y-z\| \le \phi(x) + \gamma\|z-x\|,$$
so that φ̃(z) ≤ φ(x) + γ‖z − x‖; this shows, in particular, that φ̃(z) < ∞. Also, if z ∈ D, we must have
$$\phi(z) - \gamma\|z-z\| \le \tilde\phi(z) \le \phi(z) + \gamma\|z-z\|,$$
so that φ̃ extends φ. Finally, if w, z ∈ ℝ^r and y ∈ D,
$$\phi(y) - \gamma\|y-w\| \le \phi(y) - \gamma\|y-z\| + \gamma\|w-z\| \le \tilde\phi(z) + \gamma\|w-z\|;$$
and taking the supremum over y ∈ D,
$$\tilde\phi(w) \le \tilde\phi(z) + \gamma\|w-z\|.$$
As w and z are arbitrary, φ̃ is Lipschitz.
(c)(i) Suppose that φ is γ-Lipschitz. If ε > 0 and a ≤ a_1 ≤ b_1 ≤ … ≤ a_n ≤ b_n ≤ b and $\sum_{i=1}^n b_i - a_i \le \epsilon/(1+\gamma)$, then
$$\sum_{i=1}^n |\phi(b_i)-\phi(a_i)| \le \sum_{i=1}^n \gamma|b_i-a_i| \le \epsilon.$$
As ε is arbitrary, φ is absolutely continuous. If x ∈ [a, b] and φ′(x) is defined, then
$$|\phi'(x)| = \lim_{y\to x}\frac{|\phi(y)-\phi(x)|}{|y-x|} \le \gamma,$$
so φ′ is bounded.
(ii) Now suppose that φ is absolutely continuous and that |φ′(x)| ≤ γ for every x ∈ dom φ′, where γ ≥ 0. Then whenever a ≤ x ≤ y ≤ b,
$$|\phi(y)-\phi(x)| = \Bigl|\int_x^y \phi'\Bigr| \le \int_x^y |\phi'| \le \gamma(y-x)$$
(using 225E for the first equality). As x and y are arbitrary, φ is γ-Lipschitz.
262C Remark The argument for (b) above shows that if φ : D → ℝ is a Lipschitz function, where D ⊆ ℝ^r, then φ has an extension to ℝ^r with the same Lipschitz constants. In fact it is the case that if φ : D → ℝ^s is a Lipschitz function, then φ has an extension to φ̃ : ℝ^r → ℝ^s with the same Lipschitz constants; this is 'Kirzbraun's theorem' (Kirzbraun 34, or Federer 69, 2.10.43).
262D Proposition If φ : D → ℝ^r is a γ-Lipschitz function, where D ⊆ ℝ^r, then µ*φ[A] ≤ γ^r µ*A for every A ⊆ D, where µ is Lebesgue measure on ℝ^r. In particular, φ[D ∩ A] is negligible for every negligible set A ⊆ ℝ^r.
proof Let ε > 0. By 261F, there is a sequence ⟨B_n⟩_{n∈ℕ} = ⟨B(x_n, δ_n)⟩_{n∈ℕ} of closed balls in ℝ^r, covering A, such that $\sum_{n=0}^\infty \mu B_n \le \mu^*A + \epsilon$ and $\sum_{n\in\mathbb{N}\setminus K} \mu B_n \le \epsilon$, where K = {n : n ∈ ℕ, x_n ∈ A}. Set L = {n : n ∈ ℕ \ K, B_n ∩ D ≠ ∅}, and for n ∈ L choose y_n ∈ D ∩ B_n. Now set
$$B'_n = \begin{cases} B(\phi(x_n), \gamma\delta_n) & \text{if } n \in K,\\ B(\phi(y_n), 2\gamma\delta_n) & \text{if } n \in L,\\ \emptyset & \text{if } n \in \mathbb{N}\setminus(K\cup L). \end{cases}$$
Then φ[B_n ∩ D] ⊆ B′_n for every n, so φ[D ∩ A] ⊆ ⋃_{n∈ℕ} B′_n, and
$$\mu^*\phi[A\cap D] \le \sum_{n=0}^\infty \mu B'_n = \gamma^r\sum_{n\in K}\mu B_n + 2^r\gamma^r\sum_{n\in L}\mu B_n \le \gamma^r(\mu^*A+\epsilon) + 2^r\gamma^r\epsilon.$$
As ε is arbitrary, µ*φ[A ∩ D] ≤ γ^r µ*A, as claimed.
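The extension formula used in the proof of 262B(b), φ̃(z) = sup_{y∈D} φ(y) − γ‖y − z‖, is easy to experiment with when D is finite and s = 1. The following is a minimal sketch under those assumptions; the function names and the sample data are hypothetical and are not taken from the text.

```python
import math
from itertools import combinations


def least_lipschitz_constant(points, values):
    """Least gamma with |phi(p) - phi(q)| <= gamma * ||p - q|| on a finite set D."""
    data = list(zip(points, values))
    return max(abs(u - v) / math.dist(p, q) for (p, u), (q, v) in combinations(data, 2))


def lipschitz_extension(points, values, gamma):
    """Return z -> max over y in D of phi(y) - gamma * ||y - z|| (a max, as D is finite)."""
    def extended(z):
        return max(v - gamma * math.dist(y, z) for y, v in zip(points, values))
    return extended


if __name__ == "__main__":
    # Hypothetical sample: three points of R^2 with prescribed real values.
    D = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
    phi = [0.0, 1.0, -1.0]
    gamma = least_lipschitz_constant(D, phi)
    ext = lipschitz_extension(D, phi, gamma)
    print(gamma, [ext(p) for p in D], ext((0.5, 0.5)))
```

On the points of D the extension reproduces the prescribed values, and for any two probe points w, z one should find ext(w) − ext(z) ≤ γ‖w − z‖, which is the inequality established at the end of the proof.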
262E Corollary Let φ : D → ℝ^r be an injective Lipschitz function, where D ⊆ ℝ^r, and f a measurable function from a subset of ℝ^r to ℝ.
(a) If φ⁻¹ is defined almost everywhere in a subset H of ℝ^r and f is defined almost everywhere in ℝ^r, then fφ⁻¹ is defined almost everywhere in H.
(b) If E ⊆ D is Lebesgue measurable then φ[E] is measurable.
(c) If D is measurable then fφ⁻¹ is measurable.
proof Set C = dom(fφ⁻¹) = {y : y ∈ φ[D], φ⁻¹(y) ∈ dom f} = φ[D ∩ dom f].
(a) Because f is defined almost everywhere, φ[D \ dom f] is negligible. But now C = φ[D] \ φ[D \ dom f] = dom φ⁻¹ \ φ[D \ dom f], so H \ C ⊆ (H \ dom φ⁻¹) ∪ φ[D \ dom f] is negligible.
(b) Now suppose that E ⊆ D and that E is measurable. Let ⟨F_n⟩_{n∈ℕ} be a sequence of closed bounded subsets of E such that µ(E \ ⋃_{n∈ℕ} F_n) = 0 (134Fb). Because φ is Lipschitz, it is continuous, so φ[F_n] is compact, therefore closed, therefore measurable for every n (2A2F, 2A2E, 115G); also φ[E \ ⋃_{n∈ℕ} F_n] is negligible, by 262D, therefore measurable. So φ[E] = φ[E \ ⋃_{n∈ℕ} F_n] ∪ ⋃_{n∈ℕ} φ[F_n] is measurable.
(c) For any a ∈ ℝ, take a measurable set E ⊆ ℝ^r such that {x : f(x) ≥ a} = E ∩ dom f. Then {y : y ∈ C, fφ⁻¹(y) ≥ a} = C ∩ φ[D ∩ E]. But φ[D ∩ E] is measurable, by (b), so {y : fφ⁻¹(y) ≥ a} is relatively measurable in C. As a is arbitrary, fφ⁻¹ is measurable.
262F Differentiability I come now to the class of functions whose properties will take up most of the rest of the chapter.
Definitions Suppose r, s ≥ 1 and that φ is a function from a subset D = dom φ of ℝ^r to ℝ^s.
(a) φ is differentiable at x ∈ D if there is a real s × r matrix T such that
$$\lim_{y\to x}\frac{\|\phi(y)-\phi(x)-T(y-x)\|}{\|y-x\|} = 0;$$
in this case we may write T = φ′(x).
(b) I will say that φ is differentiable relative to its domain at x, and that T is a derivative of φ at x, if x ∈ D and for every ε > 0 there is a δ > 0 such that ‖φ(y) − φ(x) − T(y − x)‖ ≤ ε‖y − x‖ for every y ∈ B(x, δ) ∩ D.
262G Remarks (a) The standard definition in 262Fa, involving an all-sided limit 'lim_{y→x}', implicitly requires φ to be defined on some non-trivial ball centred on x, so that we can calculate φ(y) − φ(x) − T(y − x) for all y sufficiently near x. It has the advantage that the derivative T = φ′(x) is uniquely defined (because if $\lim_{z\to 0}\frac{\|T_1z-T_2z\|}{\|z\|} = 0$ then
$$\frac{\|(T_1-T_2)z\|}{\|z\|} = \lim_{\alpha\to 0}\frac{\|T_1(\alpha z)-T_2(\alpha z)\|}{\|\alpha z\|} = 0$$
for every non-zero z, so T_1 − T_2 must be the zero matrix). For our purposes here, there is some advantage in relaxing this slightly to the form in 262Fb, so that we do not need to pay special attention to the boundary of dom φ.
(b) If you have not seen this concept of 'differentiability' before, but have some familiarity with partial differentiation, it is necessary to emphasize that the concept of 'differentiable' function (at least in the strict sense demanded by 262Fa) is strictly stronger than the concept of 'partially differentiable' function. For purposes of computation, the most useful method of finding true derivatives is through 262Id below. For a simple example of a function with a full set of partial derivatives, which is not everywhere differentiable, consider φ : ℝ² → ℝ defined by
$$\phi(\xi_1,\xi_2) = \begin{cases} \dfrac{\xi_1\xi_2}{\xi_1^2+\xi_2^2} & \text{if } \xi_1^2+\xi_2^2 \ne 0,\\[2ex] 0 & \text{if } \xi_1 = \xi_2 = 0. \end{cases}$$
Then φ is not even continuous at 0, although both partial derivatives ∂φ/∂ξ_j are defined everywhere.
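The behaviour of this example at the origin can be checked numerically. The sketch below is an illustration only (the helper names are hypothetical): it evaluates φ along the coordinate axes, where the difference quotients defining the partial derivatives at 0 vanish, and along the diagonal, where φ is constantly ½ and so cannot tend to φ(0) = 0.

```python
def phi(x1: float, x2: float) -> float:
    """The function of 262G(b): x1*x2 / (x1^2 + x2^2) away from the origin, 0 at it."""
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    return x1 * x2 / (x1 * x1 + x2 * x2)


def difference_quotient_at_origin(j: int, h: float) -> float:
    """(phi(h e_j) - phi(0)) / h for j = 1 or 2, approximating the jth partial derivative at 0."""
    point = (h, 0.0) if j == 1 else (0.0, h)
    return (phi(*point) - phi(0.0, 0.0)) / h


if __name__ == "__main__":
    for h in (1e-1, 1e-3, 1e-6):
        # Quotients along the two axes, then the value on the diagonal.
        print(h, difference_quotient_at_origin(1, h), difference_quotient_at_origin(2, h), phi(h, h))
```

Because φ vanishes identically on both axes, the difference quotients are exactly zero for every step h, while the diagonal values do not shrink with h; this is the discontinuity at 0 referred to above.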
(c) In the definition above, I speak of a derivative as being a matrix. Properly speaking, the derivative of a function defined on a subset of ℝ^r and taking values in ℝ^s should be thought of as a bounded linear operator from ℝ^r to ℝ^s; the formulation in terms of matrices is acceptable just because there is a natural one-to-one correspondence between s × r real matrices and linear operators from ℝ^r to ℝ^s, and all these linear operators are bounded. I use the 'matrix' description because it makes certain calculations more direct; in particular, the relationship between φ′ and the partial derivatives of φ (262Ic), and the notion of the determinant det φ′(x), used throughout §§263 and 265.
262H The norm of a matrix Some of the calculations below will rely on the notion of 'norm' of a matrix. The one I will use (in fact, for our purposes here, any norm would do) is the 'operator norm', defined by saying
$$\|T\| = \sup\{\|Tx\| : x \in \mathbb{R}^r,\ \|x\| \le 1\}$$
for any s × r matrix T. For the basic facts concerning these norms, see 2A4F-2A4G. The following will also be useful.
(a) If all the coefficients of T are small, so is ‖T‖; in fact, if T = ⟨τ_ij⟩_{i≤s,j≤r}, and ‖x‖ ≤ 1, then |ξ_j| ≤ 1 for each j, so
$$\|Tx\| = \Bigl(\sum_{i=1}^s\Bigl(\sum_{j=1}^r \tau_{ij}\xi_j\Bigr)^2\Bigr)^{1/2} \le \Bigl(\sum_{i=1}^s\Bigl(\sum_{j=1}^r |\tau_{ij}|\Bigr)^2\Bigr)^{1/2} \le r\sqrt{s}\,\max_{i\le s,\,j\le r}|\tau_{ij}|,$$
and ‖T‖ ≤ r√s max_{i≤s,j≤r} |τ_ij|. (This is a singularly crude inequality. A better one is in 262Ya. But it tells us, in particular, that ‖T‖ is always finite.)
(b) If ‖T‖ is small, so are all the coefficients of T; in fact, writing e_j for the jth unit vector of ℝ^r, then the ith coordinate of Te_j is τ_ij, so |τ_ij| ≤ ‖Te_j‖ ≤ ‖T‖.
262I Lemma Let φ : D → ℝ^s be a function, where D ⊆ ℝ^r. For i ≤ s let φ_i : D → ℝ be its ith coordinate, so that φ(x) = (φ_1(x), …, φ_s(x)) for x ∈ D.
(a) If φ is differentiable relative to its domain at x ∈ D, then φ is continuous at x.
(b) If x ∈ D, then φ is differentiable relative to its domain at x iff each φ_i is differentiable relative to its domain at x.
(c) If φ is differentiable at x ∈ D, then all the partial derivatives ∂φ_i/∂ξ_j of φ are defined at x, and the derivative of φ at x is the matrix ⟨∂φ_i/∂ξ_j(x)⟩_{i≤s,j≤r}.
(d) If all the partial derivatives ∂φ_i/∂ξ_j, for i ≤ s and j ≤ r, are defined in a neighbourhood of x ∈ D and are continuous at x, then φ is differentiable at x.
proof (a) Let T be a derivative of φ at x. Applying the definition 262Fb with ε = 1, we see that there is a δ > 0 such that ‖φ(y) − φ(x) − T(y − x)‖ ≤ ‖y − x‖ whenever y ∈ D and ‖y − x‖ ≤ δ. Now
‖φ(y) − φ(x)‖ ≤ ‖T(y − x)‖ + ‖y − x‖ ≤ (1 + ‖T‖)‖y − x‖ whenever y ∈ D and ‖y − x‖ ≤ δ, so φ is continuous at x.
(b)(i) If φ is differentiable relative to its domain at x ∈ D, let T be a derivative of φ at x. For i ≤ s let T_i be the 1 × r matrix consisting of the ith row of T. Let ε > 0. Then we have a δ > 0 such that
$$|\phi_i(y)-\phi_i(x)-T_i(y-x)| \le \|\phi(y)-\phi(x)-T(y-x)\| \le \epsilon\|y-x\|$$
whenever y ∈ D and ‖y − x‖ ≤ δ, so that T_i is a derivative of φ_i at x.
(ii) If each φ_i is differentiable relative to its domain at x, with corresponding derivatives T_i, let T be the s × r matrix with rows T_1, …, T_s. Given ε > 0, there is for each i ≤ s a δ_i > 0 such that |φ_i(y) − φ_i(x) − T_i(y − x)| ≤ ε‖y − x‖ whenever y ∈ D, ‖y − x‖ ≤ δ_i; set δ = min_{i≤s} δ_i > 0; then if y ∈ D and ‖y − x‖ ≤ δ, we shall have
$$\|\phi(y)-\phi(x)-T(y-x)\|^2 = \sum_{i=1}^s |\phi_i(y)-\phi_i(x)-T_i(y-x)|^2 \le s\epsilon^2\|y-x\|^2,$$
so that
$$\|\phi(y)-\phi(x)-T(y-x)\| \le \epsilon\sqrt{s}\,\|y-x\|.$$
As ε is arbitrary, T is a derivative of φ at x.
(c) Set T = φ′(x). We have
$$\lim_{y\to x}\frac{\|\phi(y)-\phi(x)-T(y-x)\|}{\|y-x\|} = 0;$$
fix j ≤ r, and consider y = x + ηe_j, where e_j = (0, …, 0, 1, 0, …, 0) is the jth unit vector in ℝ^r. Then we must have
$$\lim_{\eta\to 0}\frac{\|\phi(x+\eta e_j)-\phi(x)-\eta T(e_j)\|}{|\eta|} = 0.$$
Looking at the ith coordinate of φ(x + ηe_j) − φ(x) − ηT(e_j), we have
$$|\phi_i(x+\eta e_j)-\phi_i(x)-\tau_{ij}\eta| \le \|\phi(x+\eta e_j)-\phi(x)-\eta T(e_j)\|,$$
where τ_ij is the (i, j)th coefficient of T; so that
$$\lim_{\eta\to 0}\frac{|\phi_i(x+\eta e_j)-\phi_i(x)-\tau_{ij}\eta|}{|\eta|} = 0.$$
But this just says that the partial derivative ∂φ_i/∂ξ_j(x) exists and is equal to τ_ij, as claimed.
(d) Now suppose that the partial derivatives ∂φ_i/∂ξ_j are defined near x and continuous at x. Let ε > 0. Let δ > 0 be such that
$$\Bigl|\frac{\partial\phi_i}{\partial\xi_j}(y) - \tau_{ij}\Bigr| \le \epsilon$$
whenever ‖y − x‖ ≤ δ, writing τ_ij = ∂φ_i/∂ξ_j(x). Now suppose that ‖y − x‖ ≤ δ. Set
y = (η_1, …, η_r), x = (ξ_1, …, ξ_r), y_j = (η_1, …, η_j, ξ_{j+1}, …, ξ_r) for 0 ≤ j ≤ r,
so that y_0 = x, y_r = y and the line segment between y_{j−1} and y_j lies wholly within δ of x whenever 1 ≤ j ≤ r, since if z lies on this line segment then ζ_i lies between ξ_i and η_i for every i. By the ordinary mean value theorem for differentiable real functions, applied to the function t ↦ φ_i(η_1, …, η_{j−1}, t, ξ_{j+1}, …, ξ_r), there is for each i ≤ s, j ≤ r a point z_ij on the line segment between y_{j−1} and y_j such that
$$\phi_i(y_j)-\phi_i(y_{j-1}) = (\eta_j-\xi_j)\,\frac{\partial\phi_i}{\partial\xi_j}(z_{ij}).$$
But
$$\Bigl|\frac{\partial\phi_i}{\partial\xi_j}(z_{ij}) - \tau_{ij}\Bigr| \le \epsilon,$$
so
$$|\phi_i(y_j)-\phi_i(y_{j-1})-\tau_{ij}(\eta_j-\xi_j)| \le \epsilon|\eta_j-\xi_j| \le \epsilon\|y-x\|.$$
Summing over j,
$$\Bigl|\phi_i(y)-\phi_i(x)-\sum_{j=1}^r \tau_{ij}(\eta_j-\xi_j)\Bigr| \le r\epsilon\|y-x\|$$
for each i. Summing the squares and taking the square root,
$$\|\phi(y)-\phi(x)-T(y-x)\| \le \epsilon r\sqrt{s}\,\|y-x\|,$$
where T = ⟨τ_ij⟩_{i≤s,j≤r}. And this is true whenever ‖y − x‖ ≤ δ. As ε is arbitrary, φ′(x) = T is defined.
262J Remark I am not sure if I ought to apologize for the notation ∂/∂ξ_j. In such formulae as $(\eta_j-\xi_j)\frac{\partial\phi_i}{\partial\xi_j}(z_{ij})$ above, the two appearances of ξ_j clash most violently. But I do not think that any person of good will is likely to be misled, provided that the labels ξ_j (or whatever symbols are used to represent the variables involved) are adequately described when the domain of φ is first introduced (and always remembering that in partial differentiation, we are not only moving one variable, a ξ_j in the present context, but holding fixed some further list of variables, not listed in the notation). I believe that the traditional notation ∂/∂ξ_j has survived for solid reasons, and I should like to offer a welcome to those who are more comfortable with it than with any of the many alternatives which have been proposed, but have never taken root.
262K The Cantor function revisited It is salutary to re-examine the examples of 134H-134I in the light of the present considerations. Let f : [0, 1] → [0, 1] be the Cantor function (134H) and set g(x) = ½(x + f(x)) for x ∈ [0, 1]. Then g : [0, 1] → [0, 1] is a homeomorphism (134I); set φ = g⁻¹ : [0, 1] → [0, 1]. We see that if 0 ≤ x ≤ y ≤ 1 then g(y) − g(x) ≥ ½(y − x); equivalently, φ(y) − φ(x) ≤ 2(y − x) whenever 0 ≤ x ≤ y ≤ 1, so that φ is a Lipschitz function, therefore absolutely continuous (262Bc). If D = {x : φ′(x) is defined}, then [0, 1] \ D is negligible (225Cb), so [0, 1] \ φ[D] = φ[[0, 1] \ D] is negligible (262Da). I noted in 134I that there is a measurable function h : [0, 1] → ℝ such that the composition hφ is not measurable; now h(φ↾D) = (hφ)↾D cannot be measurable, even though φ↾D is differentiable.
262L
It will be convenient to be able to call on the following straightforward result.
Lemma Suppose that D ⊆ ℝ^r and x ∈ ℝ^r are such that $\lim_{\delta\downarrow 0}\frac{\mu^*(D\cap B(x,\delta))}{\mu B(x,\delta)} = 1$. Then $\lim_{z\to 0}\frac{\rho(x+z,D)}{\|z\|} = 0$, where ρ(x + z, D) = inf_{y∈D} ‖x + z − y‖.
proof Let ε > 0. Let δ_0 > 0 be such that
$$\mu^*(D\cap B(x,\delta)) > \Bigl(1-\Bigl(\frac{\epsilon}{1+\epsilon}\Bigr)^r\Bigr)\mu B(x,\delta)$$
whenever 0 < δ ≤ δ_0. Take any z such that 0 < ‖z‖ ≤ δ_0/(1 + ε). ? Suppose, if possible, that ρ(x + z, D) > ε‖z‖. Then B(x + z, ε‖z‖) ⊆ B(x, (1 + ε)‖z‖) \ D, so
$$\mu^*(D\cap B(x,(1+\epsilon)\|z\|)) \le \mu B(x,(1+\epsilon)\|z\|) - \mu B(x+z,\epsilon\|z\|) = \Bigl(1-\Bigl(\frac{\epsilon}{1+\epsilon}\Bigr)^r\Bigr)\mu B(x,(1+\epsilon)\|z\|),$$
which is impossible, as (1 + ε)‖z‖ ≤ δ_0. 𝐗 Thus ρ(x + z, D) ≤ ε‖z‖. As ε is arbitrary, this proves the result.
Remark There is a word for this; see 261Yg.
262M I come now to the first result connecting Lipschitz functions with differentiable functions. I approach it through a substantial lemma which will be the foundation of §263.
Lemma Let r, s ≥ 1 be integers and φ a function from a subset D of R r to R s which is differentiable at each point of its domain. For each x ∈ D let T (x) be a derivative of φ. Let Msr be the set of s × r matrices and ζ : A → ]0, ∞[ a strictly positive function, where A ⊆ Msr is a nonempty set containing T (x) for every x ∈ D. Then we can find sequences hDn in∈N , hTn in∈N such that (i) hDn in∈N is a disjoint cover of D by sets which are relatively measurable in D, that is, are intersections of D with measurable subsets of R r ; (ii) Tn ∈ A for every n; (iii) kφ(x) − φ(y) − Tn (x − y)k ≤ ζ(Tn )kx − yk for every n ∈ N and x, y ∈ Dn ; (iv) kT (x) − Tn k ≤ ζ(Tn ) for every x ∈ Dn . proof (a) The first step is to note that there is a sequence hSn in∈N in A such that S A ⊆ n∈N {T : T ∈ Msr , kT − Sn k < ζ(Sn )}. P P (Of course this is a standard result about separable metric spaces.) Write Q for the set of matrices in Msr with rational coefficients; then there is a natural bijection between Q and Qsr , so Q and Q × N are countable. Enumerate Q × N as h(Rn , kn )in∈N . For each n ∈ N, choose Sn ∈ A by the rule — if there is an S ∈ A such that {T : kT − Rn k ≤ 2−kn } ⊆ {T : kT − Sk < ζ(S)}, take such an S for Sn ; — otherwise, take Sn to be any member of A. I claim that this works. For let S ∈ A. Then ζ(S) > 0; take k ∈ N such that 2−k < ζ(S). Take R∗ ∈ Q such that kR∗ − Sk < min(ζ(S) − 2−k , 2−k ); this is possible because kR − Sk will be small whenever all the coefficients of R are close enough to the corresponding coefficients of S (262Ha), and we can find rational numbers to achieve this. Let n ∈ N be such that R∗ = Rn and k = kn . Then {T : kT − Rn k ≤ 2−kn } ⊆ {T : kT − Sk < ζ(S)} (because kT − Sk ≤ kT − Rn k + kRn − Sk), so we must have chosen Sn by the first part of the rule above, and S ∈ {T : kT − Rn k ≤ 2−kn } ⊆ {T : kT − Sn k < ζ(Sn )}. As S is arbitrary, this proves the result. Q Q (b) Enumerate Qr × Qr × N as h(qn , qn0 , mn )in∈N . 
For each $n\in\mathbb{N}$, set
\[
\begin{aligned}
H_n &= \{x : x\in[q_n,q'_n]\cap D,\ \|\phi(y)-\phi(x)-S_{m_n}(y-x)\|\le\zeta(S_{m_n})\|y-x\| \text{ for every } y\in[q_n,q'_n]\cap D\}\\
&= [q_n,q'_n]\cap D\cap\bigcap_{y\in[q_n,q'_n]\cap D}\{x : x\in D,\ \|\phi(y)-\phi(x)-S_{m_n}(y-x)\|\le\zeta(S_{m_n})\|y-x\|\}.
\end{aligned}
\]
Because $\phi$ is continuous, $H_n = D\cap\overline{H}_n$, writing $\overline{H}_n$ for the closure of $H_n$, so $H_n$ is relatively measurable in $D$. Note that if $x, y\in H_n$, then $y\in D\cap[q_n,q'_n]$, so that $\|\phi(y)-\phi(x)-S_{m_n}(y-x)\|\le\zeta(S_{m_n})\|y-x\|$. Set
\[
H'_n = \{x : x\in H_n,\ \|T(x)-S_{m_n}\|\le\zeta(S_{m_n})\}.
\]
(c) $D=\bigcup_{n\in\mathbb{N}}H'_n$. 𝐏 Let $x\in D$. Then $T(x)\in A$, so there is a $k\in\mathbb{N}$ such that $\|T(x)-S_k\|<\zeta(S_k)$. Let $\delta>0$ be such that
\[
\|\phi(y)-\phi(x)-T(x)(y-x)\|\le(\zeta(S_k)-\|T(x)-S_k\|)\|x-y\|
\]
whenever $y\in D$ and $\|y-x\|\le\delta$. Then
\[
\|\phi(y)-\phi(x)-S_k(y-x)\|\le(\zeta(S_k)-\|T(x)-S_k\|)\|x-y\|+\|T(x)-S_k\|\|x-y\|\le\zeta(S_k)\|x-y\|
\]
whenever $y\in D\cap B(x,\delta)$. Let $q, q'\in\mathbb{Q}^r$ be such that $x\in[q,q']\subseteq B(x,\delta)$. Let $n$ be such that $q=q_n$, $q'=q'_n$ and $k=m_n$. Then $x\in H'_n$. 𝐐

(d) Write
\[
C_n=\Bigl\{x : x\in H_n,\ \lim_{\delta\downarrow 0}\frac{\mu^*(H_n\cap B(x,\delta))}{\mu B(x,\delta)}=1\Bigr\}.
\]
Then $C_n\subseteq H'_n$. 𝐏 (i) Take $x\in C_n$, and set $\tilde T=T(x)-S_{m_n}$. I have to show that $\|\tilde T\|\le\zeta(S_{m_n})$. Take $\epsilon>0$. Let $\delta_0>0$ be such that
\[
\|\phi(y)-\phi(x)-T(x)(y-x)\|\le\epsilon\|y-x\|
\]
whenever $y\in D$ and $\|y-x\|\le\delta_0$. Since
\[
\|\phi(y)-\phi(x)-S_{m_n}(y-x)\|\le\zeta(S_{m_n})\|y-x\|
\]
whenever $y\in H_n$, we have
\[
\|\tilde T(y-x)\|\le(\epsilon+\zeta(S_{m_n}))\|y-x\|
\]
whenever $y\in H_n$ and $\|y-x\|\le\delta_0$.

(ii) By 262L, there is a $\delta_1>0$ such that $(1+2\epsilon)\delta_1\le\delta_0$ and $\rho(x+z,H_n)\le\epsilon\|z\|$ whenever $0<\|z\|\le\delta_1$. So if $\|z\|\le\delta_1$ there is a $y\in H_n$ such that $\|x+z-y\|\le 2\epsilon\|z\|$. (If $z=0$ we can take $y=x$.) Now $\|x-y\|\le(1+2\epsilon)\|z\|\le\delta_0$, so
\[
\begin{aligned}
\|\tilde Tz\| &\le \|\tilde T(y-x)\|+\|\tilde T(x+z-y)\|\\
&\le (\epsilon+\zeta(S_{m_n}))\|y-x\|+\|\tilde T\|\|x+z-y\|\\
&\le (\epsilon+\zeta(S_{m_n}))\|z\|+(\epsilon+\zeta(S_{m_n})+\|\tilde T\|)\|x+z-y\|\\
&\le (\epsilon+\zeta(S_{m_n})+2\epsilon^2+2\epsilon\zeta(S_{m_n})+2\epsilon\|\tilde T\|)\|z\|.
\end{aligned}
\]
And this is true whenever $0<\|z\|\le\delta_1$. But multiplying this inequality by suitable positive scalars we see that
\[
\|\tilde Tz\|\le\bigl(\epsilon+\zeta(S_{m_n})+2\epsilon^2+2\epsilon\zeta(S_{m_n})+2\epsilon\|\tilde T\|\bigr)\|z\|
\]
for all $z\in\mathbb{R}^r$, and
\[
\|\tilde T\|\le\epsilon+\zeta(S_{m_n})+2\epsilon^2+2\epsilon\zeta(S_{m_n})+2\epsilon\|\tilde T\|.
\]
As $\epsilon$ is arbitrary, $\|\tilde T\|\le\zeta(S_{m_n})$, as claimed. 𝐐

(e) By 261Da, $H_n\setminus C_n$ is negligible for every $n$, so $H_n\setminus H'_n$ is negligible, and
\[
H'_n=D\cap(\overline{H}_n\setminus(H_n\setminus H'_n))
\]
is relatively measurable in $D$. Set $D_n=H'_n\setminus$
S k 1. (b) The first step is to show that all the partial derivatives Borel measurable. P P Take j ≤ r. For q ∈ Q \ {0} set
∂φ ∂ξj
are defined almost everywhere and are
1 q
∆q (x) = (φ(x + qej ) − φ(x)), writing ej for the jth unit vector of R r . Because φ is continuous, so is ∆q , so that ∆q is a Borel measurable function for each q. Next, for any x ∈ Rr , 1 δ
D+ (x) = lim supδ→0 (φ(x + δej ) − φ(x)) = limn→∞ supq∈Q,0 0 there is a δ > 0 such that φ(x + (u, 0)) − φ(x) − T (x)(u, 0) ≤ ²kuk whenever kuk ≤ δ, that is, iff for every m ∈ N there is an n ∈ N such that φ(x + (u, 0)) − φ(x) − T (x)(u, 0) ≤ 2−m kuk whenever u ∈ Qr−1 and kuk ≤ 2−n . But for any particular m ∈ N and u ∈ Qr−1 the set {x : φ(x + (u, 0)) − φ(x) − T (x)(u, 0) ≤ 2−m kuk} is measurable, indeed Borel, because all the functions x 7→ φ(x + (u, 0)), x 7→ φ(x), x 7→ T (x)(u, 0) are Borel measurable. So H1 is of the form T S T m∈N n∈N u∈Qr−1 ,kuk≤2−n Emnu where every Emnu is a measurable set, and H1 is therefore measurable. Now however observe that for any σ ∈ R, the function v 7→ φσ (v) = φ(v, σ) : R r−1 → R
is Lipschitz, therefore (by the inductive hypothesis) differentiable almost everywhere on $\mathbb{R}^{r-1}$; and that $(v,\sigma)\in H_1$ iff $(v,\sigma)\in H$ and $\phi'_\sigma(v)$ is defined. Consequently $\{v : (v,\sigma)\in H_1\}$ is conegligible whenever $\{v : (v,\sigma)\in H\}$ is, that is, for almost every $\sigma\in\mathbb{R}$; so that $H_1$, being measurable, must be conegligible. 𝐐

(e) Now, for $q, q'\in\mathbb{Q}$ and $n\in\mathbb{N}$, set
\[
F(q,q',n)=\Bigl\{x : x\in\mathbb{R}^r,\ q\le\frac{\phi(x+(0,\eta))-\phi(x)}{\eta}\le q' \text{ whenever } 0<|\eta|\le 2^{-n}\Bigr\}.
\]
Set
\[
F_*(q,q',n)=\Bigl\{x : x\in F(q,q',n),\ \lim_{\delta\downarrow 0}\frac{\mu^*(F(q,q',n)\cap B(x,\delta))}{\mu B(x,\delta)}=1\Bigr\}.
\]
By 261Da, $F(q,q',n)\setminus F_*(q,q',n)$ is negligible for all $q$, $q'$, $n$, so that
\[
H_2=H_1\setminus\bigcup_{q,q'\in\mathbb{Q},\,n\in\mathbb{N}}(F(q,q',n)\setminus F_*(q,q',n))
\]
is conegligible.

(f) I claim that $\phi$ is differentiable at every point of $H_2$. 𝐏 Take $x=(u,\sigma)\in H_2$. Then $\alpha=\frac{\partial\phi}{\partial\xi_r}(x)$ and $T=T(x)$ are defined. Let $\gamma$ be a Lipschitz constant for $\phi$. Take $\epsilon>0$; take $q, q'\in\mathbb{Q}$ such that $\alpha-\epsilon\le q<\alpha<q'\le\alpha+\epsilon$. There must be an $n\in\mathbb{N}$ such that $x\in F(q,q',n)$; consequently $x\in F_*(q,q',n)$, by the definition of $H_2$. By 262L, there is a $\delta_0>0$ such that $\rho(x+z,F(q,q',n))\le\epsilon\|z\|$ whenever $\|z\|\le\delta_0$. Next, there is a $\delta_1>0$ such that $|\phi(x+(v,0))-\phi(x)-T(v,0)|\le\epsilon\|v\|$ whenever $v\in\mathbb{R}^{r-1}$ and $\|v\|\le\delta_1$. Set
\[
\delta=\min(\delta_0,\delta_1,2^{-n})/(1+2\epsilon)>0.
\]
Suppose that $z=(v,\tau)\in\mathbb{R}^r$ and that $\|z\|\le\delta$. Because $\|z\|\le\delta_0$ there is an $x'=(u',\sigma')\in F(q,q',n)$ such that $\|x+z-x'\|\le 2\epsilon\|z\|$; set $x^*=(u',\sigma)$. Now
\[
\max(\|u-u'\|,|\sigma-\sigma'|)\le\|x-x'\|\le(1+2\epsilon)\|z\|\le\min(\delta_1,2^{-n}),
\]
so
\[
|\phi(x^*)-\phi(x)-T(x^*-x)|\le\epsilon\|u'-u\|\le\epsilon(1+2\epsilon)\|z\|.
\]
But also
\[
|\phi(x')-\phi(x^*)-T(x'-x^*)|=|\phi(x')-\phi(x^*)-\alpha(\sigma'-\sigma)|\le\epsilon|\sigma'-\sigma|\le\epsilon(1+2\epsilon)\|z\|,
\]
because $x'\in F(q,q',n)$ and $|\sigma-\sigma'|\le 2^{-n}$, so that (if $x'\ne x^*$)
\[
\alpha-\epsilon\le q\le\frac{\phi(x^*)-\phi(x')}{\sigma-\sigma'}\le q'\le\alpha+\epsilon
\]
and
\[
\Bigl|\frac{\phi(x')-\phi(x^*)}{\sigma'-\sigma}-\alpha\Bigr|\le\epsilon.
\]
Finally,
\[
|\phi(x+z)-\phi(x')|\le\gamma\|x+z-x'\|\le 2\gamma\epsilon\|z\|,
\]
\[
|Tz-T(x'-x)|\le\|T\|\|x+z-x'\|\le 2\epsilon\|T\|\|z\|.
\]
Putting all these together,
\[
\begin{aligned}
|\phi(x+z)-\phi(x)-Tz| &\le |\phi(x+z)-\phi(x')|+|T(x'-x)-Tz|\\
&\qquad+|\phi(x')-\phi(x^*)-T(x'-x^*)|+|\phi(x^*)-\phi(x)-T(x^*-x)|\\
&\le 2\gamma\epsilon\|z\|+2\epsilon\|T\|\|z\|+\epsilon(1+2\epsilon)\|z\|+\epsilon(1+2\epsilon)\|z\|\\
&= \epsilon(2\gamma+2\|T\|+2+4\epsilon)\|z\|.
\end{aligned}
\]
And this is true whenever $\|z\|\le\delta$. As $\epsilon$ is arbitrary, $\phi$ is differentiable at $x$. 𝐐

Thus $\{x : \phi \text{ is differentiable at } x\}$ includes $H_2$ and is conegligible; and the induction continues.
262X Basic exercises (a) Let φ and ψ be Lipschitz functions from subsets of R r to R s . Show that φ + ψ is a Lipschitz function from dom φ ∩ dom ψ to Rs . (b) Let φ be a Lipschitz function from a subset of R r to R s , and c ∈ R. Show that cφ is a Lipschitz function. (c) Suppose φ : D → R s and ψ : E → R q are Lipschitz functions, where D ⊆ R r and E ⊆ R s . Show that the composition ψφ : D ∩ φ−1 [E] → R q is Lipschitz. (d) Suppose φ, ψ are functions from subsets of R r to R s , and suppose that x ∈ dom φ ∩ dom ψ is such that each function is differentiable relative to its domain at x, with derivatives S, T there. Show that φ + ψ is differentiable relative to its domain at x, and that S + T is a derivative of φ + ψ at x. (e) Suppose that φ is a function from a subset of R r to R s , and is differentiable relative to its domain at x ∈ dom φ. Show that cφ is differentiable relative to its domain at x for every c ∈ R. > (f ) Suppose φ : D → R s and ψ : E → Rq are functions, where D ⊆ R r and E ⊆ R s ; suppose that φ is differentiable relative to its domain at x ∈ D ∩ φ−1 [E], with an s × r matrix T a derivative there, and that ψ is differentiable relative to its domain at φ(x), with a q × s matrix S a derivative there. Show that the composition ψφ is differentiable relative to its domain at x, and that the q × r matrix ST is a derivative of ψφ at x. (g) Let φ : R r → R s be a linear operator, with associated matrix T . Show that φ is differentiable everywhere, with φ0 (x) = T for every x. > (h) Let G ⊆ R r be a convex open set, and φ : G → R s a function such that all the partial derivatives are defined everywhere in G. Show that φ is Lipschitz iff all the partial derivatives are bounded on G.
∂φi ∂ξj
(i) Let φ : R r → R s be a function. Show that φ is differentiable at x ∈ R r iff for every m ∈ N there are an n ∈ N and an r × s matrix T with rational coefficients such that kφ(y) − φ(x) − T (y − x)k ≤ 2−m ky − xk whenever ky − xk ≤ 2−n . >(j) Suppose that f is a realvalued function which is integrable over R r , and that g : R r → R is a bounded differentiable function such that the partial derivative the convolution of f and g (255L). Show that
∂ (f ∂ξj
∂g ∂ξj
is bounded, where j ≤ r. Let f ∗ g be
∗ g) is defined everywhere and equal to f ∗
∂g . ∂ξj
(Hint:
255Xg.) >(k) Let (X, Σ, µ) be a measure space, G ⊆ R r an open set, and f : X × G → R a function. Suppose that (i) for every x ∈ X, t 7→ f (x, t) : G → R is differentiable; ∂f (ii) there is an integrable function g on X such that  ∂τ (x, t) ≤ g(x) whenever x ∈ X, t ∈ G j and j ≤R r; (iii) f (x, t)µ(dx) exists in R for every t ∈ G. R Show that t 7→ f (x, t)µ(dx) : G → R is differentiable. (Hint: show first that, for a suitable M , f (x, t) − f (x, t0 ) ≤ M g(x)kt − t0 k for every t, t0 ∈ G and x ∈ X.) 262Y Further exercises (a) Show if T = hτij ii≤s,j≤r is an s × r matrix then the operator norm qP that Pr s 2 kT k, as defined in 262H, is at most i=1 j=1 τij  . (b) Give an example of a measurable function φ : R 2 → R such that dom
∂φ ∂ξ1
is not measurable.
(c) Let φ : D → R be any function, where D ⊆ R r . Show that H = {x : x ∈ D, φ is differentiable relative to its domain at x} is relatively measurable in D, and that
∂φ ¹H ∂ξj
is measurable for every j ≤ r.
(d) A function $\phi:\mathbb{R}^r\to\mathbb{R}$ is smooth if all its partial derivatives $\dfrac{\partial\ldots\partial\phi}{\partial\xi_i\partial\xi_j\ldots\partial\xi_l}$ are defined everywhere in $\mathbb{R}^r$ and are continuous. Show that if $f$ is integrable over $\mathbb{R}^r$ and $\phi:\mathbb{R}^r\to\mathbb{R}$ is smooth and has bounded support then the convolution $f*\phi$ is smooth. (Hint: 262Xj, 262Xk.)

(e) For $\delta>0$ set $\tilde\phi_\delta(x)=e^{-1/(\delta^2-\|x\|^2)}$ if $\|x\|<\delta$, $0$ if $\|x\|\ge\delta$; set $\alpha_\delta=\int\tilde\phi_\delta(x)\,dx$, $\phi_\delta(x)=\alpha_\delta^{-1}\tilde\phi_\delta(x)$ for every $x$. (i) Show that $\phi_\delta:\mathbb{R}^r\to\mathbb{R}$ is smooth and has bounded support. (ii) Show that if $f$ is integrable over $\mathbb{R}^r$ then $\lim_{\delta\downarrow 0}\int|f(x)-(f*\phi_\delta)(x)|\,dx=0$. (Hint: start with continuous functions $f$ with bounded support, and use 242O.)

(f) Show that if $f$ is integrable over $\mathbb{R}^r$ and $\epsilon>0$ there is a smooth function $h$ with bounded support such that $\int|f-h|\le\epsilon$. (Hint: either reduce to the case in which $f$ has bounded support and use 262Ye or adapt the method of 242Xi.)
(g) Suppose that f is a real function which is integrable over every bounded subset of R r . (i) Show that r that if Rf × φ is integrable whenever φ : R → R is a smooth function with bounded support. (ii) Show R f × φ = 0 for every smooth function with bounded support then f =a.e. 0. (Hint: show that B(x,δ) f = 0 R for every x ∈ R r and δ > 0, and use 261C. Alternatively show that E f = 0 first for E = [b, c], then for open sets E, then for arbitrary measurable sets E.) (h) Let f be integrable over R r , and for δ > 0 let φδ : R r → R be the function of 262Ye. Show that limδ↓0 (f ∗ φδ )(x) = f (x) for every x in the Lebesgue set of f . (Hint: 261Ye.) (i) Let L be the space of all Lipschitz functions from R r to R s and for φ ∈ L set kφk = kφ(0)k + inf{γ : γ ∈ [0, ∞[, kφ(y) − φ(x)k ≤ γky − xk for every x, y ∈ R r }. Show that (L, k k) is a Banach space. 262 Notes and comments The emphasis of this section has turned out to be on the connexions between the concepts of ‘Lipschitz function’ and ‘differentiable function’. It is the delight of classical real analysis that such intimate relationships arise between concepts which belong to different categories. ‘Lipschitz functions’ clearly belong to the theory of metric spaces (I will return to this in §264), while ‘differentiable functions’ belong to the theory of differentiable manifolds, which is outside the scope of this volume. I have written this section out carefully just in case there are readers who have so far missed the theory of differentiable mappings between multidimensional Euclidean spaces; but it also gives me a chance to work through the notion of ‘function differentiable relative to its domain’, which will make it possible in the next section to ride smoothly past a variety of problems arising at boundaries. The difficulties I am concerned with arise in the first place with such functions as the polarcoordinate transformation (ρ, θ) 7→ (ρ cos θ, ρ sin θ) : {(0, 0)} ∪ (]0, ∞[ × ]−π, π]) → R 2 . 
In order to make this a bijection we have to do something rather arbitrary, and the domain of the transformation cannot be an open set. On the definitions I am using, this function is differentiable relative to its domain at every point of its domain, and we can apply such results as 262O uninhibitedly. You will observe that in this case the non-interior points of the domain form a negligible set $\{(0,0)\}\cup(\,]0,\infty[\,\times\{\pi\})$, so we can expect to be able to ignore them; and for most of the geometrically straightforward transformations that the theory is applied to, judicious excision of negligible sets will reduce problems to the case of honestly differentiable functions with open domains. But while open-domain theory will deal with a large proportion of the most important examples, there is a danger that you would be left with real misapprehensions concerning the scope of these methods.

The essence of differentiability is that a differentiable function $\phi$ is approximable, near any given point of its domain, by an affine function. The idea of 262M is to describe a widely effective method of dissecting $D=\operatorname{dom}\phi$ into countably many pieces on each of which $\phi$ is well-behaved. This will be applied in §§263 and 265 to investigate the measure of $\phi[D]$; but we already have several straightforward consequences (262N-262P).
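The remark that a differentiable function is approximable near a point by an affine function can be checked numerically for the polar-coordinate transformation at a non-interior point of its domain. The following Python sketch is an illustration only: the function names, the base point $(2,\pi)$ and the displacements are assumptions chosen for the example, and the displacements are taken so that the moved point stays inside the domain $\{(0,0)\}\cup(\,]0,\infty[\,\times\,]-\pi,\pi])$.

```python
import math

def phi(rho, theta):
    """Polar-coordinate map (rho, theta) -> (rho cos theta, rho sin theta)."""
    return (rho * math.cos(theta), rho * math.sin(theta))

def derivative(rho, theta):
    """Matrix of partial derivatives of phi at (rho, theta), as a pair of rows."""
    return ((math.cos(theta), -rho * math.sin(theta)),
            (math.sin(theta), rho * math.cos(theta)))

def affine_error_ratio(x, h):
    """Return ||phi(x+h) - phi(x) - T(x)h|| / ||h|| for a displacement h."""
    (rho, theta), (dr, dt) = x, h
    T = derivative(rho, theta)
    p0 = phi(rho, theta)
    p1 = phi(rho + dr, theta + dt)
    ex = p1[0] - p0[0] - (T[0][0] * dr + T[0][1] * dt)
    ey = p1[1] - p0[1] - (T[1][0] * dr + T[1][1] * dt)
    return math.hypot(ex, ey) / math.hypot(dr, dt)

# Base point on the edge theta = pi; displacements (-s, -s) stay in the domain.
x = (2.0, math.pi)
ratios = [affine_error_ratio(x, (-s, -s)) for s in (1e-1, 1e-2, 1e-3)]
for s, ratio in zip((1e-1, 1e-2, 1e-3), ratios):
    print(s, ratio)
```

The printed ratios shrink with the size of the displacement, which is what differentiability relative to the domain asks for along displacements that remain in the domain.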
263 Differentiable transformations in $\mathbb{R}^r$

This section is devoted to the proof of a single major theorem (263D) concerning differentiable transformations between subsets of $\mathbb{R}^r$. There will be a generalization of this result in §265, and those with some familiarity with the topic, or sufficient hardihood, may wish to read §264 before taking this section and §265 together. I end with a few simple corollaries and an extension of the main result which can be made in the one-dimensional case (263I).

Throughout this section, as in the rest of the chapter, $\mu$ will denote Lebesgue measure on $\mathbb{R}^r$.

263A Linear transformations I begin with the special case of linear operators, which is not only the basis of the proof of 263D, but is also one of its most important applications, and is indeed sufficient for many very striking results.

Theorem Let $T$ be a real $r\times r$ matrix; regard $T$ as a linear operator from $\mathbb{R}^r$ to itself. Let $J=|\det T|$ be the modulus of its determinant. Then
\[
\mu T[E]=J\mu E
\]
for every measurable set $E\subseteq\mathbb{R}^r$. If $T$ is a bijection (that is, if $J\ne 0$), then
\[
\mu F=J\mu T^{-1}[F]
\]
for every measurable $F\subseteq\mathbb{R}^r$, and
\[
\int_F g\,d\mu=J\int_{T^{-1}[F]}gT\,d\mu
\]
for every integrable function $g$ and measurable set $F$.

proof (a) The first step is to show that $T[I]$ is measurable for every half-open interval $I\subseteq\mathbb{R}^r$. 𝐏 Any non-empty half-open interval $I=[a,b[$ is a countable union of closed intervals $I_n=[a,b-2^{-n}\mathbf{1}]$, and each $I_n$ is compact (2A2F), so that $T[I_n]$ is compact (2A2Eb), therefore closed (2A2Ec), therefore measurable (115G), and $T[I]=\bigcup_{n\in\mathbb{N}}T[I_n]$ is measurable. 𝐐

(b) Set $J^*=\mu T[\,[\mathbf{0},\mathbf{1}[\,]$, where $\mathbf{0}=(0,\ldots,0)$ and $\mathbf{1}=(1,\ldots,1)$; because $T[\,[\mathbf{0},\mathbf{1}[\,]$ is bounded, $J^*<\infty$. (I will eventually show that $J^*=J$.)

It is convenient to deal with the case of singular $T$ first. Recall that $T$, regarded as a linear transformation from $\mathbb{R}^r$ to itself, is either bijective or onto a proper linear subspace. In the latter case, take any $e\in\mathbb{R}^r\setminus T[\mathbb{R}^r]$; then the sets
\[
T[\,[\mathbf{0},\mathbf{1}[\,]+\gamma e,
\]
as $\gamma$ runs over $[0,1]$, are disjoint and all of the same measure $J^*$, because $\mu$ is translation-invariant (134A); moreover, their union is bounded, so has finite outer measure. As there are infinitely many such $\gamma$, the common measure $J^*$ must be zero. Now observe that
\[
T[\mathbb{R}^r]=\bigcup_{z\in\mathbb{Z}^r}T[\,[\mathbf{0},\mathbf{1}[\,]+Tz,
\]
and
\[
\mu(T[\,[\mathbf{0},\mathbf{1}[\,]+Tz)=J^*=0
\]
for every $z\in\mathbb{Z}^r$, while $\mathbb{Z}^r$ is countable, so $\mu T[\mathbb{R}^r]=0$. At the same time, because $T$ is singular, it has zero determinant, and $J=0$. Accordingly
\[
\mu T[E]=0=J\mu E
\]
for every measurable $E\subseteq\mathbb{R}^r$, and we're done.

(c) Henceforth, therefore, let us assume that $T$ is non-singular. Note that it and its inverse are continuous, so that $T$ is a homeomorphism, and $T[G]$ is open iff $G$ is open.

If $a\in\mathbb{R}^r$ and $k\in\mathbb{N}$, then
\[
\mu T[\,[a,a+2^{-k}\mathbf{1}[\,]=2^{-kr}J^*.
\]
𝐏 Set $J^*_k=\mu T[\,[\mathbf{0},2^{-k}\mathbf{1}[\,]$. Now $T[\,[a,a+2^{-k}\mathbf{1}[\,]=T[\,[\mathbf{0},2^{-k}\mathbf{1}[\,]+Ta$; because $\mu$ is translation-invariant, its measure is also $J^*_k$. Next, $[\mathbf{0},\mathbf{1}[$ is expressible as a disjoint union of $2^{kr}$ sets of the form $[a,a+2^{-k}\mathbf{1}[$; consequently, $T[\,[\mathbf{0},\mathbf{1}[\,]$ is expressible as a disjoint union of $2^{kr}$ sets of the form $T[\,[a,a+2^{-k}\mathbf{1}[\,]$, and
J ∗ = µT [ [0, 1[ ] = 2kr Jk∗ , that is, Jk∗ = 2−kr J ∗ , as claimed. Q Q (d) Consequently µT [G] = J ∗ µG for every open set G ⊆ R r . P P For each k ∈ N, set £ −k £ r −k Qk = {z : z ∈ Z , 2 z, 2 z + 2−k 1 ⊆ G, £ £ S Gk = z∈Qk 2−k z, 2−k z + 2−k 1 . £ −k £ −k −k kr Then Gk is a disjoint union of #(Qk ) sets of the form k ); also, £ −k 2 −kz, 2 z−k+ 2£ 1 , so µGk = 2−kr#(Q T [Gk ] is a disjoint union of #(Qk ) sets of the form T [ 2 z, 2 z + 2 1 ], so has measure 2 J ∗ #(Qk ) = J ∗ µGk , using (c). Observe next that hGk ik∈N is a nondecreasing sequence with union G, so that µT [G] = limk→∞ µT [Gk ] = limk→∞ J ∗ µGk = J ∗ µG. Q Q (e) It follows that µ∗ T [A] = J ∗ µ∗ A for every A ⊆ Rr . P P Given A ⊆ R r and ² > 0, there are open sets ∗ G, H such that G ⊇ A, H ⊇ T [A], µG ≤ µ A + ² and µH ≤ µ∗ T [A] + ² (134Fa). Set G1 = G ∩ T −1 [H]; then G1 is open because T −1 [H] is. Now µT [G1 ] = J ∗ µG1 , so µ∗ T [A] ≤ µT [G1 ] = J ∗ µG1 ≤ J ∗ µ∗ A + J ∗ ² ≤ J ∗ µG1 + J ∗ ² = µT [G1 ] + J ∗ ² ≤ µH + J ∗ ² ≤ µ∗ T [A] + ² + J ∗ ². As ² is arbitrary, µ∗ T [A] = J ∗ µ∗ A. Q Q (f ) Consequently µT [E] exists and is equal to J ∗ µE for every measurable E ⊆ R r . P P Let E ⊆ R r be r 0 −1 measurable, and take any A ⊆ R . Set A = T [A]. Then µ∗ (A ∩ T [E]) + µ∗ (A \ T [E]) = µ∗ (T [A0 ∩ E]) + µ∗ (T [A0 \ E]) = J ∗ (µ∗ (A0 ∩ E) + µ∗ (A0 \ E)) = J ∗ µ∗ A0 = µ∗ T [A0 ] = µ∗ A. As A is arbitrary, T [E] is measurable, and now µT [E] = µ∗ T [E] = J ∗ µ∗ E = J ∗ µE. Q Q (g) We are at last ready for the calculation of J ∗ . Recall that the matrix T must be expressible as P DQ, where P and Q are orthogonal matrices and D is diagonal, with nonnegative diagonal entries (2A6C). Now we must have T [ [0, 1[ ] = P [D[Q[ [0, 1[ ]]], so, using (f), ∗ ∗ J ∗ = JP∗ JD JQ , ∗ where JP∗ = µP [ [0, 1[ ], etc. Now we find that JP∗ = JQ = 1. P P Let B = B(0, 1) be the unit ball of r R . 
Because B is closed, it is measurable; because it is bounded, µB < ∞; and because B includes the £ £ nonempty halfopen interval 0, r−1/2 1 , µB > 0. Now P [B] = Q[B] = B, because P and Q are orthogonal matrices; so we have
µB = µP [B] = JP∗ µB, ∗ and JP∗ must be 1; similarly, JQ = 1. Q Q ∗ (h) So we have only to calculate JD . Suppose the coefficients of D are δ1 , . . . , δr ≥ 0, so that Dx = (δ1 ξ1 , . . . , δr ξr ) = d × x. We have been assuming since the beginning of (c) that T is nonsingular, so no δi can be 0. Accordingly
\[
D[\,[\mathbf{0},\mathbf{1}[\,]=[\mathbf{0},d[,
\]
and
\[
J^*_D=\mu[\mathbf{0},d[\;=\prod_{i=1}^r\delta_i=\det D.
\]
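As a numerical aside, the identity $\mu T[E]=|\det T|\,\mu E$ can be sanity-checked in the plane by approximating the image of the unit square with small grid cells, in the spirit of the dyadic approximation used in part (d) of the proof. The sketch below is an illustration under assumptions of my own: the function name `image_area_by_grid`, the sample matrix and the cell size are not from the text, and the method counts cells by their centres rather than by inclusion.

```python
def image_area_by_grid(T, h=0.005):
    """Approximate the Lebesgue measure of T[[0,1[^2] by counting grid cells
    of side h whose centres lie in the image (T is a non-singular 2x2 matrix,
    given as a pair of rows)."""
    (a, b), (c, d) = T
    det = a * d - b * c
    # Inverse matrix, used to test membership:
    # p lies in T[[0,1[^2] iff T^{-1} p lies in [0,1[^2.
    inv = ((d / det, -b / det), (-c / det, a / det))
    # Bounding box of the image parallelogram, from its four vertices.
    xs = [0.0, a, b, a + b]
    ys = [0.0, c, d, c + d]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    nx = int((x1 - x0) / h) + 1
    ny = int((y1 - y0) / h) + 1
    count = 0
    for i in range(nx):
        px = x0 + (i + 0.5) * h
        for j in range(ny):
            py = y0 + (j + 0.5) * h
            u = inv[0][0] * px + inv[0][1] * py
            v = inv[1][0] * px + inv[1][1] * py
            if 0.0 <= u < 1.0 and 0.0 <= v < 1.0:
                count += 1
    return count * h * h

# Hypothetical sample matrix; its determinant is 2*3 - 1*0.5 = 5.5.
T = ((2.0, 1.0), (0.5, 3.0))
approx = image_area_by_grid(T)
exact = abs(T[0][0] * T[1][1] - T[0][1] * T[1][0])
print(approx, exact)
```

Only cells meeting the boundary of the parallelogram can be miscounted, so the discrepancy shrinks in proportion to the cell size `h`.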
Now because P and Q are orthogonal, both have determinant ±1, so det T = ± det D and J ∗ = ± det T ; because J ∗ is surely nonnegative, J ∗ =  det T  = J. (i) Thus µT [E] = JµE for every Lebesgue measurable E ⊆ R r . If T is nonsingular, then we may use the above argument to show that T −1 [F ] is measurable for every measurable F , and R µF = µT [T −1 [F ]] = JµT −1 [F ] = J × χ(T −1 [F ]) dµ, identifying J with the constant function with value J. By 235A, R R R g dµ = T −1 [F ] JgT dµ = J T −1 [F ] gT dµ F for every integrable function g and measurable set F . 263B Remark Perhaps I should have warned you that I should be calling on the results of §235. But if they were fresh in your mind the formulae of the statement of the theorem will have recalled them, and if not then it is perhaps better to turn back to them now rather than before reading the theorem, since they are used only in the last sentence of the proof. I have taken the argument above at a leisurely, not to say pedestrian, pace. The point is that while the translationinvariance of Lebesgue measure, and its behaviour under simple magnification of a single coordinate, are more or less built into the definition, its behaviour under general rotations is not, since a rotation takes halfopen intervals into skew cuboids. Of course the calculation of the measure of such an object is not really anything to do with the Lebesgue theory, and it will be clear that much of the argument would apply equally to any geometrically reasonable notion of rdimensional volume. We come now to the central result of the chapter. We have already done some of the detail work in 262M. The next basic element is the following lemma. 263C Lemma Let T be any r × r matrix; set J =  det T . 
Then for any ² > 0 there is a ζ = ζ(T, ²) > 0 such that (i)  det S − det T  ≤ ² whenever S is an r × r matrix and kS − T k ≤ ζ; (ii) whenever D ⊆ R r is a bounded set and φ : D → R r is a function such that kφ(x) − φ(y) − T (x − y)k ≤ ζkx − yk for all x, y ∈ D, then µ∗ φ[D] − Jµ∗ D ≤ ²µ∗ D. proof (a) Of course (i) is the easy part. Because det S is a continuous function of the coefficients of S, and the coefficients of S must be close to those of T if kS − T k is small (262Hb), there is surely a ζ0 > 0 such that  det S − det T  ≤ ² whenever kS − T k ≤ ζ0 . (b)(i) Write B = B(0, 1) for the unit ball of R r , and consider T [B]. We know that µT [B] = JµB (263A). Let G ⊇ T [B] be an open set such that µG ≤ (J + ²)µB (134Fa). Because B is compact (2A2F) so is T [B], so there is a ζ1 > 0 such that T [B] + ζ1 B ⊆ G (2A2Ed). This means that µ∗ (T [B] + ζ1 B) ≤ (J + ²)µB. (ii) Now suppose that D ⊆ R r is a bounded set, and that φ : D → R r is a function such that kφ(x) − φ(y) − T (x − y)k ≤ ζ1 kx − yk for all x, y ∈ D. Then if x ∈ D and δ > 0, φ[D ∩ B(x, δ)] ⊆ φ(x) + δT [B] + δζ1 B, because if y ∈ D ∩ B(x, δ) then T (y − x) ∈ δT [B] and φ(y) = φ(x) + T (y − x) + (φ(y) − φ(x) − T (y − x)) ∈ φ(x) + δT [B] + ζ1 ky − xkB ⊆ φ(x) + δT [B] + ζ1 δB. Accordingly µ∗ φ[D ∩ B(x, δ)] ≤ µ∗ (δT [B] + δζ1 B) = δ r µ∗ (T [B] + ζ1 B) ≤ δ r (J + ²)µB = (J + ²)µB(x, δ). S P∞ Let η > 0. Then there is a sequence hBn in∈N of balls in R r such that D ⊆ n∈N Bn , n=0 µBn ≤ µ∗ D+η and the sum of the measures of those Bn whose centres do not lie in D is at most η (261F). Let K be the
set of those $n$ such that the centre of $B_n$ lies in $D$. Then $\mu^*\phi[D\cap B_n]\le(J+\epsilon)\mu B_n$ for every $n\in K$. Also, of course, $\phi$ is $(\|T\|+\zeta_1)$-Lipschitz, so $\mu^*\phi[D\cap B_n]\le(\|T\|+\zeta_1)^r\mu B_n$ for $n\in\mathbb{N}\setminus K$ (262D). Now
\[
\begin{aligned}
\mu^*\phi[D] &\le \sum_{n=0}^\infty\mu^*\phi[D\cap B_n]\\
&\le \sum_{n\in K}(J+\epsilon)\mu B_n+\sum_{n\in\mathbb{N}\setminus K}(\|T\|+\zeta_1)^r\mu B_n\\
&\le (J+\epsilon)(\mu^*D+\eta)+\eta(\|T\|+\zeta_1)^r.
\end{aligned}
\]
As $\eta$ is arbitrary,
\[
\mu^*\phi[D]\le(J+\epsilon)\mu^*D.
\]
(c) If $J=0$, we can stop here, setting $\zeta=\min(\zeta_0,\zeta_1)$; for then we surely have $|\det S-\det T|\le\epsilon$ whenever $\|S-T\|\le\zeta$, while if $\phi:D\to\mathbb{R}^r$ is such that $\|\phi(x)-\phi(y)-T(x-y)\|\le\zeta\|x-y\|$ for all $x, y\in D$, then
\[
|\mu^*\phi[D]-J\mu^*D|=\mu^*\phi[D]\le\epsilon\mu^*D.
\]
If $J\ne 0$, we have more to do. Because $T$ has non-zero determinant, it has an inverse $T^{-1}$, and $|\det T^{-1}|=J^{-1}$. As in (b-i) above, there is a $\zeta_2>0$ such that
\[
\mu^*(T^{-1}[B]+\zeta_2B)\le(J^{-1}+\epsilon')\mu B,
\]
where $\epsilon'=\epsilon/(J(J+\epsilon))$. Repeating (b), we see that if $C\subseteq\mathbb{R}^r$ is bounded and $\psi:C\to\mathbb{R}^r$ is such that $\|\psi(u)-\psi(v)-T^{-1}(u-v)\|\le\zeta_2\|u-v\|$ for all $u, v\in C$, then $\mu^*\psi[C]\le(J^{-1}+\epsilon')\mu^*C$.

Now suppose that $D\subseteq\mathbb{R}^r$ is bounded and $\phi:D\to\mathbb{R}^r$ is such that
\[
\|\phi(x)-\phi(y)-T(x-y)\|\le\zeta'_2\|x-y\|
\]
for all $x, y\in D$, where $\zeta'_2=\min(\zeta_2,\|T^{-1}\|)/(2\|T^{-1}\|^2)>0$. Then
\[
\|T^{-1}(\phi(x)-\phi(y))-(x-y)\|\le\|T^{-1}\|\zeta'_2\|x-y\|\le\tfrac12\|x-y\|
\]
for all $x, y\in D$, so $\phi$ must be injective; set $C=\phi[D]$ and $\psi=\phi^{-1}:C\to D$. Note that $C$ is bounded, because $\|\phi(x)-\phi(y)\|\le(\|T\|+\zeta'_2)\|x-y\|$ whenever $x, y\in D$. Also
\[
\|T^{-1}(u-v)-(\psi(u)-\psi(v))\|\le\|T^{-1}\|\zeta'_2\|\psi(u)-\psi(v)\|\le\tfrac12\|\psi(u)-\psi(v)\|
\]
for all $u, v\in C$. But this means that
\[
\|\psi(u)-\psi(v)\|-\|T^{-1}\|\|u-v\|\le\tfrac12\|\psi(u)-\psi(v)\|
\]
and $\|\psi(u)-\psi(v)\|\le 2\|T^{-1}\|\|u-v\|$ for all $u, v\in C$, so that
\[
\|\psi(u)-\psi(v)-T^{-1}(u-v)\|\le 2\zeta'_2\|T^{-1}\|^2\|u-v\|\le\zeta_2\|u-v\|
\]
for all $u, v\in C$. By (b) just above, it follows that
\[
\mu^*D=\mu^*\psi[C]\le(J^{-1}+\epsilon')\mu^*C=(J^{-1}+\epsilon')\mu^*\phi[D],
\]
and
\[
J\mu^*D\le(1+J\epsilon')\mu^*\phi[D].
\]
(d) So if we set $\zeta=\min(\zeta_0,\zeta_1,\zeta'_2)>0$, and if $D\subseteq\mathbb{R}^r$, $\phi:D\to\mathbb{R}^r$ are such that $D$ is bounded and $\|\phi(x)-\phi(y)-T(x-y)\|\le\zeta\|x-y\|$ for all $x, y\in D$, we shall have
\[
\mu^*\phi[D]\le(J+\epsilon)\mu^*D,
\]
\[
\mu^*\phi[D]\ge J\mu^*D-J\epsilon'\mu^*\phi[D]\ge J\mu^*D-J\epsilon'(J+\epsilon)\mu^*D=J\mu^*D-\epsilon\mu^*D,
\]
so we get the required formula
\[
|\mu^*\phi[D]-J\mu^*D|\le\epsilon\mu^*D.
\]

263D We are ready for the theorem.
Theorem Let $D\subseteq\mathbb{R}^r$ be any set, and $\phi:D\to\mathbb{R}^r$ a function differentiable relative to its domain at each point of $D$. For each $x\in D$ let $T(x)$ be a derivative of $\phi$ relative to $D$ at $x$, and set $J(x)=|\det T(x)|$. Then
(i) $J:D\to[0,\infty[$ is a measurable function,
(ii) $\mu^*\phi[D]\le\int_DJ\,d\mu$,
allowing $\infty$ as the value of the integral. If $D$ is measurable, then
(iii) $\phi[D]$ is measurable.
If $D$ is measurable and $\phi$ is injective, then
(iv) $\mu\phi[D]=\int_DJ\,d\mu$,
(v) for every real-valued function $g$ defined on a subset of $\phi[D]$,
\[
\int_{\phi[D]}g\,d\mu=\int_DJ\times g\phi\,d\mu
\]
if either integral is defined in $[-\infty,\infty]$, provided we interpret $J(x)g(\phi(x))$ as zero when $J(x)=0$ and $g(\phi(x))$ is undefined.

proof (a) To see that $J$ is measurable, use 262P; the function $T\mapsto|\det T|$ is a continuous function of the coefficients of $T$, and the coefficients of $T(x)$ are measurable functions of $x$, by 262P, so $x\mapsto|\det T(x)|$ is measurable (121K). We also know that if $D$ is measurable, $\phi[D]$ will be measurable, by 262Ob. Thus (i) and (iii) are done.

(b) For the moment, assume that $D$ is bounded, and fix $\epsilon>0$. For $r\times r$ matrices $T$, take $\zeta(T,\epsilon)>0$ as in 263C. Take $\langle D_n\rangle_{n\in\mathbb{N}}$, $\langle T_n\rangle_{n\in\mathbb{N}}$ as in 262M, so that $\langle D_n\rangle_{n\in\mathbb{N}}$ is a disjoint cover of $D$ by sets which are relatively measurable in $D$, and each $T_n$ is an $r\times r$ matrix such that
\[
\|T(x)-T_n\|\le\zeta(T_n,\epsilon) \text{ whenever } x\in D_n,
\]
\[
\|\phi(x)-\phi(y)-T_n(x-y)\|\le\zeta(T_n,\epsilon)\|x-y\| \text{ for all } x, y\in D_n.
\]
Then, setting $J_n=|\det T_n|$, we have
\[
|J(x)-J_n|\le\epsilon \text{ for every } x\in D_n,
\]
\[
|\mu^*\phi[D_n]-J_n\mu^*D_n|\le\epsilon\mu^*D_n,
\]
by the choice of $\zeta(T_n,\epsilon)$. So we have
\[
\int_DJ\,d\mu\le\sum_{n=0}^\infty J_n\mu^*D_n+\epsilon\mu^*D\le\int_DJ\,d\mu+2\epsilon\mu^*D;
\]
I am using here the fact that all the $D_n$ are relatively measurable in $D$, so that, in particular, $\mu^*D=\sum_{n=0}^\infty\mu^*D_n$. Next,
\[
\mu^*\phi[D]\le\sum_{n=0}^\infty\mu^*\phi[D_n]\le\sum_{n=0}^\infty J_n\mu^*D_n+\epsilon\mu^*D.
\]
Putting these together,
\[
\mu^*\phi[D]\le\int_DJ\,d\mu+2\epsilon\mu^*D.
\]
If $D$ is measurable and $\phi$ is injective, then all the $D_n$ are measurable subsets of $\mathbb{R}^r$, so all the $\phi[D_n]$ are measurable, and they are also disjoint. Accordingly
\[
\int_DJ\,d\mu\le\sum_{n=0}^\infty J_n\mu D_n+\epsilon\mu D\le\sum_{n=0}^\infty(\mu\phi[D_n]+\epsilon\mu D_n)+\epsilon\mu D=\mu\phi[D]+2\epsilon\mu D.
\]
Since $\epsilon$ is arbitrary, we get
\[
\mu^*\phi[D]\le\int_DJ\,d\mu,
\]
and if $D$ is measurable and $\phi$ is injective,
\[
\int_DJ\,d\mu\le\mu\phi[D];
\]
thus we have (ii) and (iv), on the assumption that $D$ is bounded.
(c) For a general set $D$, set $B_k=B(\mathbf{0},k)$; then
\[
\mu^*\phi[D]=\lim_{k\to\infty}\mu^*\phi[D\cap B_k]\le\lim_{k\to\infty}\int_{D\cap B_k}J\,d\mu=\int_DJ\,d\mu,
\]
with equality if φ is injective and D is measurable. (d) For part (v), I seek to show that the hypotheses of 235L are satisfied, taking X = D and Y = φ[D]. P P Set G = {x : x ∈ D, J(x) > 0}. α) If F ⊆ φ[D] is measurable, then there are Borel sets F1 , F2 such that F1 ⊆ F ⊆ F2 and µ(F2 \F1 ) = (α 0. Set Ej = φ−1 [Fj ] for each j, so that E1 ⊆ φ−1 [F ] ⊆ E2 , and both the sets Ej are measurable, because φ and dom φ are measurable. Now, applying (iv) to φ¹Ej , R J dµ = µφ[Ej ] = µ(Fj ∩ φ[D]) = µF Ej R for both j, so E2 \E1 J dµ = 0 and J = 0 a.e. on E2 \ E1 . Accordingly J × χ(φ−1 [F ]) =a.e. J × χE1 , and R R J × χ(φ−1 [F ])dµ exists and is equal to E1 J dµ = µF . At the same time, (φ−1 [F ] ∩ G)4(E1 ∩ G) is negligible, so φ−1 [F ] ∩ G is measurable. R β ) If F ⊆ φ[D] and G ∩ φ−1 [F ] is measurable, then we know that µφ[D \ G] = D\G J = 0 (by (iv)), (β so F \ φ[G] must be negligible; while F ∩ φ[G] = φ[G ∩ φ−1 [F ]] is also measurable, by (iii). Accordingly F is measurable whenever G ∩ φ−1 [F ] is measurable. Thus all the hypotheses of 235L are satisfied. Q Q Now (v) can be read off from the conclusion of 235L. 263E Remarks (a) This is a version of the classical result on change of variable in a manydimensional integral. What I here call J(x) is the Jacobian of φ at x; it describes the change in volumes of objects near x, following the rule already established in 263A for functions with constant derivative. The idea of the proof is also the classical one: to break the set D up into small enough pieces Dm for us to be able to approximate φ by affine operators y 7→ φ(x) + Tm (y − x) on each. The potential irregularity of the set D, which in this theorem may be any set, is compensated for by a corresponding freedom in choosing the sets Dm . In fact there is a further decomposition of the sets Dm hidden in part (bii) of the proof of 263C; each Dm is essentially covered by a disjoint family of balls, the measures of whose images we can estimate with an adequate accuracy. 
There is always a danger of a negligible exceptional set, and we need the crude inequalities of the proof of 262D to deal with it. (b) Throughout the work of this chapter, from 261B to 263D, I have chosen balls B(x, δ) as the basic shapes to work with. I think it should be clear that in fact any reasonable shapes would do just as well. In particular, the ‘balls’ Pr B1 (x, δ) = {y : i=1 ηi − ξi  ≤ δ}, B∞ (x, δ) = {y : ηi − ξi  ≤ δ ∀ i} would serve perfectly. There are many alternatives.£ We could use sets C(x, k), for x ∈ R r and £ of the form −k −k r k ∈ N, defined to be the halfopen cube of the form 2 z, 2 (z + 1) with z ∈ Z containing x, instead; or even C 0 (x, δ) = [x, x + δ1[. In all such cases we have versions of the density theorems (261Yb261Yc) which support the remaining theory. (c) I have presented 263D as a theorem about differentiable functions, because that is the normal form in which one uses it in elementary applications. However, the proof depends essentially on the fact that a differentiable function is a countable union of Lipschitz functions, and 263D would follow at once from the same theorem proved for Lipschitz functions only. Now the fact is that the theorem applies to any countable union of Lipschitz functions, because a Lipschitz function is differentiable almost everywhere. For more advanced work (see Federer 69 or Evans & Gariepy 92, or Chapter 47 in Volume 4) it seems clear that Lipschitz functions are the vital ones, so I spell out the result. *263F Corollary Let D ⊆ R r be any set and φ : D → R r a Lipschitz function. Let D1 be the set of points at which φ has a derivative relative to D, and for each x ∈ D1 let T (x) be such a derivative, with J(x) =  det T (x). Then (i) D \ D1 is negligible;
(ii) $J : D_1 \to [0,\infty[$ is measurable;
(iii) $\mu^*\phi[D] \le \int_D J(x)\,dx$.
If $D$ is measurable, then
(iv) $\phi[D]$ is measurable.
If $D$ is measurable and $\phi$ is injective, then
(v) $\mu\phi[D] = \int_D J\,d\mu$,
(vi) for every real-valued function $g$ defined on a subset of $\phi[D]$,
$$\int_{\phi[D]} g\,d\mu = \int_D J \times g\phi\,d\mu$$
if either integral is defined in $[-\infty,\infty]$, provided we interpret $J(x)g(\phi(x))$ as zero when $J(x) = 0$ and $g(\phi(x))$ is undefined.

proof This is now just a matter of putting 262Q and 263D together, with a little help from 262D. Use 262Q to show that $D \setminus D_1$ is negligible, 262D to show that $\phi[D \setminus D_1]$ is negligible, and apply 263D to $\phi{\restriction}D_1$.

263G Polar coordinates in the plane I offer an elementary example with a useful consequence. Define $\phi : \mathbb{R}^2 \to \mathbb{R}^2$ by setting $\phi(\rho,\theta) = (\rho\cos\theta, \rho\sin\theta)$ for $\rho, \theta \in \mathbb{R}$. Then
$$\phi'(\rho,\theta) = \begin{pmatrix} \cos\theta & -\rho\sin\theta \\ \sin\theta & \rho\cos\theta \end{pmatrix},$$
so $J(\rho,\theta) = |\rho|$ for all $\rho, \theta$. Of course $\phi$ is not injective, but if we restrict it to the domain $D = \{(0,0)\} \cup \{(\rho,\theta) : \rho > 0,\ -\pi < \theta \le \pi\}$ then $\phi{\restriction}D$ is a bijection between $D$ and $\mathbb{R}^2$, and
$$\int g\,d\xi_1 d\xi_2 = \int_D g(\phi(\rho,\theta))\,\rho\,d\rho\,d\theta$$
for every real-valued function $g$ which is integrable over $\mathbb{R}^2$. Suppose, in particular, that we set
$$g(x) = e^{-\|x\|^2/2} = e^{-\xi_1^2/2}e^{-\xi_2^2/2}$$
for $x = (\xi_1,\xi_2) \in \mathbb{R}^2$. Then
$$\int g(x)\,dx = \int e^{-\xi_1^2/2}\,d\xi_1 \int e^{-\xi_2^2/2}\,d\xi_2,$$
as in 253D. Setting $I = \int e^{-t^2/2}\,dt$, we have $\int g = I^2$. (To see that $I$ is well-defined in $\mathbb{R}$, note that the integrand is continuous, therefore measurable, and that
$$\int_{-1}^{1} e^{-t^2/2}\,dt \le 2,$$
$$\int_{-\infty}^{-1} e^{-t^2/2}\,dt = \int_1^\infty e^{-t^2/2}\,dt \le \int_1^\infty e^{-t/2}\,dt = \lim_{a\to\infty}\int_1^a e^{-t/2}\,dt = 2e^{-1/2}$$
are both finite.) Now looking at the alternative expression we have
$$I^2 = \int g(x)\,dx = \int_D g(\rho\cos\theta, \rho\sin\theta)\,\rho\,d(\rho,\theta) = \int_D e^{-\rho^2/2}\rho\,d(\rho,\theta) = \int_0^\infty\int_{-\pi}^{\pi} \rho e^{-\rho^2/2}\,d\theta\,d\rho$$
(ignoring the point $(0,0)$, which has zero measure)
$$= \int_0^\infty 2\pi\rho e^{-\rho^2/2}\,d\rho = 2\pi\lim_{a\to\infty}\int_0^a \rho e^{-\rho^2/2}\,d\rho = 2\pi\lim_{a\to\infty}(-e^{-a^2/2} + 1) = 2\pi.$$
Consequently
$$\int_{-\infty}^{\infty} e^{-t^2/2}\,dt = I = \sqrt{2\pi},$$
which is one of the many facts every mathematician should know, and in particular is vital for Chapter 27 below.
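As a numerical sanity check on 263G (not part of the argument), the following Python sketch approximates both sides of the polar-coordinate identity. The helper name midpoint_integral, the cutoff 12 and the number of cells are illustrative assumptions, not anything taken from the text.

```python
import math

def midpoint_integral(f, a, b, n=20_000):
    """Approximate the integral of f over [a, b] by the midpoint rule with n cells."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# One-dimensional integral I, truncated to [-12, 12]; the discarded tails
# are smaller than exp(-72), far below the quadrature error.
I = midpoint_integral(lambda t: math.exp(-t * t / 2), -12.0, 12.0)

# Polar form: the integrand does not depend on theta, so the theta-integral
# contributes the factor 2*pi and leaves the radial integral of
# rho * exp(-rho^2 / 2) over [0, 12].
radial = midpoint_integral(lambda rho: rho * math.exp(-rho * rho / 2), 0.0, 12.0)
I_squared_polar = 2 * math.pi * radial

print(I, math.sqrt(2 * math.pi))   # the two values should agree closely
print(I * I, I_squared_polar)      # both should be close to 2*pi
```

The factor $\rho$ in the radial integrand is exactly the Jacobian $J(\rho,\theta)$ of 263G; dropping it would break the agreement between the two printed pairs.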
263H Corollary If $k \in \mathbb{N}$ is odd,
$$\int_{-\infty}^{\infty} x^k e^{-x^2/2}\,dx = 0;$$
if $k = 2l \in \mathbb{N}$ is even, then
$$\int_{-\infty}^{\infty} x^k e^{-x^2/2}\,dx = \frac{(2l)!}{2^l\, l!}\sqrt{2\pi}.$$
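The formula of 263H can be compared with a direct numerical quadrature. The sketch below is only an illustration under stated assumptions: the function names gaussian_moment and closed_form, the cutoff 14 and the midpoint rule are my own choices, not part of the text.

```python
import math

def gaussian_moment(k, cutoff=14.0, n=40_000):
    """Midpoint-rule approximation of the integral of x^k exp(-x^2/2)
    over [-cutoff, cutoff]; the tails beyond the cutoff are negligible
    for the small values of k used here."""
    h = 2 * cutoff / n
    total = 0.0
    for i in range(n):
        x = -cutoff + (i + 0.5) * h
        total += x ** k * math.exp(-x * x / 2)
    return h * total

def closed_form(k):
    """Value stated in 263H: 0 for odd k, (2l)!/(2^l l!) * sqrt(2*pi) for k = 2l."""
    if k % 2 == 1:
        return 0.0
    l = k // 2
    return math.factorial(2 * l) / (2 ** l * math.factorial(l)) * math.sqrt(2 * math.pi)

for k in range(7):
    print(k, gaussian_moment(k), closed_form(k))
```

For even $k$ the printed pairs should agree to many decimal places, and for odd $k$ the quadrature value should be zero up to rounding, reflecting the symmetry argument used in the proof.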
proof (a) To see that all the integrals are well-defined and finite, observe that $\lim_{x\to\pm\infty} x^k e^{-x^2/4} = 0$, so that $M_k = \sup_{x\in\mathbb{R}} |x^k e^{-x^2/4}|$ is finite, and
$$\int_{-\infty}^{\infty} |x^k e^{-x^2/2}|\,dx \le M_k \int_{-\infty}^{\infty} e^{-x^2/4}\,dx < \infty.$$

(b) If $k$ is odd, then substituting $y = -x$ we get
$$\int_{-\infty}^{\infty} x^k e^{-x^2/2}\,dx = -\int_{-\infty}^{\infty} y^k e^{-y^2/2}\,dy,$$
so that both integrals must be zero.

(c) For even $k$, proceed by induction. Set $I_l = \int_{-\infty}^{\infty} x^{2l} e^{-x^2/2}\,dx$. $I_0 = \sqrt{2\pi} = \frac{0!}{2^0\,0!}\sqrt{2\pi}$ by 263G. For the inductive step to $l+1 \ge 1$, integrate by parts to see that
$$\int_{-a}^{a} x^{2l+1}\cdot x e^{-x^2/2}\,dx = -a^{2l+1}e^{-a^2/2} + (-a)^{2l+1}e^{-a^2/2} + \int_{-a}^{a} (2l+1)x^{2l}e^{-x^2/2}\,dx$$
for every $a \ge 0$. Letting $a \to \infty$, $I_{l+1} = (2l+1)I_l$. Because
$$\frac{(2(l+1))!}{2^{l+1}(l+1)!}\sqrt{2\pi} = (2l+1)\frac{(2l)!}{2^l\, l!}\sqrt{2\pi},$$
the induction proceeds.

263I The one-dimensional case The restriction to injective functions $\phi$ in 263D(v) is unavoidable in the context of the result there. But in the substitutions of elementary calculus it is not always essential. In the hope of clarifying the position I give a result here which covers many of the standard tricks.

Theorem Let $I \subseteq \mathbb{R}$ be an interval with more than one point, and $\phi : I \to \mathbb{R}$ a function which is absolutely continuous on any closed bounded subinterval of $I$. Write $u = \inf I$, $u' = \sup I$ in $[-\infty,\infty]$, and suppose that $v = \lim_{x\downarrow u}\phi(x)$ and $v' = \lim_{x\uparrow u'}\phi(x)$ are defined in $[-\infty,\infty]$. Let $g$ be a Lebesgue measurable real-valued function defined almost everywhere on $\phi[I]$. Then
$$\int_v^{v'} g = \int_I g(\phi(x))\phi'(x)\,dx$$
whenever the right-hand side is defined in $\mathbb{R}$, on the understanding that we interpret $\int_v^{v'} g$ as $-\int_{v'}^{v} g$ when $v' < v$, and $g(\phi(x))\phi'(x)$ as $0$ when $\phi'(x) = 0$ and $g(\phi(x))$ is undefined.

proof (a) Recall that $\phi$ is differentiable almost everywhere on $I$ (225Cb) and that $\phi[A]$ is negligible for every negligible $A \subseteq I$ (225G). (These results are stated for closed bounded intervals; but since any interval is expressible as the union of a sequence of closed bounded intervals, they remain valid in the present context.) Set $D = \operatorname{dom}\phi'$, so that $I \setminus D$ and $\phi[I \setminus D]$ are negligible. Next, setting $D_0 = \{x : x \in D,\ \phi'(x) = 0\}$, $D$ and $D_0$ are Borel sets (225J) and $\phi[D_0]$ is negligible, by 263D(ii), while
$$\int_I g(\phi(x))\phi'(x)\,dx = \int_{D\setminus D_0} g(\phi(x))\phi'(x)\,dx.$$
Applying 262M with $A = \mathbb{R} \setminus \{0\}$ and $\zeta(\alpha) = \frac12|\alpha|$ for $\alpha \in A$, we have sequences $\langle E_n\rangle_{n\in\mathbb{N}}$, $\langle\alpha_n\rangle_{n\in\mathbb{N}}$ such that $\langle E_n\rangle_{n\in\mathbb{N}}$ is a disjoint cover of $D \setminus D_0$ by measurable sets, every $\alpha_n$ is non-zero, and $|\phi(x) - \phi(y) - \alpha_n(x-y)| \le \frac12|\alpha_n||x-y|$ for all $x, y \in E_n$; so that, in particular, $\phi{\restriction}E_n$ is injective, while $\operatorname{sgn}\phi'(x) = \operatorname{sgn}\alpha_n$ for every $x \in E_n$, writing $\operatorname{sgn}\alpha = \alpha/|\alpha|$ as usual. Set $\epsilon_n = \operatorname{sgn}\alpha_n$ for each $n$. Now 263D(v) tells us that
$$\sum_{n=0}^\infty \int |g| \times \chi(\phi[E_n]) = \sum_{n=0}^\infty \int_{E_n} |g(\phi(x))||\phi'(x)|\,dx$$
is finite.
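Note that 263D(v) also shows that if $B \subseteq \mathbb{R}$ is negligible, then $E_n \cap \phi^{-1}[B]$ must be negligible for every $n$, so that $\int_{\phi^{-1}[B]} g(\phi(x))\phi'(x)\,dx = 0$. Consequently, setting
$$C_0 = \{y : y \in (\phi[I] \cap \operatorname{dom} g) \setminus (\{v, v'\} \cup \phi[I \setminus D] \cup \phi[D_0]),\ \sum_{n=0}^\infty |g(y)|\chi(\phi[E_n])(y) < \infty\},$$
$\phi[I] \setminus C_0$ is negligible, and if we set $C = \{y : y \in C_0,\ g(y) \ne 0\}$,
$$\int_J g = \int_{J\cap C} g$$
for every $J \subseteq \phi[I]$.

(b) The point of the argument is the following fact: if $y \in C$ then
$$\sum_{n=0}^\infty \epsilon_n\chi(\phi[E_n])(y) = 1 \text{ if } v < y < v',$$
$$= -1 \text{ if } v' < y < v,$$
$$= 0 \text{ if } y < v \le v' \text{ or } v' \le v < y.$$
P P Because $g(y) \ne 0$ and $\sum_{n=0}^\infty |g(y)|\chi(\phi[E_n])(y)$ is finite, $\{n : y \in \phi[E_n]\}$ is finite; because $y \notin \phi[I \setminus D] \cup \phi[D_0]$, and $\phi{\restriction}E_n$ is injective for every $n$, and $\bigcup_{n\in\mathbb{N}} E_n = D \setminus D_0$, $K = \phi^{-1}[\{y\}]$ is finite. For each $x \in K$, let $n_x$ be such that $x \in E_{n_x}$; then $\epsilon_{n_x} = \operatorname{sgn}\phi'(x)$. So $\sum_{n=0}^\infty \epsilon_n\chi(\phi[E_n])(y) = \sum_{x\in K}\operatorname{sgn}\phi'(x)$. If $J \subseteq \mathbb{R} \setminus K$ is an interval, $\phi(z) \ne y$ for $z \in J$; since $\phi$ is continuous, the Intermediate Value Theorem tells us that $\operatorname{sgn}(\phi(z) - y)$ is constant on $J$. A simple induction on $\#(K \cap\, ]{-\infty}, z[)$ shows that $\operatorname{sgn}(\phi(z) - y) = \operatorname{sgn}(v - y) + 2\sum_{x\in K,\,x<\cdots}\ \cdots$

$\cdots > \alpha\}\,d\alpha$ is finite, and in this case $\|f\|_p = \gamma^{1/p}$. (Hint: $\int |f|^p = \int_0^\infty \mu^*\{x : |f(x)|^p > \beta\}\,d\beta$, by 252O; now substitute $\beta = \alpha^p$.)

(b) Let $f$ be an integrable function defined almost everywhere on $\mathbb{R}^r$. Show that if $\alpha < r - 1$ then $\sum_{n=1}^\infty n^\alpha |f(nx)|$ is finite for almost every $x \in \mathbb{R}^r$. (Hint: estimate $\sum_{n=0}^\infty n^\alpha \int_B |f(nx)|\,dx$ for any ball $B$ centered at the origin.)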
(c) Let $A \subseteq\, ]0,1[$ be a set such that $\mu^*A = \mu^*([0,1] \setminus A) = 1$, where $\mu$ is Lebesgue measure on $\mathbb{R}$. Set $D = A \cup \{-x : x \in\, ]0,1[\, \setminus A\} \subseteq [-1,1]$, and set $\phi(x) = |x|$ for $x \in D$. Show that $\phi$ is injective, that $\phi$ is differentiable relative to its domain everywhere in $D$, and that $\mu^*\phi[D] < \int_D |\phi'(x)|\,dx$.

(d) Let $\phi : D \to \mathbb{R}^r$ be a function differentiable relative to $D$ at each point of $D \subseteq \mathbb{R}^r$, and suppose that for each $x \in D$ there is a non-singular derivative $T(x)$ of $\phi$ at $x$; set $J(x) = |\det T(x)|$. Show that $D$ is expressible as $\bigcup_{k\in\mathbb{N}} D_k$ where $D_k = D \cap \overline{D_k}$ and $\phi{\restriction}D_k$ is injective for each $k$.

> (e) (i) Show that for any Lebesgue measurable $E \subseteq \mathbb{R}$, $t \in \mathbb{R} \setminus \{0\}$, $\int_{tE} \frac{1}{|u|}\,du = \int_E \frac{1}{|u|}\,du$. (ii) For $t \in \mathbb{R}$, $u \in \mathbb{R} \setminus \{0\}$ set $\phi(t,u) = (\frac{t}{u}, u)$. Show that $\int_{\phi[E]} \frac{1}{|tu|}\,d(t,u) = \int_E \frac{1}{|tu|}\,d(t,u)$ for any Lebesgue measurable $E \subseteq \mathbb{R}^2$.

(f) Define $\phi : \mathbb{R}^3 \to \mathbb{R}^3$ by setting $\phi(\rho,\theta,\alpha) = (\rho\sin\theta\cos\alpha, \rho\sin\theta\sin\alpha, \rho\cos\theta)$. Show that $\det\phi'(\rho,\theta,\alpha) = \rho^2\sin\theta$.

(g) Show that if $k = 2l+1$ is odd, then $\int_0^\infty x^k e^{-x^2/2}\,dx = 2^l\, l!$. (Compare 252Xh.)
263Y Further exercises (a) Define a measure $\nu$ on $\mathbb{R}$ by setting $\nu E = \int_E \frac{1}{|x|}\,dx$ for Lebesgue measurable sets $E \subseteq \mathbb{R}$. For $f, g \in L^1(\nu)$ set $(f * g)(x) = \int f(\frac{x}{t})g(t)\,\nu(dt)$ whenever this is defined in $\mathbb{R}$. (i) Show that $f * g = g * f \in L^1(\nu)$. (ii) Show that $\int h(x)(f * g)(x)\,\nu(dx) = \int\int h(xy)f(x)g(y)\,\nu(dx)\nu(dy)$ for every $h \in L^\infty(\nu)$. (iii) Show that $f * (g * h) = (f * g) * h$ for every $h \in L^1(\nu)$. (Hint: 263Xe.)

(b) Let $E \subseteq \mathbb{R}^2$ be a measurable set such that $\limsup_{\alpha\to\infty} \frac{1}{\alpha^2}\mu_2(E \cap B(0,\alpha)) > 0$, writing $\mu_2$ for Lebesgue measure on $\mathbb{R}^2$. Show that there is some $\theta \in\, ]{-\pi},\pi]$ such that $\mu_1 E_\theta = \infty$, where $E_\theta = \{\rho : \rho \ge 0,\ (\rho\cos\theta, \rho\sin\theta) \in E\}$. (Hint: show that $\frac{1}{\alpha^2}\mu_2(E \cap B(0,\alpha)) \le \int_{-\pi}^{\pi} \min(\frac12, \frac{1}{\alpha}\mu_1 E_\theta)\,d\theta$.) Generalize to higher dimensions and to functions other than $\chi E$.

(c) Let $E \subseteq \mathbb{R}^r$ be a measurable set, and $\phi : E \to \mathbb{R}^r$ a function differentiable relative to its domain, with a derivative $T(x)$, at each point $x$ of $E$; set $J(x) = |\det T(x)|$. Show that for any integrable function $g$ defined on $\phi[E]$,
$$\int g(y)\,\#(\phi^{-1}[\{y\}])\,dy = \int_E J(x)g(\phi(x))\,dx.$$
(Hint: 263I.)

(d) Find a proof of 263I based on the ideas of §225. (Hint: 225Xg.)

263 Notes and comments Yet again, approaching 263D, I find myself having to choose between giving an accessible, relatively weak result and making the extra effort to set out a theorem which is somewhere near the natural boundary of what is achievable within the concepts being developed in this volume; and, as usual, I go for the more powerful form. There are three basic sources of difficulty: (i) the fact that we are dealing with more than one dimension; (ii) the fact that we are dealing with irregular domains; (iii) the fact that we are dealing with arbitrary integrable functions. I do not think I need to apologise for (iii) in a book on measure theory.
Concerning (ii), it is quite true that the principal applications of these results are to cases in which the transformation $\phi$ is differentiable everywhere, with continuous derivative, and the set $D$ has negligible boundary; and in these cases there are substantial simplifications available – mostly because the sets $D_m$ of the proof of 263D can be taken to be cubes. Nevertheless, I think any form of the result which makes such assumptions is deeply unsatisfactory at this level, being an awkward compromise between ideas natural to the Riemann integral and those natural to the Lebesgue integral.

Concerning (i), it might even have been right to lay out the whole argument for the case $r = 1$ before proceeding to the general case, as I did in §§114-115, because the one-dimensional case is already important and interesting; and if you find the work above difficult – which it is – and your immediate interests are in one-dimensional integration by substitution, then I think you might find it worth your time to reproduce the $r = 1$ argument yourself, up to a proof of 263I. In fact the biggest difference is in 263A, which becomes nearly trivial; the work of 262M and 263C becomes more readable, because all the matrices turn into scalars and we can drop the word ‘determinant’, but I do not think we can dispense with any of the ideas, at least if we wish to obtain 263D as stated. (But see 263Yd.)

I found myself insisting, in the last paragraph, that a distinction can be made between ‘ideas natural to the Riemann integral and those natural to the Lebesgue integral’. We are approaching deep questions here, like ‘what are books on measure theory for?’, which I do not think can be answered without some – possibly unconscious – reference to the question ‘what is mathematics for?’. I do of course want to present here some of the wonderful general theorems which arise in the Lebesgue theory. But more important than any specific theorem is a general idea of what can be proved by these methods. It is the essence of modern measure theory that continuity does not matter, or, if you prefer, that measurable functions are in some sense so nearly continuous that we do not have to add hypotheses of continuity in our theorems. Now this is in a sense a great liberation, and the Lebesgue integral is now the standard one. But you must not regard the Riemann integral as outdated. The intuitions on which it is founded – for instance, that the surface of a solid body has zero volume – remain of great value in their proper context, which certainly includes the study of differentiable functions with continuous derivatives. What I am saying here is that I believe we can use these intuitions best if we maintain a division, a flexible and permeable one, of course, between the ideas of the two theories; and that when transferring a theorem from one side of the boundary to the other we should do so whole-heartedly, seeking to express the full power of the methods we are using.
I have already said that the essential difference between the one-dimensional and multidimensional cases lies in 263A, where the Jacobian $J = |\det T|$ enters the argument. Shorn of the technical devices necessary to deal with arbitrary Lebesgue measurable sets, this amounts to a calculation of the volume of the parallelepiped $T[I]$ where $I$ is the interval $[\mathbf{0},\mathbf{1}[$. I have dealt with this by a little bit of algebra, saying that the result is essentially obvious if $T$ is diagonal, whereas if $T$ is an isometry it follows from the fact that the unit ball is left invariant; and the algebra comes in to express an arbitrary matrix as a product of diagonal and orthogonal matrices (2A6C). It is also plain from 261F that Lebesgue measure must be rotation-invariant as well as translation-invariant; that is to say, it is invariant under all isometries. Another way of looking at this will appear in the next section.

I feel myself that the centre of the argument for 263D is in the lemma 263C. This is where we turn the exact result for linear operators into an approximate result for almost-linear functions; and the whole point of differentiability is that a differentiable function is well approximated, in a neighbourhood of any point of its domain, by a linear operator. The lemma involves two rather different ideas. To show that $\mu^*\phi[D] \le (J + \epsilon)\mu^*D$, we look first at balls and then use Vitali’s theorem to see that $D$ is economically covered by balls, so that an upper bound for $\mu^*\phi[D]$ in terms of a sum $\sum_{B\in\mathcal{I}_0} \mu^*\phi[D \cap B]$ is adequate. To obtain a lower bound, we need to reverse the argument by looking at $\psi = \phi^{-1}$, which involves checking first that $\phi$ is invertible, and then that $\psi$ is appropriately linked to $T^{-1}$.
I have written out exact formulae for $\epsilon'$, $\zeta_2'$ and so on, but this is only in case you do not trust your intuition; the fact that $\|\phi^{-1}(u) - \phi^{-1}(v) - T^{-1}(u-v)\|$ is small compared with $\|u-v\|$ is pretty clearly a consequence of the hypothesis that $\|\phi(x) - \phi(y) - T(x-y)\|$ is small compared with $\|x-y\|$.

The argument of 263D itself is now a matter of breaking the set $D$ up into appropriate pieces on each of which $\phi$ is sufficiently nearly linear for 263C to apply, so that
$$\mu^*\phi[D] \le \sum_{m=0}^\infty \mu^*\phi[D_m] \le \sum_{m=0}^\infty (J_m + \epsilon)\mu^*D_m.$$
With a little care (taken in 263C, with its condition (i)), we can also ensure that the Jacobian $J$ is well approximated by $J_m$ almost everywhere on $D_m$, so that $\sum_{m=0}^\infty J_m\mu^*D_m \approx \int_D J(x)\,dx$. These ideas, joined with the results of §262, bring us to the point
$$\int_E J\,d\mu = \mu\phi[E]$$
when $\phi$ is injective and $E \subseteq D$ is measurable. We need a final trick, involving Borel sets, to translate this into
$$\int_{\phi^{-1}[F]} J\,d\mu = \mu F$$
whenever $F \subseteq \phi[D]$ is measurable, which is what is needed for the application of 235L.

I hope that you long ago saw, and were delighted by, the device in 263G. Once again, this is not really Lebesgue integration; but I include it just to show that the machinery of this chapter can be turned to deal with the classical results, and that indeed we have a tiny profit from our labour, in that no apology need be made for the boundary of the set $D$ into which the polar coordinate system maps the plane. I have already given the actual result as an exercise in 252Xh. That involved (if you chase through the references) a one-dimensional substitution (performed in 225Xj), Fubini’s theorem and an application of the formulae of §235; that is to say, very much the same elements as those used above, though in a different order. I could present this with no mention of differentiation in higher dimensions because the first change of variable was in one dimension, and the second (involving the function $x \mapsto \|x\|$, in 252Xh(i)) was of a particularly simple type, so that a different method could be used to find the function $J$.

The abstract ideas to which this treatise is devoted do not, indeed, lead us to many particular examples on which to practise the ideas of this section. The ones which do arise tend to be very straightforward, as in 263G, 263Xa-263Xb and 263Xe. I mention the last because it provides a formula needed to discuss a new type of convolution (263Ya). In effect, this depends on the multiplicative group $\mathbb{R} \setminus \{0\}$ in place of the additive group $\mathbb{R}$ treated in §255. The formula $\frac{1}{x}$ in the definition of $\nu$ is of course the derivative of $\ln x$, and $\ln$ is an isomorphism between $(]0,\infty[, \cdot, \nu)$ and $(\mathbb{R}, +, \text{Lebesgue measure})$.
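The closing remark about $\nu$ and $\ln$ can be illustrated numerically on an interval of positive reals. The sketch below is an assumption-laden illustration, not a proof: the helper nu_interval, the endpoints $a$, $b$ and the scaling factor $t$ are hypothetical choices. It compares $\nu[a,b]$, $\nu[ta,tb]$ and the Lebesgue measure of $\ln[a,b]$.

```python
import math

def nu_interval(a, b, n=20_000):
    """Midpoint-rule approximation of nu([a, b]) = integral of 1/x over [a, b],
    assuming 0 < a < b."""
    h = (b - a) / n
    return h * sum(1.0 / (a + (i + 0.5) * h) for i in range(n))

a, b, t = 0.5, 3.0, 7.0           # illustrative interval and scaling factor

nu_E = nu_interval(a, b)          # nu(E) for E = [a, b]
nu_tE = nu_interval(t * a, t * b) # nu(tE): multiplication by t should not change it
lebesgue_of_log_image = math.log(b) - math.log(a)  # length of ln[E] = [ln a, ln b]

print(nu_E, nu_tE, lebesgue_of_log_image)
```

All three printed numbers should agree up to quadrature error, which is the interval case of 263Xe(i) together with the statement that $\ln$ transports $\nu$ on $]0,\infty[$ to Lebesgue measure on $\mathbb{R}$.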
264 Hausdorff measures

The next topic I wish to approach is the question of ‘surface measure’; a useful example to bear in mind throughout this section and the next is the notion of area for regions on the sphere, but any other smoothly curved two-dimensional surface in three-dimensional space will serve equally well. It is I think more than plausible that our intuitive concepts of ‘area’ for such surfaces should correspond to appropriate measures. But the formalisation of this intuition is non-trivial, especially if we seek the generality that simple geometric ideas lead us to; I mean, not contenting ourselves with arguments that depend on the special nature of the sphere, for instance, to describe spherical surface area. I divide the problem into two parts. In this section I will describe a construction which enables us to define the $r$-dimensional measure of an $r$-dimensional surface – among other things – in $s$-dimensional space. In the next section I will set out the basic theorems making it possible to calculate these measures effectively in the leading cases.

264A Definitions Let $s \ge 1$ be an integer, and $r > 0$. (I am primarily concerned with integral $r$, but will not insist on this until it becomes necessary, since there are some very interesting ideas which involve non-integral ‘dimension’ $r$.) For any $A \subseteq \mathbb{R}^s$, $\delta > 0$ set
$$\theta_{r\delta}A = \inf\Big\{\sum_{n=0}^\infty (\operatorname{diam} A_n)^r : \langle A_n\rangle_{n\in\mathbb{N}} \text{ is a sequence of subsets of } \mathbb{R}^s \text{ covering } A,\ \operatorname{diam} A_n \le \delta \text{ for every } n \in \mathbb{N}\Big\}.$$
It is convenient in this context to say that $\operatorname{diam}\emptyset = 0$. Now set $\theta_r A = \sup_{\delta>0}\theta_{r\delta}A$; $\theta_r$ is $r$-dimensional Hausdorff outer measure on $\mathbb{R}^s$.

264B Of course we must immediately check the following:

Lemma $\theta_r$, as defined in 264A, is always an outer measure.

proof You should be used to these arguments by now, but there is an extra step in this one, so I spell out the details.

(a) Interpreting the diameter of the empty set as $0$, we have $\theta_{r\delta}\emptyset = 0$ for every $\delta > 0$, so $\theta_r\emptyset = 0$.

(b) If $A \subseteq B \subseteq \mathbb{R}^s$, then every sequence covering $B$ also covers $A$, so $\theta_{r\delta}A \le \theta_{r\delta}B$ for every $\delta$ and $\theta_r A \le \theta_r B$.
(c) Let $\langle A_n\rangle_{n\in\mathbb{N}}$ be a sequence of subsets of $\mathbb{R}^s$ with union $A$, and take any $a < \theta_r A$. Then there is a $\delta > 0$ such that $a \le \theta_{r\delta}A$. Now $\theta_{r\delta}A \le \sum_{n=0}^\infty \theta_{r\delta}(A_n)$. P P Let $\epsilon > 0$, and for each $n \in \mathbb{N}$ choose a sequence $\langle A_{nm}\rangle_{m\in\mathbb{N}}$ of sets, covering $A_n$, with $\operatorname{diam} A_{nm} \le \delta$ for every $m$ and $\sum_{m=0}^\infty (\operatorname{diam} A_{nm})^r \le \theta_{r\delta}A_n + 2^{-n}\epsilon$. Then $\langle A_{nm}\rangle_{m,n\in\mathbb{N}}$ is a cover of $A$ by countably many sets of diameter at most $\delta$, so
$$\theta_{r\delta}A \le \sum_{n=0}^\infty\sum_{m=0}^\infty (\operatorname{diam} A_{nm})^r \le \sum_{n=0}^\infty (\theta_{r\delta}A_n + 2^{-n}\epsilon) = 2\epsilon + \sum_{n=0}^\infty \theta_{r\delta}A_n.$$
As $\epsilon$ is arbitrary, we have the result. Q Q Accordingly
$$a \le \theta_{r\delta}A \le \sum_{n=0}^\infty \theta_{r\delta}A_n \le \sum_{n=0}^\infty \theta_r A_n.$$
As $a$ is arbitrary,
$$\theta_r A \le \sum_{n=0}^\infty \theta_r A_n;$$
as $\langle A_n\rangle_{n\in\mathbb{N}}$ is arbitrary, $\theta_r$ is an outer measure.

264C Definition If $s \ge 1$ is an integer, and $r > 0$, then Hausdorff $r$-dimensional measure on $\mathbb{R}^s$ is the measure $\mu_{Hr}$ on $\mathbb{R}^s$ defined by Carathéodory’s method from the outer measure $\theta_r$ of 264A-264B.

264D Remarks (a) It is important to note that the sets used in the definition of the $\theta_{r\delta}$ need not be balls; even in $\mathbb{R}^2$ not every set $A$ can be covered by a ball of the same diameter as $A$.

(b) In the definitions above I require $r > 0$. It is sometimes appropriate to take $\mu_{H0}$ to be counting measure. This is nearly the result of applying the formulae above with $r = 0$, but there can be difficulties if we interpret them over-literally.

(c) All Hausdorff measures must be complete, because they are defined by Carathéodory’s method (212A). For $r > 0$, they are all atomless (264Yg). In terms of the other criteria of §211, however, they are very ill-behaved; for instance, if $r$, $s$ are integers and $1 \le r < s$, then $\mu_{Hr}$ on $\mathbb{R}^s$ is not semi-finite. (I will give a proof of this in