NORTH-HOLLAND
MATHEMATICS STUDIES
29
Probabilities and Potential
CLAUDE DELLACHERIE Institut de Mathematique Universite Louis-Pasteur, Strasbourg
PAUL-ANDRE MEYER Directeur de recherches Centre National de la Recherche Scientifique
I
HERMANN. PUBLISHERS IN ART AND SCIENCE
1978
NORTH-HOLLAND PUBLISHING COMPANY - AMSTERDAM - NEW YORK - OXFORD
© Hermann, Paris 1978
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.
Hermann ISBN: 2 7056 5857 2 North-Holland ISBN: 0 72040701 x
Translation of: Probabilités et potentiel © 1975, Hermann, 293 rue Lecourbe, 75015 Paris, France
PUBLISHERS
HERMANN, PARIS NORTH-HOLLAND PUBLISHING COMPANY AMSTERDAM • NEW YORK • OXFORD SOLE DISTRIBUTORS FOR THE U.S.A. AND CANADA:
ELSEVIER NORTH-HOLLAND, INC. 52 VANDERBILT AVENUE, NEW YORK, N.Y. 10017 Library of Congress Cataloging in Publication Data :
Dellacherie, Claude. Probabilities and potential. (North-Holland mathematics studies; 29) Translation of Probabilites et potentiel. Ed. of 1966 by P.-A. Meyer Bibliography: p Includes indexes. 1. Probabilities. 2. Measure theory. 3. Potential, Theory of. I. Meyer, Paul Andre, joint author. II. Meyer, Paul Andre. Probabilites et potentiel. III. Title. QA273.D3713 519.2 77-26865 ISBN 0-7204-0701-X
PRINTED IN FRANCE
Contents
CHAPTER 0. NOTATION

CHAPTER I. MEASURABLE SPACES
$\sigma$-fields and random variables
Definition of $\sigma$-fields (no. 1). Random variables (nos. 2 to 4). $\sigma$-fields generated by subsets, etc. (nos. 5 to 7). Product $\sigma$-fields (no. 8). Atoms, separable $\sigma$-fields (nos. 9 to 12).
Real-valued random variables
First properties (nos. 13 to 18). The monotone class theorem (nos. 19 to 24).

CHAPTER II. PROBABILITY LAWS AND MATHEMATICAL EXPECTATIONS
Summary of integration theory
Probability laws (nos. 1 to 4). Expectations, Lebesgue's theorem, etc. (nos. 5 to 9). Convergence of random variables (no. 10). Image laws (nos. 11 to 12). Integration of laws, Fubini's Theorem (nos. 13 to 16).
Supplement on integration
Uniform integrability (nos. 17 to 22). Vitali-Hahn-Saks Theorem (no. 23). Weak compactness, Dunford-Pettis Compactness Theorem (nos. 24 to 26). Rapid filters (nos. 27 to 29).
Completion, independence, conditioning
Internally negligible sets (nos. 30 to 32). Independence (nos. 33 to 35). Conditional expectations (nos. 36 to 40). List of their properties (nos. 41 to 42). Conditional independence (nos. 43 to 45).

CHAPTER III. COMPLEMENTS TO MEASURE THEORY
Analytic sets
Pavings (no. 1). Compact pavings (nos. 2 to 6). Analytic sets and closure properties (nos. 7 to 13). The separation theorem (no. 14). Souslin measurable spaces, etc. (nos. 15 to 17). Direct images (nos. 18 to 20). The Souslin-Lusin theorem (nos. 21 to 23). Blackwell spaces (nos. 24 to 26).
Capacities
Choquet capacities (no. 27). Choquet's theorem (nos. 28 to 29). Construction of capacities (nos. 30 to 32). Applications: measurability of analytic sets (no. 33), Carathéodory's theorem (no. 34) and Daniell's theorem (no. 35); measures on compact spaces (no. 36) and Lusin spaces (nos. 37 to 38). Left-continuous (nos. 39 to 40) and right-continuous (nos. 41 to 42) capacities. Another proof of the separation theorem (no. 43). Measurability of debuts and cross section theorem (nos. 44 to 45).
Bounded Radon measures
Radon measures (nos. 46 to 47). Filtering families of l.s.c. or u.s.c. functions (nos. 48 to 50). Inverse limits: Kolmogorov's theorem (nos. 51 to 52) and Prokhorov's theorem (no. 53). Strict convergence (nos. 54 to 58). Prokhorov's compactness criterion (no. 59). The space of probability laws (nos. 60 to 62). Lindelöf spaces (nos. 63 to 66). Non-metrizable Souslin and Lusin spaces (nos. 67 to 69). Disintegration of measures (nos. 70 to 74). (Nos. 75 to 86 appear in the appendix, see below).

CHAPTER IV. STOCHASTIC PROCESSES
General properties of processes
Processes (no. 1). Philosophy (nos. 2 to 5). Standard modifications and indistinguishable processes (nos. 6 to 8). Time laws, canonical processes (nos. 9 to 10). Filtrations, adapted processes and philosophy (nos. 11 to 13). Progressivity (nos. 14 to 15).
Regularity of paths
Notation (no. 16). Processes on a countable dense set (nos. 17 to 19). Upcrossings and downcrossings, applications (nos. 20 to 23). Separable processes (nos. 24 to 30). Random closed sets (nos. 31 to 32). Progressivity of certain processes (nos. 33 to 34). Almost-equivalence (no. 35). Essential topology (nos. 36 to 39). Pseudo-paths (nos. 40 to 46).
Optional and predictable times
Definitions concerning filtrations (nos. 47 to 48). Stopping times (no. 49). Debuts (nos. 50 to 51). $\sigma$-fields associated to stopping times (nos. 52 to 54). Properties (nos. 55 to 59). Stochastic intervals (no. 60). Optional and predictable $\sigma$-fields (nos. 61 to 63). Properties (nos. 64 to 68). Predictable times (nos. 69 to 74). Sequences foretelling a predictable time (nos. 75 to 78). Totally inaccessible stopping times; classification (nos. 79 to 81). Quasi-left-continuous filtrations (nos. 82 to 83). Cross section theorems (nos. 84 to 87). Sets with countable sections (no. 88). Additional numbers on sets with countable sections (nos
Examples and supplements
Optional or predictable processes defined by limits (nos. 89 to 93). Canonical spaces (nos. 94 to 96), their predictable and optional $\sigma$-fields (nos. 97 to 98) and Galmarino's test (nos. 99 to 102). Decomposition of a stopping time (no. 103). Filtrations generated by a single stopping time (nos. 104 to 108).

APPENDIX TO CHAPTER III
Souslin schemes (nos. 75 to 77). Representation of Souslin and Lusin spaces as continuous images of Polish spaces (nos. 78 to 80). Isomorphisms between Lusin spaces (no. 80). Cross section theorem (nos. 81 to 82). The second separation theorem (no. 83).

APPENDIX TO CHAPTER IV
Sets with uncountable sections (nos. 109 to 113). Derivatives of random sets (nos. 114 to 116). Sets with countable sections (nos. 117 to 118).

COMMENTS
INDEX OF TERMINOLOGY
INDEX OF NOTATION
BIBLIOGRAPHY
Preface
The titles of most books are meant to provide some information about their contents. So it is only fair to warn the reader that this volume contains little enough probability and no potential theory whatsoever (1). Most of the probability should appear in a subsequent volume (martingale theory), potential theory still later (resolvents and semi-groups). As for the "and" between these two words, it is pushed so far into the future that we scarcely dare think about it. The true contents of this volume are some brief recollections of measure theory and the vocabulary of probability, and two long chapters, the first one on analytic sets and capacities, and the second one on the foundations of stochastic processes.

(1) So we are following in the footsteps of our Master N. Bourbaki, whose Structures fondamentales de l'analyse contain no analysis at all.

Why then this title? In the year 1966, the second author had already published a volume called Probability and Potentials, containing eleven chapters which covered a much wider domain. It only lacked the "and", that is, the connection between potential theory and the theory of Markov processes. This was meant for a second volume, whose partial outline appeared in 1967 as a set of lecture notes on Markov processes. And now, instead of completing this second part to crown the edifice, we return to the very foundations of it. This may look absurd, but there are several reasons for it.

First of all, the need for a reference book on Markov processes and potential theory which was felt in those times was relieved by the publication, in 1968, of the treatise of Blumenthal and Getoor. Next, our whole theory has been since 1966 in a process of very rapid evolution. To take a few examples, in probability theory the first edition of this book contained the definition of a well-measurable process, but that of a predictable process, which is now so basic for stochastic integration, was only implicit there. On the potential side, a modest notion called "pseudo-reduite" (p. 247 of the French edition) was introduced with the somewhat disparaging comment "we aren't sure that the following theorem can be of any use". From the work of Mokobodzki on resolvents, we have since learnt that (pseudo)reduites are the key to the deeper results of potential theory. Finally, concerning the "and" part, the announcement at the beginning of the section on Ray resolvents was "the following results will not be used in later chapters, and can be omitted", while they would now be considered as fundamental. Examples of this kind could be multiplied.

The conditions of our work have also changed considerably since 1966. At that time, whereas potential theory and the theory of Markov processes were respectable areas of mathematics, people interested in the relation between them would scarcely outnumber half a score in the world. This is no longer the case (maybe some credit for it can be ascribed to the first edition of this book which, for all its imperfections, has contributed to popularizing a number of ideas). There are just two names on the cover, but this shouldn't hide the fact that the new points of view presented here, or to be presented in later volumes, came into being through innumerable exchanges. The reader will gain some idea of it by perusing the volumes of the Strasbourg probability seminar, published every year since 1967 - and this is but the tip of the iceberg.

Thus the rapid evolution of the whole theory has discouraged us from building on the old foundations, and the support of an active mathematical environment has been an incentive to undertake again the full work from the start. Our publisher also has been full of understanding in his acceptance of a publication "by instalments", more informal than usual.

From the history of our theory we have also learnt some lessons, which we have tried to put to use in this new edition.
In particular, we have tried to free ourselves from the attitude of many textbooks, which deal with mathematical truths as with eternal objects offered to our contemplation, from a world of pure Ideas where inflation is something unheard of. Truths are truths, but their value doesn't come from being printed on fine paper. Many immutable truths of 1966 have lost all interest and are now dead, while small remarks of 1966 have grown up and now shed light on large parts of our field. So we have tried to put as much life as we could into the work, making digressions, adding comments and leaving some room for technical tricks and "useless" remarks. We must confess that the material may be considered arid, and that boredom has overcome us at times (at which places, the reader will probably know by his own weariness), but not too often.

We have preserved the organization of the first edition: within each chapter, all statements (whether theorems, definitions or remarks) to which it may be useful to refer are sequentially numbered. So Theorem II.31 (II denotes the chapter) may be followed by remarks II.32 a) and b), and Definition II.33. This is convenient for the reader (so we believe), but the cost to the authors of modifying a chapter that is almost completed becomes enormous. So we beg from our readers some indulgence for irregularities: "bis" and missing numbers, or maybe trivial remarks glorified with a number to prevent a gap in the numbering. Indexes of notation and terminology at the end of the volume are organized according to this system, not to page numbers. The bibliography is classified by alphabetical order of authors, but in the list of each author's publications (numbered [1], [2], ...) the order is purely random, as fits a probability book.

We should have dedicated the book to our wives, for keeping the children quiet while Daddy was working (or pretending to), but we got from Frank Knight the (secret) information that 1976 was the year of Doob's 65th birthday (1). Now Doob's ideas inspired a great deal of the work in our field and in particular pervade the whole of our chapter IV. So it was only justice to write here:

DEDICATED TO J.L. DOOB ON HIS 65TH BIRTHDAY
(1) Our hearty thanks go to Professor T.G. Kurtz, who helped us to prepare the final manuscript of the English edition. His comments, on mathematics and language, led to the elimination of many errors and obscurities.
CHAPTER 0
Notation

Notation from set theory
1 The complement of $A$ is denoted by $\complement A$ or more often $A^c$. The notation $A \setminus B$ means $A \cap B^c$; $A \,\triangle\, B$ is the symmetric difference $(A \setminus B) \cup (B \setminus A)$. The set of all $x \in E$ with some property $P$ is denoted by $\{x \in E : P(x)\}$ or, if there is no ambiguity, $\{x : P(x)\}$ or simply $\{P\}$. The restriction of a function $f$ to a set $A$ is denoted by $f|_A$. Similarly, if $\mathcal{E}$ is a family of subsets, $\mathcal{E}|_A$ is the set of traces on $A$ of elements of $\mathcal{E}$: explicitly $\mathcal{E}|_A = \{B \cap A,\ B \in \mathcal{E}\}$.

Closure of sets of subsets
2 We sometimes use sentences of the following form: the family $\mathcal{E}$ is closed under ( ... ), where the brackets contain set-theoretic operation symbols, sometimes followed by the letters f, c, a, m, which abbreviate respectively: finite, countable, arbitrary, monotone. Two examples will suffice to clarify their meaning: "$\mathcal{E}$ is closed under $(\cup f, \cap a)$" means that finite unions (*) of elements of $\mathcal{E}$ and arbitrary intersections of elements of $\mathcal{E}$ still belong to $\mathcal{E}$; "$\mathcal{E}$ is closed under $(\cup mc, \complement)$" means that monotone countable unions of elements of $\mathcal{E}$ (i.e. unions of increasing sequences in $\mathcal{E}$) still belong to $\mathcal{E}$ and that complements of elements of $\mathcal{E}$ still belong to $\mathcal{E}$. Sets of subsets or functions are generally denoted by capital script letters. The closure of a family of subsets $\mathcal{E}$ under $(\cup c)$ (resp. $(\cap c)$) is denoted by $\mathcal{E}_\sigma$ (resp. $\mathcal{E}_\delta$) - this notation is classical in set theory. We write $(\mathcal{E}_\sigma)_\delta = \mathcal{E}_{\sigma\delta}$.

Lattice notation
3 Let $f$ and $g$ be two real-valued functions. We write $f \vee g$ and $f \wedge g$ for $\sup(f,g)$ and $\inf(f,g)$. The notation $f^+$ and $f^-$ has its classical meaning: $f^+ = f \vee 0$, $f^- = (-f) \vee 0$.
More generally, $\vee$, $\wedge$ denote least upper and greatest lower bounds: for example, the $\sigma$-field generated by the union of a family of $\sigma$-fields $\mathcal{F}_i$ is denoted by $\bigvee_i \mathcal{F}_i$.

(*) Bourbaki includes under finite unions the "empty union" and similarly for intersections. We do not use this convention.
Limits along $\mathbb{R}$ and $\mathbb{N}$
4 The notation $s \uparrow t$ means $s \to t$, $s \le t$; $s \uparrow\uparrow t$ means $s \to t$, $s < t$. The notation $s_n \uparrow\uparrow t$ is used similarly for sequences $(s_n)$, with the additional requirement that the sequence is increasing. Obvious changes are required if $\downarrow$ appears instead of $\uparrow$. [...] notations $\lim$ [...]

[...] $\varepsilon > 0$, there exist two bounded functions $f'$ and $f''$ which are respectively u.s.c. and l.s.c., such that $f' \le f \le f''$ and $\int (f'' - f')\,\mathbb{P} < \varepsilon$.

Proof: For (a), apply 21 with $\mathcal{C}$ the algebra of bounded continuous functions and $\mathcal{H}$ the set of bounded Borel functions $f$ such that $\int f\,\mathbb{P} = \int f\,\mathbb{P}'$. We know that $C_b(E)$ generates $\mathcal{B}(E)$ (15). Here the convenient property certainly is closure under multiplication, as is shown by the special case where $E = \mathbb{R}^n$ and $C_b(E)$ is replaced by the set of bounded infinitely differentiable functions. We leave it to the reader to prove the same result for infinitely differentiable functions with compact support, in the case of two probability laws $\mathbb{P}$ and $\mathbb{P}'$, or more generally of two locally bounded measures $\mu$ and $\mu'$ on $\mathbb{R}^n$.

For (b), we take for $\mathcal{C}$ the set of all bounded continuous functions and for $\mathcal{H}$ the set of all bounded Borel functions possessing the above stated approximation property. We then apply the form (22.3) of the theorem, which avoids uniform convergence. To show that $\mathcal{H}$ is closed under bounded monotone convergence, we consider an increasing sequence $(f_n)$ of elements of $\mathcal{H}$ which are uniformly bounded and the corresponding u.s.c. functions $f'_n$ and l.s.c. functions $f''_n$ such that $f'_n \le f_n \le f''_n$ and $\int (f''_n - f'_n)\,\mathbb{P} < \varepsilon 2^{-n-2}$. We write $f = \lim_n f_n$, $f'' = \sup_n f''_n$, $f'_1 = \sup_n f'_n$ and verify that $f'_1 \le f \le f''$, $\int (f'' - f'_1)\,\mathbb{P} < \varepsilon/2$. The function $f''$ is l.s.c. but the function $f'_1$ is not u.s.c.: it is necessary to take $f'$ to be a function $\sup_{n \le N} f'_n$, where $N$ is chosen sufficiently large so that $\int (f'_1 - f')\,\mathbb{P} < \varepsilon/2$.

Here is another example of the use of Theorem 21, useful in the theory of Markov processes.
24 THEOREM. Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space and $X$ and $Y$ two random variables with values in a separable metric space $E$. To check that $X = Y$ $\mathbb{P}$-a.s., it suffices to check that, for every pair $(f,g)$ of bounded continuous functions on $E$,
(24.1) $\mathbb{E}[f(X)g(Y)] = \mathbb{E}[f(X)g(X)]$.

Proof: Let $\mathcal{H}$ be the set of all bounded Borel functions $h(x,y)$ on $E \times E$ such that $\mathbb{E}[h(X,X)] = \mathbb{E}[h(X,Y)]$: $\mathcal{H}$ is a vector space closed under bounded monotone convergence and uniform convergence. Let $\mathcal{C}$ be the set (closed under multiplication) of all functions of the form $(x,y) \mapsto f(x)g(y)$, where $f$ and $g$ are continuous and bounded on $E$ (1). Formula (24.1) tells that $\mathcal{C} \subset \mathcal{H}$ and we know that $\mathcal{C}$ generates the $\sigma$-field $\mathcal{B}(E) \times \mathcal{B}(E) = \mathcal{B}(E \times E)$. By 21, $\mathcal{H}$ contains all bounded Borel functions. We conclude by taking $h(x,y)$ to be the indicator of the complement of the diagonal.

(1) The function $(x,y) \mapsto f(x)g(y)$ is frequently denoted by $f \otimes g$.
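The last step of the proof of 24 can be spelled out as follows. This is only a sketch: the symbols $\Delta$ (the diagonal of $E \times E$) and $h_0$ (the indicator of its complement) are names introduced here for illustration. Since $E$ is a metric space, $\Delta$ is closed in $E \times E$, so $h_0$ is a bounded Borel function and the conclusion of the proof applies to it.

```latex
% Assumed notation (not in the text): \Delta = \{(x,x) : x \in E\},
% h_0 = indicator of the complement of \Delta.
\[
  h_0(x,y) = I_{\{x \neq y\}}, \qquad
  \mathbb{E}[h_0(X,X)] = 0, \qquad
  \mathbb{E}[h_0(X,Y)] = \mathbb{P}\{X \neq Y\}.
\]
% h_0 is bounded and Borel, so it satisfies the defining identity
% E[h(X,X)] = E[h(X,Y)] of the class considered in the proof:
\[
  \mathbb{P}\{X \neq Y\}
  = \mathbb{E}[h_0(X,Y)]
  = \mathbb{E}[h_0(X,X)]
  = 0 .
\]
```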
CHAPTER II
Probability laws and mathematical expectations
As said in the introduction, we assume that our reader is familiar with the more classical parts of measure theory. The first part of this chapter is therefore simply a summary, intended to present the terminology of probability theory. We resume giving complete proofs in the paragraph devoted to uniform integrability.
1. A SUMMARY OF INTEGRATION THEORY

1 DEFINITION. A probability law on a measurable space $(\Omega,\mathcal{F})$ is a measure $\mathbb{P}$ defined on $\mathcal{F}$, which is positive and has a total mass of 1. The triple $(\Omega,\mathcal{F},\mathbb{P})$ is called a probability space.
In other words, $\mathbb{P}$ is a positive function defined on $\mathcal{F}$ such that $\mathbb{P}(\Omega) = 1$, which satisfies the following property ("countable additivity"): $\mathbb{P}(\bigcup_n A_n) = \sum_n \mathbb{P}(A_n)$ for every sequence $(A_n)_{n \in \mathbb{N}}$ of disjoint events.

2 The number $\mathbb{P}(A)$ is called the probability of the event $A$. An event whose probability is equal to 1 is said to be almost sure. Let $f$ and $g$ be two random variables defined on $(\Omega,\mathcal{F})$ with values in the same measurable space $(E,\mathcal{E})$. If the set $\{\omega : f(\omega) = g(\omega)\}$ is an event (1) of probability 1, we write $f = g$ a.s., where "a.s." is an abbreviation of "almost surely". Similarly, we shall write "$A = B$ a.s." to express that two events $A$ and $B$ differ only by a set of zero probability. More generally, we use the expression "almost surely" in the same way as people use "almost everywhere" in measure theory. In fact probabilists freely use the vocabulary of measure theory alongside their own: this enables them to avoid repetition and makes their books very pleasant to read.

3 A probability space $(\Omega,\mathcal{F},\mathbb{P})$ is called complete if every subset $A$ of $\Omega$ which is contained in a $\mathbb{P}$-negligible set belongs to the $\sigma$-field $\mathcal{F}$ (and then necessarily $\mathbb{P}(A) = 0$). We shall return to this notion in 32 and prove there that any probability space can be completed.

4 EXAMPLES. (a) Let $I$ be the interval $[0,1]$. Let us set, for every $A \in \mathcal{B}(I)$, $\mathbb{P}(A) = \int_A dx$ (Lebesgue measure). Then $\mathbb{P}$ is a probability law on $I$. $\mathbb{P}$ is not complete; it becomes so when extended to the $\sigma$-field of Lebesgue measurable sets.
(b) Let $(\Omega,\mathcal{F})$ be a measurable space and $x$ be a point of $\Omega$. We denote by $\varepsilon_x$ the probability law defined by: $\varepsilon_x(A) = I_A(x)$ $(A \in \mathcal{F})$. This law is also called the degenerate law at $x$ or the unit mass at $x$. More generally a law $\mathbb{P}$ on a measurable space $(\Omega,\mathcal{F})$ is said to be degenerate if $\mathbb{P}(A) = 0$ or 1 for all $A \in \mathcal{F}$. Every real-valued random variable then is a.s. equal to a constant.

(1) This is the case if $(E,\mathcal{E})$ is separable and Hausdorff (I.12).

Mathematical expectations
5 DEFINITION. Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space and $f$ be an integrable random variable. The integral $\int_\Omega f(\omega)\,\mathbb{P}(d\omega)$ is called the mathematical expectation of the random variable $f$ and is denoted by the symbol $\mathbb{E}[f]$.
We shall henceforth omit the adjective "mathematical". We give few details on integration theory proper. We just state the two theorems which are most often used and make a few remarks.
6 LEBESGUE'S THEOREM (the dominated convergence theorem). Let $(f_n)_{n \in \mathbb{N}}$ be a sequence of real-valued random variables which converges almost surely (1), and let $f$ be a random variable a.s. equal to $\lim_n f_n$. If the $f_n$ are bounded in absolute value by some integrable function, $f$ is integrable and $\mathbb{E}[f] = \lim_n \mathbb{E}[f_n]$.
Given a positive random variable $f$, finite or not, which is not integrable, we use the convention $\mathbb{E}[f] = +\infty$. Then the following theorem holds.

7 FATOU'S LEMMA. Let $(f_n)_{n \in \mathbb{N}}$ be a sequence of positive random variables; then we have
$\mathbb{E}[\liminf_n f_n] \le \liminf_n \mathbb{E}[f_n]$.
This inequality can be replaced by equality when the sequence is increasing, whether the integrals are finite or not. This last result is known as Lebesgue's monotone convergence theorem.

8 In conformity with Bourbaki's notation, we denote by $\mathcal{L}^p(\Omega,\mathcal{F},\mathbb{P})$ (or simply $\mathcal{L}^p$) the vector space of real-valued random variables whose $p$-th power is integrable $(1 \le p < \infty)$ and by $L^p$ the quotient space of $\mathcal{L}^p$ by the equivalence relation defined by almost sure equality. For every real-valued measurable function $f$, we set
$\|f\|_p = (\mathbb{E}[|f|^p])^{1/p}$ (possibly $+\infty$).
Similarly, we denote by $\mathcal{L}^\infty(\Omega,\mathcal{F})$ the space (independent of $\mathbb{P}$) of bounded random variables, with the norm of uniform convergence, and by $L^\infty(\Omega,\mathcal{F},\mathbb{P})$ the quotient space of $\mathcal{L}^\infty$ by the same equivalence relation. The norm of an element $f$ of $L^\infty$ (the essential supremum of $|f|$) is denoted by $\|f\|_\infty$.

(1) Or even only in probability (see 10).

We shall use without further reference the following properties of the spaces $L^p$: the fact that $L^p$ is a Banach space (see for example Dunford-Schwartz [1], p. 146); Hölder's inequality (ibid. p. 119); the fact that the dual of $L^1$ is $L^\infty$ (ibid. p. 289). Another necessary result is the Radon-Nikodym theorem (ibid. p. 176), which will also be established in Chapter V as an application of martingale theory. The following two remarks are useful.
9 (a) Let $f$ be an integrable random variable which is measurable with respect to a sub-$\sigma$-field $\mathcal{G}$ of $\mathcal{F}$. Then $f$ is a.s. positive, if and only if
$\int_A f(\omega)\,\mathbb{P}(d\omega) \ge 0$ for all $A \in \mathcal{G}$.
(Take $A$ to be the event $\{f < 0\}$). It follows in particular that two integrable random variables $f$ and $g$ which are both $\mathcal{G}$-measurable and have the same integral on every set of $\mathcal{G}$ are a.s. equal.
(b) Let $f$ and $g$ be two integrable random variables; we say that $f$ and $g$ are orthogonal if the product $f \cdot g$ is integrable and has zero expectation. Let $\mathcal{G}$ denote a sub-$\sigma$-field of $\mathcal{F}$, $U$ be the closed subspace of $L^1$ consisting of all classes of $\mathcal{G}$-measurable random variables, and $V$ be the subspace of $L^\infty$ consisting of all classes of bounded random variables orthogonal to every element of $U$. It follows from the Hahn-Banach theorem that every random variable $f \in \mathcal{L}^1$ orthogonal to every element of $V$ is a.s. equal to a $\mathcal{G}$-measurable function.

Convergence of random variables
10 We now recall, restricting ourselves to the case of sequences, the main types of convergence of real-valued random variables (1). Let $(f_n)$ be a sequence of random variables defined on $(\Omega,\mathcal{F},\mathbb{P})$. We say that the sequence $(f_n)$ converges to a random variable $f$
almost surely if $\mathbb{P}\{\omega : f_n(\omega) \to f(\omega)\} = 1$;
in probability if $\lim_n \mathbb{P}\{\omega : |f_n(\omega) - f(\omega)| > \varepsilon\} = 0$ for all $\varepsilon > 0$;
in the strong sense in $L^p$ if the $f_n$ and $f$ belong to $\mathcal{L}^p$ and $\lim_n \mathbb{E}[|f_n - f|^p] = 0$;
in the weak sense in $L^1$ (or alternatively: in the sense of the topology $\sigma(L^1,L^\infty)$) if the $f_n$ and $f$ belong to $\mathcal{L}^1$ and, for every random variable $g \in \mathcal{L}^\infty$, $\lim_n \mathbb{E}[f_n \cdot g] = \mathbb{E}[f \cdot g]$;
in the weak sense in $L^2$ (or alternatively: in the sense of the topology $\sigma(L^2,L^2)$) if the $f_n$ and $f$ belong to $\mathcal{L}^2$ and, for every random variable $g \in \mathcal{L}^2$, $\lim_n \mathbb{E}[f_n \cdot g] = \mathbb{E}[f \cdot g]$.
We shall return to weak convergence in $L^1$ in the section concerning uniform integrability. We just recall here that almost sure convergence and strong convergence in $L^p$ imply convergence in probability and that every sequence which converges in probability contains a subsequence which converges almost surely. More precisely, let us set for every real-valued random variable $f$
$\pi[f] = \mathbb{E}[|f| \wedge 1]$.
Then the function $(f,g) \mapsto \pi[f-g]$ is a pseudo-metric which defines convergence in probability; if the sequence $(f_n)$ satisfies the property
$\sum_n \pi[f_n - f_{n+1}] < \infty$
it converges in probability and almost surely (see for example Dunford and Schwartz [1], p. 150).

(1) Or a.s. finite extended real-valued. The definitions relating to convergence in probability need slight modification for r.v. which are not a.s. finite.
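The two claims made about the functional $f \mapsto \mathbb{E}[|f| \wedge 1]$ can be checked by elementary inequalities. The following is a sketch, written with the symbol $\pi$ for this functional and with $I_B$ for the indicator of a set $B$; it assumes a.s. finite real-valued random variables and $0 < \varepsilon \le 1$.

```latex
% (i) The functional \pi controls, and is controlled by, the tail
%     probabilities, so \pi[f_n - f] -> 0 iff f_n -> f in probability.
\[
  \varepsilon\, I_{\{|f| > \varepsilon\}} \;\le\; |f| \wedge 1
  \;\le\; \varepsilon + I_{\{|f| > \varepsilon\}}
  \quad\Longrightarrow\quad
  \varepsilon\,\mathbb{P}\{|f| > \varepsilon\} \;\le\; \pi[f]
  \;\le\; \varepsilon + \mathbb{P}\{|f| > \varepsilon\}.
\]
% (ii) Summability of \pi[f_n - f_{n+1}] gives a.s. convergence:
%      by monotone convergence,
\[
  \mathbb{E}\Big[\sum_n \big(|f_n - f_{n+1}| \wedge 1\big)\Big]
  = \sum_n \pi[f_n - f_{n+1}] < \infty ,
\]
% so the series inside the expectation is a.s. finite; its terms are
% then a.s. eventually < 1, hence eventually equal to |f_n - f_{n+1}|,
% and \sum_n |f_n - f_{n+1}| < \infty a.s.: (f_n) is a.s. a Cauchy sequence.
```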
Image laws
11 DEFINITION. Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space, $(E,\mathcal{E})$ be a measurable space and $f$ be a random variable from $\Omega$ to $E$. The image law of $\mathbb{P}$ under $f$, denoted by $f(\mathbb{P})$, is the law $Q$ on $(E,\mathcal{E})$ defined by: $Q(A) = \mathbb{P}(f^{-1}(A))$ $(A \in \mathcal{E})$.
This law is also called the law of $f$ or the distribution of $f$. Let $g$ be a measurable mapping of $(E,\mathcal{E})$ into a measurable space $(G,\mathcal{G})$. We have the obvious equation: $g(f(\mathbb{P})) = (g \circ f)(\mathbb{P})$ ("transitivity of image laws").

12 THEOREM. Let $h$ be a real-valued random variable on $(E,\mathcal{E})$; $h$ is $Q$-integrable if and only if $h \circ f$ is $\mathbb{P}$-integrable and then
$\int_E h(x)\,Q(dx) = \int_\Omega h \circ f(\omega)\,\mathbb{P}(d\omega)$.

Integration of probability laws, Fubini's Theorem
13 DEFINITION. Let $(\Omega,\mathcal{F})$ and $(E,\mathcal{E})$ be two measurable spaces. A family $(\mathbb{P}_x)_{x \in E}$ of probability laws on $(\Omega,\mathcal{F})$ is said to be $\mathcal{E}$-measurable if the function $x \mapsto \mathbb{P}_x(A)$ is $\mathcal{E}$-measurable for all $A \in \mathcal{F}$.
Given such a family $(\mathbb{P}_x)_{x \in E}$, we have the following statement:

14 FUBINI'S THEOREM. Let $Q$ be a probability law on $(E,\mathcal{E})$. Let $(U,\mathcal{U})$ denote the measurable space $(E \times \Omega, \mathcal{E} \times \mathcal{F})$.
(1) Let $f$ be a real-valued random variable defined on $(U,\mathcal{U})$. Each one of the partial mappings $x \mapsto f(x,\omega)$, $\omega \mapsto f(x,\omega)$ is measurable on the corresponding factor space.
(2) There exists one and only one probability law $\mathbb{S}$ on $(U,\mathcal{U})$ such that, for all $A \in \mathcal{E}$ and $B \in \mathcal{F}$,
(14.1) $\mathbb{S}(A \times B) = \int_A \mathbb{P}_x(B)\,Q(dx)$.
(3) Let $f$ be a positive (1) random variable on $(U,\mathcal{U})$. The function $x \mapsto \int_\Omega f(x,\omega)\,\mathbb{P}_x(d\omega)$ is $\mathcal{E}$-measurable and
(14.2) $\int_U f(x,\omega)\,\mathbb{S}(dx,d\omega) = \int_E Q(dx) \int_\Omega f(x,\omega)\,\mathbb{P}_x(d\omega)$.
This relation still holds true if $f$ is $\mathbb{S}$-integrable; but one can then only assert that $\omega \mapsto f(x,\omega)$ is $\mathbb{P}_x$-integrable for $Q$-almost all $x \in E$.

(1) Recall that the integral has been defined for all positive measurable functions (cf. 6).

REMARKS. (a) If $f$ is neither positive nor $\mathbb{S}$-integrable, the right-hand side of (14.2) may be meaningful without the left-hand side being so.
(b) If all the $\mathbb{P}_x$ are equal to the same law $\mathbb{P}$, the law $\mathbb{S}$ is called the product (law) of $Q$ and $\mathbb{P}$ and denoted by $Q \otimes \mathbb{P}$. The probability space $(U,\mathcal{U},Q \otimes \mathbb{P})$ is not complete in general. Fubini's Theorem is often stated for product laws only and in a slightly different form: assume that the factor spaces are complete and that $f$ is measurable on the completed product space; assertion (1) then is no longer true, but still the partial mappings $x \mapsto f(x,\omega)$ (resp. $\omega \mapsto f(x,\omega)$) are $\mathcal{E}$-measurable (resp. $\mathcal{F}$-measurable) for $\mathbb{P}$-almost all $\omega \in \Omega$ (resp. for $Q$-almost all $x \in E$).
(c) The definition of the product of finitely many probability laws is obvious. We do not study here infinite products, which, however, are examples of inverse limits of probability laws, see Chapter III.

15 DEFINITION. In the notation of 14, the integral of the family $\mathbb{P}_x$ with respect to $Q$, denoted by $\int_E \mathbb{P}_x\,Q(dx)$, is the image law of $\mathbb{S}$ under the projection mapping of $E \times \Omega$ onto $\Omega$.
By combining 12 and 14 we get the following theorem.
16 THEOREM. Let $\mathbb{P}$ denote the law $\int_E \mathbb{P}_x\,Q(dx)$ and $f$ be a positive random variable on $(\Omega,\mathcal{F})$. Then the function $x \mapsto \int_\Omega f(\omega)\,\mathbb{P}_x(d\omega)$ is $\mathcal{E}$-measurable and
$\int_\Omega f(\omega)\,\mathbb{P}(d\omega) = \int_E Q(dx) \int_\Omega f(\omega)\,\mathbb{P}_x(d\omega)$.
This relation is also true for every $\mathbb{P}$-integrable random variable $f$; however $f$ is $\mathbb{P}_x$-integrable only for $Q$-almost all $x \in E$, so that $\int_\Omega f(\omega)\,\mathbb{P}_x(d\omega)$ is defined $Q$-a.s., and no longer on the whole of $E$.
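The simplest instance of the integral of a family of laws is a finite mixture. The sketch below assumes, purely for illustration, that $E = \{1,\dots,m\}$ carries the $\sigma$-field of all its subsets and that $Q$ gives mass $q_i$ to the point $i$; the names $m$ and $q_i$ are not in the text. Every family of laws indexed by $E$ is then measurable, and the integrals over $E$ reduce to finite sums.

```latex
% Assumptions: E = \{1,\dots,m\}, Q(\{i\}) = q_i \ge 0, q_1 + \dots + q_m = 1.
% The law on E x \Omega charging A x B with \sum_{i \in A} q_i P_i(B)
% projects onto \Omega as the mixture
\[
  \mathbb{P}(B) \;=\; \int_E \mathbb{P}_x(B)\,Q(dx)
  \;=\; \sum_{i=1}^{m} q_i\,\mathbb{P}_i(B)
  \qquad (B \in \mathcal{F}),
\]
% and, for a positive random variable f on \Omega, the integration
% formula becomes
\[
  \int_\Omega f(\omega)\,\mathbb{P}(d\omega)
  \;=\; \sum_{i=1}^{m} q_i \int_\Omega f(\omega)\,\mathbb{P}_i(d\omega).
\]
```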
2. SUPPLEMENT ON INTEGRATION
Uniformly integrable random variables
All the random variables considered in this section are real-valued and defined on the same probability space $(\Omega,\mathcal{F},\mathbb{P})$ (1).

17 DEFINITION. Let $\mathcal{H}$ be a subset of the space $\mathcal{L}^1(\Omega,\mathcal{F},\mathbb{P})$. $\mathcal{H}$ is called a uniformly integrable set if the integrals
(17.1) $\int_{\{|f| \ge c\}} |f(\omega)|\,\mathbb{P}(d\omega)$ $(f \in \mathcal{H})$
tend uniformly to 0 as the positive number $c$ tends to $+\infty$.

(1) For the case of a non-bounded measure, see Dunford-Schwartz [1].
NOTATION. Let $f$ be a random variable. We denote by $f^c$ the function
$f^c(\omega) = f(\omega)$ for $|f(\omega)| \le c$, $f^c(\omega) = 0$ for $|f(\omega)| > c$.
We write $f_c = f - f^c$. Definition 17 then takes the following form: $\mathcal{H}$ is uniformly integrable if and only if, for every $\varepsilon > 0$, a number $c$ exists so that $\|f_c\|_1 < \varepsilon$ for every $f \in \mathcal{H}$.

18 REMARKS. (a) Every family of random variables dominated in absolute value by a fixed integrable function (in particular, every finite subset of $\mathcal{L}^1$) is uniformly integrable.
(b) Definition 17 is obviously compatible with a.s. equality of random variables (1). It only involves the latter through their absolute values; so we may often restrict ourselves to positive random variables.

19 THEOREM. Let $\mathcal{H}$ be a subset of $\mathcal{L}^1$; for $\mathcal{H}$ to be uniformly integrable, it is necessary and sufficient that the following conditions hold:
(a) the expectations $\mathbb{E}[|f|]$, $f \in \mathcal{H}$, are uniformly bounded (2);
(b) for every $\varepsilon > 0$, there exists a number $\delta > 0$ such that the conditions $A \in \mathcal{F}$, $\mathbb{P}(A) \le \delta$, imply the inequality
(19.1) $\int_A |f(\omega)|\,\mathbb{P}(d\omega) \le \varepsilon$ $(f \in \mathcal{H})$.

Proof: To establish the necessity of conditions (a) and (b), we note that, for every integrable function $f$ and every set $A \in \mathcal{F}$,
(19.2) $\int_A |f(\omega)|\,\mathbb{P}(d\omega) \le c \cdot \mathbb{P}(A) + \mathbb{E}[|f_c|]$.
Suppose that $\mathcal{H}$ is uniformly integrable and choose $c$ so large that $\mathbb{E}[|f_c|] < \varepsilon/2$ $(f \in \mathcal{H})$. We first obtain (a) by taking $A = \Omega$, then (b) choosing $\delta = \varepsilon/2c$.
Conversely, suppose that properties (a) and (b) hold, and let $\varepsilon > 0$ be given. Choose some $\delta > 0$ satisfying (b) and let $c = \sup_{f \in \mathcal{H}} \mathbb{E}[|f|]/\delta$ (finite by virtue of (a)). Apply (19.1), taking for $A$ the set $\{|f| \ge c\}$, whose probability is at most $\delta$ according to the inequality $\mathbb{P}\{|f| \ge c\} \le \frac{1}{c}\mathbb{E}[|f|]$; we get
$\int_{\{|f| \ge c\}} |f(\omega)|\,\mathbb{P}(d\omega) \le \varepsilon$ $(f \in \mathcal{H})$
and $\mathcal{H}$ indeed is uniformly integrable.

(1) We can thus speak of uniformly integrable subsets of $L^1$.
(2) It can be proved that (a) is a consequence of (b) if the law $\mathbb{P}$ is diffuse (i.e. has no atomic part).
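Inequality (19.2) and the way it yields conditions (a) and (b) can be written out in a few lines. This is a sketch in the notation of 17 to 19, where $f^c$ is $f$ truncated to the set $\{|f| \le c\}$ and $f_c = f - f^c$; no symbol outside that notation is used.

```latex
% The two pieces of f live on disjoint sets, and |f^c| <= c everywhere:
\[
  |f| = |f^c| + |f_c|, \qquad |f^c| \le c
  \quad\Longrightarrow\quad
  \int_A |f|\,d\mathbb{P}
  = \int_A |f^c|\,d\mathbb{P} + \int_A |f_c|\,d\mathbb{P}
  \le c\,\mathbb{P}(A) + \mathbb{E}[|f_c|].
\]
% With c chosen so that E[|f_c|] < \varepsilon/2 for every f in the set:
\[
  A = \Omega:\quad \mathbb{E}[|f|] \le c + \tfrac{\varepsilon}{2};
  \qquad
  \mathbb{P}(A) \le \delta = \tfrac{\varepsilon}{2c}:\quad
  \int_A |f|\,d\mathbb{P}
  \le c \cdot \tfrac{\varepsilon}{2c} + \tfrac{\varepsilon}{2}
  = \varepsilon .
\]
```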
20 THEOREM. Let $\mathcal{H}$ be a uniformly integrable set; the closed convex hull of $\mathcal{H}$ in $\mathcal{L}^1$ is also uniformly integrable.

Proof: We begin by noting that the closure of a uniformly integrable set in $\mathcal{L}^1$ is also uniformly integrable: this is an immediate consequence of theorem 19. Hence it suffices to show that the convex hull of $\mathcal{H}$ is uniformly integrable. We check conditions (a) and (b) of 19. The first one is obvious. Let us choose $\delta$ such that (19.1) holds for every $f \in \mathcal{H}$; let $f_1,\dots,f_n$ be elements of $\mathcal{H}$, $t_1,\dots,t_n$ numbers $\ge 0$ such that $t_1 + \dots + t_n = 1$ and $A$ a measurable set such that $\mathbb{P}(A) \le \delta$. Then
$\int_A |t_1 f_1 + \dots + t_n f_n|\,\mathbb{P} \le t_1 \int_A |f_1|\,\mathbb{P} + \dots + t_n \int_A |f_n|\,\mathbb{P} \le \varepsilon$.
Hence condition (b) is satisfied.

REMARK. Let $H$ and $K$ be two uniformly integrable subsets of $L^1$; their union $H \cup K$ obviously is uniformly integrable and so is its convex hull; it then follows from the inclusion
$\frac{1}{2}(H+K) \subset$ convex hull of $H \cup K$
that the sum $H + K$ is uniformly integrable. This result can also be deduced simply from 19.

The following result generalizes the dominated convergence theorem.

21 THEOREM. Let $(f_n)_{n \in \mathbb{N}}$ be a sequence of integrable random variables which converges almost everywhere (1) to a random variable $f$. Then $f$ is integrable and $f_n$ converges to $f$ in the strong sense in $L^1$, if and only if the $f_n$ are uniformly integrable. If the random variables $f_n$ are positive, it is also necessary and sufficient that:
$\lim_n \mathbb{E}[f_n] = \mathbb{E}[f] < \infty$.

Proof: Assume first that the $f_n$ converge to $f$ in $L^1$ (which supposes the integrability of $f$); we show that conditions (a) and (b) of 19 are satisfied. We have for $A \in \mathcal{F}$
(21.1) $\int_A |f_n(\omega)|\,\mathbb{P}(d\omega) \le \int_A |f(\omega)|\,\mathbb{P}(d\omega) + \|f_n - f\|_1$.
Condition (a) follows immediately. We choose an integer $N$ such that $\|f_n - f\|_1 \le \varepsilon/2$ for all $n > N$ and a number $\delta$ such that the inequality $\mathbb{P}(A) \le \delta$ implies $\int_A |g|\,\mathbb{P} \le \varepsilon/2$, when $g$ runs through the finite set $\{f_1, f_2, \dots, f_N, f\}$. The left-hand side of (21.1) is then at most $\varepsilon$ for all $n$ provided $\mathbb{P}(A) \le \delta$, and condition (b) is satisfied.
Conversely, suppose that the functions $f_n$ are uniformly integrable. Then the expectations $\mathbb{E}[|f_n|]$ are uniformly bounded and Fatou's Lemma implies that $\mathbb{E}[|f|] < \infty$. Let us show that $f_n$ converges to $f$ in $L^1$. We have
(21.2) $\mathbb{E}[|f_n - f|] \le \mathbb{E}[|f_n^c - f^c|] + \mathbb{E}[|(f_n)_c|] + \mathbb{E}[|f_c|]$.

(1) Or only in probability.
Let € > 0 be given. Choose c so large that the last two expectations are bounded by 8/3 for all n, and such that p{rfl = c} = 0 (which is possible, since there are only countably many t such that P{lfl = t} > O. Next we can choose n so large that the first expectation is bounded by €/3, according to Lebesgue's Theorem, since the functions I f~ - fCI are uniformly bounded and converge almost everywhere to O. The left-hand side of (21.2) then is at most 8, and convergence in norm is established. It remains to show that the convergence of [[fnJ to [[fJ < implies, when the f n are positive, the convergence of [[I f n - flJ to 0 (and consequently the uniform integrability of the f ). To this end, we write: n f + f n = (f v f ) + (f 1\ f n ). n [[ f 1\ f J tends to [[ fJ by Lebesgue s Theorem. On the other hand, H f + f J tends to n n 2[[fJ by hypothesis. It follows that [[f v fnJ tends to [[fJ. We then deduce from the relation If-f!=fvf -fl\f n n n thatHlf - fn!J tends to O. We give a complete proof of the following theorem (due to la Vallee-Poussin), because it helps to understand the significance of uniform integrability. However, the most useful part of it is the implication (2) ~ (1), which is also the easier to establish. For example, every bounded subset of L2 is uniformly integrable 2 (take G(t) = t ). 00
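The $L^2$ example just mentioned can also be checked by hand. The following is only a sketch: the constant $M$ (an assumed bound on the second moments) and the sequence $f_n$ on $[0,1]$ with Lebesgue measure are illustrative choices, not notation from the text.

```latex
% Assumption: \sup_{f \in \mathcal{H}} \mathbb{E}[f^2] \le M < \infty.
% On the set \{|f| \ge c\} we have |f| \le f^2 / c, hence
\int_{\{|f| \ge c\}} |f| \, d\mathbb{P}
  \;\le\; \frac{1}{c} \int_{\{|f| \ge c\}} f^2 \, d\mathbb{P}
  \;\le\; \frac{M}{c}
  \;\longrightarrow\; 0 \quad (c \to \infty),
% uniformly in f \in \mathcal{H}.

% By contrast, on [0,1] with Lebesgue measure, f_n = n \, 1_{[0,1/n]} satisfies
\mathbb{E}[|f_n|] = 1
\quad\text{but}\quad
\int_{\{|f_n| \ge c\}} |f_n| \, d\mathbb{P} = 1 \quad \text{for all } n \ge c,
% so a bounded subset of L^1 need not be uniformly integrable.
```

The second family converges almost everywhere to 0 while $\mathbb{E}[f_n] = 1$ for all $n$, which is consistent with Theorem 21: without uniform integrability, convergence in $L^1$ fails.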
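22 THEOREM. Let $\mathcal{H}$ be a subset of $L^1$. The following properties are equivalent:
(1) $\mathcal{H}$ is uniformly integrable.
(2) There exists a positive function $G(t)$ defined on $\mathbb{R}_+$ such that $\lim_{t \to +\infty} G(t)/t = +\infty$ and (1)

$$\sup_{f \in \mathcal{H}} \mathbb{E}[G \circ |f|] < +\infty. \tag{22.1}$$

Proof: We first show that (2) implies (1). Let $\varepsilon > 0$ be given and let $a = M/\varepsilon$, where $M$ is the value of the left-hand side of (22.1). We choose $c$ so large that $G(t)/t \ge a$ for all $t \ge c$. Then we have $|f| \le \frac{1}{a}\, G \circ |f|$ on the set $\{|f| \ge c\}$ and consequently

$$\int_{\{|f| \ge c\}} |f| \, \mathbb{P} \le \frac{1}{a} \int_{\{|f| \ge c\}} G \circ |f| \, \mathbb{P} \le \frac{M}{a} = \varepsilon$$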
for every function $f \in \mathcal{H}$. Definition 17 is therefore satisfied.

We now establish the converse by constructing a function $G(t)$ of the form $\int_0^t g(s)\,ds$, where $g$ is an increasing function equal to zero at $t = 0$, which tends to $+\infty$ with $t$ and takes a constant value $g_n$ on each interval $[n, n+1[$ ($n \in \mathbb{N}$). We write, for each function $f \in \mathcal{H}$, $a_n(f) = \mathbb{P}\{|f| > n\}$.

(1) The function $G$ which we construct is also convex.

Since $g_0 = 0$, we have

$$\mathbb{E}[G \circ |f|] \le g_1 \cdot \mathbb{P}\{1 < |f| \le 2\} + (g_1 + g_2) \cdot \mathbb{P}\{2 < |f| \le 3\} + \dots = \sum_{n=1}^{\infty} g_n \, a_n(f).$$

Hence it remains to show that it is possible to choose coefficients $g_n$ which tend to infinity as $n$ increases, such that the sums $\sum_n g_n a_n(f)$ are uniformly bounded. We choose an increasing sequence of integers $c_n$, which tends to infinity, such that

$$\int_{\{|f| \ge c_n\}} |f| \, \mathbb{P} \le 2^{-n} \qquad (f \in \mathcal{H})$$

according to our assumption of uniform integrability. We have:

$$\int_{\{|f| \ge c_n\}} |f| \, \mathbb{P} \ge \sum_{k=c_n}^{\infty} k \, \mathbb{P}\{k < |f| \le k+1\} \ge \sum_{m=c_n}^{\infty} \mathbb{P}\{|f| > m\} = \sum_{m=c_n}^{\infty} a_m(f).$$

It follows that the sum $\sum_n \sum_{m=c_n}^{\infty} a_m(f)$ is uniformly bounded for $f \in \mathcal{H}$; but this sum is of the form $\sum_m g_m \, a_m(f)$, where $g_m$ denotes the number of integers $n$ such that $c_n \le m$. The theorem is established.

Weak topologies

We now give some results on the weak topology $\sigma(L^1, L^\infty)$, closely related in fact to uniform integrability. We make some use of the conditional expectation operators, which will only be defined later (40), but this involves of course no circularity. We first recall a well known theorem:
23 THEOREM (Vitali-Hahn-Saks). Let $(\mu_n)$ be a sequence of bounded measures, not necessarily positive, on a measurable space $(\Omega, \mathcal{F})$ and let $\lambda$ be a bounded positive measure such that the $\mu_n$ are absolutely continuous with respect to $\lambda$. Suppose that for all $A \in \mathcal{F}$ the limit $\mu(A) = \lim_n \mu_n(A)$ exists and is finite. Then
(1) $\mu$ is a bounded measure.
(2) For every $\varepsilon > 0$, there exists $\eta > 0$ such that the inequality $\lambda(A) \le \eta$ implies $\sup_n |\mu_n|(A) \le \varepsilon$.
Further, the masses $\|\mu_n\|$ are uniformly bounded.

Proof: We note first that the existence of $\lambda$ such that the $\mu_n$ are absolutely continuous with respect to $\lambda$ is not a restriction: it suffices to take $\lambda = \sum_n |\mu_n| / 2^n \|\mu_n\|$. Then comparing (2) and 19, we may state (2) in a different way: the densities $\mu_n/\lambda$ are uniformly integrable with respect to $\lambda$.

Let $\Phi$ be the subset of $L^1(\lambda)$ consisting of the equivalence classes of indicators of elements of $\mathcal{F}$ (we shall denote these classes by the elements of $\mathcal{F}$ they represent); $\Phi$ is closed in $L^1$, hence $\Phi$ is a complete metric space. The functions $A \mapsto \mu_n(A)$ are continuous on $\Phi$ and converge pointwise to $A \mapsto \mu(A)$. Let $\alpha > 0$ and let

$$L_j = \{U \in \Phi : \forall m \ge j, \ \forall n \ge j, \ |\mu_n(U) - \mu_m(U)| \le \alpha\}.$$

$L_j$ is a closed subset of $\Phi$ and the union of the $L_j$ is the whole of $\Phi$. By Baire's Theorem, there exists a $j$ such that $L_j$ has an interior point $A$. In other words, there exist an integer $j$ and a number $h > 0$ such that the relations

$$n \ge j, \quad m \ge j, \quad \lambda(B \,\triangle\, A) \le h \quad \text{imply} \quad |\mu_n(B) - \mu_m(B)| \le \alpha.$$

Such a $j$ being chosen, let $\eta \in \,]0, h[$ be such that $\lambda(C) \le \eta$ implies $|\mu_i(C)| \le \alpha$ for $i = 0, 1, \dots, j$ (hence $|\mu_i|(C) \le 2\alpha$ (1)). For $n \ge j$, we write

$$\begin{aligned}
|\mu_n(C)| &\le |\mu_n(A \cup C) - \mu_n(A)| + |\mu_n(A \setminus C) - \mu_n(A)| \\
&\le |\mu_n(A \cup C) - \mu_j(A \cup C)| + |\mu_j(A \cup C) - \mu_j(A)| + |\mu_j(A) - \mu_n(A)| \\
&\quad + |\mu_n(A \setminus C) - \mu_j(A \setminus C)| + |\mu_j(A \setminus C) - \mu_j(A)| + |\mu_j(A) - \mu_n(A)|.
\end{aligned}$$

Thus $\lambda(C) \le \eta \Rightarrow \sup_n |\mu_n|(C) \le 6\alpha$. We deduce the following properties.
(1) Since $\Omega$ decomposes into finitely many sets of measure $\le \eta$ (relative to $\lambda$) and finitely many atoms of measure $> \eta$, the total masses of the $\mu_n$ are bounded. Note that this argument is unnecessary if the $\mu_n$ are positive.
(2) Taking $\alpha = \varepsilon/6$, we get the last sentence of the theorem.
(3) The additive set function $\mu$ is bounded and property (2) implies that $\mu$ is a measure, absolutely continuous with respect to $\lambda$. Let indeed $(E_k)$ be a decreasing sequence of elements of $\mathcal{F}$, whose intersection is empty: then $\lambda(E_k) \to 0$ and hence $\mu(E_k) \to 0$; $\mu$ is therefore countably additive. We know that $\mu$ then is the difference of two positive measures. The theorem is proved.

We now prove a special case, much easier than the general one, of the theorems of Eberlein and Šmulian from the theory of topological linear spaces. As usual we work on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$.

24 THEOREM. Let $K$ be a subset of $L^1$, which is compact under the weak topology $\sigma(L^1, L^\infty)$. If the $\sigma$-field $\mathcal{F}$ is separable, $K$ is metrizable. Even if $\mathcal{F}$ is not separable, every sequence of elements of $K$ contains a convergent subsequence.

Proof: Suppose first that $\mathcal{F}$ is separable. Let $(H_n)$ be a sequence of elements of $\mathcal{F}$ which generates it and let $\mathcal{A}$ be the Boolean algebra generated by the $H_n$; it is easily verified that $\mathcal{A}$ is countable. On the other hand, if $f$ and $g$ are two elements of $L^1$, the relation $\int_A f \, \mathbb{P} = \int_A g \, \mathbb{P}$ for all $A \in \mathcal{A}$ implies $f = g$ a.s. (cf. I.20). We order the elements of $\mathcal{A}$ into a sequence $(A_n)$ and write, for $f, g \in K$,

$$d(f, g) = \sum_n a_n^{-1} \left| \int_{A_n} f \, \mathbb{P} - \int_{A_n} g \, \mathbb{P} \right| \quad \text{where} \quad a_n = 2^n \Big(1 + \sup_{h \in K} \Big| \int_{A_n} h \, \mathbb{P} \Big| \Big).$$

$d$ is a metric on $K$. The associated topology is Hausdorff and coarser than the (compact) topology of $K$, and hence is equal to it.
Let $(f_n)$ be a sequence of elements of $K$ and let $\mathcal{F}_0$ be the $\sigma$-field generated by the $f_n$; $\mathcal{F}_0$ is separable even if $\mathcal{F}$ is not. Let $U$ denote the conditional expectation operator $g \mapsto \mathbb{E}[g \mid \mathcal{F}_0]$, which maps $L^1(\mathcal{F})$ continuously onto $L^1(\mathcal{F}_0)$; $U(K)$ is a metrizable weakly compact subset of $L^1(\mathcal{F}_0)$ and $U(f_n) = f_n$. Hence we can find a subsequence $(f'_n)$ of the sequence $(f_n)$, which converges to $f \in L^1(\mathcal{F}_0)$ for the topology $\sigma(L^1(\mathcal{F}_0), L^\infty(\mathcal{F}_0))$. Consider now $g \in L^\infty(\mathcal{F})$ and let $h$ be $\mathbb{E}[g \mid \mathcal{F}_0] \in L^\infty(\mathcal{F}_0)$. We have

$$\int f'_n \, g \, \mathbb{P} = \int f'_n \, h \, \mathbb{P} \to \int f \, h \, \mathbb{P} = \int f \, g \, \mathbb{P}.$$

Thus $f'_n \to f$ relative to the topology $\sigma(L^1(\mathcal{F}), L^\infty(\mathcal{F}))$ and the theorem is established.

(1) Recall that $|\theta|(A) = \sup_{B \subset A} (|\theta(B)| + |\theta(A \setminus B)|)$ for every measure $\theta$.

REMARK. Theorem 24 extends immediately to all the weak topologies $\sigma(L^p, L^q)$, where $q$ is the conjugate exponent of $p$ and $1 \le p < +\infty$.

The implication (1) $\Rightarrow$ (3) of the next theorem will be a fundamental tool in the following chapters. The other implications will not be used as much, but are still very interesting.
25 THEOREM (Dunford-Pettis compactness criterion) (1). Let $\mathcal{H}$ be a subset of the space $L^1$. The following three properties are equivalent:
(1) $\mathcal{H}$ is uniformly integrable.
(2) $\mathcal{H}$ is relatively compact in $L^1$ with the weak topology $\sigma(L^1, L^\infty)$.
(3) Every sequence of elements of $\mathcal{H}$ contains a subsequence which converges in the sense of the topology $\sigma(L^1, L^\infty)$.

(1) For the case of non-bounded measures, see Dunford-Schwartz [1].

Proof: We show that (1) $\Rightarrow$ (2). Let $\mathcal{U}$ be an ultrafilter on $\mathcal{H}$; for each function $f \in \mathcal{H}$ and every set $E \in \mathcal{F}$, we define

$$I_f(E) = \int_E f(\omega) \, \mathbb{P}(d\omega).$$

By the relation $|I_f(E)| \le \mathbb{E}[|f|]$ and condition (a) of 19, the numbers $I_f(E)$ are uniformly bounded. The limit $I(E) = \lim_{\mathcal{U}} I_f(E)$ therefore exists for all $E \in \mathcal{F}$. Clearly the set function $E \mapsto I(E)$ is additive and bounded. By condition (b) of 19 there exists for all $\varepsilon > 0$ a number $\delta > 0$ such that $\mathbb{P}(E) \le \delta$ implies $|I(E)| \le \varepsilon$; hence $I$ is a measure which is absolutely continuous with respect to $\mathbb{P}$. There then exists, by the Radon-Nikodym Theorem, a function $\varphi \in L^1$ such that for every measurable set $E$

$$I(E) = \int_E \varphi(\omega) \, \mathbb{P}(d\omega).$$

Assertion (2) will be established if we show that $\mathcal{U}$ converges to $\varphi$ in the weak topology. Obviously

$$\lim_{\mathcal{U}} \mathbb{E}[f \cdot g] = \mathbb{E}[\varphi \cdot g]$$

for every function $g \in L^\infty$ which is a finite linear combination of indicators of sets. Since every function $g \in L^\infty$ is a uniform limit of such functions, the conclusion follows by uniform convergence.

The assertion (2) $\Rightarrow$ (3) follows from 24. Finally, (3) $\Rightarrow$ (1): assume indeed that (1) does not hold; then by (19) $\mathcal{H}$ contains a sequence $(f_n)$ such that either $\mathbb{E}[|f_n|] \to +\infty$, or there exist a number $\varepsilon > 0$ and elements $A_n$ of $\mathcal{F}$ such that $\mathbb{P}(A_n) \to 0$ and $\int_{A_n} |f_n| \, \mathbb{P} \ge \varepsilon$. According to 23, (2), this sequence has no weakly convergent subsequence and (3) is false.

The following result illustrates the difference between weak convergence and strong convergence in $L^1$: a sequence $(f_n)$ which converges weakly but not strongly oscillates violently around its weak limit.

26 THEOREM. Let $(f_n)$ be a sequence of integrable functions on $\Omega$, which converges to $f$ in the sense of $\sigma(L^1, L^\infty)$. Let $A \in \mathcal{F}$ be such that $f \le \liminf_n f_n$ a.s. on $A$; then

$$\int_A |f_n - f| \, \mathbb{P} \to 0.$$

Proof: We immediately reduce to the case where $f = 0$ and $A = \Omega$. The functions $f_n$ are uniformly integrable by 25. We choose $\alpha > 0$ such that $\mathbb{P}(U) < \alpha$ implies $\int_U |f_n| \, \mathbb{P} \le \varepsilon$ for all $n$. Then we set, for $N$ an integer,

$$A_N = \{\omega : \inf_{n \ge N} f_n(\omega) \ge -\varepsilon\}$$

and choose $N$ so large that $\mathbb{P}(A_N^c) < \alpha$, according to our hypothesis that $\liminf_n f_n \ge 0$. The sequence $(f_n)$ converges weakly to 0, so we may choose $N' \ge N$ such that $n \ge N'$ implies $\left| \int_{A_N} f_n \, \mathbb{P} \right| \le \varepsilon$. Then we have, if $n \ge N'$,

$$\int |f_n| \, \mathbb{P} \le \int_{A_N} |f_n| \, \mathbb{P} + \int_{A_N^c} |f_n| \, \mathbb{P} \le \int_{A_N} |f_n + \varepsilon| \, \mathbb{P} + \int_{A_N} \varepsilon \, \mathbb{P} + \int_{A_N^c} |f_n| \, \mathbb{P}.$$

The last two integrals are no greater than $\varepsilon$ (from the choices of $N$ and $\alpha$ for the second one). On the other hand, by the definition of $A_N$, the first integral on the right is equal to

$$\int_{A_N} (f_n + \varepsilon) \, \mathbb{P} \le \left| \int_{A_N} f_n \, \mathbb{P} \right| + \varepsilon \le 2\varepsilon$$

since $n \ge N'$. Hence finally $\int |f_n| \, \mathbb{P} \le 4\varepsilon$.
A theorem of Mokobodzki

We have seen earlier (no. 10) that any sequence which converges in probability contains an a.s. convergent subsequence. It sometimes happens (e.g. in the theory of Markov processes) that one is given on some space $(\Omega, \mathcal{F})$ a whole family $(\mathbb{P}_i)$ of probability laws and a sequence $(f_n)$ which converges in probability for each of the $\mathbb{P}_i$. Is it then possible to select a single random variable $f$ such that $f_n \to f$ in probability for each of the $\mathbb{P}_i$? If we knew how to extract from $(f_n)$ a subsequence $(f'_n)$ converging $\mathbb{P}_i$-a.s. for every $i$, the function $f = \liminf_n f'_n$ would be the solution. Unfortunately, the procedure in 10 depends on the law $\mathbb{P}_i$. Mokobodzki has shown that there is a universal extraction procedure (performed by means of a filter, not a subsequence) which yields the existence of $f$. The proof uses the "continuum hypothesis" or continuum axiom. We shall see later another procedure (1) (also using the continuum hypothesis) which leads to analogous results (cf. Meyer [1]).

(1) That of "medial limits", also due to Mokobodzki.

27 LEMMA. There exists a filter $\Gamma$ on $\mathbb{N}$ with the following property: for every strictly increasing sequence $(s_n)$ of positive integers, there exists a strictly increasing sequence $(t_n)$ such that
(1) $s_n \le t_n$ for all sufficiently large $n$,
(2) for all $n$, the set $\{t_n, t_{n+1}, \dots\}$ belongs to $\Gamma$.

Proof: We denote by $I$ the set of all countable ordinals (0.8) and by $\mathcal{S}$ the set of all strictly increasing sequences of positive integers. The continuum axiom affirms the existence of a bijection $i \mapsto s^i$ of $I$ onto $\mathcal{S}$. We shall construct by transfinite induction a mapping $i \mapsto t^i$ of $I$ into $\mathcal{S}$ with the following properties:
(a) $s^i_n \le t^i_n$ for all sufficiently large $n$,
(b) if $i < j$, $t^j$ is a subsequence of $t^i$ except for a finite number of terms.

The lemma then follows immediately. For each $i$ let indeed $\mathcal{F}_i$ be the "elementary filter" associated with the sequence $t^i$, that is, the set of all $A \subset \mathbb{N}$ which contain all but a finite number of the $t^i_n$; by property (b), the mapping $i \mapsto \mathcal{F}_i$ is increasing. Hence there exists a filter $\Gamma$ containing all the $\mathcal{F}_i$ (even an ultrafilter) and, all the strictly increasing sequences having been enumerated, the filter $\Gamma$ satisfies the lemma by virtue of (a).

We pass to the construction. We write $t^0 = s^0$. If $t^i$ is constructed, we take $t^{i+1}$ to be a subsequence of $t^i$ such that $s^{i+1}_k \le t^{i+1}_k$ for all $k$ (an immediate construction by induction on $k$). If (a) and (b) hold up to the $i$-th term they then hold up to the $(i+1)$-th term. If $i$ is a limit ordinal and the $t^j$ have been constructed for all $j < i$, we proceed as follows: we choose a strictly increasing sequence of ordinals $j_n < i$ such that $i = \sup_n j_n$. By (b), we can construct sequences $u^n$ by suppressing a finite number of terms at the beginning of $t^{j_n}$ such that $u^{n+1}$ is for all $n$ a subsequence of $u^n$. We may suppress a few more and assume that $u^n_0 \ge s^i_n$ for all $n$. We then write $t^i_n = u^n_0$; this sequence is a subsequence of each of the sequences $t^{j_n}$, except for a finite number of terms, and it is by construction "more rapid" than $s^i$. Hence the induction is possible and the lemma is proved.

We call filters satisfying conditions (1) and (2) of 27 rapid filters.
28 THEOREM. On a complete probability space $(\Omega, \mathcal{F}, \mathbb{P})$, let $(f_n)$ be a sequence of measurable functions which converges in probability to a function $f$. Let $\Gamma$ be a rapid filter on $\mathbb{N}$. Then for almost all $\omega$

$$\lim_\Gamma f_n(\omega) = f(\omega).$$

Proof: We reduce it immediately to the case where the $f_n$ and $f$ have values in the interval $[-1, 1]$. Then the $f_n$ converge to $f$ in $L^1$. Let $s = (s_k)$ be a strictly increasing sequence of integers such that $m \ge s_k$ implies $\|f_m - f\|_1 \le 2^{-k}$, then let $t = (t_k)$ be a sequence such that $s_k \le t_k$ for all sufficiently large $k$ and $\Gamma$ is finer than the elementary filter associated with $t$ (property (2) of 27). Since $\sum_k \|f_{t_k} - f\|_1 < \infty$, we a.s. have $\lim f_n(\omega) = f(\omega)$ relative to this elementary filter and a fortiori relative to $\Gamma$, which is finer.

29 COROLLARY. Let $(f_n)$ be a sequence of $\mathcal{F}$-measurable functions and let $f = \liminf_\Gamma f_n$. For every law $\mathbb{P}$ such that the sequence $(f_n)$ converges in $\mathbb{P}$-measure, it can be affirmed that $f$ is equal $\mathbb{P}$-a.s. to an $\mathcal{F}$-measurable function and that $f_n \to f$ in $\mathbb{P}$-measure.

Note however that the limit $f$ is not universally measurable in general. The similar procedure, of "medial limits", always leads to universally measurable functions.
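The summability step used in the proof of Theorem 28 can be spelled out as follows. This is a sketch in the notation of that proof; the index $k_0$ (a rank beyond which $s_k \le t_k$) is a label introduced here for illustration.

```latex
% Notation of the proof of Theorem 28; t_k \ge s_k for k \ge k_0 (k_0 is our label).
% Monotone convergence allows exchanging expectation and sum:
\mathbb{E}\Big[\sum_{k \ge k_0} |f_{t_k} - f|\Big]
  = \sum_{k \ge k_0} \|f_{t_k} - f\|_1
  \le \sum_{k \ge k_0} 2^{-k} < \infty
% A positive random variable with finite expectation is a.s. finite, hence
\;\Longrightarrow\;
\sum_{k} |f_{t_k}(\omega) - f(\omega)| < \infty \ \text{a.s.}
\;\Longrightarrow\;
f_{t_k}(\omega) \to f(\omega) \ \text{a.s.}
```

Convergence of $f_n$ along the elementary filter associated with $t$ is exactly convergence of the subsequence $(f_{t_k})$, which is why this estimate suffices.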
3. COMPLETION. INDEPENDENCE. CONDITIONING

We now come back to elementary results of a probabilistic nature.

Internally negligible sets

30 DEFINITION. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. A set $A \subset \Omega$ is called internally $\mathbb{P}$-negligible if $\mathbb{P}(B) = 0$ for every $B \in \mathcal{F}$ contained in $A$.

31 THEOREM. Let $\mathcal{N}$ be a family of subsets of $\Omega$ which satisfies the following conditions:
(1) $\mathcal{N}$ is closed under ($\cup$c).
(2) Every element of $\mathcal{N}$ is internally $\mathbb{P}$-negligible.
Let $\mathcal{F}'$ be the $\sigma$-field generated by $\mathcal{F}$ and $\mathcal{N}$. The law $\mathbb{P}$ then can be extended uniquely to a law $\mathbb{P}'$ on $\mathcal{F}'$ such that every element of $\mathcal{N}$ is $\mathbb{P}'$-negligible.

Proof: We merely indicate the main steps, leaving details to the reader. Let $\mathcal{M}$ be the family of subsets of $\Omega$ which are contained in some element of $\mathcal{N}$ and $\mathcal{G}$ be the family of subsets of the form $F \,\triangle\, M$ ($F \in \mathcal{F}$, $M \in \mathcal{M}$). One checks easily that $\mathcal{G}$ is a $\sigma$-field; since $\mathcal{M}$ contains the empty set, $\mathcal{F} \subset \mathcal{G}$ and similarly $\mathcal{M} \subset \mathcal{G}$. Let $A = F \,\triangle\, M$ be an element of $\mathcal{G}$; we set $Q(A) = \mathbb{P}(F)$. It can be verified that $Q(A)$ depends only on $A$, not on the representation $F \,\triangle\, M$ of $A$. To show that $Q$ is a probability law on $\mathcal{G}$, we consider a sequence $(A_n)$ of disjoint elements of $\mathcal{G}$, and their union $A$. Each $A_n$ is of the form $F_n \,\triangle\, M_n$ ($F_n \in \mathcal{F}$, $M_n \in \mathcal{M}$). Let $F$ be the union of the $F_n$. Since the $F_n$ are disjoint up to a negligible set, $\mathbb{P}(F) = \sum_n \mathbb{P}(F_n)$; on the other hand, $A$ and $F$ differ only by an element of $\mathcal{M}$. Hence $Q(A) = \sum_n Q(A_n)$. The required law $\mathbb{P}'$ is then the restriction of $Q$ to $\mathcal{F}'$.

To establish uniqueness, we consider another law $\mathbb{P}''$ on $\mathcal{F}'$ extending $\mathbb{P}$, such that every element of $\mathcal{N}$ is $\mathbb{P}''$-negligible. Every element of $\mathcal{M}$ then is internally $\mathbb{P}''$-negligible, so that $\mathbb{P}''$ extends to a law on $\mathcal{G}$ such that every element of $\mathcal{M}$ is negligible. This law is then identical to $Q$ and hence $\mathbb{P}' = \mathbb{P}''$.
32 REMARKS. (a) Theorem 31 is often applied to a family $\mathcal{N}$ consisting of a single internally negligible set.

(b) The theorem implies the possibility of completing (cf. 3) a probability space $(\Omega, \mathcal{F}, \mathbb{P})$: one takes for $\mathcal{N}$ the class of all subsets of $\mathbb{P}$-negligible sets. Let then $\mathcal{F}^{\mathbb{P}}$ be the completed $\sigma$-field; every element of $\mathcal{F}^{\mathbb{P}}$ can be expressed as $F \,\triangle\, M$, where $F$ belongs to $\mathcal{F}$ and $M$ is contained in some $\mathbb{P}$-negligible set $N \in \mathcal{F}$. Then $F \,\triangle\, M$ lies between the two sets $F \setminus N$ and $F \cup N$, which belong to $\mathcal{F}$ and differ only by a negligible set, and this property obviously characterizes the elements of $\mathcal{F}^{\mathbb{P}}$. The usual approximation of measurable functions by step functions now gives the following result:

A real-valued function $f$ is measurable relative to the completed $\sigma$-field $\mathcal{F}^{\mathbb{P}}$, if and only if there exist two $\mathcal{F}$-measurable real-valued functions $g$ and $h$ such that

$$g \le f \le h, \qquad \mathbb{P}\{g \ne h\} = 0.$$

(c) Let $(\Omega, \mathcal{F})$ be a measurable space; for each law $\mathbb{P}$ on $(\Omega, \mathcal{F})$ consider the completed $\sigma$-field $\mathcal{F}^{\mathbb{P}}$ and denote by $\hat{\mathcal{F}}$ the intersection of all the $\sigma$-fields $\mathcal{F}^{\mathbb{P}}$: the measurable space $(\Omega, \hat{\mathcal{F}})$ is called the universal completion of $(\Omega, \mathcal{F})$. The reader can verify the following properties:
(1) Every law $\mathbb{P}$ on $\mathcal{F}$ can be extended uniquely to a law $\hat{\mathbb{P}}$ on $\hat{\mathcal{F}}$ and the mapping $\mathbb{P} \mapsto \hat{\mathbb{P}}$ is a bijection of the set of laws on $\mathcal{F}$ onto the set of laws on $\hat{\mathcal{F}}$.
(2) Let $(E, \mathcal{E})$ be a measurable space and $f$ a measurable mapping of $(\Omega, \mathcal{F})$ into $(E, \mathcal{E})$; then $f$ is a measurable mapping of $(\Omega, \hat{\mathcal{F}})$ into $(E, \hat{\mathcal{E}})$.

(d) The universal completion of a Borel $\sigma$-field $\mathcal{B}(E)$ is denoted by $\mathcal{B}_u(E)$ and is also called the $\sigma$-field of universally measurable sets of $E$. If $E$ and $F$ are two topological spaces and $f$ is a mapping from $E$ to $F$, $f$ is called universally measurable if it is measurable from $\mathcal{B}_u(E)$ to $\mathcal{B}_u(F)$. By (c) it suffices that it be measurable from $\mathcal{B}_u(E)$ to $\mathcal{B}(F)$.
Independence

Text books on elementary probability theory give an important place to independence. In this book, we shall not need it much and we refer the reader to Chung [1] for a more detailed study.

33 DEFINITION. Let $(X_i)_{i \in I}$ be a finite family of random variables from a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ to measurable spaces $(E_i, \mathcal{E}_i)$. Let $X$ be the random variable $(X_i)_{i \in I}$ with values in the space $(\prod_{i \in I} E_i, \prod_{i \in I} \mathcal{E}_i)$. The random variables $X_i$ (or the family $(X_i)$) are said to be independent if the law of $X$ is the product of the laws of the $X_i$.

Let $(X_i)_{i \in I}$ be an arbitrary family of random variables. The family $(X_i)$ is said to be independent if every finite subfamily is independent.

More concretely (cf. 14): the random variables $(X_i)_{i \in I}$ are independent if and only if

$$\mathbb{P}\{\forall i \in J, \ X_i \in A_i\} = \prod_{i \in J} \mathbb{P}\{X_i \in A_i\}$$

for every finite subset $J \subset I$ and every family $(A_i)_{i \in J}$ such that $A_i \in \mathcal{E}_i$ for all $i \in J$.

The definition of independence can be given another form.

34 DEFINITION. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and let $(\mathcal{F}_i)_{i \in I}$ be a family of sub-$\sigma$-fields of $\mathcal{F}$. The $\sigma$-fields $(\mathcal{F}_i)_{i \in I}$ are called independent if

$$\mathbb{P}\Big(\bigcap_{i \in J} A_i\Big) = \prod_{i \in J} \mathbb{P}(A_i)$$

for every finite subset $J \subset I$ and every family of sets $(A_i)_{i \in J}$ such that $A_i \in \mathcal{F}_i$ for all $i \in J$.

The definitions 33 and 34 can easily be reduced to each other. The random variables $(X_i)_{i \in I}$ are indeed independent (in the sense of 33) if and only if the $\sigma$-fields $\sigma(X_i)$ are independent (in the sense of 34). Similarly, the $\sigma$-fields $(\mathcal{F}_i)_{i \in I}$ are independent if and only if the random variables $X_i$ are independent, $X_i$ denoting the identity mapping from $(\Omega, \mathcal{F})$ to $(\Omega, \mathcal{F}_i)$.

35 THEOREM. Let $\mathcal{F}_1, \mathcal{F}_2, \dots, \mathcal{F}_n$ be independent $\sigma$-fields and let $f_1, f_2, \dots, f_n$ be integrable real-valued random variables, measurable relative to the corresponding $\sigma$-fields $\mathcal{F}_1, \mathcal{F}_2, \dots, \mathcal{F}_n$. Then the product $f_1 f_2 \cdots f_n$ is integrable and

$$\mathbb{E}[f_1 \cdot f_2 \cdots f_n] = \mathbb{E}[f_1] \cdot \mathbb{E}[f_2] \cdots \mathbb{E}[f_n].$$
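To see why the independence hypothesis of Theorem 35 matters, here is a small worked case. The random variables $X$ and $X'$ below are hypothetical illustrations, not objects from the text.

```latex
% Hypothetical example: X takes the values +1 and -1 with probability 1/2 each.
% Take f_1 = f_2 = X: both are measurable with respect to \sigma(X),
% and \sigma(X) is not independent of itself, so Theorem 35 does not apply:
\mathbb{E}[f_1 f_2] = \mathbb{E}[X^2] = 1
\;\neq\; 0 = \mathbb{E}[f_1]\,\mathbb{E}[f_2].

% If instead f_1 = X and f_2 = X', with X' an independent copy of X,
% the four sign combinations each have probability 1/4 and
\mathbb{E}[f_1 f_2]
  = \tfrac14\big((+1)(+1) + (+1)(-1) + (-1)(+1) + (-1)(-1)\big)
  = 0 = \mathbb{E}[f_1]\,\mathbb{E}[f_2].
```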
Conditioning

The notion of conditional expectation is essential to probability theory. We give the different forms of the definitions in nos. 36-39 and then, in no. 40, we list all properties that must be kept in mind.

36 THEOREM. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $f$ be a random variable from $(\Omega, \mathcal{F})$ to some measurable space $(E, \mathcal{E})$. Let $Q$ be the image law of $\mathbb{P}$ under $f$. Let $X$ be a $\mathbb{P}$-integrable random variable on $(\Omega, \mathcal{F})$. There exists a $Q$-integrable random variable $Y$ on $(E, \mathcal{E})$ such that, for every set $A \in \mathcal{E}$:

$$\int_A Y(x) \, Q(dx) = \int_{f^{-1}(A)} X(\omega) \, \mathbb{P}(d\omega). \tag{36.1}$$

If $Y'$ is any random variable satisfying (36.1), then $Y' = Y$ a.s.

Proof: The assertion concerning uniqueness of $Y$ is an immediate consequence of remark 9, (a). To establish the existence of $Y$, we begin by assuming that $X$ belongs to $L^2(\mathbb{P})$. We associate to every $Z \in L^2(Q)$ the number $\int_\Omega (Z \circ f) X \cdot \mathbb{P}$, which depends only on the equivalence class of $Z$. We thus get a linear functional on $L^2(Q)$, whose norm is at most $\|X\|_2$. Hence there exists a function $Y \in L^2(Q)$ such that

$$\int_\Omega (Z \circ f) X \cdot \mathbb{P} = \int_E Z Y \cdot Q \qquad (Z \in L^2(Q)).$$

The function $Y$ solves the problem. If further $X$ is positive, $Y$ has a positive integral on every set $A \in \mathcal{E}$; hence it is a.s. positive by 9, (a).

We now pass to the case where $X$ is only integrable. The same is true of its positive part $X^+$ and its negative part $X^-$. The random variables $X_n^+ = X^+ \wedge n$ ($n \in \mathbb{N}$) belong to $L^2(\mathbb{P})$. Hence we can associate to them random variables $Y_n^+$ as above. By the preceding remark, these random variables are a.s. positive and increase with $n$ a.s. and their integrals are bounded by $\mathbb{E}[X^+]$. Hence we can choose an integrable random variable $Y^+$, a.s. equal to the limit of the $Y_n^+$. Similarly we construct from $X^-$ a random variable $Y^-$; the integrable random variable $Y = Y^+ - Y^-$ satisfies (36.1) and the theorem is established.
37 DEFINITION. Let $Y$ be an $\mathcal{E}$-measurable and $Q$-integrable random variable satisfying relation (36.1). We call $Y$ (a version of) the conditional expectation of $X$ given $f$. This will be denoted provisionally by $\mathbb{E}[X/f]$; this notation will not be used after 39.

38 REMARKS. (a) If $X$ is the indicator of an event $B$, $\mathbb{E}[X/f]$ is called the conditional probability of $B$, given $f$. It is important to keep in mind that such a "probability" is not a number, but a random variable defined up to equivalence.

(b) Consider a partition of the set $\Omega$ into a sequence of measurable sets $A_n$ and denote by $f$ the mapping of $\Omega$ into $\mathbb{N}$ equal to $n$ on $A_n$. A measure $Q$ on $\mathbb{N}$ then is defined by $Q(\{n\}) = \mathbb{P}(A_n)$. Let $X$ be an integrable random variable on $\Omega$; it is easy to compute $Y = \mathbb{E}[X/f]$:

$$Y(n) = \frac{1}{\mathbb{P}(A_n)} \int_{A_n} X \, \mathbb{P}$$

for all $n$ such that $\mathbb{P}(A_n) \ne 0$. If $\mathbb{P}(A_n)$ is zero, $Y(n)$ can be chosen arbitrarily (1). Suppose in particular that $X$ is the indicator of an event $B$; then $Y(n) = \mathbb{P}(B \cap A_n)/\mathbb{P}(A_n)$ if $\mathbb{P}(A_n)$ is non-zero. We recognize here the number which is called, in elementary probability theory, the conditional probability of $B$ given that $A_n$ occurs. It would be tempting to use the same terminology in the general case and to call the value $Y(x)$ ($x \in E$) "the conditional expectation of $X$ given that $f(\omega) = x$", but this would be improper, since the random variable $Y$ is only defined up to $Q$-equivalence and one may not talk about its value at a point $x$ unless $Q(\{x\}) \ne 0$.

(1) Usually we take $Y(n) = 0$, in conformity with the convention $0/0 = 0$.

39 (a) Let $X$ be a non-integrable positive random variable. Passing to the monotone limit as in the proof of 36 gives a positive random variable $Y$, finite or not, defined up to a.s. equality, which satisfies formula (36.1). We still denote it by $\mathbb{E}[X/f]$ and speak in this case of a generalized conditional expectation. Then $\mathbb{E}[X/f]$ is finite a.s. if and only if there exists an increasing sequence $(A_n)$ of elements of $\mathcal{E}$ such that

$$\bigcup_n A_n = E, \qquad \int_{f^{-1}(A_n)} X \, \mathbb{P} < +\infty \ \text{ for all } n.$$

(b) Given an arbitrary random variable $X$, we now say that $X$ has a generalized conditional expectation if $\mathbb{E}[X^+/f]$ and $\mathbb{E}[X^-/f]$ are finite a.s. and we then set $\mathbb{E}[X/f] = \mathbb{E}[X^+/f] - \mathbb{E}[X^-/f]$.

We started with definition 37 of conditional expectations, because it may be the most intuitive one. But it has a far more important variant, in fact, the only form that we shall use henceforth. One gets it by taking, in statements 37-39, $E$ to be $\Omega$, $\mathcal{E}$ to be a sub-$\sigma$-field of $\mathcal{F}$ and $f$ to be the identity mapping. The image measure $Q$ then is the restriction of $\mathbb{P}$ to $\mathcal{E}$ and we have the following definition.
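40 DEFINITION. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, $\mathcal{E}$ be a sub-$\sigma$-field of $\mathcal{F}$ and $X$ be an integrable random variable. A (version of the) conditional expectation of $X$ given $\mathcal{E}$ is any $\mathcal{E}$-measurable integrable random variable $Y$ such that

$$\int_A X(\omega) \, \mathbb{P}(d\omega) = \int_A Y(\omega) \, \mathbb{P}(d\omega) \quad \text{for all } A \in \mathcal{E}. \tag{40.1}$$

In general we omit the word "version". We denote $Y$ by the notation $\mathbb{E}[X \mid \mathcal{E}]$ (1). If $\mathcal{E}$ is the $\sigma$-field $\sigma(f_i, i \in I)$ generated by a family of random variables, we speak of the conditional expectation of $X$ given the $f_i$ and write $\mathbb{E}[X \mid f_i, i \in I]$. If $X$ is the indicator of an event $A$, we speak of the conditional probability of $A$ given $\mathcal{E}$ (or the $f_i$) and write $\mathbb{P}(A \mid \mathcal{E})$, $\mathbb{P}(A \mid f_i, i \in I)$. It often happens that conditional expectations are iterated as in $\mathbb{E}[\mathbb{E}[X \mid \mathcal{F}_1] \mid \mathcal{F}_2]$, $\mathcal{F}_1$ and $\mathcal{F}_2$ being two sub-$\sigma$-fields of $\mathcal{F}$. We then use the simpler notation $\mathbb{E}[X \mid \mathcal{F}_1 \mid \mathcal{F}_2]$, which is entirely unambiguous.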
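Definition 40 can be tested on a finite example. The following sketch uses a fair die; the space, the random variable and the partition are hypothetical choices made only for illustration.

```latex
% Hypothetical setting: \Omega = \{1,\dots,6\}, \mathbb{P} uniform, X(\omega) = \omega,
% \mathcal{E} = \{\emptyset, A, A^c, \Omega\} with A = \{1,3,5\}, A^c = \{2,4,6\}.
% An \mathcal{E}-measurable Y is constant on A and on A^c; (40.1) forces
Y = \frac{1}{\mathbb{P}(A)} \int_A X \, d\mathbb{P} = \frac{(1+3+5)/6}{1/2} = 3 \ \text{ on } A,
\qquad
Y = \frac{(2+4+6)/6}{1/2} = 4 \ \text{ on } A^c.
% Check of (40.1) on the generating sets:
\int_A X \, d\mathbb{P} = \tfrac{3}{2} = 3 \cdot \tfrac12 = \int_A Y \, d\mathbb{P},
\qquad
\int_{A^c} X \, d\mathbb{P} = 2 = 4 \cdot \tfrac12 = \int_{A^c} Y \, d\mathbb{P}.
% Taking A = \Omega in (40.1) gives equality of the expectations:
\mathbb{E}[Y] = \tfrac12 (3 + 4) = \tfrac{7}{2} = \mathbb{E}[X].
```

Here every set of $\mathcal{E}$ other than $\emptyset$ has positive probability, so $Y$ is determined everywhere; in general it is only determined up to a.s. equality.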
REMARKS. (a) Coming back to the notation of 36-37, denote by $\mathcal{G}$ the $\sigma$-field $\sigma(f)$; we have the a.s. equality $\mathbb{E}[X \mid \mathcal{G}] = Y \circ f$. Theorem I.18 then reduces Definition 37 to Definition 40.

(b) A random variable $X$ (not assumed to be positive or integrable) has a generalized conditional expectation given $\mathcal{E}$ if and only if the measure $|X| \cdot \mathbb{P}$ is $\sigma$-finite on $\mathcal{E}$.
Fundamental properties of conditional expectations

41 We group under this heading all the properties of conditional expectations which we use later on. In particular, we state again Definition 40 in another way. All random variables concerned are defined on $(\Omega, \mathcal{F}, \mathbb{P})$.

PROPERTY 1. Let $X$ and $Y$ be integrable random variables and $a$, $b$, $c$ be constants. Then, for every $\sigma$-field $\mathcal{E} \subset \mathcal{F}$,

$$\mathbb{E}[aX + bY + c \mid \mathcal{E}] = a \cdot \mathbb{E}[X \mid \mathcal{E}] + b \cdot \mathbb{E}[Y \mid \mathcal{E}] + c \quad \text{a.s.} \tag{41.1}$$

PROPERTY 2. Let $X$ and $Y$ be integrable random variables such that $X \le Y$ a.s. Then $\mathbb{E}[X \mid \mathcal{E}] \le \mathbb{E}[Y \mid \mathcal{E}]$ a.s.

PROPERTY 3. Let $X_n$ ($n \in \mathbb{N}$) be integrable random variables which increase to an integrable random variable $X$. Then

$$\mathbb{E}[X \mid \mathcal{E}] = \lim_n \mathbb{E}[X_n \mid \mathcal{E}] \quad \text{a.s.} \tag{41.2}$$

(1) Hunt simply writes $\mathcal{E}X$. This is an excellent notation

PROPERTY 4 (Jensen's inequality). Let $c$ be a convex mapping of $\mathbb{R}$ into $\mathbb{R}$ and let $X$ be an integrable random variable such that $c \circ X$ is integrable. We then have

$$c \circ \mathbb{E}[X \mid \mathcal{E}] \le \mathbb{E}[c \circ X \mid \mathcal{E}] \quad \text{a.s.} \tag{41.3}$$

Proof: The function $c$ is the upper envelope of a countable family of affine functions $L_n(x) = a_n x + b_n$. The random variables $L_n \circ X$ are integrable and

$$L_n \circ \mathbb{E}[X \mid \mathcal{E}] = \mathbb{E}[L_n \circ X \mid \mathcal{E}] \le \mathbb{E}[c \circ X \mid \mathcal{E}].$$

Taking the upper envelope over $n$ of the left-hand side then gives (41.3). If $X$ takes its values in some interval $I$ of $\mathbb{R}$, it obviously suffices that $c$ be convex on $I$.

PROPERTY 5. Let $X$ be an integrable random variable; then $\mathbb{E}[X \mid \mathcal{E}]$ is $\mathcal{E}$-measurable; if $X$ is $\mathcal{E}$-measurable, then $X = \mathbb{E}[X \mid \mathcal{E}]$ a.s.
(This is a partial restatement of the definition of conditional expectations, with an obvious consequence of their uniqueness.)

PROPERTY 6. Let $\mathcal{D}$, $\mathcal{E}$ be two sub-$\sigma$-fields of $\mathcal{F}$ such that $\mathcal{D} \subset \mathcal{E}$. Then for every integrable random variable $X$

$$\mathbb{E}[X \mid \mathcal{E} \mid \mathcal{D}] = \mathbb{E}[X \mid \mathcal{D}] \quad \text{a.s.} \tag{41.4}$$

And in particular

$$\mathbb{E}[\mathbb{E}[X \mid \mathcal{E}]] = \mathbb{E}[X]. \tag{41.5}$$

(The first formula is an immediate consequence of uniqueness. The second follows by taking $\mathcal{D} = \{\emptyset, \Omega\}$.)

PROPERTY 7. Let $X$ be an integrable random variable and $Y$ be an $\mathcal{E}$-measurable random variable such that $XY$ is integrable. Then

$$\mathbb{E}[XY \mid \mathcal{E}] = Y \cdot \mathbb{E}[X \mid \mathcal{E}] \quad \text{a.s.} \tag{41.6}$$

Proof: When $Y$ assumes only finitely many values, (41.6) is an immediate consequence of the definition of conditional expectations. The general case follows by monotone convergence.

The extension of these properties to generalized conditional expectations is sometimes useful. We leave it to the reader.

42 CONTINUITY PROPERTIES. We apply Jensen's inequality taking $c(x)$ to be the function $|x|^p$ ($1 \le p < \infty$). By (41.3), $|\mathbb{E}[X \mid \mathcal{E}]|^p \le \mathbb{E}[|X|^p \mid \mathcal{E}]$ a.s.; taking expectations and using (41.5), we get

$$\|\mathbb{E}[X \mid \mathcal{E}]\|_p \le \|X\|_p. \tag{42.1}$$

The same inequality is obvious for $p = \infty$. The mapping $X \mapsto \mathbb{E}[X \mid \mathcal{E}]$ therefore is an operator of norm $\le 1$ on $L^p$ ($1 \le p \le \infty$). Now it is well known that a continuous linear operator on a Banach space $B$ still is continuous when $B$ is given its weak topology $\sigma(B, B^*)$ (see for example Bourbaki [1] (1), Dunford-Schwartz [1], p. 422). Hence the conditional expectation operators are continuous for the weak topologies $\sigma(L^1, L^\infty)$ and $\sigma(L^2, L^2)$, for example.

(1) E.V.T. IV, 2nd edition, §4, no. 2, Proposition 6 (page 103).

Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of integrable random variables which converges a.s. to an integrable random variable $X$. It may be asked whether the conditional expectations $\mathbb{E}[X_n \mid \mathcal{E}]$ converge a.s. to $\mathbb{E}[X \mid \mathcal{E}]$, for any $\sigma$-field $\mathcal{E}$. Doob has shown that the answer is yes if the $X_n$ are dominated by a fixed integrable function, and Blackwell and Dubins have shown in [1] that this condition cannot be improved.

Conditional independence
The proof of Theorem 45 may be a good exercise on Properties 1-7 above. 43 DEFINITION. Let (Q,1,P) be a probability space and jl,J2 ,13 be three sub-a-fields of j. jl and j3 are called conditionally independent given j2 2i (43.1)
[CYIY3Ij2J
=
[CYl!j2J.IECY3!j2J ~.
where Y1 , Y3 denote positive random variables measurable with respect to the corresponding a-field J 1 , j3' 44 REMARKS. (a) Taking (f2 to be the a-field {0,Q}, we recover the definition of independence (33,34). We could similarily define conditional independence of several 0fields relative to a given a-field. (b) It can easily be shown, through the usual monotone limit procedure, that it suffices to assume (43.1) when Y1 and Y3 are indicators of sets. 45 THEOREM. Let -1'12 be the a-field generated by.1'l and .1'2' Then jl and 3'3 are conditionally independent given ~2' if and only if (45.1)
IECY 3!.1'12 J
=
[CY 3!:f2J a.s.
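for every $\mathcal{F}_3$-measurable and integrable random variable $Y_3$.
Proof: (a) (43.1) ⇒ (45.1). We wish to check that both sides of (45.1) have the same integral on every element of $\mathcal{F}_{12}$. Now the set of elements of $\mathcal{F}_{12}$ for which this property holds is closed under $(\cup mc, \cap mc)$. On the other hand, the family $\mathcal{C}$ of finite unions of disjoint sets of the form $A_1 \cap A_2$ ($A_1 \in \mathcal{F}_1$, $A_2 \in \mathcal{F}_2$) generates $\mathcal{F}_{12}$. Hence it suffices, by I.19, to verify that
$\mathbb{E}[a_1 a_2\, \mathbb{E}[Y_3 \mid \mathcal{F}_{12}]] = \mathbb{E}[a_1 a_2\, \mathbb{E}[Y_3 \mid \mathcal{F}_2]]$
where $a_1$ and $a_2$ denote respectively the indicators of $A_1$ and $A_2$. Now we have (the numbers indicating the properties used):
$$\begin{aligned}
\mathbb{E}[a_1 a_2\, \mathbb{E}[Y_3 \mid \mathcal{F}_{12}]] &= \mathbb{E}[\mathbb{E}[a_1 a_2 Y_3 \mid \mathcal{F}_{12}]] && (7) \\
&= \mathbb{E}[a_1 a_2 Y_3] && (5) \\
&= \mathbb{E}[\mathbb{E}[a_1 a_2 Y_3 \mid \mathcal{F}_2]] && (5) \\
&= \mathbb{E}[a_2\, \mathbb{E}[a_1 Y_3 \mid \mathcal{F}_2]] && (7) \\
&= \mathbb{E}[a_2\, \mathbb{E}[a_1 \mid \mathcal{F}_2]\, \mathbb{E}[Y_3 \mid \mathcal{F}_2]] && (43.1) \\
&= \mathbb{E}[a_2\, \mathbb{E}[(a_1\, \mathbb{E}[Y_3 \mid \mathcal{F}_2]) \mid \mathcal{F}_2]] && (7) \\
&= \mathbb{E}[\mathbb{E}[(a_2 a_1\, \mathbb{E}[Y_3 \mid \mathcal{F}_2]) \mid \mathcal{F}_2]] && (7) \\
&= \mathbb{E}[a_2 a_1\, \mathbb{E}[Y_3 \mid \mathcal{F}_2]] && (5)
\end{aligned}$$
(b) (45.1) ⇒ (43.1). We have:
$$\begin{aligned}
\mathbb{E}[Y_1 Y_3 \mid \mathcal{F}_2] &= \mathbb{E}[\mathbb{E}[Y_1 Y_3 \mid \mathcal{F}_{12}] \mid \mathcal{F}_2] && (6) \\
&= \mathbb{E}[(Y_1\, \mathbb{E}[Y_3 \mid \mathcal{F}_{12}]) \mid \mathcal{F}_2] && (7) \\
&= \mathbb{E}[(Y_1\, \mathbb{E}[Y_3 \mid \mathcal{F}_2]) \mid \mathcal{F}_2] && (45.1) \\
&= \mathbb{E}[Y_1 \mid \mathcal{F}_2]\, \mathbb{E}[Y_3 \mid \mathcal{F}_2] && (7)
\end{aligned}$$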
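As an illustration of Definition 43 and Theorem 45, the following minimal sketch checks (43.1) and (45.1) exactly on a finite probability space. The three-step chain, its transition probabilities and the variables `y1`, `y3` are hypothetical choices made for this example only; the σ-fields $\mathcal{F}_1, \mathcal{F}_2, \mathcal{F}_3$ are assumed to be generated by the first, second and third coordinates.

```python
from fractions import Fraction
from itertools import product

# Hypothetical three-step chain on {0, 1}: X1 -> X2 -> X3 (assumed data).
init = {0: Fraction(1, 3), 1: Fraction(2, 3)}
step = {0: {0: Fraction(1, 4), 1: Fraction(3, 4)},
        1: {0: Fraction(2, 5), 1: Fraction(3, 5)}}

# Probability of each outcome w = (x1, x2, x3).
prob = {(a, b, c): init[a] * step[a][b] * step[b][c]
        for a, b, c in product((0, 1), repeat=3)}


def cond_exp(y, coords):
    """E[y | sigma-field generated by the coordinates in coords], as a function of w."""
    def key(w):
        return tuple(w[i] for i in coords)
    num, den = {}, {}
    for w, p in prob.items():
        num[key(w)] = num.get(key(w), 0) + p * y(w)
        den[key(w)] = den.get(key(w), 0) + p
    return lambda w: num[key(w)] / den[key(w)]


y1 = lambda w: 1 + w[0]        # positive, measurable w.r.t. F1
y3 = lambda w: 2 + 3 * w[2]    # positive, measurable w.r.t. F3

lhs = cond_exp(lambda w: y1(w) * y3(w), (1,))   # E[Y1 Y3 | F2]
e1 = cond_exp(y1, (1,))                         # E[Y1 | F2]
e3_given_2 = cond_exp(y3, (1,))                 # E[Y3 | F2]
e3_given_12 = cond_exp(y3, (0, 1))              # E[Y3 | F12]

check_43_1 = all(lhs(w) == e1(w) * e3_given_2(w) for w in prob)
check_45_1 = all(e3_given_12(w) == e3_given_2(w) for w in prob)
print(check_43_1, check_45_1)
```

Exact rational arithmetic is used so that the two identities can be compared with `==` rather than up to rounding.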
CHAPTER III
Complements to measure theory
Thanks to Hunt [1], Choquet's theorem on capacitability has become one of the fundamental tools of probability theory. This theorem is proved in paragraph 2 and constitutes the core of the chapter. Paragraph 1 contains the elements of analytic set theory necessary to prove Choquet's theorem and other results useful to probabilists (Blackwell's theorem for instance). Paragraph 3 is devoted to bounded Radon measures. We have tried to restrict ourselves to really useful results, either for probability theory or for potential theory (except in the appendix, which contains some luxury theorems). But this does not mean they are all equally important. The reader who looks for essentials may limit himself to nos 1-13, 27-32 and 44.
1. ANALYTIC SETS
1 Let $E$ be a set. A paving on $E$ is any family of subsets of $E$ which contains the empty set; the pair $(E, \mathcal{E})$ consisting of a set $E$ and a paving $\mathcal{E}$ on $E$ is called a paved set. This terminology is used only in this chapter and the applications which depend on it.
Let $(E_i, \mathcal{E}_i)_{i \in I}$ be a family of paved sets. The product paving of the $\mathcal{E}_i$ (resp. the sum paving (1) of the $\mathcal{E}_i$) is the paving on the set $\prod_{i \in I} E_i$ (resp. $\sum_{i \in I} E_i$) consisting of the subsets of the form $\prod_{i \in I} A_i$ (resp. $\sum_{i \in I} A_i$) where $A_i \subset E_i$ belongs to $\mathcal{E}_i$ for all $i$ (and, in the case of the sum, differs from $\emptyset$ only for finitely many indices).
The first edition of this book gave a different definition of the product paving, analogous to that of the sum paving, insisting that $A_i = E_i$ except for a finite number of indices. It then follows, when this number is equal to 0, that the whole space belongs to every product paving, which causes some inconveniences. The present definition is better, given that we only consider countable products (or sums).
(1) Recall that the sum of the $E_i$ (denoted by $\sum_{i \in I} E_i$ or $\coprod_{i \in I} E_i$) is the union of the sets $E_i \times \{i\}$.
It should be noted that, when the $\mathcal{E}_i$ are σ-fields, the product paving of the $\mathcal{E}_i$ is not the same as the product σ-field of the $\mathcal{E}_i$ (the latter being generated by the product paving when $I$ is countable). Hence there is some ambiguity in using notations such as $\prod_{i \in I} \mathcal{E}_i$ or $\mathcal{E} \times \mathcal{F}$ to denote a product paving. We shall nevertheless use them, in this chapter only (1), being explicit when necessary.
Compact and semi-compact pavings
2 Let $(E, \mathcal{E})$ be a paved set and $(K_i)_{i \in I}$ be a family of elements of $\mathcal{E}$. We say that this family has the finite intersection property if $\bigcap_{i \in I_0} K_i \neq \emptyset$ for every finite subset $I_0 \subset I$. This amounts to saying that the sets $K_i$ belong to a filter or also, by the ultrafilter theorem (2), that there exists an ultrafilter $\mathcal{U}$ such that $K_i \in \mathcal{U}$ for all $i \in I$.
3 DEFINITION. Let $(E, \mathcal{E})$ be a paved set. The paving $\mathcal{E}$ is said to be compact (resp. semi-compact) if every family (resp. every countable family) of elements of $\mathcal{E}$, which has the finite intersection property, has a non-empty intersection (3).
For instance, if $E$ is a Hausdorff topological space, the paving consisting of the compact subsets of $E$ (henceforth denoted by $\mathcal{K}(E)$) is compact. Abstract compact pavings are seldom found: an interesting example is that of the "islets" of $\mathbb{N}^{\mathbb{N}}$ (no 77 in the appendix). Let $\mathcal{E}$ be a compact (resp. semi-compact) paving on $E$; then the paving $\mathcal{E} \cup \{E\}$ is compact (resp. semi-compact).
The definition of analytic sets given in this edition no longer uses semi-compact pavings. Hence the reader can omit every reference to them. The reasons for retaining them are of a purely aesthetic nature.
4 THEOREM. Let $E$ be a set with a compact (resp. semi-compact) paving $\mathcal{E}$ and let $\mathcal{E}'$ be the closure of $\mathcal{E}$ under $(\cup f, \cap a)$ (resp. $(\cup f, \cap c)$). Then the paving $\mathcal{E}'$ is compact (resp. semi-compact).
Proof: Let $\mathcal{F}$ be the closure of $\mathcal{E}$ under $(\cup f)$. Then $\mathcal{E}'$ is the closure of $\mathcal{F}$ under $(\cap a)$ (resp. $(\cap c)$). The latter closure obviously preserves compactness and hence it will suffice to show that $\mathcal{F}$ is compact (resp. semi-compact). So let us consider a family (resp. a countable family) $(K_i)_{i \in I}$ of elements of $\mathcal{F}$, which has the finite intersection property; let $\mathcal{U}$ be an ultrafilter such that $K_i \in \mathcal{U}$ for all $i$. Each set $K_i$ is a union $\bigcup_{j \in J_i} K_{ij}$ of elements of $\mathcal{E}$, where $J_i$ is a finite set. Hence there exists
(1) The best solution consists in using the symbol $\otimes$ for product σ-fields, as does Neveu [1]. (2) Bourbaki [2] (3rd edition), §6, no 4, Theorem 1. (3) Here is a simple example of a non-compact semi-compact paving: on an uncountable set, the paving consisting of all finite subsets and all subsets with a countable complement.
an index $j_i \in J_i$ such that $K_{i j_i} \in \mathcal{U}$ (1). The family $(K_{i j_i})_{i \in I}$ therefore has the finite intersection property, hence its intersection is non-empty and so a fortiori is the intersection of the family $(K_i)_{i \in I}$.
5 THEOREM. Let $(E_i, \mathcal{E}_i)_{i \in I}$ be a family of paved sets. If each of the pavings $\mathcal{E}_i$ is compact (resp. semi-compact) so are the product paving $\prod_{i \in I} \mathcal{E}_i$ and the sum paving $\sum_{i \in I} \mathcal{E}_i$.
Proof: The proof is immediate as far as the product paving is concerned. Let $\mathcal{H}$ be the paving on the sum set $\sum_{i \in I} E_i$ consisting of all subsets of the form $\sum_{i \in I} A_i$ such that $A_i = \emptyset$ for all indices except at most one $i$, for which $A_i$ belongs to $\mathcal{E}_i$. This paving is obviously compact (resp. semi-compact). It then suffices to note that the sum paving is the closure of $\mathcal{H}$ under $(\cup f)$.
There is no need to attach any importance to the "semi"-compact nature of the paving in the following statement: the gain in generality is illusory.
6 THEOREM. Let $(E, \mathcal{E})$ be a paved set and let $f$ be a mapping of $E$ into a set $F$. Suppose that, for all $x \in F$, the paving consisting of the sets $f^{-1}(\{x\}) \cap A$, $A \in \mathcal{E}$, is semi-compact. Then, for every decreasing sequence $(A_n)_{n \in \mathbb{N}}$ of elements of $\mathcal{E}$,
(6.1) $f\bigl(\bigcap_{n \in \mathbb{N}} A_n\bigr) = \bigcap_{n \in \mathbb{N}} f(A_n)$.
Proof: It suffices to show that we can associate to every $x \in \bigcap_n f(A_n)$ an element $y \in \bigcap_n A_n$ such that $f(y) = x$. Now the family of sets of the form $f^{-1}(\{x\}) \cap A_n$ has the finite intersection property, hence it has a non-empty intersection and we just choose $y$ in this intersection.
$\mathcal{F}$-analytic sets
7 DEFINITION. Let $(F, \mathcal{F})$ be a paved set. A subset $A$ of $F$ is called $\mathcal{F}$-analytic if there exist an auxiliary compact metrizable space $E$ and a subset $B \subset E \times F$ belonging to $(\mathcal{K}(E) \times \mathcal{F})_{\sigma\delta}$, such that $A$ is the projection of $B$ onto $F$. The paving on $F$ consisting of all $\mathcal{F}$-analytic sets is denoted by $\mathcal{A}(\mathcal{F})$.
It follows immediately from the definition that every $A \in \mathcal{A}(\mathcal{F})$ is contained in some element of $\mathcal{F}_\sigma$. In particular, the whole space $F$ is $\mathcal{F}$-analytic if and only if it belongs to $\mathcal{F}_\sigma$ (8 below).
Definition 7 involves a variable compact space $E$. We show in the appendix that replacing $E$ by the fixed compact space $\overline{\mathbb{N}}^{\mathbb{N}}$ ($\overline{\mathbb{N}}$ being the one point compactification of $\mathbb{N}$), or by $\overline{\mathbb{R}}$, leads to the same class of analytic sets. The same is true, on the other hand, if $E$ is replaced by a variable semi-compact paved space $(E, \mathcal{E})$, as was done in the first edition. Finally, the $\mathcal{F}$-analytic sets are those which are constructed from Souslin's operation (A) applied to elements of $\mathcal{F}$.
(1) Bourbaki [2] (3rd edition), §6, no 4, Proposition 5. This proof was communicated to us by G. Mokobodzki.
8 THEOREM. $\mathcal{F} \subset \mathcal{A}(\mathcal{F})$; the paving $\mathcal{A}(\mathcal{F})$ is closed under $(\cup c, \cap c)$.
Proof: The first assertion is obvious. To establish the second, we consider a sequence $(A_n)_{n \in \mathbb{N}}$ of $\mathcal{F}$-analytic sets. There exist by definition, for each integer $n$:
- a compact metrizable space $E_n$, with its paving $\mathcal{K}(E_n) = \mathcal{E}_n$;
- a subset $B_n \subset E_n \times F$, belonging to $(\mathcal{E}_n \times \mathcal{F})_{\sigma\delta}$ (and hence equal to the intersection of a sequence $(B_{nm})_{m \in \mathbb{N}}$ of elements of $(\mathcal{E}_n \times \mathcal{F})_\sigma$), whose projection onto $F$ is $A_n$.
Let $E$ be the compact space $\prod_n E_n$ with the paving $\mathcal{E} = \prod_n \mathcal{E}_n \subset \mathcal{K}(E)$. Let $\pi$ be the projection of $E \times F$ onto $F$. We denote by $C_n$ the cylinder of base $B_n$ in $E \times F$, that is $(\prod_{m \neq n} E_m) \times B_n$ (1); then $\bigcap_n A_n = \pi(\bigcap_n C_n)$. The assertion concerning the operation $(\cap c)$ will therefore be established if we show that $\bigcap_n C_n$ belongs to $(\mathcal{E} \times \mathcal{F})_{\sigma\delta}$; which is obvious since every $C_n$ belongs to $(\mathcal{E} \times \mathcal{F})_{\sigma\delta}$.
Now let $E$ be the Alexandrov compactification of the topological sum $\sum_n E_n$, with the compact paving $\mathcal{E} = \sum_n \mathcal{E}_n \subset \mathcal{K}(E)$ and let $\pi$ be the projection of $E \times F$ onto $F$. Then $\pi(\sum_n B_n) = \bigcup_n A_n$ (identifying $(\sum_n E_n) \times F$ to $\sum_n (E_n \times F)$). Hence it is sufficient to show that $\sum_n B_n \in (\mathcal{E} \times \mathcal{F})_{\sigma\delta}$. Now this set is equal to $\bigcap_m \sum_n B_{nm}$ and $\sum_n B_{nm}$ belongs to $(\mathcal{E} \times \mathcal{F})_\sigma$. Thus closure under $(\cup c)$ is established.
9 THEOREM. (a) Let $(E, \mathcal{E})$ and $(F, \mathcal{F})$ be two paved sets; we have […] $a$.
We then suppose that the construction has been made up to the $(n-1)$th term. We have by hypothesis $C_{n-1} \subset A$, $I(C_{n-1}) > a$. Consequently $I(C_{n-1}) = I(C_{n-1} \cap A_n) = \sup_m I(C_{n-1} \cap A_{nm})$. Then we take $B_n$ to be one of the sets $A_{nm}$, where $m$ is sufficiently large, so that $I(C_{n-1} \cap A_{nm}) = I(C_n) > a$.
(1) [3], §6, Proposition 14. (2) The statement is trivial if $I(A) = -\infty$, for then $I(\emptyset) = -\infty$ and $\emptyset$ belongs to $\mathcal{F}$.
Having constructed the sequence $(B_n)_{n \ge 1}$, we set
$B'_n = B_1 \cap B_2 \cap \dots \cap B_n$ and $B = \bigcap_n B_n = \bigcap_n B'_n$.
The sets $B'_n$ belong to $\mathcal{F}$ and decrease and we have $C_n \subset B'_n$: hence $I(B'_n) > a$ and $I(B) \ge a$ by (27.2). We have $B_n \subset A_n$ and hence $B \subset A$. Finally the set $B$ satisfies the required conditions and the lemma is established.
Now let $A$ be $\mathcal{F}$-analytic. There exist a compact metric space $E$ with its compact paving $\mathcal{K}(E) = \mathcal{E}$ and an element $B$ of $(\mathcal{E} \times \mathcal{F})_{\sigma\delta}$ such that the projection of $B$ onto $F$ is equal to $A$. Let $\pi$ denote the projection of $E \times F$ onto $F$ and $\mathcal{H}$ denote the paving consisting of all finite unions of elements of $\mathcal{E} \times \mathcal{F}$. By 4, there is no loss of generality in supposing that $\mathcal{E}$ is closed under $(\cup f, \cap f)$ and then $\mathcal{H}$ is closed under $(\cup f, \cap f)$.
LEMMA 2. The set function $J$ defined for all $H \subset E \times F$ by $J(H) = I(\pi(H))$ is an $\mathcal{H}$-capacity on $E \times F$.
Proof: The function $J$ is obviously increasing and satisfies (27.1). Property (27.2) follows immediately from the relation
$\bigcap_n \pi(B_n) = \pi\bigl(\bigcap_n B_n\bigr)$
which holds, according to 6, for every decreasing sequence $(B_n)_{n \in \mathbb{N}}$ of elements of $\mathcal{H}$.
We can now complete the proof. Since $B$ is capacitable relative to $J$ by Lemma 1, there exists an element $D$ of $\mathcal{H}_\delta$ such that $D \subset B$, $J(D) \ge J(B) - \varepsilon$ ($\varepsilon > 0$). Let $C$ be the set $\pi(D)$: the above equality shows that $C$ is an element of $\mathcal{F}_\delta$ and we have $C \subset A$, $I(C) \ge I(A) - \varepsilon$.
29 It is interesting to analyze the above proof following Sion [1]. Let $\mathcal{C}$ be the class of all sets $A$ such that $I(A) > a$: $\mathcal{C}$ has the properties:
(29.1) $A \in \mathcal{C}$, $A \subset B \Rightarrow B \in \mathcal{C}$;
(29.2) if $(A_n)$ is an increasing sequence of subsets of $F$, whose union belongs to $\mathcal{C}$, then some $A_n$ belongs to $\mathcal{C}$.
On the other hand, the property we established can be stated as follows:
(29.3) if an $\mathcal{F}$-analytic set belongs to $\mathcal{C}$, it contains the intersection of a decreasing sequence of elements of $\mathcal{F} \cap \mathcal{C}$.
The proof rests solely on (29.1) and (29.2). Lemma 1 amounts to saying that any $\mathcal{F}_{\sigma\delta}$ belonging to $\mathcal{C}$ satisfies (29.3), and Lemma 2 to the fact that the class $\mathcal{C}'$ in $E \times F$ consisting of the sets whose projection on $F$ belongs to $\mathcal{C}$ still satisfies (29.1) and (29.2). Then Lemma 1 is applied in $E \times F$, and finally projection and intersection commute thanks to the compactness of the paving $\mathcal{E}$ (no. 6). Sion calls such a class $\mathcal{C}$ satisfying (29.1) and (29.2) a capacitance. The validity of (29.3) then is "Sion's capacitability theorem", which is a little bit more general than that of Choquet. See Sion [1].
Construction of capacities
The hypotheses of Choquet's theorem are quite general, but difficult to fulfill: one seldom comes across non-trivial set functions which are given from the start for all subsets of a set $F$. It is more natural to consider a function defined on a paving and to determine whether one can extend it to the whole of $\mathcal{P}(F)$ as a Choquet capacity. Still following Choquet, we now describe such an extension procedure for "strongly subadditive" set functions. We limit ourselves to the positive case, but this restriction is by no means essential.
30 DEFINITION. Let $\mathcal{F}$ be a paving on a set $F$, closed under $(\cup f, \cap f)$. Let $I$ be a positive and increasing set function defined on $\mathcal{F}$ (1). We say that $I$ is strongly subadditive if for every pair $(A, B)$ of elements of $\mathcal{F}$
(30.1) $I(A \cup B) + I(A \cap B) \le I(A) + I(B)$.
If the symbol $\le$ is replaced by $=$, we get the definition of an additive function on $\mathcal{F}$.
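Inequality (30.1) can be tested by brute force when $F$ is finite. The sketch below is only an illustration under assumptions of our own: the paving is taken to be all subsets of a four-point set, and the three set functions `card`, `capped` and `squared` are hypothetical examples, not taken from the text.

```python
from itertools import combinations


def subsets(ground):
    """All subsets of a finite ground set, as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_strongly_subadditive(I, paving):
    """Check (30.1): I(A u B) + I(A n B) <= I(A) + I(B) for all A, B in the paving."""
    return all(I(a | b) + I(a & b) <= I(a) + I(b) for a in paving for b in paving)


def is_additive(I, paving):
    """Same check with equality instead of inequality."""
    return all(I(a | b) + I(a & b) == I(a) + I(b) for a in paving for b in paving)


paving = subsets({0, 1, 2, 3})       # closed under finite unions and intersections

card = len                           # counting measure
capped = lambda a: min(len(a), 2)    # concave function of the cardinality
squared = lambda a: len(a) ** 2      # convex function of the cardinality

print(is_additive(card, paving))                # counting measure is additive
print(is_strongly_subadditive(capped, paving))  # holds, without being additive
print(is_strongly_subadditive(squared, paving)) # fails, e.g. for two disjoint singletons
```

The capped cardinality is increasing and strongly subadditive but not additive, which is the typical situation for capacities as opposed to measures.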
31 THEOREM. Let $\mathcal{F}$ be a paving on $F$ which is closed under $(\cup f, \cap f)$ and let $I$ be an increasing and positive set function on $\mathcal{F}$. The following properties are equivalent:
(a) $I$ is strongly subadditive;
(b) $I(P \cup Q \cup R) + I(R) \le I(P \cup R) + I(Q \cup R)$ for all $P, Q, R \in \mathcal{F}$;
(c) $I(Y \cup Y') + I(X) + I(X') \le I(X \cup X') + I(Y) + I(Y')$ for all pairs $(X, Y)$, $(X', Y')$ of elements of $\mathcal{F}$ such that $X \subset Y$, $X' \subset Y'$.
Proof: To show that (a) ⇒ (b), we write $A = P \cup R$, $B = Q \cup R$ in (30.1). Then
$I(P \cup Q \cup R) + I((P \cap Q) \cup R) \le I(P \cup R) + I(Q \cup R)$.
Since $I$ is increasing, the inequality implies (b).
To show that (b) ⇒ (c), we write $P = Y$, $Q = Y'$, $R = X$ in (b). Then
$I(Y \cup Y' \cup X) + I(X) \le I(Y \cup X) + I(Y' \cup X)$.
We add $I(X')$ to both sides and use the relations $Y \cup Y' \cup X = Y \cup Y'$, $Y \cup X = Y$, $Y' \cup X = Y' \cup X \cup X'$. Then
$I(Y \cup Y') + I(X) + I(X') \le I(Y) + [I(Y' \cup X \cup X') + I(X')]$.
We again apply (b) with $P = Y'$, $Q = X$, $R = X'$. So we get an upper bound for the bracket to the right of the preceding inequality and deduce
$I(Y \cup Y') + I(X) + I(X') \le I(Y) + I(Y' \cup X') + I(X \cup X') = I(Y) + I(Y') + I(X \cup X')$.
That is (c).
To show (c) ⇒ (a), it suffices to write $X = A \cap B$, $Y = B$, $X' = Y' = A$ in (c). Then
$I(A \cup B) + I(A \cap B) + I(A) \le I(A) + I(B) + I(A)$.
Then either $I(A) = +\infty$ and inequality (30.1) is trivial, or $I(A) < +\infty$ and this inequality implies (30.1). Thus the equivalence of all three properties is established.
(1) The value $+\infty$ is allowed.
REMARKS. Formula (c) extends immediately by induction as follows: let $X_1, X_2, \dots, X_n$, $Y_1, Y_2, \dots, Y_n$ be elements of $\mathcal{F}$ such that $X_i \subset Y_i$ for $i = 1, 2, \dots, n$. Then
(31.1) $I\bigl(\bigcup_i Y_i\bigr) + \sum_i I(X_i) \le I\bigl(\bigcup_i X_i\bigr) + \sum_i I(Y_i)$.
This formula looks more pleasant when all the quantities $I(X_i)$ are finite: it can then be written
(31.2) $I\bigl(\bigcup_i Y_i\bigr) - I\bigl(\bigcup_i X_i\bigr) \le \sum_i [I(Y_i) - I(X_i)]$.
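The equivalent forms (b) and (31.1) can also be checked numerically. The following sketch assumes a hypothetical "coverage" set function on the subsets of a five-point index set (the `blocks` data are invented for the example); such a function is increasing and strongly subadditive, so both inequalities are expected to hold.

```python
import random
from itertools import combinations

ground = range(5)
# Hypothetical coverage data: I(A) counts the points covered by the blocks indexed by A.
blocks = {0: {1, 2}, 1: {2, 3}, 2: {3, 4, 5}, 3: {1, 5}, 4: {6}}


def I(a):
    covered = set()
    for i in a:
        covered |= blocks[i]
    return len(covered)


paving = [frozenset(c) for r in range(6) for c in combinations(ground, r)]

# Property (b) of Theorem 31, checked over all triples of the paving.
b_holds = all(I(p | q | r) + I(r) <= I(p | r) + I(q | r)
              for p in paving for q in paving for r in paving)

rng = random.Random(0)


def random_pair():
    """A random pair (X, Y) of elements of the paving with X contained in Y."""
    y = frozenset(i for i in ground if rng.random() < 0.6)
    x = frozenset(i for i in y if rng.random() < 0.5)
    return x, y


def check_31_1(n):
    """Test (31.1) on n random pairs X_i contained in Y_i."""
    pairs = [random_pair() for _ in range(n)]
    union_x = frozenset().union(*(x for x, _ in pairs))
    union_y = frozenset().union(*(y for _, y in pairs))
    lhs = I(union_y) + sum(I(x) for x, _ in pairs)
    rhs = I(union_x) + sum(I(y) for _, y in pairs)
    return lhs <= rhs


formula_holds = all(check_31_1(n) for n in (1, 2, 3, 4) for _ in range(200))
print(b_holds, formula_holds)
```

Property (b) is verified exhaustively, whereas (31.1) is only sampled on random pairs; a sampled check can reveal a counterexample but does not replace the induction argument.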
Inequality (b) is less useful; when all the quantities appearing are finite, it can be written as
$I(P \cup Q \cup R) - I(P \cup R) - I(Q \cup R) + I(R) \le 0$.
We now associate an "outer capacity" to every strongly subadditive and increasing set function, and investigate whether this procedure yields a true Choquet capacity.
32 THEOREM. Let $F$ be a set with a paving $\mathcal{F}$ closed under $(\cup f, \cap f)$. Let $I$ be a set function defined on $\mathcal{F}$, positive, increasing and strongly subadditive, which satisfies the following property:
(32.1) for every increasing sequence $(A_n)_{n \ge 1}$ of elements of $\mathcal{F}$ whose union $A$ belongs to $\mathcal{F}$, $I(A) = \sup_n I(A_n)$.
For every set $A \in \mathcal{F}_\sigma$ we define
(32.2) $I^*(A) = \sup_{B \in \mathcal{F},\ B \subset A} I(B)$
and, for every subset $C$ of $F$,
(32.3) $I^*(C) = \inf_{A \in \mathcal{F}_\sigma,\ A \supset C} I^*(A)$ ($\inf \emptyset = +\infty$).
Then the function $I^*$ (1) is increasing and has the following properties:
(a) for every increasing sequence $(X_n)_{n \ge 1}$ of subsets of $F$,
(32.4) $I^*\bigl(\bigcup_n X_n\bigr) = \sup_n I^*(X_n)$.
(b) Let $(X_n)$, $(Y_n)$ be two sequences of subsets of $F$ such that $X_n \subset Y_n$ for all $n$. Then
(32.5) $I^*\bigl(\bigcup_n Y_n\bigr) + \sum_n I^*(X_n) \le I^*\bigl(\bigcup_n X_n\bigr) + \sum_n I^*(Y_n)$.
(c) The function $I^*$ is an $\mathcal{F}$-capacity, if and only if
(32.6) $I^*\bigl(\bigcap_n A_n\bigr) = \inf_n I(A_n)$
for every decreasing sequence $(A_n)_{n \ge 1}$ of elements of $\mathcal{F}$.
(1) $I^*$ is called the outer capacity associated with $I$.
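To make the two-step definition (32.2)-(32.3) concrete, here is a minimal sketch on a finite set, under simplifying assumptions of our own: $F$ has four points, the hypothetical paving `paving` is a small lattice of subsets (so that $\mathcal{F}_\sigma = \mathcal{F}$ and (32.2) gives back $I$), and `I` is the capped cardinality. The function `outer` then plays the role of $I^*$.

```python
import math
from itertools import combinations

F = frozenset({0, 1, 2, 3})
# Hypothetical paving: contains the empty set, closed under finite unions and intersections.
paving = [frozenset(), frozenset({0, 1}), frozenset({2}), frozenset({0, 1, 2})]
I = lambda a: min(len(a), 2)     # increasing and strongly subadditive on the paving


def outer(c):
    """Outer capacity (32.3): infimum of I over the paving sets containing c.

    On a finite paving every countable union is already in the paving, so the
    intermediate extension (32.2) coincides with I itself.
    """
    values = [I(a) for a in paving if c <= a]
    return min(values, default=math.inf)     # inf of the empty family is +infinity


all_subsets = [frozenset(c) for r in range(len(F) + 1)
               for c in combinations(sorted(F), r)]

extends_I = all(outer(a) == I(a) for a in paving)
increasing = all(outer(c) <= outer(d)
                 for c in all_subsets for d in all_subsets if c <= d)
strongly_subadditive = all(outer(x | y) + outer(x & y) <= outer(x) + outer(y)
                           for x in all_subsets for y in all_subsets)
print(extends_I, increasing, strongly_subadditive, outer(frozenset({3})))
```

The point 3 lies in no element of the paving, so its outer capacity is $+\infty$, in accordance with the convention $\inf \emptyset = +\infty$.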
Proof: We start by noting that definition (32.2) gives an extension of $I$ to $\mathcal{F}_\sigma$ and definition (32.3) an extension of $I^*$ to the whole of $\mathcal{P}(F)$. In other words, the definition of $I^*$ is coherent. Clearly $I^*$ is increasing on $\mathcal{P}(F)$.
(1) Let $(A_n)_{n \ge 1}$ be an increasing sequence of elements of $\mathcal{F}_\sigma$ and set $A = \bigcup_n A_n$. Then $I^*(A) = \sup_n I^*(A_n)$.
It obviously suffices to show that $I(B) \le \sup_n I^*(A_n)$ for all $B \in \mathcal{F}$ such that $B \subset A$. Let $(A_{nm})_{m \ge 1}$ be a sequence of elements of $\mathcal{F}$ whose union is $A_n$. Replacing if necessary each $A_{nm}$ by the set $A'_{nm} = A_{1m} \cup A_{2m} \cup \dots \cup A_{nm}$, we can assume that $A_{nm}$ is an increasing function of $n$ for each $m$. Then
$\sup_n I^*(A_n) = \sup_n (\sup_m I(A_{nm})) = \sup_n I(A_{nn})$.
Let $B$ be an element of $\mathcal{F}$ contained in $A$; then $B = \bigcup_n (B \cap A_{nn})$ and, by (32.1),
$I(B) = \sup_n I(B \cap A_{nn}) \le \sup_n I(A_{nn}) = \sup_n I^*(A_n)$.
(2) The function $I^*$ is strongly subadditive on $\mathcal{P}(F)$.
First let $A$ and $B$ be two elements of $\mathcal{F}_\sigma$ and let $(A_n)$, $(B_n)$ be two increasing sequences of elements of $\mathcal{F}$ whose unions are equal respectively to $A$ and $B$. Then the sets $A_n \cap B_n$, $A_n \cup B_n$ belong to $\mathcal{F}$ and
$A \cap B = \bigcup_n (A_n \cap B_n)$, $A \cup B = \bigcup_n (A_n \cup B_n)$;
hence by (1)
$I^*(A \cup B) + I^*(A \cap B) = \lim_n I(A_n \cup B_n) + \lim_n I(A_n \cap B_n) \le \lim_n [I(A_n) + I(B_n)] = I^*(A) + I^*(B)$.
We then consider any two subsets $X$ and $Y$ of $F$. Let $A$ and $B$ denote elements of $\mathcal{F}_\sigma$ containing $X$ and $Y$ respectively. We have
$I^*(X \cup Y) + I^*(X \cap Y) \le I^*(A \cup B) + I^*(A \cap B) \le I^*(A) + I^*(B)$.
Then we get the desired inequality $I^*(X \cup Y) + I^*(X \cap Y) \le I^*(X) + I^*(Y)$ by passing to the infimum over $A$ and $B$.
(3) Let $(X_n)_{n \ge 1}$ be an increasing sequence of subsets of $F$ and set $X = \bigcup_n X_n$. Then $I^*(X) = \sup_n I^*(X_n)$.
If the right-hand side is $+\infty$, (3) is obvious, so we may assume it is finite. Let $h$ be a number $> 0$; we are going to construct an increasing sequence $(Y_n)_{n \ge 1}$ of elements of $\mathcal{F}_\sigma$ such that $Y_n \supset X_n$ and $I^*(Y_n) \le I^*(X_n) + h$.
Then if $Y$ denotes the union of the $Y_n$, which belongs to $\mathcal{F}_\sigma$ and contains $X$, we have by (1)
$I^*(X) \le I^*(Y) = \sup_n I^*(Y_n) \le \sup_n I^*(X_n) + h$
and the theorem will be established, since $h$ is arbitrary.
We begin by choosing for each $n$ a set $Z_n \in \mathcal{F}_\sigma$ such that $X_n \subset Z_n$ and
$I^*(X_n) \le I^*(Z_n) \le I^*(X_n) + \frac{h}{2^n}$.
We then write $Y_n = Z_1 \cup Z_2 \cup \dots \cup Z_n$ and prove inductively that
$I^*(X_n) \le I^*(Y_n) \le I^*(X_n) + h\bigl(1 - \frac{1}{2^n}\bigr)$,
which implies the required property. Since these inequalities are obviously satisfied for $n = 1$, we assume they hold up to step $n$. We have $Y_{n+1} = Y_n \cup Z_{n+1}$; strong subadditivity implies
$I^*(Y_{n+1}) \le I^*(Z_{n+1}) + [I^*(Y_n) - I^*(Y_n \cap Z_{n+1})]$.
Now the bracket is no greater than $h(1 - \frac{1}{2^n})$, since $Y_n \cap Z_{n+1}$ is an element of $\mathcal{F}_\sigma$ lying between $X_n$ and $Y_n$ and we have, by the induction hypothesis,
$I^*(X_n) \le I^*(Y_n \cap Z_{n+1}) \le I^*(Y_n) \le I^*(X_n) + h\bigl(1 - \frac{1}{2^n}\bigr)$.
Consequently
$I^*(Y_{n+1}) \le I^*(Z_{n+1}) + h\bigl(1 - \frac{1}{2^n}\bigr) \le I^*(X_{n+1}) + h\bigl(\frac{1}{2^{n+1}} + 1 - \frac{1}{2^n}\bigr) = I^*(X_{n+1}) + h\bigl(1 - \frac{1}{2^{n+1}}\bigr)$
where the last inequality follows from the definition of $Z_{n+1}$. Hence the induction formula is true at step $n + 1$ and property (3) is established.
It only remains to prove (32.5). This inequality is deduced immediately on passing to the limit from the relation
$I^*\bigl(\bigcup_{i=1}^n Y_i\bigr) + \sum_{i=1}^n I^*(X_i) \le I^*\bigl(\bigcup_{i=1}^n X_i\bigr) + \sum_{i=1}^n I^*(Y_i)$
which is a consequence of property (2) (see (31.1)) (the passage to the limit is justified by property (3)). Finally, it is immediate that condition (32.6) is necessary and sufficient for $I^*$ to be a Choquet $\mathcal{F}$-capacity.
Assertion (c) is somewhat different from the other ones. There is no reason why $I^*$ should be a capacity relative to the same paving $\mathcal{F}$ from which the extension started. For example, we shall apply Theorem 32 with $\mathcal{F}$ being either the paving of compact subsets of a Hausdorff space $E$ or that of open subsets; (c) will be a natural condition in the first case, but not in the second one.
Applications to measure theory
Before pursuing the study of capacities, let us show that Theorems 28 and 32 contain several important and classical results of measure theory.
(a) Measurability of analytic sets
33 Let $(\Omega, \mathcal{A}, P)$ be a complete probability space and let $\mathcal{F}$ be a family of subsets of $\Omega$, contained in $\mathcal{A}$ and closed under $(\cup f, \cap f)$. Let $I$ be the restriction of $P$ to $\mathcal{F}$. Obviously $I^*(A) = P(A)$ for every element $A$ of $\mathcal{F}_\sigma$ and consequently also $I^*(A) = P(A)$ for every element $A$ of $\mathcal{F}_\delta$ by (32.3). Conditions (32.1) and (32.6) are obviously satisfied. Let $A$ be an $\mathcal{F}$-analytic subset of $\Omega$. Choquet's theorem implies that
$\sup_{B \in \mathcal{F}_\delta,\ B \subset A} P(B) = \inf_{C \in \mathcal{F}_\sigma,\ C \supset A} P(C)$.
So there exist an element $B'$ of $\mathcal{F}_{\delta\sigma}$ and an element $C'$ of $\mathcal{F}_{\sigma\delta}$ such that $B' \subset A \subset C'$ and $P(B') = P(C')$. This implies in particular that $A \in \mathcal{A}$. This result was known long before Choquet's theorem (see Saks [1], p. 50).
(b) Carathéodory's extension theorem
34 We return to the hypotheses of 32 and suppose that $I$ is additive on $\mathcal{F}$ (cf. 30) and that (32.6) holds. Let $(A_n)_{n \in \mathbb{N}}$ and $(B_n)_{n \in \mathbb{N}}$ be two decreasing sequences of elements of $\mathcal{F}$. Passing to the limit (according to (32.6)) in the formula
$I(A_n \cup B_n) + I(A_n \cap B_n) = I(A_n) + I(B_n)$
we see that $I^*$ is additive on $\mathcal{F}_\delta$. Then let $A$ and $B$ be two elements of $\mathcal{A}(\mathcal{F})$ and $\varepsilon$ a number $> 0$; we choose two sets $A'$ and $B'$, belonging to $\mathcal{F}_\delta$, contained respectively in $A$ and $B$ and such that
$I^*(A') \ge I^*(A) - \varepsilon$; $I^*(B') \ge I^*(B) - \varepsilon$.
Then we have
$I^*(A \cup B) + I^*(A \cap B) \ge I^*(A' \cup B') + I^*(A' \cap B') = I^*(A') + I^*(B') \ge I^*(A) + I^*(B) - 2\varepsilon$.
Since the function $I^*$ is strongly subadditive and $\varepsilon$ is arbitrary, we see that $I^*$ is additive on $\mathcal{A}(\mathcal{F})$.
Having established this, we consider a Boolean algebra $\mathcal{F}$ and on $\mathcal{F}$ an additive set function $I$, which is positive and finite and satisfies Carathéodory's condition:
(34.1) If $A_n \in \mathcal{F}$ are decreasing and $\bigcap_n A_n = \emptyset$, then $\lim_n I(A_n) = 0$.
Then obviously (32.1) is satisfied. We show that (32.6) is also satisfied. This condition can be stated as follows: if $(G_n)$ is an increasing sequence and $(F_n)$ a decreasing sequence of elements of $\mathcal{F}$ and $\bigcup_n G_n \supset \bigcap_n F_n$, then $\sup_n I(G_n) \ge \inf_n I(F_n)$. Now let $H_n = F_0 \setminus F_n \in \mathcal{F}$; the $H_n$ are increasing and $\bigcup_n (G_n \cup H_n) \supset F_0$. By (32.1), $\sup_n I(G_n \cup H_n) \ge I(F_0)$ and a fortiori $\sup_n (I(G_n) + I(H_n)) \ge I(F_0)$, whence subtracting
$\sup_n I(G_n) \ge \inf_n (I(F_0) - I(H_n)) = \inf_n I(F_n)$.
Hence we can apply 32 and the remark at the beginning of 34 to see that $I^*$ is additive on $\mathcal{A}(\mathcal{F})$ and hence also on $\sigma(\mathcal{F}) \subset \mathcal{A}(\mathcal{F})$. Since $I^*$ passes to the limit along increasing sequences, $I^*$ is a measure on $\sigma(\mathcal{F})$ which extends $I$ and we have established the classical Carathéodory extension theorem from probability theory. Let us establish similarly the other main extension theorem.
(c) Daniell's theorem
35 Let $\Omega$ be a set and $\mathcal{H}$ a linear space of real valued functions on $\Omega$ which is closed under the operation $\wedge$ and contains the constant functions. Let $\lambda$ be a linear functional on $\mathcal{H}$ which is increasing (i.e. positive on the cone $\mathcal{H}^+$ of positive elements of $\mathcal{H}$) and satisfies Daniell's condition
(35.1) for every decreasing sequence $(h_n)$ of elements of $\mathcal{H}^+$ such that $\lim_n h_n = 0$, $\lim_n \lambda(h_n) = 0$.
Let us prove Daniell's theorem: there exists a positive measure $\mu$ on the σ-field $\sigma(\mathcal{H})$ (unique according to I.22, the lattice form of the monotone class theorem) such that $\lambda(h) = \int h\,\mu$ for all $h \in \mathcal{H}$.
Let $F$ be the set $\mathbb{R}_+ \times \Omega$. We associate to every positive function $g$ on $\Omega$ the set $W_g = \{(t, \omega) \in F : t < g(\omega)\}$ of all points of $F$ lying strictly below the graph of $g$. The mapping $g \mapsto W_g$ is injective. We denote by $\mathcal{F}$ the paving on $F$ consisting of all $W_h$, $h \in \mathcal{H}^+$, which is closed under $(\cup f, \cap f)$ according to the relations
$W_f \cup W_g = W_{f \vee g}$, $W_f \cap W_g = W_{f \wedge g}$.
Let $I$ be the set function defined on $\mathcal{F}$ by
$I(W_h) = \lambda(h)$ ($h \in \mathcal{H}^+$).
The relation $h_1 \wedge h_2 + h_1 \vee h_2 = h_1 + h_2$ implies that $I$ is additive on $\mathcal{F}$. Daniell's condition implies (32.1). Hence we may use the extension Theorem 32 to define a set function $I^*$ on $\mathcal{P}(F)$. For every positive function $g$ on $\Omega$, we set $\lambda^*(g) = I^*(W_g)$.
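The lattice relations between the sets $W_g$ can be checked on a discretised model. The sketch below is an illustration under assumptions that are not in the text: $\Omega$ has three points, the functions take integer values in $\{0, \dots, 4\}$, and the height variable $t$ is restricted to the integer levels $0, \dots, 4$ instead of $\mathbb{R}_+$.

```python
from itertools import product

omega = (0, 1, 2)      # hypothetical three-point space
levels = range(5)      # discretised heights t = 0, ..., 4 (an assumption of this sketch)


def W(g):
    """Discrete analogue of W_g: points (t, w) lying strictly below the graph of g."""
    return frozenset((t, w) for w in omega for t in levels if t < g[w])


def join(f, g):
    """Pointwise maximum f v g."""
    return {w: max(f[w], g[w]) for w in omega}


def meet(f, g):
    """Pointwise minimum f ^ g."""
    return {w: min(f[w], g[w]) for w in omega}


# All functions from omega to {0, ..., 4}.
functions = [dict(zip(omega, vals)) for vals in product(range(5), repeat=len(omega))]

lattice_ok = all(W(f) | W(g) == W(join(f, g)) and W(f) & W(g) == W(meet(f, g))
                 for f in functions for g in functions)
sum_ok = all(meet(f, g)[w] + join(f, g)[w] == f[w] + g[w]
             for f in functions for g in functions for w in omega)
injective = len({W(g) for g in functions}) == len(functions)
print(lattice_ok, sum_ok, injective)
```

The identity checked by `sum_ok` is the pointwise relation $h_1 \wedge h_2 + h_1 \vee h_2 = h_1 + h_2$ that makes $I$ additive on the paving of the sets $W_h$.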
We show that $I^*$ satisfies (32.6). The verification reduces to that of the following statement: let $(f_n)$ be a decreasing sequence of elements of $\mathcal{H}^+$ and $(g_n)$ an increasing sequence of elements of $\mathcal{H}^+$ such that $\sup_n g_n \ge \inf_n f_n$. Then $\lambda^*(\sup_n g_n) \ge \inf_n \lambda(f_n)$, where $\lambda^*(\sup_n g_n) = \sup_n \lambda(g_n)$ by (32.4).
Let $h_n = f_0 - f_n$. These functions increase with $n$ and belong to $\mathcal{H}^+$. The relation $\sup_n (g_n + h_n) \ge f_0$ implies $\lambda^*(\sup_n (g_n + h_n)) \ge \lambda(f_0)$ and, by (32.4), $\sup_n \lambda(g_n + h_n) \ge \lambda(f_0)$. Now $\sup_n \lambda(g_n + h_n) = \sup_n (\lambda(g_n) + \lambda(h_n)) = \sup_n \lambda(g_n) + \sup_n \lambda(h_n)$. Hence $\sup_n \lambda(g_n) \ge \lambda(f_0) - \sup_n \lambda(h_n) = \inf_n \lambda(f_n)$, the required result.
Finally, $\lambda^*$ is positively homogeneous: if $g$ is a function $\ge 0$ on $\Omega$ and $a$ is a real $\ge 0$, then $\lambda^*(ag) = a\lambda^*(g)$. This property is indeed true for $g \in \mathcal{H}^+$ and is clearly preserved by the extension operations (32.2) and (32.3).
We show that $W_{aI_A}$ belongs to $\mathcal{A}(\mathcal{F})$ for all $A \in \sigma(\mathcal{H})$ and all $a \ge 0$. Let $\mathcal{G}$ denote the family of subsets $A$ of $\Omega$ such that $W_{aI_A}$ and $W_{aI_{A^c}}$ belong to $\mathcal{A}(\mathcal{F})$ for all $a \ge 0$: this is a σ-field since $\mathcal{A}(\mathcal{F})$ is closed under $(\cup c, \cap c)$. Hence to prove that $\sigma(\mathcal{H})$ is included in $\mathcal{G}$ we need only show that $\mathcal{G}$ contains all sets of the form $\{h > b\}$, with $h \in \mathcal{H}$, $b \in \mathbb{R}$. Since $\{h > b\} = \{(h - b)^+ > 0\}$, it suffices to show that $\{h > 0\}$ belongs to $\mathcal{G}$ for all $h \in \mathcal{H}^+$, and this follows from
$W_{aI_{\{h > 0\}}} = \bigcup_n W_{a((nh) \wedge 1)}$, $W_{aI_{\{h = 0\}}} = \bigcap_n W_{a(1 - nh)^+}$.
Then, for all $A \in \sigma(\mathcal{H})$, set
$\mu(A) = I^*(W_{I_A}) = \lambda^*(I_A)$.
This set function is a bounded positive measure on $(\Omega, \sigma(\mathcal{H}))$: the additivity of $\mu$ follows from that of $I^*$ on $\mathcal{A}(\mathcal{F})$ (34) and the σ-additivity from (32.4). We finally show that $\lambda(h) = \int h\,\mu$ for all $h \in \mathcal{H}^+$ and hence for all $h \in \mathcal{H}$. By (32.4), it suffices to show that $\lambda^*(g) = \int g\,\mu$ when $g$ is an elementary function $\sum_i a_i I_{A_i}$ where the $A_i$ are disjoint elements of $\sigma(\mathcal{H})$ and the $a_i$ are reals $> 0$. But under these conditions
$\lambda^*(g) = \sum_i \lambda^*(a_i I_{A_i}) = \sum_i a_i \lambda^*(I_{A_i}) = \int g\,\mu$
since $\lambda^*$ is positively homogeneous and additive on $\mathcal{A}(\mathcal{F})$.
36 (d) The representation theorem of F. Riesz
We recall how Daniell's theorem implies the "F. Riesz representation theorem". Let $E$ be a compact metric space, $\mathcal{H}$ the space $C(E)$ and $\lambda$ an increasing linear functional on $\mathcal{H}$. By Dini's lemma (of which we shall see a more elaborate form in Chapter X (1)) every decreasing sequence $(h_n)$ of continuous functions on $E$, which converges pointwise to 0, converges uniformly to 0 and condition (35.1) is therefore satisfied. Thus, since $\mathcal{B}(E) = \sigma(\mathcal{H})$:
THEOREM. Every increasing linear functional $\lambda$ on $C(E)$ has a unique representation
(36.1) $\lambda(f) = \int f\,\mu$ ($f \in C(E)$)
where $\mu$ is a bounded positive measure on $E$.
(e) Regularity of measures
The following theorem forms a transition between capacities and the results of paragraph 3.
37 THEOREM. Let $E$ be a compact metrizable space. For every bounded positive measure $\mu$ on $E$ and every Borel (or more generally $\mu$-measurable) set $B \subset E$,
(37.1) $\mu(B) = \sup_K \mu(K)$
where $K$ runs through compact subsets of $E$ contained in $B$.
Proof: Let $\mu^*$ be the outer measure associated with $\mu$ (2), which is a capacity relative to $\mathcal{K}(E)$. Every $\mathcal{K}$-analytic set $B$ is $\mu^*$-capacitable and hence satisfies (37.1). On the other hand, $\mathcal{B}(E) \subset \mathcal{A}(\mathcal{K})$ (13). If $B$ is $\mu$-measurable, one can find two Borel sets $B'$ and $B''$ such that $B' \subset B \subset B''$ and $\mu(B'' \setminus B') = 0$; (37.1) for $B$ then follows from the same relation applied to the Borel set $B'$.
(1) First edition, no. X.b. (2) By definition, $\mu^*(A) = \inf_{B \supset A,\ B \in \mathcal{B}(E)} \mu(B)$ for every subset $A$ of $E$.
38 THEOREM. The same statement is true for every space $E$ homeomorphic to a universally measurable subspace of a compact metric space, and hence for every Lusin (in particular Polish), Souslin, or cosouslin metrizable space.
Proof: The first sentence is obvious from 37. The case of Lusin spaces follows from their definition (16), that of Polish spaces from 17 and that of Souslin and cosouslin spaces from 16 and 33.
We shall see in no. 69 that this important result is also valid for certain non-metrizable spaces, which play an important role in analysis. We note a consequence: although the σ-field $\mathcal{B}(E)$ is not necessarily a Blackwell σ-field (if $E$ is cosouslin, for example), every measure on $E$ is carried by a countable union of metrizable compact subsets and hence by a Blackwell subspace. Hence results such as 26 can be extended, up to sets of measure zero.
Right-continuous capacities
To apply Theorem 32, it is necessary to verify hypotheses (32.1) and (32.6). That is why the usual capacities are constructed, either from a left-continuous function on open sets or from a right-continuous function on compact sets. We work with a Hausdorff space $F$ and denote as usual by $\mathcal{G}$ the paving of open subsets of $F$ and by $\mathcal{K}$ that of compact subsets of $F$.
39 DEFINITION. Let $I$ be an increasing positive function defined on $\mathcal{G}$. $I$ is said to be left-continuous if
(39.1) for every open set $U$ and every real number $a < I(U)$, there exists a compact set $K \subset U$ such that $I(V) > a$ for every open set $V$ containing $K$.
40 THEOREM. Let $I$ be a function on $\mathcal{G}$ which is positive, increasing, left-continuous and strongly subadditive. Then $I$ satisfies (32.1) relative to the paving $\mathcal{F} = \mathcal{G}$ and $I^*$ is a capacity relative to $\mathcal{K}$.
Proof: Let $(U_n)$ be an increasing sequence of open sets with union $U$ and let $a < I(U)$. We choose a compact set $K \subset U$ satisfying (39.1). $K$ is contained in one of the $U_n$ and hence $\sup_n I(U_n) > a$ and finally $\sup_n I(U_n) \ge I(U)$, and (32.1) follows.
Let $(K_n)$ be a decreasing sequence of compact sets with intersection $K$. By definition of $I^*(K)$ there exists, for every number $b > I^*(K)$, an open set $U$ containing $K$ such that $I(U) < b$. Then one of the $K_n$ is contained in $U$ and hence $\inf_n I^*(K_n) < b$, whence $\inf_n I^*(K_n) \le I^*(K)$ and we have equality.
REMARK. The conclusion of Theorem 32 is capacitability of every $\mathcal{K}$-analytic set relative to $I^*$. Right-continuity implies a better result: if $F$ is metrizable and separable, every Souslin set $S \subset F$ is capacitable. Imbedding $F$ in a compact metrizable space $C$, set indeed $L(G) = I(G \cap F)$ for every open set $G$ of $C$; $L$ is right-continuous and strongly subadditive and $L^*$ and $I^*$ coincide on subsets of $F$. On the other hand, $S$ is $\mathcal{K}(C)$-analytic in $C$ (18).
If $F$ is only assumed to be Hausdorff, the Souslin sets in Bourbaki's sense (67) are capacitable.
41 DEFINITION. Let $J$ be a positive increasing function defined on $\mathcal{K}$. $J$ is said to be right-continuous if
(41.1) for every compact set $K$ and every real number $a > J(K)$, there exists an open set $V \supset K$ such that $J(L) < a$ for every compact set $L \subset V$.
42 THEOREM. (a) Let $J$ be a function on $\mathcal{K}$ which is positive, increasing, strongly subadditive and right-continuous. Then $J$ satisfies (32.1) relative to the paving $\mathcal{F} = \mathcal{K}$.
(b) For every open set $G$ define
(42.1) $J^+(G) = \sup_{K \in \mathcal{K},\ K \subset G} J(K)$
and for every subset $A$ of $F$
(42.2) $J^+(A) = \inf_{G \in \mathcal{G},\ G \supset A} J^+(G)$.
Then $J^+|_{\mathcal{K}} = J$, $J^+|_{\mathcal{G}}$ is a function of open sets satisfying the hypothesis of 40, so that $J^+$ is a capacity relative to $\mathcal{K}$.
Proof: To make clearer the relation with the preceding results, denote by $I$ the function of open sets defined in (42.1): then $J^+ = I$ on $\mathcal{G}$ and on the whole of $\mathcal{P}(F)$ $J^+$ is the "outer capacity" $I^*$ relative to the paving $\mathcal{G}$. (This isn't the same as the outer capacity $J^*$ relative to the paving $\mathcal{K}$, whose definition uses $\mathcal{K}_\sigma$ sets instead of open sets.)
The right-continuity of $J$ means that $I^*|_{\mathcal{K}} = J$; on the other hand, the same arguments applied to $I$ show that $I$ is left-continuous on open sets. We show that $I$ is strongly subadditive on $\mathcal{G}$, which implies that 40 applies to $I^* = J^+$, from which the remainder of the statement follows at once.
LEMMA. Let $U$ and $V$ be two open sets and $K$ a compact set contained in $U \cup V$. There then exist two compact sets $L \subset U$, $M \subset V$ such that $K = L \cup M$.
Proof: $K \setminus U$ and $K \setminus V$ are two disjoint compact sets in a Hausdorff space; hence they can be enclosed in two disjoint open sets $P$ and $Q$, and we just set $L = K \setminus P$ and $M = K \setminus Q$.
Having established this lemma, we take two numbers $a < I(U \cap V)$ and $b < I(U \cup V)$ and choose a compact set $H \subset U \cap V$ such that $J(H) > a$ and a compact set $K \subset U \cup V$ such that $J(K) > b$. Replacing $K$ by $H \cup K$ if necessary, it can be assumed that $H \subset K$. By the lemma, we can write $K = L \cup M$ with $L \subset U$ and $M \subset V$ and, replacing them by $H \cup L$, $H \cup M$ if necessary, we can assume that $L$ and $M$ contain $H$. Then we have
$$a + b \le J(H) + J(K) \le J(L \cap M) + J(L \cup M) \le J(L) + J(M) \le I(U) + I(V).$$
Passing to the upper bound over $a$ and $b$, we get $I(U \cap V) + I(U \cup V) \le I(U) + I(V)$, the required inequality.
REMARK. Let us compare the two capacities $J^*$ (defined by means of the $\mathcal{K}_\sigma$) and $J^+$ (by means of the open sets). They are equal on $\mathcal{K}$ and hence on $\mathcal{K}_\sigma$, and for arbitrary $A$ we have $J^*(A) = \inf J^+(B)$, where $B \in \mathcal{K}_\sigma$ contains $A$. It follows that $J^*(A) \ge J^+(A)$. If $A$ is $\mathcal{K}$-analytic, then $J^*(A) = J^+(A)$ by Choquet's Theorem. Note that if $F$ is metrizable and separable and $A$ is Souslin in $F$ and not contained in any $\mathcal{K}_\sigma$, then $A$ is capacitable relative to $J^+$ but not necessarily relative to $J^*$ ($J^*(A) = +\infty$).
The capacity $J^+ = I^*$ is computed "from outside, using open sets". The capacitability theorem applied to $J^+$ thus tells us that both ways of computing the capacity, from inside and from outside, are equivalent. This was already so in 32, but approximation by $\mathcal{K}_\sigma$ sets is less convenient than approximation by open sets.
Theorem 28 itself may be given an analogous interpretation. Let $\mathcal{M}$ be the monotone class generated by $\mathcal{F}$, and for every set $B$ define
(27.4) $I^+(B) = \inf I(M)$, where $M \in \mathcal{M}$ contains $B$.
Then $I^+$ is a capacity and coincides with $I$ on $\mathcal{M}$, and 28 means that it coincides with $I$ on $\mathcal{A}(\mathcal{F})$. Thus all results on capacitability are theorems on approximation both from outside and from inside.

Some applications of the theory of capacities
We have already given some applications of the capacitability theorem to measure theory. The following ones are of a different kind. We begin with the second proof of the separation theorem for analytic sets, mentioned in no. 14.
43 Recall the notation: $\mathcal{F}$ is a semicompact paving on $F$ which can be assumed to be closed under $(\cup f, \cap c)$ and $\mathcal{C}$ is the closure of $\mathcal{F}$ under $(\cup c, \cap c)$. Let $\Delta$ be the diagonal of $F \times F$. For every subset $W$ of $F \times F$ we set
$I(W) = 1$ if every element of the product paving $\mathcal{C} \times \mathcal{C}$ which contains $W$ intersects the diagonal,
$I(W) = 0$ otherwise.
Let us prove that $I$ is a capacity relative to the product paving $\mathcal{F} \times \mathcal{F}$. It is obviously increasing. We show that if $W \subset F \times F$ is the union of an increasing sequence $(W_n)$, then $I(W) = \lim_n I(W_n)$. It suffices to treat the case where $I(W_n) = 0$ for all $n$, which means there exist elements $C_n \times D_n$ of $\mathcal{C} \times \mathcal{C}$ containing $W_n$ such that $C_n \cap D_n = \emptyset$. Replacing $C_n$ by $\bigcap_{m \ge n} C_m$ and $D_n$ by $\bigcap_{m \ge n} D_m$ if necessary, we can assume that the sequences $(C_n)$ and $(D_n)$ are increasing. Then $(\bigcup_n C_n) \times (\bigcup_n D_n)$ belongs to $\mathcal{C} \times \mathcal{C}$, contains $W$ and does not meet the diagonal, hence $I(W) = 0$.
Finally, consider a decreasing sequence $W_n = K_n \times L_n$ of $\mathcal{F} \times \mathcal{F}$ and its intersection $W = K \times L$; let us prove that $I(W) = \inf_n I(W_n)$. It suffices to treat the case where $I(W_n) = 1$ for all $n$, that is, where $K_n \cap L_n \neq \emptyset$. Since the paving $\mathcal{F}$ is semicompact, we have $K \cap L \neq \emptyset$ and $I(W) = 1$.
Then let $A$ and $B$ be two non-separable $\mathcal{F}$-analytic sets. This means that $I(A \times B) = 1$ (if $A \times B \neq \emptyset$; we leave to the reader the case where $A$ or $B$ is empty). By the capacitability theorem, since $A \times B$ is $(\mathcal{F} \times \mathcal{F})$-analytic (9), there exists an element $K \times L$ of $\mathcal{F} \times \mathcal{F}$ contained in $A \times B$ such that $I(K \times L) = 1$. But then $K \cap L \neq \emptyset$, hence $A \cap B \neq \emptyset$ and the theorem is established (1).
We pass to a result which has important probabilistic applications.
44 Let $(\Omega, \mathcal{F})$ be a measurable space and $A$ a subset of $\mathbb{R}_+ \times \Omega$. We write, for all $\omega \in \Omega$,
(44.1) $D_A(\omega) = \inf\{t \in \mathbb{R}_+ : (t, \omega) \in A\}$
(with the usual convention that $\inf \emptyset = +\infty$); the function $D_A$ is called the debut of $A$.
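As a purely illustrative sketch (not part of the original text), the debut of (44.1) can be computed directly when the random set is finite. The three-point sample space, the set `A` and the function name `debut` below are hypothetical choices made only for this example.

```python
import math

def debut(A, omega):
    """Return D_A(omega) = inf{t : (t, omega) in A}, with inf of the empty set = +inf."""
    times = [t for (t, w) in A if w == omega]
    return min(times) if times else math.inf

# Hypothetical finite subset A of R+ x Omega, with Omega = {"a", "b", "c"}.
A = {(2.0, "a"), (0.5, "a"), (3.0, "b")}

# The section of A above "c" is empty, so its debut is +infinity.
debuts = {w: debut(A, w) for w in ("a", "b", "c")}
```

In this finite setting the infimum is attained, so the graph of the debut is contained in `A` wherever the debut is finite; this is the situation the proof of the next theorem reaches for sets with compact sections.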
THEOREM. Suppose that $A$ belongs to the σ-field $\mathcal{B}(\mathbb{R}_+) \times \mathcal{F}$ (or, more generally, that $A$ is $(\mathcal{B}(\mathbb{R}_+) \times \mathcal{F})$-analytic).
(a) The debut $D_A$ is measurable relative to the σ-field $\hat{\mathcal{F}}$, the universal completion of $\mathcal{F}$.
(b) Let $P$ be a probability law on $(\Omega, \mathcal{F})$. There exists an $\mathcal{F}$-measurable random variable $T$ with values in $[0, \infty]$ such that
(44.2) $(T(\omega), \omega) \in A$ for every $\omega$ such that $T(\omega) < \infty$,
(44.3) $P\{T < \infty\} = P\{D_A < \infty\}$ (2).
Proof: Let $r > 0$. The set $\{D_A < r\}$ is the projection on $\Omega$ of $\{(t, \omega) : t < r, (t, \omega) \in A\}$. By 13, $\{D_A < r\}$ is $\mathcal{F}$-analytic. By 33, it belongs to every completed σ-field of $\mathcal{F}$, whence assertion (a).
Associate with $P$ the set function $P^*$ as in 32 ($P^*$ is the classical "outer probability" of Carathéodory): this is an $\mathcal{F}$-capacity, equal to $P$ on $\mathcal{F}$ and even on the completed σ-field of $\mathcal{F}$ (II.32, (b)). Let $\pi$ be the projection of $\mathbb{R}_+ \times \Omega$ onto $\Omega$ and let $I$ be the set function $A \mapsto P^*[\pi(A)]$: $I$ is a capacity relative to the paving $\mathcal{H}$, the closure of $\mathcal{K}(\mathbb{R}_+) \times \mathcal{F}$ under $(\cup f, \cap c)$ (28, Lemma 2). By 13, every element of the product σ-field $\mathcal{B}(\mathbb{R}_+) \times \mathcal{F}$ (or, more generally, of $\mathcal{A}(\mathcal{B}(\mathbb{R}_+) \times \mathcal{F})$) is $\mathcal{H}$-analytic and the capacitability theorem 28 implies the existence, for all $\varepsilon > 0$, of an element $B$ of $\mathcal{H}_\delta = \mathcal{H}$ contained in $A$ and such that $I(B) > I(A) - \varepsilon$. This can also be written $P\{D_B < \infty\} > P\{D_A < \infty\} - \varepsilon$. Since for all $\omega \in \Omega$ the set $B(\omega) = \{t : (t, \omega) \in B\}$ is compact, the graph of $D_B$ in $\mathbb{R}_+ \times \Omega$ is contained in $A$. Then let $S_\varepsilon$ be an $\mathcal{F}$-measurable positive random variable, equal almost everywhere (3) to $D_B$; we write
$T_\varepsilon(\omega) = S_\varepsilon(\omega)$ if $(S_\varepsilon(\omega), \omega) \in A$, $T_\varepsilon(\omega) = +\infty$ otherwise.
(1) For a proof of a deeper theorem along the same lines, see Dellacherie, Séminaire de Probabilités de Strasbourg vol. X, p. 580-582 (Lecture Notes in Math. 511, Springer-Verlag 1976).
(2) With the notation of (b) the cross-section $T$ of $A$ is said to be complete if $T(\omega) < \infty$ for every $\omega$ such that $D_A(\omega) < \infty$. We prove in the appendix a theorem on existence of complete cross-sections (81).
(3) In fact, $D_B$ itself can be shown to be $\mathcal{F}$-measurable.
Then $T_\varepsilon$ satisfies (44.2) and a weaker condition than (44.3) ($P\{T_\varepsilon < \infty\} > P\{D_A < \infty\} - \varepsilon$). Let us say (in this proof only) that, given $C \in \mathcal{B}(\mathbb{R}_+) \times \mathcal{F}$, a positive $\mathcal{F}$-measurable function $S$ such that $(S(\omega), \omega) \in C$ for all $\omega \in \{S < \infty\}$ is a section of $C$ with remainder $P\{S = \infty, D_C < \infty\}$. By the above, $C$ has a section with remainder $< \varepsilon$ for all $\varepsilon > 0$. We construct sections of $A$ inductively as follows. $T_0 = +\infty$ identically. If $T_n$ has been defined, we construct a section $S_n$ of $A_n = A \cap \{(t, \omega) : T_n(\omega) = \infty\}$ such that $P\{S_n < \infty\} \ge \tfrac{1}{2} P\{D_{A_n} < \infty\}$, and we set $T_{n+1} = T_n \wedge S_n$, a section of $A$ which "extends" $T_n$. At each step, the remainder is at most half of the preceding one. So $T = \inf_n T_n$ is a section with remainder zero, which therefore satisfies (44.2) and (44.3).
45 At the cost of minor modifications, this theorem is still valid if $(\mathbb{R}_+, \mathcal{B}(\mathbb{R}_+))$ is replaced by a Souslin measurable space $(S, \mathcal{S})$, which, from the measure theoretic point of view, is not distinguishable from an analytic subset of $\mathbb{R}$ (20): (a) is no longer meaningful but the projection $\pi(A)$ of $A$ onto $\Omega$ still belongs to $\hat{\mathcal{F}}$; (b) remains true provided $P\{D_A < \infty\}$ is replaced by $P[\pi(A)]$ and $[0, \infty]$ by $S \cup \{\infty\}$, where "$\infty$" is a point added to $S$. By way of illustration, here is a theorem on lifting measures, which we shall use later.
THEOREM. Let $(S, \mathcal{S})$ be a Souslin measurable space, $(E, \mathcal{E})$ be a Hausdorff separable measurable space and $f$ be a measurable mapping from $S$ onto $E$. For every probability law $\mu$ on $E$, there exists a probability law $\lambda$ on $S$ such that $\mu = f(\lambda)$.
Proof: We apply 44 with $P = \mu$. The graph $A$ of $f$ in $S \times E$ belongs to $\mathcal{S} \times \mathcal{E}$. Hence there exists a measurable mapping $g$ defined on an element $B$ of $\mathcal{E}$ such that $\mu(B) = 1$, with values in $S$, such that $f(g(y)) = y$ for all $y \in B$. Then it suffices to take $\lambda$ to be the image law $g(\mu)$. Note that the hypotheses imply that the measurable space $E$ is Souslin.
3. BOUNDED RADON MEASURES
If abstract measure theory - which is the basis of probability theory - is compared to the theory of Radon measures, as developed for example in Bourbaki's book on integration, it may seem that the latter is superior to the former on four counts. These are, by order of decreasing importance:
- the existence of a good theorem on inverse (projective) limits of measures,
- the existence of some reasonable topologies (vague, strict) on the space of measures,
- the possibility of passing to the limit along uncountable increasing families of l.s.c. functions,
- the removal of certain σ-finiteness restrictions.
The "importance" is here estimated from the probabilists' point of view. We leave aside the last point and examine the other three. We also prove, without many details, existence theorems for conditional laws and disintegration of measures. We follow Bourbaki quite closely throughout this paragraph: see Bourbaki [5].

Radon measures and filtering families of semicontinuous functions
46 DEFINITION. Let $E$ be a Hausdorff topological space. A measure $\mu$ on $E$ is called a Radon measure if
(1) every point of $E$ has an open neighbourhood $V$ such that $\mu(V) < +\infty$;
(2) for all $A \in \mathcal{B}(E)$,
(46.1) $\mu(A) = \sup_{K \in \mathcal{K}(E),\, K \subset A} \mu(K)$.
Properties (1) and (2) are called respectively local boundedness and inner regularity or tightness of the measure $\mu$. A signed measure is said to be Radon if it is the difference of two positive Radon measures. Here we limit ourselves to positive measures and to bounded measures except in 47.
The notion of a (bounded) Radon measure has a counterpart in abstract theory: the notion of inner regular measure with respect to a compact paving. This notion seems to have some applications, but not of great importance. The first edition of this book can be consulted or the notes [1] of Pfanzagl-Pierlo.
47 REMARKS. (a) Property (1) implies that $\mu(K) < \infty$ for every compact set $K$. Conversely, on a locally compact space, this property implies (1). Every Radon measure on a compact space is bounded.
(b) Every element of the completed σ-field $\mathcal{B}_\mu$ of $\mathcal{B}(E)$ relative to $\mu$ contains a Borel set which differs from it only by a $\mu$-negligible set. Hence we also have (46.1) for all $A \in \mathcal{B}_\mu$.
(c) If $\mu$ is bounded, the approximation (46.1) applied to the complement of $A \in \mathcal{B}_\mu$ gives (1)
(47.1) $\mu(A) = \inf_{G \in \mathcal{G}(E),\, G \supset A} \mu(G)$.
Hence the measure is also "outer regular". More generally, when $\mu$ is not bounded, this is valid for all $A$ contained in an open set $U$ of finite measure (pass to the complement in $U$ instead of $E$) or even in the union of a sequence of open sets of finite measure $U_n$ (given $\varepsilon > 0$, choose an open set $G_n \supset A \cap U_n$ such that $\mu(G_n) \le \mu(A \cap U_n) + \varepsilon 2^{-n}$; then $\mu(\bigcup_n G_n) \le \mu(A) + 2\varepsilon$). If $E$ itself is the union of such a sequence, (47.1) holds for all $A \in \mathcal{B}_\mu$; this is the case when $E$ has a countable base.
(1) Recall that $\mathcal{G}(E)$ is the paving of open subsets of $E$.
We henceforth limit ourselves to bounded measures.
48 THEOREM. Let $\mu$ be a bounded Radon measure. For every positive Borel function $f$ (more generally for $f$ measurable relative to the completed σ-field $\mathcal{B}_\mu$), we have
(48.1) $\int f\,\mu = \sup_h \int h\,\mu$, where $h$ is u.s.c. and bounded with compact support and $0 \le h \le f$,
(48.2) $\int f\,\mu = \inf_g \int g\,\mu$, where $g$ is l.s.c. and $g \ge f$.
Proof: First formula. Replacing $f$ by $f \wedge n$ if necessary, we can suppose that $f$ is bounded. There then exists a measurable function $k$ taking only a finite number of values such that $k \le f$ and $\mu(k) \ge \mu(f) - \varepsilon$ (Lebesgue approximation). We write $k$ as a finite sum $\sum_n a_n I_{A_n}$, choose for each $n$ a compact set $K_n \subset A_n$ such that $\mu(K_n) \ge \mu(A_n) - \varepsilon 2^{-n}/a_n$ and take $h = \sum_n a_n I_{K_n}$.
Second formula. We choose an elementary function $j \ge f$ such that $\mu(j) \le \mu(f) + \varepsilon$ (Lebesgue approximation from above). We write $j$ as a countable sum $\sum_n a_n I_{A_n}$, choose for each $n$ an open set $G_n \supset A_n$ such that $\mu(G_n) \le \mu(A_n) + \varepsilon 2^{-n}/a_n$ and set $g = \sum_n a_n I_{G_n}$.
49 THEOREM. Let $\mu$ be a bounded Radon measure on a Hausdorff space $E$.
(a) Let $(f_i)_{i \in I}$ be a family of l.s.c. positive functions, filtering to the right, with upper envelope $f$. Then $\mu(f) = \sup_i \mu(f_i)$.
(b) Let $(g_i)_{i \in I}$ be a family of u.s.c. positive functions, filtering to the left, with lower envelope $g$. If there exists an index $i$ such that $\mu(g_i) < +\infty$, then $\mu(g) = \inf_i \mu(g_i)$.
(c) If $E$ is completely regular (1), then for every positive l.s.c. function $f$ we have $\mu(f) = \sup_c \mu(c)$, where $c$ runs through the set $C_f$ of positive continuous functions bounded above by $f$.
Proof: We first prove the particular case of (a) that concerns open sets: if an open set $G$ is the union of a family of open sets $G_i$ which is filtering to the right, then $\mu(G) = \sup_i \mu(G_i)$. This is obvious, as $\mu$ is regular and every compact set contained in $G$ is contained in some $G_i$. Taking complements, we get the form of (b) for closed sets.
With every positive function $h$ we associate the Lebesgue approximation truncated at $2^n$:
$$h^{(n)} = 2^{-n} \sum_{k=1}^{2^{2n}} I_{\{h > k 2^{-n}\}}.$$
If $h$ is l.s.c., $h^{(n)}$ is a finite linear combination of indicators of open sets. Using the preceding result,
$$\mu(f) = \sup_n \mu(f^{(n)}) = \sup_n \mu(\sup_i f_i^{(n)}) = \sup_n \sup_i \mu(f_i^{(n)}) = \sup_i \sup_n \mu(f_i^{(n)}) = \sup_i \mu(f_i).$$
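The truncated Lebesgue approximation used in this step can be checked numerically at a single point. The sketch below is only an illustration under our own conventions: `dyadic_approx` is a hypothetical helper name, and it evaluates the approximation of order n at a point where the function takes a given value, by counting the indices k between 1 and 4^n with k 2^{-n} strictly below that value.

```python
import math

def dyadic_approx(h_value, n):
    """Value of the order-n truncated approximation at a point where h equals h_value >= 0."""
    # Number of k in {1, ..., 2^(2n)} with k * 2^-n < h_value (strict inequality).
    count = min(4 ** n, max(0, math.ceil(h_value * 2 ** n) - 1))
    return count / 2 ** n

# The approximations increase with n and stay below the value being approximated.
values = [dyadic_approx(0.3, n) for n in range(1, 12)]

# For a large value the approximation is capped at 2^n (here n = 1, so the cap is 2).
capped = dyadic_approx(10.0, 1)
```

The monotonicity in n is what allows the two suprema in the chain of equalities to be interchanged.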
We pass on to (b): since the family contains an integrable function, one may reduce to the case where all the $g_i$ are bounded above by an integrable function $h$. Replacing the $g_i$ by the $g_i I_{\{g_i \le N\}}$, where $N$ is chosen large enough so that
$$\int_{\{h > N\}} h\,\mu < \varepsilon,$$
one may still reduce to the case where the $g_i$ are bounded by $N$, which finally reduces to (a) by considering the l.s.c. functions $N - g_i$ (here the extension to non-bounded Radon measures is less obvious).
Finally, since $E$ is completely regular, $C_f$ is a family filtering to the right whose upper envelope is $f$ (Bourbaki, Gen. Top., Chap. IX, §2, Proposition 5). Then we deduce (c) from (a).
(1) For the definition and properties of completely regular spaces, see Bourbaki, Top. Gén. IX, §1, nos 5 to 7.
50 REMARK. In particular, by (a) the union of all the $\mu$-negligible open sets is a $\mu$-negligible open set, whose complement $S_\mu$ is the smallest closed set carrying $\mu$. $S_\mu$ is called the support of $\mu$.
We have deduced properties (a) and (b) from the regularity of $\mu$. We shall see later (in the "digression" of nos. 63-68) another proof, using a property of the space $E$, not of the measure $\mu$. There is nothing very deep in all this!

Tightness and inverse limits
The fundamental theorem on inverse limits, used in the construction of stochastic processes, is Kolmogorov's Theorem (in fact, Kolmogorov rediscovered a much earlier result of Daniell). We shall see how more recent results follow very easily from it. Kolmogorov's proof using Carathéodory's extension theorem is quite classical (see the first edition of this book, no. III.31 or Neveu [1], Theorem III.3.1, p. 78). We rather give a quicker proof (also classical) for separable metrizable spaces.
51 We use the following notation: the $E_n$ ($n \in \mathbb{N}$) are separable metrizable spaces; $F$ is their product, which is also separable and metrizable; $F_n$ is the product $E_0 \times E_1 \times \dots \times E_n$, $p_n$ the canonical projection of $F_{n+1}$ onto $F_n$ and $q_n$ the projection of $F$ onto $F_n$.
THEOREM. For each $n$, let $\mu_n$ be an inner regular probability law on $F_n$. If the family $(\mu_n)$ satisfies the compatibility conditions $\mu_n = p_n(\mu_{n+1})$, there exists on $F$ one and only one probability law $\mu$ such that $\mu_n = q_n(\mu)$ for all $n$, and $\mu$ is tight.
Proof: Uniqueness. If $d_n$ is a metric on $E_n$ defining the topology and bounded by 1, the topology on $F$ is defined by the metric $d((x_n), (y_n)) = \sum_n 2^{-n} d_n(x_n, y_n)$. The balls relative to $d$ are measurable relative to the product σ-field $\prod_n \mathcal{B}(E_n)$ on $F$, which is therefore identical to $\mathcal{B}(F)$. Let $\mathcal{A}$ be the Boolean algebra consisting of the subsets of the form $q_n^{-1}(A_n)$ ($n \in \mathbb{N}$, $A_n \in \mathcal{B}(F_n)$). The condition $q_n(\mu) = \mu_n$ determines $\mu$ on $\mathcal{A}$; since $\mathcal{A}$ generates the product σ-field, the uniqueness follows from I.20.
Existence. Suppose first that the $E_n$ (and hence $F$) are compact. Let $C_f(F)$ be the subspace of $C(F)$ consisting of the continuous functions $g$ of the form $g_n \circ q_n$ ($n \in \mathbb{N}$, $g_n \in C(F_n)$): $C_f$ contains the constants and is closed under the operation $\wedge$. The σ-field it generates contains $q_n^{-1}(\mathcal{B}(F_n))$ for all $n$ and hence is equal to $\mathcal{B}(F) = \prod_n \mathcal{B}(E_n)$. The compatibility condition between the $\mu_n$ enables us to set $I(g) = \int g_n\,\mu_n$, independently of the representation $g = g_n \circ q_n$ of $g \in C_f$. $I$ obviously is an increasing linear functional on $C_f$ such that $I(1) = 1$. By Dini's Lemma, $I$ satisfies Daniell's condition (35.1): hence there exists a unique measure $\mu$ on $F$ such that $I(f) = \int f\,\mu$ for all $f \in C_f$ and $\mu_n(g_n) = I(g_n \circ q_n) = \mu(g_n \circ q_n)$ for all $g_n \in C(F_n)$. Hence $\mu_n = q_n(\mu)$ for all $n$ (I.23).
In order to pass to the general case, we imbed each $E_n$ in a compact metrizable space $\bar{E}_n$ and introduce the corresponding notation $\bar{F}_n$, $\bar{F}$, $\bar{p}_n$, $\bar{q}_n$. Each $\mu_n$ can be identified with its image $\bar{\mu}_n$ under the injection of $F_n$ into $\bar{F}_n$, a measure on $\bar{F}_n$ (1) carried by $F_n$. By the above special case, there exists on $\bar{F}$ a measure $\bar{\mu}$ such that $\bar{\mu}_n = \bar{q}_n(\bar{\mu})$ for all $n$ and the problem reduces to showing that $\bar{\mu}$ is carried by $F$. Now let $\varepsilon > 0$. For each $n$, we choose a compact set $K_n \subset E_n$ such that $\mu_n(E_0 \times E_1 \times \dots \times E_{n-1} \times K_n^c) < \varepsilon 2^{-n}$ (for $n = 0$, $\mu_0(K_0^c) < \varepsilon$); then we have $\bar{\mu}(\prod_{k < n} \bar{E}_k \times K_n^c \times \prod_{k > n} \bar{E}_k) < \varepsilon 2^{-n}$. Consequently, if $K$ is the compact set $\prod_n K_n$ contained in $F$, we have $\bar{\mu}(K^c) \le \sum_n \varepsilon 2^{-n} = 2\varepsilon$ and $\bar{\mu}$ is carried by $F$. The measure $\bar{\mu}$ is tight on $\bar{F}$ (37); since it is carried to within $2\varepsilon$ by a compact subset of $F$, it is immediately verified that it still is tight on $F$.
We deduce from this result the general theorem on the construction of stochastic processes, also due to Kolmogorov. However the usefulness of this theorem is somewhat illusory: when the index set $T$ is uncountable, the σ-field $(\mathcal{B}(E))^T$ is far from being rich enough.
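The compatibility conditions of the theorem above can be made concrete in the simplest compact case, where every factor is the two-point space {0, 1}. The sketch below is only an illustration: the two-state Markov chain, the names `markov_law` and `project`, and the numerical parameters are all assumptions of ours, chosen because the finite-dimensional laws of a Markov chain are a standard example of a compatible family.

```python
from fractions import Fraction
from itertools import product

def project(mu_next):
    """Image of a law on {0,1}^(n+2) under the projection dropping the last coordinate."""
    mu = {}
    for path, mass in mu_next.items():
        mu[path[:-1]] = mu.get(path[:-1], 0) + mass
    return mu

def markov_law(n, init, trans):
    """Law of (X_0, ..., X_n) for a hypothetical two-state Markov chain."""
    law = {}
    for path in product(range(2), repeat=n + 1):
        mass = init[path[0]]
        for a, b in zip(path, path[1:]):
            mass *= trans[a][b]
        law[path] = mass
    return law

# Hypothetical initial law and transition matrix, in exact arithmetic.
init = [Fraction(1, 3), Fraction(2, 3)]
trans = [[Fraction(1, 2), Fraction(1, 2)], [Fraction(1, 4), Fraction(3, 4)]]

laws = [markov_law(n, init, trans) for n in range(5)]

# Compatibility: each law is the image of the next one under the projection.
compatible = all(project(laws[n + 1]) == laws[n] for n in range(4))
```

Exact rational arithmetic is used so that the equality of marginals can be tested without rounding error.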
52 COROLLARY. Let $E$ be a separable metrizable space, $T$ be any index set and $F$ be the product set $E^T$ with the product σ-field $\mathcal{F} = (\mathcal{B}(E))^T$. For every finite subset $U$ of $T$, let $F_U$ denote the (metrizable) space $E^U$ and $q_U$ the projection of $F$ onto $F_U$, and let $\mu_U$ be a tight probability law (2) on $F_U$. There exists a probability law $\mu$ on $(F, \mathcal{F})$ such that $q_U(\mu) = \mu_U$ for every finite $U \subset T$, if and only if the following condition is satisfied:
(52.1) For every pair $(U, V)$ of finite subsets such that $U \subset V$, $\mu_U$ is the image of $\mu_V$ under the projection of $F_V$ onto $F_U$.
The measure $\mu$ then is unique.
Proof: If $T$ is countable, this theorem reduces immediately to the preceding theorem. So assume $T$ is uncountable. Let $\mathcal{F}_D$ denote, for every countable subset $D$ of $T$, the σ-field generated by the coordinate mappings whose indices belong to $D$. The preceding theorem implies the existence of a unique measure $\mu_D$ on $\mathcal{F}_D$ such that $\mu_U = q_U(\mu_D)$ for every finite set $U$ contained in $D$. On the other hand, if $D$ and $D'$ are countable and $D$ is contained in $D'$, obviously $\mu_{D'}$ induces $\mu_D$ on $\mathcal{F}_D$. Hence there exists one and only one set function $\mu$ on the union $\bigcup_D \mathcal{F}_D$ which induces $\mu_D$ on each $\mathcal{F}_D$. Now this union is the σ-field $\mathcal{F}$ and $\mu$ is completely additive (since every sequence of elements of $\mathcal{F}$ is already contained in some σ-field $\mathcal{F}_D$).
(1) By definition, this means that $\bar{\mu}_n$ is carried by a Borel subset of $\bar{F}_n$ contained in $F_n$; as $\mu_n$ is inner regular, it is carried by a countable union of compact sets contained in $F_n$.
(2) This tightness condition can be slightly relaxed: see the similar theorem (III.31) in the first edition of this book or Neveu [1].
The following theorem may look more general than Kolmogorov's theorem, but our proof - borrowed from Bourbaki - reduces it to the latter. Note that the mappings $p_n$ are not assumed to be continuous: this is a significant improvement (due to Parthasarathy) to Prokhorov's classical theorem on inverse limits.
53 We use the following notation: we consider a sequence of separable metrizable spaces $F_n$ with tight probability laws $\mu_n$ and universally measurable mappings $p_n : F_{n+1} \to F_n$. We denote by $F$ the inverse limit of the $(F_n, p_n)$, that is, the subspace of the product $\prod_n F_n$ consisting of all sequences $(x_k)_{k \in \mathbb{N}}$ such that $x_k = p_k(x_{k+1})$ for all $k$, and by $q_n$ the mapping of $F$ into $F_n$ which maps $(x_k)_{k \in \mathbb{N}}$ to $x_n$. We say that the $\mu_n$ constitute an inverse system of laws (on the inverse system of spaces $(F_n, p_n)$) if $\mu_n = p_n(\mu_{n+1})$ for all $n$. Under this hypothesis
THEOREM. There exists one and only one law $\mu$ on $F$ such that $\mu_n = q_n(\mu)$ for all $n$. $\mu$ is called the inverse limit of the laws $\mu_n$.
Proof: Let $F'_n$ denote the space $F_0 \times F_1 \times \dots \times F_n$, $p'_n$ the projection of $F'_{n+1}$ onto $F'_n$, $F'$ the space $\prod_n F_n$ and $q'_n$ the projection of $F'$ onto $F'_n$. For every $n$, we have an injection $i_n$ of $F_n$ into $F'_n$:
$$x \mapsto (p_0 \circ p_1 \circ \dots \circ p_{n-1}(x), \dots, p_{n-1}(x), x);$$
we denote by $i$ the injection of $F$ into $F'$ ($F$ being a subset of $F'$). We have a diagram
$$\begin{array}{ccccc} F' & \xrightarrow{\ q'_{n+1}\ } & F'_{n+1} & \xrightarrow{\ p'_n\ } & F'_n \\ \uparrow i & & \uparrow i_{n+1} & & \uparrow i_n \\ F & \xrightarrow{\ q_{n+1}\ } & F_{n+1} & \xrightarrow{\ p_n\ } & F_n \end{array}$$
The space $F'$ is separable and metrizable and hence so is the subspace $F$. The injection $i$ is continuous and hence Borel, as are the mappings $q'_n$ (projections) and $q_n$ (restrictions of projections). Finally, the mappings $i_n$ are universally measurable. We denote by $\mu'_n$ the measure $i_n(\mu_n)$ and by $A'_n$ the image $i_n(F_n)$ in $F'_n$.
We prove that $\mu'_n$ is tight and carried by $A'_n$. Since $i_n$ is universally measurable, there exists a Borel set $H$ carrying $\mu_n$, on which $i_n$ coincides with a Borel function (approximate $i_n$ by step functions); since $\mu_n$ is tight, there exists for all $\varepsilon > 0$ a compact set $K_\varepsilon$ contained in $H$ such that $\mu_n(K_\varepsilon) > 1 - \varepsilon$. The set $K'_\varepsilon = i_n(K_\varepsilon)$ is Souslin (18) and hence $\mu'_n$-measurable (34); therefore $\mu'_n(K'_\varepsilon) > 1 - \varepsilon$. Further, the measure induced by $\mu'_n$ on $K'_\varepsilon$ is tight (38). We deduce immediately that $\mu'_n$ is also tight and, the $K'_\varepsilon$ being contained in $A'_n$, that $\mu'_n$ is carried by $A'_n$.
It follows from the diagram that $p'_n(\mu'_{n+1}) = \mu'_n$ and hence Kolmogorov's theorem implies the existence of a unique measure $\mu'$ such that $q'_n(\mu') = \mu'_n$ for all $n$. On the other hand, $A'_n$ carries $\mu'_n$ and hence $q'^{-1}_n(A'_n)$ carries $\mu'$. But this set consists of all sequences $(x_k)_{k \in \mathbb{N}}$ such that $x_0 = p_0(x_1), \dots, x_{n-1} = p_{n-1}(x_n)$. The set $\bigcap_n q'^{-1}_n(A'_n)$, which carries $\mu'$, is therefore exactly $F$ and the theorem follows immediately.

Narrow convergence and Prokhorov's theorem
In this section we present - following Bourbaki very closely - only the most basic results. In particular, we limit ourselves to positive measures.
54 DEFINITION. Let $E$ be a completely regular space. The topology of narrow convergence (1) on the cone $\mathcal{M}_b^+(E)$ of bounded (positive) Radon measures on $E$ is the coarsest topology for which the mappings $\mu \mapsto \mu(f)$, where $f$ runs through $C_b(E)$, are continuous.
This topology is Hausdorff: for if two Radon measures $\mu_1$, $\mu_2$ are such that $\mu_1(f) = \mu_2(f)$ for $f \in C_b$, the same property holds for all positive l.s.c. $f$ (49, (c)) and then for all positive Borel $f$ by (48.2). Hence $\mu_1 = \mu_2$. Here are some elementary properties:
55 THEOREM. Let $f$ be a positive l.s.c. (resp. bounded u.s.c.) function. Then the mapping $\mu \mapsto \mu(f)$ is l.s.c. (resp. u.s.c.) for the narrow topology on $\mathcal{M}_b^+(E)$.
Proof: The l.s.c. case follows from 49, (c); if $f$ is u.s.c. and bounded by 1, $1 - f$ is positive l.s.c.
56 Given a function $f$ on $E$, the l.s.c. regularization $\underline{f}$ of $f$ is the function $x \mapsto \liminf_{y \to x} f(y)$; this is the greatest l.s.c. function dominated by $f$. We define in the same way the u.s.c. regularization $\bar{f}$, and the set of points of continuity of $f$ is the set $\{\underline{f} = \bar{f}\}$.
57 COROLLARY. Let $f$ be a bounded Borel function on $E$. If the measure $\lambda$ is carried by the set of points of continuity of $f$, the mapping $\mu \mapsto \mu(f)$ is continuous on $\mathcal{M}_b^+(E)$ at the point $\lambda$.
It indeed lies between the two mappings $\mu \mapsto \mu(\underline{f})$ and $\mu \mapsto \mu(\bar{f})$, which are equal at the point $\lambda$ and respectively l.s.c. and u.s.c. on $\mathcal{M}_b^+(E)$.
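A standard toy case may help fix the direction of the inequalities: the Dirac masses at the points 1/n converge narrowly to the Dirac mass at 0. The sketch below is our own illustration, not taken from the text; the test functions and the use of a minimum or maximum over a tail of the sequence as a crude stand-in for lim inf and lim sup are assumptions made for the example.

```python
import math

def integrate_dirac(f, x):
    """Integral of f with respect to the Dirac mass at x."""
    return f(x)

def f_cont(x):
    """A bounded continuous test function."""
    return math.cos(x)

def f_open(x):
    """Indicator of the open half-line (0, +inf): lower semicontinuous."""
    return 1.0 if x > 0 else 0.0

def f_closed(x):
    """Indicator of the closed half-line (-inf, 0]: bounded upper semicontinuous."""
    return 1.0 if x <= 0 else 0.0

points = [1.0 / n for n in range(1, 2001)]
tail = points[1000:]  # crude proxy for the behaviour at infinity

# Continuous test function: the integrals converge to the integral against the limit.
gap_cont = max(abs(integrate_dirac(f_cont, x) - integrate_dirac(f_cont, 0.0)) for x in tail)

# l.s.c. case: the limit integral (0) is below the lim inf (1), with strict inequality.
liminf_open = min(integrate_dirac(f_open, x) for x in tail)

# u.s.c. case: the limit integral (1) is above the lim sup (0).
limsup_closed = max(integrate_dirac(f_closed, x) for x in tail)
```

Both indicators are discontinuous at 0, the point carrying the limit law, which is why the convergence of integrals fails for them; this is consistent with the continuity criterion of the corollary.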
(1) "Narrow" convergence translates the French "convergence étroite". The usual English terminology is weak convergence, which however is slightly ambiguous when $E$ turns out to be locally compact (in that case the true weak convergence is defined by continuous functions with compact support, the "vague convergence" of Bourbaki).
58 THEOREM. Let $F$ be a subspace of $E$ and let $i$ be the injection of $F$ into $E$. The mapping $\mu \mapsto i(\mu)$ is a homeomorphism of $\mathcal{M}_b^+(F)$ onto the set of bounded positive Radon measures on $E$ carried by $F$, with the topology induced by $\mathcal{M}_b^+(E)$.
Proof: If $\mu$ is a bounded Radon measure on $F$, $i(\mu)$ is a measure on $E$ carried by a countable union of compact subsets of $F$, according to the tightness of $\mu$, and hence is carried by $F$. Conversely, if $\lambda$ is a bounded Radon measure on $E$ carried by $F$, $\lambda$ is carried by a countable union of compact subsets of $F$ and it follows that the measure induced by $\lambda$ on $F$ is tight. We so define two reciprocal bijections between $\mathcal{M}_b^+(F)$ and the set $\mathcal{M}_b^+(E, F)$ of Radon measures on $E$ carried by $F$. To simplify the language, we shall identify these two sets.
We must show that the two narrow topologies from $E$ and from $F$ coincide on $\mathcal{M}_b^+(E, F)$. We argue with sequences, but everything extends to arbitrary filters. First, if the $\mu_n \in \mathcal{M}_b^+(E, F)$ converge to $\mu \in \mathcal{M}_b^+(E, F)$ narrowly in $F$ and if $f$ belongs to $C_b(E)$, then $f|_F$ belongs to $C_b(F)$ and therefore $\mu_n(f) = \mu_n(f|_F) \to \mu(f|_F) = \mu(f)$. Hence there is convergence in $\mathcal{M}_b^+(E)$. Conversely, suppose that the $\mu_n$ converge to $\mu$ in $\mathcal{M}_b^+(E)$ and let $g$ be a continuous function on $F$ lying between 0 and 1. Let $j$ and $k$ be the functions obtained by extending $g$ to $E$ by the values 0 and 1 outside $F$ and let $\bar{j}$ and $\underline{k}$ be their (respectively u.s.c. and l.s.c.) regularizations. Then on $F$ we have $\underline{k} = \bar{j} = g$ and hence, by 55,
$$\mu(\underline{k}) \le \liminf_n \mu_n(\underline{k}), \qquad \mu(\bar{j}) \ge \limsup_n \mu_n(\bar{j}).$$
Since the $\mu_n$ and $\mu$ are carried by $F$, this can be written
$$\mu(g) \le \liminf_n \mu_n(g), \qquad \mu(g) \ge \limsup_n \mu_n(g),$$
the required result.
Before giving the consequences of theorem 58 with regard to the topology of $\mathcal{M}_b^+(E)$, we state Prokhorov's compactness theorem. For this we must recall that the set of (positive) Radon measures of mass $\le 1$ on a compact space $C$ is compact under strict convergence (Bourbaki, [4], Integration, Chapter III, § 1, no. 9, Proposition 15). If $C$ is metrizable, this follows very simply from 36.
59 THEOREM. Let $E$ be a completely regular space and let $H$ be a subset of $\mathcal{M}_b^+(E)$ consisting of the measures of mass $\le 1$ satisfying
(59.1) For every number $\varepsilon > 0$, there exists a compact set $K \subset E$ such that $\mu(K^c) < \varepsilon$ for all $\mu \in H$. (1)
Then the closure of $H$ in $\mathcal{M}_b^+(E)$ is narrowly compact.
(1) This property is called equal tightness of $H$.
Proof: Let $\mathcal{U}$ be an ultrafilter on $H$; we show that $\mathcal{U}$ converges in $\mathcal{M}_b^+(E)$. We set $\varepsilon_n = 1/n$ and choose some compact $K_n$ such that $\mu(K_n^c) < \varepsilon_n$ for all $\mu \in H$. We can suppose that the
sequence $(K_n)$ is increasing. Let $\mu_n$ be the measure $\mu \cdot I_{K_n}$, identified to a measure on $K_n$; by the compactness result recalled above, $\mu_n$ converges strictly along $\mathcal{U}$ for all $n$ to a measure $\lambda_n$, which we consider as a measure on $E$ carried by $K_n$. Let $m \ge n$ and let $f$ be a positive continuous function on $E$; $f|_{K_m}$ and $f|_{K_n}$ are continuous on $K_m$ and $K_n$ respectively and hence
$$\lambda_m(f) = \lambda_m(f|_{K_m}) = \lim_{\mathcal{U}} \mu_m(f|_{K_m}) \ge \lim_{\mathcal{U}} \mu_n(f|_{K_n}) = \lambda_n(f|_{K_n}) = \lambda_n(f).$$
Hence $\lambda_m \ge \lambda_n$. We write $\lambda = \sup_n \lambda_n$, a measure of mass $\le 1$ on $E$, which is obviously tight (interchange of sup). We show that $\mathcal{U}$ converges to $\lambda$. Let $\varepsilon$ be a number $> 0$ and $f$ be a continuous function on $E$ lying between 0 and 1. We choose $n$ so large that $\varepsilon_n < \varepsilon$ and $\lambda_n(1) \ge \lambda(1) - \varepsilon$. We then have
$$|\mu(f) - \lambda(f)| \le |\mu_n(f) - \lambda_n(f)| + \langle \mu - \mu_n, f \rangle + \langle \lambda - \lambda_n, f \rangle \le |\mu_n(f) - \lambda_n(f)| + \mu(K_n^c) + \langle \lambda - \lambda_n, 1 \rangle \le |\mu_n(f) - \lambda_n(f)| + 2\varepsilon.$$
We conclude by noting that $\mu_n(f) \to \lambda_n(f)$ along $\mathcal{U}$.
The conditions of the statement also are necessary for strict compactness when $E$ is locally compact or Polish (Prokhorov [1]). On the other hand, every narrowly convergent sequence of tight measures on a metric space satisfies (59.1) (Le Cam [1]). And yet (Preiss [1]) property (59.1) is not a necessary condition for compactness, even on a metric space as simple as the space of rational numbers.

The topological space of probability laws
Let $E$ be a separable metrizable space. We are going to study in the following few numbers the space $\mathcal{P}(E)$ of tight (1) probability laws on $E$, with the topology of narrow convergence. A similar study is possible for the space of tight measures of mass $\le 1$ or for the whole cone $\mathcal{M}_b^+(E)$.
(1) On the usual spaces every law is tight (38).
60 THEOREM. If $E$ is separable and metrizable, so is $\mathcal{P}(E)$. If further $E$ is compact, respectively Polish, Lusin, Souslin, cosouslin, $\mathcal{P}(E)$ has the same property.
Proof: Our starting point is the well known property that $\mathcal{P}(E)$ is metrizable and compact if $E$ is metrizable and compact. If $E$ is separable and metrizable, we imbed it in a compact metrizable space $C$. Then $\mathcal{P}(E)$ can be identified with the subspace of $\mathcal{P}(C)$ consisting of all laws carried by $E$ (58); hence it is separable and metrizable. If $f$ is continuous and bounded on $C$, the function $\mu \mapsto \mu(f)$ is continuous on $\mathcal{P}(C)$ by the definition of narrow convergence; it is l.s.c. if $f$ is l.s.c. and bounded (55) and a simple argument on monotone classes shows that it is Borel if $f$ is Borel
and bounded. If $E$ is Polish, $E$ is the intersection of a sequence $(G_n)$ of open subsets of $C$ (17). The set of laws with no mass on $G_n^c$ is a $G_\delta$ in $\mathcal{P}(C)$, therefore $\mathcal{P}(E)$ is a $G_\delta$ in the compact metric space $\mathcal{P}(C)$ and hence is Polish (1) (Bourbaki, [5], Gen. Top., Chap. IX, § 6, no. 2). If $E$ is Lusin, $E$ is Borel in $C$ (19) and hence $\mathcal{P}(E) = \{\mu \in \mathcal{P}(C) : \mu(E^c) = 0\}$ is Borel in $\mathcal{P}(C)$, which is compact and metrizable, and is therefore Lusin.
Suppose that $E$ is Souslin. Then $E$ is analytic in $C$ and there exist a compact metric space $D$ and an element $A$ of $(\mathcal{K}(D) \times \mathcal{K}(C))_{\sigma\delta}$ whose image under the projection $\pi$ of $D \times C$ onto $C$ is $E$. By 45, $\mathcal{P}(E)$ is the image of $\mathcal{P}(A)$ under the continuous mapping $\lambda \mapsto \pi(\lambda)$ of $\mathcal{P}(D \times C)$ into $\mathcal{P}(C)$; since $\mathcal{P}(A)$ is Borel, $\mathcal{P}(E)$ is Souslin.
The study of the case where $E$ is cosouslin needs a little more work, and an interesting definition.
61 DEFINITION. Let $(\Omega, \mathcal{F})$ be a paved set. A positive function $f$ on $\Omega$ is called $\mathcal{F}$-analytic if, for all $a \in \mathbb{R}_+$, the set $\{f > a\}$ belongs to $\mathcal{A}(\mathcal{F})$.
We often omit mentioning $\mathcal{F}$. It is equivalent to say that $\{f \ge a\}$ belongs to $\mathcal{A}(\mathcal{F})$ for every $a > 0$, but the sets $\{f < a\}$, $\{f \le a\}$ are complements of analytic sets. The indicator of an analytic set is an analytic function (a remark useful in remembering which way the inequality goes!) and it is easily seen using Lebesgue approximation that $f$ is analytic if and only if $f$ is the limit of an increasing sequence of finite linear combinations, with positive coefficients, of indicators of elements of $\mathcal{A}(\mathcal{F})$ (2).
We now prove a lemma.
62 THEOREM. Let $f$ be a function $\ge 0$ on a compact metrizable space $C$. If $f$ is analytic (resp. Borel, universally measurable), then the function $\mu \mapsto \mu(f)$ is analytic (resp. Borel, universally measurable) on $\mathcal{P}(C)$.
Proof: The Borel case has been treated above. It suffices to treat the case where $f$ is the indicator of a set $E$. Suppose that $E$ is analytic and consider the set $A$ from the proof of 60. The set $\{\mu \in \mathcal{P}(C) : \mu(E) > a\}$ is empty for $a \ge 1$ and, for $a < 1$, is the image of $]a, 1] \times \mathcal{P}(A) \times \mathcal{P}(C)$ under the mapping $(t, \lambda, \lambda') \mapsto t \cdot \pi(\lambda) + (1 - t)\lambda'$; hence it is analytic by 18.
We suppose finally that $E$ is universally measurable, and let $\Lambda$ be a bounded measure on $\mathcal{P}(C)$. We define $\lambda$ on $C$ by writing $\lambda(B) = \int \mu(B)\,\Lambda(d\mu)$ for all $B \in \mathcal{B}(C)$. Then let $B_1$ and $B_2$ be two Borel subsets of $C$ such that $B_1 \subset E \subset B_2$ and $\lambda(B_1) = \lambda(B_2)$: then $\mu(B_1) \le \mu(E) \le \mu(B_2)$ for all $\mu \in \mathcal{P}(C)$ and $\int \mu(B_1)\,\Lambda(d\mu) = \int \mu(B_2)\,\Lambda(d\mu)$. It follows that $\mu \mapsto \mu(E)$ is universally measurable.
We now complete the proof of 60: suppose that $E$ is cosouslin. The set $\mathcal{P}(E)^c$ is the set of $\mu \in \mathcal{P}(C)$ such that $\mu(E^c) > 0$; since $E^c$ is analytic, the function $\mu \mapsto \mu(E^c)$ is analytic and $\mathcal{P}(E)$ is the complement of an analytic set.
(1) This can be proved by constructing explicitly a distance on $\mathcal{P}(E)$ from a distance on $E$ (cf. Prokhorov [1], Strassen [1]).
(2) It can also be shown that $f$ is $\mathcal{F}$-analytic if and only if the set $W_f = \{(t, \omega) \in \mathbb{R}_+ \times \Omega : t < f(\omega)\}$ is $(\mathcal{B}(\mathbb{R}_+) \times \mathcal{F})$-analytic.

Digression: countability properties, non-metrizable Lusin spaces
Cartier [1] has remarked that in Bourbaki's General Topology, Chapter IX, 2nd edition, the word "metrizable" can be replaced by "Hausdorff" in every section dealing with Souslin or Lusin spaces, and that this modification (which now appears in the "definitive" edition) is quite interesting: many spaces important in analysis, and in particular the space $C_c^\infty(\mathbb{R}^n)$ and its dual $\mathcal{D}'(\mathbb{R}^n)$, the space of distributions on $\mathbb{R}^n$, are non-metrizable Lusin spaces. Given the importance that the theory of random distributions may take on in the future, we show in this section how easily Bourbaki's theory of Lusin and Souslin spaces reduces to that we have just described. Besides that, the lemmas on Lindelöf spaces which we use to this end (and which are borrowed from Bourbaki) are interesting in themselves.
63 DEFINITION. A Hausdorff space $E$ is (L) if every open covering of $E$ contains a countable subcovering, (LL) if every open set of $E$ is (L), (LLL) if $E \times E$ is (LL).
(L) means "Lindelof"; this convenient but ridiculous notation will be used only in this section. Every space with a countable base (in particular every separable metrizable space) is (LLL). So is every Hausdorff space E with the following property: there exists an (LLL) space F and a continuous mapping f of F onto E. This will be the case for all non-metrizable spaces which we shall meet later. THEOREM. Every family (f.). I of l.s.c. (u.s.c.) functions on an (LL) space E 1 1 E contains a countable subfamily with the same upper (lower) envelope.
64
Proof: We treat the case of l.s.c. functions. Let $f = \sup_{i \in I} f_i$. For every real $a$, the union of the family of open sets $\{f_i > a\}$ ($i \in I$) is the open set $\{f > a\}$; let $J_a$ be a countable subset of $I$ such that $\bigcup_{i \in J_a} \{f_i > a\} = \bigcup_{i \in I} \{f_i > a\}$. We set $g = \sup_{i \in J} f_i$, where $J$ is the union of the $J_a$ for $a$ rational; then $\{f > a\} = \{g > a\}$ for every rational $a$, hence $f = g$ and the theorem is established.

An equivalent statement: for arbitrary $f_i$, $i \in I$, there exists a countable set $J$ such that $\sup_{i \in I} f_i$ and $\sup_{i \in J} f_i$ have the same l.s.c. regularization. This is also true for inf (without replacing l.s.c. by u.s.c.!) if $E$ has a countable base. This is "Choquet's Lemma", cf. Brelot [1], p. 6.

65 COROLLARY. Let $\mu$ be a positive measure on $E$ and $(f_i)_{i \in I}$ be a family of positive l.s.c. functions on $E$ which is filtering to the right, and let $f = \sup_i f_i$. Then $\mu(f) = \sup_i \mu(f_i)$.
The proof is obvious. There is an analogous statement for u.s.c. functions.

66 THEOREM. Let $E$ be an (LLL) space. There exists a Hausdorff topology with a countable base coarser than the topology on $E$. If $E$ is completely regular, this topology can be assumed to be metrizable.
Proof: To every pair $(x,y)$ of distinct points of $E$ we associate a pair $(U_x^y, U_y^x)$ of disjoint open sets containing respectively $x$ and $y$. The complement of the diagonal $\Delta$ of $E \times E$ then is the union of the open sets $U_x^y \times U_y^x$. By the (LLL) property, there exists a sequence of pairs $(x_n, y_n)$ such that $\Delta^c$ is the union of the $U_{x_n}^{y_n} \times U_{y_n}^{x_n}$. Let $T$ be the topology generated by the sets $U_{x_n}^{y_n}$, $U_{y_n}^{x_n}$: $T$ is Hausdorff, coarser than the initial topology and one checks immediately that it has a countable base.

If $E$ is completely regular, there exists a family $(f_i)_{i \in I}$ of continuous functions with values in the interval $[0,1]$, which separates the points of $E$. The intersection of the closed sets $F_i = \{(x,y) : f_i(x) = f_i(y)\}$ then is the diagonal $\Delta$. By the (LLL) property, there exists a sequence $(i_n)$ of elements of $I$ such that $\Delta$ is the intersection of the $F_{i_n}$. Then the function

$$d(x,y) = \sum_n 2^{-n}\,|f_{i_n}(x) - f_{i_n}(y)|$$

is a distance on $E$ defining a topology $T'$ coarser than the original one. Since $E$ has the (L) property, so does $T'$. Hence for all $\varepsilon > 0$ there exists a countable family of open balls of radius $\varepsilon$ covering $E$, and we deduce that $E$ is separable under $T'$.
COROLLARY. Every (LLL) compact space is metrizable.

Here now is the class of - not necessarily metrizable - topological spaces introduced by Bourbaki.

67 DEFINITION. Let $E$ be a Hausdorff topological space. $E$ is called Souslin (resp. Lusin) if there exist a Souslin (resp. Lusin) metrizable space $P$ and a continuous (resp. injective continuous) mapping of $P$ onto $E$.

$P$ can always be assumed to be Polish (see the Appendix): we then recover exactly Bourbaki's definition. Every Lusin space is Souslin; every Souslin space is separable and (LLL). Every compact subspace of a Souslin space is (LLL) and hence metrizable.

The fundamental result on Souslin and Lusin spaces in Bourbaki's sense is the fact that, from the measure theoretic point of view, they are ordinary Souslin and Lusin spaces. But we shall also improve somewhat the theorems on direct images and isomorphisms: comparison with 18 and 21 shows that the hypothesis on $f$ has been strengthened (continuity instead of measurability) and that on $F$ modified (separability of the σ-field isn't assumed).

68 THEOREM. Let $P$ and $F$ be two Hausdorff topological spaces, $f$ be a continuous mapping of $P$ into $F$ and $E$ be the image $f(P)$.
(a) If $P$ is Souslin, then $E$ is $\mathcal{B}(F)$-analytic and the measurable space $(E,\mathcal{B}(E))$ is Souslin.
(b) If $P$ is Lusin and $f$ is injective, then $E \in \mathcal{B}(F)$ and the measurable space $(E,\mathcal{B}(E))$ is Lusin.
In particular, if we apply this to Definition 67 with $F = E$, we see that the measurable space underlying a Lusin (resp. Souslin) space is Lusin (Souslin) as stated earlier. Hence all of the "measurable" theory described above applies to these spaces.

Proof: Bearing in mind Definition 67, there is no loss in generality in supposing further that $P$ is metrizable and hence has a countable base. We first establish a lemma.

LEMMA. There exists a separable sub-σ-field $\mathcal{C}$ of $\mathcal{B}(F)$ such that every point of $E$ is an atom of $\mathcal{C}$.

Let $(U_n)$ be a countable base for the topology on $P$. The σ-field generated by the sets $\overline{f(U_n)}$ is the required sub-σ-field. For let $x \in E$; for every $y \in F$, $y \neq x$, there exists an open neighbourhood $H$ of $x$ such that $y \notin \overline{H}$. The open set $f^{-1}(H)$ in $P$ contains at least one $U_n$ meeting $f^{-1}\{x\}$, and then $\overline{f(U_n)}$ contains $x$ and not $y$.

Let $\mathcal{C}$ be such a σ-field and let $i$ be the canonical mapping of $F$ onto the Hausdorff space $(\dot F, \dot{\mathcal{C}})$ associated with $(F,\mathcal{C})$. Then $\mathcal{C} = i^{-1}(\dot{\mathcal{C}})$ and $E = i^{-1}(i(E))$. Theorems 18 and 21 applied to $i \circ f$ show that $i(E) \in \mathcal{A}(\dot{\mathcal{C}})$, and $i(E) \in \dot{\mathcal{C}}$ if $f$ is injective and $P$ Lusin. Hence $E \in \mathcal{A}(\mathcal{C}) \subset \mathcal{A}(\mathcal{B}(F))$, and $E \in \mathcal{C} \subset \mathcal{B}(F)$ if $f$ is injective and $P$ Lusin. More generally, if $f$ is injective and $P$ Lusin, the direct image $i \circ f(A)$ of an element $A$ of $\mathcal{B}(P)$ belongs to $\dot{\mathcal{C}}$; hence $f(A)$ belongs to $\mathcal{C} \subset \mathcal{B}(F)$ and $f$ is an isomorphism of $(P,\mathcal{B}(P))$ onto $(E,\mathcal{B}(F)|_E) = (E,\mathcal{B}(E))$. Hence the latter measurable space is Lusin.

It only remains to show that, if $P$ is Souslin or $f$ is not injective, $(E,\mathcal{B}(E))$ is Souslin. But we know that $(E,\mathcal{C}|_E)$ is Souslin (18) and it suffices to show that every $A \in \mathcal{B}(E)$ belongs to $\mathcal{C}|_E$. To this end we choose a Borel subset $A'$ of $F$ such that $A = A' \cap E$ and denote by $\mathcal{C}'$ the σ-field generated by $\mathcal{C}$ and $A'$; the preceding argument applies to $\mathcal{C}'$ and consequently the space $(E,\mathcal{C}'|_E)$ is Souslin. Since the two σ-fields $\mathcal{C}|_E$ and $\mathcal{C}'|_E$ are Souslin, comparable, and have the same atoms, Blackwell's theorem implies they are equal and the proof is complete.
REMARK. An argument similar to the above yields the separation theorem in Bourbaki's form: in a Hausdorff topological space $F$ two disjoint Souslin subspaces are separable by disjoint elements of $\mathcal{B}(F)$.

69 THEOREM. Let $E$ be a Souslin space in Bourbaki's sense (and a fortiori Lusin ...). Every bounded measure $\mu$ on $\mathcal{B}(E)$ is tight.

Proof: Let $\mu$ be a bounded measure on $E$ and let $P$ and $f$ have the same meaning as in Definition 67. Since the measurable space $(E,\mathcal{B}(E))$ is Souslin and hence isomorphic to a separable metrizable space, there exists by 45 a measure $\lambda$ on $P$ such that $\mu = f(\lambda)$. Since the measure $\mu$ is the image of a tight measure (38) under a continuous mapping, it is itself tight.

This result is all the more interesting since the compact subsets of $E$ are metrizable (68).

Disintegration of measures

70
The theorem on disintegration of measures has a bad reputation, and probabilists often try to avoid the use of conditional distributions ... But it really is simple and easy to prove. We shall give precise statements for future reference, and rapid proofs. This is how the problem arises: we have two probability spaces and a measurable mapping of the first into the second one

(70.1) $(\Omega, \mathcal{F}, \mathbb{P}) \xrightarrow{\;q\;} (E, \mathcal{E}, \mu).$
We suppose that $q(\mathbb{P}) = \mu$. To disintegrate $\mathbb{P}$ consists in finding an $\mathcal{E}$-measurable family (II.13) $x \mapsto \mathbb{P}_x$ of probability laws on $(\Omega,\mathcal{F})$ such that $\mathbb{P} = \int \mathbb{P}_x\,\mu(dx)$ and $\mathbb{P}_x$ is carried by $q^{-1}\{x\}$ (1) for $\mu$-almost all $x$. The relation with the problem of conditional laws is the following: let $f$ be a positive $\mathcal{E}$-measurable function and $g$ a positive $\mathcal{F}$-measurable function. We have

$$\mathbb{E}[g \cdot f \circ q] = \int \mathbb{E}_x[g \cdot f \circ q]\,\mu(dx) = \int f(x)\,\mathbb{E}_x[g]\,\mu(dx)$$

since (for $\mu$-almost all $x$) $f \circ q$ is equal $\mathbb{P}_x$-a.e. to the constant $f(x)$. This means that $\mathbb{E}_x[g]$ can be interpreted as the conditional expectation of $g$ "given that $q = x$" (II.38) and that $\omega \mapsto \mathbb{E}_{q(\omega)}[g]$ is a version of the conditional expectation $\mathbb{E}[g \mid \sigma(q)]$.
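The relation between disintegration and conditional expectation can be checked by hand on a finite space, where every measurability question disappears. The following is only a sketch under that assumption: the names `disintegrate`, `P`, `q`, `f`, `g` and the six-point example are hypothetical and not part of the text.

```python
from collections import defaultdict
from fractions import Fraction


def disintegrate(P, q):
    """Return (mu, kernel) with mu = q(P) and kernel[x] = P( . | q = x)."""
    mu = defaultdict(Fraction)
    for w, p in P.items():
        mu[q(w)] += p
    mu = dict(mu)
    kernel = {
        x: {w: p / mu[x] for w, p in P.items() if q(w) == x}
        for x in mu
        if mu[x] > 0
    }
    return mu, kernel


# Hypothetical example: Omega = {0,...,5} with the uniform law, q = parity.
P = {w: Fraction(1, 6) for w in range(6)}
q = lambda w: w % 2
mu, kernel = disintegrate(P, q)

# P is recovered as the integral of the laws P_x against mu.
recombined = defaultdict(Fraction)
for x, P_x in kernel.items():
    for w, p in P_x.items():
        recombined[w] += p * mu[x]

# Check E[g . f o q] = integral of f(x) E_x[g] mu(dx).
g = lambda w: w * w      # function on Omega
f = lambda x: 10 + x     # function on E
E_x_g = {x: sum(g(w) * p for w, p in P_x.items()) for x, P_x in kernel.items()}
lhs = sum(g(w) * f(q(w)) * p for w, p in P.items())
rhs = sum(f(x) * E_x_g[x] * mu[x] for x in mu)
```

Here each law `kernel[x]` is carried by the fibre of `q` over `x`, so `f(q(w))` is constant under it, which is exactly the argument used in the text.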
We shall show that if $(\Omega,\mathcal{F},\mathbb{P})$ is a "good" probability space, then the problem of conditional laws can be solved in a satisfactory way and that a small hypothesis on $E$ then enables us to complete the disintegration.

71 First case: $\Omega$ is a compact metric space with its Borel σ-field (2) $\mathcal{B}(\Omega) = \mathcal{F}$.
To every function $f \in C(\Omega)$ we associate the (bounded, not necessarily positive) measure $q(f \cdot \mathbb{P})$ on $(E,\mathcal{E})$. This measure is absolutely continuous with respect to $\mu$ and hence admits (Radon-Nikodym Theorem) an $\mathcal{E}$-measurable density $d_f$, which we choose arbitrarily within its class. Let $\mathcal{H} \subset C(\Omega)$ be a countable vector space over the field $\mathbb{Q}$ of rationals, which is closed under the operations $\wedge$ and $\vee$, contains the function 1 and is dense in $C(\Omega)$. Let $A$ be the set of all $x \in E$ such that $f \mapsto d_f(x)$ is an increasing $\mathbb{Q}$-linear functional on the space $\mathcal{H}$ such that $d_1(x) = 1$. It is immediately verified that $A$ belongs to $\mathcal{E}$ and that $\mu(A) = 1$. If $x \in A$, the linear functional $f \mapsto d_f(x)$ can be extended to an increasing linear functional of norm 1 on $C(\Omega)$, that is, a probability law on $\Omega$ (36). We denote this law by $\mathbb{P}_x$ and the corresponding expectation by $\mathbb{E}_x$. On the other hand we choose any law $\theta$ on $\Omega$ and set $\mathbb{P}_x = \theta$ if $x \notin A$. The function $x \mapsto \mathbb{E}_x[f]$ is $\mathcal{E}$-measurable if $f \in \mathcal{H}$, hence also if $f \in C(\Omega)$ by uniform convergence, and finally if $f$ is $\mathcal{F}$-measurable and bounded, by a simple argument using monotone classes.

(1) This condition is natural only if the atoms of $\mathcal{E}$ are the points of $E$.
(2) These results can be extended to the σ-field $\mathcal{F} = \mathcal{B}_u(\Omega)$.

We verify that, if $f$ is $\mathcal{F}$-measurable and bounded or positive on $\Omega$, the mapping $\omega \mapsto \mathbb{E}_{q(\omega)}[f]$ is a version of the conditional expectation $\mathbb{E}[f \mid \sigma(q)]$. It suffices to verify this when $f \in \mathcal{H}$. This function is of the form $h \circ q$, where $h = \mathbb{E}_{\cdot}[f]$ is $\mathcal{E}$-measurable; hence it is $\sigma(q)$-measurable. Conversely (I.18), every bounded $\sigma(q)$-measurable r.v. can be written as $g \circ q$, where $g$ is $\mathcal{E}$-measurable and bounded. The fundamental property of conditional expectations therefore reduces to the equality (to be verified)

(71.1) $\int f(\omega)\,g(q(\omega))\,\mathbb{P}(d\omega) = \int \mathbb{E}_{q(\omega)}[f]\,g(q(\omega))\,\mathbb{P}(d\omega).$
The left-hand side is the integral of $g$ with respect to the measure $q(f \cdot \mathbb{P})$ and hence its value is $\int g(x)\,d_f(x)\,\mu(dx)$ by definition of $d_f$. The right-hand side can be written as $\int \mathbb{E}_x[f]\,g(x)\,\mu(dx)$ by definition of the image law $\mu$, and equality follows from the fact that $d_f(x) = \mathbb{E}_x[f]$ $\mu$-a.e.

72 Second case: $\Omega$ is a separable metrizable space and $\mathcal{F} = \mathcal{B}(\Omega)$ (1), the measure $\mathbb{P}$ is tight.

Then $\mathbb{P}$ is carried by a set $J$ which is a countable union of compact subsets of $\Omega$. We imbed $\Omega$ in a compact metric space $K$, we identify $\mathbb{P}$ with a measure on $K$ carried by the (Borel) subset $J$ and we construct the $\mathbb{P}_x$ on $K$ as above. We have $\mathbb{P}(J^c) = 0$ and hence the $\mathcal{E}$-measurable set $\{x : \mathbb{P}_x(J^c) = 0\}$ carries $\mu$. We modify $\mathbb{P}_x$ outside this set, giving it the value $\theta$, an arbitrary law on $J$. All the laws $\mathbb{P}_x$ are then carried by $J$ and hence by $\Omega$ and we can forget about the compactification.

If $\Omega$ is homeomorphic to a universally measurable subspace of a compact metric space, this applies to every law $\mathbb{P}$ (38). Similarly, if $\Omega$ is a Bourbaki Lusin or Souslin space, we can "lift" $\mathbb{P}$ to a metrizable Souslin space $P$ above $\Omega$ (67) by means of a section, disintegrate in $P$ and then go down again to $\Omega$. So this theorem covers the usual needs of analysis.

Until now the space $(E,\mathcal{E})$ has been an abstract one: hence we can take $\mathcal{E}$ to be a sub-σ-field of $\mathcal{F}$ with $q$ the identity mapping. This special case deserves being stated:

THEOREM. Let $\Omega$ be a separable metrizable space with its Borel σ-field $\mathcal{F} = \mathcal{B}(\Omega)$ and let $\mathbb{P}$ be a tight law on $\mathcal{F}$. There then exist conditional laws on $\Omega$ relative to any sub-σ-field $\mathcal{E}$ of $\mathcal{F}$.

Moreover the same result holds for a sub-σ-field $\mathcal{E}$ of the completed σ-field $\mathcal{F}^{\mathbb{P}}$, but we omit the details. Clearly the conditional laws here are not carried by the $q^{-1}\{y\}$!

(1) These results can be extended to the σ-field $\mathcal{F} = \mathcal{B}_u(\Omega)$.

73 We now come back to the problem raised in 70 and wish to examine whether the laws $\mathbb{P}_x$ are carried by the $q^{-1}\{x\}$ for $\mu$-almost all $x$. We have remarked that this needs a hypothesis on $(E,\mathcal{E})$. Hence we require, in addition to 72, the following property

(73.1) $E$ is a separable metrizable space and $\mathcal{E} = \mathcal{B}(E)$.
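Let $G$ be the product space $\Omega \times E$ with its Borel σ-field $\mathcal{G} = \mathcal{B}(G) = \mathcal{F} \times \mathcal{E}$. Let $p$ be the mapping $\omega \mapsto (\omega, q(\omega))$ of $\Omega$ into $G$. One sees easily that the image law of $\mathbb{P}$ under $p$ is the integral

$$\int_E \mathbb{P}_x \otimes \varepsilon_x \,\mu(dx)$$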
(compare these two laws on rectangular sets). Let $J$ be a countable union of compact subsets of $\Omega$ carrying $\mathbb{P}$; $J$ is obviously Lusin, so $p(J)$ is Souslin (18) and hence universally measurable in $G$, and finally it carries the image law. We deduce that for $\mu$-almost all $x$, $\mathbb{P}_x$ is carried by the section $J_x$ of $p(J)$ by $x$ and this is contained in $q^{-1}\{x\}$.

Bimeasures

We give a last result concerning tightness, which is nice, and important in some applications. The proof is borrowed from Morando [1].

74 THEOREM. Let $E$ and $F$ be two separable metrizable spaces and $\beta$ a mapping of $\mathcal{B}(E) \times \mathcal{B}(F)$ into the interval $[0,1]$ such that $\beta(E,F) = 1$. Assume $\beta$ has the following property: for all $A \in \mathcal{B}(E)$, $\beta(A,\cdot)$ is a tight measure on $F$ and, for all $B \in \mathcal{B}(F)$, $\beta(\cdot,B)$ is a tight measure on $E$. Then there exists a unique probability law $\mathbb{P}$ on $E \times F$ such that $\beta(A,B) = \mathbb{P}(A \times B)$ ($A \in \mathcal{B}(E)$, $B \in \mathcal{B}(F)$), and $\mathbb{P}$ is tight.
Proof: Let $\mathcal{A}$ be the Boolean algebra consisting of finite unions of "rectangular" sets, i.e. sets of the form $A \times B$, $A \in \mathcal{B}(E)$, $B \in \mathcal{B}(F)$. Every element $U$ of $\mathcal{A}$ also is a finite union of disjoint rectangular sets $A_i \times B_i$ ($1 \leq i \leq n$) and one can readily check that the number $\sum_{i=1}^{n} \beta(A_i,B_i)$ depends only on $U$ and not on its decomposition. We denote it by $\mathbb{P}(U)$. The function $U \mapsto \mathbb{P}(U)$ on $\mathcal{A}$ is obviously additive.

Keeping the above notation, let $\varepsilon > 0$ be given. Since the measure $\beta(A_i,\cdot)$ is tight, there exists a compact set $T_i \subset B_i$ such that $\beta(A_i,T_i) \geq \beta(A_i,B_i) - \varepsilon/2n$. Since the measure $\beta(\cdot,T_i)$ is tight, there exists a compact set $S_i \subset A_i$ such that $\beta(S_i,T_i) \geq \beta(A_i,T_i) - \varepsilon/2n$. If $K$ denotes the compact set $\bigcup_i S_i \times T_i$, which belongs to $\mathcal{A}$, then $K \subset U$ and $\mathbb{P}(K) \geq \mathbb{P}(U) - \varepsilon$.

Next consider a decreasing sequence $(U_n)$ of elements of $\mathcal{A}$, whose intersection is empty. To show that $\mathbb{P}(U_n) \to 0$, let us assume that $\mathbb{P}(U_n) \geq 3a > 0$ and deduce a contradiction. For all $n$, let $K_n$ be a compact set belonging to $\mathcal{A}$, contained in $U_n$ and such that $\mathbb{P}(U_n \setminus K_n) \leq a\,2^{-n}$; if we set $L_n = K_0 \cap \dots \cap K_n$, then $\mathbb{P}(U_n \setminus L_n) \leq 2a$, hence $\mathbb{P}(L_n) \geq a$ and $L_n$ is non-empty. Hence the intersection of the decreasing sequence of compact sets $(L_n)$ is non-empty and finally $\bigcap_n U_n \neq \emptyset$, which contradicts the hypothesis.

Carathéodory's extension theorem (34) then enables us to extend $\mathbb{P}$ to a probability law on $\sigma(\mathcal{A}) = \mathcal{B}(E \times F)$. The tightness of this law is verified using monotone classes, since the property is true on $\mathcal{A}$: there is no difficulty with increasing sequences and the argument for decreasing sequences is similar to the one we have just given.
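The estimate on the sets $L_n = K_0 \cap \dots \cap K_n$ in the proof above is stated without detail. One way to spell it out, assuming only that the sequence $(U_n)$ is decreasing and that the set function is additive (hence subadditive) on the Boolean algebra, written $\mathcal{A}$ here, is the following.

```latex
U_n \setminus L_n \;=\; \bigcup_{k=0}^{n} (U_n \setminus K_k)
\;\subset\; \bigcup_{k=0}^{n} (U_k \setminus K_k),
\qquad\text{hence}\qquad
\mathbb{P}(U_n \setminus L_n) \;\le\; \sum_{k=0}^{n} a\,2^{-k} \;<\; 2a,
\qquad
\mathbb{P}(L_n) \;=\; \mathbb{P}(U_n) - \mathbb{P}(U_n \setminus L_n) \;\ge\; 3a - 2a \;=\; a .
```

The inclusion uses $U_n \subset U_k$ for $k \le n$, and the last equality uses $L_n \subset K_n \subset U_n$ together with the fact that $L_n$, a finite intersection of elements of $\mathcal{A}$, still belongs to $\mathcal{A}$.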
CHAPTER IV
Stochastic processes
In the first two paragraphs of this chapter we study stochastic processes and methods leading to the construction of suitable versions of them. In the last two paragraphs the fundamental structure is that of a probability space provided with an increasing family of a-fields. The study is pushed as far as we can without martingale theory.
1.
GENERAL DEFINITIONS ON PROCESSES
Definition of processes

1 DEFINITION. Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space, $T$ be any set and $(E,\mathcal{E})$ be a measurable space. A stochastic process (or simply a process) defined on $\Omega$, with time set $T$ and state space $E$, is any family $(X_t)_{t \in T}$ of $E$-valued random variables, indexed by $T$.

The space $\Omega$ is often called the sample space of the process, and the random variable $X_t$ is called the state at time $t$. For every $\omega \in \Omega$, the mapping $t \mapsto X_t(\omega)$ from $T$ into $E$ is called the (sample) path of $\omega$.

In this book $T$ will always be a subset of the extended real line $\overline{\mathbb{R}}$, usually an interval of $\mathbb{R}$ ("continuous case") or of $\mathbb{N}$ ("discrete case"), sometimes a dense countable set, for example. This is the situation in which the terminology originated: time, instants, and paths. But there also exist parts of the theory of processes where $T$ is only a partially ordered set (in statistical mechanics, for example, $T$ may be the family of subsets of a finite or countable set, partially ordered by inclusion) or even has no order structure at all (in some problems of ergodic theory $T$ may be a group; in problems concerning regularity of paths of Gaussian processes $T$ is just a metric space). So this book, where the notion of time plays an essential role, gives a somewhat partial idea of the theory of processes.
Definition 1 calls for a number of remarks.
2
(a) Just as the notion of a random variable was related to a measurable space structure $(\Omega,\mathcal{F})$ and not to a probability space structure $(\Omega,\mathcal{F},\mathbb{P})$, the notion of a process does not really require a law $\mathbb{P}$, and from time to time we may speak of a process on some space, without emphasis on any particular law on it.

(b) We have defined a process as a family $(X_t)_{t \in T}$ of r.v., i.e. a mapping of $T$ into the set of $E$-valued random variables. A process can also be considered as a mapping $(t,\omega) \mapsto X_t(\omega)$ of $T \times \Omega$ into $E$ or as a mapping $\omega \mapsto (t \mapsto X_t(\omega))$ of $\Omega$ into the set of all possible paths. In the latter interpretation, the process appears as a random variable with values in the set of paths (a "random function"), but this notion is not complete from a mathematical point of view: it lacks a σ-field given on the set of all paths. We shall return to this. The second point of view (a process is a function on $T \times \Omega$) will be the most useful. We illustrate it by a definition:

3 DEFINITION. Suppose that $T$ is given a σ-field $\mathcal{T}$. The process $(X_t)_{t \in T}$ is said to be measurable if the mapping $(t,\omega) \mapsto X_t(\omega)$ is measurable on $T \times \Omega$ with respect to the product σ-field $\mathcal{T} \times \mathcal{F}$.

In the discrete case ($T \subset \mathbb{N}$), the σ-field $\mathcal{T}$ is that of all subsets of $T$ and the notion is trivial: every process is measurable.

We continue the "remarks" on definition 1.

4
(c) A continuous time stochastic process is the kind of mathematical model one uses to describe a natural phenomenon whose evolution is governed by chance. Hence it is natural to wonder under which conditions two processes describe the same phenomenon. On the other hand, given a natural phenomenon, how can observations be used to construct a process which describes it? The classical answer to these questions is the following. Let us assume that at any finite system of instants $t_1, t_2, \dots, t_n$, we can determine with arbitrary precision the state of the process. By performing a large number of independent experiments, it is then possible to estimate with arbitrary precision probabilities of the type

(4.1) $\mathbb{P}\{X_{t_1} \in A_1, \dots, X_{t_n} \in A_n\}$
and in general observation can give nothing more. Hence the following definition expresses reasonably the fact that two processes $(X_t)$ and $(X'_t)$ represent the same natural phenomenon.

DEFINITION. We consider two stochastic processes with the same time set $T$ and state space $(E,\mathcal{E})$: $(\Omega,\mathcal{F},\mathbb{P},(X_t)_{t \in T})$ and $(\Omega',\mathcal{F}',\mathbb{P}',(X'_t)_{t \in T})$. The processes $(X_t)$ and $(X'_t)$ are called equivalent if

$$\mathbb{P}\{X_{t_1} \in A_1, X_{t_2} \in A_2, \dots, X_{t_n} \in A_n\} = \mathbb{P}'\{X'_{t_1} \in A_1, X'_{t_2} \in A_2, \dots, X'_{t_n} \in A_n\}$$

for every finite system of instants $t_1, t_2, \dots, t_n$ and elements $A_1, A_2, \dots, A_n$ of $\mathcal{E}$.

Terminology is somewhat fluctuating: we often say that $(X_t)$ and $(X'_t)$ have the same time law, or simply the same law, or that they are versions of each other.

5 (d) However, the notion of a time law leads to criticism. On the one hand it is too precise. For it is impossible in practice to determine a measure at any given instant. All that instruments can give are average results over small time intervals. In other words, we have no direct access to the r.v. $X_t$ themselves, but only to r.v. of the form

$$\frac{1}{b-a}\int_a^b f(X_s)\,ds$$
where $f$ is a function on the state space $E$ (considering such integrals of course requires some measurability from the process). This leads to a notion of "almost-equivalence". We develop this topic in nos. 35-45.

(e) On the other hand, the time law notion is insufficiently precise, because it concerns only finite subsets of a set $T$, which in general is uncountable. We take an example. On the probability space $\Omega = [0,1]$ with the Borel σ-field $\mathcal{F} = \mathcal{B}([0,1])$ and Lebesgue measure $\mathbb{P}$, we consider two real-valued processes $(X_t)$ and $(Y_t)$ with time set $T = [0,1]$, defined as follows:

(5.1) $X_t(\omega) = 0$ for all $\omega$ and all $t$; $Y_t(\omega) = 0$ for all $\omega$ and all $t \neq \omega$, $Y_t(t) = 1$.

For each $t$, $Y_t = X_t$ a.s. but the set of $\omega$ such that $X_\cdot(\omega) = Y_\cdot(\omega)$ is empty. The two processes have the same time law but the first one has all its paths continuous while the paths of the second one are almost all discontinuous. It wouldn't be right to discard this example as artificial; let us indeed give, for the expert, the following example: we consider a one-dimensional Brownian motion $(B_t)$ starting from 0 and define

(5.2) $X_t(\omega) = 0$ for all $\omega$ and all $t$; $Y_t(\omega) = 0$ for all $\omega$ and all $t$ such that $B_t(\omega) \neq 0$, $Y_t(\omega) = 1$ for all $\omega$ and all $t$ such that $B_t(\omega) = 0$.
The situation is the same as above and the process $(Y_t)$ is by no means "artificial": it has been studied by Paul Lévy in a series of works which are considered masterpieces of probability theory (Lévy [1], Chapter VI).

We now give formal definitions of the notions we have just met. The first one is a little more precise than equivalence:

6 DEFINITION. Let $(X_t)_{t \in T}$ and $(Y_t)_{t \in T}$ be two stochastic processes defined on the same probability space $(\Omega,\mathcal{F},\mathbb{P})$ with values in the same state space $(E,\mathcal{E})$. We say that $(Y_t)_{t \in T}$ is a (standard) modification of $(X_t)_{t \in T}$ if $X_t = Y_t$ a.s. for each $t \in T$.

The second definition expresses the greatest possible precision from the probabilistic point of view: two indistinguishable processes really are "the same" process.

7
DEFINITION. In the notation of definition 6, the processes $(X_t)$ and $(Y_t)$ are called $\mathbb{P}$-indistinguishable (or simply indistinguishable) if for almost all $\omega \in \Omega$, $X_t(\omega) = Y_t(\omega)$ for all $t$.
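The gap between "modification" and "indistinguishable" can be seen numerically on example (5.1). The sketch below is only an illustration: sampling ω with a pseudo-random generator stands in for Lebesgue measure on [0,1], and the names `X`, `Y`, `omegas` are hypothetical.

```python
import random


def X(t, w):
    """First process of (5.1): identically zero."""
    return 0.0


def Y(t, w):
    """Second process of (5.1): equal to 1 exactly when t == w."""
    return 1.0 if t == w else 0.0


rng = random.Random(0)
omegas = [rng.random() for _ in range(10_000)]  # sample points of Omega = [0,1]

# Fixed time t: X_t and Y_t agree on (almost) every sample point,
# so Y is a modification of X.
t = 0.5
frac_differ_at_t = sum(X(t, w) != Y(t, w) for w in omegas) / len(omegas)

# Fixed sample point w: the two paths differ at the instant t = w,
# so no path of Y coincides with the corresponding path of X.
every_path_differs = all(X(w, w) != Y(w, w) for w in omegas)
```

The first quantity looks at one instant across many sample points, the second at one sample point across all instants; the two definitions differ precisely in the order of these quantifiers.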
For example, if two real-valued processes $(X_t)$ and $(Y_t)$ have right-continuous (or left-continuous) paths on $T = \mathbb{R}$ and if, for each rational $t$, $X_t = Y_t$ a.s., then they are indistinguishable: the paths $X_\cdot(\omega)$ and $Y_\cdot(\omega)$ are indeed a.s. equal on the rationals and hence everywhere on $\mathbb{R}$.

8 The definition of indistinguishable processes can be expressed differently. A random set is a subset $A$ of $T \times \Omega$ whose indicator $1_A$, as a function of $(t,\omega)$, is a stochastic process (i.e. $\omega \mapsto 1_A(t,\omega)$ is a r.v. for all $t$). The set $A$ is said to be evanescent if the process $1_A$ is indistinguishable from 0, which means also that the projection of $A$ on $\Omega$ is contained in a $\mathbb{P}$-negligible set. Two processes $(X_t)$ and $(Y_t)$ then are indistinguishable if and only if the set $\{(t,\omega) : X_t(\omega) \neq Y_t(\omega)\}$ is evanescent.

Time laws: canonical process and construction

9
Among all the processes with a given time law, we try to distinguish some process defined unambiguously and naturally, using no information on the process other than its time law. Such a process is called canonical.

We consider a stochastic process $(\Omega,\mathcal{F},\mathbb{P},(X_t)_{t \in T})$ with values in $(E,\mathcal{E})$. We denote by $\tau$ the mapping of $\Omega$ into $E^T$ which associates with $\omega \in \Omega$ the point $(X_t(\omega))_{t \in T}$ of $E^T$, that is the path of $\omega$. The mapping $\tau$ is measurable when $E^T$ is given the product σ-field $\mathcal{E}^T$ (see I.8); hence we can consider the image law $\tau(\mathbb{P})$ on the space $(E^T,\mathcal{E}^T)$. We denote by $Y_t$ the coordinate mapping of index $t$ on $E^T$. The processes $(\Omega,\mathcal{F},\mathbb{P},(X_t)_{t \in T})$ and $(E^T,\mathcal{E}^T,\tau(\mathbb{P}),(Y_t)_{t \in T})$ are then equivalent (by the very definition of image laws) and we can set the definition

DEFINITION. In the above notation, the process

$$(E^T, \mathcal{E}^T, \tau(\mathbb{P}), (Y_t)_{t \in T})$$
is called the canonical process associated with (or equivalent to) $(X_t)$.

Two processes $(X_t)$ and $(X'_t)$ are equivalent if and only if they are associated with the same canonical process.

This canonical process is hardly ever used directly when the time set $T$ is uncountable: the σ-field $\mathcal{E}^T$ contains just events which depend only on countably many variables $Y_t$, whereas the most interesting properties of the process (continuity of paths, for example) involve all these random variables. The canonical process is mainly useful as a step in the construction of more complicated processes.

We must insist on the fact that the "canonical" character depends on the available information on the process $(X_t)$. In the absence of any information other than the time law, everybody will be satisfied with the above canonical process. But if it is known, for example, that the process $(X_t)$ has a version with continuous paths (under some topology on $T$), then it would be silly to use it. The set $E^T$ of all mappings of $T$ into $E$ will be replaced by that of all continuous mappings of $T$ into $E$, onto which the measure will be carried by the same procedure as above, thus defining a canonical continuous process.

10 The notion of a canonical process leads to a simple - but hardly satisfying - solution to the problem of constructing stochastic processes. We return to the situation described in no. 4: we have observed some "random phenomenon" which we wish to represent by means of a process. Since it can only be defined to within an equivalence, the choice that offers itself to the mind is that of the canonical process. Hence we use the measurable space $(E^T,\mathcal{E}^T)$ and the coordinate mappings $(Y_t)_{t \in T}$. It remains to construct a probability law $\mathbb{P}$ on this space such that

$$\mathbb{P}\{Y_{t_1} \in A_1, \dots, Y_{t_n} \in A_n\} = \Phi(t_1, \dots, t_n; A_1, \dots, A_n)$$

for every finite subset $u = \{t_1, t_2, \dots, t_n\}$ of $T$ and every finite family $A_1, A_2, \dots, A_n$ of measurable subsets of $E$, the functions $\Phi$ being given by observation. For the construction to be possible, it is necessary that the set function

$$A_1 \times A_2 \times \dots \times A_n \mapsto \Phi(t_1, t_2, \dots, t_n; A_1, A_2, \dots, A_n)$$

be extendable to a probability law $\mathbb{P}_u$ on $(E^u,\mathcal{E}^u)$, a probability law which moreover is uniquely determined by $\Phi$ (by Theorem I.20, applied to the set of finite unions of subsets of $E^u$ of the form $A_1 \times A_2 \times \dots \times A_n$). On the other hand it is necessary that

$$\pi_{uv}(\mathbb{P}_v) = \mathbb{P}_u$$

for every pair of finite subsets $u$, $v$ of $T$ such that $u \subset v$, where $\pi_{uv}$ denotes the projection of $E^v$ onto $E^u$. We recognize here the definition of an inverse system of probability laws (III.52) and the possibility of constructing the law $\mathbb{P}$ appears to be equivalent to the existence of an inverse limit for the inverse system $(\mathbb{P}_u)$. Theorem III.52 then gives a simple condition that implies the existence of $\mathbb{P}$.
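The compatibility condition between the finite-dimensional laws, namely that projecting the law attached to a finite set of instants onto a smaller set must give the law attached to the smaller set, can be tested mechanically when the state space is finite. The sketch below assumes a two-point state space and a hypothetical family of laws with independent, identically distributed coordinates; `project` and `coin_law` are illustrative names only.

```python
from fractions import Fraction
from itertools import product


def project(law_v, v, u):
    """Image of a law on E^v under the projection onto E^u (u a sub-tuple of v)."""
    idx = [v.index(t) for t in u]
    law_u = {}
    for point, p in law_v.items():
        key = tuple(point[i] for i in idx)
        law_u[key] = law_u.get(key, Fraction(0)) + p
    return law_u


def coin_law(u, p=Fraction(1, 3)):
    """Hypothetical finite-dimensional law: i.i.d. coordinates with P{1} = p."""
    law = {}
    for point in product((0, 1), repeat=len(u)):
        weight = Fraction(1)
        for x in point:
            weight *= p if x == 1 else 1 - p
        law[point] = weight
    return law


v = (1, 2, 5)   # a finite set of instants
u = (1, 5)      # a subset of v
consistent = project(coin_law(v), v, u) == coin_law(u)
```

A family that fails this check cannot be the system of finite-dimensional laws of any process, which is why the condition is necessary before any existence theorem is invoked.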
Adapted and progressive processes

We henceforth assume that the time set $T$ is the closed positive half-line $\mathbb{R}_+$. We leave to the reader all trivial extensions to other time sets, except for a few remarks on more delicate points. In the numbers which follow we introduce some terminology which will be used throughout this book, but we postpone until paragraph 3 a detailed study of it.

11 Let $(\Omega,\mathcal{F})$ be a measurable space and let $(\mathcal{F}_t)_{t \in \mathbb{R}_+}$ be a family of sub-σ-fields of $\mathcal{F}$ such that $\mathcal{F}_s \subset \mathcal{F}_t$ for $s \leq t$. We shall say that $(\mathcal{F}_t)$ is an increasing family of σ-fields on $(\Omega,\mathcal{F})$ or a filtration of $(\Omega,\mathcal{F})$: $\mathcal{F}_t$ is called the σ-field of events prior to $t$. We define

(11.1) $\mathcal{F}_{t+} = \bigcap_{s > t} \mathcal{F}_s, \qquad \mathcal{F}_{t-} = \bigvee_{s < t} \mathcal{F}_s \quad (t > 0).$

[...] $> b$; if no such integer exists we write $k = 0$. The intervals $(t_1,t_2), (t_3,t_4), \dots, (t_{2k-1},t_{2k})$ of $u$ represent periods of time during which the function $f$ goes upward, from below $a$ to above $b$, whereas the intermediate intervals represent downward periods. The number $k$ is called the number of upcrossings by $f$ (considered on $u$) of the interval $[a,b]$ and is denoted by

(21.1) $U(f;u;[a,b]).$

We define similarly the number of downcrossings of $f$ (considered on $u$) on the interval $[a,b]$:
(21.2) $D(f;u;[a,b]) = U(-f;u;[-b,-a]).$
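The counting procedure behind (21.1) and (21.2) is easy to mechanize on a finite set of instants. The part of the text defining the instants $t_i$ is incomplete here, so the sketch below assumes the usual greedy rule: wait for a value strictly below $a$, then for a later value strictly above $b$, and repeat. The function names and the sample data are hypothetical.

```python
def upcrossings(f, u, a, b):
    """Number of upcrossings of [a, b] by f along the finite set of instants u.

    Assumed rule: an upcrossing is completed each time a value > b is
    observed after a value < a (strict inequalities, a < b).
    """
    count = 0
    below = False  # True once a value < a has been seen since the last upcrossing
    for t in sorted(u):
        x = f(t)
        if not below:
            if x < a:
                below = True
        elif x > b:
            count += 1
            below = False
    return count


def downcrossings(f, u, a, b):
    """Downcrossings via (21.2): upcrossings of [-b, -a] by -f."""
    return upcrossings(lambda t: -f(t), u, -b, -a)


# Hypothetical data: f oscillates between 0 and 3 at the instants 0..5.
values = [0, 3, 0, 3, 0, 3]
f = lambda t: values[t]
n_up = upcrossings(f, range(6), 1, 2)      # 3 upcrossings of [1, 2]
n_down = downcrossings(f, range(6), 1, 2)  # 2 downcrossings of [1, 2]
```

For an arbitrary set of instants one would take the supremum of `upcrossings` over its finite subsets, which is what the extension to a general subset of the half-line amounts to.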
We can also define the upcrossings and downcrossings of an interval of the form $]a,b[$, replacing strict inequalities by loose inequalities in the definition of the instants $t_i$ (1). Now let $S$ be any subset of $\mathbb{R}_+$. We write:

(21.3) $U(f;S;[a,b]) = \sup_{u \text{ finite},\ u \subset S} U(f;u;[a,b]).$

Definition (21.2) can be similarly extended. The principal interest of these numbers arises from the following theorem:

22 THEOREM. Let $f$ be a function on $\mathbb{R}_+$ with values in $\mathbb{R}$. For $f$ to be free of oscillatory discontinuities, it is necessary and sufficient that

(22.1) $U(f;I;[a,b])$
23 [...] $(x_n)_{n \geq 1}$ be a sequence dense in $E$. We write $h_n(x) = d(x_n,x)$ for $n \geq 1$ (so that the sequence $(h_n)$ of continuous functions separates the points) and $h_0(x) = 1/d(x_0,x)$. We want to express that each one of the real processes $(h_n \circ X_t)_{n \geq 1}$ has right-hand and left-hand limits along $D$ and that the process $(h_0 \circ X_t)$ has finite right-hand and left-hand limits along $D$. This follows immediately using the numbers of upcrossings of paths, considered on $D$.

REMARKS. (a) The result extends to the case of a Polish space $E$, since every Polish space $E$ can be considered (III.17) as a $G_\delta$ in some compact metric space and hence as an intersection of LCC spaces $E_n$. We then write down the preceding conditions for each of the $E_n$. If $E$ were cosouslin (in particular Lusin), the set in the statement would be the complement of an $\mathcal{F}$-analytic set: we leave this aside.

(b) We have been concerned here with r.c.l.l. or r.l.l.l. mappings, but we might consider continuous mappings analogously. The method would be more classical: to express that a mapping of $D$ into a Polish space $E$ can be extended to a continuous mapping of $\mathbb{R}_+$ into $E$, one just writes for every integer $n$ the condition for uniform continuity on $D \cap [0,n]$.

Choice of the countable set: separability

We emphasize again that the results of nos. 24 to 30 will not be used elsewhere and can therefore be omitted.

24
Our problem now is the following: how can we recognize whether a given process (X t ) admits a modification (Y t ) with nice properties - for example, a modification with r.c.l.l. paths or (which is more difficult) a modification with bounded paths. This is a quite natural problem, and we shall in fact study later on another problem of the same kind, relative to "almost-modifications". However, it is sometimes forbidden to modify a given process. Recall the example, already considered in no. 5, of the process (24.1)
where (B t ) is one dimensional brownian motion, with continuous paths. If we are just looking for a modification with regular paths, we may spare our time and simply take the modification Yt = O. Here the theory of separability would destroy the structure of the process. The theory of separability was developed by Doob for continuous time processes (Xt)t E R . It extends without difficulty to processes whose time set is a topological +
space with countable base. Instead of this, we study processes indexed by R+, but under the right topology (1) on R, which hasn1t a countable base. This extension is due to Chung (Chung-Doob [2J). On the other hand, the theory can be extended to processes with values in a compact metrizable space, whereas we only consider processes with values in ~ (beware, the distinction between Rand R is important here). 25 The definition may be clearer in its most general form. DEFINITION. Let f be a mapping of a topological space T into a topological space E and let D be a dense set in T. We say that f ~ D-separable if the set of points (t,f(t)), tED, is dense in the graph of f (for the product topology on T x E). Henceforth we take T = R+ with the right topology and E = R. On the other hand, D will be countable. We then say that f is right D-separable (D-separable if the ordinary topology of R+ is used.) (1) Recall that the neighbourhoods of x E R+ for the·-right topology are the sets
containing an interval $[x, x+\varepsilon[$, $\varepsilon > 0$, so that a left-closed interval $[a,b[$ is closed under the right topology!
26 DEFINITION. Let $(X_t)_{t \in \mathbb{R}_+}$ be a process with values in $\overline{\mathbb{R}}$, defined on a probability space $(\Omega, \mathcal{F}, P)$. $(X_t)$ is called right separable if there exists a countable dense set D such that, for almost all $\omega \in \Omega$, the path $X_\cdot(\omega)$ is right D-separable. If $(X_t)$ is right separable, we may solve easily the problem of no. 24: if the paths are bounded on D, they are bounded everywhere, etc. The following lemma is a modification due to Chung of a result of Doob (Stochastic Processes, pp. 56-57).
27 THEOREM. Let $(X_t)_{t \in \mathbb{R}_+}$ be a process with values in $\overline{\mathbb{R}}$. There exists a countable dense set D with the following property: for every closed set F of $\overline{\mathbb{R}}$ and every set $I \subset \mathbb{R}_+$ open under the right topology,
(27.1) $P\{X_t \in F \text{ for all } t \in D \cap I,\ X_u \notin F\} = 0$ for every $u \in I$
and, for every countable set S,
(27.2) $P\{X_t \in F \text{ for all } t \in D \cap I\} \le P\{X_t \in F \text{ for all } t \in S \cap I\}$.
Proof: We leave to the reader the equivalence of (27.1) and (27.2), which is easy. We choose a countable family of closed subsets of $\overline{\mathbb{R}}$, such that every closed set is the intersection of a decreasing sequence of elements of this family, and a countable family of open subsets of $\mathbb{R}_+$ with the ordinary topology, such that every (ordinary) open set of $\mathbb{R}_+$ is the union of an increasing sequence of elements of this second family. For every pair (I,F), with I in the countable family of open sets and F in the countable family of closed sets, we choose a countable set $\Delta(I,F)$ dense in I such that the probability $P\{X_t \in F \text{ for all } t \in S \cap I\}$ (S countable) is minimal for $S = \Delta(I,F)$. We set $\Delta(F) = \bigcup_I \Delta(I,F)$, I running through the countable family of open sets. Then for every ordinary open set I and every countable set S we have
(27.3) $P\{X_t \in F \text{ for all } t \in \Delta(F) \cap I\} \le P\{X_t \in F \text{ for all } t \in S \cap I\}$.
Always keeping F fixed in the countable family of closed sets, we consider for r rational > 0 the increasing function on $[0,r[$, $h_r(t) = \inf_S P\{X_u \in F \text{ for all } u \in S \cap [t,r[\}$ (S countable), which we compare to $k_r(t) = P\{X_u \in F \text{ for all } u \in \Delta(F) \cap [t,r[\}$. We have, by the choice of $\Delta(F)$, $h_r(t+) = k_r(t+)$ for all t and hence $h_r$ and $k_r$ differ only on a countable set $N_r$. If we enlarge $\Delta(F)$ by replacing it - without changing the notation - by $\Delta(F) \cup (\bigcup_r N_r)$, we have, for every rational r and every $t \in [0,r[$, $h_r(t) = k_r(t)$. But then the same result will hold for all real r on passing to the limit. Thus, for every interval $[t,r[$, $P\{X_u \in F \text{ for all } u \in \Delta(F) \cap [t,r[,\ X_t \notin F\} = 0$. Now let I be an open set under the right topology: I is a countable union of disjoint intervals of the form $]t_i,r_i[$ or $[t_j,r_j[$. The probability
(27.4) $P\{X_u \in F \text{ for all } u \in \Delta(F) \cap I,\ X_t \notin F\}$
is zero for all $t \in I$: if t is an inner point of I in the ordinary sense, use (27.2); if t is one of the left-hand end-points of intervals $[t_j,r_j[$, use (27.3). To get the set D of the statement, possessing the above properties for all closed sets, it suffices to take the union of the countable sets $\Delta(F)$, F running through the countable family of closed sets. 28 We come to Doob's main theorems, the first one concerning arbitrary processes and the second one measurable processes. We first give two examples. - Let $\Omega$ consist of a single point. A process then is simply a function f(t) on $\mathbb{R}_+$ which may be arbitrarily bad. On the other hand (27.1) tells us that there exists some D such that $(f(t) \in F \text{ for } t \in D \cap I) \iff (f(t) \in F \text{ for } t \in I)$.
It follows that f is a right D-separable function and the "process" f therefore is right separable. So separability in itself doesn't imply any regularity of the sample functions of a process. - We return to example (24.1). For every countable set D, we have $P\{X_u = 0,\ u \in D\} = 1$, whereas for almost all $\omega$ the set $\{u : X_u(\omega) = 1\}$ is non-empty. Hence the process is not separable, and any attempt to make it separable would also make it indistinguishable from 0, and hence without interest. 29 THEOREM. Every real-valued process $(X_t)_{t \in \mathbb{R}_+}$ has a right separable modification with values in $\overline{\mathbb{R}}$ (1). Proof: We fix $t \in \mathbb{R}_+$. Choosing the set D as in no. 27, we denote by $A_t(\omega)$ the (non-empty) set of cluster values in $\overline{\mathbb{R}}$ of the function $X_\cdot(\omega)$ at the point t from the right and along D: $A_t(\omega) = \bigcap_n \overline{\{X_u(\omega),\ u \in D \cap [t, t+1/n[\}}$.
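The set of $\omega$ such that $X_t(\omega) \in A_t(\omega)$ is measurable. Let indeed d be a metric defining the topology of $\overline{\mathbb{R}}$; the condition $X_t(\omega) \in A_t(\omega)$ is equivalent to $\forall n > 0$, $\forall m > 0$, $\exists u \in [t, t+1/n[ \cap D$, $d(X_t(\omega), X_u(\omega))$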
0, 0s_(w) = s} u A (1), which is measurable. Taking their union, the closure A of A is measurable. Progressivity follows immediately: what we have done on [O,oo[ x ~ carries over to [O,t[ x~ using an increasing bijection of [O,oo[ onto [O,U, and it follows that the closure of A n ([O,t[ x ~) in [O,t[ x ~ is measurable relative to the a-field ~([O,t[) x ~t. But then the remarks of no. 14 imply that A is progressive with respect to the family ift + = 3't. Here is a stronger version of 32
(1) The addition of A is necessary only to take account of the instant t = 0.
33 THEOREM. With the same hypotheses on the space and family of σ-fields as in no. 32, let $(X_t)$ be a real-valued progressive process. Then the following processes are progressive:
(a) $X^*_t = \sup_{s \le t} X_s$;
(b) $Y^+_t = \limsup_{s \downarrow t} X_s$, $Y^-_t = \liminf_{s \downarrow t} X_s$ and the analogous processes on the left-hand side;
(c) $Z^+_t = \limsup_{s \downarrow\downarrow t} X_s$, $Z^-_t = \liminf_{s \downarrow\downarrow t} X_s$ and the analogous processes on the left-hand side.
hand side. Proof: (a) We write LO = - 00 Lt = sup X for t > O. As X* = Lt v X ' it suffices , s a} is the projection on ~ of the (~(R+) x ~t)-measurable set {(s,w) : s < t, Xs(w) So it is
~t-analytic
and hence
~t-measurable,
-+
>
~t
-+
a}. being complete -+
-+
(b) and (c): we deal only with $Y^+$ and $Z^+$. As $Y^+_t = X_t \vee Z^+_t$, it suffices to consider only the process $Z^+$. But we note that $Z^+_t(\omega) \ge a$ if and only if, for all $\varepsilon > 0$, t is a right cluster point of $\{s : X_s(\omega) > a - \varepsilon\}$. Then we denote by A the set $\{(s,\omega) : X_s(\omega) > a - \varepsilon\}$ and return to the discussion of no. 32: we have seen that the set of right cluster points of A can be written as $\{(s,\omega) : D_s(\omega) = s\}$ and that it is measurable (and progressive by the end of the proof of 32). It follows that $(Z^+_t)$ itself is progressive. One can say more about the processes constructed above from left-hand limits, but we must wait until the predictable σ-field has been defined. REMARK. Contrary to what happens in the theory of separability, we have no general method to compute probabilities relative to the processes defined in no. 33. Compare the following result to Theorem 18.
34 THEOREM. Let $(\Omega, \mathcal{F})$ be a measurable space and let $(X_t)$ be a measurable process with values in a separable metrizable space E. Let $\Omega_{rl}$, $\Omega_r$, $\Omega_c$ be the subsets of $\Omega$ consisting of the $\omega$ whose paths are r.c.l.l., resp. right continuous, continuous. If E is cosouslin, the complements of these three sets are $\mathcal{F}$-analytic (and hence belong to the universal completion σ-field of $\mathcal{F}$).
Proof: We deal for example with $\Omega_r$. We choose a countable dense subset D of $\mathbb{R}_+$ and return to the proof of 18: E is imbedded in the cube $I = [0,1]^{\mathbb{N}}$, to which we adjoin an isolated point a. As in no. 18, we define $X_{t+}(\omega) = \lim_{s \downarrow t,\ s \in D} X_s(\omega)$ if this limit exists in I, $X_{t+}(\omega) = a$ otherwise.
This process is measurable. To say that the path $X_\cdot(\omega)$ is right continuous on $\mathbb{R}_+$ amounts to saying that $X_t(\omega) = X_{t+}(\omega)$ for all t. Thus, $\Omega_r^c$ is the projection of the $(\mathcal{B}(\mathbb{R}_+) \times \mathcal{F})$-measurable set $\{(t,\omega) : X_t(\omega) \ne X_{t+}(\omega)\}$, hence it is $\mathcal{F}$-analytic.

Almost-equivalence, almost-modifications

Nos. 35 to 45 can be omitted at a first reading; 39-45 will not be used later in this book.

35 We consider on a probability space $(\Omega, \mathcal{F}, P)$ a process $(X_t)_{t \in \mathbb{R}_+}$ taking its values in a metrizable separable space E. We assume that the mapping $(t,\omega) \mapsto X_t(\omega)$ of $\mathbb{R}_+ \times \Omega$ into E is measurable, where $\mathbb{R}_+ \times \Omega$ is given the completed σ-field of $\mathcal{B}(\mathbb{R}_+) \times \mathcal{F}$ with respect to the measure $dt \otimes dP(\omega)$, a property which we express by saying $(X_t)$ is Lebesgue measurable. Throughout this section we adopt the point of view of no. 5, according to which we do not have access to the r.v. $X_t$ themselves, but only to functions on $\Omega$ of the form
(35.1) $M^X_\phi(\omega, g) = \int_0^\infty g(X_t(\omega))\,\phi(t)\,dt$
where g is Borel on E and positive or bounded and ¢ is Borel, positive and integrable on R+. Measurability in the Lebesgue sense guarantees both the existence of integrals (35.1) and the fact that the functions M;(. ,g) are r.v. on the completed space (~,j). DEFINITION. Two processes (Xt)t
R' (Yt)t +
E
R with values in E (defined on possiE
+
bly different probability spaces and Lebesgue measurable, are said to be almostequivalent if, for every finite system of pairs (¢i,gi) (1 ~ i ~ n ; c} is negligible relative to Lebesgue measure. This definition require f to be measurable. We do not change ess sup f(s) by modifying f sd of measure zero. As ess sup f(s) is an increasing function of I, we can following definitions. sEI
DEFINITION. Let f be a real-valued function on R, measurable or not, and let fER. We define (1) (36.1)
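$\operatorname{ess\,lim\,sup}_{s \downarrow\downarrow t} f(s) = \lim_{\varepsilon \downarrow\downarrow 0} \operatorname{ess\,sup}_{t<s<t+\varepsilon} f(s)$ 0, but their limit as $\varepsilon \to 0$ isn't D.

σ-fields associated with a stopping time

52 DEFINITION. Let T be a stopping time of a filtration $(\mathcal{F}^\circ_t)$ on $\Omega$. The σ-field of events prior to T, denoted $\mathcal{F}^\circ_T$, consists of all events $A \in \mathcal{F}^\circ_\infty$ such that
(52.1) for all t, $A \cap \{T \le t\}$ belongs to $\mathcal{F}^\circ_t$.
When T is a positive constant r, $\mathcal{F}^\circ_T$ is the σ-field $\mathcal{F}^\circ_r$ of the filtration; hence the notation and name are reasonable. We shall not attempt to give an intuitive justification of (52.1) - the justifications will not be lacking later. Let us set $\mathcal{G}^\circ_t = \mathcal{F}^\circ_{t+}$ for all t ($\mathcal{G}^\circ_{0-} = \mathcal{F}^\circ_{0-}$, $\mathcal{G}^\circ_\infty = \mathcal{F}^\circ_\infty$) and let T be a stopping time of $(\mathcal{G}^\circ_t)$, that is, a wide sense stopping time of $(\mathcal{F}^\circ_t)$ (49). Then an event A belongs to $\mathcal{G}^\circ_T$ if and only if it belongs to $\mathcal{G}^\circ_\infty = \mathcal{F}^\circ_\infty$ and
(52.2) for all t, $A \cap \{T < t\}$ belongs to $\mathcal{F}^\circ_t$.
It is quite natural to denote this σ-field $\mathcal{G}^\circ_T$ by the notation $\mathcal{F}^\circ_{T+}$.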
The following proposition is a mere reformulation of 52. However we introduce an operation on stopping times and some notation, which we will use often later on.
53
THEOREM. Let T be optional relative to $(\mathcal{F}^\circ_t)$. Then A belongs to $\mathcal{F}^\circ_T$ if and only if A belongs to $\mathcal{F}^\circ_\infty$ and the random variable $T_A$ defined by
(53.1) $T_A(\omega) = T(\omega)$ if $\omega \in A$, $T_A(\omega) = +\infty$ if $\omega \notin A$
is optional. We now wish to define, for every stopping time, a σ-field $\mathcal{F}^\circ_{T-}$. It would be tempting to introduce, as in the remark of no. 52, the family of σ-fields $\mathcal{G}^\circ_t = \mathcal{F}^\circ_{t-}$ and to set $\mathcal{F}^\circ_{T-} = \mathcal{G}^\circ_T$ for every stopping time T of this family. This definition would be useless: it happens frequently that $(\mathcal{F}_t)$ satisfies the usual conditions and further $\mathcal{F}_t = \mathcal{F}_{t-}$ for all t. The above definition would then lead us to $\mathcal{F}_T = \mathcal{F}_{T-}$ for every stopping time T, while the distinction between "past" and "strict past" turns out to be important, even for stopping times of such families. The correct definition has been given by Chung and Doob:
54 DEFINITION. Let T be a stopping time of $(\mathcal{F}^\circ_t)$. The σ-field of events strictly prior to T, denoted $\mathcal{F}^\circ_{T-}$, is the σ-field generated by $\mathcal{F}^\circ_{0-}$ and the events of the form
(54.1) $A \cap \{t < T\}$, $t \ge 0$, $A \in \mathcal{F}^\circ_t$.
The reader will verify that $\mathcal{F}^\circ_{T-}$ is also generated by the following (less convenient) sets
(54.2) $A \cap \{t \le T\}$, $t \ge 0$, $A \in \mathcal{F}^\circ_{t-}$.
This definition is meaningful for every r.v. $T \ge 0$ (cf. (68.1)). If we write $\mathcal{G}^\circ_t = \mathcal{F}^\circ_{t+}$ for all t, $\mathcal{G}^\circ_{0-} = \mathcal{F}^\circ_{0-}$, then $\mathcal{G}^\circ_{T-} = \mathcal{F}^\circ_{T-}$ for all T.
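The characterization of events prior to T through the restricted time $T_A$ can be checked by brute force in a toy setting. The sketch below is only an illustration under assumptions that are not in the text: time is discrete, the sample space consists of the 8 outcomes of three coin flips, and the σ-field at time t is the one generated by the first t flips. All names (`OMEGA`, `in_F_t`, `in_F_T`, `restrict`, `first_head`) are hypothetical.

```python
from itertools import product
import math

HORIZON = 3
# Toy sample space (an assumption of this sketch): all sequences of 3 coin flips.
OMEGA = list(product((0, 1), repeat=HORIZON))

def in_F_t(event, t):
    """True if membership in `event` is decided by the first t flips."""
    decided = {}
    for w in OMEGA:
        flag = w in event
        if decided.setdefault(w[:t], flag) != flag:
            return False
    return True

def is_stopping_time(tau):
    """{tau <= t} must belong to the sigma-field of time t, for every t."""
    return all(in_F_t({w for w in OMEGA if tau(w) <= t}, t)
               for t in range(HORIZON + 1))

def in_F_T(event, tau):
    """Discrete analogue of (52.1): event ∩ {tau <= t} is known at time t."""
    return all(in_F_t({w for w in event if tau(w) <= t}, t)
               for t in range(HORIZON + 1))

def restrict(tau, event):
    """The time T_A of (53.1): tau on the event, +infinity off it."""
    return lambda w: tau(w) if w in event else math.inf

def first_head(w):
    """First time n (1-based) at which a head (1) shows, +infinity if none."""
    for n, x in enumerate(w, start=1):
        if x == 1:
            return n
    return math.inf

# An event known at time T, and one that looks at the third flip.
A_GOOD = {w for w in OMEGA if first_head(w) == 2}
A_BAD = {w for w in OMEGA if w[2] == 1}

# Every one of the 2**8 events of the toy space.
ALL_EVENTS = [{w for i, w in enumerate(OMEGA) if (mask >> i) & 1}
              for mask in range(2 ** len(OMEGA))]
```

In this finite model the two conditions coincide for every event, because $\{T_A \le t\} = A \cap \{T \le t\}$; the exhaustive comparison over `ALL_EVENTS` merely makes that identity visible.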
ELEMENTARY PROPERTIES OF STOPPING TIMES

In the statements which follow, the r.v. are all defined on the same space $(\Omega, \mathcal{F}^\circ, P)$ and the stopping times relative to the same filtration $(\mathcal{F}^\circ_t)$, unless otherwise mentioned.

55 THEOREM (closure properties). (a) Let S and T be two stopping times; then $S \wedge T$ and $S \vee T$ are stopping times. (b) Let $(S_n)$ be an increasing sequence of stopping times. Then $S = \lim_n S_n$ is a stopping time. (c) Let $(S_n)$ be a decreasing sequence of stopping times. Then $S = \lim_n S_n$ is a stopping time of the family $(\mathcal{F}^\circ_{t+})$ - a stopping time of $(\mathcal{F}^\circ_t)$ if the sequence is stationary, i.e. if for all $\omega$ there exists an integer n such that $S_m(\omega) = S_n(\omega)$ for all $m \ge n$.
Proof: (a) $\{S \wedge T \le t\} = \{S \le t\} \cup \{T \le t\}$ belongs to $\mathcal{F}^\circ_t$; similarly for $\vee$. (b) $\{S \le t\} = \bigcap_n \{S_n \le t\}$ belongs to $\mathcal{F}^\circ_t$. (c) $\{S < t\} = \bigcup_n \{S_n < t\}$ belongs to $\mathcal{F}^\circ_t$; when the sequence is stationary then also $\{S \le t\} = \bigcup_n \{S_n \le t\}$. We deduce immediate consequences: the set of stopping times is closed under ($\vee$c) and the set of stopping times of $(\mathcal{F}^\circ_{t+})$ (wide sense stopping times) is closed under the operations ($\vee$c, $\wedge$c), and also under countable lim inf and lim sup.
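Part (a) of the closure properties can be tried out on a small example. The following is a minimal sketch under assumptions that do not come from the text: discrete time, a sample space of three coin flips, and the filtration generated by the flips observed so far. The function names are hypothetical. It also exhibits a random time (the time of the last head) that is not a stopping time, for contrast.

```python
from itertools import product
import math

HORIZON = 3
# Toy sample space (an assumption of this sketch): all sequences of 3 coin flips.
OMEGA = list(product((0, 1), repeat=HORIZON))

def is_stopping_time(tau):
    """For every t, the event {tau <= t} must depend only on the first t flips."""
    for t in range(HORIZON + 1):
        seen = {}
        for w in OMEGA:
            flag = tau(w) <= t
            if seen.setdefault(w[:t], flag) != flag:
                return False
    return True

def first_head(w):
    """First time (1-based) a head (1) shows, +infinity if there is none."""
    return next((n for n, x in enumerate(w, start=1) if x == 1), math.inf)

def first_tail(w):
    """First time (1-based) a tail (0) shows, +infinity if there is none."""
    return next((n for n, x in enumerate(w, start=1) if x == 0), math.inf)

def last_head(w):
    """Time of the last head: needs knowledge of the future flips."""
    heads = [n for n, x in enumerate(w, start=1) if x == 1]
    return heads[-1] if heads else math.inf

def minimum(w):
    # S ∧ T: {S ∧ T <= t} is the union of {S <= t} and {T <= t}
    return min(first_head(w), first_tail(w))

def maximum(w):
    # S ∨ T: {S ∨ T <= t} is the intersection of {S <= t} and {T <= t}
    return max(first_head(w), first_tail(w))
```

Here `minimum` is identically 1 (the first flip is a head or a tail), while `maximum` is the first time both faces have been seen; both pass the check, whereas `last_head` fails it.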
56
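THEOREM (events prior to stopping times). (a) For every stopping time S, $\mathcal{F}^\circ_{S-} \subset \mathcal{F}^\circ_S$ and S is $\mathcal{F}^\circ_{S-}$-measurable. (b) Let S and T be two stopping times such that $S \le T$. Then $\mathcal{F}^\circ_S \subset \mathcal{F}^\circ_T$, $\mathcal{F}^\circ_{S-} \subset \mathcal{F}^\circ_{T-}$ and, if $S < T$ everywhere, $\mathcal{F}^\circ_S \subset \mathcal{F}^\circ_{T-}$. (c) Let S and T be two stopping times. Then
(56.1) for all $A \in \mathcal{F}^\circ_S$, $A \cap \{S \le T\}$ belongs to $\mathcal{F}^\circ_T$,
(56.2) for all $A \in \mathcal{F}^\circ_S$, $A \cap \{S < T\}$ belongs to $\mathcal{F}^\circ_{T-}$.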
{S ~ T}, {S = T} belong to ~~ and J~ and {S
the generators (54.1) of &~_. (c) Property (56.1) follows from the following equality. true
j~, i.e. satisfies S}, r ~ 0, A E ~~.
t} appears among for all t
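$A \cap \{S \le T\} \cap \{T \le t\} = [A \cap \{S \le t\}] \cap \{T \le t\} \cap \{S \wedge t \le T \wedge t\}$. If A is $\mathcal{F}^\circ_S$-measurable, the three events appearing on the right are in $\mathcal{F}^\circ_t$. Property (56.2) follows from the following equality, where r runs through the rationals: $A \cap \{S < T\} = \bigcup_r (A \cap \{S < r\}) \cap \{r < T\}$.
If A belongs to $\mathcal{F}^\circ_S$, the events appearing on the right belong to the generating system (54.1) of $\mathcal{F}^\circ_{T-}$. (b) If $S \le T$ and $A \in \mathcal{F}^\circ_S$, then $A = A \cap \{S \le T\} \in \mathcal{F}^\circ_T$; if $S < T$ everywhere, $A = A \cap \{S < T\}$ belongs to $\mathcal{F}^\circ_{T-}$. Finally, it suffices to verify that the generators (54.1) of $\mathcal{F}^\circ_{S-}$ belong to $\mathcal{F}^\circ_{T-}$ if $S \le T$; but, if $B \in \mathcal{F}^\circ_t$, then $B \cap \{t < S\} = (B \cap \{t < S\}) \cap \{t < T\}$ and $B \cap \{t < S\} \in \mathcal{F}^\circ_t$. (d) In the case of an increasing sequence, $\bigvee_n \mathcal{F}^\circ_{S_n-} \subset \mathcal{F}^\circ_{S-}$ by (b). On the other hand, every generator (54.1) of the σ-field $\mathcal{F}^\circ_{S-}$ can be written as $A \cap \{t < S\} = \bigcup_n A \cap \{t < S_n\}$ with $t \ge 0$, $A \in \mathcal{F}^\circ_t$, and we see that it belongs to $\bigvee_n \mathcal{F}^\circ_{S_n-}$. In the case of a decreasing sequence, we recall that $\mathcal{F}^\circ_{S+}$ is the set of $A \in \mathcal{F}^\circ_\infty$ such that $A \cap \{S < t\} \in \mathcal{F}^\circ_t$ for all t. As $A \cap \{S < t\} = \bigcup_n A \cap \{S_n < t\}$, we see that $\mathcal{F}^\circ_{S+} \supset \bigcap_n \mathcal{F}^\circ_{S_n+}$ and the converse inclusion follows from (b). (e) The set of all $A \in \mathcal{F}^\circ_\infty$ such that $A \cap \{S = \infty\}$ belongs to $\mathcal{F}^\circ_{S-}$ is a σ-field. Hence it suffices to verify that it contains $\mathcal{F}^\circ_t$ for all t. Now if A belongs to $\mathcal{F}^\circ_t$, $A \cap \{n < S\}$ is a generator (54.1) of $\mathcal{F}^\circ_{S-}$ for all $n \ge t$ and hence $A \cap \{S = \infty\}$ belongs to $\mathcal{F}^\circ_{S-}$. Finally, the case of $\mathcal{F}^\circ_S$ is trivial.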
REMARK. (a) The reader may begin to perceive one of the great principles of this theory: to extend to stopping times T and σ-fields $\mathcal{F}^\circ_T$ all that is known for constant times t and σ-fields $\mathcal{F}^\circ_t$. Thus (a) and (b) are the extension to stopping times of the monotonicity of $(\mathcal{F}^\circ_t)$; (c) is the extension to arbitrary pairs of stopping times of properties (52.2) and (54.1), relating to pairs consisting of a stopping time and a constant, and (d) is the extension to stopping times of the continuity properties of the families $(\mathcal{F}^\circ_{t-})$ and $(\mathcal{F}^\circ_{t+})$. (b) It is not true in general that for all $A \in \mathcal{F}^\circ_{S-}$, $A \cap \{S \le T\}$ belongs to $\mathcal{F}^\circ_{T-}$. We shall see in no. 72, (c) the correct extension of (54.2). (c) Let S and T be two stopping times. Then we have
$\mathcal{F}_{S \wedge T} = \mathcal{F}_S \cap \mathcal{F}_T$, $\mathcal{F}_{S \vee T} = \mathcal{F}_S \vee \mathcal{F}_T$.
Indeed, if A belongs to $\mathcal{F}_S$ and to $\mathcal{F}_T$, $A \cap \{S \le S \wedge T\}$ and $A \cap \{T \le S \wedge T\}$ belong to $\mathcal{F}_{S \wedge T}$ according to (56.1); taking their union, A belongs to $\mathcal{F}_{S \wedge T}$, so that $\mathcal{F}_S \cap \mathcal{F}_T \subset \mathcal{F}_{S \wedge T}$. The reverse inclusion is obvious. Similarly, if A belongs to $\mathcal{F}_{S \vee T}$, $A \cap \{S \vee T \le S\}$ and $A \cap \{S \vee T \le T\}$ belong to $\mathcal{F}_S \vee \mathcal{F}_T$; taking unions the same is true for A. The reader may show also that $\mathcal{F}_{(S \vee T)-} = \mathcal{F}_{S-} \vee \mathcal{F}_{T-}$, and that $\mathcal{F}_{(S \wedge T)-} = \mathcal{F}_{S-} \cap \mathcal{F}_{T-}$ if S and T are predictable.
THEOREM. (a) Let S be a stopping time of the family (~~) and T be an ~~-measurab1e r.v. such that S ~ T. Then T is a stopping time of (J~). The same conclusion holds if S is a stopping time of (~~+), T is ~~+-measurab1e S < T on {S < oo}.
57
This applies in particular to the r.v. $T = S + t$ ($t > 0$) and to (57.1). (b) Suppose that the family $(\mathcal{F}^\circ_t)$ is right continuous. Let S be a stopping time, $(\mathcal{G}^\circ_t)$ the family $(\mathcal{F}^\circ_{S+t})$ and T a $\mathcal{G}^\circ_\infty$-measurable positive r.v. Then $U = S + T$ is a stopping time of $(\mathcal{F}^\circ_t)$ if and only if T is a stopping time of $(\mathcal{G}^\circ_t)$ (1).
Proof: (a) For all u, $\{T \le u\} = \{T \le u\} \cap \{S \le u\}$. As $\{T \le u\}$ belongs to $\mathcal{F}^\circ_S$, this belongs to $\mathcal{F}^\circ_u$ by definition of $\mathcal{F}^\circ_S$, and T is a stopping time. If $S < T$ on $\{S < \infty\}$, we can replace $\{S \le u\}$ by $\{S < u\}$ in the argument; details are left to the reader. (b) Suppose that T is a stopping time of $(\mathcal{G}^\circ_t)$. We write that $\{U < t\} = \bigcup_b \{S + b < t\} \cap \{T < b\}$, the union being taken on all rationals $b < t$. But $\{T < b\} \in \mathcal{G}^\circ_b = \mathcal{F}^\circ_{S+b}$, hence, by definition of $\mathcal{F}^\circ_{S+b}$, $\{T < b\} \cap \{S + b < t\} \in \mathcal{F}^\circ_t$ and $\{U < t\} \in \mathcal{F}^\circ_t$. Since the family $(\mathcal{F}^\circ_t)$ is right continuous, U is a stopping time of it. Conversely, we suppose for simplicity that S is finite. Then if U is a stopping time of $(\mathcal{F}^\circ_t)$, $\{T \le t\} = \{S + T \le S + t\} \in \mathcal{F}^\circ_{S+t} = \mathcal{G}^\circ_t$ by (56.1) and T is indeed a stopping time of $(\mathcal{G}^\circ_t)$. 58
COROLLARY. Every stopping time S of the family (~~+) is the limit of a decreasing sequence of discrete stopping times of the family (~~) and can also be represented as the lower envelope of an (in general non-decreasing) sequence of stopping times T of the following type (58.1)
T=a.I
A
+ (-f 2- } (n ~ 0), then Vo = UO' Vn = Un\U n_1 (n > 0) ; the sets Vn are optional (predicatable if so is (X t )) and disjoint. Ne~t we set 01(w) = inf{t : (t,w) n
€
so that Oi is the i-th jump n (X t ) has r.c.l.l. paths, Vn enumerate all points of Vn . according to 87d). Finally, an ordinary sequence (Tn)'
V} , ok+1(w) = inf{t n n
Ok(w) : (t,w) € Vn} n 1 of (X t ) whose size lies between 2- n and 2- n+ . Since. 1 has no finite cluster point, and the . stopping times 0n If (X t ) is predictable, then the 01n are predictable . it only remains to reorder the double sequence O~ into >
REMARK. The conclusion still holds for a process taking values in a metrizable separable space E : one just imbeds E into [0,1J~ and applies the statement to each coordinate process, then the procedure at the beginning of the proof of 88 to turn the stopping times into disjoint ones. The same remark (with the same argument) applies to the following application of 888, which interrupts our discussion of sets with countable sections. THEOREM. Let X = (X t ) be a real-valued adapted r.c.l.l. process. Then X is predictable if and only if the following two conditions are satisfied. 1) For every totally inaccessible stopping time T, XT and XT_ are a.s. equal on {T
~
105
T ;r------S
Proof: We shall deal only with stopping times of $(\mathcal{F}^\circ_t)$. If S satisfies the conditions of the theorem, then $\{S \le t\} \in \mathcal{F}^\circ_t$ for all t. Indeed, if $t < s$, the set $\{S \le t\}$ is contained in $\{T < t\}$, it is a Borel set of $]0,t[$ and belongs to $\mathcal{F}^\circ_t$; if $t \ge s$, the set $\{S \le t\}$ contains the whole of the atom $[t,\infty[$ and it also belongs to $\mathcal{F}^\circ_t$.
Conversely, let S be a stopping time of $(\mathcal{F}^\circ_t)$. If $S > T$ everywhere, there is nothing to prove. If there exists some $\omega$ such that $S(\omega) \le T(\omega)$, we take $s = S(\omega) \le T(\omega) = \omega$. The set $\{S = s\}$ belongs to $\mathcal{F}^\circ_s$ and contains $\omega$, which belongs to the atom $[s,\infty[$; hence it contains the whole of the atom $[s,\infty[$, thus giving condition (b). Now let $\omega' < s$. If it were true that $s' = S(\omega') \le T(\omega') = \omega'$, it would follow that $s' < s$. The set $\{S = s'\} \in \mathcal{F}^\circ_{s'}$ would contain the point $\omega'$ of the atom $[s',\infty[$, hence the whole of the atom, and we would have $S = s'$ on $[s,\infty[$ whereas we have $S = s$ there and $s \ne s'$. Consequently, $S(\omega') > T(\omega')$ and condition (b) is proved. THEOREM. Every stopping time of $(\mathcal{F}^\circ_t)$ is predictable.
106
Proof: Let S be a stopping time of (t~) and s the constant associated with it by 105. If s = 0, then S = 0 and hence S is predictable; suppose that s > 0 and let (sn) be a sequence of positive reals such that sn#s. Then S is foretold by the sequence (Sn) of stopping times of (~~) defined by Sn = n A ((1 - ~)S + ~) on {T < n A sn} Sn = n
A
sn on {T
~
n
A
sn}.
We now give $(\Omega, \mathcal{F}^\circ)$ a probability law P. We denote by $\mathcal{F}$ the completed σ-field of $\mathcal{F}^\circ$ and by $(\mathcal{F}_t)$ the usual augmentation of $(\mathcal{F}^\circ_t)$. 107 THEOREM. If the law P is diffuse, the filtration $(\mathcal{F}_t)$ is quasi-left-continuous and T is totally inaccessible. If the law P is purely atomic and non-degenerate, T is a non-predictable accessible time. Proof: We begin with the second assertion. The law P is carried by a countable set D and the graph of T is then P-a.s. contained in the union of the graphs $[t]$, $t \in D$. Since constants are predictable times, T is accessible (81). Suppose that P is non-degenerate. Then P is non-zero at two distinct points u and v such that $u < v$. If T were predictable, there would exist a predictable time S of $(\mathcal{F}^\circ_t)$ such that $S = T$ a.s. and hence $S(u) = T(u)$ and $S(v) = T(v)$. By 105 this implies $u = v$, which is absurd. Suppose that P is diffuse. Since every set of the form $\{t\}$ is negligible, the σ-fields obtained by adjoining all the negligible sets to $\mathcal{F}^\circ_{t-} = \mathcal{F}^\circ_t$ and to $\mathcal{F}^\circ_{t+}$ are equal, so that $(\mathcal{F}_t)$ is simply the completed filtration of $(\mathcal{F}^\circ_t)$. We first show that T is totally inaccessible. Let S be a stopping time of the family $(\mathcal{F}_t)$ such that $S \le T$. Then there exists a stopping time R of the family $(\mathcal{F}^\circ_{t+})$ such that $S = R$ a.s. (59) and, replacing R by $R \wedge T$ if necessary, we can assume that $R \le T$ everywhere. But then, by 105, R is of the form $T \wedge s$ and $S = T \wedge s$ a.s. Now let $(S_n)$ be an increasing sequence of stopping times bounded above by T; each one is a.s. of the form $T \wedge s_n$, the $s_n$ increase to a number s and the set $\{\lim_n S_n = T,\ S_n < T \text{ for all } n\}$ is a.s. contained in $\{T = s\}$, which is negligible because P is diffuse. Finally we show that the family $(\mathcal{F}_t)$ is quasi-left-continuous. Let U be a predictable stopping time of $(\mathcal{F}_t)$ and A be an element of $\mathcal{F}_U$; we show that A belongs to $\mathcal{F}_{U-}$. Since all the negligible sets belong to $\mathcal{F}_{0-}$, it is sufficient to show that A is a.s. equal to an element of $\mathcal{F}_{U-}$. But $U_A$ is a stopping time of $(\mathcal{F}_t)$ and hence is equal a.s. to a stopping time V of $(\mathcal{F}^\circ_{t+})$ (59). As U is predictable and T totally inaccessible, $P\{U = T\} = 0$ and hence, replacing V by $V_{\{V \ne T\}}$ if necessary, we can assume that V is nowhere equal to T. Then we see by 105 that V is a stopping time of $(\mathcal{F}^\circ_t)$ and hence a predictable time and A is a.s. equal to the set $(A \cap \{U = \infty\}) \cup \{V = U\}$, which belongs to $\mathcal{F}_{U-}$. 108 REMARKS.
(a) If P is diffuse, the completed filtration $(\mathcal{F}_t)$ satisfies $\mathcal{F}_{t-} = \mathcal{F}_t = \mathcal{F}_{t+}$ for all t, so that every stopping time of $(\mathcal{F}_t)$ still is a stopping time of $(\mathcal{F}_{t-})$ without implying that every stopping time of $(\mathcal{F}_t)$ is predictable. (b) Suppose that P is diffuse and that the greatest lower bound of the support of P is 0. We have seen that every stopping time $S \le T$ is a.s. equal to $T \wedge s$, where s is a constant. Hence $S < T$ a.s. only if $S = 0$ a.s. In other words, the optional set $]0,T[$ does not possess a complete, or even almost-complete, optional cross-section.
APPENDIX TO CHAPTER III
The results below aren't less interesting than those of the main text: they just lack (for the moment) important applications either to measure theory or to potential theory. One should bear in mind that the results of Chapter III, or even those in Chapter IX of Bourbaki's General Topology, are only the lower stages of the descriptive theory of sets, developed by Polish and Russian mathematicians and then by modern logicians to incredible heights. The last results in the appendix indicate the next stage above. The numbering follows that of Chapter III.

Souslin schemes

In probability theory analytic sets appear naturally as projections of Borel sets, whence the definition which we have adopted in the text. But the oldest (and still most used) definition is that of Souslin. We prove it here to be equivalent to the definition in the text, and to that given in the first edition of our book. We denote by S the set of finite sequences of integers and by $\Sigma$ the set $\mathbb{N}^{\mathbb{N}}$ of infinite sequences of integers: if $\mathbb{N}$ is given the discrete topology and $\Sigma$ the product topology, $\Sigma$ is a Polish space and a $G_\delta$ of the compact metrizable space $(\mathbb{N} \cup \{\infty\})^{\mathbb{N}}$. We denote by $|s|$ the length of a finite sequence $s \in S$. The notation $s \prec t$, where $s \in S$ and $t \in S$ or $t \in \Sigma$, means that t begins with s: for example s = 3,1,4 and t = 3,1,4,1,6. Finally, for $\sigma \in \Sigma$, we denote the n-th term of $\sigma$ by $\sigma(n)$ and the finite sequence $\sigma(1), \ldots, \sigma(n)$ by $\sigma|n$, with analogous notation $s(n)$ and $s|n$ if $s \in S$ and $n \le |s|$.
75 DEFINITION. Let $(F, \mathcal{F})$ be a paved space. A Souslin scheme on $\mathcal{F}$ is a mapping $s \mapsto B_s$ of S into $\mathcal{F}$. The kernel of the Souslin scheme is the set
$B = \bigcup_\sigma \bigcap_{s \prec \sigma} B_s = \bigcup_\sigma \bigcap_n B_{\sigma|n}$.
We say also that B is the result of the Souslin operation (A) applied to the determining system $(B_s)_{s \in S}$. The Souslin scheme is said to be regular if $B_s \supset B_t$ for $s \prec t$. If $\mathcal{F}$ is closed under ($\cap$f), the regular Souslin scheme $B'_s = \bigcap_{r \prec s} B_r$ is also a scheme on $\mathcal{F}$ and still has the same kernel B; hence there is no loss of generality in considering only regular schemes in this case.
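As a small worked illustration of the kernel formula (not taken from the text; the sets $C_k$ below stand for an arbitrary sequence of elements of the paving), countable unions and countable intersections are both special cases of the Souslin operation: it suffices to let $B_s$ depend only on the first term of s, respectively only on the length of s.

```latex
% Countable union: B_s depends only on the first term of s.
B_s = C_{s(1)} \quad\Longrightarrow\quad
\bigcup_{\sigma}\bigcap_{n} B_{\sigma|n}
  = \bigcup_{\sigma} C_{\sigma(1)}
  = \bigcup_{k} C_k ,
% Countable intersection: B_s depends only on the length of s.
\qquad
B_s = C_{|s|} \quad\Longrightarrow\quad
\bigcup_{\sigma}\bigcap_{n} B_{\sigma|n}
  = \bigcup_{\sigma}\bigcap_{n} C_n
  = \bigcap_{n} C_n .
```

In the first case the inner intersection is constant in n; in the second it does not depend on $\sigma$, so the outer union is over identical sets.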
We first show that every $\mathcal{F}$-analytic set, even every $\mathcal{F}$-analytic set in a slightly more general sense (see the first edition of this book, page 34), is the kernel of a Souslin scheme on $\mathcal{F}$.
76 THEOREM. Let $(E, \mathcal{E})$ and $(F, \mathcal{F})$ be two paved sets, the paving $\mathcal{E}$ being semicompact. Let B be an element of $(\mathcal{E} \times \mathcal{F})_{\sigma\delta}$ and A be its projection onto F. Then A is the kernel of a Souslin scheme on $\mathcal{F}$.
Proof: By definition, B can be written in the form
$B = \bigcap_n \bigcup_m (E_{nm} \times F_{nm})$, $E_{nm} \in \mathcal{E}$, $F_{nm} \in \mathcal{F}$.
We permute the operations of union and intersection, getting
$B = \bigcup_\sigma \bigcap_n (E_{n\sigma(n)} \times F_{n\sigma(n)})$.
Then for every $s \in S$ of length $|s| = n$ we write $A_s = F_{ns(n)}$ if $\bigcap_{m \le n} E_{ms(m)} \ne \emptyset$ and $A_s = \emptyset$ otherwise. $\pi$ denoting projection onto F, we have
$\pi\bigl(\bigcap_{1 \le n \le N} E_{n\sigma(n)} \times F_{n\sigma(n)}\bigr) = \bigcap_{1 \le n \le N} A_{\sigma|n}$.
Since the paving $\mathcal{E}$ is semicompact, by 6
$\pi\bigl(\bigcap_n E_{n\sigma(n)} \times F_{n\sigma(n)}\bigr) = \bigcap_n A_{\sigma|n}$
and taking the union over $\sigma$ we get the required result $A = \pi(B) = \bigcup_\sigma \bigcap_n A_{\sigma|n}$.
77
We pass to the converse. We even establish a stronger result: every kernel A of a Souslin scheme $(A_s)$ on $\mathcal{F}$ is the projection of an element of $(\mathcal{K} \times \mathcal{F})_{\sigma\delta}$, where $\mathcal{K}$ is the paving consisting of all compact subsets in the compact metrizable space $E = (\mathbb{N} \cup \{\infty\})^{\mathbb{N}}$. So there is no need in definition 7 to take "all" compact metrizable spaces as auxiliary sets; E is sufficient (1).
We introduce a convenient terminology: for all $s \in S$, the set $I_s = \{\sigma \in \Sigma : s \prec \sigma\}$ is called the islet of index s. Clearly $I_s \supset I_t$ if $s \prec t$, $I_s \subset I_t$ if $t \prec s$ and $I_s \cap I_t = \emptyset$ if s and t are not comparable. Islets are both open and closed and form a countable base of the topology of $\Sigma$.
THEOREM. Let A be the kernel of a Souslin scheme $(A_s)$ on $\mathcal{F}$. Then A is the projection onto F of the following subset of $\Sigma \times F$:
$B = \bigcap_n \bigcup_{|s|=n} (I_s \times A_s) = \bigcup_\sigma \bigcap_n (I_{\sigma|n} \times A_{\sigma|n}) = \bigcup_\sigma \bigl[\{\sigma\} \times \bigl(\bigcap_n A_{\sigma|n}\bigr)\bigr]$.
If $\Sigma \times F$ is considered as a subset of $E \times F$, B belongs to $(\mathcal{K} \times \mathcal{F})_{\sigma\delta}$.
Proof: The equality of the first and second expressions of B follows from the remarks on islets preceding the statement; the equality of the second and third expressions is obvious. It follows immediately from the third expression that $\pi(B) = A$.
(1) The possibility of using a unique auxiliary compact set is not a great discovery, since all compact metrizable spaces can be imbedded in $[0,1]^{\mathbb{N}}$.
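Let $\bar{I}_s$ be the closure of $I_s$ in E; since $I_s$ is closed in $\Sigma$, $I_s = \bar{I}_s \cap \Sigma$ and $\bigcap_n \bigcup_{|s|=n} (I_s \times A_s)$ is the intersection of $\bigcap_n \bigcup_{|s|=n} (\bar{I}_s \times A_s)$ - which belongs to $(\mathcal{K} \times \mathcal{F})_{\sigma\delta}$ - with $\Sigma \times U$, where U is the union of all the $A_s$. As $\Sigma$ is a $G_\delta$ in E, $\Sigma$ is also a $\mathcal{K}_{\sigma\delta}$; similarly U is an $\mathcal{F}_\sigma$ and hence an $\mathcal{F}_{\sigma\delta}$. Finally, $\Sigma \times U$ is a $(\mathcal{K} \times \mathcal{F})_{\sigma\delta}$ and the theorem follows.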
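The remarks on islets (nested when the indices are comparable, disjoint otherwise) can be checked mechanically on truncated sequences. The sketch below is an illustration only: infinite sequences are replaced by sequences of fixed length 4 over the alphabet {1, 2}, which is an assumption made to keep the sets finite, and all names are hypothetical.

```python
from itertools import product

ALPHABET = (1, 2)   # stand-in for the integers (assumption of this sketch)
DEPTH = 4           # truncation length replacing infinite sequences
SEQS = list(product(ALPHABET, repeat=DEPTH))

def is_prefix(s, t):
    """s precedes t: the sequence t begins with s."""
    return len(s) <= len(t) and tuple(t[:len(s)]) == tuple(s)

def islet(s):
    """Truncated islet of index s: all length-DEPTH sequences beginning with s."""
    return {w for w in SEQS if is_prefix(s, w)}

def islet_relation(s, t):
    """Relation between the islets of s and t predicted by the prefix order."""
    if is_prefix(s, t):
        return "contains"       # I_s contains I_t
    if is_prefix(t, s):
        return "contained"      # I_s is contained in I_t
    return "disjoint"

def relation_holds(s, t):
    """Compare the prediction with the actual truncated islets."""
    a, b = islet(s), islet(t)
    rel = islet_relation(s, t)
    if rel == "contains":
        return a >= b
    if rel == "contained":
        return a <= b
    return a.isdisjoint(b)

# All finite indices of length 0 to 3.
INDICES = [s for n in range(DEPTH) for s in product(ALPHABET, repeat=n)]
```

For instance, `islet_relation((1, 2), (1, 2, 1))` is `"contains"`, while the indices `(1, 2)` and `(2,)` give disjoint islets.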
REMARK. We have seen that the $\mathcal{F}$-analytic sets are exactly those which are constructed from the Souslin operation (A). The result of 10, according to which $\mathcal{A}(\mathcal{A}(\mathcal{F})) = \mathcal{A}(\mathcal{F})$, therefore means that the kernel of a Souslin scheme on $\mathcal{A}(\mathcal{F})$ (which is itself the set of all kernels of Souslin schemes on $\mathcal{F}$) still is the kernel of some Souslin scheme on $\mathcal{F}$. The direct proof of this result (known as the "idempotency of the operation (A)") requires a certain amount of ingenuity: see for example Hausdorff's Mengenlehre [1].
Supplement on Souslin and Lusin spaces In no. 16 we defined a metrizable Souslin (resp. Lusin) space as any space homeomorphic to an analytic (resp. Borel) subspace of a compact metrizable space in no. 67 we extended this definition by saying that a Hausdorff space which is not necessarily metrizable is Souslin (resp. Lusin) if it is the continuous (resp. injective continuous) image of some metrizable Souslin (resp. Lusin) space. We now show that every analytic (resp. Borel) subset of a compact metrizable space is the continuous (resp. injective continuous) image of a Polish space. This will imply the equivalence between our definition and Bourbaki's (General Topology, Chapter IX, §6).
THEOREM. Let E be an analytic subset of a compact metric space F. Then there exists a continuous mapping of the Polish space $\Sigma = \mathbb{N}^{\mathbb{N}}$ onto E. Proof: Let $\mathcal{K}(F)$ denote the paving consisting of the compact subsets of F and let G be an auxiliary compact space, given the paving $\mathcal{K}(G)$ of its compact subsets, such that E is the projection onto F of a set $H \in (\mathcal{K}(G) \times \mathcal{K}(F))_{\sigma\delta}$:
$H = \bigcap_n \bigcup_m (A_{nm} \times B_{nm})$, $A_{nm} \in \mathcal{K}(G)$, $B_{nm} \in \mathcal{K}(F)$.
Let d be a distance defining the topology of F. Replacing the $B_{nm}$ if necessary by their intersections with suitable balls and modifying the indices m, we can suppose that all the compact sets $B_{nm}$ have diameter (with respect to d) at most 1/n. We then transform this representation as in no. 76: we get a representation of E as the kernel of a Souslin scheme $(E_s)$ on $\mathcal{K}(F)$ such that for all $s \in S$ the diameter of $E_s$ is at most $1/|s|$. But then for all $\sigma \in \Sigma$ the intersection $E_\sigma = \bigcap_n E_{\sigma|n}$ either reduces to a single point or is empty. In the first case we denote the unique element of $E_\sigma$ by $f(\sigma)$ and in the second case we set $f(\sigma) = x_0$, an arbitrarily chosen element of E. Clearly f is a mapping of $\Sigma$ onto E and we leave to the reader the task of showing
78
that it is continuous. 79 THEOREM. Let E be a Borel subset of a compact metrizable space. Then E is the injective continuous image of a closed (and hence Polish) subspace P of $\Sigma$. Proof: We can suppose that E is imbedded in $[0,1/2]^{\mathbb{N}}$; by imbedding the latter set in $F = [0,1[^{\mathbb{N}}$, the problem is reduced to establishing the theorem for all Borel sets of F. We shall use an argument by monotone classes. Let $A \subset F$; we define that $A \in \mathcal{P}$ if there exist a closed set P of $\Sigma$ and an injective continuous mapping f of P onto A, and show that $\mathcal{P}$ has good closure properties and is sufficiently rich. (a) $\mathcal{P}$ is closed under finite or countable intersections. We consider the countably infinite case: let $A_n$ be elements of $\mathcal{P}$ which are images of closed sets $P_n$ of $\Sigma$ under continuous bijections $f_n$. We identify F with the diagonal $\Delta$ of $F^{\mathbb{N}}$ using the diagonal mapping $x \mapsto (x,x,\ldots)$ and denote by $\phi$ the mapping $\prod_n f_n$ of $\prod_n P_n$ into $F^{\mathbb{N}}$ and by P the closed subset $\phi^{-1}(\Delta) \cap \prod_n P_n$ of $\Sigma^{\mathbb{N}}$. Clearly $f = \phi|_P$ is a continuous bijection of P onto $\bigcap_n A_n \subset F = \Delta$. Now P is a closed subset of $\Sigma^{\mathbb{N}}$, not of $\Sigma$; it matters little since $\Sigma^{\mathbb{N}} = \mathbb{N}^{\mathbb{N} \times \mathbb{N}}$ is homeomorphic to $\Sigma$, and so P can be considered as a subset of $\Sigma$ by transport of structure.
(b) The union of a finite or countable sequence of disjoint elements $A_n$ of $\mathcal{P}$ belongs to $\mathcal{P}$. We again consider the countably infinite case and define the $P_n$ and $f_n$ as above. We can immediately define a continuous bijection f of the topological sum P of the $P_n$ onto $\bigcup_n A_n$. Now P is a closed subset of the topological sum $\mathbb{N} \times \Sigma$ of a sequence of copies of $\Sigma$, and as above this is homeomorphic to $\Sigma$.

(c) Let $\mathcal{C}$ be the family of all subsets of F of the form $\prod_n I_n$, where for all n $I_n$ is an interval of the form $[a_n, b_n[$ and $I_n = [0,1[$ except for a finite number of values of n. Note that $\mathcal{C}$ contains F and is closed under finite intersections and that the complement of an element of $\mathcal{C}$ is a finite union of elements of $\mathcal{C}$. We show that $\mathcal{C} \subset \mathcal{P}$. Since $\Sigma^{\mathbb{N}}$ is homeomorphic to $\Sigma$, it is obviously sufficient to show that there exists a continuous bijection of $\Sigma$ onto each of the $I_n$ or simply onto $[0,1[$. Such a bijection is given by

$f(\sigma) = 1 - 2^{-\sigma(1)} - 2^{-[\sigma(1)+\sigma(2)]} - 2^{-[\sigma(1)+\sigma(2)+\sigma(3)]} - \dots \qquad (\sigma \in \Sigma = \mathbb{N}^{\mathbb{N}})$

(recall that $\mathbb{N} = \{1,2,\dots\}$, so that all the $\sigma(i)$ are $> 0$).
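The mapping f of part (c) can be explored numerically. The following is a minimal sketch, not part of the original proof: the function names `f_partial` and `tail_bound` are hypothetical, and only a finite prefix of the sequence $\sigma$ is used. Since every $\sigma(i)$ is at least 1, the terms omitted after a prefix of sum S subtract a positive amount that is at most $2^{-S}$, which gives an enclosure of the true value.

```python
from fractions import Fraction

def f_partial(sigma_prefix):
    """Return 1 - sum_k 2^{-(sigma(1)+...+sigma(k))} over a finite prefix.

    This is only an approximation from above of the infinite series in (c).
    """
    value = Fraction(1)
    running_sum = 0
    for term in sigma_prefix:
        if term < 1:
            raise ValueError("entries must be integers >= 1")
        running_sum += term
        value -= Fraction(1, 2 ** running_sum)
    return value

def tail_bound(sigma_prefix):
    """Upper bound 2^{-S} on what the omitted terms can still subtract."""
    return Fraction(1, 2 ** sum(sigma_prefix))

# Enclosure of f(sigma) for any sigma starting with (2, ...):
prefix = [2]
low, high = f_partial(prefix) - tail_bound(prefix), f_partial(prefix)
```

For a sequence starting with 2, the true value therefore lies in the half-open interval from `low` (included) to `high` (excluded), consistent with f taking its values in $[0,1[$.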
We can now complete the proof. Let $\mathcal{M}$ be the set of $A \subset F$ such that A and $A^c$ belong to $\mathcal{P}$; we show that $\mathcal{M}$ is closed under $(\cup c, \cap c)$. It is sufficient to prove that for every sequence $(A_n)$ of elements of $\mathcal{M}$, $\bigcap_n A_n$ and $\bigcup_n A_n$ belong to $\mathcal{P}$. For intersections this follows from (a) and for unions from (b) and the fact that $\bigcup_n A_n$ is also the union of the disjoint sets $B_1 = A_1$, $B_2 = A_2 \cap A_1^c$, $B_3 = A_3 \cap A_1^c \cap A_2^c$, ..., which belong to $\mathcal{P}$ by (a). We conclude by noting that $\mathcal{M}$ contains $\mathcal{C}$ by (c) and that $\mathcal{C}$ generates the Borel $\sigma$-field.
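The passage from a sequence $(A_n)$ to the disjoint sets $B_1 = A_1$, $B_2 = A_2 \cap A_1^c$, ... is easy to illustrate on finite sets. This is only a sketch under that finiteness assumption; the helper name `disjointify` is hypothetical.

```python
def disjointify(sets):
    """Replace A_1, A_2, ... by B_n = A_n minus (A_1 u ... u A_{n-1}).

    The B_n are pairwise disjoint and have the same union as the A_n.
    """
    seen = set()
    result = []
    for current in sets:
        current = set(current)
        result.append(current - seen)  # remove everything already covered
        seen |= current
    return result

pieces = disjointify([{1, 2}, {2, 3}, {1, 4}])
```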
From the topological point of view, a metrizable Lusin space is homeomorphic to a Borel subspace of a compact metrizable space. From the measure theoretic point of view, the situation is much simpler: we now establish (following Christensen's [1] elegant proof) the following result due to Kuratowski.

80 THEOREM. All uncountable Lusin measurable spaces are isomorphic (in particular to $[0,1]$, $\mathbb{N}^{\mathbb{N}}$ or $\{0,1\}^{\mathbb{N}}$).

Proof: Let L be an uncountable Lusin metrizable space. We compare it to the set $C = \{0,1\}^{\mathbb{N}}$ (homeomorphic to the Cantor set) by first constructing a measurable isomorphism f of L onto a Borel subset of C and then a measurable isomorphism g of C onto a Borel subset of L. After that we use the classical method of F. Bernstein to construct a measurable isomorphism of L onto C.

(a) Let $(A_n)$ be a countable Boolean algebra generating the Borel $\sigma$-field of L. The mapping $f : x \mapsto (I_{A_n}(x))_{n \in \mathbb{N}}$ then is a measurable isomorphism of L onto a subset of C with its Borel $\sigma$-field (cf. I.11, an analogous result). The image f(L) is Borel by the Souslin-Lusin Theorem (21), but in fact it is interesting to prove this without 21, which appeals to the theory of analytic sets: since f is a measurable isomorphism, f(L) is Borel by the isomorphism extension theorem III.18-19.

(b) In the other direction, we consider a Polish space P and an injective continuous mapping h of P onto L. We construct an injective continuous mapping j of C into P and that will suffice: for $g = h \circ j$ will be an injective continuous mapping of the compact space C into the Hausdorff space L and will therefore be a homeomorphism of C onto a compact subset of L. We give P the structure of a complete separable metric space. Let M be the set of all $x \in P$ such that every neighbourhood of x is uncountable; M is closed with no isolated points and, since the topology of P has a countable base, $P \setminus M$ is countable and hence M is non-empty. Let x and y be two distinct points of M and $M_0$ and $M_1$ be two disjoint closed neighbourhoods of x and y, of diameter $\le 1$. It follows from the definition of M that $M_0$ and $M_1$ are not countable. Repeating the operation, we construct two disjoint non-countable closed subsets $M_{00}$ and $M_{01}$ of $M_0$ of diameter $\le 1/2$ and similarly $M_{10}$ and $M_{11}$ in $M_1$. Following this procedure we can construct for every finite dyadic word s of length $|s| = n$ a non-countable closed set $M_s$ of diameter $\le 2^{1-n}$, such that

$|s| = |t|,\ s \ne t \ \Rightarrow\ M_s \cap M_t = \emptyset.$

The mapping which associates with every infinite sequence $\alpha \in C$ the unique element $j(\alpha)$ of $\bigcap_n M_{\alpha|n}$ is then continuous and injective.

(c) To simplify notation, we identify C with a (compact) subset of L, so that g is now the canonical injection of C into L and f is a measurable isomorphism of L onto a Borel subset of C. We define

$A_1 = L \setminus C,\quad A_2 = f(A_1),\ \dots,\ A_{n+1} = f(A_n),\ \dots;\qquad A = \bigcup_n A_n,\quad B = L \setminus A;$

$h(x) = f(x)$ if $x \in A$, $\quad h(x) = x$ if $x \in B$.
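The definition of h in (c) can be tried out on a toy example that is not taken from the text. In this sketch, L is assumed to be the set of non-negative integers, C the subset of even integers, and `f(n) = 4 * n` a hypothetical injection of L into C; the names `in_A` and `h` are likewise illustrative. Membership in $A = \bigcup_n A_n$ is decided by pulling x back through f until an odd number (an element of $A_1 = L \setminus C$) is reached or the chain breaks.

```python
def f(n):
    """Hypothetical injection of L = {0, 1, 2, ...} into C = even numbers."""
    return 4 * n

def in_A(x):
    """Decide whether x lies in A = A_1 u A_2 u ..., with A_1 = L minus C."""
    while True:
        if x % 2 == 1:
            return True   # x is odd, so x is in A_1
        if x == 0 or x % 4 != 0:
            return False  # x = f(0) = 0, or x is not in f(L): never reaches A_1
        x //= 4           # x = f(x / 4), so x is in A_{n+1} iff x / 4 is in A_n

def h(x):
    """h = f on A and the identity on B = L minus A."""
    return f(x) if in_A(x) else x

values = [h(x) for x in range(200)]
```

On this range the values are all even and pairwise distinct, and every even number below 200 is attained, in agreement with the claim that h maps L bijectively onto C.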
Since f maps L into C, $A_1$ and $A_2$ are disjoint and it is immediately verified that all the $A_i$ are disjoint, that $B \subset C$ and that $f(A) = C \setminus B$. On the other hand, f induces an isomorphism of $A_i$ onto $A_{i+1}$, hence h induces an isomorphism of A onto $f(A) = C \setminus B$ and an isomorphism of B onto itself, and finally h is an isomorphism of L onto C.

A cross-section theorem.

81 This is a nice cross-section theorem which is stronger than 44 (b) - it makes no use of the measure on E - and whose proof is simple. We take it from the book [1] by Hoffmann-Jørgensen but it is much older (Jankov 1941, Von Neumann 1949). Recall that $\Sigma$ can be given a total ordering (denoted by $\le$), namely the lexicographic ordering defined as follows: let $\sigma$ and $\tau$ be two elements of $\Sigma$; then $\sigma \le \tau$ if $\sigma = \tau$ or $\sigma(n) < \tau(n)$, where n denotes the smallest integer i such that $\sigma(i) \ne \tau(i)$. Every non-empty closed subset of $\Sigma$ has a smallest element with respect to this ordering. For all $\tau \in \Sigma$ let $J_\tau = \{\sigma : \sigma \le \tau,\ \sigma \ne \tau\}$; it is not difficult to show that $J_\tau$ is open. On the other hand, the Borel $\sigma$-field of $\Sigma$ is generated by the $J_\tau$. Let $s \in S$ be of length k, and let $\tau_n$ be the infinite sequence which coincides with s up to the k-th term and has the value n thereafter; we write $L_s = \bigcup_n J_{\tau_n}$, which belongs to the $\sigma$-field generated by the $J_\tau$. Then $\sigma \in L_s$ if and only if $\sigma|k \le s$ under the lexicographic ordering on the set of sequences of length k, and hence the islet $I_s$ is equal to $L_s \setminus \bigcup_r L_r$, where r runs through the set of sequences of length k which are strictly less than s. Finally we know that the islets generate $\mathcal{B}(\Sigma)$.
THEOREM. Let E and $\Omega$ be two metrizable spaces (1), $\pi$ be the projection of $E \times \Omega$ onto $\Omega$, A be a Souslin subset of $E \times \Omega$ and $B = \pi(A)$. Let $\mathcal{S}$ be the $\sigma$-field on $\Omega$ generated by the Souslin subsets of $\Omega$. There then exists a mapping h of $\Omega$ into E, measurable from the $\sigma$-field $\mathcal{S}$ to $\mathcal{B}(E)$ and such that, for all $x \in B$, $(h(x), x) \in A$ (h is a complete cross-section of A).

Proof: Since B is the image of A under a continuous mapping, it is Souslin in $\Omega$, so that $B \in \mathcal{S}$ and the problem reduces to the case where $B = \Omega$. By 78 there exists a continuous mapping f of $\Sigma$ into $E \times \Omega$ such that $f(\Sigma) = A$. Let $k = \pi \circ f$. For all $\omega \in \Omega$, $k^{-1}(\{\omega\})$ is a non-empty closed subset of $\Sigma$; we denote its smallest element with respect to the lexicographic ordering by $g(\omega)$. It is easily verified that for all $\tau \in \Sigma$ the set $g^{-1}(J_\tau)$ is equal to $k(J_\tau)$, the image of $J_\tau$ under a continuous mapping and hence Souslin. Thus g is an $\mathcal{S}$-measurable mapping of $\Omega$ into $\Sigma$. It only remains to take $h = \pi' \circ f \circ g$, where $\pi'$ is the projection of $E \times \Omega$ onto E.

(1) As an exercise, the reader may try to replace in this proof "metrizable" by "Hausdorff".

REMARKS. (a) By 17 every Souslin subset of $\Omega$ is $\mathcal{B}(\Omega)$-analytic and hence universally measurable by 33. Thus $\mathcal{S} \subset \mathcal{B}_u(\Omega)$ and h is universally measurable. By 18, if $\Omega$ is Souslin then also $\mathcal{B}(\Omega) \subset \mathcal{S}$.

(b) Even if $E = \Omega = [0,1]$ and A is Borel with projection equal to $\Omega$, the theorem cannot be improved by constructing a Borel cross-section h or a cross-section with a Souslin graph.

44 should be compared to the abstract form of 81 which we now give: the cross-section constructed here does not depend on a measure on $(\Omega, \mathcal{F})$. More generally the method below can be used to reduce many "abstract" situations to "topological" ones.
82 THEOREM. Let $(\Omega, \mathcal{F})$ be a measurable space and A an element of $\mathcal{B}(\mathbb{R}_+) \times \mathcal{F}$. Let $\hat{\mathcal{F}}$ be the universal completion of $\mathcal{F}$. There exists an $\hat{\mathcal{F}}$-measurable mapping T of $\Omega$ into $\overline{\mathbb{R}}_+$ with the following properties:
(a) $(T(\omega), \omega) \in A$ for all $\omega$ such that $T(\omega) < \infty$,
(b) the set $\{T < \infty\}$ is the projection of A onto $\Omega$.

Proof: There exists a separable sub-$\sigma$-field $\mathcal{F}'$ of $\mathcal{F}$ such that $A \in \mathcal{B}(\mathbb{R}_+) \times \mathcal{F}'$; replacing $\mathcal{F}$ by $\mathcal{F}'$ if necessary, we can assume that $\mathcal{F}$ is separable. Taking quotients if necessary, we may assume that the atoms of $\mathcal{F}$ are the points of $\Omega$. Then by I.11 we can identify $\Omega$ with a subset of $\mathbb{R}$ with the $\sigma$-field $\mathcal{B}(\mathbb{R})|_\Omega$. Since A belongs to $\mathcal{B}(\mathbb{R}_+) \times \mathcal{F}$, there exists $A' \in \mathcal{B}(\mathbb{R}_+) \times \mathcal{B}(\mathbb{R})$ such that A is the trace of A' on $\mathbb{R}_+ \times \Omega$. Applying the above theorem to the Lusin set A', we can construct a universally measurable cross-section of A' defined on $\mathbb{R}$. Its restriction h to $\Omega$ is universally measurable on $\Omega$ (II.32,(2) applied to the injection of $\Omega$ into $\mathbb{R}$) and we conclude the proof taking $T(\omega) = h(\omega)$ if $\omega$ belongs to the projection of A and $T(\omega) = +\infty$ otherwise.
The second separation theorem

Let $(F, \mathcal{F})$ be a paved set. We say that a subset A of F is $\mathcal{F}$-bianalytic if A and $A^c$ are $\mathcal{F}$-analytic. Bianalytic sets constitute a $\sigma$-field $\mathcal{BA}(\mathcal{F})$, which usually contains $\mathcal{F}$, and hence also the $\sigma$-field it generates. The typical example is that of a separable metrizable topological space F, with the paving $\mathcal{F}$ of its closed sets; then the complement of any closed set belongs to $\mathcal{F}_\sigma$, so that $\mathcal{F} \subset \mathcal{BA}(\mathcal{F})$ and the bianalytic $\sigma$-field contains the Borel $\sigma$-field.

We are going to state two equivalent forms of the second separation theorem, and then to give some comments. First, the form due to Novikov.

83 THEOREM. Assume that $\mathcal{F} \subset \mathcal{BA}(\mathcal{F})$. Let $(A_n)$ be a sequence of $\mathcal{F}$-analytic sets such that $\bigcap_n A_n = \emptyset$. There then exists a sequence $(B_n)$ of $\mathcal{F}$-bianalytic sets such that $A_n \subset B_n$ for every n and $\bigcap_n B_n = \emptyset$.

The second form is the so-called reduction theorem of Kuratowski. To understand it, let us recall how useful the following operation is in set theory: given sets $U_n$ with union U, replace the $U_n$ by smaller disjoint sets $V_n$ with the same union (one usually takes $V_1 = U_1$, $V_2 = U_2 \cap U_1^c$, $V_3 = U_3 \cap U_1^c \cap U_2^c$, ...). This may be called a reduction of the sequence $(U_n)$. Kuratowski's theorem says that reduction is possible in the class of complements of $\mathcal{F}$-analytic sets (to abbreviate: $\mathcal{F}$-coanalytic sets) - note that the trivial reduction procedure mentioned above would take us out of this class.

THEOREM. Assume that $\mathcal{F} \subset \mathcal{BA}(\mathcal{F})$. Let $(U_n)$ be a sequence of $\mathcal{F}$-coanalytic sets. There then exists a sequence $(V_n)$ of disjoint $\mathcal{F}$-coanalytic sets such that $V_n \subset U_n$ for every n and $\bigcup_n V_n = \bigcup_n U_n$.

The abstract set-up in these statements is fake generality: the interesting case is that of separable metrizable spaces. In the case of a compact metrizable space, one knows from the first separation theorem that the bianalytic sets are exactly the same as the Borel sets, and the theorem of Novikov becomes very striking (consider for instance the case of a decreasing sequence $(A_n)$). Most applications concern this compact case, and some of them are very simple and beautiful (Dellacherie [2]). We have mentioned several times that analytic sets and capacities belong to the "first floor" of descriptive set theory, while the second separation theorem is already on the "second floor". Dellacherie made in [2] the conjecture that the compact case of Novikov's theorem (with "Borel" instead of "bianalytic") still belongs to the first floor. Mokobodzki indeed succeeded in giving an "elementary" proof of it (1), which was further simplified by Dellacherie (2). Another "first floor" proof of a result which includes Novikov's, in the compact case, was given by Saint-Pierre (unpublished).
(1) G. Mokobodzki. Demonstration elementaire d'un theoreme de Novikov. Seminaire de Probabilites de Strasbourg, vol. X, 1976, p. 539-543. (2) Same volume, p. 580.
APPENDIX TO CHAPTER IV
We give here two supplementary results on random sets. The first concerns random sets which have (with non-zero probability) non-countable sections. The second on the contrary determines the structure of random sets almost all of whose sections are countable. The proofs given here are new - at least, we think they are - and use only the "elementary" theory of analytic sets contained in Chapter III and its appendix. Different proofs can be found in Dellacherie [1], Chapter VI, depending on a notion which does not appear in this book, that of "dichotomic capacitances".

We use the following notation: $(\Omega, \mathcal{F}^0, P)$ is a probability space and $\mathcal{F}$ is the completed $\sigma$-field of $\mathcal{F}^0$ with respect to P. For some results $\Omega$ will be given an increasing family $(\mathcal{F}_t)$ of sub-$\sigma$-fields of $\mathcal{F}$ satisfying the usual conditions.

Sets with uncountable sections.

We begin with some preliminary results.

109 In the appendix to Chapter III (no. 80) we saw that every non-countable Lusin measurable space L is isomorphic to the interval $[0,1]$; it therefore carries a diffuse measure. This can be shown also in a more elementary fashion: note that L is isomorphic to a Polish space P (III.79) and that P contains (cf. III.80(b)) a compact subset homeomorphic to $\{0,1\}^{\mathbb{N}}$, which also carries diffuse measures (the measure of the "heads or tails game" for example); the same result for L then follows. On the other hand we have the following lemma.

110 LEMMA. Let $\mathcal{P}$ be the set of probability laws on $\mathbb{R}_+$ with the strict convergence topology. Then the set $\mathcal{D}$ of diffuse laws is a Borel set of $\mathcal{P}$.

Proof: Let $\mu \in \mathcal{P}$ and let $F_\mu(x) = \mu(]-\infty, x])$ be its distribution function. The property that $\mu \in \mathcal{D}$ can be expressed as follows: for all $n \in \mathbb{N}$ there exists $m \in \mathbb{N}$ such that, for every pair (r,s) of rationals of $[0,n]$ such that $r \le s \le r + \frac{1}{m}$, we have $\mu([r,s]) < 1/n$. This condition is necessary (if $\mu$ is diffuse, $F_\mu$ is uniformly continuous on $[0,n]$) and sufficient (let $x \in \mathbb{R}_+$; taking $r \uparrow x$ and $s \downarrow x$ we see that $\mu\{x\} = 0$). We conclude the proof by noting that $\mu \mapsto \mu([r,s])$ is a Borel function on $\mathcal{P}$.

111 THEOREM. Let H be a random set (1) belonging to $\mathcal{B}(\mathbb{R}_+) \times \mathcal{F}^0$.
(a) The set $c(H) = \{\omega : H(\omega) \text{ is non-countable}\}$ is $\mathcal{F}^0$-analytic and in particular belongs to $\mathcal{F}$ (2).

(1) As usual, we write $H(\omega) = \{t : (t,\omega) \in H\}$.
(2) This is a special case of the Mazurkiewicz-Sierpinski theorem which states more generally that c(H) is $\mathcal{F}^0$-analytic if H is $(\mathcal{B}(\mathbb{R}_+) \times \mathcal{F}^0)$-analytic.
(b) There exists an $\mathcal{F}^0$-measurable mapping $\omega \mapsto \mu_\omega$ of $\Omega$ into $\mathcal{D} \cup \{0\}$ with the following properties:
- for all $\omega$, $\mu_\omega$ is carried by $H(\omega)$;
- $P\{\omega : \mu_\omega \ne 0\} = P(c(H))$.

Proof: An immediate argument by monotone classes, starting from the case where H is a rectangle, shows that the mapping $(\mu, \omega) \mapsto \mu(H(\omega))$ on $\mathcal{P} \times \Omega$ is measurable. The set $L = \{(\mu, \omega) : \mu \in \mathcal{D} \text{ and } \mu(H(\omega)) = 1\}$ is therefore a measurable subset of $\mathcal{P} \times \Omega$. On the other hand, $\mathcal{P}$ is Polish by III.60 and $\mathcal{D}$ is Lusin by 110: by III.20 (or III.80) the Lusin measurable space $\mathcal{D}$ can be identified with a Borel set of $\mathbb{R}_+$ and hence L can be identified with a measurable subset of $\mathbb{R}_+ \times \Omega$. Finally, by 109, $H(\omega)$ is uncountable if and only if $H(\omega)$ carries a diffuse law, i.e. if $\omega$ belongs to the projection of L. Part (a) then follows from III.13 and part (b) from the cross-section theorem III.44-45 - with the zero measure "0" playing here the role of a point adjoined to $\mathcal{D}$.

112
REMARKS. (1) Using the cross-section theorem III.82, instead of III.45, we may construct a mapping $\omega \mapsto \mu_\omega$ of $\Omega$ into $\mathcal{D} \cup \{0\}$ which is measurable with respect to the universal completion $\sigma$-field $\hat{\mathcal{F}}$ of $\mathcal{F}^0$ and such that $\{\omega : \mu_\omega \ne 0\}$ is equal to c(H).

(2) The function

$C_H(\omega) = \inf\{t : H(\omega) \cap [0,t] \text{ is non-countable}\}$

on $\Omega$ is called the penetration time into H. It follows easily from the above theorem that $C_H$ is $\mathcal{F}$-measurable - and is a stopping time if H is a progressive set.

(3) We define a topology on $\mathbb{R}_+$ - which we call the condensation topology in these few numbers - as follows: a point x is considered a cluster point of $E \subset \mathbb{R}_+$ if $x \in E$ or if x is a condensation point of E, i.e. if every ordinary neighbourhood of x in $\mathbb{R}_+$ contains a non-countable infinity of points of E. The corresponding notions of limits will here be denoted by symbols of the type $\text{cd-}\limsup_{y \to x} f(y)$ ... and we leave it to the reader (along with the verification of the topology axioms) to decide the meaning of symbols of the type $\text{cd-}\limsup_{y \downarrow x}$ and $\text{cd-}\limsup_{y \to x,\ y \ne x}$. We are interested here in the fact that for every progressive (resp. measurable) process $(X_t)$ the process

(112.1) $\tilde{X}_t(\omega) = \text{cd-}\limsup_{s \to t,\ s \ne t} X_s(\omega),$

and the analogous processes constructed by changing the type of the limit, are progressive (resp. $\mathcal{B}(\mathbb{R}_+) \times \mathcal{F}$-measurable). More precisely, there are results similar to those of 89 and 90 with an analogous proof: if X is the indicator of a set H, consider the process $(A_t)_{t > 0}$ defined by $A_t(\omega) = \sup\{s < t : s \text{ is a condensation point of } H(\omega)\}$ and use penetration times instead of debuts.

When $(X_t)$ is the indicator of a random closed set A, $(\tilde{X}_t)$ is the indicator of a random closed set $\tilde{A}$ (relative to the completion $\sigma$-field $\mathcal{F}$). For all $\omega$, the section $\tilde{A}(\omega)$ is the set of condensation points of $A(\omega)$, i.e. the largest perfect set contained in $A(\omega)$, the so-called perfect kernel of $A(\omega)$, and it is well known that for all $\omega$ the set $A(\omega) \setminus \tilde{A}(\omega)$ is countable (Cantor-Bendixson theorem). We often say that the random set $\tilde{A}$ itself is the perfect kernel of A.
(4) We shall see in a later chapter that if H is optional or predictable the family $\omega \mapsto \mu_\omega$ of theorem 111 can be subjected to additional conditions of optionality or predictability.

Theorem 111 can be improved as follows:

113 THEOREM. Suppose that H is a random closed set. Then it can be assumed that the mapping $\omega \mapsto \mu_\omega$ of assertion (b) has the following property: for almost all $\omega \in \Omega$ the support of $\mu_\omega$ is the perfect kernel $\tilde{H}(\omega)$ of $H(\omega)$.

Proof: We arrange all the pairs of rationals $\ge 0$ (r,s) such that $r < s$ in a sequence $(r_n, s_n)$ and for all n let $H_n = H \cap\, ]r_n, s_n[$. We construct a mapping $\omega \mapsto \mu_\omega^n$ satisfying assertion (b) of 111 relative to $H_n$ and set $\lambda_\omega = \sum_n 2^{-n} \mu_\omega^n$. For every $\omega$ this is a diffuse positive measure of mass $\le 1$ carried by $H(\omega)$ and hence also by the perfect kernel $\tilde{H}(\omega)$ and, for almost all $\omega$, $\lambda_\omega$ is different from zero on every open interval I such that $I \cap H(\omega)$ is non-countable, i.e. every open interval intersecting $\tilde{H}(\omega)$. The support of $\lambda_\omega$ is therefore exactly $\tilde{H}(\omega)$. To construct the required mapping it only remains to set $\mu_\omega = \lambda_\omega / \|\lambda_\omega\|$ if $\lambda_\omega \ne 0$ and $\mu_\omega = 0$ if $\lambda_\omega = 0$.

REMARK. As in no. 112, (1), it is also possible to construct a mapping $\omega \mapsto \mu_\omega$ which is measurable with respect to $\hat{\mathcal{F}}$ (but not to $\mathcal{F}^0$), such that the support of $\mu_\omega$ is $\tilde{H}(\omega)$ for all $\omega$.

Derived sets of random sets

We saw that if H is a random closed set, $H \setminus \tilde{H}$ is a random set with countable sections. We shall show in these few numbers that it is indistinguishable from a countable union of graphs. More generally, since the sections of $H \setminus \tilde{H}$ are $\mathcal{G}_\delta$'s of $\mathbb{R}_+$, we shall establish this result for all "random $\mathcal{G}_\delta$'s" with countable sections. We recall first a classical definition and give a simple lemma on random sets.

DEFINITION. Let F be a subset of $\mathbb{R}_+$. The derived set (1) of F, denoted by F', is the set of points of F which are not isolated in F. For every countable ordinal $\alpha$ the derived set of F of order $\alpha$ is defined by transfinite induction as follows

(114.1)
$F^0 = F$; $\quad F^{\alpha+1} = (F^\alpha)'$; $\quad F^\beta = \bigcap_{\alpha < \beta} F^\alpha$ for every limit ordinal $\beta$.

[...] $> r$ : $t \ge 0$ and $(t,\omega) \in H\}$; the isolated points of $H(\omega)$ appear among the points $T_r(\omega)$; and $T_r(\omega)$ is isolated if and only if $(T_r(\omega), \omega) \in H$; $r < T_r(\omega)$; there exists $s > 0$ such that $H(\omega) \cap\, ]T_r(\omega), T_r(\omega) + s] = \emptyset$. Let A(r) be the set of $\omega$ with these properties; it is not difficult to verify that A(r) belongs to $\mathcal{F}_{T_r}$ and $H \setminus H'$ then is the union of the graphs of the stopping times $(T_r)_{A(r)}$ (53). This implies that H' is progressive. In the following statement we omit the case of progressive sets.

116 THEOREM. (a) Let H be a $(\mathcal{B}(\mathbb{R}_+) \times \mathcal{F})$-measurable set. There exists a countable ordinal $\alpha$ such that $H^\alpha$ and $H^{\alpha+1}$ are indistinguishable.
(b) Suppose that the sections of H are $\mathcal{G}_\delta$'s. Then the sections $H^\alpha(\omega)$ are a.s. (for the above ordinal $\alpha$) either empty or non-countable. In particular, if the sections of H are countable $\mathcal{G}_\delta$'s, $H^\alpha(\omega)$ is empty for almost all $\omega$ and H is indistinguishable from a countable union of graphs of $\mathcal{F}$-measurable functions.

Proof: For every rational r let $T_r^\alpha(\omega) = \inf\{t > r : (t,\omega) \in H^\alpha\}$. Since $H^\alpha$ decreases as $\alpha$ increases, $T_r^\alpha$ increases and no. 0.8 applied to the numbers $E[\exp(-T_r^\alpha)]$ shows that there exists a countable ordinal $\alpha$ such that $T_r^\alpha = T_r^{\alpha+1}$ a.s. Since there are only countably many rationals, there exists an $\alpha$ such that this is true simultaneously for all rationals. Then we consider an $\omega$ such that $T_r^\alpha(\omega) = T_r^{\alpha+1}(\omega)$ for all rational r; $H^\alpha(\omega)$ has no isolated point - for if s were such a point, there would also exist an interval $]u,v[$ with rational end-points containing s and not intersecting $H^{\alpha+1}(\omega)$, and we would have $T_u^\alpha(\omega) \le s$, $T_u^{\alpha+1}(\omega) \ge v$ contrary to the hypothesis. Hence $H^\alpha(\omega) = H^{\alpha+1}(\omega)$.

Suppose that the sections $H(\omega)$ are $\mathcal{G}_\delta$'s. We saw earlier that $H(\omega) \setminus H^\alpha(\omega)$ is a countable set D and hence $H^\alpha(\omega) = H(\omega) \cap D^c$ is a $\mathcal{G}_\delta$. We choose $\omega$ such that $H^\alpha(\omega)$ is equal to $H^{\alpha+1}(\omega)$, i.e. has no isolated points, and is non-empty; then $H^\alpha(\omega)$ is a non-empty $\mathcal{G}_\delta$ of $\mathbb{R}_+$ and hence a Baire space and it cannot be countable (since every countable Baire space has at least one isolated point). If the sections $H(\omega)$ are countable $\mathcal{G}_\delta$'s, $H^\alpha(\omega)$ is a.s. empty and hence H is indistinguishable from $H \setminus H^\alpha$, which is a countable union of graphs by 115.
REMARKS. (1) In the last assertion the $\mathcal{F}$-measurable random variables might of course be replaced by $\mathcal{F}^0$-measurable random variables.
(2) This theorem applies in particular to sets with closed sections. In this case there exists an ordinal $\alpha$ such that $H^\alpha$ is indistinguishable from the perfect kernel $\tilde{H}$ of H and $H \setminus \tilde{H}$ is indistinguishable from a countable union of graphs.

Sets with countable sections

117 THEOREM. Let H be a $\mathcal{B}(\mathbb{R}_+) \times \mathcal{F}$-measurable (resp. optional, predictable) set with countable sections. Then H is indistinguishable from a countable union of graphs of $\mathcal{F}$-measurable functions (resp. optional, predictable times).

Proof: We can pass to the optional and predictable cases using no. IV.88 (and this only requires that the filtration satisfy the usual conditions). We can therefore forget about the filtration. The theorem follows from no. 116 for sets whose sections are $\mathcal{G}_\delta$'s and from the following lemma (which requires only that the $\sigma$-field $\mathcal{F}$ be complete).

LEMMA. Let H be a $(\mathcal{B}(\mathbb{R}_+) \times \mathcal{F})$-measurable set. Then there exist a $(\mathcal{B}(\mathbb{R}_+) \times \mathcal{F})$-measurable set K, all of whose sections are $\mathcal{G}_\delta$'s, and a Borel mapping h of $\mathbb{R}_+$ into $\mathbb{R}_+$ such that $(t,\omega) \mapsto (h(t),\omega)$ induces a bijection of K onto H.

The lemma is applied as follows: by 116, K is indistinguishable from the union of the graphs of a sequence of r.v. $T_n$, and H then is indistinguishable from the union of the graphs of the r.v. $h \circ T_n$.

Proof of the lemma: Since there exists a separable sub-$\sigma$-field $\mathcal{F}'$ of $\mathcal{F}$ such that H belongs to $\mathcal{B}(\mathbb{R}_+) \times \mathcal{F}'$, there is no loss of generality in assuming that $\mathcal{F}$ is separable. Nor is there any such loss (taking if necessary a quotient relative to the equivalence relation associated with $\mathcal{F}$) in assuming that the atoms of $\mathcal{F}$ are the points of $\Omega$. Then $(\Omega, \mathcal{F})$ can be identified (I.11) with a subset of $[0,1]$ given its Borel $\sigma$-field. Finally, since every Borel set of $\Omega \times \mathbb{R}_+$ is the trace of a Borel set of $[0,1] \times \mathbb{R}_+$, we can reduce to the case where $\Omega = [0,1]$, $\mathcal{F} = \mathcal{B}([0,1])$ and prove the lemma under these conditions.

The set H is Lusin and hence there exists (III.79, appendix) a closed set F of $\Sigma = \mathbb{N}^{\mathbb{N}}$ and a continuous bijection of F onto H, which we denote by $\sigma \mapsto (h_1(\sigma), h_2(\sigma))$, where $h_1$ and $h_2$ take values respectively in $\mathbb{R}_+$ and in $\Omega$. We now use the fact (1) that $\Sigma$ is homeomorphic to the set I of irrational numbers of $[0,1]$ to identify F with a closed subset of $I \subset \mathbb{R}_+$ and set

$K = \{(t,\omega) \in \mathbb{R}_+ \times \Omega : t \in F,\ h_2(t) = \omega\}$

(K is the graph of $h_2$). Let h be a Borel extension to $\mathbb{R}_+$ of the mapping $h_1$ defined on F. The mapping $(t,\omega) \mapsto (h(t),\omega)$ then is Borel and induces a bijection of K onto H. On the other hand, for all $\omega$ the section $K(\omega)$ is closed in F since $h_2$ is continuous, hence closed in I, and since I is a $\mathcal{G}_\delta$ of $\mathbb{R}_+$, so is $K(\omega)$.

(1) See the remark below.

118 REMARK. The traditional homeomorphism between I and $\Sigma = \mathbb{N}^{\mathbb{N}}$ is given by the continued fraction expansion of irrational numbers. We now give a means of avoiding it. With every sequence $\sigma = (n_1, n_2, n_3, \dots) \in \Sigma$ we associate the number $f(\sigma) \in [0,\frac{1}{2}]$ whose dyadic expansion begins with $n_1$ "0"s and continues with $n_2$ "1"s, then $n_3$ "0"s, $n_4$ "1"s etc. Since all the $n_i$ are $> 0$, the function f thus defined is a bijection between $\Sigma$ and the complement J of the dyadic rationals in $[0,\frac{1}{2}]$, and it is easily verified that f is a homeomorphism of $\Sigma$ onto J. Since J is a $\mathcal{G}_\delta$ of $[0,\frac{1}{2}]$, the reader may just replace I by J in the above proof.
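The dyadic coding described in this remark can be sketched on finite prefixes. The snippet below is only an illustration under that truncation assumption (the true f needs the whole infinite sequence), and the name `f_prefix` is hypothetical. It writes the first runs of "0"s and "1"s as binary digits and returns the exact value of the resulting finite expansion.

```python
from fractions import Fraction

def f_prefix(run_lengths):
    """Value of the finite dyadic expansion made of alternating runs.

    run_lengths = (n_1, n_2, ...): n_1 zeros, then n_2 ones, then n_3 zeros, ...
    Only an approximation of f(sigma) obtained from a finite prefix of sigma.
    """
    value = Fraction(0)
    position = 0   # index of the current binary digit after the point
    bit = 0        # the expansion starts with a run of zeros
    for length in run_lengths:
        if length < 1:
            raise ValueError("run lengths must be integers >= 1")
        for _ in range(length):
            position += 1
            if bit == 1:
                value += Fraction(1, 2 ** position)
        bit = 1 - bit  # alternate between runs of 0s and runs of 1s
    return value

approx = f_prefix([1, 1, 1, 1])  # digits 0.0101 in base 2
```

Because the first digit is always 0, every such value stays at most 1/2, matching the range $[0,\frac{1}{2}]$ stated in the remark.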
COMMENTS
In this chapter on notation we have added a small paragraph on transfinite induction along countable ordinals, a procedure which we find extremely pleasant and intuitive, but which doesn't belong to the standard background of most students in probability (at least in France). Many people think that transfinite induction is "superseded" by such general results as Zorn's lemma. This isn't true: Zorn's lemma may spare much work with well ordered sets in set theory or topology, but doesn't replace naturally the typically "countable" transfinite inductions of real analysis. As far as the continuum hypothesis is concerned, it seems that its consequences for real analysis (Mokobodzki's theorems on rapid filters and medial limits, Mokobodzki's lifting theorem) can also be deduced from more complicated, but "softer", axioms of set theory ("Martin's axiom"), which are also compatible with the negation of the continuum hypothesis.
CHAPTER I The contents of this chapter are classical. Compared to the first edition, we have entirely omitted non-metrizable compact spaces, and added the fact, classical but not too well known, that Hausdorff separable measurable spaces can be imbedded into $[0,1]$. We have also given more flexible variants of the monotone class theorem.
CHAPTER II This chapter also is quite classical. The results given without a proof belong to any text on measure theory. Compared to the first edition, we have omitted a small section on Radon measures (the subject is taken over in chapter III in a modified form) and added the Vitali-Hahn-Saks theorem, the converse of the Dunford-Pettis compactness criterion (i.e. the implication that weak compactness $\Rightarrow$ uniform integrability) and Mokobodzki's theorem on rapid filters (to be completed later on by his theorem on medial limits). Finally, we have introduced generalized conditional expectations in a slightly non-classical way (39, b)). All this has been added, not just for beauty's sake, but because it has proved useful either in martingale theory or in the theory of Markov processes.
CHAPTER III For probabilists, the main use of analytic sets and capacities is the measurability theorem for debuts (44). On this subject, the first sentence of the chapter, which ascribes to Hunt the discovery of the relation between this measurability problem and capacity theory, isn't quite true. Doob's first fundamental paper on Brownian motion (Doob [4]), which partially inspired Hunt's great work, contained a proof of measurability of a debut - the hitting time of a Borel set - deduced, not from capacity theory which didn't exist yet, but from results of Cartan in classical potential theory. Moreover both Hunt and Doob proved more than the measurability of a debut, namely the possibility of approximating hitting times of Borel sets by hitting times of open sets. In the general case this uses the full strength of Choquet's theory, whereas Blackwell remarked (1) that the measurability theorem for debuts just requires that analytic sets should be universally measurable, a result known long before Choquet (Saks [1]). Similarly, the cross-section theorems of chapter IV depend only on 44 b), which is also established in the appendix without using capacities. The reader might therefore be tempted to believe that one could do without Choquet's theorem. This is wrong: Choquet's theorem is at the heart of measure theory, and it is much better to learn it rather than the tricks that may replace it in each particular case. This volume contains much more on analytic sets than the first edition. Blackwell's theorem is given a more prominent role, as a result which is at the same time extremely simple and powerful, a remarkable tool for discovery (even if more "elementary" proofs come to light afterwards). We have systematically introduced Lusin, etc, measurable spaces, partly under the influence of Yen [1].
Finally, and perhaps above all, we hope that our simple proof will dissipate the fear of the magnificent Souslin-Lusin theorem, so often considered as "deep" (with the connotation of "slippery and useless"), because of the complication of its usual proofs (Bourbaki [3], for example). The first edition contained some numbers devoted to "regular measures" in an abstract setting. The only remnant of this is the notion of a semi-compact paving (which is defined, but practically used nowhere). The reason for this omission is the importance given to Radon measures on non locally compact spaces, a theory created by Prokhorov in [1], developed for probabilistic (even statistical) purposes by Le Cam, Varadarajan, Parthasarathy, in relation with much research on measure theory in topological linear spaces (inspired by an idea of Gelfand [1] and carried on by Minlos and the Russian school, by Schwartz, Fernique ...). This theory seems to have reached maturity, and we have presented it following Bourbaki [5], with more emphasis on capacity methods. It might interest the reader to compare our proof of the regularity of measures on a Polish space to Prokhorov's beautiful original proof in [1]. This doesn't mean, however, that abstract regular measures won't come again to the foreground. A good reference may be the small lecture notes set by Pfanzagl-Pierlo [1]. On "standard" spaces, i.e. Lusin spaces as defined by Bourbaki, and on their applications to random distributions, Fernique [1] is a good reference. As we already said in the main text, light on this subject came from a remark of Cartier [1].

(1) In an unpublished report, which was the first paper to investigate the measurability of debuts for arbitrary measurable processes, and was thus the starting point of the "general theory of processes". The only copy we had of it has been lost.
The theorem on bimeasures is mentioned, without proof, in Kingman [1]. Cartier mentioned to us that it could probably be proved along the same lines as C. Doléans-Dade's proof of the supermartingale decomposition theorem, and this was successfully carried out by Morando [1]. The theorem has been rediscovered several times since, and turned out to have nice applications. We now give some additional references. The renewal of interest in analytic set theory (1) has led to several new books on the subject, among which Hoffmann-Jørgensen [1] and Christensen [1] are highly commendable. In particular, the latter contains a study of the so-called Effros structure, a measurable structure on the family of all closed subsets of a Polish space, which has important applications. These two books, like the present work, deal only with the elementary levels of analytic set theory (see the appendix to chapter III). To climb to the level just above, a possible guide is Dellacherie [2] (partly following Rogers [1]), or Dellacherie-Meyer [1] in a language more familiar to probabilists. Logicians have climbed far higher summits in descriptive set theory, but we are unable to follow them: a book by Moschovakis should be forthcoming on this subject. As far as capacities are concerned, the reader may consult the memoir [1] of Choquet, the spring from which the whole theory took its rise, still full of lively ideas. Also, the excellent papers by Sion, to whom we owe much. Finally, papers giving applications of capacities to classical analysis: Brelot [1], [2], Helms [1] and Carleson [1] (and the forthcoming books on classical potential theory in its relation to Brownian motion, by Doob and Rao (2)).
For the happy few that have access to them, we strongly commend the collection of seminars of the University of Paris: Seminaire Brelot-Choquet-Deny (potential theory) and Seminaire Choquet (initiation to analysis): some of them appear now in Springer's Lecture Notes series, but the intermediate volumes are difficult to find.

(1) This renewal has also led to more general definitions of analytic sets in non-metrizable non-separable topological spaces. This scarcely concerns probabilists, and we give no indication concerning this direction of research, in spite of its intrinsic interest.
(2) Rao's little book has just appeared: Brownian motion and classical potential theory, Aarhus Lecture Notes series no. 47, Aarhus Univ. 1977.
CHAPTER IV
This chapter has been deeply changed since the first edition. We have reduced the role of separability (omitting in particular a ridiculous appendix on "abstract" separability and all that concerned the "second canonical process"), and added a number of sections on subjects which, from our experience of several years, we now consider important: the measurability properties on canonical spaces, essential limits, and so on. On the other hand, we have included in the new edition of chapter IV many results on the "general theory of processes" which were spread over chapters VII and VIII of the first edition.
The "general" theory of processes is often criticized as too abstract, but it is hard to know what is meant here, since after all mathematics as a whole is an abstract science. If "abstract" means "difficult to learn", or even "boring", we may agree to a certain extent, but not if it is meant that it is a mere "divertissement dans le goût français". Indeed, the general theory of processes hasn't grown up as an autonomous branch of mathematics, but rather in constant interplay with martingale theory and the theory of Markov processes, as the set of tools necessary to fulfill the program of Lévy, Doob, Itô, Chung, Hunt, in favour of which Chung has for years conducted an unwearying propaganda: "Look at the sample functions...".
Some examples. Increasing families of σ-fields and stopping times come from martingale theory (Doob, around 1940), and even from gambling theory, the very birthplace of all probability. The idea of systematically extending to stopping times results known to hold at fixed times is an imitation of the "strong Markov property", first mentioned in a paper of Doob's on Markov chains in 1942, then by Hunt for processes with independent increments, then independently for general Markov processes by Blumenthal and the school of Dynkin in the USSR.
We also owe to the latter many useful results on progressive processes and stopping times. The idea of a process "with a lifetime", before it became a general tool under the shape of killing operators, occurred in concrete cases of "explosive" Markov chains or diffusions (e.g. in population problems), while the little trick of adding a fictitious state is due to Doob [2]. The idea of a predictable or foretellable stopping time appears (with a wrong proof) in Meyer's thesis in 1962, in connection with the theory of Markov processes, while that of a totally inaccessible stopping time is implicit in the quasi-left-continuity of Markov processes, a very important property discovered by Hunt and Blumenthal (and the most intuitive examples of stopping times, namely the successive jumps of a Poisson process, turn out to be totally inaccessible). The optional σ-field and cross-section theorem appear in a seminar report of 1963 (Meyer [4]) with a wrong proof (1).
(1) The first simple (and correct) proof of the cross-section theorems was discovered by Cornea-Licea, then further simplified by Dellacherie (see Meyer [3]).
Interest in the predictable σ-field grew from the work of Catherine Doléans: [1], on a new proof of the supermartingale decomposition theorem along lines suggested by Cartier, and [2], on stochastic calculus, a branch which is by essence concrete, since it has applications even to engineering. Finally, σ-fields of type T- had been considered by Chung and Doob [1], and their relation to predictability was discovered while Chung was at Strasbourg (1967-68), with motivations coming from the theory of Markov chains (much of the theory took its definitive form under pressure from Chung).
The use of up- and downcrossings is borrowed from martingale theory (Doob [1], p. 315-316). Doob [5] (on the heat equation) was our model for the transfinite induction arguments on intervals where functions oscillate little. The use of essential topology on the line comes from time reversal for Markov processes (Chung-Walsh [1], Doob [6], Walsh [1]; for a particularly nice application, see Walsh [2]).
The point of view that consists in rejecting instantaneous observations on processes, to be replaced by averages on small time intervals, has its source in the work on random distributions. The method we present goes back to an old discussion with Cartier. Itô studied (and explained once in a visit to the Strasbourg seminar) a similar point of view ([1], [2]), aiming more than we do at the study of sample path regularity. Some recent and important work of Knight [1] pushes the theory further.
Our paragraph 3 also differs from the earlier accounts of the theory (the first edition of this book, Dellacherie [1], Meyer [5]) by an important feature: the partial rejection of the "usual conditions", at the cost of some technical complication. Let us say again that we haven't done it just for the sake of developing a more general theory, but because it turns out to be useful (for instance, in the work of Azéma on time reversal, of Foellmer on quasimartingales, etc.).
The theory of processes given in this volume is still incomplete: it lacks the projection theorems, which need martingale theory and will be given later, and the criteria for right and left continuity of optional or predictable processes (Mertens [1], Dellacherie [1], p. 101).
INDEX OF TERMINOLOGY
Accessible stopping time, IV.80.
Adapted process, IV.12.
Additive set function, III.30.
Additivity, countable, II.1.
Algebra, Boolean, I.1.
σ-algebra, I.1, note.
Almost complete cross-section, III.44.
Almost-equivalent processes, IV.35.
Almost-modification, IV.35.
Almost sure, II.2, convergence, II.10.
Analytic set in a separable metrizable space, III.15.
$\mathcal{F}$-analytic function, III.61.
$\mathcal{F}$-analytic set, III.7.
Approximation, Lebesgue, I.17.
Augmentation, usual, IV.48.
Baire σ-field, I.7.
Base space of a process, IV.1.
Bianalytic set, III.83 (Appendix).
Bimeasure, III.74.
Blackwell space, III.24.
Blackwell's theorem, III.26.
Borel σ-field, function, set, I.7.
Borel isomorphism, I.11.
Cadlag process, IV.16, footnote.
Canonical process, IV.9, IV.94, etc.
Capacitable, III.27.
Capacitance, III.29.
Carathéodory's extension theorem, III.34.
Choquet's theorem, III.28.
Closure of a random set, IV.31, of a family of subsets, 0.2.
Coanalytic set, III.16, footnote 2).
Compact paving, III.3.
Complete cross-section, III.44, footnote 3).
Complete filtration, IV.48.
Complete probability space, II.3 and II.32.
Completion, II.32, universal, II.32.
Conditional expectation, II.37 and II.40, generalized, II.39.
Conditional independence, II.43.
Conditional probability, II.38.
Conditions, usual, IV.48.
Continuum hypothesis, 0.8.
Convergence, dominated, II.6, of r.v., II.10, narrow, III.54.
Cosouslin measurable or metrizable space, III.16.
Cross-section theorems, III.44-45, III.81 (appendix), IV.84 (optional), IV.85 (predictable).
Daniell's theorem, III.35.
Debut of a subset of $\mathbb{R}_+ \times \Omega$, III.44, essential, IV.39.
Degenerate law, II.4.
Derived set of a random set, IV.114 (Appendix).
Determining system, III.75 (Appendix).
Deterministic filtration, IV.13.
Dichotomic capacitance, IV. Appendix.
Diffuse law, II.19, note.
Disintegration of measures, III.70-73.
Distribution of a r.v., II.11.
Dominated convergence theorem, II.6.
Downcrossings, number of, IV.21.
Dunford-Pettis weak compactness criterion, II.25.
Elementary r.v., I.13.
Equivalent processes, IV.4.
Essential debut, IV.39.
Essential least upper bound, II.8.
Essential limit, topology, IV.36.
Evanescent random set, IV.8.
Event prior to t, IV.11, prior to a stopping time, IV.52, strictly prior to a stopping time, IV.54.
Expectation, II.5, conditional, II.37-40, generalized conditional, II.39.
Fatou's Lemma, II.7.
σ-field, I.1, Baire, Borel, I.7, generated by ..., I.5, optional, predictable, IV.51, product, I.8, progressive, IV.31.
Filter, rapid, II.27.
Filtration, IV.11, complete, IV.48, deterministic, IV.13, left-quasi-continuous, IV.82, natural, IV.12.
Finite intersection property, III.2.
Fubini's theorem, II.14.
Function, analytic, II.61, Borel, I.7, free of oscillatory discontinuities, IV.20, measurable, I.2, universally measurable, II.32.
Galmarino's test, IV.99-101.
Generalized conditional expectation, II.39.
Generated by ..., σ-field, I.5.
Hypothesis, continuum, 0.8.
Image law, II.11.
Increasing family of σ-fields, IV.11, cf. also Filtration.
Independence, II.33-34, conditional, II.43.
Indicator, I.3.
Indistinguishable processes, IV.7.
Induction, transfinite, 0.8.
Inner regular measure, III.46.
Integrable, uniformly, set, II.17.
Integral of a family of laws, II.15.
Internally negligible set, II.30.
Interval, stochastic, IV.60.
Inverse system, III.53.
Isomorphism, measurable or Borel, I.11.
Jensen's Inequality, II.41 (4).
Kernel, of a Souslin scheme, III.75 (appendix), perfect, IV.112 (appendix).
Kind, ordinal of the first or second, 0.8.
Kolmogorov's theorem, II.51-52.
(L), (LL), (LLL) space, III.63.
La Vallée Poussin's uniform integrability criterion, II.22.
Law (probability), degenerate, II.4, image, II.11, of a r.v., II.11, product, II.14 (Remark), time, of a process, IV.4.
LCC space, 0.7.
Lebesgue approximation, I.17.
Lebesgue measurable process, IV.35.
Lebesgue's theorem, II.6-7.
Left-continuous capacity, II.39.
Lemma, Fatou's, II.7.
Lifetime, IV.19.
Limit, inverse, III.53.
Limit ordinal, 0.8.
Lindelöf space, III.63.
Locally bounded measure, III.47.
L.s.c. functions, 0.7.
Lusin topological space, III.67.
Lusin measurable space or metrizable topological space, III.16.
Mapping, see Function.
Mass, unit, III.4.
Measurable family of laws, II.13.
Measurable function, I.2.
Measurable process, IV.3, in the Lebesgue sense, IV.35, progressively, IV.14.
Measurable space, I.1.
Measure, 0.5, convergence in, II.10, inner regular, III.46, locally bounded, III.47, outer regular, III.47, product, II.14, Radon, III.46.
Monotone class, I.19, theorem, I.19-21.
Narrow convergence of measures, III.54.
Natural family of a process, IV.12.
Negligible, internally, set, II.30.
Number of upcrossings and downcrossings, IV.21.
Operation (A), Souslin, III.75 (Appendix).
Optional σ-field, process, IV.61.
Optional time, IV.49.
Ordinal, 0.8, limit, of the first or second kind, 0.8.
Orthogonality, II.9.
Outer regular measure, III.47.
Path, IV.1.
Paved set, III.1.
Paving, III.1, compact, semicompact, III.3, sum, product, III.1.
Penetration time, IV.111 (Appendix).
Perfect random set, IV.112 (Appendix).
Polish space, 0.7.
Predecessor of an ordinal, 0.8.
Predictable σ-field, process, IV.61.
Predictable time, IV.68.
Prior to t, event, IV.11.
Prior to T, event, IV.52.
Probability, convergence in, II.10.
Probability law, II.1.
Probability space, II.1, complete, II.3.
Process, IV.1, adapted, IV.12, canonical, IV.9, measurable, IV.3, measurable in the Lebesgue sense, IV.35, progressive, IV.14, separable, IV.26.
Processes, equivalent, IV.4, indistinguishable, IV.6.
Product σ-field, I.8.
Product law, measure, II.14.
Product paving, III.1.
Progressive, progressively, measurable, IV.14.
Progressive σ-field, set, IV.31.
Prokhorov's Theorems, (inverse limits), III.53, (strict compactness), III.59.
Pseudo-law of a process, IV.44.
Pseudo-path, IV.41.
Quasi-left continuous filtration, IV.82.
Radon measure, III.46.
Random closed (right-, left-closed) set, IV.31.
Random set, IV.8, measurable, progressive, closed ..., IV.31.
Random variable, r.v., I.2, elementary, I.13, real-valued, I.13.
Rapid filter, II.27.
R.c.l.l., etc., IV.16.
Real-valued random variable, I.13.
Regular, inner, measure, III.46.
Regular, outer, measure, III.47.
Regular Souslin scheme, III.75 (Appendix).
Riesz Representation Theorem, III.35 bis.
Right-continuous capacity, III.41.
Right-continuous filtration, IV.11.
Right essential limit, topology, IV.36.
Right topology, IV.24, note.
Scheme, Souslin, III.75 (appendix).
Semicompact paving, III.3.
Separable σ-field, I.10.
Separable measurable space, I.10.
Separable, right separable process, IV.25-26.
Separable topological space, 0.7.
Separation theorem, III.14 and III.23, second, III. Appendix.
Set, analytic, in a metrizable space, III.15, $\mathcal{F}$-analytic, III.7, $\mathcal{F}$-bianalytic, III.83 (appendix), capacitable, III.27, internally negligible, II.30, paved, III.1, time, of a process, IV.1, universally measurable, II.32.
Sets separable by a paving, III.14.
Skorokhod topology, IV.19.
Souslin, III.16 and III.67.
Souslin scheme, III.75 (appendix).
Souslin-Lusin theorem, III.21.
Space, Blackwell, III.24, Hausdorff, I.9, Lusin, Souslin, cosouslin, III.16, measurable, I.1, separable, I.10.
Standard modification of a process, IV.6.
State of a process at instant t, IV.1.
State space of a process, IV.1.
Stationary sequence, IV.55.
Stochastic interval, IV.60.
Stopping time, IV.49, accessible, totally inaccessible, IV.80, foretellable, IV.70, predictable, IV.68, in the wide sense, IV.49.
Strictly prior to, event, IV.54.
Strong convergence, II.10.
Strongly subadditive set function, III.30.
Sum paving, III.1.
Support of a measure, III.50.
System, determining, III.75 (Appendix), inverse, of laws, III.53.
Test, Galmarino's, IV.99-101.
Time, accessible, totally inaccessible, IV.80, penetration, IV.111 (appendix), predictable, IV.68, optional, IV.49.
Time law of a process, IV.4.
Time set of a process, IV.1.
Topological space, LCC, 0.7, Lindelöf ((L), (LL), (LLL)), III.63, Lusin, Souslin, cosouslin, III.16 (metrizable case) and III.67, Polish, 0.7, separable, 0.7.
Topology, essential, right essential, IV.36.
Totally inaccessible, IV.80.
Transfinite induction, 0.8.
Uniformly integrable, II.17.
Unit mass, II.4.
Universal completion, II.32.
Universally measurable, II.32.
Upcrossings, number of, IV.21.
U.s.c. functions, 0.7.
Usual augmentation, conditions, IV.48.
Variable, random, I.2.
Versions of a process, IV.4.
Vitali-Hahn-Saks theorem, II.23.
Weak convergence, II.10.
Well measurable (= optional) process, set, IV.61.
Wide sense stopping time, IV.49.
INDEX OF NOTATION
$A^c$, $A \setminus B$, $A \triangle B$, $\{x : \ldots\}$, $f|_A$, $\mathcal{E}|_A$ (set theory), 0.1.
Closure under $\cup f$, $\cup c$, $\cup a$, $\cup mc$, ..., 0.2.
$\wedge$, $\vee$, etc., 0.3.
$s \uparrow t$, $s \uparrow\uparrow t$, $s_n \uparrow t$, $\lim_n$, $\liminf_n$, 0.4.
$\|\mu\|$, total mass of the measure $\mu$, $\int f\mu$, $\lambda/\mu$, $\int f(t)\,dF(t)$, 0.5.
$C(E)$, $C_b(E)$, $C_c(E)$, $C_0(E)$, C~(E), spaces of continuous functions, 0.6.
~(~), b(~), measurable functions, 0.6.
l.s.c., u.s.c. (semi-continuity), 0.7.
r.v., random variable, I.2.
$I_A$, indicator of $A$, I.3.
$\sigma(\mathcal{E})$, σ-field generated by $\mathcal{E}$, I.5.
$\sigma(f_i, i \in I)$, σ-field generated by the $f_i$, I.5.
$\mathcal{B}(E)$, Borel σ-field, I.7.
$\mathcal{E}_1 \times \mathcal{E}_2$, $\prod_i \mathcal{E}_i$, product σ-field, I.8.
Hausdorff space associated with $(\Omega, \mathcal{F})$, I.9.
$f \otimes g$, the mapping $(x,y) \mapsto f(x)g(y)$, I.24 (note).
a.s., almost surely, II.1.
$\varepsilon_x$, unit mass at $x$, II.4.
$E[f]$, expectation of $f$, II.5.
$\mathcal{L}^p$ (function space), $L^p$ (class space), II.8.
$\|f\|_p$, norm in $L^p$ (possibly $+\infty$), II.8.
$\sigma(L^1, L^\infty)$, $\sigma(L^2, L^2)$, weak topologies, II.10.
$f(P)$, image law of $P$ under $f$, II.11.
$\lambda \otimes \mu$, product measure, II.14 (Remark (b)).
$\int P_x\,Q(dx)$, integral of probability laws, II.15.
$f^c$, $f_c$, truncation of $f$, II.17.
~, completion σ-field of $\mathcal{F}$ with respect to $P$, II.32.
~, universal completion σ-field of $\mathcal{F}$, II.32.
$\mathcal{B}_u(E)$, universally measurable σ-field, II.32.
$E[X/\mathcal{F}]$, conditional expectation (provisional notation), II.37.
$E[X|\mathcal{F}]$, $E[X|f_i, i \in I]$, $P[A|\mathcal{F}]$, $P[A|f_i, i \in I]$, conditional expectations and probabilities, II.40.
$E[X|\mathcal{F}_1|\mathcal{F}_2]$, iterated conditional expectation, II.40.
$\prod_i \mathcal{E}_i$, $\mathcal{E}_1 \times \mathcal{E}_2$, product paving, ambiguous notation used only in Chapter III.
$\sum_i \mathcal{E}_i$, sum paving.
$\mathcal{K}(E)$, paving consisting of the compact subsets of a Hausdorff space, III.3.
$\mathcal{A}(\mathcal{F})$, paving consisting of the $\mathcal{F}$-analytic sets, III.7.
$\mathcal{C}(\mathcal{F})$, closure of $\mathcal{F}$ under ($\cup c$, $\cap c$), notation used occasionally in Chapter III, III.14.
$\mathcal{E}|_A$, trace σ-field of $\mathcal{E}$ on $A$, III.15.
$\mathcal{G}(E)$, paving consisting of the open sets of a topological space.
$\mathcal{A}(E)$, paving consisting of the analytic subsets of a metrizable space $E$.
$\mathcal{K}$, $\mathcal{G}$, $\mathcal{B}$, $\mathcal{A}$, abridged notation for $\mathcal{K}(E)$, etc.
$\mathcal{L}(\mathcal{E})$, $\mathcal{S}(\mathcal{E})$, $\mathcal{S}'(\mathcal{E})$, pavings consisting of the Lusin, Souslin, cosouslin subsets of a Hausdorff measurable space $(E, \mathcal{E})$.
~(~), paving consisting of the complements of elements of $\mathcal{A}(\mathcal{E})$, III.19.
$I^*$, outer capacity, III.32.
$\mu^*$, outer measure, III.37.
$I^+$, outer capacity calculated using open sets, III.42.
$D_A$, debut of $A$, III.44.
$\mathcal{M}^+(E)$, positive Radon measures on a completely regular space $E$, III.54.
$\mathcal{P}(E)$, space of Radon laws on $E$, III.60.
$\mathcal{F}_t$, $\mathcal{F}_{t+}$, $\mathcal{F}_{t-}$ (filtrations), IV.11.
$\partial$, $\zeta$, cemetery point and lifetime, IV.19.
r.c.l.l., etc., IV.16.
$f(t+)$, $f(t-)$, IV.20.
$U(f, I; [a,b])$, $D(f, I; [a,b])$, number of upcrossings and downcrossings, IV.21.
ess sup $f(s)$, ess lim sup $f(s)$, etc., IV.36.
$f(t) = \operatorname{ess\,lim\,sup}_{s \downarrow\downarrow t} f(s)$, IV.37.
~oo-' J't(t
1~, IV.48.