T19 THEOREM
... 0 is, a number δ > 0 can be found such that the conditions A ∈ ℱ, P(A) < δ, imply the inequality

\[ \int_A |f(\omega)|\,dP(\omega) \ \dots \]
0, associate with it some δ > 0 satisfying (b) and take c = sup_{f∈ℋ} E[|f|]/δ, a finite quantity in view of (a). Apply formula (19.2), taking for A the set {|f| ≥ c}, whose probability is at most δ in view of the inequality

\[ P\{|f| \ge c\} \le \frac{1}{c}\,E[|f|]. \]
We obtain the inequality
∫ ... 0 can be found such that ℋ ⊂ B_c + U_ε. Observe that such a sum set is convex: if it contains ℋ, it therefore contains its convex hull.

Remark Let H and K be two uniformly integrable subsets of L¹. Their union H ∪ K is clearly uniformly integrable, and hence so is the convex hull of H ∪ K. It then follows from the inclusion ½(H + K) ⊂ (the convex hull of H ∪ K) that the sum H + K is uniformly integrable. This result can also be deduced from Theorem 19.

The following is a generalization of Lebesgue's theorem. T21
THEOREM Let (f_n)_{n∈N} be a sequence of integrable random variables that converges almost everywhere* to a random variable f. Then f is integrable, and the convergence of f_n to f takes place in the L¹ norm, if and only if the f_n are uniformly integrable. If the random variables f_n are positive, they also are uniformly integrable if and only if

\[ \lim_n E[f_n] = E[f] < \infty. \]
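The role of uniform integrability can be seen on a standard example that is not taken from the text: on [0, 1] with Lebesgue measure, the functions f_n = n·1_[0,1/n] converge to 0 almost everywhere, yet E[f_n] = 1 for every n, so there is no convergence in the L¹ norm. The sketch below assumes the usual formulation of uniform integrability (the integrals of f_n over {f_n ≥ c} tend to 0 uniformly in n as c grows) and computes those tail integrals exactly. The function names are illustrative only.

```python
from fractions import Fraction

def expectation(n):
    """E[f_n] for f_n = n * 1_[0, 1/n] under Lebesgue measure on [0, 1]."""
    return Fraction(n) * Fraction(1, n)

def tail_integral(n, c):
    """Integral of f_n over the set {f_n >= c}, for a threshold c > 0."""
    # f_n equals n on [0, 1/n] and 0 elsewhere, so {f_n >= c} is the
    # interval [0, 1/n] when n >= c and is empty otherwise.
    return expectation(n) if n >= c else Fraction(0)

def sup_tail(c, n_max):
    """Largest tail integral among f_1, ..., f_{n_max}."""
    return max(tail_integral(n, c) for n in range(1, n_max + 1))

for c in (10, 100, 1000):
    # The supremum does not decrease as c grows, so the family is not
    # uniformly integrable, in agreement with lim E[f_n] = 1 != 0 = E[f].
    print(c, sup_tail(c, n_max=10 * c))
```

Each printed supremum equals 1, whatever the threshold c.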
Proof Let us suppose first that f_n converges to f in norm (which presupposes the integrability of f) and show that conditions (a) and (b) of Theorem 19 are verified. Denote by A any measurable set. We have

\[ \int_A |f_n(\omega)|\,dP(\omega) \ \dots \]
and Φ′ be two limit points: these two functions are a.s. equal to ℱ-measurable functions [see 9(b)]; in order to establish their a.s. equality, it thus suffices to show [in virtue of 9(a)] that

\[ \int_E \Phi(\omega)\,dP(\omega) = \int_E \Phi'(\omega)\,dP(\omega) \]

for every set E ∈ ℱ. Now this equality holds for E ∈ ℱ₀. Denote by ℳ the collection of subsets E ∈ ℱ for which this equality is true; it follows from Lebesgue's theorem that ℳ is closed under passage to monotone limits, and from I.T19 that ℳ = ℱ. Thus Φ = Φ′ a.s., and the theorem is established.
3. Construction of Measures. Radon Measures
We have as yet given no method for the construction of a measure on a measurable space. The beginning of this section indicates the two theorems which are most often used for that purpose, as well as a procedure which sometimes permits the enlargement of the σ-field on which a measure is defined. We limit ourselves to the case of probability laws. Almost all the measures we shall find on our way will be Radon measures (in potential theory) or abstract measures arising from the modification of Radon measures (stochastic processes taking values in compact spaces). The obvious reference for Radon measures is Bourbaki's treatise (Integration, Chapters III and IV). The exposition we present here is very schematic, its goal being only the explanation of the roles played by the Baire and Borel σ-fields. The reader will find in Chapter III the proofs of several theorems cited in this section.

Extension theorems
The Daniell extension theorem is proved, for example, in Loomis [(90), p. 29]. See also III.T25.

T24 THEOREM (Daniell) Let ℋ be a vector space of real-valued functions defined on a set Ω, which contains the constants and is closed under the operation ∨. Let I be a positive linear functional on ℋ such that I(1) = 1. There then exists a probability law P on the σ-field ℱ generated by ℋ, such that every function f ∈ ℋ is P-integrable and

\[ I(f) = \int_\Omega f(\omega)\,dP(\omega) \]

if and only if the following condition holds:

for every decreasing sequence (f_n) of elements of ℋ such that lim_n f_n = 0, lim_n I(f_n) is equal to 0. (24.1)

The law P then is unique.
Probability Laws and Mathematical Expectations
The following theorem can be deduced from T24, by taking for ℋ the set of all finite linear combinations of indicators of elements of ℱ₀.

T25 THEOREM (Carathéodory) Let ℱ₀ be a nonempty collection of subsets of Ω, closed under (∪f, ∁), and let I be a positive, additive set function defined on ℱ₀ such that I(Ω) = 1. There exists a probability law P on the σ-field ℱ generated by ℱ₀, such that P(A) = I(A) for every A ∈ ℱ₀, if and only if the following condition holds:

lim_n I(A_n) is equal to 0 for every decreasing sequence (A_n)_{n∈N} of elements of ℱ₀ such that ∩_n A_n = ∅. (25.1)

The law P then is unique.
Internally negligible sets
D26 DEFINITION Let (Ω,ℱ,P) be a probability space. We shall say that a set A ⊂ Ω is internally P-negligible* if every ℱ-measurable subset of A has probability zero.

T27 THEOREM Let 𝒩 be a collection of subsets of Ω that satisfies the following conditions: 1. 𝒩 is closed under (∪c). 2. Every element of 𝒩 is internally P-negligible. Let ℱ′ be the σ-field generated by ℱ and 𝒩; then the law P can be extended in a unique manner to a law P′ on ℱ′, such that every element of 𝒩 is P′-negligible.

Proof We only indicate the steps in the reasoning and leave the details to the reader. Let ℳ be the collection of all subsets of Ω contained in some element of 𝒩, and let 𝒢 be the collection of subsets of the form F Δ M (F ∈ ℱ, M ∈ ℳ). It is easily verified that 𝒢 is a σ-field. Since ℳ contains the empty set, we have ℱ ⊂ 𝒢, and similarly ℳ ⊂ 𝒢. Let A = F Δ M be an element of 𝒢; put Q(A) = P(F). It can be checked that Q(A) depends only on A, and not on its representation as F Δ M. In order to show that Q is a probability law on 𝒢, consider a sequence (A_n)_{n∈N} of disjoint elements of 𝒢, and their union A; each is of the form F_n Δ M_n (F_n ∈ ℱ, M_n ∈ ℳ). Let F be the union of the F_n. Since they are disjoint up to negligible sets, we have P(F) = Σ_n P(F_n); on the other hand, A and F differ only by an element of ℳ. Thus Q(A) = Σ_n Q(A_n). The law P′ then is the restriction of Q to ℱ′. In order to establish the uniqueness of P′, consider another law P″ on ℱ′ satisfying the same assumptions. Every element of ℳ then is internally P″-negligible, and P″ thus can be extended to a law on 𝒢 such that every element of ℳ is negligible. This law must be identical to Q, and hence P′ = P″.
28 Remarks (a) This result is often applied to families 𝒩 consisting of one single internally negligible set.

(b) This theorem implies the existence of the completion of a probability space (Ω,ℱ,P) (No. 3), 𝒩 denoting then the collection of all subsets of negligible sets. Let ℱ^P be the completed σ-field. Every element of ℱ^P can be written F Δ M, where F belongs to ℱ and M is contained in a P-negligible set N ∈ ℱ. F Δ M then is included between F \ N and F ∪ N, which belong to ℱ and differ by a negligible set. The usual approximation of real-valued measurable functions by step functions then gives the following result: A real-valued function f is ℱ^P-measurable if and only if there exist two functions g and h, ℱ-measurable, such that g ≤ f ≤ h, P{g ≠ h} = 0.
* Sets of probability 0 are often called P-negligible sets.
(c) Let (Ω,ℱ) be a measurable space; for each law P on (Ω,ℱ), we consider the completed σ-field ℱ^P, and we denote by ℱ̂ the intersection (over P) of all these σ-fields. The measurable space (Ω,ℱ̂) is called the universal completion of (Ω,ℱ). The reader can easily check the following properties:
(1) Every law P on ℱ can be uniquely extended to a law P̂ on ℱ̂, and P ↦ P̂ is a one-to-one mapping onto the set of all laws on ℱ̂.
(2) Let (E,ℰ) be a measurable space, and f be a measurable function from (Ω,ℱ) to (E,ℰ); f then also is measurable from (Ω,ℱ̂) to (E,ℰ̂).
We start now the study of Radon measures with some preliminaries on the relations between σ-fields and topology.

T29 THEOREM Let E be a compact space. A compact subset K of E belongs to the Baire σ-field if and only if K is the intersection of a sequence of open sets in E.

Proof Let K be a compact Baire set of E, and let (f_n) be a sequence of continuous functions, such that K belongs to the σ-field generated by the f_n (I.9). Let f be the continuous mapping (f_n)_{n∈N} of E into R̄^N. There exists a measurable subset A of R̄^N such that K = f⁻¹(A) (I.12), and we thus have K = f⁻¹(f(K)). Now R̄^N is a compact metrizable space, and the compact set f(K) therefore is the intersection of a sequence of open sets G_n. We then have K = ∩_n f⁻¹(G_n). Conversely, assume that the compact set K ⊂ E is the intersection of a sequence (G_n) of open sets. Let g_n be a continuous function with values in [0,1], equal to 1 on K and to 0 off G_n. Then K is equal to ∩_n {g_n = 1}, and thus belongs to the Baire σ-field.

T30 THEOREM Let (E_i)_{i∈I} be a family of topological spaces, and let E be their product. (a) Assume that each E_i is metrizable and separable, and that I is countable. Then the Borel σ-field ℬ(E) is equal to the product σ-field ∏_{i∈I} ℬ(E_i). (b) Assume that each E_i is compact (I being arbitrary). The Baire σ-field ℬ₀(E) then is equal to ∏_{i∈I} ℬ₀(E_i).
Proof Since the Baire and Borel σ-fields are equal for metrizable spaces, and since the product E in statement (a) is metrizable, we may consider only Baire σ-fields. Let us denote by (X_i)_{i∈I} the coordinate functions on E; each X_i, being continuous, is measurable from (E, ℬ₀(E)) into (E_i, ℬ₀(E_i)) [I.10(d)]. The identity mapping of (E, ℬ₀(E)) onto (E, ∏_{i∈I} ℬ₀(E_i)) thus is measurable (I.T12), which means that the product σ-field is contained in ℬ₀(E). We now establish the reverse inclusion. In case (a), it follows from the existence of a countable base for the topology of E, the elements of which belong to the product σ-field. In case (b), let us prove that every real-valued continuous function on E is measurable with respect to the product σ-field. This follows indeed from the Stone-Weierstrass theorem,* according to which continuous functions can be approximated uniformly on E by polynomials f in functions h_1, ..., h_n, where each h_k (k = 1, ..., n) is a continuous function on E which depends on one coordinate only; such an f is obviously measurable with respect to the product σ-field.†

31 Let E be a topological space, A be a subspace of E, and i be the inclusion mapping from A into E. We show that the Borel sets in A are exactly the intersections with A of Borel sets in E. Let 𝒯 be the collection of these intersections. Since the mapping i is continuous, we have 𝒯 ⊂ ℬ(A)
* Dunford and Schwartz (67), p. 272, Loomis (90) p. 9, Bourbaki (15), Section 4, No. 2.
† Statement (b) is false for Borel σ-fields of nonmetrizable compact spaces, even for finite products.
[I.10(d)]. Conversely, 𝒯 is a σ-field and contains the open sets of A (intersections with A of open sets of E); 𝒯 thus contains ℬ(A).
32 Let X and Y be two random variables on a probability space (Ω,ℱ,P) with values in a separable metric space E. Since the diagonal of E × E belongs to ℬ(E × E) = ℬ(E) × ℬ(E), the set {X = Y} is an event, which has probability one if and only if the following equality holds for every measurable bounded function f, or only (I.T20) for continuous f:

\[ E[f(X,Y)] = E[f(X,X)] \qquad (32.1) \]

According to I.T20, it suffices to verify (32.1) for functions f(x,y) = g(x)h(y), where g and h are bounded and continuous respectively on X and Y.
Radon measures

We could have omitted the subject of Radon measures, and referred the reader to the treatise of Bourbaki. We have not done this because Bourbaki associates directly, with every Radon measure μ, a set function defined on the σ-field of μ-measurable sets. For probability theory we need to be a little more delicate in the matter of the σ-fields, and to examine the different extension procedures with more care. The case of compact spaces suffices to illustrate this point.

33 Let E be a compact space. Recall that a positive Radon measure on E is a positive linear functional μ on the space 𝒞(E), and that a Radon measure is a linear functional on 𝒞(E) equal to the difference of two positive Radon measures (or, alternatively, continuous in the topology of uniform convergence). Let μ be a positive Radon measure. According to the classical lemma of Dini,* every sequence of continuous functions that decreases to 0 converges uniformly to 0 on E. The linear functional μ thus satisfies condition (24.1), and the Daniell extension theorem gives us the "Riesz representation theorem": ...
T34 THEOREM Let Φ be the mapping that associates with each bounded positive measure m on (E, ℬ₀(E)) the linear functional f ↦ ∫_E f(x) dm(x) on 𝒞(E). Then Φ is a bijection of the set of bounded positive measures onto the set of positive Radon measures.

We have, on the other hand, the following result: ...
T35 THEOREM Let m be a positive bounded measure on ℬ₀(E); then m can be extended, in a unique way, to a measure m̄ on ℬ(E), which possesses the following property: Let (K_i)_{i∈I} be a family of compact sets in E, which is filtering to the left. Then we have m̄(∩_i K_i) = inf_i m̄(K_i).† (35.1)

The measure m̄ then is regular (see III.28): for every Borel set A,

\[ \bar{m}(A) = \sup_{K \subset A,\ K \text{ compact}} \bar{m}(K). \qquad (35.2) \]
Let us denote by ψ the inverse of the bijection Φ of No. 34; we simplify our language by calling "Radon measures" the original Radon measure μ, the measure ψ(μ) on ℬ₀(E), and

* See Bourbaki (15), §4, No. 1 (p. 53). See also X.6 below.
† This holds in particular when the compact sets and their intersection belong to ℬ₀(E), a result not obvious from the definition of μ.
its extension \overline{ψ(μ)} to ℬ(E), and further by writing μ instead of ψ(μ), \overline{ψ(μ)}. If f denotes a Borel function, the integral of f with respect to μ can be written as μ(f), ⟨μ,f⟩, or ∫ f(x) dμ(x). Note that a l.s.c. (lower semicontinuous) function is Borel. We then have the following statement,* which appears in some way as a generalization of T35, since the indicator of a compact set is upper semicontinuous.

T36 THEOREM Let (f_i)_{i∈I} be a family of l.s.c. functions, which is filtering to the right. We then have

\[ \mu\bigl(\sup_i f_i\bigr) = \sup_i \mu(f_i). \qquad (36.1) \]
The first member of (36.1) makes sense, since the function sup_i f_i is l.s.c. Note that this result only concerns positive functions, but doesn't require their integrability. Theorems 35 and 36 can also be considered from an "abstract" point of view (see III.T33 and T34).

37 Bounded positive measures on the Borel field ℬ(E) of a nonmetrizable compact space E are not necessarily Radon measures. A positive measure λ on ℬ(E) is a Radon measure if and only if it is bounded (on locally compact spaces this condition must be replaced by finiteness on compact sets), and if it verifies either one of the following (equivalent) properties:

1° Let f be a positive bounded l.s.c. function; then we have

\[ \lambda(f) = \sup_{g \in \mathcal{C}^+,\ g \le f} \lambda(g). \qquad (37.1) \]

2° λ is a regular measure [i.e., verifies (35.2)]. (37.2)
This implies in particular that the sum of a series of Radon measures is itself a Radon measure if and only if it is bounded.

Completion of a Radon measure

The following definition given for compact spaces extends in fact to the case of a locally compact and σ-compact space, but more care is required for general locally compact spaces.

D38 DEFINITION Let μ be a Radon measure on (E, ℬ(E)). We denote by ℬ^μ the completion of ℬ(E) with respect to μ; the elements of ℬ^μ are called μ-measurable sets. The elements of the intersection σ-field

\[ \mathcal{B}_u(E) = \bigcap_\mu \mathcal{B}^\mu, \]

where μ ranges over the collection of all positive Radon measures on E, are called universally measurable sets. If E is metrizable, ℬ_u(E) is just the universal completion of ℬ(E) (28,c).
t Bourbaki (18), Chap. 4, Section 1, Theorem 1, p. 105.
4. Independence. Conditioning

The notion of independence is seldom used in this book. The definitions given below are principally intended to prepare the reader to understand the notion of conditional independence, essential to the theory of Markov processes.

Definition of independence

D39 DEFINITION Let (X_i)_{i∈I} be a finite family of random variables defined on the probability space (Ω,ℱ,P), with values in the measurable spaces (E_i,ℰ_i)_{i∈I}. Let X be the random variable (X_i)_{i∈I} with values in the space (∏_{i∈I} E_i, ∏_{i∈I} ℰ_i). The random variables X_i are said to be independent if the law of X is the product of the laws of the X_i. Let (X_i)_{i∈I} be any family of random variables. The X_i are said to be independent if the random variables of every finite subfamily are independent.

This definition can take the following form (No. 14): The random variables (X_i)_{i∈I} are independent if and only if for every finite set J ⊂ I, and every family (A_i)_{i∈J} such that A_i ∈ ℰ_i for i ∈ J,

\[ P\{X_i \in A_i \text{ for every } i \in J\} = \prod_{i \in J} P\{X_i \in A_i\}. \]
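The product formula can be checked by brute force on a small example. The sketch below assumes a hypothetical finite probability space (two fair coin tosses) and random variables with finitely many values. In that setting it is enough to test the formula on singletons A_i = {a}, since every set of values is a finite disjoint union of singletons. None of the names below come from the text.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite probability space: two fair coin tosses.
omega = list(product((0, 1), repeat=2))
prob = {w: Fraction(1, 4) for w in omega}

def prob_of(event):
    """P(event), where event is a predicate on sample points."""
    return sum(prob[w] for w in omega if event(w))

def independent(x, y):
    """Check P{X = a, Y = b} = P{X = a} P{Y = b} for all values a, b."""
    xs = {x(w) for w in omega}
    ys = {y(w) for w in omega}
    return all(
        prob_of(lambda w: x(w) == a and y(w) == b)
        == prob_of(lambda w: x(w) == a) * prob_of(lambda w: y(w) == b)
        for a in xs
        for b in ys
    )

first = lambda w: w[0]          # result of the first toss
second = lambda w: w[1]         # result of the second toss
total = lambda w: w[0] + w[1]   # number of ones

print(independent(first, second))  # the two tosses
print(independent(first, total))   # a toss and the total
```

The first pair satisfies the product formula; the second does not, because the event {first = 0, total = 2} is empty while both marginal probabilities are positive.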
The definition of independence can take another equally interesting form:

D40 DEFINITION Let (Ω,ℱ,P) be a probability space and let (ℱ_i)_{i∈I} be a family of sub-σ-fields of ℱ. These σ-fields are said to be independent if, for every finite subset J ⊂ I and every family of sets (A_i)_{i∈J} such that A_i ∈ ℱ_i for i ∈ J,

\[ P\Bigl(\bigcap_{i \in J} A_i\Bigr) = \prod_{i \in J} P(A_i). \]

Definitions 39 and 40 are easily shown to be equivalent. The random variables (X_i)_{i∈I} are, in fact, independent (in the sense of D39) if and only if the σ-fields 𝒯(X_i) are independent (in the sense of D40). In the same way, the σ-fields (ℱ_i)_{i∈I} are independent if and only if the random variables X_i are independent, where X_i denotes the identity mapping of (Ω,ℱ) onto (Ω,ℱ_i).

T41 THEOREM Let ℱ_1, ℱ_2, ..., ℱ_n be independent σ-fields, and let f_1, f_2, ..., f_n be real-valued random variables, measurable with respect to the corresponding σ-fields ℱ_1, ..., ℱ_n. The product f_1 f_2 ... f_n then is integrable if f_1, ..., f_n are integrable and, moreover,

\[ E[f_1 f_2 \cdots f_n] = E[f_1]\,E[f_2] \cdots E[f_n]. \]
42 Let X and Y be two independent real-valued random variables, and let λ and μ be their respective laws. The law of the pair (X,Y) in R² is the product law λ ⊗ μ. The law of the random variable X + Y is the image of this product law under the mapping (x,y) ↦ x + y from R² into R. In other words, the law of X + Y is the convolution λ * μ.
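For laws with finite support, the image of the product law under (x,y) ↦ x + y can be computed directly. The following is a minimal sketch under that assumption, with laws represented as dictionaries from values to probabilities; the function name `convolve` and the dice example are illustrative and not from the text.

```python
from collections import defaultdict
from fractions import Fraction

def convolve(law_x, law_y):
    """Law of X + Y for independent X, Y with finitely supported laws."""
    law_sum = defaultdict(Fraction)
    for x, px in law_x.items():
        for y, py in law_y.items():
            # Mass of the product law at (x, y), sent to the point x + y.
            law_sum[x + y] += px * py
    return dict(law_sum)

# Law of one fair die, then the law of the sum of two independent dice.
die = {k: Fraction(1, 6) for k in range(1, 7)}
two_dice = convolve(die, die)

print(two_dice[7])             # probability that the sum equals 7
print(sum(two_dice.values()))  # total mass of the convolution
```

The total mass of the result is 1, as it must be for the image of a probability law.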
Conditioning
The notion of conditional expectation often appears a little disconcerting at first sight. Its use in this book will present no difficulties, however, since we shall use only formal properties of conditional expectations, all summarized in this section.

T43 THEOREM Let (Ω,ℱ,P) be a probability space, and f a random variable defined on (Ω,ℱ) with values in a measurable space (E,ℰ). Let Q be the image law of P by f. Let X be a P-integrable random variable on (Ω,ℱ). There exists a Q-integrable random variable Y on (E,ℰ) such that for every set A ∈ ℰ:

\[ \int_A Y(x)\,dQ(x) = \int_{f^{-1}(A)} X(\omega)\,dP(\omega). \qquad (43.1) \]
Let Y′ be another random variable satisfying (43.1); then Y = Y′ a.s.

Proof The assertion regarding the uniqueness of Y is an immediate consequence of remark 9(a). In order to establish the existence of Y, we begin by considering the case where X is square-integrable relative to P. Associate with every element Z of ℒ²(E,ℰ,Q) the number ∫_Ω (Z ∘ f) X dP, which depends only on the equivalence class of Z. We thus construct a linear functional on L²(E,ℰ,Q), whose norm is at most equal to ‖X‖₂. Hence there exists a function Y ∈ ℒ²(E,ℰ,Q) such that

\[ \int_\Omega (Z \circ f)\,X\,dP = \int_E Z\,Y\,dQ \quad \text{for every such } Z. \]

The function Y is as desired. Suppose that the random variable X is positive: the random variable Y then has a positive integral over every set A ∈ ℰ. It is thus a.s. positive, from 9(a). Consider now the case where the random variable X is only supposed integrable. The same then holds true for its positive part X⁺, and for its negative part X⁻. The random variables X_n⁺ = X⁺ ∧ n (n ∈ N) thus belong to L²(Ω,ℱ,P), so we can associate with them random variables Y_n⁺ as above. According to the preceding remark, these random variables are a.s. positive, increase a.s., and their integrals are bounded by E[X⁺]. We can thus choose an integrable random variable Y₊, equal a.s. to the limit of the Y_n⁺. In the same way construct a random variable Y₋, starting with X⁻. The integrable random variable Y = Y₊ − Y₋ satisfies relation (43.1), and the theorem is established.

D44 DEFINITION Let Y be an integrable real-valued random variable defined on (E,ℰ,Q), which satisfies relation (43.1). We say that Y is (a version of) the conditional mathematical expectation of X, given f.
45 Remarks (a) When X is the indicator function of an event B, Y is called the conditional probability of B, given f. It is important to keep in mind that this "probability" is a random variable defined only up to an a.s. equality, and not a number.

(b) Consider a partition of the set Ω into a sequence of measurable sets A_n, and denote by f the mapping of Ω into N equal to n on A_n. The image measure Q on N is then defined by

\[ Q(\{n\}) = P(A_n). \]
Let X be an integrable random variable on Ω; it is very easy to compute Y:

\[ Y(n) = \frac{1}{P(A_n)} \int_{A_n} X\,dP \quad \text{for every } n \text{ such that } P(A_n) \ne 0. \]
If P(A_n) is zero, Y(n) can be chosen arbitrarily. Suppose in particular that X is the indicator function of an event B; then Y(n) = P(B ∩ A_n)/P(A_n) if P(A_n) is not zero. One recognizes here the number called, in elementary probability theory, the conditional probability of B given that A_n has occurred. It would be tempting to say the same in the general case, and to call the value Y(x) (x ∈ E) "the conditional expectation of X given that f(ω) = x", but this terminology would be improper, because the random variable Y is defined only up to a.s. equality, and one can specify its value at a point x only if Q({x}) ≠ 0.

(c) Let X be a positive nonintegrable random variable. The passage to a monotone limit used in the proof of T43 still applies, and yields a positive random variable, not necessarily finite, defined up to a.s. equality, that satisfies formula (43.1). We speak in this case of the generalized conditional expectation.
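The computation of Remark 45(b) can be carried out by hand on a finite example. The sketch below assumes a hypothetical probability space (one roll of a fair die) partitioned by parity; the helper names are illustrative only. It computes Y(n) as the integral of X over A_n divided by P(A_n), and leaves Y(n) undefined where P(A_n) = 0, where the remark allows an arbitrary choice.

```python
from fractions import Fraction

# Hypothetical finite probability space: one roll of a fair die.
omega = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in omega}

def integral(x, event):
    """Integral of the random variable x over the event, with respect to P."""
    return sum(x(w) * prob[w] for w in event)

def conditional_expectation(x, f):
    """Y(n) = (integral of X over A_n) / P(A_n), where A_n = {f = n}."""
    cells = {}
    for w in omega:
        cells.setdefault(f(w), []).append(w)
    y = {}
    for n, cell in cells.items():
        p_cell = sum(prob[w] for w in cell)
        if p_cell != 0:  # Y(n) is arbitrary when P(A_n) = 0; omitted here
            y[n] = integral(x, cell) / p_cell
    return y

parity = lambda w: w % 2  # f: labels the partition {odd faces, even faces}
face = lambda w: w        # X: the face shown

y = conditional_expectation(face, parity)
print(y[1], y[0])  # conditional expectation on odd faces, then on even faces

# Relation (43.1) for A = {0}: Y(0) Q({0}) equals the integral of X over A_0.
evens = [w for w in omega if parity(w) == 0]
print(y[0] * Fraction(1, 2) == integral(face, evens))
```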
We have begun with Definition 44 of conditional expectation because we believe it is the most intuitive one. The following is a variation much more important in practice, which we use constantly. It is obtained by taking in statements 43-44: for E the set Ω, for ℰ a sub-σ-field of ℱ, and for f the identity mapping of Ω onto itself. The image measure Q is then the restriction of P to ℰ and we have the following definition:

D46 DEFINITION Let (Ω,ℱ,P) be a probability space, let ℰ be a sub-σ-field of ℱ, and let X be an integrable real-valued random variable. A (version of the) conditional expectation of X relative to ℰ is an integrable ℰ-measurable random variable Y such that

\[ \int_A X(\omega)\,dP(\omega) = \int_A Y(\omega)\,dP(\omega) \quad \text{for every } A \in \mathcal{E}. \qquad (46.1) \]
In the following we omit the word "version." We generally use the notation E[X | ℰ]* for Y. When ℰ is the σ-field 𝒯(f_i, i ∈ I) generated by a family of random variables, we speak of the conditional expectation of X relative to the f_i, and we write simply E[X | f_i, i ∈ I]. If X is the indicator function of an event A, we speak of the conditional probability of A relative to ℰ (or to the f_i) and we write P(A | ℰ) [or P(A | f_i, i ∈ I)]. It often happens that conditional expectations are superimposed in the form E[E[X | ℱ₁] | ℱ₂], where ℱ₁ and ℱ₂ are sub-σ-algebras of ℱ. We then employ the notation E[X | ℱ₁ | ℱ₂], which is more intelligible.

Remark Returning to the notation of statements 43-44, and denoting by 𝒮 the σ-field 𝒯(f), we have a.s. the equality E[X | 𝒮] = Y ∘ f. Theorem I.18 permits us to recover Definition 44 from Definition 46.
Fundamental properties of conditional expectations ...
47 Under this title we group all the properties of conditional expectations that we use in what follows. In particular, we state anew Definition 46, in another form. The random variables considered are defined on the space (Ω,ℱ,P).

* The notation E^ℰ[X] is also widely used.
PROPERTY 1 Let X and Y be two integrable random variables, a, b, and c be constants. For any σ-field ℰ ⊂ ℱ we have

\[ E[aX + bY + c \mid \mathcal{E}] = a\,E[X \mid \mathcal{E}] + b\,E[Y \mid \mathcal{E}] + c \quad \text{a.s.} \qquad (47.1) \]
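(49.1). We have

\[
\begin{aligned}
E[Y_1 Y_3 \mid \mathcal{F}_2]
  &= E[Y_1 Y_3 \mid \mathcal{F}_{12} \mid \mathcal{F}_2] && (6) \\
  &= E[(Y_1\,E[Y_3 \mid \mathcal{F}_{12}]) \mid \mathcal{F}_2] && (7) \\
  &= E[(Y_1\,E[Y_3 \mid \mathcal{F}_2]) \mid \mathcal{F}_2] && (51.1) \\
  &= E[Y_1 \mid \mathcal{F}_2]\,E[Y_3 \mid \mathcal{F}_2]. && (7)
\end{aligned}
\]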
The properties of conditional mathematical expectations will hereafter be used without special reference.
CHAPTER III
Complements to Measure Theory
The larger part of this chapter is devoted to the capacitability theorem of Choquet (in its "abstract" form) and to results connected with it. The remainder of this book furnishes several important applications of Choquet's theorem, mostly to potential theory, but also to the general theory of stochastic processes. In contrast to the first two sections, which refer to results worthy of being considered as classical, the last section contains some theorems of lesser importance, of interest mainly to the professional in probability theory. It can be omitted without inconvenience, at least by readers possessing a good knowledge of Radon measures.
1. Compact Pavings. Analytic Sets
1 Let E be a set. A paving on E is a collection of subsets of E that contains the empty set; the pair (E,ℰ) consisting of a set E and a paving ℰ on E is called a paved set. This terminology is used only in this chapter and in the applications depending on it. Let (E_i,ℰ_i)_{i∈I} be a family of paved sets. The product paving of the ℰ_i (respectively, sum paving of the ℰ_i) is the paving on the set ∏_{i∈I} E_i (respectively, on Σ_{i∈I} E_i) consisting of the subsets of the form ∏_{i∈I} A_i (respectively, Σ_{i∈I} A_i), where A_i ⊂ E_i differs from E_i (respectively, from ∅) only for a finite number of indices, for which A_i belongs to ℰ_i. It is important to note, when the ℰ_i are σ-fields, that the product paving of the ℰ_i is not identical with the product σ-field of the ℰ_i (the latter is generated by the product paving). Hence, there is ambiguity in using notations such as ∏_{i∈I} ℰ_i or ℰ × ℱ to denote a product paving. We shall use them nevertheless, in this chapter only.*

Compact and semicompact pavings

2 Let (E,ℰ) be a paved set, and let (K_i)_{i∈I} be a family of elements of ℰ. We say that this family has the finite intersection property if ∩_{i∈I₀} K_i ≠ ∅ for every finite subset I₀ ⊂ I. This amounts to saying that the sets K_i belong to a filter or alternatively, from the ultrafilter theorem,† that they belong to some ultrafilter 𝒰 on E.

* A solution consists in adopting the sign ⊗ for product σ-fields, as in Neveu (105). We didn't think it useful to do so here.
† Bourbaki (13), 3rd edition, Section 6, No. 4, Theorem 1.
D3 DEFINITION Let (E,ℰ) be a paved set. The paving ℰ is said to be compact (respectively, semicompact) if every family (respectively, every countable family) of elements of ℰ, which has the finite intersection property, has a nonempty intersection.
For example, if E is a Hausdorff topological space, the paving consisting of the compact sets of E is a compact paving. Let ℰ be a compact [semicompact] paving on E; then the paving ℰ ∪ {E} is compact [semicompact].

Properties of compact pavings

T4 THEOREM Let E be a set given a compact (respectively, semicompact) paving ℰ, and let ℰ′ be the paving obtained by closing ℰ under the operations (∪f, ∩a) [respectively, (∪f, ∩c)]. The paving ℰ′ is then compact (respectively, semicompact).

Proof Let ℱ be the paving obtained by closing ℰ under (∪f). The paving ℰ′ is obtained by closing ℱ under (∩a) [respectively, under (∩c)]. Since this last closure evidently preserves compactness, it will suffice to show that ℱ is a compact paving (respectively, semicompact). Consider thus a family (K_i)_{i∈I} (respectively, a countable family) of elements of ℱ, which has the finite intersection property; let 𝒰 be an ultrafilter such that K_i ∈ 𝒰 for every i ∈ I. Each set K_i is a union ∪_{j∈J_i} K_{ij} of elements of ℰ, where J_i is a finite set. Hence there exists an index j_i ∈ J_i such that K_{ij_i} ∈ 𝒰.* The family (K_{ij_i})_{i∈I} thus has the finite intersection property, its intersection therefore is nonempty, and that of the family (K_i)_{i∈I} is nonempty a fortiori.

T5 THEOREM Let (E_i,ℰ_i)_{i∈I} be a family of paved sets. If each of the pavings ℰ_i is compact (respectively, semicompact), then so are the product paving ∏_{i∈I} ℰ_i and the sum paving Σ_{i∈I} ℰ_i.
Proof The proof is immediate concerning the product paving. Let ℋ be the paving on the sum set Σ_{i∈I} E_i consisting of the subsets of the form Σ_{i∈I} A_i, where A_i = ∅ for all the indices except at most one, for which A_i belongs to ℰ_i. This paving is evidently compact (semicompact). It then suffices to note that the sum paving is obtained by closing ℋ under (∪f).

The following theorem will be used only for semicompact pavings, so we neglect the version for the compact case.

T6 THEOREM Let (E,ℰ) be a paved set, and let f be a mapping of E into a set F. Suppose that, for every x ∈ F, the paving consisting of the sets f⁻¹({x}) ∩ A, A ∈ ℰ, is semicompact. Then, for every decreasing sequence (A_n)_{n∈N} of elements of ℰ,

\[ f\Bigl(\bigcap_n A_n\Bigr) = \bigcap_n f(A_n). \]

Proof It suffices to show that with every x ∈ ∩_n f(A_n) an element y ∈ ∩_n A_n can be associated such that f(y) = x. But the family of sets of the form f⁻¹({x}) ∩ A_n has the finite intersection property, and hence it has nonempty intersection, so it suffices to choose y in this intersection.

* Bourbaki (13), 3rd edition, Chapter 1, Section 6, No. 4, prop. 5. This proof was communicated to us by G. Mokobodzki.
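The semicompactness hypothesis of T6 cannot simply be dropped. The following small example is an illustration supplied here, not taken from the text: it assumes E is the set of natural numbers, F has a single point, and f is constant. The paving fails to be semicompact and the conclusion of T6 fails with it.

```latex
% Assumed example (not from the source): E = \mathbb{N}, F = \{0\}, f \equiv 0.
\[
  A_n = \{n, n+1, n+2, \dots\}, \qquad
  \mathcal{E} = \{\emptyset\} \cup \{A_n : n \in \mathbb{N}\}.
\]
% The sets f^{-1}(\{0\}) \cap A_n = A_n have the finite intersection
% property, yet their intersection is empty: this paving is not semicompact.
\[
  f\Bigl(\bigcap_n A_n\Bigr) = f(\emptyset) = \emptyset
  \quad \neq \quad
  \bigcap_n f(A_n) = \{0\}.
\]
```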
ℱ-analytic sets

D7 DEFINITION Let (F,ℱ) be a paved set. A subset A of F is said to be ℱ-analytic if there exists an auxiliary set E with a semicompact paving ℰ, and a subset B ⊂ E × F belonging to (ℰ × ℱ)_{σδ} such that A is the projection of B on F. The paving on F consisting of the ℱ-analytic sets is denoted by 𝒜(ℱ).*

T8 THEOREM ℱ is contained in 𝒜(ℱ). The paving 𝒜(ℱ) is closed under (∪c, ∩c).
Proof The first assertion is evident. To establish the second, consider a sequence (A_n)_{n∈N} of ℱ-analytic sets. There exists by definition, for each integer n: a set E_n with a semicompact paving ℰ_n, and a subset B_n of E_n × F, belonging to (ℰ_n × ℱ)_{σδ} [and hence equal to the intersection of a sequence (B_{nm})_{m∈N} of elements of (ℰ_n × ℱ)_σ], whose projection on F is A_n. Let E be the product set ∏_n E_n with the semicompact paving ∏_n ℰ_n; let π be the projection of E × F on F. Denote by C_n the cylinder based on B_n in E × F, i.e., the set (∏_{m≠n} E_m) × B_n; ∩_n A_n is equal to π(∩_n C_n). The closure under (∩c) will thus be established if we show that the set ∩_n C_n belongs to (ℰ × ℱ)_{σδ}. It suffices for this to note that each C_n belongs to (ℰ × ℱ)_{σδ}. Now let E be the sum Σ_n E_n with the semicompact paving Σ_n ℰ_n, and let π be the projection of E × F on F. We have π(Σ_n B_n) = ∪_n A_n [identifying (Σ_n E_n) × F with Σ_n (E_n × F)]. It then suffices to show that Σ_n B_n is an element of (ℰ × ℱ)_{σδ}. But this set is equal to ∩_m Σ_n B_{nm}, and Σ_n B_{nm} evidently belongs to (ℰ × ℱ)_σ. Thus the closure under (∪c) is established.
T9 THEOREM (a) Let (E,ℰ) and (F,ℱ) be two paved sets; we then have 𝒜(ℰ) × 𝒜(ℱ) ⊂ 𝒜(ℰ × ℱ).
(b) Assume that the paving ℰ is semicompact; let A′ belong to 𝒜(ℰ × ℱ). The projection A of A′ on F then belongs to 𝒜(ℱ).

Proof Let A × B belong to 𝒜(ℰ) × 𝒜(ℱ); D7 implies that A is contained in some A₁ ∈ ℰ_σ, B in some B₁ ∈ ℱ_σ. We obviously have 𝒜(ℰ) × ℱ ⊂ 𝒜(ℰ × ℱ); therefore, A × B₁ ∈ 𝒜(ℰ × ℱ) from T8. The same is true for A₁ × B, and thus for A × B = (A × B₁) ∩ (A₁ × B). Let us now prove (b): Since A′ belongs to 𝒜(ℰ × ℱ), one may find a semicompact paved set (G,𝒢) and a set A″ ⊂ G × (E × F), belonging to (𝒢 × (ℰ × ℱ))_{σδ}, such that A′ is the projection of A″ on E × F. Now observe that the paving 𝒢 × ℰ is semicompact, and that A″ can be considered as a ((𝒢 × ℰ) × ℱ)_{σδ} subset of (G × E) × F whose projection on F is A.
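T10 THEOREM We have $\mathcal{A}(\mathcal{A}(\mathcal{F})) = \mathcal{A}(\mathcal{F})$.
Proof Let $A$ be an $\mathcal{A}(\mathcal{F})$-analytic set. There exists a set $E$, with a semicompact paving $\mathcal{E}$, and a set $A' \in (\mathcal{E} \times \mathcal{A}(\mathcal{F}))_{\sigma\delta}$ such that $A$ is the projection of $A'$ on $F$. Now we have $\mathcal{E} \times \mathcal{A}(\mathcal{F}) \subset \mathcal{A}(\mathcal{E}) \times \mathcal{A}(\mathcal{F}) \subset \mathcal{A}(\mathcal{E} \times \mathcal{F})$ (T9(a)), and therefore $A'$ belongs to $\mathcal{A}(\mathcal{E} \times \mathcal{F})$ (T8). The conclusion $A \in \mathcal{A}(\mathcal{F})$ now follows from T9(b).
* These sets in fact are the same as the $\mathcal{F}$-Suslin sets, i.e., as the sets obtained by applying to elements of $\mathcal{F}$ the operation (A) of Suslin. This result can be easily proved by the method of Choquet (26); see also Sion (110, 111). Our definition is easier to use in the setup of capacity theory and stochastic processes.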
Compact Pavings. Analytic Sets
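T11 THEOREM Let $(F,\mathcal{F})$ and $(G,\mathcal{G})$ be two paved sets, and $f$ a mapping of $F$ into $G$ such that $f^{-1}(\mathcal{G}) \subset \mathcal{A}(\mathcal{F})$. Then we also have $f^{-1}(\mathcal{A}(\mathcal{G})) \subset \mathcal{A}(\mathcal{F})$.
Proof Let $A$ be an element of $\mathcal{A}(\mathcal{G})$, and let $(E,\mathcal{E})$ be a semicompact paved set, such that there exists $B \in (\mathcal{E} \times \mathcal{G})_{\sigma\delta}$ whose projection on $G$ is $A$. Denote by $h$ the mapping $(x,y) \mapsto (x,f(y))$ of $E \times F$ into $E \times G$. The set $C = h^{-1}(B)$ obviously belongs to $(\mathcal{E} \times \mathcal{A}(\mathcal{F}))_{\sigma\delta} \subset (\mathcal{A}(\mathcal{E} \times \mathcal{F}))_{\sigma\delta} \subset \mathcal{A}(\mathcal{E} \times \mathcal{F})$ (T9 and T8); $f^{-1}(A)$ is equal to the projection of $C$ on $F$, and therefore is $\mathcal{F}$-analytic (T9).
T12 THEOREM $\mathcal{A}(\mathcal{F})$ contains the $\sigma$-field $\mathcal{T}(\mathcal{F})$ generated by $\mathcal{F}$ if and only if the complement of every element of $\mathcal{F}$ is $\mathcal{F}$-analytic.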
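Proof The condition is clearly necessary. To show that it also is sufficient, consider the collection $\mathcal{T}$ of all sets $B \subset F$ such that $B$ and $\complement B$ belong to $\mathcal{A}(\mathcal{F})$; $\mathcal{T}$ is a $\sigma$-field contained in $\mathcal{A}(\mathcal{F})$, and the condition implies $\mathcal{F} \subset \mathcal{T}$. We thus also have $\mathcal{T}(\mathcal{F}) \subset \mathcal{T} \subset \mathcal{A}(\mathcal{F})$.
We have established the necessary theorems for proving and using Choquet's theorem on capacities. The reader can therefore omit the end of this section without inconvenience. We begin with a result concerning direct images of analytic sets.
T13 THEOREM Let $F$ be a separable metric space, and $\mathcal{F} = \mathcal{B}(F)$ be its Borel $\sigma$-field.
(a) Let $E$ be a compact metric space, $\mathcal{E} = \mathcal{B}(E)$ be its Borel $\sigma$-field, and $f$ be a measurable function from $(E,\mathcal{E})$ into $(F,\mathcal{F})$. For every $\mathcal{E}$-analytic set $A$ in $E$, the image $f(A)$ then is $\mathcal{F}$-analytic in $F$.
(b) The statement remains true if the hypothesis on $(E,\mathcal{E})$ is replaced by the following: $(E,\mathcal{E})$ is a measurable space; there exists a compact metric space $E'$ and a measurable function $\varphi$ from $(E',\mathcal{B}(E'))$ to $(E,\mathcal{E})$, which maps $E'$ onto $E$. (13.1)
(c) Let $E$ be a Polish space*; the measurable space $(E,\mathcal{B}(E))$ then possesses the property (13.1).
Proof (a) Let $\mathcal{K}$ be the paving of all compact subsets of $E$; $\mathcal{K}$ generates the Borel $\sigma$-field $\mathcal{E}$, and the complement of every element of $\mathcal{K}$ belongs to $\mathcal{K}_{\sigma}$. We then have $\mathcal{K} \subset \mathcal{E} \subset \mathcal{A}(\mathcal{K})$ (T12) and therefore $\mathcal{A}(\mathcal{K}) = \mathcal{A}(\mathcal{E})$ (T10). Let $G$ be the graph of $f$, and $g$ be the mapping $(x,y) \mapsto (f(x),y)$ of $E \times F$ into $F \times F$; $G$ is the inverse image by $g$ of the diagonal of $F \times F$, which belongs to $\mathcal{B}(F \times F)$, i.e., to the product $\sigma$-field $\mathcal{T}(\mathcal{F} \times \mathcal{F})$ (II.31), and finally to $\mathcal{A}(\mathcal{F} \times \mathcal{F})$ (T12). On the other hand, we have $g^{-1}(\mathcal{F} \times \mathcal{F}) \subset \mathcal{E} \times \mathcal{F} \subset \mathcal{A}(\mathcal{K}) \times \mathcal{F} \subset \mathcal{A}(\mathcal{K} \times \mathcal{F})$, and therefore $G \in \mathcal{A}(\mathcal{K} \times \mathcal{F})$ according to T11. Let $A$ belong to $\mathcal{A}(\mathcal{E}) = \mathcal{A}(\mathcal{K})$; $A \times F$ belongs to $\mathcal{A}(\mathcal{K} \times \mathcal{F})$, and the same holds for $(A \times F) \cap G$. We now observe that the projection of this set on $F$ is $f(A)$, and apply T9.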
(b) Let $\varphi$ be a measurable mapping of $E'$ onto $E$, and let $A$ be $\mathcal{E}$-analytic in $E$; $\varphi^{-1}(A) = A'$ then is $\mathcal{B}(E')$-analytic in $E'$ (T11); we have $f(A) = (f \circ \varphi)(A')$, and (a) applied to $f \circ \varphi$ shows that $f(A)$ is $\mathcal{F}$-analytic.
(c) Let $E$ be a Polish space, $\overline{\mathbb{N}}$ be the one-point compactification of the discrete space $\mathbb{N}$, and $E'$ be the compact metrizable space $\overline{\mathbb{N}}^{\mathbb{N}}$. We construct a Borel mapping $\varphi$ from $E'$ onto $E$. It will be sufficient to find a Borel subset $V$ of $E'$, and a continuous mapping $f$ from $V$ onto $E$, since we may then set
$\varphi(x) = f(x)$ if $x$ belongs to $V$,
$\varphi(x) = x_0$ if $x$ belongs to $\complement V$,
* According to Bourbaki (14), a topological space E is a Polish space if it is separable, and can be metrized in such a way that it becomes complete.
Complements to Measure Theory
$x_0$ denoting some point in $E$. We now provide $E$ with a distance compatible with its topology, under which $E$ is complete, and choose for each $n \in \mathbb{N}$ a countable covering $(A_n^m)_{m \in \mathbb{N}}$ of $E$ by closed sets, the diameter of which does not exceed $2^{-n}$. For any finite sequence of integers $s = (s(0), s(1), \dots, s(n))$, we set
and for every infinite sequence (J E NN A(F =
n As
* (14) Section 6, Prop. 14 (p. 138).
† The result is obvious if $I(A) = -\infty$ [(18.3) holds with $B = \emptyset$].
We prove first the existence of a sequence $(B_n)_{n \geq 1}$ of elements of $\mathcal{F}$ such that $B_n \subset A_n$ and $I(C_n) > a$, where $C_n = A \cap B_1 \cap B_2 \cap \dots \cap B_n$. Let us construct $B_1$. We have from (18.1) that
$$I(A) = I(A \cap A_1) = \sup_m I(A \cap A_{1m}).$$
It suffices to take $B_1 = A_{1m}$, where $m$ is chosen large enough so that $I(A \cap A_{1m}) > a$. Suppose then the construction is done up to the $(n-1)$st step. We have by hypothesis $C_{n-1} \subset A$, $I(C_{n-1}) > a$. Consequently,
$$I(C_{n-1}) = I(C_{n-1} \cap A_n) = \sup_m I(C_{n-1} \cap A_{nm}).$$
One then takes for $B_n$ a set $A_{nm}$, where $m$ is large enough so that $I(C_{n-1} \cap A_{nm}) = I(C_n) > a$. The sequence $(B_n)$ having been constructed, put $B'_n = B_1 \cap B_2 \cap \dots \cap B_n$ and $B = \bigcap_n B_n = \bigcap_n B'_n$. The sets $B'_n$ belong to $\mathcal{F}$ and decrease, and we have $C_n \subset B'_n$; hence $I(B'_n) > a$, and $I(B) \geq a$ from (18.2). We have $B_n \subset A_n$, and hence $B \subset A$. The set $B$ therefore satisfies the given conditions, and the lemma is established.
Now let $A$ be an $\mathcal{F}$-analytic set. There exists an auxiliary set $E$, with a semicompact paving $\mathcal{E}$, and an element $B$ of $(\mathcal{E} \times \mathcal{F})_{\sigma\delta}$ such that the projection of $B$ on $F$ is equal to $A$. Denote by $\pi$ the projection of $E \times F$ on $F$, and by $\mathcal{G}$ the paving consisting of finite unions of elements of $\mathcal{E} \times \mathcal{F}$. We have
LEMMA 2 The set function $J$ defined, for every $H \subset E \times F$, by
$$J(H) = I(\pi(H))$$
is a $\mathcal{G}$-capacity on $E \times F$.
Proof The function $J$ is evidently increasing, and satisfies (18.1). Property (18.2) follows immediately from the relation
$$\bigcap_n \pi(B_n) = \pi\Big(\bigcap_n B_n\Big),$$
which holds, by virtue of T6 and T4, for every decreasing sequence $(B_n)_{n \in \mathbb{N}}$ of elements of $\mathcal{G}$.
We can then conclude the proof. The set $B$ being capacitable for $J$, there exists an element $D$ of $\mathcal{G}_{\delta}$ such that $D \subset B$, and $J(D) > J(B) - \varepsilon$ ($\varepsilon > 0$). Let $C$ be the set $\pi(D)$; the above equality shows that $C$ is an element of $\mathcal{F}_{\delta}$, and that $C \subset A$ and $I(C) > I(A) - \varepsilon$.
Construction of capacities
The hypotheses of Choquet's theorem are very general but sometimes difficult to verify. One rarely encounters set functions defined at once for all the subsets of a set $F$; it is more natural to consider functions defined on a paving, and to try extending them to the whole of $\mathfrak{P}(F)$ as a Choquet capacity. We are going to describe, still following Choquet, such an extension process for "strongly subadditive" set functions. We limit ourselves to the case where these functions are positive, but this restriction is not at all essential.
D20 DEFINITION Let $\mathcal{F}$ be a paving on a set $F$, closed under the operations $(\cup f, \cap f)$. Let $I$ be a set function defined on $\mathcal{F}$, positive and increasing. We say that $I$ is strongly subadditive if, for every pair $(A,B)$ of elements of $\mathcal{F}$,
$$I(A \cup B) + I(A \cap B) \leq I(A) + I(B). \tag{20.1}$$
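Inequality (20.1) can be checked exhaustively on a small finite example. The sketch below is only an illustration: the ground set, the two set functions `concave_card` and `convex_card`, and the helper names are our own choices, and the paving is taken to be the collection of all subsets of a four-point set (which is trivially closed under finite unions and intersections).

```python
from itertools import combinations
from math import sqrt

def subsets(ground):
    """Yield every subset of `ground` as a frozenset."""
    items = list(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def is_strongly_subadditive(I, ground, tol=1e-12):
    """Check I(A | B) + I(A & B) <= I(A) + I(B) for all pairs of subsets."""
    family = list(subsets(ground))
    return all(
        I(A | B) + I(A & B) <= I(A) + I(B) + tol
        for A in family
        for B in family
    )

def concave_card(A):
    """Positive, increasing, concave function of the cardinality."""
    return sqrt(len(A))

def convex_card(A):
    """Positive, increasing, convex function of the cardinality."""
    return len(A) ** 2

ground = {0, 1, 2, 3}
print(is_strongly_subadditive(concave_card, ground))
print(is_strongly_subadditive(convex_card, ground))
```

The second function fails already for two disjoint singletons, since then $I(A \cup B) = 4$ while $I(A) + I(B) = 2$.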
Capacities If the sign
" ke}, and set '" kelu I I = k7N
k
\u k+l
Prove that $f^{\varepsilon}$ is l.s.c. ($\{f^{\varepsilon} > k\varepsilon\} = U_{k+1}$ is open). Let $(f_i)_{i \in I}$ be a family of non-negative l.s.c. functions, which is filtering to the right: Then $E[f] = \sup_{i \in I} E[f_i]$ (first reduce the problem to the case of bounded functions, and then, by considering the functions $f_i^{\varepsilon}$, to the case of l.s.c. functions taking only the values $0, \varepsilon, 2\varepsilon, \dots, N\varepsilon$. If $g$ is such a function,
$$E[g] = \varepsilon \sum_{k=0}^{N-1} P\{g > k\varepsilon\}.$$
Conclude by applying (2)).
CHAPTER IV
Stochastic Processes
1. General Properties of Processes
1 Notation We denote by $T$ an index set, which we call the time set, although the interpretation of the parameter $t \in T$ as a time (we often speak of "the instant $t$") requires that $T$ at least be given the structure of an ordered set. Most of the time, $T$ will be an interval either of the extended line $\overline{\mathbb{R}}$ ("continuous case"), or of the set of integers $\mathbb{Z}$, possibly compactified by the addition of the points $+\infty$, $-\infty$ ("discrete case"). We denote by $(E,\mathcal{E})$ a measurable space, which we call the state space. Generally, $E$ will be a compact metrizable space, or be easily imbedded in such a space.
Definition of stochastic processes
D2 DEFINITION A stochastic process [with time set $T$ and state space $(E,\mathcal{E})$] is a system $(\Omega,\mathcal{F},P,(X_t)_{t \in T})$ consisting of a probability space $(\Omega,\mathcal{F},P)$ and a family $(X_t)_{t \in T}$ of random variables defined on $(\Omega,\mathcal{F})$ with values in $(E,\mathcal{E})$. The random variable $X_t$ is called the state of the process at time $t$; the function $t \mapsto X_t(\omega)$ from $T$ into $E$ is called the path of $\omega$; the measurable space $(\Omega,\mathcal{F})$ is called the base space of the process.* We usually omit the word "stochastic," and simplify our language further by speaking, when it causes no ambiguity, of the "process $(X_t)_{t \in T}$" or even of the "process $(X_t)$".
Equivalent processes
D3 DEFINITION Consider two stochastic processes having the same time set $T$ and the same state space $(E,\mathcal{E})$: $(\Omega,\mathcal{F},P,(X_t)_{t \in T})$ and $(\Omega',\mathcal{F}',P',(X'_t)_{t \in T})$. We say that the processes $(X_t)$ and $(X'_t)$ are equivalent if
$$P\{X_{t_1} \in A_1, X_{t_2} \in A_2, \dots, X_{t_n} \in A_n\} = P'\{X'_{t_1} \in A_1, X'_{t_2} \in A_2, \dots, X'_{t_n} \in A_n\} \tag{3.1}$$
for every finite system of times $t_1, t_2, \dots, t_n$ and elements $A_1, A_2, \dots, A_n$ of $\mathcal{E}$.
* Added in proof. There is a tendency now to distinguish between a stochastic process (D2) and a random function, the difference being that in the latter case no probability law is given on the base space $(\Omega,\mathcal{F})$. This terminology certainly is very convenient.
4 The importance of the notion of equivalence follows from the following considerations: A stochastic process is a mathematical representation of a natural phenomenon whose evolution is governed by chance. Suppose that we have observed a very large number of independent realizations of this phenomenon. We know then with arbitrary precision (thanks to the laws of large numbers) the expression that figures in formula (3.1) for an arbitrarily large number of times $t_1, t_2, \dots, t_n$, but observation can help us no further. In other words, "Nature can give us stochastic processes only up to an equivalence." The probabilist is thus free to choose, from a class of equivalent processes, those he desires to work with. We shall see later how useful this freedom of choice can be.
Here is a notion similar to that of equivalence, but more restrictive, which we use often at the end of this chapter:
D5 DEFINITION Let $(X_t)_{t \in T}$ and $(Y_t)_{t \in T}$ be two stochastic processes defined on the same probability space $(\Omega,\mathcal{F},P)$, with values in the same state space $(E,\mathcal{E})$. The process $(Y_t)_{t \in T}$ is a modification of the process $(X_t)_{t \in T}$ if $Y_t = X_t$ a.s. for every $t \in T$.
First canonical process
6 Consider a stochastic process $(\Omega,\mathcal{F},P,(X_t)_{t \in T})$. Denote by $f$ the mapping of $\Omega$ into $E^T$, which associates with each $\omega \in \Omega$ the point $(X_t(\omega))_{t \in T}$ of $E^T$, i.e., the path of $\omega$. The mapping $f$ is measurable when $E^T$ is given the product $\sigma$-field $\mathcal{E}^T$ (see No. I.12). One can thus consider the image law $f(P)$ on the space $(E^T,\mathcal{E}^T)$. Denote by $Y_t$ the coordinate mapping of index $t$ on $E^T$. The processes $(\Omega,\mathcal{F},P,(X_t)_{t \in T})$ and $(E^T,\mathcal{E}^T,f(P),(Y_t)_{t \in T})$ are then equivalent (by the very definition of image laws), and we can make the definition:
D7 DEFINITION With the same notation as in No. 6, the process $(E^T,\mathcal{E}^T,f(P),(Y_t)_{t \in T})$ is called the first canonical process associated with (or equivalent to) the process $(X_t)$.
Two processes $(X_t)$ and $(X'_t)$ are equivalent if and only if they are associated with the same canonical process. This canonical process is almost never used directly when the time set $T$ is uncountable: The $\sigma$-field $\mathcal{E}^T$ contains, in fact, only those events which depend on at most a countable infinity of the variables $Y_t$, while the more interesting properties of processes (continuity of paths, for example) depend on all the random variables. The first canonical process is mainly used as a stage in the construction of more elaborate processes, as we shall see later.
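The passage from a process to its first canonical process can be made concrete when everything is finite. The following sketch is an illustration under assumptions of our own: a six-point base space with the uniform law, two instants, two states, and variables $X_t(\omega)$ given by binary digits of $\omega$. It computes the image law $f(P)$ on $E^T$ and checks that both processes assign the same value to every expression of the form (3.1).

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

# Hypothetical toy data: base space {0,...,5} with the uniform law.
P = {omega: Fraction(1, 6) for omega in range(6)}
T = (0, 1)          # two instants
E = (0, 1)          # two states

def X(t, omega):
    """X_t(omega): the t-th binary digit of omega."""
    return (omega >> t) & 1

def path(omega):
    """The mapping f: omega -> (X_t(omega))_{t in T}, a point of E^T."""
    return tuple(X(t, omega) for t in T)

def image_law():
    """The image law f(P) on the finite product space E^T."""
    law = defaultdict(Fraction)
    for omega, p in P.items():
        law[path(omega)] += p
    return dict(law)

def prob_original(A):
    """P{X_t in A[t] for every t in T}, computed on the base space."""
    return sum((p for omega, p in P.items()
                if all(X(t, omega) in A[t] for t in T)), Fraction(0))

def prob_canonical(law, A):
    """f(P){Y_t in A[t] for every t in T}, computed on E^T."""
    return sum((p for y, p in law.items()
                if all(y[i] in A[t] for i, t in enumerate(T))), Fraction(0))

def all_subsets(states):
    return [frozenset(c) for r in range(len(states) + 1)
            for c in combinations(states, r)]

law = image_law()
same = all(
    prob_original({0: A0, 1: A1}) == prob_canonical(law, {0: A0, 1: A1})
    for A0 in all_subsets(E) for A1 in all_subsets(E)
)
print(law)
print(same)
```

Several points of the base space share the same path here, so the canonical space is genuinely smaller than the base space while carrying the same finite-dimensional laws.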
Construction of processes
8 Recall the situation envisaged in No. 4: We have observed a certain "random phenomenon" in the course of its development, and we desire to represent it by means of a process. It is natural to begin by constructing the simplest of the processes in the equivalence class, i.e., the first canonical process of this class. We thus use the measurable space $(E^T,\mathcal{E}^T)$, and the coordinate mappings $(Y_t)_{t \in T}$. It remains to construct a probability law $P$ on this measurable space, such that
$$P\{Y_{t_1} \in A_1, \dots, Y_{t_n} \in A_n\} = \Phi(t_1, \dots, t_n; A_1, \dots, A_n)$$
for every finite subset $u = \{t_1, \dots, t_n\}$ of $T$, and every finite collection $A_1, \dots, A_n$ of measurable subsets of $E$, where the functions $\Phi$ are given by observation of the phenomenon. The construction is possible if and only if the set function
$$A_1 \times A_2 \times \dots \times A_n \mapsto \Phi(t_1, t_2, \dots, t_n; A_1, A_2, \dots, A_n)$$
can be extended to a probability law $P_u$ on $(E^u,\mathcal{E}^u)$, a probability law uniquely determined by the function $\Phi$ (from Theorem I.T21, applied to the collection of finite unions of subsets of $E^u$ of the form $A_1 \times A_2 \times \dots \times A_n$). It is necessary on the other hand that
$$\pi_{uv}(P_v) = P_u \tag{8.1}$$
for every pair of finite subsets $u$, $v$ of $T$ such that $u \subset v$, $\pi_{uv}$ denoting the projection of $E^v$ on $E^u$. We recognize here the definition of a projective system of probability laws (III.30), and see that the possibility of the construction of the law $P$ is equivalent to the existence of a projective limit for the projective system $(P_u)$. Such a projective limit does not necessarily exist; Theorem III.T31 gives us a simple sufficient condition for its existence. In the most important case for applications, where $E$ is a compact space, a very simple proof of the existence of the law $P$ can be given independently of the results of Chapter III.
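The compatibility condition (8.1), read here as $\pi_{uv}(P_v) = P_u$ for $u \subset v$, is easy to test mechanically when $T$ and $E$ are finite. The sketch below is only an illustration: the one-point law `p`, the three-point time set, and the function names are assumptions of ours. Each $P_u$ is defined directly as a product law, and the check then confirms that the family is projective; altering a single $P_u$ breaks the condition.

```python
from fractions import Fraction
from itertools import combinations, product

E = (0, 1)
T = (0, 1, 2)
p = {0: Fraction(1, 3), 1: Fraction(2, 3)}   # hypothetical one-point law

def product_law(u):
    """P_u for independent coordinates: P_u(x) = prod_i p(x_i)."""
    law = {}
    for x in product(E, repeat=len(u)):
        prob = Fraction(1)
        for xi in x:
            prob *= p[xi]
        law[x] = prob
    return law

def project(law_v, v, u):
    """Image of P_v under the projection pi_uv of E^v on E^u (u subset of v)."""
    keep = [v.index(t) for t in u]
    out = {}
    for x, prob in law_v.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, Fraction(0)) + prob
    return out

def is_projective(laws):
    """Check pi_uv(P_v) == P_u for every pair u subset of v in `laws`."""
    return all(
        project(laws[v], v, u) == laws[u]
        for u in laws for v in laws
        if set(u) <= set(v)
    )

finite_subsets = [u for r in range(1, len(T) + 1) for u in combinations(T, r)]
laws = {u: product_law(u) for u in finite_subsets}
print(is_projective(laws))

# Replace P_{0} by a law that is not the marginal of the others.
broken = dict(laws)
broken[(0,)] = {(0,): Fraction(1, 2), (1,): Fraction(1, 2)}
print(is_projective(broken))
```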
T9 THEOREM Let $E$ be a compact space, and let $\mathcal{E}$ be the Baire $\sigma$-field of $E$. For each finite subset $u$ of $T$, let $P_u$ be a probability law on $(E^u,\mathcal{E}^u)$. If the laws $P_u$ satisfy condition (8.1), there exists a probability law $P$ on $(E^T,\mathcal{E}^T)$ such that
$$\pi_u(P) = P_u \tag{9.1}$$
for every finite subset $u$ of $T$, $\pi_u$ denoting the projection of $E^T$ onto $E^u$. Such a law is unique.
Proof Let $\mathcal{C}_f(E^T)$ be the vector space consisting of the continuous functions defined on the compact space $E^T$, which depend only on a finite number of variables. We know from the Stone-Weierstrass theorem that $\mathcal{C}_f(E^T)$ is dense in $\mathcal{C}(E^T)$. Let $g$ be an element of $\mathcal{C}_f(E^T)$; there exists a finite subset $u$ of $T$ and a continuous function $g_u$ defined on $E^u$ such that $g = g_u \circ \pi_u$. It follows from relation (8.1) that the integral $\int_{E^u} g_u \, dP_u$ depends only on $g$, and not on the representation chosen for $g$. If we associate this integral with $g \in \mathcal{C}_f(E^T)$, we define a positive linear form of norm 1 on $\mathcal{C}_f(E^T)$, which extends by continuity to a positive linear form of norm 1 on $\mathcal{C}(E^T)$, i.e., to a Radon law $P$ on $E^T$. We have, for every finite subset $u$ of $T$ and every continuous function $g_u$ on $E^u$,
$$\int_{E^T} g_u \circ \pi_u \, dP = \int_{E^u} g_u \, dP_u$$
by the very definition of $P$. Relation (9.1) follows then from I.T20, and the same theorem yields the uniqueness of the law $P$.
Second canonical process
10 Let $E$ be a compact space, with the Baire $\sigma$-field $\mathcal{E} = \mathcal{B}_0(E)$, and let $(X_t)_{t \in T}$ be a process with values in $(E,\mathcal{E})$. Let $(E^T,\mathcal{E}^T,P,(Y_t)_{t \in T})$ be the first canonical process associated with the process $(X_t)$. Since the law $P$ is a Radon law on the compact space $E^T$, it is possible to extend it to a law $\hat{P}$ on the Borel $\sigma$-field $\mathcal{B}(E^T)$, which satisfies the condition of Theorem II.T35. We thus define a new process equivalent to the process $(X_t)$:
$$(E^T, \mathcal{B}(E^T), \hat{P}, (Y_t)_{t \in T}),$$
which we call the second canonical process associated with $(X_t)$. It should be remarked that the random variables $(Y_t)$ are now measurable when $E$ is given the Borel $\sigma$-field. Although the $\sigma$-field $\mathcal{B}(E^T)$ is incomparably richer than the $\sigma$-field $\mathcal{E}^T = \mathcal{B}_0(E^T)$, the second canonical process is not really satisfactory. It will rather serve us as a tool for the construction of "nice" processes, i.e., processes whose paths are as regular as possible.
2. Separable Processes
The notion of separability appears extremely important to us. We point out, however, that it will be possible to avoid it in the rest of this book. The only indispensable numbers of this section are 20-22. All the processes envisaged in this section, with the exception of those in the appendix, have as time set an interval $T$ of $\mathbb{R}$, and take their values in a compact metrizable space $E$. We begin with an example (due to Doob) showing that the same event can have different probabilities for equivalent processes.
11 Consider the two processes $(X_t)$ and $(Y_t)$ defined as follows with state space $\mathbb{R}$, time set the interval $[0,1]$, and base space $(\Omega,\mathcal{F},P)$ the interval $[0,1]$ with Lebesgue measure. We set for every $\omega \in [0,1]$
$$X_t(\omega) = 0, \qquad Y_t(\omega) = \begin{cases} 0 & \text{for } t \neq \omega, \\ 1 & \text{for } t = \omega. \end{cases}$$
These two processes are equivalent (we have $X_t = Y_t$ a.s. for each $t$). The set of $\omega$ with continuous paths is measurable for the two processes. It has probability 1 for the first, probability 0 for the second.
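Doob's example can be explored numerically. The sketch below is an illustration only: the sample size, the seed, the fixed instants, and the helper names are assumptions of ours, and floating-point draws merely stand in for Lebesgue measure on $[0,1]$. It shows that the two processes agree at any fixed finite set of instants for the sampled $\omega$, while every path of the second process takes the value 1 at the instant $t = \omega$.

```python
import random

def X(t, omega):
    """First process: identically zero."""
    return 0.0

def Y(t, omega):
    """Second process: 1 at the single instant t = omega, 0 elsewhere."""
    return 1.0 if t == omega else 0.0

def agree_on(times, omega):
    """True if X_t(omega) = Y_t(omega) for every t in the finite set `times`."""
    return all(X(t, omega) == Y(t, omega) for t in times)

def path_sup(process, omega, grid):
    """Supremum of the path over `grid` together with the instant t = omega."""
    return max(process(t, omega) for t in list(grid) + [omega])

rng = random.Random(0)
times = [0.0, 0.25, 0.5, 0.75, 1.0]          # a fixed finite set of instants
grid = [k / 100 for k in range(101)]
samples = [rng.random() for _ in range(10_000)]

frac_agree = sum(agree_on(times, w) for w in samples) / len(samples)
frac_flat_X = sum(path_sup(X, w, grid) == 0.0 for w in samples) / len(samples)
frac_flat_Y = sum(path_sup(Y, w, grid) == 0.0 for w in samples) / len(samples)
print(frac_agree, frac_flat_X, frac_flat_Y)
```

Observation at finitely many fixed instants therefore cannot distinguish the two processes, although no path of the second one is continuous.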
Definition of separable processes
12 Let $(X_t)_{t \in T}$ be a stochastic process with values in $E$, defined on the space $(\Omega,\mathcal{F},P)$. Let $I$ be a subset of $T$, $K$ a compact set in $E$. We denote by $V(I,K)$ the set of $\omega$ whose paths "remain in $K$ for the times in $I$," namely,
$$V(I,K) = \{\omega : \text{for every } t \in I,\ X_t(\omega) \in K\}.$$
Suppose that $I$ is a countable set. It is then clear that $V(I,K) \in \mathcal{F}$, and that
$$P[V(I,K)] = \inf_{u \text{ finite},\ u \subset I} P[V(u,K)]. \tag{12.1}$$
D13 DEFINITION Let $(\Omega,\mathcal{F},P,(X_t)_{t \in T})$ be a process with values in $E$. We say that the process is separable (relative to the collection of compact subsets of $E$) if for every open interval $I \subset T$ and every compact set $K \subset E$:
(a) $V(I,K) \in \mathcal{F}$ (13.1)
(b) $P[V(I,K)] = \inf_{u} P[V(u,K)]$, where $u$ runs over the collection of finite subsets of $I$. (13.2)
14 Remarks (a) Properties (13.1) and (13.2) are then verified in the obvious way for any interval $I \subset T$, open or not.
(b) Let $f$ be a continuous function defined on $E$, with values in a compact space $F$. If the process $(X_t)$ is separable, then so is the process $(f \circ X_t)$ with values in $F$.
(c) A process $(X_t)$ whose paths are right (or left) continuous is separable. Indeed, under this hypothesis we have
$$V(I,K) = V(I \cap \mathbb{Q}, K) \quad \text{for every open interval } I,$$
where $\mathbb{Q}$ denotes the set of rational numbers. Property (13.1) is thus verified, and we can easily verify the equality (13.2).
Example of a separable process
15 Let $(X_t)$ be a process with values in the compact metrizable space $E$. Consider the second canonical process associated with $(X_t)$ (see No. 10),
$$(E^T, \mathcal{B}(E^T), \hat{P}, (Y_t)_{t \in T}).$$
Denote by $I$ an open interval of $T$, by $U_I$ the collection of finite subsets of $I$, and by $K$ a compact subset of $E$. The sets $V(u,K)$, $V(I,K)$ are compact in $E^T$, and hence measurable. Since $V(I,K)$ is the intersection of the family $(V(u,K))_{u \in U_I}$ of compact subsets, which is filtering to the left, and the law $\hat{P}$ is a Radon law, it follows from II.T35 that
$$\hat{P}[V(I,K)] = \inf_{u \in U_I} \hat{P}[V(u,K)].$$
The second canonical process is thus separable. It should be noted that the metrizability of $E$ plays no role, and that the hypothesis according to which $I$ is an interval has not been used (see Nos. 26-29). This example shows that every process with values in $E$ is equivalent to a separable process. This is a weak form of a result due to Doob, which we shall prove later, according to which every process admits a separable modification (T19); this last result actually uses the metrizability of $E$.
Universal separating sets
D16 DEFINITION With the same notation as in D13, a countable subset $S \subset T$ is said to be a separating set for the pair $(I,K)$ if
$$P[V(S \cap I, K)] = \inf_{u \text{ finite},\ u \subset I} P[V(u,K)]. \tag{16.1}$$
A countable set $S \subset T$, dense in $T$, is said to be a universal separating set if $S$ is a separating set for every pair $(I,K)$.
It is easy to construct a set $S$ separating for a given pair $(I,K)$. It suffices to choose finite subsets $u_n$ of $I$ such that for each $n \in \mathbb{N}$
$$P[V(u_n,K)] < \inf_{u \text{ finite},\ u \subset I} P[V(u,K)] + \frac{1}{n},$$
and to put $S = \bigcup_n u_n$.
The existence of a universal separating set $S$ for a given process $(X_t)$ by no means implies the separability of the process: $S$ is still a universal separating set for every version of $(X_t)$, separable or not.
Example Let $(X_t)$ be a process with right-continuous paths. Every countable dense set in $T$ is then a universal separating set.
T17 THEOREM There exists a universal separating set for every stochastic process with values in $E$ (compact metrizable).
Proof We keep the notation of the preceding numbers. Let $\mathcal{K}$ be a countable collection of compact subsets of $E$ such that the complements of the elements of $\mathcal{K}$ constitute a base for the topology of $E$. Let $\mathcal{J}$ be the collection of open intervals in $T$ with rational endpoints. For each pair $(I,K)$ ($I \in \mathcal{J}$, $K \in \mathcal{K}$), denote by $S_{I,K}$ a separating set for $I$ and $K$. Let $S$ be a countable dense set in $T$, containing the union of the sets $S_{I,K}$ ($I \in \mathcal{J}$, $K \in \mathcal{K}$). We will show that $S$ is a universal separating set. Indeed, let $J$ be an open interval of $T$, and let $L$ be a compact subset of $E$. $J$ is the union of an increasing sequence $(I_m)_{m \in \mathbb{N}}$ of elements of $\mathcal{J}$, $L$ the intersection of a decreasing sequence $(K_n)_{n \in \mathbb{N}}$ of elements of $\mathcal{K}$. It suffices to show that, for every finite subset $u$ of $J$,
$$P[V(S \cap J, L)] \leq P[V(u,L)].$$
But we have $u \subset I_m$ for large enough $m$ and it follows that
$$P[V(S \cap J, L)] = P\Big[\bigcap_n V(S \cap J, K_n)\Big] = \inf_n P[V(S \cap J, K_n)] \leq \inf_n P[V(S \cap I_m, K_n)] \leq \inf_n P[V(u,K_n)] = P[V(u,L)].$$
The following theorem allows us to study the regularity of the paths of a separable process. We denote by $\overline{X}_W(\omega)$ (where $W$ is any subset of $T$) the closure of the image of $W$ under the mapping $t \mapsto X_t(\omega)$; $\overline{X}_W(\omega)$ is also the intersection of the sets $K \in \mathcal{K}$ such that $\omega \in V(W,K)$.
T18 THEOREM Let $(\Omega,\mathcal{F},P,(X_t)_{t \in T})$ be a process with values in $E$, and let $S$ be a universal separating set for this process.
(a) Suppose that the process $(X_t)$ is separable. There then exists a $P$-negligible set $A \in \mathcal{F}$, such that for every $\omega \in \Omega \setminus A$
$$\overline{X}_J(\omega) = \overline{X}_{S \cap J}(\omega) \tag{18.1}$$
whatever be the open interval $J \subset T$.
(b) Conversely, if the law $P$ is complete, condition (18.1) implies that the process $(X_t)$ is separable.
We shall prove simultaneously Theorem 18 and the following theorem:
T19 THEOREM If the law $P$ is complete, the process $(X_t)$ admits a separable modification.
Proof The symbols $\mathcal{J}$ and $\mathcal{K}$ have the same significance as in the proof of T17. For each pair $(I,K)$ ($I \in \mathcal{J}$, $K \in \mathcal{K}$), put
$$A(I,K) = V(S \cap I, K) \setminus V(I,K)$$
and let $A$ be the union of the sets $A(I,K)$. Suppose that the process is separable and admits $S$ as a universal separating set. The set $A$ is then $P$-negligible. Let $J$ be an open interval, $(I_m)$ an increasing sequence of elements of $\mathcal{J}$ such that $J = \bigcup_m I_m$, and $\omega$ an element of $\Omega$ which does not belong to $A$. We verify relation (18.1): The set $\overline{X}_J(\omega)$ [respectively, $\overline{X}_{S \cap J}(\omega)$] is the closure of the union of the sets $\overline{X}_{I_m}(\omega)$ [respectively, $\overline{X}_{S \cap I_m}(\omega)$]; it thus suffices to verify that $\overline{X}_{I_m}(\omega) = \overline{X}_{S \cap I_m}(\omega)$ for $\omega \notin A$. But the first (respectively, the second) member is the intersection of the elements $K$ of $\mathcal{K}$ which contain it, i.e., which are such that $\omega \in V(I_m,K)$ [respectively, $\omega \in V(S \cap I_m, K)$]. It then suffices to remark that the relations $\omega \in V(I_m,K)$ and $\omega \in V(S \cap I_m, K)$ are equivalent for $\omega \notin A$ and $K \in \mathcal{K}$.
Conversely, suppose condition (18.1) is satisfied. We have the equivalences (where $K$ is a compact set $\subset E$):
$$\omega \in V(J,K) \iff \overline{X}_J(\omega) \subset K,$$
$$\overline{X}_J(\omega) \subset K \iff \overline{X}_{S \cap J}(\omega) \subset K \quad \text{for } \omega \notin A,$$
$$\overline{X}_{S \cap J}(\omega) \subset K \iff \omega \in V(S \cap J, K).$$
The sets $V(J,K)$ and $V(S \cap J, K)$ thus differ only by a subset of $A$, i.e., by a negligible set. The process $(X_t)$ is thus separable, and admits $S$ as a universal separating set.
We now proceed with the proof of Theorem 19. For each $t \in T$, define a function $Y_t(\omega)$ in the following manner. We put
$$Y_t(\omega) = X_t(\omega) \quad \text{if} \quad X_t(\omega) \in \bigcap_{I \in \mathcal{J},\ t \in I} \overline{X}_{I \cap S}(\omega), \tag{19.1}$$
and in the opposite case, we take for $Y_t(\omega)$ an arbitrary point of the set that appears in (19.1) (which, as the intersection of a family of nonempty compact sets, filtering to the left, is nonempty). We show that the functions $Y_t$ are measurable, and that the process $(Y_t)$ is a separable modification of $(X_t)$. Since the set $S$ is a universal separating set, we have for each pair $(I,K)$ ($I \in \mathcal{J}$, $K \in \mathcal{K}$) that $V(I \cap (S \cup \{t\}), K) \subset V(I \cap S, K)$ and $P[V(I \cap (S \cup \{t\}), K)] = P[V(I \cap S, K)]$. The set $B(I,K) = V(I \cap S, K) \setminus V(I \cap (S \cup \{t\}), K)$ is thus negligible, and so is the union $B$ of the sets $B(I,K)$ ($I \in \mathcal{J}$, $K \in \mathcal{K}$). Now the relation (19.1) is verified for $\omega \notin B$. It then follows that $X_t = Y_t$ a.s., that $Y_t$ is a random variable, and that the process $(Y_t)$ is a modification of $(X_t)$. It is clear that $X_t(\omega) = Y_t(\omega)$ for every $\omega$ when $t$ belongs to $S$. We thus have from the construction of $Y_t$ that
$$Y_t(\omega) \in \bigcap_{I \in \mathcal{J},\ t \in I} \overline{Y}_{I \cap S}(\omega) \quad \text{for every } t \text{ and every } \omega. \tag{19.2}$$
Let then $J$ be an open interval of $T$, and $L$ a compact subset of $E$. The properties
$$Y_t(\omega) \in L \quad \text{for every } t \in J$$
and
$$Y_t(\omega) \in L \quad \text{for every } t \in S \cap J$$
are equivalent from (19.2). The process $(Y_t)$ is hence separable (cf. D13).
Remarks (1) For every subset $W$ of $T$, denote by $G_W(\omega)$ the set of points of $T \times E$ of the form $(t,X_t(\omega))$ (for $t \in W$). Condition (18.1) is equivalent to the relation
$$\overline{G_J(\omega)} = \overline{G_{S \cap J}(\omega)} \quad \text{for every } \omega \in \Omega \setminus A, \tag{19.3}$$
whatever be the open interval $J \subset T$, the closures being taken in the compact space $T \times E$. We leave the verification of this equivalence to the reader. Here is one of its consequences: Let $F$ be another compact metrizable space, and let $f$ be a continuous mapping of $T \times E$ into $T \times F$. Suppose that the process $(X_t)$ satisfies condition (18.1). We show that the process $(Y_t)$ defined by $Y_t(\omega) = f(t,X_t(\omega))$ satisfies it also. It suffices to note that
$$\overline{f(G_J(\omega))} = f\big(\overline{G_J(\omega)}\big) = f\big(\overline{G_{S \cap J}(\omega)}\big) = \overline{f(G_{S \cap J}(\omega))}.$$
The equality (18.1) for $(Y_t)$ is obtained by projecting on $F$.
(2) It should not be thought that separability implies any regularity for the paths of a process. Let $f$ be an arbitrary mapping of $T$ into $E$. Define a "deterministic" process by choosing a space $\Omega$ consisting of a single point $\omega$, for which we put $X_t(\omega) = f(t)$. This process has only a single path, which can be very irregular. It is nevertheless separable. The set $V(I,K)$ is, in fact, identical either to the whole space $\Omega$ (and one then has $P[V(u,K)] = 1$ for every finite subset $u$ of $I$), or to the empty set (and there then exists a finite subset $u$ of $I$ such that $V(u,K) = \emptyset$). The conditions of Definition 13 are thus satisfied.
Oscillatory discontinuities of paths
20 Let $f$ be a mapping of an interval $T$ of $\mathbb{R}$ into a Hausdorff topological space $E$. We say that $f$ is free of oscillatory discontinuities* if the limit from the right
$$f(t+) = \lim_{s \to t,\ s > t} f(s)$$
exists at every point $t$ of $T$ (except for the right endpoint of $T$) and if the limit from the left
$$f(t-) = \lim_{s \to t,\ s < t} f(s)$$
* Such a function is called a "regulated" function by Bourbaki ("fonction réglée").
... $t_{k-1}$ and $f(s_i) > b$ [respectively, $f(s_i) < a$]. If no such element exists, we set $t_k = s_n$. Consider the last even integer $2k$ such that one actually has $f(t_{2k-1}) < a$ and $f(t_{2k}) > b$. If no such integer exists we put $k = 0$. The intervals $(t_1,t_2), (t_3,t_4), \dots, (t_{2k-1},t_{2k})$ of $u$ represent the periods of time when the function $f$ is rising from $a$ to $b$, whereas the intermediate intervals represent the periods when $f$ is descending from $b$ to $a$. The number $k$ is called the number of upcrossings by $f$ (considered on $u$) of the interval $[a,b]$ and denoted by
$$U(f;u;[a,b]). \tag{21.1}$$
We similarly define the number of downcrossings by $f$ (considered on $u$) of $[a,b]$ to be
$$D(f;u;[a,b]) = U(-f;u;[-b,-a]). \tag{21.2}$$
Upcrossings and downcrossings are also defined on intervals of the form $]a,b[$, by replacing strict inequalities by weak inequalities in the definition of the times $t_i$.* Now let $S$ be an arbitrary subset of $T$. We set
$$U(f;S;[a,b]) = \sup_{u \text{ finite},\ u \subset S} U(f;u;[a,b]). \tag{21.3}$$
Definition (21.2) is extended in an analogous manner. The principal interest in these numbers comes from the following theorem.
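The numbers (21.1) and (21.2) on a finite set $u$ are straightforward to compute. The sketch below assumes the usual alternating construction of the times $t_k$ (first point of $u$ where $f < a$, then the first later point where $f > b$, and so on); the sample values and the function names are our own and serve only as an illustration.

```python
def upcrossings(f, u, a, b):
    """Number U(f; u; [a, b]) of upcrossings of [a, b] by f on the finite set u."""
    count = 0
    below = False   # True once a value < a has been met since the last upcrossing
    for s in sorted(u):
        value = f(s)
        if not below and value < a:
            below = True
        elif below and value > b:
            count += 1
            below = False
    return count

def downcrossings(f, u, a, b):
    """D(f; u; [a, b]) = U(-f; u; [-b, -a]), as in (21.2)."""
    return upcrossings(lambda s: -f(s), u, -b, -a)

# Hypothetical sample values of f at the instants 0, 1, ..., 7.
VALUES = [0.0, 2.0, -1.0, 3.0, 0.5, -2.0, -3.0, 4.0]

def f(s):
    return VALUES[s]

u = range(len(VALUES))
print(upcrossings(f, u, 0.0, 1.0))
print(downcrossings(f, u, 0.0, 1.0))
```

For a finite set $S$ the supremum in (21.3) is attained at $u = S$ itself, because adding points to $u$ can only increase the count; for an infinite $S$ one would take the limit along an increasing sequence of finite subsets.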
T22 THEOREM Let $f$ be a real-valued function defined on a compact interval $T$. The function $f$ is free of oscillatory discontinuities if and only if $U(f;T;[a,b]) < \infty$ for every pair of rational numbers $a$, $b$ such that $a < b$.
Proof Suppose that there exists a point $t \in T$ where the function $f$ has an oscillatory discontinuity; for example, suppose it has no left limit. A sequence of points $(t_n)$ can then be found that increases to $t$ in such a way that
$$\liminf_{n \to \infty,\ n \text{ odd}} f(t_n) = c > d = \limsup_{n \to \infty,\ n \text{ even}} f(t_n).$$
Choose then two rational numbers $a$ and $b$ such that $d < a < b < c$. It is immediately verified, by extracting finite subsets of the set of points $t_n$, that $U(f;T;[a,b]) = +\infty$. The converse follows from a property which the reader can easily prove: If $r$, $s$, $t$ are three times such that $r < s < t$, we have
$$U(f;[r,t];[a,b]) \leq U(f;[r,s];[a,b]) + U(f;[s,t];[a,b]) + 1.$$
Let $\alpha$ and $\beta$ be the endpoints of $T$. Suppose that the function $f$ has no oscillatory discontinuities; one can then associate with each point $t \in T$ an open interval $I_t$ containing $t$, such that the oscillation of $f$ in each of the intervals $I_t \cap \,]t,\beta]$, $[\alpha,t[\, \cap I_t$ is strictly less than $b - a$. We can cover the interval $T$ with a finite number of intervals $I_{t_1}, I_{t_2}, \dots, I_{t_k}$. Arrange in increasing order of magnitude the points $\alpha$ and $\beta$, the points $t_1, t_2, \dots, t_k$ and the endpoints of the intervals $I_{t_1}, I_{t_2}, \dots, I_{t_k}$. We obtain a finite set of points: $\alpha = s_0 < s_1 < \dots < s_n = \beta$, such that the oscillation of $f$ on each of the intervals $]s_i,s_{i+1}[$ is less than $b - a$. We thus have $U(f;]s_i,s_{i+1}[;[a,b]) = 0$, and consequently also $U(f;[s_i,s_{i+1}];[a,b]) \leq 1$. The inequality cited above then gives us $U(f;T;[a,b]) \leq 2n - 1$ and the converse is established.†
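The splitting inequality used in the converse can be tested empirically on finite sets of instants. The sketch below is an illustration under assumptions of our own: the function is given by its list of values at increasing instants, the splitting instant $s$ belongs to the set, and upcrossings are counted by the usual alternating rule (a value $< a$ followed later by a value $> b$).

```python
import random

def upcrossings(values, a, b):
    """Upcrossings of [a, b] by a function given by its values at increasing instants."""
    count, below = 0, False
    for v in values:
        if not below and v < a:
            below = True
        elif below and v > b:
            count += 1
            below = False
    return count

def split_inequality_holds(values, k, a, b):
    """Check U on [r, t] <= U on [r, s] + U on [s, t] + 1, where s is the k-th instant."""
    whole = upcrossings(values, a, b)
    left = upcrossings(values[: k + 1], a, b)    # instants in [r, s]
    right = upcrossings(values[k:], a, b)        # instants in [s, t]
    return whole <= left + right + 1

rng = random.Random(1)
ok = all(
    split_inequality_holds(
        [rng.uniform(-2.0, 2.0) for _ in range(12)], rng.randrange(12), -0.5, 0.5
    )
    for _ in range(2000)
)
print(ok)

# The additive constant cannot be dropped: one upcrossing straddles s.
print(upcrossings([-1.0, 0.0, 1.0], -0.5, 0.5),
      upcrossings([-1.0, 0.0], -0.5, 0.5),
      upcrossings([0.0, 1.0], -0.5, 0.5))
```

The reason behind the inequality is visible in the last line: each upcrossing of the whole set lies entirely on one side of $s$, except for at most one that begins before $s$ and ends after it.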
* The numbers $U(f;u;[a,b])$, $D(f;u;[a,b])$ have the advantage of defining l.s.c. functions of $f$ for the topology of simple convergence. This property extends to the number of upcrossings or of downcrossings on an arbitrary set $S$.
† This proof was communicated to us by P. Gabriel.
Remark Let $S$ be a countable dense set in $T$. Suppose that the function $f$ is defined only on $S$, and that $U(f;S;[a,b]) < \infty$ for every pair of rational numbers $a$, $b$ such that $a < b$. It can then be proved, exactly as above, that the function $f$ admits a left and right limit through $S$ at every point of $T$.
We apply Theorem 22 to the theory of stochastic processes.
T23 THEOREM Let $(\Omega,\mathcal{F},P,(X_t)_{t \in T})$ be a separable stochastic process with values in a compact metrizable space $E$. The set of $\omega \in \Omega$ whose paths are free of oscillatory discontinuities is then measurable.
Proof Let $\mathcal{H}$ be a countable collection of continuous real-valued functions which separates the points of $E$ [for each pair of distinct points $x$, $y$ of $E$, there exists a function $h \in \mathcal{H}$ such that $h(x) \neq h(y)$]. The function $t \mapsto X_t(\omega)$ is free of oscillatory discontinuities, if and only if each of the functions $t \mapsto h \circ X_t(\omega)$ ($h \in \mathcal{H}$) is free of them. Denote by $U_h(\omega;S;[a,b])$ the number of upcrossings by the function $t \mapsto h \circ X_t(\omega)$ (considered on $S \subset T$) of $[a,b]$, and suppose, to simplify the argument, that $T$ is a compact interval. The set of $\omega \in \Omega$ whose paths are free of oscillatory discontinuities is the intersection of the sets
$$\{\omega : U_h(\omega;T;[a,b]) < \infty\} \tag{23.1}$$
where $a$ and $b$ are two rational numbers such that $a < b$, and where $h$ belongs to $\mathcal{H}$. Since there are countably many of these sets, it suffices for us to verify that each of them is measurable. In order that $U_h(\omega;T;[a,b]) = +\infty$, it is necessary and sufficient that one can find, for every integer $k$, disjoint open intervals with rational endpoints $I_1, I_2, \dots, I_k$, contained in $T$ and such that $U_h(\omega;I_j;[a,b]) > 0$ for $j = 1, 2, \dots, k$. Since the collection of systems of intervals of this type is countable, it suffices to verify that the set
$$\{\omega : U_h(\omega;I;[a,b]) = 0\} \tag{23.2}$$
is measurable, which it is, since it is identical to the union:
$$V\big(I, h^{-1}([-\infty, b])\big) \cup \Big(\bigcup_{\rho \text{ rational},\ \rho > a} V\big(I, h^{-1}([\rho, +\infty])\big)\Big).$$
24 Remarks As the set of $\omega \in \Omega$ whose paths are free of oscillatory discontinuities is measurable, its probability will not be affected if the measure $P$ is completed. Let $S$ be a universal separating set for the process $(X_t)$; if $\omega$ does not belong to the exceptional set $A$ considered in the statement of Theorem 18, the reader can easily verify that, for every finite subset $u$ of $T$ and every $h \in \mathcal{H}$,
$$U_h(\omega;u;[a,b]) \le U_h(\omega;S;[a,b]).$$
It then follows that we also have, for $\omega \notin A$,
$$U_h(\omega;T;[a,b]) \le U_h(\omega;S;[a,b]).$$
Since the reverse inequality is clear, we can replace the inequality by an equality. Thus, in order that the paths of the process $(X_t)$ be a.s. free of oscillatory discontinuities on the compact interval $T$, it is necessary and sufficient that $U_h(\omega;S;[a,b]) < +\infty$ a.s. for every function $h \in \mathcal{H}$ and every pair of rational numbers $a$, $b$ such that $a < b$. The following theorem will be useful in the theory of Markov processes.
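The upcrossing numbers used in these remarks can be made concrete on a finite set of times. The following is a minimal sketch, not the book's definition verbatim: it assumes the usual convention that an upcrossing of $[a,b]$ is completed each time the sampled values, after falling strictly below $a$, later rise strictly above $b$. The function name `count_upcrossings` and the sample path are illustrative.

```python
def count_upcrossings(values, a, b):
    """Count upcrossings of [a, b] by a finite sequence of values.

    Assumed convention: an upcrossing is completed each time the sequence,
    having taken a value < a, later takes a value > b.
    """
    if not a < b:
        raise ValueError("need a < b")
    count = 0
    below = False  # True once a value < a has been seen since the last upcrossing
    for v in values:
        if not below:
            if v < a:
                below = True
        elif v > b:
            count += 1
            below = False
    return count


# Values of a path observed at eight successive times.
path = [0.0, 2.0, -1.0, 3.0, 0.5, -0.5, 1.5, 2.5]
full = count_upcrossings(path, 0.0, 2.0)
# Observing the path on a smaller set of times can only lose upcrossings,
# which is the monotonicity used in the inequalities above.
sub = count_upcrossings(path[::2], 0.0, 2.0)
print(full, sub)
```

Restricting the set of observation times never increases the count, which is the elementary fact behind the comparison of $U_h(\omega;u;[a,b])$ with $U_h(\omega;S;[a,b])$.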
T25 THEOREM Let $(\Omega,\mathcal{F},P,(X_t)_{t\in T})$ be a stochastic process with values in the compact metrizable space $E$. Suppose that the set $\Omega_0$ of $\omega \in \Omega$ whose paths are free of oscillatory discontinuities has probability 1, and that for each $t \in T$
$$X_t(\omega) = X_{t+}(\omega) \quad \text{a.s. on } \Omega_0,$$
where $X_{t+}(\omega)$ denotes the right-hand limit of the path of $\omega$ at the time $t$. There then exists a process equivalent to $(X_t)$ whose paths are free of oscillatory discontinuities and are right continuous.

Proof Consider the set $\mathcal{F}_0$ of elements of $\mathcal{F}$ contained in $\Omega_0$. $\mathcal{F}_0$ is clearly a σ-field, and the restriction $P_0$ of the law $P$ to $\mathcal{F}_0$ is a probability law on $(\Omega_0,\mathcal{F}_0)$. On this new probability space, consider the process
$$Y_t(\omega) = X_{t+}(\omega) \qquad (t \in T).$$
It is clear that the processes $(X_t)$ and $(Y_t)$ are equivalent.
The following construction is often useful. Denote by $E(T)$ the collection of mappings from $T$ into $E$ that are right continuous and free of oscillatory discontinuities. $E(T)$ is a subset of $E^T$. Let $(Z_t)_{t\in T}$ be the restrictions to $E(T)$ of the coordinate mappings, and let $\mathcal{G}$ be the σ-field on $E(T)$ generated by the functions $Z_t$, $t \in T$. If we associate with each $\omega \in \Omega_0$ the function $t \mapsto Y_t(\omega)$, we define a measurable mapping $f$ from $(\Omega_0,\mathcal{F}_0)$ into $(E(T),\mathcal{G})$. The process $(E(T),\mathcal{G},f(P_0),(Z_t)_{t\in T})$ is equivalent to the process $(X_t)$, has paths as regular as the process $(Y_t)$, and presents a canonical character that renders it very helpful on occasion.
Appendix to Section 2 An extension of the notion of separability *
26 Let $(\Omega,\mathcal{F},P,(X_t)_{t\in T})$ be a stochastic process with values in a measurable space $(E,\mathcal{E})$. We make no particular hypothesis here on $T$, on the measurable space, or on the law $P$. In particular, the latter satisfies no hypothesis of "regularity" in the sense of Chapter III, Section 3. Let $\mathcal{K}$ be a semicompact paving on $E$, consisting of elements of $\mathcal{E}$. We call a condition any family $(K_t)_{t\in T}$ of elements of $\mathcal{K}$. Let $I$ be any subset of $T$. We say that the element $\omega \in \Omega$ satisfies the condition $(K_t)_{t\in T}$ for $t$ in $I$ if we have $X_t(\omega) \in K_t$ for every $t \in I$. The set of all $\omega \in \Omega$ that have this property is denoted by $V(K_t, t \in I)$; it evidently belongs to $\mathcal{F}$ when $I$ is countable.

D27 DEFINITION We say that the process $(X_t)$ is separable relative to the condition $(K_t)_{t\in T}$ if the set $V(K_t, t \in T)$ belongs to $\mathcal{F}$ and if
$$P[V(K_t, t \in T)] = \inf_{u \subset T,\ u \text{ finite}} P[V(K_t, t \in u)]. \qquad (27.1)$$
We say that a countable subset $S$ of $T$ is a separating set for the condition $(K_t)_{t\in T}$ if we have
$$P[V(K_t, t \in S)] = \inf_{u \subset T,\ u \text{ finite}} P[V(K_t, t \in u)]. \qquad (27.2)$$
* See Meyer (94). This paper contains no detailed proofs, so we have included one here. The results of this appendix will not be used in the sequel.
Example Suppose that $T$ is an interval of $R$ and that $E$ is a compact metrizable space. Separability such as defined in No. 13 is equivalent to the separability of the process, in the sense of Definition 27, relative to all the conditions of the form $(K_t)_{t\in T}$ where
$$K_t = \begin{cases} K & \text{for } t \in J \\ E & \text{for } t \in T \setminus J, \end{cases}$$
$K$ ranging over the collection of compact sets of $E$, $J$ over the open intervals of $T$.
28 Remarks (a) There actually exist separating sets. Consider a finite subset $u_n$ of $T$ such that $P[V(K_t, t \in u_n)]$ is within at most $1/n$ of the second member of (27.2), and put $S = \bigcup_n u_n$. It is clear that $S$ is separating for the condition $(K_t)_{t\in T}$. On the other hand, one cannot establish the existence of a universal separating set.
(b) Let $S$ be a separating set for the condition $(K_t)_{t\in T}$, and let $Z$ be a countable subset of $T$ which contains $S$. It is evident that $V(K_t, t \in Z) \subset V(K_t, t \in S)$, and these sets have the same probability; they therefore differ only by a negligible set. It can immediately be deduced that the set $V(K_t, t \in S) \,\triangle\, V(K_t, t \in S')$, where $S$ and $S'$ are two separating sets, is negligible.
(c) The process $(X_t)$ is separable with respect to a condition $(K_t)_{t\in T}$ if and only if the set
$$V(K_t, t \in S) \setminus V(K_t, t \in T) \qquad (28.1)$$
is $P$-negligible, where $S$ denotes a separating set for the condition. We have just seen that this property does not depend on the separating set chosen.

The following theorem generalizes the result of Doob which we established in No. 17.

T29 THEOREM Let $\mathcal{C}$ be any collection of conditions. There exists a process $(Y_t)$, equivalent to the process $(X_t)$, which is separable relative to the elements of $\mathcal{C}$.

Proof Consider the first canonical version of the given process:

We render the process $(Y_t)$ separable by extending the law $P$ in such a manner that all sets of the form (28.1), relative to the conditions which belong to $\mathcal{C}$, become negligible for this extension. It is known (II.T27) that such an extension is possible if and only if every countable union of sets of the form (28.1) is internally $P$-negligible. We establish this point and the theorem follows.

First fix some notation: With $I$ denoting a countable subset of $T$, we denote by $Y_I$ the projection map of $E^T$ on $E^I$. We say that a subset $A$ of $E^T$ is cylindrical with base in $E^I$ if we have $A = Y_I^{-1}(Y_I(A))$. For example, every set of the form $V(K_t, t \in I)$ is cylindrical, based in $E^I$.

We reason in the following manner: We suppose that there exist conditions $(K_t^n)_{t\in T}$ ($n \in N$) such that the set $\bigcup_n [V(K_t^n, t \in S^n) \setminus V(K_t^n, t \in T)]$, where each $S^n$ is a separating set relative to the condition $(K_t^n)_{t\in T}$, contains a non-negligible measurable set $\alpha$. We then modify the set $\alpha$ without changing its probability, and show that a contradiction is obtained. The set $\alpha$ is of the form $Y_J^{-1}(\alpha_J)$, where $J$ is a countable subset of $T$ and $\alpha_J$ is measurable in $E^J$. Let $I$ be the countable set $J \cup (\bigcup_n S^n)$, and let $\alpha'$ be the set
$$\bigcup_n [V(K_t^n, t \in S^n) \setminus V(K_t^n, t \in I)].$$
The set $\beta = \alpha \setminus \alpha'$ has a strictly positive probability, since it differs from $\alpha$ only by a negligible set. It is cylindrical, has a measurable base in $E^I$, and is contained in the set
$$\bigcup_n [V(K_t^n, t \in I) \setminus V(K_t^n, t \in T)].$$
We are going to shrink $\beta$ a little more. Let $H$ be the collection of finite subsets $h \subset N$ such that
$$P\Bigl[\bigcap_{n \in h} V(K_t^n, t \in I)\Bigr] = 0.$$
Since the set $H$ is countable, the set
$$\beta' = \bigcup_{h \in H} \Bigl[\bigcap_{n \in h} V(K_t^n, t \in I)\Bigr]$$
is negligible. Set $\gamma = \beta \setminus \beta'$. This set is again a cylinder based in $E^I$, and its probability is strictly positive. Here then is the contradiction we meet: Since $\gamma$ is nonempty, choose a $z \in \gamma$, and set $x = Y_I(z) = (x_t)_{t\in I}$. We are going to construct a point $y = (y_t)_{t \in T \setminus I}$ of $E^{T \setminus I}$ such that $y_t \in K_t^n$ for every $t \in T \setminus I$, and every $n$ such that $x_t \in K_t^n$ for every $t \in I$. In other words, if we put $z' = (x,y) \in E^I \times E^{T \setminus I} = E^T$, we have
$$z' \notin \bigcup_n [V(K_t^n, t \in I) \setminus V(K_t^n, t \in T)]. \qquad (28.2)$$
But this is impossible, since $z$ and $z'$ have the same projection on $E^I$, and hence both belong to $\gamma$, which is contained in the above set (28.2).

Let $N_x$ be the collection of integers $n$ such that $x_t \in K_t^n$ for every $t \in I$ [or $z \in V(K_t^n, t \in I)$]. As the paving $\mathcal{K}$ is semicompact, a family $(y_t)_{t \in T \setminus I}$ satisfying the above properties can be found if and only if for every finite subset $h$ of $N_x$ we have
$$\bigcap_{n \in h} K_s^n \neq \emptyset \quad \text{for every } s \in T \setminus I.$$
But this condition is actually satisfied. We have in fact $z \in \gamma$, hence $z \notin \beta'$, $h \notin H$, and consequently $P[\bigcap_{n \in h} V(K_t^n, t \in I)] > 0$. Since the set $I$ is separating for these conditions, this probability is equal, for every $s \in T \setminus I$, to $P[\bigcap_{n \in h} V(K_t^n, t \in I \cup \{s\})]$. We thus have, a fortiori, that $P\{Y_s \in K_s^n \text{ for every } n \in h\} > 0$, which excludes the possibility that the set $\bigcap_{n \in h} K_s^n$ is empty.
3. Measurable Processes. Stopping Times

This section serves principally as an introduction to the terminology we use in what follows. It will perhaps be to the reader's advantage to study it in a summary fashion, and return to it later according to his needs. In order to avoid overburdening the text, we have limited ourselves to processes whose time set is the half-line $R_+$. The reader can easily adapt the results to the other usual time sets.
Processes adapted to an increasing family of σ-fields

30 Let $(\Omega,\mathcal{F})$ be a measurable space, and let $(\mathcal{F}_t)_{t\in R_+}$ be a family of sub-σ-fields of $\mathcal{F}$ such that the relation $s \le t$ implies $\mathcal{F}_s \subset \mathcal{F}_t$. We say that $(\mathcal{F}_t)$ is an increasing family of sub-σ-fields of $\mathcal{F}$, and we call $\mathcal{F}_t$, for each $t \in R_+$, the σ-field of events prior to $t$. We denote by $\mathcal{F}_\infty$ the σ-field generated by the union of the σ-fields $\mathcal{F}_t$, and we set
$$\mathcal{F}_{t+} = \bigcap_{s > t} \mathcal{F}_s \qquad (t \in R_+).$$
The family $(\mathcal{F}_t)$ is said to be right continuous if $\mathcal{F}_t = \mathcal{F}_{t+}$ for every $t \in R_+$.

Remarks (a) The family of σ-fields $(\mathcal{F}_{t+})_{t\in R_+}$ is right continuous.
(b) When the index set is $N$ instead of $R_+$ (or more generally when the index set is discrete) the notion of a right-continuous family and the definition of the σ-fields $\mathcal{F}_{t+}$ evidently have no meaning.

D31 DEFINITION Let $(X_t)_{t\in R_+}$ be a stochastic process defined on a probability space $(\Omega,\mathcal{F},P)$ and let $(\mathcal{F}_t)_{t\in R_+}$ be an increasing family of sub-σ-fields of $\mathcal{F}$. The process $(X_t)$ is said to be adapted to the family $(\mathcal{F}_t)$ if $X_t$ is $\mathcal{F}_t$-measurable for every $t \in R_+$.

Example Let $(X_t)$ be a stochastic process. It is natural to consider $(X_t)$ as a process adapted to the family of σ-fields $\mathcal{F}_t = \mathcal{T}(X_s, s \le t)$, the σ-field generated by the random variables $X_s$, $s \le t$.
32 The preceding definitions have this intuitive significance: If we interpret the parameter $t$ as time, and each event $A \in \mathcal{F}$ as a physical phenomenon, the sub-σ-field $\mathcal{F}_t$ consists of the events that occur prior to the instant $t$. The $\mathcal{F}_t$-measurable random variables are hence those which depend only on the evolution of the universe prior to $t$. In particular, imagine that an observer watches the appearance of a certain phenomenon in the universe, and notes the time $T(\omega)$ when this phenomenon is produced for the first time. The event $\{T \le t\}$, which occurs if and only if the phenomenon considered is produced at least once before the instant $t$, or at that instant, is evidently prior to $t$. From this comes the interest in the following definition.

D33 DEFINITION Let $(\Omega,\mathcal{F})$ be a measurable space, and let $(\mathcal{F}_t)_{t\in R_+}$ be an increasing family of sub-σ-fields of $\mathcal{F}$. A positive random variable $T$ defined on $\Omega$ is said to be a stopping time [relative to the family $(\mathcal{F}_t)$*] if $T$ satisfies the following property:
$$\text{The event } \{T \le t\} \text{ belongs to } \mathcal{F}_t \text{ for every } t \in R_+. \qquad (33.1)$$

34 Remarks (a) Every random variable equal to a positive constant is a stopping time.
(b) Let $T$ be a positive random variable which satisfies the condition
$$\{T < t\} \in \mathcal{F}_t \text{ for every } t \in R_+. \qquad (34.1)$$
It is clear that we then have $\{T \le t\} \in \mathcal{F}_{t+\varepsilon}$ for every $t \in R_+$ and every $\varepsilon > 0$. In other words, $T$ is a stopping time relative to the family of σ-fields $(\mathcal{F}_{t+})_{t\in R_+}$. In particular, if the family $(\mathcal{F}_t)$ is right continuous, relation (34.1) implies that $T$ is a stopping time relative to the family $(\mathcal{F}_t)$.
(c) We often allow stopping times to take the value $+\infty$.

* Or (better): A stopping time of the family $(\mathcal{F}_t)$ (added in proof).
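Property (33.1) can be checked mechanically in a toy discrete setting. The sketch below is an illustration under assumptions that are not in the text: the time set is $\{0, 1, \dots, N\}$ rather than $R_+$, $\Omega$ is the set of $\pm 1$ step sequences of length $N$, and $\mathcal{F}_n$ is generated by the first $n$ steps. All function names are hypothetical. A time $T$ then satisfies (33.1) exactly when the event $\{T \le n\}$ is determined by the first $n$ steps.

```python
from itertools import product


def is_stopping_time(T, horizon):
    """Check that {T <= n} is determined by the first n steps, for n = 0..horizon."""
    paths = list(product((-1, 1), repeat=horizon))
    for n in range(horizon + 1):
        seen = {}  # prefix of length n -> truth value of {T <= n} on that prefix
        for w in paths:
            event = T(w) <= n
            if seen.setdefault(w[:n], event) != event:
                return False  # two paths agree up to time n but disagree on {T <= n}
    return True


def partial_sums(w):
    """Positions of the walk: sums[n] is the position at time n, with sums[0] = 0."""
    sums, s = [0], 0
    for step in w:
        s += step
        sums.append(s)
    return sums


def first_hit(level):
    """First time the walk reaches `level`; +infinity if it never does (Remark 34(c))."""
    def T(w):
        for n, s in enumerate(partial_sums(w)):
            if s >= level:
                return n
        return float("inf")
    return T


def time_of_maximum(w):
    """First time at which the walk attains its overall maximum (looks into the future)."""
    sums = partial_sums(w)
    return sums.index(max(sums))


print(is_stopping_time(first_hit(2), 4))      # a first hitting time
print(is_stopping_time(lambda w: 3, 4))       # a constant, as in Remark 34(a)
print(is_stopping_time(time_of_maximum, 4))   # depends on the future of the path
```

The first two checks succeed and the third fails: knowing when the maximum is attained requires the whole path, so the event $\{T \le n\}$ is not prior to $n$.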
D35 DEFINITION Let $T$ be a stopping time relative to the family of σ-fields $(\mathcal{F}_t)_{t\in R_+}$. We denote by $\mathcal{F}_T$ the collection of events $A \in \mathcal{F}_\infty$ such that
$$A \cap \{T \le t\} \in \mathcal{F}_t \text{ for every } t \in R_+. \qquad (35.1)$$
We call $\mathcal{F}_T$ the σ-field of events prior to $T$. The reader will immediately verify that these events do constitute a σ-field, and that if the stopping time is equal to the constant $t$, the σ-field $\mathcal{F}_t$ is recovered.

Properties of stopping times

The stopping times figuring in the following statements are all relative to a single family of σ-fields $(\mathcal{F}_t)_{t\in R_+}$.

T36 THEOREM Let $S$ and $T$ be two stopping times. Then the random variables $S \wedge T$ and $S \vee T$ are again stopping times.*

The proof is immediate.

T37 THEOREM Let $T$ be a stopping time. Then $T$ is $\mathcal{F}_T$-measurable.

The proof is immediate.

T38 THEOREM Let $T$ be a stopping time and $S$ an $\mathcal{F}_T$-measurable random variable such that $S \ge T$; $S$ is then a stopping time.

Proof We verify the relation $\{S \le t\} \in \mathcal{F}_t$ for every $t \in R_+$. The event $\{S \le t\}$ belonging to $\mathcal{F}_T$, we have $\{S \le t\} \cap \{T \le t\} \in \mathcal{F}_t$. It then suffices to note that this intersection is equal to the event $\{S \le t\}$.

Here is a generalization of formula (35.1).

T39 THEOREM Let $S$ and $T$ be two stopping times, and let $A$ be an element of $\mathcal{F}_S$. We then have
$$A \cap \{S \le T\} \in \mathcal{F}_T. \qquad (39.1)$$

Proof In order to verify that
$$A \cap \{S \le T\} \cap \{T \le t\} \in \mathcal{F}_t \text{ for every } t \in R_+$$
it suffices to write the left-hand side in the form
$$[A \cap \{S \le t\}] \cap \{T \le t\} \cap \{S \wedge t \le T \wedge t\}.$$
Each of these three events belongs to $\mathcal{F}_t$. The first, by reason of the relation $A \in \mathcal{F}_S$; the second, from the fact that $T$ is a stopping time; the third, finally, follows from the fact, which the reader can easily verify, that the functions $S \wedge t$ and $T \wedge t$ are $\mathcal{F}_t$-measurable.

T40 THEOREM Let $S$ and $T$ be two stopping times such that $S \le T$. Then $\mathcal{F}_S \subset \mathcal{F}_T$.

Proof Let $A$ be an element of $\mathcal{F}_S$. From T39 we have $A = A \cap \{S \le T\} \in \mathcal{F}_T$.

T41 THEOREM Let $S$ and $T$ be two stopping times. The events $\{S < T\}$, $\{S = T\}$, $\{S > T\}$ belong to $\mathcal{F}_S$ and to $\mathcal{F}_T$.

Proof We have $\{S \le T\} \in \mathcal{F}_T$ from T39, hence $\{S > T\} \in \mathcal{F}_T$ by taking complements. Denote by $R$ the stopping time $S \wedge T$. Then $R$ is $\mathcal{F}_R$-measurable (T37), and hence $\mathcal{F}_T$-measurable (T40). It then follows that the events $\{R = T\} = \{S \ge T\}$, $\{R < T\} = \{S < T\}$ belong to $\mathcal{F}_T$; so does $\{S = T\} = \{S \ge T\} \setminus \{S > T\}$. These events also belong to $\mathcal{F}_S$, since $S$ and $T$ play symmetric roles.

Let $(T_n)_{n\in N}$ be a sequence of stopping times. It can be verified immediately that $\sup_n T_n$ is again a stopping time, which implies in particular that the limit of an increasing sequence of stopping times is a stopping time. We have more complete results in the following case.

T42 THEOREM Suppose that the family $(\mathcal{F}_t)$ is right continuous. Let $(T_n)$ be a sequence of stopping times. The random variables $\liminf_{n\to\infty} T_n$, $\limsup_{n\to\infty} T_n$ are then stopping times. Suppose, moreover, that the sequence $(T_n)$ is decreasing, and denote its limit by $T$. We then have
$$\mathcal{F}_T = \bigcap_n \mathcal{F}_{T_n}.$$

Proof We prove only the second assertion. We have $\mathcal{F}_T \subset \bigcap_n \mathcal{F}_{T_n}$ from T40. Conversely, let $A$ be an element of this latter σ-field. We have $A \cap \{T_n < t\} \in \mathcal{F}_t$ for every $t \in R_+$, and hence also $\bigcup_n [A \cap \{T_n < t\}] = A \cap \{T < t\} \in \mathcal{F}_t$ for every $t$. This implies that $A \cap \{T \le t\} \in \mathcal{F}_{t+}$ for every $t$, and we conclude by using the equality $\mathcal{F}_{t+} = \mathcal{F}_t$.
The following notation is often used in what follows.

D43 DEFINITION Let $t$ be a positive number, and $n$ a positive integer. We denote by $t^{(n)}$ the single number of the form $k/2^n$ ($k \in N$) such that
k-l e}. n+l ro + 00 if the above set is empty.
These functions are then stopping times. We show this only for the case n = 1, the general case being treated by induction in an obvious manner. Let t be a number> O. Given a rational number h < t, a number e' > e, and an integer m > 0, consider the set D h •m of pairs of rational numbers (r,s) such that O 0, and n a positive integer. For every integer k < 2 n and every
set
and complete this definition by putting $X_t^n = X_t$. It is clear that the mapping $(s,\omega) \mapsto X_s^n(\omega)$ of $[0,t] \times \Omega$ into $E$ is measurable when the first space is given the σ-field $\mathcal{B}([0,t]) \times \mathcal{F}_t$. The same then holds for $(s,\omega) \mapsto X_s(\omega)$, since this mapping is the limit of the preceding one as $n \to \infty$. The case where the paths are left-continuous is treated similarly. The theorem extends, moreover, to any Hausdorff space $E$, given its Borel σ-field (I.15).

The following notation is used for the rest of the book, with the random variable $H$ being, in general, a stopping time.

D48 DEFINITION Let $(X_t)_{t\in R_+}$ be a measurable stochastic process defined on $(\Omega,\mathcal{F},P)$ with values in a space $(E,\mathcal{E})$, and let $H$ be a random variable defined on $\Omega$, with values in $R_+$. We denote by $X_H$ the random variable $\omega \mapsto X_{H(\omega)}(\omega)$.

This function is indeed a random variable, since it is equal to the composition of the two measurable functions $\omega \mapsto (H(\omega),\omega)$ and $(t,\omega) \mapsto X_t(\omega)$. The case where the time set of the process is $R_+ \cup \{+\infty\}$, and where $H$ is allowed to take the value $+\infty$, is also frequently encountered. The following theorem, for example, is true in this situation without the finiteness restriction.

T49 THEOREM Let $(X_t)$ be a progressively measurable process with respect to the family of σ-fields $(\mathcal{F}_t)$, and let $T$ be a (finite) stopping time. The random variable $X_T$ is then $\mathcal{F}_T$-measurable.
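Proof We have to show that, for every measurable subset $A$ of the state space $E$ and every $t \in R_+$, we have
$$\{X_T \in A\} \cap \{T \le t\} \in \mathcal{F}_t.$$
Put $S = T \wedge t$. This set is equal to
$$[\{X_S \in A\} \cap \{S < t\}] \cup [\{X_t \in A\} \cap \{T = t\}].$$
It thus suffices to show that $X_S$ is $\mathcal{F}_t$-measurable. This follows immediately from the fact that $X_S$ is the composition of the measurable functions: $\omega \mapsto (S(\omega),\omega)$ from $(\Omega,\mathcal{F}_t)$ into $([0,t] \times \Omega, \mathcal{B}([0,t]) \times \mathcal{F}_t)$, and $(s,\omega) \mapsto X_s(\omega)$ from $([0,t] \times \Omega, \mathcal{B}([0,t]) \times \mathcal{F}_t)$ into $E$.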
Measurability of hitting times
50 We have noted that a process can be considered as a function defined on the product set $R_+ \times \Omega$. We now use this remark in a more systematic manner (and we use it much more in Chapter VIII). Thus, given a function $X$ defined on $R_+ \times \Omega$, with values in a measurable space $(E,\mathcal{E})$, we denote by $X_t$ the partial mapping $\omega \mapsto X(t,\omega)$, and we identify the function $X$ with the family $(X_t)_{t\in R_+}$ of these partial mappings. This permits us to say, for example, that $X$ is a process if the family $(X_t)$ is. We say also that a subset $A$ of $R_+ \times \Omega$ is progressively measurable with respect to the family $(\mathcal{F}_t)$ if its indicator function is identified with a progressively measurable process with respect to this family. The reader can verify that the progressively measurable processes are identified with random variables on $R_+ \times \Omega$, given the σ-field of progressively measurable sets.

D51 DEFINITION Let $A$ be a subset of $R_+ \times \Omega$. We denote by $D_A$ (the "debut of $A$") the function defined on $\Omega$ by
$$D_A(\omega) = \begin{cases} \inf\{t : (t,\omega) \in A\} \\ +\infty \text{ if the above set is empty.*} \end{cases}$$

The idea of using Choquet's theorem to establish the measurability of functions of this type is due to Hunt, but the following theorem has been established by Blackwell and Freedman.

T52 THEOREM Suppose that the family $(\mathcal{F}_t)_{t\in R_+}$ is right continuous, and that each σ-field $\mathcal{F}_t$ is complete for the law $P$. Let $A$ be a progressively measurable subset of $R_+ \times \Omega$. The function $D_A$ is then a stopping time.

Proof Since the family $(\mathcal{F}_t)$ is right continuous, it suffices to verify that $\{D_A < t\} \in \mathcal{F}_t$ for every $t \in R_+$. Since the σ-field $\mathcal{F}_t$ is complete for the law $P$, it suffices to show that the set $\{D_A < t\}$ is $\mathcal{F}_t$-analytic (see III.24). Now this set is the projection on $\Omega$ of $A \cap ([0,t[ \times \Omega)$, which belongs to the σ-field $\mathcal{B}([0,t]) \times \mathcal{F}_t$ due to the fact that $A$ is progressively measurable. Let $\mathcal{K}$ be the paving consisting of the compact subsets of $[0,t]$. Every element of the product σ-field $\mathcal{B}([0,t]) \times \mathcal{F}_t$ is analytic with respect to the product paving $\mathcal{K} \times \mathcal{F}_t$ from III.T12. It then suffices to use III.T9 to establish the theorem.

* It should be observed that every stopping time $T$ can be considered as the debut of a progressively measurable set, namely $\{(t,\omega) : t > T(\omega)\}$.
Remark The preceding theorem extends to the case where $A$ is only analytic with respect to the σ-field of all progressively measurable subsets of $R_+ \times \Omega$.
53 We suppose that the family $(\mathcal{F}_t)$ satisfies the hypotheses of the preceding statement. Let $(X_t)$ be a stochastic process, with values in a state space $(E,\mathcal{E})$, which is progressively measurable relative to the family $(\mathcal{F}_t)$.
(a) Let $B$ be an element of $\mathcal{E}$. We set
$$D_B(\omega) = \inf\{t : X_t(\omega) \in B\}.$$
This function is called the hitting time [for the process $(X_t)$] of $B$. It is equal to the "debut" of the set $\{(t,\omega) : X_t(\omega) \in B\}$, which is progressively measurable, so that $D_B$ is a stopping time. This remains true, from III.T11 and the above remark, when $B$ is only supposed $\mathcal{E}$-analytic.
(b) Let $f$ be a real-valued measurable function defined on $E$. The process $((f \circ X_0, f \circ X_t))_{t\in R_+}$, with values in $R \times R$, is evidently progressively measurable relative to the family $(\mathcal{F}_t)$. The following function of $\omega$ is hence a stopping time:
$$\inf\{t : |f \circ X_0(\omega) - f \circ X_t(\omega)| > \varepsilon\} \qquad (\varepsilon > 0).$$
It is called "the first time for which $f$ differs from its initial value by more than $\varepsilon$ (on the path of $\omega$)."
(c) Let us take up again the situation of No. 44(b), and show that the functions $T_n$ are stopping times, by the method of No. 52. Since the processes $(X_s)$ and $(X_{s-})$ are progressively measurable from T47, the same holds for the process $(X_s - X_{s-})$. Reason by induction: By supposing that $T_n$ is a stopping time, the set
$$A = \{(t,\omega) : t > T_n(\omega)\}$$
is progressively measurable. The same holds for the set $A' = \{(t,\omega) : |X_t(\omega) - X_{t-}(\omega)| > \varepsilon\}$. It then suffices to note that $T_{n+1}$ is the debut of $A \cap A'$.
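The debut, the hitting time, and the inductive construction of 53(c) can be imitated for a single path sampled on a finite grid of times. The following is only a sketch under explicit assumptions: time is a finite grid, one path is represented as a dictionary from times to values, and the value at the previous grid point stands in for the left limit $X_{t-}$. The names `debut` and `successive_jump_times` are hypothetical.

```python
import math


def debut(times, in_A):
    """Smallest sampled time t with in_A(t) true; +infinity if there is none."""
    hits = [t for t in times if in_A(t)]
    return min(hits) if hits else math.inf


def successive_jump_times(times, path, eps):
    """Times T_1 < T_2 < ... at which the path jumps by more than eps.

    T_{n+1} is computed as the debut of {t > T_n} intersected with
    {|X_t - X_{t-}| > eps}, with the previous grid value playing the role of X_{t-}.
    """
    times = sorted(times)
    left = {times[0]: path[times[0]]}  # no jump is recorded at the initial time
    for prev, cur in zip(times, times[1:]):
        left[cur] = path[prev]
    jumps, last = [], -math.inf
    while True:
        nxt = debut(times, lambda t: t > last and abs(path[t] - left[t]) > eps)
        if nxt == math.inf:
            return jumps
        jumps.append(nxt)
        last = nxt


times = range(6)
path = {0: 0.0, 1: 0.1, 2: 1.5, 3: 1.4, 4: 3.0, 5: 3.0}

# Hitting time of B = [1.4, +infinity), as in (a).
print(debut(times, lambda t: path[t] >= 1.4))
# First time the path differs from its initial value by more than 1, as in (b) with f the identity.
print(debut(times, lambda t: abs(path[0] - path[t]) > 1.0))
# Successive times of jumps larger than 1, as in (c).
print(successive_jump_times(times, path, 1.0))
```

On a finite grid the infimum is always attained, so none of the measurability questions treated by T52 arise; the sketch only shows how the three definitions fit together.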
Systems and chains of stopping times

All stopping times envisaged below are relative to the same right continuous family of σ-fields $(\mathcal{F}_t)$. We also suppose them finite, but this restriction can be removed if we consider processes with time set $R_+ \cup \{+\infty\}$.

D54 DEFINITION Let $I$ be a totally ordered set. A system of stopping times is a family of stopping times $(T_i)_{i\in I}$ such that $T_i \le T_j$ for every pair $(i,j)$ of elements of $I$ such that $i \le j$. Let $(X_t)$ be a stochastic process progressively measurable with respect to the family $(\mathcal{F}_t)$. The stochastic process $(X_{T_i})_{i\in I}$ is said to be "transformed from $(X_t)$ by the system of stopping times $(T_i)$."

55 Examples Let $T$ be a finite stopping time. Two particularly interesting systems of stopping times can be obtained from $T$ in the following manner:
(1) The set $I$ is equal to $R_+$, and the stopping time $T_i$ is equal to $T + i$. The transformed process is obtained by "translating the origin of times to the instant $T$."
(2) The set $I$ is equal to $R_+$, and the stopping time $T_i$ is equal to $T \wedge i$. The transformed process is obtained by "stopping the process $(X_t)$ at the instant $T$." This transformation is the origin of the name "stopping time."

Other very important examples of systems of stopping times, the "changes of time," will be studied in Chapter VII. We now establish an important property of transformed processes, communicated to us by K. L. Chung, as was the following lemma (which is of interest itself).

T56 THEOREM Let $(X_t)$ be a process with values in a measurable space $(E,\mathcal{E})$, progressively measurable with respect to the family $(\mathcal{F}_t)$, and let $T$ be a stopping time. The mapping
$$(s,\omega) \mapsto X_{s \wedge T(\omega)}(\omega), \qquad (56.1)$$
from $R_+ \times \Omega$ into $(E,\mathcal{E})$ is measurable when the first space is given the σ-field $\mathcal{B}(R_+) \times \mathcal{F}_T$.

Proof We begin with a remark. Let $r$ be an element of $R_+$; then
$$A \cap (R_+ \times \{\omega : T(\omega) \ge r\}) \in \mathcal{B}(R_+) \times \mathcal{F}_T \qquad (56.2)$$
for every set $A \in \mathcal{B}(R_+) \times \mathcal{F}_r$. Since the collection of sets $A$ having this property is a σ-field, it indeed suffices to verify that the sets of the form $I \times J$ ($I \in \mathcal{B}(R_+)$, $J \in \mathcal{F}_r$) have it, and this follows from the definition of the σ-field $\mathcal{F}_T$.

The theorem is established if we show that
$$\{(s,\omega) : X_{s \wedge T}(\omega) \in U\} \in \mathcal{B}(R_+) \times \mathcal{F}_T$$
for every set $U \in \mathcal{E}$. It suffices for this to show that each of the following sets belongs to $\mathcal{B}(R_+) \times \mathcal{F}_T$:
$$\{(s,\omega) : s \ge T(\omega),\ X_T(\omega) \in U\}$$
$$\{(s,\omega) : s < T(\omega),\ X_s(\omega) \in U\}.$$
This is obvious for the first set, since the three mappings $(s,\omega) \mapsto s$, $(s,\omega) \mapsto T(\omega)$, and $(s,\omega) \mapsto X_T(\omega)$ are measurable with respect to the σ-field $\mathcal{B}(R_+) \times \mathcal{F}_T$. Write the second set in the form
$$\bigcup_{r \text{ rational}} \{(s,\omega) : s < r,\ T(\omega) \ge r,\ X_s(\omega) \in U\}.$$
Now each of these sets is of the form (56.2), with $A = \{(s,\omega) : s < r,\ X_s(\omega) \in U\} \in \mathcal{B}(R_+) \times \mathcal{F}_r$, from the fact that the process $(X_t)$ is progressively measurable. The conclusion then follows from relation (56.2).

Consider next a system of stopping times $(T_t)_{t\in R_+}$ having the following property:
$$\text{The mappings } t \mapsto T_t(\omega) \text{ are increasing and right continuous.} \qquad (57.1)$$
Denote by $(\mathcal{G}_t)_{t\in R_+}$ the "family of transformed σ-fields" defined by $\mathcal{G}_t = \mathcal{F}_{T_t}$; this family is increasing and right continuous from T40 and T42. Set finally $Y_t = X_{T_t}$. We have the following theorem.

T57 THEOREM The process $(Y_t)$ is progressively measurable with respect to the family $(\mathcal{G}_t)$.
Proof The process $(T_t)$ is adapted to the family $(\mathcal{G}_t)$ from T37, and its paths are right continuous by hypothesis. It is thus progressively measurable with respect to the family $(\mathcal{G}_t)$ from T47. Let $t$ be a positive number; since the mapping $(s,\omega) \mapsto T_s(\omega)$ is measurable with respect to the σ-field $\mathcal{B}([0,t]) \times \mathcal{G}_t$, the mapping $(s,\omega) \mapsto (T_s(\omega),\omega)$ from $([0,t] \times \Omega, \mathcal{B}([0,t]) \times \mathcal{G}_t)$ into $(R_+ \times \Omega, \mathcal{B}(R_+) \times \mathcal{G}_t)$ is measurable. Since the mapping $(u,\omega) \mapsto X_{u \wedge T_t}(\omega)$ from $(R_+ \times \Omega, \mathcal{B}(R_+) \times \mathcal{G}_t)$ into $(E,\mathcal{E})$ is measurable from the preceding theorem, the composed mapping $(s,\omega) \mapsto X_{T_s \wedge T_t}(\omega) = Y_s(\omega)$ is measurable with respect to $\mathcal{B}([0,t]) \times \mathcal{G}_t$.

The following proposition is often useful in effecting a change in the time origin (No. 55, 1). Here $T$ denotes a stopping time, and we set $\mathcal{G}_t = \mathcal{F}_{T+t}$.

T58 THEOREM A positive random variable $S$ is a stopping time with respect to the family $(\mathcal{G}_t)$ if and only if $T + S$ is a stopping time with respect to the family $(\mathcal{F}_t)$.
Proof Suppose that $T + S$ is a stopping time with respect to the family $(\mathcal{F}_t)$. We then have from T41 that
$$\{S \le t\} = \{T + S \le T + t\} \in \mathcal{F}_{T+t} = \mathcal{G}_t.$$
Conversely, suppose that $S$ is a stopping time with respect to the family $(\mathcal{G}_t)$. We have
$$\{T + S < t\} = \bigcup_{a + b < t;\ a,\, b \text{ rational}} \{T < a\} \cap \{S < b\}.$$
It thus suffices to show that {T < a} n {S < b} = {T < a} n [{S < b} n {T + b < t}] belongs to §"t. But we have {T X s).
(1.1)
2 Remarks (a) The processes we call submartingales (respectively, supermartingales) would be called "semimartingales" (respectively, "lower semimartingales") in Doob's book. This terminology is now abandoned. (b) Let (Xt ) be a supermartingale. The process (-Xt ) is then a submartingale, and conversely. We thus state the theorems for only one of the two species of processes, in general, that of supermartingales. (c) Suppose that the random variables (Xt ) are non-negative. Relation (1.1) then makes sense, even if the X t are not integrable. One can then speak of a generalized martingale (respectively, supermartingale, submartingale).
(d) A process $(X_t)$ given with no reference to a family of σ-fields is called a martingale (supermartingale) if it is a martingale (supermartingale) with respect to the family $\mathcal{F}_t = \mathcal{T}(X_s, s \le t)$. It can then be said that a process equivalent to a martingale (supermartingale) is again a martingale (supermartingale).
(e) Definition 1 can be generalized in the following manner. Suppose we are given for each $t \in T$ a measurable space $(\Omega_t,\mathcal{F}_t)$, and for each pair $(s,t)$ of elements of $T$ such that $s \le t$ a measurable mapping $\pi_{st}$ of $\Omega_t$ into $\Omega_s$, such that $\pi_{rs} \circ \pi_{st} = \pi_{rt}$ for $r \le s \le t$. Measures $\mu_t$ (each of which is defined on the space of the same index) then constitute a martingale if for $s \le t$ we have $\mu_s = \pi_{st}(\mu_t)$. Supermartingales and submartingales are similarly defined. Definition 1 is then obtained by taking all the $\Omega_t$ equal to $\Omega$, all the mappings $\pi_{st}$ equal to the identity mapping of $\Omega$ onto itself,* and by setting $\mu_t = X_t \cdot P$. Unfortunately, one cannot translate into this language the most important results of martingale theory, since these concern the behavior of the paths of the process $(X_t)$.
Examples of martingales

3 We limit ourselves to two examples; the rest of the book will furnish many others.
(a) Let $(\Omega,\mathcal{F},P)$ be a probability space, and let $(\mathcal{F}_t)_{t\in T}$ be an increasing family of sub-σ-fields of $\mathcal{F}$. For each integrable random variable $Y$, set $Y_t = E[Y \mid \mathcal{F}_t]$. The process $(Y_t)$ is then a martingale [with respect to the family $(\mathcal{F}_t)$].
(b) Let $(\Omega,\mathcal{F},P)$ be a probability space. Denote by $T$ the collection of all finite sub-σ-fields of $\mathcal{F}$, ordered by inclusion, and by $Q$ a positive additive set function defined on $\mathcal{F}$. Each element $t$ of $T$ is generated (as a σ-field) by a partition of $\Omega$ into a finite number of measurable sets $A_1, A_2, \dots, A_n$. Denote by $X_t$ the following function (where an arbitrary value is assigned to ratios of the form 0/0):
$$X_t = \sum_{i=1}^{n} \frac{Q(A_i)}{P(A_i)}\, 1_{A_i}.$$
This function is evidently $t$-measurable. The process $(X_t)_{t\in T}$ is then a supermartingale. It is a martingale if $Q(A_i) = 0$ for every set $A_i$ such that $P(A_i) = 0$. This process will be used in Chapter VIII.

Additivity and convexity theorems

T4 THEOREM Let $(X_t)$ and $(Y_t)$ be two martingales (respectively, supermartingales) defined on $(\Omega,\mathcal{F},P)$, relative to the same family $(\mathcal{F}_t)$ of sub-σ-fields of $\mathcal{F}$. Let $a$ and $b$ be two constants (respectively, two non-negative constants). The process $(aX_t + bY_t)$ is then a martingale (respectively, a supermartingale). The process $(X_t \wedge Y_t)$ is a supermartingale.

Proof Immediate.
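Example 3(b) can be checked by exact arithmetic on a finite sample space. The sketch below makes assumptions that are not in the text: $\Omega$ is a finite set of integers, $P$ and $Q$ are given by point masses, and a finite sub-σ-field is represented by the partition that generates it. The names `density` and `cond_exp` are hypothetical, and the arbitrary value assigned to 0/0 is taken to be 0.

```python
from fractions import Fraction as F


def density(partition, P, Q):
    """X_t(w) = Q(A_i) / P(A_i) on the block A_i of the partition that contains w."""
    X = {}
    for block in partition:
        p = sum(P[w] for w in block)
        q = sum(Q[w] for w in block)
        for w in block:
            X[w] = q / p if p != 0 else F(0)  # arbitrary value where P(A_i) = 0
    return X


def cond_exp(X, partition, P):
    """E[X | sigma-field generated by the partition], as a function on Omega."""
    Y = {}
    for block in partition:
        p = sum(P[w] for w in block)
        avg = sum(X[w] * P[w] for w in block) / p if p != 0 else F(0)
        for w in block:
            Y[w] = avg
    return Y


# Martingale case: every block has strictly positive probability.
P = [F(1, 2), F(1, 4), F(1, 8), F(1, 8)]
Q = [F(1), F(2), F(3), F(0)]
coarse = [(0, 1), (2, 3)]
fine = [(0,), (1,), (2,), (3,)]
X_coarse = density(coarse, P, Q)
X_fine = density(fine, P, Q)
print(cond_exp(X_fine, coarse, P) == X_coarse)

# Strict supermartingale case: Q charges a point of P-measure zero.
P0 = [F(1), F(0)]
Q0 = [F(1), F(1)]
Y_coarse = density([(0, 1)], P0, Q0)
Y_fine = density([(0,), (1,)], P0, Q0)
projected = cond_exp(Y_fine, [(0, 1)], P0)
print(all(projected[w] <= Y_coarse[w] for w in (0, 1)))
```

In the second case the mass that $Q$ puts on the $P$-negligible point is lost when passing to the finer partition, which is why only the supermartingale inequality survives.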
T5 THEOREM Let $(X_t)$ be a supermartingale relative to a family of σ-fields $(\mathcal{F}_t)$. In order that $(X_t)$ be a martingale, it is necessary and sufficient that the function $t \mapsto E[X_t]$ be constant.

Proof Let $s$, $t$ be two elements of $T$ such that $s < t$. We have $E[X_t \mid \mathcal{F}_s] \le X_s$ a.s. The two sides are then a.s. equal if and only if they have the same expectation.

* More precisely, the identity mapping of $(\Omega,\mathcal{F}_t)$ onto $(\Omega,\mathcal{F}_s)$.
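The last step in the proof of T5 can be written out as a short computation. This is a sketch of the standard argument; the auxiliary variable $Z$ is introduced here only for illustration.

```latex
\[
Z = X_s - E[X_t \mid \mathcal{F}_s] \ge 0 \quad \text{a.s.}, \qquad
E[Z] = E[X_s] - E\bigl[E[X_t \mid \mathcal{F}_s]\bigr] = E[X_s] - E[X_t].
\]
```

A positive random variable vanishes a.s. if and only if its expectation is zero, so $E[X_t \mid \mathcal{F}_s] = X_s$ a.s. exactly when $E[X_s] = E[X_t]$.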
T6 THEOREM Let $(X_t)$ be a martingale (respectively, a supermartingale) relative to a family of σ-fields $(\mathcal{F}_t)$, and let $f$ be a concave (respectively, concave increasing) function defined on $R$ and such that the random variables $f \circ X_t$ are integrable. The process $(f \circ X_t)$ is then a supermartingale relative to the family $(\mathcal{F}_t)$.

Proof Let $s$ and $t$ be two elements of $T$ such that $s < t$. Set $Y_u = f \circ X_u$ for $u \in T$. We have in both cases
$$f(E[X_t \mid \mathcal{F}_s]) \le f(X_s) = Y_s \quad \text{a.s.}$$
Write then Jensen's inequality (II.47, property 4),
$$E[Y_t \mid \mathcal{F}_s] \le f(E[X_t \mid \mathcal{F}_s]) \quad \text{a.s.}$$
This establishes the theorem.

A corollary of this result is quite commonly used. Let $(X_t)$ be a martingale, and let $\lambda$ be a number $\ge 1$ such that the random variables $|X_t|^\lambda$ are integrable. The process $(|X_t|^\lambda)$ is then a submartingale.

The following two theorems generalize those which have been proved. We borrow them from Dubins (64), where they aid in establishing several interesting inequalities. They will not be used in this book.

T7 LEMMA (Generalized Jensen's inequality) Let $(\Omega,\mathcal{F},P)$ be a probability space, $X$ an integrable random variable on this space, and $\mathcal{G}$ a sub-σ-field of $\mathcal{F}$. We shall denote by $Y$ a version of $E[X \mid \mathcal{G}]$. Let $f$ be a measurable mapping of $\Omega \times R$ (with the natural product σ-field) into $R$, such that
(a) The mapping $\omega \mapsto f(\omega,t)$ is $\mathcal{G}$-measurable for each $t \in R$.
(b) The mapping $t \mapsto f(\omega,t)$ is convex for each $\omega \in \Omega$.
Suppose that the random variable $\omega \mapsto f(\omega,X(\omega))$ is integrable. We then have the inequality
$$E[f(\cdot,X(\cdot)) \mid \mathcal{G}] \ge f(\cdot,Y(\cdot)) \quad \text{a.s.} \qquad (7.1)$$

This theorem could be proved by first supposing $X$ was elementary, and then deducing the general case by using a passage to the limit (which is a little more delicate than usual). We will not enter into the details here.

T8 THEOREM For each integer $n \ge 1$, let $q_n$ be a measurable mapping of $R^n$ into $R$. Suppose that these functions have the following properties:
(1) For each $n \ge 1$, and each system of values $x_1, \dots, x_{n-1}$, the function $x_n \mapsto q_n(x_1, \dots, x_{n-1}, x_n)$ is concave increasing.
(2) $q_n(x_1, \dots, x_n) \ge q_{n+1}(x_1, \dots, x_{n-1}, x_n, x_n)$.
Let then $(X_n)_{n \ge 1}$ be a supermartingale relative to a family of σ-fields $(\mathcal{F}_n)_{n \ge 1}$. Suppose that the random variables
$$Y_n = q_n(X_1, \dots, X_n)$$
are integrable. The process $(Y_n)$ is then a supermartingale relative to the family $(\mathcal{F}_n)$.

Proof The inequality $E[Y_{n+1} \mid \mathcal{F}_n] \le Y_n$ a.s. is established by using the reasoning which led to Theorem 6, Lemma 7 replacing Jensen's inequality.
2. Fundamental Inequalities

The inequalities established in this section will be used later to treat the countable case and the continuous case. The reader will find other interesting inequalities in the article by Dubins (64).

Doob's optional sampling theorem

The vocabulary and the notation are those of Chapter IV, Nos. 35, 47, and 53.
T9 THEOREM Let $(X_n)_{n=1,2,\dots,k}$ be a supermartingale (respectively, a martingale) relative to a family of σ-fields $(\mathcal{F}_n)_{n=1,\dots,k}$. Let $(T_i)_{i=1,\dots,l}$ be a system of stopping times relative to these σ-fields. The process $(X_{T_i})_{i=1,\dots,l}$ is then a supermartingale (respectively, a martingale) relative to the σ-fields $(\mathcal{F}_{T_i})_{i=1,\dots,l}$.

Proof The inequality
$$E[|X_T|] = \sum_{i=1}^{k} \int_{\{T=i\}} |X_i| \, dP \le \sum_{i=1}^{k} E[|X_i|] < \infty$$
shows that $X_T$ is integrable for every stopping time $T$. We prove only the statement concerning supermartingales: The case where $(X_n)$ is a martingale then follows by considering the two supermartingales $(X_n)$ and $(-X_n)$. On the other hand, the definition of supermartingales requiring the consideration of only two instants at a time, it suffices to consider a system reduced to two stopping times $S$ and $T$ such that $S \le T$, and to establish the supermartingale inequality:
$$\int_A X_S \, dP \ge \int_A X_T \, dP \qquad (A \in \mathcal{F}_S). \tag{9.1}$$
Let us assume this inequality has been established in the case where the difference $T - S$ is at most equal to 1, and show that the general case can then be deduced. Set $R_n = T \wedge (S + n)$, $n = 1, \dots, k$. These random variables are stopping times (IV.T36), so we have $A \in \mathcal{F}_{R_n}$ for every $n$ (IV.T40) and consequently, from the particular case, $R_{n+1} - R_n$ being at most equal to 1 for every $n$, we have
$$\int_A X_S \, dP \ge \int_A X_{R_1} \, dP \ge \dots \ge \int_A X_{R_k} \, dP = \int_A X_T \, dP,$$
i.e., inequality (9.1). Suppose thus that $T - S$ is at most equal to 1. We have
$$\int_A (X_S - X_T) \, dP = \sum_{n=1}^{k} \int_{A \cap \{S=n\} \cap \{T>S\}} (X_S - X_T) \, dP = \sum_{n=1}^{k} \int_{A \cap \{S=n\} \cap \{T>n\}} (X_n - X_{n+1}) \, dP.$$
The event $A \cap \{S = n\}$ belongs to $\mathcal{F}_n$ from the definition of the σ-field $\mathcal{F}_S$; the event $\{T > n\}$, the complement of $\{T \le n\}$, belongs to $\mathcal{F}_n$ from the definition of stopping times. We are thus integrating $X_n - X_{n+1}$ in the second member over an element of $\mathcal{F}_n$, which gives a positive result from the supermartingale inequality.
T10 COROLLARY Let $(X_n)_{n=1,\dots,k}$ be a supermartingale, and let $T$ be a stopping time. We then have the inequality
$$E[X_1] \ge E[X_T] \ge E[X_k]. \tag{10.1}$$
[Apply the preceding theorem to the system of stopping times $(1, T, k)$.]
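As a sanity check on (10.1), the following sketch enumerates every path of a small supermartingale and computes the three expectations exactly. The walk (up-step probability 2/5), the horizon k = 4, and the stopping rule "first time the path reaches a given level, else k" are assumptions chosen purely for illustration; none of them comes from the text.

```python
from fractions import Fraction
from itertools import product

P_UP = Fraction(2, 5)  # assumed up-step probability (< 1/2, so the walk is a supermartingale)


def expectations(k, level):
    """Return (E[X_1], E[X_T], E[X_k]) for the walk X_1 = 0, X_{n+1} = X_n +/- 1.

    T is the first time n with X_n >= level, or k if the level is never reached.
    All 2**(k-1) paths are enumerated, so the results are exact.
    """
    e_first = e_stopped = e_last = Fraction(0)
    for steps in product((1, -1), repeat=k - 1):
        prob = Fraction(1)
        path = [0]  # path[i] holds X_{i+1}
        for s in steps:
            prob *= P_UP if s == 1 else 1 - P_UP
            path.append(path[-1] + s)
        # index of the stopped value; defaults to the last index (time k)
        t = next((i for i, x in enumerate(path) if x >= level), k - 1)
        e_first += prob * path[0]
        e_stopped += prob * path[t]
        e_last += prob * path[-1]
    return e_first, e_stopped, e_last


e1, eT, ek = expectations(k=4, level=1)
print(e1, eT, ek)
```

The three printed values should appear in decreasing order, as (10.1) requires; changing `level` changes the stopping time but not the ordering.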
T11 COROLLARY Let $(X_n)_{n=1,\dots,k}$ be a supermartingale, and let $T$ be a stopping time. We then have the inequality
$$E[|X_T|] \le E[X_1] + 2E[X_k^-] \le 3 \sup_n E[|X_n|]. \tag{11.1}$$

Proof We have $E[|X_T|] = E[X_T] + 2E[X_T^-]$. $E[X_T]$ is less than or equal to $E[X_1]$ from T10. On the other hand, the process $(X_n \wedge 0)$ is a supermartingale from T4, so that $(X_n^-)$ is a submartingale, and we have $E[X_T^-] \le E[X_k^-]$ (T10).
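Two fundamental inequalities

T12 THEOREM Let $(X_n)_{n=1,\dots,k}$ be a supermartingale, and $\lambda$ a non-negative constant. We then have the inequalities
$$\lambda P\{\sup_n X_n \ge \lambda\} \le E[X_1] - \int_{\{\sup_n X_n < \lambda\}} X_k \, dP \le E[X_1] + E[X_k^-], \tag{12.1}$$
$$\lambda P\{\inf_n X_n \le -\lambda\} \le -\int_{\{\inf_n X_n \le -\lambda\}} X_k \, dP \le E[X_k^-]. \tag{12.2}$$

Proof Let $T$ be the first index $n$ such that $X_n \ge \lambda$, or $k$ if there exists no such index. $T$ is a stopping time, and T10 gives
$$E[X_1] \ge E[X_T] \ge \lambda P\{\sup_n X_n \ge \lambda\} + \int_{\{\sup_n X_n < \lambda\}} X_k \, dP,$$
which implies (12.1). In the same way, let $T$ be the first index $n$ such that $X_n \le -\lambda$, or $k$ if there exists no such index. We then have
$$E[X_k] \le E[X_T] \le -\lambda P\{\inf_n X_n \le -\lambda\} + \int_{\{\inf_n X_n > -\lambda\}} X_k \, dP,$$
which implies (12.2).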
13 Example of an application Let $(X_n)_{n=1,\dots,k}$ be a martingale. Suppose that the random variables $X_n$ are square integrable: The process $(X_n^2)$ is then a submartingale (T6) and inequality (12.2) gives
$$\lambda^2 P\{\sup_n |X_n| \ge \lambda\} \le E[X_k^2]. \tag{13.1}$$
Suppose in particular that the $X_n$ are of the form
$$X_n = Y_1 + Y_2 + \dots + Y_n$$
where the random variables $Y_n$ are independent, square integrable, and of mean zero. Inequality (13.1) is then well known under the name of Kolmogorov's inequality. The above proof is borrowed from Doob's book.

We shall now use the notions defined in IV.21. Given random variables $X_1, \dots, X_k$, we denote by $U(\omega;[a,b])$ [respectively, $D(\omega;[a,b])$] the number of upcrossings (respectively, downcrossings) by the function $n \mapsto X_n(\omega)$ $(n = 1, \dots, k)$ of the interval $[a,b]$. We then have the following theorem, which we state for a submartingale. It is due to Doob in the case of martingales, to Snell in that of submartingales. We follow here Hunt's method of proof, which appeared in Doob's paper (62).
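The counts U and D just introduced can be made concrete with a short sketch. It assumes the usual convention that an upcrossing of [a,b] is a passage of the path from a value at most a to a later value at least b (and symmetrically for downcrossings); the function names and the sample path are hypothetical.

```python
def count_upcrossings(path, a, b):
    """Number of upcrossings of [a, b] by a finite path (a < b).

    An upcrossing is completed each time the path, having been <= a,
    later reaches a value >= b.
    """
    count = 0
    below = False  # True once the path has been <= a since the last upcrossing
    for x in path:
        if not below:
            if x <= a:
                below = True
        elif x >= b:
            count += 1
            below = False
    return count


def count_downcrossings(path, a, b):
    """Downcrossings of [a, b] are upcrossings of [-b, -a] by the negated path."""
    return count_upcrossings([-x for x in path], -b, -a)


sample = [0, 2, 0, 3, 1, -1, 2]
print(count_upcrossings(sample, 0, 2), count_downcrossings(sample, 0, 2))
```

The reduction of downcrossings to upcrossings of the negated path is the same device the text uses later to pass from submartingales to supermartingales.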
T14 THEOREM (Doob's inequality) Let $(X_n)_{n=1,\dots,k}$ be a submartingale relative to a family of σ-fields $(\mathcal{F}_n)_{n=1,\dots,k}$, and let $a$, $b$ be two real numbers such that $a < b$. We then have the inequalities
$$E[U(\cdot;[a,b]) \mid \mathcal{F}_1] \le \frac{E[(X_k - a)^+ \mid \mathcal{F}_1] - (X_1 - a)^+}{b - a} \tag{14.1}$$
and
$$E[D(\cdot;[a,b]) \mid \mathcal{F}_1] \le \frac{E[(X_k - b)^+ \mid \mathcal{F}_1]}{b - a}. \tag{14.2}$$

Proof In fact, we are going to establish these inequalities for the numbers of upcrossings or downcrossings of the open interval $]a,b[$, which we abbreviate by $U'$ and $D'$: They evidently majorize the corresponding numbers for $[a,b]$. We may replace $(X_n)$ and $]a,b[$, without affecting $U'$ and $D'$, by the process $(Y_n) = ((X_n - a)^+)$ (which is a submartingale from T4), and the interval $]0, b - a[$. Define inductively the stopping times $T_1, \dots, T_{k+1}$ as follows: $T_1(\omega) = 1$; $T_2(\omega)$ is the first index $i$ for which $Y_i(\omega) = 0$, or $k$ if there exists no such index; $T_3(\omega)$ is the first index $i > T_2(\omega)$ for which $Y_i(\omega) \ge b - a$, or $k$ if there exists none. We alternate thus up to $T_{k+1} = k$. We can write
$$Y_k(\omega) - Y_1(\omega) = [Y_{T_2}(\omega) - Y_{T_1}(\omega)] + [Y_{T_3}(\omega) - Y_{T_2}(\omega)] + \dots + [Y_{T_{k+1}}(\omega) - Y_{T_k}(\omega)].$$
Consider in this sum the even-numbered terms starting from the left: $(Y_{T_3} - Y_{T_2})$, $(Y_{T_5} - Y_{T_4})$, .... We encounter first those which correspond to upcrossings by the path $n \mapsto Y_n(\omega)$ of the interval $]0, b - a[$. The number of these is equal to $U'(\omega)$, and their contribution to the sum is at least equal to $(b - a)U'(\omega)$. We encounter next either all zero terms, or an "incomplete" upcrossing followed by zero terms, so that the contribution of this part to the sum is non-negative. We hence have, by taking conditional expectations with respect to $\mathcal{F}_1$,
$$E[Y_k \mid \mathcal{F}_1] - Y_1 \ge (b - a)E[U' \mid \mathcal{F}_1] + \sum_{n \le k,\ n \text{ odd}} E[Y_{T_{n+1}} - Y_{T_n} \mid \mathcal{F}_1].$$
Inequality (14.1) is then obtained by noting that each of the terms in the second member is non-negative from T10. Inequality (14.2) is proved in an analogous manner, but we denote this time by $T_2(\omega)$ the first index for which the path attains the value $b - a$, by $T_3(\omega)$ the first index for which it returns to the value zero, etc. Theorem 10 then gives us the inequality
$$E[(Y_{T_3} - Y_{T_2}) + (Y_{T_5} - Y_{T_4}) + \dots \mid \mathcal{F}_1] \ge 0.$$
Now this sum is composed of terms that correspond to downcrossings of $]0, b - a[$ by the path $n \mapsto Y_n(\omega)$, of which the total contribution is less than $-(b - a)E[D' \mid \mathcal{F}_1]$, followed by a single nonzero term at most equal to $(Y_k - (b - a))^+ = (X_k - b)^+$. We thus have
$$E[(X_k - b)^+ \mid \mathcal{F}_1] \ge (b - a)E[D' \mid \mathcal{F}_1],$$
from which (14.2) follows.

15 Remarks (a) The presence of conditional expectations in formulas (14.1) and (14.2) is a refinement without great utility: the truly useful formulas are those which are obtained by integrating these. It would, in fact, have been possible, in the same manner, to introduce conditional expectations into inequalities (12.1) and (12.2).
(b) Let $(X_n)$ be a supermartingale. By applying the inequalities 14 to the process $(-X_n)$ and to the interval $(-b,-a)$, the following inequalities are obtained, given only in their integral form:
(15.1) and (15.2) This last inequality is probably a little easier to remember than the others.

The case of positive supermartingales

16 The case of positive supermartingales is particularly important in potential theory; thus let us give two inequalities, due to Dubins (64), which improve inequalities (14.1) and (14.2). The notation is the same as in the preceding statements, but the process $(X_n)_{n=1,\dots,k}$ is a positive supermartingale, and the numbers $a$, $b$ are non-negative. Here are Dubins' inequalities ($p$ is an integer $\ge 1$):
$$P\{U(\cdot;[a,b]) \ge p\} \le E\left[\frac{X_1 \wedge a}{b}\right] \left(\frac{a}{b}\right)^{p-1} \tag{16.1}$$
and
$$P\{D(\cdot;[a,b]) \ge p\} \le E\left[\frac{X_1 \wedge b}{b}\right] \left(\frac{a}{b}\right)^{p-1}. \tag{16.2}$$
Let us sketch the proof of inequality (16.1), for example. As in the proof of Theorem 14, set $T_1 = 1$, call $T_2$ the instant when the path first has a value $\le a$, etc. The event $\{U \ge p\}$ is identical to the event $\{X_{T_{2p+1}} \ge b\}$. We write the following inequality which follows from Theorem 10:
Jr{T2p 1. We obtain, for p
> 1,
$$P\{U \ge p\} \le \frac{a}{b} P\{T_{2p-1} < k\} \le \frac{a}{b} P\{U \ge p - 1\}.$$
We thus have
$$P\{U \ge p\} \le \left(\frac{a}{b}\right)^{p-1} P\{U \ge 1\}.$$
If $p$ is equal to 1, we can dominate the second integral of formula (16.3) by $E[X_1 \wedge a]$, and inequality (16.1) then follows immediately. Dubins has shown that these inequalities cannot be improved.
3. The Countable Case. Convergence Theorems

This section, in which the results are almost all borrowed from Doob's book, does not pretend to exhaust the whole subject. The reader will find other interesting convergence theorems in Doob (56), (62) and Chow (45). We continue to denote by $(\Omega,\mathcal{F},P)$ the base space of the process.
T17 THEOREM (a) Let $(X_n)_{n \in \mathbf{N}}$ be a supermartingale relative to an increasing family $(\mathcal{F}_n)_{n \in \mathbf{N}}$ of sub-σ-fields of $\mathcal{F}$. Suppose that
$$\sup_n E[X_n^-] < \infty. \tag{17.1}$$
The random variables $X_n$ then converge a.s. to an integrable random variable $X_\infty$.
(b) This condition is satisfied in particular when the $X_n$ are positive. We then have $E[X_\infty] \le \lim_n E[X_n]$, with equality if and only if the random variables $X_n$ are uniformly integrable. The process $(X_n)_{n \in \mathbf{N} \cup \{\infty\}}$ is a supermartingale.
(c) Suppose that the $X_n$ are uniformly integrable. Condition (17.1) is then satisfied, the process $(X_n)_{n \in \mathbf{N} \cup \{\infty\}}$ is a supermartingale, and the convergence of $X_n$ to $X_\infty$ takes place in the $L^1$ norm.
(d) Suppose that the $X_n$ are uniformly integrable, and that the process $(X_n)_{n \in \mathbf{N}}$ is a martingale. The process $(X_n)_{n \in \mathbf{N} \cup \{\infty\}}$ is then a martingale.

Proof (a) Consider an $\omega \in \Omega$ such that
$$\limsup_{n \to \infty} X_n(\omega) > \liminf_{n \to \infty} X_n(\omega).$$
There can then be found, between these two limits, two rational numbers $a$, $b$ such that $a < b$. We then have $U(\omega;[a,b]) = \infty$, denoting by this the number of upcrossings of the interval $[a,b]$ by the path $n \mapsto X_n(\omega)$. Thus $X_n$ converges a.s. if and only if $U(\omega;[a,b]) < \infty$ a.s. for every pair of rational numbers $a$, $b$ such that $a < b$. To prove this we use inequality (15.1):
$$E[U(\cdot;[a,b])] \le \sup_n \frac{E[(X_n - b)^-]}{b - a}.$$
The second member is finite, by virtue of the hypothesis (17.1) and the inequality
$$(X_n - b)^- \le X_n^- + b^+.$$
We have on the other hand (No. 11), for every $k \in \mathbf{N}$, that
$$E[|X_k|] \le E[X_0] + 2 \sup_n E[X_n^-].$$
Thus $X_\infty$ is integrable, from Fatou's lemma.
(b) Suppose that the $X_n$ are non-negative. Property (17.1) is then clear, and the second assertion of the statement is a simple repetition of Fatou's lemma and of II.T21. To show that the process $(X_n)_{n \in \mathbf{N} \cup \{\infty\}}$ is a supermartingale [relative to the family of σ-fields $(\mathcal{F}_n)_{n \in \mathbf{N} \cup \{\infty\}}$; see IV.30 for the definition of $\mathcal{F}_\infty$], consider two integers $m$, $n$ such that $n < m$, and an element $A$ of $\mathcal{F}_n$; the inequality
$$\int_A X_n \, dP \ge \int_A X_m \, dP \tag{17.2}$$
passes to the limit when $m \to \infty$ from Fatou's lemma. Thus we do have $X_n \ge E[X_\infty \mid \mathcal{F}_n]$ a.s.
(c) Suppose that the $X_n$ are uniformly integrable. The inequality $\sup_n E[|X_n|] < \infty$ (II.T19) then implies (17.1). The convergence of $X_n$ to $X_\infty$ takes place in the sense of the $L^1$ norm from II.T21, and this justifies passing to the limit under the integral sign in formula (17.2) above.
(d) To treat the case where $(X_n)$ is a uniformly integrable martingale, it suffices to apply (c) to the two supermartingales $(X_n)$ and $(-X_n)$.

The following theorem (due in part to Paul Lévy) develops statement (d).
T18 THEOREM Let $(X_n)$ be a stochastic process adapted to the family of σ-fields $(\mathcal{F}_n)$. In order that $(X_n)$ be a uniformly integrable martingale [with respect to the family $(\mathcal{F}_n)$] it is necessary and sufficient that there exist an integrable random variable $Y$ such that $X_n = E[Y \mid \mathcal{F}_n]$ a.s. for every $n \in \mathbf{N}$. There then exists a random variable $Y_0$, essentially unique, which has this property and which is $\mathcal{F}_\infty$-measurable. We have a.s. $Y_0 = \lim_{n \to \infty} X_n$, and the convergence also takes place in the sense of the $L^1$ norm.

Proof Suppose that the $X_n$ are uniformly integrable, and use T17(d). We have $X_n = E[X_\infty \mid \mathcal{F}_n]$ for every $n$, and $X_\infty$ can be chosen $\mathcal{F}_\infty$-measurable. We have $X_\infty = \lim_n X_n$ in the sense of convergence a.s. and in the sense of the $L^1$ norm. Let $Y$ be an $\mathcal{F}_\infty$-measurable random variable such that $X_n = E[Y \mid \mathcal{F}_n]$ for every $n$. Denote by $\mathcal{M}$ the collection of events $A \in \mathcal{F}_\infty$ such that
$$\int_A Y \, dP = \int_A X_\infty \, dP.$$
Since $\mathcal{M}$ is closed under passage to a monotone limit, and contains the union $\mathcal{C}$ of the σ-fields $\mathcal{F}_n$, we have $\mathcal{M} = \mathcal{F}_\infty$ (I.T19). We thus have $Y = X_\infty$ a.s., from II.T9.

Conversely, we show that every martingale of the form $X_n = E[Y \mid \mathcal{F}_n]$ is uniformly integrable. Let $\lambda$ be a positive number. We have $|X_n| \le E[|Y| \mid \mathcal{F}_n]$ (by Jensen's inequality) and consequently
$$\int_{\{|X_n| > \lambda\}} |X_n| \, dP \le \int_{\{|X_n| > \lambda\}} |Y| \, dP.$$
It remains to show that the second member tends to 0, uniformly in $n$, when $\lambda \to \infty$. Now
$$P\{|X_n| > \lambda\} \le E[|X_n|]/\lambda \le E[|Y|]/\lambda.$$
It then suffices to apply property II.19(b) to the uniformly integrable collection consisting of the single random variable $Y$.

The preceding reasoning does not use the order structure of the time set. It yields, in fact, the following result, which merits being made explicit.
T19 THEOREM Let $Y$ be an integrable random variable. The collection of random variables of the form $E[Y \mid \mathcal{G}]$, where $\mathcal{G}$ ranges over the collection of sub-σ-fields of $\mathcal{F}$, is uniformly integrable.

Here is another proof of this theorem. We can restrict ourselves to the case where $Y$ is positive. Choose a positive, increasing, convex function $g$ defined on $\mathbf{R}_+$, such that $\lim_{t \to +\infty} g(t)/t = +\infty$, and $E[g \circ Y] < +\infty$ (such a function exists from II.T22). From Jensen's inequality, we have $g \circ E[Y \mid \mathcal{G}] \le E[g \circ Y \mid \mathcal{G}]$, and consequently $\sup_{\mathcal{G}} E[g \circ E[Y \mid \mathcal{G}]] < +\infty$.
ε. This would permit the extraction from $T$ of an increasing sequence $(t_n)_{n \in \mathbf{N}}$ such that the $X_{t_n}$ do not constitute a Cauchy sequence. This would contradict Theorem 18, since the process $(X_{t_n})_{n \in \mathbf{N}}$ is a uniformly integrable martingale. Denote by $\mathcal{F}_\infty$ the σ-field generated by the union of the $\mathcal{F}_t$, $t \in T$. It can evidently be supposed that $X_\infty$ is $\mathcal{F}_\infty$-measurable. The relation $X_t = E[X_\infty \mid \mathcal{F}_t]$ for every $t \in T$ is very easily verified, and it can be shown as in No. 18 that $X_\infty$ is the only $\mathcal{F}_\infty$-measurable random variable which has this property (up to an a.s. equivalence). Conversely, let $Y$ be an integrable random variable. It follows from No. 19 that the random variables $E[Y \mid \mathcal{F}_t]$ are uniformly integrable.

The simplest convergence theorem holds in the case of an index set filtering to the left. We limit ourselves to considering the negative integers.

T21 THEOREM Let $(X_n)_{n \in -\mathbf{N}}$ be a supermartingale relative to a family of σ-fields $(\mathcal{F}_n)_{n \in -\mathbf{N}}$. Denote by $\mathcal{F}_{-\infty}$ the σ-field $\bigcap_{n \in -\mathbf{N}} \mathcal{F}_n$. Suppose that
$$\sup_n E[X_n] < \infty. \tag{21.1}$$
We then have the following properties:
(a) The random variables $X_n$ are uniformly integrable.
(b) The random variables $X_n$ converge a.s. to an integrable random variable $X_{-\infty}$ when $n \to -\infty$; the convergence also takes place in the sense of the $L^1$ norm.
(c) The process $(X_n)_{n \in \{-\infty\} \cup (-\mathbf{N})}$ is a supermartingale with respect to the σ-fields $(\mathcal{F}_n)_{n \in \{-\infty\} \cup (-\mathbf{N})}$.
(d) Suppose that the process $(X_n)_{n \in -\mathbf{N}}$ is a martingale; condition (21.1) is then satisfied, and the process $(X_n)_{n \in \{-\infty\} \cup (-\mathbf{N})}$ still is a martingale.

* "Sur un théorème de Jensen," Fund. Math., 37 (1950), 242-248.
Proof (a) We show first that the $X_n$ are uniformly integrable. Fix an $\varepsilon > 0$, and choose a negative integer $k$ such that
$$\lim_{i \to -\infty} E[X_i] - E[X_k] < \varepsilon.$$
We then have $0 \le E[X_n] - E[X_k] < \varepsilon$ for all $n \le k$. Let $\lambda$ be a non-negative constant. We show that the integral
$$\int_{\{|X_n| > \lambda\}} |X_n| \, dP \tag{21.2}$$
is less than $\varepsilon$ for every $n$ when $\lambda$ is large enough. It suffices to prove it for values of $n$ less than $k$. This integral is equal to
-f{Xn A}, since it is dominated by $(1/\lambda)E[|X_n|]$, tends to zero uniformly in $n$ when $\lambda \to \infty$. The same property then holds for the integral (21.5) from II.T19(b), and we have completed the proof of uniform integrability.

We do not detail the rest of the proof: The a.s. convergence of the $X_n$ follows as in No. 17 from Doob's inequality and from the relation $\sup_n E[|X_n|] < \infty$, which we have just established. The convergence in $L^1$ norm follows from the a.s. convergence and uniform integrability (II.T21). Assertions (c) and (d) are immediate consequences of the uniform integrability, which justifies passing to the limit under the integral sign.

We leave to the reader the task of generalizing this theorem to ordered sets filtering to the left, in the manner of No. 20.

A theorem of Doob's

We have avoided mentioning convergence in $L^p$ $(1 < p < \infty)$ in the statements of Theorems 17 and 21, in order to avoid overburdening them. The study of convergence in $L^p$ rests on the following theorem, due to Doob, which we have occasion to use later. We merely reproduce Doob's proof here.
T22 THEOREM Let $p$ and $q$ be two conjugate exponents,* distinct from 1 and from $\infty$. Let $(X_n)_{n \in \mathbf{N}}$ be a non-negative submartingale such that
$$\sup_n E[X_n^p] < \infty. \tag{22.1}$$
The random variable $\sup_n X_n$ then belongs to $L^p$, and we have
0,
AP{Y > A} Set P{ Y
> A} =
< J{Y~)J r X dP.
F(A). We have E[YV]
=_J:ooo Av dF(A) =J:oo F(A) d(AV) 0
lim h~oo
[AVF(A)]~
$$X_n \ge E[Y \mid \mathcal{F}_n] \tag{27.1}$$
[i.e., we require that the supermartingale $(X_n)_{n \in \mathbf{N}}$ can be extended to a supermartingale $(X_n)_{n \in \mathbf{N} \cup \{\infty\}}$ (put $X_\infty = Y$)]. The stopping times we consider will all be allowed to take the value $+\infty$. For every such stopping time $T$, we set
$$X_T(\omega) = Y(\omega) \quad \text{on the set } \{T = +\infty\}. \tag{27.2}$$
We call $Y$ the "random variable at infinity." The case of an arbitrary system of stopping times reduces to that of a system of two stopping times $S$ and $T$ such that $S \le T$. We then have the following statement.
T28 THEOREM Suppose that the supermartingale $(X_n)_{n \in \mathbf{N}}$ satisfies condition (27.1). The random variables $X_S$ and $X_T$ are then integrable, and we have the supermartingale inequality
$$X_S \ge E[X_T \mid \mathcal{F}_S] \quad \text{a.s.} \tag{28.1}$$

Proof Suppose that we have been able to establish the theorem in the following two special cases:
(a) The supermartingale $(X_n)$ is non-negative, and the random variable at infinity is taken to be 0.
(b) $(X_n)$ is a martingale of the form $E[Y \mid \mathcal{F}_n]$, and the random variable at infinity is equal to $Y$.
It will then suffice to write the decomposition:
$$X_n = (X_n - E[Y \mid \mathcal{F}_n]) + E[Y \mid \mathcal{F}_n],$$
in order to deduce formula (28.1) in its full generality. We thus treat cases (a) and (b) separately.
Case (a) We denote by $S_k$, $T_k$ the bounded stopping times $S \wedge k$, $T \wedge k$ $(k \in \mathbf{N})$. We have $E[X_{T_k}] \le E[X_0]$ for every $k \in \mathbf{N}$ from T10, and $X_T \le \liminf_{k \to \infty} X_{T_k}$. Fatou's lemma thus implies the inequality $E[X_T] \le E[X_0]$, so that $X_T$ (and in the same way $X_S$) is integrable. Let $A$ be an element of $\mathcal{F}_S$. The set $A \cap \{S \le k\}$ then belongs to $\mathcal{F}_{S_k}$ (IV.T39), and we have, from T9,
$$\int_{A \cap \{S \le k\}} X_{S_k} \, dP \ge \int_{A \cap \{S \le k\}} X_{T_k} \, dP.$$
The second integral is diminished by replacing $\{S \le k\}$ by $\{T \le k\}$. It thus follows that
$$\int_{A \cap \{S \le k\}} X_S \, dP \ge \int_{A \cap \{T \le k\}} X_T \, dP.$$
Letting $k$ tend to infinity, it follows that
1
Af"'I{S1
Af"'I{T 0, and choose an integer k large enough so that E[Zkl < e. It then follows, for every stopping time T and every number A > 0, that
$$\int_{\{Z_T > \lambda\}} Z_T \, dP = \sum_{i=1}^{k} \int_{\{T=i\} \cap \{Z_i > \lambda\}} Z_i \, dP + \int_{\{T>k\} \cap \{Z_T > \lambda\}} Z_T \, dP.$$
The second integral is dominated by $\int_{\{T>k\}} Z_T \, dP$, which is, in turn, dominated (the event $\{T > k\}$ belonging to $\mathcal{F}_k$) by the integral
$$\int_{\{T>k\}} Z_k \, dP \le E[Z_k] < \varepsilon.$$
The sum in the second member is, on the other hand, dominated by
$$\sum_{i=1}^{k} \int_{\{Z_i > \lambda\}} Z_i \, dP,$$
which is independent of $T$ and tends to 0 when $\lambda \to \infty$. The uniform integrability is thus established.
CHAPTER VI

Continuous Parameter Martingales

This chapter consists principally of rather easy extensions of the results in the preceding chapter. We limit ourselves to the case in which the time set is the half-line $\mathbf{R}_+$, and we consider almost exclusively supermartingales whose paths are a.s. right continuous; we call them right-continuous supermartingales for abbreviation. The reader can find several complementary results in Doob's book. In this chapter all supermartingales and all stopping times will, unless explicitly mentioned to the contrary, be defined on a single probability space $(\Omega,\mathcal{F},P)$, and relative to a single increasing family $(\mathcal{F}_t)_{t \in \mathbf{R}_+}$ of sub- n (the ball
Hm nP{R n n-+oo
(21.1)
Let D n be the set of points x such that of center 0 and radius l/n). The probability P{R n < oo} is equal to the probability of hitting D n in a Brownian motion starting at ~, which is 1
if
1
I.f
nr(~)
r () ~
>-n1
Expression (21.1) thus maintains the constant value l/r(rx) for large enough n, and the supermartingale (Xt ) cannot belong to the class (D). 22 Remark Let us return to the notation of T20: The supermartingale obtained by stopping (Xt ) at time T n = Rn 1\ n then belongs to the class (D), since it is dominated by the integrable random variable X Tn V n.
CHAPTER VII

Generation of Supermartingales

The conventions and hypotheses of the preceding chapter will be used again in this one (see the introduction to Chapter VI, and No. VI.8). The stopping times we consider will be allowed to take the value $+\infty$, unless explicitly mentioned to the contrary. The use without further explanation of such a notation as $X_T$, where $(X_t)$ denotes a process and $T$ a stopping time, already implicitly supposes that the limit $X_\infty = \lim_{t \to \infty} X_t$ exists a.s., and that $X_T = X_\infty$ on the set $\{T = \infty\}$. Recall that a stochastic process $(X_t)$ can be considered as a function of the two variables $t$, $\omega$; this identification allows the use of such notations as $(\sum_n X_t^n)$ (sum of a series of real-valued processes), etc.
1. The Discrete Case

1 Let $(\mathcal{F}_n)_{n \in \mathbf{N}}$ be an increasing family of sub-σ-fields of $\mathcal{F}$, and let $(X_n)_{n \in \mathbf{N}}$ be a supermartingale relative to the family $(\mathcal{F}_n)$. Define the random variables $Y_n$, $A_n$, by induction, in the following manner:
$$\begin{aligned}
Y_0 &= X_0, & A_0 &= 0, \\
Y_1 &= Y_0 + (X_1 - E[X_1 \mid \mathcal{F}_0]), & A_1 &= X_0 - E[X_1 \mid \mathcal{F}_0], \\
&\;\;\vdots & &\;\;\vdots \\
Y_n &= Y_{n-1} + (X_n - E[X_n \mid \mathcal{F}_{n-1}]), & A_n &= A_{n-1} + (X_{n-1} - E[X_n \mid \mathcal{F}_{n-1}]).
\end{aligned}$$
The following properties are easily verified:
(a) $X_n = Y_n - A_n$ for every $n$.
(b) The process $(Y_n)$ is a martingale.
(c) $A_n$ is obtained from $A_{n-1}$ by adding a positive quantity; i.e., the paths of the process $(A_n)$ are increasing functions of $n$.
(d) $A_0 = 0$; $A_n$ is $\mathcal{F}_{n-1}$-measurable for every $n$, and integrable.
We call any process $(R_n)$, adapted to the family $(\mathcal{F}_n)$ and having the following properties, an increasing process:
(α) $R_n$ is integrable for every $n$; $R_0 = 0$.
(β) The paths of the process $(R_n)$ are increasing functions of $n$.
The preceding construction shows that every discrete supermartingale $(X_n)$ is equal to the difference of a martingale and an increasing process. This remark has been used by Doob, who has posed the problem of the existence of such a decomposition in the continuous case. We solve this problem in this chapter, and see that the decomposition (which we call the "Doob decomposition") is then possible only for certain supermartingales. We also study the possibility of decomposing a supermartingale by means of an increasing process with continuous paths.

2 Consider now the uniqueness of such decompositions. Starting from an increasing process $(B_n)$ and a martingale $(Z_n)$, form the supermartingale $(X_n) = (Z_n) - (B_n)$, and construct the processes $(Y_n)$ and $(A_n)$ as above. We then have $B_n = A_n$ for every $n$ if and only if $B_n$ is $\mathcal{F}_{n-1}$-measurable for every $n$. There thus exists only one decomposition of $(X_n)$ by means of an increasing process which satisfies property (d). We have an analogous uniqueness theorem in the continuous case, but the condition defining the "natural" increasing process that enters into the decomposition will be much more complicated.

We hope that these few remarks will be of help to the reader in understanding this chapter, which begins in a rather abrupt manner with a certain number of "technical lemmas" on increasing processes (which have some independent interest). The fundamental results are contained in Sections 3 and 4, which the reader can peruse first, if he so desires.
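The inductive construction of No. 1 can be sketched on a finite probability tree, where every conditional expectation is a finite weighted sum. The example below is an illustration under stated assumptions, not part of the text: the process is $X_n = -S_n^2$ for a symmetric ±1 random walk $S_n$ (a concave function of a martingale, hence a supermartingale), and the names `doob_decomposition` and `x_neg_square` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def doob_decomposition(x_of_path, steps, probs, horizon):
    """Compute Y_n and A_n of No. 1 on every path prefix of a finite tree.

    x_of_path(prefix) returns X_n for a prefix of n steps; steps/probs describe
    the i.i.d. increments. Returns two dicts keyed by the prefix tuple.
    """
    def cond_exp_next(prefix):
        # E[X_n | F_{n-1}] evaluated on the given prefix of length n-1
        return sum(p * x_of_path(prefix + (s,)) for s, p in zip(steps, probs))

    Y = {(): x_of_path(())}
    A = {(): Fraction(0)}
    for n in range(1, horizon + 1):
        for prefix in product(steps, repeat=n - 1):
            predicted = cond_exp_next(prefix)
            for s in steps:
                path = prefix + (s,)
                Y[path] = Y[prefix] + (x_of_path(path) - predicted)
                A[path] = A[prefix] + (x_of_path(prefix) - predicted)
    return Y, A


def x_neg_square(path):
    """Assumed example supermartingale: X_n = -(S_n)^2."""
    return -Fraction(sum(path)) ** 2


HALF = Fraction(1, 2)
Y, A = doob_decomposition(x_neg_square, (1, -1), (HALF, HALF), horizon=3)
print(A[(1, -1, 1)], Y[(1, -1, 1)])
```

In this example the increasing process turns out to be deterministic ($A_n = n$), which makes properties (a) through (d) easy to confirm by inspection of the two dictionaries.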
2. Increasing Processes

D3 DEFINITION Let $(A_t)_{t \in \mathbf{R}_+}$ be a real-valued stochastic process, adapted to the family $(\mathcal{F}_t)$. We say that $(A_t)$ is an increasing process if
(1) The paths $t \mapsto A_t(\omega)$ are a.s. zero for $t = 0$, increasing, and right continuous.
(2) The random variables $A_t$ are integrable.
We say that the increasing process $(A_t)$ is integrable if
$$\sup_t E[A_t] < \infty. \tag{3.1}$$
4 Remarks A process adapted to the family $(\mathcal{F}_t)$ that satisfies condition (1), but not necessarily condition (2), is said to be increasing in the broad sense. Condition (1) implies the existence of the random variable $\lim_{t \to \infty} A_t = A_\infty$. An increasing process is integrable if and only if $E[A_\infty] < \infty$.

D5 DEFINITION Let $(X_t)$ be a right-continuous supermartingale. We say that $(X_t)$ admits a Doob decomposition if there exists a right-continuous martingale $(M_t)$ and an increasing process $(A_t)$, such that
$$X_t + A_t = M_t \quad \text{for every } t \in \mathbf{R}_+. \tag{5.1}$$
Suppose in particular that $(X_t)$ is a uniformly integrable potential; since the expectations $E[M_t]$ and $E[X_t]$ are bounded, condition (3.1) is satisfied, and $A_\infty$ is integrable. The random variables $A_t$, being dominated by $A_\infty$, are then uniformly integrable; so are the random variables $(X_t)$ by hypothesis, and hence so is the martingale $(M_t)$. We thus have $M_t = E[M_\infty \mid \mathcal{F}_t]$ a.s., from VI.T6(d). Now $X_\infty = 0$, and formula (5.1) then takes the form
$$X_t = E[A_\infty \mid \mathcal{F}_t] - A_t \quad \text{a.s.}$$
This leads us to the following definition.
D6 DEFINITION Let $(A_t)$ be an integrable increasing process, and let $(M_t)$ be a right-continuous modification of the martingale $(E[A_\infty \mid \mathcal{F}_t])$; the process $(M_t - A_t)$ is called the potential generated by $(A_t)$.

The modification $(M_t)$ considered exists from VI.T4. The expression "the potential generated by $(A_t)$" conforms to the convention of No. VI.8 in not distinguishing between right-continuous modifications of a single process. We now need only to justify the definition as a "potential."

T7 THEOREM Let $(X_t)$ be the potential generated by the integrable increasing process $(A_t)$.
(1) $(X_t)$ is a potential of the class (D).
(2) For every stopping time $T$ we have
$$X_T = E[A_\infty \mid \mathcal{F}_T] - A_T \quad \text{a.s.} \tag{7.1}$$

Proof The process $(M_t)$ is a right-continuous martingale, and the process $(A_t)$ a right-continuous submartingale, so that $(X_t)$ is a right-continuous supermartingale. We have, for each $t \in \mathbf{R}_+$,
$$X_t = E[A_\infty - A_t \mid \mathcal{F}_t] \ge 0 \quad \text{a.s.}$$
The paths of the process $(X_t)$ are thus a.s. positive, in view of the right continuity. We finally have
$$\lim_{t \to \infty} E[X_t] = E[A_\infty] - \lim_{t \to \infty} E[A_t] = 0,$$
from Lebesgue's theorem; $(X_t)$ is thus a potential. Let $\mathcal{T}$ be the collection of all stopping times. The random variables $M_T$ $(T \in \mathcal{T})$ are uniformly integrable from VI.T19. The random variables $A_T$ $(T \in \mathcal{T})$ are dominated by $A_\infty$, and hence are uniformly integrable. It then follows that $(X_t)$ belongs to the class (D). To establish assertion (2), it suffices to note that
$$X_T = M_T - A_T \quad \text{and} \quad M_T = E[A_\infty \mid \mathcal{F}_T] \quad \text{a.s.}$$
(from VI.T13 and 14).
We show later that the converse of this theorem is true: Every potential of the class (D) is generated by an integrable increasing process (not necessarily unique). This will allow us to find a necessary and sufficient condition for a supermartingale to admit a Doob decomposition. The rest of this section is devoted to the study of integrable increasing processes. We begin with some elementary remarks, not worth being stated as theorems.

Strong order properties

D8 DEFINITION Let $(A_t)$ and $(B_t)$ be two increasing processes. We say that $(B_t)$ dominates $(A_t)$ in the strong sense, and we write $(A_t) \ll (B_t)$, if the process $(B_t - A_t)$ is increasing. Let $(X_t)$ and $(Y_t)$ be two right-continuous supermartingales. We say that $(Y_t)$ dominates $(X_t)$ in the strong sense, and we write $(X_t) \ll (Y_t)$, if the process $(Y_t - X_t)$ is a positive supermartingale.

9 Remarks (a) Let $(A_t)$ and $(B_t)$ be two integrable increasing processes such that $(A_t) \ll (B_t)$, and let $(X_t)$ and $(Y_t)$ be their respective potentials. We then have $(X_t) \ll (Y_t)$.
(b) There is evidently an analogous definition for increasing processes in the broad sense.
(c) Let $(A_t^n)$ $(n \in \mathbf{N})$ be a sequence of increasing processes that increases in the strong sense. Suppose that $\sup_n E[A_t^n] < \infty$ for every $t \in \mathbf{R}_+$, and put $A_t = \sup_n A_t^n$. We will show that $(A_t)$ is an increasing process. Since the paths of the process $(A_t)$ are a.s. increasing, and zero for $t = 0$, and each random variable $A_t$ is integrable, we need only show that the functions $s \mapsto A_s(\omega)$ are a.s. right continuous. Now for $s < t$ we have
$$0 \le A_s - A_s^n \le A_t - A_t^n.$$
Since the random variable $A_t$ is a.s. finite, there is a.s. uniform convergence on the interval $[0,t]$, which implies the desired right continuity. We note also, in view of later applications, that continuity a.s. of the processes $(A_t^n)$ implies the same property for the process $(A_t)$.

The continuous and discontinuous parts of an increasing process

10 Let $(A_t)$ be an increasing process, and $\varepsilon$ a number $> 0$. Define by induction the stopping times:
$$T_{n+1}^\varepsilon(\omega) = \inf\{t : t > T_n^\varepsilon(\omega),\ A_t(\omega) - A_{t-}(\omega) > \varepsilon\}$$
(see No. IV.44). Then, for every $t \in \mathbf{R}_+$, set
$$A_t^\varepsilon(\omega) = \sum_{T_n^\varepsilon(\omega) \le t} \left(A_{T_n^\varepsilon}(\omega) - A_{T_n^\varepsilon -}(\omega)\right)$$
(the sum of the jumps larger than $\varepsilon$). It is clear that the processes $(A_t^\varepsilon)$ are increasing processes dominated in the strong sense by $(A_t)$, and strongly increasing as $\varepsilon$ decreases. They thus converge, when $\varepsilon \to 0$, to an increasing process $(A_t^d)$, called the purely discontinuous part of $(A_t)$. The increasing process $(A_t^c) = (A_t - A_t^d)$ then has a.s. continuous paths; it is called the continuous part of $(A_t)$.

11 The decomposition of the discontinuous part can be pursued further. Given a decreasing sequence $(\varepsilon_n)$ of strictly positive numbers, which converges to zero, put
$$B_t^n = A_t^{\varepsilon_{n+1}} - A_t^{\varepsilon_n}.$$
The increasing process $(A_t^d)$ is the sum of the processes $(B_t^n)$. Each path of the processes $(B_t^n)$ has a finite number of discontinuities on every compact interval; thus denote by $T_{nm}(\omega)$ the instant when the $m$th discontinuity of the function $t \mapsto B_t^n(\omega)$ occurs, and by $a_{nm}(\omega)$ the size of the jump at this instant. It is easily verified that $T_{nm}$ is a stopping time, and that $a_{nm}$ is $\mathcal{F}_{T_{nm}}$-measurable. The process defined by
is thus an increasing process, with paths having at most one discontinuity, and we have
n,m
Change of time associated with an increasing process
This notion will be an important tool for us, both in martingale theory and in the study of Markov processes. The following lemma was already known to Lebesgue.

T12 THEOREM Let $a$ be a function defined on $\mathbf{R}_+$, with positive values, not necessarily finite, which is increasing and right continuous. For every $t \in \mathbf{R}_+$ put
$$c(t) = \inf\{s : a(s) > t\}. \tag{12.1}$$
The function $c$ is then increasing, right continuous, and such that
$$a(s) = \inf\{t : c(t) > s\}. \tag{12.2}$$
Suppose that $a(0) = 0$, and let $f$ be a positive Borel function on $\mathbf{R}_+$: then
$$\int_0^\infty f(t) \, da(t) = \int_0^{a(\infty)} f(c(t)) \, dt. \tag{12.3}$$
Proof The function $c$ clearly is increasing, and right continuous at every point $t$ such that $c(t) = \infty$. Suppose that $c$ is not right continuous at a point $t$ such that $c(t) < \infty$. There then exists a number $h$ such that $c(t) < h < c(t + \varepsilon)$ for every $\varepsilon > 0$. These relations imply, respectively, the inequalities $a(h) > t$ and $a(h) \le t + \varepsilon$ for every $\varepsilon > 0$, leading to a contradiction, and it follows that the function $c$ is right continuous. Note that $c(a(s)) \ge s$ for every $s \in \mathbf{R}_+$, and consequently $c(a(s + \varepsilon)) \ge s + \varepsilon > s$ for every $\varepsilon > 0$. We thus have
$$a(s + \varepsilon) \ge \inf\{t : c(t) > s\}.$$
Since the function $a$ is right continuous, it follows that
$$a(s) \ge \inf\{t : c(t) > s\}.$$
Let $t$ be a number such that $c(t) > s$. The definition of the function $c$ implies the inequality $a(s) \le t$. We hence have also that $a(s) \le \inf\{t : c(t) > s\}$. Relation (12.2) is thus established.

Suppose that $a(0) = 0$ and, to simplify things a little, that the function $a$ is bounded. We take for $f$ the indicator function of an interval $[0,s]$, and verify relation (12.3). The left side is equal to $a(s)$; the right side is equal to the length of the interval $I_s = \{t : c(t) \le s\}$, also equal to $\inf\{t : c(t) > s\} = a(s)$, from (12.2). Denote by $\mathcal{H}$ the vector space of bounded Borel functions $f$ such that relation (12.3) holds, and by $\mathcal{C}$ the collection of indicator functions of intervals of the form $[0,t]$. Theorem I.T20 shows that $\mathcal{H}$ contains all of the bounded Borel functions. Formula (12.3) is then verified for all positive Borel functions by means of a passage to the limit.

D13 DEFINITION Let $(A_t)$ be an increasing process in the broad sense. The system of stopping times $(c_t)_{t \in \mathbf{R}_+}$ defined by
$$c_t(\omega) = \inf\{s : A_s(\omega) > t\}$$
is called the change of time associated with $(A_t)$. Let $(X_t)$ be a stochastic process progressively measurable with respect to the family $(\mathcal{F}_t)$. The process $(X_{c_t})_{t \in \mathbf{R}_+}$ is called the transform of $(X_t)$ by the change of time $(c_t)$.
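Relations (12.1) to (12.3) can be checked numerically on a pure-jump function. The sketch below assumes a hypothetical right-continuous step function $a$ with three jumps (the times and sizes are illustrative values, not taken from the text), computes $c$ from (12.1), recovers $a$ from $c$ as in (12.2), and compares the two sides of (12.3) for $f(s) = s^2$, using a midpoint Riemann sum over $[0, a(\infty))$ for the right side.

```python
import math
from bisect import bisect_right

# Assumed example: a jumps by JUMP_SIZES[i] at time JUMP_TIMES[i], and a(0) = 0.
JUMP_TIMES = [1.0, 2.5, 4.0]
JUMP_SIZES = [2.0, 0.5, 1.5]
CUM = [2.0, 2.5, 4.0]  # running totals of the jump sizes; CUM[-1] is a(infinity)


def a(s):
    """Right-continuous increasing step function: sum of the jumps at times <= s."""
    i = bisect_right(JUMP_TIMES, s)
    return CUM[i - 1] if i > 0 else 0.0


def c(t):
    """c(t) = inf{s : a(s) > t}, as in (12.1); infinite once t >= a(infinity)."""
    i = bisect_right(CUM, t)  # first index whose running total exceeds t
    return JUMP_TIMES[i] if i < len(CUM) else math.inf


def inverse_of_c(s):
    """inf{t : c(t) > s}; for a step function the infimum is attained at 0 or at a value of a."""
    return min(t for t in [0.0] + CUM if c(t) > s)


def stieltjes_integral(f):
    """Left side of (12.3) for the pure-jump function a."""
    return sum(f(s) * j for s, j in zip(JUMP_TIMES, JUMP_SIZES))


def time_changed_integral(f, n=4000):
    """Right side of (12.3): midpoint Riemann sum of f(c(t)) over [0, a(infinity))."""
    h = CUM[-1] / n
    return sum(f(c((k + 0.5) * h)) for k in range(n)) * h


f = lambda s: s * s
print(stieltjes_integral(f), time_changed_integral(f))
```

The two printed values should agree up to floating-point error; the grid size `n` is chosen so that the breakpoints of $t \mapsto c(t)$ fall on cell boundaries, which makes the midpoint sum essentially exact for this example.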
109
Increasing Processes
VII, 14, TI5
The relations $c_t < a$ and $A_a > t$ are equivalent for every $a \in \mathbf{R}_+$. Since $A_a$ is $\mathcal{F}_a$-measurable, the random variables $c_t$ are clearly stopping times, as stated in the definition. The paths of the process $(c_t)$ are right continuous. Right continuity of processes is thus preserved under this transformation.

Integration with respect to an increasing process

14 Let $(A_t)$ be an increasing process in the broad sense (we suppose only, for simplification, that the random variables $A_t$ are a.s. finite). Let $(X_t)$ be a measurable process with positive values (see IV.D45). Since each function $t \mapsto X_t(\omega)$ is measurable from II.T14, we can consider for each $\omega \in \Omega$ the Lebesgue-Stieltjes integral on $\mathbf{R}_+$:
$$\int_0^\infty X_t(\omega)\,dA_t(\omega).$$
This integral is an $\mathcal{F}$-measurable function of $\omega$, from Fubini's theorem (II.T14). Suppose in particular that the process $(X_t)$ is progressively measurable with respect to the family $(\mathcal{F}_t)$. Consider the process $(Y_t)$ defined by
$$Y_t = \int_0^t X_s\,dA_s$$
(the point $t$ being included in the interval of integration). The same reasoning as above shows that $Y_t$ is $\mathcal{F}_t$-measurable for every $t \in \mathbf{R}_+$. The process $(Y_t)$ admits, on the other hand, right-continuous paths. It is hence progressively measurable with respect to the family $(\mathcal{F}_t)$, from IV.T47. Let $T$ be a stopping time; the random variable
$$Y_T = \int_0^T X_s\,dA_s$$
is $\mathcal{F}_T$-measurable from IV.T49. We shall consider only positive random variables in the following theorems, in order to avoid considerations of integrability. The symbols $E[\,\cdot \mid \cdot\,]$ will denote generalized conditional expectations.
T15 THEOREM Let $(X_t)$ and $(Y_t)$ be two measurable stochastic processes with positive values [not necessarily adapted to the family $(\mathcal{F}_t)$] such that, for each stopping time $T$,
$$E[X_T I_{\{T<\infty\}}] = E[Y_T I_{\{T<\infty\}}]. \tag{15.1}$$
We then have, for every increasing process $(A_t)$ and every $t \le \infty$,
$$E\Big[\int_0^t X_s\,dA_s\Big] = E\Big[\int_0^t Y_s\,dA_s\Big]. \tag{15.2}$$
Proof We begin by treating the case $t = +\infty$. Introduce the change of time $(c_s)$ associated with the increasing process $(A_s)$. It follows, from Theorem 12 and Fubini's theorem, that
E[f.oo X. dA.] = E[f.oo X"I{,.T}'
We obtain the general formula
$$E\Big[\int X_s\,dA_s \,\Big|\, \mathcal{F}_T\Big] = E\Big[\int Y_s\,dA_s \,\Big|\, \mathcal{F}_T\Big] \quad \text{a.s.} \tag{15.4}$$
Here is a simple, often useful corollary of the preceding theorem.
T16 THEOREM Let $(Y_t)$ be a positive right-continuous martingale and let $(A_t)$ be an increasing process. Then for every $t \in \mathbf{R}_+$ we have
$$E[A_t Y_t] = E\Big[\int_0^t Y_s\,dA_s\Big]. \tag{16.1}$$
This equality also holds for $t = +\infty$ if the martingale $(Y_t)$ is uniformly integrable.
Proof Let us establish this last point first. Since the martingale $(Y_t)$ is right continuous and uniformly integrable, we have $Y_T = E[Y_\infty \mid \mathcal{F}_T]$ a.s. for every stopping time $T$ (VI.T13 and 14). It then suffices to apply the preceding theorem to the process $(Y_t)$ and the constant process equal to $Y_\infty$. Equality (16.1) is easily established in the same way, but it is still simpler to reduce to the case already treated, by noting that the ordered sets $[0,t]$ and $[0,\infty]$ are isomorphic.
The following statement applies in particular to two integrable increasing processes which generate the same potential. The use of the expression "increasing processes" supposes implicitly that they are adapted to the family $(\mathcal{F}_t)$. The reader will note that this hypothesis is not used in the proof.
T17 THEOREM Let $(A_t)$ and $(B_t)$ be two increasing processes such that
$$E[B_t - B_s \mid \mathcal{F}_s] = E[A_t - A_s \mid \mathcal{F}_s] \quad \text{a.s.} \tag{17.1}$$
for every pair of numbers $s$, $t$ such that $0 \le s \le t < \infty$. Let $(Y_t)$ be a process with positive values, adapted to the family $(\mathcal{F}_t)$, and having left-continuous paths a.s. We then have, for every $t \le +\infty$,
$$E\Big[\int_0^t Y_s\,dA_s\Big] = E\Big[\int_0^t Y_s\,dB_s\Big]. \tag{17.2}$$
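Proof It suffices to treat the case where $t$ is finite and where the random variables $Y_s$ are bounded by a constant. The general case can then be deduced by passing to a monotone limit. Suppose then that the process $(Y_s)$ is bounded, and set, for every integer $n > 0$,
$$Y^n_0 = Y_0, \qquad Y^n_s = Y_{\frac{k}{n}t} \quad \text{for } \tfrac{k}{n}t < s \le \tfrac{k+1}{n}t \quad (k \in \mathbf{N}).$$
Since the paths of the process $(Y_s)$ are left continuous, it will suffice to establish (17.2) for the process $(Y^n_s)$, and then to let $n$ tend to infinity, using Lebesgue's theorem. Now we have
$$E\Big[\int_0^t Y^n_s\,dA_s\Big] = \sum_{k=0}^{n-1} E\big[Y_{\frac{k}{n}t}\,(A_{\frac{k+1}{n}t} - A_{\frac{k}{n}t})\big] = \sum_{k=0}^{n-1} E\big[Y_{\frac{k}{n}t}\, E[A_{\frac{k+1}{n}t} - A_{\frac{k}{n}t} \mid \mathcal{F}_{\frac{k}{n}t}]\big].$$
We also have an analogous relation for the process $(B_s)$. It then is sufficient to note that, from (17.1),
$$E[A_{\frac{k+1}{n}t} - A_{\frac{k}{n}t} \mid \mathcal{F}_{\frac{k}{n}t}] = E[B_{\frac{k+1}{n}t} - B_{\frac{k}{n}t} \mid \mathcal{F}_{\frac{k}{n}t}] \quad \text{a.s.}$$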
Remarks (a) In formula (17.2), the expectations can be replaced by conditional expectations $E[\,\cdot \mid \mathcal{F}_0]$. (b) Formula (17.2) extends to all positive processes $(Y_t)$ such that the mapping $(t,\omega) \mapsto Y_t(\omega)$ is measurable with respect to the σ-field on $\mathbf{R}_+ \times \Omega$ generated by the processes with left-continuous paths.

3. Uniqueness of the Doob Decomposition
We begin by establishing the uniqueness theorem, whose proof requires less technique than that of the existence theorem.

Natural increasing processes

The definition of "natural" increasing processes, which we now give, will doubtless appear particularly artificial. We shall see later that the "natural" processes are, roughly speaking, the limits of increasing processes with continuous paths. We shall also see that they appear "naturally" in the proof of the existence theorem.
D18 DEFINITION Let $(A_t)$ be an increasing process. We say that $(A_t)$ is a natural increasing process if
$$E\Big[\int_0^t Y_s\,dA_s\Big] = E\Big[\int_0^t Y_{s-}\,dA_s\Big] \tag{18.1}$$
for every $t \in \mathbf{R}_+$ and every positive, bounded, right-continuous martingale $(Y_t)$.
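A discrete-time analogue may help to see what the naturalness condition says. The sketch below is an illustration under stated assumptions, not the text's construction: time is discrete, the filtration is generated by $n$ fair ±1 coin tosses, the integral of $Y_s$ against $dA_s$ becomes a sum of $Y_k (A_k - A_{k-1})$, and the integral of $Y_{s-}$ becomes a sum of $Y_{k-1}(A_k - A_{k-1})$. In this setting the two expectations agree when the increments of $A$ are known one step in advance (predictable), and may differ otherwise, while the discrete form of (16.1) holds in both cases. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def expectations(n, increment):
    """Return (E[A_n Y_n], E[sum Y_k dA_k], E[sum Y_{k-1} dA_k]) by exact enumeration."""
    half = Fraction(1, 2)
    e_terminal = e_right = e_left = Fraction(0)
    for eps in product((1, -1), repeat=n):
        prob = half ** n
        # positive bounded martingale Y_k = prod_{i<=k} (1 + eps_i / 2), with Y_0 = 1
        Y = [Fraction(1)]
        for e in eps:
            Y.append(Y[-1] * (1 + half * e))
        # dA[k-1] is the increment A_k - A_{k-1}
        dA = [increment(eps, k) for k in range(1, n + 1)]
        e_terminal += prob * sum(dA) * Y[n]
        e_right += prob * sum(Y[k] * dA[k - 1] for k in range(1, n + 1))
        e_left += prob * sum(Y[k - 1] * dA[k - 1] for k in range(1, n + 1))
    return e_terminal, e_right, e_left

def predictable_increment(eps, k):
    # increment at time k decided by the toss at time k-1 (known in advance)
    return 1 if k >= 2 and eps[k - 2] == 1 else 0

def adapted_increment(eps, k):
    # increment at time k decided by the toss at time k itself (adapted only)
    return 1 if eps[k - 1] == 1 else 0

print(expectations(4, predictable_increment))
print(expectations(4, adapted_increment))
```

For the predictable increments all three expectations coincide; for the merely adapted ones the first two coincide and the third is strictly smaller.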
This definition is simplified in the particularly important case where the increasing process is integrable. We then have the following theorem.
VII, T19, T20
Generation of Supermartingales
112
T19 THEOREM Let (At) be an integrable increasing process. (1) The process (At) is natural if and only if (19.1) for every positive, bounded, right-continuous martingale (Y t ). (2) We then have also,for every stopping time T,
(19.2) (3) Under the same conditions the increasing process (B t ), defined by
B t = Al{t E} < p(s~p (~ - Y:) > E} =
P(i~f(Y: - ~) < -
E} < ~ E[~ -
Y~],
from inequality VI.l.2. It then follows that the left side is zero, and the theorem is established. The uniqueness theorem .....
T21 THEOREM Let $(X_t)$ be a right-continuous supermartingale. There exists at most one natural increasing process $(A_t)$ such that the process $(X_t + A_t)$ is a martingale.
Proof* Let $(B_t)$ be another natural increasing process which has the same property. We shall show that, for each $t \in \mathbf{R}_+$, $A_t = B_t$ a.s. This is indeed the desired result, since we do not distinguish two right-continuous modifications of the same process. It suffices to show that $E[YA_t] = E[YB_t]$ for every bounded, positive, $\mathcal{F}_t$-measurable random variable $Y$ [see remark II.9(a)]. Denote by $(Y_s)$ a right-continuous modification of the martingale $(E[Y \mid \mathcal{F}_s])$. We have
$$E[YA_t] = E\Big[\int_0^t Y_s\,dA_s\Big] = E\Big[\int_0^t Y_{s-}\,dA_s\Big], \qquad E[YB_t] = E\Big[\int_0^t Y_s\,dB_s\Big] = E\Big[\int_0^t Y_{s-}\,dB_s\Big],$$
from Theorem 16 and the fact that the two processes are natural. Since the process $(B_t - A_t)$ is a martingale, the hypothesis of Theorem 17 is satisfied, and so we have
$$E\Big[\int_0^t Y_{s-}\,dA_s\Big] = E\Big[\int_0^t Y_{s-}\,dB_s\Big].$$
This establishes the theorem.
* This proof was communicated by P. Courrège.
4. The Existence Theorem

We begin by establishing the existence of an integrable increasing process generating a potential of the class (D) (T29). The necessary and sufficient condition for the existence of a Doob decomposition will be given in No. 31. The crucial point of the proof is the theorem concerning uniform integrability (T25). We prove it directly for all natural, integrable, increasing processes, but the reader will note that this general form is not necessary for the existence proof. The latter uses it, in fact, only for increasing processes of the form
$$A_t = \int_0^t H_s\,ds,$$
where $(H_t)$ is a process adapted to the family $(\mathcal{F}_t)$, with positive values and right-continuous paths. This remark might lead to a more elementary proof of Theorem 29.

Uniform integrability properties
22 We begin by proving a general formula for integration by parts, which seems not to be entirely classical. Let $f$ and $g$ be two increasing and right-continuous functions on $\mathbf{R}_+$, such that $f(0) = g(0) = 0$. For simplicity, we suppose also that the quantities $f(\infty) = \lim_{t\to\infty} f(t)$ and $g(\infty) = \lim_{t\to\infty} g(t)$ are finite. We then have
$$f(\infty)g(\infty) = \int_{\mathbf{R}_+ \times \mathbf{R}_+} df(x)\,dg(y).$$
Denote by $D^+$ the set of points of $\mathbf{R}_+ \times \mathbf{R}_+$ situated above, but not on, the diagonal, and by $D^-$ the complement of $D^+$. By applying Fubini's theorem to the integrals
$$\int_{D^+} df(x)\,dg(y), \qquad \int_{D^-} df(x)\,dg(y),$$
we obtain
$$f(\infty)g(\infty) = \int [g(\infty) - g(u)]\,df(u) + \int [f(\infty) - f(u-)]\,dg(u), \tag{22.1}$$
where $f(u-)$ denotes the left limit of $f$ at $u$. The general formula
$$f(\infty)g(\infty) = \int g(u)\,df(u) + \int f(u-)\,dg(u) \tag{22.2}$$
is then easily deduced. This formula is not symmetric with respect to $f$ and $g$. Its "symmetrization" yields the formula for integration by parts given by Hewitt (75). We point out the following formula, where $p$ is an integer $> 0$, and where the function $f$ is assumed to be increasing, continuous, and zero for $t = 0$:
$$f(\infty)^p = p! \int_0^\infty df(u_1) \int_{u_1}^\infty df(u_2) \cdots \int_{u_{p-1}}^\infty df(u_p). \tag{22.3}$$
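The role of the left limit in formula (22.2) is easiest to see when $f$ and $g$ jump at the same point. The following sketch assumes both functions are purely atomic with atoms at the same finite list of points (an assumption made only for this check); `by_parts_terms` is a hypothetical helper that evaluates each term as a finite sum.

```python
from itertools import accumulate

def by_parts_terms(df, dg):
    """df[i], dg[i] are the jumps of f and g at the i-th atom (atoms in increasing order).

    Returns f(inf)*g(inf), the integral of g df, the integral of f(u-) dg,
    and, for comparison, the integral of f(u) dg taken without the left limit.
    """
    f_right = list(accumulate(df))                       # f at each atom
    g_right = list(accumulate(dg))                       # g at each atom
    f_left = [fr - j for fr, j in zip(f_right, df)]      # left limit of f at each atom
    total = f_right[-1] * g_right[-1]
    int_g_df = sum(g * j for g, j in zip(g_right, df))
    int_fminus_dg = sum(fl * j for fl, j in zip(f_left, dg))
    int_f_dg = sum(fr * j for fr, j in zip(f_right, dg))
    return total, int_g_df, int_fminus_dg, int_f_dg

# f and g share a jump at the first atom, which is where the left limit matters
total, int_g_df, int_fminus_dg, int_f_dg = by_parts_terms([1, 2, 0], [1, 0, 3])
print(total, int_g_df + int_fminus_dg, int_g_df + int_f_dg)
```

Replacing $f(u-)$ by $f(u)$ overshoots by the sum of the products of the common jumps, which is why the formula is not symmetric in $f$ and $g$.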
The following identity is the basis for the theory of energy, which we shall present in Section 6. It was used for the first time by Volkonski (12~) in a less general form.
T23 THEOREM Let $(X_t)$ be the potential generated by an integrable, natural, increasing process $(A_t)$. We then have
$$E[A_\infty^2] = E\Big[\int_0^\infty (X_t + X_{t-})\,dA_t\Big]. \tag{23.1}$$
Proof Denote by $(A^n_t)$ the bounded integrable increasing process $(A_t \wedge n)$ $(n \in \mathbf{N})$, and by $(X^n_t)$ the potential of $(A^n_t)$. The process $(X^n_t + A^n_t)$ is a right-continuous modification of the martingale $(E[A^n_\infty \mid \mathcal{F}_t])$, and the increasing process $(A_t)$ is natural. We thus have, from T16 and T20,
$$E[A_\infty A^n_\infty] = E\Big[\int (X^n_s + A^n_s)\,dA_s\Big] = E\Big[\int (X^n_{s-} + A^n_{s-})\,dA_s\Big],$$
and consequently
$$2E[A_\infty A^n_\infty] = E\Big[\int (X^n_s + X^n_{s-} + A^n_s + A^n_{s-})\,dA_s\Big].$$
The left side is also equal to
$$E\Big[\int (A_s + A_{s-})\,dA^n_s\Big] + E\Big[\int (A^n_s + A^n_{s-})\,dA_s\Big]$$
from formula (22.2). Since the second integral is finite, we are led to the relation
$$E\Big[\int (X^n_s + X^n_{s-})\,dA_s\Big] = E\Big[\int (A_s + A_{s-})\,dA^n_s\Big].$$
We then let $n$ tend to infinity. The right side above tends to
$$E\Big[\int (A_s + A_{s-})\,dA_s\Big] = E[A_\infty^2]$$
[formula (22.2)]. On the other hand we have, from VI.T16,
$$\lim_n X^n_t(\omega) = X_t(\omega) \quad \text{for every } t \quad \text{a.s.}$$
The random variables $X^n_{t-}$ increase with $n$. They thus have a limit as $n \to \infty$, which we denote by $Y_t$. We show that
$$Y_t(\omega) = X_{t-}(\omega) \quad \text{for every } t \quad \text{a.s.}$$
Indeed, we have for every $c > 0$,
$$P\Big\{\sup_t (X_{t-} - Y_t) > c\Big\} \le P\Big\{\sup_t (X_t - X^n_t) > c\Big\} \le P\Big\{\inf_t (X^n_t + A^n_t - X_t - A_t) < -c\Big\} \le \frac{1}{c} E[A_\infty - A^n_\infty]$$
from formula VI.1.2. Since this last expectation tends to zero when $n \to \infty$, we see that the left side is zero. It then only remains to apply Lebesgue's monotone convergence theorem.
T24 COROLLARY Let $(A_t)$ be an increasing natural integrable process whose potential $(X_t)$ is dominated by a constant $c$. We then have
$$E[A_\infty^2] \le 2c^2. \tag{24.1}$$
Proof We have
$$E[A_\infty^2] = E\Big[\int (X_s + X_{s-})\,dA_s\Big] \le 2cE[A_\infty] = 2cE[X_0] \le 2c^2.$$
Remark It is possible to establish Theorems 29 and 37, which are the fundamental results of this chapter, by using relation (23.1) only for an increasing continuous process $(A_t)$. This relation is then almost immediate. In fact, we have, from (22.1),
$$A_\infty^2 = 2\int_0^\infty [A_\infty - A_s]\,dA_s.$$
Taking the expectation of both sides, and noting that $X_T = E[A_\infty - A_T \mid \mathcal{F}_T]$ for every stopping time $T$, we can apply T15, which yields
E[A:'l = 2E[f c},
and by r(c) the expectation E[ Y T J. We have seen in VI.T20 that r(c) tends to zero as c tends to 00. For every process (At) E.9I, denote by (A~) the integrable increasing process defined by A~ = AtI{t0, let us choose c large enough so that r(c) < e/2, and show that
1 +1
Aoo
dP
{Aoo>a}
~e
independently of Aoo ' if a is large enough. This integral is dominated by r(c)
{A oo>a}
A~ dP
from property (1), and it remains to show that the second integral can be made less than e/2 for a large enough. Since the random variables (A~) are uniformly integrable, from
property (2) (see II.T22), it suffices to show that the probability $P\{A_\infty > a\}$ is dominated by a function of $a$ which tends to zero as $a$ tends to infinity (II.T19). Now we have
$$P\{A_\infty > a\} \le \frac{1}{a} E[A_\infty] = \frac{1}{a} E[X_0] \le \frac{1}{a} E[Y_0].$$
Let us now establish properties (1) and (2). The first one follows from
$$E[A_\infty - A^c_\infty] = E[A_\infty - A_{T_c}] = E\big[E[A_\infty - A_{T_c} \mid \mathcal{F}_{T_c}]\big] = E[X_{T_c}] \le E[Y_{T_c}] = r(c).$$
To establish the second, note that the process $(A^c_t)$ is natural (T19). It hence suffices to show that the process $(X^c_t)$ is dominated by $c$, and to use T24. Note that the inequality $Y_t(\omega) > c$ implies $T_c(\omega) \le t$, and hence $A^c_\infty(\omega) - A^c_t(\omega) = 0$. Consequently,
$$X^c_t = E[A^c_\infty - A^c_t \mid \mathcal{F}_t] = E[(A^c_\infty - A^c_t) I_{\{Y_t \le c\}} \mid \mathcal{F}_t] = E[A^c_\infty - A^c_t \mid \mathcal{F}_t]\, I_{\{Y_t \le c\}} \le Y_t I_{\{Y_t \le c\}} \quad \text{a.s.}$$
We then conclude by using the right continuity of the process $(X^c_t)$.
26 In No. 1 we constructed the increasing process $(A_n)$ associated with a discrete supermartingale $(X_n)$. We had
$$A_n = \sum_{k=0}^{n-1} \big(X_k - E[X_{k+1} \mid \mathcal{F}_k]\big).$$
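The discrete formula recalled above can be evaluated exactly on a small example. The sketch below assumes a filtration generated by fair ±1 coin tosses, so that the conditional expectation of $X_{k+1}$ given $\mathcal{F}_k$ is the average over the two possible next tosses; the supermartingale $X_n = -S_n^2$ (with $S_n$ the sum of the tosses) and the function names are choices made for this illustration only.

```python
from fractions import Fraction

def doob_increasing_process(path, X):
    """A_n = sum_{k<n} (X_k - E[X_{k+1} | F_k]) along one path of fair +/-1 tosses.

    X maps a tuple of tosses (the history up to time k) to the value X_k.
    """
    A = [Fraction(0)]
    for k in range(len(path)):
        prefix = tuple(path[:k])
        # conditional expectation = average over the two equally likely next tosses
        cond_exp = Fraction(X(prefix + (1,)) + X(prefix + (-1,)), 2)
        A.append(A[-1] + X(prefix) - cond_exp)
    return A

def minus_square(prefix):
    # X_n = -(S_n)^2, a supermartingale since E[S_{n+1}^2 | F_n] = S_n^2 + 1
    s = sum(prefix)
    return -s * s

print(doob_increasing_process((1, -1, 1, 1), minus_square))
```

Here each term of the sum equals 1, so $A_n = n$ on every path, and $X_n + A_n = n - S_n^2$ is the corresponding martingale.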
It is natural to seek an analogous formula in the continuous case, in which the summation is replaced by an integration, and the difference operator under the sign "$\Sigma$" is replaced by a "derivative." This last notion is, however, difficult to define. We hence use a difference quotient and pass to the limit. It is possible to use another construction procedure, the passage from the discrete case to the continuous case by means of finer and finer subdivisions. The proof is then simpler, but the procedure unfortunately does not always lead to a natural increasing process.
27 DEFINITION Let $(X_t)_{t \in \mathbf{R}_+}$ be a right-continuous potential of the class (D), and let $h$ be a number $> 0$. We denote by $(P_hX_t)_{t \in \mathbf{R}_+}$ a right-continuous modification of the supermartingale
$$Y_t = E[X_{t+h} \mid \mathcal{F}_t]. \tag{27.1}$$
(a) This definition must be justified. Let $s$ and $t$ be two instants such that $s < t$. We have
$$E[Y_t \mid \mathcal{F}_s] = E\big[E[X_{t+h} \mid \mathcal{F}_t] \mid \mathcal{F}_s\big] = E[X_{t+h} \mid \mathcal{F}_s] = E\big[E[X_{t+h} \mid \mathcal{F}_{s+h}] \mid \mathcal{F}_s\big] \le E[X_{s+h} \mid \mathcal{F}_s] = Y_s \quad \text{a.s.}$$
The process $(Y_t)$ thus is a supermartingale. Since the function $t \mapsto E[Y_t] = E[X_{t+h}]$ is right continuous, Theorem VI.T4 establishes the existence of a right-continuous modification of $(Y_t)$. We have $Y_t \le X_t$ a.s. for each $t$. This implies that the supermartingale $(P_hX_t)$ is dominated by $(X_t)$; it is hence also a potential of the class (D).
(b) Let $T$ be a stopping time (finite or not); we have
$$P_hX_T = E[X_{T+h} \mid \mathcal{F}_T] \quad \text{a.s.} \tag{27.2}$$
Indeed, denote by (Tn ) a sequence of elementary stopping times which decreases to T-the stopping times T]
+ E[n~p YTn(ATn+l =
E[~: YTn(ATn+l -
AT)]
where (3 is a quantity dominated in modulus by CE[A oo way,
E[f Y.- dA~] = E[~: YT.(A~'H where (lh is dominated in modulus by e E[X:] CE[XT 21 ]. On the other hand, we have
liff E[:~:YTn(A}n+l -
-
AT)]
-
+ lX
+ fJ +
(l,
A T21 ] = CE[XT ). In the same
A~.)] + P' + (X',
< e E[Xo], and fJh is dominated by CE[X~ ]
0 put (36.1)
The stopping times $T^n_\varepsilon$ then increase with $n$ and we have
$$\lim_{n \to \infty} P\{T^n_\varepsilon < \infty\} = 0. \tag{36.2}$$
(2) For each $n \in \mathbf{N}$, let $(A^n_t)$ [respectively, $(A_t)$] be the natural, integrable, increasing process which generates $(X^n_t)$ [respectively, $(X_t)$]. Suppose that the process $(X_t)$ is bounded. The random variables $A_\infty$, $A^n_\infty$ then belong to $\mathcal{L}^2$, and we have
$$\lim_{n \to \infty} E[(A_\infty - A^n_\infty)^2] = 0. \tag{36.3}$$
Proof The process (Xt - Xr) is right continuous. It follows from IV.44 that T: is a stopping time, and it is clear that Ten increases with n. Put T = limn ---+ 00 T;-. By applyip.g VI.T13 to the process (Xr), we have Hm E[X~] > E[X~] n-+ 00
e
and also, from the regularity of (Xt ), lim E[X~] = E[XT ].
n-+ 00
e
Thus, for every p, we have
E[XT - X~] On the other hand, for p
> lim E[X~ -
X~~].
n-+ 00
< n, we have from (36.1)
XTne - X~nE
> X Tn E
XTnE
> e I{Tn< oo}' E
which implies, for every pEN,
E[X T
-
X~]
> e Hrn P{T: < oo}. n-+ 00
Formula (36.2) can then be deduced by noting that the left side tends to zero when $p \to \infty$, from VI.T16 and Lebesgue's theorem. Suppose next that the process $(X_t)$ is bounded by a constant $c$. The same then holds for each of the processes $(X^n_t)$, which implies the relations
$$E[A_\infty^2] \le 2c^2 \quad \text{and} \quad E[(A^n_\infty)^2] \le 2c^2 \tag{36.4}$$
from T24. Denote by $(Y_t)$ the process $(X_t - X^n_t)$, and by $(B_t)$ the process $(A_t - A^n_t)$. Since the random variables $A_\infty$ and $A^n_\infty$ belong to $\mathcal{L}^2$, we can apply the proof of formula (23.1) to $(B_t)$, which is the difference of two natural increasing processes. We obtain
$$E[(A_\infty - A^n_\infty)^2] = E[B_\infty^2] = E\Big[\int (Y_u + Y_{u-})\,dB_u\Big].$$
We separate this last expression into the two integrals
$$E\Big[\int_{[0,T^n_\varepsilon[} (Y_u + Y_{u-})\,dB_u\Big] \tag{36.5}$$
and
$$E\Big[\int_{[T^n_\varepsilon,\infty[} (Y_u + Y_{u-})\,dB_u\Big]. \tag{36.6}$$
We have $Y_u(\omega) \le \varepsilon$, $Y_{u-}(\omega) \le \varepsilon$ for every $u \in [0,T^n_\varepsilon(\omega)[$. The first integral is hence dominated in absolute value by
$$2\varepsilon E\Big[\int_{[0,T^n_\varepsilon[} d(A_u + A^n_u)\Big] \le 2\varepsilon E[A_\infty + A^n_\infty] \le 4\varepsilon E[X_0].$$
The second integral is dominated in absolute value by
fI{,F, O. The stopping time TA is then totally inaccessible (respectively, accessible). (c) Let (Tn)llEN be an increasing sequence of accessible stopping times; the stopping time sUPn T n is then accessible.
Proof Suppose that Tand T' are totally inaccessible and consider a sequence (8 n) E !/TAT" We then have (8 n ) E !/T' (8 n ) E !/T" and
{li~ 8 n =
TAT' =
< 00,
{li~ 8 n = U
8 n < TAT' T
< 00,
{li~ 8 n =
for every nJ
8 n < T for every n, T'
1. It is then easily deduced that all stopping times are accessible. It will be noted that the natural increasing process $2(A_t)$ strongly dominates $(B_t)$, without the latter process being natural (see No. 50). This is due to the existence of times of discontinuity for the family $(\mathcal{F}_t)$.
(b) Let $(\Omega,\mathcal{F},P)$ be a probability space sufficiently rich so that there exists a positive random variable $S$, defined on $\Omega$, and admitting an exponential law:
$$P\{S > t\} = e^{-t} \quad \text{for every } t > 0.$$
Put then
$$X_t(\omega) = \begin{cases} 1 & \text{for } t < S(\omega) \\ 0 & \text{for } t \ge S(\omega) \end{cases}$$
and denote by $\mathcal{F}_t$ the σ-field consisting of the sets which differ from an element of $\mathcal{T}(X_s, s < t)$ by a negligible set. The family is evidently increasing, and we shall see later that it is also right continuous. We begin by noting that
$$\mathcal{T}(X_s, s < t) = \mathcal{T}(S \wedge t).$$
Indeed, the random variable $S \wedge t$ is measurable with respect to the first σ-field, since $S \wedge t$ is the upper bound of the rational numbers $s < t$ such that $X_s = 1$. On the other hand, each random variable $X_s$ $(s < t)$ is measurable with respect to the second σ-field, since $X_s = I_{\{s < S \wedge t\}}$.
139
The Classification of Stopping Times
VII, 55
The σ-field $\mathcal{F}_\infty$ is hence equal to $\mathcal{T}(S)$ (up to negligible sets); every $\mathcal{F}_\infty$-measurable and bounded random variable is thus a.s. equal to a function of the form $H \circ S$, where $H$ is a bounded Borel function on $\mathbf{R}_+$ (see I.T18). We can then exhibit a specific right-continuous modification of the martingale $(E[H \circ S \mid \mathcal{F}_t])$,
$$E[H \circ S \mid S \wedge t] = H \circ S \cdot I_{\{S \le t\}} + I_{\{S > t\}}\, \frac{\int_t^\infty H(x)e^{-x}\,dx}{\int_t^\infty e^{-x}\,dx}. \tag{54.1}$$
Let us show then that the family (§"t) is right continuous. Let A be an element of §"t+; choose the function H so that H 0 S = lA' and denote by (Yt ) the modification constructed above. The martingale (Yt ) has a discontinuity only at the instant S, and S #: t a.s. We thus have Y t = Y t- a.s. Since Y t is equal to lA' and §" t contains the negligible sets, we have A E§"t. We then have, for s < t, E[Xt -
I
Xsl §"s] = -E[l{s<s:5:t} §"s]
=-
[1 - e-
E[f
X._ dA.J
.
> e[(X,)]
(58.2)
The first inequality is an immediate consequence of formula (58.1). To establish the second one, we begin by supposing that the energy of $(X_t)$ is finite. We have, from T20 and the naturalness of the increasing process $(A_t)$,
This relation can also be written, since $(X_t)$ has finite energy, as
The second inequality then follows from (58.1). To treat the case where the energy of $(X_t)$ is infinite, introduce the integrable increasing processes $(A^n_t) = (A_t \wedge n)$ $(n \in \mathbf{N})$, and denote by $(X^n_t)$ the potential of $(A^n_t)$. Since the processes $(A^n_t)$ are natural from T49, we have
$$E\Big[\int X_{s-}\,dA_s\Big] \ge E\Big[\int X^n_{s-}\,dA_s\Big] \ge E\Big[\int X^n_{s-}\,dA^n_s\Big] \ge \tfrac{1}{2} E[(A^n_\infty)^2].$$
This last quantity tends to infinity with $n$.
59 Remark Definition 57 can be generalized by setting, for every integer $p > 1$,
$$e_p[(X_t)] = \frac{1}{p!} E[(A_\infty)^p]. \tag{59.1}$$
These quantities have not been used in classical potential theory, and we point out only one result concerning them. If the potential $(X_t)$ is dominated by a constant $c$, we have $e_p[(X_t)] \le c^p$, or in other words,
$$E[(A_\infty)^p] \le p!\,c^p. \tag{59.2}$$
Here is the idea of the proof of this inequality: First the case where (At) is continuous is considered, using formula (22.3). Next, the case of a natural increasing process is covered by means of a passage to the weak limit in Lp, based on T30, and then proceeding as in Remark 61 below:
T60 THEOREM Let $(Y_t)$ be a potential with finite energy and let $(X_t)$ be a potential dominated by $(Y_t)$. Then $(X_t)$ has finite energy and
$$e[(X_t)] \le 4e[(Y_t)]. \tag{60.1}$$
Proof We first establish (60.1), supposing that $(X_t)$ has finite energy. Let $(A_t)$ and $(B_t)$ be the natural increasing processes which generate $(X_t)$ and $(Y_t)$, respectively; we have
$$e[(X_t)] \le E\Big[\int X_{s-}\,dA_s\Big] \le E\Big[\int (Y_{s-} + B_{s-})\,dA_s\Big].$$
The process $(Y_u + B_u)$ is a martingale, and the increasing process $(A_t)$ is natural; Theorem 20 hence permits us to replace the last expectation by $E[\int_0^\infty (Y_s + B_s)\,dA_s]$, an expression which, from T16, also equals $E[B_\infty A_\infty]$. We thus have, using Schwarz's inequality,
$$e[(X_t)] \le (E[A_\infty^2]E[B_\infty^2])^{1/2} = (4e[(X_t)]\,e[(Y_t)])^{1/2}.$$
Inequality (60.1) then follows immediately. The restriction made on $(X_t)$ is next removed in the following manner: Let $(A^n_t)$ be the increasing process $(A_t \wedge n)$, which is natural from T49. The potential of $(A^n_t)$ is dominated by $n$, and hence has finite energy from T24. We thus have
We then let $n$ tend to infinity.
Remark The same reasoning also gives the following results: (a) A potential $(X_t)$ has finite energy if and only if the random variable $Y = \sup_t X_t$ belongs to $\mathcal{L}^2$, and we have $e[(X_t)] \le 2E[Y^2] \le 16e[(X_t)]$. (b) Let $(B_t)$ be an integrable, increasing process which generates $(X_t)$ (natural or not); if $E[B_\infty^2]$ is finite, then $(X_t)$ has finite energy and $e[(X_t)] \le 8E[B_\infty^2]$. To prove (a) and (b), assume first that $(X_t)$ has finite energy, and let $(A_t)$ be the natural increasing process that generates it. Then $Y$ is dominated by $\sup_t E[A_\infty \mid \mathcal{F}_t]$, which belongs to $\mathcal{L}^2$ (VI.2); formula VI.(2.1) also yields the inequality
$$E[Y^2] \le 4E[A_\infty^2] = 8e[(X_t)].$$
The same reasoning applies to $(B_t)$ under assumption (b), giving that $E[Y^2] \le 4E[B_\infty^2]$. Conversely, to dominate $e[(X_t)]$, assume that $(X_t)$ is bounded. Then
$$e[(X_t)] \le E\Big[\int_0^\infty X_{u-}\,dA_u\Big] \le E\Big[\int_0^\infty Y\,dA_u\Big] \le E[YA_\infty].$$
Applying the inequality of Schwarz, one finds that $e[(X_t)] \le 2E[Y^2]$. A passage to the limit as above then extends it to the general case.

Monotone convergence and energy

61 We first recall several elementary results on weak convergence in $L^2$. Let $(f_n)_{n \in \mathbf{N}}$ be a sequence of elements of $L^2$ such that $\sup_n \|f_n\|_2 < \infty$, and which converges weakly in $L^1$ to a function $f$; $f$ is then the only possible cluster point of the sequence $(f_n)$ in the weak topology of $L^2$. Since, on the other hand, the set of $f_n$'s is bounded in $L^2$, the sequence $(f_n)$ must converge weakly to $f$ in $L^2$. The $L^2$ norm is a lower semicontinuous (l.s.c.) function under the weak topology on $L^2$; thus
$$\|f\|_2 \le \liminf_n \|f_n\|_2. \tag{61.1}$$
143
A Few Results on Energy
VII, T62, T63
The $f_n$ will converge strongly to $f$ in $L^2$ if and only if
$$\|f\|_2 = \lim_n \|f_n\|_2. \tag{61.2}$$
This condition is clearly necessary. Conversely, if it is satisfied,
$$\lim_n E[(f - f_n)^2] = E[f^2] + \lim_n E[f_n^2] - 2\lim_n E[ff_n] = 0.$$
Here is a consequence of these properties.
T62 THEOREM Let $(X_t)$ be a potential which is the upper envelope of an increasing sequence of potentials $(X^n_t)$. We then have the inequality
$$e[(X_t)] \le \liminf_n e[(X^n_t)]. \tag{62.1}$$
Proof It suffices to establish this inequality in the case where the right side is finite. We can then suppose [by extracting a subsequence from the sequence $(X^n_t)$ if necessary] that the energies $e[(X^n_t)]$ are all finite, and converge as $n$ tends to infinity. Each potential $(X^n_t)$ then belongs to the class (D), and is generated by a natural, integrable, increasing process $(A^n_t)$. Let $A$ be a weak cluster point in $L^2$ of the sequence $(A^n_\infty)$; $A$ does exist, since the expectations $E[(A^n_\infty)^2]$ are uniformly bounded. The relation $X^n_t \le E[A^n_\infty \mid \mathcal{F}_t]$ becomes in the limit $X_t \le E[A \mid \mathcal{F}_t]$ a.s. It then follows immediately that the potential $(X_t)$ belongs to the class (D), and is hence generated by a natural, integrable, increasing process $(A_t)$. It follows from T30 that $A_\infty = \lim_n A^n_\infty$ in the weak topology of $L^1$, and relation (62.1) is then an immediate consequence of No. 61. Inequality (62.1) cannot always be replaced by an equality, as we shall see in No. 67. This can be done, however, in two very important cases, which are the object of the following two theorems.
T63 THEOREM Let $(X_t)$ be a regular potential of the class (D), which is the upper envelope of an increasing sequence of potentials $(X^n_t)$. We then have
$$e[(X_t)] = \lim_n e[(X^n_t)]. \tag{63.1}$$
Proof This equality is trivial, from (62.1), when $e[(X_t)] = \infty$. We can thus limit ourselves to the case where $(X_t)$ [and hence also each $(X^n_t)$, from T60] is of finite energy. Let $(A_t)$ and $(A^n_t)$ be the natural, integrable, increasing processes which generate, respectively, $(X_t)$ and $(X^n_t)$. We show that
$$\lim_n E[(A_\infty - A^n_\infty)^2] = 0, \tag{63.2}$$
which evidently implies (63.1), and which is equivalent to it from the remarks in No. 61. We have already established this equality in No. 36, in the case where the process (Xt ) was bounded, and we refer to the proof of T36, which remains valid (with only insignificant changes) up to the point where one is trying to dominate the integral (36.6),
Note first that the process $(A_t)$ is continuous; this integral is thus increased by replacing the interval of integration by $]T^n_\varepsilon,\infty[$, which removes a negative term. Denote then by $(B_t)$ [respectively, $(B^n_t)$] the integrable increasing process defined by
$$B_t(\omega)\ [\text{respectively, } B^n_t(\omega)] = \begin{cases} 0 & \text{for } t < T^n_\varepsilon(\omega) \\ A_t(\omega) - A_{T^n_\varepsilon}(\omega)\ [\text{respectively, } A^n_t(\omega) - A^n_{T^n_\varepsilon}(\omega)] & \text{for } t \ge T^n_\varepsilon(\omega). \end{cases}$$
These processes are natural from T48, and a simple calculation yields the potential $(Z_t)$ generated by $(B_t)$,
I~t]I{t 1) is a natural increasing process with a single jump of the preceding type. * Set (B t) = 2~=o (An, (C t) = 2:=n+l (An, and denote by (Yt), (Zt) the potentials • See No. 50,(a).
generated, respectively, by (B t ), (C t ). We can also construct the processes (B~), (C~) associated with (B t ) and (C t ) as in No. 28, and consider their potentials (Y:), (Z:). Finally, let e be a number >0. We can choose n large enough so that E[C;,] ~ e, and then h small enough so that
E[(B<Xl - B~)2]
<e
from T63 and the above. We tqe,n have, from T60~ I '\:~ ;'",;/
'
'-.\ --'~ ( "
..
t. t"
"\'
2 ] ". FE"(Ch )2]' , 0,
there exists a number
'Yj
> 0 such
that the relation A E ~, P(A) < 'Yj, implies Q(A) < B.
(9.2)
The absence of such an 'Yj would, in fact, imply the existence of a sequence (An)nEN of elements of~, such that 1n P(A n) < 00 and Q(A n) > B for every n. Set/A = lim n sup lA.. ; we have A c U ~1> An for every p, hence P(A) < 1:=1> P(A n), and finally peA) = O. On the other hand, Q(A) > lim n sup Q(A n) > B, from Fatou's lemma (applied to the sets Q"'A n). This contradicts the absolute continuity of Q with respect to P. We show next that the martingale (Xn ) is uniformly integrable. It is clear that (9.3)
On the other hand, (9.4)
The left side of (9.4) is thus less than the number 'Yj of (9.2) whenever c is large enough, and therefore the integral in (9.3) is smaller than B, from (9.2), independently of n.
VIII, 10, 11
Applications of Martingale Theory
154
The martingale (Xn ) thus converges to a limit when n -+ 00 in the Ll norm. This limit is evidently a Radon-Nikodym density of the restriction of Q to ~, with respect to the restriction of P to ~. The Radon-Nikodym theorem thus holds for every separable a-field. For each separable sub-a-field ~ of :F, denote by X 0, there exists a stopping time T such that
(T(m),m)
E
A for every m such that T(m)
< 00;
P{T < oo} ~ P(C) - e.
If A
(21.1) (21.2)
belongs to .r(J'), it can moreover be assumed that T is accessible.
Proof The indicator function of A is a well-measurable process; there thus exists, from T20, a set BE .r(ef') and a sequence (Tn ) of totally inaccessible stopping times, such that n
If A belongs to .r(ef'), we just take B = A. Set: Rn(m) = {
Tn ( m) if
+ 00
(Tn ( m), m)
belongs to A
otherwise.
The set [Rn] is well-measurable, which implies that Rn is a stopping time.
163
VIII, D22
Square-Integrable Martingales
Denote by J; the collection of finite unions of elements of J'. The debut (IV.D5l) of every element of is the lower bound of a finite number of accessible stopping times, and hence is accessible from VII.T43(a). The paving J;lJ is closed under (U f, c). The at every wE Q is a compact set in R+, and the cross section of every element H of debut of H, being the limit of an increasing sequence of accessible stopping times, is accessible from VII.T43(c). Denote by p* the "outer probability" associated with P [P*(U) = infyEoF P(V) for every
J;
n
J;
Y::>U
U c Q]; we saw in No. I1I.24 that p* is a Choquet ~-capacity. Let 'TT' be the projection of R+ x Q onto Q; for every subset H of R+ x Q set I(H) = P*('TT'(H». This set function is a capacity with respect to the paving J;lJ: Properties III.18(a) and (b) are clear, and property III.18(c) is an easy consequence of III.T6. Since the set B is J'analytic from T18, the Choquet capacitability theorem implies the existence of an element J of J;lJ such that e J c Band I(J) > I(B) -
2.
Let S be the debut of J; we have (S(w),w) EJ for every w such that S(w) < 00. Since the stopping time S is accessible, we also have (S(w),w) E A for every w such that S(w) < 00. This settles the case where A E :T(J'). To deal with the general case, set R(w) = R 1(w) A R 2(w) A ••• A Riw),
where the integer p is chosen large enough so that
p{w: R(w)
=
00, i~f R.(w) < 00) < ~.
Again we have (R(w),w) E A for every w such that R(w) this stopping time satisfies property (21.1) and we have
< 00.
Finally, set T = R A S;
P[~{T< oo}] < P[B"{S < oo}] + P[{i~f R. < OO)"{R < oo}] < eo

3. Square-Integrable Martingales

The martingales, stopping times, etc., we consider in this section are always relative to a family of σ-fields $(\mathcal{F}_t)$, which satisfies the same hypotheses as in the preceding section (No. 13).*
D22 DEFINITION We say that a right-continuous martingale $(X_t)$ is square-integrable if we have
$$\sup_t E[X_t^2] < \infty. \tag{22.1}$$
This amounts to saying that $(X_t)$ is a uniformly integrable martingale of the form $(E[Y \mid \mathcal{F}_t])$, where $Y$ belongs to $\mathcal{L}^2$. We then have $E[X_T^2] \le E[Y^2]$ for every stopping time $T$, from Jensen's inequality. The theory we develop extends to right-continuous martingales $(X_t)$ such that $X_t$ belongs to $\mathcal{L}^2$ for every $t$. This immediate generalization is left to the reader.
* The results of this section are taken from Meyer (97).
23 Let $(X_t)$ be a square-integrable martingale, and let $(M_t)$ be a right-continuous modification of the martingale $(E[X_\infty^2 \mid \mathcal{F}_t])$. The submartingale $(X_t^2)$ is dominated by $(M_t)$, and thus belongs to the class (D) (VI.T19). The process $(M_t - X_t^2)$ is hence a potential of the class (D), generated from VII.T29 by a unique integrable natural increasing process $(A_t)$. We say that $(A_t)$ is the increasing process associated with the martingale $(X_t)$.* Since the process $(X_t^2 - A_t)$ is a right-continuous martingale we have, for every pair of stopping times $S$, $T$ such that $S \le T$,
$$E[A_T - A_S \mid \mathcal{F}_S] = E[X_T^2 - X_S^2 \mid \mathcal{F}_S] = E[(X_T - X_S)^2 \mid \mathcal{F}_S] \quad \text{a.s.}$$
D24 DEFINITION We say that a square-integrable martingale $(X_t)$ is quasi-left-continuous if we have
$$X_{\lim_n T_n} = \lim_n X_{T_n} \quad \text{a.s.} \tag{24.1}$$
for every increasing sequence $(T_n)_{n \in \mathbf{N}}$ of stopping times.
Every stopping time $T$ such that $X_T(\omega) \ne X_{T-}(\omega)$ a.s. on the set $\{T < \infty\}$ is then totally inaccessible (VII.D42).† Conversely, the reader can verify that this property implies the quasi left continuity of $(X_t)$.
T25 THEOREM The increasing process $(A_t)$ associated with $(X_t)$ is continuous if and only if $(X_t)$ is quasi-left-continuous.
Proof The natural increasing process $(A_t)$ is continuous if and only if
$$\lim_n E[X_{T_n}^2] = E[X_T^2] \tag{25.1}$$
for every increasing sequence $(T_n)$ of stopping times which converges to a stopping time $T$ (cf. VII.T37). Since the random variables $X_{T_n}^2$ are uniformly integrable, this condition can be written
$$E[X_T^2] = E\bigl[(\lim_n X_{T_n})^2\bigr]. \tag{25.2}$$
Now we have $\lim_n X_{T_n} = E[X_T \mid \bigvee_n \mathcal{F}_{T_n}]$‡ (VI.T6). We thus have
$$E[X_T^2] = E\bigl[(\lim_n X_{T_n})^2\bigr] + E\bigl[(X_T - \lim_n X_{T_n})^2\bigr],$$
and we see that (25.2) is equivalent to the relation $X_T = \lim_n X_{T_n}$ a.s.

D26 DEFINITION We say that two square-integrable martingales $(X_t)$ and $(Y_t)$ are orthogonal if the process $(X_tY_t)$ is a martingale.

Suppose that $Y_0 = 0$; the martingales $(X_t)$ and $(Y_t)$ are then orthogonal if and only if $E[X_TY_T] = 0$ for every stopping time $T$. If the process $(X_tY_t)$ is a martingale, we have
$$E[X_TY_T] = E[X_0Y_0] = 0.$$
* This notion is mainly useful for the theory of stochastic integrals, which we do not develop here.
† Or a.s. infinite.
‡ Notation of No. VII.38.
Conversely, if this property is satisfied we have $E[X_{T_A}Y_{T_A}] = 0$* for every stopping time $T$ and every event $A \in \mathcal{F}_T$. This relation can also be written
$$\int_A X_TY_T\,dP + \int_{A^c} X_\infty Y_\infty\,dP = 0,$$
or even, since $E[X_\infty Y_\infty] = 0$,
$$\int_A X_TY_T\,dP = \int_A X_\infty Y_\infty\,dP,$$
or finally
$$E[X_\infty Y_\infty \mid \mathcal{F}_T] = X_TY_T \quad \text{a.s.}$$

T27 THEOREM Let $(X_t)$ and $(Y_t)$ be two square-integrable martingales, and let $(A_t)$ and $(B_t)$ be the increasing processes associated, respectively, with $(X_t)$ and $(Y_t)$. Then $(X_t)$ and $(Y_t)$ are orthogonal if and only if the increasing process associated with the martingale $(X_t + Y_t)$ is equal to $(A_t + B_t)$.

Proof This latter condition says again (since the increasing process $(A_t + B_t)$ is natural) that the process $(X_t + Y_t)^2 - (A_t + B_t)$ is a martingale. It then suffices to note that
$$(X_t + Y_t)^2 - (A_t + B_t) = (X_t^2 - A_t) + (Y_t^2 - B_t) + 2X_tY_t.$$
First decomposition of square-integrable martingales
We are going to decompose every square-integrable martingale into a quasi-left-continuous martingale and a martingale orthogonal to every quasi-left-continuous martingale. We say that two right-continuous martingales $(M_t)$ and $(N_t)$ have no common discontinuities if we have a.s. $N_t(\omega) = N_{t-}(\omega)$ for every $t \in \mathbf{R}_+$ such that $M_t(\omega) \ne M_{t-}(\omega)$. We begin with an auxiliary result.

T28 THEOREM Let $(S_n)_{n \in \mathbf{N}}$ be an increasing sequence of stopping times. Let $S = \lim_n S_n$ and denote by $U$ a square-integrable, $\mathcal{F}_S$-measurable random variable such that
(28.1)
then
(a) The process $(U_t) = (U I_{\{t \ge S\}})$ is a square-integrable martingale;
(b) $(U_t)$ is orthogonal to every square-integrable martingale $(M_t)$ which has no common discontinuity with $(U_t)$;
(c) The increasing process $(A_t)$ associated with $(U_t)$ is given by
(28.2) ($t \in \mathbf{R}_+$).
Proof It will suffice to establish (b), since (a) is then deduced by taking for (Mt) the martingale equal to 1. We have U 00 = UI{s T( co); we thus have
$X_{t_{i-1}}$ tends to $X_T - X_{T-}$ as the subdivision $(t_i)$ becomes arbitrarily fine, and to apply Fatou's lemma.
With this point established, consider as in No. 30 a sequence $(T_n)_{n \in \mathbf{N}}$ of stopping times, which "carries" all of the discontinuities of the martingale $(X_t)$. Since $(X_t)$ is quasi-left-continuous the $T_n$ can be supposed totally inaccessible (VII.T44). Set
$$Y^0 = X_{T_0} - X_{T_0-}, \qquad (A_t^0) = (Y^0 I_{\{t \ge T_0\}}), \qquad (U_t^0) = (A_t^0 - B_t^0),$$
where $(B_t^0)$ is the only difference of continuous integrable increasing processes such that $(A_t^0 - B_t^0)$ is a square-integrable martingale. Finally, let $(X_t^1) = (X_t - U_t^0)$.

Now, with the notation $(B_t^n)$ having the same significance as above relative to $(A_t^n)$, define inductively
$$(A_t^n) = (Y^n I_{\{t \ge T_n\}}), \qquad (U_t^n) = (A_t^n - B_t^n), \qquad (X_t^{n+1}) = (X_t^n - U_t^n).$$
The proof is then finished exactly as in No. 30; the martingales $(U_t^n)$, with no common discontinuities, are pairwise orthogonal; the martingales $(X_t^{n+1})$ and $(U_t^0 + \cdots + U_t^n)$ are orthogonal for the same reason. The series $\sum_n U_\infty^n$ converges in the quadratic mean to a random variable $Y$; the martingale $(Y_t) = (E[Y \mid \mathcal{F}_t])$ admits the jumps of $(X_t)$ as discontinuities, so that the martingale $(Z_t) = (X_t - Y_t)$ is continuous and orthogonal to $(Y_t)$. We leave the details of the proof to the reader.
Part C
ANALYTIC TOOLS OF POTENTIAL THEORY
CHAPTER
IX
Kernels and Resolvents
1. Kernels. Dispersions

D1 DEFINITION Let $(E,\mathcal{E})$ be a measurable space. A kernel on $(E,\mathcal{E})$ is a mapping
$$(x,A) \mapsto N(x,A)$$
of $E \times \mathcal{E}$ into $\overline{\mathbf{R}}_+$, which has the following properties:
(1) The mapping $x \mapsto N(x,A)$ is $\mathcal{E}$-measurable for every set $A \in \mathcal{E}$.
(2) The mapping $A \mapsto N(x,A)$ is completely additive for every $x \in E$.

The measure $A \mapsto N(x,A)$ will often be denoted by $N(x,dy)$.
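A simple way to picture Definition 1 is the discrete case. The following is an illustrative sketch only: the countable set $E$, the σ-field of all its subsets, and the matrix $n(x,y)$ are assumptions introduced for the example and do not come from the text.

```latex
% Illustrative example (assumption): E countable, \mathcal{E} = all subsets of E,
% n(x,y) an arbitrary matrix with entries in [0,+\infty].
\begin{align*}
N(x,A) &= \sum_{y \in A} n(x,y), & x \in E,\ A \subset E, \\
Nf(x) &= \int_E N(x,dy)\, f(y) = \sum_{y \in E} n(x,y)\, f(y), & f \ge 0 .
\end{align*}
% Property (1) is automatic, since every function on E is measurable here.
% Property (2) holds because a series of nonnegative terms can be regrouped
% freely: N(x, \bigcup_k A_k) = \sum_k N(x, A_k) for disjoint sets A_k.
```

Nothing in the definition bounds the entries: the choice $n(x,y) = +\infty$ for all $x$, $y$ gives a kernel with $N(x,A) = +\infty$ for every nonempty $A$ and $N(x,\emptyset) = 0$.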
2 The preceding definition in no way limits the "size" of kernels. For example, there exist kernels for which the function $N(x,A)$ takes only the values $0$ and $+\infty$. Definition 1 will therefore be completed by the following definitions: (a) A kernel $N$ is said to be sub-Markov (respectively, Markov) if $N(x,E)$
= (A, Vf)
Let .YE be the collection of bounded Borel functions g with the following properties: For every relatively compact open set A (a) The function x ~ Nx(gIA ) is Borel, and A-integrable. (b) The function gIA is p-integrable, and (p,gIA > = fEdA(X) Nx(gIA )·
The set .YE is evidently a vector space, closed under uniform convergence, and the limit of an increasing sequence of uniformly bounded positive elements of .YE still belongs to .YE. Let us prove that the set fC of all bounded, positive, l.s.c. functions is contained in .YE. If g belongs to fC, the function gIA is the upper envelope of a family, filtering to the right, of functions i: E fC}, and the function x ~ NxCgIA)' the upper envelope of the continuous functions Vi:, is hence l.s.c. and consequently Borel. On the other hand, by using again II.T36, we have
(p,g(,) =
s~p (p,!.> = s~p (Je, V!.> =
t
dJe( x)N.(g(,J
Properties (a) and (b) are thus verified. Since fC is closed under multiplication, it follows from I.T20 that .YE contains all bounded functions measurable with respect to the a-field generated by the l.s.c. functions, i.e., the Borel a-field.
Suppose now that $f$ is a bounded, universally measurable function with compact support. Let $f_1$ and $f_2$ be two bounded Borel functions with compact support such that /1 k). It thus follows from Lebesgue's theorem that
$$\langle \varepsilon_x N, N^\infty g \rangle = \lim_{n \to \infty} \langle \varepsilon_x N, N^n g \rangle = \lim_{n \to \infty} N^{n+1} g^x = N^\infty g^x.$$
The function $N^\infty g$ is hence invariant. Let $f$ be the excessive function $g - N^\infty g$. It is clear that $N^\infty f = 0$, and the preceding theorem implies that $f$ is the potential of the function $f - Nf$. Suppose that we have a decomposition of the form
$$g = f' + h,$$
where $f'$ is excessive and $h$ is invariant. Then
$$N^\infty g = N^\infty f' + h,$$
and thus we obtain $h \le N^\infty g$, with equality if and only if $N^\infty f' = 0$, i.e., if $f'$ is a potential. This implies the uniqueness of the decomposition of $g$ into an invariant function and a potential, and shows that $N^\infty g$ is the largest invariant minorant of $g$.
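The decomposition just obtained can be followed by hand on a very small example. The sketch below is an illustration, not part of the text: it takes for $E$ the nonnegative integers and for $N$ the shift kernel, and it assumes the conventions suggested by the surrounding argument, namely $G = \sum_{n \ge 0} N^n$, $N^\infty g = \lim_n N^n g$, "excessive" meaning positive with $Ng \le g$, and "invariant" meaning $Nh = h$.

```latex
% Illustrative example (assumption): E = {0,1,2,...}, N(x,.) = unit mass at x+1.
\begin{align*}
Ng(x) &= g(x+1), \qquad N^n g(x) = g(x+n). \\
\intertext{A finite positive $g$ is excessive iff it is decreasing. Put
$\ell = \lim_{n} g(n)$. Then}
N^\infty g(x) &= \lim_{n} g(x+n) = \ell
   \quad \text{(a constant, hence invariant)}, \\
f &= g - \ell, \qquad f - Nf = g(x) - g(x+1) \ge 0, \\
G(f - Nf)(x) &= \sum_{n \ge 0} \bigl( g(x+n) - g(x+n+1) \bigr)
   = g(x) - \ell = f(x).
\end{align*}
```

So $g = G(f - Nf) + \ell$ is the sum of a potential and an invariant function, and $\ell$ is indeed the largest constant below $g$.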
T20 COROLLARY Let $g$ be a finite potential (or more generally a potential such that $N^\infty g < \infty$ everywhere). Every excessive function dominated by $g$ is then a potential.
T21 THEOREM Suppose that the kernel $G$ is proper (No. 2). Every excessive function $f$ is then the limit of an increasing sequence of finite potentials.

Proof Since $E$ is the union of a sequence of sets with finite potentials, the function $G1$ is the limit of an increasing sequence of finite potentials $g_n$. Let $f_n = ng_n$: the potentials $f_n$ converge everywhere to $+\infty$. It then suffices to note that the functions $f_n \wedge f$ are potentials, which increase to $f$.

The reduite of an excessive function on a set

We begin by defining notation. Let $A$ be a measurable set and let $A'$ be its complement. We denote by $J_A$ (respectively, $J_{A'}$) the kernel defined by $J_Af = fI_A$ (respectively, $J_{A'}f = fI_{A'}$) for every measurable function $f$; by $N_A$ (respectively, $N_{A'}$) the kernel $NJ_A$ (respectively, $NJ_{A'}$); by $G_A$ (respectively, $G_{A'}$) the potential kernel associated with the kernel $N_A$ (respectively, $N_{A'}$).

T22 THEOREM Let $f$ be an excessive function. The collection of excessive functions that dominate $f$ on $A$ has a smallest element, equal to
$$J_Af + J_{A'}G_{A'}N_Af. \tag{22.1}$$
This function is called the reduite of $f$ on $A$.

Proof Denote by $H_A$ the kernel $J_A + J_{A'}G_{A'}N_A$. It is evident that $H_A = H_AJ_A$. Set $H_Af = g$. The inequality
k
JAf + 2,JA,N~.4:NAf HA! = g. Now h majorizes HAh; we thus have h > g, and g actually is the smallest excessive function that majorizes f on A.
23 Remarks (a) The potential of the function $g - Ng$ is at most equal to $g$, and it is equal to $g$ if $N^\infty g = 0$. This happens at least in the following two cases: (1) if the function $f$ is a finite potential, since then $N^\infty g \le N^\infty f = 0$; (2) if the potential of the function $J_Af$ is finite, since $GJ_Af \ge f$ on $A$, so that $GJ_Af \ge g$ everywhere, and consequently $N^\infty g \le N^\infty(GJ_Af) = 0$. Suppose in particular that the kernel $N$ is sub-Markov, and that the potential $G(I_A)$ is finite. The reduite of the function 1 on $A$ is then a potential, which is called the equilibrium potential of $A$.

(b) We are going to indicate a characterization of potentials by means of the reduite, analogous to the characterization most commonly used in classical potential theory. We suppose that the kernel $G$ is proper, and we retain the notation of the preceding paragraphs. Denote by $g$ an excessive function such that the function $Ng$ is finite. We can now show that $g$ is a potential if and only if $\lim_{n \to \infty} H_{A_n}g = 0$ for every decreasing sequence $(A_n)_{n \in \mathbf{N}}$ of measurable sets which has empty intersection.

Suppose in fact that $g$ is a potential, and put $H_{A_n}g = h_n$. The functions $h_n$ decrease as $n$ increases, and we have seen that $Nh_n = h_n$ on $\complement A_n$. Let $h = \lim_{n \to \infty} h_n$. We have $Nh = h$ from Lebesgue's theorem, and this implies the equality $h = 0$ from T19.

Conversely, suppose that $g$ is not a potential. There then exists a nonzero invariant function $h$, dominated by $g$. We are going to construct a decreasing sequence of sets $A_n$ such that $H_{A_n}h = h$ for every $n$, and $\bigcap_n A_n = \emptyset$. Consider first a set $A'$ such that the function $G_{A'}h$ is finite, and put $A = \complement A'$. We have $N_Ah + N_{A'}h = h$, and consequently
$$H_Ah = J_Ah + J_{A'}G_{A'}N_Ah = J_Ah + J_{A'}G_{A'}(h - N_{A'}h) = J_Ah + J_{A'}h = h.$$
Since the kernel $G$ is proper, we can choose an increasing sequence of measurable sets $B_n$ such that $G(I_{B_n}) < \infty$ for every $n$ and $\bigcup_n B_n = E$. Then let
$$A'_n = B_n \cap \{h < n\}, \qquad A_n = \complement A'_n.$$
We have for every $n$, $G_{A'_n}h \le nG(I_{B_n}) < \infty$. It follows from the preceding results that $H_{A_n}h = h$ for every $n$, and consequently $\lim_{n \to \infty} H_{A_n}g \ge h \ne 0$, while $\bigcap_n A_n = \emptyset$.
T24 THEOREM Let $h$ be a positive measurable function, equal to zero on the complement of $A$, and let $f$ be its potential $Gh$. Every excessive function that dominates $f$ on $A$ dominates it everywhere.

Proof Let $u$ be an excessive function that dominates $f$ on $A$, and let $v$ be the excessive function $u \wedge f$. We denote by $j$ the positive function equal to $v - Nv$ on the set $\{v < \infty\}$, and to $+\infty$ on the set $\{v = \infty\}$. The potential of $j$ is equal to $+\infty$ on the set $\{v = \infty\}$, and to $v - N^\infty v$ on $\{v < \infty\}$; it is thus everywhere at most equal to $v$ (T18). Apply the kernel $N$ to both sides of the inequality $v \le Gh$, yielding
$$Nv \le NGh,$$
from which, by adding $h$ to both sides, we obtain
$$h + Nv \le h + NGh = Gh.$$
We have $Gh^x = v(x)$ on $A$ and consequently, at every point $x$ of $A$,
$$h(x) \le v(x) - Nv^x = j(x).$$
n {j
O} c
A,
< Gj < v,
T25 COROLLARY (Domination principle) Let $g$ and $h$ be two positive measurable functions, and let $A$ be the set $\{h > 0\}$. The relation
$$Gg^x \ge Gh^x \quad \text{for every } x \in A$$
implies the inequality $Gg \ge Gh$.

The function 1 is excessive if the kernel $N$ is sub-Markov, and the same holds for every function of the form $a + Gg$, where $g$ is positive and measurable and where $a$ is a positive constant. We thus have the following result.

T26 COROLLARY (Complete maximum principle) Suppose that the kernel $N$ is sub-Markov. Let $g$ and $h$ be two positive measurable functions, $a$ a positive constant. The relation
$$a + Gg^x \ge Gh^x \quad \text{for every } x \text{ such that } h(x) > 0$$
implies the inequality $a + Gg \ge Gh$.
Remark These two "principles" are also satisfied for kernels proportional to $G$, i.e., for the elementary kernels of Deny (No. 17).

It is interesting to note that the excessive functions can be characterized without explicit mention of the kernel $N$.

T27 THEOREM Suppose that the kernel $G$ is proper. A positive measurable function $f$ is excessive if and only if the following property is satisfied: For every measurable function $h$ (not necessarily positive) with a potential $Gh$ that is well-defined and finite, the relation
$$f(x) \ge Gh^x \quad \text{for every } x \text{ such that } h(x) > 0$$
implies
$$f(x) \ge Gh^x \quad \text{for every } x \in E. \tag{27.1}$$
We postpone until later (Nos. 70 and 72) the proof of this theorem.

Excessive measures

The theory of excessive measures is, in general, easier than that of excessive functions, and we often put the emphasis on this latter theory, here and in all that follows. We suppose now that $E$ is a locally compact, σ-compact space and that $N$ is a diffusion-kernel on $E$. All of the measures we consider will be defined on the σ-field $\mathcal{B}(E)$, and will be positive. The results below are borrowed from article [(52)] of Deny, where a more general notion of "kernel" is used, which is more satisfactory for the theory of excessive measures.

D28 DEFINITION A Radon measure $\mu$ on $E$ is said to be excessive (respectively, invariant) with respect to the kernel $N$ if $\mu N \le \mu$ (respectively, $\mu N = \mu$). The measure $\mu N$ is then finite on compact sets, and is hence a Radon measure (No. 10). The following theorem corresponds to Theorem 16. Notice the disappearance of the countability restrictions.
T29 THEOREM (a) Let $\lambda$ and $\mu$ be two excessive measures, $\alpha$ and $\beta$ be two positive numbers. The measures $\alpha\lambda + \beta\mu$, $\lambda \wedge \mu$ are then excessive. (b) Let $\mu$ be a Radon measure, equal to the weak limit of a family $(\mu_i)_{i \in I}$ of excessive measures, which is filtering either to the right or to the left. The measure $\mu$ then is excessive.

Proof Statement (a) is obvious. To establish (b), we need only prove that $\langle \mu, Nf \rangle \le \langle \mu, f \rangle$ for every function $f \in \mathcal{C}_K^+$. Now we have $\langle \mu_i, Nf \rangle \le \langle \mu_i, f \rangle$, and $\langle \mu, f \rangle = \lim_i \langle \mu_i, f \rangle$. The point to establish thus is the relation:
$$\langle \mu, Nf \rangle \le \lim_i \langle \mu_i, Nf \rangle. \tag{29.1}$$
This is obvious if the family is filtering to the left. Assume it is filtering to the right, and denote by $h_i$ a density of $\mu_i$ with respect to $\mu$. Let $U$ be any relatively compact open set: The functions $h_iI_U$ increase with $i$, their integrals remain bounded, and they thus converge in the space $L^1(\mu)$. The relation $\lim_i \mu_i(g) = \mu(g)$ holds for any function $g \in \mathcal{C}_K^+$ that has its support in $U$; the $L^1$ limit of the functions $h_iI_U$ thus is equal to $I_U$, and we get
$$\mu(f) = \sup_i \mu_i(f)$$
for every universally measurable function $f$ that is positive, bounded, and equal to 0 on the complement of $U$. This relation now extends, by an increasing passage to the limit, to all universally measurable positive functions, and (29.1) follows.

30 The potential kernel $G$ is not necessarily a diffusion. We shall say that a Radon measure $\mu$ belongs to the domain of $G$ if the measure $\mu G$ is finite on compact sets (it is then a Radon measure). If $\mu$ is excessive, we set $\mu N^\infty = \lim_n \mu N^n$.

T31 THEOREM (Riesz decomposition) Let $\mu$ be an excessive measure. Then $\mu$ can be written uniquely as the sum of an invariant measure and of a potential. To be precise, the measure $\mu - \mu N$ belongs to the domain of $G$, the measure $\mu N^\infty = \lim_n \mu N^n$ is invariant, and
$$\mu = (\mu - \mu N)G + \mu N^\infty.$$
The proof of this theorem is identical to that of Theorem 19. It can also be verified, as in No. 20, that every excessive measure dominated by a potential $\mu G$ (where $\mu$ belongs to the domain of $G$) is a potential.

The following theorem is obvious. We mention it only for the sake of its name.

T32 THEOREM ("Principle of the uniqueness of masses") Let $\lambda$ and $\mu$ be two measures belonging to the domain of $G$. The relation $\lambda G = \mu G$ then implies $\lambda = \mu$.

Proof We have in fact $\lambda + \lambda GN = \lambda G = \mu G = \mu + \mu GN$, and the measures $\lambda GN$ and $\mu GN$ are two equal Radon measures.

The following theorem corresponds to Theorem 22 and uses the same notation. We will not give a proof for it.

T33 THEOREM Let $\mu$ be an excessive measure. The collection of excessive measures that dominate $\mu$ on $A$ has a smallest element $\mu'$, equal to
This measure could be called the "reduite" of $\mu$ on $A$, but in general it isn't. It can be verified, as in No. 22, that the measures $\mu'$ and $\mu'N$ are equal on $A'$. Suppose, in particular, that $\mu$ is a potential $\lambda G$. The measure $\mu'$, since it is dominated by $\mu$, is the potential of a well-determined measure $\lambda'$, which is called the balayée* of $\lambda$ on $A$. We have $\lambda' = \mu' - \mu'N$, so that $\lambda'$ is carried by $A$. The statement analogous to Theorem 24 is true for excessive measures (the proof carries over without change). Let then $\lambda''$ be a second measure carried by $A$, whose potential coincides with $\lambda G$ on $A$. The potentials $\lambda'G$ and $\lambda''G$ have the same restriction to $A$. They are thus equal, and hence $\lambda' = \lambda''$. We therefore have the following theorem.

T34 THEOREM (Principle of balayage) Let $\lambda$ be a measure that belongs to the domain of $G$, and let $A$ be a universally measurable set. There exists a unique measure $\lambda'$ with the following properties:
(1) $\lambda'$ is carried by $A$.
(2) $\lambda'G \le \lambda G$, and these two potentials have the same restriction to $A$.

* "Swept-out measure."

Appendix: Connections with Martingale Theory

35 The analogies between potential theory and martingale theory can perhaps be illuminated by the following remarks. Let $(\Omega,\mathcal{F},P)$ be a complete probability space, and let $(\mathcal{F}_n)_{n \in \mathbf{N}}$ be an increasing family of sub-σ-fields of $\mathcal{F}$. Denote by $E$ the set $\mathbf{N} \times \Omega$, and by $\mathcal{E}$ the σ-field on $E$ consisting of the subsets of the form
$$\bigcup_{n \in \mathbf{N}} \{n\} \times A_n,$$
where each set $A_n$ is $\mathcal{F}_n$-measurable. The $\mathcal{E}$-measurable mappings from $E$ into $\mathbf{R}$ are then of the form
$$(n,\omega) \mapsto X_n(\omega),$$
where each partial mapping $X_n$ is $\mathcal{F}_n$-measurable. The definition of stochastic processes adapted to the family $(\mathcal{F}_n)$ is thus recovered. Introduce on the set of these processes the equivalence relation $\sim$ defined by
$$(X_n)_{n \in \mathbf{N}} \sim (Y_n)_{n \in \mathbf{N}} \quad \text{if and only if } X_n = Y_n \text{ a.s. for every } n \in \mathbf{N}.$$
Let $X = (X_n)_{n \in \mathbf{N}}$ be a process with positive values adapted to the family $(\mathcal{F}_n)$. For each $n$, denote by $Y_n$ a version of the generalized conditional expectation of $X_{n+1}$ with respect to $\mathcal{F}_n$, and define
$$NX = (Y_n)_{n \in \mathbf{N}}.$$
The mapping $N$ is not well defined, due to the indeterminacy in the choice of conditional expectations, but by passing to the quotient by the equivalence relation $\sim$ one can obtain a mapping that formally has all the properties of a sub-Markov kernel, in particular the behavior under passage to a monotone limit. The "excessive (respectively, invariant) functions" with respect to the "kernel" $N$ are then the equivalence classes of generalized supermartingales (respectively, martingales), and all of the elementary theory we have developed carries over without difficulty. Naturally, no truly important theorem on supermartingales is obtained by this method. In particular, the fundamental theorems on the behavior of paths have no parallel in potential theory.
3. Semigroups and Resolvents

The results of this section are not deep, but they are very useful technical tools. They come mostly from Hunt's papers (78).

Semigroups of kernels

D36 DEFINITION Let $(E,\mathcal{E})$ be a measurable space. A family $(N_t)_{t \in \mathbf{R}_+}$ [respectively $(N_t)_{t > 0}$] of kernels on $(E,\mathcal{E})$ is said to be a semigroup of kernels (respectively, a semigroup in the broad sense) if the relation
$$N_{s+t} = N_sN_t$$
holds for every pair $(s,t)$ of numbers $\ge 0$ (respectively, $> 0$). The semigroup is said to be sub-Markov (Markov) if all of the kernels $N_t$ are sub-Markov (Markov).

A semigroup of dispersions on a locally compact space constitutes a particular case of this definition. A semigroup in the broad sense can always be transformed into a true semigroup. It suffices to set $N_0 = I$ (the identity kernel). We prefer, however, to maintain the distinction between these two types of semigroups. Let $(N_t)$ be a semigroup of kernels. It is possible to produce from it new semigroups of kernels $(N_t^p)$ (where $p$ denotes a number $> 0$) by setting
$$N_t^p = e^{-pt}N_t.$$
These semigroups are sometimes better behaved than $(N_t)$ itself.

Supermedian and excessive functions

D37 DEFINITION Let $(N_t)$ be a semigroup in the broad sense on $(E,\mathcal{E})$. A positive function $f$ defined on $E$ is said to be p-supermedian ($p \ge 0$) with respect to the semigroup $(N_t)$ if $f$ is $\mathcal{E}$-measurable and
$$e^{-pt}N_tf \le f \quad \text{for every } t > 0. \tag{37.1}$$
The function $f$ is said to be p-excessive if, moreover,
$$\lim_{t \to 0} e^{-pt}N_tf = f. \tag{37.2}$$
The function $f$ is said to be p-invariant if $f$ is everywhere finite and if
$$e^{-pt}N_tf = f \quad \text{for every } t > 0.$$

38 Remarks (a) Functions that are 0-supermedian (0-excessive) are called simply supermedian (excessive).
(b) The functions that are p-supermedian (p-excessive) with respect to the semigroup $(N_t)$ are identical to the supermedian (excessive) functions with respect to the semigroup $(N_t^p)$.
(c) A p-supermedian (p-excessive) function is also q-supermedian (q-excessive) for every $q > p$. One can thus say that "the larger $p$ is, the more p-supermedian (p-excessive) functions there are."
(d) Relation (37.1) implies that the function $t \mapsto e^{-pt}N_tf^x$ is decreasing for every $x \in E$. Moreover, if condition (37.2) is satisfied, this function is right continuous from Lebesgue's theorem.
(e) Let $f$ be a p-supermedian (p-excessive) function. The function $N_tf$ is then p-supermedian (p-excessive) for every $t > 0$.
(f) Let $(f_n)_{n \in \mathbf{N}}$ be a sequence of p-supermedian functions. It follows immediately from Fatou's lemma that the function $f = \liminf_n f_n$ is also p-supermedian. Suppose that the sequence is increasing, and that the functions $f_n$ are p-excessive. We then have
$$f = \sup_n f_n = \sup_n \sup_t e^{-pt}N_tf_n = \sup_t \sup_n e^{-pt}N_tf_n = \sup_t e^{-pt}N_tf,$$
so that $f$ is p-excessive.
(g) Supermedian or excessive measures are defined similarly. The theory of excessive measures cannot be developed satisfactorily under our current hypotheses. We shall see, in turn, that the theory of excessive measures becomes very simple (much simpler than that of excessive functions) when suitable hypotheses are made on the semigroup $(N_t)$.

Resolvents
39 Let $(N_t)$ be a semigroup in the broad sense. We say that $(N_t)$ is a measurable semigroup if the function
$$(t,x) \mapsto N_t(x,f)$$
is measurable (with respect to the natural product σ-field on $]0,\infty[ \times E$) for every positive $\mathcal{E}$-measurable function $f$. Then, for every number $p \ge 0$, define
$$V_p(x,A) = \int_0^\infty e^{-pt}N_t(x,A)\,dt \qquad (A \in \mathcal{E}). \tag{39.1}$$
It is clear that the mapping $(x,A) \mapsto V_p(x,A)$ is a kernel $V_p$, and that the notation $V_p(x,f)$ we have used is consistent. The family of kernels $(V_p)_{p > 0}$ is called the resolvent of the semigroup $(N_t)$, and the kernel $V = V_0$ is called the potential of the semigroup $(N_t)$. More generally, let $\mu$ be a bounded measure on the half-line $\mathbf{R}_+$. Define a kernel $N_\mu$ by the relation
$$N_\mu f = \int_0^\infty N_tf\,d\mu(t). \tag{39.2}$$
The kernels $N_t$ are of this form (for $\mu = \varepsilon_t$), just as are the kernels $V_p$ ($\mu$ is then the measure with density $e^{-pt}$ on $\mathbf{R}_+$, which we denote by $e_p$). It is easy to verify that
$$N_\lambda N_\mu = N_{\lambda * \mu} = N_{\mu * \lambda} = N_\mu N_\lambda, \tag{39.3}$$
the symbol $*$ denoting convolution. The formula
$$e_q + (q - p)\,e_q * e_p = e_p,$$
where $q$ and $p$ are two numbers such that $q > p > 0$, then gives us the fundamental formula
$$V_q + (q - p)V_qV_p = V_p, \tag{39.4}$$
which is known as the resolvent equation. Now we are going to forget measurable semigroups for a moment and study the families of kernels which satisfy (39.4) for their own sake.

D40 DEFINITION A resolvent on a measurable space $(E,\mathcal{E})$ is a family $(V_p)_{p > 0}$ of kernels on $(E,\mathcal{E})$, such that
$$V_pV_q = V_qV_p \quad \text{and} \quad V_q + (q - p)V_qV_p = V_p \tag{40.1}$$
for every pair of numbers $p$, $q$ such that $q > p > 0$. A resolvent $(V_p)$ is said to be proper (respectively, sub-Markov, Markov) if the kernels $V_p$ are all proper (respectively, if the kernels $pV_p$ are all sub-Markov, Markov).
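The convolution formula behind the resolvent equation (39.4) is a one-line integral, and the equation itself can be checked by hand in the simplest possible model. The sketch below is an illustration under stated assumptions: $e_p$ is identified with its density $t \mapsto e^{-pt}$ on $\mathbf{R}_+$, and the "one-point" resolvent $V_p = 1/(p+a)$ with a constant $a > 0$ is a hypothetical example, not taken from the text.

```latex
% (i) Convolution of the densities e^{-qt} and e^{-pt} on R_+, for q > p > 0:
\begin{align*}
(e_q * e_p)(t) &= \int_0^t e^{-qs}\, e^{-p(t-s)}\, ds
   = e^{-pt} \int_0^t e^{-(q-p)s}\, ds
   = \frac{e^{-pt} - e^{-qt}}{q-p}, \\
e_q(t) + (q-p)\,(e_q * e_p)(t) &= e^{-qt} + e^{-pt} - e^{-qt} = e^{-pt} = e_p(t).
\end{align*}
% (ii) Hypothetical scalar model: E reduced to one point, N_t = e^{-at}, a > 0,
%      so that V_p = \int_0^\infty e^{-pt} e^{-at}\, dt = 1/(p+a).
\begin{align*}
V_q + (q-p)\, V_q V_p
  &= \frac{1}{q+a} + \frac{q-p}{(q+a)(p+a)}
   = \frac{(p+a) + (q-p)}{(q+a)(p+a)}
   = \frac{1}{p+a} = V_p .
\end{align*}
```

In this model $pV_p = p/(p+a) \le 1$, so the resolvent is sub-Markov in the sense of Definition 40.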
We shall mainly be interested in sub-Markov resolvents in the rest of this book. The reader can simplify his task by supposing, for the rest of this section, that all the resolvents considered are sub-Markov.

41 Let $f$ be a positive measurable function. According to formula (40.1) the function $p \mapsto V_pf$ is decreasing. We can thus put
$$V_0f = Vf = \sup_p V_pf = \lim_{p \to 0} V_pf.$$
Let $(f_n)$ be a sequence of positive measurable functions that increases to $f$. We have
$$Vf = \sup_p V_pf = \sup_p \sup_n V_pf_n = \sup_n \sup_p V_pf_n = \sup_n Vf_n,$$
so that $V$ is a kernel. It is easily verified that
$$VV_p = V_pV; \qquad V = V_p + pVV_p \quad (p > 0).$$

D42 DEFINITION Let $f$ be a positive measurable function. The function $V_rf$ ($r > 0$) is called the r-potential of $f$. The function $Vf = V_0f$ is called the potential of $f$.

D43 DEFINITION We say that the resolvent $(V_p)$ is closed if the kernel $V$ is proper.

Suppose that $E$ is a locally compact, σ-compact space. We then say, in a slightly more precise sense, that the resolvent $(V_p)$ is closed if all of the kernels $V_p$ ($p \ge 0$) are dispersion kernels.

44 Let $(V_p)$ be a resolvent and let $r$ be a positive number. The family of kernels
$$V_p^r = V_{p+r} \qquad (p > 0)$$
is a new resolvent. Suppose that the resolvent $(V_p)$ is proper: The resolvent $(V_p^r)$ is then closed for every $r > 0$. Indeed, $V_0^rf = \lim_{p \to 0} V_{p+r}f \le V_rf$ for every positive measurable function $f$ (we shall see later that the inequality is, in fact, an equality). This property is the reason for the interest in the resolvents $(V_p^r)$. Suppose that the resolvent $(V_p)$ is associated with a measurable semigroup $(N_t)$ by formula (39.1). The resolvent $(V_p^r)$ is then associated with the semigroup $(N_t^r) = (e^{-rt}N_t)$.
Supermedian and excessive functions

We now define supermedian and excessive functions with respect to a resolvent. The connection between this definition and Definition 37 is examined later (No. 65).
D45 DEFINITION A positive measurable function $f$ defined on $E$ is said to be r-supermedian ($r \ge 0$) with respect to the resolvent $(V_p)$ if
$$pV_{p+r}f \le f \quad \text{for every } p > 0. \tag{45.1}$$
The function $f$ is said to be r-excessive if in addition
$$\lim_{p \to \infty} pV_{p+r}f = f. \tag{45.2}$$
The function $f$ is said to be r-invariant if $f$ is everywhere finite and if
$$pV_{p+r}f = f \quad \text{for every } p > 0.$$
The words "with respect to the resolvent $(V_p)$" will usually be omitted. Functions that are 0-excessive (0-supermedian, 0-invariant) will be called simply excessive (supermedian, invariant). The r-supermedian (r-excessive) functions with respect to the resolvent $(V_p)$ are identical with the supermedian (excessive) functions with respect to the resolvent $(V_p^r)$.

First properties

We suppose henceforth that the resolvent $(V_p)$ is proper.

T46 THEOREM (a) Let $f$ be a positive measurable function, and $x$ a point in $E$. The function $p \mapsto V_pf^x$ is then decreasing, right continuous, and continuous on every open interval where it is finite.
(b) Let $f$ be an r-supermedian function. The function $p \mapsto pV_{r+p}f^x$ is then increasing and continuous for every $x \in E$.
Proof The fact that the function $p \mapsto V_pf^x$ is decreasing follows immediately from the resolvent equation, and has already been used. Let $p_0$, $p$, $\varepsilon$ be three numbers such that $0 < p_0 < p$, $0 < \varepsilon < p - p_0$, and $V_{p_0}f^x < \infty$. We then have $V_pf^x = V_{p+\varepsilon}f^x + \varepsilon V_pV_{p+\varepsilon}f^x$ and $V_{p-\varepsilon}f^x = V_pf^x + \varepsilon V_pV_{p-\varepsilon}f^x$, where the quantities $\varepsilon V_pV_{p-\varepsilon}f^x$, $\varepsilon V_pV_{p+\varepsilon}f^x$ are dominated by $(p - p_0)V_pV_{p_0}f^x = V_{p_0}f^x - V_pf^x < \infty$. It then follows that the function $p \mapsto V_pf^x$ is continuous on the interval $]p_0,\infty[$. Consider next an arbitrary number $p > 0$; since the kernel $V_p$ is proper, $f$ is the limit of an increasing sequence of positive functions $f_n$ such that the functions $V_pf_n$ are finite. The function $q \mapsto V_qf^x$ is thus equal, on $]p,\infty[$, to the upper envelope of the continuous functions $q \mapsto V_qf_n^x$. It is therefore decreasing and lower semicontinuous (l.s.c.), and consequently right continuous. In summary, the function $p \mapsto V_pf^x$ is decreasing, and has at most one point of discontinuity $p_0$, to the right of which it is finite and continuous, and to the left of which it equals $+\infty$.

Suppose next that $f$ is r-supermedian, and let $p$ and $q$ be two numbers such that $0 < p < q$. We then have $pV_{r+p}f \le f$, and consequently, applying the kernel $(q - p)V_{r+q}$,
$$p(q - p)V_{r+q}V_{r+p}f \le (q - p)V_{r+q}f,$$
and
$$pV_{r+q}f + p(q - p)V_{r+q}V_{r+p}f \le pV_{r+q}f + (q - p)V_{r+q}f = qV_{r+q}f.$$
Since the left side is equal to $pV_{r+p}f$ from the resolvent equation, we see that the function $p \mapsto pV_{r+p}f$ is increasing. Now this function can have, from (a), only a single point of discontinuity $p_0$, to the left of which it equals $+\infty$, and to the right of which it is finite; this cannot happen for an increasing function, and it follows that the function $p \mapsto pV_{r+p}f^x$ is continuous.

T47 THEOREM Let $f$ be a positive measurable function. (a) The function $f$ is r-supermedian if and only if $f$ is s-supermedian for every $s > r$. (b) Suppose that $f$ is r-supermedian and that there exists a number $s > 0$ such that $f$ is s-excessive. The function $f$ is then r-excessive.
Proof The relation $pV_{r+p}f \le f$ implies $pV_{s+p}f \le f$ for every $s > r$ according to T46(a). Conversely, the relation $pV_{s+p}f \le f$ for every $s > r$ implies $pV_{r+p}f \le f$ from the right continuity of the function $s \mapsto V_{s+p}f$. Let $r$ and $s$ be two positive numbers. The equalities
$$\lim_{p \to \infty} pV_{r+p}f = f \quad \text{and} \quad \lim_{p \to \infty} pV_{s+p}f = f$$
can be written, respectively,
$$\lim_{p \to \infty} (p - r)V_pf = f \quad \text{and} \quad \lim_{p \to \infty} (p - s)V_pf = f,$$
and consequently are equivalent, since the ratio $(p - r)/(p - s)$ tends to 1 as $p \to \infty$.
T48 THEOREM (a) Let $f$ and $g$ be two r-supermedian (respectively, r-excessive) functions, and let $\alpha$, $\beta$ be two positive constants. The function $\alpha f + \beta g$ is then r-supermedian (respectively, r-excessive). The function $f \wedge g$ is r-supermedian. (b) Let $(f_n)_{n \in \mathbf{N}}$ be a sequence of r-supermedian functions. The function $f = \liminf_{n \to \infty} f_n$ is then r-supermedian. (c) Let $(f_n)_{n \in \mathbf{N}}$ be an increasing sequence of r-excessive functions. The function $f = \lim_n f_n$ is then r-excessive.

Proof We have
$$pV_{r+p}(f \wedge g) \le (pV_{r+p}f) \wedge (pV_{r+p}g) \le f \wedge g.$$
This function is hence r-supermedian. In order to establish (b), we use Fatou's lemma, which implies that
$$pV_{r+p}(\liminf_n f_n) \le \liminf_n pV_{r+p}f_n \le \liminf_n f_n.$$
Under the hypothesis of (c), we have, from T46(b),
$$\lim_{p \to \infty} pV_{r+p}f = \sup_p pV_{r+p}f = \sup_p \sup_n pV_{r+p}f_n = \sup_n \sup_p pV_{r+p}f_n = \sup_n f_n = f,$$
so that $f$ is r-excessive.
T49 THEOREM Let $q$ and $r$ be two positive numbers, and $f$ an r-supermedian function. The function $V_qf$ is also r-supermedian.

Proof For every $p > 0$, we have
$$pV_{r+p}V_qf = V_q(pV_{r+p}f) \le V_qf.$$
Resolvent identities

T50 LEMMA Let $f$ be a positive measurable function. The function $V_rf$ is r-supermedian.

(We shall see later that this function, under very general conditions, is actually r-excessive.)

Proof This is an immediate consequence of the resolvent equation,
$$pV_{r+p}V_rf + V_{r+p}f = V_rf. \tag{50.1}$$
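For completeness, here is how (50.1) follows from the resolvent equation and why it gives the lemma. The sketch uses the resolvent equation in the form (39.4), $V_q + (q-p')V_qV_{p'} = V_{p'}$ for $q > p' > 0$, and the commutation of the kernels; the substitution $q = r + p$, $p' = r$ is the only step added here.

```latex
% Sketch: substitute q = r + p and p' = r in V_q + (q - p') V_q V_{p'} = V_{p'}.
\begin{align*}
V_{r+p} f + \bigl((r+p) - r\bigr)\, V_{r+p} V_r f &= V_r f
  && \text{i.e. (50.1)}, \\
p\, V_{r+p}(V_r f) &\le p\, V_{r+p} V_r f + V_{r+p} f = V_r f
  && \text{since } V_{r+p} f \ge 0 .
\end{align*}
% The second line is exactly condition (45.1) for the function V_r f.
```

The inequality is written so that it remains valid when $V_{r+p}f$ takes the value $+\infty$: no subtraction is needed.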
T51 LEMMA Let $f$ be a positive measurable function with a finite r-potential $V_rf$. The functions of the form
(51.1)
are then finite.

Proof Let $\varepsilon$ be a strictly positive number such that $r + \varepsilon < p_1, \ldots, r + \varepsilon < p_k$. The function (51.1) is dominated by
$$(V_{r+\varepsilon})^{k-1}V_rf = \frac{1}{\varepsilon^{k-1}}(\varepsilon V_{r+\varepsilon})^{k-1}V_rf$$
0) are finite. The function $r \mapsto V_rf$ is then infinitely differentiable on the interval $]0,\infty[$, and we have the relations
(52.1)
and
$$\frac{d^n}{dr^n}(rV_rf) = n!\,(-1)^{n+1}(V_r)^n(I - rV_r)f. \tag{52.2}$$

Proof From the resolvent equation and T46(a) we have
$$\lim_{q \to r} \frac{V_qf - V_rf}{q - r} = -\lim_{q \to r} V_qV_rf = -(V_r)^2f.$$
The two formulas are then established by induction: We leave the details of the proof to the reader.

The following lemmas will permit us to establish another important identity.

T53 LEMMA Let $r$ be a number $\ge 0$, and $h$ a finite, positive, measurable function such that all of the functions $V_{p+r}h$ are finite ($p > 0$). Suppose that there exists a $p > 0$ such that
$$pV_{p+r}h = h; \tag{53.1}$$
the function $h$ is then r-invariant.

Proof The equality
$$V_{q+r}h = V_{p+r}h + (p - q)V_{q+r}V_{p+r}h$$
holds for every $q < p$ (from the resolvent equation), and also for every $q > p$, since all of the functions that enter are finite. Replacing $V_{p+r}h$ by $h/p$ we obtain
$$V_{q+r}h = \frac{1}{p}h + \frac{p - q}{p}V_{q+r}h,$$
and consequently $qV_{q+r}h = h$ for every $q > 0$.
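T54 LEMMA Let f be a positive measurable function with finite r-potential ($r > 0$). The zero function is then the only r-invariant function dominated by $V_r f$.
Proof We have $V_r f = \lim_{\varepsilon\to 0} V_{r+\varepsilon} f$ (T46). Since the function $V_r f$ is finite, we can write
$$\lim_{\varepsilon\to 0} \varepsilon V_{r+\varepsilon} V_r f = \lim_{\varepsilon\to 0} (V_r f - V_{r+\varepsilon} f) = 0.$$
Let then h be an r-invariant function dominated by $V_r f$; we have
$$h = \varepsilon V_{r+\varepsilon} h = \lim_{\varepsilon\to 0} \varepsilon V_{r+\varepsilon} h \le \lim_{\varepsilon\to 0} \varepsilon V_{r+\varepsilon} V_r f = 0.$$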
The following theorem is stated only for the kernel V and for a closed resolvent. It extends to the kernels $V_r$ [consider the resolvents $(V^r_p)$ of No. 44] and then, as $r \to 0$, to the kernel V even when the resolvent $(V_p)$ is not closed.
T55 THEOREM * Suppose that the resolvent $(V_p)$ is closed, and let p be a number $> 0$. We have the identity:
$$pV = \sum_{n=1}^{\infty} (pV_p)^n. \qquad (55.1)$$
Proof Both sides being kernels, and the left-hand side being a proper kernel, it suffices to verify the equality
$$pVf = \sum_{n=1}^{\infty} (pV_p)^n f$$
for every positive measurable function f with finite potential $Vf$. It follows immediately from the resolvent equation that, for all $n > 0$,
$$pVf = pV_p[I + (pV_p) + \cdots + (pV_p)^{n-1}]f + (pV_p)^n pVf.$$
We thus need only show that
$$\lim_{n\to\infty} (pV_p)^n Vf = 0.$$
But these functions decrease when n increases, and are dominated by $Vf$ (see T49 and T50). The limit
$$h = \lim_{n\to\infty} (pV_p)^n Vf$$
thus exists and clearly satisfies the relation $pV_p h = h$ (Lebesgue's theorem). It is hence invariant from T53, and zero from T54.
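Identity (55.1) can be checked numerically in a finite model. The sketch below reuses a hypothetical two-point example in which $V_p = (pI - Q)^{-1}$ and $V = V_0 = (-Q)^{-1}$; the matrix `Q` is an assumption made for illustration only. It compares $pV$ with the partial sums of $\sum_n (pV_p)^n$.

```python
# Assumption (illustration only): V_p = (p*I - Q)^(-1) on a two-point space,
# with V = V_0 = (-Q)^(-1), for a hypothetical generator Q.
Q = [[-2.0, 1.0],
     [1.0, -3.0]]


def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(a, b):
    """Product of two 2x2 matrices (composition of kernels)."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def resolvent(p):
    """The kernel V_p = (p*I - Q)^(-1)."""
    return inv2([[p - Q[0][0], -Q[0][1]],
                 [-Q[1][0], p - Q[1][1]]])


def partial_sum(p, n_terms):
    """The kernel sum_{n=1}^{n_terms} (p V_p)^n."""
    step = [[p * x for x in row] for row in resolvent(p)]
    power = [[1.0, 0.0], [0.0, 1.0]]
    total = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(n_terms):
        power = matmul(power, step)
        total = [[t + q for t, q in zip(t_row, p_row)]
                 for t_row, p_row in zip(total, power)]
    return total


def series_error(p, n_terms):
    """Largest entry of |pV - partial sum|; it should shrink as n_terms grows."""
    target = [[p * x for x in row] for row in resolvent(0.0)]
    approx = partial_sum(p, n_terms)
    return max(abs(target[i][j] - approx[i][j])
               for i in range(2) for j in range(2))


for n in (5, 20, 60):
    print(n, series_error(1.0, n))
```

The remainder after n terms is $(pV_p)^n pV$, which is the quantity the proof shows tends to 0.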
A supplementary hypothesis
56 We suppose from now on that the resolvent $(V_p)$ satisfies the following hypothesis, which will be studied in more detail in No. 68.
There exists a number $s \ge 0$, and an increasing sequence of finite s-supermedian functions $g_n$ such that
$$\lim_{n\to\infty} g_n = +\infty. \qquad (56.1)$$
* This identity has been used by Deny (see No. 68).
We continue to suppose also that the resolvent is proper. Hypothesis (56.1) is satisfied in two very important cases: when the resolvent is sub-Markov (take $s = 0$ and $g_n = n$); when the kernels $V_p$ are strictly positive (i.e., when all of the measures $\varepsilon_x V_p$ are different from 0). Indeed, choose any $s > 0$; since the kernel $V_s$ is proper, the function 1 is equal to the limit of an increasing sequence $(h_n)$ of positive functions with finite s-potentials. Put $g_n = n V_s h_n$; the functions $g_n$ are s-supermedian (T50) and since the function $V_s 1$ is everywhere strictly positive we have
$$\lim_n g_n = \lim_n n V_s 1 = +\infty.$$
Here is an important consequence of this hypothesis.
T57 THEOREM Let f be a positive measurable function; the function $V_r f$ ($r \ge 0$) is then r-excessive.
Proof We have the relation $p V_{p+r} V_r f + V_{p+r} f = V_r f$. Suppose first that f is bounded by one of the functions $g_n$ of No. 56. We then have, for large enough p,
$$V_{p+r} f \le V_{p+r} g_n = \frac{1}{p+r-s}\,(p + r - s)\,V_{p+r} g_n \le \frac{1}{p+r-s}\, g_n,$$
and thus $\lim_{p\to\infty} V_{p+r} f = 0$, so that the function $V_r f$ is r-excessive. To treat the case where f is arbitrary it then suffices to note that the functions $V_r(f \wedge g_n)$ are r-excessive and to apply T48.
Regularization of supermedian functions
Let f be an r-supermedian function ($r > 0$). Since the function $p \mapsto p V_{p+r} f$ is increasing, we can put
$$\hat f = \lim_{p\to\infty} p V_{p+r} f.$$
This function is r-supermedian (T49 and T48) and dominated by f. Also $\hat f = \lim_{p\to\infty} p V_{p+s} f$ for every $s > 0$, so that $\hat f$ depends only on f, and not on r.
D58 DEFINITION The function $\hat f$ is called the regularization of the r-supermedian function f.
D59 DEFINITION Let A be a measurable set; A is said to be a set of potential zero if $V_p(I_A) = 0$ for every $p > 0$.
It suffices that $V_p(I_A) = 0$ for a single value of p. Indeed this implies that $V_q(I_A) = 0$ for every $q > p$ from T46(a), and for $q < p$ we have, from the resolvent equation,
$$V_q(I_A) = V_p(I_A) + (p - q) V_q V_p(I_A) = 0.$$
We employ in what follows the expression "almost everywhere," when it will not lead to ambiguity, as synonymous with the expression "except for the points of a set of potential zero." Let f and g be two positive measurable functions, equal almost everywhere. The potentials $V_p f$ and $V_p g$ are then equal for every $p > 0$. In particular, if f and g are r-excessive we have
$$f = \lim_{p\to\infty} p V_{p+r} f = \lim_{p\to\infty} p V_{p+r} g = g.$$
T60 THEOREM Let f be an r-supermedian function; the function $\hat f$ is then r-excessive, equal to f almost everywhere, and is the largest r-excessive function dominated by f.
Proof The function $p V_{r+p} f$ is r-supermedian (T49) and (r + p)-excessive (T57), and hence r-excessive (T47). It follows from T48(c) that the function $\hat f$ is r-excessive. Let g be an r-supermedian function dominated almost everywhere by f. We have $p V_{r+p} g \le p V_{r+p} f$ for every p, and consequently also $\hat g \le \hat f$. Thus $\hat f$ is, in particular, the largest r-excessive function dominated by f. It remains to show that we actually have $\hat f = f$ almost everywhere. To see this, consider a number t greater than r and the number s of No. 56. The functions $g_n$ of No. 56 and the function f are also t-supermedian, and $\hat f = \lim_{p\to\infty} p V_{p+t} f$. The functions $f_n = f \wedge g_n$ are t-supermedian and finite, so that $f = \sup_n f_n$ and
$$\hat f = \sup_p p V_{p+t} f = \sup_p \sup_n p V_{p+t} f_n = \sup_n \sup_p p V_{p+t} f_n = \sup_n \hat f_n.$$
We thus need only know that $\hat f_n = f_n$ almost everywhere or, since the functions $f_n$ are finite and dominate $\hat f_n$, that $V_t \hat f_n = V_t f_n$. But we have, from the relation $p V_{t+p} f_n$ [...]
D61 DEFINITION [...] 0). Then f is said to be purely r-excessive, or to be an r-potential, if there exists no r-invariant function dominated by f and distinct from 0.
The expression "f is an r-potential" by no means implies that there exists a positive function g such that $f = V_r g$; it is borrowed from classical potential theory in the unit disk, where the superharmonic functions that satisfy this condition are effectively Green potentials (of positive measures). We use rather the expression "f is purely r-excessive" in this chapter.
T62 THEOREM (Riesz decomposition) Let f be an r-excessive function such that the function
$$h = \lim_{p\to 0} p V_{p+r} f$$
is finite. The function h is then r-invariant and the function $f - h$ is purely r-excessive. This decomposition of f into an r-invariant function and a purely r-excessive function is unique.
Proof Since the function $p \mapsto p V_{p+r} f$ is increasing, and $p \mapsto V_{p+r} f$ is decreasing, the function h is finite if and only if all of the functions $V_{p+r} f$ are finite. Suppose then that h is finite. It follows from Lebesgue's theorem that
$$q V_{q+r} h = \lim_{p\to 0} pq\, V_{q+r} V_{p+r} f = \lim_{p\to 0} \frac{pq}{q - p}\,(V_{p+r} f - V_{q+r} f) = h.$$
Thus h is r-invariant. If $h'$ is an r-invariant function dominated by f we have
$$h' = p V_{p+r} h' \le p V_{p+r} f \quad\text{for every } p > 0,$$
and consequently $h' \le h$. Thus h is the largest r-invariant minorant of f. It then follows that the function $f - h$ is purely r-excessive (If it admitted a nonzero r-invariant minorant k,
$h + k$ would be a minorant of f, and larger than h.) We leave to the reader the uniqueness of the decomposition, which is easily proved.
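One possible way to fill in the uniqueness argument is sketched below. It is not the author's proof; it assumes for simplicity that f is finite, so that all the differences make sense, and it uses the names $h_1$, $g_1$, $k$, which are introduced here for illustration.

```latex
% Suppose f = h_1 + g_1 with h_1 r-invariant and g_1 purely r-excessive.
% Since h is the largest r-invariant minorant of f, we have h_1 \le h, and h is finite, so
k := h - h_1 \ge 0, \qquad
p V_{p+r} k = p V_{p+r} h - p V_{p+r} h_1 = h - h_1 = k \quad (p > 0).
% Moreover, since f - h \ge 0,
k = h - h_1 = g_1 - (f - h) \le g_1 .
% Thus k is an r-invariant function dominated by the purely r-excessive
% function g_1, hence k = 0, that is h_1 = h and g_1 = f - h.
```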
T63 COROLLARY Let f be an r-excessive function, such that the functions $V_{p+r} f$ are finite for every $p > 0$. The function f is purely r-excessive if and only if
$$\lim_{p\to 0} p V_{p+r} f = 0.$$
This applies in particular to a finite function f of the form $V_r g$ (T54). The following theorem is particularly useful.
T64 THEOREM (a) Every r-excessive function ($r > 0$) is the limit of an increasing sequence of finite r-potentials of positive functions. (b) If the resolvent $(V_p)$ is closed, then property (a) holds for $r = 0$ also. If the resolvent is sub-Markov, the r-potentials considered can moreover be supposed bounded.
Proof Property (a) can be deduced immediately from property (b) by replacing the resolvent $(V_p)$ by the resolvent $(V^r_p)$ of No. 44. We shall thus establish only (b), supposing that the resolvent is closed. The proof will be divided into several parts.
(1) Let f be a purely excessive finite function. Define
$$D_p f = p(f - p V_p f) \qquad (p > 0);$$
this function is positive. Since all of the functions $V_p f$ are finite for $p > 0$ (from the inequality $p V_p f$ [...] 0), and we can write
$$V_q D_p f = p(V_q f - p V_q V_p f) = p[(V_q f - (p - q) V_q V_p f) - q V_q V_p f] = p(V_p f - q V_q V_p f) \le p V_p f$$
[...] (d)] is adapted from Deny (52).
T68 THEOREM Let $(V_p)$ be a proper resolvent. The following statements are equivalent:
(a) Property (56.1) is satisfied for an $s > 0$;
(b) The set of nonpermanent points is of potential zero;
(c) Property (56.1) is satisfied for every $s > 0$, and for $s = 0$ if the resolvent is closed;
(d) Let f be a positive measurable function, and u a supermedian function such that
$$u(x) \ge Vf^x \quad\text{for every x such that } f(x) > 0.$$
Then we have $u \ge Vf$;
(e) Let f and g be two positive measurable functions such that
$$Vg^x \ge Vf^x \quad\text{at every point x such that } f(x) > 0.$$
We then have $Vg \ge Vf$ ("domination principle").
Proof We begin by establishing the equivalence of (a), (b), and (c). The set $E_0$ of nonpermanent points is the set of points where the supermedian function $+\infty$ differs from its regularization, so that (a) $\Rightarrow$ (b) from T60 [which is a consequence of (a)]. With the supposition that (b) is satisfied, equip the set $E_p$ of permanent points with the $\sigma$-field induced by $\mathcal{E}$ (the subscript in $E_p$ is a label and not the parameter of the resolvent). Since the set $E_0$ is negligible for every measure $\varepsilon_x V_p$, we can define a resolvent $(W_p)$ on $E_p$ by putting $W_p(x,f) = V_p(x,f')$ for every $p > 0$, every $x \in E_p$, and every positive measurable function f defined on $E_p$, $f'$ denoting any measurable extension of f to E. The kernels $W_p$ are then strictly positive on $E_p$, and this implies the existence for every $s > 0$ of an increasing sequence of functions $h_n$ defined on $E_p$, which are finite, s-supermedian with respect to the resolvent $(W_p)$, and which tend to $+\infty$ on $E_p$ (this point was established in No. 56). It then suffices to put
$$g_n(x) = \begin{cases} h_n(x) & \text{for } x \in E_p, \\ n & \text{for } x \in E_0 \end{cases}$$
to obtain s-supermedian functions with respect to $(V_p)$, which satisfy (56.1). Finally, the implication (c) $\Rightarrow$ (a) is clear. The reasoning of No. 56 shows that the functions $g_n$ exist also for $s = 0$ if the resolvent is closed.
The rest of the theorem will be established using the scheme (c) $\Rightarrow$ (d) $\Rightarrow$ (e) $\Rightarrow$ (b). It will suffice to prove the implication (c) $\Rightarrow$ (d) in the case where the resolvent is closed. Indeed, this implication will then be established for each of the resolvents $(V^r_p)$ ($r > 0$). Since the function u is r-supermedian for every $r > 0$, the relation
$$u(x) \ge Vf^x \ge V_r f^x \quad\text{for every x such that } f(x) > 0$$
will then imply
$$u \ge V_r f \quad\text{for every } r > 0,$$
and consequently also $u \ge V_0 f$, which is the desired result. Suppose then that the resolvent is closed, and denote by $(g_n)$ an increasing sequence of finite supermedian functions that tend to $+\infty$. Put $f_n = f \wedge g_n$. We have
$$u(x) \ge Vf_n^x \quad\text{for every x such that } f_n(x) > 0,$$
and it suffices to show that $u \ge Vf_n$ for every n. We shall use, to this end, the elementary domination principle of the preceding section (T24). Let p be a number $> 0$, and let N be the kernel $pV_p$. The potential kernel associated with N (in the sense of No. 17) is equal to $I + pV$ from the identity in No. 55. Every supermedian function with respect to the resolvent $(V_p)$ is excessive with respect to N (in the sense of No. 14). The relation
$$g_n(x) + pu(x) \ge f_n(x) + pVf_n(x) \quad\text{at every point x such that } f_n(x) > 0$$
hence implies, from T24, the relation
$$g_n + pu \ge f_n + pVf_n.$$
Since p is arbitrary and the functions $g_n$ and $f_n$ are finite, this implies $u \ge Vf_n$, and assertion (d) is established. The implication (d) $\Rightarrow$ (e) is clear. Finally, we establish the implication (e) $\Rightarrow$ (b). The function $V(I_{E_0})$ is zero at every point of $E_0$. We thus have
$$0 = V0^x \ge V(I_{E_0})^x \quad\text{at every point x such that } I_{E_0}(x) > 0.$$
Thus $V(I_{E_0}) = 0$ everywhere from (e), and this implies that $E_0$ is a set of potential zero.
The following statement is very useful.
T69 COROLLARY Properties (d) and (e) of the preceding statement are satisfied by every sub-Markov resolvent and by every proper resolvent $(V_p)$ with strictly positive kernels.
Let $(V_p)$ be a closed resolvent that satisfies the equivalent conditions of No. 68. It is sometimes of interest to know how to determine if a function g is supermedian with respect to the resolvent $(V_p)$, without having to form the functions $pV_p g$.
T70 THEOREM A positive measurable function g is supermedian if and only if the following property holds: For every measurable function h (not necessarily positive) with a well-defined and finite potential $Vh$, the relation
$$g(x) \ge Vh^x \quad\text{for every x such that } h(x) > 0 \qquad (70.1)$$
implies
$$g(x) \ge Vh^x \quad\text{for every } x \in E. \qquad (70.2)$$
Proof Suppose that g is supermedian. The relation (70.1) can also be written
$$g(x) + V(h^-)^x \ge V(h^+)^x \quad\text{for every x such that } h^+(x) > 0.$$
We then have, from T68(d), $g + V(h^-) \ge V(h^+)$, which is equivalent to (70.2). Conversely suppose that g satisfies the property in the statement. Denote by f a positive measurable function dominated by g and with finite potential $Vf$. The function $h = p(f - pV_p f)$ then admits a well-defined and finite potential, equal to $pV_p f$. Now we have $g \ge pV_p f = Vh$ on the set
$$\{g - pV_p f > 0\}$$
and a fortiori on the set $\{f - pV_p f > 0\} = \{h > 0\}$. We thus have $g \ge Vh = pV_p f$ everywhere. Since the kernel V is proper, g is equal to the limit of an increasing sequence of functions $f_n$ of the preceding type. We thus have $g \ge pV_p g$, which shows that g is supermedian.
The pseudo-reduite of a function
The notion we define now does not coincide with the classical notion of the reduite of f (which would be the lower envelope of the excessive functions that dominate f on A). This is why we call it the pseudo-reduite of f. It is in fact not certain that the following theorem has any usefulness, and we shall only outline its proof.
T71 THEOREM Let A be a measurable set and f a supermedian function with respect to the resolvent $(V_p)$. The collection of supermedian functions that dominate f on A has a smallest element, which we shall call the pseudo-reduite of f on A.
Proof For every $p > 0$, put $N_p = pV_p$, and denote by $g_p$ the reduite of f on A relative to the kernel $N_p$ (No. 22). The reader can easily verify the following facts:
(a) Every excessive function with respect to the kernel $N_p$ is excessive with respect to every kernel $N_q$ for $q < p$.
(b) When p increases, there are thus fewer and fewer excessive functions with respect to $N_p$, so that the reduite $g_p$ increases. Put $g = \lim_{p\to\infty} g_p$.
(c) The function g is supermedian with respect to the resolvent $(V_p)$. It is equal to f on A, and every supermedian function that dominates f on A dominates g everywhere.
It then follows that g is the desired pseudo-reduite of f.
Remark Suppose that A is a set of potential zero; the pseudo-reduite of f on A is then clearly equal to $f I_A$. There would be a very different result with the classical reduite.
Connections between Sections 2 and 3
Let N be a kernel, and let
$$G = \sum_{n=0}^{\infty} N^n$$
be the potential kernel associated with N. We are going to show that a resolvent $(V_p)$ can be constructed so that $G = V_0$, and that this resolvent is, in turn, associated with a semigroup of kernels. Put, for every number a in $]0,1]$,
$$G_a = \sum_{n=0}^{\infty} a^n N^n.$$
Let b be a number such that $0 < b < a$. Then
$$G_a G_b = G_b G_a = I + (a + b)N + (a^2 + ab + b^2)N^2 + \cdots,$$
and consequently
$$(a - b)G_b G_a + bG_b = (a - b)G_a G_b + bG_b = aG_a.$$
Put then, for every $p \ge 0$,
$$V_p = \frac{1}{p+1}\, G_{1/(p+1)}.$$
The kernels $V_p$ constitute a resolvent such that $V_0 = G$. Put, on the other hand, for every $t \ge 0$,
$$P_t = e^{-t}\Bigl(I + \frac{tN}{1!} + \frac{t^2 N^2}{2!} + \cdots\Bigr).$$
It is easily verified that the kernels $(P_t)$ constitute a measurable semigroup with resolvent $(V_p)$; this last point follows from the possibility of integrating the exponential series term by term. Suppose now, for simplicity, that the kernel G is proper (it could be supposed only that the kernels $G_a$ are proper for every $a > 0$). Since the kernel G is strictly positive, the resolvent $(V_p)$ satisfies the hypothesis of No. 56. Let A be a set of potential zero. The relation $GI_A \ge I_A$ shows that A is empty, and it follows (T60) that the supermedian functions with respect to the resolvent $(V_p)$ are excessive. Let f be an excessive function with respect to N. We have
$$pV_p f = \frac{p}{p+1}\Bigl(f + \frac{Nf}{p+1} + \frac{N^2 f}{(p+1)^2} + \cdots\Bigr) \le \frac{p}{p+1}\, f\,\Bigl(1 + \frac{1}{p+1} + \frac{1}{(p+1)^2} + \cdots\Bigr) = f,$$
so that f is supermedian (and hence excessive) with respect to $(V_p)$. Conversely, suppose that f is excessive with respect to $(V_p)$. Then f is the limit of an increasing sequence of potentials (T64(b)), and hence it is excessive with respect to N.
The resolvent equation is such a useful analytic tool that one could occasionally think of using it in the elementary situation of Section 2. Theorem 27, for example, reduces immediately to Theorem 70, whose proof by resolvents is very natural. We shall see later on other examples of the use of the resolvents $(V_p)$ associated with a kernel G.
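The construction of $(V_p)$ from the kernel N can be tried out on a small example. The following sketch uses a hypothetical strictly substochastic 2x2 matrix `N` (an assumption made for illustration) and truncated power series for $G_a$. It checks the resolvent equation for $V_p = \frac{1}{p+1} G_{1/(p+1)}$ and the inequality $pV_p f \le f$ for a function f that is excessive with respect to N.

```python
# Hypothetical substochastic kernel on a two-point space (illustration only).
N = [[0.2, 0.5],
     [0.3, 0.4]]
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]


def matmul(a, b):
    """Product of two 2x2 matrices (composition of kernels)."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(c, a):
    return [[c * x for x in row] for row in a]


def apply(kernel, f):
    """The function kernel(f), i.e. a matrix-vector product."""
    return [kernel[i][0] * f[0] + kernel[i][1] * f[1] for i in range(2)]


def G(a, n_terms=400):
    """Truncated series G_a = sum_{n >= 0} a^n N^n (n_terms terms)."""
    step = scale(a, N)
    term = IDENTITY
    total = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(n_terms):
        total = add(total, term)
        term = matmul(term, step)
    return total


def V(p):
    """V_p = G_{1/(p+1)} / (p+1); in particular V(0) approximates G."""
    return scale(1.0 / (p + 1.0), G(1.0 / (p + 1.0)))


def resolvent_defect(p, q):
    """Largest entry of |V_p - V_q - (q - p) V_p V_q|, expected to be ~0."""
    lhs = add(V(p), scale(-1.0, V(q)))
    rhs = scale(q - p, matmul(V(p), V(q)))
    return max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))


f = [1.0, 1.0]                      # N f = [0.7, 0.7] <= f, so f is N-excessive
print(resolvent_defect(0.5, 2.0))
print([2.0 * x for x in apply(V(2.0), f)])   # p V_p f with p = 2, below f
```

In this example f is an eigenvector of N with eigenvalue 0.7, so $pV_p f = \frac{p}{p + 0.3} f$, which makes the inequality easy to confirm by hand.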
CHAPTER X
Construction of Resolvents and Semigroups
We now study, following Hunt, this problem: Given a kernel V, which satisfies the complete maximum principle, does there exist a sub-Markov resolvent (Vp ) such that Vo = V? Is this resolvent associated with a semigroup? The answer to this question is only partially known, but what is known shows that all "nice" kernels of potential theory fit into Hunt's probabilistic theory. Only Nos. 14 and 16 are indispensable for understanding the following chapters. We consider only proper kernels in this chapter.
1. The Domination Principle
D1 DEFINITION Let V be a proper kernel on a measurable space $(E,\mathcal{E})$; V is said to satisfy the domination principle if for every pair (f,g) of positive measurable functions, the relation
$$Vf^x \ge Vg^x \quad\text{for every } x \in E \text{ such that } g(x) > 0$$
implies
$$Vf^x \ge Vg^x \quad\text{for every } x \in E.$$
IX.T25 and IX.T69 furnish examples of kernels that satisfy the domination principle.
D2 DEFINITION A kernel V is said to satisfy the complete maximum principle if for every constant $a \ge 0$ and for every pair (f,g) of positive measurable functions the relation
$$a + Vf^x \ge Vg^x \quad\text{for every x such that } g(x) > 0$$
implies
$$a + Vf^x \ge Vg^x \quad\text{for every } x \in E.$$
This principle clearly implies the domination principle. We have seen examples of kernels that satisfy the complete maximum principle in IX.26 and in IX.69 (the sub-Markov case).
3 Here is another, very useful, form of the complete maximum principle. Let f be a measurable function (not necessarily positive) such that $Vf$ makes sense, and suppose that the function $Vf$ takes a value $> 0$ at certain points. Let $P = \{x : f(x) > 0\}$. If V satisfies the complete maximum principle, we have
$$\sup_{x \in E} Vf^x = \sup_{x \in P} Vf^x. \qquad (3.1)$$
(This property is sometimes called the "weak principle of the positive maximum.") To establish (3.1), denote the right side by a; we have
$$a^+ + V(f^-)^x \ge V(f^+)^x$$
on the set
$$\{x : f^+(x) > 0\} = P,$$
and consequently $a^+ + V(f^-) \ge V(f^+)$ everywhere, so that $a^+ \ge Vf$. Since the function $Vf$ attains strictly positive values we have finally $a^+ > 0$, and hence $a^+ = a$. This establishes (3.1). Conversely, it is easy to see that property (3.1) implies (for a proper kernel) the complete maximum principle.
We shall be particularly interested in the case where E is a locally compact, $\sigma$-compact space given the $\sigma$-field of universally measurable sets, and where V is a continuous diffusion-kernel on E. The following theorem then allows us to simplify the verification of the domination principle. The strict positivity hypothesis made on V will be commented on in No. 5.
T4 THEOREM Suppose that V is a continuous and strictly positive diffusion-kernel, and that the relation
$$Vf^x \ge Vg^x \quad\text{for every x such that } g(x) > 0 \qquad (4.1)$$
implies, when f and g belong to $\mathcal{C}_K^+(E)$,
$$Vf^x \ge Vg^x \quad\text{for every } x \in E. \qquad (4.2)$$
The kernel V then satisfies the domination principle.
Proof Let f and g be two positive universally measurable functions such that
$$Vf^x \ge Vg^x \quad\text{for every x such that } g(x) > 0.$$
We shall show that $Vf \ge Vg$. Let $A_f$ be the set of l.s.c. (lower semicontinuous) functions that dominate f, and let $B_g$ be the set of bounded, positive, u.s.c. (upper semicontinuous) functions dominated by g. We have the following relations, which are immediate consequences of classical results from the theory of Radon measures:
$$Vf = \inf_{f' \in A_f} Vf', \qquad Vg = \sup_{g' \in B_g} Vg'.$$
We have, on the other hand,
$$V(x,g') = \int_{\{g' > 0\}} V(x,dy)\,g'(y) = \sup_{\substack{K \text{ compact} \\ K \subset \{g' > 0\}}} \int_K V(x,dy)\,g'(y).$$
It thus suffices to show that
$$Vf' \ge V(g' I_K)$$
for every function $f' \in A_f$, every function $g' \in B_g$, and every compact K contained in $\{g' > 0\}$. Let then $\varphi$ be a positive continuous function with compact support, such that
$$V\varphi^x > 0 \quad\text{for every } x \in K.$$
The existence of such a function is an immediate consequence of the Borel-Lebesgue theorem, since the kernel V is strictly positive. For every $\varepsilon > 0$ we have
$$V(f' + \varepsilon\varphi)^x > V(g' I_K)^x \quad\text{for every } x \in K.$$
Denote by C the set of functions $h' \in \mathcal{C}_K^+(E)$ dominated by $f' + \varepsilon\varphi$. The family of functions $Vh'$ ($h' \in C$) is filtering to the right, and admits $V(f' + \varepsilon\varphi)$ as its upper envelope; these functions are, on the other hand, continuous, whereas the function $V(g' I_K)$ is u.s.c. (cf. IX.10). Theorem 6 then implies the existence of a function $h' \in C$ such that
$$Vh'^x > V(g' I_K)^x \quad\text{for every } x \in K.$$
Since the function $Vh'$ is continuous, and the function $V(g' I_K)$ is u.s.c., there exists a compact neighborhood L of K such that
$$Vh'^x > V(g' I_K)^x \quad\text{for every } x \in L.$$
Denote by D the set of functions $j' \in \mathcal{C}_K^+(E)$ with support in L, which dominate $g' I_K$. Another application of Theorem 6, analogous to that above, shows that there exists a function $j' \in D$ such that $Vh'^x > Vj'^x$ for every $x \in L$. This inequality then holds for every x such that $j'(x) > 0$, and consequently also for every x, from (4.2). We thus have
$$V(f' + \varepsilon\varphi) \ge Vh' \ge Vj' \ge V(g' I_K),$$
which concludes the proof, since $\varepsilon$ was arbitrary.
5 Remarks (a) Let $E_p$ be the set of permanent points for the kernel V (see IX.D67). We have $E_p = \{x : V1^x > 0\}$. Since the function $V1$ is l.s.c., $E_p$ is open. Instead of supposing that V is strictly positive, suppose that the set $\complement E_p$ is of potential zero. For every positive Borel function f defined on $E_p$ and every $x \in E_p$ set $W(x,f) = V(x,f')$, where $f'$ denotes any Borel extension of f to E. The kernel W defined in this way on $E_p$ is continuous and strictly positive, and thus satisfies the domination principle if V has the property of the statement. It then follows that the kernel V itself satisfies the domination principle.
(b) Suppose that the continuous kernel V satisfies the complete maximum principle for the elements of $\mathcal{C}_K^+(E)$; it can then be shown, exactly as above, that V satisfies the complete maximum principle. It is not necessary to suppose that V is strictly positive: the function $Vf + \varepsilon V\varphi$ of the foregoing proof can be replaced by $Vf + \varepsilon$.
Here now is the topological lemma we have used in the course of this proof, and which we shall have occasion to use again. It is a very easy generalization of the classical Dini's lemma.
T6 THEOREM Let K be a compact space, f an l.s.c. function on K, and g a u.s.c. function on K such that $f(x) > g(x)$ for every $x \in K$. Let $\mathcal{H}$ be a set of continuous functions, filtering to the right, with upper envelope f; there then exists a function $h \in \mathcal{H}$ such that
$$f \ge h > g \quad\text{on } K.$$
Proof For each $x \in K$, choose a function $h_x \in \mathcal{H}$ such that $h_x(x) > g(x)$. Since g is a u.s.c. function and $h_x$ is continuous, we have $h_x(y) > g(y)$ for all points y in a neighborhood $V_x$ of x. There then exist a finite number of neighborhoods $V_{x_1}, \ldots, V_{x_n}$, which cover K, and it suffices to take for h an element of $\mathcal{H}$ that dominates the functions $h_{x_1}, \ldots, h_{x_n}$.
Some consequences of the domination principle
The lemmas we give now are borrowed from Deny (53).
T7 THEOREM Let V be a kernel that satisfies the domination principle, and let f and g be two finite, positive, measurable functions such that the functions $Vf$ and $Vg$ are finite. Then, if p denotes a constant $> 0$, the equality $f + pVf = g + pVg$ implies the equality $f = g$.
Proof Put $f' = f - (f \wedge g)$ and $g' = g - (f \wedge g)$; we have $f' \wedge g' = 0$ and $f' + pVf' = g' + pVg'$. Then $Vg'^x \ge Vf'^x$ at every point x such that $g'(x) = 0$ and consequently at every point x such that $f'(x) > 0$. The domination principle then implies the inequality $Vg' \ge Vf'$, but we can show in the same way $Vf' \ge Vg'$, so that $Vf' = Vg'$, hence $f' = g'$ and finally $f = g$.
T8 THEOREM Let V be a proper kernel which satisfies the domination principle; there then exists at most one resolvent $(V_p)$ such that $V_0 = V$.
Proof Let $(V_p)$ and $(W_p)$ be two resolvents such that $V_0 = W_0 = V$, and let f be a positive, finite, measurable function such that the function $Vf$ is finite. From the resolvent equation we have
$$(I + pV)V_p f = (I + pV)W_p f = Vf$$
for every $p > 0$. It then follows from the preceding theorem that $V_p f = W_p f$. Since the kernel V is proper, every positive measurable function g is the limit of an increasing sequence of functions of the same type as f. We thus have also $V_p g = W_p g$, and the theorem is established.
The following result will be pointed out without proof: if the kernel V satisfies the domination principle, so do all of the kernels $I + pV$ ($p > 0$). This result is clear when there exists a resolvent $(V_p)$ such that $V_0 = V$, from IX.T55 and IX.T25.
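The identity used in the proof of T8 is a one-line consequence of the resolvent equation. The short derivation below is only a sketch; it takes the resolvent equation in the form $V_q - V_p = (p - q)V_q V_p$ with $q = 0$.

```latex
% Resolvent equation with q = 0 and V_0 = V:
V - V_p = p\, V V_p
\quad\Longrightarrow\quad
V f = V_p f + p\, V V_p f = (I + pV)\, V_p f .
% The same computation for (W_p) gives (I + pV) W_p f = V f.
% Since V_p f \le V f and p V V_p f \le V f are finite, T7 applies to the
% pair of functions V_p f and W_p f and yields V_p f = W_p f.
```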
2. Construction of Resolvents
9 Let V be a kernel that satisfies the complete maximum principle. We give in this section sufficient conditions for the existence of a resolvent $(V_p)$ such that $V_0 = V$. Such a resolvent, if it exists, is necessarily sub-Markov since the constant 1 is supermedian from the complete maximum principle and IX.T70. Here first is an example which shows that such a resolvent does not always exist. Let $(U_p)$ be a closed Markov resolvent on a measurable space $(E,\mathcal{E})$; the kernel $U = U_0$ satisfies the complete maximum principle from IX.T69. Let F be the set obtained by adjoining a point $\alpha$ to E and let $\mathcal{F}$ be the $\sigma$-field generated by $\mathcal{E}$ and $\{\alpha\}$. For every positive $\mathcal{F}$-measurable function f defined on F set
$$Vf^x = Uf'^x + f(\alpha) \quad\text{for every } x \in E, \quad\text{and}\quad Vf^\alpha = f(\alpha),$$
where $f'$ denotes the restriction of f to E. It can easily be verified that V is a kernel and satisfies the complete maximum principle. We show that there cannot exist a sub-Markov resolvent $(V_p)$ such that $V_0 = V$. If there were one, in fact, we would have
$$(I + pV)V_p f = Vf$$
for every positive measurable function f. Suppose that f is zero at $\alpha$ and denote by g the function equal to $U_p f'$ on E, and to zero at $\alpha$; then
$$(I + pV)g = Vf,$$
and consequently $V_p f = g$ from T7. In particular, we would have
$$pV_p(I_E) = I_E.$$
This equality and the inequality $pV_p 1$ [...]
[...] 0,
$$\sup_{x \in E} V_p g^x = \sup_{x \in \{g - pV_p g > 0\}} V_p g^x > 0$$
from (3.1). This is absurd, since $V_p g^x < 0$ at every point x such that $g(x) > pV_p g^x$.
(b) $\|pV_p\| \le 1$. Since $V_p$ is positive, it suffices to verify the relation $pV_p 1 \le 1$. Now we have
$$1 \ge pV_p 1^x \quad\text{for every x such that } 1 - pV_p 1^x > 0.$$
In other words,
$$1 \ge V[p(1 - pV_p 1)]^x \quad\text{for every x such that } p(1 - pV_p 1)^x > 0,$$
and consequently, from the complete maximum principle,
$$1 \ge V[p(1 - pV_p 1)] = pV_p 1.$$
(c) The existence of $V_p$ for some $p > 0$ implies that of $V_{p+\varepsilon}$ for $0 < \varepsilon < p$ (or $0 < \varepsilon < 1/\|V\|$ for $p =$ [...] 0 can be chosen so that the series $\sum_n \lambda_n$ and $\sum_n \lambda_n \|Vf_n\|$ converge. We then set
$$a_n = (na) \wedge 1.$$
The function a belongs to $\mathcal{C}_0$, is strictly positive everywhere, and the function $Va$ belongs to $\mathcal{C}_0$. The functions $a_n$ are continuous and increase to 1 as $n \to \infty$: the set $\{a_n = 1\}$ ends up containing every compact subset K of E, provided n is large enough. The functions $Va_n$, finally, are bounded. For every n and every positive universally measurable function f we now set
$$V^n f = V(a_n f).$$
It can easily be verified that $V^n$ is a continuous kernel, which tends to 0 at infinity and satisfies the complete maximum principle. Since the function $V^n 1 = V(a_n)$ is bounded, there exists a sub-Markov resolvent $(V^n_p)$ consisting of continuous kernels, which tend to 0 at infinity, and such that $V^n_0 = V^n$.
* It is supposed that E is given the $\sigma$-field $\mathcal{B}_u(E)$, as in all problems that concern dispersion-kernels.
Let m and n be two integers such that $n < m$, and let f be an element of $\mathcal{C}_K^+$. We show that
$$V^m_p(f/a_m) \le V^n_p(f/a_n).$$
Denote these functions by $k_m$, $k_n$, respectively. From the resolvent equation we obtain
$$k_n = V^n\Bigl(\frac{f}{a_n} - pk_n\Bigr) = V(f - pa_n k_n),$$
and an analogous formula for $k_m$. Consequently we have
$$k_n - k_m = V(pa_m k_m - pa_n k_n).$$
A contradiction is then obtained by supposing that the function $k_m - k_n$ takes on a strictly positive value; in fact it then takes on strictly positive values on the set $\{p(a_n k_n - a_m k_m) > 0\}$, which is absurd since $a_n \le a_m$. But the function f has compact support; the function $V^n_p(f/a_n)$ is thus equal to $V^n_p f$ whenever n is large enough. It follows that the limit
$$V_p f = \lim_{n\to\infty} V^n_p f \qquad (11.1)$$
exists (and is u.s.c.) for every function $f \in \mathcal{C}_K^+$. This property extends by uniform convergence to the elements of $\mathcal{C}_0^+$, from the relation $\|pV^n_p\| \le 1$, then to $\mathcal{C}_0$ by linearity. It is clear that $\|pV_p f\| \le \|f\|$. Let g be an element of $\mathcal{C}_K^+$; the function $Vg$ belongs to $\mathcal{C}_0^+$, and we thus have
$$V_p Vg = \lim_{n\to\infty} V^n_p Vg. \qquad (11.2)$$
But we have then
$$pV^n_p Vg = pV^n_p V^n\Bigl(\frac{g}{a_n}\Bigr) = V^n\Bigl(\frac{g}{a_n}\Bigr) - V^n_p\Bigl(\frac{g}{a_n}\Bigr) = Vg - V^n_p\Bigl(\frac{g}{a_n}\Bigr). \qquad (11.3)$$
The left side thus increases with n, and the function $pV_p Vg$ is thus l.s.c. We saw above that it was u.s.c., and hence it is continuous. Finally, a passage to the limit shows immediately that this function is dominated by $Vg$. It thus belongs to $\mathcal{C}_0^+$. Let then f be an element of $\mathcal{C}_K^+$; choose the function $g \in \mathcal{C}_K^+$ so that $f \le Vg$, and set $h = Vg - f$. The functions $V_p f$ and $V_p h$ are u.s.c., and their sum is the function $V_p Vg$, which belongs to [...]
Now we have seen that $V_p g \le V^n_p g$ for every p, whenever n is large enough so that $a_n$ is equal to 1 on the support of g. We thus have a fortiori
$$\lim_{p\to 0} pV_p g = 0.$$
This extends to functions $g \in \mathcal{C}_0^+$ by uniform convergence, in view of the inequality $\|pV_p\| \le 1$. Let then f be an element of $\mathcal{C}_K^+$; the function $g = Vf$ belongs to $\mathcal{C}_0^+$ and we thus have
$$\lim_{p\to 0} pV_p Vf = 0.$$
But we have $pV_p Vf = Vf - V_p f$ [an obvious passage to the limit starting with (11.3)]. We therefore have also
$$\lim_{p\to 0} V_p f = Vf,$$
which concludes the proof.
3. Construction of Semigroups
12 The construction of the previous section allows us to associate a sub-Markov resolvent with every "nice" kernel, which satisfies the complete maximum principle. We are now going to give sufficient conditions for such a resolvent to be associated with a sub-Markov semigroup. The essential tool for the construction of the semigroup is the Hille-Yosida theorem, which we state here without mentioning infinitesimal generators, a subject which the reader can find treated in the following works: Dunford and Schwartz (67), Hille and Phillips (77), Yosida (121), and also Loeve (89), which has the advantage of giving an introduction to the Russian work on infinitesimal generators of Markov semigroups.
We begin by recalling several results on semigroups and resolvents in Banach spaces. Let $\mathcal{B}$ be a Banach space, ordered by a closed convex proper* cone $\mathcal{B}_+$ (we take $\mathcal{B}_+ = \{0\}$ if $\mathcal{B}$ doesn't have a natural order structure). The only topology we shall consider on $\mathcal{B}$ will be the strong topology defined by the norm (written $\|\cdot\|$). An operator A on $\mathcal{B}$ is said to be sub-Markov if $\|A\| \le 1$ and if A is positive ($Ax \in \mathcal{B}_+$ for every $x \in \mathcal{B}_+$). A sub-Markov semigroup on $\mathcal{B}$ is a family $(T_t)_{t>0}$ of sub-Markov operators on $\mathcal{B}$, such that
$$T_s T_t = T_{s+t} \quad\text{for every } s > 0,\ t > 0.$$
We always complete this definition by putting $T_0 = I$, but this convention is not necessary. The semigroup is said to be strongly continuous if
$$\lim_{t\to 0} T_t x = x \quad\text{for every } x \in \mathcal{B}.$$
* We understand a proper cone (in French: cône saillant) to be a cone P such that $P \cap (-P) = \{0\}$.
It can then be easily shown that the function t .A.N+ Ttx is continuous on the interval [0,00[. A sub-Markov resolvent on fJB is a family (V2»2»o of operators on fJB, such that the
operators pV2> are sub-Markov and the resolvent equation holds: for every
p
> 0, q > 0.
(12.1)
Although this expression is not classical, we say that the resolvent $(V_p)$ is strongly continuous if
$$\lim_{p\to\infty} pV_px = x \quad\text{for every } x \in \mathcal{B}. \tag{12.2}$$
This definition can be put in another, very useful, form: It follows immediately from (12.1) that the image $V_p(\mathcal{B})$ does not depend on $p$; denote it by $\mathcal{D}$. The resolvent is then strongly continuous if and only if $\mathcal{D}$ is dense in $\mathcal{B}$. Condition (12.2), in fact, implies immediately that $\mathcal{D}$ is dense. Conversely, the relation $x = V_qy$ implies
$$\lim_{p\to\infty} pV_px = \lim_{p\to\infty}\,(V_qy - V_py + qV_pV_qy) = V_qy = x.$$
Relation (12.2) thus holds for every $x \in \mathcal{D}$. Since the operators $pV_p$ are sub-Markov, it holds for every $x \in \overline{\mathcal{D}}$, and hence for every $x$ if $\mathcal{D}$ is dense. Let $(T_t)$ be a strongly continuous sub-Markov semigroup on $\mathcal{B}$; a sub-Markov resolvent on $\mathcal{B}$ can then be defined by setting
$$V_px = \int_0^\infty e^{-pt}\,T_tx\,dt \quad\text{for every } x \in \mathcal{B}.$$
Let $x'$ be a continuous linear functional on $\mathcal{B}$, orthogonal to $\mathcal{D}$. It can easily be verified that
$$\langle x,x'\rangle = \lim_{p\to\infty}\langle pV_px,x'\rangle = 0.$$
We thus have $x' = 0$, and $\mathcal{D}$ is dense in $\mathcal{B}$ from the Hahn-Banach theorem. The resolvent $(V_p)$ is hence strongly continuous. We call it the resolvent of the semigroup $(T_t)$. Here then is the Hille-Yosida theorem. The proof we give is borrowed in large part from Yosida (121) and Neveu.* We shall only indicate the steps, leaving the verification of details to the reader.
T13 THEOREM Let $(V_p)$ be a strongly continuous sub-Markov resolvent on $\mathcal{B}$. There then exists a strongly continuous sub-Markov semigroup $(T_t)$ with $(V_p)$ as its resolvent, and this semigroup is unique.
Proof We begin by supposing that there exists an operator $V$ such that, for every $p > 0$,
$$V - V_p = pVV_p. \tag{13.1}$$
It can then be easily verified that $V(\mathcal{B}) = \mathcal{D}$. We also set $\mathcal{D}_2 = V^2(\mathcal{B})$; we then have $\mathcal{D}_2 = V_p^2(\mathcal{B})$ for every $p > 0$, and the relation
$$\lim_{p\to\infty}(pV_p)^2x = x \quad\text{for every } x \in \mathcal{B}$$
• See "Theory of Markov Semigroups," University of Calif. Publications in Statistics, 2 (1958), 319-394.
shows that $\mathcal{D}_2$ is dense in $\mathcal{B}$. Note also the relation
$$(I + pV)(I - pV_p) = I. \tag{13.2}$$
(a) For every $p > 0$ set
$$A_p = p(pV_p - I), \qquad T_t^{(p)} = \exp(tA_p) = e^{-pt}\exp(tp \cdot pV_p).$$
It can easily be verified that the operators $T_t^{(p)}$ constitute a strongly continuous sub-Markov semigroup (we even have $\lim_{t\to 0}\|T_t^{(p)} - I\| = 0$).
We are going to show that these semigroups converge to the desired semigroup $(T_t)$ when $p \to \infty$.
(b) With this in mind, note that the formula
$$\frac{d}{dp}A_p = -(pV_p - I)^2 = -\frac{1}{p^2}A_p^2$$
is a consequence of the formula $\frac{d}{dp}V_p = -V_p^2$ (which comes from the resolvent equation). It then follows that
$$\frac{d}{dp}T_t^{(p)} = -\frac{t}{p^2}\,T_t^{(p)}A_p^2.$$
If $x$ belongs to $\mathcal{D}_2$ we have, since $x$ is of the form $V^2y$,
$$\frac{d}{dp}T_t^{(p)}x = -\frac{t}{p^2}\,T_t^{(p)}A_p^2V^2y = -\frac{t}{p^2}\,T_t^{(p)}(pV_p)^2y.$$
The norm of the latter is at most equal to $t\|y\|/p^2$, and we thus obtain, by integrating,
$$\|T_t^{(p)}x - T_t^{(q)}x\| \le t\,\Bigl|\frac{1}{p} - \frac{1}{q}\Bigr| \cdot \|y\|.$$
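The identities used in step (b) can be checked by hand. The derivation below is only a verification sketch: it assumes the resolvent equation in the form $V_p - V_q = (q-p)V_pV_q$, the auxiliary relation $V - V_p = pVV_p$, and the fact that all the operators involved commute.

```latex
% Derivative of the resolvent: let q -> p in V_p - V_q = (q - p) V_p V_q.
\frac{d}{dp}V_p = \lim_{q\to p}\frac{V_q - V_p}{q - p} = -V_p^2 .

% Derivative of A_p = p^2 V_p - pI.
\frac{d}{dp}A_p = 2pV_p - p^2V_p^2 - I = -(pV_p - I)^2 = -\frac{1}{p^2}A_p^2 .

% Action of A_p on the range of V, using pV_pV = V - V_p.
A_pV = p\,(pV_pV - V) = -pV_p , \qquad A_p^2V^2 = (pV_p)^2 .

% Hence, for x = V^2 y, and since \|T_t^{(p)}\| \le 1 and \|pV_p\| \le 1,
\frac{d}{dp}T_t^{(p)}x = -\frac{t}{p^2}\,T_t^{(p)}(pV_p)^2y ,
\qquad
\Bigl\|\frac{d}{dp}T_t^{(p)}x\Bigr\| \le \frac{t\,\|y\|}{p^2} .
```

Integrating the last bound in $p$ between $p$ and $q$ gives the estimate stated above.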
We can hence set $T_tx = \lim_{p\to\infty}T_t^{(p)}x$ for every $x \in \mathcal{D}_2$. Since the operators $T_t^{(p)}$ are sub-Markov, this limit also exists for every $x \in \overline{\mathcal{D}_2} = \mathcal{B}$, and defines a sub-Markov operator $T_t$ on $\mathcal{B}$. The function $t \mapsto T_tx$ is the uniform limit of the functions $t \mapsto T_t^{(p)}x$ on every compact interval of $\mathbf{R}_+$, when $x$ belongs to $\mathcal{D}_2$. It is therefore continuous for $x \in \mathcal{D}_2$, and hence for $x \in \overline{\mathcal{D}_2} = \mathcal{B}$ by passage to the limit. The relation $T_s^{(p)}T_t^{(p)} = T_{s+t}^{(p)}$ then passes to the limit, and it follows that $(T_t)$ is a strongly continuous semigroup, for which we still must find the resolvent.
(c) We have
$$\frac{d}{dt}T_t^{(p)} = \frac{d}{dt}\exp(tA_p) = T_t^{(p)}A_p$$
and consequently, all of the operators $T_t^{(p)}$, $A_p$, $V_p$ commuting,
$$\frac{d}{dt}T_t^{(p)}Vx = T_t^{(p)}A_pVx = -T_t^{(p)}(pV_p)x = -pV_pT_t^{(p)}x.$$
This derivative thus converges to $-T_tx$ when $p \to \infty$, the convergence being uniform on every compact interval of $[0,\infty[$, and we obtain the formula
$$\frac{d}{dt}T_tVx = -T_tx.$$
Denote the resolvent of $(T_t)$ by $(W_p)$. Integrating the above formula by parts, it follows that
$$W_px = \int_0^\infty e^{-pt}\,T_tx\,dt = -\int_0^\infty e^{-pt}\,\frac{d}{dt}(T_tVx)\,dt = Vx - p\int_0^\infty e^{-pt}\,T_tVx\,dt,$$
or $W_p(I + pV) = V$. Now $V_p$ satisfies an analogous formula, and the operator $I + pV$ is invertible, from (13.2). We thus have $W_p = V_p$ as desired.
(d) Let $(T'_t)$ be a second strongly continuous semigroup which has $(V_p)$ as its resolvent, and let $x'$ be an element of the dual $\mathcal{B}'$ of $\mathcal{B}$. The two continuous functions $t \mapsto \langle T_tx,x'\rangle$ and $t \mapsto \langle T'_tx,x'\rangle$ have the same Laplace transform and are hence equal. The equality of the two semigroups can now be deduced.
It remains for us to free ourselves of the auxiliary hypothesis concerning the existence of $V$. The proof of uniqueness given above clearly is independent of this hypothesis. Take then an arbitrary strongly continuous sub-Markov resolvent $(V_p)_{p>0}$, and consider for every $s > 0$ the resolvent $(V_{s,p})_{p>0}$ defined by
$$V_{s,p} = V_{s+p}.$$
These resolvents satisfy the auxiliary hypothesis. There thus exists for each of them a strongly continuous sub-Markov semigroup $(T_{s,t})$ such that
$$V_{s+p}x = \int_0^\infty e^{-pt}\,T_{s,t}x\,dt \quad (p > 0,\ x \in \mathcal{B}).$$
Now the semigroup $(e^{-(s-r)t}T_{r,t})$ also satisfies this relation for every number $r \in\, ]0,s]$. We thus have, from the uniqueness established above,
$$T_{s,t} = e^{-(s-r)t}\,T_{r,t} \quad (0 < r \le s).$$
It then follows that the semigroup $(e^{st}T_{s,t})$ does not depend on $s$, and is strongly continuous and sub-Markov: We denote it by $(T_t)$. The resolvent of $(T_t)$ clearly is $(V_p)$.
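The construction in the proof can be tried out numerically in finite dimension. The sketch below is an illustration only, not part of the original argument: it assumes a hypothetical 3-state generator `Q` (off-diagonal entries nonnegative, row sums at most 0), takes $V_p = (pI - Q)^{-1}$ as the resolvent, and compares the approximating semigroups $T_t^{(p)} = \exp(tA_p)$ with $\exp(tQ)$ in the operator norm of the sup norm. All helper names are made up for the example.

```python
import math

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def add(A, B, beta=1.0):
    """Return A + beta * B."""
    return [[a + beta * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def mul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def norm(A):
    """Operator norm on (R^n, sup norm): maximum absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)

def inverse(A):
    """Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + unit for row, unit in zip(A, identity(n))]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def expm(A, terms=20):
    """Matrix exponential by scaling and squaring of a Taylor polynomial."""
    k = max(0, math.ceil(math.log2(max(norm(A), 1e-16) / 0.5)))
    B = scale(1.0 / 2 ** k, A)
    result = term = identity(len(A))
    for j in range(1, terms):
        term = scale(1.0 / j, mul(term, B))
        result = add(result, term)
    for _ in range(k):
        result = mul(result, result)
    return result

# Hypothetical generator: off-diagonal entries >= 0, row sums <= 0.
Q = [[-2.0, 1.0, 0.5],
     [1.0, -3.0, 1.0],
     [0.0, 2.0, -2.5]]
I3 = identity(3)

def resolvent(p):
    """V_p = (pI - Q)^{-1}, the Laplace transform of t -> exp(tQ)."""
    return inverse(add(scale(p, I3), Q, -1.0))

def yosida_semigroup(t, p):
    """T_t^{(p)} = exp(t A_p) with A_p = p (p V_p - I)."""
    A_p = scale(p, add(scale(p, resolvent(p)), I3, -1.0))
    return expm(scale(t, A_p))

V2, V5 = resolvent(2.0), resolvent(5.0)
# Residual of the resolvent equation V_p - V_q = (q - p) V_p V_q for p = 2, q = 5.
resolvent_gap = norm(add(add(V2, V5, -1.0), scale(3.0, mul(V2, V5)), -1.0))

exact = expm(Q)  # the limit semigroup at time t = 1
err_10 = norm(add(yosida_semigroup(1.0, 10.0), exact, -1.0))
err_1000 = norm(add(yosida_semigroup(1.0, 1000.0), exact, -1.0))
print(resolvent_gap, err_10, err_1000)
```

The error at $p = 1000$ should be much smaller than at $p = 10$, in line with the $1/p$ rate suggested by the estimate of step (b). In infinite dimension the generator is in general unbounded, which is why the proof works with the bounded operators $A_p$ instead.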
The Hille-Yosida theorem will allow us to complete the construction of the semigroup associated with a kernel that satisfies the complete maximum principle. We begin with a definition.
D14 DEFINITION Let $E$ be a locally compact, σ-compact space and let $(P_t)_{t\in\mathbf{R}_+}$ be a semigroup of sub-Markov dispersion-kernels on $E$. We say that $(P_t)$ is a Feller semigroup if: (1) Each kernel $P_t$ is continuous and tends to 0 at infinity. (2) $P_0 = I$, and for every function $f \in \mathcal{C}_0(E)$, $\lim_{t\to 0}P_tf = f$ uniformly on $E$.
Such a semigroup is not necessarily measurable (in the sense of IX.39) when $E$ is given the σ-field $\mathcal{B}_u(E)$. A resolvent can, however, be associated with it in the following manner.
(a) Let $f$ be an element of $\mathcal{C}_K$; the mapping $t \mapsto P_tf$ of $\mathbf{R}_+$ into $\mathcal{C}_0$ is bounded and continuous, which allows us to set
$$V_pf = \int_0^\infty e^{-pt}\,P_tf\,dt \quad\text{(integrating in } \mathcal{C}_0\text{)}.$$
We thus define a positive linear mapping of $\mathcal{C}_K$ into $\mathcal{C}_0$ (which extends moreover to an operator on $\mathcal{C}_0$, of norm at most equal to $1/p$). It then follows from IX.T11 that the mapping $f \mapsto V_pf$ is the restriction to $\mathcal{C}_K$ of a dispersion-kernel $V_p$ on $E$ (continuous, and tending to 0 at infinity).
(b) Let $\mathcal{L}$ be the collection of bounded Borel functions $f$ such that the function $t \mapsto \langle\mu,P_tf\rangle$ is Borel for every bounded Radon measure $\mu$ on $E$, and such that the following relations are satisfied.
0 and every x E E,
v"r =
fe-.tpJ"dt
Relation (15.1) is then deduced, whenfbelongs to
(fE 'C}). ~},
by lettingp tend to O.
• In other words, from IX.Tll and X.T4, a positive linear mapping V of r;% into '6'0 such that the relation + VpJ > Vg3: on {g > O} (a a positive constant, f e ~}-, g e ~~) implies the same inequality for every x.
The mapping $f \mapsto P_tf$ is, from IX.T11, the restriction to $\mathcal{C}_0$ of a sub-Markov kernel $P_t$ (continuous, tending to 0 at infinity). The relations
$Vf = \int_0^\infty P_tf\,dt$ and … can then be extended to universally measurable functions, as in No. 14. The existence of a Feller semigroup satisfying (15.1) is thus established. Let $(P'_t)$ be a second Feller semigroup having the same property, and let $(V'_p)$ be its resolvent. It follows from (15.1) that $V'_0 = V$, and from T8 that $V'_p = V_p$ for every $p$. Let then $f$ be an element of $\mathcal{C}_K$; the continuous functions $t \mapsto P_tf^x$ and $t \mapsto P'_tf^x$ have the same Laplace transforms and are hence identical; so that the kernels $P_t$ and $P'_t$ are themselves equal.
Remark It can be shown that $(P_t)$ is the only semigroup of sub-Markov kernels that satisfies (15.1) and such that, for every function $f \in \mathcal{C}_K$, the function $(t,x) \mapsto P_tf^x$ is measurable with respect to the σ-field $\mathcal{B}(\mathbf{R}_+) \times \mathcal{B}_u(E)$. We shall not prove this result.
Passage from the sub-Markov case to the Markov case
16 Suppose that the semigroup $(P_t)$ we have constructed is Markov. We shall see later that, using probabilistic methods, the potential theory relative to the kernel $V$ can be studied in a very detailed manner. These methods are not directly applicable in the sub-Markov case, where we have to reduce to the Markov case by the following method. Let $(P_t)$ be a Feller semigroup of sub-Markov kernels on a measurable space $(E,\mathcal{E})$. Adjoin to $E$ an additional element $\partial$, put $E \cup \{\partial\} = E'$, and denote by $\mathcal{E}'$ the σ-field generated by $\mathcal{E}$ and the set $\{\partial\}$. Define then kernels $P'_t$ on $(E',\mathcal{E}')$ by setting
$$\begin{aligned}
P'_t(x,A) &= P_t(x,A) && \text{for } x \in E,\ A \subset E,\ A \in \mathcal{E};\\
P'_t(x,\{\partial\}) &= 1 - P_t(x,E) && \text{for } x \in E;\\
P'_t(\partial,A) &= I_A(\partial) && (A \in \mathcal{E}').
\end{aligned} \tag{16.1}$$
It is trivial to verify that we thus obtain Markov kernels on $(E',\mathcal{E}')$, which again constitute
a semigroup. We adopt the following very important convention: We identify every function defined on $E$ with its extension to $E'$ which vanishes at the point $\partial$. It is clear with this convention that $P_tf = P'_tf$ for every function $f$ defined on $E$ and, if $f$ is defined on $E'$,
$$P'_tf = f(\partial) + P_t(f - f(\partial)). \tag{16.2}$$
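For a finite state space the passage to the Markov case is a one-line matrix operation. The following sketch is an illustration under stated assumptions: `P` is a hypothetical sub-stochastic matrix on a two-point space, the adjoined point is stored as the last index, and the function names are invented for the example. It checks that the extended kernel is Markov, that extension commutes with composition, and that formula (16.2) holds.

```python
def extend_kernel(P):
    """Adjoin a cemetery state (stored last) to a sub-stochastic matrix P."""
    n = len(P)
    # Mass missing from row x is sent to the adjoined point.
    extended = [list(row) + [1.0 - sum(row)] for row in P]
    # The adjoined point is absorbing.
    extended.append([0.0] * n + [1.0])
    return extended

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def apply(P, f):
    """Action of a kernel on a function: (Pf)(x) = sum_y P(x, y) f(y)."""
    return [sum(p * v for p, v in zip(row, f)) for row in P]

def close(A, B, tol=1e-12):
    return all(abs(a - b) <= tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

P = [[0.5, 0.3], [0.2, 0.6]]   # hypothetical sub-Markov kernel on E = {0, 1}
P_ext = extend_kernel(P)

# Every row of the extended kernel has total mass 1.
row_sums = [sum(row) for row in P_ext]

# Extending P composed with P gives the same result as composing the extensions.
semigroup_ok = close(matmul(P_ext, P_ext), extend_kernel(matmul(P, P)))

# Formula (16.2): P'f = f(cemetery) + P(f - f(cemetery)), with functions on E
# extended by 0 at the cemetery point.
f = [2.0, -1.0, 5.0]           # a function on E'; the last entry is its value at the cemetery
f_cem = f[-1]
lhs = apply(P_ext, f)
rhs = [f_cem + v for v in apply(P, [y - f_cem for y in f[:-1]])] + [f_cem]
```

The last entry of `rhs` reflects the convention of the text: a function defined on $E$ vanishes at the adjoined point, so $P_t(f - f(\partial))$ contributes nothing there.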
A sub-Markov resolvent $(V_p)$ on $(E,\mathcal{E})$ can be extended in the same way to a Markov resolvent $(V'_p)$ on $(E',\mathcal{E}')$, by putting
$$pV'_pf = f(\partial) + pV_p(f - f(\partial)). \tag{16.3}$$
It will be noted that if $(V_p)$ is the resolvent of $(P_t)$, then $(V'_p)$ is the resolvent of $(P'_t)$. Let us next consider the case where $E$ is a locally compact, σ-compact space, and where the sub-Markov semigroup $(P_t)$ is a Feller semigroup on $E$. $E'$ can then be considered to be the Alexandrov (one-point) compactification of $E$, $\partial$ being the point at infinity (an isolated point if $E$ is compact). In this case the semigroup $(P'_t)$ is a (Markov) Feller semigroup on $E'$. Analogous considerations apply to resolvents that take elements of $\mathcal{C}_0(E)$ into $\mathcal{C}_0(E)$.
Ray resolvents
17 Let $E$ be a locally compact, σ-compact space, and let $(V_p)$ be a sub-Markov resolvent on the Banach space $\mathcal{C}_0(E)$. The hypothesis of strong continuity on this resolvent plays an essential role in the construction of the semigroup $(P_t)$ associated with $(V_p)$ through the Hille-Yosida theorem. We seek now to replace strong continuity by a less restrictive condition, following Ray (106). The results that follow will not be used in later chapters. From No. 16 above, we lose no generality in limiting ourselves to the study of Markov resolvents $(V_p)$ on a compact space $E$ with kernels that leave the space $\mathcal{C}(E)$ invariant. We note first that for each $p > 0$, if $\mathcal{S}_p$ is the convex cone of continuous $p$-supermedian functions (IX.T45), the vector space $\mathcal{S}_p - \mathcal{S}_p$ is independent of $p$. To see this, let $p$ and $q$ be such that $0 < p < q$. Every $p$-supermedian function is then $q$-supermedian (IX.T47), and it suffices to show that every continuous $q$-supermedian function $f$ is equal to the difference of two continuous $p$-supermedian functions. Now we have $f = [f + (q - p)V_pf] - (q - p)V_pf$; these two functions are continuous, and the second is $p$-supermedian from IX.T50. It will thus suffice to show that the function $h + (q - p)V_ph$ is $p$-supermedian for every $q$-supermedian function $h$. This property holds when $h$ is of the form $V_qg$ ($g \ge 0$), since then $h + (q - p)V_ph = V_pg$; it thus holds for every excessive function $h$ from IX.T64. We conclude finally by noting that, for every $p$, a function is $p$-supermedian if and only if it is equal almost everywhere to a $p$-excessive function, from IX.T60. We can now pose the following definition.
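The identity $h + (q - p)V_ph = V_pg$ for $h = V_qg$ used above is a direct consequence of the resolvent equation. The short check below is a sketch that assumes the equation in the form $V_p - V_q = (q - p)V_pV_q$ and the positivity of the kernels.

```latex
% With h = V_q g:
h + (q-p)V_ph = V_qg + (q-p)V_pV_qg = V_qg + (V_p - V_q)g = V_pg .

% V_p g is p-supermedian when g >= 0: for every r > 0,
rV_{p+r}V_pg = V_pg - V_{p+r}g \le V_pg .
```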
D18 DEFINITION Let $(V_p)$ be a Markov resolvent consisting of continuous diffusion-kernels on a compact space $E$. We say that $(V_p)$ is a Ray resolvent if the cone $\mathcal{S}_q$ of continuous $q$-supermedian functions separates the points of $E$ for some $q > 0$.
The condition then holds for every $q > 0$. The cone $\mathcal{S}_q$ is closed under the operation $\wedge$; the space $\mathcal{S}_q - \mathcal{S}_q$ is thus closed under the operations $\vee$ and $\wedge$, contains the constants, and separates the points of $E$. It is then dense in $\mathcal{C}(E)$ by the Stone-Weierstrass theorem.
T19 THEOREM (Ray) Let $(V_p)$ be a Ray resolvent. There then exists a unique measurable semigroup $(P_t)$ on the measurable space $(E,\mathcal{B}_0(E))$ with $(V_p)$ as its resolvent, which has the following property: The function $t \mapsto P_tf^x$ is right continuous for every function $f \in \mathcal{C}(E)$ and every $x \in E$.
Proof Let $q$ be a number $> 0$, and let $J$ be the closure in $\mathcal{C}(E)$ of the image space $V_p(\mathcal{C}(E))$. $J$ is a Banach space, invariant under the operators $V_p$, on which the resolvent $(V_p)$ is strongly continuous. There thus exists a strongly continuous sub-Markov semigroup $(P_t)$ on $J$ such that
$$V_pf = \int_0^\infty e^{-pt}\,P_tf\,dt \quad (p > 0,\ f \in J). \tag{19.1}$$
The function 1 belongs to $J$, and $V_p1 = 1/p$. It follows from the uniqueness of Laplace transforms that $P_t1 = 1$ for every $t \ge 0$.
Let $f$ be an element of $\mathcal{S}_q$. The functions $pV_{p+q}f$ increase to the $q$-excessive regularization $\hat f$ of $f$ when $p \to \infty$ (IX.T46 and T60). We set
$$P_tf = \lim_{p\to\infty}P_t(pV_{p+q}f), \tag{19.2}$$
and in particular $P_0f = \hat f$. We next extend the mapping $f \mapsto P_tf$ to $\mathcal{S}_q - \mathcal{S}_q$ by linearity. To show that the mapping $t \mapsto P_tf^x$ is right continuous (and free of oscillatory discontinuities), we begin with the case where $f$ is of the form $V_qg$, $g \in \mathcal{C}_+(E)$. We then have
$$e^{-qt}P_tf = e^{-qt}P_t\Bigl(\int_0^\infty e^{-qs}P_sg\,ds\Bigr) = \int_t^\infty e^{-qs}P_sg\,ds.$$
The function $t \mapsto e^{-qt}P_tf^x$ is thus continuous and decreasing. Suppose next that $f$ belongs to $\mathcal{S}_q$; the functions $V_{p+q}f$ are of the preceding type, taking for $g$ the positive function $(f - pV_{p+q}f)$. It follows from (19.2) that the function $t \mapsto e^{-qt}P_tf^x$ is decreasing and l.s.c., i.e., right continuous. The stated result is then clear by linearity when $f$ belongs to $\mathcal{S}_q - \mathcal{S}_q$.
Suppose next that $f$ belongs to $\mathcal{S}_q - \mathcal{S}_q$ and is positive; $f$ is then the difference of two continuous $q$-supermedian functions $g$ and $h$ such that $g \ge h$. This inequality implies that $pV_{p+q}g \ge pV_{p+q}h$ for every $p$, and hence $P_tg \ge P_th$ for every $t$ from (19.2). The relation $f \ge 0$ thus implies $P_tf \ge 0$. We note finally that the function $P_tf$, when $f$ belongs to $\mathcal{S}_q$, is the upper envelope of an increasing sequence of continuous functions. $P_tf$ is thus a Baire function, and this result extends to $\mathcal{S}_q - \mathcal{S}_q$ by linearity.
We have seen in No. 18 that the space $\mathcal{S}_q - \mathcal{S}_q$ is dense in $\mathcal{C}(E)$. The positivity of $P_t$ implies on the other hand the relation $\|P_tf\| \le \|f\|$ (in the uniform norm). The mapping $f \mapsto P_tf$ thus extends by continuity to a linear mapping, of norm 1, from $\mathcal{C}(E)$ into the space of bounded, $\mathcal{B}_0(E)$-measurable functions on $E$ [where $\mathcal{B}_0(E)$ is the Baire σ-field on $E$]. It can then be shown, as in IX.11-12, that the mappings $P_t$ so defined are the restrictions to $\mathcal{C}(E)$ of Markov kernels on the measurable space $(E,\mathcal{B}_0(E))$, which we denote by the same symbols. We show that the following three properties hold for every bounded Baire function $f$:
(a) $P_sP_tf = P_{s+t}f$ ($s \ge 0$, $t \ge 0$); (19.3)
(b) The function $(t,x) \mapsto P_tf^x$ is measurable with respect to the σ-field $\mathcal{B}(\mathbf{R}_+) \times \mathcal{B}_0(E)$;
(c) $V_pf = \int_0^\infty e^{-pt}\,P_tf\,dt$ ($p > 0$).
These three properties are indeed true when $f$ belongs to $J$: (a) and (c) from the Hille-Yosida theorem, and (b) from the strong continuity of the semigroup, which implies that the function $(t,x) \mapsto P_tf^x$ is continuous. They extend then to the case where $f$ belongs to $\mathcal{S}_q$ from (19.2) and Lebesgue's monotone convergence theorem, then to $\mathcal{C}(E)$ by linearity and continuity. The space $\mathcal{H}$ of bounded Baire functions, which satisfy (a), (b) and (c), contains $\mathcal{C}(E)$, is closed under passage to monotone limits, and therefore includes all bounded Baire functions from I.T20. The existence of the desired semigroup is thus established. Let $(P'_t)$ be a second semigroup which has the same properties, and let $f$ be an element of $\mathcal{C}(E)$; the functions $t \mapsto P_tf^x$ and $t \mapsto P'_tf^x$ are right continuous and have the same Laplace transforms. They are hence equal, and we see then that the kernels $P_t$ and $P'_t$ are equal.
In Chapter XI we study concepts analogous to those which we introduce now, following Ray. We keep the notation of the preceding numbers.
D20 DEFINITION We say that the point $x \in E$ is a branching point for the Ray resolvent $(V_p)$ if there exists a number $p > 0$ and a positive measure $\mu$ of mass 1, distinct from $\varepsilon_x$, such that
$$\langle\mu,f\rangle \le f(x) \quad\text{for every function } f \in \mathcal{S}_p. \tag{20.1}$$
T21 THEOREM The following properties are equivalent:
(a) $x$ is not a branching point for $(V_p)$;
(b) $\varepsilon_xP_0 = \varepsilon_x$;
(c) $\lim_{t\to 0}P_tf^x = f(x)$ for every function $f \in \mathcal{C}(E)$;
(d) $\lim_{q\to\infty}qV_qf^x = f(x)$ for every function $f \in \mathcal{C}(E)$.
Proof We have from formula (19.2), for every function $f \in \mathcal{S}_p$,
$$P_0f^x = \lim_{q\to\infty}qV_{p+q}f^x \le f(x).$$
The measure $\varepsilon_xP_0$ thus satisfies inequality (20.1), so that (a) implies (b). The implication (b) $\Rightarrow$ (c) follows from the right continuity of the function $t \mapsto P_tf^x$ for $t = 0$. The implication (c) $\Rightarrow$ (d) is a well-known property of Laplace transforms, and it only remains to show that (d) implies (a). Let $\mu$ be a positive measure of mass 1 such that $\mu(f) \le f(x)$ for every function $f \in \mathcal{S}_p$. Let $g$ be a continuous function with values in $[0,1]$. Apply $\mu$ to both sides of the equality
$$1 = pV_pg + pV_p(1 - g).$$
The two potentials belong to $\mathcal{S}_p$, so that $\langle\mu,V_pg\rangle = V_pg^x$. We thus have $\langle\mu,\hat f\rangle = \hat f(x)$ for every function $f \in \mathcal{S}_p$, from the relation $\hat f = \lim_{q\to\infty}qV_{p+q}f$. Now we have $\hat f(x) = f(x)$ from (d); the relation $\hat f \le f$ then gives us $\langle\mu,f\rangle \ge f(x)$ for every function $f \in \mathcal{S}_p$. Since the measure $\mu$ satisfies (20.1), this inequality can be replaced by an equality. Since the space $\mathcal{S}_p - \mathcal{S}_p$ is dense in $\mathcal{C}(E)$, we have $\mu = \varepsilon_x$ and property (a) follows.
CHAPTER
XI
Convex Cones and Extremal Elements
The main object of this chapter is to prove Choquet's fundamental theorem on integral representations in compact convex sets. This theorem could be considered now as a particular case of a theory of "balayage" defined by a convex cone of continuous functions on a compact set. Since this general theory has as yet no other important applications, we preferred to present it after Choquet's theorem, in Section 3, in order not to impose it on the reader interested only in convex cones. In Section 1 we have grouped a few auxiliary results on compact sets, which are not all indispensable for our purpose, but which are sometimes hard to find in the literature. All vector spaces considered in this chapter will be supposed real.
1.
Compact Convex Sets
Sublinear functions
D1 DEFINITION Let $E$ be a vector space. A real-valued function $p$ defined on $E$ is said to be a sublinear function if it is subadditive:
$$p(x + y) \le p(x) + p(y) \quad (x, y \in E)$$
and positive homogeneous:
$$p(\lambda x) = \lambda p(x) \quad (x \in E,\ \lambda \ge 0).$$
A linear functional $f$ on $E$ is said to be dominated by $p$ if $f(x) \le p(x)$ for every $x \in E$. We shall need the following form of the Hahn-Banach theorem.
T2 THEOREM Let $E$ be a vector space, $p$ a sublinear function on $E$, $F$ a subspace of $E$, and $f$ a linear functional on $F$ such that
$$f(x) \le p(x) \quad\text{for every } x \in F.$$
There then exists a linear functional $g$ on $E$, which is dominated by $p$ and extends $f$.
Proof* Let $\mathcal{P}$ be the set of all ordered pairs $(h,H)$, where $H$ is a subspace of $E$ containing $F$, and $h$ a linear functional on $H$, dominated by $p$ on $H$ and extending $f$. $\mathcal{P}$ is clearly nonempty, and inductive for the order relation $\le$ defined by:
$$\bigl((h,H) \le (h',H')\bigr) \iff (H \subset H' \text{ and } h' \text{ extends } h).$$
Let then $(g,G)$ be a maximal element of $\mathcal{P}$ (from Zorn's lemma); the theorem will be established if we show that $G = E$, which follows immediately from the next lemma.
LEMMA Let $(h,H)$ be an element of $\mathcal{P}$, $a$ an element of $E$ that does not belong to $H$, and $H'$ the vector space $H \oplus \mathbf{R}a$. Let $h'$ be the linear functional on $H'$ defined by
$$h'(x + ra) = h(x) + r\lambda \quad (x \in H,\ r \in \mathbf{R}).$$
Then $h'$ will be dominated by $p$ on $H'$ if and only if
$$\sup_{x\in H}\,[h(x) - p(x - a)] \le \lambda \le \inf_{y\in H}\,[p(y + a) - h(y)].$$
These two conditions are always consistent. Proof of Lemma Since the two functions $h'$ and $p$ are positive-homogeneous, $h'$ will be dominated by $p$ if and only if
+
a) = h(x)
+
A 0. These two sets are disjoint, the first is compact convex, and the second is convex closed. There hence exists a closed affine hyperplane $H$* which separates $U$ and $V$; $H$ cannot be parallel to the line $\mathbf{R} \times \{0\}$ since every hyperplane of this type that intersects $U$ also intersects $V$. $H$ is hence the graph of a continuous affine function $g$, which is within $\varepsilon$ of $f$ on $K$.
Example of an application Let $B$ be a Banach space and let $B'$ be the dual of $B$, with the weak topology. Take for $K$ the unit ball of $B'$, and consider a linear functional $f$ on $B'$, which is weakly continuous when restricted to $K$. $f$ is the uniform limit on $K$ of a sequence $(f_n)_{n\in\mathbf{N}}$ of weakly continuous linear functionals on $B'$. These functionals arise from elements of $B$, and form a Cauchy sequence in $B$; it then follows that $f$ itself arises from an element of $B$. We have thus established an important theorem due to Banach [see, e.g., Bourbaki (17), p. 74; Dunford and Schwartz (67), p. 428]. Several analogous results follow in the same way from Theorem 6. The expression "$g$ is strictly dominated by $f$" in the following statement means that $g(x) < f(x)$ for every $x \in K$.
T7 THEOREM (a) Let $f$ be a finite convex l.s.c. function defined on $K$. Denote by $\mathcal{A}_f$ the set of restrictions to $K$ of continuous affine functions on $E$, which are strictly dominated by $f$ on $K$; we then have
$$f = \sup_{g\in\mathcal{A}_f} g. \tag{7.1}$$
(b) Suppose in addition that $f$ is affine. The set $\mathcal{A}_f$ is then filtering to the right.
(c) Let $f$ be a finite convex u.s.c. function defined on $K$; the set $\mathcal{C}_f$ of continuous convex functions on $K$ which strictly dominate $f$ is then filtering to the left, and
$$f = \inf_{g\in\mathcal{C}_f} g. \tag{7.2}$$
Proof Using the notation of the proof of T6 we set, under hypothesis (a),
$$W = \{(t,x) \in \mathbf{R} \times K : t \ge f(x)\}.$$
$W$ is then a closed convex subset of $F = \mathbf{R} \times E$; for every point $z = (s,y)$ of $\mathbf{R} \times K$ such that $s < f(y)$ there exists a closed hyperplane $H$, which strictly separates $z$ from $W$. This hyperplane cannot be parallel to $\mathbf{R} \times \{0\}$; it is hence the graph of a function $g \in \mathcal{A}_f$, and relation (7.1) follows.
Suppose next that $f$ is affine; to establish (b), we recall that the convex hull of the union of two convex compact subsets $B$ and $B'$ of $F$ is compact [being the image of $[0,1] \times B \times B'$ under the mapping $(t,x,y) \mapsto tx + (1 - t)y$]. Let then $h$ and $h'$ be two elements of $\mathcal{A}_f$ and $a$ a constant dominated by $h$ and $h'$ on $K$. Denote by $B$ the compact convex set $\{(t,x) \in \mathbf{R} \times K : a \le t \le h(x)\}$, and by $B'$ the analogous set with $h$ replaced by $h'$. The reader can verify directly from the fact that $f$ is affine that the convex hull $C$ of $B \cup B'$ is disjoint from $W$. Since $C$ is compact there exists a closed hyperplane, which strictly separates $C$ and $W$; this hyperplane is the graph of an affine function $g$ that dominates $h$ and $h'$ and is strictly dominated by $f$ on $K$. Statement (b) is thus established.
Suppose finally that $f$ is convex u.s.c. Let $g$ and $g'$ be two convex bounded l.s.c. functions on $K$ that strictly dominate $f$. We shall show that there exists a function $h \in \mathcal{C}_f$ dominated
* Bourbaki (16), Chapter 1I, Section 3, Prop. 4, p. 73; Dunford and Schwartz (67), V. 2.7, Theorem 10, p.417.
by $g$ and $g'$; this will imply in particular that $\mathcal{C}_f$ is filtering to the left. Let $a$ be a constant that dominates $g$ and $g'$; denote by $B$ the compact convex set
$$\{(t,x) \in \mathbf{R} \times K : g(x) \le t \le a\},$$
and by $B'$ the analogous set with $g$ replaced by $g'$. Let $C$ be the convex hull of the union $B \cup B'$, which is compact. Set $k(x) = \inf\{t : (t,x) \in C\}$. This function is convex l.s.c. and strictly dominates $f$. It hence is equal, from (a), to the upper envelope of the set $\mathcal{A}_k$. Denote by $\mathcal{H}$ the set of functions of the form
$$h_0 \vee h_1 \vee \cdots \vee h_n \quad (n \in \mathbf{N};\ h_0, \ldots, h_n \in \mathcal{A}_k);$$
$\mathcal{H}$ is a family of continuous functions, which is filtering to the right. There then exists from X.T6 a function $h \in \mathcal{H}$, which strictly dominates $f$.
It only remains to show that the lower envelope of $\mathcal{C}_f$ is equal to $f$. It suffices from the above to construct for each point $x \in K$ and each $t > f(x)$ a convex l.s.c. function $g_{t,x}$ on $K$, which strictly dominates $f$ and is such that $g_{t,x}(x) = t$. Let us thus choose a number $b$, which dominates $f$ on $K$ (such a number exists since $f$ is u.s.c. and finite), and denote by $G_{t,x}$ the convex hull of the point $(t,x)$ and of $\{b\} \times K$: $G_{t,x}$ is a "stalactite" hanging over the graph of $f$. It then suffices to put
$$g_{t,x}(y) = \inf\{s \in \mathbf{R} : (s,y) \in G_{t,x}\}.$$
Extreme points of a compact convex set
8 We denote by $\mathcal{M}^+$ (respectively, $\mathcal{M}_1^+$) the collection of positive (respectively, positive with unit mass) Radon measures on $K$. The barycenter of a measure $\mu \in \mathcal{M}_1^+$ will be written $b(\mu)$. We begin by recalling, or proving quickly, some elementary properties of barycenters. (a) Let $\mu$ be a measure in $\mathcal{M}_1^+$ with barycenter $x$, and let $f$ be a convex (respectively, affine), finite, u.s.c. or l.s.c. function on $K$. We then have
f(x)
$f$ we then have $\mu(f) \le \mu(g) \le \lambda(g)$, which yields the inequality $\mu(f) \le p_\mu(f) \le p_\lambda(f)$ by passage to the limit inferior on $g$. Conversely, let $\mu$ be a linear functional on $\mathcal{C}$, which is dominated by the sublinear function $p_\lambda$. The relation $f \le 0$ implies $p_\lambda(f) \le 0$, and thus $\mu(f) \le 0$; $\mu$ is thus a positive measure on $K$. Let $f$ be a function in $\mathcal{S}$; then $f = \hat f$, hence $p_\lambda(f) = \lambda(f)$ and $\mu(f) \le \lambda(f)$. It then follows that $\mu$ is a balayage of $\lambda$.
T20 COROLLARY For every measure $\lambda \in \mathcal{M}^+$ and every function $f \in \mathcal{C}$ we have
$$p_\lambda(f) = \sup_{\mu\in\mathcal{M}^+,\ \mu\succ\lambda} \mu(f). \tag{20.1}$$
Proof The sublinear function $p_\lambda$ is the upper envelope of the family of linear functions which it dominates (No. 4), and the latter are the balayages of $\lambda$ from the preceding result.
The existence and characterization of maximal measures
T21 THEOREM Every measure $\lambda \in \mathcal{M}^+$ admits a maximal balayage.
Proof Let $\mathcal{M}_\lambda$ be the family of balayages of $\lambda$, ordered by the relation $\prec$; it will suffice to show that $\mathcal{M}_\lambda$ is inductive (Zorn's lemma). Now let $i \mapsto \mu_i$ be an increasing mapping of a totally ordered set $I$ into $\mathcal{M}_\lambda$; since the measures $\mu_i$ are positive and have the same total mass, they admit a weak cluster point $\mu$. We have $\mu(f) = \lim_i \mu_i(f)$ for every function $f \in \mathcal{S}$, since the mapping $i \mapsto \mu_i(f)$ is decreasing. Since the set $\mathcal{S} - \mathcal{S}$ is dense in $\mathcal{C}$, $\mu$ is the weak limit of the $\mu_i$, and also the least upper bound of the $\mu_i$ under the order $\prec$. This establishes the theorem.
Here now is the most important result concerning maximal measures; it is due to Mokobodzki (102).
T22 THEOREM Let $\lambda$ be a positive measure on $K$. The following statements are equivalent:
(a) The measure $\lambda$ is maximal;
(b) $\lambda(f) = \lambda(\hat f)$ for every function $f \in \mathcal{C}$;
(c) $\lambda(f) = \lambda(\hat f)$ for every function $f \in -\mathcal{S}$;
(d) The measure $\lambda$ is carried by each of the sets $B_f = \{x \in K : f(x) = \hat f(x)\}$ ($f \in -\mathcal{S}$).
Proof We know that $\lambda$ is maximal if and only if the set of balayages of $\lambda$ consists of $\lambda$ alone, or again (T19 and No. 4) if $\lambda = p_\lambda$. Statements (a) and (b) are thus equivalent, and (b) clearly implies (c). We show conversely that (c) implies (a): Let $\mu$ be a balayage of $\lambda$; we have $\mu(f) \ge \lambda(f)$ for every function $f \in -\mathcal{S}$, and also $\mu(f) \le p_\lambda(f) = \lambda(f)$, from (c). Since the space $\mathcal{S} - \mathcal{S}$ is dense in $\mathcal{C}$ under uniform convergence on $K$, we have $\mu = \lambda$, and it follows that $\lambda$ is maximal. Finally, the relation $f \le \hat f$ implies immediately the equivalence of (c) and (d).
Maximal measures and extreme points (The metrizable case)
We denote by $\partial K$ the set of extreme points of $K$ (it will be noted that this set has not yet entered the discussion). The first result does not require that $K$ be metrizable.
T23 THEOREM (a) $f(x) = \hat f(x)$ for every point $x \in \partial K$ and every function $f \in \mathcal{C}$;
(b) $\partial K = \bigcap_{f\in-\mathcal{S}} B_f$;
(c) Let $\lambda$ be a positive measure on $K$, such that every compact subset disjoint from $\partial K$ is $\lambda$-negligible; $\lambda$ is then maximal.
Proof If $x$ is extreme every balayage of $\varepsilon_x$ is equal to $\varepsilon_x$ [9(a)]; (a) then follows from formula (20.1). We thus have $\partial K \subset \bigcap_{f\in-\mathcal{S}} B_f$. Conversely, let $x$ be a point of this intersection; the measure $\varepsilon_x$ is then maximal from T22. Every measure $\mu \in \mathcal{M}^+$ such that $r(\mu) = x$ is thus equal to $\varepsilon_x$ [17(c)], and $x$ is extreme from 9(a). Suppose finally that $\lambda$ satisfies the hypothesis of (c), and let $f$ be an element of $\mathcal{C}$. The function $\hat f$ is u.s.c., so that $K \setminus B_f$ is the union of the sequence of compact sets $\{\hat f - f \ge 1/n\}$ ($n \in \mathbf{N}$). These sets are disjoint from $\partial K$, their union is thus $\lambda$-negligible, and so $\lambda$ is carried by $B_f$. It then follows from T22 that $\lambda$ is maximal.
The idea of using a strictly convex function in the proof of the following theorem is borrowed from Bonsall (12) (this article contains a very short and elegant proof of the Choquet theorem in the metrizable case, which is the origin of the proof we give here).
T24 THEOREM Suppose that $K$ is metrizable. The set $\partial K$ is then the intersection of a sequence of open sets. A measure $\lambda \in \mathcal{M}^+$ is maximal if and only if it is carried by $\partial K$.
Proof Since the set $K$ is metrizable, the space $\mathcal{C}$ (given the norm of uniform convergence on $K$) admits a countable dense subset. Since the space $\mathcal{S} - \mathcal{S}$ is dense in $\mathcal{C}$, there exists a sequence $(f_n)_{n\in\mathbf{N}}$ of elements of $-\mathcal{S}$, which separates the points of $K$. We can suppose that all of these functions lie between $-1$ and $1$ on $K$; put then
$$f = \sum_{n\in\mathbf{N}} \frac{1}{2^n}\,f_n^2.$$
This function is also convex continuous and is linear on no open segment contained in $K$. Since the function $\hat f$ is concave, we thus have $\hat f(x) > f(x)$ at every nonextreme point $x \in K$. It follows then from T23(a) that $B_f = \partial K$; $\partial K$ is thus the intersection of the sequence of open sets $\{\hat f - f < 1/n\}$ ($n \in \mathbf{N}$), and every maximal measure $\lambda$ is carried by $\partial K$ from T22(d). Conversely, if $\lambda$ is carried by $\partial K$, $\lambda$ is carried by every set $B_g$ ($g \in -\mathcal{S}$) from T23(a), and thus is maximal from T22(d).
Here then is Choquet's existence theorem for the metrizable case.
T25 THEOREM Suppose that $K$ is metrizable. Every point $x \in K$ is then the resultant of a measure $\mu$ carried by the set of extreme points of $K$.
Proof Let $\mu$ be a maximal balayage of $\varepsilon_x$ (T21); we have $r(\mu) = x$, and $\mu$ is carried by $\partial K$.
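A finite-dimensional illustration may help fix ideas. In the sketch below, which is an example chosen for this purpose and not taken from the text, $K$ is the unit square in the plane, its extreme points are the four vertices, and a point of $K$ is written as the resultant of two different probability measures carried by the vertices. The function names are hypothetical.

```python
def barycenter(points, weights):
    """Resultant of the discrete measure sum_i weights[i] * (unit mass at points[i])."""
    return tuple(sum(w * p[k] for w, p in zip(weights, points)) for k in range(2))

# Extreme points of the unit square K = [0, 1]^2.
vertices = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

def bilinear_measure(a, b):
    """Product weights: one probability measure on the vertices with resultant (a, b)."""
    return [(1 - a) * (1 - b), a * (1 - b), a * b, (1 - a) * b]

def triangle_measure(a, b):
    """A second such measure, carried by three vertices only (assumes a <= b)."""
    return [1 - b, 0.0, a, b - a]

x = (0.3, 0.6)
mu1 = bilinear_measure(*x)
mu2 = triangle_measure(*x)
print(barycenter(vertices, mu1), barycenter(vertices, mu2))
```

Both measures are carried by the extreme points and have resultant $x$, so the square shows that existence does not bring uniqueness with it; the square is not a simplex.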
The uniqueness theorem
The version of the uniqueness theorem that we give is borrowed from the article by Loomis (91) [see Cartier, Fell, Meyer (24)]. We begin by establishing, following Cartier, the identity between the order $\prec$ and the "strong" order introduced by Loomis.
T26 THEOREM Let $\lambda$ and $\mu$ be two positive measures on $K$; the following three properties are equivalent:
(a) $\lambda \prec \mu$;
(b) For every finite family $(\lambda_i)_{i=1,\ldots,n}$ of positive measures on $K$ such that $\lambda = \sum_{i=1}^n \lambda_i$, there exists a finite family $(\mu_i)_{i=1,\ldots,n}$ of positive measures on $K$, such that $\mu = \sum_{i=1}^n \mu_i$ and $\lambda_i \prec \mu_i$ for $i = 1, \ldots, n$;
(c) The same statement as (b), replacing $\lambda_i \prec \mu_i$ by $r(\lambda_i) = r(\mu_i)$.
Proof* Suppose that (a) holds. Let $E$ be the product vector space $\mathcal{C}^n$, and let $F$ be the subspace of $E$ consisting of the elements of $E$ of the form $(f,f,\ldots,f)$ ($n$ times, $f \in \mathcal{C}$). Consider the sublinear function $p$ on $E$ defined by
$$p(f_1,f_2,\ldots,f_n) = \lambda_1(\hat f_1) + \lambda_2(\hat f_2) + \cdots + \lambda_n(\hat f_n).$$
The linear functional $(f,f,\ldots,f) \mapsto \mu(f)$ on $F$ is dominated by $p$ on $F$. It can thus be extended to all of $E$, from T2, by a linear functional dominated by $p$. This functional can be written
$$(f_1,f_2,\ldots,f_n) \mapsto \mu_1(f_1) + \mu_2(f_2) + \cdots + \mu_n(f_n),$$
$\mu_1, \ldots, \mu_n$ denoting linear functionals on $\mathcal{C}$. We have $\mu_i(f) \le \lambda_i(\hat f)$ for every $i$ and every function $f \in \mathcal{C}$; consequently $\mu_i$ is a balayage of $\lambda_i$ from T19. Finally, we have $\sum_i \mu_i = \mu$,
and property (b) is established. Property (b) clearly implies (c). To show that (c) implies (a), consider a function $f \in \mathcal{S}$, and a number $\varepsilon > 0$. Cover $K$ with a finite number of closed convex sets $W_1, W_2, \ldots, W_m$ on each of which the oscillation of $f$ is less than $\varepsilon$. Put $e_i = W_i \setminus \bigcup_{j<i} W_j$ … $\ge \sum_i\,[f(r(\lambda_i)) - \varepsilon\lambda_i(1)]$ from the condition on the oscillation of the (positive-homogeneous) function $f$ on $W_i$. This last expression is equal to
$$\sum_i f(r(\mu_i)) - \varepsilon\lambda(1) \ge \sum_i \mu_i(f) - \varepsilon\lambda(1) = \mu(f) - \varepsilon\lambda(1),$$
from (15.2). We thus have $\lambda \prec \mu$, and the theorem is established.
27 Let $x$ be an element of $C$. We define a subdivision of $x$ to be any finite family $(x_i)_{i=1,2,\ldots,n}$ of elements of $C$ such that $x = \sum_{i=1}^n x_i$. The subdivisions of a measure $\mu \in \mathcal{M}^+$ are defined similarly. The set of all subdivisions of $x$ is partially ordered† by the relation "$s$ is less fine than $t$," which we write $s \dashv t$, and which is stated, if $s = (x_i)_{i=1,\ldots,n}$ and $t = (y_j)_{j=1,\ldots,k}$, as
$$(s \dashv t) \iff \Bigl(\text{there exists a partition of } \{1,2,\ldots,k\} \text{ into } n \text{ sets } J_1, \ldots, J_n \text{ such that } x_i = \sum_{j\in J_i} y_j \text{ for all } i\Bigr).$$
* One may note that the first part of the proof can be generalized as follows: let $E$ be a vector space, $p_1, p_2, \ldots, p_n$ be sublinear functions on $E$, $x'$ be a linear functional dominated by $p_1 + p_2 + \cdots + p_n$. One may then find linear functionals $x'_1, x'_2, \ldots, x'_n$ on $E$, dominated, respectively, by $p_1, p_2, \ldots, p_n$, such that $x' = x'_1 + x'_2 + \cdots + x'_n$. We shall not give details here, since No. 51 will yield an extension of this result to "continuous sums" of sublinear functions.
† This translates the French "préordonné"; note that ($s \dashv t$ and $t \dashv s$) doesn't imply $s = t$.
The Choquet Theorem
We associate with every subdivision $s = (x_i)_{i=1,\ldots,n}$ of $x$ the measure $\varepsilon_s = \sum_{i=1}^n \varepsilon_{x_i}$. The relation $s \dashv t$ clearly implies $\varepsilon_s \prec \varepsilon_t$. The relation $\varepsilon_s \prec \mu$, where $\mu$ is an element of $\mathscr{M}^+$, is equivalent from T26 to the existence of a subdivision $(\mu_i)_{i=1,\ldots,n}$ of $\mu$ such that $x_i = r(\mu_i)$ for every $i$. Let $S$ be a collection of subdivisions of $x$ which is filtering to the right for the relation $\dashv$; we say then simply that $S$ is a filtering set. Such sets are natural objects of study in certain applications of Choquet's theory, in particular in the theory of group representations, which was the origin of Loomis's work.

Let $\mu$ be a positive measure on $K$, and let $(\mu_i)_{i=1,2,\ldots,n}$ and $(\mu'_j)_{j=1,2,\ldots,k}$ be two subdivisions of $\mu$; there then exist positive measures $\lambda_{ij}$ ($1 \le i \le n$, $1 \le j \le k$) such that $\mu_i = \sum_j \lambda_{ij}$ for every $i$, and $\mu'_j = \sum_i \lambda_{ij}$ for every $j$.* Denote then by $S_\mu$ the collection of subdivisions of $x = r(\mu)$ of the form $(r(\mu_i))_{i=1,\ldots,n}$, where $(\mu_i)_{i=1,\ldots,n}$ denotes a subdivision of $\mu$. It is clear from the above that this set is filtering. The following theorem then summarizes the main results of Loomis's theory.

T28 THEOREM (a) Let $\lambda$ and $\mu$ be two positive measures. The relations $\lambda \prec \mu$ and $S_\lambda \subset S_\mu$ are equivalent. (b) Let $R$ be a filtering set of subdivisions of $x$. The set of measures $\varepsilon_s$ ($s \in R$) then admits an upper bound for the order $\prec$ [...] $(\varepsilon_t, f)$. Since the set $\mathscr{S} - \mathscr{S}$ is dense in $\mathscr{C}$, it follows that the mapping $s \mapsto \varepsilon_s$ from $R$ into $\mathscr{M}^+$ admits a single cluster point in the weak topology, along the filter of sections of the partially ordered set $R$. This mapping thus admits a weak limit $\varepsilon_R$, and it can be verified easily that $\varepsilon_R$ is the supremum of the measures $\varepsilon_s$ ($s \in R$). The relation $\varepsilon_s \prec \varepsilon_R$ for every subdivision $s \in R$ implies $R \subset S_{\varepsilon_R}$ from T26. Finally, if $R$ is of the form $S_\mu$, we have $\varepsilon_s \prec \mu$ for every $s \in R$, hence $\varepsilon_R \prec \mu$; on the other hand $R = S_\mu \subset S_{\varepsilon_R}$, so that $\mu \prec \varepsilon_R$ and lastly $\mu = \varepsilon_R$.
The collection of filtering sets of subdivisions of $x$, ordered by inclusion, is clearly inductive; (c) is thus an immediate consequence of Zorn's lemma. Suppose finally that $R$ is a maximal filtering set, and let $\mu$ be a measure such that $R \subset S_\mu$. We have $R \subset S_{\varepsilon_R}$ and $R \subset S_\mu$, thus $R = S_\mu = S_{\varepsilon_R}$ and finally $\mu = \varepsilon_R$. In other words, $\varepsilon_R$ is the only measure $\mu$ such that $R \subset S_\mu$. It then follows in particular that $\varepsilon_R$ is maximal. Conversely, let $\mu$ be a maximal measure, and let $R$ be a maximal filtering set containing $S_\mu$ [one exists, from (c)]. The measure $\varepsilon_R$ is then maximal; we have $\mu \prec \varepsilon_R$ and thus $\mu = \varepsilon_R$. Consequently $S_\mu = R$, and $S_\mu$ is indeed a maximal filtering set.

Theorem 28 implies Choquet's uniqueness theorem:
T29 THEOREM The following two statements are equivalent: (a) The cone $C$ is a lattice under its intrinsic order. (b) For every point $x \in C$ there exists a unique maximal measure $\mu \in \mathscr{M}^+$ such that $r(\mu) = x$.
* Let $f_i$ (respectively, $f'_j$) be Borel functions on $K$ into $[0,1]$, such that $\mu_i = f_i \mu$ (respectively, $\mu'_j = f'_j \mu$). We then set $\lambda_{ij} = f_i f'_j \mu$.
Convex Cones and Extremal Elements
Proof Suppose that the cone $C$ is a lattice. Let $x$ be an element of $C$, and let $(x_i)_{i=1,\ldots,n}$ and $(x'_j)_{j=1,\ldots,k}$ be two subdivisions of $x$. The "decomposition lemma" [see Bourbaki (18), Chapter II, Section 1, No. 1, p. 19] implies the existence of elements $y_{ij}$ ($1 \le i \le n$, $1 \le j \le k$) of $C$ such that $x_i = \sum_j y_{ij}$ for every $i$, and $x'_j = \sum_i y_{ij}$ for every $j$. In other words the set of all subdivisions of $x$ is filtering; it is therefore the unique maximal filtering set, and (b) follows from T28.

To establish the converse, we note first that the set $\mathscr{M}^+_m$ of maximal measures is always a convex cone, which is a lattice for its intrinsic order. To see this let $\lambda$ and $\mu$ be two maximal measures; $\lambda$ and $\mu$ then charge no set $B_f$ ($f \in -\mathscr{S}$; cf. No. 22). The same is then true of the measures $\lambda + \mu$, $\lambda \wedge \mu$, and $\lambda \vee \mu$ (these last symbols with respect to the natural order on $\mathscr{M}^+$), measures which are thus maximal from T22. It follows that $\mathscr{M}^+_m$ is a cone and that $\lambda \wedge \mu$ and $\lambda \vee \mu$ are, respectively, the inf and the sup of $\lambda$ and $\mu$ in $\mathscr{M}^+_m$. Now the function $\mu \mapsto r(\mu)$ maps $\mathscr{M}^+_m$ onto $C$ (T21), and it is clearly additive and increasing when $\mathscr{M}^+_m$ and $C$ are given their intrinsic orderings. Suppose that property (b) is satisfied: this mapping is then an isomorphism, and it follows that $C$ is a lattice.

We say (following Choquet) that $K$ is a simplex if the cone $C$ is a lattice. The reader will find other characterizations of the case of uniqueness in the article by Choquet-Meyer (44).
The following theorem will be generalized later (No. 36).
T30 THEOREM Suppose that $K$ is a simplex. For every $x \in K$, let $\mu_x$ be the unique maximal measure with barycenter $x$. The mapping $x \mapsto \mu_x(f)$ is then Borel for every function $f \in \mathscr{C}$. Let $\lambda$ be a positive measure; the unique maximal measure $\mu$ such that $\lambda \prec \mu$ is given by the formula

\[ \mu(f) = \int_K \mu_x(f)\, d\lambda(x) \qquad (f \in \mathscr{C}). \qquad (30.1) \]
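To make the formula concrete, here is a sketch for the simplest case. It assumes that $K$ is a finite-dimensional simplex with vertices $e_1, \ldots, e_n$ and that (30.1) has the form $\mu(f) = \int \mu_x(f)\, d\lambda(x)$; the coordinate functions $b_i$ are notation introduced only for this illustration.

```latex
% Assumption: K = convex hull of affinely independent points e_1, ..., e_n,
% and b_1, ..., b_n are the barycentric coordinates (affine, b_i >= 0, sum 1).
\[
x = \sum_{i=1}^n b_i(x)\, e_i , \qquad
\mu_x = \sum_{i=1}^n b_i(x)\, \varepsilon_{e_i} , \qquad
\mu_x(f) = \sum_{i=1}^n b_i(x)\, f(e_i).
\]
% Integrating against \lambda gives a measure carried by the vertices:
\[
\mu(f) = \int_K \mu_x(f)\, d\lambda(x)
       = \sum_{i=1}^n \lambda(b_i)\, f(e_i),
\qquad\text{i.e.}\qquad
\mu = \sum_{i=1}^n \lambda(b_i)\, \varepsilon_{e_i}.
\]
% Check of \lambda \prec \mu: for concave f, f(x) \ge \sum_i b_i(x) f(e_i), hence
\[
\lambda(f) \;\ge\; \sum_{i=1}^n \lambda(b_i)\, f(e_i) \;=\; \mu(f).
\]
```

In this case $x \mapsto \mu_x(f)$ is even continuous, so the Borel measurability asserted by the theorem is immediate.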
Proof For every measure $\lambda \in \mathscr{M}^+$ and every function $f \in -\mathscr{S}$ we have from (20.1)

\[ p_\lambda(f) = \lambda(\hat f) = \sup_{\lambda \prec \theta} \theta(f) = \mu(f), \]

[...] $> 0$). (40.3)
[...] $> 0$:

\[ f(x + h + k) = \int_0^\infty e^{-xt} e^{-kt}\, d\mu_h(t) = \int_0^\infty e^{-xt}\, d\mu_{h+k}(t), \]

which implies $d\mu_{h+k}(t) = e^{-kt}\, d\mu_h(t)$ (uniqueness of Laplace transforms). It follows that the measure $\mu$ defined by $d\mu(t) = e^{ht}\, d\mu_h(t)$ does not depend on $h$, and (40.3) then becomes

\[ f(x + h) = \int_0^\infty e^{-(x+h)t}\, d\mu(t). \]

We replace $x + h$ by $x$ and get the representation (40.2).
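As a check on this argument, consider the function $f(x) = 1/(1+x)$. This example is not in the text, and it assumes that (40.3) is the representation $f(x+h) = \int_0^\infty e^{-xt}\, d\mu_h(t)$.

```latex
% Illustrative choice (not from the text): f(x) = 1/(1+x), x \ge 0, h > 0.
\[
f(x+h) = \frac{1}{1+x+h} = \int_0^\infty e^{-xt}\, e^{-(1+h)t}\, dt ,
\qquad\text{so}\qquad d\mu_h(t) = e^{-(1+h)t}\, dt .
\]
% The consistency relation and the h-independent measure:
\[
d\mu_{h+k}(t) = e^{-kt}\, d\mu_h(t), \qquad
d\mu(t) = e^{ht}\, d\mu_h(t) = e^{-t}\, dt ,
\]
% which recovers the representation of f itself:
\[
\int_0^\infty e^{-xt}\, d\mu(t) = \int_0^\infty e^{-(1+x)t}\, dt = \frac{1}{1+x} = f(x).
\]
```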
* In fact, one can easily prove that the decreasing exponential functions and the constant 1 are extremal elements of Cb, but we don't need this here.
3. Balayage Defined by a Convex Cone of Functions
In all of the classical forbears of Newtonian potential theory certain convex cones of functions, which play a fundamental role, are seen to appear: superharmonic functions, plurisuperharmonic functions, concave functions, and excessive functions. These cones are always closed under the operation $\wedge$, and the functions of which they consist are generally lower semicontinuous. One can thus imagine the "general potential theory" as the study of convex cones of functions that have these two properties. This ambitious "general theory" so far has certainly not reached its definitive form. Its main interest at this time comes from the better insight it gives into older results (Choquet's theorem, Shilov boundaries) and from having simplified their proofs. This certainly is enough to justify the study here. Given the incomplete state of the theory, it appeared sufficient for us to give the general ideas in their simplest form. We limit ourselves in particular to the study of convex cones of continuous functions. The results of this section are borrowed from Bauer (3) and Mokobodzki.

D41 DEFINITION Let $X$ be a compact space and $\mathscr{S}$ a subset of $\mathscr{C}(X)$. We denote by $\prec$ the partial ordering on $\mathscr{M}^+(X)$ defined by

\[ (\lambda \prec \mu) \iff \bigl( \lambda(f) \ge \mu(f) \ \text{for every } f \in \mathscr{S} \bigr). \]

[...] $\mu(1)$; the collection of balayages of a measure $\lambda$ is thus weakly compact.
Example Suppose that $X$ is a compact convex subset of a locally convex space, and that $\mathscr{S}$ is the collection of continuous concave functions on $X$. The relation $\prec$ then coincides with that which we have used in the preceding section.
D42 DEFINITION A point $x \in X$ is said to belong to the boundary (of $X$ relative to $\mathscr{S}$) if there exists no balayage of $\varepsilon_x$ distinct from $\varepsilon_x$.

The boundary will be denoted by $\partial_{\mathscr{S}} X$. Remarks 41(a) and (b) imply that the boundary is not changed if $\mathscr{S}$ is replaced by the closed convex cone, closed under $\wedge$, which is generated by $\mathscr{S}$.
Example Suppose that there exists a function $f \in \mathscr{S}$ which attains a strict negative minimum at a point $x$: $f(x) < 0$; $f(y) > f(x)$ for every $y \in X \setminus \{x\}$. The point $x$ then belongs to the boundary. Indeed let $\mu$ be a balayage of $\varepsilon_x$; we have $\mu(f) \le f(x)$ and

\[ \mu(f) = \mu(\{x\}) f(x) + \int_{X \setminus \{x\}} f(y)\, d\mu(y) \;\ge\; \mu(\{x\}) f(x) + \mu(X \setminus \{x\}) f(x) = \mu(1) f(x), \]

where the inequality is strict if $\mu$ charges $X \setminus \{x\}$. The relation $1 \in \mathscr{S}$ implies that $\mu(1) \le 1$, hence $\mu(1) f(x) \ge f(x)$, with strict inequality if $\mu(1) < 1$. The comparison of these inequalities then gives $\mu(X \setminus \{x\}) = 0$, $\mu(1) = 1$, and hence $\mu = \varepsilon_x$.
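The comparison at the end of this example can be written as a single chain. The display below only restates the inequalities of the example, with $\mu$ a balayage of $\varepsilon_x$ and under the same assumption $1 \in \mathscr{S}$.

```latex
\[
f(x) \;\ge\; \mu(f)
      \;=\; \mu(\{x\})\, f(x) + \int_{X \setminus \{x\}} f(y)\, d\mu(y)
      \;\ge\; \mu(1)\, f(x)
      \;\ge\; f(x).
\]
% The two ends agree, so every inequality is an equality:
%   middle inequality => \mu(X \setminus \{x\}) = 0  (since f(y) > f(x) off x),
%   last inequality   => \mu(1) = 1                  (since f(x) < 0),
% and therefore \mu = \varepsilon_x.
```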
D43 DEFINITION Let $A$ be a subset of $X$; we say that $A$ is a Shilov set (relative to $\mathscr{S}$) if the relations

\[ f \in \mathscr{S}; \qquad \inf_{x \in A} f(x) \ge -1 \qquad (43.1) \]

imply the inequality $\inf_{x \in X} f(x) \ge -1$.
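For a concrete instance of this definition, take $X = [0,1]$ and for $\mathscr{S}$ the continuous concave functions on $X$. This is a special case of the convex-set example, chosen here purely for illustration. The two endpoints then form a Shilov set, while a single endpoint does not.

```latex
% Assumption: X = [0,1], S = continuous concave functions on X, A = \{0, 1\}.
% If f \in S with f(0) \ge -1 and f(1) \ge -1, then by concavity, for t \in [0,1],
\[
f(t) \;\ge\; (1-t)\, f(0) + t\, f(1) \;\ge\; -(1-t) - t \;=\; -1 ,
\]
% so \inf_{x \in X} f(x) \ge -1: the implication required by (43.1) holds.
% By contrast A' = \{0\} is not a Shilov set: f(t) = -2t belongs to S and
% satisfies f(0) = 0 \ge -1, while
\[
\inf_{t \in [0,1]} (-2t) = -2 < -1 .
\]
```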
Remarks (a) The set $X$ is always a Shilov set; the empty set is a Shilov set if and only if every function $f \in \mathscr{S}$ is positive. (b) Let $A$ be a compact subset of $X$; Definition 43 then takes the following form: $A$ is a compact Shilov set if and only if every function $f \in \mathscr{S}$ which takes on a value $< 0$ attains its minimum at a point of $A$. (c) The Shilov sets remain the same if $\mathscr{S}$ is replaced by the closed convex cone generated by $\mathscr{S}$ and closed under the operation $\wedge$.

D44 DEFINITION Suppose that $\mathscr{S}$ is a convex cone closed under the operation $\wedge$ and containing the positive constants. Let $A$ be a Shilov set. For every function $f \in \mathscr{C}(X)$ we set

\[ \hat f_A = \inf_{\substack{g \in \mathscr{S} \\ g \ge f \text{ on } A}} g. \qquad (44.1) \]

We write $\hat f$ in place of $\hat f_X$.
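Remarks (a) The relation $f \le 1$ implies $\hat f_A \le 1$; the relation $f \ge -1$ implies $\hat f_A \ge -1$, from the fact that $A$ is a Shilov set. The function $\hat f_A$ is thus bounded for every function $f \in \mathscr{C}(X)$; since it is upper semicontinuous, it is integrable for every measure $\lambda \in \mathscr{M}^+(X)$. Since the cone $\mathscr{S}$ is closed under $\wedge$ we have

\[ \langle \lambda, \hat f_A \rangle = \inf_{\substack{g \in \mathscr{S} \\ g \ge f \text{ on } A}} \langle \lambda, g \rangle. \]

(b) We set

\[ p_{A,\lambda}(f) = \langle \lambda, \hat f_A \rangle. \qquad (44.2) \]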
This function is finite and sublinear on $\mathscr{C}(X)$. We then have the following theorem, valid under the hypotheses of D44, which generalizes T19.

T45 THEOREM Let $A$ be a compact Shilov set. The linear functionals on $\mathscr{C}(X)$ dominated by the sublinear function $p_{A,\lambda}$ are identical with the balayages of $\lambda$ carried by $A$.

Proof Let $\mu$ be a balayage of $\lambda$ carried by $A$; we have $\mu(f) \le \mu(g) \le \lambda(g)$ for every function $g \in \mathscr{S}$ which dominates $f$ on $A$, and hence $\mu(f) \le p_{A,\lambda}(f)$. Conversely, let $\phi$ be a linear functional on $\mathscr{C}(X)$ dominated by $p_{A,\lambda}$; the relation $f \le 0$ implies $p_{A,\lambda}(f) \le 0$, and hence $\phi(f) \le 0$: $\phi$ is thus a positive measure. The relations $f \ge 0$, $f = 0$ on $A$, imply $p_{A,\lambda}(f) \le 0$, and hence $\phi(f) \le 0$. Thus $\phi$ is carried by $A$. Finally we have $p_{A,\lambda}(f) \le \lambda(f)$ if $f$ belongs to $\mathscr{S}$, hence $\phi(f) \le \lambda(f)$, and $\phi$ is a balayage of $\lambda$.
T46 COROLLARY We have*

\[ p_{A,\lambda}(f) = \sup_{\substack{\lambda \prec \mu \\ \mu \text{ carried by } A}} \mu(f). \]

[...] 2), and for $\mathscr{S}$ the convex cone consisting of the positive superharmonic functions continuous in the unit ball and zero at the boundary (in other words, vanishing at infinity in $X$). Every element of $\mathscr{S}$ is of the form $U\nu$, where $\nu$ is a positive measure on $X$ and $U$ denotes the Green's kernel. $\mathscr{S}$ contains all functions of the form $Uf$, where $f$ is positive, continuous, and has compact support in $X$; it is well known that $\mathscr{S}$ satisfies the hypotheses of No. 49. Let $\lambda$ and $\mu$ be two bounded positive measures on $X$; the relation $\lambda \prec \mu$ implies

\[ \langle \lambda U, f \rangle = \langle \lambda, Uf \rangle \ge \langle \mu, Uf \rangle = \langle \mu U, f \rangle \qquad (50.1) \]

for every function $f \in \mathscr{C}^+_K(X)$, and consequently

\[ \lambda U \ge \mu U. \qquad (50.2) \]
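Conversely, this last relation implies $\lambda \prec \mu$, since for every function $g = U\nu \in \mathscr{S}$ [$\nu \in \mathscr{M}^+(X)$] we have

\[ \langle \lambda, U\nu \rangle = \langle \lambda U, \nu \rangle \ge \langle \mu U, \nu \rangle = \langle \mu, U\nu \rangle. \]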
In particular, let $\mu$ be the balayage of $\lambda$ (in the classical sense) on an open set $\omega$ of $X$; it is known that $\lambda U \ge \mu U$ on $X$, and $\lambda U = \mu U$ on $\omega$. We thus have $\lambda \prec \mu$, so that $\mu$ is a balayage of $\lambda$ in the sense of this chapter. This justifies, in part, the use of this terminology. Since the functions of $\mathscr{S}$ are positive, we have $\varepsilon_x \prec 0$ for every $x \in X$, and the boundary is empty. Every compact set $A \subset X$ is thus a compact Shilov set. The function $\hat f_A$ is hence defined for every function $f \in \mathscr{S}$; this function is well known in classical potential theory, where it is called the réduite of $f$ on $A$.
Theorems on the existence of dispersions

We are now going to extend Theorem 36 of the preceding section (on the existence of dilations) to the balayage defined by a convex cone $\mathscr{S}$. We begin by establishing a very general result, which implies a large number of existence theorems for dispersions; it is due to V. Strassen (117). Here first are several points of vocabulary. Denote by $E$ a Banach space (with norm written $\|\cdot\|$) and by $S$ the collection of sublinear functions on $E$; consider a mapping $q: \omega \mapsto q_\omega$ from a measurable space $(\Omega, \mathscr{F})$ into $S$. We shall say that $q$ is bounded if there exists a positive constant $K$ such that $|q_\omega(x)| \le K \|x\|$ for every $x \in E$ and every $\omega \in \Omega$*; we shall say that $q$ is weakly measurable if the real-valued function $\omega \mapsto q_\omega(x)$ is measurable for every $x \in E$.

* The reader can easily verify the relation $|q_\omega(x) - q_\omega(y)| \le K \|x - y\|$, which implies the continuity of the functions $q_\omega$ on $E$.
T51 THEOREM (Strassen) Let $E$ be a separable Banach space, and $(\Omega, \mathscr{F})$ a measurable space with a complete bounded positive measure $\lambda$. Let $p: \omega \mapsto p_\omega$ be a bounded, weakly measurable mapping from $\Omega$ into the set $S$. Denote by $s$ the sublinear function

\[ s(x) = \int_\Omega p_\omega(x)\, d\lambda(\omega) \qquad (x \in E). \qquad (51.1) \]

Let $x'$ be an element of the dual $E'$ of $E$. The following properties are equivalent: (a) $x'$ is dominated by $s$ [$\langle x', x \rangle \le s(x)$ for every $x \in E$]; (b) There exists a bounded, weakly measurable mapping $\omega \mapsto x'_\omega$ from $\Omega$ into $E'$ such that $x'_\omega$ is dominated by $p_\omega$ for $\lambda$-almost all $\omega$, and such that

\[ \langle x', x \rangle = \int_\Omega \langle x'_\omega, x \rangle\, d\lambda(\omega) \qquad \text{for every } x \in E. \qquad (51.2) \]
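To see what the theorem says in the simplest situation, suppose (as an assumption made only for this illustration) that $\Omega$ is a finite set $\{1, \ldots, m\}$ and that $\lambda(\{k\}) = c_k > 0$. The integrals then become finite sums, and the statement reduces to a decomposition of a single dominated functional.

```latex
% Assumption: \Omega = \{1, \dots, m\}, \lambda(\{k\}) = c_k > 0,
% p_1, \dots, p_m bounded sublinear functions on E.
\[
s(x) = \sum_{k=1}^m c_k\, p_k(x) \qquad (x \in E).
\]
% (a) => (b) then reads: if \langle x', x \rangle \le s(x) for every x \in E,
% there exist x'_1, \dots, x'_m \in E' with
\[
\langle x'_k, x \rangle \le p_k(x) \quad (k = 1, \dots, m,\ x \in E),
\qquad
x' = \sum_{k=1}^m c_k\, x'_k .
\]
% With q_k = c_k p_k and y'_k = c_k x'_k: a functional dominated by
% q_1 + \dots + q_m is a sum of functionals y'_k, each dominated by q_k.
```

The general theorem replaces this finite sum by an integral over $\Omega$, which is where the measurability of $\omega \mapsto x'_\omega$ becomes the real difficulty.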
Proof It is clear that (b) implies (a). To establish the reverse implication, we first recall several facts from measure theory. A function $f$ defined on $\Omega$, with values in $E$, is said to be measurable if it is the uniform limit of measurable elementary functions. A measurable function $f$ is said to be integrable if the quantity

\[ \|f\|_1 = \int_\Omega \|f(\omega)\|\, d\lambda(\omega) \]

is finite. One denotes by $\mathscr{L}^1_E$ the vector space of integrable functions with values in $E$, and by $L^1_E$ the quotient space of $\mathscr{L}^1_E$ by the subspace of a.s. zero functions. It can be shown that $L^1_E$, with the norm $\|\cdot\|_1$ defined above, is a Banach space. Since the measure $\lambda$ is bounded, the constant mappings from $\Omega$ into $E$ are integrable. We can clearly assume, without loss of generality, that $\lambda$ is a probability law. Denote by $\theta$ the mapping which associates with each $x \in E$ the equivalence class of the constant function equal to $x$; $\theta$ is an isomorphism of $E$ onto a subspace of $L^1_E$, which we identify with $E$ in what follows. Let $f$ be a real-valued integrable function, and $x$ an element of $E$. We denote by $fx$ the integrable function $\omega \mapsto f(\omega)x$ with values in $E$; it can be shown without difficulty that the vector space generated by the classes of these functions is dense in $L^1_E$.

Let $G$ be a continuous linear functional, of norm $K$, on the normed space $\mathscr{L}^1_E$ (or, by passing to the quotient, on $L^1_E$). It can be shown* that there exists a weakly measurable mapping $\omega \mapsto g'(\omega)$, with values in the ball of radius $K$ in the Banach space $E'$, such that for every integrable function $f$ with values in $E$ we have

\[ G(f) = \int_\Omega \langle g'(\omega), f(\omega) \rangle\, d\lambda(\omega). \qquad (51.3) \]
Now let us return to the proof of Theorem 51. Let $f$ be an integrable function with values in $E$; since the real-valued function $\omega \mapsto p_\omega(f(\omega))$ is obviously integrable, we can extend the sublinear function $s$ from $E$ to $L^1_E$ by setting

\[ s(f) = \int_\Omega p_\omega(f(\omega))\, d\lambda(\omega) \qquad (f \in L^1_E). \]
* Here is a quick proof of this result. For every $x \in E$, consider the linear functional $f \mapsto G(fx)$ on the space $L^1$, with norm at most equal to $K\|x\|$. The dual of $L^1$ being $L^\infty$, there exists a unique element $h_x$ of $L^\infty$, with norm at most equal to $K\|x\|$, such that $G(fx) = E[f h_x]$ for every function $f \in L^1$; it is clear that $h_{x+y} = h_x + h_y$ and $h_{tx} = t h_x$. Let $\rho$ be a linear and isometric lifting of $L^\infty$ into $\mathscr{L}^\infty$ (see VII.TU); denote by $H_x$ the function $\rho(h_x) \in \mathscr{L}^\infty$, and by $g'(\omega)$ the linear functional $x \mapsto H_x(\omega)$ on $E$. The desired properties can be easily verified.
The linear functional $x'$ on $E$, dominated by $s$ on $E$, can be extended from T2 to a linear functional $\xi'$ on $L^1_E$, which is dominated by $s$. This linear functional can be written

\[ \xi'(f) = \int_\Omega \langle x'_\omega, f(\omega) \rangle\, d\lambda(\omega) \qquad (f \in L^1_E), \qquad (51.4) \]

where $\omega \mapsto x'_\omega$ denotes a bounded, weakly measurable mapping from $\Omega$ into $E'$. Formula (51.2) says that $\xi'$ extends $x'$; the theorem will hence be established if we show that $x'_\omega$ is dominated by $p_\omega$ for almost every $\omega$. Let $(x_n)_{n \in \mathbb{N}}$ be a dense sequence in $E$; it clearly suffices to verify that a.s. $\langle x'_\omega, x_n \rangle \le p_\omega(x_n)$ for every $n \in \mathbb{N}$. To see this, we write the relation $\xi'(I_A x_n) \le s(I_A x_n)$ for every $A \in \mathscr{F}$ as

$\int_A \langle x'_\omega, x_n \rangle\, d\lambda(\omega)$