…so if g belongs to some L_Φ, or H_Φ, then so does f*, with no larger norm. Now, it is easy to see that
(1.1.5a)  1_A (X_σ − X_τ) = X_ρ − X_τ.
Proof. Let n ∈ ℕ be given. We must show that {ρ = n} ∈ F_n. Now A ∈ F_σ, so we have {σ = n} ∩ A ∈ F_n. Since σ ≤ τ, we have F_σ ⊆ F_τ, so the complementary set Ω \ A belongs to F_τ, and {τ = n} \ A ∈ F_n. Therefore

{ρ = n} = ({σ = n} ∩ A) ∪ ({τ = n} \ A) ∈ F_n.

For equation (1.1.5a), consider separately what happens on A and on Ω \ A.

Optional stopping
Let (X_n) be an adapted stochastic process. If σ is a random variable with values in ℕ, then it makes sense to talk about X_σ, that is, the value that the process has at the time σ. Technically, X_σ is a random variable given by

(X_σ)(ω) = X_{σ(ω)}(ω).

If X_n represents the fortune of a gambler at time n, and he has the option of stopping whenever he wants to (but does not know the future), then the time he chooses to stop must be a "stopping time" as it has been defined. This means that X_σ is his fortune when he stops. Then X_σ not only is a random variable, but is F_σ-measurable:
(1.1.6) Proposition. Let σ ∈ Σ and let (X_n) be an adapted process. Then X_σ is F_σ-measurable.

Proof. For all n ∈ ℕ and all λ ∈ ℝ,

{σ = n} ∩ {X_σ < λ} = {σ = n} ∩ {X_n < λ} ∈ F_n.

Therefore {X_σ < λ} ∈ F_σ, so X_σ is F_σ-measurable.
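The definition (X_σ)(ω) = X_{σ(ω)}(ω) is easy to compute with. The following sketch (not from the text; the walk, level, and horizon are illustrative choices) evaluates X_σ along one simulated path, where σ is a first-passage time truncated at a fixed horizon so that it is simple:

```python
import random

def first_passage(path, level):
    """sigma = first n with |X_n| >= level; deciding "sigma <= n"
    inspects only X_0, ..., X_n, as a stopping time must."""
    for n, x in enumerate(path):
        if abs(x) >= level:
            return n
    return len(path) - 1          # truncate: sigma is then simple

random.seed(0)
horizon, level = 20, 3
path, s = [], 0
for _ in range(horizon):          # one path of a simple random walk
    s += random.choice([-1, 1])
    path.append(s)

sigma = first_passage(path, level)
X_sigma = path[sigma]             # (X_sigma)(w) = X_{sigma(w)}(w)
print(sigma, X_sigma)
```

The decision "σ ≤ n" uses only the path up to time n, which is exactly the stopping-time requirement; the gambler cannot peek at the future.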
A useful consequence of this proposition is the possibility of "optional sampling." Let (X_n) be a process adapted to the stochastic basis (F_n). Let σ_1 ≤ σ_2 ≤ ⋯ be an increasing sequence of stopping times. Then the sampled process is the sequence Y_k = X_{σ_k} of random variables, which is adapted to the stochastic basis (G_k)_{k∈ℕ} defined by G_k = F_{σ_k}.

The random variable sup_n |X_n| is known as a "maximal function." The next result is our first example of an inequality for the distribution of the maximal function; other maximal inequalities will be seen later.
(1.1.7) Maximal inequality. Let (X_n) be an adapted sequence of random variables. Then for every λ > 0,

P{ sup_{n∈ℕ} |X_n| > λ } ≤ (1/λ) sup_{τ∈Σ} E[|X_τ|].
Proof. Let N ∈ ℕ and define A_N = {max_{n≤N} |X_n| > λ}. Define

τ(ω) = min{ n ≤ N : |X_n(ω)| > λ }  if ω ∈ A_N,
τ(ω) = N                            if ω ∉ A_N.

Then τ ∈ Σ. Now

(1.1.7a)  sup_{σ∈Σ} E[|X_σ|] ≥ E[|X_τ|] ≥ E[|X_τ| 1_{A_N}] ≥ λ P(A_N).
Now as N → ∞, the set A_N increases to {sup_n |X_n| > λ}, so the result follows by the monotone convergence theorem.

Complements
(1.1.8) (Uniqueness of limits.) Let (a_t) be a net in a metric space S. Then (a_t) converges to at most one point of S.

(1.1.9) (Subnets.) Let J and L be two directed sets. A function φ: L → J is called cofinal if, for every t ∈ J there exists s_0 ∈ L such that φ(s) ≥ t for all s ≥ s_0. If (a_t)_{t∈J} is a net in S, then the net (b_s)_{s∈L} is a subnet of (a_t) if there is a cofinal function φ: L → J such that b_s = a_{φ(s)} for all s ∈ L. Prove: if (a_t) converges to a ∈ S, then the subnet (b_s) also converges to a.

(1.1.10) (Fréchet property.) Let (a_t) be a net in a topological space S. Let a ∈ S. Suppose that every subnet (b_s) of (a_t) admits a further subnet (c_u) that converges to a. Then (a_t) itself converges to a. (See, for example, Kelley [1955], Chapter 2.)

(1.1.11) (Boundedness.) A convergent real-valued net need not be bounded.
(1.1.12) (Measurable approximation lemma.) Let (F_n) be a stochastic basis. Write F_∞ for the σ-algebra generated by the union ∪_{n∈ℕ} F_n. Let Y be an F_∞-measurable random variable. Then for every ε > 0, there exist an integer n ∈ ℕ and a random variable X, measurable with respect to F_n, such that P{|X − Y| > ε} < ε. To prove this, observe successively that the set of random variables Y with this approximation property: (a) is a linear space; (b) is closed under pointwise a.e. convergence; (c) contains indicator functions 1_B for B ∈ F_n; (d) contains 1_B for B ∈ F_∞; (e) contains all random variables measurable with respect to F_∞.

(1.1.13) (Optional stopping of the stochastic basis.) If (F_n) is a stochastic basis, and τ ∈ Σ, then F_τ is a σ-algebra.
(1.1.14) (Monotonicity of stopping.) If σ ≤ τ, then F_σ ⊆ F_τ.

(1.1.15) (Generalized waiting lemma.) Let σ ∈ Σ be given, and for each m ∈ ℕ with {σ = m} ≠ ∅, let τ^(m) ∈ Σ be given with τ^(m) ≥ m on {σ = m}. Then τ defined by τ(ω) = τ^(m)(ω) on {σ = m} belongs to Σ and τ ≥ σ.

(1.1.16) (Weak L1.) A random variable Y is said to belong to "weak L1" if there is a constant C such that P{|Y| > λ} ≤ C/λ for all λ > 0. Prove that all L1 functions belong to weak L1, but that the converse is false.
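A standard witness for the failure of the converse can be checked numerically (a sketch, not part of the text): Y = 1/U with U uniform on (0,1) satisfies P{Y > λ} = 1/λ, so it lies in weak L1 with C = 1, while its truncated means E[Y ∧ M] = 1 + log M grow without bound, so Y is not in L1.

```python
import math
import random

# Y = 1/U, U uniform on (0,1): P{Y > lam} = 1/lam, so weak L1 with C = 1.
random.seed(1)
N = 200_000
samples = [1.0 / random.random() for _ in range(N)]

def tail(lam):
    """empirical P{Y > lam}"""
    return sum(1 for y in samples if y > lam) / N

def truncated_mean(M):
    """E[min(Y, M)] in closed form: 1 + log M, which is unbounded in M,
    so Y cannot be integrable"""
    return 1.0 + math.log(M)

print(tail(10.0), truncated_mean(1000.0))
```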
(1.1.17) (Maximal weak L1.) Suppose (X_n) is a stochastic process such that

sup_{σ∈Σ} E[|X_σ|] < ∞.

Then (by the maximal inequality 1.1.7) the corresponding maximal function sup_n |X_n| belongs to weak L1. Show that the maximal function need not belong to L1.
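As a numerical illustration (a sketch with arbitrary parameters, not part of the text), the key step (1.1.7a) of the maximal inequality can be checked by simulation: with τ the first time a random walk exceeds λ in absolute value (and τ = N otherwise), |X_τ| ≥ λ on A_N, so λ P(A_N) ≤ E[|X_τ|] pathwise and hence empirically.

```python
import random

# tau = first n <= N with |X_n| > lam, and tau = N otherwise; on
# A_N = {max_{n<=N} |X_n| > lam} we have |X_tau| >= lam pathwise,
# which gives the inequality lam * P(A_N) <= E[|X_tau|] of (1.1.7a).
random.seed(2)
N, lam, trials = 50, 5.0, 10_000
hits, abs_X_tau_sum = 0, 0.0
for _ in range(trials):
    s, stopped = 0, None
    for _ in range(N):
        s += random.choice([-1, 1])
        if stopped is None and abs(s) > lam:
            stopped = abs(s)           # |X_tau| at the first exceedance
    if stopped is not None:
        hits += 1
        abs_X_tau_sum += stopped
    else:
        abs_X_tau_sum += abs(s)        # tau = N off the set A_N

P_AN = hits / trials
E_abs_X_tau = abs_X_tau_sum / trials
print(lam * P_AN, E_abs_X_tau)
```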
(1.1.18) (Reversed stochastic basis.) A reversed stochastic basis on (Ω, F, P) is a family (F_n)_{n∈ℕ} of σ-algebras satisfying the monotonicity condition

F_1 ⊇ F_2 ⊇ F_3 ⊇ ⋯

Simple stopping times are defined as before; we will continue to write Σ for the set of all simple stopping times. (It is now a dual directed set.) Prove the maximal inequality in this setting: Let (X_n) be a sequence of random variables adapted to the reversed stochastic basis. Then for every λ > 0,

P{ sup_n |X_n| > λ } ≤ (1/λ) sup_{σ∈Σ} E[|X_σ|]

(Edgar & Sucheston [1976a]).

Remarks
Moore-Smith convergence was proposed by Moore [1915] and Moore & Smith [1922]. Its usefulness in general topology was displayed by Birkhoff [1937], and carried out in detail by Kelley [1950]. A particular case of the sequential sufficiency theorem 1.1.3 was proved by Neveu [1975]: A family indexed by a directed set converges in a complete metric space if there is convergence for all increasing sequences.
The importance of stopping times was emphasized by Doob [1953]. The maximal inequality 1.1.7 is from Chacon & Sucheston [1975].
1.2. The amart convergence theorem

Let a probability space (Ω, F, P) and a stochastic basis (F_n) be fixed. We will write, as before, Σ for the set of all simple stopping times for (F_n). If (X_n) is a stochastic process, and σ ∈ Σ, then the stopped random variable X_σ makes sense. If X_n is integrable for all n, then of course X_σ is integrable, since σ has only finitely many values.

An adapted sequence (X_n) of integrable random variables is called an amart if the net (E[X_σ])_{σ∈Σ} of real numbers converges. That is, there is a real number a with the property: for every ε > 0, there exists σ_0 ∈ Σ such that, for all σ ∈ Σ with σ ≥ σ_0, we have

|E[X_σ] − a| < ε.

Since the metric for the real line is complete, this is equivalent to a Cauchy condition: for every ε > 0, there exists σ_0 ∈ Σ such that, for all σ, τ ≥ σ_0, we have |E[X_σ] − E[X_τ]| < ε. The definition may, in turn, be phrased in terms of sequences only: the integrable stochastic process (X_n) is an amart if and only if, for every sequence σ_1 ≤ σ_2 ≤ ⋯ in Σ with σ_n → ∞, the sequence (E[X_{σ_n}])_{n∈ℕ} converges.
If the net (E[X_σ]) is constant, then of course it converges. An adapted sequence (X_n) with this property is called a martingale. Some other interesting examples of amarts are given below; see, for example, (1.4.4). Observe that in the definition of amarts it is essential to use simple stopping times: otherwise the famous original gambling martingale, in which the player doubles his stakes each time he loses, would not be an amart (see Section 3.2).
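For a martingale the net (E[X_σ]) is constant, and in small cases this can be verified exhaustively. The sketch below (illustrative choices, not from the text) enumerates all paths of a simple random walk and evaluates E[X_σ] for a simple stopping time that waits until time t on an event determined by the path up to time s, and stops at s otherwise:

```python
from itertools import product

# All 2^N equally likely paths of a simple random walk (a martingale);
# sigma waits until t on the event {X_s > 0} (known at time s) and
# stops at s otherwise.  E[X_sigma] should equal E[X_t] = 0.
N, s, t = 8, 3, 7                       # illustrative choices
total, count = 0.0, 0
for steps in product([-1, 1], repeat=N):
    X = [sum(steps[:n + 1]) for n in range(N)]
    sigma = t if X[s] > 0 else s        # {sigma = s} depends only on X_s
    total += X[sigma]
    count += 1
E_X_sigma = total / count
print(E_X_sigma)
```

The exact answer is 0 here: E[X_σ] = E[X_s] + E[(X_t − X_s) 1_{{X_s>0}}], and the increment after time s is independent of the event {X_s > 0}.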
The lattice property
The collection of all L1-bounded amarts satisfies a very useful lattice property: if (X_n) and (Y_n) are both L1-bounded amarts, then so are the pointwise maximum (X_n ∨ Y_n) and the pointwise minimum (X_n ∧ Y_n). This will be proved below (1.2.2). A similar result is true for semiamarts. A stochastic process (X_n) is a semiamart if sup_{σ∈Σ} |E[X_σ]| < ∞.
(1.2.1) Lemma. Let (X_n) be an amart. Then (X_n) is a semiamart.

Proof. The net (E[X_σ]) converges, so it is a Cauchy net. There is N ∈ ℕ such that |E[X_N] − E[X_σ]| < 1 for all σ ∈ Σ with σ ≥ N. If σ is any simple stopping time, then σ ∨ N is a simple stopping time ≥ N. But

|E[X_{σ∧N}]| ≤ E[ max_{n≤N} |X_n| ]

and |E[X_{σ∨N}] − E[X_N]| < 1, so

|E[X_σ]| = |E[X_{σ∧N}] + E[X_{σ∨N}] − E[X_N]| ≤ E[ max_{n≤N} |X_n| ] + 1 < ∞.
(1.2.2) Theorem. (1) If (X_n) is an L1-bounded semiamart, then so is (X_n^+). (2) If (X_n) is an amart, then the limit lim_{σ∈Σ} E[X_σ^+] exists in [0, ∞]. (3) If (X_n) is an L1-bounded amart, then so is (X_n^+). (4) If (X_n) and (Y_n) are L1-bounded semiamarts, then so are the processes (X_n ∨ Y_n) and (X_n ∧ Y_n). (5) If (X_n) and (Y_n) are L1-bounded amarts, then so are (X_n ∨ Y_n) and (X_n ∧ Y_n).
Proof. Given σ and τ in Σ with σ ≤ τ, by the waiting lemma (1.1.5), with A = {X_σ > 0}, the random variable ρ defined by

ρ(ω) = σ(ω)  if X_σ(ω) > 0,
ρ(ω) = τ(ω)  if X_σ(ω) ≤ 0

is a stopping time. By (1.1.5a), we have

X_ρ − X_τ = 1_A (X_σ − X_τ) ≥ X_σ^+ − X_τ^+.

Taking expectations, we have

(1.2.2a)  E[X_σ^+] − E[X_τ^+] ≤ E[X_ρ] − E[X_τ].
To prove (1), let σ ∈ Σ be arbitrary, and choose τ ≥ σ constant, τ = m. Then by (1.2.2a),

E[X_σ^+] ≤ E[X_ρ] + E[X_τ^+] − E[X_τ] = E[X_ρ] + E[X_m^−] ≤ sup_{σ∈Σ} E[X_σ] + sup_{n∈ℕ} E[|X_n|] < ∞.
To prove (2), choose σ_n ∈ Σ, σ_n → ∞, such that

E[X_{σ_n}^+] → lim sup_σ E[X_σ^+],

and then choose τ_n ∈ Σ, τ_n ≥ σ_n, such that

E[X_{τ_n}^+] → lim inf_σ E[X_σ^+].

Since (X_n) is an amart, E[X_{ρ_n}] − E[X_{τ_n}] → 0, where ρ_n is built from σ_n and τ_n as above; hence E[X_{σ_n}^+] − E[X_{τ_n}^+] → 0 by inequality (1.2.2a). Therefore lim sup E[X_σ^+] = lim inf E[X_σ^+], possibly both infinite.

For (3), combine (1) and (2). For (4), observe that x ∨ y = (x − y)^+ + y and x ∧ y = x − (x − y)^+; then apply (1). For (5), apply (3), using the same identities.
(1.2.3) Corollary. Let (X_n)_{n∈ℕ} be an L1-bounded amart. (a) If λ is a positive constant, then the truncation ((−λ) ∨ X_n ∧ λ)_{n∈ℕ} is also an amart. (b) sup_{σ∈Σ} E[|X_σ|] < ∞. (c) sup_{n∈ℕ} |X_n| < ∞ a.s.

Proof. For (a), apply "the lattice property," that is, part (5) of the theorem. For (b), observe that |x| = 2x^+ − x. For (c), apply the maximal inequality (1.1.7) using (b).
Convergence
We are ready now for the first convergence theorem. We begin with a useful observation on approximation of cluster points by stopping times.
(1.2.4) Cluster point approximation theorem. Let (X_n) be an adapted stochastic process, let F_∞ = σ(∪_{n∈ℕ} F_n), and let Y be an F_∞-measurable random variable. Suppose that Y(ω) is a cluster point of the sequence (X_n(ω)) for every ω ∈ Ω. Then there exists a sequence σ_1 ≤ σ_2 ≤ ⋯ in Σ such that σ_n → ∞ and lim_{n→∞} X_{σ_n} = Y a.s.

Proof. Given any N ∈ ℕ and ε > 0, we will construct a stopping time σ ∈ Σ with σ ≥ N and P{|X_σ − Y| < ε} > 1 − ε. This may then be applied recursively to produce an increasing sequence σ_n ∈ Σ with σ_n → ∞ such that X_{σ_n} converges to Y stochastically (that is, "in probability"). Then there is a subsequence that converges a.s.
First, since Y is F_∞-measurable, by (1.1.12) there is N′ ≥ N and an F_{N′}-measurable random variable Y′ such that

(1.2.4a)  P{|Y − Y′| < ε/2} > 1 − ε/2.

But since Y(ω) is a cluster point of (X_n(ω)),

{|Y − Y′| < ε/2} ⊆ {there exists n ≥ N′ such that |X_n − Y′| < ε/2}.

Therefore, there is an integer N″ > N′ such that P(B) > 1 − ε/2, where

B = {there exists n with N′ ≤ n ≤ N″ such that |X_n − Y′| < ε/2}.

Define the simple stopping time σ as follows:
σ(ω) = inf{ n : N′ ≤ n ≤ N″, |X_n(ω) − Y′(ω)| < ε/2 }  if ω ∈ B,
σ(ω) = N″  if ω ∉ B.

Then σ ≥ N and

(1.2.4b)  P{|X_σ − Y′| < ε/2} > 1 − ε/2.
Combining (1.2.4a) and (1.2.4b), we obtain P{|X_σ − Y| < ε} > 1 − ε.

The main theorem is now easy.
(1.2.5) Amart convergence theorem. Let (X_n)_{n∈ℕ} be an L1-bounded amart. Then (X_n) converges a.s.

Proof. First, consider the special case where sup_n |X_n| is integrable. Let X* = lim sup_n X_n and X_* = lim inf_n X_n. Both of these random variables satisfy the hypothesis of the cluster point approximation theorem
(1.2.4). So there exist simple stopping times σ_1 ≤ σ_2 ≤ ⋯ with σ_n → ∞ and X_{σ_n} → X* a.s., and simple stopping times τ_1 ≤ τ_2 ≤ ⋯ with τ_n → ∞ and X_{τ_n} → X_* a.s. Now by the defining property of an amart, lim_n E[X_{σ_n}] = lim_n E[X_{τ_n}]. Finally, by the dominated convergence theorem, E[X* − X_*] = lim_n E[X_{σ_n} − X_{τ_n}] = 0, so X* = X_* a.s.

Now consider the general L1-bounded amart (X_n). If λ > 0 is fixed, then the truncated process (−λ) ∨ X_n ∧ λ is an amart that has supremum ≤ λ, and therefore integrable. So by the first case, the truncated process converges a.s. Now the truncated process agrees with (X_n) itself on the set Ω_λ = {sup_n |X_n| < λ}. Therefore (X_n) converges a.s. on the set Ω_λ. By Corollary (1.2.3(c)), we have sup_{n∈ℕ} |X_n| < ∞ a.s., so Ω is (up to a null set) the countable union of the sets Ω_λ. Therefore (X_n) converges a.s.

Complements
(1.2.6) (Optional sampling for amarts.) Let (F_n) be a stochastic basis, and let (X_n) be an amart for that stochastic basis. Let σ_1 ≤ σ_2 ≤ ⋯ be an increasing sequence of simple stopping times for (F_n). Then the process (Y_k) defined by Y_k = X_{σ_k} is an amart for the stochastic basis (G_k) defined by G_k = F_{σ_k} (Edgar & Sucheston [1976a], p. 199).

(1.2.7) (Reversed amarts.) Let (F_n) be a reversed stochastic basis. The adapted family (X_n) of integrable random variables is an amart if the net E[X_σ] converges according to the dual directed set Σ of simple stopping times. Here is the lattice property; note that the hypothesis of L1-boundedness is not required: If (X_n) and (Y_n) are reversed amarts, then so are (X_n ∨ Y_n) and (X_n ∧ Y_n) (Edgar & Sucheston [1976a]).

(1.2.8) (Reversed convergence.) Let (X_n) be a reversed amart. Then (X_n) is uniformly integrable (Edgar & Sucheston [1976a], Theorem 2.9) and converges a.s. and in L1.
(1.2.9) (Chacon's inequality.) Let (X_n) be an L1-bounded process. Then

E[ lim sup_{n∈ℕ} X_n − lim inf_{n∈ℕ} X_n ] ≤ lim sup_{σ,τ∈Σ} E[X_σ − X_τ].

This may be proved using a variant of the argument given above for the amart convergence theorem; see Edgar & Sucheston [1976a]. It is due to Chacon [1974].
(1.2.10) (Approximation by stopping times.) Let (X_n) be an adapted integrable process. Then

E[ lim sup_{n∈ℕ} X_n ] ≤ lim sup_{τ∈Σ} E[X_τ].

(1.2.11) (Approximation of sup.) Suppose all X_n are F_1-measurable. Then

E[ sup_n |X_n| ] = sup_{σ∈Σ} E[|X_σ|].
If, in addition, (X_n) is a semiamart, then E[sup_n |X_n|] < ∞. (See Edgar & Sucheston [1976a], Proposition 2.4.)

(1.2.12) (Restriction lemma.) Let (X_n) be an amart, and let A ∈ F_m. Then the process (X_n 1_A)_{n≥m} is an amart.

(1.2.13) (Associated charge.) Let (X_n) be an amart. Then for each A ∈ ∪_n F_n, the limit

μ(A) = lim_{n→∞} E[X_n 1_A]

exists and μ is a finitely additive set function.

Remarks
The term "amart" comes from "asymptotic martingale." Theorem 1.2.4 on approximation of limit points is from Austin, Edgar, Ionescu Tulcea [1974]. The lattice property was given explicitly in Edgar & Sucheston [1976a], but implicitly already in Austin, Edgar, Ionescu Tulcea [1974].
The amart convergence theorem was stated explicitly in Austin, Edgar, IonescuTulcea [1974]. The key element of the theorem is the use of simple stopping times. Their proof used the method of "upcrossings." The proof given here, based on truncation, is from Edgar & Sucheston [1976a]. Earlier versions of this theorem and related theorems are found in Baxter [1974], Lamb [1973], Mertens [1972], Meyer [1966].
1.3. Directed processes and the Radon-Nikodym theorem

We will see many "derivation theorems" in this book. In this section we will prove the Radon-Nikodym theorem as an elementary example. For this purpose (and for its usefulness in the future), we discuss processes with index set more general than ℕ. In Chapter 4 we will discuss more thoroughly the theory of amarts, ordered amarts, and other processes, indexed by a directed set. But some basic results will be proved here.

Processes indexed by directed sets

Let (Ω, F, P) be a probability space, and let J be a directed set. The family (F_t)_{t∈J} of σ-algebras contained in F is a stochastic basis if it satisfies the monotonicity condition: F_s ⊆ F_t whenever s ≤ t in J. The family (X_t)_{t∈J} of random variables is adapted to (F_t) if, for all t ∈ J, the random variable X_t is F_t-measurable. A function σ: Ω → J (a J-valued random variable) is a simple stopping time for (F_t) if σ has finitely many values and {σ = t} ∈ F_t for all t ∈ J. (Because σ has finitely many values, this measurability condition is equivalent to: {σ ≤ t} ∈ F_t for all t.) We will write Σ((F_t)_{t∈J}) or Σ for the set of all simple stopping times for (F_t). It is a directed set itself as before. An amart for (F_t) is an adapted family (X_t) of integrable random variables such that the net (E[X_σ])_{σ∈Σ} converges.

An ordered stopping time is a simple stopping time τ such that the elements t_1, t_2, …, t_m in the range of τ are linearly ordered, say t_1 ≤ t_2 ≤ ⋯ ≤ t_m. We denote by Σ^o the set of ordered stopping times. Then Σ^o
is a directed set under ≤. An integrable stochastic process (X_t)_{t∈J} is an ordered amart if the net (E[X_τ])_{τ∈Σ^o} converges.

Clearly every amart is an ordered amart. When the directed set is J = ℕ, the notions of amart and ordered amart coincide. So each of these notions is a natural extension to directed processes of an amart indexed by ℕ. For a general index set, stochastic convergence (convergence in probability) of ordered amarts is proved below. Now we will see below (1.4.3) that L1-bounded submartingales are ordered amarts. So this convergence theorem implies the stochastic convergence of L1-bounded submartingales. On the other hand, amarts (unlike ordered amarts) converge pointwise a.e. (or essentially) under the covering condition (V) often satisfied in derivation theory (4.2.8 and 4.2.11). Thus each of the two notions (amart and ordered amart) has its applications.

Recall that a net (X_t)_{t∈J} of random variables converges stochastically (or in probability) to X_∞ if
lim_{t∈J} P{ |X_t − X_∞| > ε } = 0
for every ε > 0; and converges in mean if lim_t E[|X_t − X_∞|] = 0. Stochastic convergence is determined by a metric, for example

ρ(X, Y) = E[ |X − Y| ∧ 1 ],  or  ρ(X, Y) = E[ |X − Y| / (1 + |X − Y|) ].
(These equivalent metrics are complete, but we do not need that fact here.) Recall that a family (X_t)_{t∈J} of random variables is uniformly integrable if

lim_{λ→∞} sup_{t∈J} E[ |X_t| 1_{{|X_t|>λ}} ] = 0.
(1.3.1) Theorem. Let J be a directed set, and let (F_t)_{t∈J} be a stochastic basis. Let (X_t)_{t∈J} be an ordered amart (or amart). If (X_t) is L1-bounded, then it converges stochastically; if (X_t) is uniformly integrable, then it converges in mean.
Proof. By the sequential sufficiency theorem (1.1.3), it is enough to show that there exist s_n ∈ J such that (X_{t_n}) converges stochastically for each increasing sequence t_1 ≤ t_2 ≤ ⋯ in J with t_n ≥ s_n. Choose s_n increasing so that |E[X_{τ_1} − X_{τ_2}]| < 2^{−n} for τ_1, τ_2 ∈ Σ^o with τ_1, τ_2 ≥ s_n. Suppose t_n increases and t_n ≥ s_n. We claim that (X_{t_n}) is an amart for the stochastic basis (G_n) defined by G_n = F_{t_n}. Given ε > 0, choose N with 2^{−N} < ε; for σ ∈ Σ((G_n)) with σ ≥ N, we have t_σ ≥ t_N ≥ s_N. Also t_σ ∈ Σ^o((F_t)), since {σ = s} ∈ G_s = F_{t_s} and σ = s implies t_σ = t_s, so the sets {t_σ = t_s} are finite unions of sets in F_{t_s}, and hence belong to F_{t_s}. Then |E[X_{t_σ} − X_{t_N}]| ≤ 2^{−N} < ε. Thus the net (E[X_{t_σ}]) is
Cauchy. Thus (X_{t_n}) is an amart. It therefore converges a.s. by the amart convergence theorem (1.2.5), and thus it converges stochastically. On a uniformly integrable set, stochastic convergence coincides with convergence in mean. (See Theorem (2.3.4).) The question of pointwise convergence of (X_t) will be considered below in Chapter 4.

This amart convergence theorem will be used to prove the Radon-Nikodym theorem. Notice that the Radon-Nikodym theorem has not been used up to this point; we have not even mentioned conditional expectation.

Let (Ω, F, P) be a probability space. A set function μ: F → ℝ is countably additive if, for any sequence (A_n) ⊆ F of pairwise disjoint sets,
μ( ∪_{n=1}^∞ A_n ) = ∑_{n=1}^∞ μ(A_n).
We say that μ is absolutely continuous (with respect to P) if μ(A) = 0 whenever P(A) = 0. The variation of μ on A ∈ F is

|μ|(A) = sup ∑_{i=1}^n |μ(C_i)|,

where the supremum is taken over all finite pairwise disjoint sequences (C_i)_{i=1}^n ⊆ F with C_i ⊆ A.
Our proof uses some facts about signed measures. They are proved in the Complements (below) for completeness, and so that it can be verified that they do not use the Radon-Nikodym theorem. The facts are:

(1) Let μ: F → ℝ be a countably additive set function. If A_n ∈ F and A_1 ⊇ A_2 ⊇ ⋯, then μ(∩ A_n) = lim μ(A_n). If A_1 ⊆ A_2 ⊆ ⋯, then μ(∪ A_n) = lim μ(A_n) (1.3.4).

(2) A real-valued countably additive set function μ has finite variation (1.3.5).

(3) Let μ be a countably additive set function that is absolutely continuous with respect to P. Then for every ε > 0, there exists δ > 0 such that if A ∈ F and P(A) < δ, then |μ(A)| < ε (1.3.6).

(1.3.2) Radon-Nikodym theorem. Let (Ω, F, P) be a probability space, and suppose μ: F → ℝ is countably additive and absolutely continuous. Then there exists an integrable random variable Y such that

μ(A) = E[Y 1_A]  for all A ∈ F.
Proof. A measurable partition of Ω is a finite subset t ⊆ F, consisting of sets A with P(A) > 0, pairwise disjoint [if A_1, A_2 ∈ t, and A_1 ≠ A_2, then
A_1 ∩ A_2 = ∅], and having union Ω. Write J for the set of all measurable partitions. If s, t ∈ J, then we say that t essentially refines s, and write t ≥ s, if for each B ∈ t there exists some A ∈ s with B ⊆ A a.s. With this ordering, J is a directed set. If t ∈ J, let F_t be the σ-algebra generated by t; thus F_t is a finite algebra and t is the set of atoms of F_t. Clearly, if s ≤ t in J, then F_s ⊆ F_t. Next, for t ∈ J, define a random variable
X_t = ∑_{A∈t} ( μ(A) / P(A) ) 1_A.
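On a finite sample space this process is easy to compute explicitly. The sketch below (a hypothetical 6-point example, not from the text) builds X_t for a partition and a refinement of it, and checks that E[X_t] = μ(Ω) for each:

```python
# Hypothetical 6-point space; Y plays the role of a density and mu = Y.P.
omega = list(range(6))
P  = {w: 1 / 6 for w in omega}                 # uniform probability
Y  = {w: float(w) for w in omega}
mu = {w: Y[w] * P[w] for w in omega}           # mu of each singleton

def X_t(partition):
    """X_t = sum over cells A of (mu(A)/P(A)) 1_A, as a function on omega"""
    out = {}
    for A in partition:
        ratio = sum(mu[w] for w in A) / sum(P[w] for w in A)
        for w in A:
            out[w] = ratio
    return out

coarse = [{0, 1, 2}, {3, 4, 5}]
fine   = [{0, 1}, {2}, {3}, {4, 5}]            # essentially refines `coarse`
for t in (coarse, fine):
    Xt = X_t(t)
    E_Xt = sum(Xt[w] * P[w] for w in omega)
    print(E_Xt)                                # = mu(Omega) for every t
```

Refining the partition sharpens X_t toward the density Y, which is the idea behind the convergence argument that follows.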
First we claim that (X_t) is an amart. [In fact it is a martingale: that is, E[X_σ] is constant.] Suppose σ: Ω → J is a simple stopping time. Then {σ = t} ∈ F_t for each t, so {σ = t} is a union of sets A ∈ t. So

X_σ = ∑_t X_t 1_{{σ=t}} = ∑_t ∑_{A∈t} ( μ(A)/P(A) ) 1_A 1_{{σ=t}} = ∑_t ∑_{A∈t, A⊆{σ=t}} ( μ(A)/P(A) ) 1_A.

Thus

E[X_σ] = ∑_t ∑_{A∈t, A⊆{σ=t}} μ(A) = ∑_t μ( ∪_{A∈t, A⊆{σ=t}} A ) = ∑_t μ{σ = t} = μ(Ω).
This shows that the net (E[X_σ]) is constant, and therefore convergent. So (X_t) is an amart.

Next we must show that (X_t) is L1-bounded. Let t ∈ J. Then

E[|X_t|] = ∑_{A∈t} ( |μ(A)| / P(A) ) P(A) = ∑_{A∈t} |μ(A)| ≤ |μ|(Ω).
But μ has finite variation, by (1.3.5), so (X_t) is L1-bounded.

We claim next that (X_t)_{t∈J} is uniformly integrable. Let ε > 0. Then, since μ is absolutely continuous, by (1.3.6) there exists δ > 0 such that |μ(A)| < ε whenever P(A) < δ. Let λ = |μ|(Ω)/δ. For all t ∈ J,

E[ X_t 1_{{X_t>λ}} ] = ∑_{A∈t, μ(A)>λP(A)} μ(A) = μ( ∪_{A∈t, μ(A)>λP(A)} A ) < ε,
since

P( ∪_{A∈t, μ(A)>λP(A)} A ) = ∑_{A∈t, μ(A)>λP(A)} P(A) < (1/λ) ∑_{A∈t, μ(A)>λP(A)} μ(A) ≤ |μ|(Ω)/λ = δ.
Similarly E[ |X_t| 1_{{X_t<−λ}} ] < ε. So (X_t) is uniformly integrable.

Now (X_t) is an L1-bounded amart that is uniformly integrable. Therefore (X_t) converges in mean, say to Y ∈ L1(Ω, F, P). If A_0 ∈ F, let t_0 = {A_0, Ω \ A_0} ∈ J, and note that if t ≥ t_0, then

E[X_t 1_{A_0}] = ∑_{A∈t, A⊆A_0} μ(A) = μ(A_0).

Thus

|E[Y 1_{A_0}] − μ(A_0)| ≤ lim |E[Y 1_{A_0}] − E[X_t 1_{A_0}]| + lim |E[X_t 1_{A_0}] − μ(A_0)|
= 0, so μ(A_0) = E[Y 1_{A_0}] for all A_0 ∈ F. The random variable Y is called the Radon-Nikodym derivative of μ with respect to P; we write Y = dμ/dP or μ = Y · P.

Once this basic Radon-Nikodym theorem has been proved, many variations can be done as well. For example, P can be replaced by an infinite measure, μ can take values in the complex numbers, or in a finite-dimensional vector space ℝ^n. These will not be proved here. We will discuss measures μ with values in a Banach space in Chapter 5.

Complements
(1.3.3) (Reversed amarts.) Let J be a dual directed set. The family (F_t)_{t∈J} of sub-σ-algebras of F is a reversed stochastic basis iff s ≤ t implies F_s ⊆ F_t. Let (X_t)_{t∈J} be a reversed amart; that is: E[X_σ] converges for σ in the dual directed set Σ((F_t)). Then (X_t) converges stochastically and in mean. As usual, L1-boundedness need not be postulated (Edgar & Sucheston [1976a]).

(1.3.4) (Countable additivity.) Let μ: F → ℝ be a countably additive set function. If A_n ∈ F and A_1 ⊇ A_2 ⊇ ⋯, then μ(∩ A_n) = lim μ(A_n). If A_1 ⊆ A_2 ⊆ ⋯, then μ(∪ A_n) = lim μ(A_n).
To prove the statement for an increasing sequence, let B_1 = A_1 and B_n = A_n \ A_{n−1} for n ≥ 2. Then the sets B_n are pairwise disjoint, and

μ( ∪_{n=1}^∞ A_n ) = μ( ∪_{n=1}^∞ B_n ) = ∑_{n=1}^∞ μ(B_n) = lim_m ∑_{k=1}^m μ(B_k) = lim_m μ(A_m).
For a decreasing sequence, apply this to the complements Ω \ A_n.

(1.3.5) (Variation.) A real-valued, countably additive set function μ has finite variation. First, we claim that sup_{A∈F} μ(A) < ∞. If not, then there exist sets A_n ∈ F with μ(A_{n+1}) > μ( ∪_{k=1}^n A_k ) + 1. Then

μ( ∪_{n=1}^∞ A_n ) = μ(A_1) + ∑_{n=1}^∞ μ( A_{n+1} \ ∪_{k=1}^n A_k ) = ∞,

contradicting the fact that μ has real values. Similarly, inf_{A∈F} μ(A) > −∞. Now if (C_i)_{i=1}^n are pairwise disjoint sets, write
I_+ = { i : μ(C_i) ≥ 0 },  I_− = { i : μ(C_i) < 0 }.

Then

∑_{i=1}^n |μ(C_i)| = ∑_{i∈I_+} μ(C_i) − ∑_{i∈I_−} μ(C_i) = μ( ∪_{i∈I_+} C_i ) − μ( ∪_{i∈I_−} C_i ) ≤ sup_{A∈F} μ(A) − inf_{A∈F} μ(A) < ∞.

Thus |μ|(Ω) < ∞.
(1.3.6) (Absolute continuity.) Let μ be a countably additive set function that is absolutely continuous with respect to P. Then for every ε > 0, there exists δ > 0 such that if A ∈ F and P(A) < δ, then |μ(A)| < ε. Indeed, suppose the conclusion is false. Then there would exist ε > 0 and sets A_n ∈ F with P(A_n) < 2^{−n} but |μ(A_n)| ≥ ε. But then the set A = ∩_{n=1}^∞ ∪_{k=n}^∞ A_k satisfies

P(A) = lim_n P( ∪_{k=n}^∞ A_k ) ≤ lim_n ∑_{k=n}^∞ P(A_k) = 0,

but

|μ|(A) = lim_n |μ|( ∪_{k=n}^∞ A_k ) ≥ lim sup_n |μ(A_n)| ≥ ε.
Then there exists B ⊆ A with P(B) = 0 and μ(B) ≠ 0. This contradicts absolute continuity.

(1.3.7) (Hahn decomposition.) We may use the Radon-Nikodym theorem to prove the Hahn decomposition theorem: Let μ: F → ℝ be countably additive. Then there exist measurable sets P and N with P ∪ N = Ω and P ∩ N = ∅ such that μ(A) ≥ 0 for all A ⊆ P and μ(A) ≤ 0 for all A ⊆ N. Indeed, let Q be a probability measure equivalent with |μ|; let X be the Radon-Nikodym derivative dμ/dQ, and set P = {X ≥ 0}, N = {X < 0}.

Remarks
Stochastic convergence of amarts indexed by directed sets comes from Edgar & Sucheston [1976a]. Ordered amarts are from Millet & Sucheston [1980b]. The process

X_t = ∑_{A∈t} ( μ(A) / P(A) ) 1_A

is a famous martingale, appearing in Doob [1953], page 343, and earlier in Danish and other papers.
1.4. Conditional expectations

A special case of the Radon-Nikodym derivative, important for martingale theory, is the conditional expectation. We will deal with conditional expectation in a probability space now. Later (Section 2.3) we will discuss conditional expectation in an infinite measure space. An intuitive way to think of the conditional expectation E^G[X] is: the expected value of X given the information of the σ-algebra G.

Definition and basic properties
Let (Ω, F, P) be a probability space, and let G ⊆ F be a σ-algebra. Suppose X is an integrable random variable, that is X ∈ L1(Ω, F, P). A conditional expectation of X given G is a random variable Y, integrable and measurable with respect to G, Y ∈ L1(Ω, G, P), such that

E[Y 1_A] = E[X 1_A]

for all A ∈ G.

The existence of conditional expectations is a simple consequence of the Radon-Nikodym theorem: on the σ-algebra G, the set function μ(A) = E[X 1_A] is absolutely continuous with respect to the restriction P|G of P. We define E^G X as the Radon-Nikodym derivative of μ with respect to P on G. It is also clear that Y is unique up to sets of measure zero, so we often speak of the conditional expectation of X given G, and write Y = E^G X or E[X | G].
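When G is generated by a finite partition, the Radon-Nikodym construction reduces to averaging over the atoms. The sketch below (a hypothetical finite example, not from the text) computes E[X | G] this way and verifies the defining identity E[Y 1_A] = E[X 1_A] on the atoms:

```python
# Hypothetical finite example: G is generated by the partition below.
omega = list(range(6))
P = {0: 0.1, 1: 0.2, 2: 0.2, 3: 0.1, 4: 0.3, 5: 0.1}
X = {0: 1.0, 1: 4.0, 2: 2.0, 3: 0.0, 4: 3.0, 5: 5.0}
partition = [{0, 1}, {2, 3}, {4, 5}]           # atoms of G

Y = {}                                          # Y = E[X | G]
for B in partition:
    avg = sum(X[w] * P[w] for w in B) / sum(P[w] for w in B)
    for w in B:
        Y[w] = avg                              # constant on each atom

def integral(f, A):
    """E[f 1_A]"""
    return sum(f[w] * P[w] for w in A)

for B in partition:                             # defining identity on atoms
    print(integral(Y, B), integral(X, B))
```

Since every A ∈ G is a union of atoms, checking the identity on the atoms checks it on all of G, by additivity.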
The properties of the conditional expectation are first those of the expectation (the Lebesgue integral), which is E^N for the trivial σ-algebra N = {Ω, ∅}: linearity, positivity, contraction property with respect to
the L_p norm. The proofs are similar (or use the properties of the ordinary expectation to shorten the arguments). Thus Lebesgue's monotone and dominated convergence theorems hold for conditional expectation: lim_n X_n = X a.s. implies lim_n E^G[X_n] = E^G[X] a.s., assuming either each X_n integrable and (X_n) monotone, or sup_n |X_n| integrable. However, lim X_n = X a.s. does not necessarily imply lim E^G[X_n] = E^G[X] a.s. if (X_n) is only assumed to be uniformly integrable (1.4.15). Special properties of the conditional expectation are the following:
(1.4.1) Proposition. (a) Factorization: If X is measurable with respect to G, then E^G[XY] = X E^G[Y], provided both sides exist. (b) The smaller algebra prevails: if G_1 and G_2 are σ-algebras with G_1 ⊆ G_2 and X ∈ L1, then

E^{G_1}[ E^{G_2}[X] ] = E^{G_2}[ E^{G_1}[X] ] = E^{G_1}[X].

Proof. (a) Since both sides are measurable with respect to G, it suffices to show that the right side satisfies the definition of the left side, that is: for each A ∈ G,

E[XY 1_A] = E[ (X E^G[Y]) 1_A ].

If X = 1_B for some B ∈ G, then this is easy: both sides reduce to E[Y 1_{A∩B}]. For a general X ∈ L1, use linearity and approximation by simple functions. (b) Since E^{G_1}[X] is measurable with respect to G_2, the second equality follows. If A ∈ G_1, then E[1_A E^{G_2}[X]] = E[1_A X] = E[1_A E^{G_1}[X]], so E^{G_1}[E^{G_2}[X]] = E^{G_1}[X].
One last property deals with the stopped σ-algebra F_σ.

(1.4.2) Localization theorem. Let (Ω, F, P) be a probability space, let J be a directed set, let (F_t)_{t∈J} be a stochastic basis, and let X ∈ L1. If Y_t = E^{F_t}[X] for all t ∈ J, then Y_σ = E^{F_σ}[X] for all σ ∈ Σ.

Proof. The assertion may be stated in another way: For each t, we have E^{F_σ}[X] = E^{F_t}[X] on the set {σ = t}. For the proof, write {t_1, …, t_n} for the set of values of σ, and let

Z = ∑_i E^{F_{t_i}}[X] 1_{{σ=t_i}}.

We claim that Z = E^{F_σ}[X]. First we must show that Z is F_σ-measurable. Now if λ ∈ ℝ, then for all t_i we have

{Z > λ} ∩ {σ = t_i} = { E^{F_{t_i}}[X] > λ } ∩ {σ = t_i} ∈ F_{t_i}.
Therefore {Z > λ} ∈ F_σ; this shows that Z is F_σ-measurable. Next, if A ∈ F_σ, we must show E[X 1_A] = E[Z 1_A]. But we have A ∩ {σ = t_i} ∈ F_{t_i} for all t_i, so

E[X 1_A] = ∑_i E[ X 1_{A∩{σ=t_i}} ] = ∑_i E[ E^{F_{t_i}}[X] 1_{A∩{σ=t_i}} ] = E[Z 1_A].
This completes the proof.

Martingales and related processes
There are special classes of amarts that are naturally defined in terms of the conditional expectation. Now that conditional expectations have been discussed, we may consider these classes. The most important is the martingale.

Let (Ω, F, P) be a probability space, let J be a directed set, let (F_t)_{t∈J} be a stochastic basis, and let (X_t)_{t∈J} be adapted and integrable. Then we say (X_t) is a martingale if E^{F_s}[X_t] = X_s for s ≤ t; we say (X_t) is a submartingale if E^{F_s}[X_t] ≥ X_s for s ≤ t; we say (X_t) is a supermartingale if E^{F_s}[X_t] ≤ X_s for s ≤ t. Note the use of ordered stopping times in the characterizations that follow.
(1.4.3) Theorem. Let (X_t) be an adapted and integrable process. Then:
(i) (X_t) is a submartingale if and only if the net (E[X_σ])_{σ∈Σ^o} is increasing [in the sense that E[X_σ] ≤ E[X_τ] for σ ≤ τ] if and only if the process (X_τ)_{τ∈Σ^o} is a submartingale [that is, E^{F_σ}[X_τ] ≥ X_σ for σ, τ ∈ Σ^o, σ ≤ τ].
(ii) (X_t) is a supermartingale if and only if the net (E[X_σ])_{σ∈Σ^o} is decreasing if and only if the process (X_τ)_{τ∈Σ^o} is a supermartingale.
(iii) (X_t) is a martingale if and only if the net (E[X_τ])_{τ∈Σ^o} is constant if and only if the net (E[X_τ])_{τ∈Σ} is constant.

Proof. (i) Let (X_t) be a submartingale. We first prove that E^{F_σ}[X_τ] ≥ X_σ for σ, τ ∈ Σ^o, σ ≤ τ. By the localization theorem 1.4.2, it suffices to show that E^{F_s}[X_τ] ≥ X_s on the set {σ = s}. Suppose that τ takes values t_1 < t_2 < ⋯ < t_m. For a fixed value s of σ, we use induction on the index n between 1 and m defined by t_n = max{ τ(ω) : σ(ω) = s }. Since σ ≤ τ,
certainly t_n ≥ s. For t_n = s, we have τ = s on {σ = s}, and therefore E^{F_s}[X_τ] = X_s on {σ = s}. For the inductive step, suppose t_n > s, and define τ′ ∈ Σ^o by

τ′(ω) = τ(ω)     if σ(ω) = s and τ(ω) < t_n,
τ′(ω) = t_{n−1}  if σ(ω) = s and τ(ω) = t_n,
τ′(ω) = τ(ω)     if σ(ω) ≠ s.
Stopping times
22
Then τ ≥ τ′ ≥ σ. By the induction hypothesis, E^{F_s}[X_{τ′}] ≥ X_s on {σ = s}. Also, {τ′ < τ} = {σ = s} ∩ {τ = t_n}, a set belonging to F_{t_{n−1}}. Now on {σ = s},

E^{F_{t_{n−1}}}[X_τ] = E^{F_{t_{n−1}}}[ X_{τ′} + (X_{t_n} − X_{t_{n−1}}) 1_{{τ′<τ}} ] ≥ E^{F_{t_{n−1}}}[X_{τ′}] + 0 = X_{τ′}.

Hence, still supposing t_n > s, on {σ = s} we have

E^{F_s}[X_τ] = E^{F_s}[ E^{F_{t_{n−1}}}[X_τ] ] ≥ E^{F_s}[X_{τ′}] ≥ X_s.

This shows E^{F_σ}[X_τ] ≥ X_σ and completes the proof of E^{F_σ}[X_τ] ≥ X_σ for σ, τ ∈ Σ^o, σ ≤ τ. On integrating, we obtain E[X_τ] = E[ E^{F_σ}[X_τ] ] ≥ E[X_σ] for σ ≤ τ.

Next suppose that the net (E[X_τ])_{τ∈Σ^o} is increasing. We must show


that E^{F_s}[X_t] ≥ X_s if s ≤ t. Let A ∈ F_s. Then

σ(ω) = t  if ω ∈ A,
σ(ω) = s  if ω ∉ A

is an ordered simple stopping time, and σ ≥ s. Therefore E[X_σ] ≥ E[X_s]. That is:

E[X_t 1_A] + E[X_s 1_{Ω\A}] ≥ E[X_s 1_A] + E[X_s 1_{Ω\A}].

Therefore E[X_t 1_A] ≥ E[X_s 1_A]. This is true for all A ∈ F_s, so E^{F_s}[X_t] ≥ X_s,
as required. (ii) Since $(X_t)$ is a submartingale if and only if $(-X_t)$ is a supermartingale, (ii) follows from (i).
(iii) Suppose $(X_t)$ is a martingale. Let $\sigma \in \Sigma$ be given. Choose $t \ge \sigma$. Then
$$E[X_\sigma] = \sum_s E[X_s\,1_{\{\sigma = s\}}] = \sum_s E[X_t\,1_{\{\sigma = s\}}] = E[X_t].$$
We can see that this value is independent of $t$ by integrating the equation $E^{\mathcal F_s}[X_t] = X_s$; so in fact $(E[X_\sigma])_{\sigma\in\Sigma}$ is independent of $\sigma$.
1.4. Conditional expectations
If $(E[X_\sigma])_{\sigma\in\Sigma}$ is constant, then certainly $(E[X_\sigma])_{\sigma\in\Sigma^\circ}$ is also constant. Finally, if $(E[X_\sigma])_{\sigma\in\Sigma^\circ}$ is constant, then it is both increasing and decreasing, so by (i) and (ii) the process $(X_t)$ is both a submartingale and a supermartingale, therefore a martingale.
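Theorem (1.4.3)(i) lends itself to a finite sanity check. The sketch below (our illustration, not from the text) enumerates all eight paths of a three-step symmetric random walk $S_n$, takes the submartingale $X_n = S_n^2$, and compares $E[X_\sigma]$ with $E[X_\tau]$ for the ordered pair $\sigma \equiv 1 \le \tau$, where $\tau$ is the first time $|S_n| = 2$, capped at $3$.

```python
from itertools import product

# All 8 equally likely paths of a 3-step symmetric +/-1 random walk.
paths = list(product([-1, 1], repeat=3))

def S(path, n):                 # partial sum S_n
    return sum(path[:n])

def X(path, n):                 # X_n = S_n^2 is a submartingale
    return S(path, n) ** 2

def sigma(path):                # constant stopping time sigma = 1
    return 1

def tau(path):                  # first n with |S_n| = 2, capped at 3 (a stopping time)
    for n in range(1, 4):
        if abs(S(path, n)) == 2:
            return n
    return 3

E_X_sigma = sum(X(p, sigma(p)) for p in paths) / len(paths)
E_X_tau = sum(X(p, tau(p)) for p in paths) / len(paths)
print(E_X_sigma, E_X_tau)       # 1.0 and 2.5
```

Here $E[X_\sigma] = 1 \le E[X_\tau] = 2.5$, as the increasing-net characterization predicts.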
Let $(X_t)_{t\in J}$ be an integrable process. We say that $(X_t)$ is a quasimartingale if there is a constant $M$ such that
$$\sum_{i=1}^{n-1} E\big[\,\big|E[X_{t_{i+1}} \mid \mathcal F_{t_i}] - X_{t_i}\big|\,\big] \le M$$
for all finite increasing sequences $t_1 < t_2 < \cdots < t_n$ in $J$.

(1.4.4) Theorem. Every quasimartingale $(X_t)$ is an ordered amart.
Proof. Let $(X_t)$ be a quasimartingale:
$$(1.4.4\mathrm a)\qquad M = \sup \sum_{i=1}^{n-1} E\big[\,\big|E[X_{t_{i+1}} \mid \mathcal F_{t_i}] - X_{t_i}\big|\,\big] < \infty,$$
where the sup is over all finite sequences $t_1 < \cdots < t_n$. We will show that $(E[X_\tau])_{\tau\in\Sigma^\circ}$ is a Cauchy net. Let $\varepsilon > 0$ be given. Choose $s_1 < s_2 < \cdots < s_m$ so that
$$(1.4.4\mathrm b)\qquad \sum_{i=1}^{m-1} E\big[\,\big|E[X_{s_{i+1}} \mid \mathcal F_{s_i}] - X_{s_i}\big|\,\big] > M - \varepsilon.$$
Now let $\tau \in \Sigma^\circ$ with $\tau \ge s_m$. Write $t_1 < t_2 < \cdots < t_n$ for the set of values of $\tau$. Now if we apply the inequality (1.4.4a) to the combined sequence $s_1 < \cdots < s_m \le t_1 < \cdots < t_n$ and subtract (1.4.4b), we obtain
$$\sum_{j=1}^{n-1} E\big[\,\big|X_{t_j} - E[X_{t_{j+1}} \mid \mathcal F_{t_j}]\big|\,\big] < \varepsilon.$$
Now for $t_i \le t_j$ we have $\{\tau = t_i\} \in \mathcal F_{t_j}$, so we have
$$\big|E[X_\tau] - E[X_{t_n}]\big| = \Big|\sum_{i=1}^{n-1} E\big[(X_{t_i} - X_{t_n})\,1_{\{\tau = t_i\}}\big]\Big|
= \Big|\sum_{i=1}^{n-1}\sum_{j=i}^{n-1} E\big[(X_{t_j} - X_{t_{j+1}})\,1_{\{\tau = t_i\}}\big]\Big|$$
$$= \Big|\sum_{j=1}^{n-1}\sum_{i=1}^{j} E\big[\big(X_{t_j} - E[X_{t_{j+1}} \mid \mathcal F_{t_j}]\big)\,1_{\{\tau = t_i\}}\big]\Big|
\le \sum_{j=1}^{n-1}\sum_{i=1}^{j} E\big[\,\big|X_{t_j} - E[X_{t_{j+1}} \mid \mathcal F_{t_j}]\big|\,1_{\{\tau = t_i\}}\big]$$
$$\le \sum_{j=1}^{n-1} E\big[\,\big|X_{t_j} - E[X_{t_{j+1}} \mid \mathcal F_{t_j}]\big|\,\big] < \varepsilon.$$
So if $\tau_1, \tau_2 \ge s_m$, we have $|E[X_{\tau_1}] - E[X_{\tau_2}]| \le 2\varepsilon$. Therefore $(X_t)$ is an ordered amart.

Riesz decomposition

The Riesz decomposition (1.4.6) for an amart or an ordered amart shows that the process is "close" to a martingale. First consider an alternative ordering for the set $\Sigma^\circ$ of ordered stopping times. If $\sigma, \tau \in \Sigma^\circ$, we write $\sigma \ll \tau$ iff there is $t \in J$ with $\sigma \le t$ and $t \le \tau$. Note that $\Sigma^\circ$ is directed under the order $\ll$ as well as under the usual order $\le$. For convergence of nets, the difference does not matter. Indeed, $\sigma \ll \tau$ implies $\sigma \le \tau$, so if a net $(a_\tau)$ converges according to $\le$, then it trivially converges according to $\ll$. Conversely, suppose $(a_\tau)$ converges to $a$ according to $\ll$. Then given $\varepsilon > 0$ there is $\sigma \in \Sigma^\circ$ such that $|a_\tau - a| < \varepsilon$ for all $\tau \in \Sigma^\circ$ with $\sigma \ll \tau$. But there is $t \in J$ with $t \ge \sigma$, and therefore $|a_\tau - a| < \varepsilon$ for all $\tau \ge t$. Thus $a_\tau$ converges to $a$ according to $\le$. (See also (1.1.9).)
When we take a limit over pairs of stopping times, it does make a difference whether we restrict to pairs $\sigma, \tau$ with $\sigma \le \tau$ or to pairs with $\sigma \ll \tau$. The ordering $\ll$ appears in the following useful characterizations of amarts and ordered amarts.

(1.4.5) Difference property. Let $J$ be a directed set, and let $(\mathcal F_t)_{t\in J}$ be a stochastic basis. Let $(X_t)_{t\in J}$ be an adapted integrable process.
(i) $(X_t)$ is an ordered amart if and only if
$$(1.4.5\mathrm a)\qquad \lim_{\substack{\sigma \ll \tau \\ \sigma,\tau\in\Sigma^\circ}} \big\|E^{\mathcal F_\sigma}[X_\tau] - X_\sigma\big\|_1 = 0.$$
(ii) $(X_t)$ is an amart if and only if
$$(1.4.5\mathrm b)\qquad \lim_{\substack{\sigma \ll \tau \\ \sigma,\tau\in\Sigma}} \big\|E^{\mathcal F_\sigma}[X_\tau] - X_\sigma\big\|_1 = 0.$$
Proof. (i) Suppose that the difference property (1.4.5a) holds. Then for $\sigma \ll \tau$,
$$\big|E[X_\tau] - E[X_\sigma]\big| \le \big\|E^{\mathcal F_\sigma}[X_\tau] - X_\sigma\big\|_1 \to 0,$$
so $(X_t)$ is an ordered amart. Conversely, let $\varepsilon > 0$; choose $s \in J$ such that $\sigma \ge s$, $\tau \ge s$ and $\sigma, \tau \in \Sigma^\circ$ implies $|E[X_\sigma] - E[X_\tau]| < \varepsilon$.
Let $s \le \sigma \ll \tau$; for any $A \in \mathcal F_\sigma$ define $\rho = \sigma$ on $A$, and $\rho = \tau$ on $\Omega \setminus A$. Then $\rho \in \Sigma^\circ$. Furthermore
$$E\big[1_A\big(X_\sigma - E^{\mathcal F_\sigma}[X_\tau]\big)\big] = E[X_\rho] - E[X_\tau] < \varepsilon.$$
Now take $A = \{X_\sigma > E^{\mathcal F_\sigma}[X_\tau]\}$ to see that
$$E\big[\big(X_\sigma - E^{\mathcal F_\sigma}[X_\tau]\big)^+\big] < \varepsilon.$$
Similarly
$$E\big[\big(X_\sigma - E^{\mathcal F_\sigma}[X_\tau]\big)^-\big] < \varepsilon,$$
so $\|X_\sigma - E^{\mathcal F_\sigma}[X_\tau]\|_1 < 2\varepsilon$. This proves the difference property (1.4.5a).
(ii) This proof is the same: substitute "amart" for "ordered amart," $\Sigma$ for $\Sigma^\circ$, and $\le$ for $\ll$.

(1.4.6) Riesz decomposition. Let $J$ be a directed set, and let $(\mathcal F_t)_{t\in J}$ be a stochastic basis. (i) Let $(X_t)$ be an ordered amart. Then $X_t$ can be uniquely written as $X_t = Y_t + Z_t$, where $(Y_t)$ is a martingale and $(Z_\tau)_{\tau\in\Sigma^\circ}$ converges to $0$ in mean. (ii) Let $(X_t)$ be an amart. Then $X_t$ can be uniquely written as $X_t = Y_t + Z_t$, where $(Y_t)$ is a martingale and $(Z_\tau)_{\tau\in\Sigma}$ converges to $0$ in mean.
Proof. (i) For any $\sigma \in \Sigma^\circ$, the net $(E^{\mathcal F_\sigma}[X_\tau])_{\tau\in\Sigma^\circ}$ is Cauchy in $L_1$-norm by the difference property. Write $Y_\sigma$ for the limit. Conditional expectations are continuous for $L_1$-norm, so if $\sigma_1 \le \sigma_2$ we have
$$E^{\mathcal F_{\sigma_1}}[Y_{\sigma_2}] = \lim_\tau E^{\mathcal F_{\sigma_1}}\big[E^{\mathcal F_{\sigma_2}}[X_\tau]\big] = \lim_\tau E^{\mathcal F_{\sigma_1}}[X_\tau] = Y_{\sigma_1}.$$
So $(Y_t)$ is a martingale. The difference $Z_t = X_t - Y_t$ is an ordered amart, and by the difference property $\|Z_\tau\|_1 \to 0$. For uniqueness, assume there is another decomposition $X_t = \tilde Y_t + \tilde Z_t$. For $A \in \mathcal F_t$, we have
$$E[1_A Y_t] = \lim_\tau E[1_A Y_\tau] = \lim_\tau E[1_A X_\tau] = \lim_\tau E[1_A \tilde Y_\tau] = E[1_A \tilde Y_t].$$
Hence $Y_t = \tilde Y_t$.
(ii) This proof is the same: substitute "amart" for "ordered amart," $\Sigma$ for $\Sigma^\circ$, and $\le$ for $\ll$.
The sequential case
The most important setting of the theory is of course the case $J = \mathbb N$. Then a submartingale is defined by $X_m \le E^{\mathcal F_m}[X_n]$ for $m \le n$, a supermartingale by $X_m \ge E^{\mathcal F_m}[X_n]$ for $m \le n$, and a martingale by $X_m = E^{\mathcal F_m}[X_n]$ for $m \le n$. It is easy to see by induction that these properties need to be checked only for $n = m + 1$. A quasimartingale is an integrable process $(X_n)$ for which there is a constant $C$ such that
$$\sum_{i=1}^{n} \big\|E^{\mathcal F_i}[X_{i+1}] - X_i\big\|_1 \le C \quad\text{for all } n.$$
Equivalently (1.4.30), there is a constant $C$ such that
$$\sum_{i=1}^{m-1} \big|E[X_{\tau_{i+1}}] - E[X_{\tau_i}]\big| \le C$$
for any sequence $\tau_1 < \tau_2 < \cdots < \tau_m$ of simple stopping times. When $J = \mathbb N$, all stopping times are ordered. So Theorems (1.4.3) and (1.2.5) lead to:
(1.4.7) Theorem. Let $(X_n)_{n\in\mathbb N}$ be an $L_1$-bounded supermartingale, submartingale, quasimartingale, or martingale. Then $(X_n)$ is an amart, and converges a.s. If $(X_n)$ is also uniformly integrable, then it converges in mean.

Complements
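A small exact computation may make the sequential quasimartingale condition concrete (the process here is invented for the illustration). For $X_n = S_n + n/2$, where $S_n$ is a symmetric $\pm 1$ walk, every conditional increment $E^{\mathcal F_n}[X_{n+1}] - X_n$ equals $1/2$, so the sum over the first three steps should be $3/2$; the sketch verifies this atom by atom over the cylinder sets of $\mathcal F_n$.

```python
from itertools import product

N = 4
paths = list(product([-1, 1], repeat=N))

def X(path, n):                          # X_n = S_n + n/2, a quasimartingale
    return sum(path[:n]) + 0.5 * n

def cond_exp_next(prefix, n):
    # E[X_{n+1} | F_n] on the cylinder set fixed by the first n coordinates
    vals = [X(p, n + 1) for p in paths if p[:n] == prefix]
    return sum(vals) / len(vals)

variation = 0.0
for n in range(N - 1):
    total = 0.0
    for prefix in product([-1, 1], repeat=n):
        p_atom = 2 ** (N - n) / 2 ** N   # probability of the cylinder set
        total += p_atom * abs(cond_exp_next(prefix, n) - X(prefix + (0,) * (N - n), n))
    variation += total                   # adds E[ |E[X_{n+1}|F_n] - X_n| ]
print(variation)                         # 0.5 per step over 3 steps: 1.5
```

A martingale (drift $0$) would give variation $0$; any bounded drift gives a finite constant $C$.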
(1.4.8) (Atomic example.) An atom in a $\sigma$-algebra $\mathcal G$ is a set $A \in \mathcal G$ such that if $B \in \mathcal G$ and $B \subseteq A$, then either $P(B) = 0$ or $P(A \setminus B) = 0$. If $A$ is an atom of $\mathcal G$ with $P(A) > 0$, then $E^{\mathcal G}[X]$ has the value $(1/P(A))\,E[X\,1_A]$ on $A$.
(1.4.9) (Nonlocalization.) The localization of $E^{\mathcal G}[X]$ (1.4.2) works only in the variable $\mathcal G$, not in the variable $X$. That is, if $Y_n = E^{\mathcal G}[X_n]$ for all $n$, then it does not follow that $Y_\sigma = E^{\mathcal G}[X_\sigma]$. For example, let $\mathcal G = \{\emptyset, \Omega\}$, let $A \in \mathcal F_1$ satisfy $P(A) = 1/2$, let $\sigma = 1$ on $A$ and $2$ otherwise, and let $X_1 = 1_A$, $X_2 = 2 \cdot 1_A$.
(1.4.10) (Reversed martingales.) Let $(\mathcal F_n)_{n\in-\mathbb N}$ be a reversed stochastic basis. The process $(X_n)_{n\in-\mathbb N}$ is a reversed martingale if $E[X_\sigma]$ is constant ($\sigma \in \Sigma$), or, equivalently, $E^{\mathcal F_m}[X_n] = X_m$ for $m \le n$ in $-\mathbb N$. Every reversed martingale clearly has the form $X_n = E^{\mathcal F_n}[Y]$ for all $n$: in fact $Y = X_{-1}$ will do. So a reversed martingale is automatically uniformly integrable.
(1.4.11) (Counterexample for convergence.) It is not true that an amart $(X_n)$ that is not $L_1$-bounded converges a.s. It is not even true that a martingale converges a.s. on the set $\{\sup_n |X_n| < \infty\}$. Let $Y_1, Y_2, \ldots$
be independent random variables with $P\{Y_k = 1\} = 1 - 2^{-k}$ and $P\{Y_k = -(2^k - 1)\} = 2^{-k}$. Let $\tau = \inf\{n : Y_n \ne 1\}$. Then $\tau$ is a stopping time (not a simple stopping time), and
$$P\{\tau = \infty\} = \prod_{k=1}^{\infty} (1 - 2^{-k}) > 0.$$
Define $X_n = \sum_{k=1}^{n} (-1)^k Y_k\,1_{\{\tau \ge k\}}$. Then $(X_n)$ is an amart (even a martingale). Write $X^* = \sup_n |X_n|$. Then $P\{X^* > 2^n\} \le \sum_{k=n+1}^{\infty} 2^{-k} = 2^{-n}$, so it follows that $2^n P\{X^* > 2^n\} \le 1$ for all integers $n \ge 1$. Then $\lambda P\{X^* > \lambda\} \le 2$ for all real $\lambda \ge 1$. Thus we have $X^* < \infty$ a.s., but on the set $\{\tau = \infty\}$, the process $X_n$ alternates between $-1$ and $0$, and does not converge. (We owe this example to D. L. Burkholder.)
(1.4.12) (Conditional expectation for nonintegrable random variables.) If $X \ge 0$ is a random variable (possibly with the value $\infty$), the conditional expectation $E^{\mathcal G}[X]$ may be defined by
$$Y = \lim_{n\to\infty} E^{\mathcal G}[X \wedge n].$$
Even if $X$ is not integrable, the defining relations
$$E[Y\,1_A] = E[X\,1_A] \quad\text{for all } A \in \mathcal G$$
remain true. More generally, if $E[X^+] < \infty$ or $E[X^-] < \infty$, then we may use the definition
$$E^{\mathcal G}[X] = E^{\mathcal G}[X^+] - E^{\mathcal G}[X^-],$$
since the right side is a.s. not of the form $\infty - \infty$.
(1.4.13) (Contraction property.) The operator $E^{\mathcal G}$ is a contraction on $L_1$: that is, $\|E^{\mathcal G}[X]\|_1 \le \|X\|_1$. Indeed, let $A = \{E^{\mathcal G}[X] \ge 0\}$. Then $A \in \mathcal G$, so
$$E\big[\,\big|E^{\mathcal G}[X]\big|\,\big] = E\big[1_A E^{\mathcal G}[X]\big] - E\big[1_{\Omega\setminus A} E^{\mathcal G}[X]\big] = E[1_A X] - E[1_{\Omega\setminus A} X] \le E[|X|].$$
(1.4.14) (Markovian operator.) The conditional expectation operator $E^{\mathcal G}$ is a Markovian operator, that is, a positive linear operator on $L_1$ that preserves expectation. Every Markovian operator is a contraction on $L_1$. Other examples of Markovian operators appear in Chapter 8. More generally, the conditional expectation operator is a contraction on Orlicz spaces (2.3.11), hence $L_p$ spaces ($1 \le p \le \infty$). In fact, $E^{\mathcal G}$ is a contraction on rearrangement-invariant Banach function spaces (Lindenstrauss & Tzafriri [1979], p. 122).
(1.4.15) (Nonpreservation of convergence.) Convergence a.e. need not be preserved by $E^{\mathcal G}$, even in the presence of uniform integrability. Given a nonnegative sequence $(X_n)$ that converges a.e. to $0$ but such that $\sup |X_n|$ is not integrable, there is a $\sigma$-algebra $\mathcal G$ such that $\limsup E^{\mathcal G}[X_n] = \infty$ (Blackwell & Dubins [1963]).
(1.4.16) (Lebesgue decomposition of signed charges.) Let $\mathcal G \subseteq \mathcal F$ be an algebra of sets. A charge on $\mathcal G$ is a finitely additive set function $\mu : \mathcal G \to \mathbb R$. If $\mu$ has finite variation, then there is a decomposition $\mu = \mu_a + \mu_s$, where $\mu_a$ is absolutely continuous in the sense that $\mu_a(A) = E[Y\,1_A]$ for all $A$, for some integrable random variable $Y$; and $\mu_s$ is singular in the sense that for every $\varepsilon > 0$ there is $B \in \mathcal G$ with $|\mu_s|(B) < \varepsilon$ but $P(B) > 1 - \varepsilon$. Prove this in a manner similar to (1.3.2), above. First suppose $\mu \ge 0$. Let $J$ be the set of finite partitions of $\Omega$ by sets of $\mathcal G$. Then $J$ is directed by refinement. (Note: changing on a null set is not allowed, since it is possible that $\mu(B) \ne 0$ even when $P(B) = 0$.) Then for $t \in J$, let
$$X_t = \sum \frac{\mu(A)}{P(A)}\,1_A,$$
where the sum is over all $A \in t$ with $P(A) \ne 0$. Then $(X_t)$ is an $L_1$-bounded supermartingale, so it converges stochastically, say to $Y$. Then $\mu_a(B) = E[Y\,1_B]$, $B \in \mathcal G$, and $\mu_s(B) = \mu(B) - \mu_a(B)$ define the required decomposition. For general $\mu$, use the finiteness of the variation of $\mu$ to write $\mu$ as the difference of two nonnegative charges.
(1.4.17) (A decomposition for martingales.) Let $(X_t)$ be an $L_1$-bounded martingale for the stochastic basis $(\mathcal F_t)$. Then $(X_t)$ can be written $X_t = Y_t + Z_t$, where $(Y_t)$ is a martingale of the form $Y_t = E^{\mathcal F_t}[Y]$ for some integrable random variable $Y$, and $Z_t$ is singular in the sense that the charge $\nu(A) = \lim_t E[Z_t\,1_A]$ is singular as in (1.4.16). Prove this by applying the Lebesgue decomposition (1.4.16) to the charge $\mu(A) = \lim_t E[X_t\,1_A]$, for $A \in \mathcal G = \bigcup_t \mathcal F_t$. Show that $Y$ may be chosen to be the stochastic limit of $X_t$ (Theorem 1.3.1); then clearly $Z_t \to 0$ stochastically.
(1.4.18) (Maximal inequality.) The maximal inequality (1.1.7) is
simpler for martingales than for general amarts, since the expression $\sup_{\tau\in\Sigma} E[|X_\tau|]$ may be written in other forms. Let $(X_n)$ be a martingale or a nonnegative submartingale. Then the net $(E[|X_\tau|])_{\tau\in\Sigma}$ is increasing, so
$$\sup_{\sigma\in\Sigma} E[|X_\sigma|] = \lim_{\sigma\in\Sigma} E[|X_\sigma|] = \sup_{n\in\mathbb N} E[|X_n|] = \lim_{n\in\mathbb N} E[|X_n|].$$
To see this, observe that if $(X_n)$ is a martingale, then $(|X_n|)$ is a nonnegative submartingale. So it is enough to consider the case of a nonnegative submartingale. By (1.4.3), the net $(E[X_\sigma])_{\sigma\in\Sigma}$ is increasing, and the sequence $(E[X_n])_{n\in\mathbb N}$ is a subnet of it, therefore has the same limit (finite or infinite). So we have Doob's maximal inequality:
$$\lambda\,P\{\sup_n |X_n| \ge \lambda\} \le \sup_n E[|X_n|].$$
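The inequality can be confirmed exactly on a short example by enumerating every path of a symmetric random walk, which is a martingale (a numerical illustration, not from the text):

```python
from itertools import product

N = 6
paths = list(product([-1, 1], repeat=N))      # all 64 equally likely paths

def walk(path):                               # the martingale X_n = S_n
    s, out = 0, []
    for step in path:
        s += step
        out.append(s)
    return out

# sup_n E|X_n| (attained here at n = N)
sup_mean_abs = max(sum(abs(walk(p)[n]) for p in paths) / len(paths)
                   for n in range(N))

lam = 3.0
tail = sum(1 for p in paths if max(abs(x) for x in walk(p)) >= lam) / len(paths)
print(lam * tail, sup_mean_abs)               # left side never exceeds the right
```

Here $\sup_n E|X_n| = E|S_6| = 1.875$, and $\lambda P\{\sup_n |X_n| \ge \lambda\}$ stays below it for every $\lambda$.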
(1.4.19) (Nonsimple stopping times.) Let $(X_n)$ be an $L_1$-bounded martingale or nonnegative submartingale. Then it converges; write $X_\infty = \lim X_n$. If $\tau$ is any stopping time, with values in $\mathbb N \cup \{\infty\}$, then
$$E[|X_\tau|] \le \sup_n E[|X_n|].$$
To prove this, write $M = \sup_n E[|X_n|]$. Since $(X_n)$ is an $L_1$-bounded martingale (or nonnegative submartingale), so is $(X_{\tau\wedge n})_{n\in\mathbb N}$. The stopping times $\tau \wedge n$ belong to $\Sigma$, so certainly $E[|X_{\tau\wedge n}|] \le M$. But $\lim_n X_{\tau\wedge n} = X_\tau$ (consider both the sets $A = \{\tau = \infty\}$ and $\Omega \setminus A = \{\tau < \infty\}$). Apply Fatou's Lemma.
(1.4.20) (Approximation by stopping times.) Let $(X_n)$ be an adapted integrable process. Let $\tau \in \Sigma$. Then
$$E\Big[\sup_{n\ge\tau} \big|E^{\mathcal F_\tau}[X_n]\big|\Big] \le \sup_{\substack{\sigma\in\Sigma \\ \sigma\ge\tau}} E[|X_\sigma|].$$
(1.4.21) (Difference property in $\mathbb N$.) Let $(X_n)$ be an adapted integrable process. Then the following are equivalent:
(1) $(X_n)$ is an amart;
(2) for every $\varepsilon > 0$, there is $m_0 \in \mathbb N$ such that $\sigma, \tau \in \Sigma$ and $m_0 \le \sigma \le \tau$ imply $E\big[\big|E^{\mathcal F_\sigma}[X_\tau] - X_\sigma\big|\big] < \varepsilon$;
(3) for every $\varepsilon > 0$, there is $m_0 \in \mathbb N$ such that $m \in \mathbb N$, $\tau \in \Sigma$, and $m_0 \le m \le \tau$ imply $E\big[\big|E^{\mathcal F_m}[X_\tau] - X_m\big|\big] < \varepsilon$.
(1.4.22) (Riesz decomposition in $\mathbb N$.) Let $(X_n)$ be an amart. Then $X_n$ has a unique decomposition $X_n = Y_n + Z_n$, where $(Y_n)$ is a martingale and $(Z_n)$ is an amart potential. In fact, $Y_m$ is the $L_1$-norm limit $\lim_n E^{\mathcal F_m}[X_n]$. [By (1.4.20) and (1.2.3(b)), the supremum $\sup_n E^{\mathcal F_m}[X_n]$ is integrable for each $m$.] Also $\lim_{\sigma\in\Sigma} E[|Z_\sigma|] = 0$ and $\lim Z_n = 0$ a.s. More information on amart potentials is given in (4.1.15). The Riesz decomposition will be generalized below to Banach spaces (5.2.13 and 5.2.27), and semiamarts (1.4.26).
(1.4.23) (Uniform difference property.) Let $(X_n)$ be an amart. Then for any $\varepsilon > 0$ there is $m \in \mathbb N$ so that for all $\sigma, \tau \in \Sigma$ with $m \le \sigma \le \tau$, we have
$$\sup_{A\in\mathcal F_\sigma} \Big|E\big[\big(X_\sigma - E^{\mathcal F_\sigma}[X_\tau]\big)\,1_A\big]\Big| < \varepsilon.$$
(1.4.24) (Doob potentials.) Amart potentials can be related to the potentials that are used in classical martingale theory. A Doob potential is a positive supermartingale $(S_n)$ with $\lim_n E[S_n] = 0$. In particular $\lim_n E[S_n\,1_A] = 0$ for all $A \in \bigcup_{n=1}^\infty \mathcal F_n$, so it is an amart potential, and $S_n \ge 0$. We will see (Theorem 4.1.15) that an adapted process $(Z_n)$ is an amart potential if and only if there is a Doob potential $(S_n)$ with $|Z_n| \le S_n$ for all $n$.
(1.4.25) (A stopped inequality.) Use Chacon's inequality (1.2.9) to prove: If $(X_n)$ is an $L_1$-bounded process, then
$$E\Big[\limsup_{n,m} |X_n - X_m|\Big] \le 2 \limsup_{\tau\ge\sigma} E\big[\big|E^{\mathcal F_\sigma}[X_\tau] - X_\sigma\big|\big].$$
(1.4.26) (Semiamart Riesz decomposition.) Suppose $(E[X_\tau])_{\tau\in\Sigma}$ is bounded. Then $X_n = Y_n + Z_n$, where $(Y_n)$ is a martingale and $(Z_n)$ is $L_1$-bounded and oscillates about $0$ in the sense:
$$\liminf_n E[Z_n\,1_A] \le 0 \le \limsup_n E[Z_n\,1_A]$$
for all $A \in \bigcup \mathcal F_m$. Hint: Fix $m$. By (1.2.11), $\sup_n E^{\mathcal F_m}[X_n] \in L_1$. Let $Y_m$ be a weak truncated limit of a subsequence of
$$\Big(\frac1n \sum_{i=1}^n E^{\mathcal F_m}[X_i]\Big)_{n\in\mathbb N}$$
(8.1.1). It can also be shown that $X_n - Y_n$ is dominated by the Snell envelope
$$\operatorname*{ess\,sup}_{\substack{\sigma\in\Sigma \\ \sigma\ge n}} E^{\mathcal F_n}[X_\sigma]$$
(Krengel & Sucheston [1978], p. 204).
(1.4.27) (Reversed difference property.) Let $(\mathcal F_n)_{n\in-\mathbb N}$ be a reversed stochastic basis. The adapted integrable process $(X_n)_{n\in-\mathbb N}$ is an amart if and only if for every $\varepsilon > 0$, there is $m_0 \in -\mathbb N$ such that for all $\sigma, \tau \in \Sigma$ with $\sigma \le \tau \le m_0$, $E\big[\big|E^{\mathcal F_\sigma}[X_\tau] - X_\sigma\big|\big] < \varepsilon$.
(1.4.28) (Reversed amart Riesz decomposition.) Let $(X_n)_{n\in-\mathbb N}$ be a reversed amart. The process is uniformly integrable and converges to a random variable $X_{-\infty}$. Then the sequence $Z_n = X_n - X_{-\infty}$ is adapted; set $Y_n = X_{-\infty}$ for all $n$.
(1.4.29) (Optional sampling.) Let $(X_n)$ be a martingale [submartingale] for $(\mathcal F_n)$. If $\sigma_1 \le \sigma_2 \le \cdots$ in $\Sigma$, then $(X_{\sigma_k})$ is a martingale [submartingale] for $(\mathcal F_{\sigma_k})$. Optional sampling for quasimartingales follows from the next complement.
(1.4.30) (Alternative definition of quasimartingale.) Let $(X_n)_{n\in\mathbb N}$ be a sequence of real-valued random variables adapted to a stochastic basis $(\mathcal F_n)_{n\in\mathbb N}$. Let $C$ be a positive constant. The following are equivalent:
(i) $E\big[\sum_{i=1}^{\infty} \big|E^{\mathcal F_i}[X_{i+1}] - X_i\big|\big] \le C$;
(ii) $E\big[\sum_{i=1}^{m-1} \big|E^{\mathcal F_{\tau_i}}[X_{\tau_{i+1}}] - X_{\tau_i}\big|\big] \le C$ for any sequence $\tau_1 < \tau_2 < \cdots < \tau_m$ of simple stopping times;
(iii) $\sum_{i=1}^{m-1} \big|E[X_{\tau_{i+1}}] - E[X_{\tau_i}]\big| \le C$ for any sequence $\tau_1 < \tau_2 < \cdots < \tau_m$ of simple stopping times.
The techniques in the proof of this are like those in the proof of (1.4.4) (R. Wittmann, private communication).
(1.4.31) (Continuous parameter amarts.) Let $(\Omega, \mathcal F, P)$ be a complete
probability space (subsets of null sets are in $\mathcal F$). Let the index set $J$ be an interval in the line $\mathbb R$. Suppose the stochastic basis $(\mathcal F_t)_{t\in J}$ is right continuous, i.e., for each $s$,
$$\mathcal F_s = \bigcap_{t>s} \mathcal F_t,$$
and each $\mathcal F_t$ contains all null sets in $\mathcal F$. An optional time is a function $\tau : \Omega \to J \cup \{\infty\}$ such that $\{\tau \le t\} \in \mathcal F_t$ for all $t \in J$. An optional time $\tau$ is called simple if it takes finitely many finite values. A continuous parameter ascending amart is a process $(X_t)_{t\in J}$ such that $E[X_{\tau_n}]$ converges for every ascending (= increasing) sequence $(\tau_n)$ of simple optional times. The definition of a descending amart is the same, except that the sequence $(\tau_n)$ is descending (= decreasing). An amart is a process that is both an ascending amart and a descending amart. The theory of continuous parameter amarts parallels that of continuous parameter martingales (Doob [1953] or Dellacherie & Meyer [1982]). We recall some basic definitions. A trajectory of a process $(X_t)$ at $\omega$ is the net $(X_t(\omega))_{t\in J}$. A modification of a process $(X_t)$ is a process $(Y_t)$ such that $X_t = Y_t$ a.s.; the exceptional null set may depend on $t$. The processes $(X_t)$ and $(Y_t)$ are indistinguishable if there is a single null set, outside of which $X_t = Y_t$ for all $t$. Under mild boundedness assumptions, a continuous parameter amart has a modification every trajectory of which has right and left limits. If an amart is right continuous in probability, then it has a modification every trajectory of which is right continuous. A related but stronger notion is the hyperamart. A process $(X_t)$ is a hyperamart if $E[X_{\tau_n}]$ converges for all monotone sequences of bounded, but not necessarily simple, optional times. If $(X_t)$ is a hyperamart, then there is a process $(Y_t)$ indistinguishable from $(X_t)$ with the regularity properties (such as right and left limits). Reference: Edgar & Sucheston [1976b].
(1.4.32) (Continuous parameter quasimartingales.) Let $J$ be an interval in $\mathbb R$, and let $(\mathcal F_t)_{t\in J}$ be a stochastic basis with the properties stated above
(1.4.31). A process $(X_t)_{t\in J}$ is a martingale if $X_s = E^{\mathcal F_s}[X_t]$ for $s \le t$. The process $(X_t)$ is a submartingale if $X_s \le E^{\mathcal F_s}[X_t]$ for $s \le t$, and a supermartingale if $X_s \ge E^{\mathcal F_s}[X_t]$ for $s \le t$. The process $(X_t)$ is a quasimartingale if there is a constant $M$ so that
$$\sum_{i=2}^{n} \big\|E[X_{t_i} \mid \mathcal F_{t_{i-1}}] - X_{t_{i-1}}\big\|_1 \le M$$
for all finite sequences $t_1 < t_2 < \cdots < t_n$ in $J$. (See Fisk [1965], Orey [1967], Rao [1969].) All quasimartingales (in particular martingales, $L_1$-bounded submartingales, $L_1$-bounded supermartingales) are continuous parameter amarts.

Remarks
Roughly speaking, there are two antecedents of martingale theory. One is the derivation theory of R. de Possel (for example de Possel [1936]). Another, probabilistic, approach originated with Paul Lévy [1937]; he generalized sums of independent variables with $0$ expectations by "centering" (assuming $0$ conditional expectations given the past). As for sub- and supermartingales (called by Doob upper and lower semimartingales), they seem to have appeared first in the derivation theory. The quest for the best formulation went back and forth between random variables and set functions, as between Doob and Andersen–Jessen (see Doob [1953], pp. 630 ff). At the end, successors and students of Andersen–Jessen showed that their set-function notions could be just as general or more general; Lamb [1973] gave a set-function formulation equivalent to (but preceding) the random-variable formulation of the amart in Austin, Edgar, Ionescu Tulcea [1974]. But almost no one paid attention any more, and random variables prevailed. The reason seems to be that, because of the Radon–Nikodym theorem, set functions bring no greater generality, and the intuitive meaning of stochastic processes and stopping times is lost in the set-function approach. Krickeberg [1957], [1959], initiating the topic, proved stochastic convergence of $L_1$-bounded submartingales indexed by directed sets. Millet & Sucheston [1980b] proved stochastic convergence of $L_1$-bounded ordered amarts. The proof given above is quite different. The difference property is from Edgar & Sucheston [1976c]; the Riesz decomposition is from Edgar & Sucheston [1976a]; we have used a proof due to Astbury [1978]. The theory of continuous parameter amarts is from Edgar & Sucheston [1976b]. Hyperamarts appear earlier in the work of Meyer [1971] and Mertens [1972]; for an intermediate notion, with optional times taking countably many values, see Doob [1975]. A Banach-valued version of continuous parameter amarts appears in Choi & Sucheston [1981].
2
Infinite measure and Orlicz spaces
In this chapter we will prepare some of the tools to be used later. One important possibility is the use of an infinite measure space, rather than a probability space. For probability in the narrow sense, only finite measure spaces are normally used. Attempts to do probability in infinite measure spaces have had little success. In most of the book, we consider primarily the case of finite measure space. But there are reasons for considering infinite measure spaces. The techniques to be developed for pointwise convergence theorems can tell us something also in infinite measure spaces. The ergodic theorems in Chapter 8 often have their natural setting in infinite measure spaces. J. A. Hartigan [1983] argues that an infinite measure space provides a rigorous foundation for Bayesian statistics. Our material on derivation requires the possibility of infinite measures to cover the most common cases, such as Euclidean space $\mathbb R^n$ with Lebesgue measure. Another important tool to be used is the Orlicz space. The class of Orlicz spaces generalizes the class of function spaces $L_p$. If the primary concern is finite measure space, then Orlicz spaces are basic. A necessary and sufficient condition for uniform integrability of a set of functions is boundedness in an Orlicz norm. Zygmund's Orlicz space $L \log L$, and, more generally, the Orlicz spaces $L \log^k L$, appear naturally in considerations of integrability of the supremum and related multiparameter convergence theorems. For consideration of infinite measure spaces, Orlicz spaces will not suffice, unless property $(\Delta_2)$ holds. But in the principal cases, the "heart" of an Orlicz space (the closure of the integrable simple functions) plays an important role. Measure-theoretic arguments often require approximation of functions by simple functions; therefore much of the theory applies only to hearts of Orlicz spaces. The hearts of the Orlicz spaces $L \log^k L$, the $R_k$ spaces introduced by N. Fava, are especially important.
A novelty of our approach consists in identifying useful spaces as hearts of appropriate Orlicz spaces, and applying the general theory. For instance, the Fava space $R_0$, which is the right setting for the Dunford–Schwartz and the martingale theorems in infinite measure spaces, is identified as the heart of the largest Orlicz space $L_1 + L_\infty$; it follows at once that $R_0$ is an order-continuous Banach lattice. We will normally write $(\Omega, \mathcal F, \mu)$ for a (possibly infinite) measure space. We may say "measurable function" rather than "random variable" for a function $f : \Omega \to \mathbb R$. Normally we will write $\int f\,d\mu$ for the integral, and reserve $E[f]$ for the case of a probability space. Conditional expectations $E_\mu[f \mid \mathcal G]$ will be considered later, but $\int f\,d\mu$ is not the special case of $E_\mu[f \mid \mathcal G]$ for trivial $\sigma$-algebra $\mathcal G$. We may also use the phrase "almost everywhere" rather than "almost surely" in this context.
2.1. Orlicz spaces

Much of modern probability theory, especially the part concerned with convergence theorems, deals with appropriate spaces of measurable functions. The most useful are the spaces $L_p$, but these spaces are not enough for everything we want to do here. It is therefore necessary to consider the generalization known as the Orlicz spaces. We have attempted to make the exposition self-contained. It requires only a knowledge of the spaces $L_p$ and their basic properties, such as contained in Cohn [1980] or Royden [1968].

Orlicz functions and their conjugates

The Orlicz function generalizes the function $t^p$ used for the definition of the space $L_p$. The generality in the definition determines the generality of the Orlicz spaces that will result. We allow the values $0$ and $\infty$ for the function so that spaces like $L \log L$ and $L_\infty$ will be included. Arithmetic with $\infty$ follows the usual conventions, including $0 \cdot \infty = 0$.
(2.1.1) Definition. An Orlicz function is a function $\Phi : [0, \infty) \to [0, \infty]$ satisfying:
(1) $\Phi(0) = 0$.
(2) $\Phi$ is left-continuous: $\lim_{u \uparrow x} \Phi(u) = \Phi(x)$.
(3) $\Phi$ is increasing: if $u_1 \le u_2$, then $\Phi(u_1) \le \Phi(u_2)$.
(4) $\Phi$ is convex: $\Phi(\alpha u_1 + (1-\alpha)u_2) \le \alpha\Phi(u_1) + (1-\alpha)\Phi(u_2)$, for $0 < \alpha < 1$.
(5) $\Phi$ is nontrivial: $\Phi(u) > 0$ for some $u > 0$ and $\Phi(u) < \infty$ for some $u > 0$.
The convexity (4) implies that $\Phi$ is continuous except possibly at a single point, where it jumps to $\infty$, so condition (2) is needed only at that one point. An Orlicz function is increasing, so (see 7.1.4) it is differentiable a.e. and (see 7.1.11) its derivative $\varphi = \Phi'$ satisfies
$$(2.1.1\mathrm a)\qquad \Phi(u) = \int_0^u \varphi(x)\,dx, \qquad 0 \le u < \infty.$$
If $\Phi(u) = \infty$ for some values of $u$, then we take by convention $\varphi(u) = \infty$ also, so that (2.1.1a) remains correct. Since $\Phi$ is increasing, we have $\varphi \ge 0$. Since $\Phi$ is convex, the derivative $\varphi$ is increasing. Then $\varphi$ is continuous except at countably many points, so $\varphi$ can be made left-continuous without destroying (2.1.1a).
(2.1.2) Definition. An Orlicz derivative is a function $\varphi : (0, \infty) \to [0, \infty]$ satisfying:
(1) $\varphi$ is increasing.
(2) $\varphi$ is left-continuous.
(3) $\varphi$ is nontrivial: $\varphi(x) > 0$ for some $x$ and $\varphi(x) < \infty$ for some $x$.
It is not hard to verify that if $\varphi$ is an Orlicz derivative, then (2.1.1a) defines an Orlicz function $\Phi$ with derivative $\varphi$. If $\varphi$ is strictly increasing, then it has an inverse function $\psi$. In any case, we may define a "generalized left-continuous inverse" of $\varphi$ as follows:
$$\psi(y) = \inf\{x \in (0, \infty) : \varphi(x) \ge y\}.$$
An example is shown in Figure (2.1.2). The new function $\psi$ is also an Orlicz derivative, so
$$\Psi(v) = \int_0^v \psi(y)\,dy$$
defines an Orlicz function. We say that $\Psi$ is the conjugate Orlicz function to $\Phi$. Notice that the construction works in reverse: the derivative of $\Psi$ is $\psi$; the generalized left-continuous inverse of $\psi$ is $\varphi$; and $\Phi(u) = \int_0^u \varphi(x)\,dx$.
Figure (2.1.2). An Orlicz derivative and its generalized left-continuous inverse.
Let us consider more carefully the relation between $\varphi$ and $\psi$. If $\varphi$ is discontinuous at $x = a$, then $\psi(y) = a$ for all $y$ with $\varphi(a) < y \le \varphi(a+)$. Since $\Phi$ is convex, the function $\Phi(u)/u$ is increasing. We have $\Phi(u)/u \to \infty$ as $u \to \infty$ if and only if $\varphi(x) \to \infty$ as $x \to \infty$; this is equivalent to $\psi(y) < \infty$ for all $y$. (In this case, we say that $\varphi$ is unbounded, $\psi$ is finite, or $\Psi$ is finite.) On the other hand, if $\lim_{x\to\infty} \varphi(x) = d$ is finite, then $\psi(v) = \infty$ and $\Psi(v) = \infty$ for all $v > d$. (Then we say that $\varphi$ is bounded, $\psi$ is infinite, or $\Psi$ is infinite.)
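The classical example is $\varphi(x) = x^{p-1}$ with inverse $\psi(y) = y^{q-1}$, where $1/p + 1/q = 1$; this produces the conjugate pair $\Phi(u) = u^p/p$ and $\Psi(v) = v^q/q$. The sketch below (our illustration; the midpoint-rule helper is ad hoc) checks both closed forms and the inverse relation numerically:

```python
p = 3.0
q = p / (p - 1)                        # conjugate exponent: 1/p + 1/q = 1

def phi(x):
    return x ** (p - 1)

def psi(y):                            # generalized inverse of phi
    return y ** (q - 1)

def integral(f, b, steps=100000):      # midpoint rule for the integral of f over (0, b)
    h = b / steps
    return sum(f((i + 0.5) * h) for i in range(steps)) * h

u, v = 1.7, 2.4
Phi = integral(phi, u)                 # should be u**p / p
Psi = integral(psi, v)                 # should be v**q / q
print(Phi, u ** p / p)
print(Psi, v ** q / q)
print(psi(phi(u)))                     # recovers u, since (p-1)(q-1) = 1
```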
(2.1.3) Definition. A Young partition is a pair $(E_1, E_2)$ of open subsets of $(0, \infty) \times (0, \infty)$ such that
(1) $E_1 \cap E_2 = \emptyset$, $E_1 \ne \emptyset$, $E_2 \ne \emptyset$.
(2) $E_1$ is southeast hereditary: if $(x, y) \in E_1$, $x' \ge x$, and $y' \le y$, then $(x', y') \in E_1$.
(3) $E_2$ is northwest hereditary.
(4) $\overline{E_1} \cup E_2 = E_1 \cup \overline{E_2} = (0, \infty) \times (0, \infty)$.
Figure (2.1.3). Young partition.

If $(E_1, E_2)$ is a Young partition, then
$$\varphi(x) = \sup\{y : (x, y) \in E_1\}$$
is an Orlicz derivative with generalized left-continuous inverse
$$\psi(y) = \sup\{x : (x, y) \in E_2\}.$$
Conversely, if $\varphi$ is an Orlicz derivative with inverse $\psi$, then
$$E_1 = \{(x, y) \in (0, \infty) \times (0, \infty) : y < \varphi(x)\}, \qquad E_2 = \{(x, y) \in (0, \infty) \times (0, \infty) : x < \psi(y)\}$$
is a Young partition. The set
$$G = \{(x, \varphi(x)) : x \in (0, \infty)\} \cup \{(\psi(y), y) : y \in (0, \infty)\}$$
is the common boundary of $E_1$ and $E_2$. See Figure (2.1.3). The set $G$ has two-dimensional Lebesgue measure $0$, since it meets each line of slope $-1$ in only one point.
The Young partition leads to a useful inequality.
(2.1.4) Theorem (Young's inequality). Let $\Phi$ and $\Psi$ be conjugate Orlicz functions with derivatives $\varphi$ and $\psi$. Then
$$uv \le \Phi(u) + \Psi(v) \quad\text{for all } u, v \ge 0.$$
Equality holds if and only if $v = \varphi(u)$ or $u = \psi(v)$.

Proof. See Figure (2.1.4): the rectangle $[0, u] \times [0, v]$ is covered by the two regions of areas $\Phi(u)$ and $\Psi(v)$.

Figure (2.1.4). Young's inequality: $uv \le \Phi(u) + \Psi(v)$.

(2.1.5) Corollary. $\Phi(u) = \sup\{uv - \Psi(v) : v > 0\}$ for $u \ge 0$, and the sup is achieved if $\varphi(u) < \infty$.
Proof. Fix $u \ge 0$. Then $\Phi(u) + \Psi(v) \ge uv$ for all $v$ and $uv$ is finite, so (even if one of $\Phi(u)$, $\Psi(v)$ is infinite) we have
$$\Phi(u) \ge \sup\{uv - \Psi(v) : v > 0\}.$$
Now if $\varphi(u) < \infty$, then for $v = \varphi(u)$ we have $\Phi(u) + \Psi(v) = uv$, so both $\Phi(u)$ and $\Psi(v)$ are finite and
$$\Phi(u) = \max\{uv - \Psi(v) : v > 0\}.$$
On the other hand, if $\Phi(u) = \infty$, then $\psi$ is bounded, so
$$\sup\{uv - \Psi(v) : v > 0\} = \infty.$$
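For $\Phi(u) = u^2/2$ the derivative $\varphi(x) = x$ is its own inverse, so $\Psi(v) = v^2/2$, and Young's inequality reads $uv \le u^2/2 + v^2/2$. A quick grid check of the inequality and of Corollary (2.1.5) (an illustration; the grid values are invented):

```python
Phi = lambda u: u * u / 2         # with derivative phi(x) = x
Psi = lambda v: v * v / 2         # conjugate, with psi(y) = y
phi = lambda x: x

grid = [k / 10 for k in range(51)]            # u, v ranging over [0, 5]

# uv - Phi(u) - Psi(v) = -(u - v)**2 / 2 is never positive,
# and vanishes exactly when v = phi(u) = u.
worst = max(u * v - Phi(u) - Psi(v) for u in grid for v in grid)

# Corollary (2.1.5): Phi(u) = sup_v { uv - Psi(v) }, attained at v = phi(u).
u = 1.3
best = max(u * v - Psi(v) for v in grid)
print(worst, best, Phi(u))
```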
Orlicz spaces

We will now consider a measure space $(\Omega, \mathcal F, \mu)$, where $\mu$ is $\sigma$-finite (and nonnegative). Certain properties of the measure space will be important in the description of the Orlicz space.

(2.1.6) Definition. Let $(\Omega, \mathcal F, \mu)$ be a measure space. Then we say $\mu$ is infinite if $\mu(\Omega) = \infty$; otherwise we say $\mu$ is finite. We say that $\mu$ has arbitrarily small sets if
$$\inf\{\mu(A) : A \in \mathcal F,\ \mu(A) > 0\} = 0.$$
The canonical examples are the following. The unit interval $[0, 1]$ with Lebesgue measure: this is finite and has arbitrarily small sets. The interval $[0, \infty)$ with Lebesgue measure: this is infinite and has arbitrarily small sets. The positive integers $\mathbb N$ with counting measure: this is infinite and has no arbitrarily small sets. The fourth possibility (finite measure with no arbitrarily small sets) is typified by a finite set $\{1, 2, \ldots, n\}$ with counting measure; this is less interesting, since all of the Orlicz spaces consist of the same functions (it is, however, important in the isometric theory of Banach spaces).
Let $\Phi$ and $\Psi$ be conjugate Orlicz functions with derivatives $\varphi$ and $\psi$. They will be fixed throughout most of Section 2.1.
(2.1.7) Definition. Let $f : \Omega \to \mathbb R$ be a measurable function. The Orlicz modular of $f$ for $\Phi$ is
$$M_\Phi(f) = \int \Phi(|f|)\,d\mu.$$
The Young class of $\Phi$ is the set $Y_\Phi = Y_\Phi(\Omega, \mathcal F, \mu)$ consisting of all measurable functions $f : \Omega \to \mathbb R$ with $M_\Phi(f) < \infty$. Note that $\Phi$ is convex, so $M_\Phi$ is convex in this sense:
(2.1.8) Proposition. Let $f, g$ be measurable functions, and let $\alpha$ satisfy $0 < \alpha < 1$. Then
$$M_\Phi(\alpha f + (1-\alpha)g) = \int \Phi\big(|\alpha f + (1-\alpha)g|\big)\,d\mu \le \int \Phi\big(\alpha|f| + (1-\alpha)|g|\big)\,d\mu$$
$$\le \int \big(\alpha\Phi(|f|) + (1-\alpha)\Phi(|g|)\big)\,d\mu = \alpha M_\Phi(f) + (1-\alpha)M_\Phi(g).$$
This implies that the Young class $Y_\Phi$ is a convex set, and in fact that the Orlicz ball
$$B_\Phi = \{f : M_\Phi(f) \le 1\}$$
is a convex set.
In general, the Young class is not a linear space. It is possible that $f \in Y_\Phi$ but $2f \notin Y_\Phi$ (see 2.1.25). We must therefore consider more complex definitions.
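An explicit instance (a worked illustration; the choices are ours): on $\Omega = (0, 1)$ with Lebesgue measure, take $\Phi(u) = e^u - 1$ and $f(x) = \frac12\log(1/x)$. Then $\Phi(|f|) = x^{-1/2} - 1$ is integrable, while $\Phi(|2f|) = 1/x - 1$ is not; so $f \in Y_\Phi$ but $2f \notin Y_\Phi$. The code evaluates both modulars over $(\varepsilon, 1)$ in closed form and watches the second one diverge:

```python
import math

def modular_f(eps):
    # integral over (eps, 1) of Phi(|f|) = x**-0.5 - 1
    return 2 * (1 - math.sqrt(eps)) - (1 - eps)

def modular_2f(eps):
    # integral over (eps, 1) of Phi(|2f|) = 1/x - 1
    return math.log(1 / eps) - (1 - eps)

for eps in (1e-2, 1e-4, 1e-8):
    print(eps, modular_f(eps), modular_2f(eps))
# modular_f increases to the finite limit 1 (so M_Phi(f) < infinity),
# while modular_2f grows like log(1/eps) (so M_Phi(2f) = infinity).
```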
(2.1.9) Definition. The Orlicz space for $\Phi$ is the set $L_\Phi = L_\Phi(\Omega, \mathcal F, \mu)$ of all measurable functions $f : \Omega \to \mathbb R$ such that $M_\Phi(f/a) < \infty$ for some $a > 0$. The heart of the Orlicz space for $\Phi$ is the set $H_\Phi = H_\Phi(\Omega, \mathcal F, \mu)$ of all measurable functions $f : \Omega \to \mathbb R$ such that $M_\Phi(f/a) < \infty$ for all $a > 0$. The Luxemburg norm of a measurable function $f$ is
$$\|f\|_\Phi = \inf\{a > 0 : M_\Phi(f/a) \le 1\},$$
where by convention $\inf \emptyset = \infty$. If $(\Omega, \mathcal F, \mu)$ is the discrete space $\mathbb N$ with counting measure, then we write $\ell_\Phi = L_\Phi(\mathbb N)$ and $h_\Phi = H_\Phi(\mathbb N)$. The spaces $\ell_\Phi$ are known as Orlicz sequence spaces.

Note that if $f \in L_\Phi$, then $M_\Phi(f/m) < \infty$ for some integer $m$. But $|f|/n \downarrow 0$ a.e. as $n \to \infty$, so $\Phi(|f|/n) \to 0$ a.e. (Here we use the assumption that $\Phi(u) < \infty$ for some $u > 0$.) Thus, by the dominated convergence theorem, $M_\Phi(f/n) \le 1$ for some $n$, and $\|f\|_\Phi < \infty$. Conversely, if $\|f\|_\Phi < \infty$, then clearly $f \in L_\Phi$. Another description of $L_\Phi$ is thus: the set of all $f$ with $\|f\|_\Phi < \infty$.
We insert here a few basic calculations.
(2.1.10) Proposition.
(1) If $0 < \|f\|_\Phi < \infty$, then $M_\Phi(f/\|f\|_\Phi) \le 1$.
(2) If $\|f\|_\Phi \le 1$, then $M_\Phi(f) \le \|f\|_\Phi$.
(3) If $\|f\|_\Phi > 1$, then $M_\Phi(f) \ge \|f\|_\Phi > 1$.
(4) If $f \in H_\Phi$, then $\|f\|_\Phi = 1$ is equivalent to $M_\Phi(f) = 1$.
(5) $\|f - f_n\|_\Phi \to 0$ if and only if $M_\Phi(k(f - f_n)) \to 0$ for all $k > 0$.
(6) If $\|f_n - f\|_\Phi \to 0$, then $f_n$ converges in measure to $f$; that is, for every $\varepsilon > 0$, we have $\lim_{n\to\infty} \mu\{|f_n - f| > \varepsilon\} = 0$.

Proof. (1) For $a > \|f\|_\Phi$, we have $M_\Phi(f/a) \le 1$. Now as $a$ decreases to $\|f\|_\Phi$, the quotient $|f|/a$ increases to $|f|/\|f\|_\Phi$. By the left-continuity of $\Phi$, we have $\Phi(|f|/a) \uparrow \Phi(|f|/\|f\|_\Phi)$, so by the monotone convergence theorem,
$$M_\Phi(f/a) \to M_\Phi\big(f/\|f\|_\Phi\big).$$
Therefore $M_\Phi(f/\|f\|_\Phi) \le 1$.
(2) Let $a = \|f\|_\Phi$. Then by convexity, $M_\Phi(f)/a \le M_\Phi(f/a) \le 1$, so $M_\Phi(f) \le a$.
(3) For $1 < a < \|f\|_\Phi$, we have $M_\Phi(f/a) > 1$, and by convexity $M_\Phi(f/a) \le M_\Phi(f)/a$, so $M_\Phi(f) \ge a$. Letting $a \uparrow \|f\|_\Phi$, we get $M_\Phi(f) \ge \|f\|_\Phi$.
(4) By (2) and (3), if $M_\Phi(f) = 1$, then $\|f\|_\Phi = 1$. Conversely, suppose $\|f\|_\Phi = 1$. By (2), $M_\Phi(f) \le 1$. For $0 < a < 1$ we have $a < \|f\|_\Phi$, so $M_\Phi(f/a) > 1$. As $a \uparrow 1$, we have $\Phi(|f|/a) \to \Phi(|f|)$. Now $\Phi$ is continuous at all points except possibly where it jumps to $\infty$. Since $f \in H_\Phi$, we see that $M_\Phi(f/a) < \infty$, so $\Phi$ is finite at the point $|f(\omega)|/a$ for almost all $\omega$. Therefore $M_\Phi(f) = \lim_{a\uparrow 1} M_\Phi(f/a) \ge 1$. Thus $M_\Phi(f) = 1$.
Infinite measure and Orlicz spaces
(5) If ‖f − f_n‖_Φ → 0, then for each k > 0 we have ‖k(f − f_n)‖_Φ → 0, so ‖k(f − f_n)‖_Φ ≤ 1 for large n, and by (2), M_Φ(k(f − f_n)) ≤ ‖k(f − f_n)‖_Φ → 0. Conversely, suppose M_Φ(k(f − f_n)) → 0 for all k. Then for each k we have M_Φ(k(f − f_n)) ≤ 1 for large n, so that ‖f − f_n‖_Φ ≤ 1/k for large n. Thus ‖f − f_n‖_Φ → 0.
(6) Let ε > 0. There is u_0 so that Φ(u_0) > 0. If k = u_0/ε, then for any δ > 0, for large n we have ∫Φ(k|f_n − f|) dμ < δ, so
    δ > ∫Φ(k|f_n − f|) dμ ≥ Φ(u_0) μ{|f_n − f| > ε}.
Thus μ{|f_n − f| > ε} → 0.
One might remark that the hypothesis f ∈ H_Φ in part (4) cannot be omitted. The space L_∞ provides a counterexample. If functions in L_Φ are identified when they agree a.e., then L_Φ is a Banach space:
(2.1.11) Theorem. (a) The set L_Φ is a Banach space. (b) The set H_Φ is a closed linear subspace of L_Φ, and H_Φ ⊆ Y_Φ ⊆ L_Φ. (c) (The Fatou property) If (f_n) is an increasing, nonnegative sequence in L_Φ with ‖f_n‖_Φ ≤ 1 for all n, then the pointwise limit f = lim f_n belongs to L_Φ and ‖f‖_Φ ≤ 1.
Proof. We first prove that L_Φ is a normed linear space. Homogeneity ‖af‖_Φ = |a| ‖f‖_Φ is easy from the definition of the norm. Thus if f ∈ L_Φ, it follows that af ∈ L_Φ. For the triangle inequality, let f, g ∈ L_Φ, and let a = ‖f‖_Φ, b = ‖g‖_Φ. Then f/a and g/b belong to the Orlicz ball B_Φ. But B_Φ is convex, so
    (a/(a+b)) (f/a) + (b/(a+b)) (g/b) = (f+g)/(a+b)
belongs to B_Φ. Thus ‖f + g‖_Φ ≤ a + b. This shows that f + g ∈ L_Φ and ‖f + g‖_Φ ≤ ‖f‖_Φ + ‖g‖_Φ.
In order for ‖f‖_Φ = 0, we must have M_Φ(f/a) ≤ 1 for all a > 0. Fix ε > 0. Then
    M_Φ(f/a) = ∫Φ(|f|/a) dμ ≥ Φ(ε/a) μ{|f| > ε}.
Now when a → 0, we have Φ(ε/a) → ∞, so μ{|f| > ε} = 0. This is true for all ε > 0, so f = 0 a.e.
Now consider a sequence f_n ∈ L_Φ with 0 ≤ f_1 ≤ f_2 ≤ ⋯ and ‖f_n‖_Φ ≤ 1 for all n. That is, ∫Φ(|f_n|) dμ ≤ 1. If f = lim f_n, then Φ(|f_n|) ↑ Φ(|f|) by the left-continuity of Φ. Therefore, by the monotone convergence theorem,
    ∫Φ(|f|) dμ ≤ 1.
Thus ‖f‖_Φ ≤ 1, which proves (c).
From this we may deduce completeness. Suppose (f_n) is a Cauchy sequence in L_Φ. There is a subsequence (f_{n_k}) with ‖f_{n_k} − f_{n_{k−1}}‖_Φ ≤ 2^(−k). Now let
    g = Σ_{k=1}^∞ |f_{n_k} − f_{n_{k−1}}|.
By convexity and left-continuity of Φ,
    Φ(g) = Φ( Σ_{k=1}^∞ 2^(−k) · 2^k |f_{n_k} − f_{n_{k−1}}| ) ≤ Σ_{k=1}^∞ 2^(−k) Φ(2^k |f_{n_k} − f_{n_{k−1}}|).
Since ‖2^k (f_{n_k} − f_{n_{k−1}})‖_Φ ≤ 1, each term satisfies M_Φ(2^k (f_{n_k} − f_{n_{k−1}})) ≤ 1, so M_Φ(g) ≤ Σ_k 2^(−k) = 1. Thus ‖g‖_Φ ≤ 1, so g belongs to L_Φ, and the series converges a.e. to a finite limit (since Φ(u) → ∞ as u → ∞). Now
    f = Σ_{k=1}^∞ (f_{n_k} − f_{n_{k−1}}) + f_{n_0}
also converges a.e., and |f − f_{n_0}| ≤ g, so f ∈ L_Φ. But (f_n) is Cauchy, so for any ε > 0, there exists m so that for all n ≥ m, we have ‖f_n − f_m‖_Φ ≤ ε. Then for n_k ≥ m, we get ‖f_m − f_{n_k}‖_Φ ≤ ε; letting k → ∞ and using Fatou's lemma, ‖f_m − f‖_Φ ≤ ε. Thus ‖f_n − f‖_Φ → 0 as n → ∞.
Next, consider the subset H_Φ. Clearly it is a linear space. We claim it is closed in L_Φ. Suppose f_n ∈ H_Φ and ‖f − f_n‖_Φ → 0. Let a > 0. Then ‖f_n/a − f/a‖_Φ → 0, so there is n with M_Φ(f_n/a − f/a) ≤ 1. Then (f − f_n)/a ∈ Y_Φ. But also f_n/a ∈ Y_Φ. By the convexity of Y_Φ, we have f/2a ∈ Y_Φ. Since a is arbitrary, this shows that f ∈ H_Φ.
The sequence f_n converges in Φ-norm to f if ‖f_n − f‖_Φ → 0 as usual. Also, f_n converges in modular if M_Φ(f_n − f) → 0. These two modes of convergence are related (2.1.10(5)) but not identical (2.1.15 and 2.1.18).
The Banach space L_Φ is an example of a Banach lattice. This includes the following observations. The relation f ≤ g a.e. is a partial order on L_Φ. If f ≤ g a.e., then f + h ≤ g + h a.e. If f ∈ L_Φ, f ≥ 0 a.e., and a ∈ ℝ, a ≥ 0, then af ≥ 0 a.e. If f, g ∈ L_Φ, then the pointwise maximum f ∨ g and pointwise minimum f ∧ g also belong to L_Φ. In particular, the pointwise absolute value |f| belongs to L_Φ, and ‖ |f| ‖_Φ = ‖f‖_Φ. Observe that these closure properties are true as well for the heart H_Φ. It is also a Banach lattice. [The Fatou property (2.1.11(c)) may fail for H_Φ, however. See the example below (2.1.25(5)).]
The space L_Φ is a rearrangement invariant function space. This means that if f ∈ L_Φ and g is a rearrangement of f, then g ∈ L_Φ and ‖g‖_Φ = ‖f‖_Φ. A good way to make this precise is to consider the distribution functions of f and g:
(2.1.12) Proposition. Let f and g be measurable functions. If
    μ{|f| > t} = μ{|g| > t}    for all t > 0,
then M_Φ(f) = M_Φ(g) and ‖f‖_Φ = ‖g‖_Φ.
Proof. Since Φ is strictly increasing (except possibly where it is 0), we have u_1 < u_2 if and only if Φ(u_1) < Φ(u_2) (provided Φ(u_2) > 0), so we make the substitution t = Φ(u) in the integral:
    M_Φ(f) = ∫Φ(|f|) dμ = ∫₀^∞ μ{Φ(|f|) > t} dt = ∫₀^∞ μ{|f| > u} φ(u) du.
This depends only on the distribution function μ{|f| > t} of f.
There are some desirable properties that the spaces H_Φ have that are not (in general) shared by L_Φ.
(2.1.13) Definition. A Banach lattice E has order continuous norm if any downward directed net (f_t)_{t∈J} in E with greatest lower bound 0 satisfies lim ‖f_t‖ = 0.
(2.1.14) Theorem. Let Φ be a finite Orlicz function. (a) H_Φ has order continuous norm. (b) H_Φ is the closure in L_Φ of the integrable simple functions.
Proof. (a) We first prove that if (f_n)_{n∈ℕ} is a decreasing sequence in H_Φ with greatest lower bound 0, then lim ‖f_n‖_Φ = 0. Now f_n → 0 a.e. For any a > 0, the sequence Φ(f_n/a) also converges to 0 a.e. But f_1 ∈ H_Φ, so ∫Φ(f_1/a) dμ < ∞. Thus, by the dominated convergence theorem, lim ∫Φ(f_n/a) dμ = 0. In particular, for large n we have ∫Φ(f_n/a) dμ ≤ 1, or ‖f_n‖_Φ ≤ a. This is true for all a > 0, so lim ‖f_n‖_Φ = 0.
Notice that a consequence of this property is: if (f_n) is an increasing sequence in H_Φ that is bounded above by an element of H_Φ, then f_n converges in norm. Indeed, let f be the pointwise limit of f_n. It is dominated by an element of H_Φ, so it also belongs to H_Φ. Then the sequence (f − f_n) is decreasing and has greatest lower bound 0. Thus ‖f − f_n‖_Φ → 0.
Now we may deduce the full assertion for nets using the sequential sufficiency theorem (1.1.3). If (f_t) is a net directed downward in H_Φ, then any sequence (f_{t_n}) with t_1 ≤ t_2 ≤ ⋯ is a decreasing sequence, which converges in the norm of H_Φ. Therefore the net (f_t) converges in norm. Since it has greatest lower bound 0, the limit must be 0 a.e.
(b) Since Φ is finite, the integrable simple functions are in H_Φ, so their closure also lies in H_Φ. Let f ∈ H_Φ, f ≥ 0. There is a sequence (f_n) of integrable simple functions that increases to f a.e. (since μ is assumed to be σ-finite). By order continuity, ‖f_n − f‖_Φ → 0. For general f ∈ H_Φ, note that f = f⁺ − f⁻, where f⁺ = f ∨ 0 and f⁻ = (−f) ∨ 0 also belong to H_Φ.
The space L_Φ may fail the two properties given in Theorem (2.1.14) (see 2.1.25). Since many results in probability theory (and even in integration theory) rely on approximation by simple functions, the density of the integrable simple functions is often indispensable. The (Δ2) condition, treated below, insures that L_Φ shares this property; but in the absence of the (Δ2) condition, we will see that the Orlicz heart H_Φ is often the proper space to use, rather than the full Orlicz space L_Φ.
The (Δ2) condition
We have seen that the Orlicz space L_Φ has certain desirable properties that the Orlicz heart H_Φ may lack [the Fatou property, Theorem (2.1.11(c))]. On the other hand, H_Φ has some desirable properties that L_Φ may lack [order continuous norm, density of simple functions; see (2.1.14)]. Therefore, the cases when L_Φ = H_Φ are particularly useful. This is the situation we will consider here.
(2.1.15) Proposition. Let (Ω, F, μ) be a σ-finite measure space, and let Φ be an Orlicz function.
(1) The following are equivalent: (1a) H_Φ(Ω, F, μ) = L_Φ(Ω, F, μ); (1b) Y_Φ(Ω, F, μ) = 2Y_Φ(Ω, F, μ).
(2) The following are equivalent: (2a) if M_Φ(f_n) → 0, then we have also M_Φ(2f_n) → 0; (2b) ‖f − f_n‖_Φ → 0 if and only if M_Φ(f − f_n) → 0 (norm convergence is equivalent to modular convergence).
(3) Suppose μ is infinite or Φ is strictly positive. Then (1) implies (2).
(4) Suppose μ has arbitrarily small sets or Φ is finite. Then (2) implies (1).
Proof. (1) If H_Φ = L_Φ, then Y_Φ ⊆ 2Y_Φ ⊆ L_Φ = H_Φ ⊆ Y_Φ, so Y_Φ = 2Y_Φ. Conversely, suppose Y_Φ = 2Y_Φ. If f ∈ Y_Φ, then 2^n f ∈ Y_Φ for all n, so if a ∈ ℝ, we have |af| = |a| |f| ≤ 2^n |f| for some n, and af ∈ Y_Φ. Thus Y_Φ = H_Φ. If f ∈ L_Φ, then f/a ∈ Y_Φ = H_Φ for some a, so f ∈ H_Φ.
(2) If (2a) holds, then M_Φ(f_n) → 0 implies lim_n M_Φ(2^m f_n) = 0 for any m, and thus lim_n M_Φ(k f_n) = 0 for any k ∈ ℝ. By (2.1.10(5)), this is equivalent to ‖f_n‖_Φ → 0. Conversely, if (2b) holds and M_Φ(f_n) → 0, then ‖f_n‖_Φ → 0, and (2.1.10(5)) gives M_Φ(2f_n) → 0.
(3) Suppose (1) holds. If μ is infinite, then Φ is strictly positive. Indeed, if Φ(u) = 0 for some u > 0, then there is u_0 with Φ(u_0) = 0 but Φ(2u_0) > 0. So the constant function u_0 belongs to Y_Φ, but the constant function 2u_0 does not, contradicting (1b). Thus we may suppose Φ is strictly positive. Assume (for purposes of contradiction) that there exists a sequence f_n ∈ Y_Φ with M_Φ(f_n) → 0 but M_Φ(2f_n) ↛ 0. Now M_Φ(2f_n) = ∞ for some n
would contradict (1b). We may assume ∞ > M_Φ(2f_n) ≥ c > 0. Taking a subsequence, we may assume M_Φ(f_n) ≤ 2^(−n). Consider f = sup_n |f_n|. We have
    ∫Φ(|f|) dμ = ∫Φ(sup_n |f_n|) dμ = ∫ sup_n Φ(|f_n|) dμ    (by left-continuity)
        ≤ Σ_n ∫Φ(|f_n|) dμ    (by Fatou's lemma)
        ≤ Σ_n 2^(−n) < ∞.
Thus f ∈ Y_Φ. Therefore 2f ∈ Y_Φ. Now ∫Φ(|f_n|) dμ ≤ 2^(−n), so Φ(|f_n|) → 0 a.e. Since Φ is strictly positive, this means |f_n| → 0 a.e. Thus Φ(2|f_n|) → 0 a.e. But Φ(2|f_n|) ≤ Φ(2f), which is integrable, so by the dominated convergence theorem, we have M_Φ(2f_n) → 0, a contradiction.
(4) Suppose (2a) holds. If μ has arbitrarily small sets, then Φ is finite. Indeed, if Φ(u) = ∞ for some u, then there is u_0 with Φ(u_0) < ∞ but Φ(2u_0) = ∞. Now if (A_n) is a sequence of sets with 0 < μ(A_n) and μ(A_n) → 0, then f_n = u_0 1_{A_n} satisfies M_Φ(f_n) → 0 but M_Φ(2f_n) ↛ 0, contradicting (2a). So we may suppose Φ is finite.
Suppose Y_Φ ≠ 2Y_Φ. Then there is f ∈ Y_Φ with 2f ∉ Y_Φ. Let (f_n) be integrable simple functions with f_n ↑ |f|. Then g_n = |f| − f_n decreases to 0. Also, Φ(g_n) ≤ Φ(|f|), which is integrable, so by the dominated convergence theorem, M_Φ(g_n) → 0. Now M_Φ(2|f|) = ∞, and, since 2|f| = (1/2)(4f_n) + (1/2)(4g_n),
    M_Φ(2|f|) ≤ (1/2) [ ∫Φ(4f_n) dμ + ∫Φ(4g_n) dμ ].
The first term on the right is finite (since Φ is finite and f_n is an integrable simple function), so M_Φ(4g_n) = ∞. We thus have a sequence (g_n) with M_Φ(g_n) → 0 but M_Φ(4g_n) = ∞, contradicting (2a). [Using functions of the form f_n − f_m, we may obtain a sequence (h_n) with M_Φ(h_n) → 0, M_Φ(2h_n) < ∞, but M_Φ(2h_n) ↛ 0.]
(2.1.16) Definition. The Orlicz function Φ satisfies condition (Δ2) at ∞ if there exist u_0 ∈ (0, ∞) and k ∈ ℝ such that
(2.1.16a)    Φ(2u) ≤ kΦ(u)    for all u ≥ u_0.
The Orlicz function Φ satisfies condition (Δ2) at 0 if there exist u_0 ∈ (0, ∞) and k ∈ ℝ such that
(2.1.16b)    Φ(2u) ≤ kΦ(u)    for 0 < u ≤ u_0.
The inequality in (2.1.16a) requires that Φ(u) < ∞ for all u. The inequality in (2.1.16b) requires that Φ(u) > 0 for all u > 0.
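To get a feel for condition (Δ2), one can tabulate the quotient Φ(2u)/Φ(u): for Φ(u) = u²/2 it is constantly 4, while for the function Φ(u) = e^u − u − 1 of (2.1.25) it grows without bound as u → ∞, so (Δ2) fails at ∞. A numerical sketch (our own, not from the text):

```python
import math

def delta2_ratio(phi, u):
    # The quotient Phi(2u)/Phi(u) appearing in condition (Delta2).
    return phi(2.0 * u) / phi(u)

phi_square = lambda u: u * u / 2.0          # satisfies (Delta2) at 0 and infinity
phi_exp = lambda u: math.exp(u) - u - 1.0   # fails (Delta2) at infinity

ratios_square = [delta2_ratio(phi_square, u) for u in (1.0, 10.0, 100.0)]
ratios_exp = [delta2_ratio(phi_exp, u) for u in (1.0, 10.0, 100.0)]
# ratios_square stays at 4; ratios_exp blows up (roughly like e^u).
```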
(2.1.17) Theorem. Let (Ω, F, μ) be a σ-finite measure space, and let Φ be an Orlicz function.
(1) If Φ satisfies (Δ2) at 0 and ∞, then H_Φ(Ω, F, μ) = L_Φ(Ω, F, μ). If H_Φ([0, ∞)) = L_Φ([0, ∞)), then Φ satisfies (Δ2) at 0 and ∞.
(2) Suppose μ is finite. If Φ satisfies (Δ2) at ∞, then H_Φ(Ω, F, μ) = L_Φ(Ω, F, μ). If H_Φ([0, 1]) = L_Φ([0, 1]), then Φ satisfies (Δ2) at ∞.
(3) Suppose μ has no arbitrarily small sets. If Φ is finite and satisfies (Δ2) at 0, then H_Φ(Ω, F, μ) = L_Φ(Ω, F, μ). If h_Φ = l_Φ, then Φ satisfies (Δ2) at 0.
Proof. (1) Suppose Φ satisfies (Δ2) at 0 and ∞. Then 0 < Φ(u) < ∞ for all u ∈ (0, ∞). The quotient Φ(2u)/Φ(u) is continuous on (0, ∞), so we have
(2.1.17a)    sup_{u∈(0,∞)} Φ(2u)/Φ(u) < ∞.
Thus Φ(2u) ≤ kΦ(u) for all u, so clearly M_Φ(2f) ≤ kM_Φ(f); hence Y_Φ = 2Y_Φ and H_Φ = L_Φ by (2.1.15(1)).
Conversely, suppose H_Φ([0, ∞)) = L_Φ([0, ∞)), but the (Δ2) condition fails (either at 0 or at ∞). If Φ(u) = ∞ for some u, then there is u_0 with Φ(u_0) < ∞ but Φ(2u_0) = ∞, so that f = u_0 1_{[0,1]} shows that Y_Φ ≠ 2Y_Φ. If Φ(u) = 0 for some u > 0, then there is u_0 with Φ(u_0) = 0 but Φ(2u_0) > 0, so that f = u_0 1_{[0,∞)} shows that Y_Φ ≠ 2Y_Φ. Thus we may suppose that Φ is positive and finite. Since the (Δ2) condition fails, there exist u_n ∈ (0, ∞) with Φ(2u_n) > 2^n Φ(u_n). Let (A_n) be disjoint intervals in [0, ∞) with μ(A_n) = 2^(−n)/Φ(u_n). The function
    f = Σ_{n=1}^∞ u_n 1_{A_n}
satisfies
    M_Φ(f) = ∫Φ(|f|) dμ = Σ_n Φ(u_n) · 2^(−n)/Φ(u_n) = Σ_n 2^(−n) < ∞,
but
    M_Φ(2f) = ∫Φ(2|f|) dμ = Σ_n Φ(2u_n) · 2^(−n)/Φ(u_n) > Σ_n 1 = ∞.
Thus Y_Φ ≠ 2Y_Φ.
(2) Now Φ satisfies (Δ2) possibly at ∞ only, say Φ(2u) ≤ kΦ(u) for u ≥ u_0.
As noted after (2.1.16a), Φ(u) < ∞ for all u ∈ (0, ∞). Let f ∈ Y_Φ. Then
    M_Φ(2f) = ∫Φ(2|f|) dμ ≤ ∫_{|f|≥u_0} kΦ(|f|) dμ + ∫_{|f|<u_0} Φ(2u_0) dμ ≤ kM_Φ(f) + Φ(2u_0) μ(Ω) < ∞.
Thus 2f ∈ Y_Φ.
Conversely, suppose H_Φ([0, 1]) = L_Φ([0, 1]). Then Φ is finite, as in the previous case. If the (Δ2) condition fails at ∞, then the u_n may be chosen as in case (1), but with u_n → ∞, so also Φ(u_n) → ∞. We may suppose Φ(u_n) ≥ 1 for all n. Now the intervals A_n, chosen as before, have total length at most Σ_n 2^(−n) = 1, so they may be chosen in [0, 1]. The rest is the same as case (1).
(3) Now Φ satisfies (Δ2) at 0, so
    Φ(2u) ≤ kΦ(u)    for 0 < u ≤ u_0.
As noted after (2.1.16b), Φ(u) > 0 for all u > 0. Now μ has no arbitrarily small sets, say μ(A) ≥ c > 0 whenever μ(A) > 0. Let f ∈ Y_Φ. We have
    ∫Φ(|f|) dμ ≥ Φ(u_0) μ{|f| ≥ u_0},
so μ{|f| ≥ u_0} < ∞. Since μ is bounded below by c > 0, the set A = {|f| ≥ u_0} consists of a finite number of atoms, so Φ(2|f|) is bounded on the set. Finally,
    M_Φ(2f) = ∫Φ(2|f|) dμ ≤ ∫_{|f|≤u_0} kΦ(|f|) dμ + ∫_{|f|>u_0} Φ(2|f|) dμ < ∞.
Thus 2f ∈ Y_Φ.
Suppose now that the (Δ2) condition at 0 is false. Then we may choose u_n → 0 with Φ(2u_n) > 2^n Φ(u_n). We may assume Φ(u_n) ≤ 2^(−n). Now disjoint sets A_n ⊆ ℕ may be taken with μ(A_n) = 2^(k_n), where the integer k_n is chosen so that 2^(−n) ≤ Φ(u_n) 2^(k_n) < 2^(−n+1). Then, for f = Σ_{n=1}^∞ u_n 1_{A_n}, we have
    M_Φ(f) = Σ_n Φ(u_n) μ(A_n) ≤ Σ_n 2^(−n+1) < ∞
[Figure (2.1.19b): graph of φ(u) against u.]
but
    M_Φ(2f) = Σ_{n=1}^∞ Φ(2u_n) μ(A_n) > Σ_n 2^n Φ(u_n) 2^(k_n) ≥ Σ_n 1 = ∞.
Thus l_Φ ≠ h_Φ.
We will normally say simply: condition (Δ2) is satisfied, and understand the particular one of the three possibilities appropriate for the measure space under discussion. In particular, if (Ω, F, μ) is a probability space, then we will mean that (Δ2) is satisfied at ∞.
(2.1.18) Corollary. Let Φ be a strictly positive Orlicz function satisfying the (Δ2) condition. Then: (A) Integrable simple functions are dense in L_Φ. (B) If (f_n) is an increasing sequence in L_Φ and sup_n ‖f_n‖_Φ < ∞, then (f_n) converges in norm. (C) Modular convergence is equivalent to norm convergence in L_Φ; that is, ‖f − f_n‖_Φ → 0 if and only if M_Φ(f − f_n) → 0.
Proof. By the (Δ2) condition, L_Φ = H_Φ. Part (A) is from (2.1.14(b)). Part (C) is from (2.1.15(3)). For (B), let f be the pointwise supremum of the f_n. Then by (2.1.11), we have f ∈ L_Φ. Now (f − f_n) is a decreasing sequence with greatest lower bound 0. By (2.1.14(a)), L_Φ has order continuous norm, so ‖f − f_n‖_Φ → 0.
Complements
(2.1.19) (Inequalities.) Let Φ be an Orlicz function and φ its derivative. Use Figures (2.1.19a)–(2.1.19c) to prove the inequalities:
(a) Φ(u) ≤ uφ(u);
(b) Φ(2u) ≥ Φ(u) + uφ(u);
(c) Φ(n) ≤ Φ(n − 1) + φ(n), if n ≥ 1.
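For a differentiable case such as Φ(u) = e^u − u − 1, with φ(u) = e^u − 1, inequalities (a) and (b) can be spot-checked numerically; both follow from Φ(u) = ∫₀ᵘ φ with φ nondecreasing. A quick sketch (our own, not from the text):

```python
import math

phi_fn = lambda u: math.exp(u) - u - 1.0   # Phi
phi_der = lambda u: math.exp(u) - 1.0      # its derivative phi

us = [0.25, 0.5, 1.0, 3.0, 8.0]
# (a) Phi(u) <= u*phi(u): the integrand phi is at most phi(u) on [0, u].
checks_a = [phi_fn(u) <= u * phi_der(u) for u in us]
# (b) Phi(2u) >= Phi(u) + u*phi(u): the integral of phi over [u, 2u]
# is at least u*phi(u), again since phi is nondecreasing.
checks_b = [phi_fn(2 * u) >= phi_fn(u) + u * phi_der(u) for u in us]
```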
[Figure (2.1.19c): graph of φ between n − 1 and n.]
(2.1.20) (Norm of indicator.) Let A ∈ F, 0 < μ(A) < ∞. Then ‖1_A‖_Φ = 1/a, where
    a = sup { u > 0 : Φ(u) ≤ 1/μ(A) }.
Thus either a satisfies Φ(a) = 1/μ(A) or else Φ jumps to ∞ at a. Corollary: if Φ is finite, then μ(A) → 0 if and only if ‖1_A‖_Φ → 0.
(2.1.21) (Comparison of L_Φ and L_1.) If μ(Ω) < ∞, then there is a constant C > 0 with ‖f‖_1 ≤ C‖f‖_Φ. The constant C depends only on the function Φ and the value μ(Ω). To see this, choose a > 0 with Φ(a) > 0, then choose b with 0 < b < Φ(a). Let C = (a/b) + aμ(Ω). By convexity of Φ, we have Φ(u) ≥ (b/a)u for u ≥ a. Now if f ∈ L_Φ, let r = ‖f‖_Φ and compute
    (1/r) ∫|f| dμ = ∫_{|f|>ar} (|f|/r) dμ + ∫_{|f|≤ar} (|f|/r) dμ ≤ (a/b) ∫Φ(|f|/r) dμ + aμ(Ω) ≤ (a/b) + aμ(Ω) = C.
That is, ‖f‖_1 ≤ C‖f‖_Φ.
(2.1.22) Let a and b be as in (2.1.21). If (g_n) is a sequence with 1_{{g_n ≠ 0}} ≤ |g_n| (that is, |g_n| ≥ 1 wherever g_n ≠ 0) and ‖g_n‖_Φ → 0, then once ‖g_n‖_Φ < 1/a we have ‖g_n‖_1 ≤ (2a/b) ‖g_n‖_Φ, so certainly ‖g_n‖_1 → 0, even if μ(Ω) = ∞. To see this, let A = {g_n ≠ 0}. Then 1_A ≤ |g_n|, so ‖1_A‖_Φ ≤ ‖g_n‖_Φ < 1/a. Thus 1 ≥ ∫Φ(a 1_A) dμ = μ(A) Φ(a) ≥ μ(A) b, so μ(A) ≤ 1/b. Now for C = (a/b) + aμ(A) ≤ 2a/b, we may apply the previous result (2.1.21) with A in place of Ω: ‖g_n‖_1 ≤ C‖g_n‖_Φ ≤ (2a/b) ‖g_n‖_Φ.
(2.1.23) Suppose Φ is finite and |g_n| ≤ 1 for all n. If ‖g_n‖_1 → 0, then ‖g_n‖_Φ → 0. (Prove as in (2.1.20).)
(2.1.24) Property (B) of (2.1.18) is equivalent to a Banach space condition known as weak sequential completeness (see Lindenstrauss & Tzafriri [1979], p. 34).
(2.1.25) (Failure of the (Δ2) condition.) The Orlicz function
    Φ(u) = e^u − u − 1
is finite and satisfies the (Δ2) condition at 0, but fails the (Δ2) condition at ∞. It can be used to provide explicit examples that illustrate the problems that arise when that happens. Let (Ω, F, μ) be [0, 1] with Lebesgue measure.
(1) L_Φ ≠ H_Φ and Y_Φ ≠ 2Y_Φ: For the function f(t) = log(1/t), we have f ∈ L_Φ and f ∈ 2Y_Φ since M_Φ(f/2) < ∞, but f ∉ H_Φ and f ∉ Y_Φ since M_Φ(f) = ∞.
(2) L_Φ does not have order continuous norm: The sequence
    f_n(t) = log(1/t) 1_{(0,1/n)}(t)
decreases, and has greatest lower bound 0, but does not converge to 0 in norm, since ‖f_n‖_Φ ≥ 1 for all n.
(3) Modular convergence is not equivalent to norm convergence in L_Φ: The functions
    f_n(t) = (1/2) log(1/t) 1_{(0,1/n)}(t)
satisfy M_Φ(f_n) → 0, but ‖f_n‖_Φ ≥ 1/2.
(4) Weak sequential completeness fails in L_Φ: The sequence
    f_n(t) = (1/2) log(1/t) 1_{(1/n,1)}(t)
is increasing and bounded in norm, but not norm convergent.
(5) H_Φ fails the Fatou property: The previous sequence (f_n) belongs to H_Φ, is increasing, bounded in norm, but its pointwise limit is not in H_Φ.
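In example (3), the modular can be evaluated in closed form: with Φ(u) = e^u − u − 1 and f_n(t) = ½ log(1/t) on (0, 1/n), we get Φ(f_n(t)) = t^(−1/2) − ½ log(1/t) − 1, and ∫₀^s t^(−1/2) dt = 2√s, ∫₀^s log(1/t) dt = s(1 + log(1/s)). The sketch below (our computation, not from the text) confirms M_Φ(f_n) → 0:

```python
import math

def modular_fn(n):
    # M_Phi(f_n) for Phi(u) = e^u - u - 1 and f_n = (1/2) log(1/t) on (0, 1/n):
    # integrate t**(-1/2) - (1/2) log(1/t) - 1 over (0, s), s = 1/n.
    s = 1.0 / n
    return 2.0 * math.sqrt(s) - 0.5 * s * (1.0 + math.log(1.0 / s)) - s

vals = [modular_fn(n) for n in (1, 10, 100, 10000)]
# vals decreases toward 0, yet M_Phi(2 f_n) involves the divergent
# integral of 1/t at 0, so it is infinite and the norms stay >= 1/2.
```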
(2.1.26) (Separability.) If Φ is finite, then the integrable simple functions are dense in H_Φ. So H_Φ([0, 1]) is separable. In fact, if Φ(u)/u → ∞ as u → ∞, then the Haar functions constitute a basis for H_Φ([0, 1]) (Krasnosel'skii & Rutickii [1961], p. 106).
In (2.1.17) we saw that if the (Δ2) condition fails at ∞, then H_Φ([0, 1]) ≠ L_Φ([0, 1]). In fact, in this case more is true:
(2.1.27) Theorem. If (Δ2) fails at ∞, then L_Φ([0, 1]) is nonseparable.
Proof. We will prove that L_Φ is nonseparable by constructing an uncountable family of functions in L_Φ, each at distance at least 1 from the others. Since (Δ2) fails, there exist u_n increasing to ∞ with Φ(2u_n) > 2^n Φ(u_n). We may assume Φ(u_1) ≥ 1. Choose intervals A_n in [0, 1] with μ(A_n) = 2^(−n)/Φ(u_n). This may be done so that the right endpoint of A_n is the left endpoint of A_{n+1} for each n, and the endpoints cluster at 1. Define a function f by
    f(x) = Σ_{n=1}^∞ 3u_n 1_{A_n}(x).
Now f ∈ L_Φ, since f/3 ∈ Y_Φ. For each a with 0 < a < 1, define
    f_a(x) = f(x + 1 − a) for 0 ≤ x < a,    f_a(x) = f(x − a) for a ≤ x < 1.
Then for a ≠ β we have ∫Φ(|f_a − f_β|) dμ = ∞, so ‖f_a − f_β‖_Φ ≥ 1.
(2.1.28) (Equivalents to (Δ2).) Let (Ω, F, μ) be a nonatomic σ-finite measure space, let Φ be an Orlicz function, and let L = L_Φ(Ω, F, μ) with norm denoted ‖·‖. The following conditions are equivalent.
(i) The function Φ satisfies (Δ2) (both at ∞ and at 0).
(ii) (C1) If f ∈ L⁺, g ∈ L⁺, ‖f‖ > 0, then ‖f + g‖ > ‖g‖.
(iii) (C) For each f′ ∈ L⁺ and each number a > 0, there is a number β = β(f′, a) > 0 such that if f ∈ L⁺, f ≤ f′, ‖f‖ ≥ a, g ∈ L⁺, ‖g‖ ≤ 1, then ‖f + g‖ ≥ ‖g‖ + β.
(iv) G. Birkhoff's uniform monotonicity condition (UMB): Given ε > 0, there is δ > 0 such that if f ∈ L⁺, g ∈ L⁺, ‖g‖ = 1, and ‖f + g‖ ≤ 1 + δ, then ‖f‖ ≤ ε (Akcoglu & Sucheston [1985a]).

Remarks
Most of this material on Orlicz spaces was adapted from the following sources: Zaanen [1983], Chapter 19, for the basic material (the most complete account we found); Lindenstrauss & Tzafriri [1977], Chapter 4, for the sequence spaces l_Φ and h_Φ; Lindenstrauss & Tzafriri [1979], Chapter 2, for the function spaces on [0, 1] and [0, ∞); Akcoglu & Sucheston [1985a] for much of the discussion of H_Φ. Krasnosel'skii & Rutickii [1961] also discusses the (Δ2) condition and the spaces H_Φ (called there E_M). There is some overlap in these references, and all of them have additional material not mentioned here.
The Young classes Y_Φ were introduced by W. H. Young [1912]. In general they are not linear spaces, and even when they are, Young did not norm them. W. Orlicz [1932], [1936] introduced the Banach spaces L_Φ, defining the norm in terms of the conjugate. The definition of the norm given above, which uses only Φ, is due to W. A. J. Luxemburg [1955]. The (Δ2) condition appears already in Orlicz [1932]. The second paper, Orlicz [1936], showed how to avoid requiring the (Δ2) condition. A. C. Zaanen [1949] extended the definition of the Orlicz function to allow the value ∞, so that L_∞ became an Orlicz space. Spaces H_Φ originate in Morse & Transue [1950].
2.2. More on Orlicz spaces
We consider now the classical examples. Let p be given, 1 < p < ∞. Then Φ_p(u) = u^p/p is an Orlicz function with conjugate Φ_q, where
    1/p + 1/q = 1.
Clearly, the Young class Y_{Φ_p} is the usual space L_p. The Orlicz space L_{Φ_p} and its heart H_{Φ_p} are both equal to L_p, but the Luxemburg norm is ‖f‖_{Φ_p} = p^(−1/p) ‖f‖_p.
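Concretely, for Φ_p the equation M_{Φ_p}(f/a) = 1 can be solved in closed form, which gives a quick numerical sanity check of how the Luxemburg norm relates to ‖·‖_p (the helper name below is ours):

```python
def lux_norm_phi_p(f, mu, p):
    # For Phi_p(u) = u**p / p on a discrete measure space,
    # M_Phi(f/a) = sum(w * (|v|/a)**p) / p = 1 solves to
    # a = p**(-1/p) * ||f||_p.
    lp = sum(w * abs(v) ** p for v, w in zip(f, mu)) ** (1.0 / p)
    return p ** (-1.0 / p) * lp

f, mu = [3.0, 4.0], [1.0, 1.0]
a = lux_norm_phi_p(f, mu, 2.0)
# Check the defining equation M_Phi(f/a) = 1:
residual = sum(w * (abs(v) / a) ** 2 / 2.0 for v, w in zip(f, mu)) - 1.0
```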
We may also include
    Φ_1(u) = u
and its conjugate
    Φ_∞(u) = 0 for 0 ≤ u ≤ 1,    Φ_∞(u) = ∞ for u > 1.
Then L_{Φ_1} = H_{Φ_1} = L_1 and ‖f‖_{Φ_1} = ‖f‖_1. Also, L_{Φ_∞} = L_∞ and ‖f‖_{Φ_∞} = ‖f‖_∞. This is a good example where the derivative φ_1 = Φ_1′ is bounded, so the conjugate is infinite. Note that Φ_∞ fails the (Δ2) condition both at 0 and at ∞. This example also illustrates a case where H_{Φ_∞} = {0}, which is not equal to the closure of the integrable simple functions.

Comparison of Orlicz spaces
It is possible for different Orlicz functions to yield the same set of functions for the Orlicz space, with equivalent norms. A related possibility is that one Orlicz space is a subset of another. These possibilities are related to the ways that Orlicz functions can be compared.
(2.2.1) Proposition. Let (Ω, F, μ) be a σ-finite measure space, and let Φ_1, Φ_2 be Orlicz functions. The following are equivalent.
(1) L_{Φ_1}(Ω, F, μ) ⊆ L_{Φ_2}(Ω, F, μ) (a subset, possibly not closed).
(2) There is a constant k such that k‖f‖_{Φ_1} ≥ ‖f‖_{Φ_2} for all measurable functions f.
Proof. If k‖f‖_{Φ_1} ≥ ‖f‖_{Φ_2}, then clearly L_{Φ_1}(Ω, F, μ) ⊆ L_{Φ_2}(Ω, F, μ). Conversely, if L_{Φ_1}(Ω, F, μ) ⊆ L_{Φ_2}(Ω, F, μ), then the linear transformation T : L_{Φ_1} → L_{Φ_2} defined by T(f) = f has closed graph. [Indeed, if ‖f_n − f‖_{Φ_1} → 0, then by (2.1.10(6)) there is a subsequence with f_{n_k} → f a.e. If also ‖f_n − h‖_{Φ_2} → 0, then there is a further subsequence
with f_{n_{k_j}} → h a.e. Thus f = h.] By the closed graph theorem (Rudin [1973], Theorem 2.15), T is bounded, and its norm ‖T‖ will do for k.
(2.2.2) Definition. Let Φ_1 and Φ_2 be Orlicz functions. Then Φ_1 dominates Φ_2 at ∞ if there exist a, b, u_0 > 0 such that
    bΦ_1(au) ≥ Φ_2(u)    for all u ≥ u_0.
We write Φ_1 ≻_∞ Φ_2. Similarly, Φ_1 dominates Φ_2 at 0 if there exist a, b, u_0 > 0 such that
    bΦ_1(au) ≥ Φ_2(u)    for all u ≤ u_0.
We write Φ_1 ≻_0 Φ_2.
Of course, if the appropriate (Δ2) condition is satisfied, we may equivalently take a = 1 in these definitions. (But in general we may not take a = 1; see (2.2.20).) The comparison of Orlicz spaces can be treated in much the same way as the (Δ2) condition was treated above.
(2.2.3) Theorem. Let (Ω, F, μ) be a σ-finite measure space, and let Φ_1, Φ_2 be Orlicz functions.
(1) If Φ_1 ≻_∞ Φ_2 and Φ_1 ≻_0 Φ_2, then L_{Φ_1}(Ω, F, μ) ⊆ L_{Φ_2}(Ω, F, μ). If L_{Φ_1}([0, ∞)) ⊆ L_{Φ_2}([0, ∞)), then Φ_1 ≻_∞ Φ_2 and Φ_1 ≻_0 Φ_2.
(2) Suppose μ is finite. If Φ_1 ≻_∞ Φ_2, then L_{Φ_1}(Ω, F, μ) ⊆ L_{Φ_2}(Ω, F, μ). If L_{Φ_1}([0, 1]) ⊆ L_{Φ_2}([0, 1]), then Φ_1 ≻_∞ Φ_2.
(3) Suppose μ has no arbitrarily small sets and Φ_2 is finite. If Φ_1 ≻_0 Φ_2, then L_{Φ_1}(Ω, F, μ) ⊆ L_{Φ_2}(Ω, F, μ). If l_{Φ_1} ⊆ l_{Φ_2}, then Φ_1 ≻_0 Φ_2.
Proof. (1) Suppose Φ_1 ≻_0 Φ_2 and Φ_1 ≻_∞ Φ_2. Thus
    b′Φ_1(a′u) ≥ Φ_2(u)    for u ≥ u′,
    b″Φ_1(a″u) ≥ Φ_2(u)    for u ≤ u″.
We may assume Φ_2(u′) < ∞. If not, choose ū ≤ u′ with Φ_2(ū) < ∞; then for u ≥ ū,
    Φ_2(u) ≤ Φ_2(u′u/ū) ≤ b′Φ_1(a′u′u/ū),
so we may replace u′ by ū and a′ by a′u′/ū. Similarly, we may assume Φ_1(a″u″) > 0. If we take a = max{a′, a″},
    b‴ = max { Φ_2(u)/Φ_1(au) : u″ ≤ u ≤ u′ },
and b = max{b′, b″, b‴, 1}, then we have
    bΦ_1(au) ≥ Φ_2(u)    for all u.
Let f ∈ L_{Φ_1}. Then M_{Φ_1}(f/k) ≤ 1 for some k. Now
    M_{Φ_2}(f/(ak)) = ∫Φ_2(|f|/(ak)) dμ ≤ ∫ bΦ_1(|f|/k) dμ ≤ b.
Since b ≥ 1, by convexity of M_{Φ_2} we have M_{Φ_2}(f/(abk)) ≤ 1. Thus f ∈ L_{Φ_2}.
Conversely, suppose L_{Φ_1}([0, ∞)) ⊆ L_{Φ_2}([0, ∞)). If Φ_1 ≻ Φ_2 fails, either at 0 or at ∞, then there exist u_n ∈ (0, ∞) with 0 < 2^n Φ_1(2^n u_n) < Φ_2(u_n). The strict inequalities show 0 < Φ_1(2^n u_n) < ∞. Let A_n be disjoint intervals in [0, ∞) with μ(A_n) = 2^(−n)/Φ_1(2^n u_n). Let
    f = Σ_{n=1}^∞ 2^n u_n 1_{A_n}.
Now
    M_{Φ_1}(f) = Σ_n Φ_1(2^n u_n) · 2^(−n)/Φ_1(2^n u_n) = Σ_n 2^(−n) < ∞.
Thus f ∈ L_{Φ_1}. If a > 0, then 2^n u_n/a > u_n for n larger than some n_0, so
    M_{Φ_2}(f/a) ≥ Σ_{n=n_0}^∞ Φ_2(2^n u_n/a) μ(A_n) ≥ Σ_{n=n_0}^∞ Φ_2(u_n) · 2^(−n)/Φ_1(2^n u_n) ≥ Σ_{n=n_0}^∞ 1 = ∞.
Thus f ∉ L_{Φ_2}.
(2) Now Φ_1 ≻_∞ Φ_2, say bΦ_1(au) ≥ Φ_2(u) for u ≥ u_0; as in case (1), we may assume Φ_2(u_0) < ∞. Let f ∈ L_{Φ_1}, say M_{Φ_1}(f/k) ≤ 1. Then
    M_{Φ_2}(f/(ka)) = ∫Φ_2(|f|/(ka)) dμ ≤ ∫_{|f|<kau_0} Φ_2(u_0) dμ + ∫_{|f|≥kau_0} bΦ_1(|f|/k) dμ ≤ Φ_2(u_0) μ(Ω) + bM_{Φ_1}(f/k) < ∞.
Thus f ∈ L_{Φ_2}.
For the converse, if Φ_1 ≻_∞ Φ_2 fails, then the points u_n as in case (1) may be chosen with u_n → ∞, so also Φ_1(2^n u_n) → ∞, and we may assume Φ_1(2^n u_n) ≥ 1. Then the intervals A_n, chosen as before, have total length at most 1, so they may be chosen in [0, 1]. The rest is the same as case (1).
(3) Now Φ_1 ≻_0 Φ_2, say bΦ_1(au) ≥ Φ_2(u) for u ≤ u_0. As in case (1), we may assume Φ_1(au_0) > 0. If f ∈ L_{Φ_1}, then M_{Φ_1}(f/k) < ∞ for some k. Now
    ∫Φ_1(|f|/k) dμ ≥ Φ_1(au_0) μ{|f| ≥ aku_0},
so μ{|f| ≥ aku_0} < ∞. Since μ has no arbitrarily small sets, we see that the set {|f| ≥ aku_0} consists of a finite number of atoms, so Φ_2(|f|/(ka)) is bounded on the set. Now
    M_{Φ_2}(f/(ka)) = ∫Φ_2(|f|/(ka)) dμ ≤ ∫_{|f|<aku_0} bΦ_1(|f|/k) dμ + ∫_{|f|≥aku_0} Φ_2(|f|/(ka)) dμ < ∞.
Thus f ∈ L_{Φ_2}.
For the converse, if Φ_1 ≻_0 Φ_2 fails, then the points u_n as in case (1) may be chosen with u_n → 0. We may suppose Φ_2(u_n) ≤ 1, and hence Φ_1(2^n u_n) < 2^(−n). Then disjoint sets A_n ⊆ ℕ should be chosen, with μ(A_n) = 2^(k_n), where the integer k_n satisfies 2^(−n) ≤ Φ_1(2^n u_n) 2^(k_n) < 2^(−n+1). The function f defined in case (1) then satisfies f ∈ l_{Φ_1} but f ∉ l_{Φ_2}.
We will normally write simply Φ_1 ≻ Φ_2 for the condition of domination appropriate for the measure space being considered. If both Φ_1 ≻ Φ_2 and Φ_2 ≻ Φ_1, then we will say that Φ_1 and Φ_2 are equivalent, and write
    Φ_1 ≈ Φ_2.
Largest and smallest Orlicz functions
Consider the following conjugate pair of Orlicz functions (see Figures (2.2.4a) and (2.2.4b)):
    Φ_min(u) = 0 for 0 ≤ u ≤ 1,    Φ_min(u) = u − 1 for u > 1;
    Φ_max(u) = u for 0 ≤ u ≤ 1,    Φ_max(u) = ∞ for u > 1.
[Figure (2.2.4a): the smallest and largest Orlicz functions.]
[Figure (2.2.4b): the Young partition for Φ_min and Φ_max.]
(2.2.4) Proposition. If Φ is any Orlicz function, then Φ_max ≻ Φ ≻ Φ_min (at both 0 and ∞). If f is a measurable function, then f ∈ L_{Φ_max} if and only if f ∈ L_1 and f ∈ L_∞, and then ‖f‖_{Φ_max} = ‖f‖_1 ∨ ‖f‖_∞. Also, f ∈ L_{Φ_min} if and only if f can be written as a sum f = f_1 + f_∞ with f_1 ∈ L_1 and f_∞ ∈ L_∞, and then ‖f‖_{Φ_min} is the least value of ‖f_1‖_1 ∨ ‖f_∞‖_∞ over all such decompositions.
Proof. Let Φ be an Orlicz function. Then there is u_0 > 0 with Φ(u_0) > 0. If Φ(u_0) = ∞, let b = 1, and if Φ(u_0) < ∞, let b = 1/Φ(u_0). Let a = u_0. Then, if 0 ≤ u ≤ 1, we have Φ_min(u) = 0 ≤ bΦ(au). And if 1 < u, then by convexity of Φ, we have bΦ(au) ≥ buΦ(a) = u ≥ Φ_min(u). Thus Φ ≻ Φ_min.
Next, there is u_1 > 0 with Φ(u_1) < ∞. Let b = max{Φ(u_1), 1} and a = 1/u_1. If u > u_1, we have bΦ_max(au) = ∞ ≥ Φ(u), and if 0 ≤ u ≤ u_1, then by convexity of Φ, we have Φ(u) ≤ (u/u_1)Φ(u_1) ≤ bΦ_max(au). Thus Φ_max ≻ Φ.
Suppose f ∈ L_1 ∩ L_∞. Write a = ‖f‖_1 ∨ ‖f‖_∞. Then
    M_{Φ_max}(f/a) = ∫Φ_max(|f|/a) dμ = ∫ (|f|/a) dμ ≤ 1,
since |f|/a ≤ 1 a.e. So f ∈ L_{Φ_max} and ‖f‖_{Φ_max} ≤ a = ‖f‖_1 ∨ ‖f‖_∞.
Conversely, let f ∈ L_{Φ_max}. Write a = ‖f‖_{Φ_max}. Then
    ∫|f| dμ = a ∫ (|f|/a) dμ ≤ a ∫Φ_max(|f|/a) dμ ≤ a.
Hence f ∈ L_1 and ‖f‖_1 ≤ ‖f‖_{Φ_max}. Also, if μ{|f| > a} > 0, then we would have 1 ≥ ∫Φ_max(|f|/a) dμ ≥ ∫_{|f|>a} ∞ dμ = ∞, which is false. So f ∈ L_∞ and ‖f‖_∞ ≤ ‖f‖_{Φ_max}.
Now suppose f ∈ L_{Φ_min}. Write a = ‖f‖_{Φ_min}. Define
    f_∞ = f if |f| ≤ a,    f_∞ = a sgn(f) if |f| > a;
    f_1 = f − f_∞ = 0 if |f| ≤ a,    f_1 = (|f| − a) sgn(f) if |f| > a.
Then ‖f_∞‖_∞ ≤ a, and
    ‖f_1‖_1 = ∫_{|f|>a} (|f| − a) dμ = a ∫_{|f|/a>1} (|f|/a − 1) dμ = a ∫Φ_min(|f|/a) dμ ≤ a.
Thus ‖f_1‖_1 ∨ ‖f_∞‖_∞ ≤ ‖f‖_{Φ_min}.
Conversely, suppose f can be written f = f_1 + f_∞ with f_1 ∈ L_1 and f_∞ ∈ L_∞. Write a = ‖f_1‖_1 ∨ ‖f_∞‖_∞. Then |f_∞|/a ≤ 1 a.e., so
    ∫Φ_min(|f|/a) dμ ≤ ∫Φ_min(|f_1|/a + |f_∞|/a) dμ ≤ ∫Φ_min(|f_1|/a + 1) dμ = ∫ (|f_1|/a) dμ = ‖f_1‖_1/a ≤ 1.
Thus f ∈ L_{Φ_min} and ‖f‖_{Φ_min} ≤ ‖f_1‖_1 ∨ ‖f_∞‖_∞.
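The decomposition used in the proof is simply truncation at level a: f_∞ keeps the part of f of size at most a, and f_1 carries the overshoot. A sketch on a discrete space (the names are ours):

```python
def split_l1_linf(f, a):
    # Truncate f at level a: f_inf is the clipped part (so sup|f_inf| <= a),
    # and f_1 = f - f_inf is the overshoot (|f| - a)^+ with the sign of f.
    f_inf = [max(-a, min(a, v)) for v in f]
    f_1 = [v - w for v, w in zip(f, f_inf)]
    return f_1, f_inf

f = [0.5, -2.0, 3.0]
f1, finf = split_l1_linf(f, 1.0)
# f1 == [0.0, -1.0, 2.0] and finf == [0.5, -1.0, 1.0]; f = f1 + finf pointwise.
```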
The space L_{Φ_max} is the smallest Orlicz space. We will sometimes write L_min = L_{Φ_max}. As suggested by the above description, it is called L_1 ∩ L_∞. If μ is finite, then L_{Φ_max}(Ω, F, μ) is just L_∞(Ω, F, μ) with an equivalent norm. Similarly, if μ has no arbitrarily small sets, then L_{Φ_max}(Ω, F, μ) is L_1(Ω, F, μ) with an equivalent norm.
The space L_{Φ_min} is the largest Orlicz space. We will sometimes write L_max = L_{Φ_min}. It is often called L_1 + L_∞. The Luxemburg norm (described in the proposition) differs from the usual norm
    ‖f‖_{L_1+L_∞} = inf { ‖f_1‖_1 + ‖f_∞‖_∞ : f = f_1 + f_∞ },
but
    ‖f‖_{Φ_min} ≤ ‖f‖_{L_1+L_∞} ≤ 2 ‖f‖_{Φ_min}.
If μ is finite, then L_{Φ_min}(Ω, F, μ) = L_1(Ω, F, μ) with an equivalent norm. If μ has no arbitrarily small sets, then L_{Φ_min}(Ω, F, μ) = L_∞(Ω, F, μ) with an equivalent norm.
We will see later that the largest Orlicz heart H_{Φ_min}(Ω, F, μ) is useful in probability and ergodic theory. Now Φ_min satisfies the (Δ2) condition at ∞, so if μ is finite, then we have H_{Φ_min} = L_{Φ_min} = L_1. But Φ_min fails the (Δ2) condition at 0, so L_1 + L_∞ = L_{Φ_min}([0, ∞)) ⊋ H_{Φ_min}([0, ∞)). This is the space R_0 of Fava, discussed below. If Ω = ℕ and μ is counting measure, then l_{Φ_min} = l_∞ and h_{Φ_min} = c_0.
We next briefly consider the spaces L_1 and L_∞.
(2.2.5) Proposition. Let Φ be an Orlicz function.
(1) If Φ(u) = ∞ for some u, then L_Φ(Ω, F, μ) ⊆ L_∞(Ω, F, μ) and H_Φ(Ω, F, μ) = {0}. If H_Φ([0, 1]) ⊆ L_∞([0, 1]), then Φ(u) = ∞ for some u.
(2) If Φ(u) = 0 for some u > 0, then L_Φ(Ω, F, μ) ⊇ L_∞(Ω, F, μ). If l_Φ ⊇ c_0, then Φ(u) = 0 for some u > 0.
(3) L_Φ([0, ∞)) = L_∞([0, ∞)) if and only if Φ(u) = ∞ for some u and Φ(u) = 0 for some u > 0.
Proof. (1) Suppose Φ(u_0) = ∞. If f ∈ L_Φ, then M_Φ(f/a) < ∞ for some a > 0. Then, however, |f|/a ≤ u_0 a.e., so f ∈ L_∞. If f ∈ H_Φ, the same holds for every a > 0, so |f| ≤ au_0 a.e. for all a > 0, and f = 0 a.e.; thus H_Φ = {0}.
Conversely, suppose Φ is finite. Choose positive numbers c_k so that c_k Φ(k²) ≤ 2^(−k) and c_k ≤ 2^(−k) for k = 1, 2, .... Let A_k be disjoint intervals in [0, 1] with μ(A_k) = c_k. Let f = Σ_{k=1}^∞ k 1_{A_k}. Then f is not in L_∞. If a > 0, then
    M_Φ(af) = Σ_{k<a} Φ(ak) c_k + Σ_{k≥a} Φ(ak) c_k ≤ Σ_{k<a} Φ(ak) c_k + Σ_{k≥a} Φ(k²) c_k < ∞,
so f ∈ H_Φ.
(2) Suppose Φ(u_0) = 0 with u_0 > 0. Then if f ∈ L_∞, take a = ‖f‖_∞ and we have M_Φ(u_0 f/a) = 0, so f ∈ L_Φ.
Conversely, suppose Φ is positive. Choose disjoint finite sets A_k ⊆ ℕ with μ(A_k) ≥ 1/Φ(1/k²). Then
    f = Σ_{k=1}^∞ (1/k) 1_{A_k}
is in c_0, but for any a > 0 we have
    M_Φ(f/a) ≥ Σ_{k≥a} Φ(1/(ka)) μ(A_k) ≥ Σ_{k≥a} Φ(1/k²)/Φ(1/k²) = Σ_{k≥a} 1 = ∞.
Thus f ∉ l_Φ.
(3) Combine the arguments of (1) and (2).
(2.2.6) Proposition. Let Φ be an Orlicz function, and let φ be its derivative.
(1) If φ is bounded above, then H_Φ(Ω, F, μ) ⊇ L_1(Ω, F, μ). If L_Φ([0, 1]) ⊇ L_1([0, 1]), then φ is bounded above.
(2) If φ is bounded away from 0, then L_Φ(Ω, F, μ) ⊆ L_1(Ω, F, μ). If h_Φ ⊆ l_1, then φ is bounded away from 0.
(3) L_Φ([0, ∞)) = L_1([0, ∞)) if and only if φ is bounded away from 0 and ∞.
Proof. (1) Suppose φ(x) ≤ C for all x. Then Φ(u) ≤ Cu for all u. Let f ∈ L_1. For a > 0 we have
    M_Φ(af) = ∫Φ(a|f|) dμ ≤ aC ∫|f| dμ < ∞.
Thus f ∈ H_Φ.
Conversely, suppose φ is unbounded, so that Φ(u)/u → ∞ as u → ∞. Let a_n be such that Φ(a_n) ≥ 2^n a_n and a_n ≥ 1. Choose disjoint intervals A_n in [0, 1] with μ(A_n) = 2^(−n)/a_n. Then for f = Σ_n n a_n 1_{A_n} we have
    ∫|f| dμ = Σ_n n a_n μ(A_n) = Σ_n n 2^(−n) < ∞,
but for any k > 0,
    ∫Φ(|f|/k) dμ ≥ Σ_{n≥k} Φ(n a_n/k) μ(A_n) ≥ Σ_{n≥k} Φ(a_n) · 2^(−n)/a_n ≥ Σ_{n≥k} 1 = ∞.
Thus L_1([0, 1]) ⊄ L_Φ([0, 1]).
(2) Suppose φ(x) ≥ r > 0 for all x. Then Φ(u) ≥ ru for all u. Let f ∈ L_Φ. Then there is a > 0 with M_Φ(f/a) < ∞. Then
    ∫|f| dμ = a ∫ (|f|/a) dμ ≤ (a/r) ∫Φ(|f|/a) dμ < ∞.
Thus f ∈ L_1.
Conversely, suppose φ is not bounded away from 0. For u > 0, we have Φ(u)/u ≤ φ(u), so Φ(u)/u → 0 as u → 0. Let a_n be such that Φ(a_n)/a_n ≤ 2^(−n)/n and a_n ≤ 1. Choose disjoint sets A_n ⊆ ℕ with
    n/a_n ≤ μ(A_n) ≤ 2n/a_n.
Let f = Σ_n (a_n/n) 1_{A_n}. Then
    ∫|f| dμ = Σ_n (a_n/n) μ(A_n) ≥ Σ_n 1 = ∞.
For any c > 0,
    ∫Φ(c|f|) dμ = Σ_n Φ(c a_n/n) μ(A_n) ≤ Σ_{n<c} Φ(c a_n/n) μ(A_n) + Σ_{n≥c} Φ(a_n) μ(A_n) ≤ Σ_{n<c} Φ(c a_n/n) μ(A_n) + Σ_{n≥c} 2^(−n+1) < ∞.
Thus f ∈ h_Φ but f ∉ l_1.
(3) For this part, combine the arguments of (1) and (2).

Duality for Orlicz spaces
If E is a Banach space, then the dual of E (or conjugate of E) is the set E* of all bounded linear functionals x* : E → ℝ. The norm
    ‖x*‖ = sup { |x*(x)| : x ∈ E, ‖x‖ ≤ 1 }
makes E* a Banach space. We discuss here briefly a few duality theorems for Orlicz spaces. The key is Young's inequality (2.1.4). The following will sometimes also be referred to as Young's inequality.
(2.2.7) Proposition. Let Φ and Ψ be conjugate Orlicz functions. If f ∈ L_Φ and g ∈ L_Ψ, then the product fg is integrable, and

    ∫ |fg| dμ ≤ 2 ‖f‖_Φ ‖g‖_Ψ.
Proof. First, by Young's inequality,

    ∫ |fg| dμ ≤ M_Φ(f) + M_Ψ(g).

Now if a = ‖f‖_Φ and b = ‖g‖_Ψ, then, applying Young's inequality to f/a and g/b, we obtain

    ∫ |fg| dμ ≤ ab (M_Φ(f/a) + M_Ψ(g/b)) ≤ 2ab.
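The inequality in (2.2.7) can be checked numerically on a finite measure space. The sketch below is ours, not from the text: the helper names `modular` and `luxemburg` are hypothetical, the conjugate pair Φ(u) = u²/2 and Ψ(v) = v²/2 is a convenient special case, and the Luxemburg norms are found by bisection.

```python
import numpy as np

# Conjugate pair for the demo (an assumption): Phi(u) = u^2/2, Psi(v) = v^2/2.
Phi = lambda u: u**2 / 2
Psi = lambda v: v**2 / 2

mu = np.array([0.2, 0.3, 0.5])   # weights of a 3-point measure space
f  = np.array([1.0, -2.0, 0.5])
g  = np.array([0.3, 1.5, -1.0])

def modular(F, h, a):
    # M_F(h/a) = integral of F(|h|/a) with respect to mu
    return float(np.sum(F(np.abs(h) / a) * mu))

def luxemburg(F, h, lo=1e-9, hi=1e6, iters=200):
    # Luxemburg norm inf { a > 0 : M_F(h/a) <= 1 }, by bisection
    for _ in range(iters):
        mid = (lo + hi) / 2
        if modular(F, h, mid) <= 1:
            hi = mid
        else:
            lo = mid
    return hi

lhs = float(np.sum(np.abs(f * g) * mu))          # integral of |fg|
rhs = 2 * luxemburg(Phi, f) * luxemburg(Psi, g)  # 2 * ||f||_Phi * ||g||_Psi
```

For continuous Φ the modular equals 1 exactly at the Luxemburg norm, which the bisection recovers to machine precision.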
An important consequence of the preceding result is this: if g ∈ L_Ψ, then a bounded linear functional θ_g on L_Φ may be defined by θ_g(f) = ∫ fg dμ. The Orlicz norm of g is defined as the norm of this functional, and written ‖g‖'_Ψ:

    ‖g‖'_Ψ = sup { ∫ fg dμ : f ∈ L_Φ, ‖f‖_Φ ≤ 1 } = sup { ∫ fg dμ : M_Φ(f) ≤ 1 }.

The last expression is the norm originally used by Orlicz for the space L_Ψ. We will show below that this norm is equivalent to the one we are using, called the Luxemburg norm. Here are some simple variants of the Orlicz definition.
(2.2.8) Proposition. Let g be a measurable function.

(1) ‖g‖'_Ψ = sup { ∫ |fg| dμ : f ∈ L_Φ, ‖f‖_Φ ≤ 1 }.

(2) ‖g‖'_Ψ = sup { ∫ |fg| dμ : f integrable simple, ‖f‖_Φ ≤ 1 }.

(3) If Φ is finite, then ‖g‖'_Ψ = sup { ∫ |fg| dμ : f ∈ H_Φ, ‖f‖_Φ ≤ 1 }.
Proof. (1) Replace f by sgn(fg) f. (2) Suppose ∫ |fg| dμ ≤ a for all integrable simple functions f with ‖f‖_Φ ≤ 1. We must show that ∫ |fg| dμ ≤ a for any f ∈ L_Φ with ‖f‖_Φ ≤ 1. There is a sequence (f_n) of integrable simple functions with f_n ↑ |f| a.e. Clearly ‖f_n‖_Φ ≤ 1, so ∫ |f_n g| dμ ≤ a for all n. But |f_n g| ↑ |fg| a.e., so by the monotone convergence theorem, ∫ |fg| dμ ≤ a. (3) If Φ is finite, then the integrable simple functions are in H_Φ.
Now we can consider the connection of the Orlicz norm to the Luxemburg norm and the Young modular:

(2.2.9) Theorem. Let g be a measurable function.

(1) ‖g‖_Ψ ≤ ‖g‖'_Ψ ≤ 2 ‖g‖_Ψ.

(2) ‖g‖'_Ψ ≤ 1 + M_Ψ(g).

Proof. The second inequality in (1) follows from (2.2.7). For (2), let ‖f‖_Φ ≤ 1. Then

    ∫ fg dμ ≤ ∫ (Φ(|f|) + Ψ(|g|)) dμ = M_Φ(f) + M_Ψ(g) ≤ 1 + M_Ψ(g).

So ‖g‖'_Ψ ≤ 1 + M_Ψ(g).
Next we turn to the first inequality in (1). Let g ∈ L_Ψ, and write a = ‖g‖'_Ψ. We must show ‖g‖_Ψ ≤ a; that is: ∫ Ψ(|g|/a) dμ ≤ 1.

First consider the case where g is an integrable simple function, and Ψ(|g|/a) is finite a.e. Let f = ψ(|g|/a), where ψ is the Orlicz derivative of Ψ. By the case of equality in Young's inequality (2.1.4), we have

    ∫ |f| (|g|/a) dμ = ∫ [Φ(|f|) + Ψ(|g|/a)] dμ = M_Φ(f) + M_Ψ(g/a).

Now if M_Φ(f) ≤ 1, then ‖f‖_Φ ≤ 1, so ∫ |fg| dμ ≤ a, and we have

    M_Φ(f) + M_Ψ(g/a) = (1/a) ∫ |fg| dμ ≤ 1.

On the other hand, if M_Φ(f) = b > 1, then M_Φ(f/b) ≤ M_Φ(f)/b = 1, so (by the definition of ‖g‖'_Ψ) we have ∫ |fg| dμ ≤ b ‖g‖'_Ψ = ab, and thus

    M_Φ(f) + M_Ψ(g/a) = (1/a) ∫ |fg| dμ ≤ b = M_Φ(f),

so that M_Ψ(g/a) = 0. Thus in both cases M_Ψ(g/a) ≤ 1, and we have ‖g‖_Ψ ≤ a.

Next consider the case where g is an integrable simple function, but Ψ(|g|/a) is not a.e. finite. This means that Ψ is infinite. There is d so that Ψ(v) = ∞ for v > d and Ψ(v) < ∞ for v < d. Then φ(x) ≤ d for all x, so Φ(u) ≤ du for all u. We claim that |g| ≤ ad a.e. Indeed, if μ{|g| > ad} > 0, there is a set A ⊆ {|g| > ad} with 0 < μ(A) < ∞. For f = (1/(d μ(A))) 1_A, we have M_Φ(f) = Φ(1/(d μ(A))) μ(A) ≤ 1, so ‖f‖_Φ ≤ 1. Then

    a = ‖g‖'_Ψ ≥ ∫ |fg| dμ > ad · (1/(d μ(A))) · μ(A) = a,
a contradiction, so |g| ≤ ad a.e. If 0 < α < 1, we have α|g|/a ≤ αd < d, so Ψ(α|g|/a) is finite. We may then proceed as in the previous case to conclude M_Ψ(αg/a) ≤ 1. But then when α ↑ 1, we have M_Ψ(g/a) ≤ 1, since Ψ is left-continuous. Finally, consider general g ∈ L_Ψ. Since μ is σ-finite, there is a sequence (g_n) of integrable simple functions with g_n ↑ |g| a.e. Now M_Ψ(g_n/‖g_n‖'_Ψ) ≤ 1 and ‖g_n‖'_Ψ ≤ ‖g‖'_Ψ = a, so M_Ψ(g_n/a) ≤ 1. Again, Ψ is left-continuous, so M_Ψ(g/a) ≤ 1. Thus ‖g‖_Ψ ≤ a.
(2.2.10) Corollary. Suppose g is a measurable function. If fg ∈ L₁ for all f ∈ L_Φ, then g ∈ L_Ψ. Suppose that Φ is finite. If fg ∈ L₁ for all f ∈ H_Φ, then g ∈ L_Ψ.

Proof. Suppose fg ∈ L₁ for all f ∈ L_Φ. Observe that the linear transformation T: L_Φ → L₁ defined by T(f) = fg has closed graph. [Indeed, if we have ‖f_n − f‖_Φ → 0, then (by 2.1.10(6)) there is a subsequence with f_{n_k} → f a.e., so f_{n_k} g → fg a.e. If ‖f_n g − h‖₁ → 0, then there is a further subsequence with f_{n_{k_j}} g → h a.e. Thus fg = h.] So ‖g‖'_Ψ < ∞ by the closed graph theorem. Therefore ‖g‖_Ψ < ∞ and g ∈ L_Ψ.

In the second case, where Φ is finite, we know that H_Φ is enough to compute ‖g‖'_Ψ. So again ‖g‖'_Ψ < ∞ by the closed graph theorem and g ∈ L_Ψ.
The preceding result is close to showing that L_Ψ is the dual of L_Φ. But we know that L₁ is not the dual of L_∞. Here is a good illustration of the difference between the spaces L_Φ and H_Φ.
(2.2.11) Theorem. Suppose Φ is finite. Then the dual of H_Φ (with the Luxemburg norm) is L_Ψ (with the Orlicz norm).

Proof. Since Φ is finite, the space H_Φ is the closure of the integrable simple functions. Let x* be a bounded linear functional on H_Φ. For each A ∈ 𝓕 with μ(A) < ∞, define ν(A) = x*(1_A). Now if A_n ↓ ∅, then [since H_Φ has order continuous norm (2.1.14(a))] we have ν(A_n) → 0. So ν is a finite signed measure on each A where μ is finite. If μ(A) = 0 then ν(A) = 0. By the Radon–Nikodym theorem, there is a measurable function f such that ∫_A f dμ = x*(1_A) for all A with μ(A) < ∞. Then by linearity, for all integrable simple functions g we have ∫ fg dμ = x*(g), so ∫ |fg| dμ ≤ ‖x*‖ ‖g‖_Φ. Thus ‖f‖'_Ψ ≤ ‖x*‖ < ∞, so f ∈ L_Ψ. The integrable simple functions are dense in H_Φ, so ∫ fg dμ = x*(g) for all g ∈ H_Φ. By (2.2.8),

    ‖f‖'_Ψ = ‖x*‖.
Note that Φ and Ψ are finite in the following; compare (2.2.25).

(2.2.12) Corollary. Suppose Φ and Ψ are conjugate finite Orlicz functions. If Φ satisfies the (Δ₂) condition, then the dual of L_Φ is L_Ψ. If Ψ satisfies the (Δ₂) condition, then the bidual of H_Φ is L_Φ. If both Φ and Ψ satisfy condition (Δ₂), then L_Φ and L_Ψ are reflexive.
The heart of a sum of Orlicz spaces

We consider an Orlicz function Φ that satisfies condition (Δ₂) at ∞; that is, there exist u₀ and M so that

    Φ(2u) ≤ M Φ(u)    for all u ≥ u₀.

If (Ω, 𝓕, μ) is finite, then L_Φ = H_Φ and L_Φ ⊇ L_∞, so there is no point to discussion of the heart of L_Φ + L_∞. Therefore, let us assume that (Ω, 𝓕, μ) is infinite.

The space L_Φ + L_∞ is again a Banach lattice. The usual norm for this space is:

    ‖f‖_{L_Φ+L_∞} = inf { ‖f₁‖_Φ + ‖f₂‖_∞ : f = f₁ + f₂ }.

If Φ(u) = 0 for some u > 0, then L_Φ ⊇ L_∞, so it is of little interest to consider L_Φ + L_∞. Therefore it will sometimes be assumed below that this does not happen. In the next result, the subscript 's' for 'shift' was chosen because the graph of Φ_s is the graph of Φ shifted to the right by one unit.
(2.2.13) Proposition. Let Φ be an Orlicz function. Let Φ_s be defined by

(2.2.13a)    Φ_s(u) = 0 if 0 ≤ u ≤ 1;  Φ_s(u) = Φ(u − 1) if u > 1.

Then L_{Φ_s} is the space L_Φ + L_∞, with Luxemburg norm

    ‖f‖_{Φ_s} = inf { ‖f₁‖_Φ ∨ ‖f₂‖_∞ : f = f₁ + f₂ },

so that ‖f‖_{Φ_s} ≤ ‖f‖_{L_Φ+L_∞} ≤ 2 ‖f‖_{Φ_s}.
Proof. Note that M_{Φ_s}(f) = ∫_{{|f|>1}} Φ(|f| − 1) dμ. Suppose f = f₁ + f₂, where f₁ ∈ L_Φ and f₂ ∈ L_∞. If a = ‖f₁‖_Φ ∨ ‖f₂‖_∞, then |f|/a ≤ |f₁|/a + 1, so

    M_{Φ_s}(f/a) = ∫_{{|f|>a}} Φ(|f|/a − 1) dμ ≤ ∫ Φ(|f₁|/a) dμ ≤ 1.

Thus ‖f‖_{Φ_s} ≤ a. This shows ‖f‖_{Φ_s} ≤ inf { ‖f₁‖_Φ ∨ ‖f₂‖_∞ : f = f₁ + f₂ }.

On the other hand, let f ∈ L_{Φ_s}. Write a = ‖f‖_{Φ_s}. Then we have, first, M_{Φ_s}(f/a) ≤ 1, or

    ∫_{{|f|>a}} Φ(|f|/a − 1) dμ ≤ 1.

Let f₁ = (sgn f)(|f| − a) 1_{{|f|>a}} and f₂ = (sgn f)(|f| ∧ a), so f = f₁ + f₂. Then ‖f₂‖_∞ ≤ a and ∫ Φ(|f₁|/a) dμ ≤ 1, so ‖f₁‖_Φ ≤ a. Thus

    ‖f‖_{Φ_s} ≥ inf { ‖f₁‖_Φ ∨ ‖f₂‖_∞ : f = f₁ + f₂ }.
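The decomposition used in the proof is constructive, which makes it easy to test numerically. In the sketch below (the four-point space, the choice Φ(u) = u², and the helper names are our assumptions), a = ‖f‖_{Φ_s} is found by bisection, and the parts f₁ = (sgn f)(|f| − a)⁺ and f₂ = (sgn f)(|f| ∧ a) are checked against the proof's claims.

```python
import numpy as np

Phi = lambda u: u**2                      # a sample Orlicz function (assumption)
Phi_s = lambda u: np.where(u > 1, Phi(np.maximum(u - 1, 0.0)), 0.0)

mu = np.array([0.5, 0.25, 0.25, 1.5])     # weights of a 4-point measure space
f  = np.array([3.0, -1.0, 4.0, 0.5])

def modular(F, h, a):
    # M_F(h/a) on the discrete space
    return float(np.sum(F(np.abs(h) / a) * mu))

def luxemburg(F, h, lo=1e-9, hi=1e6, iters=200):
    # Luxemburg norm inf { a > 0 : M_F(h/a) <= 1 } by bisection
    for _ in range(iters):
        mid = (lo + hi) / 2
        if modular(F, h, mid) <= 1:
            hi = mid
        else:
            lo = mid
    return hi

a = luxemburg(Phi_s, f)                          # ||f||_{Phi_s}
f1 = np.sign(f) * np.maximum(np.abs(f) - a, 0)   # the L_Phi part from the proof
f2 = np.sign(f) * np.minimum(np.abs(f), a)       # the L_infinity part
```

By construction M_Φ(f₁/a) equals M_{Φ_s}(f/a) ≤ 1, so ‖f₁‖_Φ ≤ a, and ‖f₂‖_∞ ≤ a, exactly as in the second half of the proof.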
The heart of such a space L_Φ + L_∞ can be characterized in various ways.

(2.2.14) Proposition. Let Φ be an Orlicz function satisfying (Δ₂) at ∞ and Φ(u) > 0 for all u > 0. Let Φ_s be defined by (2.2.13a). Let f be a measurable function. The following are equivalent.

(a) f ∈ H_{Φ_s}.

(b) f ∈ L_{Φ_s} and μ{|f| > t} < ∞ for all t > 0.

(c) ∫_{{|f|>t}} Φ(|f|) dμ < ∞ for all t > 0.
Proof. (a) ⇒ (c). Suppose f ∈ H_{Φ_s}. Fix t > 0. Write u = t/(1+t). Then x > t implies x ≤ x/u − 1, so on the set {|f| > t} we have |f| ≤ |f|/u − 1. Therefore

    ∫_{{|f|>t}} Φ(|f|) dμ ≤ ∫_{{|f|>t}} Φ(|f|/u − 1) dμ ≤ ∫ Φ_s(|f|/u) dμ = M_{Φ_s}(f/u) < ∞,

since f ∈ H_{Φ_s}.

(c) ⇒ (b). Since Φ(t) > 0,

    μ{|f| > t} ≤ (1/Φ(t)) ∫_{{|f|>t}} Φ(|f|) dμ < ∞,

and, taking t = 1, M_{Φ_s}(f) = ∫_{{|f|>1}} Φ(|f| − 1) dμ ≤ ∫_{{|f|>1}} Φ(|f|) dμ < ∞, so f ∈ L_{Φ_s}.
(b) ⇒ (a). Suppose f ∈ L_{Φ_s} and μ{|f| > t} < ∞ for all t > 0. Write a = ‖f‖_{Φ_s}. Given ε > 0, write f₁ = f 1_{{|f|>ε}} and f₂ = f 1_{{|f|≤ε}}. We claim that f₁ ∈ H_{Φ_s}. Indeed, let b > 0 be given. Since Φ satisfies (Δ₂) at ∞, so does Φ_s, so there exist M and u₀ ≥ ε/a with

    Φ_s(au/b) ≤ M Φ_s(u)    for all u ≥ u₀.

Then

    M_{Φ_s}(f₁/b) = ∫_{{|f|>ε}} Φ_s(|f|/b) dμ ≤ M ∫ Φ_s(|f|/a) dμ + Φ_s(a u₀/b) μ{|f| > ε} ≤ M M_{Φ_s}(f/a) + Φ_s(a u₀/b) μ{|f| > ε} < ∞.

Hence f₁ ∈ H_{Φ_s}. Also,

    M_{Φ_s}(f₂/ε) = ∫_{{|f|≤ε}} Φ_s(|f|/ε) dμ ≤ ∫_{{|f|≤ε}} Φ_s(1) dμ = 0.

Therefore ‖f₂‖_{Φ_s} ≤ ε. The distance of f from H_{Φ_s} is at most ε. Since H_{Φ_s} is closed, we have f ∈ H_{Φ_s}.
(2.2.15) Let us consider again the largest Orlicz space L₁ + L_∞. The Orlicz function that may be used is

(2.2.15a)    Φ_min(u) = 0 if 0 ≤ u ≤ 1;  Φ_min(u) = u − 1 if u > 1.

As we have seen (2.2.4), this produces the Luxemburg norm

    ‖f‖_{Φ_min} = inf { ‖f₁‖₁ ∨ ‖f_∞‖_∞ : f = f₁ + f_∞, f₁ ∈ L₁, f_∞ ∈ L_∞ }.

The heart H_{Φ_min} of this Orlicz space is also known as Fava's space R₀. Thus f ∈ R₀ if and only if f ∈ L₁ + L_∞ and μ{|f| > t} < ∞ for all t > 0.
(2.2.16) Proposition. A set 𝒦 is bounded in L_{Φ_min} = L₁ + L_∞ if and only if there exist a, M such that

    ∫_{{|f|≥a}} |f| dμ ≤ M    for all f ∈ 𝒦.

Proof. Suppose ‖f‖_{Φ_min} ≤ λ for all f ∈ 𝒦. Let a > 2λ, M > 2λ. If f ∈ 𝒦, there is a decomposition f = f₁ + f_∞ with ‖f₁‖₁ ≤ M/2 and ‖f_∞‖_∞ ≤ a/2. Now on the set {|f| ≥ a} we have |f_∞| ≤ a/2, so |f₁| ≥ a/2, and

    μ{|f| ≥ a} ≤ μ{|f₁| ≥ a/2} ≤ (2/a) ∫ |f₁| dμ ≤ M/a.

Thus

    ∫_{{|f|≥a}} |f| dμ ≤ ∫_{{|f|≥a}} (|f₁| + a/2) dμ ≤ ∫ |f₁| dμ + (a/2) μ{|f| ≥ a} ≤ M/2 + M/2 = M.

Conversely, if the condition holds for a and M, then each f ∈ 𝒦 decomposes as f = f 1_{{|f|≥a}} + f 1_{{|f|<a}} with ‖f 1_{{|f|≥a}}‖₁ ≤ M and ‖f 1_{{|f|<a}}‖_∞ ≤ a, so ‖f‖_{Φ_min} ≤ M ∨ a.
We consider next Fava's spaces R_k. They are (in one sense) generalizations of the space R₀.

Let k ≥ 0 be real. (In most applications, k will be an integer; but real values certainly make sense.) Consider the function Φ_k defined by

(2.2.16a)    Φ_k(u) = 0 if u ≤ 1;  Φ_k(u) = u (log u)^k if u > 1.

[We may abbreviate this as Φ_k(u) = u (log⁺ u)^k. See Figure (2.2.16).] For k ≥ 1, this is an Orlicz function. The heart H_{Φ_k} is Fava's space R_k. Note that Φ_k satisfies (Δ₂) at ∞, so if μ is a finite measure, then we have R_k = L log^k L. But Φ_k fails (Δ₂) at 0, so if μ is infinite, we may have R_k ≠ L log^k L. In any case, R_k is the closure of the integrable simple functions in L log^k L.

Figure (2.2.16). Orlicz function u log⁺ u and its conjugate.

When k = 0, the function Φ₀ is not an Orlicz function. But it is "equivalent" to the Orlicz function Φ_min in (2.2.4a), Φ₀ ≈ Φ_min, so L_{Φ₀} = L_{Φ_min} = L₁ + L_∞.

Since R_k (k ≥ 0) is an Orlicz heart, it is also a Banach lattice with order continuous norm (2.1.14(a)). We may also note that if k ≤ m, then R_k ⊇ R_m. Clearly, also, if k ≥ 0 and p > 1, then R_k ⊇ L_p. We will see below that certain classical results known to hold for all spaces L_p with p > 1 but known to fail for L₁ in fact hold on an appropriate space R_k. As in the case of R₀, we may describe R_k in many equivalent ways. We have f ∈ R_k if and only if

    ∫_{{|f|>a}} |f| (log |f| − log a)^k dμ < ∞

for all a > 0. Or: f ∈ R_k if and only if f ∈ L log^k L and μ{|f| > a} < ∞ for all a > 0. Or: f ∈ R_k if and only if, for all ε > 0, there exist g and h such that f = g + h, ∫ |g| (log⁺ |g|)^k dμ < ∞, μ{g ≠ 0} < ∞, ‖h‖_∞ ≤ ε.
Complements
(2.2.17) There is, in fact, a formula for the Orlicz norm not involving the conjugate Orlicz function:

    ‖h‖'_Ψ = inf_{k>0} (1/k) (1 + M_Ψ(kh))

(Krasnosel'skii & Rutickii [1961], p. 92).

(2.2.18) (Inequality.) Let Φ be an Orlicz function with derivative φ and conjugate Ψ. Suppose Φ(2u) ≤ C Φ(u) for all u. Then Ψ(φ(u)) ≤ (C − 2) Φ(u) for all u. To see this, use (2.1.19b) and the case of equality in Young's inequality (2.1.4): If Φ(u) < ∞, then

    Ψ(φ(u)) = u φ(u) − Φ(u) ≤ Φ(2u) − 2Φ(u) ≤ (C − 2) Φ(u).

(2.2.19) (Definition of ≻.) Let Φ₁ and Φ₂ be Orlicz functions. The definition of Φ₁ ≻ Φ₂ is: There exist a, b, u₀ > 0 such that Φ₂(u) ≤ b Φ₁(au) for all u ≥ u₀. The following are equivalent: (a) Φ₁ ≻ Φ₂; (b) there exist a, u₀ > 0 such that Φ₁(au) ≥ Φ₂(u) for all u ≥ u₀. A similar equivalence holds at 0.

(2.2.20) (Definition of ≻, continued.) Find Orlicz functions showing that the following are not equivalent: (a) Φ₁ ≻ Φ₂; (b) there exist b, u₀ > 0 such that b Φ₁(u) ≥ Φ₂(u) for all u ≥ u₀.

(2.2.21) (Convolution on ℝⁿ.) Let Ω be n-dimensional Euclidean space ℝⁿ and let μ be n-dimensional Lebesgue measure. Let Φ and Ψ be conjugate Orlicz functions. Suppose Φ satisfies the (Δ₂) condition. If f ∈ L_Φ(ℝⁿ) and g ∈ L_Ψ(ℝⁿ), then the convolution

    k(t) = ∫_{ℝⁿ} f(t − x) g(x) dx = ∫_{ℝⁿ} f(x) g(t − x) dx

exists and is continuous on ℝⁿ. If Ψ also satisfies the (Δ₂) condition, then lim_{‖t‖→∞} k(t) = 0 (Zaanen [1983], p. 600).

(2.2.22) Let Ω be n-dimensional Euclidean space ℝⁿ and let μ be n-dimensional Lebesgue measure. Suppose Φ satisfies the (Δ₂) condition. If f ∈ L_Φ(ℝⁿ), then, for any ε > 0, there exists a continuous function g: ℝⁿ → ℝ with compact support such that ‖f − g‖_Φ < ε. For h ∈ ℝⁿ, let the translate f_h be defined by f_h(x) = f(x − h). Then ‖f_h − f‖_Φ → 0 as ‖h‖ → 0 (Zaanen [1983], p. 599).

(2.2.23) Let A be a subset of H_Φ(ℝⁿ). Then A is relatively compact if and only if (1) sup_{f∈A} ‖f‖_Φ < ∞; and (2) for every ε > 0 there is δ > 0 such that for all h ∈ ℝⁿ with ‖h‖ < δ and all f ∈ A, we have ‖f_h − f‖_Φ < ε (Krasnosel'skii & Rutickii [1961], p. 100).
(2.2.24) (Linear functionals.) Let Φ be a finite Orlicz function with conjugate Ψ. Let α be a (possibly discontinuous) linear functional on L_Φ. Suppose

(a) α(f) ≤ 1 for all f ≤ 0;

(b) there is ε > 0 such that α(f) ≤ 1 for all f with ‖f‖_Φ ≤ ε.

Then there is g ∈ L_Ψ, g ≥ 0, such that

    α(f) = ∫ fg dμ    for all f ∈ H_Φ.

To see this, observe that by (a), α is a positive linear functional, and by (b), α is continuous (with norm ≤ 1/ε). Define a set-function ν on the sets A ∈ 𝓕 with μ(A) < ∞ by ν(A) = α(1_A). By (2.1.20) it follows that ν is countably additive (on the ring of sets of finite μ-measure), so by the Radon–Nikodym theorem (and the σ-finiteness of μ) there is g with ν(A) = ∫_A g dμ. The rest follows as in (2.2.11), using monotone convergence.

(2.2.25) (Finiteness.) It is possible for an Orlicz function Ψ to be infinite even if its conjugate Φ satisfies (Δ₂). For example, take for Φ the Orlicz function with derivative

    φ(x) = x if 0 ≤ x ≤ 1;  φ(x) = (4/π) arctan x if x > 1.

(Thanks to A. Millet for providing this example.)

Remarks
This section comes from the same sources as the last one: Krasnosel'skii & Rutickii [1961] discuss the comparison of Orlicz spaces in Chapter II, Section 13, and duality of Orlicz spaces in Section 14. Zaanen [1983], Chapter 19, also has a lot on duality. Fava [1972] introduced the spaces R_k, k = 0, 1, ⋯, in the context of ergodic theory. Frangos & Sucheston [1986] identified the spaces R_k, k ≥ 1, as hearts of Orlicz spaces L log^k L, and showed that R_k is a Banach lattice with order continuous norm. The identification of R₀ as the heart of L₁ + L_∞, and the other material on the heart of the sum of Orlicz spaces, is from Edgar & Sucheston [1989].
2.3. Uniform integrability and conditional expectation

Uniform integrability is a useful criterion in probability theory. We define it here even for infinite measure spaces, so that the usual role played by L₁ is taken by the largest Orlicz space L₁ + L_∞. Only the basic properties of uniform integrability are proved here, but it will reappear frequently throughout the book. An interesting characterization (2.3.5) of uniform integrability (which goes back to de La Vallée Poussin) can be formulated in terms of Orlicz functions. Then we will discuss conditional expectation on infinite measure spaces.
(2.3.1) Definition. Let 𝒦 be a family of real-valued measurable functions defined on (Ω, 𝓕, μ). We say that 𝒦 is uniformly integrable if, for every ε > 0, there is λ > 0 such that for all f ∈ 𝒦, we have ∫_{{|f|≥λ}} |f| dμ < ε.

It would seem logical that a "uniformly integrable" set should consist of integrable functions. This is true if μ is finite, but not in general. A singleton {g} satisfies this definition if and only if g ∈ L₁ + L_∞, that is, g is in some Orlicz space L_Φ. Uniform integrability of a set 𝒦 might be considered as the property that the elements of 𝒦 belong to L₁ + L_∞ in a uniform way. The main uses of the term "uniformly integrable" will occur when μ is finite. Some books use a slightly different definition. We will say that 𝒦 is uniformly absolutely continuous if, for every ε > 0, there is δ > 0 such that if f ∈ 𝒦 and A ∈ 𝓕 with μ(A) < δ, then ∫_A |f| dμ < ε. The next result shows that this concept is (for most purposes) equivalent to uniform integrability. [The vector-valued versions are different, however; see (5.2.15).]
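For a concrete instance of the definition, consider the family f_n = n 1_{(0,1/n²)} on [0,1] with Lebesgue measure: the tail integral ∫_{{|f_n|≥λ}} |f_n| dμ equals 1/n when λ ≤ n and 0 otherwise, so the supremum over the family tends to 0 as λ → ∞, and the family is uniformly integrable. A small sketch (ours, not from the text) makes the computation explicit:

```python
# f_n = n * 1_{(0, 1/n^2)} on [0,1]; its tail integral at level lam is
# n * (1/n^2) = 1/n when lam <= n, and 0 when lam > n.
def tail(n, lam):
    return n * (1.0 / n**2) if lam <= n else 0.0

def worst_tail(lam, N=10**4):
    # sup over f_1, ..., f_N of the tail integral at level lam
    return max(tail(n, lam) for n in range(1, N + 1))
```

The worst tail at level λ is attained at the first n ≥ λ and equals 1/n ≤ 1/λ, matching the ε–λ requirement of the definition.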
(2.3.2) Proposition. Let 𝒦 be a family of real-valued measurable functions defined on (Ω, 𝓕, μ). (1) If 𝒦 is uniformly integrable, then 𝒦 is uniformly absolutely continuous. (2) Suppose 𝒦 is L₁ + L_∞-bounded. Then 𝒦 is uniformly integrable if and only if 𝒦 is uniformly absolutely continuous. (3) Suppose μ is atomless. Then 𝒦 is uniformly integrable if and only if 𝒦 is uniformly absolutely continuous.
Proof. (1) Suppose 𝒦 is uniformly integrable. Let ε > 0 be given. Then there is λ so that for all f ∈ 𝒦, we have ∫_{{|f|≥λ}} |f| dμ < ε/2. Let δ ≤ ε/(2λ). Now if f ∈ 𝒦 and μ(A) < δ, we have

    ∫_A |f| dμ ≤ ∫_{A∩{|f|≥λ}} |f| dμ + ∫_{A∩{|f|<λ}} |f| dμ < ε/2 + δλ ≤ ε.
(2) Suppose 𝒦 is uniformly absolutely continuous and L₁ + L_∞-bounded. Then by (2.2.16) there exist a and M such that

    ∫_{{|f|≥a}} |f| dμ ≤ M

for all f ∈ 𝒦. Let ε > 0 be given. Now there is δ > 0 so that for μ(A) < δ and f ∈ 𝒦 we have ∫_A |f| dμ < ε. Let λ > max{a, M/δ}. Then

    μ{|f| ≥ λ} ≤ (1/λ) ∫_{{|f|≥λ}} |f| dμ ≤ (1/λ) ∫_{{|f|≥a}} |f| dμ ≤ M/λ < δ,

so

    ∫_{{|f|≥λ}} |f| dμ < ε.

(3) Suppose μ is atomless and 𝒦 is uniformly absolutely continuous. Let ε > 0 be given. Then there is δ > 0 so that ∫_A |f| dμ < ε whenever f ∈ 𝒦 and μ(A) < δ. Let λ = 2ε/δ. We claim that μ{|f| ≥ λ} < δ for all f ∈ 𝒦, which will prove that 𝒦 is uniformly integrable. Suppose not: suppose that, for some f ∈ 𝒦, we have μ{|f| ≥ λ} ≥ δ. Since μ is atomless, there is a set A ⊆ {|f| ≥ λ} with μ(A) = δ/2. Then

    ε > ∫_A |f| dμ ≥ λ δ/2 = ε,

a contradiction.

There are some obvious sufficient conditions for uniform integrability.
(2.3.3) Proposition. Let 𝒦 be a family of measurable functions. (a) Suppose there is f ∈ L₁ + L_∞ with |g| ≤ |f| a.e. for all g ∈ 𝒦 (we say 𝒦 is dominated by f). Then 𝒦 is uniformly integrable. (b) Suppose there is f ∈ L₁ + L_∞ with μ{|g| ≥ λ} ≤ μ{|f| ≥ λ} for all λ > 0, g ∈ 𝒦 (we say 𝒦 is dominated in distribution by f). Then 𝒦 is uniformly integrable.
Proof. Part (a) is an easy consequence of (b); so we prove (b). Suppose f = f₁ + f_∞, where f₁ ∈ L₁ and f_∞ ∈ L_∞. Write M = ‖f_∞‖_∞. For λ > M, we have for all g ∈ 𝒦,

    ∫_{{|g|≥λ}} |g| dμ = λ μ{|g| ≥ λ} + ∫_λ^∞ μ{|g| ≥ t} dt
                     ≤ λ μ{|f₁| + M ≥ λ} + ∫_λ^∞ μ{|f₁| + M ≥ t} dt
                     = ∫_{{|f₁|+M≥λ}} (|f₁| + M) dμ.

This is independent of g and tends to 0 as λ → ∞.
One common use of uniform integrability is to connect convergence in mean with other modes of convergence. Recall that a sequence (f_n) converges to f in measure if, for every ε > 0, we have

    lim_{n→∞} μ{|f_n − f| > ε} = 0.
(2.3.4) Proposition. Let (Ω, 𝓕, μ) be a finite measure space. Suppose f_n ∈ L₁ for n ∈ ℕ and f ∈ L₁. Then ‖f_n − f‖₁ → 0 if and only if { f_n : n ∈ ℕ } is uniformly integrable and f_n → f in measure.

Proof. Suppose ‖f_n − f‖₁ → 0. Now first, if ε > 0, then

    μ{|f_n − f| > ε} ≤ (1/ε) ∫ |f_n − f| dμ = ‖f_n − f‖₁/ε.

Thus μ{|f_n − f| > ε} → 0 as n → ∞, so f_n → f in measure. Second, we must prove uniform integrability. But { f_n : n ∈ ℕ } is L₁-bounded, so by (2.3.2(2)) we may prove uniform absolute continuity instead. For ε > 0, there is N with ‖f_n − f‖₁ < ε for n > N. Now

    g = max_{n≤N} |f_n| ∨ |f|

is integrable, so there is δ > 0 with

    ∫_A |g| dμ < ε    whenever μ(A) < δ.

Now for n ≤ N, if μ(A) < δ we have ∫_A |f_n| dμ ≤ ∫_A |g| dμ < ε, and for n > N, if μ(A) < δ we have

    ∫_A |f_n| dμ ≤ ∫_A |f| dμ + ‖f_n − f‖₁ < 2ε.

Thus {f_n} is uniformly absolutely continuous.

Conversely, suppose {f_n} is uniformly integrable and f_n → f in measure. Of course, 𝒦 = {f} ∪ { f_n : n ∈ ℕ } is also uniformly integrable. Fix ε > 0. There is δ > 0 so that ∫_A |g| dμ < ε whenever g ∈ 𝒦 and μ(A) < δ. Also, there is N so that if n > N we have μ{|f_n − f| > ε} < δ. Then for n > N,

    ∫ |f_n − f| dμ ≤ ∫_{{|f_n−f|>ε}} (|f_n| + |f|) dμ + ∫_{{|f_n−f|≤ε}} |f_n − f| dμ ≤ 2ε + ε μ(Ω).

Thus ‖f_n − f‖₁ → 0.
(2.3.5) Theorem (criterion of de La Vallée Poussin). Let 𝒦 be a set of functions. The following are equivalent:

(1) 𝒦 is uniformly integrable.

(2) 𝒦 is L_Φ-bounded for some finite Orlicz function Φ with lim_{u→∞} Φ(u)/u = ∞.

(3) sup { M_Φ(f) : f ∈ 𝒦 } < ∞ for some finite Orlicz function Φ with lim_{u→∞} Φ(u)/u = ∞.

Proof. (2) ⇒ (1). Suppose 𝒦 is L_Φ-bounded. Say ‖f‖_Φ ≤ M for all f ∈ 𝒦. Given ε > 0, let λ be such that Φ(u)/u ≥ M/ε for u ≥ λ/M. Then

    ∫_{{|f|≥λ}} |f| dμ = M ∫_{{|f|≥λ}} (|f|/M) dμ ≤ ε ∫_{{|f|≥λ}} Φ(|f|/M) dμ ≤ ε M_Φ(f/M) ≤ ε.

So 𝒦 is uniformly integrable.

(1) ⇒ (3). Suppose 𝒦 is uniformly integrable. Write

    C(λ) = sup { ∫_{{|f|≥λ}} |f| dμ : f ∈ 𝒦 }.

Then C is a decreasing function; lim_{λ→∞} C(λ) = 0 since 𝒦 is uniformly integrable. Choose λ₀ with C(λ₀) < 1. Then −log C(λ) is a positive increasing function of λ for λ ≥ λ₀, and −log C(λ) → ∞ as λ → ∞. There is a continuous function φ: (0, ∞) → (0, ∞) with φ(u) = 0 for u ≤ λ₀, φ(u) ≤ −log C(u) for u > λ₀, and lim_{u→∞} φ(u) = ∞. We may assume φ is strictly increasing on (λ₀, ∞). Thus φ is an Orlicz derivative. Let Φ be the corresponding Orlicz function and let ψ be the inverse of φ. Now Φ(u)/u → ∞ since φ(u) → ∞. Since φ is strictly increasing on (λ₀, ∞), we have φ(ψ(y)) = y for all y > 0. Thus y ≤ −log C(ψ(y)), or C(ψ(y)) ≤ e⁻ʸ. Now for f ∈ 𝒦, we have

    M_Φ(f) = ∫_Ω Φ(|f(ω)|) dμ(ω)
           ≤ ∫_Ω |f(ω)| φ(|f(ω)|) dμ(ω)
           = ∫_0^∞ ∫_{{|f(·)|≥ψ(y)}} |f(ω)| dμ(ω) dy
           ≤ ∫_0^∞ C(ψ(y)) dy ≤ ∫_0^∞ e⁻ʸ dy = 1.

(3) ⇒ (2). By (2.1.10(3)), if we have sup { M_Φ(f) : f ∈ 𝒦 } = N, then it follows that sup { ‖f‖_Φ : f ∈ 𝒦 } ≤ N ∨ 1.
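The implication (2) ⇒ (1) can be illustrated numerically with Φ(u) = u²: for a family with ‖f‖_Φ ≤ M on a probability space, the argument gives ∫_{{|f|≥λ}} |f| dμ ≤ M²/λ, uniformly over the family. The sketch below (a randomly generated family on a uniform 1000-point space; all names are ours) checks that bound.

```python
import numpy as np

rng = np.random.default_rng(0)
w = 1.0 / 1000                                   # uniform weights: a probability space
K = [rng.normal(size=1000) for _ in range(20)]   # an L_Phi-bounded family, Phi(u) = u^2

M = max(float(np.sqrt(np.sum(f**2) * w)) for f in K)   # sup of the L_Phi (= L^2) norms

def worst_tail(lam):
    # sup over the family of the tail integral at level lam
    return max(float(np.sum(np.abs(f)[np.abs(f) >= lam]) * w) for f in K)
```

Since |f| ≤ f²/λ on {|f| ≥ λ}, the tail is at most M²/λ, which tends to 0 as λ → ∞ — exactly the uniform integrability delivered by (2) ⇒ (1).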
(2.3.6) Corollary. (a) If 𝒦 is uniformly integrable, then the convex hull conv 𝒦 is also uniformly integrable. (b) If 𝒦 is uniformly integrable, then the closure of 𝒦 in an Orlicz norm ‖·‖_Φ with lim_{u→∞} Φ(u)/u = ∞ is also uniformly integrable. (c) If 𝒦₁ and 𝒦₂ are uniformly integrable, then

    𝒦₁ + 𝒦₂ = { f + g : f ∈ 𝒦₁, g ∈ 𝒦₂ }

is also uniformly integrable. (d) If 𝒦 is a finite subset of L₁ + L_∞, then 𝒦 is uniformly integrable.

Proof. (a) The norm ‖·‖_Φ is a convex function, so conv 𝒦 has the same L_Φ bound as 𝒦. (b) The norm ‖·‖_Φ is a continuous function on the Orlicz space, so the closure has the same L_Φ bound as 𝒦. (c) Suppose Φᵢ is an Orlicz function with Φᵢ(u)/u → ∞ such that 𝒦ᵢ is bounded in L_{Φᵢ} (i = 1, 2). Write φᵢ = Φᵢ′. Then the pointwise minimum φ = φ₁ ∧ φ₂ is also an Orlicz derivative. Let Φ be the corresponding Orlicz function. Now Φᵢ(u)/u → ∞, so φᵢ(u) → ∞, and therefore φ(u) → ∞, so Φ(u)/u → ∞. Also Φ ≤ Φᵢ (i = 1, 2). Thus 𝒦₁ and 𝒦₂ are both bounded in L_Φ. Clearly 𝒦 = 𝒦₁ + 𝒦₂ is also bounded in L_Φ.

(d) A singleton {f} is uniformly integrable if f ∈ L₁ + L_∞. Then the result follows from (c).
Conditional expectation in infinite measure spaces

(2.3.7) Suppose (Ω, 𝓕, μ) is a measure space, f ∈ L₁(Ω, 𝓕, μ) is a measurable function, and 𝒢 ⊆ 𝓕 is a σ-algebra. The conditional expectation of f given 𝒢, written E_μ[f | 𝒢] or E_μ^𝒢 f, should be a function g ∈ L₁(Ω, 𝒢, μ) satisfying

(2.3.7a)    ∫_A g dμ = ∫_A f dμ

for all A ∈ 𝒢. If μ(Ω) < ∞, this can easily be arranged: the Radon–Nikodym theorem may be applied as in the case of a probability measure. But if μ(Ω) = ∞ it is not possible in general to find such a function g (2.3.19).
(2.3.8) Proposition. Let (Ω, 𝓕, μ) be a σ-finite measure space, and let 𝒢 ⊆ 𝓕 be a σ-algebra. There is a set B ∈ 𝒢, unique up to sets of measure zero, such that

(a) B is σ-finite in (Ω, 𝒢, μ): there exist sets B_n ∈ 𝒢 (n = 1, 2, ⋯) with μ(B_n) < ∞ and 1_B = sup_n 1_{B_n}.

(b) For C ∈ 𝒢 with C ∩ B = ∅, we have either μ(C) = 0 or μ(C) = ∞.

Proof. B is the essential supremum of all sets in 𝒢 of finite measure. (This concept is discussed in detail in Section 4.1, but here we will include a self-contained proof.) Pass to a finite measure ν equivalent to μ on 𝓕. Choose a sequence G_n of sets in 𝒢 with μ(G_n) < ∞ so that ν(G_n) converges to

    sup { ν(G) : μ(G) < ∞, G ∈ 𝒢 }.

By disjointing the sets G_n, we obtain a countable disjoint family {H_n} of sets in 𝒢 maximal for the condition that 0 < μ(H_n) < ∞. Let

    B = ⋃ H_n.

Then clearly B ∈ 𝒢 and B is σ-finite in (Ω, 𝒢, μ). If C ∈ 𝒢 with C ∩ B = ∅, then C ∩ H_n = ∅ for all n. By maximality, we see that either μ(C) = 0 or μ(C) = ∞.

The set B of Proposition (2.3.8) is called the set of σ-finiteness of μ with respect to 𝒢.

It is easy to define a "conditional expectation" on the class of functions L₁ + L_∞. Suppose f ∈ L₁ + L_∞ and 𝒢 are given. Let B be the set of σ-finiteness of μ with respect to 𝒢. Then

    ν(A) = ∫_{A∩B} f dμ

exists for all A ∈ 𝒢 with μ(A) < ∞. Now B is σ-finite, say B = ⋃ B_n, μ(B_n) < ∞, B_n ∩ B_m = ∅ for n ≠ m. The Radon–Nikodym theorem yields g_n ∈ L₁(B_n, 𝒢, μ) with ν(A) = ∫_A g_n dμ for all A ∈ 𝒢, A ⊆ B_n. The functions may be pieced together,

    g = Σ_{n=1}^∞ g_n 1_{B_n},

to obtain g ∈ (L₁ + L_∞)(Ω, 𝒢, μ) satisfying

    ∫_A g dμ = ∫_A f dμ

for all A ∈ 𝒢 with μ(A) < ∞.
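On a discrete σ-finite space the piecing-together is just weighted averaging over the atoms of 𝒢. A minimal sketch (the weights, values, and blocks below are our illustrative assumptions, not from the text):

```python
import numpy as np

mu = np.array([1.0, 2.0, 0.5, 0.5, 3.0, 1.0])   # point masses (sigma-finite)
f  = np.array([1.0, 4.0, -2.0, 6.0, 0.0, 1.0])
blocks = [[0, 1], [2, 3], [4, 5]]               # atoms of the sub-sigma-algebra G

g = np.empty_like(f)
for b in blocks:
    # defining property (2.3.7a): integral of g over each atom of G
    # equals the integral of f over that atom
    g[b] = np.sum(f[b] * mu[b]) / np.sum(mu[b])
```

Each atom plays the role of one of the sets B_n, and g is constant on it, so g is 𝒢-measurable and satisfies ∫_A g dμ = ∫_A f dμ for every A ∈ 𝒢.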
(2.3.9) Definition. Let f ∈ (L₁ + L_∞)(Ω, 𝓕, μ) and let 𝒢 ⊆ 𝓕 be a σ-algebra. Then g is the conditional expectation of f given 𝒢, written

    g = E_μ[f | 𝒢]  or  g = E_μ^𝒢 f,

iff g is 𝒢-measurable and

    ∫_A g dμ = ∫_A f dμ
for all A ∈ 𝒢 with μ(A) < ∞. Note that (even if f is integrable), there need not exist a 𝒢-measurable function g with

    ∫_A g dμ = ∫_A f dμ

for all A ∈ 𝒢; see (2.3.19). It can easily be verified that if g = E_μ[f | 𝒢], then we also have

    ∫ hg dμ = ∫ hf dμ

for all 𝒢-measurable h ∈ L₁ ∩ L_∞. Occasionally we will write E_μ[f | 𝒢] even when f is not in L₁ + L_∞. This will make sense, for example, if f ≥ 0, provided we allow E_μ[f | 𝒢] to take the value ∞. (Compare (1.4.8).) Several of the usual properties may be checked easily: E_μ[f₁ + f₂ | 𝒢] = E_μ[f₁ | 𝒢] + E_μ[f₂ | 𝒢] (conditional expectation is a linear operator). If f ≥ 0, then E_μ[f | 𝒢] ≥ 0 (conditional expectation is a positive operator). If g is 𝒢-measurable, then E_μ[gf | 𝒢] = g E_μ[f | 𝒢], provided both conditional expectations exist. Jensen's inequality is an important and useful property, which will be used to prove that conditional expectation is a contraction on all Orlicz spaces.
(2.3.10) Jensen's inequality. Let f ∈ (L₁ + L_∞)(Ω, 𝓕, μ), let 𝒢 ⊆ 𝓕 be a σ-algebra, and let Φ: ℝ → ℝ be a convex function. Then

    Φ(E_μ[f | 𝒢]) ≤ E_μ[Φ(f) | 𝒢]

a.e. on the set of σ-finiteness of 𝒢.

Proof. First recall that a convex function on ℝ is necessarily continuous, so that Φ(f) is a measurable function. Consider rational numbers m and b such that the graph of the line y = mx + b is below the graph y = Φ(x); that is, Φ(x) ≥ mx + b for all x ∈ ℝ. Then for any A ∈ 𝒢 with μ(A) < ∞, we have

    ∫_A Φ(f) dμ ≥ m ∫_A f dμ + b μ(A).

Thus

    ∫_A E_μ[Φ(f) | 𝒢] dμ ≥ m ∫_A E_μ[f | 𝒢] dμ + b μ(A),

or

    ∫_A (E_μ[Φ(f) | 𝒢] − m E_μ[f | 𝒢] − b) dμ ≥ 0.

The integrand is 𝒢-measurable, and the inequality holds for all A ∈ 𝒢 with μ(A) < ∞, so

    E_μ[Φ(f) | 𝒢] − m E_μ[f | 𝒢] − b ≥ 0

a.e. on the set B of σ-finiteness of μ with respect to 𝒢. There are countably many pairs (m, b) of rationals such that y = mx + b is below y = Φ(x). Thus there is a single set N ∈ 𝒢 with μ(N) = 0 so that, for all ω ∈ B \ N,

    E_μ[Φ(f) | 𝒢](ω) ≥ m E_μ[f | 𝒢](ω) + b

for all such pairs (m, b).

Figure (2.3.10). Diagram for Jensen's inequality.

Now let ω ∈ B \ N, and ε > 0. Write x₀ = E_μ[f | 𝒢](ω) and y₀ = Φ(x₀). For each ε > 0, there is a pair (m, b) of rationals with y = mx + b below y = Φ(x), but y₀ ≤ m x₀ + b + ε. See Figure (2.3.10). (This is true since the convex function Φ(x) is left-differentiable at x₀.) Thus we have

    E_μ[Φ(f) | 𝒢](ω) ≥ Φ(E_μ[f | 𝒢](ω)) − ε.

This is true for all ε > 0, so we have E_μ[Φ(f) | 𝒢] ≥ Φ(E_μ[f | 𝒢]) a.e. on B. Note that if Φ(0) = 0, then we even have E_μ[Φ(f) | 𝒢] ≥ Φ(E_μ[f | 𝒢]) a.e. on Ω.
(2.3.11) Corollary. Let Φ be an Orlicz function. If f ∈ L_Φ, then E_μ[f | 𝒢] ∈ L_Φ and ‖E_μ[f | 𝒢]‖_Φ ≤ ‖f‖_Φ. If f ∈ H_Φ, then E_μ[f | 𝒢] ∈ H_Φ.

Proof. Both parts follow from: M_Φ(E_μ[f | 𝒢]) ≤ M_Φ(f). This is a consequence of Theorem (2.3.10), using the convex function Φ(|x|).
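The modular inequality M_Φ(E_μ[f | 𝒢]) ≤ M_Φ(f) behind the corollary can be checked directly on a finite probability space. The data below (a convex Orlicz-type function, weights, and atoms of 𝒢) are an arbitrary illustration, not from the text:

```python
import numpy as np

Phi = lambda u: np.abs(u)**3            # convex with Phi(0) = 0 (an assumption)
mu = np.array([0.1, 0.2, 0.3, 0.4])     # probability weights
f  = np.array([2.0, -1.0, 0.5, 3.0])
blocks = [[0, 1], [2, 3]]               # atoms of the sub-sigma-algebra G

g = np.empty_like(f)
for b in blocks:
    g[b] = np.sum(f[b] * mu[b]) / np.sum(mu[b])   # E_mu[f | G] on each atom

M_f = float(np.sum(Phi(f) * mu))   # M_Phi(f)
M_g = float(np.sum(Phi(g) * mu))   # M_Phi(E_mu[f | G])
```

Jensen's inequality on each atom guarantees M_g ≤ M_f, which is exactly the contraction property used in the corollary.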
(2.3.12) Corollary. Let 𝒦 be a uniformly integrable set of 𝓕-measurable functions, and let { 𝒢ᵢ : i ∈ I } be a family of sub-σ-algebras of 𝓕. Then

    𝒦̃ = { E_μ[f | 𝒢ᵢ] : f ∈ 𝒦, i ∈ I }

is also uniformly integrable.

Proof. By Theorem (2.3.5), there is an Orlicz function Φ with Φ(u)/u → ∞ such that 𝒦 is L_Φ-bounded. By Corollary (2.3.11), the set 𝒦̃ has the same bound in L_Φ.

Complements
(2.3.13) (Uniform integrability and uniform absolute continuity.) Suppose (Ω, 𝓕, μ) has an atom: A ∈ 𝓕, 0 < μ(A) < ∞, and if B ⊆ A, then either μ(B) = μ(A) or μ(B) = 0. Let f_n = n 1_A. The set { f_n : n ∈ ℕ } is uniformly absolutely continuous, but not uniformly integrable.

(2.3.14) (The converse is false in (2.3.3(b)).) Let (Ω, 𝓕, μ) be [0,1] with Lebesgue measure. For k ≥ 3, let

    f_k = k 1_{(0, 1/(k log k))}.

Then the set 𝒦 = { f_k : k = 3, 4, ⋯ } is uniformly integrable. But if 𝒦 is dominated in distribution by f, that is, μ{|f_k| ≥ λ} ≤ μ{|f| ≥ λ} for all k, then we have ∫ |f| dμ = ∞, so f ∉ L₁ + L_∞ = L₁ (Clarke [1979]).

(2.3.15) (Mean convergence of nets.) Let J be a directed set, and suppose f_t ∈ L₁ for each t ∈ J. The net (f_t) is said to be uniformly integrable at infinity if, for every ε > 0, there exist s ∈ J and λ > 0 such that for all t ≥ s,

    ∫_{{|f_t|≥λ}} |f_t| dμ < ε.

Suppose μ(Ω) < ∞. Then: ‖f_t − f‖₁ → 0 if and only if f_t → f in measure and (f_t) is uniformly integrable at infinity (Neveu [1965a], p. 54).

Uniform integrability is connected to weak compactness. The classical theorem along these lines deals with finite measure spaces: Let μ be a finite measure, and let 𝒦 ⊆ L₁(μ). Then 𝒦 is relatively sequentially compact in the weak topology of L₁ if and only if 𝒦 is uniformly integrable. (For example, Dunford & Schwartz [1958], (IV.8.11).) Here is the more general version.
(2.3.16) Proposition. Let 𝒦 ⊆ L₁ + L_∞. Then 𝒦 is relatively sequentially compact in the weak topology σ(L₁ + L_∞, L₁ ∩ L_∞) if and only if 𝒦 is uniformly integrable and L₁ + L_∞-bounded.

Proof. Suppose 𝒦 is relatively sequentially compact. Then, for every measurable function h ∈ L₁ ∩ L_∞, the set { ∫ fh dμ : f ∈ 𝒦 } is a bounded set of scalars. Thus, by the uniform boundedness principle, { ‖f‖_{L₁+L_∞} : f ∈ 𝒦 } is bounded. We claim that 𝒦 is uniformly integrable. Suppose not. Then there exist a > 0, A_n ∈ 𝓕, f_n ∈ 𝒦 with μ(A_n) → 0 but

    ∫_{A_n} |f_n| dμ > a.

Taking a subsequence, we may assume that μ(A_n) ≤ 2⁻ⁿ and f_n converges for the weak topology σ(L₁ + L_∞, L₁ ∩ L_∞), say f_n → f. Now A = ⋃_{n=1}^∞ A_n has finite measure, and f_n → f weakly in L₁(A). By the classical theorem, ∫_{A_n} |f_n| dμ → 0 since μ(A_n) → 0, a contradiction.

Conversely, suppose 𝒦 is uniformly integrable and L₁ + L_∞-bounded. Now Ω is σ-finite, say A_k ↑ Ω, μ(A_k) < ∞. Let f_n ∈ 𝒦. For each k, the set { f_n 1_{A_k} : n ∈ ℕ } is uniformly integrable and bounded in L₁, so by the classical theorem, it is relatively weakly sequentially compact. We may piece together the limits, so there exists a measurable function g with

    f_n 1_{A_k} → g 1_{A_k}    weakly as n → ∞, for all k.

Now we claim that f_n → g in σ(L₁ + L_∞, L₁ ∩ L_∞). Let ε > 0. Let h ∈ L₁ ∩ L_∞. Choose λ so that

    ∫_{{|f_n−g|≥λ}} |f_n − g| dμ < ε/(3‖h‖_∞)

for all n. Then choose k so that

    λ ∫_{Ω\A_k} |h| dμ < ε/3.

Then choose N so that for all n > N, we have

    |∫_{A_k} (f_n − g) h dμ| < ε/3.

Then we have, for any n > N,

    |∫ (f_n − g) h dμ| ≤ |∫_{A_k} (f_n − g) h dμ| + ∫_{{|f_n−g|≥λ}\A_k} |f_n − g| |h| dμ + ∫_{{|f_n−g|<λ}\A_k} |f_n − g| |h| dμ
                      ≤ ε/3 + ‖h‖_∞ · ε/(3‖h‖_∞) + λ ∫_{Ω\A_k} |h| dμ ≤ ε.

This shows that f_n → g in σ(L₁ + L_∞, L₁ ∩ L_∞).
2.3. Uniform integrability
79
Another variant uses this definition: K is uniformly R₀ if for every γ > 0 there exists M > 0 such that ∫_{{|f|≥M}} |f| dµ < γ for all f ∈ K. Obviously a singleton {f} is uniformly R₀ if and only if f ∈ R₀.
(2.3.17) Let K ⊆ R₀. Then K is weakly relatively sequentially compact in the weak topology σ(R₀, L₁ ∩ L∞) if and only if K is bounded in L₁ + L∞ norm, K is uniformly integrable, and K is uniformly R₀.
(2.3.18) (Weak compactness in c₀.) Describe weak compactness in c₀ by taking Ω = ℕ in the preceding.
(2.3.19) (Conditional expectation counterexample.) Let (Ω, F, µ) be ℝ with Lebesgue measure, let f = 1_{[0,1]}, and let G = {Ω, ∅}. Then there is no g ∈ (L₁ + L∞)(Ω, G, µ) with
∫_A g dµ = ∫_A f dµ
for all A ∈ G.
(2.3.20) (Alternative construction of the conditional expectation.) Let (Ω, F, µ) be a σ-finite measure space, and let G be a sub-σ-algebra of F.
Let B be the set of σ-finiteness of µ with respect to G. Then there is a probability measure P on F with Radon–Nikodym derivative p = dP/dµ such that {p > 0} = B. We have
E_µ[f | G] = E_P[f/p | G] / E_P[1/p | G].
This could be used as the definition of the conditional expectation in an infinite measure space.
(2.3.21) (Uniformly integrable martingales.) Let (Ω, F, P) be a probability space. Let (F_t)_{t∈J} be a stochastic basis indexed by a directed set J. (i) If (X_t) is a uniformly integrable martingale, then X_t converges in L₁ to a random variable X, and X_t = E^{F_t}[X]. (ii) If (X_t) is a uniformly integrable submartingale, then X_t converges in L₁ to a random variable X, and X_t ≤ E^{F_t}[X].
Proof. (i) X_t converges stochastically to a random variable X; and since it is uniformly integrable, it converges to X in L₁ (1.1.3, 1.3.1, and 1.4.7). By Fatou's lemma, X ∈ L₁. Now if s ≤ t and A ∈ F_s, then E[X_s 1_A] = E[X_t 1_A] by the martingale property. Taking the limit along t, we get
E[X_s 1_A] = E[X 1_A]
for all A ∈ F_s. That is, X_s = E^{F_s}[X].
(ii) Proofs for submartingales are similar.
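To illustrate (i) in a concrete case (an illustrative sketch, not from the text): take Ω = [0,1) with the dyadic filtration, X(ω) = ω, and X_n = E[X | F_n], the average of X over each dyadic interval of generation n. This martingale is uniformly integrable (it is bounded), and ‖X_n − X‖₁ = 2^(−n)/4 decreases to 0:

```python
# Dyadic conditional expectations of X(w) = w on [0,1): a uniformly
# integrable martingale converging to X in L1 (grid discretization).
def l1_err(n, m=2**14):
    """Approximate ||X_n - X||_1 on a grid of m points of [0,1)."""
    err = 0.0
    for i in range(m):
        w = (i + 0.5) / m
        k = int(w * 2**n)            # index of the dyadic interval of generation n
        xn = (k + 0.5) / 2**n        # E[X | F_n] = midpoint of that interval
        err += abs(xn - w) / m
    return err

errs = [l1_err(n) for n in range(1, 8)]
assert all(errs[i + 1] < errs[i] for i in range(len(errs) - 1))
assert errs[-1] < 0.01               # ||X_7 - X||_1 = 2**-7/4, about 0.002
```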
(2.3.22) (Orlicz-bounded martingales.) Suppose (X_t) is a uniformly integrable martingale. By the criterion of de la Vallée Poussin, there is an Orlicz function Φ with lim Φ(u)/u = ∞ such that E[Φ(|X_t|)] is bounded. (Modular boundedness.) On the other hand, if (X_t) is a martingale, and there is an Orlicz function Φ with Φ(u)/u → ∞ such that E[Φ(|X_t|)] is bounded, then (X_t) is uniformly integrable. What is less well known is that Φ(|X_t|) is also uniformly integrable. To see this, observe that since Φ(|X_t|) converges stochastically to Φ(|X|), by Fatou's lemma we have E[Φ(|X|)] < ∞. But by Jensen's inequality,
Φ(|X_t|) = Φ(|E^{F_t}[X]|) ≤ E^{F_t}[Φ(|X|)].
The last term is uniformly integrable, so Φ(|X_t|) is also uniformly integrable.
(2.3.23) (Orlicz norm and Orlicz modular convergence of martingales.) Let Φ be an Orlicz function with Φ(u)/u → ∞. Let (X_t) be a martingale or a positive submartingale. (i) If (X_t) is bounded in Φ-modular:
sup_t M_Φ(X_t) < ∞,
then (X_t) converges in Φ-modular. (ii) Suppose also that Φ satisfies condition (Δ₂). If (X_t) is L_Φ-bounded, then (X_t) converges in L_Φ-norm.
Proof. The martingale case follows from the positive submartingale case by decomposition into positive and negative parts. So assume (X_t) is a positive submartingale. The limit X of X_t exists in L₁. Then Φ(X_t) converges stochastically to Φ(X), and by Fatou's lemma, Φ(X) ∈ L₁. By (2.3.21) we have X_t ≤ E^{F_t}[X], so by Jensen's inequality Φ(X_t) ≤ E^{F_t}[Φ(X)]. As in (2.3.22), Φ(X_t) is uniformly integrable. Thus Φ(X_t) → Φ(X) in L₁. But Φ is convex, so
Φ(x − y) ≤ Φ(x) − Φ(y)  for x ≥ y ≥ 0.
Therefore Φ(|X_t − X|) ≤ |Φ(X_t) − Φ(X)|. Thus Φ(|X_t − X|) converges to zero in L₁; that is, X_t converges to X in Φ-modular.
(ii) If (Δ₂) holds, modular convergence is equivalent to L_Φ-convergence, and modular boundedness is equivalent to L_Φ-boundedness (2.1.18).
Condition (Δ₂) cannot be omitted in (ii). See Mogyoródi [1978] and Bui [1987].
Remarks. Conditional expectations in infinite measure spaces can be found in Chow [1960b] and Dellacherie & Meyer [1978].
3
Inequalities
In this chapter we will prove several different kinds of inequalities. We begin with the "three-function inequality," which relates weak inequalities, such as
µ{|g| ≥ λ} ≤ (1/λ) ∫_{{|g|≥λ}} |f| dµ,
to strong inequalities, such as ‖g‖_p ≤ C‖f‖_p. We will replace the space L_p with various Orlicz spaces. The main result (Theorem (3.1.2)) is called the "three-function inequality" since the most general version deals with inequalities
(W)  µ{|g| ≥ λ} ≤ (1/λ) ∫_{{|h|≥λ}} |f| dµ
and
(S)  b M_ξ(g/b) ≤ a M_Φ(f/a) + a M_ξ(h/b)
involving three functions f, g, h. Section 3.2 studies martingale transforms, and proves the basic maximal inequality due to Burkholder. Section 3.3 deals with some elementary "prophet inequalities." These compare the gain that can be made by a gambler (who knows only the present and the past) to the gain that can be made by a prophet (who knows also the future).
3.1. The three-function inequality
The proofs of ergodic and martingale theorems are similar, and to unify them is an old problem. A. & C. Ionescu Tulcea [1963] gave a common generalization of two vector-valued maximal theorems: that of Chacon, and the martingale theorem. In other places the theories converge more obviously. By Fubini's theorem, there is a passage from weak L₁ inequalities to strong maximal inequalities involving either the L_p norm (1 < p < ∞) or the L log⁺ L norm. This is used in ergodic and martingale theory, and in harmonic analysis, where it originated. In this section, this argument is extended to general Orlicz spaces. There is also a reverse inequality.
(3.1.1) Let (Ω, F, µ) be a σ-finite measure space. Let Φ be an Orlicz function, with derivative φ and conjugate Ψ. We will be interested in the function ξ defined by
(3.1.1a)  ξ(u) = Ψ(φ(u)).
By the case of equality in Young's inequality (2.1.4), for values of u with Φ(u) < ∞ the function may also be written
ξ(u) = u φ(u) − Φ(u).
For simplicity we will normally assume in this section that Φ(u) < ∞ for all u. In many cases, ξ will again be an Orlicz function, but even when it is not, we will use the notation
M_ξ(f) = ∫ ξ(|f|) dµ.
The expression ‖f‖_ξ will be defined by the Luxemburg formula ‖f‖_ξ = inf { a > 0 : M_ξ(f/a) ≤ 1 }. If ξ is not an Orlicz function, then ‖·‖_ξ need not satisfy the triangle inequality.
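For the model case Φ(u) = u^p/p (with φ(u) = u^(p−1) and conjugate Ψ(v) = v^q/q, q = p/(p−1)), the two expressions for ξ can be compared numerically. A sketch; these particular power functions are the standard examples, not taken from the text:

```python
p = 3.0
q = p / (p - 1)                      # conjugate index
Phi = lambda u: u**p / p             # Orlicz function
phi = lambda u: u**(p - 1)           # its derivative
Psi = lambda v: v**q / q             # conjugate Orlicz function

for u in [0.3, 1.0, 2.5]:
    lhs = Psi(phi(u))                # definition (3.1.1a): xi(u) = Psi(phi(u))
    rhs = u * phi(u) - Phi(u)        # case of equality in Young's inequality
    assert abs(lhs - rhs) < 1e-9
    # for Phi(u) = u^p/p one gets xi(u) = ((p-1)/p) u^p
    assert abs(lhs - (p - 1) / p * u**p) < 1e-9
```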
(3.1.2) Theorem. Let Φ be a finite Orlicz function with Φ(u)/u → ∞, and let ξ be defined by (3.1.1a). Let f, g, h be measurable functions. Suppose that
(W)  µ{|g| ≥ λ} ≤ (1/λ) ∫_{{|h|≥λ}} |f| dµ
for all λ > 0. Then for a, b > 0, we have
(S)  b M_ξ(g/b) ≤ a M_Φ(f/a) + a M_ξ(h/b).
Proof. Only the absolute values of the functions f, g, h enter into the theorem, so we may assume that these functions are all nonnegative. We may assume that M_Φ(f/a) and M_ξ(h/b) are both finite. Note that φ is continuous except at countably many points of (0, ∞), since it is nondecreasing.
Let C_φ be the set of points of continuity of φ, and let C_ψ be the set of points of continuity of ψ. First, suppose that g and h are integrable simple functions with values in T = bC_ψ ∪ {0}. (Note that T is a dense set in ℝ₊.) Then, for each ω ∈ Ω, we have g(ω)/b ∈ C_ψ ∪ {0}. If y ∈ C_ψ, then we have
y ≤ φ(g(ω)/b)  if and only if  ψ(y) ≤ g(ω)/b,
and the same equivalence holds for almost all y ∈ (0, ∞). By Fubini's theorem, the sets
{ (ω, y) ∈ Ω × (0, ∞) : y ≤ φ(g(ω)/b) },
{ (ω, y) ∈ Ω × (0, ∞) : ψ(y) ≤ g(ω)/b }
agree except for a set of measure zero in the product Ω × (0, ∞). Similarly, the sets
{ (ω, y) ∈ Ω × (0, ∞) : y ≤ φ(h(ω)/b) },
{ (ω, y) ∈ Ω × (0, ∞) : ψ(y) ≤ h(ω)/b }
agree except for a set of measure zero.
Now we have:
b M_ξ(g/b) = b ∫ ξ(g/b) dµ = b ∫ Ψ(φ(g/b)) dµ = b ∫ ∫₀^{φ(g/b)} ψ(y) dy dµ
[apply Fubini's theorem to interchange the order]
 = b ∫₀^∞ ∫_{{g ≥ bψ(y)}} dµ ψ(y) dy = ∫₀^∞ µ{g ≥ bψ(y)} bψ(y) dy
[use the hypothesis]
 ≤ ∫₀^∞ ∫_{{h ≥ bψ(y)}} f dµ dy
[apply Fubini's theorem to exchange back]
 = ∫ ∫₀^{φ(h/b)} dy f dµ = ∫ φ(h/b) f dµ = a ∫ φ(h/b) (f/a) dµ
[apply Young's inequality (2.1.4)]
 ≤ a ∫ Φ(f/a) dµ + a ∫ Ψ(φ(h/b)) dµ = a M_Φ(f/a) + a M_ξ(h/b).
For the general case we will approximate g and h by integrable simple functions with values in T in the usual way. Since it is used below, we spell out the approximation. Given n, choose t₀ = 0 < t₁ < ··· < t_m with t_i ∈ T, t_i − t_{i−1} < 2⁻ⁿ, and t_m > 2ⁿ. Define g_n by:
{g_n = t_i} = {t_i ≤ g < t_{i+1}},  {g_n = t_m} = {t_m ≤ g},
and similarly for h_n approximating h. Thus we will have g_n ↑ g and h_n ↑ h. Now f, g, h satisfy the hypothesis
µ{g ≥ λ} ≤ (1/λ) ∫_{{h≥λ}} f dµ
for all λ > 0. We claim that f, g_n, h_n satisfy the same hypothesis. Consider λ > 0. If t_{i−1} < λ ≤ t_i, then
{g_n ≥ λ} = {g_n ≥ t_i} = {g ≥ t_i},  {h_n ≥ λ} = {h_n ≥ t_i} = {h ≥ t_i}.
Therefore
µ{g_n ≥ λ} = µ{g ≥ t_i} ≤ (1/t_i) ∫_{{h≥t_i}} f dµ = (1/t_i) ∫_{{h_n≥λ}} f dµ ≤ (1/λ) ∫_{{h_n≥λ}} f dµ.
This verifies that f, g_n, h_n also satisfy the hypothesis. Therefore the previous case yields
b M_ξ(g_n/b) ≤ a M_Φ(f/a) + a M_ξ(h_n/b).
Now ξ is left-continuous, so we may pass to the limit to obtain:
b M_ξ(g/b) ≤ a M_Φ(f/a) + a M_ξ(h/b).
To illustrate the result, we display a number of consequences. First take g = h: if
(W_g)  µ{|g| ≥ λ} ≤ (1/λ) ∫_{{|g|≥λ}} |f| dµ
for all λ > 0, we have
(S_g)  b M_ξ(g/b) ≤ a M_Φ(f/a) + a M_ξ(g/b).
With an additional assumption, we get more:
(3.1.3) Theorem. Let Φ be an Orlicz function, and let ξ be defined as in (3.1.1a). Let f and g be measurable functions related by (W_g). If
(F_g)  µ{|g| ≥ λ} < ∞  for all λ > 0,
then for any a < b,
(S′_g)  M_ξ(g/b) ≤ (a/(b−a)) M_Φ(f/a).
Proof. We may assume that f and g are nonnegative functions, since all quantities involved depend only on the absolute values of the functions. First, consider the special case where µ{g > 0} < ∞ (for example, µ(Ω) < ∞) and g is bounded, say g ≤ C. Then of course
M_ξ(g/b) = ∫_{{g>0}} ξ(g/b) dµ ≤ ξ(C/b) µ{g > 0} < ∞.
Thus we may solve (S_g) to get the required conclusion. Next, suppose only that g is bounded, but possibly µ{g > 0} = ∞. For ε > 0, consider g_ε = (g − ε)⁺. It, too, is bounded; and µ{g_ε > 0} = µ{g > ε} < ∞. Also, g_ε satisfies a weak inequality of the form (W_g):
µ{g_ε ≥ λ} = µ{g ≥ λ + ε} ≤ (1/(λ+ε)) ∫_{{g≥λ+ε}} f dµ ≤ (1/λ) ∫_{{g_ε≥λ}} f dµ,
as required. By the previous case, we have
M_ξ(g_ε/b) ≤ (a/(b−a)) M_Φ(f/a).
Now as ε ↓ 0, we have g_ε ↑ g, so by the monotone convergence theorem (and left-continuity of ξ) we get the conclusion (S′_g). Finally, consider general g. For a constant C > 0, let g_C = g ∧ C. If λ > C,
µ{g_C ≥ λ} = 0 ≤ (1/λ) ∫_{{g_C≥λ}} f dµ,
and if 0 < λ ≤ C,
µ{g_C ≥ λ} = µ{g ≥ λ} ≤ (1/λ) ∫_{{g≥λ}} f dµ = (1/λ) ∫_{{g_C≥λ}} f dµ.
Thus by the previous case, we have
M_ξ(g_C/b) ≤ (a/(b−a)) M_Φ(f/a).
Now as C ↑ ∞ we have g_C ↑ g, so by the monotone convergence theorem we get (S′_g).
(3.1.4) Replace f by cf: if µ{|g| ≥ λ} < ∞ and
µ{|g| ≥ λ} ≤ (c/λ) ∫_{{|g|≥λ}} |f| dµ
for all λ > 0, then for 0 < a < b, we have
M_ξ(g/b) ≤ (a/(b−a)) M_Φ(cf/a).
Or, replace a by ac to obtain, for 0 < ac < b,
M_ξ(g/b) ≤ (ac/(b−ac)) M_Φ(f/a).
(3.1.5) Take f = g = h. Assume µ{|f| ≥ λ} < ∞ for all λ > 0. The inequality
µ{|f| ≥ λ} ≤ (1/λ) ∫_{{|f|≥λ}} |f| dµ
is clearly true for all λ > 0, so by (3.1.3), if a < b, we have
M_ξ(f/b) ≤ (a/(b−a)) M_Φ(f/a).
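As a numerical sanity check (a sketch, not from the text): take Φ(u) = u²/2, so that ξ(u) = uφ(u) − Φ(u) = u²/2 as well, and f(t) = t^(−1/4) on (0,1). Then M_Φ(f/a) = 1/a² and M_ξ(f/b) = 1/b² exactly, and the inequality above is visible; the midpoint rule reproduces it:

```python
# Check (3.1.5) for Phi(u) = u^2/2 (so xi = Phi), f(t) = t**-0.25 on (0,1).
n = 100_000
dt = 1.0 / n
ts = [(i + 0.5) * dt for i in range(n)]          # midpoint rule on (0,1)
f = [t**-0.25 for t in ts]

def modular(scale):
    """M(f/scale) for the modular with integrand u^2/2."""
    return sum((v / scale)**2 / 2 for v in f) * dt

a, b = 1.0, 2.0
M_phi_a = modular(a)
M_xi_b = modular(b)
assert M_xi_b <= a / (b - a) * M_phi_a           # the strong inequality
assert abs(M_phi_a - 1 / a**2) < 5e-3            # exact value 1/a^2
assert abs(M_xi_b - 1 / b**2) < 5e-3             # exact value 1/b^2
```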
(3.1.6) Proposition. Let f and g be measurable functions related by
(W_f)  µ{|g| ≥ λ} ≤ (1/λ) ∫_{{|f|≥λ}} |f| dµ
for all λ > 0. Suppose either
(a)  (F_f)  µ{|f| ≥ λ} < ∞  for all λ > 0; or
(b)  Φ satisfies condition (Δ₂).
Then (S′_g) holds for any a < b.
Proof. Clearly
µ{|f| ≥ λ} ≤ (1/λ) ∫_{{|f|≥λ}} |f| dµ,
so by the three-function inequality (3.1.2),
(S_f)  b M_ξ(f/b) ≤ a M_Φ(f/a) + a M_ξ(f/b).
Assume first that µ{|f| ≥ λ} < ∞ for all λ > 0. By Theorem (3.1.3), we have
(S′_f)  M_ξ(f/b) ≤ (a/(b−a)) M_Φ(f/a).
Thus by the three-function inequality (3.1.2), applied with h = f,
b M_ξ(g/b) ≤ a M_Φ(f/a) + a M_ξ(f/b) ≤ a M_Φ(f/a)(1 + a/(b−a)) = (ab/(b−a)) M_Φ(f/a),
which gives (S′_g). Now suppose Φ satisfies condition (Δ₂), say Φ(2u) ≤ CΦ(u) for all u > 0. Then
CΦ(u) ≥ Φ(2u) ≥ u φ(u),
so ξ(u) = uφ(u) − Φ(u) ≤ CΦ(u). Thus M_ξ(f/a) ≤ C M_Φ(f/a). Now if M_Φ(f/a) = ∞, then (S′_g) holds trivially. On the other hand, if M_Φ(f/a) < ∞, then M_ξ(f/b) < ∞ and we may solve (S_f) to obtain (S′_f). The remainder of the proof is the same as in the previous case.
(3.1.7) Replace g by g/c: if µ{|f| ≥ λ} < ∞ and
µ{|g| ≥ cλ} ≤ (1/λ) ∫_{{|f|≥λ}} |f| dµ
for all λ > 0, then for 0 < ac < b, we have
M_ξ(g/b) ≤ (ac/(b−ac)) M_Φ(f/a).
(3.1.8) Take f = g: if µ{|f| ≥ λ} < ∞ and
µ{|f| ≥ λ} ≤ (1/λ) ∫_{{|f|≥λ}} |f| dµ
for all λ > 0, then for 0 < a < b, we have
M_ξ(f/b) ≤ (a/(b−a)) M_Φ(f/a).
Note that we obtain the same conclusion (S′_g) in both (3.1.3) and (3.1.6). Which approach is better? In the typical application we will have g ≥ f, so that, assuming (F_g), (3.1.3) is a stronger result than (3.1.6). For example, we will apply these results below where g is the maximal function constructed from f ≥ 0 by
g = sup_n T_n f,
where the T_n are positive operators on function spaces and T₀ = I, so that f ≤ g. Now if f ≤ g, then (W_g) is easier to verify than (W_f). On the other hand, in infinite measure spaces, integrability of f implies (F_f) but not (F_g). It is therefore of interest to know that in many cases (W_g) implies (W_f) with different constants. (See, for example, (8.2.5).)
The inequalities have all been stated using non-strict inequalities such as µ{|g| ≥ λ}. They are equivalent to the same formulas with strict inequalities such as µ{|g| > λ}. This is because of the relations
µ{|g| > λ} = lim_n µ{|g| ≥ λ + 1/n},  µ{|g| ≥ λ} = lim_n µ{|g| > λ − 1/n}.
For example, by the monotone and dominated convergence theorems,
µ{|g| ≥ λ} ≤ (1/λ) ∫_{{|h|≥λ}} |f| dµ  for all λ > 0
is equivalent to
µ{|g| > λ} ≤ (1/λ) ∫_{{|h|>λ}} |f| dµ  for all λ > 0.
In the case when Φ and the related function ξ are Orlicz functions, our theorem tells us about Orlicz spaces and Orlicz hearts.
(3.1.9) Corollary. Let Φ be an Orlicz function, and suppose that the function ξ defined by (3.1.1a) is also an Orlicz function. Then:
(a) L_Φ ⊆ L_ξ; H_Φ ⊆ H_ξ.
Suppose f and g are measurable functions with µ{|g| ≥ λ} < ∞ and
(W_g)  µ{|g| ≥ λ} ≤ (1/λ) ∫_{{|g|≥λ}} |f| dµ
for all λ > 0. Then:
(b) f ∈ L_Φ ⟹ g ∈ L_ξ.
(c) f ∈ H_Φ ⟹ g ∈ H_ξ.
Proof. (a) If f ∈ L_Φ, then there is a > 0 so that M_Φ(f/a) < ∞. By (2.1.19b), ξ(u) ≤ Φ(2u), so for b ≥ 2a we have M_ξ(f/b) < ∞, and so f ∈ L_ξ. Similarly, if f ∈ H_Φ, then for every a > 0 we have M_Φ(f/a) < ∞. Thus for any b > 0, there is a with 0 < 2a ≤ b, and we have M_ξ(f/b) < ∞, so f ∈ H_ξ. For parts (b) and (c), proceed as in part (a), using (3.1.3).
(3.1.10) Corollary. Suppose f, g are measurable functions such that µ{|g| ≥ λ} < ∞ and
µ{|g| ≥ λ} ≤ (1/λ) ∫_{{|g|≥λ}} |f| dµ  for all λ > 0.
(a) Let 1 < p < ∞. If f ∈ L_p, then g ∈ L_p and
‖g‖_p ≤ (p/(p−1)) ‖f‖_p.
(b) Let k ≥ 1. If f ∈ R_k, then g ∈ R_{k−1}. If f ∈ L log^k L, then g ∈ L log^{k−1} L and
‖g‖_{L log^{k−1} L} ≤ ((k+1)/k) ‖f‖_{L log^k L}.
(c) If f ∈ R₁, then g ∈ R₀ and
‖g‖_{Φ₀} ≤ 2 ‖f‖_{L log L},
where Φ₀ is as in (2.2.16a). If f ∈ L log L, then g ∈ L₁ + L∞ and
‖g‖_{L₁+L∞} ≤ 2 + 2 ∫ |f| log⁺ |f| dµ.
Proof. (a) Let 1 < p < ∞. Consider the Orlicz function Φ(u) = u^p/p. Then φ(u) = u^{p−1} and ξ(u) = ((p−1)/p) u^p. Thus we have
M_Φ(f) = (1/p) ‖f‖_p^p,  M_ξ(f) = ((p−1)/p) ‖f‖_p^p.
Now if f ∈ L_p, we may use (3.1.3) with a = 1 and b = p/(p−1) to obtain the inequality stated.
(b) Let k ≥ 1. Consider the Orlicz function defined in (2.2.16a): Φ(u) = Φ_k(u) = u (log⁺ u)^k. Then
φ(u) = (k + log u)(log⁺ u)^{k−1}  for u > 1,
and
ξ(u) = k u (log⁺ u)^{k−1} = k Φ_{k−1}(u).
Now if f ∈ L log^k L, take
a = ‖f‖_{L log^k L},  b = ((k+1)/k) a.
Then M_Φ(f/a) ≤ 1, so by (3.1.3) we have M_ξ(g/b) ≤ a/(b−a) = k, so that M_{Φ_{k−1}}(g/b) = (1/k) M_ξ(g/b) ≤ 1. Thus
‖g‖_{L log^{k−1} L} ≤ b = ((k+1)/k) ‖f‖_{L log^k L}.
(c) The first inequality is proved in the same way as (b): if Φ = Φ₁, then ξ = Φ₀. Recall that (since Φ₀ is not an Orlicz function) ‖·‖_{Φ₀} is not a norm. For the second inequality, begin with (3.1.3):
M_{Φ₀}(g/b) ≤ (a/(b−a)) M_{Φ₁}(f/a).
Then observe that
‖g‖_{L₁+L∞} ≤ b + ∫_{{|g|>b}} |g| dµ = b + b M_{Φ₀}(g/b) ≤ b + b (a/(b−a)) M_{Φ₁}(f/a).
Finally, set a = 1 and b = 2.
The constants p/(p−1) in part (a), (k+1)/k in part (b), and 2 in the first inequality of part (c) are the best possible in this result. (See (3.1.14) and (3.1.15), below.)
Reverse inequality
Theorem (3.1.2) has a companion "reverse" theorem. We take a = b and f = h. If there is c so that
µ{|g| ≥ λ} ≥ (c/λ) ∫_{{|f|≥λ}} |f| dµ
for all λ > 0, then
M_ξ(g/a) ≥ c M_Φ(f/a) + c M_ξ(f/a).
It is enough to prove this in the case a = 1, since we may replace λ by aλ. In fact, we will prove a more general result with two terms on the right. (It is not hard to see that it could be done with three or more terms on the right, but our applications use at most two.) Note that the constants c₁, c₂ are not required to be positive.
(3.1.11) Theorem. Let Φ be a finite Orlicz function with Φ(u)/u → ∞, and let ξ be defined by (3.1.1a). Let f₁, f₂, g be measurable functions. Suppose that there exist constants c₁, c₂ so that
λ µ{|g| ≥ λ} ≥ c₁ ∫_{{|f₁|≥λ}} |f₁| dµ + c₂ ∫_{{|f₂|≥λ}} |f₂| dµ
for all λ > 0. Then we have
M_ξ(g) ≥ c₁ M_Φ(f₁) + c₁ M_ξ(f₁) + c₂ M_Φ(f₂) + c₂ M_ξ(f₂).
Proof. This proof is similar to the proof of Theorem (3.1.2). (We take a = 1.) Again, we may assume that f₁, f₂, and g are nonnegative. Suppose first that f₁, f₂, and g are functions having countably many values, all in the set T = C_ψ ∪ {0}, as before. Then
M_ξ(g) = ∫ ξ(g) dµ = ∫ Ψ(φ(g)) dµ = ∫ ∫₀^{φ(g)} ψ(y) dy dµ
[apply Fubini's theorem to exchange the order]
 = ∫₀^∞ ∫_{{g ≥ ψ(y)}} dµ ψ(y) dy = ∫₀^∞ µ{g ≥ ψ(y)} ψ(y) dy
[use the hypothesis]
 ≥ c₁ ∫₀^∞ ∫_{{f₁ ≥ ψ(y)}} f₁ dµ dy + c₂ ∫₀^∞ ∫_{{f₂ ≥ ψ(y)}} f₂ dµ dy
[apply Fubini's theorem to exchange back]
 = c₁ ∫ ∫₀^{φ(f₁)} dy f₁ dµ + c₂ ∫ ∫₀^{φ(f₂)} dy f₂ dµ
 = c₁ ∫ φ(f₁) f₁ dµ + c₂ ∫ φ(f₂) f₂ dµ
[apply the case of equality in Young's inequality]
 = c₁ ∫ Φ(f₁) dµ + c₁ ∫ Ψ(φ(f₁)) dµ + c₂ ∫ Φ(f₂) dµ + c₂ ∫ Ψ(φ(f₂)) dµ
 = c₁ M_Φ(f₁) + c₁ M_ξ(f₁) + c₂ M_Φ(f₂) + c₂ M_ξ(f₂).
For the case of general f₁, f₂, and g, approximate as follows. Given n, let countably many values ··· < t₋₂ < t₋₁ < t₀ be chosen so that t_i ∈ C_ψ, t₀ > 2ⁿ, t_i − t_{i−1} < 2⁻ⁿ, lim_{i→−∞} t_i = 0, and t_i/t_{i−1} < 1 + 2⁻ⁿ. Let
{g_n = t_i} = {t_i ≤ g < t_{i+1}},  {g_n = t₀} = {t₀ ≤ g},  {g_n = 0} = {g = 0},
so that g_n ≤ g and g_n → g. Similarly, f_{1n} should be constructed to approximate f₁ and f_{2n} to approximate f₂ (from below when the corresponding constant is nonnegative, from above when it is negative). Then for t_{i−1} < λ ≤ t_i we have {g_n ≥ λ} = {g ≥ t_i}, and similarly for f₁ and f₂. If c₁ ≥ 0, then
c₁/t_i ≥ c₁/(t_{i−1}(1 + 2⁻ⁿ)) ≥ c₁/(λ(1 + 2⁻ⁿ)),
while if c₁ < 0, then
c₁/t_i ≥ c₁/λ;
similarly for c₂. Thus in the case where c₁, c₂ ≥ 0, we have
µ{g_n ≥ λ} = µ{g ≥ t_i} ≥ (c₁/t_i) ∫_{{f₁≥t_i}} f₁ dµ + (c₂/t_i) ∫_{{f₂≥t_i}} f₂ dµ
 ≥ (c₁/(λ(1+2⁻ⁿ))) ∫_{{f_{1n}≥λ}} f_{1n} dµ + (c₂/(λ(1+2⁻ⁿ))) ∫_{{f_{2n}≥λ}} f_{2n} dµ,
since f_{1n} ≤ f₁, f_{2n} ≤ f₂, and {f_{1n} ≥ λ} = {f₁ ≥ t_i}, {f_{2n} ≥ λ} = {f₂ ≥ t_i}. So the previous case applies to f_{1n}, f_{2n}, g_n with the constants c_i/(1 + 2⁻ⁿ), and
M_ξ(g_n) ≥ (c₁/(1+2⁻ⁿ)) M_Φ(f_{1n}) + (c₁/(1+2⁻ⁿ)) M_ξ(f_{1n}) + (c₂/(1+2⁻ⁿ)) M_Φ(f_{2n}) + (c₂/(1+2⁻ⁿ)) M_ξ(f_{2n}).
Take the limit as n → ∞ to obtain the conclusion. If c₁ ≥ 0 and c₂ < 0 (approximating f₂ from above), we have similarly
M_ξ(g_n) ≥ (c₁/(1+2⁻ⁿ)) M_Φ(f_{1n}) + (c₁/(1+2⁻ⁿ)) M_ξ(f_{1n}) + c₂ M_Φ(f_{2n}) + c₂ M_ξ(f_{2n}).
Again, take the limit as n → ∞ to obtain the conclusion. Cases with c₁ < 0 are similar.
Theorem (3.1.11) has many simple consequences. The following will be included here. Others will be seen below.
(3.1.12) Corollary. Let Φ be a finite Orlicz function with Φ(u)/u → ∞, and suppose that the function ξ defined by (3.1.1a) is also an Orlicz function. Suppose f and g are measurable functions satisfying
µ{|g| ≥ λ} ≥ (c/λ) ∫_{{|f|≥λ}} |f| dµ
for all λ > 0. Assume either (i) Φ satisfies (Δ₂), or (ii) µ{|f| ≥ λ} < ∞ for all λ > 0. Then:
(a) g ∈ L_ξ ⟹ f ∈ L_Φ.
(b) g ∈ H_ξ ⟹ f ∈ H_Φ.
Proof. We prove part (a); the other part is left to the reader. If g ∈ L_ξ, then there is a > 0 such that M_ξ(g/a) < ∞. But then we have
M_Φ(f/a) ≤ M_Φ(f/a) + M_ξ(f/a) ≤ (1/c) M_ξ(g/a) < ∞,
so f ∈ L_Φ.
Maximal inequalities for stopped processes
The three-function inequalities may be used to prove refinements of the basic maximal inequality (1.1.7). If (X_n) is an adapted process, let
X*_N = max_{1≤n≤N} |X_n|,  X* = sup_N X*_N.
Let λ > 0. If A_N = {X*_N ≥ λ}, let the stopping time σ be defined by
σ(ω) = inf{ n : 1 ≤ n ≤ N, |X_n(ω)| ≥ λ }  if ω ∈ A_N,  σ(ω) = N  otherwise.
The inequality
λ µ(A_N) ≤ ∫_{A_N} |X_σ| dµ
was seen above (1.1.7a). By (3.1.3), if (X_σ)_{σ∈Σ} is bounded in L_Φ, and µ(A_N) < ∞ for all λ > 0, then (X*_N)_{N∈ℕ} is bounded in L_ξ, so we have:
(3.1.13) Theorem. Let Φ and ξ be as before. (i) If (X_σ)_{σ∈Σ} is an integrable net bounded in L_Φ, then X* ∈ L_ξ. (ii) If (X_n) is a positive submartingale bounded in L_Φ, then X* ∈ L_ξ. The estimates in (3.1.10) apply.
Proof. For (ii), observe that if (X_n) is a positive submartingale, then M_Φ(X_σ) is an increasing function of σ ∈ Σ, since (by Jensen's inequality) Φ(X_n/a) is a submartingale. Thus we have: if (X_n)_{n∈ℕ} is bounded in L_Φ, then (X_σ)_{σ∈Σ} is also bounded in L_Φ, so X* ∈ L_ξ.
Complements
(3.1.14) (Best constant.) The constant p/(p−1) in Corollary (3.1.10(a)) cannot be improved. Let (Ω, F, µ) be (0,1) with Lebesgue measure. Let α satisfy −1/p < α < 0. Define g(t) = t^α and f(t) = (α+1) t^α. Then
µ{|g| ≥ λ} = µ{t^α ≥ λ} = µ{t ≤ λ^{1/α}} = µ((0, λ^{1/α}]) = λ^{1/α}
and
∫_{{|g|≥λ}} f dµ = (α+1) ∫₀^{λ^{1/α}} t^α dt = λ^{(α+1)/α} = λ µ{|g| ≥ λ}.
But also
‖g‖_p^p = ∫₀¹ t^{αp} dt < ∞,
since αp > −1. Thus ‖g‖_p/‖f‖_p = 1/(α+1). Finally,
sup { 1/(α+1) : −1/p < α < 0 } = p/(p−1).
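In code (a sketch): the closed forms above give ‖g‖_p/‖f‖_p = 1/(α+1), which approaches p/(p−1) as α decreases toward −1/p:

```python
p = 2.0
alphas = [-0.40, -0.45, -0.49, -0.499]           # decreasing toward -1/p = -0.5
ratios = []
for alpha in alphas:
    norm_g = (1.0 / (alpha * p + 1))**(1 / p)    # ||g||_p on (0,1), g(t) = t^alpha
    norm_f = (alpha + 1) * norm_g                # f = (alpha + 1) g
    ratios.append(norm_g / norm_f)               # equals 1/(alpha + 1)
assert all(abs(r - 1 / (a + 1)) < 1e-12 for r, a in zip(ratios, alphas))
assert all(ratios[i] < ratios[i + 1] for i in range(len(ratios) - 1))
assert abs(ratios[-1] - p / (p - 1)) < 0.01      # limit p/(p-1) = 2
```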
(3.1.15) (Best constant.) The constant (k+1)/k in Corollary (3.1.10(b)) cannot be improved. Let (Ω, F, µ) be [0, ∞) with Lebesgue measure. Let α = −1/(k+1), so −1 < α < 0. Then let g(t) = t^α and f(t) = (1+α) t^α. Check that
µ{|g| ≥ λ} = (1/λ) ∫_{{|g|≥λ}} |f| dµ  for all λ > 0.
Write Φ_k(u) = u (log⁺ u)^k. The following are calculus exercises(!):
M_{Φ_k}(f/a) = a^{1/α} (−α)^k k! / ((1+α)^{1/α} (1+α)^{k+1}),
M_{Φ_{k−1}}(g/b) = b^{1/α} (−α)^{k−1} (k−1)! / (1+α)^k.
Then take a = ‖f‖_{Φ_k} and b = ‖g‖_{Φ_{k−1}}, and conclude [since α = −1/(k+1)]
b/a = (1/(1+α)) ((−α) k/(1+α))^α = 1/(1+α) = (k+1)/k.
So ‖g‖_{Φ_{k−1}} = ((k+1)/k) ‖f‖_{Φ_k}.
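The next complement, (3.1.16), rests on the elementary pointwise inequality a log⁺ b ≤ a log⁺ a + b/e, which follows from log x ≤ x/e. A brute-force grid check (not a proof):

```python
import math

def log_plus(x):
    """log+ x = max(log x, 0)."""
    return math.log(x) if x > 1.0 else 0.0

# a log+ b <= a log+ a + b/e on a grid of positive values
for i in range(1, 201):
    for j in range(1, 201):
        a, b = 0.1 * i, 0.1 * j
        assert a * log_plus(b) <= a * log_plus(a) + b / math.e + 1e-12
```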
(3.1.16) (Improved constants.) Prove the elementary inequality
a log⁺ b ≤ a log⁺ a + b/e  (a, b ≥ 0),
and use it to show that if µ is a probability measure and f, g satisfy (W_g), then
∫ |g| dµ ≤ e/(e−1) + (e/(e−1)) ∫ |f| log⁺ |f| dµ
(or see Doob [1953], p. 317, or Neveu [1975], p. 71). This inequality is true with e/(e−1) replaced by the unique solution c > 1 of the equation e^c (c−1)² = 1. Note e/(e−1) ≈ 1.582 and c ≈ 1.478. This is the best constant for the inequality (D. Gilat [1986]).
(3.1.17) (Variant L_p inequality, Burkholder [1973].) Let X and T be nonnegative measurable functions, and let α, β be positive constants. Suppose, for all λ > 0,
λ µ{T ≥ βλ} ≤ α ∫_{{T≥λ}} X dµ.
Then, for 1 < p < ∞, we have
‖T‖_p ≤ α β^p (p/(p−1)) ‖X‖_p.
To see this, apply the three-function inequality (3.1.2) with Φ(u) = u^p/p, φ(u) = u^{p−1}, ξ(u) = u^p/q, where q = p/(p−1) is the conjugate index. Then with f = αX, g = T/β, h = T, the hypothesis of Theorem (3.1.2) is satisfied. The conclusion is
(1/(q b^p)) (b/β^p − a) ‖T‖_p^p ≤ (α^p a^{1−p}/p) ‖X‖_p^p.
Substitute a = 1 and b = qβ^p to obtain the result.
(3.1.18) (Best constant.) Show that αβ^p q is the best constant in the preceding. It is clearly enough to consider the case α = 1. For β ≤ 1, use the measure space (0,1], and T(ω) = ω^s β^{1/s}, X(ω) = (s+1) ω^s, where −1/p < s < 0; then let s → −1/p to show that the constant is best possible. What should be done for β > 1?
(3.1.19) (Hardy–Littlewood.) The best known use of maximal functions is due to Hardy and Littlewood. Let Ω = (0,1) and let µ be Lebesgue measure. Suppose f : (0,1) → ℝ is an integrable function. Define
f*(x) = sup { (1/(v−u)) ∫_u^v |f(t)| dt : 0 ≤ u ≤ x ≤ v ≤ 1, u < v }.
Let f̃ be the decreasing rearrangement of |f| on (0,1). Thus
µ{|f| ≥ λ} = µ{f̃ ≥ λ},
so f and f̃ belong simultaneously to any L_Φ or H_Φ, and have the same norm. Let g(x) = (1/x) ∫₀^x f̃(t) dt. Then
µ{f* ≥ λ} ≤ µ{g ≥ λ} ≤ (1/λ) ∫_{{g≥λ}} f̃(t) dt.
This yields, then, many corollaries. If 1 < p < ∞ and f ∈ L_p, then f* ∈ L_p, and ‖f*‖_p ≤ (p/(p−1)) ‖f‖_p (see 3.1.10(a)). If f ∈ L log^k L, then f* ∈ L log^{k−1} L. If f ∈ L log L, then f* ∈ L₁.
(3.1.20) Let Φ be an Orlicz function with derivative φ. If ‖f‖_Φ ≤ 1/2 and µ{|f| ≥ λ} < ∞ for all λ > 0, then ∫ Ψ(φ(|f|)) dµ ≤ 1.
(3.1.21) Suppose ξ is an Orlicz function such that
∫₀^u (ξ(x)/x²) dx < ∞  for all u > 0.
Then
Φ(u) = u ∫₀^u (ξ(x)/x²) dx,  u ≥ 0,
is an Orlicz function, and ξ(u) = Ψ(φ(u)) a.e., where Ψ is the conjugate and φ is the derivative of Φ.
(3.1.22) (Converse maximal inequality.) Let Φ be an Orlicz function and ξ(u) = Ψ(φ(u)) as usual. We saw in (3.1.13(ii)) that for a positive martingale (X_n), if sup_n ‖X_n‖_Φ < ∞, then ‖sup_n X_n‖_ξ < ∞. The converse is false: for example, take X_n = X₁ for all n, where X₁ ∈ L log^{k−1} L but X₁ ∉ L log^k L. The converse can be proved under additional assumptions:
Theorem. Let (X_n) be a positive supermartingale such that for a constant C > 0 and all n we have X_{n+1} ≤ C X_n. If M_ξ(sup_n X_n) < ∞ and M_Φ(X₁) < ∞, then sup_n M_Φ(X_n) < ∞.
Proof. Fix λ > 0. Let σ be the stopping time equal to the first n such that X_n ≥ λ. The set {σ = k} belongs to F_k; hence, by the supermartingale property,
E[1_{{1<σ≤n}} X_n] ≤ E[1_{{1<σ≤n}} X_σ].
Also X_σ ≤ C X_{σ−1} < Cλ on the set {1 < σ ≤ n}. Therefore, writing g = sup_{1≤i≤n} X_i,
E[1_{{g≥λ}} X_n] = E[1_{{X₁≥λ}} X_n] + E[1_{{1<σ≤n}} X_n]
 ≤ E[1_{{X₁≥λ}} X₁] + Cλ P{g ≥ λ}.
Since {X_n ≥ λ} ⊆ {g ≥ λ}, this gives
λ P{g ≥ λ} ≥ (1/C) ∫_{{X_n≥λ}} X_n dP − (1/C) ∫_{{X₁≥λ}} X₁ dP.
Now apply Theorem (3.1.11) with g = sup_{1≤i≤n} X_i, f₁ = X_n, c₁ = 1/C, f₂ = X₁, c₂ = −1/C:
M_Φ(X_n) ≤ C M_ξ(g) + M_Φ(X₁) + M_ξ(X₁) ≤ C M_ξ(sup_i X_i) + M_Φ(X₁) + M_ξ(sup_i X_i) < ∞,
uniformly in n.
(3.1.23) Let (X_n) be a positive supermartingale such that for a constant C > 0 and all n we have X_{n+1} ≤ C X_n. Let k ≥ 1. If
E[(sup_n X_n)(log⁺ sup_n X_n)^{k−1}] < ∞  and  E[X₁ (log⁺ X₁)^k] < ∞,
then
sup_n E[X_n (log⁺ X_n)^k] < ∞.
To see this, choose Φ(u) = u (log⁺ u)^k. Then ξ(u) = k u (log⁺ u)^{k−1}.
(3.1.24) (Atomic σ-algebras.) Suppose each F_n is atomic, and there is a constant C such that P(A_n) ≤ C P(A_{n+1}) for atoms A_n ∈ F_n and A_{n+1} ∈ F_{n+1} with A_n ⊇ A_{n+1}. Show that every positive supermartingale satisfies X_{n+1} ≤ C X_n, the hypothesis of (3.1.22). The most familiar example is the dyadic stochastic basis (F_n), where P(A_n) = 2 P(A_{n+1}).
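For the dyadic basis the condition of (3.1.24) holds with C = 2, and the resulting bound X_{n+1} ≤ 2 X_n can be observed directly. A sketch, assuming X_n = E[f | F_n] for a positive f on a grid over [0,1):

```python
import random
random.seed(0)

m = 2**10
f = [random.random() + 0.01 for _ in range(m)]    # positive integrand on a grid

def cond_exp(n):
    """E[f | F_n] for the dyadic sigma-algebra with 2^n atoms."""
    out = [0.0] * m
    size = m // 2**n
    for a in range(2**n):
        avg = sum(f[a * size:(a + 1) * size]) / size
        for j in range(a * size, (a + 1) * size):
            out[j] = avg
    return out

for n in range(9):
    Xn, Xn1 = cond_exp(n), cond_exp(n + 1)
    # X_{n+1} <= 2 X_n pointwise, since P(A_n) = 2 P(A_{n+1})
    assert all(Xn1[j] <= 2 * Xn[j] + 1e-12 for j in range(m))
```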
Remarks
The three-function inequality is from Edgar & Sucheston [1989] and [1991]. Inequality (3.1.3) is proved but not stated in Neveu [1975], pages 217–219, for the case g ∈ L∞. There is a version of Theorem (3.1.11) in which the weak inequalities are assumed to hold only for sufficiently large λ; this is useful to obtain the converse of the dominated ratio ergodic theorem; see Szabó [1991]. The proof in (3.1.14) that p/(p−1) is the best constant is taken from Hardy, Littlewood, & Pólya [1952]. For k = 1, the martingale case of (3.1.23), and the important stopping time argument, are due to Gundy [1969].
3.2. Sharp maximal inequality for martingale transforms
In this section (Ω, F, P) will be a fixed probability space. Normally we will be concerned with a sequence (X_n) of random variables and the corresponding stochastic basis defined by F_n = σ(X₁, X₂, ..., X_n). We will use the following notation in this section. If X = (X_n) is a sequence of random variables, then X* = sup_n |X_n|. The L_p-norm of the sequence is ‖X‖_p = sup_n ‖X_n‖_p. The notation X ≥ 0 means X_n ≥ 0 for all n. Recall that the sequence X = (X_n) is a martingale if X_m = E^{F_m}[X_n] for m ≤ n. Of course, it is enough to have X_n = E^{F_n}[X_{n+1}] for all n. Let Y = (Y_n) be the difference sequence of X; that is, Y₁ = X₁ and Y_n = X_n − X_{n−1} for n ≥ 2. Thus X_n = Σ_{i=1}^n Y_i. An equivalent definition of the martingale is: E^{F_n}[Y_{n+1}] = 0 for n ≥ 1. We will say that Y is a martingale difference sequence. A sequence V = (V_n) of random variables is predictable if V_{n+1} is measurable with respect to F_n. The transform of X by V is the process Z = (Z_n) defined by Z_n = Σ_{i=1}^n V_i Y_i. We will write Z = V * X. If X_n is the fortune of a gambler at time n, then Z_n may be viewed as the result of controlling X by V. Since V is predictable, multiplication of Y_i by V_i is equivalent to changing the stakes for the ith game on the basis of information available before the ith game. The transform Z of a martingale X by a predictable process V is again a martingale, provided it is integrable, since
E^{F_n}[Z_{n+1} − Z_n] = E^{F_n}[V_{n+1} Y_{n+1}] = V_{n+1} E^{F_n}[Y_{n+1}] = 0.
If τ is a stopping time, then the stopped process (X_{τ∧n})_{n∈ℕ} is the transform of (X_n) by the predictable sequence V_n = 1_{{τ≥n}} = 1 − 1_{{τ≤n−1}}.
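On one sample path this identity between stopping and transforming is easy to check (an illustrative sketch; the path values are arbitrary):

```python
# Transform Z = V*X with Z_n = sum_{i<=n} V_i Y_i, and the stopped process
# X_{tau ^ n}: they agree when V_n = 1_{tau >= n}.
X = [1.0, 3.0, 2.0, 5.0, 4.0]                    # one path of the process
Y = [X[0]] + [X[i] - X[i - 1] for i in range(1, len(X))]  # difference sequence

def transform(V, Y):
    Z, s = [], 0.0
    for v, y in zip(V, Y):
        s += v * y
        Z.append(s)
    return Z

tau = 3                                           # stop at time 3 (1-based)
V = [1.0 if i + 1 <= tau else 0.0 for i in range(len(X))]  # V_n = 1_{tau >= n}
Z = transform(V, Y)
stopped = [X[min(i, tau - 1)] for i in range(len(X))]      # X_{tau ^ n}
assert Z == stopped                               # [1.0, 3.0, 2.0, 2.0, 2.0]
```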
Note that transforms compose: W * (V * X) = (WV) * X and W * (V * X) = V * (W * X), where WV denotes the pointwise product (again predictable). The theorem below gives the weak maximal inequality for martingale transforms by a process V between 0 and 1.
(3.2.1) Theorem. Let X be a martingale, let V be a predictable process with values in [0,1], and let Z = V * X. Then for all λ > 0,
P{Z* ≥ λ} ≤ (1/λ) ‖X‖₁.
The proof is omitted, since it is a special case of Theorem (3.2.2), below. Theorem (3.2.1) may appear surprising, since Z may be badly behaved in some other ways; for example Z need not be L₁-bounded, even if X is. (See (6.2.8).) To give an intuitive interpretation of the theorem, assume that X is nonnegative, and X₁ = ‖X‖₁ = c < λ. In a fair game, the player is allowed to change the stakes by multiplying them by a number between 0 and 1, using information before each play. In particular, the player can skip a game by letting V_i = 0, but he cannot reverse the roles of his opponent and himself, since V ≥ 0. The probability that the player's initial fortune c is ever increased to λ is at most c/λ < 1. In particular, the probability that the player's initial fortune will ever double is less than 1/2. In fact, controlling a fair game (the martingale X) by a predictable process V taking values in [0,1] does not visibly improve this probability, since the inequality in Theorem (3.2.1) is the same as Doob's maximal inequality (1.4.18), corresponding to the case V = 1 and Z = X. A more effective way of controlling the game is to allow V to take values in the interval [−1, 1]. Now the gambler may choose, just before game i, to reverse roles with the opponent for game i by choosing V_i = −1. In the seventeenth century, the word "martingale" meant the double-or-nothing martingale, which can be described mathematically as follows. Let P be Lebesgue measure on [0,1], and let X_n = c 2^{n−1} 1_{[0, 2^{−n+1}]}. Then X_n is a nonnegative martingale starting at c with lim_n X_n = 0. Define V₁ = 1 and V_n = −1 for n > 1. Then
Z_n = X₁ − Σ_{i=2}^n Y_i = c − (X_n − c) = 2c − X_n.
Thus sup_n Z_n = 2c − inf_n X_n = 2c − 0 = 2c. In an infinite game, the gambler can double his initial fortune. Even if an infinite game is impractical, it remains that in a game of sufficient duration, a gambler endowed with very large reserves or credit can double his initial fortune with large probability. The following maximal inequality, due to Burkholder [1986], allows for V with both signs. In particular, it gives the bound 2 in the case of the double-or-nothing martingale. The case a = 0, b = 1 is Theorem (3.2.1).
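Before stating it, the double-or-nothing computation above can be checked path by path (a sketch with finite horizon N; ω plays the role of the point of [0,1]):

```python
# X_n(w) = c*2^(n-1) on [0, 2^(1-n)), else 0; V_1 = 1, V_n = -1 for n > 1.
c, N = 1.0, 20

def X(n, w):
    return c * 2**(n - 1) if w < 2**(1 - n) else 0.0

for w in [0.9, 0.3, 0.01]:                        # sample points of [0,1]
    xs = [X(n, w) for n in range(1, N + 1)]
    ys = [xs[0]] + [xs[i] - xs[i - 1] for i in range(1, N)]
    V = [1.0] + [-1.0] * (N - 1)
    Z, s = [], 0.0
    for v, y in zip(V, ys):
        s += v * y
        Z.append(s)
    # Z_n = 2c - X_n, so the supremum of Z is 2c once X_n has hit 0
    assert all(abs(Z[i] - (2 * c - xs[i])) < 1e-9 for i in range(N))
    assert abs(max(Z) - 2 * c) < 1e-9
```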
(3.2.2) Theorem. Let a, b be real numbers with a ≤ 0 ≤ b. Let X be a martingale, let V be a predictable process with values in the interval [a, b], and let Z = V * X. Then for all λ > 0,
P{Z* ≥ λ} ≤ ((b − a)/λ) ‖X‖₁.
The constant b − a is the best possible.
Proof. Let Y be the difference process of X. Set
(3.2.2a)  A_n = Σ_{i=1}^n (V_i − a) Y_i,
(3.2.2b)  B_n = Σ_{i=1}^n (b − V_i) Y_i.
Then
(3.2.2c)  A_n + B_n = (b − a) X_n,
(3.2.2d)  b A_n + a B_n = (b − a) Z_n,
and
(3.2.2e)  (A_n − A_{n−1})(B_n − B_{n−1}) ≥ 0
(with the convention that A₀ = B₀ = 0). Let u : ℝ² → ℝ be defined by
u(x, y) = 1 + xy  if |x| ≤ 1 and |y| ≤ 1;  u(x, y) = |x + y|  otherwise.
Note that u is continuous, since the two formulas agree on the boundary of the square. Next, 0 ≤ u(x, y) − |x + y| ≤ 1: the extrema may be computed easily, separately inside and outside the square |x| ∨ |y| ≤ 1. The function u may be estimated by its first-degree Taylor polynomial in two variables as follows. The partial derivative with respect to x exists and is equal to
(3.2.2f)  u_x(x, y) = y  if |x| < 1 and |y| ≤ 1;  u_x(x, y) = sgn(x + y)  otherwise,
except possibly at the values x = −1 and x = 1. For each fixed y, the function u_x(x, y) is nondecreasing in x, and u(x, y) is continuous in x (so for each fixed y the function u(x, y) is absolutely continuous); for fixed x, the function u_x(x, y) is nondecreasing in y. Similarly, the partial derivative with respect to y is
u_y(x, y) = x  if |x| ≤ 1 and |y| < 1;  u_y(x, y) = sgn(x + y)  otherwise.
For fixed x, the function uy is increasing in y, and u is continuous; for fixed y, the function uy is increasing in x. Now if x, y, h, k are real, and hk _> 0, then we have
u(x + h, y + k) > u(x, y) + ux(x, y) h + uy(x, y) k.
(3.2.2g)
We prove the case h > 0 and k > 0: fh
u(x+h,y+k) =u(x,y+k)+J ux(x+t,y+k)dt 0 k =u(x) y)+J uy(x,y+s)ds+J ux(x + t, y + k) dt h
o
o
k
/'h
> u(x, y) + J uy(x, y) ds + J ux(x, y) dt o
o
= u(x, y) + ux (x, y) h + uy (x, y) k.
If we apply (3.2.2g) to the R2valued stochastic process Cn = (An, Bn)
with x=An1, y=Bn1, h=AnAn_1i k=BnBn_1i we obtain (3.2.2h) u(Cn) > u(Cn1)+ux(Cn1)(AnAn1)+uy(CC1)(BnBn1).
If n = 1, this is the inequality u(Ci) > u(Co). If n > 1, then An An1 = (Vn  a)Yn, so ux(Cn_1)(An  An1) can be written as a product of Yn and a random variable Q which is a bounded measurable function of V1, V2, ,Vn, Y1, Y2,
,Yn_1. Thus Q is Fnlmeasurable, and thus
E [ux(CC1)(An  An1)] = E [YYQ] = E
[E*Fn1
[Yn] Q] = 0.
Similarly,
E [uy(Cn1)(Bn  Bn1)] = 0. Therefore, by (3.2.2h), E [u(Cn)] >_ E [u(Cn1)] > ... > E [u(Cl)] >_ E [u(Co)] = u(0, 0) = 1.
Using (3.2.2d), we have
P{IZnI > 1} =P{IbAn+aBnj > b  a}
= 1P{IbAn+aBnI
< 1  P{IAnI < 1 and IBnI < 1}
<E[u(Cn)I(Cn)], where I is the indicator function of the set { (x, y) : jxj < 1 and jyj < 1 }.
Now by (3.2.2c) and the inequalities for u, we have

u(C_n) − I(C_n) ≤ |A_n + B_n| ≤ (b − a)|X_n|.

Combining this with (3.2.2i), we obtain

(3.2.2j)    P{|Z_n| ≥ 1} ≤ (b − a)‖X_n‖₁,
which is quite close to the announced inequality for Z*. We will use a stopping time argument similar to the one often used in proofs of Doob's inequality. Let τ = inf{n : |Z_n| ≥ 1}. Now the stopped process Z_{τ∧n} is the transform by V of the stopped martingale X_{τ∧n}. Thus we have from (3.2.2j)

P{|Z_{τ∧n}| ≥ 1} ≤ (b − a)‖X_{τ∧n}‖₁.

But |X_n| is a submartingale, so by (1.4.18) we have ‖X_{τ∧n}‖₁ ≤ ‖X‖₁, so letting n → ∞ we obtain

P{Z* ≥ 1} ≤ (b − a)‖X‖₁.
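As a sanity check (not part of the text), the weak-type bound can be verified by exact enumeration on a small example: a scaled simple random walk X over a finite horizon and the illustrative predictable multiplier V_k = 1 on {X_{k−1} ≥ 0}, V_k = 0 otherwise, so that a = 0, b = 1.

```python
from itertools import product

n, scale, a, b = 10, 4.0, 0.0, 1.0   # X_k = S_k / 4 for a simple walk S
hits = 0                # paths with Z* = max_k |Z_k| >= 1
abs_Xn_total = 0.0      # accumulates |X_n| over the 2**n equally likely paths
for signs in product((-1, 1), repeat=n):
    X = Z = Zmax = 0.0
    for s in signs:
        V = b if X >= 0 else a        # predictable: uses only the past
        X_new = X + s / scale
        Z += V * (X_new - X)
        X = X_new
        Zmax = max(Zmax, abs(Z))
    hits += Zmax >= 1.0
    abs_Xn_total += abs(X)
prob = hits / 2 ** n                       # P{Z* >= 1}
bound = (b - a) * abs_Xn_total / 2 ** n    # (b - a) E|X_n| = (b - a)||X||_1 here
assert 0 < prob <= bound
print(prob, "<=", bound)
```

For this finite-horizon martingale, |X_k| is a submartingale, so ‖X‖₁ = sup_k E|X_k| = E|X_n|, which is what the enumeration computes.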
Applying this to the martingale X/λ yields the maximal inequality stated.

In the proof that the constant b − a is best possible, we may assume λ = 1. If a = 0, we may assume b = 1. Then the deterministic example X_n = V_n = 1 shows that the constant 1 is the best possible (also in Doob's inequality). If a < 0, consider a martingale X such that X₁ is the constant 1/(b − a) and X₂ = X₃ = ... is such that

P{X₂ = −2/a} = −a / (2(b − a)),    P{X₂ = 0} = (2b − a) / (2(b − a)).

Let V₁ = b and V_n = a for n ≥ 2. Then P{Z* = 1} = 1 and ‖X‖₁ = 1/(b − a). In Section 6.2, below, we give the proof that the transform of an L¹-bounded martingale converges a.s.

Remarks
References related to this section are: Burkholder [1966] and Burkholder [1986].
3.3. Prophet compared to gambler

Let n ∈ IN, and let the random variable X_i be the fortune of a player at time i, for 1 ≤ i ≤ n. We may sometimes use a boldface letter as an abbreviation: X = (X₁, X₂, ..., X_n). The player is to stop playing at a certain time i; his goal is to choose i so that his fortune X_i is as large
as possible (on the average). We will consider two types of players: the prophet and the gambler. A prophet is a player with complete foresight. Since he knows all the values X_i, he may simply choose the largest one, and stop at the appropriate time. Thus he achieves the maximum X₁ ∨ X₂ ∨ ... ∨ X_n.
A gambler is a player without knowledge of the future; but he knows the past and the present, and has knowledge of the odds (that is, the joint distribution of X). Let F_i be the σ-algebra generated by X₁, X₂, ..., X_i. Then the time the gambler chooses to stop must be a stopping time for this stochastic basis (F_i)_{i=1}^n. Write Σ_n for this set of stopping times; their values are in the set {1, 2, ..., n}. When the gambler uses the stopping time τ ∈ Σ_n, his fortune when he stops is X_τ. So his expected fortune is E[X_τ]. Thus the best expected fortune he can achieve is

sup{E[X_τ] : τ ∈ Σ_n}.

This is known as the value of the process X = (X₁, X₂, ..., X_n). Notation:

V = V(X) = V(X₁, X₂, ..., X_n).
We will be interested in prophet inequalities. They compare the expected gain P of a prophet to the expected gain G = V of a gambler. Of course the prophet has the advantage, so P ≥ G. A surprising result is that often there are moderate universal constants C (independent of n and of the distributions of the X_i) such that P ≤ CG. In the independent positive case, C = 2 (Theorem 3.3.2) and the constant 2 is optimal. Thus the advantage of knowing the future is not as large as might have been expected.

To be sure, the prophet uses his foresight only to stop the game, not to change the stakes. The gambler uses his nonanticipating skills for the same purpose. A different situation arises when the players are allowed to change the stakes. A prophet inequality (with optimal constant 3) also exists in that case. This will be treated in Theorems (3.3.5) and (3.3.7).

Stopped processes
We prove first a basic lemma; the technique used to define the stopping times σ_i is known as "backward induction," since we proceed from σ_{i+1} to σ_i.

(3.3.1) Lemma. Let X₁, X₂, ..., X_n be independent nonnegative random variables. Define σ_n = n, and inductively for i = n − 1, n − 2, ..., 2, 1:

(3.3.1a)    σ_i = { i        on {X_i ≥ E[X_{σ_{i+1}}]}
                  { σ_{i+1}  on {X_i < E[X_{σ_{i+1}}]}.

Then we have

E[X_n] + Σ_{i=1}^{n−1} E[(X_i − E[X_{σ_{i+1}}])⁺] = E[X_{σ_1}].
Proof. Let i be given, 1 ≤ i ≤ n − 1. Then we have

(3.3.1b)    (X_i − E[X_{σ_{i+1}}])⁺ = X_{σ_i} − E[X_{σ_{i+1}}] − 1_{{X_i < E[X_{σ_{i+1}}]}} (X_{σ_{i+1}} − E[X_{σ_{i+1}}]).

Since 1_{{X_i < E[X_{σ_{i+1}}]}} and X_{σ_{i+1}} − E[X_{σ_{i+1}}] are independent, we obtain

(3.3.1c)    E[1_{{X_i < E[X_{σ_{i+1}}]}} (X_{σ_{i+1}} − E[X_{σ_{i+1}}])] = P{X_i < E[X_{σ_{i+1}}]} E[X_{σ_{i+1}} − E[X_{σ_{i+1}}]] = 0.

Integrating (3.3.1b) and adding (3.3.1c) to it, we obtain

(3.3.1d)    E[(X_i − E[X_{σ_{i+1}}])⁺] = E[X_{σ_i}] − E[X_{σ_{i+1}}].

Summing these equations from i = 1 to i = n − 1 and adding E[X_n] = E[X_{σ_n}], the assertion follows.
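The telescoping identity of the lemma can be checked exactly on small discrete distributions. In the sketch below (an illustration with made-up distributions, not an example from the text), v[i] stands for E[X_{σ_i}]; by independence, v[i] = E[max(X_i, v[i+1])].

```python
from fractions import Fraction as F

# independent nonnegative X_1, X_2, X_3 as finite distributions {value: prob}
dists = [
    {F(0): F(1, 2), F(3): F(1, 2)},
    {F(1): F(1, 3), F(2): F(2, 3)},
    {F(0): F(3, 4), F(4): F(1, 4)},
]
n = len(dists)
E = lambda d: sum(x * p for x, p in d.items())

v = [None] * (n + 1)            # v[i] = E[X_{sigma_i}]
v[n] = E(dists[n - 1])
for i in range(n - 1, 0, -1):   # backward induction (3.3.1a)
    v[i] = sum(max(x, v[i + 1]) * p for x, p in dists[i - 1].items())

lhs = E(dists[n - 1]) + sum(
    sum(max(x - v[i + 1], F(0)) * p for x, p in dists[i - 1].items())
    for i in range(1, n))
assert lhs == v[1]              # identity of Lemma (3.3.1)
print(v[1])  # -> 7/3
```

Exact rational arithmetic (`fractions.Fraction`) makes the equality test literal rather than approximate.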
Now assume n ≥ 2. Write V = V(X₁, X₂, ..., X_n), e_i = E[X_i], and ¹V = V(X₂, ..., X_n). Clearly ¹V ≤ V, so the theorem below proves slightly more than P ≤ 2V.

(3.3.2) Theorem. Let X₁, X₂, ..., X_n be independent positive random variables with 0 < E[X_i] < ∞. Then

(3.3.2a)    E[(X₁ ∨ X₂ ∨ ... ∨ X_n − ¹V)⁺] ≤ V;

hence

(3.3.2b)    E[X₁ ∨ X₂ ∨ ... ∨ X_n] ≤ ¹V + V ≤ 2V.
Proof. Apply the lemma to the random variables X_i. By (3.3.1d) we have E[X_{σ_i}] ≥ E[X_{σ_{i+1}}], and therefore

(sup_i X_i − E[X_{σ_2}])⁺ ≤ X_n + Σ_{i=1}^{n−1} (X_i − E[X_{σ_{i+1}}])⁺.

Integrating this, and using the lemma, we obtain

E[(sup_i X_i − E[X_{σ_2}])⁺] ≤ E[X_{σ_1}].

Since E[X_{σ_2}] ≤ ¹V and E[X_{σ_1}] ≤ V, we have (3.3.2a).

Note that, in fact, E[X_{σ_1}] = V and E[X_{σ_2}] = ¹V. (This is not used below.) See (3.3.8).
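Theorem (3.3.2) can likewise be verified by exhaustive enumeration for small independent discrete distributions (made up for illustration; not from the text):

```python
from fractions import Fraction as F
from itertools import product

dists = [
    {F(0): F(1, 2), F(3): F(1, 2)},
    {F(1): F(1, 3), F(2): F(2, 3)},
    {F(0): F(3, 4), F(4): F(1, 4)},
]

def value(ds):
    # optimal-stopping value by backward induction: v <- E[max(X_i, v)]
    v = sum(x * p for x, p in ds[-1].items())
    for d in reversed(ds[:-1]):
        v = sum(max(x, v) * p for x, p in d.items())
    return v

V, oneV = value(dists), value(dists[1:])    # V and ^1V = V(X_2, ..., X_n)

P = F(0)                                    # prophet: E[X_1 v ... v X_n]
for combo in product(*(d.items() for d in dists)):
    prob = F(1)
    for _, p in combo:
        prob *= p
    P += max(x for x, _ in combo) * prob

assert P <= oneV + V <= 2 * V               # (3.3.2b)
print(P, oneV, V)  # -> 11/4 5/3 7/3
```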
Transformed processes
To study transformed (rather than stopped) processes, it will be useful to consider the random variables X₀, X₁, ..., X_n and sub-σ-algebras G_i of F such that each X_i is measurable with respect to G_i. However, we do not assume, in general, that G_i ⊆ G_{i+1}. The gambler we considered previously adds at time i + 1 to his fortune X_i the amount X_{i+1} − X_i if he chooses to continue, and 0 if he does not. We will now allow the gambler to change the stakes: at time i + 1 the gambler gains the amount U_{i+1}(X_{i+1} − X_i), where the factor U_{i+1} is chosen by the gambler on the basis of the information provided by G_i; that is, U_{i+1} is G_i-measurable.
Usually when we consider transforms we have G_i = σ(X₀, X₁, ..., X_i); in this case we simply say that U = (U₁, U₂, ..., U_n) is predictable. In our next result, we assume only G_i = σ(X_i); in this case U is called presently predictable: the player multiplies his stake by the random variable U_{i+1}, which depends only on the present. In any event, the gain of the player up to a time m is

Z_m = Σ_{i=0}^{m−1} U_{i+1}(X_{i+1} − X_i).

Of course, the sequence Z = (Z₁, Z₂, ..., Z_n) is the transform U * X of the process X by U.
In the following theorem, the gambler transforms the process X by a presently predictable process U bounded by a constant c. Because we are only interested in the ratio P/G, we may assume without loss of generality c = 1. Let Π_s be the set of all presently predictable processes bounded by 1. On the other hand, the prophet transforms X by any measurable process U bounded by 1. Let Θ_s be the set of all measurable processes bounded by 1, so the expected gains of the prophet and gambler are, respectively,

P_s = sup{E[U * X] : U ∈ Θ_s},    G_s = sup{E[U * X] : U ∈ Π_s}.

Let Π be the set of nonnegative processes in Π_s, and let Θ be the set of nonnegative processes in Θ_s. Our main concern here will be the expected gains P of the prophet and G of the gambler when they are restricted to these classes.
In the nonnegative case, it is obvious that the best choice for the prophet is: bet the maximum if there is gain, and bet 0 if there is loss. Hence

P = Σ_{i=1}^n E[(X_i − X_{i−1})⁺].

In the signed case, the best choice for the prophet is: bet the maximum on the winning side. Hence

P_s = Σ_{i=1}^n E[|X_i − X_{i−1}|].
It will be useful to introduce the functional µ defined on L¹ by

µ(X) = E[X] − E[(X − E[X])⁺].

Note that µ(X) = E[X] − (1/2)E[|X − E[X]|]. We will sometimes assume below that the first and last random variables satisfy µ(X₀) ≤ µ(X_n). We observe that this is not a loss of generality if all the random variables are positive. This may be seen as follows. Since we compare gains, we may suppose that both players have the same initial fortune. If X₁, X₂, ..., X_n are positive and the players receive X₁, then the addition of the random variable X₀ = 0 at the beginning does not change anything. Hence in this case we may assume X_n ≥ E[X₀]. Now X_n ≥ E[X₀] implies µ(X_n) ≥ µ(X₀) whether or not X_n is positive. Indeed, subtracting the same constant from X₀ and X_n does not change µ(X_n) − µ(X₀); so we may assume E[X₀] = 0 and X_n ≥ 0. Then µ(X₀) ≤ 0, while

µ(X_n) = E[X_n] − (1/2)E[|X_n − E[X_n]|] ≥ E[X_n] − (1/2)E[|X_n|] − (1/2)E[X_n] ≥ 0.

Here we will not require the independence of the X_i's; only that the centered random variables X_i − E[X_i] = X_i − e_i be differences of a weak martingale. A sequence (Y_i) is a weak martingale if E[Y_i | Y_{i−1}] = Y_{i−1} for all i. This notion is of interest in convergence theory, since L¹-bounded weak martingales converge in probability, while the issue of a.e. convergence remains open (Nelson [1970]). Before we prove the theorem, we insert two lemmas.
(3.3.3) Lemma. Let X be an integrable random variable, and let a < b be constants. Then

(b − a)P{X ≥ b} ≤ E[(X − a)⁺] − E[(X − b)⁺] ≤ (b − a)P{X > a}.

Proof. Note that

E[(X − a)⁺] − E[(X − b)⁺] = E[(X − a − X + b) 1_{{X≥b}}] + E[(X − a) 1_{{a<X<b}}].

Since 0 ≤ X − a ≤ b − a on {a < X < b},

E[(b − a) 1_{{X≥b}}] ≤ E[(X − a)⁺] − E[(X − b)⁺] ≤ E[(b − a) 1_{{X≥b}}] + E[(b − a) 1_{{a<X<b}}],

and the conclusion follows.
(3.3.4) Lemma. Let X be an integrable random variable, let d be a constant, and write E[X] = e. Then

E[(X − e)⁺] ≤ E[(X − d)⁺](1 + P{X < e}) + d − e.

If d ≥ e, the term P{X < e} may be omitted.

Proof. If d ≥ e, then by the second inequality in (3.3.3),

E[(X − e)⁺] ≤ E[(X − d)⁺] + d − e.

Next, if d < e, then by the first half of (3.3.3),

E[(X − e)⁺] ≤ E[(X − d)⁺] + (d − e)P{X ≥ e}
            = E[(X − d)⁺] + (d − e) + (e − d)P{X < e}
            ≤ E[(X − d)⁺](1 + P{X < e}) + d − e,

because E[(X − d)⁺] ≥ E[X − d] = e − d.
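Lemma (3.3.4) holds for any integrable X, so it can be probed with a quick randomized check on finite uniform distributions (an illustrative sketch, not part of the text):

```python
import random

pos = lambda t: max(t, 0.0)   # t^+

random.seed(1)
for _ in range(2000):
    xs = [random.uniform(-5, 5) for _ in range(7)]    # X uniform on 7 points
    e = sum(xs) / len(xs)                             # e = E[X]
    d = random.uniform(-6, 6)
    lhs = sum(pos(x - e) for x in xs) / len(xs)       # E[(X - e)^+]
    Exd = sum(pos(x - d) for x in xs) / len(xs)       # E[(X - d)^+]
    p_lt = sum(x < e for x in xs) / len(xs)           # P{X < e}
    assert lhs <= Exd * (1 + p_lt) + d - e + 1e-9
    if d >= e:
        assert lhs <= Exd + d - e + 1e-9              # factor can be dropped
print("Lemma (3.3.4) verified on 2000 random cases")
```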
(3.3.5) Theorem. Let X = (X₀, X₁, ..., X_n) be an integrable stochastic process. Let G_i be σ-algebras such that X_i is G_i-measurable. Suppose that

(3.3.5a)    E[X_i | X_{i−1}] = e_i

is constant. Assume µ(X₀) ≤ µ(X_n) (or assume that all the X_i's are nonnegative). Then the expected gain P of the prophet and the expected gain G of the gambler are related by

P ≤ 3G.

The inequality is strict unless all the random variables are identically equal to the same constant.

Proof. Since decreasing the σ-algebras G_i decreases the expected gain G but leaves P unchanged, we may assume G_i = σ(X_i). Let U = (U₁, U₂, ..., U_n)
be presently predictable with 0 ≤ U_i ≤ 1. For the gambler, using (3.3.5a), we have

E[U_i(X_i − X_{i−1})] = E[E^{G_{i−1}}[U_i(X_i − X_{i−1})]] = E[U_i(e_i − X_{i−1})].

This is maximal when U_i = 1_{{X_{i−1} < e_i}}. Hence

G = Σ_{i=1}^n E[(e_i − X_{i−1})⁺].
On the other hand,

P = Σ_{i=1}^n E[(X_i − X_{i−1})⁺]
  ≤ Σ_{i=1}^n (E[(X_i − e_i)⁺] + E[(e_i − X_{i−1})⁺])
  = Σ_{i=1}^n E[(X_i − e_i)⁺] + G.

It is therefore sufficient to show

(3.3.5b)    Σ_{i=1}^n E[(X_i − e_i)⁺] ≤ 2G.

The identity E[(e_i − X_{i−1})⁺] = E[(X_{i−1} − e_i)⁺] + e_i − e_{i−1} yields

G = Σ_{i=1}^n E[(X_{i−1} − e_i)⁺] + e_n − e₀.
Applying (3.3.4) with X = X_i, e = e_i, d = e_{i+1}, summing from i = 0 to n − 1, and observing that 1 + P{X_i < e_i} ≤ 2, we obtain

(3.3.5c)    Σ_{i=0}^{n−1} E[(X_i − e_i)⁺] ≤ 2 Σ_{i=1}^{n} E[(X_{i−1} − e_i)⁺] + e_n − e₀.

Thus we have

Σ_{i=1}^{n} E[(X_i − e_i)⁺] = E[(X_n − e_n)⁺] − E[(X₀ − e₀)⁺] + Σ_{i=0}^{n−1} E[(X_i − e_i)⁺]

  ≤ E[(X_n − e_n)⁺] − E[(X₀ − e₀)⁺] + 2 Σ_{i=1}^{n} E[(X_{i−1} − e_i)⁺] + e_n − e₀

  = E[(X_n − e_n)⁺] − E[(X₀ − e₀)⁺] + 2(G − e_n + e₀) + e_n − e₀

  = 2G − µ(X_n) + µ(X₀).

This proves (3.3.5b), because we assume µ(X_n) ≥ µ(X₀).

Finally, we show that P < 3G unless all the X_i's are equal to e₀. So assume P = 3G. The computation just completed shows: (1) µ(X_n) = µ(X₀); (2) equality in (3.3.5c); and (3) since the estimate 1 + P{X_i < e_i} ≤ 2 is strict, we must have E[(X_{i−1} − e_i)⁺] = 0, that is, X_{i−1} ≤ e_i. Thus e_{i+1} = E[X_{i+1}] ≥ e_i, and E[(X_i − e_i)⁺] ≤ E[(e_{i+1} − e_i)⁺] = e_{i+1} − e_i, with strict inequality unless e_{i+1} = e_i and X_i = e_i. But by equality in (3.3.5c), we must have equality in every term E[(X_i − e_i)⁺] ≤ e_{i+1} − e_i, so X_i = e_i = e₀ for all i.

There is a version depending on n, obtained by applying the stopping time result to transforms. Note that if the random variables X₁, ..., X_n are nonnegative, then we can always add X₀ = 0 to the beginning without changing P or G.
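For independent X_i the conditional expectations E[X_i | X_{i−1}] = E[X_i] are automatically constant, so the theorem applies. Below is an exact check of P ≤ 3G with X₀ = 0 and illustrative discrete distributions (a sketch, not an example from the text):

```python
from fractions import Fraction as F

# X_0 = 0 followed by independent nonnegative X_1, X_2, X_3
dists = [
    {F(0): F(1)},
    {F(0): F(1, 2), F(3): F(1, 2)},
    {F(1): F(1, 3), F(2): F(2, 3)},
    {F(0): F(3, 4), F(4): F(1, 4)},
]
E = lambda d: sum(x * p for x, p in d.items())

P = G = F(0)
for prev, cur in zip(dists, dists[1:]):
    e = E(cur)
    # independence lets E[(X_i - X_{i-1})^+] factor over the pair
    P += sum(max(x - y, F(0)) * p * q
             for x, p in cur.items() for y, q in prev.items())
    G += sum(max(e - y, F(0)) * q for y, q in prev.items())
assert P <= 3 * G
print(P, G)  # -> 35/12 7/3
```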
(3.3.6) Proposition. Let X = (X₀, X₁, ..., X_n) be nonnegative random variables such that E[X_i | X_{i−1}] = e_i is constant, and X₀ = 0. Then

P ≤ G + Σ_{i=1}^n e_i.
Proof. Note V(X_i) = e_i and V(X_{i−1}, X_i) = E[X_{i−1} ∨ e_i], so by Theorem (3.3.2) we have

E[X_{i−1} ∨ X_i] ≤ e_i + E[X_{i−1} ∨ e_i] = e_i + E[(e_i − X_{i−1})⁺] + e_{i−1}.

Also E[X_{i−1} ∨ X_i] = E[(X_i − X_{i−1})⁺] + e_{i−1}, so

E[(X_i − X_{i−1})⁺] + e_{i−1} ≤ e_i + E[(e_i − X_{i−1})⁺] + e_{i−1},

and summing over i,

Σ_{i=1}^n E[(X_i − X_{i−1})⁺] ≤ Σ_{i=1}^n E[(e_i − X_{i−1})⁺] + Σ_{i=1}^n e_i.
Therefore P ≤ G + Σ_{i=1}^n e_i.

The case of signed U

We now allow U_i with −c ≤ U_i ≤ c. For consideration of the ratio P/G, again we may assume c = 1. Our notation is the same as before: the expected gain of the prophet is P_s and the expected gain of the gambler is G_s. For the computation, we transform X by processes U presently predictable with respect to σ-algebras (G_i) such that X_i is G_i-measurable.
(3.3.7) Theorem. Suppose E[X_i | X_{i−1}] = e_i is constant. Assume that µ(X₀) ≤ µ(X_n) and e₀ ≥ e_n (both conditions hold if X_n = e₀). Then

P_s ≤ 3G_s.

If e_n = e₀, then P_s = 2P and G_s = 2G.
Proof. At stage i, the optimal gambler receives X_i − X_{i−1} on the set {X_{i−1} < e_i} and X_{i−1} − X_i on the set {X_{i−1} > e_i}. Hence

G_s = Σ_{i=1}^n (E[(e_i − X_{i−1})⁺] + E[(e_i − X_{i−1})⁻]).

On the other hand, the difference of the summands is

E[(e_i − X_{i−1})⁺] − E[(e_i − X_{i−1})⁻] = e_i − e_{i−1},

hence

G_s = 2 Σ_{i=1}^n E[(e_i − X_{i−1})⁺] − Σ_{i=1}^n (e_i − e_{i−1}) = 2G − e_n + e₀.
Similarly, P_s = 2P − e_n + e₀.

Now P ≤ 3G and e₀ ≥ e_n imply P_s ≤ 3G_s. If e_n = e₀, then P_s = 2P and G_s = 2G.

Complements
(3.3.8) (Optimality of backward induction.) In Lemma (3.3.1), we have E[X_{σ_1}] = V and E[X_{σ_2}] = ¹V. It follows inductively (backward on i) that E[X_{σ_i}] is the value of the process (X_i, ..., X_n). (See Chow, Robbins & Siegmund [1971], p. 50.)
(3.3.9) (Constant 2 is best in (3.3.2).) Let n = 2, X₁ = 1, and X₂ = M > 1 with probability 1/M and X₂ = 0 with probability 1 − 1/M. The expected fortune of the prophet is

E[X₁ ∨ X₂] = M (1/M) + 1 (1 − 1/M) = 2 − 1/M.

The gambler expects to receive 1 regardless of when he stops. Then P/V = 2 − 1/M, so 2 is the best constant since M is arbitrary.

(3.3.10) (Bounded case.) In Theorem (3.3.2), if the random variables take values in [0, 1], then the prophet inequality P ≤ 2V can be improved to P ≤ 2V − V² (Hill [1983]).

(3.3.11) (Constant 3 is best in (3.3.5).) Given ε > 0, there exist random variables X′₁, X′₂, ..., X′_n with P′ ≥ (3 − ε)G′, where P′ and G′ are the expected gains of the prophet and of the gambler for the primed process. The sequence (X′_i) is obtained from a sequence (X_i) by inserting many copies of the constant random variable E[X_i] between X_i and X_{i+1}. The independent positive sequence (X_i) itself is chosen so that Σ_{i=1}^n E[(X_i − e_i)⁺] ≥ (2 − ε)G and X₀ = X_n (Krengel & Sucheston [1987], p. 1598).

(3.3.12) (An economic interpretation of (3.3.7).) Assume that (Y_n) and (Z_n) are two integrable processes with arbitrary distributions, but independent of each other. Players observe the two processes alternately: say X_n = Y_n for n even and X_n = Z_n for n odd. Then the adjacent X_n's are independent. To fix the idea, suppose that a conglomerate is sufficiently diversified so that the stock prices of two of its firms, F₁ and F₂, are independent. Let Y_n be the value of c₁ shares of F₁; let Z_n be the value of c₂ shares of F₂. Two of the conglomerate's executives are allowed to trade each year kc₁ shares of F₁ for kc₂ shares of F₂ or vice versa. Each executive chooses each year his own value of k, but the k's are bounded by a fixed constant C. Assume that the junior executive bases his decisions on the present (equivalently, present and past): he is a gambler. The senior executive knows the future: he is a prophet. In practice, there is a device equivalent to the gift of prophecy put at the disposal of senior executives: the right to exercise options, i.e., trading in stock some time in the future at prices prevailing earlier. The theorem implies that the expected gain of the senior executive is less than 3 times that of his junior colleague (Krengel & Sucheston, unpublished, 1987).
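The numbers in the two-point example of (3.3.9) are easy to reproduce:

```python
def prophet_and_value(M):
    # X1 = 1; X2 = M with probability 1/M, else 0, so E[X2] = 1
    P = M * (1 / M) + 1 * (1 - 1 / M)   # E[X1 v X2] = 2 - 1/M
    V = 1.0                              # every stopping rule expects 1
    return P, V

for M in (2, 10, 1000):
    P, V = prophet_and_value(M)
    assert abs(P - (2 - 1 / M)) < 1e-12 and P / V < 2
print([round(prophet_and_value(M)[0], 3) for M in (2, 10, 1000)])  # -> [1.5, 1.9, 1.999]
```

As M grows, the ratio P/V approaches but never reaches 2, which is why 2 is the best possible constant.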
Remarks
Prophet inequalities were introduced by Krengel & Sucheston [1978], where it was first shown that there are constants C such that P/V ≤ C. For stopped processes, Krengel and Sucheston obtained C = 4; the same paper contains a proof by D. Garling that C = 2 can be taken. The paper also studies the analogous but more difficult problem when the X_i's are averages of independent random variables, obtaining C = 5.46. The problem whether the best constant in this case was 2 remained open until Hill [1986] resolved it in the affirmative. The proof of Theorem (3.3.2) given here is based on arguments communicated to us by D. Gilat and R. Wittmann. Theorems (3.3.5) and (3.3.7) are from Krengel & Sucheston [1987]. Proposition (3.3.6) was observed by Sucheston & Yan [in press].
4
Directed index set
In this Chapter we present the theory of martingales and amarts indexed by directed sets. After Dieudonné showed that martingales indexed by directed sets in general need not converge essentially, Krickeberg, in a series of papers, proved essential convergence under covering conditions called "Vitali conditions." This theory is presented in an expository article by Krickeberg & Pauc [1963] and in a book by Hayes & Pauc [1970].
Here we offer a new approach and describe the subsequent progress. The condition (V), introduced by Krickeberg to prove the essential convergence of L¹-bounded martingales, was shown not to be necessary. Similarly the condition (V°), introduced to prove convergence of L¹-bounded submartingales, is now also known not to be necessary. The condition (V_Φ), which Krickeberg showed to be sufficient for the convergence of martingales bounded in the Orlicz space L_Φ, is also necessary for this purpose if the Orlicz function Φ satisfies the (Δ₂) condition. In each instance, the convergence of appropriate classes of amarts exactly characterizes the corresponding Vitali condition. This is of particular interest for (V) and (V°), since there is no corresponding characterization in the classical theory. In general, to nearly every Vitali type of covering condition there corresponds the convergence of an appropriate class of "amarts." The understanding of this fact was helped by new formulations of Vitali conditions in terms of stopping times. Informally, a Vitali condition says that the essential upper limit of a 0-1-valued process (1_{A_t}) can be approximated by the process stopped by appropriate stopping times. This has a clear intuitive meaning even in the case of multivalued stopping times because, as a condition for convergence, the overlap of the values is small, in a precise sense. The application of martingale theory to derivation theory has long been known, but amarts also come into their own. A derivative of a superadditive set function is both a supermartingale and an amart (4.2.18). In the classical setting of derivation theory the Vitali condition (V) holds (4.2.8), but (V°) does not. So supermartingales need not converge essentially, and the amart theory is needed to prove convergence. Similarly, derivatives of functions of measures are amarts, and the Riesz decomposition sheds some light on their behavior (4.2.19). Since the condition (V) is not necessary for essential convergence of martingales, is there a covering condition both sufficient and necessary? The
answer is yes: There is a condition (C) for this purpose. Condition (C)
is sufficient for convergence of L¹-bounded martingales, and also necessary if the index set has a countable cofinal subset. The question of necessity remains open for the general index set. It is also not known whether (V_Φ) is necessary for convergence of L_Φ-bounded martingales if (Δ₂) fails. Another open question is the existence of a covering condition both sufficient and necessary for convergence of L¹-bounded submartingales. The conditions (V_Φ) are modeled on similar covering conditions in the theory of derivation (see Chapter 7). Condition (C) has moved in the opposite direction: it was first introduced in the study of convergence of processes indexed by directed sets, then translated into a condition for derivation theory. As with Doob's theory of martingales indexed by IN, the main interest and the main difficulties occur in the L¹-bounded case. The greater part of the Chapter is devoted to that case.
4.1. Essential and stochastic convergence

There are two modes of convergence that will be used in this chapter: essential convergence and stochastic convergence. When the index set is countable, essential convergence coincides with almost everywhere convergence; but when the index set is uncountable, essential convergence is still reasonable, although almost everywhere convergence may not be. Essential convergence is called order convergence in some of the literature. Stochastic convergence is also known as convergence in probability (or in measure). We will consider the corresponding stochastic upper limit and stochastic lower limit.

Essential convergence
Let (Ω, F, P) be a probability space. Let S be the set of all extended-real-valued random variables. Let P ⊆ S be some set of random variables. The random variable Z ∈ S is called the essential supremum of the set P if

(1) Z ≥ X a.s. for all X ∈ P;
(2) if Y ∈ S and Y ≥ X a.s. for all X ∈ P, then Y ≥ Z a.s.

(Thus Z is the least upper bound of P in the partial order obtained by identifying functions that agree almost everywhere.)
(4.1.1) Proposition. (a) Every subset P of S has an essential supremum, ess sup P, unique up to null sets. (b) There exists a sequence X_n in P such that sup_n X_n = ess sup P a.s. (c) If the family P is directed, the sequence X_n may be chosen to be a.s. increasing.

Proof. Assume first that all X ∈ P satisfy 0 ≤ X ≤ 1. Let P₁ be the set of all countable suprema sup_k X_k for X_k ∈ P. Write

α = sup{E[Y] : Y ∈ P₁}.
(Since 0 ≤ Y ≤ 1 for all Y ∈ P₁, we have 0 ≤ α ≤ 1.) For each n, choose Y_n ∈ P₁ with E[Y_n] ≥ α − 2⁻ⁿ, and define Z(ω) = sup_n Y_n(ω). Then Z is itself a countable supremum of elements of P, so Z ∈ P₁. We have E[Z] = α. We claim that Z is an essential supremum for the set P. If X ∈ P, then X ∨ Z ∈ P₁ and

α ≥ E[X ∨ Z] ≥ E[Z] = α,

so X ∨ Z = Z a.s., hence X ≤ Z a.s. On the other hand, if Y is a random variable with Y ≥ X a.s. for all X ∈ P, then Y ≥ Z a.s., since Z is a countable supremum of elements of P. For the general case, choose a continuous strictly increasing bijection between [−∞, ∞] and [0, 1], and apply the preceding case.

For the uniqueness, suppose that both Z₁ and Z₂ are essential suprema for the set P. Now Z₁ ≥ X a.s. for all X ∈ P, so (since Z₂ is an essential supremum) Z₁ ≥ Z₂ a.s. Similarly, Z₂ ≥ Z₁ a.s. Therefore Z₁ = Z₂ a.s. This completes the proof of (a) and (b). For (c), note that if the sequence X_n is chosen as in (b), we may choose X′_n ∈ P so that X′_n ≥ X_n and X′_n ≥ X′_k for 1 ≤ k < n, since P is directed. Then X′_n is increasing and sup X′_n = ess sup P.
We will write ess sup P for the essential supremum of the set P. The essential infimum is defined analogously (the greatest lower bound in the partial order obtained by identifying functions that agree almost everywhere), or equivalently in terms of the essential supremum:

ess inf P = −(ess sup{−X : X ∈ P}).

Note: it is easily verified that if P is countable, then Z = ess sup P is the pointwise supremum,

Z(ω) = sup{X(ω) : X ∈ P}.

This need not be true for uncountable P. For example, if (Ω, F, P) is [0, 1] with Lebesgue measure, and P is the set

P = {1_{{x}} : x ∈ [0, 1]},

then ess sup P is 0 a.s., but sup{X(ω) : X ∈ P} is identically 1. If P is replaced by a nonmeasurable subset, then sup{X(ω) : X ∈ P} is not measurable. The essential supremum is better behaved.
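The uncountability phenomenon above cannot be reproduced on a finite space, but the role of null sets can. The sketch below (illustrative only, not from the text) represents random variables as vectors on a five-atom space in which one atom is null; the essential supremum ignores the null atom while the pointwise supremum does not.

```python
probs = [0.25, 0.25, 0.25, 0.25, 0.0]     # atom 4 is a null set
NEG_INF = float("-inf")

def ess_sup(family):
    # pointwise sup on atoms of positive probability; the value on a null
    # atom is irrelevant, so fix a representative that is -inf there
    return [max(f[i] for f in family) if p > 0 else NEG_INF
            for i, p in enumerate(probs)]

def ae_equal(f, g):
    # equality almost surely, i.e. off the null atoms
    return all(p == 0 or f[i] == g[i] for i, p in enumerate(probs))

ind = lambda j: [1.0 if i == j else 0.0 for i in range(5)]
Z = ess_sup([ind(4)])          # family: the single indicator of the null atom
assert ae_equal(Z, [0.0] * 5)  # ess sup is 0 a.s. ...
assert max(ind(4)) == 1.0      # ... although the pointwise sup reaches 1
print("ess sup ignores null atoms")
```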
(4.1.2) Proposition. Suppose the set P is a nonempty family of indicator functions of measurable sets. Then ess sup P is also an indicator function of a measurable set.

Proof. Let Z = ess sup P. Since P is nonempty, there is X ∈ P, and thus Z ≥ X ≥ 0 a.s. For all X ∈ P, we have X ≤ 1, so Z ≤ 1. Let A = {Z ≥ 1}. We claim Z = 1_A a.s. If X ∈ P, then X ≤ Z a.s. But X is an indicator function, so it is 0 whenever it is less than 1, and we have
X ≤ 1_A a.s. Therefore, by the definition of essential supremum, Z ≤ 1_A. Clearly Z ≥ 1_A a.s., so Z = 1_A a.s.

If C ⊆ F is a family of measurable sets, then we will write B = ess sup C for the set of the proposition; that is,

1_B = ess sup_{A∈C} 1_A.

(By convention, if C = ∅, then ess sup C = ∅.)
(4.1.3) Definition. Let J be a directed set, and let (X_t)_{t∈J} be a net of random variables. The essential upper limit of (X_t) is defined by

e lim sup_{t∈J} X_t = ess inf_{s∈J} ess sup_{t∈J, t≥s} X_t.

We will often write X* for the e lim sup of a process (X_t). The essential lower limit e lim inf X_t is defined analogously:

e lim inf_{t∈J} X_t = ess sup_{s∈J} ess inf_{t∈J, t≥s} X_t,

or e lim inf X_t = −e lim sup(−X_t).

(4.1.4) Proposition. If (X_t) is a net of random variables, then we have e lim inf X_t ≤ e lim sup X_t a.s.
Proof. For each s ∈ J, ess inf_{t≥s} X_t ≤ X_s. Therefore, for each u ∈ J,

ess sup_{s≥u} ess inf_{t≥s} X_t ≤ ess sup_{s≥u} X_s,

hence e lim inf_t X_t ≤ ess sup_{s≥u} X_s (the net ess inf_{t≥s} X_t is increasing in s, so its supremum over s ≥ u is the same as over all s ∈ J). Therefore,

e lim inf_t X_t ≤ ess inf_u ess sup_{s≥u} X_s = e lim sup_s X_s.
Let (X_t) be a net of random variables. It is said to converge essentially to a random variable X_∞ if

X_∞ = e lim sup_{t∈J} X_t = e lim inf_{t∈J} X_t  a.s.

Then we write X_∞ = e lim_{t∈J} X_t. We leave to the reader the verification that when J is countable, we have

(e lim sup_{t∈J} X_t)(ω) = lim sup_{t∈J} (X_t(ω)),
(e lim inf_{t∈J} X_t)(ω) = lim inf_{t∈J} (X_t(ω)),
(e lim_{t∈J} X_t)(ω) = lim_{t∈J} (X_t(ω))

for almost all ω.
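For countable J the definition can be exercised directly. The sketch below (illustrative, truncating J = IN at T terms) computes e lim sup via ess inf over s of ess sup over t ≥ s on a two-atom space where both atoms have positive probability, so the essential operations reduce to pointwise ones, and the result approximates the pointwise lim sup.

```python
T = 200   # truncation level for the index set IN

def X(t, w):
    # atom 0: an oscillating sequence with lim sup 1; atom 1: a null sequence
    return (-1) ** t * (1 + 1 / (t + 1)) if w == 0 else 1 / (t + 1)

def elimsup(w):
    # ess inf_s ess sup_{t >= s}; pointwise here since both atoms are non-null
    return min(max(X(t, w) for t in range(s, T)) for s in range(T - 1))

assert abs(elimsup(0) - 1.0) < 0.02   # lim sup_t X_t = 1 on atom 0
assert abs(elimsup(1) - 0.0) < 0.01   # lim sup_t X_t = 0 on atom 1
print(elimsup(0), elimsup(1))
```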
Stochastic convergence

Stochastic convergence (convergence in probability, convergence in measure) will be useful not only in situations when essential convergence fails, but also as a tool in proofs of essential convergence. We begin with the stochastic upper and lower limit. (There does not seem to be a reasonable notion of "stochastic supremum," however.) Let (Ω, F, P) be a probability space.
(4.1.5) Definition. Let (X_t)_{t∈J} be a net of extended-real-valued random variables. A random variable Y such that lim_t P{X_t > Y} = 0 is called asymptotically greater than X_t in probability. (The words "in probability" will sometimes be omitted.) The stochastic upper limit of (X_t), written s lim sup_t X_t, is the essential infimum of the set of all extended random variables which are asymptotically greater than X_t in probability. The stochastic lower limit, written s lim inf X_t, is

s lim inf X_t = −s lim sup(−X_t).

If s lim sup X_t = s lim inf X_t = X_∞, then X_∞ is called the stochastic limit of (X_t), which is then said to converge stochastically (or to converge in probability) to X_∞. We write s lim_{t∈J} X_t = X_∞.

There are relations between stochastic convergence and essential convergence.
(4.1.6) Proposition. Let (X_t)_{t∈J} be a net of extended-real-valued random variables. Then

(4.1.6a)    e lim inf X_t ≤ s lim inf X_t ≤ s lim sup X_t ≤ e lim sup X_t.

Therefore, if (X_t) converges essentially, then it converges stochastically to the same limit.

Proof. If s₀ ∈ J, then ess sup_{t≥s₀} X_t ≥ X_s for all s ≥ s₀. Hence

lim_s P{ess sup_{t≥s₀} X_t < X_s} = 0,

so ess sup_{t≥s₀} X_t is asymptotically greater than (X_s), and therefore

ess sup_{t≥s₀} X_t ≥ s lim sup_s X_s;

this holds for all s₀, so

e lim sup_{t∈J} X_t = ess inf_{s₀} ess sup_{t≥s₀} X_t ≥ s lim sup_{t∈J} X_t.

Similarly,

e lim inf_{t∈J} X_t ≤ s lim inf_{t∈J} X_t.
(4.1.7) There is a natural definition intermediate between essential convergence and stochastic convergence. When three of the four random variables in (4.1.6a) coincide, we say that (X_t) demiconverges to their common value. More specifically, we may say that (X_t) upper demiconverges to X_∞ if

s lim X_t = e lim sup X_t = X_∞,

and that (X_t) lower demiconverges to X_∞ if

s lim X_t = e lim inf X_t = X_∞.

Thus demiconvergence implies stochastic convergence. (For some results on demiconvergence, see (7.3.24), (9.1.2), (9.4.4), (9.4.13).)

Let (A_t) be a net of events. Then s lim sup 1_{A_t} is an indicator function. Indeed, if Y is asymptotically greater than (1_{A_t}), then

0 = lim P{1_{A_t} > Y} ≥ P{0 > Y},

so Y ≥ 0 a.s. Thus {1_{A_t} > Y} ⊇ {1_{A_t} > 1_{{Y≥1}}}; that is, 1_{{Y≥1}} is also asymptotically greater than (1_{A_t}), and 1_{{Y≥1}} ≤ Y. Thus s lim sup 1_{A_t} is the essential infimum of a family of indicator functions, and therefore by (4.1.2) is an indicator function itself:

1_B = s lim sup 1_{A_t}.

We will call B the stochastic upper limit of (A_t), and write B = Ā = s lim sup A_t. Translating the definition, we get:

(4.1.7a)    Ā is the smallest set C such that lim_t P(A_t \ C) = 0.

Indeed, A_t \ C = {1_{A_t} > 1_C}, hence lim_t P(A_t \ C) = 0 if and only if 1_C is asymptotically greater than 1_{A_t} in probability.

It can be easily verified that if X_∞ is finite a.s., then s lim_{t∈J} X_t = X_∞ is equivalent to the usual definition:

(4.1.7b)    lim_{t∈J} P{|X_∞ − X_t| > ε} = 0  for all ε > 0.
For sets, B = s lim A_t holds if and only if

(4.1.7c)    P(A_t △ B) → 0.

(4.1.8) Lemma. Suppose (A_t) is an adapted sequence of sets. Then Ā = s lim sup A_t is the largest set A such that

(4.1.8a)    lim sup_t P(A_t ∩ B) > 0

for all subsets B ⊆ A of positive probability.

Proof. Let A be a set such that (4.1.8a) holds for every B ⊆ A of positive probability. Then

P(A_t ∩ (A \ Ā)) ≤ P(A_t \ Ā) → 0,

which implies P(A \ Ā) = 0 and therefore A ⊆ Ā a.s.

Conversely, let B ⊆ Ā with P(B) > 0, but suppose that (4.1.8a) fails. We have

P(A_t \ (Ā \ B)) ≤ P(A_t \ Ā) + P(A_t ∩ B).

Applying (4.1.7a) with C = Ā, we obtain lim_t P(A_t \ Ā) = 0. Therefore lim P(A_t \ (Ā \ B)) = 0, hence P(B) = 0, because Ā is the smallest set C as in (4.1.7a). This contradiction completes the proof.
(4.1.9) Lemma. Let Ā = s lim sup A_t, and let (s_n) be a sequence of indices. Then there exists an increasing sequence (t_n) of indices such that s_n ≤ t_n and Ā ⊆ ⋃_{n=1}^∞ A_{t_n}.

Proof. On measurable subsets of Ā define a function γ by

γ(B) = lim sup_t P(A_t ∩ B).

Set B₁ = Ā and choose t₁ ≥ s₁ such that P(A_{t₁} ∩ B₁) ≥ γ(B₁)/2. Then set B₂ = Ā \ A_{t₁} and choose t₂ ≥ t₁, t₂ ≥ s₂, such that P(A_{t₂} ∩ B₂) ≥ γ(B₂)/2. Continue the definition of (t_n) and (B_n) recursively: given t₁, ..., t_n and B₁, ..., B_n, set

B_{n+1} = Ā \ ⋃_{i=1}^n A_{t_i},

and choose t_{n+1} ≥ t_n, t_{n+1} ≥ s_{n+1}, with P(A_{t_{n+1}} ∩ B_{n+1}) ≥ γ(B_{n+1})/2. Since

⋂_{n=1}^∞ B_n = Ā \ ⋃_{n=1}^∞ A_{t_n},

it suffices to show P(⋂_n B_n) = 0. The sets B_n ∩ A_{t_n} are pairwise disjoint in a finite measure space, so P(A_{t_n} ∩ B_n) → 0, and γ(B_n) decreases; hence

lim_n γ(B_n) = lim_n lim sup_t P(A_t ∩ B_n) ≤ 2 lim sup_n P(A_{t_n} ∩ B_n) = 0.

It follows that γ(⋂ B_n) = 0, hence by Lemma (4.1.8), P(⋂ B_n) = 0, so Ā ⊆ ⋃_n A_{t_n}.
Proofs of convergence in Chapter 1 (for example (1.2.4)) are based on approximation of the a.s. lim sup by the stopped process. We will see below that also on directed sets the stochastic upper limit s lim sup A_t can be approximated by a stopped set A(τ), with τ ∈ Σ. This was implicitly used in the proof of (1.3.9). (For the essential lim sup, e lim sup A_t, such an approximation need not hold; it must be postulated; this is the meaning of the Vitali conditions below.) In fact, it suffices to stop with ordered stopping times. A simple ordered stopping time is a stopping time τ : Ω → J such that its set of values is totally ordered (see Section 1.3). We write Σ° for the set of ordered stopping times.
(4.1.10) Proposition. Let (A_t) be an adapted family of sets, and write Ā = s lim sup A_t. For every ε > 0 and every t₀ ∈ J, there is a τ ∈ Σ° such that τ ≥ t₀ and P(Ā △ A(τ)) < ε.

Proof. Given ε and t₀, choose a sequence s_n ∈ J such that s_n ≥ t₀ and P(A_t \ Ā) < ε 2^{−n−1} for all t ≥ s_n. Let (t_n) be the sequence obtained by application of Lemma (4.1.9). Choose k such that P(Ā \ ⋃_{i=1}^k A_{t_i}) < ε/2. Define τ = t_n on A_{t_n} \ ⋃_{i=1}^{n−1} A_{t_i} for n ≤ k, and τ = t_{k+1} on the rest of Ω. Then τ ∈ Σ° and P(Ā △ A(τ)) < ε.

The corresponding result for processes is proved next. A process (X_t)_{t∈J} is called asymptotically uniformly absolutely continuous if, for every ε > 0, there exist s ∈ J and δ > 0 such that if P(B) < δ and t ≥ s, then E[|X_t 1_B|] < ε.
(4.1.11) Stochastic maximal inequality. Let (X_t)_{t∈J} be a positive adapted process, let λ > 0, and define A = {s lim sup X_t ≥ λ}.

(a) Then

P(A) ≤ (1/λ) lim sup_{τ∈Σ°} E[X_τ].

(b) Suppose also that (X_τ)_{τ∈Σ°} is asymptotically uniformly absolutely continuous. Then

P(A) ≤ (1/λ) lim sup_{τ∈Σ°} E[X_τ 1_A].
Proof. (a) Let (X_t) be a positive process and λ > 0. Fix a number a with 0 < a < λ, and write B_t = {X_t > λ − a}. Then

{s lim sup X_t ≥ λ} ⊂ s lim sup B_t =: B.

Indeed, suppose Y is a random variable asymptotically greater than (X_t), and D is an event such that 1_D is asymptotically greater than (1_{B_t}). If

Y′ = (λ − a) ∧ Y on Ω \ D,  Y′ = Y on D,

then we claim Y′ is asymptotically greater than (X_t):

P{X_t > Y′} ≤ P{X_t > Y} + P(B_t \ D) → 0.

Therefore s lim sup X_t ≤ Y′. Thus

{s lim sup X_t ≥ λ} ⊂ {Y′ ≥ λ} ⊂ D.

This is true for all events D such that 1_D is asymptotically greater than (1_{B_t}), so {s lim sup X_t ≥ λ} ⊂ B.
Now given ε > 0, choose s ∈ J and τ ∈ Σ° such that τ ≥ s and P(B △ B(τ)) < ε. This is possible by (4.1.10). Then, since X_τ > λ − a on B(τ),

P(B) ≤ P(B(τ)) + ε ≤ (1/(λ − a)) E[X_τ 1_{B(τ)}] + ε.

The result follows on letting s → ∞, a → 0 and ε → 0.

(b) Given δ > 0, choose s ∈ J and ε > 0 with ε < δ such that P(B) < 2ε and τ ≥ s together imply E[X_τ 1_B] < δλ. Then let a < λ/2 be so small that

(4.1.11a)  P(s lim sup {X_t > λ − a} \ A) < ε.

Write B_t = {X_t > λ − a} and B = s lim sup B_t. Now choose τ ∈ Σ°, τ ≥ s, such that P(B △ B(τ)) < ε. We have A ⊂ B, so P(A \ B(τ)) ≤ P(B \ B(τ)) < ε, and P(B(τ) \ A) < 2ε by (4.1.11a), so

E[X_τ 1_{B(τ)}] ≤ E[X_τ 1_A] + δλ.

Then:

P(A) ≤ P(B) ≤ P(B(τ)) + ε ≤ (1/(λ − a)) E[X_τ 1_{B(τ)}] + ε ≤ (1/(λ − a)) E[X_τ 1_A] + δλ/(λ − a) + ε ≤ (1/(λ − a)) E[X_τ 1_A] + 3δ.

The maximal inequality follows on letting s → ∞, a → 0 and δ → 0.
In the previous result, note that if (X_t) is a positive submartingale or positive supermartingale, then the right-hand side of the inequality in (a) simplifies to (1/λ) lim_{t∈J} E[X_t].
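For J = IN, part (a) specializes to the familiar Doob-type bound P{max_{n≤N} S_n ≥ λ} ≤ E[S_N]/λ for a positive submartingale (S_n). The following is a minimal sketch, not from the text, checking the bound exactly by enumerating all paths of S_n = |W_n| for a fair simple random walk W_n; the horizon N = 10 and threshold λ = 3 are illustrative choices:

```python
from itertools import product
from fractions import Fraction

def maximal_inequality_check(N=10, lam=3):
    # Enumerate all 2^N equally likely paths of a fair random walk W_n.
    # S_n = |W_n| is a positive submartingale; compare P(max S_n >= lam)
    # with the maximal-inequality bound E[S_N] / lam, in exact arithmetic.
    p_max = Fraction(0)
    e_final = Fraction(0)
    weight = Fraction(1, 2 ** N)
    for steps in product((-1, 1), repeat=N):
        w, running_max = 0, 0
        for s in steps:
            w += s
            running_max = max(running_max, abs(w))
        if running_max >= lam:
            p_max += weight
        e_final += weight * abs(w)
    return p_max, e_final / lam

p, bound = maximal_inequality_check()
assert 0 < p <= bound  # the maximal inequality holds exactly
```

The exhaustive enumeration avoids Monte Carlo noise, so the inequality can be asserted rather than estimated.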
We will now use the stochastic maximal inequality to show that the (essential) maximal inequality is equivalent to convergence of martingales. We state it here in a somewhat abstract form in part (ii); the best-known application is M(λ, ε) = ε/λ, as in part (i).
(4.1.12) Proposition. Let (F_t)_{t∈J} be a stochastic basis. The following are equivalent:
(i) For every integrable real random variable X and every λ > 0, we have

P{e lim sup |E^{F_t}[X]| ≥ λ} ≤ (1/λ) E[|X|].

(ii) There exists M : ℝ₊ × ℝ₊ → ℝ₊ with (a) lim_{ε→0} M(λ, ε) = 0 for every λ > 0, and (b) for ε, λ > 0 and every positive integrable random variable X with E[X] ≤ ε, we have

P{e lim sup E^{F_t}[X] ≥ λ} ≤ M(λ, ε).

(iii) For every positive integrable random variable X, the martingale X_t = E^{F_t}[X] converges essentially.
(iv) For every integrable real random variable X and every λ > 0, letting A = {e lim sup |E^{F_t}[X]| ≥ λ}, we have

P(A) ≤ (1/λ) E[|X| 1_A].

Proof. (i) ⇒ (ii). Set M(λ, ε) = ε/λ.
(ii) ⇒ (iii). Suppose the function M as in (ii) exists. Let X ∈ L_1. Write F_∞ = σ(∪ F_t). We claim that the martingale X_t = E^{F_t}[X] converges essentially to E^{F_∞}[X]. Since X may be replaced by E^{F_∞}[X], we may assume that X is F_∞-measurable. In that case, we must show that

e lim X_t = X.

Fix α > 0 and λ > 0. Choose ε > 0 so small that ε ≤ αλ/2 and M(λ/2, ε) ≤ α. Since X is F_∞-measurable, there exist t_0 ∈ J and Y ∈ L_1(F_{t_0}) such that E[|X − Y|] < ε. (See (1.1.12).) Then by the triangle inequality, since E^{F_t}[Y] = Y for t ≥ t_0,

P{e lim sup |X_t − X| ≥ λ} ≤ P{e lim sup E^{F_t}[|X − Y|] ≥ λ/2} + P{|X − Y| ≥ λ/2}.

But the first term on the right-hand side is at most M(λ/2, ε) ≤ α, while by Chebyshev's inequality the second term is at most 2ε/λ ≤ α. Therefore P{e lim sup |X_t − X| ≥ λ} ≤ 2α for every α > 0 and λ > 0. Thus e lim sup |X_t − X| = 0, and therefore e lim X_t = X.

(iii) ⇒ (iv). Let X ∈ L_1. Let X_t = E^{F_t}[X], so that also X_τ = E^{F_τ}[X] for τ ∈ Σ by the localization theorem (1.4.2). Since X_t converges essentially,
we have s lim sup X_t = e lim X_t = e lim sup X_t and A = {s lim sup |X_t| ≥ λ}. Now if Y_t = |X_t|, then (Y_τ)_{τ∈Σ°} is uniformly absolutely continuous by (2.3.12). Then applying (4.1.11(b)), we obtain

P(A) ≤ (1/λ) lim sup_{τ∈Σ°} E[Y_τ 1_A] ≤ (1/λ) lim sup_{τ∈Σ°} E[E^{F_τ}[|X|] 1_A] = (1/λ) E[|X| 1_A].

The last equality follows because E^{F_τ}[|X|] converges to E^{F_∞}[|X|] in L_1.

(iv) ⇒ (i) is obvious.

The preceding proof in fact shows more. We need not consider all random variables X, but only a certain subclass. A family ℰ of processes (X_t, F_t), where X_t is measurable with respect to F_t, is called stable if for every (X_t, F_t) ∈ ℰ and every t_0 ∈ J, the process (Y_t, G_t) is also in ℰ, where, for t ≥ t_0, Y_t = X_t − X_{t_0} and G_t = F_t; and for other t, Y_t = 0 and G_t = F_{t_0}. When we say (Y_t) ∈ ℰ without specifying the σ-algebras, we understand that (Y_t, G_t) ∈ ℰ, where G_t is the σ-algebra generated by {Y_s : s ≤ t}.
(4.1.13) Proposition. Let (F_t)_{t∈J} be a stochastic basis, and let ℰ be a stable subfamily of the family of uniformly integrable martingales. Then the following conditions are equivalent:
(i) For every (X_t) ∈ ℰ and every λ > 0, we have

P{e lim sup |X_t| ≥ λ} ≤ (1/λ) sup_t E[|X_t|].

(ii) Each (X_t) ∈ ℰ converges essentially.

Snell envelope
There is one application of the essential supremum that is useful even when the processes are indexed by IN. But it works also for processes indexed by any directed set J. We will define a variant of the Snell envelope, restricting stopping times to Σ; this is important in connection with amarts. Let (X_t) be an adapted integrable process. Then the Snell envelope of (X_t) is the process (Z_t) defined by:

Z_t = ess sup_{τ∈Σ, τ≥t} E^{F_t}[X_τ].
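In the simplest setting J = {0, 1, …, N}, with stopping times bounded by N, the Snell envelope is given by the classical backward recursion Z_N = X_N, Z_n = max(X_n, E[Z_{n+1} | F_n]). The following sketch, on a binary tree of fair coin flips, is an illustrative assumption (the tree encoding and the example process are not the text's notation):

```python
from itertools import product

def snell_envelope(X, N):
    """Backward recursion Z_n = max(X_n, E[Z_{n+1} | F_n]).

    X maps each path prefix (tuple of +1/-1 of length n <= N) to the
    value of X_n on that atom of F_n; coin flips are fair, so the
    conditional expectation averages the two one-step continuations."""
    Z = {p: v for p, v in X.items() if len(p) == N}
    for n in range(N - 1, -1, -1):
        for p in (q for q in X if len(q) == n):
            cont = 0.5 * (Z[p + (1,)] + Z[p + (-1,)])
            Z[p] = max(X[p], cont)
    return Z

# example: X_n = |W_n| for a two-step fair walk W_n
N = 2
X = {p: abs(sum(p)) for n in range(N + 1) for p in product((1, -1), repeat=n)}
Z = snell_envelope(X, N)
assert Z[()] == 1  # sup over stopping times tau <= 2 of E[|W_tau|]
```

The dictionary-of-prefixes representation keeps each Z_n constant on the atoms of F_n, mirroring the measurability requirement in the definition.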
(4.1.14) Proposition. Let (Z_t) be the Snell envelope of the integrable process (X_t). Then:
(1) If σ ∈ Σ, then

Z_σ = ess sup_{τ∈Σ, τ≥σ} E^{F_σ}[X_τ].

(2) For σ ∈ Σ, there exists a sequence τ_n ∈ Σ, τ_n ≥ σ, such that E^{F_σ}[X_{τ_n}] ↑ Z_σ.
(3) (Z_t) is an amart and a supermartingale.
Proof. (1) Let σ ∈ Σ, and let t be one of the values of σ. If τ ∈ Σ and τ ≥ σ, then there is τ′ ∈ Σ, τ′ ≥ t, such that τ = τ′ on {σ = t}. Indeed, choose t_0 with t_0 ≥ τ and t_0 ≥ t, and let

τ′(ω) = τ(ω) if σ(ω) = t;  τ′(ω) = t_0 otherwise.

Similarly, if τ′ ∈ Σ and τ′ ≥ t, then there is τ ∈ Σ, τ ≥ σ, such that τ = τ′ on {σ = t}. So on the set {σ = t}, we have (by the localization theorem (1.4.2))

E^{F_σ}[X_τ] = E^{F_t}[X_τ] = E^{F_t}[X_{τ′}].

Hence on {σ = t} we have

ess sup_{τ≥σ} E^{F_σ}[X_τ] = ess sup_{τ′≥t} E^{F_t}[X_{τ′}] = Z_t = Z_σ.

This is true for all t.
This is true for all t. (2) First consider the case or = t constant. For Tl,T2 > t, if we define B = {Eat [XT.]
Eat [Xr2] },
then B E Ft, and r defined by Tl T =
{T2
belongs to E. We have T > t and
on B
onQ\B [Xr] > Eft [XT.] for i = 1, 2. Thus
the collection
{EFt[X,.]:rEE,T>t} is directed. Apply (4.1.1(c)) to complete the proof of (2) in the case a = t.
In the general case, for each t ∈ J with {σ = t} ≠ ∅, choose a sequence τ_n^{(t)} ∈ Σ with τ_n^{(t)} ≥ t such that

Z_t = sup_n E^{F_t}[X_{τ_n^{(t)}}].

Now by (4.1.23) there exist stopping times τ_n with τ_n = τ_n^{(t)} on {σ = t}. We have Z_σ = Z_t on {σ = t}, so we get

Z_σ = sup_n E^{F_σ}[X_{τ_n}].

(3) Let σ_1 ≤ σ_2 be simple stopping times. Choose τ_n ≥ σ_2 with E^{F_{σ_2}}[X_{τ_n}] ↑ Z_{σ_2}. Now τ_n ≥ σ_1, so E^{F_{σ_1}}[X_{τ_n}] ≤ Z_{σ_1}. So by the monotone convergence theorem, (Z_t) is a supermartingale, and

E[Z_{σ_1}] ≥ lim_n E[E^{F_{σ_1}}[X_{τ_n}]] = lim_n E[E^{F_{σ_2}}[X_{τ_n}]] = E[Z_{σ_2}].

Thus the net (E[Z_σ])_{σ∈Σ} is decreasing, hence convergent, so (Z_t) is an amart.
As an application of the Snell envelope, we prove a stronger form of the amart Riesz decomposition (1.4.6). Recall that an amart potential is an amart (X_n) such that lim_n E[X_n 1_A] = 0 for all A ∈ ∪_{m=1}^{∞} F_m. A Doob potential is a positive supermartingale (S_n) with lim_n E[S_n] = 0.

(4.1.15) Theorem. Let (X_n)_{n∈IN} be an adapted process. Then (X_n) is an amart potential if and only if there is a Doob potential (S_n) with |X_n| ≤ S_n a.s. for all n.

Proof. Suppose |X_n| ≤ S_n a.s. Then |X_σ| ≤ S_σ for all σ ∈ Σ. Now E[S_σ] decreases as σ increases. But E[S_n] → 0, so also E[S_σ] → 0. Then |E[X_σ]| ≤ E[|X_σ|] ≤ E[S_σ], so (X_n) is an amart, and E[|X_n|] → 0, so it is an amart potential.

Conversely, suppose (X_n) is an amart potential. Then (|X_n|) is also an amart potential. Let (S_n) be the Snell envelope of the process (|X_n|). Then (S_n) is an amart and a supermartingale (4.1.14(3)). Also, S_n ≥ |X_n|, so S_n ≥ 0. Finally, E[|X_τ|] → 0, so by (4.1.14(2)) we have E[S_n] → 0.
The Riesz decomposition and (4.1.15) indicate that for J = IN, the amart convergence theorem is not likely to have striking applications not available with martingales and supermartingales. In the vector-valued case (Chapter 5) the situation is similar for uniform amarts (5.2.13), but not for other classes of amarts, since the Riesz decompositions are less restrictive. Also, on directed sets (this chapter) the behavior of amarts cannot be reduced to that of martingales and supermartingales.
Complements
(4.1.16) (Fatou's lemma for stochastic convergence.) Let (X_t) be a net of nonnegative random variables. Then E[s lim inf X_t] ≤ lim inf E[X_t].

(4.1.17) (Uniform integrability and stochastic convergence.) Let (X_t) be a uniformly integrable net in L_1. If X_t → X stochastically, then E[X_t] → E[X]. In particular, if X_t → X essentially, then E[X_t] → E[X].

(4.1.18) (Monotone convergence theorem for essential convergence.) Let (X_t) be a net bounded in L_1 that is monotone increasing, in the sense that if s ≤ t, then X_s ≤ X_t a.s. Then e lim X_t exists and lim E[X_t] = E[e lim X_t]. In particular, s lim X_t exists and lim E[X_t] = E[s lim X_t].

(4.1.19) If a net (X_t) converges in L_p norm, for some p (1 ≤ p < ∞), then (X_t) converges stochastically.

(4.1.20) Let (X_t) be a net of random variables, and let λ ∈ ℝ. Let A_t = {X_t > λ} and B_t = {X_t ≥ λ}. Then:

{X_s ∨ X_t > λ} = A_s ∪ A_t ⊂ B_s ∪ B_t = {X_s ∨ X_t ≥ λ}
{X_s ∧ X_t > λ} = A_s ∩ A_t ⊂ B_s ∩ B_t = {X_s ∧ X_t ≥ λ}
{X_{t_1} ∨ X_{t_2} ∨ ⋯ > λ} = A_{t_1} ∪ A_{t_2} ∪ ⋯ ⊂ B_{t_1} ∪ B_{t_2} ∪ ⋯ ⊂ {X_{t_1} ∨ X_{t_2} ∨ ⋯ ≥ λ}
{X_{t_1} ∧ X_{t_2} ∧ ⋯ > λ} ⊂ A_{t_1} ∩ A_{t_2} ∩ ⋯ ⊂ B_{t_1} ∩ B_{t_2} ∩ ⋯ = {X_{t_1} ∧ X_{t_2} ∧ ⋯ ≥ λ}.

(4.1.21) In the same notation,
(a) {ess sup X_t > λ} = ess sup A_t ⊂ ess sup B_t ⊂ {ess sup X_t ≥ λ}
(b) {ess inf X_t > λ} ⊂ ess inf A_t ⊂ ess inf B_t = {ess inf X_t ≥ λ}
(c) {e lim sup X_t > λ} ⊂ e lim sup A_t ⊂ e lim sup B_t ⊂ {e lim sup X_t ≥ λ}
(d) {e lim inf X_t > λ} ⊂ e lim inf A_t ⊂ e lim inf B_t ⊂ {e lim inf X_t ≥ λ}.
(4.1.22) Let (X_t) be a net of random variables, and let λ ∈ ℝ. Then

s lim sup_t {X_t > λ} ⊂ {s lim sup_t X_t ≥ λ}.

Indeed, suppose Y is a random variable asymptotically above X_t, that is, P{X_t > Y} → 0. Write B_t = {X_t > λ}. Then

P(B_t \ {Y ≥ λ}) = P{Y < λ, X_t > λ} ≤ P{X_t > Y} → 0,

so that {Y ≥ λ} ⊇ s lim sup B_t, or Y ≥ λ on s lim sup B_t. This is true for all such Y, so by the definition of s lim sup X_t, we have also s lim sup X_t ≥ λ on s lim sup B_t. That is, {s lim sup X_t ≥ λ} ⊇ s lim sup B_t.
Note that in general the reverse inclusion

{s lim sup_t X_t ≥ λ} ⊂ s lim sup_t {X_t > λ}

fails. For example, with J = IN, if X_n = 1 − 1/n and λ = 1, then the left side is Ω and the right side is ∅.

(4.1.23) (Generalized waiting lemma.) Generalize (1.1.5) to directed sets: Let (F_t)_{t∈J} be a stochastic basis, and let Σ be the corresponding set of simple stopping times. Let σ ∈ Σ be given, and for each t ∈ J with {σ = t} ≠ ∅, let τ^{(t)} ∈ Σ be given with τ^{(t)} ≥ t. Then τ defined by τ(ω) = τ^{(t)}(ω) on {σ = t} belongs to Σ and τ ≥ σ.

Remarks
Stochastic upper and lower limits (also called upper and lower limits in measure) are due to D. E. Menchoff. The definition and treatment were considerably simplified by Goffman & Waterman [1960]. We have used their definition. Propositions (4.1.11) and (4.1.12) are from Millet & Sucheston [1980e]. For processes indexed by IN, the connection between maximal inequalities and convergence has been much studied; see for example Burkholder [1964]. Demiconvergence of martingales was first observed by Edgar & Sucheston [1981]. See also Millet & Sucheston [1983] and Frangos & Sucheston [1985].
4.2. The covering condition (V)

Martingales indexed by a directed set converge stochastically. We will now provide an example showing that they need not converge essentially.
(4.2.1) Example. Let J be the set of all finite subsets of IN, ordered by inclusion. Then J is a countable directed set. Let U_n be independent, identically distributed random variables with

P{U_n = 1} = P{U_n = −1} = 1/2.

For each finite set t ∈ J, let F_t be the (finite) σ-algebra generated by the random variables U_n, n ∈ t. Thus, if s ⊂ t, then F_s ⊂ F_t. Define

X_t = Σ_{n∈t} (1/n) U_n.

First, we verify that the process (X_t) is L_1-bounded. In fact, it is L_2-bounded: since the U_n are orthonormal,

||X_t||_2^2 = Σ_{n∈t} (1/n)^2 ≤ Σ_{n∈IN} (1/n)^2 < ∞.
For L_1-boundedness, apply the Schwarz inequality:

||X_t||_1 = E[|X_t|] = E[|X_t| · 1] ≤ E[|X_t|^2]^{1/2} E[1^2]^{1/2} = ||X_t||_2.
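Numerically, the L_2-bound above is Σ_{n∈t} 1/n² ≤ Σ_{n∈IN} 1/n² = π²/6 for every finite t, while the harmonic sums Σ 1/n that govern absolute convergence are unbounded. A quick check (the particular sets and cutoffs are illustrative choices, not from the text):

```python
import math

# ||X_t||_2^2 = sum over n in t of 1/n^2, by orthonormality of the U_n;
# it is bounded by pi^2/6 uniformly over finite sets t.
def norm_sq(t):
    return sum(1.0 / n ** 2 for n in t)

bound = math.pi ** 2 / 6
for t in [{1, 2, 3}, set(range(1, 1000)), {7, 50, 3000}]:
    assert norm_sq(t) < bound

# by contrast the harmonic sums grow without bound (ln 10^6 is about 13.8),
# which is why the net (X_t(w)) fails to converge a.s.
harmonic = sum(1.0 / n for n in range(1, 10 ** 6))
assert harmonic > 13
```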
Next, to show that (X_t) is a martingale, we claim: if s ⊂ t, then E^{F_s}[X_t] = X_s. Since t is s plus a finite number of extra elements, it is enough to consider the case where there is one extra element, t = s ∪ {m}, and then apply induction. But U_m is independent of F_s, and therefore

E^{F_s}[X_t] = E^{F_s}[X_s + (1/m) U_m] = X_s + (1/m) E[U_m] = X_s.

This shows that (X_t) is a martingale. Now we know by (1.3.1) that (X_t) converges stochastically. (In fact, by elementary Hilbert space theory, it converges in L_2 norm and therefore by (4.1.19) stochastically.) But we claim that it does not converge a.s. Since J is countable, this means also that it does not converge essentially. In fact, we will see that the set of ω ∈ Ω for which the net (X_t(ω))_{t∈J} converges has probability 0. Indeed, almost all ω ∈ Ω satisfy |U_n(ω)| = 1 for all n.
Convergence of (X_t(ω)) to x means: for any ε > 0, there is a finite set s ⊂ IN such that for all finite sets t ⊇ s,

|Σ_{n∈t} (1/n) U_n(ω) − x| < ε.

But this is equivalent to saying that the series Σ_{n∈IN} (1/n) U_n(ω) converges absolutely. Now

Σ_{n∈IN} |(1/n) U_n(ω)| = Σ_{n∈IN} 1/n = ∞,

so the series Σ_{n∈IN} (1/n) U_n(ω) does not converge absolutely. Therefore (X_t(ω)) does not converge.

Condition (V)

Let (F_t)_{t∈J} be a stochastic basis. Recall the notation Σ for the set of simple stopping times. For certain stochastic bases (for example a stochastic basis (F_n)_{n∈IN} indexed by IN), all L_1-bounded amarts converge essentially. For other stochastic bases (for example, the one in the preceding example) this is not the case. In this section we study a condition on the stochastic basis that will ensure essential convergence. This condition is called the covering condition (V) or the Vitali condition (V). An adapted family of sets is a family (A_t)_{t∈J}, where A_t ∈ F_t for all t ∈ J. We will often write A* for e lim sup A_t. If τ is a simple stopping time for the stochastic basis (F_t), we consider a stopped set A(τ) defined by
A(τ) = ∪_{t∈J} (A_t ∩ {τ = t}).

This union is finite, since {τ = t} is empty except for finitely many values of t. It may be easily verified that A(τ) ∈ F_τ.
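On a finite sample space the stopped set is a direct transcription of the definition above; the dictionary encoding below is an illustrative assumption, not the text's notation:

```python
def stopped_set(A, tau, omega):
    """A(tau) = union over t of (A_t intersect {tau = t}).

    A:     dict mapping an index t to the set A_t (each A_t assumed in F_t)
    tau:   dict mapping each sample point w to tau(w), a simple stopping time
    omega: the finite sample space
    Since tau takes only finitely many values, the union is finite."""
    return {w for w in omega if w in A.get(tau[w], set())}

# toy example with two indices 'a' <= 'b'
A = {'a': {0, 1}, 'b': {2}}
tau = {0: 'a', 1: 'b', 2: 'b', 3: 'a'}
assert stopped_set(A, tau, range(4)) == {0, 2}
```

A sample point lands in A(τ) exactly when it belongs to the set A_t indexed by its own stopping value, which is what the comprehension tests.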
The condition (V) states that the essential upper limit of an adapted family of sets may be approximated by a stopped set.
4.2. The covering condition (V)
129
(4.2.2) Definition. The stochastic basis (F_t)_{t∈J} satisfies the covering condition (V) if, for each adapted family (A_t)_{t∈J} of sets and each ε > 0, there exists a simple stopping time τ ∈ Σ with P(A* \ A(τ)) < ε.

To understand the meaning of this notion, recall that a.s. convergence of amarts was proved in Chapter 1 by approximation of lim sup X_t by the stopped process X_τ (1.2.4 and 1.2.5). On directed sets this is not possible in general. Condition (V) postulates such an approximation of e lim sup for zero-one-valued processes (1_{A_t}). This assumption is crucial: it will be shown below that (V) is necessary and sufficient for convergence of L_1-bounded amarts.
We have stated condition (V) in terms of stopping times since it will be used in that form. But it can also be stated without reference to stopping times: for each adapted family (A_t)_{t∈J} and each ε > 0, there exists an adapted pairwise disjoint family (B_t)_{t∈J}, with only finitely many B_t nonempty, such that

P(A* \ ∪_t (A_t ∩ B_t)) < ε.

Another variant allows countably many B_t to be nonempty, and concludes

P(A* \ ∪_t (A_t ∩ B_t)) = 0.

Another begins with a set B ⊂ A* and almost covers it by sets A_t ∩ B_t. We will now prove the equivalence of a few simple variants of condition (V). To each variant there corresponds an asymptotic version in which, for each t_0, there is an approximating stopping time larger than t_0. The equivalence of an asymptotic version with the corresponding non-asymptotic version follows from consideration of the process (B_t) defined by B_t = A_t if t ≥ t_0, B_t = ∅ otherwise (see the proof of (a) ⇒ (b)). Below, (b) is an asymptotic version of (a); conditions (c) and (d) are in asymptotic form; their obvious non-asymptotic formulations have been omitted. Any of these equivalent formulations may be referred to as the covering condition (V).
(4.2.3) Proposition. Let (F_t)_{t∈J} be a stochastic basis. Then the following formulations of condition (V) are equivalent:
(a) For every adapted family (A_t) and every ε > 0, there exists τ ∈ Σ such that P(A* \ A(τ)) < ε.
(b) For every adapted family (A_t), we have lim inf_τ P(A* \ A(τ)) = 0; that is: for every ε > 0 and every t_0 ∈ J, there exists τ ∈ Σ with τ ≥ t_0 and P(A* \ A(τ)) < ε.
(c) For every adapted family (A_t), we have lim sup_τ P(A(τ)) ≥ P(A*); that is: for every ε > 0 and every t_0 ∈ J, there exists τ ∈ Σ with τ ≥ t_0 and P(A(τ)) > P(A*) − ε.
(d) For every adapted family (A_t), every t_0 ∈ J and every ε > 0, there is a τ ∈ Σ with τ ≥ t_0 such that P(A* △ A(τ)) < ε.
Proof. (a) ⇒ (b): Given t_0 ∈ J, define B_t as follows:

B_t = A_t if t ≥ t_0;  B_t = ∅ otherwise.

Then B* = A*. By (a), there is τ ∈ Σ with P(B* \ B(τ)) < ε. Choose t_1 ≥ τ, t_1 ≥ t_0. Define σ by

σ = τ where τ ≥ t_0;  σ = t_1 otherwise.

Then σ ≥ t_0 and B(τ) ⊂ B(σ) ⊂ A(σ), so P(A* \ A(σ)) < ε.

(b) ⇒ (c): By (b), there is τ ≥ t_0 with P(A* \ A(τ)) < ε. Thus P(A(τ)) ≥ P(A*) − P(A* \ A(τ)) > P(A*) − ε.

(c) ⇒ (d): Given ε > 0, choose s ∈ J, s ≥ t_0, such that

P(ess sup_{t≥s} A_t \ A*) < ε.

Let τ ≥ s be given by (c). Then

P(A* △ A(τ)) = P(A* \ A(τ)) + P(A(τ) \ A*)
≤ P(ess sup_{t≥s} A_t \ A(τ)) + P(ess sup_{t≥s} A_t \ A*)
≤ P(ess sup_{t≥s} A_t) − P(A(τ)) + ε
≤ P(A*) − P(A(τ)) + 2ε < 3ε.

Clearly (d) ⇒ (a).
Condition (V) asserts that A* = e lim sup A_t may be "covered" by the stopped set A(τ). We now show that (V) also holds if only a portion of A* can be covered, provided it is a fixed portion, independent of the choice of (A_t).
(4.2.4) Proposition. Let (F_t)_{t∈J} be a stochastic basis. Condition (V) holds if and only if there is a constant a, 0 < a ≤ 1, such that for each adapted family (A_t) of sets there is a τ ∈ Σ with P(A* ∩ A(τ)) ≥ a P(A*).

Proof. Suppose a exists. Let (A_t) and ε be given. Let τ_1 ∈ Σ be such that P(A* ∩ A(τ_1)) ≥ a P(A*). Then let s_2 ∈ J, s_2 ≥ τ_1, and set

A′_t = A_t \ A(τ_1) if t ≥ s_2;  A′_t = ∅ otherwise.

Since A* \ A(τ_1) = e lim sup_t A′_t, there exists τ_2 ∈ Σ such that τ_2 ≥ s_2 and

P((A* \ A(τ_1)) ∩ A(τ_2)) ≥ a P(A* \ A(τ_1)).

Then P(A* \ (A(τ_1) ∪ A(τ_2))) ≤ (1 − a)^2 P(A*). Continue inductively to obtain a sequence τ_n of stopping times satisfying for all n the relations τ_{n−1} ≤ s_n ≤ τ_n and

P(A* \ ∪_{j=1}^{n} A(τ_j)) ≤ (1 − a)^n P(A*).

Now we are given ε > 0; choose n so that (1 − a)^n P(A*) < ε. Choose s ≥ τ_n, and define

τ = τ_j on A(τ_j) \ ∪_{k=1}^{j−1} A(τ_k), for 1 ≤ j ≤ n;  τ = s on Ω \ ∪_{k=1}^{n} A(τ_k).

Then τ ∈ Σ and P(A* \ A(τ)) < ε.

For the converse, suppose (V) holds; let a = 1/2 (say). Let (A_t) be given. If P(A*) = 0, then clearly P(A* ∩ A(τ)) ≥ a P(A*) for any τ. If P(A*) > 0, use ε = P(A*)/2 with condition (V) to obtain τ with P(A* \ A(τ)) < ε. Then P(A* ∩ A(τ)) ≥ (1/2) P(A*).

Example: Totally ordered basis

Some simple examples may help explain the covering condition (V). The stochastic basis (F_t)_{t∈J} is totally ordered if, for any s, t ∈ J, either F_s ⊂ F_t or F_t ⊂ F_s. Note that if J is totally ordered, then the stochastic basis is totally ordered, but the converse is not necessarily true.

(4.2.5) Proposition. If (F_t)_{t∈J} is totally ordered, then (F_t) satisfies condition (V).
Proof. Let (A_t) be adapted and let ε > 0. Then A* = e lim sup A_t ⊂ ess sup A_t, so there is a countable set {t_i}_{i=1}^{∞} ⊂ J such that A* ⊂ ∪_{i=1}^{∞} A_{t_i}. Thus there is N ∈ IN with P(A* \ ∪_{i=1}^{N} A_{t_i}) < ε. Renumber the t_i so that F_{t_1} ⊂ F_{t_2} ⊂ ⋯ ⊂ F_{t_N}. Define τ by:

τ = t_1 on A_{t_1};  τ = t_i on A_{t_i} \ ∪_{j=1}^{i−1} A_{t_j} for 2 ≤ i ≤ N;  τ = t_N elsewhere.

Then we have A(τ) ⊇ ∪_{i=1}^{N} A_{t_i}, and therefore P(A* \ A(τ)) < ε.
Example: Finite subsets of IN

Consider the directed set J of finite subsets of IN, as in (4.2.1). We show now that condition (V) fails for the stochastic basis of example (4.2.1). See also (4.4.16), where it is shown that this basis also fails the weaker condition (C). We begin with independent, identically distributed random variables U_n with P{U_n = 1} = P{U_n = −1} = 1/2.
If J is the set of all finite subsets of IN, ordered by inclusion, and F_t is defined as the least σ-algebra such that the U_n (n ∈ t) are measurable, then (F_t)_{t∈J} is a stochastic basis. If B, C are disjoint finite subsets of IN, write F(B, C) for the event

{U_n = 1 for all n ∈ B, U_n = −1 for all n ∈ C}.

Thus F_t has atoms F(B, C), where B ∩ C = ∅ and B ∪ C = t. These atoms all have measure 2^{−k}, where k is the number of elements of t.

We claim that condition (V) fails. For m ∈ IN, let C_m be the set of all m-element subsets of {m+1, m+2, …, 4m}. For m ∈ IN and C ∈ C_m, let t(m, C) = {1, 2, …, m} ∪ C ∈ J. Define

A_t = F(C, ∅) if t = t(m, C) for some m ∈ IN and C ∈ C_m;  A_t = ∅ otherwise.

We will show that (V) fails for the adapted family (A_t). First, we claim that e lim sup A_t = Ω. Given s ∈ J and ε > 0, we can choose m so that s ⊂ {1, 2, …, m} and

(4.2.5a)  Σ_{k=m}^{3m} (3m choose k) 2^{−3m} > 1 − ε.

(This is a simple combinatorial lemma. A probabilistic proof is given in (4.2.19).) Then

∪_{t≥s} A_t ⊇ ∪_{C∈C_m} F(C, ∅),

which is the event that at least m of the 3m random variables U_{m+1}, U_{m+2}, …, U_{4m} are 1. Thus its probability is at least 1 − ε by (4.2.5a). Since s is arbitrary, this shows that e lim sup A_t = Ω.
Next, fix p ∈ IN. Let τ ∈ Σ, τ ≥ {1, 2, …, p}. Then

A(τ) = ∪_t (A_t ∩ {τ = t}) = ∪_{m=p}^{∞} ∪_{C∈C_m} (F(C, ∅) ∩ {τ = t(m, C)}).

Fix m ≥ p. The atoms of F_{t(m,C)} contained in F(C, ∅) have the form F(C ∪ B, D), where B ∪ D = {1, …, m} and B ∩ D = ∅. Note P(F(C ∪ B, D)) = 2^{−2m}. Now no two of the sets F(C ∪ B, D), F(C′ ∪ B, D) are disjoint, since they both contain F({m+1, …, 4m} ∪ B, D). Thus, for a fixed pair (B, D), there is at most one C with F(C ∪ B, D) ∩ {τ = t(m, C)} ≠ ∅. So

P(∪_{C∈C_m} (F(C, ∅) ∩ {τ = t(m, C)})) ≤ Σ_{(B,D)} 2^{−2m} = 2^m · 2^{−2m} = 2^{−m}.

Thus P(A(τ)) ≤ Σ_{m=p}^{∞} 2^{−m} = 2^{−p+1}. So condition (V) fails by (4.2.3(c)).
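The combinatorial lemma (4.2.5a) is just the statement that a Binomial(3m, 1/2) variable, whose mean is 3m/2, exceeds m with probability tending to 1. An exact check for a few values of m (the chosen values are illustrative):

```python
from math import comb
from fractions import Fraction

def tail(m):
    # P(Binomial(3m, 1/2) >= m) = sum_{k=m}^{3m} C(3m, k) / 2^(3m)
    return Fraction(sum(comb(3 * m, k) for k in range(m, 3 * m + 1)),
                    2 ** (3 * m))

assert tail(1) < tail(5) < tail(20)   # increases toward 1
assert tail(20) > Fraction(99, 100)   # already above 1 - 1/100
```

Exact rational arithmetic avoids any floating-point doubt about the inequality; for instance tail(1) is the probability of at least one head in three fair flips.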
Example: Interval partitions

We now discuss a classical situation in which condition (V) holds (Theorem (4.2.7)). It is a result useful in derivation theory. The derivation of set functions defined on Euclidean space ℝ^d is closely related to an appropriate directed set consisting of partitions. The topic is treated in a somewhat different setting in Chapter 7.

Let Ω be the d-dimensional cube [0, 1]^d, and let P be d-dimensional Lebesgue measure on Ω. A collection C of open subsets of Ω will be called substantial iff there is a constant M such that for every C ∈ C there is an open ball B with C ⊂ B and P(B) ≤ M P(C). A simple example is the family of d-dimensional intervals (rectangular solids with edges parallel to the axes) such that the ratio of the longest edge to the shortest edge is bounded by some constant M′. First, a Vitali-style covering lemma; a stronger version of the lemma is found in Chapter 7 (7.2.1). If B is an open ball, we write r(B) for its radius.
(4.2.6) Lemma. (a) Let D be a collection of open balls in [0, 1]^d. Let W = ∪D. Then for each ε > 0 there is a finite disjoint subcollection D′ of D such that

Σ_{B∈D′} P(B) ≥ 3^{−d} (P(W) − ε).

(b) Let C be a substantial collection of open sets in [0, 1]^d with constant M. Let W = ∪C. Then for each ε > 0 there is a finite disjoint subcollection C′ of C such that

Σ_{C∈C′} P(C) ≥ M^{−1} 3^{−d} (P(W) − ε).
Proof. (a) Let ε > 0 be given. The set W is open, and therefore measurable. So there is a compact set K ⊂ W with P(K) > P(W) − ε. Now D is an open cover of the compact set K, so there is a finite subcover, say S_1, S_2, …, S_p ∈ D and K ⊂ ∪_{j=1}^{p} S_j. Suppose these sets are ordered in decreasing order of their radii: r(S_1) ≥ r(S_2) ≥ ⋯ ≥ r(S_p).

Now we define recursively a sequence B_1, B_2, …, B_m of balls. Let B_1 = S_1. Suppose B_1, …, B_k have been defined. Let B_{k+1} be S_j, where j is the least index such that S_j ∩ B_i = ∅ for 1 ≤ i ≤ k. If there is no such j, that is, every S_j meets some B_i, then the construction stops with B_k. Certainly the construction stops in at most p steps. This completes the definition of the sequence B_1, B_2, …, B_m.

Now for each B_i, let B′_i be the ball with the same center as B_i but three times the radius. We claim that

∪_{j=1}^{p} S_j ⊂ ∪_{i=1}^{m} B′_i.

Indeed, for each j, the ball S_j meets some B_i with r(B_i) ≥ r(S_j), so S_j ⊂ B′_i. Thus

P(K) ≤ P(∪_j S_j) ≤ P(∪_i B′_i) ≤ Σ_i P(B′_i) = 3^d Σ_i P(B_i).

The required inequality follows. (b) follows from (a).
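The recursive selection in the proof is a greedy algorithm: scan the balls in decreasing order of radius and keep each one that is disjoint from all balls kept so far; every discarded ball then lies inside the tripled version of some kept ball of at least its radius. A sketch in the plane (the sample balls are illustrative):

```python
import math

def greedy_disjoint(balls):
    """Select a disjoint subfamily as in the covering-lemma proof.

    balls: list of (center, radius) pairs, center a tuple of floats.
    Scanning by decreasing radius, keep a ball iff it is disjoint
    from every ball already kept."""
    chosen = []
    for c, r in sorted(balls, key=lambda b: -b[1]):
        if all(math.dist(c, c2) >= r + r2 for c2, r2 in chosen):
            chosen.append((c, r))
    return chosen

balls = [((0.0, 0.0), 1.0), ((0.5, 0.0), 0.4), ((3.0, 0.0), 1.0),
         ((3.2, 0.1), 0.5), ((1.5, 1.5), 0.3)]
chosen = greedy_disjoint(balls)
# each original ball lies in the 3x enlargement of some chosen ball
for c, r in balls:
    assert any(math.dist(c, c2) + r <= 3 * r2 + 1e-9 for c2, r2 in chosen)
```

The final assertion is exactly the claim ∪ S_j ⊂ ∪ B′_i from the proof: a discarded ball meets an earlier (hence larger) kept ball, so it fits inside that ball's threefold enlargement.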
We will consider the collection of all countable measurable partitions of Ω. Partitions are ordered by a.e. refinement: we write s ≤ t if every atom of s is a union of atoms of t, up to sets of measure 0. We will postulate that J is a directed set. This is satisfied in the classical cases. See (7.2.2).
(4.2.7) Theorem. Let C be a substantial collection of open subsets of the d-dimensional cube Ω = [0, 1]^d. Suppose that the family J of countable partitions of Ω into elements of C is directed by a.e. refinement. For t ∈ J, let F_t be the σ-algebra generated by the partition t. Then the stochastic basis (F_t)_{t∈J} satisfies condition (V).
Proof. Let M be the constant showing that C is substantial. Choose ε > 0 so small that M^{−1} 3^{−d} (1 − 2ε) − ε > 0. We will verify the condition in Proposition (4.2.4) with a = M^{−1} 3^{−d} (1 − 2ε) − ε.

Let (A_t) be a family of sets adapted to the stochastic basis (F_t) described, and write A* = e lim sup A_t. We may assume P(A*) > 0. Write ε′ = ε P(A*). Choose s ∈ J so that P(ess sup_{t≥s} A_t \ A*) < ε′. There exists a sequence t_k ≥ s of indices with ∪_k A_{t_k} = ess sup_{t≥s} A_t. Decompose each A_{t_k} into atoms C_{kn} of t_k. Then P(∪_{k,n} C_{kn}) ≥ P(A*). There is thus a finite set F of pairs (k, n) such that P(∪_F C_{kn}) > P(A*) − ε′. By the lemma, there is a subset F′ ⊂ F such that the atoms {C_{kn} : (k, n) ∈ F′} are disjoint and P(∪_{F′} C_{kn}) ≥ M^{−1} 3^{−d} (P(∪_F C_{kn}) − ε′). Now choose an index u larger than all the t_k where k occurs as a first coordinate in the finite set F′. By the disjointness of the C_{kn} with (k, n) ∈ F′, we may define a stopping time by

τ = t_k if ω ∈ C_{kn} and (k, n) ∈ F′;  τ = u otherwise.

Thus A(τ) ⊇ ∪_{F′} C_{kn}. Finally,

P(A* ∩ A(τ)) ≥ P(A(τ)) − P(A(τ) \ A*)
≥ M^{−1} 3^{−d} (P(∪_F C_{kn}) − ε′) − ε′
≥ M^{−1} 3^{−d} (P(A*) − 2ε′) − ε′ = a P(A*).
Essential convergence

Here are some consequences of condition (V).

(4.2.8) Proposition. Let (F_t)_{t∈J} be a stochastic basis. Suppose (V) holds. Then:
(a) If (A_t) is an adapted family of sets, then

s lim sup_{τ∈Σ} A(τ) = e lim sup_{t∈J} A_t (= e lim sup_{τ∈Σ} A(τ)).

(b) If (X_t) is a stochastic process, then

s lim sup_{τ∈Σ} X_τ = e lim sup_{t∈J} X_t (= e lim sup_{τ∈Σ} X_τ).

That is, for every ε > 0 and every t_0 ∈ J, there is a τ ∈ Σ with τ ≥ t_0 and

P{|e lim sup X_t − X_τ| > ε} < ε.

(c) If (X_t) is a nonnegative process, then for every λ > 0,

P{e lim sup X_t ≥ λ} ≤ (1/λ) lim sup_{τ∈Σ} E[X_τ].

(d) If (X_t) is a stochastic process, and σ_n is a sequence of simple stopping times, then there exist τ_n ∈ Σ with τ_n ≥ σ_n and X_{τ_n} → e lim sup X_t a.s.
Proof. (a) Write A* = e lim sup A_t. To show s lim sup_τ A(τ) ⊇ A*, we use (4.1.7a). If C is any set with lim_τ P(A(τ) \ C) = 0, then we have

P(A* \ C) ≤ P(A* \ A(τ)) + P(A(τ) \ C).

By (V), we have lim inf_τ P(A* \ A(τ)) = 0. Therefore P(A* \ C) = 0, or C ⊇ A* a.s. This shows s lim sup_τ A(τ) ⊇ A*, so in fact s lim sup_τ A(τ) = A*.

(b) We always have s lim sup_τ X_τ ≤ e lim sup_τ X_τ = e lim sup_t X_t, so we must prove the opposite inequality. Applying part (a), (4.1.21(c)) and (4.1.22), we have for any λ > 0

{e lim sup_{t∈J} X_t > λ} ⊂ e lim sup_{t∈J} {X_t > λ} = s lim sup_{τ∈Σ} {X_τ > λ} ⊂ {s lim sup_{τ∈Σ} X_τ ≥ λ}.

Therefore e lim sup_t X_t ≤ s lim sup_τ X_τ.
(c) Let (X_t) be a nonnegative process, and let λ > 0. Fix β with 0 < β < λ. Define A_t = {X_t > β}. Then {e lim sup X_t > β} ⊂ A*. Let t_0 ∈ J and ε > 0 be given. Then by (V), there is τ ∈ Σ with τ ≥ t_0 and P(A* \ A(τ)) < ε. Then

E[X_τ] = Σ_t E[X_t 1_{{τ=t}}] ≥ Σ_t β P({τ = t} ∩ A_t) = β P(A(τ)) ≥ β (P(A*) − ε) ≥ β (P{e lim sup X_t > β} − ε).

Now t_0 and ε were arbitrary, so

P{e lim sup X_t > β} ≤ (1/β) lim sup_{τ∈Σ} E[X_τ].

Finally, let β ↑ λ to obtain the result.

(d) By (b), for any ε > 0 and t_0 ∈ J there exists τ ≥ t_0 with

P{|e lim sup X_t − X_τ| > ε} < ε.

Apply this recursively, with ε = 2^{−n} and t_0 ≥ σ_n; the Borel–Cantelli lemma then yields X_{τ_n} → e lim sup X_t a.s.
Convergence theorems hold in the presence of condition (V). Our proof will follow the method used in Chapter 1. A semiamart is a process (X_t)_{t∈J} with

lim sup_{τ∈Σ} |E[X_τ]| < ∞.

(See, for example, (1.4.26).) Clearly every amart is a semiamart. Note that for J = IN this definition is equivalent to sup_{σ∈Σ} |E[X_σ]| < ∞ (Lemma (1.2.1)).
(4.2.9) Lattice property. (1) If (X_t) and (Y_t) are L_1-bounded semiamarts, then (X_t ∨ Y_t) is also a semiamart. (2) If (X_t) and (Y_t) are L_1-bounded amarts, then (X_t ∨ Y_t) is also an amart.

The proof is essentially the same as that of Theorem (1.2.2), and is therefore omitted.
(4.2.10) Theorem (Astbury). Let (F_t)_{t∈J} be a stochastic basis. The following are equivalent:
(1) Condition (V).
(2) L_∞-bounded amarts converge essentially.
(3) L_1-bounded amarts converge essentially.
Proof. (1) ⇒ (2). Let (X_t) be an L_∞-bounded amart. For each n ∈ IN, choose t_n ∈ J so that if τ, σ ≥ t_n, then |E[X_τ] − E[X_σ]| ≤ 1/n. Then by Proposition (4.2.8(d)), there exist stopping times τ_n with τ_n ≥ t_n, τ_{n+1} ≥ τ_n, and X_{τ_n} → e lim sup X_t a.s. By Proposition (4.2.8(d)) applied to (−X_t), there exist stopping times σ_n with σ_n ≥ t_n, σ_{n+1} ≥ σ_n, and X_{σ_n} → e lim inf X_t a.s. Hence

0 = lim_n (E[X_{τ_n}] − E[X_{σ_n}]) = E[e lim sup X_t − e lim inf X_t],

so e lim sup X_t = e lim inf X_t.

(1) ⇒ (3). Suppose (1) holds. Then (as we just proved) also (2) holds. Let (X_t) be an L_1-bounded amart. By the lattice property (4.2.9), if λ > 0, then the process ((−λ) ∨ X_t ∧ λ) is an L_∞-bounded amart. Therefore by (2) it converges essentially. Therefore the original process (X_t) converges essentially on the set Ω_λ = {e lim sup |X_t| < λ}. But the maximal inequality (4.2.8(c)) shows that P{e lim sup |X_t| < ∞} = 1, so Ω is the countable union of the sets Ω_λ, hence (X_t) converges essentially.

(3) ⇒ (2) is easy.

(2) ⇒ (1). Let (A_t) be an adapted family of sets. For t ∈ J, let X_t be the Snell envelope of the stopped indicators:

X_t = ess sup_{τ∈Σ, τ≥t} E^{F_t}[1_{A(τ)}].

The net (E[X_σ])_{σ∈Σ} is decreasing and 0 ≤ X_t ≤ 1, so (X_t) is an L_∞-bounded amart. So by (2) it converges essentially. Now X_t ≥ 1_{A_t} and 1_{A*} ≤ e lim sup X_t, so by the essential convergence, e lim sup X_t = s lim sup X_t, and thus by (4.1.11(a)),

P(A*) ≤ P{e lim sup X_t ≥ 1} ≤ lim sup_{τ∈Σ} E[X_τ].

Given σ ∈ Σ, there exists a sequence τ_n ≥ σ such that E^{F_σ}[1_{A(τ_n)}] ↑ X_σ, so

E[X_σ] = lim_n E[E^{F_σ}[1_{A(τ_n)}]] = lim_n P(A(τ_n)).

Hence lim sup_σ E[X_σ] ≤ lim sup_τ P(A(τ)). Thus

P(A*) ≤ lim sup_τ P(A(τ)),

so (V) holds.
The next corollary is an immediate consequence of Theorem (4.2.10).

(4.2.11) Corollary. Let (F_t)_{t∈J} satisfy (V). Then L_1-bounded martingales converge essentially.

Note, however, that under condition (V), it is not necessarily true that L_1-bounded submartingales converge essentially (4.2.17). We will see in Section 4.4 that condition (V) is not necessary for convergence of L_1-bounded martingales.
Complements
(4.2.12) (σ-directed set.) Suppose the directed set J has the property that every countable subset has an upper bound. (We say that J is a σ-directed set.) Then any stochastic basis (F_t)_{t∈J} indexed by J satisfies condition (V). To see this, let (A_t) be adapted, and suppose

a = lim inf_{τ∈Σ} P(A* \ A(τ)) > 0.

Next, choose indices s_n ∈ J with

inf_{τ≥s_n} P(A* \ A(τ)) > a − 1/n for n = 1, 2, ….

There is s_∞ ∈ J larger than all the s_n, so

lim inf_τ P(A* \ A(τ)) = inf_{τ≥s_∞} P(A* \ A(τ)) = a.

Now choose τ_1 ≥ s_∞, and continue choosing recursively τ_n with τ_{n+1} ≥ τ_n so that P(A* \ A(τ_n)) → a. Choose t_∞ larger than all the values of all the τ_n. Then define a countably valued stopping time τ_∞ by:

τ_∞ = τ_n on A(τ_n) \ ∪_{k=1}^{n−1} A(τ_k) for n = 1, 2, …;  τ_∞ = t_∞ elsewhere.

So A(τ_∞) ⊇ ∪_n A(τ_n), and thus P(A* \ A(τ_∞)) = a > 0. But A* \ A(τ_∞) is the e lim sup of the A_t \ A(τ_∞), so there is t_1 such that P((A* ∩ A_{t_1}) \ A(τ_∞)) > 0, so P(A* \ (A(τ_∞) ∪ A_{t_1})) < a. Then for large enough n we have also P(A* \ (A(τ_n) ∪ A_{t_1})) < a. Then we may construct σ ∈ Σ with P(A* \ A(σ)) < a, a contradiction.

(4.2.13) (Condition (V).) All the conditions in Theorem (4.2.10) are equivalent to condition (V).

(4.2.14) (Other generalizations.) Let (X_t)_{t∈J} be a stochastic process. For σ, τ ∈ Σ, σ ≤ τ, write
H(a, T) = X,  E° [XT] . Then we say that (Xt) is a pramart if slim H(a, T) = 0; O
4.2. The covering condition (V)
139
a subpramart if slim sup H(Q, T) < 0; 0<1
a martingale in the limit if e lim H(s, t) = 0. 8
Among other conditions equivalent with (V) are: Every amart is a martingale in the limit. Every pramart is a martingale in the limit. Every L1-bounded pramart (or subpramart) converges essentially. Every L1-bounded submartingale amart converges essentially.

Pramarts and subpramarts converge under the following condition (d), which is properly weaker than L1-boundedness:

    (d)    lim inf E[X_t^+] + lim inf E[X_t^−] < ∞.
(4.2.15) (Abstract difference condition.) Let (F_t)_{t∈J} be a stochastic basis. Suppose, for σ, τ ∈ Σ, we are given a random variable f(σ, τ). Assume that:
(1) For each s ∈ J, 1_{σ=s} f(σ, τ) = 1_{σ=s} f(s, τ) a.s.
(2) For each s ∈ J, A ∈ F_s, and τ, τ′ ∈ Σ, if τ = τ′ on A, then f(s, τ) = f(s, τ′) on A.
Suppose (F_t) satisfies condition (V). If f(σ, τ) converges stochastically, then it converges essentially (Millet & Sucheston [1980b]; this paper missed the needed second localization condition, as was pointed out by A. Bellow).

(4.2.16) (Submartingale and supermartingale compared to amart.) (a) An example of a supermartingale that is not an amart. The stochastic basis satisfies condition (V), yet the L1-bounded supermartingale does not converge essentially.

Let c_i = 2^{i^2}. Let J = { (i, j) : i ≥ 1, 1 ≤ j ≤ c_i } be ordered by:

    (i, j) ≤ (i′, j′)  iff  i < i′, or (i, j) = (i′, j′).

For (i, j) ∈ J, let F_{(i,j)} be the σ-algebra on Ω = [0, 1) generated by the partition

    { [(k − 1)/c_i, k/c_i) : 1 ≤ k ≤ c_i }.

Define

    X_{(i,j)} = 1/i  on [(j − 1)/c_i, j/c_i),
    X_{(i,j)} = 0    elsewhere.
Then (X_t) is an L1-bounded supermartingale, and (F_t)_{t∈J} satisfies (V) since it is totally ordered. But (X_t) clearly does not converge essentially, so it is not an amart. The controlled Vitali condition (4.2.25) fails.

(b) A submartingale that is an amart, but the net (E[X_τ])_{τ∈Σ} is not increasing.
Let J = IN × IN, ordered by (i_1, j_1) ≤ (i_2, j_2) iff i_1 ≤ i_2 and j_1 ≤ j_2. Let Ω = [0, 1], and let P be Lebesgue measure. Define F_{(0,0)} as the trivial σ-algebra {Ø, Ω}; for all other (i, j) ∈ J, define F_{(i,j)} as the σ-algebra with atoms [0, 1/2] and (1/2, 1]. Thus (F_{(i,j)}) is a totally ordered stochastic basis. Define a stochastic process (X_{(i,j)}) as follows:

    X_{(0,0)} = 0,
    X_{(0,1)} = 2·1_{[0,1/2]} − 1_{(1/2,1]},
    X_{(1,0)} = −1_{[0,1/2]} + 2·1_{(1/2,1]},
    X_{(i,j)} = 3  for all other (i, j).

For this process we have X_s ≤ E^{F_s}[X_t] for all s ≤ t in J, but E[X_σ] > E[X_τ], where σ ≤ τ in Σ are given by σ(ω) = (0, 0) and

    τ(ω) = (0, 1)  if ω ∈ (1/2, 1],
    τ(ω) = (1, 0)  if ω ∈ [0, 1/2].

Indeed, E[X_σ] = 0 while X_τ = −1 everywhere, so E[X_τ] = −1. Also, the net E[X_τ] converges to 3, so (X_t) is an amart.

(4.2.17) (Supermartingale amarts in the derivation setting.) Let A be an algebra of subsets of Ω. A set function ψ: A → [0, ∞) is called a charge if it is finitely additive; it is called a supercharge if it is finitely superadditive, that is,

    ψ(A ∪ B) ≥ ψ(A) + ψ(B)    for disjoint A, B ∈ A.

We discuss in Chapter 8 the decomposition of a supercharge ψ as ψ = ψ_m + ψ_c + ψ_s, where ψ_m is a measure, ψ_c is a pure charge, and ψ_s is a pure supercharge (8.4.1). Here we will discuss the connection of amart theory with the derivation of charges and supercharges. As we will see, we obtain a supermartingale that is also an amart; the supermartingale property is less useful than the amart property, since supermartingales converge essentially only under conditions like (V°), strictly stronger than condition (V), while by Astbury's theorem (4.2.11), amarts converge under (V). Condition (V) holds in the classical setting of Theorem (4.2.8).

(a) Let (Ω, F, P) be a probability space; let J be a set of finite measurable partitions of Ω directed by refinement. For each t ∈ J, let F_t be the σ-algebra generated by t. Let ψ be a supercharge on the algebra A = ∪ F_t. For each t ∈ J, let

    X_t = Σ_{A∈t} (ψ(A)/P(A)) 1_A,

with the convention that ψ(A)/P(A) = 0 if P(A) = 0. Then X_t is an amart (and a supermartingale).

Proof. Let σ, τ ∈ Σ with σ ≤ τ. We claim that E^{F_σ}[X_τ] ≤ X_σ. Write s_i for the values of σ and t_j for the values of τ. Each atom A ∈ s_i with A ⊆ {σ = s_i} is partitioned by the disjoint sets B ∈ t_j with B ⊆ {τ = t_j} ∩ A, so that

    E^{F_σ}[X_τ] = Σ_i Σ_{A∈s_i, A⊆{σ=s_i}} (1/P(A)) ( Σ_j Σ_{B∈t_j, B⊆{τ=t_j}∩A} ψ(B) ) 1_A
                 ≤ Σ_i Σ_{A∈s_i, A⊆{σ=s_i}} (ψ(A)/P(A)) 1_A = X_σ,

using the superadditivity of ψ. Integrate to obtain E[X_τ] ≤ E[X_σ]. So the net (E[X_τ])_{τ∈Σ} is decreasing; since it is bounded below by 0, it converges, and (X_t) is an amart (and a supermartingale).
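To make (a) concrete, here is a small numerical check, in Python, of the behavior of X_t = Σ_{A∈t} (ψ(A)/P(A)) 1_A along a chain of refining partitions of [0, 1]. The supercharge ψ(A) = P(A)^2 used below is an assumption chosen for the illustration, not taken from the text; it is finitely superadditive because (a + b)^2 ≥ a^2 + b^2 for a, b ≥ 0.

```python
from fractions import Fraction

# Assumed supercharge for the illustration: psi(A) = P(A)^2.
# Superadditive for disjoint A, B: (P(A)+P(B))^2 >= P(A)^2 + P(B)^2.
def psi(p):
    return p * p          # p = P(A)

def expectation(partition):
    # E[X_t] = sum_{A in t} psi(A), since X_t = psi(A)/P(A) on the atom A
    return sum(psi(p) for p in partition)

# A chain of partitions of [0,1] (listed by atom lengths), directed by refinement.
t1 = [Fraction(1, 2), Fraction(1, 2)]
t2 = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]                    # refines t1
t3 = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 4), Fraction(1, 4)]   # refines t2

# The net E[X_t] decreases along refinement and is bounded below by 0,
# so it converges: the amart property seen in the proof above.
e1, e2, e3 = map(expectation, (t1, t2, t3))
assert e1 >= e2 >= e3

# Pointwise supermartingale check for t1 <= t2 on the atom [0,1/2) of t1:
# E[X_{t2} | F_{t1}] = (psi(1/5) + psi(3/10)) / (1/2), which should be <= X_{t1} = 1/2.
cond = (psi(Fraction(1, 5)) + psi(Fraction(3, 10))) / Fraction(1, 2)
assert cond <= Fraction(1, 2)
print(e1, e2, e3, cond)
```

Exact rational arithmetic (`fractions.Fraction`) keeps the superadditivity and conditional-expectation checks free of rounding error.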
(b) The Riesz decomposition is X_t = Y_t + Z_t, where the martingale part is given by

    Y_t = Σ_{A∈t} ((ψ_m(A) + ψ_c(A))/P(A)) 1_A.

Proof. Let ε > 0. Since ψ_s is a pure supercharge, there is a finite partition A_1, ..., A_n composed of sets of A such that Σ_{i=1}^n ψ_s(A_i) < ε. Choose t ∈ J so large that F_t contains all the sets A_1, ..., A_n. Now

    0 ≤ Σ_{A∈t} [ψ(A) − ψ_m(A) − ψ_c(A)] = Σ_{A∈t} ψ_s(A) ≤ Σ_{i=1}^n ψ_s(A_i) < ε.

The Riesz decomposition follows.

(c) An analogous theorem (with an analogous proof) applies when J is a set of countable measurable partitions directed by refinement. Note that since ψ is superadditive, it is automatically countably superadditive. In this case, the martingale part in the Riesz decomposition is:

    Y_t = Σ_{A∈t} (ψ_m(A)/P(A)) 1_A.
(4.2.18) (An example of the Riesz decomposition.) Let (F_t)_{t∈J} be a stochastic basis generated by a family J of partitions of [0, 1]^d satisfying the conditions of Theorem (4.2.8). Let Q be a finite signed measure absolutely continuous with respect to P on F_∞ = σ(∪ F_t). Let f and g be real functions such that f(0) = g(0) = 0, the derivatives f′(0), g′(0) exist, and g′(0) ≠ 0. Then the stochastic process (X_t) defined by

    X_t = Σ_{A∈t} (f(Q(A))/g(P(A))) 1_A,    t ∈ J,

is an amart. The process X_t converges essentially to

    (f′(0)/g′(0)) (dQ/dP),

where dQ/dP is the Radon–Nikodym derivative of Q with respect to P on F_∞. The martingale part of the Riesz decomposition of (X_t) is given by

    Y_t = Σ_{A∈t} (f′(0) Q(A))/(g′(0) P(A)) 1_A,    t ∈ J.
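A quick numerical sketch of (4.2.18), under assumed choices not taken from the text: P is Lebesgue measure on [0, 1], dQ/dP(x) = x, f(x) = sin x, and g(x) = tan x, so that f(0) = g(0) = 0 and f′(0) = g′(0) = 1. On dyadic partitions, the atom value f(Q(A))/g(P(A)) should approach (f′(0)/g′(0)) dQ/dP at each point:

```python
import math

def X_t(n, x):
    # Value at x of X_t = sum_{A in t} f(Q(A))/g(P(A)) 1_A for the dyadic
    # partition t of [0,1] into 2^n atoms, with the assumed f = sin, g = tan,
    # and Q([a,b)) = (b^2 - a^2)/2 (density q(x) = x).
    k = int(x * 2 ** n)
    a, b = k / 2 ** n, (k + 1) / 2 ** n
    QA = (b * b - a * a) / 2
    PA = 1 / 2 ** n
    return math.sin(QA) / math.tan(PA)

x0 = 0.7
values = [X_t(n, x0) for n in (4, 8, 12)]
# The values approach (f'(0)/g'(0)) * (dQ/dP)(x0) = 0.7 as the partition refines.
assert abs(values[-1] - 0.7) < 1e-2
print(values)
```

The helper `X_t` and the specific f, g, Q are hypothetical choices for the sketch; any pair with f(0) = g(0) = 0 and g′(0) ≠ 0 would illustrate the same limit.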
(4.2.19) (A combinatorial limit.)

    lim_{m→∞} 2^{−3m} Σ_{k=m}^{3m} C(3m, k) = 1,

where C(3m, k) denotes the binomial coefficient. Let U_1, ..., U_{3m} be independent random variables with P{U_i = 1} = P{U_i = −1} = 1/2 for all i. Then the expression under the limit sign is the probability that there are at least m ones among U_1, ..., U_{3m}. Now the sum S = Σ_{i=1}^{3m} U_i has mean E[S] = 0 and variance E[S^2] = 3m, since the U_i are orthonormal. There are at least m ones if and only if S ≥ −m, so we may use Chebyshev's inequality to estimate the probability in question, which is

    P{S ≥ −m} ≥ 1 − P{|S| ≥ m} = 1 − P{S^2 ≥ m^2} ≥ 1 − E[S^2]/m^2 = 1 − 3m/m^2 = 1 − 3/m.

(4.2.20) (Cofinal optional sampling.) A class E of processes (X_t, F_t, J) has the cofinal optional sampling property if, for every (X_t, F_t, J) ∈ E and every cofinal subset J′ of Σ, the process (X_τ, F_τ, J′) is also in E. The classes of amarts, pramarts, and subpramarts have the cofinal optional sampling property (Millet & Sucheston [1980b]).

(4.2.21) (Monotone optional sampling.) A class E of processes (X_n, F_n) indexed by IN has the monotone optional sampling property if for every
(X_n, F_n)_{n∈IN} ∈ E and every increasing sequence (τ_k) in Σ, the process (X_{τ_k}, F_{τ_k})_{k∈IN} also belongs to E. The classes of amarts, pramarts, and subpramarts indexed by IN have the monotone optional sampling property (Millet & Sucheston [1980b]).
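The combinatorial limit in (4.2.19), together with the Chebyshev lower bound 1 − 3/m derived there, can be checked numerically with exact binomial coefficients:

```python
from math import comb

# Check of (4.2.19): 2^(-3m) * sum_{k=m}^{3m} C(3m, k) -> 1 as m -> infinity,
# and the Chebyshev estimate: the quantity is at least 1 - 3/m.
def prob(m):
    # P(at least m ones among 3m fair +/-1 variables)
    return sum(comb(3 * m, k) for k in range(m, 3 * m + 1)) / 2 ** (3 * m)

for m in (5, 20, 80):
    assert prob(m) >= 1 - 3 / m    # the Chebyshev bound from (4.2.19)
assert prob(80) > prob(5)          # the probability increases toward 1
print([round(prob(m), 4) for m in (5, 20, 80)])
```

`math.comb` keeps the sum exact as an integer before the single final division, so the check is not affected by floating-point accumulation.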
(4.2.22) (Constant stochastic basis.) Suppose the stochastic basis (F_t) satisfies F_s = F_t for all s, t ∈ J. Then a process (X_t) is an amart if and only if X_t converges essentially and there is s ∈ J so that

    E[ ess sup_{t≥s} |X_t| ] < ∞

(Edgar & Sucheston [1976a] for IN; Millet & Sucheston [1980b] for directed sets; see also (1.2.11)).
(4.2.23) (Ordered Vitali condition (V°).) A stochastic basis (F_t) satisfies condition (V°) if, for each adapted family (A_t) and each ε > 0, there exists τ ∈ Σ° with P(A* \ A(τ)) < ε. As we know (1.3.1), L1-bounded ordered amarts converge stochastically. Condition (V°) is necessary and sufficient for essential convergence of L1-bounded ordered amarts. Condition (V°), originally called (V′), is sufficient for convergence of L1-bounded submartingales (Krickeberg [1959]), but it is not necessary. Totally ordered stochastic bases need not satisfy condition (V°). In example (4.2.16(a)), there is a totally ordered stochastic basis, but condition (V°) fails because L1-bounded supermartingales may not converge. Millet & Sucheston [1980b] has an example of a totally ordered stochastic basis where L1-bounded supermartingales converge, but nevertheless (V°) fails.

(4.2.24) (Controlled Vitali condition.) We say that a stopping time τ ∈ Σ is controlled by σ ∈ Σ° if σ ≤ τ and τ is F_σ-measurable. We say that τ is a controlled stopping time if τ is controlled by some σ ∈ Σ°. The stochastic basis (F_t) satisfies the controlled Vitali condition (VC) if for every adapted family (A_t) of sets, every A ∈ F_∞ with A ⊆ e lim sup A_t, and every ε > 0, there exist a stopping time τ ∈ Σ controlled by an ordered stopping time σ ∈ Σ°, and a set B ∈ F_σ such that B ⊆ A(τ) and

    P(A \ B) < ε.

Then we have (VC) ⇒ (V°) ⇒ (V), and the implications are not reversible.

(4.2.25) (Controlled amarts.) Write Σ_C for the set of all controlled stopping times. If τ_1, τ_2 ∈ Σ_C, write τ_1 ≤ τ_2 if τ_1(ω) ≤ τ_2(ω) for all ω. A process (X_t) is a controlled amart if the net

    (E[X_τ])_{τ∈Σ_C} converges.

Note that every L1-bounded supermartingale or submartingale is a controlled amart. If condition (VC) holds, then every L1-bounded controlled amart converges essentially.
Remarks
The Vitali condition (V) was obtained by analogy from the related setting of derivations (see Chapter 7). In the derivation setting, it goes back to the lemma of Vitali. In the directed-set setting, condition (V) was introduced by Krickeberg [1956], who proved that martingales converge under (V); the condition was also studied by Y. S. Chow [1960b]. The important fact that condition (V) is equivalent to amart convergence (Theorem (4.2.11)) is due to Astbury [1978]. Millet & Sucheston [1980b] is the source of (4.2.2) to (4.2.4), (4.2.8) to (4.2.9), (4.2.14) to (4.2.19), and (4.2.23) to (4.2.25). They also discuss reversed ordered amarts, and Banach-valued ordered amarts.
4.3. L_Ψ-bounded martingales

Our next concern will be the convergence of L_Ψ-bounded martingales, where Ψ is an Orlicz function. (Orlicz functions and Orlicz spaces are discussed in Chapter 2.) In this section, we will limit the setting to a probability space (Ω, F, P). This means that L_Ψ ⊆ L_1 and any L_Ψ-bounded martingale is L_1-bounded. But, on the other hand, covering conditions that ensure essential convergence of all L_1-bounded martingales may be much stronger than necessary to ensure essential convergence of all L_Ψ-bounded martingales. We will return in the next section to the most important case, namely L_1 itself.

The covering condition most often used in the study of L_Ψ-bounded martingales is stated in terms of the Orlicz function Φ conjugate to Ψ and multivalued stopping times.

Multivalued stopping times

A simple stopping time τ ∈ Σ defines a finite family of sets A_t = {τ = t} such that A_t ∈ F_t and A_s ∩ A_t = Ø for s ≠ t. In order to study finite adapted families of sets with overlap, we will use a modification of this sort of stopping time, namely the multivalued stopping time. There are two equivalent ways to look at the definition.

Let (F_t)_{t∈J} be a stochastic basis. Let 𝒥 be the family of all finite nonempty subsets of J. A simple multivalued stopping time is a function τ: Ω → 𝒥, with only finitely many values, such that, for each t ∈ J,

    {ω ∈ Ω : t ∈ τ(ω)} ∈ F_t.

We will write {t ∈ τ} for this set. (Some of the literature writes {τ = t} for this set.) We write Σ_M for the set of all simple multivalued stopping times. We may identify Σ with the subset of Σ_M consisting of those τ such that τ(ω) is a singleton for all ω ∈ Ω.

Similarly, a simple incomplete multivalued stopping time is a function τ: Ω → 𝒥 ∪ {Ø}, with only finitely many values, such that, for each t ∈ J, {t ∈ τ} ∈ F_t. We write Σ_IM for the set of all simple incomplete multivalued stopping times.
Consider an adapted family (A_t)_{t∈J} of sets, only finitely many of which are nonempty. There is a unique τ ∈ Σ_IM with {t ∈ τ} = A_t for all t. So an alternative definition could be given in terms of such families of sets.

Let τ ∈ Σ_IM. We write D(τ) = ∪_t {t ∈ τ}, called the domain of τ. Thus D(τ) = Ω if and only if τ ∈ Σ_M. We write S(τ) = Σ_t 1_{t∈τ} for the sum of τ. Then τ ∈ Σ if and only if S(τ) = 1. The excess of τ is e(τ) = S(τ) − 1_{D(τ)}. This function can be used to measure the overlap properties of the family ({t ∈ τ})_{t∈J} of sets. Let (X_t)_{t∈J} be a stochastic process. If τ ∈ Σ_IM, we write

    X_τ = Σ_{t∈J} X_t 1_{t∈τ}.

Or, if we think of τ(ω) as a finite subset of J,

    (X_τ)(ω) = Σ_{t∈τ(ω)} X_t(ω).

If τ ∈ Σ, this coincides with the usual definition of X_τ. Let (A_t)_{t∈J} be an adapted family of sets (finite or infinite). For τ ∈ Σ_IM we write

    A(τ) = ∪_{t∈J} ({t ∈ τ} ∩ A_t).
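The bookkeeping quantities S(τ), D(τ), e(τ), and A(τ) can be illustrated on a small finite example. The sets below are hypothetical, and measurability of {t ∈ τ} with respect to a filtration is assumed rather than modeled:

```python
# Omega has six points; tau is represented by the sets {t in tau}.
Omega = range(6)
tau = {                       # {t in tau} for t = 1, 2, 3
    1: {0, 1, 2},
    2: {2, 3},
    3: {3, 4},
}

S = {w: sum(w in tau[t] for t in tau) for w in Omega}     # S(tau) = sum of indicators
D = set().union(*tau.values())                            # D(tau) = domain
e = {w: S[w] - (1 if w in D else 0) for w in Omega}       # excess e(tau) = S - 1_D

assert D == {0, 1, 2, 3, 4}            # point 5 is uncovered: tau is incomplete
assert e[2] == 1 and e[3] == 1         # points covered twice have excess 1
assert all(e[w] == 0 for w in (0, 1, 4, 5))

# A(tau) for an adapted family (A_t): only the part of {t in tau} inside A_t counts.
A = {1: {0, 1}, 2: {2, 3, 5}, 3: {4}}
A_tau = set().union(*(tau[t] & A[t] for t in tau))
assert A_tau == {0, 1, 2, 3, 4}
print(sorted(D), e, sorted(A_tau))
```

With disjoint sets {t ∈ τ} covering Ω, the same computation gives S(τ) ≡ 1 and e(τ) ≡ 0, recovering an ordinary simple stopping time.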
We will say that the stopping time τ is subordinate to the adapted family (A_t) if {t ∈ τ} ⊆ A_t for all t ∈ J. Then D(τ) = A(τ). An ordering may be defined on Σ_M in more than one way. For our purposes, they are all equally useful. We have chosen the easiest to define. For τ ∈ Σ_IM and t_0 ∈ J, we write t_0 ≤ τ if {t ∈ τ} = Ø except for t ≥ t_0; we write τ ≤ t_0 if {t ∈ τ} = Ø except for t ≤ t_0. If σ, τ ∈ Σ_M, we write σ ≪ τ if there is t_0 ∈ J with σ ≤ t_0 and t_0 ≤ τ.

Covering condition (V_Φ)
We begin with a covering condition analogous to the Vitali condition (V). We allow multivalued stopping times, but require that the L_Φ norm ||e(τ)||_Φ of the excess (known as the overlap of τ) be small.

(4.3.1) Definition. Let Φ be an Orlicz function. The stochastic basis (F_t)_{t∈J} satisfies the covering condition (V_Φ) if, for each adapted family (A_t)_{t∈J} of sets, and each ε > 0, there exists τ ∈ Σ_IM with P(A* \ A(τ)) < ε and ||e(τ)||_Φ < ε.

First we will record some useful technical variants of condition (V_Φ). Also, since P(A(τ) \ A*) → 0 by the definition of e lim sup, any of the formulations with P(A* \ A(τ)) < ε may equivalently be restated with P(A* △ A(τ)) < ε. Any of these formulations may be called "condition (V_Φ)."
(4.3.2) Lemma. The following are equivalent.
(a) (V_Φ): for every adapted family (A_t)_{t∈J} and every ε > 0, there exists τ ∈ Σ_IM with P(A* \ A(τ)) < ε and ||e(τ)||_Φ < ε.
(b) For every adapted (A_t)_{t∈J} and every ε > 0, there exists τ ∈ Σ_M with P(A* \ A(τ)) < ε and ||e(τ)||_Φ < ε.
(c) For every adapted (A_t)_{t∈J}, every ε > 0, and every t_0 ∈ J, there exists τ ∈ Σ_M with τ ≥ t_0, P(A* \ A(τ)) < ε, and ||e(τ)||_Φ < ε.
(d) For every adapted (A_t)_{t∈J}, every ε > 0, and every t_0 ∈ J, there exists τ ∈ Σ_IM subordinate to (A_t)_{t∈J} with τ ≥ t_0, P(A(τ)) > P(A*) − ε, and ||e(τ)||_Φ < ε.
Proof. (a) ⇒ (b). Given τ ∈ Σ_IM, choose s ∈ J with s ≥ τ, and define τ′ by

    {t ∈ τ′} = {t ∈ τ}  for t ≠ s;    {s ∈ τ′} = {s ∈ τ} ∪ (Ω \ D(τ)).

Then τ′ ∈ Σ_M, A* \ A(τ′) ⊆ A* \ A(τ), and e(τ′) = e(τ).

(b) ⇒ (c). Given (A_t)_{t∈J} and t_0, define (B_t) by B_t = A_t if t ≥ t_0 and B_t = Ø otherwise. Then B* = A*, and we may apply (b) to the family (B_t) to get (c) for the family (A_t).

(c) ⇒ (d). Given τ ∈ Σ_M, let τ′ ∈ Σ_IM be defined by {t ∈ τ′} = A_t ∩ {t ∈ τ}. Then e(τ′) ≤ e(τ) and A* \ A(τ) = A* \ A(τ′).

(d) ⇒ (a). Let (A_t) be an adapted family, and let ε > 0. Choose t_0 so that P(ess sup_{t≥t_0} A_t \ A*) < ε/2. By (d) there exists τ ∈ Σ_IM subordinate to (A_t) with τ ≥ t_0, P(A(τ)) > P(A*) − ε/2, and ||e(τ)||_Φ < ε/2. But then

    P(A* \ A(τ)) = P(A*) − P(A(τ)) + P(A(τ) \ A*) < ε/2 + ε/2 = ε.
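The completion step in (a) ⇒ (b), dumping Ω \ D(τ) at a single index s, can be checked on a toy example (hypothetical sets; the filtration is again suppressed):

```python
# Completion of an incomplete multivalued stopping time, as in (4.3.2)(a)=>(b):
# the uncovered part of Omega is added at one index s; the excess is unchanged.
Omega = set(range(6))

def excess(t_map):
    S = {w: sum(w in v for v in t_map.values()) for w in Omega}
    D = set().union(*t_map.values())
    return {w: S[w] - (1 if w in D else 0) for w in Omega}

tau = {1: {0, 1, 2}, 2: {2, 3}}          # incomplete: D(tau) = {0,1,2,3}
s = 3                                     # an index with s >= tau (assumed)
tau_prime = dict(tau)
tau_prime[s] = tau.get(s, set()) | (Omega - set().union(*tau.values()))

assert set().union(*tau_prime.values()) == Omega     # tau' is complete
assert excess(tau_prime) == excess(tau)              # e(tau') = e(tau)
print(excess(tau_prime))
```

On Ω \ D(τ) the completed τ′ has S(τ′) = 1 and is covered, so no excess is created, which is exactly why the overlap bound in (a) carries over to (b).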
It should be noted that if L_Φ = L_∞, then condition (V_Φ) becomes condition (V). So (V) is sometimes known as (V_∞). Similarly, when L_Φ = L_p, we write (V_p) for (V_Φ). We will see that under fairly general conditions, (V_Φ) is necessary and sufficient for the convergence of L_Ψ-bounded martingales. The case of L_1-bounded martingales is excluded, but L_q-bounded martingales are included, for 1 < q < ∞.

There is a sort of convergence that is suggested by condition (V_Φ). Suppose f(τ) ∈ IR for each τ ∈ Σ_M. We will write

    V_Φ-lim f(τ) = u

if for every ε > 0, there are t_0 ∈ J and δ > 0 so that, if τ ∈ Σ_M, τ ≥ t_0, and ||e(τ)||_Φ < δ, then |f(τ) − u| < ε. Note the use of Σ_M, not Σ_IM. This could be considered the Moore–Smith convergence defined by a directed set. The elements of the directed set are pairs (τ, δ), where τ ∈ Σ_M, δ > 0, and ||e(τ)||_Φ < δ. The ordering is given by:

    (τ, δ) ≤ (τ′, δ′)  if δ ≥ δ′ and either τ ≪ τ′ or τ = τ′.

Note that if V_Φ-lim f(τ) exists, then lim_{τ∈Σ} f(τ) exists, and the limits are equal. There is a corresponding V_Φ-lim sup. We will write

    V_Φ-lim sup f(τ) ≤ K

if for every ε > 0, there exist t_0 ∈ J and δ > 0 such that if τ ∈ Σ_M with τ ≥ t_0 and ||e(τ)||_Φ < δ, then f(τ) < K + ε. Clearly there is also a corresponding V_Φ-lim inf. There are classes of processes that we may naturally associate with condition (V_Φ).
(4.3.3) Definition. Let (X_t)_{t∈J} be an adapted process. Then
(1) (X_t)_{t∈J} is a V_Φ-amart if V_Φ-lim E[X_τ] exists;
(2) (X_t)_{t∈J} is a V_Φ-semiamart if V_Φ-lim sup |E[X_τ]| < ∞;
(3) (X_t)_{t∈J} is a V_Φ-potential if V_Φ-lim E[|X_τ|] = 0.
(4.3.4) Proposition. If (X_t)_{t∈J} is a V_Φ-amart and lim_t E[X_t 1_A] = 0 for all A ∈ ∪_t F_t, then (X_t)_{t∈J} is a V_Φ-potential.

Proof. A V_Φ-amart is an amart, so (X_t) is an amart potential. In particular, E[|X_t|] → 0. Now given ε > 0, choose δ > 0 and t_0 ∈ J so that if σ_1, σ_2 ∈ Σ_M, σ_1, σ_2 ≥ t_0, ||e(σ_1)||_Φ < δ, and ||e(σ_2)||_Φ < δ, then |E[X_{σ_1} − X_{σ_2}]| < ε. Let σ ∈ Σ_M with ||e(σ)||_Φ < δ and σ ≥ t_0. Choose t_1 ≥ σ. Let A = ∪_t {X_t ≥ 0, t ∈ σ}, so that A ∈ F_{t_1}. Define τ_1, τ_2 ∈ Σ_M by:

    {t ∈ τ_1} = {X_t ≥ 0, t ∈ σ}  if t ≠ t_1,    {t_1 ∈ τ_1} = Ω \ A;
    {t ∈ τ_2} = {X_t < 0, t ∈ σ}  if t ≠ t_1,    {t_1 ∈ τ_2} = A.

Then

    |X_σ| = Σ_t |X_t| 1_{t∈σ}
          = Σ_t ( X_t 1_{t∈σ, X_t≥0} − X_t 1_{t∈σ, X_t<0} )
          ≤ X_{τ_1} − X_{τ_2} + |X_{t_1}|.

Thus E[|X_σ|] ≤ ε + E[|X_{t_1}|]. Since t_1 ≥ σ may be chosen arbitrarily large, and E[|X_t|] → 0, this shows that V_Φ-lim_σ E[|X_σ|] = 0.
(4.3.5) Proposition (lattice property). (a) If (X_t)_{t∈J} is an L_1-bounded V_Φ-semiamart, then (X_t^+) is also a V_Φ-semiamart. (b) If (X_t)_{t∈J} is an L_1-bounded V_Φ-amart, then (X_t^+) is also a V_Φ-amart.

Proof. Write U_t = X_t^+. (We must use caution, since U_τ ≠ (X_τ)^+, in general.)

(a) Let V_Φ-lim sup E[X_τ] = K < ∞. Choose t_1 ∈ J and δ_1 > 0 so that if τ ∈ Σ_M, τ ≥ t_1, and ||e(τ)||_Φ < δ_1, then E[X_τ] < K + 1. Let sup_t E[|X_t|] = M < ∞. Now suppose τ ∈ Σ_M with τ ≥ t_1 and ||e(τ)||_Φ < δ_1. Choose t_2 ≥ τ, and let A = ∪_t {t ∈ τ, X_t > 0}. Then A ∈ F_{t_2}. Now define σ ∈ Σ_M by:

    {t ∈ σ} = {t ∈ τ, X_t > 0}  if t ≠ t_2,    {t_2 ∈ σ} = Ω \ A.

Thus σ ≥ t_1 and e(σ) ≤ e(τ), so ||e(σ)||_Φ < δ_1. Then

    E[U_τ] = E[ Σ_t X_t 1_{t∈τ, X_t>0} ] ≤ E[X_σ] + E[|X_{t_2}|] < K + 1 + M,

so (U_t) is a V_Φ-semiamart.
(b) Let ε > 0 be given. Choose t_1 ∈ J and δ_1 > 0 so that if τ, σ ∈ Σ_M, τ ≥ t_1, σ ≥ t_1, ||e(τ)||_Φ < δ_1, and ||e(σ)||_Φ < δ_1, then |E[X_τ] − E[X_σ]| < ε. Next, using (a), there are t_2 ≥ t_1, δ_2 ≤ δ_1/2, and τ_1 ≥ t_2 with ||e(τ_1)||_Φ < δ_2, so that if τ ∈ Σ_M, τ ≥ t_2, and ||e(τ)||_Φ < δ_2, then

    E[U_τ] < E[U_{τ_1}] + ε.

Let t_3 ≥ τ_1. Now suppose τ ∈ Σ_M, τ ≥ t_3, and ||e(τ)||_Φ < δ_2. Define A as the finite union ∪_t {t ∈ τ_1, X_t < 0}. Note that A ∈ F_{t_3}. Define σ ∈ Σ_M by:

    {t ∈ σ} = {t ∈ τ} \ A         for t ≥ t_3,
    {t ∈ σ} = {t ∈ τ_1, X_t < 0}  otherwise.

Now e(σ) ≤ e(τ_1) + e(τ), so ||e(σ)||_Φ ≤ δ_2 + δ_2 ≤ δ_1, and σ ≥ t_2. Therefore |E[X_{τ_1}] − E[X_σ]| < ε. Also, E[U_τ] < E[U_{τ_1}] + ε. Now

    U_{τ_1} − U_τ = Σ_t X_t 1_{t∈τ_1, X_t≥0} − Σ_t X_t 1_{t∈τ, X_t≥0}
                 ≤ Σ_t X_t 1_{t∈τ_1, X_t≥0} − Σ_t X_t 1_{{t∈τ}\A}
                 = ( Σ_t X_t 1_{t∈τ_1} − Σ_t X_t 1_{t∈τ_1, X_t<0} ) − ( X_σ − Σ_t X_t 1_{t∈τ_1, X_t<0} )
                 = X_{τ_1} − X_σ.

Thus E[U_{τ_1}] − E[U_τ] ≤ E[X_{τ_1}] − E[X_σ] < ε; and also E[U_{τ_1}] − E[U_τ] > −ε. Thus (U_t) is a V_Φ-amart.

As usual, the covering condition is equivalent to a maximal inequality, to an "approximation by the stopped process," and to amart conditions.
(4.3.6) Theorem. Let (F_t)_{t∈J} be a stochastic basis, and let Φ be an Orlicz function. The following are equivalent.
(a) (V_Φ) holds: for every adapted family (A_t)_{t∈J},

    V_Φ-lim inf P(A* \ A(τ)) = 0.

(b) For every nonnegative process (X_t)_{t∈J} and every λ > 0,

    P(e lim sup X_t ≥ λ) ≤ (1/λ) V_Φ-lim sup E[X_τ].

(c) If (X_t)_{t∈J} is adapted, ε > 0, and t ∈ J, then there is τ ∈ Σ_M with τ ≥ t, ||e(τ)||_Φ < ε, and P(e lim sup X_t − X_τ > ε) < ε.
(d) Every L_Ψ-bounded V_Φ-potential converges essentially (to 0).
Proof. (a) ⇒ (b). Let (X_t)_{t∈J} be a nonnegative process, and let λ > 0. Write X* = e lim sup X_t. Fix β with 0 < β < λ. Define A_t = {X_t > β}. Then {X* > β} ⊆ A*. Let t_0 ∈ J and ε > 0 be given. Then by (V_Φ), there is τ ∈ Σ_IM, subordinate to (A_t)_{t∈J}, with τ ≥ t_0, ||e(τ)||_Φ < ε, and P(A* \ A(τ)) < ε. Now

    E[X_τ] = Σ_t E[X_t 1_{t∈τ}] ≥ β Σ_t E[1_{t∈τ}] ≥ β P(A(τ)) > β (P(A*) − ε) ≥ β (P(X* > β) − ε).

Take the limit as ε → 0:

    P(X* > β) ≤ (1/β) V_Φ-lim sup E[X_τ].

Then take the limit as β → λ:

    P(X* ≥ λ) ≤ (1/λ) V_Φ-lim sup E[X_τ].

(b) ⇒ (d). Let (X_t)_{t∈J} be a V_Φ-potential. Then by (b), for any λ > 0,

    P(e lim sup |X_t| ≥ λ) ≤ (1/λ) V_Φ-lim sup E[|X_τ|] = 0.

Thus e lim sup |X_t| = 0, so (X_t)_{t∈J} converges essentially to 0.

(a) ⇒ (c). Let (X_t)_{t∈J} be a process and ε > 0. Let δ > 0 be such that ||X||_Φ < δ implies ||X||_1 < ε. Now X* is σ(∪ F_t)-measurable, so there are t_1 ∈ J and a random variable Y, measurable with respect to F_{t_1}, such that

    P(|X* − Y| > ε) < ε.

Let A_t = {Y − X_t < ε} for t ≥ t_1. Then A* ⊇ {|X* − Y| ≤ ε}, so P(A*) > 1 − ε. Then there is τ ∈ Σ_IM with τ ≥ t_1, P(A* \ A(τ)) < ε, and ||e(τ)||_Φ < δ, so that E[e(τ)] < ε. Then

    P(X* − X_τ > 2ε) ≤ P(|X* − Y| > ε) + P(A* \ A(τ)) + E[e(τ)] < 3ε.

(c) ⇒ (a). Given an adapted family (A_t)_{t∈J} of sets, let X_t = 1_{A_t}. Then X* = 1_{A*}, and for τ ∈ Σ_IM and ε < 1, we have

    P(X* − X_τ > ε) = P(A* \ A(τ)).

(d) ⇒ (a). Let (A_t) be an adapted family, and let A* = e lim sup A_t. (Recall that if τ is subordinate to (A_t), then A(τ) = D(τ).) We construct recursively sequences u_k ≥ 0 and τ_k ∈ Σ_IM. Let

    u_0 = sup { P(A(τ)) : τ is subordinate to (A_t) and ||e(τ)||_Φ ≤ 1 }.

Then choose τ_1 subordinate to (A_t) with ||e(τ_1)||_Φ ≤ 1 and P(A(τ_1)) > u_0/2. Next let

    u_1 = sup { P(A(τ) \ A(τ_1)) : τ is subordinate to (A_t), ||e(τ)||_Φ ≤ 1/2, and A(τ) ⊇ A(τ_1) }.

Then choose τ_2 ∈ Σ_IM subordinate to (A_t) with ||e(τ_2)||_Φ ≤ 1/2, A(τ_2) ⊇ A(τ_1), and P(A(τ_2) \ A(τ_1)) > u_1/2. Continue in this way, so that A(τ_k) ⊇ A(τ_{k−1}), P(A(τ_k) \ A(τ_{k−1})) > u_{k−1}/2, and

    u_k = sup { P(A(τ) \ A(τ_k)) : τ is subordinate to (A_t), ||e(τ)||_Φ ≤ 1/(k+1), and A(τ) ⊇ A(τ_k) }.

This completes the recursive construction.

Now if τ is subordinate to (A_t), ||e(τ)||_Φ ≤ 1/(k+1), and A(τ) ⊇ A(τ_k), then

    u_{k−1} ≥ P(A(τ) \ A(τ_{k−1})) = P(A(τ) \ A(τ_k)) + P(A(τ_k) \ A(τ_{k−1})) > P(A(τ) \ A(τ_k)) + u_{k−1}/2,

so u_k ≤ u_{k−1}/2. Therefore u_k ≤ 2^{−k}. For each t, let

    C_t = A_t \ ∪_{k=1}^∞ ∪_{u≤t} {u ∈ τ_k}.

Define X_t = 1_{C_t}. We claim that (X_t) is a V_Φ-potential. Fix k, choose t_0 ∈ J with t_0 ≥ τ_1, τ_2, ..., τ_k, and let δ = 1/k − ||e(τ_k)||_Φ. Now for any τ ∈ Σ_M with τ ≥ t_0 and ||e(τ)||_Φ < δ, define σ ∈ Σ_IM by

    {t ∈ σ} = {t ∈ τ_k} ∪ ({t ∈ τ} ∩ C_t).

Certainly A(σ) ⊇ A(τ_k) and ||e(σ)||_Φ ≤ ||e(τ_k)||_Φ + ||e(τ)||_Φ < 1/k, so P(A(σ) \ A(τ_{k−1})) ≤ u_{k−1} ≤ 2^{−k+1}. Thus

    X_τ = Σ_t X_t 1_{t∈τ} = Σ_t 1_{C_t ∩ {t∈τ}} ≤ 1_{A(σ)\A(τ_{k−1})} + e(τ).

There is a constant c so that ||X||_1 ≤ c ||X||_Φ (2.1.21). Thus

    0 ≤ E[X_τ] ≤ 2^{−k+1} + c/k.

This approaches 0 as k → ∞, so we see that (X_t) is a V_Φ-potential. We conclude by (d) that (X_t) converges essentially; the limit is 0. Thus

    P(e lim sup C_t) = 0.

Now if B = ∪_{k=1}^∞ A(τ_k), then A* \ B ⊆ e lim sup (A_t \ B) = e lim sup C_t, so P(A* \ B) = 0. Given ε > 0, since the sets A(τ_k) increase, there is k with P(A* \ A(τ_k)) < ε and ||e(τ_k)||_Φ < ε.
(4.3.7) Corollary. Suppose (V_Φ) holds. If (X_t)_{t∈J} is an adapted process and (t_n) ⊆ J, then there exists an increasing sequence (τ_n) in Σ_M with τ_n ≥ t_n, ||e(τ_n)||_Φ → 0, and X_{τ_n} → e lim sup X_t a.s.

Proof. Apply (4.3.6(c)) repeatedly.
(4.3.8) Proposition. An L_Ψ-bounded martingale is a V_Φ-amart.

Proof. Let (X_t)_{t∈J} be a martingale with ||X_t||_Ψ ≤ M for all t. Since it is a martingale, L = E[X_t] is independent of t. Let ε > 0 be given, and set δ = ε/(2M + 1). Suppose τ ∈ Σ_M and ||e(τ)||_Φ < δ. Choose t_0 ∈ J with t_0 ≥ τ. Then

    E[X_τ] = Σ_t E[X_t 1_{t∈τ}] = Σ_t E[X_{t_0} 1_{t∈τ}] = E[ X_{t_0} Σ_t 1_{t∈τ} ] = E[X_{t_0}(1 + e(τ))] = L + E[X_{t_0} e(τ)].

Thus, applying Young's inequality in the form (2.2.7), we get

    |E[X_τ] − L| ≤ E[|X_{t_0}| e(τ)] ≤ 2 ||X_{t_0}||_Ψ ||e(τ)||_Φ ≤ 2Mδ < ε.

Therefore (X_t)_{t∈J} is a V_Φ-amart.

Here is a convergence theorem for the covering condition (V_Φ).

(4.3.9) Theorem. Let (F_t)_{t∈J} be a stochastic basis, and let Φ be an Orlicz function with conjugate Ψ. Then the following are equivalent:
(1) Condition (V_Φ) holds.
(2) All L_Ψ-bounded V_Φ-amarts converge essentially.
(3) All L_1-bounded V_Φ-amarts converge essentially.
Proof. (1) ⇒ (2). Let (X_t) be an L_Ψ-bounded V_Φ-amart. For each n ∈ IN, choose t_n ∈ J and δ_n > 0 so that if σ, τ ∈ Σ_M, σ, τ ≥ t_n, ||e(σ)||_Φ < δ_n, and ||e(τ)||_Φ < δ_n, then |E[X_σ] − E[X_τ]| < 1/n. Then by Corollary (4.3.7), there exist τ_n ∈ Σ_M with τ_n ≥ t_n, ||e(τ_n)||_Φ < δ_n, and X_{τ_n} → e lim sup X_t a.s. Similarly, there exist σ_n ∈ Σ_M with σ_n ≥ t_n, ||e(σ_n)||_Φ < δ_n, and X_{σ_n} → e lim inf X_t a.s. Hence

    0 = lim_n |E[X_{τ_n}] − E[X_{σ_n}]| = E[e lim sup X_t − e lim inf X_t].

Therefore e lim sup X_t = e lim inf X_t.

(1) ⇒ (3). Suppose (1) holds. Then (as we have just proved) (2) also holds. Let (X_t) be an L_1-bounded V_Φ-amart. By the lattice property (4.3.5), if λ > 0, then ((−λ) ∨ X_t ∧ λ) is an L_Ψ-bounded V_Φ-amart. By (2), it converges essentially. Thus (X_t) itself converges essentially on the set Ω_λ = {e lim sup |X_t| < λ}. The maximal inequality (4.3.6(b)) shows P{e lim sup |X_t| < ∞} = 1, so Ω is (up to a null set) a countable union of the sets Ω_λ. Therefore (X_t) converges essentially.

(3) ⇒ (2) is easy.

(2) ⇒ (1). Suppose all L_Ψ-bounded V_Φ-amarts converge essentially. Then all L_Ψ-bounded V_Φ-potentials converge essentially. Thus, by (4.3.6), (V_Φ) holds.
(4.3.10) Theorem. If (V_Φ) holds, then all L_Ψ-bounded martingales converge essentially.

Proof. Let (X_t) be an L_Ψ-bounded martingale. By (4.3.8), (X_t) is a V_Φ-amart. An L_Ψ-bounded process is also L_1-bounded. Hence (X_t) converges essentially by Theorem (4.3.9).

Necessity of (V_Φ)

We will prove the converse of Theorem (4.3.10) under the assumption that Φ satisfies condition (Δ_2). It is an open problem whether the theorem is true without (Δ_2).

A countably valued incomplete multivalued stopping time is a countable collection of sets, written ({t ∈ τ})_{t∈J}, with {t ∈ τ} ∈ F_t for all t. (Since the collection is countable, {t ∈ τ} = Ø except for countably many t.) We write Σ_CIM for the set of all countably valued incomplete multivalued stopping times. We still write S(τ) = Σ_t 1_{t∈τ} (which may have the value ∞); e(τ) = S(τ) − 1 ∧ S(τ); if (A_t)_{t∈J} is an adapted family of sets,

    A(τ) = ∪_t ({t ∈ τ} ∩ A_t).

(4.3.11) Theorem. Let (F_t)_{t∈J} be a stochastic basis, and let Φ be an Orlicz function satisfying (Δ_2) at ∞. If every L_Ψ-bounded martingale converges essentially, then (F_t)_{t∈J} satisfies (V_Φ).

Proof. Since (Ω, F, P) is a finite measure space, we may change Φ near 0 without changing finiteness of L_Ψ bounds or smallness of L_Φ norms. Thus we will assume that Φ also satisfies (Δ_2) at 0, and that φ(1) > 0, where φ is the derivative of Φ. Thus Φ(2u) ≤ cΦ(u) for all u. Then (2.2.18) Ψ(φ(u)) ≤ cΦ(u) also.

We first claim: If α, β > 0, (A_t)_{t∈J} is an adapted family of sets, and Y is F_∞-measurable and satisfies Y ≥ 0, Y ∈ L_Ψ, and P(A* \ {Y > 0}) > 0, then there exist t ∈ J and B ∈ F_t with P(B) > 0, B ⊆ A_t, and

    E[Y 1_B] + α P(B \ A*) ≤ β P(B).

To see this, consider the random variable X = Y + α 1_{Ω\A*} ∈ L_Ψ, and the corresponding martingale X_t = E^{F_t}[X]. Now X_t converges to X stochastically and, by assumption, it converges essentially, so it converges essentially to X. Now X = 0 on A* \ {Y > 0}, which has positive probability; since each point of A* belongs to A_t for cofinally many t, it is false that X_t ≥ β 1_{A_t} a.s. for all t. Hence P({X_t < β} ∩ A_t) > 0 for some t. If B = {X_t < β} ∩ A_t, then

    E[X 1_B] = E[X_t 1_B] ≤ β P(B).

Thus E[Y 1_B] + α P(B \ A*) ≤ β P(B). This completes the proof of the claim.
Now let ε > 0, and let (A_t)_{t∈J} be an adapted family with P(A*) > ε. Choose m so large that 1/ε < 2^m, and let η = c^{−m}. [Thus, if E[Φ(X)] ≤ η, we will have E[Φ(X/ε)] ≤ E[Φ(2^m X)] ≤ c^m E[Φ(X)] ≤ 1, so that ||X||_Φ ≤ ε.] Then choose β > 0 so small that

    β (1 − β/φ(1))^{−1} < η

and β ≤ φ(1)/2. Let α = φ(1).

We next claim: If τ ∈ Σ_CIM is subordinate to (A_t), P(A* \ A(τ)) > 0, and E[Φ(e(τ))] ≤ η P(A* ∩ A(τ)), then there are t ∈ J and B ∈ F_t with P(B) > 0, B ⊆ A_t, and

    (4.3.11a)    E[φ(S(τ)) 1_B] + φ(1) P(B \ A*) ≤ β P(B).

This is demonstrated by taking Y = φ(S(τ)) in the previous claim. Then

    E[Ψ(Y)] = E[Ψ(φ(S(τ)))] ≤ c E[Φ(S(τ))] < ∞,

since S(τ) ≤ e(τ) + 1 belongs to L_Φ = H_Φ.

Now we note that if t and B satisfy (4.3.11a), and if we define τ̄ by {s ∈ τ̄} = {s ∈ τ} for s ≠ t and {t ∈ τ̄} = {t ∈ τ} ∪ B, then we still have E[Φ(e(τ̄))] ≤ η P(A* ∩ A(τ̄)). Indeed, S(τ) ≥ 1 on A(τ), so

    φ(1) P(B \ (A* \ A(τ))) ≤ E[φ(S(τ)) 1_B] + φ(1) P(B \ A*) ≤ β P(B),

and thus

    P(B ∩ (A* \ A(τ))) = P(B) − P(B \ (A* \ A(τ))) ≥ (1 − β/φ(1)) P(B).

Therefore

    E[φ(S(τ)) 1_B] ≤ β P(B) ≤ β (1 − β/φ(1))^{−1} P(B ∩ (A* \ A(τ))) ≤ η P(B ∩ (A* \ A(τ))).

Now

    E[Φ(e(τ̄))] = E[Φ(e(τ̄)) 1_{Ω\B}] + E[Φ(e(τ̄)) 1_B]
               ≤ E[Φ(e(τ))] + E[(Φ(S(τ)) − Φ(e(τ))) 1_B]
               ≤ E[Φ(e(τ))] + E[φ(S(τ)) 1_B]
               ≤ η P(A* ∩ A(τ)) + η P(B ∩ (A* \ A(τ))) = η P(A* ∩ A(τ̄)).
Also note

    P(B ∩ A(τ)) ≤ (1/φ(1)) E[φ(S(τ)) 1_B] ≤ (β/φ(1)) P(B) ≤ (1/2) P(B),

so

    (4.3.11b)    P(A(τ̄)) ≥ P(A(τ)) + P(B)/2.

Now we will recursively construct a (finite or infinite) sequence τ_0, τ_1, τ_2, ... ∈ Σ_IM. First let τ_0 be the empty incomplete multivalued stopping time, {s ∈ τ_0} = Ø for all s ∈ J. Then e(τ_0) = 0, so trivially E[Φ(e(τ_0))] ≤ η P(A* ∩ A(τ_0)).

Suppose τ_k has been constructed, with E[Φ(e(τ_k))] ≤ η P(A* ∩ A(τ_k)). If P(A* ∩ A(τ_k)) > P(A*) − ε, stop the recursive construction. But if P(A* ∩ A(τ_k)) ≤ P(A*) − ε, construct τ_{k+1} as follows. By the claim above, applied to τ_k, the class

    B_k = { B : there exists t such that B ∈ F_t, B ⊆ A_t, P(B) > 0, E[φ(S(τ_k)) 1_B] + φ(1) P(B \ A*) ≤ β P(B) }

is nonempty. Choose B_k ∈ B_k with

    P(B_k) > (1/2) sup { P(B) : B ∈ B_k },

and use it to construct τ_{k+1} as above, with

    E[Φ(e(τ_{k+1}))] ≤ η P(A* ∩ A(τ_{k+1}))

and {s ∈ τ_{k+1}} ⊇ {s ∈ τ_k} for all s.

We claim that the recursive construction ends at some finite stage τ_k. If not, we get an infinite sequence (τ_k)_{k=0}^∞. Since {s ∈ τ_{k+1}} ⊇ {s ∈ τ_k}, there is a "limiting" τ ∈ Σ_CIM defined by {s ∈ τ} = ∪_k {s ∈ τ_k}. We have

    E[Φ(e(τ))] ≤ lim_k E[Φ(e(τ_k))] ≤ η P{A* ∩ A(τ)} < ∞.

Thus (by the claim applied to the limiting τ, and since S(τ_k) ≤ S(τ)) there are t ∈ J and B ∈ F_t with P(B) > 0 and B ∈ B_k for all k. This means P(B_k) ≥ (1/2) P(B) for all k; but this is impossible, since by (4.3.11b) we have Σ_k P(B_k) ≤ 2 P(A(τ)) ≤ 2.

Therefore we have P(A* ∩ A(τ_k)) > P(A*) − ε for some k, so

    P(A* \ A(τ_k)) < ε.

Also, E[Φ(e(τ_k))] ≤ η implies ||e(τ_k)||_Φ ≤ ε.

The equivalence of essential convergence of martingales and covering conditions can be stated simply, without stopping times, in the case (V_1). The following, due to Krickeberg, is also the earliest result of this type.
(4.3.12) Theorem. Let (F_t) be a stochastic basis. Then every L_∞-bounded martingale converges essentially if and only if: for every adapted family (A_t)_{t∈J} and every ε > 0, there is an adapted family (B_t), empty except for finitely many t, with B_t ⊆ A_t, P(∪ B_t) > P(A*) − ε, and

    Σ_t P(B_t) ≤ P(∪ B_t) + ε.

Proof. Apply the theorem with Φ(u) = u (see Section 2.2), then let B_t = {t ∈ τ}; the overlap ||e(τ)||_Φ is then E[e(τ)] = Σ_t P(B_t) − P(∪ B_t).

Functional condition (FV_Φ)

A variant (FV_Φ) of the condition (V_Φ) is obtained by replacing the multivalued stopping time τ by an adapted family of functions (ξ_t)_{t∈J}. The advantage of (FV_Φ) is that the equivalence with convergence of L_Ψ-bounded martingales is easily established with no special assumptions on Φ (except finiteness). The disadvantage of (FV_Φ) is that it is not a Vitali-type covering condition.

We will say that an adapted family (ξ_t)_{t∈J} of functions is subordinate to the family (A_t)_{t∈J} of sets if {ξ_t ≠ 0} ⊆ A_t for every t.

(4.3.13) Definition. The stochastic basis (F_t)_{t∈J} satisfies condition (FV_Φ) if, for every adapted family (A_t)_{t∈J} of sets and every ε > 0, there exists an adapted family (ξ_t)_{t∈J} of nonnegative bounded functions subordinate to (A_t)_{t∈J}, with only finitely many ξ_t nonzero, such that

    E[Σ_t ξ_t] > P(A*) − ε    and    || Σ_t ξ_t − 1 ∧ Σ_t ξ_t ||_Φ < ε.

It is easily verified that condition (V_Φ) implies condition (FV_Φ), by taking ξ_t to be the indicator function of the set {t ∈ τ}. The variants of (FV_Φ), the corresponding amart-like processes, and other interesting topics will not be discussed here. We will prove only the martingale convergence theorem. For that purpose, we will need a maximal inequality.
(4.3.14) Proposition. Suppose $(FV_\Phi)$ holds. If $(X_t)_{t\in J}$ is an $L_\Psi$-bounded positive submartingale, and $\lambda > 0$, then

$$P(X^* > \lambda) \le \frac{1}{\lambda}\,\sup_t E[X_t].$$

Proof. Suppose $\|X_t\|_\Psi \le M$ for all $t$. Let $\beta$ satisfy $0 < \beta < \lambda$. Let $A_t = \{X_t > \beta\}$. Thus $\{X^* > \lambda\} \subseteq A^*$. Fix $\varepsilon > 0$. By $(FV_\Phi)$, there is a family $(\xi_t)_{t\in J}$ of nonnegative bounded functions subordinate to $(A_t)_{t\in J}$, with only finitely many $\xi_t$ nonzero, such that $E[\xi] > P(A^*) - \varepsilon$ and $\|\xi''\|_\Phi < \varepsilon$, where

$$\xi = \sum_t \xi_t, \qquad \xi'' = (\xi - 1)^{+}, \qquad \xi' = \xi - \xi'' \le 1.$$

Let $t_0 \in J$ be so large that $t_0 \ge t$ whenever $\xi_t \ne 0$. Then

$$\beta\,E[\xi] \le E\Big[\sum_t \xi_t X_t\Big] \le E\Big[\sum_t \xi_t X_{t_0}\Big] = E[\xi' X_{t_0}] + E[\xi'' X_{t_0}] \le E[X_{t_0}] + 2\,\|\xi''\|_\Phi\,\|X_{t_0}\|_\Psi \le \sup_t E[X_t] + 2\varepsilon M.$$

Note that we have applied (2.2.7). Now $(X_t)_{t\in J}$ is $L_1$-bounded, since it is $L_\Psi$-bounded. Thus

$$P(X^* > \lambda) \le P(A^*) < \varepsilon + E[\xi] \le \varepsilon + \frac{1}{\beta}\Big(\sup_t E[X_t] + 2\varepsilon M\Big).$$

Let $\varepsilon \to 0$, then $\beta \uparrow \lambda$, to get the result.
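In the classical case $J = \mathbb{N}$ this maximal inequality reduces to Doob's inequality for nonnegative submartingales, and it can be sanity-checked by simulation. The sketch below is our own toy setup, not from the text: it uses $X_n = |S_n|$ for a simple random walk $S_n$ (a nonnegative submartingale) and compares the empirical $P(\max_n X_n > \lambda)$ with $\frac{1}{\lambda}\sup_n E[X_n]$.

```python
import random

def check_maximal_inequality(paths=4000, steps=50, lam=10.0, seed=7):
    """Empirical P(max_n |S_n| > lam) vs (1/lam) E|S_N| for a simple random walk."""
    random.seed(seed)
    exceed = 0
    terminal = 0.0
    for _ in range(paths):
        s = 0
        running_max = 0
        for _ in range(steps):
            s += 1 if random.random() < 0.5 else -1
            running_max = max(running_max, abs(s))
        if running_max > lam:
            exceed += 1
        terminal += abs(s)
    p_max = exceed / paths            # estimates P(X* > lam)
    bound = (terminal / paths) / lam  # estimates (1/lam) sup_n E[X_n] = (1/lam) E[X_N]
    return p_max, bound

p_max, bound = check_maximal_inequality()
print(p_max <= bound)
```

For a submartingale $\sup_n E[X_n]$ is attained at the terminal time, which is why the terminal mean suffices in the bound.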
(4.3.15) Theorem. Suppose $\Phi$ is a finite Orlicz function with conjugate $\Psi$, and $(\mathcal{F}_t)_{t\in J}$ is a stochastic basis. Then every $L_\Psi$-bounded martingale converges essentially if and only if $(\mathcal{F}_t)_{t\in J}$ satisfies $(FV_\Phi)$.

Proof. Suppose $(FV_\Phi)$ holds. Then we have a maximal inequality by (4.3.14). Thus we may apply Proposition (4.1.13) using the family of $L_\Psi$-bounded martingales; we may conclude that these martingales converge.

Conversely, suppose $(FV_\Phi)$ fails. There is an adapted family $(A_t)_{t\in J}$ of sets and $\varepsilon > 0$ such that, for any family $(\xi_t)_{t\in J}$ of functions, if

(4.3.15a) each $\xi_t \ge 0$ is bounded and $\mathcal{F}_t$-measurable, $\xi_t = 0$ outside $A_t$, only finitely many $\xi_t$ are nonzero, and $E[\sum_t \xi_t] > P(A^*) - \varepsilon$,

then $\|(\sum_t \xi_t - 1)^{+}\|_\Phi > \varepsilon$. We consider three subsets of $L_\Phi$:

$$C_1 = \Big\{\xi = \sum_t \xi_t \in L_\Phi : (\xi_t)_{t\in J} \text{ satisfies (4.3.15a)}\Big\},$$
$$C_2 = \{\zeta \in L_\Phi : \zeta \le 1\}, \qquad C_3 = \{\zeta \in L_\Phi : \|\zeta\|_\Phi < \varepsilon\}.$$

These sets are convex. Since $C_3$ is open, $C_2 + C_3$ is also open. But $C_1 \cap (C_2 + C_3) = \emptyset$. Hence by the Hahn–Banach theorem, there is $x^* \in L_\Phi^*$ such that $x^*(\xi) \ge 1$ for all $\xi \in C_1$ and $x^*(\zeta) < 1$ for all $\zeta \in C_2 + C_3$. Now $x^*(\zeta) < 1$ for all $\zeta \le 1$, so by (2.2.24) the functional $x^*$ has the form $x^*(\zeta) = E[\zeta X]$ for some $X \in L_\Psi$.
Now consider the martingale $X_t = E^{\mathcal{F}_t}[X]$. Since $X \in L_\Psi$, $(X_t)$ is $L_\Psi$-bounded. We claim that

$$X_t \ge \frac{1}{P(A^*) - \varepsilon} \quad \text{a.s. on } A_t.$$

If not, fix $t$ and $B \in \mathcal{F}_t$ with $B \subseteq A_t$, $P(B) > 0$, and $X_t < 1/(P(A^*) - \varepsilon)$ on $B$. Then

$$\xi = \frac{P(A^*) - \varepsilon}{P(B)}\, 1_B$$

belongs to $C_1$, so $1 \le E[\xi X] = E[\xi X_t] < 1$, a contradiction. It follows that

$$X^* \ge \frac{1}{P(A^*) - \varepsilon} \quad \text{a.s. on } A^*.$$

Now $E[X] \le 1$, so we also have

$$P\Big(X_* \ge \frac{1}{P(A^*) - \varepsilon}\Big) \le P(A^*) - \varepsilon.$$

Thus $P(X^* \ne X_*) \ge \varepsilon$, and $(X_t)_{t\in J}$ does not converge essentially.

Complements
(4.3.16) (Covering condition $(D_\Phi)$.) Let $\Phi$ be an Orlicz function. We say that the stochastic basis $(\mathcal{F}_t)_{t\in J}$ satisfies condition $(D_\Phi)$ if for each $\varepsilon > 0$ there exists $\eta > 0$ such that for each $\gamma > 0$ and each adapted family $(A_t)_{t\in J}$, there exists $\tau \in \Sigma^{IM}$ with $P(A^* \setminus A(\tau)) < \varepsilon$ and $E[\Phi(\eta\, e(\tau))] < \gamma$. It can easily be verified that if $\Phi$ satisfies the $(\Delta_2)$ condition, then $(D_\Phi)$ is equivalent to $(V_\Phi)$. Suppose $J$ has a countable cofinal set. Then $(D_\Phi)$ holds if and only if every $L_\Psi$-bounded martingale converges (Talagrand [1986]). This result is more satisfactory than Theorem (4.3.11) in the sense that the hypothesis of condition $(\Delta_2)$ is not needed. But the covering condition $(D_\Phi)$ is not as easy to understand or verify as condition $(V_\Phi)$. Talagrand introduced another covering condition $(C_\Phi)$ in his study of $L_\Psi$-bounded martingales (4.4.12).

Remarks
The covering condition $(V_\Phi)$ appears first in the derivation setting; see, for example, Hayes [1976]. The martingale material here is based on Krickeberg [1956] and Millet [1978]. Krickeberg proved that $(V_p)$ implies essential convergence of all $L_q$-bounded martingales, where $1/p + 1/q = 1$. He also proved the necessity in the case $p = \infty$. (This is our (4.2.12).) See Krickeberg & Pauc [1963]. Following the lead of Hayes in the derivation setting, Millet proved in the directed-set setting necessity for $(V_p)$, $1 < p < \infty$, as well as for $(V_\Phi)$, where $\Phi$ satisfies condition $(\Delta_2)$ (4.3.11). Millet & Sucheston [1979a] and [1980d] proved several equivalent formulations of $(V_p)$, including the amart convergence; they used the term "amart for $M_p$" for our "$V_p$ amart." These papers also introduced the stopping-time formulation of the Vitali conditions. Talagrand [1986] contains a theorem stating that condition $(V_\Phi)$ is equivalent to the convergence of all $L_\Psi$-bounded martingales under a much more general hypothesis than $(\Delta_2)$; unfortunately, we believe there is a gap in his proof. Professor Talagrand now shares this view. It would be interesting to know exactly which Orlicz functions have this useful property. The functional condition $(FV_\Phi)$ appears first in the important paper Talagrand [1986]. Theorem (4.3.15) appears there.
4.4. Llbounded martingales
159
4.4. $L_1$-bounded martingales

In this section we return once again to the question of essential convergence of $L_1$-bounded martingales. We know that condition $(V) = (V_\infty)$ is sufficient for convergence of $L_1$-bounded martingales. We know that condition $(V_\Phi)$ is necessary and sufficient for convergence of $L_\Psi$-bounded martingales, if $\Phi$ satisfies condition $(\Delta_2)$. But the Orlicz function $\Phi$ relevant here (see Section 2.2) does not satisfy $(\Delta_2)$. We will give an example below (4.4.10) showing that condition (V) is not necessary for convergence of $L_1$-bounded martingales. We will therefore need to consider another covering condition, condition (C), which is slightly more complicated than (V). Under reasonable hypotheses (the directed set $J$ has a countable cofinal subset) we will prove that (C) is necessary and sufficient for the essential convergence of $L_1$-bounded martingales.

We will retain the notation used in the preceding section. A probability space $(\Omega, \mathcal{F}, P)$, a directed set $J$, and a stochastic basis $(\mathcal{F}_t)_{t\in J}$ will be fixed. We write $\Sigma^{IM}$ for the set of (simple) incomplete multivalued stopping times, and $\Sigma^{M}$ for the set of multivalued stopping times. If $(X_t)_{t\in J}$ is a stochastic process, then

$$X^* = e \limsup_t X_t,$$

and if $\tau \in \Sigma^{IM}$,

$$X_\tau = \sum_{t\in J} X_t\, 1_{\{t\in\tau\}}.$$

If $(A_t)_{t\in J}$ is an adapted family of sets, then

$$A^* = e \limsup_t A_t,$$

and if $\tau \in \Sigma^{IM}$,

$$A(\tau) = \bigcup_{t\in J} \big(\{t\in\tau\} \cap A_t\big).$$

For $\tau \in \Sigma^{IM}$, the domain of $\tau$ is $D(\tau) = \bigcup_t \{t\in\tau\}$, the sum of $\tau$ is $S(\tau) = \sum_t 1_{\{t\in\tau\}}$, and the excess of $\tau$ is $e(\tau) = S(\tau) - 1_{D(\tau)}$. We say $\tau$ is subordinate to $(A_t)_{t\in J}$ if $\{t\in\tau\} \subseteq A_t$ for all $t\in J$; then $A(\tau) = D(\tau)$.

Covering condition (C)

The next covering condition to be introduced is similar to the conditions $(V_\Phi)$ of the previous section. But instead of requiring that the overlap be small, it is only required that it be bounded.
(4.4.1) Definition. The stochastic basis $(\mathcal{F}_t)_{t\in J}$ satisfies the covering condition (C) if, for each $\varepsilon > 0$, there is a constant $M$ such that for every adapted family $(A_t)_{t\in J}$ of sets, there exists $\tau \in \Sigma^{IM}$ with $P(A^* \setminus A(\tau)) < \varepsilon$ and $S(\tau) \le M$. Note that (V) implies (C), even with $M = 1$. The stochastic basis of Example (4.2.1) fails condition (C): see (4.4.16). Here are some technical variants of condition (C). The proofs are similar to those used for $(V_\Phi)$, so they are left to the reader.
(4.4.2) Lemma. The following are equivalent.

(a) (C): For every $\varepsilon > 0$, there is $M$ such that for any adapted family $(A_t)_{t\in J}$ of sets, there exists $\tau \in \Sigma^{IM}$ with $P(A^* \setminus A(\tau)) < \varepsilon$ and $S(\tau) \le M$.

(b) For every $\varepsilon > 0$, there is $M$ such that for any adapted family $(A_t)_{t\in J}$ of sets, there exists $\tau \in \Sigma^{M}$ with $P(A^* \setminus A(\tau)) < \varepsilon$ and $S(\tau) \le M$.

(c) For every $\varepsilon > 0$, there is $M$ such that for any adapted family $(A_t)_{t\in J}$ of sets, and every $t_0 \in J$, there exists $\tau \in \Sigma^{M}$ with $\tau \ge t_0$, $P(A^* \setminus A(\tau)) < \varepsilon$ and $S(\tau) \le M$.

(d) For every $\varepsilon > 0$, there is $M$ such that for any adapted family $(A_t)_{t\in J}$ of sets, and every $t_0 \in J$, there exists $\tau \in \Sigma^{IM}$, subordinate to $(A_t)_{t\in J}$, with $\tau \ge t_0$, $P(A(\tau)) > P(A^*) - \varepsilon$ and $S(\tau) \le M$.

(e) For every $\varepsilon > 0$, there exist $M$ and $a > 0$ such that for any adapted family $(A_t)_{t\in J}$ of sets with $P(A^*) > \varepsilon$, and every $t_0 \in J$, there exists $\tau \in \Sigma^{IM}$, subordinate to $(A_t)_{t\in J}$, with $\tau \ge t_0$, $P(A(\tau)) > a$ and $S(\tau) \le M$.

See Millet & Sucheston [1980e], where the lemma is proved. The same paper contains a proof that the maximal inequality in Lemma (4.4.3), below, is equivalent to condition (C).

Again, we refrain from studying the processes corresponding to condition (C) and proceed directly to martingale convergence. However, our usual technique of truncating at $\lambda$ and $-\lambda$ to obtain an $L_\infty$-bounded amart seems difficult to carry out here: we do not know of a suitable amart definition corresponding to condition (C). So we will have to use another method, involving a decomposition theorem for martingales (1.4.17). We begin with a maximal inequality.

(4.4.3) Lemma. Suppose condition (C) holds. If $(X_t)_{t\in J}$ is a nonnegative adapted process, then for every $\varepsilon > 0$ there exists $M$ such that, for every $\lambda > 0$,

$$P(X^* > \lambda) \le \varepsilon + \frac{1}{\lambda}\,\sup_{\sigma \in \Sigma^{IM},\, S(\sigma) \le M} E[X_\sigma].$$
Proof. Let $\lambda > 0$ be given. Let $\beta$ satisfy $0 < \beta < \lambda$. Given $\varepsilon > 0$, choose $M$ as in condition (C). Let $A_t = \{X_t > \beta\}$. Then $\{X^* > \lambda\} \subseteq A^*$, and there is $\tau \in \Sigma^{IM}$ with $P(A^* \setminus A(\tau)) < \varepsilon$ and $S(\tau) \le M$. Now $A(\tau) = \bigcup_t (\{t\in\tau\} \cap A_t)$, so

$$1_{A(\tau)} \le \sum_t 1_{A_t}\, 1_{\{t\in\tau\}} \le \frac{1}{\beta} \sum_t X_t\, 1_{\{t\in\tau\}} = \frac{1}{\beta}\, X_\tau.$$

Thus $P(A(\tau)) \le (1/\beta)\, E[X_\tau]$. Therefore

$$P(X^* > \lambda) \le P(A^*) \le P(A(\tau)) + \varepsilon \le \varepsilon + \frac{1}{\beta}\, E[X_\tau] \le \varepsilon + \frac{1}{\beta}\,\sup_{\sigma \in \Sigma^{IM},\, S(\sigma) \le M} E[X_\sigma].$$

Now let $\beta \uparrow \lambda$.
(4.4.4) Proposition. Let $(\mathcal{F}_t)_{t\in J}$ be a stochastic basis. Suppose condition (C) is satisfied. Then all uniformly integrable martingales converge essentially.

Proof. Let $X_t = E^{\mathcal{F}_t}[X]$ be a uniformly integrable martingale. We may assume that $X$ is $\sigma(\bigcup \mathcal{F}_t)$-measurable. Fix $\varepsilon > 0$. Let $M$ be the constant that corresponds to $\varepsilon$ by (C). There is $s \in J$ and a bounded $\mathcal{F}_s$-measurable function $Y$ such that $\|X - Y\|_1 < \varepsilon^2/M$. Now $Z_t = E^{\mathcal{F}_t}[|X - Y|]$ is a nonnegative martingale. Thus we have, by the maximal inequality (4.4.3): there is $\tau \in \Sigma^{IM}$ with $\tau \ge s$, $S(\tau) \le M$, and

$$P\{Z^* > \varepsilon\} \le 2\varepsilon + \frac{1}{\varepsilon}\, E[Z_\tau].$$

Choose $t \ge \tau$. We have

$$P\{Z^* > \varepsilon\} \le 2\varepsilon + \frac{1}{\varepsilon}\, E[Z_t\, S(\tau)] \le 2\varepsilon + \frac{M}{\varepsilon}\, E[Z_t] < 2\varepsilon + \frac{M}{\varepsilon} \cdot \frac{\varepsilon^2}{M} = 3\varepsilon.$$

Thus

$$P\big(e \limsup X_t - e \liminf X_t > 2\varepsilon\big) \le P\big(\{e \limsup X_t - Y > \varepsilon\} \cup \{e \liminf X_t - Y < -\varepsilon\}\big) \le P\{Z^* > \varepsilon\} \le 3\varepsilon.$$

Thus $X_t$ converges essentially. Since $X_t$ converges stochastically to $X$, the essential limit is also $X$.

Next, we must handle the case of non-uniformly integrable martingales. Recall these definitions: a finitely additive set function $\mu$ defined on a subalgebra $\mathcal{G}$ of $\mathcal{F}$ is called singular if for every $\varepsilon > 0$ there exists $A \in \mathcal{G}$ with $P(\Omega \setminus A) < \varepsilon$ but variation $|\mu|(A) < \varepsilon$. A martingale $(X_t)_{t\in J}$ is called singular if the finitely additive set function $\mu$ defined on $\mathcal{G} = \bigcup_t \mathcal{F}_t$ by

$$\mu(A) = \lim_t E[X_t\, 1_A]$$

is singular. We know (1.4.17) that any $L_1$-bounded martingale can be written as the sum of a uniformly integrable martingale and a singular martingale.
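A standard concrete instance of a singular martingale (not taken from the text, but worth keeping in mind): on $\Omega = [0,1)$ with Lebesgue measure and the dyadic stochastic basis $\mathcal{F}_n$ generated by the intervals $[k2^{-n}, (k+1)2^{-n})$, consider

```latex
X_n = 2^n \, 1_{[0,\,2^{-n})} .
```

Then $(X_n)$ is a nonnegative $L_1$-bounded martingale with $E[X_n] = 1$, converging a.s. to $0$. The associated set function $\mu(A) = \lim_n E[X_n 1_A]$ is singular: for $A_k = [2^{-k}, 1) \in \mathcal{F}_k$ we have $P(\Omega \setminus A_k) = 2^{-k}$, while $|\mu|(A_k) = 0$.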
(4.4.5) Proposition. Let $(\mathcal{F}_t)_{t\in J}$ be a stochastic basis. Suppose condition (C) is satisfied. Then all singular $L_1$-bounded martingales converge essentially to 0.

Proof. Suppose $(X_t)_{t\in J}$ is a singular $L_1$-bounded martingale. If $(X_t)$ corresponds to the singular measure $\mu$:

$$\mu(A) = \lim_t E[X_t\, 1_A]$$

for all $A \in \mathcal{G} = \bigcup_t \mathcal{F}_t$, then the absolute value process $Z_t = |X_t|$ corresponds to the variation of $\mu$:

$$|\mu|(A) = \lim_t E[Z_t\, 1_A].$$

Thus $\mu$ has bounded variation, since $(X_t)$ is $L_1$-bounded. (The process $(Z_t)$ is a submartingale.)

Let $\varepsilon > 0$. Let $M$ be the constant corresponding to $\varepsilon$ by condition (C). Since $\mu$ is singular, there exist $s \in J$ and $B \in \mathcal{F}_s$ with $P(B) > 1 - \varepsilon$ and $|\mu|(B) < \varepsilon^2/M$. If $\sigma \in \Sigma^{IM}$ with $\sigma \ge s$ and $S(\sigma) \le M$, then choose $t \ge \sigma$ and compute

$$E[Z_\sigma\, 1_B] \le M\, E[Z_t\, 1_B] \le \varepsilon^2.$$

Now by the maximal inequality (4.4.3), applied to the process $(Z_t 1_B)_{t \ge s}$, we have $P\{Z^* 1_B > \varepsilon\} \le 2\varepsilon$, so $P(Z^* > \varepsilon) \le 2\varepsilon + P(\Omega \setminus B) \le 3\varepsilon$. Thus $Z^* = 0$, so that $(X_t)$ converges essentially to 0.
(4.4.6) Theorem. Let $(\mathcal{F}_t)_{t\in J}$ be a stochastic basis. Suppose condition (C) is satisfied. Then all $L_1$-bounded martingales converge essentially.

Proof. Apply the decomposition (1.4.17), then use Propositions (4.4.4) and (4.4.5).
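Spelled out (our paraphrase, with $Y_t$, $Z_t$ as hypothetical names for the two parts given by (1.4.17)):

```latex
X_t = Y_t + Z_t, \qquad (Y_t) \ \text{uniformly integrable}, \qquad (Z_t) \ \text{singular and } L_1\text{-bounded};
```

by (4.4.4), $Y_t$ converges essentially; by (4.4.5), $Z_t \to 0$ essentially; hence $X_t$ converges essentially.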
Necessity of (C)
We next undertake the proof of the converse: if all $L_1$-bounded martingales converge essentially, then condition (C) holds. (We assume the existence of a countable cofinal set for this.) In fact, we need only the convergence of the uniformly integrable $L_1$-bounded martingales, that is, martingales of the form $X_t = E^{\mathcal{F}_t}[X]$ for some $X \in L_1$. We will say an adapted family $(A_t)_{t\in J}$ of sets is finite if $A_t = \emptyset$ except for finitely many $t$. (Such an adapted family is thus really the same thing as an element of $\Sigma^{IM}$, viewed from a different perspective.) We also say that the family $(A_t)_{t\in J}$ is supported beyond $t_0$ if $A_t = \emptyset$ for all $t$ except those with $t \ge t_0$. (Thus the corresponding $\tau \in \Sigma^{IM}$ satisfies $\tau \ge t_0$.) We begin with an application of the Hahn–Banach theorem.
(4.4.7) Lemma. Let $(A_t)_{t\in J}$ be a finite adapted family of sets. Let $a > 0$. Suppose, for each adapted family $(\xi_t)_{t\in J}$ of nonnegative bounded functions, subordinate to $(A_t)_{t\in J}$, with $E[\sum_t \xi_t] = 1$, we have $\|\sum_t \xi_t\|_\infty \ge a$. Then, for each $\gamma > 0$, there is $Y \in L_1$ with $Y \ge 0$, $E[Y] \le 1/a$, and

$$P\big(A_t \setminus \{E^{\mathcal{F}_t}[Y] > 1/2\}\big) < \gamma \quad \text{for all } t.$$
Proof. Fix $\gamma > 0$. Consider the following two subsets of $L_\infty$:

$$C_1 = \Big\{\xi = \sum_t \xi_t : (\xi_t)_{t\in J} \text{ a family of nonnegative functions, subordinate to } (A_t), \text{ with } E\Big[\sum_t \xi_t\Big] = 1 \text{ and } \Big\|\sum_t \xi_t\Big\|_\infty \le 1/\gamma\Big\},$$

$$C_2 = \{\zeta \in L_\infty : \zeta \le a/2\}.$$

These sets are convex, and they are disjoint, since every $\xi \in C_1$ satisfies $\|\xi\|_\infty \ge a$ by hypothesis. By the Hahn–Banach theorem, there is $Y \in L_1$ such that

$$E[\xi Y] > \frac{1}{2} \quad \text{for all } \xi \in C_1, \qquad E[\zeta Y] \le \frac{1}{2} \quad \text{for all } \zeta \in C_2.$$

Now $E[\zeta Y] \le 1/2$ for all $\zeta \le 0$ implies $Y \ge 0$; and $E[\zeta Y] \le 1/2$ for the constant function $\zeta = a/2$ then implies $\|Y\|_1 \le 1/a$.

Now for given $t$, let $B = A_t \setminus \{E^{\mathcal{F}_t}[Y] > 1/2\}$. If $P(B) \ge \gamma$, then

$$\xi = \frac{1}{P(B)}\, 1_B$$

belongs to $C_1$, and thus

$$\frac{1}{2} < E[\xi Y] = \frac{1}{P(B)}\, E[Y 1_B] = \frac{1}{P(B)}\, E\big[E^{\mathcal{F}_t}[Y]\, 1_B\big] \le \frac{1}{2}.$$

This contradiction shows that $P(B) < \gamma$.
(4.4.8) Lemma. Let $(\mathcal{F}_t)_{t\in J}$ be a stochastic basis, where $J$ is a directed set with a countable cofinal subset. Suppose every $L_1$-bounded martingale converges essentially. Then for every $\varepsilon > 0$, there exist $t_0 \in J$ and $N > 0$ such that for all finite adapted families $(A_t)_{t\in J}$ of sets supported beyond $t_0$, with $P(\bigcup_t A_t) > \varepsilon$, there is an adapted family $(\xi_t)_{t\in J}$ of bounded nonnegative functions, subordinate to $(A_t)_{t\in J}$, with $E[\sum_t \xi_t] = 1$ and $\|\sum_t \xi_t\|_\infty \le N$.

Proof. Suppose (for purposes of contradiction) that this is false. Then there is $\varepsilon > 0$ such that for each $t_0 \in J$ and $N > 0$, there is a finite adapted family $(A_t)_{t\in J}$ supported beyond $t_0$, with $P(\bigcup_t A_t) > \varepsilon$, such that any $(\xi_t)_{t\in J}$ subordinate to $(A_t)_{t\in J}$ with $E[\sum \xi_t] = 1$ has $\|\sum \xi_t\|_\infty > N$.
Let $(s_k)$ be an increasing sequence cofinal in $J$. For each $k$, use $N = 2^{k+2}/\varepsilon$ to obtain a finite family $(A_t)_{t\in J}$ supported beyond $s_k$ as above. Say $A_t \ne \emptyset$ for $p$ different values of $t$. (Note $p$ depends on $k$.) Then apply Lemma (4.4.7) with $\gamma = 1/(kp)$ and $a = N$ to obtain $Y_k \in L_1$ with $\|Y_k\|_1 \le \varepsilon/2^{k+2}$ and

$$P\Big(A_t \setminus \big\{E^{\mathcal{F}_t}[Y_k] > \tfrac{1}{2}\big\}\Big) < \frac{1}{kp},$$

so that

$$P\Big(\bigcup_t \big\{E^{\mathcal{F}_t}[Y_k] > \tfrac{1}{2}\big\}\Big) > \varepsilon - \frac{1}{k}.$$

Let $X = \sum_k Y_k$. Then $\|X\|_1 \le \varepsilon/4$. Consider the uniformly integrable martingale $X_t = E^{\mathcal{F}_t}[X]$. Since $(s_k)$ is cofinal, we have

$$P\big(e \limsup X_t \ge 1/2\big) \ge \varepsilon.$$

But $P\big(e \liminf X_t \ge 1/2\big) \le \varepsilon/2$, since $\|X\|_1 \le \varepsilon/4$. So $(X_t)$ does not converge. This contradiction completes the proof.
(4.4.9) Theorem. Suppose $J$ is a directed set with a countable cofinal subset, and $(\mathcal{F}_t)_{t\in J}$ is a stochastic basis. If every uniformly integrable $L_1$-bounded martingale converges essentially, then $(\mathcal{F}_t)_{t\in J}$ satisfies (C).

Proof. We claim: for every $\varepsilon > 0$, there exist $M > 0$, $a > 0$, $\gamma > 0$, and $t_0 \in J$ such that for any finite adapted family $(A_t)$ supported beyond $t_0$ with $P(\bigcup A_t) > \varepsilon$ and $\sum P(A_t) \le (1+\gamma)\, P(\bigcup A_t)$, there is $\tau \in \Sigma^{IM}$, subordinate to $(A_t)$, with $P(A(\tau)) \ge a$ and $S(\tau) \le M$.

Indeed, given $\varepsilon > 0$, let $N$ and $t_0$ be as in Lemma (4.4.8), and let $a = 1/(4N)$, $\gamma = 1/(2N)$, and $M = 4N$. Let $(A_t)_{t\in J}$ be a finite adapted family supported beyond $t_0$ with $P(\bigcup A_t) > \varepsilon$ and $\sum P(A_t) \le (1+\gamma)\, P(\bigcup A_t)$. Overlap occurs on the set

$$C = \bigcup_{s \ne t} (A_s \cap A_t).$$

(It is really a finite union.) Then $P(C) \le \sum P(A_t) - P(\bigcup A_t) \le \gamma = 1/(2N)$. There is a family $(\xi_t)_{t\in J}$ subordinate to $(A_t)_{t\in J}$ with $E[\sum \xi_t] = 1$ and $\|\sum \xi_t\|_\infty \le N$. Now $E[(\sum \xi_t)\, 1_C] \le N\, P(C) \le 1/2$, so

$$E\Big[\Big(\sum \xi_t\Big)\, 1_{\Omega \setminus C}\Big] \ge \frac{1}{2}.$$

Thus if $H = \{\sum \xi_t > 1/4\} \setminus C$, we have $P(H) \ge 1/(4N)$. Define $\tau \in \Sigma^{IM}$ by $\{t \in \tau\} = \{\xi_t > 1/4\}$. Then, since the sets $A_t$ do not overlap outside $C$, we have $P(A(\tau)) \ge P(H) \ge 1/(4N) = a$. Also, $S(\tau) \le 4 \sum \xi_t \le 4N = M$. This proves the claim.

Now since all uniformly integrable $L_1$-bounded martingales converge, in particular all $L_\infty$-bounded martingales converge, so the covering property of (4.3.12) holds. Now let $\varepsilon > 0$ and let $(B_t)$ be an adapted family of sets with $P(B^*) > \varepsilon$. By (4.3.12), there is a finite adapted family $(A_t)$, subordinate to $(B_t)$, with $P(\bigcup A_t) > \varepsilon$ and $\sum P(A_t) \le P(\bigcup A_t) + \gamma\varepsilon \le (1+\gamma)\, P(\bigcup A_t)$. So there is $\tau \in \Sigma^{IM}$ with $P(A(\tau)) \ge a$ and $S(\tau) \le M$. By Lemma (4.4.2), we see that condition (C) is satisfied.

A counterexample
(4.4.10) We will next consider an example of a stochastic basis $(\mathcal{F}_t)_{t\in J}$ not satisfying condition (V), but for which all $L_1$-bounded martingales converge essentially.

If we wish, the probability space $(\Omega, \mathcal{F}, P)$ may be $[0,1)$ with Lebesgue measure, and the sets $D_t$ and $I_t(W)$ defined below may be taken to be half-open intervals $[a, b)$. But any continuous probability space will suffice.

Choose integers $n_i$ with $2 \le n_1 < n_2 < \cdots$ and $\sum 1/n_i < \infty$. Choose positive numbers $a_i$ with $1/2 > a_1 > a_2 > \cdots$ and $\sum a_i < \infty$. For each $i$, let $K(i) = \{1, 2, \dots, n_i\}$, and let $L(i)$ be the collection of all 2-element subsets of $\{1, 2, \dots, n_i\}$. Thus $L(i)$ has $n_i(n_i - 1)/2$ elements. Let $J_i$ be the Cartesian product $K(1) \times K(2) \times \cdots \times K(i)$, and let $J = \bigcup_{i=0}^\infty J_i$. [Write $\emptyset$ for the unique element of $J_0$.] For $t \in J_i$, we say that $t$ belongs to level $i$, and write $|t| = i$. If $t = (t_1, t_2, \dots, t_i) \in J_i$ and $p \in K(i+1)$, then we write $tp$ for the element of $J_{i+1}$ defined by $tp = (t_1, t_2, \dots, t_i, p)$. The set $J$ is directed when the ordering is defined by: $s \le t$ iff $s = t$ or $|s| < |t|$.

We next define (recursively on the level $i$): $\sigma$-algebras $\mathcal{G}_i$ ($i \in \mathbb{N}$), $\sigma$-algebras $\mathcal{F}_t$ ($t \in J$), sets $A_t$ ($t \in J$, $|t| \ge 1$), sets $I_t(W)$ ($t \in J$, $W \in L(|t|+1)$), and sets $D_t$ ($t \in J$), with

$$P(D_t) = \frac{(1-a_1)(1-a_2)\cdots(1-a_i)}{n_1 n_2 \cdots n_i}, \quad \text{where } i = |t|.$$
Figure (4.4.10a). $\sigma$-algebra $\mathcal{G}_0$.
Figure (4.4.10b). $\sigma$-algebra $\mathcal{G}_1$.
Figure (4.4.10c). $\sigma$-algebra $\mathcal{G}_2$.
Figure (4.4.10d). $A_{22} = D_{22} \cup I_2(\{1,2\}) \cup I_2(\{2,3\}) \cup I_2(\{2,4\})$.
You may find Figures (4.4.10a) to (4.4.10d) helpful when reading the following description. To begin,

$$\mathcal{G}_0 = \mathcal{F}_\emptyset = \{\Omega, \emptyset\}, \qquad D_\emptyset = \Omega.$$

Suppose, for some $i \ge 0$, that $\mathcal{G}_i$, $\mathcal{F}_t$, and $D_t$ (for $|t| = i$) have been defined. Let $t \in J_i$. Subdivide $D_t$ into $n_{i+1}$ sets $D_{tp}$ ($p \in K(i+1)$), each of probability

$$\frac{(1-a_1)(1-a_2)\cdots(1-a_i)(1-a_{i+1})}{n_1 n_2 \cdots n_i n_{i+1}},$$

and $n_{i+1}(n_{i+1}-1)/2$ sets $I_t(W)$ ($W \in L(i+1)$), each of probability

$$\frac{(1-a_1)\cdots(1-a_i)\, a_{i+1} \cdot 2}{n_1 n_2 \cdots n_i\, n_{i+1}(n_{i+1}-1)}.$$

This subdivision is possible, since the sum of the probabilities of all the subdividing sets listed is exactly the probability

$$\frac{(1-a_1)(1-a_2)\cdots(1-a_i)}{n_1 n_2 \cdots n_i}$$

of $D_t$. For $p \in K(i+1)$, let

$$A_{tp} = D_{tp} \cup \bigcup \{I_t(W) : W \in L(i+1),\ p \in W\}.$$

Then

$$P(A_{tp}) = \frac{(1-a_1)\cdots(1-a_{i+1})}{n_1 \cdots n_{i+1}} + (n_{i+1}-1)\,\frac{(1-a_1)\cdots(1-a_i)\, a_{i+1} \cdot 2}{n_1 \cdots n_i\, n_{i+1}(n_{i+1}-1)} = \frac{1+a_{i+1}}{n_{i+1}}\, P(D_t) \le \frac{2}{n_{i+1}}\, P(D_t).$$
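The bookkeeping above can be verified with exact arithmetic. In the sketch below the parameter values are our own toy choice (satisfying $2 \le n_1 < n_2 < \cdots$ and $1/2 > a_1 > a_2 > \cdots$); it checks that the subdividing sets exhaust $D_t$ and that $P(A_{tp}) = \frac{1+a_{i+1}}{n_{i+1}} P(D_t)$.

```python
from fractions import Fraction

n = [2, 3, 4]                                         # n_1 < n_2 < n_3, all >= 2
a = [Fraction(1, 3), Fraction(1, 4), Fraction(1, 5)]  # 1/2 > a_1 > a_2 > a_3

def P_D(i):
    """P(D_t) for |t| = i: prod_j (1 - a_j) / prod_j n_j."""
    p = Fraction(1)
    for j in range(i):
        p *= (1 - a[j]) / n[j]
    return p

i = 1                               # subdivide a level-1 atom D_t into level-2 pieces
n1, a1 = n[i], a[i]                 # n_{i+1} and a_{i+1}
p_Dtp = P_D(i) * (1 - a1) / n1      # probability of each D_tp
p_ItW = P_D(i) * 2 * a1 / (n1 * (n1 - 1))  # probability of each I_t(W)

# the n_{i+1} sets D_tp and n_{i+1}(n_{i+1}-1)/2 sets I_t(W) exhaust D_t
assert n1 * p_Dtp + (n1 * (n1 - 1) // 2) * p_ItW == P_D(i)

# P(A_tp) = P(D_tp) + (n_{i+1}-1) P(I_t(W)) = (1+a_{i+1})/n_{i+1} * P(D_t)
assert p_Dtp + (n1 - 1) * p_ItW == (1 + a1) / n1 * P_D(i)
print("subdivision checks out")
```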
Note that $A_{tp} \cap A_{sq} = \emptyset$ if $t \ne s$, but $A_{tp} \cap A_{tq} = I_t(\{p, q\}) \ne \emptyset$ if $p \ne q$. Let $\mathcal{F}_{tp}$ be the $\sigma$-algebra generated by $\mathcal{G}_i$ and the single set $A_{tp}$. Let $\mathcal{G}_{i+1}$ be the $\sigma$-algebra with atoms

$$D_{tp} \quad (t \in J_i,\ p \in K(i+1)) \qquad \text{and} \qquad I_t(W) \quad (|t| \le i,\ W \in L(|t|+1)).$$

Since $D_t \supseteq A_{tp} \supseteq D_{tp}$, we have $\mathcal{G}_i \subseteq \mathcal{F}_{tp} \subseteq \mathcal{G}_{i+1}$. This completes the recursive definition. As required, if $s \le t$, then $\mathcal{F}_s \subseteq \mathcal{F}_t$.

We claim that $(\mathcal{F}_t)_{t\in J}$ fails condition (V). We use the adapted family $(A_t)_{t\in J}$ for this purpose. For each $i$,

$$\bigcup_{|t|=i+1} D_t \subseteq \bigcup_{|t|=i+1} A_t \subseteq \bigcup_{|t|=i} D_t,$$

so $A^* = \limsup A_t = \bigcap_{i=0}^\infty \bigcup_{|t|=i} D_t$. Now

$$P(A^*) = \lim_{i\to\infty} P\Big(\bigcup_{|t|=i} D_t\Big) = \lim_{i\to\infty} n_1 \cdots n_i\, \frac{(1-a_1)\cdots(1-a_i)}{n_1 \cdots n_i} = \prod_{j=1}^\infty (1-a_j) > 0.$$
Now fix $i_0$ and let $\tau \in \Sigma$ be any simple stopping time with values in levels $i_0$ and above. Now $A_t$ is an atom of $\mathcal{F}_t$, so either $\{\tau = t\} \supseteq A_t$ or $\{\tau = t\} \cap A_t = \emptyset$. Because of the overlap properties of the $A_t$, for each fixed index $t$ we must have $\{\tau = tp\} \cap A_{tp} = \emptyset$ except for at most one value of $p$, so that

$$P\Big(\bigcup_{|s|=i+1} \{\tau = s\} \cap A_s\Big) \le \sum_{|t|=i} \frac{2}{n_{i+1}}\, P(D_t) \le \frac{2}{n_{i+1}}.$$

Thus

$$P(A(\tau)) \le \sum_{i=i_0}^\infty \frac{2}{n_i}.$$
Since this approaches 0 as $i_0 \to \infty$, we see that (V) fails.

However, we claim that all $L_1$-bounded martingales adapted to $(\mathcal{F}_t)_{t\in J}$ converge essentially (or, what is the same thing since $J$ is countable, a.s.). We verify this by proving that condition (C) holds. In fact, a uniform version of (C) is true: if $(B_t)$ is an adapted family of sets and $\varepsilon > 0$, then there is $\tau \in \Sigma^{IM}$ with $P(B^* \setminus B(\tau)) < \varepsilon$ and $S(\tau) \le 2$.

Let $(B_t)$ and $\varepsilon$ be given. Fix $i$. We will construct $\tau_i \in \Sigma^{IM}$ with values in $J_i$ such that $S(\tau_i) \le 2$ and $P(\bigcup_{|t|=i} B_t \setminus B(\tau_i)) \le 2/n_i$.

First consider the atoms of $\mathcal{G}_{i-1}$. For $s \in J_{i-1}$, if $D_s \subseteq B_t$ for some $t \in J_i$, then choose one and call it $t = \gamma(s)$. For $|s| \le i-2$ and $W \in L(|s|+1)$, if $I_s(W) \subseteq B_t$ for some $t \in J_i$, then choose one and call it $t = \beta(s, W)$. If some atom $D_s$, $s \in J_{i-1}$, is not contained in any $B_t$, then consider the sets $H_{sp} = D_s \cap B_{sp}$. This set is $\mathcal{F}_{sp}$-measurable, and does not include all of $D_s$; so it is one of $\emptyset$, $A_{sp}$, $D_s \setminus A_{sp}$. For this value of $s$, if $H_{sp} = D_s \setminus A_{sp}$ for some value of $p$, then choose one, and let $R_{sp} = H_{sp}$ for that value of $p$ and $R_{sp} = \emptyset$ for the other values of $p$. If $H_{sp}$ is $\emptyset$ or $A_{sp}$ for all $p$, then let $R_{sp} = H_{sp}$ for all $p$.

Now define $\tau_i$ by: $\{t \in \tau_i\} = \emptyset$ if $|t| \ne i$, and

$$\{t \in \tau_i\} = \bigcup \{D_s : t = \gamma(s)\} \cup \bigcup \{I_s(W) : t = \beta(s, W)\} \cup R_t.$$

Clearly $\{t \in \tau_i\} \subseteq B_t$. Since the sets $R_t$ intersect at most two at a time (and the others are disjoint), $S(\tau_i) \le 2$.

Now how large is $V_i = \bigcup_{|t|=i} B_t \setminus B(\tau_i)$? Its intersection with most atoms of $\mathcal{G}_{i-1}$ is $\emptyset$. The only exception is an atom $D_s$, $s \in J_{i-1}$, where $H_{sp} = D_s \setminus A_{sp}$; there $P(V_i \cap D_s) \le P(A_{sp}) \le (2/n_i)\, P(D_s)$. Thus

$$P(V_i) \le \sum_s (2/n_i)\, P(D_s) \le 2/n_i.$$
Now given $\varepsilon > 0$, choose $i_0$ with $\sum_{i=i_0}^\infty 2/n_i < \varepsilon/2$, then choose $i_1 \ge i_0$ so that

$$P\Big(B^* \setminus \bigcup_{i_0 \le |t| \le i_1} B_t\Big) < \frac{\varepsilon}{2}.$$

Define $\tau \in \Sigma^{IM}$ by:

$$\{t \in \tau\} = \{t \in \tau_{|t|}\} \setminus \bigcup_{i_0 \le |s| < |t|} B_s$$

for $i_0 \le |t| \le i_1$, and $\{t \in \tau\} = \emptyset$ otherwise. Then $P(B^* \setminus B(\tau)) < \varepsilon$ and $S(\tau) \le 2$. This completes the verification of condition (C).

Complements
(4.4.11) (Conditions between (V) and (C).) Let $m$ be a positive integer. We say that a stochastic basis $(\mathcal{F}_t)$ satisfies condition SV($m$) if for every $\varepsilon > 0$ and every adapted family $(A_t)$ of sets there is $\tau \in \Sigma^{IM}$ such that $P(A^* \setminus A(\tau)) < \varepsilon$ and $S(\tau) \le m$. Thus the stochastic basis in the counterexample (4.4.10) satisfies condition SV(2). We say that $(\mathcal{F}_t)$ satisfies condition (SV) if SV($m$) holds for some $m$. We say that $(\mathcal{F}_t)$ satisfies condition (vSV) if there exists a sequence of sets $\Omega_n$ in the algebra $\bigcup \mathcal{F}_t$ such that $P(\bigcup \Omega_n) = 1$ and, for each $n$, the restriction of $(\mathcal{F}_t)$ to $\Omega_n$ satisfies (SV).

These conditions are related as follows; none of the implications can be reversed:

$$(V) \Longrightarrow SV(2) \Longrightarrow SV(3) \Longrightarrow \cdots \Longrightarrow SV(m) \Longrightarrow SV(m+1) \Longrightarrow \cdots \Longrightarrow (SV) \Longrightarrow (vSV) \Longrightarrow (C).$$

Reference: Millet & Sucheston [1980e].
(4.4.12) (Condition $(C_\Phi)$.) Let $\Phi$ be an Orlicz function. We say that the stochastic basis $(\mathcal{F}_t)_{t\in J}$ satisfies condition $(C_\Phi)$ if for each $\varepsilon > 0$ there exists $M$ such that for each adapted family $(A_t)_{t\in J}$ with $P(A^*) > \varepsilon$, there exists $\tau \in \Sigma^{IM}$ with $P(A^* \setminus A(\tau)) < \varepsilon$ and $\|S(\tau)\|_\Phi \le M$. Then, of course, condition (C) is the special case where $L_\Phi = L_\infty$. Suppose $J$ has a countable cofinal set and the conjugate Orlicz function $\Psi$ satisfies condition $(\Delta_2)$. Then condition $(C_\Phi)$ is satisfied if and only if every $L_\Psi$-bounded martingale converges essentially (Talagrand [1986]).

(4.4.13) (Covering condition (A).) A stochastic basis $(\mathcal{F}_t)$ satisfies covering condition (A) if, for every adapted family $(A_t)$ of sets with $P(A^*) > 0$, there is a constant $M$ such that for all $s \in J$ there exists $\tau \in \Sigma$, $\tau \ge s$, $\tau$ subordinate to $(A_t)$, with

$$\|S(\tau)\|_\infty \le M\, \|S(\tau)\|_1.$$

If (A) holds, then all $L_1$-bounded martingales converge (Astbury [1981b]).
(4.4.14) (Uniformly integrable martingales.) Suppose $J$ has a countable cofinal subset. Let $(\mathcal{F}_t)_{t\in J}$ be a stochastic basis. If all uniformly integrable martingales converge essentially, then all $L_1$-bounded martingales converge essentially (Astbury [1981a]).

(4.4.15) (A counterexample.) Assuming the Continuum Hypothesis, there exists a stochastic basis of finite algebras on $[0,1]$ such that all uniformly integrable martingales converge essentially, but some $L_1$-bounded martingales fail to converge essentially (Talagrand [1986]). This example shows that the hypothesis of a countable cofinal set cannot be removed entirely in Theorem (4.4.9) (or in the preceding result (4.4.14) of Astbury).

(4.4.16) (Failure of condition (C).) Condition (C) fails for the stochastic basis of Example (4.2.1), since the martingale there does not converge essentially. The failure of (C) may also be established directly. We use the notation of Proposition (4.2.6). The adapted family $(A_t)$ constructed there has $P(A^*) = 1$. For $\varepsilon = 1/2$, we will show that the condition in Lemma (4.4.2(d)) fails. Let $M$ be given, then choose $p \in \mathbb{N}$ so large that $M\, 2^{-p} < 1/4$. Continue as in the argument for (V): let $\tau \in \Sigma^{IM}$ satisfy $\tau \ge \{1, 2, \dots, p\}$ and $S(\tau) \le M$. For $m \ge p$, for a fixed pair $(B, D)$, the sets $F(C \cup B, D)$ for $C \in \mathcal{C}_m$ all contain the nonnull set $F(\{m+1, \dots, 4m\} \cup B, D)$. Since $S(\tau) \le M$, there are at most $M$ sets $C$ with $F(C \cup B, D) \cap \{t(m, C) \in \tau\} \ne \emptyset$. Thus

$$P\Big(\bigcup_{C \in \mathcal{C}_m} F(C, \emptyset) \cap \{t(m, C) \in \tau\}\Big) \le M \cdot 2^m \cdot 2^{-2m} = M \cdot 2^{-m}.$$

Thus

$$P(A(\tau)) \le \sum_{m=p}^\infty M \cdot 2^{-m} = M \cdot 2^{-p+1} < 1/2.$$

Condition (C) fails.
Remarks

The covering condition (C) was introduced by Millet & Sucheston [1980e]. Astbury [1981b] independently introduced covering condition (A), which he called the "dominated sums property." Millet & Sucheston [1980c] proved (assuming the existence of a countable cofinal set) that (A) is equivalent to condition (C). Millet & Sucheston proved that (C) implies convergence of $L_1$-bounded martingales, and Astbury independently proved that (A) implies convergence of $L_1$-bounded martingales. The proof that convergence of $L_1$-bounded martingales implies (C) is due to Talagrand [1986]. We will see in Chapter 7 that condition (C) can be used profitably in the derivation setting. The interplay between derivation and directed sets is part of the reason that we feel the two settings should be studied together. The counterexample (4.4.10) showing that convergence of $L_1$-bounded martingales does not imply condition (V) is from Millet & Sucheston [1979b], where conditions SV($m$) and (SV) were introduced. A similar example is given in Astbury [1981b].
5

Banach-valued random variables
In this chapter, we consider martingales, amarts, and related processes that take values in Banach spaces. They are useful in Banach spaces that occur naturally in mathematics, such as function spaces. They have also played an important role in the understanding of some geometric properties of Banach spaces. The reader who knows nothing of Banach spaces should, of course, skip this chapter; but someone with only a minimal knowledge of Banach space theory should be able to work through most of the chapter, with the exception of Section 5.5.

There is a close connection between martingales with values in a Banach space $E$ and measures with values in $E$. This is not unexpected. More interesting are the connections of these two topics with the geometric properties of the Banach space. These are explored in Section 5.4. If a theorem is true in the real-valued case and extends to a more general setting without change of argument, this may be useful, but is hardly exciting. One modern approach to probability in Banach spaces consists in exactly matching the convergence property to the geometry of the space. Such theorems are not only the best possible in terms of convergence, but they may also shed new light on the structure of the space. The remarkable theorem of A. & C. Ionescu Tulcea and S. D. Chatterji (that $L_1$-bounded martingales taking values in a Banach space $E$ converge a.s. if and only if $E$ has the Radon–Nikodym property; Theorem (5.3.30), also (5.3.34)) characterizes a geometric property of Banach spaces in terms of convergence of martingales. It will be seen that there are many such characterizations in terms of amarts. For example, the Radon–Nikodym property of the space, the Radon–Nikodym property of the dual, reflexivity, and even finite-dimensionality can be characterized in terms of convergence of appropriate classes of amarts. Some of the more common operator ideals can be characterized as well in terms of convergence of classes of amarts. These results have various degrees of depth. In some instances, the probabilistic result is an interesting but easy consequence of a deep theorem in functional analysis. Martingale convergence is applied several times in Section 5.4: the fixed-point theorem of Ryll-Nardzewski, the integral representation theorem of Choquet–Edgar, and two geometric characterizations of the Radon–Nikodym property, (5.4.13) and (5.4.17). Amart investigations led to the following basic result, which can be stated without amarts: if for bounded processes scalar convergence implies weak a.s. convergence, then (and only then) the dual of the Banach space is separable (5.5.27). There is also an operator generalization of this result (5.5.26).
Banachvalued random variables
172
5.1. Vector measures and integrals

Theorems of functional analysis

The Banach spaces in this chapter will be understood to be Banach spaces over the field $\mathbb{R}$ of real numbers. If $E$ is a Banach space, we will write $E^*$ for the dual space, that is, the set of all bounded linear functionals on $E$, with norm

$$\|x^*\| = \sup\{|x^*(x)| : x \in E,\ \|x\| \le 1\}.$$

If $x \in E$ and $x^* \in E^*$, we will sometimes write $\langle x, x^*\rangle$ for $x^*(x)$. The Banach space $E$ will often be identified with a subspace of $E^{**}$ in the natural way. The norm on a Banach space defines a metric on the space, and therefore a topology. When topological words are used without qualification, they will refer to this norm topology. A Banach space also has a weak topology: a net $(x_t)$ in $E$ converges weakly to the vector $x$ if

$$\langle x_t, x^*\rangle \to \langle x, x^*\rangle \quad \text{for all } x^* \in E^*.$$

If $E = F^*$ is a dual space, then it also has a weak-star topology: a net $(x_t^*)$ in $E$ converges weak-star to the vector $x^*$ if

$$\langle y, x_t^*\rangle \to \langle y, x^*\rangle \quad \text{for all } y \in F.$$

Details of these and other items concerning Banach spaces can be found, for example, in Lindenstrauss & Tzafriri [1968] or in Dunford & Schwartz [1958].
The study of the geometry of convex sets in a Banach space frequently uses the Hahn–Banach theorem in two different forms:

(5.1.1) The Hahn–Banach extension theorem. Let $E$ be a Banach space, and $E_1$ a closed subspace. If $x_1^* \in E_1^*$, then there exists an extension $x^* \in E^*$ (that is, $x_1^*(x) = x^*(x)$ for all $x \in E_1$) with $\|x^*\| = \|x_1^*\|$.

(5.1.2) The Hahn–Banach separation theorem. Let $E$ be a Banach space. Suppose $C$ is a closed convex subset and $K$ is a compact convex subset. If $C \cap K = \emptyset$, then there exists a functional $x^* \in E^*$ that strictly separates the two sets (that is, $\sup x^*(C) < \inf x^*(K)$). Suppose $C$ is a closed convex subset and $G$ is an open convex subset. If $C \cap G = \emptyset$, then there exists a functional $x^* \in E^*$ that separates the two sets (that is, $\sup x^*(C) \le \inf x^*(G)$).

A reference for Hahn–Banach theorems is, for example, Rudin [1973], pp. 55–59.
Another important result from functional analysis that will be used often is the closed graph theorem.
(5.1.3) Closed graph theorem. Let $E$ and $F$ be Banach spaces, and let $T : E \to F$ be a linear transformation. If the graph

$$\{(x, y) \in E \times F : y = Tx\}$$

is a closed set in the product $E \times F$, then $T$ is a bounded linear transformation.

See Rudin [1973], Theorem 2.15, for a discussion of the closed graph theorem.

Random variables

Let $E$ be a Banach space. By a random variable in $E$, we mean a function measurable in the sense of Bochner. This can be defined as follows. Let $(\Omega, \mathcal{F}, P)$ be a probability space. A measurable simple function in $E$ is a function $X : \Omega \to E$ of the form

$$X(\omega) = \sum_{j=1}^n 1_{A_j}(\omega)\, x_j,$$

where $A_j \in \mathcal{F}$ and $x_j \in E$. The integral or expectation of $X$ is the vector

$$E[X] = \sum_{j=1}^n P(A_j)\, x_j.$$

A random variable in $E$ is a function $X : \Omega \to E$ that is equal almost surely to the limit (in the norm of $E$) of a sequence $X_n$ of measurable simple functions:

$$\lim_{n\to\infty} \|X(\omega) - X_n(\omega)\| = 0 \quad \text{for almost every } \omega.$$

Almost all the values of $X$ lie in a separable subspace of $E$, namely the closed span of the countable set obtained from the values of all the simple functions $X_n$. The random variable $X$ is Bochner integrable if $E[\|X\|] < \infty$; the quantity

$$\|X\|_{L_1} = E[\|X\|]$$

is called the Bochner norm of $X$. If $X$ is Bochner integrable, then the approximating sequence $X_n$ of simple functions can be chosen so that

$$\lim_{n\to\infty} E[\|X - X_n\|] = 0;$$

then the Bochner integral of $X$ is defined by

$$E[X] = \lim_{n\to\infty} E[X_n].$$

It can be shown that this expression does not depend on the choice of the approximating sequence $X_n$. We write $L_1(\Omega, \mathcal{F}, P; E)$ for the set of all Bochner integrable random variables. It is a Banach space (when its elements are considered to be equivalence classes). The Bochner integral has many of the properties of the real-valued integral.
(5.1.4) Definition. Let $X : \Omega \to E$ be a random variable. The Pettis norm of $X$ is

$$\|X\|_P = \sup_{x^* \in E^*,\ \|x^*\| \le 1} E\big[|\langle X, x^*\rangle|\big].$$
Note that ‖X‖_P ≤ ‖X‖_{L¹}, so every Bochner integrable random variable has finite Pettis norm. The space L¹(Ω, F, P; E) is not complete under the Pettis norm (unless E is finite-dimensional). See Theorem (5.5.34).

The random variable X is scalarly integrable if, for every bounded linear functional x* ∈ E*, the composition ⟨X, x*⟩ is a (real-valued) integrable random variable. Of course, if X has finite Pettis norm, then it is scalarly integrable. But in fact the converse is also true. This is an application of the closed graph theorem, which may be carried out as follows. If X is scalarly integrable, define a map T : E* → L¹(Ω, F, P) by T(x*) = ⟨X, x*⟩. We claim that the graph of T is closed. If x_n* → x* in E*, then ⟨X(ω), x_n*⟩ → ⟨X(ω), x*⟩ for each ω, hence ⟨X, x_n*⟩ → ⟨X, x*⟩ in probability. If, in addition, ⟨X, x_n*⟩ → Y in L¹ norm, then ⟨X, x_n*⟩ → Y in probability. Therefore Y = ⟨X, x*⟩ a.s. Thus T is a closed linear transformation defined on a Banach space, so it is bounded. The norm ‖T‖ is the Pettis norm of X.

When the Banach space is separable, there are many equivalent ways to recognize random variables. Our proofs are based on a simple lemma. A closed half-space in a Banach space E is a set of the form

H_a(x*) = { y ∈ E : ⟨y, x*⟩ ≤ a }.
(5.1.5) Lemma. Let E be a separable Banach space, and let C be a closed, convex subset of E. Then C is a countable intersection of closed half-spaces.
Proof. It follows from the Hahn–Banach separation theorem that C is an intersection of closed half-spaces, say

C = ∩_{γ∈Γ} H_{a_γ}(x_γ*).

The set E \ C is a Lindelöf space, so C is also the intersection of a countable subcollection of the half-spaces H_{a_γ}(x_γ*). This point will now be explained more fully. Let V be a countable set dense in E \ C, and let 𝒱 be the collection of all balls with centers in V and rational radii; 𝒱 is a countable collection. The union of all the elements of 𝒱 that are disjoint from some H_{a_γ}(x_γ*) is exactly E \ C. Thus C is the intersection of countably many of these half-spaces, one disjoint from each of these balls.

The σ-algebra ℬ of Borel sets in the Banach space E is the σ-algebra generated by the open sets. If E is separable, then that is the same as the σ-algebra generated by the open balls, since each open set is a countable union of open balls. In fact, it is the same as the σ-algebra generated by the closed balls, since each open ball is a countable union of closed balls. Then the previous result will show that ℬ is the σ-algebra generated by the closed half-spaces. (Note: This may fail in a nonseparable Banach space.)
(5.1.6) Proposition. Let E be a separable Banach space, and let (Ω, F, P) be a complete probability space. A function X : Ω → E is Bochner measurable if and only if it is Borel measurable.
Proof. A simple function is Borel measurable, and a limit of a sequence of Borel measurable functions (with values in a metric space) is Borel measurable. This is enough to show that any Bochner measurable function is Borel measurable. Now let X : Ω → E be Borel measurable. Fix a positive integer n. The Banach space E is the union of countably many open balls of diameter less than 2^{−n}, so E is the union of a countable pairwise disjoint family {D_j} of Borel sets of diameter less than 2^{−n}. By the countable additivity of P, there is m so that

P{X ∈ ∪_{j=1}^m D_j} > 1 − 2^{−n}.

Let a_j ∈ D_j and write A_j = {X ∈ D_j}. Then

X_n = Σ_{j=1}^m a_j 1_{A_j}

is a measurable simple function satisfying

P{‖X − X_n‖ < 2^{−n}} > 1 − 2^{−n}.

By the Borel–Cantelli lemma, X_n → X a.s., so X is Bochner measurable.
(5.1.7) Pettis measurability theorem. Let E be a Banach space, and let (Ω, F, P) be a complete probability space. A function X : Ω → E is Bochner measurable if and only if X is scalarly measurable and there is a separable subspace E₁ ⊆ E with P{X ∈ E₁} = 1.

Proof. If X is Bochner measurable, then it is scalarly measurable and (almost) separably valued, since simple functions have these properties and they are preserved by pointwise a.s. limits.

For the converse, suppose X is scalarly measurable and has values (a.s.) in the separable space E₁. By ignoring a set of measure zero, we may assume that X has all its values in E₁. Now the collection of all sets D ⊆ E₁ such that {X ∈ D} ∈ F is a σ-algebra and includes the closed half-spaces. Therefore it includes all Borel sets. This means that X is Borel measurable, so by the preceding result X is Bochner measurable.
A slight variant of this theorem will also be useful. If F is a Banach space and X : Ω → F*, we will say that X is weak-star scalarly measurable if ⟨y, X⟩ is a measurable function on Ω for each y ∈ F.
(5.1.8) Proposition. Let F be a Banach space with separable dual, and let X : Ω → F* be weak-star scalarly measurable. Then X is Bochner measurable.
This is proved by almost the same method as the Pettis measurability theorem, using weak-star closed half-spaces in place of closed half-spaces. The unit ball of F* is weak-star closed, so it is the intersection of a family of weak-star closed half-spaces. The rest of the proof is the same.

Vector measures
Let E be a Banach space. Suppose F is a σ-algebra of subsets of a set Ω. A vector measure in E is a function µ : F → E such that

µ(∪_n A_n) = Σ_n µ(A_n)

for every pairwise disjoint sequence (A_n) in F, where the series converges in the norm of E. The measure µ is absolutely continuous with respect to P iff µ(A) = 0 for all A ∈ F with P(A) = 0. (We write µ ≪ P.) The variation of µ on a set A ∈ F is

|µ|(A) = sup Σ_{i=1}^n ‖µ(A_i)‖,

where the supremum is taken over all finite disjoint sequences A₁, A₂, …, A_n ⊆ A in F. The set-function |µ| is a (possibly infinite) measure on F. We say that µ has σ-finite variation on a set A if A is a countable union of sets on which µ has finite variation. Suppose the random variable X is Bochner integrable, and a set function µ is defined by µ(A) = E[X 1_A] for A ∈ F. Then µ has finite variation; in fact the variation of µ on Ω is exactly the Bochner norm E[‖X‖].

The semivariation of a vector measure µ on a set A ∈ F is

‖µ‖(A) = sup { |x*µ|(A) : x* ∈ E*, ‖x*‖ ≤ 1 },

where |x*µ| is the variation of the real-valued measure x*µ (see Section 1.3). The set-function ‖µ‖ is not additive, but it is subadditive and has the following property: ‖µ‖(A) = 0 if and only if µ(B) = 0 for all B ⊆ A. Indeed, we claim that

sup{ ‖µ(B)‖ : B ⊆ A } ≤ ‖µ‖(A) ≤ 2 sup{ ‖µ(B)‖ : B ⊆ A }.
To see this, first consider any B ⊆ A. Then

‖µ(B)‖ = sup { |x*µ(B)| : x* ∈ E*, ‖x*‖ ≤ 1 } ≤ sup { |x*µ|(A) : x* ∈ E*, ‖x*‖ ≤ 1 } = ‖µ‖(A).

Thus sup{ ‖µ(B)‖ : B ⊆ A } ≤ ‖µ‖(A). On the other hand, if x* ∈ E* and ‖x*‖ ≤ 1, choose the positive and negative sets B₁ and B₂ for the signed measure x*µ (as in (1.3.7)). Then

|x*µ|(A) = |x*µ(A ∩ B₁)| + |x*µ(A ∩ B₂)| ≤ ‖µ(A ∩ B₁)‖ + ‖µ(A ∩ B₂)‖ ≤ 2 sup{ ‖µ(B)‖ : B ⊆ A }.

Therefore

sup { |x*µ|(A) : x* ∈ E*, ‖x*‖ ≤ 1 } = ‖µ‖(A) ≤ 2 sup { ‖µ(B)‖ : B ⊆ A }.
(5.1.9) If µ has the form µ(A) = E[X 1_A], then the semivariation is the Pettis norm of X (5.1.4): ‖X‖_P = ‖µ‖(Ω). Therefore we have

(5.1.9a) ‖X‖_P ≤ 2 sup { ‖E[X 1_A]‖ : A ∈ σ(X) }.
Here is an extension theorem for vector measures.
(5.1.10) Theorem. Let (Ω, F, P) be a probability space, and let E be a Banach space. Let A ⊆ F be an algebra of sets, and let G be the σ-algebra generated by A. If µ : A → E is finitely additive and satisfies

(5.1.10a) ‖µ(A)‖ ≤ P(A) for all A ∈ A,

then there is a unique extension of µ to G that is a countably additive vector measure.

Proof. First, if B ∈ G, then for every ε > 0 there is an approximating set A ∈ A with P(A △ B) < ε. (This is true because the set of all B with this property is a σ-algebra containing A.) Now if B ∈ G, we may choose a sequence (A_n) in A with P(A_n △ B) → 0. But then by finite additivity and (5.1.10a),

‖µ(A_m) − µ(A_n)‖ ≤ P(A_m △ A_n) ≤ P(A_m △ B) + P(A_n △ B),

so the sequence (µ(A_n)) converges. Call the limit µ(B). This limit does not depend on the choice of the sequence (A_n), again by (5.1.10a). This extension of µ to G is countably additive, since it satisfies ‖µ(B)‖ ≤ P(B) for all B ∈ G.
The Radon–Nikodym property

A Banach space E has the Radon–Nikodym property if, for every probability space (Ω, F, P) and every measure µ : F → E such that µ is absolutely continuous with respect to P and µ has finite variation on Ω, there is a Bochner integrable random variable X : Ω → E such that

µ(A) = E[X 1_A]

for all A ∈ F. The random variable X is called the Radon–Nikodym derivative of µ with respect to P, and is denoted

X = dµ/dP.

Here is an elementary, but useful, reformulation of the condition.
(5.1.11) Proposition. Let E be a Banach space. Then E has the Radon–Nikodym property if and only if for every probability space (Ω, F, P) and every measure µ : F → E such that ‖µ(A)‖ ≤ P(A) for all A ∈ F, there is a Radon–Nikodym derivative dµ/dP with ‖dµ/dP‖ ≤ 1 a.s.

Proof. If ‖µ(A)‖ ≤ P(A) for all A ∈ F, then µ has variation at most 1. If E has the Radon–Nikodym property, then it satisfies the condition stated here. Conversely, suppose E satisfies the condition. Let (Ω, F, P) be a probability space, and µ : F → E a measure, absolutely continuous with respect to P, with finite variation. We may suppose that µ is not identically 0. Define

P'(A) = |µ|(A) / |µ|(Ω)

for all A ∈ F. Then (Ω, F, P') is a probability space. Define

µ'(A) = µ(A) / |µ|(Ω)

for A ∈ F. Then µ' : F → E is a vector measure, and ‖µ'(A)‖ ≤ P'(A) for all A ∈ F. Thus there is a Bochner integrable random variable X' : Ω → E such that

µ'(A) = E'[X' 1_A]

for all A ∈ F, where E' is the expectation with respect to P'. Now P' is absolutely continuous with respect to P, so there exists a scalar-valued Radon–Nikodym derivative H = dP'/dP. A short calculation shows that the product X = |µ|(Ω) H X' is the required Radon–Nikodym derivative of µ with respect to P.

There are some special cases under which Radon–Nikodym derivatives exist regardless of the Radon–Nikodym property. One of them occurs when the probability space (Ω, F, P) is atomic. The Radon–Nikodym derivative of µ is then

X = Σ_A (µ(A)/P(A)) 1_A,

where the sum is over a maximal disjoint partition of Ω into atoms. Another such special case is the conditional expectation, which we consider next.
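On an atomic space the formula X = Σ_A (µ(A)/P(A)) 1_A can be checked directly against the defining identity µ(A) = E[X 1_A]. A minimal sketch (our own toy example, not from the text: three atoms, E = ℝ² represented as pairs of exact rationals):

```python
from fractions import Fraction as F

# Sketch: on an atomic probability space, dmu/dP = sum over atoms of
# (mu(atom)/P(atom)) 1_atom, regardless of any Radon-Nikodym property.
# Illustrative example: Omega has three atoms; E = R^2 as pairs.

atoms = ["a", "b", "c"]
P  = {"a": F(1, 2), "b": F(1, 3), "c": F(1, 6)}
mu = {"a": (F(1), F(0)), "b": (F(0), F(1)), "c": (F(1), F(1))}

# Radon-Nikodym derivative X: the constant mu(A)/P(A) on each atom A.
X = {w: (mu[w][0] / P[w], mu[w][1] / P[w]) for w in atoms}

def mu_of(S):   # mu(S), by additivity over the atoms in S
    return tuple(sum(mu[w][i] for w in S) for i in (0, 1))

def E_X_1A(S):  # E[X 1_S] = sum over atoms w in S of P(w) X(w)
    return tuple(sum(P[w] * X[w][i] for w in S) for i in (0, 1))

# The defining identity mu(A) = E[X 1_A] holds for every union of atoms.
for S in [["a"], ["b"], ["a", "c"], ["a", "b", "c"]]:
    assert mu_of(S) == E_X_1A(S)
print("mu(A) = E[X 1_A] verified on all tested A")
```

Exact rational arithmetic is used so the identity holds with equality rather than up to rounding.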
(5.1.12) Definition. Let E be a Banach space, let (Ω, F, P) be a probability space, let X : Ω → E be a Bochner integrable random variable, and let G ⊆ F be a σ-algebra. The conditional expectation of X given G is the unique (a.s.) random variable Y : Ω → E, measurable with respect to G, such that

E[Y 1_A] = E[X 1_A]

for all A ∈ G. We write Y = E^G[X] or E[X | G].

(5.1.13) Proposition. Let E be a Banach space, let (Ω, F, P) be a probability space, let X : Ω → E be a Bochner integrable random variable, and let G ⊆ F be a σ-algebra. Then the conditional expectation E^G[X] exists.
Proof. The conditional expectation E^G[X] exists for simple random variables of the form

X = Σ_{j=1}^n 1_{A_j} x_j,

where A_j ∈ F and x_j ∈ E, namely

E^G[X] = Σ_{j=1}^n E^G[1_{A_j}] x_j.

It is not hard to establish the estimate

E[‖E^G[X₁] − E^G[X₂]‖] ≤ E[‖X₁ − X₂‖],

where X₁ and X₂ are simple functions. If X is Bochner integrable, and (X_n)_{n∈ℕ} is a sequence of simple functions that converges to X in the Bochner norm, then the sequence E^G[X_n] is Cauchy in the Bochner norm. The limit of this sequence defines E^G[X].

The conditional expectation is a kind of "average," so the following result is not unexpected.
(5.1.14) Proposition. Let X be a Bochner integrable random variable taking values in a closed convex subset C of the Banach space E. Then (a) for any sub-σ-algebra G of F, the conditional expectation E^G[X] has values in C; and (b) for any A ∈ F with P(A) > 0, we have E[X 1_A]/P(A) ∈ C.
Proof. Since X is Bochner measurable, its values lie a.s. in a separable subspace of E. So we may assume E itself is separable. By Lemma (5.1.5), C is an intersection of countably many closed half-spaces. That is, there exist linear functionals x_i* ∈ E* and scalars a_i such that

(5.1.14a) C = ∩_{i=1}^∞ { x ∈ E : ⟨x, x_i*⟩ ≤ a_i }.

Now X has its values in C, so for any A ∈ F we have

(5.1.14b) ⟨E[X 1_A], x_i*⟩ = E[⟨X, x_i*⟩ 1_A] ≤ E[a_i 1_A] = a_i P(A).

Therefore, by (5.1.14a), the vector E[X 1_A]/P(A) belongs to C. This proves (b). For the proof of (a), write Y = E^G[X]. In (5.1.14b) take A = {ω ∈ Ω : ⟨Y(ω), x_i*⟩ > a_i}. Then

⟨E[Y 1_A], x_i*⟩ = E[⟨Y, x_i*⟩ 1_A] ≥ a_i P(A),

with equality only if P(A) = 0. But A ∈ G, so E[Y 1_A] = E[X 1_A], and by (5.1.14b) equality holds. Thus P{⟨Y, x_i*⟩ > a_i} = 0. The union of countably many events of probability zero still has probability zero, so by (5.1.14a), almost all the values of Y lie in C.

A corollary of this is the vector-valued version of Jensen's inequality. (For a simplified statement, replace "lower semicontinuous" with "continuous." Recall that the function φ is called lower semicontinuous if, for each c, we have φ(c) ≤ lim inf_{x→c} φ(x).)
(5.1.15) Theorem. Let E be a Banach space, (Ω, F, P) a probability space, G ⊆ F a σ-algebra, and X : Ω → E a Bochner integrable random variable. Let C ⊆ E be a closed convex set, and φ : C → ℝ a convex lower semicontinuous function. If X ∈ C a.s. and E[|φ(X)|] < ∞, then

φ(E^G[X]) ≤ E^G[φ(X)] a.s.

Proof. In the Banach space Ê = E ⊕ ℝ, let Ĉ = { (x, t) : x ∈ C, t ≥ φ(x) }. Then Ĉ is a convex set since φ is a convex function, and Ĉ is closed since φ is lower semicontinuous. Define X̂ : Ω → Ê by

X̂(ω) = (X(ω), φ(X(ω))).

Then X̂ is Bochner integrable, with values in Ĉ, so the conditional expectation E[X̂ | G] also has values in Ĉ. But

E^G[X̂] = (E^G[X], E^G[φ(X)]),

so we have E^G[φ(X)] ≥ φ(E^G[X]) a.s.
Jensen's inequality shows clearly that the conditional expectation is a contraction on the vector-valued L^p spaces. If p ≥ 1 then the function φ(x) = ‖x‖^p is convex, so

‖E^G[X]‖^p ≤ E^G[‖X‖^p] a.s.

Integrate both sides and raise to the power 1/p to obtain

E[‖E^G[X]‖^p]^{1/p} ≤ E[‖X‖^p]^{1/p}.

(For the corresponding result in Orlicz spaces, see (5.1.22).)
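The contraction property is easy to test numerically. A sketch on a six-point space (our own example, not from the text: p = 2, E = ℝ² with the Euclidean norm, G generated by a two-cell partition):

```python
from math import hypot

# Numeric check of the contraction E[||E^G[X]||^p]^(1/p)
# <= E[||X||^p]^(1/p) on a small finite probability space.
# Illustrative example only: p = 2, E = R^2, Euclidean norm.

Omega = [0, 1, 2, 3, 4, 5]
p_w = 1.0 / len(Omega)                       # uniform probability
cells = [{0, 1, 2}, {3, 4, 5}]               # partition generating G
X = {0: (1, 0), 1: (0, 1), 2: (-1, 0), 3: (2, 2), 4: (0, -2), 5: (1, 1)}

def cond_exp(w):                             # E^G[X]: average over the cell
    B = next(c for c in cells if w in c)
    n = len(B)
    return tuple(sum(X[v][i] for v in B) / n for i in range(2))

p = 2
lhs = sum(p_w * hypot(*cond_exp(w)) ** p for w in Omega) ** (1 / p)
rhs = sum(p_w * hypot(*X[w]) ** p for w in Omega) ** (1 / p)
print(lhs, rhs)
assert lhs <= rhs + 1e-12
```

Averaging within each cell can only shrink the L^p size, which is exactly what the convexity of ‖·‖^p encodes.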
We include here a few examples that illustrate the Radon–Nikodym property. The Banach space ℓ¹ is the set of all sequences (x₁, x₂, x₃, …) of real numbers with

Σ_{i=1}^∞ |x_i| < ∞.

It is a Banach space when given the norm Σ_i |x_i|. The Banach space c₀ is the set of all sequences (x₁, x₂, x₃, …) of real numbers with

lim_{i→∞} x_i = 0.

It is a Banach space when given the norm max_i |x_i|.

(5.1.16) Proposition. (a) The space ℓ¹ has the Radon–Nikodym property. (b) The space c₀ fails the Radon–Nikodym property.
Proof. (a) Let (Ω, F, P) be a probability space, and let µ : F → ℓ¹ satisfy ‖µ(A)‖ ≤ P(A) for all A ∈ F. When µ(A) is written in terms of its components,

µ(A) = (µ₁(A), µ₂(A), µ₃(A), …),

each µ_i(·) is a measure (since the map that selects the ith component is linear and continuous). Each µ_i is a scalar-valued measure, absolutely continuous with respect to P, so there exists a Radon–Nikodym derivative X_i : Ω → ℝ such that

µ_i(A) = E[X_i 1_A]

for all A ∈ F. Now E[|X_i|] = |µ_i|(Ω), and Σ_{i=1}^∞ |µ_i|(Ω) < ∞, so for almost all ω ∈ Ω, the combined random variable

X(ω) = (X₁(ω), X₂(ω), X₃(ω), …)

has its values in ℓ¹. This combined random variable satisfies

µ(A) = E[X 1_A]

for all A ∈ F.

(b) Let (Ω, F, P) be [0, 1] with Lebesgue measure. Define µ : F → c₀ by

µ(A) = ( ∫_A sin ω dω, ∫_A sin 2ω dω, ∫_A sin 3ω dω, … ).

By the Riemann–Lebesgue lemma, µ(A) ∈ c₀ for all A ∈ F. Also,

‖µ(A)‖ ≤ P(A)
for all A ∈ F. But we claim that µ has no Radon–Nikodym derivative with respect to P. As in part (a), the continuity of the coordinate functionals on c₀ shows that if a Radon–Nikodym derivative X did exist, we would have

X(ω) = (sin ω, sin 2ω, sin 3ω, …)

for almost all ω ∈ [0, 1]. But this X(ω) is in c₀ for (almost) no ω. This shows that c₀ fails the Radon–Nikodym property.

Complements
(5.1.17) (Banach space L₁.) The space L₁ = L₁([0, 1]) fails the Radon–Nikodym property. This can be seen using the probability space (Ω, F, P) = [0, 1] with Lebesgue measure and the vector measure µ : F → L₁ defined by

µ(A) = 1_A for all A ∈ F.
(5.1.18) (Pettis integral.) There is another kind of integral sometimes used for random variables with values in a Banach space. Suppose the random variable X is scalarly integrable. We say that X is Pettis integrable on the set A ∈ F if there is a vector x_A ∈ E satisfying

x*(x_A) = E[x*(X) 1_A]

for all x* ∈ E*. We say that X is Pettis integrable if X is Pettis integrable on each set in F. The vector x_A is called the Pettis integral of X on A, and we will usually write it using the same notation as the Bochner integral:
x_A = E[X 1_A].

(5.1.19) (Pettis vs. Bochner.) Let X be a Bochner integrable random variable. Then X is also Pettis integrable, and the two integrals agree.

(5.1.20) (Pettis vs. Bochner.) Define a random variable X : [0, 1] → ℓ² as follows. Let e_n be an orthonormal sequence in ℓ², and let A_n be disjoint sets in [0, 1] with Lebesgue measures P(A_n) = 2^{−n} for n = 1, 2, 3, …. Define

X(ω) = Σ_{n=1}^∞ (2^n/n) 1_{A_n}(ω) e_n.

Then X is Pettis integrable but not Bochner integrable.

(5.1.21) (Variant definition.) The Banach space E has the Radon–Nikodym property if and only if for every probability space (Ω, F, P) and every vector measure µ : F → E with σ-finite variation, absolutely continuous with respect to P, there is a Pettis integrable random variable X such that

µ(A) = E[X 1_A]

for all A ∈ F.
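The two computations behind the example in (5.1.20) can be checked numerically; we take the coefficients to be 2^n/n (an assumption, as the exact constants in the text are illegible). The harmonic series makes the Bochner norm infinite, while the coordinates of the candidate Pettis integral are square-summable:

```python
import math

# With X = sum_n (2^n/n) 1_{A_n} e_n and P(A_n) = 2^(-n) (coefficients
# 2^n/n assumed): E[||X||] = sum_n (2^n/n) 2^(-n) = sum_n 1/n diverges,
# so X is not Bochner integrable, while the candidate Pettis integral
# over [0, 1] has n-th coordinate 1/n, which is square-summable (l^2).

N = 100000
bochner_partial = sum(1.0 / n for n in range(1, N + 1))    # ~ log N
pettis_sq = sum((1.0 / n) ** 2 for n in range(1, N + 1))   # converges

print(bochner_partial)   # grows without bound as N increases
print(pettis_sq)         # approaches pi^2/6

assert bochner_partial > 10                      # diverging partial sum
assert abs(pettis_sq - math.pi ** 2 / 6) < 1e-4  # convergent l^2 norm^2
```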
(5.1.22) (Vector-valued Orlicz spaces.) Let Φ be an Orlicz function. For E-valued random variables, let

‖X‖_Φ = inf { a > 0 : E[Φ(‖X‖/a)] ≤ 1 }.

The set L_Φ(Ω, F, P; E) of all (equivalence classes of) random variables X with ‖X‖_Φ < ∞ is a Banach space with norm ‖X‖_Φ.

(5.1.23) (Orlicz norm and variation.) If µ : F → E is a vector measure, then the Φ-variation of µ with respect to P is

sup Σ_i Φ( ‖µ(A_i)‖ / P(A_i) ) P(A_i),

where the supremum is over all finite disjoint sequences (A₁, A₂, …, A_n) in F. If X is a Bochner integrable random variable, and µ(A) = E[X 1_A] for all A, then the Φ-variation of µ is equal to the Orlicz modular E[Φ(‖X‖)] of X.
Remarks
Additional material covering measurable subsets of a Banach space is in Edgar [1979b] and Talagrand [1984]. A more thorough discussion of vector-valued measures and the Bochner integral can be found in Diestel & Uhl [1977] or Bourgin [1983]. These two books also contain much more material on the geometry of Banach spaces with the Radon–Nikodym property.

The proof [Proposition (5.1.16(a))] that ℓ¹ has the Radon–Nikodym property actually shows that any Banach space with a boundedly complete basis has the Radon–Nikodym property. (For the definition, see for example Lindenstrauss & Tzafriri [1968], page 13.) This fact is a special case of the result proved below (5.3.32) that a separable dual Banach space has the Radon–Nikodym property.
5.2. Martingales and amarts

In this section we begin the discussion of martingales and related processes with values in a Banach space. Then we discuss some difference properties and Riesz decompositions for vector-valued processes. Convergence theorems will be treated in the next section, because of the close connection with the Radon–Nikodym property.

Let (Ω, F, P) be a probability space, let (F_n)_{n∈ℕ} be a stochastic basis on Ω, and let Σ be the set of all simple stopping times for (F_n)_{n∈ℕ}. Let E be a Banach space, and let (X_n)_{n∈ℕ} be a sequence of Bochner integrable random variables with values in E, adapted to (F_n)_{n∈ℕ}. These data will be fixed throughout Section 5.2.
Elementary properties
(5.2.1) Definition. The E-valued adapted process (X_n)_{n∈ℕ} is a martingale if the net (E[X_σ])_{σ∈Σ} is constant.

One shows (applying a linear functional, then using the scalar case (1.4.3)) that (X_n)_{n∈ℕ} is a martingale if and only if

E^{F_m}[X_n] = X_m

for m ≤ n. Also, if (X_n)_{n∈ℕ} is a martingale, and σ ≤ τ in Σ, then E^{F_σ}[X_τ] = X_σ. The optional sampling theorem for martingales follows from this.
(5.2.2) Optional sampling theorem. Suppose (X_n)_{n∈ℕ} is a martingale with respect to the stochastic basis (F_n)_{n∈ℕ}. Let τ₁ ≤ τ₂ ≤ ⋯ be an increasing sequence in Σ. Then the process Y_k = X_{τ_k} is a martingale with respect to the stochastic basis (G_k)_{k∈ℕ} defined by G_k = F_{τ_k}.

There are several different vector-valued analogs of the scalar-valued amarts. One that inherits most of their properties is the "uniform amart."
(5.2.3) Definition. The process (X_n)_{n∈ℕ} is a uniform amart if the following difference property is satisfied: for every ε > 0, there is m₀ ∈ ℕ such that for all σ, τ ∈ Σ with m₀ ≤ σ ≤ τ,

E[‖E^{F_σ}[X_τ] − X_σ‖] < ε.

It should be noted that (unlike the scalar case (1.4.21)) this difference property is not equivalent to the one in which σ is replaced by an integer m (5.2.35).
(5.2.4) Definition. The process (X_n)_{n∈ℕ} is a quasimartingale if

Σ_{n=1}^∞ E[‖E^{F_n}[X_{n+1}] − X_n‖] < ∞.

Clearly, every martingale is a quasimartingale. It can be proved (as in (1.4.4)) that every quasimartingale is a uniform amart.

(5.2.5) Definition. The process (X_n)_{n∈ℕ} is an amart iff the net (E[X_σ])_{σ∈Σ} converges according to the norm of E.

Since the norm on E defines a metric topology, we know by the sequential sufficiency theorem (1.1.3) that (X_n)_{n∈ℕ} is an amart if, for every sequence (σ_n) in Σ converging to ∞, the sequence (E[X_{σ_n}]) converges in E. Amarts can also be characterized by:
(5.2.6) Pettis norm difference property. Let (X_n)_{n∈ℕ} be a process with values in the Banach space E. Then the following are equivalent:
(1) (X_n) is an amart.
(2) E^{F_σ}[X_τ] − X_σ → 0 in Pettis norm; that is, for every ε > 0, there is m₀ ∈ ℕ such that

‖E^{F_σ}[X_τ] − X_σ‖_P < ε

for all σ, τ ∈ Σ with m₀ ≤ σ ≤ τ.

Proof. Suppose m₀ ≤ σ ≤ τ. Then

‖E[X_τ] − E[X_σ]‖ = ‖E[E^{F_σ}[X_τ] − X_σ]‖ = sup { x*(E[E^{F_σ}[X_τ] − X_σ]) : x* ∈ E*, ‖x*‖ ≤ 1 } ≤ ‖E^{F_σ}[X_τ] − X_σ‖_P.

Thus, if the difference property (2) holds, the net E[X_σ] is Cauchy, and therefore convergent.

For the converse, again let m₀ ≤ σ ≤ τ. Note that if A ∈ F_σ, then (as in (1.4.5)) there are stopping times σ₁, τ₁ ≥ m₀ with

E[X_{τ₁} − X_{σ₁}] = E[(X_τ − X_σ) 1_A] = E[(E^{F_σ}[X_τ] − X_σ) 1_A].

By (5.1.9a) the Pettis norm satisfies

‖E^{F_σ}[X_τ] − X_σ‖_P ≤ 2 sup_{A∈F_σ} ‖E[(E^{F_σ}[X_τ] − X_σ) 1_A]‖,

so if (X_n) is an amart, this Pettis norm converges to 0.
(5.2.7) Corollary. Every uniform amart is an amart.

Proof. The Pettis norm is dominated by the Bochner norm.

The converse of this corollary is not true in general. In fact (5.5.11), only finite-dimensional spaces E have the property that amarts and uniform amarts coincide. The optional sampling theorem for amarts is proved as in the real-valued case.
(5.2.8) Optional sampling theorem. Let (X_n)_{n∈ℕ} be an amart in E adapted to (F_n)_{n∈ℕ}. If σ₁ ≤ σ₂ ≤ ⋯ is an increasing sequence in Σ, then the process Y_k = X_{σ_k} is an amart adapted to the stochastic basis (G_k) defined by G_k = F_{σ_k}.
(5.2.9) Restriction theorem. Let (X_n)_{n∈ℕ} be an amart in E, and let A ∈ F_m. Then the process (X_n 1_A)_{n=m}^∞ is also an amart in E. In particular, the limit

lim_{n→∞} E[X_n 1_A]

exists.

Proof. Let (X_n) be an amart, and A ∈ F_m. Given σ, τ ≥ m, choose n ≥ σ, τ and define

σ'(ω) = σ(ω) if ω ∈ A, and σ'(ω) = n otherwise;
τ'(ω) = τ(ω) if ω ∈ A, and τ'(ω) = n otherwise.

Then σ', τ' ∈ Σ and E[1_A X_σ] − E[1_A X_τ] = E[X_{σ'}] − E[X_{τ'}]. Therefore (E[1_A X_σ])_{σ∈Σ} is a Cauchy net. So (1_A X_n) is an amart.
(5.2.10) Definition. The process (X_n)_{n∈ℕ} is a weak amart if the net (E[X_σ])_{σ∈Σ} converges in the weak topology of E; and (X_n)_{n∈ℕ} is a weak sequential amart if, for every increasing sequence (σ_n) in Σ, the sequence (E[X_{σ_n}]) converges weakly in E.

The weak topology of a Banach space is not metrizable in general, so it is not true in general that a weak amart is a weak sequential amart. The optional sampling theorem and the restriction theorem may fail for weak amarts, but they both remain correct for weak sequential amarts. The optional sampling theorem is immediate from the definition:

(5.2.11) Optional sampling theorem. Let (X_n)_{n∈ℕ} be a weak sequential amart in E adapted to (F_n)_{n∈ℕ}. If σ₁ ≤ σ₂ ≤ ⋯ is an increasing sequence in Σ, then the process Y_k = X_{σ_k} is a weak sequential amart adapted to the stochastic basis (G_k) defined by G_k = F_{σ_k}.

The restriction theorem requires a proof:

(5.2.12) Restriction theorem. Let (X_n)_{n∈ℕ} be a weak sequential amart in E, and let A ∈ F_m. Then the process (X_n 1_A)_{n=m}^∞ is also a weak sequential amart in E. In particular, the limit

lim_{n→∞} E[X_n 1_A]

exists in the weak topology of E.
Proof. Let (σ_n) be an increasing sequence in Σ. Write σ = lim_n σ_n. Then σ is a (possibly infinite) stopping time. Now for each n, let

τ_n(ω) = σ_n(ω) ∧ m if ω ∈ A, and τ_n(ω) = σ_n(ω) otherwise.

Then (τ_n) is an increasing sequence of bounded stopping times, and lim_n τ_n = σ ∧ m on A. Then

E[X_{σ_n} 1_A] = E[X_{σ_n}] − E[X_{τ_n}] + E[X_{σ∧m} 1_A] + E[(X_{τ_n} − X_{σ∧m}) 1_A].

The first two terms on the right converge weakly, and the third term is constant, so all that remains is the proof that the last term converges weakly (to zero). Now on A, both τ_n and σ ∧ m take values between 1 and m, so

‖(X_{τ_n} − X_{σ∧m}) 1_A‖ ≤ 2 max_{1≤k≤m} ‖X_k‖,

which is an integrable function. But (X_{τ_n} − X_{σ∧m}) 1_A converges pointwise a.s. to 0, so the expectation converges in norm to 0.

Uniform amart Riesz decomposition

We next take up the Riesz decomposition for uniform amarts.
(5.2.13) Riesz decomposition. Let (X_n)_{n∈ℕ} be a uniform amart. Then there is a unique decomposition X_n = Y_n + Z_n, where (Y_n) is a martingale and (Z_n) converges to 0 a.s. and in Bochner norm; in fact lim_{σ∈Σ} E[‖Z_σ‖] = 0.

Proof. Fix n ∈ ℕ. Then we have for n ≤ σ ≤ τ

E[‖E^{F_n}[X_τ] − E^{F_n}[X_σ]‖] ≤ E[‖E^{F_σ}[X_τ] − X_σ‖],

so the net (E^{F_n}[X_σ])_{σ∈Σ} is Cauchy in L¹(Ω, F, P; E). We will write Y_n for its limit. Now for m ≤ n, we have

E[‖E^{F_m}[Y_n] − Y_m‖] = lim_{ρ∈Σ} E[‖E^{F_m}[E^{F_n}[X_ρ]] − E^{F_m}[X_ρ]‖] = 0.

So (Y_n)_{n∈ℕ} is a martingale. Let Z_n = X_n − Y_n. For σ ≤ τ in Σ we have

E^{F_σ}[Z_τ] = E^{F_σ}[X_τ − Y_τ] = E^{F_σ}[X_τ] − Y_σ.

Thus

lim_{τ∈Σ} E[‖E^{F_σ}[Z_τ]‖] = 0.

But

‖Z_σ‖ ≤ ‖Z_σ − E^{F_σ}[Z_τ]‖ + ‖E^{F_σ}[Z_τ]‖,

so

lim_{σ∈Σ} E[‖Z_σ‖] = 0.

Then by (1.4.22), we have ‖Z_n‖ → 0 a.s.

This proposition enables us to reduce much of the theory of uniform amarts to the martingale case.
(5.2.14) Lemma. Let C be a closed convex subset of a Banach space E. Let (X_n)_{n∈ℕ} be a uniform amart with values in C. Let X_n = Y_n + Z_n be its Riesz decomposition. Then Y_n has its values a.s. in C.

Proof. Now X_n has its values in C, and C is closed and convex, so by (5.1.14) E^{F_m}[X_n] has its values in C. But Y_m is the L¹ limit, hence the pointwise a.s. limit of a subsequence, of such conditional expectations, so Y_m has its values a.s. in C.
The convergence theorems that will be proved in this chapter utilize certain boundedness conditions. Many of them are essentially the same as the definitions used in the scalar-valued case.

(5.2.15) Definition. The set R of E-valued random variables is said to be L¹-bounded if it is bounded in the Bochner norm:

sup_{X∈R} E[‖X‖] < ∞.

A process (X_n)_{n∈ℕ} is accordingly said to be L¹-bounded if sup_n E[‖X_n‖] < ∞.

The set R is said to be Pettis bounded if

sup_{X∈R} ‖X‖_P < ∞.

The set R is said to be uniformly integrable if for every ε > 0 there exists λ > 0 so that for all X ∈ R,

E[‖X‖ 1_{{‖X‖>λ}}] < ε;

or, if the probability space is atomless, equivalently [see Proposition (2.3.2(3))]: for every ε > 0 there exists δ > 0 so that for all X ∈ R and all A ∈ F with P(A) < δ, E[‖X‖ 1_A] < ε.

The set R is said to be uniformly absolutely continuous if, for every ε > 0 there exists δ > 0 so that for all X ∈ R and all A ∈ F with P(A) < δ,

‖E[X 1_A]‖ < ε.

The process (X_n)_{n∈ℕ} satisfies condition (B), or is of class (B), if

sup_{σ∈Σ} E[‖X_σ‖] < ∞.
(5.2.16) Proposition. Let R be a set of Pettis integrable E-valued random variables. Then R is uniformly absolutely continuous if and only if the set

R₁ = { ⟨X, x*⟩ : X ∈ R, x* ∈ E*, ‖x*‖ ≤ 1 }

of scalar-valued random variables is uniformly absolutely continuous.

Proof. Suppose R is uniformly absolutely continuous. Let ε > 0. There is δ such that ‖E[X 1_A]‖ < ε if P(A) < δ. Let x* ∈ E* with ‖x*‖ ≤ 1. Then for A with P(A) < δ, the two sets A₁ = A ∩ {⟨X, x*⟩ ≥ 0} and A₂ = A ∩ {⟨X, x*⟩ < 0} also have probability less than δ. So

E[|⟨X, x*⟩| 1_A] = |E[⟨X, x*⟩ 1_{A₁}]| + |E[⟨X, x*⟩ 1_{A₂}]| < 2ε.

This shows that the set R₁ is uniformly absolutely continuous.

Conversely, suppose that R₁ is uniformly absolutely continuous. Let ε > 0 be given. There is δ > 0 so that E[|⟨X, x*⟩| 1_A] < ε if P(A) < δ and ‖x*‖ ≤ 1. Then, for all x* ∈ E* with ‖x*‖ ≤ 1,

|⟨E[X 1_A], x*⟩| ≤ E[|⟨X, x*⟩| 1_A] < ε,

so that ‖E[X 1_A]‖ ≤ ε.

This proposition means that many of the properties of uniform absolute continuity in the scalar case carry over to the vector-valued case. For example: if (X_n) is uniformly integrable, then it is bounded in Bochner norm; if (X_n) is uniformly absolutely continuous, then it is bounded in Pettis norm; if (X_n) is uniformly absolutely continuous, and G ⊆ F is a σ-algebra, then (E^G[X_n]) is also uniformly absolutely continuous.
(5.2.17) Proposition. Let (X_n) be a uniformly absolutely continuous amart. Then

µ(A) = lim_{n→∞} E[X_n 1_A]

exists for all A ∈ F, and defines a vector measure µ : F → E.

Proof. By (5.2.9), µ(A) exists for A ∈ ∪ F_n. Let A ∈ F_∞ = σ(∪ F_n). Let ε > 0. There is δ > 0 so that P(D) < δ implies ‖E[1_D X_n]‖ < ε for all n. Then there is k₁ ∈ ℕ and B ∈ F_{k₁} with P(A △ B) < δ. There is k₂ ≥ k₁ so that ‖E[1_B X_n] − E[1_B X_m]‖ < ε for all n, m ≥ k₂. Thus

‖E[1_A X_n] − E[1_A X_m]‖ ≤ ‖E[1_B X_n] − E[1_B X_m]‖ + ‖E[1_{A\B} X_n]‖ + ‖E[1_{A\B} X_m]‖ + ‖E[1_{B\A} X_n]‖ + ‖E[1_{B\A} X_m]‖ < 5ε.

This shows that lim_{n→∞} E[1_A X_n] exists.
Next we claim that if Y ∈ L_∞(Ω, F_∞, P), the limit lim_n E[Y X_n] exists. To see this, note that (X_n) is bounded in Pettis norm, say ‖X_n‖_P ≤ a for all n. For ε > 0, there is a simple function Y' ∈ L_∞(F_∞) with |Y − Y'| ≤ ε/a everywhere. But by the preceding, lim_n E[Y' X_n] exists. So there is k ∈ ℕ so that ‖E[Y' X_n] − E[Y' X_m]‖ < ε for n, m ≥ k. But

‖E[Y X_n] − E[Y' X_n]‖ ≤ (ε/a) · a = ε.

Then we have for n, m ≥ k

‖E[Y X_n] − E[Y X_m]‖ < 3ε,

so lim_n E[Y X_n] exists.

Now consider A ∈ F. Then

E[1_A X_n] = E[E^{F_∞}[1_A] X_n],

so it converges as n → ∞. Let µ(A) be the limit. We must show that µ is countably additive. Suppose A_k ↓ ∅. Given ε > 0, there is δ > 0 so that P(D) < δ implies ‖E[1_D X_n]‖ < ε. Now P(A_k) → 0, so P(A_k) < δ for k large enough, and thus ‖E[1_{A_k} X_n]‖ < ε for all n. Therefore ‖µ(A_k)‖ ≤ ε. This shows lim_k ‖µ(A_k)‖ = 0. Thus µ is countably additive.

The preceding result is also true for weak sequential amarts. Next we consider condition (B), namely sup_{σ∈Σ} E[‖X_σ‖] < ∞. It was not necessary to emphasize condition (B) in the scalar case, because an L¹-bounded scalar amart automatically satisfies condition (B). This is no longer true for vector-valued amarts (5.5.29). However it does remain true for uniform amarts.
(5.2.18) Proposition. A uniform amart is L¹-bounded if and only if it satisfies condition (B).

Proof. By the Riesz decomposition (5.2.13), it is enough to prove the result when (X_n) is a martingale. But then (‖X_n‖)_{n∈ℕ} is a submartingale. For every σ in Σ there is an n ∈ ℕ with σ ≤ n, and thus E[‖X_σ‖] ≤ E[‖X_n‖]. Therefore

sup_{σ∈Σ} E[‖X_σ‖] = sup_{n∈ℕ} E[‖X_n‖].

The following is obtained by applying (1.1.7) to the real process (‖X_n‖).
(5.2.19) Maximal inequality. Let (X_n)_{n∈ℕ} be an adapted sequence in the Banach space E, and let λ be a positive real number. Then

P{ sup_n ‖X_n‖ ≥ λ } ≤ (1/λ) sup_{σ∈Σ} E[‖X_σ‖].

This maximal inequality is a good illustration of the usefulness of condition (B). Another illustration of its usefulness is the next stopping result.
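The maximal inequality can be verified exhaustively on a small example. A sketch (our own example, not from the text) for a ±1 random walk of length 4 in E = ℝ; since the walk is a martingale, (‖X_n‖) is a submartingale, and as in (5.2.18) the supremum over stopping times is attained at the final time:

```python
from itertools import product
from fractions import Fraction as F

# Exhaustive check of P{max ||X_n|| >= lam} <= sup_sigma E[||X_sigma||]/lam
# for a +-1 random walk of length 4 (a martingale in E = R).  For a
# martingale, sup over stopping times equals E[|X_4|] (cf. (5.2.18)).

lam = 2
paths = list(product([-1, 1], repeat=4))     # 16 equally likely paths
p = F(1, len(paths))

def walk(steps):
    x, out = 0, []
    for s in steps:
        x += s
        out.append(x)
    return out                               # partial sums X_1, ..., X_4

lhs = sum(p for steps in paths if max(abs(x) for x in walk(steps)) >= lam)
bound = sum(p * abs(walk(steps)[-1]) for steps in paths) / lam

print(lhs, bound)
assert lhs <= bound
```

In this particular example the two sides are actually equal, so the constant in the inequality cannot be improved here.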
(5.2.20) Definition. Let (X_n)_{n∈ℕ} be a stochastic process with values in a Banach space E, and let C be a subset of E. We say that the process (X_n) stops outside C if X_n(ω) ∉ C implies X_n(ω) = X_{n+1}(ω).

(5.2.21) Definition. Let (X_n)_{n∈ℕ} be a process adapted to the stochastic basis (F_n), with values in a Banach space E, and let C be a Borel subset of E. The first entrance time of (X_n) in E \ C is the (possibly infinite) stopping time σ defined as follows. Let D = {X_n ∈ C for all n}. Then

σ(ω) = inf { n ∈ ℕ : X_n(ω) ∉ C } for ω ∉ D;  σ(ω) = ∞ for ω ∈ D.

The process (X_n) stopped outside C is the process (Y_n) defined by Y_n = X_{n∧σ}.
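For a scalar path and C a closed ball, the first entrance time and the stopped process can be computed directly; the sketch below (names are illustrative) implements (5.2.21) on a finite path, with the path length standing in for σ = ∞:

```python
def stop_outside(path, radius):
    """First entrance time and stopped process of (5.2.21) for a scalar path
    and C = [-radius, radius]; len(path) stands in for sigma = infinity."""
    sigma = next((n for n, x in enumerate(path) if abs(x) > radius), len(path))
    stopped = [path[min(n, sigma)] for n in range(len(path))]  # Y_n = X_{n ^ sigma}
    return sigma, stopped
```

On the path [0, 1, 2, 3, 2] with C = [−2, 2], the process leaves C at time 3 and Y_n is frozen at that value from then on; a path that stays in C is left unchanged and σ = ∞.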
(5.2.22) Proposition. Let C be a closed bounded subset of the Banach space E. Suppose the process (X_n)_{n∈ℕ} in E satisfies condition (B). Let (Y_n) be the process (X_n) stopped outside C. Then

E[ sup_n ‖Y_n‖ ] < ∞.

Proof. Let D and σ be as in the definition. Outside the set D, we have Y_n → X_σ, so by Fatou's lemma,

E[‖X_σ‖ 1_{Ω\D}] ≤ liminf_n E[‖X_{n∧σ}‖ 1_{Ω\D}] ≤ liminf_n E[‖X_{n∧σ}‖] ≤ sup_{τ∈Σ} E[‖X_τ‖] < ∞.

Thus, if λ = sup_{x∈C} ‖x‖, then (since Y_n ∈ C for n < σ and Y_n = X_σ thereafter)

E[ sup_n ‖Y_n‖ ] ≤ E[‖X_σ‖ 1_{Ω\D}] + λ < ∞.
(5.2.23) Corollary. Let C be a closed bounded subset of the Banach space E. Suppose (X_n)_{n∈ℕ} is an L₁-bounded uniform amart. Let (Y_n) be the process (X_n) stopped outside C. Then

E[ sup_n ‖Y_n‖ ] < ∞.

Proof. By (5.2.18), an L₁-bounded uniform amart satisfies condition (B).
Banachvalued random variables
192
(5.2.24) Corollary. An L₁-bounded uniform amart (X_n) that stops outside a bounded set satisfies E[ sup_n ‖X_n‖ ] < ∞.

Proof. In the previous Corollary, note that Y_n = X_n.

Amart Riesz decomposition
We next consider the Riesz decomposition for amarts. We begin with some preliminaries.
(5.2.25) Definition. An amart potential is an amart (X_n)_{n∈ℕ} such that

lim_{n→∞} ‖E[X_n 1_A]‖ = 0

for all A ∈ ∪_{m∈ℕ} F_m.

(5.2.26) Lemma. Let (X_n)_{n∈ℕ} be an amart potential in a Banach space. Then

lim_{σ∈Σ} ‖X_σ‖_P = 0.
Proof. Let ε > 0 be given. There exists n₀ ∈ ℕ so that

‖E[X_n]‖ < ε for n ≥ n₀;  ‖E[X_σ − X_{σ'}]‖ < ε for σ, σ' ∈ Σ with σ, σ' ≥ n₀.

Fix m ≥ n₀. Choose D ∈ F_m so that

‖E[X_m 1_D]‖ ≥ sup_{A∈F_m} ‖E[X_m 1_A]‖ − ε.

Then choose n ≥ m so that ‖E[X_n 1_{Ω\D}]‖ < ε. Define τ ∈ Σ by

τ = m on D;  τ = n on Ω \ D.

Now ‖E[X_τ]‖ ≤ ‖E[X_τ − X_m]‖ + ‖E[X_m]‖ < 2ε, so

‖E[X_m 1_D]‖ = ‖E[X_τ] − E[X_n 1_{Ω\D}]‖ ≤ ‖E[X_τ − X_m]‖ + ‖E[X_m]‖ + ‖E[X_n 1_{Ω\D}]‖ < 3ε.
Therefore

sup_{A∈F_m} ‖E[X_m 1_A]‖ < 4ε

for any m ≥ n₀. Now if σ ∈ Σ, σ ≥ n₀, and A ∈ F_σ, choose m ≥ σ and let

σ' = m on A;  σ' = σ on Ω \ A,

so that σ' ∈ Σ and σ' ≥ n₀. Then

‖E[X_σ 1_A]‖ ≤ ‖E[X_m 1_A]‖ + ‖E[X_σ − X_{σ'}]‖ < 4ε + ε = 5ε.

Thus we get ‖X_σ‖_P ≤ 2 sup_{A∈F_σ} ‖E[X_σ 1_A]‖ ≤ 10ε.
The goal of the Riesz decomposition is to decompose an amart (X_n)_{n∈ℕ} into a martingale and an amart potential. As in the case of the uniform amart, the martingale (Y_m) should be obtained as a limit

Y_m = lim_{n→∞} E^{F_m}[X_n].

The existence of the limit presents more of a problem this time. The sequence (E^{F_m}[X_n])_{n∈ℕ} is clearly Cauchy in the Pettis norm (by the difference property (5.2.6)), but that alone does not guarantee convergence, since the Pettis norm is usually not complete. So some care must be taken in the following proof.
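On a finite coin space the limit Y_m = lim_n E^{F_m}[X_n] can be computed exactly, since conditional expectation given F_m is a plain average over the unrevealed coins. The sketch below is our illustration, not from the text: the amart X_n = S_n + 2⁻ⁿ (a martingale plus a deterministic potential) has martingale part S_m, which the limit recovers:

```python
from itertools import product

N = 10
omegas = list(product((-1, 1), repeat=N))   # finite Omega, uniform P

def X(n, w):
    """Amart X_n = S_n + 2^{-n}: martingale part S_n plus a potential part."""
    return sum(w[:n]) + 0.5 ** n

def cond_exp(n, m, w):
    """Exact E^{F_m}[X_n](w): average X_n over all continuations of w[:m]."""
    vals = [X(n, w[:m] + t) for t in product((-1, 1), repeat=N - m)]
    return sum(vals) / len(vals)

# Y_m = lim_n E^{F_m}[X_n] recovers the martingale part S_m:
w = omegas[123]
approx = [cond_exp(n, 3, w) for n in (4, 6, 10)]   # --> S_3(w) + 2^{-n}
```

Here E^{F_3}[X_n] equals S_3 + 2⁻ⁿ exactly, so the sequence decreases to the martingale value S_3 as n grows.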
(5.2.27) Vector Riesz decomposition. Let E be a Banach space with the Radon-Nikodym property, and let (X_n)_{n∈ℕ} be an L₁-bounded amart in E. Then (X_n) can be written uniquely as X_n = Y_n + Z_n, where (Y_n) is a martingale and (Z_n) is an amart potential. Furthermore,

E[‖Y_n‖] ≤ sup_{n∈ℕ} E[‖X_n‖]  and  lim_{σ∈Σ} ‖Z_σ‖_P = 0.

Proof. The uniqueness is easy: if X_n = Y_n + Z_n = Y'_n + Z'_n, where (Y_n) and (Y'_n) are martingales and (Z_n) and (Z'_n) are amart potentials, then the difference Y_n − Y'_n = Z'_n − Z_n is both a martingale and an amart potential. Clearly this difference is identically 0.

Fix an integer n₀. By (5.2.9), the limit

μ(A) = lim_{σ∈Σ} E[X_σ 1_A]
exists for all A ∈ F_{n₀}. Clearly it is absolutely continuous with respect to P. We claim that μ has bounded variation on F_{n₀}. Let A₁, A₂, …, A_k ∈ F_{n₀} be disjoint, and let M = liminf_{n∈ℕ} E[‖X_n‖]. Given ε > 0, choose m ∈ ℕ such that

‖E[X_m 1_{A_i}] − μ(A_i)‖ < ε/k for i = 1, 2, …, k

and

E[‖X_m‖] ≤ M + ε.

Then

Σ_{i=1}^k ‖μ(A_i)‖ ≤ Σ_{i=1}^k ‖μ(A_i) − E[X_m 1_{A_i}]‖ + Σ_{i=1}^k ‖E[X_m 1_{A_i}]‖
≤ ε + E[ ‖X_m‖ Σ_{i=1}^k 1_{A_i} ]
≤ ε + E[‖X_m‖]
≤ M + 2ε.

Therefore the variation of μ on Ω is at most M < ∞.

Now, by the Radon-Nikodym property, there is a random variable Y_{n₀} ∈ L₁(Ω, F_{n₀}, P; E) with

μ(A) = E[Y_{n₀} 1_A] for all A ∈ F_{n₀}.

This can be done for each n₀ ∈ ℕ, to produce a process (Y_n). But this process clearly satisfies E^{F_m}[Y_n] = Y_m for m ≤ n, so it is a martingale. Also, E[‖Y_n‖] ≤ |μ|(Ω) ≤ M.
Let Z_n = X_n − Y_n. Then

lim_n ‖E[Z_n 1_A]‖ = 0

for all A ∈ ∪_{n∈ℕ} F_n, so that (Z_n) is an amart potential.
It should be noted that L₁-boundedness may be replaced by the weaker condition liminf_{n→∞} E[‖X_n‖] < ∞. Then if M = liminf E[‖X_n‖], we have

(5.2.27a)  E[‖Y_n‖] ≤ M.
It should also be noted that the Riesz decomposition preserves many important properties.
(5.2.28) Proposition. Let X_n = Y_n + Z_n be the Riesz decomposition of the L₁-bounded amart (X_n).
(1) If Φ is an Orlicz function, then sup_n ‖Y_n‖_Φ ≤ sup_n ‖X_n‖_Φ; so if (X_n) is L_Φ-bounded, then so are (Y_n) and (Z_n).
(2) If (X_n) is uniformly integrable, then so are (Y_n) and (Z_n).
(3) If (X_n) is uniformly absolutely continuous, then so are (Y_n) and (Z_n).
(4) If (X_n) satisfies condition (B), then so do (Y_n) and (Z_n).

Proof. Note first: E[Y_m 1_A] = lim_n E[X_n 1_A] for all A ∈ F_m.

(1) Let M = sup_n ‖X_n‖_Φ. If M = ∞, then the inequality is trivial, so suppose M < ∞. Fix m. We will compute the L_Φ norm of Y_m using the Φ-variation of the indefinite integral (5.1.23). Let (A_i) be a finite disjoint sequence of sets in F_m. Then we have (since Φ is convex)

Σ_i Φ( ‖E[Y_m 1_{A_i}]‖ / (M P(A_i)) ) P(A_i) = lim_n Σ_i Φ( ‖E[X_n 1_{A_i}]‖ / (M P(A_i)) ) P(A_i) ≤ liminf_n E[ Φ(‖X_n‖/M) ] ≤ 1.

Therefore E[Φ(‖Y_m‖/M)] ≤ 1, so ‖Y_m‖_Φ ≤ M.

(2) If (X_n) is uniformly integrable, then the sequence (‖X_n‖) of scalar random variables is also uniformly integrable. By (2.3.5), there is a finite Orlicz function Φ with Φ(u)/u → ∞ such that (‖X_n‖) is L_Φ-bounded. By (1), we see that (‖Y_n‖) is also L_Φ-bounded, so (Y_n) is uniformly integrable; hence so is (Z_n) = (X_n − Y_n).

(3) If (X_n) is uniformly absolutely continuous, then the collection

{ ⟨X_n, x*⟩ : n ∈ ℕ, x* ∈ E*, ‖x*‖ ≤ 1 }

of scalar random variables is uniformly absolutely continuous. By the uniqueness of the scalar Riesz decomposition (1.4.22), we see that for each x*, the Riesz decomposition of ⟨X_n, x*⟩ is ⟨X_n, x*⟩ = ⟨Y_n, x*⟩ + ⟨Z_n, x*⟩. Thus { ⟨Y_n, x*⟩ : n ∈ ℕ, x* ∈ E*, ‖x*‖ ≤ 1 } is uniformly absolutely continuous, and (Y_n) is uniformly absolutely continuous; hence so is (Z_n).

(4) If (X_n) satisfies condition (B), then it is L₁-bounded. So (Y_n) is L₁-bounded and, being a martingale, satisfies condition (B) by (5.2.18). Finally, Z_n = X_n − Y_n satisfies condition (B).
Complements
(5.2.29) (Counterexample: the Radon-Nikodym property cannot be omitted in the amart Riesz decomposition theorem (5.2.27).) Let E = c₀. Let e_{n,i}, n ∈ ℕ, 1 ≤ i ≤ 2ⁿ, be the standard basis in some order. For each n, let measurable sets A_{n,i}, 1 ≤ i ≤ 2ⁿ, be disjoint with P(A_{n,i}) = 2⁻ⁿ. Let

X_n = Σ_{k=1}^{n} Σ_{i=1}^{2^k} e_{k,i} 1_{A_{k,i}}.

Then the Riesz decomposition fails to exist. Indeed, since each coordinate projection is continuous, the Riesz decomposition must agree with the scalar Riesz decomposition in each coordinate. The values of the martingale part (Y_m) lie in E** = l_∞, not in E = c₀.
(5.2.30) Definition. Let s be a positive integer. Write Σ_s for the set of all stopping times with at most s values. A process (X_n)_{n∈ℕ} in a Banach space E is an amart of order s if the net (E[X_σ])_{σ∈Σ_s} converges in the norm of E.

(5.2.31) (Amart potential of order s + 1.) Suppose that (X_n) is an amart of order s + 1 and lim_n E[X_n 1_A] = 0 for all A ∈ ∪_{m∈ℕ} F_m. Then

lim_{σ∈Σ_s} ‖X_σ‖_P = 0.

This is proved as in (5.2.26), above.

(5.2.32) (Riesz decomposition.) The proof given above for the amart Riesz decomposition can be used to establish the following interesting result of Lu'u [1981]. The process (X_n)_{n∈ℕ} is called an amart of finite order if (X_n) is an amart of order s for all s ∈ ℕ. Let (X_n) be an L₁-bounded amart of finite order with values in a Banach space E with the Radon-Nikodym property. Then (X_n) has a decomposition X_n = Y_n + Z_n, where (Y_n) is a martingale and lim_{σ∈Σ_s} ‖Z_σ‖_P = 0 for each s ∈ ℕ.

(5.2.33) (Converse Riesz decomposition.) Let the process (X_n)_{n∈ℕ} be adapted and Bochner integrable. Suppose X_n = Y_n + Z_n, where (Y_n) is a martingale and lim_{σ∈Σ_s} ‖Z_σ‖_P = 0 for each s ∈ ℕ. Then (X_n) is an amart of finite order, but it need not be an amart (Lu'u [1981]).
(5.2.34) (Vector version of (1.4.25).) Let F be a Banach space, and suppose E = F* is separable. Then any adapted L₁-bounded sequence (X_n) of E-valued random variables satisfies

E[ limsup_{n,m∈ℕ} ‖X_n − X_m‖ ] ≤ 2 limsup_{σ,τ∈Σ, σ≥τ} E[ ‖E^{F_τ}[X_σ] − X_τ‖ ].
(Edgar [1979a]; see also Bellow & Egghe [1982].) Convergence of uniform amarts in E = F* clearly follows from this. The inequality may fail in a space E with the Radon-Nikodym property.

(5.2.35) (Difference property.) Let {e₁, e₂, …} be an orthonormal sequence in the Banach space l₂. Let r_k = 1! + 2! + ⋯ + k!. In the measure space [0, 1] with Lebesgue measure, define sets A_n by

A_n = [ (n − r_{k−1})/r_k , (n − r_{k−1} + 1)/r_k )

if r_{k−1} ≤ n < r_k. Then the process X_n = e_n 1_{A_n}, with natural stochastic basis F_n = σ(A₁, A₂, …, A_n), satisfies

lim_{m∈ℕ} sup_{τ∈Σ, τ≥m} E[ ‖E^{F_m}[X_τ] − X_m‖ ] = 0

but

limsup_{σ∈Σ} sup_{τ∈Σ, τ≥σ} E[ ‖E^{F_σ}[X_τ] − X_σ‖ ] > 0

(Edgar [1979a], Example 8, corrected).

(5.2.36) (Variants of the maximal inequality.) If (X_n)_{n∈ℕ} is a martingale, then

P{ sup ‖X_n‖ ≥ λ } ≤ (1/λ) sup_n E[‖X_n‖] = (1/λ) lim_n E[‖X_n‖].

If X_n = E^{F_n}[X] for all n, then

P{ sup ‖X_n‖ ≥ λ } ≤ (1/λ) E[‖X‖].
This follows from (1.4.18), since (‖X_n‖) is a positive submartingale.

(5.2.37) (Maximal inequality for reversed processes.) Let (F_n)_{n∈−ℕ} be a reversed stochastic basis. Then for any process (X_n)_{n∈−ℕ},

P{ sup ‖X_n‖ ≥ λ } ≤ (1/λ) sup_n E[‖X_n‖].

(For the maximum of a finite number of terms, apply the direct maximal inequality. For the complete supremum, take a limit.) If (X_n) is a reversed martingale, then

P{ sup ‖X_n‖ ≥ λ } ≤ (1/λ) E[‖X_{−1}‖].
(5.2.38) (Riesz decomposition for weak sequential amarts.) If (X_n) is a weak sequential amart with liminf E[‖X_n‖] < ∞ and E has the Radon-Nikodym property, then X_n = Y_n + Z_n, where (Y_n) is a martingale and (Z_n) is a weak sequential potential, that is, the weak limit of E[1_A Z_n] is 0 for each A ∈ ∪_n F_n (Brunel & Sucheston [1976b]).
Remarks

Banach-valued amarts were defined by Chacon & Sucheston [1975]. The notion of uniform amart is due to Bellow [1978a]. The Riesz decomposition for uniform amarts (5.2.13) is due to Ghoussoub & Sucheston [1978]. The Riesz decomposition for amarts (5.2.27) is due to Edgar & Sucheston [1976c]. Weak and weak sequential amarts were defined by Brunel & Sucheston [1976a], and their properties are investigated in [1976b].

Astbury [1978] characterized amarts by the Pettis norm difference property. He also used the difference property to prove the Riesz decomposition (5.2.27) for (scalar-valued and vector-valued) amarts. S. N. Bagchi [1983], [1985] discusses convergence of set-valued processes. In a dual Banach space, the right notion is the weak* amart. It converges only weak* a.s., but even set-valued martingales cannot do any better (Neveu [1972]).
5.3. The Radon-Nikodym property

The content of this section can be viewed in two different ways. One could consider it as an exposition of the convergence theorems for vector-valued martingales (and related processes). The Radon-Nikodym property is an important hypothesis in many of the convergence theorems. One could also consider the material in this section as a characterization of a certain class of Banach spaces (the spaces with the Radon-Nikodym property) using martingale or amart convergence.

Let (Ω, F, P) be a probability space; let (F_n)_{n∈ℕ} be a stochastic basis on (Ω, F, P); and let Σ be the set of all bounded stopping times for (F_n)_{n∈ℕ}. Let E be a Banach space. We will keep this notation throughout Section 5.3.
Scalar and Pettis norm convergence

Singling out a few ideas in advance will help in understanding the convergence proofs.
(5.3.1) Definition. Let μ: F → E be a vector measure. The average range of μ (with respect to P) is the set of vectors

{ μ(A)/P(A) : A ∈ F, P(A) > 0 }.
(5.3.2) Definition. Let C be a nonempty closed convex subset of the Banach space E. We say that C is a Radon-Nikodym set iff, for every probability space (Ω, F, P) and every absolutely continuous vector measure μ: F → E with finite variation and average range contained in C, the Radon-Nikodym derivative dμ/dP exists.

On first consideration, it might seem more natural to consider measures μ with range in C rather than average range in C. For linear spaces C, this is, of course, equivalent. But consider, for example, in E = L₁[0, 1] the set

C = { f ∈ L₁ : f ≥ 0, E[f] = 1 }.

This set vacuously satisfies the property that measures with range in C have derivatives, since no nontrivial measure has range in C. But it has none of the useful properties of Radon-Nikodym sets to be proved below: for example, a martingale with values in C need not converge.
(5.3.3) Proposition. A closed nonempty convex set C is a Radon-Nikodym set if and only if every closed bounded convex nonempty subset of C is a Radon-Nikodym set. A Banach space E has the Radon-Nikodym property if and only if E is a Radon-Nikodym set, which holds if and only if the unit ball { x ∈ E : ‖x‖ ≤ 1 } is a Radon-Nikodym set.

Proof. Suppose every closed bounded convex nonempty subset of C is a Radon-Nikodym set. Let μ be an absolutely continuous vector measure with finite variation and average range contained in C. The Radon-Nikodym derivative h = d|μ|/dP is a.e. finite, and μ has bounded average range on each of the sets A_λ = {h ≤ λ}, so by assumption dμ/dP exists on each of the sets A_λ. So dμ/dP exists.
(5.3.4) Definition. The sequence (X_n)_{n∈ℕ} of random variables in the Banach space E converges scalarly to a random variable X if, for each x* ∈ E*, the sequence (⟨X_n, x*⟩) of random variables in ℝ converges a.s. to ⟨X, x*⟩.

If we say that a sequence (X_n) converges scalarly, we are asserting more than the a.s. convergence of each sequence (⟨X_n, x*⟩). We are asserting the existence of a limit random variable X, Bochner measurable, with values in E. It should be emphasized that for scalar convergence, the exceptional null set where (⟨X_n, x*⟩) does not converge to ⟨X, x*⟩ may depend on x*. A related, but stronger, sort of convergence is defined next.

(5.3.5) Definition. The sequence (X_n)_{n∈ℕ} of random variables converges weakly a.s. to the random variable X if, for almost every ω ∈ Ω, the sequence X_n(ω) converges weakly to X(ω). In this case, there is a single exceptional null set.
For scalar-valued processes, in the presence of uniform integrability, pointwise convergence implies L₁ convergence. The corresponding condition for scalar convergence is uniform absolute continuity (5.2.15).

(5.3.6) Proposition. Let (X_n)_{n∈ℕ} be a process with values in a Banach space E. If X_n converges scalarly to X and {X_n} is uniformly absolutely continuous, then X_n − X converges to 0 in Pettis norm.

Proof. Let ε > 0. Then there exists δ > 0 so that ‖E[X_n 1_A]‖ < ε if P(A) < δ. Let x* ∈ E*, ‖x*‖ ≤ 1. If P(A) < δ, then E[|⟨X_n, x*⟩| 1_A] ≤ 2ε. By Fatou's lemma, if P(A) < δ, then

E[|⟨X, x*⟩| 1_A] ≤ liminf E[|⟨X_n, x*⟩| 1_A] ≤ 2ε.

Now ⟨X_n, x*⟩ → ⟨X, x*⟩ a.s., so there is N such that for all n ≥ N,

P{ |⟨X_n − X, x*⟩| > ε } < δ.
Write A_n = { |⟨X_n − X, x*⟩| > ε }. Then

E[|⟨X_n − X, x*⟩|] ≤ E[|⟨X_n − X, x*⟩| 1_{A_n}] + E[|⟨X_n − X, x*⟩| 1_{Ω\A_n}] ≤ 4ε + ε = 5ε.

Therefore ‖X_n − X‖_P ≤ 5ε. Thus X_n − X → 0 in Pettis norm.

Our first convergence theorem will be proved for bounded sets, and then generalized to unbounded sets.
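Recall that ‖X‖_P = sup_{‖x*‖≤1} E[|⟨X, x*⟩|]. For a simple two-point random variable in ℝ², this supremum can be approximated by a grid of unit functionals; the example and helper below are ours, and they also illustrate that the Bochner norm E[‖X‖] dominates the Pettis norm:

```python
import math

def pettis_norm_2d(values, probs, n_angles=800):
    """Approximate ||X||_P = sup_{||x*||<=1} E|<X, x*>| for a simple
    R^2-valued random variable, using unit functionals (cos t, sin t)."""
    best = 0.0
    for k in range(n_angles):
        t = 2 * math.pi * k / n_angles
        a, b = math.cos(t), math.sin(t)
        best = max(best, sum(p * abs(a * x + b * y)
                             for (x, y), p in zip(values, probs)))
    return best

values = [(1.0, 0.0), (0.0, 1.0)]   # X = e_1 or e_2, each with probability 1/2
probs = [0.5, 0.5]
bochner = sum(p * math.hypot(x, y) for (x, y), p in zip(values, probs))  # E||X||
pettis = pettis_norm_2d(values, probs)  # sup of (|a|+|b|)/2 = sqrt(2)/2
```

Here E[‖X‖] = 1 while ‖X‖_P = √2/2, so the two norms are genuinely different even in ℝ².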
(5.3.7) Proposition. Let E be a Banach space, and C ⊆ E a closed bounded convex Radon-Nikodym set. If (X_n)_{n∈ℕ} is an amart in C, then (X_n) converges scalarly and in Pettis norm.

Proof. For each A ∈ ∪_{n=1}^∞ F_n, the restriction theorem (5.2.9) implies that the limit

μ(A) = lim_{n→∞} E[X_n 1_A]

exists in E, and μ(A)/P(A) ∈ C by (5.1.14b). By (5.1.9a), μ can be extended to a countably additive vector measure μ on the σ-algebra F_∞ generated by ∪_{n=1}^∞ F_n, and

μ(A)/P(A) ∈ C for A ∈ F_∞, P(A) > 0.

But C is a Radon-Nikodym set, so there is a random variable X with

E[X 1_A] = μ(A) for all A ∈ F_∞.

We claim that X_n converges scalarly to X. If x* ∈ E*, then ⟨X_n, x*⟩ is a scalar-valued amart, and hence converges a.s. To see that X has values in C, write C = ∩_i { x ∈ E : ⟨x, x*_i⟩ ≤ α_i } as in (5.1.5); then P{⟨X, x*_i⟩ > α_i} ≤ liminf_n P{⟨X_n, x*_i⟩ > α_i} = 0. Thus X ∈ C a.s. Now the X_n and X all have values in C, a bounded set, so for all A ∈ F_∞ we have, by the dominated convergence theorem,

E[ (⟨X, x*⟩ − lim_n ⟨X_n, x*⟩) 1_A ] = ⟨μ(A), x*⟩ − ⟨μ(A), x*⟩ = 0,

so ⟨X_n, x*⟩ → ⟨X, x*⟩ a.s. This proves scalar convergence. The process is uniformly bounded, hence in particular uniformly absolutely continuous, which proves Pettis norm convergence by (5.3.6).

The converse of this result is also true: if C is a closed bounded convex set, and every amart in C converges scalarly, then C is a Radon-Nikodym set (5.3.28). The conclusion of (5.3.7) can be proved with weak sequential amarts instead of amarts. See (5.3.14), below.
(5.3.8) Proposition. Let C be a closed bounded convex Radon-Nikodym set. Suppose (X_n) is an amart that stops outside C. If either (a) (X_n) satisfies condition (B); or (b) (X_n) is uniformly absolutely continuous; then (X_n) converges scalarly and hence in Pettis norm.

Proof. If (a) holds, then by (5.2.24) we have E[sup ‖X_n‖] < ∞, which implies that (X_n) is uniformly absolutely continuous. Thus it is enough to consider case (b). By translation, we may assume that 0 ∈ C. Let G_n = {X_n ∈ C}. Since (X_n) stops outside C, we have G_n ⊇ G_{n+1}. Now Y_n = X_n 1_{G_n} is an adapted process with its values in C. We claim that (Y_n) is an amart.

Let ε > 0 be given. There is δ > 0 so that if P(A) < δ, then ‖E[X_n 1_A]‖ < ε. Write G = ∩_{n=1}^∞ G_n. There is an N ∈ ℕ with P(G_N \ G) < δ. Since (X_n) is an amart, there is M ∈ ℕ with

‖E[X_σ − X_τ]‖ < ε for σ, τ ∈ Σ; σ, τ ≥ M.

Note that

X_n − Y_n = 0 on G_n;  X_n − Y_n = X_n on Ω \ G_n,

so that if m ≤ n, then

(X_n − X_m) − (Y_n − Y_m) = 0 on G_n;  = X_n on G_m \ G_n;  = 0 on Ω \ G_m.

Also, if p ≥ n, then X_p = X_n outside G_n. Now consider simple stopping times σ, τ with max{M, N} ≤ σ ≤ τ. There is an integer p ≥ τ. Let A = G_σ \ G_τ. This is a subset of G_N \ G, so P(A) < δ. Therefore

(X_τ − X_σ) − (Y_τ − Y_σ) = X_τ 1_A,

hence

‖E[(X_τ − X_σ) − (Y_τ − Y_σ)]‖ = ‖E[X_τ 1_A]‖ = ‖E[X_p 1_A]‖ < ε,

so that ‖E[Y_τ] − E[Y_σ]‖ ≤ 2ε. This shows that (Y_n) is an amart. By the preceding proposition, (Y_n) converges scalarly and in Pettis norm, say to Y. Let σ be the first entrance time of (X_n) in E \ C. If

X = Y on G;  X = X_σ on Ω \ G,

then (X_n) converges to X scalarly and in Pettis norm.
Next we can go to unbounded sets.

(5.3.9) Theorem. Let C be a Radon-Nikodym set. If (X_n) is an amart in C satisfying condition (B), then (X_n) converges scalarly.

Proof. Fix a real number λ > 0. Let (Y_n) be the process obtained by stopping (X_n) outside

C_λ = { x ∈ C : ‖x‖ ≤ λ }.

By (5.3.8), (Y_n) converges scalarly. On the set

A_λ = { ‖X_n‖ ≤ λ for all n }

we have X_n = Y_n. By the maximal inequality,

lim_{λ→∞} P(A_λ) = 1,

so (X_n) itself converges scalarly.
(5.3.10) Theorem. Let E be a Banach space with the Radon-Nikodym property. Then every amart in E satisfying condition (B) converges scalarly.

Proof. Take C = E in the previous result.

The converse of this result is also true (5.3.29), so this is a characterization of the Radon-Nikodym property. In general, the amart in the preceding proposition does not converge in Pettis norm. The Pettis norm convergence theorem of Uhl is proved below (5.3.24).

To state the next corollary, we define: X_n converges scalarly to X on the set A if, for every x* ∈ E*, there is a null set N (depending on x*) such that x*(X_n(ω)) → x*(X(ω)) for all ω ∈ A \ N.

(5.3.11) Corollary. Let E be a Banach space with the Radon-Nikodym property. If (X_n) is a uniformly absolutely continuous amart, then X_n converges scalarly on the set where sup ‖X_n‖ < ∞.

Proof. For λ > 0, define C_λ = { x ∈ E : ‖x‖ ≤ λ }. By (5.3.8(b)), the process (X_n) stopped outside C_λ converges scalarly. So X_n itself converges scalarly on { sup ‖X_n‖ ≤ λ }. Therefore it converges scalarly on the countable union ∪_{λ∈ℕ} { sup ‖X_n‖ ≤ λ } = { sup_n ‖X_n‖ < ∞ }.

Weak a.s. convergence
Scalar convergence is a weak conclusion for a theorem. Stronger sorts of convergence may be obtained by strengthening the hypotheses of the theorem. We first consider strengthening the restrictions on the Banach space E. Later in this section we will consider specializing the process (X_n)_{n∈ℕ}.
(5.3.12) Theorem. Let (X_n) be an amart in the Radon-Nikodym set C in the Banach space E. Suppose (X_n) satisfies condition (B) and E* is separable. Then (X_n) converges weakly a.s.

Proof. By (5.3.9), we know that (X_n) converges scalarly, say to X. Let (x*_k)_{k∈ℕ} be a countable set dense in E*. There exist events A_k ∈ F with P(A_k) = 0 such that

lim_n ⟨X_n(ω), x*_k⟩ = ⟨X(ω), x*_k⟩ for all ω ∈ Ω \ A_k.

By the maximal inequality, there is a set N ∈ F with P(N) = 0 such that sup_n ‖X_n‖ < ∞ on Ω \ N. Let A = N ∪ ∪_{k∈ℕ} A_k. Then P(A) = 0. Fix ω ∈ Ω \ A. We claim that X_n(ω) converges weakly to X(ω). Since ω ∉ N, we have

λ = sup_n ‖X_n(ω)‖ < ∞.

Now let x* ∈ E*, and let ε > 0. Choose k ∈ ℕ so that

‖x* − x*_k‖ < ε / (4λ + 1).

For this k, we have ω ∉ A_k, so lim_n ⟨X_n(ω), x*_k⟩ = ⟨X(ω), x*_k⟩. Choose m ∈ ℕ so that for n ≥ m

|⟨X_n(ω) − X(ω), x*_k⟩| < ε/2.

Then for n ≥ m it follows that

|⟨X_n(ω) − X(ω), x*⟩| ≤ |⟨X_n(ω), x* − x*_k⟩| + |⟨X_n(ω) − X(ω), x*_k⟩| + |⟨X(ω), x*_k − x*⟩|
≤ ‖x* − x*_k‖ (‖X_n(ω)‖ + ‖X(ω)‖) + ε/2
≤ (ε / (4λ + 1)) · 2λ + ε/2 < ε.

This shows that ⟨X_n(ω), x*⟩ → ⟨X(ω), x*⟩. This is true for any x* ∈ E*, so X_n(ω) converges weakly to X(ω).

(5.3.13) Theorem. Let (Ω, F, P) be a probability space, let E be a Banach space, and let (X_n)_{n∈ℕ} be an amart in E. Assume:
(1) (X_n)_{n∈ℕ} satisfies condition (B);
(2) E has the Radon-Nikodym property;
(3) E* is separable.
Then (X_n)_{n∈ℕ} converges weakly a.s.
Proof. Take C = E in the preceding result.

In this theorem, the conditions on the space E are necessary as well as sufficient. A Banach space E has the Radon-Nikodym property if and only if every L₁-bounded martingale in E converges weakly a.s. (5.3.30). Suppose E is a separable Banach space. Then the dual E* is separable if and only if every amart potential in E converges weakly a.s. (5.5.27).
Weak sequential amarts

The preceding convergence theorems prove only weak convergence. This makes the definition of amarts (involving norm convergence) seem too strong. These convergence theorems fail in general for weak amarts. But they remain true for weak sequential amarts. Recall that a weak sequential amart is a process (X_n) such that, for every increasing sequence σ_n ∈ Σ, the sequence of vectors E[X_{σ_n}] converges weakly.

(5.3.14) Proposition. Let E be a Banach space, and C ⊆ E a closed bounded convex Radon-Nikodym set. If (X_n)_{n∈ℕ} is a weak sequential amart in C, then (X_n) converges scalarly.

Proof. By (5.2.12), for each A ∈ ∪_{n=1}^∞ F_n, the sequence E[X_n 1_A] converges in the weak topology of E. Let μ(A) be its limit. Each E[X_n 1_A]/P(A) is in C, and C is weakly closed, so μ(A)/P(A) ∈ C. The remainder of the proof is the same as in (5.3.7).
(5.3.15) Proposition. Let C be a closed bounded convex Radon-Nikodym set. Every weak sequential amart satisfying condition (B) that stops outside C converges scalarly.

Proof. Let (X_n)_{n∈ℕ} be a weak sequential amart that satisfies condition (B) and stops outside C. Let G_n and Y_n be as in the proof of (5.3.8). We claim that (Y_n) is a weak sequential amart. Let (σ_n) be an increasing sequence in Σ; since (X_n) is a weak sequential amart, it is enough to prove that E[Y_{σ_n}] converges weakly.

Let σ be the first entrance time of (X_n) in the set E \ C. Then we have E[‖X_σ‖ 1_{Ω\G}] < ∞ by (5.2.22). The difference X_{σ_n} − Y_{σ_n} converges a.s. (in norm) to X_σ 1_{Ω\G}, so E[X_{σ_n} − Y_{σ_n}] converges in norm to E[X_σ 1_{Ω\G}]; so the weak limit of E[Y_{σ_n}] is the weak limit of E[X_{σ_n}] minus E[X_σ 1_{Ω\G}]. The remainder of the proof is the same as (5.3.8).
(5.3.16) Proposition. Let E be a Banach space with the Radon-Nikodym property. Then every weak sequential amart in E satisfying condition (B) converges scalarly.

The proof is the same as that given for (5.3.9).

(5.3.17) Theorem. Let (Ω, F, P) be a probability space, let E be a Banach space, and let (X_n)_{n∈ℕ} be a weak sequential amart in E. Assume:
(1) (X_n)_{n∈ℕ} satisfies condition (B);
(2) E has the Radon-Nikodym property;
(3) E* is separable.
Then (X_n)_{n∈ℕ} converges weakly a.s.

This proof, too, is the same as given above for the amart case (5.3.13). One reason that the convergence theorems for weak sequential amarts are interesting is that there is no larger class with the same convergence properties (in bounded sets).
(5.3.18) Theorem. Let E be a Banach space, let C be a bounded subset, and let (X_n)_{n∈ℕ} have values in C. If (X_n) converges scalarly, then (X_n) is a weak sequential amart.

Proof. Suppose X_n → X scalarly. Let (σ_n) be an increasing sequence in Σ. Let σ = lim_n σ_n. Then σ is a (possibly infinite) stopping time. Write A = {σ = ∞}. We claim that E[X_{σ_n}] converges weakly to E[X_σ 1_{Ω\A} + X 1_A]. Indeed, if x* ∈ E*, then ⟨X_{σ_n}, x*⟩ converges a.s. to ⟨X_σ 1_{Ω\A} + X 1_A, x*⟩; so the convergence of the expectations follows from the dominated convergence theorem.

One could observe that the preceding result is not really probabilistic in nature, since the stochastic basis (F_n)_{n∈ℕ} does not matter. If X_n converges scalarly, then (X_n) is a weak sequential amart with respect to the stochastic basis F_n = F.

Strong convergence
(5.3.19) Definition. Let (X_n)_{n∈ℕ} be a process with values in a Banach space. We will say that (X_n) converges a.s. if the sequence (X_n(ω)) converges in the norm of E for almost every ω ∈ Ω.

Next we take up an easy norm convergence theorem. It does not require the Radon-Nikodym property.

(5.3.20) Proposition. Let (Ω, F, P) be a probability space, let (F_n)_{n∈ℕ} be a stochastic basis, and let E be a Banach space. Write F_∞ for the σ-algebra generated by ∪_n F_n. For every X ∈ L₁(Ω, F, P; E), the martingale X_n = E^{F_n}[X] converges a.s. and in Bochner norm to E^{F_∞}[X].
Proof. First suppose X is F_m-measurable for some m ∈ ℕ. Then X_n = X for all n ≥ m, so certainly X_n converges a.s. to X = E^{F_∞}[X]. Now consider a general X ∈ L₁(Ω, F, P; E). Let ε be given, 0 < ε < 1. There is m₀ ∈ ℕ and Y ∈ L₁(Ω, F_{m₀}, P; E) such that

E[ ‖E^{F_∞}[X] − Y‖ ] < ε².

Set Y_n = E^{F_n}[Y]; by the maximal inequality (5.2.19), applied to

X_n − Y_n = E^{F_n}[ E^{F_∞}[X] − Y ],

we have

P{ sup_n ‖X_n − Y_n‖ > ε } ≤ (1/ε) ε² = ε.

Now Y_n → Y a.s., so there is m₁ ≥ m₀ with

P{ ‖Y_n − Y‖ > ε for some n ≥ m₁ } < ε.
Then, with the triangle inequality (and Chebyshev's inequality for the term ‖Y − E^{F_∞}[X]‖), we have

P{ ‖X_n − E^{F_∞}[X]‖ > 3ε for some n ≥ m₁ } < 3ε.

Since ε was arbitrary, X_n → E^{F_∞}[X] a.s. For the proof of convergence in Bochner norm, observe that (E^{F_n}[X]) is uniformly integrable.

Here is the observation that will be used to strengthen the scalar convergence results proved above to strong convergence results. If a martingale (or even a uniform amart) converges in a very weak sense, then in fact it converges in a much stronger sense.
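The closed-martingale convergence of (5.3.20) can be checked exactly on a finite coin space Ω = {−1, 1}^N, where E^{F_m}[X] is a plain average over the coins not yet revealed. The target X below is our illustration, not from the text:

```python
from itertools import product

N = 12

def X(w):
    """A bounded F_N-measurable target: X = sum_i w_i 2^{-i}, first N coins."""
    return sum(c * 0.5 ** i for i, c in enumerate(w, 1))

def l1_error(m):
    """Exact E||E^{F_m}[X] - X|| on Omega = {-1,1}^N with uniform P:
    conditioning on the first m coins averages over the 2^(N-m) tails."""
    total = 0.0
    for head in product((-1, 1), repeat=m):
        xs = [X(head + t) for t in product((-1, 1), repeat=N - m)]
        mean = sum(xs) / len(xs)        # E^{F_m}[X] on the atom given by head
        total += sum(abs(x - mean) for x in xs) / len(xs) * 0.5 ** m
    return total

errors = [l1_error(m) for m in (0, 4, 8, 12)]   # decreases to 0 as F_m grows
```

The Bochner-norm error E[‖E^{F_m}[X] − X‖] here equals E|Σ_{i>m} w_i 2⁻ⁱ|, which shrinks like 2⁻ᵐ and vanishes once all N coins are revealed.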
(5.3.21) Proposition. Let (X_n)_{n∈ℕ} be a uniform amart in E.
(1) Suppose (X_n) is L₁-bounded. Then X_n converges scalarly if and only if X_n converges a.s.
(2) Suppose (X_n) is L_∞-bounded. Then X_n converges scalarly if and only if X_n converges in the Bochner norm.
(3) Suppose (X_n) satisfies E[sup_n ‖X_n‖] < ∞. Then X_n converges scalarly if and only if X_n converges in the Bochner norm.

Proof. First observe that, by the Riesz decomposition (5.2.13), it is enough in all parts to prove the result in the case that (X_n) is a martingale. Observe that a.s. convergence implies scalar convergence. Also, for a martingale (X_n), Bochner norm convergence (to X_∞) implies mean convergence E[|⟨X_n, x*⟩ − ⟨X_∞, x*⟩|] → 0; the ⟨X_n, x*⟩ are scalar-valued martingales, so they converge a.s.; thus X_n converges scalarly to X_∞.

(2) follows trivially from (3). We begin with (3). Let (X_n) be a martingale with

E[ sup_n ‖X_n‖ ] < ∞.

Suppose X_n converges scalarly to X. Now if A ∈ F_m for some m, then by the dominated convergence theorem we have

lim_{n→∞} E[ (⟨X_n, x*⟩ − ⟨X, x*⟩) 1_A ] = 0

for all x* ∈ E*, so that E[X_n 1_A] → E[X 1_A] weakly. But for n ≥ m the sequence E[X_n 1_A] is constant. Thus E[X 1_A] = E[X_m 1_A]. This is true for all A ∈ F_m, so that E^{F_m}[X] = X_m. Then by the preceding result, X_n converges a.s. and in Bochner norm to X.

For part (1), we will use the same stopping technique as before. Fix λ > 0 and let (Y_n) be obtained by stopping (X_n) outside

C_λ = { x ∈ E : ‖x‖ ≤ λ }.

The stopped process (Y_n) is again a martingale, and it converges scalarly, since Y_n = X_n until the first exit from C_λ and is constant thereafter. By (5.2.24), E[sup_n ‖Y_n‖] < ∞, so (Y_n) converges a.s. by part (3). Since X_n = Y_n on { sup_n ‖X_n‖ ≤ λ }, the maximal inequality (5.2.19) then shows that X_n converges a.s.

These results, together with the scalar convergence results proved above, produce some convergence theorems in Banach space.
(5.3.22) Theorem. Suppose E is a Banach space and C ⊆ E is a bounded Radon-Nikodym set.
(1) A uniform amart in C converges a.s. and in Bochner norm.
(2) An L₁-bounded uniform amart that stops outside C converges a.s. and in Bochner norm.

(5.3.23) Theorem. Let E be a Banach space with the Radon-Nikodym property. Let (X_n)_{n∈ℕ} be an L₁-bounded uniform amart in E. Then (X_n)_{n∈ℕ} converges a.s.
Here is the Pettis norm convergence theorem of Uhl:

(5.3.24) Theorem. Let E be a Banach space with the Radon-Nikodym property. Let (X_n)_{n∈ℕ} be a uniformly absolutely continuous amart such that

liminf E[‖X_n‖] < ∞.

Then X_n converges in Pettis norm.

Proof. Consider the Riesz decomposition X_n = Y_n + Z_n. The potential (Z_n) converges to 0 in Pettis norm. The martingale (Y_n) is L₁-bounded by (5.2.27a), and therefore a.s. norm convergent by (5.3.23). But (Y_n) is uniformly absolutely continuous, so it converges in Pettis norm by (5.3.6). Thus the sum Y_n + Z_n converges in Pettis norm.

T-convergence
Let E be a Banach space and E* its dual. A subset T of E* is called total if x*(x) = 0 for all x* ∈ T implies x = 0. Let (X_n) be an amart of class (B) and X a (Bochner measurable) random variable. If the Radon-Nikodym property is not assumed, it is natural to ask how small the class T of functionals x* can be, if a.s. convergence of x*(X_n) to x*(X) for each x* ∈ T is to imply scalar convergence of X_n (necessarily to X). In fact, T can be any total subset of E*. For martingales, the conclusion is even more dramatic, because scalar convergence then implies strong almost sure convergence. Since an L₁-bounded martingale is an amart of class (B), it follows that an L₁-bounded martingale X_n such that x*(X_n) converges to x*(X) for each x* in a total set T converges to X strongly a.s.
(5.3.25) Lemma. Let (X_n) be an amart with sup_n ‖X_n‖ ∈ L₁. Let X be a random variable, and let T be a total subset of E*. Assume that x*(X_n) converges to x*(X) almost surely for each x* ∈ T.
(i) If A ∈ F and E[1_A ‖X‖] < ∞, then lim_n E[1_A X_n] = E[1_A X].
(ii) If X is Bochner integrable, then X_n converges scalarly to X.
Proof. The process (X_n) is uniformly absolutely continuous, since we have sup_n ‖X_n‖ ∈ L₁. Thus by (5.2.17), the limit

μ(A) = lim_n E[1_A X_n]

exists for all A ∈ F and defines a measure on F.

(i) Now suppose 1_A X is Bochner integrable. For each x* ∈ T, we have

x*(μ(A)) = lim_n E[1_A x*(X_n)] = E[1_A x*(X)] = x*(E[1_A X]).

Since T is total, it follows that μ(A) = E[1_A X], as required.

(ii) If y* ∈ E*, then y*(X_n) is a real-valued amart with integrable supremum; therefore y*(X_n) converges a.s. to a real random variable, say Z_{y*}. For each A ∈ F we have, by (i), μ(A) = E[1_A X], so that

E[1_A y*(X)] = y*(μ(A)) = lim_n E[1_A y*(X_n)] = E[1_A Z_{y*}].

Therefore y*(X) = Z_{y*} a.s. Thus X is the scalar limit of (X_n).

The lemma is not the best result, because it is not necessary to assume that X is Bochner integrable.
(5.3.26) Theorem. Let (X_n) be an E-valued amart of class (B), and let X be an E-valued random variable. Let T be a total subset of E*. Assume, for each x* ∈ T, that x*(X_n) converges almost surely to x*(X). Then X_n converges scalarly to X.

Proof. Assume first that sup_n ‖X_n‖ ∈ L₁. For k ∈ ℕ, let A_k = {‖X‖ ≤ k}. Then for any A ∈ F, by (5.3.25(i)),

lim_n E[1_{A∩A_k} X_n] = E[1_{A∩A_k} X].

Therefore E[1_{A_k} ‖X‖] = |μ|(A_k) ≤ E[sup_n ‖X_n‖]. Let k → ∞; then

E[‖X‖] ≤ E[ sup_n ‖X_n‖ ] < ∞,

and X is Bochner integrable. By (5.3.25(ii)), X_n converges scalarly to X.

For the general case, we may apply the usual stopping argument: if x*(X_n) → x*(X) a.s. and τ = inf { n : ‖X_n‖ ≥ k }, then x*(X_{n∧τ}) → x*(Y) a.s., where

Y = X_τ on {τ < ∞};  Y = X on {τ = ∞}.
The preceding result also holds for weak sequential amarts. The conclusion is stronger for martingales.
5.3. The Radon-Nikodym property
(5.3.27) Theorem. Let (X_n) be an E-valued L_1-bounded martingale, and X an E-valued random variable. Let T be a total subset of E*. Assume, for each x* ∈ T, that x*(X_n) converges almost surely to x*(X). Then X_n converges almost surely to X.
Proof. An L_1-bounded martingale is necessarily an amart of class (B). Therefore Theorem (5.3.26) applies. Now it is easy to see that scalar convergence implies strong a.s. convergence for martingales. Indeed, after reducing to the case sup_n ||X_n|| ∈ L_1 as usual, the relation

µ(A) = E[1_A X]

proved above, applied to the terms X_m of the martingale and sets A in F_m, shows that each X_m is E^{F_m}[X]. This reduces the problem to closed martingales, which converge without the Radon-Nikodym property (5.3.20).

Converses
In most cases, the Radon-Nikodym property cannot be omitted from convergence theorems. In fact, it is necessary and sufficient for convergence of martingales in E. This converse result has a version for bounded sets and a version for unbounded sets. Recall that a set C ⊆ E is a Radon-Nikodym set if, for every probability space (Ω, F, P) and every absolutely continuous vector measure µ: F → E with finite variation and average range contained in C, the Radon-Nikodym derivative dµ/dP exists.
(5.3.28) Theorem. Let C be a nonempty closed bounded convex set in a Banach space E. The following are equivalent:
(1) C is a Radon-Nikodym set;
(2) every martingale in C converges a.s.;
(3) every martingale in C converges in Bochner norm;
(4) every martingale in C converges in Pettis norm;
(5) every martingale in C converges scalarly;
(6) every amart in C converges scalarly.
Proof. The proof outline is: (1) ⇒ (6) ⇒ (5) ⇒ (3) ⇒ (1) and (1) ⇒ (2) ⇒ (3) ⇒ (4) ⇒ (5). Seven of the eight implications are easily deduced from what has already been done. (1) ⇒ (6) by (5.3.7). (6) ⇒ (5) since every martingale is an amart. (5) ⇒ (3) by (5.3.21(2)). (3) ⇒ (4) since the Bochner norm dominates the Pettis norm. (1) ⇒ (2) by (5.3.22). (2) ⇒ (3) by the dominated convergence theorem. (4) ⇒ (5): Pettis norm convergence (to X_∞, say) implies mean convergence of the scalar-valued martingales (X_n, x*) → (X_∞, x*), so they converge a.s.; thus X_n converges scalarly to X_∞.
(3) ⇒ (1). Suppose, then, that every martingale in C converges in Bochner norm. Then any martingale in C with directed index set also
converges in Bochner norm. This follows from the sequential sufficiency theorem (1.1.3) and the directed-set version of the optional sampling theorem (5.2.2). Let (Ω, F, P) be a probability space, and let µ: F → E be a vector measure with average range contained in C. Consider (as in (1.3.2)) the directed set D of partitions of Ω into finitely many sets of F of positive measure, directed by essential refinement. For each t = {A_1, A_2, ..., A_k} in D, let

X_t = Σ_{i=1}^{k} (µ(A_i)/P(A_i)) 1_{A_i}.

Then (X_t)_{t∈D} is a martingale indexed by D. Let X ∈ L_1(Ω, F, P; E) be its limit in the Bochner norm. We claim that X is the Radon-Nikodym derivative of µ with respect to P.
Let A ∈ F. (We may assume 0 < P(A) < 1.) For any t ∈ D that essentially refines the partition s = {A, Ω \ A}, we have

E[X_t 1_A] = E[X_s 1_A] = µ(A).

But E[X_t 1_A] converges to E[X 1_A], so E[X 1_A] = µ(A). Thus X is the Radon-Nikodym derivative of µ.
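The finite-partition martingale X_t is concrete enough to compute. The following sketch (not from the text; a purely illustrative scalar case with E = ℝ, and all names in it are ad hoc) builds X_t for dyadic partitions of [0, 1) when dµ = f dP, and checks that the L_1 distance between X_t and the density f shrinks as the partitions refine, so the Radon-Nikodym derivative emerges as the limit of partition averages µ(A)/P(A).

```python
# Scalar illustration of the partition martingale in the proof of (5.3.28):
# on each atom A of a partition, X_t = mu(A)/P(A), the average of f over A.

GRID = 2 ** 12  # discretize [0,1) into GRID cells of equal P-measure

def f(x):
    """A sample density (the Radon-Nikodym derivative) on [0,1)."""
    return 3 * x * x  # integrates to 1 over [0,1)

cells = [f((i + 0.5) / GRID) for i in range(GRID)]  # f sampled per cell

def partition_martingale(level):
    """X_t for the dyadic partition with 2**level atoms."""
    atoms = 2 ** level
    width = GRID // atoms
    out = []
    for a in range(atoms):
        block = cells[a * width:(a + 1) * width]
        avg = sum(block) / width          # mu(A)/P(A) on this atom
        out.extend([avg] * width)
    return out

def l1_error(level):
    """E|X_t - f|, the L_1 distance to the derivative."""
    xt = partition_martingale(level)
    return sum(abs(a - b) for a, b in zip(xt, cells)) / GRID

errors = [l1_error(k) for k in range(0, 9)]
# errors shrinks toward 0: X_t converges to f in L_1 norm
```

Each refinement is a conditional expectation of the previous stage, so the mean E[X_t] = µ(Ω) is preserved exactly at every level.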
The version for unbounded sets is an easy consequence.
(5.3.29) Theorem. Let C be a nonempty closed convex set in a Banach space E. The following are equivalent:
(1) C is a Radon-Nikodym set;
(2) every L_1-bounded martingale in C converges a.s.;
(3) every L_∞-bounded martingale in C converges in Bochner norm;
(4) every L_∞-bounded martingale in C converges in Pettis norm;
(5) every L_1-bounded martingale in C converges scalarly;
(6) every amart in C satisfying condition (B) converges scalarly;
(7) every L_∞-bounded amart in C converges scalarly.
Proof. Each of the parts holds for an unbounded set C if and only if it holds for all nonempty closed bounded convex subsets of C. For (1) this is (5.3.3), and for the others the usual stopping argument (as in (5.3.9)) will suffice.
The most commonly used version of this result is the theorem of the Ionescu Tulceas and Chatterji, which is (1) ⇔ (2) of the following result. For a simple proof of (1) ⇒ (2), see (5.3.34).

(5.3.30) Theorem. Let E be a Banach space. The following are equivalent:
(1) E has the Radon-Nikodym property;
(2) every L_1-bounded martingale in E converges a.s.;
(3) every L_∞-bounded martingale in E converges in Bochner norm;
(4) every L_∞-bounded martingale in E converges in Pettis norm;
(5) every L_1-bounded martingale in E converges scalarly;
(6) every amart in E satisfying condition (B) converges scalarly;
(7) every L_∞-bounded amart in E converges scalarly.
These theorems can be used to establish some of the usual results on the Radon-Nikodym property in Banach spaces.
(5.3.31) Proposition. Let E be a Banach space and let (X_n) be a sequence of E-valued random variables.
(i) Suppose that there exists an event Ω_0 ⊆ Ω with P(Ω_0) = 1 such that for each ω ∈ Ω_0, the sequence X_n(ω) is relatively weakly sequentially compact in E (that is, each subsequence contains a subsequence that converges weakly to a point of E). Suppose also that for each x* ∈ E*, the limit lim_n (X_n, x*) exists a.s. Then X_n converges scalarly.
(ii) Assume E = F* is the dual of a Banach space F. Suppose there is a separable set E_1 ⊆ E and an event Ω_0 ⊆ Ω with P(Ω_0) = 1 such that for each ω ∈ Ω_0, the sequence X_n(ω) is relatively weak-star sequentially compact in E_1 (that is, every subsequence contains a subsequence that converges weak-star to an element of E_1). Suppose also that for each y ∈ F, the limit lim_n (y, X_n) exists a.s. Then there is a Bochner measurable random variable X such that (y, X_n) → (y, X) a.s. for each y ∈ F.
Proof. (i) For each x* ∈ E*, denote by R_{x*} the a.s. limit of (X_n, x*). For each ω ∈ Ω_0, choose a subsequence X_{n_k}(ω) that converges weakly. (This choice depends on ω, possibly in a nonmeasurable way.) Let X(ω) be the weak limit of X_{n_k}(ω). Now for each x* ∈ E*,

R_{x*}(ω) = lim_k (X_{n_k}(ω), x*) = (X(ω), x*)   a.s.

But R_{x*} is measurable, so X is scalarly measurable. Note that the values X(ω) all belong to the closed span of the set of values of all the random variables X_n (since the weak closure of a convex set is equal to its norm closure; see Dunford & Schwartz [1958], p. 422). So the set of values of X is separable. Therefore by the Pettis measurability theorem (5.1.7), X is Bochner measurable. For each x* ∈ E*,

lim_{n→∞} (X_n(ω), x*) = R_{x*}(ω) = (X(ω), x*)   a.s.

Hence X is the scalar limit of X_n.
(ii) The proof is similar. The weak-star limit X(ω) of a subsequence X_{n_k}(ω) must be chosen in the separable set E_1. Then for each y ∈ F,

(y, X_n(ω)) → (y, X(ω))   a.s.,

and X is weak-star measurable, so it is Bochner measurable by (5.1.8).
(5.3.32) Proposition. Let F be a Banach space, and let C ⊆ F* be a separable, weak-star closed, bounded, convex set. Then C is a Radon-Nikodym set.

Proof. First note that countably many linear functionals y ∈ F separate the points of C: if {x_n : n ∈ ℕ} is a dense set of distinct points of C, choose y_{mn} ∈ F with ||y_{mn}|| ≤ 1 and

|(y_{mn}, x_n) − (y_{mn}, x_m)| ≥ (1/2) ||x_n − x_m||.

Now if u, v ∈ C and u ≠ v, then there exist n and m such that ||x_n − u|| < (1/6)||u − v|| and ||x_m − v|| < (1/6)||u − v||. Thus

|(y_{nm}, u − v)| ≥ |(y_{nm}, x_n − x_m)| − |(y_{nm}, x_n − u)| − |(y_{nm}, x_m − v)|
  ≥ (1/2)||x_n − x_m|| − ||x_n − u|| − ||x_m − v||
  ≥ (1/2)||u − v|| − (3/2)||x_n − u|| − (3/2)||x_m − v||
  > (1/2)||u − v|| − (1/4)||u − v|| − (1/4)||u − v|| = 0.

So (y_{nm}, u − v) ≠ 0.
Let (X_n)_{n∈ℕ} be a martingale in C. For y ∈ F, the scalar process ((y, X_n)) is an L_∞-bounded martingale, so it converges a.s. Write R_y for the a.s. limit. Then E^{F_n}[R_y] = (y, X_n) a.s. For each ω in a suitable set Ω_0 of probability one, choose a subsequence X_{n_k}(ω) that converges weak-star. (This is possible by the theorem of Alaoglu; see for example Rudin [1973], p. 66.) Proposition (5.3.31(ii)) can now be applied to conclude that there exists a random variable X such that (y, X_n) → (y, X) a.s. For each y ∈ F we have (y, E^{F_n}[X]) = E^{F_n}[R_y] = (y, X_n) a.s. Now since countably many linear functionals separate the points of C, we have E^{F_n}[X] = X_n a.s. Therefore the martingale (X_n) converges a.s. This shows that C is a Radon-Nikodym set (5.3.29).
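The constants in this separation estimate combine exactly ((1/2) − (3/2)(1/6) − (3/2)(1/6) = 0), so the strict inequalities matter. As a hypothetical numerical check (not from the text), the following sketch works in ℝ³ with the particular choice y_{nm} = (x_n − x_m)/||x_n − x_m||, which satisfies the (1/2)-separation condition with room to spare, and verifies the chain of inequalities on random data.

```python
# Check, on random data, the estimate |<y_nm, u - v>| >= (1/2)||x_n - x_m||
#   - ||x_n - u|| - ||x_m - v|| > 0 from the proof of (5.3.32).

import math
import random

def norm(w):
    return math.sqrt(sum(c * c for c in w))

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def sub(a, b):
    return [p - q for p, q in zip(a, b)]

random.seed(0)
results = []  # pairs (|<y_nm, u - v>|, lower bound from the proof)
for _ in range(200):
    u = [random.uniform(-1, 1) for _ in range(3)]
    v = [random.uniform(-1, 1) for _ in range(3)]
    d = norm(sub(u, v))
    if d < 1e-3:
        continue
    # x_n, x_m approximate u, v to within (1/6)||u - v|| (strictly, a.s.)
    x_n = [c + random.uniform(-1, 1) * d / (6 * math.sqrt(3)) for c in u]
    x_m = [c + random.uniform(-1, 1) * d / (6 * math.sqrt(3)) for c in v]
    diff = sub(x_n, x_m)
    y_nm = [c / norm(diff) for c in diff]  # ||y_nm|| = 1, <y_nm, x_n - x_m> = ||x_n - x_m||
    lower = 0.5 * norm(diff) - norm(sub(x_n, u)) - norm(sub(x_m, v))
    results.append((abs(dot(y_nm, sub(u, v))), lower))
```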
(5.3.33) Proposition. (i) A separable dual Banach space has the Radon-Nikodym property. (ii) A weakly compact convex subset of a Banach space is a Radon-Nikodym set.

Proof. (i) The unit ball of the space satisfies the conditions for C in the preceding proposition. Now apply Proposition (5.3.3). (ii) follows from (5.3.31(i)), since a weakly compact set is weakly sequentially compact (the Eberlein-Smulian theorem; see, for example, Dunford & Schwartz [1958], p. 466).
The preceding result shows that any reflexive Banach space has the Radon-Nikodym property. Indeed, E is reflexive if and only if the unit ball of E is weakly compact (Dunford & Schwartz [1958], p. 430).
Complements
(5.3.34) (Convergence of L_1-bounded martingales under the Radon-Nikodym property.) We sketch a direct short proof of the important theorem of the Ionescu Tulceas. Most of the arguments have appeared before.
(i) Martingales of the form X_n = E^{F_n}[X] converge a.s. (5.3.20).
(ii) If (X_n) is uniformly integrable, then X_n = E^{F_n}[X] for some X ∈ L_1(E). Indeed, µ(A) = lim_n E[1_A X_n] defines a countably additive measure of bounded variation on σ(∪ F_n). Let X be the Radon-Nikodym derivative dµ/dP.
(iii) For a constant λ > 0, define a stopping time σ as the first n such that ||X_n|| > λ, if such an n exists; otherwise σ = ∞. Then (X_{n∧σ}) is a martingale (5.2.2). Let Z = sup_n ||X_{n∧σ}||. We assert that Z ∈ L_1. Indeed, Z ≤ λ on the set {σ = ∞}; and on A = {σ < ∞}, we have X_{n∧σ} → X_σ; so by Fatou's lemma, and since (||X_n||) is a submartingale,

E[1_A ||X_σ||] ≤ liminf_n E[||1_A X_{n∧σ}||] ≤ sup_n E[||X_n||].

But since Z ≤ λ + 1_A ||X_σ||, we have E[Z] ≤ λ + sup_n E[||X_n||] < ∞. By (i) and (ii), X_{n∧σ} converges a.s.
(iv) By the maximal inequality (5.2.19), Ω is (up to a null set) the union of the sets

Ω_λ = {sup_n ||X_n|| ≤ λ},   say λ = 1, 2, 3, ...

On Ω_λ, the process X_n coincides with X_{n∧σ}, hence it converges a.e. by (iii). Therefore X_n converges a.e. on Ω.

(5.3.35) (Reversed amarts.) Let (F_n)_{n∈−ℕ} be a reversed stochastic basis
(1.1.18). Let E be a Banach space. Suppose E has the Radon-Nikodym property and E* is separable. If (X_n)_{n∈−ℕ} is an amart, then X_n(ω) converges weakly for almost all ω ∈ B = {sup ||X_n|| < ∞} (and diverges weakly for all ω ∉ B) (Edgar & Sucheston [1976a]). It is possible that B = Ω (so this result applies on all of Ω) even if (X_n) is not L_1-bounded. If X_n is of class (B), then sup ||X_n|| < ∞ a.s., so X_n converges a.s.

(5.3.36) (Reversed martingales.) We now discuss convergence of reversed martingales. Unlike direct martingales, or direct and reversed amarts, they converge a.s. even if E does not have the Radon-Nikodym property. A reversed martingale (X_n)_{n∈−ℕ} has the form X_n = E^{F_n}[X]. In fact, X may be taken to be X_{−1}. So (X_n) is uniformly integrable. We claim that such a martingale converges a.s. to E^{F_{−∞}}[X], where

F_{−∞} = ∩_{n∈−ℕ} F_n.

To prove this convergence, consider the set G of all X ∈ L_1(Ω, F, P; E) such that E^{F_n}[X] = E^{F_{−∞}}[X] for some finite n. Then we claim that the
closed linear span of G is all of L_1(E). To see this, it suffices to show that if Y ∈ L_∞(E*) and Y ⊥ G, then Y = 0. First, G contains all random variables of the form X − E^{F_n}[X], so Y ⊥ G implies that Y is F_n-measurable. This is true for all n, so Y is F_{−∞}-measurable. On the other hand, G contains all F_{−∞}-measurable random variables X, so Y ⊥ G implies Y = 0. Next, we must show that the set of all X ∈ L_1(E) for which the convergence E^{F_n}[X] → E^{F_{−∞}}[X] holds is a closed linear space. This is proved from the maximal inequality (5.2.37), as in (5.3.20). Since convergence is clear for X ∈ G, it follows that convergence holds for all X ∈ L_1(E).

(5.3.37) (Directed index set.) Let J be a directed set, and let (F_t)_{t∈J} be a stochastic basis. Bochner norm convergence and Pettis norm convergence are metric, so the sequential sufficiency theorem applies to generalize results for convergence of these types.
(a) Let E be a Banach space, and C ⊆ E a closed bounded convex Radon-Nikodym set. If (X_t)_{t∈J} is an amart in C, then (X_t) converges in Pettis norm (5.3.7).
(b) Let C be a closed bounded convex Radon-Nikodym set. Suppose (X_t) is an amart that stops outside C [that is: if s ≤ t and X_s ∉ C, then X_s = X_t]. If either (X_t) satisfies condition (B), or (X_t) is uniformly absolutely continuous, then (X_t) converges in Pettis norm (5.3.8).
(c) Let E be a Banach space with the Radon-Nikodym property. Let (X_t) be a uniformly absolutely continuous amart such that sup_t E[||X_t||] < ∞. Then X_t converges in Pettis norm (5.3.24).
(d) Write F_∞ for the σ-algebra generated by ∪_{t∈J} F_t. For every random variable X in L_1(Ω, F, P; E), the martingale X_t = E^{F_t}[X] converges in Bochner norm to E^{F_∞}[X] (5.3.20).
(e) Let (X_t) be a reversed martingale. Write F_{−∞} for the σ-algebra ∩_{t∈J} F_t. Then X_t converges in Bochner norm to E^{F_{−∞}}[X_{t_0}], for any fixed index t_0 (5.3.36).
(5.3.38) (P_0 convergence.) Let (F_t)_{t∈J} be a stochastic basis and let E be a Banach space. The process (X_t) of E-valued random variables converges in probability to X_∞ iff

P{||X_t − X_∞|| > ε} → 0

for every ε > 0. A weaker notion is P_0 convergence, related to convergence in probability as Pettis norm convergence is related to Bochner norm convergence. The process (X_t) converges in P_0 to the random variable X_∞ iff:
for every ε > 0 there exists t_0 ∈ J such that for all t ≥ t_0 and all x* ∈ E* with ||x*|| ≤ 1,

P{|(x*, X_t − X_∞)| > ε} < ε.

Suppose the Banach space E has the Radon-Nikodym property. If X_t is an L_1-bounded E-valued amart, then X_t converges in P_0. To prove this, apply the directed-set version of the Riesz decomposition (5.2.27): X_t = Y_t + Z_t, where Y_t is a martingale and Z_t → 0 in Pettis norm. Then Y_t converges in probability and Z_t converges in Pettis norm. Both of these modes of convergence imply convergence in P_0 (Edgar [1982]).
(5.3.39) (Related processes.) As in the scalar case, there are in the vector case many additional generalizations and variants of the class of amarts. Such classes of processes may be more or less general, and correspondingly have less or more desirable properties. We mention here only a few of the many results along this line. Let E be a Banach space with the Radon-Nikodym property. If (X_n)_{n∈ℕ} is a process with values in E, for positive integers m ≤ n write

H_{mn} = E^{F_m}[X_n] − X_m.

Similarly, for bounded stopping times σ ≤ τ, write

H_{στ} = E^{F_σ}[X_τ] − X_σ.

The process (X_n) is a pramart if H_{στ} → 0 in probability, i.e.

lim_{σ∈Σ} sup_{τ∈Σ, τ≥σ} P{||H_{στ}|| > ε} = 0

for every ε > 0. Millet & Sucheston [1980b] introduced pramarts and proved that real-valued pramarts converge a.s. if they satisfy condition (d):

liminf_n ||X_n^+||_1 + liminf_n ||X_n^−||_1 < ∞.

For pramarts, this condition is strictly weaker than L_1-boundedness (but not for martingales or amarts). Pramarts have the optional sampling property, and therefore there is a continuous parameter theory (see Frangos [1985]). Millet & Sucheston [1980b], Egghe [1981], Slaby [1982] and Frangos [1985] considered almost sure convergence of Banach-valued pramarts and Banach-lattice-valued "subpramarts." The Banach-valued case is included in the result of Talagrand stated below. The process (X_n) is a martingale in the limit if H_{mn} → 0 a.s. as m → ∞, that is,

lim_{m→∞} sup_{n≥m} ||H_{mn}|| = 0   a.s.
Every pramart is a martingale in the limit. This definition is due to Mucci [1973] in the scalar-valued case. Bellow & Dvoretzky (unpublished) and Edgar [1979a] proved partial results on the a.s. convergence of martingales in the limit. These results were also improved by the result of Talagrand, which is stated below. The process (X_n) is a Talagrand mil if H_{σn} → 0 in probability, that is, for every ε > 0 there exists N such that for all bounded stopping times σ ≥ N and all integers n ≥ σ, we have P{||H_{σn}|| > ε} < ε. Every martingale in the limit (and therefore every pramart) is a Talagrand mil. Talagrand [1985] showed that (if E has the Radon-Nikodym property) all L_1-bounded Talagrand mils converge a.s. A large class of processes is the game which becomes fairer with time (GFT). The process (X_n) is a GFT if H_{mn} → 0 in probability, that is, for every ε > 0 there is N so that if n ≥ m ≥ N, then P{||H_{mn}|| > ε} < ε. The definition is due to Blake [1970] in the scalar case. This very general class contains all those mentioned above. Mucci [1973] and Subramanian [1973] proved independently (in the scalar case) that a uniformly integrable GFT converges in mean. The vector case can be proved from a Riesz decomposition as follows.
(5.3.40) Theorem. Let E be a Banach space with the Radon-Nikodym property. Let (X_n)_{n∈ℕ} be a uniformly integrable GFT. Then X_n converges in Bochner norm.
Proof. Write H_{mn} = E^{F_m}[X_n] − X_m for m ≤ n. Since H_{mn} → 0 in probability, and this double sequence is uniformly integrable, we get H_{mn} → 0 in Bochner norm. Thus, for a fixed m, the sequence (E^{F_m}[X_n])_{n=m}^∞ is Cauchy in Bochner norm. Therefore it converges in Bochner norm. Write Y_m for the limit, and Z_m = X_m − Y_m. Thus we have X_n = Y_n + Z_n, where (Y_n) is a uniformly integrable martingale, and (Z_n) converges to 0 in Bochner norm. By the Radon-Nikodym property, (Y_n) converges in Bochner norm.
(5.3.41) (Failure of strong convergence of amarts.) Let E = l_2 and let {e_{ni} : n ∈ ℕ, 1 ≤ i ≤ 2^n} be an orthonormal system in E. Let Ω = [0, 1) with Lebesgue measure, and let A_{ni} = [(i − 1)/2^n, i/2^n). Then

X_n = Σ_{i=1}^{2^n} e_{ni} 1_{A_{ni}}

is an amart potential, but does not converge a.s. (Chacon & Sucheston [1975]). In fact, Bellow [1976a] showed that norm convergence of all
bounded amarts in a Banach space E implies that E is finite-dimensional ((5.5.11(2)), below).
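A finite simulation (not from the text; the coordinates chosen for the test functional x* are an assumption made purely for illustration) makes the two modes of convergence in this example vivid: along any sample point ω, the values X_n(ω) are distinct orthonormal unit vectors, so ||X_{n+1}(ω) − X_n(ω)|| = √2 and there is no strong limit, yet against a fixed functional x* ∈ l_2 the scalar values tend to 0.

```python
# The example of (5.3.41): X_n(omega) = e_{n,i} when omega lies in the
# dyadic interval A_{ni}.  Vectors are stored sparsely, keyed by (n, i).

import math

def X(omega, n):
    """X_n(omega) = e_{n,i} for omega in A_{ni} (i counted from 0 here)."""
    i = int(omega * 2 ** n)
    return {(n, i): 1.0}

def dot(a, b):
    return sum(v * b.get(k, 0.0) for k, v in a.items())

def dist(a, b):
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))

omega = 0.6180339887  # an arbitrary sample point in [0, 1)

norms = [math.sqrt(dot(X(omega, n), X(omega, n))) for n in range(1, 12)]
steps = [dist(X(omega, n + 1), X(omega, n)) for n in range(1, 11)]

def xstar(coord):
    """A fixed functional in l_2: coordinate 2^{-n} on each e_{ni}
    (square-summable, since sum over n of 2^n * 4^{-n} is finite)."""
    n, _ = coord
    return 2.0 ** (-n)

# <x*, X_n(omega)> -> 0: scalar convergence despite norm divergence
scalars = [sum(v * xstar(k) for k, v in X(omega, n).items())
           for n in range(1, 12)]
```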
(5.3.42) (Condition (B) cannot be replaced by L_1-boundedness.) Let (Ω, F, P), e_{ni}, A_{ni} be as in the previous exercise. For 1 ≤ k ≤ 2^n, define

Y_{nk} = n e_{nk} 1_{A_{nk}} + (1/n) Σ_{i≠k} e_{ni} 1_{A_{ni}}.

Given m ∈ ℕ, define X_m = Y_{nk}, where m = 2^{n−1} + k − 1. Then (X_m) is an L_1-bounded amart potential, but does not converge weakly a.s. (This example, due to W. J. Davis, appeared in Chacon & Sucheston [1975].) Again in this case, it is true that if E is a Banach space, and every L_1-bounded amart converges weakly a.s., then E is finite-dimensional (5.5.11(3)).

(5.3.43) (Scalar convergence and weak a.s. convergence.) Let {e_n} be the usual unit vectors in the Banach space l_1, and let (Z_n) be an independent sequence of real random variables with P{Z_n = 1} = P{Z_n = −1} = 1/2. Define

X_n = 2^{−n} Σ_{j=0}^{2^n−1} Z_{2^n+j} e_{2^n+j}.
Then (X_n) is an amart potential with respect to its natural stochastic basis F_n = σ(X_1, X_2, ..., X_n); it converges scalarly to 0, but it converges weakly for (almost) no ω. The last point can be verified from the fact that (in l_1) weak and norm convergence are equivalent for sequences (Davis & Johnson [1977]). In fact, there is such an example in a separable Banach space E if and only if E* is not separable (5.5.26).

(5.3.44) (Submartingales in Banach lattices.) A Banach lattice has the Radon-Nikodym property if and only if all L_1-bounded positive submartingales converge a.s. The proof of necessity uses the Riesz decomposition (5.2.38) (Heinich [1978b]).

Remarks
The Radon-Nikodym property has become an important idea in the study of the geometry of Banach spaces. This began with Rieffel [1968] and culminated in the books by Diestel & Uhl [1977] and Bourgin [1983]. S. D. Chatterji [1960] proved almost sure convergence of martingales of the form E^{F_n}[X], where X is Bochner integrable. Convergence of L_1-bounded martingales taking values in a reflexive Banach space E was proved by F. Scalora [1961]. The same result, assuming only that E has the Radon-Nikodym property, appears as a remark in A. & C. Ionescu Tulcea [1963] (page 121; see also Bellow [1984]). The converse, in a stronger form (namely, that convergence in Bochner norm of uniformly integrable martingales implies the Radon-Nikodym property) was proved by Ronnow [1967] and Chatterji [1968]. Chatterji in 1968 was apparently unaware of the Ionescu Tulcea convergence result, which he reproved. His article was influential in that it emphasized the importance of the Radon-Nikodym property.
The convergence theorem for uniform amarts (5.3.23) is due to Bellow [1978a]. The Pettis norm convergence of L_1-bounded uniformly absolutely continuous amarts (5.3.24) is due to Uhl [1977]. The result that a separable dual has the Radon-Nikodym property is due to Dunford & Pettis [1940]. Proposition (5.3.31) is due to Brunel & Sucheston [1976b]. The amart convergence theorem in Banach space (5.3.10) is due to Chacon & Sucheston [1975]. Theorem (5.3.27) is from Davis, Ghoussoub, Johnson, Kwapien & Maurey [1990]. Theorem (5.3.26) is due to Marraffa [1988].
5.4. Geometric properties

This section, even more than the last one, is an application of the martingale (and related) results to the geometry of Banach spaces. Martingales have found many varied uses in the study of the geometry of Banach spaces, and we will describe only a few of them here. We will consider extreme points for closed bounded convex sets, and the integral representation that is available for sets with the Radon-Nikodym property. Another use of Radon-Nikodym sets is in connection with the Ryll-Nardzewski fixed-point theorem. We will discuss a geometric characterization of the Radon-Nikodym property in terms of "dentability." Next we will provide a martingale proof for the "strongly exposed point" characterization of the Radon-Nikodym property. The following result is a simple consequence of the Hahn-Banach theorem.
(5.4.1) Proposition. Let E be a Banach space, and suppose x*, y* ∈ E* with ||x*|| = ||y*|| = 1. Let

a = sup{ |x*(x)| : ||x|| ≤ 1, y*(x) = 0 }.

Then either ||x* − y*|| ≤ 2a or ||x* + y*|| ≤ 2a.

Proof. Let E_1 = { x ∈ E : y*(x) = 0 }. Then E_1 is a subspace of E. The definition of a shows that the norm of the restriction of x* to E_1 is a. Let z* ∈ E* be a Hahn-Banach extension of this restriction, so that ||z*|| = a. Now x* − z* vanishes on the null space E_1 of y*, so x* − z* = βy* for some β ∈ ℝ. Then

|1 − |β|| = | ||x*|| − ||x* − z*|| | ≤ ||z*|| = a.

If β ≥ 0, then

||x* − y*|| = ||(β − 1)y* + z*|| ≤ |β − 1| + ||z*|| ≤ 2a;

and if β < 0, then

||y* + x*|| = ||(1 + β)y* + z*|| ≤ |1 + β| + ||z*|| ≤ 2a.
We will need a selection theorem such as the following variant of the Yankov-von Neumann selection theorem. An expository account of it can be found in Section 8.5 of Cohn [1980].
(5.4.2) Theorem. Let (Ω, F, P) be a complete probability space, let R and S be complete separable metric spaces, and let X: Ω → S be a Borel measurable random variable. Suppose D ⊆ R is a nonempty Borel set, and f: D → S is a Borel function. Then {X ∈ f(D)} ∈ F and there is a Borel measurable random variable Y: Ω → R such that f(Y) = X on the set {X ∈ f(D)}.
Some processes appearing below are very close to martingales. By the following lemma, they converge almost surely if martingales do.
(5.4.3) Lemma. Let (X_n) be an E-valued Bochner integrable process such that ||E^{F_n}[X_{n+1}] − X_n|| ≤ 2^{−n} for all n, and let Y_n = lim_{m→∞} E^{F_n}[X_{n+m}]. Then (Y_n) is a martingale, and ||X_n − Y_n|| ≤ 2^{−n+1} for all n.

Proof.

E^{F_n}[X_{n+m}] − X_n = E^{F_n}[E^{F_{n+m−1}}[X_{n+m}] − X_{n+m−1}]
  + E^{F_n}[E^{F_{n+m−2}}[X_{n+m−1}] − X_{n+m−2}]
  + ... + E^{F_n}[E^{F_n}[X_{n+1}] − X_n].

Since the conditional expectation is a contraction on L_∞,

||E^{F_n}[X_{n+m}] − X_n|| ≤ 2^{−n−m+1} + 2^{−n−m+2} + ... + 2^{−n} < 2^{−n+1}.

A similar computation shows that Y_n = lim_{m→∞} E^{F_n}[X_{n+m}] exists. Now the conditional expectation is continuous on L_1(E), so

E^{F_n}[Y_{n+1}] = E^{F_n}[lim_m E^{F_{n+1}}[X_m]] = lim_m E^{F_n}[E^{F_{n+1}}[X_m]] = lim_m E^{F_n}[X_m] = Y_n,

hence (Y_n) is a martingale.
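As a sanity check (not from the text) on the bound ||X_n − Y_n|| ≤ 2^{−n+1}, consider the degenerate case in which every F_n is trivial: then X_n is a constant x_n, the hypothesis reads |x_{n+1} − x_n| ≤ 2^{−n}, and the conclusion is the usual geometric tail estimate |x_n − lim x_m| ≤ 2^{−n+1}. The sketch below verifies this numerically for a randomly generated sequence satisfying the hypothesis.

```python
# Geometric tail estimate behind Lemma (5.4.3), trivial-filtration case:
# |x_{n+1} - x_n| <= 2^{-n}  implies  |x_n - lim x_m| <= 2^{-n+1}.

import random

random.seed(1)
N = 30
x = [0.5]
for n in range(1, N):
    # increment from x_n to x_{n+1} bounded by 2^{-n} in absolute value
    x.append(x[-1] + random.uniform(-1, 1) * 2.0 ** (-n))

y = x[-1]  # stands in for the limit; the tail beyond N is negligible here
checks = [(abs(x[n - 1] - y), 2.0 ** (-n + 1)) for n in range(1, N)]
# each gap |x_n - y| stays below the bound 2^{-n+1}
```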
The study of convex sets is often coupled with the study of extreme points. Let C be a convex set in a vector space E. The point x ∈ C is called an extreme point of C if

x = (1/2)(y + z) and y, z ∈ C   imply   x = y = z.

We will write ex C for the set of all extreme points of the set C.

The Choquet-Edgar theorem
A theorem of Minkowski states that if C is a closed bounded set in a finite-dimensional Euclidean space, and x_0 ∈ C, then x_0 can be written as a convex combination of extreme points of C. While this is no longer strictly true in an infinite-dimensional space, there is a theorem of Choquet showing that one can still sometimes write a point of a convex set as a generalized average of extreme points of the set. Here is the variant of Choquet's theorem due to Edgar. Notice the method used to construct the martingale. A similar method of construction will be used again below.
(5.4.4) Theorem. Let E be a Banach space. Let C be a separable closed bounded convex set with the Radon-Nikodym property. Let x_0 ∈ C. Then there is a random variable X: [0, 1] → C such that
(1) X(t) ∈ ex C for almost all t;
(2) ∫_0^1 X(t) dt = x_0.

Proof. The proof is in three parts: (a) construction of a martingale; (b) the martingale converges; (c) the limit has the properties required.
(a) First, there is a strictly convex bounded continuous function ψ: C → ℝ. (That is, ψ((1/2)(x + y)) ≤ (1/2)(ψ(x) + ψ(y)), with equality only if x = y.) For example, since C is separable, there is a countable set {x_i*} ⊆ E* of norm-one functionals that separates the points of C. Then

ψ(x) = Σ_{i=1}^∞ 2^{−i} (x_i*(x))^2

is strictly convex. We will construct a martingale (X_n) in C recursively, using the function ψ to measure how "close to the extreme points" a random variable is. The probability space (Ω, F, P) will be the interval [0, 1) with Lebesgue measure. The σ-algebra F will be the class of Lebesgue measurable sets, and for each n ∈ ℕ, F_n will be a finite σ-algebra whose atoms are subintervals of [0, 1). We begin with X_1(t) = x_0 for all t ∈ [0, 1). Let F_1 = {∅, Ω}. Now we want to write x_0 as (1/2)(y_0 + z_0) for some y_0, z_0 ∈ C (if x_0 is an extreme point of C, the only way to do this is y_0 = z_0 = x_0). Among all possible ways of doing this, we choose one representation x_0 = (1/2)(y_0 + z_0) that makes ψ large in the following sense:

(1/2)(ψ(y_0) + ψ(z_0)) − ψ(x_0) ≥ sup{ (1/2)(ψ(y) + ψ(z)) − ψ(x_0) : y, z ∈ C, x_0 = (1/2)(y + z) } − 2^{−1}.

Let

X_2 = y_0 1_{[0,1/2)} + z_0 1_{[1/2,1)},

and let F_2 be the σ-algebra with the two atoms [0, 1/2) and [1/2, 1). Then clearly E^{F_1}[X_2] = X_1.
Now suppose that X_n has been defined, for some n ≥ 2, as

X_n = Σ_{i=1}^{2^{n−1}} x_i 1_{[(i−1)/2^{n−1}, i/2^{n−1})},
where x_i ∈ C, and that F_n is the σ-algebra with atoms [(i−1)/2^{n−1}, i/2^{n−1}) for 1 ≤ i ≤ 2^{n−1}. For each x_i, choose y_i and z_i in C so that x_i = (1/2)(y_i + z_i) and

(5.4.4a)  (1/2)(ψ(y_i) + ψ(z_i)) − ψ(x_i) ≥ sup{ (1/2)(ψ(y) + ψ(z)) − ψ(x_i) : y, z ∈ C, x_i = (1/2)(y + z) } − 2^{−n}.

Then write

X_{n+1} = Σ_{i=1}^{2^{n−1}} ( y_i 1_{[(2i−2)/2^n, (2i−1)/2^n)} + z_i 1_{[(2i−1)/2^n, 2i/2^n)} )
and take F_{n+1} as the σ-algebra whose atoms are the intervals appearing in the subscripts. A little thought will show that E^{F_n}[X_{n+1}] = X_n. This completes the recursive construction of the martingale (X_n).
(b) The martingale just constructed has its values in the set C. Since C is a Radon-Nikodym set, the martingale converges a.s. and in Bochner norm (5.3.22). Write X = lim_{n→∞} X_n.
(c) We claim that the random variable X has the properties stated in the theorem. Since (X_n) is a uniformly bounded martingale, we have E[X] = E[E^{F_1}[X]] = E[X_1] = x_0. The proof of property (1) is more involved. Suppose (for purposes of contradiction) that X(t) is not a.s. extreme. Write Δ = { (x, x) ∈ C × C : x ∈ C }. The set D = C × C \ Δ is a Borel set in the complete separable metric space C × C, and the map f: D → C \ ex C defined by f(x, y) = (1/2)(x + y) is continuous. Thus by the selection theorem (5.4.2), there is a random variable X′: Ω → C × C such that f(X′(t)) = X(t) a.s. on the set {X ∉ ex C}; writing X′(t) = (Y(t), Z(t)), we have X(t) = (1/2)(Y(t) + Z(t)), and Y(t) = Z(t) only if X(t) ∈ ex C. Therefore P{Y ≠ Z} > 0. But then P{ (1/2)(ψ(Y) + ψ(Z)) > ψ(X) } > 0, so

(5.4.4b)  E[(1/2)(ψ(Y) + ψ(Z))] ≥ E[ψ(X)] + 2^{−m}

for some m ∈ ℕ. We may define Y_n = E^{F_n}[Y] and Z_n = E^{F_n}[Z] to obtain martingales (Y_n) and (Z_n) in C with X_n = (1/2)(Y_n + Z_n). But Y_n converges to Y and Z_n converges to Z, so by the continuity (and boundedness) of ψ, (5.4.4b) implies that there is n_0 > m such that for all n ≥ n_0,

E[(1/2)(ψ(Y_n) + ψ(Z_n))] ≥ E[ψ(X_n)] + 2^{−m}.
By (5.4.4a), we have

E^{F_n}[ψ(X_{n+1})] − ψ(X_n) ≥ (1/2)(ψ(Y_n) + ψ(Z_n)) − ψ(X_n) − 2^{−n},

so E[ψ(X_{n+1}) − ψ(X_n)] ≥ 2^{−m} − 2^{−n} for all n ≥ n_0. But X_n converges, and ψ is bounded and continuous, so by the dominated convergence theorem

E[ψ(X_{n+1}) − ψ(X_n)] → 0.

This is a contradiction.

Common fixed points for noncommuting maps

The Schauder fixed-point theorem asserts that a weakly continuous map of a weakly compact convex set C into itself has a fixed point. The Markov-Kakutani fixed-point theorem asserts that a commuting family of continuous affine maps of a weakly compact convex set C into itself has a common fixed point. The Ryll-Nardzewski fixed-point theorem is a generalization
to certain noncommuting families of maps. A martingale proof of a key element of the theorem is given here.
(5.4.5) Definition. Let C be a convex set. A map S: C → C is affine if

S( Σ_{i=1}^n t_i x_i ) = Σ_{i=1}^n t_i S x_i

for x_i ∈ C, t_i ≥ 0, Σ t_i = 1.

(5.4.6) Definition. A family S of maps from C to itself is distal if for
any pair x, y ∈ C with x ≠ y, we have inf_{S∈S} ||Sx − Sy|| > 0.

(5.4.7) Theorem. Let C be a closed bounded convex Radon-Nikodym set in a Banach space E. Let S_1, S_2, ... be (not necessarily continuous) affine maps from C into itself. Suppose that {S_1, S_2, ...} generates a distal semigroup. Let x_0 ∈ C be a fixed point for the map S = Σ_{i=1}^∞ 2^{−i} S_i. Then x_0 is also a fixed point for each of the maps S_i.
Proof. Let (U_n) be an independent sequence of random variables with values in the countable set {S_1, S_2, ...} and P{U_n = S_i} = 2^{−i} for all i. Note that E[U_n x] = Sx for x ∈ C. Let F_n be the σ-algebra generated by U_1, U_2, ..., U_n. Define X_n: Ω → C by

X_n(ω) = U_1(ω) U_2(ω) ... U_n(ω) x_0.

We claim that (X_n) is a martingale. (In the following calculation, measurability is clear, since the σ-algebras F_n are atomic.)

E^{F_n}[X_{n+1}] = E^{F_n}[U_1 U_2 ... U_n U_{n+1} x_0] = U_1 U_2 ... U_n E[U_{n+1} x_0] = U_1 U_2 ... U_n S x_0 = U_1 U_2 ... U_n x_0 = X_n.
Suppose S_i x_0 ≠ x_0 for some i. There is δ > 0 so that ||S S_i x_0 − S x_0|| ≥ δ for all S in the semigroup generated by {S_1, S_2, ...}. Now P{U_{n+1} = S_i} = 2^{−i}, and on the set {U_{n+1} = S_i} we have

||X_{n+1} − X_n|| = ||U_1 U_2 ... U_n S_i x_0 − U_1 U_2 ... U_n x_0|| ≥ δ,

so P{||X_{n+1} − X_n|| ≥ δ} ≥ 2^{−i}. By Egorov's theorem, the martingale (X_n) does not converge a.s. This contradicts the fact that C is a Radon-Nikodym set.
The conclusion can be extended to include uncountable sets of maps by the usual abstract considerations. We will outline them briefly.
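A toy illustration (not from the text) of the divergence mechanism in the preceding proof: take S_1(x) = x + 1 and S_2(x) = x − 1 on C = ℝ. These affine maps generate a distal semigroup, and the averaged map fixes x_0 = 0, but C is unbounded (hence not a bounded Radon-Nikodym set), and the random product X_n = U_1 ··· U_n x_0 is a simple random walk: every increment has size 1, so (X_n) is a martingale that converges nowhere. (Equal weights 1/2 are used below for simplicity, instead of the weights 2^{−i} of the theorem.)

```python
# Random products of affine maps, as in the proof of (5.4.7), specialized
# to the translations S1(x) = x + 1, S2(x) = x - 1 on R.

import random

random.seed(7)

def random_product_path(n_steps, x0=0.0):
    """X_n = U_1 U_2 ... U_n x0, each U_k drawn uniformly from {S1, S2}."""
    s1 = lambda x: x + 1.0
    s2 = lambda x: x - 1.0
    x, path = x0, [x0]
    for _ in range(n_steps):
        u = random.choice((s1, s2))   # E[U x] = x: the martingale property
        x = u(x)
        path.append(x)
    return path

path = random_product_path(1000)
steps = [abs(b - a) for a, b in zip(path, path[1:])]
# every step has size exactly 1, so the path is never Cauchy
```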
(5.4.8) Corollary. Let E be a Banach space, and let C ⊆ E be closed, bounded, and convex. Assume:
(1) C has the affine fixed-point property; that is, any weakly continuous affine map of C into itself has a fixed point.
(2) C has Corson's property (C); that is, any family of closed convex subsets of C with the finite intersection property has nonempty intersection.
(3) C is a Radon-Nikodym set.
Then any distal semigroup S of weakly continuous affine maps from C to itself has a common fixed point.

Proof. For each S ∈ S, let F(S) = { x ∈ C : Sx = x }. By the affine fixed-point property, each F(S) is nonempty. Clearly, each F(S) is closed and convex. Now C is a Radon-Nikodym set, so by Theorem (5.4.7), the family { F(S) : S ∈ S } has the countable intersection property. Therefore, by property (C), the family has a nonempty intersection; that is, S has a common fixed point.
Finally, we can use a single-map fixed-point theorem to deduce the Ryll-Nardzewski fixed-point theorem itself.

(5.4.9) Theorem. Let C be a weakly compact convex set, and let S be a distal semigroup of weakly continuous affine maps from C into itself. Then S has a common fixed point.

Proof. Weakly compact sets have property (C) [since closed convex sets are weakly closed] and are Radon-Nikodym sets (5.3.33). The fixed-point property is a version of the Schauder fixed-point theorem (Dunford & Schwartz [1958], V.10.5). Thus, Corollary (5.4.8) applies to this situation.

Dentability

In the geometric arguments below we will use the following notation. Let E be a Banach space. The closed ball about a point x with radius r is denoted

B(x, r) = { y ∈ E : ||y − x|| ≤ r }.
Banach-valued random variables
Let C ⊆ E be a nonempty bounded set. A slice of C is a set of the form

S(C, x*, α) = { y ∈ C : x*(y) > sup x*(C) − α }

for some x* ∈ E* and some α > 0. Here sup x*(C) = sup { x*(y) : y ∈ C } by definition. Dentability is a purely geometric concept that turns out to be related to the Radon-Nikodym property and to martingale convergence.
(5.4.10) Definition. Let C be a nonempty closed bounded subset of a Banach space E. We say that C is dentable if for every ε > 0 there is a point x_0 ∈ C such that x_0 ∉ cl conv (C \ B(x_0, ε)).
The Hahn-Banach separation theorem shows how slices are related to dentability. This proof is given in great detail; future uses of the Hahn-Banach theorem will be less verbose.
(5.4.11) Proposition. Let E be a Banach space, and let C ⊆ E be a closed, bounded, and nonempty set. Then C is dentable if and only if C admits slices of arbitrarily small diameter.
Figure (5.4.11) A slice of a dentable set.

Proof. Suppose C is dentable. Let ε > 0 be given. Then there is x_0 ∈ C so that x_0 ∉ cl conv (C \ B(x_0, ε)). By the Hahn-Banach separation theorem, there are x* ∈ E* and γ ∈ ℝ such that

sup x*(C \ B(x_0, ε)) < γ < x*(x_0)

(see Figure (5.4.11)). But then { y ∈ C : x*(y) > γ } is a slice of C with diameter at most 2ε.
5.4. Geometric properties
Conversely, suppose C has slices of arbitrarily small diameter. Let ε > 0 be given. Then there is a slice S(C, x*, 2α) of C with diameter less than ε. Let x_0 be any point of the smaller slice S(C, x*, α). Then

(C \ B(x_0, ε)) ∩ S(C, x*, α) = ∅,

so that C \ B(x_0, ε) ⊆ { y ∈ C : x*(y) ≤ sup x*(C) − α }. So also

cl conv (C \ B(x_0, ε)) ⊆ { y ∈ C : x*(y) ≤ sup x*(C) − α },

and x_0 is outside the closed convex hull.
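In a finite-dimensional space every bounded set is dentable, and the closed unit disk in ℝ² makes the proposition concrete. The following sketch (plain Python; the helper name and sampling scheme are ours, not from the text) approximates the diameter of the slice S(D, x*, α) of the unit disk for x*(y) = y_1 and shrinking α, and compares it with the exact chord length 2√(2α − α²) of the circular cap:

```python
import math

def slice_diameter(alpha, n=2000):
    """Approximate diameter of S(D, x*, alpha) for the closed unit disk D
    in R^2 and x*(y) = y[0], so sup x*(D) = 1.  The extreme points of the
    slice lie on the boundary circle, so sampling the boundary suffices."""
    pts = []
    for k in range(n + 1):
        t = 2 * math.pi * k / n
        p = (math.cos(t), math.sin(t))
        if p[0] > 1 - alpha:          # the slice condition x*(p) > 1 - alpha
            pts.append(p)
    return max(math.dist(p, q) for p in pts for q in pts)

for alpha in (0.5, 0.1, 0.01):
    approx = slice_diameter(alpha)
    exact = 2 * math.sqrt(2 * alpha - alpha ** 2)   # chord length of the cap
    print(alpha, round(approx, 4), round(exact, 4))
```

As α → 0 the slice diameters shrink to 0, which is exactly the "arbitrarily small slices" formulation of dentability.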
It will be useful to observe that a nonempty bounded set C is dentable if its closed convex hull is dentable, since every slice of cl conv C contains a slice of C of no larger diameter. The next theorem is probably the most often used geometric characterization of the Radon-Nikodym property.

(5.4.12) Theorem. Let E be a Banach space, and C ⊆ E a nonempty closed bounded convex set. Then C is a Radon-Nikodym set if and only if every nonempty closed convex subset of C is dentable.

Proof. Suppose that there is a nonempty closed bounded set D ⊆ C that is not dentable. Then cl conv D ⊆ C is not dentable, so we may assume D is convex. We will construct a quasimartingale on [0, 1) with values in D ⊆ C that diverges everywhere, yet is very close to a martingale and should converge a.s. by Lemma (5.4.3) together with (5.3.34). (Alternatively, the uniform amart convergence theorem (5.3.22) could be applied, since every quasimartingale is a uniform amart.)

Since D is not dentable, there is ε > 0 so that every slice of D has diameter greater than ε. Thus if x ∈ D, then D ∩ B(x, ε/2) contains no slice of D. This means that cl conv (D \ B(x, ε/2)) = D.

Let Ω = [0, 1), P = Lebesgue measure, and F = the Borel sets. A process on (Ω, F, P) will be constructed recursively. (The construction is similar to that of the martingale constructed in (5.4.4).) Choose x_0 ∈ D (possible since D is nonempty), let F_1 = { ∅, Ω }, and define X_1(ω) = x_0. Now suppose X_1, X_2, …, X_n have been defined, and X_n is a simple function measurable with respect to a σ-algebra F_n with atoms [0, t_1), [t_1, t_2), …, [t_{N−1}, 1).
Let x be one of the values of X_n; say X_n(ω) = x for ω ∈ [t_{j−1}, t_j). Now D = cl conv (D \ B(x, ε/2)), so there exist x_1, x_2, …, x_k ∈ D with ‖x − x_i‖ > ε/2 for all i, and scalars a_1, a_2, …, a_k > 0 with Σ_{i=1}^k a_i = 1 such that

‖ x − Σ_{i=1}^k a_i x_i ‖ < 2^{−n}.

Subdivide the interval [t_{j−1}, t_j) into disjoint subintervals with lengths proportional to the a_i, and define X_{n+1} on these intervals to have the values x_i. Thus:

‖X_n(ω) − X_{n+1}(ω)‖ > ε/2 for all ω ∈ Ω

and

‖X_n − E^{F_n}[X_{n+1}]‖ < 2^{−n} a.s.
This completes the recursive construction of the process (X_n)_{n∈ℕ}. Now (5.4.3) shows that C fails the Radon-Nikodym property.

Conversely, suppose that every nonempty closed convex subset of C is dentable. Then in fact every nonempty subset of C is dentable. We will show that C has the Radon-Nikodym property. Let (Ω, F, P) be a probability space, and let μ: F → E be an absolutely continuous vector measure with average range in C. For each set M ∈ F with P(M) > 0 we will write the average range of μ on M this way:

a(M) = { μ(W)/P(W) : W ∈ F, W ⊆ M, P(W) > 0 }.

We will prove first:

(5.4.12a) for every ε > 0 and every M ∈ F with P(M) > 0, there is F ∈ F with F ⊆ M, P(F) > 0, and diam a(F) < ε.
Now a(M) ⊆ C, so it is dentable. Thus there is a slice

S = { y ∈ a(M) : x*(y) > γ }

with S ≠ ∅ and diam S < ε. Now x*∘μ is a scalar measure and x*∘μ ≪ P, so by the scalar Radon-Nikodym theorem there is Z ∈ L_1(P) so that

x*(μ(W)) = E[Z 1_W] for all W ∈ F.

Let F = {Z > γ} ∩ M. Then P(F) > 0 since S ≠ ∅. And a(F) ⊆ S: indeed, for W ⊆ F with P(W) > 0 we have μ(W)/P(W) ∈ a(M) and

x*( μ(W)/P(W) ) = E[Z 1_W]/P(W) ≥ E[γ 1_W]/P(W) = γ,

so μ(W)/P(W) ∈ S. This completes the proof of (5.4.12a).
Now take a maximal pairwise disjoint collection {M_i} of sets in F of positive measure with diam a(M_i) < ε. By (5.4.12a) and maximality, the sets M_i cover Ω up to a null set. We have now proved:

(5.4.12b) for every ε > 0, there is a countable partition {M_i} of Ω with diam a(M_i) < ε for all i.

Thus we can choose a sequence ({M_i^n}_i)_{n∈ℕ} of partitions, increasing with respect to refinement, such that diam a(M_i^n) < 2^{−n−1}. Define

X_n = Σ_i [ μ(M_i^n)/P(M_i^n) ] 1_{M_i^n}.
Now if M ∈ F, we have

‖ E[X_n 1_M] − μ(M) ‖ = ‖ Σ_i [ μ(M_i^n)/P(M_i^n) ] P(M_i^n ∩ M) − Σ_i μ(M_i^n ∩ M) ‖
  ≤ Σ_i ‖ μ(M_i^n)/P(M_i^n) − μ(M_i^n ∩ M)/P(M_i^n ∩ M) ‖ P(M_i^n ∩ M)
  ≤ 2^{−n−1},

since both quotients belong to a(M_i^n), a set of diameter less than 2^{−n−1}, and the sum (over those i with P(M_i^n ∩ M) > 0) of the P(M_i^n ∩ M) is at most 1.
Thus E[X_n 1_M] → μ(M). Next, the estimates we have will be used to show that the process (X_n)_{n∈ℕ} converges uniformly. If ω ∈ Ω and n ∈ ℕ, then ω ∈ M_i^n for some i and ω ∈ M_j^{n+1} for some j. But

X_n(ω) = μ(M_i^n)/P(M_i^n) ∈ a(M_i^n),

and, since M_j^{n+1} ⊆ M_i^n,

X_{n+1}(ω) = μ(M_j^{n+1})/P(M_j^{n+1}) ∈ a(M_i^n).

Therefore ‖X_n(ω) − X_{n+1}(ω)‖ < 2^{−n−1}. Let Y be the uniform limit of the sequence X_n. Then, for any M ∈ F, we have both E[X_n 1_M] → μ(M) and E[X_n 1_M] → E[Y 1_M], so μ(M) = E[Y 1_M]; that is, Y = dμ/dP.

(5.4.13) Theorem. Let E be a Banach space. Then E has the Radon-Nikodym property if and only if every nonempty closed bounded convex subset of E is dentable.

Proof. Combine Theorem (5.4.12) with Proposition (5.3.3).

Strongly exposed points

There is a more precise characterization of Radon-Nikodym sets in terms of strongly exposed points.
(5.4.14) Definition. Let C be a nonempty closed bounded convex set. A point x_0 ∈ C is a strongly exposed point of C if there is x* ∈ E* (the strongly exposing functional) such that x*(x_0) = sup x*(C), and whenever x_n ∈ C is a sequence with lim_{n→∞} x*(x_n) = x*(x_0), then lim_{n→∞} ‖x_n − x_0‖ = 0.
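For a concrete instance (ours, not from the text): on the closed unit disk in ℝ², the point x_0 = (1, 0) is strongly exposed by x*(y) = y_1, because a direct computation gives ‖x − x_0‖² ≤ 2(x*(x_0) − x*(x)) for every x in the disk, so x*(x_n) → 1 forces x_n → x_0. A quick numeric check of that inequality:

```python
import math, random

def exposes(x):
    """Check ||x - x0||^2 <= 2*(x*(x0) - x*(x)) for x0 = (1, 0) and
    x*(y) = y[0]; this quantitative bound is what makes x0 strongly
    exposed: x*(x_n) -> 1 forces ||x_n - x0|| -> 0."""
    dist2 = (x[0] - 1.0) ** 2 + x[1] ** 2
    return dist2 <= 2 * (1.0 - x[0]) + 1e-12

random.seed(0)
for _ in range(10_000):
    # a random point of the closed unit disk
    r, t = math.sqrt(random.random()), 2 * math.pi * random.random()
    assert exposes((r * math.cos(t), r * math.sin(t)))
print("strongly exposing inequality holds at 10000 random disk points")
```

The inequality follows from x_1² + x_2² ≤ 1: ‖x − x_0‖² = (x_1 − 1)² + x_2² ≤ (1 − x_1)² + (1 − x_1)(1 + x_1) = 2(1 − x_1).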
Clearly, a strongly exposed point of C is an extreme point of C. If C has a strongly exposed point, then C is dentable. The geometric criterion we will prove is due to Phelps and Bourgain: a set C is a Radon-Nikodym set if and only if every nonempty closed bounded convex subset of C is the closed convex hull of its strongly exposed points. To begin the proof, we have a lemma, which we prove using Lemma (5.4.3). Recall the notation for slices:

S(C, x*, α) = { y ∈ C : x*(y) > sup x*(C) − α }.
(5.4.15) Lemma. Let E be a Banach space, and let C ⊆ E be a nonempty closed bounded convex Radon-Nikodym set. Let x* ∈ E* with ‖x*‖ = 1, let ε > 0, and let 0 < η < α. Write M = sup x*(C). Then there is x_0 ∈ S(C, x*, η) such that

x_0 ∉ cl conv [ (C \ B(x_0, ε)) ∪ { x : ‖x − x_0‖ ≤ 1, x*(x) ≤ M − α } ].

Figure (5.4.15) Illustration for Lemma (5.4.15); the two hyperplanes shown are x* = M − η and x* = M − α.

Proof. For x_0 ∈ C, let W(x_0) = (C \ B(x_0, ε)) ∪ { x : ‖x − x_0‖ ≤ 1, x*(x) ≤ M − α } (the shaded portion in Figure (5.4.15)). Assume the conclusion is false. Let x_0 ∈ C be such that x*(x_0) > M − η, and let δ = x*(x_0) − M + η > 0. Then x_0 ∈ cl conv W(x_0), so x_0 is as close as we like to a convex combination of elements x_1, x_2, …, x_k ∈ W(x_0); so for each x_i, either ‖x_0 − x_i‖ > ε, or both ‖x_0 − x_i‖ ≤ 1 and x*(x_i) ≤ M − α. If the second case holds, then

α − η < x*(x_0) − x*(x_i) ≤ ‖x*‖ ‖x_0 − x_i‖.

Hence for all x_i we have ‖x_i − x_0‖ ≥ min{ε, α − η}.
The above observation may be used to construct a bounded process (X_n) on Ω = [0, 1] that stops when it leaves the slice S(C, x*, η) and satisfies:
(1) X_1(ω) = x_0 a.s.;
(2) ‖E^{F_n}[X_{n+1}] − X_n‖ ≤ δ 2^{−n−1};
(3) if x*(X_n(ω)) > M − η, then X_n(ω) ∈ C and either ‖X_{n+1}(ω) − X_n(ω)‖ > ε or both x*(X_{n+1}(ω)) ≤ M − α and ‖X_n(ω) − X_{n+1}(ω)‖ ≤ 1;
(4) if x*(X_n(ω)) ≤ M − η, then X_{n+1}(ω) = X_n(ω).

Now if x*(X_n(ω)) > M − η, then ‖X_{n+1}(ω) − X_n(ω)‖ ≥ min{ε, α − η}. But C has the Radon-Nikodym property, so X_n converges a.s. by (5.4.3); so by (3), almost surely x*(X_n(ω)) ≤ M − η eventually. Write X_∞(ω) for the limit, so that x*(X_∞(ω)) ≤ M − η a.s., hence x*(E[X_∞]) ≤ M − η. But

‖E[X_∞] − x_0‖ ≤ Σ_{n=1}^∞ ‖E[X_{n+1}] − E[X_n]‖ ≤ δ/2,

so that M − η ≥ x*(E[X_∞]) ≥ x*(x_0) − δ/2 > M − η, a contradiction.

The preceding lemma can be used to prove a geometric consequence.
(5.4.16) Lemma. Let E be a Banach space, let C ⊆ E be a nonempty closed bounded convex Radon-Nikodym set, and let θ > 0. Then the set

A_θ = { x* ∈ E* : ‖x*‖ = 1 and diam S(C, x*, α) ≤ θ for some α > 0 }

is dense in the unit sphere of E*.

Proof. Let x* ∈ E* with ‖x*‖ = 1, and let δ > 0. We will show that there is y* ∈ A_θ with ‖x* − y*‖ < δ. We may assume that C is contained in the unit ball and is not just a single point. We may also assume that x* is not constant on C, since functionals not constant on C are dense in the unit sphere of E*. Let ε > 0 be so small that

2ε ≤ θ,  4ε < δ,  11ε < sup x*(C) − inf x*(C).

Write η = ε, α = 2ε, and M = sup x*(C). Then apply the preceding lemma: there is x_0 ∈ S(C, x*, ε) such that x_0 ∉ cl conv W, where

W = (C \ B(x_0, ε)) ∪ { x : ‖x − x_0‖ ≤ 1, x*(x) ≤ M − α }.

Now by the Hahn-Banach separation theorem there is y* ∈ E*, ‖y*‖ = 1, with sup y*(W) < γ < y*(x_0). (The hyperplane {y* = γ} is shown in the illustration.) Then for all x ∈ C with y*(x) > γ we have ‖x − x_0‖ ≤ ε; that is, y* ∈ A_θ. It remains only to show that ‖x* − y*‖ < δ. We will do this using (5.4.1). Now if x ∈ E with y*(x) = 0 and ‖x‖ ≤ 1, then y*(x_0 + x) = y*(x_0) > γ and
‖(x_0 + x) − x_0‖ = ‖x‖ ≤ 1, so x_0 + x ∉ W and therefore x*(x_0 + x) > M − α. But that means x*(x) > M − α − x*(x_0) ≥ M − α − M = −α. Similarly, using x_0 − x in place of x_0 + x, we see that x*(x) < α. Thus |x*(x)| < α for all x ∈ E with ‖x‖ ≤ 1 and y*(x) = 0. So by Proposition (5.4.1), either ‖x* + y*‖ ≤ 2α or ‖x* − y*‖ ≤ 2α. Since 2α = 4ε < δ, it remains only to show ‖x* + y*‖ > 4ε.
Suppose (for purposes of contradiction) that ‖x* + y*‖ ≤ 4ε. Now we have 11ε < sup x*(C) − inf x*(C) and M = sup x*(C), so there is x_1 ∈ C with x*(x_1) < M − 11ε. Then

y*(x_1) = −x*(x_1) + (x* + y*)(x_1) ≥ −x*(x_1) − ‖x* + y*‖ > −M + 11ε − 4ε = −M + 7ε.

But sup y*(C) ≤ y*(x_0) + ε, hence

y*(x_1) ≤ y*(x_0) + ε = −x*(x_0) + (x* + y*)(x_0) + ε < −M + ε + 4ε + ε = −M + 6ε.

So we have obtained the contradiction −M + 7ε < −M + 6ε.

(5.4.17) Theorem. Let E be a Banach space, and let C ⊆ E be a nonempty closed bounded convex set. If C has the Radon-Nikodym property, then C is the closed convex hull of its strongly exposed points.

Proof. Suppose C has the Radon-Nikodym property. We may assume that C is contained in the unit ball B(0, 1) of E. Let

A = { x* ∈ E* : ‖x*‖ = 1 and x* is a strongly exposing functional for C }.
Now if A_θ is defined as in Lemma (5.4.16), then we have A = ∩_{m=1}^∞ A_{1/m}. By Lemma (5.4.16), each A_{1/m} is dense (and open) in the unit sphere of E*, so by the Baire category theorem, A is itself dense in the unit sphere of E*. Let W be the closed convex hull of the strongly exposed points of C.

Suppose (for purposes of contradiction) that C ≠ W. Then there is a linear functional y* ∈ E* with ‖y*‖ = 1 and sup y*(W) < sup y*(C) = M. Now let ε > 0 satisfy sup y*(W) + 2ε < M, and choose x* ∈ A with ‖x* − y*‖ < ε. Say x* strongly exposes x_0 ∈ C. Then y*(x_0) ≥ x*(x_0) − ε and x*(x_0) > M − ε, so y*(x_0) > M − 2ε > sup y*(W). Thus x_0 ∉ W, a contradiction.
There is a converse of the preceding result. Suppose that each nonempty closed convex subset D of C is the closed convex hull of its strongly exposed points. Then C is a Radon-Nikodym set. In fact, by (5.4.12), it is enough that each nonempty closed convex D ⊆ C have at least one strongly exposed point, since a set with a strongly exposed point is necessarily dentable.
Complements
(5.4.18) (Locally convex spaces.) Suppose that F is a locally convex Hausdorff topological vector space. Let K ⊆ F be a nonempty compact convex metrizable subset. Then there is a linear subspace F_0 ⊆ F containing K, a Banach space E, and a continuous linear transformation T: F_0 → E such that T is a homeomorphism on K.

(5.4.19) (Choquet's theorem.) The preceding can be used to prove Choquet's theorem in one of its original forms: Suppose that F is a locally convex topological vector space. Let K ⊆ F be a nonempty compact convex metrizable subset. If x_0 ∈ K, then there is a random variable X: Ω → K such that P{X ∈ ex K} = 1 and the Pettis integral E[X] exists and is equal to x_0.

(5.4.20) (An application of Choquet's theorem.) A function f: ℤ → ℂ is called positive definite if

Σ_{j,k=−∞}^{∞} t_j t̄_k f(j − k) ≥ 0

for any choice of complex numbers {t_j}_{j=−∞}^{∞} with all but finitely many equal to 0. We will outline a use of Choquet's theorem to prove the following theorem of Herglotz: If f: ℤ → ℂ is positive definite, then there exists a finite measure μ on [0, 2π) such that

f(n) = ∫_0^{2π} e^{inθ} dμ(θ)

for all n ∈ ℤ. The following steps are used for the proof.

(a) Using sequences {t_j} with only one nonzero term, it can be seen that f(0) ≥ 0; and using sequences with only two nonzero terms, it can be seen that |f(n)| ≤ f(0) for all n.
seen that f (O) > 0, and using sequences with only two nonzero terms, it can be seen that If (n) I < f (0) for all n. (b) The set K = { f E l,,,, (7L) : f positive definite, f (0) = 1 } is a compact convex metrizable subset of the locally convex space l,,.(7L) with its weak* topology. (c) For 0 E [0, 27r) let go(n) =
einB
Then gg E ex K.
(d) Any extreme point f of K is of the form go. This can be seen as follows: Let a = Re f (1), and show that f ± g E K, where g(n) = 4 f (n + 1)  2 a f (n) + 4 f (n  1),
so that g = 0. Let /3 = Im f (1). Then f f h E K, where h(n) = 4i f (n + 1)  2 of (n)
 4i f (n  1),
so that h = 0. Finally, f (n + 1) = f (1) f (n) for all n, so that f has the form gg. (This proof is similar to a proof in Edgar [1983].)
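The easy direction of Herglotz's theorem can be checked numerically: if μ is a finite positive measure (discrete, say), then f(n) = ∫ e^{inθ} dμ(θ) is positive definite, since Σ_{j,k} t_j t̄_k f(j−k) = ∫ |Σ_j t_j e^{ijθ}|² dμ(θ) ≥ 0. A small sketch in Python; the particular weights and angles below are arbitrary sample data, not from the text:

```python
import cmath, random

# a discrete measure mu: point masses w_m at angles th_m in [0, 2*pi)
weights = [0.5, 1.2, 0.3]
angles = [0.7, 2.1, 5.0]

def f(n):
    """f(n) = integral of e^{i n theta} d mu(theta) for the discrete mu."""
    return sum(w * cmath.exp(1j * n * th) for w, th in zip(weights, angles))

def quad_form(t):
    """sum_{j,k} t_j * conj(t_k) * f(j - k); should be real and >= 0,
    since it equals sum_m w_m |sum_j t_j e^{i j th_m}|^2."""
    return sum(t[j] * t[k].conjugate() * f(j - k)
               for j in range(len(t)) for k in range(len(t)))

random.seed(1)
for _ in range(100):
    t = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(6)]
    q = quad_form(t)
    assert abs(q.imag) < 1e-9 and q.real >= -1e-9
print("f built from this mu is positive definite")
```

The check also illustrates step (a): f(0) is the total mass of μ, and |f(n)| ≤ f(0) by the triangle inequality.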
(5.4.21) (Krein-Milman property.) We say that a Banach space E has the Krein-Milman property if every closed bounded convex subset is the closed convex hull of its extreme points. (Or, equivalently, every nonempty closed bounded convex subset has at least one extreme point; see Bourgin [1983], Proposition 3.1.1.) Comparing this to Theorem (5.4.17), one might naturally conjecture that the Krein-Milman property is equivalent to the Radon-Nikodym property. This is still an open problem.

Remarks

The selection theorem (5.4.2) is due independently to Yankov [1941] and von Neumann [1949].

Choquet's theorem (for compact sets) is due to Choquet [1956]. The proof by two-point dilations follows Loomis [1975]. The generalization to Radon-Nikodym sets is due to Edgar [1975]. For further reading on Choquet's theorem, we recommend Phelps [1966]. Ryll-Nardzewski [1967] gave a proof of his fixed-point theorem using a "differentiation" argument. Namioka & Asplund [1967] realized that a condition like dentability could be used in the proof. The use of property (C) in our proof may be new. Property (C) holds, for example, in separable Banach spaces, in weakly compactly generated Banach spaces, and in many others; see Corson [1961], Pol [1980]. The Radon-Nikodym type of argument used here to prove the fixed-point theorem will also prove this variant (Namioka & Phelps [1975], Theorem 15): If C is a separable weak-star closed bounded convex subset of the dual F* of a Banach space F, then any distal semigroup of weak-star continuous affine maps of C into itself admits a common fixed point. The geometric characterization (5.4.13) was one of the first results showing that the Radon-Nikodym property is relevant in the study of the geometry of Banach spaces. First Rieffel [1968] showed that the Radon-Nikodym property follows from dentability of every subset. (This step was simplified by Girardi & Uhl [1990].) Then the converse was proved in small steps by Maynard [1973], Davis & Phelps [1974], and Huff [1974]. The geometric characterization (5.4.17) is due to Phelps [1974] and Bourgain [1977]; the use of martingales for the proof was suggested by Kunen & Rosenthal [1982].
5.5. Operator ideals

This section shows a few more of the connections between our subject matter and the geometric theory of Banach spaces. More knowledge of Banach space theory is required for an understanding of this section than was required in the previous sections.

One concept that has been much used in recent years in the study of Banach spaces is the "operator ideal." We will consider here primarily the following operator ideals: the absolutely summing operators, the Radon-Nikodym operators, and the Asplund operators.

Absolutely summing operators

The first ideal that we will consider, because of its connections with amarts, is the ideal of absolutely summing operators.
(5.5.1) Definition. Let E be a Banach space, and let {x_i}_{i=1}^∞ be a sequence of vectors in E. We say that the series Σ x_i is convergent if there is a vector y such that

lim_{n→∞} ‖ y − Σ_{i=1}^n x_i ‖ = 0.

The series Σ x_i is absolutely convergent if

Σ_{i=1}^∞ ‖x_i‖ < ∞.

The series Σ x_i is unconditionally convergent if the series Σ θ_i x_i converges for all choices of scalars θ_i with |θ_i| ≤ 1. (Or, equivalently, the series Σ θ_i x_i converges for all choices of signs θ_i = ±1.) The series Σ x_i is weakly unconditionally Cauchy if, for all x* ∈ E*,

Σ_{i=1}^∞ |x*(x_i)| < ∞.

An application of the closed graph theorem (5.1.3) shows that if Σ x_i is weakly unconditionally Cauchy, then there is a constant A with

Σ_{i=1}^∞ |x*(x_i)| ≤ A ‖x*‖ for all x* ∈ E*.

Indeed, the operator T: E* → l_1 defined by T(x*) = (x*(x_i))_{i=1}^∞ has closed graph, and is therefore bounded. In a Banach space, every absolutely convergent series is unconditionally convergent, and every unconditionally convergent series is weakly unconditionally Cauchy. (The converse implications hold if and only if E is finite-dimensional; see (5.5.8).) Consider the measure space ℕ with counting measure, and a function f: ℕ → E defined by f(n) = x_n. Then the series Σ x_n is absolutely convergent if and only if f is Bochner integrable (clear); and the series Σ x_n is unconditionally convergent if and only if f is Pettis integrable (see Lindenstrauss & Tzafriri [1977], Proposition 1.c.1(ii)).
(5.5.2) Definition. Let E and F be Banach spaces. The operator T: E → F is said to be absolutely summing if there is a constant C such that for all finite sequences {x_1, x_2, …, x_n} in E,

Σ_{i=1}^n ‖T x_i‖ ≤ C sup { Σ_{i=1}^n |x*(x_i)| : x* ∈ E*, ‖x*‖ ≤ 1 }.

(Stated another way: for f ∈ l_1(E), the Bochner norm of T∘f is at most a constant C times the Pettis norm of f.) The smallest such constant C is called the absolutely summing norm of T, and will be written π_1(T). Note that

sup_{‖x*‖≤1} Σ_{i=1}^n |x*(x_i)| = sup { ‖ Σ_{i=1}^n θ_i x_i ‖ : θ_i = ±1 },

so that (by the closed graph theorem) T is absolutely summing if and only if the unconditional convergence of a series Σ x_i implies the absolute convergence of Σ T x_i. A generalization of absolutely summing operators is stated next.
(5.5.3) Definition. Let p be a positive real number. The operator T: E → F is called p-absolutely summing if there is a constant C such that for all finite sequences {x_1, x_2, …, x_n} in E,

( Σ_{i=1}^n ‖T x_i‖^p )^{1/p} ≤ C sup { ( Σ_{i=1}^n |x*(x_i)|^p )^{1/p} : x* ∈ E*, ‖x*‖ ≤ 1 }.

The smallest such constant C is called the p-summing norm of T, and will be written π_p(T). The collection of p-absolutely summing operators is a Banach operator ideal in the sense given in the next proposition. An operator T: E → F is said to have rank one if it has the form

T(x) = x*(x) y for all x ∈ E,

for some x* ∈ E* and y ∈ F.

(5.5.4) Proposition. Let p ≥ 1.
(1) For any pair E, F of Banach spaces, the set Π_p(E, F) of all p-absolutely summing operators from E to F is a Banach space under the norm π_p.
(2) For any p-absolutely summing operator T, we have ‖T‖ ≤ π_p(T), with equality for rank one operators. In particular, if T is p-absolutely summing, then T is bounded.
(3) If T: E → F is p-absolutely summing, and Q: E_1 → E and R: F → F_1 are bounded operators, then the composition RTQ: E_1 → F_1 is p-absolutely summing, and π_p(RTQ) ≤ ‖R‖ π_p(T) ‖Q‖.
Proof. (2) Take n = 1 in the definition. If x ∈ E, then

‖Tx‖ ≤ π_p(T) sup { |x*(x)| : x* ∈ E*, ‖x*‖ ≤ 1 } = π_p(T) ‖x‖.

So we have ‖T‖ ≤ π_p(T). If T: E → F has rank one, then

T(x) = x_0*(x) y_0

for some x_0* ∈ E* and y_0 ∈ F with ‖x_0*‖ = 1. Thus ‖T‖ = ‖y_0‖, and for x_1, x_2, …, x_n ∈ E, we have

( Σ_{i=1}^n ‖T x_i‖^p )^{1/p} = ( Σ_{i=1}^n |x_0*(x_i)|^p ‖y_0‖^p )^{1/p} ≤ ‖T‖ sup { ( Σ_{i=1}^n |x*(x_i)|^p )^{1/p} : x* ∈ E*, ‖x*‖ ≤ 1 }.

Thus π_p(T) ≤ ‖T‖ in this case, so π_p(T) = ‖T‖.

(1) The only nontrivial assertion is completeness. Suppose (T_m) is a sequence of p-absolutely summing operators, Cauchy in the norm π_p. Then π_p(T_m) converges, say to a. By part (2), the sequence (T_m) is also Cauchy in the operator norm. So there is an operator T: E → F with ‖T_m − T‖ → 0. We claim that T_m → T also in the norm π_p. Now T_m x → Tx for all x ∈ E. If the finite set {x_1, x_2, …, x_n} is given, then we have

( Σ_{i=1}^n ‖T x_i‖^p )^{1/p} = lim_m ( Σ_{i=1}^n ‖T_m x_i‖^p )^{1/p}
  ≤ lim_m π_p(T_m) sup_{‖x*‖≤1} ( Σ_{i=1}^n |x*(x_i)|^p )^{1/p}
  = a sup_{‖x*‖≤1} ( Σ_{i=1}^n |x*(x_i)|^p )^{1/p}.

So T is p-absolutely summing. Given ε > 0, we may choose m_0 so that π_p(T_{m_0} − T_m) ≤ ε for all m ≥ m_0. The argument just used shows that π_p(T_{m_0} − T) ≤ ε. So we have proved that π_p(T_m − T) → 0.

(3) follows from the definition.
One of the most useful results on absolutely summing operators is:

(5.5.5) Pietsch factorization. Let p ≥ 1, and let E, F be Banach spaces. Write K for the unit ball of E* with its weak* topology. Then the operator T: E → F is p-absolutely summing if and only if there is a probability measure μ on K and a constant C such that

‖Tx‖ ≤ C ( ∫_K |x*(x)|^p dμ(x*) )^{1/p} for all x ∈ E.

Moreover, the smallest such constant C is π_p(T).
Proof. Suppose that T: E → F is p-absolutely summing, and π_p(T) = 1. Then ‖T‖ ≤ 1. For x ∈ E, define g_x ∈ C(K) by g_x(x*) = |x*(x)|^p. Consider two subsets of C(K) defined by

F_1 = { f ∈ C(K) : sup_{x*∈K} f(x*) < 1 },
F_2 = conv { g_x : x ∈ E, ‖Tx‖ = 1 }.

Then F_1 and F_2 are convex sets and F_1 is open. Since π_p(T) = 1, we have F_1 ∩ F_2 = ∅. So the two sets can be separated by a linear functional on C(K). By the Riesz representation theorem, there are a positive constant A and a signed measure μ on K with variation 1 so that

f ∈ F_1 implies ∫_K f dμ < A;
f ∈ F_2 implies ∫_K f dμ ≥ A.

Since F_1 contains all nonpositive functions, the measure μ is a positive measure. Since F_1 contains the open unit ball of C(K), we have A ≥ 1. Thus for any x ∈ E with ‖Tx‖ = 1 we have ∫_K |x*(x)|^p dμ(x*) ≥ 1, and (by homogeneity) for every x ∈ E we have ∫_K |x*(x)|^p dμ(x*) ≥ ‖Tx‖^p.

For the converse, suppose that μ and C exist. Then if x_1, x_2, …, x_n are in E, we have

Σ_{i=1}^n ‖T x_i‖^p ≤ C^p ∫_K Σ_{i=1}^n |x*(x_i)|^p dμ(x*) ≤ C^p sup_{x*∈K} Σ_{i=1}^n |x*(x_i)|^p.
(5.5.6) Proposition. Let p ≤ q, and suppose that T: E → F is a p-absolutely summing operator. Then T is also q-absolutely summing, and π_q(T) ≤ π_p(T).

Proof. By the Pietsch factorization theorem (5.5.5),

‖Tx‖ ≤ π_p(T) ( ∫_K |x*(x)|^p dμ(x*) )^{1/p}.

The function t ↦ t^{q/p} is convex, since p ≤ q, so by Jensen's inequality (2.3.10), we have

( ∫_K |x*(x)|^p dμ(x*) )^{1/p} ≤ ( ∫_K |x*(x)|^q dμ(x*) )^{1/q}.

Combining these two inequalities, we obtain

‖Tx‖ ≤ π_p(T) ( ∫_K |x*(x)|^q dμ(x*) )^{1/q}.

Therefore T is q-absolutely summing and π_q(T) ≤ π_p(T).
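The Jensen step above is just the monotonicity of power means over a probability measure: for p ≤ q and weights w_i summing to 1, (Σ w_i |v_i|^p)^{1/p} ≤ (Σ w_i |v_i|^q)^{1/q}. A numeric sanity check of this inequality (the data are random samples, for illustration only):

```python
import random

def power_mean(values, weights, p):
    """(sum_i w_i |v_i|^p)^(1/p) for a probability weight vector w."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)

random.seed(2)
for _ in range(200):
    n = 5
    values = [random.uniform(-3, 3) for _ in range(n)]
    raw = [random.random() + 1e-9 for _ in range(n)]
    total = sum(raw)
    weights = [r / total for r in raw]          # a probability measure on n points
    p, q = sorted(random.uniform(0.5, 4) for _ in range(2))
    assert power_mean(values, weights, p) <= power_mean(values, weights, q) + 1e-9
print("power means are nondecreasing in the exponent")
```

Note that the inequality needs μ to be a probability (or sub-probability) measure; it fails for general finite measures, which is why the Pietsch measure being a probability measure matters here.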
The Pietsch factorization theorem shows the existence of an actual "factorization" when certain projections exist. Since every closed linear subspace of a Hilbert space is the range of a projection, the case of 2-absolutely summing operators is particularly simple.

(5.5.7) Corollary. Let T: E → F be a 2-absolutely summing operator. Then there is a compact space K and a probability measure μ on K such that T has a factorization

T = S ∘ J_μ ∘ I :  E → C(K) → L_2(K, μ) → F,

where I: E → C(K) is an isometry, and J_μ: C(K) → L_2(K, μ) is the formal identity map.

Proof. Let K and μ be as in the Pietsch factorization theorem. The canonical isometry I: E → C(K) is defined by I(x)(x*) = x*(x) for x* ∈ K ⊆ E*. The inequality from the theorem shows that the map sending J_μ I(x) to Tx is a bounded map on the set J_μ I(E) ⊆ L_2(K, μ). So it can be extended by continuity to the closure. Composing with the orthogonal projection onto that closure extends it to a map S defined on all of L_2(K, μ). We obtain from this a version of the Dvoretzky-Rogers lemma.
(5.5.8) Theorem. Let E be a Banach space. The identity operator on E is absolutely summing if and only if E is finite-dimensional. Every unconditionally convergent series in E is absolutely convergent if and only if E is finite-dimensional.

Proof. If E is finite-dimensional, then the series property follows from the well-known one-dimensional case, so the identity operator is absolutely summing. Suppose every unconditionally convergent series in E is absolutely convergent. That means the identity map I: E → E is absolutely summing. Then I is also 2-absolutely summing (5.5.6), so by the previous corollary it factors through a Hilbert space. That means that E is linearly homeomorphic to a closed subspace of the Hilbert space. It therefore remains only to show that an infinite-dimensional Hilbert space contains a series that converges unconditionally but not absolutely. If {e_i} is an infinite orthonormal sequence, then the series

Σ_{i=1}^∞ e_i / i

converges unconditionally (Bessel's inequality), but does not converge absolutely.
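The example Σ e_i/i can be checked numerically in ℓ²: by orthonormality, the increment of any sign-flipped partial sum has norm (Σ_{m<i≤n} 1/i²)^{1/2}, independent of the signs, while the sum of the norms ‖e_i/i‖ = 1/i is the divergent harmonic series. A small sketch:

```python
import math

def tail_norm(m, n):
    """l2-norm of sum_{i=m+1}^{n} theta_i e_i / i -- independent of the
    signs theta_i = +-1, since the e_i are orthonormal."""
    return math.sqrt(sum(1.0 / i ** 2 for i in range(m + 1, n + 1)))

def harmonic(n):
    """sum of the norms ||e_i / i|| = 1/i for i = 1..n."""
    return sum(1.0 / i for i in range(1, n + 1))

# increments of the (sign-flipped) partial sums tend to 0 ...
print([round(tail_norm(10 ** k, 10 ** (k + 1)), 4) for k in range(4)])
# ... while the sum of norms grows without bound, like log n
print([round(harmonic(10 ** k), 2) for k in range(1, 5)])
```

So the partial sums are unconditionally Cauchy in ℓ² even though Σ ‖e_i/i‖ = ∞, which is exactly the Dvoretzky-Rogers phenomenon in an infinite-dimensional Hilbert space.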
The absolutely summing operators illustrate the difference between the Bochner and Pettis norms. Since the two norms are related by the inequality ‖X‖_P ≤ E[‖X‖], we have for any operator T: E → F the inequality ‖TX‖_P ≤ ‖T‖ E[‖X‖]. The reverse inequality (with a multiplicative constant) holds only for the absolutely summing operators.

(5.5.9) Proposition. Let T: E → F be an operator. Then T is absolutely summing if and only if there is a constant C such that, for every random variable X in E, we have

E[‖TX‖] ≤ C ‖X‖_P.

The smallest such constant C is π_1(T).

Proof. Suppose T is absolutely summing. Let K and μ be as in the Pietsch factorization theorem (5.5.5). Then if X: Ω → E is a random variable, we have by Fubini's theorem

E[‖TX‖] ≤ π_1(T) E[ ∫_K |x*(X)| dμ(x*) ] = π_1(T) ∫_K E[|x*(X)|] dμ(x*) ≤ π_1(T) sup_{‖x*‖≤1} E[|x*(X)|] = π_1(T) ‖X‖_P.

Conversely, suppose there is a constant C so that E[‖TX‖] ≤ C ‖X‖_P for every random variable X. Let x_1, x_2, …, x_n ∈ E. Choose disjoint events A_1, …, A_n with P(A_i) = 1/n, and set

X = Σ_{i=1}^n n x_i 1_{A_i}.
Thus

Σ_{i=1}^n ‖T x_i‖ = Σ_{i=1}^n ‖T(n x_i)‖ P(A_i) = E[‖TX‖]
  ≤ C sup_{‖x*‖≤1} E[|x*(X)|]
  = C sup_{‖x*‖≤1} Σ_{i=1}^n n |x*(x_i)| P(A_i)
  = C sup_{‖x*‖≤1} Σ_{i=1}^n |x*(x_i)|.

So T is absolutely summing, with π_1(T) ≤ C.

Absolutely summing operators also characterize certain amart properties.
(5.5.10) Theorem. Let T: E → F be a bounded linear operator. Then the following are equivalent:
(1) T is absolutely summing;
(2) if (X_n) is an amart in E, then (TX_n) is a uniform amart in F;
(3) if (X_n) is an L_1-bounded amart potential in E, then (TX_n) converges a.s. in F (to 0);
(4) if (X_n) is an L_1-bounded amart potential in E, then (TX_n) converges weakly a.s. in F;
(5) if (X_n) is an L_1-bounded amart potential in E, then ‖TX_n‖ is bounded on a set of positive measure.

Proof. (1) ⇒ (2): Apply the difference properties characterizing amarts (5.2.6) and uniform amarts (5.2.3). Assume that T is absolutely summing. Then (2) follows from Proposition (5.5.9).

(2) ⇒ (3): This is a consequence of the Riesz decomposition theorem (5.2.13).

(3) ⇒ (4) and (4) ⇒ (5) are trivial.

(5) ⇒ (1): Suppose T is not absolutely summing. We will show that there is an L_1-bounded amart potential (X_n) in E such that (TX_n) is a.s. unbounded, so that (5) fails. There is an unconditionally convergent series Σ x_n in E such that Σ T x_n is not absolutely convergent in F. By the convergence of Σ x_n we have ‖x_n‖ → 0, so ‖T x_n‖ → 0. Now Σ ‖T x_n‖ = ∞. Let β_n be such that β_n → 0 but Σ ‖T x_n‖ β_n = ∞. We may assume that ‖T x_n‖ β_n ≤ 1 for all n. Let (Ω, F, P) be [0, 1) with Lebesgue measure. Let C_n be independent events with P(C_n) = ‖T x_n‖ β_n. Define

X_n = ( x_n / P(C_n) ) 1_{C_n}.
The process (X_n) is L_1-bounded, since E[‖X_n‖] = ‖x_n‖ P(C_n)/P(C_n) = ‖x_n‖ → 0. Let F_n be the σ-algebra generated by {X_1, X_2, …, X_n}. The process (X_n) is adapted to the stochastic basis (F_n). In fact, (X_n) is an amart potential. Indeed, given ε > 0, there is N such that

‖ Σ_{n=N}^∞ a_n x_n ‖ < ε

for any choice of scalars a_n with |a_n| ≤ 1. If σ ∈ Σ and σ ≥ N, write D_n = C_n ∩ {σ = n}, so that

E[X_σ] = Σ_{n=N}^∞ ( P(D_n)/P(C_n) ) x_n.

But P(D_n)/P(C_n) ≤ 1, so ‖E[X_σ]‖ < ε. Thus (X_n) is an amart potential. Next, we must show that (TX_n) is not bounded on any set of positive measure. Since Σ P(C_n) = ∞, the Borel-Cantelli lemma tells us that for almost all ω ∈ Ω, we have ω ∈ C_n for infinitely many n. For those n,

‖TX_n(ω)‖ = ‖T x_n‖ / P(C_n) = 1/β_n.

But β_n → 0, so the sequence TX_n(ω) is unbounded.
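The Borel-Cantelli step can be illustrated by simulation (a sketch with made-up parameters: we take P(C_n) = 1/n, so that Σ P(C_n) = ∞; the specific rate is ours, not from the text). By the second Borel-Cantelli lemma the events keep occurring in every path, at rate roughly log N:

```python
import random

def occurrences(N, rng):
    """Simulate independent events C_n with P(C_n) = 1/n for n = 1..N and
    return how many occur.  Since sum 1/n diverges, the second
    Borel-Cantelli lemma says infinitely many occur almost surely."""
    return sum(1 for n in range(1, N + 1) if rng.random() < 1.0 / n)

rng = random.Random(3)
N = 10_000
counts = [occurrences(N, rng) for _ in range(50)]
mean = sum(counts) / len(counts)
# the mean count should be near the harmonic number H_N, about 9.79 here
print(min(counts), round(mean, 2), max(counts))
```

In the proof, each such occurrence produces a value ‖TX_n(ω)‖ = 1/β_n, and since β_n → 0 these recurring spikes make (TX_n(ω)) unbounded almost surely.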
As a corollary we obtain some interesting characterizations of finite-dimensional spaces.

(5.5.11) Theorem.
(1) A Banach space E is finite-dimensional if and only if every amart in E is a uniform amart.
(2) A Banach space E is finite-dimensional if and only if every L_1-bounded amart in E converges strongly a.s.
(3) A Banach space E is finite-dimensional if and only if every L_1-bounded amart in E converges weakly a.s.
(4) A Banach space E is finite-dimensional if and only if every L_1-bounded amart potential in E is bounded on a set of positive measure.

Proof. In finite-dimensional spaces, the convergence properties follow from the one-dimensional case: the projection onto a one-dimensional subspace is continuous, and a sequence in ℝ^n converges if and only if each of its coordinates converges. The converse directions follow from the previous theorem, together with Theorem (5.5.8).
Radon-Nikodym operators

The next operator ideal to be considered is the ideal of Radon-Nikodym operators.

(5.5.12) Definition. Let E and F be Banach spaces, and let T: E → F be a bounded linear operator. We say that T is a Radon-Nikodym operator (or T has the Radon-Nikodym property) if, for every probability space (Ω, F, P) and every vector measure μ: F → E that is absolutely continuous with respect to P and has finite variation, the measure Tμ has a Radon-Nikodym derivative in L_1(Ω, F, P; F).

This is motivated by the definition of the Radon-Nikodym property for a Banach space E. In fact, the Banach space E has the Radon-Nikodym property if and only if the identity operator on E is a Radon-Nikodym operator. Most of the elementary part of the theory of Banach spaces with the Radon-Nikodym property can be reproduced in terms of Radon-Nikodym operators, simply by inserting the operator into the proofs in appropriate places. For example, as in (5.3.3), it is sufficient to use vector measures μ with bounded average range in E. Or: an operator is a Radon-Nikodym operator if and only if its restriction to each separable subspace is a Radon-Nikodym operator. The next result states that the collection of Radon-Nikodym operators is a Banach operator ideal. The "Radon-Nikodym norm" of T is simply the operator norm ‖T‖.
(5.5.13) Proposition. (1) For any pair E, F of Banach spaces, the set of all RadonNikodym operators T : E + F is a Banach space under the operator norm IITII.
(2) If T : E  F is a RadonNikodym operator, Q : El * E and R: F  F1 are bounded operators, then the composition RTQ: El  Fl is also a RadonNikodym operator. Proof. (1) To see that the set of all RadonNikodym operators from E to F is a vector space, note that
d(aTp) dTp =a dP dP d(Ti +T2)µ _ dTip dT2,a dP dP + dP Next consider completeness. Suppose operators Tn : E  F are RadonNikodym operators, and Tn converges to T in the operator norm. By taking a subsequence, we may assume 00
I IITn+1  TnhI < 00n=1
Banachvalued random variables
242
Let (Ω, F, P) and μ: F → E be given as in the definition. Now each difference T_{n+1} − T_n is a Radon-Nikodym operator, so there exist random variables X_n: Ω → F with X_n = d(T_{n+1} − T_n)μ/dP. (Let T_0 = 0.) Now the variation satisfies

    |(T_{n+1} − T_n)μ|(Ω) ≤ ‖T_{n+1} − T_n‖ |μ|(Ω),

and E[‖X_n‖] ≤ ‖T_{n+1} − T_n‖ |μ|(Ω), so the series Σ X_n converges in L_1(Ω; F), say to X. For A ∈ F,

    E[X 1_A] = lim_m E[ ( Σ_{n=0}^{m−1} X_n ) 1_A ] = lim_m T_m(μ(A)) = T(μ(A)).

Thus X = dTμ/dP, so T is a Radon-Nikodym operator.
(2) Let (Ω, F, P) be a probability space, and let μ: F → E_1 be a vector measure, absolutely continuous with respect to P, with finite variation. Then Qμ: F → E is a vector measure, absolutely continuous with respect to P, with variation at most ‖Q‖ |μ|(Ω) < ∞. Thus the Radon-Nikodym derivative X = dTQμ/dP exists. But then the Radon-Nikodym derivative RX = dRTQμ/dP exists. Thus, the composition RTQ is a Radon-Nikodym operator.

Arguments almost identical to those used above (5.3.29 and 5.4.12) prove the next result.
(5.5.14) Proposition. Let E and F be Banach spaces, and let T: E → F be a bounded linear operator. Then the following are equivalent:
(1) T has the Radon-Nikodym property;
(2) every closed bounded nonempty set C ⊆ E has slices S with image T(S) of arbitrarily small diameter in F;
(3) for every L_1-bounded martingale (X_n)_{n∈ℕ} in E, the image (TX_n) converges a.s. in F;
(4) for every amart (X_n)_{n∈ℕ} in E satisfying condition (B), the image (TX_n) converges scalarly in F.
The Radon-Nikodym operators are related to the Riesz representable operators. Before we make this more precise, let us discuss the representable operators.
(5.5.15) Definition. Let (Ω, F, P) be a probability space, and let E be a Banach space. Then an operator T: L_1(Ω, F, P) → E is said to be representable if there is a random variable X: Ω → E such that T(Z) = E[Z X] for all Z ∈ L_1(Ω, F, P). (Necessarily X ∈ L_∞(Ω, F, P; E).)
5.5. Operator ideals
243
(5.5.16) Proposition. Let E be a Banach space. Then E has the Radon-Nikodym property if and only if every operator T: L_1 → E is representable.

Proof. There is a one-to-one correspondence between the set of all operators T: L_1(Ω, F, P) → E and the set of all absolutely continuous vector measures μ: F → E with bounded average range. The measure μ corresponding to the operator T is defined by μ(A) = T(1_A). The operator T is representable by a random variable X if and only if the measure μ has Radon-Nikodym derivative X.
Analogous (roughly speaking) to the Pietsch factorization is the Lewis-Stegall factorization, which is proved next.

(5.5.17) Theorem. Let (Ω, F, P) be a probability space, and let E be a Banach space. Then the operator T: L_1(Ω, F, P) → E is representable if and only if it factors through the space l_1; that is, there exist operators S: L_1 → l_1 and R: l_1 → E such that T = RS:

    L_1 --S--> l_1 --R--> E.

The range of a representable operator T: L_1(Ω, F, P) → E is separable.
Proof. Suppose first that T has a factorization T = RS. Then S is representable, since l_1 has the Radon-Nikodym property (5.1.16). Say S(Z) = E[Z X] for all Z ∈ L_1. But R is continuous and linear, so we have RS(Z) = E[Z R(X)], so that RS is also representable.

Conversely, suppose that T is representable. Then there exists X ∈ L_∞(Ω, F, P; E) so that T(Z) = E[Z X] for all Z ∈ L_1. Now X has separable range, so T also has separable range. We may assume that E is a separable space. Let ε > 0. For each positive integer n, the space E can be covered by countably many balls of radius ε2^{−n−1}, so there is a random variable Y_n: Ω → E with countably many values such that ‖X − Y_n‖_∞ < ε2^{−n−1}. Let X_1 = Y_1 and X_n = Y_n − Y_{n−1} for n ≥ 2, so that X_n has countably many values and ‖X − Σ_{i=1}^n X_i‖_∞ < ε2^{−n−1}. Then X_n has the form

    X_n = Σ_{k=1}^∞ x_nk 1_{E_nk},

where x_nk ∈ E and E_nk ∈ F with E_nk ∩ E_nk' = ∅ (for k ≠ k') and ‖x_nk‖ < 3ε2^{−n−1} if n ≥ 2. Define S: L_1(Ω, F, P) → l_1(ℕ × ℕ) by

    S(Z)(n, k) = ‖x_nk‖ E[Z 1_{E_nk}]

for Z ∈ L_1. Then we have

    ‖S(Z)‖_{l_1} = Σ_{n=1}^∞ Σ_{k=1}^∞ ‖x_nk‖ |E[Z 1_{E_nk}]|
                 ≤ Σ_{k=1}^∞ ‖x_1k‖ E[|Z| 1_{E_1k}] + Σ_{n=2}^∞ Σ_{k=1}^∞ 3ε2^{−n−1} E[|Z| 1_{E_nk}].

But ‖X − X_1‖_∞ < ε/4 and ‖X‖_∞ = ‖T‖, so ‖x_1k‖ ≤ ‖T‖ + ε/4. Then

    ‖S(Z)‖_{l_1} ≤ (‖T‖ + ε/4) E[|Z|] + (3ε/4) E[|Z|] = (‖T‖ + ε) E[|Z|].

Thus ‖S‖ ≤ ‖T‖ + ε. Then define R: l_1(ℕ × ℕ) → E by

    R(h) = Σ_{n=1}^∞ Σ_{k=1}^∞ h(n, k) x_nk/‖x_nk‖,

with the convention 0/0 = 0. Then ‖R(h)‖ ≤ Σ Σ |h(n, k)| = ‖h‖, so ‖R‖ ≤ 1. Also

    RS(Z) = Σ_{n=1}^∞ Σ_{k=1}^∞ (x_nk/‖x_nk‖) ‖x_nk‖ E[Z 1_{E_nk}]
          = Σ_{n=1}^∞ E[ ( Σ_{k=1}^∞ x_nk 1_{E_nk} ) Z ]
          = Σ_{n=1}^∞ E[X_n Z] = E[X Z] = T(Z).

Thus T has been factored, and ‖R‖ ‖S‖ ≤ ‖T‖ + ε.

If T is a representable operator, then it factors through l_1. Since l_1 is separable, the range of T must be separable as well.
The connections between the representable operators and the Radon-Nikodym operators are illustrated by the next two results.

(5.5.18) Proposition. Let (Ω, F, P) be a probability space, and let E be a Banach space. Then the operator T: L_1(Ω, F, P) → E is a Radon-Nikodym operator if and only if it is representable.

Proof. Suppose that T is a Radon-Nikodym operator. Define μ: F → L_1 by μ(A) = 1_A. Then μ is absolutely continuous with respect to P, and |μ|(A) = P(A) for all A ∈ F, so that μ has average range bounded by 1. Since T is a Radon-Nikodym operator, there exists a Radon-Nikodym derivative X = dTμ/dP. Now if Z = 1_A, then we have E[Z X] = E[X 1_A] = Tμ(A) = T(1_A) = T(Z). The equation E[Z X] = T(Z) holds for simple functions Z, and each side is a continuous linear function of Z ∈ L_1, so the equation is true for all Z ∈ L_1. Therefore T is representable.

Conversely, suppose T is representable. Then T factors through l_1; say T = RS, where S: L_1 → l_1 and R: l_1 → E. If I is the identity operator on l_1, then I is a Radon-Nikodym operator, since the space l_1 has the Radon-Nikodym property (5.1.16). Therefore, by the ideal property (5.5.13(2)), T = RIS is a Radon-Nikodym operator.
(5.5.19) Theorem. Suppose E and F are Banach spaces, and suppose T: E → F is a bounded linear operator. Then T is a Radon-Nikodym operator if and only if, for every probability space (Ω, F, P) and every operator S: L_1(Ω, F, P) → E, the composition TS is representable.

Proof. If T is a Radon-Nikodym operator, then so is TS, and therefore TS is representable. Conversely, suppose TS is representable for every S. If μ: F → E is a vector measure with bounded average range, there is a unique operator S: L_1(Ω, F, P) → E such that S(1_A) = μ(A) for all A ∈ F. But TS is representable, say by X: Ω → F, and then X = dTμ/dP.

In light of the Lewis-Stegall factorization, this result can be restated like this: T: E → F is a Radon-Nikodym operator if and only if, for every operator S: L_1(Ω, F, P) → E, there is a factorization of TS through l_1:

    L_1 → l_1 → F.
It is not hard to see that there is a relation between the ideal of Radon-Nikodym operators and the ideals of compact and weakly compact operators. An operator T: E → F is compact if the closure of the image T(B_E) of the unit ball of E is a compact set. Equivalently, for every bounded sequence (x_n) in E, there is a subsequence (x_{n_k}) such that T(x_{n_k}) converges in F. The operator T: E → F is weakly compact if the closure of the image of the unit ball of E is a weakly compact set. Equivalently, for every bounded sequence (x_n) in E, there is a subsequence (x_{n_k}) such that T(x_{n_k}) converges weakly in F.

(5.5.20) Proposition. Every weakly compact operator is a Radon-Nikodym operator.

Proof. Let T: E → F be weakly compact. Let (X_n)_{n∈ℕ} be a martingale in the unit ball of E. Then the martingale (TX_n) has values in a weakly compact convex set, namely the closure of the image of that unit ball under T. Now weakly compact convex sets are Radon-Nikodym sets (5.3.33). So (TX_n) converges. This shows that T is a Radon-Nikodym operator.

(5.5.21) Corollary. Every compact operator is a Radon-Nikodym operator. A weakly compact operator T: L_1 → E has separable range and is representable.

Proof. Any compact operator is weakly compact. Suppose T: L_1(Ω, F, P) → E is weakly compact. Then T is a Radon-Nikodym operator, so it is representable. Thus there is a Lewis-Stegall factorization T = RS, where S: L_1 → l_1 and R: l_1 → E. But l_1 is separable, and the range of T is a subset of the range of R, so it is separable.

Asplund operators
The next operator ideal to be considered is the ideal of Asplund operators. The main result of concern here is the connection between scalar convergence and weak almost sure convergence.

(5.5.22) Definition. Let E and F be Banach spaces, and let T: E → F be an operator. Then T is an Asplund operator if T* is a Radon-Nikodym operator.

When defined in this way, the ideal properties follow easily from those of the Radon-Nikodym operators.

(5.5.23) Proposition. (1) For any pair E, F of Banach spaces, the set of all Asplund operators T: E → F is a Banach space under the operator norm ‖T‖.
(2) If T: E → F is an Asplund operator, and Q: E_1 → E and R: F → F_1 are bounded operators, then the composition RTQ: E_1 → F_1 is also an Asplund operator.
As with the ideals considered above, this one has close connections with factorization conditions. In order to state these conditions properly, we first consider the Haar operator. The Cantor set is the countable product topological space

    Δ = {0, 1}^ℕ.

It is compact and metrizable. It is made up of pieces

    Δ_ni  (n ∈ ℕ, 0 ≤ i < 2^n),

defined by

    Δ_10 = { (x_i) ∈ Δ : x_1 = 0 },
    Δ_11 = { (x_i) ∈ Δ : x_1 = 1 },
    Δ_20 = { (x_i) ∈ Δ : x_1 = 0, x_2 = 0 },
    Δ_21 = { (x_i) ∈ Δ : x_1 = 0, x_2 = 1 },
    Δ_22 = { (x_i) ∈ Δ : x_1 = 1, x_2 = 0 },
    Δ_23 = { (x_i) ∈ Δ : x_1 = 1, x_2 = 1 },

and so on. The Cantor set supports a probability measure μ with μ(Δ_ni) = 2^{−n}; this is the product measure where the factors assign measure 1/2 to each point of {0, 1}.

(5.5.24) Definition. The Haar functions h_ni: Δ → ℝ are defined by

    h_ni = 1_{Δ_{n+1,2i}} − 1_{Δ_{n+1,2i+1}}.

If { e_ni : n ∈ ℕ, 0 ≤ i < 2^n } is an enumeration of the unit vector basis of l_1, then the Haar operator H: l_1 → L_∞(Δ, μ) is defined by

    H(e_ni) = h_ni.
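The cylinder sets Δ_ni and the Haar functions are concrete enough to compute with directly. The sketch below (helper names are mine, not from the text) truncates Δ = {0,1}^ℕ to {0,1}^d with the uniform measure; since h_ni depends only on the first n + 1 coordinates, any depth d > n represents it exactly.

```python
from itertools import product

def delta_indicator(n, i, omega):
    """1 if omega (a 0/1 tuple) lies in Delta_{n,i}: the first n
    coordinates of omega, read as a binary number, equal i."""
    value = 0
    for bit in omega[:n]:
        value = 2 * value + bit
    return 1 if value == i else 0

def haar(n, i, omega):
    """h_{n,i} = indicator of Delta_{n+1,2i} minus indicator of Delta_{n+1,2i+1}."""
    return delta_indicator(n + 1, 2 * i, omega) - delta_indicator(n + 1, 2 * i + 1, omega)

def integrate(f, depth):
    """Integrate f over the Cantor set truncated to `depth` coordinates,
    each binary string carrying mass 2**-depth (the product measure mu)."""
    return sum(f(w) for w in product((0, 1), repeat=depth)) / 2 ** depth

# mu(Delta_{n,i}) = 2**-n, and each Haar function has mean zero
assert integrate(lambda w: delta_indicator(2, 3, w), 4) == 0.25
assert integrate(lambda w: haar(2, 1, w), 4) == 0.0
# distinct Haar functions are orthogonal in L_2(Delta, mu)
assert integrate(lambda w: haar(1, 0, w) * haar(2, 1, w), 4) == 0.0
```

Orthogonality holds because the two halves of the finer Haar function sit inside a single cylinder on which the coarser one is constant.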
Here is Stegall's theorem characterizing Asplund operators. For the proof, see Stegall [1981], p. 515, or Bourgin [1983], Chapter 5. A function φ: E → ℝ is Fréchet differentiable at the point x ∈ E if there is a linear functional x* ∈ E* such that

    lim_{‖y‖→0} |φ(x + y) − φ(x) − x*(y)| / ‖y‖ = 0.
(5.5.25) Theorem. Let U: E → F be an operator. Then the following are equivalent.
(1) U is an Asplund operator.
(2) U* factors through a space with the Radon-Nikodym property.
(3) U* maps closed bounded convex subsets of F* to Radon-Nikodym subsets of E*.
(4) If φ: F → ℝ is continuous and convex, then φ∘U is Fréchet differentiable on a dense subset of E.
(5) U is not a factor of the Haar operator H; that is, there do not exist operators S: l_1 → E and T: F → L_∞(Δ, μ) with H = TUS.
(6) U factors through a space, every separable subspace of which has separable dual.

A Banach space where the identity operator is an Asplund operator is called an Asplund space. Then E is an Asplund space if and only if E* has the Radon-Nikodym property. The interest here of Asplund operators is the following result.
(5.5.26) Theorem. Let E and F be Banach spaces, and let U: E → F be an operator. The following are equivalent.
(a) U is an Asplund operator.
(b) If (X_n) is a sequence of E-valued Bochner measurable functions, converging scalarly to 0, with sup_n ‖X_n‖ < ∞ a.s., then the sequence (UX_n) almost surely converges weakly to 0.
(c) If (X_n)_{n∈ℕ} is a weak sequential amart potential in E satisfying condition (B), then (UX_n) almost surely converges weakly to 0.
(d) If (X_n)_{n∈ℕ} is an amart potential in E satisfying condition (B), then (UX_n) almost surely converges weakly to 0.
Proof. Three of the four parts of the proof are easy.

(a) ⟹ (b). Suppose U is an Asplund operator. Then by Theorem (5.5.25(6)), U can be factored as U = WV, where V: E → E_0, W: E_0 → F, and E_0 is a Banach space, every separable subspace of which has separable dual. Let X_n: Ω → E be Bochner measurable with sup_n ‖X_n‖ < ∞ a.s., and suppose X_n converges scalarly to 0. Then by (5.1.7) there is a separable subspace E_1 of E_0 such that P{VX_n ∈ E_1} = 1 for all n. Now E_1* is separable; choose a countable dense set {x_k*}_{k=1}^∞ in E_1*. For each k there is a null set Ω_k ⊆ Ω such that (VX_n, x_k*) → 0 on Ω \ Ω_k. Now since {x_k*} is dense and {‖VX_n(ω)‖} is bounded, we have (VX_n, x*) → 0 for all x* ∈ E_1* and all ω ∈ Ω \ ∪ Ω_k. But then UX_n = WVX_n converges weakly to 0 a.s.
(b) ⟹ (c). Let (X_n)_{n∈ℕ} be a weak sequential amart potential in E satisfying condition (B). If x* ∈ E*, then (X_n, x*) is a scalar amart potential, and thus (X_n, x*) → 0 a.e. That is, (X_n) converges scalarly to 0. By the maximal inequality (5.2.19), sup_n ‖X_n‖ < ∞ a.s. Therefore UX_n → 0 weakly a.s.

(c) ⟹ (d). Any amart potential is a weak sequential potential.

(d) ⟹ (a). Suppose U is not an Asplund operator. By Theorem (5.5.25(5)), U is a factor of the Haar operator H: there are operators S: l_1 → E and T: F → L_∞(Δ, μ) with H = TUS. Now we claim that it will suffice to exhibit an amart potential (X_n)_{n∈ℕ} in l_1 satisfying condition (B) such that HX_n does not converge weakly a.s. to 0 in L_∞. Indeed, then (SX_n) will be a strong amart potential in E satisfying condition (B), but (USX_n) will not converge weakly to 0 a.s. in F.

For this example, we let Ω = Δ be the Cantor set, and P = μ the natural measure on Δ. We will also use the other notation in Definition (5.5.24). Define X_n: Δ → l_1 by

    X_n(ω) = (1/n) Σ_{m=1}^{n} Σ_{i=0}^{2^m−1} h_mi(ω) e_mi.

We will prove that (X_n) has the required properties. First, observe that (X_n) is L_∞-bounded, and hence satisfies condition (B), since for any ω ∈ Ω,

    ‖X_n(ω)‖_{l_1} = (1/n) Σ_{m=1}^{n} Σ_{i=0}^{2^m−1} |h_mi(ω)| = (1/n) Σ_{m=1}^{n} 1 = 1.

We claim next that HX_n does not converge weakly a.s. to 0 in L_∞. Since the range of H is contained in the subspace C(Δ), it suffices to show that HX_n does not converge weakly to 0 in C(Δ). For each ω ∈ Δ, we have

    HX_n(ω)(ω) = (1/n) Σ_{m=1}^{n} Σ_{i=0}^{2^m−1} h_mi(ω) h_mi(ω) = (1/n) Σ_{m=1}^{n} 1 = 1,

so that HX_n(ω) does not converge weakly to 0.
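The diagonal identity HX_n(ω)(ω) = 1 can be checked mechanically: for each level m, the point ω lies in exactly one Δ_{m+1,j}, so exactly one h_mi(ω) is ±1 and the inner sum of squares contributes 1. A small sketch with truncated ω (helper names are mine, not from the text):

```python
def haar(n, i, omega):
    """h_{n,i}(omega): +1 on Delta_{n+1,2i}, -1 on Delta_{n+1,2i+1}, else 0."""
    value = 0
    for bit in omega[:n + 1]:
        value = 2 * value + bit
    if value == 2 * i:
        return 1
    if value == 2 * i + 1:
        return -1
    return 0

def HX_n_at(n, omega):
    """(H X_n)(omega) evaluated at omega itself:
    (1/n) * sum over m = 1..n and i = 0..2**m - 1 of h_{mi}(omega)**2."""
    total = 0
    for m in range(1, n + 1):
        for i in range(2 ** m):
            total += haar(m, i, omega) ** 2
    return total / n

# Each level m contributes exactly 1 to the double sum, so the value is 1
# for every omega (omega needs at least n + 1 coordinates here).
for omega in [(0, 1, 1, 0, 1, 0), (1, 1, 1, 1, 1, 1)]:
    assert HX_n_at(5, omega) == 1.0
```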
Now we claim that if 2 < p < ∞, there is a constant C_p such that for any scalars a_mi with |a_mi| ≤ 1,

(5.5.26a)    E[ | Σ_{m=1}^{n} Σ_{i=0}^{2^m−1} a_mi h_mi |^p ] ≤ C_p n^{p/2}.

This is a classical inequality, implicit in Paley [1932]. A modern proof might observe that the Haar functions, taken in the proper order, constitute a martingale difference sequence. The square function S associated with (5.5.26a) satisfies

    S^2 = Σ_{m=1}^{n} ( Σ_{i=0}^{2^m−1} a_mi h_mi )^2 ≤ Σ_{m=1}^{n} 1 = n,

so S ≤ n^{1/2}. Since the L_p norms of a martingale and its square function differ at most by a constant factor (6.3.6), we obtain (5.5.26a).

The next claim is that

    sup { E[|x* X_σ|] : x* ∈ l_1*, ‖x*‖ ≤ 1 } → 0  as σ increases in Σ.
Choose p with 2 < p < ∞, and let 0 < a < (p − 2)/(2p − 2), so that p/2 − a(p − 1) > 1; for example, take p = 7, a = 1/4. Let ε_n = n^{−a}. Given x* ∈ l_1* with ‖x*‖ ≤ 1, write a_ni = x*(e_ni), so that |a_ni| ≤ 1. By (5.5.26a),

    E[ | (1/n) Σ_{m=1}^{n} Σ_{i=0}^{2^m−1} a_mi h_mi |^p ] ≤ C_p / n^{p/2},

and hence for λ > 0,

    P{ | (1/n) Σ_{m=1}^{n} Σ_{i=0}^{2^m−1} a_mi h_mi | > λ } ≤ C_p / (n^{p/2} λ^p).

Let σ ∈ Σ, σ ≥ N. Then

    E[|x* X_σ|] = Σ_{n=N}^∞ E[ | (1/n) Σ_{m=1}^{n} Σ_{i=0}^{2^m−1} a_mi h_mi | 1_{{σ=n}} ]
                = Σ_{n=N}^∞ ∫_0^∞ P{ | (1/n) Σ_{m,i} a_mi h_mi | 1_{{σ=n}} > λ } dλ
                = Σ_{n=N}^∞ ( ∫_0^{ε_n} P{ … } dλ + ∫_{ε_n}^∞ P{ … } dλ )
                ≤ Σ_{n=N}^∞ ( ε_n P{σ = n} + ∫_{ε_n}^∞ C_p/(n^{p/2} λ^p) dλ )
                = Σ_{n=N}^∞ ( n^{−a} P{σ = n} + C_p / ((p − 1) n^{p/2} ε_n^{p−1}) )
                ≤ N^{−a} Σ_{n=N}^∞ P{σ = n} + (C_p/(p − 1)) Σ_{n=N}^∞ n^{a(p−1) − p/2}
                = N^{−a} + (C_p/(p − 1)) Σ_{n=N}^∞ n^{a(p−1) − p/2}.

This tends to 0 as N → ∞, since a > 0 and p/2 − a(p − 1) > 1.

Now it is easy to verify that (X_n) is an amart:

    ‖E[X_σ]‖ = sup_{‖x*‖≤1} |(E[X_σ], x*)| ≤ sup_{‖x*‖≤1} E[|(X_σ, x*)|],

which tends to 0. Finally, (X_n) is a potential, since the Pettis norm

    sup_{‖x*‖≤1} E[|(X_n, x*)|]

tends to 0 as n → ∞.

As a consequence, we obtain a convergence result not involving amarts.
(5.5.27) Theorem. Let E be a separable Banach space. Then E* is separable if and only if for every sequence of random variables (X_n) in E with sup_n ‖X_n‖ < ∞ a.s., scalar convergence of X_n to 0 implies weak a.s. convergence of X_n to 0.

Proof. By Theorem (5.5.26), the convergence condition is satisfied if and only if the identity operator on E is an Asplund operator. By (5.5.25), that happens if and only if every separable subspace of E has separable dual. If E itself is separable, this means E has separable dual.

The preceding material on Asplund operators will enable us to prove the following characterization of weak amart convergence.
(5.5.28) Theorem. Let E be a Banach space. The following are equivalent.
(a) E is reflexive.
(b) If (X_n) is an L_∞-bounded weak amart in E, then (X_n) converges weakly a.s.
(c) If (X_n) is a weak amart of class (B) in E, then (X_n) converges weakly a.s.

Proof. (a) ⟹ (b). Suppose E is reflexive. Let (X_n) be a weak amart with values in the unit ball of E. In the proof that X_n converges weakly a.s., we may assume that E is separable. For each x* ∈ E*, the sequence (X_n, x*) converges a.s. For each ω ∈ Ω, the sequence X_n(ω) has a subsequence that converges weakly. The method used in the proof of (5.3.31) shows that X_n converges weakly a.s. to a Bochner measurable limit.

(b) ⟹ (c). A process of class (B) satisfies the usual maximal inequality. Thus we may apply the stopping time argument as in (5.3.9) to reduce to the bounded case.

(c) ⟹ (a). Suppose E is not reflexive. We will consider two cases. First, suppose E is not an Asplund space; that is, the identity operator on E is not an Asplund operator. Then by (5.5.26) there is a (strong) amart (X_n) in E such that (X_n) does not converge weakly a.s. Next suppose E is an Asplund space. Then by Stegall's theorem (5.5.25), every separable subspace of E has separable dual. Since E is not reflexive, there is a sequence x_n in E with ‖x_n‖ ≤ 1 such that no subsequence of x_n converges weakly. The vectors x_n are contained in a separable subspace E_1 of E. Since E_1 has separable dual, a diagonalization procedure shows that there is a subsequence of x_n that is weakly Cauchy. (We will continue to write x_n for that subsequence.) We now define a rudimentary weak amart (X_n) on [0, 1) as follows:

    X_n = x_n ( 1_{[0,1/2)} − 1_{[1/2,1)} ).

To verify that (X_n) is a weak amart, let x* ∈ E* and ε > 0. The sequence x_n is weakly Cauchy, so there are c ∈ ℝ and N such that |(x_n, x*) − c| < ε for all n ≥ N. So, if σ ∈ Σ and σ ≥ N, then

    |E[(X_σ, x*)]| ≤ (1/2)(c + ε) − (1/2)(c − ε) = ε.

So in fact, (X_n) is a weak amart potential. But (X_n) does not converge weakly a.s., since (x_n) does not converge weakly.

Complements
(5.5.29) A Banach space E is finite-dimensional if and only if every L_1-bounded amart in E satisfies condition (B).

(5.5.30) The Pietsch factorization theorem (5.5.5) also implies other properties of the absolutely summing operators. If T: E → F is p-absolutely summing for some p, then T is completely continuous; that is, if x_n → x weakly in E, then Tx_n → Tx in norm in F. (Apply the dominated convergence theorem in the inequality in (5.5.5).) If T is p-absolutely summing, then T is a weakly compact operator. (In fact, T factors through a subspace of an L_p space with 1 < p < ∞.)

(5.5.31) Theorem (5.5.11) was generalized in a different way by L. Egghe. He replaced the Banach space E by a Fréchet space. Some of his conditions are as follows:
Theorem. Let E be a Fréchet space. Then the following are equivalent.
(1) E is nuclear;
(2) every mean-bounded amart in E converges strongly a.e.;
(3) every mean-bounded amart in E satisfies condition (B);
(4) every mean-bounded uniformly integrable amart converges in mean.
See Egghe [1980b] and Egghe [1982a] for the details.
(5.5.32) (Asplund operators.) It would be interesting to prove Theorem (5.5.26) without using Stegall's factorization (5.5.25(5)). Can the Radon-Nikodym property in E* somehow be connected with an amart potential in E? Is there a direct way to connect them with Fréchet differentiability of φ∘U for convex functions φ?

(5.5.33) (Another ideal.) Clearly the collection of operators that send L_∞-bounded weak amarts to weakly a.s. convergent processes is an operator ideal. Which ideal is it? Theorem (5.5.28) leads to the natural conjecture that it is the ideal of weakly compact operators. But that conjecture is wrong. If J is the quasi-reflexive Banach space of James (see Lindenstrauss & Tzafriri [1977], Example 1.d.2), there is a sequence e_n in J that is weakly Cauchy but not weakly convergent. Now J is a separable dual, so it has the Radon-Nikodym property, and J* is separable, so J is an Asplund space. Let T: l_1 → J send the canonical unit vectors in l_1 to the sequence e_n in J. Then T is not weakly compact, since e_n has no weakly convergent subsequence. If (X_n) is a weak amart in l_1, then it is in fact a strong amart (by the Schur property; see Dunford & Schwartz [1958], IV.8.14). But then X_n converges scalarly, so TX_n converges scalarly, and (since J is an Asplund space) TX_n converges weakly a.s.

(5.5.34) (Incompleteness of the Pettis norm.) Let E be a Banach space. Then L_1(Ω, F, P; E) is complete under the Pettis norm ‖·‖_P if and only if E is finite-dimensional. To see this, apply Theorem (5.5.9) to the identity operator T: E → E. If ‖·‖_P is complete, then by the closed graph theorem, X ↦ TX is a bounded operator in both directions between ‖·‖_P and ‖·‖_{L_1}.
Remarks
The brief treatment of p-absolutely summing operators, the Pietsch factorization, and the Dvoretzky-Rogers lemma follows Lindenstrauss & Tzafriri [1979]. For further reading on operator ideals and their applications to Banach space theory, see Pisier [1986]. Bellow [1976a] proved that L_1-bounded vector amarts converge strongly only if the Banach space is finite-dimensional (Theorem (5.5.11(2))). Edgar & Sucheston [1977a] proved that L_1-bounded vector amarts converge weakly only if the Banach space is finite-dimensional (Theorem (5.5.11(3))). Ghoussoub [1979b] showed how absolutely summing operators come into the problem. The operator ideal of Radon-Nikodym operators was introduced by Reinov [1975] and Linde [1976], who independently developed its important properties. The Lewis-Stegall factorization is from Lewis & Stegall [1973].
Stegall [1981] discusses the ideal of Asplund operators, but the name is from Edgar [1980], which contains Theorem (5.5.26).
The characterization of weak amart convergence (5.5.28) is due to Brunel & Sucheston [1976b]. Theorem (5.5.27) (the dual is separable if and only if scalar convergence implies weak convergence) is due to Brunel & Sucheston [1977].
6
Martingales
In this chapter we will look at martingales more carefully. In many cases, submartingales and supermartingales will be included. We will consider some of the many interesting results known for these processes, and some of their applications. We discuss many well-known results concerning maximal inequalities, laws of large numbers for martingale differences (extended to a class of mixing processes), decompositions, convergence of transforms of martingales, and Burkholder's square-function inequalities. The Maharam lifting theorem is derived from the martingale theorem.
6.1. Maximal inequalities for supermartingales

In this section we will discuss various maximal inequalities for martingales and supermartingales. We will also prove a law of large numbers for martingale differences. As an application of this, we present a proof of the law of large numbers under the condition of star-mixing.

Let (Ω, F, μ) be a σ-finite measure space, let (F_n)_{n∈ℕ} be a stochastic basis, and write F_∞ = σ(∪ F_i). We will write E_i for the conditional expectation: if X is a measurable function,

    E_i X = E^{F_i}[X].

This may or may not exist: see (2.3.8). We will also write P_i for the corresponding "conditional probability": if A is a measurable set,

    P_i(A) = E_i 1_A.

In particular, E_1 will play the role usually reserved for the expectation E. A reader not interested in this generality may assume that μ is a probability measure, and E_1 = E. However, the F_1 variant has applications: see (6.1.3) below.
A maximal inequality

Let λ > 0 be a random variable, measurable with respect to F_1. Let (X_i) be an adapted sequence of random variables. Fix a value of n ∈ ℕ. For 1 ≤ i ≤ n, let

    B_i = { X_1 < λ, X_2 < λ, …, X_i < λ },

and let

    A = Ω \ B_n = { max_{1≤i≤n} X_i ≥ λ }.

Now B_i ∈ F_i, so we may define a stopping time σ ∈ Σ by

    σ(ω) = inf { i : X_i(ω) ≥ λ(ω) }   if ω ∈ A,
    σ(ω) = n                           if ω ∈ B_n.
(6.1.1) Lemma. On the set {X_n ≥ 0}, we have

(6.1.1a)    λ 1_A ≤ X_1 + Σ_{i=1}^{n−1} (X_{i+1} − X_i) 1_{B_i} = X_σ.

Proof. Let j ≤ n. On {σ = j} we have ω ∈ B_i if and only if i < j (for i < n), so the sum telescopes:

    X_1 + Σ_{i=1}^{n−1} (X_{i+1} − X_i) 1_{B_i} = X_j = X_σ.

On {σ = j} ∩ A we have X_j ≥ λ, so λ 1_A = λ ≤ X_j = X_σ. On the other hand, on the set {σ = n} ∩ {X_n ≥ 0} ∩ B_n we have λ 1_A = 0 ≤ X_n = X_σ.

Now assume that X_n ≥ 0 a.e. and that E_1 X_i = E^{F_1}[X_i] exists for all i. It follows that E_1 X_j exists for all j, and that E_1 X_σ exists for σ ∈ Σ. Now if we apply E_1 to both sides of (6.1.1a), we obtain:

(6.1.1b)    λ P_1{ max_{1≤i≤n} X_i ≥ λ } ≤ X_1 + Σ_{i=1}^{n−1} E_1[(X_{i+1} − X_i) 1_{B_i}] = E_1 X_σ.

A process (X_n) is eventually positive if there is n_0 ∈ ℕ so that X_n ≥ 0 for all n ≥ n_0. Assume (X_n) is eventually positive. When we let n → ∞ in (6.1.1b), we obtain

    λ P_1{ sup_{i≥1} X_i > λ } ≤ sup_{τ∈Σ} E_1 X_τ.

Replacing λ by λ − 1/k and letting k → ∞, we obtain

(6.1.1c)    λ P_1{ sup_{i≥1} X_i ≥ λ } ≤ sup_{τ∈Σ} E_1 X_τ.

This is the F_1 variant of the maximal inequality (1.1.7).

Another application of (6.1.1b) is a maximal inequality for eventually positive supermartingales. We have B_i ∈ F_i, so by the supermartingale property,

    E_i[(X_{i+1} − X_i) 1_{B_i}] = 1_{B_i} [E_i(X_{i+1}) − X_i] ≤ 0.

Applying E_1 and using (6.1.1b), we obtain a maximal inequality for a fixed n; then we let n → ∞, and we obtain

    λ P_1{ sup_{i≥1} X_i ≥ λ } ≤ X_1.

Now a conditional probability is bounded by 1, so we obtain:
(6.1.2) Proposition. If (X_n) is an eventually positive supermartingale, then for each positive F_1-measurable random variable λ, we have

(6.1.2a)    P_1{ sup_{i≥1} X_i ≥ λ } ≤ min{ X_1/λ, 1 }.
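When F_1 is trivial, (6.1.2a) reads P{sup_n X_n ≥ λ} ≤ min{E[X_1]/λ, 1}. The Monte Carlo sketch below uses a positive martingale of my own choosing, X_1 = 1 and X_{n+1} = X_n·U_{n+1} with U_{n+1} uniform on [0, 2] (any eventually positive supermartingale would do); names and parameters are illustrative, not from the text.

```python
import random

def maximal_tail_probability(lam, steps=50, trials=20000, seed=7):
    """Estimate P(sup_n X_n >= lam) for the positive martingale
    X_1 = 1, X_{n+1} = X_n * U_{n+1}, with U_{n+1} uniform on [0, 2]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = 1.0
        for _ in range(steps):
            x *= rng.uniform(0.0, 2.0)
            if x >= lam:   # the running maximum has reached lam
                hits += 1
                break
    return hits / trials

# (6.1.2a) with trivial F_1: P(sup X_n >= lam) <= min(E[X_1]/lam, 1) = 1/lam
for lam in (2.0, 4.0, 8.0):
    assert maximal_tail_probability(lam) <= 1.0 / lam
```

The empirical frequencies fall strictly below 1/λ because a discrete-time process overshoots the level when it crosses it.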
(6.1.3) Corollary. Let (X_n) be an eventually positive supermartingale, let r ∈ ℕ be fixed, and let λ be a positive F_r-measurable random variable. If λ ≤ sup_{i≥1} X_i a.s., then λ ≤ sup_{1≤i≤r} X_i a.s.

Proof. For r = 1, the assertion follows from Proposition (6.1.2). The general case is obtained by applying the case r = 1 to the supermartingale (X'_n) defined by X'_1 = sup_{1≤i≤r} X_i and X'_n = X_{r+n−1} for n ≥ 2.

The inequality (6.1.1b) also yields the Hájek-Rényi-Chow inequality, a variant of Kolmogorov's inequality more directly applicable to certain laws of large numbers; examples will be given below. A sequence (a_i) of random variables is called predictable if each a_i is measurable with respect to F_{i−1}.
(6.1.4) Proposition (Hájek-Rényi-Chow). Let (X_i) be a submartingale, let (a_i) be a nonnegative increasing predictable sequence, and let λ > 0 be an F_1-measurable random variable. Then for each n,

(6.1.4a)    λ P_1{ sup_{1≤i≤n} X_i/a_i ≥ λ } ≤ E_1[ X_1^+/a_1 ] + Σ_{i=1}^{n−1} E_1[ (X_{i+1}^+ − X_i^+)/a_{i+1} ].

Proof. First, note that (X_i^+) is also a submartingale. Indeed, we have E_i X_{i+1}^+ ≥ E_i X_{i+1} ≥ X_i and also E_i X_{i+1}^+ ≥ 0, so E_i X_{i+1}^+ ≥ X_i^+. Therefore we have (E_i X_{i+1}^+ − X_i^+)/a_{i+1} ≥ 0. Now apply (6.1.1b) to Y_i = X_i^+/a_i. Then

    λ P_1{ sup_{1≤i≤n} X_i/a_i ≥ λ } ≤ λ P_1{ sup_{1≤i≤n} X_i^+/a_i ≥ λ }
        ≤ E_1[ X_1^+/a_1 ] + Σ_{i=1}^{n−1} E_1[ ( X_{i+1}^+/a_{i+1} − X_i^+/a_i ) 1_{B_i} ]
        = E_1[ X_1^+/a_1 ] + Σ_{i=1}^{n−1} E_1[ ( E_i X_{i+1}^+/a_{i+1} − X_i^+/a_i ) 1_{B_i} ]
        ≤ E_1[ X_1^+/a_1 ] + Σ_{i=1}^{n−1} E_1[ (E_i X_{i+1}^+ − X_i^+)/a_{i+1} ]
        = E_1[ X_1^+/a_1 ] + Σ_{i=1}^{n−1} E_1[ (X_{i+1}^+ − X_i^+)/a_{i+1} ],

where the last inequality uses X_i^+/a_i ≥ X_i^+/a_{i+1} (since a_i ≤ a_{i+1}) and the nonnegativity of E_i X_{i+1}^+ − X_i^+.

Note that the special case a_i = 1 is the original Doob inequality (1.4.18): if λ > 0 is F_1-measurable, then

(6.1.4b)    λ P_1{ sup_{1≤i≤n} X_i ≥ λ } ≤ E_1 X_n^+.
A smaller expression on the right will be useful for strong inequalities in L_p and in Orlicz spaces.
(6.1.5) Proposition. Let (X_i) be a positive submartingale, and let λ be positive and F_1-measurable. Then for each n ∈ ℕ,

    λ P_1{ sup_{1≤i≤n} X_i ≥ λ } ≤ E_1[ 1_{{sup_{1≤i≤n} X_i ≥ λ}} X_n ].

Proof. Let the (possibly infinite) stopping time σ be defined as usual: σ = inf{ i : X_i ≥ λ }. By the localization theorem (1.4.2), if k ≤ n we have on the set {σ = k},

    X_σ = X_k ≤ E^{F_k}[X_n] = E^{F_σ}[X_n].

Therefore X_σ ≤ E^{F_σ}[X_n] on the set {σ ≤ n}. Write A = { sup_{1≤i≤n} X_i ≥ λ } = {σ ≤ n}. Then:

    λ P_1(A) = E_1[1_A λ] ≤ E_1[1_A X_σ] ≤ E_1[ 1_A E^{F_σ}[X_n] ] = E_1[1_A X_n],

which proves the proposition.

A law of large numbers

We will prove a generalization of the classical law of large numbers for independent random variables. The partial sums of independent mean-zero random variables form a martingale, so it is natural to generalize to other martingales, or even submartingales (X_i). The analog of the classical hypothesis would involve powers of differences |X_{i+1} − X_i|^p. A result with this sort of hypothesis is obtained later for p ≤ 2. For p > 2 the exact analog fails, and the theorem will be shown to hold with the exponent 1 + p/2 replacing p.
We will use this result:

(6.1.6) Kronecker's lemma. Let (x_i) be a sequence of real numbers, and let (c_i) be a positive unbounded increasing sequence. If Σ_{i=1}^∞ (x_i/c_i) converges, then

    lim_{m→∞} (1/c_m) Σ_{i=1}^m x_i = 0.

Proof. Write c_0 = 0. Let ε > 0. There is p ∈ ℕ such that |Σ_{i=j}^m (x_i/c_i)| < ε for p ≤ j ≤ m. Note that

    Σ_{i=1}^m x_i = Σ_{j=1}^m (c_j − c_{j−1}) Σ_{i=j}^m (x_i/c_i).

Therefore if p ≤ m we have

    | (1/c_m) Σ_{i=1}^m x_i | ≤ (1/c_m) Σ_{j=1}^p (c_j − c_{j−1}) | Σ_{i=j}^m (x_i/c_i) | + (1/c_m) Σ_{j=p+1}^m (c_j − c_{j−1}) ε
                             ≤ (1/c_m) Σ_{j=1}^p (c_j − c_{j−1}) | Σ_{i=j}^m (x_i/c_i) | + ε.

Now when m → ∞, the first term goes to 0, so we obtain

    lim sup_{m→∞} | (1/c_m) Σ_{i=1}^m x_i | ≤ ε,

which proves the existence of the required limit.
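Kronecker's lemma is easy to observe numerically. In the sketch below (my example, not from the text), x_i = (−1)^i i^{0.4} and c_i = i; the series Σ x_i/c_i = Σ (−1)^i i^{−0.6} converges by the alternating series test, so the lemma forces the scaled partial sums to vanish even though Σ x_i itself diverges.

```python
def scaled_partial_sum(x, c):
    """(1/c_m) * sum_{i=1}^{m} x_i, the quantity in Kronecker's lemma,
    where x and c are lists of equal length m and c is increasing."""
    return sum(x) / c[-1]

m = 100_000
x = [(-1) ** i * i ** 0.4 for i in range(1, m + 1)]
c = [float(i) for i in range(1, m + 1)]

# sum x_i/c_i = sum (-1)^i * i**-0.6 converges, so (1/m) * sum x_i -> 0;
# at m = 100000 the scaled partial sum is already well below 0.01.
assert abs(scaled_partial_sum(x, c)) < 1e-2
```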
(6.1.7) Proposition. Let p ≥ 1 be a fixed number. Let (X_n) be a nonnegative submartingale, and let (a_i) be a positive increasing predictable sequence with a_i ↑ ∞. Suppose

(6.1.7a)    Σ_{i=1}^∞ E_1[ (X_{i+1}^p − X_i^p)/a_{i+1}^p ] < ∞.

Then lim_{n→∞} X_n/a_n = 0 a.e.

Proof. First, by Kronecker's lemma (6.1.6) applied to (6.1.7a), we have

(6.1.7b)    E_1[ X_{m+1}^p/a_{m+1}^p ] ≤ E_1[ X_1^p/a_{m+1}^p ] + E_1[ (1/a_{m+1}^p) Σ_{i=1}^m (X_{i+1}^p − X_i^p) ] → 0.

By Jensen's inequality (2.3.10), the process (X_n^p) is also a submartingale. Let m < n; apply (6.1.4a) to the finite submartingale (X_m^p, X_{m+1}^p, …, X_n^p), replacing a_i by a_i^p and λ by λ^p; apply the conditional expectation E_1; then take the limit as n → ∞. The result is:

    λ^p P_1{ sup_{i≥m} X_i^p/a_i^p ≥ λ^p } ≤ E_1[ X_m^p/a_m^p ] + Σ_{i=m}^∞ E_1[ (X_{i+1}^p − X_i^p)/a_{i+1}^p ].

The first term on the right converges to 0 by (6.1.7b). The second term converges to 0 by (6.1.7a). Therefore lim X_n/a_n = 0 a.e.

The following theorem is due to Paul Lévy if p = 2. The particular case where the Y_i are independent and with zero expectation is one of the two classical Kolmogorov laws of large numbers. (The second one is proved below: see the case p = 1 of (6.1.15).)
(6.1.8) Theorem (Y. S. Chow). Let p be a fixed number, 1 ≤ p ≤ 2. Let M_n = Σ_{i=1}^n Y_i be a martingale, and let (a_i) be a positive increasing predictable sequence with a_i ↑ ∞. Suppose

(6.1.8a)    Σ_{i=1}^∞ E_1( |Y_i|^p / a_i^p ) < ∞.

Then lim_{n→∞} M_n/a_n = 0 a.e.
Proof. We take first the easy case p = 2. Since ai is predictable, and E2Y+1 = 0, we have E1
Mi+1 2
 M?1 = E1 E2 LYi+1(2Mi +Yi+l) a
ai+1
L
2
ai+1
r
1
2Mj
= E1 1
(( ai+1
rr
2
EiYi+1] + E1Ei l Y2 +11 L ai+1
2
= E1
Yi+1 2
a i+1 Now apply Proposition (6.1.7). For the case p < 2, we will prove that (6.1.8b)
El (IMi+1 IP
 IMiIP) < 2E1(IYi+1 IP)
This may be used in the same way as the martingale equality was used in the case p = 2 to finish the proof. We have
E2(IMiIP)=IMiIP=IEiMiIP
=IEi(2MiMi+l)IP < Ei(I2Mi  Mi+1IP),
Martingales
260
by Jensen's inequality (2.3.10). Therefore EE(IM2+I I'
(6.1.8c)
 IMjI')
Ei (I Mi+1 I P + I2Mi  Mi+1 IP)  2Ei (I Mi I1)
=E2(IMM,+Yi+IIP+IMM Yi+1IP)
2Ei(IMiIP).
We will need the elementary inequality (6.1.8d)
Ia+bIP+IabIP<2(IaIP+IbIP),
for 1 < p < 2. To prove it, observe that we may assume Ial > IbI by interchanging a and b. Then, writing x = b/a, it suffices to prove (1 + x) P +
(1  x)P < 2(1 + IxIP) for 1 < x < 1. Replacing x by x if necessary, it is enough to prove this for 0 < x < 1. Also, the inequality is clear for
x = 0. Let F(x) = (1 + x)P + (1  x)P  2(1 + xP). If G(x) = F(x)/xP, then we must show G(x) < 0 for 0 < x < 1. But G(1) = 2P  4 < 0, so it is enough to show that the derivative G'(x) > 0. Now G'(x) = pH(x)/xP+1 where H(x) = 2  (1 + x)P1  (1  x)p1, so we must show H(x) > 0 for
0 < x < 1. But H(0) = 0 and H'(x) = (p 1) ((1 x)P2  (1 +x)P2) > 0 since p  2 < 0 and 1 + x > 1 x. This completes the proof of the inequality (6.1.8d). Applying (6.1.8d) to (6.1.8c), we obtain
$E_i\big(|M_{i+1}|^p - |M_i|^p\big) \le 2E_i\big(|M_i|^p + |Y_{i+1}|^p\big) - 2E_i\big(|M_i|^p\big) = 2E_i\big(|Y_{i+1}|^p\big).$

Then apply $E_1$ to obtain (6.1.8b).
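The conclusion of Theorem (6.1.8) can be illustrated by simulation. The sketch below (not from the text; the choice of fair $\pm 1$ increments and $a_n = n$ is an assumption made for illustration) generates one path of a martingale with bounded mean-zero increments, for which condition (6.1.8a) holds with $p = 2$, and observes that $M_n/n$ is small for large $n$.

```python
import random

random.seed(0)

# One path of a martingale M_n with bounded, mean-zero increments.
# With a_n = n and p = 2, condition (6.1.8a) holds (the series of 1/n^2
# converges), so M_n / n should be near 0 along the path.
n_steps = 100_000
M = 0.0
for _ in range(n_steps):
    M += random.choice([-1.0, 1.0])   # fair +/-1 increments
assert abs(M / n_steps) < 0.05
```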
Much more difficult is the case p > 2. The proof is based on a fundamental inequality of D. L. Burkholder, proved below (Theorem (6.3.6)). For simplicity we will take ai = i. The exponent in the denominator must be 1 + p/2 < p, so the condition is more difficult to satisfy than the corresponding condition with exponent p. The independent case of the following is due to Brunk [1948] and Chung [1947, 1951].
(6.1.9) Theorem (Y. S. Chow [1967b]). Let $p$ be a fixed number with $2 \le p < \infty$. Let $M_n = \sum_{i=1}^n Y_i$ be a martingale such that

(6.1.9a) $\sum_{i=1}^\infty \dfrac{E_1\big[|Y_i|^p\big]}{i^{1+p/2}} < \infty.$
Then $\lim M_n/n = 0$ a.e.

Proof. By Hölder's inequality (with exponents $p/(p-2)$ and $p/2$), we have

$\sum_{i=1}^n Y_i^2 \le n^{1-2/p}\left(\sum_{i=1}^n |Y_i|^p\right)^{2/p}.$
Now raise both sides to the power $p/2$ and apply $E_1$:

$E_1\left[\left(\sum_{i=1}^n Y_i^2\right)^{p/2}\right] \le n^{p/2-1}\,E_1\left[\sum_{i=1}^n |Y_i|^p\right].$

Next we apply the right side of (6.3.6) with $S_n = \left(\sum_{i=1}^n Y_i^2\right)^{1/2}$. There is a constant $K < \infty$ such that

(6.1.9b) $E_1\left[\left|\sum_{i=1}^n Y_i\right|^p\right] \le K\,n^{p/2-1}\sum_{i=1}^n E_1\big[|Y_i|^p\big].$
Now by Proposition (6.1.7), it suffices to prove

$\sum_{n=2}^\infty \frac{1}{n^p}\left(E_1\left[\left|\sum_{i=1}^n Y_i\right|^p\right] - E_1\left[\left|\sum_{i=1}^{n-1} Y_i\right|^p\right]\right) < \infty.$

Therefore it suffices to prove both that $\lim_n E_1\big[\big|\sum_{i=1}^n Y_i\big|^p\big]/n^p = 0$ and that

$\sum_{n=1}^\infty \left[\frac{1}{n^p} - \frac{1}{(n+1)^p}\right] E_1\left[\left|\sum_{i=1}^n Y_i\right|^p\right] < \infty.$
But by (6.1.9b),

$\frac{1}{n^p}\,E_1\left[\left|\sum_{i=1}^n Y_i\right|^p\right] \le \frac{K}{n^{p/2+1}}\sum_{i=1}^n E_1\big(|Y_i|^p\big),$
which converges to 0 by (6.1.9a) and Kronecker's lemma. Again using (6.1.9b), and since

$\frac{1}{n^p} - \frac{1}{(n+1)^p} = \frac{(n+1)^p - n^p}{n^p(n+1)^p} \le \frac{p(n+1)^{p-1}}{n^p(n+1)^p} \le \frac{p}{n^{p+1}},$

we have

$\sum_{n=1}^\infty \left[\frac{1}{n^p} - \frac{1}{(n+1)^p}\right] E_1\left[\left|\sum_{i=1}^n Y_i\right|^p\right] \le \sum_{n=1}^\infty \frac{p}{n^{p+1}}\,K n^{p/2-1}\sum_{i=1}^n E_1\big(|Y_i|^p\big) = Kp\sum_{n=1}^\infty n^{-p/2-2}\sum_{i=1}^n E_1\big(|Y_i|^p\big) = Kp\sum_{i=1}^\infty E_1\big(|Y_i|^p\big)\sum_{n=i}^\infty n^{-p/2-2} \le \frac{p+4}{p+2}\,Kp\sum_{i=1}^\infty i^{-p/2-1}\,E_1\big(|Y_i|^p\big) < \infty,$

by (6.1.9a); here we used $\sum_{n=i}^\infty n^{-p/2-2} \le i^{-p/2-2} + \int_i^\infty x^{-p/2-2}\,dx \le \left(1 + \frac{2}{p+2}\right) i^{-p/2-1}$.
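The two elementary estimates used at the end of the proof (the difference bound and the tail-sum bound) are easy to check numerically; the Python sketch below (illustrative only, with $p = 3$ chosen arbitrarily) does so.

```python
def tail_sum(i, s, terms=200_000):
    # partial sum of n^{-s} for n = i .. i+terms (close to the full tail for s > 1)
    return sum(n ** -s for n in range(i, i + terms))

p = 3.0
# Difference estimate: 1/n^p - 1/(n+1)^p <= p / n^(p+1)
assert all(1 / n ** p - 1 / (n + 1) ** p <= p / n ** (p + 1)
           for n in range(1, 1000))

# Tail estimate: sum_{n >= i} n^{-(p/2+2)} <= (1 + 2/(p+2)) * i^{-(p/2+1)}
s = p / 2 + 2
for i in (1, 2, 5, 10, 50):
    assert tail_sum(i, s) <= (1 + 2 / (p + 2)) * i ** -(p / 2 + 1)
```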
Martingales
262
A generalization of independence: star-mixing

We restrict attention now to a probability space $(\Omega, \mathcal{F}, P)$. Let $(\mathcal{A}_n)_{n\ge 1}$ be a family of $\sigma$-algebras; typically $\mathcal{A}_n$ is generated by a single random variable $Y_n$. Let $\mathcal{F}_n = \sigma\left(\bigcup_{i\le n} \mathcal{A}_i\right)$.

The family $(\mathcal{A}_n)$ is called star-mixing iff there exist a positive integer $N$ and a function $f(n)$, defined for $n \ge N$, such that $f(n) \downarrow 0$ and: if $n \ge N$, $A \in \mathcal{F}_m$, $B \in \mathcal{A}_{m+n}$, then

$|P(A \cap B) - P(A)P(B)| \le f(n)\,P(A)P(B).$

(6.1.10) Lemma. Assume that $(\mathcal{A}_n)$ is star-mixing with constant $N$ and function $f(n)$. Then for any $\sigma$-algebra $\mathcal{C} \subseteq \mathcal{F}_m$, any $n \ge N$, and any integrable random variable $X$ measurable with respect to $\mathcal{A}_{m+n}$,

$\big|E^{\mathcal{C}}[X] - E[X]\big| \le f(n)\,E[|X|].$

Proof. Let $Z_k$ be simple and $\mathcal{A}_{m+n}$-measurable so that $0 \le Z_k \uparrow X^+$. Fix $k$. Then $Z_k = \sum_{i=1}^s b_i 1_{B_i}$, where $b_i \ge 0$ and $B_i \in \mathcal{A}_{m+n}$. Choose $A \in \mathcal{C}$
such that $P(A) > 0$. Then

$\big|E[Z_k \mid A] - E[Z_k]\big| \le \sum_{i=1}^s b_i \left|\frac{P(B_i \cap A)}{P(A)} - P(B_i)\right| \le f(n)\sum_{i=1}^s b_i P(B_i) = f(n)\,E[Z_k].$

Then letting $k \to \infty$, we obtain $\big|E[X^+ \mid A] - E[X^+]\big| \le f(n)\,E[X^+]$.

This remains true if $A$ is replaced by $\mathcal{C}$ (integrate over sets $A \in \mathcal{C}$), and also if $X^+$ is replaced by $X^-$. The lemma follows.
If $(Y_n)$ is a stochastic process, we will say it is star-mixing if the $\sigma$-algebras $\mathcal{A}_n = \sigma(Y_n)$ are star-mixing.
(6.1.11) Theorem. Let $(Y_n)$ be star-mixing, mean 0 ($E[Y_n] = 0$), and $L_1$-bounded ($\|Y_n\|_1 \le K < \infty$). Let $p$ be such that $1 < p < \infty$. Assume that

(6.1.11a) $\sum_{i=1}^\infty E\big[|Y_i|^p\big]/i^p < \infty$, if $p \le 2$; or

(6.1.11b) $\sum_{i=1}^\infty E\big[|Y_i|^p\big]/i^{1+p/2} < \infty$, if $p > 2$.

Then

(6.1.11c) $\dfrac{1}{n}\sum_{i=1}^n Y_i \to 0$ a.s.
Proof. Fix an integer $k$ with $0 \le k < N$, and apply Lemma (6.1.10) to the process $X_m = Y_{mN+k}$. If $\mathcal{W}_m = \sigma(Y_{mN+k}, Y_{(m-1)N+k}, \dots, Y_{N+k})$ for $m \ge 1$ and $\mathcal{W}_0 = \{\Omega, \emptyset\}$, then

$\big|E^{\mathcal{W}_{m-1}}[Y_{mN+k}] - E[Y_{mN+k}]\big| \le f(N)\,E\big[|Y_{mN+k}|\big].$

Now $E[Y_{mN+k}] = 0$ and $E[|Y_{mN+k}|] \le K$, so given $\varepsilon$ we may choose $N$ so large that

(6.1.11d) $\big|E^{\mathcal{W}_{m-1}}[Y_{mN+k}]\big| \le \varepsilon K$

for $m \ge 1$ and $0 \le k < N$. Fix $k$ and $N$. Let $T_m = Y_{mN+k}$ for $m \ge 1$. Then $U_m = T_m - E^{\mathcal{W}_{m-1}}[T_m]$ defines a martingale difference sequence. The sequence $(T_m)$ satisfies (6.1.11a) or (6.1.11b), as appropriate. Now "centering," that is, replacing $T_m$ by $U_m$, does not change this, because

$|U_m| \le |T_m| + E^{\mathcal{W}_{m-1}}[|T_m|] \le 2\max\big\{|T_m|,\; E^{\mathcal{W}_{m-1}}[|T_m|]\big\},$

so by Jensen's inequality

$|U_m|^p \le 2^p\max\big\{|T_m|^p,\; \big(E^{\mathcal{W}_{m-1}}[|T_m|]\big)^p\big\} \le 2^p\max\big\{|T_m|^p,\; E^{\mathcal{W}_{m-1}}[|T_m|^p]\big\} \le 2^p\big(|T_m|^p + E^{\mathcal{W}_{m-1}}[|T_m|^p]\big).$

This implies

(6.1.11e) $E[|U_m|^p] \le 2^p\big(E[|T_m|^p] + E[|T_m|^p]\big) = 2^{p+1}E[|T_m|^p].$
Therefore the law of large numbers for martingale differences (Theorems (6.1.8) and (6.1.9)) implies that

$\lim_r \frac{1}{r}\sum_{m=1}^r \big[Y_{mN+k} - E^{\mathcal{W}_{m-1}}[Y_{mN+k}]\big] = 0 \quad\text{a.s.}$

With (6.1.11d), this implies that

$\limsup_r \left|\frac{1}{r}\sum_{m=1}^r Y_{mN+k}\right| \le \varepsilon K.$

Since $\varepsilon$ is arbitrary, the left side is 0. This holds for each $k$ with $0 \le k < N$, which implies the theorem.

Complements
(6.1.12) Inequality (6.1.11e) can be improved to $E[|U_m|^p] \le 2^pE[|T_m|^p]$. In the estimation of $|U_m|^p$, use the inequality $|x+y|^p \le 2^{p-1}(|x|^p + |y|^p)$, which is true since the function $|x|^p$ is convex.

(6.1.13) (Example of a star-mixing process.) A stationary ergodic Markov chain with countably many states and transition probabilities $p_{ij}$ is star-mixing if and only if there is a number $\beta$ with $0 < \beta < 1$ such that for all $j$,

$\sup_i p_{ij} \le (1+\beta)\inf_i p_{ij}$

(Blum, Hanson, & Koopmans [1963]).
(6.1.14) (Qualitative star-mixing.) The family $(\mathcal{A}_n)$ is Q-star-mixing if there exist a positive integer $N$ and a constant $a > 0$ such that if $n \ge N$, $A \in \mathcal{F}_m$, $B \in \mathcal{A}_{m+n}$, then

$|P(A \cap B) - P(A)P(B)| \le a\,P(A)P(B).$

Q-star-mixing is sufficient for a nonanticipating converse of the dominated ergodic theorem: If $(Y_n)$ is a stationary Q-star-mixing positive process, $X_n = (1/n)\sum_{i=1}^n Y_i$, and $Y_1 \notin L\log L$, then there is a stopping time $\tau$ such that $E[|X_\tau|] = \infty$ (Edgar, Millet, & Sucheston [1982]). In Theorem (6.1.11), if star-mixing is replaced by Q-star-mixing, the conclusion (6.1.11c) should be replaced by

$\limsup_n \frac{1}{n}\sum_{i=1}^n Y_i < \infty \quad\text{a.s.}$
(6.1.15) (Laws of large numbers.) The Kolmogorov strong law of large numbers for independent random variables $Y_i$ with the same distribution and finite expectation has a martingale proof. This has been hailed as a marked success of the theory. J. L. Doob observed (see Doob [1953], p. 343) that $X_{-n} = (1/n)\sum_{i=1}^n Y_i$ is a reversed martingale, hence converges a.e. In fact it suffices to assume that the $Y_i$'s are exchangeable. This is extended here. Let $I$ be a set. A permutation on $I$ is a bijection of the set $I$ onto itself. The support of a permutation $\pi$ is the set $\{\, i \in I : \pi(i) \ne i \,\}$. A family
$(Y_i)_{i\in I}$ of random variables is called exchangeable if, for each permutation $\pi$ with finite support, the family $(Y_i)_{i\in I}$ has the same distribution as the family $(Y_{\pi(i)})_{i\in I}$. Suppose $(Y_n)_{n\in\mathbb{N}}$ is an exchangeable family of positive integrable random variables. For $m \in \mathbb{N}$, let

$X_{-m} = \frac{(Y_1 + \cdots + Y_m)^p}{m}, \qquad \mathcal{F}_{-m} = \sigma\{\, X_{-r} : r \ge m \,\}.$

Then $(X_{-n})_{n\in\mathbb{N}}$ is a reversed submartingale if $0 < p \le 1$, and a reversed supermartingale if $1 \le p < \infty$. For $0 < p \le 1$, the sequence $X_{-n}$ converges a.s. The limit is 0 if $0 < p < 1$ (Edgar & Sucheston [1981]). In order to prove this, we begin with a simple lemma.
Lemma. Let $\mathcal{P}_m$ be the set of permutations of $\mathbb{N}$ with support in the set $\{1, \dots, m\}$. Let $y_1, \dots, y_m$ be nonnegative real numbers, let $1 \le k \le m$, and let $0 < p \le 1$. Then

$\frac{1}{m!}\sum_{\pi\in\mathcal{P}_m} \big(y_{\pi(1)} + \cdots + y_{\pi(k)}\big)^p \ge \frac{k}{m}\,\big(y_1 + \cdots + y_m\big)^p.$

The reverse inequality holds if $1 \le p < \infty$.

Proof. Indeed, for all integers $k \le m$, we have

$(m-1)!\,k\,(y_1 + \cdots + y_m) = \sum_{\pi\in\mathcal{P}_m} \big(y_{\pi(1)} + \cdots + y_{\pi(k)}\big) = \sum_{\pi\in\mathcal{P}_m} \big(y_{\pi(1)} + \cdots + y_{\pi(k)}\big)^p \big(y_{\pi(1)} + \cdots + y_{\pi(k)}\big)^{1-p} \le \sum_{\pi\in\mathcal{P}_m} \big(y_{\pi(1)} + \cdots + y_{\pi(k)}\big)^p \big(y_1 + \cdots + y_m\big)^{1-p}.$

Hence

$(m-1)!\,k\,(y_1 + \cdots + y_m)^p \le \sum_{\pi\in\mathcal{P}_m} \big(y_{\pi(1)} + \cdots + y_{\pi(k)}\big)^p,$

and the claim follows on dividing by $m!$.
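The lemma can be checked directly for small $m$ by enumerating permutations; the following Python sketch (an illustration, not part of the text) does this for both ranges of $p$.

```python
from itertools import permutations
from math import factorial

def avg_power_sum(y, k, p):
    # (1/m!) * sum over permutations pi of (y_pi(1) + ... + y_pi(k))^p
    m = len(y)
    total = sum(sum(perm[:k]) ** p for perm in permutations(y))
    return total / factorial(m)

y = (0.5, 2.0, 3.0, 0.25)
m = len(y)
for k in range(1, m + 1):
    for p in (0.3, 0.7, 1.0):          # 0 < p <= 1: lower bound of the lemma
        assert avg_power_sum(y, k, p) >= (k / m) * sum(y) ** p - 1e-9
    for p in (1.0, 1.5, 2.5):          # 1 <= p < infinity: reversed inequality
        assert avg_power_sum(y, k, p) <= (k / m) * sum(y) ** p + 1e-9
```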
Now assume that $(Y_i)_{i\in\mathbb{N}}$ are exchangeable, and $Y_i \ge 0$. Let $S_m = Y_1 + \cdots + Y_m$ and $X_{-m} = S_m^p/m$ for $m \in \mathbb{N}$. For all $\pi \in \mathcal{P}_m$, the family

$(Y_1, \dots, Y_m, S_m, S_{m+1}, \dots)$

has the same joint distribution as the family

$(Y_{\pi(1)}, \dots, Y_{\pi(m)}, S_m, S_{m+1}, \dots).$

Hence, for all $k \le m$,

$E^{\mathcal{F}_{-m}}\big[(Y_1 + \cdots + Y_k)^p\big] = E^{\mathcal{F}_{-m}}\big[(Y_{\pi(1)} + \cdots + Y_{\pi(k)})^p\big].$

Therefore, by the lemma, if $0 < p \le 1$,

$E^{\mathcal{F}_{-m}}\left[\frac{(Y_1 + \cdots + Y_k)^p}{k}\right] \ge \frac{(Y_1 + \cdots + Y_m)^p}{m}.$

If $1 \le p < \infty$, the reverse inequality holds.

Now we prove convergence in the case $0 < p \le 1$. Then $X_{-m}$ is dominated by $E^{\mathcal{F}_{-m}}[Y_1^p]$ and converges a.e. by (1.2.8), say to $X$. Assume $p < 1$. If $X > 0$ on a nonnull set $A$, then $\lim X_{-2m} = \lim X_{-m} = X$ implies that on $A$ we have $(S_{2m}/S_m)^p \to 2$, where $S_m = \sum_{i=1}^m Y_i$, hence

$\frac{Y_{m+1} + \cdots + Y_{2m}}{Y_1 + \cdots + Y_m} \to 2^{1/p} - 1 > 1.$

This is a contradiction, because $X$ is determined by the exchangeable sequence $(Y_m)$, so that the random variable $X$ is invariant under the permutation interchanging $(1, \dots, m)$ with $(m+1, \dots, 2m)$. The convergence of $X_{-m}$ to zero when $p < 1$ can be proved in many ways; see for example Neveu [1965a], p. 153; and Bru, Heinich, & Lootgieter [1981]. For an ergodic approach, see (8.6.18), below.
It is easy to see that $X_{-m}$ diverges for $p > 1$: compare with the case $p = 1$ (see (6.1.16), below). Yet the information that the process is a reversed supermartingale can be useful: for finite stretches, the "direct" and reversed processes agree, so supermartingale results proved above, for example (6.1.2), are applicable.

(6.1.16) For $p = 1$, the limit $X$ in (6.1.15) can be identified as $E^{\mathcal{S}}[Y_1]$, where $\mathcal{S}$ is the $\sigma$-algebra of "exchangeable" events: that is, events depending on the sequence $(Y_n)$ and invariant under permutations of finite support. It can be proved that $\mathcal{S}$ agrees (up to null sets) with the "tail" $\sigma$-algebra: see, for example, Meyer [1966], pp. 149-150.

Remarks
Another way to achieve the generality of replacing the expectation by $E_1$ would involve using a regular conditional distribution with respect to $\mathcal{F}_1$ (see Billingsley [1979], pp. 390, 399). Then the case of general $\mathcal{F}_1$ can be deduced from the case of trivial $\mathcal{F}_1$ and finite $p$. For Proposition (6.1.4) of Hájek-Rényi-Chow, see Hájek & Rényi [1956], Frank [1966], Bauer [1981], and Chow [1960a].
That the process

$Z_n = \frac{(Y_1 + Y_2 + \cdots + Y_n)^p}{n}$

of (6.1.15) is an amart was observed by A. Gut [1982]. He used the Marcinkiewicz theorem to observe that $Z_n$, and hence $Z_\tau$, converges a.s., and since the process is uniformly integrable, $E[Z_\tau]$ converges. This motivated us to show that in fact the process is a reversed submartingale, which does not use the Marcinkiewicz theorem, but instead proves it.
6.2. Decompositions of submartingales

This section discusses Doob's decomposition and Krickeberg's decomposition; the quadratic variation of a process; and convergence of martingale transforms. A probability space $(\Omega, \mathcal{F}, P)$ and a stochastic basis $(\mathcal{F}_n)_{n\in\mathbb{N}}$ will be fixed. If $(X_n)_{n\in\mathbb{N}}$ is an adapted process, we may refer to it using a bold letter: $\mathbf{X} = (X_n)$. Then sets of equations, such as $X_n = M_n + A_n$ for all $n$, will be abbreviated in the obvious way: $\mathbf{X} = \mathbf{M} + \mathbf{A}$.

Recall that a process $\mathbf{V} = (V_n)$ is called predictable if $V_{n+1}$ is $\mathcal{F}_n$-measurable for all $n$.

Doob's decomposition
We begin with a decomposition theorem for submartingales.

(6.2.1) Theorem. A submartingale [supermartingale] $\mathbf{X}$ can be uniquely written as $\mathbf{X} = \mathbf{M} + \mathbf{A}$ [$\mathbf{X} = \mathbf{M} - \mathbf{A}$], where $\mathbf{M}$ is a martingale and $\mathbf{A}$ is a predictable increasing process with $A_1 = 0$. If $\mathbf{X}$ is $L_1$-bounded, then so is $\mathbf{M}$, and then the limit $A_\infty = \lim_n A_n$ is integrable.
Proof. Let $\mathbf{X} = (X_n)$ be a submartingale. Let $M_1 = X_1$ and $A_1 = 0$, then define recursively

$M_{n+1} = M_n + X_{n+1} - E^{\mathcal{F}_n}[X_{n+1}], \qquad A_{n+1} = A_n + E^{\mathcal{F}_n}[X_{n+1}] - X_n.$

Then $\mathbf{A} = (A_n)$ is predictable, positive, and increasing, and $\mathbf{M} = (M_n)$ is a martingale. To check the uniqueness, suppose that $X_n = M_n' + A_n'$ is another decomposition with the same properties. Then

$A_{n+1}' - A_n' = E^{\mathcal{F}_n}[A_{n+1}' - A_n'] = E^{\mathcal{F}_n}\big[X_{n+1} - X_n - (M_{n+1}' - M_n')\big] = E^{\mathcal{F}_n}[X_{n+1} - X_n] - 0 = A_{n+1} - A_n.$

Since $A_1 = 0 = A_1'$, this means that $A_n = A_n'$ for all $n$. Thus also $M_n' = X_n - A_n' = X_n - A_n = M_n$.

Assume now that we have $\sup_n E[|X_n|] < \infty$. Since $A_n \ge 0$, we have $\sup_n E[M_n^+] \le \sup_n E[X_n^+] < \infty$. But $(M_n)$ is a martingale, so $\sup_n E[|M_n|] = \sup_n\big(2E[M_n^+] - E[M_1]\big) < \infty$. Then $\mathbf{A}$ is also $L_1$-bounded, so the integrability of the limit follows from the monotone convergence theorem.

The supermartingale case is obtained by applying the submartingale case to the process $(-X_n)$.
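To make the recursion concrete, here is a small Python sketch (not from the text) for the submartingale $X_n = S_n^2$, where $S_n$ is a simple symmetric random walk: there $E^{\mathcal{F}_n}[X_{n+1}] = X_n + 1$ exactly, so the recursion gives $A_n = n - 1$ and $M_n = S_n^2 - (n-1)$.

```python
# Doob decomposition of X_n = S_n^2 for a simple symmetric random walk S_n.
# The one-step conditional expectation is exact: E[(S +/- 1)^2 | F_n] = S^2 + 1.

def doob_decomposition(path):
    """path: walk values S_1, S_2, ...; returns lists M, A with X = M + A."""
    X = [s * s for s in path]
    M, A = [X[0]], [0.0]
    for n in range(len(path) - 1):
        cond_exp = X[n] + 1.0                # E[X_{n+1} | F_n], exact for this chain
        M.append(M[-1] + X[n + 1] - cond_exp)
        A.append(A[-1] + cond_exp - X[n])
    return M, A

path = [1, 0, -1, -2, -1, 0, 1, 2]           # one admissible walk path
M, A = doob_decomposition(path)
assert A == [float(n) for n in range(len(path))]            # predictable part: A_n = n - 1
assert all(m + a == s * s for m, a, s in zip(M, A, path))   # X = M + A pathwise
assert all(A[i] <= A[i + 1] for i in range(len(A) - 1))     # A is increasing
```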
Next is the Krickeberg decomposition.

(6.2.2) Theorem. An $L_1$-bounded submartingale $\mathbf{X}$ can be written as $\mathbf{X} = \mathbf{M} - \mathbf{R}$, where $\mathbf{M}$ is a positive martingale and $\mathbf{R}$ is a positive supermartingale. If $\mathbf{X}$ is a martingale, then also $\mathbf{R}$ is a positive martingale, and

$E[M_1] + E[R_1] = \sup_n E[|X_n|].$

Proof. It is easy to see that $\mathbf{X}^+ = (X_n^+)$ is also a submartingale (see also the proof of Theorem (6.1.4)). The sequence $\mathbf{M}$ will be obtained as the martingale part in the Riesz decomposition of $X_n^+$. Actually, the argument is easier than for general amarts (1.4.6), since for $n \ge p$

$E^{\mathcal{F}_n}[X_{p+1}^+] = E^{\mathcal{F}_n}\big[E^{\mathcal{F}_p}[X_{p+1}^+]\big] \ge E^{\mathcal{F}_n}[X_p^+].$

Set $M_n = \lim_p \uparrow E^{\mathcal{F}_n}[X_p^+]$. The supermartingale $\mathbf{R} = \mathbf{M} - \mathbf{X}$ is positive. Finally, if $\mathbf{X}$ is a martingale, then

$E[R_1] = E[M_1] - E[X_1] = \lim_p \uparrow E[X_p^+] - E[X_1] = \lim_p \uparrow \big(E[X_p^+] - E[X_p]\big) = \lim_p \uparrow E[X_p^-].$

Therefore $E[M_1] + E[R_1] = \lim E[X_p^+] + \lim E[X_p^-] = \sup_n E[|X_n|]$.

The Krickeberg decomposition is not unique; see (6.2.7).

The quadratic variation of a process $\mathbf{X}$ is the random variable $Q$ defined as

$Q = X_1^2 + \sum_{i=1}^\infty (X_{i+1} - X_i)^2.$
It is a curious fact (Austin [1966]) that the quadratic variation of any positive martingale is finite. First we consider bounded positive supermartingales.
(6.2.3) Theorem. Let $Q$ be the quadratic variation of a positive supermartingale $(X_n)$ bounded by a constant $c$. Then:

(a) $Q$ is integrable: $E[Q] \le 2cE[X_1]$.

(b) If $X_n = M_n - A_n$ is Doob's decomposition, then $E[A_\infty] \le E[X_1]$ and $(M_n)$ is $L_2$-bounded: $\|M_n\|_2^2 \le 2cE[X_1]$.
Proof. The quadratic variation $Q$ is $\lim_n Q_n$, where

$Q_n = X_1^2 + \sum_{i=1}^{n-1}(X_{i+1} - X_i)^2 = X_n^2 + 2\sum_{i=1}^{n-1} X_i(X_i - X_{i+1}).$

Doob's decomposition $\mathbf{X} = \mathbf{M} - \mathbf{A}$ yields $X_i - X_{i+1} = (M_i - M_{i+1}) + (A_{i+1} - A_i)$, and therefore $E^{\mathcal{F}_i}[X_i(X_i - X_{i+1})] = X_i(A_{i+1} - A_i)$. Hence

$E[Q_n] = E\left[X_n^2 + 2\sum_{i=1}^{n-1} X_i(A_{i+1} - A_i)\right] \le cE\left[X_n + 2\sum_{i=1}^{n-1}(A_{i+1} - A_i)\right] = cE[X_n + 2A_n] \le 2cE[X_n + A_n] = 2cE[M_n] = 2cE[M_1] = 2cE[X_1].$
This proves (a). Since $X_n = M_n - A_n$, we have $E[A_n] = E[A_n] - E[A_1] = E[X_1] - E[X_n] \le E[X_1]$, so we have, by monotone convergence, $E[A_\infty] \le E[X_1]$. (This does not require $X_n \le c$.) For the last part of (b), we prove that $\|M_n\|_2^2 \le E[Q]$. The simple "centering" identity

$E\big[(X - E[X])^2\big] = E[X^2] - E[X]^2$

is also true with conditional expectation in place of expectation. This fact will be used twice. First, we have

$E^{\mathcal{F}_i}\big[(M_{i+1} - M_i)^2\big] = E^{\mathcal{F}_i}\big[M_{i+1}^2\big] - M_i^2.$

Applying the centering identity again, since

$M_{i+1} - M_i = (X_{i+1} - X_i) - E^{\mathcal{F}_i}[X_{i+1} - X_i],$

we have

$E^{\mathcal{F}_i}\big[(M_{i+1} - M_i)^2\big] = E^{\mathcal{F}_i}\big[(X_{i+1} - X_i)^2\big] - \big(E^{\mathcal{F}_i}[X_{i+1} - X_i]\big)^2 \le E^{\mathcal{F}_i}\big[(X_{i+1} - X_i)^2\big].$

Now we can compute:

$\|M_n\|_2^2 = E[M_n^2] = E\left[M_1^2 + \sum_{i=1}^{n-1}(M_{i+1} - M_i)^2\right] \le E\left[X_1^2 + \sum_{i=1}^{n-1}(X_{i+1} - X_i)^2\right] = E[Q_n] \le E[Q].$

So by (a), we have $\|M_n\|_2^2 \le 2cE[X_1]$.
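The algebraic identity for $Q_n$ used at the start of the proof is purely mechanical; the Python snippet below (illustrative only) compares the two expressions on arbitrary sequences.

```python
def qn_forward(x):
    # Q_n = X_1^2 + sum (X_{i+1} - X_i)^2
    return x[0] ** 2 + sum((x[i + 1] - x[i]) ** 2 for i in range(len(x) - 1))

def qn_telescoped(x):
    # Q_n = X_n^2 + 2 * sum X_i (X_i - X_{i+1})
    return x[-1] ** 2 + 2 * sum(x[i] * (x[i] - x[i + 1]) for i in range(len(x) - 1))

for seq in ([3.0], [3.0, 1.0], [2.0, 5.0, 4.0, 4.0, 0.5], [1.0, -2.0, 7.0]):
    assert abs(qn_forward(seq) - qn_telescoped(seq)) < 1e-9
```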
Next we prove a maximal inequality for the quadratic variation of an arbitrary positive supermartingale. It will be convenient to state it in terms of a homogeneous expression, namely the square function of $\mathbf{X}$, defined by

$S = \sqrt{Q}.$
(6.2.4) Theorem. Let $(X_n)$ be a positive supermartingale, and let $S = \sqrt{Q}$ be its square function. Then, for every $\lambda > 0$, we have

$P\Big\{\sup_n X_n \le \lambda,\; S > \lambda\Big\} \le \frac{2}{\lambda}E[X_1], \qquad P\{S > \lambda\} \le \frac{3}{\lambda}E[X_1].$

Hence $S$ and $Q$ are finite a.s.

Proof. Let $c > 0$ be a constant. The process $(X_n \wedge c)$ is also a supermartingale. Write $Q^{(c)}$ for its quadratic variation. By Theorem (6.2.3(a)), we have $E[Q^{(c)}] \le 2cE[X_1 \wedge c] \le 2cE[X_1]$. Now on the set $\{\sup_n X_n \le c\}$ we have $Q = Q^{(c)}$, so

$P\Big\{\sup_n X_n \le c,\; Q^{(c)} > \lambda^2\Big\} \le P\{Q^{(c)} > \lambda^2\} \le \frac{E[Q^{(c)}]}{\lambda^2} \le \frac{2c}{\lambda^2}E[X_1].$

Now choose $c = \lambda$ to obtain the first inequality in the theorem. By Proposition (6.1.2), we have $P\{\sup_n X_n > \lambda\} \le (1/\lambda)E[X_1]$, so

$P\{S > \lambda\} = P\Big\{S > \lambda,\; \sup_n X_n \le \lambda\Big\} + P\Big\{S > \lambda,\; \sup_n X_n > \lambda\Big\} \le \frac{2}{\lambda}E[X_1] + \frac{1}{\lambda}E[X_1] = \frac{3}{\lambda}E[X_1].$

(6.2.5) Proposition. (i) If $(X_n)$ is a martingale, then its square function $S$ satisfies

$P\{S > \lambda\} \le \frac{6}{\lambda}\sup_n E[|X_n|].$

(ii) If $(X_n)$ is a positive submartingale, then

$P\{S > \lambda\} \le \frac{12}{\lambda}\sup_n E[X_n].$
Proof. Since the inequalities are trivial if $\mathbf{X}$ is not $L_1$-bounded, we may assume that $\mathbf{X}$ is $L_1$-bounded, and apply the Krickeberg decomposition (6.2.2): $\mathbf{X} = \mathbf{M} - \mathbf{R}$, where $\mathbf{M}$ is a positive martingale and $\mathbf{R}$ is a positive supermartingale. The difference sequences satisfy

$X_n - X_{n-1} = (M_n - M_{n-1}) - (R_n - R_{n-1}).$

Let $S$ be the square function of $\mathbf{X}$, let $S'$ be the square function of $\mathbf{M}$, and let $S''$ be the square function of $\mathbf{R}$. By the triangle inequality in the sequence space $l^2(\mathbb{N})$, we have $S \le S' + S''$. Therefore, by Theorem (6.2.4),

$P\{S > \lambda\} \le P\{S' > \lambda/2\} + P\{S'' > \lambda/2\} \le \frac{6}{\lambda}\big(E[M_1] + E[R_1]\big).$

Next, if $\mathbf{X}$ is a martingale, then by (6.2.2) $\sup_n E[|X_n|] = E[M_1] + E[R_1]$, which proves (i). If $\mathbf{X}$ is a positive submartingale, then $E[M_1] = \lim_p \uparrow E[X_p^+] = \sup_n E[X_n]$, and $R_1 = M_1 - X_1$ implies that $E[R_1] \le \sup_n E[X_n]$, which proves (ii).
Part (i) is proved by Burkholder [1973] with constant 3 rather than 6.

Martingale transforms

Let $\mathbf{X} = (X_n)$ be a sequence of random variables. Its difference sequence is the process $\mathbf{Y} = (Y_n)$ such that $X_n = \sum_{i=1}^n Y_i$. If $\mathbf{V} = (V_n)$ is another process, then the transform of $\mathbf{X}$ by $\mathbf{V}$ is the process $\mathbf{Z} = (Z_n)$ given by $Z_n = \sum_{i=1}^n V_iY_i$. We will sometimes write $\mathbf{Z} = \mathbf{V} * \mathbf{X}$.

If $X_n$ is the fortune of a gambler at time $n$, then the transform $Z_n$ may be viewed as the result of controlling $\mathbf{X}$ by $\mathbf{V}$. If we assume that $\mathbf{V}$ is predictable, then multiplication of $Y_i$ by $V_i$ is equivalent to changing the stakes for the $i$th game on the basis of information available before the $i$th game.

If $\mathbf{X}$ is a martingale and $\mathbf{V}$ is predictable, then the transform $\mathbf{Z} = \mathbf{V} * \mathbf{X}$ is also a martingale (see Section 3.2). We will now prove that the transform of an $L_1$-bounded martingale by a bounded predictable process converges. This is a surprising result, since (see (6.2.8)) the transform need not be $L_1$-bounded.
(6.2.6) Theorem. Let $\mathbf{X}$ be an $L_1$-bounded submartingale or an $L_1$-bounded supermartingale. If $\mathbf{V}$ is an $L_\infty$-bounded predictable process, then the transform $\mathbf{Z} = \mathbf{V} * \mathbf{X}$ converges a.s.

Proof. We may assume that $|V_n| \le 1$ for all $n$, by multiplying by a constant. If $\mathbf{X}$ is an $L_1$-bounded submartingale, and $\mathbf{X} = \mathbf{X}' - \mathbf{X}''$ is the Krickeberg decomposition (6.2.2), then $\mathbf{V} * \mathbf{X} = \mathbf{V} * \mathbf{X}' - \mathbf{V} * \mathbf{X}''$, and $\mathbf{X}'$, $\mathbf{X}''$ are both positive supermartingales; therefore it is enough to prove the result when $\mathbf{X}$ is a positive supermartingale.
First suppose the positive supermartingale $\mathbf{X}$ is $L_\infty$-bounded, say $X_n \le c$ a.s. Let $\mathbf{X} = \mathbf{M} - \mathbf{A}$ be the Doob decomposition of $\mathbf{X}$. Since $\mathbf{V} * \mathbf{X} = \mathbf{V} * \mathbf{M} - \mathbf{V} * \mathbf{A}$, it suffices to show that both of these parts converge a.s. Convergence of $\mathbf{V} * \mathbf{A}$ follows from the monotone convergence of $A_n$ to $A_\infty$, since

$\big|(V * A)_n - (V * A)_{n-1}\big| \le |V_n|\,|A_n - A_{n-1}| \le A_n - A_{n-1}.$

Next,

$(V * M)_1^2 = V_1^2M_1^2 \le M_1^2, \qquad \big[(V * M)_{n+1} - (V * M)_n\big]^2 = V_{n+1}^2(M_{n+1} - M_n)^2 \le (M_{n+1} - M_n)^2,$

and

$E\big[(V * M)_n^2\big] = E\left[(V * M)_1^2 + \sum_{k=1}^{n-1}\big[(V * M)_{k+1} - (V * M)_k\big]^2\right] \le E\left[M_1^2 + \sum_{k=1}^{n-1}(M_{k+1} - M_k)^2\right] = E[M_n^2] \le 2cE[X_1].$

Now $\mathbf{V} * \mathbf{M}$ is $L_2$-bounded, hence $L_1$-bounded, and therefore it converges a.s.

For a general positive supermartingale $\mathbf{X}$, fix a positive constant $c$ and consider the process $\mathbf{X}'$ defined by $X_n' = X_n \wedge c$. The process $\mathbf{X}'$ is a bounded positive supermartingale, so by the previous part of the proof, the process $\mathbf{V} * \mathbf{X}'$ converges a.s. The transform $\mathbf{V} * \mathbf{X}'$ agrees with $\mathbf{V} * \mathbf{X}$ on the set $\Omega_c = \{\sup_n X_n \le c\}$. Therefore $\mathbf{V} * \mathbf{X}$ converges a.s. on $\Omega_c$. By the maximal inequality (6.1.2a), the space $\Omega$ is, up to a null set, the countable union of the sets $\Omega_m = \{\sup_n X_n \le m\}$, so $\mathbf{V} * \mathbf{X}$ converges a.s.

Finally, if $\mathbf{X}$ is an $L_1$-bounded supermartingale, then $-\mathbf{X}$ is an $L_1$-bounded submartingale, so $\mathbf{V} * \mathbf{X} = -\big[\mathbf{V} * (-\mathbf{X})\big]$ converges a.s.

Complements
(6.2.7) The Krickeberg decomposition $\mathbf{X} = \mathbf{M} - \mathbf{R}$ of Theorem (6.2.2) is not unique, since any positive martingale can be added to each term. If $M_n$ is defined as $\lim_p E^{\mathcal{F}_n}[X_p^+]$, as in the proof of Theorem (6.2.2), then the decomposition is minimal, in the sense that there is no nonzero positive martingale dominated by both $\mathbf{M}$ and $\mathbf{R}$.
(6.2.8) (Example of an unbounded transform.) Let $\Omega = \{1, 2, 3, \dots\}$ with $P\{k\} = 1/\big(k(k+1)\big)$, so that $P\{k, k+1, \dots\} = 1/k$. Define the process $\mathbf{X}$ by

$X_n(k) = \begin{cases} -1 & \text{if } k \le n \\ n & \text{if } k > n. \end{cases}$

Then $\|X_n\|_1 < 2$ for all $n$. The process is adapted to the stochastic basis $(\mathcal{F}_n)$, where $\mathcal{F}_n$ is the finite $\sigma$-algebra with the $n+1$ atoms

$\{1\}, \{2\}, \dots, \{n\}, \{n+1, n+2, \dots\}.$

Let $Y_n = X_n - X_{n-1}$ be the difference sequence. To prove that $(X_n)$ is a martingale, we will show that $E^{\mathcal{F}_n}[Y_{n+1}] = 0$. Now

$Y_n(k) = \begin{cases} 0 & \text{if } k < n \\ -n & \text{if } k = n \\ 1 & \text{if } k > n. \end{cases}$

The only atom of $\mathcal{F}_n$ on which $Y_{n+1}$ does not vanish is $A = \{n+1, n+2, \dots\}$. Thus we must show $E[1_AY_{n+1}] = 0$. Indeed,

$-(n+1)\cdot\frac{1}{(n+1)(n+2)} + \frac{1}{n+2} = 0.$

We will transform by the process $V_i = (-1)^{i+1}$. Let $\mathbf{Z} = \mathbf{V} * \mathbf{X}$. Then we have $Z_n(2k-1) = -(2k-1)$ for $n \ge 2k-1$, and $Z_n(2k) = 2k+1$ for $n \ge 2k$. Thus

$\lim_n \|Z_n\|_1 \ge \sum_{k=1}^\infty \big((2k-1)P\{2k-1\} + (2k+1)P\{2k\}\big) = \infty.$

Hence $\mathbf{Z}$ is not $L_1$-bounded, and the limit $Z_\infty = \lim_n Z_n$ (which exists a.s. by Theorem (6.2.6)) is not integrable.
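The computations in this example can be verified with exact rational arithmetic; the Python sketch below (an illustration, not part of the text) checks the martingale property on each atom, the bound $\|X_n\|_1 < 2$, and the unbounded growth of $\|Z_n\|_1$.

```python
from fractions import Fraction

def P(k):                      # point mass at k
    return Fraction(1, k * (k + 1))

def tail(k):                   # P{k, k+1, ...} = 1/k (telescoping sum)
    return Fraction(1, k)

def Y(n, k):                   # difference sequence of the example
    return 0 if k < n else (-n if k == n else 1)

# Martingale property: E[Y_{n+1}] over the atom A = {n+1, n+2, ...} is 0.
for n in range(1, 20):
    atom_exp = Y(n + 1, n + 1) * P(n + 1) + 1 * tail(n + 2)
    assert atom_exp == 0

# ||X_n||_1 = 2n/(n+1) < 2, with X_n(k) = -1 for k <= n and n for k > n.
for n in range(1, 20):
    norm = sum(1 * P(k) for k in range(1, n + 1)) + n * tail(n + 1)
    assert norm == Fraction(2 * n, n + 1) < 2

# ||Z_n||_1 is unbounded: each pair (2k-1)P{2k-1} + (2k+1)P{2k} equals 1/k,
# so the partial sums grow like the harmonic series.
partial = sum((2 * k - 1) * P(2 * k - 1) + (2 * k + 1) * P(2 * k)
              for k in range(1, 200))
assert partial > 3
```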
Remarks
Theorem (6.2.6) on the convergence of martingale transforms is due to Burkholder [1966]. For sharp Lp inequalities for martingale transforms, see Burkholder [1984], [1988], and [1989].
6.3. The norm of the square function of a martingale

In this section we will study the relationship between a norm of a martingale and the corresponding norm of the square function of the martingale. Burkholder's inequalities (Theorem (6.3.6)) show that the $L_p$ norms are "equivalent" in a strong sense for $1 < p < \infty$.

Suppose $\mathbf{X} = (X_n)$ is a stochastic process adapted to the stochastic basis $(\mathcal{F}_n)$. We will use the generic notations: set $X_0 = 0$; the difference process: $\Delta X_n = X_n - X_{n-1}$; maximal functions: $X^* = \sup_{k\in\mathbb{N}} |X_k|$ and $X_n^* = \max_{1\le k\le n} |X_k|$; square functions: $S(X) = \left(\sum_{k=1}^\infty (\Delta X_k)^2\right)^{1/2}$ and $S_n(X) = \left(\sum_{k=1}^n (\Delta X_k)^2\right)^{1/2}$; the $L_p$ norm of the process: $\|X\|_p = \sup_k \|X_k\|_p$.

In this section, we will consider a process $\mathbf{X} = (X_n)$, its difference process $Y_n = \Delta X_n$, and its square function process $S_n = S_n(X)$. By convention $X_0 = Y_0 = 0$. We will also sometimes use the convention $X_\infty = \limsup X_n$, so that $X_\tau$ is defined for stopping times $\tau$ that take the value $\infty$.
(6.3.1) Lemma. Let $\mathbf{X}$ be an $L_1$-bounded martingale or positive submartingale. Let $\lambda > 0$. Define $\tau = \inf\{\, n : |X_n| > \lambda \,\}$ (finite or infinite). Then

(6.3.1a) $\|S_{\tau-1}\|_2^2 + \|X_{\tau-1}\|_2^2 \le 2E[X_{\tau-1}X_\tau] \le 2\lambda\|X\|_1.$

In the martingale case, the first inequality is an equality.

Proof. Since $|X_{\tau-1}| \le \lambda$, we have $E[|X_{\tau-1}X_\tau|] \le \lambda E[|X_\tau|] \le \lambda\|X\|_1$. Hence it suffices to prove the first inequality. By elementary algebra,
$S_{n-1}^2 + X_{n-1}^2 = 2X_{n-1}^2 - 2\sum_{1\le j\le n-1} X_{j-1}Y_j = 2\left(X_{n-1}X_n - \sum_{1\le j\le n} X_{j-1}Y_j\right).$

Taking expectations, and using $E[X_{j-1}Y_j] \ge 0$ (with equality in the martingale case), we obtain

$\|S_{n-1}\|_2^2 + \|X_{n-1}\|_2^2 \le 2E[X_{n-1}X_n],$

with equality in the martingale case. Now by the optional sampling theorem (1.4.29), the process $(X_{\tau\wedge n})$ is also a martingale (or positive submartingale), so this inequality remains true for it:

$\|S_{\tau\wedge(n-1)}\|_2^2 + \|X_{\tau\wedge(n-1)}\|_2^2 \le 2E\big[X_{\tau\wedge(n-1)}X_{\tau\wedge n}\big],$

with equality in the martingale case. Now we have $|X_{\tau\wedge n}| \le |X_\tau|$ except on the set $\{\tau = \infty\}$, where $|X_{\tau\wedge n}| \le \lambda$. Hence the sequence $\big(|X_{\tau\wedge(n-1)}X_{\tau\wedge n}|\big)$ is dominated by the integrable random variable $\lambda(|X_\tau| + \lambda)$. Now $X_{\tau\wedge n} \to X_\tau$ as $n \to \infty$, even on $\{\tau = \infty\}$, by the convergence theorem (1.2.5). So we obtain the first inequality of (6.3.1a) with the dominated convergence theorem.

We now prove the submartingale analog of Theorem (6.1.3). Recall that this theorem was proved for positive supermartingales ((6.2.4)) using Krickeberg's decomposition. The estimates given here for positive submartingales are sharper than those in Proposition (6.2.5(ii)).
(6.3.2) Lemma. Let $\mathbf{X}$ be a martingale or a positive submartingale; let $X^*$ be its maximal function and $S$ its square function. Then for every $\lambda > 0$,

(6.3.2a) $P\{X^* \le \lambda,\; S > \lambda\} \le \dfrac{2}{\lambda}\|X\|_1,$

(6.3.2b) $P\{S > \lambda\} \le \dfrac{3}{\lambda}\|X\|_1.$

Proof. By Proposition (6.1.5), we have $\lambda P\{X^* > \lambda\} \le \|X\|_1$, so (6.3.2a) implies (6.3.2b). So it suffices to prove (6.3.2a). Let $\tau = \inf\{\, n : |X_n| > \lambda \,\}$. On the set $\{\tau = \infty\}$, we have $X^* \le \lambda$ and $S = S_{\tau-1}$. Hence by Lemma (6.3.1),

$P\{X^* \le \lambda,\; S > \lambda\} \le P\{S_{\tau-1} > \lambda\} \le \frac{E[S_{\tau-1}^2]}{\lambda^2} \le \frac{2}{\lambda}\|X\|_1.$
Next we will consider square functions of sequences of random variables other than $\mathbf{X}$. We use the notations $S(X)$ and $S_n(X)$ for square functions as defined above. We begin with a process $\mathbf{X} = (X_n)$, fix a positive number $\theta$, and consider the sequence $T_n = S_n(\theta X) \vee X_n^*$. We first obtain a weak maximal inequality for $T_n$, and then apply standard arguments to obtain the desired strong $L_p$ inequality.
(6.3.3) Lemma. Let $\mathbf{X}$ be a positive submartingale, let $\theta$ be a positive number, and let $\beta = (1 + 2\theta^2)^{1/2}$. Then for every positive $\lambda$,

$\lambda P\{T_n > \beta\lambda\} \le 3E\big[X_n 1_{\{T_n > \lambda\}}\big].$

Proof. Let $Z_j = X_j\,1_{\{S_j(\theta X) > \lambda\}}$. Let $\tau = \inf\{\, n : S_n(\theta X) > \lambda \,\}$. On the set $B = \{S_n(\theta X) > \beta\lambda,\; X_n^* \le \lambda\}$, we have $S_n(\theta X) > \lambda$ since $\beta \ge 1$, hence $\tau \le n$, and

$\big(S_n(\theta X)\big)^2 = \big(S_{\tau-1}(\theta X)\big)^2 + \theta^2Y_\tau^2 + \theta^2\sum_{j=\tau+1}^n Y_j^2 \le \lambda^2 + \theta^2\lambda^2 + \theta^2\sum_{j=\tau+1}^n (Z_j - Z_{j-1})^2 \le (1 + \theta^2)\lambda^2 + \theta^2\big(S_n(Z)\big)^2$

(on $B$ we have $0 \le X_j \le \lambda$ for $j \le n$, so $|Y_\tau| \le \lambda$; and for $j > \tau$ we have $Z_j = X_j$, so $Y_j = Z_j - Z_{j-1}$). Since $\big(S_n(\theta X)\big)^2 > \beta^2\lambda^2 = (1 + 2\theta^2)\lambda^2$ on $B$, this implies that $S_n(Z) > \lambda$ holds on $B$. Now $\mathbf{Z}$ is a submartingale, since it is the product of an increasing adapted sequence of indicators and a positive submartingale. Applying Lemma (6.3.2) to the submartingale $(Z_1, Z_2, \dots, Z_n, Z_n, \dots)$, we obtain

$\lambda P\{X_n^* \le \lambda,\; S_n(\theta X) > \beta\lambda\} \le \lambda P\{Z_n^* \le \lambda,\; S_n(Z) > \lambda\} \le 2\|Z\|_1 = 2E\big[X_n 1_{\{S_n(\theta X) > \lambda\}}\big] \le 2E\big[X_n 1_{\{T_n > \lambda\}}\big].$

Now applying the classical weak Doob inequality (1.4.18), we have

$\lambda P\{X_n^* > \lambda\} \le E\big[X_n 1_{\{X_n^* > \lambda\}}\big] \le E\big[X_n 1_{\{T_n > \lambda\}}\big].$

Combining,

$\lambda P\{T_n > \beta\lambda\} \le \lambda P\{X_n^* > \lambda\} + \lambda P\{S_n(\theta X) > \beta\lambda,\; X_n^* \le \lambda\} \le 3E\big[X_n 1_{\{T_n > \lambda\}}\big].$
Now that we have a weak inequality, we may use the three-function inequality to pass to a strong inequality between the process $\mathbf{X}$ and its square function $S$.
(6.3.4) Proposition. Let $\mathbf{X} = (X_n)$ be a positive submartingale. Let $1 < p < \infty$. Then

(6.3.4a) $\|S_n\|_p \le \dfrac{3\sqrt{2e}\,p^{3/2}}{p-1}\,\|X_n\|_p,$

(6.3.4b) $\|S\|_p \le \dfrac{3\sqrt{2e}\,p^{3/2}}{p-1}\,\|X\|_p.$

Proof. It suffices to prove (6.3.4a), since (6.3.4b) follows when $n \to \infty$. Applying (3.1.17) and (6.3.3), we obtain

$\theta\|S_n\|_p \le \|T_n\|_p \le 3\beta^pq\,\|X_n\|_p,$

where $q = p/(p-1)$. The best choice of $\theta$ is the one that minimizes $\beta^p/\theta = (1 + 2\theta^2)^{p/2}/\theta$. So $\theta$ is the positive solution of the equation $2(p-1)\theta^2 - 1 = 0$, namely $\theta = 1/\sqrt{2(p-1)}$. Then

$\frac{3\beta^pq}{\theta} = \frac{3\sqrt{2}\,p^{3/2}}{p-1}\left(1 + \frac{1}{p-1}\right)^{(p-1)/2}.$

The last factor increases to $e^{1/2}$ as $p \to \infty$, so inequality (6.3.4a) is proved.
A duality argument will now give an inverse inequality.

(6.3.5) Proposition. Let $\mathbf{X} = (X_n)$ be a positive martingale. Let $1 < p < \infty$ and set $q = p/(p-1)$. Then

(6.3.5a) $\|X_n\|_p \le \dfrac{3\sqrt{2e}\,q^{3/2}}{q-1}\,\|S_n\|_p,$

(6.3.5b) $\|X\|_p \le \dfrac{3\sqrt{2e}\,q^{3/2}}{q-1}\,\|S\|_p.$
Proof. It suffices to prove (6.3.5a). Fix the integer $n$. Without loss of generality, we may assume that $\|S_n\|_p < \infty$. Write $Y_j = \Delta X_j$. Since $|X_n| \le \sum_{j=1}^n |Y_j| \le \sqrt{n}\,S_n(X)$, we have $X_n \in L_p$. Set $R_n = X_n^{p-1}$ and $R_j = E^{\mathcal{F}_j}[R_n]$ for $1 \le j < n$. Let $q = p/(p-1)$ be the conjugate exponent. Then $R_n \in L_q$, since $X_n \in L_p$; also $\|R_n\|_q = \|X_n\|_p^{p-1}$ and $E[X_nR_n] = \|X_n\|_p^p$. Now $(R_1, R_2, \dots, R_n)$ is a positive martingale. Therefore (6.3.4a) applies to it (with $q$ replacing $p$); also using orthogonality and the inequalities of Schwarz and Hölder, we have

$\|X_n\|_p^p = E[X_nR_n] = E\big[(X_{n-1} + \Delta X_n)(R_{n-1} + \Delta R_n)\big] = E[X_{n-1}R_{n-1} + \Delta X_n\,\Delta R_n] = E[X_{n-2}R_{n-2} + \Delta X_{n-1}\,\Delta R_{n-1} + \Delta X_n\,\Delta R_n] = \cdots = E\left[\sum_{j=1}^n \Delta X_j\,\Delta R_j\right] \le E\big[S_n(X)\,S_n(R)\big] \le \|S_n(X)\|_p\,\|S_n(R)\|_q \le \|S_n(X)\|_p\,\frac{3\sqrt{2e}\,q^{3/2}}{q-1}\,\|R_n\|_q = \frac{3\sqrt{2e}\,q^{3/2}}{q-1}\,\|S_n(X)\|_p\,\|X_n\|_p^{p-1}.$

Finally, divide by $\|X_n\|_p^{p-1}$ to obtain (6.3.5a).
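For $p = 2$ the two-sided comparison of $\|X_n\|_p$ and $\|S_n\|_p$ collapses to an identity: by the orthogonality of martingale differences (used in the proof above), $\|X_n\|_2 = \|S_n\|_2$. The Python sketch below (illustrative only) verifies this by exact enumeration for a martingale of independent fair $\pm 1$ steps.

```python
from itertools import product

# X_n = e_1 + ... + e_n with independent fair signs; here S_n^2 = n identically.
n = 6
total_paths = 2 ** n
E_Xn_sq = sum(sum(signs) ** 2 for signs in product((-1, 1), repeat=n)) / total_paths
E_Sn_sq = n          # each squared difference is 1, so S_n^2 = n on every path

assert E_Xn_sq == E_Sn_sq == n    # ||X_n||_2^2 = ||S_n||_2^2 = n
```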
(6.3.6) Theorem (Burkholder). Let $1 < p < \infty$ and $q = p/(p-1)$. There are constants

$c_p = \frac{p-1}{6\sqrt{2e}\,p^{3/2}}, \qquad C_p = \frac{6\sqrt{2e}\,q^{3/2}}{q-1},$

such that for every martingale $(X_n)$,

$c_p\|S_n\|_p \le \|X_n\|_p \le C_p\|S_n\|_p, \qquad c_p\|S\|_p \le \|X\|_p \le C_p\|S\|_p.$

Proof. As usual, it suffices to prove the first pair of inequalities. Fix $n$. We apply the Krickeberg decomposition (6.2.2) to the martingale

$(X_1, X_2, \dots, X_n, X_n, \dots).$

In fact the proof of the decomposition is easy in the present case; the general case is only invoked to motivate the choice of $\mathbf{M}$ and $\mathbf{R}$: set

$M_n = \lim_r \uparrow E^{\mathcal{F}_n}[X_r^+] = X_n^+, \qquad M_j = \lim_r \uparrow E^{\mathcal{F}_j}[X_r^+] = E^{\mathcal{F}_j}[X_n^+] \quad\text{for } 1 \le j \le n-1.$

Now if $R_j = M_j - X_j$, then $\mathbf{M} = (M_j)$ and $\mathbf{R} = (R_j)$ are both positive martingales, with $\mathbf{X} = \mathbf{M} - \mathbf{R}$. Note that $M_n = X_n^+$ and $R_n = X_n^-$; so $\|M_n\|_p \le \|X_n\|_p$ and $\|R_n\|_p \le \|X_n\|_p$, hence

$\|M_n\|_p + \|R_n\|_p \le 2\|X_n\|_p.$

Now the difference sequences satisfy $\Delta X_j = \Delta M_j - \Delta R_j$. Thus, by the triangle inequality in $n$-dimensional Euclidean space, we have $S_n(X) \le S_n(M) + S_n(R)$. Therefore, by (6.3.4), we have

$\|S_n(X)\|_p \le \|S_n(M)\|_p + \|S_n(R)\|_p \le \frac{3\sqrt{2e}\,p^{3/2}}{p-1}\big(\|M_n\|_p + \|R_n\|_p\big) \le \frac{6\sqrt{2e}\,p^{3/2}}{p-1}\,\|X_n\|_p = c_p^{-1}\|X_n\|_p,$

which proves the left-hand inequality.
For the right-hand inequality, we again use duality. Since $X_n$ need not be positive, this may be done using the duality map $J\colon L_p \to L_q$ defined by

$(Jf)(\omega) = \begin{cases} \dfrac{|f(\omega)|^p}{f(\omega)} & \text{if } f(\omega) \ne 0 \\ 0 & \text{otherwise.} \end{cases}$

(In the case of complex random variables, division by $f(\omega)$ is replaced by division by the complex conjugate $\overline{f(\omega)}$.) If $f \in L_p$, then $Jf \in L_q$, $\|Jf\|_q = \|f\|_p^{p-1}$, and $E[f\,Jf] = \|f\|_p^p$ as before. Now proceed as in the proof of Proposition (6.3.5), using $JX_n$.
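The two properties of $J$ used here are easy to check numerically; the Python sketch below (illustrative, not from the text) does so for a random variable on a finite probability space.

```python
# Duality map J f = |f|^p / f, checked on a finite probability space.
p = 3.0
q = p / (p - 1)

f     = [1.5, -2.0, 0.0, 0.5]           # values of f on four atoms
probs = [0.1, 0.2, 0.3, 0.4]            # their probabilities

Jf = [abs(v) ** p / v if v != 0 else 0.0 for v in f]

norm_f_p  = sum(pr * abs(v) ** p for v, pr in zip(f, probs)) ** (1 / p)
norm_Jf_q = sum(pr * abs(v) ** q for v, pr in zip(Jf, probs)) ** (1 / q)
pairing   = sum(pr * v * w for v, w, pr in zip(f, Jf, probs))

assert abs(norm_Jf_q - norm_f_p ** (p - 1)) < 1e-9   # ||Jf||_q = ||f||_p^{p-1}
assert abs(pairing - norm_f_p ** p) < 1e-9           # E[f Jf] = ||f||_p^p
```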
Complements
(6.3.7) Theorem. Let $\Phi$ be an Orlicz function, let $\Psi$ be its conjugate, and let $\varphi$ be its derivative. Suppose $\Xi(u) = \Psi(\varphi(u))$ is an Orlicz function satisfying the $(\Delta_2)$ condition. Suppose $X \in L_\Phi$ and $T$ are positive functions satisfying

(6.3.7a) $\lambda P\{T > \beta\lambda\} \le aE\big[X 1_{\{T > \lambda\}}\big]$

for constants $a, \beta$. Then, if $c$ is the constant in the $(\Delta_2)$ condition, i.e. $\Xi(\beta u) \le c\,\Xi(u)$ for all $u$, and $d > 0$, then

$d\,M_\Xi\left[\frac{T}{a\beta(c+d)\|X\|_\Phi}\right] \le 1 \qquad\text{and}\qquad \|T\|_\Xi \le a\beta(c+1)\|X\|_\Phi.$

Proof. Verify the hypotheses of the three-function inequality (3.1.2) with $g = T/(a\beta\|X\|_\Phi)$, $f = X/\|X\|_\Phi$, $h = T/(a\|X\|_\Phi)$, $a = 1$, and $b = c + d$. Therefore

$(c+d)\,M_\Xi\left[\frac{T}{a\beta(c+d)\|X\|_\Phi}\right] \le 1 + M_\Xi\left[\frac{T}{a(c+d)\|X\|_\Phi}\right].$

By $(\Delta_2)$,

$M_\Xi\left[\frac{T}{a(c+d)\|X\|_\Phi}\right] \le c\,M_\Xi\left[\frac{T}{a\beta(c+d)\|X\|_\Phi}\right],$

so, subtracting (by Corollary (3.1.9), this difference of $M_\Xi$'s is not $\infty - \infty$),

$d\,M_\Xi\left[\frac{T}{a\beta(c+d)\|X\|_\Phi}\right] \le 1.$

Choosing $d = 1$ gives $\|T\|_\Xi \le a\beta(c+1)\|X\|_\Phi$.
(6.3.8) Corollary. Let $\mathbf{X}$ be a positive submartingale, $\theta > 0$, and $\beta = (1 + 2\theta^2)^{1/2}$. Then

$\|S_n\|_\Xi \le \frac{3\beta(c+1)}{\theta}\,\|X_n\|_\Phi.$

Proof. Apply the preceding theorem to $T_n = S_n(\theta X) \vee X_n^*$, using Lemma (6.3.3), where $a = 3$; note that $\theta S_n \le T_n$.
(6.3.9) (Square function and maximal function.) Since the control of the square function $S$ by $X$ (6.3.8) is similar to the control of the maximal function $X^*$ by $X$ (3.1.13), one may expect that $S$ may be equivalent to $X^*$. This is indeed the case: Let $\Phi$ be an Orlicz function such that $\Phi(u)/u \to \infty$, satisfying the $(\Delta_2)$ condition $\Phi(2u) \le c\Phi(u)$ for some constant $c > 0$. Then there exist constants $b_c$ and $B_c$ such that for every martingale $\mathbf{X}$, we have

$b_cM_\Phi[S(X)] \le M_\Phi[X^*] \le B_cM_\Phi[S(X)]$

(Burkholder, Davis, & Gundy [1972]). An $L_1$ version of this result is also true: there exist constants $b, B$ such that for every $L_1$-bounded martingale $\mathbf{X}$, we have $bE[S(X)] \le E[X^*] \le BE[S(X)]$ (B. Davis [1969]). There is also a more recent extension to more abstract spaces, due to Johnson & Schechtman [1988].

(6.3.10) (Converse false.) Corollary (6.3.8) shows that if $\sup_n \|X_n\|_\Phi < \infty$, then $\sup_n \|S_n\|_\Xi < \infty$. The converse is false: choose $X_n = X_1$ in $L\log^{k-1}L$ but not in $L\log^kL$; then $S_n = |X_1|$ is in $L\log^{k-1}L$.

(6.3.11) (Best constants.) The inequalities in (6.3.6) can be improved.
Let $1 < p < \infty$, $1/p + 1/q = 1$, and $p^* = \max\{p, q\}$. Then the constant $c_p$ can be replaced by $(p^* - 1)^{-1}$, and $C_p$ by $p^* - 1$. The left-hand inequality is optimal if $1 < p \le 2$; the right-hand one if $2 \le p < \infty$. The best constants in the other cases are not known. The results extend to martingales taking values in a Hilbert space. (See D. L. Burkholder [1991], Theorem 3.3, or Burkholder [1988].)

Remarks
The first version of the inequality (6.3.6) is in Burkholder [1966].
6.4. Lifting This section will discuss liftings for measure spaces. A measure defines an equivalence relation on the measurable sets: two sets are equivalent if they agree almost everywhere. A lifting is a choice of one set from each equivalence class in such a way as to preserve as much of the structure as possible. (The technical definition is given below.) The existence of liftings may be established using the martingale convergence theorem, so the topic is appropriate for discussion here. Liftings have also been used to choose a good basis for differentiation of integrals.
Throughout Section 6.4, we will write $(\Omega, \mathcal{F}, \mu)$ for a nonzero complete $\sigma$-finite measure space.

We will write $A \approx B$ to mean $\mu(A \bigtriangleup B) = 0$. Similarly, we will write $f \approx g$ to mean that $\{\, \omega \in \Omega : f(\omega) \ne g(\omega) \,\}$ has measure 0.

In some places, it will be important to distinguish between an equivalence class of measurable functions and an actual function. In order to do this, we will use the following notational conventions. For $0 < p < \infty$, the notation $\mathcal{L}_p(\Omega, \mathcal{F}, \mu) = \mathcal{L}_p(\mu) = \mathcal{L}_p$ will denote the set of all real-valued $\mathcal{F}$-measurable functions $f$ on $\Omega$ with $\int |f|^p\,d\mu < \infty$. The notation $\mathcal{L}_\infty(\Omega, \mathcal{F}, \mu)$ will denote the set of all bounded real-valued $\mathcal{F}$-measurable functions on $\Omega$. The notation $L_p(\Omega, \mathcal{F}, \mu)$ will denote the quotient of $\mathcal{L}_p(\Omega, \mathcal{F}, \mu)$ obtained by identifying functions that agree almost everywhere. If $f \in \mathcal{L}_p$, the corresponding equivalence class will be written with a dot, $f^\bullet$, when it is important to distinguish between the two. For $f \in \mathcal{L}_\infty$, we will write $\|f\|_\infty$ for the uniform norm:

$\|f\|_\infty = \sup\{\, |f(\omega)| : \omega \in \Omega \,\},$

and $|f|_\infty$ for the usual $L_\infty$ seminorm:

$|f|_\infty = \inf\{\, M \in \mathbb{R} : \mu\{\omega : |f(\omega)| > M\} = 0 \,\}.$

We also write $\|f^\bullet\|_\infty = |f|_\infty$ for the quotient norm on $L_\infty$.

Existence of liftings
There are two common formulations of the basic definitions concerning liftings. We use here the two terms lifting and density to distinguish between them. (Some authors use "lifting" to mean what is here called "density," and others use "density" for what is here called "lower density.") Roughly speaking, a "lifting" chooses one element of each equivalence class of $\mathcal{L}_\infty$ in a nice way.
(6.4.1) Definition. A lifting for the measure space $(\Omega, \mathcal{F}, \mu)$ is a function $\rho\colon \mathcal{L}_\infty(\Omega, \mathcal{F}, \mu) \to \mathcal{L}_\infty(\Omega, \mathcal{F}, \mu)$ satisfying
(i) $\rho(f) \approx f$;
(ii) if $f \approx g$, then $\rho(f) = \rho(g)$;
(iii) $\rho(1) = 1$;
(iv) if $f \ge 0$, then $\rho(f) \ge 0$;
(v) $\rho(f + g) = \rho(f) + \rho(g)$; $\rho(af) = a\rho(f)$, $a \in \mathbb{R}$;
(vi) $\rho(fg) = \rho(f)\rho(g)$.
(In fact, condition (iv) follows from the others.)
(6.4.2) Definition. A density for $(\Omega, \mathcal{F}, \mu)$ is a function $\delta\colon \mathcal{F} \to \mathcal{F}$ satisfying
(a) $\delta(A) \approx A$;
(b) if $A \approx B$, then $\delta(A) = \delta(B)$;
(c) $\delta(\Omega) = \Omega$; $\delta(\emptyset) = \emptyset$;
(d) $\delta(A \cap B) = \delta(A) \cap \delta(B)$;
(e) $\delta(A \cup B) = \delta(A) \cup \delta(B)$.

A lower density is a function $\delta$ satisfying (a), (b), (c), (d). An upper density is a function $\delta$ satisfying (a), (b), (c), (e).

If $\mathcal{G}$ is an algebra of subsets of $\Omega$ that meets each $\mu$-equivalence class of $\mathcal{F}$ exactly once, then $\mathcal{G}$ is the range of a unique density: namely, $\delta(A)$ is the element of $\mathcal{G}$ in the equivalence class of $A$. Conversely, if $\delta$ is a density for $(\Omega, \mathcal{F}, \mu)$, then the range of $\delta$ is an algebra of sets that meets each equivalence class exactly once.

If $\delta$ is a lower density, then the function $\delta'$ defined by
$$\delta'(A) = \Omega \setminus \delta(\Omega \setminus A)$$
is an upper density, and $\delta'(A) \supseteq \delta(A)$. We will say that $\delta'$ is the upper density corresponding to $\delta$. If $\delta$ is a density, then $\delta' = \delta$.

It will take some effort to produce a nontrivial example of a lifting or a density. But upper and lower densities can be easily exhibited. Consider the measure space $\mathbb{R}$ with Lebesgue measure $\lambda$. The Lebesgue density theorem (7.1.12) can be used to show that
$$\delta(A) = \left\{\, x : \lim_{h \downarrow 0} \frac{\lambda(A \cap (x-h, x+h))}{2h} = 1 \,\right\}$$
defines a lower density (called the Lebesgue lower density),
$$\delta'(A) = \left\{\, x : \limsup_{h \downarrow 0} \frac{\lambda(A \cap (x-h, x+h))}{2h} > 0 \,\right\}$$
defines an upper density, and $\delta(A) \subseteq \delta'(A)$. Liftings and densities are closely connected.
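The Lebesgue lower density can be explored numerically. Below is a minimal sketch (the set $A$, the sample points, and the radii are arbitrary choices for illustration) computing $\lambda(A \cap (x-h, x+h))/2h$ for $A = [0,1] \cup [2,3]$: interior points get density 1, exterior points get density 0, and the boundary point $x = 1$ gets density $1/2$, so it is excluded from $\delta(A)$ but lies in $\delta'(A)$.

```python
def inter_len(intervals, lo, hi):
    # Lebesgue measure of A ∩ (lo, hi), where A is a finite union of
    # pairwise disjoint closed intervals given as (a, b) pairs.
    return sum(max(0.0, min(b, hi) - max(a, lo)) for a, b in intervals)

def density(intervals, x, h):
    # The ratio lambda(A ∩ (x-h, x+h)) / (2h).
    return inter_len(intervals, x - h, x + h) / (2 * h)

A = [(0.0, 1.0), (2.0, 3.0)]          # A = [0,1] ∪ [2,3]

for x in (0.5, 1.0, 1.5):
    print(x, [density(A, x, h) for h in (0.1, 0.01, 0.001)])
```

Since the intersection measure is computed exactly, the limiting behavior is already visible at small $h$: the ratios are constantly 1, $1/2$, and 0 at the three sample points.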
(6.4.3) Proposition. (1) Let $\rho$ be a lifting. If $A \in \mathcal{F}$, then the function $\rho(1_A)$ is an indicator function. The equation $\rho(1_A) = 1_{\delta(A)}$ defines a density $\delta$.
(2) If $\delta$ is a density, then there is a unique lifting $\rho$ such that $\rho(1_A) = 1_{\delta(A)}$ for $A \in \mathcal{F}$.
Proof. (1) Let $\rho$ be a lifting. If $A \in \mathcal{F}$, then by condition (vi) of Definition (6.4.1), we have $\rho(1_A)^2 = \rho(1_A^2) = \rho(1_A)$. So for all $\omega \in \Omega$, either $\rho(1_A)(\omega) = 0$ or $\rho(1_A)(\omega) = 1$. Therefore $\rho(1_A)$ is an indicator function, say $1_{\delta(A)}$. Properties (a), (b), (c) of Definition (6.4.2) for $\delta$ follow from properties (i), (ii), (iii) of Definition (6.4.1) for $\rho$. Next, note that $\rho(1_{\Omega \setminus A}) = \rho(1 - 1_A) = 1 - \rho(1_A)$, so $\delta(\Omega \setminus A) = \Omega \setminus \delta(A)$. Also
$$1_{\delta(A \cap B)} = \rho(1_{A \cap B}) = \rho(1_A 1_B) = \rho(1_A)\rho(1_B) = 1_{\delta(A)} 1_{\delta(B)} = 1_{\delta(A) \cap \delta(B)},$$
so $\delta(A \cap B) = \delta(A) \cap \delta(B)$. Finally, by complementation, we get $\delta(A \cup B) = \delta(A) \cup \delta(B)$.
(2) Write $S$ for the set of $\mathcal{F}$-simple functions. If $f = \sum c_i 1_{A_i} \in S$, define $\rho(f) = \sum c_i 1_{\delta(A_i)}$. It can be checked that $\rho$ is well defined and $\|\rho(f)\|_u = \|f\|_\infty$. The conditions (i)–(vi) hold for functions in $S$. The closure of $S$ in the $\|\cdot\|_\infty$-seminorm is $\mathcal{L}_\infty(\Omega, \mathcal{F}, \mu)$, so $\rho$ can be extended uniquely to $\mathcal{L}_\infty$ in such a way that $\|\rho(f)\|_u = \|f\|_\infty$ holds for all $f \in \mathcal{L}_\infty$. Now it can be easily checked that (i)–(vi) hold for functions in $\mathcal{L}_\infty$. Suppose that $\rho'$ is another lifting for $(\Omega, \mathcal{F}, \mu)$ that satisfies $\rho'(1_A) = 1_{\delta(A)}$ for $A \in \mathcal{F}$. Clearly $\rho(f) = \rho'(f)$ for $f \in S$. But if $\|f\|_\infty = M$, then $-M \le f \le M$ almost everywhere, so $-M \le \rho'(f) \le M$ everywhere, and thus $\|\rho'(f)\|_u \le \|f\|_\infty$. Thus $\rho$ and $\rho'$ are both continuous, so they coincide on $\mathcal{L}_\infty$.

This proposition shows that the problem of existence of a lifting is equivalent to the problem of existence of a density. The next few results have the goal of proving such existence. The first one has a standard sort of Zorn's lemma proof, but checking the details takes some space.

(6.4.4) Proposition. Let $\delta$ be a lower density for the nonzero complete $\sigma$-finite measure space $(\Omega, \mathcal{F}, \mu)$, and let $\delta'$ be the corresponding upper density. Then there is a density $\delta^*$ for $(\Omega, \mathcal{F}, \mu)$ satisfying $\delta(A) \subseteq \delta^*(A) \subseteq \delta'(A)$ for all $A \in \mathcal{F}$.

Proof. Consider the collection $R$ of all $\mathcal{G}$ satisfying: $\mathcal{G}$ is an algebra of subsets of $\Omega$; $\mathcal{G}$ meets each $\mu$-equivalence class at most once; and $\delta(A) \subseteq A \subseteq \delta'(A)$ for all $A \in \mathcal{G}$. Then $R$ is partially ordered by inclusion, and not empty, since $\{\emptyset, \Omega\} \in R$. The union of a chain of elements of $R$ is again an element. By Zorn's lemma, $R$ has a maximal element $\mathcal{G}^*$. It remains only to show that $\mathcal{G}^*$ meets each equivalence class at least once. So let $A_0 \in \mathcal{F}$, and define
$$A_0' = \bigcup_{C \in \mathcal{G}^*} \bigl(\delta(C \cup A_0) \setminus C\bigr).$$
We will show that $A_0' \approx A_0$ and $A_0' \in \mathcal{G}^*$. First, if $C, D \in \mathcal{G}^*$, then
$$\delta(C \cup A_0) \cap \delta(D \cup (\Omega \setminus A_0)) \setminus (C \cup D) \subseteq \delta(C \cup D) \setminus (C \cup D) = \emptyset,$$
so $(\delta(C \cup A_0) \setminus C) \cap (\delta(D \cup (\Omega \setminus A_0)) \setminus D) = \emptyset$. Therefore, if $D \in \mathcal{G}^*$, we have
$$A_0' \cap \bigl(\delta(D \cup (\Omega \setminus A_0)) \setminus D\bigr) = \emptyset.$$
By the definition of $A_0'$ (take $C = \emptyset$), we get $\delta(A_0) \subseteq A_0'$; taking $D = \emptyset$ above, $A_0' \cap \delta(\Omega \setminus A_0) = \emptyset$, so $A_0' \subseteq \delta'(A_0)$. By completeness, we have $A_0' \in \mathcal{F}$ and $A_0' \approx A_0$. Next, if $D \in \mathcal{G}^*$, we have
$$\delta\bigl((\Omega \setminus D) \cup (\Omega \setminus A_0')\bigr) = \delta\bigl((\Omega \setminus D) \cup (\Omega \setminus A_0)\bigr) \subseteq (\Omega \setminus D) \cup (\Omega \setminus A_0'),$$
so $D \cap A_0' \subseteq \delta'(D \cap A_0')$. Now if $E \in \mathcal{G}^*$, then
$$\delta\bigl((\Omega \setminus E) \cup A_0'\bigr) = \delta\bigl((\Omega \setminus E) \cup A_0\bigr) \subseteq (\Omega \setminus E) \cup A_0',$$
so $E \setminus A_0' \subseteq \delta'(E \setminus A_0')$.
Let $\mathcal{G}'$ be the algebra generated by $\mathcal{G}^*$ and $A_0'$, so
$$\mathcal{G}' = \{\, C \cup (D \cap A_0') \cup (E \setminus A_0') : C, D, E \in \mathcal{G}^* \,\},$$
and we claim $\mathcal{G}' \in R$. If $B \in \mathcal{G}'$, say $B = C \cup (D \cap A_0') \cup (E \setminus A_0')$, then (since $\delta'$ is an upper density)
$$B \subseteq \delta'(C) \cup \delta'(D \cap A_0') \cup \delta'(E \setminus A_0') = \delta'\bigl(C \cup (D \cap A_0') \cup (E \setminus A_0')\bigr) = \delta'(B).$$
Also $\Omega \setminus B \in \mathcal{G}'$, so $\Omega \setminus B \subseteq \delta'(\Omega \setminus B)$, or $\delta(B) \subseteq B$. To show that $\mathcal{G}'$ meets each equivalence class at most once, suppose $B_1, B_2 \in \mathcal{G}'$ and $B_1 \approx B_2$. Then $\mu(B_1 \triangle B_2) = 0$, so
$$B_1 \triangle B_2 \subseteq \delta'(B_1 \triangle B_2) = \emptyset,$$
so $B_1 = B_2$. Therefore $\mathcal{G}' \in R$. By the maximality of $\mathcal{G}^*$, we have $\mathcal{G}' = \mathcal{G}^*$. Therefore $A_0' \in \mathcal{G}^*$. This completes the proof that $\mathcal{G}^*$ meets each equivalence class. Now $\mathcal{G}^*$ is the range of the required density $\delta^*$.
(6.4.5) Corollary. Lebesgue measure on the real line $\mathbb{R}$ admits a lifting.

Proof. Apply Proposition (6.4.4) to the Lebesgue lower density.

Note that if $(\Omega, \mathcal{F}, \mu)$ is a complete measure space, and if $\mathcal{G} \subseteq \mathcal{F}$ is a $\sigma$-algebra containing all of the $\mu$-null sets in $\mathcal{F}$, then $(\Omega, \mathcal{G}, \mu|_{\mathcal{G}})$ is also a complete measure space. The collection of all such $\mathcal{G}$ is partially ordered by inclusion. In the next few steps we will be preparing for an application of Zorn's lemma in this situation.
(6.4.6) Lemma. Let $(\Omega, \mathcal{F}, \mu)$ be a complete finite measure space, let $\mathcal{G} \subseteq \mathcal{F}$ be a $\sigma$-algebra containing all of the null sets of $\mathcal{F}$, and let $\delta$ be a density on $\mathcal{G}$. Let $A \in \mathcal{F}$. Then there is a density $\delta^*$ on the $\sigma$-algebra $\mathcal{G}'$ generated by $\mathcal{G}$ and $A$ that extends $\delta$.

Proof. Consider
$$\mathcal{C}_1 = \{\, C \in \mathcal{G} : \mu(C \cap A) = 0 \,\}, \qquad \mathcal{C}_2 = \{\, C \in \mathcal{G} : \mu(C \setminus A) = 0 \,\}.$$
Both collections are closed under countable unions, and $\mu$ is finite, so there exist sets $C_1 \in \mathcal{C}_1$ and $C_2 \in \mathcal{C}_2$ with $\mu(C \setminus C_1) = 0$ for all $C \in \mathcal{C}_1$ and $\mu(C \setminus C_2) = 0$ for all $C \in \mathcal{C}_2$. Now
$$\mu(C_1 \cap C_2) = \mu(C_1 \cap C_2 \cap A) + \mu((C_1 \cap C_2) \setminus A) = 0,$$
so $\delta(C_1) \cap \delta(C_2) = \emptyset$. Define $A^* = (A \setminus \delta(C_1)) \cup \delta(C_2)$. Then $A^* \approx A$. Define $\delta^*$ on $\mathcal{G}'$ by
$$\delta^*\bigl((B_1 \cap A) \cup (B_2 \setminus A)\bigr) = \bigl(\delta(B_1) \cap A^*\bigr) \cup \bigl(\delta(B_2) \setminus A^*\bigr)$$
for $B_1, B_2 \in \mathcal{G}$. If $(B_1 \cap A) \cup (B_2 \setminus A) \approx (B_3 \cap A) \cup (B_4 \setminus A)$, then $\mu((B_1 \triangle B_3) \cap A) = 0$, so $\delta(B_1) \triangle \delta(B_3) \subseteq \delta(C_1)$. Similarly $\delta(B_2) \triangle \delta(B_4) \subseteq \delta(C_2)$. Since $\delta(C_1) \cap \delta(C_2) = \emptyset$, we see that $(\delta(B_1) \triangle \delta(B_3)) \cap A^* = \emptyset$ and $\delta(B_2) \setminus A^* = \delta(B_4) \setminus A^*$. This shows that $\delta^*$ is well defined and constant on equivalence classes. The remaining parts of the verification that $\delta^*$ is a density that extends $\delta$ are straightforward.
(6.4.7) Lemma. Let $(\Omega, \mathcal{F}, \mu)$ be a complete probability space. Suppose $\mathcal{F}_1 \subseteq \mathcal{F}_2 \subseteq \cdots$ are sub-$\sigma$-algebras of $\mathcal{F}$ and $\mathcal{F}_1$ contains all of the null sets. Suppose $\delta_n$ is a density on $\mathcal{F}_n$ ($n = 1, 2, \dots$) and $\delta_{n+1}$ extends $\delta_n$ for all $n$. Then there is a density $\delta^*$ on the $\sigma$-algebra $\mathcal{G}$ generated by $\bigcup_{n=1}^\infty \mathcal{F}_n$ such that $\delta^*$ extends each $\delta_n$.

Proof. It suffices to show that there is a lower density $\delta$ on $\mathcal{G}$ that extends the $\delta_n$: Proposition (6.4.4) then provides a density $\delta^*$ between $\delta$ and the corresponding upper density $\delta'$, and for $A \in \mathcal{F}_n$ we have $\delta(A) = \delta_n(A) = \delta'(A)$, so $\delta^*$ still extends each $\delta_n$.

Let $\rho_n$ be the lifting for $(\Omega, \mathcal{F}_n, \mu)$ corresponding to $\delta_n$ as in Proposition (6.4.3). If $A \in \mathcal{F}$, note that the expression $\rho_n(E^{\mathcal{F}_n}[1_A])$ is a well defined element of $\mathcal{L}_\infty$, even though the conditional expectation $E^{\mathcal{F}_n}[1_A]$ is only defined up to null sets. Define, for $A \in \mathcal{G}$,
$$\delta(A) = \{\, \omega : \lim_{n \to \infty} \rho_n(E^{\mathcal{F}_n}[1_A])(\omega) = 1 \,\}.$$
We will verify that $\delta$ is a lower density on $\mathcal{G}$ that extends all $\delta_n$. By the martingale convergence theorem (1.4.7), $E^{\mathcal{F}_n}[1_A]$ converges a.e. to $E^{\mathcal{G}}[1_A] = 1_A$, so $\delta(A) \approx A$. Also, if $A \in \mathcal{F}_m$, then for $n \ge m$,
$$\rho_n(E^{\mathcal{F}_n}[1_A]) = \rho_n(1_A) = 1_{\delta_n(A)} = 1_{\delta_m(A)},$$
so $\delta(A) = \delta_m(A)$. Thus $\delta$ extends all $\delta_n$. In particular, $\delta(\Omega) = \Omega$ and $\delta(\emptyset) = \emptyset$.

It remains only to verify that $\delta(A \cap B) = \delta(A) \cap \delta(B)$. Let $A, B \in \mathcal{G}$. Then $1_{A \cap B} \le 1_A$, so $E^{\mathcal{F}_n}[1_{A \cap B}] \le E^{\mathcal{F}_n}[1_A]$ a.e., so $\rho_n(E^{\mathcal{F}_n}[1_{A \cap B}]) \le \rho_n(E^{\mathcal{F}_n}[1_A])$, and therefore $\delta(A \cap B) \subseteq \delta(A)$. Similarly $\delta(A \cap B) \subseteq \delta(B)$. Therefore $\delta(A \cap B) \subseteq \delta(A) \cap \delta(B)$. On the other hand, $1_A \le 1_{A \cup B}$, so for $\omega \in \delta(A)$,
$$1 = \lim_n \rho_n(E^{\mathcal{F}_n}[1_A])(\omega) \le \liminf_n \rho_n(E^{\mathcal{F}_n}[1_{A \cup B}])(\omega) \le \limsup_n \rho_n(E^{\mathcal{F}_n}[1_{A \cup B}])(\omega) \le 1,$$
and similarly for $\omega \in \delta(B)$. If $\omega \in \delta(A) \cap \delta(B)$, then
$$\lim_n \rho_n(E^{\mathcal{F}_n}[1_{A \cap B}])(\omega) = \lim_n \left[\rho_n(E^{\mathcal{F}_n}[1_A])(\omega) + \rho_n(E^{\mathcal{F}_n}[1_B])(\omega) - \rho_n(E^{\mathcal{F}_n}[1_{A \cup B}])(\omega)\right] = 1 + 1 - 1 = 1.$$
Hence $\delta(A) \cap \delta(B) \subseteq \delta(A \cap B)$. This completes the proof that $\delta$ is a lower density.
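On a finite space the construction in this proof can be carried out exactly. The sketch below is a hypothetical example ($\Omega = \{0, \dots, 2^k - 1\}$ with uniform mass and the dyadic filtration are arbitrary choices): $E^{\mathcal{F}_n}[1_A]$ is a block average, every point has positive mass so the lifting $\rho_n$ is the identity, and the set $\{\omega : \lim_n \rho_n(E^{\mathcal{F}_n}[1_A])(\omega) = 1\}$ is exactly $A$ once the partition separates points.

```python
def cond_exp_indicator(A, k, n):
    # E^{F_n}[1_A] on Omega = {0, ..., 2^k - 1} with uniform mass,
    # where F_n is generated by consecutive blocks of length 2^(k-n).
    block = 2 ** (k - n)
    out = []
    for start in range(0, 2 ** k, block):
        avg = sum(1 for w in range(start, start + block) if w in A) / block
        out.extend([avg] * block)
    return out

k = 3
A = {1, 2, 3, 6}                      # an arbitrary subset of {0,...,7}
for n in range(k + 1):
    print(n, cond_exp_indicator(A, k, n))

# At the finest level the block averages recover 1_A pointwise, so the
# lower-density formula gives back A itself.
finest = cond_exp_indicator(A, k, k)
delta_A = {w for w, v in enumerate(finest) if v == 1}
print(delta_A == A)
```

The printed rows show the martingale $E^{\mathcal{F}_n}[1_A]$ refining toward $1_A$ as $n$ grows.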
(6.4.8) Maharam's lifting theorem. Let $(\Omega, \mathcal{F}, \mu)$ be a finite complete measure space. Then there is a lifting for $(\Omega, \mathcal{F}, \mu)$.

Proof. By Proposition (6.4.3), it suffices to show that there is a density. We may assume $\mu(\Omega) = 1$ by multiplying $\mu$ by a constant. Consider the collection $R$ of all ordered pairs $(\mathcal{G}, \delta)$, where $\mathcal{G} \subseteq \mathcal{F}$ is a $\sigma$-algebra containing all of the null sets in $\mathcal{F}$, and $\delta$ is a density for $(\Omega, \mathcal{G}, \mu)$. Then $R$ is partially ordered by the definition: $(\mathcal{G}_1, \delta_1) \le (\mathcal{G}_2, \delta_2)$ iff $\mathcal{G}_1 \subseteq \mathcal{G}_2$ and $\delta_2$ extends $\delta_1$. Also, $R$ is not empty: let $\mathcal{G}$ consist only of the null sets and their complements, with $\delta(A) = \emptyset$ for $\mu(A) = 0$ and $\delta(A) = \Omega$ for $\mu(\Omega \setminus A) = 0$.

Let $C = \{\, (\mathcal{G}_i, \delta_i) : i \in I \,\}$ be a chain in $R$. We must show that $C$ has an upper bound in $R$. If $C$ has a largest element, this is clear. If there is an increasing sequence $i_1 < i_2 < \cdots$ such that
$$\bigcup_{n=1}^\infty \mathcal{G}_{i_n} = \bigcup_{i \in I} \mathcal{G}_i,$$
then the upper bound exists by Lemma (6.4.7). If there is no such sequence, then $\bigcup_{i \in I} \mathcal{G}_i$ is itself a $\sigma$-algebra, and the common extension of all the $\delta_i$ is a density on it. So by Zorn's lemma, there is a maximal element $(\mathcal{G}^*, \delta^*)$ of $R$. By Lemma (6.4.6), we must have $\mathcal{G}^* = \mathcal{F}$. So $\delta^*$ is a density for $(\Omega, \mathcal{F}, \mu)$.

The existence of a lifting for a complete $\sigma$-finite measure space follows easily from this.

Properties of liftings

In this section the letter $\rho$ will denote both a lifting for a measure space $(\Omega, \mathcal{F}, \mu)$ and the corresponding density: so $\rho(1_A) = 1_{\rho(A)}$.

We begin with what is certainly the most important property of liftings for applications. As we know, if $\{\, h_i : i \in I \,\}$ is a bounded set in $\mathcal{L}_\infty$, then it has an essential supremum (4.1.1). But if the family is uncountable, the essential supremum is often not equal to the "upper envelope" or pointwise supremum:
$$h(\omega) = \sup_{i \in I} h_i(\omega), \qquad \omega \in \Omega.$$
Indeed, the function $h$ defined in this way need not even be a measurable function. If the functions $h_i$ are representatives according to a lifting $\rho$, that is $\rho(h_i) = h_i$, then we get the surprising conclusion that the upper envelope is the essential supremum.
(6.4.9) Measurability of the upper envelope. Let $\rho$ be a lifting for a complete measure space $(\Omega, \mathcal{F}, \mu)$. Suppose $\{\, h_i : i \in I \,\}$ is a family of bounded measurable functions, and $h_i \le \rho(h_i)$ for all $i$. Then
(1) the pointwise supremum $h = \sup_i h_i$ is measurable;
(2) if $h'$ is measurable and $h' \ge h_i$ a.e. for all $i$, then $h' \ge h$ a.e.;
(3) if $h$ is bounded, then $h \le \rho(h)$.
Proof. We consider first the special case where $\mu(\Omega) < \infty$, $|h_i| \le 1$ for all $i$, and $\{\, h_i : i \in I \,\}$ is directed upward. Choose a sequence $h_{i_1} \le h_{i_2} \le \cdots$ with $\lim_j \int h_{i_j} \, d\mu = \sup_i \int h_i \, d\mu$. Let $g = \sup_k h_{i_k}$, so that $g^\bullet$ is the supremum of $\{\, h_i^\bullet : i \in I \,\}$; that is, $g$ is the essential supremum of $\{\, h_i : i \in I \,\}$. Then $g \ge h_i$ a.e. for all $i$, so $\rho(g) \ge \rho(h_i) \ge h_i$ everywhere. Thus $\rho(g) \ge \sup_i h_i = h \ge g$, so $h$ is measurable, $h \approx g$, and $h$ is the essential supremum of $\{\, h_i : i \in I \,\}$. Also, $\rho(h) = \rho(g) \ge h$.

For the next case, suppose $\mu(\Omega) < \infty$ and $|h_i| \le 1$ for all $i$, but $\{\, h_i : i \in I \,\}$ is not directed upward. The first case may be applied to the collection of finite suprema $h_{i_1} \vee h_{i_2} \vee \cdots \vee h_{i_n}$, which is directed upward. We must check that if $\rho(f) \ge f$ and $\rho(g) \ge g$, then $\rho(f \vee g) \ge f \vee g$. But $f \vee g \ge f$, so $\rho(f \vee g) \ge \rho(f) \ge f$; similarly $\rho(f \vee g) \ge g$; so $\rho(f \vee g) \ge f \vee g$.

Next suppose $\mu(\Omega) = \infty$ and $|h_i| \le 1$ for all $i$. Write $\Omega$ as a disjoint union $\bigcup_{n=1}^\infty \Omega_n$ of sets of finite measure, and apply the previous case on each of the sets $\rho(\Omega_n)$.

Finally, in order to drop the condition $|h_i| \le 1$, for a fixed positive integer $n$, apply the previous case to $h \wedge n = \sup_i (h_i \wedge n)$. This is measurable for all $n$, so $h$ itself is measurable. (Possibly $h(\omega) = \infty$ for some $\omega$, and possibly $h \notin \mathcal{L}_\infty$.)
The preceding theorem can be used to formulate versions of the standard limit theorems for the Lebesgue integral (Fatou's lemma, monotone convergence, dominated convergence, and so on) for uncountable sets $\{\, h_i : i \in I \,\}$ of functions with $\rho(h_i) = h_i$.
(6.4.10) Corollary. Let $\{\, C_i : i \in I \,\} \subseteq \mathcal{F}$, and suppose $\rho(C_i) \supseteq C_i$ for all $i$. Then $C = \bigcup_{i \in I} C_i \in \mathcal{F}$ and $\rho(C) \supseteq C$. If $\{\, C_i : i \in I \,\}$ is directed upward, then $\mu(C) = \sup_i \mu(C_i)$.

Proof. Apply Theorem (6.4.9) to the measurable functions $h_i = 1_{C_i}$.
(6.4.11) Proposition. Suppose $f \in \mathcal{L}_\infty(\Omega, \mathcal{F}, \mu)$ and $h\colon \mathbb{R} \to \mathbb{R}$ is bounded and continuous. Then $\rho(h \circ f) = h \circ \rho(f)$.

Proof. The function $f$ is bounded, say $|f| \le a$. Let
$$\mathcal{A} = \{\, h \in C([-a, a]) : \rho(h \circ f) = h \circ \rho(f) \,\}.$$
Then $\mathcal{A}$ contains the constants, $\mathcal{A}$ contains the identity function $h(x) = x$, and $\mathcal{A}$ is closed under sums and products. Also $\mathcal{A}$ is closed under limits of uniformly convergent sequences. By the Weierstrass approximation theorem, $\mathcal{A} = C([-a, a])$.
(6.4.12) Corollary. Suppose $f \in \mathcal{L}_\infty(\Omega, \mathcal{F}, \mu)$ satisfies $\rho(f) = f$. If $F \subseteq \mathbb{R}$ is a closed set, then $f^{-1}(F) \in \mathcal{F}$ and $\rho(f^{-1}(F)) \subseteq f^{-1}(F)$. If $G \subseteq \mathbb{R}$ is an open set, then $f^{-1}(G) \in \mathcal{F}$ and $\rho(f^{-1}(G)) \supseteq f^{-1}(G)$.

Proof. Suppose $G$ is open. The family
$$\{\, h \in C_b(\mathbb{R}) : 0 \le h \le 1_G \,\}$$
is directed upward and has pointwise supremum $1_G$; for each such $h$ we have $h \circ f = \rho(h \circ f)$ by Proposition (6.4.11), so Theorem (6.4.9) applies to the family $\{h \circ f\}$, whose pointwise supremum is $1_{f^{-1}(G)}$, and yields $f^{-1}(G) \in \mathcal{F}$ and $\rho(f^{-1}(G)) \supseteq f^{-1}(G)$. The statement for a closed set $F$ follows by complementation.

A similar argument will show that if $f \in \mathcal{L}_\infty$, $\rho(f) = f$, and $h\colon \mathbb{R} \to \mathbb{R}$ is bounded and lower semicontinuous, then $\rho(h \circ f) \ge h \circ f$.

The last application concerns "separable modifications" for stochastic processes. The result is formulated for a stochastic process with values in an interval of the line, but similar proofs can be used in other situations as well.
(6.4.13) Definition. Let $(\Omega, \mathcal{F}, P)$ be a complete probability space, and $[a, b] \subseteq \mathbb{R}$ a compact interval. A stochastic process in $[a, b]$ is a family $(X_t)_{t \in \mathbb{R}}$ of random variables with $a \le X_t \le b$. The stochastic process $(Y_t)_{t \in \mathbb{R}}$ is a modification of $(X_t)$ if for all $t \in \mathbb{R}$, we have $X_t = Y_t$ a.s. (the exceptional set may depend on $t$). A stochastic process $(X_t)$ is separable if there is a countable set $I \subseteq \mathbb{R}$ and $\Omega_0 \in \mathcal{F}$ such that $P(\Omega_0) = 1$ and, for all $\omega \in \Omega_0$, the graph
$$\{\, (t, X_t(\omega)) \in \mathbb{R} \times [a, b] : t \in \mathbb{R} \,\}$$
is contained in the closure of
$$\{\, (t, X_t(\omega)) : t \in I \,\}.$$
V(A,F)={wE ):Yt(w)EFforalltEA}. If F is closed, then by Corollary (6.4.12), we have p(Yti 1(F)) C Yt 1(F) for all t. Now for fixed F C [a, b] closed, and G C IR open, the family
{V(A,F):Afinite, ACG}
is directed downward, so $\bigcap V(A, F) = V(G, F)$ is measurable and
$$P(V(G, F)) = \inf\{\, P(V(A, F)) : A \text{ finite}, A \subseteq G \,\}.$$
So there is a countable set $I_{G,F} \subseteq G$ with $P(V(G, F)) = P(V(I_{G,F}, F))$. Now, as $G$ runs through a countable base for the open sets in $\mathbb{R}$, and $F$ runs through a countable base for the closed sets in $[a, b]$, let $I$ be the union of the sets $I_{G,F}$. So $I$ is countable. Let $\Omega_0$ be the complement of the union of the exceptional sets $V(I_{G,F}, F) \setminus V(G, F)$. The set $\Omega_0$ has probability 1, since there are only countably many such exceptional sets, each of probability 0.

Now fix $\omega_0 \in \Omega_0$ and $t_0 \in \mathbb{R}$. If $G$ is an open neighborhood of $t_0$ and $F$ is the complement of an open neighborhood of $Y_{t_0}(\omega_0)$, we see that $\omega_0 \notin V(G, F)$, so $\omega_0 \notin V(I_{G,F}, F)$, so there exists $t \in I \cap G$ with $Y_t(\omega_0) \notin F$. As $G$ decreases and $F$ increases, this shows that the point $(t_0, Y_{t_0}(\omega_0))$ is in the closure of the points $(t, Y_t(\omega_0))$ with $t \in I$. This shows that $(Y_t)$ is separable.

Complements
The axiom of choice is used in the proofs of Proposition (6.4.4) and Theorem (6.4.8). It is known that it cannot be avoided. If $\rho$ is a lifting for $\mathbb{R}$, then the map
$$f \mapsto \rho(f)(0)$$
is a linear functional on $L_\infty(\mathbb{R})$ that is not induced by an element of $L_1(\mathbb{R})$, so its restriction to the unit ball $B$ of $L_\infty$ does not have the property of Baire for the weak-star topology on $B$ (see Christensen [1974], p. 99). The map
$$A \mapsto \rho(1_A)(0)$$
is a finitely additive measure on a $\sigma$-algebra (say the Borel sets in $\mathbb{R}$) that is not $\sigma$-additive. It is not possible to prove the existence of any of these in ZF set theory without choice. R. Solovay [1970] constructed a model of ZF (with the principle of dependent choices) in which they are impossible.

A lifting chooses representatives for bounded functions. It was observed by von Neumann [1931] that a similar construction is not possible for unbounded real-valued functions. In fact, he proved more. A mapping $\rho\colon \mathcal{L}_p(\Omega, \mathcal{F}, \mu) \to \mathcal{L}_p(\Omega, \mathcal{F}, \mu)$ is called a linear lifting iff:
(1) $\rho(f) \approx f$;
(2) if $f \approx g$, then $\rho(f) = \rho(g)$;
(3) if $f \ge 0$, then $\rho(f) \ge 0$;
(4) $\rho(f + g) = \rho(f) + \rho(g)$; $\rho(af) = a\rho(f)$.
(6.4.15) Proposition. Let $1 \le p < \infty$. Suppose $\mathcal{L}_p(\Omega, \mathcal{F}, \mu)$ admits a linear lifting. Then $(\Omega, \mathcal{F}, \mu)$ is purely atomic.

Proof. Let $\rho$ be a linear lifting. If $(\Omega, \mathcal{F}, \mu)$ is not purely atomic, then there is $E \in \mathcal{F}$ such that $0 < \mu(E) < \infty$ and $E$ contains no atoms. For each $\omega \in E$, the map $f \mapsto \rho(f)(\omega)$ is a positive linear functional, hence continuous; call it $u_\omega$. Now for each positive integer $n$, there exist disjoint sets $E_1^{(n)}, E_2^{(n)}, \dots, E_n^{(n)} \subseteq E$ with $\mu(E_k^{(n)}) = \mu(E)/n$. Let
$$F^{(n)} = \bigcup_{k=1}^n \{\, \omega : \rho(1_{E_k^{(n)}})(\omega) = 1 \,\}.$$
Then $F^{(n)} \approx E$ for all $n$, so
$$\mu\left(E \cap \bigcap_{n=1}^\infty F^{(n)}\right) = \mu(E) > 0.$$
Choose $\omega \in E \cap \bigcap_{n=1}^\infty F^{(n)}$. For each $n$, there exists $k$, $1 \le k \le n$, such that $\rho(1_{E_k^{(n)}})(\omega) = 1$, so
$$u_\omega\bigl(1_{E_k^{(n)}}\bigr) = \rho\bigl(1_{E_k^{(n)}}\bigr)(\omega) = 1.$$
Then
$$\|u_\omega\| \ge \frac{1}{\bigl\|1_{E_k^{(n)}}\bigr\|_p} = \left(\frac{n}{\mu(E)}\right)^{1/p},$$
which goes to $\infty$ as $n \to \infty$. This contradicts the continuity of $u_\omega$. So in fact, $(\Omega, \mathcal{F}, \mu)$ is purely atomic.

Remarks
The problem of whether a lifting exists for Lebesgue measure on the real line was proposed by A. Haar. It was solved by J. von Neumann [1931]. The existence of a lifting for a general $\sigma$-finite complete measure space was proved by D. Maharam [1958]. Her proof relied on the classification of measure algebras. A more direct proof was given by A. & C. Ionescu Tulcea [1961], based on the martingale convergence theorem. The proof given here is a variant of that proof. The standard reference on liftings is the comprehensive book by A. & C. Ionescu Tulcea [1969]. It contains several applications not discussed here.
7. Derivation
In this chapter we will consider the topic of derivation. We begin with the classical derivation theorems in $\mathbb{R}$ and $\mathbb{R}^d$. This can be done by considering an appropriate stochastic basis indexed by a directed set, and applying the martingale convergence theorems of Chapter 4.

The theory of derivation bases in general is considered next. There are many parallels here with the martingale limit theorems on directed sets, but we do not derive the derivation basis material from the stochastic basis material. Finally, we consider the special derivation bases known as D-bases and Busemann–Feller bases. Characterization of the bases that differentiate the indefinite integrals of all functions in a given Orlicz space occupies most of the space here.
7.1. Derivation in $\mathbb{R}$

This section discusses derivation of functions defined on the space $\mathbb{R}$. This is the classical theory of derivation for the Lebesgue integral. We will carry out the discussion using martingale proofs in this simple setting as an example to be followed in the later, more complicated, settings.

As always, $\mathbb{R}$ denotes the set of real numbers. We will write $\mathcal{F}$ for the $\sigma$-algebra of Lebesgue measurable subsets of $\mathbb{R}$, and $\lambda$ for Lebesgue measure on $\mathcal{F}$. The Lebesgue outer measure of a set $Q$ will be denoted $\lambda^*(Q)$, and the inner measure by $\lambda_*(Q)$. The outer measure may be defined as:
$$\lambda^*(Q) = \inf\{\, \lambda(A) : A \in \mathcal{F}, A \supseteq Q \,\}.$$
Similarly, the inner measure is:
$$\lambda_*(Q) = \sup\{\, \lambda(A) : A \in \mathcal{F}, A \subseteq Q \,\}.$$
It is known that these values may also be written as:
$$\lambda^*(Q) = \inf\{\, \lambda(U) : U \text{ open}, U \supseteq Q \,\}, \qquad \lambda_*(Q) = \sup\{\, \lambda(K) : K \text{ compact}, K \subseteq Q \,\}.$$
Stochastic bases
In order to apply the results from Chapter 4, we will consider some stochastic bases on $\mathbb{R}$. Let $a < b$ be real numbers. A finite subdivision of $[a, b]$ is a finite subset of $[a, b]$ including $a$ and $b$. We write $D[a, b]$ or $D[a, b)$ for the set of all subdivisions of $[a, b]$. If $t = \{a = x_0 < x_1 < \cdots < x_n = b\}$ is a subdivision, the induced partition is
$$\pi_t = \{\, [x_{i-1}, x_i) : i = 1, 2, \dots, n \,\},$$
and the induced $\sigma$-algebra is the $\sigma$-algebra $\mathcal{F}_t$ on $\Omega = [a, b)$ with atoms $[x_{i-1}, x_i)$, $i = 1, 2, \dots, n$. Note that $D[a, b)$ is a directed set when ordered by inclusion. If $t_1 \subseteq t_2$, we will write $t_1 \le t_2$ and say that $t_2$ refines $t_1$. Note that $t_1 \le t_2$ implies $\mathcal{F}_{t_1} \subseteq \mathcal{F}_{t_2}$. Thus, $(\mathcal{F}_t)_{t \in D[a,b)}$ is a stochastic basis on $\Omega = [a, b)$.

Suppose $S$ is a subset of $[a, b]$ including $a$ and $b$ (such as a countable dense subset). We write $D_S[a, b)$ for the subset of $D[a, b)$ consisting of those subdivisions involving only elements of $S$. Then $(\mathcal{F}_t)_{t \in D_S[a,b)}$ is again a stochastic basis on $\Omega = [a, b)$.

There is an easy way to generate martingales adapted to these stochastic bases. Note that we may begin with an arbitrary function $F$, not necessarily continuous, or even measurable.
(7.1.1) Proposition (Difference-Quotient Martingale). Let $F\colon [a, b] \to \mathbb{R}$ be a function. Define simple functions $f_t$ as follows: If
$$t = \{a = x_0 < x_1 < \cdots < x_n = b\}$$
is a subdivision of $[a, b)$, then
$$f_t = \sum_{i=1}^n \frac{F(x_i) - F(x_{i-1})}{x_i - x_{i-1}} \, 1_{[x_{i-1}, x_i)}. \tag{7.1.1a}$$
Let $S$ be any subset of $[a, b)$. Then $(f_t)_{t \in D_S[a,b)}$ is a martingale adapted to the stochastic basis $(\mathcal{F}_t)_{t \in D_S[a,b)}$.
Proof. We must prove: if $s, t \in D_S[a, b)$ and $s \le t$, then $E_\lambda[f_t \mid \mathcal{F}_s] = f_s$. Since $s$ and $t$ are finite, by induction it is enough to prove this when $t$ contains exactly one more point than $s$. Suppose
$$s = \{a = x_0 < x_1 < \cdots < x_n = b\}, \qquad t = s \cup \{y\}, \quad x_{i-1} < y < x_i.$$
Now any element $A$ of $\mathcal{F}_s$ is a finite union of intervals $[x_{j-1}, x_j)$, so the equation
$$\int_A f_s \, d\lambda = \int_A f_t \, d\lambda$$
must be checked only for $A = [x_{j-1}, x_j)$, $j = 1, 2, \dots, n$. This is trivial except for $j = i$. For $j = i$, it reduces to the identity
$$\frac{F(x_i) - F(y)}{x_i - y}\,(x_i - y) + \frac{F(y) - F(x_{i-1})}{y - x_{i-1}}\,(y - x_{i-1}) = F(x_i) - F(x_{i-1}).$$
Therefore, $(f_t)$ is a martingale.
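The martingale property of (7.1.1a) can be verified numerically: refining a subdivision does not change the integral of $f_t$ over any atom of the coarser $\sigma$-algebra, because the telescoping identity above makes both integrals equal to $F(x_i) - F(x_{i-1})$. A minimal sketch (the function $F$ and the subdivisions are arbitrary choices for illustration):

```python
def diff_quotient(F, t):
    # Pairs (atom [x_{i-1}, x_i), value) for the simple function f_t of (7.1.1a).
    return [((t[i - 1], t[i]), (F(t[i]) - F(t[i - 1])) / (t[i] - t[i - 1]))
            for i in range(1, len(t))]

def integral_over(ft, lo, hi):
    # Integral of the simple function f_t over [lo, hi).
    return sum(v * (min(b, hi) - max(a, lo))
               for (a, b), v in ft if min(b, hi) > max(a, lo))

F = lambda x: x ** 3 - 2 * x               # any function works here
s = [0.0, 0.5, 1.0, 2.0]                   # coarse subdivision
t = [0.0, 0.25, 0.5, 0.8, 1.0, 1.7, 2.0]   # refinement of s

fs, ft = diff_quotient(F, s), diff_quotient(F, t)
# E[f_t | F_s] = f_s iff the integrals agree on every atom of F_s.
for i in range(1, len(s)):
    print(integral_over(fs, s[i - 1], s[i]), integral_over(ft, s[i - 1], s[i]))
```

Each printed pair agrees: both integrals over $[x_{i-1}, x_i)$ equal $F(x_i) - F(x_{i-1})$.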
It is probably not hard to believe that the convergence of a difference-quotient martingale defined by a function $F$ is related to the differentiability of the function $F$. This will be considered more fully below.

The Vitali covering theorem

The technical lemma for derivation in $\mathbb{R}$ is a theorem of Vitali. But it is much more than a technicality. Variants of the theorem will be seen below. The theorem will use some terminology for intervals in $\mathbb{R}$. It is spelled out here so that the similar ideas below will be more easily recognized.

An interval $E$ in $\mathbb{R}$ is nondegenerate iff $E$ has positive length. If $E$ is a bounded nondegenerate interval, then it has a midpoint (or center) $x \in \mathbb{R}$ and a radius $r > 0$, so that $E$ has one of the forms $(x-r, x+r)$, $[x-r, x+r)$, $(x-r, x+r]$, $[x-r, x+r]$. If $a > 1$, then the $a$-halo of such an interval $E$ is the closed interval with the same center and $a$ times the radius: $[x-ar, x+ar]$. If $\mathcal{V}$ is a (finite or infinite) collection of nondegenerate bounded intervals, then the $a$-halo of $\mathcal{V}$ is the union of the $a$-halos of the intervals in $\mathcal{V}$. If $Q \subseteq \mathbb{R}$ and $\mathcal{V}$ is a collection of nondegenerate intervals, then we say that $\mathcal{V}$ is a Vitali cover of $Q$ iff, for each $\varepsilon > 0$ and each $x \in Q$, there is $E \in \mathcal{V}$ with radius $< \varepsilon$ and $x \in E$. Thus if $x \in Q \cap U$, where $U$ is an open set, then there exists $E \in \mathcal{V}$ with $x \in E \subseteq U$.
(7.1.2) Vitali covering theorem. Let $Q$ be a subset of $\mathbb{R}$, and suppose $\mathcal{V}$ is a Vitali cover of $Q$ by nondegenerate intervals. Then
(a) There exists a pairwise disjoint countable family $\{E_n\} \subseteq \mathcal{V}$ (finite or infinite) such that
$$\lambda^*\left(Q \setminus \bigcup_n E_n\right) = 0.$$
(b) If $\lambda^*(Q) < \infty$, then, for each $\varepsilon > 0$, there exists a pairwise disjoint finite family $\{E_1, E_2, \dots, E_p\} \subseteq \mathcal{V}$ such that
$$\lambda^*\left(Q \setminus \bigcup_{n=1}^p E_n\right) < \varepsilon.$$
Proof (Banach). Note that $\lambda(\overline{E} \setminus E) = 0$ for any interval $E$, and that if $\overline{E} \cap \overline{F} = \emptyset$ then $E \cap F = \emptyset$. Therefore we may assume that the intervals in $\mathcal{V}$ are closed intervals. We may assume also that $Q \ne \emptyset$.

First assume $\lambda^*(Q) < \infty$. Choose an open set $V \supseteq Q$ with $\lambda(V) < \infty$. Then
$$\mathcal{V}_0 = \{\, E \in \mathcal{V} : E \subseteq V \,\}$$
is also a Vitali cover of $Q$. The disjoint sequence $\{E_n\}$ of intervals will be constructed recursively. Since $Q$ is nonempty, $\mathcal{V}_0$ is also nonempty. Choose $E_1 \in \mathcal{V}_0$. If $Q \subseteq E_1$, we are done; so suppose not, that is, $Q \setminus E_1 \ne \emptyset$. Then $U_1 = V \setminus E_1$ is open and $U_1 \cap Q \ne \emptyset$. Let
$$\delta_1 = \sup\{\, \lambda(E) : E \in \mathcal{V}_0,\ E \subseteq U_1 \,\}.$$
Choose $E_2 \in \mathcal{V}_0$ such that $E_2 \subseteq U_1$ and $\lambda(E_2) > \delta_1/2$. Now suppose we have chosen $E_1, E_2, \dots, E_n$. If $Q \subseteq E_1 \cup E_2 \cup \cdots \cup E_n$, we are done; suppose not, that is, $Q \setminus (E_1 \cup \cdots \cup E_n) \ne \emptyset$. Then $U_n = V \setminus (E_1 \cup \cdots \cup E_n)$ is open and $U_n \cap Q \ne \emptyset$. Let
$$\delta_n = \sup\{\, \lambda(E) : E \in \mathcal{V}_0,\ E \subseteq U_n \,\}.$$
Choose $E_{n+1} \in \mathcal{V}_0$ such that $E_{n+1} \subseteq U_n$ and $\lambda(E_{n+1}) > \delta_n/2$. Note that we have $E_{n+1} \cap E_k = \emptyset$ for $k \le n$. This process either stops after finitely many steps with $Q$ covered, or else produces an infinite sequence $\{E_n\} \subseteq \mathcal{V}_0$ of pairwise disjoint intervals.

Write $B = Q \setminus \bigcup E_n$. We claim that $\lambda^*(B) = 0$. Let $H_n$ be the 5-halo of $E_n$ for each $n$. Then for each positive integer $p$,
$$\sum_{n=p}^\infty \lambda(H_n) = 5 \sum_{n=p}^\infty \lambda(E_n) = 5\,\lambda\left(\bigcup_{n=p}^\infty E_n\right) \le 5\,\lambda(V) < \infty.$$
Thus $\sum_{n=1}^\infty \lambda(H_n) < \infty$, and hence $\lambda\left(\bigcup_{n=p}^\infty H_n\right) \to 0$ as $p \to \infty$. We will show that $B \subseteq \bigcup_{n=p}^\infty H_n$ for all $p$. Fix $p$. Let
$$x \in B = Q \setminus \bigcup_{n=1}^\infty E_n \subseteq Q \setminus \bigcup_{n=1}^p E_n \subseteq U_p.$$
Since $\mathcal{V}_0$ is a Vitali cover of $Q$, there is $E \in \mathcal{V}_0$ with $x \in E \subseteq U_p$. Now $\lambda(E_n) \to 0$ and $\delta_n < 2\lambda(E_{n+1})$, so $\delta_n \to 0$. Thus, there is $n$ with $\delta_n < \lambda(E)$, so that $E \not\subseteq U_n$. Let $n$ be the smallest integer with $E \not\subseteq U_n$. Clearly $p < n$. Thus
$$E \cap (E_1 \cup \cdots \cup E_n) \ne \emptyset, \qquad E \cap (E_1 \cup \cdots \cup E_{n-1}) = \emptyset,$$
so $E \cap E_n \ne \emptyset$. But $E \subseteq U_{n-1}$, so $\lambda(E) \le \delta_{n-1} < 2\lambda(E_n)$. Therefore [since $E \cap E_n \ne \emptyset$, $\lambda(E) < 2\lambda(E_n)$, and $H_n$ has the same center as $E_n$ with $\lambda(H_n) = 5\lambda(E_n)$] we have $E \subseteq H_n$. Thus $x \in E \subseteq \bigcup_{n=p}^\infty H_n$. This shows $B \subseteq \bigcup_{n=p}^\infty H_n$ for all $p$, and thus $\lambda^*(B) = 0$. This proves (a).

For (b): if $\varepsilon > 0$ is given, choose $p$ so that $\sum_{n=p+1}^\infty \lambda(H_n) < \varepsilon$. Since $Q \setminus \bigcup_{n=1}^p E_n \subseteq \bigcup_{n=p+1}^\infty H_n$, this gives
$$\lambda^*\left(Q \setminus \bigcup_{n=1}^p E_n\right) < \varepsilon.$$

For the general case $\lambda^*(Q) = \infty$, apply the above, replacing $Q$ by $Q \cap (n, n+1)$ and taking $V = (n, n+1)$, to get a countable disjoint family $\mathcal{A}_n \subseteq \mathcal{V}$ of subintervals of $(n, n+1)$ covering almost all of $Q \cap (n, n+1)$. Since the set of integers has Lebesgue measure zero, the union
$$\bigcup_{n=-\infty}^\infty \mathcal{A}_n$$
is the required countable pairwise disjoint family.
The use that we will make of the Vitali covering theorem is to deduce the covering condition (V).

(7.1.3) Corollary. Let $a < b$ be real numbers, and let $S$ be a dense subset of $[a, b)$. Then the stochastic basis $(\mathcal{F}_t)_{t \in D_S[a,b)}$ satisfies the covering condition (V).
Proof. Let $A_t \subseteq [a, b)$ be given for each $t \in D_S[a, b)$, $A_t \in \mathcal{F}_t$, and let $A^* = \mathrm{e}\limsup A_t$. Fix $t_0 \in D_S[a, b)$ and $\varepsilon > 0$. For each positive integer $m$ there is (since $S$ is dense) a partition $t_m$ with mesh $< 1/m$ and $t_m \ge t_0$. Then we have $A^* \subseteq \operatorname{ess\,sup}_{t \ge t_m} A_t$; choose $t_{m1}, t_{m2}, \dots \ge t_m$ with $A^* \subseteq \bigcup_{i=1}^\infty A_{t_{mi}}$ a.e. Let
$$B = A^* \cap \bigcap_{m=1}^\infty \bigcup_{i=1}^\infty A_{t_{mi}}.$$
Then $B \subseteq \bigcup_{i=1}^\infty A_{t_{mi}}$ for all $m$ and $B = A^*$ a.e.

Now $A_{t_{mi}} \in \mathcal{F}_{t_{mi}}$, so it has the form of a finite union
$$A_{t_{mi}} = \bigcup_k [a_{mik}, b_{mik}),$$
with $b_{mik} - a_{mik} < 1/m$. Therefore the collection obtained by combining all of the intervals, $\mathcal{V} = \{[a_{mik}, b_{mik})\}_{m,i,k}$, is a Vitali cover of $B$. Thus by the Vitali covering theorem, there exists a finite disjoint collection $\{E_1, E_2, \dots, E_p\}$ from $\mathcal{V}$ with
$$\lambda\left(B \setminus \bigcup_{j=1}^p E_j\right) < \varepsilon.$$
But $E_1 \cup E_2 \cup \cdots \cup E_p$ is of the form $A(\tau)$ for some incomplete stopping time $\tau$. Indeed, for each $E_j$, choose a pair $m, i$ with $E_j \subseteq A_{t_{mi}}$ and $E_j \in \mathcal{F}_{t_{mi}}$; then define $\tau = t_{mi}$ on $E_j$.

Derivation
It is now a routine (but interesting) matter to apply the martingale convergence results of Chapter 4 to prove the classical derivation theorems on $\mathbb{R}$. We need pointwise a.e. convergence, not just essential convergence, so we use a countable directed set.
(7.1.4) Theorem. Let $a < b$ be real numbers, and let $F\colon [a, b] \to \mathbb{R}$ be monotone. Then the derivative $F'(x)$ exists and is finite for almost all $x \in (a, b)$.

Proof. Assume $F$ is nondecreasing. Then $F$ is continuous except at a countable number of points. Let $S$ be a countable set, dense in $[a, b]$, containing $a$, $b$, and all points of discontinuity of $F$. Consider the difference-quotient martingale $(f_t)_{t \in D_S[a,b)}$ as in Proposition (7.1.1). This martingale is $L_1$-bounded. Indeed, if $t = \{a = x_0 < x_1 < \cdots < x_n = b\}$, then
$$\int |f_t| \, d\lambda = \int f_t \, d\lambda = \sum_{i=1}^n \frac{F(x_i) - F(x_{i-1})}{x_i - x_{i-1}} \, (x_i - x_{i-1}) = F(b) - F(a) < \infty.$$
Therefore, by (4.2.11), the martingale converges essentially, and (since $D_S[a, b)$ is countable) almost everywhere. Say $f_t \to f$ a.e.

We claim that if $x$ is a point of $(a, b) \setminus S$ where $f_t(x) \to f(x)$, then $F$ is differentiable at $x$ and $F'(x) = f(x)$. [This will show that $F'$ exists and is finite a.e.] Indeed, let $\varepsilon > 0$ be given. Choose $t_0 \in D_S[a, b)$ such that, for all $t \ge t_0$, we have $|f_t(x) - f(x)| < \varepsilon$. Write $t_0 = \{a = x_0 < \cdots < x_n = b\}$. Now $x \notin S$, so
we have $x_{i-1} < x < x_i$ for some $i$. If $y, z \in S$ with $x_{i-1} < y < x < z < x_i$, consider the subdivision
$$t = \{a = x_0 < x_1 < \cdots < x_{i-1} < y < z < x_i < \cdots < x_n = b\}.$$
Then $t \ge t_0$ and $f_t(x) = \dfrac{F(z) - F(y)}{z - y}$, so
$$\left| \frac{F(z) - F(y)}{z - y} - f(x) \right| < \varepsilon. \tag{7.1.4a}$$
Now it follows that (7.1.4a) holds for all $y, z$ satisfying $x_{i-1} < y < x < z < x_i$, since points not belonging to $S$ are points of continuity of $F$. This shows that
$$\limsup_{y \to x} \frac{F(x) - F(y)}{x - y} \le f(x) + \varepsilon, \qquad \liminf_{y \to x} \frac{F(x) - F(y)}{x - y} \ge f(x) - \varepsilon.$$
Since $\varepsilon$ was arbitrary, we have
$$\lim_{y \to x} \frac{F(x) - F(y)}{x - y} = f(x).$$
An examination of the preceding proof might naturally lead to a generalization to functions of bounded variation.
(7.1.5) Definition. Let $F\colon [a, b] \to \mathbb{R}$ be a function. The variation of $F$ on a subdivision $t = \{a = x_0 < \cdots < x_n = b\}$ is
$$V_t(F) = \sum_{i=1}^n |F(x_i) - F(x_{i-1})|.$$
The variation of $F$ on a subset $S$ of $[a, b]$ is
$$V_S(F) = \sup\{\, V_t(F) : t \in D_S[a, b] \,\}.$$
The function $F$ is said to have bounded variation on $[a, b]$ (or finite variation) iff $V_{[a,b]}(F) < \infty$. It is easily seen that for $a < b < c$, we have $V_{[a,c]}(F) = V_{[a,b]}(F) + V_{[b,c]}(F)$.
(7.1.6) Proposition. Let $F\colon [a, b] \to \mathbb{R}$ have bounded variation. Then $F$ is the difference of two nondecreasing functions.

Proof. Let $F_1(x) = V_{[a,x]}(F)$ and $F_2(x) = F_1(x) - F(x)$. Then $F = F_1 - F_2$. Clearly $F_1$ is nondecreasing. To see that $F_2$ is also nondecreasing, for $x < y$ compute
$$F_2(y) - F_2(x) = V_{[a,y]}(F) - F(y) - V_{[a,x]}(F) + F(x) = V_{[x,y]}(F) - \bigl(F(y) - F(x)\bigr) \ge 0.$$
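Over a finite grid the Jordan decomposition in this proof can be computed directly: $F_1$ is the running variation, $F_2 = F_1 - F$, and the inequality $V_{[x,y]}(F) \ge F(y) - F(x)$ makes both nondecreasing. A minimal sketch (the grid and the function are arbitrary choices for illustration):

```python
import math

def jordan_on_grid(F, xs):
    # Running variation F1(x_j) = sum of |F(x_i) - F(x_{i-1})| for i <= j,
    # and F2 = F1 - F, so that F = F1 - F2 with F1, F2 nondecreasing.
    vals = [F(x) for x in xs]
    F1, total = [0.0], 0.0
    for i in range(1, len(vals)):
        total += abs(vals[i] - vals[i - 1])
        F1.append(total)
    F2 = [v1 - v for v1, v in zip(F1, vals)]
    return F1, F2

xs = [i / 100 for i in range(101)]
F1, F2 = jordan_on_grid(lambda x: math.sin(6 * x), xs)

nondecreasing = lambda seq: all(b >= a - 1e-12 for a, b in zip(seq, seq[1:]))
print(nondecreasing(F1), nondecreasing(F2))
```

On the grid the step $F_2(x_i) - F_2(x_{i-1}) = |\Delta F| - \Delta F \ge 0$ holds exactly, which is the discrete form of the computation in the proof.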
(7.1.7) Theorem. Let $F\colon [a, b] \to \mathbb{R}$ be a function of bounded variation. Then $F'$ exists and is finite almost everywhere in $(a, b)$.

Proof. Since $F$ is the difference of two nondecreasing functions, this follows from Theorem (7.1.4).
An alternative proof of this fact would use the difference-quotient martingale $(f_t)$ associated with $F$, and note that the martingale is $L_1$-bounded, since the $L_1$ bound of $(f_t)_{t \in D_S[a,b]}$ is exactly equal to the variation $V_S(F)$. Then the proof would proceed exactly as in (7.1.4). This method would require the result that a function of bounded variation is continuous except at countably many points, which is easy to prove, but no easier than Proposition (7.1.6). These remarks also lead to another corollary.
(7.1.8) Corollary. Let $F\colon [a, b] \to \mathbb{R}$ have bounded variation. Let $S$ be a countable dense subset of $[a, b]$, including $a$, $b$, and all points of discontinuity of $F$. Then the difference-quotient martingale $(f_t)_{t \in D_S[a,b]}$, defined as in (7.1.1), converges a.e. to $F'$.
In Chapter 4, a special role was played by the closed martingales, that is, martingales of the form $f_t = E_\lambda[f \mid \mathcal{F}_t]$ for some $f \in L_1$. When we translate into the present language, we obtain the following.
(7.1.9) Theorem. Let $f \in L_1([a, b])$, and let $F$ be the indefinite integral
$$F(x) = \int_a^x f(y) \, dy.$$
Then $F'(x) = f(x)$ for almost all $x \in (a, b)$.

Proof. Let $(f_t)$ be the difference-quotient martingale associated to $F$ as in (7.1.1). Let $t = \{a = x_0 < \cdots < x_n = b\}$ be a subdivision of $[a, b]$. It is only a matter of applying the definition to see that
$$E_\lambda[f \mid \mathcal{F}_t] = f_t.$$
If $S$ is a dense set as in the proof of (7.1.4), then the $\sigma$-algebra $\mathcal{F}^\infty$ generated by
$$\bigcup_{t \in D_S[a,b)} \mathcal{F}_t$$
is the $\sigma$-algebra of all Borel sets in $[a, b)$. But $(f_t)$ converges to $E_\lambda[f \mid \mathcal{F}^\infty]$, which is (a.e.) $f$. Therefore $F' = f$ a.e.

We know that a martingale is of the form $E_\lambda[f \mid \mathcal{F}_t]$ if and only if it is uniformly integrable.
(7.1.10) Proposition. Let F: [a, b] → ℝ be a function, and let (f_t)_{t∈D[a,b]} be the difference-quotient martingale associated with F. Then the following are equivalent.
(a) (f_t) is uniformly integrable.
(b) F is absolutely continuous: that is, for every ε > 0 there is δ > 0 such that Σ_{i=1}^n |F(b_i) − F(a_i)| < ε for every finite pairwise disjoint family {(a_1, b_1), (a_2, b_2), …, (a_n, b_n)} of open subintervals of (a, b) with Σ_{i=1}^n (b_i − a_i) < δ.

Proof. Lebesgue measure is atomless, so uniform integrability is characterized as in (2.3.2(3)). Suppose that (f_t) is uniformly integrable. Let ε > 0. Then there is δ > 0 such that ∫_A |f_t| dλ < ε for any t ∈ D[a, b] and any set A with λ(A) < δ. If {(a_1, b_1), (a_2, b_2), …, (a_n, b_n)} is a pairwise disjoint family of subintervals of (a, b) with Σ (b_i − a_i) < δ, let t consist of a, b, and all of the endpoints a_i, b_i. Then A = ⋃_{i=1}^n (a_i, b_i) belongs to 𝓕_t and λ(A) < δ. Therefore
Σ_{i=1}^n |F(b_i) − F(a_i)| = ∫_A |f_t| dλ < ε.
Conversely, suppose that (b) is satisfied. Let ε > 0 be given. Then there exists δ as in (b). Let A be any measurable set with λ(A) < δ. We claim that ∫_A |f_t| dλ < ε for all t. There is an open set U ⊇ A with λ(U) < δ. It is enough to show that ∫_U |f_t| dλ < ε, since ∫_A |f_t| dλ ≤ ∫_U |f_t| dλ. Now U is a countable disjoint union of open intervals, so by the monotone convergence theorem, it is enough to consider the case where U is a finite disjoint union of open intervals. Choose t_1 ∈ D[a, b] containing all the endpoints of intervals in U and all the points of t. Then
∫_U |f_{t_1}| dλ = Σ |F(x_i) − F(x_{i−1})|,
where the last sum is over a set of indices i such that Σ (x_i − x_{i−1}) = λ(U) < δ, so that the result is at most ε.
(7.1.11) Corollary. A function F: [a, b] → ℝ is absolutely continuous if and only if it has the form
F(x) = c + ∫_a^x f dλ,  x ∈ [a, b],
for some function f ∈ L_1([a, b]).
Complements
(7.1.12) (Lebesgue density theorem.) Deduce from the results of this section the Lebesgue density theorem: Let A ⊆ ℝ be a measurable set with λ(A) > 0. Then almost every point x ∈ A is a point of density of A in the sense that for every ε > 0, there is δ > 0 so that
λ(A ∩ (x − δ, x + δ)) / (2δ) > 1 − ε.
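A one-dimensional toy computation (my own, with A = [0, 1/2]) contrasting an interior point, where the density ratio is 1, with the boundary point, where it is only 1/2 — which is why the theorem asserts the property only for almost every point of A:

```python
# density_ratio: measure of [a, b] ∩ (x - d, x + d), divided by 2d.
def density_ratio(a, b, x, d):
    lo, hi = max(a, x - d), min(b, x + d)
    return max(0.0, hi - lo) / (2 * d)

for d in [0.1, 0.01, 0.001]:
    # x = 0.25 is a point of density of [0, 1/2]; the boundary point
    # x = 0.5 has density ratio 1/2 at every small radius d.
    assert abs(density_ratio(0.0, 0.5, 0.25, d) - 1.0) < 1e-9
    assert abs(density_ratio(0.0, 0.5, 0.5, d) - 0.5) < 1e-9
```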
The Vitali covering theorem dates from 1905. The proof given here is due to S. Banach [1924]. This result is, as we have seen, closely connected to the covering condition (V) for stochastic bases. Because of this connection, the covering conditions considered in Chapter 4 are often known generically as "Vitali conditions," and condition (V) is sometimes known simply as "the Vitali condition."
7.2. Derivation in ℝ^d

The next topic is derivation in Euclidean space ℝ^d, for d = 2, 3, …. Examples of derivation bases to be considered are squares, disks, and intervals. We will write λ for d-dimensional Lebesgue measure on ℝ^d and λ* for d-dimensional Lebesgue outer measure.
Substantial sets

Let 𝒞 be a collection of nonempty bounded open sets in ℝ^d. Suppose 𝒞 is substantial: that is, there is a constant M such that, for every C ∈ 𝒞, there is an open ball B with C ⊆ B and λ(B) ≤ M λ(C). We will also assume that the sets C ∈ 𝒞 have boundary of Lebesgue measure 0: λ(∂C) = 0. [Recall that a point a is a boundary point of a set C ⊆ ℝ^d if every neighborhood of a meets both C and ℝ^d \ C.] For example, when C is a convex set, we do have λ(∂C) = 0. If B is an open ball, we write r(B) for its radius.
Let V be a subcollection of 𝒞 and let Q ⊆ ℝ^d. We say that V is a Vitali cover of Q if for every x ∈ Q and every ε > 0, there is E ∈ V with diam E < ε and x ∈ E.

(7.2.1) Theorem. Let 𝒞 be a substantial collection of nonempty open bounded subsets of ℝ^d with boundaries of measure 0. Let Q be a set in ℝ^d, and suppose V is a Vitali cover of Q by sets of 𝒞. Then there exists a pairwise disjoint countable family {E_n} ⊆ V (finite or infinite) such that λ*(Q \ ⋃_{n=1}^∞ E_n) = 0. If λ*(Q) < ∞, then for each ε > 0, there exists a pairwise disjoint finite family {E_1, E_2, …, E_p} ⊆ V such that λ*(Q \ ⋃_{n=1}^p E_n) < ε.
Proof. First suppose Q is a bounded set. We apply (4.2.6) repeatedly. Let ε > 0 be given. Fix an open set W ⊇ Q with λ(W) < (1 + ε)λ*(Q). Then
W_1 = ⋃{ C ∈ 𝒞 : C ⊆ W }
is an open set. Apply (4.2.6): there exist disjoint E_1, E_2, …, E_{p_1} ∈ 𝒞 with
E_i ⊆ W_1 and Σ_{i=1}^{p_1} λ(E_i) ≥ 2^{−1}M^{−1}3^{−d} λ(W_1). Now W_1 \ ⋃_{i=1}^{p_1} Ē_i is an open set; let
W_2 = ⋃{ C ∈ 𝒞 : C ⊆ W_1 \ ⋃_{i=1}^{p_1} Ē_i }.
Repeat: there exist E_{p_1+1}, …, E_{p_2} ∈ 𝒞 with E_i ⊆ W_2 and
Σ_{i=p_1+1}^{p_2} λ(E_i) ≥ 2^{−1}M^{−1}3^{−d} λ(W_2).
Continue in this way. Thus, for each k,
Σ_{i=1}^{p_k} λ(E_i) ≥ [1 − (1 − 2^{−1}M^{−1}3^{−d})^k] λ*(Q).
Take p = p_k for large enough k to obtain λ*(Q \ ⋃_{n=1}^p E_n) < ε, and allow an infinite sequence of E_i to obtain λ*(Q \ ⋃_{n=1}^∞ E_n) = 0. The case of unbounded Q may be done as before: consider Q ∩ S for a disjoint family of open cubes S that cover ℝ^d (except for a set of measure zero).
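The engine behind such covering arguments is a greedy choice of disjoint sets capturing a fixed fraction of measure. A one-dimensional sketch (my own illustration, not the book's (4.2.6); here the classical 3× expansion plays the role of the constant 2^{−1}M^{−1}3^{−d}):

```python
# Greedy selection: scan intervals in decreasing length, keep each one that
# is disjoint from all those already kept.  Every discarded interval meets
# a kept interval at least as long, so tripling the kept intervals covers
# the whole union; hence the kept lengths sum to >= 1/3 of its measure.
def greedy_disjoint(intervals):
    kept = []
    for a, b in sorted(intervals, key=lambda ab: ab[1] - ab[0], reverse=True):
        if all(b <= c or d <= a for c, d in kept):
            kept.append((a, b))
    return kept

cover = [(i, i + 3) for i in range(98)]   # overlapping cover of [0, 100]
kept = greedy_disjoint(cover)
total = sum(b - a for a, b in kept)
assert 3 * total >= 100                   # at least 1/3 of the measure
```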
A 𝒞-partition of an open set U is a countable disjoint collection {C_i} ⊆ 𝒞, contained in U, such that
λ(U \ ⋃_{i=1}^∞ C_i) = 0.
We write D(U) for the set of all 𝒞-partitions of U. The ordering is refinement: if
t = { C_i : i ∈ ℕ },  t′ = { C′_j : j ∈ ℕ },
then we say t′ refines t if each C′_j is contained in some C_i. (And therefore each C_i is, except for a set of measure zero, the union of some collection of the C′_j.) We write 𝓕_t for the σ-algebra on U generated by the partition t. Theorem (7.2.1) implies that the stochastic basis (𝓕_t)_{t∈D(U)} satisfies the covering condition (V):
(7.2.2) Proposition. Let U be a bounded open set. The collection D(U) of 𝒞-partitions of U is directed by refinement, and satisfies condition (V).

Proof. D(U) is directed by refinement: Let t_1, t_2 ∈ D(U). For each C_1 ∈ t_1 and C_2 ∈ t_2, consider the intersection W = C_1 ∩ C_2. The collection of all C ∈ 𝒞 contained in W is a Vitali cover of W, so by (7.2.1), W is (up to null sets) the disjoint union of sets C ∈ 𝒞. (V) is done as in (7.1.3). As usual, (V) implies convergence.
(7.2.3) Theorem. Let f be locally integrable on ℝ^d. Then
lim_{diam E→0, E∈𝒞, x∈E} (1/λ(E)) ∫_E f dλ = f(x)
for almost all x ∈ ℝ^d.
Proof. Consider the restriction of f to a large disk R. If
t = { C_i : i ∈ ℕ }
is a 𝒞-partition of R, define
f_t = Σ_{i∈ℕ} ( (1/λ(C_i)) ∫_{C_i} f dλ ) 1_{C_i}.
It is easy to check that (f_t)_{t∈D(R)} is the closed martingale E_λ[f | 𝓕_t]. We may therefore conclude by (4.2.11) that (f_t) converges essentially to f. Now L_1(λ, R) is separable, so there is a countable collection 𝒞_1 that is L_1-dense in
{ 1_C : C ∈ 𝒞, C ⊆ R }.
For each positive integer n, choose a 𝒞-partition t_n ∈ D(R) such that
(1) t_1 ≤ t_2 ≤ ⋯;
(2) diam E < 1/n for all E ∈ t_n;
(3) λ{ ess sup_{t≥t_n} f_t > f + 1/n } < 2^{−n};
(4) λ{ ess inf_{t≥t_n} f_t < f − 1/n } < 2^{−n}.
Then, for each n, choose a countable family {t_{ni}}_{i=0}^∞ such that
(1) t_{n0} = t_n;
(2) t_{ni} ≥ t_n;
(3) if E ∈ t_n and F ⊆ E, F ∈ 𝒞_1, then F ∈ t_{ni} for some i.
Now if
C_n = { sup_i f_{t_{ni}} > f + 1/n } ∪ { inf_i f_{t_{ni}} < f − 1/n },
then λ(C_n) < 2^{−n+1}, so almost every x ∈ R belongs to only finitely many C_n. Let A_{ni} = ⋃_{E∈t_{ni}} E. Then λ(R \ ⋂_{n,i} A_{ni}) = 0. So almost all x ∈ R belong to ⋂_{n,i} A_{ni}.
Fix a point x with x ∈ ⋂_{n,i} A_{ni} and x ∉ C_n for n ≥ n_0. Fix n ≥ n_0. Then there is E_0 ∈ t_n such that x ∈ E_0; let r_n be half the distance from x to the boundary of E_0. Then we claim: for all E ∈ 𝒞 with x ∈ E and diam E < r_n, we have
| (1/λ(E)) ∫_E f dλ − f(x) | ≤ 1/n.
Indeed, if E ∈ 𝒞_1, then E ∈ t_{ni} for some i, so
(1/λ(E)) ∫_E f dλ = f_{t_{ni}}(x) ≤ f(x) + 1/n,
and similarly
(1/λ(E)) ∫_E f dλ ≥ f(x) − 1/n.
Now by the dominated convergence theorem, the real number
(1/λ(E)) ∫_E f dλ
is a continuous function of E ∈ 𝒞 (as long as E ⊆ E_0), so the inequalities hold for all E ∈ 𝒞. This shows that
sup_{E∈𝒞, x∈E, diam E<r_n} | (1/λ(E)) ∫_E f dλ − f(x) | ≤ 1/n.
Now as n → ∞, we have r_n → 0, so
(7.2.3b)  lim_{diam E→0, E∈𝒞, x∈E} (1/λ(E)) ∫_E f dλ = f(x).
The point x was chosen from a set in R with complement of measure zero, so (7.2.3b) holds almost everywhere in R. Finally, it follows that (7.2.3b) holds almost everywhere in ℝ^d.
Note that the condition λ(∂C) = 0 is required to show that D(U) is a directed set. The derivation theorem (7.2.3) in fact holds more generally: see (7.4.3) and (7.4.11).
Disks and cubes

As a simple example, consider derivation using disks. (In ℝ³ the analogous theory uses balls, and in higher numbers of dimensions, the appropriate analogs of balls.) An (open) disk is a set of the form
B(x, r) = { y ∈ ℝ² : |x − y| < r }.
For a locally integrable f, consider
lim_{r→0} (1/(πr²)) ∫_{B(x,r)} f dλ.
The conclusion will be that this converges to f(x) for almost every x.
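As a sanity check (my own sketch, not from the text), for the linear function f(x, y) = x + y a grid approximation of the disk average equals f at the center for every radius, by the symmetry of the disk:

```python
# Grid approximation of (1/(pi r^2)) * integral over B((cx,cy), r) of f.
# For a linear f the symmetric grid points pair up around the center,
# so the average equals f(center) up to floating-point rounding.
def disk_average(f, cx, cy, r, n=400):
    total = count = 0
    for i in range(n):
        for j in range(n):
            x = cx - r + (2 * r) * (i + 0.5) / n
            y = cy - r + (2 * r) * (j + 0.5) / n
            if (x - cx) ** 2 + (y - cy) ** 2 < r ** 2:
                total += f(x, y)
                count += 1
    return total / count

f = lambda x, y: x + y
avg = disk_average(f, 0.3, 0.4, 0.05)
assert abs(avg - f(0.3, 0.4)) < 1e-6
```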
(7.2.4) Corollary. Let f be a locally integrable function on ℝ². Then
(7.2.4a)  lim_{r→0} (1/(πr²)) ∫_{B(x,r)} f dλ = f(x)
for almost all x ∈ ℝ²; and
(7.2.4b)  lim_{r(E)→0, E disk, x∈E} (1/λ(E)) ∫_E f dλ = f(x)
for almost all x ∈ ℝ².

Another way in which derivation in ℝ^d can be done uses cubes as the basic cells. (For ℝ², we call them squares.) What we want are squares with sides parallel to the coordinate axes. A square is a set in ℝ² of the form
{ (x, y) ∈ ℝ² : a ≤ x < a + r, b ≤ y < b + r },
where (a, b) ∈ ℝ² is the initial vertex and r > 0 is the edge length. If E is a square, we will write e(E) for the edge length of E. So (7.2.3) yields the following.
(7.2.5) Corollary. Let f be a locally integrable function on ℝ². Then
lim_{e(E)→0, E square, x∈E} (1/λ(E)) ∫_E f dλ = f(x)
for almost all x ∈ ℝ².
Intervals

We consider next another natural example of a stochastic basis. It occurs in ℝ^d. For simplicity consider the case ℝ². This is a situation where the sets are not substantial in the sense described above. We define an interval to be a set [a, b) × [c, d). An interval partition of the square V = [0, 1) × [0, 1) is a finite disjoint set t of intervals with union V. The corresponding σ-algebra 𝓕_t has the intervals of t as atoms. The set of all interval partitions of V is ordered by refinement. There is a mode of convergence that corresponds to this stochastic basis.
Let f ∈ L_1(V). We are interested in the convergence of the indefinite integral of f to f(x) in this sense: For every ε > 0, there is δ > 0 such that for all intervals E ⊆ V with x ∈ E and diam E < δ, we have
| (1/λ(E)) ∫_E f dλ − f(x) | < ε.
The diameter of a rectangle is the length of the diagonal. An equivalent way to view this convergence requires both dimensions of the rectangle to approach zero, but postulates no relation between them. Now this convergence holds for functions f ∈ L_∞ (and even for functions f ∈ L log L), but it fails in general for f ∈ L_1. (This is discussed below in Section 7.4.) We exhibit here an L_1 function that is not equal a.e. to the derivative (in the sense of the interval basis) of its indefinite integral.
(7.2.6) Lemma. Let N ≥ 1 be an integer, and let S be an interval. Then there exist intervals E_1, E_2, …, E_N contained in S such that the sets I = ⋂_{k=1}^N E_k and U = ⋃_{k=1}^N E_k satisfy
λ(I) = λ(E_k)/N;  λ(E_k) = λ(S)/N;  λ(U) ≥ (log N) λ(E_k).

[Figure (7.2.6). Illustration for Lemma (7.2.6): the staircase of rectangles E_k, of widths 1/k and heights k/N, inside S.]
Proof. If S = [a, b) × [c, d), let
E_k = [a, a + (b − a)/k) × [c, c + (d − c)k/N),  k = 1, …, N.
Thus λ(E_k) = λ(S)/N, λ(I) = λ(S)/N² = λ(E_k)/N, and
λ(U) = (λ(S)/N) [1 + 1/2 + 1/3 + ⋯ + 1/N] ≥ (log N) λ(E_k).
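A quick check of the areas in the proof (my own computation, taking S to be the unit square): the union is a staircase whose area can be summed piece by piece and compared with the harmonic-sum formula and the log N bound:

```python
import math

# E_k = [0, 1/k) x [0, k/N) inside the unit square S.  The union is a
# staircase: over x in [1/(k+1), 1/k) its height is k/N, and over [0, 1/N)
# it is 1.  Summing those pieces recovers (1/N)(1 + 1/2 + ... + 1/N).
N = 1000
area_piecewise = 1.0 / N + sum((1.0 / k - 1.0 / (k + 1)) * (k / N)
                               for k in range(1, N))
H_N = sum(1.0 / k for k in range(1, N + 1))
assert abs(area_piecewise - H_N / N) < 1e-12   # staircase = harmonic sum
assert area_piecewise >= math.log(N) / N        # lambda(U) >= (log N) lambda(E_k)
```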
Rephrased, this shows:
(7.2.7) Lemma. Let N ≥ 1 be an integer and S an interval. There exist sets I ⊆ U ⊆ S, finite unions of intervals, such that
λ(U) ≥ (log N / N) λ(S),
and, for every x_0 ∈ U, there is an interval E_0 ⊆ S with x_0 ∈ E_0 and
λ(E_0 ∩ I) = (1/N) λ(E_0).
(7.2.8) Lemma. Let N ≥ 1 be an integer and S an interval. There is a set B ⊆ S, a finite union of intervals, such that λ(B) ≤ (2/(N log N)) λ(S) and for almost every x_0 ∈ S, there is an interval E_0 ⊆ S with x_0 ∈ E_0 and λ(E_0 ∩ B) ≥ (1/N) λ(E_0).
Proof. First, apply Lemma (7.2.7) in S. We get sets I_1 and U_1 with λ(U_1) ≥ (log N/N) λ(S). Then subdivide the remainder S \ U_1 into intervals, and apply Lemma (7.2.7) to each of them. Let I_2 and U_2 be the unions of the corresponding results. We have λ(U_2) ≥ (log N/N) λ(S \ U_1). Continue with the remainder. Stop when enough of the set is covered:
λ(U_1 ∪ ⋯ ∪ U_m) ≥ (1 − 1/(N log N)) λ(S).
Let
U_{m+1} = I_{m+1} = S \ (U_1 ∪ ⋯ ∪ U_m).
Let
B = I_1 ∪ ⋯ ∪ I_{m+1}.
Then (almost) every x_0 ∈ S belongs to some U_k, hence to some interval E_0 with λ(E_0 ∩ B) ≥ (1/N) λ(E_0) (equality for k ≤ m). We have
λ(B) = Σ_{k=1}^{m+1} λ(I_k) ≤ (1/(N log N)) Σ_{k=1}^{m} λ(U_k) + λ(U_{m+1})
≤ (1/(N log N)) λ(S) + λ(U_{m+1}) ≤ (2/(N log N)) λ(S).
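The stopping rule can be checked arithmetically (my own sketch, normalizing λ(S) = 1): each pass covers the fraction log N/N of what remains, so the remainder decays geometrically, and stopping once it drops below 1/(N log N) yields the stated bound 2/(N log N):

```python
import math

# Each pass covers a fraction log(N)/N of the remaining measure, so after
# m passes the remainder is (1 - log(N)/N)^m.  Stop once it is at most
# 1/(N log N); then lambda(B) <= 1/(N log N) + remainder <= 2/(N log N).
N = 100
frac = math.log(N) / N
remainder, m = 1.0, 0
while remainder > 1.0 / (N * math.log(N)):
    remainder *= 1.0 - frac
    m += 1
lam_B_bound = 1.0 / (N * math.log(N)) + remainder
assert lam_B_bound <= 2.0 / (N * math.log(N))
```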
A slight variant:

(7.2.9) Lemma. Let N ≥ 1 be an integer and S an interval. Then there is a function w defined on S with only the values 0 and N, such that
∫_S w log w dλ ≤ 2λ(S),
with the convention 0 log 0 = 0; and for almost every x_0 ∈ S, there is an interval E_0 ⊆ S with x_0 ∈ E_0 and
∫_{E_0} w dλ ≥ λ(E_0).
Proof. Let B be as in (7.2.8), and let w = N 1_B.

Finally we come to the counterexample. It is a nonnegative integrable function for which the upper derivate (for the interval basis) is infinite almost everywhere.
(7.2.10) Theorem. There is a nonnegative function f ∈ L_1([0, 1) × [0, 1)) with infinite upper derivate a.e.

Proof. Let N_m be a sequence of integers with N_m ≥ e^{e^m}. Write V = [0, 1) × [0, 1). For each m, subdivide V into intervals with diameter < 1/m, then on each interval apply Lemma (7.2.9) with the integer N_m. Thus we get a function w_m : V → [0, ∞), with values 0 and N_m, such that ∫ w_m log w_m dλ ≤ 2 and for almost all x_0 ∈ V, there is an interval E_0 with diam E_0 < 1/m and ∫_{E_0} w_m dλ ≥ λ(E_0). Since w_m has only the values 0 and N_m, with the usual convention 0 log 0 = 0 we have w_m log w_m = w_m log N_m.

Now let f = Σ_{m=1}^∞ e^{m/2} w_m. Then f ≥ 0. Also,
∫_V f dλ = Σ_m e^{m/2} ∫_V w_m dλ
= Σ_m (e^{m/2}/log N_m) ∫_V (log N_m) w_m dλ
= Σ_m (e^{m/2}/log N_m) ∫_V w_m log w_m dλ
≤ Σ_m e^{m/2} e^{−m} · 2 = 2 Σ_m e^{−m/2} < ∞.
Thus f ∈ L_1. For almost all x_0 and any m, there is an interval E_m containing x_0 with diameter < 1/m such that
∫_{E_m} f dλ ≥ e^{m/2} ∫_{E_m} w_m dλ ≥ e^{m/2} λ(E_m).
Thus the upper derivate of f at x_0 is ≥ e^{m/2}. This is true for all m, so the upper derivate is infinite.

Complements
(7.2.11) (A generalized derivation theorem of Besicovitch.) A measure ν defined on the Borel sets of ℝ^d is locally finite if ν(B(0, R)) < ∞ for all R > 0. Let us say that a collection V of closed balls in ℝ^d is a Vitali cover of a set Q ⊆ ℝ^d if, for every x ∈ Q, the collection V contains balls of arbitrarily small radius centered at x. Suppose ν is a locally finite measure on ℝ^d, Q is a subset of ℝ^d, and V is a Vitali cover of Q by closed balls. Then V has a countable, pairwise disjoint, subcollection {E_1, E_2, …} such that
ν*(Q \ ⋃_n E_n) = 0.
(Besicovitch [1945], [1946], [1947]). This Vitali theorem may be used as in this section for derivation theorems with ν replacing Lebesgue measure λ.

(7.2.12) (Intervals with edges of the same size.) Let k, m be integers, 1 ≤ k ≤ m, and let 𝒞 be the collection of intervals in [0, 1]^m whose edges have no more than k different sizes. If f is in L log^{k−1} L, then for almost every x,
lim_{diam E→0, E∈𝒞, x∈E} (1/λ(E)) ∫_E f dλ = f(x).
Remarks
The counterexample concerning the interval basis is due to Bohr. It can be adapted to show that the space L log L is exactly the correct space for differentiability in the interval basis; see Hayes & Pauc [1970], page 98. Similarly, in ℝ^d, Zygmund [1934] showed that L log^{d−1} L is exactly the correct space. This is also true in other settings: see (9.2.3) and the introduction to Section 9.4. Complement (7.2.12), due to Zygmund [1967], was extended to the setting of substantial sets by Frangos & Sucheston [1985], Theorem 6.2.
7.3. Abstract derivation

We have seen above that, in some special cases, derivation theorems can be viewed as consequences of martingale convergence theorems. But it is not convenient to view them that way in general. It is also possible, in some special cases, to view martingale convergence theorems as consequences of derivation theorems (7.3.23). But again, it is not convenient to view them that way in general. In this section we will discuss derivation in the general abstract setting. In the next section we will specialize somewhat in order to obtain more precise results.
Non-measurable sets

In the most general setting for derivation (defined below), we must deal with non-measurable sets. We will specify our notation and terminology concerning them here. Let (Ω, 𝓕, μ) be a σ-finite measure space. Let Q ⊆ Ω be a set. The outer measure of Q is
μ*(Q) = inf { μ(A) : A ∈ 𝓕, A ⊇ Q },
and the inner measure of Q is
μ_*(Q) = sup { μ(A) : A ∈ 𝓕, A ⊆ Q }.
There exists a measurable set C ⊇ Q with μ(C) = μ*(Q). Such a set C is unique up to null sets, and is called an outer envelope of Q. For example, C could be the essential infimum of all measurable sets A with A ⊇ Q, and therefore is the intersection of a countable family of such sets A. We may sometimes write Q̃ for an outer envelope of Q. A measurable set C ⊆ Q with μ(C) = μ_*(Q) similarly exists, and is called an inner envelope of Q.

Derivation bases
The abstract setting for derivation theory is the "derivation basis." It corresponds (very roughly) to the "stochastic basis" in the convergence theory of Chapter 4. The idea is to assign to every point ω ∈ Ω a net (E_t)_{t∈J} of sets (or even a family of such nets). We are interested in relations such as
lim_t (1/μ(E_t)) ∫_{E_t} f dμ = f(ω).
The directed set J may or may not be the same for all of the nets (E_t) belonging to a given derivation basis. Let (Ω, 𝓕, μ) be a σ-finite measure space. (It is possible to study derivation without the assumption of σ-finiteness, but we do not pursue it here. For example, derivation with respect to Hausdorff [fractal] measures requires such a theory.) A deriving net (ω, J, (E_t)) consists of a point ω ∈ Ω, a directed set J, and a net (E_t)_{t∈J}, where E_t ∈ 𝓕, 0 < μ(E_t) < ∞. Note that ω ∈ E_t is not specified in the definition, although this is the case in all the examples we shall study. The abstract definition allows, for example, a case like this: Ω is the plane ℝ², and for ω ∈ Ω, r > 0, the set E_r is an annulus { y ∈ ℝ² : r < |y − ω| < 2r } centered at ω.
(7.3.1) Definition. A derivation basis on (Ω, 𝓕, μ) is a collection B of deriving nets satisfying:
(1) for every ω ∈ Ω, there is at least one deriving net (ω, J, (E_t)) ∈ B;
(2) if we have (ω, J, (E_t)_{t∈J}) ∈ B and J′ ⊆ J is cofinal in J, then we also have (ω, J′, (E_t)_{t∈J′}) ∈ B.
If (ω, J, (E_t)) ∈ B, we will say that (E_t) converges to ω according to B. When the derivation basis B is understood, we will write E_t ⇒ ω. The sets E_t that occur in a derivation basis B are known as the constituents of B. The collection of all the constituents of B is the constituency of B (or the spread of B).
Normally, throughout this chapter, a measure space (Ω, 𝓕, μ) and a derivation basis B will be fixed.

Vitali covers and derivation
(7.3.2) Definition. Let Q ⊆ Ω be a (possibly non-measurable) set. A collection V of constituents is a B-fine cover of Q if, for every ω ∈ Q, there exists E_t ⇒ ω with E_t ∈ V for all t. Let C ∈ 𝓕 be a measurable set. The collection V of constituents is a B-fine almost-cover of C if there is a set Q such that C is an outer envelope of Q and V is a B-fine cover of Q. A collection U of constituents is a full B-fine cover of Q if, for every ω ∈ Q and every deriving net E_t ⇒ ω, there is an index t_0 such that E_t ∈ U for all t ≥ t_0. Observe that the intersection of two full B-fine covers of a set Q is again a full B-fine cover of Q; and the intersection of a full B-fine cover of Q with a B-fine cover of Q is again a B-fine cover of Q.

Let γ be a real-valued function defined on the constituency of B. The upper derivate of γ (with respect to μ) is
D*γ(ω) = sup_{E_t ⇒ ω} limsup_t γ(E_t)/μ(E_t).
The lower derivate of γ (with respect to μ) is
D_*γ(ω) = inf_{E_t ⇒ ω} liminf_t γ(E_t)/μ(E_t).
If D*γ(ω) = D_*γ(ω), then we say γ is differentiable at ω, and write Dγ(ω) for the common value. If γ is differentiable at almost every ω ∈ Ω, then we say that γ is differentiable. One family of examples of set functions γ is the family of integrals. An integral is a set function γ of the form
γ(E) = ∫_E f dμ,
where f is a measurable function such that the integral exists for all constituents E. Then we say that B differentiates the integral of f if Dγ = f a.e.
Some examples of derivation bases were used in the first two sections of this chapter. In the measure space ℝ, for the interval basis, we postulate that E_n ⇒ ω iff E_n is a sequence of closed intervals, containing the point ω, with positive lengths converging to 0. Then Theorem (7.1.9) can be interpreted to say that the interval basis in ℝ differentiates all L_1 integrals. In ℝ², the centered disk basis is described by saying that E_n ⇒ ω iff E_n is a sequence of disks, centered at ω, with radius converging to 0. The disk basis is similar: E_n ⇒ ω iff E_n is a sequence of disks, containing the point ω, with radius converging to 0. Corollary (7.2.4) states that both of these bases differentiate all L_1 integrals. Description of the bases corresponding to (7.2.3) and (7.2.5) is left to the reader.

For the interval basis on ℝ², we postulate that E_n ⇒ ω iff E_n is a sequence of intervals, containing the point ω, such that diam E_n → 0. Theorem (7.2.10) shows that the interval basis fails to differentiate some L_1 integrals.

Many derivation bases (including all of those considered above) have a useful approximation property. If Ω is a metric space, and μ is a σ-finite Borel measure on Ω, then μ is a Radon measure iff, for every Borel set B, we have
μ(B) = sup { μ(K) : K compact, K ⊆ B },
μ(B) = inf { μ(U) : U open, U ⊇ B }.
Lebesgue measure on ℝ^d is a Radon measure. If Ω is a complete separable metric space, then every finite Borel measure on Ω is a Radon measure (for example, Halmos [1950], (10), p. 40). Now if B is any of the bases considered above, and if C ∈ 𝓕, 0 < λ(C) < ∞, and ε > 0, then there is an open set U ⊇ C with λ(U) < λ(C) + ε. Moreover,
𝒰 = { E constituent : E ⊆ U }
is a full B-fine cover of C.
For abstract derivation bases we will use the following definition. A derivation basis B has small overflow if for every C ∈ 𝓕 with μ(C) < ∞ and every ε > 0, there exist a set C_0 ⊆ C with μ(C \ C_0) = 0 and a full B-fine cover U of C_0 such that for any A_1, A_2, …, A_n ∈ U, we have μ(⋃_{i=1}^n A_i \ C) < ε. We will prove below (Proposition (7.4.2)) that many of the commonly used derivation bases have small overflow.

The strong Vitali property

We have seen that the Vitali covering theorem is an important tool for the classical derivation theorems considered above. Derivation bases for which corresponding properties hold will have useful derivation properties. We begin with a property for derivation bases that roughly corresponds to the covering property (V) for stochastic bases.
(7.3.3) Definition. A derivation basis B satisfies the strong Vitali property if, for every C ∈ 𝓕 with 0 < μ(C) < ∞, every B-fine almost-cover V of C, and every ε > 0, there exist finitely many pairwise disjoint constituents A_1, A_2, …, A_n ∈ V with
μ(C \ ⋃_{i=1}^n A_i) < ε,
μ(⋃_{i=1}^n A_i \ C) < ε.
We will prove that a derivation basis B with the strong Vitali property differentiates all L1 integrals. First, we prove a basic lemma for such integrals.
(7.3.4) Lemma. Let f ∈ L_1, and define γ(A) = ∫_A f dμ for A ∈ 𝓕. For any ε > 0 there exists δ > 0 so that if A_1, A_2, …, A_n are disjoint constituents and C ∈ 𝓕 such that μ(C \ ⋃ A_i) < δ and μ(⋃ A_i \ C) < δ, then
| Σ_{i=1}^n γ(A_i) − γ(C) | < ε.

Proof. Since f ∈ L_1, given ε > 0 there is δ > 0 so that if G ∈ 𝓕 with μ(G) < δ, then ∫_G |f| dμ < ε/2. Now let A_1, A_2, …, A_n be disjoint constituents and C ∈ 𝓕 with μ(C \ ⋃ A_i) < δ and μ(⋃ A_i \ C) < δ. Then
| Σ γ(A_i) − γ(C) | = | ∫_{⋃ A_i} f dμ − ∫_C f dμ | ≤ ∫_{⋃ A_i \ C} |f| dμ + ∫_{C \ ⋃ A_i} |f| dμ < ε.
(7.3.5) Theorem. Let B be a derivation basis satisfying the strong Vitali property. Then B differentiates all L_1 integrals.

Proof. Let γ be an L_1 integral; say γ(A) = ∫_A f dμ, with f ∈ L_1. We claim first that D*γ ≤ f a.e. If not, there exist a < b such that μ*{f < a < b < D*γ} > 0. Hence there is a set Q ⊆ {f < a < b < D*γ} with 0 < μ*(Q) < ∞. Let C be an outer envelope of Q. Now
V = { E constituent : γ(E)/μ(E) > b }
is a B-fine cover of Q, and therefore a B-fine almost-cover of C. Let ε > 0. By the strong Vitali property and Lemma (7.3.4), there exist disjoint constituents A_1, A_2, …, A_n ∈ V with μ(C \ ⋃ A_i) < ε, μ(⋃ A_i \ C) < ε, | Σ γ(A_i) − γ(C) | < ε. Thus
γ(C) + ε > Σ γ(A_i) ≥ b Σ μ(A_i) = b μ(⋃ A_i) ≥ b μ(C) − ε|b|.
This is true for all ε > 0, so γ(C) ≥ b μ(C). But f < a on C, so γ(C) = ∫_C f dμ ≤ a μ(C) < b μ(C), a contradiction. Therefore D*γ ≤ f a.e. Similarly, D_*γ ≥ f a.e. Clearly D_*γ ≤ D*γ, so we have D_*γ = D*γ = f a.e.

The weak Vitali property

Next we will consider a condition analogous to condition (V_1) for stochastic bases.
The weak Vitali property Next we will consider a condition analogous to condition (Vi) for stochastic bases.
(7.3.6) Definition. The derivation basis B has the weak Vitali property if, for every C E F with 0 < µ(C) < oo, every Bfine almostcover V of C, and every c > 0, there exist finitely many constituents A1, A2, with
,
A,,, E V
(a) Fi (C \ U 1 Ai) < e,
(b) En µ(U 1Ai\C)<E,n
1z(Ai)  i (U 1 Ai) < e. (The lefthand side of (a) is called the "deficit"; the lefthand side of (b) is called the "overflow"; the lefthand side of (c) is called the "overlap.") There are some useful alternative formulations of the definition. (c)
1
(7.3.7) Proposition. Let B be a derivation basis. The following are equivalent.
(1) The weak Vitali property: For every C ∈ 𝓕 with 0 < μ(C) < ∞, every B-fine almost-cover V of C, and every ε > 0, there exist finitely many constituents A_1, A_2, …, A_n ∈ V with μ(C \ ⋃ A_i) < ε, μ(⋃ A_i \ C) < ε, Σ μ(A_i) − μ(⋃ A_i) < ε.
(2) For every C ∈ 𝓕 with 0 < μ(C) < ∞, every B-fine almost-cover V of C, and every ε > 0, there exist finitely many constituents A_1, A_2, …, A_n ∈ V with ‖Σ_{i=1}^n 1_{A_i} − 1_C‖_1 < ε.
(3) For every C ∈ 𝓕 with 0 < μ(C) < ∞, every B-fine almost-cover V of C, and every ε > 0, there exist countably many constituents A_1, A_2, … ∈ V with μ(C \ ⋃ A_i) = 0, μ(⋃ A_i \ C) < ε, Σ μ(A_i) − μ(⋃ A_i) < ε.
(4) For every C ∈ 𝓕 with 0 < μ(C) < ∞, every B-fine almost-cover V of C, and every ε > 0, there exist countably many constituents A_1, A_2, … ∈ V with ‖Σ 1_{A_i} − 1_C‖_1 < ε.
Proof. (1) ⇒ (3). Let C, V, ε be given. Write ε′ = ε/3. Apply (1) to obtain finitely many constituents A_{11}, A_{21}, … ∈ V with
μ(C \ ⋃_i A_{i1}) < ε′/2, μ(⋃_i A_{i1} \ C) < ε′/2, Σ_i μ(A_{i1}) − μ(⋃_i A_{i1}) < ε′/2.
Then apply (1) to the set C_1 = C \ ⋃_i A_{i1} to obtain finitely many constituents A_{12}, A_{22}, … ∈ V with
μ(C_1 \ ⋃_i A_{i2}) < ε′/4, μ(⋃_i A_{i2} \ C_1) < ε′/4, Σ_i μ(A_{i2}) − μ(⋃_i A_{i2}) < ε′/4.
Continue in the same way: C_k = C_{k−1} \ ⋃_i A_{ik},
μ(C_{k−1} \ ⋃_i A_{ik}) < ε′/2^k, μ(⋃_i A_{ik} \ C_{k−1}) < ε′/2^k, Σ_i μ(A_{ik}) − μ(⋃_i A_{ik}) < ε′/2^k.
Then the countable set {A_{ik}} satisfies the conditions of (3):
μ(C \ ⋃_{i,k} A_{ik}) = μ(⋂_k C_k) = lim_k μ(C_k) = 0;
μ(⋃_{i,k} A_{ik} \ C) ≤ Σ_k μ(⋃_i A_{ik} \ C_{k−1}) ≤ ε′ < ε;
Σ_{i,k} μ(A_{ik}) = Σ_k Σ_i μ(A_{ik})
≤ Σ_k μ(⋃_i A_{ik}) + ε′
≤ Σ_k [ μ(C_{k−1} \ C_k) + μ(⋃_i A_{ik} \ C_{k−1}) ] + ε′
≤ μ(C) + 2ε′
≤ μ(⋃_{i,k} A_{ik}) + 3ε′
= μ(⋃_{i,k} A_{ik}) + ε.
(3) ⇒ (4). Suppose (3) is satisfied. Then
‖Σ 1_{A_i} − 1_C‖_1 ≤ ‖Σ 1_{A_i} − 1_{⋃ A_i}‖_1 + ‖1_{⋃ A_i} − 1_C‖_1
= Σ μ(A_i) − μ(⋃ A_i) + μ(⋃ A_i \ C) + μ(C \ ⋃ A_i) < ε + ε + 0 = 2ε.
(4) ⇒ (2). Suppose the conclusion is true with a countably infinite collection of sets A_1, A_2, …. By monotone convergence,
lim_{n→∞} ‖Σ_{i=1}^n 1_{A_i} − 1_C‖_1 = ‖Σ_{i=1}^∞ 1_{A_i} − 1_C‖_1 < ε.
Thus, for some n, we have
‖Σ_{i=1}^n 1_{A_i} − 1_C‖_1 < ε.
(2) ⇒ (1). Suppose ‖Σ 1_{A_i} − 1_C‖_1 < ε. Then 1_{⋃ A_i \ C} ≤ |Σ 1_{A_i} − 1_C|, so μ(⋃ A_i \ C) ≤ ‖Σ 1_{A_i} − 1_C‖_1 < ε. Next, 1_{C \ ⋃ A_i} ≤ |Σ 1_{A_i} − 1_C|, so μ(C \ ⋃ A_i) < ε. Finally,
Σ μ(A_i) − μ(⋃ A_i) = ‖Σ 1_{A_i} − 1_{⋃ A_i}‖_1
≤ ‖Σ 1_{A_i} − 1_C‖_1 + ‖1_C − 1_{⋃ A_i}‖_1 < 3ε.
By analogy with the stochastic basis version, we may expect that the weak Vitali property is necessary and sufficient for derivation of L_∞ integrals. (See below, Theorem (7.3.11).) First, the appropriate lemma.

(7.3.8) Lemma. Let f ∈ L_∞, and define γ(A) = ∫_A f dμ for A ∈ 𝓕 with μ(A) < ∞. Let A_1, A_2, …, A_n be constituents and let C ∈ 𝓕. If ‖Σ 1_{A_i} − 1_C‖_1 < ε, then
| Σ_{i=1}^n γ(A_i) − γ(C) | ≤ ε ‖f‖_∞.
Proof.
| Σ γ(A_i) − γ(C) | = | ∫ (Σ 1_{A_i} − 1_C) f dμ | ≤ ∫ |Σ 1_{A_i} − 1_C| |f| dμ ≤ ε ‖f‖_∞.

Derivation of L_∞ integrals is closely related to the Lebesgue density theorem, and its abstract analogs.
(7.3.9) Definition. Let A be a measurable set. A point ω is a point of density of A if for every ε > 0 there is E_t ⇒ ω with limsup_t μ(A ∩ E_t)/μ(E_t) > 1 − ε. (Equivalently, the set function γ(E) = μ(A ∩ E) has D*γ(ω) = 1.) A derivation basis has the density property if almost every point of every measurable set A is a point of density of the set A. Of course, it is enough to check sets of finite measure:

(7.3.10) Proposition. Suppose, for every measurable set C of finite measure, almost every point of C is a point of density of C. Then B has the density property.
Proof. Let A be a measurable set of infinite measure. Then (since μ is σ-finite) there is an increasing sequence C_n of sets of finite measure with ⋃ C_n = A. Now we know that almost every point of C_n is a point of density of C_n. Hence for almost every point ω of A, there is n such that ω is a point of density of C_n. But for any E_t ⇒ ω,
limsup_t μ(E_t ∩ A)/μ(E_t) ≥ limsup_t μ(E_t ∩ C_n)/μ(E_t),
so ω is a point of density of A as well.

Now we prove that the density property is characterized by the weak Vitali property.
(7.3.11) Theorem. Let B be a derivation basis. The following are equivalent:
(a) B has the density property.
(b) If C ∈ 𝓕 then B differentiates the integral of 1_C.
(c) B differentiates all L_∞ integrals.
(d) B has the weak Vitali property.

Proof. (c) ⇒ (b) and (b) ⇒ (a) are easy.
(d) ⇒ (c). Let γ be an L_∞ integral; say γ(A) = ∫_A f dμ, with f ∈ L_∞. We claim first that D*γ ≤ f a.e. If not, there exist a < b such that μ*{f < a < b < D*γ} > 0. Hence there is a set Q ⊆ {f < a < b < D*γ} with 0 < μ*(Q) < ∞. Let C be an outer envelope of Q. Now
V = { E constituent : γ(E)/μ(E) > b }
is a B-fine almost-cover of C. Let ε > 0. By the weak Vitali property and Lemma (7.3.8), there exist constituents A_1, A_2, …, A_n ∈ V with ‖Σ 1_{A_i} − 1_C‖_1 < ε and | Σ γ(A_i) − γ(C) | ≤ ε ‖f‖_∞. Thus
γ(C) + ε ‖f‖_∞ ≥ Σ γ(A_i) ≥ b Σ μ(A_i) ≥ b μ(C) − ε|b|.
This is true for all ε > 0, so γ(C) ≥ b μ(C). But f < a on C, so γ(C) = ∫_C f dμ ≤ a μ(C) < b μ(C), a contradiction. Therefore D*γ ≤ f a.e. Similarly, D_*γ ≥ f a.e.
(a) ⇒ (d). Suppose the density property holds. Let V be a B-fine cover of Q and C an outer envelope of Q; suppose 0 < μ(C) < ∞; let ε > 0. Choose a with 0 < a < 1 such that
0 < (1/a − 1) μ(C) < ε.
Now if Y ⊆ Q and μ*(Y) > 0, let
V(Y, a) = { E ∈ V : μ*(Y ∩ E) > a μ(E) },
r_Y = sup { μ(E) : E ∈ V(Y, a) }.
From the density property applied to an outer envelope Ỹ of Y, we see that some point of Y is a point of density of Ỹ, so there exist ω ∈ Y and E_t ⇒ ω, with E_t ∈ V, so that limsup_t μ*(Y ∩ E_t)/μ(E_t) > a. Thus V(Y, a) ≠ ∅, and therefore r_Y > 0. If Y ⊆ Q and μ*(Y) = 0, write r_Y = 0.
Now fix β with 0 < β < 1. Let X_1 = Q. Then μ*(X_1) > 0, so r_{X_1} > 0. There exists A_1 ∈ V with μ(A_1) > β r_{X_1} and μ*(X_1 ∩ A_1) > a μ(A_1). Let X_2 = X_1 \ A_1. Continue recursively: Suppose A_1, A_2, …, A_n ∈ V have been defined such that
μ(A_i) > β r_{X_i},  μ*(X_i ∩ A_i) > a μ(A_i),
where X_{i+1} = X_i \ ⋃_{j=1}^i A_j. If we have μ*(X_{n+1}) = 0, then the recursive construction stops. Otherwise, let A_{n+1} ∈ V satisfy μ(A_{n+1}) > β r_{X_{n+1}} and μ*(X_{n+1} ∩ A_{n+1}) > a μ(A_{n+1}). So we get a (finite or infinite) sequence of sets A_1, A_2, … ∈ V such that the sets A_i ∩ X_i are disjoint subsets of Q, and
μ(C) ≥ μ(C ∩ ⋃ A_i) ≥ μ*(⋃ (A_i ∩ X_i)) = Σ μ*(X_i ∩ A_i) > a Σ μ(A_i).
Thus Σ μ(A_i) < (1/a) μ(C ∩ ⋃ A_i) < ∞.
Now we claim that μ(C \ ⋃ A_i) = 0. If the sequence A_i is finite, then μ*(X_{N+1}) = 0 for some N, so μ*(Q \ ⋃_{i=1}^N A_i) = 0 as claimed. So suppose the sequence A_i is infinite. Then β Σ r_{X_i} ≤ Σ μ(A_i) < ∞, so r_{X_i} → 0. Let X_∞ = Q \ ⋃ A_i. Then X_∞ ⊆ X_n for all n. Thus V(X_∞, a) ⊆ V(X_n, a), so r_{X_∞} ≤ r_{X_n} for all n, so r_{X_∞} = 0. Therefore μ*(X_∞) = 0, or μ(C \ ⋃ A_i) = 0 as claimed.
Now we have μ(C) = μ(C ∩ ⋃ A_i). Then
‖Σ 1_{A_i} − 1_C‖_1 ≤ Σ μ(A_i) − μ(C) < (1/a − 1) μ(C) < ε.
This shows that the weak Vitali property holds.

Orlicz functions

Next we come to the analogs for differentiation bases of the covering conditions (V_Φ) for stochastic bases. We will retain the same terminology.
(7.3.12) Definition. Let Φ be an Orlicz function. The derivation basis B has property (V_Φ) if, for every C ∈ 𝓕 with 0 < μ(C) < ∞, every B-fine almost-cover V of C, and every ε > 0, there exist finitely many constituents A_1, A_2, …, A_n ∈ V with
‖Σ_{i=1}^n 1_{A_i} − 1_C‖_Φ < ε.
Note in particular that the weak Vitali property is exactly property (V_Φ) in the case L_Φ = L_1. We will need to recall two facts about Orlicz functions:
(2.1.22) If g_n has integer values, and ‖g_n‖_Φ → 0, then ‖g_n‖_1 → 0.
(2.1.20) If Φ is finite, then μ(A_n) → 0 if and only if ‖1_{A_n}‖_Φ → 0.
The typical case where Φ is not finite is L_Φ = L_∞. When Φ is finite, condition (V_Φ) may be reformulated.
(7.3.13) Proposition. Suppose Φ is a finite Orlicz function and B is a derivation basis. The following are equivalent.

(1) (V_Φ): For every C ∈ F with 0 < μ(C) < ∞, every B-fine almost-cover V of C, and every ε > 0, there exist finitely many constituents A_1, A_2, ..., A_n ∈ V with ‖Σ 1_{A_i} − 1_C‖_Φ < ε.
(2) For every C ∈ F with 0 < μ(C) < ∞, every B-fine almost-cover V of C, and every ε > 0, there exist finitely many constituents A_1, A_2, ..., A_n ∈ V with μ(C \ ⋃ A_i) < ε, μ(⋃ A_i \ C) < ε, and ‖Σ 1_{A_i} − 1_{⋃ A_i}‖_Φ < ε.
Proof. (1) ⇒ (2). Let ε > 0. Use (2.1.20) to choose ε' > 0 so that ε' < ε and μ(D) < 2ε' implies ‖1_D‖_Φ < ε/2. Then by (2.1.22) choose δ < ε/2 so that if g has integer values and ‖g‖_Φ < δ, then ‖g‖_1 < ε'. Now suppose C and V are given. Then by (1) there exist A_1, ..., A_n ∈ V with ‖Σ 1_{A_i} − 1_C‖_Φ < δ. Now 1_{⋃ A_i \ C} ≤ |Σ 1_{A_i} − 1_C|, so ‖1_{⋃ A_i \ C}‖_Φ < δ, and thus μ(⋃ A_i \ C) < ε' < ε. Similarly, 1_{C \ ⋃ A_i} ≤ |Σ 1_{A_i} − 1_C|, so μ(C \ ⋃ A_i) < ε' < ε. Finally, μ(C △ ⋃ A_i) < 2ε', so ‖1_{C △ ⋃ A_i}‖_Φ < ε/2, and thus

‖Σ 1_{A_i} − 1_{⋃ A_i}‖_Φ ≤ ‖Σ 1_{A_i} − 1_C‖_Φ + ‖1_C − 1_{⋃ A_i}‖_Φ < δ + ε/2 < ε.

(2) ⇒ (1). Let C, V, ε be given. By (2.1.20) choose δ > 0 so that δ < ε/3 and μ(D) < δ implies ‖1_D‖_Φ < ε/3. There exist A_1, A_2, ..., A_n ∈ V with μ(C \ ⋃ A_i) < δ, μ(⋃ A_i \ C) < δ, and ‖Σ 1_{A_i} − 1_{⋃ A_i}‖_Φ < δ. Therefore ‖1_{C \ ⋃ A_i}‖_Φ < ε/3 and ‖1_{⋃ A_i \ C}‖_Φ < ε/3, so ‖Σ 1_{A_i} − 1_C‖_Φ < ε.
(7.3.14) Proposition. If property (V_Φ) holds, then the weak Vitali property holds.

Proof. Apply (2.1.22) to (7.3.7(2)).

Thus we see that property (V_Φ) implies that all L_∞ integrals are differentiable. But of course many more integrals may also be differentiable.

(7.3.15) Lemma. Let Φ and Ψ be conjugate Orlicz functions, Φ finite. Let f ∈ L_Ψ, and define γ(A) = ∫_A f dμ for A ∈ F, μ(A) < ∞. Let A_1, A_2, ..., A_n be constituents and let C ∈ F. If ‖Σ 1_{A_i} − 1_C‖_Φ ≤ ε, then

|Σ_{i=1}^n γ(A_i) − γ(C)| ≤ 2ε ‖f‖_Ψ.
Proof.

|Σ γ(A_i) − γ(C)| = |∫ (Σ 1_{A_i} − 1_C) f dμ| ≤ ∫ |Σ 1_{A_i} − 1_C| |f| dμ ≤ 2 ‖Σ 1_{A_i} − 1_C‖_Φ ‖f‖_Ψ ≤ 2ε ‖f‖_Ψ.
The proof of the following result is omitted, since it is almost identical to those of Theorems (7.3.5) and (7.3.11) (d) ⇒ (c); this time use Lemma (7.3.15).
(7.3.16) Theorem. Let Φ and Ψ be conjugate Orlicz functions, Φ finite. Let B be a derivation basis satisfying property (V_Φ). Then B differentiates all L_Ψ integrals.

The converse, however, is more involved. As in (4.3.11), we prove it under the assumption of (Δ_2) at ∞.
(7.3.17) Theorem. Let Φ and Ψ be conjugate Orlicz functions. Assume Φ is finite and satisfies condition (Δ_2) at ∞. Let B be a derivation basis. Suppose B differentiates all L_Ψ integrals. Then B has property (V_Φ).

Proof. (I) We begin with some simplifications. First, we may assume that Φ(1) > 0: indeed, let a > 0 be such that Φ(a) > 0. The Orlicz function Φ_0 defined by Φ_0(u) = Φ(au) satisfies ‖f‖_{Φ_0} = a ‖f‖_Φ and Φ_0(1) > 0. The conjugate is Ψ_0(v) = Ψ(v/a), so L_{Ψ_0} = L_Ψ.

Next, by (Δ_2), there exist M and u_0 such that Φ(2u) ≤ M Φ(u) for u ≥ u_0. Since Φ(1) > 0, we may assume (by enlarging the constant M) that Φ(2u) ≤ M Φ(u) for all u ≥ 1. Also Φ(0) = 0, so this means Φ(2n) ≤ M Φ(n) for all nonnegative integers n. If A ∈ F and μ(A) < ∞, then 1_A ∈ L_Ψ, so B differentiates the integral of 1_A. Therefore B also differentiates the integral of the complement 1_{Ω\A}.

(II) Let A = {A_1, A_2, ...} be a finite or countably infinite collection of constituents. Write U_A = ⋃_i A_i; n_A = Σ_i 1_{A_i}; e_A = n_A − 1_{U_A}. Note that if A = ∅, then n_A = e_A = 0. We claim that

(7.3.17a)   ∫ Φ(n_A) dμ ≤ M ∫ Φ(e_A) dμ + Φ(1) μ(U_A).

Indeed, on the set {n_A ≤ 1} we have Φ(n_A) = Φ(1) 1_{U_A}; and on the set {n_A ≥ 2} we have Φ(n_A) = Φ(e_A + 1) ≤ Φ(2 e_A) ≤ M Φ(e_A).
(III) Next claim: Suppose α, β > 0, C ∈ F, 0 < μ(C) < ∞, V is a B-fine almost-cover of C, h ∈ L_Ψ, h ≥ 0, μ{h > 0} < ∞, and μ(C \ {h > 0}) > 0. Then there is B ∈ V with μ(B) > 0 and

(7.3.17b)   ∫_B h dμ + α μ(B \ C) < β μ(B).

To see this, observe that B differentiates the integrals of h and 1_{Ω\C}, and therefore B differentiates the integral of h + α 1_{Ω\C}. This derivative is 0 a.e. on C \ {h > 0}, which has positive measure, so there is B ∈ V with μ(B) > 0 satisfying (7.3.17b).
(IV) Next claim: Let η > 0, C ∈ F with μ(C) < ∞, V a B-fine almost-cover of C, and A a finite or countably infinite set of constituents. Let β > 0 satisfy

(1 − η)(1 − β/φ(1))^{−1} ≤ 1   and   β (1 − β/φ(1))^{−1} ≤ η.

Suppose A satisfies

(i) ∫ Φ(e_A) dμ ≤ η μ(C ∩ U_A),
(ii) (1 − η) ∫ n_A dμ ≤ μ(C ∩ U_A),
(iii) μ(C \ U_A) > 0,
(iv) φ(n_A) ∈ L_Ψ.

Then there exists B ∈ V with μ(B) > 0 and

(v) ∫_B φ(n_A) dμ + φ(1) μ(B \ C) < β μ(B).

Furthermore, if B ∈ V satisfies (v), then the collection ℬ = A ∪ {B} satisfies

(vi) ∫ Φ(e_ℬ) dμ ≤ η μ(C ∩ U_ℬ),
(vii) (1 − η) ∫ n_ℬ dμ ≤ μ(C ∩ U_ℬ).

To prove this, we apply claim (III) with α = φ(1) and h = φ(n_A). Of course φ(1) < ∞ since Φ is finite, and φ(1) > 0 since Φ(1) > 0. The condition μ(C \ {h > 0}) > 0 follows from (iii); and μ{h > 0} < ∞ follows from (ii). Thus from (7.3.17b) we get (v). Now suppose B satisfies (v). Then
μ(B \ (C \ U_A)) ≤ μ(B \ C) + μ(B ∩ U_A) ≤ μ(B \ C) + (1/φ(1)) ∫_B φ(n_A) dμ < (β/φ(1)) μ(B).

Thus

μ(B ∩ (C \ U_A)) = μ(B) − μ(B \ (C \ U_A)) ≥ (1 − β/φ(1)) μ(B).

Therefore

∫_B φ(n_A) dμ < β μ(B) ≤ β (1 − β/φ(1))^{−1} μ(B ∩ (C \ U_A)) ≤ η μ(B ∩ (C \ U_A)).

But now
∫ Φ(e_ℬ) dμ = ∫_{Ω \ (B ∩ U_A)} Φ(e_A) dμ + ∫_{B ∩ U_A} Φ(n_A) dμ
   = ∫ Φ(e_A) dμ + ∫_{B ∩ U_A} (Φ(n_A) − Φ(n_A − 1)) dμ
   ≤ ∫ Φ(e_A) dμ + ∫_{B ∩ U_A} φ(n_A) dμ.
Here we used the inequality Φ(n) − Φ(n − 1) ≤ φ(n) (2.1.19c). Now

∫ Φ(e_ℬ) dμ ≤ ∫ Φ(e_A) dμ + ∫_B φ(n_A) dμ ≤ η μ(C ∩ U_A) + η μ(B ∩ (C \ U_A)) = η μ(C ∩ U_ℬ).

This proves (vi). For (vii), compute

(1 − η) ∫ n_ℬ dμ = (1 − η) ∫ n_A dμ + (1 − η) μ(B)
   ≤ μ(C ∩ U_A) + (1 − η)(1 − β/φ(1))^{−1} μ(B ∩ (C \ U_A))
   ≤ μ(C ∩ U_A) + μ(B ∩ (C \ U_A)) = μ(C ∩ U_ℬ).

This proves (vii).

(V) Now we are ready to establish (V_Φ). Let C ∈ F with 0 < μ(C) < ∞,
let V be a B-fine almost-cover of C, and let ε > 0. We may assume that ε < 1/4. Choose ε_1 > 0 so that ∫ Φ(f) dμ < ε_1 implies ‖f‖_Φ < ε [by Corollary (2.1.18(c))]. Then choose η > 0 so that η μ(C) < ε_1 and

(η/(1 − η)) μ(C) < ε.
We will apply the assertion of (IV) recursively. Begin with A_0 = ∅. Then A_0 satisfies (i)-(iv). (Even if φ(0) > 0, still Ψ(φ(0)) = 0.) Thus D_1 ≠ ∅, where

D_1 = { B ∈ V : ∫_B φ(0) dμ + φ(1) μ(B \ C) < β μ(B) }.

Choose B_1 ∈ D_1 with μ(B_1) > (1/2) sup { μ(B) : B ∈ D_1 }. Then write A_1 = {B_1}, so that A_1 satisfies (i), (ii), and (iv). If A_1 does not satisfy (iii), the recursive construction stops. Otherwise continue. Suppose A_k = {B_1, B_2, ..., B_k} is defined, satisfying (i)-(iv). Then D_{k+1} ≠ ∅, where

D_{k+1} = { B ∈ V : ∫_B φ(n_{A_k}) dμ + φ(1) μ(B \ C) < β μ(B) }.

Choose B_{k+1} ∈ D_{k+1} with μ(B_{k+1}) > (1/2) sup { μ(B) : B ∈ D_{k+1} }. Then let A_{k+1} = A_k ∪ {B_{k+1}}. Thus A_{k+1} satisfies (i), (ii), and (iv). If it does not satisfy (iii), stop. Otherwise continue. We therefore get A = ⋃_k A_k = {B_1, B_2, ...}, a finite or countably infinite collection of constituents. We claim that A has the properties required
by (V_Φ), namely: μ(C \ U_A) = 0, μ(U_A \ C) < ε, ‖e_A‖_Φ < ε. Each A_k satisfies (i), so (by monotone convergence) so does A:

∫ Φ(e_A) dμ ≤ η μ(C ∩ U_A) ≤ η μ(C) < ε_1.

By the choice of ε_1, we have ‖e_A‖_Φ < ε. Each A_k satisfies (ii), so A does also. Therefore

(1 − η) μ(U_A) ≤ (1 − η) ∫ n_A dμ ≤ μ(C ∩ U_A) < ∞.

Also,

μ(U_A \ C) ≤ μ(U_A) − μ(C ∩ U_A) ≤ μ(U_A) − (1 − η) μ(U_A) = η μ(U_A) ≤ (η/(1 − η)) μ(C) < ε.

Note that ‖2 e_A‖_Φ ≤ 2ε < 1/2, so by (3.1.20) we have φ(2 e_A) ∈ L_Ψ. If D = {n_A = 1}, then μ(D) ≤ μ(U_A) < ∞, so φ(1_D) ∈ L_Ψ. Now n_A ≤ 1_D + 2 e_A with disjoint supports, so φ(n_A) ≤ φ(1_D) + φ(2 e_A). Thus φ(n_A) ∈ L_Ψ. That is, A satisfies (iv). Now we claim μ(C \ U_A) = 0. If not, then A satisfies (i)-(iv). There is B as in (IV), so B ∈ D_{k+1} for all k and μ(B) > 0. Thus μ(B) < 2 μ(B_{k+1}) for all k. But Σ μ(B_k) < ∞, a contradiction, so μ(C \ U_A) = 0. Therefore (V_Φ) is verified.
Property (FV_Φ)

Next we come to a Vitali type of property, with the advantage that condition (Δ_2) is not required for the converse. Let Φ be an Orlicz function. A derivation basis has property (FV_Φ) if, for every ε > 0, every C ∈ F with 0 < μ(C) < ∞, and every B-fine almost-cover V of C, there exist constituents A_1, A_2, ..., A_n ∈ V and nonnegative scalars a_1, a_2, ..., a_n such that

‖Σ_{i=1}^n a_i 1_{A_i} − 1_C‖_Φ < ε,   |Σ a_i μ(A_i) − μ(C)| < ε.

It is easy to see that (V_Φ) implies (FV_Φ) with a_i = 1.
(7.3.18) Proposition. Suppose L_Φ ⊆ L_1. Then B has property (FV_Φ) if and only if for every ε > 0, every C ∈ F with 0 < μ(C) < ∞, and every B-fine almost-cover V of C, there exist A_1, A_2, ..., A_n ∈ V and a_1, a_2, ..., a_n ≥ 0 such that ‖Σ a_i 1_{A_i} − 1_C‖_Φ < ε.

Proof. If B has property (FV_Φ), then clearly the other condition holds. For the converse, suppose the condition stated is satisfied. For given ε > 0, there is δ > 0 so that ‖f‖_Φ < δ implies ‖f‖_1 < ε. We may also assume δ < ε. Then given C and V, there exist A_1, A_2, ..., A_n ∈ V and a_1, a_2, ..., a_n ≥ 0 such that ‖Σ a_i 1_{A_i} − 1_C‖_Φ < δ < ε. Then we know that ‖Σ a_i 1_{A_i} − 1_C‖_1 < ε, so

|Σ a_i μ(A_i) − μ(C)| = |∫ (Σ a_i 1_{A_i} − 1_C) dμ| ≤ ‖Σ a_i 1_{A_i} − 1_C‖_1 < ε.
The usual lemma will be useful in the proof of convergence. The proof (similar to that of Lemma (7.3.15)) is left to the reader.

(7.3.19) Lemma. Let Φ and Ψ be conjugate Orlicz functions, Φ finite. Suppose B has property (FV_Φ). Let f ∈ L_Ψ, and define γ(A) = ∫_A f dμ. If A_1, A_2, ..., A_n are constituents and a_1, a_2, ..., a_n ≥ 0 with ‖Σ a_i 1_{A_i} − 1_C‖_Φ ≤ ε, then

|Σ a_i γ(A_i) − γ(C)| ≤ 2ε ‖f‖_Ψ.
(7.3.20) Theorem. Let Φ and Ψ be conjugate Orlicz functions, Φ finite. Suppose B has property (FV_Φ). Then B differentiates all L_Ψ integrals.

Proof. Let f ∈ L_Ψ and γ(A) = ∫_A f dμ. We will show that D*γ ≤ f. Suppose not: then there exist a < b with μ*{f ≤ a < b ≤ D*γ} > 0. Thus there is Q ⊆ {f ≤ a < b ≤ D*γ} with 0 < μ*(Q) < ∞; let C be an outer envelope of Q. Then

V = { E constituent : γ(E) > b μ(E) }

is a B-fine almost-cover of C. Let ε > 0 be given. By (FV_Φ) and Lemma (7.3.19), there exist A_1, A_2, ..., A_n ∈ V and a_1, a_2, ..., a_n ≥ 0 with

|Σ a_i μ(A_i) − μ(C)| < ε,   |Σ a_i γ(A_i) − γ(C)| < ε.

Then

γ(C) + ε > Σ a_i γ(A_i) ≥ b Σ a_i μ(A_i) > b μ(C) − |b| ε.

This is true for all ε > 0, so γ(C) ≥ b μ(C). But f ≤ a on C, so

γ(C) = ∫_C f dμ ≤ a μ(C) < b μ(C),

a contradiction. Therefore D*γ ≤ f. Similarly, D_*γ ≥ f.
For the converse, we first take the case L_Φ ⊆ L_1.

(7.3.21) Theorem. Let Φ and Ψ be conjugate Orlicz functions with Φ finite and L_Φ ⊆ L_1. Suppose B differentiates all L_Ψ integrals. Then B has property (FV_Φ).
Proof. Suppose (FV_Φ) fails. There are ε > 0, Q ⊆ Ω, and a B-fine cover V of Q such that, if C is an outer envelope of Q, and a_i ≥ 0 and A_i ∈ V satisfy |Σ a_i μ(A_i) − μ(C)| < ε, then

(7.3.21a)   ‖Σ a_i 1_{A_i} − 1_C‖_Φ ≥ ε.

We assume also that ε < μ(C). By (2.1.23), there is ε' > 0 so that |f| ≤ 1 and ‖f‖_1 < ε' imply ‖f‖_Φ < ε/2. Assume also that ε' < ε. Since L_Φ ⊆ L_1, there is ε'' > 0 so that ‖f‖_Φ < ε'' implies ‖f‖_1 < ε'/2. Assume also that ε'' < ε/2. We begin with an application of the Hahn-Banach theorem (5.1.2). Consider three subsets of L_Φ:

C_1 = { Σ a_i 1_{A_i} : a_i ≥ 0, A_i ∈ V, |Σ a_i μ(A_i) − μ(C)| < ε'/2 },
C_2 = { ξ ∈ L_Φ : 0 ≤ ξ ≤ 1_C },
C_3 = { ξ ∈ L_Φ : ‖ξ‖_Φ < ε'' }.

All three of the sets are convex. C_3 is open, so C_2 + C_3 is convex and open.
We claim that C_1 ∩ (C_2 + C_3) = ∅. Suppose ξ = Σ a_i 1_{A_i} ∈ C_1 and ξ = ξ_2 + ξ_3 with ξ_2 ∈ C_2, ξ_3 ∈ C_3. Then ‖ξ_3‖_1 < ε'/2. Let B = {ξ > 1}. Now

−ε'/2 < Σ a_i μ(A_i) − μ(C) = ∫ (ξ − 1_C) dμ
   = ∫_{B∩C} (ξ − 1) dμ + ∫_{Ω\C} ξ dμ − ∫_{C\B} (1 − ξ) dμ
   ≤ ∫ |ξ_3| dμ − ∫_{C\B} (1 − ξ) dμ
   < ε'/2 − ∫_{C\B} (1 − ξ) dμ.

Thus ∫_{C\B} (1 − ξ) dμ < ε' and ‖(1 − ξ) 1_{C\B}‖_Φ < ε/2. But then |ξ − 1_C| = (ξ − 1) 1_{B∩C} + (1 − ξ) 1_{C\B} + |ξ| 1_{Ω\C} ≤ |ξ_3| + (1 − ξ) 1_{C\B}, so

‖ξ − 1_C‖_Φ ≤ ‖ξ_3‖_Φ + ‖(1 − ξ) 1_{C\B}‖_Φ < ε/2 + ε/2 = ε.

This contradicts (7.3.21a). So in fact C_1 ∩ (C_2 + C_3) = ∅. Then by the Hahn-Banach theorem, there is a functional x* ∈ (L_Φ)* with x*(ξ) ≥ 1 for all ξ ∈ C_1
and x*(ξ) ≤ 1 for all ξ ∈ C_2 + C_3. By (2.2.24), since x*(ξ) ≤ 1 for all ξ ∈ C_2 and all ξ ∈ C_3, the functional has the form x*(ξ) = ∫ ξ f dμ for some f ∈ L_Ψ. Now we claim that B does not differentiate the integral γ of f. If E ∈ V, then

((μ(C) − ε'/4)/μ(E)) 1_E ∈ C_1,

so x* applied to it is ≥ 1, which means, for γ(E) = ∫_E f dμ,

γ(E)/μ(E) ≥ 1/(μ(C) − ε'/4).

Since V is a B-fine cover of Q, we conclude that D*γ(ω) ≥ 1/(μ(C) − ε'/4) for all ω ∈ Q. Now if D*γ = f a.e., we would have f(ω) ≥ 1/(μ(C) − ε'/4) a.e. on C. But 1_C ∈ C_2, so we would have

1 ≥ γ(C) = ∫_C f dμ ≥ μ(C)/(μ(C) − ε'/4) > 1,

a contradiction. Thus D*γ = f fails on a set of positive measure. Therefore B does not differentiate the integral of the L_Ψ function f.
(7.3.22) Next we consider the case L_Φ ⊄ L_1. Let Φ and Ψ be conjugate Orlicz functions. Suppose B differentiates all L_Ψ integrals. If A ∈ F, μ(A) < ∞, then 1_A ∈ L_Ψ. So B has the density property, and therefore B differentiates all L_∞ integrals. Thus B differentiates integrals of functions in L_Ψ + L_∞. We have seen in Proposition (2.2.13) that L_Ψ + L_∞ is itself an Orlicz space L_{Ψ_s}, where Ψ_s is the shifted Orlicz function given by

(7.3.22a)   Ψ_s(v) = 0 for v ≤ 1;   Ψ_s(v) = Ψ(v − 1) for v > 1.

A calculation shows that the conjugate Orlicz function is then

(7.3.22b)   Φ_s(u) = Φ(u) + u,

and L_{Φ_s} = L_Φ ∩ L_1. Since L_{Φ_s} ⊆ L_1, the case of Φ_s and Ψ_s is covered by the previous material. (Φ is finite if and only if the derivative φ is finite, if and only if ψ is unbounded, if and only if lim_{v→∞} Ψ(v)/v = ∞.) Thus we have proved:
Theorem. Let Ψ be a finite Orlicz function with lim_{v→∞} Ψ(v)/v = ∞. Let Ψ_s and Φ_s be defined by (7.3.22a) and (7.3.22b). Then the following are equivalent:

(1) B differentiates all L_Ψ integrals;
(2) B differentiates all L_{Ψ_s} integrals;
(3) B has property (FV_{Φ_s}).
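The computation behind (7.3.22b) can be written out from the conjugacy formula alone; a sketch (substituting w = v − 1 in the supremum, and using Φ(u) ≥ 0):

```latex
\begin{aligned}
\Phi_s(u) &= \sup_{v \ge 0}\bigl(uv - \Psi_s(v)\bigr)
           = \max\Bigl\{\sup_{0 \le v \le 1} uv,\ \sup_{v \ge 1}\bigl(uv - \Psi(v-1)\bigr)\Bigr\} \\
          &= \max\Bigl\{u,\ u + \sup_{w \ge 0}\bigl(uw - \Psi(w)\bigr)\Bigr\}
           = u + \Phi(u),
\end{aligned}
```

since the inner supremum is Φ(u) ≥ 0, so the second term of the maximum dominates.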
The important case not yet completed is L_Ψ = L_1, L_Φ = L_∞. We will discuss this in the next section.
Complements
(7.3.23) (Stochastic basis.) Let (Ω, F, P) be a probability space, let J be a countable directed set, and let (F_t)_{t∈J} be a stochastic basis consisting of finite σ-algebras. For each t ∈ J and ω ∈ Ω, let E_t(ω) be the atom of F_t containing ω. We may define a derivation basis B on Ω by postulating (E_t(ω))_{t∈J'} ⇒ ω for J' ⊆ J cofinal. Now if X ∈ L_1, then there correspond a martingale

X_t = E^{F_t}[X]

and an integral

γ(A) = ∫_A X dP.

Now of course

γ(E_t(ω))/P(E_t(ω)) = X_t(ω),

so Dγ = X a.e. if and only if X_t → X a.s. The analogies between the stochastic basis covering properties of Chapter 4 and the derivation basis Vitali properties of this chapter can be exhibited in this setting. For example, (F_t) satisfies the covering condition (V) if and only if B has the strong Vitali property.

(7.3.24) (A "demiconvergence" result.) Suppose a derivation basis B has the weak Vitali property. Then B "lower differentiates" all nonnegative L_1 integrals. That is: if f ≥ 0, f ∈ L_1, and γ(A) = ∫_A f dμ for all A ∈ F, then D_*γ ≥ f a.e. (If f ≥ 0, we may dispense with Lemma (7.3.8) in the proof of the first part of (d) ⇒ (c) in Theorem (7.3.11). See Hayes & Pauc [1970], Proposition 2.1, page 19.)

(7.3.25) (Overflow.) Talagrand's definition for property (V_Φ) (Talagrand [1986]) is as in Proposition (7.3.13) part (2), except that the overflow condition μ(⋃ A_i \ C) < ε is omitted. Of course, if B has small overflow, then the condition is not needed. But for completely general B it is needed. For example, suppose we have the trivial derivation basis defined on [0, 1] so that E_n ⇒ ω iff E_n = [0, 1] for all n. It is easy to arrange zero deficit and overlap, but the integral of no nonconstant f ∈ L_Φ has derivative f.

(7.3.26) ((FV_Φ).) Talagrand's definition for (FV_Φ) (Talagrand [1986]) is: for every ε > 0, every C ∈ F with 0 < μ(C) < ∞, and every B-fine almost-cover V of C, there exist constituents A_1, A_2, ..., A_n ∈ V and nonnegative scalars a_1, a_2, ..., a_n such that

‖Σ a_i 1_{A_i} − 1 ∧ Σ a_i 1_{A_i}‖_Φ < ε,   |Σ a_i μ(A_i) − μ(C)| < ε.
For general derivation bases, this is in fact not sufficient for differentiation of L_Ψ integrals. For example: let (Ω, F, μ) be [0, 1] with Lebesgue measure. Enumerate the rationals in (0, 1) as r_n. Let E_n = [0, r_n], n ∈ ℕ, and postulate E_n ⇒ ω for all ω ∈ [0, 1]. Then certainly B differentiates the integral of no nonconstant f ∈ L_Ψ. But if V is a B-fine almost-cover of any set C, and ε > 0, then there exist arbitrarily large n with |r_n − μ(C)| < ε, so E_n ∈ V. Now let A_1 = E_n and a_1 = 1, so ‖a_1 1_{A_1} − 1 ∧ a_1 1_{A_1}‖_Φ = 0 and |a_1 μ(A_1) − μ(C)| < ε.

(7.3.27) (Both inequalities are needed in (FV_Φ).) Let ξ = Σ a_i 1_{A_i}. Then ‖ξ − 1_C‖_2 small does not imply |Σ a_i μ(A_i) − μ(C)| is small. Let (Ω, F, μ) be ℝ with Lebesgue measure. Take C = [0, 1], A_i = [0, 1] ∪ [i, i + 1] and a_i = 1/n for i = 1, 2, ..., n. Then ‖Σ a_i 1_{A_i} − 1_C‖_2 = 1/√n but |Σ a_i μ(A_i) − μ(C)| = 1.
Remarks

A reference on derivation theory is Hayes & Pauc [1970]. Much of the material in this section follows their treatment. Theorem (7.3.17) on the necessity of (V_Φ) is due to C. A. Hayes [1976]. We have used his proof. Property (FV_Φ) is due to Talagrand [1986] (note (7.3.26)).
7.4. D-bases

The most often used derivation bases are of special kinds. In this section we will consider the D-bases and, a still more special case, the Busemann-Feller bases.

D-bases

Let (Ω, F, μ) be a σ-finite measure space. A D-basis on (Ω, F, μ) is a pair (ℰ, δ), where ℰ is a family of measurable sets E with 0 < μ(E) < ∞, and δ is a function δ: ℰ → (0, ∞), such that, for every ω ∈ Ω and every ε > 0, there exists E ∈ ℰ with ω ∈ E and δ(E) < ε. Given a D-basis (ℰ, δ), we may define a derivation basis by specifying E_n ⇒ ω iff ω ∈ E_n ∈ ℰ and δ(E_n) → 0. A derivation basis B that can be specified in this way will also be called a D-basis. We may even write B = (ℰ, δ). For example, Ω may be a metric space, with metric ρ, and δ(E) may be the diameter of E:

δ(E) = diam E = sup { ρ(x, y) : x, y ∈ E }.

Some of the derivation bases in Sections 7.1 and 7.2 are D-bases with δ as diameter: the interval basis in ℝ^d; the disk basis in ℝ^2 (but not the centered disk basis). We have seen a correspondence between stochastic bases and derivation bases. The condition of a countable cofinal subset for stochastic bases corresponds roughly to being a D-basis for derivation bases.
The measurability of the upper and lower derivates holds for certain D-bases.

(7.4.1) Proposition. Let Ω be a metric space, let μ be a Radon measure on Ω, and let (ℰ, δ) be a D-basis, where ℰ is a collection of open sets and δ is diameter. If γ: ℰ → ℝ is any set function, then D*γ and D_*γ are measurable functions.

Proof. Fix a real number t. For positive integers m and k, let

P_{mk} = ⋃ { E ∈ ℰ : δ(E) < 1/k, γ(E)/μ(E) > t + 1/m }.

Then

{D*γ > t} = ⋃_{m=1}^∞ ⋂_{k=1}^∞ P_{mk}

is a measurable set (a G_δσ set). This is true for all t, so D*γ is a measurable function (a function of the second Baire class). Similarly D_*γ is measurable.

The next result shows that the most common derivation bases have small overflow.
(7.4.2) Proposition. Let Ω be a metric space, let μ be a Radon measure, and let (ℰ, δ) be a D-basis, where δ is diameter. Then (ℰ, δ) has small overflow.

Proof. Let C ∈ F, 0 < μ(C) < ∞. Let ε > 0 be given. There is an open set U ⊇ C with μ(U) < μ(C) + ε. The family

𝒰 = { E ∈ ℰ : E ⊆ U }

is a full B-fine cover of C, and if A_1, A_2, ..., A_n ∈ 𝒰, then μ(⋃ A_i \ C) ≤ μ(U \ C) < ε.
Properties (A) and (C)

The derivation basis B has property (A) if for every C ∈ F with 0 < μ(C) < ∞ there is a constant M such that for every B-fine almost-cover V of C and every ε > 0, there exist A_1, A_2, ..., A_n ∈ V with μ(⋃ A_i \ C) < ε and

‖Σ 1_{A_i}‖_∞ ≤ M Σ μ(A_i).
The differentiation theorem follows the usual outline.

(7.4.3) Theorem. Suppose B has property (A). Then B differentiates all L_1 integrals.

Proof. Let f ∈ L_1, and define γ(A) = ∫_A f dμ. We claim f ≥ D*γ. If not, then μ*{f < D*γ} > 0, so there exist a < b with μ*{f ≤ a < b ≤ D*γ} > 0. Let Q ⊆ {f ≤ a < b ≤ D*γ} with 0 < μ*(Q) < ∞, let C be an outer envelope of Q, and let M be the constant of property (A) for C. Choose ε > 0 such that μ(G) < ε implies ∫_G |f − a| dμ < (b − a)/(2M). Now

V = { E constituent : γ(E) > b μ(E) }

is a B-fine almost-cover of C. There exist constituents A_1, A_2, ..., A_n ∈ V with μ(⋃ A_i \ C) < ε and ‖Σ 1_{A_i}‖_∞ ≤ M Σ μ(A_i). Now for each i,

∫_{A_i \ C} (f − a) dμ = ∫_{A_i} (f − a) dμ − ∫_{A_i ∩ C} (f − a) dμ ≥ (b − a) μ(A_i) − 0.

Now if G = ⋃ A_i \ C, then μ(G) < ε, and thus

(b − a) Σ μ(A_i) ≤ Σ_i ∫_{A_i \ C} (f − a) dμ = ∫_G (Σ 1_{A_i})(f − a) dμ ≤ ‖Σ 1_{A_i}‖_∞ ∫_G |f − a| dμ < M Σ μ(A_i) · (b − a)/(2M) = ((b − a)/2) Σ μ(A_i),

a contradiction. Thus f ≥ D*γ. Similarly f ≤ D_*γ.
For the converse, we use the Hahn-Banach theorem, but only on a finite-dimensional space.

(7.4.4) Theorem. Suppose the D-basis B = (ℰ, δ) differentiates all L_1 integrals. Then B has property (A).

Proof. First, B differentiates all L_1 integrals, so B has the density property, and therefore the weak Vitali property. In particular, B has small overflow. Let C ∈ F with 0 < μ(C) < ∞. Suppose property (A) fails. Choose M_k ↑ ∞ with

Σ_{k=1}^∞ 1/M_k < μ(C)/2.

Then for each k, there exists a B-fine almost-cover V_k of C such that A_1, A_2, ..., A_n ∈ V_k implies ‖Σ 1_{A_i}‖_∞ > M_k Σ μ(A_i). We may assume δ(E) < 1/k for all E ∈ V_k.
Fix k. The following "functional" version of the inequality is also true: if A_1, A_2, ..., A_n ∈ V_k and a_1, a_2, ..., a_n ≥ 0, then ‖Σ a_i 1_{A_i}‖_∞ ≥ M_k Σ a_i μ(A_i). This may be seen by approximating the scalars a_i with rational numbers, multiplying through by a common denominator, and observing that repetitions are allowed in the list A_1, A_2, ..., A_n. By the weak Vitali property, there exist finitely many sets A_{1k}, A_{2k}, ... ∈ V_k with

μ(⋃_i A_{ik} \ C) < μ(C) 2^{−k−1}   and   μ(C \ ⋃_i A_{ik}) < μ(C) 2^{−k−1}.

Let 𝒜_k be the finite algebra on Ω generated by the sets A_{1k}, A_{2k}, .... Consider two subsets of L_∞(Ω, 𝒜_k, μ):

C_1 = { Σ a_i 1_{A_{ik}} : a_i ≥ 0, Σ a_i μ(A_{ik}) = 1 },
C_2 = { ξ ∈ L_∞(𝒜_k) : ‖ξ‖_∞ ≤ M_k }.

These sets are disjoint, convex, and closed, and C_1 is compact. Thus there is f_k ∈ L_1(Ω, 𝒜_k, μ) with ∫ ξ f_k dμ ≥ 1 for all ξ ∈ C_1 and ∫ ξ f_k dμ < 1 for all ξ ∈ C_2. Because of C_2, we have f_k ≥ 0 and ‖f_k‖_1 ≤ 1/M_k. Now

(1/μ(A_{ik})) 1_{A_{ik}} ∈ C_1,
so ∫_{A_{ik}} f_k dμ ≥ μ(A_{ik}). We may therefore construct such a function f_k for each k. Let f = Σ_k f_k. Then f ≥ 0, and ‖f‖_1 ≤ Σ 1/M_k < ∞, so f ∈ L_1. But we claim B does not differentiate the integral γ of f. Let

B = ⋂_k ⋃_i A_{ik}.

Then μ(C \ B) ≤ Σ_k μ(C \ ⋃_i A_{ik}) < μ(C)/2, so μ(C ∩ B) > μ(C)/2. For every ω ∈ B and every k, there is E ∈ V_k ⊆ ℰ with ω ∈ E, δ(E) < 1/k, and

∫_E f dμ ≥ ∫_E f_k dμ ≥ μ(E).

Therefore D*γ ≥ 1 on B. But if Dγ = f a.e., then f ≥ 1 a.e. on B, and then

μ(C)/2 < μ(B) ≤ ∫_B f dμ ≤ ∫ f dμ ≤ Σ 1/M_k < μ(C)/2,

a contradiction. Therefore D*γ > f on a set of positive measure.

There is another property closely related to property (A). The derivation basis B has property (C) if, for every ε > 0, there exists a constant M such that if C ∈ F, ε < μ(C) < ∞, and V is a B-fine almost-cover of C, then for every η > 0 there exist A_1, A_2, ..., A_n ∈ V with ‖Σ 1_{A_i}‖_∞ ≤ M μ(⋃ A_i) and μ(⋃ A_i \ C) < η.

If B has small overflow, the definition may be simplified. Recall that B has small overflow iff, for every C ∈ F with μ(C) < ∞ and every η > 0, there exists a set C_0 ⊆ C with μ(C \ C_0) = 0 and a full B-fine cover 𝒰 of C_0 such that, for any A_1, A_2, ..., A_n ∈ 𝒰, we have μ(⋃ A_i \ C) < η.
Now if B has small overflow, property (C) may be stated: for every ε > 0, there exists a constant M such that if C ∈ F, ε < μ(C) < ∞, and V is a B-fine almost-cover of C, then there exist A_1, ..., A_n ∈ V with ‖Σ 1_{A_i}‖_∞ ≤ M μ(⋃ A_i). Indeed, if this is true and η > 0 is given, choose C_0 and 𝒰 as in the definition of small overflow; then ε < μ(C_0) < ∞ and V ∩ 𝒰 is a B-fine almost-cover of C_0, so that there exist A_1, ..., A_n ∈ V ∩ 𝒰 with ‖Σ 1_{A_i}‖_∞ ≤ M μ(⋃ A_i) and μ(⋃ A_i \ C) < η. Property (A) may be similarly simplified. Properties (A) and (C) are equivalent for a D-basis with small overflow:

(7.4.5) Theorem. Let B = (ℰ, δ) be a D-basis with small overflow. The following are equivalent:

(1) Property (A): for every C ∈ F, μ(C) > 0, there is M such that for every B-fine almost-cover V of C, there exist A_1, A_2, ..., A_n ∈ V with ‖Σ 1_{A_i}‖_∞ ≤ M Σ μ(A_i).
(2) For every C ∈ F, μ(C) > 0, there is M such that for every B-fine almost-cover V of C, there exist A_1, A_2, ..., A_n ∈ V with ‖Σ 1_{A_i}‖_∞ ≤ M μ(⋃ A_i).
(3) Property (C): for every ε > 0 there is M such that if C ∈ F, μ(C) > ε, and V is a B-fine almost-cover of C, then there exist A_1, A_2, ..., A_n ∈ V with ‖Σ 1_{A_i}‖_∞ ≤ M μ(⋃ A_i).
Proof. (2) ⇒ (1) and (3) ⇒ (2) are easy.

(1) ⇒ (2). Since (A) holds, B differentiates all L_1 integrals, so the density property holds; that is, the weak Vitali property holds. But suppose (2) fails. There is C ∈ F, μ(C) > 0, such that there exist B-fine almost-covers V_k of C for which A_1, A_2, ..., A_n ∈ V_k implies ‖Σ 1_{A_i}‖_∞ > 2^k μ(⋃ A_i). Now for each k, use the weak Vitali property to choose finite sets ℬ_k = {B_{1k}, B_{2k}, ...} ⊆ V_k with δ(B_{ik}) < 1/k,

Σ_i μ(B_{ik}) − μ(⋃_i B_{ik}) < 2^{−k},   μ(⋃_i B_{ik} \ C) < 2^{−k},   μ(C \ ⋃_i B_{ik}) < 2^{−k}.

Note that Σ μ(A_j) − μ(⋃ A_j) < 2^{−k} also holds for any subfamily {A_j} of ℬ_k. Now C̄ = lim sup_k ⋃_i B_{ik} satisfies μ(C \ C̄) = 0, and ⋃_k ℬ_k is a B-fine cover of C̄. We may apply (1) to the set C̄, to obtain a corresponding constant M. Choose j so large that 2^{j−2} > M. Now V = ⋃_{k=j}^∞ ℬ_k is a B-fine cover of C̄, so there exist A_1, A_2, ..., A_n ∈ V with ‖Σ 1_{A_i}‖_∞ ≤ M Σ μ(A_i). Each set A_i belongs to some ℬ_k with k ≥ j; call it ℬ_{k(i)}. Now

Σ_{i=1}^n μ(A_i) = Σ_{k=j}^∞ Σ_{k(i)=k} μ(A_i) ≤ Σ_{k=j}^∞ ( μ(⋃_{k(i)=k} A_i) + 2^{−k} ) ≤ Σ_{k=j}^∞ 2^{−k+1} ‖Σ_{i=1}^n 1_{A_i}‖_∞ ≤ 2^{−j+2} ‖Σ_{i=1}^n 1_{A_i}‖_∞,

using μ(⋃_{k(i)=k} A_i) < 2^{−k} ‖Σ_{k(i)=k} 1_{A_i}‖_∞ and ‖Σ 1_{A_i}‖_∞ ≥ 1. Hence ‖Σ 1_{A_i}‖_∞ ≤ M Σ μ(A_i) ≤ M 2^{−j+2} ‖Σ 1_{A_i}‖_∞ < ‖Σ 1_{A_i}‖_∞, a contradiction. Hence (2) holds.

(2) ⇒ (3). Suppose (2) holds. Then (A) holds, so again the weak Vitali property holds.
Suppose (3) fails. Then there exist ε > 0, C_m ∈ F with μ(C_m) > ε, and B-fine almost-covers V_m of C_m such that A_1, A_2, ..., A_n ∈ V_m implies ‖Σ 1_{A_i}‖_∞ > 2^m μ(⋃ A_i). For each m, choose by the weak Vitali property a finite set ℬ_m = {B_{1m}, B_{2m}, ...} ⊆ V_m with δ(B_{im}) < 1/m,

μ(C_m \ ⋃_i B_{im}) < 2^{−m},   μ(⋃_i B_{im} \ C_m) < 2^{−m}.

Now let C̄ = lim sup_m ⋃_i B_{im}. Then ⋃_{m=1}^∞ ℬ_m is a B-fine cover of C̄ and μ(C̄) ≥ ε. Apply (2) to C̄ to obtain a constant M. Choose j so that 2^{j−1} > M. Then V = ⋃_{m=j}^∞ ℬ_m is a B-fine cover of C̄. Thus there exist A_1, A_2, ..., A_n ∈ V with ‖Σ 1_{A_i}‖_∞ ≤ M μ(⋃ A_i). Each set A_i is in some ℬ_{m(i)}. Now

μ(⋃_{i=1}^n A_i) ≤ Σ_{m=j}^∞ μ(⋃_{m(i)=m} A_i) ≤ Σ_{m=j}^∞ 2^{−m} ‖Σ_{m(i)=m} 1_{A_i}‖_∞ ≤ 2^{−j+1} ‖Σ_{i=1}^n 1_{A_i}‖_∞,

so ‖Σ 1_{A_i}‖_∞ ≤ M μ(⋃ A_i) ≤ M 2^{−j+1} ‖Σ 1_{A_i}‖_∞ < ‖Σ 1_{A_i}‖_∞, a contradiction. Hence (3) holds.

Halo theorems

The classical derivation theorems, such as the theorems in Sections 7.1 and 7.2, use the Vitali covering theorem. Banach's proof of this theorem uses a simple geometric fact about "halos" of sets in ℝ^d. It is useful to generalize this idea. We will use the "essential supremum" or "essential union" of a family of sets (see Section 4.1). In many cases, this is the same as the ordinary union:
(7.4.6) Proposition. Let Ω be a metric space and let μ be a Radon measure on Ω. Let (A_i)_{i∈I} be a family of open subsets of Ω. Then

A = ⋃_{i∈I} A_i

is an essential supremum of the family.

Proof. Certainly A is open (hence measurable) and A ⊇ A_i for all i. Suppose also B is measurable and B ⊇ A_i a.e. for all i. We claim that B ⊇ A a.e. Suppose not. Then μ(A \ B) > 0. Thus there is a compact set K ⊆ A \ B with μ(K) > 0. Now (A_i)_{i∈I} is an open cover of K, so there is a finite subcover, A_{i_1} ∪ A_{i_2} ∪ ... ∪ A_{i_n} ⊇ K. Then K ⊆ (A_{i_1} ∪ A_{i_2} ∪ ... ∪ A_{i_n}) \ B. In fact μ(A_i \ B) > 0 for some i, which contradicts B ⊇ A_i a.e.
Let B = (ℰ, δ) be a D-basis, let A ∈ ℰ, and let a > 0. The a-halo of A is

H(a, A) = ess sup { E ∈ ℰ : E ∩ A ≠ ∅, δ(E) ≤ a δ(A) }.

Banach's geometric fact is the existence of a constant M with μ(H(2, A)) ≤ M μ(A) for all squares A. This idea can be generalized.

(7.4.7) Theorem. Let Ω be a metric space, and let B = (ℰ, δ) be a D-basis with small overflow, where δ is diameter. Suppose all E ∈ ℰ are closed
sets. Suppose a > 1 and M < ∞ exist such that μ(H(a, A)) ≤ M μ(A) for all A ∈ ℰ. Then B has the strong Vitali property.

Proof. Let Q ⊆ Ω, 0 < μ*(Q) < ∞, let V be a B-fine cover of Q, and let C be an outer envelope of Q. Suppose ε > 0 is given. By small overflow, there is a B-fine cover V_1 ⊆ V of Q with δ(E) < 1 for all E ∈ V_1 and

μ( (ess sup_{E ∈ V_1} E) \ C ) < ε.

We will proceed by transfinite induction. To begin, let

β_1 = sup { δ(E) : E ∈ V_1 },

and choose A_1 ∈ V_1 with δ(A_1) > β_1/a. Let

V_2 = { E ∈ V_1 : E ∩ A_1 = ∅ }.

If V_2 = ∅, the recursive construction stops; otherwise continue: define

β_2 = sup { δ(E) : E ∈ V_2 },
and choose A_2 ∈ V_2 with δ(A_2) > β_2/a. Suppose τ is an ordinal, and pairwise disjoint sets (A_γ)_{γ<τ} have been chosen. Let

V_τ = { E ∈ V_1 : E ∩ A_γ = ∅ for all γ < τ }.

If V_τ = ∅, the recursive construction stops; otherwise continue: define

β_τ = sup { δ(E) : E ∈ V_τ },

and choose A_τ ∈ V_τ with δ(A_τ) > β_τ/a. Now the sets A_γ are disjoint, and their union has outer measure at most μ(C) + ε < ∞, so there are only countably many of them. (That is, the construction stops at some countable ordinal.) If E ∈ V_1, then E ∩ A_γ ≠ ∅ for some γ. The series Σ μ(A_γ) converges. Thus we may choose a finite set I of ordinals so that

Σ_{γ∉I} μ(A_γ) < ε/M

and μ(⋃_{γ∈I} A_γ \ C) < ε. We claim that μ(C \ ⋃_{γ∈I} A_γ) < ε, or equivalently μ*(Q \ ⋃_{γ∈I} A_γ) < ε. If ω ∈ Q \ ⋃_{γ∈I} A_γ, then there is a net E_t ⇒ ω, E_t ∈ V_1. But the finite union ⋃_{γ∈I} A_γ is closed, so there is t with E_t ∩ ⋃_{γ∈I} A_γ = ∅. Fix such a t. Now if γ is the least ordinal with E_t ∩ A_γ ≠ ∅, then γ ∉ I and E_t ∈ V_γ, so δ(E_t) ≤ β_γ < a δ(A_γ), and thus ω ∈ E_t ⊆ H(a, A_γ). This shows that

Q \ ⋃_{γ∈I} A_γ ⊆ ⋃_{γ∉I} H(a, A_γ).

Therefore

μ*(Q \ ⋃_{γ∈I} A_γ) ≤ Σ_{γ∉I} μ(H(a, A_γ)) ≤ M Σ_{γ∉I} μ(A_γ) < ε.
Since H(a, A) is an essential supremum, the assertion μ(H(a, A)) ≤ M μ(A) is the same as: for any finite collection E_1, E_2, ..., E_n ∈ ℰ with E_i ∩ A ≠ ∅ and δ(E_i) ≤ a δ(A), we have

μ(⋃_{i=1}^n E_i) ≤ M μ(A).
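For a finite family of constituents, the transfinite selection in the proof of Theorem (7.4.7) reduces to a greedy loop: among the constituents disjoint from those already chosen, pick one whose diameter exceeds the current supremum divided by a. A minimal sketch for closed intervals on the line (the interval data are hypothetical, and a = 2 plays the role of the constant a > 1):

```python
def greedy_vitali(intervals, a=2.0):
    """Greedy selection mirroring the proof of (7.4.7): repeatedly pick,
    among the intervals disjoint from those already chosen, one whose
    diameter exceeds beta/a, where beta is the sup of remaining diameters.
    Intervals are closed, given as (left, right) pairs."""
    diam = lambda iv: iv[1] - iv[0]
    disjoint = lambda u, v: u[1] < v[0] or v[1] < u[0]
    remaining = list(intervals)
    chosen = []
    while remaining:
        beta = max(diam(iv) for iv in remaining)              # beta_tau
        pick = next(iv for iv in remaining if diam(iv) > beta / a)
        chosen.append(pick)
        # the next V_tau: constituents disjoint from everything chosen
        remaining = [iv for iv in remaining if disjoint(iv, pick)]
    return chosen

# Every discarded interval meets some chosen interval of comparable
# diameter, so it lies in the a-halo H(a, A) of a chosen A.
sel = greedy_vitali([(0, 1), (0.5, 1.2), (2, 2.5), (2.4, 3.0), (5, 5.1)])
```

The chosen intervals are pairwise disjoint by construction; the halo estimate then controls the measure of what was discarded.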
Dropping the requirement that the constituents be closed, we may still prove the weak Vitali property.
(7.4.8) Theorem. Let Ω be a metric space, let μ be a Radon measure, and let B = (ℰ, δ) be a D-basis, where δ is diameter. Suppose there exist a > 1 and M < ∞ such that μ(H(a, A)) ≤ M μ(A) for all A ∈ ℰ. Then B has the weak Vitali property.

Proof. Let 0 < r < 1. Define

ℰ_r = { K : K compact, K ⊆ E, μ(K) ≥ r μ(E) for some E ∈ ℰ }.

Then (ℰ_r, δ) is also a D-basis, where δ is diameter. Its constituents are closed sets. Let H_r(a, K) be the a-halo for the basis ℰ_r and H(a, E) the a-halo for the basis ℰ. If K ⊆ E, μ(K) ≥ r μ(E), then H_r(a, K) ⊆ H(a, E), so

μ(H_r(a, K)) ≤ μ(H(a, E)) ≤ M μ(E) ≤ (M/r) μ(K).

Therefore, by the previous theorem, ℰ_r has the strong Vitali property. Now let C ∈ F, 0 < μ(C) < ∞, V a B-fine almost-cover of C, and ε > 0. Choose r < 1 so that

(1/r − 1)(μ(C) + ε) < ε.

Then the set of all compact K such that K ⊆ E and μ(K) ≥ r μ(E) for some E in V is a (ℰ_r, δ)-fine almost-cover of C. By the strong Vitali property, there exist disjoint K_1, K_2, ..., K_n with μ(⋃ K_i \ C) < ε and μ(C \ ⋃ K_i) < ε. It is easily checked that the corresponding sets E_i ∈ V satisfy μ(C \ ⋃ E_i) < ε, μ(⋃ E_i \ C) < 2ε, and Σ μ(E_i) − μ(⋃ E_i) < 3ε.
This may be used to prove a covering theorem that may be of independent interest. A homothety of ℝ^d is a function θ: ℝ^d → ℝ^d of the form θ(x) = rx + a, where r > 0 and a ∈ ℝ^d. Recall the notation λ for d-dimensional Lebesgue measure on ℝ^d. If θ(x) = rx + a, then for every measurable set C ⊆ ℝ^d, we have λ(θ(C)) = r^d λ(C).
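For an axis-parallel box, the scaling identity λ(θ(C)) = r^d λ(C) can be checked directly; a small sketch (the box and the homothety parameters are arbitrary illustrative choices):

```python
def homothety_box(box, r, a):
    """Apply theta(x) = r*x + a coordinatewise to an axis-parallel box,
    given as a list of (low, high) coordinate intervals."""
    return [(r * lo + ai, r * hi + ai) for (lo, hi), ai in zip(box, a)]

def volume(box):
    """d-dimensional Lebesgue measure of an axis-parallel box."""
    prod = 1.0
    for lo, hi in box:
        prod *= hi - lo
    return prod

C = [(0.0, 2.0), (1.0, 4.0)]                  # a box in R^2, lambda(C) = 6
theta_C = homothety_box(C, r=0.5, a=(3.0, -1.0))
# lambda(theta(C)) = r^d * lambda(C) = 0.25 * 6 = 1.5
```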
(7.4.9) Homothety filling theorem. Let U ⊆ ℝ^d be an open set of finite measure, let C ⊆ ℝ^d be a bounded measurable set of positive measure, and let ε > 0. Then there exist homotheties θ_1, θ_2, ..., θ_n such that

(1) ⋃_i θ_i(C) ⊆ U;
(2) λ(U \ ⋃_i θ_i(C)) < ε;
(3) Σ λ(θ_i(C)) < λ(U) + ε;
(4) diam θ_i(C) < ε.

Proof. The family

ℰ = { θ(C) : θ homothety },

together with δ = diameter, defines a D-basis on ℝ^d. Since C is bounded, it is contained in an interval I of ℝ^d. Let J be the interval with the same
center as I, but 5 times the side. Then clearly H(2, C) ⊆ J. Thus we have λ(H(2, C)) ≤ M λ(C), where M = λ(J)/λ(C). But then for any homothety θ, also λ(H(2, θ(C))) ≤ M λ(θ(C)). Thus by Theorem (7.4.8), the derivation basis B = (ℰ, δ) has the weak Vitali property. Now

V = { θ(C) : θ homothety, θ(C) ⊆ U, δ(θ(C)) < ε }

is a B-fine cover of U, so the result follows from the weak Vitali property.
As usual (Proposition (7.3.7)), the same result is true if we allow countably many sets θ_i(C) and change (2) to: λ(U \ ⋃ θ_i(C)) = 0.

Weak halo
A related sort of halo will be discussed next. Under the right conditions, it leads to a necessary and sufficient condition for the density property. Let (ℰ, δ) be a D-basis on (Ω, F, μ). Let a, η > 0, and let C ∈ F. The weak η-halo of C is:

S(a, η, C) = ess sup { E ∈ ℰ : μ(C ∩ E) ≥ a μ(E), δ(E) < η }.

The weak halo of C is the essential union over all η > 0, or:

S(a, C) = ess sup { E ∈ ℰ : μ(C ∩ E) ≥ a μ(E) }.

It is not hard to guess that if γ(E) = μ(C ∩ E), then S(a, η, C) and S(a, C) are related to the set {D_*γ ≥ a}. The next result will make this precise. A D-basis (ℰ, δ) has the weak halo evanescence property iff: for every a, 0 < a < 1, every decreasing sequence of measurable sets of finite measure C_n ↓ ∅, and every decreasing sequence of positive numbers η_n ↓ 0, we have lim_n μ(S(a, η_n, C_n)) = 0.

(7.4.10) Theorem. Let Ω be a metric space, let μ be a Radon measure on Ω, and let B = (ℰ, δ) be a D-basis, where ℰ is a collection of open sets of finite measure, and δ is diameter. Then B has the weak halo evanescence property if and only if B has the density property.
Proof. First suppose that B has the density property. Since ℰ consists of open sets and μ is a Radon measure, the essential supremum in the definition of the weak halo is an actual union. Let C ∈ F, 0 < μ(C) < ∞. Then by Theorem (7.3.11) B differentiates the integral of 1_C, so if γ(E) = μ(E ∩ C), then for 0 < α < 1 we have
⋂_{η>0} S(α, η, C) = {D*γ > α} = C.
Suppose C_n ↓ ∅ and η_n ↓ 0. For each n, we have therefore
lim_k μ(S(α, η_k, C_n)) = μ(C_n).
Given ε > 0, choose n_0 so that μ(C_{n_0}) < ε/2; then choose k_0 so that μ(S(α, η_{k_0}, C_{n_0})) < ε. If n ≥ max{n_0, k_0}, then μ(S(α, η_n, C_n)) < ε. This shows that lim_n μ(S(α, η_n, C_n)) = 0. Thus B has the weak halo evanescence property.
Conversely, suppose that B has the weak halo evanescence property. Recall (Proposition (7.4.1)) that the upper and lower derivates are measurable
functions under the conditions assumed. Let C ∈ F, 0 < μ(C) < ∞. We want to show that almost every point of C is a point of density of C. Let γ be the integral of 1_C: we want to show that D_*γ ≥ 1 a.e. on C. Suppose not. There exists α > 0 such that μ{ ω ∈ C : D_*γ(ω) < 1 − α } > 0. Thus there is a compact set K ⊆ C ∩ {D_*γ < 1 − α} with μ(K) > 0. Now
V_1 = { E ∈ ℰ : μ(K ∩ E)/μ(E) < 1 − α, δ(E) < 1 }
is a (B-fine) open cover of K. There are finitely many sets A_11, A_21, … ∈ V_1 with K ⊆ ⋃_i A_i1, an open set. Next,
V_2 = { E ∈ ℰ : μ(K ∩ E)/μ(E) < 1 − α, δ(E) < 1/2, E ⊆ ⋃_i A_i1 }
is a (B-fine) open cover of K. There are finitely many sets A_12, A_22, … ∈ V_2 with K ⊆ ⋃_i A_i2. Continuing in this way, we obtain sets A_ik ∈ ℰ such that
μ(K ∩ A_ik)/μ(A_ik) < 1 − α,   δ(A_ik) < 1/k,   K ⊆ ⋃_i A_ik,   ⋃_i A_{i,k+1} ⊆ ⋃_i A_ik.
Now K is closed, so K = ⋂_k ⋃_i A_ik. Thus the sets defined by C_k = ⋃_i A_ik \ K satisfy C_k ↓ ∅. But
μ(A_ik) = μ(A_ik ∩ C_k) + μ(A_ik ∩ K) < μ(A_ik ∩ C_k) + (1 − α) μ(A_ik),
so μ(A_ik ∩ C_k) > α μ(A_ik). Thus A_ik ⊆ S(α, 1/k, C_k). Now for every k, we have K ⊆ ⋃_i A_ik ⊆ S(α, 1/k, C_k), so by the weak halo evanescence property μ(K) = 0. This contradiction shows that in fact D_*γ = 1 a.e. on C.
Busemann–Feller bases

A collection A of subsets of ℝ^d is said to be closed under homotheties if A ∈ A implies θ(A) ∈ A for all homotheties θ. A D-basis (ℰ, δ) on (Ω, F, μ) is a Busemann–Feller basis iff:
(1) Ω is ℝ^d for some d ≥ 1,
(2) μ is d-dimensional Lebesgue measure λ,
(3) ℰ is a collection of bounded open sets,
(4) ℰ is closed under homotheties,
(5) δ is diameter.
Many of the examples of derivation bases in Sections 7.1 and 7.2 are essentially Busemann–Feller bases; in order to make them Busemann–Feller bases we must remove the boundaries from the constituents. This makes no essential difference in these cases, since the boundaries are sets of measure zero. Since the constituents are open sets, and Lebesgue measure λ on ℝ^d is a Radon measure, the essential suprema appearing in the general theory are unions in this case (as in (7.4.10)). For example, the weak halo is
S(α, C) = ⋃ { E ∈ ℰ : λ(C ∩ E) > α λ(E) }.
Weak halo conditions are particularly useful for Busemann–Feller bases. The weak halo evanescence property may be simplified in the following way:
The derivation basis B has property (WH) iff: for every α, 0 < α < 1, there exists M < ∞ such that for every C ∈ F, we have λ(S(α, C)) ≤ M λ(C).
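For orientation, the weak halo can be computed exactly in dimension 1. With ℰ the open intervals and C = (0, 1), an interval E = (a, b) satisfies λ(C ∩ E) > α λ(E) only if it lies within S(α, C) = (1 − 1/α, 1/α), so λ(S(α, C)) = ((2 − α)/α) λ(C) and (WH) holds with M = (2 − α)/α. The following brute-force sketch is ours, not from the text; it discretizes the line and recovers the closed form approximately:

```python
# Brute-force the weak halo S(alpha, C) of C = (0, 1) for the 1-D interval
# basis: the union of all open intervals E with lambda(C ∩ E) > alpha*lambda(E).
# Closed form (our computation): S(alpha, C) = (1 - 1/alpha, 1/alpha).
def halo_extent(alpha, lo=-4.0, hi=5.0, n=800):
    left = right = None
    for i in range(n):
        a = lo + (hi - lo) * i / n
        for j in range(i + 1, n + 1):
            b = lo + (hi - lo) * j / n
            overlap = max(0.0, min(b, 1.0) - max(a, 0.0))
            if overlap > alpha * (b - a):        # E = (a, b) qualifies, E ⊆ S
                left = a if left is None else min(left, a)
                right = b if right is None else max(right, b)
    return left, right

alpha = 0.5
left, right = halo_extent(alpha)
measure = right - left                            # S is an interval here
print(left, right, measure)
```

Refining the grid parameter n tightens the match with (2 − α)/α.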
(7.4.11) Theorem. Let B be a Busemann–Feller basis. Then B has property (WH) if and only if B has the density property.

Proof. Clearly (WH) implies the weak halo evanescence property, hence the density property.
For the converse, suppose (WH) fails. Then there exist α, 0 < α < 1, and B_m ∈ F with λ(S(α, B_m)) > 4^m λ(B_m). Let J be the unit interval in ℝ^d. Fix m. There exists a finite union S_m of constituents E with λ(E ∩ B_m) > α λ(E) such that λ(S_m) > 4^m λ(B_m). By Theorem (7.4.9), there exist homotheties θ_1, θ_2, … such that ⋃_i θ_i(S_m) ⊆ J, λ(J \ ⋃_i θ_i(S_m)) = 0, Σ_i λ(θ_i(S_m)) < 2, and δ(θ_i(S_m)) < 1/m. Let C_m = ⋃_i θ_i(B_m). Then: for almost every ω ∈ J, there exists a constituent E with ω ∈ E, δ(E) < 1/m, and λ(E ∩ C_m) > α λ(E). Also,
λ(C_m) ≤ Σ_i λ(θ_i(B_m)) < 4^{−m} Σ_i λ(θ_i(S_m)) < 2 · 4^{−m}.
Now let C = ⋃_{m=1}^∞ C_m. Thus λ(C) ≤ Σ λ(C_m) < Σ 2 · 4^{−m} = 2/3. We claim that B does not differentiate the integral γ of 1_C. For almost every ω ∈ J and every m, there is a constituent E with ω ∈ E, δ(E) < 1/m, and λ(E ∩ C) ≥ λ(E ∩ C_m) > α λ(E). Therefore D*γ(ω) ≥ α for almost every ω ∈ J, so Dγ = 1_C fails on the set J \ C, which has positive measure.
For derivation of L_1 integrals, we should consider a refinement of (WH). We postulate not only that for each α there is a constant M, but more precisely how M depends on α as α → 0.
The derivation basis B has property (WH_1) iff there is a constant K such that, for every α, 0 < α < 1, and every C ∈ F, we have
λ(S(α, C)) ≤ (K/α) λ(C).
More generally, let Ψ be an Orlicz function. The derivation basis B has property (WH_Ψ) iff there is a constant K such that, for every α, 0 < α < 1, and every C ∈ F, we have
λ(S(α, C)) ≤ K Ψ(1/α) λ(C).
The proof of the following is like the previous proof. (Note that the conjugate Orlicz function is not being used.)

(7.4.12) Theorem. Let B be a Busemann–Feller basis, and let Ψ be an Orlicz function with 0 < Ψ(v) < ∞ for all v > 0. Suppose B differentiates all L_Ψ integrals. Then B has property (WH_Ψ).

Proof. Suppose (WH_Ψ) fails. Choose numbers K_m ↑ ∞ so that

(7.4.12a)   Σ_{m=1}^∞ 1/K_m < Ψ(1).

There exist B_m ∈ F and α_m < 1 with λ(S(α_m, B_m)) > K_m Ψ(1/α_m) λ(B_m). Proceeding as in (7.4.11), we obtain sets C_m contained in the unit interval J of ℝ^d such that λ(C_m) < 1/(K_m Ψ(1/α_m)), and for almost every ω ∈ J, there is a constituent E with ω ∈ E, δ(E) < 1/m and λ(E ∩ C_m) > α_m λ(E). Now consider the nonnegative function
f = sup_m (1/α_m) 1_{C_m},
and its integral γ(E) = ∫_E f dλ. We have

(7.4.12b)   ∫ Ψ(f) dλ ≤ Σ_m λ(C_m) Ψ(1/α_m) < Σ_m 1/K_m,

so f ∈ L_Ψ. But we claim that D*γ = f fails on a set of positive measure. For almost every ω ∈ J and every m, there is a constituent E with ω ∈ E, δ(E) < 1/m, and λ(E ∩ C_m) > α_m λ(E). Thus
γ(E) = ∫_E f dλ ≥ ∫_E (1/α_m) 1_{C_m} dλ = (1/α_m) λ(C_m ∩ E) > λ(E).
Thus D*γ(ω) ≥ 1 on J. But if f ≥ 1 a.e. on J, then we would have ∫ Ψ(f) dλ ≥ ∫_J Ψ(1) dλ = Ψ(1), which contradicts (7.4.12a) and (7.4.12b). Therefore λ(J ∩ {f < 1}) > 0, and D*γ = f fails on this set of positive measure, contradicting the assumption that B differentiates all L_Ψ integrals. This proves (WH_Ψ).

The derivation basis B has property (FH_Ψ) iff there is a constant K such that, for all disjoint bounded sets C_1, C_2, …, C_n ∈ F and all nonnegative scalars c_1, c_2, …, c_n, we have
μ( ess sup { E constituent : Σ_i c_i μ(C_i ∩ E) > μ(E) } ) ≤ K Σ_i Ψ(c_i) μ(C_i).
Note that (FH_Ψ) implies (WH_Ψ) by taking a single set C_1 and scalar c_1 = 1/α. On the other hand, if (FH_Ψ) holds, then the inequality remains true even for infinite lists C_1, C_2, … and c_1, c_2, …. Condition (FH_Ψ) is more complex than (WH_Ψ), but it is necessary and sufficient for differentiation of L_Ψ integrals.
Note an alternative formulation: if we write g = Σ c_i 1_{C_i} (a nonnegative simple function), then the inequality Σ c_i μ(C_i ∩ E) > μ(E) becomes ∫_E g dμ > μ(E), and Σ Ψ(c_i) μ(C_i) becomes ∫ Ψ(g) dμ.
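The condition (Δ₂) that enters the next theorem requires Ψ(2v) ≤ K Ψ(v) for some constant K and all v ≥ 0. As a concrete sanity check (the function Ψ(v) = v log(1 + v) and the constant K = 4 are our illustration, not taken from the text), note that 1 + 2v ≤ (1 + v)² gives Ψ(2v) = 2v log(1 + 2v) ≤ 4v log(1 + v) = 4Ψ(v):

```python
import math

# Psi(v) = v*log(1+v) is an Orlicz function of the v log v type; it
# satisfies (Delta_2) with K = 4, since 1 + 2v <= (1 + v)^2.
def psi(v):
    return v * math.log(1.0 + v)

K = 4.0
checks = [10.0 ** e for e in range(-6, 7)]
ok = all(psi(2.0 * v) <= K * psi(v) + 1e-12 for v in checks)
print(ok)
```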
(7.4.13) Theorem. Let B be a Busemann–Feller basis, and let Ψ be an Orlicz function satisfying (Δ₂) with 0 < Ψ(v) < ∞ for all v > 0. Then B differentiates all L_Ψ integrals if and only if B has property (FH_Ψ).
Proof. Assume (FH_Ψ). Property (FH_Ψ) implies (WH_Ψ), which implies (WH). Thus by (7.4.11), B differentiates the integrals of all L_∞ functions. Let f ≥ 0 belong to L_Ψ, and let γ(E) = ∫_E f dλ for constituents E. We claim that Dγ = f a.e. Write C_i = {i − 1 ≤ f < i}; the sets C_i are disjoint. For m ∈ ℕ, let f_m = f 1_{{f<m}}, so f_m ∈ L_∞. Therefore γ_m(E) = ∫_E f_m dλ satisfies Dγ_m = f_m a.e.
Fix ε > 0. Now f ∈ L_Ψ, where Ψ satisfies (Δ₂), so ∫ Ψ(2f/ε) dλ < ∞. On the set C_i, we have (for i ≥ 2) i/ε ≤ 2(i − 1)/ε ≤ 2f/ε, so
Σ_{i=2}^∞ Ψ(i/ε) λ(C_i) < ∞.
Now define
S_m = ⋃ { E constituent : Σ_{i=m+1}^∞ i λ(C_i ∩ E) > ε λ(E) }.
The sets decrease: S_1 ⊇ S_2 ⊇ ⋯. By condition (FH_Ψ),
λ(S_m) ≤ K Σ_{i=m+1}^∞ Ψ(i/ε) λ(C_i),
so λ(S_m) → 0. Thus λ(⋂_m S_m) = 0.
Now let ω be such that Dγ_m(ω) = f_m(ω) for all m and ω ∉ ⋂_m S_m. Almost every ω satisfies these conditions. Let E_n ⇒ ω. Choose m so that m > f(ω) and ω ∉ S_m. Now
γ(E_n)/λ(E_n) = Σ_{i=1}^m γ(E_n ∩ C_i)/λ(E_n) + Σ_{i=m+1}^∞ γ(E_n ∩ C_i)/λ(E_n).
By the definition of S_m,
Σ_{i=m+1}^∞ γ(E_n ∩ C_i)/λ(E_n) ≤ Σ_{i=m+1}^∞ i λ(E_n ∩ C_i)/λ(E_n) ≤ ε.
Also, m > f(ω), so f(ω) = f_m(ω) and
Σ_{i=1}^m γ(E_n ∩ C_i)/λ(E_n) = γ_m(E_n)/λ(E_n) → f_m(ω).
Thus
f(ω) − ε ≤ lim inf_n γ(E_n)/λ(E_n) ≤ lim sup_n γ(E_n)/λ(E_n) ≤ f(ω) + ε.
This is true for all E_n ⇒ ω, so
f(ω) − ε ≤ D_*γ(ω) ≤ D*γ(ω) ≤ f(ω) + ε.
Finally, since ε was arbitrary, we have f(ω) = Dγ(ω) a.e.
For the converse, suppose (FH_Ψ) fails. Choose K_m ↑ ∞ with

(7.4.13a)   Σ_{m=1}^∞ 2/K_m < Ψ(1).

There exist nonnegative simple functions g_m, with bounded support, such that
λ( ⋃ { E : ∫_E g_m dλ > λ(E) } ) > K_m ∫ Ψ(g_m) dλ.
For each m, there is a finite union S_m of constituents E with ∫_E g_m dλ > λ(E) such that λ(S_m) > K_m ∫ Ψ(g_m) dλ. Let J be the unit interval in ℝ^d. There exist homotheties θ_1, θ_2, … such that ⋃_i θ_i(S_m) ⊆ J, δ(θ_i(S_m)) < 1/m, λ(J \ ⋃_i θ_i(S_m)) = 0, and Σ_i λ(θ_i(S_m)) < 2. Thus if f_m = sup_i g_m ∘ θ_i^{−1}, we have
∫ Ψ(f_m) dλ ≤ Σ_i ∫ Ψ(g_m ∘ θ_i^{−1}) dλ < (1/K_m) Σ_i λ(θ_i(S_m)) < 2/K_m,
and for almost every ω ∈ J there is a constituent E with ω ∈ E, δ(E) < 1/m and ∫_E f_m dλ > λ(E). Then consider f = sup_m f_m. Now

(7.4.13b)   ∫ Ψ(f) dλ ≤ Σ_m ∫ Ψ(f_m) dλ < Σ_m 2/K_m,

so f ∈ L_Ψ. Let γ be the integral of f. We claim that D*γ = f fails on a set of positive measure. For almost every ω ∈ J and every m, there is E with ω ∈ E, δ(E) < 1/m and ∫_E f dλ > λ(E), so D*γ ≥ 1 a.e. on J. But if f ≥ 1 a.e. on J, then we would have ∫ Ψ(f) dλ ≥ ∫_J Ψ(1) dλ = Ψ(1), which contradicts (7.4.13a) and (7.4.13b). Thus λ(J ∩ {f < 1}) > 0, and D*γ = f fails on this set of positive measure.
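In dimension 1, with the interval basis, the set appearing in (FH_Ψ) for Ψ(v) = v is the level set {Mg > 1} of the uncentered Hardy–Littlewood maximal function of g = Σ_i c_i 1_{C_i}, and the classical weak (1,1) inequality gives the bound with K = 2. The following discretized sketch is ours; the sets C_i, scalars c_i, and grid are arbitrary choices:

```python
# 1-D sketch of the (FH) inequality with Psi(v) = v: measure the union of
# intervals E with  ∫_E g dλ > λ(E),  g = sum_i c_i 1_{C_i},  and compare
# against K * sum_i c_i λ(C_i) with K = 2 (weak (1,1) constant for the
# uncentered Hardy–Littlewood maximal function on the line).
sets = [(0.0, 0.5), (2.0, 2.2)]              # disjoint C_i
coef = [3.0, 5.0]                            # nonnegative c_i

def mass(a, b):
    """Integral of g over (a, b)."""
    return sum(c * max(0.0, min(b, d) - max(a, x))
               for c, (x, d) in zip(coef, sets))

lo, hi, n = -2.0, 4.0, 600
grid = [lo + (hi - lo) * i / n for i in range(n + 1)]
spans = []
for i in range(n):
    jmax = -1
    for j in range(i + 1, n + 1):
        if mass(grid[i], grid[j]) > grid[j] - grid[i]:
            jmax = j                         # widest qualifying E from grid[i]
    if jmax >= 0:
        spans.append((grid[i], grid[jmax]))

spans.sort()                                 # merge spans, measure the union
union_measure, cur = 0.0, float("-inf")
for a, b in spans:
    if b > cur:
        union_measure += b - max(a, cur)
        cur = b
bound = 2.0 * sum(c * (d - x) for c, (x, d) in zip(coef, sets))
print(union_measure, bound)
```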
The particular case of (FH_1) should be noted. The Busemann–Feller basis B differentiates all L_1 integrals if and only if there is a constant K such that, for all disjoint bounded sets C_1, C_2, …, C_n ∈ F and all nonnegative scalars c_1, c_2, …, c_n, we have
λ( ⋃ { E constituent : Σ_i c_i λ(C_i ∩ E) > λ(E) } ) ≤ K Σ_i c_i λ(C_i).

Complements
(7.4.14) (Interval basis.) Let B be the interval basis in ℝ². The weak halo S(α, J) of the unit square J of ℝ² has area 1 + (4/α) log(1/α) (see Figure (7.4.14)). This is a Busemann–Feller basis, so we know that B does not differentiate all L_Ψ integrals unless Ψ(v) grows at least as fast as a multiple of v log v. In particular, B does not differentiate all L_1 integrals. This is essentially the fact that is used in the counterexample (7.2.10).
Figure (7.4.14). λ(S(α, J)) = 1 + (4/α) log(1/α).
Figure (7.4.15a). Weak halo.
Figure (7.4.15b). Subdivide.
Figure (7.4.15c). Translate.
(7.4.15) (Rectangle basis.) In ℝ², the rectangle basis is the Busemann–Feller basis consisting of the (open) rectangles, with sides not necessarily parallel to the axes. This basis fails the density property. This can be seen using Theorem (7.4.11) as follows. The weak halo S(1/2, C) of a triangle C with vertices a, b, c (shown in Figure (7.4.15a)) contains a triangle with vertices a, b′, c′, where b is the midpoint of the line segment from a to b′ and c is the midpoint of the line segment from a to c′. Consider a triangle (shaded in Figure (7.4.15b)) and the corresponding larger triangle (white). Subdivide it into many smaller triangles as shown. Then translate the triangles as in Figure (7.4.15c). The weak halo S(1/2, C) of the shaded portion C contains the white portion. By taking a large number of subdivisions and using appropriate translations, it may be arranged that the shaded area is as close to 0 as we like, while the white area remains large. This shows that property (WH) fails. (Details of this construction are in Busemann & Feller [1934]. Or see Hayes & Pauc [1970], Section V.5, p. 104.)

Remarks
D-bases, Busemann–Feller bases, and many other variants are discussed by Hayes & Pauc [1970]. The reader may consult that volume for further information. In the term "D-basis," the "D" is for Denjoy. See Denjoy [1951], Haupt [1953], Pauc [1953]. The general theory allows the function δ to be something other than the diameter. There is also the possibility of using another "disentanglement function" in the definition of the halo. Property (A) is the derivation version of a condition of Astbury [1981b]. The equivalence of property (A) and property (C) (for special D-bases) is proved in Millet & Sucheston [1980c]. They also prove the sufficiency of property (C) for differentiation of L_1 integrals. Necessity of property (C) for differentiation of L_1 integrals is stated in Talagrand [1986]. Busemann–Feller bases come from a paper of Busemann & Feller [1934].
Pointwise ergodic theorems
In this chapter, we will prove some of the pointwise convergence theorems from ergodic theory. The main result of this chapter is the superadditive ratio ergodic theorem. It implies the Chacon–Ornstein theorem, the Kingman subadditive ergodic theorem, and, for positive operators, the Dunford–Schwartz theorem and Chacon's ergodic theorem involving "admissible sequences."
We consider positive linear contractions T of L_1. Our plan is as follows. We first prove weak maximal inequalities, from which we obtain the Hopf decomposition of the space Ω into the conservative part C and the dissipative part D (8.3.1). Assuming T conservative (that is, Ω = C), we prove the Chacon–Ornstein theorem (8.5.4), i.e., the convergence to a finite limit of the ratio of sums of iterates of T applied to functions f and g. The limit is identified in terms of f, g, and the σ-algebra C of absorbing sets. The superadditive operator ergodic theorem is proved in the conservative case (8.4.6). It is then observed that the total contribution of D to C is a superadditive process with respect to the conservative operator T_C induced by T on C. Since the behavior of the ergodic ratio on the dissipative part D is obvious, the Chacon–Ornstein theorem (8.6.10) and, more generally, the superadditive ratio theorem (8.6.7) will follow on Ω. This affords considerable economy of argument, since the direct study of the contribution of D to C is not obvious even for additive processes. The superadditive theory (or, equivalently, the subadditive theory) is mostly known for its applications, but in fact the notion of a superadditive process is shown to shed light on the earlier additive theory of L_1 operators.
Throughout this chapter, the term "operator" means a bounded linear transformation. An operator T defined on (equivalence classes of) real-valued measurable functions on Ω is called positive if f ≥ 0 a.e. implies Tf ≥ 0 a.e. The operator T is a contraction on L_p if ‖Tf‖_p ≤ ‖f‖_p for each f ∈ L_p.
A positive operator T is called sub-Markovian if it is a contraction on L_1. A positive operator T is called Markovian if it preserves the integral, ∫ Tf dμ = ∫ f dμ for f ∈ L_1⁺, so that T is an isometry on L_1⁺. If f and g are extended real-valued functions, we write f ∨ g for the pointwise maximum, f ∧ g for the pointwise minimum, f⁺ = f ∨ 0, and f⁻ = (−f) ∨ 0.
Throughout Chapter 8, we will let (Ω, F, μ) be a σ-finite measure space, and T a positive contraction on L_1(Ω, F, μ). Observe that a positive linear operator on L_1 can be extended to act on the lattice of all extended real-valued functions f such that the negative part f⁻ is integrable. (See (8.4.9).)
8.1. Preliminaries

Given a norm-bounded sequence of elements f_n in a Banach space E, it is often important to find in E an element φ in some sense asymptotically close to a subsequence of f_n. Bounded subsets of reflexive Banach spaces are weakly sequentially compact (in fact, this is a characterization of reflexivity), so in L_p(Ω, F, μ) for 1 < p < ∞ the weak limit of a subsequence of f_n will do for φ. Weak sequential compactness is not available in L_1(Ω, F, μ) without extra assumptions (uniform integrability; see Section 2.3), so we have to settle for less than a weak limit. One procedure (8.1.4) is to consider elements of L_1 as members of L_1**, i.e., finitely additive measures. Then by a theorem of Alaoglu, a subsequence of f_n converges weak-star to an element η in L_1**, and the maximal countably additive measure dominated by η will do for φ. A related method consists in taking Banach limits; see for example Krengel [1985], p. 135. First we discuss the truncated limit, a more elementary and transparent method of constructing φ.

Truncated limits
A weak unit in L_1 is an element u ∈ L_1⁺ such that, for each f ∈ L_1⁺, if u ∧ f = 0, then f = 0. (This is the terminology used also in more general Banach lattices. See Remarks, below.) Any strictly positive integrable function u is a weak unit in L_1; such a function exists since (Ω, F, μ) is σ-finite.
Let (f_n) be a sequence in L_1⁺ with sup_n ‖f_n‖_1 = M < ∞. A function φ ∈ L_1⁺ is called a weak truncated limit of (f_n) iff: for a weak unit u, the weak limit
φ_k = w-lim_n (f_n ∧ ku)
exists for every k ∈ ℕ, and φ_k ↑ φ. It is easy to see that in this definition the choice of the weak unit u is irrelevant. We will write φ = WTL_n f_n.
If f_n is not positive, WTL f_n is defined as WTL f_n⁺ − WTL f_n⁻, assuming these expressions exist. There is a compactness result for weak truncated limits:
(8.1.1) Proposition. Let (f_n) be a sequence in L_1⁺ with sup_n ‖f_n‖_1 = M < ∞. Then there is a subsequence of (f_n) that has a weak truncated limit φ, and ‖φ‖_1 ≤ M.

Proof. It suffices to consider positive f_n. Let u be a weak unit. For each k ∈ ℕ, the sequence (f_n ∧ ku)_{n∈ℕ} is bounded by ku. Thus (f_n ∧ ku) is uniformly integrable, so it has a subsequence that converges weakly, say to φ_k. By the diagonal procedure, we obtain one subsequence (f_{n_j}) such that, for each k, the sequence f_{n_j} ∧ ku converges weakly to φ_k as j → ∞. Now the sequence φ_k is increasing and bounded in norm by M, so its pointwise limit φ = lim_k φ_k belongs to L_1⁺. Thus WTL_j f_{n_j} = φ and ‖φ‖_1 ≤ M.
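A standard example showing what WTL adds over the weak limit (the example is ours): on [0, 1] with Lebesgue measure and weak unit u = 1, the functions f_n = n 1_{(0,1/n)} satisfy ‖f_n‖_1 = 1, yet f_n ∧ ku ≤ k 1_{(0,1/n)} converges to 0 in norm (hence weakly) for each fixed k, so φ_k = 0 and WTL f_n = 0. The truncated integrals can be written down exactly:

```python
# f_n = n * 1_(0,1/n) on [0,1]:  ∫ (f_n ∧ k·1) dλ = min(n, k)/n exactly,
# so each truncation tends to 0 in L1-norm while ||f_n||_1 = 1 for all n.
def truncated_integral(n, k):
    return min(n, k) / n

norms = [truncated_integral(n, n) for n in (10, 100, 1000)]   # ||f_n||_1 = 1
tails = [truncated_integral(n, 5) for n in (10, 100, 1000)]   # fixed k = 5
print(norms, tails)
```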
Next we prove a few of the basic properties of weak truncated limits. The operator WTL is additive and has the "Fatou property." The arguments will use elementary properties of operations in Banach lattices of functions, for example, (f + g) ∧ h ≤ (f ∧ h) + (g ∧ h) for nonnegative functions. This (and other similar properties) can be verified by considering the various possible pointwise relations between f, g, and h. We will also use this fact (8.4.10): if an operator T is positive in the sense that Tf ≥ 0 whenever f ≥ 0, then T is continuous with respect to the norm topology (and therefore also with respect to the weak topology).
(8.1.2) Lemma. Let f_n ≥ 0 and g_n ≥ 0. If WTL f_n = φ, WTL g_n = ψ, and WTL(f_n + g_n) = η, then φ + ψ = η.

Proof. First, (f_n + g_n) ∧ ku ≤ (f_n ∧ ku) + (g_n ∧ ku), hence η ≤ φ + ψ. For the other direction,
(f_n + g_n) ∧ 2ku ≥ (f_n ∧ ku) + (g_n ∧ ku),
which implies that η ≥ φ + ψ.
(8.1.3) Fatou's lemma. Let T be a positive operator on L_1, and let f_n ≥ 0. If WTL f_n = φ and WTL Tf_n = ψ, then Tφ ≤ ψ.

Proof. Write φ_k = w-lim_n (f_n ∧ ku), so that φ_k ↑ φ. Given k ∈ ℕ and ε > 0, there exists m so large that T(ku) ≤ mu + g, where g ≥ 0 and ‖g‖_1 < ε. Then
T(φ_k) = T(w-lim_n (f_n ∧ ku)) = w-lim_n T(f_n ∧ ku)
 ≤ w-lim_n (Tf_n ∧ T(ku))
 ≤ w-lim_n (Tf_n ∧ (mu + g))
 ≤ w-lim_n (Tf_n ∧ mu) + g
 ≤ ψ + g.
Since ‖g‖_1 < ε, and ε is arbitrarily small, it follows that Tφ_k ≤ ψ. Letting k → ∞, we conclude that Tφ ≤ ψ.
The preceding result may be abbreviated: T WTL ≤ WTL T. These results are sufficient for our needs in this chapter. A few other results are stated as Complements below.
Decomposition of set functions
Let A be an algebra of subsets of a set Ω. Sets introduced below will normally be assumed to be in A. A finite, nonnegative, finitely-additive set function ψ on A is called a charge. The set function ψ is a supercharge if instead of finite additivity only finite superadditivity is required: ψ(A ∪ B) ≥ ψ(A) + ψ(B) for each pair of disjoint sets A, B. Every supercharge ψ is monotone: A ⊆ B implies ψ(A) ≤ ψ(B). Also note by induction that if the sets A_i are pairwise disjoint, then
ψ( ⋃_{i=1}^n A_i ) ≥ Σ_{i=1}^n ψ(A_i),
and therefore (let n → ∞)
ψ( ⋃_{i=1}^∞ A_i ) ≥ Σ_{i=1}^∞ ψ(A_i).
We say that ψ is countably superadditive.
A charge ψ is called a pure charge if it does not dominate any nontrivial measure on A: if ρ is a measure and ρ ≤ ψ on A, then ρ vanishes on A. Similarly a pure supercharge is one that does not dominate any nontrivial charge. A partition of a set A is a collection of disjoint sets (in A) with union A.
(8.1.4) Theorem. Let ψ be a supercharge defined on an algebra A of subsets of Ω. Then:
(1) ψ admits a unique decomposition

(8.1.4a)   ψ = ψ_m + ψ_c + ψ_s,

where ψ_m is a measure, ψ_c is a pure charge, and ψ_s is a pure supercharge.
(2) The measure ψ_m is given by

(8.1.4b)   ψ_m(A) = inf_σ Σ_i ψ(A_i),

where inf_σ denotes the inf over all countable partitions {A_1, A_2, …} of A. The charge ψ_c is given by

(8.1.4c)   ψ_c(A) = inf_f Σ_i (ψ − ψ_m)(A_i),

where inf_f denotes the inf over all finite partitions {A_1, A_2, …, A_n} of A.
(3) If ψ is a charge, then ψ_s is 0. If ψ dominates a measure φ, then ψ_m also dominates φ.
Proof. Let ψ be a supercharge. We first show that ψ_m defined by (8.1.4b) is a measure. Suppose {A_1, A_2, …} is a countable partition of a set A. Given ε > 0, choose for each i a countable partition {A_i1, A_i2, …} of A_i such that
Σ_k ψ(A_ik) ≤ ψ_m(A_i) + ε 2^{−i}.
Then { A_ik : i = 1, 2, …; k = 1, 2, … } is a countable partition of A. Therefore
ψ_m(A) ≤ Σ_{i,k} ψ(A_ik) ≤ Σ_i [ψ_m(A_i) + ε 2^{−i}] = Σ_i ψ_m(A_i) + ε.
Since ε is arbitrary, we conclude
ψ_m(A) ≤ Σ_i ψ_m(A_i).
Next, we prove the reverse inequality. Given ε > 0, choose a countable partition {B_1, B_2, …} of A such that

(8.1.4d)   Σ_k ψ(B_k) ≤ ψ_m(A) + ε.

Then
Σ_i ψ_m(A_i) ≤ Σ_i Σ_k ψ(A_i ∩ B_k) = Σ_k [ Σ_i ψ(A_i ∩ B_k) ] ≤ Σ_k ψ(B_k),
since ψ is countably superadditive. Combining this with (8.1.4d), and taking into account the fact that ε is arbitrary, we conclude
ψ_m(A) ≥ Σ_i ψ_m(A_i).
This completes the proof that ψ_m is a measure.
The proof that ψ_c defined by (8.1.4c) is finitely additive is similar: countable partitions are replaced by finite partitions. The details are omitted.
Next we show that ψ_c is a pure charge. Suppose μ is a measure on A and μ ≤ ψ_c. Then, from (8.1.4c) we have μ ≤ ψ − ψ_m. For each set A ∈ A,
ψ_m(A) = inf_σ Σ_i ψ(A_i) ≥ inf_σ Σ_i [ψ_m(A_i) + μ(A_i)] = ψ_m(A) + μ(A),
and therefore μ(A) = 0.
The proof that ψ_s is a pure supercharge is similar.
For the uniqueness, suppose that ψ also has the decomposition ψ = φ_m + φ_c + φ_s, where φ_m is a measure, φ_c is a pure charge, and φ_s is a pure supercharge. Note that the set function
ψ′(A) = inf_f Σ_i ψ_s(A_i)
is a charge dominated by ψ_s, hence it vanishes. Similarly
φ′(A) = inf_f Σ_i φ_s(A_i)
vanishes. Therefore, applying the operation inf_f to both sides of the equation ψ_m + ψ_c + ψ_s = φ_m + φ_c + φ_s, we obtain ψ_m + ψ_c = φ_m + φ_c. Similarly, applying the operation inf_σ to this equation, we obtain ψ_m = φ_m. Therefore also ψ_c = φ_c and ψ_s = φ_s.
If ψ is a charge, then ψ_s = 0 by the uniqueness. Finally, suppose ψ dominates a measure φ. Applying the operation inf_σ to the inequality ψ ≥ φ, we obtain ψ_m ≥ φ.
If ψ is a supercharge, we will write M(ψ) for the maximal measure dominated by ψ, that is, the measure ψ_m of the theorem. Note that if the set functions ψ_1 and ψ_2 are defined on the same algebra or σ-algebra, then M(ψ_1 + ψ_2) = M(ψ_1) + M(ψ_2). Indeed, if partitions close to the infima are obtained for each set function, then the common refinement of the partitions will be at least as close for each supercharge. On the other hand, if the set functions are defined on different algebras or σ-algebras, then M is only superadditive. This is so since the infimum in the expression (8.1.4b) may correspond to different sequences of partitions for different supercharges.
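On a finite algebra, formula (8.1.4b) can be verified by exhaustive search; the following example is ours, not from the text. Take four atoms with masses w_i and the supercharge ψ(A) = μ(A)²; superadditivity is the inequality (a + b)² ≥ a² + b². Splitting a block never increases Σ_i ψ(A_i), so the infimum over partitions is attained at the partition into atoms, giving ψ_m(A) = Σ_{ω∈A} μ({ω})² (here ψ_c = 0 and ψ_s = ψ − ψ_m):

```python
from itertools import combinations

w = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}      # atom masses (our choice)

def psi(A):                                     # supercharge psi(A) = mu(A)^2
    return sum(w[i] for i in A) ** 2

def partitions(elems):
    """Enumerate all set partitions of a tuple of atoms."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in partitions(rest):
        yield [(first,)] + part                 # `first` in its own block
        for i in range(len(part)):              # or merged into a block
            yield part[:i] + [(first,) + part[i]] + part[i + 1:]

atoms = (0, 1, 2, 3)
ok = True
for r in range(1, len(atoms) + 1):
    for A in combinations(atoms, r):
        best = min(sum(psi(block) for block in p) for p in partitions(A))
        atomic = sum(w[i] ** 2 for i in A)      # predicted psi_m(A)
        ok = ok and abs(best - atomic) < 1e-12
print(ok)
```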
(8.1.5) Corollary. If ψ_1, …, ψ_n are supercharges defined on the same algebra, then M(Σ_{i=1}^n ψ_i) = Σ_{i=1}^n M(ψ_i). Hence a sum of pure charges is a pure charge.
If θ is a measure-preserving invertible point transformation and φ_i = θ^i ψ, then the corollary may be applied to conclude M(Σ_{i=1}^n θ^i ψ) = Σ_{i=1}^n θ^i M(ψ). Note that if θ is not invertible, the equality may fail, since the supercharges θ^i ψ are defined on different algebras.

Complements
In complete analogy with weak truncated limits, we can define strong truncated limits, or simply truncated limits. That is: if f_n ≥ 0, then we write φ = TL f_n iff, for some weak unit u, lim_n ‖φ_k − (f_n ∧ ku)‖_1 = 0 for all k, and φ_k ↑ φ. For general f_n, write TL f_n = TL f_n⁺ − TL f_n⁻.
(8.1.6) (Weak and strong truncated limits 0 coincide for nonnegative functions.) Suppose f_n ≥ 0. Then WTL f_n = 0 if and only if TL f_n = 0. To prove this, integrate the bounded sequence f_n ∧ ku.
A sequence (f_n) will be called TL-null if TL |f_n| = 0. For this it suffices that w-lim |f_n| ∧ u = 0. In σ-finite measure spaces, stochastic convergence is defined as convergence in measure on sets of finite measure. Thus in particular, in probability spaces, stochastic convergence is convergence in probability.
(8.1.7) (TL convergence and stochastic convergence.) Let f_n, φ ∈ L_1. Then the following are equivalent:
(1) TL f_n = φ.
(2) f_n − φ is TL-null.
(3) f_n converges to φ stochastically.
(8.1.8) (Sharpening of Proposition (8.1.1).) If f_n ∈ L_1⁺, sup_n ‖f_n‖_1 < ∞, then there is a subsequence (f_{k_n}) of (f_n) such that f_{k_n} = g_n + h_n, where g_n, h_n ∈ L_1⁺, the sequence (g_n) converges weakly, and the h_n have disjoint supports (hence TL h_n = 0).

Remarks
The notion of truncated limits is particularly well suited for the study of a large class of Banach lattices E, namely those satisfying the following two conditions:
(A) There is a weak unit; that is, an element u ∈ E⁺ such that if f ∈ E⁺ and f ∧ u = 0, then f = 0.
(B) Every norm-bounded increasing sequence in E converges in norm.
Assumptions equivalent with (B) are: (B′) E is weakly sequentially complete; (B″) E contains no subspace isomorphic to c_0. Condition (B) implies that E is order continuous, hence order intervals are compact and a weak unit exists if E is separable; thus the assumption (A) is not an important loss of generality (see Lindenstrauss & Tzafriri [1979]). Truncated limits exist in Banach lattices satisfying (A) and (B); the proof is the same as the L_1 proof given here. The method of truncated limits was developed by Akcoglu & Sucheston [1978], [1983]. The decomposition (8.1.8) of Dacunha-Castelle & Schreiber [1974] could be used instead in the real case, but this decomposition does not extend to Banach lattices satisfying (A) and (B); see Akcoglu & Sucheston [1984a]. Theorem (8.1.4), applied also in Chapter 4, is from Sucheston [1964]. For the purpose of the present chapter, the Yosida & Hewitt [1953] decomposition, in which ψ is assumed to be a charge, is sufficient.
8.2. Weak maximal inequalities

There exist important ergodic results for operators acting on L_1. One such result is E. Hopf's maximal ergodic theorem. We will first give a version of Hopf's maximal theorem for lattices.
(8.2.1) Lemma. Let L be a linear space of measurable functions with values in (−∞, ∞) such that L is a lattice under pointwise operations, let T be a positive linear operator on L, and let h ∈ L. For N ∈ ℕ, let
h_N = max_{1≤n≤N} Σ_{i=0}^{n−1} T^i h
and B_N = {h_N > 0}. Then
h 1_{B_N} ≥ h_N⁺ − T(h_N⁺).

Proof. Since T is positive, it satisfies T(f ∨ g) ≥ Tf ∨ Tg. Thus
T(h_N⁺) ≥ T(0) ∨ T(h_N) ≥ 0 ∨ (Th) ∨ (Th + T²h) ∨ ⋯ ≥ h_{N+1} − h ≥ h_N − h.
For ω ∈ B_N, we have h_N(ω) = h_N⁺(ω), so
h(ω) 1_{B_N}(ω) = h(ω) ≥ h_N(ω) − T(h_N⁺)(ω) = h_N⁺(ω) − T(h_N⁺)(ω).
For ω ∉ B_N, we have h_N⁺(ω) = 0, so
h(ω) 1_{B_N}(ω) = 0 ≥ 0 − T(h_N⁺)(ω) = h_N⁺(ω) − T(h_N⁺)(ω).
Let T be a positive operator on L_1. Obviously T extends to the cone of nonnegative measurable functions by
Tf = lim_n Tf_n,   where f_n ∈ L_1⁺ and f_n ↑ f.

(8.2.2) Theorem (Hopf's maximal ergodic theorem). Let T be a positive contraction on L_1, and let h be a measurable extended real-valued function such that h⁺ is integrable. For N ∈ ℕ, let
h_N = max_{1≤n≤N} Σ_{i=0}^{n−1} T^i h,
B_N = {h_N > 0}, and B_∞ = ⋃_{N=1}^∞ B_N. Then
∫_{B_∞} h dμ ≥ 0.
Proof. Since T is a positive contraction, we have
∫ T(h_N⁺) dμ = ‖T(h_N⁺)‖_1 ≤ ‖h_N⁺‖_1 = ∫ h_N⁺ dμ.
The sequence h_N increases, so the sequence B_N increases. Applying Lemma (8.2.1) and the monotone convergence theorem, we have
∫_{B_∞} h dμ ≥ lim inf_N ∫_{B_N} (h_N⁺ − T(h_N⁺)) dμ ≥ 0.
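The theorem is easy to check by hand on a finite space; the permutation and the function below are our choices, not from the text. Take counting measure on {0, …, 5} and Tf = f ∘ θ for a permutation θ, a positive L_1-contraction; since θ has finite order, membership in B_∞ is decided after finitely many steps:

```python
# Hopf's maximal ergodic theorem on a finite space: T f = f ∘ theta for a
# permutation theta (counting measure), so T is a positive L1-contraction.
theta = [1, 2, 0, 4, 5, 3]                      # two 3-cycles
h = [1.0, -2.0, 0.5, -0.25, 3.0, -1.5]

def T(f):                                       # (Tf)(x) = f(theta[x])
    return [f[theta[x]] for x in range(len(f))]

N = 30
partial = [0.0] * len(h)
h_max = [float("-inf")] * len(h)
g = list(h)
for _ in range(N):
    partial = [p + v for p, v in zip(partial, g)]   # partial sums of T^i h
    h_max = [max(m, p) for m, p in zip(h_max, partial)]
    g = T(g)

B = [x for x in range(len(h)) if h_max[x] > 0]      # B_infinity
integral = sum(h[x] for x in B)                     # ∫_B h dμ
print(B, integral)
```

Here state 1 never enters B_∞ (its orbit sums stay negative), and the integral over B_∞ is nonnegative, as the theorem asserts.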
If f and g are functions, we will study the behavior of the ratios
R_n(f, g) = ( Σ_{i=0}^{n−1} T^i f ) / ( Σ_{j=0}^{n−1} T^j g ).
For example, if 1 is the function identically 1, and T1 = 1, then R_n(f, 1) is the standard Cesàro mean
(1/n) Σ_{i=0}^{n−1} T^i f.
Recall that a σ-finite measure γ on F is said to be equivalent to μ iff μ(A) = 0 holds if and only if γ(A) = 0. Then the Radon–Nikodym derivative g = dγ/dμ satisfies 0 < g < ∞ a.e.
(8.2.3) Proposition. Let f and g be measurable functions. Suppose f is integrable, and 0 < g < ∞. Write γ = gμ. Let s = sup_{n∈ℕ} R_n(f, g). Then, for all λ > 0, we have

(8.2.3a)   γ{s > λ} ≤ (1/λ) ∫_{{s>λ}} (f/g) dγ.

Proof. Given λ > 0, let h = f − λg. Then the set
{s > λ} = ⋃_{n=1}^∞ { Σ_{i=0}^{n−1} T^i f > λ Σ_{j=0}^{n−1} T^j g }
is the set B_∞ of Theorem (8.2.2), so we have
∫_{{s>λ}} (f − λg) dμ ≥ 0,
and therefore
λ γ{s > λ} = λ ∫_{{s>λ}} g dμ ≤ ∫_{{s>λ}} f dμ = ∫_{{s>λ}} (f/g) dγ.
This yields the required inequality.
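For the Markovian operator Tf = f ∘ θ of a permutation θ (counting measure) and g = 1, we have γ = μ and R_n(f, 1) = A_n f, so (8.2.3a) can be verified directly; the data below are our choices:

```python
# Check (8.2.3a) with T f = f ∘ theta, g = 1 (so gamma = mu = counting
# measure):  mu{s > lam} <= (1/lam) * ∫_{s>lam} f dmu.
theta = [1, 2, 0, 4, 5, 3]
f = [2.0, 0.0, 1.0, 0.0, 0.0, 4.0]
lam = 1.5

m = len(f)
s = [float("-inf")] * m
partial, iterate = [0.0] * m, list(f)
for n in range(1, 61):                 # since T1 = 1, R_n(f, 1) = A_n f
    partial = [p + v for p, v in zip(partial, iterate)]
    s = [max(a, p / n) for a, p in zip(s, partial)]
    iterate = [iterate[theta[x]] for x in range(m)]

S = [x for x in range(m) if s[x] > lam]
lhs = float(len(S))                    # gamma{s > lam}
rhs = sum(f[x] for x in S) / lam       # (1/lam) * ∫_{s>lam} f dmu
print(S, lhs, rhs)
```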
The Cesàro averages of the powers of an operator will often be written as follows:
A_n = A_n(T) = (1/n) Σ_{i=0}^{n−1} T^i.

(8.2.4) Definition. An operator T is said to be L_p mean bounded if
sup_n ‖A_n(T)‖_p < ∞.

Note that L_∞ mean boundedness for a positive operator T can be rewritten: sup_n ‖A_n‖_∞ ≤ c if and only if sup_n A_n 1 ≤ c. If T satisfies this, and we use g = 1 in the proposition, we have γ = μ and
{s > λ} = ⋃_{n=1}^∞ { Σ_{i=0}^{n−1} T^i f > λ Σ_{j=0}^{n−1} T^j 1 } ⊇ ⋃_{n=1}^∞ {A_n f > λc}.
Thus, given a nonnegative function f, the maximal function
f* = sup_n A_n f
satisfies the three-function inequality

(8.2.4a)   μ{f* > cλ} ≤ (1/λ) ∫_{{s>λ}} f dμ.

(See Section 3.1 for sample consequences of this.)
We will discuss replacing s in this inequality by f. In the next lemma, we use a sequence T_n of positive operators, which will be chosen to be A_n for our application. Recall that R_0 is the heart of the largest Orlicz space L_1 + L_∞. Equivalently, R_0 may be defined as the space of all measurable functions f such that f 1_{{|f|>λ}} ∈ L_1 for each λ > 0 (Proposition (2.2.14)).
(8.2.5) Lemma. Let T_n be a sequence of positive linear operators on L_1 + L_∞ such that sup_n ‖T_n‖_∞ = c < ∞. Assume for each f ∈ L_1 and λ > 0 that
μ{sup_n T_n f > cλ} ≤ (1/λ) ‖f‖_1.
Then for each f ∈ R_0 and λ > 0,
μ{sup_n T_n f > 2cλ} ≤ (1/λ) ∫_{{|f|>λ}} |f| dμ.

Proof. Let f ∈ R_0 be given. For λ > 0, write f^λ = f 1_{{|f|>λ}}. Then we have |f| ≤ |f^λ| + λ, so
sup_n T_n f ≤ sup_n T_n |f^λ| + cλ.
Therefore
μ{sup_n T_n f > 2cλ} ≤ μ{sup_n T_n |f^λ| > cλ} ≤ (1/λ) ‖f^λ‖_1 = (1/λ) ∫_{{|f|>λ}} |f| dμ.
We may now obtain a maximal inequality by applying Lemma (8.2.5) to (8.2.4a).
(8.2.6) Theorem. Let T be a positive linear operator on L_1 + L_∞ such that ‖T‖_1 ≤ 1 and sup_n ‖A_n(T)‖_∞ = c < ∞. Then for each f ∈ R_0⁺ and each λ > 0, the maximal function f* = sup_n A_n f satisfies
μ{f* > 2cλ} ≤ (1/λ) ∫_{{f>λ}} f dμ.
The case when c = 1 deserves special mention. Then T is a contraction in L_∞ (as well as L_1). The following classical result is the maximal lemma usually used to obtain the Hopf–Dunford–Schwartz ergodic theorem (see, for example, Krengel [1985], p. 51).

(8.2.7) Theorem. Let T be a positive operator on L_1 + L_∞ that is a contraction on L_1 and on L_∞. Then for f ∈ L_1⁺ and λ > 0, the maximal function f* = sup_n A_n f satisfies
μ{f* > λ} ≤ (1/λ) ∫_{{f*>λ}} f dμ.
Proof. Hopf's maximal ergodic theorem (8.2.2) can be applied to the function h = f − λ, since h⁺ is integrable. Then ∫_{B_∞} (f − λ) dμ ≥ 0, where
B_∞ = ⋃_{N=1}^∞ { max_{1≤n≤N} Σ_{i=0}^{n−1} T^i (f − λ) > 0 }.
But B_∞ ⊇ {f* > λ} ⊇ {f > λ}, so f − λ ≤ 0 on B_∞ \ {f* > λ}. Therefore
∫_{{f*>λ}} (f − λ) dμ ≥ ∫_{B_∞} (f − λ) dμ ≥ 0,
which yields the required inequality.

Complements
(8.2.8) Theorem (8.2.2) remains true if the linearity of T is replaced by the assumptions
T0 = 0,   T(f + g) ≥ Tf + Tg.
(8.2.9) (Point transformations.) A measurable transformation θ maps Ω to Ω and is such that θ^{−1} F ⊆ F. If also μ(θ^{−1} A) = μ(A) for each A ∈ F, then θ is called measure preserving, or an endomorphism. An endomorphism θ defines an operator T by Tf = f ∘ θ. Such an operator is Markovian: it is positive and preserves the integral, ∫ f dμ = ∫ f ∘ θ dμ.

Remarks
Theorem (8.2.2) is due to E. Hopf [1954]. A great simplification of the proof is due to Garsia [1965].
An extension of Hopf's lemma to the nonlinear setting was given by Lin & Wittmann [1991].
8.3. Hopf's decomposition

We will next consider Hopf's decomposition of the measure space Ω into "conservative" and "dissipative" parts. Let T be a positive operator on L_1. The corresponding potential operator T_P is defined by

T_P f = Σ_{i=0}^{∞} T^i f = lim_{n→∞} Σ_{i=0}^{n-1} T^i f.

Pointwise convergence (possibly to ∞) of the series holds at least for functions f ≥ 0. We will obtain the Hopf decomposition of the space Ω into the conservative part C and the dissipative part D, defined as follows: if f ∈ L_1^+, then T_P f = 0 or ∞ on C, and T_P f < ∞ on D. We say that T is a conservative operator if Ω = C, and a dissipative operator if Ω = D.

(8.3.1) Theorem. Let T be a positive contraction on L_1. Then Ω uniquely decomposes as a disjoint union C ∪ D as above.
Proof. For nonnegative integrable functions f, g, define

E(f, g) = {T_P f = ∞, T_P g < ∞}.

Then T_P(f − λg) = ∞ on E(f, g) for all λ > 0. Applying Hopf's maximal ergodic theorem (8.2.2) to the function h = f − λg, we obtain (since B_∞ ⊇ E(f, g)):

0 ≤ ∫_{B_∞} (f − λg) dμ ≤ ∫_Ω f dμ − λ ∫_{E(f,g)} g dμ.

Since this is true for all λ > 0, we have ∫_{E(f,g)} g dμ = 0.
Now T_P g = ∞ if and only if T_P(T^n g) = ∞, so E(f, g) = E(f, T^n g). Hence we have ∫_{E(f,g)} T^n g dμ = 0 for all n. This shows that on E(f, g), we have T^n g = 0 for all n. Thus, at each point ω with T_P f(ω) = ∞, we have either T_P g(ω) = ∞ or T_P g(ω) = 0. Since the measure μ is σ-finite, we may choose an integrable function f_0 > 0 a.e. Let C = {T_P f_0 = ∞} and D = {T_P f_0 < ∞}. By the argument just given, with f = f_0, we see that C is as required for the conservative part. To see that D is as required for the dissipative part, let h ≥ 0 be integrable, and apply the previous discussion with f = h and g = f_0. At each point where T_P h = ∞, we have either T_P f_0 = ∞ or T_P f_0 = 0. But T_P f_0 ≥ f_0 > 0, so T_P h < ∞ on D.
Clearly, if there is a strictly positive integrable function g such that Tg = g, then T is conservative.
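The decomposition can be watched numerically in a toy example (my own construction, not from the text): on Ω = {0, 1, 2} with counting measure, let T move mass according to a substochastic kernel p(y, x) via (Tf)(x) = Σ_y p(y, x)·f(y). If state 0 leaks mass and is never re-entered, while states 1 and 2 exchange mass forever, then partial sums of the potential T_P f stay bounded at 0 (the dissipative part) and diverge at 1 and 2 (the conservative part):

```python
import numpy as np

# p[y, x] = mass sent from y to x; row sums <= 1, so T is sub-Markovian
p = np.array([[0.0, 0.5, 0.0],    # state 0 leaks half its mass, nothing flows into it
              [0.0, 0.0, 1.0],    # states 1 and 2 exchange mass forever
              [0.0, 1.0, 0.0]])
T = p.T                            # (Tf)(x) = sum_y p[y, x] * f(y)

f = np.ones(3)                     # a strictly positive integrable f0
partial, x = np.zeros(3), f.copy()
for _ in range(200):               # partial sums of the potential T_P f = sum_i T^i f
    partial += x
    x = T @ x

# D = {0}: potential stays finite.  C = {1, 2}: potential diverges.
assert abs(partial[0] - 1.0) < 1e-12   # only the i = 0 term contributes at state 0
assert partial[1] > 100 and partial[2] > 100
```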
Complements
(8.3.2) (Point transformations.) Let T be the Markovian operator defined by an endomorphism θ of a probability space (Ω, F, μ). Then T is conservative, since T1 = 1.

Remarks
The decomposition Theorem (8.3.1) for invertible point transformations appears in E. Hopf [1937], and for positive contractions on L_1 in E. Hopf [1954].
8.4. The σ-algebra of absorbing sets

A subset A of Ω is called absorbing if (under T) no mass leaves A. Formally:

(8.4.1) Definition. If A ⊆ Ω, write

L_1^+(A) = {f ∈ L_1 : f ≥ 0, f = 0 outside A}.

A set A ∈ F is absorbing if Tf ∈ L_1^+(A) for all f ∈ L_1^+(A).

It will be proved that the conservative part C is absorbing. The absorbing subsets of C figure prominently in the identification of the limit in the ergodic theorem (see, for example, (8.5.3)). Therefore it will be useful to give several characterizations of this class. We will write C for the class of all absorbing subsets of the conservative part C. The first characterization of C is in terms of the potential operator T_P.
(8.4.2) Lemma. A subset A of C is absorbing if and only if it has the form {T_P f = ∞} for some f ∈ L_1^+. The conservative part C is absorbing.
Proof. For f ∈ L_1^+, write C_f = {T_P f = ∞}. Let A be an absorbing subset of C. Let f be an integrable function, 0 outside A and strictly positive inside A. Then T_P f is 0 outside A, since A is absorbing, and T_P f is positive inside A, hence ∞ there, since A ⊆ C. Therefore A = C_f.

Conversely, suppose A = C_f. Let g ∈ L_1^+ be 0 on A and strictly positive outside A. Since T_P f is finite outside A, we can write Ω \ A as a disjoint union of sets G_i = {ig ≤ T_P f < (i+1)g} of finite measure. Now assume (for purposes of contradiction) that A is not absorbing. Then there is h ∈ L_1^+(A) such that Th is not supported by A. Then for some i, the set G = G_i satisfies ‖1_G·Th‖_1 = a > 0. For each k ∈ ℕ, we have ‖1_G·T(kh)‖_1 = ka. Let (f_n) be any sequence in L_1^+ such that f_n increases to ∞ on A. Then lim_n ‖(kh ∧ f_n) − kh‖_1 = 0, so that

lim_n ‖T((kh ∧ f_n) − kh)‖_1 = 0,

and hence lim_n ∫_G T(kh ∧ f_n) dμ = ∫_G T(kh) dμ = ka; therefore lim_n ∫_G T f_n dμ ≥ ka. Since k was arbitrary, lim_n ∫_G T f_n dμ = ∞.
We now obtain a contradiction by constructing a sequence (f_n) for which this relation fails. We know that T_P f = ∞ on A, so

f_n = Σ_{j=0}^{n-1} T^j f

increases to ∞ on A. Then f_n ≤ T_P f and T f_n ≤ T_P f, so we have ∫_G T f_n dμ ≤ (i+1) ∫_G g dμ < ∞. This completes the proof that A is absorbing.
Finally, C = C_f for any strictly positive integrable function f, so C is absorbing.
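In the same spirit (again a toy illustration of mine, not the book's), absorbency is directly checkable for a finite substochastic kernel: for the closed class A = {1, 2} below, Tf remains supported in A for every f supported in A, and the partial sums of T_P f diverge exactly on A, matching the form {T_P f = ∞} of Lemma (8.4.2):

```python
import numpy as np

# p[y, x] = mass sent from y to x; (Tf)(x) = sum_y p[y, x] * f(y)
p = np.array([[0.0, 0.5, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
T = p.T

f = np.array([0.0, 0.3, 0.7])      # f >= 0, supported in A = {1, 2}
assert (T @ f)[0] == 0.0           # Tf is again supported in A: A is absorbing

partial, x = np.zeros(3), f.copy()
for _ in range(200):               # partial sums of T_P f
    partial += x
    x = T @ x
assert partial[0] == 0.0 and min(partial[1], partial[2]) > 10   # A = {T_P f = infinity}
```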
The next characterization of C will involve the adjoint T* of the operator T. Since T is an operator on L_1, the adjoint T* is an operator on L_∞, characterized by the duality relation

∫ (Tf)·g dμ = ∫ f·(T*g) dμ,  f ∈ L_1, g ∈ L_∞.

Since we are assuming that T is a positive operator, it follows that T* is also a positive operator. We are also assuming that T is a contraction in L_1, so T* is a contraction in L_∞. Thus T*1 ≤ 1. Now the operator T* is monotonely continuous, that is, if f_n ↑ f ∈ L_∞, then T*f_n ↑ T*f. (See (8.4.8).) Therefore, T* (like T) extends uniquely to the convex cone of finite, nonnegative measurable functions, and the extension retains the monotone continuity (8.4.9). We will write T*_P for the potential operator defined from the adjoint operator T*.
(8.4.3) Lemma. Let A ∈ C, and let h ∈ L_∞ be nonnegative on A. Then T*_P h has only the values 0 and ∞ on A. In particular, T*_P h = 0 or ∞ on C for every function h ∈ L_∞ such that h ≥ 0 on C.

Proof. Using the duality relation, we see that if A is absorbing for T, then Ω \ A is absorbing for T*. Therefore 1_A·T*h = 1_A·T*(h·1_A), and by induction it follows that T*^k h ≥ 0 on A for all k ∈ ℕ. If the lemma fails, then there exist a positive number b and a set B ⊆ A such that 0 < T*_P h ≤ b on B and 0 < μ(B) < ∞. Then for each k we have, by the duality relation,

∞ > b·μ(B) ≥ ∫ Σ_{i=k}^{∞} T*^i h·1_B dμ = ∫ T*^k h·T_P 1_B dμ ≥ 0.

Since B ⊆ C, we have T_P 1_B = ∞ on B, and the finiteness of the integral therefore implies that T*^k h = 0 on B. It follows that T*_P h = 0 on B, which is a contradiction.
(8.4.4) Lemma. Let A ∈ C, and let g be a finite, positive, measurable function. If T*g ≤ g on A or T*g ≥ g on A, then T*g = g on A.

Proof. Suppose T*g ≤ g on A, and assume first that g ∈ L_∞^+. Set h = g − T*g. Then

Σ_{i=0}^{n} T*^i h = g − T*^{n+1} g ≤ g < ∞.

By Lemma (8.4.3), we have T*_P h = 0 on A, so h = 0 on A; that is, g = T*g on A. Now suppose g is unbounded and T*g ≤ g on A. For a constant λ, let g' = g ∧ λ. Then T*g' ≤ T*g ∧ T*λ ≤ g ∧ λ = g' on A. By the previous argument, T*g' = g' on A. If we let λ ↑ ∞, we obtain by the monotone continuity of T* that T*g = g on A. If T*g ≥ g on A, proceed in a similar manner, using h = T*g' − g' and the fact that T*^i g' is bounded in L_∞.

The following theorem states the main results about C.
(8.4.5) Theorem.
(1) The class C of absorbing subsets of C is a σ-algebra of subsets of C.
(2) The class C is the class of all sets of the form C_f = {T_P f = ∞}, where f ∈ L_1^+.
(3) The class C is the class of all subsets A of C such that T*1_A = 1_A on C.
(4) A nonnegative measurable function h on C is C-measurable if and only if T*h = h on C.
(5) A function h ∈ L_∞(C) is C-measurable if and only if T*h = h on C.

Proof. Let Z denote the class of subsets A of C such that T*1_A = 1_A on C. First, T*1 ≤ 1, so T*1_C ≤ 1_C on C; hence by Lemma (8.4.4), we have T*1_C = 1_C on C. More generally, the inequality T*1_A ≤ 1_A holds on A, so if A ∈ C, we have T*1_A = 1_A on A. Let A ∈ C and B = C \ A; then B is also absorbing. Taking differences and using T*1_C = 1_C on C, we obtain that T*1_B = 1_B on A. But T*1_B ≤ 1_B also holds on B, hence T*1_B ≤ 1_B on C. Subtracting this from T*1_C = 1_C, we obtain T*1_A ≥ 1_A on C, and then by Lemma (8.4.4), T*1_A = 1_A on C. Thus C ⊆ Z. If A ∈ Z and B = C \ A, then T*1_B = T*1_C − T*1_A = 1_C − 1_A = 1_B on C; thus Z is closed under complementation. If A, B ∈ Z, then on C we have

1_A + 1_B = T*(1_A + 1_B) = T*(1_{A∩B} + 1_{A∪B}) = T*1_{A∩B} + T*1_{A∪B},

while T*1_{A∩B} ≤ T*1_A ∧ T*1_B = 1_{A∩B} on C. So we have equality, and therefore T*1_{A∩B} = 1_{A∩B} and T*1_{A∪B} = 1_{A∪B} on C. This shows that Z is an algebra. The operator T* is monotonely continuous, so Z is closed under increasing unions. Therefore Z is a σ-algebra.

Suppose A ∈ Z. Then also B = C \ A ∈ Z, so T*1_B = 1_B on C. Now if f ∈ L_1^+(A), then

∫_B Tf dμ = ∫ T*1_B·f dμ = ∫ 1_B·f dμ = 0,

so Tf is supported in A (recall that C itself is absorbing, by Lemma (8.4.2)). Therefore A ∈ C. We have shown that Z ⊆ C, and hence that C = Z.

Let H denote the convex cone of finite, positive, measurable functions h on C such that T*h = h on C. Observe that H is closed under infimum (and supremum). Indeed, T*(h ∧ h') ≤ T*h ∧ T*h' = h ∧ h' on C, so by Lemma (8.4.4), T*(h ∧ h') = h ∧ h' on C. Now we claim that a positive, finite, measurable function h belongs to H if and only if h is C-measurable. Indeed, if h is C-measurable, then it is the limit of an increasing sequence of linear combinations of indicator functions of sets in C, so h ∈ H by the monotone continuity of T*. Conversely, since 1_{h>a} is the limit of the increasing sequence 1 ∧ n(h − a)^+, we see that if h ∈ H, then all sets of the form {h > a} are in C, so h is C-measurable.

Finally, assume that h is in L_∞(C). If h is C-measurable, then so are h^+ and h^−, so T*h = h on C by the preceding part. Conversely, if T*h = h on C, then T*(h^+) = T*h + T*(h^−) ≥ T*h = h^+ on the support of h^+, hence T*(h^+) ≥ h^+ on C. Again by Lemma (8.4.4), it follows that T*(h^+) = h^+ on C. Thus h^+ is C-measurable. Similarly, h^− is C-measurable.
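Characterization (3) can also be observed in the finite toy setting used earlier (an illustration of mine, not the book's): with counting measure, the adjoint of T is the transposed matrix, and T*1_A = 1_A holds on the conservative part exactly for absorbing A:

```python
import numpy as np

p = np.array([[0.0, 0.5, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
T = p.T                      # T on L1; its adjoint w.r.t. counting measure is T* = p
Tstar = p
C = [1, 2]                   # conservative part in this toy example

ind = np.array([0.0, 1.0, 1.0])          # indicator of A = {1, 2}, an absorbing subset of C
out = Tstar @ ind
assert np.allclose(out[C], ind[C])       # T* 1_A = 1_A on C, as in Theorem (8.4.5)(3)

ind2 = np.array([0.0, 1.0, 0.0])         # A = {1} is not absorbing: mass moves 1 -> 2
out2 = Tstar @ ind2
assert not np.allclose(out2[C], ind2[C])
```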
(8.4.6) Definition. Let T be a positive linear contraction on L_1, and let A ∈ C. The operator induced by T on A is the operator T_A on L_1(A) defined by

T_A f = T(f·1_A).

The induced operator T_A is conservative and Markovian (that is, it preserves integrals). The powers of T_A are given by

T_A^k f = T^k(f·1_A).

The adjoint T_A*, defined on L_∞(A), satisfies

T_A*^k(h) = T*^k(h·1_A)

on A. The σ-algebra of absorbing sets of T_A is A ∩ C. All these statements follow from the preceding material, using the fact that A is absorbing.

The operator T acts like the conditional expectation in that C-measurable functions may be factored out. Thus the appearance of E^C in the limit theorems for T is not surprising.
(8.4.7) Proposition. Let f ∈ L_1(C), and let h ∈ L_∞(C) be C-measurable. Then T(h·f) = h·T(f).

Proof. Since T is monotonely continuous and linear, it is enough to consider the case h = 1_A, where A ∈ C. Since A is absorbing, T(1_A f) = 1_A T(1_A f). Now B = C \ A is also absorbing, so 0 = 1_A T(1_B f). Adding, we get T(1_A f) = 1_A T(f).

Complements
(8.4.8) (Monotone continuity.) We write f_n ↑ f if (f_n) is a sequence of functions, f_n(ω) ≤ f_{n+1}(ω) for all n and almost all ω, and f(ω) = lim_n f_n(ω) for almost all ω. A similar definition can be given for f_n ↓ f. A positive operator T is monotonely continuous on the space E of functions iff f_n ↑ f ∈ E implies T f_n ↑ T f. By considering the functions f − f_n, we see that monotone continuity can be characterized by: f_n ↓ 0 implies T f_n ↓ 0. Every positive operator on L_p is monotonely continuous for 1 ≤ p < ∞, and every adjoint positive operator on L_∞ is monotonely continuous. The proofs are similar. Consider an adjoint operator T on L_∞, say T = S*, where S is an operator on L_1. If f_n ↓ 0 in L_∞, then T f_n decreases to some nonnegative limit h. We must show h = 0 a.e. If g ∈ L_1^+, then

0 ≤ ∫ h·g dμ ≤ ∫ (T f_n)·g dμ = ∫ f_n·Sg dμ → 0.

This holds for all g ∈ L_1^+, so h = 0 a.e. (The relevant property of L_p is "order continuity"; see Lemma (9.1.1).) There is a converse in the case of L_∞: if S: L_∞ → L_∞ is monotonely continuous, then S = T* for some T: L_1 → L_1.
(8.4.9) (Operator extension.) A positive operator defined on L_p can be extended, preserving monotone continuity, to the convex cone of all nonnegative measurable functions (see also Neveu [1965a], p. 188).

(8.4.10) (Continuity.) Every positive operator on a Banach lattice is automatically norm continuous. See (9.1.1), below (or Lindenstrauss & Tzafriri [1979], p. 2).

(8.4.11) (Point transformations.) Let T be the Markovian operator defined by an endomorphism θ of a probability space (Ω, F, μ). A set A is absorbing if and only if A = θ^{-1}A a.e. So in this case, C is the σ-algebra of invariant sets.

(8.4.12) (Monotone continuity.) In an order-continuous Banach lattice E, if f_n ↑ f ∈ E, then ‖f_n − f‖ → 0 (see Section 9.1 or Lindenstrauss & Tzafriri [1979], Proposition 1.a.8). Thus positive operators are monotonely continuous. For example, positive operators on L_p spaces, 1 ≤ p < ∞, are monotonely continuous.

Remarks

Study of the σ-algebra C goes back to E. Hopf [1954].
8.5. The Chacon–Ornstein theorem (conservative case)

This famous theorem asserts the convergence of ratios of the form

R_n = R_n(f, g) = Σ_{i=0}^{n-1} T^i f / Σ_{i=0}^{n-1} T^i g

for f, g ∈ L_1, with g ≥ 0. The case where T is conservative will be treated first; the nonconservative case will be done later (8.6.10), in the framework of superadditive processes. As is customary in ergodic proofs, we establish convergence for larger and larger classes of functions. If f is of the form s − Ts, where s ∈ L_1, then the numerator "telescopes," and we have

R_n(f, g) = s / Σ_{i=0}^{n-1} T^i g − T^n s / Σ_{i=0}^{n-1} T^i g.

The first term clearly converges to 0 on C. We will show that the second term converges to 0 on Ω.

(8.5.1) Lemma. Let s ∈ L_1^+, g ∈ L_1, g > 0. Then

T^n s / Σ_{i=0}^{n-1} T^i g

converges to 0 a.e.
Proof. Fix ε > 0; let h_n = T^n s − ε Σ_{i=0}^{n-1} T^i g, h_0 = s, and A_n = {h_n > 0}. Now γ = gμ is a finite measure equivalent to μ. We claim that Σ_n γ(A_n) < ∞. To see this, first compute

∫_{A_{n+1}} T h_n dμ ≤ ∫_{A_{n+1}} T h_n^+ dμ ≤ ∫_Ω T h_n^+ dμ ≤ ∫_Ω h_n^+ dμ = ∫_{A_n} h_n dμ.

Note that εg = T h_n − h_{n+1}, so that

ε ∫_{A_{n+1}} g dμ = ∫_{A_{n+1}} T h_n dμ − ∫_{A_{n+1}} h_{n+1} dμ ≤ ∫_{A_n} h_n dμ − ∫_{A_{n+1}} h_{n+1} dμ.

The sum telescopes:

ε Σ_{n=1}^{∞} γ(A_n) = ε Σ_{n=1}^{∞} ∫_{A_n} g dμ ≤ ∫_{A_0} h_0 dμ ≤ ∫ s dμ < ∞.

Therefore, by Borel–Cantelli we have γ(lim sup A_n) = 0, and therefore μ(lim sup A_n) = 0. Thus, a.e., T^n s ≤ ε Σ_{i=0}^{n-1} T^i g for all large n; since ε > 0 was arbitrary, the ratio T^n s / Σ_{i=0}^{n-1} T^i g converges to 0 a.e.

We will need an approximation lemma. Functions that are close in L_1 norm have similar pointwise ratio behavior. This will allow us to derive the ratio theorem on L_1 from that on a dense subspace. More precisely, we have:
(8.5.2) Lemma. Let g ∈ L_1, g > 0 a.e., and let e_k ∈ L_1, k ∈ ℕ, satisfy Σ_k ‖e_k‖_1 < ∞. Then lim_k sup_n R_n(e_k, g) = 0.

Proof. Write, as usual, γ = gμ. Fix λ > 0, and apply Proposition (8.2.3): if s_k = sup_n R_n(|e_k|, g), then

λ·γ{s_k > λ} ≤ ∫_Ω |e_k| dμ = ‖e_k‖_1.

Therefore Σ_k γ{s_k > λ} < ∞. By Borel–Cantelli, we have s_k ≤ λ except for finitely many k, so lim sup_k s_k ≤ λ. Since λ > 0 was arbitrary, lim_k s_k = 0. Since |R_n(e_k, g)| ≤ R_n(|e_k|, g), we have the required result.

We will now assume that the operator T is conservative (C = Ω). Since T is sub-Markovian, we have ‖T*‖ ≤ 1, hence T*1 ≤ 1. But T is conservative, so this means that T*1 = 1; thus T preserves the integral, that is, T is Markovian. The σ-algebra C is used to identify the limit of R_n(f, g). (See Section 2.3 for a discussion of conditional expectations on infinite measure spaces.)
(8.5.3) Proposition. Let T be a conservative Markovian operator. Let f, g ∈ L_1 with g > 0 a.e. Let γ = gμ be a finite equivalent measure. Then the ratio R_n(f, g) converges a.e. to the finite limit

R(f, g) = E_γ^C[f/g].

Proof. We may assume f ≥ 0. If f is of the form h·g, where h ∈ L_∞ and T*h = h, then by Proposition (8.4.7), we have T^n(h·g) = h·T^n g, so clearly R_n(f, g) = h. But also, since h is C-measurable, E_γ^C[f g^{-1}] = E_γ^C[h] = h. If f is of the form s − Ts, where s ∈ L_1, then by Lemma (8.5.1), we have R_n(f, g) → 0 a.e. But also, for A ∈ C, we have

∫_A (f/g) dγ = ∫ 1_A·f dμ = ∫ 1_A·(s − Ts) dμ = ∫ (1_A − T*1_A)·s dμ = 0,

so that E_γ^C[f g^{-1}] = 0. Thus the theorem is true for all f in the linear space E_g spanned by functions of these two forms.

We claim that E_g is dense in L_1. By the Hahn–Banach theorem (5.1.2), it suffices to prove: if k ∈ L_∞ and ∫ kf dμ = 0 for all f ∈ E_g, then k = 0 a.e. So suppose k is such a function. Then we have, for all s ∈ L_1,

∫ (k − T*k)·s dμ = ∫ k·(s − Ts) dμ = 0,

so that T*k = k. Thus we have k·g ∈ E_g, so 0 = ∫ k·k·g dμ, so that k = 0 since g > 0. This completes the proof that E_g is dense in L_1.
Finally, let f be a general element of L_1. Choose a sequence f_k ∈ E_g with Σ_{k=1}^{∞} ‖f − f_k‖_1 < ∞. Write e_k = f − f_k and s_k = sup_n R_n(|e_k|, g). Consider the inequality

|R_n(f, g) − R(f, g)| ≤ |R_n(f_k, g) − R(f_k, g)| + s_k + E_γ^C[|e_k| g^{-1}].

The first term on the right converges to 0 for each fixed k as n → ∞, because f_k ∈ E_g. The second term converges to 0 as k → ∞ by Lemma (8.5.2). The third term converges to 0 as k → ∞, since

Σ_k ∫ E_γ^C[|e_k| g^{-1}] dγ = Σ_k ‖e_k‖_1 < ∞.
The formulas for the conditional expectation from Section 2.3 may be applied to this limit R(f, g). For example, we have

R(f, g) = E_μ^C[f] / E_μ^C[g]

whenever the conditional expectations with respect to μ exist, in particular if μ is finite.

Let us consider next what can be done if g ∈ L_1^+ but g = 0 on a set of positive measure. Then gμ is no longer equivalent to μ. But there is still some finite measure equivalent to μ, say p = rμ. The isomorphism between L_1(Ω, F, μ) and L_1(Ω, F, p) maps T to the operator T_p defined by

T_p f = r^{-1} T(fr),  f ∈ L_1(p).

Clearly we have T_p^k f = r^{-1} T^k(fr), R_n(T_p)(f, g) = R_n(T)(fr, gr), and T_p is conservative if T is. The σ-algebra of absorbing sets is the same for T and T_p. The spaces L_∞(p) and L_∞(μ) are the same, and the adjoint operators T_p* and T* are the same, because

∫ f·(T_p* g) dp = ∫ (T_p f)·g dp = ∫ T(fr)·g dμ = ∫ fr·T*g dμ = ∫ f·(T*g) dp.
Now if f, g ∈ L_1 and g > 0, then f/r, g/r ∈ L_1(p), and the preceding results applied to the operator T_p give that

lim_n R_n(f, g) = E_p^C[f r^{-1}] / E_p^C[g r^{-1}].
It follows from the convergence of R_n(f, g) (and can be checked directly) that the expression on the right does not depend on the finite equivalent measure p chosen. The case when g = 0 on a set of positive measure can now be discussed. It is senseless to consider the limit lim_n R_n(f, g) if R_n(f, g) is defined for no n because of a zero denominator. But the convergence can be proved on the set G where R_n(f, g) is defined for sufficiently large n. Notation: if B is a measurable set, we will write A ∈ B ∩ C if A = B ∩ C' for some C' ∈ C, or (equivalently) A ⊆ B and A ∈ C.
(8.5.4) Proposition (conservative Chacon–Ornstein). Let T be conservative, f ∈ L_1, g ∈ L_1^+. Then the ratio

R_n(f, g) = Σ_{i=0}^{n-1} T^i f / Σ_{i=0}^{n-1} T^i g

converges to a finite limit R(f, g) a.e. on the set G = {T_P g > 0}. The limit R(f, g) is C-measurable, and satisfies ∫_A g·R(f, g) dμ = ∫_A f dμ for all A ∈ G ∩ C. If p = rμ is any probability measure equivalent to μ, then

R(f, g) = E_p^C[f r^{-1}] / E_p^C[g r^{-1}]

a.e. on G.
Proof. We claim first that the support F of E_p^C[g r^{-1}] is G = {T_P g > 0}. By Theorem (8.4.5), G ∈ C. Since g is supported in G, so is E_p^C[g r^{-1}]; thus F ⊆ G. Conversely, since G \ F ∈ C, we have T*1_{G\F} = 1_{G\F} on C, so for each k,

0 = ∫_{G\F} E_p^C[g r^{-1}] dp = ∫_{G\F} g dμ = ∫_{G\F} T^k g dμ.

Summing over k gives T_P g = 0 a.e. on G \ F, so μ(G \ F) = 0. By considering the induced operator T_G as in Definition (8.4.6), we may assume G = Ω. Set h = r·E_p^C[g r^{-1}]; then h ∈ L_1 and h > 0 a.e. Now

R_n(f, g) = R_n(f, h) / R_n(g, h).

By Proposition (8.5.3) (in the form just established), the numerator converges to E_p^C[f r^{-1}] / E_p^C[h r^{-1}] = E_p^C[f r^{-1}] / E_p^C[g r^{-1}], and the denominator converges to 1.
The ratio R_n(f, g), for f ∈ L_1 and g ≥ 0, still converges to a finite limit on the set {T_P g > 0} if T is not necessarily conservative and g is not necessarily integrable. These generalizations are treated in the framework of superadditive processes in Section 8.6.
Complements
(8.5.5) (Point transformations.) Theorem (8.5.3), in the special case of an endomorphism of a probability space, with g = 1, is called Birkhoff's theorem, and is essentially the oldest pointwise ergodic result (G. D. Birkhoff [1931]). The limit in Birkhoff's theorem is E^C[f], where C is the σ-algebra of invariant sets.

Remarks

Theorem (8.5.4) is due to Chacon & Ornstein [1960]. The general (nonconservative) case will be proved below (8.6.10). The use of the Hahn–Banach theorem in the proof of the conservative case of the Chacon–Ornstein theorem originated with Neveu; see Neveu [1964] or Garsia [1970].
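Birkhoff's theorem (8.5.5) invites a quick numerical sanity check (the rotation and the function below are my own choice of example, not the book's): for the rotation θ(x) = x + α mod 1 with α irrational, which preserves Lebesgue measure on [0, 1) and is ergodic, the time averages (1/n) Σ_{k<n} f(θ^k x) approach the space average ∫ f dμ:

```python
import math

# Circle rotation theta(x) = x + alpha mod 1, alpha irrational => ergodic
alpha = math.sqrt(2) - 1
x0 = 0.123
f = lambda x: math.cos(2 * math.pi * x)     # the integral of f over [0, 1) is 0

n = 100_000
total, x = 0.0, x0
for _ in range(n):
    total += f(x)
    x = (x + alpha) % 1.0
avg = total / n

# The ergodic (time) average approaches the space average E[f] = 0
assert abs(avg) < 1e-2
```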
8.6. Superadditive processes

Let T be a sub-Markovian operator. The sequence (s_n) of measurable functions is called a superadditive process if

(S1) s_{k+n} ≥ s_k + T^k s_n,  k, n ≥ 0,

and

(S2) γ = sup_n (1/n) ∫ s_n dμ < ∞.

The number γ in (S2) is called the time constant of the process. A sequence (s_n) is called extended superadditive if (S1) is satisfied. A sequence (s_n) is subadditive if (−s_n) is superadditive, and additive if it is both superadditive and subadditive. Note that (s_n) is additive if and only if it has the form

s_n = Σ_{i=0}^{n-1} T^i f

for some function f.

If T is Markovian, i.e., it preserves the integral, then x_n = ∫ s_n dμ is a numerical superadditive sequence, i.e., x_{n+k} ≥ x_k + x_n, and the time constant γ = sup_n n^{-1} ∫ s_n dμ is also given by γ = lim_{n→∞} n^{-1} ∫ s_n dμ (see 8.6.13).
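The quoted fact about numerical superadditive sequences — that x_n/n converges to sup_n x_n/n (Fekete's lemma) — can be checked on a concrete sequence (the particular x_n below is my own illustration, not from the text):

```python
import math

# x_n = 2n - 3*sqrt(n) is superadditive: sqrt is subadditive, so
# x_{n+k} - x_n - x_k = 3*(sqrt(n) + sqrt(k) - sqrt(n+k)) >= 0.
x = lambda n: 2 * n - 3 * math.sqrt(n)

# verify superadditivity on a range of indices
for n in range(1, 50):
    for k in range(1, 50):
        assert x(n + k) >= x(n) + x(k) - 1e-12

# Fekete: x_n / n increases here toward sup_n x_n / n = 2 (the "time constant")
ratios = [x(n) / n for n in (10, 100, 10_000, 1_000_000)]
assert ratios == sorted(ratios)
assert abs(ratios[-1] - 2.0) < 0.01
```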
The theory of processes subadditive with respect to an operator induced by a measure-preserving point transformation was initiated by J. F. C. Kingman in 1968, who gave important applications to probability. Here, the operator superadditive theory is mainly developed because of the light it sheds on operator ergodic theorems. Mathematically, the superadditive and the subadditive cases are equivalent, but the first one is slightly simpler to treat for the following reason.
An arbitrary superadditive process (s_n) obviously dominates the additive process Σ_{i=0}^{n-1} T^i s_1; the difference is a positive superadditive process, which can be studied in the pleasing context of positive operators acting on positive functions. By the same argument, if the additive theorem is known, we can restrict our attention to positive superadditive processes.

Dominants

We will compare a positive superadditive process to an additive process that dominates it, but barely. A dominant of a positive superadditive process (s_n) is an L_1^+ function δ such that Σ_{i=0}^{n-1} T^i δ ≥ s_n for all n. An exact dominant of a positive superadditive process (s_n) with time constant γ is a dominant δ such that ∫ δ dμ = γ. We will show that such a δ exists in some cases. We will use the functions

φ_m = (1/m) Σ_{i=1}^{m} (s_i − T s_{i-1}).
(8.6.1) Lemma. Let φ_m be as above. Then

Σ_{i=0}^{n-1} T^i φ_m ≥ (1 − (n−1)/m)·s_n  for 1 ≤ n ≤ m.

Proof. For each j,

Σ_{i=0}^{n-1} T^i (s_j − T s_{j-1}) = Σ_{i=0}^{n-1} T^i s_j − Σ_{i=1}^{n} T^i s_{j-1} = (I − T^n) s_j + Σ_{i=1}^{n} T^i (s_j − s_{j-1}),

so that (remembering s_0 = 0)

m Σ_{i=0}^{n-1} T^i φ_m = (I − T^n) Σ_{j=1}^{m} s_j + Σ_{i=1}^{n} T^i s_m
= Σ_{j=1}^{n-1} s_j + s_n + Σ_{i=1}^{m-n} (s_{n+i} − T^n s_i) + Σ_{i=1}^{n} (T^i s_m − T^n s_{i+m-n}).

The first sum in the last line is nonnegative. Since s_{n+i} − T^n s_i ≥ s_n, the second sum is ≥ (m − n)·s_n. Also,

T^i s_m − T^n s_{i+m-n} = T^i (s_m − T^{n-i} s_{i+m-n}) ≥ 0,

since s_m ≥ s_{n-i} + T^{n-i} s_{i+m-n} ≥ T^{n-i} s_{i+m-n}; so the last sum is nonnegative. Therefore

m Σ_{i=0}^{n-1} T^i φ_m ≥ [1 + (m − n)]·s_n.

Now divide by m.
(8.6.2) Proposition. Suppose T is a Markovian operator, and (s_n) is a positive superadditive process. Then (s_n) admits an exact dominant.

Proof. For this proof, we use the weak truncated limits (see Section 8.1). Define φ_m as before. Since T is Markovian,

∫ φ_m dμ = (1/m) Σ_{i=1}^{m} (∫ s_i dμ − ∫ s_{i-1} dμ) = (1/m) ∫ s_m dμ ≤ γ.

Taking a subsequence of (φ_m), still denoted (φ_m), and using a diagonal procedure, we may assume that the weak truncated limit Λ_i = WTL_m T^i φ_m exists for each i. Then by the additivity of weak truncated limits (8.1.2), WTL_m Σ_{i=0}^{n-1} T^i φ_m exists and equals Σ_{i=0}^{n-1} Λ_i. By Lemma (8.6.1),

(8.6.2a) Σ_{i=0}^{n-1} Λ_i ≥ s_n,  n ≥ 1.

By the Fatou property of the weak truncated limits (T WTL ≤ WTL T, see 8.1.3), we have T Λ_{i-1} ≤ Λ_i and, more generally, T^n Λ_i ≤ Λ_{i+n}. Hence we can write

Λ_i = (Λ_i − T Λ_{i-1}) + T(Λ_{i-1} − T Λ_{i-2}) + ⋯ + T^{i-1}(Λ_1 − T Λ_0) + T^i Λ_0,

with all the summands positive. Therefore

γ ≥ ∫ Λ_i dμ = ∫ [(Λ_i − T Λ_{i-1}) + (Λ_{i-1} − T Λ_{i-2}) + ⋯ + (Λ_1 − T Λ_0) + Λ_0] dμ.

Define

(8.6.2b) δ = Λ_0 + Σ_{i=1}^{∞} (Λ_i − T Λ_{i-1}) = Σ_{i=0}^{∞} (Λ_i − T Λ_{i-1})

(with Λ_{-1} = 0).
Then ∫ δ dμ ≤ γ, and

Σ_{j=0}^{n-1} T^j δ = (I − T^n) Σ_{i=0}^{∞} Λ_i ≥ Σ_{i=0}^{∞} Λ_i − Σ_{i=n}^{∞} Λ_i = Σ_{i=0}^{n-1} Λ_i ≥ s_n

by (8.6.2a), where we used T^n Σ_{i=0}^{∞} Λ_i = Σ_{i=0}^{∞} T^n Λ_i ≤ Σ_{i=n}^{∞} Λ_i. This shows that δ is a dominant; hence n ∫ δ dμ ≥ ∫ s_n dμ for each n, and therefore ∫ δ dμ ≥ γ. Thus ∫ δ dμ = γ, and δ is an exact dominant.

A positive superadditive process (shifted) can be approximated also from below by additive processes. Note that a positive superadditive process is increasing:

s_{k+n} ≥ s_k + T^k s_n ≥ s_k.
(8.6.3) Lemma. Let (s_k) be a positive superadditive process, and let a_k = s_k/k. Then

Σ_{i=0}^{n-1} T^i a_k ≤ s_{n+k-1}

for all n > 0.

Proof. We have

k Σ_{i=0}^{n-1} T^i a_k = Σ_{i=0}^{n-1} T^i s_k ≤ Σ_{i=0}^{n-1} (s_{k+i} − s_i) = Σ_{i=n}^{n+k-1} s_i − Σ_{i=0}^{k-1} s_i ≤ k·s_{n+k-1},

where the first inequality is superadditivity and the last follows from the fact that (s_n) is increasing. Now divide by k.

Superadditive ratio theorem
(8.6.4) Theorem. Let T be a conservative operator, and let (s_n) be a positive superadditive process with an exact dominant δ. Let C_δ = {Σ_{i=0}^{∞} T^i δ > 0}, and define ν(A) = ∫_A δ dμ. Then

(8.6.4a) lim_n s_n / Σ_{i=0}^{n-1} T^i δ = 1  on C_δ,

and

(8.6.4b) lim_n (1/n) ∫_A s_n dμ = sup_n (1/n) ∫_A s_n dμ = ν(A)

for A ∈ C_δ ∩ C.
Proof. Let f, g ∈ L_1^+. Set C_g = {Σ_{i=0}^{∞} T^i g > 0}. From Proposition (8.5.4) we have that lim R_n(f, g) = R(f, g), where R(f, g) is C-measurable and

(8.6.4c) ∫_A R(f, g)·g dμ = ∫_A f dμ

for A ∈ C_g ∩ C. Now T*1_A = 1_A, so the right-hand side does not change if f is replaced by T^k f. Therefore, for each k, we have R(f, g) = R(T^k f, g). Furthermore, C_{T^k g} = C_g and R(f, g) = R(f, T^k g). So

(8.6.4d) R(f, g) = lim_n Σ_{i=k}^{n+k-1} T^i f / Σ_{i=0}^{n-1} T^i g.

By Lemma (8.6.3) and the definition of a dominant, if a_k = s_k/k, then

(8.6.4e) Σ_{i=0}^{n-k} T^i a_k ≤ s_n ≤ Σ_{i=0}^{n-1} T^i δ.

Write

R* = lim sup_n s_n / Σ_{i=0}^{n-1} T^i δ,  R_* = lim inf_n s_n / Σ_{i=0}^{n-1} T^i δ.

Then by (8.6.4d) and (8.6.4e) with g = δ, we have R(a_k, δ) ≤ R_* and R* ≤ 1 on C_δ, so that, for A ∈ C_δ ∩ C,

∫_A a_k dμ = ∫_A R(a_k, δ)·δ dμ ≤ ∫_A R_*·δ dμ ≤ ∫_A R*·δ dμ ≤ ∫_A δ dμ.

But lim_k ∫_{C_δ} a_k dμ = γ = ∫_{C_δ} δ dμ. Hence R_* = R* = 1 a.e. on C_δ, which is (8.6.4a). Also, (∫_A s_k dμ) is a superadditive sequence of numbers, so (8.6.16) it converges to

sup_k ∫_A a_k dμ = ∫_A δ dμ,
since 1_A δ is an exact dominant of (1_A s_n). This implies (8.6.4b).

We now consider the general nonconservative case. For an extended superadditive process (s'_n), it will be useful to consider the notion of an extended exact dominant.
(8.6.5) Definition. Let (s'_n) be an extended superadditive process. A measurable function δ' with values in [−∞, ∞] is an extended exact dominant for (s'_n) if Σ_{i=0}^{n-1} T^i δ' ≥ s'_n for all n and, for all A ∈ C,

∫_A δ' dμ = σ'(A),

where σ'(A) is the (finite or infinite) limit

σ'(A) = lim_n (1/n) ∫_A s'_n dμ = sup_n (1/n) ∫_A s'_n dμ.

The existence of extended exact dominants can be established by applying the results about exact dominants to processes of the form 1_A s'_n, where A ∈ C and σ'(A) < ∞. (See also the proof of Theorem 8.6.7, below.) We now show that an extended superadditive process divided by n can be integrated term by term. (Of course, this is false without superadditivity.)
(8.6.6) Lemma. Let (ψ_n) be a sequence of measurable functions with ψ_1^− integrable. Suppose

(8.6.6a) ψ_k + ψ_n ≤ ψ_{k+n}  for k, n > 0.

If E is a conditional expectation operator, or the operator f ↦ ∫_A f dμ, then

−∞ < lim_n E(ψ_n/n) = E(lim_n ψ_n/n) ≤ ∞.

Proof. We have by induction, using (8.6.6a), that ψ_1 ≤ ψ_n/n. Therefore ψ_n/n is bounded below by the integrable function ψ_1, so that Fatou's lemma applies. For each ω, the sequence (ψ_n(ω)) is superadditive, so by (8.6.13), lim_n ψ_n(ω)/n = sup_n ψ_n(ω)/n. Using this, we have

−∞ < Eψ_1 ≤ E(lim_n ψ_n/n) ≤ lim inf_n E(ψ_n/n) ≤ lim sup_n E(ψ_n/n) ≤ E(sup_n ψ_n/n) = E(lim_n ψ_n/n) ≤ ∞,

where the second inequality is Fatou's lemma and the fourth uses E(ψ_n/n) ≤ E(sup_n ψ_n/n). Hence all the quantities between the first E(lim_n ψ_n/n) and the last are equal, which proves the lemma.
We will now consider the general nonconservative case. Here is the main
theorem. Recall: C is the conservative part, D the dissipative part; T is Markovian if it preserves the integral.
8.6. Superadditive processes
371
(8.6.7) Theorem (Akcoglu–Sucheston). Let T be a sub-Markovian operator, (s_n) a positive superadditive process, and (s'_n) a positive extended superadditive process. Let E = {sup_n s'_n > 0}. Then lim s_n/s'_n = h exists a.e. on C ∩ E and is C-measurable. Let δ [respectively, δ'] be an exact [extended exact] dominant of the process s_n 1_C [s'_n 1_C] with respect to the Markovian operator T_C. Let p be a probability measure equivalent to μ, say p = rμ. Set

s = sup_n (1/n) E_p^C[s_n 1_C / r],  s' = sup_n (1/n) E_p^C[s'_n 1_C / r].

The following limits exist a.e. on C:

(8.6.7a) lim_n (1/n) E_p^C[s_n 1_C / r] = s = E_p^C[δ/r] < ∞,
lim_n (1/n) E_p^C[s'_n 1_C / r] = s' = E_p^C[δ'/r] ≤ ∞.

Also, lim s_n/s'_n = s/s' a.e. on C ∩ E. If either T is Markovian or (s_n) is additive on D, then lim s_n/s'_n = lim↑ s_n / lim↑ s'_n a.e. on D ∩ E.

Proof. To simplify the notation, we will assume that μ is a probability and r = 1. To obtain the general case, replace μ by p and E^C[·] by E_p^C[· r^{-1}]. Since (s_n) and (s'_n) are increasing sequences, lim s_n/s'_n = lim↑ s_n / lim↑ s'_n exists a.e. on D ∩ E if lim s_n < ∞ on D. This is true in the Markovian case because of the existence of a dominant for (s_n) (see Proposition 8.6.2). If (s_n) is additive on D, then there it is of the form Σ_{i=0}^{n-1} T^i δ with δ integrable, hence lim s_n ≤ T_P δ < ∞ on D.

Now consider C. If (s_n) is (extended) superadditive, then so is (s_n 1_A) for each absorbing set A, both with respect to T and T_A, because

s_{k+n} 1_A ≥ s_k 1_A + (T^k s_n) 1_A ≥ s_k 1_A + T_A^k(s_n 1_A).

Positive linear operators on an L_p space (or an Orlicz space) can be uniquely extended to the set L of all measurable functions f from Ω to (−∞, ∞] such that f^− is integrable (8.4.9). The extension preserves their monotone continuity. This is, in particular, true for the sub-Markovian operator T, and for the conditional expectation E^C[·]. Since the restriction to a set A ∈ C of an (extended) superadditive process is (extended) superadditive, choosing A = C, we may consider the operator T_C instead of T, and thus assume that Ω = C. If A ∈ C, then T*1_A = 1_A implies that ∫_A f dμ = ∫_A Tf dμ and also E^C[f] = E^C[Tf]. Therefore, applying Lemma 8.6.6, first with E = ∫_A, then with E = E^C, we obtain, for each A ∈ C,

(8.6.7b) lim_n (1/n) ∫_A s'_n dμ = sup_n (1/n) ∫_A s'_n dμ = σ'(A),
and also the relations (8.6.7a) hold (recall that p = μ and r = 1). Comparing (8.6.7a) with (8.6.4b), we see that σ = s·μ and σ' = s'·μ on C; thus s = E^C[δ]. Let F_i = {0 ≤ s' < i}, so that F_i ∈ C and F = ∪_i F_i = {s' < ∞}. Let δ' be the exact dominant of (s'_n) on F_i with respect to the operator T_{F_i}; then s' = E^C[δ'] on F_i. Since

s_n/s'_n = (s_n / Σ_{i=0}^{n-1} T^i δ)·(Σ_{i=0}^{n-1} T^i δ / Σ_{i=0}^{n-1} T^i δ')·(Σ_{i=0}^{n-1} T^i δ' / s'_n),

Theorem 8.6.4 shows that s_n/s'_n converges to s/s' on F_i ∩ E, hence on F ∩ E.
It remains to consider G' = G ∩ E, where G = {s' = ∞}. Let a'_k = s'_k/k. By Theorem 8.6.4, (8.6.4d), and the primed version of (8.6.4e), we have on G'

lim sup_n s_n/s'_n ≤ lim_n Σ_{i=0}^{n-1} T^i δ / Σ_{i=0}^{n-k} T^i a'_k = E^C[δ] / E^C[a'_k] → 0

as k → ∞, since E^C[a'_k] ↑ s' = ∞ on G'. Hence the limit on G' is 0. The definition of δ' can be extended to G' so that δ' is an extended exact dominant and s' = E^C[δ'], finite or infinite.
It should be noted that s_n/s'_n need not converge on D ∩ E if (s_n) is not additive on D and T is not Markovian: for example, consider T = 0. On the other hand, we prove next that these assumptions can be weakened to lim inf_m ∫_D φ_m dμ < ∞, where, as before, φ_m = (1/m) Σ_{i=1}^{m} (s_i − T s_{i-1}). Indeed, this condition is sufficient for the existence of an exact dominant for (s_n), and therefore implies the convergence of s_n/s'_n. In the proof, truncated limits are not used, but the argument is less elementary, since it involves the decomposition of set functions.
(8.6.8) Theorem. Let T be a sub-Markovian operator. Let (s_n) be a positive superadditive process. Write φ_m = (1/m) Σ_{i=1}^{m} (s_i − T s_{i-1}), and let D be the dissipative part of Ω. Then the limits lim_m ∫_D φ_m dμ and lim_m ∫ φ_m dμ exist, finite or infinite. The following conditions are equivalent:
(1) lim_m ∫_D φ_m dμ < ∞.
(2) lim_m ∫ φ_m dμ < ∞.
(3) There is a dominant.
(4) There is an exact dominant.
Proof. Recall that T is Markovian on the conservative part C and 1_C s_n is superadditive; hence lim_m ∫_C φ_m dμ exists and is finite. Since ∫ φ_m dμ = ∫_C φ_m dμ + ∫_D φ_m dμ, the limits lim_m ∫_D φ_m dμ and lim_m ∫ φ_m dμ exist together, finite or infinite. This shows that (1) and (2) are equivalent.

(4) ⇒ (3) is trivial.

(2) ⇒ (4). Assume that β = lim inf_m ∫ φ_m dμ < ∞. Then there is a subsequence, still denoted (φ_m), such that lim_m ∫ φ_m dμ = β. Considering the φ_m as elements of L_1**, we may assume that φ_m converges weak* to an element ψ_0 of L_1**; in symbols, φ_m →* ψ_0. Observe that this implies Tφ_m = T**φ_m →* T**ψ_0.
Elements of $L_1^{**}$ may be considered to be (signed) charges that vanish on all sets of $\mu$-measure 0. Since all of the $\varphi_m$ are positive, so is the limit $\psi_0$. Considering $L_1$ a subset of $L_1^{**}$ in the usual way, we have by Lemma 8.6.1 $\psi_0 \ge s_1\mu$. Let $\eta_0 = M(\psi_0)$ be the maximal countably additive measure dominated by $\psi_0$ (see 8.1.4). Then also $\eta_0 \ge s_1\mu$, because the pure charge $\pi_0 = \psi_0 - \eta_0$ does not dominate any measure. The second adjoint operator $T^{**}$ maps $L_1^{**}$ into itself. Let
$$T^{**}\pi_0 = \eta_1 + \pi_1$$
be the decomposition of $T^{**}\pi_0$ into a measure and a pure charge. Then $\eta_0 + T\eta_0 + \eta_1 \ge s_2\mu$. Indeed, applying (8.6.1) with $n = 2$, we have
$$\varphi_m + T\varphi_m \ge \frac{m-1}{m}\, s_2,$$
hence $\psi_0 + T^{**}\psi_0 \ge s_2\mu$ and
$$\eta_0 + \pi_0 + T^{**}(\eta_0 + \pi_0) = \eta_0 + \pi_0 + T\eta_0 + \eta_1 + \pi_1 \ge s_2\mu.$$
But $\pi_0$ and $\pi_1$ are pure charges, and a sum of pure charges is a pure charge, and so does not dominate any measure; so we have $\eta_0 + T\eta_0 + \eta_1 \ge s_2\mu$.
In general, given $\pi_n$, let $\eta_{n+1} = M(T^{**}\pi_n)$ and $\pi_{n+1} = T^{**}\pi_n - \eta_{n+1}$. Finally, set $\eta = \sum_{i=0}^{\infty} \eta_i$. Then
$$\sum_{i=0}^{n-1} T^i \eta_0 + \sum_{i=0}^{n-2} T^i \eta_1 + \cdots + \eta_{n-1} \ge s_n\mu, \qquad\text{hence}\qquad \sum_{i=0}^{n-1} T^i \eta \ge s_n\mu.$$
It follows that $\eta(\Omega) = \gamma$, so that $\delta = d\eta/d\mu$ is an exact dominant.
(3) $\Rightarrow$ (2). Now assume there is a dominant $\delta$. Then
$$0 \le \int (I-T)\Big[\sum_{i=0}^{k-1} T^i\delta - s_k\Big]\,d\mu = \int \delta\,d\mu - \int (I-T)s_k\,d\mu - \int T^k\delta\,d\mu.$$
Pointwise ergodic theorems
Hence
$$\int \delta\,d\mu \ge \int (I-T)s_k\,d\mu + \int T^k\delta\,d\mu = \int \big[(s_k - Ts_{k-1}) - T(s_k - s_{k-1}) + T^k\delta\big]\,d\mu.$$
Take the Cesàro averages on the right:
$$\int \delta\,d\mu \ge \int \Big[\varphi_n - \frac{1}{n}\, T s_n + \frac{1}{n}\sum_{i=1}^{n} T^i\delta\Big]\,d\mu.$$
Now $\delta$ is a dominant and $T$ is positive, so $\sum_{i=1}^{n} T^i\delta \ge Ts_n$, and we conclude that $\int \delta\,d\mu \ge \int \varphi_n\,d\mu$. Proceeding now as above, we obtain an exact dominant $\delta'$ such that $\int \delta'\,d\mu = \liminf \int \varphi_n\,d\mu$. Since $\delta'$ is a dominant, $\sup_n \int \varphi_n\,d\mu \le \int \delta'\,d\mu$, and it follows that $\lim \int \varphi_n\,d\mu$ exists.

Ergodic theorems
In the ergodic theorems below, $s_n$ is often of the form
$$s_n = 1_C \sum_{i=0}^{n-1} T^i f,$$
the restriction of an additive process to the conservative part $C$. In such a case, we can obtain an explicit determination of the exact dominant $\delta$ appearing in the identification of the limit, in terms of the successive contributions of the dissipative part $D$ to the conservative part $C$, by considering the operator $H$ defined by
$$Hf = 1_C \sum_{i=0}^{\infty} (T I_D)^i f,$$
where the operator $I_D$ is defined by $I_D(f) = 1_D f$. The operator $H$ is sub-Markovian, since $\gamma = \sup_n (1/n)\int s_n\,d\mu \le \int f\,d\mu$, and extends to the set $L$ of measurable functions with integrable positive part in the usual way. Also, $E_\rho[Hf\,r^{-1}]$ can be defined for $f \in L$.
(8.6.9) Proposition. Suppose $s_n$ is a positive (extended) superadditive process of the form
$$s_n = 1_C \sum_{i=0}^{n-1} T^i f$$
with $f \ge 0$. Then $Hf$ is an (extended) exact dominant of $s_n$.
Proof. It suffices to consider the superadditive case. We have
$$s_n = \sum_{i=0}^{n-1} T^i(1_C f) + \sum_{i=0}^{n-2} T^i\big(1_C T(f 1_D)\big) + \cdots + (I+T)\big(1_C (T I_D)^{n-2} f\big) + 1_C (T I_D)^{n-1} f \le \sum_{i=0}^{n-1} T^i Hf.$$
So $Hf$ is a dominant. If we define operators $H^{(n)}$ by
$$H^{(n)} f = 1_C \sum_{i=0}^{n-1} (T I_D)^i f,$$
then $Hf$ is the limit of $H^{(n)} f$. Now $T$ is Markovian on the conservative part $C$, so
$$\int s_n\,d\mu = n\int_C f\,d\mu + (n-1)\int_C T(f 1_D)\,d\mu + \cdots + 2\int_C (T I_D)^{n-2} f\,d\mu + \int_C (T I_D)^{n-1} f\,d\mu$$
$$= \int H^{(n)} f\,d\mu + \int H^{(n-1)} f\,d\mu + \cdots + \int H^{(0)} f\,d\mu.$$
This shows that $(1/n)\int s_n\,d\mu$ is the Cesàro average of $\int H^{(n)} f\,d\mu$, and hence converges to $\lim_n \int H^{(n)} f\,d\mu = \int Hf\,d\mu$. Thus $\int Hf\,d\mu = \gamma$, so the dominant $Hf$ is exact.
We now give some ergodic consequences of the superadditive ratio theorem. The first application is the nonconservative case of the Chacon-Ornstein Theorem.
(8.6.10) Theorem (Chacon-Ornstein). Let $T$ be a sub-Markovian operator, let $f \in L_1$, and let $g > 0$. Then
$$R_n(f,g) = \frac{\sum_{i=0}^{n-1} T^i f}{\sum_{i=0}^{n-1} T^i g}$$
converges to a finite limit $h$ a.e. on the set $E = \{\sum_{i=0}^{\infty} T^i g > 0\}$. Let $\rho = r\mu$ be a probability measure equivalent to $\mu$. The limit $h$ is equal to
$$\frac{E_\rho[Hf\, r^{-1}]}{E_\rho[Hg\, r^{-1}]}$$
a.e. on $C \cap E$, and equal to
$$\frac{\sum_{i=0}^{\infty} T^i f}{\sum_{i=0}^{\infty} T^i g}$$
a.e. on $D \cap E$.
Proof. Use 8.6.7 and 8.6.9 with $s_n = \sum_{i=0}^{n-1} T^i f$ and $s'_n = \sum_{i=0}^{n-1} T^i g$.
Next we prove the Dunford-Schwartz Theorem for positive operators. That the supremum of $A_n f$ in the Dunford-Schwartz theorem is finite follows also from 8.2.6, above.
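The ratio convergence in (8.6.10) can be illustrated numerically. The following is a finite-state sketch of our own (not from the text): a Markovian operator given by a stochastic matrix $P$, acting on densities by $f \mapsto fP$. For an irreducible, aperiodic $P$ the operator is conservative and ergodic, so $R_n(f,g)$ tends to the constant $E[f]/E[g]$ (cf. (8.6.21)).

```python
import numpy as np

# A Markovian operator on a 3-point space: densities transform by f -> f P,
# where P is row-stochastic.  P, f, g are our own choices.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.4, 0.1, 0.5]])

f = np.array([1.0, 2.0, 0.5])   # f in L1, f >= 0
g = np.array([0.3, 0.7, 1.0])   # g > 0

def ratio(n):
    """R_n(f,g) = sum_{i<n} T^i f / sum_{i<n} T^i g, coordinatewise."""
    num = np.zeros(3); den = np.zeros(3)
    fi, gi = f.copy(), g.copy()
    for _ in range(n):
        num += fi; den += gi
        fi = fi @ P; gi = gi @ P
    return num / den

print(ratio(2000))  # every coordinate approaches E[f]/E[g] = 3.5/2.0 = 1.75
```

The partial sums are dominated by their equilibrium contributions $n(\sum f)\pi$ and $n(\sum g)\pi$, so the ratio stabilizes at $\sum f / \sum g$ in every coordinate.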
(8.6.11) Theorem (Dunford-Schwartz). Let $T$ be a sub-Markovian operator which also satisfies $T1 \le 1$. If $f \in R_0$, then
$$A_n f = \frac{1}{n}\sum_{i=0}^{n-1} T^i f$$
converges a.e. to a finite limit $h$. On the conservative part $C$,
$$h = \frac{E_\rho[Hf\, r^{-1}]}{E_\rho[1_C\, r^{-1}]}.$$
On the dissipative part $D$,
$$h = \frac{\sum_{i=0}^{\infty} T^i f}{\sum_{i=0}^{\infty} T^i 1}.$$
Proof. For each $f \in R_0$ and each constant $\lambda > 0$, there is a function $f^\lambda \in L_1$ such that $|f - f^\lambda| \le \lambda$. Since $|A_n(f - f^\lambda)| \le \lambda$ and $\lambda$ is arbitrarily small, we may assume $f \in L_1$. Now $s_n = \sum_{i=0}^{n-1} T^i f$ is additive, while $s'_n = n\mathbf{1}$ is extended superadditive, because $(k+n)\mathbf{1} \ge k\mathbf{1} + T^k(n\mathbf{1})$. We have $s' = \lim_n (1/n)\, E_\rho[1_C\, n\mathbf{1}\, r^{-1}] = E_\rho[1_C\, r^{-1}]$. (See also the identification of the limit in Theorem 8.6.12, below.)
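A matrix sketch of the Dunford-Schwartz setting (our own example, not from the text): a doubly stochastic matrix gives an operator that contracts both the $\ell_1$ norm and the sup norm (so $T1 \le 1$), and the Cesàro averages $A_n f$ settle to the invariant average.

```python
import numpy as np

# T f = f P with P doubly stochastic: T is a contraction of l1 and of l_inf.
# The matrix and the function f are our own choices.
P = np.array([[0.6, 0.4],
              [0.4, 0.6]])
f = np.array([3.0, 1.0])

def cesaro(n):
    """A_n f = (1/n) sum_{i<n} T^i f."""
    acc = np.zeros(2); fi = f.copy()
    for _ in range(n):
        acc += fi
        fi = fi @ P
    return acc / n

print(cesaro(5000))  # approaches the invariant average [2., 2.]
```

Here the second eigenvalue of $P$ is $0.2$, so the transient part of $T^i f$ decays geometrically and the averages converge at rate $O(1/n)$.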
Our last ergodic theorem is Chacon's theorem for positive operators. A sequence $(p_i)$ of positive measurable functions will be called admissible if $Tp_i \le p_{i+1}$ holds for all $i \ge 0$. For example, $(p_i)$ is admissible if $p_i = T^i g$ for some positive $g$. Also, if $T1 \le 1$, then the sequence $p_i = 1$ is admissible. These two examples show that both the Chacon-Ornstein Theorem 8.6.10 and the Dunford-Schwartz Theorem 8.6.11 are consequences of the following.
(8.6.12) Theorem (Chacon). Let $T$ be a sub-Markovian operator, let $f \in L_1$, and let $(p_i)$ be admissible. Then
$$\frac{\sum_{i=0}^{n-1} T^i f}{\sum_{i=0}^{n-1} p_i}$$
converges to a finite limit $h$ a.e. on the set $E = \{\sum_{i=0}^{\infty} p_i > 0\}$. Let $s' = \lim_n E_\rho[1_C\, p_n\, r^{-1}]$. The limit $h$ is equal to
$$\frac{E_\rho[Hf\, r^{-1}]}{s'}$$
a.e. on the set $C \cap E$, and equal to
$$\frac{\sum_{i=0}^{\infty} T^i f}{\sum_{i=0}^{\infty} p_i}$$
a.e. on $D \cap E$.
Proof. Since $Tp_i \le p_{i+1}$ for each $i$, we have by induction $T^k p_i \le p_{i+k}$. Thus
$$\sum_{i=k}^{n+k-1} p_i \ge T^k \sum_{i=0}^{n-1} p_i.$$
Hence $s'_n = \sum_{i=0}^{n-1} p_i$ is extended superadditive. Set $s_n = \sum_{i=0}^{n-1} T^i f$. Now Theorems 8.6.7 and 8.6.9 may be applied. The result follows, except that $s'$ is the limit of the Cesàro averages of $E_\rho[1_C\, p_n]$, rather than $\lim_n E_\rho[1_C\, p_n]$ itself. (As before, we assume $\mu$ is a probability measure and $r = 1$.) However, $E_\rho[1_C\, p_n] = E_\rho[T(1_C\, p_n)] \le E_\rho[1_C\, p_{n+1}]$, so that the sequence $E_\rho[1_C\, p_n]$ is increasing, and the two expressions for $s'$ coincide. The observation that $E_\rho[1_C\, p_n\, r^{-1}]$ must be increasing if $(p_n)$ is admissible indicates that superadditive processes are much more general than processes that are sums of admissible sequences.

Complements
(8.6.13) (Numerical superadditive sequences.) Suppose $x_n \in (-\infty, \infty]$ satisfies $x_{k+n} \ge x_k + x_n$. Then
$$\lim_{n\to\infty} \frac{x_n}{n} = \sup_n \frac{x_n}{n}.$$
To see this, write $\gamma = \sup_n x_n/n$ (finite or infinite). Fix a positive integer $d$. Each $n$ can be written with quotient $k_n$ and remainder $r_n$, where $n = k_n d + r_n$ and $1 \le r_n \le d$. Then $x_n \ge x_{k_n d} + x_{r_n}$. But $x_{k_n d} \ge k_n x_d$, so
$$\frac{x_n}{n} \ge \frac{(k_n d)\, x_d/d + x_{r_n}}{n}.$$
Now $k_n d/n \to 1$ and $x_{r_n}/n \to 0$, so we have
$$\liminf_n \frac{x_n}{n} \ge \frac{x_d}{d}.$$
This is true for all $d$, so $\liminf x_n/n \ge \gamma$. The inequality $\limsup x_n/n \le \gamma$ is clear.
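The numerical lemma can be checked directly; the superadditive sequence below is our own choice.

```python
import math

# x_n = 2n - sqrt(n) is superadditive: sqrt(k) + sqrt(n) >= sqrt(k + n),
# so -sqrt is superadditive, and the linear term is additive.
def x(n):
    return 2 * n - math.sqrt(n)

# verify superadditivity x_{k+n} >= x_k + x_n on a range of indices
assert all(x(k + n) >= x(k) + x(n) for k in range(1, 40) for n in range(1, 40))

# x_n / n = 2 - 1/sqrt(n) increases to sup_n x_n/n = 2
print(x(10**6) / 10**6)  # close to 2, approached from below
```

Every finite ratio $x_n/n$ stays below the supremum 2, in agreement with $\lim x_n/n = \sup x_n/n$.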
(8.6.14) The operator $H$ used in 8.6.9 may be generalized as follows. If $A \in \mathcal{F}$, define the operator $I_A$ by $I_A(f) = 1_A f$. Define $H_A$ by
$$H_A f = 1_A \sum_{i=0}^{\infty} (T I_{\Omega\setminus A})^i f.$$
If $A \in C$, then Proposition 8.6.9 remains valid with the operator $H_A$ in place of $H$. However, the operator $H_A$ is sub-Markovian even if $A$ is not absorbing. Hint: Define $H_A^{(n)}$ in a manner analogous to $H^{(n)}$, and show by induction on $n$ that
$$H_A^{(n)} = (I_A + T I_{\Omega\setminus A})^n - (T I_{\Omega\setminus A})^n.$$
Therefore $H_A^{(n)}$ is dominated by powers of the positive contraction $I_A + T I_{\Omega\setminus A}$.
(8.6.15) (Convergence of Cesàro averages $A_n f$.) Suppose there is a positive $g$ such that $\{g > 0\} = C$ and $Tg \le g$ on $C$. Then $(1/n)\sum_{i=0}^{n-1} T^i f$ converges a.e. on $\Omega$ for each $f \in L_1$. Indeed, the process $s'_n = n\, 1_C\, g$ is extended superadditive with respect to the operator $T_C$, so that the ratio $\sum_{i=0}^{n-1} T^i f/(ng)$ converges a.e. on $C$. But on $D$, it is clear that $(1/n)\sum_{i=0}^{n-1} T^i f$ converges to 0.
(8.6.16) For $f \in L_1$, set $s_n = \liminf_k \sum_{i=k}^{k+n-1} T^i f$. Then $s_n$ is superadditive, so Theorem 8.6.7 is applicable. This is also true if $\liminf$ is replaced by $\inf$.
(8.6.17) (Subadditive theorem for measure-preserving point transformations.) An example of a positive subadditive process to which Theorem 8.6.7 can be applied is
$$s_n = \max_{\varepsilon_i = \pm 1} \Big\| \sum_{i=0}^{n-1} \varepsilon_i\, f \circ \theta^i \Big\|,$$
where $f$ is a Bochner integrable function with values in a Banach space, $\theta$ is a measure-preserving point transformation on a $\sigma$-finite measure space, and the max is taken over all measurable choices $\varepsilon_i(\omega)$ of signs $+1$, $-1$.
(8.6.18) (More on the superadditive theorem for measure-preserving point transformations.) Let $\theta$ be a conservative measure-preserving point transformation and let $p$ be a number with $0 < p < 1$. Let $f$ be a nonnegative measurable function such that $\int f^p\,d\mu < \infty$. Then, for any positive function $g$,
$$\lim_n \frac{\big(\sum_{i=0}^{n-1} f\circ\theta^i\big)^p}{\sum_{i=0}^{n-1} g\circ\theta^i} = 0$$
almost everywhere on $\{\sum_{i=0}^{\infty} T^i g > 0\}$. (Akcoglu & Sucheston [1984].) If the measure $\mu$ is finite, $g = 1$, and $(f\circ\theta^i)$ is an independent sequence, then this result is due to Marcinkiewicz. (See also 6.1.15.) The ergodic result remains true if the $p$th power is replaced by a positive subadditive function.
(8.6.19) (Markov kernels.) A Markov transition kernel is a function $P(\omega, A)$ such that: (1) $P(\omega, \cdot)$ is a probability measure defined on $\mathcal{F}$ for each fixed $\omega \in \Omega$; (2) $P(\cdot, A)$ is a measurable function on $\Omega$ for each fixed $A \in \mathcal{F}$. The kernel $P$ is null-preserving if $\mu(A) = 0$ implies $P(\omega, A) = 0$ for almost all $\omega$. Such a kernel $P$ defines a Markovian operator $T$ via the Radon-Nikodym isomorphism as follows: If $\varphi$ is a measure on $\mathcal{F}$, define the measure $T\varphi$ by
$$T\varphi(A) = \int P(\omega, A)\,\varphi(d\omega).$$
Since $P$ is null-preserving, if $\varphi$ is absolutely continuous, so is $T\varphi$, so $T$ defines an operator from $L_1$ to $L_1$. The adjoint operator $T^*$ on $L_\infty$ satisfies
$$T^* h(\omega) = \int h(y)\, P(\omega, dy).$$
An important particular case is when $P(\omega, A) = 1_{\theta^{-1}(A)}(\omega)$, where $\theta$ is a measurable invertible point transformation that maps null sets to null sets. Then the Chacon-Ornstein theorem is essentially the ergodic theorem of W. Hurewicz [1944]. The connections between ergodic theory and Markov processes are studied in Revuz [1974].
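On a finite state space the constructions of (8.6.19) reduce to matrix algebra; this sketch (with a kernel of our own choosing) checks the duality between $T$ acting on measures and $T^*$ acting on bounded functions.

```python
import numpy as np

# A kernel P(w, .) on a 2-point space is a row-stochastic matrix:
# row w is the probability measure P(w, .).
P = np.array([[0.1, 0.9],
              [0.7, 0.3]])

phi = np.array([0.25, 0.75])  # a probability density (measure phi = density * counting)
h = np.array([5.0, 1.0])      # a bounded function

Tphi = phi @ P      # (T phi)(A) = sum_w P(w, A) phi(w)
Tstar_h = P @ h     # (T* h)(w) = sum_y h(y) P(w, y)

# duality: <T phi, h> = <phi, T* h>
print(np.dot(Tphi, h), np.dot(phi, Tstar_h))
```

Since $P$ is stochastic, $T$ preserves total mass (it is Markovian) and $T^* 1 = 1$.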
(8.6.20) Recall that a sub-Markovian operator is a positive linear contraction on $L_1$. There is an example, due to Chacon [1964], of a sub-Markovian conservative ($\Omega = C$) operator $T$ such that the Cesàro averages $A_n f = (1/n)\sum_{i=0}^{n-1} T^i f$ diverge a.e. for each non-null $f \in L_1$. This example also shows that there need not exist a strictly positive function $g$ such that $Tg \le g$, since the existence of such a $g$ implies convergence of $A_n f$ (see 8.6.15). In fact, in Chacon's example, $T$ is generated by an invertible nonsingular (preserving null sets) point transformation $\theta$ by
$$T\varphi = \theta\varphi, \qquad \varphi = f\mu,$$
which shows that Cesàro convergence may fail also in the setting of the Hurewicz theorem. In the Hurewicz setting, the nonexistence of a $g$ such that $g > 0$ and $Tg \le g$ (equivalently, $Tg = g$) also provides a negative solution to the famous problem, earlier resolved by D. S. Ornstein [1960], of the existence of a "$\sigma$-finite equivalent invariant measure" (this name is given to $\nu = g\mu$).
Even if $A_n f$ converges to a finite limit, as is the case if $T1 \le 1$, if $\mu$ is infinite this limit is often zero (8.6.21), so that the Chacon-Ornstein theorem remains of interest, informing about the relative behavior of sums of iterates of two functions $f$ and $g$ such that $\lim A_n f = \lim A_n g = 0$.
(8.6.21) (The ergodic case.) A sub-Markovian conservative operator $T$ is called ergodic if the $\sigma$-algebra $C$ is trivial, that is, for every element $A \in C$, either $\mu(A) = 0$ or $\mu(\Omega \setminus A) = 0$. If $T$ is ergodic, $f \in L_1$, and $g > 0$, then the limit $R(f,g)$ in the Chacon-Ornstein theorem is $E[f]/E[g]$ a.e. on $\Omega$. Indeed, we observed (following (8.5.3)) that for integrable $g$ the limit in the conservative case is $E^C[f]/E^C[g]$. The case of general $g > 0$ can be reduced to the integrable case by taking integrable positive $g_j$'s with $g_j \uparrow g$ and observing that $E[g_j] \uparrow E[g]$ and $E[f]/E[g_j] \to E[f]/E[g]$.
If $T$ is ergodic, $T1 \le 1$, $\mu(\Omega) = \infty$ and $f \in L_1$, then the limit in the Dunford-Schwartz theorem (8.6.11) is $\lim (1/n)\sum_{i=0}^{n-1} T^i f = R(f,1) = 0$.
(8.6.22) (The ratio theorem if $T$ is not a contraction.) If $T$ is a positive $L_1$ operator that is power-bounded (i.e., such that $\sup_n \|T^n\|_1 < \infty$), then the space $\Omega$ uniquely decomposes into parts $Y$ and $Z$ characterized as follows: if $f \in L_1(Y)$, $f \ne 0$, then $\liminf \|T^n f\|_1 > 0$; if $f \in L_1(Z)$, then $\|T^n f\|_1 \to 0$. If $f \in L_1$ and $g > 0$, then the limit $R(f,g)$ exists a.e. on $Y$; in general it fails to exist on $Z$. (This is proved in Sucheston [1967]. Further results about the decomposition $Y + Z$ were obtained by A. Ionescu Tulcea & M. Moretz [1969], and by Y. Derriennic & M. Lin [1973].)

Remarks
The pioneering paper of E. Hopf [1954] gave the first $L_1$ operator theorem: Theorem (8.6.11) under the assumption that $T1 = 1$. Theorem (8.6.10) was conjectured by Hopf and proved by Chacon & Ornstein [1960]. The identification of the limit is due independently to Neveu [1961] and Chacon [1962]. The Chacon-Ornstein theorem for measure-preserving transformations is due to E. Hopf [1937]. The identification of the limit is facilitated by the fact that not only $C$, but also $D$ is absorbing (J. Feldman [1962]). Theorem (8.6.7) and the derivation from it of the general Chacon-Ornstein and Chacon theorems is from Akcoglu & Sucheston [1978]. Theorem (8.6.8) is from Brunel & Sucheston [1979]. The identification (8.6.9) of the operator $Hf$ as an exact dominant is new. The identification of the limit in Chacon's theorem (8.6.12) is due to U. Krengel [1985], p. 130. The proof via (8.6.7) and (8.6.9) of various ergodic results, including Kingman's theorem and Chacon's theorem together with identification of the limit, is by far the shortest. However, other approaches may provide additional information. We mention two powerful methods, both presented in detail in Krengel [1985]. The filling scheme, the original method of Chacon-Ornstein, was widely applied in ergodic theory and probability. The maximal lemma of Brunel [1963] provides a proof of
Chacon-Ornstein that connects with potential theory; it was applied to identify the limit by P. A. Meyer [1965]. The finite measure case of the superadditive theorem for measure-preserving point transformations is due to the pioneering paper of J. F. C. Kingman [1968]. There exist striking applications, in particular to percolation theory and to the limiting behavior of random matrices (Kingman [1973] and [1976], Kesten [1982]).
9
Multiparameter processes
We present here* a unified approach to most of the multiparameter martingale and ergodic theory. In one parameter, the existence of common formulations and proofs is a well-known old problem; see e.g. J. L. Doob [1953], p. 342. It has been known that the passage from weak to strong maximal inequalities can be done by a general argument applicable to harmonic analysis, ergodic theory, and martingale theory. In this book a very general such approach is presented in Chapter 3, involving Orlicz spaces and their hearts. There exists also a simple unified (martingales + ergodic theorems) passage from one to many parameters using no multiparameter maximal theorems, based on a general argument valid for order-continuous Banach lattices. This approach gives a unified short proof of many known theorems, namely multiparameter versions of theorems of Doob (Cairoli's theorem [1970] in stronger form, not assuming independence), and theorems of Dunford & Schwartz [1956] and Fava [1972], the multiparameter point-transformation case having been earlier proved by Zygmund [1951] and Dunford [1951]. We also obtain multiparameter versions of theorems of Akcoglu [1975], Stein [1961], and Rota [1962]. For the Banach lattice argument, order continuity is needed, which means that the $L \log^k L$ spaces are not acceptable if the measure is infinite: they fail this property and have to be replaced by their hearts, subspaces $H_\Phi$, which are closures of simple integrable functions (see Chapter 2). We will first develop in detail the "multiparameter principle" (Theorem (9.1.3)) that allows the reduction of multiparameter convergence problems to one parameter. There is also a one-sided version of this result, useful to prove "demiconvergence" in many parameters. Also a Banach-valued version of the convergence principle is presented, for operators that have positive dominants (Theorem (9.1.5)); many simple operators are in this class, including point transformations and conditional expectations.
As an application, in Section 9.2, we deduce theorems about convergence of multiparameter Cesàro averages of operators. Next, a version of the multiparameter superadditive ratio theorem is obtained (Theorem (9.3.3)); this contains a multiparameter Chacon-Ornstein theorem. Finally, we consider
*Parts of this chapter are taken from "A principle for almost everywhere convergence of multiparameter processes," by L. Sucheston and L. Szabó, pages 253-273 in Almost Everywhere Convergence II, A. Bellow and R. Jones, editors, Copyright © 1991 by Academic Press, Inc. Used by permission of Academic Press.
martingales. It has been known that less than the independence assumptions of Cairoli [1970] suffice for convergence theory; if a martingale is also a "block martingale," then independence may be dispensed with (Theorem (9.4.5)). Conversely, in the presence of independence (or even properly defined conditional independence, the condition sometimes called F4 or commutation), every martingale is a block martingale (Proposition (9.4.3)). Here we reduce the case of block martingales to successive applications of the conditional expectation operator, which allows the use of Theorem (9.1.3) and simplifies the proofs. Block martingale theorems imply strong laws of large numbers for two parameters (9.4.8).
9.1. A multiparameter convergence principle
Let $E$ be a sigma-complete Banach lattice; that is, a Banach lattice such that every order-bounded sequence has a least upper bound in $E$ (see Lindenstrauss & Tzafriri [1979]). Recall that $E$ is said to have an order-continuous norm if for every net (equivalently, sequence) $(f_i)$, $f_i \downarrow 0$ implies $\|f_i\| \downarrow 0$.
Let $F \subseteq E$. A map $T: F \to E$ is increasing if $f \le g$ implies $Tf \le Tg$; positive if $f \ge 0$ implies $Tf \ge 0$; linear if $T(\alpha f + \beta g) = \alpha Tf + \beta Tg$ for any $\alpha, \beta \in \mathbb{R}$; positively homogeneous if $|T(\alpha f)| = \alpha |Tf|$ for each $f \in F$, $\alpha \in \mathbb{R}_+$; subadditive if $T(f+g) \le Tf + Tg$. A map that is both positively homogeneous and subadditive is called sublinear. Sublinear increasing maps are positive, since $0 \le f$ implies $T0 = 0 \le Tf$. A map $T$ is continuous at 0 if for every net $(f_i)$ in $F$, $\|f_i\| \to 0$ implies $\|Tf_i\| \to 0$; continuous for order if $f_i \downarrow f$ implies $Tf_i \downarrow Tf$; continuous for order at 0 if $f_i \downarrow 0$ implies $Tf_i \downarrow 0$.
(9.1.1) Lemma. Let $E$ be a Banach lattice with order-continuous norm, $F$ a Banach sublattice of $E$, and $T$ a positively homogeneous, increasing map from $F^+$ to $E$. Then (i) $T$ is continuous at 0 and continuous for order at 0. (ii) If in addition $T$ is subadditive (hence sublinear), then $T$ is continuous for order.
Proof. (i) Assume that $T$ is not continuous at 0. Let $I$ be a directed set, and let $(f_i)_{i\in I}$ be a net of elements of $F^+$ such that $\lim \|f_i\| = 0$ and $\limsup \|Tf_i\| > \varepsilon > 0$. Choose a sequence $(i_n)$ of indices such that
$$\sum_{n=1}^{\infty} 2^n \|f_{i_n}\| < \infty \qquad\text{and}\qquad \inf_n \|Tf_{i_n}\| > \varepsilon.$$
Set
$$g_n = \sum_{k=1}^{n} 2^k f_{i_k}.$$
Since $F$ is closed, $g_n \uparrow g \in F^+$, and for every $n$, $Tg \ge Tg_n \ge T(2^n f_{i_n}) = 2^n T(f_{i_n})$, hence
$$\|Tg\| \ge 2^n \varepsilon.$$
This is a contradiction, therefore $T$ must be continuous at 0. If $f_i \downarrow 0$ then $\|f_i\| \downarrow 0$, because $E$ is order continuous. Therefore $\|Tf_i\| \downarrow 0$. Also $Tf_i \downarrow g$ for some $g \in E^+$, and necessarily $\|g\| \le \lim \|Tf_i\| = 0$. This proves (i).
(ii) Assume $T$ is subadditive. Let $(f_i)$ be a net in $F^+$ such that $f_i \uparrow f$ or $f_i \downarrow f$. For every index $i$,
$$|Tf - Tf_i| \le T|f - f_i|,$$
and $|f - f_i| \downarrow 0$, so by (i), $\big\| T|f - f_i| \big\| \downarrow 0$. Therefore $\|Tf - Tf_i\| \to 0$.
The following propositions will be useful in deducing many-parameter theorems from one-parameter theorems. The assumption that the directed sets have countable cofinal subsets guarantees that monotone limits are in the lattice if countable suprema are. We state Proposition (9.1.2) (and Theorem (9.1.3)) for increasing sublinear maps, but in all the applications we give, the maps are positive and linear. In that case, by considering separately the positive and negative parts of an element $f$, we see that the convergence statements in (9.1.2(iii)) and (9.1.3(iii)) apply to $f$ that need not be positive. The assumption in (9.1.2(i)) and (9.1.3(i)) about the existence of $V_\infty$ is satisfied if there is demiconvergence: the stochastic (or truncated) limit exists and is equal to the $\liminf$. For applications, see Section 9.4, below. Given an increasing sublinear map $V: F^+ \to E$, let $V$ also denote the extension to $F$ defined by $Vf = V(f^+) - V(f^-)$.
(9.1.2) Proposition. Let $E$ be a Banach lattice with an order-continuous norm, $F$ a Banach sublattice of $E$. Let $I$, $J$ be directed sets with countable cofinal subsets. Let $(V_i)_{i\in I}$ be a net of increasing, sublinear maps $V_i: F^+ \to E$.
(i) Assume that there is an increasing, sublinear map $V_\infty: F^+ \to E$ such that for each $f \in F^+$
$$\liminf_i V_i f \ge V_\infty f.$$
Let $(f_j : j \in J)$ be a net of elements of $F^+$ such that $\liminf_j f_j = f_\infty \in F^+$. Then
$$\liminf_{i,j} V_i f_j \ge V_\infty f_\infty.$$
(ii) Assume that for each $f \in F^+$, $V_\infty f = \limsup_i V_i f \in E$. Let $(f_j)_{j\in J}$ be a net of elements of $F^+$ such that $\sup_j f_j \in F^+$. Let $f_\infty = \limsup_j f_j$. Then
$$\limsup_{i,j} V_i f_j \le V_\infty f_\infty.$$
(iii) Assume that $\lim_i V_i f = V_\infty f$ exists in $E$ for each $f \in F^+$. Let $(f_j)_{j\in J}$ be a net of elements of $F$ such that $\sup_j |f_j| \in F^+$ and $\lim_j f_j = f_\infty \in F$. Then
$$\lim_{i,j} V_i f_j = V_\infty f_\infty.$$
Proof. (i) For each $j \in J$, set $m_j = \inf_{k \ge j} f_k$. By assumption, the net $(m_j)_{j\in J}$ increases to $f_\infty$. Now for each $k \ge j$ we have $f_k \ge m_j$, so $V_u f_k \ge V_u m_j$. Thus
$$\inf_{k \ge j} V_u f_k \ge V_u m_j.$$
Therefore, for each $i$,
$$\inf_{u \ge i}\, \inf_{k \ge j} V_u f_k \ge \inf_{u \ge i} V_u m_j.$$
Letting $i \to \infty$ yields
$$\liminf_i\, \inf_{k \ge j} V_i f_k \ge \liminf_i V_i m_j \ge V_\infty m_j.$$
For each $j$,
$$\liminf_i\, \inf_{k \ge j} V_i f_k \le \liminf_{i,j} V_i f_j.$$
Therefore
$$\liminf_{i,j} V_i f_j \ge V_\infty m_j.$$
Now the net $m_j$ increases to $f_\infty \in F^+$. The operator $V_\infty$ is monotonely continuous by Lemma (9.1.1(ii)). Hence $V_\infty m_j \uparrow V_\infty f_\infty$. It follows that
$$\liminf_{i,j} V_i f_j \ge V_\infty f_\infty.$$
(ii) Necessarily $f_\infty \in F^+$. For each $j \in J$, let $M_j = \sup_{k \ge j} f_k$. Applying Fatou's Lemma as before, we have for each $u$,
$$\sup_{k \ge j} V_u f_k \le V_u M_j.$$
Therefore, for each $i$,
$$\sup_{u \ge i}\, \sup_{k \ge j} V_u f_k \le \sup_{u \ge i} V_u M_j.$$
Letting $i \to \infty$ yields
$$\limsup_i\, \sup_{k \ge j} V_i f_k \le \limsup_i V_i M_j = V_\infty M_j.$$
For each $j$,
$$\limsup_i\, \sup_{k \ge j} V_i f_k \ge \limsup_{i,j} V_i f_j.$$
Therefore
$$\limsup_{i,j} V_i f_j \le V_\infty M_j.$$
Now the net $M_j$ decreases to $f_\infty \in F^+$. The operator $V_\infty$ is increasing and sublinear, being the $\limsup$ of such operators. Therefore, by Lemma (9.1.1), $V_\infty$ is continuous for order. Hence $V_\infty M_j \downarrow V_\infty f_\infty$. It follows that
$$\limsup_{i,j} V_i f_j \le V_\infty f_\infty.$$
(iii) The operator $V_\infty$ is positive and sublinear, being the limit of such operators. We can consider separately the action of the $V_i$ and $V_\infty$ on the positive and negative parts of functions, since $\lim f_j^+ = f_\infty^+$ and $\lim f_j^- = f_\infty^-$. Now, by parts (i) and (ii),
$$\limsup_{i,j} V_i f_j^+ \le V_\infty f_\infty^+ \le \liminf_{i,j} V_i f_j^+,$$
hence $\lim_{i,j} V_i f_j^+ = V_\infty f_\infty^+$, and similarly $\lim_{i,j} V_i f_j^- = V_\infty f_\infty^-$. Hence
$$\lim_{i,j} V_i f_j = V_\infty f_\infty.$$
We now consider more than two parameters. For each $i$, $1 \le i \le d$, let $I_i$ be a directed set with a countable cofinal subset. Let $I = I_1 \times I_2 \times \cdots \times I_d$ be the product set. The partial order on $I$ is defined by $s = (s_1, \ldots, s_d) \le t = (t_1, \ldots, t_d)$ if $s_k \le t_k$ for $k = 1, \ldots, d$. The notation $t \to \infty$ then means that all the indices $t_i$ converge to infinity independently. $L(0)$ has been introduced below to allow for a compact description of the action of the operators $T(i,j)$.
(9.1.3) Theorem. Let $L(0) = L(1) \supseteq L(2) \supseteq \cdots \supseteq L(d)$ be Banach lattices with order-continuous norms and let $I_i$, $1 \le i \le d$, be directed sets with countable cofinal subsets. For each $i = 1, 2, \ldots, d$, let $(T(i,j))_{j \in I_i}$ be a net of increasing, sublinear maps $T(i,j): L(i)^+ \to L(i-1)^+$.
(i) Assume that for each $i = 1, \ldots, d$, there exists an increasing, sublinear operator $T(i,\infty): L(i)^+ \to L(i-1)^+$ such that for every $f \in L(i)^+$
$$T(i,\infty) f \le \liminf_j T(i,j) f.$$
Then for every $f \in L(d)^+$ we have
$$\liminf_t T(1,t_1)\, T(2,t_2) \cdots T(d,t_d)\, f \ge T(1,\infty) \cdots T(d,\infty)\, f.$$
(ii) Assume that
(a) $\limsup_j T(i,j) f = T(i,\infty) f \in L(i-1)$ for $1 \le i \le d$ and each $f \in L(i)^+$, and
(b) $\sup_j T(i,j) f \in L(i-1)^+$ for each $f \in L(i)^+$, $2 \le i \le d$.
Then for each $f \in L(d)^+$,
$$\limsup_t T(1,t_1) \cdots T(d,t_d)\, f \le T(1,\infty) \cdots T(d,\infty)\, f.$$
(iii) Assume that
(a) $\lim_j T(i,j) f = T(i,\infty) f$ exists and is in $L(i-1)$ for each $f \in L(i)^+$, and
(b) $\sup_j T(i,j) f \in L(i-1)$ for each $f \in L(i)^+$, $2 \le i \le d$.
Then for each $f \in L(d)^+$,
$$\lim_t T(1,t_1) \cdots T(d,t_d)\, f = T(1,\infty) \cdots T(d,\infty)\, f.$$
Proof. (i) By induction on $d$. For $d = 2$, choose $f \in L(2)^+$ and apply Proposition (9.1.2(i)) with $F = L(1)$, $E = L(0)$, $J = I_2$, $f_j = T(2,j) f$, $f_\infty = \liminf_j f_j$, $I = I_1$, $V_i = T(1,i)$, $V_\infty = T(1,\infty)$. Then $\liminf_i V_i f \ge V_\infty f$ by assumption. It follows that
$$\liminf_{i,j} T(1,i)\, T(2,j)\, f = \liminf_{i,j} V_i f_j \ge V_\infty f_\infty = T(1,\infty) \liminf_j T(2,j) f \ge T(1,\infty)\, T(2,\infty)\, f.$$
Now suppose that the inequality holds for any product of $d$ operators and prove it for a product of $d+1$ operators. Let $F = L(d)$, $E = L(0)$, $f \in L(d+1)^+$, $J = I_{d+1}$, $f_j = T(d+1,j) f$, $f_\infty = \liminf_j f_j$, $I = I_1 \times \cdots \times I_d$. For $i = (i_1, \ldots, i_d) \in I$, set $V_i = T(1,i_1)\cdots T(d,i_d)$ and $V_\infty = T(1,\infty)\cdots T(d,\infty)$. Since each map $T(i,\infty)$ is increasing and sublinear, the map $V_\infty$ has the same properties. Applying Proposition (9.1.2(i)) and the induction hypothesis we get
$$\liminf_{i,\, i_{d+1}} T(1,i_1)\cdots T(d,i_d)\, T(d+1,i_{d+1})\, f = \liminf_{i,j} V_i\, T(d+1,j) f \ge V_\infty f_\infty \ge V_\infty\, T(d+1,\infty) f$$
$$= T(1,\infty)\cdots T(d,\infty)\, T(d+1,\infty)\, f.$$
(ii) By induction on $d$. For $d = 2$, choose $f \in L(2)^+$ and apply Proposition (9.1.2(ii)) with $F = L(1)$, $E = L(0)$, $J = I_2$, $f_j = T(2,j) f$, $f_\infty = \limsup_j f_j$, $I = I_1$, $V_i = T(1,i)$, $V_\infty = \limsup_i T(1,i)$. Then
$$\limsup_{i,j} T(1,i)\, T(2,j)\, f = \limsup_{i,j} V_i f_j \le V_\infty f_\infty = T(1,\infty) \limsup_j T(2,j) f = T(1,\infty)\, T(2,\infty)\, f.$$
Now suppose that the inequality of (ii) holds for any product of $d$ operators and prove it for a product of $d+1$ operators. Let $F = L(d)$, $E = L(0)$, $f \in L(d+1)^+$, $J = I_{d+1}$, $f_j = T(d+1,j) f$, $f_\infty = \limsup_j f_j$, $I = I_1 \times \cdots \times I_d$. For $i = (i_1, \ldots, i_d) \in I$, set $V_i = T(1,i_1)\cdots T(d,i_d)$, $V_\infty = T(1,\infty)\cdots T(d,\infty)$. Since each map $T(i,\infty)$ is increasing and sublinear, so is the map $V_\infty$. By assumption, $\sup_j T(d+1,j) f \in L(d)$. Applying Proposition (9.1.2(ii)) and the induction hypothesis,
$$\limsup_{i,\, i_{d+1}} T(1,i_1)\cdots T(d,i_d)\, T(d+1,i_{d+1})\, f = \limsup_{i,j} V_i\, T(d+1,j) f \le V_\infty f_\infty = T(1,\infty)\cdots T(d,\infty)\, T(d+1,\infty)\, f.$$
(iii) By parts (i) and (ii),
$$\limsup_t T(1,t_1)\cdots T(d,t_d)\, f \le T(1,\infty)\cdots T(d,\infty)\, f \le \liminf_t T(1,t_1)\cdots T(d,t_d)\, f.$$
Hence the theorem follows.

Banach-valued processes
Let $E$ be a Banach space with norm $\|\cdot\|$. We do not look for the greatest
generality, restricting discussion to spaces $R_k(E)$. These spaces could be replaced by $L_p(E)$ spaces, $p > 1$ fixed. We write $L_{\max}(E)$ for the largest Orlicz space $L_1(E) + L_\infty(E)$, defined in analogy to the real case (see Proposition (2.2.4)). A linear operator $T$ on $L_{\max}(E)$ is said to be positively dominated (by $\widetilde T$) if there exists a positive linear operator $\widetilde T$ on $L_{\max}(\mathbb{R})$ such that $\|Tf\| \le \widetilde T\|f\|$ for all $f \in L_{\max}(E)$.
Here are some examples of positively dominated operators:
(1) Let $\theta$ be a measure-preserving point transformation. Then $T(f) = f \circ \theta$, considered as an operator on $L_{\max}(E)$, is positively dominated by $\widetilde T f = f \circ \theta$ considered as an operator on $L_{\max}(\mathbb{R})$.
(2) Any linear operator on $L_{\max}(\mathbb{R})$ is positively dominated by its linear modulus $|T|$, defined by
$$|T| f = \sup_{|g| \le f} |Tg| \qquad\text{for all } f \ge 0.$$
This follows from Dunford & Schwartz [1958], p. 672; see also Krengel [1985], p. 160.
(3) The conditional expectation operator $E_\mu$ defined on $L_{\max}(E)$ (in analogy to the real definition (2.3.9)) is positively dominated by the real conditional expectation $E_\mu$ on $L_{\max}(\mathbb{R})$.
We now prove versions of (9.1.2) and (9.1.3) for positively dominated operators in $L_{\max}(E)$. The limits below are taken almost everywhere.
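Example (2) can be checked in finite dimensions, where the linear modulus of $f \mapsto fA$ is given by the matrix of absolute values of the entries of $A$ (a sketch with randomly chosen data of our own):

```python
import numpy as np

# An L1 operator on a 4-point space given by a matrix A with mixed signs is
# positively dominated by its linear modulus |A| (entrywise absolute values):
# |f A| <= |f| |A| holds pointwise, by the triangle inequality in each column.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
modA = np.abs(A)

f = rng.normal(size=4)
print(np.all(np.abs(f @ A) <= np.abs(f) @ modA))  # True
```

The domination is exactly the triangle inequality $|\sum_i f_i A_{ij}| \le \sum_i |f_i|\,|A_{ij}|$, applied in each coordinate $j$.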
(9.1.4) Lemma. Let $k$ be a positive integer. Let $I$, $J$ be directed sets with countable cofinal subsets. Let $(V_i)_{i\in I}$ be a net of operators on $L_{\max}(E)$, positively dominated by operators $\widetilde V_i$ on $L_{\max}(\mathbb{R})$. Suppose that for each $f \in R_0(E)$, the limit $V_\infty f = \lim_i V_i f$ exists and is in $R_0(E)$, and the limit $\lim_i \widetilde V_i \|f\| = \widetilde V_\infty \|f\|$ exists and is in $R_0(\mathbb{R})$. Suppose that for each $f \in R_k(E)$ we have $\sup_i \widetilde V_i \|f\| \in R_{k-1}(\mathbb{R})$. Let $(f_j)_{j\in J}$ be a net of functions in $R_k(E)$ such that $\lim f_j = f_\infty$ exists and $\sup_j \|f_j\| \in R_{k-1}(\mathbb{R})$. Then
$$\lim_{i,j} V_i f_j = V_\infty f_\infty.$$
Proof. For $m \in J$, define $s_m = \sup_{j \ge m} \|f_j - f_\infty\|$, so $s_m \in R_{k-1}(\mathbb{R}) \subseteq R_0(\mathbb{R})$. Then
$$\limsup_{i,j} \|V_i f_j - V_\infty f_\infty\| \le \limsup_{i,j} \|V_i(f_j - f_\infty)\| + \limsup_i \|V_i f_\infty - V_\infty f_\infty\| = \limsup_{i,j} \|V_i(f_j - f_\infty)\| \le \limsup_{i,j} \widetilde V_i \|f_j - f_\infty\| \le \limsup_i \widetilde V_i s_m = \widetilde V_\infty s_m.$$
But $s_m \downarrow 0$ and $R_0(\mathbb{R})$ has order-continuous norm (2.1.14), so $\widetilde V_\infty s_m \downarrow 0$.
We now consider several parameters at the same time. As in Theorem (9.1.3), let $I_1, I_2, \ldots, I_d$ be directed sets with countable cofinal subsets, and let $I = I_1 \times I_2 \times \cdots \times I_d$ be the product.
(9.1.5) Theorem. For $i = 1, 2, \ldots, d$ and $j \in I_i$, let $T(i,j)$ be an operator on $L_{\max}(E)$, positively dominated by $\widetilde T(i,j)$. Suppose
(a) For each $f \in R_0(E)$, the limits $\lim_j T(i,j) f = T(i,\infty) f \in R_0(E)$ and $\lim_j \widetilde T(i,j)\|f\| \in R_0(\mathbb{R})$ exist.
(b) For each $f \in R_i(E)$, $\sup_j \widetilde T(i,j)\|f\| \in R_{i-1}(\mathbb{R})$.
Then for each $f \in R_d(E)$,
$$\lim T(1,t_1)\, T(2,t_2) \cdots T(d,t_d)\, f = T(1,\infty)\, T(2,\infty) \cdots T(d,\infty)\, f$$
exists as the indices $t_i$ converge to infinity independently.
Proof. This is proved by induction on $d$ using Lemma (9.1.4). For $d = 2$, choose $f \in R_2(E)$, $f_j = T(2,j) f$, which converges to $T(2,\infty) f = f_\infty$, and $V_i = T(1,i)$. For the general induction step, assume that the theorem holds for $d$ parameters, let $f \in R_{d+1}(E)$, $f_j = T(d+1,j) f$, $f_\infty =$
$T(d+1,\infty) f$, $V_i = T(1,i_1)\cdots T(d,i_d)$. It is easy to see that the operator $\widetilde T(1,i_1)\cdots \widetilde T(d,i_d)$ is a positive dominant of $V_i$.

Remarks
The multiparameter convergence principle [Theorem (9.1.3(iii))], with most applications, is from Sucheston [1983]. The one-sided results [(i) and (ii), on demiconvergence] were developed in Millet & Sucheston [1989] and in Sucheston & Szabó [1991]. Theorem (9.1.5) was stated in Frangos & Sucheston [1986]. Demiconvergence in martingale theory was introduced in Edgar & Sucheston [1981], and further studied by Millet & Sucheston [1983].
9.2. Multiparameter Cesàro averages of operators
In 1951, articles of A. Zygmund and N. Dunford appeared under the same title and in the same volume of Acta Sci. Math. (Szeged), proving multiparameter convergence theorems for noncommuting point transformations. Zygmund assumed $f \in L \log^{d-1} L$ of a probability space. Dunford allowed a $\sigma$-finite measure but restricted $f$ to $L_p$, $p > 1$. The first obvious challenge was to find the common generalization of these two settings. The spaces $R_k$ introduced by Fava [1972] fulfilled this role (also in the more general Dunford-Schwartz operator context). However, the theorem of Zygmund, Dunford, and Fava still appeared as a difficult result, depending on a multiparameter maximal theorem: see e.g. Krengel [1985], pp. 196-201. Theorem (9.1.3) reduces the theorem to the one-parameter theory that had been known earlier (cf. N. Wiener [1939]). We first recall some of the results from Chapter 2 about function spaces and maximal inequalities. A finite Orlicz function (2.1.1) is an increasing convex function $\Phi: [0,\infty) \to [0,\infty)$ satisfying $\Phi(0) = 0$ and $\Phi(u) > 0$ for some $u$. Such a $\Phi$ is differentiable a.e.; the derivative $\varphi$, defined a.e. by $\Phi(u) = \int_0^u \varphi(x)\,dx$, will be assumed left-continuous. Often we assume in addition that $\Phi(u)/u \to \infty$, which happens if and only if $\varphi$ is unbounded. Then there is a "left-continuous generalized inverse" $\psi$ of $\varphi$, defined by
$$\psi(y) = \inf\{ x \in (0,\infty) : \varphi(x) \ge y \}.$$
The function $\psi$ is the derivative of an Orlicz function $\Psi$, called the conjugate of $\Phi$. We are also interested in the function $\xi$, defined by
$$\xi(u) = u\varphi(u) - \Phi(u) = \Psi(\varphi(u)).$$
It is left-continuous; it may or may not be an Orlicz function.
Let $(\Omega, \mathcal{F}, \mu)$ be a measure space. The Orlicz modular is the function $M_\Phi(f) = \int \Phi(|f|)\,d\mu$. The Luxemburg norm of a measurable function $f$ is
$$\|f\|_\Phi = \inf\{ a > 0 : M_\Phi(f/a) \le 1 \}.$$
The space $L_\Phi = \{ f : \|f\|_\Phi < \infty \}$
is a Banach space, called an Orlicz space. This terminology may also be used when $\Phi$ is not an Orlicz function (cf. $\Phi_0$ below). If $L_\Phi$ is an order-continuous Banach lattice, then we need not look elsewhere. But in any case, the heart $H_\Phi$ of $L_\Phi$ is defined by
$$H_\Phi = \{ f : M_\Phi(f/a) < \infty \text{ for all } a > 0 \}.$$
Define, for integers $k \ge 0$,
$$\Phi_k(u) = \begin{cases} u(\log u)^k & \text{if } u > 1, \\ 0 & \text{if } 0 \le u \le 1. \end{cases}$$
Then $\Phi_k$ is an Orlicz function for $k \ge 1$; the corresponding $\xi$ is $\xi(u) = k\,\Phi_{k-1}(u)$, so $L_\xi = L \log^{k-1} L$. The hearts $H_{\Phi_k}$ are the Favian spaces $R_k$. There is no $\xi$ for $\Phi_0$ (so there will be no maximal inequalities for $f \in L_{\Phi_0}$), but nevertheless $L_{\Phi_0}$ may be considered as the largest Orlicz space, namely $L_{\Phi_0} = L_1 + L_\infty$; the usual norm of $L_1 + L_\infty$ is equivalent with the norm
defined by to. The space Ro is the heart of L4,o. The spaces Rk can be characterized by
$$R_k = \Bigl\{\, f : \int_{\{|f|>a\}} \Phi_k(|f|)\,d\mu < \infty \text{ for all } a > 0 \,\Bigr\}$$
(see the remarks following (2.2.16)). The spaces $R_k$ have order-continuous norm by (2.1.14). For each $k$,
$$\bigcup_{p>1} L_p \subseteq R_k \subseteq R_{k-1} \subseteq \cdots \subseteq R_0 \subseteq L_1 + L_\infty.$$
If $\mu$ is a finite measure, then $R_0$ coincides with $L_1 = L_1 + L_\infty$. If $\mu$ is a $\sigma$-finite measure, then $R_0$ is the appropriate space for the one-parameter martingale theorem and some one-parameter pointwise ergodic theorems: those in which pointwise convergence holds for functions in $L_1$ and the operator contracts the $L_\infty$ norm. (Conditional expectations and point transformations preserving null sets are in this class.) Indeed, if $f \in R_0$, then $f = g + h$ with $g \in L_1$ and $\|h\|_\infty$ arbitrarily small. Since the operator does not increase the $L_\infty$ norm, $h$ may be disregarded. In applications of Theorem (9.1.3), the space $L(0) = L(1)$ will be $R_0$, and the spaces $L(k)$ will be $R_{k-1}$. The operators $T(i,j)$ will be one-parameter averages defined below. It will be necessary to show that if a function $f$ is in $R_k$, then the appropriate supremum, called $g$, is in $R_{k-1}$. The following lemma follows from (3.1.12(b)), if we note that the condition $\mu\{|f| > \lambda\} < \infty$ for all $\lambda > 0$ is satisfied for any $f \in R_k$.
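The Luxemburg norm can be computed numerically, since $a \mapsto M_\Phi(f/a)$ is continuous and nonincreasing. A minimal sketch on a two-atom probability space with $\Phi_1(u) = u\log u$ for $u > 1$ (the function names and the bisection scheme are ours, not the book's):

```python
import math

def phi_k(u, k):
    """Orlicz function Phi_k(u) = u (log u)^k for u > 1, and 0 on [0, 1]."""
    return u * math.log(u) ** k if u > 1.0 else 0.0

def modular(f, weights, k):
    """Orlicz modular M_Phi(f) = integral of Phi_k(|f|) d(mu), discrete mu."""
    return sum(w * phi_k(abs(x), k) for x, w in zip(f, weights))

def luxemburg_norm(f, weights, k, tol=1e-10):
    """Luxemburg norm: inf { a > 0 : M_Phi(f/a) <= 1 }, found by bisection."""
    lo, hi = tol, 1.0
    while modular([x / hi for x in f], weights, k) > 1.0:
        hi *= 2.0                      # grow until the modular drops below 1
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if modular([x / mid for x in f], weights, k) <= 1.0:
            hi = mid                   # mid is admissible, shrink from above
        else:
            lo = mid
    return hi

# two atoms of mass 1/2 each, k = 1 (the space L log L)
f = [4.0, 9.0]
w = [0.5, 0.5]
a = luxemburg_norm(f, w, 1)
# the modular is continuous here, so the infimum is attained where it equals 1
assert abs(modular([x / a for x in f], w, 1) - 1.0) < 1e-6
```

Bisection applies because the modular decreases as the scaling parameter grows; the norm is the scaling at which the modular crosses 1.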
Multiparameter processes
(9.2.1) Lemma. Let $f, g$ be positive functions such that for some $c > 0$ and every $\lambda > 0$,
$$\lambda\,\mu\{g > c\lambda\} \le \int_{\{f>\lambda\}} f\,d\mu.$$
If $f \in R_k$ then $g \in R_{k-1}$.

This lemma, together with Theorem (8.2.6), shows:
(9.2.2) Lemma. Let $T$ be a positive linear operator on $L_1 + L_\infty$, and let $A_n(T) = (1/n)(T^0 + T^1 + \cdots + T^{n-1})$. Assume that $\|T\|_1 \le 1$ and $\sup_n \|A_n(T)\|_\infty = c < \infty$. If $f \in R_k$ then $\sup_n A_n(T)f = g \in R_{k-1}$.
Suppose that $T$ is a positive contraction of $L_1$ and $L_\infty$. The one-parameter averages $A_n(T)f$ converge a.e. for $f \in R_0$ by (8.6.11). Thus we have all the elements needed for the application of Theorem (9.1.3), with $L(0) = L(1) = R_0$, $L(i) = R_{i-1}$ for $i \ge 2$, and $T(i, j) = A_j(T_i)$. This yields:
(9.2.3) Theorem. Let $T_i$ be positive operators defined on $L_1 + L_\infty$ such that $\|T_i\|_1 \le 1$ and $\|T_i\|_\infty \le 1$, $i = 1, 2, \ldots, d$. For each $i$ and each $f \in R_0$, denote by $A_\infty(T_i)f$ the a.e. limit of $A_n(T_i)f$. Then for each $f \in R_{d-1}$,
$$\lim \frac{1}{s_1 s_2 \cdots s_d} \sum_{k_1=0}^{s_1-1} \sum_{k_2=0}^{s_2-1} \cdots \sum_{k_d=0}^{s_d-1} T_1^{k_1} T_2^{k_2} \cdots T_d^{k_d} f = A_\infty(T_1)\,A_\infty(T_2) \cdots A_\infty(T_d)\,f \quad\text{a.e.}$$
as the indices $s_i$ go to infinity independently.

Complements
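For a concrete finite illustration of such factorized limits, take $\Omega = \mathbb{Z}_N$ with uniform measure and let $T_1, T_2$ be compositions with rotations; each is a positive contraction of $L_1$ and $L_\infty$, and each one-parameter limit $A_\infty(T_i)f$ is the mean of $f$. A minimal sketch (the space, the rotation steps, and the helper names are our choices, not from the text):

```python
from fractions import Fraction  # exact arithmetic keeps the check honest

N = 12                       # points of Omega = {0, ..., N-1}, uniform measure
r1, r2 = 5, 7                # rotation steps, both coprime to N

def T(f, r):
    """Composition operator (T f)(x) = f(x + r mod N): a positive contraction
    of L1 and L_infinity, since the rotation preserves the uniform measure."""
    return [f[(x + r) % N] for x in range(N)]

def cesaro(f, r, s):
    """A_s(T) f = (1/s)(f + T f + ... + T^{s-1} f)."""
    acc = [Fraction(0)] * N
    g = f[:]
    for _ in range(s):
        acc = [a + x for a, x in zip(acc, g)]
        g = T(g, r)
    return [a / s for a in acc]

f = [Fraction(x * x) for x in range(N)]

# two-parameter average A_{s1}(T1) A_{s2}(T2) f with s1 = s2 = N
avg = cesaro(cesaro(f, r2, N), r1, N)

# since gcd(r_i, N) = 1, each rotation averages over the full orbit,
# so A_inf(T_i) f = mean(f) and the product of the limits is the mean
mean = sum(f) / N
assert all(a == mean for a in avg)
```

Because the steps are coprime to $N$, each full-orbit average is already exactly the mean, so the two-parameter average reaches the factorized limit at finite $s_i$.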
(9.2.4) (Non-positive multiparameter Dunford-Schwartz.) The statement is the same as (9.2.3) but the $T_i$'s are not assumed positive. The linear modulus operators $|T_i|$, defined by
$$|T_i|f = \sup_{|g| \le f} |T_i g| \quad\text{for all } f \ge 0,$$
are positive contractions on $L_1$ and $L_\infty$ (see Krengel [1985], p. 160). Hence they have the properties required of positive dominants, and the result (9.2.4) follows from Theorem (9.1.5).
(9.2.5) (Multiparameter Mourier theorem.) The one-parameter result, the Banach-valued Birkhoff ergodic theorem, is due to E. Mourier [1953]; for a proof see Krengel [1985], p. 167. Our multiparameter version is as follows:
Theorem. Let $E$ be a Banach space. Let $\theta_i$ be measure-preserving point transformations on a $\sigma$-finite measure space $(\Omega, \mathcal{F}, \mu)$. Define $T_i f = f \circ \theta_i$ on $L_{\max}(E) = L_1(E) + L_\infty(E)$. Let $f \in R_{d-1}(E)$. Then the averages
$$\frac{1}{s_1 s_2 \cdots s_d} \sum_{k_1=0}^{s_1-1} \sum_{k_2=0}^{s_2-1} \cdots \sum_{k_d=0}^{s_d-1} T_1^{k_1} T_2^{k_2} \cdots T_d^{k_d} f$$
converge a.e. as the indices $s_i$ converge to infinity independently.
Proof. The positive dominants are defined as $T_i f = f \circ \theta_i$, $f \in L_{\max}(\mathbb{R})$. They have the required one-parameter convergence and maximal theorems, because measure-preserving point transformations on real function spaces do (see Chapter 8); so (9.1.5) is again applicable.
(9.2.6) (Commuting operators.) If the operators $T_i$ in (9.2.3) commute, and we desire only convergence "over squares," where $s_1 = \cdots = s_d$, then it suffices to assume that $f \in L_1$ (or only $f \in R_0$). This result is due to Dunford & Schwartz; for Brunel's proof, see Krengel [1985], p. 215. Our approach via Theorem (9.1.3) does not yield this deep theorem.

(9.2.7) (Multiparameter $L_p$ theorem.) Akcoglu [1975] proved that if $T$ is a positive contraction on an $L_p$ space, $1 < p < \infty$, then for each $f \in L_p$ there is a maximal theorem $\|\sup_n A_n(T)f\|_p \le (p/(p-1))\|f\|_p$, and $A_n(T)f$ converges a.e. Clearly Theorem (9.1.3) applies with all spaces $L(i) = L_p$. This gives the convergence of averages of the form
$$\frac{1}{s_1 s_2 \cdots s_d} \sum_{k_1=0}^{s_1-1} \sum_{k_2=0}^{s_2-1} \cdots \sum_{k_d=0}^{s_d-1} T_1^{k_1} T_2^{k_2} \cdots T_d^{k_d} f$$
for each $f \in L_p$.
(9.2.8) (Heart of $L_p + L_\infty$.) There is an $L_p$ analog of the theory of the $R_k$ spaces. Define the Orlicz space $L_p + L_\infty$ as in (2.2.13), and denote its heart by $M_p$. Then $f \in M_p$ if and only if, for each $\varepsilon > 0$, there is a decomposition $f = f_1 + f_2$ where $f_1 \in L_p$ and $\|f_2\|_\infty \le \varepsilon$ (see (2.1.14c)). If $T$ is a positive contraction on $L_p$ that is mean-bounded in $L_\infty$,
$$\sup_n \|A_n(T)\|_\infty = c < \infty,$$
then for $f \in M_p$ we have: $A_n(T)f$ converges a.e. and $\sup_n A_n(T)f \in M_p$. Indeed, by (2.1.14c), for each $f$ and each $\varepsilon > 0$, there is a decomposition $f = f_1 + f_2$ with $f_1 \in L_p$ and $\|f_2\|_\infty \le \varepsilon/c$. Hence $\sup_n A_n(T)|f_2| \le \varepsilon$. Now
$$\sup_n A_n(T)|f| \le \sup_n A_n(T)|f_1| + \sup_n A_n(T)|f_2|;$$
the first term on the right is in $L_p$ by Akcoglu's maximal theorem, and the second term on the right is $\le \varepsilon$. The conditions are again satisfied for an application of Theorem (9.1.3) with all the spaces $L(i) = M_p$. It follows that for each $f \in M_p$ the averages
$$\frac{1}{s_1 s_2 \cdots s_d} \sum_{k_1=0}^{s_1-1} \sum_{k_2=0}^{s_2-1} \cdots \sum_{k_d=0}^{s_d-1} T_1^{k_1} T_2^{k_2} \cdots T_d^{k_d} f$$
converge a.e.
Remarks
McGrath [1980], Theorem 3, obtained the convergence result (9.2.7); Yoshimoto [1982] obtained the convergence result (9.2.8) for $f \in L_p$; they used multiparameter maximal theorems. Discussion of the hearts of the spaces $L_p + L_\infty$ in the ergodic context is from Edgar & Sucheston [1989].
9.3. Multiparameter ratio ergodic theorems

We recall some results from Chapter 8. Let $T$ be a sub-Markovian operator (positive contraction) on $L_1$. Then the space $\Omega$ decomposes into the conservative part $C$ and the dissipative part $D$. If $f \in L_1^+$, then $\sum_{i=0}^{\infty} T^i f = 0$ or $\infty$ on $C$, and $\sum_{i=0}^{\infty} T^i f < \infty$ on $D$ (8.3.1). $T$ is called Markovian if it preserves the integral. A sequence $(s_n)$ of functions in $L_1^+$ is called an extended superadditive process if
$$s_{k+n} \ge s_k + T^k s_n$$
for every $k, n \ge 0$, and $(s_n)$ is called a superadditive process if in addition
$$\gamma = \sup_n \frac{1}{n}\int s_n\,d\mu < \infty.$$
The constant $\gamma$ is called the time constant of the process. The sequence $(s_n)$ is subadditive iff $(-s_n)$ is superadditive; $(s_n)$ is additive iff it is both superadditive and subadditive.

Note that if $(s_n)$ is superadditive, then since $\gamma < \infty$ we have $s_n \ge \sum_{k=0}^{n-1} T^k s_1$ for all $n$. We now restate the main part of Theorem (8.6.7):
(9.3.1) Theorem. Let $T$ be a sub-Markovian operator on $L_1$, let $(r_n)$ be a positive superadditive process, and let $(s_n)$ be a positive extended superadditive process. Write $E = \{\sup_n s_n > 0\}$. Then the ratio $r_n/s_n$ converges a.e. to a finite limit on the set $C \cap E$. If either $T$ is Markovian or $(r_n)$ is additive on $D$, then $\lim(r_n/s_n) = \lim r_n / \lim s_n < \infty$ exists a.e. on $D \cap E$.

We are going to use Theorem (9.1.3) to prove a multiparameter variant of this result, in which the sequence in the numerator is additive. A multiparameter version of the Chacon-Ornstein theorem is obtained if the sequence in the denominator is also assumed to be additive.
Assume $s_1 > 0$ and let $f \in L_1^+$. Define
$$h = \sup_n \frac{\sum_{i=0}^{n-1} T^i f}{s_n}, \qquad g = \sup_n \frac{\sum_{i=0}^{n-1} T^i f}{\sum_{i=0}^{n-1} T^i s_1}.$$
Let $\nu$ be the finite measure $s_1\mu$. Then, by (8.2.3),
$$\lambda\,\nu\{g > \lambda\} \le \int_{\{g>\lambda\}} \frac{f}{s_1}\,d\nu.$$
Now since $f/s_1 \in L_1(\nu)$, if $f/s_1 \in L\log^k L(\nu)$, then by Corollary (3.1.9b) we have $g \in L\log^{k-1} L(\nu)$. But $h \le g$, so we have proved:
(9.3.2) Lemma. If $f/s_1 \in L\log^k L(\nu)$, then $h \in L\log^{k-1} L(\nu)$.

We are now ready for the multiparameter ratio ergodic theorem.
(9.3.3) Theorem. Let $(\Omega, \mathcal{F}, \mu)$ be a probability space. Let $T_i$, $i = 1, 2, \ldots, d$, be positive contractions on $L_1(\mu)$; for each $i$ let $(s_n^{(i)})$ be a positive extended superadditive sequence with respect to $T_i$, and let $\nu_i$ be the measure $s_1^{(i)}\mu$, $i = 1, 2, \ldots, d$. Suppose that the functions $s_1^{(1)}, s_1^{(2)}/s_1^{(1)}, \ldots, s_1^{(d)}/s_1^{(d-1)}$ are bounded away from 0. Then for each $f$ such that $f/s_1^{(d)} \in L\log^{d-1} L(\nu_d)$,
$$\frac{\sum_{k_1=0}^{n_1-1} T_1^{k_1}}{s_{n_1}^{(1)}}\;\frac{\sum_{k_2=0}^{n_2-1} T_2^{k_2}}{s_{n_2}^{(2)}}\;\cdots\;\frac{\sum_{k_d=0}^{n_d-1} T_d^{k_d}}{s_{n_d}^{(d)}}\, f$$
converges a.e. as $n_1, n_2, \ldots, n_d$ go to infinity independently.
To explain the intuitive meaning, consider the case when $d = 2$, $T_1 = T$, $T_2 = U$, $s_n^{(1)} = n$, $s_n^{(2)} = \sum_{j=0}^{n-1} U^j 1$. Then the theorem asserts that for each $f \in L\log L$,
$$\frac{1}{m}\sum_{i=0}^{m-1} T^i\,\frac{\sum_{j=0}^{n-1} U^j f}{\sum_{j=0}^{n-1} U^j 1}$$
converges a.e. to a finite limit as $m, n \to \infty$ independently.
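Each factor here is a one-parameter ratio of partial sums of the kind covered by (9.3.1). A minimal numerical sketch of such a ratio, for a Markovian operator on a three-point space (the matrix $P$, the test functions, and the tolerance are our illustrative choices, not from the text):

```python
# 3-state irreducible, aperiodic chain; P is row-stochastic, so the operator
# T below is Markovian: a positive contraction of L1 preserving the integral
P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.2, 0.4]]

def T(f):
    """(T f)(y) = sum_x f(x) P[x][y], acting on densities in L1."""
    return [sum(f[x] * P[x][y] for x in range(3)) for y in range(3)]

def partial_sums(f, n):
    """r_n = f + T f + ... + T^{n-1} f, the additive process generated by f."""
    acc, g = [0.0, 0.0, 0.0], list(f)
    for _ in range(n):
        acc = [a + x for a, x in zip(acc, g)]
        g = T(g)
    return acc

f, g = [1.0, 0.0, 2.0], [0.5, 1.0, 0.5]
n = 2000
num, den = partial_sums(f, n), partial_sums(g, n)
ratios = [a / b for a, b in zip(num, den)]

# on an irreducible chain T^i h approaches pi * sum(h), so the ratio
# converges at every state to the constant sum(f)/sum(g) = 3/2
assert all(abs(r - 1.5) < 0.05 for r in ratios)
```

The whole chain is conservative here, so the ratio converges everywhere; the dissipative phenomena of (9.3.1) need an infinite space.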
Proof of the Theorem. Define the operators
$$T(k, n) = \frac{\sum_{i=0}^{n-1} T_k^i}{s_n^{(k)}}.$$
For $k = 1, 2, \ldots, d$, let
$$L(k) = \{\, f : f/s_1^{(k)} \in L\log^{k-1} L(\nu_k) \,\},$$
and $L(0) = L(1) = L_1(\nu_1)$. The norm of $f$ in $L(k)$ is defined as the norm of $f/s_1^{(k)}$ in $L\log^{k-1} L(\nu_k)$. Since $L\log^{k-1} L$ is a Banach lattice with order-continuous norm, so is $L(k)$. Also, if $f \in L_1(\mu)$, then $T(k, n)f \in L_1(\nu_k) \subseteq L_1(\nu_{k-1}) \subseteq \cdots \subseteq L_1(\nu_1) \subseteq L_1(\mu)$. Suppose $f \in L(k)$. Then by Lemma (9.3.2),
$$h_k = \sup_n T(k, n)f \in L\log^{k-2} L(\nu_k).$$
Since $s_1^{(k)}/s_1^{(k-1)}$ is bounded away from 0, we have $L\log^{k-2} L(\nu_k) \subseteq L\log^{k-2} L(\nu_{k-1})$. The assumptions also imply that $s_1^{(k-1)}$ is bounded away from 0, so $h_k \in L(k-1)$. The existence of the limits $\lim_n T(k, n)f$ follows from Theorem (9.3.1); since the numerator process is additive, it is not necessary to assume $T_k$ Markovian. The proof is completed by Theorem (9.1.3).

Complements
(9.3.4) (A counterexample to a variant of Chacon-Ornstein in two parameters.) A multiparameter version of the Chacon-Ornstein theorem corresponds to the case of (9.3.3) when the processes $(s_n^{(i)})$ are additive for $i = 1, 2, \ldots, d$. It is natural to ask whether a multiparameter version analogous to Theorem (9.2.3) holds, i.e., convergence of ratios of two expressions of the type appearing in (9.2.3), corresponding to two functions. The answer is no: there is a counterexample due to Brunel and Krengel (see Krengel [1985], p. 217). It has $\Omega = \mathbb{N}$ with counting measure, functions in $L_1$, commuting operators, and convergence to infinity over "squares." A change of measure, with appropriate modification of operators and functions, reduces the situation to a probability space. Since in the discrete case $L_1 \subseteq L\log L$, this is also a counterexample to our conjecture, where convergence (also over "rectangles") would be claimed for $f \in L\log L$.

Remarks
Theorem (9.3.3) is from Sucheston & Szabó [1991]. The additive version of it is in Frangos & Sucheston [1986].
9.4. Multiparameter martingales

A martingale $(X_t)_{t\in J}$ defined on a probability space $(\Omega, \mathcal{F}, P)$ and indexed by a directed set $J$ need not converge a.e.; we know this from Chapter 4. This is even true for martingales indexed by $\mathbb{N}\times\mathbb{N}$. In fact, the convergence may fail for two reasons:

(1) The integrability is insufficient: in two dimensions, in order for $X_t = E^{\mathcal{F}_t}[X]$ to converge, the random variable $X$ must be in $L\log L$, even if there is appropriate independence of marginal distributions (see Cairoli [1970]).

(2) The stochastic basis $(\mathcal{F}_t)$ may be such that even $L_\infty$-bounded martingales fail to converge (Dubins & Pitman [1980]).

We will apply a weak maximal inequality for submartingales. Note that $L\log^k L \subseteq L_1$ for a probability space.
(9.4.1) Lemma. Let $(X_n)$ be a positive submartingale, and let $Y = \sup_n X_n$. Suppose $X_n$ converges in $L_1$ to $X \in L_1$. Then for each $\lambda > 0$,
$$\lambda\,P\{Y > \lambda\} \le E\bigl[1_{\{Y>\lambda\}} X\bigr].$$
If $X \in L\log^k L$ then $Y \in L\log^{k-1} L$.

Proof. Since $X_n$ converges in $L_1$, it is uniformly integrable. Therefore we may pass to the limit under the integral sign in (6.1.5), with $P_1 = P$. This proves the first assertion. The second assertion follows from (3.1.9(b)).

Recall ((1.3.1) and (1.4.3)) that $L_1$-bounded martingales (and submartingales) indexed by directed sets converge stochastically, and in the presence of uniform integrability converge in $L_1$ norm. In this section we will consider martingales in $d$ dimensions. We will see that a "block martingale" converges a.s. if it is properly bounded. A particular case is convergence under condition (F4). The proof is by reducing block martingales to consecutive applications of conditional expectations and applying Theorem (9.1.3).

Fix a positive integer $d$. The directed set $I$ will be $\mathbb{N}^d = \mathbb{N}\times\mathbb{N}\times\cdots\times\mathbb{N}$, with $d$ factors. The ordering on $I$ is defined as usual: if $s = (s_1, s_2, \ldots, s_d)$ and $t = (t_1, t_2, \ldots, t_d)$, then $s \le t$ iff $s_i \le t_i$ for all $i$.

Let $(\mathcal{F}_t)_{t\in I}$ be a stochastic basis. For integers $i, j$ with $1 \le i \le j \le d$, write $\mathcal{F}_t^{i..j}$ for the $\sigma$-algebra obtained by lumping together the $\sigma$-algebras on all the axes except for the axes numbered from $i$ to $j$. More precisely, $\mathcal{F}_t^{i..j}$ is the $\sigma$-algebra generated by all $\mathcal{F}_s$ with $s_k = t_k$ for $i \le k \le j$ and $s_k$ arbitrary for the other $k$. Of course the containment relations remain, so that for fixed $i$ and $j$, the family $(\mathcal{F}_t^{i..j})_{t\in I}$ is again a stochastic basis. When $i = j$, we will sometimes write $\mathcal{F}_t^i$ for $\mathcal{F}_t^{i..i}$. Denote the conditional expectation $E^{\mathcal{F}_t^{i..j}}$ by $E_t^{i..j}$ and $E^{\mathcal{F}_t}$ by $E_t$.
Since the index set $I = \mathbb{N}^d$ is a directed set, the definition of Chapter 4 applies: a process $(X_t)_{t\in I}$ is a martingale [submartingale] if $X_s = E_s X_t$ [$X_s \le E_s X_t$] whenever $s \le t$. Now $I$ has additional structure, so other variants are possible. If $k \in \mathbb{N}$, $1 \le k \le d$, then we will say that the process $(X_t)$ is a block $k$-martingale [block $k$-submartingale] if
$$E_s^{1..k} X_t = X_{(s_1,\ldots,s_k,t_{k+1},\ldots,t_d)} \qquad \bigl[\,E_s^{1..k} X_t \ge X_{(s_1,\ldots,s_k,t_{k+1},\ldots,t_d)}\,\bigr]$$
whenever $s \le t$. An integrable process is a block martingale [block submartingale] if it is a block $k$-martingale [block $k$-submartingale] for all $k \le d$. Block $k$-martingales should not be confused with "$k$-martingales," defined by
$$E_s^k X_t = X_{(t_1,\ldots,t_{k-1},s_k,t_{k+1},\ldots,t_d)}.$$
Block martingales may be characterized in terms of "factorization" of conditional expectations:
(9.4.2) Proposition. (1) Let $(X_t)$ be a uniformly integrable martingale. Write $X = s\text{-}\lim X_t$. Then the following are equivalent:
(a) $X_t = E_t X$ is a block martingale.
(b) For each $k < d$, $E_t X = E_t^{1..k} E_t^{(k+1)..d} X$.
(2) Let $(X_t)$ be a uniformly integrable block submartingale, and write $X = s\text{-}\lim X_t$. Then for each $k < d$ and each $t$, we have $X_t \le E_t^{1..k} E_t^{(k+1)..d} X$.

Proof. First let $(X_t)$ be a uniformly integrable block submartingale. Let $s = (s_1, \ldots, s_d) \le t = (t_1, \ldots, t_d)$ with $t_{k+1} = s_{k+1}, \ldots, t_d = s_d$. This is not a loss of generality in what follows, because if $s \le t$ and $r = (s_1, \ldots, s_k, t_{k+1}, \ldots, t_d)$, then $E_s^{1..k} = E_r^{1..k}$. Since $(X_t)$ is a block $k$-submartingale, we have
$$X_s = X_{(s_1,\ldots,s_k,t_{k+1},\ldots,t_d)} \le E_s^{1..k} X_t \le E_s^{1..k} E_t X.$$
Taking the limit in $L_1$ when $t_1 \to \infty$, $t_2 \to \infty$, \ldots, $t_k \to \infty$, we obtain $\lim E_t X = E_s^{(k+1)..d} X$, hence
$$X_s \le E_s^{1..k} E_s^{(k+1)..d} X.$$
This proves (2).

Now let $(X_t)$ be a uniformly integrable block martingale. Then (a) $\Rightarrow$ (b) follows from part (2) applied to $X_t$ and $-X_t$. Finally, to prove (b) $\Rightarrow$ (a), assume that $E_t X = E_t^{1..k} E_t^{(k+1)..d} X$ for each $k$. Then
$$E_s^{1..k} X_t = E_s^{1..k} E_t^{1..k} E_t^{(k+1)..d} X = E_s^{1..k} E_t^{(k+1)..d} X.$$
Let $u$ be such that $u_i = t_i$ for $k+1 \le i \le d$. Then as $u_1 \to \infty$, \ldots, $u_k \to \infty$, we have $X_u \to E_t^{(k+1)..d} X$. Since $E_s^{1..k} X_u \to X_{(s_1,\ldots,s_k,t_{k+1},\ldots,t_d)}$, the proof is complete.
(9.4.3) Theorem. Let $(X_t)$ be a uniformly integrable block submartingale, and write $X = s\text{-}\lim X_t$. Then for all $s$, we have
$$X_s \le E_s^1 E_s^2 \cdots E_s^d X.$$
Let $(X_t)$ be a uniformly integrable block martingale. Then
$$X_s = E_s^1 E_s^2 \cdots E_s^d X \quad\text{for all } s.$$
Proof. Let $(X_t)$ be a uniformly integrable block submartingale. We claim that for all $k$, $1 \le k \le d$,
$$X_s \le E_s X \le E_s^1 E_s^2 \cdots E_s^k E_s^{(k+1)..d} X. \tag{9.4.3a}$$
Only the second inequality has to be proved. The proof is by induction on $k$. The case $k = 1$ follows from (9.4.2). Now assume (9.4.3a) holds for some value of $k$, and consider the next value $k+1$. Again by (9.4.2),
$$X_s \le E_s X \le E_s^{1..(k+1)} E_s^{(k+2)..d} X.$$
Taking the limit in $L_1$ as $s_1 \to \infty$, \ldots, $s_k \to \infty$, we obtain
$$E_s^{(k+1)..d} X \le E_s^{k+1} E_s^{(k+2)..d} X.$$
Now substitute this into (9.4.3a) to complete the induction step. This completes the proof in the case of a block submartingale. The case of a block martingale follows from this.
(9.4.4) Theorem. (i) Let $(X_t)$ be a block submartingale that is bounded in the Orlicz space $L\log^{d-1} L$. Then we have upper demiconvergence: $\limsup X_t = s\text{-}\lim X_t$. (ii) If $X \in L\log^{d-1} L$, then $E_t^1 E_t^2 \cdots E_t^d X$ converges a.e. to a finite limit. (iii) Let $(X_t)$ be a block martingale bounded in $L\log^{d-1} L$. Then $X_t$ converges a.e. and in $L\log^{d-1} L$.
Proof. (i) First assume that $(X_t)$ is a positive block submartingale bounded in $L\log^{d-1} L$. Apply Theorem (9.1.3(ii)) and Lemma (9.4.1) with $L(i) = L\log^{i-1} L$ for $1 \le i \le d$ and $T(i, t) = E_t^i$. Note that as $t_1 \to \infty$, the $\sigma$-algebras $\mathcal{F}_t^1$ increase to $\bigvee_t \mathcal{F}_t$, so $E_t^1 X$ converges to $X$ as $t_1 \to \infty$. Hence $T(1, \infty)X = X$ a.e. Similarly $T(i, \infty)X = X$ a.e. for all $i$. Therefore, by (9.4.3) and (9.1.3(ii)), we have
$$\limsup X_t \le \limsup E_t^1 E_t^2 \cdots E_t^d X \le T(1,\infty)\,T(2,\infty)\cdots T(d,\infty)\,X = X.$$
On the other hand, we always have $X = s\text{-}\lim X_t \le \limsup X_t$. Hence the statement follows for positive block submartingales, and hence also for block submartingales bounded from below by a constant. Now let $(X_t)$ be an arbitrary $L\log^{d-1} L$-bounded block submartingale. Then for any constant $a$, the process $(X_t \vee a)$ is also a block submartingale, so by the above, we have $\limsup(X_t \vee a) = X \vee a$. By Fatou's lemma, $\limsup X_t > -\infty$ a.e. Thus, letting $a \to -\infty$, we obtain $\limsup X_t = X$.

(ii) Apply Theorem (9.1.3(iii)) and Lemma (9.4.1) with $L(i) = L\log^{i-1} L$ for $1 \le i \le d$.
(9.4.5) (Reversed processes.) There is also a version of our results with a dual directed set $I$. Fix a positive integer $d$. Consider
$$I = (-\mathbb{N}) \times (-\mathbb{N}) \times \cdots \times (-\mathbb{N})$$
with $d$ factors. The ordering on $I$ is given by: $s = (s_1, \ldots, s_d) \le t = (t_1, \ldots, t_d)$ iff $s_i \le t_i$ for all $i$. But $I$ is a dual directed set: given any $s, t \in I$, there is $u \in I$ with $u \le s$ and $u \le t$. When we discuss limits indexed by $I$, we are interested in what happens when all the indices go to $-\infty$. Block martingales and block submartingales are defined by the same inequalities.

(9.4.6) (Two-parameter martingales.) Let $I = \mathbb{N}\times\mathbb{N}$. A stochastic basis $(\mathcal{F}_t)_{t\in I}$ is said to satisfy condition (F4) if $\mathcal{F}_t^1$ is conditionally independent of $\mathcal{F}_t^2$ given $\mathcal{F}_t$ (Cairoli & Walsh [1975]). A process $(X_t)$ is a 1-martingale if
$$E_s^1 X_t = X_{(s_1, t_2)} \quad\text{whenever } s \le t;$$
the process is a 2-martingale if
$$E_s^2 X_t = X_{(t_1, s_2)} \quad\text{whenever } s \le t.$$
Thus a block martingale is a martingale which is also a 1-martingale. Since (F4) is symmetric, it can be derived only from symmetric assumptions.
Proposition. Let $(\mathcal{F}_t)_{t\in I}$ be a stochastic basis indexed by $I = \mathbb{N}\times\mathbb{N}$. The following are equivalent:
(a) Every uniformly integrable martingale is a 1-martingale and a 2-martingale.
(b) $E_t = E_t^1 E_t^2 = E_t^2 E_t^1$.
(c) (F4).

Proof. A 2-martingale is a 1-martingale after exchanging the coordinates. So the equivalence of (a) and (b) follows from (9.4.2).

(b) $\Rightarrow$ (c). Assume (b). We must show that $\mathcal{F}_t^1$ is conditionally independent of $\mathcal{F}_t^2$ given $\mathcal{F}_t$. Let $A \in \mathcal{F}_t^1$, $B \in \mathcal{F}_t^2$, $X = 1_A$, $Y = 1_B$. Then we must show that $E_t[XY] = (E_t X)(E_t Y)$. Since $X$ is $\mathcal{F}_t^1$-measurable and $E_t^2 Y = Y$, (b) gives
$$E_t[XY] = E_t E_t^1[XY] = E_t[X\,E_t^1 Y] = E_t[X\,E_t^1 E_t^2 Y] = E_t[X\,E_t Y] = (E_t X)(E_t Y).$$

(c) $\Rightarrow$ (b). Assume (F4). Let $A \in \mathcal{F}_t^1$ and $X \in L_1$. Then
$$E[1_A E_t^1 E_t^2 X] = E[1_A E_t^2 X] = E[E_t(1_A E_t^2 X)] = E[E_t(1_A)\,E_t(E_t^2 X)] = E[E_t(1_A)\,E_t X] = E[1_A E_t X].$$
Since $E_t^1 E_t^2 X$ and $E_t X$ are both measurable with respect to $\mathcal{F}_t^1$, this implies that $E_t^1 E_t^2 X = E_t X$. By symmetry, $E_t^2 E_t^1 X = E_t X$.
that Et Et X = EtX. By symmetry, Et Et X = EtX. (9.4.7) (A twoparameter strong law of large numbers.) Suppose Yij are integrable independent random variables with expectation zero. Then
Xt = X,n,n = E Yij 1
1<j
is a two-parameter martingale, and the stochastic basis $(\mathcal{F}_t)$ generated by it satisfies (F4). On the other hand, there are situations in which a martingale is both a 1-martingale and a 2-martingale, yet (F4) fails. An example is provided by a two-parameter extension of the strong law of large numbers, via a theorem proved for one parameter in (6.1.15).

As in (6.1.15), let $P_m$ be the class of permutations of $\mathbb{N}$ with support in $\{1, 2, \ldots, m\}$. A double sequence $(Y_{ij})_{i\in\mathbb{N},\,j\in\mathbb{N}}$ of random variables is called 1-exchangeable if for each $m$ and each permutation $\pi \in P_m$, the system $(Y_{i,j})$ has the same distribution as $(Y_{\pi(i),j})$. The double sequence is 2-exchangeable if for each $m$ and each permutation $\pi \in P_m$, the system $(Y_{i,j})$ has the same distribution as $(Y_{i,\pi(j)})$. A double sequence that is both 1-exchangeable and 2-exchangeable is called row and column exchangeable.

Theorem. Let $p > 0$, and let $Y_{ij}$ be positive row and column exchangeable random variables such that $Y_{11}^p \in L_1$. Set
$$S_{m,n} = \sum_{i\le m}\sum_{j\le n} Y_{ij}^p$$
and $X_t = X_{(-m,-n)} = S_{m,n}/mn$. If $0 < p \le 1$, then $X_t$ is a reversed block submartingale; if $1 \le p < \infty$, then $X_t$ is a reversed block supermartingale. If $0 < p \le 1$ and $Y_{11}^p \in L\log L$, then $X_t$ converges a.e. The limit is 0 if $p < 1$; the limit is $E^{\mathcal{E}}[Y_{11}]$ if $p = 1$, where $\mathcal{E}$ is the $\sigma$-algebra of row and column exchangeable events.
Proof. Set
$$\mathcal{F}_{(-m,-n)} = \sigma\bigl\{\, X_{(r,s)} : r \ge m,\ s \ge n \,\bigr\}.$$
If $\pi \in P_m$, then the two systems
$$\{\,Y_{i,j},\ 1 \le i \le m;\ S_{r,s},\ r \ge m\,\} \quad\text{and}\quad \{\,Y_{\pi(i),j},\ 1 \le i \le m;\ S_{r,s},\ r \ge m\,\}$$
have the same distribution. Therefore for all $i \le m$,
$$E\bigl[Y_{i,j}^p \mid \mathcal{F}_{(-m,-1)}\bigr] = E\bigl[Y_{\pi(i),j}^p \mid \mathcal{F}_{(-m,-1)}\bigr].$$
Now the lemma proved in (6.1.15) implies that for each $k \le m$ and each $n$,
$$E\bigl[X_{(-k,-n)} \mid \mathcal{F}_{(-m,-n)}\bigr] \ge X_{(-m,-n)}$$
if $0 < p \le 1$. That is, $X_t$ is a reversed 1-submartingale. By symmetry, it is also a reversed 2-submartingale. Applying these two properties one after the other, we see that $X_t$ is a reversed submartingale. Therefore $X_t$ is a reversed block submartingale. So $X_t$ demiconverges by Theorem (9.4.4). If $0 < p < 1$, then $s\text{-}\lim X_t = \limsup X_t = 0$, because the one-parameter limits are 0 by (6.1.15); hence $\lim X_t = 0$. If $p = 1$, then $X_t$ is a reversed block martingale, so it converges. The limit is identified by integration over row and column exchangeable sets. The proof that if $p > 1$, then $X_t$ is a reversed block supermartingale, is analogous.
(9.4.8) (Banach-valued martingales in probability spaces.) Let $E$ be a Banach space, and let $(\mathcal{F}_t)$ be a stochastic basis indexed by $I = \mathbb{N}^d$. What can be said about $E$-valued martingales $(X_t)$? One-parameter convergence theorems (5.3.20) and maximal theorems (9.4.1) are available for conditional expectations and their positive dominants, which are also conditional expectations, applied to real functions. So (9.1.5) is applicable. In the notation of (9.4.3), we see that $E_s^1 E_s^2 \cdots E_s^d X$ converges a.e. if $X \in L\log^{d-1} L(E)$. A more general formulation with the same proof follows. For each fixed $k$, let $(\mathcal{F}_n^k)$ be either increasing or decreasing (reversed) stochastic bases indexed by $\mathbb{N}$. Let $E_n^k = E^{\mathcal{F}_n^k}$. Then
$$X_{(s_1,s_2,\ldots,s_d)} = E_{s_1}^1 E_{s_2}^2 \cdots E_{s_d}^d X$$
converges a.e. if $X \in L\log^{d-1} L(E)$.

The theory of block martingales also extends to the Banach-valued setting. In particular, Theorems (9.4.3) and (9.4.4) are true under the additional assumption that $E$ has the Radon-Nikodym property. It is needed to obtain the conditional expectation representation of uniformly integrable martingales, hence convergence (5.3.30).

(9.4.9) (Infinite measure.) Now consider Banach-valued processes defined on an infinite measure space $(\Omega, \mathcal{F}, \mu)$. The expression
$$E_{s_1}^1 E_{s_2}^2 \cdots E_{s_d}^d X$$
converges a.e. for $X \in R_{d-1}(E)$, under a proper assumption of $\sigma$-finiteness of the measure. Here we prove the one-parameter results for $E$-valued martingales indexed by $\mathbb{N}$; the extension to several parameters by the methods of this section is easy.
If $(\mathcal{F}_n)$ are increasing $\sigma$-algebras such that $\mu$ is $\sigma$-finite on $\mathcal{F}_1$, then the problem of a.s. convergence of martingales easily reduces to the previous case, since if $A \in \mathcal{F}_1$ is a set of finite measure and $(X_n)$ is a martingale, then $(1_A X_n)$ is a martingale with respect to $\mu$ restricted to $A$; pointwise convergence on $A$ implies pointwise convergence on $\Omega$ as $A \uparrow \Omega$. So $E^{\mathcal{F}_n}[X]$ converges for $X \in L_{\max}(E) = L_1(E) + L_\infty(E)$. Similarly, multiparameter a.e. theorems reduce to those on $A$. Of more interest are reversed martingale theorems, assuming that $\mu$ is $\sigma$-finite on each $\mathcal{F}_{-n}$ but not on $\bigcap_n \mathcal{F}_{-n}$. In this setting, the one-parameter $L_1$ case was treated by Dellacherie & Meyer [1982], pp. 35-40. Here we limit discussion to martingales given by conditional expectations. Observe that if $(X_{-n})$ is a reversed martingale, then $X_{-n} = E^{\mathcal{F}_{-n}}[X_{-1}]$.
(9.4.9 i). Let $E$ be a Banach space. Let $(\Omega, \mathcal{F}, \mu)$ be a $\sigma$-finite measure space, and let $(\mathcal{F}_{-n})$ be a stochastic basis. Assume that $\mu$ is $\sigma$-finite on each $\mathcal{F}_{-n}$.
(a) If $X \in L_{\max}(E) = L_1(E) + L_\infty(E)$, then for every $\lambda > 0$,
$$\lambda\,\mu\Bigl\{\sup_{n\in\mathbb{N}} \bigl\|E^{\mathcal{F}_{-n}}[X]\bigr\| \ge \lambda\Bigr\} \le E\bigl[\|X\|\bigr].$$
(b) If $X \in R_k(E)$, $k \ge 1$, then $\sup_{n\in\mathbb{N}} \|E^{\mathcal{F}_{-n}}[X]\| \in R_{k-1}(E)$.
(c) If $X \in L_{\max}(E)$, then $E^{\mathcal{F}_{-n}}[X]$ converges a.e.
(d) If $X \in R_0(E)$, then $E^{\mathcal{F}_{-n}}[X]$ converges a.e.
Proof. (a) It suffices to consider $\sup_{n\in\mathbb{N}}$, since this is equivalent to taking $\sup_{n\ge m}$ for any fixed $m$, and the maximal inequality is preserved as $m \to \infty$. Fix $A \in \mathcal{F}_{-1}$ with $\mu(A) < \infty$. In finite measure spaces, the localization theorem (1.4.2) states that $X_\sigma = E^{\mathcal{F}_\sigma}[X]$ for simple stopping times $\sigma$. The conditional expectation is a contraction on $L_1(E)$ [see the remarks following (5.1.15)]; it follows that $E[\|X_\sigma\|] \le E[\|X\|]$. Hence the maximal inequality (5.2.36), applied with the measure $\mu$ restricted to $A$, gives
$$\lambda\,\mu\Bigl\{1_A \sup_{n\in\mathbb{N}} \bigl\|E^{\mathcal{F}_{-n}}[X]\bigr\| \ge \lambda\Bigr\} \le E\bigl[\|X\|\bigr].$$
Now let $A \uparrow \Omega$; it follows that
$$\lambda\,\mu\Bigl\{\sup_{n\in\mathbb{N}} \bigl\|E^{\mathcal{F}_{-n}}[X]\bigr\| \ge \lambda\Bigr\} \le E\bigl[\|X\|\bigr].$$

(b) From (8.2.5) we now obtain
$$\lambda\,\mu\Bigl\{\sup_{n\in\mathbb{N}} \bigl\|E^{\mathcal{F}_{-n}}[X]\bigr\| > 2\lambda\Bigr\} \le E\bigl[1_{\{\|X\|\ge\lambda\}}\|X\|\bigr].$$
Now apply (9.2.1).

(c) $X_n$ converges on $A$ by the convergence theorem (5.3.20) applied on the set $A$. Convergence on $\Omega$ follows on letting $A \uparrow \Omega$.

(d) Write $X = X_1 + X_2$, with $X_1 \in L_1$ and $\|X_2\|_\infty \le \varepsilon$. Since conditional expectations are contractions in $L_\infty$, we may disregard $X_2$. So we may assume simply that $X \in L_1$. But now the maximal inequality (a) allows the proof of convergence in the same way as in probability spaces (5.3.36).
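The maximal inequality in (a) can be seen in miniature on a finite space, with dyadic partitions playing the role of the decreasing $\sigma$-algebras (the eight-point space and the extremal $X$ are our illustrative choices, not from the text):

```python
from fractions import Fraction

# Omega = {0,...,7} with uniform measure; F_{-n} is generated by the
# partition into dyadic blocks of length 2^n, so the F_{-n} decrease in n
mu = Fraction(1, 8)

def cond_exp(X, n):
    """E^{F_{-n}}[X]: average X over each dyadic block of length 2^n."""
    b = 2 ** n
    out = []
    for start in range(0, 8, b):
        avg = sum(X[start:start + b]) / b
        out.extend([avg] * b)
    return out

X = [Fraction(v) for v in (8, 0, 0, 0, 0, 0, 0, 0)]
sup = [max(abs(cond_exp(X, n)[w]) for n in range(4)) for w in range(8)]

# check the weak inequality  lam * mu{sup >= lam} <= E|X|  for several lam
EX = sum(abs(x) * mu for x in X)
for lam in (Fraction(1, 2), Fraction(1), Fraction(2), Fraction(4)):
    mass = sum(mu for w in range(8) if sup[w] >= lam)
    assert lam * mass <= EX
```

The mass concentrated at a single point makes the inequality an equality at $\lambda = 1, 2, 4$, so the constant in (a) cannot be improved in this toy setting.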
(9.4.10) (Multiparameter Rota theorem.) A positive operator is called bistochastic if it preserves the $L_1$ and $L_\infty$ norms of positive functions. Let $T_i$ be bistochastic operators, and let $U_n = T_n \cdots T_2 T_1$. The one-parameter Rota theorem is: if $f \in R_1$, then $U_n^* U_n f$ converges a.e. This is sometimes called the "alternating procedure." The operator admits a representation
$$U_n^* U_n f = E^{\mathcal{G}}\bigl[E^{\mathcal{F}_n}[f]\bigr],$$
where $\mathcal{G}$ is a fixed $\sigma$-algebra and $(\mathcal{F}_n)$ is a decreasing sequence of $\sigma$-algebras (Dellacherie & Meyer [1982], p. 56). In order to apply Theorem (9.1.3) to obtain a multiparameter version of the theorem, $f \in R_k$ must imply $\sup_n U_n^* U_n f \in R_{k-1}$. This follows from the representation of $U_n^* U_n$ above, because (by Jensen's inequality) the conditional expectation $E^{\mathcal{G}}$ respects the classes $R_k$. So the multiparameter Rota theorem is:

(9.4.10(i)). For $i = 1, \ldots, d$ and $n \in \mathbb{N}$, let $T_n^{(i)}$ be a bistochastic operator. Set
$$U_n^{(i)} = T_n^{(i)} \cdots T_2^{(i)} T_1^{(i)}.$$
If $f \in R_d$, then
$$\lim\, \bigl(U_{s_1}^{(1)}\bigr)^* U_{s_1}^{(1)} \bigl(U_{s_2}^{(2)}\bigr)^* U_{s_2}^{(2)} \cdots \bigl(U_{s_d}^{(d)}\bigr)^* U_{s_d}^{(d)} f$$
exists a.e. as the indices $s_i$ converge to infinity independently.
(9.4.11) (Pure $L_p$ alternating procedure.) Let $T$ be a linear operator on $L_p$, $1 < p < \infty$. Define a (nonlinear) operator $M(T)\colon L_p^+ \to L_p^+$ by
$$M(T)f = \bigl(T^*\bigl[(Tf)^{p-1}\bigr]\bigr)^{1/(p-1)}.$$
If $(T_n)_{n\in\mathbb{N}}$ is a sequence of positive linear contraction operators on $L_p$, then for all $f \in L_p^+$, the sequence $M(T_n \cdots T_1)f$ converges a.e. and $\sup_n M(T_n \cdots T_1)f \in L_p$ (Akcoglu & Sucheston [1988]). If $p = 2$, then $M(T) = T^*T$ is linear. In that case, (9.1.3) implies an easy multiparameter version of (9.4.10(i)): Let $(T_n^{(i)})_{n\in\mathbb{N}}$ be sequences of positive contractions in $L_2$, for $i = 1, \ldots, d$. If $f \in L_2^+$, then
$$M\bigl(T_{s_1}^{(1)} \cdots T_2^{(1)} T_1^{(1)}\bigr)\,M\bigl(T_{s_2}^{(2)} \cdots T_2^{(2)} T_1^{(2)}\bigr) \cdots M\bigl(T_{s_d}^{(d)} \cdots T_2^{(d)} T_1^{(d)}\bigr)\, f$$
converges a.e. as the indices $s_1, \ldots, s_d$ converge to infinity independently. In particular, we obtain a multiparameter version of Stein's [1961] theorem, which corresponds to the case when the $T_n^{(i)} = T^{(i)}$ are independent of $n$ and self-adjoint. Then for each $f \in L_2$,
$$\bigl(T^{(1)}\bigr)^{2s_1} \bigl(T^{(2)}\bigr)^{2s_2} \cdots \bigl(T^{(d)}\bigr)^{2s_d} f$$
converges a.e. as the indices $s_1, \ldots, s_d$ converge to infinity independently.
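Stein's situation is easy to visualize with a symmetric doubly stochastic matrix, which is bistochastic and self-adjoint on $L_2$ of the uniform measure; $T^{2s}f$ then converges geometrically to the invariant part of $f$. A toy sketch (the matrix is our choice, not from the text):

```python
# symmetric doubly stochastic matrix: self-adjoint on L2 of the uniform
# measure and bistochastic, so it fits Stein's hypotheses in miniature
T = [[0.5, 0.25, 0.25],
     [0.25, 0.5, 0.25],
     [0.25, 0.25, 0.5]]

def apply(T, f):
    """(T f)(x) = sum_y T[x][y] f(y)."""
    return [sum(T[x][y] * f[y] for y in range(3)) for x in range(3)]

def power(T, f, k):
    """T^k f by repeated application."""
    for _ in range(k):
        f = apply(T, f)
    return f

f = [3.0, -1.0, 4.0]
g = power(T, f, 2 * 10)          # T^{2s} f with s = 10

# the second eigenvalue of this T is 1/4, so T^{2s} f converges
# geometrically to the invariant (constant) part of f, the mean 2
assert all(abs(x - 2.0) < 1e-10 for x in g)
```

Self-adjointness matters: the even powers $T^{2s} = (T^s)^*T^s$ are exactly the alternating products $M(T^s)$ of the text.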
(9.4.12) (Amart approach to two parameters.) Let $J = \mathbb{N}\times\mathbb{N}$. For $t \in J$ we write $t = (t_1, t_2)$ and $\mathcal{F}_t^1 = \bigvee_u \mathcal{F}_{(t_1,u)}$. A map $\tau\colon \Omega \to J = \mathbb{N}\times\mathbb{N}$ is a 1-stopping time iff $\{\tau = t\} \in \mathcal{F}_t^1$ for all $t \in J$. The set of simple 1-stopping times is denoted $\Sigma^1$. A 1-amart is a process $(X_t)$ such that the net $(E[X_\tau])_{\tau\in\Sigma^1}$ converges. Assume that the stochastic basis satisfies the conditional independence condition (F4). Let $(X_t)$ be a block martingale; equivalently, let $(X_t)$ be a martingale and a 1-martingale (9.4.6). Then $(X_t)$ is a 1-amart. Since $(\mathcal{F}_t^1)$ is totally ordered, this implies that $X_t$ converges a.s. ((4.2.11) and (4.2.5)). Convergence of $X_t$ also follows from (9.4.4) above, which is a more general argument. But the amart approach was particularly important in the continuous-parameter case ($J = \mathbb{R}\times\mathbb{R}$), where it was applied to the problem of regularity of trajectories of martingales bounded in $L\log L$ under condition (F4) (Millet & Sucheston [1981a]). This method proved regularity in the first, second, and fourth quadrants, after Bakry [1979] proved regularity in the first and third quadrants by the method of stochastic integration; see also Meyer [1981]. Again the amart point of view gave the good notions of optional and predictable projections for two-parameter processes; see Bakry [1981].

(9.4.13) (Multiparameter Krengel theorem.) Krengel's stochastic ergodic theorem asserts that if $T$ is a positive linear contraction on $L_1$ and $f \in L_1^+$, then $(1/n)\sum_{i=0}^{n-1} T^i f$ converges stochastically. Since
$$\liminf_n \frac{1}{n}\sum_{i=0}^{n-1} T^i f = 0,$$
there is in fact lower demiconvergence to 0 (Krengel [1985], p. 143). Theorem (9.1.3(i)) now gives a multiparameter version of this result (Millet & Sucheston [1989]).
(9.4.14) (Additive amarts.) Multiparameter processes may be studied in another way. A few details are given here.

Let $I$ be a directed set with least element 0, and locally finite in the sense that all intervals $[0, t] = \{\, s \in I : 0 \le s \le t \,\}$ are finite. A subset $S \subseteq I$ is a (lower) layer of $I$ if from $s \le t$ and $t \in S$ it follows that $s \in S$. We will write $\mathcal{L}(I) = \mathcal{L}$ for the set of all layers. Then $\mathcal{L}$ is a directed set when ordered by inclusion. So we may study processes with index set $\mathcal{L}$, such as martingales or amarts. If $(\mathcal{F}_t)_{t\in I}$ is a stochastic basis indexed by $I$, there is an associated stochastic basis $(\mathcal{G}_S)_{S\in\mathcal{L}}$ indexed by the layers, defined by
$$\mathcal{G}_S = \bigvee_{s\in S} \mathcal{F}_s.$$
A process $(F_S)$ indexed by $\mathcal{L}$ is called an additive process if
$$F_{S\cup T} + F_{S\cap T} = F_S + F_T \quad\text{a.e.}$$
for all $S, T \in \mathcal{L}$. Certainly any process $(F_S)$ of the form
$$F_S = \sum_{s\in S} Y_s$$
is additive. An additive process that is also an amart is called an additive amart. An additive process that is also a martingale is called an additive martingale. If a process $(X_t)_{t\in I}$ has the form
$$X_t = \sum_{s\le t} Y_s$$
for some difference process $(Y_t)$, then we will say the process $(X_t)$ and the additive process
$$F_S = \sum_{s\in S} Y_s$$
are associated processes. In that case:

(i) If we have $\sup_{t\in I} E[|X_t|] < \infty$ and $\sup_{\tau\in\Sigma(\mathcal{L})} |E[F_\tau]| < \infty$, then also $\sup_{\tau\in\Sigma(\mathcal{L})} E[|F_\tau|] < \infty$.
(ii) If $(F_S)$ is an $L_1$-bounded amart, then $(|F_S|)$ is also an $L_1$-bounded amart.

In some special cases, for example $I = \mathbb{N}\times\mathbb{N}$, there is more. Every process $(X_t)$ can be obtained by adding a difference process $(Y_t)$. In the case $I = \mathbb{N}\times\mathbb{N}$ we also have:

(iii) If $(F_S)$ is an additive amart, and $X_t = F_{[0,t]}$ is $L_1$-bounded, then $X_t$ converges essentially.
(iv) $(F_S)$ is an additive martingale if and only if
$$E\bigl[X_{i+1,j+1} - X_{i+1,j} - X_{i,j+1} + X_{i,j} \mid \mathcal{F}_{i,\bullet} \vee \mathcal{F}_{\bullet,j}\bigr] = 0,$$
$$E\bigl[X_{i+1,0} - X_{i,0} \mid \mathcal{F}_{i,\bullet}\bigr] = 0,$$
$$E\bigl[X_{0,j+1} - X_{0,j} \mid \mathcal{F}_{\bullet,j}\bigr] = 0$$
for all $i, j \in \mathbb{N}$. The conditions in (iv) mean that $(X_t)$ is a strong martingale in the sense of Walsh [1979]. So (iii) implies Walsh's result on the convergence of strong martingales. Reference: Edgar [1982].

Remarks

Frangos & Sucheston [1985] introduced block martingales and proved their convergence. Earlier, Chatterji [1975] derived convergence of block martingales for $d = 2$ from a Banach-valued martingale convergence theorem. Theorems (9.4.2), (9.4.3), and the proofs of (9.4.4(i)) and (9.4.4(iii)) are from Sucheston & Szabó [1991]; (9.4.4(ii)) is from Sucheston [1983]. This last result was applied to random fields by Föllmer [1984]; (9.4.7) is from Edgar & Sucheston [1981]. Row and column exchangeable random variables were introduced by D. J. Aldous [1981], and by Edgar & Sucheston [1981] under the name T-exchangeable. There exists now a considerable body of literature on the subject (see Aldous [1985]).
The theory of martingales indexed by $\mathbb{R}\times\mathbb{R}$ was initiated in the memoir of Cairoli & Walsh [1975]. The book Korezlioglu, Mazziotto, & Szpirglas [1981] is a collection of articles devoted to it.
References
M. A. Akcoglu
[1975] A pointwise ergodic theorem in Lp spaces. Canad. J. Math. 27, pp. 1075–1082.

M. A. Akcoglu, L. Sucheston
[1978] A ratio ergodic theorem for superadditive processes. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 44, pp. 269–278.
[1983] A stochastic ergodic theorem for superadditive processes. J. Ergodic Theory and Dynamical Systems 3, pp. 335–344.
[1984a] On ergodic theory and truncated limits in Banach lattices. In: Measure Theory Oberwolfach 1983, Lecture Notes in Mathematics 1089, Springer-Verlag; pp. 241–262.
[1984b] On identification of superadditive ergodic limits. In: Yale University Symposium Dedicated to S. Kakutani, Contemporary Mathematics 26, American Mathematical Society; pp. 25–32.
[1985a] On uniform monotonicity of norms and ergodic theorems in function spaces. Supplemento ai Rend. Circ. Mat. Palermo 8, pp. 325–335.
[1985b] An ergodic theorem on Banach lattices. Israel J. Math. 51, pp. 208–222.
[1988] Pointwise convergence of alternating sequences. Canad. J. Math. 40, pp. 610–632.
[1989] A superadditive ergodic theorem in Banach lattices. J. Math. Anal. Appl. 140, pp. 318–332.

D. J. Aldous
[1981] Representations for partially exchangeable arrays of random variables. J. Multivariate Anal. 11, pp. 581–598.
[1985] Exchangeability and related topics. In: École d'Été de Probabilités de Saint-Flour XIII, 1983, Lecture Notes in Mathematics 1117, Springer-Verlag; pp. 1–198.

E. S. Andersen, B. Jessen
[1948] Some limit theorems on set-functions. Danske Vid. Selsk. Mat.-fys. Medd. 25, no. 5, pp. 1–8.

K. Astbury
[1976] On Amarts and Other Topics. Ph.D. Dissertation, Ohio State University.
[1978] Amarts indexed by directed sets. Ann. Probability 6, pp. 267–278.
[1981a] Order convergence of martingales in terms of countably additive and purely finitely additive martingales. Ann. Probability 9, pp. 266–275.
[1981b] The order convergence of martingales indexed by directed sets. Trans. Amer. Math. Soc. 265, pp. 495–510.

D. G. Austin
[1966] A sample function property of martingales. Ann. Math. Statist. 37, pp. 1396–1397.

D. G. Austin, G. A. Edgar, A. Ionescu Tulcea
[1974] Pointwise convergence in terms of expectations. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 30, pp. 17–26.

S. N. Bagchi
[1983] On almost sure convergence of classes of multivalued asymptotic martingales. Ph.D. dissertation, Department of Mathematics, The Ohio State University.
[1985] On a.s. convergence of classes of multivalued asymptotic martingales. Ann. Inst. H. Poincaré, Sec. B 21, pp. 313–321.

D. Bakry
[1979] Sur la régularité des trajectoires des martingales à deux indices. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 50, pp. 149–157.
[1981] Théorèmes élémentaires des processus à deux indices. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 55, pp. 55–71.

S. Banach
[1924] Sur un théorème de M. Vitali. Fundamenta Math. 5, pp. 130–136.

H. Bauer
[1981] Probability Theory and Elements of Measure Theory. Academic Press.

J. R. Baxter
[1974] Pointwise in terms of weak convergence. Proc. Amer. Math. Soc. 46, pp. 395–398.
[1976] Convergence of stopped random variables. Advances in Math. 21, pp. 112–115.
A. Bellow
[1976a] On vector-valued asymptotic martingales. Proc. Nat. Acad. Sci. U. S. A. 73, pp. 1798–1799.
[1976b] Stability properties of the class of asymptotic martingales. Bull. Amer. Math. Soc. 82, pp. 338–340.
[1977a] Several stability properties of the class of asymptotic martingales. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 37, pp. 275–290.
[1977b] Les amarts uniformes. C. R. Acad. Sci. Paris 284, pp. A1295–1298.
[1978a] Uniform amarts: A class of asymptotic martingales for which strong almost sure convergence obtains. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 41, pp. 177–191.
[1978b] Some aspects of the theory of vector-valued amarts. In: Vector Space Measures and Applications I, Lecture Notes in Mathematics 644, Springer-Verlag; pp. 57–67.
[1978c] Submartingale characterizations of measurable cluster points. In: Probability on Banach Spaces, J. Kuelbs, editor, Advances in Probability and Related Topics 4, Marcel Dekker; pp. 69–80.
[1981] Martingales, amarts and related stopping time processes. In: Probability in Banach Spaces III, Lecture Notes in Mathematics 860, Springer-Verlag; pp. 9–24.
[1984] For the historical record. In: Measure Theory Oberwolfach 1983, Lecture Notes in Mathematics 1089, Springer-Verlag; p. 271.

P. Billingsley
[1979] Probability and Measure. Wiley.
A. Bellow, A. Dvoretzky
[1979] A characterization of almost sure convergence. In: Probability in Banach Spaces II, Lecture Notes in Mathematics 709, Springer-Verlag; pp. 45–65.
[1980] On martingales in the limit. Ann. Probability 8, pp. 602–606.
A. Bellow, L. Egghe
[1982] Generalized Fatou inequalities. Ann. Inst. H. Poincaré, Sec. B 18, pp. 335–365.

Y. Benyamini, N. Ghoussoub
[1978] Une caractérisation probabiliste de l1. C. R. Acad. Sci. Paris 286, pp. A795–798.

A. S. Besicovitch
[1945] A general form of the covering principle and relative differentiation of additive functions. Proc. Cambridge Philos. Soc. 41, pp. 103–110.
[1946] A general form of the covering principle and relative differentiation of additive functions II. Proc. Cambridge Philos. Soc. 42, pp. 1–10.
[1947] Corrigenda to the paper "A general form of the covering principle and relative differentiation of additive functions II." Proc. Cambridge Philos. Soc. 43, p. 590.

G. D. Birkhoff
[1931] Proof of the ergodic theorem. Proc. Nat. Acad. Sci. U. S. A. 17, pp. 656–660.

G. Birkhoff
[1937] Moore-Smith convergence in general topology. Ann. of Math. 38, pp. 39–56.

D. Blackwell, L. E. Dubins
[1963] A converse to the dominated convergence theorem. Illinois J. Math. 7, pp. 508–514.

L. H. Blake
[1970] A generalization of martingales and two subsequent convergence theorems. Pacific J. Math. 35, pp. 279–283.

J. R. Blum, D. L. Hanson, L. H. Koopmans
[1963] On a strong law of large numbers for a class of stationary processes. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 2, pp. 1–11.

F. A. Boshuizen
[In press] Comparisons of optimal stopping values and prophet inequalities for matrices of random variables.

J. Bourgain
[1977] On dentability and the Bishop-Phelps property. Israel J. Math. 28, pp. 265–271.

R. D. Bourgin
[1983] Geometric Aspects of Convex Sets with the Radon-Nikodym Property. Lecture Notes in Mathematics 993, Springer-Verlag.

B. Bru, H. Heinich
[1979] Sur l'espérance des variables aléatoires vectorielles. C. R. Acad. Sci. Paris 288, pp. 65–68.
[1980a] Sur l'espérance des variables aléatoires vectorielles. Ann. Inst. H. Poincaré, Sec. B 16, pp. 177–196.
[1980b] Sur l'espérance des variables aléatoires à valeurs dans les espaces de Banach réticulés. Ann. Inst. H. Poincaré, Sec. B 16, pp. 197–210.

B. Bru, H. Heinich, J. C. Lootgieter
[1981] Lois des grands nombres pour les variables échangeables. C. R. Acad. Sci. Paris 293, pp. A485–488.

A. Brunel
[1963] Sur un lemme ergodique voisin du lemme de E. Hopf et sur une de ses applications. C. R. Acad. Sci. Paris 256, pp. A581–584.

A. Brunel, U. Krengel
[1979] Parier avec un prophète dans le cas d'un processus sous-additif. C. R. Acad. Sci. Paris 288, pp. A57–60.
A. Brunel, L. Sucheston
[1976a] Sur les amarts faibles à valeurs vectorielles. C. R. Acad. Sci. Paris 282, pp. A1011–1014.
[1976b] Sur les amarts à valeurs vectorielles. C. R. Acad. Sci. Paris 283, pp. A1037–1040.
[1977] Une caractérisation probabiliste de la séparabilité du dual d'un espace de Banach. C. R. Acad. Sci. Paris 284, pp. A1469–1472.
[1979] Sur l'existence de dominantes exactes pour un processus suradditif. C. R. Acad. Sci. Paris 288, pp. A153–155.

H. D. Brunk
[1948] The strong law of large numbers. Duke Math. J. 15, pp. 181–195.

Bui Khoi Dam
[1987] On the convergence of amarts in Orlicz spaces. Ann. Univ. Sci. Budapest. Eötvös Sect. Math. 30, pp. 231–239.
[1989] BMO-sequences and amarts. Acta Math. Hungarica 53, pp. 271–279.

D. L. Burkholder
[1964] Maximal inequalities as a necessary condition for almost everywhere convergence. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 3, pp. 75–88.
[1966] Martingale transforms. Ann. Math. Statist. 37, pp. 1494–1504.
[1973] Distribution function inequalities for martingales. Ann. Probab. 1, pp. 19–42.
[1984] Boundary value problems and sharp inequalities for martingale transforms. Ann. Probab. 12, pp. 647–702.
[1986] An extension of a classical martingale inequality. In: Probability Theory and Harmonic Analysis, J.-A. Chao and W. A. Woyczynski, editors, Marcel Dekker, Inc.; pp. 21–30.
[1988] Sharp inequalities for martingales and stochastic integrals. Astérisque 157–158, pp. 75–94.
[1989] On the number of escapes of a martingale and its geometrical significance. In: Almost Everywhere Convergence, G. A. Edgar and L. Sucheston, editors, Academic Press; pp. 159–178.
[1991] Explorations in martingale theory and its applications. In: École d'Été de Probabilités de Saint-Flour XIX, 1989, Lecture Notes in Mathematics 1464, Springer-Verlag; pp. 1–66.

D. L. Burkholder, B. J. Davis, R. F. Gundy
[1972] Integral inequalities for convex functions of operators on martingales. In: Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Volume II, L. M. LeCam, J. Neyman, & E. Scott, editors, University of California Press; pp. 223–240.

D. L. Burkholder, R. F. Gundy
[1970] Extrapolation and interpolation of quasi-linear operators on martingales. Acta Math. 124, pp. 249–304.

H. Busemann, W. Feller
[1934] Zur Differentiation der Lebesgueschen Integrale. Fund. Math. 22, pp. 226–249.

R. Cairoli
[1970] Une inégalité pour martingales à indices multiples et ses applications. In: Séminaire de Probabilités IV, Lecture Notes in Mathematics 124, Springer-Verlag; pp. 1–28.

R. Cairoli, J. B. Walsh
[1975] Stochastic integrals in the plane. Acta Math. 134, pp. 111–183.

R. V. Chacon
[1962] Identification of the limit of operator averages. J. Math. Mech. 11, pp. 961–968.
[1964] A class of linear transformations. Proc. Amer. Math. Soc. 15, pp. 560–564.
[1974] A "stopped" proof of convergence. Advances in Math. 14, pp. 365–368.

R. V. Chacon, D. S. Ornstein
[1960] A general ergodic theorem. Illinois J. Math. 4, pp. 153–160.

R. V. Chacon, L. Sucheston
[1975] On convergence of vector-valued asymptotic martingales. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 33, pp. 55–59.

S. D. Chatterji
[1960] Martingales of Banach-valued random variables. Bull. Amer. Math. Soc. 66, pp. 395–398.
[1968] Martingale convergence and the Radon-Nikodym theorem. Math. Scand. 22, pp. 21–41.
[1975] Vector-valued martingales and their applications. In: Probability in Banach Spaces, Lecture Notes in Mathematics 526, Springer-Verlag; pp. 33–51.

B. D. Choi, L. Sucheston
[1981] Continuous parameter uniform amarts. In: Probability in Banach Spaces III, Lecture Notes in Mathematics 860, Springer-Verlag; pp. 85–98.

G. Choquet
[1956] Existence et unicité des représentations intégrales au moyen des points extrémaux dans les cônes convexes. Séminaire Bourbaki 139, pp. 1–15.
Y. S. Chow
[1960a] A martingale inequality and the law of large numbers. Proc. Amer. Math. Soc. 11, pp. 107–111.
[1960b] Martingales in a σ-finite measure space indexed by directed sets. Trans. Amer. Math. Soc. 97, pp. 254–285.
[1965] Local convergence of martingales and the law of large numbers. Ann. Math. Statist. 36, pp. 552–558.
[1967a] On the expected value of a stopped submartingale. Ann. Math. Statist. 38, pp. 608–609.
[1967b] On a strong law of large numbers. Ann. Math. Statist. 38, pp. 610–611.

Y. S. Chow, H. Robbins, D. Siegmund
[1971] Great Expectations: The Theory of Optimal Stopping. Houghton Mifflin.

Y. S. Chow, H. Teicher
[1988] Probability Theory: Independence, Interchangeability, Martingales. Second Edition, Springer-Verlag.

J. P. R. Christensen
[1974] Topology and Borel Structure. North-Holland Mathematics Studies 10, North-Holland.

K. L. Chung
[1947] Notes on some strong laws of large numbers. Amer. J. Math. 69, pp. 189–192.
[1951] The strong law of large numbers. In: Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, J. Neyman, editor, University of California Press; pp. 341–352.

L. E. Clarke
[1979] Problem Solution 6174. Amer. Math. Monthly 86, p. 313.

D. L. Cohn
[1980] Measure Theory. Birkhäuser.

H. H. Corson
[1961] The weak topology of a Banach space. Trans. Amer. Math. Soc. 101, pp. 1–15.

D. Dacunha-Castelle, M. Schreiber
[1974] Techniques probabilistes pour l'étude de problèmes d'isomorphismes entre espaces de Banach. Ann. Inst. H. Poincaré, Sec. B 10, pp. 229–277.

Bui Khoi Dam listed under Bui

B. J. Davis
[1969] A comparison test for martingale inequalities. Ann. Math. Statist. 40, pp. 505–508.

W. J. Davis, N. Ghoussoub, W. B. Johnson, S. Kwapien, B. Maurey
[1990] Weak convergence of vector valued martingales. In: Probability in Banach Spaces 6, U. Haagerup, J. Hoffman-Jorgensen, N. J. Nielsen, editors, Birkhäuser; pp. 41–50.

W. J. Davis, N. Ghoussoub, J. Lindenstrauss
[1981] A lattice renorming theorem and applications to vector-valued processes. Trans. Amer. Math. Soc. 263, pp. 531–540.

W. J. Davis, W. B. Johnson
[1977] Weakly convergent sequences of Banach space valued random variables. In: Banach Spaces of Analytic Functions, Lecture Notes in Mathematics 604, Springer-Verlag; pp. 29–31.

W. J. Davis, R. R. Phelps
[1974] The Radon-Nikodym property and dentable sets in Banach spaces. Proc. Amer. Math. Soc. 45, pp. 119–122.

C. Dellacherie, P.-A. Meyer
[1978] Probabilities and Potential. North-Holland Mathematics Studies 29, North-Holland.
[1982] Probabilities and Potential B. North-Holland Mathematics Studies 72, North-Holland.

A. Denjoy
[1951] Une extension du théorème de Vitali. Amer. J. Math. 73, pp. 314–356.

Y. Derriennic, M. Lin
[1973] On invariant measures and ergodic theorems for positive operators. J. Funct. Anal. 13, pp. 252–267.

J. Diestel, J. J. Uhl, Jr.
[1977] Vector Measures. Mathematical Surveys 15, American Mathematical Society.

Dinh Quang Lu'u listed under Lu'u

J. L. Doob
[1953] Stochastic Processes. Wiley.
[1963] A ratio operator limit theorem. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 1, pp. 288–294.
[1975] Stochastic process measurability conditions. Ann. Inst. Fourier (Grenoble) 25, pp. 163–176.

L. E. Dubins, D. A. Freedman
[1966] On the expected value of a stopped martingale. Ann. Math. Statist. 37, pp. 1505–1509.

L. E. Dubins, J. Pitman
[1980] A divergent, two-parameter, bounded martingale. Proc. Amer. Math. Soc. 78, pp. 414–416.

N. Dunford
[1951] An individual ergodic theorem for non-commutative transformations. Acta Sci. Math. (Szeged) 14, pp. 1–4.

N. Dunford, B. J. Pettis
[1940] Linear operations on summable functions. Trans. Amer. Math. Soc. 47, pp. 323–392.
N. Dunford, J. T. Schwartz
[1956] Convergence almost everywhere of operator averages. J. Rat. Mech. Anal. 5, pp. 129–178.
[1958] Linear Operators, Part I. Interscience Publishers.

A. Dvoretzky
[1976] On stopping time directed convergence. Bull. Amer. Math. Soc. 82, pp. 347–349.
[1977] Generalizations of martingales. Advances in Appl. Probability 9, pp. 193–194.
G. A. Edgar
[1975] A noncompact Choquet theorem. Proc. Amer. Math. Soc. 49, pp. 354–358.
[1979a] Uniform semiamarts. Ann. Inst. H. Poincaré, Sec. B 15, pp. 197–203.
[1979b] Measurability in a Banach space. Indiana Univ. Math. J. 28, pp. 559–579.
[1980] Asplund operators and a.e. convergence. J. Multivariate Anal. 10, pp. 460–466.
[1982] Additive amarts. Ann. Probability 10, pp. 199–206.
[1983] Two integral representations. In: Measure Theory and its Applications, Lecture Notes in Mathematics 1033, Springer-Verlag; pp. 193–198.
[1989] On maximal inequalities in Orlicz spaces. In: Measure and Measurable Dynamics, edited by R. D. Mauldin, R. M. Shortt, C. E. Silva, Contemporary Mathematics 94, American Mathematical Society; pp. 113–129.
[1991] A note on weak inequalities in Orlicz and Lorentz spaces. In: Approximation Theory and Functional Analysis, edited by C. K. Chui, Academic Press; pp. 73–80.

G. A. Edgar, A. Millet, L. Sucheston
[1982] On compactness and optimality of stopping times. In: Martingale Theory in Harmonic Analysis and Banach Spaces, Lecture Notes in Mathematics 939, Springer-Verlag; pp. 36–61.

G. A. Edgar, L. Sucheston
[1976a] Amarts: a class of asymptotic martingales, A. Discrete parameter. J. Multivariate Anal. 6, pp. 193–221.
[1976b] Amarts: a class of asymptotic martingales, B. Continuous parameter. J. Multivariate Anal. 6, pp. 572–591.
[1976c] The Riesz decomposition for vector-valued amarts. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 36, pp. 85–92.
[1977a] On vector-valued amarts and dimension of Banach spaces. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 39, pp. 213–216.
[1977b] Martingales in the limit and amarts. Proc. Amer. Math. Soc. 67, pp. 315–320.
[1981] Démonstrations de lois des grands nombres par les sous-martingales descendantes. C. R. Acad. Sci. Paris 292, pp. A967–969.

L. Egghe
[1980a] Some Chacon-Edgar-type inequalities for stochastic processes, and characterizations of Vitali-conditions. Ann. Inst. H. Poincaré, Sec. B 16, pp. 327–337.
[1980b] Characterizations of nuclearity in Fréchet spaces. J. Funct. Anal. 35, pp. 207–214.
[1981] Strong convergence of pramarts in Banach spaces. Canad. J. Math. 33, pp. 357–361.
[1982a] Weak and strong convergence of amarts in Fréchet spaces. J. Multivariate Anal. 12, pp. 291–305.
[1982b] On sub- and superpramarts with values in a Banach lattice. In: Measure Theory, Oberwolfach 1981, Lecture Notes in Mathematics 945, Springer-Verlag; pp. 352–365.
[1984a] Stopping Time Techniques for Analysts and Probabilists. London Mathematical Society Lecture Notes Series 100, Cambridge University Press.
[1984b] Convergence of adapted sequences of Pettis-integrable functions. Pacific J. Math. 114, pp. 345–366.
A. Engelbert, H. J. Engelbert
[1979] Optimal stopping and almost sure convergence of random sequences. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 48, pp. 309–325.
[1980] On a generalization of a theorem of W. Sudderth and some applications. In: Mathematical Statistics, Banach Center Publications 6, Polish Scientific Publishers; pp. 111–120.

N. A. Fava
[1972] Weak inequalities for product operators. Studia Math. 42, pp. 271–288.

J. Feldman
[1962] Subinvariant measures for Markov operators. Duke Math. J. 29, pp. 71–98.
D. L. Fisk
[1965] Quasi-martingales. Trans. Amer. Math. Soc. 120, pp. 369–389.

H. Föllmer
[1984] Almost sure convergence of multiparameter martingales for Markov random fields. Ann. Probability 12, pp. 133–140.
J. P. Fouque
[1980a] Régularité des trajectoires des amarts et hyperamarts réels. C. R. Acad. Sci. Paris 290, pp. A107–110.
[1980b] Enveloppe de Snell et théorie générale des processus. C. R. Acad. Sci. Paris 290, pp. A285–288.

J. P. Fouque, A. Millet
[1980] Régularité à gauche des martingales fortes à plusieurs indices. C. R. Acad. Sci. Paris 290, pp. A773–776.

N. E. Frangos
[1985] On regularity of Banach valued processes. Ann. Probability 13, pp. 985–990.

N. E. Frangos, L. Sucheston
[1985] On convergence and demiconvergence of block martingales and submartingales. In: Probability in Banach Spaces V, Lecture Notes in Mathematics 1153, Springer-Verlag; pp. 189–225.
[1986] On multiparameter ergodic and martingale theorems in infinite measure spaces. Probab. Th. Rel. Fields 71, pp. 477–490.

O. Frank
[1966] Generalization of an inequality of Hájek and Rényi. Skand. Aktuarietidskrift 49, pp. 85–89.

A. Garsia
[1965] A simple proof of E. Hopf's maximal ergodic theorem. J. Math. Mech. 14, pp. 381–382.
[1970] Topics in Almost Everywhere Convergence. Markham.

N. Ghoussoub
[1977] Banach lattices valued amarts. Ann. Inst. H. Poincaré, Sec. B 13, pp. 159–169.
[1979a] Order amarts: A class of asymptotic martingales. J. Multivariate Anal. 9, pp. 165–172.
[1979b] Summability and vector amarts. J. Multivariate Anal. 9, pp. 173–178.
[1982] Riesz spaces valued measures and processes. Bull. Soc. Math. France 110, pp. 146–150.

N. Ghoussoub, L. Sucheston
[1978] A refinement of the Riesz decomposition for amarts and semiamarts. J. Multivariate Anal. 8, pp. 146–150.

N. Ghoussoub, M. Talagrand
[1979] Convergence faible des potentiels de Doob vectoriels. C. R. Acad. Sci. Paris 288, pp. A599–602.

D. Gilat
[1986] The best bound in the L log L inequality of Hardy and Littlewood and its martingale counterpart. Proc. Amer. Math. Soc. 97, pp. 429–436.

M. Girardi, J. J. Uhl, Jr.
[1990] Slices, RNP, strong regularity, and martingales. Bull. Austral. Math. Soc. 41, pp. 411–415.

C. Goffman, D. Waterman
[1960] On upper and lower limits in measure. Fundamenta Math. 48, pp. 127–133.

R. F. Gundy
[1969] On the class L log L of martingales and singular integrals. Studia Math. 33, pp. 109–118.

A. Gut
[1982] A contribution to the theory of asymptotic martingales. Glasgow Math. J. 23, pp. 177–186.

A. Gut, K. D. Schmidt
[1983] Amarts and Set Function Processes. Lecture Notes in Mathematics 1042, Springer-Verlag.

J. Hájek, A. Rényi
[1956] Generalization of an inequality of Kolmogorov. Acta Math. Acad. Sci. Hung. 6, pp. 281–283.

P. R. Halmos
[1950] Measure Theory. Van Nostrand.

G. H. Hardy, J. E. Littlewood, G. Pólya
[1952] Inequalities. Second edition, Cambridge University Press.

J. A. Hartigan
[1983] Bayes Theory. Springer Series in Statistics, Springer-Verlag.

O. Haupt
[1953] Propriété de mesurabilité de bases de dérivation. Port. Math. 13, pp. 37–54.

C. A. Hayes
[1976] Necessary and sufficient conditions for the derivation of integrals of Lψ-functions. Trans. Amer. Math. Soc. 223, pp. 385–394.

C. A. Hayes, C. Y. Pauc
[1970] Derivation and Martingales. Ergebnisse der Mathematik und ihrer Grenzgebiete 49, Springer-Verlag.

H. Heinich
[1978a] Martingales asymptotiques pour l'ordre. Ann. Inst. H. Poincaré, Sec. B 14, pp. 315–333.
[1978b] Convergence des sous-martingales positives dans un Banach réticulé. C. R. Acad. Sci. Paris 286, pp. 279–280.

F. Hiai
[1985] Convergence of conditional expectations and strong laws of large numbers for multivalued random variables. Trans. Amer. Math. Soc. 291, pp. 613–627.
T. P. Hill
[1983] Prophet inequalities and order selection in optimal stopping problems. Proc. Amer. Math. Soc. 88, pp. 131–137.
[1986] Prophet inequalities for averages of independent non-negative random variables. Math. Z. 191, pp. 427–436.

T. P. Hill, D. P. Kennedy
[1989] Prophet inequalities for parallel processes. J. Multivariate Anal. 31, pp. 236–243.

T. P. Hill, R. P. Kertz
[1981a] Ratio comparisons of supremum and stop rule expectations. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 56, pp. 283–285.
[1981b] Additive comparisons of stop rule and supremum expectations of uniformly bounded independent random variables. Proc. Amer. Math. Soc. 83, pp. 582–585.
[1982] Comparisons of stop rule and supremum expectations of i.i.d. random variables. Ann. Probability 10, pp. 336–345.

E. Hopf
[1937] Ergodentheorie. Ergebnisse der Mathematik 5, Springer-Verlag.
[1954] The general temporally discrete Markov process. J. Rat. Mech. Anal. 3, pp. 13–45.

R. E. Huff
[1974] Dentability and the Radon-Nikodym property. Duke Math. J. 41, pp. 111–114.

G. A. Hunt
[1966] Martingales et Processus de Markov. Dunod.

W. Hurewicz
[1944] Ergodic theorem without invariant measure. Ann. Math. 45, pp. 192–206.

A. Ionescu Tulcea, C. Ionescu Tulcea
[1961] On the lifting property. J. Math. Anal. Appl. 3, pp. 537–546.
[1963] Abstract ergodic theorems. Trans. Amer. Math. Soc. 107, pp. 107–124.
[1969] Topics in the Theory of Lifting. Ergebnisse der Mathematik und ihrer Grenzgebiete 48, Springer-Verlag.

A. Ionescu Tulcea, M. Moretz
[1969] Ergodic properties of semi-Markovian operators on the Y1 part. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 13, pp. 119–122.

W. B. Johnson, G. Schechtman
[1988] Martingale inequalities in rearrangement invariant function spaces. Israel J. Math. 64, pp. 267–275.

J. L. Kelley
[1950] Convergence in topology. Duke Math. J. 17, pp. 277–283.
[1955] General Topology. D. Van Nostrand Company.

D. Kennedy
[1985] Optimal stopping of independent random variables and maximizing prophets. Ann. Probab. 10, pp. 566–571.
H. Kesten
[1982] Percolation Theory for Mathematicians. Progress in Probability and Statistics 2, Birkhäuser.
J. F. C. Kingman
[1968] The ergodic theory of subadditive stochastic processes. J. Royal Stat. Soc. B 30, pp. 499–510.
[1973] Subadditive ergodic theory. Ann. Probab. 1, pp. 883–909.
[1975] Subadditive processes. In: École d'Été de Probabilités de Saint-Flour V, Lecture Notes in Mathematics 539, Springer-Verlag; pp. 167–223.
H. Korezlioglu, G. Mazziotto, J. Szpirglas
[1981] Processus aléatoires à deux indices. Springer-Verlag.
A. Korzeniowski
[1978] Martingales in Banach spaces for which the convergence with probability one, in probability and in law coincide. Colloq. Math. 39, pp. 153–159.
M. A. Krasnosel'skii, Ya. B. Rutickii [1961] Convex Functions and Orlicz Spaces. Gordon and Breach Science Publishers.
U. Krengel [1985] Ergodic Theorems. Walter de Gruyter.
U. Krengel, L. Sucheston
[1977] Semiamarts and finite values. Bull. Amer. Math. Soc. 83, pp. 745–747.
[1978] On semiamarts, amarts, and processes with finite value. In: Probability on Banach Spaces, J. Kuelbs, editor, Advances in Probability and Related Topics 4, Marcel Dekker; pp. 197–266.
[1980] Temps d'arrêt et tactiques pour des processus indexés par un ensemble ordonné. C. R. Acad. Sci. Paris 290, pp. A192–196.
[1981] Stopping rules and tactics for processes indexed by a directed set. J. Multivariate Anal. 11, pp. 199–229.
[1987] Prophet compared to gambler: An inequality for transforms of processes. Ann. Probab. 15, pp. 1593–1599.
K. Krickeberg
[1956] Convergence of martingales with a directed index set. Trans. Amer. Math. Soc. 83, pp. 313–337.
[1957] Stochastische Konvergenz von Semimartingalen. Math. Z. 66, pp. 470–486.
[1959] Notwendige Konvergenzbedingungen bei Martingalen und verwandten Prozessen. In: Transactions of the Second Prague Conference on Information Theory, Czech. Acad. Sci.; pp. 279–305.

K. Krickeberg, C. Pauc
[1963] Martingales et dérivation. Bull. Soc. Math. France 91, pp. 455–554.

K. Kunen, H. Rosenthal
[1982] Martingale proofs of some geometrical results in Banach space theory. Pacific J. Math. 100, pp. 153–175.

C. W. Lamb
[1973] A ratio limit theorem for approximate martingales. Canad. J. Math. 25, pp. 772–779.

P. Lévy
[1937] Théorie de l'addition des variables aléatoires. Gauthier-Villars, Paris.

D. R. Lewis, C. Stegall
[1973] Banach spaces whose duals are isomorphic to l1(Γ). J. Functional Anal. 12, pp. 177–187.

M. Lin, R. Wittmann
[1991] Pointwise ergodic theorems for certain order preserving mappings in L1. In: Almost Everywhere Convergence II, A. Bellow and R. Jones, editors, Academic Press; pp. 253–273.

W. Linde
[1976] An operator ideal in connection with the Radon-Nikodym property of Banach spaces. Math. Nachr. 71, pp. 65–73.

J. Lindenstrauss, L. Tzafriri
[1977] Classical Banach Spaces I. Ergebnisse der Mathematik und ihrer Grenzgebiete 92, Springer-Verlag.
[1979] Classical Banach Spaces II. Ergebnisse der Mathematik und ihrer Grenzgebiete 97, Springer-Verlag.

L. H. Loomis
[1975] Dilations and extremal measures. Advances in Math. 17, pp. 1–13.

Dinh Quang Lu'u
[1981] On convergence of vector-valued amarts of finite order. Math. Nachr. 113, pp. 39–45.
[1985a] Amarts of finite order and Pettis Cauchy sequences of Bochner integrable functions in locally convex spaces. Ann. Sci. Univ. Clermont-Ferrand II. Probab. Appl. No. 3, pp. 91–106.
[1985b] Stability and convergence of amarts in Fréchet spaces. Acta Math. Hungarica 45, pp. 99–108.

W. A. J. Luxemburg
[1955] Banach function spaces. Thesis, Delft University.

D. Maharam
[1958] On a theorem of von Neumann. Proc. Amer. Math. Soc. 9, pp. 987–994.

V. Marraffa
[1988] On almost sure convergence of amarts and martingales without the Radon-Nikodym property. J. Theoretical Probab. 1, pp. 255–261.

H. B. Maynard
[1973] A geometrical characterization of Banach spaces having the Radon-Nikodym property. Trans. Amer. Math. Soc. 185, pp. 493–500.

S. M. McGrath
[1980] Some ergodic theorems for commuting L1 contractions. Studia Math. 70, pp. 153–160.

J. F. Mertens
[1972] Théorie des processus stochastiques généraux; applications aux surmartingales. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 22, pp. 45–68.

P.-A. Meyer
[1965] Théorie ergodique et potentiels. Ann. Inst. Fourier (Grenoble) 15, pp. 89–96.
[1966] Probability and Potentials. Blaisdell Publishing Company.
[1971] Le retournement du temps, d'après Chung et Walsh. In: Séminaire de Probabilités V, Lecture Notes in Mathematics 191, Springer-Verlag; pp. 213–236.
[1981] Théorie élémentaire des processus à deux indices. In: Processus Aléatoires à Deux Indices, Lecture Notes in Mathematics 863, Springer-Verlag; pp. 1–39.

A. Millet
[1978] Sur la caractérisation des conditions de Vitali par la convergence essentielle des martingales. C. R. Acad. Sci. Paris 287, pp. A887–890.
[1981] Convergence and regularity of strong submartingales. In: Processus Aléatoires à Deux Indices, Lecture Notes in Mathematics 863, Springer-Verlag; pp. 50–58.

A. Millet, L. Sucheston
[1978] Classes d'amarts filtrants et conditions de Vitali. C. R. Acad. Sci. Paris 286, pp. A835–837.
[1979a] Characterizations of Vitali conditions with overlap in terms of convergence of classes of amarts. Canad. J. Math. 31, pp. 1033–1046.
[1979b] La convergence essentielle des martingales bornées dans L1 n'implique pas la condition de Vitali V. C. R. Acad. Sci. Paris 288, pp. A595–598.
[1980a] Convergence et régularité des martingales à indices multiples. C. R. Acad. Sci. Paris 291, pp. A147–150.
[1980b] Convergence of classes of amarts indexed by directed sets. Canad. J. Math. 32, pp. 86–125.
[1980c] On covering conditions and convergence. In: Measure Theory, Oberwolfach 1979, Lecture Notes in Mathematics 794, Springer-Verlag; pp. 431–454.
[1980d] A characterization of Vitali conditions in terms of maximal inequalities. Ann. Probability 8, pp. 339–349.
[1980e] On convergence of L1-bounded martingales indexed by directed sets. Probab. Math. Statist. 1, pp. 151–169.
[1981a] On regularity of multiparameter amarts and martingales. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 56, pp. 21–45.
[1981b] Demi-convergence des processus à deux indices. C. R. Acad. Sci. Paris 293, pp. 435–438.
[1983] Demiconvergence of processes indexed by two indices. Ann. Inst. H. Poincaré, Sec. B 19, pp. 175–187.
[1989] On fixed points and multiparameter ergodic theorems in Banach lattices. Canad. J. Math. 40, pp. 429–458.

J. Mogyoródi
[1978] Remark on a theorem of J. Neveu. Ann. Univ. Sci. Budapest. Eötvös Sect. Math. 21, pp. 77–81.

E. H. Moore
[1915] Definition of limit in general integral analysis. Proc. Nat. Acad. Sci. U. S. A. 1, pp. 628.

E. H. Moore, H. L. Smith
[1922] A general theory of limits. Amer. J. Math. 44, pp. 102–121.

A. P. Morse, W. Transue
[1950] Functionals F bilinear over the product A × B of two pseudo-normed vector spaces II. Ann. of Math. 51, pp. 576–614.

E. Mourier
[1953] Éléments aléatoires dans un espace de Banach. Ann. Inst. H. Poincaré, Sec. B 13, pp. 161–244.
A. G. Mucci
[1973] Limits for martingale-like sequences. Pacific J. Math. 48, pp. 197–202.
[1976] Another martingale convergence theorem. Pacific J. Math. 64, pp. 539–541.

I. Namioka, E. Asplund
[1967] A geometric proof of Ryll-Nardzewski's fixed point theorem. Bull. Amer. Math. Soc. 73, pp. 443–445.

I. Namioka, R. R. Phelps
[1975] Banach spaces which are Asplund spaces. Duke Math. J. 42, pp. 735–750.

P. A. Nelson
[1970] A class of orthogonal series related to martingales. Ann. Math. Statist. 41, pp. 1684–1694.

J. von Neumann
[1931] Algebraische Repräsentanten der Funktionen bis auf eine Menge vom Maße Null. J. Crelle 165, pp. 109–115.
[1949] On rings of operators, reduction theory. Ann. of Math. 50, pp. 401–485.

J. Neveu
[1961] Sur le théorème ergodique ponctuel. C. R. Acad. Sci. Paris 252, pp. 1554–1556.
[1964] Potentiels markoviens discrets. Ann. Univ. Clermont 24, pp. 37–89.
[1965a] Mathematical Foundations of the Calculus of Probability. Holden-Day.
[1965b] Relations entre la théorie des martingales et la théorie ergodique. Ann. Inst. Fourier (Grenoble) 15, pp. 31–42.
[1972] Convergence presque sûre des martingales multivoques. Ann. Inst. H. Poincaré, Sec. B 8, pp. 1–7.
[1975] Discrete Parameter Martingales. North-Holland Mathematical Library 10, North-Holland.

S. Orey
[1967] F-processes. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume II, Part 1, L. M. LeCam & J. Neyman, editors, University of California Press; pp. 301–303.

W. Orlicz
[1932] Über eine gewisse Klasse von Räumen vom Typus B. Bull. Acad. Sci. Polonaise A, pp. 207–220.
[1936] Über Räume (LM). Bull. Acad. Sci. Polonaise A, pp. 93–107.

D. S. Ornstein
[1960] On invariant measures. Bull. Amer. Math. Soc. 66, pp. 297–300.

R. E. A. C. Paley
[1932] A remarkable series of orthogonal functions. Proc. London Math. Soc. 34, pp. 241–264.

C. Y. Pauc
[1953] Ableitungsbasen, Prätopologie und starker Vitalischer Satz. J. Reine Angew. Math. 191, pp. 69–91.

V. C. Pestien
[1982] An extended Fatou equation and continuous-time gambling. Advances in Appl. Probability 14, pp. 309–323.
R. R. Phelps
[1966] Lectures on Choquet's Theorem. Van Nostrand Mathematical Studies 7, D. Van Nostrand Company.
[1974] Dentability and extreme points in Banach spaces. J. Functional Anal. 17, pp. 78–90.
G. Pisier
[1986] Factorization of Linear Operators and Geometry of Banach Spaces. CBMS Regional Conference Series in Mathematics 60, American Mathematical Society.
R. Pol
[1980] On a question of H. H. Corson and some related problems. Fundamenta Math. 109, pp. 143–154.
R. de Possel
[1936] Dérivation abstraite des fonctions d'ensemble. J. Math. Pures et Appl. 15, pp. 391–409.
K. M. Rao
[1969] Quasi-martingales. Math. Scand. 24, pp. 79–92.
O. J. Reinov
[1975] Operators of type RN in Banach spaces (Russian). Dokl. Akad. Nauk SSSR 220, pp. 528–531; English translation: Soviet Math. Dokl. 16 (1975), pp. 119–123.
P. Révész
[1968] The Laws of Large Numbers. Academic Press.
D. Revuz
[1974] Markov Chains. North-Holland.
M. A. Rieffel
[1968] The Radon-Nikodym theorem for the Bochner integral. Trans. Amer. Math. Soc. 131, pp. 466–487.
H. Robbins, D. Siegmund
[1971] A convergence theorem for nonnegative almost supermartingales and some applications. In: Optimizing Methods in Statistics, Academic Press; pp. 233–257.
U. Rønnow
[1967] On integral representations of vector valued measures. Math. Scand. 21, pp. 45–53.
G. C. Rota
[1962] An "alternierende Verfahren" for general positive operators. Bull. Amer. Math. Soc. 68, pp. 95–102.
H. L. Royden
[1968] Real Analysis. Second edition, Macmillan.
M. Rubinstein
W. Rudin
[1973] Functional Analysis. McGraw-Hill.
C. Ryll-Nardzewski
[1967] On fixed points of semigroups of endomorphisms of linear spaces. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume II, Part 1, L. M. LeCam & J. Neyman, editors, University of California Press; pp. 55–61.
E. Samuel-Cahn
[1984] Comparison of threshold stopping rules and maximum for independent nonnegative random variables. Ann. Probab. 12, pp. 1213–1216.
R. Sato
[1986] Individual ergodic theorem for superadditive processes. Acta Math. Hungarica 47, pp. 153–155.
F. S. Scalora
[1961] Abstract martingale convergence theorems. Pacific J. Math. 11, pp. 347–374.
M. Slaby
[1982] Convergence of submartingales and amarts in Banach lattices. Bull. Acad. Polon. Sci. Sér. Sci. Math. 30, pp. 291–299.
R. Solovay
[1970] A model of set theory in which every set of reals is Lebesgue measurable. Ann. of Math. 92, pp. 1–56.
C. Stegall
[1975] The Radon-Nikodym property in conjugate Banach spaces. Trans. Amer. Math. Soc. 206, pp. 213–223.
[1981] The Radon-Nikodym property in conjugate Banach spaces, II. Trans. Amer. Math. Soc. 264, pp. 507–519.
E. Stein
[1961] On the maximal ergodic theorem. Proc. Nat. Acad. Sci. U. S. A. 47, pp. 1894–1897.
R. Subramanian
[1973] On a generalization of martingales due to Blake. Pacific J. Math. 48, pp. 275–278.
L. Sucheston
[1964] On the existence of finite invariant measures. Math. Z. 86, pp. 327–336.
[1967] On the ergodic theorem for positive operators. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 8, pp. 1–11, 353–356.
[1974] Properties of uniform integrability and convergence for families of random variables. Rend. Accad. Naz. dei Lincei 57, pp. 95–99.
[1983] On one-parameter proofs of almost sure convergence of multiparameter processes. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 63, pp. 43–49.
L. Sucheston, L. Szabó
[1991] A principle for almost everywhere convergence of multiparameter processes. In: Almost Everywhere Convergence II, A. Bellow and R. Jones, editors, Academic Press; pp. 253–273.
L. Sucheston, Z. Yan
[In press] On the prophet problem for transforms.
W. D. Sudderth
[1971] A "Fatou's equation" for randomly stopped variables. Ann. Math. Statist. 42, pp. 2143–2146.
L. I. Szabó
[1991] The converse of the dominated ergodic theorem in Hurewicz setting. Canad. Math. Bull. 34, pp. 405–411.
J. Szulga
[1978a] On the submartingale characterization of Banach lattices isomorphic to l1. Bull. Acad. Polon. Sci. Sér. Sci. 26, pp. 65–68.
[1978b] Boundedness and convergence of Banach lattice valued submartingales. In: Probability Theory on Vector Spaces, Lecture Notes in Mathematics 656, Springer-Verlag; pp. 251–256.
[1979] Regularity of Banach lattice valued martingales. Colloq. Math. 41, pp. 303–312.
J. Szulga, W. A. Woyczynski
[1976] Convergence of submartingales in Banach lattices. Ann. Probability 4, pp. 464–469.
M. Talagrand
[1984] Pettis Integral and Measure Theory. Memoirs of the Amer. Math. Soc. 307, American Mathematical Society.
[1985] Some structure results for martingales in the limit and pramarts. Ann. Probability 13, pp. 1192–1203.
[1986] Derivation, LΨ-bounded martingales and covering conditions. Trans. Amer. Math. Soc. 293, pp. 257–289.
R. L. Taylor, P. Z. Daffer, R. F. Patterson
[1985] Limit Theorems for Sums of Exchangeable Random Variables. Rowman & Allanheld Publishers.
J. J. Uhl, Jr.
[1977] Pettis mean convergence of vector-valued asymptotic martingales. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 37, pp. 291–295.
J. B. Walsh
[1979] Convergence and regularity of multiparameter strong martingales. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 46, pp. 177–192.
N. Wiener
[1939] The ergodic theorem. Duke Math. J. 5, pp. 1–18.
R. Wittmann
[In press] On the prophet inequality for subadditive processes. Stochastic Analysis and Applications.
V. Yankov
[1941] On the uniformization of A-sets (Russian). Dokl. Akad. Nauk SSSR 30, pp. 591–592.
K. Yoshida, E. Hewitt
[1953] Finitely additive measures. Trans. Amer. Math. Soc. 72, pp. 46–66.
T. Yoshimoto
[1982] Pointwise ergodic theorem and function spaces Mp. Studia Math. 72, pp. 253–271.
W. H. Young
[1912] On classes of summable functions and their Fourier series. Proc. Royal Soc. London 87, pp. 225–229.
A. C. Zaanen
[1949] Note on a certain class of Banach spaces. Proc. Nederl. Acad. Sci. A 52, pp. 488–498.
[1983] Riesz Spaces II. North-Holland Mathematical Library 30, North-Holland.
A. Zygmund
[1934] On the differentiability of multiple integrals. Fundamenta Math. 23, pp. 143–149.
[1951] An individual ergodic theorem for noncommutative transformations. Acta Sci. Math. (Szeged) 14, pp. 103–110.
[1967] A note on differentiability of integrals. Colloquium Mathematicum 16, pp. 199–204.
Index of names
Akcoglu, M. A., 50, 350, 371, 380, 382, 393, 407
Aldous, D. J., 406, 407 Andersen, E. S., 32, 407 Asplund, E., 232, 415 Astbury, K., 32, 136, 144, 169, 170, 198, 343,407
Austin, D. G., 13, 32, 268, 407 Bagchi, S. D., 198, 407 Bakry, D., 405, 407 Banach, S., 300, 332, 407 Bartfai, P., 262 Bauer, H., 266, 407 Baxter, J. R., 13, 407 Bellow, A., 139, 197, 198, 216, 217, 218, 253, 408
Benyamini, Y., 408 Besicovitch, A. S., 308, 408 Billingsley, P., 266, 408 Birkhoff, G., 8, 408 Birkhoff, G. D., 365, 408 Blackwell, D., 28, 408 Blake, L. H., 216, 408 Blum, J. R., 262, 408 Bochner, S., 173 Bohr, H., 308 Boshuizen, F. A., 408 Bourgain, J., 228, 232, 408 Bourgin, R. D., 183, 217, 232, 247, 408 Bru, B., 266, 408 Brunel, A., 197, 218, 253, 380, 393, 408, 409 Brunk, H. D., 260, 409 Bui Khoi Dam, 81, 409 Burkholder, D. L., 27, 82, 96, 100, 103, 127, 260, 271, 273, 278, 280, 409 Busemann, H., 343, 409 Cairoli, R., 382, 383, 406, 409
Chacon, R. V., 8, 12, 198, 216, 217, 365, 375, 376, 377, 379, 380, 409 Chatterji, S. D., 171, 209, 217, 406, 409 Choi, B. D., 32, 409 Choquet, G., 171, 219, 231, 232, 409
Chow, Y. S., 81, 111, 144, 259, 260, 266, 410 Christensen, J. P. R., 289, 410 Chung, K. L., 260, 410 Clarke, L. E., 77, 410 Cohn, D. L., 34, 218, 410
Corson, H. H., 232, 410 DacunhaCastelle, D., 350, 410 Daffer, P. Z., 417 Davis, B. J., 280, 409, 410 Davis, W. J., 217, 218, 232, 410 Dellacherie, C., 81, 404, 410 Denjoy, A., 343, 410 Derriennic, Y., 380, 410 Diestel, J., 183, 217, 410 Dieudonne, J., 113 Doob, J. L., 8, 19, 32, 96, 114, 264, 382, 410 Dubins, L. E., 28, 408, 410 Dunford, N., 77, 172, 212, 218, 223, 253, 376, 382, 390, 393, 410, 411 Dvoretzky, A., 216, 408, 411
Edgar, G. A., 8, 12, 13, 19, 32, 68, 99, 127, 143, 171, 183, 197, 198, 213, 215, 216, 219, 231, 232, 253, 264, 265, 390, 394, 406, 407, 411 Egghe, L., 197, 215, 252, 253, 408, 411 Engelbert, A., 411 Engelbert, H. J., 411 Fava, N. A., 33, 65, 68, 382, 390, 411 Feldman, J., 380, 411 Feller, W., 343, 409 Fisk, D. L., 32, 411 Föllmer, H., 411 Fouque, J. P., 412 Frangos, N. E., 68, 127, 215, 308, 390, 396, 406, 412 Frank, O., 266, 412 Freedman, D. A., 410
Garling, D. J. A., 112 Garsia, A., 354, 412 Ghoussoub, N., 198, 218, 253, 408, 410, 412 Gilat, D., 96, 112, 412 Girardi, M., 232, 412 Goffman, C., 127, 412 Gundy, R. F., 99, 280, 409, 412 Gut, A., 266, 412 Haar, A., 290 Hajek, I., 266, 412 Halmos, P. R., 311, 412 Hanson, D. L., 262, 408 Hardy, G. H., 97, 99, 412
Hartigan, J. A., 33, 412 Haupt, O., 343, 412 Hayes, C. A., 113, 158, 308, 326, 327, 343, 412
Heinich, H., 217, 266, 408, 412 Herglotz, G., 231 Hewitt, E., 350, 417 Hiai, F., 412 Hill, T. P., 111, 112, 413 Hopf, E., 354, 356, 360, 380, 413 Huff, R. E., 232, 413 Hunt, G. A., 413 Hurewicz, W., 379, 380
Ionescu Tulcea, A., 13, 32, 82, 171, 209, 213, 217, 290, 380, 407, 413 Ionescu Tulcea, C., 82, 171, 209, 213, 217, 290, 413
Jessen, B., 32, 407 Johnson, W. B., 217, 218, 280, 410, 413 Kelley, J. L., 7, 8, 413 Kennedy, D. P., 413 Kertz, R. P., 413 Kesten, H., 381, 413 Kingman, J. F. C., 365, 380, 381, 413 Kolmogorov, A., 262 Koopmans, L. H., 262, 408 Korezlioglu, H., 406, 413 Korzeniowski, A., 413 Krasnosel'skii, M. A., 49, 50, 67, 68, 413 Krengel, U., 30, 111, 112, 354, 380, 390, 392, 393, 405, 408, 413 Krickeberg, K., 32, 113, 144, 155, 158, 414 Kunen, K., 232, 414 Kwapien, S., 218, 410 La Vallee Poussin, C. de, 68 Lamb, C. W., 13, 32, 414 Levy, P., 32, 259, 414 Lewis, D. R., 253, 414 Lin, M., 354, 380, 410, 414 Linde, W., 253, 414 Lindenstrauss, J., 27, 50, 172, 183, 233, 253, 350, 360, 383, 410, 414 Littlewood, J. E., 97, 99, 412 Loomis, L. H., 232, 414 Lootgieter, J. C., 266, 408 Lu'u, D. Q., 196, 414 Luxemburg, W. A. J., 50, 414 McGrath, S. M., 394, 414 Maharam, D., 290, 414 Marraffa, V., 218, 414 Maurey, B., 218, 410 Maynard, H. B., 232, 414 Mazziotto, G., 406, 413 Menchoff, D. E., 127 Mertens, J. F., 13, 32, 414
Meyer, P.-A., 13, 32, 81, 266, 381, 404, 410, 414
Millet, A., 19, 32, 68, 127, 139, 142, 143, 144, 158, 160, 169, 170, 215, 264, 343, 390, 405, 411, 414, 415 Minkowski, H., 219 Mogyoródi, J., 81, 415 Moore, E. H., 8, 415 Moretz, M., 380, 413 Morse, A. P., 50, 415 Mourier, E., 392, 415 Mucci, A. G., 216, 415 Namioka, I., 232, 415 Nelson, P. A., 107, 415 Neumann, J. von, 232, 289, 290, 415 Neveu, J., 8, 77, 96, 99, 198, 266, 360, 365, 380, 415 Orey, S., 32, 415 Orlicz, W., 50, 415 Ornstein, D. S., 365, 375, 380, 409, 415
Paley, R. E. A. C., 415 Patterson, R. F., 417 Pauc, C. Y., 113, 158, 308, 326, 327, 343, 412, 414, 415
Pestien, V. C., 415 Pettis, B. J., 218, 410 Phelps, R. R., 228, 232, 410, 415, 416 Pisier, G., 253, 416 Pitman, J., 410 Pol, R., 232, 416 Pólya, G., 99, 412 Possel, R. de, 32, 416 Rao, K. M., 32, 416 Reinov, O. J., 253, 416 Rényi, A., 266, 412 Révész, P., 262, 416 Revuz, D., 379, 416 Rieffel, M. A., 217, 232, 416 Robbins, H., 111, 410, 416 Rønnow, U., 217, 416 Rosenthal, H., 232, 414 Rota, G. C., 382, 416 Royden, H. L., 34, 416 Rubinstein, M., 416 Rudin, W., 51, 172, 173, 416 Rutickii, Ya. B., 49, 50, 67, 68, 413 Ryll-Nardzewski, C., 171, 232, 416 Samuel-Cahn, E., 416 Scalora, F. S., 217, 416 Schechtman, G., 280, 413 Schmidt, K. D., 412 Schreiber, M., 350, 410 Schwartz, J. T., 77, 172, 212, 223, 253, 376, 382, 393, 411 Siegmund, D., 111, 410, 416 Slaby, M., 215, 416
Smith, H. L., 8, 415 Solovay, R., 289, 416 Stegall, C., 247, 253, 414, 416 Stein, E., 382, 404, 416 Subramanian, R., 216, 416 Sucheston, L., 8, 12, 13, 19, 30, 32, 50, 68, 99, 111, 112, 127, 139, 142, 143, 144, 158, 160, 169, 170, 197, 198, 213, 215, 216, 217, 218, 253, 264, 265, 308, 343, 350, 371, 380, 390, 394, 396, 405, 406,
407, 409, 411–417
Sudderth, W. D., 417 Szabó, L., 99, 390, 396, 406, 417 Szpirglas, J., 406, 413 Szulga, J., 417 Talagrand, M., 158, 169, 170, 183, 215, 216, 326, 327, 343, 412, 417 Taylor, R. L., 417 Teicher, H., 410 Transue, W., 50, 415
Tzafriri, L., 27, 50, 172, 183, 233, 253, 350, 360, 383, 414
Uhl, J. J., 183, 202, 207, 217, 218, 232, 410, 412, 417
Vitali, G., 144, 300 Walsh, J. B., 406, 409, 417 Waterman, D., 127, 412 Wiener, N., 390, 417 Wittmann, R., 31, 112, 354, 414, 417 Woyczynski, W. A., 417 Yan, Z., 112, 417 Yankov, V., 232, 417 Yoshida, K., 350, 417 Yoshimoto, T., 394, 417 Young, W. H., 50, 417
Zaanen, A. C., 50, 67, 68, 417 Zygmund, A., 308, 382, 390, 417
Index of terms
(A), 169, 170, 328 absolute continuity of charges, 28 absolute convergence, 233 absolutely continuous, 18, 299 set function, 15 uniformly, 69 absolutely summing, 232, 234 absorbing set, 356 abstract difference condition, 139 adapted family, 128 adapted process, 4 additive amart, 405 additive martingale, 405 additive process, 365, 405 admissible, 376 a.e. convergence, 205 affine fixed-point property, 223 almost everywhere, 34 alternating procedure, 404 amart, 9, 13, 113, 139, 147 additive, 405 ascending, 31 Banachvalued, 171 continuous parameter, 31 controlled, 143 descending, 31 exact, 21 ordered, 14 reversed, 12, 17 two parameter, 404 uniform, 184 vectorvalued, 183, 184 weak, 186 weak sequential, 186 amart convergence theorem, 9, 11 amart of finite order, 196 amart potential, 125, 239, 248 vectorvalued, 192 approximation by stopped process, 149 by stopping times, 12 arbitrarily small sets, 38 Asplund operator, 246, 247, 251, 253 Asplund space, 248 associated charge, 13 associated stochastic basis, 405 asymptotic martingale, 13 asymptotically greater, 117
asymptotically uniformly absolutely continuous, 120 atomic, 26, 98 atomless measure, 69 average range of a vector measure, 198 (B), 188 Bp, 38 backward induction, 104 Banach lattice, 41 Banachvalued ordered amarts, 144 Banachvalued process, 388 Banachvalued random variables, 171 Bayesian statistics, 33 best constant, 95, 111, 280 biconvex functional, 101 block martingale, 397 block submartingale, 397 Bochner integrable, 173 Bochner integral, 173, 182 Bochner measurable, 175 Bochner norm, 173 Borel measurable, 175 Borel measure, 311 Borel sets, 174 boundary, 300 bounded variation, 297 boundedly complete basis, 183 Burkholder's inequalities, 273 BusemannFeller basis, 338 (C), 113, 159, 160, 162, 170, 330 Cpartition, 301 (CD), 169 Cantor set, 247 Cauchy net, 3 centered disk basis, 311 Cesaro averages, 378 Chacon inequality, 12 ChaconOrnstein theorem, 361, 380 conservative case, 364 general case, 375 Chacon theorem, 377 charge, 28, 347 associated, 13
pure, 347 ChoquetEdgar theorem, 219 Choquet theorem, 231 class (B), 188
closed ball, 223 closed graph theorem, 172, 173 closure, 73 cluster point approximation theorem, 11 cofinal optional sampling, 142 commuting operators, 393 compact operator, 246 comparison of Orlicz spaces, 51 complete metric space, 3 completely continuous operator, 252 condition (A), 169, 170 condition (B), 188, 190, 217 condition (C), 113, 159, 160, 162, 170 condition (Cp), 169 condition (D b), 158 condition (F4), 397 condition (FV ,), 156 condition (SV), 169 condition SV(m), 169 condition (V), 113, 127, 129, 134, 159, 295 condition (VC), 143 condition (V°), 113, 143 condition (V4,), 145, 153, 158 condition (oSV), 169 conditional expectation, 19, 33, 68 infinite measure space, 73 vectorvalued, 179 conditional probability, 254 conjugate of an Orlicz space, 59 conjugate Orlicz function, 35, 59 conservative operator, 355 conservative part, 355 constant stochastic basis, 143 constituency, 310 constituent, 310 continuous for order, 383 continuous parameter, 31 continuum hypothesis, 170 contraction, 344 contraction property, 27 controlled amart, 143 controlled Vitali condition, 143 convergence, 11 a.e., 205 a.s., 11 essential, 114 generalized sequences, 1 in mean, 14, 26, 77 in measure, 70, 117 MooreSmith, 1, 3 nets, 1 order, 114 Pp, 214 Pettis norm, 198, 207 T'modular, 41 ,lbnorm, 41
in probability, 11, 14, 117 scalar, 198, 199
of a series, 233
stochastic, 11, 14, 19, 117, 126 strong, 205 weak a.e., 199 weak a.s., 202 weak truncated limit, 345 convergence theorems, 33 converse maximal inequality, 98 converse Riesz decomposition, 196 convex, 34, 38 convex hull, 73 convex sets, 172 convolution, 67 Corson's property (C), 223, 232 countably additive, 14, 17 countably superadditive, 347 countably valued stopping time, 153 counting measure, 38 covering condition, 113 (A), 169 (C), 159 (V), 127 criterion of de La Vallee Poussin, 71 Dbasis, 327 (D',), 158 decomposition of set functions, 347 decreasing rearrangement, 97 deficit, 313 (02) condition, 43, 47, 49, 88 at oo, 44
at 0, 44 demiconvergence, 118, 326, 382, 384, 390 lower, 118 martingales, 127 upper, 118 density, 281 lower, 282 upper, 282 density property, 315 dentable, 224 derivate lower, 310 upper, 310 derivation, 33, 291, 296 abstract, 308 by cubes, 304 by disks, 304 by intervals, 305 in IR, 291 in IRd, 300 by squares, 304 derivation basis, 309 BusemannFeller, 327, 338 Dbasis, 327 derivation setting, 140, 144, 158, 170 deriving net, 309 diameter, 327 difference condition, 139
difference process, 274, 405 difference property, 24, 197 amarts, 24 in IN, 29 ordered amarts, 24 Pettis norm, 185 reversed, 30 uniform, 29 differencequotient martingale, 292, 298, 299 difference sequence, 271 differentiable, 310 differentiable a.e., 34 differentiate an integral, 310 directed index set, vectorvalued processes, 214
directed set, 2, 113 dissipative operator, 355 dissipative part, 355 distal family, 222 distribution function, 41 dominant, 366, 372 exact, 366 extended exact, 370 dominated, 70 dominated convergence theorem, 287 dominated in distribution, 70, 77 dominated sums property, 170 Doob decomposition, 267 submartingale, 267 supermartingale, 267 Doob inequality, 257 Doob maximal inequality, 28 Doob potential, 30, 125 dual of HΦ, 62 of LΦ, 62 of an Orlicz space, 59 dual Banach space, 172 dual directed set, 2 duality for Orlicz spaces, 59 duality map, 279 DunfordSchwartz theorem, 33, 376, 380 DvoretzkyRogers lemma, 237 EM, 50 e lim inf, 116 e lim sup, 116 economic interpretation, 111 endomorphism, 354 entrance time, 191 ergodic operator, 380 ergodic theorem, 33, 344 HopfDunfordSchwartz, 354 ess sup, 116 essential convergence, 114, 135 monotone convergence theorem, 126 essential infimum, 115 essential lower limit, 116 essential supremum, 73, 114, 286 essential upper limit, 116
Euclidean space, 33 event, 4 exact amart, 21 exact dominant, 366, 367, 372 extended, 370 exchangeable, 264, 400 row and column, 400 exchangeable events, 266 expectation, vectorvalued, 173 extended exact dominant, 370
(F4), 397 factorization from conditional expectation, 20
Fatou lemma, 287 stochastic convergence, 126 Fatou property, 40
weak truncated limit, 346 Fava spaces, 65 Rk, 391 Ra, 353 filling scheme, 380 filtering to the left, 2 filtering to the right, 2 fine almostcover, 310 fine cover, 310 finite dimensional, 217, 240, 252 finite measurable partitions, 140 finite subdivision, 292 finite subsets, 131 finite variation, 15, 297 finitely additive set function, 13 first entrance time, 191 fixedpoint theorems, 222 foresight, 104 Frechet differentiable, 247 Frechet property, 7 Frechet space, 252
full fine cover, 310 function, bounded variation, 297
(FV,), 156 gambler, 104 game fairer with time, 216 generalized inverse, 35 generalized sequence, 1, 3 generalized waiting lemma, 127 geometry of Banach spaces, 183 GFT, 216
HΦ, 39 Haar functions, 247 Haar operator, 247 Hahn-Banach theorem, 162, 172, 224 extension form, 172 separation form, 172 Hahn decomposition, 19 Hajek-Renyi-Chow inequality, 256 half-space, 174 halo, 293 halo theorems, 332
HardyLittlewood maximal inequality, 97 heart, 33, 39, 68 of a sum, 63 hereditary northwest, 36 southeast, 36 homothety, 335 homothety filling theorem, 335 Hopf decomposition, 355 HopfDunfordSchwartz ergodic theorem, 354
Hopf maximal ergodic theorem, 350, 351 hyperamart, 32
ideal, operator, 232 improved constant, 96 incomplete multivalued stopping time, 144 countablyvalued, 153 increasing, 34 increasing operator, 383 indefinite integral, 298 induced partition, 292 infinite measure, 33 inner envelope, 309 inner measure, 291, 309 integrable Bochner, 173 Pettis, 182 scalarly, 174 integral, 310 Pettis, 182 interval basis, 311, 342 interval partitions, 133 invariant set, 360 inverse, generalized, 35 James space, 253 Jensen's inequality, 75 vectorvalued, 180 Kolmogorov inequality, 256 Kolmogorov law of large numbers, 259 KreinMilman property, 232 Krengel theorem multiparameter, 405 Krickeberg decomposition, 267, 272 submartingale, 268 Kronecker's lemma, 258
L1-bounded, 188 L1-bounded martingale, 159 Lmax, 57 Lmin, 57 Lp, 51
LΦ, 39 largest Orlicz function, 54 largest Orlicz space, 57 lattice property, 9, 136, 148 law of large numbers, 257, 264 two-parameter, 400 Lebesgue decomposition, 28
Lebesgue density theorem, 300, 315 Lebesgue measure, 36, 37, 38 left continuous, 34 LewisStegall factorization, 243 lifting, 280, 281, 286 linear, 289 linear functional, 68 linear lifting, 289 linear operator, 383 localization theorem, 20 locally convex space, 231 locally finite, 308 lower density, 282 Lebesgue, 284 lower derivate, 310 lower semicontinuous, 180 Luxemburg norm, 39, 61, 390
Mp, 38 Maharam's lifting theorem, 286 MarkovKakutani fixed point theorem, 222 Markov kernel, 379 Markovian operator, 27, 344, 354, 394 martingale, 9, 21, 139, 144, 206, 209, 254 additive, 405 Banachvalued, 171, 402 block, 397 decomposition of, 28 differencequotient, 292 Llbounded, 159 multiparameter, 396 Orlicz bounded, 80 reversed, 264 singular, 161 strong, 406 twoparameter, 400 uniformly integrable, 79, 123, 170 vectorvalued, 183, 184 martingale difference sequence, 99 martingale in the limit, 139, 215 martingale theorem, 33 martingale transform, 99, 271 maximal ergodic theorem, 351 maximal function, 6, 7, 89, 274, 280 maximal inequality, 7, 28, 82, 94, 160, 197 converse, 98 Doob, 100 HardyLittlewood, 97 martingale transforms, 99 reversed processes, 197 stochastic, 120 supermartingale, 254 vectorvalued, 190 weak, 350 maximal theorem, 393 Akcoglu, 393 maximal weak L1, 8 mean bounded operator, 353 measurable Bochner, 173
measurable approximation lemma, 7 measurable function, 33 measurable partition, 15 measure arbitrarily small sets, 38 infinite, 38 Radon, 311 σ-finite, 38
vector, 176 measurepreserving transformation, 354 metric space, 3 complete, 3 mil, 216 mixing Qstar, 264 star, 262
modification, separable, 288 monotone convergence theorem, 287 essential convergence, 126 monotone optional sampling, 142 monotonely continuous, 357, 360 monotonicity of stopping, 8 MooreSmith convergence, 1, 3, 146 Mourier theorem, 392 multiparameter martingale, 396 multiparameter principle, 382, 383, 386 multiparameter processes, 382 multivalued stopping time, 113, 144 countably valued, 153 incomplete, 144
net, 1, 3 Cauchy, 3 nonlocalization, 26 nonmeasurable sets, 309 nontrivial, 34 norm, order continuous, 42, 383 norm of a process, 274 northwest hereditary, 36 nuclear space, 253 operator, 344 conservative, 355 continuous for order, 383 dissipative, 355 ergodic, 380 increasing, 383 linear, 383 Markovian, 344, 354 mean bounded, 353 monotonely continuous, 357 norm continuous, 360 positive, 344 positively dominated, 388 positively homogeneous, 383 potential, 355 subadditive, 383 sublinear, 383 subMarkovian, 344, 365 operator extension, 360
operator ideal, 232 absolutely summing, 232, 234 Asplund operators, 246 compact operators, 246 pabsolutely summing, 234 RadonNikodym operators, 241 2absolutely summing, 237 weakly compact operators, 246 optional sampling amarts, 12 cofinal, 142 monotone, 142
optional sampling theorem vectorvalued amart, 185 vectorvalued martingales, 184 weak sequential amart, 186 optional stopping, 6, 7, 31 order continuous, 360 order continuous norm, 42, 68, 383 order convergence, 114 ordered amart, 14 Banachvalued, 144 ordered stopping time, 13, 21 ordered Vitali condition, 143 Orlicz ball, 38 Orlicz bounded martingale, 80 Orlicz derivative, 35 finite, 35 unbounded, 35 Orlicz function, 34 comparison of, 52 conjugate, 35, 59 equivalent, 54 finite, 35 largest, 54 smallest, 54 Orlicz heart, 90, 391 Orlicz modular, 38, 61, 390 Orlicz modular convergence of martingales, 80 Orlicz norm, 60, 61, 183 Orlicz norm convergence of martingales, 80 Orlicz sequence space, 39 Orlicz space, 33, 34, 38, 39, 51, 83, 90, 391 comparison, 51 duality, 59 heart, 33, 39 largest, 57 smallest, 57 vectorvalued, 183 outer envelope, 309 outer measure, 291, 309 overflow, 313, 326 overlap, 313
p-absolutely summing, 234 PΦ convergence, 214 permutation, 264 Pettis bounded, 188
Pettis integral, 182 Pettis measurability theorem, 175 Pettis norm, 173 incompleteness, 253 Pettis norm convergence, 198, 207 Φ-norm, 41 Φ, 63
Pietsch factorization, 235, 252 point of density, 300, 315 point transformation, 354, 356, 360, 365, 378
pointwise ergodic theorems, 344 positive definite, 231 positive operator, 344 positive submartingale, 276 positive supermartingale, 98, 270 positively dominated operator, 388 positively homogeneous operator, 383 potential, 147 amart, 192 Doob, 30 potential operator, 355 potential theory, 381 pramart, 138 vectorvalued, 215 predictable, 256 predictable process, 99, 267
presently, 106 presently predictable, 106
process
adapted, 4 additive, 365 subadditive, 365 superadditive, 365 property (A), 328 property (C), 330 property (C) of Corson, 223, 232 property (WH), 338 prophet, 104 prophet inequalities, 82, 103, 104, 112 pure charge, 347 pure supercharge, 347 purely atomic, 290 Qstarmixing, 264 quadratic variation, 268 qualitative starmixing, 264 quasimartingale, 23, 31 continuous parameter, 32 vectorvalued, 184 Rk, 65, 391 Ro, 353, 391 Radon measure, 311 RadonNikodym derivative, 178, 198 RadonNikodym operator, 241 RadonNikodym property, 171, 178, 181, 182, 196, 198, 202, 209, 217, 225, 227, 230
boundedly complete basis, 183
fails in c0, 181 fails in L1, 182
in l1, 181 of reflexive space, 212 of separable dual space, 212
ratio ergodic theorem multiparameter, 395 ratio theorem, 380 superadditive, 368 rearrangement invariant function space, 41 rectangle basis, 343 refinement, 134, 140 reflexive Orlicz space, 62 relatively sequentially compact, 78 representable operator, 242 restriction lemma, 13 restriction theorem, vectorvalued amart, 186
reverse inequality, 92 reversed amart, 12, 17, 213 reversed martingale, 26, 213, 264 reversed process, 399 reversed stochastic basis, 8 reversed submartingale, 265 reversed supermartingale, 265 Riesz decomposition, 24, 25, 125, 141 amart, 125, 193 amart of finite order, 196 converse, 196 GFT, 216 in IN2, 29 reversed amarts, 31 semiamart, 30 uniform amart, 187 vectorvalued, 196 vectorvalued amart, 193 weak sequential amart, 197 Rota theorem, multiparameter, 404 row and column exchangeable, 400 Ryll-Nardzewski fixed-point theorem, 222
sample space, 4 scalar convergence, 198, 199, 217 scalarly integrable, 174 scalarly measurable, weakstar, 175 Schauder fixed point theorem, 222 Schur property, 253 semiamart, 9, 147 semivariation, 176 separable, 49 separable modification, 288 sequential sufficiency theorem, 3 set function charge, 347
countably superadditive, 347 decomposition of, 347 singular, 161 superadditive, 113 supercharge, 347 set of σ-finiteness, 74 σ-directed set, 138 σ-finite measure, 38 σ-finiteness, set of, 74 EC, 120 signed transform, 110 simple function, 33 simple ordered stopping time, 120 simple stopping time, 1, 5 singular martingale, 161 singular setfunction, 161
slice, 224 small overflow, 311, 328, 330 smaller algebra prevails, 20 smallest Orlicz function, 54 smallest Orlicz space, 57 Snell envelope, 123, 125
southeast hereditary, 36 spread, 310 square function, 270, 273, 274, 276, 280 stable family, 123 starmixing, 262, 264 stochastic basis, 4, 13, 292, 326 constant, 143 reversed, 8
stochastic convergence, 11, 117, 350 stochastic lower limit, 117, 127 stochastic maximal inequality, 120 stochastic process, adapted, 4 stochastic supremum, 117 stochastic upper limit, 117, 118, 127 stock options, 111 stopped inequality, 30 stopped outside, 191 stopped process, 94, 104 approximation by, 149 stopping, optional, 6 stopping time, 1, 5 incomplete multivalued, 144 multivalued, 113, 144 ordered, 13, 120 simple, 1, 5 stops outside, 191 strong convergence, 205 strong inequality, 82 strong martingale, 406 strong Vitali property, 311 strongly exposed point, 227, 228, 230 strongly exposing functional, 228 subadditive, 394 subadditive operator, 383 subadditive process, 365 subdivision finite, 292 sublinear operator, 383
subMarkovian operator, 344, 365 submartingale, 21, 139 in Banach lattice, 217 block, 397 reversed, 265
subnet, 7 subordinate, 156 subpramart, 139 substantial sets, 133, 300 superadditive, 394 superadditive process, 365 superadditive ratio theorem, 368 superadditive sequence, numerical, 377 superadditive setfunction, 113 superadditive theorem point transformations, 378 supercharge, 347 maximal measure in, 349 pure, 347 supermartingale, 21, 139 reversed, 265
supermartingale amart, 140 support beyond, 162 support of a permutation, 264, 265 T-convergence, 207 Talagrand mil, 216 three-function inequality, 82, 276, 279 time constant, 365, 394 totally ordered basis, 131 transform of a process, 99 transformed process, 106 transition kernel Markov, 379 null-preserving, 379 truncated limit, 345, 349, 350 weak, 345 2-absolutely summing, 237 two-parameter martingale, 400
unconditional convergence, 233 uniform amart, 184, 190, 206, 239 uniformly absolutely continuous, 69, 77, 188, 199
asymptotically, 120 uniformly integrable, 14, 33, 68, 69, 71, 77, 162, 188, 299 martingale, 79, 123, 170 stochastic convergence, 126 uniformly Ro, 78, 79 uniqueness of limits, 7 unit interval, 38 upcrossing, 13 upper density, 282 upper derivate, 310 upper envelope, measurability of, 286 (V), 113, 127, 129, 159 (VC), 143
(V°), 143
(VΦ), 145, 153, 158 variation, 15, 18, 176 of a function, 297 vector measure, 176 Vitali condition, 113, 300 controlled, 143 ordered, 143 Vitali condition (V), 144 Vitali cover, 293, 300, 310 Vitali covering theorem, 293, 333 Vitali lemma, 133 Vitali property strong, 311 weak, 313 waiting lemma, 6, 127 generalized, 8 weak a.e. convergence, 199 weak amart, 186, 251 weak a.s. convergence, 202, 217 weak halo, 336, 343 weak halo evanescence property, 336
weak inequality, 82
weak L1, 8 weak martingale, 107 weak maximal inequality, 99, 350 weak sequential amart, 186, 190, 200, 204, 208
weak-star topology on a Banach space, 172 weak topology, 78 on a Banach space, 172 weak truncated limit, 345, 350 weak unit, 350 weak Vitali property, 313, 315, 335 weakly compact, 79 weakly compact operator, 246 weakly sequentially complete, 350 weakly unconditionally Cauchy, 233 (WH), 338 Yankov-von Neumann selection theorem, 218
Young class, 38 Young modular, 61 Young partition, 36 Young's inequality, 37, 59