0. n→∞
At least in the case when $\nu_n = \mu$, Mycielski has an alternative, in some sense simpler, derivation of (5.2.28). The strategy which I will use is the following. For each $k \ge 1$ and $f \in L^1(\mu;\mathbb{R})$, define $f_k : E \longrightarrow \mathbb{R}$ so that
$$f_k(x) = \frac{1}{\mu(Q_k(x))}\int_{Q_k(x)} f(y)\,\mu(dy).$$
Obviously $f_k(Y_n(\omega)) = f_k(Z_n(\omega))$ if $Y_n(\omega) \in Q_k(Z_n(\omega))$. Moreover, as I will show below, $\lim_{n\to\infty}\mathbb{P}(Y_n \notin Q_k(Z_n)) = 0$ for each $k \ge 0$. Thus, the key step is to show that
$$\lim_{k\to\infty}\,\sup_{n\ge 1}\,\mathbb{P}\bigl(|f(Y_n) - f_k(Y_n)| \ge \epsilon\bigr) = 0 \quad\text{for all } \epsilon > 0.$$
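Notice that, because $f_k = \mathbb{E}^{\mu}[f \mid \sigma(\mathcal{P}_k)]$, this would be obvious from Corollary 5.2.4 if the $Y_n$ were replaced by $X_n$. Thus, the problem comes down to showing that the distributions of the $Y_n$'s are uniformly sufficiently close to $\mu$. For each $n \ge 1$, define
$$\Pi_n(z,\Gamma) = \sum_{k=0}^{\infty}\sum_{j=1}^{n}\bigl(1-\mu(Q_k(z))\bigr)^{j-1}\,\mu\bigl((Q_k(z)\setminus Q_{k+1}(z))\cap\Gamma\bigr)\,\bigl(1-\mu(Q_{k+1}(z))\bigr)^{n-j} = \sum_{k=0}^{\infty}\Delta^n\bigl(Q_{k+1}(z)\bigr)\,\frac{\mu\bigl((Q_k(z)\setminus Q_{k+1}(z))\cap\Gamma\bigr)}{\mu\bigl(Q_k(z)\setminus Q_{k+1}(z)\bigr)},$$
where $\Delta^n(Q) \equiv \bigl(1-\mu(Q)\bigr)^n - \bigl(1-\mu(\breve{Q})\bigr)^n$. Then
$$\mathbb{P}\bigl((Z_n,Y_n)\in B\bigr) = \iint \mathbf{1}_B(z,y)\,\Pi_n(z,dy)\,\nu_n(dz). \tag{5.2.29}$$
In particular, if $\mu_n$ is the distribution of $Y_n$, then
$$\mu_n(\Gamma) = \int \Pi_n(z,\Gamma)\,\nu_n(dz) = \sum_{k=0}^{\infty}\sum_{Q\in\mathcal{P}_{k+1}}\Delta^n(Q)\,\nu_n(Q)\,\frac{\mu\bigl((\breve{Q}\setminus Q)\cap\Gamma\bigr)}{\mu(\breve{Q}\setminus Q)}.$$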
§ 5.2 Discrete Parameter Martingales
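In addition, because $\bigl(Q_\ell(z)\setminus Q_{\ell+1}(z)\bigr)\cap Q_k(z) = \emptyset$ if $\ell < k$ and is equal to $Q_\ell(z)\setminus Q_{\ell+1}(z)$ when $\ell \ge k$,
$$\Pi_n\bigl(z,Q_k(z)\bigr) = \sum_{\ell=k}^{\infty}\Delta^n\bigl(Q_{\ell+1}(z)\bigr) = \lim_{L\to\infty}\bigl(1-\mu(Q_{L+1}(z))\bigr)^n - \bigl(1-\mu(Q_k(z))\bigr)^n = 1 - \bigl(1-\mu(Q_k(z))\bigr)^n.$$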
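The two algebraic facts behind this computation can be checked in exact arithmetic. The following is a minimal sketch, not taken from the text: the chain of masses `masses[l]` standing for $\mu(Q_\ell(z))$ is hypothetical, and cutting the chain off at a cell of mass 0 is an assumption that stands in for the limit $L \to \infty$.

```python
from fractions import Fraction

def delta(mu_child, mu_parent, n):
    """Delta^n(Q) = (1 - mu(Q))^n - (1 - mu(parent of Q))^n."""
    return (1 - mu_child) ** n - (1 - mu_parent) ** n

def delta_via_index_sum(mu_child, mu_parent, n):
    """The same quantity written as the sum over j = 1..n used to define Pi_n."""
    a, b = 1 - mu_parent, 1 - mu_child
    return sum(a ** (j - 1) * (mu_parent - mu_child) * b ** (n - j)
               for j in range(1, n + 1))

def pi_n_of_cell(masses, k, n):
    """Sum of Delta^n(Q_{l+1}) over l >= k along a finite nested chain.

    masses[l] stands for mu(Q_l(z)); the last entry is 0, which mimics
    letting L tend to infinity in the telescoping sum.
    """
    return sum(delta(masses[l + 1], masses[l], n)
               for l in range(k, len(masses) - 1))

# Hypothetical nested chain with dyadic masses, closed off by a cell of mass 0.
masses = [Fraction(1, 2 ** l) for l in range(6)] + [Fraction(0)]
n = 5
for k in range(6):
    # Telescoping: the sum collapses to 1 - (1 - mu(Q_k))^n.
    assert pi_n_of_cell(masses, k, n) == 1 - (1 - masses[k]) ** n
```

Using `Fraction` keeps every comparison exact, so the assertions test the identities themselves rather than a floating-point approximation.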
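Thus, if $r'$ is the Hölder conjugate of $r$, then
$$\mathbb{P}\bigl(Y_n\notin Q_k(Z_n)\bigr) = \int\bigl(1-\mu(Q_k(z))\bigr)^n\,\nu_n(dz) \le K_r\left(\int\bigl(1-\mu(Q_k(z))\bigr)^{nr'}\,\mu(dz)\right)^{\frac{1}{r'}},$$
and so, by Lebesgue's Dominated Convergence Theorem,
$$\lim_{n\to\infty}\mathbb{P}\bigl(Y_n\notin Q_k(Z_n)\bigr) = 0 \quad\text{for all } k \ge 0. \tag{5.2.30}$$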
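Given an $f \in L^1(\mu;\mathbb{R})$ and $Q \in \bigcup_{k=0}^{\infty}\mathcal{P}_k$, set
$$Af(Q) = \frac{1}{\mu(Q)}\int_Q f\,d\mu \quad\text{and}\quad Mf(Q) = \sup\Bigl\{A|f|(Q') : Q \subseteq Q' \in \bigcup_{k=0}^{\infty}\mathcal{P}_k\Bigr\}.$$
Clearly,
$$x \in Q \implies Mf(Q) \le f^*(x) \equiv \sup_{k\ge 0} A|f|\bigl(Q_k(x)\bigr),$$
and, because $Af\bigl(Q_k(x)\bigr) = \mathbb{E}^{\mu}[f \mid \sigma(\mathcal{P}_k)](x)$, Doob's Inequality (5.2.3) implies that $\|f^*\|_{L^p(\mu;\mathbb{R})} \le \frac{p}{p-1}\|f\|_{L^p(\mu;\mathbb{R})}$ for all $p \in (1,\infty]$.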
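To make the objects $Af(Q_k(x))$ and $f^*$ concrete, here is a minimal sketch under assumptions that are not in the text: $E = [0,1)$ with $\mu$ uniform, $\mathcal{P}_k$ the partition into $2^k$ dyadic intervals, and $f$ constant on the $2^m$ finest cells. The test function `f` is hypothetical. The final line compares the two sides of the Doob bound for $p = 2$.

```python
import math

def dyadic_averages(values, k):
    """Replace each entry by the mean over its level-k dyadic block.

    `values` has length 2**m and represents a function that is constant on the
    2**m finest cells of [0, 1); level k splits [0, 1) into 2**k equal blocks.
    """
    m = len(values).bit_length() - 1
    block = 2 ** (m - k)
    out = []
    for start in range(0, len(values), block):
        mean = sum(values[start:start + block]) / block
        out.extend([mean] * block)
    return out

def dyadic_maximal(values):
    """f*(x) = sup over levels k of the level-k average of |f| around x."""
    m = len(values).bit_length() - 1
    abs_vals = [abs(v) for v in values]
    best = [0.0] * len(values)
    for k in range(m + 1):
        avg = dyadic_averages(abs_vals, k)
        best = [max(b, a) for b, a in zip(best, avg)]
    return best

def lp_norm(values, p):
    """L^p norm with respect to the uniform probability measure on the cells."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)

# Hypothetical test function on 2**8 cells.
f = [math.sin(7.0 * i) * (i % 5) for i in range(2 ** 8)]
f_star = dyadic_maximal(f)
p = 2.0
print(lp_norm(f_star, p), "<=", p / (p - 1) * lp_norm(f, p))
```

Because the finest level reproduces $|f|$ itself, `f_star` dominates `|f|` pointwise, which is the discrete counterpart of $Mf(Q) \le f^*(x)$ for $x \in Q$.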
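Lemma 5.2.31. For any $f \in L^1(\mu;\mathbb{R})$,
$$\int |f|\,d\mu_n \le \theta^{-1}\int f^*\,d\nu_n. \tag{5.2.32}$$
In particular, if $q \in [1,\infty)$ and $f \in L^{qr'}(\mu;\mathbb{R})$, then
$$\|f\|_{L^q(\mu_n;\mathbb{R})} \le \left(\frac{rK_r}{\theta}\right)^{\frac{1}{q}}\|f\|_{L^{qr'}(\mu;\mathbb{R})}. \tag{5.2.33}$$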
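Proof: Without loss in generality, I will assume throughout that $f \ge 0$. To prove (5.2.32), first note that
$$\int f\,d\mu_n = \sum_{k=0}^{\infty}\sum_{Q\in\mathcal{P}_{k+1}}\Delta^n(Q)\,\nu_n(Q)\,\frac{1}{\mu(\breve{Q}\setminus Q)}\int_{\breve{Q}\setminus Q} f\,d\mu \le \theta^{-1}\sum_{k=0}^{\infty}\sum_{Q\in\mathcal{P}_{k+1}}\Delta^n(Q)\,\nu_n(Q)\,Mf(\breve{Q}),$$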
5 Conditioning and Martingales
since
$$\frac{1}{\mu(\breve{Q}\setminus Q)}\int_{\breve{Q}\setminus Q} f\,d\mu \le \theta^{-1}Af(\breve{Q}) \le \theta^{-1}Mf(\breve{Q}).$$
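Next, for each $k \ge 0$,
$$\begin{aligned}
\sum_{Q\in\mathcal{P}_{k+1}}\Delta^n(Q)\,\nu_n(Q)\,Mf(\breve{Q})
&= \sum_{Q\in\mathcal{P}_{k+1}}\bigl(1-\mu(Q)\bigr)^n\nu_n(Q)\,Mf(\breve{Q}) - \sum_{Q\in\mathcal{P}_{k+1}}\bigl(1-\mu(\breve{Q})\bigr)^n\nu_n(Q)\,Mf(\breve{Q})\\
&= \sum_{Q\in\mathcal{P}_{k+1}}\bigl(1-\mu(Q)\bigr)^n\nu_n(Q)\,Mf(\breve{Q}) - \sum_{Q\in\mathcal{P}_{k}}\bigl(1-\mu(Q)\bigr)^n\nu_n(Q)\,Mf(Q)\\
&\le \sum_{Q\in\mathcal{P}_{k+1}}\bigl(1-\mu(Q)\bigr)^n\nu_n(Q)\,Mf(Q) - \sum_{Q\in\mathcal{P}_{k}}\bigl(1-\mu(Q)\bigr)^n\nu_n(Q)\,Mf(Q),
\end{aligned}$$
and therefore
$$\begin{aligned}
\theta\int f\,d\mu_n
&\le \lim_{K\to\infty}\sum_{k=0}^{K}\left(\sum_{Q\in\mathcal{P}_{k+1}}\bigl(1-\mu(Q)\bigr)^n\nu_n(Q)\,Mf(Q) - \sum_{Q\in\mathcal{P}_{k}}\bigl(1-\mu(Q)\bigr)^n\nu_n(Q)\,Mf(Q)\right)\\
&= \lim_{K\to\infty}\sum_{Q\in\mathcal{P}_{K+1}}\bigl(1-\mu(Q)\bigr)^n\nu_n(Q)\,Mf(Q) \le \int f^*\,d\nu_n.
\end{aligned}$$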
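Given (5.2.32), (5.2.33) is an easy application of Hölder's Inequality and the estimate coming from (5.2.3) on the $L^p(\mu;\mathbb{R})$-norm of $f^*$ in terms of that of $f$. Namely,
$$\int f^q\,d\mu_n \le \theta^{-1}\int (f^q)^*\,d\nu_n \le \theta^{-1}K_r\left(\int\bigl((f^q)^*\bigr)^{r'}\,d\mu\right)^{\frac{1}{r'}} \le \frac{\theta^{-1}K_r\,r'}{r'-1}\,\|f^q\|_{L^{r'}(\mu;\mathbb{R})} = \frac{rK_r}{\theta}\,\|f\|^q_{L^{qr'}(\mu;\mathbb{R})}.$$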
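Theorem 5.2.34. For each $\mathcal{B}$-measurable $f : E \longrightarrow \mathbb{R}$, (5.2.28) holds. Moreover, if $q \in (1,\infty)$ and $f \in L^{qr'}(\mu;\mathbb{R})$, then
$$\lim_{n\to\infty}\mathbb{E}^{\mathbb{P}}\bigl[|f(Y_n) - f(Z_n)|^p\bigr] = 0 \quad\text{for each } p \in [1,q). \tag{5.2.35}$$
(See Exercise 6.1.19 for a related result.)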
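Proof: It is easy to prove (5.2.28) from (5.2.35). Indeed, given $\delta > 0$, choose $R > 0$ so that $\mu(|f| \ge R) < \delta$, and set $f^R = f\,\mathbf{1}_{[-R,R]}(f)$. Then, by (5.2.35), $\lim_{n\to\infty}\mathbb{P}\bigl(|f^R(Y_n) - f^R(Z_n)| \ge \epsilon\bigr) = 0$ for all $\epsilon > 0$. Hence,
$$\limsup_{n\to\infty}\mathbb{P}\bigl(|f(Y_n) - f(Z_n)| \ge 3\epsilon\bigr) \le \limsup_{n\to\infty}\mu_n\bigl(|f - f^R| \ge \epsilon\bigr) + \limsup_{n\to\infty}\nu_n\bigl(|f - f^R| \ge \epsilon\bigr).$$
By Hölder's Inequality,
$$\nu_n\bigl(|f - f^R| \ge \epsilon\bigr) \le K_r\,\mu\bigl(|f - f^R| \ge \epsilon\bigr)^{\frac{1}{r'}} < K_r\,\delta^{\frac{1}{r'}},$$
and, by (5.2.33) with $q = 1$,
$$\mu_n\bigl(|f - f^R| \ge \epsilon\bigr) \le \frac{rK_r}{\theta}\,\mu\bigl(|f - f^R| \ge \epsilon\bigr)^{\frac{1}{r'}} < \frac{rK_r}{\theta}\,\delta^{\frac{1}{r'}}.$$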
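The proof of (5.2.35) follows the strategy outlined earlier. That is,
$$\mathbb{E}^{\mathbb{P}}\bigl[|f(Y_n) - f(Z_n)|^p\bigr]^{\frac{1}{p}} \le \|f - f_k\|_{L^p(\mu_n;\mathbb{R})} + \mathbb{E}^{\mathbb{P}}\bigl[|f_k(Y_n) - f_k(Z_n)|^p\bigr]^{\frac{1}{p}} + \|f_k - f\|_{L^p(\nu_n;\mathbb{R})}.$$
By (5.2.33),
$$\|f - f_k\|_{L^p(\mu_n;\mathbb{R})} \le \left(\frac{rK_r}{\theta}\right)^{\frac{1}{p}}\|f - f_k\|_{L^{pr'}(\mu;\mathbb{R})},$$
and, by Hölder's Inequality,
$$\|f - f_k\|_{L^p(\nu_n;\mathbb{R})} \le K_r^{\frac{1}{p}}\,\|f - f_k\|_{L^{pr'}(\mu;\mathbb{R})}.$$
Since, by Corollary 5.2.4, $\|f - f_k\|_{L^{pr'}(\mu;\mathbb{R})} \longrightarrow 0$ as $k \to \infty$, all that remains is to show that, for each $k \ge 0$, $\mathbb{E}^{\mathbb{P}}\bigl[|f_k(Y_n) - f_k(Z_n)|^p\bigr]^{\frac{1}{p}} \longrightarrow 0$ as $n \to \infty$. But
$$\mathbb{E}^{\mathbb{P}}\bigl[|f_k(Y_n) - f_k(Z_n)|^p\bigr]^{\frac{1}{p}} = \mathbb{E}^{\mathbb{P}}\bigl[|f_k(Y_n) - f_k(Z_n)|^p,\ Y_n \notin Q_k(Z_n)\bigr]^{\frac{1}{p}} \le \mathbb{E}^{\mathbb{P}}\bigl[|f_k(Y_n) - f_k(Z_n)|^q\bigr]^{\frac{1}{q}}\,\mathbb{P}\bigl(Y_n \notin Q_k(Z_n)\bigr)^{\frac{1}{p}-\frac{1}{q}}.$$
By (5.2.30), the final factor tends to 0 as $n \to \infty$. Hence, since, by Hölder's Inequality and (5.2.33),
$$\mathbb{E}^{\mathbb{P}}\bigl[|f_k(Y_n) - f_k(Z_n)|^q\bigr]^{\frac{1}{q}} \le \|f_k\|_{L^q(\mu_n;\mathbb{R})} + \|f_k\|_{L^q(\nu_n;\mathbb{R})} \le \left(\Bigl(\frac{r}{\theta}\Bigr)^{\frac{1}{q}} + 1\right)K_r^{\frac{1}{q}}\,\|f_k\|_{L^{qr'}(\mu;\mathbb{R})} \le \left(\Bigl(\frac{r}{\theta}\Bigr)^{\frac{1}{q}} + 1\right)K_r^{\frac{1}{q}}\,\|f\|_{L^{qr'}(\mu;\mathbb{R})},$$
the proof is complete.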
Exercises for § 5.2
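Exercise 5.2.36. In this exercise I will outline a quite independent derivation of the convergence assertion in Doob's Martingale Convergence Theorem. The key observations here are first that, given Doob's Inequality (cf. (5.2.2)), the result is nearly trivial for martingales having uniformly bounded second moments and second that everything can be reduced to that case.

(i) Let $(M_n, \mathcal{F}_n, \mathbb{P})$ be a martingale which is $L^2$-bounded (i.e., $\sup_{n\in\mathbb{N}}\mathbb{E}^{\mathbb{P}}[M_n^2] < \infty$). Note that
$$\mathbb{E}^{\mathbb{P}}\bigl[M_n^2\bigr] - \mathbb{E}^{\mathbb{P}}\bigl[M_{m-1}^2\bigr] = \mathbb{E}^{\mathbb{P}}\bigl[(M_n - M_{m-1})^2\bigr] \quad\text{for } 1 \le m \le n;$$
and starting from this, show that there is an $M \in L^2(\mathbb{P};\mathbb{R})$ such that $M_n \longrightarrow M$ in $L^2(\mathbb{P};\mathbb{R})$. Next apply (5.2.5) to the submartingale $\bigl(|M_{n\vee m} - M_m|, \mathcal{F}_n, \mathbb{P}\bigr)$ to show that, for every $\epsilon > 0$,
$$\mathbb{P}\Bigl(\sup_{n\ge m}|M_n - M_m| \ge \epsilon\Bigr) \le \frac{1}{\epsilon}\,\mathbb{E}^{\mathbb{P}}\bigl[|M - M_m|\bigr] \longrightarrow 0 \quad\text{as } m \to \infty,$$
and conclude that $M_n \longrightarrow M$ (a.s., $\mathbb{P}$).

(ii) Let $(X_n, \mathcal{F}_n, \mathbb{P})$ be a non-negative submartingale with the property that $\sup_{n\in\mathbb{N}}\mathbb{E}^{\mathbb{P}}[X_n^2] < \infty$, define the sequence $\{A_n : n \in \mathbb{N}\}$ accordingly, as in Lemma 5.2.12, and set $M_n = X_n - A_n$, $n \in \mathbb{N}$. Then $(M_n, \mathcal{F}_n, \mathbb{P})$ is a martingale, and clearly both $M_n$ and $A_n$ are $\mathbb{P}$-square integrable for each $n \in \mathbb{N}$. In fact, check that
$$\begin{aligned}
\mathbb{E}^{\mathbb{P}}\bigl[M_n^2 - M_{n-1}^2\bigr] &= \mathbb{E}^{\mathbb{P}}\bigl[(M_n - M_{n-1})(X_n + X_{n-1})\bigr]\\
&= \mathbb{E}^{\mathbb{P}}\bigl[X_n^2 - X_{n-1}^2\bigr] - \mathbb{E}^{\mathbb{P}}\bigl[(A_n - A_{n-1})(X_n + X_{n-1})\bigr] \le \mathbb{E}^{\mathbb{P}}\bigl[X_n^2 - X_{n-1}^2\bigr],
\end{aligned}$$
and therefore that
$$\mathbb{E}^{\mathbb{P}}\bigl[M_n^2\bigr] \le \mathbb{E}^{\mathbb{P}}\bigl[X_n^2\bigr] \quad\text{and}\quad \mathbb{E}^{\mathbb{P}}\bigl[A_n^2\bigr] \le 4\,\mathbb{E}^{\mathbb{P}}\bigl[X_n^2\bigr] \quad\text{for every } n \in \mathbb{N}.$$
Finally, show that there exist $M \in L^2(\mathbb{P};\mathbb{R})$ and $A \in L^2\bigl(\mathbb{P};[0,\infty)\bigr)$ such that $M_n \longrightarrow M$, $A_n \nearrow A$, and, therefore, $X_n \longrightarrow X \equiv M + A$ both $\mathbb{P}$-almost surely and in $L^2(\mathbb{P};\mathbb{R})$.

(iii) Let $(X_n, \mathcal{F}_n, \mathbb{P})$ be a non-negative martingale, set $Y_n = e^{-X_n}$, $n \in \mathbb{N}$, use Corollary 5.2.10 to see that $(Y_n, \mathcal{F}_n, \mathbb{P})$ is a uniformly bounded, non-negative submartingale, and apply part (ii) to conclude that $\{X_n : n \ge 0\}$ converges $\mathbb{P}$-almost surely to a non-negative $X \in L^1(\mathbb{P};\mathbb{R})$.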
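(iv) Let $(X_n, \mathcal{F}_n, \mathbb{P})$ be a martingale for which
$$\sup_{n\in\mathbb{N}}\mathbb{E}^{\mathbb{P}}\bigl[|X_n|\bigr] < \infty. \tag{5.2.37}$$
For each $m \in \mathbb{N}$, define $Y^{\pm}_{n,m} = \mathbb{E}^{\mathbb{P}}\bigl[X^{\pm}_{n\vee m}\,\big|\,\mathcal{F}_m\bigr] \vee 0$ for $n \in \mathbb{N}$. Show that $Y^{\pm}_{n+1,m} \ge Y^{\pm}_{n,m}$ (a.s., $\mathbb{P}$), define $Y^{\pm}_m = \lim_{n\to\infty}Y^{\pm}_{n,m}$, check that both $(Y^+_m, \mathcal{F}_m, \mathbb{P})$ and $(Y^-_m, \mathcal{F}_m, \mathbb{P})$ are non-negative martingales with $\mathbb{E}^{\mathbb{P}}\bigl[Y^+_0 + Y^-_0\bigr] \le \sup_{n\in\mathbb{N}}\mathbb{E}^{\mathbb{P}}\bigl[|X_n|\bigr]$, and note that $X_m = Y^+_m - Y^-_m$ (a.s., $\mathbb{P}$) for each $m \in \mathbb{N}$. In other words, every martingale $(X_n, \mathcal{F}_n, \mathbb{P})$ satisfying (5.2.37) admits a Hahn decomposition³ as the difference of two non-negative martingales whose sum has expectation value dominated by the left-hand side of (5.2.37). Finally, use this observation together with (iii) to see that every such martingale converges $\mathbb{P}$-almost surely to some $X \in L^1(\mathbb{P};\mathbb{R})$.

³ This useful observation was made by Klaus Krickeberg.

(v) By combining the final assertion in (iv) together with Doob's Decomposition in Lemma 5.2.12, give another proof of the convergence assertion in Theorem 5.2.15.

Exercise 5.2.38. In this exercise we will develop another way to reduce Doob's Martingale Convergence Theorem to the case of $L^2$-bounded martingales. The technique here is due to R. Gundy and derives from the ideas introduced by Calderón and Zygmund in connection with their famous work on weak-type 1–1 estimates for singular integrals.

(i) Let $\{Z_n : n \in \mathbb{N}\}$ be a $\{\mathcal{F}_n : n \in \mathbb{N}\}$-progressively measurable, $[0,R]$-valued sequence with the property that $(-Z_n, \mathcal{F}_n, \mathbb{P})$ is a submartingale. Next, choose $\{A_n : n \in \mathbb{N}\}$ for $(-Z_n, \mathcal{F}_n, \mathbb{P})$ as in Lemma 5.2.12, note that the $A_n$'s can be chosen so that $0 \le A_n - A_{n-1} \le R$ for all $n \in \mathbb{Z}^+$, and set $M_n = Z_n + A_n$, $n \in \mathbb{N}$. Check that $(M_n, \mathcal{F}_n, \mathbb{P})$ is a non-negative martingale with $M_n \le (n+1)R$ for each $n \in \mathbb{N}$. Next, show that
$$\begin{aligned}
\mathbb{E}^{\mathbb{P}}\bigl[M_n^2 - M_{n-1}^2\bigr] &= \mathbb{E}^{\mathbb{P}}\bigl[(M_n - M_{n-1})(Z_n + Z_{n-1})\bigr]\\
&= \mathbb{E}^{\mathbb{P}}\bigl[Z_n^2 - Z_{n-1}^2\bigr] + \mathbb{E}^{\mathbb{P}}\bigl[(A_n - A_{n-1})(Z_n + Z_{n-1})\bigr]\\
&\le \mathbb{E}^{\mathbb{P}}\bigl[Z_n^2 - Z_{n-1}^2\bigr] + 2R\,\mathbb{E}^{\mathbb{P}}\bigl[A_n - A_{n-1}\bigr],
\end{aligned}$$
and conclude that $\mathbb{E}^{\mathbb{P}}[A_n^2] \le \mathbb{E}^{\mathbb{P}}[M_n^2] \le 3R\,\mathbb{E}^{\mathbb{P}}[Z_0]$ for all $n \in \mathbb{N}$.

(ii) Let $(X_n, \mathcal{F}_n, \mathbb{P})$ be a non-negative martingale. Show that, for each $R \in (0,\infty)$, $X_n = M^{(R)}_n - A^{(R)}_n + \Delta^{(R)}_n$, $n \in \mathbb{N}$, where $\bigl(M^{(R)}_n, \mathcal{F}_n, \mathbb{P}\bigr)$ is a non-negative martingale satisfying $\sup_{n\ge 0}\mathbb{E}^{\mathbb{P}}\bigl[(M^{(R)}_n)^2\bigr] \le 3R\,\mathbb{E}^{\mathbb{P}}[X_0]$; $\{A^{(R)}_n : n \in \mathbb{N}\}$ is a non-decreasing sequence of random variables with the properties that $A^{(R)}_0 \equiv 0$, $A^{(R)}_n$ is $\mathcal{F}_{n-1}$-measurable, and $\sup_{n\ge 1}\mathbb{E}^{\mathbb{P}}\bigl[(A^{(R)}_n)^2\bigr] \le 3R\,\mathbb{E}^{\mathbb{P}}[X_0]$; and $\{\Delta^{(R)}_n : n \in \mathbb{N}\}$ is a $\{\mathcal{F}_n : n \in \mathbb{N}\}$-progressively measurable sequence with the property that
$$\mathbb{P}\bigl(\exists\, n \in \mathbb{N}\ \ \Delta^{(R)}_n \ne 0\bigr) \le \frac{1}{R}\,\mathbb{E}^{\mathbb{P}}[X_0].$$
Hint: Set $Z^{(R)}_n = X_n \wedge R$ and $\Delta^{(R)}_n = X_n - Z^{(R)}_n$ for $n \in \mathbb{N}$, apply part (i) to $\{Z^{(R)}_n : n \in \mathbb{N}\}$, and use Doob's Inequality to estimate the probability that $\Delta^{(R)}_n \ne 0$ for some $n \in \mathbb{N}$.

(iii) Let $(X_n, \mathcal{F}_n, \mathbb{P})$ be any martingale. Using (ii) above and part (iv) of Exercise 5.2.36, show that, for each $R \in (0,\infty)$, $X_n = M^{(R)}_n + V^{(R)}_n + \Delta^{(R)}_n$, $n \in \mathbb{N}$, where $\bigl(M^{(R)}_n, \mathcal{F}_n, \mathbb{P}\bigr)$ is a martingale satisfying $\mathbb{E}^{\mathbb{P}}\bigl[(M^{(R)}_n)^2\bigr] \le 12R\,\mathbb{E}^{\mathbb{P}}\bigl[|X_n|\bigr]$; $\{V^{(R)}_n : n \in \mathbb{N}\}$ is a sequence of random variables satisfying $V^{(R)}_0 \equiv 0$, $V^{(R)}_n$ is $\mathcal{F}_{n-1}$-measurable, and
$$\mathbb{E}^{\mathbb{P}}\left[\left(\sum_{m=1}^{n}\bigl|V^{(R)}_m - V^{(R)}_{m-1}\bigr|\right)^2\right] \le 12R\,\mathbb{E}^{\mathbb{P}}\bigl[|X_n|\bigr]$$
for $n \in \mathbb{Z}^+$; and $\{\Delta^{(R)}_n : n \in \mathbb{N}\}$ is a $\{\mathcal{F}_n : n \in \mathbb{N}\}$-progressively measurable sequence satisfying
$$\mathbb{P}\bigl(\exists\, 0 \le m \le n\ \ \Delta^{(R)}_m \ne 0\bigr) \le \frac{2}{R}\,\mathbb{E}^{\mathbb{P}}\bigl[|X_n|\bigr].$$
The preceding representation is called the Calderón–Zygmund decomposition of the martingale $(X_n, \mathcal{F}_n, \mathbb{P})$.

(iv) Let $(X_n, \mathcal{F}_n, \mathbb{P})$ be a martingale that satisfies (5.2.37), and use part (iii) above together with part (i) of Exercise 5.2.36 to show that, for each $R \in (0,\infty)$, $\{X_n : n \ge 0\}$ converges off of a set whose $\mathbb{P}$-measure is no more than $\frac{2}{R}$ times the supremum over $n \in \mathbb{N}$ of $\mathbb{E}^{\mathbb{P}}[|X_n|]$. In particular, when combined with Lemma 5.2.12, the preceding line of reasoning leads to the advertised alternate proof of the convergence result in Theorem 5.2.15.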
Exercise 5.2.39. In this exercise we will extend Hunt's Theorem (cf. Theorem 5.2.13) to allow unbounded stopping times. To this end, let $(X_n, \mathcal{F}_n, \mathbb{P})$ be a uniformly $\mathbb{P}$-integrable submartingale on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$, and set $M_n = X_n - A_n$, $n \in \mathbb{N}$, where $\{A_n : n \in \mathbb{N}\}$ is the sequence produced in Lemma 5.2.12. After checking that $(M_n, \mathcal{F}_n, \mathbb{P})$ is a uniformly $\mathbb{P}$-integrable martingale, show that, for any stopping time $\zeta$, $X_\zeta = \mathbb{E}^{\mathbb{P}}[M_\infty \mid \mathcal{F}_\zeta] + A_\zeta$ (a.s., $\mathbb{P}$), where $X_\infty$, $M_\infty$, and $A_\infty$ are, respectively, the $\mathbb{P}$-almost sure limits of $\{X_n : n \ge 0\}$, $\{M_n : n \ge 0\}$, and $\{A_n : n \ge 0\}$. In particular, if $\zeta$ and $\zeta'$ are a pair of stopping times and $\zeta \le \zeta'$, conclude that $X_\zeta \le \mathbb{E}^{\mathbb{P}}[X_{\zeta'} \mid \mathcal{F}_\zeta]$ (a.s., $\mathbb{P}$).
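Exercise 5.2.40. There are times when submartingales converge even though they are not bounded in $L^1(\mathbb{P};\mathbb{R})$. For example, suppose that $(X_n, \mathcal{F}_n, \mathbb{P})$ is a submartingale for which there exists a non-decreasing function $\rho : \mathbb{R} \longrightarrow \mathbb{R}$ with the properties that $\rho(R) \ge R$ for all $R$ and $X_{n+1} \le \rho(X_n)$ (a.e., $\mathbb{P}$) for each $n \in \mathbb{N}$.

(i) Set $\zeta_R(\omega) = \inf\{n \in \mathbb{N} : X_n(\omega) \ge R\}$ for $R \in (0,\infty)$, and note that
$$\sup_{n\in\mathbb{N}} X_{n\wedge\zeta_R} \le X_0 \vee \rho(R) \quad\text{(a.e., } \mathbb{P}\text{)}.$$
In particular, if $X_0$ is $\mathbb{P}$-integrable, show that $\{X_n(\omega) : n \ge 0\}$ converges in $\mathbb{R}$ for $\mathbb{P}$-almost every $\omega$ for which the sequence $\{X_n(\omega) : n \ge 0\}$ is bounded above.

Hint: After observing that $\sup_{n\in\mathbb{N}}\mathbb{E}^{\mathbb{P}}\bigl[X^+_{n\wedge\zeta_R}\bigr] < \infty$ for every $R \in (0,\infty)$, conclude that, for each $R \in (0,\infty)$, $\{X_n : n \ge 0\}$ converges $\mathbb{P}$-almost everywhere on $\{\zeta_R = \infty\}$.

(ii) Let $\{Y_n : n \ge 1\}$ be a sequence of mutually independent, $\mathbb{P}$-integrable random variables, assume that $\mathbb{E}^{\mathbb{P}}[Y_n] \ge 0$ for $n \in \mathbb{N}$ and $\sup_{n\in\mathbb{N}}\|Y^+_n\|_{L^\infty(\mathbb{P};\mathbb{R})} < \infty$, and set $S_n = \sum_{1}^{n} Y_m$. Show that $\{S_n : n \ge 0\}$ is either $\mathbb{P}$-almost surely unbounded above or $\mathbb{P}$-almost surely convergent in $\mathbb{R}$.

(iii) Let $\{\mathcal{F}_n : n \in \mathbb{N}\}$ be a non-decreasing sequence of sub-σ-algebras and $A_n$ an element of $\mathcal{F}_n$ for each $n \in \mathbb{N}$. Show that the set of $\omega \in \Omega$ for which either
$$\sum_{n=0}^{\infty}\mathbf{1}_{A_n}(\omega) < \infty \quad\text{but}\quad \sum_{n=1}^{\infty}\mathbb{P}\bigl(A_n \mid \mathcal{F}_{n-1}\bigr)(\omega) = \infty$$
or
$$\sum_{n=0}^{\infty}\mathbf{1}_{A_n}(\omega) = \infty \quad\text{but}\quad \sum_{n=1}^{\infty}\mathbb{P}\bigl(A_n \mid \mathcal{F}_{n-1}\bigr)(\omega) < \infty$$
has $\mathbb{P}$-measure 0. In particular, note that this gives another derivation of the second part of the Borel–Cantelli Lemma (cf. Lemma 1.1.3).

Exercise 5.2.41. For each $n \in \mathbb{N}$, let $(E_n, \mathcal{B}_n)$ be a measurable space and $\mu_n$ and $\nu_n$ a pair of probability measures on $(E_n, \mathcal{B}_n)$ with the property that $\nu_n \ll \mu_n$. Prove Kakutani's Theorem, which says that (cf. Exercise 1.1.14) either $\prod_{n\in\mathbb{N}}\nu_n \perp \prod_{n\in\mathbb{N}}\mu_n$ or $\prod_{n\in\mathbb{N}}\nu_n \ll \prod_{n\in\mathbb{N}}\mu_n$.

Hint: Set
$$\Omega = \prod_{n\in\mathbb{N}} E_n, \quad \mathcal{F} = \prod_{n\in\mathbb{N}}\mathcal{B}_n, \quad \mathbb{P} = \prod_{n\in\mathbb{N}}\mu_n, \quad\text{and}\quad \mathbb{Q} = \prod_{n\in\mathbb{N}}\nu_n.$$
Next, take $\mathcal{F}_n = \pi_n^{-1}\bigl(\prod_{0}^{n}\mathcal{B}_m\bigr)$, where $\pi_n$ is the natural projection from $\Omega$ onto $\prod_{0}^{n}E_m$, set $\mathbb{P}_n = \mathbb{P}\restriction\mathcal{F}_n$ and $\mathbb{Q}_n = \mathbb{Q}\restriction\mathcal{F}_n$, and note that
$$X_n(x) \equiv \frac{d\mathbb{Q}_n}{d\mathbb{P}_n}(x) = \prod_{0}^{n} f_m(x_m), \quad x \in \Omega,$$
where $f_n \equiv \frac{d\nu_n}{d\mu_n}$. In particular, when $\nu_n \sim \mu_n$ for each $n \in \mathbb{N}$, use Kolmogorov's 0–1 Law (cf. Theorem 1.1.2) to see that $\mathbb{Q}(G) \in \{0,1\}$, where $G \equiv \bigl\{\lim_{n\to\infty}X_n \in (0,\infty)\bigr\}$, and combine this with the last part of Theorem 5.2.20 to conclude that $\mathbb{Q} \not\perp \mathbb{P} \implies \mathbb{Q} \ll \mathbb{P}$. Finally, to remove the assumption that $\nu_n \sim \mu_n$ for all $n$'s, define $\tilde{\nu}_n$ on $(E_n, \mathcal{B}_n)$ by $\tilde{\nu}_n = \bigl(1 - 2^{-n-1}\bigr)\nu_n + 2^{-n-1}\mu_n$, check that $\tilde{\nu}_n \sim \mu_n$ and $\mathbb{Q} \ll \tilde{\mathbb{Q}} \equiv \prod_{n\in\mathbb{N}}\tilde{\nu}_n$, and use the preceding to complete the proof.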
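Exercise 5.2.42. Let $(\Omega, \mathcal{F})$ be a measurable space and $\Sigma$ a sub-σ-algebra of $\mathcal{F}$. Given a pair of probability measures $\mathbb{P}$ and $\mathbb{Q}$ on $(\Omega, \mathcal{F})$, let $X_\Sigma$ and $Y_\Sigma$ be non-negative Radon–Nikodym derivatives of, respectively, $\mathbb{P}_\Sigma \equiv \mathbb{P}\restriction\Sigma$ and $\mathbb{Q}_\Sigma \equiv \mathbb{Q}\restriction\Sigma$ with respect to $\mathbb{P}_\Sigma + \mathbb{Q}_\Sigma$, and define
$$\bigl(\mathbb{P},\mathbb{Q}\bigr)_\Sigma = \int X_\Sigma^{\frac12}\,Y_\Sigma^{\frac12}\,d(\mathbb{P} + \mathbb{Q}).$$

(i) Show that if $\mu$ is any σ-finite measure on $(\Omega, \Sigma)$ with the property that $\mathbb{P}_\Sigma \ll \mu$ and $\mathbb{Q}_\Sigma \ll \mu$, then the number $(\mathbb{P},\mathbb{Q})_\Sigma$ given above is equal to
$$\int\left(\frac{d\mathbb{P}_\Sigma}{d\mu}\right)^{\frac12}\left(\frac{d\mathbb{Q}_\Sigma}{d\mu}\right)^{\frac12}d\mu.$$
Also, check that $\mathbb{P}_\Sigma \perp \mathbb{Q}_\Sigma$ if and only if $(\mathbb{P},\mathbb{Q})_\Sigma = 0$.

(ii) Suppose that $\{\mathcal{F}_n : n \in \mathbb{N}\}$ is a non-decreasing sequence of sub-σ-algebras of $\mathcal{F}$, and show that $(\mathbb{P},\mathbb{Q})_{\mathcal{F}_n} \longrightarrow (\mathbb{P},\mathbb{Q})_{\bigvee_0^\infty\mathcal{F}_n}$.

(iii) Referring to part (ii), assume that $\mathbb{Q}\restriction\mathcal{F}_n \ll \mathbb{P}\restriction\mathcal{F}_n$ for each $n \in \mathbb{N}$, let $X_n$ be a non-negative Radon–Nikodym derivative of $\mathbb{Q}\restriction\mathcal{F}_n$ with respect to $\mathbb{P}\restriction\mathcal{F}_n$, and show that $\mathbb{Q}\restriction\bigvee_0^\infty\mathcal{F}_n$ is singular to $\mathbb{P}\restriction\bigvee_0^\infty\mathcal{F}_n$ if and only if $\mathbb{E}^{\mathbb{P}}\bigl[\sqrt{X_n}\bigr] \longrightarrow 0$ as $n \to \infty$.

(iv) Let $\{\sigma_n\}_0^\infty \subseteq (0,\infty)$, and, for each $n \in \mathbb{N}$, let $\mu_n$ and $\nu_n$ be Gaussian measures on $\mathbb{R}$ with variance $\sigma_n^2$. If $a_n$ and $b_n$ are the mean values of, respectively, $\mu_n$ and $\nu_n$, show that
$$\prod_{n\in\mathbb{N}}\nu_n \sim \prod_{n\in\mathbb{N}}\mu_n \quad\text{or}\quad \prod_{n\in\mathbb{N}}\nu_n \perp \prod_{n\in\mathbb{N}}\mu_n$$
depending on whether $\sum_0^\infty \sigma_n^{-2}(b_n - a_n)^2$ converges or diverges.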
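As a hedged pointer for part (iv), and not as the intended solution, the quantity $(\mu_n,\nu_n)$ from the definition above can be computed in closed form when both measures are Gaussian with the same variance. The sketch below takes Lebesgue measure as the dominating measure in part (i) and writes $p$ and $q$ for the two densities; the symbols $a$, $b$, $\sigma$ abbreviate $a_n$, $b_n$, $\sigma_n$.

```latex
\begin{aligned}
\sqrt{p(x)\,q(x)}
  &= \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left(-\frac{(x-a)^2 + (x-b)^2}{4\sigma^2}\right),\\
(x-a)^2 + (x-b)^2
  &= 2\Bigl(x - \tfrac{a+b}{2}\Bigr)^2 + \frac{(a-b)^2}{2},\\
\int_{\mathbb{R}} \sqrt{p(x)\,q(x)}\,dx
  &= \exp\!\left(-\frac{(b-a)^2}{8\sigma^2}\right)
     \int_{\mathbb{R}} \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left(-\frac{\bigl(x - \tfrac{a+b}{2}\bigr)^2}{2\sigma^2}\right)dx
   = \exp\!\left(-\frac{(b-a)^2}{8\sigma^2}\right).
\end{aligned}
```

For finite products the integrand factorizes, so the corresponding quantity for $\prod_0^n\nu_m$ and $\prod_0^n\mu_m$ is $\exp\bigl(-\tfrac18\sum_0^n\sigma_m^{-2}(b_m-a_m)^2\bigr)$, which is where the series in the statement comes from.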
Exercise 5.2.43. Let $\{X_n : n \in \mathbb{Z}^+\}$ be a sequence of identically distributed, mutually independent, integrable, mean value 0, $\mathbb{R}$-valued random variables on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$, and set $S_n = \sum_1^n X_m$ for $n \in \mathbb{Z}^+$. In Exercise 1.4.28 we showed that $\liminf_{n\to\infty}|S_n| < \infty$ $\mathbb{P}$-almost surely. Here we will show that
$$\liminf_{n\to\infty}|S_n| = 0 \quad \mathbb{P}\text{-almost surely.} \tag{5.2.44}$$
As was mentioned before, this result was proved first by K.L. Chung and W.H. Fuchs. The basic observation behind the present proof is due to A. Perlin, who noticed that, by the Hewitt–Savage 0–1 Law, $\liminf_{n\to\infty}|S_n| = L$ $\mathbb{P}$-almost surely for some $L \in [0,\infty)$. Thus, the problem is to show that $L = 0$, and we will do this by a simple argument invented by A. Yushkevich.
(i) Assuming that L > 0, use the Hewitt–Savage 0–1 Law to show that P |Sn − x|
0, argue that P |Sn − L|
0.
j=1
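As is easily checked, $\mathbf{M}f : \mathbb{R}^N \longrightarrow [0,\infty]$ is lower semicontinuous and therefore certainly Borel measurable. Furthermore, if we restrict our attention to nicely meshed families of cubes, then it is easy to relate $\mathbf{M}f$ to martingales. More precisely, for each $n \in \mathbb{Z}$, the $n$th standard dyadic partition of $\mathbb{R}^N$ is the partition $\mathcal{P}_n$ of $\mathbb{R}^N$ into the cubes
$$C_n(\mathbf{k}) \equiv \prod_{i=1}^{N}\left[\frac{k_i}{2^n},\frac{k_i+1}{2^n}\right), \quad \mathbf{k} \in \mathbb{Z}^N. \tag{6.1.4}$$
These partitions are nicely meshed in the sense that the $(n+1)$st is a refinement of the $n$th. Equivalently, if $\mathcal{F}_n$ denotes the σ-algebra over $\mathbb{R}^N$ generated by the partition $\mathcal{P}_n$, then $\mathcal{F}_n \subseteq \mathcal{F}_{n+1}$. Moreover, if $f \in L^1(\mathbb{R}^N;\mathbb{R})$ and
$$X^f_n(x) \equiv 2^{nN}\int_{C_n(\mathbf{k})}|f(y)|\,dy \quad\text{for } x \in C_n(\mathbf{k}) \text{ and } \mathbf{k} \in \mathbb{Z}^N,$$
then, for each $n \in \mathbb{Z}$,
$$X^f_n = \mathbb{E}^{\lambda_{\mathbb{R}^N}}\bigl[|f|\,\big|\,\mathcal{F}_n\bigr] \quad\text{(a.e., } \lambda_{\mathbb{R}^N}\text{)},$$
where $\lambda_{\mathbb{R}^N}$ denotes Lebesgue measure on $\mathbb{R}^N$. In particular, for each $m \in \mathbb{Z}$, $\bigl(X^f_{m+n}, \mathcal{F}_{m+n}, \lambda_{\mathbb{R}^N}\bigr)$, $n \in \mathbb{N}$, is a non-negative martingale; and so, by applying (6.1.2) for each $m \in \mathbb{Z}$ and then letting $m \searrow -\infty$, we see that
$$\bigl|\{x : \mathbf{M}^{(0)}f(x) \ge \alpha\}\bigr| \le \frac{1}{\alpha}\int_{\{\mathbf{M}^{(0)}f \ge \alpha\}}|f(y)|\,dy, \quad \alpha \in (0,\infty), \tag{6.1.5}$$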
6 Some Extensions and Applications
where
$$\mathbf{M}^{(0)}f(x) = \sup\left\{\frac{1}{|Q|}\int_Q|f(y)|\,dy : x \in Q \in \bigcup_{n\in\mathbb{Z}}\mathcal{P}_n\right\}$$
and I have used $|\Gamma|$ to denote $\lambda_{\mathbb{R}^N}(\Gamma)$, the Lebesgue measure of $\Gamma$. At first sight, one might hope that it should be possible to pass directly from (6.1.5) to analogous estimates on the level sets of $\mathbf{M}f$. However, the passage from (6.1.5) to control on $\mathbf{M}f$ is not as easy as it might appear at first: the "sup" in the definition of $\mathbf{M}f$ involves many more cubes than the one in the definition of $\mathbf{M}^{(0)}f$. For this reason I will have to introduce additional families of meshed partitions. Namely, for each $\boldsymbol{\eta} \in \{0,1\}^N$, set
$$\mathcal{P}_n(\boldsymbol{\eta}) = \left\{\frac{(-1)^n\boldsymbol{\eta}}{3\times 2^n} + C_n(\mathbf{k}) : \mathbf{k} \in \mathbb{Z}^N\right\},$$
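where $C_n(\mathbf{k})$ is the cube described in (6.1.4). It is then an easy matter to check that, for each $\boldsymbol{\eta} \in \{0,1\}^N$, $\{\mathcal{P}_n(\boldsymbol{\eta}) : n \in \mathbb{Z}\}$ is a family of meshed partitions of $\mathbb{R}^N$. Furthermore, if
$$\mathbf{M}^{(\boldsymbol{\eta})}f(x) = \sup\left\{\frac{1}{|Q|}\int_Q|f(y)|\,dy : x \in Q \in \bigcup_{n\in\mathbb{Z}}\mathcal{P}_n(\boldsymbol{\eta})\right\}, \quad x \in \mathbb{R}^N,$$
then exactly the same argument that (when $\boldsymbol{\eta} = \mathbf{0}$) led us to (6.1.5) can now be used to get
$$\bigl|\{x \in \mathbb{R}^N : \mathbf{M}^{(\boldsymbol{\eta})}f(x) \ge \alpha\}\bigr| \le \frac{1}{\alpha}\int_{\{\mathbf{M}^{(\boldsymbol{\eta})}f \ge \alpha\}}|f(y)|\,dy \tag{*}$$
for each $\boldsymbol{\eta} \in \{0,1\}^N$ and $\alpha \in (0,\infty)$. Finally, if $Q$ is given by (6.1.3) and $r \le \frac{1}{3\cdot 2^n}$, then it is possible to find an $\boldsymbol{\eta} \in \{0,1\}^N$ and a $C \in \mathcal{P}_n(\boldsymbol{\eta})$ for which $Q \subseteq C$. (To see this, first reduce to the case when $N = 1$.) Hence,
$$\max_{\boldsymbol{\eta}\in\{0,1\}^N}\mathbf{M}^{(\boldsymbol{\eta})}f \le \mathbf{M}f \le 6^N\max_{\boldsymbol{\eta}\in\{0,1\}^N}\mathbf{M}^{(\boldsymbol{\eta})}f.$$
After combining this with the estimate in (*), we arrive at the following version of the Hardy–Littlewood Maximal Inequality:
$$\bigl|\{x \in \mathbb{R}^N : \mathbf{M}f(x) \ge \alpha\}\bigr| \le \frac{(12)^N}{\alpha}\int_{\mathbb{R}^N}|f(y)|\,dy. \tag{6.1.6}$$
At the same time, (*) implies that
$$\max_{\boldsymbol{\eta}\in\{0,1\}^N}\bigl\|\mathbf{M}^{(\boldsymbol{\eta})}f\bigr\|_{L^p(\mathbb{R}^N;\mathbb{R})} \le \frac{p}{p-1}\|f\|_{L^p(\mathbb{R}^N;\mathbb{R})}, \quad p \in (1,\infty].$$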
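The containment claim used above (every short enough interval lies in a single cell of one of the shifted grids) can be tested directly in dimension $N = 1$. The following is a minimal sketch under the assumption that the interval in (6.1.3) is half-open, $[c, c + r)$, with $r$ its side length; the function names are illustrative and not from the text.

```python
from fractions import Fraction
import math

def grid_shift(n, eta):
    """Offset (-1)^n * eta / (3 * 2^n) of the grid P_n(eta) in dimension 1."""
    sign = -1 if n % 2 else 1
    return Fraction(sign * eta, 3) / Fraction(2) ** n

def covering_eta(c, r, n):
    """Return an eta in {0, 1} such that one cell of P_n(eta) contains [c, c + r).

    Returns None if neither grid works, which should not happen
    when 0 < r <= 1 / (3 * 2^n).
    """
    scale = Fraction(2) ** n
    for eta in (0, 1):
        shift = grid_shift(n, eta)
        k = math.floor((c - shift) * scale)        # cell of P_n(eta) containing c
        right_end = shift + Fraction(k + 1) / scale
        if c + r <= right_end:                     # whole interval fits in that cell
            return eta
    return None

# An interval that straddles the dyadic point 1 but fits in a shifted cell.
print(covering_eta(Fraction(7, 8), Fraction(1, 3), 0))
```

The reason two grids suffice is that an interval of length at most $\frac{1}{3\cdot 2^n}$ that straddles a point $k/2^n$ stays strictly inside $\bigl(\frac{k}{2^n} - \frac{1}{3\cdot 2^n}, \frac{k}{2^n} + \frac{1}{3\cdot 2^n}\bigr)$, and the shifted grid has no endpoint in that open interval.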
§ 6.1 Some Extensions
To check this, first note that it suffices to do so when $f$ vanishes outside of the ball $B(0,R)$ for some $R > 0$. Second, assuming that $f = 0$ off of $B(0,R)$, observe that (*) implies that
$$\bigl|\{x \in B(0,R) : \mathbf{M}^{(\boldsymbol{\eta})}f(x) \ge \alpha\}\bigr| \le \frac{1}{\alpha}\int_{\{\mathbf{M}^{(\boldsymbol{\eta})}f \ge \alpha\}\cap B(0,R)}|f(y)|\,dy.$$
Next, even though the result in Exercise 1.4.18 was stated for probability measures, it applies equally well to any finite measure. Thus, we now know that
$$\bigl\|\mathbf{M}^{(\boldsymbol{\eta})}f\bigr\|_{L^p(\mathbb{R}^N;\mathbb{R})} = \lim_{R\to\infty}\left(\int_{B(0,R)}\bigl(\mathbf{M}^{(\boldsymbol{\eta})}f\bigr)^p(x)\,dx\right)^{\frac{1}{p}} \le \frac{p}{p-1}\|f\|_{L^p(\mathbb{R}^N;\mathbb{R})},$$
and so we can repeat the argument just made to obtain
$$\bigl\|\mathbf{M}f\bigr\|_{L^p(\mathbb{R}^N;\mathbb{R})} \le \frac{(12)^N p}{p-1}\|f\|_{L^p(\mathbb{R}^N;\mathbb{R})} \quad\text{for } p \in (1,\infty]. \tag{6.1.7}$$
In this connection, notice that there is no hope of getting this sort of estimate when $p = 1$, since it is clear that
$$\lim_{|x|\to\infty}|x|^N\,\mathbf{M}f(x) > 0$$
whenever $f$ does not vanish $\lambda_{\mathbb{R}^N}$-almost everywhere.

The inequality in (6.1.6) plays the same role in classical analysis as Doob's Inequality plays in martingale theory. For example, by essentially the same argument as I used to pass from Doob's Inequality to Corollary 5.2.4, we obtain the following version of the famous Lebesgue Differentiation Theorem.

Theorem 6.1.8. For each $f \in L^1(\mathbb{R}^N;\mathbb{R})$,
$$\lim_{B\searrow\{x\}}\frac{1}{|B|}\int_B|f(y) - f(x)|\,dy = 0 \tag{6.1.9}$$
for $\lambda_{\mathbb{R}^N}$-almost every $x \in \mathbb{R}^N$, where, for each $x \in \mathbb{R}^N$, the limit is taken over balls $B$ that contain $x$ and tend to $x$ in the sense that their radii shrink to 0. In particular,
$$f(x) = \lim_{B\searrow\{x\}}\frac{1}{|B|}\int_B f(y)\,dy \quad\text{for } \lambda_{\mathbb{R}^N}\text{-almost every } x \in \mathbb{R}^N.$$
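Proof: I begin with the observation that, for each $f \in L^1(\mathbb{R}^N;\mathbb{R})$,
$$\tilde{\mathbf{M}}f(x) \equiv \sup_{B\ni x}\frac{1}{|B|}\int_B|f(y)|\,dy \le \kappa_N\,\mathbf{M}f(x), \quad x \in \mathbb{R}^N,$$
where $\kappa_N = \frac{2^N}{\Omega_N}$ with $\Omega_N = |B(0,1)|$. Second, notice that (6.1.9) for every $x \in \mathbb{R}^N$ is trivial when $f \in C_c(\mathbb{R}^N;\mathbb{R})$. Hence, all that remains is to check that if $f_n \longrightarrow f$ in $L^1(\mathbb{R}^N;\mathbb{R})$ and if (6.1.9) holds for each $f_n$, then it holds for $f$. To this end, let $\epsilon > 0$ be given and check that, because of the preceding and (6.1.6),
$$\begin{aligned}
\left|\left\{x : \limsup_{B\searrow\{x\}}\frac{1}{|B|}\int_B|f(y) - f(x)|\,dy \ge \epsilon\right\}\right|
&\le \left|\left\{x : \tilde{\mathbf{M}}(f - f_n)(x) \ge \frac{\epsilon}{3}\right\}\right|\\
&\quad + \left|\left\{x : \limsup_{B\searrow\{x\}}\frac{1}{|B|}\int_B|f_n(y) - f_n(x)|\,dy \ge \frac{\epsilon}{3}\right\}\right|\\
&\quad + \left|\left\{x : |f_n(x) - f(x)| \ge \frac{\epsilon}{3}\right\}\right|\\
&\le \frac{3}{\epsilon}\bigl(1 + (12)^N\kappa_N\bigr)\|f - f_n\|_{L^1(\mathbb{R}^N)}
\end{aligned}$$
for every $n \in \mathbb{Z}^+$. Hence, after letting $n \to \infty$, we get (6.1.9) for $f$.

Although applications like Lebesgue's Differentiation Theorem might make one think that (6.1.6) is most interesting because of what it says about averages over small cubes, its implications for large cubes are also significant. In fact, as I will show in § 6.2, it allows one to prove Birkhoff's Individual Ergodic Theorem (cf. Theorem 6.2.7), which may be viewed as a result about differentiation at infinity. The link between ergodic theory and the Hardy–Littlewood Inequality is provided by the following deterministic version of the Maximal Ergodic Lemma (cf. Lemma 6.2.1). Namely, let $\{a_{\mathbf{k}} : \mathbf{k} \in \mathbb{Z}^N\}$ be a summable subset of $[0,\infty)$, and set
$$S_n(\mathbf{k}) = \frac{1}{(2n)^N}\sum_{\mathbf{j}\in Q_n}a_{\mathbf{j}+\mathbf{k}}, \quad n \in \mathbb{N} \text{ and } \mathbf{k} \in \mathbb{Z}^N,$$
where $Q_n = \{\mathbf{j} \in \mathbb{Z}^N : -n \le j_i < n \text{ for } 1 \le i \le N\}$. By applying (6.1.6) and (6.1.7) to the function $f$ given by (cf. (6.1.4)) $f(x) = a_{\mathbf{k}}$ when $x \in C_0(\mathbf{k})$, we see that
$$\operatorname{card}\Bigl\{\mathbf{k} \in \mathbb{Z}^N : \sup_{n\in\mathbb{Z}^+}S_n(\mathbf{k}) \ge \alpha\Bigr\} \le \frac{(12)^N}{\alpha}\sum_{\mathbf{k}\in\mathbb{Z}^N}a_{\mathbf{k}}, \quad \alpha \in (0,\infty), \tag{6.1.10}$$
and
$$\left(\sum_{\mathbf{k}\in\mathbb{Z}^N}\sup_{n\in\mathbb{Z}^+}|S_n(\mathbf{k})|^p\right)^{\frac{1}{p}} \le \frac{(12)^N p}{p-1}\left(\sum_{\mathbf{k}\in\mathbb{Z}^N}|a_{\mathbf{k}}|^p\right)^{\frac{1}{p}} \quad\text{for } p \in (1,\infty]. \tag{6.1.11}$$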
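A finite computation makes (6.1.10) tangible in dimension $N = 1$. The sketch below is an illustration under assumptions that are not in the text: the sequence `a` is finitely supported on $\{0, \dots, \text{len}(a)-1\}$ and extended by 0, which lets the supremum over $n$ and the range of $k$ be truncated exactly.

```python
def window_average(a, k, n):
    """S_n(k) = (1 / (2n)) * sum of a[j + k] over -n <= j < n, with a = 0 off its support."""
    total = 0
    for j in range(-n, n):
        idx = j + k
        if 0 <= idx < len(a):
            total += a[idx]
    return total / (2 * n)

def count_large_sites(a, alpha):
    """Number of integers k with sup_n S_n(k) >= alpha (dimension N = 1)."""
    total = sum(a)
    # S_n(k) <= total / (2n), so only n <= total / (2 * alpha) can matter,
    # and then k must lie within n of the support {0, ..., len(a) - 1}.
    n_max = int(total / (2 * alpha))
    count = 0
    for k in range(-n_max, len(a) + n_max + 1):
        if any(window_average(a, k, n) >= alpha for n in range(1, n_max + 1)):
            count += 1
    return count

# Hypothetical non-negative sequence.
a = [5, 0, 0, 1, 3, 0, 2]
alpha = 1.0
print(count_large_sites(a, alpha), "<=", 12 * sum(a) / alpha)
```

The bound is far from tight for such a small example; the point is only that the left-hand side of (6.1.10) is a finite count that can be enumerated.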
The inequality in (6.1.10) is called Hardy's Inequality. Actually, Hardy worked in one dimension and was drawn to this line of research by his passion for the game of cricket. What Hardy wanted to find is the optimal order in which to arrange batters to maximize the average score per inning. Thus, he worked with a non-negative sequence $\{a_k : k \ge 0\}$ in which $a_k$ represented the expected number of runs scored by player $k$, and what he showed is that, for each $\alpha \in (0,\infty)$,
$$\operatorname{card}\Bigl\{k \in \mathbb{N} : \sup_{n\in\mathbb{Z}^+}S_n(k) \ge \alpha\Bigr\}$$
is maximized when $\{a_k : k \ge 0\}$ is non-increasing, from which it is an easy application of Markov's Inequality to prove that
$$\operatorname{card}\Bigl\{k \in \mathbb{N} : \sup_{n\in\mathbb{Z}^+}S_n(k) \ge \alpha\Bigr\} \le \frac{1}{\alpha}\sum_{0}^{\infty}a_k, \quad \alpha \in (0,\infty).$$
Although this sharpened result can also be obtained as a corollary of the Sunrise Lemma,¹ Hardy's approach remains the most appealing.

¹ See Lemma 3.4.5 in my A Concise Introduction to the Theory of Integration, Third Edition, Birkhäuser (1998).

§ 6.1.2. Banach Space–Valued Martingales. I turn next to martingales with values in a separable Banach space. Actually, everything except the easiest aspects of this topic becomes extremely complicated and technical very quickly, and, for this reason, I will restrict my attention to those results that do not involve any deep properties of the geometry of Banach spaces. In fact, the only general theory with which I will deal is contained in the following.

Theorem 6.1.12. Let $E$ be a separable Banach space and $(X_n, \mathcal{F}_n, \mu)$ an $E$-valued martingale. Then $(\|X_n\|_E, \mathcal{F}_n, \mu)$ is a non-negative submartingale and therefore, for each $N \in \mathbb{Z}^+$ and all $\alpha \in (0,\infty)$,
$$\mu\Bigl(\sup_{0\le n\le N}\|X_n\|_E \ge \alpha\Bigr) \le \frac{1}{\alpha}\,\mathbb{E}^{\mu}\Bigl[\|X_N\|_E,\ \sup_{0\le n\le N}\|X_n\|_E \ge \alpha\Bigr]. \tag{6.1.13}$$
In particular, for each $p \in (1,\infty]$,
$$\Bigl\|\sup_{n\in\mathbb{N}}\|X_n\|_E\Bigr\|_{L^p(\mu;E)} \le \frac{p}{p-1}\sup_{n\in\mathbb{N}}\|X_n\|_{L^p(\mu;E)}. \tag{6.1.14}$$
Finally, if $X_n = \mathbb{E}^{\mu}[X \mid \mathcal{F}_n]$, where $X \in L^p(\mu;E)$ for some $p \in [1,\infty)$, then
$$X_n \longrightarrow \mathbb{E}^{\mu}\Bigl[X\,\Big|\,\bigvee_{0}^{\infty}\mathcal{F}_n\Bigr] \quad\text{both (a.e., } \mu\text{) and in } L^p(\mu;E).$$
Proof: The fact that $(\|X_n\|_E, \mathcal{F}_n, \mu)$ is a submartingale is an easy application of the inequality in (5.1.14); and, given this fact, the inequalities in (6.1.13) and (6.1.14) follow from the corresponding inequalities in Theorem 6.1.1. While proving the convergence statement, I may and will assume that $\mathcal{F} = \bigvee_0^\infty\mathcal{F}_n$. Now let $X \in L^p(\mu;E)$ be given, and set $X_n = \mathbb{E}^{\mu}[X \mid \mathcal{F}_n]$, $n \in \mathbb{N}$. Because of (6.1.13) and (6.1.14), we know (cf. the proofs of Corollary 5.2.4 and Theorem 6.1.8) that the set of $X$ for which $X_n \longrightarrow X$ (a.e., $\mu$) is a closed subset of $L^p(\mu;E)$. Moreover, if $X$ is $\mu$-simple, then the $\mu$-almost everywhere convergence of $X_n$ to $X$ follows easily from the $\mathbb{R}$-valued result. Hence, we now know that $X_n \longrightarrow X$ (a.e., $\mu$) for each $X \in L^1(\mu;E)$. In addition, because of (6.1.14), when $p \in (1,\infty)$, the convergence in $L^p(\mu;E)$ follows by Lebesgue's Dominated Convergence Theorem. Finally, to prove the convergence in $L^1(\mu;E)$ when $X \in L^1(\mu;E)$, note that, by Fatou's Lemma,
$$\|X\|_{L^1(\mu;E)} \le \liminf_{n\to\infty}\|X_n\|_{L^1(\mu;E)},$$
whereas (5.1.14) guarantees that
$$\|X\|_{L^1(\mu;E)} \ge \limsup_{n\to\infty}\|X_n\|_{L^1(\mu;E)}.$$
Hence, because $\bigl|\,\|X_n\|_E - \|X\|_E - \|X_n - X\|_E\,\bigr| \le 2\|X\|_E$, the convergence in $L^1(\mu;E)$ is again an application of Lebesgue's Dominated Convergence Theorem.

Going beyond the convergence result in Theorem 6.1.12 to get an analog of Doob's Martingale Convergence Theorem is hard. For one thing, a naïve analog is not even true for general separable Banach spaces, and a rather deep analysis of the geometry of Banach spaces is required in order to determine exactly when it is true. (See Exercise 6.1.18 for a case in which it is.)

Exercises for § 6.1

Exercise 6.1.15. In this exercise we will develop Jensen's Inequality in the Banach space setting. Thus, $(\Omega, \mathcal{F}, \mathbb{P})$ will be a probability space, $C$ will be a closed, convex subset of the separable Banach space $E$, and $X$ will be a $C$-valued element of $L^1(\mathbb{P};E)$.

(i) Show that there exists a sequence $\{X_n : n \ge 1\}$ of $C$-valued, simple functions that tend to $X$ both $\mathbb{P}$-almost surely and in $L^1(\mathbb{P};E)$.

(ii) Show that $\mathbb{E}^{\mathbb{P}}[X] \in C$ and that $\mathbb{E}^{\mathbb{P}}\bigl[g(X)\bigr] \le g\bigl(\mathbb{E}^{\mathbb{P}}[X]\bigr)$ for every continuous, concave $g : C \longrightarrow [0,\infty)$.
(iii) Given a sub-σ-algebra $\Sigma$ of $\mathcal{F}$, follow the argument in Corollary 5.2.8 to show that there exists a sequence $\{\mathcal{P}_n\}_0^\infty$ of finite, $\Sigma$-measurable partitions with the property that
$$\sum_{A\in\mathcal{P}_n}\frac{\mathbb{E}^{\mathbb{P}}[X, A]}{\mathbb{P}(A)}\,\mathbf{1}_A \longrightarrow \mathbb{E}^{\mathbb{P}}[X \mid \Sigma] \quad\text{both } \mathbb{P}\text{-almost surely and in } L^1(\mathbb{P};E).$$
In particular, conclude that there is a representative $X_\Sigma$ of $\mathbb{E}^{\mathbb{P}}[X \mid \Sigma]$ that is $C$-valued and satisfies $\mathbb{E}^{\mathbb{P}}\bigl[g(X) \mid \Sigma\bigr] \le g\bigl(X_\Sigma\bigr)$ (a.s., $\mathbb{P}$) for each continuous, concave $g : C \longrightarrow [0,\infty)$.

Exercise 6.1.16. Again let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $E$ be a separable Banach space. Further, suppose that $\{\mathcal{F}_n : n \ge 0\}$ is a non-increasing sequence of sub-σ-algebras of $\mathcal{F}$, and set $\mathcal{F}_\infty = \bigcap_0^\infty\mathcal{F}_n$. Finally, let $X \in L^1(\mathbb{P};E)$.

(i) Show that $\mathbb{E}^{\mathbb{P}}\bigl[X \mid \mathcal{F}_n\bigr] \longrightarrow \mathbb{E}^{\mathbb{P}}[X \mid \mathcal{F}_\infty]$ both $\mathbb{P}$-almost surely and in $L^p(\mathbb{P};E)$ for any $p \in [1,\infty)$ with $X \in L^p(\mathbb{P};E)$.

Hint: Use (6.1.13) and the approximation result in Theorem 5.1.10 to reduce to the case when $X$ is simple. When $X$ is simple, get the result as an application of the convergence result for $\mathbb{R}$-valued, reversed martingales in Theorem 5.2.21.

(ii) Using part (i) and following the line of reasoning suggested at the end of § 5.2.4, give a proof of The Strong Law of Large Numbers for Banach space–valued random variables.² (See Exercises 6.2.18 and 9.1.18 for entirely different approaches.)

² This proof, which seems to have been the first, of the Strong Law for Banach spaces was given by E. Mourier in "Éléments aléatoires dans un espace de Banach," Ann. Inst. Poincaré 13, pp. 166–244 (1953).

Exercise 6.1.17. As we saw in the proof of Theorem 6.1.8, the Hardy–Littlewood maximal function can be used to dominate other quantities of interest. As a further indication of its importance, I will use it in this exercise to prove the analog of Theorem 6.1.8 for a large class of approximate identities. That is, let $\psi \in L^1(\mathbb{R}^N;\mathbb{R})$ with $\int_{\mathbb{R}^N}\psi(x)\,dx = 1$ be given, and set
$$\psi_t(x) = t^{-N}\psi\Bigl(\frac{x}{t}\Bigr), \quad t \in (0,\infty) \text{ and } x \in \mathbb{R}^N.$$
Then $\{\psi_t : t > 0\}$ forms an approximate identity in the sense that, as tempered distributions, $\psi_t \longrightarrow \delta_0$ as $t \searrow 0$. In fact, because
$$\|\psi_t \star f\|_{L^p(\mathbb{R}^N;\mathbb{R})} \le \|\psi\|_{L^1(\mathbb{R}^N;\mathbb{R})}\,\|f\|_{L^p(\mathbb{R}^N;\mathbb{R})}, \quad t \in (0,\infty) \text{ and } p \in [1,\infty],$$
and
$$\psi_t \star f(x) = \int_{\mathbb{R}^N}\psi(y)\,f(x - ty)\,dy,$$
it is easy to see that, for each $p \in [1,\infty)$,
$$\lim_{t\searrow 0}\bigl\|\psi_t \star f - f\bigr\|_{L^p(\mathbb{R}^N;\mathbb{R})} = 0$$
first for $f \in C_c(\mathbb{R}^N;\mathbb{R})$ and then for all $f \in L^p(\mathbb{R}^N;\mathbb{R})$. The purpose of this exercise is to sharpen the preceding under the assumption that $\psi(x) = \alpha(|x|)$, $x \in \mathbb{R}^N\setminus\{0\}$, for some $\alpha \in C^1\bigl((0,\infty);\mathbb{R}\bigr)$ with
$$A \equiv \int_{(0,\infty)}r^N|\alpha'(r)|\,dr < \infty.$$
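Notice that when $\alpha$ is non-negative and non-increasing, integration by parts shows that $A = \frac{N}{\omega_{N-1}}$, where $\omega_{N-1} = N\Omega_N$ denotes the surface area of the unit sphere in $\mathbb{R}^N$.

(i) Let $f \in C_c(\mathbb{R}^N;\mathbb{R})$ be given, and set
$$\tilde{f}(r,x) = \frac{1}{|B(x,r)|}\int_{B(x,r)}f(y)\,dy \quad\text{for } r \in (0,\infty) \text{ and } x \in \mathbb{R}^N.$$
Using integration by parts and the given hypotheses, show that
$$\psi_t \star f(x) = -\frac{\omega_{N-1}}{N}\int_{(0,\infty)}r^N\alpha'(r)\,\tilde{f}(tr,x)\,dr,$$
and conclude that
$$\bigl|\psi_t \star f(x)\bigr| \le \frac{\omega_{N-1}A}{N}\,\tilde{\mathbf{M}}f(x),$$
where $\tilde{\mathbf{M}}f$ is the quantity introduced at the beginning of the proof of Theorem 6.1.8. In particular, conclude that there is a constant $K_N \in (0,\infty)$, depending only on $N \in \mathbb{Z}^+$, such that
$$\mathbf{M}_\psi f(x) \equiv \sup_{t\in(0,\infty)}\bigl|\psi_t \star f(x)\bigr| \le K_N A\,\mathbf{M}f(x), \quad x \in \mathbb{R}^N.$$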
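(ii) Starting from the conclusion in (i), show that
$$\bigl|\{x : \mathbf{M}_\psi f(x) \ge R\}\bigr| \le \frac{(12)^N K_N A\,\|f\|_{L^1(\mathbb{R}^N)}}{R}, \quad f \in L^1(\mathbb{R}^N;\mathbb{R}),$$
and that for $p \in (1,\infty]$,
$$\bigl\|\mathbf{M}_\psi f\bigr\|_{L^p(\mathbb{R}^N;\mathbb{R})} \le \frac{(12)^N K_N A\,p}{p-1}\,\|f\|_{L^p(\mathbb{R}^N;\mathbb{R})}, \quad f \in L^p(\mathbb{R}^N;\mathbb{R}).$$
Finally, proceeding as in the proof of Theorem 6.1.8, use the first of these to prove that, for $f \in L^1(\mathbb{R}^N;\mathbb{R})$ and Lebesgue almost every $x \in \mathbb{R}^N$,
$$\limsup_{t\searrow 0}\bigl|\psi_t \star f(x) - f(x)\bigr| \le \limsup_{t\searrow 0}\int_{\mathbb{R}^N}|\psi_t(y)|\,\bigl|f(x - y) - f(x)\bigr|\,dy = 0.$$
Two of the most familiar examples to which the preceding applies are the Gauss kernel $g_t(x) = (2\pi t)^{-\frac{N}{2}}\exp\Bigl(-\frac{|x|^2}{2t}\Bigr)$ and the Poisson kernel (cf. (3.3.19)) $\Pi^{\mathbb{R}^N}_t$. In both these cases, $A = \frac{N}{\omega_{N-1}}$, where $\omega_{N-1}$ is the surface area of the unit sphere in $\mathbb{R}^N$.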
Exercise 6.1.18. Let $E$ be a separable Hilbert space and $(X_n, \mathcal{F}_n, \mathbb{P})$ an $E$-valued martingale on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$ satisfying the condition
$$\sup_{n\in\mathbb{Z}^+}\mathbb{E}^{\mathbb{P}}\bigl[\|X_n\|_E^2\bigr] < \infty.$$
Proceeding as in (i) of Exercise 5.2.36, first prove that there is a $\bigvee_1^\infty\mathcal{F}_n$-measurable $X \in L^2(\mathbb{P};E)$ to which $\{X_n : n \ge 1\}$ converges in $L^2(\mathbb{P};E)$, next check that $X_n = \mathbb{E}^{\mathbb{P}}\bigl[X \mid \mathcal{F}_n\bigr]$ (a.s., $\mathbb{P}$) for each $n \in \mathbb{Z}^+$, and finally apply the last part of Theorem 6.1.12 to see that $X_n \longrightarrow X$ $\mathbb{P}$-almost surely.

Exercise 6.1.19. This exercise deals with a variation, proposed by Jan Mycielski, on the sort of search algorithm discussed in § 5.2.5. Let $G$ be a non-empty, bounded, open subset of $\mathbb{R}^N$ with the property that $\lambda_{\mathbb{R}^N}\bigl(B(x,r)\cap G\bigr) \ge \alpha\,\Omega_N r^N$ for some $\alpha > 0$ and all $x \in G$ and $0 < r \le \operatorname{diam}(G)$, and define $\mu$ on $(G, \mathcal{B}_G)$ by $\mu(\Gamma) = \frac{\lambda_{\mathbb{R}^N}(\Gamma\cap G)}{\lambda_{\mathbb{R}^N}(G)}$. Next, let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space on which there exist sequences $\{X_n : n \ge 1\}$ and $\{Z_n : n \ge 1\}$ of $G$-valued random variables with the properties that the $X_n$'s are mutually independent and have distribution $\mu$, $Z_n$ is independent of $\{X_1, \dots, X_n\}$ and has distribution $\nu_n \ll \mu$ for each $n \ge 1$, and $K_r \equiv \sup_{n\ge 1}\bigl\|\frac{d\nu_n}{d\mu}\bigr\|_{L^r(\mu;\mathbb{R})} < \infty$ for some $r \in (1,\infty)$. Without loss in generality, assume that $n \ne n' \implies X_n(\omega) \ne X_{n'}(\omega)$ for all $\omega \in \Omega$. For each $n \ge 1$, let $Y_n(\omega)$ be the last element of $\{X_1(\omega), \dots, X_n(\omega)\}$ which is closest to $Z_n(\omega)$. That is, if $\Sigma_n$ is the permutation group on $\{1, \dots, n\}$ and, for $\pi \in \Sigma_n$,
$$A_n(\pi) = \bigl\{\omega : |X_{\pi(m)}(\omega) - Z_n(\omega)| < |X_{\pi(m-1)}(\omega) - Z_n(\omega)| \text{ for } 2 \le m \le n\bigr\},$$
then $Y_n = X_{\pi(n)}$ on $A_n(\pi)$. Show that for all Borel measurable $f : G \longrightarrow \mathbb{R}$, $|f(Y_n) - f(Z_n)| \longrightarrow 0$ in $\mathbb{P}$-probability. Here are some steps that you might want to follow.
(i) Given f ∈ L1 (µ; R), show that ) ( Z 1 |f | dµ ≤ α−1 Mf (x) MG f (x) ≡ sup |B(x, r) ∩ G| B(x,r)∩G r>0
and therefore that there is a C < ∞ such that kMG f kLp (µ;R) ≤ for all p ∈ (1, ∞].
Cp p p−1 kf kL (µ;R)
(ii) Given n ≥ 1 and z ∈ G, set An (z) = ω : |Xm (ω) − z| < |Xm−1 (ω) − z| : for 2 ≤ m ≤ n , and show that E f (Yn ) = n! P
Z
EP f (Xn ), An (z) νn (dz).
Next, for n ≥ 2, set rn (ω) = |Xn−1 (ω) − z|, and show that "Z # P P E f (Xn ), An (z) = E f dµ, An−1 (z) ≤ MG f (z)P An (z) , B(z,rn )
and conclude from this that E f (Yn ) ≤ P
Z MG f dνn .
(iii) Given the conclusion drawn at the end of (ii), proceed as in the derivation of Theorem 5.2.34 from Lemma 5.2.31 to get the desired result.

§ 6.2 Elements of Ergodic Theory

Among the two or three most important general results about dynamical systems is G. D. Birkhoff’s Individual Ergodic Theorem. In this section, I will present a generalization, due to N. Wiener, of Birkhoff’s basic theorem.

The setting in which I will prove the Ergodic Theorem will be the following. $(\Omega, \mathcal F, \mu)$ will be a σ-finite measure space on which there exists a semigroup $\{\Sigma^k : k \in \mathbb N^N\}$ of measurable, µ-measure preserving transformations. That is, for each $k \in \mathbb N^N$, $\Sigma^k$ is an $\mathcal F$-measurable map from Ω into itself, $\Sigma^0$ is the identity map, $\Sigma^{k+\ell} = \Sigma^k \circ \Sigma^\ell$ for all $k, \ell \in \mathbb N^N$, and $\mu(\Gamma) = \mu\bigl((\Sigma^k)^{-1}(\Gamma)\bigr)$ for all $k \in \mathbb N^N$ and $\Gamma \in \mathcal F$. Further, E will be a separable Banach space with norm $\|\cdot\|_E$, and, given a function $F : \Omega \longrightarrow E$, I will be considering the averages
\[ A_n F(\omega) \equiv \frac{1}{n^N} \sum_{k \in Q_n^+} F \circ \Sigma^k(\omega), \quad n \in \mathbb Z^+, \]
where $Q_n^+$ is the cube $\{k \in \mathbb N^N : \|k\|_\infty < n\}$ and $\|k\|_\infty \equiv \max_{1\le j\le N} k_j$. My goal (cf. Theorem 6.2.7) is to show that, for each $p \in [1,\infty)$ and $F \in L^p(\mu;E)$, $\{A_n F : n \ge 1\}$ converges µ-almost everywhere. In fact, when either µ is finite or $p \in (1,\infty)$, I will show that the convergence is also in $L^p(\mu;E)$.
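To make the cube averages concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the state space is [0, 1) with Lebesgue measure, N = 2, and the semigroup is generated by two commuting rotations whose angles `ALPHAS` are arbitrary irrational choices; the names `cube_average`, `shift`, and `F` are hypothetical and do not come from the text.

```python
import math
from itertools import product

def cube_average(F, shift, omega, n, N):
    """Return n**(-N) times the sum of F(shift(k, omega)) over the cube
    Q_n^+ = {k in N^N : max_j k_j < n}."""
    total = 0.0
    for k in product(range(n), repeat=N):
        total += F(shift(k, omega))
    return total / n ** N

# Assumed example: two commuting rotations of [0, 1), both by irrational angles.
ALPHAS = (math.sqrt(2) - 1, math.sqrt(3) - 1)

def shift(k, omega):
    # Sigma^k(omega) = omega + k_1*alpha_1 + k_2*alpha_2 (mod 1), so that
    # Sigma^(k+l) = Sigma^k o Sigma^l and Lebesgue measure is preserved.
    return (omega + sum(kj * aj for kj, aj in zip(k, ALPHAS))) % 1.0

def F(x):
    # cos^2(2*pi*x) has integral 1/2 over [0, 1).
    return math.cos(2 * math.pi * x) ** 2

avg = cube_average(F, shift, 0.3, 50, 2)
print(avg)  # expected to be close to 1/2 for this choice of F
```

For this particular F the average can also be evaluated in closed form with geometric sums, which is why a modest cube size already lands near the space average.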
§ 6.2.1. The Maximal Ergodic Lemma. Because he was thinking in terms of dynamical systems and therefore did not take full advantage of measure theory, Birkhoff’s own proof of his theorem is rather cumbersome. Later, F. Riesz discovered a proof which has become the model for all later proofs. Specifically, he introduced what is now called the Maximal Ergodic Inequality, which is an inequality that plays the same role here that Doob’s Inequality played in the derivation of Corollary 5.2.4. In order to cover Wiener’s extension of Birkhoff’s theorem, I will derive a multiparameter version of the Maximal Ergodic Inequality, which, as the proof shows, is really just a clever application of Hardy’s Inequality.¹

Lemma 6.2.1 (Maximal Ergodic Lemma). For each $n \in \mathbb Z^+$ and $p \in [1,\infty]$, $A_n$ is a contraction on $L^p(\mu;E)$. Moreover, for each $F \in L^p(\mu;E)$,
\[ (6.2.2)\qquad \mu\Bigl(\sup_{n\ge1} \|A_n F\|_E \ge \lambda\Bigr) \le \frac{(24)^N}{\lambda}\,\|F\|_{L^1(\mu;E)}, \quad \lambda \in (0,\infty), \]
or
\[ (6.2.3)\qquad \Bigl\|\sup_{n\ge1} \|A_n F\|_E\Bigr\|_{L^p(\mu)} \le \frac{(24)^N p}{p-1}\,\|F\|_{L^p(\mu;E)}, \]
depending on whether $p = 1$ or $p \in (1,\infty)$.

Proof: First observe that, because $\|A_n F\|_E \le A_n\|F\|_E$, it suffices to prove all of these assertions in the case when $E = \mathbb R$ and F is non-negative. Thus, I will restrict myself to this case. Since $F \circ \Sigma^k$ has the same distribution as F itself, the first assertion is trivial. To prove (6.2.2) and (6.2.3), let $n \in \mathbb Z^+$ be given, apply (6.1.10) and (6.1.11) to
\[ a_k(\omega) \equiv \begin{cases} F \circ \Sigma^k(\omega) & \text{if } k \in Q_{2n}^+ \\ 0 & \text{if } k \notin Q_{2n}^+, \end{cases} \]
and conclude that
\[ C_n(\omega) \equiv \operatorname{card}\Bigl\{k \in Q_n^+ : \max_{1\le m\le n} A_m F \circ \Sigma^k(\omega) \ge \lambda\Bigr\} \le \frac{(12)^N}{\lambda} \sum_{k \in Q_{2n}^+} F \circ \Sigma^k(\omega) \]
¹ The idea of using Hardy’s Inequality was suggested to P. Hartman by J. von Neumann and appears for the first time in Hartman’s “On the ergodic theorem,” Am. J. Math. 69, pp. 193–199 (1947).
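and
\[ \sum_{k \in Q_n^+} \Bigl(\max_{1\le m\le n} A_m F \circ \Sigma^k(\omega)\Bigr)^p \le \Bigl(\frac{(12)^N p}{p-1}\Bigr)^p \sum_{k \in Q_{2n}^+} \bigl(F \circ \Sigma^k(\omega)\bigr)^p. \]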
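Hence, by Tonelli’s Theorem,
\[ \sum_{k \in Q_n^+} \mu\Bigl(\max_{1\le m\le n} A_m F \circ \Sigma^k \ge \lambda\Bigr) = \int C_n(\omega)\,\mu(d\omega) \le \frac{(12)^N}{\lambda} \sum_{k \in Q_{2n}^+} \int F \circ \Sigma^k\,d\mu \]
and, similarly,
\[ \sum_{k \in Q_n^+} \int \Bigl(\max_{1\le m\le n} A_m F \circ \Sigma^k\Bigr)^p d\mu \le \Bigl(\frac{(12)^N p}{p-1}\Bigr)^p \sum_{k \in Q_{2n}^+} \int \bigl(F \circ \Sigma^k\bigr)^p\,d\mu. \]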
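Finally, since the distributions of $\max_{1\le m\le n} A_m F \circ \Sigma^k$ and $F \circ \Sigma^k$ do not depend on $k \in \mathbb N^N$, the preceding lead immediately to
\[ \mu\Bigl(\max_{1\le m\le n} A_m F \ge \lambda\Bigr) \le \frac{(24)^N}{\lambda}\,\|F\|_{L^1(\mu)} \]
and
\[ \Bigl\|\max_{1\le m\le n} A_m F\Bigr\|_{L^p(\mu)} \le \frac{2^{N/p}(12)^N p}{p-1}\,\|F\|_{L^p(\mu)} \]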
for all $n \in \mathbb Z^+$. Thus, (6.2.2) and (6.2.3) follow after one lets $n \to \infty$.

Given (6.2.2) and (6.2.3), I adopt again the strategy used in the proof of Corollary 5.2.4. That is, I must begin by finding a dense subset of each $L^p$-space on which the desired convergence results can be checked by hand, and for this purpose I will have to introduce the notion of invariance. A set $\Gamma \in \mathcal F$ is said to be invariant, and I write $\Gamma \in \mathcal I$, if $\Gamma = (\Sigma^k)^{-1}(\Gamma)$ for every $k \in \mathbb N^N$. As is easily checked, $\mathcal I$ is a sub-σ-algebra of $\mathcal F$. In addition, it is clear that $\Gamma \in \mathcal F$ is invariant if $\Gamma = (\Sigma^{e_j})^{-1}(\Gamma)$ for each $1 \le j \le N$, where $\{e_i : 1 \le i \le N\}$ is the standard orthonormal basis in $\mathbb R^N$. Finally, if $\overline{\mathcal I}$ is the µ-completion of $\mathcal I$ relative to $\mathcal F$ in the sense that $\Gamma \in \overline{\mathcal I}$ if and only if $\Gamma \in \mathcal F$ and there is $\tilde\Gamma \in \mathcal I$ such that $\mu(\Gamma \Delta \tilde\Gamma) = 0$ ($A \Delta B \equiv (A \setminus B) \cup (B \setminus A)$ is the symmetric difference between the sets A and B), then an $\mathcal F$-measurable $F : \Omega \longrightarrow E$ is $\overline{\mathcal I}$-measurable if and only if $F = F \circ \Sigma^k$ (a.e., µ) for each $k \in \mathbb N^N$. Indeed, one
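need only check this equivalence for indicator functions of sets. But if $\Gamma \in \mathcal F$ and $\mu(\Gamma \Delta \tilde\Gamma) = 0$ for some $\tilde\Gamma \in \mathcal I$, then
\[ \mu\bigl(\Gamma \Delta (\Sigma^k)^{-1}(\Gamma)\bigr) \le \mu\bigl((\Sigma^k)^{-1}(\Gamma \Delta \tilde\Gamma)\bigr) + \mu(\Gamma \Delta \tilde\Gamma) = 0, \]
and so Γ ∈ I. Conversely, if Γ ∈ I, set
\[ \tilde\Gamma = \bigcup_{k \in \mathbb N^N} (\Sigma^k)^{-1}(\Gamma), \]
and check that $\tilde\Gamma \in \mathcal I$ and $\mu(\Gamma \Delta \tilde\Gamma) = 0$.

Lemma 6.2.4. Let $\mathbf I(E)$ be the subspace of $\mathcal I$-measurable elements of $L^2(\mu;E)$. Then, $\mathbf I(E)$ is a closed linear subspace of $L^2(\mu;E)$. Moreover, if $\Pi_{\mathbf I(\mathbb R)}$ denotes orthogonal projection from $L^2(\mu;\mathbb R)$ onto $\mathbf I(\mathbb R)$, then there exists a unique linear contraction $\Pi_{\mathbf I(E)} : L^2(\mu;E) \longrightarrow \mathbf I(E)$ with the property that $\Pi_{\mathbf I(E)}(af) = a\,\Pi_{\mathbf I(\mathbb R)} f$ for $a \in E$ and $f \in L^2(\mu;\mathbb R)$. Finally, for each $F \in L^2(\mu;E)$,
\[ (6.2.5)\qquad A_n F \longrightarrow \Pi_{\mathbf I(E)} F \quad \text{(a.e., } \mu\text{) and in } L^2(\mu;E). \]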
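Proof: I begin with the case when $E = \mathbb R$. The first step is to identify the orthogonal complement $\mathbf I(\mathbb R)^\perp$ of $\mathbf I(\mathbb R)$. To this end, let $\mathcal N$ denote the subspace of $L^2(\mu;\mathbb R)$ consisting of elements having the form $g - g \circ \Sigma^{e_j}$ for some $g \in L^2(\mu;\mathbb R) \cap L^\infty(\mu;\mathbb R)$ and $1 \le j \le N$. Given $f \in \mathbf I(\mathbb R)$, observe that
\[ \bigl(f,\, g - g \circ \Sigma^{e_j}\bigr)_{L^2(\mu;\mathbb R)} = \bigl(f, g\bigr)_{L^2(\mu;\mathbb R)} - \bigl(f \circ \Sigma^{e_j},\, g \circ \Sigma^{e_j}\bigr)_{L^2(\mu;\mathbb R)} = 0. \]
Hence, $\mathcal N \subseteq \mathbf I(\mathbb R)^\perp$. On the other hand, if $f \in L^2(\mu;\mathbb R)$ and $f \perp \mathcal N$, then it is clear that $f \perp f - f \circ \Sigma^{e_j}$ for each $1 \le j \le N$ and therefore that
\[ \begin{aligned} \bigl\|f - f \circ \Sigma^{e_j}\bigr\|^2_{L^2(\mu;\mathbb R)} &= \|f\|^2_{L^2(\mu;\mathbb R)} - 2\bigl(f,\, f \circ \Sigma^{e_j}\bigr)_{L^2(\mu;\mathbb R)} + \bigl\|f \circ \Sigma^{e_j}\bigr\|^2_{L^2(\mu;\mathbb R)} \\ &= 2\Bigl(\|f\|^2_{L^2(\mu;\mathbb R)} - \bigl(f,\, f \circ \Sigma^{e_j}\bigr)_{L^2(\mu;\mathbb R)}\Bigr) = 2\bigl(f,\, f - f \circ \Sigma^{e_j}\bigr)_{L^2(\mu;\mathbb R)} = 0. \end{aligned} \]
Thus, for each $1 \le j \le N$, $f = f \circ \Sigma^{e_j}$ µ-almost everywhere; and, by induction on $\|k\|_\infty$, one concludes that $f = f \circ \Sigma^k$ µ-almost everywhere for all $k \in \mathbb N^N$. In other words, we have now shown that $\mathbf I(\mathbb R) = \mathcal N^\perp$ or, equivalently, that $\overline{\mathcal N} = \mathbf I(\mathbb R)^\perp$.

Continuing with $E = \mathbb R$, next note that if $f \in \mathbf I(\mathbb R)$, then $A_n f = f$ (a.e., µ) for each $n \in \mathbb Z^+$. Hence, (6.2.5) is completely trivial in this case. On the other hand, if $g \in L^2(\mu;\mathbb R) \cap L^\infty(\mu;\mathbb R)$ and $f = g - g \circ \Sigma^{e_j}$, then
\[ n^N A_n f = \sum_{\{k \in Q_n^+ :\, k_j = 0\}} g \circ \Sigma^k \;-\; \sum_{\{k \in Q_n^+ :\, k_j = n-1\}} g \circ \Sigma^{k+e_j}, \]
and so, with $p \in \{2, \infty\}$,
\[ \|A_n f\|_{L^p(\mu;\mathbb R)} \le \frac{2\|g\|_{L^p(\mu;\mathbb R)}}{n} \longrightarrow 0 \quad \text{as } n \to \infty. \]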
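Hence, in this case also, (6.2.5) is easy. Finally, to complete the proof for $E = \mathbb R$, simply note that, by (6.2.3) with $p = 2$ and $E = \mathbb R$, the set of $f \in L^2(\mu;\mathbb R)$ for which (6.2.5) holds is a closed linear subspace of $L^2(\mu;\mathbb R)$ and that we have already verified (6.2.5) for $f \in \mathbf I(\mathbb R)$ and f from a dense subspace of $\mathbf I(\mathbb R)^\perp$.

Turning to general E’s, first note that $\Pi_{\mathbf I(E)} F$ is well defined for µ-simple F’s. Indeed, if $F = \sum_1^\ell a_i \mathbf 1_{\Gamma_i}$ for some $\{a_i : 1 \le i \le \ell\} \subseteq E$ and $\{\Gamma_i : 1 \le i \le \ell\}$ of mutually disjoint elements of $\mathcal F$ with finite µ-measure, then
\[ \Pi_{\mathbf I(E)} F = \sum_1^\ell a_i\, \Pi_{\mathbf I(\mathbb R)} \mathbf 1_{\Gamma_i} \]
and so
\[ \begin{aligned} \bigl\|\Pi_{\mathbf I(E)} F\bigr\|^2_{L^2(\mu;E)} &\le \int \Bigl(\sum_1^\ell \|a_i\|_E\, \Pi_{\mathbf I(\mathbb R)} \mathbf 1_{\Gamma_i}\Bigr)^2 d\mu = \Bigl\|\Pi_{\mathbf I(\mathbb R)} \Bigl(\sum_1^\ell \|a_i\|_E \mathbf 1_{\Gamma_i}\Bigr)\Bigr\|^2_{L^2(\mu;\mathbb R)} \\ &\le \sum_1^\ell \|a_i\|_E^2\, \mu(\Gamma_i) = \|F\|^2_{L^2(\mu;E)}. \end{aligned} \]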
Thus, since the space of µ-simple functions is dense in $L^2(\mu;E)$, it is clear that $\Pi_{\mathbf I(E)}$ not only exists but is also unique.

Finally, to check (6.2.5) for general E’s, note that (6.2.5) for E-valued, µ-simple F’s is an immediate consequence of (6.2.5) for $E = \mathbb R$. Thus, we already know (6.2.5) for a dense subspace of $L^2(\mu;E)$, and so the rest is another elementary application of (6.2.3).

§ 6.2.2. Birkhoff’s Ergodic Theorem. For any $p \in [1,\infty)$, let $\mathbf I^p(E)$ denote the subspace of $\mathcal I$-measurable elements of $L^p(\mu;E)$. Clearly $\mathbf I^p(E)$ is closed for every $p \in [1,\infty)$. Moreover, since
\[ (6.2.6)\qquad \mu(\Omega) < \infty \implies \Pi_{\mathbf I(E)} F = \mathbb E^\mu\bigl[F \,\big|\, \mathcal I\bigr], \]
when µ is finite $\Pi_{\mathbf I(E)}$ extends automatically as a linear contraction from $L^p(\mu;E)$ onto $\mathbf I^p(E)$ for each $p \in [1,\infty)$, the extension being given by the right-hand side of (6.2.6). However, when $\mu(\Omega) = \infty$, there is a problem. Namely, because $\mu \restriction \mathcal I$ will seldom be σ-finite, it will not be possible to condition µ with respect to $\mathcal I$. Be that as it may, (6.2.5) provides an extension of $\Pi_{\mathbf I(E)}$. Namely, from (6.2.5) and Fatou’s Lemma, it is clear that, for each $p \in [1,\infty)$,
\[ \bigl\|\Pi_{\mathbf I(E)} F\bigr\|_{L^p(\mu;E)} \le \|F\|_{L^p(\mu;E)}, \quad F \in L^p(\mu;E) \cap L^2(\mu;E), \]
and therefore the desired existence of the extension follows by continuity.
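Theorem 6.2.7 (Birkhoff’s Individual Ergodic Theorem). For each $p \in [1,\infty)$ and $F \in L^p(\mu;E)$,
\[ (6.2.8)\qquad A_n F \longrightarrow \Pi_{\mathbf I(E)} F \quad \text{(a.e., } \mu\text{).} \]
Moreover, if either $p \in (1,\infty)$ or $p = 1$ and $\mu(\Omega) < \infty$, then the convergence in (6.2.8) is also in $L^p(\mu;E)$. Finally, if $\mu(\Gamma) \wedge \mu(\Gamma^\complement) = 0$ for every $\Gamma \in \mathcal I$, then (6.2.8) can be replaced by
\[ \lim_{n\to\infty} A_n F = \begin{cases} \dfrac{\mathbb E^\mu[F]}{\mu(\Omega)} & \text{if } \mu(\Omega) \in (0,\infty) \\[1ex] 0 & \text{if } \mu(\Omega) = \infty \end{cases} \quad \text{(a.e., } \mu\text{),} \]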
and the convergence is in $L^p(\mu;E)$ when either $p \in (1,\infty)$ or $p = 1$ and $\mu(\Omega) < \infty$.

Proof: As I said above, the proof is now an easy application of the strategy used to prove Corollary 5.2.4. Namely, by (6.2.2), the set of $F \in L^1(\mu;E)$ for which (6.2.8) holds is closed and, by (6.2.5), it includes $L^1(\mu;E) \cap L^\infty(\mu;E)$. Hence, (6.2.8) is proved for $p = 1$. On the other hand, when $p \in (1,\infty)$, (6.2.3) applies and shows first that the set of $F \in L^p(\mu;E)$ for which (6.2.8) holds is closed in $L^p(\mu;E)$ and second that µ-almost everywhere convergence already implies convergence in $L^p(\mu;E)$. Hence, we have proved that (6.2.8) holds and that the convergence is in $L^p(\mu;E)$ when $p \in (1,\infty)$. In addition, when $\mu(\Gamma) \wedge \mu(\Gamma^\complement) = 0$ for all $\Gamma \in \mathcal I$, it is clear that the only elements of $\mathbf I^p(E)$ are µ-almost everywhere constant, which, in the case when $\mu(\Omega) < \infty$, means (cf. (6.2.6)) that $\Pi_{\mathbf I(E)} F = \frac{\mathbb E^\mu[F]}{\mu(\Omega)}$, and, when $\mu(\Omega) = \infty$, means that $\mathbf I^p(E) = \{0\}$ for all $p \in [1,\infty)$.

In view of the preceding, all that remains is to discuss the $L^1(\mu;E)$ convergence in the case when $p = 1$ and $\mu(\Omega) < \infty$. To this end, observe that, because the $A_n$’s are all contractions in $L^1(\mu;E)$, it suffices to prove $L^1(\mu;E)$ convergence for E-valued, µ-simple F’s. But $L^1(\mu;E)$ convergence for such F’s reduces to showing that $A_n f \longrightarrow \Pi_{\mathbf I(\mathbb R)} f$ in $L^1(\mu;\mathbb R)$ for non-negative $f \in L^\infty(\mu;\mathbb R)$. Finally, if $f \in L^1\bigl(\mu;[0,\infty)\bigr)$, then
\[ \|A_n f\|_{L^1(\mu)} = \|f\|_{L^1(\mu)} = \bigl\|\Pi_{\mathbf I(\mathbb R)} f\bigr\|_{L^1(\mu;\mathbb R)}, \quad n \in \mathbb Z^+, \]
where, in the last equality, I used (6.2.6); and this, together with (6.2.8), implies (cf. the final step in the proof of Theorem 6.1.12) convergence in $L^1(\mu)$.

I will say that the semigroup $\{\Sigma^k : k \in \mathbb N^N\}$ is ergodic on $(\Omega, \mathcal F, \mu)$ if, in addition to being µ-measure preserving, $\mu(\Gamma) \wedge \mu(\Gamma^\complement) = 0$ for every invariant $\Gamma \in \mathcal I$.
Classic Example. In order to get a feeling for what the Ergodic Theorem is saying, take µ to be Lebesgue measure on the interval $[0,1)$ and, for a given $\alpha \in (0,1)$, define $\Sigma_\alpha : [0,1) \longrightarrow [0,1)$ so that
\[ \Sigma_\alpha(\omega) \equiv \omega + \alpha - [\omega + \alpha] = \omega + \alpha \bmod 1. \]
If α is rational and m is the smallest element of $\mathbb Z^+$ with the property that $m\alpha \in \mathbb Z^+$, then it is clear that, for any F on $[0,1)$, $F \circ \Sigma_\alpha = F$ if and only if F has period $\frac1m$. Hence, if $F \in L^2\bigl([0,1);\mathbb C\bigr)$ and
\[ c_\ell(F) \equiv \int_{[0,1)} F(\omega)\, e^{-\sqrt{-1}\,2\pi\ell\omega}\,d\omega, \quad \ell \in \mathbb Z, \]
then elementary Fourier analysis leads to the conclusion that, in this case,
\[ \lim_{n\to\infty} A_n F(\omega) = \sum_{\ell \in \mathbb Z} c_{m\ell}(F)\, e^{\sqrt{-1}\,2m\ell\pi\omega} \quad \text{for Lebesgue-almost every } \omega \in [0,1). \]
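On the other hand, if α is irrational, then $\{\Sigma_\alpha^k : k \in \mathbb N\}$ is µ-ergodic on $[0,1)$. To see this, suppose that $F \in \mathbf I(\mathbb C)$. Then (cf. the preceding and use Parseval’s Identity)
\[ 0 = \bigl\|F - F \circ \Sigma_\alpha\bigr\|^2_{L^2([0,1);\mathbb C)} = \sum_{\ell \in \mathbb Z} \bigl|c_\ell(F) - c_\ell(F \circ \Sigma_\alpha)\bigr|^2. \]
But, clearly,
\[ c_\ell(F \circ \Sigma_\alpha) = e^{\sqrt{-1}\,2\pi\ell\alpha}\, c_\ell(F), \quad \ell \in \mathbb Z, \]
and so (because α is irrational) $c_\ell(F) = 0$ for each $\ell \neq 0$. In other words, the only elements of $\mathbf I(\mathbb C)$ are µ-almost everywhere constant. Thus, for each irrational $\alpha \in (0,1)$, $p \in [1,\infty)$, separable Banach space E, and $F \in L^p\bigl([0,1);E\bigr)$,
\[ \lim_{n\to\infty} A_n F = \int_{[0,1)} F(\omega)\,d\omega \quad \text{Lebesgue-almost everywhere and in } L^p(\mu;E). \]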
Finally, notice that the situation changes radically when one moves from $[0,1)$ to $[0,\infty)$ and again takes µ to be Lebesgue measure and $\alpha \in (0,1)$ to be irrational. If I extend the definition of $\Sigma_\alpha$ by taking $\Sigma_\alpha(\omega) = \lfloor\omega\rfloor + \Sigma_\alpha(\omega - \lfloor\omega\rfloor)$ for $\omega \in [0,\infty)$, then it is clear that invariant functions are those that are constant on each interval $[m, m+1)$ and that, Lebesgue-almost surely, $A_n f(\omega) \longrightarrow \int_{\lfloor\omega\rfloor}^{\lfloor\omega\rfloor+1} f(\eta)\,d\eta$. On the other hand, if one defines $\Sigma_\alpha(\omega) = \omega + \alpha$, then every invariant set that has non-zero measure will have infinite measure, and so, now, every choice of $\alpha \in (0,1)$ (not just irrational ones) will give rise to an ergodic system. In particular, one will have, for each $p \in [1,\infty)$ and $F \in L^p(\mu;E)$,
\[ \lim_{n\to\infty} A_n F = 0 \quad \text{Lebesgue-almost everywhere,} \]
and the convergence will be in $L^p(\mu;E)$ when $p \in (1,\infty)$.
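The contrast between rational and irrational rotations of [0, 1) can be seen numerically. The following Python sketch is illustrative only: the function name `rotation_average`, the test function F(ω) = ω, the starting point 0.1, and the two angles are assumptions chosen for the demonstration. For α = 1/4 the orbit of ω consists of four points, so the averages settle at the mean of F over that orbit rather than at the integral of F.

```python
import math

def rotation_average(F, omega, alpha, n):
    """Return (1/n) * sum_{k=0}^{n-1} F(omega + k*alpha mod 1), i.e. the
    ergodic average A_n F(omega) for the rotation by alpha on [0, 1)."""
    total = 0.0
    for k in range(n):
        total += F((omega + k * alpha) % 1.0)
    return total / n

def identity(x):
    # Test function F(omega) = omega, whose integral over [0, 1) is 1/2.
    return x

# Irrational angle: the averages approach the space average 1/2.
irrational = rotation_average(identity, 0.1, math.sqrt(2) - 1, 100_000)

# Rational angle 1/4: the orbit of 0.1 is {0.1, 0.35, 0.6, 0.85}, and the
# averages approach the orbit mean 0.475, which depends on the start point.
rational = rotation_average(identity, 0.1, 0.25, 4_000)

print(irrational, rational)
```

Changing the starting point changes the rational limit but not the irrational one, which is the numerical face of the statement that only the irrational rotation is ergodic.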
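§ 6.2.3. Stationary Sequences. For applications to probability theory, it is useful to reformulate these considerations in terms of stationary families of random variables. Thus, let $(\Omega, \mathcal F, \mathbb P)$ be a probability space and $(E, \mathcal B)$ be a measurable space (E need not be a Banach space). Given a family $\mathbf F = \{X_k : k \in \mathbb N^N\}$ of E-valued random variables on $(\Omega, \mathcal F, \mathbb P)$, I will say that $\mathbf F$ is $\mathbb P$-stationary (or simply stationary) if, for each $\ell \in \mathbb N^N$, the family
\[ \mathbf F_\ell \equiv \{X_{k+\ell} : k \in \mathbb N^N\} \]
has the same (joint) distribution under $\mathbb P$ as $\mathbf F$ itself. Clearly, one can test for stationarity by checking that the distribution of $\mathbf F_{e_j}$ is the same as that of $\mathbf F$ for each $1 \le j \le N$.

In order to apply the considerations of § 6.2.1 to stationary families, note that all questions about the properties of $\mathbf F$ can be phrased in terms of the following canonical setting. Namely, set $\mathbf E = E^{\mathbb N^N}$ and define µ on $\bigl(\mathbf E, \mathcal B^{\mathbb N^N}\bigr)$ to be the image measure $\mathbf F_* \mathbb P$. In other words, for each $\Gamma \in \mathcal B^{\mathbb N^N}$, $\mu(\Gamma) = \mathbb P(\mathbf F \in \Gamma)$. Next, for each $\ell \in \mathbb N^N$, define $\Sigma^\ell : \mathbf E \longrightarrow \mathbf E$ to be the natural shift transformation on $\mathbf E$ given by $\Sigma^\ell(\mathbf x)_k = x_{k+\ell}$ for all $k \in \mathbb N^N$. Obviously, stationarity of $\mathbf F$ is equivalent to the statement that $\{\Sigma^k : k \in \mathbb N^N\}$ is µ-measure preserving. Moreover, if $\mathcal I$ is the σ-algebra of shift invariant elements $\Gamma \in \mathcal B^{\mathbb N^N}$ (i.e., $\Gamma = (\Sigma^k)^{-1}(\Gamma)$ for all $k \in \mathbb N^N$), then, by Theorem 6.2.7, for any separable Banach space B, any $p \in [1,\infty)$, and any $F \in L^p(\mathbb P;B)$,
\[ \lim_{n\to\infty} \frac{1}{n^N} \sum_{k \in Q_n^+} F \circ \mathbf F_k = \mathbb E^{\mathbb P}\bigl[F \circ \mathbf F \,\big|\, \mathbf F^{-1}(\mathcal I)\bigr] \quad \text{(a.s., } \mathbb P\text{) and in } L^p(\mathbb P;B). \]
In particular, when $\{\Sigma^k : k \in \mathbb N^N\}$ is ergodic on $\bigl(\mathbf E, \mathcal B^{\mathbb N^N}, \mu\bigr)$, I will say that the family $\mathbf F$ is ergodic and conclude that the preceding can be replaced by
\[ (6.2.9)\qquad \lim_{n\to\infty} \frac{1}{n^N} \sum_{k \in Q_n^+} F \circ \mathbf F_k = \mathbb E^{\mathbb P}\bigl[F \circ \mathbf F\bigr] \quad \text{(a.s., } \mathbb P\text{) and in } L^p(\mathbb P;B). \]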
So far I have discussed one-sided stationary families, that is, families indexed by $\mathbb N^N$. However, for various reasons (cf. Theorem 6.2.11) it is useful to know that one can usually embed a one-sided stationary family into a two-sided one. In terms of the semigroup of shifts, this corresponds to the trivial observation that the semigroup $\{\Sigma^k : k \in \mathbb N^N\}$ on $\mathbf E = E^{\mathbb N^N}$ can be viewed as a sub-semigroup of the group of shifts $\{\Sigma^k : k \in \mathbb Z^N\}$ on $\hat{\mathbf E} = E^{\mathbb Z^N}$. With these comments in mind, I will prove the following.

Lemma 6.2.10. Assume that E is a complete, separable, metric space and that $\mathbf F = \{X_k : k \in \mathbb N^N\}$ is a stationary family of E-valued random variables on the probability space $(\Omega, \mathcal F, \mathbb P)$. Then there exists a probability space $(\hat\Omega, \hat{\mathcal F}, \hat{\mathbb P})$ and a family $\hat{\mathbf F} = \{\hat X_k : k \in \mathbb Z^N\}$ with the property that, for each $\ell \in \mathbb Z^N$,
\[ \hat{\mathbf F}_\ell \equiv \{\hat X_{k+\ell} : k \in \mathbb N^N\} \]
has the same distribution under $\hat{\mathbb P}$ as $\mathbf F$ has under $\mathbb P$.
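Proof: When formulated correctly, this theorem is an essentially trivial application of Kolmogorov’s Extension Theorem (cf. part (iii) of Exercise 9.1.17). Namely, for $n \in \mathbb N$, set
\[ \Lambda_n = \{k \in \mathbb Z^N : k_j \ge -n \text{ for } 1 \le j \le N\}, \]
and define $\Phi_n : E^{\Lambda_0} \longrightarrow E^{\Lambda_n}$ so that
\[ \Phi_n(\mathbf x)_k = x_{\mathbf n + k} \quad \text{for } \mathbf x \in E^{\Lambda_0} \text{ and } k \in \Lambda_n, \text{ where } \mathbf n \equiv (n, \dots, n). \]
Next, take $\mu_0$ on $E^{\Lambda_0}$ to be the $\mathbb P$-distribution of $\mathbf F$ and, for $n \ge 1$, $\mu_n$ on $E^{\Lambda_n}$ to be $(\Phi_n)_* \mu_0$. Using stationarity, one can easily check that, for each $n \ge 0$ and $k \in \mathbb N^N$, $\mu_n$ is invariant under the obvious extension of $\Sigma^k$ to $E^{\Lambda_n}$. In particular, if one identifies $E^{\Lambda_{n+1}}$ with $E^{\Lambda_{n+1} \setminus \Lambda_n} \times E^{\Lambda_n}$, then
\[ \mu_{n+1}\bigl(E^{\Lambda_{n+1} \setminus \Lambda_n} \times \Gamma\bigr) = \mu_n(\Gamma) \quad \text{for all } \Gamma \in \mathcal B_{E^{\Lambda_n}}. \]
Hence the $\mu_n$’s are consistently defined on the spaces $E^{\Lambda_n}$, and therefore Kolmogorov’s Extension Theorem applies and guarantees the existence of a unique Borel probability measure µ on $E^{\mathbb Z^N}$ with the property that
\[ \mu\bigl(E^{\mathbb Z^N \setminus \Lambda_n} \times \Gamma\bigr) = \mu_n(\Gamma) \quad \text{for all } n \ge 0 \text{ and } \Gamma \in \mathcal B_{E^{\Lambda_n}}. \]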
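Moreover, since each $\mu_n$ is $\Sigma^k$-invariant for all $k \in \mathbb N^N$, it is clear that µ is also. Thus, because $\Sigma^k$ is invertible on $E^{\mathbb Z^N}$ and $\Sigma^{-k}$ is its inverse, it follows that µ is invariant under $\Sigma^k$ for all $k \in \mathbb Z^N$.

To complete the proof at this point, simply take $\hat\Omega = E^{\mathbb Z^N}$, $\hat{\mathcal F} = \mathcal B_{\hat\Omega}$, $\hat{\mathbb P} = \mu$, and $\hat X_k(\hat\omega) = \hat\omega_k$ for $k \in \mathbb Z^N$.

As an example of the advantage that Lemma 6.2.10 affords, I present the following beautiful observation made originally by M. Kac.

Theorem 6.2.11. Let $(E, \mathcal B)$ be a measurable space and $\{X_k : k \in \mathbb N\}$ a stationary sequence of E-valued random variables on the probability space $(\Omega, \mathcal F, \mathbb P)$. Given $\Gamma \in \mathcal B$, define the return time $\rho_\Gamma(\omega) = \inf\{k \ge 1 : X_k(\omega) \in \Gamma\}$. Then,
\[ \mathbb E^{\mathbb P}\bigl[\rho_\Gamma,\, X_0 \in \Gamma\bigr] = \mathbb P\bigl(X_k \in \Gamma \text{ for some } k \in \mathbb N\bigr). \]
In particular, if $\{X_k : k \in \mathbb N\}$ is ergodic, then
\[ \mathbb P(X_0 \in \Gamma) > 0 \implies \mathbb E^{\mathbb P}\bigl[\rho_\Gamma,\, X_0 \in \Gamma\bigr] = 1. \]

Proof: Set $U_k = \mathbf 1_\Gamma \circ X_k$ for $k \in \mathbb N$. Then $\{U_k : k \in \mathbb N\}$ is a stationary sequence of $\{0,1\}$-valued random variables. Hence, by Lemma 6.2.10, we can find a probability space $(\hat\Omega, \hat{\mathcal F}, \hat{\mathbb P})$ on which there is a family $\{\hat U_k : k \in \mathbb Z\}$ of $\{0,1\}$-valued random variables with the property that, for every $n \in \mathbb Z$, $(\hat U_n, \dots, \hat U_{n+k}, \dots)$ has the same distribution under $\hat{\mathbb P}$ as $(U_0, \dots, U_k, \dots)$ has under $\mathbb P$. In particular,
\[ \mathbb P\bigl(\rho_\Gamma \ge 1,\, X_0 \in \Gamma\bigr) = \hat{\mathbb P}\bigl(\hat U_0 = 1\bigr) \]
and
\[ \mathbb P\bigl(\rho_\Gamma \ge n+1,\, X_0 \in \Gamma\bigr) = \hat{\mathbb P}\bigl(\hat U_{-n} = 1,\, \hat U_{-n+1} = 0, \dots, \hat U_0 = 0\bigr), \quad n \in \mathbb Z^+. \]
Thus, if $\lambda_\Gamma(\hat\omega) \equiv \inf\{k \in \mathbb N : \hat U_{-k}(\hat\omega) = 1\}$, then
\[ \mathbb P\bigl(\rho_\Gamma \ge n,\, X_0 \in \Gamma\bigr) = \hat{\mathbb P}\bigl(\lambda_\Gamma = n-1\bigr), \quad n \in \mathbb Z^+, \]
and so
\[ \mathbb E^{\mathbb P}\bigl[\rho_\Gamma,\, X_0 \in \Gamma\bigr] = \hat{\mathbb P}\bigl(\lambda_\Gamma < \infty\bigr). \]
Now observe that
\[ \hat{\mathbb P}\bigl(\lambda_\Gamma > n\bigr) = \hat{\mathbb P}\bigl(\hat U_{-n} = 0, \dots, \hat U_0 = 0\bigr) = \mathbb P\bigl(X_0 \notin \Gamma, \dots, X_n \notin \Gamma\bigr), \]
from which it is clear that
\[ \hat{\mathbb P}\bigl(\lambda_\Gamma < \infty\bigr) = \mathbb P\bigl(\exists k \in \mathbb N\ X_k \in \Gamma\bigr). \]
Finally, assume that $\{X_k : k \in \mathbb N\}$ is ergodic and that $\mathbb P(X_0 \in \Gamma) > 0$. Because, by (6.2.9), $\sum_0^\infty \mathbf 1_\Gamma(X_k) = \infty$ $\mathbb P$-almost surely, it follows that, $\mathbb P$-almost surely, $X_k \in \Gamma$ for some $k \in \mathbb N$.

It should be noticed that, although there are far more elementary proofs, when $\{X_n : n \ge 0\}$ is an irreducible, ergodic Markov chain on a countable state space E, then Kac’s theorem proves that the stationary measure at the state $x \in E$ is the reciprocal of the expected time that the chain takes to return to x when it starts at x.

§ 6.2.4. Continuous Parameter Ergodic Theory. I turn now to the setting of continuously parametrized semigroups of transformations. Thus, again $(\Omega, \mathcal F, \mu)$ is a σ-finite measure space and $\{\Sigma^t : t \in [0,\infty)^N\}$ is a measurable semigroup of µ-measure preserving transformations on Ω. That is, $\Sigma^0$ is the identity, $\Sigma^{s+t} = \Sigma^s \circ \Sigma^t$,
\[ (t, \omega) \in [0,\infty)^N \times \Omega \longmapsto \Sigma^t(\omega) \in \Omega \]
is $\mathcal B_{[0,\infty)^N} \times \mathcal F$-measurable, and $(\Sigma^t)_* \mu = \mu$ for every $t \in [0,\infty)^N$. Next, given an $\mathcal F$-measurable F with values in some separable Banach space E, let $G(F)$ be the set of $\omega \in \Omega$ with the property that
\[ \int_{[0,T)^N} \bigl\|F \circ \Sigma^t(\omega)\bigr\|_E\,dt < \infty \quad \text{for all } T \in (0,\infty). \]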
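Clearly,
\[ \omega \in G(F) \implies \Sigma^t(\omega) \in G(F) \quad \text{for every } t \in [0,\infty)^N. \]
In addition, if $F \in L^p(\mu;E)$ for some $p \in [1,\infty)$, then
\[ \int_\Omega \Bigl(\int_{[0,T)^N} \bigl\|F \circ \Sigma^t(\omega)\bigr\|_E^p\,dt\Bigr)\mu(d\omega) = T^N \|F\|^p_{L^p(\mu;E)} < \infty, \]
and so
\[ F \in \bigcup_{p \in [1,\infty)} L^p(\mu;E) \implies \mu\bigl(G(F)^\complement\bigr) = 0. \]
Next, for each $T \in (0,\infty)$, define
\[ A_T F(\omega) = \begin{cases} T^{-N} \displaystyle\int_{[0,T)^N} F \circ \Sigma^t(\omega)\,dt & \text{if } \omega \in G(F) \\ 0 & \text{if } \omega \notin G(F), \end{cases} \]
and note that, as a consequence of the invariance of $G(F)$,
\[ (A_T F) \circ \Sigma^t = A_T(F \circ \Sigma^t) \quad \text{for all } t \in [0,\infty)^N. \]
Finally, use $\hat{\mathcal I}$ to denote the σ-algebra of $\Gamma \in \mathcal F$ with the property that $\Gamma = (\Sigma^t)^{-1}(\Gamma)$ for each $t \in [0,\infty)^N$, and say that $\{\Sigma^t : t \in [0,\infty)^N\}$ is ergodic if $\mu(\Gamma) \wedge \mu(\Gamma^\complement) = 0$ for every $\Gamma \in \hat{\mathcal I}$.

Theorem 6.2.12. Let $(\Omega, \mathcal F, \mu)$ be a σ-finite measure space and $\{\Sigma^t : t \in [0,\infty)^N\}$ be a measurable semigroup of µ-measure preserving transformations on Ω. Then, for each separable Banach space E, $p \in [1,\infty)$, and $T \in (0,\infty)$, $A_T$ is a contraction on $L^p(\mu;E)$. Next, set $\Pi_{\hat{\mathbf I}(E)} = \Pi_{\mathbf I(E)} \circ A_1$, where $\Pi_{\mathbf I(E)}$ is defined in terms of $\{\Sigma^k : k \in \mathbb N^N\}$ as in Theorem 6.2.7. Then, for each $p \in [1,\infty)$ and $F \in L^p(\mu;E)$,
\[ (6.2.13)\qquad \lim_{T\to\infty} A_T F = \Pi_{\hat{\mathbf I}(E)} F \quad \text{(a.e., } \mu\text{).} \]
Moreover, if $p \in (1,\infty)$ or $p = 1$ and $\mu(\Omega) < \infty$, then the convergence is also in $L^p(\mu;E)$. In fact, if $\mu(\Omega) < \infty$, then
\[ \lim_{T\to\infty} A_T F = \mathbb E^\mu\bigl[F \,\big|\, \hat{\mathcal I}\bigr] \quad \text{(a.e., } \mu\text{) and in } L^p(\mu;E). \]
Finally, if $\{\Sigma^t : t \in [0,\infty)^N\}$ is ergodic, then (6.2.13) can be replaced by
\[ \lim_{T\to\infty} A_T F = \frac{\mathbb E^\mu[F]}{\mu(\Omega)} \quad \text{(a.e., } \mu\text{),} \]
where it is understood that the ratio is 0 when the denominator is infinite.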
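Proof: The first step is the observation that
\[ (6.2.14)\qquad \mu\Bigl(\sup_{T>0} \|A_T F\|_E \ge \lambda\Bigr) \le \frac{(24)^N}{\lambda}\,\|F\|_{L^1(\mu;E)}, \quad \lambda \in (0,\infty), \]
and
\[ (6.2.15)\qquad \Bigl\|\sup_{T>0} \|A_T F\|_E\Bigr\|_{L^p(\mu)} \le \frac{(24)^N p}{p-1}\,\|F\|_{L^p(\mu;E)} \quad \text{for } p \in (1,\infty). \]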
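Indeed, because of $(A_T F) \circ \Sigma^t = A_T(F \circ \Sigma^t)$, (6.2.14) is derived from (6.1.6) in precisely the same way as I derived (6.2.2) from (6.1.10), and (6.2.15) comes from (6.1.7) just as (6.2.3) came from (6.1.11).

Given (6.2.14) and (6.2.15), we know that it suffices to prove (6.2.13) for a dense subset of $L^1(\mu;E)$. Thus, let F be a uniformly bounded element of $L^1(\mu;E)$ and set $\hat F = A_1 F$. Because
\[ \bigl\|T^N A_T F(\omega) - n^N A_n \hat F(\omega)\bigr\|_E \le \int_{[0,n+1)^N \setminus [0,n)^N} \bigl\|F \circ \Sigma^t(\omega)\bigr\|_E\,dt \quad \text{for } n \le T \le n+1, \]
\[ \lim_{n\to\infty} \Bigl\|\sup_{n \le T \le n+1} \bigl\|A_T F - A_n \hat F\bigr\|_E\Bigr\|_{L^p(\mu;\mathbb R)} = 0 \quad \text{for every } p \in [1,\infty]. \]
Hence, for $F \in L^1(\mu;E) \cap L^\infty(\mu;E)$, (6.2.13) follows from (6.2.8).

As for the case when $\mu(\Omega) < \infty$, all that we have to do is check that $\Pi_{\hat{\mathbf I}(E)} F = \mathbb E^\mu\bigl[F \,\big|\, \hat{\mathcal I}\bigr]$ (a.e., µ). However, from (6.2.13), it is easy to see that $\Pi_{\hat{\mathbf I}(E)} F$ is measurable with respect to the µ-completion of $\hat{\mathcal I}$, and so it suffices to show that
\[ \mathbb E^\mu[F, \Gamma] = \mathbb E^\mu[A_1 F, \Gamma] \quad \text{for all } \Gamma \in \hat{\mathcal I}. \]
But, if $\Gamma \in \hat{\mathcal I}$, then
\[ \mathbb E^\mu[A_1 F, \Gamma] = \int_{[0,1)^N} \mathbb E^\mu\bigl[F \circ \Sigma^t,\, \Gamma\bigr]\,dt = \int_{[0,1)^N} \mathbb E^\mu\bigl[F \circ \Sigma^t,\, (\Sigma^t)^{-1}(\Gamma)\bigr]\,dt = \mathbb E^\mu[F, \Gamma]. \]
Finally, assume that $\{\Sigma^t : t \in [0,\infty)^N\}$ is µ-ergodic. When $\mu(\Omega) < \infty$, the asserted result follows immediately from the preceding; and when $\mu(\Omega) = \infty$, it follows from the fact that $\Pi_{\hat{\mathbf I}(E)} F$ is measurable with respect to the µ-completion of $\hat{\mathcal I}$.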
Exercises for § 6.2
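Exercise 6.2.16. Given an irrational $\alpha \in (0,1)$ and an $\epsilon \in (0,1)$, let $N_n(\alpha, \epsilon)$ be the number of $1 \le m \le n$ with the property that
\[ \Bigl|\alpha - \frac{\ell}{m}\Bigr| \le \frac{\epsilon}{2m} \quad \text{for some } \ell \in \mathbb Z. \]
As an application of the considerations in the Classic Example given at the end of § 6.1, show that
\[ \lim_{n\to\infty} \frac{N_n(\alpha, \epsilon)}{n} \ge \epsilon. \]
Hint: Let $\delta \in \bigl(0, \frac{\epsilon}{2}\bigr)$ be given, take f equal to the indicator function of $[0,\delta) \cup (1-\delta, 1)$, and observe that $N_n(\alpha, \epsilon) \ge \sum_{k=1}^n f \circ \Sigma_\alpha^k(\omega)$ so long as $0 \le \omega \le \frac{\epsilon}{2} - \delta$.

Exercise 6.2.17. Assume that $\mu(\Omega) < \infty$ and that $\{\Sigma^k : k \in \mathbb N^N\}$ is ergodic. Given a non-negative $\mathcal F$-measurable function f, show that
\[ \lim_{n\to\infty} A_n f < \infty \text{ on a set of positive } \mu\text{-measure} \implies f \in L^1(\mu;\mathbb R) \implies \lim_{n\to\infty} A_n f = \frac{\mathbb E^\mu[f]}{\mu(\Omega)} \quad \text{(a.e., } \mu\text{).} \]

Exercise 6.2.18. Let $\mathbf F = \{X_k : k \in \mathbb N^N\}$ be a stationary family of random variables on the probability space $(\Omega, \mathcal F, \mathbb P)$ with values in the measurable space $(E, \mathcal B)$, and let $\mathcal I$ denote the σ-algebra of shift invariant $\Gamma \in \mathcal B^{\mathbb N^N}$.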
(i) Take
\[ \mathcal T \equiv \bigcap_{n \ge 0} \sigma\bigl(\{X_k : k_j \ge n \text{ for all } 1 \le j \le N\}\bigr), \]
the tail σ-algebra determined by $\{X_k : k \in \mathbb N^N\}$. Show that $\mathbf F^{-1}(\mathcal I) \subseteq \mathcal T$, and conclude that $\{X_k : k \in \mathbb N^N\}$ is ergodic if $\mathcal T$ is $\mathbb P$-trivial (i.e., $\mathbb P(\Gamma) \in \{0,1\}$ for all $\Gamma \in \mathcal T$).

(ii) By combining (i), Kolmogorov’s 0–1 Law, and the Individual Ergodic Theorem, give another derivation of The Strong Law of Large Numbers for independent, identically distributed, integrable random variables with values in a separable Banach space.

Exercise 6.2.19. Let $\{X_k : k \in \mathbb N\}$ be a stationary, ergodic sequence of $\mathbb R$-valued, integrable random variables on $(\Omega, \mathcal F, \mathbb P)$. Using the reasoning suggested in Exercise 1.4.28, prove Guivarc’h’s lemma:
\[ \mathbb E^{\mathbb P}[X_1] = 0 \implies \varliminf_{n\to\infty} \Bigl|\sum_{k=0}^{n-1} X_k\Bigr| < \infty. \]
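§ 6.3 Burkholder’s Inequality

Given a martingale $(X_n, \mathcal F_n, \mathbb P)$ with $X_0 = 0$ and a sequence $\{\sigma_n : n \ge 0\}$ of bounded functions with the property that $\sigma_n$ is $\mathcal F_n$-measurable for $n \ge 0$, determine $\{Y_n : n \ge 0\}$ by $Y_0 = 0$ and $Y_n - Y_{n-1} = \sigma_{n-1}(X_n - X_{n-1})$ for $n \ge 1$. It is clear that $(Y_n, \mathcal F_n, \mathbb P)$ is again a martingale. In addition, if the absolute values of all the $\sigma_n$’s are bounded by some constant $\sigma < \infty$ and $X_n$ is square $\mathbb P$-integrable, then one can easily check that
\[ \mathbb E^{\mathbb P}\bigl[Y_n^2\bigr] = \sum_{m=1}^n \mathbb E^{\mathbb P}\bigl[\sigma_{m-1}^2 (X_m - X_{m-1})^2\bigr] \le \sigma^2 \sum_{m=1}^n \mathbb E^{\mathbb P}\bigl[(X_m - X_{m-1})^2\bigr] = \sigma^2\, \mathbb E^{\mathbb P}\bigl[X_n^2\bigr]. \]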
On the other hand, it is not at all clear how to compare the size of $Y_n$ to that of $X_n$ in any of the $L^p$ spaces other than $p = 2$. The problem of finding such a comparison was given a definitive solution by D. Burkholder, and I will present his solution in this section. Actually, Burkholder solved the problem twice. His first solution was a beautiful adaptation of general ideas and results that had been developed over the years to solve related problems in probability theory and analysis and, as such, did not yield the optimal solution. His second approach is designed specifically to address the problem at hand and bears little or no resemblance to familiar techniques. It is entirely original, remarkably elementary and effective, but somewhat opaque. The approach is the outgrowth of many years of deep thinking that Burkholder devoted to the topic, and the reader who wants to understand the path that led him to it should consult the explanation that he wrote.¹

§ 6.3.1. Burkholder’s Comparison Theorem. Burkholder’s basic result is the following comparison theorem.

Theorem 6.3.1 (Burkholder). Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space, $\{\mathcal F_n : n \in \mathbb N\}$ a non-decreasing sequence of sub-σ-algebras of $\mathcal F$, and E and F a pair of (real or complex) separable Hilbert spaces. Next, suppose that $(X_n, \mathcal F_n, \mathbb P)$ and $(Y_n, \mathcal F_n, \mathbb P)$ are, respectively, E- and F-valued martingales. If
\[ \|Y_0\|_F \le \|X_0\|_E \quad \text{and} \quad \|Y_n - Y_{n-1}\|_F \le \|X_n - X_{n-1}\|_E, \quad n \in \mathbb Z^+, \]
$\mathbb P$-almost surely, then, for each $p \in (1,\infty)$ and $n \in \mathbb N$,
\[ (6.3.2)\qquad \|Y_n\|_{L^p(\mathbb P;F)} \le B_p \|X_n\|_{L^p(\mathbb P;E)}, \quad \text{where } B_p \equiv (p-1) \vee \frac{1}{p-1}. \]
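The comparison (6.3.2) can be checked exactly on a small coin-tossing model. The Python sketch below is an illustration, not part of Burkholder’s argument: X is the simple random walk started at 0, Y is the transform of X by a sign that depends only on the past (the particular rule in `sigma` is an arbitrary assumption), and the norms are computed by enumerating all 2^n paths. The names `burkholder_constant`, `sigma`, and `lp_norms` are hypothetical.

```python
from itertools import product

def burkholder_constant(p):
    """B_p = (p - 1) v 1/(p - 1), as in (6.3.2)."""
    return max(p - 1.0, 1.0 / (p - 1.0))

def sigma(x_prev):
    # Predictable sign: it looks only at X_{m-1}, never at the next coin toss.
    return 1 if x_prev >= 0 else -1

def lp_norms(n, p):
    """Exact L^p norms of X_n and Y_n over all 2**n equally likely paths,
    where X is the simple random walk and Y_m - Y_{m-1} = sigma(X_{m-1}) * (X_m - X_{m-1})."""
    sum_x = 0.0
    sum_y = 0.0
    for steps in product((-1, 1), repeat=n):
        x = 0
        y = 0
        for eps in steps:
            y += sigma(x) * eps   # uses x before it is updated
            x += eps
        sum_x += abs(x) ** p
        sum_y += abs(y) ** p
    paths = 2 ** n
    return (sum_x / paths) ** (1.0 / p), (sum_y / paths) ** (1.0 / p)

for p in (1.5, 2.0, 3.0, 4.0):
    norm_x, norm_y = lp_norms(10, p)
    print(p, norm_y, burkholder_constant(p) * norm_x)
```

Since the increments of Y have the same absolute value as those of X, the two norms coincide at p = 2, and for the other exponents the printed pairs show how much room the constant leaves in this particular model.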
As I said before, the derivation of Theorem 6.3.1 is both elementary and mysterious. I begin with the trivial observation that, without loss in generality, I may assume that both E and F are complex Hilbert spaces, since we can always complexify them, and, in addition, that $E = F$, since, if that is not already the case, I can embed them in $E \oplus F$. Thus, I will be making these assumptions throughout.

¹ For those who want to know the secret behind this proof, Burkholder has revealed it in his article “Explorations in martingale theory and its applications” for the 1989 Saint-Flour École d’Été lectures published by Springer-Verlag, LNM 1464 (1991).

The heart of the proof lies in the computations contained in the following two lemmas.

Lemma 6.3.3. Let $p \in (1,\infty)$ be given, set
\[ \alpha_p = \begin{cases} p^{2-p}(p-1)^{p-1} & \text{if } p \in [2,\infty) \\ p^{2-p} & \text{if } p \in (1,2], \end{cases} \]
and define $u : E^2 \longrightarrow \mathbb R$ by (cf. (6.3.2))
\[ u(x,y) = \bigl(\|y\|_E - B_p\|x\|_E\bigr)\bigl(\|y\|_E + \|x\|_E\bigr)^{p-1}. \]
Then
\[ \|y\|_E^p - \bigl(B_p\|x\|_E\bigr)^p \le \alpha_p\, u(x,y), \quad (x,y) \in E^2. \]
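Proof: When $p = 2$, there is nothing to do. Thus, I will assume that $p \in (1,\infty) \setminus \{2\}$. Observe that it suffices to show that, for all $(x,y) \in E^2$ satisfying $\|x\|_E + \|y\|_E = 1$, depending on whether $p \in (2,\infty)$ or $p \in (1,2)$,
\[ (*)\qquad \|y\|_E^p - \bigl((p-1)\|x\|_E\bigr)^p \;\begin{cases} \le p^{2-p}(p-1)^{p-1}\bigl(\|y\|_E - (p-1)\|x\|_E\bigr) \\ \ge p^{2-p}(p-1)^{p-1}\bigl(\|y\|_E - (p-1)\|x\|_E\bigr). \end{cases} \]
Indeed, when $p \in (2,\infty)$, (*) is precisely the result desired, and, when $p \in (1,2)$, (*) gives the desired result after one divides through by $(p-1)^p$ and reverses the roles of x and y.

I begin the verification of (*) by checking that
\[ (**)\qquad p^{2-p}(p-1)^{p-1} \;\begin{cases} > 1 & \text{if } p \in (2,\infty) \\ < 1 & \text{if } p \in (1,2). \end{cases} \]
To this end, set $f(p) = (p-1)\log(p-1) - (p-2)\log p$ for $p \in (1,\infty)$. Then $f(2) = 0 = \lim_{p \searrow 1} f(p)$, $f''(p) = \frac{2-p}{p^2(p-1)}$, and $f'(p) = \log\bigl(1 - \frac1p\bigr) + \frac2p \longrightarrow 0$ as $p \to \infty$. Hence f is strictly convex on $(1,2)$, which gives $f < 0$ there, and $f'$ is strictly decreasing on $(2,\infty)$ with limit 0, which gives $f' > 0$, and therefore $f > 0$, on $(2,\infty)$.

Next, observe that proving (*) comes down to checking that, for $s \in [0,1]$,
\[ \Phi(s) \equiv p^{2-p}(p-1)^{p-1}(1 - ps) - (1-s)^p + (p-1)^p s^p \;\begin{cases} \ge 0 & \text{if } p \in (2,\infty) \\ \le 0 & \text{if } p \in (1,2). \end{cases} \]
To this end, note that, by (**), $\Phi(0) > 0$ when $p \in (2,\infty)$ and $\Phi(0) < 0$ when $p \in (1,2)$. Also, for $s \in (0,1)$,
\[ \Phi'(s) = p\bigl[(p-1)^p s^{p-1} + (1-s)^{p-1} - p^{2-p}(p-1)^{p-1}\bigr] \]
and
\[ \Phi''(s) = p(p-1)\bigl[(p-1)^p s^{p-2} - (1-s)^{p-2}\bigr]. \]
In particular, we see that $\Phi\bigl(\frac1p\bigr) = \Phi'\bigl(\frac1p\bigr) = 0$. In addition, depending on whether $p \in (2,\infty)$ or $p \in (1,2)$, $\lim_{s \searrow 0} \Phi''(s)$ is negative or positive, $\Phi''$ is strictly increasing or decreasing on $(0,1)$, and $\lim_{s \nearrow 1} \Phi''(s)$ is positive or negative. Hence, there exists a unique $t = t_p \in (0,1)$ with the property that
\[ \Phi'' \restriction (0,t) \;\begin{cases} < 0 & \text{if } p \in (2,\infty) \\ > 0 & \text{if } p \in (1,2) \end{cases} \quad \text{and} \quad \Phi'' \restriction (t,1) \;\begin{cases} > 0 & \text{if } p \in (2,\infty) \\ < 0 & \text{if } p \in (1,2). \end{cases} \]
Moreover, because $\Phi''(t) = 0$, it is easy to see that $t \in \bigl(0, \frac1p\bigr)$.

Now suppose that $p \in (2,\infty)$ and consider Φ on each of the intervals $\bigl[\frac1p, 1\bigr]$, $\bigl[t, \frac1p\bigr]$, and $[0,t]$ separately. Because both Φ and $\Phi'$ vanish at $\frac1p$ while $\Phi'' > 0$ on $\bigl(\frac1p, 1\bigr)$, it is clear that $\Phi > 0$ on $\bigl(\frac1p, 1\bigr]$. Next, because $\Phi'\bigl(\frac1p\bigr) = 0$ and $\Phi'' \restriction \bigl(t, \frac1p\bigr) > 0$, we know that Φ is strictly decreasing on $\bigl(t, \frac1p\bigr)$ and therefore that $\Phi \restriction \bigl[t, \frac1p\bigr) > \Phi\bigl(\frac1p\bigr) = 0$. Finally, because $\Phi'' \restriction (0,t) < 0$ while $\Phi(0) \wedge \Phi(t) \ge 0$, we also know that $\Phi \restriction (0,t) > 0$. The argument when $p \in (1,2)$ is similar, only this time all the signs are reversed.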
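Lemma 6.3.4. Again let $p \in (1,\infty)$ be given, and define $u : E \times F \longrightarrow \mathbb R$ as in Lemma 6.3.3. In addition, define the functions v and w on $E^2 \setminus \{(0,0)\}$ by
\[ v(x,y) = p\bigl(\|y\|_E + \|x\|_E\bigr)^{p-2}\bigl(\|y\|_E + (2-p)\|x\|_E\bigr) \]
and
\[ w(x,y) = p(1-p)\bigl(\|y\|_E + \|x\|_E\bigr)^{p-2}\|x\|_E. \]
Then, for $(x,y) \in E^2$ and $(k,h) \in E^2$ satisfying
\[ \min_{t \in [0,1]} \|y + th\|_E \wedge \|x + tk\|_E > 0 \quad \text{and} \quad \|h\|_E \le \|k\|_E, \]
one has
\[ u(x+k, y+h) - u(x,y) \le v(x,y)\,\mathrm{Re}\Bigl(\frac{y}{\|y\|_E}, h\Bigr)_E + w(x,y)\,\mathrm{Re}\Bigl(\frac{x}{\|x\|_E}, k\Bigr)_E \]
when $p \in [2,\infty)$ and
\[ (p-1)\bigl[u(x+k, y+h) - u(x,y)\bigr] \le -w(y,x)\,\mathrm{Re}\Bigl(\frac{y}{\|y\|_E}, h\Bigr)_E - v(y,x)\,\mathrm{Re}\Bigl(\frac{x}{\|x\|_E}, k\Bigr)_E \]
when $p \in (1,2]$.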
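Proof: Set
\[ \Phi(t) = \Phi\bigl(t; (x,k), (y,h)\bigr) \equiv \bigl(\|y + th\|_E - (p-1)\|x + tk\|_E\bigr)\bigl(\|x + tk\|_E + \|y + th\|_E\bigr)^{p-1}, \]
and observe that
\[ u(x + tk, y + th) = \begin{cases} \Phi\bigl(t; (x,k), (y,h)\bigr) & \text{if } p \in [2,\infty) \\ -(p-1)^{-1}\,\Phi\bigl(t; (y,h), (x,k)\bigr) & \text{if } p \in (1,2). \end{cases} \]
Hence, it suffices for us to check that
\[ \Phi'(t) = v(x + tk, y + th)\,\mathrm{Re}\Bigl(\frac{y + th}{\|y + th\|_E}, h\Bigr)_E + w(x + tk, y + th)\,\mathrm{Re}\Bigl(\frac{x + tk}{\|x + tk\|_E}, k\Bigr)_E \]
and prove that
\[ \Phi''\bigl(t; (x,k), (y,h)\bigr) \;\begin{cases} \le 0 & \text{if } p \in [2,\infty) \text{ and } \|h\|_E \le \|k\|_E \\ \ge 0 & \text{if } p \in (1,2] \text{ and } \|h\|_E \ge \|k\|_E. \end{cases} \]
To prove the preceding, set $x(t) = x + tk$, $y(t) = y + th$, $\Psi(t) = \|x(t)\|_E + \|y(t)\|_E$,
\[ a(t) = \frac{\mathrm{Re}\bigl(x(t), k\bigr)_E}{\|x(t)\|_E}, \quad \text{and} \quad b(t) = \frac{\mathrm{Re}\bigl(y(t), h\bigr)_E}{\|y(t)\|_E}. \]
One then has that
\[ \begin{aligned} \Phi'(t) &= p\Psi(t)^{p-2}\Bigl[(1-p)\|x(t)\|_E\, a(t) + \bigl(\|y(t)\|_E + (2-p)\|x(t)\|_E\bigr) b(t)\Bigr] \\ &= p\Bigl[(1-p)\Psi(t)^{p-2}\|x(t)\|_E\bigl(a(t) + b(t)\bigr) + \Psi(t)^{p-1} b(t)\Bigr]. \end{aligned} \]
In particular, the first expression establishes the required form for $\Phi'(t)$. In addition, from the second expression, we see that
\[ \begin{aligned} -\frac{\Phi''(t)}{p} &= (p-1)(p-2)\Psi(t)^{p-3}\|x(t)\|_E\bigl(a(t) + b(t)\bigr)^2 \\ &\quad + (p-1)\Psi(t)^{p-2}\Bigl[a(t)\bigl(a(t) + b(t)\bigr) + \frac{\|x(t)\|_E}{\|y(t)\|_E}\, b_\perp(t)^2 + a_\perp(t)^2\Bigr] \\ &\quad - \Psi(t)^{p-2}\Bigl[(p-1)\bigl(a(t) + b(t)\bigr) b(t) + \frac{\Psi(t)}{\|y(t)\|_E}\, b_\perp(t)^2\Bigr] \\ &= (p-1)(p-2)\Psi(t)^{p-3}\|x(t)\|_E\bigl(a(t) + b(t)\bigr)^2 + (p-1)\Psi(t)^{p-2}\bigl(\|k\|_E^2 - \|h\|_E^2\bigr) \\ &\quad + (p-2)\Psi(t)^{p-1}\,\frac{b_\perp(t)^2}{\|y(t)\|_E}, \end{aligned} \]
where $a_\perp(t) = \sqrt{\|k\|_E^2 - a(t)^2}$ and $b_\perp(t) = \sqrt{\|h\|_E^2 - b(t)^2}$. Hence the required properties of $\Phi''(t)$ have also been established.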
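Proof of Theorem 6.3.1: Set $K_n = X_n - X_{n-1}$ and $H_n = Y_n - Y_{n-1}$ for $n \in \mathbb Z^+$. I will assume that there is an $\epsilon > 0$ with the property that
\[ \bigl\|X_0(\omega) - \mathrm{span}\{K_n(\omega) : n \in \mathbb Z^+\}\bigr\|_E \ge \epsilon \quad \text{and} \quad \bigl\|Y_0(\omega) - \mathrm{span}\{H_n(\omega) : n \in \mathbb Z^+\}\bigr\|_E \ge \epsilon \]
for all $\omega \in \Omega$. Indeed, if this is not already the case, then I can replace E by $\mathbb R \times E$ (or, when E is complex, $\mathbb C \times E$) and $X_n(\omega)$ and $Y_n(\omega)$, respectively, by
\[ X_n^{(\epsilon)}(\omega) \equiv \bigl(\epsilon, X_n(\omega)\bigr) \quad \text{and} \quad Y_n^{(\epsilon)}(\omega) \equiv \bigl(\epsilon, Y_n(\omega)\bigr), \]
for each $n \in \mathbb N$. Clearly, (6.3.2) for each $X_n^{(\epsilon)}$ and $Y_n^{(\epsilon)}$ implies (6.3.2) for $X_n$ and $Y_n$ after one lets $\epsilon \searrow 0$. Finally, because there is nothing to do when the right-hand side of (6.3.2) is infinite, let $p \in (1,\infty)$ be given, and assume that $X_n \in L^p(\mathbb P;E)$ for each $n \in \mathbb N$. In particular, if u is the function defined in Lemma 6.3.3 and v and w are those defined in Lemma 6.3.4, then
\[ u(X_n, Y_n) \in L^1(\mathbb P;\mathbb R) \quad \text{and} \quad v(X_n, Y_n),\ w(X_n, Y_n) \in L^{p'}(\mathbb P;\mathbb R) \]
for all $n \in \mathbb N$, where $p' = \frac{p}{p-1}$ is the Hölder conjugate of p.

Note that, by Lemma 6.3.3, it suffices for us to show that
\[ A_n \equiv \mathbb E^{\mathbb P}\bigl[u(X_n, Y_n)\bigr] \le 0, \quad n \in \mathbb N. \]
Since $u(X_0, Y_0) \le 0$ $\mathbb P$-almost surely, there is no question that $A_0 \le 0$. Next, assume that $A_n \le 0$, and, depending on whether $p \in [2,\infty)$ or $p \in (1,2]$, use the appropriate part of Lemma 6.3.4 to see that
\[ A_{n+1} \le \mathbb E^{\mathbb P}\Bigl[v(X_n, Y_n)\,\mathrm{Re}\Bigl(\frac{Y_n}{\|Y_n\|_E}, H_{n+1}\Bigr)_E\Bigr] + \mathbb E^{\mathbb P}\Bigl[w(X_n, Y_n)\,\mathrm{Re}\Bigl(\frac{X_n}{\|X_n\|_E}, K_{n+1}\Bigr)_E\Bigr] \]
or
\[ A_{n+1} \le -\mathbb E^{\mathbb P}\Bigl[w(Y_n, X_n)\,\mathrm{Re}\Bigl(\frac{Y_n}{\|Y_n\|_E}, H_{n+1}\Bigr)_E\Bigr] - \mathbb E^{\mathbb P}\Bigl[v(Y_n, X_n)\,\mathrm{Re}\Bigl(\frac{X_n}{\|X_n\|_E}, K_{n+1}\Bigr)_E\Bigr]. \]
But, since $v(X_n, Y_n)\frac{Y_n}{\|Y_n\|_E}$ is $\mathcal F_n$-measurable and (cf. Exercise 5.1.18) $\mathbb E^{\mathbb P}[H_{n+1} \mid \mathcal F_n] = 0$,
\[ \mathbb E^{\mathbb P}\Bigl[v(X_n, Y_n)\,\mathrm{Re}\Bigl(\frac{Y_n}{\|Y_n\|_E}, H_{n+1}\Bigr)_E\Bigr] = 0. \]
Since the same reasoning shows that each of the other terms on the right-hand side vanishes, we have now proved that $A_{n+1} \le 0$.

As an immediate consequence of Theorem 6.3.1, we have the following answer to the question raised at the beginning of this section.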
Corollary 6.3.5. Suppose that $(X_n,\mathcal{F}_n,\mathbb{P})$ is a martingale with values in a separable (real or complex) Hilbert space $E$. Further, let $F$ be a second separable, complex Hilbert space, and suppose that $\{\sigma_n : n \ge 0\}$ is a sequence of $\mathrm{Hom}(E;F)$-valued random variables with the properties that $\sigma_0$ is constant, $\sigma_n$ is $\mathcal{F}_n$-measurable for $n \ge 1$, and $\|\sigma_n\|_{\mathrm{op}} \le \sigma$ (a.s., P) for some constant $\sigma < \infty$ and all $n \in \mathbb{N}$. If $\|Y_0\|_F \le \sigma\|X_0\|_E$ and $Y_n - Y_{n-1} = \sigma_{n-1}(X_n - X_{n-1})$ for $n \ge 1$, then $(Y_n,\mathcal{F}_n,\mathbb{P})$ is an $F$-valued martingale and, for each $p \in (1,\infty)$ (cf. (6.3.2)),

$$\|Y_n\|_{L^p(\mathbb{P};F)} \le \sigma B_p \|X_n\|_{L^p(\mathbb{P};E)}, \quad n \in \mathbb{N}.$$

§ 6.3.2. Burkholder’s Inequality. In many applications, the most useful form of Burkholder’s result is as a generalization to $p \ne 2$ of the obvious equality

$$\mathbb{E}^{\mathbb{P}}\bigl[|X_n - X_0|^2\bigr] = \mathbb{E}^{\mathbb{P}}\Bigl[\sum_{m=1}^{n} |X_m - X_{m-1}|^2\Bigr].$$
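The $p = 2$ equality can be checked exactly on a toy example. The sketch below is illustrative only: it assumes a real-valued martingale started at 0 and driven by fair ±1 coin flips, with a hypothetical predictable bet size (`predictable_coefficient` is an invented name and rule, not something taken from the text). It enumerates every path and compares the two sides in exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product


def predictable_coefficient(history):
    """Hypothetical bet size: depends only on the coin flips seen so far."""
    return 1 + sum(1 for flip in history if flip > 0)


def second_moment_identity(n_steps):
    """Return (E[X_n^2], E[sum of squared increments]) for the toy martingale."""
    lhs = Fraction(0)
    rhs = Fraction(0)
    weight = Fraction(1, 2 ** n_steps)  # every path of fair flips is equally likely
    for path in product((-1, 1), repeat=n_steps):
        # Increment m is (coefficient built from flips before m) times flip m.
        increments = [
            predictable_coefficient(path[:m]) * path[m] for m in range(n_steps)
        ]
        x_n = sum(increments)
        lhs += weight * x_n ** 2
        rhs += weight * sum(d * d for d in increments)
    return lhs, rhs


if __name__ == "__main__":
    left, right = second_moment_identity(6)
    print(left, right, left == right)
```

The two numbers agree because the cross terms between distinct increments have mean zero, which is exactly the orthogonality that fails to have a direct analog when $p \ne 2$.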
This is the form of his inequality which is best known and, as such, is called Burkholder’s Inequality. Notice that his inequality can be viewed as a vast generalization of Khinchine’s Inequality (2.3.27), although it applies only when $p \in (1,\infty)$.

Theorem 6.3.6 (Burkholder’s Inequality). Let $(\Omega,\mathcal{F},\mathbb{P})$ and $\{\mathcal{F}_n : n \in \mathbb{N}\}$ be as in Theorem 6.3.1, and let $(X_n,\mathcal{F}_n,\mathbb{P})$ be a martingale with values in the separable Hilbert space $E$. Then, for each $p \in (1,\infty)$,

$$\frac{1}{B_p}\sup_{n\in\mathbb{N}}\|X_n - X_0\|_{L^p(\mathbb{P};E)} \le \mathbb{E}^{\mathbb{P}}\Bigl[\Bigl(\sum_{n=1}^{\infty}\|X_n - X_{n-1}\|_E^2\Bigr)^{\frac{p}{2}}\Bigr]^{\frac{1}{p}} \le B_p \sup_{n\in\mathbb{N}}\|X_n - X_0\|_{L^p(\mathbb{P};E)}, \tag{6.3.7}$$
with $B_p$ as in (6.3.2).

Proof: Let $F = \ell^2(\mathbb{N};E)$ be the separable Hilbert space of sequences $y = (x_0,\dots,x_n,\dots) \in E^{\mathbb{N}}$ satisfying

$$\|y\|_F \equiv \Bigl(\sum_{n=0}^{\infty}\|x_n\|_E^2\Bigr)^{\frac12} < \infty,$$
and define $Y_n(\omega) = \bigl(X_0(\omega), X_1(\omega) - X_0(\omega), \dots, X_n(\omega) - X_{n-1}(\omega), 0, 0, \dots\bigr) \in F$ for $\omega \in \Omega$ and $n \in \mathbb{N}$. Obviously, $(Y_n,\mathcal{F}_n,\mathbb{P})$ is an $F$-valued martingale. Moreover,

$$\|X_0\|_E = \|Y_0\|_F \quad\text{and}\quad \|X_n - X_{n-1}\|_E = \|Y_n - Y_{n-1}\|_F, \quad n \ge 1,$$

and therefore the right-hand side of (6.3.7) is implied by (6.3.2) while the left-hand side also follows from (6.3.2) when the roles of the $X_n$’s and $Y_n$’s are reversed.

Exercises for § 6.3

Exercise 6.3.8. Because it arises repeatedly in the theory of stochastic integration, one of the most frequent applications of Burkholder’s Inequality is to situations in which $E$ is a separable Hilbert space and $(X_n,\mathcal{F}_n,\mathbb{P})$ is an $E$-valued martingale for which one has an estimate of the form
h 1 i 2p
P 2p
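m $\|M_n - M_m\|_{[0,t]} = 0$ P-almost surely and in $L^2(\mathbb{P};\mathbb{R})$ for each $t \in [0,\infty)$. To this end, define $Y^{(m)}_{k-1,n}$ so that $Y^{(m)}_{k-1,n}(\omega) = X_{k-1,n}(\omega) - X_{\ell-1,m}(\omega)$ when $\zeta_{\ell-1,m}(\omega) \le \zeta_{k-1,n}(\omega) < \zeta_{\ell,m}(\omega)$. Then $Y^{(m)}_{k-1,n}$ is $\mathcal{F}_{k-1,n}$-measurable, $|Y^{(m)}_{k-1,n}| \le \frac{1}{m}$ (a.s., P), and $M_n - M_m = \sum_{k=1}^{\infty} Y^{(m)}_{k-1,n}\Delta_{k,n}$. Hence, by the same reasoning as above,

$$\mathbb{E}^{\mathbb{P}}\bigl[\|M_n - M_m\|_{[0,t]}^2\bigr] \le 4\sum_{k=1}^{\infty}\mathbb{E}^{\mathbb{P}}\bigl[(Y^{(m)}_{k-1,n})^2\,\Delta_{k,n}(t)^2\bigr] \le \frac{4}{m^2}\,\mathbb{E}^{\mathbb{P}}\bigl[X(t)^2\bigr],$$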
which is more than enough to get the asserted convergence result. We can now apply Lemma 7.2.2 to produce a right-continuous, progressively measurable M : [0, ∞) × Ω −→ R which is P-almost surely continuous and to
7 Continuous Parameter Martingales
which $\{M_n : n \ge 1\}$ converges uniformly on compacts, both P-almost surely and in $L^2(\mathbb{P};\mathbb{R})$. In particular, $(M(t),\mathcal{F}_t,\mathbb{P})$ is a square integrable martingale. Finally, set $\langle X\rangle = (X^2 - 2M)^+$. Obviously, $\langle X\rangle = X^2 - 2M$ (a.s., P), and $\langle X\rangle$ is right-continuous, progressively measurable, and P-almost surely continuous. In addition, because, P-almost surely, $\langle X\rangle_n \longrightarrow \langle X\rangle$ uniformly on compacts and $\langle X\rangle_n(s) \le \langle X\rangle_n(t)$ when $t - s > \frac{1}{n}$, it follows that $\langle X\rangle(\,\cdot\,,\omega)$ is non-decreasing for P-almost every $\omega \in \Omega$.
Remark 7.2.4. The reader may be wondering why I chose to complicate the preceding statement and proof by insisting that $\langle X\rangle$ be progressively measurable with respect to the original family of σ-algebras $\{\mathcal{F}_t : t \in [0,\infty)\}$. Indeed, Exercise 7.1.22 shows that I could have replaced all the σ-algebras with their completions, and, if I had done so, there would have been no reason not to have taken $X(\,\cdot\,,\omega)$ to be continuous and $\langle X\rangle(\,\cdot\,,\omega)$ to be continuous and non-decreasing for every $\omega \in \Omega$. However, there is a price to be paid for completing σ-algebras. In the first place, when one does, all statements become dependent on the particular P with which one is dealing. Secondly, because completed σ-algebras are nearly never countably generated, certain desirable properties can be lost by introducing them. See, for example, Theorem 9.2.1.

By combining Theorem 7.2.3 with Theorem 7.2.1, one can show that, up to time re-parametrization, all continuous martingales are Brownian motions. In order to avoid technical difficulties, I will prove this result only in the simplest case.

Corollary 7.2.5. Let $(X(t),\mathcal{F}_t,\mathbb{P})$ be a continuous, square integrable martingale with the properties that, for P-almost every $\omega \in \Omega$, $\langle X\rangle(\,\cdot\,,\omega)$ is strictly increasing and $\lim_{t\to\infty}\langle X\rangle(t,\omega) = \infty$. Then there exists a Brownian motion $(B(t),\mathcal{F}'_t,\mathbb{P})$ such that $X(t) = X(0) + B\bigl(\langle X\rangle(t)\bigr)$, $t \in [0,\infty)$, P-almost surely. In particular,

$$\varlimsup_{t\to\infty}\frac{X(t)}{\sqrt{2\langle X\rangle(t)\log_{(2)}\langle X\rangle(t)}} = 1 = -\varliminf_{t\to\infty}\frac{X(t)}{\sqrt{2\langle X\rangle(t)\log_{(2)}\langle X\rangle(t)}}$$

P-almost surely.

Proof: Clearly, given the first part, the last assertion is a trivial application of Exercise 4.3.15. After replacing $\mathcal{F}$ and the $\mathcal{F}_t$’s by their completions and applying Exercise 7.1.22, I may and will assume that $X(0,\omega) = 0$, $X(\,\cdot\,,\omega)$ is continuous, $\langle X\rangle(\,\cdot\,,\omega)$ is continuous and strictly increasing, and $\lim_{t\to\infty}\langle X\rangle(t,\omega) = \infty$ for every $\omega \in \Omega$. Next, for each $(t,\omega) \in [0,\infty)\times\Omega$, set $\zeta_t(\omega) = \langle X\rangle^{-1}(t,\omega)$, where $\langle X\rangle^{-1}(\,\cdot\,,\omega)$ is the inverse of $\langle X\rangle(\,\cdot\,,\omega)$. Clearly, for each $\omega \in \Omega$, $t \rightsquigarrow \zeta_t(\omega)$ is a continuous, strictly increasing function that tends to infinity as $t \to \infty$. Moreover, because $\langle X\rangle$ is progressively measurable, $\zeta_t$ is a stopping time for each $t \in [0,\infty)$. Now set
§ 7.2 Brownian Motion and Martingales
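$B(t) = X(\zeta_t)$. Since it is obvious that $X(t) = B\bigl(\langle X\rangle(t)\bigr)$, all that I have to show is that $(B(t),\mathcal{F}'_t,\mathbb{P})$ is a Brownian motion for some non-decreasing family $\{\mathcal{F}'_t : t \ge 0\}$ of sub-σ-algebras. Trivially, $B(0,\omega) = 0$ and $B(\,\cdot\,,\omega)$ is continuous for all $\omega \in \Omega$. In addition, $B(t)$ is $\mathcal{F}_{\zeta_t}$-measurable, and so $B$ is progressively measurable with respect to $\{\mathcal{F}_{\zeta_t} : t \ge 0\}$. Thus, by Theorem 7.2.1, I will be done once I show that $(B(t),\mathcal{F}_{\zeta_t},\mathbb{P})$ and $(B(t)^2 - t,\mathcal{F}_{\zeta_t},\mathbb{P})$ are martingales. To this end, first observe that

$$\mathbb{E}^{\mathbb{P}}\Bigl[\sup_{\tau\in[0,\zeta_t]} X(\tau)^2\Bigr] = \lim_{T\to\infty}\mathbb{E}^{\mathbb{P}}\Bigl[\sup_{\tau\in[0,T\wedge\zeta_t]} X(\tau)^2\Bigr] \le 4\lim_{T\to\infty}\mathbb{E}^{\mathbb{P}}\bigl[X(T\wedge\zeta_t)^2\bigr] \le 4\lim_{T\to\infty}\mathbb{E}^{\mathbb{P}}\bigl[\langle X\rangle(T\wedge\zeta_t)\bigr] \le 4t.$$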
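Thus, $X(T\wedge\zeta_t) \longrightarrow B(t)$ in $L^2(\mathbb{P};\mathbb{R})$ as $T \to \infty$. Now let $0 \le s < t$ and $A \in \mathcal{F}_{\zeta_s}$ be given. Then, for each $T > 0$, $A_T \equiv A \cap \{\zeta_s \le T\} \in \mathcal{F}_{T\wedge\zeta_s}$, and so, by Theorem 7.1.14,

$$\mathbb{E}^{\mathbb{P}}\bigl[X(T\wedge\zeta_t),\,A_T\bigr] = \mathbb{E}^{\mathbb{P}}\bigl[X(T\wedge\zeta_s),\,A_T\bigr]$$

and

$$\mathbb{E}^{\mathbb{P}}\bigl[X(T\wedge\zeta_t)^2 - \langle X\rangle(T\wedge\zeta_t),\,A_T\bigr] = \mathbb{E}^{\mathbb{P}}\bigl[X(T\wedge\zeta_s)^2 - \langle X\rangle(T\wedge\zeta_s),\,A_T\bigr].$$

Now let $T \to \infty$, and apply the preceding convergence assertion to get the desired conclusion.

§ 7.2.3. Burkholder’s Inequality Again. In this subsection we will see what Burkholder’s Inequality looks like in the continuous parameter setting, a result whose importance for the theory of stochastic integration is hard to overstate.

Theorem 7.2.6 (Burkholder). Let $(X(t),\mathcal{F}_t,\mathbb{P})$ be a P-almost surely continuous, square integrable martingale. Then, for each $p \in (1,\infty)$ and $t \in [0,\infty)$ (cf. (6.3.2)),

$$B_p^{-1}\|X(t) - X(0)\|_{L^p(\mathbb{P};\mathbb{R})} \le \mathbb{E}^{\mathbb{P}}\bigl[\langle X\rangle(t)^{\frac{p}{2}}\bigr]^{\frac{1}{p}} \le B_p\|X(t) - X(0)\|_{L^p(\mathbb{P};\mathbb{R})}. \tag{7.2.7}$$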
Proof: After completing the σ-algebras if necessary, I may (cf. Exercise 7.1.22) and will assume that $X(\,\cdot\,,\omega)$ is continuous and that $\langle X\rangle(\,\cdot\,,\omega)$ is continuous and non-decreasing for every $\omega \in \Omega$. In addition, I may and will assume that $X(0) = 0$. Finally, I will assume that $X$ is bounded. To justify this last assumption, let $\zeta_n = \inf\{t \ge 0 : |X(t)| \ge n\}$, set $X_n(t) = X(t\wedge\zeta_n)$, and use Exercise 7.2.10 to see that one can take $\langle X_n\rangle(t) = \langle X\rangle(t\wedge\zeta_n)$. Hence, if we know (7.2.7) for bounded martingales, then

$$B_p^{-1}\|X(t\wedge\zeta_n)\|_{L^p(\mathbb{P};\mathbb{R})} \le \mathbb{E}^{\mathbb{P}}\bigl[\langle X\rangle(t\wedge\zeta_n)^{\frac{p}{2}}\bigr]^{\frac{1}{p}} \le B_p\|X(t\wedge\zeta_n)\|_{L^p(\mathbb{P};\mathbb{R})}$$
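for all $n \ge 1$. Since $\langle X\rangle$ is non-decreasing, we can apply Fatou’s Lemma to the preceding and thereby get

$$\|X(t)\|_{L^p(\mathbb{P};\mathbb{R})} \le \varliminf_{n\to\infty}\|X(t\wedge\zeta_n)\|_{L^p(\mathbb{P};\mathbb{R})} \le B_p\,\mathbb{E}^{\mathbb{P}}\bigl[\langle X\rangle(t)^{\frac{p}{2}}\bigr]^{\frac{1}{p}},$$

which is the left-hand side of (7.2.7). To get the right-hand side, note that either $\|X(t)\|_{L^p(\mathbb{P};\mathbb{R})} = \infty$, in which case there is nothing to do, or $\|X(t)\|_{L^p(\mathbb{P};\mathbb{R})} < \infty$, in which case, by the second half of Theorem 7.1.9, $X(t\wedge\zeta_n) \longrightarrow X(t)$ in $L^p(\mathbb{P};\mathbb{R})$ and therefore

$$\mathbb{E}^{\mathbb{P}}\bigl[\langle X\rangle(t)^{\frac{p}{2}}\bigr]^{\frac{1}{p}} = \lim_{n\to\infty}\mathbb{E}^{\mathbb{P}}\bigl[\langle X\rangle(t\wedge\zeta_n)^{\frac{p}{2}}\bigr]^{\frac{1}{p}} \le B_p\lim_{n\to\infty}\|X(t\wedge\zeta_n)\|_{L^p(\mathbb{P};\mathbb{R})} = B_p\|X(t)\|_{L^p(\mathbb{P};\mathbb{R})}.$$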
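Proceeding under the above assumptions and referring to the notation in the proof of Theorem 7.2.3, begin by observing that, for any $t \in [0,\infty)$ and $n \in \mathbb{N}$, Theorem 7.1.14 shows that $(X(t\wedge\zeta_{k,n}),\mathcal{F}_{t\wedge\zeta_{k,n}},\mathbb{P})$ is a discrete parameter martingale indexed by $k \in \mathbb{N}$. In addition, $\zeta_{k,n} = t$ for all but a finite number of $k$’s. Hence, by (6.3.7) applied to $(X(t\wedge\zeta_{k,n}),\mathcal{F}_{t\wedge\zeta_{k,n}},\mathbb{P})$,

$$B_p^{-1}\|X(t)\|_{L^p(\mathbb{P};\mathbb{R})} \le \mathbb{E}^{\mathbb{P}}\bigl[\langle X\rangle_n(t)^{\frac{p}{2}}\bigr]^{\frac{1}{p}} \le B_p\|X(t)\|_{L^p(\mathbb{P};\mathbb{R})} \quad\text{for all } n \in \mathbb{N}.$$

In particular, this shows that $\sup_{n\ge 0}\|\langle X\rangle_n(t)\|_{L^p(\mathbb{P};\mathbb{R})} < \infty$ for every $p \in (1,\infty)$, and therefore, since $\langle X\rangle_n(t) \longrightarrow \langle X\rangle(t)$ (a.s., P), this is more than enough to verify that $\mathbb{E}^{\mathbb{P}}\bigl[\langle X\rangle_n(t)^{\frac{p}{2}}\bigr] \longrightarrow \mathbb{E}^{\mathbb{P}}\bigl[\langle X\rangle(t)^{\frac{p}{2}}\bigr]$ for every $p \in (1,\infty)$.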
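Exercises for § 7.2

Exercise 7.2.8. Let $(X(t),\mathcal{F}_t,\mathbb{P})$ be a square integrable, continuous martingale. Following the strategy used to prove Theorem 7.2.1, show that

$$\Bigl(F\bigl(X(t)\bigr) - \int_0^t \tfrac12\,\partial_x^2 F\bigl(X(\tau)\bigr)\,\langle X\rangle(d\tau),\ \mathcal{F}_t,\ \mathbb{P}\Bigr)$$

is a martingale for every $F \in C_b^2(\mathbb{R};\mathbb{C})$.

Hint: Begin by using cutoffs and mollification to reduce to the case when $F \in C_c^\infty(\mathbb{R};\mathbb{R})$. Next, given $s < t$ and $\epsilon > 0$, introduce the stopping times $\zeta_0 = s$ and

$$\zeta_n = \inf\{t \ge \zeta_{n-1} : |X(t) - X(\zeta_{n-1})| \ge \epsilon\} \wedge (\zeta_{n-1} + \epsilon) \wedge \bigl(\langle X\rangle(\zeta_{n-1}) + \epsilon\bigr) \wedge t$$

for $n \ge 1$. Now proceed as in the proof of Theorem 7.2.1.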
Exercise 7.2.9. Let $(X(t),\mathcal{F}_t,\mathbb{P})$ be a continuous, square integrable martingale with $X(0) = 0$, and assume that there exists a non-decreasing function $A : [0,\infty) \longrightarrow [0,\infty)$ such that $\langle X\rangle(t) \le A(t)$ (a.s., P) for each $t \in [0,\infty)$. The goal of this exercise is to show that $(E(t),\mathcal{F}_t,\mathbb{P})$ is a martingale when

$$E(t) = \exp\bigl(X(t) - \tfrac12\langle X\rangle(t)\bigr).$$

(i) Given $R \in (0,\infty)$, set $\zeta_R = \inf\{t \ge 0 : |X(t)| \ge R\}$, and show that

$$\Bigl(e^{X(t\wedge\zeta_R)} - \tfrac12\int_0^{t\wedge\zeta_R} e^{X(\tau)}\,d\langle X\rangle,\ \mathcal{F}_t,\ \mathbb{P}\Bigr)$$

is a martingale. Hint: Choose $F \in C_c^\infty(\mathbb{R};\mathbb{R})$ so that $F(x) = e^x$ for $x \in [-2R,2R]$, apply Exercise 7.2.8 to this $F$, and then use Doob’s Stopping Time Theorem.

(ii) Apply Theorem 7.1.17 to the martingale in (i) and $e^{-\frac12\langle X\rangle(t\wedge\zeta_R)}$ to show that $(E(t\wedge\zeta_R),\mathcal{F}_t,\mathbb{P})$ is a martingale.
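(iii) By replacing $X$ and $R$ with $2X$ and $2R$ in (ii), show that

$$\mathbb{E}^{\mathbb{P}}\bigl[E(t\wedge\zeta_R)^2\bigr] \le e^{A(t)}\,\mathbb{E}^{\mathbb{P}}\bigl[e^{2X(t\wedge\zeta_R) - 2\langle X\rangle(t\wedge\zeta_R)}\bigr] = e^{A(t)}.$$

Conclude that $\{E(t\wedge\zeta_R) : R \in (0,\infty)\}$ is uniformly P-integrable and therefore that $(E(t),\mathcal{F}_t,\mathbb{P})$ is a martingale.

Exercise 7.2.10. If $(X(t),\mathcal{F}_t,\mathbb{P})$ is a P-almost surely continuous, square integrable martingale, $\zeta$ is a stopping time, and $Y(t) = X(t\wedge\zeta)$, show that $\langle Y\rangle(t) = \langle X\rangle(t\wedge\zeta)$, $t \ge 0$, P-almost surely.

Exercise 7.2.11. Continuing in the setting of Exercise 7.2.9, first show that, for every $\lambda \in \mathbb{R}$, $(E_\lambda(t),\mathcal{F}_t,\mathbb{P})$ is a martingale, where

$$E_\lambda(t) = \exp\Bigl(\lambda X(t) - \frac{\lambda^2}{2}\langle X\rangle(t)\Bigr).$$

Next, use Doob’s Inequality to see that, for each $\lambda \ge 0$,

$$\mathbb{P}\Bigl(\sup_{\tau\in[0,t]} X(\tau) \ge R\Bigr) \le \mathbb{P}\Bigl(\sup_{\tau\in[0,t]} E_\lambda(\tau) \ge e^{\lambda R - \frac{\lambda^2}{2}A(t)}\Bigr) \le e^{-\lambda R + \frac{\lambda^2}{2}A(t)}.$$

Starting from this, conclude that

$$\mathbb{P}\bigl(\|X\|_{[0,t]} \ge R\bigr) \le 2e^{-\frac{R^2}{2A(t)}}. \tag{7.2.12}$$

Finally, given this estimate, show that the conclusion in Exercise 7.2.8 continues to hold for any $F \in C^2(\mathbb{R};\mathbb{C})$ whose second derivative has at most exponential growth.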
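One possible route from the Doob bound in Exercise 7.2.11 to (7.2.12) is sketched below. It is only an outline under the assumptions that $R > 0$ and $A(t) > 0$; the choice $\lambda = R/A(t)$ is the optimization step of this sketch and is not spelled out in the exercise.

```latex
\begin{align*}
\min_{\lambda \ge 0}\Bigl(-\lambda R + \tfrac{\lambda^2}{2}A(t)\Bigr)
  &= -\frac{R^2}{2A(t)}
  \quad\text{attained at } \lambda = \frac{R}{A(t)},\\[4pt]
\mathbb{P}\Bigl(\sup_{\tau\in[0,t]} X(\tau) \ge R\Bigr)
  &\le e^{-\frac{R^2}{2A(t)}},\\[4pt]
\mathbb{P}\bigl(\|X\|_{[0,t]} \ge R\bigr)
  &\le \mathbb{P}\Bigl(\sup_{\tau\in[0,t]} X(\tau) \ge R\Bigr)
     + \mathbb{P}\Bigl(\sup_{\tau\in[0,t]} \bigl(-X(\tau)\bigr) \ge R\Bigr)
   \le 2e^{-\frac{R^2}{2A(t)}}.
\end{align*}
```

The last line uses the fact that $-X$ is again a continuous, square integrable martingale with $\langle -X\rangle = \langle X\rangle \le A(t)$, so the one-sided bound applies to it as well.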
Exercise 7.2.13. Given a pair of square integrable, continuous martingales $(X(t),\mathcal{F}_t,\mathbb{P})$ and $(Y(t),\mathcal{F}_t,\mathbb{P})$, set

$$\langle X,Y\rangle = \frac{\langle X+Y\rangle - \langle X-Y\rangle}{4},$$

and show that $(X(t)Y(t) - \langle X,Y\rangle(t),\mathcal{F}_t,\mathbb{P})$ is a martingale. Further, show that $\langle X,Y\rangle$ is uniquely determined up to a P-null set by this property together with the facts that $\langle X,Y\rangle(0,\omega) = 0$ and $\langle X,Y\rangle(\,\cdot\,,\omega)$ is continuous and has locally bounded variation for P-almost every $\omega \in \Omega$.

Exercise 7.2.14. Let $(B(t),\mathcal{F}_t,\mathbb{P})$ be an $\mathbb{R}^N$-valued Brownian motion. Given $f, g \in C_b^{1,2}\bigl([0,\infty)\times\mathbb{R}^N;\mathbb{R}\bigr)$, set

$$X(t) = f\bigl(t,B(t)\bigr) - \int_0^t \bigl(\partial_\tau + \tfrac12\Delta\bigr) f\bigl(\tau,B(\tau)\bigr)\,d\tau,$$

$$Y(t) = g\bigl(t,B(t)\bigr) - \int_0^t \bigl(\partial_\tau + \tfrac12\Delta\bigr) g\bigl(\tau,B(\tau)\bigr)\,d\tau,$$

and show that

$$\langle X,Y\rangle(t) = \int_0^t \nabla f\cdot\nabla g\bigl(\tau,B(\tau)\bigr)\,d\tau.$$

Hint: First reduce to the case when $f = g$. Second, write $X(t)^2$ as

$$f\bigl(t,B(t)\bigr)^2 - 2X(t)\int_0^t \bigl(\partial_\tau + \tfrac12\Delta\bigr) f\bigl(\tau,B(\tau)\bigr)\,d\tau - \Bigl(\int_0^t \bigl(\partial_\tau + \tfrac12\Delta\bigr) f\bigl(\tau,B(\tau)\bigr)\,d\tau\Bigr)^2,$$

and apply Theorem 7.1.17 to the second term.

§ 7.3 The Reflection Principle Revisited

In Exercise 4.3.12 we saw that Lévy’s Reflection Principle (Theorem 1.4.13) has a sharpened version when applied to Brownian motion. In this section I will give another, more powerful way of discussing the reflection principle for Brownian motion.

§ 7.3.1. Reflecting Symmetric Lévy Processes. In this subsection, $\mu$ will be used to denote a symmetric, infinitely divisible law. Equivalently (cf. Exercise 3.3.11), $\hat\mu = e^{\ell_\mu(\xi)}$, where

$$\ell_\mu(\xi) = -\frac12\bigl(\xi, C\xi\bigr)_{\mathbb{R}^N} + \int_{\mathbb{R}^N}\bigl(\cos(\xi,y)_{\mathbb{R}^N} - 1\bigr)\,M(dy)$$

for some non-negative definite, symmetric $C$ and symmetric Lévy measure $M$.
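Lemma 7.3.1. Let $\{Z(t) : t \ge 0\}$ be a Lévy process for $\mu$, and set $\mathcal{F}_t = \sigma\bigl(\{Z(\tau) : \tau \in [0,t]\}\bigr)$. If $\zeta$ is a stopping time relative to $\{\mathcal{F}_t : t \in [0,\infty)\}$ and

$$\tilde Z(t) \equiv 2Z(t\wedge\zeta) - Z(t) = \begin{cases} Z(t) & \text{if } \zeta > t\\ 2Z(\zeta) - Z(t) & \text{if } \zeta \le t,\end{cases}$$

then $\{\tilde Z(t) : t \ge 0\}$ is again a Lévy process for $\mu$.

Proof: According to Theorem 7.1.3, all that I have to show is that

$$\Bigl(\exp\bigl[\sqrt{-1}\,\bigl(\xi,\tilde Z(t)\bigr)_{\mathbb{R}^N} - t\ell_\mu(\xi)\bigr],\ \mathcal{F}_t,\ \mathbb{P}\Bigr)$$

is a martingale for all $\xi \in \mathbb{R}^N$. Thus, let $0 \le s < t$ and $A \in \mathcal{F}_s$ be given. Then, by Theorem 7.1.14 and the fact that $\ell_\mu(-\xi) = \ell_\mu(\xi)$,

$$\begin{aligned}
&\mathbb{E}^{\mathbb{P}}\Bigl[\exp\bigl[\sqrt{-1}\,(\xi,\tilde Z(t))_{\mathbb{R}^N} - t\ell_\mu(\xi)\bigr],\ A\cap\{\zeta \le s\}\Bigr]\\
&\quad= \mathbb{E}^{\mathbb{P}}\Bigl[e^{2\sqrt{-1}(\xi,Z(s\wedge\zeta))_{\mathbb{R}^N}}\exp\bigl[-\sqrt{-1}\,(\xi,Z(t))_{\mathbb{R}^N} - t\ell_\mu(\xi)\bigr],\ A\cap\{\zeta \le s\}\Bigr]\\
&\quad= \mathbb{E}^{\mathbb{P}}\Bigl[e^{2\sqrt{-1}(\xi,Z(s\wedge\zeta))_{\mathbb{R}^N}}\exp\bigl[-\sqrt{-1}\,(\xi,Z(s))_{\mathbb{R}^N} - s\ell_\mu(\xi)\bigr],\ A\cap\{\zeta \le s\}\Bigr]\\
&\quad= \mathbb{E}^{\mathbb{P}}\Bigl[\exp\bigl[\sqrt{-1}\,(\xi,\tilde Z(s))_{\mathbb{R}^N} - s\ell_\mu(\xi)\bigr],\ A\cap\{\zeta \le s\}\Bigr].
\end{aligned}$$

Similarly,

$$\begin{aligned}
&\mathbb{E}^{\mathbb{P}}\Bigl[\exp\bigl[\sqrt{-1}\,(\xi,\tilde Z(t))_{\mathbb{R}^N} - t\ell_\mu(\xi)\bigr],\ A\cap\{\zeta > s\}\Bigr]\\
&\quad= \mathbb{E}^{\mathbb{P}}\Bigl[e^{2\sqrt{-1}(\xi,Z(t\wedge\zeta))_{\mathbb{R}^N}}\exp\bigl[-\sqrt{-1}\,(\xi,Z(t))_{\mathbb{R}^N} - t\ell_\mu(\xi)\bigr],\ A\cap\{\zeta > s\}\Bigr]\\
&\quad= \mathbb{E}^{\mathbb{P}}\Bigl[\exp\bigl[\sqrt{-1}\,(\xi,Z(t\wedge\zeta))_{\mathbb{R}^N} - (t\wedge\zeta)\ell_\mu(\xi)\bigr],\ A\cap\{\zeta > s\}\Bigr]\\
&\quad= \mathbb{E}^{\mathbb{P}}\Bigl[\exp\bigl[\sqrt{-1}\,(\xi,Z(s\wedge\zeta))_{\mathbb{R}^N} - (s\wedge\zeta)\ell_\mu(\xi)\bigr],\ A\cap\{\zeta > s\}\Bigr]\\
&\quad= \mathbb{E}^{\mathbb{P}}\Bigl[\exp\bigl[\sqrt{-1}\,(\xi,\tilde Z(s))_{\mathbb{R}^N} - s\ell_\mu(\xi)\bigr],\ A\cap\{\zeta > s\}\Bigr].
\end{aligned}$$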
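A discrete analog may help in seeing why reflection at a stopping time preserves the law. The sketch below is an illustration under assumptions of my own choosing rather than part of the lemma: it replaces the Lévy process by a simple symmetric random walk, takes the stopping time to be the first time the walk reaches an integer level, and uses invented function names. Because all paths are equally likely, the reflected walk has the same distribution exactly when the reflection map is a bijection on paths.

```python
from itertools import product


def reflect_after_first_hit(steps, level):
    """Negate every step taken strictly after the walk first reaches `level`."""
    position = 0
    has_hit = False
    reflected = []
    for step in steps:
        if has_hit:
            reflected.append(-step)
        else:
            reflected.append(step)
            position += step
            has_hit = position >= level
    return tuple(reflected)


def reflection_is_bijection(n_steps, level):
    """Check that reflecting permutes the set of all +-1 paths of a given length."""
    paths = list(product((-1, 1), repeat=n_steps))
    images = {reflect_after_first_hit(path, level) for path in paths}
    return images == set(paths)


def reflection_counts(n_steps, level, x):
    """Count paths with S_n <= x that reach `level`, and paths with S_n >= 2*level - x."""
    left = right = 0
    for path in product((-1, 1), repeat=n_steps):
        partial = 0
        running_max = 0
        for step in path:
            partial += step
            running_max = max(running_max, partial)
        if partial <= x and running_max >= level:
            left += 1
        if partial >= 2 * level - x:
            right += 1
    return left, right


if __name__ == "__main__":
    print(reflection_is_bijection(8, 2))
    print(reflection_counts(8, 2, 0))
```

For `x <= level` the two counts returned by `reflection_counts` coincide, which is the random walk counterpart of the identity for `P(Z(t) ≤ x & ζa ≤ t)` discussed after the lemma, in the special case where the walk sits exactly at the level when it first gets there.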
Obviously, the process $\{\tilde Z(t) : t \ge 0\}$ in Lemma 7.3.1 is the one obtained by reflecting (i.e., reversing the direction of) $\{Z(t) : t \ge 0\}$ at time $\zeta$, and the lemma says that the distribution of the resulting process is the same as that of the original one. Most applications of this result are to situations when one knows more or less precisely where the process is at the time when it is reflected. For example, suppose $N = 1$, $a \in (0,\infty)$, and $\zeta_a = \inf\{t \ge 0 : Z(t) \ge a\}$. Because $\tilde Z(t) = Z(t)$ for $t \le \zeta_a$, and therefore $\zeta_a = \inf\{t \ge 0 : \tilde Z(t) \ge a\}$, we have that

$$\mathbb{P}\bigl(Z(t) \le x\ \&\ \zeta_a \le t\bigr) = \mathbb{P}\bigl(2Z(\zeta_a) - Z(t) \le x\ \&\ \zeta_a \le t\bigr) = \mathbb{P}\bigl(Z(t) \ge 2Z(\zeta_a) - x\ \&\ \zeta_a \le t\bigr).$$
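Hence, if $x \le a$, and therefore $Z(t) \ge 2Z(\zeta_a) - x \implies \zeta_a \le t$ when $\zeta_a < \infty$, then

$$\mathbb{P}\bigl(Z(t) \le x\ \&\ \zeta_a \le t\bigr) = \mathbb{P}\bigl(Z(t) \ge 2Z(\zeta_a) - x\ \&\ \zeta_a < \infty\bigr) \quad\text{for } x \le a.$$

Applying this when $x = a$ and using $\mathbb{P}(\zeta_a \le t) = \mathbb{P}\bigl(Z(t) \le a\ \&\ \zeta_a \le t\bigr) + \mathbb{P}\bigl(Z(t) > a\bigr)$, one gets $\mathbb{P}(\zeta_a \le t) \le 2\mathbb{P}\bigl(Z(t) \ge a\bigr)$, a conclusion that also could have been reached via Theorem 1.4.13.

§ 7.3.2. Reflected Brownian Motion. The considerations in the preceding subsection are most interesting when applied to $\mathbb{R}$-valued Brownian motion. Thus, let $(B(t),\mathcal{F}_t,\mathbb{P})$ be an $\mathbb{R}$-valued Brownian motion. To appreciate the improvements that can be made in the calculations just made, again take $\zeta_a = \inf\{t \ge 0 : B(t) \ge a\}$ for some $a > 0$. Then, because Brownian paths are continuous, $\zeta_a < \infty \implies B(\zeta_a) = a$ and so, since $\mathbb{P}(\zeta_a < \infty) = 1$, we can say that

$$\mathbb{P}\bigl(B(t) \le x\ \&\ \zeta_a \le t\bigr) = \mathbb{P}\bigl(B(t) \ge 2a - x\bigr) \quad\text{for } (t,x) \in [0,\infty)\times(-\infty,a]. \tag{7.3.2}$$

In particular, by taking $x = a$ and using $\mathbb{P}\bigl(B(t) \ge a\bigr) = \mathbb{P}\bigl(B(t) \ge a\ \&\ \zeta_a \le t\bigr)$, we recover the result in Exercise 4.3.12 that $\mathbb{P}(\zeta_a \le t) = 2\mathbb{P}\bigl(B(t) \ge a\bigr)$.

A more interesting application of Lemma 7.3.1 to Brownian motion is to the case when $\zeta$ is the exit time from an interval other than a half-line.

Theorem 7.3.3. Let $a_1 < 0 < a_2$ be given, define $\zeta^{(a_1,a_2)} = \inf\{t \ge 0 : B(t) \notin (a_1,a_2)\}$, and set $A_i(t) = \{\zeta^{(a_1,a_2)} \le t\ \&\ B(\zeta^{(a_1,a_2)}) = a_i\}$ for $i \in \{1,2\}$. Then, for $\Gamma \in \mathcal{B}_{[a_1,\infty)}$,

$$0 \le \mathbb{P}\bigl(\{B(t)\in\Gamma\}\cap A_1(t)\bigr) - \mathbb{P}\bigl(\{B(t)\in 2(a_2-a_1)+\Gamma\}\cap A_1(t)\bigr) = \mathbb{P}\bigl(B(t)\in 2a_1-\Gamma\bigr) - \mathbb{P}\bigl(B(t)\in 2(a_2-a_1)+\Gamma\bigr)$$

and, for $\Gamma \in \mathcal{B}_{(-\infty,a_2]}$,

$$0 \le \mathbb{P}\bigl(\{B(t)\in\Gamma\}\cap A_2(t)\bigr) - \mathbb{P}\bigl(\{B(t)\in -2(a_2-a_1)+\Gamma\}\cap A_2(t)\bigr) = \mathbb{P}\bigl(B(t)\in 2a_2-\Gamma\bigr) - \mathbb{P}\bigl(B(t)\in -2(a_2-a_1)+\Gamma\bigr).$$

Hence, for $\Gamma \in \mathcal{B}_{[a_1,\infty)}$, $\mathbb{P}\bigl(\{B(t)\in\Gamma\}\cap A_1(t)\bigr)$ equals

$$\sum_{m=1}^{\infty}\Bigl[\gamma_{0,t}\bigl(\Gamma - 2a_1 + 2(m-1)(a_2-a_1)\bigr) - \gamma_{0,t}\bigl(\Gamma + 2m(a_2-a_1)\bigr)\Bigr]$$

and, for $\Gamma \in \mathcal{B}_{(-\infty,a_2]}$, $\mathbb{P}\bigl(\{B(t)\in\Gamma\}\cap A_2(t)\bigr)$ equals

$$\sum_{m=1}^{\infty}\Bigl[\gamma_{0,t}\bigl(\Gamma - 2a_2 - 2(m-1)(a_2-a_1)\bigr) - \gamma_{0,t}\bigl(\Gamma - 2m(a_2-a_1)\bigr)\Bigr],$$

where in both cases the convergence is uniform with respect to $t$ in compacts and $\Gamma \in \mathcal{B}_{(a_1,a_2)}$.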
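Proof: Suppose $\Gamma \in \mathcal{B}_{[a_1,\infty)}$. Then, by Lemma 7.3.1,

$$\mathbb{P}\bigl(\{B(t)\in\Gamma\}\cap A_1(t)\bigr) = \mathbb{P}\bigl(\{2a_1 - B(t)\in\Gamma\}\cap A_1(t)\bigr) = \mathbb{P}\bigl(B(t)\in 2a_1-\Gamma\bigr) - \mathbb{P}\bigl(\{B(t)\in 2a_1-\Gamma\}\cap A_2(t)\bigr),$$

since $B(t) \in 2a_1 - \Gamma \implies B(t) \le a_1 \implies \zeta^{(a_1,a_2)} \le t$. Similarly,

$$\mathbb{P}\bigl(\{B(t)\in\Gamma\}\cap A_2(t)\bigr) = \mathbb{P}\bigl(\{2a_2 - B(t)\in\Gamma\}\cap A_2(t)\bigr) = \mathbb{P}\bigl(B(t)\in 2a_2-\Gamma\bigr) - \mathbb{P}\bigl(\{B(t)\in 2a_2-\Gamma\}\cap A_1(t)\bigr)$$

when $\Gamma \in \mathcal{B}_{(-\infty,a_2]}$. Hence, since $2a_1 - \Gamma \subseteq (-\infty,a_1] \subseteq (-\infty,a_2]$ if $\Gamma \in \mathcal{B}_{[a_1,\infty)}$,

$$\mathbb{P}\bigl(\{B(t)\in\Gamma\}\cap A_1(t)\bigr) = \mathbb{P}\bigl(B(t)\in 2a_1-\Gamma\bigr) - \mathbb{P}\bigl(B(t)\in 2(a_2-a_1)+\Gamma\bigr) + \mathbb{P}\bigl(\{B(t)\in 2(a_2-a_1)+\Gamma\}\cap A_1(t)\bigr)$$

when $\Gamma \in \mathcal{B}_{[a_1,\infty)}$. Similarly, when $\Gamma \in \mathcal{B}_{(-\infty,a_2]}$,

$$\mathbb{P}\bigl(\{B(t)\in\Gamma\}\cap A_2(t)\bigr) = \mathbb{P}\bigl(B(t)\in 2a_2-\Gamma\bigr) - \mathbb{P}\bigl(B(t)\in -2(a_2-a_1)+\Gamma\bigr) + \mathbb{P}\bigl(\{B(t)\in -2(a_2-a_1)+\Gamma\}\cap A_2(t)\bigr).$$

To check that $\mathbb{P}\bigl(\{B(t)\in\Gamma\}\cap A_1(t)\bigr) - \mathbb{P}\bigl(\{B(t)\in 2(a_2-a_1)+\Gamma\}\cap A_1(t)\bigr) \ge 0$ when $\Gamma \in \mathcal{B}_{[a_1,\infty)}$, first use Theorem 7.1.16 to see that

$$\mathbb{P}\bigl(\{B(t)\in\Gamma\}\cap A_1(t)\bigr) = \mathbb{E}^{\mathbb{P}}\bigl[\gamma_{0,t-\zeta^{(a_1,a_2)}}(\Gamma - a_1),\ A_1(t)\bigr].$$

Second, observe that, because $\Gamma \subseteq [a_1,\infty)$, $\gamma_{0,\tau}\bigl(2(a_2-a_1)+\Gamma\bigr) \le \gamma_{0,\tau}(\Gamma)$ for all $\tau \ge 0$. The case when $\Gamma \in \mathcal{B}_{(-\infty,a_2]}$ and $A_1(t)$ is replaced by $A_2(t)$ is handled in the same way.

Given the preceding, one can use induction to check that $\mathbb{P}\bigl(\{B(t)\in\Gamma\}\cap A_1(t)\bigr)$ equals

$$\sum_{m=1}^{M}\Bigl[\mathbb{P}\bigl(B(t)\in 2a_1 - 2(m-1)(a_2-a_1) - \Gamma\bigr) - \mathbb{P}\bigl(B(t)\in 2m(a_2-a_1)+\Gamma\bigr)\Bigr] + \mathbb{P}\bigl(\{B(t)\in 2M(a_2-a_1)+\Gamma\}\cap A_1(t)\bigr)$$

for all $\Gamma \in \mathcal{B}_{[a_1,\infty)}$. The same line of reasoning applies when $\Gamma \in \mathcal{B}_{(-\infty,a_2]}$ and $A_1(t)$ is replaced by $A_2(t)$.

Perhaps the most useful consequence of the preceding is the following corollary.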
Corollary 7.3.4. Given a $c \in \mathbb{R}$ and an $r \in (0,\infty)$, set $I = (c-r,c+r)$ and

$$P^I(t,x,\Gamma) = \mathbb{P}\bigl(\{x + B(t)\in\Gamma\}\cap\{\zeta^I > t\}\bigr), \quad x \in I \text{ and } \Gamma \in \mathcal{B}_I.$$

Then

$$P^I(s+t,x,\Gamma) = \int_I P^I(t,z,\Gamma)\,P^I(s,x,dz). \tag{7.3.5}$$

Next, set

$$\tilde g(t,x) = \sum_{m\in\mathbb{Z}} g(t,x+4m), \quad\text{where } g(t,x) = (2\pi t)^{-\frac12}e^{-\frac{x^2}{2t}},$$

and

$$p^{(-1,1)}(t,x,y) = \tilde g(t,y-x) - \tilde g(t,y+x+2) \quad\text{for } (t,x,y) \in (0,\infty)\times[-1,1]^2.$$

Then $p^{(-1,1)}$ is a smooth function that is symmetric in $(x,y)$, strictly positive on $(0,\infty)\times(-1,1)^2$, and vanishes when $x \in \{-1,1\}$. Finally, if

$$p^I(t,x,y) = r^{-1}p^{(-1,1)}\bigl(r^{-2}t,\ r^{-1}(x-c),\ r^{-1}(y-c)\bigr), \quad (t,x,y) \in (0,\infty)\times I^2,$$

then

$$p^I(s+t,x,y) = \int_I p^I(s,x,z)\,p^I(t,z,y)\,dz \tag{7.3.6}$$

and, for $(t,x) \in (0,\infty)\times I$, $P^I(t,x,dy) = p^I(t,x,y)\,dy$.

Proof: Begin by applying Theorem 7.1.16 to check that $P^I(s+t,x,\Gamma)$ equals

$$\begin{aligned}
&W^{(1)}\Bigl(\{x+\psi(s)+\delta_s\psi(t)\in\Gamma\}\cap\{x+\psi(s)+\delta_s\psi(\tau)\in I,\ \tau\in[0,t]\}\cap\{x+\psi(\sigma)\in I,\ \sigma\in[0,s]\}\Bigr)\\
&\quad= \mathbb{E}^{W^{(1)}}\Bigl[P^I\bigl(t,x+\psi(s),\Gamma\bigr),\ \{x+\psi(\sigma)\in I,\ \sigma\in[0,s]\}\Bigr] = \int_I P^I(t,z,\Gamma)\,P^I(s,x,dz).
\end{aligned}$$

Next, set $a_1 = r^{-1}(c-x) - 1$ and $a_2 = r^{-1}(c-x) + 1$. Then

$$\begin{aligned}
P^I(t,x,\Gamma) &= \mathbb{P}\bigl(\{B(t)\in\Gamma-x\}\cap\{B(\tau)\in(ra_1,ra_2),\ \tau\in[0,t]\}\bigr)\\
&= \mathbb{P}\bigl(\{B(r^{-2}t)\in r^{-1}(\Gamma-x)\}\cap\{B(r^{-2}\tau)\in(a_1,a_2),\ \tau\in[0,t]\}\bigr)\\
&= \mathbb{P}\bigl(B(r^{-2}t)\in r^{-1}(\Gamma-x)\ \&\ \zeta^{(a_1,a_2)} > r^{-2}t\bigr)\\
&= \mathbb{P}\bigl(B(r^{-2}t)\in r^{-1}(\Gamma-x)\bigr) - \mathbb{P}\bigl(B(r^{-2}t)\in r^{-1}(\Gamma-x)\ \&\ \zeta^{(a_1,a_2)} \le r^{-2}t\bigr),
\end{aligned}$$
where, in the passage to the second line, I have used Brownian scaling. Now, use the last part of Theorem 7.3.3, the symmetry of $\gamma_{0,r^{-2}t}$, and elementary rearrangement of terms to arrive first at

$$P^I(t,x,\Gamma) = \sum_{m\in\mathbb{Z}}\Bigl[\gamma_{0,r^{-2}t}\bigl(4m + r^{-1}(\Gamma-x)\bigr) - \gamma_{0,r^{-2}t}\bigl(4m + 2 + r^{-1}(\Gamma+x-2c)\bigr)\Bigr],$$

and then at $P^I(t,x,dy) = p^I(t,x,y)\,dy$. Given this and (7.3.5), (7.3.6) is obvious.

Turning to the properties of $p^{(-1,1)}(t,x,y)$, both its symmetry and smoothness are clear. In addition, as the density for $P^{(-1,1)}(t,x,\,\cdot\,)$, it is non-negative, and, because $x \rightsquigarrow \tilde g(t,x)$ is periodic with period 4, it is easy to see that $p^{(-1,1)}(t,\pm1,y) = 0$. Thus, everything comes down to proving that $p^{(-1,1)}(t,x,y) > 0$ for $(t,x,y) \in (0,\infty)\times(-1,1)^2$. To this end, first observe that, after rearranging terms, one can write $p^{(-1,1)}(t,x,y)$ as

$$\begin{aligned}
&g(t,y-x) - g(t,y+x+2) - g(t,2-x-y)\\
&\quad+ \sum_{m=1}^{\infty}\Bigl[g(t,y-x+4m) - g(t,y+x+2+4m) + g(t,y-x-4m) - g(t,y+x-2-4m)\Bigr].
\end{aligned}$$

Since each of the terms in the sum over $m \in \mathbb{Z}^+$ is positive, we have that

$$p^{(-1,1)}(t,x,y) > g(t,y-x)\Bigl[1 - 2e^{-\frac{2(1-|x|)(1-|y|)}{t}}\Bigr] \ge \bigl(1 - 2e^{-1}\bigr)\,g(t,y-x)$$

if $t \le 2(1-|x|)(1-|y|)$. Hence, for each $\theta \in (0,1)$, $p^{(-1,1)}(t,x,y) > 0$ for all $(t,x,y) \in (0,2\theta^2]\times[-1+\theta,1-\theta]^2$. Finally, to handle $x, y \in [-1+\theta,1-\theta]$ and $t > 2\theta^2$, apply (7.3.6) with $I = (-1,1)$ to see that

$$p^{(-1,1)}\bigl((m+1)\theta^2,x,y\bigr) \ge \int_{|z|\le 1-\theta} p^{(-1,1)}(\theta^2,x,z)\,p^{(-1,1)}(m\theta^2,z,y)\,dz,$$

and use this and induction to see that $p^{(-1,1)}(m\theta^2,x,y) > 0$ for all $m \ge 1$. Thus, if $n \in \mathbb{Z}^+$ is chosen so that $n\theta^2 < t \le (n+1)\theta^2$, then another application of (7.3.6) shows that

$$p^{(-1,1)}(t,x,y) \ge \int_{|z|\le 1-\theta} p^{(-1,1)}(t-n\theta^2,x,z)\,p^{(-1,1)}(n\theta^2,z,y)\,dz > 0.$$
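The properties of $p^{(-1,1)}$ asserted in Corollary 7.3.4 lend themselves to a quick numerical sanity check. The sketch below is not part of the proof and rests on two assumptions of mine: the series for $\tilde g$ is truncated symmetrically (the parameter `terms` is an arbitrary cutoff, adequate only for moderate $t$), and the integral in (7.3.6) is approximated by a midpoint rule. The function names are invented for the illustration.

```python
import math


def g(t, x):
    """Centered Gaussian density with variance t."""
    return math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)


def g_tilde(t, x, terms=20):
    """Symmetric truncation of the 4-periodic sum of g(t, x + 4m) over integers m."""
    return sum(g(t, x + 4.0 * m) for m in range(-terms, terms + 1))


def p(t, x, y, terms=20):
    """Candidate density p^(-1,1)(t, x, y) from Corollary 7.3.4."""
    return g_tilde(t, y - x, terms) - g_tilde(t, y + x + 2.0, terms)


def chapman_kolmogorov(s, t, x, y, n=1000):
    """Midpoint-rule approximation of the integral over (-1, 1) in (7.3.6)."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        z = -1.0 + (i + 0.5) * h
        total += p(s, x, z) * p(t, z, y)
    return h * total


if __name__ == "__main__":
    s, t, x, y = 0.1, 0.2, 0.3, -0.4
    print(chapman_kolmogorov(s, t, x, y), p(s + t, x, y))  # should nearly agree
    print(p(0.2, 1.0, 0.5), p(0.2, -1.0, 0.5))             # boundary values near 0
```

Agreement of the two printed numbers in the first line is consistent with (7.3.6) for $I = (-1,1)$, and the near-zero boundary values reflect the period-4 cancellation used in the proof.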
Exercises for § 7.3
Exercise 7.3.7. Suppose that $G$ is a non-empty, open subset of $\mathbb{R}^N$, define $\zeta_x^G : C(\mathbb{R}^N) \longrightarrow [0,\infty]$ by $\zeta_x^G(\psi) = \inf\{t \ge 0 : x + \psi(t) \notin G\}$, and set

$$P^G(t,x,\Gamma) = W^{(N)}\bigl(\{\psi : x+\psi(t)\in\Gamma\ \&\ \zeta_x^G(\psi) > t\}\bigr)$$

for $(t,x) \in (0,\infty)\times G$ and $\Gamma \in \mathcal{B}_G$.

(i) Show that

$$P^G(s+t,x,\Gamma) = \int_G P^G(t,z,\Gamma)\,P^G(s,x,dz).$$

(ii) As an application of Exercise 7.1.25, show that

$$P^G(t,x,\Gamma) = \gamma_{0,tI}(\Gamma - x) - \mathbb{E}^{W^{(N)}}\Bigl[\gamma_{0,(t-\zeta_x^G)I}\bigl(\Gamma - x - \psi(\zeta_x^G)\bigr),\ \zeta_x^G \le t\Bigr].$$

This is the probabilistic version of Duhamel’s Formula, which we will see again in § 10.3.1.

(iii) As a consequence of (ii), show that there is a Borel measurable function $p^G : (0,\infty)\times G^2 \longrightarrow [0,\infty)$ such that $(t,y) \rightsquigarrow p^G(t,x,y)$ is continuous for each $x \in G$ and $P^G(t,x,dy) = p^G(t,x,y)\,dy$ for each $(t,x) \in (0,\infty)\times G$. In particular, use this in conjunction with (i) to conclude that

$$p^G(s+t,x,y) = \int_G p^G(t,z,y)\,p^G(s,x,z)\,dz.$$

Hint: Keep in mind that $(\tau,\xi) \rightsquigarrow (2\pi\tau)^{-\frac{N}{2}}e^{-\frac{|\xi|^2}{2\tau}}$ is smooth and bounded as long as $\xi$ stays away from the origin.

(iv) Given $c = (c_1,\dots,c_N) \in \mathbb{R}^N$ and $r > 0$, let $Q(c,r)$ denote the open cube $\prod_{i=1}^N (c_i - r, c_i + r)$, and show that (cf. Corollary 7.3.4)

$$p^{Q(c,r)}(t,x,y) = \prod_{i=1}^N p^{(c_i-r,c_i+r)}(t,x_i,y_i)$$

for $x = (x_1,\dots,x_N)$, $y = (y_1,\dots,y_N) \in Q(c,r)$. In particular, conclude that $p^{Q(c,r)}(t,x,y)$ is uniformly positive on compact subsets of $(0,\infty)\times Q(c,r)^2$.

(v) Assume that $G$ is connected, and show that $p^G(t,x,y)$ is uniformly positive on compact subsets of $(0,\infty)\times G^2$.

Hint: If $Q(c,r) \subseteq G$, show that $p^G(t,x,y) \ge p^{Q(c,r)}(t,x,y)$ on $(0,\infty)\times Q(c,r)^2$.
Chapter 8 Gaussian Measures on a Banach Space
As I said at the end of § 4.3.2, the distribution of Brownian motion is called Wiener measure because Wiener was the first to construct it. Wiener’s own thinking about his measure had little or nothing in common with the Lévy–Khinchine program. Instead, he looked upon his measure as a Gaussian measure on an infinite dimensional space, and most of what he did with his measure is best understood from that perspective. Thus, in this chapter, we will look at Wiener measure from a strictly Gaussian point of view. More generally, we will be dealing here with measures on a real Banach space $E$ that are centered Gaussian in the sense that, for each $x^*$ in the dual space $E^*$, $x \in E \longmapsto \langle x,x^*\rangle \in \mathbb{R}$ is a centered Gaussian random variable. Not surprisingly, such a measure will be said to be a centered Gaussian measure on $E$. Although the ideas that I will use are already implicit in Wiener’s work, it was I. Segal and his school, especially L. Gross,¹ who gave them the form presented here.

§ 8.1 The Classical Wiener Space

In order to motivate what follows, it is helpful to first understand Wiener measure from the point of view which I will be adopting here.

§ 8.1.1. Classical Wiener Measure. Up until now I have been rather casual about the space from which Brownian paths come. Namely, because Brownian paths are continuous, I have thought of their distribution as being a probability on the space $C(\mathbb{R}^N) = C\bigl([0,\infty);\mathbb{R}^N\bigr)$. In general, there is no harm done by choosing $C(\mathbb{R}^N)$ as the sample space for Brownian paths. However, for my purposes here, I need my sample spaces to be separable Banach spaces, and, although it is a complete, separable metric space, $C(\mathbb{R}^N)$ is not a Banach space. With this in mind, define $\Theta(\mathbb{R}^N)$ to be the space of continuous paths $\theta : [0,\infty) \longrightarrow \mathbb{R}^N$ with the properties that $\theta(0) = 0$ and $\lim_{t\to\infty} t^{-1}|\theta(t)| = 0$.

¹ See I.E. Segal’s “Distributions in Hilbert space and canonical systems of operators,” T.A.M.S., 88 (1958) and L. Gross’s “Abstract Wiener spaces,” Proc. 5th Berkeley Symp. on Prob. & Stat., 2 (1965), Univ. of California Press. A good exposition of this topic can be found in H.-H. Kuo’s Gaussian Measures in Banach Spaces, Springer-Verlag, Math. Lec. Notes., # 463 (1975).
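Lemma 8.1.1. The map

$$\psi \in C(\mathbb{R}^N) \longmapsto \|\psi\|_{\Theta(\mathbb{R}^N)} \equiv \sup_{t\ge0}\frac{|\psi(t)|}{1+t} \in [0,\infty]$$

is lower semicontinuous, and the pair $\bigl(\Theta(\mathbb{R}^N),\|\cdot\|_{\Theta(\mathbb{R}^N)}\bigr)$ is a separable Banach space that is continuously embedded as a Borel measurable subset of $C(\mathbb{R}^N)$. In particular, $\mathcal{B}_{\Theta(\mathbb{R}^N)}$ coincides with $\mathcal{B}_{C(\mathbb{R}^N)}[\Theta(\mathbb{R}^N)] = \{A\cap\Theta(\mathbb{R}^N) : A \in \mathcal{B}_{C(\mathbb{R}^N)}\}$. Moreover, the dual space $\Theta(\mathbb{R}^N)^*$ of $\Theta(\mathbb{R}^N)$ can be identified with the space of $\mathbb{R}^N$-valued, Borel measures $\lambda$ on $[0,\infty)$ with the properties that $\lambda(\{0\}) = 0$ and¹

$$\|\lambda\|_{\Theta(\mathbb{R}^N)^*} \equiv \int_{[0,\infty)} (1+t)\,|\lambda|(dt) < \infty,$$

when the duality relation is given by

$$\langle\theta,\lambda\rangle = \int_{[0,\infty)} \theta(t)\cdot\lambda(dt).$$

Finally, if $(B(t),\mathcal{F}_t,\mathbb{P})$ is an $\mathbb{R}^N$-valued Brownian motion, then $B \in \Theta(\mathbb{R}^N)$ P-almost surely and $\mathbb{E}^{\mathbb{P}}\bigl[\|B\|_{\Theta(\mathbb{R}^N)}^2\bigr] \le 32N$.

Proof: It is obvious that the inclusion map taking $\Theta(\mathbb{R}^N)$ into $C(\mathbb{R}^N)$ is continuous. To see that $\|\cdot\|_{\Theta(\mathbb{R}^N)}$ is lower semicontinuous on $C(\mathbb{R}^N)$ and that $\Theta(\mathbb{R}^N) \in \mathcal{B}_{C(\mathbb{R}^N)}$, note that, for any $s \in [0,\infty)$ and $R \in (0,\infty)$,

$$A(s,R) \equiv \bigl\{\psi \in C(\mathbb{R}^N) : |\psi(t)| \le R(1+t) \text{ for } t \ge s\bigr\}$$

is closed in $C(\mathbb{R}^N)$. Hence, since $\|\psi\|_{\Theta(\mathbb{R}^N)} \le R \iff \psi \in A(0,R)$, $\|\cdot\|_{\Theta(\mathbb{R}^N)}$ is lower semicontinuous. In addition, since $\{\psi \in C(\mathbb{R}^N) : \psi(0) = 0\}$ is also closed,

$$\Theta(\mathbb{R}^N) = \bigcap_{n=1}^{\infty}\bigcup_{m=1}^{\infty}\bigl\{\psi \in A\bigl(m,\tfrac1n\bigr) : \psi(0) = 0\bigr\} \in \mathcal{B}_{C(\mathbb{R}^N)}.$$

In order to analyze the space $\bigl(\Theta(\mathbb{R}^N),\|\cdot\|_{\Theta(\mathbb{R}^N)}\bigr)$, define

$$F : \Theta(\mathbb{R}^N) \longrightarrow C_0\bigl(\mathbb{R};\mathbb{R}^N\bigr) \equiv \Bigl\{\psi \in C\bigl(\mathbb{R};\mathbb{R}^N\bigr) : \lim_{|s|\to\infty}|\psi(s)| = 0\Bigr\}$$

by

$$\bigl[F(\theta)\bigr](s) = \frac{\theta(e^s)}{1+e^s}, \quad s \in \mathbb{R}.$$

¹ I use $|\lambda|$ to denote the variation measure determined by $\lambda$.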
As is well known, $C_0\bigl(\mathbb{R};\mathbb{R}^N\bigr)$ with the uniform norm is a separable Banach space, and it is obvious that $F$ is an isometry from $\Theta(\mathbb{R}^N)$ onto $C_0\bigl(\mathbb{R};\mathbb{R}^N\bigr)$. Moreover, by the Riesz Representation Theorem for $C_0\bigl(\mathbb{R};\mathbb{R}^N\bigr)$, one knows that the dual of $C_0\bigl(\mathbb{R};\mathbb{R}^N\bigr)$ is isometric to the space of totally finite, $\mathbb{R}^N$-valued measures on $\bigl(\mathbb{R};\mathcal{B}_{\mathbb{R}}\bigr)$ with the norm given by total variation. Hence, the identification of $\Theta(\mathbb{R}^N)^*$ reduces to the obvious interpretation of the adjoint map $F^*$ as a mapping from totally finite $\mathbb{R}^N$-valued measures onto the space of $\mathbb{R}^N$-valued measures that do not charge 0 and whose variation measure integrates $(1+t)$.

Because of the Strong Law in part (ii) of Exercise 4.3.11, it is clear that almost every Brownian path is in $\Theta(\mathbb{R}^N)$. In addition, by the Brownian scaling property and Doob’s Inequality (cf. Theorem 7.1.9),

$$\mathbb{E}^{\mathbb{P}}\bigl[\|B\|_{\Theta(\mathbb{R}^N)}^2\bigr] \le \sum_{n=0}^{\infty} 4^{-n+1}\,\mathbb{E}^{\mathbb{P}}\Bigl[\sup_{0\le t\le 2^n}|B(t)|^2\Bigr] = \sum_{n=0}^{\infty} 2^{-n+2}\,\mathbb{E}^{\mathbb{P}}\Bigl[\sup_{0\le t\le 1}|B(t)|^2\Bigr] \le 32\,\mathbb{E}^{\mathbb{P}}\bigl[|B(1)|^2\bigr] = 32N.$$
In view of Lemma 8.1.1, we now know that the distribution of $\mathbb{R}^N$-valued Brownian motion induces a Borel measure $W^{(N)}$ on the separable Banach space $\Theta(\mathbb{R}^N)$, and throughout this chapter I will refer to this measure as the classical Wiener measure. My next goal is to characterize, in terms of $\Theta(\mathbb{R}^N)$, exactly which measure on $\Theta(\mathbb{R}^N)$ Wiener’s is, and for this purpose I will use the following simple fact about Borel probability measures on a separable Banach space.

Lemma 8.1.2. Let $E$ with norm $\|\cdot\|_E$ be a separable, real Banach space, and use $(x,x^*) \in E\times E^* \longmapsto \langle x,x^*\rangle \in \mathbb{R}$ to denote the duality relation between $E$ and its dual space $E^*$. Then the Borel field $\mathcal{B}_E$ coincides with the σ-algebra generated by the maps $x \in E \longmapsto \langle x,x^*\rangle$ as $x^*$ runs over $E^*$. In particular, if, for $\mu \in M_1(E)$, one defines its Fourier transform $\hat\mu : E^* \longrightarrow \mathbb{C}$ by

$$\hat\mu(x^*) = \int_E \exp\bigl[\sqrt{-1}\,\langle x,x^*\rangle\bigr]\,\mu(dx), \quad x^* \in E^*,$$

then $\hat\mu$ is a continuous function of weak* convergence on $E^*$, and $\hat\mu$ uniquely determines $\mu$ in the sense that if $\nu$ is a second element of $M_1(E)$ and $\hat\mu = \hat\nu$, then $\mu = \nu$.

Proof: Since it is clear that each of the maps $x \in E \longmapsto \langle x,x^*\rangle \in \mathbb{R}$ is continuous and therefore $\mathcal{B}_E$-measurable, the first assertion will follow as soon
as we show that the norm $x \rightsquigarrow \|x\|_E$ can be expressed as a measurable function of these maps. But, because $E$ is separable, we know (cf. Exercise 5.1.19) that the closed unit ball $B_{E^*}(0,1)$ in $E^*$ is separable with respect to the weak* topology and therefore that we can find a sequence $\{x_n^* : n \ge 1\} \subseteq B_{E^*}(0,1)$ so that

$$\|x\|_E = \sup_{n\in\mathbb{Z}^+}\langle x,x_n^*\rangle, \quad x \in E.$$

Turning to the properties of $\hat\mu$, note that its continuity with respect to weak* convergence is an immediate consequence of Lebesgue’s Dominated Convergence Theorem. Furthermore, in view of the preceding, we will know that $\hat\mu$ completely determines $\mu$ as soon as we show that, for each $n \in \mathbb{Z}^+$ and $X^* = (x_1^*,\dots,x_n^*) \in (E^*)^n$, $\hat\mu$ determines the marginal distribution $\mu_{X^*} \in M_1(\mathbb{R}^n)$ of $x \in E \longmapsto \bigl(\langle x,x_1^*\rangle,\dots,\langle x,x_n^*\rangle\bigr) \in \mathbb{R}^n$ under $\mu$. But this is clear (cf. Lemma 2.3.3), since

$$\widehat{\mu_{X^*}}(\xi) = \hat\mu\Bigl(\sum_{m=1}^{n}\xi_m x_m^*\Bigr) \quad\text{for } \xi = (\xi_1,\dots,\xi_n) \in \mathbb{R}^n.$$
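I will now compute the Fourier transform of $W^{(N)}$. To this end, first recall that, for an $\mathbb{R}^N$-valued Brownian motion, $\bigl\{\bigl(\xi,B(t)\bigr)_{\mathbb{R}^N} : t \ge 0 \text{ and } \xi \in \mathbb{R}^N\bigr\}$ spans a Gaussian family $G(B)$ in $L^2(\mathbb{P};\mathbb{R})$. Hence, $\mathrm{span}\bigl\{\bigl(\xi,\theta(t)\bigr)_{\mathbb{R}^N} : t \ge 0 \text{ and } \xi \in \mathbb{R}^N\bigr\}$ is a Gaussian family in $L^2(W^{(N)};\mathbb{R})$. From this, combined with an easy limit argument using Riemann sum approximations, one sees that, for any $\lambda \in \Theta(\mathbb{R}^N)^*$, $\theta \rightsquigarrow \langle\theta,\lambda\rangle$ is a centered Gaussian random variable under $W^{(N)}$. Furthermore, because, for $0 \le s \le t$,

$$\mathbb{E}^{W^{(N)}}\bigl[\bigl(\xi,\theta(s)\bigr)_{\mathbb{R}^N}\bigl(\eta,\theta(t)\bigr)_{\mathbb{R}^N}\bigr] = \mathbb{E}^{W^{(N)}}\bigl[\bigl(\xi,\theta(s)\bigr)_{\mathbb{R}^N}\bigl(\eta,\theta(s)\bigr)_{\mathbb{R}^N}\bigr] = s\,(\xi,\eta)_{\mathbb{R}^N},$$

we can apply Fubini’s Theorem to see that

$$\mathbb{E}^{W^{(N)}}\bigl[\langle\theta,\lambda\rangle^2\bigr] = \iint_{[0,\infty)^2} s\wedge t\ \lambda(ds)\cdot\lambda(dt).$$

Therefore, we now know that $W^{(N)}$ is characterized by its Fourier transform

$$\widehat{W^{(N)}}(\lambda) = \exp\Bigl[-\frac12\iint_{[0,\infty)^2} s\wedge t\ \lambda(ds)\cdot\lambda(dt)\Bigr], \quad \lambda \in \Theta(\mathbb{R}^N)^*. \tag{8.1.3}$$

Equivalently, we have shown that $W^{(N)}$ is the centered Gaussian measure on $\Theta(\mathbb{R}^N)$ with the property that, for each $\lambda \in \Theta(\mathbb{R}^N)^*$, $\theta \rightsquigarrow \langle\theta,\lambda\rangle$ is a centered Gaussian random variable with variance equal to $\iint_{[0,\infty)^2} s\wedge t\ \lambda(ds)\cdot\lambda(dt)$.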
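The variance formula can be made concrete when $\lambda$ is a finite sum of point masses. The sketch below is an illustration under assumptions of mine: $N = 1$, $\lambda = \sum_i c_i\delta_{t_i}$ with $0 < t_1 < \dots < t_n$, and invented function names. It computes the variance of $\sum_i c_i\theta(t_i)$ in two ways, once from the kernel $s\wedge t$ and once by rewriting the sum in terms of independent increments, whose variances are the gaps $t_k - t_{k-1}$.

```python
def variance_via_min_kernel(times, weights):
    """Double sum of c_i * c_j * min(t_i, t_j), the discrete form of the s ^ t integral."""
    return sum(
        ci * cj * min(ti, tj)
        for ti, ci in zip(times, weights)
        for tj, cj in zip(times, weights)
    )


def variance_via_increments(times, weights):
    """Variance of sum_i c_i * theta(t_i), written through independent increments.

    The increment over (t_{k-1}, t_k] enters with coefficient c_k + ... + c_n
    and has variance t_k - t_{k-1}; `times` must be increasing and positive.
    """
    total = 0.0
    previous_time = 0.0
    for k, time_k in enumerate(times):
        tail_weight = sum(weights[k:])
        total += (time_k - previous_time) * tail_weight * tail_weight
        previous_time = time_k
    return total


if __name__ == "__main__":
    times = [0.5, 1.0, 2.5]
    weights = [1.0, -2.0, 0.75]
    print(variance_via_min_kernel(times, weights))
    print(variance_via_increments(times, weights))
```

The two functions return the same number, which is the finite-dimensional content of the Fubini step leading to (8.1.3).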
§ 8.1.2. The Classical Cameron–Martin Space. From the Gaussian standpoint, it is extremely unfortunate that the natural home for Wiener measure is a Banach space rather than a Hilbert space. Indeed, in finite dimensions, every centered, Gaussian measure with non-degenerate covariance can be thought of as the canonical, or standard, Gaussian measure on a Hilbert space. Namely, if $\gamma_{0,C}$ is the Gaussian measure on $\mathbb{R}^N$ with mean 0 and non-degenerate covariance $C$, consider $\mathbb{R}^N$ as a Hilbert space $H$ with inner product $(g,h)_H = (g,Ch)_{\mathbb{R}^N}$, and take $\lambda_H$ to be the natural Lebesgue measure there: the one that assigns measure 1 to a unit cube in $H$ or, equivalently, the one obtained by pushing the usual Lebesgue measure $\lambda_{\mathbb{R}^N}$ forward under the linear transformation $C^{\frac12}$. Then we can write

$$\gamma_{0,C}(dh) = \frac{1}{(2\pi)^{\frac{N}{2}}}\,e^{-\frac{\|h\|_H^2}{2}}\,\lambda_H(dh) \quad\text{and}\quad \widehat{\gamma_{0,C}}(h) = e^{-\frac{\|h\|_H^2}{2}}.$$

As was already pointed out in Exercise 3.1.11, in infinite dimensions there is no precise analog of the preceding canonical representation (cf. Exercise 8.1.7 for further corroboration of this point). Nonetheless, a good deal of insight can be gained by seeing how close one can come. In order to guess on which Hilbert space it is that $W^{(N)}$ would like to live, I will give R. Feynman’s highly questionable but remarkably powerful way of thinking about such matters. Namely, given $n \in \mathbb{Z}^+$, $0 = t_0 < t_1 < \dots < t_n$, and a set $A \in \mathcal{B}_{(\mathbb{R}^N)^n}$, we know that $W^{(N)}$ assigns $\bigl\{\theta : \bigl(\theta(t_1),\dots,\theta(t_n)\bigr) \in A\bigr\}$ probability

$$\frac{1}{Z(t_1,\dots,t_n)}\int_A \exp\Bigl[-\sum_{m=1}^{n}\frac{|y_m - y_{m-1}|^2}{2(t_m - t_{m-1})}\Bigr]\,dy_1\cdots dy_n,$$

where $y_0 \equiv 0$ and $Z(t_1,\dots,t_n) = \prod_{m=1}^{n}\bigl(2\pi(t_m - t_{m-1})\bigr)^{\frac{N}{2}}$. Now rename the variable $y_m$ as $\theta(t_m)$, and rewrite the preceding as $Z(t_1,\dots,t_n)^{-1}$ times

$$\int_A \exp\Bigl[-\sum_{m=1}^{n}\frac{t_m - t_{m-1}}{2}\Bigl(\frac{\theta(t_m) - \theta(t_{m-1})}{t_m - t_{m-1}}\Bigr)^2\Bigr]\,d\theta(t_1)\cdots d\theta(t_n).$$

Obviously, nothing very significant has happened yet, since nothing very exciting has been done yet. However, if we now close our eyes, suspend our disbelief, and pass to the limit as $n$ tends to infinity and the $t_k$’s become dense, we arrive at Feynman’s representation² of Wiener’s measure:

$$W^{(N)}(d\theta) = \frac{1}{Z}\exp\Bigl[-\frac12\int_{[0,\infty)}\bigl|\dot\theta(t)\bigr|^2\,dt\Bigr]\,d\theta, \tag{8.1.4}$$

² In truth, Feynman himself never dabbled in considerations so mundane as the ones that follow. He was interested in the Schrödinger equation, and so he had a factor $\sqrt{-1}$ multiplying the exponent.
8 Gaussian Measures on a Banach Space
where $\dot\theta$ denotes the velocity (i.e., derivative) of $\theta$. Of course, when we reopen our eyes and take a look at (8.1.4), we see that it is riddled with flaws. Not even one of the ingredients on the right-hand side of (8.1.4) makes sense! In the first place, the constant $Z$ must be 0 (or maybe $\infty$). Secondly, since the image of the "measure $d\theta$" under
$$\theta \in \Theta(\mathbb{R}^N) \longmapsto \bigl(\theta(t_1),\dots,\theta(t_n)\bigr) \in (\mathbb{R}^N)^n$$
is Lebesgue measure for every $n \in \mathbb{Z}^+$ and $0 < t_1 < \cdots < t_n$, $d\theta$ must be the nonexistent translation invariant measure on the infinite dimensional space $\Theta(\mathbb{R}^N)$. Finally, the integral in the exponent only makes sense if $\theta$ is differentiable in some sense, but almost no Brownian path is. Nonetheless, ridiculous as it is, (8.1.4) is exactly the expression at which one would arrive if one were to make a sufficiently naïve interpretation of the notion that Wiener measure is the standard Gauss measure on the Hilbert space $\mathbf{H}^1(\mathbb{R}^N)$ consisting of absolutely continuous $h : [0,\infty) \longrightarrow \mathbb{R}^N$ with $h(0) = 0$ and
$$\|h\|_{\mathbf{H}^1(\mathbb{R}^N)} = \|\dot h\|_{L^2([0,\infty);\mathbb{R}^N)} < \infty.$$

Of course, the preceding discussion is entirely heuristic. However, now that we know that $\mathbf{H}^1(\mathbb{R}^N)$ is the Hilbert space at which to look, it is easy to provide a mathematically rigorous statement of the connection between $\Theta(\mathbb{R}^N)$, $\mathcal{W}^{(N)}$, and $\mathbf{H}^1(\mathbb{R}^N)$. To this end, observe that $\mathbf{H}^1(\mathbb{R}^N)$ is continuously embedded in $\Theta(\mathbb{R}^N)$ as a dense subspace. Indeed, if $h \in \mathbf{H}^1(\mathbb{R}^N)$, then $|h(t)| \le t^{1/2}\|h\|_{\mathbf{H}^1(\mathbb{R}^N)}$, and so not only is $h \in \Theta(\mathbb{R}^N)$ but also $\|h\|_{\Theta(\mathbb{R}^N)} \le \frac12\|h\|_{\mathbf{H}^1(\mathbb{R}^N)}$. In addition, since $C_c^\infty\bigl((0,\infty);\mathbb{R}^N\bigr)$ is already dense in $\Theta(\mathbb{R}^N)$, the density of $\mathbf{H}^1(\mathbb{R}^N)$ in $\Theta(\mathbb{R}^N)$ is clear. Knowing this, abstract reasoning (cf. Lemma 8.2.3) guarantees that $\Theta(\mathbb{R}^N)^*$ can be identified as a subspace of $\mathbf{H}^1(\mathbb{R}^N)$. That is, for each $\lambda \in \Theta(\mathbb{R}^N)^*$, there is an $h_\lambda \in \mathbf{H}^1(\mathbb{R}^N)$ with the property that $\bigl(h,h_\lambda\bigr)_{\mathbf{H}^1(\mathbb{R}^N)} = \langle h,\lambda\rangle$
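for all $h \in \mathbf{H}^1(\mathbb{R}^N)$, and in the present setting it is easy to give a concrete representation of $h_\lambda$. In fact, if $\lambda \in \Theta(\mathbb{R}^N)^*$, then, for any $h \in \mathbf{H}^1(\mathbb{R}^N)$,
$$\langle h,\lambda\rangle = \int_{(0,\infty)} h(t)\cdot\lambda(dt) = \int_{(0,\infty)}\left(\int_{(0,t)} \dot h(\tau)\,d\tau\right)\cdot\lambda(dt) = \int_{(0,\infty)} \dot h(\tau)\cdot\lambda\bigl((\tau,\infty)\bigr)\,d\tau = \bigl(h,h_\lambda\bigr)_{\mathbf{H}^1(\mathbb{R}^N)},$$
where
$$h_\lambda(t) = \int_{(0,t]} \lambda\bigl((\tau,\infty)\bigr)\,d\tau.$$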
§ 8.1 The Classical Wiener Space
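Moreover,
$$\|h_\lambda\|^2_{\mathbf{H}^1(\mathbb{R}^N)} = \int_{(0,\infty)} \bigl|\lambda\bigl((\tau,\infty)\bigr)\bigr|^2\,d\tau = \int_{(0,\infty)}\left(\iint_{(\tau,\infty)^2} \lambda(ds)\cdot\lambda(dt)\right) d\tau = \iint_{(0,\infty)^2} s\wedge t\;\lambda(ds)\cdot\lambda(dt).$$
Hence, by (8.1.3),
$$\widehat{\mathcal{W}^{(N)}}(\lambda) = \exp\left(-\frac{\|h_\lambda\|^2_{\mathbf{H}^1(\mathbb{R}^N)}}{2}\right), \qquad \lambda \in \Theta(\mathbb{R}^N)^*. \tag{8.1.5}$$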
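For a concrete feel for the identity $\|h_\lambda\|^2 = \iint s\wedge t\,\lambda(ds)\lambda(dt)$, one can test it on a finitely supported $\lambda$. The sketch below is an illustration under stated assumptions, not part of the text: it takes $N = 1$ and $\lambda = \sum_i a_i\delta_{t_i}$ with $t_i > 0$, represents $\lambda$ as a list of hypothetical `(t, a)` pairs, and uses exact rationals. In that case $\tau \rightsquigarrow \lambda((\tau,\infty))$ is a step function, so the first integral is a finite sum.

```python
from fractions import Fraction


def norm_sq_via_tail(atoms):
    """Integrate |lambda((tau, oo))|^2 over tau in (0, oo) for a discrete measure.

    `atoms` is a list of (t, a) pairs meaning lambda = sum of a * delta_t, t > 0.
    """
    atoms = sorted(atoms)                 # order the atoms by location
    total = Fraction(0)
    prev_t = Fraction(0)
    tail = sum(a for _, a in atoms)       # lambda((tau, oo)) before the first atom
    for t, a in atoms:
        total += (t - prev_t) * tail ** 2  # tail is constant on [prev_t, t)
        tail -= a                          # passing t removes its mass from the tail
        prev_t = t
    return total


def norm_sq_via_min(atoms):
    """Double sum of min(s, t) * a_s * a_t, the discrete form of the double integral."""
    return sum(a * b * min(s, t) for s, a in atoms for t, b in atoms)


if __name__ == "__main__":
    lam = [(Fraction(1), Fraction(2)), (Fraction(3), Fraction(-1)),
           (Fraction(1, 2), Fraction(5))]
    print(norm_sq_via_tail(lam), norm_sq_via_min(lam))
```

The kernel $s\wedge t$ that appears here is the covariance of one-dimensional Brownian motion, which is why (8.1.3) turns this computation into (8.1.5).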
Although (8.1.5) is far less intuitively appealing than (8.1.4), it provides a mathematically rigorous way in which to think of $\mathcal{W}^{(N)}$ as the standard Gaussian measure on $\mathbf{H}^1(\mathbb{R}^N)$. Furthermore, there is another way to understand why one should accept (8.1.5) as evidence for this way of thinking about $\mathcal{W}^{(N)}$. Indeed, given $\lambda \in \Theta(\mathbb{R}^N)^*$, write
$$\langle\theta,\lambda\rangle = \lim_{T\to\infty}\int_{[0,T]} \theta(t)\cdot\lambda(dt) = -\lim_{T\to\infty}\int_0^T \theta(t)\cdot d\lambda\bigl((t,\infty)\bigr),$$
where the integral in the last expression is taken in the sense of Riemann–Stieltjes. Next, apply the integration by parts formula³ to conclude that $t \rightsquigarrow \lambda\bigl((t,\infty)\bigr)$ is Riemann–Stieltjes integrable with respect to $t \rightsquigarrow \theta(t)$ and that
$$-\int_0^T \theta(t)\cdot d\lambda\bigl((t,\infty)\bigr) = -\theta(T)\cdot\lambda\bigl((T,\infty)\bigr) + \int_0^T \lambda\bigl((t,\infty)\bigr)\cdot d\theta(t).$$
Hence, since
$$\lim_{T\to\infty} |\theta(T)|\,|\lambda|\bigl((T,\infty)\bigr) \le \lim_{T\to\infty}\frac{|\theta(T)|}{1+T}\int_{(0,\infty)} (1+t)\,|\lambda|(dt) = 0,$$
$$\langle\theta,\lambda\rangle = \lim_{T\to\infty}\int_0^T \dot h_\lambda(t)\cdot d\theta(t), \tag{8.1.6}$$
where again the integral is in the sense of Riemann–Stieltjes. Thus, if one somewhat casually writes $d\theta(t) = \dot\theta(t)\,dt$, one can believe that $\langle\theta,\lambda\rangle$ provides a reasonable interpretation of $\bigl(\theta,h_\lambda\bigr)_{\mathbf{H}^1(\mathbb{R}^N)}$ for all $\theta \in \Theta(\mathbb{R}^N)$, not just those that are in $\mathbf{H}^1(\mathbb{R}^N)$. Because R. Cameron and T. Martin were the first mathematicians to systematically exploit the consequences of this line of reasoning, I will call $\mathbf{H}^1(\mathbb{R}^N)$ the Cameron–Martin space for classical Wiener measure.

³ See, for example, Theorem 1.2.7 in my A Concise Introduction to the Theory of Integration, Birkhäuser (1999).
Exercises for § 8.1
Exercise 8.1.7. Let $H$ be a separable Hilbert space, and, for each $n \in \mathbb{Z}^+$ and subset $\{g_1,\dots,g_n\} \subseteq H$, let $\mathcal{A}(g_1,\dots,g_n)$ denote the σ-algebra over $H$ generated by the mapping
$$h \in H \longmapsto \bigl((h,g_1)_H,\dots,(h,g_n)_H\bigr) \in \mathbb{R}^n,$$
and check that
$$\mathcal{A} = \bigcup\bigl\{\mathcal{A}(g_1,\dots,g_n) : n \in \mathbb{Z}^+ \text{ and } g_1,\dots,g_n \in H\bigr\}$$
is an algebra that generates $\mathcal{B}_H$. Show that there always exists a finitely additive $\mathcal{W}_H$ on $\mathcal{A}$ that is uniquely determined by the properties that it is σ-additive on $\mathcal{A}(g_1,\dots,g_n)$ for every $n \in \mathbb{Z}^+$ and $\{g_1,\dots,g_n\} \subseteq H$ and that
$$\int_H \exp\bigl[\sqrt{-1}\,(h,g)_H\bigr]\,\mathcal{W}_H(dh) = \exp\left(-\frac{\|g\|_H^2}{2}\right), \qquad g \in H.$$
On the other hand, as we already know, this finitely additive measure admits a countably additive extension to $\mathcal{B}_H$ if and only if $H$ is finite dimensional.

§ 8.2 A Structure Theorem for Gaussian Measures

Say that a centered Gaussian measure $\mathcal{W}$ on a separable Banach space $E$ is non-degenerate if $\mathbb{E}^{\mathcal{W}}\bigl[\langle x,x^*\rangle^2\bigr] > 0$ unless $x^* = 0$. (See Exercise 8.2.11.) In this section I will show that any non-degenerate, centered Gaussian measure $\mathcal{W}$ on a separable Banach space $E$ shares the same basic structure that $\mathcal{W}^{(N)}$ has on $\Theta(\mathbb{R}^N)$. In particular, I will show that there is always a Hilbert space $H \subseteq E$ for which $\mathcal{W}$ is the standard Gauss measure in the same sense that $\mathcal{W}^{(N)}$ was shown in § 8.1.2 to be the standard Gauss measure for $\mathbf{H}^1(\mathbb{R}^N)$.

§ 8.2.1. Fernique's Theorem. In order to carry out my program, I need a basic integrability result about Banach space–valued, Gaussian random variables. The one that I will use is due to X. Fernique, and his is arguably the most singularly beautiful result in the theory of Gaussian measures on a Banach space.

Theorem 8.2.1 (Fernique's Theorem). Let $E$ be a real, separable Banach space, and suppose that $X$ is an $E$-valued random variable that is centered and Gaussian in the sense that, for each $x^* \in E^*$, $\langle X,x^*\rangle$ is a centered, $\mathbb{R}$-valued Gaussian random variable. If $R = \inf\bigl\{r : \mathbb{P}(\|X\|_E \le r) \ge \frac34\bigr\}$, then
$$\mathbb{E}\Bigl[e^{\frac{\|X\|_E^2}{18R^2}}\Bigr] \le K \equiv e^{\frac12} + \sum_{n=0}^\infty \Bigl(\frac{e}{3}\Bigr)^{2^n}. \tag{8.2.2}$$
(See Corollary 8.4.3 for a sharpened statement.)
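Proof: After enlarging the sample space if necessary, I may and will assume that there is an $E$-valued random variable $X'$ that is independent of $X$ and has the same distribution as $X$. Set $Y = 2^{-\frac12}(X + X')$ and $Y' = 2^{-\frac12}(X - X')$. Then the pair $(Y,Y')$ has the same distribution as the pair $(X,X')$. Indeed, by Lemma 8.1.2, this comes down to showing that the $\mathbb{R}^2$-valued random variable $\bigl(\langle Y,x^*\rangle,\langle Y',x^*\rangle\bigr)$ has the same distribution as $\bigl(\langle X,x^*\rangle,\langle X',x^*\rangle\bigr)$, and that is an elementary application of the additivity property of independent Gaussians.

Turning to the main assertion, let $0 < s \le t$ be given, and use the preceding to justify
$$\begin{aligned}
\mathbb{P}\bigl(\|X\|_E \le s\bigr)\,\mathbb{P}\bigl(\|X\|_E \ge t\bigr) &= \mathbb{P}\bigl(\|X\|_E \le s \ \&\ \|X'\|_E \ge t\bigr) \\
&= \mathbb{P}\bigl(\|X - X'\|_E \le 2^{\frac12}s \ \&\ \|X + X'\|_E \ge 2^{\frac12}t\bigr) \\
&\le \mathbb{P}\bigl(\bigl|\|X\|_E - \|X'\|_E\bigr| \le 2^{\frac12}s \ \&\ \|X\|_E + \|X'\|_E \ge 2^{\frac12}t\bigr) \\
&\le \mathbb{P}\bigl(\|X\|_E \wedge \|X'\|_E \ge 2^{-\frac12}(t - s)\bigr) = \mathbb{P}\bigl(\|X\|_E \ge 2^{-\frac12}(t - s)\bigr)^2.
\end{aligned}$$
Now suppose that $\mathbb{P}\bigl(\|X\|_E \le R\bigr) \ge \frac34$, and define $\{t_n : n \ge 0\}$ by $t_0 = R$ and $t_n = R + 2^{\frac12}t_{n-1}$ for $n \ge 1$. Then
$$\mathbb{P}\bigl(\|X\|_E \le R\bigr)\,\mathbb{P}\bigl(\|X\|_E \ge t_n\bigr) \le \mathbb{P}\bigl(\|X\|_E \ge t_{n-1}\bigr)^2$$
and therefore
$$\frac{\mathbb{P}\bigl(\|X\|_E \ge t_n\bigr)}{\mathbb{P}\bigl(\|X\|_E \le R\bigr)} \le \left(\frac{\mathbb{P}\bigl(\|X\|_E \ge t_{n-1}\bigr)}{\mathbb{P}\bigl(\|X\|_E \le R\bigr)}\right)^2$$
for $n \ge 1$. Working by induction, one gets from this that
$$\frac{\mathbb{P}\bigl(\|X\|_E \ge t_n\bigr)}{\mathbb{P}\bigl(\|X\|_E \le R\bigr)} \le \left(\frac{\mathbb{P}\bigl(\|X\|_E \ge R\bigr)}{\mathbb{P}\bigl(\|X\|_E \le R\bigr)}\right)^{2^n}$$
and therefore, since $t_n = R\,\dfrac{2^{\frac{n+1}{2}} - 1}{2^{\frac12} - 1} \le 3\cdot 2^{\frac{n+1}{2}}R$, that $\mathbb{P}\bigl(\|X\|_E \ge 3\cdot 2^{\frac{n+1}{2}}R\bigr) \le 3^{-2^n}$. Hence,
$$\mathbb{E}^{\mathbb{P}}\Bigl[e^{\frac{\|X\|_E^2}{18R^2}}\Bigr] \le e^{\frac12}\,\mathbb{P}\bigl(\|X\|_E \le 3R\bigr) + \sum_{n=0}^\infty e^{2^n}\,\mathbb{P}\bigl(3\cdot 2^{\frac n2}R \le \|X\|_E \le 3\cdot 2^{\frac{n+1}{2}}R\bigr) \le e^{\frac12} + \sum_{n=0}^\infty \Bigl(\frac e3\Bigr)^{2^n} = K.$$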
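The constants in this proof are explicit, so they are easy to examine numerically. The following Python sketch is only an illustration (the function names `fernique_K` and `t_sequence` and the truncation level `terms` are assumptions of mine, not notation from the text). It evaluates a truncation of the series defining $K$, whose terms decay doubly exponentially, and checks the closed form and the bound for the recursion $t_n = R + 2^{1/2}t_{n-1}$.

```python
import math


def fernique_K(terms=12):
    """Truncated value of K = e^{1/2} + sum_{n>=0} (e/3)^(2^n)."""
    return math.exp(0.5) + sum((math.e / 3.0) ** (2 ** n) for n in range(terms))


def t_sequence(R, n_max):
    """The sequence t_0 = R, t_n = R + sqrt(2) * t_{n-1}, for n = 0..n_max."""
    ts = [R]
    for _ in range(n_max):
        ts.append(R + math.sqrt(2.0) * ts[-1])
    return ts


if __name__ == "__main__":
    print("K is approximately", fernique_K())
    R = 1.0
    for n, t in enumerate(t_sequence(R, 5)):
        closed_form = R * (2 ** ((n + 1) / 2) - 1) / (math.sqrt(2.0) - 1)
        bound = 3 * 2 ** ((n + 1) / 2) * R
        print(n, t, closed_form, bound)
```

Since $e/3 < 1$, successive squaring drives the terms of the series to zero very quickly, which is why a dozen terms already determine $K$ to machine precision.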
§ 8.2.2. The Basic Structure Theorem. I will now abstract the relationship, proved in § 8.1.2, between Θ(RN ), H1 (RN ), and W (N ) , and for this purpose I will need the following simple lemma.
Lemma 8.2.3. Let $E$ be a separable, real Banach space, and suppose that $H \subseteq E$ is a real Hilbert space that is continuously embedded as a dense subspace of $E$.

(i) For each $x^* \in E^*$ there is a unique $h_{x^*} \in H$ with the property that $\bigl(h,h_{x^*}\bigr)_H = \langle h,x^*\rangle$ for all $h \in H$, and the map $x^* \in E^* \longmapsto h_{x^*} \in H$ is linear, continuous, one-to-one, and onto a dense subspace of $H$.

(ii) If $x \in E$, then $x \in H$ if and only if there is a $K < \infty$ such that $|\langle x,x^*\rangle| \le K\|h_{x^*}\|_H$ for all $x^* \in E^*$. Moreover, for each $h \in H$, $\|h\|_H = \sup\{\langle h,x^*\rangle : x^* \in E^* \ \&\ \|h_{x^*}\|_H \le 1\}$.

(iii) If $L^*$ is a weak* dense subspace of $E^*$, then there exists a sequence $\{x_n^* : n \ge 0\} \subseteq L^*$ such that $\{h_{x_n^*} : n \ge 0\}$ is an orthonormal basis for $H$. Moreover, if $x \in E$, then $x \in H$ if and only if $\sum_{n=0}^\infty \langle x,x_n^*\rangle^2 < \infty$. Finally,
$$\bigl(h,h'\bigr)_H = \sum_{n=0}^\infty \langle h,x_n^*\rangle\langle h',x_n^*\rangle \quad\text{for all } h,h' \in H.$$
Proof: Because $H$ is continuously embedded in $E$, there exists a $C < \infty$ such that $\|h\|_E \le C\|h\|_H$. Thus, if $x^* \in E^*$ and $f(h) = \langle h,x^*\rangle$, then $f$ is linear and $|f(h)| \le \|h\|_E\|x^*\|_{E^*} \le C\|x^*\|_{E^*}\|h\|_H$, and so, by the Riesz Representation Theorem for Hilbert spaces, there exists a unique $h_{x^*} \in H$ such that $f(h) = \bigl(h,h_{x^*}\bigr)_H$. In fact, $\|h_{x^*}\|_H \le C\|x^*\|_{E^*}$, and uniqueness can be used to check that $x^* \rightsquigarrow h_{x^*}$ is linear. To see that $x^* \rightsquigarrow h_{x^*}$ is one-to-one, it suffices to show that $x^* = 0$ if $h_{x^*} = 0$. But if $h_{x^*} = 0$, then $\langle h,x^*\rangle = 0$ for all $h \in H$, and therefore, because $H$ is dense in $E$, $x^* = 0$. Because I will use it later, I will prove slightly more than the density of just $\{h_{x^*} : x^* \in E^*\}$ in $H$. Namely, for any weak* dense subset $S^*$ of $E^*$, $\{h_{x^*} : x^* \in S^*\}$ is dense in $H$. Indeed, if this were not the case, then there would exist an $h \in H \setminus \{0\}$ with the property that $\langle h,x^*\rangle = \bigl(h,h_{x^*}\bigr)_H = 0$ for all $x^* \in S^*$. But, since $S^*$ is weak* dense in $E^*$, this would lead to the contradiction that $h = 0$. Thus, (i) is now proved.

Obviously, if $h \in H$, then $|\langle h,x^*\rangle| = |(h,h_{x^*})_H| \le \|h_{x^*}\|_H\|h\|_H$ for $x^* \in E^*$. Conversely, if $x \in E$ and $|\langle x,x^*\rangle| \le K\|h_{x^*}\|_H$ for some $K < \infty$ and all $x^* \in E^*$, set $f(h_{x^*}) = \langle x,x^*\rangle$ for $x^* \in E^*$. Then, because $x^* \rightsquigarrow h_{x^*}$ is one-to-one, $f$ is a well-defined, linear functional on $\{h_{x^*} : x^* \in E^*\}$. Moreover, $|f(h_{x^*})| \le K\|h_{x^*}\|_H$, and therefore, since $\{h_{x^*} : x^* \in E^*\}$ is dense, $f$ admits a unique extension as a continuous, linear functional on $H$. Hence, by Riesz's theorem, there is an $h \in H$ such that
$$\langle x,x^*\rangle = f(h_{x^*}) = \bigl(h,h_{x^*}\bigr)_H = \langle h,x^*\rangle, \qquad x^* \in E^*,$$
which means that $x = h \in H$. In addition, if $h \in H$, then $\|h\|_H = \sup\{\langle h,x^*\rangle : \|h_{x^*}\|_H \le 1\}$ follows from the density of $\{h_{x^*} : x^* \in E^*\}$, and this completes the proof of (ii).
Turning to (iii), remember that, by Exercise 5.1.19, the weak* topology on $E^*$ is second countable. Hence, the weak* topology on $L^*$ is also second countable and therefore separable. Thus, we can find a sequence in $L^*$ that is weak* dense in $E^*$, and then, proceeding as in the hint given for Exercise 5.1.19, extract a subsequence of linearly independent elements whose span $S^*$ is weak* dense in $E^*$. Starting with this subsequence, apply the Gram–Schmidt orthogonalization procedure to produce a sequence $\{x_n^* : n \ge 0\}$ whose span is $S^*$ and for which $\{h_{x_n^*} : n \ge 0\}$ is orthonormal in $H$. Moreover, because the span of $\{h_{x_n^*} : n \ge 0\}$ equals $\{h_{x^*} : x^* \in S^*\}$, which, by what we proved earlier, is dense in $H$, $\{h_{x_n^*} : n \ge 0\}$ is an orthonormal basis in $H$. Knowing this, it is immediate that
$$\bigl(h,h'\bigr)_H = \sum_{n=0}^\infty \bigl(h,h_{x_n^*}\bigr)_H\bigl(h',h_{x_n^*}\bigr)_H = \sum_{n=0}^\infty \langle h,x_n^*\rangle\langle h',x_n^*\rangle.$$
In particular, $\|h\|_H^2 = \sum_{n=0}^\infty \langle h,x_n^*\rangle^2$. Finally, if $x \in E$ and $\sum_{n=0}^\infty \langle x,x_n^*\rangle^2 < \infty$, set $g = \sum_{n=0}^\infty \langle x,x_n^*\rangle h_{x_n^*}$. Then $g \in H$ and $\langle x - g,x^*\rangle = 0$ for all $x^* \in S^*$. Hence, since $S^*$ is weak* dense in $E^*$, $x = g \in H$.

Given a separable real Hilbert space $H$, a separable real Banach space $E$, and a $\mathcal{W} \in M_1(E)$, I will say that the triple $(H,E,\mathcal{W})$ is an abstract Wiener space if $H$ is continuously embedded as a dense subspace of $E$ and $\mathcal{W} \in M_1(E)$ has Fourier transform
$$\widehat{\mathcal{W}}(x^*) = e^{-\frac{\|h_{x^*}\|_H^2}{2}} \quad\text{for all } x^* \in E^*. \tag{8.2.4}$$
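In finite dimensions the objects in this definition can be computed by hand, which makes a quick consistency check possible. The sketch below is an illustration under assumptions, not a construction from the text: it works in the setting of Exercise 8.2.10 with $E = \mathbb{R}^2$ and $(x,y)_H = (x,C^{-1}y)_{\mathbb{R}^2}$, assumes that the representer is $h_{x^*} = Cx^*$, and uses made-up helper names. It verifies the defining identity $(h,h_{x^*})_H = \langle h,x^*\rangle$ and that $\|h_{x^*}\|_H^2 = (x^*,Cx^*)_{\mathbb{R}^2}$, the variance of $\langle x,x^*\rangle$ under $\gamma_{0,C}$.

```python
from fractions import Fraction


def mat_vec(M, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]


def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def inverse_2x2(M):
    """Inverse of an invertible 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def inner_H(C, x, y):
    """Hilbert inner product (x, y)_H = (x, C^{-1} y) on R^2."""
    return dot(x, mat_vec(inverse_2x2(C), y))


def cameron_martin_representer(C, x_star):
    """Candidate representer h_{x*} = C x* of the functional x* (an assumption)."""
    return mat_vec(C, x_star)


if __name__ == "__main__":
    C = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(3)]]
    x_star = [Fraction(1), Fraction(-2)]
    h = [Fraction(3), Fraction(5)]
    h_x = cameron_martin_representer(C, x_star)
    print(inner_H(C, h, h_x) == dot(h, x_star))
    print(inner_H(C, h_x, h_x) == dot(x_star, mat_vec(C, x_star)))
```

Both comparisons are exact because the arithmetic is rational; with floating point input one would compare up to a tolerance instead.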
The terminology is justified by the fact, demonstrated at the end of § 8.1.2, that $\bigl(\mathbf{H}^1(\mathbb{R}^N),\Theta(\mathbb{R}^N),\mathcal{W}^{(N)}\bigr)$ is an abstract Wiener space. The concept of an abstract Wiener space was introduced by Gross, although his description was somewhat different from the one just given (cf. Theorem 8.3.9 for a reconciliation of mine with his definition).

Theorem 8.2.5. Suppose that $E$ is a separable, real Banach space and that $\mathcal{W} \in M_1(E)$ is a centered Gaussian measure that is non-degenerate. Then there exists a unique Hilbert space $H$ such that $(H,E,\mathcal{W})$ is an abstract Wiener space.

Proof: By Fernique's Theorem, we know that $C \equiv \sqrt{\mathbb{E}^{\mathcal{W}}\bigl[\|x\|_E^2\bigr]} < \infty$. To understand the proof of existence, it is best to start with the proof of uniqueness. Thus, suppose that $H$ is a Hilbert space for which $(H,E,\mathcal{W})$ is an abstract Wiener space. Then, for all $x^*, y^* \in E^*$, $\langle h_{x^*},y^*\rangle = \bigl(h_{x^*},h_{y^*}\bigr)_H = \langle h_{y^*},x^*\rangle$. In addition,
$$\langle h_{x^*},x^*\rangle = \|h_{x^*}\|_H^2 = \int \langle x,x^*\rangle^2\,\mathcal{W}(dx),$$
and so, by the symmetry just established,
$$\langle h_{x^*},y^*\rangle = \bigl(h_{x^*},h_{y^*}\bigr)_H = \int \langle x,x^*\rangle\langle x,y^*\rangle\,\mathcal{W}(dx) \tag{*}$$
for all $x^*, y^* \in E^*$. Next observe that
$$\int \bigl\|\langle x,x^*\rangle\,x\bigr\|_E\,\mathcal{W}(dx) \le C\|h_{x^*}\|_H, \tag{**}$$
and therefore that the integral $\int x\langle x,x^*\rangle\,\mathcal{W}(dx)$ is a well-defined element of $E$. Moreover, by (*),
$$\langle h_{x^*},y^*\rangle = \Bigl\langle \int x\langle x,x^*\rangle\,\mathcal{W}(dx),\,y^*\Bigr\rangle$$
for all $y^* \in E^*$, and so
$$h_{x^*} = \int x\langle x,x^*\rangle\,\mathcal{W}(dx). \tag{***}$$
Finally, given $h \in H$, choose $\{x_n^* : n \ge 1\} \subseteq E^*$ so that $h_{x_n^*} \longrightarrow h$ in $H$. Then
$$\lim_{m\to\infty}\sup_{n>m}\bigl\|\langle\,\cdot\,,x_n^*\rangle - \langle\,\cdot\,,x_m^*\rangle\bigr\|_{L^2(\mathcal{W};\mathbb{R})} = \lim_{m\to\infty}\sup_{n>m}\|h_{x_n^*} - h_{x_m^*}\|_H = 0,$$
and so, if $\Psi$ denotes the closure of $\{\langle\,\cdot\,,x^*\rangle : x^* \in E^*\}$ in $L^2(\mathcal{W};\mathbb{R})$ and $F : \Psi \longrightarrow E$ is given by
$$F(\psi) = \int x\,\psi(x)\,\mathcal{W}(dx), \qquad \psi \in \Psi,$$
then $h = F(\psi)$ for some $\psi \in \Psi$. Conversely, if $\psi \in \Psi$ and $\{x_n^* : n \ge 1\}$ is chosen so that $\langle\,\cdot\,,x_n^*\rangle \longrightarrow \psi$ in $L^2(\mathcal{W};\mathbb{R})$, then $\{h_{x_n^*} : n \ge 1\}$ converges in $H$ to some $h \in H$ and it converges in $E$ to $F(\psi)$. Hence, $F(\psi) = h \in H$. In other words, $H = F(\Psi)$.

The proof of existence is now a matter of checking that if $\Psi$ and $F$ are defined as above and if $H = F(\Psi)$ with $\|F(\psi)\|_H = \|\psi\|_{L^2(\mathcal{W};\mathbb{R})}$, then $(H,E,\mathcal{W})$ is an abstract Wiener space. To this end, observe that
$$\langle F(\psi),x^*\rangle = \int \langle x,x^*\rangle\,\psi(x)\,\mathcal{W}(dx) = \bigl(F(\psi),h_{x^*}\bigr)_H,$$
and therefore both (*) and (***) hold for this choice of $H$. Further, given (*), it is clear that $\|h_{x^*}\|_H^2$ is the variance of $\langle\,\cdot\,,x^*\rangle$ and therefore that (8.2.4) holds. At the same time, just as in the derivation of (**), $\|F(\psi)\|_E \le C\|\psi\|_{L^2(\mathcal{W};\mathbb{R})} = C\|F(\psi)\|_H$, and so $H$ is continuously embedded inside $E$. Finally, by the Hahn–Banach Theorem, to show that $H$ is dense in $E$ it suffices to check that the only $x^* \in E^*$ such that $\langle F(\psi),x^*\rangle = 0$ for all $\psi \in \Psi$ is $x^* = 0$. But when $\psi = \langle\,\cdot\,,x^*\rangle$, $\langle F(\psi),x^*\rangle = \int \langle x,x^*\rangle^2\,\mathcal{W}(dx)$, and therefore, because $\mathcal{W}$ is non-degenerate, such an $x^*$ would have to be 0.

§ 8.2.3. The Cameron–Martin Space. Given a centered, non-degenerate Gaussian measure $\mathcal{W}$ on $E$, the Hilbert space $H$ for which $(H,E,\mathcal{W})$ is an abstract Wiener space is called its Cameron–Martin space. Here are a couple of important properties of the Cameron–Martin subspace.
Theorem 8.2.6. If $(H,E,\mathcal{W})$ is an abstract Wiener space, then the map $x^* \in E^* \longmapsto h_{x^*} \in H$ is continuous from the weak* topology on $E^*$ into the strong topology on $H$. In particular, for each $R > 0$, $\{h_{x^*} : x^* \in \overline{B_{E^*}(0,R)}\}$ is a compact subset of $H$, $\overline{B_H(0,R)}$ is a compact subset of $E$, and so $H \in \mathcal{B}_E$. Moreover, when $E$ is infinite dimensional, $\mathcal{W}(H) = 0$. Finally, there is a unique linear, isometric map $\mathcal{I} : H \longrightarrow L^2(\mathcal{W};\mathbb{R})$ such that $\mathcal{I}(h_{x^*}) = \langle\,\cdot\,,x^*\rangle$ for all $x^* \in E^*$, and $\{\mathcal{I}(h) : h \in H\}$ is a Gaussian family in $L^2(\mathcal{W};\mathbb{R})$.

Proof: To prove the initial assertion, remember that $x^* \rightsquigarrow \widehat{\mathcal{W}}(x^*)$ is continuous with respect to the weak* topology. Hence, if $x_k^* \longrightarrow x^*$ in the weak* topology, then
$$\exp\left(-\frac{\|h_{x_k^*} - h_{x^*}\|_H^2}{2}\right) = \widehat{\mathcal{W}}(x_k^* - x^*) \longrightarrow 1,$$
and so $h_{x_k^*} \longrightarrow h_{x^*}$ in $H$.

Given the first assertion, the compactness of $\{h_{x^*} : x^* \in \overline{B_{E^*}(0,R)}\}$ in $H$ follows from the compactness (cf. Exercise 5.1.19) of $\overline{B_{E^*}(0,R)}$ in the weak* topology. To see that $\overline{B_H(0,R)}$ is compact in $E$, again apply Exercise 5.1.19 to check that $\overline{B_H(0,R)}$ is compact in the weak topology on $H$. Therefore, all that we have to show is that the embedding map $h \in H \longmapsto h \in E$ is continuous from the weak topology on $H$ into the strong topology on $E$. Thus, suppose that $h_k \longrightarrow h$ weakly in $H$. Because $\{h_{x^*} : x^* \in \overline{B_{E^*}(0,1)}\}$ is compact in $H$, for each $\epsilon > 0$ there exist an $n \in \mathbb{Z}^+$ and a $\{x_1^*,\dots,x_n^*\} \subseteq \overline{B_{E^*}(0,1)}$ such that
$$\bigl\{h_{x^*} : x^* \in \overline{B_{E^*}(0,1)}\bigr\} \subseteq \bigcup_{m=1}^n B_H(h_{x_m^*},\epsilon).$$
Now choose $\ell$ so that $\max_{1\le m\le n}|\langle h_k - h,x_m^*\rangle| < \epsilon$ for all $k \ge \ell$. Then, for any $x^* \in \overline{B_{E^*}(0,1)}$ and all $k \ge \ell$,
$$|\langle h_k - h,x^*\rangle| \le \epsilon + \min_{1\le m\le n}\bigl|\bigl(h_k - h,\,h_{x^*} - h_{x_m^*}\bigr)_H\bigr| \le \epsilon + 2\epsilon\sup_{k\ge 1}\|h_k\|_H.$$
Since, by the uniform boundedness principle, $\sup_{k\ge1}\|h_k\|_H < \infty$, this proves that $\|h_k - h\|_E = \sup\{\langle h_k - h,x^*\rangle : x^* \in \overline{B_{E^*}(0,1)}\} \longrightarrow 0$ as $k \to \infty$.

Because $H = \bigcup_{n=1}^\infty \overline{B_H(0,n)}$ and $\overline{B_H(0,n)}$ is a compact subset of $E$ for each $n \in \mathbb{Z}^+$, it is clear that $H \in \mathcal{B}_E$. To see that $\mathcal{W}(H) = 0$ when $E$ is infinite dimensional, choose $\{x_n^* : n \ge 0\}$ as in the final part of Lemma 8.2.3, and set $X_n(x) = \langle x,x_n^*\rangle$. Then the $X_n$'s are an infinite sequence of independent, centered Gaussians with variance 1, and so $\sum_{n=0}^\infty X_n^2 = \infty$ $\mathcal{W}$-almost surely. Hence, by Lemma 8.2.3, $\mathcal{W}$-almost no $x$ is in $H$.

Turning to the map $\mathcal{I}$, define $\mathcal{I}(h_{x^*}) = \langle\,\cdot\,,x^*\rangle$. Then, for each $x^*$, $\mathcal{I}(h_{x^*})$ is a centered Gaussian with variance $\|h_{x^*}\|_H^2$, and so $\mathcal{I}$ is a linear isometry from
$\{h_{x^*} : x^* \in E^*\}$ into $L^2(\mathcal{W};\mathbb{R})$. Hence, since $\{h_{x^*} : x^* \in E^*\}$ is dense in $H$, $\mathcal{I}$ admits a unique extension as a linear isometry from $H$ into $L^2(\mathcal{W};\mathbb{R})$. Moreover, as the $L^2(\mathcal{W};\mathbb{R})$-limit of centered Gaussians, $\mathcal{I}(h)$ is a centered Gaussian for each $h \in H$.

The map $\mathcal{I}$ in Theorem 8.2.6 was introduced for the classical Wiener space by Paley and Wiener, and so I will call it the Paley–Wiener map. To appreciate its importance here, observe that $\{h_{x^*} : x^* \in E^*\}$ is the subspace of $g \in H$ with the property that $h \in H \longmapsto (h,g)_H \in \mathbb{R}$ admits a continuous extension to $E$. Even though, when $\dim(H) = \infty$, no such continuous extension exists for general $g \in H$, $\mathcal{I}(g)$ can be thought of as an extension of $h \rightsquigarrow (h,g)_H$, albeit one that is defined only up to a $\mathcal{W}$-null set. Of course, one has to be careful when using this interpretation, since, when $H$ is infinite dimensional, $\mathcal{I}(g)(x)$ for a given $x \in E$ is not well-defined simultaneously for all $g \in H$. Nonetheless, by adopting it, one gets further evidence for the idea that $\mathcal{W}$ wants to be the standard Gauss measure on $H$. Namely, because
$$\mathbb{E}^{\mathcal{W}}\Bigl[e^{\sqrt{-1}\,\mathcal{I}(h)}\Bigr] = e^{-\frac{\|h\|_H^2}{2}}, \qquad h \in H, \tag{8.2.7}$$
if $\mathcal{W}$ lived on $H$, then it would certainly be the standard Gauss measure there.

Perhaps the most important application of the Paley–Wiener map is the following theorem about the behavior of Gaussian measures under translation. That is, if $y \in E$ and $\tau_y : E \longrightarrow E$ is given by $\tau_y(x) = x + y$, we will be looking at the measure $(\tau_y)_*\mathcal{W}$ and its relationship to $\mathcal{W}$. Using the reasoning suggested above, the result is easy to guess. Namely, if $\mathcal{W}$ really lived on $H$ and were given by a Feynman-type representation
$$\mathcal{W}(dh) = \frac1Z\,e^{-\frac{\|h\|_H^2}{2}}\,\lambda_H(dh),$$
then $(\tau_g)_*\mathcal{W}$ should have the Feynman representation
$$\frac1Z\,e^{-\frac{\|h-g\|_H^2}{2}}\,\lambda_H(dh),$$
which could be rewritten as
$$\bigl[(\tau_g)_*\mathcal{W}\bigr](dh) = \exp\Bigl[\bigl(h,g\bigr)_H - \tfrac12\|g\|_H^2\Bigr]\,\mathcal{W}(dh).$$
Hence, if we assume that $\mathcal{I}(g)$ gives us the correct interpretation of $(\,\cdot\,,g)_H$, we are led to guess that, at least for $g \in H$,
$$\bigl[(\tau_g)_*\mathcal{W}\bigr](dx) = R_g(x)\,\mathcal{W}(dx), \quad\text{where } R_g = \exp\Bigl[\mathcal{I}(g) - \tfrac12\|g\|_H^2\Bigr]. \tag{8.2.8}$$
That (8.2.8) is correct was proved for the classical Wiener space by Cameron and Martin, and for this reason it is called the Cameron–Martin formula. In fact, one has the following result, the second half of which is due to Segal.
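In one dimension the Cameron–Martin formula reduces to a statement about ordinary normal densities, and that special case can be checked directly. The following Python sketch makes the simplifying assumptions $E = H = \mathbb{R}$, $\mathcal{W} = \gamma_{0,1}$, and $\mathcal{I}(g)(x) = gx$; the function names are illustrative and not from the text. It confirms pointwise that the density of the translated measure equals $R_g$ times the standard normal density.

```python
import math


def standard_normal_density(x):
    """Density of the standard normal distribution on the real line."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def cameron_martin_factor(g, x):
    """R_g(x) = exp(g x - g^2 / 2), the one-dimensional case of (8.2.8)."""
    return math.exp(g * x - g * g / 2.0)


def shifted_density(g, x):
    """Density of the image of the standard normal under x -> x + g."""
    return standard_normal_density(x - g)


if __name__ == "__main__":
    for g in (-1.5, 0.7, 2.0):
        for x in (-3.0, 0.0, 1.1):
            lhs = shifted_density(g, x)
            rhs = cameron_martin_factor(g, x) * standard_normal_density(x)
            print(g, x, math.isclose(lhs, rhs, rel_tol=1e-12))
```

The check amounts to expanding $(x - g)^2 = x^2 - 2gx + g^2$ in the exponent, which is the same algebra that underlies the heuristic Feynman-type computation above.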
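Theorem 8.2.9. If $(H,E,\mathcal{W})$ is an abstract Wiener space, then, for each $g \in H$, $(\tau_g)_*\mathcal{W} \ll \mathcal{W}$ and the $R_g$ in (8.2.8) is the corresponding Radon–Nikodym derivative. Conversely, if $(\tau_y)_*\mathcal{W}$ is not singular with respect to $\mathcal{W}$, then $y \in H$.

Proof: Let $g \in H$, and set $\mu = (\tau_g)_*\mathcal{W}$. Then
$$\hat\mu(x^*) = \mathbb{E}^{\mathcal{W}}\Bigl[e^{\sqrt{-1}\langle x+g,x^*\rangle}\Bigr] = \exp\Bigl[\sqrt{-1}\langle g,x^*\rangle - \tfrac12\|h_{x^*}\|_H^2\Bigr]. \tag{*}$$
Now define $\nu$ by the right-hand side of (8.2.8). Clearly $\nu \in M_1(E)$. Thus, we will have proved the first part once we show that $\hat\nu$ is given by the right-hand side of (*). To this end, observe that, for any $h_1, h_2 \in H$,
$$\mathbb{E}^{\mathcal{W}}\Bigl[e^{\xi_1\mathcal{I}(h_1) + \xi_2\mathcal{I}(h_2)}\Bigr] = \exp\left[\frac{\xi_1^2}{2}\|h_1\|_H^2 + \xi_1\xi_2\bigl(h_1,h_2\bigr)_H + \frac{\xi_2^2}{2}\|h_2\|_H^2\right]$$
for all $\xi_1, \xi_2 \in \mathbb{C}$. Indeed, this is obvious when $\xi_1$ and $\xi_2$ are pure imaginary, and, since both sides are entire functions of $(\xi_1,\xi_2) \in \mathbb{C}^2$, it follows in general by analytic continuation. In particular, by taking $h_1 = g$, $\xi_1 = 1$, $h_2 = h_{x^*}$, and $\xi_2 = \sqrt{-1}$, it is easy to check that the right-hand side of (*) is equal to $\hat\nu(x^*)$.

To prove the second assertion, begin by recalling from Lemma 8.2.3 that if $y \in E$, then $y \in H$ if and only if there is a $K < \infty$ with the property that $|\langle y,x^*\rangle| \le K$ for all $x^* \in E^*$ with $\|h_{x^*}\|_H = 1$. Now suppose that $(\tau_y)_*\mathcal{W} \not\perp \mathcal{W}$, and let $R$ be the Radon–Nikodym derivative of its absolutely continuous part. Given $x^* \in E^*$ with $\|h_{x^*}\|_H = 1$, let $\mathcal{F}_{x^*}$ be the σ-algebra generated by $x \rightsquigarrow \langle x,x^*\rangle$, and check that $(\tau_y)_*\mathcal{W}\restriction\mathcal{F}_{x^*} \ll \mathcal{W}\restriction\mathcal{F}_{x^*}$ with Radon–Nikodym derivative
$$Y(x) = \exp\left[\langle y,x^*\rangle\langle x,x^*\rangle - \frac{\langle y,x^*\rangle^2}{2}\right].$$
Hence,
$$Y \ge \mathbb{E}^{\mathcal{W}}\bigl[R \,\big|\, \mathcal{F}_{x^*}\bigr] \ge \mathbb{E}^{\mathcal{W}}\bigl[R^{\frac12} \,\big|\, \mathcal{F}_{x^*}\bigr]^2,$$
and so (cf. Exercise 8.2.19)
$$\exp\left(-\frac{\langle y,x^*\rangle^2}{8}\right) = \mathbb{E}^{\mathcal{W}}\bigl[Y^{\frac12}\bigr] \ge \alpha \equiv \mathbb{E}^{\mathcal{W}}\bigl[R^{\frac12}\bigr] \in (0,1].$$
Since this means that $\langle y,x^*\rangle^2 \le 8\log\frac1\alpha$, the proof is complete.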
Exercises for § 8.2

Exercise 8.2.10. Let $C \in \mathrm{Hom}(\mathbb{R}^N;\mathbb{R}^N)$ be positive definite and symmetric, take $E = \mathbb{R}^N$ with the standard Euclidean metric, and let $H = \mathbb{R}^N$ with the Hilbert inner product $(x,y)_H = (x,C^{-1}y)_{\mathbb{R}^N}$. Show that $(H,E,\gamma_{0,C})$ is an abstract Wiener space.
Exercise 8.2.11. Let $E$ be a separable Banach space and $\mathcal{W}$ a centered Gaussian measure on $E$, but do not assume that $\mathcal{W}$ is non-degenerate. Denote by $N$ the set of $x^* \in E^*$ for which $\mathbb{E}^{\mathcal{W}}\bigl[\langle x,x^*\rangle^2\bigr] = 0$, and set
$$\hat E = \{x \in E : \langle x,x^*\rangle = 0 \text{ for all } x^* \in N\}.$$
Show that $\hat E$ is closed, that $\mathcal{W}(\hat E) = 1$, and that $\mathcal{W}\restriction\hat E$ is a non-degenerate, centered Gaussian measure on $\hat E$.

Hint: Since $\mathcal{W}\bigl(\{x \in E : \langle x,x^*\rangle \ne 0\}\bigr) = 0$ for each $x^* \in N$, the only question is whether one can choose a countable subset $C \subseteq N$ such that $x \in \hat E$ if and only if $\langle x,x^*\rangle = 0$ for all $x^* \in C$. For this purpose, recall that, by Exercise 5.1.19, $E^*$ with the weak* topology is second countable and therefore that $N$ is separable with respect to the weak* topology.

Exercise 8.2.12. Let $\{x_n : n \ge 0\}$ be a sequence in the separable Banach space $E$ with the property that $\sum_{n=0}^\infty \|x_n\|_E < \infty$. Show that $\sum_{n=0}^\infty |\xi_n|\|x_n\|_E < \infty$ for $\gamma_{0,1}^{\mathbb{N}}$-almost every $\xi \in \mathbb{R}^{\mathbb{N}}$, and define $X : \mathbb{R}^{\mathbb{N}} \longrightarrow E$ so that $X(\xi) = \sum_{n=0}^\infty \xi_n x_n$ if $\sum_{n=0}^\infty |\xi_n|\|x_n\|_E < \infty$ and $X(\xi) = 0$ otherwise. Show that the distribution $\mu$ of $X$ is a centered, Gaussian measure on $E$. In addition, show that $\mu$ is non-degenerate if and only if the span of $\{x_n : n \ge 0\}$ is dense in $E$.

Exercise 8.2.13. Here is an application of Fernique's Theorem to functional analysis. Let $E$ and $F$ be a pair of separable Banach spaces and $\psi$ a Borel measurable, linear map from $E$ to $F$. Given a centered, Gaussian $E$-valued random variable $X$, use Exercise 2.3.21 to see that $\psi\circ X$ is an $F$-valued, centered Gaussian random variable, and apply Fernique's Theorem to conclude that $\psi\circ X$ is square integrable and has mean value 0. Next, suppose that $\psi$ is not continuous, and choose $\{x_n : n \ge 0\} \subseteq E$ and $\{y_n^* : n \ge 0\} \subseteq F^*$ so that $\|x_n\|_E = 1 = \|y_n^*\|_{F^*}$ and $\langle\psi(x_n),y_n^*\rangle \ge (n+1)^3$. Using Exercise 8.2.12, show that there exist centered, Gaussian $E$-valued random variables $\{X_n : n \ge 0\}$, $\{\overline{X}_n : n \ge 0\}$, and $X$ under $\gamma_{0,1}^{\mathbb{N}}$ such that $X_n(\xi) = (n+1)^{-2}\xi_n x_n$, $X(\xi) = \sum_{n=0}^\infty X_n(\xi)$, and $\overline{X}_n(\xi) = X(\xi) - X_n(\xi)$ for $\gamma_{0,1}^{\mathbb{N}}$-almost every $\xi \in \mathbb{R}^{\mathbb{N}}$. Show that
$$\int \|\psi\circ X(\xi)\|_F^2\,\gamma_{0,1}^{\mathbb{N}}(d\xi) \ge \int \langle\psi\circ X(\xi),y_n^*\rangle^2\,\gamma_{0,1}^{\mathbb{N}}(d\xi) \ge \int \langle\psi\circ X_n(\xi),y_n^*\rangle^2\,\gamma_{0,1}^{\mathbb{N}}(d\xi) \ge (n+1)^2,$$
and thereby arrive at the contradiction that $\psi\circ X \notin L^2(\gamma_{0,1}^{\mathbb{N}};F)$. Conclude that every Borel measurable, linear map from $E$ to $F$ is continuous. Notice that, as a consequence, we know that the Paley–Wiener integral $\mathcal{I}(h)$ of an $h$ in the Cameron–Martin space is equal $\mathcal{W}$-almost everywhere to a Borel measurable, linear function if and only if $h = h_{x^*}$ for some $x^* \in E^*$.
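Exercise 8.2.14. Let $\mathcal{W}$ be a centered, Gaussian measure on a separable Banach space $E$, and set $\sigma = \sqrt{\sum_{m=1}^n a_m^2}$, where $a_1,\dots,a_n \in \mathbb{R}$. If $X_1,\dots,X_n$ are mutually independent, $E$-valued random variables with distribution $\mathcal{W}$ on some probability space $(\Omega,\mathcal{F},\mathbb{P})$, show that the $\mathbb{P}$-distribution of $S \equiv \sum_{m=1}^n a_m X_m$ is the same as the $\mathcal{W}$-distribution of $x \rightsquigarrow \sigma x$. In particular, $\mathbb{E}^{\mathbb{P}}\bigl[\|S\|_E^p\bigr] = \sigma^p\,\mathbb{E}^{\mathcal{W}}\bigl[\|x\|_E^p\bigr]$ for all $p \in [0,\infty)$.

Hint: Using Exercise 8.2.11, reduce to the case when $\mathcal{W}$ is non-degenerate. For this case, let $H$ be the Cameron–Martin space for $\mathcal{W}$ on $E$, and show that
$$\mathbb{E}^{\mathbb{P}}\Bigl[e^{\sqrt{-1}\langle S,x^*\rangle}\Bigr] = e^{-\frac{\sigma^2}{2}\|h_{x^*}\|_H^2} \quad\text{for all } x^* \in E^*.$$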
Exercise 8.2.15. Referring to the setting in Lemma 8.2.3, show that there is a sequence $\{\|\cdot\|_E^{(n)} : n \ge 0\}$ of norms on $E$ each of which is commensurate with $\|\cdot\|_E$ (i.e., $C_n^{-1}\|\cdot\|_E \le \|\cdot\|_E^{(n)} \le C_n\|\cdot\|_E$ for some $C_n \in [1,\infty)$) such that, for each $R > 0$,
$$\overline{B_H(0,R)} = \{x \in E : \|x\|_E^{(n)} \le R \text{ for all } n \ge 0\}.$$
Hint: Choose $\{x_m^* : m \ge 0\} \subseteq E^*$ so that $\{h_{x_m^*} : m \ge 0\}$ is an orthonormal basis for $H$, define $P_n : E \longrightarrow H$ by $P_n x = \sum_{m=0}^n \langle x,x_m^*\rangle h_{x_m^*}$, and set
$$\|x\|_E^{(n)} = \sqrt{\|P_n x\|_H^2 + \|x - P_n x\|_E^2}.$$
Exercise 8.2.16. Referring to the setting in Fernique's Theorem, observe that all powers of $\|X\|_E$ are integrable, and set $\sigma^2 = \mathbb{E}\bigl[\|X\|_E^2\bigr]$. Show that
$$\mathbb{E}\Bigl[e^{\frac{\|X\|_E^2}{72\sigma^2}}\Bigr] \le K.$$
In particular, for any $n \ge 1$, conclude that
$$\mathbb{E}\bigl[\|X\|_E^{2n}\bigr] \le (72)^n\,n!\,K\sigma^{2n},$$
which is remarkably close to the equality that holds when $E = \mathbb{R}$. See Corollary 8.4.3 for a sharper statement.

Exercise 8.2.17. Again let $E$ be a separable, real Banach space. Suppose that $\{X_n : n \ge 1\}$ is a sequence of centered, Gaussian $E$-valued random variables on some probability space $(\Omega,\mathcal{F},\mathbb{P})$ and that $X_n \longrightarrow X$ in $\mathbb{P}$-probability. Show that $X$ is again a centered, Gaussian random variable and that there exists a $\lambda > 0$ for which $\sup_{n\ge1}\mathbb{E}^{\mathbb{P}}\bigl[e^{\lambda\|X_n\|_E^2}\bigr] < \infty$. Conclude, in particular, that $X_n \longrightarrow X$ in $L^p(\mathbb{P};E)$ for every $p \in [1,\infty)$.
Exercise 8.2.18. Given $\lambda \in \Theta(\mathbb{R}^N)^*$, I pointed out at the end of § 8.1.2 that the Paley–Wiener integral $[\mathcal{I}(h_\lambda)](\theta)$ can be interpreted as the Riemann–Stieltjes integral of $\lambda\bigl((s,\infty)\bigr)$ with respect to $\theta(s)$. In this exercise, I will use this observation as the starting point for what is called stochastic integration.

(i) Given $\lambda \in \Theta(\mathbb{R}^N)^*$ and $t > 0$, set $\lambda^t(d\tau) = \mathbf{1}_{[0,t)}(\tau)\lambda(d\tau) + \delta_t\,\lambda\bigl([t,\infty)\bigr)$, and show that for all $\theta \in \Theta(\mathbb{R}^N)$
$$\langle\theta,\lambda^t\rangle = \int_0^t \lambda\bigl((\tau,\infty)\bigr)\cdot d\theta(\tau),$$
where the integral on the right is taken in the sense of Riemann–Stieltjes. In particular, conclude that $t \rightsquigarrow \langle\theta,\lambda^t\rangle$ is continuous for each $\theta$.

(ii) Given $f \in C_c^1\bigl([0,\infty);\mathbb{R}^N\bigr)$, set $\lambda_f(d\tau) = -\dot f(\tau)\,d\tau$, and show that
$$\langle\theta,\lambda_f^t\rangle = \int_0^t f(\tau)\cdot d\theta(\tau),$$
where again the integral on the right is Riemann–Stieltjes. Use this to see that the process
$$\left\{\int_0^t f(\tau)\cdot d\theta(\tau) : t \ge 0\right\}$$
has the same distribution under $\mathcal{W}^{(N)}$ as
$$\left\{B\left(\int_0^t |f(\tau)|^2\,d\tau\right) : t \ge 0\right\}, \tag{*}$$
where $\{B(t) : t \ge 0\}$ is an $\mathbb{R}$-valued Brownian motion.

(iii) Given $f \in L^2_{\mathrm{loc}}\bigl([0,\infty);\mathbb{R}^N\bigr)$ and $t > 0$, set $h_f^t(\tau) = \int_0^{t\wedge\tau} f(s)\,ds$. Show that the $\mathcal{W}^{(N)}$-distribution of the process $\{\mathcal{I}(h_f^t) : t \ge 0\}$ is the same as that of the process in (*). In particular, conclude (cf. part (ii) of Exercise 4.3.16) that there is a continuous modification of the process $\{\mathcal{I}(h_f^t) : t \ge 0\}$. For reasons made clear in (ii), such a continuous modification is denoted by
$$\left\{\int_0^t f(\tau)\cdot d\theta(\tau) : t \ge 0\right\}.$$
Of course, unless $f$ has bounded variation, the integrals in the preceding are no longer interpretable as Riemann–Stieltjes integrals. In fact, they are not even defined $\theta$ by $\theta$ but only as a stochastic process. For this reason, they are called stochastic integrals.
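Exercise 8.2.19. Define $R_g$ as in (8.2.8), and show that
$$\mathbb{E}^{\mathcal{W}}\bigl[R_g^p\bigr]^{\frac1p} = \exp\left[\frac{(p-1)\|g\|_H^2}{2}\right] \quad\text{for all } p \in (0,\infty).$$

Exercise 8.2.20. Here is another way to think about Segal's half of Theorem 8.2.9. Using Lemma 8.2.3, choose $\{x_n^* : n \ge 0\} \subseteq E^*$ so that $\{h_{x_n^*} : n \ge 0\}$ is an orthonormal basis for $H$. Next, define $F : E \longrightarrow \mathbb{R}^{\mathbb{N}}$ so that $F(x)_n = \langle x,x_n^*\rangle$ for each $n \in \mathbb{N}$, and show that $F_*\mathcal{W} = \gamma_{0,1}^{\mathbb{N}}$ and $(F\circ\tau_y)_*\mathcal{W} = \prod_0^\infty \gamma_{a_n,1}$, where $a_n = \langle y,x_n^*\rangle$. Conclude from this that $(\tau_y)_*\mathcal{W} \perp \mathcal{W}$ if $\gamma_{0,1}^{\mathbb{N}} \perp \prod_0^\infty \gamma_{a_n,1}$. Finally, use this together with Exercise 5.2.42 to see that $(\tau_y)_*\mathcal{W} \perp \mathcal{W}$ if $\sum_0^\infty a_m^2 = \infty$, which, by Lemma 8.2.3, will be the case if $y \notin H$.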
§ 8.3 From Hilbert to Abstract Wiener Space

Up to this point I have been assuming that we already have at hand a non-degenerate, centered Gaussian measure $\mathcal{W}$ on a Banach space $E$, and, on the basis of this assumption, I produced the associated Cameron–Martin space $H$. In this section, I will show how one can go in the opposite direction. That is, I will start with a separable, real Hilbert space $H$ and show how to go about finding a separable, real Banach space $E$ for which there exists a $\mathcal{W} \in M_1(E)$ such that $(H,E,\mathcal{W})$ is an abstract Wiener space. Although I will not adopt his approach, the idea of carrying out such a program is Gross's.

Warning: From now on, unless the contrary is explicitly stated, I will be assuming that the spaces with which I am dealing are all infinite dimensional, separable, and real.

§ 8.3.1. An Isomorphism Theorem. Because, at an abstract level, all infinite dimensional, separable Hilbert spaces are the same, one should expect that, in a related sense, the set of all abstract Wiener spaces for which one Hilbert space is the Cameron–Martin space is the same as the set of all abstract Wiener spaces for which any other Hilbert space is the Cameron–Martin space. The following simple result verifies this conjecture.

Theorem 8.3.1. Let $H$ and $H'$ be a pair of Hilbert spaces, and suppose that $F$ is a linear isometry from $H$ onto $H'$. Further, suppose that $(H,E,\mathcal{W})$ is an abstract Wiener space. Then there exists a separable, real Banach space $E' \supseteq H'$ and a linear isometry $\tilde F$ from $E$ onto $E'$ such that $\tilde F\restriction H = F$ and $\bigl(H',E',\tilde F_*\mathcal{W}\bigr)$ is an abstract Wiener space.

Proof: Define $\|h'\|_{E'} = \|F^{-1}h'\|_E$ for $h' \in H'$, and let $E'$ be the Banach space obtained by completing $H'$ with respect to $\|\cdot\|_{E'}$. Trivially, $H'$ is continuously embedded in $E'$ as a dense subspace, and $F$ admits a unique extension $\tilde F$ as an isometry from $E$ onto $E'$. Moreover, if $(x')^* \in (E')^*$ and $\tilde F^\top$ is the adjoint map from $(E')^*$ onto $E^*$, then
$$\bigl(h',h'_{(x')^*}\bigr)_{H'} = \langle h',(x')^*\rangle = \langle F^{-1}h',\tilde F^\top(x')^*\rangle = \bigl(F^{-1}h',h_{\tilde F^\top(x')^*}\bigr)_H = \bigl(h',Fh_{\tilde F^\top(x')^*}\bigr)_{H'},$$
and so $h'_{(x')^*} = Fh_{\tilde F^\top(x')^*}$. Hence,
$$\mathbb{E}^{\tilde F_*\mathcal{W}}\Bigl[e^{\sqrt{-1}\langle x',(x')^*\rangle}\Bigr] = \mathbb{E}^{\mathcal{W}}\Bigl[e^{\sqrt{-1}\langle\tilde Fx,(x')^*\rangle}\Bigr] = \mathbb{E}^{\mathcal{W}}\Bigl[e^{\sqrt{-1}\langle x,\tilde F^\top(x')^*\rangle}\Bigr] = e^{-\frac12\|h_{\tilde F^\top(x')^*}\|_H^2} = e^{-\frac12\|F^{-1}h'_{(x')^*}\|_H^2} = e^{-\frac12\|h'_{(x')^*}\|_{H'}^2},$$
which completes the proof that $\bigl(H',E',\tilde F_*\mathcal{W}\bigr)$ is an abstract Wiener space.

Theorem 8.3.1 says that there is a one-to-one correspondence between the abstract Wiener spaces associated with one Hilbert space and the abstract Wiener spaces associated with any other. In particular, it allows us to prove the theorem of Gross which states that every Hilbert space is the Cameron–Martin space for some abstract Wiener space.
Corollary 8.3.2. Given a separable, real Hilbert space $H$, there exists a separable Banach space $E$ and a $\mathcal W \in \mathbf M_1(E)$ such that $(H, E, \mathcal W)$ is an abstract Wiener space.

Proof: Let $F : \mathbf H^1(\mathbb R) \longrightarrow H$ be an isometric isomorphism, and use Theorem 8.3.1 to construct a separable Banach space $E$ and an isometric isomorphism $\tilde F : \Theta(\mathbb R) \longrightarrow E$ so that $(H, E, \mathcal W)$ is an abstract Wiener space when $\mathcal W = \tilde F_*\mathcal W^{(1)}$.

It is important to recognize that although a non-degenerate, centered Gaussian measure on a Banach space $E$ determines a unique Cameron–Martin space $H$, a given $H$ will be the Cameron–Martin space for an uncountable number of abstract Wiener spaces. For example, in the classical case when $H = \mathbf H^1(\mathbb R^N)$, we could have replaced $\Theta(\mathbb R^N)$ by a subspace which reflected the fact that almost every Brownian path is locally Hölder continuous of any order less than a half. We will see a definitive, general formulation of this point in Corollary 8.3.10.

§ 8.3.2. Wiener Series. The proof that I gave of Corollary 8.3.2 is too nonconstructive to reveal much about the relationship between $H$ and the abstract Wiener spaces for which it is the Cameron–Martin space. Thus, in this subsection I will develop another, entirely different way of constructing abstract Wiener spaces for a Hilbert space. The approach here has its origins in one of Wiener's own constructions of Brownian motion and is based on the following line of reasoning. Given $H$, choose an orthonormal basis $\{h_n : n \ge 0\}$. If there were a standard Gauss measure $\mathcal W$ on $H$, then the random variables $\{X_n : n \ge 0\}$ given by $X_n(h) = (h, h_n)_H$ would be independent, standard normal, $\mathbb R$-valued random variables, and, for each $h \in H$, $\sum_{n=0}^\infty X_n(h)h_n$ would converge in $H$ to $h$. Even though $\mathcal W$ cannot live on $H$, this line of reasoning suggests that a way to construct an abstract Wiener space is to start with a sequence $\{X_n : n \ge 0\}$ of $\mathbb R$-valued, independent standard normal random variables on some probability space, find a Banach space $E$ in which $\sum_{n=0}^\infty X_nh_n$ converges with probability 1, and take $\mathcal W$ on $E$ to be the distribution of this series.
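As a rough numerical illustration of this recipe (a sketch under stated assumptions, not part of the argument), the following Python fragment works on $[0,1]$ with $N = 1$ and uses the sine basis that appears in Exercise 8.3.19, namely $h_0(t) = t$ and $h_m(t) = 2^{1/2}\sin(\pi m t)/(\pi m)$ for $m \ge 1$. The function names and the truncation level are hypothetical choices made only for this example. The deterministic part checks that the partial sums of $\sum_m h_m(s)h_m(t)$ approach $s \wedge t$, which is the covariance one expects of the limiting series.

```python
import math
import random


def wiener_basis(m, t):
    """h_m(t) on [0, 1]: h_0(t) = t and h_m(t) = sqrt(2) sin(pi m t) / (pi m) for m >= 1."""
    if m == 0:
        return t
    return math.sqrt(2.0) * math.sin(math.pi * m * t) / (math.pi * m)


def truncated_kernel(s, t, n_terms):
    """Partial sum over 0 <= m <= n_terms of h_m(s) h_m(t); it should approach min(s, t)."""
    return sum(wiener_basis(m, s) * wiener_basis(m, t) for m in range(n_terms + 1))


def sample_partial_sum(times, n_terms, rng):
    """One sample of S_n(t) = sum_{m <= n} X_m h_m(t), with X_m independent N(0, 1)."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_terms + 1)]
    return [sum(x * wiener_basis(m, t) for m, x in enumerate(xs)) for t in times]


if __name__ == "__main__":
    # Covariance check: the truncated kernel is close to min(0.3, 0.7) = 0.3.
    print(truncated_kernel(0.3, 0.7, 2000))
    # A single approximate path evaluated on a coarse grid.
    grid = [k / 10.0 for k in range(11)]
    print(sample_partial_sum(grid, 200, random.Random(0)))
```

The tail of the kernel series is bounded by $\sum_{m>n} 2/(\pi m)^2$, so the truncation error decays like $1/n$; this says nothing about uniform convergence of the random series, which is the substantive issue addressed in the text.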
To convince oneself that this line of reasoning has a chance of leading somewhere, one should observe that Lévy's construction corresponds to a particular choice of the orthonormal basis $\{h_m : m \ge 0\}$.¹ To see this, determine $\{\dot h_{k,n} : (k,n) \in \mathbb N^2\}$ by

$$\dot h_{k,0} = \mathbf 1_{[k,k+1)} \quad\text{and}\quad \dot h_{k,n} = 2^{\frac{n-1}{2}}\begin{cases} 1 & \text{on } \bigl[k2^{1-n}, (2k+1)2^{-n}\bigr) \\ -1 & \text{on } \bigl[(2k+1)2^{-n}, (k+1)2^{1-n}\bigr) \\ 0 & \text{elsewhere} \end{cases} \quad\text{for } n \ge 1.$$

Clearly, the $\dot h_{k,n}$'s are orthonormal in $L^2\bigl([0,\infty);\mathbb R\bigr)$. In addition, for each $n \in \mathbb N$, the span of $\{\dot h_{k,m} : k \in \mathbb N \text{ and } 0 \le m \le n\}$ equals that of $\{\mathbf 1_{[k2^{-n},(k+1)2^{-n})} : k \in \mathbb N\}$. Perhaps the easiest way to check this is to do so by dimension counting. That is, for a given $(\ell, n) \in \mathbb N^2$, note that

$$\{\dot h_{\ell,0}\} \cup \{\dot h_{k,m} : \ell 2^{m-1} \le k < (\ell+1)2^{m-1} \text{ and } 1 \le m \le n\}$$

has the same number of elements as $\{\mathbf 1_{[k2^{-n},(k+1)2^{-n})} : \ell 2^n \le k < (\ell+1)2^n\}$ and that the first set is contained in the span of the second. As a consequence, we know that $\{\dot h_{k,n} : (k,n) \in \mathbb N^2\}$ is an orthonormal basis in $L^2\bigl([0,\infty);\mathbb R\bigr)$, and so, if $h_{k,n}(t) = \int_0^t \dot h_{k,n}(\tau)\,d\tau$ and $(e_1,\dots,e_N)$ is an orthonormal basis in $\mathbb R^N$, then $\{h_{k,n,i} \equiv h_{k,n}e_i : (k,n,i) \in \mathbb N^2 \times \{1,\dots,N\}\}$ is an orthonormal basis, known as the Haar basis, in $\mathbf H^1(\mathbb R^N)$. Finally, if $\{X_{k,n,i} : (k,n,i) \in \mathbb N^2 \times \{1,\dots,N\}\}$ is a family of independent, $N(0,1)$-random variables and $X_{k,n} = \sum_{i=1}^N X_{k,n,i}e_i$, then

$$\sum_{m=0}^n \sum_{k=0}^\infty \sum_{i=1}^N X_{k,m,i}h_{k,m,i}(t) = \sum_{m=0}^n \sum_{k=0}^\infty h_{k,m}(t)X_{k,m}$$

is precisely the polygonalization that I denoted by $B_n(t)$ in Lévy's construction (cf. § 4.3.2).

¹ The observation that Lévy's construction (cf. § 4.3.2) can be interpreted in terms of a Wiener series is due to Z. Ciesielski. To be more precise, initially Ciesielski himself was thinking entirely in terms of orthogonal series and did not realize that he was giving a re-interpretation of Lévy's construction. Only later did the connection become clear.

The construction by Wiener, alluded to above, was essentially the same, only he chose a different basis for $\mathbf H^1(\mathbb R^N)$. Wiener took $\dot h_{k,0}(t) = \mathbf 1_{[k,k+1)}(t)$ for $k \in \mathbb N$ and $\dot h_{k,n}(t) = 2^{\frac12}\mathbf 1_{[k,k+1)}(t)\cos\bigl(\pi n(t-k)\bigr)$ for $(k,n) \in \mathbb N \times \mathbb Z^+$, which means that he was looking at the series

$$\sum_{k=0}^\infty (t-k)\mathbf 1_{[k,k+1)}(t)X_{k,0} + \sum_{(k,n) \in \mathbb N \times \mathbb Z^+} \mathbf 1_{[k,k+1)}(t)\,\frac{2^{\frac12}\sin\bigl(\pi n(t-k)\bigr)}{\pi n}\,X_{k,n},$$
where again $\{X_{k,n} : (k,n) \in \mathbb N^2\}$ is a family of independent, $\mathbb R^N$-valued, $N(0,I)$-random variables. The reason why Lévy's choice is easier to handle than Wiener's is that, in Lévy's case, for each $n \in \mathbb Z^+$ and $t \in [0,\infty)$, $h_{k,n}(t) \ne 0$ for precisely one $k \in \mathbb N$. Wiener's choice has no such property.

With these preliminaries, the following theorem should come as no surprise.

Theorem 8.3.3. Let $H$ be an infinite dimensional, separable, real Hilbert space and $E$ a Banach space into which $H$ is continuously embedded as a dense subspace. If for some orthonormal basis $\{h_m : m \ge 0\}$ in $H$ the series

$$\sum_{m=0}^\infty \xi_mh_m \text{ converges in } E \text{ for } \gamma_{0,1}^{\mathbb N}\text{-almost every } \xi = (\xi_0,\dots,\xi_m,\dots) \in \mathbb R^{\mathbb N} \tag{8.3.4}$$

and if $S : \mathbb R^{\mathbb N} \longrightarrow E$ is given by

$$S(\xi) = \begin{cases} \sum_{m=0}^\infty \xi_mh_m & \text{when the series converges in } E \\ 0 & \text{otherwise,} \end{cases}$$

then $(H, E, \mathcal W)$ with $\mathcal W = S_*\gamma_{0,1}^{\mathbb N}$ is an abstract Wiener space. Conversely, if $(H, E, \mathcal W)$ is an abstract Wiener space and $\{h_m : m \ge 0\}$ is an orthogonal sequence in $H$ such that, for each $m \in \mathbb N$, either $h_m = 0$ or $\|h_m\|_H = 1$, then

$$\mathbb E^{\mathcal W}\Biggl[\sup_{n \ge 0}\Bigl\|\sum_{m=0}^n \mathcal I(h_m)h_m\Bigr\|_E^p\Biggr] < \infty \quad\text{for all } p \in [1,\infty), \tag{8.3.5}$$

and, for $\mathcal W$-almost every $x \in E$, $\sum_{m=0}^\infty [\mathcal I(h_m)](x)h_m$ converges in $E$ to the $\mathcal W$-conditional expectation value of $x$ given $\sigma\bigl(\{\mathcal I(h_m) : m \ge 0\}\bigr)$. Moreover,

$$\sum_{m=0}^\infty [\mathcal I(h_m)](x)h_m \text{ is } \mathcal W\text{-independent of } x - \sum_{m=0}^\infty [\mathcal I(h_m)](x)h_m.$$
Finally, if $\{h_m : m \ge 0\}$ is an orthonormal basis in $H$, then, for $\mathcal W$-almost every $x \in E$, $\sum_{m=0}^\infty [\mathcal I(h_m)](x)h_m$ converges in $E$ to $x$, and the convergence is also in $L^p(\mathcal W; E)$ for every $p \in [1,\infty)$.

Proof: First assume that (8.3.4) holds for some orthonormal basis, and set $S_n(\xi) = \sum_{m=0}^n \xi_mh_m$ and $\mathcal W = S_*\gamma_{0,1}^{\mathbb N}$. Then, because $S_n(\xi) \longrightarrow S(\xi)$ in $E$ for $\gamma_{0,1}^{\mathbb N}$-almost every $\xi \in \mathbb R^{\mathbb N}$,

$$\widehat{\mathcal W}(x^*) = \lim_{n\to\infty} \mathbb E^{\gamma_{0,1}^{\mathbb N}}\Bigl[e^{\sqrt{-1}\langle S_n, x^*\rangle}\Bigr] = \lim_{n\to\infty} \prod_{m=0}^n e^{-\frac12 (h_{x^*}, h_m)_H^2} = e^{-\frac12 \|h_{x^*}\|_H^2},$$

which proves that $(H, E, \mathcal W)$ is an abstract Wiener space.

Next suppose that $(H, E, \mathcal W)$ is an abstract Wiener space and that $\{h_m : m \ge 0\}$ is an orthogonal sequence with $\|h_m\|_H \in \{0,1\}$ for each $m \ge 0$. By Theorem 8.2.1, $x \in L^p(\mathcal W; E)$ for every $p \in [1,\infty)$. Next, for each $n \in \mathbb N$, set $\mathcal F_n = \sigma\bigl(\{\mathcal I(h_m) : 0 \le m \le n\}\bigr)$. Clearly, $\mathcal F_n \subseteq \mathcal F_{n+1}$ and $\mathcal F \equiv \bigvee_{n=0}^\infty \mathcal F_n$ is the $\sigma$-algebra generated by $\{\mathcal I(h_m) : m \ge 0\}$. Moreover, if $S_n = \sum_{m=0}^n \mathcal I(h_m)h_m$, then, since $\{\mathcal I(h_m) : m \ge 0\}$ is a Gaussian family and $\langle x - S_n(x), x^*\rangle$ is perpendicular in $L^2(\mathcal W; \mathbb R)$ to $\mathcal I(h_m)$ for all $x^* \in E^*$ and $0 \le m \le n$, $x - S_n(x)$ is $\mathcal W$-independent of $\mathcal F_n$. Thus $S_n = \mathbb E^{\mathcal W}[x \mid \mathcal F_n]$, and so, by Theorem 6.1.12, we know both that (8.3.5) holds and that $S_n \longrightarrow \mathbb E^{\mathcal W}[x \mid \mathcal F]$ $\mathcal W$-almost surely. In addition, the $\mathcal W$-independence of $S_n(x)$ from $x - S_n(x)$ implies that the limit quantities possess the same independence property.

In order to complete the proof at this point, all that I have to do is show that $x = \mathbb E^{\mathcal W}[x \mid \mathcal F]$ $\mathcal W$-almost surely when $\{h_m : m \ge 0\}$ is an orthonormal basis. Equivalently, I must check that $\mathcal B_E$ is contained in the $\mathcal W$-completion $\overline{\mathcal F}^{\mathcal W}$ of $\mathcal F$. To this end, note that, for each $h \in H$, because $\sum_{m=0}^n (h, h_m)_Hh_m$ converges in $H$ to $h$,

$$\sum_{m=0}^n (h, h_m)_H\,\mathcal I(h_m) = \mathcal I\Biggl(\sum_{m=0}^n (h, h_m)_Hh_m\Biggr) \longrightarrow \mathcal I(h) \quad\text{in } L^2(\mathcal W; \mathbb R).$$

Hence, $\mathcal I(h)$ is $\overline{\mathcal F}^{\mathcal W}$-measurable for every $h \in H$. In particular, this means that $x \rightsquigarrow \langle x, x^*\rangle$ is $\overline{\mathcal F}^{\mathcal W}$-measurable for every $x^* \in E^*$, and so, since $\mathcal B_E$ is generated by $\{\langle\,\cdot\,, x^*\rangle : x^* \in E^*\}$, $\mathcal B_E \subseteq \overline{\mathcal F}^{\mathcal W}$.

It is important to acknowledge that the preceding theorem does not give another proof of Wiener's theorem that Brownian motion exists. Instead, it simply says that, knowing it exists, there are lots of ways in which to construct it. See Exercise 8.3.21 for a more satisfactory proof of the same conclusion in the classical case, one that does not require the a priori existence of $\mathcal W^{(N)}$.

The following result shows that, in some sense, a non-degenerate, centered, Gaussian measure $\mathcal W$ on a Banach space does not fit on a smaller space.
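Corollary 8.3.6. If $\mathcal W$ is a non-degenerate, centered Gaussian measure on a separable Banach space $E$, then $E$ is the support of $\mathcal W$ in the sense that $\mathcal W$ assigns positive probability to every non-empty open subset of $E$.

Proof: Let $H$ be the Cameron–Martin space for $\mathcal W$. Since $H$ is dense in $E$, it suffices to show that $\mathcal W\bigl(B_E(g,r)\bigr) > 0$ for every $g \in H$ and $r > 0$. Moreover, since, by the Cameron–Martin formula (8.2.8) (cf. Exercise 8.2.19)

$$\mathcal W\bigl(B_E(0,r)\bigr) = (\tau_{-g})_*\mathcal W\bigl(B_E(g,r)\bigr) = \mathbb E^{\mathcal W}\bigl[R_{-g},\, B_E(g,r)\bigr] \le e^{\frac{\|g\|_H^2}{2}}\sqrt{\mathcal W\bigl(B_E(g,r)\bigr)},$$

I need only show that $\mathcal W\bigl(B_E(0,r)\bigr) > 0$ for all $r > 0$. To this end, choose an orthonormal basis $\{h_m : m \ge 0\}$ in $H$, and set $S_n = \sum_{m=0}^n \mathcal I(h_m)h_m$. Then, by Theorem 8.3.3, $x \rightsquigarrow S_n(x)$ is $\mathcal W$-independent of $x \rightsquigarrow x - S_n(x)$ and $S_n(x) \longrightarrow x$ in $E$ for $\mathcal W$-almost every $x \in E$. Hence, $\mathcal W\bigl(\{\|x - S_n(x)\|_E < \tfrac r2\}\bigr) \ge \tfrac12$ for some $n \in \mathbb N$, and therefore

$$\mathcal W\bigl(B_E(0,r)\bigr) \ge \tfrac12\,\mathcal W\bigl(\|S_n\|_E < \tfrac r2\bigr).$$

But $\|S_n\|_E^2 \le C\|S_n\|_H^2 = C\sum_{m=0}^n \mathcal I(h_m)^2$ for some $C < \infty$, and so

$$\mathcal W\bigl(\|S_n\|_E < \tfrac r2\bigr) \ge \gamma_{0,1}^{n+1}\Bigl(B_{\mathbb R^{n+1}}\bigl(0, \tfrac{r}{2\sqrt C}\bigr)\Bigr) > 0 \quad\text{for any } r > 0.$$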
§ 8.3.3. Orthogonal Projections. Associated with any closed, linear subspace $L$ of a Hilbert space $H$, there is an orthogonal projection map $\Pi_L : H \longrightarrow L$ determined by the property that, for each $h \in H$, $h - \Pi_Lh \perp L$. Equivalently, $\Pi_Lh$ is the element of $L$ that is closest to $h$. In this subsection I will show that if $(H, E, \mathcal W)$ is an abstract Wiener space and $L$ is a finite dimensional subspace of $H$, then $\Pi_L$ admits a $\mathcal W$-almost surely unique extension $P_L$ to $E$. In addition, I will show that $P_Lx \longrightarrow x$ in $L^2(\mathcal W; E)$ as $L \nearrow H$.

Lemma 8.3.7. Let $(H, E, \mathcal W)$ be an abstract Wiener space and $\{h_m : m \ge 0\}$ an orthonormal basis in $H$. Then, for each $h \in H$, $\sum_{m=0}^\infty (h, h_m)_H\,\mathcal I(h_m)$ converges to $\mathcal I(h)$ $\mathcal W$-almost surely and in $L^p(\mathcal W; \mathbb R)$ for every $p \in [1,\infty)$.

Proof: Define the $\sigma$-algebras $\mathcal F_n$ and $\mathcal F$ as in the proof of Theorem 8.3.3. Then, by the same argument as I used there, one can identify $\sum_{m=0}^n (h, h_m)_H\,\mathcal I(h_m)$ as $\mathbb E^{\mathcal W}[\mathcal I(h) \mid \mathcal F_n]$. Thus, since $\overline{\mathcal F}^{\mathcal W} \supseteq \mathcal B_E$, the required convergence statement is an immediate consequence of Corollary 5.2.4.
Theorem 8.3.8. Let $(H, E, \mathcal W)$ be an abstract Wiener space. For each finite dimensional subspace $L$ of $H$ there is a $\mathcal W$-almost surely unique map $P_L : E \longrightarrow H$ such that, for every $h \in H$ and $\mathcal W$-almost every $x \in E$, $(h, P_Lx)_H = \mathcal I(\Pi_Lh)(x)$, where $\Pi_L$ denotes orthogonal projection from $H$ onto $L$. In fact, if $\{g_1,\dots,g_{\dim(L)}\}$ is an orthonormal basis for $L$, then $P_Lx = \sum_{i=1}^{\dim(L)} [\mathcal I(g_i)](x)g_i$, and so $P_Lx \in L$ for $\mathcal W$-almost every $x \in E$. In particular, the distribution of $x \in E \longmapsto P_Lx \in L$ under $\mathcal W$ is the same as that of $(\xi_1,\dots,\xi_{\dim(L)}) \in \mathbb R^{\dim(L)} \longmapsto \sum_{i=1}^{\dim(L)} \xi_ig_i \in L$ under $\gamma_{0,1}^{\dim(L)}$. Finally, $x \rightsquigarrow P_Lx$ is $\mathcal W$-independent of $x \rightsquigarrow x - P_Lx$.

Proof: Set $\ell = \dim(L)$. It suffices to note that

$$\mathcal I(\Pi_Lh) = \mathcal I\Biggl(\sum_{k=1}^\ell (h, g_k)_Hg_k\Biggr) = \sum_{k=1}^\ell (h, g_k)_H\,\mathcal I(g_k) = \Biggl(\sum_{k=1}^\ell \mathcal I(g_k)g_k,\, h\Biggr)_H \quad\text{for all } h \in H.$$

We now have the preparations needed to prove a result which shows that my definition of an abstract Wiener space is the same as Gross's. Specifically, Gross's own definition was based on the property proved in the following.
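Theorem 8.3.9. Let $(H, E, \mathcal W)$ be an abstract Wiener space and $\{h_n : n \ge 0\}$ an orthonormal basis for $H$, and set $L_n = \operatorname{span}\bigl(\{h_0,\dots,h_n\}\bigr)$. Then, for all $\epsilon > 0$ there exists an $n \in \mathbb N$ such that $\mathbb E^{\mathcal W}\bigl[\|P_Lx\|_E^2\bigr] \le \epsilon^2$ whenever $L$ is a finite dimensional subspace that is perpendicular to $L_n$.

Proof: Without loss in generality, I will assume that $\|\cdot\|_E \le \|\cdot\|_H$. Arguing by contradiction, I will show that if the asserted property does not hold, then there would exist an orthonormal basis $\{f_n : n \ge 0\}$ for $H$ such that $\sum_{n=0}^\infty \mathcal I(f_n)f_n$ fails to converge in $L^2(\mathcal W; E)$.

Thus, suppose that there exists an $\epsilon > 0$ such that for all $n \in \mathbb N$ there exists a finite dimensional $L \perp L_n$ with $\mathbb E^{\mathcal W}\bigl[\|P_Lx\|_E^2\bigr] \ge \epsilon^2$. Under this assumption, define $\{n_m : m \ge 0\} \subseteq \mathbb N$, $\{\ell_m : m \ge 0\} \subseteq \mathbb N$, and $\{f_0,\dots,f_{n_m}\} \subseteq L_{n_m}$, $m \ge 0$, inductively by the following prescription. First, take $n_0 = 0 = \ell_0$ and $f_0 = h_0$. Next, knowing $n_m$ and $\{f_0,\dots,f_{n_m}\}$, choose a finite dimensional subspace $L \perp L_{n_m}$ so that $\mathbb E^{\mathcal W}\bigl[\|P_Lx\|_E^2\bigr] \ge \epsilon^2$, set $\ell_m = \dim(L)$, and let $\{g_{m,1},\dots,g_{m,\ell_m}\}$ be an orthonormal basis for $L$. For any $\delta > 0$ there exists an $n \ge n_m + \ell_m$ such that

$$\sum_{i,j=1}^{\ell_m} \Bigl|\bigl(\Pi_{L_n}g_{m,i},\, \Pi_{L_n}g_{m,j}\bigr)_H - \delta_{i,j}\Bigr| \le \delta.$$

In particular, if $\delta \in (0,1)$, then the elements of $\{\Pi_{L_n}g_{m,i} : 1 \le i \le \ell_m\}$ are linearly independent and the orthonormal set $\{\tilde g_{m,i} : 1 \le i \le \ell_m\}$ obtained from them via the Gram–Schmidt orthogonalization procedure satisfies (cf. Exercise 8.3.16)

$$\sum_{i=1}^{\ell_m} \|\tilde g_{m,i} - g_{m,i}\|_H \le K_{\ell_m}\sum_{i,j=1}^{\ell_m} \Bigl|\bigl(\Pi_{L_n}g_{m,i},\, \Pi_{L_n}g_{m,j}\bigr)_H - \delta_{i,j}\Bigr|$$

for some $K_{\ell_m} < \infty$ which depends only on $\ell_m$. Moreover, because $L \perp L_{n_m}$, $\tilde g_{m,i} \perp L_{n_m}$ for all $1 \le i \le \ell_m$. Hence, we can find an $n_{m+1} \ge n_m + \ell_m$ so that $\operatorname{span}\bigl(\{h_n : n_m < n \le n_{m+1}\}\bigr)$ admits an orthonormal basis $\{f_{n_m+1},\dots,f_{n_{m+1}}\}$ with the property that $\sum_{i=1}^{\ell_m} \|g_{m,i} - f_{n_m+i}\|_H \le \frac\epsilon4$.

Clearly $\{f_n : n \ge 0\}$ is an orthonormal basis for $H$. On the other hand,

$$\mathbb E^{\mathcal W}\Biggl[\Bigl\|\sum_{n=n_m+1}^{n_m+\ell_m} \mathcal I(f_n)f_n\Bigr\|_E^2\Biggr]^{\frac12} \ge \epsilon - \mathbb E^{\mathcal W}\Biggl[\Bigl\|\sum_{i=1}^{\ell_m} \bigl(\mathcal I(g_{m,i})g_{m,i} - \mathcal I(f_{n_m+i})f_{n_m+i}\bigr)\Bigr\|_E^2\Biggr]^{\frac12} \ge \epsilon - \sum_{i=1}^{\ell_m} \mathbb E^{\mathcal W}\Bigl[\bigl\|\mathcal I(g_{m,i})g_{m,i} - \mathcal I(f_{n_m+i})f_{n_m+i}\bigr\|_H^2\Bigr]^{\frac12},$$

and so, since $\mathbb E^{\mathcal W}\bigl[\|\mathcal I(g_{m,i})g_{m,i} - \mathcal I(f_{n_m+i})f_{n_m+i}\|_H^2\bigr]^{\frac12}$ is dominated by

$$\mathbb E^{\mathcal W}\Bigl[\bigl\|\bigl(\mathcal I(g_{m,i}) - \mathcal I(f_{n_m+i})\bigr)g_{m,i}\bigr\|_H^2\Bigr]^{\frac12} + \mathbb E^{\mathcal W}\bigl[\mathcal I(f_{n_m+i})^2\bigr]^{\frac12}\,\|g_{m,i} - f_{n_m+i}\|_H \le 2\|g_{m,i} - f_{n_m+i}\|_H,$$

we have that

$$\mathbb E^{\mathcal W}\Biggl[\Bigl\|\sum_{n=n_m+1}^{n_m+\ell_m} \mathcal I(f_n)f_n\Bigr\|_E^2\Biggr]^{\frac12} \ge \frac\epsilon2 \quad\text{for all } m \ge 0,$$

and this means that $\sum_{n=0}^\infty \mathcal I(f_n)f_n$ cannot be converging in $L^2(\mathcal W; E)$.

Besides showing that my definition of an abstract Wiener space is the same as Gross's, Theorem 8.3.9 allows us to prove a very convincing statement, again due to Gross, of just how non-unique is the Banach space for which a given Hilbert space is the Cameron–Martin space.

Corollary 8.3.10. If $(H, E, \mathcal W)$ is an abstract Wiener space, then there exists a separable Banach space $E_0$ that is continuously embedded in $E$ as a measurable subset and has the properties that $\mathcal W(E_0) = 1$, bounded subsets of $E_0$ are relatively compact in $E$, and $(H, E_0, \mathcal W \restriction E_0)$ is again an abstract Wiener space.

Proof: Again I will assume that $\|\cdot\|_E \le \|\cdot\|_H$. Choose $\{x_n^* : n \ge 0\} \subseteq E^*$ so that $\{h_n : n \ge 0\}$ is an orthonormal basis in $H$ when $h_n = h_{x_n^*}$, and set $L_n = \operatorname{span}\bigl(\{h_0,\dots,h_n\}\bigr)$. Next, using Theorem 8.3.9, choose an increasing sequence $\{n_m : m \ge 0\}$ so that $n_0 = 0$ and $\mathbb E^{\mathcal W}\bigl[\|P_Lx\|_E^2\bigr]^{\frac12} \le 2^{-m}$ for $m \ge 1$ and finite dimensional $L \perp L_{n_m}$, and define $Q_\ell$ for $\ell \ge 0$ on $E$ into $H$ so that

$$Q_0x = \langle x, x_0^*\rangle h_0 \quad\text{and}\quad Q_\ell x = \sum_{n=n_{\ell-1}+1}^{n_\ell} \langle x, x_n^*\rangle h_n \quad\text{when } \ell \ge 1.$$

Finally, set $S_m = P_{L_{n_m}} = \sum_{\ell=0}^m Q_\ell$, and define $E_0$ to be the set of $x \in E$ such that

$$\|x\|_{E_0} \equiv \|Q_0x\|_E + \sum_{\ell=1}^\infty \ell^2\|Q_\ell x\|_E < \infty \quad\text{and}\quad \|S_mx - x\|_E \longrightarrow 0.$$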
To show that $\|\cdot\|_{E_0}$ is a norm on $E_0$ and that $E_0$ with norm $\|\cdot\|_{E_0}$ is a Banach space, first note that if $x \in E_0$, then

$$\|x\|_E = \lim_{m\to\infty} \|S_mx\|_E \le \|Q_0x\|_E + \lim_{m\to\infty} \sum_{\ell=1}^m \|Q_\ell x\|_E \le \|x\|_{E_0},$$

and therefore $\|\cdot\|_{E_0}$ is certainly a norm on $E_0$. Next, suppose that the sequence $\{x_k : k \ge 1\} \subseteq E_0$ is a Cauchy sequence with respect to $\|\cdot\|_{E_0}$. By the preceding, we know that $\{x_k : k \ge 1\}$ is also Cauchy convergent with respect to $\|\cdot\|_E$, and so there exists an $x \in E$ such that $x_k \longrightarrow x$ in $E$. We need to show that $x \in E_0$ and that $\|x_k - x\|_{E_0} \longrightarrow 0$. Because $\{x_k : k \ge 1\}$ is bounded in $E_0$, it is clear that $\|x\|_{E_0} < \infty$. In addition, for any $m \ge 0$ and $k \ge 1$,

$$\|x - S_mx\|_E = \lim_{\ell\to\infty} \|x_\ell - S_mx_\ell\|_E \le \lim_{\ell\to\infty} \|x_\ell - S_mx_\ell\|_{E_0} = \lim_{\ell\to\infty} \sum_{n>m} n^2\|Q_nx_\ell\|_E \le \sum_{n>m} n^2\|Q_nx_k\|_E + \sup_{\ell>k} \|x_\ell - x_k\|_{E_0}.$$

Thus, by choosing $k$ for a given $\epsilon > 0$ so that $\sup_{\ell>k} \|x_\ell - x_k\|_{E_0} < \epsilon$, we conclude that $\limsup_{m\to\infty} \|x - S_mx\|_E < \epsilon$ and therefore that $S_mx \longrightarrow x$ in $E$. Hence, $x \in E_0$. Finally, to see that $x_k \longrightarrow x$ in $E_0$, simply note that

$$\|x - x_k\|_{E_0} = \|Q_0(x - x_k)\|_E + \sum_{m=1}^\infty m^2\|Q_m(x - x_k)\|_E \le \liminf_{\ell\to\infty}\Biggl(\|Q_0(x_\ell - x_k)\|_E + \sum_{m=1}^\infty m^2\|Q_m(x_\ell - x_k)\|_E\Biggr) \le \sup_{\ell>k} \|x_\ell - x_k\|_{E_0},$$

which tends to 0 as $k \to \infty$.

To show that bounded subsets of $E_0$ are relatively compact in $E$, it suffices to show that if $\{x_\ell : \ell \ge 1\} \subseteq B_{E_0}(0,R)$, then there is an $x \in E$ to which a subsequence converges in $E$. For this purpose, observe that, for each $m \ge 0$, there is a subsequence $\{x_{\ell_k} : k \ge 1\}$ along which $S_mx_{\ell_k}$ converges in $L_{n_m}$. Hence, by a diagonalization argument, $\{x_{\ell_k} : k \ge 1\}$ can be chosen so that $\{S_mx_{\ell_k} : k \ge 1\}$ converges in $L_{n_m}$ for all $m \ge 0$. Since, for $1 \le j < k$,

$$\|x_{\ell_k} - x_{\ell_j}\|_E \le \|S_mx_{\ell_k} - S_mx_{\ell_j}\|_E + \sum_{n>m} \|Q_n(x_{\ell_k} - x_{\ell_j})\|_E \le \|S_mx_{\ell_k} - S_mx_{\ell_j}\|_E + 2R\sum_{n>m} \frac1{n^2},$$

it follows that $\{x_{\ell_k} : k \ge 1\}$ is Cauchy convergent in $E$ and therefore that it converges in $E$.

I must still show that $E_0 \in \mathcal B_E$ and that $(H, E_0, \mathcal W_0)$ is an abstract Wiener space when $\mathcal W_0 = \mathcal W \restriction E_0$. To see the first of these, observe that $x \in E \longmapsto \|x\|_{E_0} \in [0,\infty]$ is lower semicontinuous and that $\{x : \|S_mx - x\|_E \longrightarrow 0\} \in \mathcal B_E$. In addition, because, by Theorem 8.3.3, $\|S_mx - x\|_E \longrightarrow 0$ for $\mathcal W$-almost every $x \in E$, we will know that $\mathcal W(E_0) = 1$ once I show that $\mathcal W\bigl(\|x\|_{E_0} < \infty\bigr) = 1$, which follows immediately from

$$\mathbb E^{\mathcal W}\bigl[\|x\|_{E_0}\bigr] = \mathbb E^{\mathcal W}\bigl[\|Q_0x\|_E\bigr] + \sum_{m=1}^\infty m^2\,\mathbb E^{\mathcal W}\bigl[\|Q_mx\|_E\bigr] \le \mathbb E^{\mathcal W}\bigl[\|Q_0x\|_E\bigr] + \sum_{m=1}^\infty m^2\,\mathbb E^{\mathcal W}\bigl[\|Q_mx\|_E^2\bigr]^{\frac12} < \infty.$$
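The next step is to check that $H$ is continuously embedded in $E_0$. Certainly $h \in H \implies \|S_mh - h\|_E \le \|S_mh - h\|_H \longrightarrow 0$. Next suppose that $h \in H \setminus \{0\}$ and that $h \perp L_{n_m}$, and let $L$ be the line spanned by $h$. Then $P_Lx = \|h\|_H^{-2}[\mathcal I(h)](x)h$, and so, because $L \perp L_{n_m}$,

$$\frac1{2^m} \ge \frac{\|h\|_E}{\|h\|_H^2}\,\mathbb E^{\mathcal W}\bigl[\mathcal I(h)^2\bigr]^{\frac12} = \frac{\|h\|_E}{\|h\|_H}.$$

Hence, we now know that $h \perp L_{n_m} \implies \|h\|_E \le 2^{-m}\|h\|_H$. In particular, $\|Q_{m+1}h\|_E \le 2^{-m}\|Q_{m+1}h\|_H \le 2^{-m}\|h\|_H$ for all $m \ge 0$ and $h \in H$, and so

$$\|h\|_{E_0} = \|Q_0h\|_E + \sum_{m=1}^\infty m^2\|Q_mh\|_E \le \Biggl(1 + 2\sum_{m=1}^\infty \frac{m^2}{2^m}\Biggr)\|h\|_H = 13\|h\|_H.$$

To complete the proof, I must show that $H$ is dense in $E_0$ and that, for each $y^* \in E_0^*$, $\widehat{\mathcal W_0}(y^*) = e^{-\frac12\|h_{y^*}\|_H^2}$, where $\mathcal W_0 = \mathcal W \restriction E_0$ and $h_{y^*} \in H$ is determined by $(h, h_{y^*})_H = \langle h, y^*\rangle$ for $h \in H$. Both these facts rely on the observation that

$$\|x - S_mx\|_{E_0} = \sum_{n>m} n^2\|Q_nx\|_E \longrightarrow 0 \quad\text{for all } x \in E_0.$$

Knowing this, the density of $H$ in $E_0$ is obvious. Finally, if $y^* \in E_0^*$, then, by the preceding and Lemma 8.3.7,

$$\langle x, y^*\rangle = \lim_{m\to\infty} \langle S_mx, y^*\rangle = \lim_{m\to\infty} \sum_{n=0}^{n_m} \langle x, x_n^*\rangle\langle h_n, y^*\rangle = \lim_{m\to\infty} \sum_{n=0}^{n_m} (h_{y^*}, h_n)_H\,[\mathcal I(h_n)](x) = [\mathcal I(h_{y^*})](x)$$

for $\mathcal W_0$-almost every $x \in E_0$. Hence $\langle\,\cdot\,, y^*\rangle$ under $\mathcal W_0$ is a centered Gaussian with variance $\|h_{y^*}\|_H^2$.

§ 8.3.4. Pinned Brownian Motion. Theorem 8.3.8 has a particularly interesting application to the classical abstract Wiener space $\bigl(\mathbf H^1(\mathbb R^N), \Theta(\mathbb R^N), \mathcal W^{(N)}\bigr)$. Namely, suppose that $0 = t_0 < t_1 < \cdots < t_n$, and let $L$ be the span of $\{h_{t_m}e : 1 \le m \le n \text{ and } e \in \mathbb S^{N-1}\}$, where $h_t(\tau) \equiv t \wedge \tau$. In this case,

$$P_L\theta = \sum_{m=1}^n \frac{h_{t_m} - h_{t_{m-1}}}{t_m - t_{m-1}}\bigl(\theta(t_m) - \theta(t_{m-1})\bigr),$$

and so

$$\theta_{(t_1,\dots,t_n)}(t) \equiv [\theta - P_L\theta](t) = \begin{cases} \theta(t) - \theta(t_{m-1}) - \dfrac{t - t_{m-1}}{t_m - t_{m-1}}\bigl(\theta(t_m) - \theta(t_{m-1})\bigr) & \text{if } t \in [t_{m-1}, t_m] \\[1ex] \theta(t) - \theta(t_n) & \text{if } t \in [t_n, \infty). \end{cases} \tag{8.3.11}$$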
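Thus, if $(\theta, \vec y) \in \Theta(\mathbb R^N) \times (\mathbb R^N)^n \longmapsto \theta_{(t_1,\dots,t_n),\vec y} \in \Theta(\mathbb R^N)$ is given by

$$\theta_{(t_1,\dots,t_n),\vec y} = \theta_{(t_1,\dots,t_n)} + \sum_{m=1}^n \frac{h_{t_m} - h_{t_{m-1}}}{t_m - t_{m-1}}(y_m - y_{m-1}),$$

where $\vec y = (y_1,\dots,y_n)$ and $y_0 \equiv 0$, then, for any Borel measurable $F : \Theta(\mathbb R^N) \times (\mathbb R^N)^n \longrightarrow [0,\infty)$,

$$\int_{\Theta(\mathbb R^N)} F\bigl(\theta, \theta(t_1),\dots,\theta(t_n)\bigr)\,\mathcal W^{(N)}(d\theta) = \int_{(\mathbb R^N)^n}\Biggl(\int_{\Theta(\mathbb R^N)} F\bigl(\theta_{(t_1,\dots,t_n),\vec y},\, \vec y\bigr)\,\mathcal W^{(N)}(d\theta)\Biggr)\gamma_{0,C(t_1,\dots,t_n)}(d\vec y), \tag{8.3.12}$$

where $C(t_1,\dots,t_n)_{(m,i),(m',i')} = t_m \wedge t_{m'}\,\delta_{i,i'}$ for $1 \le m, m' \le n$ and $1 \le i, i' \le N$ is the covariance of $\theta \rightsquigarrow \bigl(\theta(t_1),\dots,\theta(t_n)\bigr)$ under $\mathcal W^{(N)}$. Equivalently, if

$$\check\theta_{(t_1,\dots,t_n),\vec y} = \theta_{(t_1,\dots,t_n)} + \sum_{m=1}^n \frac{h_{t_m} - h_{t_{m-1}}}{t_m - t_{m-1}}\,y_m,$$

then

$$\int_{\Theta(\mathbb R^N)} F\bigl(\theta, \theta(t_1) - \theta(t_0),\dots,\theta(t_n) - \theta(t_{n-1})\bigr)\,\mathcal W^{(N)}(d\theta) = \int_{(\mathbb R^N)^n}\Biggl(\int_{\Theta(\mathbb R^N)} F\bigl(\check\theta_{(t_1,\dots,t_n),\vec y},\, \vec y\bigr)\,\mathcal W^{(N)}(d\theta)\Biggr)\gamma_{0,D(t_1,\dots,t_n)}(d\vec y), \tag{8.3.13}$$

where $D(t_1,\dots,t_n)_{(m,i),(m',i')} = (t_m - t_{m-1})\delta_{m,m'}\delta_{i,i'}$ for $1 \le m, m' \le n$ and $1 \le i, i' \le N$ is the covariance matrix for $\bigl(\theta(t_1) - \theta(t_0),\dots,\theta(t_n) - \theta(t_{n-1})\bigr)$ under $\mathcal W^{(N)}$.

There are several comments that should be made about these conclusions. In the first place, it is clear from (8.3.11) that $t \rightsquigarrow \theta_{(t_1,\dots,t_n)}(t)$ returns to the origin at each of the times $\{t_m : 1 \le m \le n\}$. In addition, the excursions $\theta_{(t_1,\dots,t_n)} \restriction [t_{m-1}, t_m]$, $1 \le m \le n$, are independent of each other and of $\theta_{(t_1,\dots,t_n)} \restriction [t_n, \infty)$. Secondly, if $\mathcal W^{(N)}_{(t_1,\dots,t_n),\vec y}$ denotes the $\mathcal W^{(N)}$-distribution of $\theta \rightsquigarrow \theta_{(t_1,\dots,t_n),\vec y}$, then (8.3.12) says that

$$\theta \rightsquigarrow \mathcal W^{(N)}_{(t_1,\dots,t_n),(\theta(t_1),\dots,\theta(t_n))}$$

is a regular conditional probability distribution (cf. § 9.2) of $\mathcal W^{(N)}$ given the $\sigma$-algebra generated by $\{\theta(t_1),\dots,\theta(t_n)\}$. Expressed in more colloquial terms, the process $\bigl\{\theta_{(t_1,\dots,t_n),\vec y}(t) : t \ge 0\bigr\}$ is Brownian motion pinned to the points $\{y_m : 1 \le m \le n\}$ at times $\{t_m : 1 \le m \le n\}$.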
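To make the pinning operation concrete, here is a minimal Python sketch for a scalar path ($N = 1$). It assumes the path is given as a callable $\theta$ with $\theta(0) = 0$; the names `pinned_component` and `pinned_path` are hypothetical and chosen only for this illustration. The first function implements the piecewise formula in (8.3.11); the second adds back the piecewise linear interpolant through the points $(t_m, y_m)$ with $y_0 = 0$, which is what the sum defining $\theta_{(t_1,\dots,t_n),\vec y}$ amounts to.

```python
import bisect
import math


def pinned_component(theta, pins):
    """Return t -> theta(t) - (P_L theta)(t) as in (8.3.11), for a scalar path.

    theta: callable with theta(0) == 0; pins: increasing times 0 < t_1 < ... < t_n.
    """
    knots = [0.0] + list(pins)

    def excursion(t):
        if t >= knots[-1]:
            return theta(t) - theta(knots[-1])
        m = bisect.bisect_right(knots, t)  # knots[m - 1] <= t < knots[m]
        a, b = knots[m - 1], knots[m]
        slope = (theta(b) - theta(a)) / (b - a)
        return theta(t) - theta(a) - (t - a) * slope

    return excursion


def pinned_path(theta, pins, ys):
    """Return the path that equals ys[m - 1] at time pins[m - 1] (with y_0 = 0 at time 0)."""
    knots = [0.0] + list(pins)
    vals = [0.0] + list(ys)
    excursion = pinned_component(theta, pins)

    def path(t):
        interpolant = 0.0
        for m in range(1, len(knots)):
            a, b = knots[m - 1], knots[m]
            # (h_{t_m}(t) - h_{t_{m-1}}(t)) / (t_m - t_{m-1}), where h_s(t) = min(s, t)
            ramp = (min(t, b) - min(t, a)) / (b - a)
            interpolant += ramp * (vals[m] - vals[m - 1])
        return excursion(t) + interpolant

    return path


if __name__ == "__main__":
    theta = lambda t: math.sin(3.0 * t) + t * t  # any continuous path with theta(0) = 0
    path = pinned_path(theta, [0.5, 1.0, 2.0], [1.0, -2.0, 0.5])
    print([path(t) for t in (0.5, 1.0, 2.0)])
```

The example path is deterministic only to keep the sketch reproducible; the same functions apply to a sampled Brownian path supplied as a callable.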
§ 8.3.5. Orthogonal Invariance. Consider the standard Gauss distribution $\gamma_{0,I}$ on $\mathbb R^N$. Obviously, $\gamma_{0,I}$ is rotation invariant. That is, if $O$ is an orthogonal transformation on $\mathbb R^N$, then $\gamma_{0,I}$ is invariant under the transformation $T_O : \mathbb R^N \longrightarrow \mathbb R^N$ given by $T_Ox = Ox$. On the other hand, none of these transformations can be ergodic, since any radial function on $\mathbb R^N$ is invariant under $T_O$ for every $O$.

Now think about the analogous situation when $\mathbb R^N$ is replaced by an infinite dimensional Hilbert space $H$ and $(H, E, \mathcal W)$ is an associated abstract Wiener space. As I am about to show, $\mathcal W$ still enjoys rotation invariance with respect to orthogonal transformations on $H$. On the other hand, because $\|x\|_H = \infty$ for $\mathcal W$-almost every $x \in E$, there are no non-trivial radial functions now, a fact that leaves open the possibility that some orthogonal transformations of $H$ give rise to ergodic transformations for $\mathcal W$. The purpose of this subsection is to investigate these matters, and I begin with the following formulation of the rotation invariance of $\mathcal W$.

Theorem 8.3.14. Let $(H, E, \mathcal W)$ be an abstract Wiener space and $O$ an orthogonal transformation on $H$. Then there is a $\mathcal W$-almost surely unique, Borel measurable map $T_O : E \longrightarrow E$ such that $\mathcal I(h) \circ T_O = \mathcal I(O^\top h)$ $\mathcal W$-almost surely for each $h \in H$. Moreover, $\mathcal W = (T_O)_*\mathcal W$.

Proof: To prove uniqueness, note that if $T$ and $T'$ both satisfy the defining property for $T_O$, then, for each $x^* \in E^*$,

$$\langle Tx, x^*\rangle = [\mathcal I(h_{x^*})](Tx) = [\mathcal I(O^\top h_{x^*})](x) = [\mathcal I(h_{x^*})](T'x) = \langle T'x, x^*\rangle$$

for $\mathcal W$-almost every $x \in E$. Hence, since $E^*$ is separable in the weak* topology, $Tx = T'x$ for $\mathcal W$-almost every $x \in E$.

To prove existence, choose an orthonormal basis $\{h_m : m \ge 0\}$ for $H$, and let $C$ be the set of $x \in E$ for which both $\sum_{m=0}^\infty [\mathcal I(h_m)](x)h_m$ and $\sum_{m=0}^\infty [\mathcal I(h_m)](x)Oh_m$ converge in $E$. By Theorem 8.3.3, we know that $\mathcal W(C) = 1$ and that

$$x \rightsquigarrow T_Ox \equiv \begin{cases} \sum_{m=0}^\infty [\mathcal I(h_m)](x)Oh_m & \text{if } x \in C \\ 0 & \text{if } x \notin C \end{cases}$$

has distribution $\mathcal W$. Hence, all that remains is to check that $\mathcal I(h) \circ T_O = \mathcal I(O^\top h)$ $\mathcal W$-almost surely for each $h \in H$. To this end, let $x^* \in E^*$, and observe that

$$[\mathcal I(h_{x^*})](T_Ox) = \langle T_Ox, x^*\rangle = \sum_{m=0}^\infty (h_{x^*}, Oh_m)_H\,[\mathcal I(h_m)](x) = \sum_{m=0}^\infty (O^\top h_{x^*}, h_m)_H\,[\mathcal I(h_m)](x)$$

for $\mathcal W$-almost every $x \in E$. Thus, since, by Lemma 8.3.7, the last of these series converges $\mathcal W$-almost surely to $\mathcal I(O^\top h_{x^*})$, we have that $\mathcal I(h_{x^*}) \circ T_O = \mathcal I(O^\top h_{x^*})$ $\mathcal W$-almost surely. To handle general $h \in H$, simply note that both $h \in H \longmapsto \mathcal I(h) \circ T_O \in L^2(\mathcal W; \mathbb R)$ and $h \in H \longmapsto \mathcal I(O^\top h) \in L^2(\mathcal W; \mathbb R)$ are isometric, and remember that $\{h_{x^*} : x^* \in E^*\}$ is dense in $H$.

I next want to discuss the possibility of $T_O$ being ergodic for some orthogonal transformations $O$. First notice that $T_O$ cannot be ergodic if $O$ has a non-trivial, finite dimensional invariant subspace $L$, since if $\{h_1,\dots,h_n\}$ were an orthonormal basis for $L$, then $\sum_{m=1}^n \mathcal I(h_m)^2$ would be a non-constant, $T_O$-invariant function. Thus, the only candidates for ergodicity are $O$'s that have no non-trivial, finite dimensional, invariant subspaces. In a more general and highly abstract context, I. Segal² showed that the existence of a non-trivial, finite dimensional, invariant subspace for $O$ is the only obstruction to $T_O$ being ergodic. Here I will show less.

Theorem 8.3.15. Let $(H, E, \mathcal W)$ be an abstract Wiener space. If $O$ is an orthogonal transformation on $H$ with the property that, for every $g, h \in H$, $\lim_{n\to\infty} (O^ng, h)_H = 0$, then $T_O$ is ergodic.
Proof: What I have to show is that any $T_O$-invariant element $\Phi \in L^2(\mathcal W; \mathbb R)$ is $\mathcal W$-almost surely constant, and for this purpose it suffices to check that

$$\lim_{n\to\infty} \mathbb E^{\mathcal W}\bigl[(\Phi \circ T_O^n)\Phi\bigr] = 0 \tag{*}$$

for all $\Phi \in L^2(\mathcal W; \mathbb R)$ with mean value 0. In fact, if $\{h_m : m \ge 1\}$ is an orthonormal basis for $H$, then it suffices to check (*) when

$$\Phi(x) = F\bigl([\mathcal I(h_1)](x),\dots,[\mathcal I(h_N)](x)\bigr)$$

for some $N \in \mathbb Z^+$ and bounded, Borel measurable $F : \mathbb R^N \longrightarrow \mathbb R$. The reason why it is sufficient to check it for such $\Phi$'s is that, because $T_O$ is $\mathcal W$-measure preserving, the set of $\Phi$'s for which (*) holds is closed in $L^2(\mathcal W; \mathbb R)$. Hence, if we start with any $\Phi \in L^2(\mathcal W; \mathbb R)$ with mean value 0, we can first approximate it in $L^2(\mathcal W; \mathbb R)$ by bounded functions with mean value 0 and then condition these bounded approximates with respect to $\sigma\bigl(\{\mathcal I(h_1),\dots,\mathcal I(h_N)\}\bigr)$ to give them the required form.

Now suppose that $\Phi = F\bigl(\mathcal I(h_1),\dots,\mathcal I(h_N)\bigr)$ for some $N$ and bounded, measurable $F$. Then

$$\mathbb E^{\mathcal W}\bigl[(\Phi \circ T_O^n)\Phi\bigr] = \iint_{\mathbb R^N \times \mathbb R^N} F(\xi)F(\eta)\,\gamma_{0,C_n}(d\xi \times d\eta),$$

where

$$C_n = \begin{pmatrix} I & B_n \\ B_n^\top & I \end{pmatrix} \quad\text{with}\quad B_n = \Bigl(\bigl(h_k, O^nh_\ell\bigr)_H\Bigr)_{1 \le k,\ell \le N}$$

and the block structure corresponds to $\mathbb R^N \times \mathbb R^N$. Finally, by our hypothesis about $O$, we can find a subsequence $\{n_m : m \ge 0\}$ such that $\lim_{m\to\infty} B_{n_m} = 0$, from which it is clear that $\gamma_{0,C_{n_m}}$ tends to $\gamma_{0,I} \times \gamma_{0,I}$ in variation and therefore

$$\lim_{m\to\infty} \mathbb E^{\mathcal W}\bigl[(\Phi \circ T_O^{n_m})\Phi\bigr] = \mathbb E^{\mathcal W}[\Phi]^2 = 0.$$

² See I.E. Segal's "Ergodic subgroups of the orthogonal group on a real Hilbert Space," Annals of Math. 66 # 2, pp. 297–303 (1957). For a treatment in the setting here, see my article "Some thoughts about Segal's ergodic theorem," Colloq. Math. 118 # 1, pp. 89–105 (2010).
Perhaps the best tests for whether an orthogonal transformation satisfies the hypothesis in Theorem 8.3.15 come from spectral theory. To be more precise, if $H_c$ and $O_c$ are the space and operator obtained by complexifying $H$ and $O$, the Spectral Theorem for normal operators allows one to write

$$O_c = \int_0^{2\pi} e^{\sqrt{-1}\alpha}\,dE_\alpha,$$

where $\{E_\alpha : \alpha \in [0, 2\pi)\}$ is a resolution of the identity in $H_c$ by orthogonal projection operators. The spectrum of $O_c$ is said to be absolutely continuous if, for each $h \in H_c$, the non-decreasing function $\alpha \rightsquigarrow (E_\alpha h, h)_{H_c}$ is absolutely continuous, which, by polarization, means that $\alpha \rightsquigarrow (E_\alpha h, h')_{H_c}$ is absolutely continuous for all $h, h' \in H_c$. The reason for introducing this concept here is that, by combining the Riemann–Lebesgue Lemma with Theorem 8.3.15, one can prove that $T_O$ is ergodic if the spectrum of $O_c$ is absolutely continuous.³ Indeed, given $h, h' \in H$, let $f$ be the Radon–Nikodym derivative of $\alpha \rightsquigarrow (E_\alpha h, h')_{H_c}$, and apply the Riemann–Lebesgue Lemma to see that

$$\bigl(O^nh, h'\bigr)_H = \int_0^{2\pi} e^{\sqrt{-1}n\alpha}f(\alpha)\,d\alpha \longrightarrow 0 \quad\text{as } n \to \infty.$$

See Exercises 8.3.24, 8.3.25, and 8.5.15 for more concrete examples.

³ This conclusion highlights the poverty of the result here in comparison to Segal's result, which says that $T_O$ is ergodic as soon as the spectrum of $O_c$ is continuous.

Exercises for § 8.3

Exercise 8.3.16. The purpose of this exercise is to provide the linear algebraic facts that I used in the proof of Theorem 8.3.9. Namely, I want to show that if a set $\{h_1,\dots,h_n\} \subseteq H$ is approximately orthonormal, then the vectors $h_i$ differ by very little from their Gram–Schmidt orthogonalization.
(i) Suppose that $A = \bigl(\!\bigl(a_{ij}\bigr)\!\bigr)_{1 \le i,j \le n} \in \mathbb R^n \otimes \mathbb R^n$ is a lower triangular matrix whose diagonal entries are non-negative. Show that there is a $C_n < \infty$, depending only on $n$, such that $\|I_{\mathbb R^n} - A\|_{\mathrm{op}} \le C_n\|I_{\mathbb R^n} - AA^\top\|_{\mathrm{op}}$.

Hint: Show that it suffices to treat the case when $AA^\top \le 2I_{\mathbb R^n}$, and set $\Delta = I_{\mathbb R^n} - AA^\top$. Assuming that $AA^\top \le 2I_{\mathbb R^n}$, work by induction on $n$, at each step using the lower triangularity of $A$, to see that

$$|a_{\ell\ell}a_{n\ell}| \le |\Delta_{n\ell}| + (AA^\top)_{nn}^{\frac12}\Biggl(\sum_{j=1}^{\ell} a_{\ell j}^2\Biggr)^{\frac12} \quad\text{if } 1 \le \ell < n$$

$$|1 - a_{nn}^2| \le |\Delta_{nn}| + \sum_{\ell=1}^{n-1} a_{n\ell}^2.$$

(ii) Let $\{h_1,\dots,h_n\} \subseteq H$, set $B = \bigl(\!\bigl((h_i, h_j)_H\bigr)\!\bigr)_{1 \le i,j \le n}$, and assume that $\|I_{\mathbb R^n} - B\|_{\mathrm{op}} < 1$. Show that the $h_i$'s are linearly independent.

(iii) Continuing part (ii), let $\{f_1,\dots,f_n\}$ be the orthonormal set obtained from the $h_i$'s by the Gram–Schmidt orthogonalization procedure, and let $A$ be the matrix whose $(i,j)$th entry is $(h_i, f_j)_H$. Show that $A$ is lower triangular and that its diagonal entries are non-negative. In addition, show that $AA^\top = B$.

(iv) By combining (i) and (iii), show that there is a $K_n < \infty$, depending only on $n$, such that

$$\sum_{i=1}^n \|h_i - f_i\|_H \le K_n\sum_{i,j=1}^n \bigl|\delta_{i,j} - (h_i, h_j)_H\bigr|.$$

Hint: Note that $h_i = \sum_{j=1}^n a_{ij}f_j$ and therefore that

$$\|h_i - f_i\|_H^2 = \sum_{j=1}^n \bigl(I_{\mathbb R^n} - A\bigr)_{ij}^2 \le n\|I_{\mathbb R^n} - A\|_{\mathrm{op}}^2.$$
Exercise 8.3.17. Given a Hilbert space $H$, the problem of determining for which Banach spaces $H$ arises as the Cameron–Martin space is an extremely delicate one. For example, one might hope that $H$ will be the Cameron–Martin space for $E$ if $H$ is dense in $E$ and its closed unit ball $B_H(0,1)$ is compact in $E$. However, this is not the case. For example, take $H = \ell^2(\mathbb N; \mathbb R)$ and let $E$ be the completion of $H$ with respect to $\|\xi\|_E \equiv \sqrt{\sum_{n=0}^\infty \frac{\xi_n^2}{n+1}}$. Show that $B_H(0,1)$ is compact as a subset of $E$ but that there is no $\mathcal W \in \mathbf M_1(E)$ for which $(H, E, \mathcal W)$ is an abstract Wiener space.

Hint: The first part is an easy application of the standard diagonalization argument combined with the obvious fact that $\sum_{n \ge m} \frac{\xi_n^2}{n+1} \le \frac1{m+1}\|\xi\|_{\ell^2(\mathbb N;\mathbb R)}^2$. To prove the second part, note that in order for $\mathcal W$ to exist it would be necessary for $\sum_{n=0}^\infty \frac{\xi_n^2}{n+1}$ to be $\gamma_{0,1}^{\mathbb N}$-almost surely convergent.
Exercise 8.3.18. Let $(H, E, \mathcal W)$ be an abstract Wiener space, and assume that $H$ is infinite dimensional. As was pointed out, $\{h_{x^*} : x^* \in E^*\}$ is the subspace of $g \in H$ for which there exists a $C < \infty$ with the property that $|(h, g)_H| \le C\|h\|_E$ for all $h \in H$. Show that for each $g \in H$ there is a separable Banach space $E_g$ that is continuously embedded as a Borel subset of $E$ such that $\mathcal W(E_g) = 1$, $(H, E_g, \mathcal W \restriction E_g)$ is an abstract Wiener space, and $|(h, g)_H| \le \|h\|_{E_g}$ for all $h \in H$.

Hint: Refer to the notation used in the proof of Corollary 8.3.10. Choose $n_m \nearrow \infty$ so that $n_0 = 0$ and, for $m \ge 1$, $\|\Pi_{L_{n_m}^\perp}g\|_H \le 2^{-m}$ and $\mathbb E^{\mathcal W}\bigl[\|P_Lx\|_E^2\bigr]^{\frac12} \le 2^{-m}$ for finite dimensional $L \perp L_{n_m}$. Next, define $E_g$ to be the space of $x \in E$ with the properties that $P_{L_{n_m}}x \longrightarrow x$ in $E$ and

$$\|x\|_{E_g} \equiv \sum_{\ell=0}^\infty \Bigl(\|Q_\ell x\|_E + \bigl|(Q_\ell x, g)_H\bigr|\Bigr) < \infty,$$

where $Q_0x = \langle x, x_0^*\rangle h_{x_0^*}$ and $Q_\ell x = \sum_{n=n_{\ell-1}+1}^{n_\ell} \langle x, x_n^*\rangle h_{x_n^*}$ for $\ell \ge 1$. Using the reasoning in the proof of Corollary 8.3.10, show that $E_g$ has the required properties.

Exercise 8.3.19. Let $N = 1$. Using Theorem 8.3.3, take Wiener's choice of orthonormal basis and check that there are independent, standard normal random variables $\{X_m : m \ge 0\}$ under $\mathcal W^{(1)}$ such that, for $\mathcal W^{(1)}$-almost every $\theta$,

$$\theta(t) = tX_0(\theta) + 2^{\frac12}\sum_{m=1}^\infty X_m(\theta)\frac{\sin(\pi mt)}{m\pi}, \quad t \in [0,1],$$

where the convergence is uniform. From this, show that, $\mathcal W^{(1)}$-almost surely,

$$\int_0^1 \theta(t)^2\,dt = \frac{X_0(\theta)^2}{3} + \frac1{\pi^2}\sum_{m=1}^\infty \frac{X_m(\theta)^2 + \sqrt8\,X_0(\theta)X_m(\theta)}{m^2},$$

where the convergence of the series is absolute. Using the preceding, conclude that, for any $\alpha \in (0,\infty)$,

$$\mathbb E^{\mathcal W^{(1)}}\Biggl[\exp\Biggl(-\alpha\int_0^1 \theta(t)^2\,dt\Biggr)\Biggr] = \Biggl[\prod_{m=1}^\infty \Bigl(1 + \frac{2\alpha}{m^2\pi^2}\Bigr)\Biggr]^{-\frac12}\Biggl[1 + 4\alpha\sum_{m=1}^\infty \frac1{m^2\pi^2 + 2\alpha}\Biggr]^{-\frac12}.$$

Finally, recall Euler's product formula

$$\sinh z = z\prod_{m=1}^\infty \Bigl(1 + \frac{z^2}{m^2\pi^2}\Bigr), \quad z \in \mathbb C,$$

and arrive first at

$$\mathbb E^{\mathcal W^{(1)}}\Biggl[\exp\Biggl(-\alpha\int_0^1 \theta(t)^2\,dt\Biggr)\Biggr] = \Bigl(\cosh\sqrt{2\alpha}\Bigr)^{-\frac12},$$

and then, after an application of Brownian rescaling, at

$$\mathbb E^{\mathcal W^{(1)}}\Biggl[\exp\Biggl(-\alpha\int_0^T \theta(t)^2\,dt\Biggr)\Biggr] = \Bigl(\cosh\bigl(\sqrt{2\alpha}\,T\bigr)\Bigr)^{-\frac12}.$$

This is a famous calculation that can be made using many different methods. We will return to it in § 10.1.3. See, in addition, Exercise 8.4.7.

Hint: Use Euler's product formula to see that

$$2t\sum_{n=1}^\infty \frac1{n^2\pi^2 + t^2} = \frac{d}{dt}\log\frac{\sinh t}{t} \quad\text{for } t \in \mathbb R.$$
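Exercise 8.3.20. Related to the preceding exercise, but easier, is finding the Laplace transform of the variance

$$V_T(\theta) \equiv \frac1T\int_0^T \theta(t)^2\,dt - \Biggl(\frac1T\int_0^T \theta(t)\,dt\Biggr)^2$$

of a Brownian path over the interval $[0,T]$. To do this calculation, first use Brownian scaling to show that

$$\mathbb E^{\mathcal W^{(1)}}\bigl[e^{-\alpha V_T}\bigr] = \mathbb E^{\mathcal W^{(1)}}\bigl[e^{-\alpha TV_1}\bigr].$$

Next, use elementary Fourier series to show that (cf. part (iii) of Exercise 8.2.18)

$$V_1(\theta) = 2\sum_{k=1}^\infty \Biggl(\int_0^1 \theta(t)\cos(k\pi t)\,dt\Biggr)^2 = \sum_{k=1}^\infty \frac{\Bigl(\int_0^1 f_k(t)\,d\theta(t)\Bigr)^2}{k^2\pi^2},$$

where $f_k(t) = 2^{\frac12}\sin(k\pi t)$ for $k \ge 1$. Since the $f_k$'s are orthonormal as elements of $L^2\bigl([0,\infty);\mathbb R\bigr)$, this leads to

$$\mathbb E^{\mathcal W^{(1)}}\bigl[e^{-\alpha V_1}\bigr] = \prod_{k=1}^\infty \Bigl(1 + \frac{2\alpha}{k^2\pi^2}\Bigr)^{-\frac12}.$$

Now apply Euler's formula to arrive at

$$\mathbb E^{\mathcal W^{(1)}}\bigl[e^{-\alpha V_T}\bigr] = \sqrt{\frac{\sqrt{2\alpha T}}{\sinh\bigl(\sqrt{2\alpha T}\bigr)}}.$$

Finally, using Wiener's choice of basis, show that $\theta \rightsquigarrow V_1(\theta)$ has the same distribution as $\theta \rightsquigarrow \int_0^1 \bigl(\theta(t) - t\theta(1)\bigr)^2\,dt$ under $\mathcal W^{(1)}$, a fact for which I would like but do not have any conceptual explanation.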
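As a sanity check on the last two displays in Exercise 8.3.20 (a numerical sketch only; the function names and the truncation level are hypothetical choices for illustration), one can compare a truncation of the infinite product $\prod_{k \ge 1}\bigl(1 + 2\alpha/(k^2\pi^2)\bigr)^{-1/2}$ with the closed form $\bigl(\sqrt{2\alpha T}/\sinh\sqrt{2\alpha T}\bigr)^{1/2}$ at $T = 1$.

```python
import math


def laplace_via_product(alpha, n_terms):
    """Truncated product over 1 <= k <= n_terms of (1 + 2 alpha / (k^2 pi^2))^(-1/2)."""
    log_product = sum(
        math.log1p(2.0 * alpha / (k * k * math.pi ** 2)) for k in range(1, n_terms + 1)
    )
    return math.exp(-0.5 * log_product)


def laplace_closed_form(alpha, T=1.0):
    """sqrt( sqrt(2 alpha T) / sinh(sqrt(2 alpha T)) ), the closed form obtained from Euler's formula."""
    z = math.sqrt(2.0 * alpha * T)
    return math.sqrt(z / math.sinh(z))


if __name__ == "__main__":
    for alpha in (0.5, 1.0, 4.0):
        print(alpha, laplace_via_product(alpha, 100000), laplace_closed_form(alpha))
```

The logarithm of the neglected tail of the product is of order $\alpha/(\pi^2 n)$ when $n$ factors are kept, so agreement to several digits requires a fairly large truncation level.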
Exercise 8.3.21. The purpose of this exercise is to show that, without knowing ahead of time that W (N ) lives on Θ(RN ), for the Hilbert space H1 (RN ) one N -almost surely in Θ(RN ). can give a proof that any Wiener series converges γ0,1 N Thus, let {hm : m ≥ 0} be an orthonormal basis Pn in H(R ) and, for n ∈ N N and ω = (ω0 , . . . , ωm , . . . ) ∈ R , set Sn (t, ω) = m=0 ωm hm (t). The goal is to N show that {Sn ( · , ω) : n ≥ 0} converges in Θ(RN ) for γ0,1 -almost every ω ∈ RN . (i) For ξ ∈ RN , set ht,ξ (τ ) = t∧τ ξ, check that ξ, Sn (t) RN = ht,ξ , Sn (t) H1 (RN ) , N and apply Theorem 1.4.2 to show that limn→∞ ξ, Sn (t) RN exists both γ0,1 2 N N almost surely and in L (γ0,1 ; R) for each (t, ξ) ∈ [0, ∞) × R . Conclude from N this that, for each t ∈ [0, ∞), limn→∞ Sn (t) exists both γ0,1 -almost surely and 2 N N in L (γ0,1 ; R ). (ii) On the basis of part (i), show that we will be done once we know that, N for γ0,1 -almost every x ∈ RN , {Sn ( · , x) : n ≥ 0} is equicontinuous on finite intervals and that supn≥0 t−1 |Sn (t, x)| −→ 0 as t → ∞. Show that both these will follow from the existence of a C < ∞ such that " # Sn (t) − Sn (s) N 3 γ0,1 ≤ CT 8 for all T ∈ (0, ∞). (*) E sup sup 1 (t − s) 8 0≤s 0, set θT (t) = θ(t) − t∧T T θ(T ). As I pointed out at the end of § 8.3.2, the W (N ) -distribution of θT is that of a Brownian motion conditioned to be back at 0 at time T . Next take ΘT (RN ) to be the space of (N ) continuous paths θ : [0, T ] −→ RN satisfying θ(0) = 0 = θ(T ), and let WT (N ) N N denote the W -distribution of θ ∈ Θ(R ) 7−→ θT ∈ ΘT (R ).
(i) Show that the $W^{(N)}$-distribution of $\{\theta_T(t) : t \ge 0\}$ is the same as that of $\{T^{\frac12}\theta_1(T^{-1}t) : t \ge 0\}$.
(ii) Set $H^1_T(\mathbb{R}^N) = \{h\restriction[0,T] : h \in H^1(\mathbb{R}^N)\ \&\ h(T) = 0\}$, and define $\|h\|_{H^1_T(\mathbb{R}^N)} = \|\dot h\|_{L^2([0,T];\mathbb{R}^N)}$. Show that the triple $\bigl(H^1_T(\mathbb{R}^N), \Theta_T(\mathbb{R}^N), W_T^{(N)}\bigr)$ is an abstract Wiener space. In addition, show that $W_T^{(N)}$ is invariant under time reversal. That is, $\{\theta(t) : t \in [0,T]\}$ and $\{\theta(T-t) : t \in [0,T]\}$ have the same distribution under $W_T^{(N)}$.
Hint: Begin by identifying $\Theta_T(\mathbb{R}^N)^*$ as the space of finite, $\mathbb{R}^N$-valued Borel measures $\lambda$ on $[0,T]$ such that $\lambda(\{0\}) = 0 = \lambda(\{T\})$.

Exercise 8.3.23. Say that $D \subseteq E^*$ is determining if $x = y$ whenever $\langle x,x^*\rangle = \langle y,x^*\rangle$ for all $x^* \in D$. Next, referring to Theorem 8.3.14, suppose that $O$ is an orthogonal transformation on $H$ and that $F : E \longmapsto E$ has the properties that $F\restriction H = O$ and that $x \rightsquigarrow \langle F(x),x^*\rangle$ is continuous for all $x^*$'s from a determining set $D$. Show that $T_O x = F(x)$ for $W$-almost every $x \in E$.

Exercise 8.3.24. Consider $\bigl(H^1(\mathbb{R}^N), \Theta(\mathbb{R}^N), W^{(N)}\bigr)$, the classical Wiener space. Given $\alpha \in (0,\infty)$, define $O_\alpha : H^1(\mathbb{R}^N) \longrightarrow H^1(\mathbb{R}^N)$ by $[O_\alpha h](t) = \alpha^{-\frac12}h(\alpha t)$, show that $O_\alpha$ is an orthogonal transformation, and apply Exercise 8.3.23 to see that $T_{O_\alpha}$ is the Brownian scaling map $S_\alpha$ given by $S_\alpha\theta(t) = \alpha^{-\frac12}\theta(\alpha t)$ discussed in part (iii) of Exercise 4.3.10. The main goal of this exercise is to apply Theorem 8.3.15 to show that $T_{O_\alpha}$ is ergodic for every $\alpha \in (0,\infty)\setminus\{1\}$.
(i) Given an orthogonal transformation $O$ on $H^1(\mathbb{R}^N)$, show that $\bigl(O^n h, h'\bigr)_{H^1(\mathbb{R}^N)}$ tends to 0 for all $h, h' \in H^1(\mathbb{R}^N)$ if $\lim_{n\to\infty}\bigl(O^n h, h'\bigr)_{H^1(\mathbb{R}^N)} = 0$ for all $h, h' \in H^1(\mathbb{R}^N)$ with $\dot h, \dot h' \in C_c^\infty\bigl((0,\infty);\mathbb{R}^N\bigr)$.
(ii) Complete the program by showing that $\bigl(O_\alpha^n h, h'\bigr)_{H^1(\mathbb{R}^N)}$ tends to 0 for all $\alpha \in (0,\infty)\setminus\{1\}$ and $h, h' \in H^1(\mathbb{R}^N)$ with $\dot h, \dot h' \in C_c^\infty\bigl((0,\infty);\mathbb{R}^N\bigr)$.
(iii) There is another way to think about the operator $O_\alpha$. Namely, let $\lambda_{\mathbb{R}}$ be Lebesgue measure on $\mathbb{R}$, define $U : H^1(\mathbb{R}^N) \longrightarrow L^2(\lambda_{\mathbb{R}};\mathbb{R}^N)$ by $Uh(x) = e^{\frac x2}\dot h(e^x)$, and show that $U$ is an isometry from $H^1(\mathbb{R}^N)$ onto $L^2(\lambda_{\mathbb{R}};\mathbb{R}^N)$. Further, show that $U\circ O_\alpha = \tau_{\log\alpha}\circ U$, where $\tau_\alpha : L^2(\lambda_{\mathbb{R}};\mathbb{R}^N) \longrightarrow L^2(\lambda_{\mathbb{R}};\mathbb{R}^N)$ is the translation map $\tau_\alpha f(x) = f(x+\alpha)$. Conclude from this that
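$$\bigl(O_\alpha^n h, h'\bigr)_{H^1(\mathbb{R}^N)} = (2\pi)^{-1}\int_{\mathbb{R}} e^{-\sqrt{-1}\,n\xi\log\alpha}\,\bigl(\widehat{Uh}(\xi), \widehat{Uh'}(\xi)\bigr)_{\mathbb{C}^N}\,d\xi,$$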
and use this, together with the Riemann–Lebesgue Lemma, to give a second proof that $\bigl(O_\alpha^n h, h'\bigr)_{H^1(\mathbb{R}^N)}$ tends to 0 as $n \to \infty$ when $\alpha \ne 1$.
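(iv) As a consequence of the above and Theorem 6.2.7, show that for each $\alpha \in (0,\infty)\setminus\{1\}$, $q \in [1,\infty)$, and $F \in L^q(W^{(N)};\mathbb{C})$,
$$\lim_{n\to\infty}\frac1n\sum_{m=0}^{n-1} F\bigl(S_{\alpha^m}\theta\bigr) = \mathbb{E}^{W^{(N)}}[F]\quad W^{(N)}\text{-almost surely and in } L^q(W^{(N)};\mathbb{C}).$$
Next, replace Theorem 6.2.7 by Theorem 6.2.12 to show that
$$\lim_{t\to\infty}\frac{1}{\log t}\int_1^t \tau^{-1}F\bigl(S_\tau\theta\bigr)\,d\tau = \mathbb{E}^{W^{(N)}}[F]$$
$W^{(N)}$-almost surely and in $L^q(W^{(N)};\mathbb{C})$. In particular, use this to show that, for $n \in \mathbb{N}$,
$$\lim_{t\to\infty}\frac{1}{\log t}\int_1^t \tau^{-\frac n2-1}\theta(\tau)^n\,d\tau = \begin{cases}\prod_{m=1}^{\frac n2}(2m-1) & \text{if } n \text{ is even}\\ 0 & \text{if } n \text{ is odd.}\end{cases}$$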
Exercise 8.3.25. Here is a second reasonably explicit example to which Theorem 8.3.15 applies. Again consider the classical case when $H = H^1(\mathbb{R}^N)$, and assume that $N \in \mathbb{Z}^+$ is even. Choose a skew-symmetric $A \in \mathrm{Hom}(\mathbb{R}^N;\mathbb{R}^N)$ whose kernel is $\{0\}$. That is, $A^\top = -A$ and $Ax = 0 \implies x = 0$.
(i) Define $O_A$ on $H^1(\mathbb{R}^N)$ by
$$O_A h(t) = \int_0^t e^{\tau A}\dot h(\tau)\,d\tau,$$
and show that $O_A$ is an orthogonal transformation that satisfies the hypotheses in Theorem 8.3.15.
Hint: Using elementary spectral theory, show that there exist non-zero, real numbers $\alpha_1,\dots,\alpha_{\frac N2}$ and an orthonormal basis $(e_1,\dots,e_N)$ in $\mathbb{R}^N$ such that $Ae_{2m-1} = \alpha_m e_{2m}$ and $Ae_{2m} = -\alpha_m e_{2m-1}$ for $1 \le m \le \frac N2$. Thus, if $L_m$ is the space spanned by $e_{2m-1}$ and $e_{2m}$, then $L_m$ is invariant under $A$ and the action of $e^{\tau A}$ on $L_m$ in terms of this basis is given by
$$\begin{pmatrix}\cos(\alpha_m\tau) & -\sin(\alpha_m\tau)\\ \sin(\alpha_m\tau) & \cos(\alpha_m\tau)\end{pmatrix}.$$
Finally, observe that $O_A^n = O_{nA}$, and apply the Riemann–Lebesgue Lemma.
(ii) With the help of Exercise 8.3.23, show that
$$T_{O_A}\theta(t) = \int_0^t e^{\tau A}\,d\theta(\tau),$$
where the integral is taken in the sense of Riemann–Stieltjes.
§ 8.4 A Large Deviations Result and Strassen's Theorem
In this section I will prove the analog of Corollary 1.3.13 for non-degenerate, centered Gaussian measures on a Banach space. Once we have that result, I will apply it to prove Strassen's Theorem, which is the law of the iterated logarithm for such measures.
§ 8.4.1. Large Deviations for Abstract Wiener Space. The goal of this subsection is to derive the following result.
Theorem 8.4.1. Let $(H,E,W)$ be an abstract Wiener space, and, for $\epsilon > 0$, denote by $W_\epsilon$ the $W$-distribution of $x \rightsquigarrow \epsilon^{\frac12}x$. Then, for each $\Gamma \in \mathcal{B}_E$,
$$-\inf_{h\in\mathring{\Gamma}}\frac{\|h\|_H^2}{2} \le \varliminf_{\epsilon\searrow0}\epsilon\log W_\epsilon(\Gamma) \le \varlimsup_{\epsilon\searrow0}\epsilon\log W_\epsilon(\Gamma) \le -\inf_{h\in\overline{\Gamma}}\frac{\|h\|_H^2}{2}.\tag{8.4.2}$$
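To get a feeling for (8.4.2), it may help to look at the simplest possible instance, which is an illustration chosen here and not an example from the text: $E = H = \mathbb{R}$, $W$ the standard Gaussian measure, and $\Gamma = \{x : |x| \ge 1\}$, for which both infima equal $\frac12$. In that case $W_\epsilon(\Gamma) = \operatorname{erfc}\bigl((2\epsilon)^{-\frac12}\bigr)$, and the sketch below simply evaluates $\epsilon\log W_\epsilon(\Gamma)$ for a few small $\epsilon$.

```python
import math

def scaled_log_prob(eps):
    """eps * log W_eps(|x| >= 1) when W is the standard Gaussian measure on R."""
    # For Z ~ N(0, 1): P(|sqrt(eps) * Z| >= 1) = erfc(1 / sqrt(2 * eps)).
    prob = math.erfc(1.0 / math.sqrt(2.0 * eps))
    return eps * math.log(prob)

if __name__ == "__main__":
    # The values should approach -1/2 from below as eps decreases.
    for eps in (0.1, 0.01, 0.002):
        print(eps, scaled_log_prob(eps))
```

Values of $\epsilon$ much smaller than those shown would make the probability underflow in double precision, so the range is deliberately modest.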
The original version of Theorem 8.4.1 was proved by M. Schilder for the classical Wiener measure using a method that does not extend easily to the general case. The statement that I have given is due to Donsker and S.R.S. Varadhan, and my proof derives from an approach (which very much resembles the arguments given in § 1.3 to prove Cramér's Theorem) that was introduced into this context by Varadhan. The lower bound is an easy application of the Cameron–Martin formula. Indeed, all that I have to do is show that if $h \in H$ and $r > 0$, then
$$\varliminf_{\epsilon\searrow0}\epsilon\log W_\epsilon\bigl(B_E(h,r)\bigr) \ge -\frac{\|h\|_H^2}{2}.\tag{*}$$
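To this end, note that, for any $x^* \in E^*$ and $\delta > 0$,
$$\begin{aligned}
W_\epsilon\bigl(B_E(h_{x^*},\delta)\bigr) &= W\bigl(B_E(\epsilon^{-\frac12}h_{x^*},\epsilon^{-\frac12}\delta)\bigr)\\
&= \mathbb{E}^W\Bigl[e^{-\epsilon^{-\frac12}\langle x,x^*\rangle - \frac{1}{2\epsilon}\|h_{x^*}\|_H^2},\ B_E(0,\epsilon^{-\frac12}\delta)\Bigr]\\
&\ge e^{-\epsilon^{-1}\delta\|x^*\|_{E^*} - \frac{1}{2\epsilon}\|h_{x^*}\|_H^2}\,W\bigl(B_E(0,\epsilon^{-\frac12}\delta)\bigr),
\end{aligned}$$
which means that
$$B_E(h_{x^*},\delta) \subseteq B_E(h,r) \implies \varliminf_{\epsilon\searrow0}\epsilon\log W_\epsilon\bigl(B_E(h,r)\bigr) \ge -\delta\|x^*\|_{E^*} - \frac{\|h_{x^*}\|_H^2}{2},$$
and therefore, after letting $\delta \searrow 0$ and remembering that $\{h_{x^*} : x^* \in E^*\}$ is dense in $H$, that (*) holds.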
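The proof of the upper bound in (8.4.2) is a little more involved. The first step is to show that it suffices to treat the case when $\Gamma$ is relatively compact. To this end, refer to Corollary 8.3.10, and set $C_R$ equal to the closure in $E$ of $B_{E_0}(0,R)$. By Fernique's Theorem applied to $W$ on $E_0$, we know that $\mathbb{E}^W\bigl[e^{\alpha\|x\|_{E_0}^2}\bigr] \le K < \infty$ for some $\alpha > 0$. Hence
$$W_\epsilon\bigl(E\setminus C_R\bigr) = W\bigl(E\setminus C_{\epsilon^{-\frac12}R}\bigr) \le Ke^{-\alpha\frac{R^2}{\epsilon}},$$
and so, for any $\Gamma \in \mathcal{B}_E$ and $R > 0$,
$$W_\epsilon(\Gamma) \le 2\Bigl(W_\epsilon(\Gamma\cap C_R) \vee Ke^{-\alpha\frac{R^2}{\epsilon}}\Bigr).$$
Thus, if we can prove the upper bound for relatively compact $\Gamma$'s, then, because $\Gamma\cap C_R$ is relatively compact, we will know that, for all $R > 0$,
$$\varlimsup_{\epsilon\searrow0}\epsilon\log W_\epsilon(\Gamma) \le -\Biggl[\Bigl(\inf_{h\in\overline{\Gamma}}\frac{\|h\|_H^2}{2}\Bigr)\wedge\bigl(\alpha R^2\bigr)\Biggr],$$
from which the general result is immediate.
To prove the upper bound when $\Gamma$ is relatively compact, I will show that, for any $y \in E$,
$$\lim_{r\searrow0}\varlimsup_{\epsilon\searrow0}\epsilon\log W_\epsilon\bigl(B_E(y,r)\bigr) \le \begin{cases}-\frac{\|y\|_H^2}{2} & \text{if } y \in H\\ -\infty & \text{if } y \notin H.\end{cases}\tag{**}$$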
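To see that (**) is enough, assume that it is true and let $\Gamma \in \mathcal{B}_E\setminus\{\emptyset\}$ be relatively compact. Given $\beta \in (0,1)$, for each $y \in \overline{\Gamma}$ choose $r(y) > 0$ and $\epsilon(y) > 0$ so that
$$W_\epsilon\bigl(B_E(y,r(y))\bigr) \le \begin{cases} e^{-\frac{1-\beta}{2\epsilon}\|y\|_H^2} & \text{if } y \in H\\ e^{-\frac{1}{\epsilon\beta}} & \text{if } y \notin H\end{cases}$$
for all $0 < \epsilon \le \epsilon(y)$. Because $\Gamma$ is relatively compact, we can find $N \in \mathbb{Z}^+$ and $\{y_1,\dots,y_N\} \subseteq \overline{\Gamma}$ such that $\overline{\Gamma} \subseteq \bigcup_1^N B_E(y_n,r_n)$, where $r_n = r(y_n)$. Then, for sufficiently small $\epsilon > 0$,
$$W_\epsilon(\Gamma) \le N\exp\Biggl(-\Bigl[\frac{1-\beta}{2\epsilon}\inf_{h\in\overline{\Gamma}}\|h\|_H^2\Bigr]\wedge\frac{1}{\epsilon\beta}\Biggr),$$
and so
$$\varlimsup_{\epsilon\searrow0}\epsilon\log W_\epsilon(\Gamma) \le -\Biggl(\Bigl[\frac{1-\beta}{2}\inf_{h\in\overline{\Gamma}}\|h\|_H^2\Bigr]\wedge\frac1\beta\Biggr).$$
Now let $\beta \searrow 0$.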
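Finally, to prove (**), observe that
$$\begin{aligned}
W_\epsilon\bigl(B_E(y,r)\bigr) &= W\Bigl(B_E\bigl(\tfrac{y}{\sqrt\epsilon},\tfrac{r}{\sqrt\epsilon}\bigr)\Bigr) = \mathbb{E}^W\Bigl[e^{-\epsilon^{-\frac12}\langle x,x^*\rangle}e^{\epsilon^{-\frac12}\langle x,x^*\rangle},\ B_E\bigl(\tfrac{y}{\sqrt\epsilon},\tfrac{r}{\sqrt\epsilon}\bigr)\Bigr]\\
&\le e^{-\epsilon^{-1}(\langle y,x^*\rangle - r\|x^*\|_{E^*})}\,\mathbb{E}^W\Bigl[e^{\epsilon^{-\frac12}\langle x,x^*\rangle}\Bigr] = e^{-\epsilon^{-1}\bigl(\langle y,x^*\rangle - \frac{\|h_{x^*}\|_H^2}{2} - r\|x^*\|_{E^*}\bigr)},
\end{aligned}$$
for all $x^* \in E^*$. Hence,
$$\lim_{r\searrow0}\varlimsup_{\epsilon\searrow0}\epsilon\log W_\epsilon\bigl(B_E(y,r)\bigr) \le -\sup_{x^*\in E^*}\Bigl(\langle y,x^*\rangle - \tfrac12\|h_{x^*}\|_H^2\Bigr).$$
Finally, note that the preceding supremum is the same as half the supremum of $\langle y,x^*\rangle^2$ over $x^*$ with $\|h_{x^*}\|_H = 1$, which, by Lemma 8.2.3, is equal to $\frac{\|y\|_H^2}{2}$ if $y \in H$ and to $\infty$ if $y \notin H$.
An interesting corollary of Theorem 8.4.1 is the following sharpening, due to Donsker and Varadhan, of Fernique's Theorem.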
Corollary 8.4.3. Let $W$ be a non-degenerate, centered, Gaussian measure on the separable Banach space $E$, let $H$ be the associated Cameron–Martin space, and determine $\Sigma > 0$ by $\Sigma^{-1} = \inf\{\|h\|_H : \|h\|_E = 1\}$. Then
$$\lim_{R\to\infty}R^{-2}\log W\bigl(\|x\|_E \ge R\bigr) = -\frac{1}{2\Sigma^2}.$$
In particular, $\mathbb{E}^W\bigl[e^{\frac{\alpha^2}{2}\|x\|_E^2}\bigr]$ is finite if $\alpha < \Sigma^{-1}$ and infinite if $\alpha \ge \Sigma^{-1}$.
Proof: Set $f(r) = \inf\{\|h\|_H : \|h\|_E \ge r\}$. Clearly $f(r) = rf(1)$ and $f(1) = \Sigma^{-1}$. Thus, by the upper bound in (8.4.2), we know that
$$\varlimsup_{R\to\infty}R^{-2}\log W\bigl(\|x\|_E \ge R\bigr) = \varlimsup_{R\to\infty}R^{-2}\log W_{R^{-2}}\bigl(\|x\|_E \ge 1\bigr) \le -\frac{f(1)^2}{2} = -\frac{\Sigma^{-2}}{2}.$$
Similarly, by the lower bound in (8.4.2), for any $\delta \in (0,1)$,
$$\begin{aligned}
\varliminf_{R\to\infty}R^{-2}\log W\bigl(\|x\|_E \ge R\bigr) &\ge \varliminf_{R\to\infty}R^{-2}\log W\bigl(\|x\|_E > R\bigr)\\
&\ge -\inf\Bigl\{\frac{\|h\|_H^2}{2} : \|h\|_E > 1\Bigr\} \ge -\frac{f(1+\delta)^2}{2} = -(1+\delta)^2\frac{1}{2\Sigma^2},
\end{aligned}$$
and so we have now proved the first assertion.
Given the first assertion, it is obvious that $\mathbb{E}^W\bigl[e^{\frac{\alpha^2}{2}\|x\|_E^2}\bigr]$ is finite when $\alpha < \Sigma^{-1}$ and infinite when $\alpha > \Sigma^{-1}$. The case when $\alpha = \Sigma^{-1}$ is more delicate. To handle it, I first show that $\Sigma = \sup\{\|h_{x^*}\|_H : \|x^*\|_{E^*} = 1\}$. Indeed, if $x^* \in E^*$ and $\|x^*\|_{E^*} = 1$, set $g = \frac{h_{x^*}}{\|h_{x^*}\|_E}$, note that $\|g\|_E = 1$, and check that $1 \ge \langle g,x^*\rangle = \bigl(g,h_{x^*}\bigr)_H = \|g\|_H\|h_{x^*}\|_H$. Hence $\|h_{x^*}\|_H \le \|g\|_H^{-1} \le \Sigma$. Next, suppose that $h \in H$ with $\|h\|_E = 1$. Then, by the Hahn–Banach Theorem, there exists an $x^* \in E^*$ with $\|x^*\|_{E^*} = 1$ and $\langle h,x^*\rangle = 1$. In particular, $\|h\|_H\|h_{x^*}\|_H \ge \bigl(h,h_{x^*}\bigr)_H = \langle h,x^*\rangle = 1$, and therefore $\|h\|_H^{-1} \le \|h_{x^*}\|_H$, which, together with the preceding, completes the verification.
The next step is to show that there exists an $x^* \in E^*$ with $\|x^*\|_{E^*} = 1$ such that $\|h_{x^*}\|_H = \Sigma$. To this end, choose $\{x_k^* : k \ge 1\} \subseteq E^*$ with $\|x_k^*\|_{E^*} = 1$ so that $\|h_{x_k^*}\|_H \longrightarrow \Sigma$. Because $\overline{B_{E^*}(0,1)}$ is compact in the weak* topology and, by Theorem 8.2.6, $x^* \in E^* \longmapsto h_{x^*} \in H$ is continuous from the weak* topology into the strong topology, we can assume that $\{x_k^* : k \ge 1\}$ is weak* convergent to some $x^* \in \overline{B_{E^*}(0,1)}$ and that $\|h_{x^*}\|_H = \Sigma$, which is possible only if $\|x^*\|_{E^*} = 1$. Finally, knowing that this $x^*$ exists, note that $\langle\,\cdot\,,x^*\rangle$ is a centered Gaussian under $W$ with variance $\Sigma^2$. Hence, since $\|x\|_E \ge |\langle x,x^*\rangle|$,
$$\mathbb{E}^W\Bigl[e^{\frac{\|x\|_E^2}{2\Sigma^2}}\Bigr] \ge \int_{\mathbb{R}} e^{\frac{\xi^2}{2\Sigma^2}}\,\gamma_{0,\Sigma^2}(d\xi) = \infty.$$
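The dichotomy at $\alpha = \Sigma^{-1}$ is already visible in one dimension. As an illustration only (the setting $E = \mathbb{R}$, $W = \gamma_{0,\sigma^2}$, and all names below are choices made here), one has $\mathbb{E}\bigl[e^{\frac{\alpha^2}{2}\xi^2}\bigr] = (1 - \alpha^2\sigma^2)^{-\frac12}$ when $\alpha\sigma < 1$ and $\infty$ otherwise. The sketch compares this closed form with a plain midpoint-rule quadrature in the convergent range.

```python
import math

def exp_moment_closed(alpha, sigma):
    """E[exp(alpha^2 xi^2 / 2)] for xi ~ N(0, sigma^2); infinite when alpha*sigma >= 1."""
    c = alpha * sigma
    if c >= 1.0:
        return math.inf
    return 1.0 / math.sqrt(1.0 - c * c)

def exp_moment_quadrature(alpha, sigma, half_width=40.0, steps=200000):
    """Midpoint-rule value of the same expectation; meaningful only if alpha*sigma < 1."""
    dx = 2.0 * half_width / steps
    rate = alpha * alpha - 1.0 / (sigma * sigma)  # negative in the convergent range
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * dx
        total += math.exp(rate * x * x / 2.0)
    return total * dx / (sigma * math.sqrt(2.0 * math.pi))

if __name__ == "__main__":
    print(exp_moment_closed(0.5, 1.0), exp_moment_quadrature(0.5, 1.0))
    print(exp_moment_closed(1.0, 1.0))
```

The quadrature is not called at or beyond the critical value, where the integrand no longer decays.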
§ 8.4.2. Strassen's Law of the Iterated Logarithm. Just as in § 1.5 we were able to prove a law of the iterated logarithm on the basis of the large deviation estimates in § 1.3, so here the estimates in the preceding subsection will allow us to prove a law of the iterated logarithm for centered Gaussian random variables on a Banach space. Specifically, I will prove the following theorem, whose statement is modeled on V. Strassen's famous law of the iterated logarithm for Brownian motion (cf. § 8.6.3). Recall from § 1.5 the notation $\Lambda_n = \sqrt{2n\log_{(2)}(n\vee3)}$ and $\tilde S_n = \frac{S_n}{\Lambda_n}$, where $S_n = \sum_1^n X_m$.
Theorem 8.4.4. Suppose that $W$ is a non-degenerate, centered, Gaussian measure on the Banach space $E$, and let $H$ be its Cameron–Martin space. If $\{X_n : n \ge 1\}$ is a sequence of independent, $E$-valued, $W$-distributed random variables on some probability space $(\Omega,\mathcal{F},\mathbb{P})$, then, $\mathbb{P}$-almost surely, the sequence $\{\tilde S_n : n \ge 1\}$ is relatively compact in $E$ and the closed unit ball $\overline{B_H(0,1)}$ in $H$ coincides with its set of limit points. Equivalently, $\mathbb{P}$-almost surely, $\lim_{n\to\infty}\|\tilde S_n - \overline{B_H(0,1)}\|_E = 0$ and, for each $h \in \overline{B_H(0,1)}$, $\varliminf_{n\to\infty}\|\tilde S_n - h\|_E = 0$.
Because, by Theorem 8.2.6, $\overline{B_H(0,1)}$ is compact in $E$, the equivalence of the two formulations is obvious, and so I will concentrate on the second formulation.
I begin by showing that $\lim_{n\to\infty}\|\tilde S_n - \overline{B_H(0,1)}\|_E = 0$ $\mathbb{P}$-almost surely, and the fact that underlies my proof is the estimate that, for each open subset $G$ of $E$ and $\alpha < \inf\{\|h\|_H : h \notin G\}$, there is an $M \in (0,\infty)$ with the property that
$$\mathbb{P}\Bigl(\frac{S_n}{\Lambda} \notin G\Bigr) \le \exp\Bigl(-\frac{\alpha^2\Lambda^2}{2n}\Bigr)\quad\text{for all } n \in \mathbb{Z}^+ \text{ and } \Lambda \ge M\sqrt n.\tag{*}$$
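To check (*), first note (cf. Exercise 8.2.14) that the distribution of $S_n$ under $\mathbb{P}$ is the same as that of $x \rightsquigarrow n^{\frac12}x$ under $W$ and therefore that $\mathbb{P}\bigl(\frac{S_n}{\Lambda} \notin G\bigr) = W_{\frac{n}{\Lambda^2}}\bigl(G^{\complement}\bigr)$. Hence, (*) is really just an application of the upper bound in (8.4.2).
Given (*), I proceed in very much the same way as I did at the analogous place in § 1.5. Namely, for any $\beta \in (1,2)$,
$$\begin{aligned}
\varlimsup_{n\to\infty}\bigl\|\tilde S_n - \overline{B_H(0,1)}\bigr\|_E &\le \varlimsup_{m\to\infty}\ \max_{\beta^{m-1}\le n\le\beta^m}\bigl\|\tilde S_n - \overline{B_H(0,1)}\bigr\|_E\\
&\le \varlimsup_{m\to\infty}\ \max_{\beta^{m-1}\le n\le\beta^m}\frac{\bigl\|S_n - \overline{B_H(0,\Lambda_{[\beta^{m-1}]})}\bigr\|_E}{\Lambda_n}\\
&\le \varlimsup_{m\to\infty}\ \max_{1\le n\le\beta^m}\Bigl\|\frac{S_n}{\Lambda_{[\beta^{m-1}]}} - \overline{B_H(0,1)}\Bigr\|_E.
\end{aligned}$$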
At this point in § 1.5 (cf. the proof of Lemma 1.5.3), I applied Lévy's reflection principle to get rid of the “max.” However, Lévy's argument works only for $\mathbb{R}$-valued random variables, and so here I will replace his estimate by one based on the idea in Exercise 1.4.25.
Lemma 8.4.5. Let $\{Y_m : m \ge 1\}$ be mutually independent, $E$-valued random variables, and set $S_n = \sum_{m=1}^n Y_m$ for $n \ge 1$. Then, for any closed $F \subseteq E$ and $\delta > 0$,
$$\mathbb{P}\Bigl(\max_{1\le m\le n}\|S_m - F\|_E \ge 2\delta\Bigr) \le \frac{\mathbb{P}\bigl(\|S_n - F\|_E \ge \delta\bigr)}{1 - \max_{1\le m\le n}\mathbb{P}\bigl(\|S_n - S_m\|_E \ge \delta\bigr)}.$$
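Proof: Set $A_m = \{\|S_m - F\|_E \ge 2\delta \text{ and } \|S_k - F\|_E < 2\delta \text{ for } 1 \le k < m\}$. Following the hint for Exercise 1.4.25, observe that
$$\begin{aligned}
\mathbb{P}\Bigl(\max_{1\le m\le n}\|S_m - F\|_E \ge 2\delta\Bigr)\min_{1\le m\le n}\mathbb{P}\bigl(\|S_n - S_m\|_E < \delta\bigr) &\le \sum_{m=1}^n \mathbb{P}\bigl(A_m\cap\{\|S_n - S_m\|_E < \delta\}\bigr)\\
&\le \sum_{m=1}^n \mathbb{P}\bigl(A_m\cap\{\|S_n - F\|_E \ge \delta\}\bigr),
\end{aligned}$$
which, because the $A_m$'s are disjoint, is dominated by $\mathbb{P}\bigl(\|S_n - F\|_E \ge \delta\bigr)$.
Applying the preceding to the situation at hand, we see that
$$\mathbb{P}\Biggl(\max_{1\le n\le\beta^m}\Bigl\|\frac{S_n}{\Lambda_{[\beta^{m-1}]}} - \overline{B_H(0,1)}\Bigr\|_E \ge 2\delta\Biggr) \le \frac{\mathbb{P}\Bigl(\Bigl\|\frac{S_{[\beta^m]}}{\Lambda_{[\beta^{m-1}]}} - \overline{B_H(0,1)}\Bigr\|_E \ge \delta\Bigr)}{1 - \max_{1\le n\le\beta^m}\mathbb{P}\bigl(\|S_n\|_E \ge \delta\Lambda_{[\beta^{m-1}]}\bigr)}.$$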
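After combining this with the estimate in (*), it is an easy matter to show that, for each $\delta > 0$, there is a $\beta \in (1,2)$ such that
$$\sum_{m=1}^\infty \mathbb{P}\Biggl(\max_{\beta^{m-1}\le n\le\beta^m}\Bigl\|\frac{S_n}{\Lambda_{[\beta^{m-1}]}} - \overline{B_H(0,1)}\Bigr\|_E \ge 2\delta\Biggr) < \infty,$$
from which it should be clear why $\lim_{n\to\infty}\|\tilde S_n - \overline{B_H(0,1)}\|_E = 0$ $\mathbb{P}$-almost surely.
The proof that, $\mathbb{P}$-almost surely, $\varliminf_{n\to\infty}\|\tilde S_n - h\|_E = 0$ for all $h \in \overline{B_H(0,1)}$ differs in no substantive way from the proof of the analogous assertion in the second part of Theorem 1.5.9. Namely, because $\overline{B_H(0,1)}$ is separable, it suffices to work with one $h \in \overline{B_H(0,1)}$ at a time. Furthermore, just as I did there, I can reduce the problem to showing that, for each $k \ge 2$, $\epsilon > 0$, and $h$ with $\|h\|_H < 1$,
$$\sum_{m=1}^\infty \mathbb{P}\Bigl(\bigl\|\tilde S_{k^m-k^{m-1}} - h\bigr\|_E < \epsilon\Bigr) = \infty.$$
But, if $\|h\|_H < \alpha < 1$, then (8.4.2) says that
$$\mathbb{P}\Bigl(\bigl\|\tilde S_{k^m-k^{m-1}} - h\bigr\|_E < \epsilon\Bigr) = W_{\frac{k^m-k^{m-1}}{\Lambda^2_{k^m-k^{m-1}}}}\bigl(B_E(h,\epsilon)\bigr) \ge e^{-\alpha^2\log_{(2)}(k^m-k^{m-1})}$$
for all large enough $m$'s.

Exercises for § 8.4
Exercise 8.4.6. Let $(H,E,W)$ be an abstract Wiener space, and assume that $\dim(H) = \infty$. If $W_\epsilon$ is defined for $\epsilon > 0$ as in Theorem 8.4.1, show that $W_{\epsilon_1} \perp W_{\epsilon_2}$ if $\epsilon_2 \ne \epsilon_1$.
Hint: Choose $\{x_m^* : m \ge 0\} \subseteq E^*$ so that $\{h_{x_m^*} : m \ge 0\}$ is an orthonormal basis in $H$, and show that
$$\lim_{n\to\infty}\frac1n\sum_{m=0}^{n-1}\langle x,x_m^*\rangle^2 = \epsilon\quad W_\epsilon\text{-almost surely.}$$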
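Exercise 8.4.7. Show that the $\Sigma$ in Corollary 8.4.3 is $\frac12$ in the case of the classical abstract Wiener space $\bigl(H^1(\mathbb{R}^N),\Theta(\mathbb{R}^N),W^{(N)}\bigr)$ and therefore that
$$\lim_{R\to\infty}R^{-2}\log W^{(N)}\bigl(\|\theta\|_{\Theta(\mathbb{R}^N)} \ge R\bigr) = -2.$$
Next, show that
$$\lim_{R\to\infty}R^{-2}\log W^{(N)}\Bigl(\sup_{\tau\in[0,t]}|\theta(\tau)| \ge R\Bigr) = -\frac{1}{2t}$$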
and that
$$\lim_{R\to\infty}R^{-2}\log W^{(N)}\Bigl(\sup_{\tau\in[0,t]}|\theta(\tau)| \ge R\ \Big|\ \theta(t) = 0\Bigr) = -\frac2t.$$
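Finally, show that
$$\lim_{R\to\infty}R^{-1}\log W^{(N)}\Bigl(\int_0^t|\theta(\tau)|^2\,d\tau \ge R\Bigr) = -\frac{\pi^2}{8t^2}$$
and that
$$\lim_{R\to\infty}R^{-1}\log W^{(N)}\Bigl(\int_0^t|\theta(\tau)|^2\,d\tau \ge R\ \Big|\ \theta(t) = 0\Bigr) = -\frac{\pi^2}{2t^2}.$$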
Hint: In each case after the first, Brownian scaling can be used to reduce the problem to the case when $t = 1$, and the challenge is to find the optimal constant $C$ for which $\|h\|_E \le C\|h\|_H$, $h \in H$, for the appropriate abstract Wiener space $(E,H,W)$. In the second case $E = C_0\bigl([0,1];\mathbb{R}^N\bigr) \equiv \{\theta\restriction[0,1] : \theta \in \Theta(\mathbb{R}^N)\}$ and $H = \{\eta\restriction[0,1] : \eta \in H^1(\mathbb{R}^N)\}$, in the third (cf. part (ii) of Exercise 8.3.22) $E = \Theta_1(\mathbb{R}^N)$ and $H = H^1_1(\mathbb{R}^N)$, in the fourth $E = L^2\bigl([0,1];\mathbb{R}^N\bigr)$ and $H = \{\eta\restriction[0,1] : \eta \in H^1(\mathbb{R}^N)\}$, and in the fifth $E = L^2\bigl([0,1];\mathbb{R}^N\bigr)$ and $H = H^1_1(\mathbb{R}^N)$. The optimization problems when $E = \Theta(\mathbb{R}^N)$ or $C_0\bigl([0,1];\mathbb{R}^N\bigr)$ are rather easy consequences of $|\eta(t)| \le t^{\frac12}\|\eta\|_{H^1(\mathbb{R}^N)}$. When $E = \Theta_1(\mathbb{R}^N)$, one should start with the observation that if $\eta \in H^1_1(\mathbb{R}^N)$, then $2\|\eta\|_u \le \|\dot\eta\|_{L^1([0,1];\mathbb{R}^N)} \le \|\eta\|_{H^1_1(\mathbb{R}^N)}$. In the final two cases, one can either use elementary variational calculus or one can make use of, respectively, the orthonormal bases
$$\bigl\{2^{\frac12}\sin\bigl(\bigl(n+\tfrac12\bigr)\pi\tau\bigr) : n \ge 0\bigr\}\quad\text{and}\quad\bigl\{2^{\frac12}\sin(n\pi\tau) : n \ge 1\bigr\}\quad\text{in } L^2\bigl([0,1];\mathbb{R}\bigr).$$
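Each limit in this exercise has the form $-\frac{1}{2C^2}$ for the optimal constant $C$ in $\|h\|_E \le C\|h\|_H$, by Corollary 8.4.3. The sketch below is only a bookkeeping aid for $t = 1$: it approximates the first constant, $\sup_{t>0}t^{\frac12}/(1+t)$, on a grid and evaluates the rate for candidate constants. The values $\frac2\pi$ and $\frac1\pi$ used in the usage lines are simply the ones consistent with the stated limits, not a derivation, and the function names are hypothetical.

```python
import math

def rate(c):
    """Rate -1/(2 C^2) from Corollary 8.4.3 when ||h||_E <= C ||h||_H is optimal."""
    return -1.0 / (2.0 * c * c)

def sup_ratio(grid=200000, t_max=50.0):
    """Grid approximation of sup over t > 0 of t^(1/2) / (1 + t)."""
    best = 0.0
    for i in range(1, grid + 1):
        t = t_max * i / grid
        best = max(best, math.sqrt(t) / (1.0 + t))
    return best

if __name__ == "__main__":
    sigma = sup_ratio()                 # constant for E = Theta(R^N)
    print(sigma, rate(sigma))
    print(rate(2.0 / math.pi))          # L^2([0,1]) case, free endpoint
    print(rate(1.0 / math.pi))          # L^2([0,1]) case, pinned at time 1
```

The grid contains $t = 1$, where the supremum of $t^{\frac12}/(1+t)$ is attained.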
Exercise 8.4.8. Suppose that $f \in C(E;\mathbb{R})$, and show, as a consequence of Theorem 8.4.4, that
$$\varliminf_{n\to\infty}f\bigl(\tilde S_n\bigr) = \min\{f(h) : \|h\|_H \le 1\}\quad\text{and}\quad\varlimsup_{n\to\infty}f\bigl(\tilde S_n\bigr) = \max\{f(h) : \|h\|_H \le 1\}$$
$W^{\mathbb{N}}$-almost surely.

§ 8.5 Euclidean Free Fields
In this section I will give a very cursory introduction to a family of abstract Wiener spaces that played an important role in the attempt to give a mathematically rigorous construction of quantum fields. From the physical standpoint, the fields treated here are “trivial” in the sense that they model “free” (i.e., non-interacting) fields. Nonetheless, they are interesting from a mathematical
standpoint and, if nothing else, show how profoundly properties of a process are affected by the dimension of its parameter set. I begin with the case when the parameter set is one dimensional and the resulting process can be seen as a minor variant of Brownian motion. As we will see, the intractability of the higher dimensional analogs increases with the number of dimensions.
§ 8.5.1. The Ornstein–Uhlenbeck Process. Given $x \in \mathbb{R}^N$ and $\theta \in \Theta(\mathbb{R}^N)$, consider the integral equation
$$U(t,x,\theta) = x + \theta(t) - \frac12\int_0^t U(\tau,x,\theta)\,d\tau,\quad t \ge 0.\tag{8.5.1}$$
A completely elementary argument (e.g., via Gronwall's Inequality) shows that, for each $x$ and $\theta$, there is at most one solution. Furthermore, integration by parts allows one to check that if
$$U(t,0,\theta) = e^{-\frac t2}\int_0^t e^{\frac\tau2}\,d\theta(\tau),$$
where the integral is taken in the sense of Riemann–Stieltjes, then
$$U(t,x,\theta) = e^{-\frac t2}x + U(t,0,\theta)$$
is one, and therefore the one and only, solution. The stochastic process $\{U(t,x) : t \ge 0\}$ under $W^{(N)}$ was introduced by L. Ornstein and G. Uhlenbeck$^1$ and is known as the Ornstein–Uhlenbeck process starting from $x$. From our immediate point of view, its importance is that it leads to a completely tractable example of a free field.
Intuitively, $U(t,0,\theta)$ is a Brownian motion that has been subjected to a linear restoring force. Thus, locally it should behave very much like a Brownian motion. However, over long time intervals it should feel the effect of the restoring force, which is always pushing it back toward the origin. To see how these intuitive ideas are reflected in the distribution of $\{U(t,0,\theta) : t \ge 0\}$, I begin by using Exercise 8.2.18 to identify $\bigl(e,U(t,0)\bigr)_{\mathbb{R}^N}$ as $e^{-\frac t2}I(h_e^t)$ for each $e \in \mathbb{S}^{N-1}$, where $h_e^t(\tau) = 2\bigl(e^{\frac{t\wedge\tau}{2}} - 1\bigr)e$. Hence, the span of $\bigl\{\bigl(\xi,U(t,0)\bigr)_{\mathbb{R}^N} : t \ge 0\ \&\ \xi \in \mathbb{R}^N\bigr\}$ is a Gaussian family in $L^2(W^{(N)};\mathbb{R})$, and
$$\mathbb{E}^{W^{(N)}}\bigl[U(s,0)\otimes U(t,0)\bigr] = \Bigl(e^{-\frac{|t-s|}{2}} - e^{-\frac{s+t}{2}}\Bigr)I.$$
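The covariance just stated can be read off from the representation of $U(t,0)$ as $e^{-\frac t2}$ times an integral of $e^{\frac\tau2}$ against $\theta$: in each coordinate it equals $e^{-\frac{s+t}{2}}\int_0^{s\wedge t}e^\tau\,d\tau$. The following is a small sketch, with hypothetical function names, confirming that this expression and the displayed one agree numerically.

```python
import math

def cov_from_integral(s, t):
    """e^{-(s+t)/2} * integral_0^{min(s,t)} e^tau d tau, one coordinate of the covariance."""
    return math.exp(-(s + t) / 2.0) * (math.exp(min(s, t)) - 1.0)

def cov_stated(s, t):
    """e^{-|t-s|/2} - e^{-(s+t)/2}, the form displayed in the text."""
    return math.exp(-abs(t - s) / 2.0) - math.exp(-(s + t) / 2.0)

if __name__ == "__main__":
    for s, t in ((0.3, 0.3), (0.5, 2.0), (4.0, 1.5)):
        print(s, t, cov_from_integral(s, t), cov_stated(s, t))
```

Setting $s = t$ gives the variance $1 - e^{-t}$, which increases to 1 as $t \to \infty$.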
$^1$ In their article “On the theory of Brownian motion,” Phys. Reviews 36 # 3, pp. 823-841 (1930), L. Ornstein and G. Uhlenbeck introduced this process in an attempt to reconcile some of the more disturbing properties of Wiener paths with physical reality.

The key to understanding the process $\{U(t,0) : t \ge 0\}$ is the observation that it has the same distribution as the process $\bigl\{e^{-\frac t2}B\bigl(e^t - 1\bigr) : t \ge 0\bigr\}$, where $\{B(t) : t \ge 0\}$ is a Brownian motion, a fact that follows immediately from the observation that they are Gaussian families with the same covariance structure. In particular, by combining this with the Law of the Iterated Logarithm proved in Exercise 4.3.15, we see that, for each $e \in \mathbb{S}^{N-1}$,
$$\varlimsup_{t\to\infty}\frac{\bigl(e,U(t,x)\bigr)_{\mathbb{R}^N}}{\sqrt{2\log t}} = 1 = -\varliminf_{t\to\infty}\frac{\bigl(e,U(t,x)\bigr)_{\mathbb{R}^N}}{\sqrt{2\log t}}\tag{8.5.2}$$
$W^{(N)}$-almost surely, which confirms the suspicion that the restoring force dampens the Brownian excursions out toward infinity.
A second indication that $U(\,\cdot\,,x)$ tends to spend more time than Brownian paths do near the origin is that its distribution at time $t$ will be $\gamma_{e^{-\frac t2}x,(1-e^{-t})I}$, and so, as distinguished from Brownian motion itself, its distribution at time $t$ tends, as $t \to \infty$, to a limit, namely $\gamma_{0,I}$. This observation suggests that it might be interesting to look at an ancient Ornstein–Uhlenbeck process, one that already has been running for an infinite amount of time. To be more precise, since the distribution of an ancient Ornstein–Uhlenbeck process at time 0 would be $\gamma_{0,I}$, what we should look at is the process that we get by making the $x$ in $U(\,\cdot\,,x,\theta)$ a standard normal random variable. Thus, I will say that a stochastic process $\{U_A(t) : t \ge 0\}$ is an ancient Ornstein–Uhlenbeck process if its distribution is that of $\{U(t,x,\theta) : t \ge 0\}$ under $\gamma_{0,I}\times W^{(N)}$.
If $\{U_A(t) : t \ge 0\}$ is an ancient Ornstein–Uhlenbeck process, then it is clear that $\bigl\{\bigl(\xi,U_A(t)\bigr)_{\mathbb{R}^N} : t \ge 0\ \&\ \xi \in \mathbb{R}^N\bigr\}$ spans a Gaussian family with covariance
$$\mathbb{E}^{\mathbb{P}}\bigl[U_A(s)\otimes U_A(t)\bigr] = e^{-\frac{|t-s|}{2}}I.$$
As a consequence, we see that if $\{B(t) : t \ge 0\}$ is a Brownian motion, then $\bigl\{e^{-\frac t2}B\bigl(e^t\bigr) : t \ge 0\bigr\}$ is an ancient Ornstein–Uhlenbeck process. In addition, as we suspected, the ancient Ornstein–Uhlenbeck process is a stationary process in the sense that, for each $T > 0$, the distribution of $\{U_A(t+T) : t \ge 0\}$ is the same as that of $\{U_A(t) : t \ge 0\}$, which can be checked either by using the preceding representation in terms of Brownian motion or by observing that its covariance is a function of $t - s$.
In fact, even more is true: it is time reversible in the sense that, for each $T > 0$, $\{U_A(t) : t \in [0,T]\}$ has the same distribution as $\{U_A(T-t) : t \in [0,T]\}$. This observation suggests that we can give the ancient Ornstein–Uhlenbeck process its past by running it backwards. That is, define $U_R : \mathbb{R}\times\mathbb{R}^N\times\Theta(\mathbb{R}^N)^2 \longrightarrow \mathbb{R}^N$ by
$$U_R(t,x,\theta_+,\theta_-) = \begin{cases}U(t,x,\theta_+) & \text{if } t \ge 0\\ U(-t,x,\theta_-) & \text{if } t < 0,\end{cases}$$
and consider the process $\{U_R(t,x,\theta_+,\theta_-) : t \in \mathbb{R}\}$ under $\gamma_{0,I}\times W^{(N)}\times W^{(N)}$. This process also spans a Gaussian family, and it is still true that
$$\mathbb{E}^{\gamma_{0,I}\times W^{(N)}\times W^{(N)}}\bigl[U_R(s)\otimes U_R(t)\bigr] = u(s,t)I,\quad\text{where } u(s,t) \equiv e^{-\frac{|t-s|}{2}},\tag{8.5.3}$$
only now for all $s, t \in \mathbb{R}$. One advantage of having added the past is that the statement of reversibility takes a more appealing form. Namely, $\{U_R(t) : t \in \mathbb{R}\}$ is reversible in the sense that its distribution is the same whether one runs it forward or backward in time. That is, $\{U_R(-t) : t \in \mathbb{R}\}$ has the same distribution as $\{U_R(t) : t \in \mathbb{R}\}$. For this reason, I will say that $\{U_R(t) : t \ge 0\}$ is a reversible Ornstein–Uhlenbeck process if its distribution is the same as that of $\{U_R(t,x,\theta_+,\theta_-) : t \ge 0\}$ under $\gamma_{0,I}\times W^{(N)}\times W^{(N)}$.
An alternative way to realize a reversible Ornstein–Uhlenbeck process is to start with an $\mathbb{R}^N$-valued Brownian motion $\{B(t) : t \ge 0\}$ and consider the process $\{e^{-\frac t2}B(e^t) : t \in \mathbb{R}\}$. Clearly $\bigl\{\bigl(\xi,e^{-\frac t2}B(e^t)\bigr)_{\mathbb{R}^N} : (t,\xi) \in \mathbb{R}\times\mathbb{R}^N\bigr\}$ is a Gaussian family with covariance given by (8.5.3). It is amusing to observe that, when one uses this realization, the reversibility of the Ornstein–Uhlenbeck process is equivalent to the time inversion invariance (cf. Exercise 4.3.11) of the original Brownian motion.
§ 8.5.2. Ornstein–Uhlenbeck as an Abstract Wiener Space. So far, my treatment of the Ornstein–Uhlenbeck process has been based on its relationship to Brownian motion. Here I will look at it as an abstract Wiener space.
Begin with the one-sided process $\{U(t,0) : t \ge 0\}$. Seeing as this process has the same distribution as $\bigl\{e^{-\frac t2}B\bigl(e^t - 1\bigr) : t \ge 0\bigr\}$, it is reasonably clear that the Hilbert space associated with this process should be the space $H^U(\mathbb{R}^N)$ of functions $h^U(t) = e^{-\frac t2}h\bigl(e^t - 1\bigr)$, $h \in H^1(\mathbb{R}^N)$. Thus, define the map $F^U : H^1(\mathbb{R}^N) \longrightarrow H^U(\mathbb{R}^N)$ accordingly, and introduce the Hilbert norm $\|\cdot\|_{H^U(\mathbb{R}^N)}$ on $H^U(\mathbb{R}^N)$ that makes $F^U$ into an isometry. Equivalently,
$$\begin{aligned}
\|h^U\|_{H^U(\mathbb{R}^N)}^2 &= \int_{[0,\infty)}\Bigl|\frac{d}{ds}\Bigl((1+s)^{\frac12}h^U\bigl(\log(1+s)\bigr)\Bigr)\Bigr|^2\,ds\\
&= \|\dot h^U\|_{L^2([0,\infty);\mathbb{R}^N)}^2 + \bigl(\dot h^U,h^U\bigr)_{L^2([0,\infty);\mathbb{R}^N)} + \tfrac14\|h^U\|_{L^2([0,\infty);\mathbb{R}^N)}^2.
\end{aligned}$$
Note that
$$\bigl(\dot h^U,h^U\bigr)_{L^2([0,\infty);\mathbb{R}^N)} = \frac12\int_{[0,\infty)}\frac{d}{dt}|h^U(t)|^2\,dt = \frac12\lim_{t\to\infty}|h^U(t)|^2 = 0.$$
To check the final equality, observe that it is equivalent to $\lim_{t\to\infty}t^{-\frac12}|h(t)| = 0$ for $h \in H^1(\mathbb{R}^N)$. Hence, since $\sup_{t>0}t^{-\frac12}|h(t)| \le \|h\|_{H^1(\mathbb{R}^N)}$ and $\lim_{t\to\infty}t^{-\frac12}|h(t)| = 0$ if $\dot h$ has compact support, the same result is true for all $h \in H^1(\mathbb{R}^N)$. In particular,
$$\|h^U\|_{H^U(\mathbb{R}^N)} = \sqrt{\|\dot h^U\|_{L^2([0,\infty);\mathbb{R}^N)}^2 + \tfrac14\|h^U\|_{L^2([0,\infty);\mathbb{R}^N)}^2}.$$
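The isometry can be tested on a concrete path. As an illustration chosen here (not taken from the text), let $N = 1$ and $h(s) = s(1+s)^{-2}$, so that $\|h\|_{H^1}^2 = \int_0^\infty\frac{(1-s)^2}{(1+s)^6}\,ds = \frac{2}{15}$ and $h^U(t) = e^{-\frac{3t}{2}} - e^{-\frac{5t}{2}}$. The sketch evaluates $\|\dot h^U\|_{L^2}^2 + \frac14\|h^U\|_{L^2}^2$ by a midpoint rule on a truncated interval; the names and the truncation are assumptions of the sketch.

```python
import math

def h_u(t):
    """h^U(t) = e^{-t/2} h(e^t - 1) for the sample path h(s) = s / (1 + s)^2."""
    return math.exp(-1.5 * t) - math.exp(-2.5 * t)

def h_u_dot(t):
    """Derivative of h^U with respect to t."""
    return -1.5 * math.exp(-1.5 * t) + 2.5 * math.exp(-2.5 * t)

def hu_norm_squared(t_max=40.0, steps=400000):
    """Midpoint-rule value of ||d/dt h^U||^2 + (1/4) ||h^U||^2 on [0, t_max]."""
    dt = t_max / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        total += h_u_dot(t) ** 2 + 0.25 * h_u(t) ** 2
    return total * dt

# Exact value of ||h||^2 in H^1 for h(s) = s / (1 + s)^2.
H1_NORM_SQUARED = 2.0 / 15.0

if __name__ == "__main__":
    print(hu_norm_squared(), H1_NORM_SQUARED)
```

The cross term $\bigl(\dot h^U,h^U\bigr)_{L^2}$ is omitted from the sum because, as shown above, it vanishes.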
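If we were to follow the prescription in Theorem 8.3.1, we would next complete $H^U(\mathbb{R}^N)$ with respect to the norm $\sup_{t\ge0}e^{-\frac t2}|h^U(t)|$. However, we already know from (8.5.2) that $\{U(t,0) : t \ge 0\}$ lives on $\Theta^U(\mathbb{R}^N)$, the space of $\theta \in \Theta(\mathbb{R}^N)$ such that $\lim_{t\to\infty}(\log t)^{-1}|\theta(t)| = 0$ with Banach norm
$$\|\theta\|_{\Theta^U(\mathbb{R}^N)} \equiv \sup_{t\ge0}\bigl(\log(e+t)\bigr)^{-1}|\theta(t)|,$$
and so we will adopt $\Theta^U(\mathbb{R}^N)$ as the Banach space for $H^U(\mathbb{R}^N)$. Clearly, the dual space $\Theta^U(\mathbb{R}^N)^*$ of $\Theta^U(\mathbb{R}^N)$ can be identified with the space of $\mathbb{R}^N$-valued Borel measures $\lambda$ on $[0,\infty)$ that give 0 mass to $\{0\}$ and satisfy $\|\lambda\|_{\Lambda^U(\mathbb{R}^N)} \equiv \int_{[0,\infty)}\log(e+t)\,|\lambda|(dt) < \infty$.
Theorem 8.5.4. Let $U_0^{(N)} \in M_1\bigl(\Theta^U(\mathbb{R}^N)\bigr)$ be the distribution of $\{U(t,0) : t \ge 0\}$ under $W^{(N)}$. Then $\bigl(H^U(\mathbb{R}^N),\Theta^U(\mathbb{R}^N),U_0^{(N)}\bigr)$ is an abstract Wiener space.
Proof: Since $C_c^\infty\bigl((0,\infty);\mathbb{R}^N\bigr)$ is contained in $H^U(\mathbb{R}^N)$ and is dense in $\Theta^U(\mathbb{R}^N)$, we know that $H^U(\mathbb{R}^N)$ is dense in $\Theta^U(\mathbb{R}^N)$. In addition, because $\eta^U(t) = e^{-\frac t2}\eta(e^t - 1)$, where $\eta \in H^1(\mathbb{R}^N)$, and $\|\eta^U\|_{H^U(\mathbb{R}^N)} = \|\eta\|_{H^1(\mathbb{R}^N)}$, $\|\eta^U\|_u \le \|\eta^U\|_{H^U(\mathbb{R}^N)}$ follows from $|\eta(t)| \le t^{\frac12}\|\eta\|_{H^1(\mathbb{R}^N)}$. Hence, $H^U(\mathbb{R}^N)$ is continuously embedded in $\Theta^U(\mathbb{R}^N)$.
To complete the proof, remember our earlier calculation of the covariance of $\{U(t;0) : t \ge 0\}$, and use it to check that
$$\mathbb{E}^{U_0^{(N)}}\bigl[\langle\theta,\lambda\rangle^2\bigr] = \iint_{[0,\infty)^2}u_0(s,t)\,\lambda(ds)\cdot\lambda(dt),\quad\text{where } u_0(s,t) \equiv e^{-\frac{|s-t|}{2}} - e^{-\frac{s+t}{2}}.$$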
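Hence, what I need to show is that if $\lambda \in \Theta^U(\mathbb{R}^N)^* \longmapsto h^U_\lambda \in H^U(\mathbb{R}^N)$ is the map determined by $\langle h^U,\lambda\rangle = \bigl(h^U,h^U_\lambda\bigr)_{H^U(\mathbb{R}^N)}$, then
$$\|h^U_\lambda\|_{H^U(\mathbb{R}^N)}^2 = \iint_{[0,\infty)^2}u_0(s,t)\,\lambda(ds)\cdot\lambda(dt).\tag{8.5.5}$$
In order to do this, we must first know how $h^U_\lambda$ is constructed from $\lambda$. But if (8.5.5) is going to hold, then, by polarization,
$$\bigl(e,h^U_\lambda(\tau)\bigr)_{\mathbb{R}^N} = \langle h^U_\lambda,\delta_\tau e\rangle = \iint_{[0,\infty)^2}u_0(s,t)\,\delta_\tau(ds)\bigl(e,\lambda(dt)\bigr)_{\mathbb{R}^N} = \Bigl(e,\int_{[0,\infty)}u_0(\tau,t)\,\lambda(dt)\Bigr)_{\mathbb{R}^N}.$$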
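Thus, one should guess that $h^U_\lambda(\tau) = \int_{[0,\infty)}u_0(\tau,t)\,\lambda(dt)$ and must check that, with this choice, $h^U_\lambda \in H^U(\mathbb{R}^N)$, (8.5.5) holds, and, for all $h^U \in H^U(\mathbb{R}^N)$, $\langle h^U,\lambda\rangle = \bigl(h^U,h^U_\lambda\bigr)_{H^U(\mathbb{R}^N)}$. The key to proving all these is the equality
$$\int_{[0,\infty)}\dot h^U(\tau)\,\partial_\tau u_0(\tau,t)\,d\tau + \frac14\int_{[0,\infty)}h^U(\tau)\,u_0(\tau,t)\,d\tau = h^U(t),\tag{*}$$
which is an elementary application of integration by parts. Applying (*) with $N = 1$ to $h^U = u_0(\,\cdot\,,s)$, we see that
$$\int_{[0,\infty)}\partial_\tau u_0(s,\tau)\,\partial_\tau u_0(t,\tau)\,d\tau + \frac14\int_{[0,\infty)}u_0(s,\tau)\,u_0(t,\tau)\,d\tau = u_0(s,t),$$
from which it follows easily both that $h^U_\lambda \in H^U(\mathbb{R}^N)$ and that (8.5.5) holds. In addition, if $h^U \in H^U(\mathbb{R}^N)$, then $\langle h^U,\lambda\rangle = \bigl(h^U,h^U_\lambda\bigr)_{H^U(\mathbb{R}^N)}$ follows from (*) after one integrates both sides of the preceding with respect to $\lambda(dt)$.
I turn next to the reversible case. By the considerations in § 8.4.1, we know that the distribution $U_R^{(N)}$ of $\{U_R(t) : t \ge 0\}$ under $\gamma_{0,I}\times W^{(N)}\times W^{(N)}$ is a Borel measure on the Banach space $\Theta^U(\mathbb{R};\mathbb{R}^N)$ of continuous $\theta : \mathbb{R} \longrightarrow \mathbb{R}^N$ such that $\lim_{|t|\to\infty}(\log|t|)^{-1}|\theta(t)| = 0$ with norm
$$\|\theta\|_{\Theta^U(\mathbb{R};\mathbb{R}^N)} \equiv \sup_{t\in\mathbb{R}}\bigl(\log(e+|t|)\bigr)^{-1}|\theta(t)| < \infty.$$
Furthermore, it should be clear that one can identify $\Theta^U(\mathbb{R};\mathbb{R}^N)^*$ with the space of $\mathbb{R}^N$-valued Borel measures $\lambda$ on $\mathbb{R}$ satisfying
$$\|\lambda\|_{\Lambda^U(\mathbb{R};\mathbb{R}^N)} \equiv \int_{\mathbb{R}}\log(e+|t|)\,|\lambda|(dt) < \infty.$$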
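Theorem 8.5.6. Take $H^1(\mathbb{R};\mathbb{R}^N)$ to be the separable Hilbert space of absolutely continuous $h : \mathbb{R} \longrightarrow \mathbb{R}^N$ satisfying
$$\|h\|_{H^1(\mathbb{R};\mathbb{R}^N)} \equiv \sqrt{\|\dot h\|_{L^2(\mathbb{R};\mathbb{R}^N)}^2 + \tfrac14\|h\|_{L^2(\mathbb{R};\mathbb{R}^N)}^2} < \infty.$$
Then $\bigl(H^1(\mathbb{R};\mathbb{R}^N),\Theta^U(\mathbb{R};\mathbb{R}^N),U_R^{(N)}\bigr)$ is an abstract Wiener space.
Proof: Set $u(s,t) \equiv e^{-\frac{|s-t|}{2}}$, and let $\lambda \in \Lambda^U(\mathbb{R};\mathbb{R}^N)$. By the same reasoning as I used in the preceding proof,
$$\langle h,\lambda\rangle = \bigl(h,h_\lambda\bigr)_{H^1(\mathbb{R};\mathbb{R}^N)}\quad\text{and}\quad\|h_\lambda\|_{H^1(\mathbb{R};\mathbb{R}^N)}^2 = \iint_{\mathbb{R}\times\mathbb{R}}u(s,t)\,\lambda(ds)\cdot\lambda(dt)$$
when $h_\lambda(\tau) = \int_{\mathbb{R}}u(\tau,t)\,\lambda(dt)$. Hence, since $\bigl\{\bigl(\xi,\theta(t)\bigr)_{\mathbb{R}^N} : t \ge 0\ \&\ \xi \in \mathbb{R}^N\bigr\}$ spans a Gaussian family in $L^2\bigl(U_R^{(N)};\mathbb{R}\bigr)$ and $u(s,t)I = \mathbb{E}^{U_R^{(N)}}\bigl[\theta(s)\otimes\theta(t)\bigr]$, the proof is complete.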
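§ 8.5.3. Higher Dimensional Free Fields. Thinking à la Feynman, Theorem 8.5.6 is saying that $U_R^{(N)}$ wants to be the measure on $H^1(\mathbb{R};\mathbb{R}^N)$ given by
$$\frac{1}{(\sqrt{2\pi})^{\dim(H^1(\mathbb{R};\mathbb{R}^N))}}\exp\Bigl(-\frac12\int_{\mathbb{R}}\bigl(|\dot h(t)|^2 + \tfrac14|h(t)|^2\bigr)\,dt\Bigr)\,\lambda_{H^1(\mathbb{R};\mathbb{R}^N)}(dh),$$
where $\lambda_{H^1(\mathbb{R};\mathbb{R}^N)}$ is the Lebesgue measure on $H^1(\mathbb{R};\mathbb{R}^N)$. I am now going to look at the analogous situation when $N = 1$ but the parameter set $\mathbb{R}$ is replaced by $\mathbb{R}^\nu$ for some $\nu \ge 2$. That is, I want to look at the measure that Feynman would have written as
$$\frac{1}{(\sqrt{2\pi})^{\dim(H^1(\mathbb{R}^\nu;\mathbb{R}))}}\exp\Bigl(-\frac12\int_{\mathbb{R}^\nu}\bigl(|\nabla h(x)|^2 + \tfrac14|h(x)|^2\bigr)\,dx\Bigr)\,\lambda_{H^1(\mathbb{R}^\nu;\mathbb{R})}(dh),$$
where $H^1(\mathbb{R}^\nu;\mathbb{R})$ is the separable Hilbert space obtained by completing the Schwartz test function space $\mathscr{S}(\mathbb{R}^\nu;\mathbb{R})$ with respect to the Hilbert norm
$$\|h\|_{H^1(\mathbb{R}^\nu;\mathbb{R})} \equiv \sqrt{\|\nabla h\|_{L^2(\mathbb{R}^\nu;\mathbb{R})}^2 + \tfrac14\|h\|_{L^2(\mathbb{R}^\nu;\mathbb{R})}^2}.$$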
When $\nu = 1$ this is exactly the Hilbert space $H^1(\mathbb{R};\mathbb{R})$ described in Theorem 8.5.6 for $N = 1$. When $\nu \ge 2$, generic elements of $H^1(\mathbb{R}^\nu;\mathbb{R})$ are better than generic elements of $L^2(\mathbb{R}^\nu;\mathbb{R})$ but are not enough better to be continuous. In fact, they are not even well-defined pointwise, and matters get worse as $\nu$ gets larger. Thus, although Feynman's representation is already questionable when $\nu = 1$, its interpretation when $\nu \ge 2$ is even more fraught with difficulties. As we will see, these difficulties are reflected mathematically by the fact that, in order to construct an abstract Wiener space for $H^1(\mathbb{R}^\nu;\mathbb{R})$ when $\nu \ge 2$, we will have to resort to Banach spaces whose elements are generalized functions (i.e., distributions in the sense of L. Schwartz).$^2$

$^2$ The need to deal with generalized functions is the primary source of the difficulties that mathematicians have when they attempt to construct non-trivial quantum fields. Without going into any details, suffice it to say that in order to construct interacting (i.e., non-Gaussian) fields, one has to take non-linear functions of a Gaussian field. However, if the Gaussian field is distribution valued, it is not at all clear how to apply a non-linear function to it.
The approach that I will adopt is based on the following subterfuge. The space $H^1(\mathbb{R}^\nu;\mathbb{R})$ is one of a continuously graded family of spaces known as Sobolev spaces. Sobolev spaces are graded according to the number of derivatives “better or worse” than $L^2(\mathbb{R}^\nu;\mathbb{R})$ their elements are. To be more precise, for each $s \in \mathbb{R}$, define the Bessel operator $B^s$ on $\mathscr{S}(\mathbb{R}^\nu;\mathbb{C})$ so that
$$\widehat{B^s\varphi}(\xi) = \bigl(\tfrac14 + |\xi|^2\bigr)^{-\frac s2}\hat\varphi(\xi).$$
When $s = -2m$, it is clear that $B^s = \bigl(\tfrac14 - \Delta\bigr)^m$, and so, in general, it is reasonable to think of $B^s$ as an operator that, depending on whether $s \le 0$ or $s \ge 0$, involves taking or restoring derivatives of order $|s|$. In particular, $\|\varphi\|_{H^1(\mathbb{R}^\nu;\mathbb{R})} = \|B^{-1}\varphi\|_{L^2(\mathbb{R}^\nu;\mathbb{R})}$ for $\varphi \in \mathscr{S}(\mathbb{R}^\nu;\mathbb{R})$. More generally, define the Sobolev space $H^s(\mathbb{R}^\nu;\mathbb{R})$ to be the separable Hilbert space obtained by completing $\mathscr{S}(\mathbb{R}^\nu;\mathbb{R})$ with respect to
$$\|h\|_{H^s(\mathbb{R}^\nu;\mathbb{R})} \equiv \|B^{-s}h\|_{L^2(\mathbb{R}^\nu;\mathbb{R})} = \sqrt{\frac{1}{(2\pi)^\nu}\int_{\mathbb{R}^\nu}\bigl(\tfrac14 + |\xi|^2\bigr)^s|\hat h(\xi)|^2\,d\xi}.$$
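Obviously, $H^0(\mathbb{R}^\nu;\mathbb{R})$ is just $L^2(\mathbb{R}^\nu;\mathbb{R})$. When $s > 0$, $H^s(\mathbb{R}^\nu;\mathbb{R})$ is a subspace of $L^2(\mathbb{R}^\nu;\mathbb{R})$, and the quality of its elements will improve as $s$ gets larger. However, when $s < 0$, some elements of $H^s(\mathbb{R}^\nu;\mathbb{R})$ will be strictly worse than elements of $L^2(\mathbb{R}^\nu;\mathbb{R})$, and their quality will deteriorate as $s$ becomes more negative. Nonetheless, for every $s \in \mathbb{R}$, $H^s(\mathbb{R}^\nu;\mathbb{R}) \subseteq \mathscr{S}'(\mathbb{R}^\nu;\mathbb{R})$, where $\mathscr{S}'(\mathbb{R}^\nu;\mathbb{R})$, whose elements are called real-valued tempered distributions, is the dual space of $\mathscr{S}(\mathbb{R}^\nu;\mathbb{R})$. In fact, with a little effort, one can check that an alternative description of $H^s(\mathbb{R}^\nu;\mathbb{R})$ is as the subspace of $u \in \mathscr{S}'(\mathbb{R}^\nu;\mathbb{R})$ with the property that $B^{-s}u \in L^2(\mathbb{R}^\nu;\mathbb{R})$. Equivalently, $H^s(\mathbb{R}^\nu;\mathbb{R})$ is the isometric image in $\mathscr{S}'(\mathbb{R}^\nu;\mathbb{R})$ of $L^2(\mathbb{R}^\nu;\mathbb{R})$ under the map $B^s$, and, more generally, $H^{s_2}(\mathbb{R}^\nu;\mathbb{R})$ is the isometric image of $H^{s_1}(\mathbb{R}^\nu;\mathbb{R})$ under $B^{s_2-s_1}$. Thus, by Theorem 8.3.1, once we understand the abstract Wiener spaces for any one of the spaces $H^s(\mathbb{R}^\nu;\mathbb{R})$, understanding the abstract Wiener spaces for any of the others comes down to understanding the action of the Bessel operators, a task that, depending on what one wants to know, can be highly non-trivial.
Lemma 8.5.7. The space $H^{\frac{\nu+1}{2}}(\mathbb{R}^\nu;\mathbb{R})$ is continuously embedded as a dense subspace of the separable Banach space $C_0(\mathbb{R}^\nu;\mathbb{R})$ whose elements are continuous functions that tend to 0 at infinity and whose norm is the uniform norm. Moreover, given a totally finite, signed Borel measure $\lambda$ on $\mathbb{R}^\nu$, the function
$$h_\lambda(x) \equiv K_\nu\int_{\mathbb{R}^\nu}e^{-\frac{|x-y|}{2}}\,\lambda(dy),\quad\text{with } K_\nu \equiv \frac{\pi^{\frac{1-\nu}{2}}}{\Gamma\bigl(\frac{\nu+1}{2}\bigr)},$$
is an element of $H^{\frac{\nu+1}{2}}(\mathbb{R}^\nu;\mathbb{R})$,
$$\|h_\lambda\|_{H^{\frac{\nu+1}{2}}(\mathbb{R}^\nu;\mathbb{R})}^2 = K_\nu\iint_{\mathbb{R}^\nu\times\mathbb{R}^\nu}e^{-\frac{|x-y|}{2}}\,\lambda(dx)\lambda(dy),$$
and
$$\langle h,\lambda\rangle = \bigl(h,h_\lambda\bigr)_{H^{\frac{\nu+1}{2}}(\mathbb{R}^\nu;\mathbb{R})}\quad\text{for each } h \in H^{\frac{\nu+1}{2}}(\mathbb{R}^\nu;\mathbb{R}).$$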
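Proof: To prove the initial assertion, use the Fourier inversion formula to write
$$h(x)=(2\pi)^{-\nu}\int_{\mathbb R^\nu}e^{-\sqrt{-1}\,(x,\xi)_{\mathbb R^\nu}}\hat h(\xi)\,d\xi$$
for $h\in\mathscr S(\mathbb R^\nu;\mathbb R)$, and derive from this the estimate
$$\|h\|_{\mathrm u}\le(2\pi)^{-\frac\nu2}\Bigl(\int_{\mathbb R^\nu}\bigl(\tfrac14+|\xi|^2\bigr)^{-\frac{\nu+1}2}\,d\xi\Bigr)^{\frac12}\|h\|_{H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)}.$$
Hence, since $H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)$ is the completion of $\mathscr S(\mathbb R^\nu;\mathbb R)$ with respect to the norm $\|\cdot\|_{H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)}$, it is clear that $H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)$ is continuously embedded in $C_0(\mathbb R^\nu;\mathbb R)$. In addition, since $\mathscr S(\mathbb R^\nu;\mathbb R)$ is dense in $C_0(\mathbb R^\nu;\mathbb R)$, $H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)$ is also.

To carry out the next step, let $\lambda$ be given, and observe that the Fourier transform of $B^{\nu+1}\lambda$ is $\bigl(\tfrac14+|\xi|^2\bigr)^{-\frac{\nu+1}2}\hat\lambda(\xi)$ and therefore that
$$B^{\nu+1}\lambda(x)=\frac1{(2\pi)^\nu}\int_{\mathbb R^\nu}\frac{e^{-\sqrt{-1}\,(x,\xi)_{\mathbb R^\nu}}\hat\lambda(\xi)}{\bigl(\tfrac14+|\xi|^2\bigr)^{\frac{\nu+1}2}}\,d\xi=\int_{\mathbb R^\nu}\Biggl(\frac1{(2\pi)^\nu}\int_{\mathbb R^\nu}\frac{e^{\sqrt{-1}\,(y-x,\xi)_{\mathbb R^\nu}}}{\bigl(\tfrac14+|\xi|^2\bigr)^{\frac{\nu+1}2}}\,d\xi\Biggr)\lambda(dy).$$
Now use (3.3.19) (with $N=\nu$ and $t=\tfrac12$) to see that
$$\frac1{(2\pi)^\nu}\int_{\mathbb R^\nu}\frac{e^{\sqrt{-1}\,(y-x,\xi)_{\mathbb R^\nu}}}{\bigl(\tfrac14+|\xi|^2\bigr)^{\frac{\nu+1}2}}\,d\xi=K_\nu e^{-\frac{|y-x|}2},$$
and thereby arrive at $h_\lambda=B^{\nu+1}\lambda$. In particular, this shows that
$$\|h_\lambda\|^2_{H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)}=\frac1{(2\pi)^\nu}\int_{\mathbb R^\nu}\frac{|\hat\lambda(\xi)|^2}{\bigl(\tfrac14+|\xi|^2\bigr)^{\frac{\nu+1}2}}\,d\xi<\infty.$$
Now let $h\in\mathscr S(\mathbb R^\nu;\mathbb R)$, and use the preceding to justify
$$\langle h,\lambda\rangle=\bigl(B^{-\frac{\nu+1}2}h,B^{-\frac{\nu+1}2}B^{\nu+1}\lambda\bigr)_{L^2(\mathbb R^\nu;\mathbb R)}=\bigl(h,h_\lambda\bigr)_{H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)}.$$
Since both sides are continuous with respect to convergence in $H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)$, we have now proved that $\langle h,\lambda\rangle=\bigl(h,h_\lambda\bigr)_{H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)}$ for all $h\in H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)$. In particular,
$$\|h_\lambda\|^2_{H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)}=\langle h_\lambda,\lambda\rangle=K_\nu\iint_{\mathbb R^\nu\times\mathbb R^\nu}e^{-\frac{|y-x|}2}\,\lambda(dx)\lambda(dy).$$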
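The Fourier identity used in the proof of Lemma 8.5.7 can be sanity-checked numerically in the simplest case $\nu=1$, where the stated formula for $K_\nu$ gives $K_1=1$. The following is only a rough sketch under assumptions of my own: the function name, the truncation point `xi_max`, and the step size are illustrative choices, and a plain trapezoid rule stands in for an exact evaluation.

```python
import math

def kernel_via_fourier(x, xi_max=2000.0, step=0.01):
    """Approximate (2*pi)^{-1} * integral over R of e^{i*x*xi} / (1/4 + xi^2) d(xi).

    The integrand is even in xi, so only its cosine part on [0, xi_max]
    is integrated (trapezoid rule) and then doubled.
    """
    n = int(round(xi_max / step))
    total = 0.0
    for j in range(n + 1):
        xi = j * step
        weight = 0.5 if j in (0, n) else 1.0  # trapezoid end-point weights
        total += weight * math.cos(x * xi) / (0.25 + xi * xi)
    # (1 / (2*pi)) * 2 * (integral over [0, xi_max])
    return total * step / math.pi

# Compare with K_1 * exp(-|x|/2), where K_1 = 1 (an assumption read off the lemma).
for x in (0.0, 0.5, 1.0, 3.0):
    print(x, kernel_via_fourier(x), math.exp(-abs(x) / 2))
```

The two printed columns should agree up to the truncation error of the integral, which is of order `1 / (pi * xi_max)`.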
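Theorem 8.5.8. Let $\Theta^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)$ be the space of continuous $\theta:\mathbb R^\nu\longrightarrow\mathbb R$ satisfying $\lim_{|x|\to\infty}\bigl(\log(e+|x|)\bigr)^{-1}|\theta(x)|=0$, and turn $\Theta^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)$ into a separable Banach space with norm
$$\|\theta\|_{\Theta^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)}=\sup_{x\in\mathbb R^\nu}\bigl(\log(e+|x|)\bigr)^{-1}|\theta(x)|.$$
Then $H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)$ is continuously embedded as a dense subspace of $\Theta^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)$, and there is a $\mathcal W_{H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)}\in M_1\bigl(\Theta^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)\bigr)$ such that
$$\Bigl(H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R),\ \Theta^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R),\ \mathcal W_{H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)}\Bigr)$$
is an abstract Wiener space. Moreover, for each $\alpha\in\bigl(0,\tfrac12\bigr)$, $\mathcal W_{H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)}$-almost every $\theta$ is Hölder continuous of order $\alpha$ and, for each $\alpha>\tfrac12$, $\mathcal W_{H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)}$-almost no $\theta$ is anywhere Hölder continuous of order $\alpha$.

Proof: The initial part of the first assertion follows from the first part of Lemma 8.5.7 plus the essentially trivial fact that $C_0(\mathbb R^\nu;\mathbb R)$ is continuously embedded as a dense subspace of $\Theta^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)$. Further, by the second part of that same lemma combined with Theorem 8.3.3, we will have proved the second part of the first assertion here once we show that, when $\{h_m:m\ge0\}$ is an orthonormal basis in $H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)$, the Wiener series $\sum_{m=0}^\infty\omega_mh_m$ converges in $\Theta^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)$ for $\gamma_{0,1}^{\mathbb N}$-almost every $\omega=(\omega_0,\dots,\omega_m,\dots)\in\mathbb R^{\mathbb N}$. Thus, set $S_n(\omega)=\sum_{m=0}^n\omega_mh_m$ for $n\ge0$. More or less mimicking the steps outlined in Exercise 8.3.21, I will begin by showing that, for each $\alpha\in\bigl(0,\tfrac12\bigr)$ and $R\in[1,\infty)$,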
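$$(*)\qquad\sup_{z\in\mathbb R^\nu}\mathbb E^{\gamma_{0,1}^{\mathbb N}}\Biggl[\sup_{n\ge0}\ \sup_{\substack{x,y\in Q(z,R)\\ x\ne y}}\frac{|S_n(y)-S_n(x)|}{|y-x|^\alpha}\Biggr]<\infty,$$
where $Q(z,R)=z+[-R,R)^\nu$. Indeed, by the argument given in that exercise combined with the higher dimensional analog of Kolmogorov’s continuity criterion in Exercise 4.3.18, (*) will follow once we show that
$$\mathbb E^{\gamma_{0,1}^{\mathbb N}}\bigl[|S_n(y)-S_n(x)|^2\bigr]\le C|y-x|,\quad x,y\in\mathbb R^\nu,$$
for some $C<\infty$. To this end, set $\lambda=\delta_y-\delta_x$, and apply Lemma 8.5.7 to check
$$\mathbb E^{\gamma_{0,1}^{\mathbb N}}\bigl[|S_n(y)-S_n(x)|^2\bigr]=\sum_{m=0}^n\bigl(h_m,h_\lambda\bigr)^2_{H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)}\le\|h_\lambda\|^2_{H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)}=2K_\nu\Bigl(1-e^{-\frac{|y-x|}2}\Bigr).$$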
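Knowing (*), it becomes an easy matter to see that there exists a measurable $S:\mathbb R^\nu\times\mathbb R^{\mathbb N}\longrightarrow\mathbb R$ such that $x\rightsquigarrow S(x,\omega)$ is continuous for each $\omega$ and $S_n(\,\cdot\,,\omega)\longrightarrow S(\,\cdot\,,\omega)$ uniformly on compacts for $\gamma_{0,1}^{\mathbb N}$-almost every $\omega\in\mathbb R^{\mathbb N}$. In fact, because of (*), it suffices to check that $\lim_{n\to\infty}S_n(x)$ exists $\gamma_{0,1}^{\mathbb N}$-almost surely for each $x\in\mathbb R^\nu$, and this follows immediately from Theorem 1.4.2 plus
$$\sum_{m=0}^\infty\operatorname{Var}\bigl(\omega_mh_m(x)\bigr)=\sum_{m=0}^\infty\bigl(h_m,h_{\delta_x}\bigr)^2_{H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)}=\|h_{\delta_x}\|^2_{H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)}=K_\nu.$$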
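Furthermore, again from (*), we know that, for $\gamma_{0,1}^{\mathbb N}$-almost every $\omega$, $x\rightsquigarrow S(x,\omega)$ is $\alpha$-Hölder continuous so long as $\alpha\in\bigl(0,\tfrac12\bigr)$.

I must still check that, $\gamma_{0,1}^{\mathbb N}$-almost surely, the convergence of $S_n(\,\cdot\,,\omega)$ to $S(\,\cdot\,,\omega)$ is taking place in $\Theta^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)$, and, in view of the fact that we already know that, $\gamma_{0,1}^{\mathbb N}$-almost surely, it is taking place uniformly on compacts, this reduces to showing that
$$\lim_{|x|\to\infty}\bigl(\log(e+|x|)\bigr)^{-1}\sup_{n\ge0}|S_n(x)|=0\quad\gamma_{0,1}^{\mathbb N}\text{-almost surely.}$$
For this purpose, observe that (*) says that
$$\sup_{z\in\mathbb R^\nu}\mathbb E^{\gamma_{0,1}^{\mathbb N}}\Bigl[\sup_{n\ge0}\|S_n\|_{\mathrm u,Q(z,1)}\Bigr]<\infty,$$
where $\|\cdot\|_{\mathrm u,C}$ denotes the uniform norm over a set $C\subseteq\mathbb R^\nu$. At this point, I would like to apply Fernique’s Theorem (Theorem 8.2.1) to the Banach space $\ell^\infty\bigl(\mathbb N;C_{\mathrm b}(Q(z,1);\mathbb R)\bigr)$ and thereby conclude that there exists an $\alpha>0$ such that
$$(**)\qquad B\equiv\sup_{z\in\mathbb R^\nu}\mathbb E^{\gamma_{0,1}^{\mathbb N}}\Bigl[\exp\Bigl(\alpha\sup_{n\ge0}\|S_n\|^2_{\mathrm u,Q(z,1)}\Bigr)\Bigr]<\infty.$$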
However, $\ell^\infty\bigl(\mathbb N;C_{\mathrm b}(Q(z,1);\mathbb R)\bigr)$ is not separable. Nonetheless, there are two ways to get around this technicality. The first is to observe that the only place separability was used in the proof of Fernique’s Theorem was at the beginning, where I used it to guarantee that $\mathcal B_E$ is generated by the maps $x\rightsquigarrow\langle x,x^*\rangle$ as $x^*$ runs over $E^*$ and therefore that the distribution of $X$ is determined by the distribution of $\{\langle X,x^*\rangle:x^*\in E^*\}$. But, even though $\ell^\infty\bigl(\mathbb N;C_{\mathrm b}(Q(z,1);\mathbb R)\bigr)$ is not separable, one can easily check that it nevertheless possesses this property. The second way to deal with the problem is to apply his theorem to $\ell^\infty\bigl(\{0,\dots,N\};C_{\mathrm b}(Q(z,1);\mathbb R)\bigr)$, which is separable, and to note that the resulting estimate can be made uniform in $N\in\mathbb N$. Either way, one arrives at (**).

Now set $\psi(t)=e^{\alpha t^2}-1$ for $t\ge0$. Then $\psi^{-1}(s)=\sqrt{\alpha^{-1}\log(1+s)}$, and
$$\sup_{n\ge0}\|S_n\|_{\mathrm u,Q(0,M)}=\max\Bigl\{\sup_{n\ge0}\|S_n\|_{\mathrm u,Q(m,1)}:m\in Q(0,M)\cap\mathbb Z^\nu\Bigr\}\le\psi^{-1}\Biggl(\sum_{m\in Q(0,M)\cap\mathbb Z^\nu}\psi\Bigl(\sup_{n\ge0}\|S_n\|_{\mathrm u,Q(m,1)}\Bigr)\Biggr).$$
Thus, because $\psi^{-1}$ is concave, Jensen’s Inequality applies and yields
$$\mathbb E^{\gamma_{0,1}^{\mathbb N}}\Bigl[\sup_{n\ge0}\|S_n\|_{\mathrm u,Q(0,M)}\Bigr]\le\psi^{-1}\bigl((2M)^\nu B\bigr),$$
and therefore
$$\mathbb E^{\gamma_{0,1}^{\mathbb N}}\Biggl[\sup_{|x|\ge R}\sup_{n\ge0}\frac{|S_n(x)|}{\log(e+|x|)}\Biggr]\le\sum_{m\ge(\log R)^{\frac14}}\frac{\mathbb E^{\gamma_{0,1}^{\mathbb N}}\bigl[\sup_{n\ge0}\|S_n\|_{\mathrm u,Q(0,e^{m^4})}\bigr]}{\log\bigl(e+e^{(m-1)^4}\bigr)}\le\sum_{m\ge(\log R)^{\frac14}}\frac{\sqrt{\log\bigl(1+2^\nu e^{\nu(m+1)^4}B\bigr)}}{\sqrt\alpha\,\log\bigl(e+e^{(m-1)^4}\bigr)}\longrightarrow0$$
as $R\to\infty$.
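To complete the proof, I must show that, for any $\alpha>\tfrac12$, $\mathcal W_{H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)}$-almost no $\theta$ is anywhere Hölder continuous of order $\alpha$, and for this purpose I will proceed as in the proof of Theorem 4.3.4. Because $\{\theta(x+y):x\in\mathbb R^\nu\}$ has the same $\mathcal W_{H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)}$-distribution for all $y$, it suffices for me to show that, $\mathcal W_{H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)}$-almost surely, there is no $x\in Q(0,1)$ at which $\theta$ is Hölder continuous of order $\alpha>\tfrac12$. Now suppose that $\alpha\in\bigl(\tfrac12,1\bigr)$, and observe that, for any $L\in\mathbb Z^+$ and $e\in\mathbb S^{\nu-1}$, the set $H(\alpha)$ of $\theta$’s that are $\alpha$-Hölder continuous at some $x\in Q(0,1)$ is contained in
$$\bigcup_{M=1}^\infty\ \bigcap_{n=1}^\infty\ \bigcup_{m\in Q(0,n)\cap\mathbb Z^\nu}\ \bigcap_{\ell=1}^L\Bigl\{\theta:\Bigl|\theta\Bigl(\tfrac{m+\ell e}n\Bigr)-\theta\Bigl(\tfrac{m+(\ell-1)e}n\Bigr)\Bigr|\le\frac M{n^\alpha}\Bigr\}.$$
Hence, again using translation invariance, we see that we need only show that there is an $L\in\mathbb Z^+$ such that, for each $M\in\mathbb Z^+$,
$$n^\nu\,\mathcal W_{H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)}\Bigl(\Bigl\{\theta:\Bigl|\theta\Bigl(\tfrac{\ell e}n\Bigr)-\theta\Bigl(\tfrac{(\ell-1)e}n\Bigr)\Bigr|\le\frac M{n^\alpha},\ 1\le\ell\le L\Bigr\}\Bigr)$$
tends to 0 as $n\to\infty$. To this end, set $U(t,\theta)=K_\nu^{-\frac12}\theta(te)$, and observe that the $\mathcal W_{H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)}$-distribution of $\{U(t):t\ge0\}$ is that of an $\mathbb R$-valued ancient Ornstein–Uhlenbeck process. Thus, what I have to estimate is
$$\mathbb P\Bigl(\Bigl|e^{-\frac\ell{2n}}B\bigl(e^{\frac\ell n}\bigr)-e^{-\frac{\ell-1}{2n}}B\bigl(e^{\frac{\ell-1}n}\bigr)\Bigr|\le\frac M{n^\alpha},\ 1\le\ell\le L\Bigr),$$
where $\bigl(B(t),\mathcal F_t,\mathbb P\bigr)$ is an $\mathbb R$-valued Brownian motion. But clearly this probability is dominated by the sum of
$$\mathbb P\Bigl(\Bigl|B\bigl(e^{\frac\ell n}\bigr)-B\bigl(e^{\frac{\ell-1}n}\bigr)\Bigr|\le\frac{Me^{\frac\ell{2n}}}{2n^\alpha},\ 1\le\ell\le L\Bigr)$$
and
$$\mathbb P\Bigl(\exists\,1\le\ell\le L\quad\bigl(1-e^{-\frac1{2n}}\bigr)\Bigl|B\bigl(e^{\frac{\ell-1}n}\bigr)\Bigr|\ge\frac{Me^{\frac\ell{2n}}}{2n^\alpha}\Bigr).$$
The second of these is easily dominated by $2Le^{-\frac{M^2n^{2(1-\alpha)}}8}$, which, since $\alpha<1$, means that it causes no problems. As for the first, one can use the independence of Brownian increments and Brownian scaling to dominate it by the $L$th power of $\mathbb P\bigl(\bigl|B(1)-B\bigl(e^{-\frac1n}\bigr)\bigr|\le M(2n^\alpha)^{-1}\bigr)$. Hence, I can take any $L$ such that $\bigl(\alpha-\tfrac12\bigr)L>\nu$.

As a consequence of the preceding and Theorem 8.3.1, we have the following corollary.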
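Corollary 8.5.9. Given $s\in\mathbb R$, set
$$\Theta^s(\mathbb R^\nu;\mathbb R)=\Bigl\{B^{s-\frac{\nu+1}2}\theta:\theta\in\Theta^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)\Bigr\},\qquad\|\theta\|_{\Theta^s(\mathbb R^\nu;\mathbb R)}=\bigl\|B^{\frac{\nu+1}2-s}\theta\bigr\|_{\Theta^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)},$$
and
$$\mathcal W_{H^s(\mathbb R^\nu;\mathbb R)}=\bigl(B^{s-\frac{\nu+1}2}\bigr)_*\mathcal W_{H^{\frac{\nu+1}2}(\mathbb R^\nu;\mathbb R)}.$$
Then $\Theta^s(\mathbb R^\nu;\mathbb R)$ is a separable Banach space in which $H^s(\mathbb R^\nu;\mathbb R)$ is continuously embedded as a dense subspace, and $\bigl(H^s(\mathbb R^\nu;\mathbb R),\Theta^s(\mathbb R^\nu;\mathbb R),\mathcal W_{H^s(\mathbb R^\nu;\mathbb R)}\bigr)$ is an abstract Wiener space.

Exercises for § 8.5

Exercise 8.5.10. In this exercise we will show how to use the Ornstein–Uhlenbeck process to prove Poincaré’s Inequality
$$\operatorname{Var}_{\gamma_{0,1}}(\varphi)=\bigl\|\varphi-\langle\varphi,\gamma_{0,1}\rangle\bigr\|^2_{L^2(\gamma_{0,1};\mathbb R)}\le\|\varphi'\|^2_{L^2(\gamma_{0,1};\mathbb R)}\tag{8.5.11}$$
for the standard Gaussian distribution on $\mathbb R$. I will outline the proof of (8.5.11) for $\varphi\in\mathscr S(\mathbb R;\mathbb R)$, but the estimate immediately extends to any $\varphi\in L^2(\gamma_{0,1};\mathbb R)$ whose (distributional) first derivative is again in $L^2(\gamma_{0,1};\mathbb R)$.

(i) For $\varphi\in\mathscr S(\mathbb R;\mathbb R)$, set $u_\varphi(t,x)=\mathbb E^{\mathcal W^{(1)}}\bigl[\varphi\bigl(U(t,x)\bigr)\bigr]$, where $\{U(t,x):t\ge0\}$ is the one-sided, $\mathbb R$-valued Ornstein–Uhlenbeck process starting at $x$. Show that $u_\varphi'(t,x)=e^{-\frac t2}u_{\varphi'}(t,x)$ and that
$$\lim_{t\searrow0}u_\varphi(t,\,\cdot\,)=\varphi\quad\text{and}\quad\lim_{t\to\infty}u_\varphi(t,\,\cdot\,)=\langle\varphi,\gamma_{0,1}\rangle\quad\text{in }L^2(\gamma_{0,1};\mathbb R).$$
Show that another expression for $u_\varphi$ is
$$u_\varphi(t,x)=\bigl(2\pi(1-e^{-t})\bigr)^{-\frac12}\int_{\mathbb R}\varphi(y)\exp\Biggl(-\frac{\bigl(y-e^{-\frac t2}x\bigr)^2}{2(1-e^{-t})}\Biggr)dy.$$
Using this second expression, show that $u_\varphi(t,\,\cdot\,)\in\mathscr S(\mathbb R;\mathbb R)$ and that $t\in[0,\infty)\longmapsto u_\varphi(t,\,\cdot\,)\in\mathscr S(\mathbb R;\mathbb R)$ is continuous. In addition, show that $\dot u_\varphi(t,x)=\tfrac12\bigl(u_\varphi''(t,x)-xu_\varphi'(t,x)\bigr)$.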
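(ii) For $\varphi_1,\varphi_2\in C^2(\mathbb R;\mathbb R)$ whose second derivatives are tempered, show that
$$\bigl(\varphi_1,\varphi_2''-x\varphi_2'\bigr)_{L^2(\gamma_{0,1};\mathbb R)}=-\bigl(\varphi_1',\varphi_2'\bigr)_{L^2(\gamma_{0,1};\mathbb R)},$$
and use this together with (i) to show that, for any $\varphi\in\mathscr S(\mathbb R;\mathbb R)$, $\langle u_\varphi(t,\,\cdot\,),\gamma_{0,1}\rangle=\langle\varphi,\gamma_{0,1}\rangle$ and
$$\frac d{dt}\|u_\varphi(t,\,\cdot\,)\|^2_{L^2(\gamma_{0,1};\mathbb R)}=-e^{-t}\|u_{\varphi'}(t,\,\cdot\,)\|^2_{L^2(\gamma_{0,1};\mathbb R)}.$$
Conclude that $\|u_\varphi(t,\,\cdot\,)\|_{L^2(\gamma_{0,1};\mathbb R)}\le\|\varphi\|_{L^2(\gamma_{0,1};\mathbb R)}$ and
$$\frac d{dt}\|u_\varphi(t,\,\cdot\,)\|^2_{L^2(\gamma_{0,1};\mathbb R)}\ge-e^{-t}\|\varphi'\|^2_{L^2(\gamma_{0,1};\mathbb R)}.$$
Finally, integrate the preceding inequality to arrive at (8.5.11).

Exercise 8.5.12. In this exercise I will outline how the ideas in Exercise 8.5.10 can be used to give another derivation of the logarithmic Sobolev Inequality (2.4.42). Again, I restrict my attention to $\varphi\in\mathscr S(\mathbb R;\mathbb R)$, since the general case can be easily obtained from this by taking limits.

(i) Begin by showing that (2.4.42) holds for $\varphi\in\mathscr S(\mathbb R;\mathbb R)$ once one knows that
$$(*)\qquad\bigl\langle\varphi\log\varphi\bigr\rangle_{\gamma_{0,1}}\le\frac12\Bigl\langle\frac{(\varphi')^2}\varphi\Bigr\rangle_{\gamma_{0,1}}$$
for uniformly positive $\varphi\in\mathbb R\oplus\mathscr S(\mathbb R;\mathbb R)$.

(ii) Given a uniformly positive $\varphi\in\mathbb R\oplus\mathscr S(\mathbb R;\mathbb R)$, use the results in Exercise 8.5.10 to show that
$$\frac d{dt}\bigl\langle u_\varphi(t,\,\cdot\,)\log u_\varphi(t,\,\cdot\,)\bigr\rangle_{\gamma_{0,1}}=-\frac{e^{-t}}2\Bigl\langle\frac{u_{\varphi'}(t,\,\cdot\,)^2}{u_\varphi(t,\,\cdot\,)}\Bigr\rangle_{\gamma_{0,1}}.$$

(iii) Continuing (ii), apply Schwarz’s inequality to check that
$$\frac{u_{\varphi'}(t,x)^2}{u_\varphi(t,x)}\le u_{\frac{(\varphi')^2}\varphi}(t,x),$$
and combine this with (ii) to get
$$\frac d{dt}\bigl\langle u_\varphi(t,\,\cdot\,)\log u_\varphi(t,\,\cdot\,)\bigr\rangle_{\gamma_{0,1}}\ge-\frac{e^{-t}}2\Bigl\langle\frac{(\varphi')^2}\varphi\Bigr\rangle_{\gamma_{0,1}}.$$
Finally, integrate this to arrive at (*).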
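As a numerical sanity check on Poincaré’s Inequality (8.5.11) from Exercise 8.5.10, the sketch below compares $\operatorname{Var}_{\gamma_{0,1}}(\varphi)$ with $\|\varphi'\|^2_{L^2(\gamma_{0,1};\mathbb R)}$ for a test function. Everything here is an illustrative assumption rather than part of the text: the helper names `gauss_mean` and `poincare_sides`, the trapezoid quadrature on $[-12,12]$, and the choice $\varphi=\sin$ (which is not in $\mathscr S(\mathbb R;\mathbb R)$ but is covered by the extension mentioned in the exercise).

```python
import math

def gauss_mean(f, half_width=12.0, n=24000):
    """Trapezoid approximation of the integral of f against the standard Gaussian."""
    h = 2.0 * half_width / n
    total = 0.0
    for j in range(n + 1):
        x = -half_width + j * h
        weight = 0.5 if j in (0, n) else 1.0  # trapezoid end-point weights
        total += weight * f(x) * math.exp(-x * x / 2.0)
    return total * h / math.sqrt(2.0 * math.pi)

def poincare_sides(phi, dphi):
    """Return (variance of phi, squared L^2 norm of phi') under the standard Gaussian."""
    mean = gauss_mean(phi)
    variance = gauss_mean(lambda x: (phi(x) - mean) ** 2)
    energy = gauss_mean(lambda x: dphi(x) ** 2)
    return variance, energy

variance, energy = poincare_sides(math.sin, math.cos)
print(variance, energy)  # the first number should not exceed the second
```

For $\varphi(x)=x$ the two sides coincide, which shows that the constant 1 in (8.5.11) cannot be improved.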
Exercise 8.5.13. Although it should be clear that the arguments given in Exercises 8.5.10 and 8.5.12 work equally well in $\mathbb R^N$ and yield (8.5.11) and (2.4.42) with $\gamma_{0,1}$ replaced by $\gamma_{0,I}$ and $(\varphi')^2$ replaced by $|\nabla\varphi|^2$, it is significant that each of these inequalities for $\mathbb R$ implies its $\mathbb R^N$ analog. Indeed, show that Fubini’s Theorem is all that one needs to pass to the higher dimensional results. The reason why this remark is significant is that it allows one to prove infinite dimensional versions of both Poincaré’s Inequality and the logarithmic Sobolev Inequality, and both of these play a crucial role in infinite dimensional analysis. In fact, Nelson’s interest in hypercontractive estimates sprang from his brilliant insight that hypercontractive estimates would allow him to construct a non-trivial (i.e., non-Gaussian), translation invariant quantum field for $\mathbb R^2$.

Exercise 8.5.14. It is interesting to see what happens if one changes the sign of the second term on the right-hand side of (8.5.1), thereby converting the centripetal force into a centrifugal one.

(i) Show that, for each $\theta\in\Theta(\mathbb R^N)$, the unique solution to
$$V(t,\theta)=\theta(t)+\frac12\int_0^tV(\tau,\theta)\,d\tau,\quad t\ge0,$$
is
$$V(t,\theta)=e^{\frac t2}\int_0^te^{-\frac\tau2}\,d\theta(\tau),$$
where the integral is taken in the sense of Riemann–Stieltjes.

(ii) Show that $\bigl\{\bigl(\xi,V(t,\,\cdot\,)\bigr)_{\mathbb R^N}:(t,\xi)\in[0,\infty)\times\mathbb R^N\bigr\}$ under $\mathcal W^{(N)}$ is a Gaussian family with covariance
$$v(s,t)=e^{\frac{s+t}2}-e^{\frac{|t-s|}2}.$$

(iii) Let $\{B(t):t\ge0\}$ be an $\mathbb R^N$-valued Brownian motion, and show that the distribution of
$$\bigl\{e^{\frac t2}B\bigl(1-e^{-t}\bigr):t\ge0\bigr\}$$
is the $\mathcal W^{(N)}$-distribution of $\{V(t):t\ge0\}$. Next, let $\Theta^V(\mathbb R^N)$ be the space of continuous $\theta:[0,\infty)\longrightarrow\mathbb R^N$ with the properties that
$$\theta(0)=0=\lim_{t\to\infty}e^{-t}|\theta(t)|,$$
and set $\|\theta\|_{\Theta^V(\mathbb R^N)}\equiv\sup_{t\ge0}e^{-t}|\theta(t)|$. Show that $\bigl(\Theta^V(\mathbb R^N);\|\cdot\|_{\Theta^V(\mathbb R^N)}\bigr)$ is a separable Banach space and that there exists a unique $\mathcal V^{(N)}\in M_1\bigl(\Theta^V(\mathbb R^N)\bigr)$ such that the distribution of $\{\theta(t):t\ge0\}$ under $\mathcal V^{(N)}$ is the same as the distribution of $\{V(t):t\ge0\}$ under $\mathcal W^{(N)}$.
(iv) Let $H^V(\mathbb R^N)$ be the space of absolutely continuous $h:[0,\infty)\longrightarrow\mathbb R^N$ with the properties that $h(0)=0$ and $\dot h-\tfrac12h\in L^2\bigl([0,\infty);\mathbb R^N\bigr)$. Show that $H^V(\mathbb R^N)$ with norm
$$\|h\|_{H^V(\mathbb R^N)}\equiv\bigl\|\dot h-\tfrac12h\bigr\|_{L^2([0,\infty);\mathbb R^N)}$$
is a separable Hilbert space that is continuously embedded in $\Theta^V(\mathbb R^N)$ as a dense subspace. Finally, show that $\bigl(H^V(\mathbb R^N),\Theta^V(\mathbb R^N),\mathcal V^{(N)}\bigr)$ is an abstract Wiener space.

(v) There is a subtlety here that is worth mentioning. Namely, show that $H^U(\mathbb R^N)$ is isometrically embedded in $H^V(\mathbb R^N)$. On the other hand, as distinguished from elements of $H^U(\mathbb R^N)$, it is not true that $\|\dot\eta-\tfrac12\eta\|^2_{L^2(\mathbb R;\mathbb R^N)}=\|\dot\eta\|^2_{L^2(\mathbb R;\mathbb R^N)}+\tfrac14\|\eta\|^2_{L^2(\mathbb R;\mathbb R^N)}$, the point being that whereas the elements $h$ of $H^V(\mathbb R^N)$ with $\dot h\in C_{\mathrm c}\bigl((0,\infty);\mathbb R^N\bigr)$ are dense in $H^U(\mathbb R^N)$, they are not dense in $H^V(\mathbb R^N)$.

Exercise 8.5.15. Given $x\in\mathbb R^\nu$ and a slowly increasing $\varphi\in C(\mathbb R^\nu;\mathbb R)$, define $\tau_x\varphi\in C(\mathbb R^\nu;\mathbb R)$ so that $\tau_x\varphi(y)=\varphi(x+y)$ for $y\in\mathbb R^\nu$. Next, extend $\tau_x$ to $\mathscr S'(\mathbb R^\nu;\mathbb R)$ so that $\langle\varphi,\tau_xu\rangle=\langle\tau_{-x}\varphi,u\rangle$ for $\varphi\in\mathscr S(\mathbb R^\nu;\mathbb R)$, and check that this is a legitimate extension in the sense that it is consistent with the original definition when applied to $u$’s that are slowly increasing, continuous functions. Finally, given $s\in\mathbb R$, define $O_x:H^s(\mathbb R^\nu;\mathbb R)\longrightarrow H^s(\mathbb R^\nu;\mathbb R)$ by $O_xh=\tau_xh$.

(i) Show that $B^s\circ\tau_x=\tau_x\circ B^s$ for all $s\in\mathbb R$ and $x\in\mathbb R^\nu$.

(ii) Given $s\in\mathbb R$, define $O_x=\tau_x\upharpoonright H^s(\mathbb R^\nu;\mathbb R)$, and show that $O_x$ is an orthogonal transformation.

(iii) Referring to Theorem 8.3.14 and Corollary 8.5.9, show that the measure preserving transformation $T_{O_x}$ that $O_x$ determines on $\bigl(\Theta^s(\mathbb R^\nu;\mathbb R),\mathcal W_{H^s(\mathbb R^\nu;\mathbb R)}\bigr)$ is the restriction of $\tau_x$ to $\Theta^s(\mathbb R^\nu;\mathbb R)$.

(iv) If $x\ne0$, show that $T_{O_x}$ is ergodic on $\bigl(\Theta^s(\mathbb R^\nu;\mathbb R),\mathcal W_{H^s(\mathbb R^\nu;\mathbb R)}\bigr)$.

§ 8.6 Brownian Motion on a Banach Space

In this concluding section I will discuss Brownian motion on a Banach space. More precisely, given a non-degenerate, centered, Gaussian measure $\mathcal W$ on a separable Banach space $E$, we will see that there exists an $E$-valued stochastic process $\{B(t):t\ge0\}$ with the properties that $B(0)=0$, $t\rightsquigarrow B(t)$ is continuous, and, for all $0\le s<t$, $B(t)-B(s)$ is independent of $\sigma\bigl(\{B(\tau):\tau\in[0,s]\}\bigr)$ and has distribution (cf. the notation in § 8.4) $\mathcal W_{t-s}$.

§ 8.6.1. Abstract Wiener Formulation. Let $\mathcal W$ on $E$ be as above, use $H$ to denote its Cameron–Martin space, and take $H^1(H)$ to be the Hilbert space of absolutely continuous $h:[0,\infty)\longrightarrow H$ such that $h(0)=0$ and $\|h\|_{H^1(H)}=\|\dot h\|_{L^2([0,\infty);H)}<\infty$. Finally, let $\Theta(E)$ be the space of continuous $\theta:[0,\infty)\longrightarrow E$ satisfying $\lim_{t\to\infty}\frac{\|\theta(t)\|_E}t=0$, and turn $\Theta(E)$ into a Banach space with norm $\|\theta\|_{\Theta(E)}=\sup_{t\ge0}(1+t)^{-1}\|\theta(t)\|_E$. By exactly the same line of reasoning as I used when $E=\mathbb R^N$, one can show that $\Theta(E)$ is a separable Banach space in which $H^1(H)$ is continuously embedded as a dense subspace. My goal is to prove the following statement.
Theorem 8.6.1. With $H^1(H)$ and $\Theta(E)$ as above, there is a unique $\mathcal W^{(E)}\in M_1\bigl(\Theta(E)\bigr)$ such that $\bigl(H^1(H),\Theta(E),\mathcal W^{(E)}\bigr)$ is an abstract Wiener space.

Choose an orthonormal basis $\{h^1_m:m\ge0\}$ in $H^1(\mathbb R)$, and, for $n\ge0$, $t\ge0$, and $\mathbf x=(x_0,\dots,x_m,\dots)\in E^{\mathbb N}$, set $S_n(t,\mathbf x)=\sum_{m=0}^nh^1_m(t)x_m$. I will show that, $\mathcal W^{\mathbb N}$-almost surely, $\{S_n(\,\cdot\,,\mathbf x):n\ge0\}$ converges in $\Theta(E)$, and, for the most part, the proof follows the same basic line of reasoning as that suggested in Exercise 8.3.21 when $E=\mathbb R^N$. However, there is a problem here that we did not encounter there. Namely, unless $E$ is finite dimensional, bounded subsets will not necessarily be relatively compact in $E$. Hence, local uniform equicontinuity plus local boundedness is not sufficient to guarantee that a collection of $E$-valued paths is relatively compact in $C\bigl([0,\infty);E\bigr)$, and that is the reason why we have to work a little harder here.
Lemma 8.6.2. For W N -almost every x ∈ E N , {Sn ( · , x) : n ≥ 0} is relatively compact in Θ(E). Proof: Choose E0 ⊆ E, as in Corollary 8.3.10, so that bounded subsets of E0 are relatively compact in E and H, E0 , W E0 is again an abstract Wiener space. Without loss in generality, I will assume that k · kE ≤ k · kE0 , and, by Fernique’s Theorem, we know that C ≡ EW0 kxk4E0 < ∞. Pn Since (cf. Exercise 8.2.14) Sn (t, x) − Sn (s, x) = m=0 h1t − h1s , h1m H 1 (R) xm , where h1τ = · ∧ τ , the W0N -distribution of Sn (t) − Sn (s) is Wn , where 2n = Pn 1 1 1 2 WN kSn (t) − Sn (s)k4E0 ≤ C(t − s)2 . 0 ht − hs , hm H 1 (R) ≤ t − s. Hence, E In addition, {kSn (t) − Sn (s)kE0 : n ≥ 1} is a submartingale, and so, by Doob’s Inequality plus Kolmogorov’s Continuity Criterion, there exists a K < ∞ such that, for each T > 0, (*)
EW
N
sup
sup
n≥0 0≤s 0, {Sn (t, x) : n ≥ 0 & t ∈ [0, T ]} is relatively compact in E and {Sn ( · , x) [0, T ] : n ≥ 0} is uniformly k · kE equicontinuous W N -almost surely, the Ascoli–Arzela Theorem guarantees that, W N -almost surely, {Sn ( · , x) : n ≥ 0} is relatively compact in C [0, ∞); E with
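the topology of uniform convergence on compacts. Thus, in order to complete the proof, all that I have to show is that, $\mathcal W^{\mathbb N}$-almost surely,
$$\lim_{T\to\infty}\sup_{n\ge0}\sup_{t\ge T}\frac{\|S_n(t,\mathbf x)\|_E}t=0.$$
But,
$$\sup_{t\ge2^k}\frac{\|S_n(t,\mathbf x)\|_E}t\le\sum_{\ell\ge k}\ \sup_{2^\ell\le t\le2^{\ell+1}}\frac{\|S_n(t,\mathbf x)\|_E}t\le\sum_{\ell\ge k}2^{-\frac{7\ell}8}\sup_{0\le t\le2^{\ell+1}}\frac{\|S_n(t,\mathbf x)\|_E}{t^{\frac18}},$$
and therefore, by (*),
$$\mathbb E^{\mathcal W^{\mathbb N}}\Biggl[\sup_{n\ge0}\sup_{t\ge2^k}\frac{\|S_n(t,\mathbf x)\|_E}t\Biggr]\le\frac{2^{\frac34}K}{2^{\frac18}-1}\,2^{-\frac k8}.$$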
Now that we have the requisite compactness of $\{S_n:n\ge0\}$, convergence comes to checking a criterion of the sort given in the following simple lemma.

Lemma 8.6.3. Suppose that $\{\theta_n:n\ge0\}$ is a relatively compact sequence in $\Theta(E)$. If $\lim_{n\to\infty}\langle\theta_n(t),x^*\rangle$ exists for each $t$ in a dense subset of $[0,\infty)$ and $x^*$ in a weak* dense subset of $E^*$, then $\{\theta_n:n\ge0\}$ converges in $\Theta(E)$.

Proof: For a relatively compact sequence to be convergent, it is necessary and sufficient that every convergent subsequence have the same limit. Thus, suppose that $\theta$ and $\theta'$ are limit points of $\{\theta_n:n\ge0\}$. Then, by hypothesis, $\langle\theta(t),x^*\rangle=\langle\theta'(t),x^*\rangle$ for $t$ in a dense subset of $[0,\infty)$ and $x^*$ in a weak* dense subset of $E^*$. But this means that the same equality holds for all $(t,x^*)\in[0,\infty)\times E^*$ and therefore that $\theta=\theta'$.

Proof of Theorem 8.6.1: In view of Lemmas 8.6.2 and 8.6.3 and the separability of $E^*$ in the weak* topology, we will know that $\{S_n(\,\cdot\,,\mathbf x):n\ge0\}$ converges in $\Theta(E)$ for $\mathcal W^{\mathbb N}$-almost every $\mathbf x\in E^{\mathbb N}$ once we show that, for each $(t,x^*)\in[0,\infty)\times E^*$, $\{\langle S_n(t,\mathbf x),x^*\rangle:n\ge0\}$ converges in $\mathbb R$ for $\mathcal W^{\mathbb N}$-almost every $\mathbf x\in E^{\mathbb N}$. But if $x^*\in E^*$, then $\langle S_n(t,\mathbf x),x^*\rangle=\sum_0^n\langle x_m,x^*\rangle h^1_m(t)$, the random variables $\mathbf x\rightsquigarrow\langle x_m,x^*\rangle h^1_m(t)$ are independent, centered Gaussians under $\mathcal W^{\mathbb N}$ with variance $\|h_{x^*}\|^2_Hh^1_m(t)^2$, and $\sum_0^\infty h^1_m(t)^2=\|h^1_t\|^2_{H^1(\mathbb R)}=t$. Thus, by Theorem 1.4.2, we have the required convergence.

Next, define $B:[0,\infty)\times E^{\mathbb N}\longrightarrow E$ so that
$$B(t,\mathbf x)=\begin{cases}\lim_{n\to\infty}S_n(t,\mathbf x)&\text{if }\{S_n(\,\cdot\,,\mathbf x):n\ge0\}\text{ converges in }\Theta(E)\\0&\text{otherwise.}\end{cases}$$
Given $\lambda\in\Theta(E)^*$, determine $h_\lambda\in H^1(H)$ by $\bigl(h,h_\lambda\bigr)_{H^1(H)}=\langle h,\lambda\rangle$ for all $h\in H^1(H)$. I want to show that, under $\mathcal W^{\mathbb N}$, $\mathbf x\rightsquigarrow\langle B(\,\cdot\,,\mathbf x),\lambda\rangle$ is a centered Gaussian with variance $\|h_\lambda\|^2_{H^1(H)}$. To this end, define $x_m^*\in E^*$ so that¹ $\langle x,x_m^*\rangle=\langle h^1_mx,\lambda\rangle$ for $x\in E$. Then,
$$\langle B(\,\cdot\,,\mathbf x),\lambda\rangle=\lim_{n\to\infty}\langle S_n(\,\cdot\,,\mathbf x),\lambda\rangle=\lim_{n\to\infty}\sum_0^n\langle x_m,x_m^*\rangle\quad\mathcal W^{\mathbb N}\text{-almost surely.}$$
Hence, $\langle B(\,\cdot\,,\mathbf x),\lambda\rangle$ is certainly a centered Gaussian under $\mathcal W^{\mathbb N}$, and, because we are dealing with Gaussian random variables, almost sure convergence implies $L^2$ convergence. To compute its variance, choose an orthonormal basis $\{h_k:k\ge0\}$ for $H$, and note that, for each $m\ge0$,
$$\mathbb E^{\mathcal W^{\mathbb N}}\bigl[\langle x_m,x_m^*\rangle^2\bigr]=\|h_{x_m^*}\|^2_H=\sum_{k=0}^\infty\langle h^1_mh_k,\lambda\rangle^2.$$
Thus, since $\{h^1_mh_k:(m,k)\in\mathbb N^2\}$ is an orthonormal basis in $H^1(H)$,
$$\mathbb E^{\mathcal W^{\mathbb N}}\bigl[\langle B(\,\cdot\,),\lambda\rangle^2\bigr]=\sum_{m,k=0}^\infty\langle h^1_mh_k,\lambda\rangle^2=\sum_{m,k=0}^\infty\bigl(h^1_mh_k,h_\lambda\bigr)^2_{H^1(H)}=\|h_\lambda\|^2_{H^1(H)}.$$
Finally, to complete the proof, all that remains is to take $\mathcal W^{(E)}$ to be the $\mathcal W^{\mathbb N}$-distribution of $\mathbf x\rightsquigarrow B(\,\cdot\,,\mathbf x)$.

§ 8.6.2. Brownian Formulation. Let $(H,E,\mathcal W)$ be an abstract Wiener space. Given a probability space $(\Omega,\mathcal F,\mathbb P)$, a non-decreasing family of sub-σ-algebras $\{\mathcal F_t:t\ge0\}$, and a measurable map $B:[0,\infty)\times\Omega\longrightarrow E$, say that the triple $\bigl(B(t),\mathcal F_t,\mathbb P\bigr)$ is a $\mathcal W$-Brownian motion if
(1) $B$ is $\{\mathcal F_t:t\ge0\}$-progressively measurable,
(2) $B(0,\omega)=0$ and $B(\,\cdot\,,\omega)\in C\bigl([0,\infty);E\bigr)$ for $\mathbb P$-almost every $\omega$,
(3) $B(1)$ has distribution $\mathcal W$, and, for all $0\le s<t$, $B(t)-B(s)$ is independent of $\mathcal F_s$ and has the same distribution as $(t-s)^{\frac12}B(1)$.
Lemma 8.6.4. Suppose that $\{B(t):t\ge0\}$ satisfies conditions (1) and (2). Then $\bigl(B(t),\mathcal F_t,\mathbb P\bigr)$ is a $\mathcal W$-Brownian motion if and only if $\bigl(\langle B(t),x^*\rangle,\mathcal F_t,\mathbb P\bigr)$ is an $\mathbb R$-valued Brownian motion for each $x^*\in E^*$ with $\|h_{x^*}\|_H=1$. In addition, if $\bigl(B(t),\mathcal F_t,\mathbb P\bigr)$ is a $\mathcal W$-Brownian motion, then the span $G(B)$ of $\{\langle B(t),x^*\rangle:(t,x^*)\in[0,\infty)\times E^*\}$ is a Gaussian family in $L^2(\mathbb P;\mathbb R)$ and
$$\mathbb E^{\mathbb P}\bigl[\langle B(t_1),x_1^*\rangle\langle B(t_2),x_2^*\rangle\bigr]=(t_1\wedge t_2)\bigl(h_{x_1^*},h_{x_2^*}\bigr)_H.\tag{8.6.5}$$
Conversely, if $G(B)$ is a Gaussian family in $L^2(\mathbb P;\mathbb R)$ and (8.6.5) holds, then $\bigl(B(t),\mathcal F_t,\mathbb P\bigr)$ is a $\mathcal W$-Brownian motion when $\mathcal F_t=\sigma\bigl(\{B(\tau):\tau\in[0,t]\}\bigr)$.

¹ Given $h^1\in H^1(\mathbb R)$ and $x\in E$, I use $h^1x$ to denote the element $\theta$ of $\Theta(E)$ determined by $\theta(t)=h^1(t)x$.
Proof: If $\bigl(B(t),\mathcal F_t,\mathbb P\bigr)$ is a $\mathcal W$-Brownian motion and $x^*\in E^*$ with $\|h_{x^*}\|_H=1$, then $\langle B(t),x^*\rangle-\langle B(s),x^*\rangle=\langle B(t)-B(s),x^*\rangle$ is independent of $\mathcal F_s$ and is a centered Gaussian with variance $(t-s)$. Thus, $\bigl(\langle B(t),x^*\rangle,\mathcal F_t,\mathbb P\bigr)$ is an $\mathbb R$-valued Brownian motion.

Next assume that $\bigl(\langle B(t),x^*\rangle,\mathcal F_t,\mathbb P\bigr)$ is an $\mathbb R$-valued Brownian motion for every $x^*$ with $\|h_{x^*}\|_H=1$. Then $\langle B(t)-B(s),x^*\rangle$ is independent of $\mathcal F_s$ for every $x^*\in E^*$, and so, since $\mathcal B_E$ is generated by $\{\langle\,\cdot\,,x^*\rangle:x^*\in E^*\}$, $B(t)-B(s)$ is independent of $\mathcal F_s$. In addition, $\langle B(t)-B(s),x^*\rangle$ is a centered Gaussian with variance $(t-s)\|h_{x^*}\|^2_H$, and therefore $B(1)$ has distribution $\mathcal W$ and $B(t)-B(s)$ has the same distribution as $(t-s)^{\frac12}B(1)$. Thus, $\bigl(B(t),\mathcal F_t,\mathbb P\bigr)$ is a $\mathcal W$-Brownian motion.

Again assume that $\bigl(B(t),\mathcal F_t,\mathbb P\bigr)$ is a $\mathcal W$-Brownian motion. To prove that $G(B)$ is a Gaussian family for which (8.6.5) holds, it suffices to show that, for all $0\le t_1\le t_2$ and $x_1^*,x_2^*\in E^*$, $\langle B(t_1),x_1^*\rangle+\langle B(t_2),x_2^*\rangle$ is a centered Gaussian with variance $t_1\|h_{x_1^*}+h_{x_2^*}\|^2_H+(t_2-t_1)\|h_{x_2^*}\|^2_H$. Indeed, we would then know not only that $G(B)$ is a Gaussian family but also that the variance of $\langle B(t_1),x_1^*\rangle\pm\langle B(t_2),x_2^*\rangle$ is $t_1\|h_{x_1^*}\pm h_{x_2^*}\|^2_H+(t_2-t_1)\|h_{x_2^*}\|^2_H$, from which (8.6.5) is immediate. But
$$\langle B(t_1),x_1^*\rangle+\langle B(t_2),x_2^*\rangle=\langle B(t_1),x_1^*+x_2^*\rangle+\langle B(t_2)-B(t_1),x_2^*\rangle,$$
and the terms on the right are independent, centered Gaussians, the first with variance $t_1\|h_{x_1^*}+h_{x_2^*}\|^2_H$ and the second with variance $(t_2-t_1)\|h_{x_2^*}\|^2_H$.

Finally, take $\mathcal F_t=\sigma\bigl(\{B(\tau):\tau\in[0,t]\}\bigr)$, and assume that $G(B)$ is a Gaussian family satisfying (8.6.5). Given $x^*$ with $\|h_{x^*}\|_H=1$ and $0\le s<t$, we know that $\langle B(t)-B(s),x^*\rangle=\langle B(t),x^*\rangle-\langle B(s),x^*\rangle$ is orthogonal in $L^2(\mathbb P;\mathbb R)$ to $\langle B(\tau),y^*\rangle$ for every $\tau\in[0,s]$ and $y^*\in E^*$. Hence, since $\mathcal F_s$ is generated by $\{\langle B(\tau),y^*\rangle:(\tau,y^*)\in[0,s]\times E^*\}$, we know that $\langle B(t)-B(s),x^*\rangle$ is independent of $\mathcal F_s$. In addition, $\langle B(t)-B(s),x^*\rangle$ is a centered Gaussian with variance $t-s$, and so we have proved that $\bigl(\langle B(t),x^*\rangle,\mathcal F_t,\mathbb P\bigr)$ is an $\mathbb R$-valued Brownian motion. Now apply the first part of the lemma to conclude that $\bigl(B(t),\mathcal F_t,\mathbb P\bigr)$ is a $\mathcal W$-Brownian motion.

Theorem 8.6.6. Refer to the notation in Theorem 8.6.1. When $\Omega=\Theta(E)$, $\mathcal F=\mathcal B_{\Theta(E)}$, and $\mathcal F_t=\sigma\bigl(\{\theta(\tau):\tau\in[0,t]\}\bigr)$, $\bigl(\theta(t),\mathcal F_t,\mathcal W^{(E)}\bigr)$ is a $\mathcal W$-Brownian motion. Conversely, if $\bigl(B(t),\mathcal F_t,\mathbb P\bigr)$ is any $\mathcal W$-Brownian motion, then $B(\,\cdot\,,\omega)\in\Theta(E)$ $\mathbb P$-almost surely and $\mathcal W^{(E)}$ is the $\mathbb P$-distribution of $\omega\rightsquigarrow B(\,\cdot\,,\omega)$.

Proof: To prove the first assertion, let $t_1,t_2\in[0,\infty)$ and $x_1^*,x_2^*\in E^*$ be given, and define $\lambda_i\in\Theta(E)^*$ so that $\langle\theta,\lambda_i\rangle=\langle\theta(t_i),x_i^*\rangle$ for $i\in\{1,2\}$. Then (cf. the notation in the proof of Theorem 8.6.1) $h_{\lambda_i}=h^1_{t_i}h_{x_i^*}$, and so
$$\mathbb E^{\mathcal W^{(E)}}\bigl[\langle\theta(t_1),x_1^*\rangle\langle\theta(t_2),x_2^*\rangle\bigr]=\bigl(h_{\lambda_1},h_{\lambda_2}\bigr)_{H^1(H)}=(t_1\wedge t_2)\bigl(h_{x_1^*},h_{x_2^*}\bigr)_H.$$
Starting from this, it is an easy matter to check that the span of $\{\langle\theta(t),x^*\rangle:(t,x^*)\in[0,\infty)\times E^*\}$ is a Gaussian family in $L^2(\mathcal W^{(E)};\mathbb R)$ that satisfies (8.6.5).

To prove the converse, begin by observing that, because $G(B)$ is a Gaussian family satisfying (8.6.5), the distribution of $\omega\in\Omega\longmapsto B(\,\cdot\,,\omega)\in C\bigl([0,\infty);E\bigr)$ under $\mathbb P$ is the same as that of $\theta\in\Theta(E)\longmapsto\theta(\,\cdot\,)\in C\bigl([0,\infty);E\bigr)$ under $\mathcal W^{(E)}$. Hence
$$\mathbb P\Bigl(\lim_{t\to\infty}\frac{\|B(t)\|_E}t=0\Bigr)=\mathcal W^{(E)}\Bigl(\lim_{t\to\infty}\frac{\|\theta(t)\|_E}t=0\Bigr)=1,$$
and so $B(\,\cdot\,,\omega)\in\Theta(E)$ $\mathbb P$-almost surely and the distribution of $\omega\rightsquigarrow B(\,\cdot\,,\omega)$ on $\Theta(E)$ is $\mathcal W^{(E)}$.
§ 8.6.3. Strassen’s Theorem Revisited. What I called Strassen’s Theorem in § 8.4.2 is not the form in which Strassen himself presented it. Instead, his formulation was in terms of rescaled $\mathbb R$-valued Brownian motion, not partial sums of independent random variables. The true statement of Strassen’s Theorem is the following in the present setting.

Theorem 8.6.7 (Strassen). Given $\theta\in\Theta(E)$, define $\tilde\theta_n(t)=\frac{\theta(nt)}{\Lambda_n}$ for $n\ge1$ and $t\in[0,\infty)$, where $\Lambda_n=\sqrt{2n\log_{(2)}(n\vee3)}$. Then, for $\mathcal W^{(E)}$-almost every $\theta$, the sequence $\{\tilde\theta_n:n\ge1\}$ is relatively compact in $\Theta(E)$ and $\overline{B_{H^1(H)}(0,1)}$ is its set of limit points. Equivalently, for $\mathcal W^{(E)}$-almost every $\theta$,
$$\lim_{n\to\infty}\bigl\|\tilde\theta_n-\overline{B_{H^1(H)}(0,1)}\bigr\|_{\Theta(E)}=0$$
and, for each $h\in\overline{B_{H^1(H)}(0,1)}$, $\varliminf_{n\to\infty}\|\tilde\theta_n-h\|_{\Theta(E)}=0$.

Not surprisingly, the proof differs only slightly from that of Theorem 8.4.4. In proving the $\mathcal W^{(E)}$-almost sure convergence of $\{\tilde\theta_n:n\ge1\}$ to $\overline{B_{H^1(H)}(0,1)}$, there are two new ingredients here. The first is the use of the Brownian scaling invariance property (cf. Exercise 8.6.8), which says that $\mathcal W^{(E)}$ is invariant under the scaling maps $S_\alpha:\Theta(E)\longrightarrow\Theta(E)$ given by $S_\alpha\theta=\alpha^{-\frac12}\theta(\alpha\,\cdot\,)$ for $\alpha>0$ and is easily proved as a consequence of the fact that these maps are isometric from $H^1(H)$ onto itself. The second new ingredient is the observation that, for any $R>0$, $r\in(0,1]$, and $\theta\in\Theta(E)$, $\|\theta(r\,\cdot\,)-\overline{B_{H^1(H)}(0,R)}\|_{\Theta(E)}\le\|\theta-\overline{B_{H^1(H)}(0,R)}\|_{\Theta(E)}$. To see this, let $h\in\overline{B_{H^1(H)}(0,R)}$ be given, and check that $h(r\,\cdot\,)$ is again in $\overline{B_{H^1(H)}(0,R)}$ and that $\|\theta(r\,\cdot\,)-h(r\,\cdot\,)\|_{\Theta(E)}\le\|\theta-h\|_{\Theta(E)}$.
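Taking these into account and applying (8.4.2), one can now justify
$$\begin{aligned}
&\mathcal W^{(E)}\Bigl(\max_{\beta^{m-1}\le n\le\beta^m}\bigl\|\tilde\theta_n-\overline{B_{H^1(H)}(0,1)}\bigr\|_{\Theta(E)}\ge\delta\Bigr)\\
&\quad=\mathcal W^{(E)}\Biggl(\max_{\beta^{m-1}\le n\le\beta^m}\Bigl\|\frac{\beta^{\frac m2}\theta(n\beta^{-m}\,\cdot\,)}{\Lambda_n}-\overline{B_{H^1(H)}(0,1)}\Bigr\|_{\Theta(E)}\ge\delta\Biggr)\\
&\quad\le\mathcal W^{(E)}\Biggl(\max_{\beta^{m-1}\le n\le\beta^m}\Bigl\|\theta\bigl(\beta^{-m}n\,\cdot\,\bigr)-\overline{B_{H^1(H)}\Bigl(0,\tfrac{\Lambda_{[\beta^{m-1}]}}{\beta^{\frac m2}}\Bigr)}\Bigr\|_{\Theta(E)}\ge\frac{\delta\Lambda_{[\beta^{m-1}]}}{\beta^{\frac m2}}\Biggr)\\
&\quad\le\mathcal W^{(E)}\Biggl(\Bigl\|\theta-\overline{B_{H^1(H)}\Bigl(0,\tfrac{\Lambda_{[\beta^{m-1}]}}{\beta^{\frac m2}}\Bigr)}\Bigr\|_{\Theta(E)}\ge\frac{\delta\Lambda_{[\beta^{m-1}]}}{\beta^{\frac m2}}\Biggr)\\
&\quad=\mathcal W^{(E)}\Bigl(\bigl\|\beta^{\frac m2}\Lambda^{-1}_{[\beta^{m-1}]}\theta-\overline{B_{H^1(H)}(0,1)}\bigr\|_{\Theta(E)}\ge\delta\Bigr)\\
&\quad=\mathcal W^{(E)}_{\beta^m\Lambda^{-2}_{[\beta^{m-1}]}}\Bigl(\bigl\|\theta-\overline{B_{H^1(H)}(0,1)}\bigr\|_{\Theta(E)}\ge\delta\Bigr)\le\exp\Biggl(-\frac{R^2[\beta^{m-1}]\log_{(2)}[\beta^{m-1}]}{\beta^m}\Biggr)
\end{aligned}$$
for all $\beta\in(1,2)$, $R<\inf\{\|h\|_{H^1(H)}:\|h\|_{\Theta(E)}\ge\delta\}$, and sufficiently large $m\ge1$. Armed with this information, one can simply repeat the argument given at the analogous place in the proof of Theorem 8.4.4.

The proof that, $\mathcal W^{(E)}$-almost surely, $\tilde\theta_n$ approaches every $h\in C$ infinitely often also requires only minor modification. To begin, one remarks that if $A\subseteq\Theta(E)$ is relatively compact, then
$$\lim_{T\to\infty}\ \sup_{\theta\in A}\ \sup_{t\notin[T^{-1},T]}\frac{\|\theta(t)\|_E}{1+t}=0.$$
Thus, since, by the preceding, for $\mathcal W^{(E)}$-almost every $\theta$, the union of $\{\tilde\theta_n:n\ge1\}$ and $\overline{B_{H^1(H)}(0,1)}$ is relatively compact in $\Theta(E)$, it suffices to prove that
$$\varliminf_{n\to\infty}\ \sup_{t\in[k^{-1},k]}\frac{\bigl\|\tilde\theta_n(t)-\tilde\theta_n(k^{-1})-\bigl(h(t)-h(k^{-1})\bigr)\bigr\|_E}{1+t}=0\quad\mathcal W^{(E)}\text{-almost surely}$$
for each $h\in\overline{B_{H^1(H)}(0,1)}$ and $k\ge2$. Because, for a fixed $k\ge2$, the random variables $\bigl(\tilde\theta_{k^{2m}}-\tilde\theta_{k^{2m}}(k^{-1})\bigr)\upharpoonright[k^{-1},k]$, $m\ge1$, are $\mathcal W^{(E)}$-independent random variables, we can use the Borel–Cantelli Lemma as in § 8.4.2 and thereby reduce the problem to showing that, if $\check\theta_{k^m}(t)=\tilde\theta_{k^m}(t+k^{-1})-\tilde\theta_{k^m}(k^{-1})$, then
$$\sum_{m=1}^\infty\mathcal W^{(E)}\bigl(\|\check\theta_{k^{2m}}-h\|_{\Theta(E)}\le\delta\bigr)=\infty$$
for each $\delta>0$, $k\ge2$, and $h\in\overline{B_{H^1(H)}(0,1)}$. Finally, since $\mathcal W^{(E)}_{k^m\Lambda^{-1}_{k^{2m}}}$ is the $\mathcal W^{(E)}$-distribution of $\theta\rightsquigarrow\check\theta_{k^{2m}}$, the rest of the argument is the same as the one given in § 8.4.2.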
Exercises for § 8.6

Exercise 8.6.8. Let $\bigl(H^1(H),\Theta(E),\mathcal W^{(E)}\bigr)$ be as in Theorem 8.6.1.

(i) Given $\alpha>0$, define $S_\alpha:\Theta(E)\longrightarrow\Theta(E)$ so that $S_\alpha\theta(t)=\alpha^{-\frac12}\theta(\alpha t)$, $t\in[0,\infty)$, and show that $(S_\alpha)_*\mathcal W^{(E)}=\mathcal W^{(E)}$. Again, this property is called Brownian scaling invariance.

(ii) Define $I:\Theta(E)\longrightarrow C\bigl([0,\infty);E\bigr)$ so that $I\theta(0)=0$ and $I\theta(t)=t\theta(t^{-1})$ for $t>0$. Show that $I$ is an isometry from $\Theta(E)$ onto itself and that $I\upharpoonright H^1(H)$ is an isometry from $H^1(H)$ onto itself. Finally, use this to prove the Brownian time inversion invariance property: $I_*\mathcal W^{(E)}=\mathcal W^{(E)}$.
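In the scalar case $E=H=\mathbb R$, where $\mathcal W^{(E)}$ is ordinary Wiener measure and a centered Gaussian process is determined by its covariance, both invariance properties in Exercise 8.6.8 can be sanity-checked at the level of covariances. The sketch below does only that; it is not a proof, and the function names and the sample grid are illustrative assumptions.

```python
def bm_cov(s, t):
    """Covariance of a standard real Brownian motion at times s, t >= 0."""
    return min(s, t)

def scaled_cov(alpha, s, t):
    """Covariance of alpha**-0.5 * B(alpha*s) and alpha**-0.5 * B(alpha*t)."""
    return bm_cov(alpha * s, alpha * t) / alpha

def inverted_cov(s, t):
    """Covariance of s*B(1/s) and t*B(1/t) for s, t > 0."""
    return s * t * bm_cov(1.0 / s, 1.0 / t)

times = [0.1, 0.5, 1.0, 2.0, 7.5]
# Largest deviation from min(s, t) over the grid, for both transformations.
max_gap = max(
    max(abs(scaled_cov(a, s, t) - bm_cov(s, t)),
        abs(inverted_cov(s, t) - bm_cov(s, t)))
    for a in (0.25, 1.0, 3.0)
    for s in times
    for t in times
)
print(max_gap)
```

Both transformed covariances reduce to $s\wedge t$, so the printed gap should be zero up to floating-point rounding.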
Exercise 8.6.9. Let $H^U(H)$ be the Hilbert space of absolutely continuous $h^U:\mathbb R\longrightarrow H$ with the property that
$$\|h^U\|_{H^U(H)}=\sqrt{\|\dot h^U\|^2_{L^2(\mathbb R;H)}+\tfrac14\|h^U\|^2_{L^2(\mathbb R;H)}}<\infty,$$
and take $\Theta^U(E)$ to be the Banach space of continuous $\theta^U:\mathbb R\longrightarrow E$ satisfying $\lim_{|t|\to\infty}\frac{\|\theta^U(t)\|_E}{\log|t|}=0$ with norm $\|\theta^U\|_{\Theta^U(E)}=\sup_{t\in\mathbb R}\frac{\|\theta^U(t)\|_E}{\log(e+|t|)}$. If $F:\Theta(E)\longrightarrow C(\mathbb R;E)$ is given by $[F(\theta)](t)=e^{-\frac t2}\theta(e^t)$, show that $F$ takes $\Theta(E)$ continuously into $\Theta^U(E)$ and that $\bigl(H^U(H),\Theta^U(E),\mathcal U_R^{(E)}\bigr)$ is an abstract Wiener space when $\mathcal U_R^{(E)}=F_*\mathcal W^{(E)}$. Of course, one should recognize the measure $\mathcal U_R^{(E)}$ as the distribution of an $E$-valued, reversible, Ornstein–Uhlenbeck process.
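For orientation, in the scalar case $E=H=\mathbb R$ the time change $[F(\theta)](t)=e^{-\frac t2}\theta(e^t)$ in Exercise 8.6.9 turns the Brownian covariance $s\wedge t$ into the stationary kernel $e^{-\frac{|t-s|}2}$. The sketch below checks this identity numerically on a small grid; the function names and the grid are illustrative assumptions, and the check concerns covariances only.

```python
import math

def ou_cov_from_time_change(s, t):
    """Covariance of e^{-s/2} B(e^s) and e^{-t/2} B(e^t) for a standard real Brownian motion B."""
    return math.exp(-(s + t) / 2.0) * min(math.exp(s), math.exp(t))

def ou_cov_closed_form(s, t):
    """Stationary Ornstein-Uhlenbeck kernel exp(-|t - s| / 2)."""
    return math.exp(-abs(t - s) / 2.0)

grid = [-3.0, -1.0, 0.0, 0.5, 2.0]
worst = max(
    abs(ou_cov_from_time_change(s, t) - ou_cov_closed_form(s, t))
    for s in grid
    for t in grid
)
print(worst)
```

The printed value should be zero up to rounding, and the covariance depends only on $t-s$, which is the stationarity one expects of a reversible Ornstein–Uhlenbeck process.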
Exercise 8.6.10. A particularly interesting case of the construction in Exercise 8.6.9 is when $H=H^1(\mathbb R^N)$ and $E=\Theta(\mathbb R^N)$. Working in that setting, define $B:\mathbb R\times[0,\infty)\times\Theta^U\bigl(\Theta(E)\bigr)\longrightarrow\mathbb R^N$ by $B\bigl((s,t),\theta\bigr)=[\theta(s)](t)$, and show that, for each $s\in\mathbb R$, $\bigl(B(s,t),\mathcal F_{(s,t)},\mathcal U_R^{\Theta(\mathbb R^N)}\bigr)$ is an $\mathbb R^N$-valued Brownian motion when $\mathcal F_{(s,t)}=\sigma\bigl(\{B(s,\tau):\tau\in[0,t]\}\bigr)$. Next, for each $t\in[0,\infty)$, show that the $\mathcal U_R^{\Theta(E)}$-distribution of $\theta\rightsquigarrow B(\,\cdot\,,t)$ is that of $\sqrt t$ times a reversible, $\mathbb R^N$-valued Ornstein–Uhlenbeck process.

Exercise 8.6.11. Continuing in the same setting as in the preceding, set $\sigma^2=\mathbb E^{\mathcal W^{(E)}}\bigl[\|\theta\|^2_{\Theta(E)}\bigr]$, and combine the result in Exercise 8.2.16 with Brownian scaling invariance to show that
$$\mathcal W^{(E)}\Bigl(\sup_{\tau\in[0,t]}\|\theta(\tau)\|_E\ge R\Bigr)\le K\exp\Bigl(-\frac{R^2}{72\sigma^2t}\Bigr),$$
where $K$ is the constant in Fernique’s Theorem. Next, use this together with Theorem 8.4.4 and the reasoning in Exercise 4.3.16 to show that
$$\varlimsup_{t\to\infty}\frac{\|\theta(t)\|_E}{\sqrt{2t\log_{(2)}t}}=L=\varlimsup_{t\searrow0}\frac{\|\theta(t)\|_E}{\sqrt{2t\log_{(2)}\frac1t}}\quad\mathcal W^{(E)}\text{-almost surely,}$$
where $L=\sup\bigl\{\|h\|_E:h\in\overline{B_H(0,1)}\bigr\}$.

Exercise 8.6.12. It should be recognized that Theorem 8.4.4 is an immediate corollary of Theorem 8.6.7. To see this, check that $\{\theta(n):n\ge1\}$ has the same distribution under $\mathcal W^{(E)}$ as $\{S_n:n\ge1\}$ has under $\mathcal W^{\mathbb N}$ and that $\overline{B_H(0,1)}=\bigl\{h(1):h\in\overline{B_{H^1(H)}(0,1)}\bigr\}$, and use these to show that Theorem 8.4.4 follows from Theorem 8.6.7.

Exercise 8.6.13. For $\theta\in\Theta(E)$ and $n\in\mathbb Z^+$, define $\breve\theta_n\in\Theta(E)$ so that
$$\breve\theta_n(t)=\sqrt{\frac n{2\log_{(2)}(n\vee3)}}\ \theta\Bigl(\frac tn\Bigr),\quad t\in[0,\infty),$$
and show that, $\mathcal W^{(E)}$-almost surely, $\{\breve\theta_n:n\ge1\}$ is relatively compact in $\Theta(E)$ and that $\overline{B_{H^1(H)}(0,1)}$ is the set of its limit points.

Hint: Referring to (ii) in Exercise 8.6.8, show that it suffices to prove these properties for the sequence $\{(I\theta)\breve{}_n:n\ge1\}$. Next check that
$$\bigl\|(I\theta)\breve{}_n-Ih\bigr\|_{\Theta(E)}=\bigl\|\tilde\theta_n-h\bigr\|_{\Theta(E)}\quad\text{for }h\in H^1(H),$$
and use Theorem 8.6.7 and the fact that $I$ is an isometry of $H^1(H)$ onto itself.
Chapter 9 Convergence of Measures on a Polish Space
In Chapters 2 and 3, I introduced a notion of convergence on $M_1(\mathbb R^N)$ that is appropriate when discussing either Central Limit phenomena or the sort of limits that arose in connection with infinitely divisible laws. In this chapter, I will give a systematic treatment of this sort of convergence and show how it extends to probability measures on any Polish space, that is, any complete, separable, metric space. Unfortunately, this extension will entail an excursion into territory that borders on abstract nonsense, although I hope to avoid crossing that border. In any case, just as Banach’s great achievement was the ingenious use for infinite dimensional vector spaces of completeness to replace local compactness, so here we will have to learn how to substitute compactness by completeness in measure theoretic arguments.

§ 9.1 Prohorov–Varadarajan Theory

The goal in this section is to generalize results like Lemma 2.1.7 and Theorem 3.1.1 to a very abstract setting.

§ 9.1.1. Some Background. When discussing the convergence of probability measures on a measurable space $(E, \mathcal B)$, one always has at least two senses in which the convergence may take place, and (depending on additional structure that the space may possess) one may have more. To be more precise, let $B(E;\mathbb R) \equiv B\bigl((E,\mathcal B);\mathbb R\bigr)$ be the space of bounded, $\mathbb R$-valued, $\mathcal B$-measurable functions on $E$, use $M_1(E) \equiv M_1(E,\mathcal B)$ to denote the space of all probability measures on $(E,\mathcal B)$, and define the duality relation
\[
\langle\varphi,\mu\rangle = \int_E \varphi\,d\mu \quad\text{for } \varphi \in B(E;\mathbb R) \text{ and } \mu \in M_1(E).
\]
Next, again use $\|\varphi\|_u \equiv \sup_{x\in E}|\varphi(x)|$ to denote the uniform norm of $\varphi \in B(E;\mathbb R)$, and consider the neighborhood basis at $\mu \in M_1(E)$ determined by the sets
\[
U(\mu,r) = \Bigl\{\nu \in M_1(E) : \bigl|\langle\varphi,\nu\rangle - \langle\varphi,\mu\rangle\bigr| < r \text{ for } \varphi \in B(E;\mathbb R) \text{ with } \|\varphi\|_u \le 1\Bigr\}
\]
as $r$ runs over $(0,\infty)$. For obvious reasons, the topology defined by these neighborhoods $U$ is called the uniform topology on $M_1(E)$. In order to develop some feeling for the uniform topology, I will begin by examining a few of its elementary properties.
Lemma 9.1.1. Define the variation distance between elements $\mu$ and $\nu$ of $M_1(E)$ by
\[
\|\nu-\mu\|_{\mathrm{var}} = \sup\Bigl\{\bigl|\langle\varphi,\mu\rangle - \langle\varphi,\nu\rangle\bigr| : \varphi \in B(E;\mathbb R) \text{ with } \|\varphi\|_u \le 1\Bigr\}.
\]
Then $(\mu,\nu) \in M_1(E)^2 \longmapsto \|\mu-\nu\|_{\mathrm{var}}$ is a metric on $M_1(E)$ that is compatible with the uniform topology. Moreover, if $\mu, \nu \in M_1(E)$ are two elements of $M_1(E)$ and $\lambda$ is any element of $M_1(E)$ with respect to which both $\mu$ and $\nu$ are absolutely continuous (e.g., $\frac{\mu+\nu}{2}$), then
\[
\|\mu-\nu\|_{\mathrm{var}} = \|g-f\|_{L^1(\lambda;\mathbb R)}, \quad\text{where } f = \frac{d\mu}{d\lambda} \text{ and } g = \frac{d\nu}{d\lambda}. \tag{9.1.2}
\]
In particular, $\|\mu-\nu\|_{\mathrm{var}} \le 2$, and equality holds precisely when $\nu \perp \mu$ (i.e., they are singular). Finally, the metric $(\mu,\nu) \in M_1(E)^2 \longmapsto \|\mu-\nu\|_{\mathrm{var}}$ is complete.

Proof: The first assertion needing comment is the one in (9.1.2). But, for every $\varphi \in B(E;\mathbb R)$ with $\|\varphi\|_u \le 1$,
\[
\bigl|\langle\varphi,\nu\rangle - \langle\varphi,\mu\rangle\bigr| = \Bigl|\int_E \varphi\,(g-f)\,d\lambda\Bigr| \le \|g-f\|_{L^1(\lambda;\mathbb R)},
\]
and equality holds when $\varphi = \operatorname{sgn}\circ(g-f)$. To prove the assertion that follows (9.1.2), note that
\[
\|g-f\|_{L^1(\lambda;\mathbb R)} \le \|f\|_{L^1(\lambda;\mathbb R)} + \|g\|_{L^1(\lambda;\mathbb R)} = 2
\]
and that the inequality is strict if and only if $fg > 0$ on a set of strictly positive $\lambda$-measure or, equivalently, if and only if $\mu \not\perp \nu$. Thus, all that remains is to check the completeness assertion. To this end, let $\{\mu_n : n \ge 1\} \subseteq M_1(E)$ satisfying
\[
\lim_{m\to\infty}\sup_{n\ge m}\|\mu_n-\mu_m\|_{\mathrm{var}} = 0
\]
be given, and set $\lambda = \sum_{n=1}^\infty 2^{-n}\mu_n$. Clearly, $\lambda$ is an element of $M_1(E)$ with respect to which each $\mu_n$ is absolutely continuous. Moreover, if $f_n = \frac{d\mu_n}{d\lambda}$, then, by (9.1.2), $\{f_n : n \ge 1\}$ is a Cauchy convergent sequence in $L^1(\lambda;\mathbb R)$. Hence, since $L^1(\lambda;\mathbb R)$ is complete, there is an $f \in L^1(\lambda;\mathbb R)$ to which the $f_n$’s converge in $L^1(\lambda;\mathbb R)$. Obviously, we may choose $f$ to be non-negative, and certainly it has $\lambda$-integral 1. Thus, the measure $\mu$ given by $d\mu = f\,d\lambda$ is an element of $M_1(E)$, and, by (9.1.2), $\|\mu_n-\mu\|_{\mathrm{var}} \longrightarrow 0$.

As a consequence of Lemma 9.1.1, we see that the uniform topology on $M_1(E)$ admits a complete metric and that convergence in this topology is intimately related to $L^1$-convergence in the $L^1$-space of an appropriate element of $M_1(E)$.
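For readers who like to see (9.1.2) in the simplest possible case, here is a minimal Python sketch for probability measures on a finite set, where every measure is a vector of point weights and the $L^1(\lambda)$-norm of $g-f$ collapses to a sum of absolute differences. The function names and the three sample vectors are illustrative assumptions, not notation from the text; the second function checks the identity $\|\nu-\mu\|_{\mathrm{var}} = 2\max_A\bigl(\nu(A)-\mu(A)\bigr)$ from part (i) of Exercise 9.1.15 by brute force.

```python
from itertools import chain, combinations


def variation_distance(mu, nu):
    """Return ||mu - nu||_var for two probability vectors on a finite set.

    With lambda = (mu + nu) / 2 as dominating measure, the L^1(lambda)-norm
    of g - f in (9.1.2) reduces to the sum of |nu_i - mu_i| over the points.
    """
    return sum(abs(n - m) for m, n in zip(mu, nu))


def twice_max_set_gap(mu, nu):
    """Brute-force 2 * max over subsets A of (nu(A) - mu(A)); small sets only."""
    points = range(len(mu))
    subsets = chain.from_iterable(
        combinations(points, r) for r in range(len(mu) + 1)
    )
    return 2 * max(sum(nu[i] - mu[i] for i in A) for A in subsets)


# Hypothetical sample measures on a four-point set.
mu = [0.5, 0.5, 0.0, 0.0]
nu = [0.0, 0.0, 0.25, 0.75]      # disjoint support from mu, hence singular
rho = [0.25, 0.25, 0.25, 0.25]

print(variation_distance(mu, nu))    # singular pair: the distance is 2
print(variation_distance(mu, rho))
print(twice_max_set_gap(mu, rho))    # agrees with the previous line
```

The singular pair attains the extreme value 2, in line with the last assertion before the completeness statement in Lemma 9.1.1.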
In fact, $M_1(E)$ looks in the uniform topology like a galaxy that is broken into many constellations, each constellation consisting of measures that are all absolutely continuous with respect to some fixed measure. In particular, there will usually be too many constellations for $M_1(E)$ in the uniform topology to be separable. To wit, if $E$ is uncountable and $\{x\} \in \mathcal B$ for every $x \in E$, then the point masses $\delta_x$, $x \in E$, (i.e., $\delta_x(\Gamma) = \mathbf 1_\Gamma(x)$) form an uncountable subset of $M_1(E)$ and $\|\delta_y-\delta_x\|_{\mathrm{var}} = 2$ for $y \ne x$. Hence, in this case, $M_1(E)$ cannot be covered by a countable collection of open $\|\cdot\|_{\mathrm{var}}$-balls of radius 1.

As I said at the beginning of this section, the uniform topology is not the only one available. Indeed, for many purposes and, in particular, for probability theory, it is too rigid a topology to be useful. For this reason, it is often convenient to consider a more lenient topology on $M_1(E)$. The first one that comes to mind is the one that results from eliminating the uniformity in the uniform topology. That is, given a $\mu \in M_1(E)$, define
\[
S(\mu,\delta;\varphi_1,\dots,\varphi_n) \equiv \Bigl\{\nu \in M_1(E) : \max_{1\le k\le n}\bigl|\langle\varphi_k,\nu\rangle - \langle\varphi_k,\mu\rangle\bigr| < \delta\Bigr\} \tag{9.1.3}
\]
for $n \in \mathbb Z^+$, $\varphi_1,\dots,\varphi_n \in B(E;\mathbb R)$, and $\delta > 0$. Clearly these sets $S$ determine a Hausdorff topology on $M_1(E)$ in which the net $\{\mu_\alpha : \alpha \in A\}$ converges to $\mu$ if and only if $\lim_\alpha\langle\varphi,\mu_\alpha\rangle = \langle\varphi,\mu\rangle$ for every $\varphi \in B(E;\mathbb R)$. For historical reasons, in spite of the fact that it is obviously weaker than the uniform topology, this topology on $M_1(E)$ is sometimes called the strong topology, although, in some of the statistics literature, it is also known as the $\tau$-topology.

A good understanding of the relationship between the strong and uniform topologies is most easily gained through functional analytic considerations that will not be particularly important for what follows. Nonetheless, it will be useful to recognize that, except in very special circumstances, the strong topology is strictly weaker than the uniform topology. For example, take $E = [0,1]$ with its Borel field, and consider the probability measures $\mu_n(dt) = \bigl(1+\sin(2n\pi t)\bigr)\,dt$ for $n \in \mathbb Z^+$. Noting that $|\sin(2n\pi t) - \sin(2m\pi t)| \le 2$ and therefore that
\[
\tfrac12\|\mu_n-\mu_m\|_{\mathrm{var}} = \int_0^1 \frac{|\sin(2n\pi t)-\sin(2m\pi t)|}{2}\,dt \ge \frac14\int_0^1\bigl(\sin(2n\pi t)-\sin(2m\pi t)\bigr)^2\,dt = \frac14
\]
for $m \ne n$, one sees that $\{\mu_n : n \ge 1\}$ not only fails to converge in the uniform topology, it does not even have any limit points as $n \to \infty$. On the other hand, because $\bigl\{2^{\frac12}\sin(2n\pi t) : n \ge 1\bigr\}$ is orthonormal in $L^2\bigl(\lambda_{[0,1]};\mathbb R\bigr)$, Bessel’s Inequality says that
\[
2\sum_{n=1}^\infty\Bigl(\int_{[0,1]}\varphi(t)\sin(2n\pi t)\,dt\Bigr)^2 \le \|\varphi\|_{L^2(\lambda_{[0,1]})}^2 \le \|\varphi\|_u^2 < \infty
\]
and therefore $\langle\varphi,\mu_n\rangle \longrightarrow \langle\varphi,\lambda_{[0,1]}\rangle$ for every $\varphi \in B\bigl([0,1];\mathbb R\bigr)$. In other words, $\{\mu_n : n \ge 1\}$ converges to $\lambda_{[0,1]}$ in the strong topology, but it converges to nothing at all in the uniform topology.

§ 9.1.2. The Weak Topology. Although the strong topology is weaker than the uniform and can be effectively used in various applications, it is still not weak enough for most probabilistic applications. Indeed, even when $E$ possesses a good topological structure and $\mathcal B = \mathcal B_E$ is the Borel field over $E$, the strong topology on $M_1(E)$ shows no respect for the topology on $E$. For example, suppose that $E$ is a metric space and, for each $x \in E$, consider the point mass $\delta_x$ on $\mathcal B_E$. Then, no matter how close $x \in E \setminus \{y\}$ gets to $y$ in the sense of the topology on $E$, $\delta_x$ is not getting close to $\delta_y$ in the strong topology on $M_1(E)$. More generally (cf. Exercise 9.1.15), measures cannot be close in the strong topology unless their sets of small measure are essentially the same. Thus, for example, the convergence that is occurring in The Central Limit Theorem (cf. Theorem 2.1.8) cannot, in general, be taking place in the strong topology; and since The Central Limit Theorem is an archetypal example of the sort of convergence result at which probabilists look, it is only sensible for us to take a hint from the result that we got there. Thus, let $E$ be a metric space, set $\mathcal B = \mathcal B_E$, and consider the neighborhood basis at $\mu \in M_1(E)$ given by the sets $S(\mu,\delta;\varphi_1,\dots,\varphi_n)$ in (9.1.3) when the $\varphi_k$’s are restricted to be elements of $C_b(E;\mathbb R)$. The topology that results is much weaker than the strong topology, and is therefore justifiably called the weak topology on $M_1(E)$. (The reader who is familiar with the language of functional analysis will, with considerable justice, complain about this terminology.
Indeed, if one thinks of $C_b(E;\mathbb R)$ as a Banach space and of $M_1(E)$ as a subspace of its dual space $C_b(E;\mathbb R)^*$, then the topology that I am calling the weak topology is what a functional analyst would call the weak$^*$ topology. However, because it is the most commonly accepted choice of probabilists, I will continue to use the term weak instead of the more correct term weak$^*$.) In particular, the weak topology respects the topology on $E$: $\delta_y$ tends to $\delta_x$ in the weak topology on $M_1(E)$ if and only if $y \longrightarrow x$ in $E$. Lemma 2.3.3 provides further evidence that the weak topology is well adapted to the sort of analysis encountered in probability theory, since, by that lemma, weak convergence of $\{\mu_n : n \ge 1\} \subseteq M_1(\mathbb R^N)$ to $\mu$ is equivalent to pointwise convergence of $\widehat{\mu_n}(\xi)$ to $\hat\mu(\xi)$.

Besides being well adapted to probabilistic analysis, the weak topology turns out to have many intrinsic virtues that are not shared by either the uniform or strong topologies. In particular, as we will see shortly, when $E$ is a separable metric space, the weak topology on $M_1(E)$ is not only a metric topology, which (cf. Exercise 9.1.15) the strong topology seldom is, but it is even separable, which, as we have seen, the uniform topology seldom is. In order to check these properties, we will first have to review some elementary facts about separable metric spaces.

Given a metric $\rho$ for a topological space $E$, I will use $U_b^\rho(E;\mathbb R)$ to denote
the space of bounded, $\rho$-uniformly continuous $\mathbb R$-valued functions on $E$ and will endow $U_b^\rho(E;\mathbb R)$ with the topology determined by the uniform norm. Thus, $U_b^\rho(E;\mathbb R)$ becomes in this way a closed subspace of $C_b(E;\mathbb R)$.

Lemma 9.1.4. Let $E$ be a separable metric space. Then $E$ is homeomorphic to a subset of $[0,1]^{\mathbb Z^+}$. In particular:
(i) If $E$ is compact, then the space $C(E;\mathbb R)$ is separable with respect to the uniform metric.
(ii) Even when $E$ is not compact, it nonetheless admits a metric $\hat\rho$ with respect to which it becomes a totally bounded metric space.
(iii) If $\hat\rho$ is a totally bounded metric on $E$, then $U_b^{\hat\rho}(E;\mathbb R)$ is separable.

Proof: Let $\rho$ be any metric on $E$, and choose $\{p_n : n \ge 1\}$ to be a countable, dense subset of $E$. Next, define $h : E \longrightarrow [0,1]^{\mathbb Z^+}$ to be the mapping whose $n$th coordinate is given by
\[
h_n(x) = \frac{\rho(x,p_n)}{1+\rho(x,p_n)}, \qquad x \in E.
\]
It is then an easy matter to check that $h$ is homeomorphic onto a subset of $[0,1]^{\mathbb Z^+}$.

To prove (i), I will first check it for compact subsets $K$ of $E = [0,1]^{\mathbb Z^+}$. To this end, denote by $\mathcal P$ the space of polynomials $p : [0,1]^{\mathbb Z^+} \longrightarrow \mathbb R$. That is, $\mathcal P$ consists of finite, $\mathbb R$-linear combinations of the monomials $\xi \in [0,1]^{\mathbb Z^+} \longmapsto \xi_{k_1}^{n_1}\cdots\xi_{k_\ell}^{n_\ell}$, where $\ell \ge 1$, $1 \le k_1 < \cdots < k_\ell$, and $\{n_1,\dots,n_\ell\} \subseteq \mathbb N$. Clearly, if $\mathcal P_0$ is the subset of $\mathcal P$ consisting of those $p$’s with rational coefficients, then $\mathcal P_0$ is countable, and $\mathcal P_0$ is dense in $\mathcal P$. Thus, it suffices to show that $\{p \restriction K : p \in \mathcal P\}$ is dense in $C(K;\mathbb R)$. But $\mathcal P$ is obviously an algebra. In addition, if $\xi$ and $\eta$ are distinct points in $[0,1]^{\mathbb Z^+}$, it is an easy (in fact, a one dimensional) matter to see that there is a $p \in \mathcal P$ for which $p(\xi) \ne p(\eta)$. Hence, the desired density follows from the Stone–Weierstrass Approximation Theorem. Finally, for an arbitrary compact metric space $E$, define $h : E \longrightarrow [0,1]^{\mathbb Z^+}$ as above, note that $K \equiv h(E)$ is compact, and conclude that the map $\varphi \in C(K;\mathbb R) \longmapsto \varphi\circ h \in C(E;\mathbb R)$ is a homeomorphism between the uniform topologies on these spaces. Since we already know that $C(K;\mathbb R)$ is separable, this completes (i).

The proof of (ii) is easy. Namely, define
\[
D(\xi,\eta) = \sum_{n=1}^\infty \frac{|\xi_n-\eta_n|}{2^n} \quad\text{for } \xi, \eta \in [0,1]^{\mathbb Z^+}.
\]
Clearly, $D$ is a metric for $[0,1]^{\mathbb Z^+}$, and therefore
\[
(x,y) \in E^2 \longmapsto \hat\rho(x,y) \equiv D\bigl(h(x),h(y)\bigr)
\]
is a metric for $E$. At the same time, since $[0,1]^{\mathbb Z^+}$ is compact, and therefore the restriction of $D$ to any subset is totally bounded, it is clear that $\hat\rho$ is totally bounded on $E$.

To prove (iii), let $\hat E$ denote the completion of $E$ with respect to the totally bounded metric $\hat\rho$. Then, because $E$ is dense in $\hat E$, $\hat E$ is both complete and totally bounded and therefore compact. In addition, $\hat\varphi \in C\bigl(\hat E;\mathbb R\bigr) \longmapsto \hat\varphi \restriction E \in U_b^{\hat\rho}(E;\mathbb R)$ is a surjective homeomorphism; and so (iii) now follows from (i).

One of the main reasons why Lemma 9.1.4 will be important to us is that it will enable us to show that, for separable metric spaces $E$, the weak topology on $M_1(E)$ is also a separable metric topology. However, thus far we do not even know that the neighborhood bases are countably generated, and so, for a moment longer, I must continue to consider nets when discussing convergence. In order to indicate that a net $\{\mu_\alpha : \alpha \in A\} \subseteq M_1(E)$ is converging weakly (i.e., in the weak topology) to $\mu$, I will write $\mu_\alpha \Longrightarrow \mu$.

Theorem 9.1.5. Let $E$ be any metric space and $\{\mu_\alpha : \alpha \in A\}$ a net in $M_1(E)$. Given any $\mu \in M_1(E)$, the following statements are equivalent:
(i) $\mu_\alpha \Longrightarrow \mu$.
(ii) If $\rho$ is any metric for $E$, then $\langle\varphi,\mu_\alpha\rangle \longrightarrow \langle\varphi,\mu\rangle$ for every $\varphi \in U_b^\rho(E;\mathbb R)$.
(iii) For every closed set $F \subseteq E$, $\varlimsup_\alpha \mu_\alpha(F) \le \mu(F)$.
(iv) For every open set $G \subseteq E$, $\varliminf_\alpha \mu_\alpha(G) \ge \mu(G)$.
(v) For every upper semicontinuous function $f : E \longrightarrow \mathbb R$ that is bounded above, $\varlimsup_\alpha\langle f,\mu_\alpha\rangle \le \langle f,\mu\rangle$.
(vi) For every lower semicontinuous function $f : E \longrightarrow \mathbb R$ that is bounded below, $\varliminf_\alpha\langle f,\mu_\alpha\rangle \ge \langle f,\mu\rangle$.
(vii) For every $f \in B(E;\mathbb R)$ that is continuous at $\mu$-almost every $x \in E$, $\langle f,\mu_\alpha\rangle \longrightarrow \langle f,\mu\rangle$.
Finally, assume that $E$ is separable, and let $\hat\rho$ be a totally bounded metric for $E$. Then there exists a countable subset $\{\varphi_n : n \ge 1\} \subseteq U_b^{\hat\rho}\bigl(E;[0,1]\bigr)$ that is dense in $U_b^{\hat\rho}(E;\mathbb R)$, and therefore the mapping $H : M_1(E) \longrightarrow [0,1]^{\mathbb Z^+}$ given by $H(\mu) = \bigl(\langle\varphi_1,\mu\rangle,\dots,\langle\varphi_n,\mu\rangle,\dots\bigr)$ is a homeomorphism from the weak topology on $M_1(E)$ into $[0,1]^{\mathbb Z^+}$. In particular, when $E$ is separable, $M_1(E)$ with the weak topology is itself a separable metric space and, in fact, one can take
\[
(\mu,\nu) \in M_1(E)^2 \longmapsto R(\mu,\nu) \equiv \sum_{n=1}^\infty \frac{\bigl|\langle\varphi_n,\mu\rangle - \langle\varphi_n,\nu\rangle\bigr|}{2^n}
\]
to be a metric for $M_1(E)$.
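As a small numerical companion to conditions (i), (iii), and (iv) of Theorem 9.1.5, the following Python sketch looks at point masses $\delta_{\alpha_n}$ on $\mathbb R$ with $\alpha_n = 1 - \frac1n \nearrow 1$. The helper names, the choice of $\arctan$ as a bounded continuous test function, and the particular sets $G = (0,1)$ and $F = [0,1]$ are all illustrative assumptions; the point is only that pairings with a continuous function converge while the inequality in (iv) can be strict.

```python
import math


def pair_with_point_mass(phi, a):
    """<phi, delta_a> is simply phi(a)."""
    return phi(a)


def point_mass_of_interval(a, lo, hi, closed):
    """delta_a of [lo, hi] when closed is True, of (lo, hi) otherwise."""
    inside = (lo <= a <= hi) if closed else (lo < a < hi)
    return 1.0 if inside else 0.0


phi = math.atan                                   # bounded, continuous on R
alphas = [1 - 1 / n for n in range(2, 2000)]      # alpha_n increases to 1

# (i): <phi, delta_{alpha_n}> approaches <phi, delta_1>.
gap = abs(pair_with_point_mass(phi, alphas[-1]) - pair_with_point_mass(phi, 1.0))

# (iv) with G = (0, 1): every delta_{alpha_n}(G) equals 1, yet delta_1(G) = 0.
open_masses = [point_mass_of_interval(a, 0.0, 1.0, closed=False) for a in alphas]
limit_open = point_mass_of_interval(1.0, 0.0, 1.0, closed=False)

# (iii) with F = [0, 1]: the limsup of delta_{alpha_n}(F) is at most delta_1(F).
closed_masses = [point_mass_of_interval(a, 0.0, 1.0, closed=True) for a in alphas]
limit_closed = point_mass_of_interval(1.0, 0.0, 1.0, closed=True)

print(gap, min(open_masses), limit_open, max(closed_masses), limit_closed)
```

The indicator of $(0,1)$ is discontinuous at the point 1, which carries all of the $\delta_1$-mass, so (vii) does not apply to it; this is why its pairings are free to jump in the limit.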
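Proof: The implications (iii) $\iff$ (iv), (vii) $\implies$ (i) $\implies$ (ii), and (v) $\iff$ (vi) are all trivial. Thus, the first part will be complete once I check that (ii) $\implies$ (iii), (iv) $\implies$ (vi), and that (v) together with (vi) imply (vii). To see the first of these, let $F$ be a closed subset of $E$, and set
\[
\psi_n(x) = 1 - \Bigl(\frac{\rho(x,F)}{1+\rho(x,F)}\Bigr)^{\frac1n} \quad\text{for } n \in \mathbb Z^+ \text{ and } x \in E.
\]
It is then clear that $\psi_n \in U_b^\rho(E;\mathbb R)$ for each $n \in \mathbb Z^+$ and that $1 \ge \psi_n(x) \searrow \mathbf 1_F(x)$ as $n \to \infty$ for each $x \in E$. Thus, The Monotone Convergence Theorem followed by (ii) imply that
\[
\mu(F) = \lim_{n\to\infty}\langle\psi_n,\mu\rangle = \lim_{n\to\infty}\lim_\alpha\langle\psi_n,\mu_\alpha\rangle \ge \varlimsup_\alpha \mu_\alpha(F).
\]

In proving that (iv) $\implies$ (vi), I may and will assume that $f$ is a non-negative, lower semicontinuous function. For $n \in \mathbb N$, define
\[
f_n = \sum_{\ell=0}^\infty \frac{\ell\wedge 4^n}{2^n}\,\mathbf 1_{I_{\ell,n}}\circ f = \frac{1}{2^n}\sum_{\ell=1}^{4^n}\mathbf 1_{J_{\ell,n}}\circ f,
\quad\text{where } I_{\ell,n} = \Bigl(\frac{\ell}{2^n},\frac{\ell+1}{2^n}\Bigr] \text{ and } J_{\ell,n} = \Bigl(\frac{\ell}{2^n},\infty\Bigr).
\]
(The second sum starts at $\ell = 1$ because a point at which $f$ lies in $I_{\ell,n}$ belongs to $\{f \in J_{j,n}\}$ exactly when $j \le \ell$.) It is then clear that $0 \le f_n \nearrow f$ and therefore that $\langle f_n,\mu\rangle \longrightarrow \langle f,\mu\rangle$ as $n \to \infty$. At the same time, by lower semicontinuity, the sets $\{f \in J_{\ell,n}\}$ are open, and so (iv) implies
\[
\langle f_n,\mu\rangle \le \varliminf_\alpha\langle f_n,\mu_\alpha\rangle \le \varliminf_\alpha\langle f,\mu_\alpha\rangle
\]
for each $n \in \mathbb Z^+$. After letting $n \to \infty$, one sees that (iv) $\implies$ (vi).

Turning to the proof that (v) & (vi) $\implies$ (vii), suppose that $f \in B(E;\mathbb R)$ is continuous at $\mu$-almost every $x \in E$, and define
\[
\underline f(x) = \varliminf_{y\to x} f(y) \quad\text{and}\quad \overline f(x) = \varlimsup_{y\to x} f(y) \quad\text{for } x \in E.
\]
It is then an easy matter to check that $\underline f \le f \le \overline f$ everywhere and that equality holds $\mu$-almost surely. Furthermore, $\underline f$ is lower semicontinuous, $\overline f$ is upper semicontinuous, and both are bounded. Hence, by (v) and (vi),
\[
\varlimsup_\alpha\langle f,\mu_\alpha\rangle \le \varlimsup_\alpha\langle\overline f,\mu_\alpha\rangle \le \langle\overline f,\mu\rangle = \langle\underline f,\mu\rangle \le \varliminf_\alpha\langle\underline f,\mu_\alpha\rangle \le \varliminf_\alpha\langle f,\mu_\alpha\rangle;
\]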
and so I have now completed the proof that conditions (i) through (vii) are equivalent.

Now assume that $E$ is separable, and let $\hat\rho$ be a totally bounded metric for $E$. By (iii) of Lemma 9.1.4, $U_b^{\hat\rho}(E;\mathbb R)$ is separable. Hence, we can find a countable set $\{\varphi_n : n \ge 1\}$ that is dense in $U_b^{\hat\rho}(E;\mathbb R)$. In particular, by the equivalence of (i) and (ii) above, we see that $\langle\varphi_n,\mu_\alpha\rangle \longrightarrow \langle\varphi_n,\mu\rangle$ for all $n \in \mathbb Z^+$ if and only if $\mu_\alpha \Longrightarrow \mu$, which is to say that the corresponding map $H : M_1(E) \longrightarrow [0,1]^{\mathbb Z^+}$ is a homeomorphism. Since $[0,1]^{\mathbb Z^+}$ is a compact metric space and $D$ (cf. the proof of (ii) in Lemma 9.1.4) is a metric for it, we also see that the $R$ described is a totally bounded metric for $M_1(E)$. In particular, $M_1(E)$ is separable. Finally, since, by (ii) in Lemma 9.1.4, it is always possible to find a totally bounded metric for $E$, the last assertion needs no further comment.

The reader would do well to pay close attention to what (iii) and (iv) say about the nature of weak convergence. Namely, even though $\mu_\alpha \Longrightarrow \mu$, it is possible that some or all of the mass that the $\mu_\alpha$’s assign to the interior of a set may gravitate to the boundary in the limit. This phenomenon is most easily understood by taking $E = \mathbb R$, $\mu_\alpha$ to be the unit point mass $\delta_\alpha$ at $\alpha \in [0,1)$, checking that $\delta_\alpha \Longrightarrow \delta_1$, and noting that $\delta_1\bigl((0,1)\bigr) = 0 < 1 = \delta_\alpha\bigl((0,1)\bigr)$ for each $\alpha \in (0,1)$.

Remark 9.1.6. Those who find nets distasteful will be pleased to learn that, from now on, I will be restricting my attention to separable metric spaces $E$ and therefore need only discuss sequential convergence when working with the weak topology on $M_1(E)$. Furthermore, unless the contrary is explicitly stated, I will always be thinking of the weak topology when working with $M_1(E)$.

Given a separable metric space $E$, I next want to find conditions that guarantee that a subset of $M_1(E)$ is compact; and at this point it will be convenient to have introduced the notation $K \subset\subset E$ to indicate that $K$ is a compact subset of $E$.
The key to my analysis is the following extension of the sort of Riesz Representation result in Theorem 3.1.1 combined with a crucial observation made by S. Ulam.¹

¹ It is no accident that Ulam was the first to make this observation. Indeed, the term Polish space was coined by Bourbaki in recognition of the contribution made to this subject by the Polish school in general and C. Kuratowski in particular (cf. Kuratowski’s Topologie, Vol. I, Warszawa–Lwow (1933)). Ulam had studied with Kuratowski.

Lemma 9.1.7. Let $E$ be a separable metric space, $\rho$ a metric for $E$, and $\Lambda$ a non-negative linear functional on $U_b^\rho(E;\mathbb R)$ (i.e., $\Lambda$ is a linear map that assigns a non-negative value to a non-negative $\varphi \in U_b^\rho(E;\mathbb R)$) with $\Lambda(\mathbf 1) = 1$. Then, in order for there to be a (necessarily unique) $\mu \in M_1(E)$ satisfying $\Lambda(\varphi) = \langle\varphi,\mu\rangle$ for all $\varphi \in U_b^\rho(E;\mathbb R)$, it is sufficient that, for every $\epsilon > 0$, there exist a $K \subset\subset E$ such that
\[
\bigl|\Lambda(\varphi)\bigr| \le \sup_{x\in K}|\varphi(x)| + \epsilon\|\varphi\|_u, \qquad \varphi \in U_b^\rho(E;\mathbb R). \tag{9.1.8}
\]
Conversely, if $E$ is a Polish space and $\mu \in M_1(E)$, then for every $\epsilon > 0$ there is a $K \subset\subset E$ such that $\mu(K) \ge 1-\epsilon$. In particular, if $\mu \in M_1(E)$ and $\Lambda(\varphi) = \langle\varphi,\mu\rangle$ for $\varphi \in C_b(E;\mathbb R)$, then, for each $\epsilon > 0$, (9.1.8) holds for some $K \subset\subset E$.

Proof: I begin with the trivial observation that, because $\Lambda$ is non-negative and $\Lambda(\mathbf 1) = 1$, $|\Lambda(\varphi)| \le \|\varphi\|_u$. Next, according to the Daniell theory of integration, the first statement will be proved as soon as we know that $\Lambda(\varphi_n) \searrow 0$ whenever $\{\varphi_n : n \ge 1\}$ is a non-increasing sequence of functions from $U_b^\rho\bigl(E;[0,\infty)\bigr)$ that tend pointwise to 0 as $n \to \infty$. To this end, let $\epsilon > 0$ be given, and choose $K \subset\subset E$ so that (9.1.8) holds. One then has that
\[
\varlimsup_{n\to\infty}\bigl|\Lambda(\varphi_n)\bigr| \le \lim_{n\to\infty}\sup_{x\in K}|\varphi_n(x)| + \epsilon\|\varphi_1\|_u = \epsilon\|\varphi_1\|_u,
\]
since, by Dini’s Lemma, $\varphi_n \searrow 0$ uniformly on compact subsets of $E$.

Turning to the second part, assume that $E$ is Polish, and use $B(x,r)$ to denote the open ball of radius $r > 0$ around $x \in E$, computed with respect to a complete metric $\rho$ for $E$. Next, let $\{p_k : k \ge 1\}$ be a countable dense subset of $E$, and set $B_{k,n} = B\bigl(p_k,\frac1n\bigr)$ for $k, n \in \mathbb Z^+$. Given $\mu \in M_1(E)$ and $\epsilon > 0$, we can choose, for each $n \in \mathbb Z^+$, an $\ell_n \in \mathbb Z^+$ so that
\[
\mu\Bigl(\bigcup_{k=1}^{\ell_n} B_{k,n}\Bigr) \ge 1 - \frac{\epsilon}{2^n}.
\]
Hence, if
\[
C_n \equiv \bigcup_{k=1}^{\ell_n}\overline{B_{k,n}} \quad\text{and}\quad K = \bigcap_{n=1}^\infty C_n,
\]
then $\mu(K) \ge 1-\epsilon$. At the same time, it is obvious that, on the one hand, $K$ is closed (and therefore $\rho$-complete) and that, on the other hand, $K \subseteq \bigcup_{k=1}^{\ell_n} B\bigl(p_k,\frac2n\bigr)$ for every $n \in \mathbb Z^+$. Hence, $K$ is both complete and totally bounded with respect to $\rho$ and, as such, is compact.

As Lemma 9.1.7 makes clear, probability measures on a Polish space like to be nearly concentrated on a compact set. Following Prohorov and Varadarajan,² what we are about to see is that, for a Polish space $E$, relatively compact subsets of $M_1(E)$ are those whose elements are nearly concentrated on the same compact set of $E$. More precisely, given a separable metric space $E$, say that $M \subseteq M_1(E)$ is tight if, for every $\epsilon > 0$, there exists a $K \subset\subset E$ such that $\mu(K) \ge 1-\epsilon$ for all $\mu \in M$.

² See Yu. V. Prohorov’s article “Convergence of random processes and limit theorems in probability theory,” Theory of Prob. & Appl., which appeared in 1956. Independently, V.S. Varadarajan developed essentially the same theory in “Weak convergence of measures on separable metric spaces,” Sankhyā, which was published in 1958. Although Prohorov got into print first, subsequent expositions, including this one, rely heavily on Varadarajan.

Theorem 9.1.9. Let $E$ be a separable metric space and $M \subseteq M_1(E)$. Then $\overline M$ is compact if $M$ is tight. Conversely, when $E$ is Polish, $M$ is tight if $\overline M$ is compact.³
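To get a feel for the definition of tightness on $E = \mathbb R$, here is a hedged Python sketch with Gaussian families, which are my own choice of illustration and do not appear in the text. For the family $\{N(m,1) : |m| \le 1\}$ a single compact interval $[-R,R]$ captures mass at least $1-\epsilon$ uniformly, whereas for means drifting off to infinity the mass of that same interval collapses. Note that any finite family is tight, so the finite list of drifting means below only stands in for the infinite family $\{N(n,1) : n \ge 0\}$.

```python
import math


def normal_mass(m, R):
    """Mass that N(m, 1) assigns to the compact interval [-R, R]."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - m) / math.sqrt(2.0)))
    return cdf(R) - cdf(-R)


def worst_mass(means, R):
    """Infimum over the family {N(m, 1) : m in means} of the mass of [-R, R]."""
    return min(normal_mass(m, R) for m in means)


bounded_means = [k / 10 for k in range(-10, 11)]   # means confined to [-1, 1]
drifting_means = list(range(0, 200, 10))           # proxy for means tending to infinity

eps = 0.01
R = 1.0
# Enlarge K = [-R, R] until every member of the bounded family gives it mass >= 1 - eps.
while worst_mass(bounded_means, R) < 1 - eps:
    R += 0.5

print(R, worst_mass(bounded_means, R))
print(worst_mass(drifting_means, R))   # the same K fails badly for the drifting means
```

In the bounded case the extreme means $m = \pm 1$ decide how large $R$ must be, which is the one-dimensional shadow of choosing a common compact set $K$ for all of $M$.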
Proof: Since it is clear, from (iii) in Theorem 9.1.5, that $M$ is tight if and only if $\overline M$ is, I will assume throughout that $M$ is closed in $M_1(E)$.

To prove the first statement, take $\hat\rho$ to be a totally bounded metric on $E$, choose $\{\varphi_n : n \ge 1\} \subseteq U_b^{\hat\rho}\bigl(E;[0,1]\bigr)$ accordingly, as in the last part of Theorem 9.1.5, and let $\varphi_0 = \mathbf 1$. Given a sequence $\{\mu_\ell : \ell \ge 1\} \subseteq M$, we can use a standard diagonalization procedure to extract a subsequence $\{\mu_{\ell_k} : k \ge 1\}$ such that
\[
\Lambda(\varphi_n) \equiv \lim_{k\to\infty}\langle\varphi_n,\mu_{\ell_k}\rangle
\]
exists for each $n \in \mathbb N$. Since $\Lambda(\varphi) \equiv \lim_{k\to\infty}\langle\varphi,\mu_{\ell_k}\rangle$ continues to exist for every $\varphi$ in the uniform closure of the span of $\{\varphi_n : n \ge 1\}$, we now see that $\Lambda$ determines a non-negative linear functional on $U_b^{\hat\rho}(E;\mathbb R)$ and that $\Lambda(\mathbf 1) = 1$. Moreover, because $M$ is tight, we can find, for any $\epsilon > 0$, a $K \subset\subset E$ such that $\mu(K) \ge 1-\epsilon$ for every $\mu \in M$, and therefore (9.1.8) holds with this choice of $K$. Hence, by Lemma 9.1.7, we know that there is a $\mu \in M_1(E)$ for which $\Lambda(\varphi) = \langle\varphi,\mu\rangle$, $\varphi \in U_b^{\hat\rho}(E;\mathbb R)$. Because this means that $\langle\varphi,\mu_{\ell_k}\rangle \longrightarrow \langle\varphi,\mu\rangle$ for every $\varphi \in U_b^{\hat\rho}(E;\mathbb R)$, the equivalence of (i) and (ii) in Theorem 9.1.5 allows us to conclude that $\mu_{\ell_k} \Longrightarrow \mu$.

Finally, suppose that $E$ is Polish and that $M$ is compact in $M_1(E)$. To see that $M$ must be tight, repeat the argument used to prove the second part of Lemma 9.1.7. Thus, choose $B_{k,n}$ for $k, n \in \mathbb Z^+$ as in the proof there, and set
\[
f_{\ell,n}(\mu) = \mu\Bigl(\bigcup_{k=1}^{\ell} B_{k,n}\Bigr) \quad\text{for } \ell, n \in \mathbb Z^+.
\]
By (iv) in Theorem 9.1.5, $\mu \in M_1(E) \longmapsto f_{\ell,n}(\mu) \in [0,1]$ is lower semicontinuous. Moreover, for each $n \in \mathbb Z^+$, $f_{\ell,n} \nearrow 1$ as $\ell \nearrow \infty$. Thus, by Dini’s Lemma, we can choose, for each $n \in \mathbb Z^+$, one $\ell_n \in \mathbb Z^+$ so that $f_{\ell_n,n}(\mu) \ge 1 - \frac{\epsilon}{2^n}$ for all $\mu \in M$; and at this point the rest of the argument is precisely the same as the one given at the end of the proof of Lemma 9.1.7.

³ For the reader who wishes to investigate just how far these results can be pushed before they start to break down, a good place to start is Appendix III in P. Billingsley’s Convergence of Probability Measures, Wiley (1968). In particular, although it is reasonably clear that completeness is more or less essential for the necessity, the havoc that results from dropping separability may come as a surprise.

§ 9.1.3. The Lévy Metric and Completeness of $M_1(E)$. We have now seen that $M_1(E)$ inherits properties from $E$. To be more specific, if $E$ is a metric space, then $M_1(E)$ is separable or compact if $E$ itself is. What I want to show next is that completeness also gets transferred. That is, I will show that $M_1(E)$ is Polish if $E$ is. In order to do this, I will need a lemma that is of considerable importance in its own right.

Lemma 9.1.10. Let $E$ be a Polish space and $\Phi$ a bounded subset of $C_b(E;\mathbb R)$ that is equicontinuous at each $x \in E$. (That is, for each $x \in E$, $\lim_{y\to x}\sup_{\varphi\in\Phi}|\varphi(y)-\varphi(x)| = 0$.) If $\{\mu_n : n \ge 1\} \cup \{\mu\} \subseteq M_1(E)$ and $\mu_n \Longrightarrow \mu$, then
\[
\lim_{n\to\infty}\sup_{\varphi\in\Phi}\bigl|\langle\varphi,\mu_n\rangle - \langle\varphi,\mu\rangle\bigr| = 0.
\]
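Proof: Let $\epsilon > 0$ be given, and use the second part of Theorem 9.1.9 to choose $K \subset\subset E$ so that
\[
\Bigl(\sup_{\varphi\in\Phi}\|\varphi\|_u\Bigr)\Bigl(\sup_{n\in\mathbb Z^+}\mu_n\bigl(K^{\complement}\bigr)\Bigr) < \frac{\epsilon}{4}.
\]
By (iv) of Theorem 9.1.5, $\mu\bigl(K^{\complement}\bigr)$ satisfies the same estimate. Next, choose a metric $\rho$ for $E$ and a countable dense set $\{p_k : k \ge 1\}$ in $K$. Using equicontinuity together with compactness, find $\ell \in \mathbb Z^+$ and $\delta_1,\dots,\delta_\ell > 0$ so that $K \subseteq \bigl\{x : \rho(x,p_k) < \delta_k \text{ for some } 1 \le k \le \ell\bigr\}$ and
\[
\sup_{\varphi\in\Phi}\bigl|\varphi(x)-\varphi(p_k)\bigr| < \frac{\epsilon}{4} \quad\text{for } 1 \le k \le \ell \text{ and } x \in K \text{ with } \rho(x,p_k) < 2\delta_k.
\]
Because $r \in (0,\infty) \longmapsto \mu\bigl(\{y \in K : \rho(y,x) \le r\}\bigr) \in [0,1]$ is non-decreasing for each $x \in K$, we can find, for each $1 \le k \le \ell$, an $r_k \in (\delta_k, 2\delta_k)$ such that $\mu(\partial B_k) = 0$ when $B_k \equiv \bigl\{x \in K : \rho(x,p_k) < r_k\bigr\}$. Finally, set $A_1 = B_1$ and $A_{k+1} = B_{k+1} \setminus \bigcup_{j=1}^{k} B_j$ for $1 \le k < \ell$. Then, $K \subseteq \bigcup_{k=1}^{\ell} A_k$, the $A_k$’s are disjoint, and, for each $1 \le k \le \ell$,
\[
\sup_{\varphi\in\Phi}\sup_{x\in A_k}\bigl|\varphi(x)-\varphi(p_k)\bigr| < \frac{\epsilon}{4} \quad\text{and}\quad \mu(\partial A_k) = 0.
\]
Hence, by (vii) in Theorem 9.1.5 applied to the $\mathbf 1_{A_k}$’s,
\[
\varlimsup_{n\to\infty}\sup_{\varphi\in\Phi}\bigl|\langle\varphi,\mu_n\rangle - \langle\varphi,\mu\rangle\bigr| < \epsilon + \varlimsup_{n\to\infty}\sum_{k=1}^{\ell}\sup_{\varphi\in\Phi}\bigl|\varphi(p_k)\bigr|\,\bigl|\mu_n(A_k)-\mu(A_k)\bigr| = \epsilon.
\]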
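The uniformity asserted in Lemma 9.1.10 can be watched numerically in a toy case. In the Python sketch below, which rests entirely on my own illustrative choices, $E = [0,1]$, $\mu$ is Lebesgue measure, $\mu_n$ is the uniform distribution on $\{\frac1n, \frac2n, \dots, 1\}$ (so that $\mu_n \Longrightarrow \mu$), and $\Phi$ is the bounded, equicontinuous family of functions $x \mapsto |x-a|$ with $a$ ranging over a grid in $[0,1]$. Since each member of $\Phi$ is 1-Lipschitz, an elementary Riemann-sum estimate bounds the supremum of the gaps by $\frac{1}{2n}$.

```python
def empirical_pairing(phi, n):
    """<phi, mu_n> for mu_n = uniform distribution on {1/n, 2/n, ..., 1}."""
    return sum(phi(k / n) for k in range(1, n + 1)) / n


def lebesgue_pairing_of_tent(a):
    """Exact integral over [0, 1] of |x - a| dx, valid for a in [0, 1]."""
    return (a * a + (1 - a) * (1 - a)) / 2


def sup_gap(n, centers):
    """Sup over phi_a(x) = |x - a|, a in centers, of |<phi_a, mu_n> - <phi_a, mu>|."""
    return max(
        abs(empirical_pairing(lambda x, a=a: abs(x - a), n)
            - lebesgue_pairing_of_tent(a))
        for a in centers
    )


centers = [j / 50 for j in range(51)]   # grid of parameters a in [0, 1]
for n in (10, 100, 1000):
    # The second column is the Lipschitz bound 1 / (2n) on the first.
    print(n, sup_gap(n, centers), 1 / (2 * n))
```

The bound is attained at $a = 0$, where $\varphi_a(x) = x$; without equicontinuity (for instance, for a family of ever steeper functions) no such uniform rate is available.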
Theorem 9.1.11. Let $E$ be a Polish space and $\rho$ a complete metric for $E$. Given $(\mu,\nu) \in M_1(E)^2$, define
\[
L(\mu,\nu) = \inf\Bigl\{\delta : \mu(F) \le \nu\bigl(F^{(\delta)}\bigr) + \delta \text{ and } \nu(F) \le \mu\bigl(F^{(\delta)}\bigr) + \delta \text{ for all closed } F \subseteq E\Bigr\},
\]
where $F^{(\delta)}$ denotes the set of $x \in E$ that lie a $\rho$-distance less than $\delta$ from $F$. Then $L$ is a complete metric for $M_1(E)$, and therefore $M_1(E)$ is Polish.

Proof: It is clear that $L$ is symmetric and that it satisfies the triangle inequality. Thus, we will know that it is a metric for $M_1(E)$ as soon as we show that $L(\mu_n,\mu) \longrightarrow 0$ if and only if $\mu_n \Longrightarrow \mu$. To this end, first suppose that $L(\mu_n,\mu) \longrightarrow 0$. Then, for every closed $F$, $\mu\bigl(F^{(\delta)}\bigr) + \delta \ge \varlimsup_{n\to\infty}\mu_n(F)$ for all $\delta > 0$; and therefore, by countable additivity, $\mu(F) \ge \varlimsup_{n\to\infty}\mu_n(F)$ for every closed $F$. Hence, by the equivalence of (i) and (iii) in Theorem 9.1.5, $\mu_n \Longrightarrow \mu$. Now suppose that $\mu_n \Longrightarrow \mu$, and let $\delta > 0$ be given. Given a closed $F$ in $E$, define
\[
\psi_F(x) = \frac{\rho\bigl(x,(F^{(\delta)})^{\complement}\bigr)}{\rho\bigl(x,(F^{(\delta)})^{\complement}\bigr) + \rho(x,F)} \quad\text{for } x \in E.
\]
It is then an easy matter to check that both
\[
\mathbf 1_F \le \psi_F \le \mathbf 1_{F^{(\delta)}} \quad\text{and}\quad \bigl|\psi_F(x)-\psi_F(y)\bigr| \le \frac{\rho(x,y)}{\delta}.
\]
In particular, by Lemma 9.1.10, we can choose $m \in \mathbb Z^+$ so that
\[
\sup_{n\ge m}\sup\Bigl\{\bigl|\langle\psi_F,\mu_n\rangle - \langle\psi_F,\mu\rangle\bigr| : F \text{ closed in } E\Bigr\} < \delta,
\]
from which it is an easy matter to see that, for all $n \ge m$, $\mu(F) \le \mu_n\bigl(F^{(\delta)}\bigr) + \delta$ and $\mu_n(F) \le \mu\bigl(F^{(\delta)}\bigr) + \delta$. In other words, $\sup_{n\ge m} L(\mu_n,\mu) \le \delta$, and, since $\delta > 0$ was arbitrary, we have shown that $L(\mu_n,\mu) \longrightarrow 0$.

In order to finish the proof, I must show that if $\{\mu_n : n \ge 1\} \subseteq M_1(E)$ is $L$-Cauchy convergent, then it is tight. Thus, let $\epsilon > 0$ be given, and choose, for each $\ell \in \mathbb Z^+$, an $m_\ell \in \mathbb Z^+$ and a $K_\ell \subset\subset E$ so that
\[
\sup_{n\ge m_\ell} L\bigl(\mu_n,\mu_{m_\ell}\bigr) \le \frac{\epsilon}{2^{\ell+1}} \quad\text{and}\quad \max_{1\le n\le m_\ell}\mu_n\bigl(K_\ell^{\complement}\bigr) \le \frac{\epsilon}{2^{\ell+1}}.
\]
Setting $\epsilon_\ell = \frac{\epsilon}{2^\ell}$, one then has that $\sup_{n\in\mathbb Z^+}\mu_n\bigl((K_\ell^{(\epsilon_\ell)})^{\complement}\bigr) \le \epsilon_\ell$ for each $\ell \in \mathbb Z^+$. In particular, if $K \equiv \bigcap_{\ell=1}^\infty \overline{K_\ell^{(\epsilon_\ell)}}$, then $\mu_n(K) \ge 1-\epsilon$ for all $n \in \mathbb Z^+$. Finally, because each $K_\ell$ is compact, it is easy to see that $K$ is both $\rho$-complete and totally bounded and therefore also compact.

When $E = \mathbb R$, P. Lévy was the first to construct a complete metric on $M_1(E)$, and it is for this reason that I will call the metric $L$ described in Theorem 9.1.11 the Lévy metric determined by $\rho$. Using an abstract argument, Varadarajan showed that $M_1(E)$ must be Polish whenever $E$ is, and the explicit construction that I have used is essentially the one first produced by Prohorov.

Before closing this subsection, it seems appropriate to introduce and explain some of the more classical terminology connected with applications of weak convergence to probability theory. For this purpose, let $(\Omega,\mathcal F,\mathbb P)$ be a probability space and $E$ a metric space. Given a sequence $\{X_n : n \ge 1\}$ of $E$-valued random variables on $(\Omega,\mathcal F,\mathbb P)$, one says that the $\{X_n : n \ge 1\}$ tends in law (or in distribution) to the $E$-valued random variable $X$ and writes $X_n \xrightarrow{\mathcal L} X$ if (cf. Exercise 1.1.16) $(X_n)_*\mathbb P \Longrightarrow X_*\mathbb P$. The idea here is that, when the measures under consideration are the distributions of random variables, one wants to think of weak convergence of the distributions as determining a kind of convergence of the corresponding random variables. Thus, one can add convergence in law to the list of possible ways in which random variables might converge. In order to elucidate the relationship between convergence in law, $\mathbb P$-almost sure convergence, and convergence in $\mathbb P$-measure, it will be convenient to have the following lemma.

Lemma 9.1.12. Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space and $E$ a metric space. Given any $E$-valued random variables $\{X_n : n \ge 1\} \cup \{X\}$ on $(\Omega,\mathcal F,\mathbb P)$ and any pair of topologically equivalent metrics $\rho$ and $\sigma$ for $E$, $\rho(X_n,X) \longrightarrow 0$ in $\mathbb P$-measure if and only if $\sigma(X_n,X) \longrightarrow 0$ in $\mathbb P$-measure. In particular, convergence in $\mathbb P$-measure does not depend on the choice of metric, and so one can write $X_n \longrightarrow X$ in $\mathbb P$-measure without specifying a metric.
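Moreover, if $X_n \longrightarrow X$ in $\mathbb P$-measure, then $X_n \xrightarrow{\mathcal L} X$. In fact, if $E$ is a Polish space and $L$ is the Lévy metric on $M_1(E)$ associated with a complete metric $\rho$ for $E$, then
\[
L\bigl(X_*\mathbb P, Y_*\mathbb P\bigr) \le \delta \vee \mathbb P\bigl(\rho(X,Y) \ge \delta\bigr)
\]
for all $\delta > 0$ and $E$-valued random variables $X$ and $Y$.

Proof: To prove the first assertion, suppose that $\rho(X_n,X) \longrightarrow 0$ in $\mathbb P$-measure but that $\sigma(X_n,X) \not\longrightarrow 0$ in $\mathbb P$-measure. After passing to a subsequence if necessary, we could then arrange that $\rho(X_n,X) \longrightarrow 0$ (a.s., $\mathbb P$) but $\mathbb P\bigl(\sigma(X_n,X) \ge \epsilon\bigr) \ge \epsilon$ for all $n \in \mathbb Z^+$ and some $\epsilon > 0$. But this is impossible, since then we would have that $\sigma(X_n,X) \longrightarrow 0$ $\mathbb P$-almost surely but not in $\mathbb P$-measure. Hence, we now know that convergence in $\mathbb P$-measure does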
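not depend on the choice of metric. To complete the first part, suppose that $\rho(X_n,X) \longrightarrow 0$ in $\mathbb P$-measure. Then, for every $\varphi \in U_b^\rho(E;\mathbb R)$ and $\delta > 0$,
\[
\varlimsup_{n\to\infty}\Bigl|\mathbb E^{\mathbb P}\bigl[\varphi(X_n)\bigr] - \mathbb E^{\mathbb P}\bigl[\varphi(X)\bigr]\Bigr| \le \varlimsup_{n\to\infty}\mathbb E^{\mathbb P}\bigl[\bigl|\varphi(X_n)-\varphi(X)\bigr|\bigr] \le \epsilon(\delta) + \|\varphi\|_u\varlimsup_{n\to\infty}\mathbb P\bigl(\rho(X_n,X) \ge \delta\bigr) = \epsilon(\delta),
\]
where
\[
\epsilon(\delta) \equiv \sup\bigl\{|\varphi(y)-\varphi(x)| : \rho(x,y) \le \delta\bigr\} \longrightarrow 0 \quad\text{as } \delta \searrow 0.
\]
Thus, by (ii) in Theorem 9.1.5, $(X_n)_*\mathbb P \Longrightarrow X_*\mathbb P$.

Now assume that $E$ is Polish, and take $\rho$ and $L$ accordingly. Then, for any closed set $F$ and $\delta > 0$,
\[
X_*\mathbb P(F) = \mathbb P(X \in F) \le \mathbb P\bigl(\rho(Y,F) < \delta\bigr) + \mathbb P\bigl(\rho(X,Y) \ge \delta\bigr) = Y_*\mathbb P\bigl(F^{(\delta)}\bigr) + \mathbb P\bigl(\rho(X,Y) \ge \delta\bigr).
\]
Hence, since the same is true when the roles of $X$ and $Y$ are reversed, the asserted estimate for $L\bigl(X_*\mathbb P, Y_*\mathbb P\bigr)$ holds.

As a demonstration of the sort of use to which one can put these ideas, I present the following version of the Principle of Accompanying Laws.

Theorem 9.1.13. Let $E$ be a Polish space and, for each $k \in \mathbb Z^+$, let $\{Y_{k,n} : n \ge 1\}$ be a sequence of $E$-valued random variables on the probability space $(\Omega,\mathcal F,\mathbb P)$. Further, assume that, for each $k \in \mathbb Z^+$, there is a $\mu_k \in M_1(E)$ such that $(Y_{k,n})_*\mathbb P \Longrightarrow \mu_k$ as $n \to \infty$. Finally, let $\rho$ be a complete metric for $E$, and suppose that $\{X_n : n \ge 1\}$ is a sequence of $E$-valued random variables on $(\Omega,\mathcal F,\mathbb P)$ with the property that
\[
\lim_{k\to\infty}\varlimsup_{n\to\infty}\mathbb P\bigl(\rho(X_n,Y_{k,n}) \ge \delta\bigr) = 0 \quad\text{for every } \delta > 0. \tag{9.1.14}
\]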
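Then there is a $\mu \in M_1(E)$ such that $\mu_k \Longrightarrow \mu$ as $k \to \infty$ and $(X_n)_*\mathbb P \Longrightarrow \mu$ as $n \to \infty$. In particular, if, as $n \to \infty$, $Y_n \xrightarrow{\mathcal L} X$ and $\mathbb P\bigl(\rho(X_n,Y_n) \ge \delta\bigr) \longrightarrow 0$ for each $\delta > 0$, then $X_n \xrightarrow{\mathcal L} X$.

Proof: Let $L$ be the Lévy metric associated with a complete metric $\rho$ for $E$. By the second part of Lemma 9.1.12,
\[
\sup_{\ell\ge k}\varlimsup_{n\to\infty} L\bigl((Y_{\ell,n})_*\mathbb P, (X_n)_*\mathbb P\bigr) \le \delta \vee \sup_{\ell\ge k}\varlimsup_{n\to\infty}\mathbb P\bigl(\rho(Y_{\ell,n},X_n) \ge \delta\bigr),
\]
and therefore, by (9.1.14),
\[
\lim_{k\to\infty}\varlimsup_{n\to\infty} L\bigl((Y_{k,n})_*\mathbb P, (X_n)_*\mathbb P\bigr) = 0. \tag{*}
\]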
Thus, since for any k ∈ Z+,
$$\sup_{\ell\ge k}L(\mu_\ell,\mu_k)=\sup_{\ell\ge k}\lim_{n\to\infty}L\bigl((Y_{\ell,n})_*P,(Y_{k,n})_*P\bigr),$$
{µk : k ≥ 1} is an L-Cauchy sequence and, as such, converges to some µ. Finally, for every k ∈ Z+,
$$L\bigl(\mu,(X_n)_*P\bigr)\le L(\mu,\mu_k)+L\bigl(\mu_k,(Y_{k,n})_*P\bigr)+L\bigl((Y_{k,n})_*P,(X_n)_*P\bigr),$$
and so
$$\varlimsup_{n\to\infty}L\bigl(\mu,(X_n)_*P\bigr)\le L(\mu,\mu_k)+\varlimsup_{n\to\infty}L\bigl((Y_{k,n})_*P,(X_n)_*P\bigr).$$
Thus, after letting k → ∞ and applying (*), one concludes that (Xn)∗P =⇒ µ.

Exercises for § 9.1

Exercise 9.1.15. Let (E, B) be a measurable space with the property that {x} ∈ B for all x ∈ E. In this exercise, we will investigate the strong topology in a little more detail. In particular, in part (iv), we will show that when µ ∈ M1(E) is non-atomic (i.e., µ({x}) = 0 for every x ∈ E), then there is no countable neighborhood basis of µ in the strong topology. Obviously, this means that the strong topology for M1(E) admits no metric whenever M1(E) contains a non-atomic element.
(i) Show that, in general, ‖ν − µ‖var = 2 max{ν(A) − µ(A) : A ∈ B} and that in the case when E is a metric space, B its Borel field, and ρ a metric for E, ‖ν − µ‖var = sup{⟨ϕ, ν⟩ − ⟨ϕ, µ⟩ : ϕ ∈ U_b^ρ(E; R) and ‖ϕ‖u ≤ 1}.
(ii) Show that if {µn : n ≥ 1} is a sequence in M1(E) that tends in the strong topology to µ ∈ M1(E), then $\mu\ll\sum_{n=1}^\infty2^{-n}\mu_n$.
(iii) Given µ ∈ M1(E), show that µ admits a countable neighborhood basis in the strong topology if and only if there exists a countable {ϕk : k ≥ 1} ⊆ B(E; R) such that, for any net {µα : α ∈ A} ⊆ M1(E), µα −→ µ in the strong topology as soon as limα ⟨ϕk, µα⟩ = ⟨ϕk, µ⟩ for every k ∈ Z+.
(iv) Referring to Exercises 1.1.14 and 1.1.16, set $\Omega=E^{\mathbb Z^+}$ and $\mathcal F=\mathcal B^{\mathbb Z^+}$. Next, let µ ∈ M1(E) be given, and define $P=\mu^{\mathbb Z^+}$ on (Ω, F). Show that, for any ϕ ∈ B(E; R), the random variables x ∈ Ω ⟼ $X_n^\varphi(x)\equiv\varphi(x_n)$, n ∈ Z+, are mutually P-independent and all have distribution ϕ∗µ. In particular, use the Strong Law of Large Numbers to conclude that
$$\lim_{n\to\infty}\frac1n\sum_{m=1}^nX_m^\varphi(x)=\langle\varphi,\mu\rangle$$
for each x outside of a P-null set. Now assume that µ is non-atomic, and suppose that µ admitted a countable neighborhood basis in the strong topology. Choose {ϕk : k ≥ 1} ⊆ B(E; R) accordingly, as in (iii), and (using the preceding) conclude that there exists at least one x ∈ Ω for which the measures µn given by $\mu_n\equiv\frac1n\sum_{m=1}^n\delta_{x_m}$, n ∈ Z+, converge in the strong topology to µ. Finally, apply (ii) to see that this is impossible.
Exercise 9.1.16. Throughout this exercise, E is a separable metric space.
(i) We already know that M1(E) is separable; however, our proof was non-constructive. Show that if {pk : k ≥ 1} is a dense subset of E, then the set of all convex combinations $\sum_{k=1}^n\alpha_k\delta_{p_k}$, where n ∈ Z+ and {αk : 1 ≤ k ≤ n} ⊂ [0, 1] ∩ Q with $\sum_1^n\alpha_k=1$, is a countable dense set in M1(E).
(ii) We have seen that M1(E) is compact if E is. To see that the converse is also true, show that x ∈ E ⟼ δx ∈ M1(E) is a homeomorphism whose image is closed.
(iii) Although it is a little off our track, it is amusing to show that E being compact is equivalent to Cb(E; R) being separable; and, in view of (i) in Lemma 9.1.4, this comes down to checking that E is compact if Cb(E; R) is separable.
Hint: Let ρ̂ be a totally bounded metric on E, and use Ê to denote the ρ̂-completion of E. Show that if {xn : n ≥ 1} ⊆ E has the properties that xn −→ x̂ ∈ Ê and limn→∞ ϕ(xn) exists for every ϕ ∈ Cb(E; R), then x̂ ∈ E. (Suppose not, set ψ(x) = 1/ρ̂(x, x̂), and consider functions of the form f ◦ ψ for f ∈ Cb(R; R).) Finally, assuming that Cb(E; R) is separable, and using a diagonalization procedure, show that every sequence {xn : n ≥ 1} ⊆ E admits a subsequence {x_{n_m} : m ≥ 1} that converges to some x̂ ∈ Ê and for which limm→∞ ϕ(x_{n_m}) exists for every ϕ ∈ Cb(E; R).
(iv) Let {Mn : n ≥ 1} be a sequence of finite, non-negative measures on (E, B). Assuming that {Mn : n ≥ 1} is tight in the sense that {Mn(E) : n ≥ 1} is bounded and that, for each ε > 0, there is a K ⊂⊂ E such that $\sup_nM_n(K^\complement)\le\epsilon$, show that there is a subsequence {M_{n_k} : k ≥ 1} and a finite measure M such that
$$\int_E\varphi\,dM=\lim_{k\to\infty}\int_E\varphi\,dM_{n_k}\quad\text{for all }\varphi\in C_b(E;\mathbb R).$$
Conversely, if E is Polish and there is a finite measure M such that $\int_E\varphi\,dM_n\to\int_E\varphi\,dM$ for every ϕ ∈ Cb(E; R), show that {Mn : n ≥ 1} is tight.

Exercise 9.1.17. Let {Eℓ : ℓ ≥ 1} be a sequence of Polish spaces, set $E=\prod_1^\infty E_\ell$, and give E the product topology.
(i) For each ℓ ∈ Z+, let ρℓ be a complete metric for Eℓ, and define
$$R(x,y)=\sum_{\ell=1}^\infty\frac1{2^\ell}\,\frac{\rho_\ell(x_\ell,y_\ell)}{1+\rho_\ell(x_\ell,y_\ell)}\quad\text{for }x,y\in E.$$
Show that R is a complete metric for E, and conclude that E is a Polish space. In addition, check that $\mathcal B_E=\prod_1^\infty\mathcal B_{E_\ell}$.
(ii) For ℓ ∈ Z+, let πℓ be the natural projection map from E onto Eℓ, and show that K ⊂⊂ E if and only if
$$K=\bigcap_{\ell\in\mathbb Z^+}\pi_\ell^{-1}(K_\ell),\quad\text{where }K_\ell\subset\subset E_\ell\text{ for each }\ell\in\mathbb Z^+.$$
Also, show that the span of the functions
$$\prod_{k=1}^\ell\varphi_k\circ\pi_k,\quad\text{where }\ell\in\mathbb Z^+\text{ and }\varphi_k\in U_b^{\rho_k}(E_k;\mathbb R),\ 1\le k\le\ell,$$
is dense in U_b^R(E; R). In particular, conclude from these that A ⊆ M1(E) is tight if and only if {(πℓ)∗µ : µ ∈ A} ⊆ M1(Eℓ) is tight for every ℓ ∈ Z+ and that µn =⇒ µ in M1(E) if and only if
$$\Bigl\langle\prod_{k=1}^\ell\varphi_k\circ\pi_k,\mu_n\Bigr\rangle\longrightarrow\Bigl\langle\prod_{k=1}^\ell\varphi_k\circ\pi_k,\mu\Bigr\rangle$$
for every ℓ ∈ Z+ and choice of ϕk ∈ U_b^{ρk}(Ek; R), 1 ≤ k ≤ ℓ.
(iii) For each ℓ ∈ Z+, set $\mathbf E_\ell=\prod_{k=1}^\ell E_k$, and let $\boldsymbol\pi_\ell$ denote the natural projection map from E onto $\mathbf E_\ell$. Next, let µ[1,ℓ] be an element of $M_1(\mathbf E_\ell)$, and assume that the µ[1,ℓ]'s are consistent in the sense that, for every ℓ ∈ Z+, $\mu_{[1,\ell+1]}(\Gamma\times E_{\ell+1})=\mu_{[1,\ell]}(\Gamma)$ for all $\Gamma\in\mathcal B_{\mathbf E_\ell}$. Show that there is a unique µ ∈ M1(E) such that $\mu_{[1,\ell]}=(\boldsymbol\pi_\ell)_*\mu$ for every ℓ ∈ Z+.
Hint: Choose and fix an e ∈ E, and define $\Phi_\ell:\mathbf E_\ell\to E$ so that
$$\bigl(\Phi_\ell(x_1,\dots,x_\ell)\bigr)_n=\begin{cases}x_n&\text{if }n\le\ell\\ e_n&\text{otherwise.}\end{cases}$$
Show that {(Φℓ)∗µ[1,ℓ] : ℓ ∈ Z+} ⊆ M1(E) is tight and that any limit must be the desired µ.
The conclusion drawn in (iii) is the renowned Kolmogorov Extension (or Consistency) Theorem. Notice that, at least for Polish spaces, it represents a vast generalization of the result obtained in Exercise 1.1.14.

Exercise 9.1.18. In this exercise we will use the theory of weak convergence to develop variations on The Strong Law of Large Numbers (cf. Theorem 1.4.9). Thus, let E be a Polish space, (Ω, F, P) a probability space, and {Xn : n ≥ 1} a sequence of mutually independent E-valued random variables on (Ω, F, P) with common distribution µ ∈ M1(E). Next, define the empirical distribution function
$$\omega\in\Omega\longmapsto L_n(\omega)\equiv\frac1n\sum_{m=1}^n\delta_{X_m(\omega)}\in M_1(E),$$
and observe that, for any ϕ ∈ B(E; R),
$$\langle\varphi,L_n(\omega)\rangle=\frac1n\sum_{m=1}^n\varphi\bigl(X_m(\omega)\bigr),\quad n\in\mathbb Z^+\text{ and }\omega\in\Omega.$$
As a consequence of the Strong Law, show that (9.1.19)
Ln (ω) =⇒ µ for P -almost every ω ∈ Ω,
which is The Strong Law of Large Numbers for the empirical distribution. Now show that (9.1.19) provides another (cf. Exercises 6.1.16 and 6.2.18) proof of the Strong Law of Large Numbers for Banach space–valued random variables. Thus, let E be a real, separable, Banach space with dual space E∗, and set $\bar S_n(\omega)=\frac1n\sum_1^nX_m(\omega)$ for n ∈ Z+ and ω ∈ Ω.
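As a rough numerical illustration of (9.1.19), and not as part of the exercise, the pairing ⟨ϕ, L_n(ω)⟩ can be computed for one simulated ω. The sketch below assumes E = R, takes µ to be the standard normal distribution, and uses ϕ = cos, for which ⟨ϕ, µ⟩ = e^{−1/2}; the helper name `empirical_pairing`, the seed, and the sample size are all choices made here for illustration only.

```python
import math
import random


def empirical_pairing(phi, samples):
    """Return <phi, L_n> = (1/n) * sum_m phi(X_m) for the given sample."""
    return sum(phi(x) for x in samples) / len(samples)


# Assumed example: mu is the standard normal law on R, phi = cos.
rng = random.Random(0)  # fixed seed, only to make the run reproducible
samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]

approx = empirical_pairing(math.cos, samples)
exact = math.exp(-0.5)  # <cos, mu> for the standard normal law
print(f"<phi, L_n> = {approx:.4f},  <phi, mu> = {exact:.4f}")
```

A single bounded continuous ϕ is of course far weaker than the weak convergence asserted in (9.1.19); the point is only to make the objects L_n and ⟨ϕ, L_n⟩ concrete.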
(i) As a preliminary step, begin with the case when
(*) $\mu\bigl(B_E(0,R)^\complement\bigr)=0$ for some R ∈ (0, ∞).
Choose η ∈ Cb(R; R) so that η(t) = t for t ∈ [−R, R] and η(t) = 0 when |t| ≥ R + 1, and define ψ_{x∗} ∈ Cb(E; R) for x∗ ∈ E∗ by ψ_{x∗}(x) = η(⟨x, x∗⟩), x ∈ E,
where ⟨x, x∗⟩ is used here to denote the action of x∗ ∈ E∗ on x ∈ E. Taking (*) into account and applying (9.1.19) and Lemma 9.1.10, show that
$$\lim_{n\to\infty}\sup_{\|x^*\|_{E^*}\le1}\Bigl|\langle\psi_{x^*},L_n(\omega)\rangle-\int_E\langle x,x^*\rangle\,\mu(dx)\Bigr|=0$$
for P-almost every ω ∈ Ω, and conclude from this that
$$\lim_{n\to\infty}\bigl\|\bar S_n(\omega)-m\bigr\|_E=0\quad\text{for P-almost every }\omega\in\Omega,$$
where (cf. Lemma 5.1.10) m = E^µ[x].
(ii) The next step is to replace the boundedness assumption in (*) by the hypothesis that x ⇝ ‖x‖E is µ-integrable. Assuming that it is, define, for R ∈ (0, ∞), n ∈ Z+, and ω ∈ Ω,
$$X_n^{(R)}(\omega)=\begin{cases}X_n(\omega)&\text{if }\|X_n(\omega)\|_E<R\\0&\text{otherwise}\end{cases}$$
and $Y_n^{(R)}(\omega)=X_n(\omega)-X_n^{(R)}(\omega)$. Next, set $\bar S_n^{(R)}=\frac1n\sum_1^nX_m^{(R)}$, n ∈ Z+, and, from (i), note that $\{\bar S_n^{(R)}(\omega):n\ge1\}$ converges in E for P-almost every ω ∈ Ω. In particular, if ε > 0 is given and R ∈ (0, ∞) is chosen so that
$\int\|x\|_E\,\mu(dx)$
0 by
$$\rho(r,1)=\sup\bigl\{\rho\in[0,\infty):M\bigl([\rho,\infty)\bigr)\ge r^{-1}\bigr\},\qquad\rho(r,-1)=\sup\bigl\{\rho\in[0,\infty):M\bigl((-\infty,-\rho]\bigr)\ge r^{-1}\bigr\},$$
where I have taken the supremum over the empty set to be 0. Applying Exercise 9.2.6 with ν(dr) = r^{−2} λ_{(0,∞)}(dr), one sees that M = M_0^F when F(0) = 0 and $F(y)=\rho\bigl(|y|,\tfrac y{|y|}\bigr)$ for y ∈ R \ {0}.
Now assume that N ≥ 2, and let M ∈ M∞(R^N). If M = 0, simply take F ≡ 0. If M ≠ 0, choose a non-decreasing function h : (0, ∞) −→ (0, ∞) so that
$$\int h\bigl(|y|\bigr)\,M(dy)=1,$$
and define µ ∈ M1((0, ∞) × S^{N−1}) so that
$$\langle\varphi,\mu\rangle=\int_{\mathbb R^N}h\bigl(|y|\bigr)\varphi(y)\,M(dy).$$

1 See K. Itô's On stochastic differential equations, Memoirs of the A.M.S. 4 (1951) or my Markov Processes from K. Itô's Perspective, Princeton Univ. Press, Annals of Math. Studies 155 (2003).
2 There is nothing sacrosanct about the choice of M0 as my reference measure. For instance, it should be obvious that one can choose any Lévy measure M with the property that M0 = M^F for some Borel measurable F : R^N −→ R^N that takes 0 to 0.
§ 9.2 Regular Conditional Probability Distributions
Using µ2 to denote the marginal distribution of µ on S^{N−1}, apply Corollary 9.2.3 to find a Borel measurable f : [0, 1) −→ R^N so that µ2 = f∗λ_{[0,1)}. Since µ2 lives on S^{N−1}, I may and will assume that f(u) ∈ S^{N−1} for all u ∈ [0, 1). Next, use Theorem 9.2.2 to find a measurable map η ∈ S^{N−1} ⟼ µ(η, ·) ∈ M1((0, ∞)) so that µ(dr × dη) = µ(η, dr) µ2(dη), and define ρ : (0, ∞) × S^{N−1} −→ [0, ∞) by
$$\rho(r,\eta)=\sup\Bigl\{\rho\in[0,\infty):\int_{[\rho,\infty)}\frac1{h(r)}\,\mu(\eta,dr)\ge\frac{\omega_{N-1}}r\Bigr\}.$$
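Then, again by Exercise 9.2.6, but this time with ν(dr) = ω_{N−1} r^{−2} λ_{(0,∞)}(dr), for any continuous ϕ : R^N −→ [0, ∞) that vanishes in a neighborhood of 0,
$$\int_{(0,\infty)}\frac{\varphi(r\eta)}{h(r)}\,\mu(\eta,dr)=\omega_{N-1}\int_{(0,\infty)}\varphi\bigl(\rho(r,\eta)\eta\bigr)\,r^{-2}\,dr,\quad\eta\in S^{N-1},$$
and so
$$\int_{\mathbb R^N}\varphi(y)\,M(dy)=\omega_{N-1}\int_{S^{N-1}}\Bigl(\int_{(0,\infty)}\varphi\bigl(\rho(r,\eta)\eta\bigr)\,r^{-2}\,dr\Bigr)\mu_2(d\eta)=\omega_{N-1}\int_{[0,1)}\Bigl(\int_{(0,\infty)}\varphi\bigl(\rho(r,f(t))f(t)\bigr)\,r^{-2}\,dr\Bigr)\lambda_{[0,1)}(dt).$$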
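Finally, define g : S^{N−1} −→ [0, ω_{N−1}) by $g(\eta)=\lambda_{S^{N-1}}\bigl(\{\eta'\in S^{N-1}:\eta'_1\le\eta_1\}\bigr)$, note that $\omega_{N-1}\lambda_{[0,1)}=g_*\lambda_{S^{N-1}}$, and conclude that M = M_0^F when F(0) = 0 and
$$F(y)=\rho\Bigl(|y|,\frac y{|y|}\Bigr)\,f\circ g\Bigl(\frac y{|y|}\Bigr)\quad\text{for }y\in\mathbb R^N\setminus\{0\}.$$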
We can now prove the following theorem, which is the simplest example of Itô's procedure.
Theorem 9.2.5. Let {j0(t, ·) : t ≥ 0} be a Poisson jump process associated with M0. Then, for each M ∈ M∞(R^N), there is a Borel measurable map F : R^N −→ R^N with F(0) = 0 and a Poisson jump process {j(t, ·) : t ≥ 0} associated with M such that j(t, ·) = j_0^F(t, ·), t ≥ 0, P-almost surely.
Proof: Choose F as in Theorem 9.2.4 so that M = M_0^F. For R > 0, set F_R(y) = 1_{[R,∞)}(|y|)F(y). By Lemma 4.2.12, we know that {j_0^{F_R}(t, ·) : t ≥ 0} is a Poisson jump process associated with M_0^{F_R}. In particular, for each r > 0,
$$E^{P}\bigl[j_0^F\bigl(t,\mathbb R^N\setminus B(0,r)\bigr)\bigr]=\lim_{R\searrow0}E^{P}\bigl[j_0^{F_R}\bigl(t,\mathbb R^N\setminus B(0,r)\bigr)\bigr]=M\bigl(\mathbb R^N\setminus B(0,r)\bigr)<\infty.$$
Hence, there exists a P-null set $\mathcal N$ such that t ⇝ j_0^F(t, ·, ω) is a jump function for all ω ∉ $\mathcal N$. Finally, if j(t, ·, ω) = j_0^F(t, ·, ω) when ω ∉ $\mathcal N$ and j(t, ·, ω) = 0 for ω ∈ $\mathcal N$, then {j(t, ·) : t ≥ 0} is a jump process associated with M and j(t, ·) = j_0^F(t, ·), t ≥ 0, for P-almost every ω ∈ Ω.

Exercises for § 9.2
Exercise 9.2.6. Let ν be an infinite non-negative, non-atomic, Borel measure on [0, ∞) with the property that ν([r2, ∞)) < ν([r1, ∞)) < ∞ for all 0 < r1 < r2 < ∞. Given any other non-negative Borel measure µ on [0, ∞) with the properties that µ({0}) = 0 and µ([r, ∞)) < ∞ for all r > 0, define
ρ(r) = sup{ρ ∈ (0, ∞) : µ([ρ, ∞)) ≥ ν([r, ∞))}, r ≥ 0,
where the supremum over the empty set is taken to be 0. Show that µ([t, ∞)) = ν({r : ρ(r) ≥ t}) for all t > 0, and therefore that ⟨ϕ, µ⟩ = ⟨ϕ ◦ ρ, ν⟩ for all Borel measurable ϕ : [0, ∞) −→ [0, ∞) that vanish at 0.
Hint: Determine g : (0, ∞) −→ (0, ∞) so that ν([g(r), ∞)) = r, and check that {r : ρ(r) ≥ t} = [g(µ([t, ∞))), ∞) for all t > 0.

§ 9.3 Donsker's Invariance Principle

The content of this section is my main justification for presenting the material in § 9.1. Namely, as we saw in Chapter 8, there is good reason to think that Wiener measure is the infinite dimensional version of the standard Gauss measure in R^N, and as such one might suspect that there is a version of The Central Limit Theorem that applies to it. In this section I will prove such a Central Limit Theorem for Wiener measure. The result is due to M. Donsker and is known as Donsker's Invariance Principle (cf. Theorem 9.3.1).
Before getting started, I need to make a couple of simple preparatory remarks. In the first place, I will be thinking of Wiener measure W^{(N)} as a Borel probability measure on C(R^N) = C([0, ∞); R^N) with the topology of uniform convergence on compact intervals. Equivalently, C(R^N) is given the topology for which
$$\rho(\psi,\psi')=\sum_{n=1}^\infty\frac1{2^n}\,\frac{\|\psi-\psi'\|_{[0,n]}}{1+\|\psi-\psi'\|_{[0,n]}}$$
is a metric, which, just as in the case of D(R^N) (cf. 4.1.1), is complete on C(R^N) and, as distinguished from D(R^N), is separable there. One way to check separability is to note that the set of paths ψ that, for some n ∈ N, are linear on [(m − 1)2^{−n}, m2^{−n}] and satisfy ψ(m2^{−n}) ∈ Q^N for all m ∈ Z+ is a countable, dense subset. In particular, this means that C(R^N) is a Polish space, and so the theory developed in § 9.1 applies to it. In addition, the Borel field B_{C(R^N)} coincides with σ({ψ(t) : t ≥ 0}), the σ-algebra that C(R^N) inherits as a subset of (R^N)^{[0,∞)} (cf. § 4.1). Indeed, since ψ ⇝ ψ(t) is continuous for every t ≥ 0, it is obvious that σ({ψ(t) : t ≥ 0}) ⊆ B_{C(R^N)}. At the same time, since ‖ψ‖[0,t] = sup{|ψ(τ)| : τ ∈ [0, t] ∩ Q}, it is easy to check that open balls are σ({ψ(t) : t ≥ 0})-measurable. Hence, since every open set is the countable union of open balls, B_{C(R^N)} ⊆ σ({ψ(t) : t ≥ 0}). Knowing that these σ-algebras coincide,
we know that two probability measures µ, ν ∈ M1(C(R^N)) are equal if they determine the same distribution on (R^N)^{[0,∞)}, that is, if, for each n ∈ Z+ and 0 = t0 < t1 < · · · < tn, the distribution of ψ ∈ C(R^N) ⟼ (ψ(t0), . . . , ψ(tn)) ∈ (R^N)^{n+1} is the same under µ and ν.

§ 9.3.1. Donsker's Theorem. Let (Ω, F, P) be a probability space, and suppose that {Xn : n ≥ 1} is a sequence of independent, P-uniformly square integrable random variables (i.e., as R → ∞, E^P[|Xn|², |Xn| ≥ R] −→ 0 uniformly in n) with mean value 0 and covariance I. Given n ≥ 1, define ω ∈ Ω ⟼ Sn(·, ω) ∈ C(R^N) so that Sn(0) = 0, $S_n\bigl(\tfrac mn\bigr)=n^{-1/2}\sum_{k=1}^mX_k$, and Sn(·, ω) is linear on each interval $\bigl[\tfrac{m-1}n,\tfrac mn\bigr]$ for all m ∈ Z+. Donsker's theorem is the following.
Theorem 9.3.1 (Donsker's Invariance Principle). If µn = (Sn)∗P ∈ M1(C(R^N)) is the distribution of ω ∈ Ω ⟼ Sn(·, ω) ∈ C(R^N) under P, then µn =⇒ W^{(N)}. Equivalently, for any bounded, continuous Φ : C(R^N) −→ C,
$$\lim_{n\to\infty}E^{P}\bigl[\Phi\circ S_n\bigr]=\langle\Phi,W^{(N)}\rangle.$$
Proving this result comes down to showing that {µn : n ≥ 1} is tight and that every limit point is W^{(N)}. The second of these is a rather elementary application of the Central Limit Theorem, and, at least when the Xn's have uniformly bounded fourth moments, the first is an application of Kolmogorov's Continuity Criterion. Finally, to remove the fourth moment assumption, I will use the Principle of Accompanying Laws. It should be noticed that, at no point in the proof, do I make use of the a priori existence of Wiener measure. Thus, Theorem 9.3.1 provides another derivation of its existence, a derivation that includes an extremely ubiquitous approximation procedure.
Lemma 9.3.2. Any limit point of {µn : n ≥ 1} is W^{(N)}.
Proof: Since a probability on C(R^N) is uniquely determined by its finite dimensional time marginals, and because ψ(0) = 0 with probability 1 under all the µn's as well as W^{(N)}, it suffices to show that, for each ℓ ∈ Z+ and 0 = t0 < t1 < · · · < tℓ,
$$\bigl(S_n(t_1),S_n(t_2)-S_n(t_1),\dots,S_n(t_\ell)-S_n(t_{\ell-1})\bigr)_*P\Longrightarrow\gamma_{0,\tau_1I}\times\cdots\times\gamma_{0,\tau_\ell I},$$
where τk = tk − tk−1, 1 ≤ k ≤ ℓ. To this end, for 1 ≤ k ≤ ℓ and n > 1/τk, set
$$\Delta_n(k)=n^{-\frac12}\sum_{j=\lfloor nt_{k-1}\rfloor+1}^{\lfloor nt_k\rfloor}X_j,$$
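where, as usual, I use the notation ⌊t⌋ to denote the integer part of t. Noting that
$$\bigl|S_n(t_k)-S_n(t_{k-1})-\Delta_n(k)\bigr|\le\Bigl|S_n(t_k)-S_n\Bigl(\frac{\lfloor nt_k\rfloor}n\Bigr)\Bigr|+\Bigl|S_n(t_{k-1})-S_n\Bigl(\frac{\lfloor nt_{k-1}\rfloor}n\Bigr)\Bigr|\le\frac{|X_{\lfloor nt_k\rfloor+1}|+|X_{\lfloor nt_{k-1}\rfloor+1}|}{n^{\frac12}},$$
one sees that, for any ε > 0,
$$P\Bigl(\sum_{k=1}^\ell\bigl|S_n(t_k)-S_n(t_{k-1})-\Delta_n(k)\bigr|^2\ge\epsilon^2\Bigr)\le P\Bigl(\sum_{k=0}^\ell\bigl|X_{\lfloor nt_k\rfloor+1}\bigr|^2\ge\frac{n\epsilon^2}4\Bigr)\le\frac4{n\epsilon^2}\sum_{k=0}^\ell E^{P}\bigl[|X_{\lfloor nt_k\rfloor+1}|^2\bigr]=\frac{4(\ell+1)N}{n\epsilon^2}\longrightarrow0$$
as n → ∞. Hence, by the Principle of Accompanying Laws (cf. Theorem 9.1.13), we need only check that
$$\bigl(\Delta_n(1),\dots,\Delta_n(\ell)\bigr)_*P\Longrightarrow\gamma_{0,\tau_1I}\times\cdots\times\gamma_{0,\tau_\ell I}.$$
Moreover, since (∆n(1), . . . , ∆n(ℓ))∗P = (∆n(1))∗P × · · · × (∆n(ℓ))∗P for all sufficiently large n's, this reduces to checking (∆n(k))∗P =⇒ γ_{0,τk I} for each 1 ≤ k ≤ ℓ. Finally, given 1 ≤ k ≤ ℓ, set Mn(k) = ⌊ntk⌋ − ⌊ntk−1⌋, and use Theorem 2.3.8 to see that, as n → ∞,
$$E^{P}\Bigl[\exp\Bigl(\frac{\sqrt{-1}}{M_n(k)^{\frac12}}\sum_{j=1}^{M_n(k)}\bigl(\xi,X_{\lfloor nt_{k-1}\rfloor+j}\bigr)_{\mathbb R^N}\Bigr)\Bigr]\longrightarrow\exp\Bigl(-\frac{|\xi|^2}2\Bigr)$$
uniformly for ξ in compact subsets of R^N. Hence, since Mn(k)/n −→ τk, we now see that, for any fixed ξ ∈ R^N,
$$E^{P}\Bigl[\exp\Bigl(\sqrt{-1}\,\bigl(\xi,\Delta_n(k)\bigr)_{\mathbb R^N}\Bigr)\Bigr]\longrightarrow\exp\Bigl(-\frac{\tau_k|\xi|^2}2\Bigr)=\widehat{\gamma_{0,\tau_kI}}(\xi),$$
and therefore (∆n(k))∗P =⇒ γ_{0,τk I}.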
I turn next to the problem of showing that {µn : n ≥ 1} is tight. By the Ascoli–Arzelà Theorem, any subset K ⊆ C(R^N) of the form
$$\bigcap_{\ell=1}^\infty\Bigl\{\psi:|\psi(0)|\vee\sup_{0\le s<t\le\ell}\frac{|\psi(t)-\psi(s)|}{(t-s)^\alpha}\le R_\ell\Bigr\}$$
is compact when α > 0 and {Rℓ : ℓ ≥ 1} ⊆ [0, ∞). Thus, since µn(ψ(0) = 0) = 1, all that we have to do is show that, for each T > 0,
$$\sup_{n\ge1}E^{P}\Bigl[\sup_{0\le s<t\le T}\frac{|S_n(t)-S_n(s)|}{(t-s)^{\frac18}}\Bigr]<\infty,$$
0 ⊆ Cb(R^N; R^N) with the properties that, for each δ > 0, sup_{n∈Z+} ‖f_{n,δ}‖u < ∞,
$$\sup_{n\in\mathbb Z^+}E^{P}\bigl[|X_n-f_{n,\delta}\circ X_n|^2\bigr]<\delta,$$
and, for every n ∈ Z+, the random variable X_{n,δ} ≡ f_{n,δ} ◦ Xn has mean value 0 and covariance I. Next, for each δ > 0, define the maps ω ∈ Ω ⟼ S_{n,δ}(·, ω) ∈ C(R^N) relative to {X_{n,δ} : n ≥ 1}, and set µ_{n,δ} = (S_{n,δ})∗P. Then, by the preceding, we know that µ_{n,δ} =⇒ W^{(N)} for each δ > 0. Hence, by Theorem 9.1.13, we will have proved that µn =⇒ W^{(N)} as soon as we show that
$$\lim_{\delta\searrow0}\sup_{n\in\mathbb Z^+}P\Bigl(\sup_{0\le t\le T}\bigl|S_n(t)-S_{n,\delta}(t)\bigr|\ge\epsilon\Bigr)=0$$
for every T ∈ Z+ and ε > 0. To this end, first observe that, because Sn(·) and S_{n,δ}(·) are linear on each interval [(m − 1)/n, m/n],
$$\sup_{t\in[0,T]}\bigl|S_n(t)-S_{n,\delta}(t)\bigr|=\max_{1\le m\le nT}\frac1{n^{\frac12}}\Bigl|\sum_{k=1}^mY_{k,\delta}\Bigr|,$$
where Y_{k,δ} ≡ Xk − X_{k,δ}. Next, note that
$$P\Bigl(\max_{1\le m\le nT}\frac1{n^{\frac12}}\Bigl|\sum_{k=1}^mY_{k,\delta}\Bigr|\ge\epsilon\Bigr)\le N\max_{e\in S^{N-1}}P\Bigl(\max_{1\le m\le nT}\Bigl|\sum_{k=1}^m\bigl(e,Y_{k,\delta}\bigr)_{\mathbb R^N}\Bigr|\ge\frac{n^{\frac12}\epsilon}{N^{\frac12}}\Bigr).$$
Finally, by Kolmogorov's Inequality,
$$P\Bigl(\max_{1\le m\le nT}\Bigl|\sum_{k=1}^m\bigl(e,Y_{k,\delta}\bigr)_{\mathbb R^N}\Bigr|\ge\frac{n^{\frac12}\epsilon}{N^{\frac12}}\Bigr)\le\frac{NT\delta}{\epsilon^2}$$
for every e ∈ S^{N−1}.

§ 9.3.2. Rayleigh's Random Flights Model. Here is a more picturesque scheme for approximating Brownian motion. Imagine the path t ⇝ R(t) of a bird that starts at the origin, flies in a randomly chosen direction at unit speed for a unit exponential random time, then switches to a new randomly chosen direction for a second unit exponential time, etc. Next, given ε > 0, rescale time and space so that the path becomes t ⇝ Rε(t), where Rε(t) ≡ ε^{1/2} R(ε^{−1} t). I will show that, as ε ↘ 0, the distribution of {Rε(t) : t ≥ 0} becomes Brownian motion. This model was introduced by Rayleigh and is called his random flights model.
In the following, {τm : m ≥ 1} is a sequence of mutually independent, unit exponential random variables from which their partial sums {Tn : n ≥ 0} and the associated simple Poisson process {N(t) : t ≥ 0} are defined as in § 4.2.1. Finally, given ε > 0, Nε(t) = N(ε^{−1} t).
Lemma 9.3.3. Let {Xn : n ≥ 1} be a sequence of mutually independent R^N-valued, uniformly square P-integrable random variables with mean value 0 and covariance I, and define {Sn(t) : t ≥ 0} accordingly, as in Theorem 9.3.1. (Note that the Xn's are not assumed to be independent of the τn's.) Next, define
$$X_\epsilon(t,\omega)=\sqrt\epsilon\sum_{m=1}^{N_\epsilon(t,\omega)}X_m(\omega),\quad(t,\omega)\in[0,\infty)\times\Omega.$$
Then, for all r ∈ (0, ∞) and T ∈ [0, ∞),
$$\lim_{\epsilon\searrow0}P\Bigl(\sup_{t\in[0,T]}\bigl|X_\epsilon(t)-S_{n_\epsilon}(t)\bigr|\ge r\Bigr)=0,\quad\text{where }n_\epsilon\equiv[\epsilon^{-1}].$$
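Proof: Note that
$$X_\epsilon(t,\omega)-S_{n_\epsilon}(t,\omega)=\bigl(\sqrt{\epsilon n_\epsilon}-1\bigr)S_{n_\epsilon}\Bigl(\frac{N_\epsilon(t,\omega)}{n_\epsilon},\omega\Bigr)+\Bigl(S_{n_\epsilon}\Bigl(\frac{N_\epsilon(t,\omega)}{n_\epsilon},\omega\Bigr)-S_{n_\epsilon}(t,\omega)\Bigr).$$
Hence, for every δ ∈ (0, 1],
$$P\Bigl(\sup_{t\in[0,T]}\bigl|X_\epsilon(t)-S_{n_\epsilon}(t)\bigr|\ge r\Bigr)\le P\Bigl(\bigl(1-\sqrt{\epsilon n_\epsilon}\bigr)\sup_{t\in[0,T+\delta]}\bigl|S_{n_\epsilon}(t)\bigr|\ge\frac r2\Bigr)+P\Bigl(\sup_{t\in[0,T]}\Bigl|\frac{N_\epsilon(t)}{n_\epsilon}-t\Bigr|\ge\delta\Bigr)+P\Bigl(\sup_{s\in[0,T]}\sup_{|t-s|\le\delta}\bigl|S_{n_\epsilon}(t)-S_{n_\epsilon}(s)\bigr|\ge\frac r2\Bigr).$$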
But, by Theorem 9.3.1 and the converse statement in Theorem 9.1.9, we know that the first term tends to 0 as ε ↘ 0 uniformly in δ ∈ (0, 1] and that the third term tends to 0 as δ ↘ 0 uniformly in ε ∈ (0, 1]. Thus, all that remains is to note that, by Exercise 4.2.19,
(9.3.4) $\lim_{\epsilon\searrow0}P\Bigl(\sup_{t\in[0,T]}\bigl|\epsilon N_\epsilon(t)-t\bigr|\ge\delta\Bigr)=0.$
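Now suppose that {θn : n ≥ 1} is a sequence of mutually independent R^N-valued random variables that satisfy the conditions that
$$M\equiv\sup_{n\in\mathbb Z^+}E^{P}\bigl[|\tau_n\theta_n|^4\bigr]<\infty,\quad E^{P}\bigl[\tau_n\theta_n\bigr]=0,\quad\text{and}\quad E^{P}\bigl[(\tau_n\theta_n)\otimes(\tau_n\theta_n)\bigr]=I,\quad n\in\mathbb Z^+.$$
Finally, define ω ∈ Ω ⟼ R(·, ω) ∈ C(R^N) by
$$R(t,\omega)=\bigl(t-T_{N(t,\omega)}(\omega)\bigr)\theta_{N(t,\omega)+1}(\omega)+\sum_{m=1}^{N(t,\omega)}\tau_m(\omega)\theta_m(\omega).$$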
The process {R(t) : t ≥ 0} is my interpretation of Rayleigh's random flights model. A typical choice of the θn's would be to make them independent of the holding times (i.e., the τn's) and to choose them to be uniformly distributed over a sphere in R^N centered at the origin, with the radius fixed by the covariance condition above.
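The following sketch simulates R(t) and its rescaling √ε R(t/ε) under one concrete set of assumptions: the θn's are independent of the τn's and uniform on the sphere of radius √(N/2). That radius is a computation made for this illustration (since E[τ²] = 2 for a unit exponential τ, it is the value for which E[(τθ) ⊗ (τθ)] = I), and the function names are likewise hypothetical.

```python
import math
import random


def uniform_on_sphere(dim, radius, rng):
    """Uniform point on the centered sphere of the given radius in R^dim."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in g))
    return [radius * c / norm for c in g]


def random_flight(t, dim, rng):
    """Position R(t): completed legs tau_m * theta_m plus the partial last leg."""
    radius = math.sqrt(dim / 2.0)  # assumed normalization, see the lead-in
    pos = [0.0] * dim
    elapsed = 0.0
    while True:
        tau = rng.expovariate(1.0)             # unit exponential holding time
        theta = uniform_on_sphere(dim, radius, rng)
        leg = min(tau, t - elapsed)            # truncate the leg in progress at time t
        for i in range(dim):
            pos[i] += leg * theta[i]
        elapsed += tau
        if elapsed >= t:
            return pos


def rescaled_flight(t, eps, dim, rng):
    """R_eps(t) = sqrt(eps) * R(t / eps)."""
    return [math.sqrt(eps) * c for c in random_flight(t / eps, dim, rng)]


# Usage: E|R_eps(1)|^2 should be close to dim (the trace of the identity) for small eps.
rng = random.Random(1)
dim, eps, trials = 2, 0.01, 2000
mean_sq = sum(
    sum(c * c for c in rescaled_flight(1.0, eps, dim, rng)) for _ in range(trials)
) / trials
print(f"estimated E|R_eps(1)|^2 = {mean_sq:.3f} (dimension {dim})")
```

Matching second moments at a single time is only a sanity check; it says nothing about convergence of the path distributions.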
Theorem 9.3.5. Referring to the preceding, set
$$R_\epsilon(t,\omega)=\sqrt\epsilon\,R\Bigl(\frac t\epsilon,\omega\Bigr),\quad(t,\omega)\in[0,\infty)\times\Omega.$$
Then (Rε)∗P =⇒ W^{(N)} as ε ↘ 0.
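Proof: Set Xn = τnθn, and, using the same notation as in Lemma 9.3.3, observe that
$$\bigl|R_\epsilon(t)-X_\epsilon(t)\bigr|\le\sqrt\epsilon\,\bigl|X_{N_\epsilon(t)+1}\bigr|.$$
Hence, by Lemma 9.3.3 and Theorems 9.3.1 and 9.1.13, all that we have to do is check that
$$\lim_{\epsilon\searrow0}P\Bigl(\sup_{t\in[0,T]}\sqrt\epsilon\,\bigl|X_{N_\epsilon(t)+1}\bigr|\ge r\Bigr)=0$$
for every r ∈ (0, ∞) and T ∈ [0, ∞). To this end, set Tε = (1 + T)/ε. Then, by (9.3.4), we have that
$$\lim_{\epsilon\searrow0}P\Bigl(\sup_{t\in[0,T]}\sqrt\epsilon\,\bigl|X_{N_\epsilon(t)+1}\bigr|\ge r\Bigr)\le\lim_{\epsilon\searrow0}P\Bigl(\max_{0\le n\le T_\epsilon}|X_{n+1}|\ge\frac r{\sqrt\epsilon}\Bigr)\le\lim_{\epsilon\searrow0}\frac{\sqrt\epsilon}r\,E^{P}\Bigl[\sum_{0\le n\le T_\epsilon}|X_{n+1}|^4\Bigr]^{\frac14}\le\lim_{\epsilon\searrow0}\frac{\bigl(\epsilon M(2+T)\bigr)^{\frac14}}r=0.$$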

Exercises for § 9.3

Exercise 9.3.6. Let {µn : n ≥ 1} ⊆ M1(C(R^N)), and, for each T ∈ (0, ∞), let µ_n^T ∈ M1(C([0, T]; R^N)) denote the distribution of ψ ∈ C(R^N) ⟼ ψ↾[0, T] ∈ C([0, T]; R^N) under µn. Show that there is a µ ∈ M1(C(R^N)) to which {µn : n ≥ 1} converges in M1(C(R^N)) if and only if, for each T ∈ (0, ∞), there is a µ^T ∈ M1(C([0, T]; R^N)) with the property that µ_n^T =⇒ µ^T in M1(C([0, T]; R^N)), in which case µ^T is the distribution of ψ ∈ C(R^N) ⟼ ψ↾[0, T] ∈ C([0, T]; R^N) under µ. In particular, weak convergence of measures on C(R^N) is really a local property.

Exercise 9.3.7. Donsker's own proof of Theorem 9.3.1 was entirely different from the one given here. Instead it was based on a special case of his result, a case that had been proved already (with a very difficult argument) by P. Erdős and M. Kac. The result of Erdős and Kac was that if {Xn : n ≥ 1} is a sequence of independent, uniformly square integrable random variables with mean value 0 and variance 1, then, for all a ≥ 0,
$$\lim_{n\to\infty}P\Bigl(\max_{1\le m\le n}n^{-\frac12}\sum_{k=1}^mX_k\ge a\Bigr)=\sqrt{\frac2\pi}\int_a^\infty e^{-\frac{x^2}2}\,dx.$$
Prove their result as an application of Donsker's Theorem and part (iii) of Exercise 4.3.11. According to Kac, it was G. Uhlenbeck who first suggested that their result might be a consequence of a more general "invariance" principle.

Exercise 9.3.8. Here is another version of Rayleigh's random flights model. Again let {τk : k ≥ 1}, {Tm : m ≥ 0}, and {N(t) : t ≥ 0} be as in § 4.2.2, and set
$$R(t)=\int_0^t(-1)^{N(s)}\,ds\quad\text{and}\quad R_\epsilon(t)=\sqrt\epsilon\,R\Bigl(\frac t\epsilon\Bigr).$$
Show that (Rε)∗P =⇒ W^{(1)} as ε ↘ 0.
Hint: Set βk = 0 or 1 according to whether k ∈ N is even or odd, and note that
$$\sum_{k=1}^n(-1)^k\tau_k=\sum_{k=1}^n\beta_k\bigl(\tau_{k+1}-\tau_k\bigr)-\beta_n\tau_{n+1}=\sum_{1\le k\le\frac n2}\bigl(\tau_{2k}-\tau_{2k-1}\bigr)-\beta_n\tau_n.$$
Now proceed as in the derivations of Lemma 9.3.3 and Theorem 9.3.5.
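The limit in Exercise 9.3.7 can be checked numerically before one attempts the proof. The sketch below is only a Monte Carlo sanity check under assumptions chosen here: the Xk's are fair ±1 steps, n = 400, and a = 1; the function names are hypothetical. It uses the identity √(2/π) ∫_a^∞ e^{−x²/2} dx = erfc(a/√2).

```python
import math
import random


def max_partial_sum_tail(a, n, trials, rng):
    """Estimate P(max_{1<=m<=n} n**-0.5 * (X_1+...+X_m) >= a) for fair +-1 steps."""
    threshold = a * math.sqrt(n)
    hits = 0
    for _ in range(trials):
        s = 0.0
        best = -math.inf
        for _ in range(n):
            s += 1.0 if rng.random() < 0.5 else -1.0
            if s > best:
                best = s
        if best >= threshold:
            hits += 1
    return hits / trials


def limit_tail(a):
    """Right-hand side of the Erdos-Kac limit: sqrt(2/pi) * int_a^inf exp(-x^2/2) dx."""
    return math.erfc(a / math.sqrt(2.0))


rng = random.Random(2)
estimate = max_partial_sum_tail(1.0, 400, 3000, rng)
print(f"simulated: {estimate:.3f},  limit: {limit_tail(1.0):.3f}")
```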
Chapter 10 Wiener Measure and Partial Differential Equations
In this chapter I will give a somewhat sketchy survey of the bridge between Brownian motion and partial differential equations. Like all good bridges, it is valuable when crossed starting at either end. For those starting from the probability side, it provides a computational tool with which the evaluation of many otherwise intractable Wiener integrals is reduced to finding the solution to a partial differential equation. For aficionados of partial differential equations, it provides a representation of solutions that often reveals properties that are not at all apparent in more conventional, purely analytic, representations.

§ 10.1 Martingales and Partial Differential Equations

The origin of all the connections between Brownian motion and partial differential equations is the observation that the Gauss kernel
(10.1.1) $g^{(N)}(t,x)=(2\pi t)^{-\frac N2}e^{-\frac{|x|^2}{2t}},\quad(t,x)\in(0,\infty)\times\mathbb R^N,$
is simultaneously the density for the Gaussian distribution γ_{0,tI} and the solution to the heat equation ∂t u = ½∆u in (0, ∞) × R^N with initial condition δ0. More precisely, if ϕ ∈ Cb(R^N; R), then
$$u_\varphi(t,x)=\int_{\mathbb R^N}g^{(N)}(t,y-x)\varphi(y)\,dy$$
is the one and only bounded u ∈ C^{1,2}((0, ∞) × R^N; R) that solves the Cauchy initial value problem
$$\partial_tu=\tfrac12\Delta u\ \text{in }(0,\infty)\times\mathbb R^N\quad\text{with}\quad\lim_{t\searrow0}u(t,\cdot)=\varphi\ \text{uniformly on compacts.}$$
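The claim that the Gauss kernel solves the heat equation is an elementary computation, and it can also be spot-checked numerically. The sketch below estimates ∂t g − ½∆g at one point by central finite differences; the function names, the step size h, and the test point are choices made for this illustration.

```python
import math


def gauss_kernel(t, x):
    """g^(N)(t, x) = (2*pi*t)**(-N/2) * exp(-|x|^2 / (2t)) for x given as a list."""
    dim = len(x)
    r2 = sum(c * c for c in x)
    return (2.0 * math.pi * t) ** (-dim / 2.0) * math.exp(-r2 / (2.0 * t))


def heat_residual(t, x, h=1e-3):
    """Central-difference estimate of d/dt g - (1/2) * Laplacian g at (t, x)."""
    dt = (gauss_kernel(t + h, x) - gauss_kernel(t - h, x)) / (2.0 * h)
    lap = 0.0
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += h
        down[i] -= h
        lap += (gauss_kernel(t, up) - 2.0 * gauss_kernel(t, x) + gauss_kernel(t, down)) / (h * h)
    return dt - 0.5 * lap


residual = heat_residual(1.0, [0.3, -0.7])
print(f"finite-difference residual: {residual:.2e}")  # small, up to discretization error
```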
Checking that uϕ solves this problem is an elementary computation. Showing that it is the only solution is less straightforward. Purely analytic proofs can be based on the weak minimum principle. If one assumes more about u, then a probabilistic proof can be based on Theorem 7.1.6. Indeed, if one assumes that u ∈ C_b^{1,2}([0, ∞) × R^N; C), then that theorem shows that, when (B(t), Ft, P) is a Brownian motion, for each T > 0, (u(T − t ∧ T, x + B(t ∧ T)), Ft, P) is a martingale. Thus,
$$u(T,x)=E^{P}\bigl[\varphi\bigl(x+B(T)\bigr)\bigr]=\int_{\mathbb R^N}\varphi(x+y)\,\gamma_{0,TI}(dy)=u_\varphi(T,x).$$
In Theorem 10.1.2, I will prove a refinement of Theorem 7.1.6 that will enable me (cf. the discussion following Corollary 10.1.3) to remove the assumption that the derivatives of u are bounded. As the preceding line of reasoning indicates, the advantage that probability theory provides comes from lifting questions about a partial differential equation to a pathspace setting, and martingales provide one of the most powerful machines with which to do the requisite lifting. In this section I will refine and exploit that machine.

§ 10.1.1. Localizing and Extending Martingale Representations. The purpose of this subsection is to combine Theorems 7.1.6 and 7.1.17 with Corollary 7.1.15 to obtain a quite general method for representing solutions to partial differential equations as Wiener integrals.
For the purposes of this chapter, it is best to think of Wiener measure W^{(N)} as a Borel measure on the Polish space C(R^N) ≡ C([0, ∞); R^N) and to take {Ft : t ≥ 0} with Ft = σ({ψ(τ) : τ ∈ [0, t]}) as the standard choice of a non-decreasing family of σ-algebras. The reason for using C(R^N) instead of (cf. § 8.1.3) Θ(R^N) is that we will want to consider the translates W_x^{(N)} of W^{(N)} by x ∈ R^N. That is, W_x^{(N)} is the distribution of ψ ⇝ x + ψ under W^{(N)}. Since it is clear that the map x ∈ R^N ⟼ W_x^{(N)} ∈ M1(C(R^N)) is continuous, there is no doubt that it is Borel measurable.
Theorem 10.1.2. Let G be a non-empty, open subset of R × R^N, and, for s ∈ R, define ζ_s^G : C(R^N) −→ [0, ∞] by ζ_s^G(ψ) = inf{t ≥ 0 : (s + t, ψ(t)) ∉ G}. Further, suppose that V : G −→ R is a Borel measurable function that is bounded above on the whole of G and bounded below on each compact subset of G, and set
$$E_s^V(t,\psi)=\exp\Bigl(\int_0^{t\wedge\zeta_s^G}V\bigl(s+\tau,\psi(\tau)\bigr)\,d\tau\Bigr).$$
If w ∈ C^{1,2}(G; R) ∩ Cb(G; R) satisfies (∂t + ½∆ + V)w ≥ f on G, where f : G −→ R is a bounded, Borel measurable function, then
$$\Bigl(E_s^V(t,\psi)\,w\bigl(s+t\wedge\zeta_s^G(\psi),\psi(t\wedge\zeta_s^G)\bigr)-\int_0^{t\wedge\zeta_s^G(\psi)}E_s^V(\tau,\psi)f\bigl(s+\tau,\psi(\tau)\bigr)\,d\tau,\ \mathcal F_t,\ W_x^{(N)}\Bigr)$$
is a submartingale for every (s, x) ∈ G. In particular, if (∂t + ½∆ + V)w = f on G, then the preceding triple is a martingale.
Proof: Without loss in generality, I may and will assume that s = 0. Choose a sequence {Gn : n ≥ 0} of open sets such that (0, x) ∈ G0, $\overline{G_n}\subseteq G_{n+1}$, $\overline{G_n}$ is a compact subset of G for each n ∈ N, and $G=\bigcup_{n=0}^\infty G_n$. At the same time, for each n ∈ N, choose ηn ∈ C^∞(R × R^N; [0, 1]) so that ηn = 1 on Gn and ηn vanishes off a compact subset of G, and define wn and Vn so that wn = ηn w and Vn = ηn V on G and wn and Vn vanish off of G. Clearly, wn ∈ C_b^{1,2}(R × R^N; R) and Vn is bounded and measurable.
By Theorem 7.1.6, we know that (Mn(t), Ft, W_x^{(N)}) is a martingale, where
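$$M_n(t,\psi)=w_n\bigl(t,\psi(t)\bigr)-\int_0^tg_n\bigl(\tau,\psi(\tau)\bigr)\,d\tau\quad\text{with }g_n=\partial_tw_n+\tfrac12\Delta w_n.$$
Thus, if
$$E_n(t,\psi)=\exp\Bigl(\int_0^tV_n\bigl(\tau,\psi(\tau)\bigr)\,d\tau\Bigr),$$
then, by Theorem 7.1.17,
$$\Bigl(E_n(t,\psi)M_n(t,\psi)-\int_0^tE_n(\tau,\psi)M_n(\tau,\psi)V_n\bigl(\tau,\psi(\tau)\bigr)\,d\tau,\ \mathcal F_t,\ W_x^{(N)}\Bigr)$$
is also a martingale. In addition,
$$\int_0^tE_n(\tau,\psi)V_n\bigl(\tau,\psi(\tau)\bigr)\Bigl(\int_0^\tau g_n\bigl(\sigma,\psi(\sigma)\bigr)\,d\sigma\Bigr)d\tau=\int_0^tg_n\bigl(\sigma,\psi(\sigma)\bigr)\Bigl(\int_\sigma^tE_n(\tau,\psi)V_n\bigl(\tau,\psi(\tau)\bigr)\,d\tau\Bigr)d\sigma$$
$$=E_n(t,\psi)\int_0^tg_n\bigl(\sigma,\psi(\sigma)\bigr)\,d\sigma-\int_0^tE_n(\sigma,\psi)g_n\bigl(\sigma,\psi(\sigma)\bigr)\,d\sigma,$$
and therefore
$$E_n(t,\psi)M_n(t,\psi)-\int_0^tE_n(\tau,\psi)M_n(\tau,\psi)V_n\bigl(\tau,\psi(\tau)\bigr)\,d\tau=E_n(t,\psi)w_n\bigl(t,\psi(t)\bigr)-\int_0^tE_n(\tau,\psi)f_n\bigl(\tau,\psi(\tau)\bigr)\,d\tau,$$
where fn = gn + Vn wn. Hence, we now know that
$$\Bigl(E_n(t,\psi)w_n\bigl(t,\psi(t)\bigr)-\int_0^tE_n(\tau,\psi)f_n\bigl(\tau,\psi(\tau)\bigr)\,d\tau,\ \mathcal F_t,\ W_x^{(N)}\Bigr)$$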
is a martingale. Finally, define ζ_0^{Gn} for Gn in the same way as ζ_0^G was defined for G. Since fn ≥ f on Gn, an application of Theorem 7.1.15 gives the desired result with ζ_0^{Gn} in place of ζ_0^G, and, because ζ_0^{Gn} ↗ ζ_0^G, this completes the proof.
Perhaps the most famous application of Theorem 10.1.2 is the Feynman–Kac formula,¹ a version of which is the content of the following corollary.
Corollary 10.1.3. Let V : [0, T] × R^N −→ R be a Borel measurable function that is uniformly bounded above everywhere and bounded below uniformly on compacts. If u ∈ C^{1,2}((0, T) × R^N; R) is bounded and satisfies the Cauchy initial value problem
$$\partial_tu=\tfrac12\Delta u+Vu+f\ \text{in }(0,T)\times\mathbb R^N\quad\text{with}\quad\lim_{t\searrow0}u(t,\cdot)=\varphi\ \text{uniformly on compacts}$$
for some bounded, Borel measurable f : [0, T] × R^N −→ R and ϕ ∈ Cb(R^N; R), then
$$u(T,x)=E^{W_x^{(N)}}\Bigl[e^{\int_0^TV(\tau,\psi(\tau))\,d\tau}\varphi\bigl(\psi(T)\bigr)\Bigr]+E^{W_x^{(N)}}\Bigl[\int_0^Te^{\int_0^tV(\tau,\psi(\tau))\,d\tau}f\bigl(t,\psi(t)\bigr)\,dt\Bigr].$$
Proof: Given Theorem 10.1.2, there is hardly anything to do. Indeed, here G = (0, T) × R^N and so ζ_0^G = T. Thus, by Theorem 10.1.2 applied to w(t, ·) = u(T − t, ·), we know that
$$\Bigl(e^{\int_0^{t\wedge T}V(\tau,\psi(\tau))\,d\tau}u\bigl(T-t\wedge T,\psi(t)\bigr)+\int_0^{t\wedge T}e^{\int_0^\tau V(\sigma,\psi(\sigma))\,d\sigma}f\bigl(\tau,\psi(\tau)\bigr)\,d\tau,\ \mathcal F_t,\ W_x^{(N)}\Bigr)$$
is a martingale. Hence,
$$u(T,x)=\lim_{t\nearrow T}E^{W_x^{(N)}}\Bigl[e^{\int_0^tV(\tau,\psi(\tau))\,d\tau}u\bigl(T-t,\psi(t)\bigr)+\int_0^te^{\int_0^\tau V(\sigma,\psi(\sigma))\,d\sigma}f\bigl(\tau,\psi(\tau)\bigr)\,d\tau\Bigr],$$
from which the asserted equality follows immediately.

1 In the same spirit as he wrote down (8.1.4), Feynman expressed solutions to Schrödinger's equation in terms of path-integrals. After hearing Feynman lecture on his method, Kac realized that one could transfer Feynman's ideas from the Schrödinger to the heat context and thereby arrive at a mathematically rigorous but far less exciting theory.
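For readers who like to see the Feynman–Kac representation in action, here is a crude Monte Carlo sketch for N = 1 and f ≡ 0. Everything about it is an assumption made for illustration: the function name `feynman_kac_mc`, the Euler discretization of the Brownian path, the left-endpoint Riemann sum for the time integral, and the test case V ≡ −1, ϕ = cos, for which u(t, x) = e^{−3t/2} cos x solves ∂t u = ½∆u + Vu with u(0, ·) = ϕ.

```python
import math
import random


def feynman_kac_mc(V, phi, T, x, n_paths, n_steps, rng):
    """Estimate E[exp(int_0^T V(tau, psi(tau)) dtau) * phi(psi(T))], psi = x + Brownian path.

    One-dimensional paths; the time integral uses a left-endpoint Riemann sum.
    """
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        pos = x
        integral = 0.0
        for k in range(n_steps):
            integral += V(k * dt, pos) * dt
            pos += sqrt_dt * rng.gauss(0.0, 1.0)  # Brownian increment over [k*dt, (k+1)*dt]
        total += math.exp(integral) * phi(pos)
    return total / n_paths


# Assumed test case: V = -1, phi = cos, so that u(T, x) = exp(-1.5 * T) * cos(x).
rng = random.Random(3)
T, x = 1.0, 0.3
estimate = feynman_kac_mc(lambda t, y: -1.0, math.cos, T, x, 4000, 50, rng)
exact = math.exp(-1.5 * T) * math.cos(x)
print(f"Monte Carlo: {estimate:.4f},  closed form: {exact:.4f}")
```

With a constant V the exponential factor is the same on every path, so this case tests only the plumbing; a state-dependent V is where the path integral genuinely matters, and there the discretization error of the Riemann sum has to be watched as well.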
As a special case of the preceding, we obtain the missing uniqueness statement in the introduction to this section. Namely, if u ∈ C^{1,2}((0, ∞) × R^N; C) is a bounded solution to the heat equation with initial value ϕ, then, by considering the real and imaginary parts of u separately, Corollary 10.1.3 implies that
$$u(t,x)=E^{W_x^{(N)}}\bigl[\varphi\bigl(\psi(t)\bigr)\bigr]=\int_{\mathbb R^N}\varphi(y)\,g^{(N)}(t,y-x)\,dy.$$
§ 10.1.2. Minimum Principles. In this subsection I will show how Theorem 10.1.2 leads to an elegant derivation of the basic minimum principle for solutions to equations like the heat equation. Actually, there are two such minimum principles, one of which says that solutions achieve their minimum value at the boundary of the region in which they are defined and the other of which says that only solutions that are constant can achieve a minimum value on the interior. The first of these principles is called the weak minimum principle, and the second is called the strong minimum principle.
Theorem 10.1.4. Let G be a non-empty open subset of R × R^N, and let V be a function of the sort described in Theorem 10.1.2. Further, suppose that (s, x) ∈ G is a point at which
(10.1.5) $W_x^{(N)}\bigl(\exists\,t\in(0,\infty)\ \ (s-t,\psi(t))\notin G\bigr)=1.$
If u ∈ C^{1,2}(G; R) is bounded below and satisfies ∂t u − ½∆u − V u ≥ 0 in G and if $\varliminf_{(t,y)\to(t_0,y_0)}u(t,y)\ge0$ for every (t0, y0) ∈ ∂G with t0 < s, then u(s, x) ≥ 0.
Proof: Without loss in generality, I will assume that $s = 0$. Set $\tilde G = \{(t,y) : (-t,y)\in G\}$ and define $w$ on $\tilde G$ by $w(t,y) = u(-t,y)$. Next, choose an exhaustion $\{G_n : n\ge 0\}$ of $G$ as in the proof of Theorem 10.1.2, and set $\tilde G_n = \{(t,y) : (-t,y)\in G_n\}$. By Theorem 10.1.2, we know that
\[
w(0,x) \ge E^{W_x^{(N)}}\Bigl[ e^{\int_0^{\zeta_n(\psi)} V(-\tau,\psi(\tau))\,d\tau}\, w\bigl(\zeta_n(\psi),\psi(\zeta_n)\bigr) \Bigr],
\]
where $\zeta_n(\psi) = \inf\{t\ge 0 : (t,\psi(t))\notin \tilde G_n\}\wedge n$. Moreover, by (10.1.5), for $W_x^{(N)}$-almost every $\psi$, $\bigl(-\zeta_n(\psi),\psi(\zeta_n)\bigr)$ tends to a point in $\{(t,x)\in\partial G : t<0\}$ as $n\to\infty$, and therefore
\[
\varliminf_{n\to\infty} w\bigl(\zeta_n(\psi),\psi(\zeta_n)\bigr) = \varliminf_{n\to\infty} u\bigl(-\zeta_n(\psi),\psi(\zeta_n)\bigr) \ge 0 \quad W_x^{(N)}\text{-almost surely.}
\]
Hence, by Fatou's Lemma, we see that $u(0,x) = w(0,x) \ge 0$.
§ 10.1 Martingales and Partial Differential Equations
Theorem 10.1.6. In the same setting as the preceding, suppose that $u\in C^{1,2}(G;\mathbb{R})$ satisfies $\partial_t u - \frac12\Delta u - Vu \ge 0$ in $G$. If $(s,x)\in G$ and $0 = u(s,x)\le u(t,y)$ for all $(t,y)\in G$ with $t\le s$, then $u\bigl(s-t,\psi(t)\bigr) = 0$ for all $(t,\psi)\in[0,\infty)\times C(\mathbb{R}^N)$ such that $\psi(0) = x$ and $\bigl(s-\tau,\psi(\tau)\bigr)\in G$ for all $\tau\in[0,t]$. In particular, if $G$ is a connected, open subset of $\mathbb{R}^N$, $V$ is independent of time, and $u\in C^2\bigl(G;[0,\infty)\bigr)$ satisfies $\frac12\Delta u + Vu\le 0$, then either $u\equiv 0$ or $u>0$ everywhere on $G$.
Proof: Again, without loss in generality, I assume that $s = 0$. In addition, I may and will assume that $x = 0$, $V$ is uniformly bounded, and $u\in C_b\bigl(G;[0,\infty)\bigr)$. To see that these latter assumptions cause no loss in generality, one can use an exhaustion argument of the same sort as was used in the proof of Theorem 10.1.2.

Given $(t,\psi)\in(0,\infty)\times C(\mathbb{R}^N)$ with $\psi(0) = 0$ and $\bigl(-\tau,\psi(\tau)\bigr)\in G$ for $\tau\in[0,t]$, suppose that $u\bigl(-t,\psi(t)\bigr) > 0$. In order to get a contradiction, choose $r>0$ so that $u(-t,y)\ge r$ if $|y-\psi(t)|\le r$ and so that $\bigl(-\tau,\psi'(\tau)\bigr)\in G$ if $\tau\in[0,t]$ and $\|\psi'-\psi\|_{[0,t]}\le r$. If $\tilde G = \{(t,y) : (-t,y)\in G\}$, then, just as in the proof of Theorem 10.1.2,
\[
0 = u(0,0) \ge \int e^{\int_0^{t\wedge\zeta_0^{\tilde G}(\psi')} V(-\tau,\psi'(\tau))\,d\tau}\, u\Bigl(-t\wedge\zeta_0^{\tilde G}(\psi'),\,\psi'\bigl(t\wedge\zeta_0^{\tilde G}\bigr)\Bigr)\, W^{(N)}(d\psi')
\ge r e^{-t\|V\|_u}\, W^{(N)}\bigl(\{\psi' : \|\psi'-\psi\|_{[0,t]}\le r\}\bigr).
\]
Since, by Corollary 8.3.6, W (N ) {ψ 0 : kψ 0 − ψk[0,t] ≤ r} > 0, we have the required contradiction. Turning to the final assertion, take G = R × G, and observe that for all (x, y) ∈ G2 there is a ψ such that ψ(0) = x, ψ(1) = y, and ψ(τ ) ∈ G for all τ ∈ [0, 1]. At first glance, one might think that the strong minimum principle overshadows the weak minimum principle and makes it obsolete. However, that is not entirely true. Specifically, before one can apply the strong minimum principle, one has to know that a minimum is actually achieved. In many situations, continuity plus compactness provide the necessary existence. However, when compactness is absent, special considerations have to be brought to bear. The weak minimum principle does not suffer from this problem. On the other hand, it suffers from a related problem. Namely, one has to know ahead of time that (10.1.5) holds. As we will see below, this is usually not too serious a problem, but it should be kept in mind. § 10.1.3. The Hermite Heat Equation. In the preceding subsection I gave an example of how probability theory can give information about solutions to partial differential equations. In this subsection, it will be a differential equation that gives us information about probability theory. To be precise, I, following M. Kac, will give in this subsection his derivation of the formulas that we derived
by purely Gaussian techniques in Exercise 8.2.16, and in the next section I will give his treatment of a closely related problem.2 Closed form solutions to the Cauchy initial value problem are available for very few V ’s, but there is a famous one for which they are. Namely, when V = − 12 |x|2 , a great deal is known. Indeed, already in the nineteenth century, Hermite knew how to analyze the operator 12 ∆− 12 |x|2 . As a result, this operator is often called the Hermite operator by mathematicians, although physicists call it the harmonic oscillator because it arises in quantum mechanics as minus the Hamiltonian for an oscillator that satisfies Hook’s law. Be that as it may, set (cf. (10.1.1))
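\[
h(t,x,y) = e^{-\frac{Nt+|x|^2}{2}}\, g^{(N)}\!\left(\frac{1-e^{-2t}}{2},\ y - e^{-t}x\right) e^{\frac{|y|^2}{2}} \tag{10.1.7}
\]
for $(t,x,y)\in(0,\infty)\times\mathbb{R}^N\times\mathbb{R}^N$. By using the fact that $g^{(N)}$ solves the heat equation and tends to $\delta_0$ as $t\searrow 0$, one can apply elementary calculus to check that
\[
\partial_t h(t,\cdot\,,y) = \bigl(\tfrac12\Delta - \tfrac12|x|^2\bigr)h(t,\cdot\,,y)\ \text{ in } (0,\infty)\times\mathbb{R}^N
\quad\text{and}\quad
\lim_{t\searrow 0} h(t,x,y) = \delta_{y-x}
\]
for each $y\in\mathbb{R}^N$.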
Now let $\varphi\in C_b(\mathbb{R}^N;\mathbb{R})$ be given, and set
\[
u_\varphi(t,x) = \int_{\mathbb{R}^N}\varphi(y)\,h(t,x,y)\,dy.
\]
Then, $u_\varphi$ is a bounded solution to $\partial_t u = \frac12\Delta u - \frac12|x|^2u$ that tends to $\varphi$ as $t\searrow 0$. Hence, as an immediate consequence of Corollary 10.1.3, we see that
\[
u_\varphi(t,x) = E^{W_x^{(N)}}\Bigl[ e^{-\frac12\int_0^t|\psi(\tau)|^2\,d\tau}\,\varphi\bigl(\psi(t)\bigr) \Bigr].
\]
By taking $\varphi = 1$ and performing a tedious, but completely standard, Gaussian computation, one can use this to derive
\[
E^{W_x^{(N)}}\Bigl[\exp\Bigl(-\tfrac12\int_0^t|\psi(\tau)|^2\,d\tau\Bigr)\Bigr] = (\cosh t)^{-\frac N2}\exp\Bigl(-\frac{|x|^2}{2}\tanh t\Bigr),
\]
which, together with Brownian scaling, vastly generalizes the result in Exercise 8.2.16.

2 See Kac's "On some connections between probability theory and differential and integral equations," Proc. 2nd Berkeley Symp. on Prob. & Stat. Univ. of California Press (1951), where he gives several additional, intriguing applications of Corollary 10.1.3.
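The closed form for the expectation of $\exp\bigl(-\frac12\int_0^t|\psi(\tau)|^2\,d\tau\bigr)$ lends itself to a direct simulation check. What follows is a rough sketch and not part of Kac's argument: it assumes $N = 1$, approximates the Brownian path on a uniform time grid, and uses the trapezoid rule for the time integral. The function names and the parameter values are illustrative choices of mine.

```python
import math
import random

def hermite_expectation_mc(t, x, n_paths=10_000, n_steps=100, seed=1):
    """Estimate E[exp(-0.5 * int_0^t psi(s)^2 ds)] for 1-d Brownian motion started at x."""
    rng = random.Random(seed)
    dt = t / n_steps
    sd = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        pos = x
        integral = 0.0
        for _ in range(n_steps):
            new_pos = pos + sd * rng.gauss(0.0, 1.0)
            # trapezoid rule for the integral of psi^2 over one time step
            integral += 0.5 * (pos * pos + new_pos * new_pos) * dt
            pos = new_pos
        total += math.exp(-0.5 * integral)
    return total / n_paths

def hermite_expectation_exact(t, x):
    """Closed form (cosh t)^(-1/2) * exp(-(x^2 / 2) * tanh t), i.e. the case N = 1."""
    return math.cosh(t) ** (-0.5) * math.exp(-0.5 * x * x * math.tanh(t))

hermite_estimate = hermite_expectation_mc(1.0, 0.5)
hermite_exact = hermite_expectation_exact(1.0, 0.5)
print(hermite_estimate, hermite_exact)
```

The estimate carries both a sampling error and a small time-discretization bias, so only approximate agreement should be expected.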
§ 10.1.4. The Arcsine Law. As I said at the beginning of the last subsection, there are very few $V$'s for which one can write down explicit solutions to equations of the form $\partial_t u = \frac12\Delta u + Vu$. On the other hand, when $V$ is independent of time one can often, particularly when $N = 1$, write down a closed form expression for the Laplace transform $U_\lambda = \int_0^\infty e^{-\lambda t}u(t,\cdot\,)\,dt$ of $u$. Indeed, if $u$ is a bounded solution to $\partial_t u = \frac12\Delta u + Vu$ with initial value $f$, then it is an elementary exercise to check that
\[
\bigl(\lambda - \tfrac12\Delta - V\bigr)U_\lambda = f,
\]
and when $N = 1$ this is an ordinary differential equation. Moreover, when $U_\lambda\in C^2(\mathbb{R}^N;\mathbb{R})$ is bounded, one can apply Corollary 10.1.3 to see that
\[
U_\lambda(x) = E^{W_x^{(N)}}\Bigl[ e^{\int_0^T V_\lambda(\psi(\tau))\,d\tau}\, U_\lambda\bigl(\psi(T)\bigr)\Bigr] + E^{W_x^{(N)}}\left[\int_0^T e^{\int_0^t V_\lambda(\psi(\tau))\,d\tau} f\bigl(\psi(t)\bigr)\,dt\right] \quad\text{for } T>0,
\]
where $V_\lambda = V-\lambda$. Hence, if $V_\lambda$ is uniformly negative and one lets $T\to\infty$, one gets
\[
U_\lambda(x) = E^{W_x^{(N)}}\left[\int_0^\infty e^{\int_0^t V_\lambda(\psi(\tau))\,d\tau} f\bigl(\psi(t)\bigr)\,dt\right].
\]
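The preceding remark is the origin of Kac's derivation of Lévy's Arcsine Law for Wiener measure.

Theorem 10.1.8. For every $T\in(0,\infty)$ and $\alpha\in[0,1]$,
\[
W^{(1)}\left(\left\{\psi\in C(\mathbb{R}) : \frac1T\int_0^T \mathbf{1}_{[0,\infty)}\bigl(\psi(t)\bigr)\,dt \le \alpha\right\}\right) = \frac2\pi\arcsin\sqrt{\alpha}.
\]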
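Proof: First note that, by Brownian scaling, it suffices to prove the result when $T = 1$. Next, set
\[
F(\alpha) = W^{(1)}\left(\left\{\psi\in C(\mathbb{R}) : \int_0^1 \mathbf{1}_{[0,\infty)}\bigl(\psi(s)\bigr)\,ds\le\alpha\right\}\right),\quad \alpha\in[0,\infty),
\]
and let $\mu$ denote the element of $M_1\bigl([0,\infty)\bigr)$ for which $F$ is the distribution function. We are going to compute $F(\alpha)$ by looking at the double Laplace transform
\[
G(\lambda)\equiv\int_{(0,\infty)} e^{-\lambda t}g(t)\,dt,\quad \lambda\in(0,\infty),
\qquad\text{where}\qquad
g(t)\equiv\int_{[0,\infty)} e^{-t\alpha}\,\mu(d\alpha),\quad t\in(0,\infty);
\]
and, by another application of the Brownian scaling property, we see that
\[
G(\lambda) = \int_0^\infty\left(\int \exp\left(-\int_0^t\Bigl(\lambda + \mathbf{1}_{[0,\infty)}\bigl(\psi(s)\bigr)\Bigr)ds\right) W^{(1)}(d\psi)\right)dt
= E^{W^{(1)}}\left[\int_0^\infty e^{\int_0^t V_\lambda(\psi(\tau))\,d\tau}\,dt\right],
\]
where $V_\lambda\equiv -\lambda - \mathbf{1}_{[0,\infty)}$.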
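At this point, the strategy is to calculate $G(\lambda)$ with the help of the idea explained above. For this purpose, I begin by seeking as good a solution $x\in\mathbb{R}\longmapsto u_\lambda(x)\in\mathbb{R}$ as I can find to the equation $\frac12u'' + V_\lambda u = -1$. By considering this equation separately on the left and right half-lines and then matching, in so far as possible, at 0, one finds that the best choice of bounded $u_\lambda$ will be to take
\[
u_\lambda(x) = \begin{cases}
A_\lambda\exp\bigl[-\sqrt{2(1+\lambda)}\,x\bigr] + \dfrac{1}{1+\lambda} & \text{if } x\in[0,\infty)\\[2mm]
B_\lambda\exp\bigl[\sqrt{2\lambda}\,x\bigr] + \dfrac1\lambda & \text{if } x\in(-\infty,0),
\end{cases}
\]
where
\[
A_\lambda = \left(\frac{1}{\lambda(1+\lambda)}\right)^{\frac12} - \frac{1}{1+\lambda}
\quad\text{and}\quad
B_\lambda = \left(\frac{1}{\lambda(1+\lambda)}\right)^{\frac12} - \frac1\lambda.
\]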
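(The choice of sign in the exponent is dictated by my desire to have $u_\lambda$ bounded.) If $u_\lambda$ were twice continuously differentiable, I could apply the reasoning above directly and thereby arrive at $G(\lambda) = u_\lambda(0)$. However, because the second derivative of $u_\lambda$ is discontinuous at 0, I have to work a little harder. Notice that, although the second derivative of $u_\lambda$ has a discontinuity at 0, $u_\lambda'$ is nonetheless uniformly Lipschitz continuous everywhere. Hence, by taking $\rho\in C_c^\infty\bigl(\mathbb{R};[0,\infty)\bigr)$ with Lebesgue integral 1 and setting
\[
u_{\lambda,n}(x) = n\int_{\mathbb{R}} u_\lambda(x-y)\rho(ny)\,dy,\quad n\in\mathbb{Z}^+,
\]
we see that $u_{\lambda,n}\in C_b^\infty(\mathbb{R};\mathbb{R})$ for each $n\in\mathbb{Z}^+$, $u_{\lambda,n}\longrightarrow u_\lambda$ uniformly on $\mathbb{R}$ as $n\to\infty$, $\sup_{n\in\mathbb{Z}^+}\|u_{\lambda,n}\|_{C_b^2(\mathbb{R};\mathbb{R})}<\infty$, and, as $n\to\infty$,
\[
f_n\equiv\bigl(\lambda + \mathbf{1}_{[0,\infty)}\bigr)u_{\lambda,n} - \tfrac12u_{\lambda,n}'' \longrightarrow 1 \quad\text{on } \mathbb{R}\setminus\{0\}.
\]
Thus, since the argument that I attempted to apply to $u_\lambda$ works for $u_{\lambda,n}$, we know that
\[
u_{\lambda,n}(0) = E^{W^{(1)}}\left[\int_0^\infty e^{\int_0^t V_\lambda(\psi(\tau))\,d\tau} f_n\bigl(\psi(t)\bigr)\,dt\right].
\]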
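In addition, because
\[
E^{W^{(1)}}\left[\int_0^\infty \mathbf{1}_{\{0\}}\bigl(\psi(t)\bigr)\,dt\right] = \int_0^\infty\gamma_{0,t}\bigl(\{0\}\bigr)\,dt = 0,
\]
\[
E^{W^{(1)}}\left[\int_0^\infty e^{\int_0^t V_\lambda(\psi(\tau))\,d\tau} f_n\bigl(\psi(t)\bigr)\,dt\right]\longrightarrow G(\lambda).
\]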
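Hence, the conclusion $u_\lambda(0) = G(\lambda)$ has now been rigorously verified.

Knowing that $G(\lambda) = \bigl(\lambda(1+\lambda)\bigr)^{-\frac12}$, the rest of the calculation is easy. Indeed, since
\[
\int_0^\infty t^{-\frac12}e^{-\lambda t}\,dt = \sqrt{\frac\pi\lambda},
\]
the multiplication rule for Laplace transforms tells us that
\[
g(t) = \frac1\pi\int_0^t\frac{e^{-s}}{\sqrt{s(t-s)}}\,ds = \frac1\pi\int_0^1\frac{e^{-t\alpha}}{\sqrt{\alpha(1-\alpha)}}\,d\alpha;
\]
and so we now find that
\[
F(\alpha) = \frac1\pi\int_0^{\alpha\wedge1}\frac{1}{\sqrt{\beta(1-\beta)}}\,d\beta = \frac2\pi\arcsin\sqrt{\alpha\wedge1}.
\]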
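The matching conditions that determine $A_\lambda$ and $B_\lambda$ in the proof above can be checked numerically. The sketch below is only an illustration: the value $\lambda = 1.5$, the step sizes, and the helper names `make_u` and `ode_residual` are my own choices. It verifies by finite differences that $u_\lambda$ satisfies $\frac12u'' + V_\lambda u = -1$ on each half-line, that its one-sided derivatives agree at 0, and that $u_\lambda(0) = (\lambda(1+\lambda))^{-1/2}$.

```python
import math

def make_u(lam):
    """Return the bounded function u_lambda built from A_lambda and B_lambda."""
    c = 1.0 / math.sqrt(lam * (1.0 + lam))
    a = c - 1.0 / (1.0 + lam)  # A_lambda
    b = c - 1.0 / lam          # B_lambda
    def u(x):
        if x >= 0.0:
            return a * math.exp(-math.sqrt(2.0 * (1.0 + lam)) * x) + 1.0 / (1.0 + lam)
        return b * math.exp(math.sqrt(2.0 * lam) * x) + 1.0 / lam
    return u

def ode_residual(u, lam, x, h=1e-4):
    """Central-difference value of (1/2) u'' + V_lambda u + 1 at a point x with |x| > h."""
    second = (u(x + h) - 2.0 * u(x) + u(x - h)) / (h * h)
    v_lam = -lam - (1.0 if x >= 0.0 else 0.0)
    return 0.5 * second + v_lam * u(x) + 1.0

lam = 1.5
u = make_u(lam)
residuals = [ode_residual(u, lam, x) for x in (-2.0, -0.5, 0.5, 2.0)]
h0 = 1e-6
right_slope = (u(h0) - u(0.0)) / h0    # one-sided derivative from the right at 0
left_slope = (u(0.0) - u(-h0)) / h0    # one-sided derivative from the left at 0
print(residuals, right_slope, left_slope, u(0.0))
```

Since the second derivative of $u_\lambda$ jumps at 0, the residual is evaluated only at points away from the origin.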
Just as Donsker's Invariance Principle enabled us in Exercise 9.3.7 to derive the Erdős–Kac Theorem from the reflection principle for Brownian motion, it now allows us to transfer the Arcsine Law for Wiener measure to the Arcsine Law for sums of independent random variables.

Corollary 10.1.9. If $\{X_n : n\ge1\}$ is a sequence of independent, uniformly square $P$-integrable random variables with mean value 0 and variance 1 on some probability space $(\Omega,\mathcal{F},P)$, then, for every $\alpha\in[0,1]$,
\[
\lim_{n\to\infty}P\left(\left\{\omega : \frac{N_n(\omega)}{n}\le\alpha\right\}\right) = \frac2\pi\arcsin\sqrt{\alpha},
\]
where $N_n(\omega)$ is the number of $m\in\mathbb{Z}^+\cap[0,n]$ for which $S_m(\omega)\equiv\sum_{\ell=1}^m X_\ell(\omega)$ is non-negative.
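The statement of Corollary 10.1.9 is easy to explore by simulation. The following sketch makes assumptions that go beyond the text: the steps are taken to be uniform on $[-\sqrt3,\sqrt3]$ (mean 0, variance 1), and the walk length, the number of trials, and the function names are arbitrary choices of mine. It compares the empirical distribution of $N_n/n$ with $\frac2\pi\arcsin\sqrt\alpha$ at a few values of $\alpha$.

```python
import math
import random

def fraction_nonnegative(n, rng):
    """Return N_n / n for one walk, where N_n counts the m in {1, ..., n} with S_m >= 0."""
    half_width = math.sqrt(3.0)  # uniform on [-sqrt(3), sqrt(3)] has mean 0 and variance 1
    partial_sum = 0.0
    count = 0
    for _ in range(n):
        partial_sum += rng.uniform(-half_width, half_width)
        if partial_sum >= 0.0:
            count += 1
    return count / n

def arcsine_cdf(alpha):
    """Limit distribution function (2 / pi) * arcsin(sqrt(alpha)) on [0, 1]."""
    return (2.0 / math.pi) * math.asin(math.sqrt(alpha))

rng = random.Random(2)
fractions = [fraction_nonnegative(500, rng) for _ in range(4000)]
for alpha in (0.1, 0.5, 0.9):
    empirical = sum(1 for v in fractions if v <= alpha) / len(fractions)
    print(alpha, empirical, arcsine_cdf(alpha))
```

Agreement is only approximate, since both the finite walk length and the finite number of trials contribute error.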
Proof: Thinking of $\frac{N_n(\omega)}{n}$ as a Riemann approximation to (cf. the notation in § 9.2.1)
\[
\int_0^1\mathbf{1}_{[0,\infty)}\bigl(S_n(t,\omega)\bigr)\,dt,
\]
one should guess that, in view of Theorem 9.3.1 and Theorem 9.1.13, there should be very little left to be done. However, once again there are continuity
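issues that have to be dealt with. Thus, for each $f\in C\bigl(\mathbb{R};[0,1]\bigr)$ and $n\in\mathbb{Z}^+$, introduce the functions $F^f$ and $F_n^f$ on $C(\mathbb{R})$ given by
\[
F^f(\psi) = \int_0^1 f\bigl(\psi(t)\bigr)\,dt
\quad\text{and}\quad
F_n^f(\psi) = \frac1n\sum_{m=1}^n f\Bigl(\psi\bigl(\tfrac mn\bigr)\Bigr).
\]
Since $F_n^f\longrightarrow F^f$ uniformly on compacts, Theorem 9.3.1 plus Lemma 9.1.10 show that the distribution of
\[
\omega\in\Omega\longmapsto A_n^f(\omega)\equiv\frac1n\sum_{m=1}^n f\left(\frac{S_m(\omega)}{n^{\frac12}}\right)
\]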
under P tends weakly to that of ψ ∈ C(R) 7−→ F f (ψ) under W (1) . Next, for each δ ∈ (0, ∞), choose continuous functions fδ± so that 1(δ,∞) ≤ fδ+ ≤ 1[0,∞) and 1[0,∞) ≤ fδ− ≤ 1[−δ,∞) , and conclude that
+ Nn (1) fδ ≤α ≤W F ≤α lim P n→∞ n and
− Nn (1) fδ 0, apply Theorem 10.1.2 to see that (cf. the notation in Theorem 10.1.11) ψ(t ∧ ζr ) −N +2 , Ft , Wx(N ) is a bounded, non-negative martingale for every |x| > r > 0. Hence, by Theorem 7.1.14, for any 0 ≤ s ≤ t < ∞ and A ∈ Fs , h i ψ(s) −N +2 , A ∩ ζr (ψ) > s h −N +2 i (N ) = EWx ψ t ∧ ζr , A ∩ ζr (ψ) > s ; (N )
|x|−N +2 ≥ EWx
and, because $N\ge3$ and therefore $\zeta_r\nearrow\infty$ (a.s., $W_x^{(N)}$) as $r\searrow0$, an application of the Monotone Convergence Theorem and Fatou's Lemma leads to
\[
|x|^{-N+2}\ge E^{W_x^{(N)}}\Bigl[\bigl|\psi(s)\bigr|^{-N+2},\,A\Bigr]\ge E^{W_x^{(N)}}\Bigl[\bigl|\psi(t)\bigr|^{-N+2},\,A\Bigr]
\]
for all $0\le s\le t<\infty$, $A\in\mathcal{F}_s$, and $x\neq0$. In particular, this proves that
\[
\Bigl(-\bigl|\psi(t)\bigr|^{-N+2},\ \mathcal{F}_t,\ W_x^{(N)}\Bigr)
\]
is a non-positive submartingale for every $x\neq0$ and therefore, by Theorem 7.1.10, that $\lim_{t\to\infty}|\psi(t)|$ exists in $[0,\infty]$ for $W_x^{(N)}$-almost every $\psi\in C(\mathbb{R}^N)$. At the same time,
\[
W_x^{(N)}\bigl(|\psi(t)|\le R\bigr) = \gamma_{0,tI}\bigl(\{y : |y-x|\le R\}\bigr)\longrightarrow0 \quad\text{as } t\to\infty
\]
for every $R\in(0,\infty)$ and $x\in\mathbb{R}^N$; and so we now know that, at least when $x\neq0$, $|\psi(t)|\longrightarrow\infty$ for $W_x^{(N)}$-almost every $\psi\in C(\mathbb{R}^N)$. Finally, since
\[
W_0^{(N)}\Bigl(\inf_{t\ge T+1}|\psi(t)|\le R\Bigr) = \int_{\mathbb{R}^N\setminus\{0\}} W_x^{(N)}\Bigl(\inf_{t\ge T}|\psi(t)|\le R\Bigr)\,\gamma_{0,I}(dx),
\]
the same result also holds when $x = 0$.

The conclusion drawn in the preceding is sometimes summarized as the statement that Brownian motion in three or more dimensions is transient.
Exercises for § 10.1 Exercise 10.1.13. Referring to § 8.4.1, define U(t, x, θ) by (8.5.1), and let (N ) Ux ∈ M1 C(RN ) denote the W (N ) -distribution of θ U (N ) ( · , x, θ). Given N G a non-empty open set G ⊆ R × R , define ζs (ψ) as in Theorem 10.1.2, and show that for each w ∈ C 1,2 (G; R) ∩ Cb (G; R) and f ∈ Cb (G; R) satisfying 1 1 ∂t w(t, y) − ∆w(t, y) + y, ∇w(t, y) RN ≥ f in G, 2 2 ! Z t∧ζsG (ψ) G G (N ) w s + t ∧ ζs (ψ), ψ(t ∧ ζs ) − f s + τ, ψ(τ ) dτ, Ft , Ux 0
is a submartingale for all (s, x) ∈ G. Exercise 10.1.14. Let h be the function described in (10.1.7), and show that h 1 RT i (N ) h t, x, ψ(T ) |ψ(τ )|2 dτ −2 Wx 0 . E e σ {ψ(T )} = (N ) g T, ψ(T ) − x
Next, referring to Exercise 8.3.21, set `T,x,y (t) = TT−t x + Tt y for t ∈ [0, T ], let (N ) WT,x,y ∈ M1 C([0, T ]; RN ) denote the W (N ) -distribution of θ `T,x,y + θT [0, T ], and show that i h 1 RT (N ) h(t, x, y) |ψ(τ )|2 dτ − . = (N ) EWT ,x,y e 2 0 g (T, y − x) Exercise 10.1.15. The purpose of this exercise is to examine the assertion made in Remark 10.1.10 about the characterization of the arcsine distribution (i.e., the Borel probability √ measure on [0, 1] with distribution function x ∈ [0, 1] 7−→ F (x) = π2 arcsin x ∈ [0, 1]). Specifically, the goal is to show that the arcsine distribution is the one and only Borel probability measure on [0, 1] that is absolutely continuous with respect to Lebesgue measure and invariant under x ∈ [0, 1] 7−→ 4x(1 − x) ∈ [0, 1]. 2 ∈ [0, 1], and show that a Borel (i) Define x ∈ [0, 1] 7−→ Φ(x) = sin πx 2 probability measure µ on [0, 1] is invariant under x 4x(1−x) if and only if Φ∗ µ is invariant under x 2x mod 1. Conclude that the desired characterization of the arcsine distribution is equivalent to showing that Lebesgue measure λ[0,1] on [0, 1] is the one and only Borel probability measure on [0, 1] that is absolutely continuous with respect to Lebesgue measure and invariant under x 2x mod 1.
(ii) Suppose that $\mu$ is a Borel probability measure on $[0,1]$ that is invariant under $x\rightsquigarrow 2x \bmod 1$ and assigns probability 0 to $\{0\}$. Set $F(x) = \mu\bigl([0,x]\bigr)$, the distribution function for $\mu$, and use induction on $n\ge0$ to show that
\[
F(x) = \sum_{m=0}^{2^n-1}\Bigl(F\bigl(m2^{-n}+x2^{-n}\bigr) - F\bigl(m2^{-n}\bigr)\Bigr) \quad\text{for } x\in[0,1].
\]
(iii) Now add the assumption that $\mu\ll\lambda_{[0,1]}$, let $f$ be the corresponding Radon–Nikodym derivative, and extend $f$ to $\mathbb{R}$ by taking $f = 0$ off of $[0,1]$. Given $0\le x<x+y\le1$, conclude that
\[
\bigl|F(x+y)-F(x)-F(y)\bigr|\le\int_{\mathbb{R}}\bigl|f\bigl(t+x2^{-n}\bigr)-f(t)\bigr|\,dt\longrightarrow0
\]
as $n\to\infty$. In other words, $F(x+y) = F(x)+F(y)$ whenever $0\le x<x+y\le1$. Finally, after combining this with the facts that $F(0) = 0$, $F(1) = 1$, and $F$ is continuous, conclude that $F(x) = x$ for $x\in[0,1]$. In view of part (i), this completes the proof that the arcsine distribution admits the asserted characterization.

(iv) To see that absolute continuity is absolutely essential in the preceding considerations, consider any Borel probability measure $M$ on $\{0,1\}^{\mathbb{Z}^+}$ that is stationary in the sense that the $M$-distribution of
\[
\omega\in\{0,1\}^{\mathbb{Z}^+}\longmapsto(\omega_2,\dots,\omega_{n+1},\dots)\in\{0,1\}^{\mathbb{Z}^+}
\]
is again $M$. Show that the $M$-distribution $\mu$ of
\[
\omega\in\{0,1\}^{\mathbb{Z}^+}\longmapsto\sum_{n=1}^\infty2^{-n}\omega_n\in[0,1]
\]
is invariant under $x\rightsquigarrow2x\bmod1$. In particular, this means that, for each $p\in(0,1)\setminus\{\frac12\}$, the $\mu_p$ described in Exercise 1.4.29 is a non-atomic, Borel probability measure on $[0,1]$ that is invariant under $x\rightsquigarrow2x\bmod1$ but singular to Lebesgue measure.
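The invariance asserted in Exercise 10.1.15 can at least be observed empirically before it is proved. The sketch below is not a solution to the exercise. It samples the arcsine distribution as $\sin^2(\pi U/2)$ with $U$ uniform on $[0,1]$ (a sampling recipe I am assuming, which follows from inverting $F$), pushes the samples through $x\mapsto4x(1-x)$, and compares the resulting empirical distribution function with $F(x) = \frac2\pi\arcsin\sqrt x$. The function names are illustrative.

```python
import math
import random

def arcsine_cdf(x):
    """Distribution function F(x) = (2 / pi) * arcsin(sqrt(x)) on [0, 1]."""
    return (2.0 / math.pi) * math.asin(math.sqrt(x))

def sample_arcsine(rng):
    """Draw from the arcsine distribution by inverting F at a uniform variate."""
    return math.sin(0.5 * math.pi * rng.random()) ** 2

def quadratic_map(x):
    """The map x -> 4x(1 - x) of [0, 1] into itself."""
    return 4.0 * x * (1.0 - x)

rng = random.Random(3)
mapped = [quadratic_map(sample_arcsine(rng)) for _ in range(100_000)]
checks = []
for x in (0.1, 0.25, 0.5, 0.75, 0.9):
    empirical = sum(1 for v in mapped if v <= x) / len(mapped)
    checks.append((x, empirical, arcsine_cdf(x)))
print(checks)
```

Each triple lists a point, the empirical distribution function of the mapped samples there, and the arcsine value, which should agree up to sampling error.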
§ 10.2 The Markov Property and Potential Theory In this section I will discuss the Markov property for Wiener measure and show how it can be used as a tool for connecting Brownian motion to partial differential equations. § 10.2.1. The Markov Property for Wiener Measure. The introduction (N ) of the translates Wx ’s facilitates the statement of the following important interpretation of Theorem 7.1.16. In its statement, and elsewhere, Σt : C(RN ) −→ C(RN ) is the time-shift map determined by Σt ψ(τ ) = ψ(t + τ ), τ ∈ [0, ∞), and when ζ is a stopping time, Σζ is the map on {ψ : ζ(ψ) < ∞} −→ C(RN ) given by Σζ ψ(τ ) = ψ ζ(ψ) + τ . Theorem 10.2.1. If ζ is a stopping time and F : C(RN ) × C(RN ) −→ [0, ∞) is a Fζ × FC(RN ) -measurable function, then Z F ψ, Σζ ψ Wx(N ) (dψ) {ψ:ζ(ψ) 0, (N ) G G (10.2.13) lim W ζ , ψ(ζ ) ∈ (0, δ) × B(a, δ) = 1. x x→a x∈G
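Proof: Set $G(a,r) = G\cap B_{\mathbb{R}^N}(a,r)$. Since it is obvious that $\zeta^{G(a,r)}$ is dominated by $\zeta^G$, there is no question that $a\in\partial_{\mathrm{reg}}G\implies a\in\partial_{\mathrm{reg}}G(a,r)$. On the other hand, if $a\in\partial_{\mathrm{reg}}G(a,r)$ and $\epsilon>0$, then, for all $0<\delta<\epsilon$,
\[
\varlimsup_{\substack{x\to a\\ x\in G}}W_x^{(N)}\bigl(\zeta^G\ge\epsilon\bigr)\le\varlimsup_{\substack{x\to a\\ x\in G}}W_x^{(N)}\bigl(\zeta^G\ge\delta\bigr)
\le\varlimsup_{\substack{x\to a\\ x\in G(a,r)}}W_x^{(N)}\bigl(\zeta^{G(a,r)}\ge\delta\bigr)+\varlimsup_{\substack{x\to a\\ x\in G}}W_x^{(N)}\bigl(\zeta^{B_{\mathbb{R}^N}(a,r)}\le\delta\bigr)
\le W^{(N)}\Bigl(\sup_{t\in[0,\delta]}|\psi(t)|\ge\frac r2\Bigr)\longrightarrow0 \quad\text{as }\delta\searrow0.
\]
Hence, we have now also proved that $a\in\partial_{\mathrm{reg}}G(a,r)\implies a\in\partial_{\mathrm{reg}}G$.

Next, let $a\in\partial G$. To check the equivalence in (10.2.12), use the first part of (10.2.10) and the Markov property to see that
\[
x\in\mathbb{R}^N\longmapsto W_x^{(N)}\bigl(\zeta_s^G\ge\delta\bigr) = \int_{\mathbb{R}^N}W_y^{(N)}\bigl(\zeta^G\ge\delta-s\bigr)\,g^{(N)}(s,y-x)\,dy\in[0,1]
\]
is a continuous function for every $s\in(0,\infty)$, and therefore that
\[
x\in\mathbb{R}^N\longmapsto W_x^{(N)}\bigl(\zeta_{0+}^G\ge\delta\bigr) = \lim_{s\searrow0}W_x^{(N)}\bigl(\zeta_s^G\ge\delta\bigr)
\]
is upper semicontinuous for all $\delta\ge0$. In particular, if $W_a^{(N)}\bigl(\zeta_{0+}^G>0\bigr) = 0$, then, because $\zeta^G(\psi) = \zeta_{0+}^G(\psi)$ when $\psi(0)\in G$, it follows that
\[
\lim_{\substack{x\to a\\ x\in G}}W_x^{(N)}\bigl(\zeta^G\ge\delta\bigr) = \lim_{\substack{x\to a\\ x\in G}}W_x^{(N)}\bigl(\zeta_{0+}^G\ge\delta\bigr) = 0
\]
for every $\delta>0$. To prove the converse, suppose that $a\in\partial_{\mathrm{reg}}G$, let positive $\epsilon$ and $\delta$ be given, and choose $r>0$ so that $W_x^{(N)}\bigl(\zeta^G\ge\delta\bigr)\le\epsilon$ for $x\in G\cap B(a,r)$. Then, by the second part of (10.2.10), the Markov property, and (4.3.13), for each $s\in(0,\delta)$ one has
\[
W_a^{(N)}\bigl(\zeta_{0+}^G\ge2\delta\bigr)\le E^{W_a^{(N)}}\Bigl[W_{\psi(s)}^{(N)}\bigl(\zeta^G\ge\delta\bigr),\ \psi(s)\in G\Bigr]
\le\epsilon+W_a^{(N)}\bigl(\psi(s)\notin B(a,r)\bigr)\le\epsilon+2Ne^{-\frac{r^2}{2Ns}},
\]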
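from which $W_a^{(N)}\bigl(\zeta_{0+}^G>0\bigr) = 0$ follows when first $s\searrow0$ and then $\epsilon\searrow0$.

Now, assume that $a\in\partial_{\mathrm{reg}}G$, and observe that, for each $0<\epsilon<\delta$,
\[
W_x^{(N)}\Bigl(\psi\bigl(\zeta^G\bigr)\notin B(a,\delta)\ \text{or}\ \zeta^G\ge\delta\Bigr)\le W_x^{(N)}\bigl(\zeta^G\ge\epsilon\bigr)+W_x^{(N)}\Bigl(\sup_{t\in[0,\epsilon]}|\psi(t)-a|\ge\delta\Bigr).
\]
Hence, (10.2.9) and (4.3.13) together imply that
\[
\varlimsup_{\substack{x\to a\\ x\in G}}W_x^{(N)}\Bigl(\psi\bigl(\zeta^G\bigr)\notin B(a,\delta)\ \text{or}\ \zeta^G\ge\delta\Bigr)\le2N\exp\Bigl(-\frac{\delta^2}{2N\epsilon}\Bigr),
\]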
from which (10.2.13) follows after one lets $\epsilon\searrow0$.

In view of the last part of Theorem 10.2.4 and (10.2.13), the following statement is obvious.

Theorem 10.2.14. Let $G$ be a non-empty open subset of $\mathbb{R}^N$ and $f:\partial G\longrightarrow\mathbb{R}$ a bounded, Borel measurable function. If $u$ is given by (10.2.8), then $u$ is a bounded harmonic function in $G$, and, for every $a\in\partial_{\mathrm{reg}}G$ at which $f$ is continuous, $u(x)\longrightarrow f(a)$ as $x\to a$ through $G$.

Before closing this brief introduction to one of the most successful applications of probability theory to partial differential equations, it seems only appropriate to check that the conclusion in Theorem 10.2.14 is equivalent to the classical one at which analysts arrived. To be precise, recall the famous program, initiated by O. Perron and completed by Wiener, M. Brélot, and others, for solving the Dirichlet problem. Namely, given a bounded, non-empty open set $G$ in $\mathbb{R}^N$ and an $f\in C(\partial G;\mathbb{R})$, consider the set $U(f)$ of lower semicontinuous functions $w:G\longrightarrow\mathbb{R}$ that are bounded below and satisfy the super-mean value property
\[
B(x,r)\subset\subset G\implies w(x)\ge\fint_{S^{N-1}}w(x+r\omega)\,\lambda_{S^{N-1}}(d\omega),
\]
and the boundary condition
\[
\varliminf_{\substack{x\to a\\ x\in G}}w(x)\ge f(a) \quad\text{for all } a\in\partial G.
\]
At the same time, define $L(f)$ to be the set of $v:G\longrightarrow\mathbb{R}$ such that $-v\in U(-f)$. Finally, given $a\in\partial G$, say that $a$ admits a barrier if, for some $r>0$, there exists an $\eta\in C^2\bigl(G\cap B(a,r);(0,\infty)\bigr)$ such that
\[
\lim_{\substack{x\to a\\ x\in G\cap B(a,r)}}\eta(x) = 0 \quad\text{and}\quad \Delta\eta\le-\epsilon \ \text{ for some } \epsilon>0.
\]
A famous theorem³ proved by Wiener states that
\[
\inf\{w(x) : w\in U(f)\} = \sup\{v(x) : v\in L(f)\} \quad\text{for all } x\in G
\]
and that if $H_f(x)$ denotes this common value, then $x\rightsquigarrow H_f(x)$ is a bounded harmonic function on $G$ with the property that
\[
\lim_{\substack{x\to a\\ x\in G}}H_f(x) = f(a) \quad\text{for } a\in\partial G \text{ that admit a barrier.}
\]
Theorem 10.2.15. Referring to the preceding paragraph, the function Hf described there coincides with the function u in (10.2.8). In addition, a boundary point a ∈ ∂G is regular (i.e., (10.2.9) holds) if and only if it admits a barrier. Proof: To prove the first part, all that I have to do is check that v ≤ u ≤ w for all v ∈ L(f ) and w ∈ U(f ). For this purpose, set r(x) = 12 |x − G{|, and define {ζ n : n ≥ 0} so that ζ 0 = 0 and ζ n+1 (ψ) = inf t ≥ ζ n (ψ) : |ψ(t) − ψ(ζ n )| ≥ r ψ(ζ n ) for n ≥ 0,
with the usual understanding that ζ n (ψ) = ∞ =⇒ ζ n+1 (ψ) = ∞. An easy inductive argument shows that all the ζ n ’s are stopping times. In addition, it is clear that ζ n ≤ ζ n+1 ≤ ζ G . I now want to show that ζ G (ψ) < ∞ =⇒ ζ n (ψ) % ζ G (ψ). To this end, suppose that supn≥0 ζ n (ψ) < ζ G (ψ) < ∞, in which case there exists an > 0 such that r ψ(ζ n ) ≥ for all n ≥ 0. But this would mean that {ζ n (ψ) : n ≥ 0} is a bounded sequence for which inf n≥0 |ψ(ζ n+1 ) − ψ(ζ n )| ≥ , which contradicts the continuity of ψ. Finally, choose a reference point y ∈ G, and set Xn (ψ) equal to ψ(ζ n ) or y according to whether ζ n (ψ) < ∞ or not, Rn (ψ) = r Xn (ψ) , and Bn (ψ) = B Xn (ψ), Rn (ψ) , the ball around Xn (ψ) of radius Rn (ψ), and observe that ζ n (ψ) < ∞ =⇒ ζ n+1 (ψ) = ζ n (ψ) + ζ Bn (ψ) ◦ Σζ n (ψ). With these preparations at hand, let w ∈ U(f ) and x ∈ G be given. By Theorem 10.2.1 and the preceding, (N ) EWx w ψ(ζ n+1 ) , ζ n+1 (ψ) < ∞ Z Z (N ) = w ψ 0 (ζ Bn (ψ) ) WXn (ψ) (dψ 0 ) Wx(N ) (ψ) {ψ:ζ n (ψ) 0 and satisfies (10.3.19). Proof: The only assertion that has not already been proved is that the u described takes on the correct initial value. However, because q V (t, x, y) ≤ + ekV ku g (N ) (t, y − x), it is clear that, for each r > 0, Z
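\[
\lim_{t\searrow0}\sup_{x\in\mathbb{R}^N}\int_{B(x,r)^\complement}q^V(t,x,y)\,dy = 0.
\]
Hence, all that remains is to check that, for each $R>0$,
\[
\lim_{t\searrow0}\sup_{|x|\le R}\left|1-\int_{\mathbb{R}^N}q^V(t,x,y)\,dy\right| = 0.
\]
But if $K(R) = \sup_{|y|\le2R}|V(y)|$, then
\[
\sup_{|x|\le R}\left|1-\int_{\mathbb{R}^N}q^V(t,x,y)\,dy\right|\le\sup_{|x|\le R}E^{W_x^{(N)}}\Bigl[\Bigl|1-e^{\int_0^tV(\psi(\tau))\,d\tau}\Bigr|\Bigr]
\le tK(R)e^{tK(R)}+\bigl(1+e^{t\|V^+\|_u}\bigr)W^{(N)}\bigl(\|\psi\|_{[0,t]}\ge R\bigr),
\]
which, by (4.3.13), gives the desired conclusion.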
§ 10.3 Other Heat Kernels
§ 10.3.4. Ground States and Associated Measures on Pathspace. From a probabilistic standpoint, the heat kernel $q^V(t,x,y)$ is flawed by the fact that it is not a probability density. However, in many cases this flaw can be removed by what physicists call switching to the ground state representation. This terminology and the ideas underlying it are best understood when expressed in terms of operators. Thus, let $V\in C(\mathbb{R}^N;\mathbb{R})$ be bounded above, refer to the preceding subsection, and define the operator
\[
Q_t^V\varphi(x) = \int_{\mathbb{R}^N}\varphi(y)\,q^V(t,x,y)\,dy \quad\text{for } t\ge0 \text{ and } \varphi\in C_b(\mathbb{R}^N;\mathbb{R}).
\]
We know that $Q_t^V$ is a bounded map from $C_b(\mathbb{R}^N;\mathbb{R})$ into itself. In addition, by (10.3.17), $\{Q_t^V : t\ge0\}$ is a semigroup. That is, $Q_{s+t}^V = Q_t^V\circ Q_s^V$. Also, by Corollary 10.3.22, we know that if
\[
V\in C^2(\mathbb{R}^N;\mathbb{R}) \quad\text{and}\quad \max_{\|\alpha\|\le2}|\partial^\alpha V|\le C\bigl(1+V^-\bigr), \tag{10.3.23}
\]
then $(t,x)\rightsquigarrow Q_t^V\varphi(x)$ is a solution to (10.3.19).

I will say that $\rho:\mathbb{R}^N\longrightarrow\mathbb{R}$ is a ground state for $V$ if $\rho$ is a (strictly) positive, continuous function that satisfies the equation $e^{t\lambda}\rho = Q_t^V\rho$ for some $\lambda\in\mathbb{R}$ and all $t\ge0$, in which case $\lambda$ will be called the eigenvalue associated with $\rho$.

Lemma 10.3.24. Let $V$ be as above, and assume that $\rho\in C\bigl(\mathbb{R}^N;[0,\infty)\bigr)$ does not vanish identically. If $e^{t\lambda}\rho = Q_t^V\rho$ for all $t\ge0$, then $\rho$ is a ground state with associated eigenvalue $\lambda$. In fact, $\rho\in C_b^2\bigl(\mathbb{R}^N;(0,\infty)\bigr)$ if $\rho$ is bounded and $V\in C^2(\mathbb{R}^N;\mathbb{R})$ satisfies (10.3.23). Next, if $\rho$ is a twice continuously differentiable ground state with associated eigenvalue $\lambda$, then $\frac12\Delta\rho+V\rho = \lambda\rho$. Conversely, if $\rho$ is a twice continuously differentiable, bounded solution to $\frac12\Delta\rho+V\rho = \lambda\rho$, then $\rho$ is a ground state with associated eigenvalue $\lambda$.
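Proof: Since I can always replace $V$ by $V-\lambda$, I may and will assume that $\lambda = 0$ throughout. Also, observe that if $\rho\in C\bigl(\mathbb{R}^N;[0,\infty)\bigr)$ satisfies $\rho = Q_1^V\rho$, then, because $q^V(1,x,y)>0$ everywhere, $\rho>0$ everywhere unless $\rho\equiv0$. Hence, the first assertion is proved.

Next suppose that $\rho$ is a twice continuously differentiable ground state with eigenvalue 0. To see that $\frac12\Delta\rho+V\rho = 0$, it suffices to show that
\[
\bigl(\tfrac12\Delta\varphi+V\varphi,\ \rho\bigr)_{L^2(\mathbb{R}^N;\mathbb{R})} = 0 \quad\text{for all } \varphi\in C_c^\infty(\mathbb{R}^N;\mathbb{R}).
\]
To this end, let $\varphi\in C_c^\infty(\mathbb{R}^N;\mathbb{R})$ be given, and apply symmetry, Theorem 10.1.2, and Fubini's Theorem to justify
\[
0 = \bigl(\varphi,\ Q_1^V\rho-\rho\bigr)_{L^2(\mathbb{R}^N;\mathbb{R})} = \bigl(Q_1^V\varphi-\varphi,\ \rho\bigr)_{L^2(\mathbb{R}^N;\mathbb{R})}
= \int_0^1\Bigl(Q_\tau^V\bigl(\tfrac12\Delta\varphi+V\varphi\bigr),\ \rho\Bigr)_{L^2(\mathbb{R}^N;\mathbb{R})}d\tau
= \int_0^1\bigl(\tfrac12\Delta\varphi+V\varphi,\ Q_\tau^V\rho\bigr)_{L^2(\mathbb{R}^N;\mathbb{R})}d\tau
= \bigl(\tfrac12\Delta\varphi+V\varphi,\ \rho\bigr)_{L^2(\mathbb{R}^N;\mathbb{R})}.
\]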
Finally, suppose that $\rho$ is a bounded, twice continuously differentiable solution to $\frac12\Delta\rho+V\rho = 0$. Then, by Corollary 10.1.3 applied to the time-independent function $u(t,\cdot\,) = \rho$, we know that $\rho = Q_t^V\rho$ for all $t\ge0$. Thus, by the initial observation, $\rho$ is a ground state with associated eigenvalue 0.
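A concrete instance may help fix ideas. The example is mine, not the text's: in the Hermite case $V(x) = -\frac12x^2$ with $N = 1$, the Gaussian $\rho(x) = e^{-x^2/2}$ is bounded, positive, and satisfies $\frac12\rho''+V\rho = -\frac12\rho$, so the last part of Lemma 10.3.24 makes it a ground state with eigenvalue $\lambda = -\frac12$. The following sketch checks the eigenvalue equation by central differences; the function names and the step size are illustrative.

```python
import math

def rho(x):
    """Candidate ground state exp(-x^2 / 2) for V(x) = -x^2 / 2 in dimension one."""
    return math.exp(-0.5 * x * x)

def potential(x):
    """The Hermite potential V(x) = -x^2 / 2."""
    return -0.5 * x * x

def eigen_residual(x, lam=-0.5, h=1e-3):
    """Central-difference value of (1/2) rho'' + V rho - lam rho at x."""
    second = (rho(x + h) - 2.0 * rho(x) + rho(x - h)) / (h * h)
    return 0.5 * second + potential(x) * rho(x) - lam * rho(x)

ground_state_residuals = [eigen_residual(x) for x in (-2.0, -0.7, 0.0, 0.4, 1.5)]
print(ground_state_residuals)
```

All residuals should vanish up to discretization error. For this $\rho$ one has $\nabla\log\rho(x) = -x$, which is the drift that appears in the ground state representation.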
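Theorem 10.3.25. Let $V\in C(\mathbb{R}^N;\mathbb{R})$ be bounded above, assume that $\rho$ is a ground state for $V$ with associated eigenvalue $\lambda$, and set
\[
p^\rho(t,x,y) = e^{-t\lambda}\rho(x)^{-1}q^V(t,x,y)\rho(y) \quad\text{for } (t,x,y)\in(0,\infty)\times\mathbb{R}^N\times\mathbb{R}^N.
\]
Then $p^\rho$ is a strictly positive, continuous function, $p^\rho(t,x,\cdot\,)$ has total integral 1 for all $(t,x)\in(0,\infty)\times\mathbb{R}^N$,
\[
\lim_{t\searrow0}\sup_{|x|\le R}\int_{B(x,r)^\complement}p^\rho(t,x,y)\,dy = 0 \quad\text{for all } r,R\in(0,\infty),
\]
and
\[
p^\rho(s+t,x,y) = \int_{\mathbb{R}^N}p^\rho(t,z,y)\,p^\rho(s,x,z)\,dz.
\]
Finally, if $V\in C^2(\mathbb{R}^N;\mathbb{R})$ satisfies (10.3.23), then $x\rightsquigarrow p^\rho(t,x,y)$ is twice continuously differentiable for each $(t,y)\in(0,\infty)\times\mathbb{R}^N$, $y\rightsquigarrow\partial_x^\alpha p^\rho(t,x,y)$ is twice continuously differentiable for each $\alpha$ with $\|\alpha\|\le2$ and $(t,x)\in(0,\infty)\times\mathbb{R}^N$, and
\[
\partial_tp^\rho(t,x,y) = \tfrac12\Delta_xp^\rho(t,x,y)+\bigl(\nabla_x(\log\rho),\nabla_xp^\rho(t,x,y)\bigr)_{\mathbb{R}^N}
= \tfrac12\Delta_yp^\rho(t,x,y)-\operatorname{div}_y\bigl(p^\rho(t,x,y)\nabla\log\rho(y)\bigr)
\]
for all $(t,x,y)\in(0,\infty)\times\mathbb{R}^N\times\mathbb{R}^N$. In particular, for each $\varphi\in C_b(\mathbb{R}^N;\mathbb{R})$, the function
\[
u(t,x) = \int_{\mathbb{R}^N}\varphi(y)\,p^\rho(t,x,y)\,dy
\]
is the one and only bounded $u\in C^{1,2}\bigl((0,\infty)\times\mathbb{R}^N;\mathbb{R}\bigr)$ that satisfies
\[
\partial_tu(t,x) = \tfrac12\Delta u(t,x)+\bigl(\nabla\log\rho(x),\nabla u(t,x)\bigr)_{\mathbb{R}^N} \ \text{ in } (0,\infty)\times\mathbb{R}^N
\quad\text{and}\quad
\lim_{t\searrow0}u(t,x) = \varphi(x) \ \text{ uniformly on compacts.}
\]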
Proof: The only assertion that is not an immediate consequence of Theorem 10.3.21, Corollary 10.3.22, and the preceding lemma is the uniqueness in the final part, which is an easy consequence of the corresponding uniqueness statement in Corollary 10.3.22. Indeed, if $u$ is a bounded solution to the given Cauchy initial value problem and $w(t,\cdot\,) = \rho u(t,\cdot\,)$, then $w$ is a bounded solution to $\partial_tw = \frac12\Delta w+(V-\lambda)w$ with initial condition $\rho\varphi$. Hence, by the uniqueness result in Corollary 10.3.22, $w(t,\cdot\,) = e^{-t\lambda}Q_t^V(\rho\varphi)$, and so $u(t,x) = \int_{\mathbb{R}^N}\varphi(y)\,p^\rho(t,x,y)\,dy$.

The advantage that $p^\rho(t,x,y)$ has over $q^V(t,x,y)$ is that we can construct measures on $C(\mathbb{R}^N)$ that bear the same relationship to it as the Wiener measures $W_x^{(N)}$ bear to the classical heat kernel $g^{(N)}(t,y-x)$.
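Theorem 10.3.26. Let $V\in C(\mathbb{R}^N;\mathbb{R})$ be bounded above, and assume that $\rho$ is a ground state for $V$ with associated eigenvalue $\lambda$. Then, for each $x\in\mathbb{R}^N$, there is a unique $P_x^\rho\in M_1\bigl(C(\mathbb{R}^N)\bigr)$ such that, for each $n\ge1$, $0 = t_0<t_1<\cdots<t_n$, and $\Gamma_1,\dots,\Gamma_n\in\mathcal{B}_{\mathbb{R}^N}$,
\[
P_x^\rho\bigl(\psi(t_m)\in\Gamma_m,\ 1\le m\le n\bigr) = \int\cdots\int_{\Gamma_1\times\cdots\times\Gamma_n}\prod_{m=1}^np^\rho\bigl(t_m-t_{m-1},y_{m-1},y_m\bigr)\,dy_1\cdots dy_n,
\]
where $y_0 = x$. In fact, if
\[
R^\rho(t,\psi) = e^{-t\lambda}\rho\bigl(\psi(0)\bigr)^{-1}E^V(t,\psi)\,\rho\bigl(\psi(t)\bigr) \quad\text{where } E^V(t,\psi) = e^{\int_0^tV(\psi(\tau))\,d\tau},
\]
then
\[
P_x^\rho(A) = E^{W_x^{(N)}}\bigl[R^\rho(t),\ A\bigr] \quad\text{for all } t\ge0 \text{ and } A\in\mathcal{F}_t.
\]
Finally, $x\rightsquigarrow P_x^\rho$ is continuous, and, for any stopping time $\zeta$,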
Z F ψ, Σζ ψ
Pρx (dψ)
{ζ(ψ) 0, set eξ (y) = e −1(ξ,y)RN ,
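\[
f(y) = \frac{|\xi|^2}{2}-\sqrt{-1}\,\bigl(\xi,b(y)\bigr)_{\mathbb{R}^N}, \quad\text{and}\quad E_\xi^R(t,\psi) = \exp\left(\int_0^{t\wedge\zeta^{B(0,R)}(\psi)}f\bigl(\psi(\tau)\bigr)\,d\tau\right).
\]
By choosing $\varphi\in C_c^\infty(\mathbb{R}^N;\mathbb{C})$ so that $\varphi = e_\xi$ on $B(0,2R)$ and applying Doob's Stopping Time Theorem, we know that $\bigl(M_\xi^R(t),\mathcal{F}_t,P\bigr)$ is a martingale, where
\[
M_\xi^R(t,\psi) = e_\xi\bigl(\psi(t\wedge\zeta^{B(0,R)})\bigr)+\int_0^{t\wedge\zeta^{B(0,R)}(\psi)}f\bigl(\psi(\tau)\bigr)e_\xi\bigl(\psi(\tau)\bigr)\,d\tau.
\]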
Thus, by Theorem 7.1.17,
\[
\left(E_\xi^R(t)M_\xi^R(t)-\int_0^{t\wedge\zeta^{B(0,R)}(\psi)}M_\xi^R(\tau,\psi)f\bigl(\psi(\tau)\bigr)E_\xi^R(\tau,\psi)\,d\tau,\ \mathcal{F}_t,\ P\right)
\]
is also a martingale. At the same time, after performing elementary calculus operations, one sees that
\[
\exp\left(\sqrt{-1}\,\bigl(\xi,B(t\wedge\zeta^{B(0,R)}(\psi))\bigr)_{\mathbb{R}^N}+\frac{|\xi|^2}{2}\,t\wedge\zeta^{B(0,R)}(\psi)\right)
= E_\xi^R(t)M_\xi^R(t)-\int_0^{t\wedge\zeta^{B(0,R)}(\psi)}M_\xi^R(\tau,\psi)f\bigl(\psi(\tau)\bigr)E_\xi^R(\tau,\psi)\,d\tau.
\]
Hence
\[
\left(\exp\left(\sqrt{-1}\,\bigl(\xi,B(t\wedge\zeta^{B(0,R)}(\psi))\bigr)_{\mathbb{R}^N}+\frac{|\xi|^2}{2}\,t\wedge\zeta^{B(0,R)}(\psi)\right),\ \mathcal{F}_t,\ P\right)
\]
is a martingale for every $R>0$, and so, after letting $R\to\infty$, we know, by Theorem 7.1.7, that $\bigl(B(t),\mathcal{F}_t,P\bigr)$ is a Brownian motion.

It is important to be clear about what Lemma 10.3.28 says and what it does not say. It says that there is a progressively measurable $B:[0,\infty)\times C(\mathbb{R}^N)\longrightarrow\mathbb{R}^N$ such that $\bigl(B(t),\mathcal{F}_t,P\bigr)$ is a Brownian motion and
\[
\psi(t) = x+B(t,\psi)+\int_0^tb\bigl(\psi(\tau)\bigr)\,d\tau,\quad(t,\psi)\in[0,\infty)\times C(\mathbb{R}^N). \tag{*}
\]
In the probabilistic literature, this would be summarized by saying that $P$ is the distribution of a Brownian motion with drift $b$. What Lemma 10.3.28 does not say is that one can always use (*) to reconstruct $\psi$ from $B(\,\cdot\,,\psi)$. More precisely, $\psi$ is not necessarily a measurable function of $B(\,\cdot\,,\psi)$. Indeed, without additional assumptions on $b$, it will not be a measurable function of $B$. Nonetheless, if $b$ is locally Lipschitz continuous, then it will be. To see this, take $\eta\in C_c^\infty\bigl(\mathbb{R}^N;[0,1]\bigr)$ so that $\eta = 1$ on $B(0,2)$ and 0 off of $B(0,3)$, and set $b_R(y) = \eta\bigl(\frac yR\bigr)b(y)$. Then $b_R$ is uniformly Lipschitz continuous, and so, by completely standard methods (e.g., Picard iteration), one can show that there is a continuous map $\varphi\in C(\mathbb{R}^N)\longmapsto X^R(\,\cdot\,,\varphi)\in C(\mathbb{R}^N)$ such that, for each $\varphi\in C(\mathbb{R}^N)$,
\[
X^R(t,\varphi) = \varphi(t)+\int_0^tb_R\bigl(X^R(\tau,\varphi)\bigr)\,d\tau,\quad t\ge0.
\]
Moreover, if $\psi\in C(\mathbb{R}^N)$ and
\[
\psi(t) = \varphi(t)+\int_0^tb_R\bigl(\psi(\tau)\bigr)\,d\tau,\quad t\in[0,T],
\]
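then $\psi\restriction[0,T] = X^R(\,\cdot\,,\varphi)\restriction[0,T]$. Hence, if
\[
A(b) = \bigl\{\varphi\in C(\mathbb{R}^N) : \forall t\ge0\ \exists R>0\ \ \|X^R(\,\cdot\,,\varphi)\|_{[0,t]}\le R\bigr\},
\]
then $A(b)\in\mathcal{B}_{C(\mathbb{R}^N)}$, and I can define the Borel measurable map $\varphi\in C(\mathbb{R}^N)\longmapsto X_b(\,\cdot\,,\varphi)\in C(\mathbb{R}^N)$ given by
\[
X_b(t,\varphi) = \begin{cases}
X^R(t,\varphi) & \text{if } \varphi\in A(b) \text{ and } \|X^R(\,\cdot\,,\varphi)\|_{[0,t]}\le R\\
\varphi(t) & \text{if } \varphi\notin A(b).
\end{cases}
\]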
In particular, when $b$ is locally Lipschitz continuous, Lemma 10.3.28 says that $x+B(\,\cdot\,,\psi)\in A(b)$ and $\psi(t) = X_b\bigl(t,x+B(\,\cdot\,,\psi)\bigr)$ for all $(t,\psi)\in[0,\infty)\times C(\mathbb{R}^N)$.

Corollary 10.3.29. Let everything be as in Corollary 10.3.27, set $b_\rho = \nabla\log\rho$, and define the set $A(b_\rho)$ and the map $X_{b_\rho}$ accordingly, as in the preceding discussion. Then $W_x^{(N)}\bigl(A(b_\rho)\bigr) = 1$ and $P_x^\rho = (X_{b_\rho})_*W_x^{(N)}$ for all $x\in\mathbb{R}^N$.

Proof: Define $\psi\rightsquigarrow B(\,\cdot\,,\psi)$ in terms of $b_\rho$ as in Corollary 10.3.27. Then, by that corollary, we know that $W_x^{(N)}$ is the distribution of $\psi\rightsquigarrow x+B(\,\cdot\,,\psi)$ under $P_x^\rho$. Therefore, since $x+B(\,\cdot\,,\psi)\in A(b_\rho)$ and $\psi(t) = X_{b_\rho}\bigl(t,x+B(\,\cdot\,,\psi)\bigr)$ for all $(t,\psi)\in[0,\infty)\times C(\mathbb{R}^N)$, the desired conclusions follow immediately.

§ 10.3.5. Producing Ground States. As yet I have not addressed the problem of producing ground states. In this subsection I will provide two approaches. The first of these gives a criterion that guarantees the existence of a ground state for a given $V$. The second goes in the opposite direction. It is the essentially trivial remark that there are many $\rho\in C^2\bigl(\mathbb{R}^N;(0,\infty)\bigr)$ such that $\rho$ is the ground state of some $V$.

The first approach is an application of elementary spectral theory and is based on the observation that, because $q^V(t,x,y) = q^V(t,y,x)$, $Q_t^V$ is symmetric on $L^2(\mathbb{R}^N;\mathbb{R})$ in the sense that
\[
\bigl(\varphi_1,Q_t^V\varphi_2\bigr)_{L^2(\mathbb{R}^N;\mathbb{R})} = \bigl(\varphi_2,Q_t^V\varphi_1\bigr)_{L^2(\mathbb{R}^N;\mathbb{R})} \tag{10.3.30}
\]
for all $\varphi_1,\varphi_2\in C_c(\mathbb{R}^N;\mathbb{R})$. The fact that $Q_t^V$ is symmetric on $L^2(\mathbb{R}^N;\mathbb{R})$ has profound implications, a few of which are contained in the following lemma.

Lemma 10.3.31. For each $q\in[1,\infty)$ and $t\in(0,\infty)$, $Q_t^V\restriction C_c^\infty(\mathbb{R}^N;\mathbb{R})$ admits a unique extension (which I again denote by $Q_t^V$) as a bounded linear operator on $L^q(\mathbb{R}^N;\mathbb{R})$ into itself with norm at most $e^{t\|V^+\|_u}$. Moreover, for each $t>0$, $Q_t^V$ is non-negative definite and self-adjoint on $L^2(\mathbb{R}^N;\mathbb{R})$, and, for each $q\in[1,\infty)$, $Q_t^V$ takes $L^q(\mathbb{R}^N;\mathbb{R})$ into $C_b(\mathbb{R}^N;\mathbb{R})$ and
\[
\|Q_t^V\varphi\|_u\le(2\pi t)^{-\frac{N}{2q}}e^{t\|V^+\|_u}\|\varphi\|_{L^q(\mathbb{R}^N;\mathbb{R})}.
\]
Finally,
\[
\iint_{\mathbb{R}^N\times\mathbb{R}^N}q^V(t,x,y)^2\,dx\,dy = \int_{\mathbb{R}^N}q^V(2t,x,x)\,dx\le(4\pi t)^{-\frac N2}\int_{\mathbb{R}^N}e^{2tV(x)}\,dx.
\]
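Proof: Given $q\in[1,\infty)$ and a Borel measurable $\varphi:\mathbb{R}^N\longrightarrow[0,\infty)$, we have, by Jensen's Inequality, that
$$\bigl(Q_t^V\varphi(x)\bigr)^q = \Bigl(\mathbb{E}^{W_x^{(N)}}\Bigl[e^{\int_0^t V(\psi(\tau))\,d\tau}\varphi\bigl(\psi(t)\bigr)\Bigr]\Bigr)^q \le \mathbb{E}^{W_x^{(N)}}\Bigl[e^{q\int_0^t V(\psi(\tau))\,d\tau}\varphi\bigl(\psi(t)\bigr)^q\Bigr] \le e^{qt\|V^+\|_u}\int_{\mathbb{R}^N}\varphi(y)^q\,g^{(N)}(t,y-x)\,dy.$$
Hence, since $g^{(N)}(t,\cdot\,)$ has $L^1(\mathbb{R}^N;\mathbb{R})$ norm 1,
$$\|Q_t^V\varphi\|_{L^q(\mathbb{R}^N;\mathbb{R})} \le e^{t\|V^+\|_u}\|\varphi\|_{L^q(\mathbb{R}^N;\mathbb{R})},$$
and so we have proved the first assertion. In addition, if $q'$ is the Hölder conjugate of $q$, then
$$\|Q_t^V\varphi\|_u \le e^{t\|V^+\|_u}\,\|g^{(N)}(t,\cdot\,)\|_{L^{q'}(\mathbb{R}^N;\mathbb{R})}\,\|\varphi\|_{L^q(\mathbb{R}^N;\mathbb{R})} \le \frac{e^{t\|V^+\|_u}}{(2\pi t)^{\frac{N}{2q}}}\,\|\varphi\|_{L^q(\mathbb{R}^N;\mathbb{R})}.$$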
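The last inequality rests on the size of the $L^{q'}$ norm of the Gauss kernel. The following is a minimal numerical sketch of that step, assuming $N=1$ and the standard kernel $g(t,x)=(2\pi t)^{-1/2}e^{-x^2/(2t)}$; the function names, the quadrature rule, and the closed form $(2\pi t)^{-\frac{1}{2q}}(q')^{-\frac{1}{2q'}}$ (obtained from a direct Gaussian integral) are illustrative choices, not part of the text.

```python
import math


def gauss_kernel(t, x):
    """One-dimensional heat kernel g(t, x) = (2*pi*t)^(-1/2) * exp(-x^2 / (2t))."""
    return math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)


def lp_norm_of_kernel(t, p, half_width=12.0, n=20000):
    """Trapezoidal approximation of the L^p(R) norm of g(t, .).

    The integral is truncated to |x| <= half_width * sqrt(t), where the
    integrand is negligibly small.
    """
    a = -half_width * math.sqrt(t)
    h = -2.0 * a / n
    total = 0.0
    for i in range(n + 1):
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * gauss_kernel(t, a + i * h) ** p
    return (total * h) ** (1.0 / p)


def closed_form_norm(t, q):
    """Closed form of the L^{q'} norm of g(t, .) for N = 1 and q > 1."""
    q_conj = q / (q - 1.0)  # Hoelder conjugate q'
    return (2.0 * math.pi * t) ** (-1.0 / (2.0 * q)) * q_conj ** (-1.0 / (2.0 * q_conj))


def bound_from_lemma(t, q):
    """The constant (2*pi*t)^(-N/(2q)) with N = 1 that appears in the lemma."""
    return (2.0 * math.pi * t) ** (-1.0 / (2.0 * q))


if __name__ == "__main__":
    t, q = 0.7, 3.0
    q_conj = q / (q - 1.0)
    print(lp_norm_of_kernel(t, q_conj), closed_form_norm(t, q), bound_from_lemma(t, q))
```

Since $q'\ge 1$, the extra factor $(q')^{-\frac{1}{2q'}}$ is at most 1, which is why the simpler constant in the lemma suffices.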
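Thus, since $Q_t^V$ maps $C_c^\infty(\mathbb{R}^N;\mathbb{R})$ into $C_b(\mathbb{R}^N;\mathbb{R})$, it also takes $L^q(\mathbb{R}^N;\mathbb{R})$ there. Because (10.3.30) holds for elements of $C_c(\mathbb{R}^N;\mathbb{R})$, the preceding estimates make it clear that it continues to hold for elements of $L^2(\mathbb{R}^N;\mathbb{R})$. That is, $Q_t^V$ is self-adjoint on $L^2(\mathbb{R}^N;\mathbb{R})$. To see that it is non-negative definite, simply observe that
$$\bigl(\varphi, Q_t^V\varphi\bigr)_{L^2(\mathbb{R}^N;\mathbb{R})} = \bigl(Q_{\frac t2}^V\varphi, Q_{\frac t2}^V\varphi\bigr)_{L^2(\mathbb{R}^N;\mathbb{R})} \ge 0.$$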
Turning to the final estimate, note that (cf. (10.3.17))
$$\int_{\mathbb{R}^N} q^V(t,x,y)^2\,dy = \int_{\mathbb{R}^N} q^V(t,x,y)\,q^V(t,y,x)\,dy = q^V(2t,x,x).$$
At the same time, by Jensen's Inequality,
$$q^V(2t,x,x) = \mathbb{E}^{W^{(N)}}\Bigl[e^{\int_0^{2t}V(x+\psi_{2t}(\tau))\,d\tau}\Bigr]g^{(N)}(2t,0) \le (4\pi t)^{-\frac N2}\,\frac{1}{2t}\int_0^{2t}\mathbb{E}^{W^{(N)}}\Bigl[e^{2tV(x+\psi_{2t}(\tau))}\Bigr]\,d\tau,$$
and, by Tonelli's Theorem,
$$\int_{\mathbb{R}^N}\mathbb{E}^{W^{(N)}}\Bigl[e^{2tV(x+\psi(\tau))}\Bigr]\,dx = \int_{\mathbb{R}^N} e^{2tV(x)}\,dx.$$
In the language of functional analysis, the last part of Lemma 10.3.31 says that QVT is Hilbert–Schmidt and therefore compact if e2T V ∈ L1 (RN ; R). As a consequence, the elementary theory of compact, self-adjoint operators allows us to make the conclusions drawn in the following theorem.
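Theorem 10.3.32. Assume that $e^{TV}\in L^2(\mathbb{R}^N;\mathbb{R})$ for some $T\in(0,\infty)$. Then there is a unique $\rho\in C_b\bigl(\mathbb{R}^N;(0,\infty)\bigr)\cap L^2(\mathbb{R}^N;\mathbb{R})$ such that $\|\rho\|_{L^2(\mathbb{R}^N;\mathbb{R})}=1$ and $e^{t\lambda}\rho = Q_t^V\rho$ for some $\lambda\in\mathbb{R}$ and all $t\in(0,\infty)$. Moreover, if $V\in C^2(\mathbb{R}^N;\mathbb{R})$ satisfies (10.3.23), then $p^\rho(t,\cdot\,,y)\in C^2(\mathbb{R}^N;\mathbb{R})$ and
$$\partial_t p^\rho(t,x,y) = \tfrac12\Delta_x p^\rho(t,x,y) + \bigl(\nabla\log\rho(x),\nabla_x p^\rho(t,x,y)\bigr)_{\mathbb{R}^N}\quad\text{in } (0,\infty)\times\mathbb{R}^N\times\mathbb{R}^N.$$
Proof: The spectral theory of compact, self-adjoint operators guarantees that the operator $Q_T^V$ has a completely discrete spectrum and that its largest eigenvalue is
$$\alpha(T) = \sup\Bigl\{\bigl(\varphi, Q_T^V\varphi\bigr)_{L^2(\mathbb{R}^N;\mathbb{R})} : \|\varphi\|_{L^2(\mathbb{R}^N;\mathbb{R})} = 1\Bigr\}.$$
Now let $\rho$ be an $L^2(\mathbb{R}^N;\mathbb{R})$-normalized eigenvector for $Q_T^V$ with eigenvalue $\alpha(T)$. Because $\alpha(T)\rho = Q_T^V\rho$, we know that $\rho$ can be taken to be continuous. In addition, by the preceding paragraph,
$$\iint_{\mathbb{R}^N\times\mathbb{R}^N}\rho(x)\,q^V(T,x,y)\,\rho(y)\,dx\,dy = \alpha(T) \ge \iint_{\mathbb{R}^N\times\mathbb{R}^N}|\rho(x)|\,q^V(T,x,y)\,|\rho(y)|\,dx\,dy,$$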
which, because $q^V(T,x,y)>0$ for all $(x,y)$, is possible only if $\alpha(T)>0$ and $\rho$ never changes sign. Therefore we can take $\rho$ to be non-negative. But, if $\rho\ge 0$, then, since $p^\rho(T,x,y)>0$ everywhere and $\alpha(T)\rho = Q_T^V\rho$, $\rho>0$ everywhere. Thus, we have now shown that every normalized eigenvector for $Q_T^V$ with eigenvalue $\alpha(T)$ is a bounded, continuous function that, after a change of sign, can be taken to be strictly positive. In particular, if $\rho_1$ and $\rho_2$ were linearly independent, normalized eigenvectors of $Q_T^V$ with eigenvalue $\alpha(T)$, then
$$g = \frac{\rho_2 - (\rho_1,\rho_2)_{L^2(\mathbb{R}^N;\mathbb{R})}\,\rho_1}{\bigl\|\rho_2 - (\rho_1,\rho_2)_{L^2(\mathbb{R}^N;\mathbb{R})}\,\rho_1\bigr\|_{L^2(\mathbb{R}^N;\mathbb{R})}}$$
would also be such an eigenvector, and this one would be orthogonal to $\rho_1$. On the other hand, since neither $\rho_1$ nor $g$ changes sign, $(\rho_1,g)_{L^2(\mathbb{R}^N;\mathbb{R})}\neq 0$. In summary, we now know that there is, up to sign, a unique $L^2(\mathbb{R}^N;\mathbb{R})$-normalized eigenvector $\rho$ for $Q_T^V$ with eigenvalue $\alpha(T)$ and that $\rho$ can be taken to be strictly positive, bounded, and continuous.
To complete the proof, I must show that $Q_t^V\rho = e^{t\lambda}\rho$, where $\lambda = \frac1T\log\alpha(T)$. To this end, set $\rho_t = Q_t^V\rho$ for $t>0$. Then $\rho_t\in C_b\bigl(\mathbb{R}^N;(0,\infty)\bigr)$ for each $t>0$ and $t\rightsquigarrow\rho_t(x)$ is continuous for each $x\in\mathbb{R}^N$. Moreover, $Q_T^V\rho_t = Q_t^V\circ Q_T^V\rho = \alpha(T)\rho_t$. Hence, by the uniqueness proved above, $\rho_t = \alpha(t)\rho$ for some $\alpha(t)\in\mathbb{R}$. In addition, because $t\rightsquigarrow\rho_t(x)$ is continuous and strictly positive, so is $t\rightsquigarrow\alpha(t)$. Finally,
$$\alpha(s+t) = \bigl(\rho, Q_{s+t}^V\rho\bigr)_{L^2(\mathbb{R}^N;\mathbb{R})} = \alpha(s)\bigl(\rho, Q_t^V\rho\bigr)_{L^2(\mathbb{R}^N;\mathbb{R})} = \alpha(s)\alpha(t),$$
which means that $\alpha(t)=e^{t\beta}$ for some $\beta\in\mathbb{R}$, and, because $\alpha(T)=e^{T\lambda}$, this completes the proof of everything except the final statement, which is an immediate consequence of Theorem 10.3.21.
If nothing else, Theorem 10.3.32 helps to explain the terminology that I have been using. In Schrödinger mechanics, the function $\rho$ in Theorem 10.3.32 is called the ground state because it is the wave function corresponding to the lowest energy level of the quantum mechanical Hamiltonian $-\frac12\Delta - V$. From our standpoint, its importance is that it shows that lots of $V$'s admit a ground state.
I turn now to the second method for producing ground states. Namely, suppose that $\rho\in C^2\bigl(\mathbb{R}^N;(0,\infty)\bigr)$. Then, it is obvious that $\frac12\Delta\rho + V\rho = 0$, where
$$V = -\frac{\Delta\rho}{2\rho} = -\frac{\Delta\log\rho + |\nabla\log\rho|^2}{2}.$$
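The equality of the two expressions for $V$ can be checked numerically. The following is a minimal sketch, assuming $N=1$ and the hypothetical test function $\rho=\cosh$, for which $\rho''/\rho = 1$ and hence $V\equiv-\frac12$; the finite-difference helpers and step size are illustrative choices.

```python
import math


def first_derivative(f, x, h=1e-3):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def second_derivative(f, x, h=1e-3):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def potential_from_rho(rho, x):
    """V = -rho'' / (2 rho), the one-dimensional form of -Delta(rho) / (2 rho)."""
    return -second_derivative(rho, x) / (2.0 * rho(x))


def potential_from_log_rho(rho, x):
    """V = -((log rho)'' + ((log rho)')^2) / 2, the logarithmic form."""
    def log_rho(y):
        return math.log(rho(y))

    return -(second_derivative(log_rho, x) + first_derivative(log_rho, x) ** 2) / 2.0


if __name__ == "__main__":
    for x in (-1.3, 0.0, 0.4, 2.0):
        print(x, potential_from_rho(math.cosh, x), potential_from_log_rho(math.cosh, x))
```

Both columns should agree with $-\frac12$ up to discretization error, reflecting $(\log\rho)'' + ((\log\rho)')^2 = \rho''/\rho$.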
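Theorem 10.3.33. Let $U\in C^2(\mathbb{R}^N;\mathbb{R})$, and assume that both $U$ and $V^U\equiv-\frac12\bigl(\Delta U + |\nabla U|^2\bigr)$ are bounded above. Then, for each $x\in\mathbb{R}^N$, there is a unique $P_x^U\in M_1\bigl(C(\mathbb{R}^N)\bigr)$ such that $P_x^U\bigl(\psi(0)=x\bigr)=1$ and
$$\Bigl(\varphi\bigl(\psi(t)\bigr) - \int_0^t\Bigl(\tfrac12\Delta\varphi + \bigl(\nabla U,\nabla\varphi\bigr)_{\mathbb{R}^N}\Bigr)\bigl(\psi(\tau)\bigr)\,d\tau,\ \mathcal{F}_t,\ P_x^U\Bigr)$$
is a martingale for all $\varphi\in C_c^\infty(\mathbb{R}^N;\mathbb{R})$. Moreover, for each $x\in\mathbb{R}^N$,
$$\Bigl(\psi(t) - x - \int_0^t\nabla U\bigl(\psi(\tau)\bigr)\,d\tau,\ \mathcal{F}_t,\ P_x^U\Bigr)$$
is a Brownian motion and
$$P_x^U(A) = e^{-U(x)}\,\mathbb{E}^{W_x^{(N)}}\Bigl[e^{U(\psi(t)) + \int_0^t V^U(\psi(\tau))\,d\tau},\ A\Bigr]$$
for all $t\ge0$ and $A\in\mathcal{F}_t$.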
Finally, x PU x is continuous and, for any stopping time ζ and any Fζ ×BC(RN ) measurable F : C(RN ) × C(RN ) that is bounded below, Z {ζ(ψ) 0. Show that where g˜(τ, ξ) =
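$\sum_{m\in\mathbb{Z}}$
$$\int_{Q(a,R)} p^{Q(a,R)}(t,x,y)\prod_{i=1}^N\sin\frac{\pi(y_i-a_i+R)}{2R}\,dy = e^{-\frac{N\pi^2}{8R^2}t}\prod_{i=1}^N\sin\frac{\pi(x_i-a_i+R)}{2R}$$
for $(t,x,y)\in(0,\infty)\times Q(a,R)^2$. Conclude that
$$\lim_{t\to\infty}\frac1t\log W_x^{(N)}\bigl(\zeta^{Q(a,R)}>t\bigr) = -\frac{N\pi^2}{8R^2}$$
for $x\in Q(a,R)$.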
Hint: First observe that it suffices to handle $a=0$, $R=1$, and $N=1$. To prove the first part, set $u(t,x) = e^{\frac{\pi^2}{8}t}\sin\frac{\pi(x+1)}{2}$, and show that $\bigl(u(t,\psi(t)),\mathcal{F}_t,W_x^{(1)}\bigr)$ is a martingale. Given the first part, $\lim_{t\to\infty}\frac1t\log W_x^{(1)}\bigl(\zeta^{(-1,1)}>t\bigr)\ge-\frac{\pi^2}{8}$ is clear. To get the inequality in the opposite direction, note that $p^{(-1,1)}(t,x,y)\le p^{(-R,R)}(t,x,y)$ if $R>1$, and use this to see that, for $R>1$ and $(t,x)\in(0,\infty)\times(-1,1)$,
$$\int_{(-1,1)} p^{(-1,1)}(t,x,y)\sin\frac{\pi(y+R)}{2R}\,dy \le e^{-\frac{\pi^2}{8R^2}t}\sin\frac{\pi(x+R)}{2R}.$$
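The rate $\frac{\pi^2}{8}$ in the case $a=0$, $R=1$, $N=1$ comes from the fact that $\sin\frac{\pi(x+1)}{2}$ is an eigenfunction of $\frac12\frac{d^2}{dx^2}$ on $(-1,1)$ with zero boundary values and eigenvalue $-\frac{\pi^2}{8}$. The following is a minimal sketch that checks this with a finite-difference discretization; the grid size and the function name are illustrative assumptions.

```python
import math


def rayleigh_quotient(n):
    """Rayleigh quotient of the discrete operator -(1/2) d^2/dx^2 on (-1, 1).

    The operator uses n interior grid points and zero boundary values, and it
    is evaluated at the sampled function sin(pi * (x + 1) / 2).
    """
    h = 2.0 / (n + 1)
    xs = [-1.0 + (i + 1) * h for i in range(n)]
    v = [math.sin(math.pi * (x + 1.0) / 2.0) for x in xs]
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0        # Dirichlet condition at x = -1
        right = v[i + 1] if i < n - 1 else 0.0   # Dirichlet condition at x = +1
        applied = -0.5 * (left - 2.0 * v[i] + right) / (h * h)
        numerator += v[i] * applied
        denominator += v[i] * v[i]
    return numerator / denominator


if __name__ == "__main__":
    print(rayleigh_quotient(999), math.pi ** 2 / 8.0)
```

As the grid is refined, the quotient approaches $\frac{\pi^2}{8}\approx 1.2337$, the decay rate asserted in the exercise.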
Exercise 10.3.37. Let $G$ be a non-empty, bounded, connected, open subset of $\mathbb{R}^N$, and set $w(t)=\sup_{x\in G}W_x^{(N)}(\zeta^G>t)$ for $t>0$. The purpose of this exercise is to show that $\lambda_G\equiv-\lim_{t\to\infty}\frac1t\log w(t)$ exists and is an element of $(0,\infty)$.
(i) Show that $w$ is sub-multiplicative in the sense that $w(s+t)\le w(s)w(t)$, and conclude from this that $\lim_{t\to\infty}\frac1t\log w(t) = \inf_{T>0}\frac1T\log w(T)\in[-\infty,0]$.
Hint: Set $f(t)=\log w(t)$. Because $w$ takes values in $(0,1]$ and is non-increasing, $f$ is non-positive and bounded on compacts. Further, $f$ is sub-additive: $f(s+t)\le f(s)+f(t)$. Thus, given $T>0$, $f(t)\le\lfloor\frac tT\rfloor f(T)$, and so $\varlimsup_{t\to\infty}\frac1t f(t)\le\frac1T f(T)$ for every $T>0$. Conclude from this that $\lim_{t\to\infty}\frac1t f(t) = \inf_{T>0}\frac1T f(T)\in[-\infty,0]$.
(ii) Refer to the notation in Exercise 10.3.36, set $R_1=\sup\{r\ge0: Q(a,r)\subseteq G\text{ for some }a\in G\}$, and show that $\lambda_G\equiv-\lim_{t\to\infty}\frac1t\log w(t)\le\frac{N\pi^2}{8R_1^2}$. In particular, $\lambda_G<\infty$.
(iii) Let $R_2$ be the diameter of $G$, choose $a\in\mathbb{R}^N$ so that $G\subseteq B(a,R_2)$, and use the first part of Theorem 10.1.11 to show that $\mathbb{E}^{W_x^{(N)}}[\zeta^G]\le\frac{R_2^2}{N}$ for all $x\in G$. In particular, conclude that $w\bigl(2N^{-1}R_2^2\bigr)\le\frac12$ and therefore that $\lambda_G\ge\frac{N\log2}{2R_2^2}>0$.
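The sub-additivity argument in part (i) can be illustrated on a toy example. The following is a minimal sketch, assuming the hypothetical sequence $f(n)=-n+\sqrt n$ (non-positive and sub-additive, but not the $\log w$ of the exercise); it only shows that $f(n)/n$ approaches $\inf_n f(n)/n$, here equal to $-1$.

```python
import math


def f(n):
    """Hypothetical sub-additive, non-positive sequence: f(n) = -n + sqrt(n)."""
    return -n + math.sqrt(n)


def is_subadditive(limit, tolerance=1e-12):
    """Check f(s + t) <= f(s) + f(t) for all integers 1 <= s, t <= limit."""
    for s in range(1, limit + 1):
        for t in range(1, limit + 1):
            if f(s + t) > f(s) + f(t) + tolerance:
                return False
    return True


def ratios(limit):
    """The values f(n) / n for n = 1, ..., limit."""
    return [f(n) / n for n in range(1, limit + 1)]


if __name__ == "__main__":
    print(is_subadditive(200))
    values = ratios(10000)
    print(values[0], values[-1], min(values))
```

In this example the ratios decrease to their infimum, which is the behavior the hint extracts from sub-additivity in general.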
Exercise 10.3.38. Again let $G$ be a bounded, connected, open subset of $\mathbb{R}^N$. Using spectral theory, the conclusions drawn in Exercise 10.3.37 can be sharpened. Namely, this exercise outlines a proof that
$$p^G(t,x,y) = \sum_{n=0}^\infty e^{-t\lambda_n}\varphi_n(x)\varphi_n(y), \tag{10.3.39}$$
where $\{\lambda_n: n\ge0\}\subseteq(0,\infty)$ is a non-decreasing sequence that tends to $\infty$, $\{\varphi_n: n\ge0\}\subseteq C_b(G;\mathbb{R})$ is an orthonormal basis in $L^2(G;\mathbb{R})$ of smooth functions, $\lambda_0<\lambda_1$, $\varphi_0>0$, and the convergence is uniform on $[\epsilon,\infty)\times G^2$ for each $\epsilon>0$. Finally, from (10.3.39), it will follow that
$$\bigl|e^{t\lambda_0}p^G(t,x,y) - \varphi_0(x)\varphi_0(y)\bigr| \le \delta^{-1}e^{-t\delta},\quad (t,x,y)\in[1,\infty)\times G^2,$$
for some $\delta>0$. In particular, this means that $\lambda_0$ here is equal to $\lambda_G$ in Exercise 10.3.37.
Exercises for § 10.3
(i) Let $P_t^G$ be the operator on $C_b(G;\mathbb{R})$ whose kernel is $p^G(t,x,y)$, and show that $P_t^G$ admits a unique extension to $L^2(G;\mathbb{R})$ as a self-adjoint contraction. Further, show that $\{P_t^G: t>0\}$ is a continuous semigroup of non-negative definite, self-adjoint contractions on $L^2(G;\mathbb{R})$. Finally, show that
$$\iint_{G\times G} p^G(t,x,y)^2\,dx\,dy = \int_G p^G(2t,x,x)\,dx \le \frac{|G|}{(4\pi t)^{\frac N2}},$$
and therefore that each $P_t^G$ is Hilbert–Schmidt.
(ii) Knowing that the operators $P_t^G$ form a continuous semigroup of self-adjoint, Hilbert–Schmidt (and therefore compact), non-negative definite contractions, standard spectral theory2 guarantees that there exists a non-decreasing sequence $\{\lambda_n: n\ge0\}\subseteq[0,\infty)$ tending to $\infty$ and an orthonormal basis $\{\varphi_n: n\ge0\}$ in $L^2(G;\mathbb{R})$ such that $e^{-t\lambda_n}\varphi_n = P_t^G\varphi_n$ for all $t\in(0,\infty)$ and $n\ge0$. Conclude from this that $\varphi_n$ can be taken to be smooth and bounded. In addition, show that $P_t^G\varphi_0\longrightarrow0$ uniformly, and therefore that $\lambda_0>0$.
(iii) Show that
$$\bigl(\varphi, P_t^G\varphi'\bigr)_{L^2(G;\mathbb{R})} = \sum_{n=0}^\infty e^{-t\lambda_n}\bigl(\varphi,\varphi_n\bigr)_{L^2(G;\mathbb{R})}\bigl(\varphi',\varphi_n\bigr)_{L^2(G;\mathbb{R})} \tag{*}$$
for $\varphi,\varphi'\in L^2(G;\mathbb{R})$, and conclude that
$$e^{-\lambda_0} = \sup\Bigl\{\bigl(\varphi, P_1^G\varphi\bigr)_{L^2(G;\mathbb{R})}: \|\varphi\|_{L^2(G;\mathbb{R})}=1\Bigr\}.$$
Use this (cf. the proof of Theorem 10.3.32) to show that if $\lambda_n=\lambda_0$, then $\varphi_n$ never changes sign and can therefore be taken to be non-negative. In particular, show that this means that $\lambda_1>\lambda_0$ and that $\varphi_0>0$.
(iv) Starting from (*), show that
$$\sum_{n=0}^\infty e^{-t\lambda_n}\bigl(\varphi,\varphi_n\bigr)_{L^2(G;\mathbb{R})}^2 = \iint_{G\times G}\varphi(x)\,p^G(t,x,y)\,\varphi(y)\,dx\,dy \le (2\pi t)^{-\frac N2}\|\varphi\|_{L^1(G;\mathbb{R})}^2$$
for any $\varphi\in L^2(G;\mathbb{R})$, and use this to show that, for any $M\in\mathbb{N}$ and $\varphi,\varphi'\in L^2(G;\mathbb{R})$,
$$\Bigl|\sum_{n=M}^\infty e^{-t\lambda_n}\bigl(\varphi,\varphi_n\bigr)_{L^2(G;\mathbb{R})}\bigl(\varphi',\varphi_n\bigr)_{L^2(G;\mathbb{R})}\Bigr| \le \frac{e^{-\frac{t\lambda_M}{2}}}{(\pi t)^{\frac N2}}\,\|\varphi\|_{L^1(G;\mathbb{R})}\|\varphi'\|_{L^1(G;\mathbb{R})}.$$
2 What is needed here is the variant of Stone's Theorem that applies to semigroups. The technical question which his theorem addresses is that of finding a simultaneous diagonalization of the operators $P_t^G$. Because we are dealing here with compact operators, this question can be reduced to one about operators in finite dimensions, where it is quite easy to handle. For a general statement, see, for example, K. Yoshida's Functional Analysis and its Applications, Springer-Verlag (1971).
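Next, given $x,y\in G$, set $R=|x-\partial G|\wedge|y-\partial G|$, and, for $0<r\le R$, apply the preceding to see that
$$\Bigl|\sum_{n=M}^\infty e^{-t\lambda_n}\,\frac{1}{|B(x,r)|}\int_{B(x,r)}\varphi_n(z)\,dz\;\frac{1}{|B(y,r)|}\int_{B(y,r)}\varphi_n(z)\,dz\Bigr| \le \frac{e^{-\frac{t\lambda_M}{2}}}{(\pi t)^{\frac N2}}.$$
Finally, by combining this with (*), reach the conclusion that
$$\Bigl|p^G(t,x,y) - \sum_{n=0}^{M-1}e^{-t\lambda_n}\varphi_n(x)\varphi_n(y)\Bigr| \le \frac{2e^{-\frac{t\lambda_M}{2}}}{(\pi t)^{\frac N2}},$$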
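which, because $\lambda_M\longrightarrow\infty$, certainly implies the asserted convergence result.
(v) To complete the program, set $\theta = 1-\frac{\lambda_0}{\lambda_1}\in(0,1)$. Show that
$$\bigl|e^{t\lambda_0}p^G(t,x,y) - \varphi_0(x)\varphi_0(y)\bigr| \le \Bigl(\sum_{n=1}^\infty e^{-\theta t\lambda_n}\varphi_n(x)^2\Bigr)^{\frac12}\Bigl(\sum_{n=1}^\infty e^{-\theta t\lambda_n}\varphi_n(y)^2\Bigr)^{\frac12} \le e^{-\frac{\theta t\lambda_1}{2}}\,p^G\bigl(\tfrac{\theta t}{2},x,x\bigr)^{\frac12}\,p^G\bigl(\tfrac{\theta t}{2},y,y\bigr)^{\frac12} \le \frac{e^{-\frac{\theta t\lambda_1}{2}}}{(\pi\theta t)^{\frac N2}}.$$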
Exercise 10.3.40. M. Kac3 made an interesting application of (10.3.39) to a problem raised originally by the physicist H. Lorentz and solved, remarkably quickly, by H. Weyl. What Lorentz noticed is that, if one takes Planck's theory of black body radiation seriously, then the distribution of high frequencies emitted should depend only on the volume of the radiator. In order to state Lorentz's question in mathematical terms, let $G$ be a non-empty, bounded, connected, open subset of $\mathbb{R}^N$, let $\{\lambda_n: n\ge0\}$ be the eigenvalues, arranged in non-decreasing order, of $-\frac12\Delta$ with zero boundary conditions, and use $\mathbf{N}(\lambda)$ to denote the number of $n\ge0$ such that $\lambda_n\le\lambda$. What Lorentz predicted was that the rate at which $\mathbf{N}(\lambda)$ grows as $\lambda\to\infty$ depends only on the volume $|G|$ of $G$ and on nothing else about $G$. Thus, the original interest in the result was that the asymptotic distribution of high frequencies is so insensitive to the shape of the radiator. When Kac took up the problem, he turned it around. Namely, he asked what geometric information, besides the volume, is encoded in the eigenvalues. When he explained his program to L. Bers, Bers rephrased the problem in the terms that Kac adopted for his title. Audiophiles will be disappointed to learn that, according to C. Gordon, D. Webb, and S. Wolpert,4 one cannot hear the shape of a drum, even a two dimensional one. This exercise outlines Kac's argument for proving Weyl's asymptotic formula
$$\mathbf{N}(\lambda)\sim\frac{|G|\,\lambda^{\frac N2}}{(2\pi)^{\frac N2}\,\Gamma\bigl(\frac N2+1\bigr)},$$
3 See Kac's wonderful article "Can one hear the shape of a drum?," Am. Math. Monthly 73 # 4, pp. 1–23 (1966), or, better yet, borrow the movie from the A.M.S.
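in the sense that the ratio of the two sides tends to 1 as $\lambda\to\infty$.
(i) Refer to Exercise 10.3.38, and show that, for each $n\ge0$,
$$\tfrac12\Delta\varphi_n = -\lambda_n\varphi_n\quad\text{and}\quad\lim_{\substack{x\to a\\ x\in G}}\varphi_n(x)=0\ \text{ for } a\in\partial_{\mathrm{reg}}G.$$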
Thus, I will interpret the $\lambda_n$'s in Exercise 10.3.38 as the frequencies referred to in Lorentz's problem.
(ii) Using (10.3.39), show that
$$\int_{(0,\infty)}e^{-t\lambda}\,\mathbf{N}(d\lambda) = \sum_{n=0}^\infty e^{-t\lambda_n} = \int_G p^G(t,x,x)\,dx,$$
where $\mathbf{N}(d\lambda)$ denotes integration with respect to the purely atomic measure on $(0,\infty)$ determined by the non-decreasing function $\lambda\rightsquigarrow\mathbf{N}(\lambda)$.
(iii) Using (10.3.8), show that
$$1\ge(2\pi t)^{\frac N2}p^G(t,x,x)\ge1-E(t,x),$$
where $E(t,x)\ge0$ and, as $t\searrow0$, $E(t,x)\longrightarrow0$ uniformly on compact subsets of $G$. Conclude that
$$\lim_{t\searrow0}(2\pi t)^{\frac N2}\int_{(0,\infty)}e^{-t\lambda}\,\mathbf{N}(d\lambda) = |G|.$$
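Both the heat-trace limit in (iii) and Weyl's formula can be checked by hand in cases where the eigenvalues are explicit. The following is a minimal sketch under two assumptions that are not taken from the text: for the interval $G=(0,\pi)$ the Dirichlet eigenvalues of $-\frac12\Delta$ are $\frac{n^2}{2}$, $n\ge1$, and for the square $G=(0,\pi)^2$ they are $\frac{m^2+n^2}{2}$, $m,n\ge1$. In the second case Weyl's formula predicts $\frac{\pi^2\lambda}{2\pi\,\Gamma(2)}=\frac{\pi\lambda}{2}$.

```python
import math


def scaled_heat_trace(t, terms=100000):
    """(2*pi*t)^(1/2) * sum_n exp(-t * n^2 / 2) for the interval (0, pi)."""
    total = 0.0
    for n in range(1, terms + 1):
        term = math.exp(-t * n * n / 2.0)
        if term == 0.0:  # remaining terms underflow to zero
            break
        total += term
    return math.sqrt(2.0 * math.pi * t) * total


def weyl_ratio_square(lam):
    """Ratio of the eigenvalue count to Weyl's prediction for (0, pi)^2.

    `lam` is an integer level; the count is the number of pairs (m, n) with
    m, n >= 1 and (m^2 + n^2) / 2 <= lam.
    """
    bound = 2 * lam
    count = 0
    for m in range(1, math.isqrt(bound) + 1):
        remainder = bound - m * m
        count += math.isqrt(remainder)  # number of n >= 1 with n^2 <= remainder
    return count / (math.pi * lam / 2.0)


if __name__ == "__main__":
    for t in (1e-2, 1e-3, 1e-4):
        print(t, scaled_heat_trace(t), math.pi)
    for lam in (50, 500, 5000):
        print(lam, weyl_ratio_square(lam))
```

The scaled trace tends to $\pi=|G|$ as $t\searrow0$, and the ratio tends to 1 from below as $\lambda$ grows; the slow approach reflects the lower-order boundary terms that, as the text notes, Tauberian methods do not capture.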
At this point, Kac invoked Karamata's Tauberian Theorem,5 which relates the asymptotics at infinity of an increasing function to the asymptotics at zero of its Laplace transform. Given the preceding, Karamata's theorem yields Weyl's asymptotic formula. It should be pointed out that the weakness of Kac's method is its reliance on the Laplace transform and Tauberian theory, which gives only the principal term in the asymptotics. Further information can be obtained using Fourier methods, which, in terms of partial differential equations, means that one is replacing the heat equation by the wave equation, an equation about which probability theory has embarrassingly little to say.
4 See their 1992 announcement in B.A.M.S., new series 27 (2), "One cannot hear the shape of a drum."
5 See, for example, Theorem 1.7.6 in N. Bingham, C. Goldie, and J. Teugels's Regularly Varying Functions, Cambridge U. Press (1987).
Exercise 10.3.41. It will have occurred to most readers that the relation between the Hermite heat kernel in (10.1.7) and the Ornstein–Uhlenbeck process in § 8.4.1 is the archetypal example of what we have been doing in this section. This exercise gives substance to this remark.
(i) Set $\rho_\pm(x)=e^{\pm\frac{|x|^2}{2}}$, and show that $\frac12\Delta\rho_\pm-\frac12|x|^2\rho_\pm=\pm\frac N2\rho_\pm$. By Lemma 10.3.24, $\rho_-$ is a ground state for $-\frac{|x|^2}{2}$ with associated eigenvalue $-\frac N2$, a fact that also can be verified by direct computation using (10.1.7). Show that the measure $P_x^{\rho_-}$ is the distribution under $W^{(N)}$ of $\{2^{-\frac12}\mathbf{U}(2t,2^{\frac12}x,\theta): t\ge0\}$, where $\mathbf{U}(t,x,\theta)$ is the Ornstein–Uhlenbeck process described in (8.5.1).
(ii) Although it does not follow from Lemma 10.3.24, use (10.1.7) to show that 2 ρ+ is also a ground state for − |x|2 with associated N2 . (See Exercise 10.3.43.) 1 ρ Also, show that Px+ is the W (N ) -distribution of {θ ∈ et x+2− 2 V(2t, θ) : t ≥ 0}, where {V(t, θ) : t ≥ 0} is the process discussed in Exercise 8.5.14. x2
n
x2
d − 2 Exercise 10.3.42. Recall the Hermite polynomials Hn (x) = (−1)n e 2 dx ne in § 2.4.1. Show that the Hermite functions (although these are not precisely the ones introduced in § 2.4, they are obtained from those by rescaling) 1
2 4 ˜ n (x) = 2 1 e− x2 Hn (2 12 x), n ≥ 0, h (n!) 2 form an orthonormal basis in L2 (R; R) and that Z ˜ n (x), n ≥ 0 and (t, x) ∈ (0, ∞) × R, ˜ n (y)h(t, x, y) dy = e−(n+ 12 )t h h
R
where h(t, x, y) is the function in (10.1.7) when N = 1. As a consequence, if ˜ n (x) = h
N Y
hni (xi )
for n ∈ NN and x ∈ RN ,
i=1
˜ n : n ∈ NN } is an orthonormal basis in L2 (RN ; R) and show that {h Z ˜ n (x), n ∈ NN and (t, x) ∈ (0, ∞) × RN . ˜ n (y)h(t, x, y) dy = e−(knk+ N2 )t h h R
Hint: Remember that
$$\sum_{n=0}^\infty\frac{\lambda^n}{n!}H_n(x) = e^{\lambda x-\frac{\lambda^2}{2}}.$$
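The generating function in the hint is easy to test numerically. The following is a minimal sketch, assuming the three-term recurrence $H_{n+1}(x)=xH_n(x)-nH_{n-1}(x)$ with $H_0=1$ and $H_1=x$, a standard property of the polynomials $H_n(x)=(-1)^ne^{\frac{x^2}{2}}\frac{d^n}{dx^n}e^{-\frac{x^2}{2}}$ that is not stated in the text; the truncation level is an illustrative choice.

```python
import math


def hermite(n, x):
    """H_n(x) computed from the recurrence H_{k+1} = x * H_k - k * H_{k-1}."""
    if n == 0:
        return 1.0
    previous, current = 1.0, x
    for k in range(1, n):
        previous, current = current, x * current - k * previous
    return current


def generating_sum(lam, x, terms=60):
    """Partial sum of sum_n lam^n / n! * H_n(x)."""
    return sum(lam ** n / math.factorial(n) * hermite(n, x) for n in range(terms))


if __name__ == "__main__":
    lam, x = 0.3, 0.7
    print(generating_sum(lam, x), math.exp(lam * x - lam * lam / 2.0))
```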
Exercise 10.3.43. Part (ii) of Exercise 10.3.41 might lead one to question the necessity of the boundedness assumption made in Lemma 10.3.24. However, that would be a mistake because, in general, a positive solution to $\frac12\Delta\rho+V\rho=\lambda\rho$ need not be a ground state. For example, in this exercise we will show that although $\rho(x)=e^{\frac{x^4}{4}}$ satisfies $\frac12\partial_x^2\rho+V\rho=0$ when $V=-\frac12\bigl(x^6+3x^2\bigr)$, this $\rho$ is not a ground state for $V$. The proof is based on the following idea. If $\rho$ were a ground state, then Theorem 10.3.26 and its corollaries would apply, and so we would know that the equation
$$X(t,\psi) = \psi(t)+\int_0^tX(\tau,\psi)^3\,d\tau \tag{*}$$
would have a solution on $[0,\infty)$ for $W_x^{(1)}$-almost every $\psi\in C(\mathbb{R})$ for every $x\in\mathbb{R}$. The following steps show that this is impossible.
(i) Suppose that $\psi_1,\psi_2\in C(\mathbb{R})$ and that $0\le\psi_1(t)\le\psi_2(t)$ for $t\in[0,1]$. If $X(\,\cdot\,,\psi_2)$ exists on $[0,1]$, show that $X(\,\cdot\,,\psi_1)$ exists on $[0,1]$.
Hint: Define $X_0(t,\psi)=\psi(t)$ and $X_{n+1}(t,\psi)=\psi(t)+\int_0^tX_n(\tau,\psi)^3\,d\tau$. First show that if $0\le\psi_1(t)\le\psi_2(t)$, then $0\le X_n(\,\cdot\,,\psi_1)\le X_n(\,\cdot\,,\psi_2)$. Second, if $\sup_{n\ge0}\|X_n(\,\cdot\,,\psi)\|_{[0,T]}<\infty$, show that $X_n(\,\cdot\,,\psi)$ converges uniformly on $[0,T]$ to the unique solution to (*) on $[0,T]$.
(ii) Show that if $\psi(t)\ge1$ for $t\in[0,1]$, then $X(t,\psi)\ge(1-2t)^{-\frac12}$ for $t\in\bigl[0,\frac12\bigr)$ and therefore $X(\,\cdot\,,\psi)$ fails to exist after time $\frac12$.
(iii) Show that $W_2^{(1)}\bigl(\psi(t)\ge1\text{ for }t\in[0,1]\bigr)>0$, and conclude from this that $\rho$ cannot be a ground state for $V$.
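The comparison function in part (ii) is the solution of (*) for the constant path $\psi\equiv1$, that is, of $x'=x^3$ with $x(0)=1$, which is $x(t)=(1-2t)^{-\frac12}$ and blows up at $t=\frac12$. The following is a minimal sketch that confirms this with a classical fourth-order Runge–Kutta integration; the solver, step count, and function names are illustrative choices.

```python
def rk4_cubic(x0, t_end, steps):
    """Integrate x' = x^3 from x(0) = x0 up to time t_end with classical RK4."""
    def drift(x):
        return x ** 3

    h = t_end / steps
    x = x0
    for _ in range(steps):
        k1 = drift(x)
        k2 = drift(x + 0.5 * h * k1)
        k3 = drift(x + 0.5 * h * k2)
        k4 = drift(x + h * k3)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x


def exact_solution(t):
    """Closed-form solution (1 - 2t)^(-1/2) of x' = x^3, x(0) = 1, for t < 1/2."""
    return (1.0 - 2.0 * t) ** -0.5


if __name__ == "__main__":
    for t in (0.1, 0.3, 0.4, 0.45):
        print(t, rk4_cubic(1.0, t, 4000), exact_solution(t))
```

The numerical and closed-form values agree on $[0,\frac12)$ and grow without bound as $t\nearrow\frac12$, which is the obstruction exploited in part (iii).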
Chapter 11 Some Classical Potential Theory
In this concluding chapter I will discuss a few refinements and extensions of the material in §§ 10.2 and 10.3. Even so, I will be barely scratching the surface. The interested reader should consult J.L. Doob's thorough account in Classical Potential Theory and Its Probabilistic Counterpart, published by Springer–Verlag in 1984, or S. Port and C. Stone's Brownian Motion and Classical Potential Theory, published by Academic Press in 1978.
§ 11.1 Uniqueness Refined
In this section I will refine some of the uniqueness statements made in § 10.2. The improved statements result from the removal of the defect mentioned in Remark 10.3.14. To be precise, recall that if $G$ is an open subset of $\mathbb{R}^N$, then $\zeta_s^G(\psi)=\inf\{t\ge s:\psi(t)\notin G\}$, $\zeta_{0+}^G=\lim_{s\searrow0}\zeta_s^G$, and (cf. Lemma 10.2.11) $\partial_{\mathrm{reg}}G$ is the set of $x\in\partial G$ such that $W_x^{(N)}(\zeta_{0+}^G=0)=1$. The main result proved in this section is Theorem 11.1.15, which states that, for any $x\in G$ and $W_x^{(N)}$-almost all $\psi\in C(\mathbb{R}^N)$, $\zeta^G(\psi)<\infty\implies\psi(\zeta^G)\in\partial_{\mathrm{reg}}G$. However, I will begin by amending the treatment that I gave in § 10.3 of the Dirichlet heat kernel $p^G(t,x,y)$.
§ 11.1.1. The Dirichlet Heat Kernel Again. In § 10.3, I introduced the Dirichlet heat kernel $p^G(t,x,y)$. At the time, I was concerned with it only when $(x,y)\in G\times G$, and so I defined it in such a way that it was 0 outside $G\times G$. When $G$ is regular in the sense that $\partial G=\partial_{\mathrm{reg}}G$, this choice is the obvious one, since (cf. Theorem 10.3.9) it is the one that makes $p^G(t,\cdot\,,y)$ continuous on $\mathbb{R}^N$ for each $(t,y)\in(0,\infty)\times\mathbb{R}^N$. However, when $G$ is not regular, it is too crude for the analysis here. Instead, from now on I will take
$$p^G(t,x,y) = W^{(N)}\Bigl(x\bigl(1-\ell_t(\tau)\bigr)+\theta_t(\tau)+y\ell_t(\tau)\in G,\ \tau\in(0,t)\Bigr)\,g^{(N)}(t,y-x), \tag{11.1.1}$$
where $\ell_t(\tau)=\frac{\tau\wedge t}{t}$ and $\theta_t(\tau)=\theta(\tau)-\theta(t)\ell_t(\tau)$. Notice that the difference between this definition and the one in § 10.3.2 results from the replacement of the closed interval $[0,t]$ there by the open interval $(0,t)$ here. That is, in § 10.3.2, $p^G(t,x,y)$ was given by
$$W^{(N)}\Bigl(x\bigl(1-\ell_t(\tau)\bigr)+\theta_t(\tau)+y\ell_t(\tau)\in G,\ \tau\in[0,t]\Bigr)\,g^{(N)}(t,y-x).$$
Of course, unless $x,y\in\partial G$, the difference between these two disappears. On the other hand, when either $x$ or $y$ is an element of $\partial G$, there is a subtle, but crucial, difference.
To relate the preceding definition to the considerations in § 10.3.1, set $E_t^\circ(\psi)=\mathbf{1}_{[t,\infty)}\bigl(\zeta_{0+}^G(\psi)\bigr)$. Then (11.1.1) is equivalent to saying that $p^G(t,x,y)=q^\circ(t,x,y)$ when $q^\circ(t,x,y)$ is defined in terms of $E_t^\circ$ via (10.3.2). Hence, just as in the proof of Theorem 10.3.3, one can use the results in § 8.3.3 to check that $p^G(t,x,y)=p^G(t,y,x)$ is again true but that (10.3.4) has to be replaced by
$$\int_{\mathbb{R}^N}\varphi(y)\,p^G(t,x,y)\,dy = \mathbb{E}^{W_x^{(N)}}\Bigl[\varphi\bigl(\psi(t)\bigr),\ \zeta_{0+}^G(\psi)\ge t\Bigr]. \tag{11.1.2}$$
However, the analog here of the Chapman–Kolmogorov equation (10.3.5) presents something of a challenge. To understand this challenge, note that $t\rightsquigarrow E_t^\circ$ fails to satisfy (10.3.1). Indeed,
$$E_{s+t}^\circ(\psi) = \mathbf{1}_G\bigl(\psi(s)\bigr)\,E_s^\circ(\psi)\,E_t^\circ(\Sigma_s\psi). \tag{11.1.3}$$
Thus, repeating the argument used in the proof of Theorem 10.3.3 to derive (10.3.5), one finds that
$$p^G(s+t,x,y) = \int_G p^G(s,x,z)\,p^G(t,z,y)\,dz, \tag{11.1.4}$$
which, because the integral is over $G$ and not $\mathbb{R}^N$, is a flawed version of the Chapman–Kolmogorov equation. In order to remove this flaw, I will need the following lemma.
Lemma 11.1.5. For each $(t,x)\in(0,\infty)\times\mathbb{R}^N$,
$$W_x^{(N)}(\zeta^G=t) = 0 = W_x^{(N)}(\zeta_{0+}^G=t),$$
and therefore
$$\int_{\mathbb{R}^N}\varphi(y)\,p^G(t,x,y)\,dy = \mathbb{E}^{W_x^{(N)}}\Bigl[\varphi\bigl(\psi(t)\bigr),\ \zeta_{0+}^G(\psi)>t\Bigr] \tag{11.1.6}$$
for all Borel measurable $\varphi:\mathbb{R}^N\longrightarrow\mathbb{R}$ that are bounded below. In particular, $p^G(t,x,y)=0$ for Lebesgue-almost every $y\notin G$.
Proof: Set
$$\rho(\xi) = \int_{\mathbb{R}^N}W_y^{(N)}\bigl(\zeta^G>\xi\bigr)\,\gamma_{0,I}(dy),\quad\xi\in(0,\infty).$$
Obviously, $\rho$ is a right-continuous, non-increasing, $[0,1]$-valued function, and, as such, it has only countably many discontinuities. Hence, there is a countable set $\Lambda\subseteq(0,\infty)$ such that
$$\xi\notin\Lambda\implies W_y^{(N)}(\zeta^G=\xi)=0\quad\text{for Lebesgue-almost every } y\in\mathbb{R}^N.$$
Now let $(t,x)\in(0,\infty)\times\mathbb{R}^N$ be given, and choose $s\in(0,t)$ so that $t-s\notin\Lambda$. Then, by the Markov property and (10.2.10),
$$W_x^{(N)}\bigl(\zeta_{0+}^G=t\bigr) = W_x^{(N)}\bigl(\zeta_{0+}^G>s\ \&\ \zeta^G\circ\Sigma_s=t-s\bigr) \le W_x^{(N)}\bigl(\zeta^G\circ\Sigma_s=t-s\bigr) = \int_{\mathbb{R}^N}W_{x+y}^{(N)}\bigl(\zeta^G=t-s\bigr)\,\gamma_{0,sI}(dy) = 0.$$
In addition, because $\zeta^G(\psi)=t\implies\zeta_{0+}^G(\psi)=t$ when $t>0$, it follows that $W_x^{(N)}(\zeta^G=t)=0$ also. Given the preceding, it is clear how to pass from (11.1.2) to (11.1.6). Finally, by applying (11.1.6) with $\varphi=\mathbf{1}_{G^\complement}$, we see that
$$\int_{G^\complement}p^G(t,x,y)\,dy = W_x^{(N)}\bigl(\psi(t)\notin G\ \&\ \zeta_{0+}^G(\psi)>t\bigr) = 0,$$
which says that $p^G(t,x,\cdot\,)$ vanishes Lebesgue-almost everywhere on $G^\complement$.
Because of the final part of Lemma 11.1.5, we can now replace the preceding flawed version of the Chapman–Kolmogorov equation by
$$p^G(s+t,x,y) = \int_{\mathbb{R}^N}p^G(s,x,z)\,p^G(t,z,y)\,dz,\quad(t,x,y)\in(0,\infty)\times(\mathbb{R}^N)^2. \tag{11.1.7}$$
Before completing this discussion, I want to develop a Duhamel formula for $p^G$. That is, I want to show that
$$p^G(t,x,y) = g^{(N)}(t,y-x) - \mathbb{E}^{W_x^{(N)}}\Bigl[g^{(N)}\bigl(t-\zeta_{0+}^G(\psi),\,y-\psi(\zeta_{0+}^G)\bigr),\ \zeta_{0+}^G(\psi)<t\Bigr] \tag{11.1.8}$$
for all $(t,x,y)\in(0,\infty)\times(\mathbb{R}^N)^2$, and the idea is very much the same as the one used to prove (10.3.8). Thus, for $\alpha\in(0,1)$, set
$$q_\alpha^\circ(t,x,y) = W^{(N)}\Bigl(x\bigl(1-\ell_t(\tau)\bigr)+\theta_t(\tau)+y\ell_t(\tau)\in G,\ \tau\in(0,\alpha t)\Bigr)\,g^{(N)}(t,y-x).$$
Obviously, $q_\alpha^\circ(t,x,y)\searrow p^G(t,x,y)$ as $\alpha\nearrow1$. In addition, proceeding as in the proof of Theorem 10.3.3, one finds that $q_\alpha^\circ(t,x,\cdot\,)$ is continuous and that
$$\int_{\mathbb{R}^N}\varphi(y)\,q_\alpha^\circ(t,x,y)\,dy = \mathbb{E}^{W_x^{(N)}}\Bigl[\varphi\bigl(\psi(t)\bigr),\ \zeta_{0+}^G(\psi)\ge\alpha t\Bigr]. \tag{*}$$
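The Duhamel formula (11.1.8) can be sanity-checked in a case where everything is explicit. The following is a minimal sketch under assumptions that are not taken from the text: $N=1$, $G=(0,\infty)$, and $x,y>0$, so that $\psi(\zeta_{0+}^G)=0$, the exit time has the classical first-passage density $\frac{x}{\sqrt{2\pi s^3}}e^{-\frac{x^2}{2s}}$, and the reflection principle gives $p^G(t,x,y)=g(t,y-x)-g(t,y+x)$. Under these assumptions the expectation in (11.1.8) should equal $g(t,x+y)$; the quadrature rule and function names are illustrative choices.

```python
import math


def gauss_kernel(t, x):
    """One-dimensional heat kernel g(t, x) = (2*pi*t)^(-1/2) * exp(-x^2 / (2t))."""
    return math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)


def hitting_density(s, x):
    """Density at time s of the first hitting time of 0 by Brownian motion from x > 0."""
    return x / math.sqrt(2.0 * math.pi * s ** 3) * math.exp(-x * x / (2.0 * s))


def duhamel_correction(t, x, y, n=20000):
    """Midpoint-rule value of E[g(t - zeta, y - psi(zeta)), zeta < t] on the half-line.

    Here psi(zeta) = 0, so the expectation is the integral over (0, t) of
    hitting_density(s, x) * gauss_kernel(t - s, y).
    """
    h = t / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        total += hitting_density(s, x) * gauss_kernel(t - s, y)
    return total * h


if __name__ == "__main__":
    t, x, y = 1.0, 0.5, 0.8
    print(duhamel_correction(t, x, y), gauss_kernel(t, x + y))
```

The two printed values agree, so in this example (11.1.8) reduces to the reflection formula for the half-line.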
Now use the Markov property to justify
$$\begin{aligned}\int_{\mathbb{R}^N}\varphi(y)\,g^{(N)}(t,y-x)\,dy &= \mathbb{E}^{W_x^{(N)}}\Bigl[\varphi\bigl(\psi(t)\bigr),\ \zeta_s^G(\psi)\ge\alpha t\Bigr] + \mathbb{E}^{W_x^{(N)}}\Bigl[\varphi\bigl(\psi(t)\bigr),\ \zeta_s^G(\psi)<\alpha t\Bigr]\\ &= \mathbb{E}^{W_x^{(N)}}\Bigl[\varphi\bigl(\psi(t)\bigr),\ \zeta_s^G(\psi)\ge\alpha t\Bigr] + \mathbb{E}^{W_x^{(N)}}\Bigl[\int_{\mathbb{R}^N}\varphi(y)\,g^{(N)}\bigl(t-\zeta_s^G(\psi),\,y-\psi(\zeta_s^G)\bigr)\,dy,\ \zeta_s^G(\psi)<\alpha t\Bigr]\end{aligned}$$
for all $\alpha\in(0,1)$, $t\in(0,\infty)$ and $s\in(0,\alpha t)$. Thus, by (*), after letting $s\searrow0$, we see that
$$\int_{\mathbb{R}^N}\varphi(y)\,q_\alpha^\circ(t,x,y)\,dy = \int_{\mathbb{R}^N}\varphi(y)\,g^{(N)}(t,y-x)\,dy - \mathbb{E}^{W_x^{(N)}}\Bigl[\int_{\mathbb{R}^N}\varphi(y)\,g^{(N)}\bigl(t-\zeta_{0+}^G(\psi),\,y-\psi(\zeta_{0+}^G)\bigr)\,dy,\ \zeta_{0+}^G(\psi)<\alpha t\Bigr].$$
Because $q_\alpha^\circ(t,x,\cdot\,)$ is continuous, this means that
$$q_\alpha^\circ(t,x,y) = g^{(N)}(t,y-x) - \mathbb{E}^{W_x^{(N)}}\Bigl[g^{(N)}\bigl(t-\zeta_{0+}^G(\psi),\,y-\psi(\zeta_{0+}^G)\bigr),\ \zeta_{0+}^G(\psi)<\alpha t\Bigr],$$
and so (11.1.8) follows when one lets $\alpha\nearrow1$.
§ 11.1.2. Exiting Through $\partial_{\mathrm{reg}}G$. The purpose of this subsection is to prove that when Brownian motion exits from a region, it does so through regular points. My proof of this fact follows the reasoning in the book, cited above, by Port and Stone.
Lemma 11.1.9. Let $G$ be a non-empty, connected open subset of $\mathbb{R}^N$, and define $p^G$ by (11.1.1). Then, for each $(t,x,a)\in(0,\infty)\times\mathbb{R}^N\times\mathrm{reg}(G)$, where $\mathrm{reg}(G)\equiv\partial_{\mathrm{reg}}G\cup(\mathbb{R}^N\setminus\bar G)$, $p^G(t,x,a)=0$. On the other hand, if $(t,x)\in(0,\infty)\times G$, then $p^G(t,x,a)>0$ for all $a\in\partial G\setminus\partial_{\mathrm{reg}}G$. In particular, $\partial G\setminus\partial_{\mathrm{reg}}G$ has Lebesgue measure 0.
Proof: Obviously, $p^G(t,x,a)=0$ if $a\in\mathbb{R}^N\setminus\bar G$. Next, suppose that $a\in\partial_{\mathrm{reg}}G$. Then, by (11.1.8), $p^G(t,a,x)=0$ for all $(t,x)\in(0,\infty)\times\mathbb{R}^N$, and so, by symmetry, the same is true of $p^G(t,x,a)$.
To go in the other direction when $(t,x)\in(0,\infty)\times G$, let $a\in\partial G$ be given, and begin with the observation that $(t,x)\in(0,\infty)\times G\longmapsto p^G(t,x,a)$ is in $C^{1,2}\bigl((0,\infty)\times G;[0,\infty)\bigr)$ and satisfies $\partial_tp^G(t,x,a)=\frac12\Delta_xp^G(t,x,a)$. To check this, use (11.1.4) to write
$$p^G(t,x,a) = \int_G p^G(t-s,x,z)\,p^G(s,z,a)\,dz$$
for any $0<s<t$, and note that $p^G(s,\cdot\,,a)$ is bounded. Hence, the desired conclusions follow from (10.3.12) and the argument used to prove the last part of Theorem 10.3.9. Next, suppose that $p^G(t_0,x_0,a)=0$ for some $(t_0,x_0)\in(0,\infty)\times G$. Then, by the strong minimum principle (cf. Theorem 10.1.6), $p^G(t,x,a)=0$ for all $(t,x)\in(0,t_0)\times G$. But this, by (11.1.2) and symmetry, means that, for $t\in(0,t_0)$,
$$W_a^{(N)}\bigl(\zeta_{0+}^G\ge t\bigr) = \int_{\mathbb{R}^N}p^G(t,a,y)\,dy = \int_G p^G(t,x,a)\,dx = 0,$$
where I have used the final part of Lemma 11.1.5 to get the second equality. Hence, $p^G(t_0,x_0,a)=0\implies a\in\partial_{\mathrm{reg}}G$.
Finally, because, by the preceding and symmetry, for any $x\in G$, $\partial G\setminus\partial_{\mathrm{reg}}G$ is contained in $\{y\notin G: p^G(1,x,y)>0\}$, and, by Lemma 11.1.5, the latter set has Lebesgue measure 0, it is clear that $\partial G\setminus\partial_{\mathrm{reg}}G$ has Lebesgue measure 0.
I next introduce the function
$$v^G(x)\equiv\mathbb{E}^{W_x^{(N)}}\Bigl[e^{-\zeta_{0+}^G}\Bigr],\quad x\in\mathbb{R}^N. \tag{11.1.10}$$
Since, by the Markov property,
$$e^{-s}\int_{\mathbb{R}^N}g^{(N)}(s,y-x)\,\mathbb{E}^{W_y^{(N)}}\Bigl[e^{-\zeta^G}\Bigr]\,dy = \mathbb{E}^{W_x^{(N)}}\Bigl[e^{-\zeta_s^G}\Bigr]\nearrow v^G(x)$$
as $s\searrow0$, it is clear that $v^G$ is lower semicontinuous. In addition, it is obvious that $v^G\le1$ everywhere and that
$$\bigl\{x\in\mathbb{R}^N: v^G(x)=1\bigr\} = \mathrm{reg}(G) = \partial_{\mathrm{reg}}G\cup\bigl(\mathbb{R}^N\setminus\bar G\bigr).$$
Lemma 11.1.11. Define the Borel measure $\nu^G$ on $\mathbb{R}^N$ by1
$$\nu^G(\Gamma) = \int_{\mathbb{R}^N}\mathbb{E}^{W_x^{(N)}}\Bigl[e^{-\zeta_{0+}^G(\psi)},\ \psi(\zeta_{0+}^G)\in\Gamma\Bigr]\,dx.$$
Then $\nu^G$ is supported on $G^\complement$, and if
$$r(x) = \int_{(0,\infty)}e^{-t}g^{(N)}(t,x)\,dt,\quad x\in\mathbb{R}^N,$$
then
$$v^G(x) = \int_{\mathbb{R}^N}r(y-x)\,\nu^G(dy),\quad x\in\mathbb{R}^N. \tag{11.1.12}$$
In particular, $\nu^G$ is always locally finite and is therefore finite in the case when $G^\complement$ is compact. Finally, for any non-empty, open set $H\subset\mathbb{R}^N$,
$$G^\complement\subseteq\mathrm{reg}(H)\implies v^G(x) = \mathbb{E}^{W_x^{(N)}}\Bigl[e^{-\zeta_{0+}^H}\,v^G\bigl(\psi(\zeta_{0+}^H)\bigr)\Bigr],\quad x\in\mathbb{R}^N, \tag{11.1.13}$$
where $\mathrm{reg}(H)=\partial_{\mathrm{reg}}H\cup(\mathbb{R}^N\setminus\bar H)$.
1 Below I use the convention that $e^{-\zeta_{0+}^G(\psi)}=0$ when $\zeta_{0+}^G(\psi)=\infty$. Thus, the problem of giving $\psi(\zeta_{0+}^G)$ meaning when $\zeta_{0+}^G(\psi)=\infty$ does not arise in integrals having $e^{-\zeta_{0+}^G(\psi)}$ as a factor in their integrands.
Proof: Clearly ν G is supported on G{. To prove (11.1.12), note that the symmetry of pG (t, x, y) together with (11.1.8) imply that h i G (N ) G G EWx g (N ) t − ζ0+ (ψ), y − ψ(ζ0+ ) , ζ0+ (ψ) < t h i G (N ) G G = EWy g (N ) t − ζ0+ (ψ), x − ψ(ζ0+ ) , ζ0+ (ψ) < t for all (t, x, y) ∈ (0, ∞) × RN × RN . Hence, after multiplying by e−t and integrating with respect to t ∈ (0, ∞), one arrives at h G h G i i (N ) (N ) G G EWx e−ζ0+ (ψ) r ψ(ζ0+ ) − y = EWy e−ζ0+ (ψ) r ψ(ζ0+ )−x . But
Z r(x − y) dy = 1,
x ∈ RN ,
RN
and so (11.1.12) follows after one integrates the preceding over y ∈ RN and applies Tonelli’s Theorem. Given (11.1.12) and the fact that r is uniformly positive on compacts, it becomes obvious that ν G must be always locally finite and finite when G{ is compact. Thus, all that remains is to check (11.1.13). But clearly, after multiplying (11.1.8) with G = H throughout by e−t and integrating with respect to t ∈ (0, ∞), one gets Z h G i (N ) G r(x − y) = e−t pH (t, x, y) dt + EWx e−ζ0+ r ψ(ζ0+ )−y . (0,∞)
Hence, since, by the first part of Lemma 11.1.9 with G = H, pH (t, x, · ) vanishes on reg(H), (11.1.13) follows after one integrates the preceding with respect to ν G (dy) and uses (11.1.12). Lemma 11.1.14. If G{ is compact and, for some θ ∈ [0, 1), v G G{ ≤ θ, then (N ) G Wx ζ0+ < ∞ = 0 for every x ∈ RN . G Proof: by checking that that I begin v ≤ θ everywhere. Thus, suppose N G H = x ∈ R : v (x) > θ + 6= ∅ for some > 0. Because v G is lower semicontinuous, H is open. I will derive a contradiction by first showing that G{ ⊆ reg(H) and then applying (11.1.13). To carry out the first step, use (11.1.12) to see that, for any s ∈ (0, ∞), Z Z ∞ G −t (N ) G v (x) ≥ e g (t, y − x) ν (dy) dt s RN Z Z −s −t (N ) G =e e g (s, y − x)v (y) dy dt (0,∞)
≥e
−s
(θ +
)Wx(N )
RN
H ψ(s) ∈ H ≥ e−s (θ + )Wx(N ) ζ0+ >s ,
H and so, after letting s & 0, we have that v G (x) ≥ (θ + )Wx (ζ0+ > 0). (N ) H In particular, if x ∈ / G, then θ ≥ (θ + )Wx (ζ0+ > 0), which means that (N ) H x ∈ / G =⇒ Wx (ζ0+ > 0) < 1. Hence, because (cf. part (ii) of Exercise (N ) H 10.2.19) Wx (ζ0+ > 0) ∈ {0, 1}, this means that x ∈ / G =⇒ x ∈ reg(H) and therefore that (11.1.13) applies. But if x ∈ H, (11.1.13) yields the contradiction (N )
θ + < v G (x) = EWx
h H i H e−ζ0+ v G ψ(ζ0+ ) < θ + ,
H H since ζ0+ (ψ) < ∞ =⇒ ψ(ζ0+ )∈ / H. That is, I have shown that H must be empty. Knowing that v G ≤ θ everywhere, I now want to argue that ν G (RN ) ≤ θν G (RN ). Since ν G (RN ) < ∞, this will show that ν G = 0 and therefore, by (N ) G (11.1.12), that v G ≡ 0, which is the same as saying that Wx (ζ0+ < ∞) = 0 everywhere. Thus, let K = G{, and set Kn = {x : dist(x, K) ≤ n−1 } and Gn = Kn { for n ≥ 1. Clearly, K ⊆ RN \ Gn ⊆ reg(Gn ), and so, by (11.1.12) and Tonelli’s Theorem, Z Z G N Gn G ν (R ) = v (x) ν (dx) = v G (y) ν Gn (dy) ≤ θν Gn (RN ).
RN
RN
Thus, all that we have to do is check that ν Gn (RN ) & ν G (RN ) when n → ∞. But Z Gn N ν (R ) = v Gn (x) dx RN
and ν G1 (RN ) < ∞. Hence, by the Monotone Convergence Theorem, it is enough for us to know that v Gn (x) & v G (x) for Lebesgue-almost every x ∈ RN . Because (N ) Gn G x ∈ Gn implies ζ0+ = ζ Gn % ζ G = ζ0+ Wx -almost surely, 1 ≥ v Gn & v G on Gn G G. At the same time, 1 ≥ v ≥ v = 1 on reg(G), and, by the last part of Lemma 11.1.9, G{ \ reg(G) = ∂G \ ∂reg G has Lebesgue measure 0. Theorem 11.1.15. For every open G ⊂ RN , G G Wx(N ) ζ0+ (ψ) < ∞ & ψ(ζ0+ )∈ / ∂reg G = 0 for all x ∈ G. (N )
G Proof: Suppose not. Because Wy (ζ0+ > 0) ∈ {0, 1} for all y ∈ RN , we could then find an x ∈ G and a δ > 0 for which G G Wx(N ) ζ0+ (ψ) < ∞ & ψ(ζ0+ ) ∈ Γδ > 0,
where
o n G Γδ = y ∈ ∂G : Wy(N ) ζ0+ ≥ δ ≥ 12 .
But then there would exist a compact K ⊆ Γδ for which K{ G G Wx(N ) ζ0+ < ∞ ≥ Wx(N ) ζ0+ (ψ) < ∞ & ψ(ζ0+ ) ∈ K > 0. On the other hand, because K{ ⊇ G, v K{ ≤ v G everywhere, and therefore, because v G (y) ≤ 12 1 + e−δ < 1 for y ∈ K, Lemma 11.1.14 would say that (N ) K{ Wx ζ0+ < ∞ = 0, which is obviously a contradiction. § 11.1.3. Applications to Questions of Uniqueness. My main reason for wanting the result in Theorem 11.1.15 is that it allows me to improve on the uniqueness results that were proved in §§ 10.2.3 and 10.3.1. For example, by the comment in Remark 10.3.14, we can now remove the assumption that ∂G = ∂reg G from the uniqueness assertion in Corollary 10.3.13.
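Theorem 11.1.16. Let $G$ be an open subset of $\mathbb{R}^N$ and $\varphi \in C_b(G;\mathbb{R})$. Then
\[ (t,x) \in (0,\infty)\times G \longmapsto \mathbb{E}^{W^{(N)}_x}\bigl[\varphi(\psi(t)),\ \zeta^{G}(\psi) > t\bigr] = \int_G \varphi(y)\,p^{G}(t,x,y)\,dy \in \mathbb{R} \]
is the one and only bounded, smooth solution to the boundary value problem described in Corollary 10.3.13.

More interesting are the improvements that Theorem 11.1.15 allows me to make to the results in § 10.2.3.

Theorem 11.1.17. Given an open $G \subseteq \mathbb{R}^N$ and a bounded Borel measurable $f : \partial G \longrightarrow \mathbb{R}$, set
\[ (11.1.18)\qquad u_f(x) = \mathbb{E}^{W^{(N)}_x}\bigl[f(\psi(\zeta^{G})),\ \zeta^{G}(\psi) < \infty\bigr] \quad\text{for } x \in G. \]
Then $u_f$ is a bounded harmonic function on $G$ and $\lim_{x\to a,\ x\in G} u_f(x) = f(a)$ whenever $a \in \partial_{\mathrm{reg}} G$ is a point at which $f$ is continuous. Furthermore, if $f \in C_b\bigl(\partial G;[0,\infty)\bigr)$ and $u$ is an element of $C^2\bigl(G;[0,\infty)\bigr)$ that satisfies
\[ \Delta u \le 0 \ \text{in } G \quad\text{and}\quad \varliminf_{x\to a,\ x\in G} u(x) \ge f(a) \ \text{for } a \in \partial_{\mathrm{reg}} G, \]
then $u_f \le u$. In particular, if $f \in C_b(\partial G;\mathbb{R})$, then $u_f$ is the one and only harmonic function $u$ on $G$ with the properties that, for some $C < \infty$,
\[ |u(x)| \le C\,W^{(N)}_x\bigl(\zeta^{G} < \infty\bigr) \ \text{for all } x \in G \quad\text{and}\quad \lim_{x\to a,\ x\in G} u(x) = f(a) \ \text{for each } a \in \partial_{\mathrm{reg}} G. \]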
11 Some Classical Potential Theory
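Proof: The initial assertions are covered already by Theorem 10.2.14. Next, let $f \in C_b(\partial G;\mathbb{R})$ be given, and suppose that $u$ is an element of $C^2\bigl(G;[0,\infty)\bigr)$ which satisfies the conditions in the second assertion. To prove that $u_f \le u$, set $\mathcal{F}_t = \sigma\bigl(\{\psi(\tau) : \tau \in [0,t]\}\bigr)$, and choose a sequence of bounded, open subsets $G_n$ so that $G_n \subseteq G$ and $G_n \nearrow G$. Then, for each $n \ge 1$, $\bigl(-u(\psi(t\wedge\zeta^{G_n})), \mathcal{F}_t, W^{(N)}_x\bigr)$ is a submartingale, and so we know that, for each $x \in G$, $u(x)$ dominates
\[
\begin{aligned}
\lim_{T\nearrow\infty}\lim_{n\to\infty} \mathbb{E}^{W^{(N)}_x}\bigl[u(\psi(T\wedge\zeta^{G_n}))\bigr]
&\ge \lim_{T\nearrow\infty}\lim_{n\to\infty} \mathbb{E}^{W^{(N)}_x}\bigl[u(\psi(\zeta^{G_n})),\ \zeta^{G} \le T\bigr] \\
&\ge \mathbb{E}^{W^{(N)}_x}\bigl[f(\psi(\zeta^{G})),\ \zeta^{G} < \infty\bigr] = u_f(x),
\end{aligned}
\]
where, in the passage to the last line, I have used Fatou's Lemma and Theorem 11.1.15.

Finally, let $f \in C_b(\partial G;\mathbb{R})$ be given. What I still have to show is that if $u$ is a harmonic function on $G$ which tends to $f$ at points in $\partial_{\mathrm{reg}} G$ and satisfies $|u(x)| \le C\,W^{(N)}_x(\zeta^{G} < \infty)$ for some $C < \infty$, then $u = u_f$. Thus, suppose $u$ is such a function, and set $M = C + \|f\|_{\mathrm{u}}$. Then, by the preceding, we have both that
\[ M\,W^{(N)}_x\bigl(\zeta^{G} < \infty\bigr) + u(x) \ge u_{M\mathbf{1}+f}(x) = M\,W^{(N)}_x\bigl(\zeta^{G} < \infty\bigr) + u_f(x) \]
and that
\[ M\,W^{(N)}_x\bigl(\zeta^{G} < \infty\bigr) - u(x) \ge u_{M\mathbf{1}-f}(x) = M\,W^{(N)}_x\bigl(\zeta^{G} < \infty\bigr) - u_f(x), \]
which means, of course, that $u = u_f$.

As an immediate consequence of Theorem 11.1.17, we have the following.

Corollary 11.1.19. Assume that
\[ (11.1.20)\qquad W^{(N)}_x\bigl(\zeta^{G} < \infty\bigr) = 1 \quad\text{for all } x \in G. \]
Then, for each $f \in C_b(\partial G;\mathbb{R})$, the function $u_f$ in (11.1.18) is the one and only bounded, harmonic function $u$ on $G$ which satisfies $\lim_{x\to a,\ x\in G} u(x) = f(a)$ for every $a \in \partial_{\mathrm{reg}} G$. In particular, this will be the case if $G$ is contained in a half-space.

In order to go further, it will be helpful to have the following lemma.

Lemma 11.1.21. Let $G$ be a non-empty, connected, open set in $\mathbb{R}^N$. Then
\[ \partial_{\mathrm{reg}} G = \emptyset \iff W^{(N)}_x\bigl(\zeta^{G} < \infty\bigr) = 0 \ \text{for all } x \in G. \]
On the other hand, if $\partial_{\mathrm{reg}} G \ne \emptyset$ and $b \in \partial G$, then
\[ b \notin \partial_{\mathrm{reg}} G \iff \lim_{r\searrow 0}\ \varlimsup_{x\to b,\ x\in G} W^{(N)}_x\bigl(\psi(\zeta^{G}) \notin B_{\mathbb{R}^N}(b,r) \ \&\ \zeta^{G} < \infty\bigr) > 0. \]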
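Proof: The equivalence
\[ \partial_{\mathrm{reg}} G = \emptyset \iff W^{(N)}_x\bigl(\zeta^{G} < \infty\bigr) = 0, \quad x \in G, \]
follows immediately from Theorems 11.1.15 and 11.1.17. Now assume that $\partial_{\mathrm{reg}} G \ne \emptyset$, and let $b \in \partial G$. If $b \in \partial_{\mathrm{reg}} G$, then
\[ \lim_{r\searrow 0}\ \varlimsup_{x\to b,\ x\in G} W^{(N)}_x\bigl(\psi(\zeta^{G}) \notin B_{\mathbb{R}^N}(b,r) \ \&\ \zeta^{G} < \infty\bigr) = 0 \]
follows from (10.2.13). Thus, suppose that $b \notin \partial_{\mathrm{reg}} G$. Choose $a \in \partial_{\mathrm{reg}} G$, and set $B = B_{\mathbb{R}^N}(b,r)$, where $0 < r \le \tfrac12|a-b|$. One can then construct an $f \in C\bigl(\partial G;[0,1]\bigr)$ with the properties that $f = 0$ on $B \cap \partial G$ and $f(a) = 1$. In particular,
\[ 0 \le u_f(x) \le W^{(N)}_x\bigl(\psi(\zeta^{G}) \notin B \ \&\ \zeta^{G} < \infty\bigr) \le 1 \quad\text{for all } x \in G, \]
and so we need only check that $\varlimsup_{x\to b,\ x\in G} u_f(x) > 0$. To this end, first note that, since
\[ \lim_{x\to a,\ x\in G} u_f(x) = f(a) = 1, \]
the Strong Minimum Principle (cf. Theorem 10.1.6) says that $u_f > 0$ everywhere in $G$. Next, because $b$ is not regular, we can find a $\delta > 0$ and a sequence $\{x_n : n \ge 1\} \subseteq G$ such that $x_n \to b$ and
\[ \epsilon \equiv \inf_{n\in\mathbb{Z}^+} W^{(N)}_{x_n}\bigl(\zeta^{G} > \delta\bigr) > 0. \]
Moreover, by the Markov property, we know that
\[ u_f(x_n) \ge \mathbb{E}^{W^{(N)}_{x_n}}\bigl[f(\psi(\zeta^{G})),\ \delta < \zeta^{G} < \infty\bigr] = \int_G u_f(y)\,p^{G}(\delta,x_n,y)\,dy. \]
At the same time, we know that $p^{G}(\delta,x_n,y) \le g^{(N)}(\delta,y-x_n)$, and therefore that
\[ \sup_{n\in\mathbb{Z}^+} \int_{G\setminus K} p^{G}(\delta,x_n,y)\,dy \le \frac{\epsilon}{2} \]
for some compact subset $K$ of $G$. Hence,
\[ \varlimsup_{x\to b,\ x\in G} u_f(x) \ge \varlimsup_{n\to\infty} u_f(x_n) \ge \frac{\epsilon}{2}\,\inf_{y\in K} u_f(y) > 0. \]

As a consequence of Lemma 11.1.21, I will now show that solutions to the Dirichlet problem will not, in general, approach the correct value at points outside of $\partial_{\mathrm{reg}} G$.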
Theorem 11.1.22. Let $G$ be a connected open set in $\mathbb{R}^N$, and assume that $\partial_{\mathrm{reg}} G \ne \emptyset$. If $b \in \partial G \setminus \partial_{\mathrm{reg}} G$, then there exists an $f \in C\bigl(\partial G;[0,1]\bigr)$ which has the property that
\[ \varlimsup_{x\to b,\ x\in G} u_f(x) \ne f(b). \]

Proof: Given $b$, use Lemma 11.1.21 to find an $r \in (0,\infty)$ so that
\[ \varlimsup_{x\to b,\ x\in G} W^{(N)}_x\bigl(\psi(\zeta^{G}) \notin B(b,r) \ \&\ \zeta^{G} < \infty\bigr) > 0, \]
and construct $f$ so that $f \equiv 1$ on $\partial G \cap B(b,r)^{\complement}$ and $f(b) = 0$. Then $f(b) < \varlimsup_{x\to b,\ x\in G} u_f(x)$.

I next take a closer look at the conditions under which we can assert the uniqueness of solutions to the Dirichlet problem. To begin, observe that, by Corollary 11.1.19, the situation is quite satisfactory when (11.1.20) holds. In fact, the same line of reasoning which I used there shows that the same conclusion holds as soon as one knows that $W^{(N)}_x\bigl(\zeta^{G} < \infty\bigr)$ is bounded below by a positive constant; and therefore, because $x \in G \longmapsto W^{(N)}_x(\zeta^{G} < \infty)$ is a bounded harmonic function which tends to 1 at $\partial_{\mathrm{reg}} G$, Theorem 11.1.17 tells us that
\[ (11.1.23)\qquad \inf_{x\in G} W^{(N)}_x\bigl(\zeta^{G} < \infty\bigr) > 0 \implies \inf_{x\in G} W^{(N)}_x\bigl(\zeta^{G} < \infty\bigr) = 1. \]
I will close this discussion of the Dirichlet problem with two results which reflect the transience of Brownian paths in three and higher dimensions and their recurrence in one and two dimensions.

Theorem 11.1.24. Assume that $N \ge 3$, and let $G$ be a nonempty, connected, open subset of $\mathbb{R}^N$. If $f \in C_c(\partial G;\mathbb{R})$, then $u_f$ is the one and only bounded harmonic function $u$ on $G$ which tends to $f$ at $\partial_{\mathrm{reg}} G$ and satisfies
\[ (11.1.25)\qquad \lim_{|x|\to\infty,\ x\in G} u(x) = 0. \]

Proof: We already know that $u_f$ is a bounded harmonic function which tends to $f$ at $\partial_{\mathrm{reg}} G$, but we must still show that it satisfies (11.1.25). For this purpose, choose $r \in (0,\infty)$ so that $f$ is supported in $B(0,r)$. Then (cf. the last part of Theorem 10.1.11), because $N \ge 3$,
\[ |u_f(x)| \le \|f\|_{\mathrm{u}}\,W^{(N)}_x\bigl(\zeta_r < \infty\bigr) \longrightarrow 0 \quad\text{as } |x| \to \infty. \]
To prove that $u_f$ is the only such function $u$, select bounded open sets $G_n \nearrow G$ with $G_n \subset\subset G$, and note that, for each $T \in (0,\infty)$,
\[
\begin{aligned}
u(x) &= \lim_{n\to\infty} \mathbb{E}^{W^{(N)}_x}\bigl[u(\psi(T\wedge\zeta^{G_n}))\bigr] \\
&= \mathbb{E}^{W^{(N)}_x}\bigl[f(\psi(\zeta^{G})),\ \zeta^{G} \le T\bigr] + \mathbb{E}^{W^{(N)}_x}\bigl[u(\psi(T)),\ T < \zeta^{G} < \infty\bigr] + \mathbb{E}^{W^{(N)}_x}\bigl[u(\psi(T)),\ \zeta^{G} = \infty\bigr].
\end{aligned}
\]
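Clearly,
\[ u_f(x) = \lim_{T\nearrow\infty} \mathbb{E}^{W^{(N)}_x}\bigl[f(\psi(\zeta^{G})),\ \zeta^{G} \le T\bigr] \quad\text{and}\quad \lim_{T\nearrow\infty} \mathbb{E}^{W^{(N)}_x}\bigl[u(\psi(T)),\ T < \zeta^{G} < \infty\bigr] = 0. \]
Finally, because $N \ge 3$ and, therefore, by Corollary 10.1.12, $|\psi(T)| \longrightarrow \infty$ as $T \nearrow \infty$ for $W^{(N)}_x$-almost every $\psi \in C(\mathbb{R}^N)$, (11.1.25) guarantees that
\[ \lim_{T\nearrow\infty} \mathbb{E}^{W^{(N)}_x}\bigl[u(\psi(T)),\ \zeta^{G} = \infty\bigr] = 0, \]
which completes the proof that $u = u_f$.

The situation when $N \in \{1,2\}$ is more complicated.

Theorem 11.1.26. If $N \in \{1,2\}$, then for every non-empty, open set $G$ in $\mathbb{R}^N$,
\[ W^{(N)}_x\bigl(\zeta^{G} < \infty\bigr) = 1 \ \text{for all } x \in G \quad\text{or}\quad W^{(N)}_x\bigl(\zeta^{G} < \infty\bigr) = 0 \ \text{for all } x \in G, \]
depending on whether $\partial_{\mathrm{reg}} G \ne \emptyset$ or $\partial_{\mathrm{reg}} G = \emptyset$. Moreover, if $\partial_{\mathrm{reg}} G = \emptyset$, then the only functions $u \in C^2\bigl(G;[0,\infty)\bigr)$ satisfying $\Delta u \le 0$ are constant. In particular, either $\partial_{\mathrm{reg}} G = \emptyset$, and there are no non-constant, nonnegative harmonic functions on $G$, or $\partial_{\mathrm{reg}} G \ne \emptyset$, and, for each $f \in C_b(\partial G;\mathbb{R})$, $u_f$ is the unique bounded harmonic function on $G$ which tends to $f$ at $\partial_{\mathrm{reg}} G$.

Proof: Suppose that $W^{(N)}_{x_0}\bigl(\zeta^{G} < \infty\bigr) < 1$ for some $x_0 \in G$, and choose open sets $G_n \nearrow G$ so that $x_0 \in G_1$ and $G_n \subset\subset G$ for all $n \in \mathbb{Z}^+$. Given $u \in C^2\bigl(G;[0,\infty)\bigr)$ with $\Delta u \le 0$, set
\[ X_n(t,\psi) = \mathbf{1}_{(t,\infty]}\bigl(\zeta^{G_n}(\psi)\bigr)\,u(\psi(t)) \quad\text{for } (t,\psi) \in [0,\infty)\times C(\mathbb{R}^N). \]
Then $\bigl(-X_n(t), \mathcal{F}_t, W^{(N)}_{x_0}\bigr)$ is a non-positive, right-continuous, $W^{(N)}_{x_0}$-submartingale when $\mathcal{F}_t = \sigma\bigl(\{\psi(\tau) : \tau \in [0,t]\}\bigr)$. Hence, since
\[ X_n(t,\psi) \nearrow X(t,\psi) \equiv \mathbf{1}_{(t,\infty]}(\zeta^{G})\,u(\psi(t)) \quad\text{pointwise as } n \to \infty, \]
an application of the Monotone Convergence Theorem allows us to conclude that $\bigl(-X(t), \mathcal{F}_t, W^{(N)}_{x_0}\bigr)$ is also a non-positive, continuous, submartingale. In particular, by Theorem 7.1.10, this means that
\[ \lim_{t\to\infty} u(\psi(t)) \ \text{exists for } W^{(N)}_{x_0}\text{-almost every } \psi \in \{\zeta^{G} = \infty\}. \]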
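At the same time, by Theorem 10.2.3, we know that, for $W^{(N)}_{x_0}$-almost every $\psi \in C(\mathbb{R}^N)$,
\[ \int_0^\infty \mathbf{1}_U(\psi(t))\,dt = \infty \quad\text{for all open } U \ne \emptyset. \]
Hence, since $W^{(N)}_{x_0}\bigl(\zeta^{G} = \infty\bigr) > 0$, there exists a $\psi_0 \in C(\mathbb{R}^N)$ with the properties that $\psi_0(0) = x_0$, $\zeta^{G}(\psi_0) = \infty$,
\[ \int_0^\infty \mathbf{1}_U(\psi_0(t))\,dt = \infty \ \text{for all open } U \ne \emptyset, \quad\text{and}\quad \lim_{t\to\infty} u(\psi_0(t)) \ \text{exists}, \]
which is possible only if $u$ is constant. In other words, we have now proved that when $W^{(N)}_{x_0}(\zeta^{G} < \infty) < 1$ for some $x_0 \in G$, then the only $u \in C^2\bigl(G;[0,\infty)\bigr)$ with $\Delta u \le 0$ are constant.

Given the preceding paragraph, the rest is easy. Indeed, if $\partial_{\mathrm{reg}} G = \emptyset$, then Theorem 11.1.15 already implies that $W^{(N)}_x(\zeta^{G} < \infty) = 0$ for all $x \in G$. On the other hand, if $a \in \partial_{\mathrm{reg}} G$ but $W^{(N)}_{x_0}\bigl(\zeta^{G} < \infty\bigr) < 1$ for some $x_0 \in G$, then the preceding paragraph applied to $x \rightsquigarrow W^{(N)}_x(\zeta^{G} < \infty)$ says that $W^{(N)}_x(\zeta^{G} < \infty)$ is constant, which leads to the contradiction
\[ 1 > W^{(N)}_{x_0}(\zeta^{G} < \infty) = \lim_{x\to a,\ x\in G} W^{(N)}_x(\zeta^{G} < \infty) = 1. \]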
§ 11.1.4. Harmonic Measure. We now have a rather complete abstract analysis of when the Dirichlet problem can be solved. Indeed, we know that, at least when $f \in C_c(\partial G;\mathbb{R})$, one cannot do better than take one's solution to be the function $u_f$ given by (11.1.18). For this reason, I will call
\[ (11.1.27)\qquad \Pi^{G}(x,\Gamma) \equiv W^{(N)}_x\bigl(\psi(\zeta^{G}) \in \Gamma,\ \zeta^{G}(\psi) < \infty\bigr) \]
the harmonic measure for $G$ based at $x \in G$ of the set $\Gamma \in \mathcal{B}_{\partial G}$. Obviously, Theorem 11.1.15 says that $\Pi^{G}(x,\partial G \setminus \partial_{\mathrm{reg}} G) = 0$, and
\[ u_f(x) = \int_{\partial G} f(\eta)\,\Pi^{G}(x,d\eta). \]
This connection between harmonic measure and Wiener's measure is due to Doob,² and it is the starting point for what, in the hands of G. Hunt,³ became an isomorphism between potential theory and the theory of Markov processes.

² Actually, S. Kakutani's 1944 article, "Two dimensional Brownian motion and harmonic functions," Proc. Imp. Acad. Tokyo 20, together with his 1949 article, "Markoff process and the Dirichlet problem," Proc. Imp. Acad. Tokyo 21, are generally accepted as the first place in which a definitive connection between the harmonic functions and Wiener's measure was established. However, it was not until Doob's "Semimartingales and subharmonic functions," T.A.M.S. 77, in 1954 that the connection was completed.

³ In 1957, Hunt published a series of three articles: "Markov processes and potentials, parts I, II, & III," Ill. J. Math. 1 & 2. In these articles, he literally created the modern theory of Markov processes and established their relationship to potential theory. To see just how far Hunt's ideas can be elaborated, see M. Sharpe's General Theory of Markov Processes, Acad. Press Series in Pure & Appl. Math. 133 (1988).
Although (11.1.27) provides an intuitively appealing formula for the harmonic measure $\Pi^{G}(x,\,\cdot\,)$, it hardly can be considered explicit. Thus, in this subsection I will write down two important examples in which explicit formulas for the harmonic measure are readily available.

The first example is the one discussed in Exercise 10.2.22, namely, when $G$ is a half-space. To be precise, if $N = 1$ and $G = (0,\infty)$, then, because one-dimensional Wiener paths hit points, it is clear that $\Pi^{(0,\infty)}(x,\,\cdot\,)$ is nothing but the point mass $\delta_0$ for all $x \in (0,\infty)$. On the other hand, if $N \ge 2$ and $G = \mathbb{R}^N_+ \equiv \mathbb{R}^{N-1}\times(0,\infty)$, then we know from Exercise 10.2.22 and (3.3.19) that
\[ \Pi^{\mathbb{R}^N_+}\bigl((0,y),d\omega\bigr) = \frac{2}{\omega_{N-1}}\,\frac{y}{\bigl(y^2+|\omega|^2\bigr)^{N/2}}\,\lambda_{\mathbb{R}^{N-1}}(d\omega), \quad y \in (0,\infty), \]
where $\omega_{N-1}$ is the surface area of $S^{N-1}$ and I have identified $\partial\mathbb{R}^N_+$ with $\mathbb{R}^{N-1}$ and used $\lambda_{\mathbb{R}^{N-1}}$ to denote Lebesgue measure on $\mathbb{R}^{N-1}$. Hence, after a trivial translation,
\[ \Pi^{\mathbb{R}^N_+}\bigl((x,y),d\omega\bigr) = \frac{2}{\omega_{N-1}}\,\frac{y}{\bigl(y^2+|x-\omega|^2\bigr)^{N/2}}\,\lambda_{\mathbb{R}^{N-1}}(d\omega) \quad\text{for } (x,y) \in \mathbb{R}^{N-1}\times(0,\infty). \]
Moreover, by using further translation plus Wiener rotation invariance (cf. (ii) in Exercise 4.3.10), one can pass easily from the preceding to an explicit expression of the harmonic measure for an arbitrary half-space.

In the preceding, we were able to derive an expression giving the harmonic measure for half-spaces directly from probabilistic considerations. Unfortunately, half-spaces are essentially the only regions for which probabilistic reasoning yields such explicit expressions. Indeed, embarrassing as it is to admit, it must be recognized that, when it comes to explicit expressions, the time-honored techniques of clever changes of variables followed by separation of variables are more powerful than anything which comes out of (11.1.27). To wit, I have been unable to give a truly probabilistic derivation of the classical formula given in the following.

Theorem 11.1.28 (Poisson Formula). Use $\lambda_{S^{N-1}}$ to denote the surface measure on the unit sphere $S^{N-1}$ in $\mathbb{R}^N$, and define
\[ \pi^{(N)}(x,\omega) = \frac{1}{\omega_{N-1}}\,\frac{1-|x|^2}{|x-\omega|^N} \quad\text{for } (x,\omega) \in B(0,1)\times S^{N-1}. \]
Then
\[ \Pi^{B(0,1)}(x,d\omega) = \pi^{(N)}(x,\omega)\,\lambda_{S^{N-1}}(d\omega) \quad\text{for } x \in B(0,1). \]
More generally, if $c \in \mathbb{R}^N$, $r \in (0,\infty)$, and $\lambda_{S^{N-1}(c,r)}$ denotes the surface measure on the sphere $S^{N-1}(c,r) \equiv \partial B(c,r)$, then
\[ \Pi^{B(c,r)}(x,d\omega) = \frac{1}{\omega_{N-1}\,r}\,\frac{r^2-|x-c|^2}{|x-\omega|^N}\,\lambda_{S^{N-1}(c,r)}(d\omega), \quad x \in B(c,r). \]
Equivalently, for each open $G$ in $\mathbb{R}^N$, harmonic function $u$ on $G$, $B(c,r) \subset\subset G$, and $x \in B(c,r)$,
\[ u(x) = \int_{S^{N-1}} u(c+r\omega)\,\pi^{(N)}\Bigl(\frac{x-c}{r},\omega\Bigr)\,\lambda_{S^{N-1}}(d\omega). \]
In particular, if $\{u_n : n \ge 1\}$ is a sequence of harmonic functions on the open set $G$ and if $u_n \longrightarrow u$ boundedly and pointwise on compact subsets of $G$, then $u$ is harmonic on $G$ and $u_n \longrightarrow u$ uniformly on compact subsets. (See Exercise 11.2.22 for another approach.)

Proof: Set $B = B(0,1)$. Clearly, everything except the final assertion follows by scaling and translation once we identify $\pi^{(N)}$ as the density for $\Pi^{B}$. To make this identification, first check, by direct calculation, that $\pi^{(N)}(\,\cdot\,,\omega)$ is harmonic in $B$ for each $\omega \in S^{N-1}$. Hence, in order to complete the proof, all that we have to do is check that
\[ \lim_{x\to a,\ x\in B} \int_{S^{N-1}} f(\omega)\,\pi^{(N)}(x,\omega)\,\lambda_{S^{N-1}}(d\omega) = f(a) \]
for every $f \in C(S^{N-1};\mathbb{R})$ and $a \in S^{N-1}$. Since, for each $\delta > 0$, it is clear that
\[ \lim_{x\to a,\ x\in B} \int_{S^{N-1}\cap B(a,\delta)^{\complement}} \pi^{(N)}(x,\omega)\,\lambda_{S^{N-1}}(d\omega) = 0, \]
we will be done as soon as we show that
\[ \int_{S^{N-1}} \pi^{(N)}(x,\omega)\,\lambda_{S^{N-1}}(d\omega) = 1 \quad\text{for all } x \in B. \]
But, because, for each $\xi \in S^{N-1}$, $\pi^{(N)}(\,\cdot\,,\xi)$ is harmonic in $B$ and, by (10.2.7),
\[ \Pi^{B(0,r)}(0,\,\cdot\,) = \frac{\lambda_{S^{N-1}(0,r)}}{\omega_{N-1}\,r^{N-1}} \quad\text{for each } r \in (0,\infty), \]
we have that, for $r \in [0,1)$ and $\xi \in S^{N-1}$,
\[ 1 = \omega_{N-1}\,\pi^{(N)}(0,\xi) = \int_{S^{N-1}} \pi^{(N)}(r\omega,\xi)\,\lambda_{S^{N-1}}(d\omega) = \int_{S^{N-1}} \pi^{(N)}(r\xi,\omega)\,\lambda_{S^{N-1}}(d\omega), \]
where, in the final step, I have used the easily verified identity
\[ \pi^{(N)}(r\xi,\omega) = \pi^{(N)}(r\omega,\xi) \quad\text{for all } r \in [0,1) \text{ and } (\xi,\omega) \in \bigl(S^{N-1}\bigr)^2. \]
Thus, by writing $x = r\xi$, we obtain the desired identity.

When $N = 2$, one gets the following dividend from Theorem 11.1.28.
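Corollary 11.1.29. Set $D(r) = B(0,r)$ in $\mathbb{R}^2$ for $r \in (0,\infty)$. Then
\[ (11.1.30)\qquad \Pi^{\mathbb{R}^2\setminus D(r)}(x,d\omega) = \frac{|x|^2-r^2}{2\pi}\,\frac{r\,|x|^2}{\bigl|\,|x|^2\omega - r^2 x\,\bigr|^2}\,\lambda_{S^1(0,r)}(d\omega) \]
for each $x \notin D(r)$. In particular, if $u \in C_b\bigl(\mathbb{R}^2\setminus D(r);\mathbb{R}\bigr)$ is harmonic on $\mathbb{R}^2\setminus D(r)$, then
\[ u(x) = \frac{|x|^2-r^2}{2\pi} \int_{S^1} \frac{|x|^2}{\bigl|\,|x|^2\omega - r x\,\bigr|^2}\,u(r\omega)\,\lambda_{S^1}(d\omega), \]
and so
\[ (11.1.31)\qquad \lim_{|x|\to\infty} u(x) = \frac{1}{2\pi} \int_{S^1} u(r\omega)\,\lambda_{S^1}(d\omega). \]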
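The representation in Corollary 11.1.29 can be checked numerically. The following is only a sketch under stated assumptions: the test function $u(x) = x_1/|x|^2$, the radius $r = 1.5$, the evaluation points, and the helper names `exterior_kernel` and `circle_integral` are illustrative choices that do not come from the text, and the integral over $S^1$ is approximated by a periodic trapezoid rule.

```python
import math

def exterior_kernel(x, r, theta):
    """Density, with respect to d(theta) on S^1, of the harmonic measure of
    the exterior of D(r) seen from x with |x| > r, as in the integral formula
    for u(x) in Corollary 11.1.29."""
    x1, x2 = x
    n2 = x1 * x1 + x2 * x2                        # |x|^2
    w1, w2 = math.cos(theta), math.sin(theta)     # omega on S^1
    d1, d2 = n2 * w1 - r * x1, n2 * w2 - r * x2   # |x|^2 omega - r x
    return (n2 - r * r) * n2 / (2.0 * math.pi * (d1 * d1 + d2 * d2))

def circle_integral(g, m=4000):
    """Periodic trapezoid rule for the integral of g(theta) over [0, 2*pi)."""
    h = 2.0 * math.pi / m
    return h * sum(g(k * h) for k in range(m))

def u_exact(x):
    """Illustrative bounded harmonic function outside the disk (an assumption)."""
    return x[0] / (x[0] ** 2 + x[1] ** 2)

def boundary_value(r, t):
    """u evaluated at the boundary point r * omega, omega = (cos t, sin t)."""
    return u_exact((r * math.cos(t), r * math.sin(t)))

r = 1.5
x = (3.0, 1.0)
# Total mass of the kernel; should be 1 because planar Brownian motion hits D(r).
total_mass = circle_integral(lambda t: exterior_kernel(x, r, t))
# Right-hand side of the integral formula for u(x).
u_from_kernel = circle_integral(lambda t: exterior_kernel(x, r, t) * boundary_value(r, t))
# Far from the disk the kernel flattens out, which is the content of (11.1.31).
far = (1.0e4, 0.0)
u_far = circle_integral(lambda t: exterior_kernel(far, r, t) * boundary_value(r, t))
boundary_mean = circle_integral(lambda t: boundary_value(r, t)) / (2.0 * math.pi)
```

Here `u_from_kernel` should agree with `u_exact(x)`, and `u_far` should be close to `boundary_mean`, which is the limit predicted by (11.1.31).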
Proof: After an easy scaling argument, I may and will assume that $r = 1$. Thus, set $D = D(1)$, and assume that $u \in C_b\bigl(\mathbb{R}^2\setminus D;\mathbb{R}\bigr)$ is harmonic in $\mathbb{R}^2\setminus D$. Next, set $v(x) = u\bigl(\tfrac{x}{|x|^2}\bigr)$ for $x \in D\setminus\{0\}$. Obviously, $v$ is bounded and continuous. In addition, by using polar coordinates, one can easily check that $v$ is harmonic in $D\setminus\{0\}$. In particular, if $\rho \in (0,1)$ and $G(\rho) \equiv B\setminus B(0,\rho)$, then
\[ v(x) = \mathbb{E}^{W^{(N)}_x}\bigl[v(\psi(\zeta_1)),\ \zeta_1 < \zeta_\rho\bigr] + \mathbb{E}^{W^{(N)}_x}\bigl[v(\psi(\zeta_\rho)),\ \zeta_\rho < \zeta_1\bigr], \quad x \in G(\rho), \]
where the notation is that in Theorem 10.1.11. Hence, because, by that theorem, $\zeta_\rho \nearrow \infty$ (a.s., $W^{(N)}_x$) as $\rho \searrow 0$, this leads to
\[ v(x) = \mathbb{E}^{W^{(N)}_x}\bigl[v(\psi(\zeta_1)),\ \zeta_1 < \infty\bigr] = \frac{1}{2\pi} \int_{S^1} \frac{1-|x|^2}{|\omega-x|^2}\,u(\omega)\,\lambda_{S^1}(d\omega) \]
for all $x \in D\setminus\{0\}$. Finally, given the preceding, the rest comes down to a simple matter of bookkeeping.

As a second application of Poisson's formula, I make the following famous observation, which can be viewed as a quantitative version of the Strong Minimum Principle (cf. Theorem 10.1.6) for harmonic functions.

Corollary 11.1.32 (Harnack's Principle). For any $c \in \mathbb{R}^N$ and $r \in (0,\infty)$,
\[ (11.1.33)\qquad \frac{r^{N-2}\bigl(r-|x-c|\bigr)}{\bigl(r+|x-c|\bigr)^{N-1}}\,\Pi^{B(c,r)}(c,\,\cdot\,) \le \Pi^{B(c,r)}(x,\,\cdot\,) \le \frac{r^{N-2}\bigl(r+|x-c|\bigr)}{\bigl(r-|x-c|\bigr)^{N-1}}\,\Pi^{B(c,r)}(c,\,\cdot\,) \]
for all $x \in B(c,r)$. Hence, if $u$ is a non-negative, harmonic function on $B(c,r)$, then
\[ (11.1.34)\qquad \frac{r^{N-2}\bigl(r-|x-c|\bigr)}{\bigl(r+|x-c|\bigr)^{N-1}}\,u(c) \le u(x) \le \frac{r^{N-2}\bigl(r+|x-c|\bigr)}{\bigl(r-|x-c|\bigr)^{N-1}}\,u(c). \]
In particular, if $G$ is a connected region in $\mathbb{R}^N$ and $\{u_n : n \ge 1\}$ is a nondecreasing sequence of harmonic functions on $G$, then either $\lim_{n\to\infty} u_n(x) = \infty$ for every $x \in G$ or there is a harmonic function $u$ on $G$ to which $\{u_n : n \ge 1\}$ converges uniformly on compact subsets of $G$.

Proof: The inequalities in (11.1.33) are immediate consequences of Poisson's formula and the triangle inequality; and, given (11.1.33), the inequalities in (11.1.34) come from integrating the inequalities in (11.1.33). Finally, let a connected, open set $G$ and a nondecreasing sequence $\{u_n : n \ge 1\}$ of harmonic functions be given. By replacing $u_n$ with $u_n - u_1$ if necessary, I may and will assume that all the $u_n$'s are nonnegative. Next, for each $x \in G$, set $u(x) = \lim_{n\to\infty} u_n(x) \in [0,\infty]$. Because (11.1.34) holds for each of the $u_n$'s and $B(c,r) \subset\subset G$, the Monotone Convergence Theorem allows us to conclude that it also holds for $u$ itself. Hence, we know that both $\{x \in G : u(x) = \infty\}$ and $\{x \in G : u(x) < \infty\}$ are open subsets of $G$, and so one of them must be empty. Finally, assume that $u < \infty$ everywhere on $G$, and suppose that $B(c,2r) \subset\subset G$. Then, by the right-hand side of (11.1.34), the $u_n$'s are uniformly bounded on $B\bigl(c,\tfrac{3r}{2}\bigr)$, and so, by the last part of Theorem 11.1.28, we know that $u$ is harmonic and that $u_n \longrightarrow u$ uniformly on $B(c,r)$.
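As a numerical illustration of Poisson's formula and of Harnack's inequality (11.1.34) in the case $N = 2$, $c = 0$, $r = 1$, here is a minimal sketch. The test function $u(x) = 2 + x_1 + x_1^2 - x_2^2$, the evaluation point, and the helper names are assumptions introduced for illustration only; the circle integral is approximated by a periodic trapezoid rule.

```python
import math

def poisson_kernel_2d(x, theta):
    """pi^(2)(x, omega) for omega = (cos theta, sin theta) and |x| < 1,
    using omega_1 = 2*pi for the length of the unit circle."""
    x1, x2 = x
    d1, d2 = x1 - math.cos(theta), x2 - math.sin(theta)
    return (1.0 - (x1 * x1 + x2 * x2)) / (2.0 * math.pi * (d1 * d1 + d2 * d2))

def circle_integral(g, m=4000):
    """Periodic trapezoid rule for the integral of g(theta) over [0, 2*pi)."""
    h = 2.0 * math.pi / m
    return h * sum(g(k * h) for k in range(m))

def harnack_factors(N, r, rho):
    """Lower and upper factors in (11.1.34) when |x - c| = rho < r in R^N."""
    lower = r ** (N - 2) * (r - rho) / (r + rho) ** (N - 1)
    upper = r ** (N - 2) * (r + rho) / (r - rho) ** (N - 1)
    return lower, upper

def u(x):
    """Illustrative harmonic function, non-negative on the closed unit disk."""
    return 2.0 + x[0] + x[0] ** 2 - x[1] ** 2

x = (0.3, -0.4)
# The Poisson kernel is a probability density on the circle.
mass = circle_integral(lambda t: poisson_kernel_2d(x, t))
# Poisson's formula recovers u(x) from the boundary values of u.
u_poisson = circle_integral(
    lambda t: poisson_kernel_2d(x, t) * u((math.cos(t), math.sin(t))))
# Harnack bounds for u(x) in terms of the value at the center.
lower, upper = harnack_factors(2, 1.0, math.hypot(x[0], x[1]))
harnack_holds = lower * u((0.0, 0.0)) <= u(x) <= upper * u((0.0, 0.0))
```

With these choices `mass` should be 1, `u_poisson` should reproduce `u(x)`, and `harnack_holds` should be `True`.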
Notice that, by taking $c = 0$ and letting $r \nearrow \infty$ in (11.1.34), one gets an easy derivation of the following general statement, of which we already know a sharper version (cf. Theorem 11.1.26) when $N \in \{1,2\}$.

Corollary 11.1.35 (Liouville Theorem). The only nonnegative harmonic functions on $\mathbb{R}^N$ are constant.

Exercises for § 11.1

Exercise 11.1.36. As a consequence of (11.1.31), note that if $u$ is a bounded harmonic function in the exterior of a compact subset of $\mathbb{R}^2$, then $u$ has a limit as $|x| \to \infty$. Show (by counterexample) that the analogous result is false in dimensions greater than two.

Exercise 11.1.37. Once I reduced the problem to that of studying $v$ on $D\setminus\{0\}$, the rest of the argument which I used in the proof of (11.1.31) was based on a general principle. Namely, given an open $G$, a $K \subset\subset G$, and a harmonic function $u$ on $G\setminus K$, one says that $K$ is a removable singularity for $u$ in $G$ if $u$ admits a unique harmonic extension to the whole of $G$.
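(ii) Let $K \subset\subset \mathbb{R}^N$, and take $\sigma_K(\psi) = \inf\{t > 0 : \psi(t) \in K\}$ to be the first positive entrance time of $\psi \in C(\mathbb{R}^N)$ into $K$. Given an open $G \supset\supset K$, show that
\[ (11.1.38)\qquad W^{(N)}_x\bigl(\sigma_K < \zeta^{G}\bigr) = 0 \quad\text{for all } x \in G\setminus K \]
if and only if $K \cap \partial_{\mathrm{reg}}(G\setminus K) = \emptyset$, and use the locality proved in Lemma 10.2.11 to conclude that (11.1.38) for some $G \supset\supset K$ is equivalent to $K \cap \partial_{\mathrm{reg}}(G\setminus K) = \emptyset$ for all $G \supset\supset K$. In particular, conclude that (11.1.38) holds for some $G \supset\supset K$ if and only if
\[ (11.1.39)\qquad W^{(N)}_x\bigl(\exists\,t \in [0,\infty)\ \psi(t) \in K\bigr) = 0 \quad\text{for all } x \notin K. \]

(iii) Let $K \subset\subset \mathbb{R}^N$ be given, and assume that (11.1.39) holds. Given $G \supset\supset K$ and a $u \in C(G;\mathbb{R})$ which is harmonic on $G\setminus K$, show that $K$ is a removable singularity for $u$ in $G$.

Hint: Begin by choosing a bounded open set $H \supset\supset K$ so that $H \subset\subset G$. Next, set
\[ \sigma_n(\psi) = \inf\Bigl\{ t > 0 : \operatorname{dist}\bigl(\psi(t),K\bigr) \le \frac{1}{2n}\operatorname{dist}\bigl(K,H^{\complement}\bigr) \Bigr\}, \]
and define $u_n$ on $H$ by
\[ u_n(x) = \mathbb{E}^{W^{(N)}_x}\bigl[u(\psi(\zeta^{H})),\ \zeta^{H} < \sigma_n\bigr]. \]
Show that, on the one hand, $u_n \longrightarrow u$ on $H\setminus K$, while, on the other hand,
\[ \lim_{n\to\infty} u_n(x) = \mathbb{E}^{W^{(N)}_x}\bigl[u(\psi(\zeta^{H})),\ \zeta^{H} < \infty\bigr] \]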
for all x ∈ H. (iii) Let K be a compact subset of RN and a connected G ⊃⊃ K be given. Assuming either that N ≥ 3 or that ∂reg G 6= ∅, show that (11.1.39) holds if K is a removable singularity in G for every bounded, harmonic function on G \ K. (N ) Hint: Consider the function x ∈ G \ K 7−→ Wx σK < ζ G ∈ [0, 1], and use the Strong Minimum Principle. (iv) Let G be a non-empty, open subset of RN , where N ≥ 2, and set D = {(x, x) : x ∈ G}, the diagonal in G2 . Given a u ∈ C(G2 ; R) which is harmonic on G \ D, show that u is harmonic on G2 . Hint: Show that (2N ) Wx,y ∃t ∈ [0, ∞) ψ(t) ∈ D Z ≤ Wy(N ) ∃t ∈ (0, ∞) ψ(t) = ϕ1 (t) Wx(N ) (dϕ) = 0 C(RN )
for (x, y) ∈ G2 \ D.
Exercise 11.1.40. For each $r \in (0,\infty)$, let $S(r)$ denote the open vertical strip $(-r,r)\times\mathbb{R}$ in $\mathbb{R}^2$. Clearly,
\[ \zeta^{S(r)}(\psi) = \zeta^{(1)}_r(\psi) \equiv \inf\bigl\{t \ge 0 : |\psi_1(t)| \ge r\bigr\}, \]
and so the harmonic measure for $S(r)$, based at any point in $S(r)$, will be supported on $\{(x,y) : x = \pm r \text{ and } y \in \mathbb{R}\}$. In particular, if $u \in C_b\bigl(S(r);\mathbb{R}\bigr)$ is bounded and harmonic on $S(r)$, then
\[ (11.1.41)\qquad \|u\|_{\mathrm{u}} \le \sup_{y\in\mathbb{R}} |u(1,y)| \vee |u(-1,y)|. \]
The estimate in (11.1.41) is a primitive version of the Phragmén–Lindelöf maximum principle. To get a sharper version, one has to relax the global boundedness condition on $S(r)$. To see what can be expected, consider the function
\[ u_r(z) \equiv \sin\Bigl(\frac{\pi(x+r)}{2r}\Bigr)\cosh\Bigl(\frac{\pi y}{2r}\Bigr) \quad\text{for } z = (x,y) \in \mathbb{R}^2. \]
Obviously, $u_r$ is harmonic everywhere but (11.1.41) fails dramatically. Hence, even if boundedness is not necessary for (11.1.41), something is: the function cannot be allowed to grow, as $|y| \to \infty$, as fast as $u_r$ does. What follows is the outline of a proof that those harmonic functions which grow strictly slower than $u_r$ do satisfy (11.1.41). More precisely, it will be shown that, for $u \in C\bigl(S(r);\mathbb{R}\bigr)$ which are harmonic on $S(r)$,
\[ \sup_{(x,y)\in S(r)} \exp\Bigl(-\frac{\theta\pi|y|}{2r}\Bigr)\,|u(x,y)| < \infty \ \text{for some } \theta \in [0,1) \implies u \text{ satisfies (11.1.41)}, \]
which is the true Phragmén–Lindelöf principle.

(i) Given $R \in (0,\infty)$, set $\zeta^{(i)}_R(\psi) = \inf\{t \ge 0 : |\psi_i(t)| \ge R\}$, and show that, for any $u \in C\bigl(S(r);\mathbb{R}\bigr)$ which is harmonic on $S(r)$,
\[ u(z) = \mathbb{E}^{W^{(2)}_z}\bigl[u\bigl(\psi(\zeta^{(1)}_r)\bigr),\ \zeta^{(1)}_r \le \zeta^{(2)}_R\bigr] + \mathbb{E}^{W^{(2)}_z}\bigl[u\bigl(\psi(\zeta^{(2)}_R)\bigr),\ \zeta^{(2)}_R < \zeta^{(1)}_r\bigr] \]
for $z \in S(r,R) \equiv (-r,r)\times(-R,R)$. Conclude that (11.1.41) holds as long as
\[ \lim_{R\to\infty}\ \sup_{|x|\le 1} \bigl(|u(x,R)| \vee |u(x,-R)|\bigr)\,W^{(2)}_z\bigl(\zeta^{(2)}_R < \zeta^{(1)}_r\bigr) = 0, \quad z \in S(r). \]
Thus, the desired conclusion comes down to showing that, for each $\rho \in (r,\infty)$,
\[ (*)\qquad \lim_{R\to\infty} \exp\Bigl(\frac{\pi R}{2\rho}\Bigr)\,W^{(2)}_z\bigl(\zeta^{(2)}_R < \zeta^{(1)}_r\bigr) = 0, \quad z \in S(r). \]
(ii) To prove (*), let $\rho \in (r,\infty)$ be given. Show that, for $R \in (0,\infty)$ and $z \in S(r,R)$,
\[ u_\rho(z) \ge \cosh\Bigl(\frac{\pi R}{2\rho}\Bigr)\,\mathbb{E}^{W^{(2)}_z}\Bigl[\sin\Bigl(\frac{\pi\bigl(\psi_1(\zeta^{(2)}_R)+\rho\bigr)}{2\rho}\Bigr),\ \zeta^{(2)}_R < \zeta^{(1)}_r\Bigr] \ge \cosh\Bigl(\frac{\pi R}{2\rho}\Bigr)\cos\Bigl(\frac{\pi r}{2\rho}\Bigr)\,W^{(2)}_z\bigl(\zeta^{(2)}_R < \zeta^{(1)}_r\bigr), \]
and from this get (*).

§ 11.2 The Poisson Problem and Green Functions

Let $G$ be an open subset of $\mathbb{R}^N$ and $f$ a smooth function on $G$. The basic problem which motivates the contents of this section is that of analyzing solutions $u$ to the Poisson problem
\[ (11.2.1)\qquad \tfrac12\Delta u = -f \ \text{in } G \quad\text{and}\quad \lim_{x\to a} u(x) = 0 \ \text{for } a \in \partial_{\mathrm{reg}} G. \]
Notice that, at least when $G$ is bounded, or, more generally, whenever (11.1.20) holds, there is at most one bounded $u \in C^2(G;\mathbb{R})$ which satisfies (11.2.1). Indeed, if there were two, then their difference would be a bounded harmonic function on $G$ satisfying boundary condition 0 at $\partial_{\mathrm{reg}} G$, which, because of (11.1.20) and Corollary 11.1.19, means that this difference vanishes. Moreover, when $N \ge 3$, even if (11.1.20) fails, one can (cf. Theorem 11.1.24) recover uniqueness by adding to (11.2.1) the condition that
\[ (11.2.2)\qquad \lim_{|x|\to\infty,\ x\in G} u(x) = 0. \]
In view of the preceding discussion, the problem in Poisson's problem is that of proving that solutions exist. In order to get a feeling for what is involved, given $f \in C_c(G;\mathbb{R})$, define
\[ u_T(x) = \mathbb{E}^{W^{(N)}_x}\Bigl[\int_{1/T}^{T} \mathbf{1}_{[0,\zeta^{G})}(t)\,f(\psi(t))\,dt\Bigr] = \int_{1/T}^{T}\Bigl(\int_G f(y)\,p^{G}(t,x,y)\,dy\Bigr)dt \]
for $T \in (1,\infty)$ and $x \in G$. Then, by Corollary 10.3.13,
\[ \tfrac12\Delta u_T = \int_G f(y)\bigl(p^{G}(T,x,y) - p^{G}(T^{-1},x,y)\bigr)\,dy \quad\text{and}\quad \lim_{x\to a,\ x\in G} u_T(x) = 0 \ \text{for } a \in \partial_{\mathrm{reg}} G. \]
Hence, at least when (11.1.20) holds and therefore $\int_G p^{G}(T,x,y)\,f(y)\,dy \longrightarrow 0$ as $T \nearrow \infty$, it is reasonable to hope that $u = \lim_{T\to\infty} u_T$ exists and will be the
desired solution to (11.2.1). On the other hand, it is neither obvious that the limit will exist nor, even if it does exist, in what sense either the smoothness properties or (11.2.2) will survive the limit procedure.

Motivated by these considerations, I now define the Green function to be the function $g^{G}$ given by
\[ (11.2.3)\qquad g^{G}(x,y) = \int_{(0,\infty)} p^{G}(t,x,y)\,dt, \quad (x,y) \in G^2. \]
My goal in this section is to show that, in great generality, $g^{G}$ is the fundamental solution to (11.2.1) in the sense that $x \rightsquigarrow \int_G f(y)\,g^{G}(x,y)\,dy$ solves (11.2.1).

§ 11.2.1. Green Functions when N ≥ 3. The transience of Brownian motion in $\mathbb{R}^N$ for $N \ge 3$ greatly simplifies the analysis of $g^{G}$ there. The basic reason why is that
\[ g^{\mathbb{R}^N}(x,y) \equiv \int_0^\infty g^{(N)}(t,y-x)\,dt = (2\pi)^{-\frac N2}\int_0^\infty t^{-\frac N2}\,e^{-\frac{|y-x|^2}{2t}}\,dt = \frac{\Gamma\bigl(\frac N2-1\bigr)}{2\pi^{\frac N2}\,|y-x|^{N-2}}, \]
and therefore (cf. part (i) in Exercise 2.1.13)
\[ (11.2.4)\qquad g^{\mathbb{R}^N}(x,y) = \frac{2\,|y-x|^{2-N}}{(N-2)\,\omega_{N-1}}, \]
where $\omega_{N-1}$ is the area of $S^{N-1}$. In particular, when $N \ge 3$, $g^{\mathbb{R}^N}(x,\,\cdot\,)$ is smooth and has bounded derivatives of all orders in $\mathbb{R}^N\setminus B(x,r)$ for each $r > 0$. Next, by integrating both sides of (10.3.8) with respect to $t \in (0,\infty)$, we obtain, for any $G$, the Duhamel formula (11.2.5) $g^{G}(x,y) = g^{\mathbb{R}^N}(x,y) - \mathbb{E}^{W^{(N)}_x}\bigl[\,g^{\mathbb{R}^N}\bigl(\psi(\zeta^{G}_{0+}),y\bigr),\ \zeta^{G}_{0+}$ … $0\}$. It should be clear that, for $x = (x_1,x_2)$ and $y = (y_1,y_2)$,
\[ p^{\mathbb{R}^2_+}(t,x,y) = g^{(1)}(t,y_1-x_1)\,p^{(0,\infty)}(t,x_2,y_2) = \frac{1}{2\pi t}\Bigl(e^{-\frac{|y-x|^2}{2t}} - e^{-\frac{|\check y-x|^2}{2t}}\Bigr), \]
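where $\check y = (y_1,-y_2)$. Therefore,
\[
\begin{aligned}
2\pi\int_{(0,\infty)} p^{\mathbb{R}^2_+}(t,x,y)\,dt
&= \lim_{T\nearrow\infty}\int_0^T \frac1t\Bigl(\exp\Bigl(-\frac{|y-x|^2}{2t}\Bigr) - \exp\Bigl(-\frac{|\check y-x|^2}{2t}\Bigr)\Bigr)dt \\
&= \lim_{T\nearrow\infty}\int_{|\check y-x|^{-2}}^{|y-x|^{-2}} \frac1t\,e^{-\frac{1}{2tT}}\,dt,
\end{aligned}
\]
which means that
\[ g^{\mathbb{R}^2_+}(x,y) = -\frac1\pi\log\frac{|y-x|}{|\check y-x|}. \]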
g G (x, y) = g R+ (x, y) − EWx if G ⊆ R2+ . Furthermore, because x from the preceding to (11.2.12) g G (x, y) = −
Hence, by (11.2.9), we know that i G 2 G g R+ ψ(ζ0+ ), y , ζ0+ 0, h i (2) (x, y) ur (x, y) ≡ EWx log |y − ψ(ζ G )|, ζ G (ψ) < ζ B(c,r) (ψ) 2 is harmonic on G ∩ B(c, r) , and, as r → ∞, {ur : r > 0} tends uniformly on compact subsets of G2 to the function h i (2) (x, y) ∈ G2 7−→ u(x, y) ≡ EWx log |y − ψ(ζ G )|, ζ G (ψ) < ∞ ∈ R. In particular, u is harmonic on G2 . Proof: Since g G is symmetric, the first equality is obvious. While proving the associated finiteness assertion, I may and will assume that G is connected. In addition, it suffices for me to prove "Z G # ζ (ψ) (2) sup EWx 1B(c,r) ψ(t) dt < ∞ x∈G
0
for all c ∈ G and r > 0 with B(c, 2r) ⊂⊂ G. Given such a ball, set B = B(c, r) and 2B = B(c, 2r), and define {ζn : n ≥ 0} inductively by ζ0 = 0 and, for n ≥ 1, / 2B}. ζ2n−1 = inf{t ≥ ζ2(n−1) : ψ(t) ∈ B} and ζ2n = inf{t ≥ ζ2n−1 : ψ(t) ∈ (2) G If u(x) = Wx ζ1 < ζ , then u is a [0, 1]-valued, harmonic function on G \ B that tends to 0 as x tends to ∂reg G and to 1 as x tends to ∂B. Thus, since ∂reg G 6= ∅, the Minimum Principle says that u(x) ∈ (0, 1) for all x ∈ G \ B. In particular, this means that α ≡ max{u(x) : |x − c| = 2r} ∈ (0, 1). At the same time, by the Markov property, (2) Wx(2) ζ2n+1 < ζ G = EWx u ψ(ζ2n ) , ζ2n (ψ) < ζ G (ψ) ≤ αWx(2) ζ2n−1 < ζ G , (2) (2) and so Wx ζ2n−1 < ζ G ≤ αn−1 for n ∈ Z+ . Hence, if f (y) = EWy ζ 2B , then "Z G # "Z # ∞ ζ ζ2n X (2) (2) Wx Wx G E 1B ψ(t) dt = E 1B ψ(t) dt, ζ2n−1 (ψ) < ζ 0
≤
∞ X n=1
n=1 (2)
EWx
ζ2n−1
kf ku . f ψ(ζ2n−1 ) , ζ2n−1 (ψ) < ζ G (ψ) ≤ 1−α
Since, by Theorem 10.1.11, $f$ is bounded, this completes the proof of the first part.

Turning to the second part, begin by observing that, for each $r > 0$ and $x \in G(r) \equiv G\cap B(c,r)$, $u_r(x,\,\cdot\,)$ is a harmonic function on $G(r)$. Next, given $y \in G(r)$, define $f$ on $\partial G(r)$ so that $f(\xi) = \log|y-\xi|$ or 0 according to whether $\xi$ is or is not an element of $\partial G(r)\setminus\partial B(c,r)$. Then
\[ u_r(x,y) = \mathbb{E}^{W^{(2)}_x}\bigl[f\bigl(\psi(\zeta^{G(r)})\bigr),\ \zeta^{G(r)}(\psi) < \infty\bigr], \]
and so $u_r(\,\cdot\,,y)$ is also harmonic on $G(r)$. Hence, since $u_r$ is locally bounded on $G(r)^2$, Exercise 10.2.16 applies and says that $u_r$ is harmonic on $G(r)^2$. To complete the proof, let $B$ be an open ball whose closure is contained in $G$, set $D = \operatorname{dist}(\bar B, G^{\complement})$, and choose $R > 0$ so that $\bar B \subseteq G(R)$. Then, for each $r > R$,
\[ v_r(x,y) \equiv u_r(x,y) - \log D\;W^{(2)}_x\bigl(\zeta^{G} < \zeta^{B(c,r)}\bigr) = \mathbb{E}^{W^{(2)}_x}\Bigl[\log\frac{|y-\psi(\zeta^{G})|}{D},\ \zeta^{G}(\psi) < \zeta^{B(c,r)}(\psi)\Bigr] \]
is a non-negative, harmonic function on $B^2$, and, for each $(x,y) \in B^2$, $v_r(x,y)$ is non-decreasing as a function of $r > R$. Thus, by Harnack's Principle (cf. Corollary 11.1.32), either $\lim_{r\to\infty} v_r = \infty$ on $B^2$ or $v_r$ tends uniformly on compact subsets of $B^2$ to a harmonic function $v$. Since
\[ \lim_{r\to\infty}\ \sup_{x\in B}\ \Bigl| W^{(2)}_x\bigl(\zeta^{G} < \zeta^{B(c,r)}\bigr) - W^{(2)}_x\bigl(\zeta^{G} < \infty\bigr) \Bigr| = 0, \]
it is clear that the latter case implies that
\[ \sup_{(x,y)\in K^2} \mathbb{E}^{W^{(2)}_x}\Bigl[\bigl|\log|y-\psi(\zeta^{G})|\bigr|,\ \zeta^{G}(\psi) < \infty\Bigr] \le \lim_{r\to\infty}\ \sup_{(x,y)\in K^2} v_r(x,y) + |\log D| < \infty \]
for $K \subset\subset B$ and that $u_r$ tends to $u$ uniformly on compact subsets of $B^2$. Hence, all that remains is for me to rule out the possibility that $\lim_{r\to\infty} v_r = \infty$ on $B^2$. Equivalently, I must show that $\lim_{r\to\infty} u_r(x,y) < \infty$ for some $(x,y) \in B^2$. For this purpose, note that, because $G(r)$ is bounded, and therefore contained in some half-space, (11.2.12) applies and says that
\[
\begin{aligned}
\pi g^{G(r)}(x,y) + \log|y-x| &= \mathbb{E}^{W^{(2)}_x}\bigl[\log\bigl|y-\psi(\zeta^{G(r)})\bigr|,\ \zeta^{G(r)}(\psi) < \infty\bigr] \\
&= u_r(x,y) + \mathbb{E}^{W^{(2)}_x}\bigl[\log\bigl|y-\psi(\zeta^{B(c,r)})\bigr|,\ \zeta^{B(c,r)}(\psi) < \zeta^{G}(\psi) < \infty\bigr].
\end{aligned}
\]
Hence, for sufficiently large $r$'s and all $(x,y) \in B^2$,
\[ u_r(x,y) \le \pi g^{G(r)}(x,y) + \log|y-x| \le \pi g^{G}(x,y) + \log|y-x|, \]
which, by the first part of this lemma, means that $\lim_{r\to\infty} u_r$ cannot be infinite everywhere on $B^2$.
Theorem 11.2.14. Let $G$ be a non-empty, open subset of $\mathbb{R}^2$ for which $\partial_{\mathrm{reg}} G \ne \emptyset$. Then, (11.1.20) holds,
\[ \sup_{x,y\in K} \mathbb{E}^{W^{(2)}_x}\Bigl[\bigl|\log|y-\psi(\zeta^{G})|\bigr|,\ \zeta^{G} < \infty\Bigr] < \infty \quad\text{for } K \subset\subset G, \]
and
\[ (x,y) \in G^2 \longmapsto \mathbb{E}^{W^{(2)}_x}\bigl[\log|y-\psi(\zeta^{G})|,\ \zeta^{G} < \infty\bigr] \in \mathbb{R} \]
is a harmonic function. In addition, for each $c \in G$, the limit
\[ (11.2.15)\qquad h^{G}(x) \equiv \lim_{r\to\infty} \frac{\log r}{\pi}\,W^{(2)}_x\bigl(\zeta^{B(c,r)} \le \zeta^{G}\bigr), \quad x \in G, \]
exists, is uniform with respect to $x$ in compact subsets of $G$ and independent of $c \in G$, and determines a harmonic function of $x \in G$. Finally,
\[ (11.2.16)\qquad g^{G}(x,y) = -\frac1\pi\log|y-x| + \frac1\pi\,\mathbb{E}^{W^{(2)}_x}\bigl[\log|y-\psi(\zeta^{G})|,\ \zeta^{G} < \infty\bigr] + h^{G}(x) \]
for all distinct $x$'s and $y$'s from $G$, and so either $h^{G} \equiv 0$ or $G$ is unbounded and
\[ (11.2.17)\qquad g^{G}(\,\cdot\,,y) \longrightarrow h^{G} \quad\text{uniformly on compacts as } |y| \to \infty \text{ through } G. \]

Proof: Note that, because $N = 2$, Theorem 11.1.26 guarantees that (11.1.20) follows from $\partial_{\mathrm{reg}} G \ne \emptyset$, and the rest of the initial assertion is covered by Lemma 11.2.13. To prove the remaining assertions, let $c \in G$ be given, set $G(r) = G\cap B(c,r)$, and set $g_r(x,y) = g^{G(r)}(x,y)$ for $(x,y) \in G(r)^2$. By (11.2.12),
\[ g_r(x,y) = -\frac1\pi\log|y-x| + \frac1\pi\,\mathbb{E}^{W^{(2)}_x}\bigl[\log|y-\psi(\zeta^{G(r)})|,\ \zeta^{G(r)}(\psi) < \infty\bigr]. \]
In particular, for each $(x,y) \in G(r)^2$, $g_r(\,\cdot\,,y)$ is harmonic on $G(r)\setminus\{y\}$ and $g_r(x,\,\cdot\,)$ is harmonic on $G(r)\setminus\{x\}$. Hence, by Exercise 10.2.16, $g_r$ is a non-negative, harmonic function on $\widehat{G(r)^2} \equiv \{(x,y) \in G(r)^2 : x \ne y\}$. At the same time, because $p^{G(r)}(t,x,y)$ is non-decreasing in $r$ for each $(t,x,y) \in (0,\infty)\times G(r)^2$, we know that $g_r$ is non-decreasing in $r$. Hence, by Harnack's Principle (cf. Corollary 11.1.32), either $\lim_{r\nearrow\infty} g_r$ is everywhere infinite on $\widehat{G^2} \equiv \{(x,y) \in G^2 : x \ne y\}$ or $g_r$ converges uniformly on compact subsets of $\widehat{G^2}$ to a harmonic function. Because
\[ g^{G}(x,y) = \int_{(0,\infty)} p^{G}(t,x,y)\,dt = \lim_{r\nearrow\infty}\int_{(0,\infty)} p^{G(r)}(t,x,y)\,dt = \lim_{r\nearrow\infty} g_r(x,y), \]
we conclude from the first part of Lemma 11.2.13 that only the second alternative is possible. Thus, we now know that $g^{G}$ is harmonic on $\widehat{G^2}$ and that
\[ (*)\qquad g_r(x,y) \nearrow g^{G}(x,y) \quad\text{uniformly on compact subsets of } \widehat{G^2}. \]
To go further, first notice that the expression in (11.2.12) for $g_r$ can be rewritten as
\[ (**)\qquad \pi g_r(x,y) = -\log|y-x| + u_r(x,y) + \mathbb{E}^{W^{(2)}_x}\bigl[\log\bigl|y-\psi(\zeta^{B(c,r)})\bigr|,\ \zeta^{B(c,r)}(\psi) \le \zeta^{G}(\psi) < \infty\bigr], \]
where
\[ u_r(x,y) = \mathbb{E}^{W^{(2)}_x}\bigl[\log\bigl|y-\psi(\zeta^{G})\bigr|,\ \zeta^{G}(\psi) < \zeta^{B(c,r)}(\psi)\bigr] \quad\text{for } (x,y) \in G(r)^2. \]
By the second part of Lemma 11.2.13, we know that each $u_r$ is harmonic on $G(r)^2$ and that, as $r \to \infty$, $\{u_r : r > 0\}$ tends uniformly on compact subsets of $G^2$ to the harmonic function
\[ (x,y) \rightsquigarrow u(x,y) \equiv \mathbb{E}^{W^{(2)}_x}\bigl[\log|y-\psi(\zeta^{G})|,\ \zeta^{G}(\psi) < \infty\bigr]. \]
Moreover, by combining this with (*) and (**), we also know that the third term on the right of (**) converges uniformly on compact subsets of $\widehat{G^2}$ to a harmonic function on $\widehat{G^2}$. At the same time, as $r \to \infty$,
\[
\begin{aligned}
&\mathbb{E}^{W^{(2)}_x}\bigl[\log\bigl|y-\psi(\zeta^{B(c,r)})\bigr|,\ \zeta^{B(c,r)}(\psi) \le \zeta^{G}(\psi) < \infty\bigr] - \log r\;W^{(2)}_x\bigl(\zeta^{B(c,r)} \le \zeta^{G}(\psi) < \infty\bigr) \\
&\qquad = \mathbb{E}^{W^{(2)}_x}\Bigl[\log\Bigl(\frac{\bigl|y-\psi(\zeta^{B(c,r)})\bigr|}{r}\Bigr),\ \zeta^{B(c,r)} \le \zeta^{G}(\psi) < \infty\Bigr] \longrightarrow 0
\end{aligned}
\]
uniformly for $(x,y)$ in compact subsets of $G^2$. Thus, the asserted limit in (11.2.15) exists, the function $h^{G}$ is harmonic on $G$, and (11.2.16) holds.

Finally, to complete the proof, note that if $G$ is bounded, then (11.2.12) holds and therefore $h^{G}$ must be identically 0. Now, assume that $G$ is unbounded. To prove (11.2.17), use (11.2.16) to write
\[ h^{G}(x) = g^{G}(x,y) - \frac1\pi\,\mathbb{E}^{W^{(2)}_x}\Bigl[\log\Bigl(\frac{\bigl|y-\psi(\zeta^{G})\bigr|}{|y-x|}\Bigr),\ \zeta^{G}(\psi) < \infty\Bigr], \]
and apply Lebesgue's Dominated Convergence Theorem together with the integrability estimate in the second part of Lemma 11.2.13 to see that, as $|y| \to \infty$ through $G$, the second term tends to 0 uniformly for $x$ in compact subsets of $G$.
Remark 11.2.18. The appearance of the extra term $h^G$ in (11.2.16) is, of course, a reflection of the fact that, for unbounded regions in $\mathbb{R}^2$, we do not know a priori which harmonic function (cf. Remark 11.2.8) should be used to correct $-\frac1\pi\log|y-x|$. When $N\ge3$, the obvious choice was the one that behaved the same way at $\infty$ as $g^{\mathbb{R}^N}$ itself (i.e., the one that tends to 0 at $\infty$). Actually, as (11.2.17) makes explicit, the same principle applies to the case when $N=2$, although now 0 may not be that limiting behavior. To see that, in general, $h^G$ is not identically 0, consider the open disk $D(R)=\{x:\,|x|<R\}$, and take $G=\mathbb{R}^2\setminus\overline{D(R)}$. Then it is an easy matter to check that, for $R<|x|<r$,
$$\mathcal{W}_x^{(2)}\bigl(\zeta^{D(r)}<\zeta^G\bigr)=\frac{\log\frac{|x|}{R}}{\log\frac{r}{R}}.$$
Hence, by (11.2.15), we see that
$$h^{\mathbb{R}^2\setminus\overline{D(R)}}(x)=\frac1\pi\log\frac{|x|}{R},\qquad x\notin D(R).$$
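The limit behind this example can be checked numerically. The following is a minimal sketch, assuming (as the remark's use of (11.2.15) indicates) that $h^G(x)$ is the limit of $\frac{\log r}{\pi}\,\mathcal{W}_x^{(2)}(\zeta^{D(r)}<\zeta^G)$ as $r\to\infty$; the function names are hypothetical and serve only this illustration.

```python
import math

def exit_before_return(x_norm, R, r):
    # Probability, for R < |x| < r, that planar Brownian motion started at x
    # reaches the circle of radius r before the circle of radius R
    # (the closed form quoted in Remark 11.2.18).
    return math.log(x_norm / R) / math.log(r / R)

def h_approx(x_norm, R, r):
    # Finite-r approximation (log r / pi) * W_x(zeta^{D(r)} < zeta^G);
    # the normalization is an assumption read off from the remark.
    return math.log(r) / math.pi * exit_before_return(x_norm, R, r)

def h_exact(x_norm, R):
    # Limiting value (1/pi) * log(|x| / R) stated in the remark.
    return math.log(x_norm / R) / math.pi

if __name__ == "__main__":
    for r in (1e3, 1e30, 1e200):
        print(r, h_approx(5.0, 2.0, r), h_exact(5.0, 2.0))
```

The convergence is only logarithmic in $r$, which is why very large radii are needed before the approximation settles down.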
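As we are about to see, for $G$'s whose complements are compact, the conclusion drawn about $h^G$ at the end of Remark 11.2.18 is typical, at least as $|x|\to\infty$.
Corollary 11.2.19. Let everything be as in Theorem 11.2.14, and assume that $K\equiv\mathbb{R}^2\setminus G$ is compact. Then, for each $R\in(0,\infty)$ with the property that $K\subset\subset D(R)$, one has that
$$h^G(x)-\frac1\pi\log\frac{|x|}{R}=\frac1{2\pi}\int_{\mathbb{S}^1}\frac{\bigl(|x|^2-R^2\bigr)|x|^2}{\bigl||x|^2\omega-Rx\bigr|^2}\,h^G(R\omega)\,\lambda_{\mathbb{S}^1}(d\omega)\longrightarrow\frac1{2\pi}\int_{\mathbb{S}^1}h^G(R\omega)\,\lambda_{\mathbb{S}^1}(d\omega)$$
as $|x|\to\infty$.
Proof: Define $\sigma:\,C(\mathbb{R}^N)\longrightarrow[0,\infty]$ to be the first entrance time into $D(R)$, and note (cf. the preceding discussion) that, for each $r>R$ and $R<|x|<r$,
$$\mathcal{W}_x^{(2)}\bigl(\zeta^{D(r)}<\zeta^G\bigr)=\mathcal{W}_x^{(2)}\bigl(\zeta^{D(r)}<\sigma\bigr)+\mathbb{E}^{\mathcal{W}_x^{(2)}}\Bigl[\mathcal{W}_{\psi(\sigma)}^{(2)}\bigl(\zeta^{D(r)}<\zeta^G\bigr),\ \sigma<\zeta^{D(r)}\Bigr]$$
$$=\frac{\log\frac{|x|}{R}}{\log\frac{r}{R}}+\mathbb{E}^{\mathcal{W}_x^{(2)}}\Bigl[\mathcal{W}_{\psi(\sigma)}^{(2)}\bigl(\zeta^{D(r)}<\zeta^G\bigr),\ \sigma<\zeta^{D(r)}\Bigr].$$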
Hence, after multiplying the preceding through by logπ r , using (11.2.15), and letting r → ∞, we arrive at h i (2) 1 |x| 1 + EWx hG ψ(σ) , σ < ∞ , x ∈ R2 \ D(R), hG (x) = log π R π
which certainly implies that
$$x\in\mathbb{R}^2\longmapsto h^G(x)-\frac1\pi\log\frac{|x|}{R}$$
is a bounded function that is harmonic off of $D(R)$. Thus, the desired result now follows from the first part of Theorem 11.1.29.
Notice that, as a by-product, one knows that the number
$$\frac1{2\pi}\int_{\mathbb{S}^1}h^G(R\omega)\,\lambda_{\mathbb{S}^1}(d\omega)-\frac1\pi\log R$$
does not depend on $R$ as long as $G^\complement\subset\subset B(0,R)$. This number plays an important role in classical two-dimensional potential theory, where it is known as Robin's constant for $G$.
Corollary 11.2.20. Again let everything be as in Theorem 11.2.14. Then, for each $K\subset\subset G$ and $r>0$,
$$\sup\bigl\{g^G(x,y):\,|x-y|\ge r\text{ and }y\in K\bigr\}<\infty\quad\text{and}\quad\lim_{\substack{x\to a\\ x\in G}}\ \sup_{y\in K}g^G(x,y)=0\ \text{ for each }a\in\partial_{\rm reg}G.$$
Moreover, for each f ∈ Cc1 (G; R), GG f is the unique bounded solution to (11.2.1). Proof: To prove the initial statements, let c ∈ G and r > 0 satisfying B(c, 2r) ⊂⊂ G be given, set B = B(c, r), and define the first entrance time σ(ψ) of ψ B . By the Markov property, we see that, 0 : ψ(t) ∈ into B by σ(ψ) = inf t ≥ for any f ∈ Cc B; [0, ∞) , "Z G # Z ζ (2) g G (x, y)f (y) dy = EWx f ψ(t) dt, σ < ζ G G
σ (2)
Wx
=E
Z g
G
ψ(σ), y f (y) dy, σ < ζ
G
.
G
Hence, if x ∈ / 2B ≡ B(c, 2r) and therefore g G (x, · ) B is continuous, we find that h i (2) g G (x, y) = EWx g G ψ(σ), y), σ < ζ G for all y ∈ B. But, because g G ∂(2B) × B is bounded, we now see that (*) sup g G (x, y) ≤ CWx(2) σ < ζ G , x ∈ / 2B, y∈B
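for some $C\in(0,\infty)$. In particular, this, combined with the obvious Heine–Borel argument, proves the first estimate. In addition, if $a\in\partial_{\rm reg}G$, then, for each $\delta>0$,
$$\varlimsup_{\substack{x\to a\\ x\in G}}\mathcal{W}_x^{(2)}\bigl(\sigma<\zeta^G\bigr)\le\varlimsup_{\substack{x\to a\\ x\in G}}\mathcal{W}_x^{(2)}\bigl(\sigma\le\delta\bigr)+\varlimsup_{\substack{x\to a\\ x\in G}}\mathcal{W}_x^{(2)}\bigl(\zeta^G>\delta\bigr)=\varlimsup_{\substack{x\to a\\ x\in G}}\mathcal{W}_x^{(2)}\bigl(\sigma\le\delta\bigr).$$
Thus, since the last expression obviously tends to 0 as $\delta\searrow0$, this, together with (*), implies that
$$\lim_{\substack{x\to a\\ x\in G}}\ \sup_{y\in B}g^G(x,y)=0,$$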
which (again after the obvious Heine–Borel argument) means that we have also proved the second assertion. Turning to the last part of the statement, let $f\in C_c^1(G;\mathbb{R})$ be given. By the preceding, we know that $G^Gf$ is bounded and tends to 0 at $\partial_{\rm reg}G$. In addition, using Theorem 11.2.14, especially (11.2.16), and arguing as I did in the case when $N\ge3$, it is easy to check that $G^Gf\in C^2(G;\mathbb{R})$ and $\frac12\Delta G^Gf=-f$. Thus, $G^Gf$ is a bounded solution to (11.2.1), and, because (11.1.20) holds, it can be the only such solution.
Exercises for § 11.2
Exercise 11.2.21. Give an explicit expression for the Green function $g^{B(c,R)}$ when $N\ge2$. To this end, first use translation and scaling to see that
$$g^{B(c,R)}(x,y)=R^{2-N}g^{B(0,1)}\Bigl(\frac{x-c}{R},\frac{y-c}{R}\Bigr)$$
for distinct $x,y$ from $B(c,R)$. Thus, assume that $c=0$ and $R=1$. Next, observe that
$$|x-y|=\Bigl||y|x-\frac{y}{|y|}\Bigr|\quad\text{for }x\in\mathbb{S}^{N-1}\text{ and }y\in B_{\mathbb{R}^N}(0,1)\setminus\{0\},$$
and use this observation together with (11.2.12) and (11.2.5) to conclude that
$$g^{B(0,1)}(x,y)=-\frac1\pi\log|y-x|+\frac1\pi\begin{cases}\log\Bigl|\dfrac{y}{|y|}-|y|x\Bigr|&\text{if }y\neq0\\[4pt]0&\text{if }y=0\end{cases}$$
when $N=2$ and
$$g^{B(0,1)}(x,y)=g^{\mathbb{R}^N}(x,y)-\begin{cases}g^{\mathbb{R}^N}\Bigl(\dfrac{y}{|y|}-|y|x\Bigr)&\text{if }y\neq0\\[4pt]\dfrac{2}{(N-2)\omega_{N-1}}&\text{if }y=0\end{cases}$$
when $N\ge3$.
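As a sanity check on the $N=2$ formula in Exercise 11.2.21, the sketch below evaluates it with points of the plane represented as complex numbers and tests three properties one expects of a Green function for the unit disk: it vanishes when $|x|=1$, it is symmetric, and it is positive inside. The function name is hypothetical, and this is only a numerical illustration, not a solution to the exercise.

```python
import cmath
import math

def green_unit_disk(x, y):
    # Candidate formula for g^{B(0,1)}(x, y) when N = 2:
    #   -(1/pi) log|y - x| + (1/pi) log| y/|y| - |y| x |   (second term 0 if y = 0).
    # x and y are complex numbers in the closed unit disk.
    if x == y:
        return math.inf  # convention g(x, x) = infinity
    singular = -math.log(abs(y - x)) / math.pi
    if y == 0:
        return singular
    correction = math.log(abs(y / abs(y) - abs(y) * x)) / math.pi
    return singular + correction

if __name__ == "__main__":
    x, y = 0.3 + 0.2j, -0.5 + 0.1j
    print(green_unit_disk(x, y), green_unit_disk(y, x))
    # On the unit circle the two logarithms cancel.
    print(green_unit_disk(cmath.exp(1j), y))
```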
Exercise 11.2.22. The derivation that I gave of Poisson’s formula (cf. Theorem 11.1.28) required me to already know the answer and simply verify that it is correct. Here I outline another approach, which is the basis for a quite general procedure. To begin with, recall the classical Green’s Identity Z Z ∂u ∂v dλ∂G − v ∂n u∆v − v∆u dx = u ∂n G
∂G
N
for bounded, smooth regions G in R and functions u and v that are smooth in a neighborhood of G. (In the preceding, ∂w ∂n (x) is used to denote the normal derivative ∇w(x), n(x) RN , where n(x) is the outer unit normal at x ∈ ∂G and λ∂G is the standard surface measure for ∂G.) Next, let c be an element of B(0, 1), suppose r > 0 satisfies B(c, r) ⊂⊂ B(0, 1), and let u be a function that is harmonic in a neighborhood of BRN (0, 1). By applying Green’s Identity with G = BRN (0, 1) \ B(c, r) and v = 12 g B(0,1) (c, · ), use Exercise 11.2.21 to verify Z N −1 u(c) = lim r ω, ∇v(c + rω) RN u c + rω) λSN −1 (dω) r&0 SN −1 Z Z = ω, ∇v(ω) RN u ω) λSN −1 (dω) = u ω)π (N ) (c, ω) λSN −1 (dω), SN −1
SN −1
where $\pi^{(N)}$ is the Poisson kernel given in Theorem 11.1.28. Finally, given $f\in C(\partial G;\mathbb{R})$, extend $f$ to $B_{\mathbb{R}^N}(0,1)^\complement$ so that it is constant along rays, take
$$u_R(x)=\mathbb{E}^{\mathcal{W}_x^{(N)}}\Bigl[f\bigl(\psi(\zeta^{B(0,R)})\bigr),\ \zeta^{B(0,R)}<\infty\Bigr]\quad\text{for }R\ge1\text{ and }x\in B(0,R),$$
check that, as $R\searrow1$, $u_R\longrightarrow u_1$ uniformly on $B(0,1)$, and use the preceding to conclude that
$$u_1(c)=\int_{\mathbb{S}^{N-1}}f(\omega)\,\pi^{(N)}(c,\omega)\,\lambda_{\mathbb{S}^{N-1}}(d\omega),$$
which is, of course, the result that was proved in Theorem 11.1.28.

§ 11.3 Excessive Functions, Potentials, and Riesz Decompositions
The origin of the Green function lies in the theory of electricity and magnetism. Namely, if $G$ is a region in $\mathbb{R}^N$ whose boundary is grounded and $y\in G$, then $g^G(\,\cdot\,,y)$ should be the electrical potential in $G$ that results from placing a unit point charge at $y$. More generally, if $\mu$ is any distribution of charge in $G$ (i.e., a non-negative, locally finite, Borel measure on $G$), then one can consider the potential $G^G\mu$ given by
(11.3.1) $$G^G\mu(x)=\int_Gg^G(x,y)\,\mu(dy),\qquad x\in G,$$
where I have implicitly assumed that either N ≥ 3 or (11.1.20) holds. In this section I will characterize functions that arise in this way (i.e., are potentials).
§ 11.3.1. Excessive Functions. Throughout this subsection, $G$ will be a nonempty, connected, open region in $\mathbb{R}^N$, and I will be assuming either that $N\ge3$ or that (11.1.20) holds. Thus, by the results obtained in §§ 8.2.1 and 8.2.2, the Green function (cf. (11.2.3)) $g^G$ satisfies (depending on whether $N=1$, $N=2$, or $N\ge3$) either (11.2.10), (11.2.11), (11.2.16), or (11.2.5), and, in order to have $g^G$ defined everywhere on $G^2$, I will take $g^G(x,x)=\infty$, $x\in G$, when $N\ge2$. I will say that $u$ is an excessive function on $G$ and will write $u\in\mathcal{E}(G)$ if $u$ is a lower semicontinuous, $[0,\infty]$-valued function that satisfies the super mean value property:
$$u(x)\ge\frac1{\omega_{N-1}}\int_{\mathbb{S}^{N-1}}u(x+r\omega)\,\lambda_{\mathbb{S}^{N-1}}(d\omega)\quad\text{whenever }\overline{B_{\mathbb{R}^N}(x,r)}\subseteq G.$$
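The super mean value property is easy to test numerically for radial functions on $\mathbb{R}^3$. The sketch below is only an illustration under stated assumptions: it takes $N=3$, uses the elementary fact that for $\omega$ uniform on $\mathbb{S}^2$ the cosine of the angle between $\omega$ and a fixed direction is uniform on $[-1,1]$, and approximates the spherical average by a midpoint rule. The function names are hypothetical.

```python
import math

def sphere_average_radial(phi, a, r, n=20000):
    # Normalized average of y -> phi(|y|) over the sphere of radius r centred
    # at a point whose distance from the origin is a (dimension N = 3).
    # With t = cos(angle) uniform on [-1, 1]: |x + r*omega|^2 = a^2 + r^2 + 2*a*r*t.
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        t = -1.0 + (i + 0.5) * h          # midpoint of the i-th subinterval
        total += phi(math.sqrt(a * a + r * r + 2.0 * a * r * t))
    return total * h / 2.0

def newtonian(s):
    # s -> 1/s, harmonic away from the origin in R^3
    return 1.0 / s

def truncated(s):
    # minimum of the two excessive functions 1 and 1/s
    return min(1.0, 1.0 / s)

if __name__ == "__main__":
    # Sphere misses the origin: the average equals the centre value 1/2.
    print(sphere_average_radial(newtonian, 2.0, 1.0))
    # Sphere surrounds the origin: the average is 1, below the centre value 2.
    print(sphere_average_radial(newtonian, 0.5, 1.0))
    # Truncated function: average stays below the centre value 1/1.2.
    print(sphere_average_radial(truncated, 1.2, 0.5), truncated(1.2))
```

The first case shows equality (the mean value property of a harmonic function), while the other two show the strict inequality that distinguishes excessive functions from harmonic ones.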
As the next lemma shows, there are lots of excessive functions. Lemma 11.3.2. E(G) is closed under non-negative linear combinations and non-decreasing limits, and u, v ∈ E(G) =⇒ u ∧ v ∈ E(G). Moreover, if u ∈ C 2 G; [0, ∞) , then u ∈ E(G) ⇐⇒ ∆u ≤ 0. Finally, for each non-negative, locally finite, Borel measure µ on G and each non-negative harmonic function h on G, GG µ + h is an excessive function on G. Proof: The initial assertions are obvious. To prove the next part, suppose that u ∈ C 2 G; [0, ∞) is given. If u ∈ E(G), then 1 2 ∆u(x)
= lim
r&0
1
ωN −1
Z
u(x + rω) − u(x) λS N −1 (dω) ≤ 0
SN −1
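for each $x\in G$. Conversely, if $\Delta u\le0$ and $B(x,r)\subset\subset G$, then
$$u(x)=\mathbb{E}^{\mathcal{W}_x^{(N)}}\Bigl[u\bigl(\psi(\zeta^{B(x,r)})\bigr),\ \zeta^{B(x,r)}<\infty\Bigr]-\mathbb{E}^{\mathcal{W}_x^{(N)}}\biggl[\int_0^{\zeta^{B(x,r)}}\tfrac12\Delta u\bigl(\psi(\tau)\bigr)\,d\tau\biggr]\ge\frac1{\omega_{N-1}}\int_{\mathbb{S}^{N-1}}u(x+r\omega)\,\lambda_{\mathbb{S}^{N-1}}(d\omega).$$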
Clearly the third assertion comes down to showing that $G^G\mu$ is excessive. Moreover, by Fatou's Lemma and Tonelli's Theorem, we will know that $G^G\mu$ is excessive as soon as we show that, for each $y\in G$, $g^G(\,\cdot\,,y)$ is excessive. To this end, set $f_n=p^G\bigl(\frac1n,\,\cdot\,,y\bigr)$ and (cf. (11.2.6)) $u_n=G^Gf_n$. Because
$$\int_{\frac1n}^Tp^G(t,\,\cdot\,,y)\,dt\nearrow u_n\quad\text{as }T\to\infty,$$
$u_n$ is lower semicontinuous. In addition, by the Markov property and rotation invariance, $B(x,r)\subset\subset G$ implies
$$u_n(x)\ge\mathbb{E}^{\mathcal{W}_x^{(N)}}\biggl[\int_{\zeta_r}^{\zeta^G}f_n\bigl(\psi(t)\bigr)\,dt\biggr]=\mathbb{E}^{\mathcal{W}_x^{(N)}}\Bigl[u_n\bigl(\psi(\zeta_r)\bigr),\ \zeta_r<\infty\Bigr]=\frac1{\omega_{N-1}}\int_{\mathbb{S}^{N-1}}u_n(x+r\omega)\,\lambda_{\mathbb{S}^{N-1}}(d\omega),$$
where I have introduced the notation
(11.3.3) $$\zeta_r(\psi)=\inf\bigl\{t:\,|\psi(t)-\psi(0)|\ge r\bigr\}$$
and used the rotation invariance of Brownian motion. Hence, each $u_n$ is excessive, and therefore, since
$$u_n(x)=\int_{\frac1n}^\infty p^G(t,x,y)\,dt\nearrow g^G(x,y)\quad\text{as }n\to\infty,$$
we are done.

§ 11.3.2. Potentials and Riesz Decomposition. My next goal is to prove that, apart from the trivial case when $u\equiv\infty$, every excessive function on $G$ admits a unique representation in the form $G^G\mu+h$ for an appropriate choice of $\mu$ and $h$. The proof requires me to make some preparations.
Lemma 11.3.4. If $u\in\mathcal{E}(G)$, then either $u\equiv\infty$ or $u$ is locally integrable on $G$. Next, given a $u\in\mathcal{E}(G)$ that is not identically infinite, there exists a sequence $\{u_n:\,n\ge1\}\subseteq C_c^\infty(G;\mathbb{R})$ and a non-decreasing sequence $\{G_n:\,n\ge1\}$ of open subsets of $G$ with the properties that $G_n\subset\subset G$, $G_n\nearrow G$, $u_n\le u$, $\Delta u_n\le0$ on $G_n$ for each $n\ge1$, and $u_n\longrightarrow u$ pointwise as $n\to\infty$. Moreover, if $\mu_n(dy)=-\frac12\mathbf{1}_{G_n}(y)\Delta u_n(y)\,dy$, then there is a non-negative, locally finite, Borel measure $\mu$ on $G$ such that
(11.3.5) $$\lim_{n\to\infty}\int_G\varphi\,d\mu_n=\int_G\varphi\,d\mu\quad\text{for all }\varphi\in C_c(G;\mathbb{R}).$$
In fact, $\mu$ is uniquely determined by the fact that $\mu=-\frac12\Delta u$ in the sense that
(11.3.6) $$\int_G\tfrac12\Delta\varphi(y)\,u(y)\,dy=-\int_G\varphi\,d\mu\quad\text{for all }\varphi\in C_c^\infty(G;\mathbb{R}).$$
Proof: To prove the first assertion, let $U$ denote the set of all $x\in G$ with the property that
$$\int_{B(x,r)}u(y)\,dy<\infty\quad\text{for some }r>0\text{ with }B(x,r)\subset\subset G.$$
Obviously, U is an open subset of G. At the same time, if x ∈ G \ U and r > 0 is chosen so that BRN (x, 2r) ⊂⊂ G, then, for each y ∈ B(x, r) and s ∈ (0, r), Z 1 u(y + sω) λSN −1 (dω), u(y) ≥ ωN −1 SN −1
and so, after integrating this with respect to N sN −1 ds over (0, r), we get Z Z 1 1 u(z) dz = ∞, u(z) dz ≥ u(y) ≥ ΩN −1 rN B(x,δ) ΩN −1 rN B(y,r)
where δ ≡ r − |y − x|. Hence, we now see that G \ U is also open, and therefore that either U = G or U = ∅ and u ≡ ∞. Now assume that u ∈ E(G) is not identically infinite. To construct the required Gn ’s and un ’s, choose a reference point c ∈ G, set R = 12 |c − G{|, and take ρ ∈ Cc∞ B(0, R4 ); [0, ∞) to be a rotationally invariant function with total integral 1. Next, for each n ∈ Z+ , set and Gn = x ∈ G ∩ B(c, n) : |x − G{| > R n Z (11.3.7) un (x) = ρn (x − y)u(y) dy, x ∈ RN , G4n
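where $\rho_n(\xi)=n^N\rho(n\xi)$. Clearly, $\{u_n:\,n\ge1\}\subseteq C_c^\infty\bigl(G;[0,\infty)\bigr)$. In addition, if $x\in G_n$, then, by taking advantage of the rotation invariance of $\rho$, one can check that
$$u_n(x)=\int_{(0,\frac R4)}t^{N-1}\tilde\rho(t)\biggl(\int_{\mathbb{S}^{N-1}}u\bigl(x+\tfrac tn\omega\bigr)\,\lambda_{\mathbb{S}^{N-1}}(d\omega)\biggr)dt\le u(x)\,\omega_{N-1}\int_{(0,\frac R4)}t^{N-1}\tilde\rho(t)\,dt=u(x),$$
where $\tilde\rho:\,\mathbb{R}\longrightarrow[0,\infty)$ is taken so that $\rho(x)=\tilde\rho\bigl(|x|\bigr)$. Similarly, if $B(x,r)\subset\subset G_n$, then
$$\int_{\mathbb{S}^{N-1}}u_n(x+r\omega)\,\lambda_{\mathbb{S}^{N-1}}(d\omega)=\int_{B(0,\frac R4)}\rho(z)\biggl(\int_{\mathbb{S}^{N-1}}u\bigl(x+\tfrac1nz+r\omega\bigr)\,\lambda_{\mathbb{S}^{N-1}}(d\omega)\biggr)dz\le\omega_{N-1}\int_{B(0,\frac R4)}\rho(z)\,u\bigl(x+\tfrac1nz\bigr)\,dz=\omega_{N-1}u_n(x).$$
Hence, $u_n\restriction G_n$ is a smooth element of $\mathcal{E}(G_n)$, and therefore, by the second part of Lemma 11.3.2, we know that $\Delta u_n\le0$ on $G_n$. To see that $u_n\longrightarrow u$ pointwise,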
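observe that we already know that $u(x)\ge\varlimsup_{n\to\infty}u_n(x)$. On the other hand, because $u$ is lower semicontinuous, an application of Fatou's Lemma yields
$$u(x)\le\varliminf_{n\to\infty}\int_G\rho(y)\,u\bigl(x+\tfrac1ny\bigr)\,dy=\varliminf_{n\to\infty}u_n(x).$$
To complete the proof, let $\mu_n$ be the measure described, and note that
$$u_n(x)=\mathbb{E}^{\mathcal{W}_x^{(N)}}\Bigl[u_n\bigl(\psi(t\wedge\zeta^{G_n})\bigr)\Bigr]-\mathbb{E}^{\mathcal{W}_x^{(N)}}\biggl[\int_0^{t\wedge\zeta^{G_n}}\tfrac12\Delta u_n\bigl(\psi(s)\bigr)\,ds\biggr]$$
$$\ge-\mathbb{E}^{\mathcal{W}_x^{(N)}}\biggl[\int_0^{t\wedge\zeta^{G_n}}\tfrac12\Delta u_n\bigl(\psi(s)\bigr)\,ds\biggr]=\int_0^t\biggl(\int_{G_n}p^{G_n}(s,x,y)\,\mu_n(dy)\biggr)ds$$
for all $n\in\mathbb{Z}^+$ and $(t,x)\in(0,\infty)\times G_n$. Hence, after letting $t\nearrow\infty$, we see that
$$u(x)\ge u_n(x)\ge\int_{G_n}g^{G_n}(x,y)\,\mu_n(dy),\qquad n\in\mathbb{Z}^+\text{ and }x\in G_n.$$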
In particular, because $u(x)<\infty$ for Lebesgue-almost every $x\in G$, this proves that, for each $K\subset\subset G$, $\sup_{n\in\mathbb{Z}^+}\mu_n(K)<\infty$, and therefore (cf. part (iv) of Exercise 9.1.16 and apply a diagonalization procedure) $\{\mu_n:\,n\ge1\}$ is relatively compact in the sense that every subsequence $\{\mu_{n_m}:\,m\ge1\}$ admits a subsequence $\{\mu_{n_{m_k}}:\,k\ge1\}$ and a locally finite, non-negative, Borel measure $\mu$ on $G$ with the property that
$$\lim_{k\to\infty}\int_G\varphi\,d\mu_{n_{m_k}}=\int_G\varphi\,d\mu\quad\text{for all }\varphi\in C_c(G;\mathbb{R}).$$
At the same time, using integration by parts followed by Lebesgue's Dominated Convergence Theorem, we see that
$$\lim_{n\to\infty}\int_G\varphi\,d\mu_n=-\lim_{n\to\infty}\int_G\tfrac12\Delta\varphi\,u_n\,dx=-\int_G\tfrac12\Delta\varphi\,u\,dx,\qquad\varphi\in C_c^2(G;\mathbb{R}),$$
and therefore any limit $\mu$ of $\{\mu_n:\,n\ge1\}$ must satisfy (11.3.6), which proves not only that there is such a $\mu$ but also that (11.3.5) is satisfied.
Lemma 11.3.8. For any lower semicontinuous $u:\,G\longrightarrow[0,\infty]$, $u\in\mathcal{E}(G)$ if and only if
(11.3.9) $$\mathbb{E}^{\mathcal{W}_x^{(N)}}\Bigl[u\bigl(\psi(\tau)\bigr),\ \tau(\psi)<\zeta^G(\psi)\Bigr]\le\mathbb{E}^{\mathcal{W}_x^{(N)}}\Bigl[u\bigl(\psi(\sigma)\bigr),\ \sigma(\psi)<\zeta^G(\psi)\Bigr]$$
for every pair $\sigma$ and $\tau$ of $\{\mathcal{B}_t:\,t\in[0,\infty)\}$-stopping times with $\sigma\le\tau$. In particular, if $u\in\mathcal{E}(G)$ and $B(x,r)\subset\subset G$, then, for any rotationally symmetric $\rho\in C_c\bigl(B(0,r);[0,\infty)\bigr)$ with total integral 1,
$$t\in(0,1)\longmapsto\int_{B(0,r)}\rho(y)\,u(x+ty)\,dy\in[0,\infty]$$
is a non-increasing function.
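Proof: Let $u\in\mathcal{E}(G)$ be given. Clearly (11.3.9) is trivial in the case when $u\equiv\infty$. Thus, assume that $u\not\equiv\infty$, and define $G_n$ and $u_n$ for $n\in\mathbb{Z}^+$ as in (11.3.7). Because $\Delta u_n\restriction G_n\le0$, we know that
$$\mathbb{E}^{\mathcal{W}_x^{(N)}}\Bigl[u_n\bigl(\psi(\tau\wedge\zeta^{G_m}\wedge T)\bigr),\ \sigma(\psi)\wedge T<\zeta^{G_m}(\psi)\Bigr]\le\mathbb{E}^{\mathcal{W}_x^{(N)}}\Bigl[u_n\bigl(\psi(\sigma\wedge T)\bigr),\ \sigma(\psi)\wedge T<\zeta^{G_m}(\psi)\Bigr]$$
for all $1\le m\le n$, $x\in G_m$, and $T\in[0,\infty)$. Next, after noting that $\zeta^{G_m}<\infty$ $\mathcal{W}_x^{(N)}$-almost surely, let $T\nearrow\infty$ in the preceding, and arrive at
$$\mathbb{E}^{\mathcal{W}_x^{(N)}}\Bigl[u_n\bigl(\psi(\tau\wedge\zeta^{G_m})\bigr),\ \sigma(\psi)<\zeta^{G_m}(\psi)\Bigr]\le\mathbb{E}^{\mathcal{W}_x^{(N)}}\Bigl[u_n\bigl(\psi(\sigma)\bigr),\ \sigma(\psi)<\zeta^{G_m}(\psi)\Bigr].$$
But, because $\sigma\le\tau$ and $u\ge u_n\ge0$, this means that
$$\mathbb{E}^{\mathcal{W}_x^{(N)}}\Bigl[u_n\bigl(\psi(\tau)\bigr),\ \tau(\psi)<\zeta^{G_m}(\psi)\Bigr]\le\mathbb{E}^{\mathcal{W}_x^{(N)}}\Bigl[u\bigl(\psi(\sigma)\bigr),\ \sigma(\psi)<\zeta^{G_m}(\psi)\Bigr],$$
which, because $0\le u_n\longrightarrow u$ pointwise, leads, via Fatou's Lemma, first to
$$\mathbb{E}^{\mathcal{W}_x^{(N)}}\Bigl[u\bigl(\psi(\tau)\bigr),\ \tau(\psi)<\zeta^{G_m}(\psi)\Bigr]\le\mathbb{E}^{\mathcal{W}_x^{(N)}}\Bigl[u\bigl(\psi(\sigma)\bigr),\ \sigma(\psi)<\zeta^{G_m}(\psi)\Bigr]$$
and thence, by the Monotone Convergence Theorem, to (11.3.9) when $m\to\infty$.
From here, the rest is easy. Given a lower semicontinuous $u:\,G\longrightarrow[0,\infty]$ and $B(x,r)\subset\subset G$, we have (cf. (11.3.3))
$$\frac1{\omega_{N-1}}\int_{\mathbb{S}^{N-1}}u(x+r\omega)\,\lambda_{\mathbb{S}^{N-1}}(d\omega)=\mathbb{E}^{\mathcal{W}_x^{(N)}}\Bigl[u\bigl(\psi(\zeta_r)\bigr),\ \zeta_r(\psi)<\zeta^G(\psi)\Bigr].$$
Thus, if, in addition, (11.3.9) holds, then
$$t\in[0,1]\longmapsto\frac1{\omega_{N-1}}\int_{\mathbb{S}^{N-1}}u(x+tr\omega)\,\lambda_{\mathbb{S}^{N-1}}(d\omega)\in[0,\infty]$$
is non-increasing; and, therefore, not only is $u$ excessive but also (after passing to polar coordinates and integrating) one finds that the monotonicity described in the final assertion is true.
Theorem 11.3.10 (Riesz Decomposition). Let $G$ be a non-empty, connected open subset of $\mathbb{R}^N$, and assume either that $N\ge3$ or that (11.1.20) holds. If $u\in\mathcal{E}(G)$ is not identically infinite, then there exists a unique locally finite, non-negative Borel measure $\mu$ and a unique non-negative harmonic function $h$ on $G$ with the property that
(11.3.11) $$u(x)=G^G\mu(x)+h(x)\quad\text{for all }x\in G.$$
In fact, $\mu$ is uniquely determined by (11.3.6), and $h$ is the unique harmonic function on $G$ that is dominated by $u$ and has the property that $h\ge w$ for every non-negative harmonic $w$ that is dominated by $u$. (Cf. Exercise 11.3.14 as well.)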
Proof: Take $G_n$ and $u_n$ as in (11.3.7), and define $\mu_n$ accordingly, as in Lemma 11.3.4. Then, for each $1\le m\le n$, Lemma 11.3.4 and the final part of Lemma 11.3.8 say that $u_m\le u_n\nearrow u$ pointwise on $G_m$. In addition, for $m\le n$ and $x\in G_m$,
$$u_n(x)=\int_{G_m}g^{G_m}(x,y)\,\mu_n(dy)+w_{m,n}(x),\quad\text{where }w_{m,n}(x)=\mathbb{E}^{\mathcal{W}_x^{(N)}}\Bigl[u_n\bigl(\psi(\zeta^{G_m})\bigr),\ \zeta^{G_m}<\infty\Bigr].$$
Hence, by the Monotone Convergence Theorem, for any locally finite, non-negative, Borel measure $\nu$ on $G$,
(*) $$\int_{G_m}u(x)\,\nu(dx)=\lim_{n\to\infty}\iint_{G_m^2}g^{G_m}(x,y)\,\nu(dx)\,\mu_n(dy)+\int_{G_m}w_m(x)\,\nu(dx),$$
where $w_m(x)=\mathbb{E}^{\mathcal{W}_x^{(N)}}\bigl[u\bigl(\psi(\zeta^{G_m})\bigr),\ \zeta^{G_m}<\infty\bigr]$.
Notice (cf. Harnack's Principle) that, as the non-decreasing limit of non-negative harmonic functions $\{w_{m,n}:\,n\ge m\}$, $w_m$ is either identically infinite or is itself a non-negative harmonic function on $G_m$; and so, since $u(x)<\infty$ Lebesgue-almost everywhere, (*) shows that the latter must be the case.
Now let $a$ be a fixed element of $G_m$, take $\rho_n$ as in (11.3.7), and, for $n\ge m$, define
$$\varphi_n(y)=\begin{cases}\displaystyle\int_{G_m}\rho_n(x-a)\,g^{G_m}(x,y)\,dx&\text{if }y\in G_m\\[4pt]0&\text{otherwise.}\end{cases}$$
By taking $\nu(dx)=\mathbf{1}_{G_m}(x)\rho_n(x-a)\,dx$ in (*), we see that, for $n\ge m$,
$$\int_{G_m}\rho_n(x-a)\,u(x)\,dx=\lim_{k\to\infty}\int_G\varphi_n(y)\,\mu_k(dy)+\int_{G_m}\rho_n(x-a)\,w_m(x)\,dx.$$
But, since $G_m$ is the intersection of two sets, both of which (cf. part (iv) in Exercise 10.2.19) are regular, and is therefore regular as well, there is an $n(a)\ge m$ for which $\varphi_n$ is continuous whenever $n\ge n(a)$. In particular, by (11.3.5), we can now say that
$$\int_{G_m}\rho_n(x-a)\,u(x)\,dx=\int_G\varphi_n(x)\,\mu(dx)+\int_{G_m}\rho_n(x-a)\,w_m(x)\,dx$$
for all $n\ge n(a)$. In addition, as $n\to\infty$, the reasoning with which we showed that $u_n\longrightarrow u$ in Lemma 11.3.4 shows that the term on the left tends to $u(a)$. At
the same time, it is clear that the second term on the right goes to $w_m(a)$ and that $\{\varphi_n(y):\,n\ge n(a)\}$ tends non-decreasingly to $g^{G_m}(a,y)$. Thus, we have now proved that
(**) $$u=G^{G_m}\mu+w_m\quad\text{on }G_m\text{ for every }m\in\mathbb{Z}^+.$$
Starting from (**), the rest of the proof is quite easy. Namely, fix $x\in G$, choose $m$ so that $x\in G_m$, note that $g^{G_n}(x,\,\cdot\,)$ is non-decreasing as $n\ge m$ increases, and conclude that $G^{G_{n\vee m}}\mu(x)\nearrow G^G\mu(x)$. Hence, by (**) (alternatively, by (11.3.9)), we know that $w_{m\vee n}(x)$ tends non-increasingly to a limit $h(x)$, which Harnack's Principle guarantees to be harmonic as a function of $x\in G$. Thus, after passing to the limit as $m\to\infty$ in (**), we conclude that (11.3.11) holds with the $\mu$ satisfying (11.3.6) and $h=\lim_{m\to\infty}H^{G_m}u$.
To prove that these quantities are unique, note that if $\nu$ is any locally finite, non-negative, Borel measure on $G$ for which $u-G^G\nu$ is a non-negative harmonic function, then, for every $\varphi\in C_c^\infty(G;\mathbb{R})$, simple integration by parts plus the symmetry of $g^G$ shows that
$$-\frac12\int_G\Delta\varphi\,u\,dx=-\frac12\int_G\Delta G^G\varphi\,d\nu=\int_G\varphi\,d\nu.$$
That is, $\nu$ must satisfy (11.3.6); and so we have now derived the required uniqueness result.
Finally, to check the asserted characterization of $h$, suppose that $v$ is a non-negative harmonic function that is dominated by $u$ on $G$. We then have
$$v(x)=\mathbb{E}^{\mathcal{W}_x^{(N)}}\Bigl[v\bigl(\psi(\zeta^{G_m})\bigr),\ \zeta^{G_m}(\psi)<\infty\Bigr]\le w_m(x)\quad\text{for }m\in\mathbb{Z}^+\text{ and }x\in G_m,$$
and therefore the desired conclusion follows from the fact that $w_m$ tends to $h$.
By combining Lemma 11.3.2 with Theorem 11.3.10, we arrive at the following characterization of potentials.
Corollary 11.3.12. Let everything be as in Theorem 11.3.10, and suppose that $u:\,G\longrightarrow[0,\infty]$ is not identically infinite. Then a necessary and sufficient condition for $u$ to be the potential $G^G\mu$ of some locally finite, non-negative, Borel measure $\mu$ on $G$ is that $u$ be excessive on $G$ and have the property that the constant function 0 is the only non-negative harmonic function on $G$ that is dominated by $u$.
Let $u$ be an excessive function on $G$ that is not identically infinite. In keeping with the electrostatic metaphor, I will call the measure $\mu$ entering the Riesz decomposition (11.3.11) of $u$ the charge determined by $u$. A more mathematical interpretation is provided by Schwartz's theory of distributions. Namely, when $u\in\mathcal{E}(G)$ is not identically infinite, it is (cf. Lemma 11.3.4) locally integrable on $G$, and, as such, it determines a distribution there. Moreover, in the language of distribution theory, (11.3.6) says that $\mu=-\frac12\Delta u$. However, the following theorem provides a better way of thinking about $\mu$.
Theorem 11.3.13. Let $G$ be as in Theorem 11.3.10 and $u:\,G\longrightarrow[0,\infty]$ a lower semicontinuous function. Then $u\in\mathcal{E}(G)$ if and only if
$$u(x)\ge u_s(x)\equiv\int_Gu(y)\,p^G(s,x,y)\,dy\quad\text{for all }(s,x)\in(0,\infty)\times G.$$
Moreover, if $u\in\mathcal{E}(G)$ is not identically infinite and, for $s\in(0,\infty)$, $\mu_s(dx)=f_s(x)\,dx$, where $f_s(x)=\frac{u(x)-u_s(x)}{s}$, then, as $s\searrow0$, $\{\mu_s:\,s>0\}$ tends to the charge $\mu$ of $u$ in the sense that
$$\int_G\varphi(x)\,\mu(dx)=\lim_{s\searrow0}\int_G\varphi(x)\,\mu_s(dx)\quad\text{for all }\varphi\in C_c(G;\mathbb{R}).$$
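To see the second assertion of Theorem 11.3.13 at work, here is a minimal numerical sketch in the simplest setting, which is my own choice and not taken from the text: $N=1$, $G=(0,1)$, and $u=g^G(\,\cdot\,,c)$, whose charge should be the unit point mass at $c$. The sketch assumes the standard eigenfunction expansion $p^G(s,x,y)=\sum_{n\ge1}2e^{-n^2\pi^2s/2}\sin(n\pi x)\sin(n\pi y)$ for Brownian motion killed on leaving $(0,1)$ and the resulting closed form $g^G(x,c)=2(x\wedge c)(1-x\vee c)$; all function names are hypothetical.

```python
import math

def green_interval(x, c):
    # Assumed closed form of g^G(x, c) for G = (0, 1) and generator (1/2) d^2/dx^2.
    return 2.0 * min(x, c) * (1.0 - max(x, c))

def smoothed(x, c, s, n_max=200):
    # u_s(x) = int_0^1 u(y) p^G(s, x, y) dy for u = g^G(., c), computed from the
    # (assumed) eigenfunction expansion; equals int_s^infty p^G(t, x, c) dt.
    total = 0.0
    for n in range(1, n_max + 1):
        lam = (n * math.pi) ** 2 / 2.0
        total += (2.0 * math.exp(-lam * s) / lam
                  * math.sin(n * math.pi * x) * math.sin(n * math.pi * c))
    return total

def charge_pairing(phi, c, s, m=2000):
    # Trapezoid-rule approximation of int_0^1 phi(x) f_s(x) dx,
    # where f_s = (u - u_s) / s as in Theorem 11.3.13.
    h = 1.0 / m
    total = 0.0
    for i in range(m + 1):
        x = i * h
        f_s = (green_interval(x, c) - smoothed(x, c, s)) / s
        weight = 0.5 if i in (0, m) else 1.0
        total += weight * f_s * phi(x)
    return total * h

if __name__ == "__main__":
    c, s = 0.3, 1e-3
    approx = charge_pairing(lambda x: math.sin(math.pi * x), c, s)
    # The pairing should be close to phi(c) = sin(pi * c) for small s.
    print(approx, math.sin(math.pi * c))
```

For the test function $\varphi(x)=\sin(\pi x)$ the pairing can also be computed by hand from the expansion, giving $\frac{2(1-e^{-\pi^2s/2})}{\pi^2s}\sin(\pi c)$, which tends to $\varphi(c)$ as $s\searrow0$.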
Proof: If $u\in\mathcal{E}(G)$, then, by the first part of Lemma 11.3.8 with $\tau=s$ and $\sigma=0$, one sees that $u\ge u_s$. Conversely, suppose that $u:\,G\longrightarrow[0,\infty]$ is lower semicontinuous, not identically infinite, and satisfies $u\ge u_s$ for all $s>0$. Then, since $p^G(s,x,\,\cdot\,)>0$, $u$ is locally integrable on $G$. Thus, if $B(c,r)\subset\subset G$ and
$$w_s(x)=\int_{B(c,r)}u(y)\,p^{B(c,r)}(s,x,y)\,dy,$$
then $w_s$ is bounded on $B(c,r)$ and therefore, because $p^{B(c,r)}$ is smooth on $(0,\infty)\times B(c,r)^2$ and satisfies the Chapman–Kolmogorov equation, it follows that $w_s$ is smooth on $B(c,r)$. In addition, because $p^{B(c,r)}\le p^G$ and $u_t\le u$, another application of the Chapman–Kolmogorov equation leads to
$$w_{s+t}(x)=\int_{B(c,r)}u(y)\,p^{B(c,r)}(s+t,x,y)\,dy\le\int_{B(c,r)}p^{B(c,r)}(s,x,y)\,u_t(y)\,dy\le w_s(x)$$
for (s, t) ∈ (0, ∞)2 and x ∈ B(c, r). Hence, if ϕ ∈ Cc2 B(c, r); [0, ∞) , then Z B(c,r)
1 t&0 s
− 12 ∆ws (x)ϕ(x) dx = lim
Z
ws (x) − ws+t (x) ϕ(x) dx ≥ 0,
B(c,r)
which proves that ∆ws ≤ 0 on B(c, r). Since this means that ws ∈ E B(c, r) for each s > 0 and because ws is non-increasing as a function of s, we will know that u ∈ E B(c, r) once we show that ws −→ u pointwise on B(c, r). But, since ws ≤ u, this comes down to checking u(x) ≤ lims&0 ws (x), which follows from lower semicontinuity.
Turning to the second assertion, begin with the observation that, because $u\ge u_s$ and $u$ is lower semicontinuous, $u_s\longrightarrow u$ pointwise as $s\searrow0$. Next, note that for $(s,x)\in(0,\infty)\times G$,
$$\int_Gg^G(x,y)\,f_s(y)\,dy=\lim_{T\to\infty}\frac1s\biggl[\int_0^su_t(x)\,dt-\int_T^{T+s}u_t(x)\,dt\biggr]\le\frac1s\int_0^su_t(x)\,dt\le u(x).$$
Hence, since $u<\infty$ Lebesgue-almost everywhere on $G$, $\sup_{s>0}\mu_s(K)<\infty$ for all $K\subset\subset G$, and so $\{\mu_s:\,s>0\}$ is (cf. part (iv) of Exercise 9.1.16) relatively sequentially compact in the sense that every subsequence admits a subsequence that converges when tested against $\varphi\in C_c(G;\mathbb{R})$. At the same time, if $\varphi\in C_c^2(G;\mathbb{R})$ and $\varphi_s(x)=\int_G\varphi(y)\,p^G(s,x,y)\,dy$, then
$$\varphi_s-\varphi=\int_0^s\biggl(\int_G\tfrac12\Delta\varphi(y)\,p^G(\tau,\,\cdot\,,y)\,dy\biggr)d\tau,$$
and so, by Fubini's Theorem and the symmetry of $p^G(\tau,x,y)$, one can justify
$$\int_G\varphi\,d\mu_s=-\frac1{2s}\int_G\biggl(\int_0^su_\tau(y)\,d\tau\biggr)\Delta\varphi(y)\,dy\longrightarrow-\int_G\tfrac12\Delta\varphi(y)\,u(y)\,dy=\int_G\varphi\,d\mu.$$
Hence, every limit of $\{\mu_s:\,s>0\}$ is $\mu$.

Exercises for § 11.3
Exercise 11.3.14. Let $G$ be a connected open set in $\mathbb{R}^N$, and assume that $N\in\{1,2\}$. If (11.1.20) fails, show that every excessive function on $G$ is constant. Hence, the only cases not already covered by Riesz's Decomposition Theorem are trivial anyhow.
Hint: Using the reasoning employed to prove the first part of Lemma 11.3.4, reduce to the case when $u$ is smooth and satisfies $\Delta u\le0$, and in this case apply the result in Theorem 11.1.26.
Exercise 11.3.15. Let $G$ be an open subset of $\mathbb{R}^N$, and assume that either $N\ge3$ or (11.1.20) holds. If $u$ is an excessive function on $G$ that is not identically infinite and has charge $\mu$, show that $u$ is harmonic on any open $H\subseteq G$ for which $\mu(H)=0$. In addition, show that $u$ is a potential if it is bounded and $u(x)\longrightarrow0$ as $x\in G$ tends to $\partial_{\rm reg}G\cup\{\infty\}$.
Exercise 11.3.16. Let $G$ be a connected, open subset of $\mathbb{R}^N$, and again assume that either $N\ge3$ or (11.1.20) holds. If $u\in\mathcal{E}(G)$ is not identically infinite but $u$ is infinite on the compact set $K$, show that $\mathcal{W}_x^{(N)}\bigl(\exists t\in\bigl(0,\zeta^G(\psi)\bigr)\ \psi(t)\in K\bigr)=0$ for all $x\in G\setminus K$. Finally, apply part (ii) of Exercise 11.1.37 to conclude that $\mathcal{W}_x^{(N)}\bigl(\exists t>0\ \psi(t)\in K\bigr)=0$ for all $x\notin K$.

§ 11.4 Capacity
In the classical theory of electricity, a question of interest is that of determining the largest charge that can be placed on a body so that the resulting electric field nowhere exceeds 1. From a mathematical standpoint this question is the following. Let $M(G)$ denote the space of non-negative, finite Borel measures on an open set $G$. Then, given $\emptyset\neq K\subset\subset\mathbb{R}^3$, what we want to know is the total mass of the $\mu_K\in M(\mathbb{R}^3)$ that is supported on $K$ and solves the extremal problem
$$G^{\mathbb{R}^3}\mu_K(x)=\max\bigl\{G^{\mathbb{R}^3}\mu(x):\,\mu\in M(\mathbb{R}^3)\text{ with }\mu(\mathbb{R}^3\setminus K)=0\text{ and }G^{\mathbb{R}^3}\mu\le1\bigr\}$$
for all $x\in\mathbb{R}^3$. Of course, it is not at all obvious that such a $\mu_K$ exists. Indeed, the proof that it always does was one of Wiener's significant contributions to classical potential theory. As we are about to see, probability provides a simple proof of Wiener's result.¹
§ 11.4.1. The Capacitory Potential. Here I will show that the extremal problem described above has a solution.
Theorem 11.4.1. Assume that $G$ is a connected, open subset of $\mathbb{R}^N$ and that either $N\ge3$ or (11.1.20) holds. Given $K\subset\subset G$, set
(11.4.2) $$p_K^G(x)=\mathcal{W}_x^{(N)}\bigl(\exists t\in\bigl(0,\zeta^G(\psi)\bigr)\ \psi(t)\in K\bigr),\qquad x\in G.$$
Then $p_K^G$ is a potential whose charge $\mu_K^G$ is supported on $K$. Moreover, if $\mu\in M(G)$ is supported on $K$ and $G^G\mu\le1$, then $G^G\mu\le p_K^G$.
Proof: I begin by checking that $p_K^G$ is excessive. For this purpose, note that, for any $s>0$, the Markov property says that
$$\int_Gp_K^G(y)\,p^G(s,x,y)\,dy=\mathcal{W}_x^{(N)}\bigl(\exists t\in\bigl(s,\zeta^G(\psi)\bigr)\ \psi(t)\in K\bigr)\le p_K^G(x).$$
In addition, because $p_K^G$ is bounded, the left-hand side is continuous with respect to $x\in G$, and clearly the middle expression tends non-decreasingly to $p_K^G(x)$ as $s\searrow0$. Thus, by the first part of Theorem 11.3.13, we now know that $p_K^G\in\mathcal{E}(G)$.

¹ It is interesting to note that, although Wiener’s 1924 article, “Certain notions in potential theory,” J. Math. Phys. M.I.T. 4, contains the first proof that an arbitrary compact set is capacitable, it contains no reference to his own measure.
The next step is to prove that pG K is a potential whose charge is supported on K. But, because N ≥ 3 or (11.1.20) holds, it is clear that pG K (x) tends to 0 as x ∈ G tends to either ∂reg G or ∞. Hence, if u is a non-negative harmonic function on G that is dominated by pG K , then u must be a bounded harmonic function that tends to 0 at ∂reg G ∪ {∞}, and so, because N ≥ 3 or (11.1.20) holds, u ≡ 0. Therefore, pG K is a potential. By Exercise 11.3.15, to check that µG (G \ K) = 0, it suffices to show that pG K K is harmonic on G \ K. For this purpose, assume that B(x, r) ⊂⊂ (G \ K), and use the Markov property to justify
1
ωN −1
Z
B(x,r) (N ) Wx B(x,r) pG pG ,ζ (ψ) < ∞ K (ω) λSN −1 (dω) = E K ψ(ζ SN −1 = Wx(N ) ∃t ∈ ζ B(x,r) (ψ), ζ G (ψ) ψ(t) ∈ K = pG K (x).
That is, pG K satisfies the mean value property in G \ K and is therefore harmonic there. To complete the proof I must still show that if µ ∈ M(G) is supported on G K and u ≡ GG µ ≤ 1, then u ≤ pG K , and I will start by showing that u ≤ pK on G \ K. To this end, observe that u is harmonic on G \ K and that it tends to 0 at ∂reg G ∪ {∞}. Thus, if ζδ (ψ) = inf{t ≥ 0 : ψ(t) ∈ K(δ)}, where K(δ) = {x : |x − K| ≤ δ}, then, for δ ∈ 0, dist(K, G{) and x ∈ G \ K(δ), u(x) is dominated by (N )
EWx
u ψ(ζδ ) , ζδ (ψ) < ζ G (ψ) ≤ Wx(N ) ∃t ∈ 0, ζ G (ψ) ψ(t) ∈ K(δ) .
But, as δ & 0, the last expression tends to pG K (x) plus Wx(N ) ∀δ > 0 ζδ < ζ G and lim ζδ = ∞ = ζ G , δ&0
and, because N ≥ 3 or (11.1.20) holds, this additional term is 0. We now know that u ≤ pG K on G \ K. To prove that the same inequality holds on K, first observe that, by part (i) of Exercise 10.2.19, pG K K = 1 ≥ u K when N = 1. Thus, assume that N ≥ 2. In this case, g G (x, x) = ∞ for x ∈ G, and so, since u ≤ 1, µ must be non-atomic. In particular, this means that Z u(x) = lim ur (x), r&0
where ur ≡
g G ( · , y) µ(dy).
G\B(x,r)
But, by the preceding applied with K \ B(x, r) replacing K, ur (x) ≤ pG K\B(x,r) , G G and obviously pK\B(x,r) ≤ pK .
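For a concrete instance of Theorem 11.4.1, take $N=3$, $G=\mathbb{R}^3$, and $K$ the closed ball of radius $R$ centered at the origin. The sketch below rests on two assumptions that are standard but not derived in this section: that $g^{\mathbb{R}^3}(x,y)=\frac1{2\pi|x-y|}$ (the $N=3$ case of the normalization $\frac{2|y-x|^{2-N}}{(N-2)\omega_{N-1}}$ with $\omega_2=4\pi$) and that Brownian motion started at $x$ hits $K$ with probability $\min\bigl(1,\frac R{|x|}\bigr)$. Under them, the uniform measure of total mass $2\pi R$ on the sphere $|y|=R$ should have potential equal to that hitting probability, which the code checks by quadrature. The function names are hypothetical.

```python
import math

def sphere_average(func, a, radius, n=20000):
    # Normalized average over omega in S^2 of func(|x - radius*omega|), |x| = a,
    # using that cos(angle) is uniform on [-1, 1]; midpoint rule in t.
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        t = -1.0 + (i + 0.5) * h
        dist = math.sqrt(a * a + radius * radius - 2.0 * a * radius * t)
        total += func(dist)
    return total * h / 2.0

def green_r3(dist):
    # Assumed Green function of (1/2)*Laplacian on R^3 as a function of |x - y|.
    return 1.0 / (2.0 * math.pi * dist)

def potential_of_uniform_charge(a, radius, mass):
    # Potential at distance a from the origin of the uniform measure of
    # total mass `mass` on the sphere of the given radius.
    return mass * sphere_average(green_r3, a, radius)

def hitting_probability(a, radius):
    # Assumed classical formula for the probability of hitting the ball (a > 0).
    return min(1.0, radius / a)

if __name__ == "__main__":
    R = 1.5
    mass = 2.0 * math.pi * R   # candidate value of the capacity in this normalization
    for a in (0.4, 1.0, 3.0, 10.0):
        print(a, potential_of_uniform_charge(a, R, mass), hitting_probability(a, R))
```

If the assumptions hold, the agreement identifies the uniform measure as the charge of the hitting probability, so its total mass $2\pi R$ is the natural candidate for the capacity of the ball in this normalization.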
The function $p_K^G$ and the measure $\mu_K^G$ are, for the reasons explained above, known as, respectively, the capacitory potential and the capacitory distribution for $K$ in $G$, and the total mass
(11.4.3) $$\mathrm{Cap}(K;G)\equiv\mu_K^G(K)$$
is called the capacity of K in G. As a dividend from Theorem 11.4.1, we get the following important connection between properties of Brownian paths and classical potential theory. Corollary 11.4.4. Let everything be as in the statement of Theorem 11.4.1. Then the following are equivalent: (i) For every x ∈ G, Wx(N ) ∃ t ∈ 0, ζ G (ψ) ψ(t) ∈ K > 0. (ii) There is an x ∈ G for which Wx(N ) ∃ t ∈ 0, ζ G (ψ) ψ(t) ∈ K > 0. (iii) There exists a non-zero, bounded potential on G whose charge is supported in K. (iv) Cap(K; G) > 0. Moreover, Cap(K; G) = 0 for, when N ≥ 3, some G ⊃⊃ K or, when N ∈ {1, 2}, (N ) some G ⊃⊃ K satisfying (11.1.20), if and only if Wx ∃t ∈ (0, ∞) ψ(t) ∈ K = 0 for all x ∈ / K. Proof: The only implications in the equivalence assertion that are not completely trivial are (iii) =⇒ (iv) and (iv) =⇒ (i). But, by Theorem 11.4.1, (iii) G implies that pG K 6≡ 0 and therefore that µK 6= 0. Similarly, (iv) implies that G G µK 6= 0, and therefore, since g > 0 throughout G2 , that pG K > 0 throughoutG. (N ) To prove the final assertion, first suppose that Wx0 ∃t ∈ (0, ∞) ψ(t) ∈ K > 0 for some x0 ∈ / K. Then we can choose R ∈ (0, ∞) so that K ⊂⊂ B(0, R) and B(0,R) B(0,R) pK (x0 ) > 0. In particular, µK 6= 0 and B(0,R)
GG∩B(0,R) µK
B(0,R)
≤ GG µK
≤ 1.
At the same time, because (N )
g G (x, y) ≤ g G∩B(0,R) (x, y) + EWx
h
i g G ψ(ζ G∩B(0,R) ) , ζ G∩B(0,R) (ψ) < ∞ ,
there exists (cf. Corollary 11.2.20 when $N=2$) a $C<\infty$ such that $g^G(x,y)\le g^{G\cap B(0,R)}(x,y)+C$ for all $x\notin B(0,R)$ and $y\in K$. Hence, $G^G\mu_K^{B(0,R)}\le1+C\,\mathrm{Cap}\bigl(K;B(0,R)\bigr)$, and so we have shown that $G^G\mu_K^{B(0,R)}$ is a non-zero, bounded potential on $G$ whose charge is supported in $K$, which, by the preceding equivalences, means that $\mathrm{Cap}(K;G)>0$. Conversely, if $\mathrm{Cap}(K;G)>0$, then, again by the preceding equivalences, we know that $p_K^G>0$ everywhere on $G$, which, of course, means that $\mathcal{W}_x^{(N)}\bigl(\exists t\in(0,\infty)\ \psi(t)\in K\bigr)>0$, first for all $x\in G$ and then for all $x\in\mathbb{R}^N$.
The last part of the preceding allows us to use capacity to determine whether Brownian paths will hit a $K\subset\subset\mathbb{R}^N$. Indeed, we now know that they will if and only if $\mathrm{Cap}(K;G)>0$ for some $G\supset\supset K$ satisfying our hypotheses. Thus, the ability of Brownian paths in $\mathbb{R}^N$ to hit a set is completely determined by the singularity in the Green function. Namely, they will hit $K$ with positive probability if and only if there is a non-zero $\mu$ supported on $K$ for which $G^G\mu$ is bounded. When $N=1$, there is no singularity, and so even points can be hit. When $N\ge2$, there is a singularity, and so, in order to be hit, $K$ has to be large enough to support a measure that is sufficiently smooth to mollify the singularity in the Green function. Non-trivial (i.e., $K$'s for which $K^\complement$ is the interior of its closure) examples of $K$'s that cannot be hit are hard to come by. “Lebesgue’s spine” provides one in $\mathbb{R}^3$ and can be adapted to $\mathbb{R}^N$ for $N\ge3$. When $N=2$ one has to work much harder. The most famous example is a devilishly clever construction, known as “Littlewood’s crocodile,” due to J.E. Littlewood. See M. Brelot’s lecture notes Éléments de la Théorie Classique du Potentiel, published in 1965 by Centre de Documentation Universitaire, Sorbonne, Paris V.
§ 11.4.2. The Capacitory Distribution. In this subsection I will give a probabilistic representation, discovered by K.L. Chung, of the capacitory distribution $\mu_K^G$. Again I assume that $G$ is a connected open subset of $\mathbb{R}^N$ and that either $N\ge3$ or (11.1.20) holds.
The function $\ell_K^G:\,C(\mathbb{R}^N)\longrightarrow[0,\infty]$ given by
(11.4.5) $$\ell_K^G(\psi)=\sup\bigl\{t\in\bigl(0,\zeta^G(\psi)\bigr):\,\psi(t)\in K\bigr\}\quad\Bigl(\equiv0\text{ if }\bigl\{t\in\bigl(0,\zeta^G(\psi)\bigr):\,\psi(t)\in K\bigr\}=\emptyset\Bigr)$$
is called a quitting time. Clearly, $\ell_K^G$ is not a stopping time. On the other hand, it transforms nicely under the time-shift maps $\Sigma_t$. Specifically,
$$\ell_K^G\circ\Sigma_t=\bigl(\ell_K^G-t\bigr)^+\quad\text{for }t\in[0,\zeta^G).$$
Theorem 11.4.6 (Chung).² Let $G$ be a connected open subset of $\mathbb{R}^N$, assume that either $N\ge 3$ or that (11.1.20) holds, and suppose that $K\subset\subset G$ with $\mathrm{Cap}(K;G)>0$. Then, for all Borel measurable $\varphi:G\longrightarrow\mathbb{R}$ that are bounded below and every $c\in G$,
\[
\int_G\varphi\,d\mu_K^G=\mathbb{E}^{W_c^{(N)}}\!\left[\frac{\varphi\bigl(\psi(\ell_K^G)\bigr)}{g^G\bigl(c,\psi(\ell_K^G)\bigr)},\ \ell_K^G\in(0,\infty)\right].
\tag{11.4.7}
\]

² This result appeared originally in K.L. Chung's "Probabilistic approach in potential theory to the equilibrium problem," Ann. Inst. Fourier Gren. 23 # 3, pp. 313–322 (1973). It gives the first direct probabilistic interpretation of the capacitory measure.

§ 11.4 Capacity
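Proof: Take $u=p_K^G$, and define $f_s$ and $\mu_s$ for $s>0$ as in Theorem 11.3.13. Then $sf_s(x)=W_x^{(N)}\bigl(0<\ell_K^G\le s\bigr)$, and so, for any $\varphi\in C_b(G;\mathbb{R})$,
\[
\begin{aligned}
\int_G g^G(c,y)\varphi(y)\,\mu_s(dy)
&=\mathbb{E}^{W_c^{(N)}}\!\left[\int_0^{\zeta^G}\varphi\bigl(\psi(t)\bigr)f_s\bigl(\psi(t)\bigr)\,dt\right]\\
&=\frac1s\int_0^\infty\mathbb{E}^{W_c^{(N)}}\Bigl[\varphi\bigl(\psi(t)\bigr)\,W_{\psi(t)}^{(N)}\bigl(0<\ell_K^G\le s\bigr),\ \zeta^G>t\Bigr]\,dt\\
&=\frac1s\int_0^\infty\mathbb{E}^{W_c^{(N)}}\Bigl[\varphi\bigl(\psi(t)\bigr),\ t<\ell_K^G\le s+t\Bigr]\,dt\\
&=\mathbb{E}^{W_c^{(N)}}\!\left[\frac1s\int_{(\ell_K^G-s)^+}^{\ell_K^G}\varphi\bigl(\psi(t)\bigr)\,dt,\ \ell_K^G\in(0,\infty)\right]\\
&\longrightarrow\mathbb{E}^{W_c^{(N)}}\Bigl[\varphi\bigl(\psi(\ell_K^G)\bigr),\ \ell_K^G\in(0,\infty)\Bigr]\quad\text{as } s\searrow 0,
\end{aligned}
\]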
where, in the passage to the third line, I have applied the Markov property and used the time-shift property of $\ell_K^G$. Next, let $\eta\in C_c(G;\mathbb{R})$ be given, note that $\varphi=\dfrac{\eta}{g^G(c,\,\cdot\,)}$ is again an element of $C_c(G;\mathbb{R})$, and conclude from Theorem 11.3.13 and the preceding that (11.4.7) holds first for $\varphi$'s in $C_c(G;\mathbb{R})$ and then for all bounded, measurable $\varphi$'s on $G$.

Aside from its intrinsic beauty, (11.4.7) has the virtue that it simplifies the proofs of various important facts about capacity. For instance, it allows one to prove a basic monotone convergence result for capacity. However, before doing so, I will need to introduce the energy $E^G(\mu,\nu)$, which is defined for locally finite, non-negative Borel measures $\mu$ and $\nu$ on $G$ by
\[
E^G(\mu,\nu)=\iint_{G^2}g^G(x,y)\,\mu(dx)\,\nu(dy).
\]
Clearly $E^G(\mu,\nu)$ is some sort of inner product, and so it is not surprising that there is a Schwarz inequality for it.

Lemma 11.4.8. For any pair of locally finite, non-negative, Borel measures $\mu$ and $\nu$ on $G$,
\[
E^G(\mu,\nu)\le\sqrt{E^G(\mu,\mu)}\,\sqrt{E^G(\nu,\nu)};
\]
and, when the factors on the right are both finite, equality holds if and only if $a\mu-b\nu=0$ for some pair $(a,b)\in[0,\infty)^2\setminus\{(0,0)\}$.
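A small numerical illustration of Lemma 11.4.8 may be helpful. The sketch below is not taken from the text: it assumes the one-dimensional toy case $G=(0,1)$, for which the Dirichlet Green function of $\frac12\frac{d^2}{dx^2}$ is $g(x,y)=2(x\wedge y)(1-x\vee y)$, and it works with finitely supported measures stored as dictionaries. The names `green_interval` and `energy` are hypothetical helpers introduced only for this example.

```python
import math

def green_interval(x, y):
    """Assumed toy kernel: Dirichlet Green function of (1/2) d^2/dx^2 on (0, 1)."""
    return 2.0 * min(x, y) * (1.0 - max(x, y))

def energy(mu, nu):
    """Discrete analogue of E^G(mu, nu) for measures given as {point: mass}."""
    return sum(green_interval(x, y) * a * b
               for x, a in mu.items()
               for y, b in nu.items())

# Two finitely supported, non-negative measures on (0, 1).
mu = {0.2: 1.0, 0.5: 0.5}
nu = {0.3: 2.0, 0.9: 1.0}

mixed = energy(mu, nu)
bound = math.sqrt(energy(mu, mu) * energy(nu, nu))
print("E(mu, nu) =", mixed, " Schwarz bound =", bound)

# Equality case: nu_prop is a positive multiple of mu, so a*mu - b*nu_prop = 0
# with (a, b) = (3, 1).
nu_prop = {x: 3.0 * a for x, a in mu.items()}
eq_lhs = energy(mu, nu_prop)
eq_rhs = math.sqrt(energy(mu, mu) * energy(nu_prop, nu_prop))
print("proportional case:", eq_lhs, eq_rhs)
```

For measures that are not proportional the inequality is strict, whereas in the proportional case the two sides agree up to rounding.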
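Proof: For each $(t,x)\in(0,\infty)\times G$, set
\[
f(t,x)=\int_G p^G(t,x,y)\,\mu(dy)\quad\text{and}\quad g(t,x)=\int_G p^G(t,x,y)\,\nu(dy),
\]
and note that, by the Chapman–Kolmogorov equation, Tonelli's Theorem, and Schwarz's Inequality:
\[
\begin{aligned}
E^G(\mu,\nu)
&=\int_{(0,\infty)}\left(\iint_{G^2}p^G(t,x,y)\,\mu(dx)\,\nu(dy)\right)dt
=\iint_{(0,\infty)\times G}f\bigl(\tfrac t2,x\bigr)\,g\bigl(\tfrac t2,x\bigr)\,dt\,dx\\
&\le\left(\iint_{(0,\infty)\times G}f\bigl(\tfrac t2,x\bigr)^2\,dt\,dx\right)^{\frac12}\left(\iint_{(0,\infty)\times G}g\bigl(\tfrac t2,x\bigr)^2\,dt\,dx\right)^{\frac12}\\
&=\left(\iint_{(0,\infty)\times G}f(t,x)\,dt\,\mu(dx)\right)^{\frac12}\left(\iint_{(0,\infty)\times G}g(t,x)\,dt\,\nu(dx)\right)^{\frac12}
=\sqrt{E^G(\mu,\mu)}\,\sqrt{E^G(\nu,\nu)}.
\end{aligned}
\]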
Furthermore, when $f$ and $g$ are square integrable, then equality holds if and only if they are linearly dependent in the sense that $af-bg=0$ Lebesgue-almost everywhere for some non-trivial choice of $a,b\in[0,\infty)$. But this means that
\[
\begin{aligned}
a\int_G\varphi\,d\mu
&=\lim_{T\searrow0}\frac aT\int_0^T\left(\iint_{G^2}\varphi(x)\,p^G(t,x,y)\,dx\,\mu(dy)\right)dt
=\lim_{T\searrow0}\frac aT\iint_{(0,T]\times G}\varphi(x)\,f(t,x)\,dt\,dx\\
&=\lim_{T\searrow0}\frac bT\iint_{(0,T]\times G}\varphi(x)\,g(t,x)\,dt\,dx
=\lim_{T\searrow0}\frac bT\int_0^T\left(\iint_{G^2}\varphi(x)\,p^G(t,x,y)\,dx\,\nu(dy)\right)dt
=b\int_G\varphi\,d\nu
\end{aligned}
\]
for every $\varphi\in C_c(G;\mathbb{R})$, and so $a\mu-b\nu=0$.

With this lemma, I can now give the application of (11.4.7) mentioned above.
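Theorem 11.4.9. Let $G$ be as in Theorem 11.4.6 and $\{K_n:n\ge1\}$ a non-increasing sequence of compact subsets of $G$. If $K=\bigcap_1^\infty K_n$, then, for every Borel measurable $\varphi:G\longrightarrow\mathbb{R}$ that is continuous in a neighborhood of $K_1$,
\[
\lim_{n\to\infty}\int_G\varphi\,d\mu_{K_n}^G=\int_G\varphi\,d\mu_K^G,
\]
and so $\mathrm{Cap}(K;G)=\lim_{n\to\infty}\mathrm{Cap}(K_n;G)$. Finally, if $\mu$ is any non-negative Borel measure on $G$ satisfying $\mu(G\setminus K)=0$ and $G^G\mu\le1$, then
\[
E^G(\mu,\mu)\le\mathrm{Cap}(K;G)\quad\text{and equality holds}\iff\mu=\mu_K^G.
\]

Proof: Let $c\in G\setminus K_1$ be given. In view of (11.4.7), checking the first assertion comes down to showing that, for $W_c^{(N)}$-almost every $\psi\in C(\mathbb{R}^N)$,
\[
\ell_{K_n}^G(\psi)\longrightarrow\ell_K^G(\psi)\in\bigl(0,\zeta^G(\psi)\bigr)\quad\text{if either}\quad\bigl\{\ell_{K_n}^G(\psi):n\ge1\bigr\}\subseteq\bigl(0,\zeta^G(\psi)\bigr)\ \text{or}\ \ell_K^G(\psi)\in\bigl(0,\zeta^G(\psi)\bigr).
\]
To this end, let $\psi\in C(\mathbb{R}^N)$ with $\psi(0)=c$ be given. If $\{\ell_{K_n}^G(\psi):n\ge1\}\subseteq\bigl(0,\zeta^G(\psi)\bigr)$, then it is clear that
\[
\ell_{K_n}^G(\psi)\searrow T\ge\ell_K^G(\psi),\quad\text{where } T\in\bigl[0,\zeta^G(\psi)\bigr).
\]
In addition, by continuity, $\psi(T)\in K$, which means first that $T\le\ell_K^G(\psi)$ and then that $\ell_{K_n}^G(\psi)\longrightarrow\ell_K^G(\psi)\in\bigl(0,\zeta^G(\psi)\bigr)$. Next, observe that
\[
0<\ell_K^G(\psi)<\zeta^G(\psi)<\infty\implies\ell_{K_n}^G(\psi)\in\bigl[\ell_K^G(\psi),\zeta^G(\psi)\bigr)\quad\text{for all } n\in\mathbb{Z}^+.
\]
Hence, we are done if (11.1.20) holds. On the other hand, if $N\ge3$, then, because $\lim_{t\to\infty}|\psi(t)|=\infty$ for $W_c^{(N)}$-almost all $\psi\in C(\mathbb{R}^N)$, we know that, for $W_c^{(N)}$-almost every $\psi\in C(\mathbb{R}^N)$,
\[
\zeta^G(\psi)=\infty\ \text{and}\ \ell_K^G(\psi)\in(0,\infty)\implies\bigl\{\ell_{K_n}^G(\psi):n\ge1\bigr\}\subseteq\bigl(0,\zeta^G(\psi)\bigr);
\]
and so we have now completed the proof of the first part.

To prove the final assertion, first choose compact $K_n$'s in $G$ so that $K\subset\subset(K_n)^\circ$ for each $n\in\mathbb{Z}^+$ and $K_n\searrow K$ as $n\to\infty$. Because $p_{K_n}^G\restriction K\equiv1$ and $p_{K_n}^G\le1$, we have that
\[
\begin{aligned}
\mathrm{Cap}(K;G)&=\int_G p_{K_n}^G(x)\,\mu_K^G(dx)=E^G\bigl(\mu_K^G,\mu_{K_n}^G\bigr)
\le E^G\bigl(\mu_K^G,\mu_K^G\bigr)^{\frac12}E^G\bigl(\mu_{K_n}^G,\mu_{K_n}^G\bigr)^{\frac12}\\
&=E^G\bigl(\mu_K^G,\mu_K^G\bigr)^{\frac12}\left(\int_G p_{K_n}^G(x)\,\mu_{K_n}^G(dx)\right)^{\frac12}
\le E^G\bigl(\mu_K^G,\mu_K^G\bigr)^{\frac12}\,\mathrm{Cap}(K_n;G)^{\frac12}\\
&\longrightarrow E^G\bigl(\mu_K^G,\mu_K^G\bigr)^{\frac12}\,\mathrm{Cap}(K;G)^{\frac12}
\end{aligned}
\]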
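as $n\to\infty$. Hence, $\mathrm{Cap}(K;G)\le E^G\bigl(\mu_K^G,\mu_K^G\bigr)$. On the other hand, if $\mu(G\setminus K)=0$ and $G^G\mu\le1$, then, by Theorem 11.4.1, $G^G\mu\le p_K^G\le1$, and so
\[
\begin{aligned}
E^G(\mu,\mu)&=\int_G G^G\mu\,d\mu\le\int_G p_K^G\,d\mu=E^G\bigl(\mu_K^G,\mu\bigr)
\le E^G\bigl(\mu_K^G,\mu_K^G\bigr)^{\frac12}E^G(\mu,\mu)^{\frac12}\\
&=\left(\int_K p_K^G\,d\mu_K^G\right)^{\frac12}\sqrt{E^G(\mu,\mu)}
\le\sqrt{\mathrm{Cap}(K;G)}\,\sqrt{E^G(\mu,\mu)},
\end{aligned}
\]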
and equality can hold only if $a\mu_K^G-b\mu=0$ for some non-trivial pair $(a,b)\in[0,\infty)^2$. When one takes $\mu=\mu_K^G$, this, in conjunction with the preceding, proves that $\mathrm{Cap}(K;G)=E^G\bigl(\mu_K^G,\mu_K^G\bigr)$. In addition, for any $\mu$ with $\mu(G\setminus K)=0$ and $G^G\mu\le1$, it shows that $E^G(\mu,\mu)\le\mathrm{Cap}(K;G)$ and that equality can hold only if $\mu$ and $\mu_K^G$ are related by a non-trivial linear equation, in which case $\mu=\mu_K^G$ follows immediately from the equality $E^G\bigl(\mu_K^G,\mu_K^G\bigr)=E^G(\mu,\mu)$.

The result in Theorem 11.4.9, which was known to Wiener, played an important role in his analysis of classical potential theory. To be more precise, when $N=3$ and $K^\complement$ is regular, $p_K^{\mathbb{R}^3}$ is the continuous function on $\mathbb{R}^3$ that is harmonic off $K$, is 1 on $K$, and tends to 0 at infinity. Thus, it is a relatively simple problem to define the capacitory distribution for such $K$'s in $\mathbb{R}^3$. The importance to Wiener of results like that in Theorem 11.4.9 is that they enabled him (cf. Exercise 11.4.20) to make a consistent assignment of capacity to $K$'s for which $K^\complement$ is not necessarily regular.

§ 11.4.3. Wiener's Test. This subsection is devoted to another of Wiener's famous contributions to classical potential theory. As was pointed out following Corollary 11.4.4, capacity can be used to test whether Brownian paths will hit a compact set $K$. By Lemma 11.1.21, an equivalent statement is that capacity can be used to test whether $\partial_{\mathrm{reg}}(K^\complement)$ is empty or not. The result of Wiener that will be proved here can be viewed as a sharpening of this remark.

Assume that $N\ge2$, and let an open subset $G$ of $\mathbb{R}^N$ and an $a\in\partial G$ be given. For $n\in\mathbb{Z}^+$, set
\[
K_n=\bigl\{y\notin G:\ 2^{-n-1}\le|y-a|\le2^{-n}\bigr\},
\]
and define
\[
W_n(a,G)=
\begin{cases}
n\,\mathrm{Cap}\bigl(K_n;B(a,1)\bigr)&\text{if } N=2\\[2pt]
2^{n(N-2)}\,\mathrm{Cap}\bigl(K_n;B(a,1)\bigr)&\text{if } N\ge3.
\end{cases}
\tag{11.4.10}
\]
Then Wiener's test says that
\[
a\in\partial_{\mathrm{reg}}G\iff\sum_{n=1}^\infty W_n(a,G)=\infty.
\tag{11.4.11}
\]
Notice that, at least qualitatively, (11.4.11) is what one should expect in that the divergence of the series is some sort of statement that $G^\complement$ is robust at $a$. The key to my proof of Wiener's test is the trivial observation that because
\[
p_n(x)\equiv p_{K_n}^{B(a,1)}(x)=\int_{K_n}g^{B(a,1)}(x,y)\,\mu_{K_n}^{B(a,1)}(dy),
\]
and, depending on whether $N=2$ or $N\ge3$, there exists (cf. Exercise 11.2.21) an $\alpha_N\in(0,1)$ such that
\[
\alpha_Nn\le g^{B(a,1)}(a,y)\le\alpha_N^{-1}n\quad\text{or}\quad\alpha_N2^{n(N-2)}\le g^{B(a,1)}(a,y)\le\alpha_N^{-1}2^{n(N-2)}
\]
for $y\in B(a,2^{-n})\setminus B(a,2^{-n-1})$, we know that
\[
\alpha_NW_n(a,G)\le p_n(a)\le\alpha_N^{-1}W_n(a,G),\quad n\in\mathbb{Z}^+.
\]
Hence, in probabilistic terms, Wiener's test comes down to the assertion that
\[
W_a^{(N)}\bigl(\zeta_{0+}^G=0\bigr)=1\iff\sum_1^\infty W_a^{(N)}\bigl(A_n\bigr)=\infty,
\]
where $A_n$ is the set of $\psi\in C(\mathbb{R}^N)$ that visit $K_n$ before leaving $B(a,1)$. Actually, although the preceding equivalence is not obvious, the closely related statement
\[
W_a^{(N)}\bigl(\zeta_{0+}^G=0\bigr)=1\iff W_a^{(N)}\Bigl(\varlimsup_{n\to\infty}A_n\Bigr)>0
\tag{11.4.12}
\]
is essentially immediate. Indeed, if $\psi(0)=a$ and $\zeta_{0+}^G(\psi)=0$, then there exists a sequence of times $t_m\searrow0$ with the property that $\psi(t_m)\in B(a,1)\cap G^\complement$ for all $m$, from which it is clear that $\psi$ visits infinitely many $K_n$'s before leaving $B(a,1)$. Hence, the "$\Longrightarrow$" in (11.4.12) is trivial. As for the opposite implication, suppose that $\psi\in C(\mathbb{R}^N)$ has the properties that $\zeta^{B(a,1)}(\psi)<\infty$, $\bigl\{t\in\bigl[0,\zeta^{B(a,1)}(\psi)\bigr):\psi(t)=a\bigr\}=\{0\}$, and that $\psi$ visits infinitely many $K_n$'s before leaving $B(a,1)$. We can then find a subsequence $\{n_m:m\ge1\}$ and a convergent sequence of times $t_m>0$ such that $\psi(t_m)\in K_{n_m}$ for each $m$. Clearly, $\lim_{m\to\infty}\psi(t_m)=a$, and therefore $\lim_{m\to\infty}t_m=0$. In other words, if $\zeta^{B(a,1)}(\psi)<\infty$, $\bigl\{t\in\bigl[0,\zeta^{B(a,1)}(\psi)\bigr):\psi(t)=a\bigr\}=\{0\}$, and $\psi\in\varlimsup_{n\to\infty}A_n$, then $\zeta_{0+}^G(\psi)=0$. Hence, since $N\ge2$ and therefore
\[
W_a^{(N)}\Bigl(\bigl\{\psi:\ \zeta^{B(a,1)}(\psi)<\infty\ \text{and}\ \forall t>0\ \psi(t)\ne a\bigr\}\Bigr)=1,
\]
we have shown that
\[
W_a^{(N)}\bigl(\zeta_{0+}^G=0\bigr)\ge W_a^{(N)}\Bigl(\varlimsup_{n\to\infty}A_n\Bigr);
\]
and therefore, because $W_a^{(N)}\bigl(\zeta_{0+}^G=0\bigr)\in\{0,1\}$, we have proved the equivalence in (11.4.12).
In view of the preceding paragraph, the proof of Wiener's test reduces to the problem of showing that
\[
W_a^{(N)}\Bigl(\varlimsup_{n\to\infty}A_n\Bigr)>0\iff\sum_1^\infty W_a^{(N)}\bigl(A_n\bigr)=\infty.
\tag{11.4.13}
\]
By the trivial part of the Borel–Cantelli Lemma, the "$\Longrightarrow$" implication in (11.4.13) is easy. On the other hand, because the events $\{A_n:n\ge1\}$ are not mutually independent, the non-trivial part of that lemma does not apply and therefore cannot be used to go in the opposite direction. Nonetheless, as we will see, the following interesting variation on the Borel–Cantelli theme does apply and gives us the "$\Longleftarrow$" implication in (11.4.13).

Lemma 11.4.14. Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space and $\{A_n:n\ge1\}$ a sequence of $\mathcal{F}$-measurable sets with the property that
\[
\mathbb{P}\bigl(A_m\cap A_n\bigr)\le C\,\mathbb{P}\bigl(A_m\bigr)\mathbb{P}\bigl(A_n\bigr),\quad m\in\mathbb{Z}^+\ \text{and}\ n\ge m+d,
\]
for some $C\in[1,\infty)$ and $d\in\mathbb{Z}^+$. Then
\[
\sum_1^\infty\mathbb{P}\bigl(A_n\bigr)=\infty\implies\mathbb{P}\Bigl(\varlimsup_{n\to\infty}A_n\Bigr)\ge\frac1{4C}.
\]
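The two elementary ingredients behind Lemma 11.4.14 can be checked on a computer. The following sketch is only an illustration under assumptions of my own choosing: a uniform probability space with 12 points, four hand-picked events, and exact rational arithmetic. It verifies the inclusion-exclusion lower bound $\mathbb{P}\bigl(\bigcup A_\ell\bigr)\ge\sum\mathbb{P}(A_\ell)-\frac12\sum_{k\ne\ell}\mathbb{P}(A_k\cap A_\ell)$ and the quadratic estimate $s-\frac12Cs^2\ge\frac1{4C}$ for $s\in\bigl[\frac3{4C},\frac1C\bigr]$. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def prob(event, omega_size):
    """Probability of an event under the uniform measure on {0, ..., omega_size - 1}."""
    return Fraction(len(event), omega_size)

def union_lower_bound(events, omega_size):
    """Sum of P(A_l) minus (1/2) * sum over ordered pairs k != l of P(A_k & A_l)."""
    singles = sum(prob(a, omega_size) for a in events)
    # The sum over unordered pairs equals half the sum over ordered pairs k != l.
    pairs = sum(prob(a & b, omega_size) for a, b in combinations(events, 2))
    return singles - pairs

def worst_margin(C, grid=200):
    """Minimum of s - C*s^2/2 over a grid of s in [3/(4C), 1/C], computed exactly."""
    lo, hi = Fraction(3, 4 * C), Fraction(1, C)
    values = []
    for i in range(grid + 1):
        s = lo + (hi - lo) * Fraction(i, grid)
        values.append(s - Fraction(C) * s * s / 2)
    return min(values)

OMEGA_SIZE = 12
events = [frozenset({0, 1, 2}), frozenset({2, 3, 4}),
          frozenset({4, 5, 0}), frozenset({0, 2, 4})]

union_prob = prob(frozenset().union(*events), OMEGA_SIZE)
lower = union_lower_bound(events, OMEGA_SIZE)
print("P(union) =", union_prob, " lower bound =", lower)

for C in (1, 2, 5):
    print("C =", C, " min of s - C s^2 / 2 =", worst_margin(C), " target 1/(4C) =", Fraction(1, 4 * C))
```

Because $s\mapsto s-\frac12Cs^2$ is concave, its minimum over the interval is attained at an endpoint, which is why a grid that contains both endpoints already captures the worst case.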
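Proof: Because
\[
\sum_{n=1}^\infty\mathbb{P}\bigl(A_n\bigr)=\infty\implies\sum_{n=1}^\infty\mathbb{P}\bigl(A_{nd+k}\bigr)=\infty\quad\text{for some } 0\le k<d,
\]
whereas
\[
\mathbb{P}\Bigl(\varlimsup_{n\to\infty}A_n\Bigr)\ge\mathbb{P}\Bigl(\varlimsup_{n\to\infty}A_{nd+k}\Bigr)\quad\text{for each } 0\le k<d,
\]
I may and will assume that $d=1$. Further, since
\[
\mathbb{P}\Bigl(\varlimsup_{n\to\infty}A_n\Bigr)\ge\varlimsup_{n\to\infty}\mathbb{P}\bigl(A_n\bigr),
\]
I will assume that $\mathbb{P}(A_n)\le\frac1{4C}$ for all $n\in\mathbb{Z}^+$. In particular, these assumptions mean that, for each $m\in\mathbb{Z}^+$, we can find an $n_m>m$ such that
\[
s_m\equiv\sum_{\ell=m}^{n_m}\mathbb{P}\bigl(A_\ell\bigr)\in\Bigl[\frac3{4C},\frac1C\Bigr].
\]
Indeed, simply take $n_m$ to be the largest $n>m$ for which $\sum_{\ell=m}^n\mathbb{P}(A_\ell)\le\frac1C$. At the same time, by an easy induction argument on $n>m$, one has that
\[
\mathbb{P}\Bigl(\bigcup_{\ell=m}^nA_\ell\Bigr)\ge\sum_{\ell=m}^n\mathbb{P}\bigl(A_\ell\bigr)-\frac12\sum_{m\le k\ne\ell\le n}\mathbb{P}\bigl(A_k\cap A_\ell\bigr)
\]
for all $n>m\ge1$, and therefore
\[
\mathbb{P}\Bigl(\bigcup_{\ell=m}^\infty A_\ell\Bigr)\ge\mathbb{P}\Bigl(\bigcup_{\ell=m}^{n_m}A_\ell\Bigr)\ge s_m-\frac12Cs_m^2\ge\frac1{4C}
\]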
for all m ∈ Z+ . Proof of Wiener’s Test: All that remains is to check that the sets An (N ) appearing in (11.4.13) satisfy the hypothesis in Lemma 11.4.14 when P = Wa . To this end, set n o σn (ψ) = inf t ∈ (0, ∞) : ψ(t) ∈ Kn . Clearly, An = σn < ζ B(a,1) , and so Wa(N ) Am ∩ An ≤ Wa(N ) σm < σn < ζ B(a,1) + Wa(N ) σn < σm < ζ B(a,1) for all m ∈ Z+ and n 6= m. But, by the Markov property, (N ) Wa(N ) σm < σn < ζ B(a,1) ≤ EWa pn ψ(σm ) , σm (ψ) < ζ B(a,1) (ψ) ≤ β(m, n)pm (a), where I have introduced the notation β(m, n) ≡ maxx∈Km pn (x). Finally, beB(a,1)
cause pn (x) = GB(a,1) µKn (x) and there is a CN < ∞ such that S CN for x ∈ |m−n|≥2 Km and y ∈ Kn ,
g B(a,1) (x,y) g B(a,1) (a,y)
≤
β(m, n) ≤ CN pn (a) for all |m − n| ≥ 2. (N ) Hence, since pn (a) = Wa An , we have now shown that Wa(N ) Am ∩ An ≤ 2CN Wa(N ) Am Wa(N ) An for all |m − n| ≥ 2, which means that Lemma 11.4.14 applies with C = 2CN and d = 2. § 11.4.4. Some Asymptotic Expressions Involving Capacity. Assume K{ that K ⊂⊂ RN and that N ≥ 2. Given K ⊂⊂ RN , define σK (ψ) = ζ0+ (ψ) = inf{t > 0 : ψ(t) ∈ K} to be the first positive entrance time into K. In this subsection I will make some computations in which σK and capacity play a critical role. I begin with a result of F. Spitzer’s3 about the rate of heat transfers from the outside to the inside of a compact set. To be precise, let K ⊂⊂ RN , where N ≥ 3, and think of Z (11.4.15) EK (t) ≡ Wx(N ) σK ≤ t dx K{
as the amount of heat that flows into K during [0, t] from outside. 3
See Electrostatic capacity, heat flow, and Brownian motion, in Z. Wahrsh. Gebiete. 3. Recently, M. Van den Burg has written several papers in which he greatly refines Spitzer’s result.
Theorem 11.4.16 (Spitzer). Assume that $N\ge3$, and, for $K\subset\subset\mathbb{R}^N$, define $t\rightsquigarrow E_K(t)$ as in (11.4.15). Then
\[
\lim_{t\to\infty}\frac{E_K(t)}t=\mathrm{Cap}(K;\mathbb{R}^N).
\]
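Spitzer's theorem can be tested by hand when $K$ is a closed ball of radius $r$ in $\mathbb{R}^3$. The sketch below rests on two facts that are assumptions as far as the present text is concerned: the classical formula $W_x^{(3)}(\sigma_K\le t)=\frac r{|x|}\operatorname{erfc}\bigl(\frac{|x|-r}{\sqrt{2t}}\bigr)$ for $|x|>r$, and the value $\mathrm{Cap}(K;\mathbb{R}^3)=2\pi r$, which is what the normalization of the Green function for $\frac12\Delta$ gives for a ball. Under these assumptions one can compute $E_K(t)=2\pi rt+4r^2\sqrt{2\pi t}$ in closed form, and the code compares this with a trapezoid-rule evaluation. The function names are illustrative.

```python
import math

def hit_by_time(rho, r, t):
    """Assumed classical formula for P(3-d Brownian motion started at distance
    rho > r from the origin hits the ball of radius r by time t)."""
    return (r / rho) * math.erfc((rho - r) / math.sqrt(2.0 * t))

def heat_content(r, t, steps=20000):
    """Trapezoid rule for E_K(t) = 4*pi * int_r^infinity rho^2 * hit_by_time d rho."""
    width = 12.0 * math.sqrt(t)  # the integrand is negligible beyond r + width
    h = width / steps
    total = 0.0
    for i in range(steps + 1):
        rho = r + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * rho * rho * hit_by_time(rho, r, t)
    return 4.0 * math.pi * total * h

def heat_content_closed_form(r, t):
    """Closed form obtained from the assumed hitting formula."""
    return 2.0 * math.pi * r * t + 4.0 * r * r * math.sqrt(2.0 * math.pi * t)

r = 1.0
for t in (1.0, 1e2, 1e4, 1e6):
    print(t, heat_content(r, t) / t, "capacity 2*pi*r =", 2.0 * math.pi * r)
```

The printed ratios decrease toward $2\pi r$, and the closed form shows that the correction to the linear growth is of order $\sqrt t$ in this example.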
Proof: Because, by the second part of Lemma 11.1.5,
\[
W_x^{(N)}\bigl(\sigma_K=t\bigr)=0\quad\text{for all } (t,x)\in(0,\infty)\times\mathbb{R}^N,
\]
we know that $t\rightsquigarrow E_K(t)$ is a bounded, non-negative, continuous, non-decreasing function. I next observe that, for any $0\le h<t$,
\[
E_K(t)-E_K(t-h)=\int_{\mathbb{R}^N}W_x^{(N)}\bigl(t-h<\sigma_K\le t\bigr)\,dx.
\]
To see this, notice that there would be nothing to do if the integral were over $K^\complement$. On the other hand, by part (ii) of Exercise 10.2.19, $W_x^{(N)}(\sigma_K>0)=0$ Lebesgue-almost everywhere on $K$, and so the integral over $K$ does not contribute anything. I now want to replace the preceding by
\[
E_K(t)-E_K(t-h)=\int_{\mathbb{R}^N}W_y^{(N)}\bigl(\sigma_K\le h\ \text{and}\ \sigma_K^h>t\bigr)\,dy,
\tag{*}
\]
where $\sigma_K^h(\psi)\equiv\inf\{s\in(h,\infty):\psi(s)\in K\}$ is the first entrance time into $K$ after time $h$. To prove (*), set
\[
\theta_t^{(x,y)}(s)=\frac{t-s}t\,x+\theta_t(s)+\frac st\,y,\quad s\in[0,t],
\]
where $\theta_t(s)=\theta(s)-\frac{s\wedge t}t\theta(t)$. Then, by (8.3.12) and the reversibility property discussed in Exercise 8.3.22,
\[
\begin{aligned}
W_x^{(N)}\bigl(t-h<\sigma_K\le t\bigr)
&=\int_{\mathbb{R}^N}W^{(N)}\Bigl(t-h<\sigma_K\bigl(\theta_t^{(x,y)}\bigr)\le t\Bigr)\,g^{(N)}(t,y-x)\,dy\\
&=\int_{\mathbb{R}^N}W^{(N)}\Bigl(\sigma_K\bigl(\theta_t^{(y,x)}\bigr)\le h\ \text{and}\ \sigma_K^h\bigl(\theta_t^{(y,x)}\bigr)>t\Bigr)\,g^{(N)}(t,y-x)\,dy,
\end{aligned}
\]
and now integrate with respect to $x$ to arrive at (*) after an application of Tonelli's Theorem and another application of (8.3.12).
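Starting from (*), one has that, for each $h\in[0,\infty)$,
\[
\Delta_K(h)\equiv\lim_{t\to\infty}\bigl(E_K(t+h)-E_K(t)\bigr)=\int_{\mathbb{R}^N}W_y^{(N)}\bigl(\sigma_K\le h\ \text{and}\ \sigma_K^h=\infty\bigr)\,dy,
\]
the convergence being uniform for $h$ in compacts. Thus, $\Delta_K$ is non-negative and continuous, and, from its definition, it is clear that it is additive in the sense that $\Delta_K(h_1+h_2)=\Delta_K(h_1)+\Delta_K(h_2)$. Therefore, by standard results about additive functions, we now know that $\Delta_K(h)=h\Delta_K(1)$.

The problem which remains is that of evaluating $\Delta_K(1)$. First observe that, by (4.3.13),
\[
W_y^{(N)}\bigl(\sigma_K\le h\ \text{and}\ \sigma_K^h=\infty\bigr)\le W_y^{(N)}\bigl(\sigma_K\le h\bigr)\le2N\exp\left(-\frac{|y-K|^2}{2Nh}\right),
\]
and therefore that
\[
\Delta_K(1)=\lim_{h\searrow0}\frac{\Delta_K(h)}h=\lim_{h\searrow0}\frac1h\int_{B(0,R)}W_y^{(N)}\bigl(\sigma_K\le h\ \&\ \sigma_K^h=\infty\bigr)\,dy
\]
for any $R>0$ satisfying $K\subset\subset B(0,R)$. Second, note that
\[
\begin{aligned}
W_y^{(N)}\bigl(\sigma_K\le h\ \text{and}\ \sigma_K^h=\infty\bigr)
&=W_y^{(N)}\bigl(\sigma_K^h=\infty\bigr)-W_y^{(N)}\bigl(\sigma_K=\infty\bigr)
=W_y^{(N)}\bigl(\sigma_K<\infty\bigr)-W_y^{(N)}\bigl(\sigma_K^h<\infty\bigr)\\
&=p_K^{\mathbb{R}^N}(y)-\int_{\mathbb{R}^N}g^{(N)}(h,y-\xi)\,p_K^{\mathbb{R}^N}(\xi)\,d\xi.
\end{aligned}
\]
Finally, combine these with Theorem 11.3.13 to arrive at $\Delta_K(1)=\mathrm{Cap}(K;\mathbb{R}^N)$.

To complete the proof, set $]t[\,=t-\lfloor t\rfloor$ and write
\[
E_K(t)=E_K\bigl(]t[\bigr)+\sum_{n=1}^{\lfloor t\rfloor}\Bigl(E_K\bigl(]t[+n\bigr)-E_K\bigl(]t[+n-1\bigr)\Bigr).
\]
Using this together with $\Delta_K(h)=h\,\mathrm{Cap}(K;\mathbb{R}^N)$, one obtains the desired result.

The next two computations provide asymptotic formulas as $t\nearrow\infty$ for the quantity $W_x^{(N)}\bigl(\sigma_K\in(t,\infty)\bigr)$.

Theorem 11.4.17.⁴ If $N\ge3$ and $K\subset\subset\mathbb{R}^N$, then, as $t\nearrow\infty$,
\[
p_K(t,x)\equiv W_x^{(N)}\bigl(\sigma_K\in(t,\infty)\bigr)\sim\frac{2\,\mathrm{Cap}(K;\mathbb{R}^N)\bigl(1-p_K^{\mathbb{R}^N}(x)\bigr)}{(2\pi)^{\frac N2}(N-2)}\,t^{1-\frac N2}
\]
uniformly for $x$ in compacts.

⁴ This result was conjectured by Kac and first proved by his student A. Joffe. However, I will follow the argument given by F. Spitzer in the article cited above.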
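As a sanity check on the constant in Theorem 11.4.17, one can again take $K$ to be the closed ball of radius $r$ in $\mathbb{R}^3$. The sketch below assumes, as facts not derived in this text, that $p_K^{\mathbb{R}^3}(x)=\frac r{|x|}$, that $\mathrm{Cap}(K;\mathbb{R}^3)=2\pi r$ in the present normalization, and that $W_x^{(3)}\bigl(\sigma_K\in(t,\infty)\bigr)=\frac r{|x|}\operatorname{erf}\bigl(\frac{|x|-r}{\sqrt{2t}}\bigr)$ for $|x|>r$. It then compares this exact tail with the asymptotic expression in the theorem. The function names are hypothetical.

```python
import math

def tail_exact(rho, r, t):
    """Assumed exact value of P(sigma_K in (t, infinity)) for a ball of radius r
    in R^3 and a starting point at distance rho > r from its center."""
    return (r / rho) * math.erf((rho - r) / math.sqrt(2.0 * t))

def tail_asymptotic(rho, r, t):
    """Right-hand side of Theorem 11.4.17 specialized to N = 3 and a ball."""
    N = 3
    capacity = 2.0 * math.pi * r   # assumed capacity of the ball in this normalization
    escape = 1.0 - r / rho         # 1 - p_K(x), the probability of never hitting K
    const = 2.0 * capacity * escape / ((2.0 * math.pi) ** (N / 2.0) * (N - 2))
    return const * t ** (1.0 - N / 2.0)

times = (1e1, 1e2, 1e3, 1e4)
ratios = [tail_exact(3.0, 1.0, t) / tail_asymptotic(3.0, 1.0, t) for t in times]
for t, ratio in zip(times, ratios):
    print(t, ratio)
```

The ratio of the two expressions increases to 1 as $t$ grows, which is consistent with the asserted asymptotic equivalence in this special case.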
Proof: Without loss in generality (cf. Corollary 11.4.4), I will assume that $\mathrm{Cap}(K;\mathbb{R}^N)>0$. Next, set $p_K(x)=p_K^{\mathbb{R}^N}(x)$ and $p_K(t,x,y)=p^{K^\complement}(t,x,y)$, and note that, by the Markov property,
\[
p_K(t,x)=\int_{K^\complement}p_K(y)\,p_K(t,x,y)\,dy.
\]
Thus, since $p_K(t,x,y)\le(2\pi t)^{-\frac N2}$, we know that
\[
\lim_{t\to\infty}\sup_{x\in\mathbb{R}^N}t^{\frac N2-1}\left|p_K(t,x)-\int_{|y|\ge R}p_K(y)\,p_K(t,x,y)\,dy\right|=0
\]
for every $R>0$ with $K\subset\subset B(0,R)$. At the same time, because
\[
p_K(y)=\int_Kg^{\mathbb{R}^N}(x,y)\,\mu_K^{\mathbb{R}^N}(dx),
\]
it is clear that
\[
\lim_{|y|\to\infty}|y|^{N-2}p_K(y)=\frac{2\,\mathrm{Cap}(K;\mathbb{R}^N)}{(N-2)\,\omega_{N-1}}.
\]
Hence, we have now shown that
\[
\lim_{t\to\infty}\sup_{x\in\mathbb{R}^N}t^{\frac N2-1}\left|p_K(t,x)-\frac{2\,\mathrm{Cap}(K;\mathbb{R}^N)}{(N-2)\,\omega_{N-1}}\int_{|y|\ge R}\frac{p_K(t,x,y)}{|y|^{N-2}}\,dy\right|=0
\]
for each $R\in(0,\infty)$ with $K\subset\subset B(0,R)$, and what we must still prove is that
\[
\lim_{t\to\infty}\sup_{|x|\le r}\left|t^{\frac N2-1}\int_{|y|\ge R}\frac{p_K(t,x,y)}{|y|^{N-2}}\,dy-\frac{\omega_{N-1}}{(2\pi)^{\frac N2}}W_x^{(N)}(\sigma_K=\infty)\right|=0
\tag{*}
\]
for all positive $r$ and $R$ with $K\subset\subset B(0,R)$.

To prove (*), let $r$ and $R$ be given, and use (10.3.8) to see that
\[
\int_{|y|\ge R}\frac{p_K(t,x,y)}{|y|^{N-2}}\,dy=q(t,x)-\mathbb{E}^{W_x^{(N)}}\Bigl[q\bigl(t-\sigma_K,\psi(\sigma_K)\bigr),\ \sigma_K<t\Bigr],
\]
where
\[
q(t,x)\equiv\int_{|y|\ge R}\frac{g^{(N)}(t,y-x)}{|y|^{N-2}}\,dy\quad\text{for } (t,x)\in(0,\infty)\times\mathbb{R}^N.
\]
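After changing to polar coordinates and making a change of variables, one can easily check that, for each $T\in[0,\infty)$,
\[
\lim_{t\to\infty}\sup_{\substack{0<s\le T\\|x|\le r}}\left|t^{\frac N2-1}q(t-s,x)-\frac{\omega_{N-1}}{(2\pi)^{\frac N2}}\right|=0.
\]
Thus, if, for $T\in(0,t)$, we write
\[
\begin{aligned}
&t^{\frac N2-1}\int_{|y|\ge R}\frac{p_K(t,x,y)}{|y|^{N-2}}\,dy-\frac{\omega_{N-1}}{(2\pi)^{\frac N2}}W_x^{(N)}(\sigma_K=\infty)\\
&\quad=\left(t^{\frac N2-1}q(t,x)-\frac{\omega_{N-1}}{(2\pi)^{\frac N2}}\right)
-\mathbb{E}^{W_x^{(N)}}\!\left[t^{\frac N2-1}q\bigl(t-\sigma_K,\psi(\sigma_K)\bigr)-\frac{\omega_{N-1}}{(2\pi)^{\frac N2}},\ \sigma_K\le T\right]\\
&\qquad-\mathbb{E}^{W_x^{(N)}}\Bigl[t^{\frac N2-1}q\bigl(t-\sigma_K,\psi(\sigma_K)\bigr),\ \sigma_K\in(T,t)\Bigr]
+\frac{\omega_{N-1}}{(2\pi)^{\frac N2}}W_x^{(N)}\bigl(\sigma_K\in(T,\infty)\bigr),
\end{aligned}
\]
then it becomes clear that (*) will follow once we check that
\[
\begin{gathered}
\lim_{T\to\infty}\sup_{x\in\mathbb{R}^N}W_x^{(N)}\bigl(\sigma_K\in(T,\infty)\bigr)=0\quad\text{and}\\
\lim_{T\to\infty}\sup_{\substack{t>T\\x\in\mathbb{R}^N}}t^{\frac N2-1}\,\mathbb{E}^{W_x^{(N)}}\Bigl[q\bigl(t-\sigma_K,\psi(\sigma_K)\bigr),\ \sigma_K\in(T,t)\Bigr]=0.
\end{gathered}
\tag{**}
\]
To check the first part of (**), note that, by the Markov property,
\[
W_x^{(N)}\bigl(\sigma_K\in(T,T+1]\bigr)=\int_{K^\complement}p_K(T,x,y)\,W_y^{(N)}\bigl(\sigma_K\le1\bigr)\,dy
\le(2\pi T)^{-\frac N2}\int_{\mathbb{R}^N}W_y^{(N)}\bigl(\sigma_K\le1\bigr)\,dy\le CT^{-\frac N2},
\]
where $C=C(N,R)\in(0,\infty)$. Hence, after writing
\[
W_x^{(N)}\bigl(\sigma_K\in(T,\infty)\bigr)\le\sum_{n=0}^\infty W_x^{(N)}\bigl(\sigma_K\in(T+n,T+n+1]\bigr),
\]
we see that, as $T\to\infty$, $W_x^{(N)}\bigl(\sigma_K\in(T,\infty)\bigr)\longrightarrow0$ uniformly with respect to $x\in\mathbb{R}^N$. To handle the second part of (**), note that there is a constant $A\in(0,\infty)$ for which
\[
q(t,x)\le A\,(t\vee1)^{1-\frac N2},\quad(t,x)\in(0,\infty)\times K,
\]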
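and therefore
\[
\begin{aligned}
&t^{\frac N2-1}\,\mathbb{E}^{W_x^{(N)}}\Bigl[q\bigl(t-\sigma_K,\psi(\sigma_K)\bigr),\ \sigma_K\in(T,t)\Bigr]\\
&\quad\le At^{\frac N2-1}\left(W_x^{(N)}\bigl(\sigma_K\in([t]-1,t)\bigr)+\sum_{\ell=[T]}^{[t]-1}(t-\ell)^{1-\frac N2}\,W_x^{(N)}\bigl(\sigma_K\in(\ell-1,\ell]\bigr)\right)\\
&\quad\le ACt^{\frac N2-1}\bigl([t]-1\bigr)^{-\frac N2}+ACt^{\frac N2-1}\sum_{\ell=[T]}^{[t]-1}(t-\ell)^{1-\frac N2}(\ell-1)^{-\frac N2},
\end{aligned}
\]
where the $C$ is the same as the one that appeared in the derivation of the first part of (**). Thus, everything comes down to verifying that
\[
\lim_{m\to\infty}\sup_{n>m}\,n^{\frac N2-1}\sum_{\ell=m}^{n-1}(n-\ell)^{1-\frac N2}\ell^{-\frac N2}=0.
\]
But, by taking $\epsilon_m=m^{\frac2N-1}$ and considering
\[
\sum_{m\le\ell\le(1-\epsilon_m)n}(n-\ell)^{1-\frac N2}\ell^{-\frac N2}\quad\text{and}\quad\sum_{(1-\epsilon_m)n\le\ell<n}(n-\ell)^{1-\frac N2}\ell^{-\frac N2}
\]
separately, one finds that there is a $B\in(0,\infty)$ such that
\[
n^{\frac N2-1}\sum_{\ell=m}^{n-1}(n-\ell)^{1-\frac N2}\ell^{-\frac N2}\le B\epsilon_m.
\]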
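The decay of the weighted sum that closes the preceding proof is easy to observe numerically. The sketch below is only an illustration for $N=3$: it evaluates $n^{\frac N2-1}\sum_{\ell=m}^{n-1}(n-\ell)^{1-\frac N2}\ell^{-\frac N2}$ at a few sample values of $n>m$ and compares the largest of them with $9/\sqrt m$. The constant 9 is a crude bound of my own, obtained by splitting the sum at $\ell=n/2$ rather than at $(1-\epsilon_m)n$, and the sampled $n$'s are an arbitrary choice, so the output indicates the trend but is not a proof.

```python
import math

def weighted_sum(m, n, N=3):
    """n^(N/2 - 1) * sum_{l=m}^{n-1} (n - l)^(1 - N/2) * l^(-N/2)."""
    return n ** (N / 2.0 - 1.0) * sum(
        (n - l) ** (1.0 - N / 2.0) * l ** (-N / 2.0) for l in range(m, n))

def sampled_sup(m, N=3):
    """Maximum of weighted_sum(m, n) over a few sample values n > m
    (an arbitrary sample, standing in for the supremum over all n > m)."""
    return max(weighted_sum(m, n, N) for n in (m + 1, 2 * m, 5 * m, 20 * m))

for m in (4, 25, 100, 400):
    print(m, sampled_sup(m), "crude bound 9/sqrt(m) =", 9.0 / math.sqrt(m))
```

For $N=3$ the printed values shrink roughly like $m^{-1/2}$, which is faster than the rate $\epsilon_m=m^{-1/3}$ that the argument in the text guarantees.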
As one might guess, on the basis of (11.2.15), the analogous situation in $\mathbb{R}^2$ is somewhat more delicate in that it involves logarithms.

Theorem 11.4.18 (Hunt).⁵ Let $K$ be a compact subset of $\mathbb{R}^2$, define $\sigma_K$ as above, assume that $W_x^{(2)}(\sigma_K<\infty)=1$ for all $x\in\mathbb{R}^2$, and use $h_K$ to denote the function $h^G$ given in (11.2.15) when $G=\mathbb{R}^2\setminus K$. Then, as $t\nearrow\infty$,
\[
W_x^{(2)}\bigl(\sigma_K>t\bigr)\sim\frac{2\pi h_K(x)}{\log t}\quad\text{for each } x\in\mathbb{R}^2\setminus K.
\]

⁵ This theorem is taken from G. Hunt's article Some theorems concerning Brownian motion, T.A.M.S. 81, pp. 294–319 (1956). With breathtaking rapidity, it was followed by the articles referred to in § 11.1.4.
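Proof: The strategy of Hunt's proof is to deal with the Laplace transform
\[
\int_0^\infty e^{-\alpha t}\,W_x^{(2)}\bigl(\sigma_K>t\bigr)\,dt=\alpha^{-1}\Bigl(1-\mathbb{E}^{W_x^{(2)}}\bigl[e^{-\alpha\sigma_K}\bigr]\Bigr),
\]
show that
\[
\lim_{\alpha\searrow0}\frac{\log\frac1\alpha}{2\pi}\Bigl(1-\mathbb{E}^{W_x^{(2)}}\bigl[e^{-\alpha\sigma_K}\bigr]\Bigr)=h_K(x),
\tag{*}
\]
and apply Karamata's Tauberian Theorem to conclude first that
\[
\lim_{t\to\infty}\frac{\log t}{2\pi t}\int_0^tW_x^{(2)}\bigl(\sigma_K>\tau\bigr)\,d\tau=h_K(x)
\]
and then, because $t\rightsquigarrow W_x^{(2)}(\sigma_K>t)$ is non-increasing, that the asserted result holds. Thus, everything comes down to proving (*).

Set $G=\mathbb{R}^2\setminus K$. By assumption, $G$ satisfies the hypotheses of Theorem 11.2.14. Now let $x\in G$ be given, and choose $y\in G\setminus\{x\}$ from the same connected component of $G$ as $x$. Then $p^G(t,x,y)>0$ for all $t\in(0,\infty)$. In addition, by (10.3.8), for each $\alpha\in(0,\infty)$,
\[
\int_0^\infty e^{-\alpha t}p^G(t,x,y)\,dt
=\int_0^\infty e^{-\alpha t}g^{(2)}(t,y-x)\,dt-\mathbb{E}^{W_x^{(2)}}\!\left[e^{-\alpha\sigma_K}\int_0^\infty e^{-\alpha t}g^{(2)}\bigl(t,y-\psi(\sigma_K)\bigr)\,dt\right].
\]
Next observe that
\[
\int_0^\infty e^{-\alpha t}g^{(2)}(t,z)\,dt=f\Bigl(\frac{\alpha|z|^2}2\Bigr),\quad\text{where}\quad f(\beta)\equiv\frac1{2\pi}\int_0^\infty t^{-1}\exp\bigl(-\beta t-t^{-1}\bigr)\,dt\ \text{for } \beta>0.
\]
Writing
\[
2\pi f(\beta)=\int_0^1t^{-1}\exp\bigl(-\beta t-t^{-1}\bigr)\,dt+\int_1^\infty t^{-1}e^{-\beta t}\bigl(\exp(-t^{-1})-1\bigr)\,dt+\int_\beta^\infty t^{-1}e^{-t}\,dt,
\]
integrating by parts, and performing elementary manipulations, we find that
\[
f(\beta)=\frac{\log\frac1\beta}{2\pi}+\kappa+o(1)\quad\text{as } \beta\searrow0,
\]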
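where
\[
\kappa=\frac1\pi\int_0^\infty e^{-t}\log t\,dt.
\]
At the same time, we have that
\[
\int_0^\infty e^{-\alpha t}p^G(t,x,y)\,dt\longrightarrow g^G(x,y)\quad\text{as } \alpha\searrow0.
\]
Hence, when we plug these into the preceding, we get
\[
g^G(x,y)=-\frac1\pi\log|y-x|+\frac1\pi\,\mathbb{E}^{W_x^{(2)}}\bigl[\log|y-\psi(\sigma_K)|,\ \sigma_K<\infty\bigr]
+\frac{\log\frac1\alpha}{2\pi}\Bigl(1-\mathbb{E}^{W_x^{(2)}}\bigl[e^{-\alpha\sigma_K}\bigr]\Bigr)+o(1)
\]
as $\alpha\searrow0$. Finally, after comparing this to (11.2.16), we arrive at (*).

Let $K\subset\subset\mathbb{R}^2$ be as in the preceding theorem, and choose some $c\in K^\complement$. By comparing the result just obtained to (11.2.15), we see that
\[
\lim_{t\to\infty}\frac{W_x^{(2)}\bigl(\sigma_K>t\bigr)}{W_x^{(2)}\bigl(\sigma_K>\zeta^{B_{\mathbb{R}^2}(c,t)}\bigr)}=2\quad\text{for each } x\in K^\complement.
\]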
It would be interesting to know if there is a more direct route to this conclusion, in particular, one that avoids a Tauberian argument.

Exercises for § 11.4

Exercise 11.4.19. Assume that $N\ge2$. Given a $\mu\in M(\mathbb{R}^N)$, say that $\mu$ is tame if
\[
\sup_{x\in\mathbb{R}^2}\int_{\mathbb{R}^2}-\log\bigl(|y-x|\wedge1\bigr)\,\mu(dy)<\infty\ \text{when } N=2,\qquad
\sup_{x\in\mathbb{R}^N}\int_{\mathbb{R}^N}|y-x|^{2-N}\,\mu(dy)<\infty\ \text{when } N\ge3.
\]
Further, say that $\Gamma\in\mathcal{B}_{\mathbb{R}^N}$ has capacity zero if there is no tame $\mu\in M(\mathbb{R}^N)$ for which $\mu(\Gamma)>0$.

(i) If $K\subset\subset\mathbb{R}^N$, show that $K$ has capacity 0 if and only if $\mathrm{Cap}\bigl(K;B(0,R)\bigr)=0$ for some $R>0$ with $K\subset\subset B(0,R)$. Further, show that if $K$ has capacity 0, $G$ is open with $K\subset\subset G$, and either $N\ge3$ or (11.1.20) holds, then $\mathrm{Cap}(K;G)=0$.

(ii) If $\Gamma\in\mathcal{B}_{\mathbb{R}^N}$, show that $\Gamma$ has capacity 0 if and only if every compact $K\subseteq\Gamma$ has capacity 0.

(iii) For any open $G\subseteq\mathbb{R}^N$, show that $\partial G\setminus\partial_{\mathrm{reg}}G$ has capacity 0.
(iv) Let $G$ be an open subset of $\mathbb{R}^N$, and assume that either $N\ge3$ or (11.1.20) holds. If $u\in E(G)$ is not identically infinite, show that $\{x\in G:u(x)=\infty\}$ has capacity 0.

(v) Suppose that $G$ is an open subset of $\mathbb{R}^N$ and that either $N\ge3$ or (11.1.20) holds. If $K\subset\subset G$, show that $\{x\in K:p_K^G(x)<1\}$ has capacity 0. Conclude that if $\mu\in M(\mathbb{R}^N)$ is tame and $\mu(\mathbb{R}^N\setminus K)=0$, then
\[
\mu(K)=\int_Gp_K^G\,d\mu=E^G\bigl(\mu,\mu_K^G\bigr)=\int_GG^G\mu\,d\mu_K^G.
\]

Exercise 11.4.20. Let $G$ be an open subset of $\mathbb{R}^N$ for some $N\ge2$, and assume that either $N\ge3$ or that (11.1.20) holds. We know how to define $\mathrm{Cap}(K;G)$ for $K\subset\subset G$. However, the map $K\rightsquigarrow\mathrm{Cap}(K;G)$ is somewhat mysterious. In this exercise we will discuss a few of its important properties, properties that enabled G. Choquet¹ to prove that $\mathrm{Cap}(\,\cdot\,;G)$ admits a well-defined extension to all of $\mathcal{B}_G$.

(i) If $\mu,\nu\in M(G)$ and $G^G\mu\le G^G\nu$, show that $E^G(\mu,\mu)\le E^G(\mu,\nu)$. In particular, conclude that $\mathrm{Cap}(K_1;G)\le\mathrm{Cap}(K_2;G)$ for all compacts $K_1\subseteq K_2\subset G$. Thus the convergence in Theorem 11.4.9 is non-increasing convergence.

(ii) If $K_1,K_2\subset\subset G$, show that
\[
\begin{aligned}
p_{K_1\cup K_2}^G(x)-p_{K_1}^G(x)&=W_x^{(N)}\bigl(\sigma_{K_2}<\zeta^G\le\sigma_{K_1}\bigr)\\
&\le W_x^{(N)}\bigl(\sigma_{K_2}<\zeta^G\le\sigma_{K_1\cap K_2}\bigr)\le p_{K_2}^G(x)-p_{K_1\cap K_2}^G(x),
\end{aligned}
\]
and therefore that $p_{K_1\cup K_2}^G+p_{K_1\cap K_2}^G\le p_{K_1}^G+p_{K_2}^G$.

(iii) By combining (i) and (ii), arrive at
\[
E^G\bigl(\mu_{K_1\cup K_2}+\mu_{K_1\cap K_2},\ \mu_{K_1\cup K_2}+\mu_{K_1\cap K_2}\bigr)\le E^G\bigl(\mu_{K_1\cup K_2}+\mu_{K_1\cap K_2},\ \mu_{K_1}+\mu_{K_2}\bigr).
\]
Next, apply (v) of the preceding exercise to see that
\[
E^G\bigl(\mu_{K_1\cup K_2}+\mu_{K_1\cap K_2},\ \mu_{K_1\cup K_2}+\mu_{K_1\cap K_2}\bigr)=\mathrm{Cap}(K_1\cup K_2;G)+3\,\mathrm{Cap}(K_1\cap K_2;G)
\]
and
\[
E^G\bigl(\mu_{K_1\cup K_2}+\mu_{K_1\cap K_2},\ \mu_{K_1}+\mu_{K_2}\bigr)=\mathrm{Cap}(K_1;G)+\mathrm{Cap}(K_2;G)+2\,\mathrm{Cap}(K_1\cap K_2;G),
\]
and conclude that $\mathrm{Cap}(\,\cdot\,;G)$ satisfies the strong sub-additivity property
\[
\mathrm{Cap}(K_1\cup K_2;G)+\mathrm{Cap}(K_1\cap K_2;G)\le\mathrm{Cap}(K_1;G)+\mathrm{Cap}(K_2;G).
\]
What Choquet showed is that a non-negative set function defined for compact subsets of $G$ and satisfying the monotonicity property in (i), the monotone convergence property in (ii), and the strong subadditivity property in (iii) admits a unique extension to $\mathcal{B}_G$ in such a way that these properties persist. In the articles alluded to earlier, Hunt used Choquet's result to show that the first positive entrance into a Borel set is measurable.

¹ See Choquet's Lectures on Analysis, Vol. I, W.A. Benjamin (1965).
Notation

General

Notation          Description                                                                   See
a ∧ b & a ∨ b     The minimum and the maximum of a and b
a⁺ & a⁻           The non-negative part, a ∨ 0, and non-positive part, −(a ∧ 0), of a ∈ R
f ↾ S             The restriction of the function f to the set S
‖·‖_u             The uniform (supremum) norm
‖ψ‖_[a,b]         The uniform norm of the path ψ restricted to the interval [a, b]              (4.1.1)
var_[a,b](ψ)      Variation norm of the path ψ ↾ [a, b]                                         (4.1.2)
Γ(t)              Euler Gamma function                                                          (1.3.20)
ω_{N−1}           The surface area of the sphere S^{N−1} in R^N                                 (2.1.13)
Ω_{N−1}           The volume, N^{−1} ω_{N−1}, of the unit ball B(0; 1) in R^N
⌊t⌋               The integer part of t ∈ R
Sets and Spaces

Notation          Description                                                                   See
A∁                The complement of the set A
A(δ)              The δ-hull around the set A                                                   §3.1
1_A               The indicator function of the set A                                           §1.1
B_E(a, r)         The ball of radius r around a in E. When E is omitted, it is assumed to be R^N for some N ∈ Z⁺
B(E; R)           Space of bounded, Borel measurable functions from E into R
K ⊂⊂ E            To be read: K is a compact subset of E
C                 The complex numbers
N                 The non-negative integers: N = {0} ∪ Z⁺
S^{N−1}                 The unit sphere in R^N
Q                       The set of rational numbers
Z & Z⁺                  Set of all integers and the subset of positive integers
C(R^N)                  The space C([0, ∞); R^N) of continuous paths ψ : [0, ∞) → R^N                 §9.3
C_b(E; R)               The space of bounded continuous functions from E into R
C_c(G; R)               The space of continuous, R-valued functions having compact support in the open set G
C^{1,2}(R × R^N; R)     The space of functions (t, x) ∈ R × R^N → R which are continuously differentiable once in t and twice in x
D(R^N)                  The space of right-continuous paths ψ : [0, ∞) → R^N with left-limits on (0, ∞)   §4.1.1
H(R^N)                  The Cameron–Martin subspace for Wiener measure on Θ(R^N)                     §8.1.2
L^p(µ; E)               The Lebesgue space of E-valued functions f for which ‖f‖_E^p is µ-integrable
M_1(E)                  The space of Borel probability measures on E                                 §9.1.2
M(E)                    The space of non-negative, finite, Borel measures on E
S(R^N; R) or S(R^N; C)  Real- or complex-valued Schwartz test function space on R^N                  §3.2.3
Measure Theoretic BE B(E; R)
The Borel σ-algebra over E The space of bounded, measurable functions on E To be read the expectation value of X with respect to µ
µ
E [X, A]
on A. Equivalent to
R
A
X dµ. When A is unspecified, it
is assumed to be the whole space δa
The unit point mass at a
Notation λA E µ [X | F ] fˆ
519
Lebesgue measure on the set A. Usually A = RN or some interval To be read: the conditional expectation value of X given §5.1.1
the σ-algebra F The Fourier transform of the function f
§2.3.1
f ⋆g
The convolution of f with g
hϕ, µi
An alternative notation for Eµ [ϕ]
§2.1
The density of the Gauss distribution in RN
§10.1
g (N ) (t, x)
Wiener Measure Gaussian or normal distribution with mean m and co-
γm,C µ ˆ µ⋆ν µ≪ν µn =⇒ µ µ⊥ν
§2.3.1
variance C The Fourier of the measure µ
§2.3.1
The convolution of measures µ with ν The measure µ is absolutely continuous with respect to ν The sequence {µn : n ≥ 1} tends weakly to µ
The set of medians of the random variable Y
N (m, C)
Normal distributions with mean m and covariance C
σ({Xi : i ∈ I})
_
Fi
§9.1.2
The measure µ is singular to ν
med(Y )
Φ∗ µ
Chap. III
The pushforward (image) of µ under Φ
§1.4 §2.3.1 (1.1.16)
The σ-algebra generated by the set of random variables {Xi : i ∈ I} The σ-algebra generated by
S
i∈I
Fi
i∈I
δs
The differential time-shift map on C(RN )
§7.1.4
Σs
The time-shift map on C(RN )
§10.2.1
Wiener measure on Θ(RN ) or C(RN )
§8.1.1
The distribution of x + ψ under W (N )
§10.1.1
W (N ) (N )
Wx
(H, E, WH )
The abstract Wiener space with Cameron–Martin space H
§8.2.2
Potential Theoretic

Notation          Description                                   See
E(G)              The set of excessive functions on G           §11.3.1
g^G(x, y)         Dirichlet Green function for G                §11.2
G^G µ             Green potential with charge µ in G            (11.3.1)
p^G(t, x, y)      Dirichlet heat kernel for G                   §10.3.1
Index
A
iterated logarithm, 189, 366 L´ evy’s martingale characterization, 282 L´ evy’s modulus of continuity, 191 non-differentiability, 183 on a Banach space, 361 pinned, 327, 334 recurrence in one and two dimensions, 413 reflection principle, 188, 294 rotational invariance, 187 scaling invariance, 187, 335 for Banach space, 365 strong law, 188 time inversion, 187 for Banach space, 365 transience for N ≥ 3, 414 transition function for killed, 298 variance of paths, 333 with drift, 444 Burkholder’s Inequality, 262 application to Fourier series, 263 application to Walsh series, 264 for continuous martingales, 289 martingale comparison, 257 for martingale square function, 262
absolutely monotone, 19 absolutely pure jump path, 158 abstract Wiener space, 309 orthogonal invariance, 328 ergodicity, 329 adapted, 266 σ-algebra atom in, 13 tail, 2 trivial, 2 approximate identity, 16 a.e. convergence of, 241 Arcsine Law, 407 a characterization of, 415 for random variables, 409 asymptotic, 32 atom, 13 Azema’s Inequality, 264 B Bachelier, 188 barrier function, 423 Beckner’s inequality, 108 Bernoulli multiplier, 101 Bernoulli random variables, 5 Bernstein polynomial, 17 Berry–Esseen Theorem, 77 Bessel operator, 350 Beta function, 138 Blumenthal’s 0–1 Law, 426 Bochner’s Theorem, 119 Borel measureable linear maps are continuous, 314 Borel–Cantelli Lemma extended version of, 506 martingale extension of, 229 original version, 3 Brownian motion, 177 Erd¨ os–Kac Theorem, 399 H¨ older continuity, 183 in a Banach space, 359
C Calder´ on–Zygmund Decomposition Gundy’s for martingales, 227 Cameron–Martin formula, 312 Cameron–Martin space, 305 classical, 305 in general, 310 capacitory distribution, 499 Chung’s representation of, 500 capacitory potential, 497, 499 capacitory distribution, 499 capacity, 499 monotone continuity, 502 capacity zero, 514 Cauchy distribution, 149 Cauchy initial value problem, 400 centered Gaussian measure, 299 non-degenerate, 306
centered random variable, 179 Central Limit phenomenon, 60 Central Limit Theorem basic case, 64 Berry–Esseen, 77 higher moments, 87 Lindeberg, 61 sub-Gaussian random variables, 89 characteristic function, 82 Chebychev polynomial, 34 Chebyshev’s inequality, 15 Chernoff’s Inequality, 30 Chung–Fuchs Theorem, 231 conditional expectation, 194 application to Fourier series, 204 basic properties, 197 existence and uniqueness, 195 infinite measure, 200 Banach space–valued case, 200 Jensen’s Inequality for, 210 properties, 197 regular, 386 versus orthogonal projection, 202 conditional probability, 196 as limit of na¨ıve case, 209 na¨ıve case, 193 regular version, 388 conditional probability distribution, 388 continuous martingale, 267 Burkholder’s Inequality for, 289 Doob–Meyer Theorem, 285 exponential estimate, 291 exponential martingale, 291 continuous singular functions, 47 convergence in law or distribution, 379 weak, 116 convolution, 63 measure with measure, 115 of function with measure, 83 of functions, 63 countably generated σ-algebra, 13 covariance, 84 Cram´ er’s Theorem, 27 D De,Finetti, 219 strong law, 220 difference operator, 18 Dirichlet problem, 418
balayage procedure, 426 Courant–Friedrichs–Lewy scheme, 428 finite difference scheme, 428 Perron–Wiener solution, 423 regular point, 421 uniqueness, 463 uniqueness criterion N ≥ 3, 466 N ∈ {1, 2}, 467 distribution, 12 function, 7 Gaussian or normal, 85 uniform, 6 distribution of a stochastic process, 152 Donsker’s Invariance Principle, 393 Doob’s Decomposition, 213 continuous case, see Doob–Meyer Doob’s Inequality Banach-valued case, 239 continuous parameter, 270 discrete parameter, 207 Doob’s Stopping Time Theorem continuous parameter, 275 discrete parameter, 213 Doob–Meyer Decomposition, 285 drift, 444 Duhamel’s Formula, 282 for Green function when N = 2, 482 for Green function when N ≥ 3, 476 for killed Brownian motion, 298 E eigenvalues for Dirichlet Laplacian, 450 principal eigenvalue, 450 Weyl’s asymptotic formula, 453 empirical distribution, 384 energy of a charge, 501 equicontinuous family, 377 Erd¨ os–Kac Theorem, 399 ergodic hypothesis continuous case, 254 discrete case, 249 ergodic theory Individual Ergodic Theorem continuous parameter, 254 discrete parameter, 248 stationary family, 251 error function, 72 Euler’s Gamma function, 32 excessive function, 488
charge determined by, 494 Riesz Decomposition of, 492 exchangeable random variables, 220 Strong Law for, 220 exponential random variable, 161 extended stopping time, 278

F

Fernique’s Theorem, 306 application to functional analysis, 314 Feynman’s representation, 303 Feynman–Kac formula, 403 heat kernel, 437 fibering a measure, 389 first entrance time, asymptotics of distribution N = 2, 512 N ≥ 3, 509 first exit time, 419 fixed points of Tα, 92 Fourier transform, 82 Beckner’s inequality for, 108 diagonalized by Hermite functions, 100 for measure on Banach space, 301 inversion formula, 98, 112 of a function, 82 of a measure, 82 operator, 100 Parseval’s Identity for, 112 free fields Gaussian, 343 ergodicity, 358 existence of, 352 function characteristic, 82 distribution, 7 error, 72 Euler’s Beta, 138 Euler’s Gamma, 32 excessive, 488 Fourier transform of, 82 Hermite, 100 indicator, 4 moment generating, 23 logarithmic, 25 normalized Hermite, 112 probability generating, 19 progressively measurable, 266
Rademacher, 5 rapidly decreasing, 82 tempered, 97

G
Gamma distribution, 138 Gamma function, 32 Gauss kernel, 23 Gaussian family, 179 conditioning, 203 Gaussian measure on a Banach space, 299 support of, 321 Gaussian random variable, independence vs. orthogonality, 94 generalized Poisson process, 171 Green function, 476 for balls, 486 Duhamel’s Formula for N = 2, 482 Duhamel’s Formula for N ≥ 3, 476 properties when N = 2, 485 Green’s Identity, 487 ground state, 439, 448 associated eigenvalue, 439 ground state representation, 439 Guivarc’h recurrence lemma, 45, 256

H

Haar basis, 319 Hardy’s Inequality, 238 Hardy–Littlewood Maximal Inequality, 235 harmonic function, 419 Harnack’s Inequality and Principle, 471 Liouville Theorem, 472 removable singularities for, 472 harmonic measure, 468 for balls, 469 for $\mathbb{R}^N_+$, 469 harmonic oscillator, 406 Harnack’s Inequality, 471 Harnack’s Principle, 471 heat equation, 400 Cauchy initial value problem, 400 heat kernel, 429 Dirichlet, 435 Feynman–Kac, 437 Hermite, 406, 454 heat transfer, Spitzer’s asymptotic rate, 507 Hermite functions, 100 eigenfunctions for Hermite operator, 454
Fourier eigenvectors, 100 normalized, 112 Hermite heat kernel, 406 Hermite multiplier, 98 Hermite operator, 406 Hermite polynomials, 97 $L^p$-estimate, 114 Hewitt–Savage 0–1 Law, 221 Hölder conjugate, 100 hypercontractive, 105

I

independent events or sets, 1 random variables, 4 existence in general, 12 existence of R-valued sequences, 7 σ-algebras, 1 indicator function, 4 inequality Azéma’s, 264 Burkholder’s, 262, 289 Gross’s logarithmic Sobolev, 114 Harnack’s, 471 Jensen’s, 210, 240 Khinchine’s, 94 Kolmogorov’s, 36 Lévy’s, 40 Nelson’s Hypercontractive, 106 infinitely divisible, 115 measure or law, 115 inner product for measures, 230 integer part, 5 invariant set, 246

J

Jensen’s Inequality, 210 Banach-valued case, 240 jump function, 156

K

Kac’s Theorem, 252 Kakutani’s Theorem, 229 kernel Gauss, 23 Mehler’s, 98 Khinchine’s Inequality, 94
Kolmogorov’s continuity criterion, 182 Extension or Consistency Theorem, 384 Inequality, 36 Strong Law, 38 0–1 Law, 2 Kronecker’s Lemma, 37

L

λ-system, 8 Laplace transform inversion formula, 21 large deviations estimates, 28 Law of Large Numbers Strong in Banach space, 241, 256, 384 for empirical distribution, 384 for exchangeable random variables, 220 Kolmogorov’s, 38 Weak, 16 refinement, 20, 44, 45 Law of the Iterated Logarithm converse, 56 proof of, 54 statement, 49 Strassen’s Version, 340, 366 Lebesgue’s Differentiation Theorem, 237 Lévy measure, 128 Itô map for, 390 Lévy operator, 268 Lévy process, 152 reflection, 292 Lévy system, 134 Lévy’s Continuity Theorem, 118 second version, 120 Lévy–Cramér Theorem, 66 Lévy–Khinchine formula, 136 limit superior of sets, 2 Lindeberg’s Theorem, 61 Lindeberg–Feller Theorem, 62 Feller’s part, 90 Liouville Theorem, 472 locally µ-integrable, 199 Logarithmic Sobolev Inequality, 113 for Bernoulli, 113 logarithmic Sobolev Inequality for Gaussian, 114, 356 lowering operator, 97
M

marginal distribution, 83 Markov property, 417 martingale, 205 application to Fourier series, 263 continuous parameter, 267 complex, 267 Gundy’s decomposition of, 227 Hahn decomposition of, 227 reversed, 217 Banach-valued case, 241 on σ-finite measure space, 233 martingale convergence continuous parameter, 271 Hilbert-valued case, 243 Marcinkiewicz’s Theorem, 207 preliminary version for Banach space, 239 second proof, 226 third proof, 227 via upcrossing inequality, 214 maximal function Hardy–Littlewood, 235 Hardy–Littlewood inequality, 236 maximum principle of Phragmén–Lindelöf, 474 Maxwell distribution for ideal gas, 70 mean value Banach space case, 199 vector-valued case, 84 measure invariant, 112 locally finite, 63 non-atomic, 381 product, 10 pushforward Φ∗µ of µ under Φ, 12 measure preserving, 244 measures consistent family, 383 tight, 376, 382 median, 39 variational characterization, 43 Mehler kernel, 98 minimum principle, 130 strong, 405 weak, 404 moment estimate for sums of independent random variables, 94 moment generating function, 23 logarithmic, 25 multiplier Bernoulli, 101
Hermite, 98

N
Nelson’s Inequality, 106 non-degenerate, 306 non-negative definite function, 119 non-negative linear functional, 374 normal law, 23 fixed point characterization, 91 Lévy–Cramér Theorem, 66 standard, 23 null set, see P-null set

O

operator Fourier, 100 hypercontractive, 105 lowering, 97 raising, 96 optional stopping time, 280 Ornstein–Uhlenbeck process, 344 ancient, 345 associated martingales, 415 Gaussian description, 344 Hermite heat kernel, 454 reversible, 346 in Banach space, 365

P

Paley–Littlewood Inequality for Walsh series, 264 Paley–Wiener map, 312 as a stochastic integral, 316 Parseval’s Identity, 112 path properties, 158 absolutely pure jump, 158 piecewise constant, 158 Phragmén–Lindelöf, 474 pinned Brownian motion, 327 π-system, 8 P-null set, 194 Poincaré’s Inequality for Gaussian, 355 Poisson jump process, 168 Itô’s construction of, 390 Poisson kernel, 149 for upper half-space, 429 for ball via Green’s Identity, 487 Poisson measure, 122 generalized, 171 simple, 161
Poisson point process, 176 Poisson problem, 475 Poisson process, 161, 163 associated with πM, 164 generalized, 171 jump distribution, 163 rate, 163 simple, 161 Poisson random variable, N-valued, 21 Poisson’s formula, 469 Polish space, 367 potential, 487 charge determined by, 494 in terms of excessive functions, 494 principle of accompanying laws, 380 probability space, 1 process Brownian motion, 177 with drift, 444 Ornstein–Uhlenbeck, 344 stationary, 345 process with independent, homogeneous increments, 152 product measure, 10 progressively measurable, 205, 266 versus adapted, 267 pushforward measure Φ∗µ, 12

Q

quitting time, 500

R

Rademacher functions, 5 Radon–Nikodym derivatives, martingale interpretation, 216 raising operator, 96 random variable, 4 N-valued Poisson, 21 Bernoulli, 5 characteristic function, 82 convergence in law, 379 Gaussian or normal, 23 vector-valued case, 85 median of, 39 sub-Gaussian, 88 symmetric, 44 uniformly integrable, 15 variance of, 15 rapidly decreasing, 9, 82 Rayleigh’s Random Flights Model, 396, 399
recurrence of Brownian motion, 413 reflection principle Brownian motion, 188, 294 for independent random variables, 40 regular point, 421, 427 exterior cone condition, 427 probabilistic criterion, 421 Wiener’s test for, 504 removable singularity, 472 return time, Kac’s Theorem for, 252 Riemann–Lebesgue Lemma, 121 Riesz Decomposition Theorem, 492 Robin’s constant, 485

S

semigroup, hypercontractive estimate, 105 shift invariant, 251 σ-algebra, countably generated, 13 simple Poisson process, 163 run at rate α, 163 Sobolev space, 350 square function, Burkholder’s Inequality for, 262 stable laws, 141 order $\frac{1}{2}$ one-sided Brownian motion, 281 density, 149 characterization, 144 one-sided, 147 density, 148 symmetric, 146 densities, 149 state space, 152 stationary, 251 stationary family canonical setting for, 251 Kac’s Theorem for, 252 stationary process, 345 statistical mechanics, derivation of Maxwell distribution, 70 Stein’s method, 72 Stirling’s formula, 32, 70 stochastic integral, 316 stochastic process, 152 adapted, 266 continuous, 266 distribution of, 152 independent increments, 152 modification, 189 reversible, 346
right-continuous, 266 state of, 152 stochastic continuity, 189 stopping time, 212 continuous parameter, 272 discrete case, 212 extended, 278 old definition, 280 optional, 280 Stopping Time Theorem Doob’s, continuous parameter, 275 Doob’s, discrete parameter, 213 Hunt’s, continuous parameter, 275 Hunt’s, discrete parameter, 213 Strassen’s Theorem, 340 Brownian formulation of, 363 Strong Law of Large Numbers, 23 for Brownian motion, 188 for empirical distribution, 384 in Banach space, 241, 256, 384 Kolmogorov’s, 38 strong Markov property, 417 Strong Minimum Principle, 405 strong topology on $M_1(E)$, 369 not metrizable, 381 sub-Gaussian random variables, moment estimates, 93 submartingale, 205 continuous parameter, 267 Doob’s Decomposition, 213 Doob’s Inequality continuous parameter, 270 discrete parameter, 206 Doob’s Upcrossing Inequality, 214 reversed, 217 σ-finite measure space, 233 stopping time theorem Doob’s discrete parameter, 212 Doob’s continuous parameter, 275 Hunt’s discrete parameter, 213 Hunt’s continuous parameter, 275 subordination, 148 symmetric difference of sets, 246 symmetric random variable, 44 moment relations, 45

T

tail σ-algebra, 2 and exchangeability, 220 ergodicity of, 256
tempered, 97 tempered distribution, 350 tight, 376, 382 for finite measures, 382 time reversal, 335 time-shift map, 416 Tonelli’s Theorem, 4 transform Fourier, see Fourier transform Laplace, 21 Legendre, 26 transformation, measure preserving, 244 transient, 414 transition probability, 112

U

uniform norm $\|\cdot\|_u$, 17 uniform topology on $M_1(E)$, 367 uniformly distributed, 6 uniformly integrable, 15 unit exponential random variable, 161

V

variance, 15 variation norm, 368

W

Walsh functions, 264 weak convergence, 116 equivalent formulations, 372 principle of accompanying laws, 380 Weak Law of Large Numbers, 16 Weak Minimum Principle, 404 weak topology on $M_1(E)$, 370 completeness, 377 Prohorov metric for, 379 separable, 376, 382 weak-type inequality, 207 Weierstrass’s Approximation Theorem, 17 Wiener measure, 301 Arcsine law, 407 Feynman’s representation, 303 Markov property, 417 translation by x, 401 Wiener series, 318 classical case, 334 Wiener’s test for regularity, 504