n, then what is the probability that A will maintain a lead throughout the process of sequentially counting all m + n votes cast? [Hint: P(out of m + n votes cast, A scores m votes, B scores n votes, and A maintains a lead throughout) = P(T_{m−n} = m + n).]
7. Let {S_n} be the simple random walk starting at 0 and let M_N = max{S_n: n = 0, 1, 2, ..., N}, m_N = min{S_n: n = 0, 1, 2, ..., N}. (i) Calculate the distribution of M_N. (ii) Calculate the distribution of m_N. (iii) Calculate the joint distribution of M_N and S_N. [Hint: Let a > 0, b be integers, a ≥ b. Then

$$\begin{aligned}
P(M_N \ge a,\ S_N \le b) &= P(T_a \le N,\ S_N \le b) = \sum_{n=1}^{N} P(T_a = n,\ S_N \le b) \\
&= \sum_{n=1}^{N} P(T_a = n,\ S_N - S_n \le b - a) = \sum_{n=1}^{N} P(T_a = n)\,P(S_{N-n} \le b - a).
\end{aligned}$$
]
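As a sanity check on the hint, the sketch below computes both sides of P(M_N ≥ a, S_N ≤ b) = Σ_{n=1}^N P(T_a = n)P(S_{N−n} ≤ b − a) by exact dynamic programming over the paths of a short walk. The function names and the test values (N = 12, a = 3, b = 1, p = 0.6) are illustrative choices and do not come from the text. For p = 1/2 it also checks the reflection form P(M_N ≥ a) = P(S_N ≥ a) + P(S_N ≥ a + 1), which is one possible route to part (i).

```python
from math import comb


def walk_dist(m, p):
    """Law of S_m for the simple random walk started at 0: {y: P(S_m = y)}."""
    q = 1.0 - p
    return {2 * k - m: comb(m, k) * p ** k * q ** (m - k) for k in range(m + 1)}


def first_passage(a, N, p):
    """hit[n] = P(T_a = n), n = 1..N, for a level a > 0 (walk killed on reaching a)."""
    q = 1.0 - p
    alive = {0: 1.0}                  # mass of paths that have not yet reached a
    hit = [0.0] * (N + 1)
    for n in range(1, N + 1):
        nxt = {}
        for y, w in alive.items():
            for step, pr in ((1, p), (-1, q)):
                z = y + step
                if z == a:
                    hit[n] += w * pr
                else:
                    nxt[z] = nxt.get(z, 0.0) + w * pr
        alive = nxt
    return hit


def joint_max_end(a, b, N, p):
    """P(M_N >= a, S_N <= b), by propagating the pair (position, running maximum)."""
    q = 1.0 - p
    state = {(0, 0): 1.0}
    for _ in range(N):
        nxt = {}
        for (y, mx), w in state.items():
            for step, pr in ((1, p), (-1, q)):
                z = y + step
                key = (z, max(mx, z))
                nxt[key] = nxt.get(key, 0.0) + w * pr
        state = nxt
    return sum(w for (y, mx), w in state.items() if mx >= a and y <= b)


def hint_formula(a, b, N, p):
    """Right-hand side of the hint: sum_n P(T_a = n) P(S_{N-n} <= b - a)."""
    hit = first_passage(a, N, p)
    total = 0.0
    for n in range(1, N + 1):
        tail = sum(pr for y, pr in walk_dist(N - n, p).items() if y <= b - a)
        total += hit[n] * tail
    return total


def reflection_check(a, N):
    """Symmetric case: returns P(M_N >= a) and P(S_N >= a) + P(S_N >= a + 1)."""
    dist = walk_dist(N, 0.5)
    lhs = joint_max_end(a, N, N, 0.5)         # b = N puts no restriction on S_N
    rhs = (sum(pr for y, pr in dist.items() if y >= a)
           + sum(pr for y, pr in dist.items() if y >= a + 1))
    return lhs, rhs


print(joint_max_end(3, 1, 12, 0.6), hint_formula(3, 1, 12, 0.6))
print(reflection_check(3, 12))
```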
8. What percentage of the particles at y at time N are there for the first time in a dilute system of many noninteracting (i.e., independent) particles each undergoing a simple random walk starting at the origin? *9. Suppose that the points of the state space S = ℤ are painted blue with probability ρ
EXERCISES
or green with probability 1 − ρ, 0 ≤ ρ ≤ 1, independently of each other and of a simple random walk {S_n} starting at 0. Let B denote the random set of states (integer sites) colored blue and let N_n(ρ) denote the amount of time (occupation time) that the random walk spends in the set B prior to time n, i.e.,

$$N_n(\rho) = \sum_{k=0}^{n} \mathbf{1}_B(S_k).$$

(i) Show that EN_n(ρ) = (n + 1)ρ. [Hint: E1_B(S_k) = E{E[1_B(S_k) | S_k]}.] (ii) Verify that
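$$\lim_{n\to\infty}\frac{\operatorname{Var}\{N_n(\rho)\}}{n} =
\begin{cases}
\rho(1-\rho)\,\dfrac{2-|p-q|}{|p-q|} & \text{for } p \ne \tfrac12,\\[4pt]
\infty & \text{for } p = \tfrac12.
\end{cases}$$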
[Hint: Use Exercise 13.15.] (iii) For p ≠ 1/2, use (ii) to show N_n(ρ)/n → ρ in probability as n → ∞. [Hint: Use Chebyshev's Inequality.] (iv) For p = 1/2, show N_n(ρ)/n → ρ in probability. [Hint: Show Var(N_n(ρ)/n) → 0 as n → ∞.]
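The sketch below gives a numerical check for part (ii). It writes ρ for the colour probability and p for the up-step probability of the walk, which is our notation. It uses Cov(1_B(S_j), 1_B(S_k)) = ρ(1 − ρ)P(S_j = S_k), obtained by conditioning on the path as in the hint to (i). With that identity, Var N_n(ρ) = ρ(1 − ρ)Σ_{j,k=0}^{n} P(S_{|k−j|} = 0) can be summed exactly. The closed-form limit in `limit_per_step` comes from Σ_{m≥0} P(S_m = 0) = 1/|p − q|. Treat it as a derived value to be checked, not a quotation from the text.

```python
def return_probs(n, p):
    """u[m] = P(S_m = 0), m = 0..n, for the simple random walk (zero for odd m)."""
    q = 1.0 - p
    u = [0.0] * (n + 1)
    u[0] = 1.0
    for m in range(2, n + 1, 2):
        k = m // 2
        # C(2k, k)(pq)^k built up from C(2k-2, k-1)(pq)^(k-1): ratio 2(2k-1)/k
        u[m] = u[m - 2] * (2.0 * (2 * k - 1) / k) * p * q
    return u


def var_occupation(n, p, rho):
    """Exact Var N_n(rho) = rho(1 - rho) * sum_{j,k=0}^{n} P(S_j = S_k)."""
    u = return_probs(n, p)
    # P(S_j = S_k) = u[|k - j|]; lag m >= 1 occurs for 2(n + 1 - m) ordered pairs
    pairs = (n + 1) + 2.0 * sum((n + 1 - m) * u[m] for m in range(1, n + 1))
    return rho * (1.0 - rho) * pairs


def limit_per_step(p, rho):
    """Derived value of lim Var N_n(rho) / n for p != 1/2."""
    d = abs(2.0 * p - 1.0)            # |p - q|
    return rho * (1.0 - rho) * (2.0 - d) / d


for n in (1000, 4000):
    print(n, var_occupation(n, 0.75, 0.3) / n, limit_per_step(0.75, 0.3))
    print(n, var_occupation(n, 0.5, 0.3) / n)    # symmetric case: grows roughly like sqrt(n)
```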
10. Apply Stirling's Formula, k! = (2πk)^{1/2} k^k e^{−k}(1 + o(1)) as k → ∞, to show for the simple symmetric random walk starting at 0 that (i) P(T_y = N) ~ |y|(2/π)^{1/2} N^{−3/2} as N → ∞ (along N with N − y even).
(ii) ET_y = ∞.
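The asymptotics in part (i) can be checked numerically with the first-passage identity P(T_y = N) = (|y|/N)P(S_N = y) for the simple random walk. That identity is assumed here and is not derived in this exercise. Log-factorials via `math.lgamma` keep the binomial coefficient from overflowing. The ratio to |y|(2/π)^{1/2}N^{−3/2} should approach 1 along N with N − y even.

```python
import math


def first_passage_prob(y, N):
    """P(T_y = N), y != 0, simple symmetric walk, via P(T_y = N) = (|y|/N) P(S_N = y)."""
    if (N + y) % 2 or abs(y) > N:
        return 0.0
    k = (N + y) // 2                                  # number of +1 steps
    log_p = (math.lgamma(N + 1) - math.lgamma(k + 1)
             - math.lgamma(N - k + 1) - N * math.log(2))
    return abs(y) / N * math.exp(log_p)


def stirling_approx(y, N):
    """|y| (2/pi)^{1/2} N^{-3/2}."""
    return abs(y) * math.sqrt(2 / math.pi) * N ** -1.5


for N in (101, 1001, 10001, 100001):
    print(N, first_passage_prob(1, N) / stirling_approx(1, N))
```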
Exercises for Section I.5
1. (i) Complete the proof of Pólya's Theorem for k > 3. (See Exercise 5 below.) (ii) Give an alternative proof of transience for k > 3 by an application of the Borel–Cantelli Lemma Part 1 (Chapter 0, (6.1)). Why cannot Part 2 of the lemma be directly applied to prove recurrence for k = 1, 2? 2. Show that

$$\sum_{k=0}^{n}\binom{n}{k}^{2} = \binom{2n}{n}.$$
[Hint: Consider the number of ways in which n balls can be selected from a box of n black and n white balls.] 3. (i) Show that for the 2-dimensional simple symmetric random walk, the probability of a return to (0, 0) at time 2n is the same as that for two independent walkers, one along the horizontal and the other along the vertical, to be at (0, 0) at time 2n. Also verify this by a geometric argument based on two independent walkers with step size 1/√2 and viewed along the axes rotated by 45°.
RANDOM WALK AND BROWNIAN MOTION
(ii) Show that relations (5.5) hold for a general random walk on the integer lattice in any dimension. Use these to compute, for the simple symmetric random walk in dimension two, the probabilities f_j that the random walk returns to the origin at time j for the first time for j = 1, ..., 8. Similarly compute f_j in dimension three for 1 ≤ j ≤ 4. 4. (i) Show that the method of Exercise 3(i) above does not hold in k = 3 dimensions. (ii) Show that the motion of three independent simple symmetric random walkers starting at (0, 0, 0) in ℤ³ is transient. 5. Show that the trinomial coefficient
$$\binom{n}{j,\;k,\;n-j-k} = \frac{n!}{j!\,k!\,(n-j-k)!}$$
is largest for j, k, n − j − k closest to n/3. [Hint: Suppose a maximum is attained for j = J, k = K. Consider the inequalities of the form

$$\binom{n}{j,\;k,\;n-j-k} \le \binom{n}{J,\;K,\;n-J-K}$$

when j, k and/or n − j − k differ from J, K, n − J − K, respectively, by ±1. Use this to show |n − J − 2K| ≤ 1, |n − K − 2J| ≤ 1.]
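The following brute-force sketch illustrates Exercise 5. It searches all (j, k) with j + k ≤ n for a maximizer of the trinomial coefficient and reports the quantities |n − J − 2K| and |n − K − 2J| from the hint. It only illustrates the claim for small n, and the function names are our own.

```python
from math import factorial


def trinomial(n, j, k):
    """n! / (j! k! (n - j - k)!)."""
    return factorial(n) // (factorial(j) * factorial(k) * factorial(n - j - k))


def argmax_trinomial(n):
    """A pair (J, K) maximizing the trinomial coefficient over j, k >= 0, j + k <= n."""
    pairs = ((j, k) for j in range(n + 1) for k in range(n + 1 - j))
    return max(pairs, key=lambda jk: trinomial(n, jk[0], jk[1]))


for n in (3, 7, 10, 11):
    J, K = argmax_trinomial(n)
    print(n, (J, K, n - J - K), abs(n - J - 2 * K), abs(n - K - 2 * J))
```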
6. Give a probabilistic interpretation to the relation (see Eq. 5.9) β = 1/(1 − γ). [Hint: Argue that the number of returns to 0 is geometrically distributed with parameter γ.] 7. Show that a multidimensional random walk is transient when the (one-step) mean displacement is nonzero. 8. Calculate the probability that the simple symmetric k-dimensional random walk will return i.o. to a previously occupied site. [Hint: The conditional probability, given S_0, ..., S_n, that S_{n+1} ∉ {S_0, ..., S_n} is at most (2k − 1)/(2k). Check that

$$P(S_{n+1}, \ldots, S_{n+m} \notin \{S_0, \ldots, S_n\}) \le \left(\frac{2k-1}{2k}\right)^{m}$$

for each m ≥ 1.] 9. (i) Estimate (numerically) the expected number (β − 1) of returns to the origin. [Hint: Estimate a bound for C in (5.15) and bound (5.16) with a Riemann integral.] (ii) Give a numerical estimate of the probability γ that the simple random walk in k = 3 dimensions will return to the origin. [Hint: Use (i).] *10. Calculate the probability that a simple symmetric random walk in k = 3 dimensions will eventually hit a given line {(ra, rb, rc): r ∈ ℤ}, where (a, b, c) ≠ (0, 0, 0) is a lattice point. *11. (A Finite Switching Network) Let F = {x_1, x_2, ..., x_k} be a finite set of k sites that can be either "on" (1) or "off" (0). At each instant of time a site is randomly selected and switched from its current state ε to 1 − ε. Let S = {0, 1}^F = {(ε_1, ..., ε_k): ε_i = 0 or 1}. Let X_1, X_2, ... be i.i.d. S-valued random variables with
P(X_n = e_i) = p_i, i = 1, ..., k, where e_i ∈ S is defined by e_i(j) = δ_{ij}, j = 1, ..., k, and p_i ≥ 0, Σ p_i = 1. Define a random walk on S, regarded as a group under coordinatewise addition mod 2, by

S_n = X_1 + ⋯ + X_n (n ≥ 1), S_0 = (0, 0, ..., 0).
Show that the configuration in which all switches are off is recurrent in the cases k = 1, 2. The general case will follow from the methods and theory of Chapter II when k < ∞. The problem when k = ∞ has an interesting history: see F. Spitzer (1976), Principles of Random Walk, Springer Verlag, New York, and references
therein. *12. Use Exercise 11 above and the examples of random walks on ℤ to arrive at a general formulation of the notion of a random walk on a group. Describe a random walk on the unit circle in the complex plane as an illustration of your ideas. 13. Let {X_n} denote a recurrent random walk on the 1-dimensional integer lattice. Show that

E_x(number of visits to x before hitting 0) = +∞, x ≠ 0.

[Hint: Translate the problem by x and consider that starting from 0, the number of visits to 0 before hitting −x is bounded below by the number of visits to 0 before leaving the (open) interval centered at 0 of length |x|. Use monotonicity to pass to the limit.]
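The following sketch works Exercise 9 along the lines of the hint. It sums P(S_{2n} = 0) exactly for n ≤ N using the trinomial formula P(S_{2n} = 0) = C(2n, n)2^{−2n} Σ_{j+k≤n} [n!/(j!k!(n − j − k)!)]² 3^{−2n}, then bounds the tail with an integral. For the tail we assume the standard asymptotic P(S_{2n} = 0) ~ 2(3/(4πn))^{3/2} in place of the constant C of (5.15). γ then follows from β = 1/(1 − γ) of Exercise 6. The cut-off N = 150 is an arbitrary choice.

```python
import math


def return_prob_3d(n, lg):
    """P(S_{2n} = 0) for the simple symmetric walk on Z^3; lg[m] = log(m!)."""
    # C(2n, n) 2^{-2n} * sum_{j+k<=n} [n!/(j! k! (n-j-k)!)]^2 3^{-2n}, in log space
    central = math.exp(lg[2 * n] - 2 * lg[n] - 2 * n * math.log(2))
    s = 0.0
    for j in range(n + 1):
        for k in range(n + 1 - j):
            log_tri = lg[n] - lg[j] - lg[k] - lg[n - j - k] - n * math.log(3)
            s += math.exp(2 * log_tri)
    return central * s


def estimate_beta(N=150):
    """beta = sum_{n>=0} P(S_{2n} = 0): exact terms n <= N plus an integral for the tail."""
    lg = [math.lgamma(m + 1) for m in range(2 * N + 1)]
    partial = sum(return_prob_3d(n, lg) for n in range(N + 1))
    c = 2 * (3 / (4 * math.pi)) ** 1.5        # assumed: P(S_{2n} = 0) ~ c n^{-3/2}
    tail = 2 * c / math.sqrt(N + 0.5)         # integral of c x^{-3/2} over (N + 1/2, inf)
    return partial + tail


beta = estimate_beta()
gamma = 1 - 1 / beta                          # from beta = 1/(1 - gamma)
print(f"beta ~ {beta:.4f}, expected returns beta - 1 ~ {beta - 1:.4f}, gamma ~ {gamma:.4f}")
```

With N = 150 the estimate should come out close to the commonly quoted values β ≈ 1.516 and γ ≈ 0.3405.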
Exercises for Section I.6
1. Let X be an (n + 1)-dimensional Gaussian random vector and A an arbitrary linear transformation from ℝ^{n+1} to ℝ^m. Show that AX is a Gaussian random vector in ℝ^m. 2. (i) Determine the consistent specification of finite-dimensional distributions for the canonical construction of the simple random walk. (ii) Verify Kolmogorov consistency for Example 3. *3. (A Kolmogorov Extension Theorem: Special Case) This exercise shows the role of topology in proving what otherwise seems a (purely) measure-theoretical assertion
in the simplest case of Kolmogorov's theorem. Let Ω = {0, 1}^∞ be the product space consisting of sequences of the form ω = (ω_1, ω_2, ...) with ω_i ∈ {0, 1}. Give Ω the product topology for the discrete topology on {0, 1}. By Tychonoff's Theorem from topology, this makes Ω compact. Let X_n be the nth coordinate projection mapping on Ω. Let ℱ be the Borel sigmafield for Ω. (i) Show that ℱ coincides with the sigmafield for Ω generated by events of the form

F(ε_1, ..., ε_n) = {ω ∈ Ω: ω_i = ε_i for i = 1, 2, ..., n},

for an arbitrarily prescribed sequence ε_1, ..., ε_n of 1's and 0's. [Hint: ρ(ω, η) = Σ_{n=1}^∞ |ω_n − η_n|/2^n metrizes the product topology on Ω. Consider the open balls of radii of the form r = 2^{−N} centered at sequences which are 0 from some n onward and use separability.]
(ii) Let {P_n} be a consistent family of probability measures, with P_n defined on (ℝ^n, ℬ^n), and such that P_n is concentrated on Ω_n = {0, 1}^n. Define a set function for events of the form F = F(ε_1, ..., ε_n) in (i), by P(F) = P_n({ω_i = ε_i, i = 1, 2, ..., n}).
Show that there is a unique probability measure P on the sigmafield ℱ_n of cylinder sets of the form C = {ω ∈ Ω: (ω_1, ..., ω_n) ∈ B},
where B ⊂ {0, 1}^n, which agrees with this formula for F ∈ ℱ_n, n ≥ 1. (iii) Show that ℱ^0 := ∪_{n=1}^∞ ℱ_n is a field of subsets of Ω but not a sigmafield. (iv) Show that P is a countably additive measure on ℱ^0. [Hint: Ω is compact and the cylinder sets are both open and closed for the product topology on Ω.] (v) Show that P has a unique extension to a probability measure on ℱ. [Hint: Invoke the Carathéodory Extension Theorem (Chapter 0, Section 1) under (iii), (iv).]
(vi) Show that the above arguments also apply to any finite-state discrete-parameter stochastic process. 4. Let (Ω, ℱ, P) be a probability space and (S, 𝒮) a measurable space. A function X defined on Ω and taking values in S is called measurable if X^{−1}(B) ∈ ℱ for all B ∈ 𝒮, where X^{−1}(B) = {X ∈ B} = {ω ∈ Ω: X(ω) ∈ B}. This is the meaning of an S-valued random variable. The distribution of X is the induced probability measure Q on 𝒮 defined by

Q(B) = P(X^{−1}(B)) = P(X ∈ B), B ∈ 𝒮.
Let (Ω, ℱ, P) be the canonical model for nonterminating repeated tosses of a coin and X_n(ω) = ω_n, ω ∈ Ω. Show that {X_m, X_{m+1}, ...}, m an arbitrary positive integer, is a measurable function on (Ω, ℱ, P) taking values in (Ω, ℱ) with the distribution P; i.e., {X_m, X_{m+1}, ...} is a noncanonical model for an infinite sequence of coin tossings. 5. Suppose that D^{(1)} and D^{(2)} are covariance matrices. (i) Verify that αD^{(1)} + βD^{(2)}, α, β ≥ 0, is a covariance matrix. (ii) Let {D^{(n)} = ((σ_{ij}^{(n)}))} be a sequence of k × k covariance matrices such that lim_{n→∞} σ_{ij}^{(n)} = σ_{ij} exists. Show that D = ((σ_{ij})) is a covariance matrix. *6. Let μ̂(t) = ∫_{−∞}^{∞} e^{itx} μ(dx) be the Fourier transform of a positive finite measure μ (Chapter 0, (8.46)). (i) Show that ((μ̂(t_i − t_j))) is a nonnegative definite matrix for any t_1 < ⋯ < t_k. (ii) Show that μ̂(t) = e^{−|t|} if μ is the Cauchy distribution. *7. (Pólya Criterion for Characteristic Functions) Suppose that φ is a real-valued nonnegative function on (−∞, ∞) with φ(−t) = φ(t) and φ(0) = 1. Show that if φ is continuous and convex on [0, ∞), then φ is the Fourier transform (characteristic function) of a probability distribution (in particular, for any t_1 < t_2 < ⋯ < t_k, k ≥ 1, ((φ(t_i − t_j))) is nonnegative definite by Exercise 6), via the following steps. (i) Check that
$$\gamma(t) = \begin{cases} 1 - |t|, & |t| \le 1,\\ 0, & |t| > 1, \end{cases} \qquad t \in \mathbb{R}^1,$$
is the characteristic function of the (probability) measure with p.d.f. (1 − cos x)/(πx²), −∞ < x < ∞. Use the Fourier inversion formula [(8.43), Chapter 0]. (ii) Check that the characteristic function of a convex combination (mixture) of probability distributions is the corresponding convex combination of characteristic functions. (iii) Let 0 < a_1 < ⋯ < a_n, p_i ≥ 0, Σ_{i=1}^n p_i = 1. Show that φ(t) = p_1γ(t/a_1) + ⋯ + p_nγ(t/a_n) is a characteristic function. (iv) Draw a graph of φ(t) and check that the slope of the segment between a_k and a_{k+1} is −[(p_{k+1}/a_{k+1}) + ⋯ + (p_n/a_n)], k = 1, 2, ..., n − 1. Interpret the numbers p_1, p_1 + p_2, ..., p_1 + ⋯ + p_n = 1 along the vertical axis with reference to the polygonal graph of φ(t). (v) Show that a function φ(t) satisfying Pólya's criterion can be approximated by a function of the form (iii) to arbitrary accuracy. [Hint: Approximate φ(t) by a polygonal path consisting of n segments of decreasing slopes.] Show that a (pointwise) limit of characteristic functions that is continuous at t = 0 is a characteristic function.
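Here is a numerical sketch for steps (iii) and (iv). The weights p_i and breakpoints a_i are illustrative choices of ours. The code evaluates φ(t) = Σ p_iγ(t/a_i) and compares the slope on (a_k, a_{k+1}) with −(p_{k+1}/a_{k+1} + ⋯ + p_n/a_n); only the terms with a_i > t are still decreasing on that interval. It also checks that the matrix ((φ(t_i − t_j))) at random points has no negative eigenvalues, as Exercise 6 predicts for a characteristic function.

```python
import numpy as np


def polya_mixture(t, p, a):
    """phi(t) = sum_i p_i * gamma(t / a_i), with gamma(t) = max(1 - |t|, 0)."""
    t = np.asarray(t, dtype=float)
    return sum(pi * np.maximum(1.0 - np.abs(t) / ai, 0.0) for pi, ai in zip(p, a))


p = np.array([0.2, 0.5, 0.3])          # illustrative weights, summing to 1
a = np.array([1.0, 2.0, 4.0])          # illustrative 0 < a_1 < a_2 < a_3

# Slope on (a_k, a_{k+1}) versus -(p_{k+1}/a_{k+1} + ... + p_n/a_n), k = 1, ..., n - 1
for k in range(1, len(a)):
    lo, hi = a[k - 1], a[k]            # a_k and a_{k+1} in 1-based notation
    slope = (polya_mixture(hi, p, a) - polya_mixture(lo, p, a)) / (hi - lo)
    print(k, float(slope), float(-np.sum(p[k:] / a[k:])))

# ((phi(t_i - t_j))) at random points should be nonnegative definite (Exercise 6)
rng = np.random.default_rng(0)
pts = np.sort(rng.uniform(-5.0, 5.0, size=30))
gram = polya_mixture(pts[:, None] - pts[None, :], p, a)
print(np.linalg.eigvalsh(gram).min())
```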
*8. Show that the following define covariance functions for a Gaussian process. (i) σ_{ij} = e^{−|i−j|}, i, j = 0, 1, 2, .... (ii) σ_{ij} = min(i, j) = (|i| + |j| − |i − j|)/2.
[Hint: Use Exercises 6 and 7.]
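A finite-dimensional check for Exercise 8 uses the indices 0, ..., 24, an arbitrary cut-off. Both matrices should have nonnegative eigenvalues. In addition, min(i, j) is realized explicitly as Cov(S_i, S_j) for partial sums S_i = Z_0 + ⋯ + Z_{i−1} of i.i.d. standard normals. A numerical check of finitely many sections illustrates the claim but does not prove it.

```python
import numpy as np

idx = np.arange(25)                                  # indices i = 0, 1, ..., 24
diff = idx[:, None] - idx[None, :]

cov_exp = np.exp(-np.abs(diff))                      # (i)  sigma_ij = exp(-|i - j|)
cov_min = np.minimum(idx[:, None], idx[None, :]).astype(float)   # (ii) min(i, j)

# A symmetric matrix is a covariance matrix exactly when its eigenvalues are >= 0.
for name, cov in (("exp(-|i-j|)", cov_exp), ("min(i,j)", cov_min)):
    print(name, np.linalg.eigvalsh(cov).min())

# Constructive form of (ii): row i of `steps` selects Z_0, ..., Z_{i-1}, so that
# steps @ steps.T = ((Cov(S_i, S_j))) = ((min(i, j))).
steps = (idx[:, None] > np.arange(len(idx) - 1)[None, :]).astype(float)
print(np.allclose(steps @ steps.T, cov_min))
```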
Exercises for Section I.7
1. Verify that (B_{t_1}, ..., B_{t_m}) has an m-dimensional Gaussian distribution and calculate the mean and the variance–covariance matrix using the fact that the Brownian motion has independent Gaussian increments. 2. Let {X_t} be a process with stationary and independent increments starting at 0 with EX_s² < ∞ for s > 0. Assume EX_t, EX_t² are continuous functions of t. (i) Show that EX_t = mt for some constant m. (ii) Show that Var X_t = σ²t for some constant σ² ≥ 0. (iii) Calculate the limiting distribution of

Y_t^{(n)} = (X_{nt} − mnt)/√n

as n → ∞, for t > 0 fixed. 3. (i) (Diffusion Limit Scalings) Let X_t be a random variable with mean x + tf(p − q)Δ and variance tf·4pqΔ². Give a direct calculation of p and q = 1 − p in terms of f and Δ using only the requirements that the mean and variance of X_t − x should stabilize to some limiting values proportional to t as f → ∞ and Δ → 0. (ii) Verify convergence to the Gaussian distribution for the distribution of (σ/√n)S_{[nt]} as n → ∞, where {S_n} is the simple random walk with p_n = μ/(2σ√n) + 1/2, by application of Liapunov's CLT (Chapter 0, Corollary 7.3).
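One way to do part (i) of Exercise 3 is sketched below. It assumes that X_t − x is a sum of tf independent steps of ±Δ with up-probability p. Taking Δ = σ/√f and p = 1/2 + μ/(2σ√f) makes the mean per unit time f(p − q)Δ equal μ exactly. The variance per unit time 4pqfΔ² then equals σ² − μ²/f, which tends to σ². The helper names are hypothetical.

```python
import math


def scaled_step_params(mu, sigma2, f):
    """Step size Delta and up-probability p for f steps per unit time (hypothetical helper)."""
    sigma = math.sqrt(sigma2)
    delta = sigma / math.sqrt(f)
    p = 0.5 + mu / (2 * sigma * math.sqrt(f))       # needs f >= mu^2 / sigma^2 so p <= 1
    return delta, p


def mean_var_per_unit_time(mu, sigma2, f):
    """Mean f(p - q)Delta and variance 4pq f Delta^2 of X_1 - x under these choices."""
    delta, p = scaled_step_params(mu, sigma2, f)
    q = 1 - p
    return f * (p - q) * delta, f * 4 * p * q * delta ** 2


for f in (10, 100, 10_000):
    print(f, mean_var_per_unit_time(1.5, 2.0, f))
```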
4. Let {X_t} be a Brownian motion starting at 0 with diffusion coefficient σ² > 0 and zero drift. (i) Show that the process has the following scaling property. For each λ > 0 the process {Y_t} defined by Y_t = λ^{−1/2}X_{λt} is distributed exactly as the process {X_t}. (ii) How does (i) extend to k-dimensional Brownian motion? 5. Let {X_t} be a stochastic process which has stationary and independent increments. (i) Show that the distribution of the increments must be infinitely divisible; i.e., for each integer n, the distribution of X_t − X_s (s < t) can be expressed as an n-fold convolution of a probability measure π_n. (ii) Suppose that the increment X_t − X_s has the Cauchy distribution with p.d.f. (t − s)/π[(t − s)² + x²] for s < t, x ∈ ℝ¹. Show that the Cauchy process so described is invariant under the rescaling {Y_t} where Y_t = λ^{−1}X_{λt}, for λ > 0; i.e., {Y_t} has the same distribution as {X_t}. (This process can be constructed by methods of theoretical complements 1, 2 to Section IV.1.) 6. Let {X_t} be a Brownian motion starting at 0 with zero drift and diffusion coefficient σ² > 0. Define Y_t = |X_t|, t ≥ 0. (i) Calculate EY_t, Var Y_t. (ii) Is {Y_t} a process with independent increments? 7. Let R_t = X_t, where {X_t} is a Brownian motion starting at 0 with zero drift and diffusion coefficient σ² > 0. Calculate the distribution of R_t. 8. Let {B_t} be a standard Brownian motion starting at 0. Define

$$V_n = \sum_{k=1}^{2^n}\bigl|B_{k2^{-n}} - B_{(k-1)2^{-n}}\bigr|.$$

(i) Verify that EV_n = 2^{n/2}E|B_1|. (ii) Show that Var V_n = Var|B_1|. (iii) Show that with probability one, {B_t} is not of bounded variation on 0 ≤ t ≤ 1. [Hint: Show that V_{n+1} ≥ V_n, n ≥ 1, and, using Chebyshev's Inequality, P(V_n > M) → 1 as n → ∞ for any M > 0.]
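This Monte Carlo sketch of Exercise 8 simulates Brownian paths on the dyadic grid of mesh 2^{−10} and computes V_n by subsampling. The mesh, path count and seed are arbitrary choices. It compares the sample mean and variance with 2^{n/2}E|B_1| = 2^{n/2}(2/π)^{1/2} and Var|B_1| = 1 − 2/π. Because every V_n is built from the same path, V_{n+1} ≥ V_n holds pathwise by the triangle inequality.

```python
import numpy as np

rng = np.random.default_rng(12345)
n_max, n_paths = 10, 2000

# Independent N(0, 2^{-n_max}) increments on the finest dyadic grid of [0, 1]
incr = rng.normal(0.0, np.sqrt(2.0 ** -n_max), size=(n_paths, 2 ** n_max))
path = np.hstack([np.zeros((n_paths, 1)), np.cumsum(incr, axis=1)])


def V(n):
    """V_n = sum_k |B_{k 2^-n} - B_{(k-1) 2^-n}|, one value per simulated path."""
    coarse = path[:, :: 2 ** (n_max - n)]            # B at the multiples of 2^{-n}
    return np.abs(np.diff(coarse, axis=1)).sum(axis=1)


for n in range(2, n_max + 1, 2):
    v = V(n)
    print(n, v.mean(), 2 ** (n / 2) * np.sqrt(2 / np.pi), v.var(), 1 - 2 / np.pi)
```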
0 units. Policy holders are charged a (gross) risk premium rate α per unit time and claims are made at an average rate λ. The average claim amount is μ with variance σ². Discuss modeling the risk reserve process {X_t} as a Brownian motion starting at x with drift coefficient of the form α − μλ and diffusion coefficient λσ², on some scale. 7. (Law of Proportionate Effect) A material (e.g., pavement) is subject to a succession of random impacts or loads in the form of positive random variables L_1, L_2, ... (e.g., traffic). It is assumed that the (measure of) material strength T_k after the kth impact is proportional to the strength T_{k−1} at the preceding stage through the applied load L_k, k = 1, 2, ..., i.e., T_k = L_kT_{k−1}. Assume an initial strength T_0 ≡ 1 as normalization, and that E(log L_1)² < ∞. Describe conditions under which it is appropriate to consider the geometric Brownian motion defined by {exp(μt + σB_t)}, where {B_t} is standard Brownian motion, as a model for the strength process. 8. Let X_1, X_2, ... be i.i.d. random variables with EX_n = 0, Var X_n = σ² > 0. Let S_n = X_1 + ⋯ + X_n, n ≥ 1, S_0 = 0. Express the limiting distribution of each of the random variables defined below in terms of the distribution of the appropriate random variable associated with Brownian motion having drift 0 and diffusion coefficient σ² > 0. (i) Fix θ > 0, Y_n = n^{−θ/2} max{|S_k|^θ: 1 ≤ k ≤ n}. (ii) Y_n = n^{−1/2}S_n. (iii) Y_n = n^{−3/2} Σ_{k=1}^n S_k. [Hint: Consider the integral of t → S_{[nt]}, 0 ≤ t ≤ 1.] 9. (i) Write R_n(x) = |(1 + x/n)^n − e^x|. Show that
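$$R_n(x) \le \sum_{r=1}^{n}\Bigl(1 - \prod_{j=1}^{r-1}\Bigl(1 - \frac{j}{n}\Bigr)\Bigr)\frac{|x|^r}{r!} + \frac{|x|^{n+1}e^{|x|}}{(n+1)!},$$

and

$$\sup_{|x| \le c} R_n(x) \to 0 \quad \text{as } n \to \infty \quad (\text{for every } c > 0).$$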
(ii) Use (i) to prove (8.6). [Hint: Use Taylor's theorem for the inequality, and Lebesgue's Dominated Convergence Theorem (Chapter 0, Section 0.3).]
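The sketch below gives a numerical check of part (i) on [−c, c] with c = 3, an illustrative choice. It evaluates R_n(x) and the right-hand side of the inequality on a grid. The terms |x|^r/r! are computed in log space because r! overflows a float for r > 170.

```python
import math


def R(n, x):
    """R_n(x) = |(1 + x/n)^n - e^x|."""
    return abs((1 + x / n) ** n - math.exp(x))


def pow_over_fact(x, r):
    """|x|^r / r!, computed in log space so that large r does not overflow."""
    if x == 0:
        return 1.0 if r == 0 else 0.0
    return math.exp(r * math.log(abs(x)) - math.lgamma(r + 1))


def bound(n, x):
    """Right-hand side of the inequality in part (i)."""
    total, prod = 0.0, 1.0          # prod = (1 - 1/n)(1 - 2/n)...(1 - (r-1)/n)
    for r in range(1, n + 1):
        if r > 1:
            prod *= 1 - (r - 1) / n
        total += (1 - prod) * pow_over_fact(x, r)
    return total + pow_over_fact(x, n + 1) * math.exp(abs(x))


c = 3.0
grid = [c * (i / 200 - 1) for i in range(401)]      # 401 points in [-c, c]
for n in (5, 20, 100, 400):
    print(n, max(R(n, x) for x in grid), max(bound(n, x) for x in grid))
```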
Exercises for Section I.9
1. (i) Use the SLLN to show that the Brownian motion with nonzero drift is transient. (ii) Extend (i) to the k-dimensional Brownian motion with drift. 2. Let X_t = X_0 + vt, t ≥ 0, where v is a nonrandom constant-rate parameter and X_0 is a random variable. (i) Calculate the conditional distribution of X_t, given X_s = x, for s < t. (ii) Show that all states are transient if v ≠ 0. (iii) Calculate the distribution of X_t if the initial state is normally distributed with mean μ and variance σ². 3. Let {X_t} be a Brownian motion starting at 0 with diffusion coefficient σ² > 0 and zero drift. (i) Define {Y_t} by Y_t = tX_{1/t} for t > 0 and Y_0 = 0. Show that {Y_t} is distributed as Brownian motion starting at 0. [Hint: Use the law of large numbers to prove sample path continuity at t = 0.] (ii) Show that {X_t} has infinitely many zeros in every neighborhood of t = 0 with probability 1. (iii) Show that the probability that t → X_t has a right-hand derivative at t = 0 is zero. (iv) Use (iii) to provide another example to Exercise 8.4. 4. Show that the distribution of min_{t≥0} X_t^μ is exponential if {X_t^μ} is Brownian motion starting at 0 with drift μ > 0. Likewise, calculate the distribution of max_{t≥0} X_t^μ when μ < 0. […] 0, calculate the fraction of particles eventually absorbed. What if μ = 0? 4. Two independent Brownian motions with drift μ_i and diffusion coefficient σ_i², i = 1, 2, are found at time t = 0 at positions x_i, i = 1, 2, with x_1 < x_2. (i) Calculate the probability that the two particles will never meet. (ii) Calculate the probability that the particles will meet before time s > 0. 5. (i) Calculate the distribution of the maximum value of the Brownian motion starting at 0 with drift μ and diffusion coefficient σ² over the time period [0, t]. *(ii) For the case μ = 0 give a geometric "reflection" argument that P(max_{0≤s≤t} X_s ≥ y) = 2P(X_t ≥ y). Use (i) to verify this. 6.
Calculate the distribution of the minimum value of a Brownian motion starting at 0 with drift μ and diffusion coefficient σ² over the time period [0, t]. 7. Let {B_t} be standard Brownian motion starting at 0 and let a, b > 0. (i) Calculate the probability that −at < B_t < bt for all sufficiently large t. (ii) Calculate the probability that {B_t} last touches the line y = −at instead of y = bt. [Hint: Consider the process {Z_t} defined by Z_0 = 0, Z_t = tB_{1/t}, for t > 0, and Exercise 9.3(i).] 8. Let {(B_t^{(1)}, B_t^{(2)})} be a two-dimensional standard Brownian motion starting at (0, 0) (see Section 7). Let τ_y = inf{t ≥ 0: B_t^{(2)} = y}, y > 0. Calculate the distribution of B^{(1)}_{τ_y}. [Hint: {B_t^{(1)}} and {B_t^{(2)}} are independent one-dimensional Brownian motions. Condition on τ_y. Evaluate the integral by substituting u = (x² + y²)/t.] 9. Let {B_t} be a standard Brownian motion starting at 0. Describe the geometric structure of sample paths for each of the following stochastic processes and calculate
EY_t. (i) (Absorption)

Y_t = B_t if max_{0≤s≤t} B_s < a, Y_t = a if max_{0≤s≤t} B_s ≥ a,

where a > 0 is a constant.
(*ii) (Reflection)

Y_t = B_t if B_t ≤ a, Y_t = 2a − B_t if B_t > a,

where a > 0 is a constant. (*iii) (Periodic) Y_t = B_t − [B_t],
where [x] denotes the greatest integer less than or equal to x. 10. Let, for α > 0,

$$f_\alpha(t) = \frac{\alpha}{(2\pi)^{1/2}}\,t^{-3/2}\,e^{-\alpha^2/(2t)} \qquad (t > 0).$$
(i) Verify the convolution property f_α * f_β(t) = f_{α+β}(t) for any α, β > 0.
(ii) Verify that the distribution of T_z is a stable law with exponent θ = 2 (index 1/2) in the sense that if T_1, T_2, ..., T_n are i.i.d. and distributed as T_z then n^{−θ}(T_1 + ⋯ + T_n) is distributed as T_z (see Eq. 10.2). (iii) (Scaling property) T_z is distributed as z²T_1. 11. Let T_z be the first passage time to z for a standard Brownian motion starting at 0 with zero drift. (i) Verify that ET_z is not finite. (ii) Show that Ee^{−λT_z} = e^{−|z|√(2λ)}, λ > 0. [Hint: Tedious integration will work.] (iii) Use Laplace transforms to check that (1/n)τ_{[√n z]} converges in distribution to T_z as n → ∞.
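The simulation sketch below covers Exercises 10 and 11. It uses the fact that T_z (z > 0) has the same law as z²/Z² with Z standard normal, which follows from the reflection principle: P(T_z ≤ t) = 2P(B_t ≥ z) = P(|Z| ≥ z/√t). The code compares the empirical Laplace transform with e^{−z√(2λ)}. It also compares the medians of n^{−2}(T_1 + ⋯ + T_n) and of z²T_1 with the exact median z²/Φ^{−1}(3/4)² of T_z. The values of z, λ, n, the sample size and the seed are arbitrary.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(7)


def sample_T(z, size):
    """Draws of T_z (z > 0), using T_z = z^2 / Z^2 in law, Z ~ N(0, 1)."""
    return z ** 2 / rng.standard_normal(size) ** 2


z, lam, m, n = 1.5, 0.8, 400_000, 5

# Exercise 11(ii): E exp(-lam T_z) versus exp(-z sqrt(2 lam))
T = sample_T(z, m)
laplace_mc = np.exp(-lam * T).mean()
laplace_exact = np.exp(-z * np.sqrt(2 * lam))

# Exercise 10(ii), (iii): n^{-2}(T_1 + ... + T_n) and z^2 T_1 should both have the law of T_z
stable_sum = sample_T(z, (m, n)).sum(axis=1) / n ** 2
scaled = z ** 2 * sample_T(1.0, m)
median_exact = z ** 2 / NormalDist().inv_cdf(0.75) ** 2     # median of z^2 / Z^2

print(laplace_mc, laplace_exact)
print(np.median(stable_sum), np.median(scaled), median_exact)
```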
12. Let {B_t} be standard Brownian motion starting at 0. Let s < t. Show that the probability that {B_t} has at least one zero in (s, t) is given by (2/π) cos^{−1}(s/t)^{1/2}. [Hint: Let ρ(x) = P({B_t} has at least one zero in (s, t) | B_s = x).
Then for x > 0, ρ(x) = P(min_{s≤u≤t} B_u ≤ 0 | B_s = x) […] > 0, S_k > S_0, ..., S_k > S_{k−1}, S_k ≥ S_{k+1}, ..., S_k ≥ S_n}
for k ≥ 1, {T_n = 0} = {S_k […] y}. Show that if Y is uniform on [0, 1] then X = F^{−1}(Y) has distribution function F.
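The inverse-transform statement in the last exercise can be sketched as follows. The exponential distribution F(x) = 1 − e^{−x} is our illustrative choice; for a general F one takes F^{−1}(y) = inf{x: F(x) ≥ y}. The sketch compares the empirical distribution function of the samples with F at a few points.

```python
import math
import random


def inverse_transform_sample(F_inv, size, seed=0):
    """X = F^{-1}(Y) with Y uniform on [0, 1): returns `size` draws."""
    rng = random.Random(seed)
    return [F_inv(rng.random()) for _ in range(size)]


def exp_inverse(y):
    """F^{-1} for the illustrative choice F(x) = 1 - exp(-x), x >= 0."""
    return -math.log(1.0 - y)           # y < 1, so the argument of log is positive


xs = inverse_transform_sample(exp_inverse, 100_000)
for x in (0.5, 1.0, 2.0):
    empirical = sum(v <= x for v in xs) / len(xs)
    print(x, empirical, 1 - math.exp(-x))
```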
3. Let {B_t} be standard Brownian motion starting at 0 and let B*_t = B_t − tB_1, 0 ≤ t ≤ 1 […] x) dx leads to an absurdity, since the values of the termwise integrals are zero. Express EM*^+ as

$$EM^{*+} = 2\lim_{\theta \downarrow 0}\int_{\theta}^{\infty}\sum_{k=1}^{\infty}\bigl[(2kx)^2 - 1\bigr]\exp\Bigl\{-\tfrac{1}{2}(2kx)^2\Bigr\}\,dx$$

and note that for k ≥ 1/(2θ) the integrand is nonnegative on [θ, ∞). So Lebesgue's monotone convergence can be applied to interchange integral with sum over k ≥ 1/(2θ) to get zero for this. Thus, EM*^+ is the limit as θ → 0 of a finite sum over k […] → 0 as m → ∞ under assumptions (2) and (3) of Theorem 13.1.
4. For the simple symmetric random walk, starting at x, show that E{S_m 1_{[τ_y = r]}} = yP(τ_y = r) for r ≤ m.
5. Prove that EZ_n is independent of n (i.e., constant) for a martingale {Z_n}. Show also that E(Z_n | {Z_0, ..., Z_k}) = Z_k for any n > k. 6. Write out a proof of Theorem 13.3 along the lines of that of Theorem 13.1. 7. Let {S_n} be a simple random walk with p ∈ (1/2, 1). (i) Prove that {(q/p)^{S_n}: n = 0, 1, 2, ...} is a martingale. (ii) Let c < x […] 1, derive the Maximal Inequality P(M_n ≥ λ) ≤ E|Z_n|^p/λ^p in the context of Theorem 13.6. 10. (Submartingales) Let {Z_n: n = 0, 1, 2, ...} be a finite or infinite sequence of integrable random variables satisfying E(Z_{n+1} | {Z_0, ..., Z_n}) ≥ Z_n for all n. Such a sequence {Z_n} is called a submartingale. (i) Prove that, for any n > k, E(Z_n | {Z_0, ..., Z_k}) ≥ Z_k. (ii) Let M_n = max{Z_0, ..., Z_n}. Prove the maximal inequality P(M_n ≥ λ) ≤ EZ_n²/λ² for λ > 0. [Hint: E(Z_k1_{A_k}(Z_n − Z_k)) = E(Z_k1_{A_k}E(Z_n − Z_k | {Z_0, ..., Z_k})) ≥ 0
for n > k, where A_k := {Z_0 < λ, […] 1. [Hint: Use Jensen's or Hölder's Inequality, Chapter 0, (2.7), (2.12).] 12. (An Exponential Martingale) Let {X_j: j ≥ 1} be a sequence of independent random variables having finite moment-generating functions φ_j(ξ) := E exp{ξX_j} for some ξ ≠ 0. Define S_n := X_1 + ⋯ + X_n, Z_n := exp{ξS_n}/Π_{j=1}^n φ_j(ξ). (i) Prove that {Z_n} is a martingale. (ii) Write M_n = max{S_1, ..., S_n}. If ξ > 0, prove that
$$P(M_n \ge \lambda) \le \exp\{-\xi\lambda\}\prod_{j=1}^{n}\varphi_j(\xi) \qquad (\lambda > 0).$$
(iii) Write m_n = min{S_1, ..., S_n}. If ξ < 0, prove
$$P(m_n \le -\lambda) \le \exp\{\xi\lambda\}\prod_{j=1}^{n}\varphi_j(\xi) \qquad (\lambda > 0).$$
13. Let {X_n: n ≥ 1} be i.i.d. Gaussian with mean zero and variance σ² > 0. Let S_n = X_1 + ⋯ + X_n, M_n = max{S_1, ..., S_n}. Prove the following for λ > 0. (i) P(M_n ≥ λ) ≤ exp{−λ²/(2σ²n)}. [Hint: Use Exercise 12(ii) and an appropriate choice of ξ.]
(ii) P(max{|S_j|: 1 ≤ j ≤ n} ≥ λσ√n) ≤ 2 exp{−λ²/2}. 14. Let τ_1, τ_2 be stopping times. Show the following assertions (i)–(v) hold. (i) τ_1 ∨ τ_2 := max(τ_1, τ_2) is a stopping time.
(ii) τ_1 ∧ τ_2 := min(τ_1, τ_2) is a stopping time.
(iii) τ_1 + τ_2 is a stopping time. (iv) aτ_1, where a is a positive integer, is a stopping time. (v) If τ_1 ≤ τ_2 a.s., then it need not be the case that τ_2 − τ_1 is a stopping time. (vi) If τ is an even integer-valued stopping time, must ½τ be a stopping time? 15. (A Doob–Meyer Decomposition)
(i) Let {Y_n} be an arbitrary submartingale (see Exercise 10) with respect to sigmafields ℱ_0 ⊂ ℱ_1 ⊂ ℱ_2 ⊂ ⋯. Show that there is a unique sequence {V_n} such that: (a) 0 = V_0 ≤ V_1 […]]
(ii) Calculate the {V_n}, {Z_n} decomposition for Y_n = S_n, where {S_n} is the simple symmetric random walk starting at 0. [Note: A sequence {V_n} satisfying (b) is called a predictable sequence with respect to {ℱ_n}.] 16. Let {S_n} be the simple random walk starting at 0. Let {G_n} be a predictable sequence of nonnegative random variables with respect to ℱ_n = σ{X_1, ..., X_n} = σ{S_0, ..., S_n}, where X_n = S_n − S_{n−1} (n = 1, 2, ...); i.e., each G_n is ℱ_{n−1}-measurable. Assume each G_n to have finite first moment. Such a sequence {G_n} will be called a strategy. Define
$$W_n = W_0 + \sum_{k=1}^{n} G_k(S_k - S_{k-1}), \qquad n \ge 1,$$
where W_0 is an integrable nonnegative random variable independent of {S_n} (representing initial capital). Show that regardless of the strategy {G_n} we have the following. (i) If p = 1/2 then {W_n} is a martingale. (ii) If p > 1/2 then {W_n} is a submartingale. (iii) If p < 1/2 then {W_n} is a supermartingale (i.e., E|W_n| < ∞, E(W_{n+1} | ℱ_n) ≤ W_n, n = 1, 2, ...).
(iv) Calculate EW_n, n ≥ 1, in the case of the so-called double-or-nothing strategy defined by G_n = 2^{S_{n−1}}1_{[S_{n−1} = n−1]}, n ≥ 1.
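For small n, EW_n can be computed exactly by enumerating all 2^n step sequences. This gives a way to check Exercise 16 and to test an answer to (iv). Since G_k is ℱ_{k−1}-measurable and X_k is independent of ℱ_{k−1}, one expects EW_n = W_0 + (p − q)Σ_{k=1}^n EG_k. The strategy `abs_position` (stake 1 + |S_{k−1}|) is a made-up example. `double_or_nothing` encodes our reading G_n = 2^{S_{n−1}}1_{[S_{n−1} = n−1]} of the definition in (iv). W_0 = 1 is assumed throughout.

```python
from itertools import product


def expected_wealth(n, p, strategy, w0=1.0):
    """E W_n by exact enumeration of all 2^n step sequences (feasible for small n).

    strategy(history) returns the stake G_k given the earlier steps X_1..X_{k-1},
    so it is predictable by construction."""
    q = 1.0 - p
    total = 0.0
    for steps in product((1, -1), repeat=n):
        prob, w = 1.0, w0
        for k, x in enumerate(steps):
            w += strategy(steps[:k]) * x        # stake for step k+1 uses X_1..X_k only
            prob *= p if x == 1 else q
        total += prob * w
    return total


def abs_position(history):
    """Hypothetical strategy: G_k = 1 + |S_{k-1}|."""
    return 1 + abs(sum(history))


def double_or_nothing(history):
    """G_k = 2^{S_{k-1}} if S_{k-1} = k - 1 (every earlier step was up), else 0."""
    s = sum(history)
    return 2 ** s if s == len(history) else 0


for p in (0.4, 0.5, 0.6):
    print(p, expected_wealth(8, p, abs_position), expected_wealth(8, p, double_or_nothing))
```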
17. Let {S_n} be the simple symmetric random walk starting at 0. Let τ = inf{n ≥ 0: S_n = 2 − n}. (i) Calculate Eτ from the distribution of τ. (ii) Use the martingale stopping theorem to calculate Eτ. (*iii) How does this generalize to the cases τ = inf{n ≥ 0: S_n = b − n}, where b is a positive integer? [Hint: Check that n + S_n is even for n = 0, 1, 2, ....] 18. (i) Show that if X is a random variable such that g(z) = Ee^{zX} is finite in a neighborhood of z = 0, then E|X|^k < ∞ for all k = 1, 2, .... (ii) For a Brownian motion {X_t} with drift μ and diffusion coefficient σ², prove that exp{λX_t − λtμ − λ²σ²t/2} (t ≥ 0) is a martingale. 19. Consider an arbitrary Brownian motion with drift μ and diffusion coefficient σ² > 0. (i) Let m(x) = E_xτ, where τ is the time to reach the boundary {c, d} starting at x ∈ [c, d]. Show that m(x) solves the boundary-value problem
$$\frac{1}{2}\sigma^2\frac{d^2m}{dx^2} + \mu\frac{dm}{dx} = -1, \qquad c < x < d, \qquad m(c) = m(d) = 0.$$
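(ii) Let r(x) = P_x(τ_d < τ_c). Show that r(x) solves the boundary-value problem

$$\frac{1}{2}\sigma^2\frac{d^2r}{dx^2} + \mu\frac{dr}{dx} = 0, \qquad c < x < d, \qquad r(c) = 0,\ r(d) = 1.$$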
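Below is a finite-difference sketch for Exercise 19(i), using central differences on a uniform grid; the grid size is an arbitrary choice. The closed forms used for comparison are our own derivation, to be checked against your solution. When μ = 0 they give m(x) = (x − c)(d − x)/σ². Otherwise m(x) = [(d − c)r(x) − (x − c)]/μ, with r(x) = (1 − e^{−2μ(x−c)/σ²})/(1 − e^{−2μ(d−c)/σ²}) = P_x(τ_d < τ_c). Both satisfy the equation and boundary conditions by direct substitution.

```python
import numpy as np


def expected_exit_time_fd(mu, sigma2, c, d, m=400):
    """Central-difference solution of (sigma2/2) m'' + mu m' = -1 on (c, d), m(c) = m(d) = 0."""
    h = (d - c) / (m + 1)
    x = c + h * np.arange(1, m + 1)                   # interior grid points
    lower = sigma2 / (2 * h ** 2) - mu / (2 * h)      # coefficient of m_{i-1}
    diag = -sigma2 / h ** 2                           # coefficient of m_i
    upper = sigma2 / (2 * h ** 2) + mu / (2 * h)      # coefficient of m_{i+1}
    A = (np.diag(np.full(m, diag)) + np.diag(np.full(m - 1, upper), 1)
         + np.diag(np.full(m - 1, lower), -1))
    return x, np.linalg.solve(A, -np.ones(m))        # zero boundary values add nothing


def expected_exit_time_exact(mu, sigma2, c, d, x):
    """Closed form derived above: quadratic if mu == 0, else ((d - c) r(x) - (x - c)) / mu."""
    if mu == 0:
        return (x - c) * (d - x) / sigma2
    r = (1 - np.exp(-2 * mu * (x - c) / sigma2)) / (1 - np.exp(-2 * mu * (d - c) / sigma2))
    return ((d - c) * r - (x - c)) / mu


for mu in (0.0, 0.7):
    x, m_fd = expected_exit_time_fd(mu, 1.5, 0.0, 2.0)
    err = np.max(np.abs(m_fd - expected_exit_time_exact(mu, 1.5, 0.0, 2.0, x)))
    print(mu, err)
```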
[…] n ≥ 1, S_0 = 0, on ℤ with μ = EX_1 = 0, then {S_n} is recurrent.

Proof. To prove this, first observe that P(S_n = 0 i.o.) is 1 or 0 by the Hewitt–Savage zero–one law (theoretical complement 1.2). If Σ_n P(S_n = 0) < ∞, then P(S_n = 0 i.o.) = 0 by the Borel–Cantelli Lemma. If Σ_n P(S_n = 0) is divergent (i.e., the expected number of visits to 0 is infinite), then we can show that P(S_n = 0 i.o.) = 1 as follows. Using independence and the property that the shifted sequence X_k, X_{k+1}, ... has the same distribution as X_1, X_2, ..., one has

$$\begin{aligned}
1 \ge P(S_n = 0 \text{ finitely often}) &\ge \sum_{n} P(S_n = 0,\ S_m \ne 0,\ m > n) \\
&= \sum_{n} P(S_n = 0)\,P(S_m - S_n \ne 0,\ m > n) = \sum_{n} P(S_n = 0)\,P(S_m \ne 0,\ m \ge 1).
\end{aligned}$$
Thus, if Σ_n P(S_n = 0) diverges, then P(S_m ≠ 0, m ≥ 1) = 0 or equivalently P(S_m = 0 for some m ≥ 1) = 1. This may now be extended by induction to get that at least r visits to 0 is certain for each r = 1, 2, .... One may also use the strong Markov property of Chapter II, Section 4, with the time of the rth visit to 0 as the stopping time. From here the proof rests on showing Σ_n P(S_n = 0) is divergent when μ = 0. Consider the generating function of the sequence P(S_n = 0), n = 0, 1, 2, ..., namely,
$$g(x) := \sum_{n=0}^{\infty} P(S_n = 0)\,x^n, \qquad |x| < 1.$$
The problem is to investigate the divergence of g(x) as x → 1⁻. Note that P(S_n = 0) is the 0th-term Fourier coefficient of the characteristic function (Fourier series)
$$Ee^{itS_n} = \sum_{k} P(S_n = k)\,e^{itk}.$$
Thus,

$$P(S_n = 0) = \frac{1}{2\pi}\int_{-\pi}^{\pi} Ee^{itS_n}\,dt = \frac{1}{2\pi}\int_{-\pi}^{\pi} \varphi^n(t)\,dt,$$

where φ(t) = Ee^{itX_1}. It follows that for |x| < 1,
$$g(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{dt}{1 - x\varphi(t)}.$$
Thus, recurrence or transience depends on the divergence or convergence, respectively, of this integral. Now, with μ = 0, we have 1 − φ(t) = o(|t|) as t → 0. Thus, for any ε > 0 there is a δ > 0 such that |1 − φ_1(t)| ≤ ε|t|, |φ_2(t)| ≤ ε|t|, for |t| ≤ δ, where φ(t) = φ_1(t) + iφ_2(t) has real and imaginary parts φ_1, φ_2. Now, for 0 < x < 1, noting that g(x) is real valued,
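$$\begin{aligned}
\int_{-\pi}^{\pi}\frac{dt}{1 - x\varphi(t)}
&= \int_{-\pi}^{\pi}\operatorname{Re}\Bigl(\frac{1}{1 - x\varphi(t)}\Bigr)dt
= \int_{-\pi}^{\pi}\frac{1 - x\varphi_1(t)}{|1 - x\varphi(t)|^2}\,dt \\
&\ge \int_{-\delta}^{\delta}\frac{1 - x\varphi_1(t)}{(1 - x\varphi_1(t))^2 + x^2\varphi_2^2(t)}\,dt
\ge \int_{-\delta}^{\delta}\frac{1 - x}{(1 - x + x\varepsilon|t|)^2 + x^2\varepsilon^2t^2}\,dt \\
&\ge \int_{-\delta}^{\delta}\frac{1 - x}{3[(1 - x)^2 + \varepsilon^2t^2]}\,dt
= \frac{2}{3\varepsilon}\tan^{-1}\Bigl(\frac{\varepsilon\delta}{1 - x}\Bigr)
\to \frac{\pi}{3\varepsilon} \quad \text{as } x \to 1.
\end{aligned}$$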
Since e is arbitrary, this completes the argument.
∎
The above argument is a special case of the so-called Chung-Fuchs recurrence criterion developed by K. L. Chung and W. H. J. Fuchs (1951), "On the Distribution of Values of Sums of Random Variables," Mem. Amer. Math. Soc., No. 6.
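For the simple symmetric random walk, φ(t) = cos t and P(S_{2k} = 0) = C(2k, k)4^{−k}. Both sides of g(x) = (1/2π)∫_{−π}^{π} dt/(1 − xφ(t)) can therefore be evaluated and compared with the closed form (1 − x²)^{−1/2}. That closed form blows up as x → 1, in line with recurrence. The sketch below does this with a truncated series and the midpoint rule; the truncation and grid sizes are arbitrary choices.

```python
import math


def g_series(x, terms=20000):
    """sum_n P(S_n = 0) x^n for the simple symmetric walk, with P(S_{2k} = 0) = C(2k, k) 4^{-k}."""
    total, u = 0.0, 1.0
    for k in range(terms):
        total += u * x ** (2 * k)
        u *= (2 * k + 1) / (2 * k + 2)        # C(2k+2, k+1) 4^{-k-1} from C(2k, k) 4^{-k}
    return total


def g_integral(x, m=4000):
    """(1/2pi) * integral over [-pi, pi] of dt / (1 - x cos t), by the midpoint rule."""
    h = 2 * math.pi / m
    s = sum(1.0 / (1.0 - x * math.cos(-math.pi + (i + 0.5) * h)) for i in range(m))
    return s * h / (2 * math.pi)


for x in (0.5, 0.9, 0.99, 0.999):
    print(x, g_series(x), g_integral(x), 1 / math.sqrt(1 - x * x))
```

The midpoint rule converges very quickly here because the integrand is smooth and periodic on [−π, π].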
Theoretical Complements to Section I.6
1. The Kolmogorov Extension Theorem holds for more general spaces S than the case S = ℝ¹ presented. In general it requires that S be homeomorphic to a complete and separable metric space. So, for example, it applies whenever S is ℝ^k or a rectangle,
THEORETICAL COMPLEMENTS
or when S is a finite or countable set. Assuming some background in analysis, a simple proof of Kolmogorov's theorem (due to Edward Nelson (1959), "Regular Probability Measures on Function Space," Annals of Math., 69, pp. 630–643) in the case of compact S can be made as follows. Define a linear functional l on the subspace of C(S^I) consisting of continuous functions on S^I that depend on finitely many coordinates, by
$$l(f) := \int_{S^k}\tilde f(x_{i_1}, \ldots, x_{i_k})\,P_{i_1,\ldots,i_k}(dx_{i_1}\cdots dx_{i_k}),$$

where f((x_i)_{i∈I}) = f̃(x_{i_1}, ..., x_{i_k}).
By consistency, l is a well-defined linear functional on a subspace of C(S^I) that, by the Stone–Weierstrass Theorem of functional analysis, is dense in C(S^I); note that S^I is compact for the product topology by Tychonoff's Theorem from topology (see H. L. Royden (1968), Real Analysis, 2nd ed., Macmillan, New York, pp. 174, 166). In particular, l has a natural extension to C(S^I) that is linear and continuous. Now apply the Riesz Representation Theorem to get a (probability) measure P on (S^I, ℱ) such that for any f ∈ C(S^I), l(f) = ∫_{S^I} f dP (see Royden, loc. cit., p. 310). To make the proof in the noncompact but separable and complete case, one can use a fundamental (homeomorphic) embedding of S into ℝ^∞ (see P. Billingsley (1968), Convergence of Probability Measures, Wiley, New York, p. 219); this is a special case of Urysohn's Theorem in topology (see H. L. Royden, loc. cit., p. 149). Then, by making a two-point compactification of ℝ¹, S can be further embedded in the Hilbert cube [0, 1]^∞, where the measure P can be obtained as above. Consistency allows one to restrict P back to S.

Theoretical Complements to Section I.7
1. (Brownian Motion and the Inadequacy of Kolmogorov's Extension Theorem) Let S = C[0, 1] denote the space of continuous real-valued functions on [0, 1] equipped with the uniform metric ρ(x, y) = max_{0≤t≤1} |x(t) − y(t)|,
x, y ∈ C[0, 1].
[…] x} is the countable union ∪_n {X_n > x} of measurable sets, which is, therefore, measurable. But, on the other hand, if I is uncountable, say I = [0, 1], then {sup_{t∈I} X_t > x} is an uncountable union of measurable sets, and this need not make {sup_{t∈I} X_t > x} measurable. If, for example, it is known that {X_t} has continuous sample paths, however, then, letting T denote the rationals in [0, 1], we have {sup_{t∈I} X_t > x} = {sup_{t∈T} X_t > x} = ∪_{t∈T} {X_t > x}. Thus {sup_{t∈I} X_t > x} is seen to be measurable under these circumstances. While it is not always possible to construct a stochastic process {X_t: t ≥ 0} having continuous sample paths for a given consistent specification of finite-dimensional distributions, it is often possible to construct a model (Ω, ℱ, P), {X_t: t ≥ 0} with the following property, called separability. There is a countable dense subset T of I = [0, ∞) and a set D ∈ ℱ with P(D) = 0 such that for each ω ∈ Ω − D, and each t ∈ I there is a sequence t_1, t_2, ... in T (which may depend on ω) such that t_n → t and X_{t_n}(ω) → X_t(ω) (P. Billingsley (1986),
Probability and Measure, 2nd ed., Wiley, New York, p. 558). In theory, this is enough sample path regularity to make most such measurability issues for processes at uncountably many time points manageable. In practice, though, one seeks to construct models explicitly with enough sample path regularity that such considerations can often be avoided. This latter approach is the one taken in this text.
Theoretical Complements to Section I.8

1. Let $\{X_t^{(n)}\}$, $n = 1, 2, \ldots$, and $\{X_t\}$ be stochastic processes whose sample paths belong to a metric space $S$ with metric $\rho$; for example, $S = C[0,1]$ with $\rho(\omega, \eta) = \sup_{0 \le t \le 1} |\omega_t - \eta_t|$, $\omega, \eta \in S$. Let $\mathcal{S}$ denote the Borel sigmafield for $S$. The distributions $P$ of $\{X_t\}$ and $P_n$ of $\{X_t^{(n)}\}$, respectively, are probability measures on $(S, \mathcal{S})$. Assume $\{X_t^{(n)}\}$ and $\{X_t\}$ are defined on a probability space $(\Omega, \mathcal{F}, Q)$. Then, for $B \in \mathcal{S}$,
$$\text{(i)}\quad P(B) = Q(\{\omega \in \Omega: \{X_t(\omega)\} \in B\}), \qquad \text{(ii)}\quad P_n(B) = Q(\{\omega \in \Omega: \{X_t^{(n)}(\omega)\} \in B\}), \quad n \ge 1. \tag{T.8.1}$$
Convergence in distribution of $\{X_t^{(n)}\}$ to $\{X_t\}$ has been defined in the text to mean that the sequence of real-valued random variables $Y_n := f(\{X_t^{(n)}\})$ converges in distribution to $Y := f(\{X_t\})$ for each continuous (for the metric $\rho$) function $f: S \to \mathbb{R}^1$. However, an equivalent condition is that for each bounded and continuous real-valued function $f: S \to \mathbb{R}^1$ one has
$$\lim_{n \to \infty} E f(\{X_t^{(n)}\}) = E f(\{X_t\}). \tag{T.8.2}$$
To see the equivalence, first observe that for any continuous $f: S \to \mathbb{R}^1$, the functions $\cos(rf)$ and $\sin(rf)$ are, for each $r \in \mathbb{R}^1$, continuous and bounded functions on $S$. Therefore, condition (T.8.2), applied to these functions, gives the convergence of the characteristic functions of the $Y_n$ to that of $Y$ for each continuous $f$ on $S$. In particular, the $Y_n$ must converge in distribution to $Y$. To go the other way, suppose that $f: S \to \mathbb{R}^1$ is continuous and bounded. Assume without loss of generality that $0 < f < 1$. …

Thus, in general,
$$\limsup_n \int_S f\,dP_n \le \int_S f\,dP \le \liminf_n \int_S f\,dP_n, \tag{T.8.6}$$
which implies
$$\limsup_n \int_S f\,dP_n = \liminf_n \int_S f\,dP_n = \int_S f\,dP. \tag{T.8.7}$$
This is the desired condition (T.8.2). ❑
With the above equivalence in mind we make the following general definition.

Definition. A sequence $\{P_n\}$ of probability measures on $(S, \mathcal{S})$ converges weakly (or in distribution) to a probability measure $P$ on $(S, \mathcal{S})$ provided that $\lim_n \int_S f\,dP_n = \int_S f\,dP$ for all bounded and continuous functions $f: S \to \mathbb{R}^1$. Weak convergence is sometimes denoted by $P_n \Rightarrow P$ as $n \to \infty$. Other equivalent notions of convergence in distribution are as follows. The proofs are along the lines of the arguments above (P. Billingsley (1968), Convergence of Probability Measures, Theorem 9.1, pp. 11–14).

Theorem T.8.1. (Alexandrov). $P_n \Rightarrow P$ as $n \to \infty$ if and only if any one of the following conditions holds:
(i) $\lim_{n \to \infty} P_n(A) = P(A)$ for all $A \in \mathcal{S}$ such that $P(\partial A) = 0$, where $\partial A$ denotes the boundary of $A$ for the metric $\rho$;
(ii) $\limsup_n P_n(F) \le P(F)$ for all closed sets $F \subset S$;
(iii) $\liminf_n P_n(G) \ge P(G)$ for all open sets $G \subset S$. ❑

2. The convergence of a sequence of probability measures is frequently established by an application of the following theorem due to Prohorov.
Theorem T.8.2. (Prohorov). Let $\{P_n\}$ be a sequence of probability measures on the metric space $S$ with Borel sigmafield $\mathcal{S}$. If for each $\varepsilon > 0$ there is a compact set $K_\varepsilon \subset S$ such that
$$P_n(K_\varepsilon) > 1 - \varepsilon \quad \text{for all } n = 1, 2, \ldots, \tag{T.8.8}$$
then $\{P_n\}$ has a subsequence weakly convergent to a probability measure $Q$ on $(S, \mathcal{S})$. Moreover, if $S$ is complete and separable then the condition (T.8.8) is also necessary. ❑

The condition (T.8.8) is referred to as tightness of the sequence of probability measures $\{P_n\}$. A proof of sufficiency of the tightness condition in the special case $S = \mathbb{R}^1$ is given in Chapter 0, Theorem 5.2. For the general result, consult Billingsley, loc. cit., pp. 37–40. A version of (T.8.8) for processes with continuous paths is computed below in Theorem T.8.4.

3. In the case of probability measures $\{P_n\}$ on $S = C[0,1]$, if the finite-dimensional distributions of $P_n$ converge to those of $P$ and if the sequence $\{P_n\}$ is tight, then it follows from Prohorov's theorem that $\{P_n\}$ converges weakly to $P$. To check tightness it is useful to have the following characterization of (relatively) compact subsets of $C[0,1]$ from real analysis (A. N. Kolmogorov and S. V. Fomin (1975), Introductory Real Analysis, Dover, New York, p. 102).

Theorem T.8.3. (Arzelà–Ascoli). A subset $A$ of functions in $C[0,1]$ has compact closure if and only if
(i) $\sup_{\omega \in A} |\omega_0| < \infty$,
(ii) $\lim_{\delta \to 0} \sup_{\omega \in A} v_\omega(\delta) = 0$,
where $v_\omega(\delta)$ is the oscillation in $\omega \in C[0,1]$ defined by
$$v_\omega(\delta) = \sup_{|t - s| \le \delta} |\omega_t - \omega_s|.$$
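The oscillation $v_\omega(\delta)$ in condition (ii) can be approximated numerically. The sketch below uses a hypothetical helper, `oscillation`. It assumes $\omega$ is given as a Python function on $[0,1]$ and replaces the supremum by a maximum over pairs of points of a uniform grid, so it only approximates $v_\omega(\delta)$ from below.

```python
import math
from typing import Callable


def oscillation(omega: Callable[[float], float], delta: float, grid_size: int = 1000) -> float:
    """Grid approximation of v_omega(delta) = sup_{|t - s| <= delta} |omega(t) - omega(s)|.

    Only pairs of grid points k/grid_size at distance <= delta are compared, so the
    result is a lower approximation of the true supremum over [0, 1].
    """
    values = [omega(k / grid_size) for k in range(grid_size + 1)]
    lag = int(delta * grid_size)  # largest grid offset with offset/grid_size <= delta
    best = 0.0
    for i in range(grid_size + 1):
        for j in range(i + 1, min(i + lag, grid_size) + 1):
            best = max(best, abs(values[j] - values[i]))
    return best


# For omega(t) = sqrt(t) the oscillation is sqrt(delta), attained at s = 0, t = delta.
print(oscillation(math.sqrt, 0.01))
```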
… for each $\varepsilon > 0$ there is a common $\delta > 0$ such that for all functions $\omega \in A$ we have $|\omega_t - \omega_s| < \varepsilon$ if $|t - s| < \delta$. Conditions (i) and (ii) together imply that $A$ is uniformly bounded in the sense that there is a number $B$ for which
$$\|\omega\| := \sup_{0 \le t \le 1} |\omega_t| \le B \quad \text{for all } \omega \in A.$$
This is because for $N$ sufficiently large we have $\sup_{\omega \in A} v_\omega(1/N) < 1$ and, therefore, for each … $\eta > 0$ there is a compact $K$ such that $P_n(K) > 1 - \eta$ for all $n$. By the Arzelà–Ascoli theorem, if $B > \sup_{\omega \in K} |\omega_0|$ then
$$P_n(\{\omega \in C[0,1]: |\omega_0| \ge B\}) \le P_n(K^c) \le 1 - (1 - \eta) = \eta.$$
Also, given $\varepsilon > 0$, select $\delta > 0$ such that $\sup_{\omega \in K} v_\omega(\delta) < \varepsilon$. Then
$$P_n(\{\omega \in C[0,1]: v_\omega(\delta) \ge \varepsilon\}) \le P_n(K^c) < \eta \quad \text{for all } n \ge 1.$$
The converse goes as follows. Given $\eta > 0$, first select $B$ using (i) such that $P_n(\{\omega: |\omega_0| < B\}) \ge 1 - \frac{1}{2}\eta$ for $n \ge 1$. Select $\delta_r$ using (ii) such that $P_n(\{\omega: v_\omega(\delta_r) < 1/r\}) \ge 1 - 2^{-(r+1)}\eta$ for $n \ge 1$. Now take $K$ to be the closure of
$$\{\omega: |\omega_0| < B\} \cap \bigcap_{r=1}^{\infty} \{\omega: v_\omega(\delta_r) < 1/r\}.$$
Then $P_n(K) \ge 1 - \eta$ for $n \ge 1$, and $K$ is compact by the Arzelà–Ascoli theorem. ❑

The above theorem taken with Prohorov's theorem is a cornerstone of weak convergence theory in $C[0,1]$. If one has proved convergence of the finite-dimensional distributions of $\{X_t^{(n)}\}$ to $\{X_t\}$, then the distributions of $\{X_0^{(n)}\}$ must be tight as probability measures on $\mathbb{R}^1$, so that condition (i) is implied. In view of this and the Prohorov theorem we have the following necessary and sufficient condition for weak convergence in $C[0,1]$ based on convergence of the finite-dimensional distributions and tightness.

Theorem T.8.5. Let $\{X_t^{(n)}: 0 \le t \le 1\}$ and $\{X_t: 0 \le t \le 1\}$ …
m 2a2m M2 m(l+p) = M mk+1
a
h=0
x m 2„ — mß
mk+12
This bound does not depend on $n$ and goes to zero as $\delta \to 0$ (i.e., as $k$, which grows like $\log_2 \delta^{-1}$, tends to $+\infty$), since it is the tail of a convergent series. This proves the corollary. ❑
5. (FCLT and a Brownian Motion Construction) As an application of the above corollary one can obtain a proof of Donsker's Invariance Principle for i.i.d. summands having finite fourth moments. Moreover, since the simple random walk has moments
of all orders, this approach can also be used to give an alternative rigorous construction of the Wiener measure based on Prohorov's theorem as the limiting distribution of random walks. (Compare theoretical complement 13.1 for another construction.) Let $Z_1, Z_2, \ldots$ be i.i.d. random variables on a probability space $(\Omega, \mathcal{F}, P)$ having mean zero, variance one, and finite fourth moment $m_4 = EZ_1^4$. Define $S_0 = 0$, $S_n = Z_1 + \cdots + Z_n$, $n \ge 1$, and
$$X_t^{(n)} = n^{-1/2} S_{[nt]} + n^{-1/2}(nt - [nt])\, Z_{[nt]+1}, \qquad 0 \le t \le 1.$$
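The interpolated process $X_t^{(n)}$ can be evaluated directly from its definition. The following sketch uses the hypothetical function name `polygonal_path` and assumes the increments $Z_1, \ldots, Z_n$ are already given as a Python list, with index $k-1$ holding $Z_k$. The $\pm 1$ steps of a simple symmetric random walk in the example are used only as one choice of i.i.d. summands with finite fourth moment.

```python
import math
import random
from typing import Sequence


def polygonal_path(z: Sequence[float], t: float) -> float:
    """X_t^{(n)} = n^{-1/2} S_[nt] + n^{-1/2} (nt - [nt]) Z_{[nt]+1} for 0 <= t <= 1.

    z[k - 1] holds Z_k, so n = len(z).
    """
    n = len(z)
    nt = n * t
    k = math.floor(nt)                    # [nt]
    partial_sum = sum(z[:k])              # S_[nt]
    frac = nt - k                         # nt - [nt]
    next_step = z[k] if k < n else 0.0    # Z_{[nt]+1}; frac == 0 when t == 1
    return (partial_sum + frac * next_step) / math.sqrt(n)


# Example: one path of a scaled simple symmetric random walk with n = 1000 steps.
rng = random.Random(0)
steps = [rng.choice((-1, 1)) for _ in range(1000)]
path = [polygonal_path(steps, j / 100) for j in range(101)]
```

Between the points $k/n$ the function is linear, so plotting `path` on a finer grid reproduces the polygonal interpolation of the scaled partial sums.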
… $1/n$. Then take $f(x) = h_n(\rho(x, F))$.
Theoretical Complements to Section I.9

1. If $f: C[0, \infty) \to \mathbb{R}^1$ is continuous, then the FCLT provides that the real-valued random variables $X_n = f(\{X_t^{(n)}\})$, $n \ge 1$, converge in distribution to $X = f(\{X_t\})$. Thus, if one checks by direct computation that the limit $F(a) = \lim_n P(X_n \le a)$, or $F(a) = \lim_n P(X_n < a)$, exists for all $a$, then $F$ is the d.f. of $X$. Applying this to $f(\{X_s\}) := \max\{X_s: 0 \le s \le t\} \equiv M_t$, we get (10.10), for $P(T_z > t) = P(M_t < z)$ if $z > 0$. The case $z < 0$ is similar. Joint distributions of several functionals may be obtained similarly by looking at linear combinations of the functionals. Here is the precise statement.
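As a numerical illustration of this use of the FCLT, the sketch below takes simple symmetric random walk steps as the summands. It estimates $P(\max_{0 \le s \le 1} X_s^{(n)} \ge z)$ by simulation and compares the estimate with $2P(B_1 \ge z) = \operatorname{erfc}(z/\sqrt{2})$, the standard reflection-principle value for Brownian motion. The function names, the sample sizes and the seed are illustrative assumptions. Because the polygonal path is linear between the points $k/n$, its maximum over $[0,1]$ equals the maximum of the scaled partial sums.

```python
import math
import random


def max_exceedance_freq(z: float, n: int = 400, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(max_{0<=s<=1} X_s^{(n)} >= z) for +-1 summands.

    The polygonal path is linear between the points k/n, so its maximum over [0, 1]
    equals max_k S_k / sqrt(n).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = running_max = 0
        for _ in range(n):
            s += 1 if rng.random() < 0.5 else -1
            running_max = max(running_max, s)
        if running_max / math.sqrt(n) >= z:
            hits += 1
    return hits / trials


def reflection_limit(z: float) -> float:
    """2 P(B_1 >= z) = erfc(z / sqrt(2)) for z > 0."""
    return math.erfc(z / math.sqrt(2))


print(max_exceedance_freq(1.0), reflection_limit(1.0))
```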
Theorem T.9.1. If $f: C[0, \infty) \to \mathbb{R}^k$ is continuous, say $f = (f_1, \ldots, f_k)$ where $f_i: C[0, \infty) \to \mathbb{R}^1$, then the random vectors $X_n = f(\{X_t^{(n)}\})$, $n \ge 1$, converge in distribution to $X = f(\{X_t\})$.

Proof. For any $r_1, \ldots, r_k \in \mathbb{R}^1$, the function $\sum_{j=1}^k r_j f_j: C[0, \infty) \to \mathbb{R}^1$ is continuous, so that $\sum_{j=1}^k r_j f_j(\{X_t^{(n)}\})$ converges in distribution to $\sum_{j=1}^k r_j f_j(\{X_t\})$ by the FCLT. Therefore, its characteristic function converges to $E\exp\{i \sum_{j=1}^k r_j f_j(\{X_t\})\}$ …

… (T.13.2)
In particular, it will follow that with probability 1, for every $t > 0$, $q \mapsto X_q$ is uniformly continuous on $D \cap [0, t]$. Thus, almost all sample paths of $\{X_q: q \in D\}$ have a unique extension to continuous functions $\{B_t: t \ge 0\}$. That is, letting $C = \{\omega \in \Omega: \text{for each } t > 0,\ q \mapsto X_q(\omega) \text{ is uniformly continuous on } D \cap [0, t]\}$, define for $\omega \in C$
$$B_t(\omega) = \begin{cases} X_q(\omega), & \text{if } t = q \in D,\\ \lim_{q \downarrow t} X_q(\omega), & \text{if } t \notin D, \end{cases} \tag{T.13.3}$$
where the limit is over dyadic rational $q$ decreasing to $t$. By construction, $\{B_t: t \ge 0\}$ has continuous paths with probability 1. Moreover, for $0 < t_1 < \cdots < t_k$, with probability one, $(B_{t_1}, \ldots, B_{t_k}) = \lim_n (X_{q_1^{(n)}}, \ldots, X_{q_k^{(n)}})$ for dyadic rationals $q_1^{(n)}, \ldots, q_k^{(n)}$ decreasing to $t_1, \ldots, t_k$. Also, the random vector $(X_{q_1^{(n)}}, \ldots, X_{q_k^{(n)}})$ has a multivariate normal distribution with mean vector 0, and as $n \to \infty$ this distribution converges to the multivariate normal distribution with mean vector 0 and variance–covariance matrix $\sigma_{ij} = \min(t_i, t_j)$, $1 \le i, j \le k$. It follows from these two facts that this must be the distribution of $(B_{t_1}, \ldots, B_{t_k})$. Thus, $\{B_t\}$ is a standard Brownian motion process. To verify the condition (T.13.1) for the Borel–Cantelli lemma, just note that by the maximal inequality (see Exercises 4.3, 13.11),
$$P\Big(\max_{0 \le k \le 2^m} |X_{t + k\delta 2^{-m}} - X_t| > a\Big) \le 2P(|X_{t+\delta} - X_t| \ge a) \le \frac{2}{a^4} E(X_{t+\delta} - X_t)^4 = \frac{6\delta^2}{a^4}, \tag{T.13.4}$$
since the increments of $\{X_t\}$ are independent and Gaussian with mean 0. Now since the events $\{\max_{0 \le k \le 2^m} |X_{t + k\delta 2^{-m}} - X_t| > a\}$ increase with $m$, letting $m \to \infty$ we have
$$P\Big(\sup_{0 \le q \le 1,\ q \in D} |X_{t + q\delta} - X_t| > a\Big) \le \frac{6\delta^2}{a^4}. \tag{T.13.5}$$
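Thus,
$$P\Big(\max_{0 \le k \le n2^n} \sup_{q \in J_{n,k} \cap D} |X_q - X_{k/2^n}| > \frac{1}{n}\Big) \le \sum_{k=0}^{n2^n} P\Big(\sup_{q \in J_{n,k} \cap D} |X_q - X_{k/2^n}| > \frac{1}{n}\Big) \le (n2^n + 1)\,\frac{6\,(2^{-n})^2}{(1/n)^4} = \frac{6n^4(n2^n + 1)}{2^{2n}}, \tag{T.13.6}$$
which is summable.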
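The bound (T.13.4) can be checked numerically. The sketch below simulates Brownian motion at the dyadic points $t + k\delta 2^{-m}$, $0 \le k \le 2^m$, using independent $N(0, \delta 2^{-m})$ increments. It then compares the empirical frequency of $\{\max_k |X_{t + k\delta 2^{-m}} - X_t| > a\}$ with $6\delta^2/a^4$. The function names, $m = 6$, the number of trials and the seed are assumptions made only for illustration.

```python
import math
import random


def max_deviation_freq(a: float, delta: float = 1.0, m: int = 6,
                       trials: int = 4000, seed: int = 1) -> float:
    """Estimate P(max_{0<=k<=2^m} |X_{t+k delta 2^-m} - X_t| > a) for Brownian motion."""
    rng = random.Random(seed)
    steps = 2 ** m
    sd = math.sqrt(delta / steps)  # each dyadic increment is N(0, delta * 2^-m)
    hits = 0
    for _ in range(trials):
        x = peak = 0.0
        for _ in range(steps):
            x += rng.gauss(0.0, sd)
            peak = max(peak, abs(x))
        if peak > a:
            hits += 1
    return hits / trials


def bound_t13_4(a: float, delta: float = 1.0) -> float:
    """Right side of (T.13.4): 2 E(X_{t+delta} - X_t)^4 / a^4 = 6 delta^2 / a^4."""
    return 6 * delta ** 2 / a ** 4


print(max_deviation_freq(2.0), bound_t13_4(2.0))
```

The bound is far from sharp here, but (T.13.6) needs only its order $\delta^2$ in $\delta$.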
Theoretical Complements to Section I.14

1. The treatment given in this section follows that in R. N. Bhattacharya, V. K. Gupta, and E. Waymire (1983), "The Hurst Effect Under Trends," J. Appl. Probability, 20, pp. 649–662. The research on the Hurst effect is rather extensive. For the other related results mentioned in the text consult the following references:
W. Feller (1951), "The Asymptotic Distribution of the Range of Sums of Independent Random Variables," Ann. Math. Statist., 22, pp. 427-432. P. A. P. Moran (1964), "On the Range of Cumulative Sums," Ann. Inst. Statist. Math., 16, pp. 109-112. B. B. Mandelbrot and J. W. Van Ness (1968), "Fractional Brownian Motions, Fractional Noises and Applications," SIAM Rev., 10, pp. 422-437. Processes of the type considered by Mandelbrot and Van Ness are briefly described in theoretical complement 1.3 to Chapter IV of this book.
CHAPTER II
Discrete-Parameter Markov Chains
1 MARKOV DEPENDENCE

Consider a discrete-parameter stochastic process $\{X_n\}$. Think of $X_0, X_1, \ldots, X_{n-1}$ as "the past," $X_n$ as "the present," and $X_{n+1}, X_{n+2}, \ldots$ as "the future" of the process relative to time $n$. The law of evolution of a stochastic process is often thought of in terms of the conditional distribution of the future given the present and past states of the process. In the case of a sequence of independent random variables or of a simple random walk, for example, this conditional distribution does not depend on the past. This important property is expressed by Definition 1.1.
Definition 1.1. A stochastic process $\{X_0, X_1, \ldots, X_n, \ldots\}$ has the Markov property if, for each $n$ and $m$, the conditional distribution of $X_{n+1}, \ldots, X_{n+m}$ given $X_0, X_1, \ldots, X_n$ is the same as its conditional distribution given $X_n$ alone. A process having the Markov property is called a Markov process. If, in addition, the state space of the process is countable, then a Markov process is called a Markov chain.
In view of the next proposition, it is actually enough to take $m = 1$ in the above definition.

Proposition 1.1. A stochastic process $X_0, X_1, X_2, \ldots$ has the Markov property if and only if for each $n$ the conditional distribution of $X_{n+1}$ given $X_0, X_1, \ldots, X_n$ is a function only of $X_n$.
Proof. For simplicity, take the state space $S$ to be countable. The necessity of the condition is obvious. For sufficiency, observe that
$$\begin{aligned}
&P(X_{n+1} = j_1, \ldots, X_{n+m} = j_m \mid X_0 = i_0, \ldots, X_n = i_n)\\
&\quad= P(X_{n+1} = j_1 \mid X_0 = i_0, \ldots, X_n = i_n)\, P(X_{n+2} = j_2 \mid X_0 = i_0, \ldots, X_n = i_n, X_{n+1} = j_1) \cdots\\
&\qquad\cdots P(X_{n+m} = j_m \mid X_0 = i_0, \ldots, X_{n+m-1} = j_{m-1})\\
&\quad= P(X_{n+1} = j_1 \mid X_n = i_n)\, P(X_{n+2} = j_2 \mid X_{n+1} = j_1) \cdots P(X_{n+m} = j_m \mid X_{n+m-1} = j_{m-1}).
\end{aligned} \tag{1.1}$$
The last equality follows from the hypothesis of the proposition. Thus the conditional distribution of the future as a function of the past and present states $i_0, i_1, \ldots, i_n$ depends only on the present state $i_n$. This is, therefore, the conditional distribution given $X_n = i_n$ (Exercise 1). ❑

A Markov chain $\{X_0, X_1, \ldots\}$ is said to have a homogeneous or stationary transition law if the distribution of $X_{n+1}, \ldots, X_{n+m}$ given $X_n = y$ depends on the state at time $n$, namely $y$, but not on the time $n$. Otherwise, the transition law is called nonhomogeneous. An i.i.d. sequence $\{X_n\}$ and its associated random walk possess time-homogeneous transition laws, while an independent nonidentically distributed sequence $\{X_n\}$ and its associated random walk have nonhomogeneous transitions. Unless otherwise specified, by a Markov process (chain) we shall mean a Markov process (chain) with a homogeneous transition law. The Markov property as defined above refers to a special type of statistical dependence among families of random variables indexed by a linearly ordered parameter set. In the case of a continuous-parameter process, we have the following analogous definition.

Definition 1.2. A continuous-parameter stochastic process $\{X_t\}$ has the Markov property if for each $s < t$, the conditional distribution of $X_t$ given $\{X_u: u \le s\}$ is the same as the conditional distribution of $X_t$ given $X_s$. Such a process is called a continuous-parameter Markov process. If, in addition, the state space is countable, then the process is called a continuous-parameter Markov chain.
2 TRANSITION PROBABILITIES AND THE PROBABILITY SPACE

An i.i.d. sequence and a random walk are merely two examples of Markov chains. To define a general Markov chain, it is convenient to introduce a matrix $p$ to describe the probabilities of transition between successive states in the evolution of the process.

Definition 2.1. A transition probability matrix or a stochastic matrix is a square
matrix $p = ((p_{ij}))$, where $i$ and $j$ vary over a finite or denumerable set $S$, satisfying (i) $p_{ij} \ge 0$ for all $i$ and $j$, (ii) $\sum_{j \in S} p_{ij} = 1$ for all $i$.
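For a finite state space, conditions (i) and (ii) can be checked mechanically. The helper `is_stochastic` below is a hypothetical name. It assumes the matrix is given as a list of rows and uses a small tolerance on the row sums to allow for floating-point rounding; exact arithmetic with `fractions.Fraction` also works. A denumerable $S$ cannot, of course, be checked this way.

```python
from fractions import Fraction
from typing import Sequence


def is_stochastic(p: Sequence[Sequence[float]], tol: float = 1e-12) -> bool:
    """Check (i) p_ij >= 0 and (ii) sum_j p_ij = 1 for a finite square matrix p."""
    size = len(p)
    if any(len(row) != size for row in p):
        return False  # not square
    for row in p:
        if any(entry < 0 for entry in row):
            return False  # violates (i)
        if abs(sum(row) - 1) > tol:
            return False  # violates (ii)
    return True


print(is_stochastic([[0.5, 0.5], [0.2, 0.8]]))
print(is_stochastic([[Fraction(1, 3), Fraction(2, 3)], [Fraction(1), Fraction(0)]]))
print(is_stochastic([[1.2, -0.2], [0.0, 1.0]]))
```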
The set $S$ is called the state space and its elements are states. Think of a particle that moves from point to point in the state space according to the following scheme. At time $n = 0$ the particle is set in motion either by starting it at a fixed state $i_0$, called the initial state, or by randomly locating it in the state space according to a probability distribution $\pi$ on $S$, called the initial distribution. In the former case, $\pi$ is the distribution concentrated at the state $i_0$, i.e., $\pi_j = 1$ if $j = i_0$, $\pi_j = 0$ if $j \ne i_0$. In the latter case, the probability is $\pi_i$ that at time zero the particle will be found in state $i$, where $0 \le \pi_i \le 1$ and $\sum_i \pi_i = 1$. Given that the particle is in state $i_0$ at time $n = 0$, a random trial is performed, assigning probability $p_{i_0 j}$ to the respective states $j \in S$. If the outcome of the trial is the state $i_1$, then the particle moves to state $i_1$ at time $n = 1$. A second trial is performed with probabilities $p_{i_1 j}$ of states $j \in S$. If the outcome of the second trial is $i_2$, then the particle moves to state $i_2$ at time $n = 2$, and so on. A typical sample point of this experiment is a sequence of states, say $(i_0, i_1, i_2, \ldots, i_n, \ldots)$, representing a sample path. The set of all such sample paths is the sample space $\Omega$. The position $X_n$ at time $n$ is a random variable whose value is given by $X_n = i_n$ if the sample path is $(i_0, i_1, \ldots, i_n, \ldots)$. The precise specification of the probability $P_\pi$ on $\Omega$ for the above experiment is given by
$$P_\pi(X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n) = \pi_{i_0} p_{i_0 i_1} p_{i_1 i_2} \cdots p_{i_{n-1} i_n}. \tag{2.1}$$
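Formula (2.1) and the sequential trial scheme described above translate directly into code. In this sketch the function names are hypothetical and the states are labelled $0, 1, \ldots, |S| - 1$, with $\pi$ and $p$ given as Python lists. `random.Random.choices` performs each random trial using the weights of the current row of $p$.

```python
import random
from fractions import Fraction
from typing import List, Sequence


def path_probability(pi: Sequence, p: Sequence[Sequence], path: Sequence[int]):
    """P_pi(X_0 = i_0, ..., X_n = i_n) = pi_{i_0} p_{i_0 i_1} ... p_{i_{n-1} i_n}, as in (2.1)."""
    prob = pi[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= p[a][b]
    return prob


def simulate_chain(pi: Sequence[float], p: Sequence[Sequence[float]], n: int,
                   rng: random.Random) -> List[int]:
    """Draw X_0 from pi, then each X_{k+1} from row X_k of p."""
    states = range(len(pi))
    x = rng.choices(states, weights=pi)[0]
    path = [x]
    for _ in range(n):
        x = rng.choices(states, weights=p[x])[0]
        path.append(x)
    return path


pi = [Fraction(1, 2), Fraction(1, 2)]
p = [[Fraction(1, 3), Fraction(2, 3)], [Fraction(1, 4), Fraction(3, 4)]]
print(path_probability(pi, p, [0, 1, 1]))  # (1/2)(2/3)(3/4) = 1/4
print(simulate_chain([0.5, 0.5], [[0.2, 0.8], [0.6, 0.4]], 10, random.Random(0)))
```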
More generally, for finite-dimensional events of the form
$$A = \{(X_0, X_1, \ldots, X_n) \in B\}, \tag{2.2}$$
where $B$ is an arbitrary set of $(n+1)$-tuples of elements of $S$, the probability of $A$ is specified by
$$P_\pi(A) = \sum_{(i_0, i_1, \ldots, i_n) \in B} \pi_{i_0} p_{i_0 i_1} \cdots p_{i_{n-1} i_n}. \tag{2.3}$$
By Kolmogorov's existence theorem, $P_\pi$ extends uniquely as a probability measure on the smallest sigmafield $\mathcal{F}$ containing the class of all events of the form (2.2); see Example 6.3 of Chapter I. This probability space $(\Omega, \mathcal{F}, P_\pi)$ with $X_n(\omega) = \omega_n$, $\omega \in \Omega$, is a canonical model for the Markov chain with transition probabilities $((p_{ij}))$ and initial distribution $\pi$ (the Markov property is established below at (2.7)–(2.10)). In the case of a Markov chain starting in state $i$, that is, $\pi_i = 1$, we write $P_i$ in place of $P_\pi$.
To specify various joint distributions and conditional distributions associated with this Markov chain, it is convenient to use the notation of matrix multiplication. By definition the $(i, j)$ element of the matrix $p^2$ is given by
$$p_{ij}^{(2)} = \sum_{k \in S} p_{ik} p_{kj}. \tag{2.4}$$
The elements of the matrix $p^n$ are defined recursively by $p^n = p^{n-1}p$, so that the $(i, j)$ element of $p^n$ is given by
$$p_{ij}^{(n)} = \sum_{k \in S} p_{ik}^{(n-1)} p_{kj} = \sum_{k \in S} p_{ik} p_{kj}^{(n-1)}, \qquad n = 2, 3, \ldots. \tag{2.5}$$
It is easily checked by induction on $n$ that the expression for $p_{ij}^{(n)}$ is given directly in terms of the elements of $p$ according to
$$p_{ij}^{(n)} = \sum_{i_1, \ldots, i_{n-1} \in S} p_{i i_1} p_{i_1 i_2} \cdots p_{i_{n-2} i_{n-1}} p_{i_{n-1} j}. \tag{2.6}$$
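The recursion (2.5) and the path-sum formula (2.6) can be compared on a small example. The sketch below uses hypothetical helper names and exact `Fraction` arithmetic, so the two computations of $p_{ij}^{(n)}$ agree exactly rather than up to rounding. The path sum enumerates all $|S|^{n-1}$ intermediate chains and is meant only as a check, not as an efficient method.

```python
from fractions import Fraction
from itertools import product


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(p, n):
    """p^n via the recursion p^n = p^{n-1} p of (2.5), for n >= 1."""
    result = p
    for _ in range(n - 1):
        result = mat_mul(result, p)
    return result


def n_step_by_paths(p, i, j, n):
    """p_ij^(n) as the sum over intermediate states i_1, ..., i_{n-1} in (2.6)."""
    total = Fraction(0)
    for middle in product(range(len(p)), repeat=n - 1):
        chain = (i, *middle, j)
        term = Fraction(1)
        for a, b in zip(chain, chain[1:]):
            term *= p[a][b]
        total += term
    return total


F = Fraction
p = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 3), F(0), F(2, 3)],
     [F(0), F(1, 4), F(3, 4)]]
p3 = mat_pow(p, 3)
assert all(p3[i][j] == n_step_by_paths(p, i, j, 3) for i in range(3) for j in range(3))
```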
Now let us check the Markov property of this probability model. Using (2.1) and summing over unrestricted coordinates, the joint distribution of $X_0, X_{n_1}, X_{n_2}, \ldots, X_{n_k}$, with $0 = n_0 < n_1 < n_2 < \cdots < n_k$, is given by
$$\begin{aligned}
&P_\pi(X_0 = i, X_{n_1} = j_1, X_{n_2} = j_2, \ldots, X_{n_k} = j_k)\\
&\quad= \sum_1 \sum_2 \cdots \sum_k \pi_i \big(p_{i i_1} p_{i_1 i_2} \cdots p_{i_{n_1 - 1} j_1}\big)\big(p_{j_1 i_{n_1+1}} p_{i_{n_1+1} i_{n_1+2}} \cdots p_{i_{n_2 - 1} j_2}\big) \cdots\\
&\qquad\cdots \big(p_{j_{k-1} i_{n_{k-1}+1}} p_{i_{n_{k-1}+1} i_{n_{k-1}+2}} \cdots p_{i_{n_k - 1} j_k}\big),
\end{aligned} \tag{2.7}$$
where $\sum_r$ is the sum over the $r$th block of indices $i_{n_{r-1}+1}, i_{n_{r-1}+2}, \ldots, i_{n_r - 1}$ $(r = 1, 2, \ldots, k)$. The sum $\sum_k$, keeping indices in all other blocks fixed, yields the factor $p_{j_{k-1} j_k}^{(n_k - n_{k-1})}$ using (2.6) for the last group of terms. Next sum successively over the $(k-1)$st, $\ldots$, second, and first blocks of factors to get
$$P_\pi(X_0 = i, X_{n_1} = j_1, X_{n_2} = j_2, \ldots, X_{n_k} = j_k) = \pi_i\, p_{i j_1}^{(n_1)} p_{j_1 j_2}^{(n_2 - n_1)} \cdots p_{j_{k-1} j_k}^{(n_k - n_{k-1})}. \tag{2.8}$$
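Now sum over $i \in S$ to get
$$P_\pi(X_{n_1} = j_1, X_{n_2} = j_2, \ldots, X_{n_k} = j_k) = \Big(\sum_{i \in S} \pi_i p_{i j_1}^{(n_1)}\Big) p_{j_1 j_2}^{(n_2 - n_1)} \cdots p_{j_{k-1} j_k}^{(n_k - n_{k-1})}. \tag{2.9}$$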
Using (2.8) and the elementary definition of conditional probabilities, it now follows that the conditional distribution of $X_{n+m}$ given $X_0, X_1, \ldots, X_n$ is given by
$$\begin{aligned}
P_\pi(X_{n+m} = j \mid X_0 = i_0, X_1 = i_1, \ldots, X_{n-1} = i_{n-1}, X_n = i) &= p_{ij}^{(m)} = P_\pi(X_{n+m} = j \mid X_n = i)\\
&= P_\pi(X_m = j \mid X_0 = i), \qquad m \ge 1,\ j \in S.
\end{aligned} \tag{2.10}$$
Although by Proposition 1.1 the case m = 1 would have been sufficient to prove
the Markov property, (2.10) justifies the terminology that $p^m := ((p_{ij}^{(m)}))$ is the $m$-step transition probability matrix. Note that $p^m$ is a stochastic matrix for all $m \ge 1$. The calculation of the distribution of $X_m$ follows from (2.10). We have
$$P_\pi(X_m = j) = \sum_i P_\pi(X_m = j, X_0 = i) = \sum_i P_\pi(X_0 = i)\, P_\pi(X_m = j \mid X_0 = i) = \sum_i \pi_i p_{ij}^{(m)} = (\pi' p^m)_j, \tag{2.11}$$
where $\pi'$ is the transpose of the column vector $\pi$, and $(\pi' p^m)_j$ is the $j$th element of the row vector $\pi' p^m$.
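Formula (2.11) says that the distribution of $X_m$ is obtained by multiplying the row vector $\pi'$ by $p$ a total of $m$ times. The following is a minimal sketch, with the hypothetical name `distribution_at` and exact fractions. The example matrix is chosen so that $\pi = (5/6, 1/6)$ satisfies $\pi' p = \pi'$, in which case the distribution of $X_m$ does not change with $m$.

```python
from fractions import Fraction


def distribution_at(pi, p, m):
    """Row vector pi' p^m of (2.11), computed by m successive vector-matrix products."""
    dist = list(pi)
    size = len(p)
    for _ in range(m):
        dist = [sum(dist[i] * p[i][j] for i in range(size)) for j in range(size)]
    return dist


F = Fraction
p = [[F(9, 10), F(1, 10)], [F(1, 2), F(1, 2)]]
print(distribution_at([F(1), F(0)], p, 2))        # distribution of X_2 when X_0 = 0
print(distribution_at([F(5, 6), F(1, 6)], p, 7))  # stays equal to [5/6, 1/6]
```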
3 SOME EXAMPLES

The transition probabilities for some familiar Markov chains are given in the examples of this section. Although they are excluded from the general development of this chapter, examples of a non-Markov process and a Markov process having a nonhomogeneous transition law are both supplied under Example 8 below.

Example 1. (Completely Deterministic Motion). Let the only elements of $p$, which may be either a finite or denumerable square matrix, be 0's and 1's. That is, for each state $i$ there exists a state $h(i)$ such that
$$p_{i h(i)} = 1, \qquad p_{ij} = 0 \quad \text{for } j \ne h(i) \qquad (i \in S). \tag{3.1}$$
This means that if the process is now in state $i$ it must be in state $h(i)$ at the next instant. In this case, if the initial state $X_0$ is known then one knows the entire future. Thus, if $X_0 = i$, then $X_1 = h(i)$, $X_2 = h(h(i)) := h^{(2)}(i)$, $\ldots$, $X_n = h(h^{(n-1)}(i)) = h^{(n)}(i)$, $\ldots$. Hence $p_{ij}^{(n)} = 1$ if $j = h^{(n)}(i)$ and $p_{ij}^{(n)} = 0$ if $j \ne h^{(n)}(i)$. Pseudorandom number generators are of this type (see Exercise 7).
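A linear congruential generator is a concrete instance of Example 1: the state is the current integer and $h(x) = (5x + 3) \bmod 16$. These are toy parameters chosen only for illustration, not a recommended generator, and the helper names are hypothetical. The matrix built from $h$ has exactly one 1 in each row, as in (3.1), and the orbit from $X_0 = 0$ is completely determined.

```python
def h(x: int) -> int:
    """Toy linear congruential map h(x) = (5x + 3) mod 16 (illustrative parameters)."""
    return (5 * x + 3) % 16


def deterministic_matrix(step, size):
    """Transition matrix of Example 1: p[i][step(i)] = 1 and all other entries 0."""
    return [[1 if j == step(i) else 0 for j in range(size)] for i in range(size)]


def orbit(step, x0, n):
    """X_0 = x0 and X_k = h^(k)(x0) for k = 1, ..., n: the whole future is determined."""
    path = [x0]
    for _ in range(n):
        path.append(step(path[-1]))
    return path


p = deterministic_matrix(h, 16)
print(orbit(h, 0, 15))  # visits all 16 states before returning to 0
```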
Example 2. (Completely Random Motion or Independence). Let all the rows of $p$ be identical, i.e., suppose $p_{ij}$ is the same for all $i$. Write $p_j$ for the common row,
$$p_{ij} = p_j \qquad (i \in S,\ j \in S). \tag{3.2}$$
In this case, $X_0, X_1, X_2, \ldots$ forms a sequence of independent random variables. The distribution of $X_0$ is $\pi$, while $X_1, \ldots, X_n, \ldots$ have a common distribution given by the probability vector $(p_j)_{j \in S}$. If we let $\pi = (p_j)_{j \in S}$, then $X_0, X_1, \ldots$ form a sequence of independent and identically distributed (i.i.d.) random variables. The coin-tossing example is of this kind. There, $S = \{0, 1\}$ (or $\{H, T\}$) and $p_0 = \frac{1}{2}$, $p_1 = \frac{1}{2}$.
Example 3. (Unrestricted Simple Random Walk). Here, $S = \{0, \pm 1, \pm 2, \ldots\}$ and $p = ((p_{ij}))$ is given by
$$p_{ij} = \begin{cases} p & \text{if } j = i + 1,\\ q & \text{if } j = i - 1,\\ 0 & \text{if } |j - i| \ne 1, \end{cases} \tag{3.3}$$
where $0 < p < 1$ and $q = 1 - p$.

Example 4. (Simple Random Walk with Two Reflecting Boundaries). Here $S = \{c, c + 1, \ldots, d\}$, where $c$ and $d$ are integers, $c < d$. Let $p_{ij} = p$ if $j = i + 1$ and $c$ …
$$X_n = X_0 + Z_1 + \cdots + Z_n, \tag{3.7}$$
is a Markov chain with the transition probability (3.6). Also note that Example 3 is a special case of Example 6, with $Q(-1) = q$, $Q(1) = p$, and $Q(i) = 0$ for $i \ne \pm 1$.
Example 7. (Bienaymé–Galton–Watson Simple Branching Processes). Particles such as neutrons or organisms such as bacteria can generate new particles or organisms of the same type. The number of particles generated by a single particle is a random variable with a probability function $f$; that is, $f(j)$ is the probability that a single particle generates $j$ particles, $j = 0, 1, 2, \ldots$. Suppose that at time $n = 0$ there are $X_0 = i$ particles present. Let $Z_1, Z_2, \ldots, Z_i$ denote the numbers of particles generated by the first, second, $\ldots$, $i$th particle, respectively, in the initial set. Then each of $Z_1, \ldots, Z_i$ has the probability function $f$, and it is assumed that $Z_1, \ldots, Z_i$ are independent random variables. The size of the first generation is $X_1 = Z_1 + \cdots + Z_i$, the total number generated by the initial set. The $X_1$ particles in the first generation will in turn generate a total of $X_2$ particles comprising the second generation in the same manner; that is, the $X_1$ particles generate new particles independently of each other, and the number generated by each has probability function $f$. This goes on so long as offspring occur. Let $X_n$ denote the size of the $n$th generation. Then, using the convolution notation for distributions of sums of independent random variables,
$$p_{ij} = P(Z_1 + \cdots + Z_i = j) = f^{*i}(j), \quad i \ge 1,\ j \ge 0; \qquad p_{00} = 1, \quad p_{0j} = 0 \ \text{ if } j \ne 0. \tag{3.8}$$
The last row says that "zero" is an absorbing state, i.e., if at any point of time $X_n = 0$, then $X_m = 0$ for all $m > n$, and extinction occurs.
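For an offspring probability function $f$ with finite support, the rows of (3.8) can be computed by repeated convolution. The sketch below uses hypothetical function names and represents $f$ as a list with `f[j]` $= f(j)$. With $f = (1/4, 1/2, 1/4)$, the Binomial$(2, 1/2)$ distribution, $f^{*2}$ is the Binomial$(4, 1/2)$ distribution, which gives an easy check.

```python
from fractions import Fraction


def convolve(a, b):
    """Distribution of the sum of independent variables with probability lists a and b."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def branching_row(f, i):
    """Row i of (3.8): [p_i0, p_i1, ...] with p_ij = f^{*i}(j); row 0 is the point mass at 0."""
    if i == 0:
        return [Fraction(1)]
    row = list(f)
    for _ in range(i - 1):
        row = convolve(row, f)
    return row


f = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
print(branching_row(f, 2))  # Binomial(4, 1/2): 1/16, 4/16, 6/16, 4/16, 1/16
```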
Example 8. (Pólya Urn Scheme and a Non-Markovian Example). A box contains $r$ red balls and $b$ black balls. A ball is randomly selected from the box and its color is noted. The ball selected, together with $c \ge 0$ balls of the same color, is then placed back into the box. This process is repeated in successive trials numbered $n = 1, 2, \ldots$. Indicate the event that a red ball occurs at the $n$th trial by $X_n = 1$ and that a black ball occurs at the $n$th trial by $X_n = 0$. A straightforward induction calculation gives, for $\varepsilon_k \in \{0, 1\}$ with $0 \le \sum_{k=1}^n \varepsilon_k \le n$,
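$$P(X_1 = \varepsilon_1, \ldots, X_n = \varepsilon_n) = \frac{[r + (s_n - 1)c][r + (s_n - 2)c] \cdots r\,[b + (\tau_n - 1)c] \cdots b}{[r + b + (n-1)c][r + b + (n-2)c] \cdots [r + b]}, \tag{3.9}$$
where
$$s_n = \sum_{k=1}^n \varepsilon_k, \qquad \tau_n = n - s_n, \tag{3.10}$$
and an empty product is taken to be 1.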
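In the case $s_n = n$ (i.e., $\varepsilon_1 = \cdots = \varepsilon_n = 1$),
$$P(X_1 = 1, \ldots, X_n = 1) = \frac{[r + (n-1)c] \cdots r}{[r + b + (n-1)c] \cdots [r + b]}, \tag{3.11}$$
and if $s_n = 0$ (i.e., $\varepsilon_1 = \cdots = \varepsilon_n = 0$) then
$$P(X_1 = 0, \ldots, X_n = 0) = \frac{[b + (n-1)c] \cdots b}{[r + b + (n-1)c] \cdots [r + b]}. \tag{3.12}$$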
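In particular,
$$\begin{aligned}
P(X_n = \varepsilon_n \mid X_1 = \varepsilon_1, \ldots, X_{n-1} = \varepsilon_{n-1}) &= \frac{P(X_1 = \varepsilon_1, \ldots, X_n = \varepsilon_n)}{P(X_1 = \varepsilon_1, \ldots, X_{n-1} = \varepsilon_{n-1})}\\
&= \frac{[r + (s_n - 1)c] \cdots r\,[b + (\tau_n - 1)c] \cdots b}{[r + b + (n-1)c] \cdots [r + b]} \cdot \frac{[r + b + (n-2)c] \cdots [r + b]}{[r + (s_{n-1} - 1)c] \cdots r\,[b + (\tau_{n-1} - 1)c] \cdots b}\\
&= \begin{cases} \dfrac{r + s_{n-1}c}{r + b + (n-1)c} & \text{if } \varepsilon_n = 1,\\[2ex] \dfrac{b + \tau_{n-1}c}{r + b + (n-1)c} & \text{if } \varepsilon_n = 0. \end{cases}
\end{aligned} \tag{3.13}$$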
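Formula (3.9), the conditional probabilities (3.13) and the permutation invariance of (3.9) can be checked with exact arithmetic. In the sketch below the function names are hypothetical. `polya_joint` evaluates (3.9), writing $[r + (s_n - 1)c] \cdots r$ as the product of $r + kc$ over $0 \le k < s_n$. The values $r = 2$, $b = 3$, $c = 1$ are arbitrary.

```python
from fractions import Fraction
from itertools import product


def rising(start, step, count):
    """start (start + step) ... (start + (count - 1) step); an empty product is 1."""
    out = 1
    for k in range(count):
        out *= start + k * step
    return out


def polya_joint(eps, r, b, c):
    """P(X_1 = eps_1, ..., X_n = eps_n) from (3.9) for a 0/1 sequence eps."""
    n, s = len(eps), sum(eps)
    return Fraction(rising(r, c, s) * rising(b, c, n - s), rising(r + b, c, n))


def total_probability(n, r, b, c):
    """Sum of (3.9) over all 2^n color sequences; should equal 1."""
    return sum(polya_joint(eps, r, b, c) for eps in product((0, 1), repeat=n))


r, b, c = 2, 3, 1
print(polya_joint((1, 0, 0, 1), r, b, c) == polya_joint((0, 0, 1, 1), r, b, c))  # exchangeable
print(total_probability(5, r, b, c))
# Conditional probability (3.13) of red at trial 4 given (1, 0, 1): (r + 2c) / (r + b + 3c).
print(polya_joint((1, 0, 1, 1), r, b, c) / polya_joint((1, 0, 1), r, b, c))
```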
It follows that $\{X_n\}$ is non-Markov unless $c = 0$ (in which case $\{X_n\}$ is i.i.d.). Note, however, that $\{X_n\}$ does have a distinctive symmetry property reflected in (3.9). Namely, the joint distribution is a function of $s_n = \sum_{k=1}^n \varepsilon_k$ only, and is therefore invariant under permutations of $\varepsilon_1, \ldots, \varepsilon_n$. Such a stochastic process is called exchangeable (or symmetrically dependent). The Pólya urn model was originally introduced to illustrate a notion of "contagious disease" or "accident proneness" for actuarial mathematics. Although $\{X_n\}$ is non-Markov for $c \ne 0$, it is interesting to note that the partial-sum process $\{S_n\}$, representing the evolution of accumulated numbers of red balls sampled, does have the Markov property. From (3.13) one can also get that
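$$P(S_n = s \mid S_1 = s_1, \ldots, S_{n-1} = s_{n-1}) = \begin{cases} \dfrac{r + c s_{n-1}}{r + b + (n-1)c} & \text{if } s = 1 + s_{n-1},\\[2ex] \dfrac{b + (n - 1 - s_{n-1})c}{r + b + (n-1)c} & \text{if } s = s_{n-1}. \end{cases} \tag{3.14}$$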
Observe that the transition law (3.14) depends explicitly on the time point $n$. In other words, the partial-sum process $\{S_n\}$ is a Markov process with a nonhomogeneous transition law. A related continuous-time version of this Markov process, again usually called the Pólya process, is described in Exercise 1 of Chapter IV, Section 4.1. An alternative model for contagion is also given in Example 1 of Chapter IV, Section 4, and that one has a homogeneous transition law.
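The time dependence in (3.14) shows up clearly in numbers. For the same current count $s_{n-1}$, the probability of a red draw changes with $n$ unless $c = 0$. The following is a minimal sketch with hypothetical function names.

```python
from fractions import Fraction


def red_step_prob(n, s_prev, r, b, c):
    """P(S_n = s_prev + 1 | S_{n-1} = s_prev) from (3.14)."""
    return Fraction(r + c * s_prev, r + b + (n - 1) * c)


def black_step_prob(n, s_prev, r, b, c):
    """P(S_n = s_prev | S_{n-1} = s_prev) from (3.14)."""
    return Fraction(b + (n - 1 - s_prev) * c, r + b + (n - 1) * c)


# Same current state s_{n-1} = 1 at two different times: the law depends on n.
print(red_step_prob(2, 1, 1, 1, 1), red_step_prob(4, 1, 1, 1, 1))
# With c = 0 the transition law no longer depends on n.
print(red_step_prob(2, 1, 1, 1, 0), red_step_prob(4, 1, 1, 1, 0))
```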
4 STOPPING TIMES AND THE STRONG MARKOV PROPERTY

One of the most useful general properties of a Markov chain is that the Markov property holds even when the "past" is given up to certain types of random times. Indeed, we have tacitly used it in proving that the simple symmetric random walk reaches every state infinitely often with probability 1 (see Eq. 3.18 of Chapter I). These special random times are called stopping times or (less appropriately) Markov times.

Definition 4.1. Let $\{Y_n: n = 0, 1, 2, \ldots\}$ be a stochastic process having a countable state space and defined on some probability space $(\Omega, \mathcal{F}, P)$. A random variable $\tau$ defined on this space is said to be a stopping time if
(i) it assumes only nonnegative integer values (including, possibly, $+\infty$), and
(ii) for every nonnegative integer $m$ the event $\{\omega: \tau(\omega) \le m\}$ is determined by $Y_0, Y_1, \ldots, Y_m$.

Intuitively, if $\tau$ is a stopping time, then whether or not to stop by time $m$ can be decided by observing the stochastic process up to time $m$. For an example, consider the first time $\tau_y$ the process $\{Y_n\}$ reaches the state $y$, defined by
$$\tau_y(\omega) = \inf\{n \ge 0: Y_n(\omega) = y\}. \tag{4.1}$$
If $\omega$ is such that $Y_n(\omega) \ne y$ whatever be $n$ (i.e., if the process never reaches $y$), then take $\tau_y(\omega) = \infty$. Observe that
$$\{\omega: \tau_y(\omega) \le m\} = \bigcup_{n=0}^{m} \{\omega: Y_n(\omega) = y\}. \tag{4.2}$$
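Hence $\tau_y$ is a stopping time. The $r$th return times $\tau_y^{(r)}$ of $y$ are defined recursively by
$$\tau_y^{(1)}(\omega) = \inf\{n \ge 1: Y_n(\omega) = y\}, \qquad \tau_y^{(r)}(\omega) = \inf\{n > \tau_y^{(r-1)}(\omega): Y_n(\omega) = y\} \quad \text{for } r = 2, 3, \ldots. \tag{4.3}$$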
Once again, the infimum over an empty set is to be taken as $\infty$. Now whether or not the process has reached (or hit) the state $y$ at least $r$ times by the time $m$ depends entirely on the values of $Y_1, \ldots, Y_m$. Indeed, $\{\tau_y^{(r)} \le m\}$ is precisely the event that at least $r$ of the variables $Y_1, \ldots, Y_m$ equal $y$. Hence $\tau_y^{(r)}$ is a stopping time. On the other hand, if $\eta_y$ denotes the last time the process reaches the state $y$, then $\eta_y$ is not a stopping time; for whether or not $\eta_y \le m$ cannot in general be determined without observing the entire process $\{Y_n\}$. Let $S$ be a countable state space and $p$ a transition probability matrix on $S$, and let $P_\pi$ denote the distribution of the Markov process with transition probability $p$ and initial distribution $\pi$. It will be useful to identify the events that depend on the process up to time $n$. For this, let $\Omega$ denote the set of all sequences $\omega = (i_0, i_1, i_2, \ldots)$ of states, and let $Y_n(\omega)$ be the $n$th coordinate of $\omega$ (if $\omega = (i_0, i_1, \ldots, i_n, \ldots)$, then $Y_n(\omega) = i_n$). Let $\mathcal{F}_n$ denote the class of all events that depend only on $Y_0, Y_1, \ldots, Y_n$. Then the $\mathcal{F}_n$ form an increasing sequence of sigmafields of finite-dimensional events. The Markov property says that given the "past" $Y_0, Y_1, \ldots, Y_m$ up to time $m$, or given $\mathcal{F}_m$, the conditional distribution of the "after-$m$" stochastic process $Y_m^+ = \{(Y_m^+)_n\} := \{Y_{m+n}: n = 0, 1, \ldots\}$ is $P_{Y_m}$. In other words, if the process is re-indexed after time $m$ with $m + n$ being regarded as time $n$, then this stochastic process is conditionally distributed as a Markov chain having transition probability $p$ and initial state $Y_m$.

A WORD ON NOTATION. Many of the conditional distributions do not depend on the initial distribution. So the subscripts on $P_\pi$, $P_i$, etc., are suppressed as a matter of convenience in some calculations.

Suppose now that $\tau$ is a stopping time. "Given the past up to time $\tau$" means given the values of $\tau$ and $Y_0, Y_1, \ldots, Y_\tau$.
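The distinction between $\tau_y^{(r)}$ and the last visit time $\eta_y$ can be made concrete by asking what can be computed from an observed prefix $(Y_0, \ldots, Y_m)$. In this sketch (function names hypothetical), the first two functions decide $\{\tau_y \le m\}$ and $\{\tau_y^{(r)} \le m\}$ from the prefix alone, as in (4.2). The third computes the last visit to $y$ within whatever has been observed so far. Its answer can change when more of the path is revealed, which is why $\eta_y$ is not a stopping time.

```python
from typing import Optional, Sequence


def hit_by(prefix: Sequence[int], y: int) -> bool:
    """Decide {tau_y <= m} from (Y_0, ..., Y_m), as in (4.2)."""
    return any(state == y for state in prefix)


def rth_return_by(prefix: Sequence[int], y: int, r: int) -> bool:
    """Decide {tau_y^(r) <= m}: at least r of Y_1, ..., Y_m equal y."""
    return sum(1 for state in prefix[1:] if state == y) >= r


def last_visit_so_far(prefix: Sequence[int], y: int) -> Optional[int]:
    """Last time y appears in the observed prefix (None if it never does)."""
    times = [n for n, state in enumerate(prefix) if state == y]
    return times[-1] if times else None


path = [0, 1, 0, 1, 2]
print(hit_by(path[:2], 1), rth_return_by(path[:4], 1, 2))
# Observing up to time 1 suggests the last visit to 1 is at time 1, but time 3 reveals another visit.
print(last_visit_so_far(path[:2], 1), last_visit_so_far(path[:4], 1))
```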
By the "after-$\tau$" process we now mean the stochastic process
$$Y_\tau^+ = \{Y_{\tau+n}: n = 0, 1, 2, \ldots\},$$
which is well defined only on the set $\{\tau < \infty\}$.

Theorem 4.1. Every Markov chain $\{Y_n: n = 0, 1, 2, \ldots\}$ has the strong Markov property; that is, for every stopping time $\tau$, the conditional distribution of the after-$\tau$ process $Y_\tau^+ = \{Y_{\tau+n}: n = 0, 1, 2, \ldots\}$, given the past up to time $\tau$, is $P_{Y_\tau}$ on the set $\{\tau < \infty\}$.
Proof. Choose and fix a nonnegative integer $m$ and a positive integer $k$ along with $k$ time points $0 \le m_1 < m_2 < \cdots < m_k$, and states $i_0, i_1, \ldots, i_m, j_1, j_2, \ldots, j_k$. Then,
$$\begin{aligned}
&P(Y_{\tau+m_1} = j_1, Y_{\tau+m_2} = j_2, \ldots, Y_{\tau+m_k} = j_k \mid \tau = m, Y_0 = i_0, \ldots, Y_m = i_m)\\
&\quad= P(Y_{m+m_1} = j_1, Y_{m+m_2} = j_2, \ldots, Y_{m+m_k} = j_k \mid \tau = m, Y_0 = i_0, \ldots, Y_m = i_m).
\end{aligned} \tag{4.4}$$
Now if the event $\{\tau = m\}$ (which is determined by the values of $Y_0, \ldots, Y_m$) is not consistent with the event "$Y_0 = i_0, \ldots, Y_m = i_m$", then $\{\tau = m, Y_0 = i_0, \ldots, Y_m = i_m\} = \emptyset$ and the conditioning event is impossible. In that case the conditional probability may be defined arbitrarily (or left undefined). However, if $\{\tau = m\}$ is consistent with (i.e., implied by) "$Y_0 = i_0, \ldots, Y_m = i_m$", then $\{\tau = m, Y_0 = i_0, \ldots, Y_m = i_m\} = \{Y_0 = i_0, \ldots, Y_m = i_m\}$, and the right side of (4.4) becomes
$$P(Y_{m+m_1} = j_1, Y_{m+m_2} = j_2, \ldots, Y_{m+m_k} = j_k \mid Y_0 = i_0, \ldots, Y_m = i_m). \tag{4.5}$$
But by the Markov property, (4.5) equals
$$P_{i_m}(Y_{m_1} = j_1, Y_{m_2} = j_2, \ldots, Y_{m_k} = j_k) = P_{Y_\tau}(Y_{m_1} = j_1, Y_{m_2} = j_2, \ldots, Y_{m_k} = j_k) \quad \text{on the set } \{\tau = m\}. \tag{4.6}$$
❑
We have considered just "future" events depending on only finitely many (namely $k$) time points. The general case (applying to infinitely many time points) may be obtained by passage to a limit. Note that the equality of (4.4) and (4.5) holds (in case $\{\tau = m\} \supset \{Y_0 = i_0, \ldots, Y_m = i_m\}$) for all stochastic processes and all events whose (conditional) probabilities may be sought. The equality between (4.5) and (4.6) is a consequence of the Markov property. Since the latter property holds for all future events, so does the corresponding result for stopping times $\tau$. Events determined by the past up to time $\tau$ comprise a sigmafield $\mathcal{F}_\tau$, called the pre-$\tau$ sigmafield. The strong Markov property is often expressed as follows: the conditional distribution of $Y_\tau^+$ given $\mathcal{F}_\tau$ is $P_{Y_\tau}$ (on $\{\tau < \infty\}$). Note that $\mathcal{F}_\tau$ is the smallest sigmafield containing all events of the form $\{\tau = m, Y_0 = i_0, \ldots, Y_m = i_m\}$.
Example 1. In this example let us reconsider the validity of relation (3.18) of Chapter I in light of the strong Markov property. Let $x$ and $y$ be two integers. For the simple symmetric random walk $\{S_n: n = 0, 1, 2, \ldots\}$ starting at $x$, $\tau_y$ is an almost surely finite stopping time, for the probability $\rho_{xy}$ that the random walk ever reaches $y$ is 1 (see Chapter I, Eqs. 3.16–3.17). Denoting by $E_x$ the expectation with respect to $P_x$, we have
120
DISCRETE-PARAMETER MARKOV CHAINS
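$$\begin{aligned}
P_x(\tau_y^{(2)} < \infty) &= P_x(S_{\tau_y^{(1)}+n} = y \text{ for some } n \ge 1)\\
&= E_x\big[P_x(S_{\tau_y^{(1)}+n} = y \text{ for some } n \ge 1 \mid S_0, \ldots, S_{\tau_y^{(1)}})\big]\\
&= E_x\big[P_{S_{\tau_y^{(1)}}}(S_n = y \text{ for some } n \ge 1)\big] && \text{(strong Markov property)}\\
&= P_y(S_n = y \text{ for some } n \ge 1) && (\text{since } S_{\tau_y^{(1)}} = y)\\
&= P_y(S_1 = y-1, S_n = y \text{ for some } n \ge 1) + P_y(S_1 = y+1, S_n = y \text{ for some } n \ge 1)\\
&= P_y(S_1 = y-1)\, P_y(S_n = y \text{ for some } n \ge 1 \mid S_1 = y-1)\\
&\quad + P_y(S_1 = y+1)\, P_y(S_n = y \text{ for some } n \ge 1 \mid S_1 = y+1)\\
&= \tfrac12 P_y(S_{1+m} = y \text{ for some } m \ge 0 \mid S_1 = y-1) + \tfrac12 P_y(S_{1+m} = y \text{ for some } m \ge 0 \mid S_1 = y+1) && (m = n-1)\\
&= \tfrac12 P_{y-1}(S_m = y \text{ for some } m \ge 0) + \tfrac12 P_{y+1}(S_m = y \text{ for some } m \ge 0) && \text{(Markov property)}\\
&= \tfrac12 \rho_{y-1,y} + \tfrac12 \rho_{y+1,y} = \tfrac12 + \tfrac12 = 1.
\end{aligned} \tag{4.7}$$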
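The last line of (4.7) uses $\rho_{y-1,y} = \rho_{y+1,y} = 1$. As a purely numerical illustration (a finite-horizon approximation, not part of the argument), the sketch below computes $P_{y+1}(S_n = y$ for some $1 \le n \le T)$ for the simple symmetric walk by propagating probability mass and absorbing it at the target; by translation invariance it takes $y = 0$. The function name `hit_prob_within` is hypothetical. The values creep toward 1, but only slowly (the deficit shrinks roughly like a constant times $T^{-1/2}$), which is consistent with recurrence without proving it.

```python
from collections import defaultdict


def hit_prob_within(horizon: int, start: int = 1, target: int = 0) -> float:
    """P_start(S_n = target for some 1 <= n <= horizon) for the simple symmetric walk.

    mass[x] is the probability of being at x at the current time without
    having visited `target` at any of the times 1, 2, ... so far.
    """
    mass = {start: 1.0}
    hit = 0.0
    for _ in range(horizon):
        new_mass = defaultdict(float)
        for x, w in mass.items():
            for step in (-1, 1):            # each step has probability 1/2
                y = x + step
                if y == target:
                    hit += 0.5 * w          # first visit to target: absorb the mass
                else:
                    new_mass[y] += 0.5 * w
        mass = new_mass
    return hit


for horizon in (10, 100, 1000):
    print(horizon, round(hit_prob_within(horizon), 4))
```

For a check that can be done by hand: from 1, the walk avoids 0 during the first 9 steps with probability $\binom{10}{5}2^{-10} = 252/1024$, so `hit_prob_within(9)` equals $1 - 252/1024$.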
Now all the steps in (4.7) remain valid if one replaces $\tau_y^{(1)}$ by $\tau_y^{(r-1)}$ and $\tau_y^{(2)}$ by $\tau_y^{(r)}$ and assumes that $\tau_y^{(r-1)} < \infty$ almost surely. Hence, by induction, $P_x(\tau_y^{(r)} < \infty) = 1$ for all positive integers $r$. This is equivalent to asserting $1 = P_x(\tau_y^{(r)}$ […] $0$ for some $n \ge 1$.

Since

$$p_{ij}^{(n)} = \sum_{i_1, i_2, \ldots, i_{n-1} \in S} p_{i i_1} p_{i_1 i_2} \cdots p_{i_{n-1} j}, \tag{5.1}$$

$i \to j$ if and only if there exists one chain $(i, i_1, i_2, \ldots, i_{n-1}, j)$ such that $p_{i i_1}, p_{i_1 i_2}, \ldots, p_{i_{n-1} j}$ are strictly positive.
Definition 5.2. Write $i \leftrightarrow j$ and read "$i$ and $j$ communicate" if $i \to j$ and $j \to i$. Say "$i$ is essential" if $i \to j$ implies $j \to i$ (i.e., if any state $j$ is accessible from $i$, then $i$ is accessible from that state). We shall let $\mathcal{E}$ denote the set of all essential states. States that are not essential are called inessential.
Proposition 5.1
(a) For every $i$ there exists (at least one) $j$ such that $i \to j$. (b) $i \to j$, $j \to k$ imply $i \to k$. (c) "$i$ essential" implies $i$ […] $\ge 1$ such that $p_{jk}^{(n)} > 0$. Then, by (b), $i \to k$. Since $i$ is essential, one must have $k \to i$. Together with $i \to j$ this implies (again by (b)) $k \to j$. Thus, if any state $k$ is accessible from $j$, then $j$ is accessible from that state $k$, proving that $j$ is essential. […] (e) If $\mathcal{E}$ is empty (which is possible, as for example in the case $p_{i,i+1} = 1$, $i = 0, 1, 2, \ldots$), then there is nothing to prove. Suppose $\mathcal{E}$ is nonempty. Then: (i) On $\mathcal{E}$ the relation "$\leftrightarrow$" is reflexive by (c). (ii) If $i$ is essential and $i \leftrightarrow j$, then (by (d)) $j$ is essential and, of course, $i \leftrightarrow j$ and $j \leftrightarrow i$ are equivalent properties. Thus "$\leftrightarrow$" is symmetric (on $\mathcal{E}$ as well as on $S$). (iii) If $i \leftrightarrow j$ and $j \leftrightarrow k$, then $i \to j$ and $j \to k$. Hence $i \to k$ (by (b)). Also, $k \to j$ and $j \to i$ imply $k \to i$ (again by (b)). Hence $i \leftrightarrow k$. This shows that "$\leftrightarrow$" is transitive (on $\mathcal{E}$ as well as on $S$). ■
From the proof of (e) the relation "$\leftrightarrow$" is seen to be symmetric and transitive on all of $S$ (and not merely $\mathcal{E}$). However, it is not generally true that $i \leftrightarrow i$ (or, $i \to i$) for all $i \in S$. In other words, reflexivity may break down on $S$.
Example 1. (Simple (Unrestricted) Random Walk). $S = \{0, \pm 1, \pm 2, \ldots\}$. Assume, as usual, $0$ […]
[…] $j$. Hence $j \in \mathcal{E}(k)$. The classes into which $\mathcal{E}$ decomposes are called equivalence classes. In the case of the unrestricted simple random walk $\{S_n\}$, we have $\mathcal{E} = S = \{0, \pm 1, \pm 2, \ldots\}$ and all states in $\mathcal{E}$ communicate with each other; there is only one equivalence class. For $\{X_n\} = \{S_{2n}\}$, on the other hand, $\mathcal{E} = S$ consists of two disjoint equivalence classes, the odd integers and the even integers.
Our last item of bookkeeping concerns the role of possible cyclic motions within an essential class. In the unrestricted simple random walk example, note that $p_{ii} = 0$ for all $i = 0, \pm 1, \pm 2, \ldots$, but $p_{ii}^{(2)} = 2pq > 0$. In fact $p_{ii}^{(n)} = 0$ for all odd $n$, and $p_{ii}^{(n)} > 0$ for all even $n$. In this case, we say that the period of $i$ is 2. More generally, if $i \to i$, then the period of $i$ is the greatest common divisor of the integers in the set $A = \{n \ge 1: p_{ii}^{(n)} > 0\}$. If $d = d_i$ is the period of $i$, then $p_{ii}^{(n)} = 0$ whenever $n$ is not a multiple of $d$, and $d$ is the largest integer with this property.
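Definition 5.2 and Proposition 5.1 depend only on which entries of $p$ are positive, so for a finite chain they can be checked mechanically. A minimal sketch, with states labelled $0, \ldots, N-1$ and hypothetical function names: by (5.1), $i \to j$ is reachability along chains of strictly positive entries, found here by depth-first search; a state is marked essential when every state accessible from it leads back, and essential states are then grouped into the equivalence classes of $\leftrightarrow$.

```python
def accessible_sets(p):
    """reach[i] = {j : i -> j}, i.e. p_ij^(n) > 0 for some n >= 1 (via (5.1))."""
    n = len(p)
    reach = []
    for i in range(n):
        seen = set()
        stack = [j for j in range(n) if p[i][j] > 0]      # chains of length 1
        while stack:
            k = stack.pop()
            if k not in seen:
                seen.add(k)
                stack.extend(nxt for nxt in range(n) if p[k][nxt] > 0)
        reach.append(seen)
    return reach


def essential_classes(p):
    """Return (essential states, equivalence classes of <-> among them)."""
    reach = accessible_sets(p)
    n = len(p)
    # Definition 5.2: i is essential if i -> j implies j -> i
    essential = [i for i in range(n) if all(i in reach[j] for j in reach[i])]
    classes, assigned = [], set()
    for i in essential:
        if i in assigned:
            continue
        # for essential i we have i <-> i (Proposition 5.1(c)), so i is in its own class
        cls = sorted(j for j in essential if j in reach[i] and i in reach[j])
        classes.append(cls)
        assigned.update(cls)
    return essential, classes


# State 0 leaks into {1, 2}; {1, 2} is a closed 2-cycle; state 3 is absorbing.
p = [[0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
print(essential_classes(p))   # ([1, 2, 3], [[1, 2], [3]])
```

State 0 is inessential because $0 \to 1$ while $1 \not\to 0$; the class $\{1, 2\}$ has period 2, like the random walk example above.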
Proposition 5.2 (a) If $i \leftrightarrow j$ then $i$ and $j$ possess the same period. In particular "period" is constant on each equivalence class. (b) Let $i \in \mathcal{E}$ have a period $d = d_i$. For each $j \in \mathcal{E}(i)$ there exists a unique integer $r_j$, $0 \le r_j \le d - 1$, such that $p_{ij}^{(n)} > 0$ implies $n \equiv r_j \pmod d$ (i.e., either $n = r_j$ or $n = sd + r_j$ for some integer $s \ge 1$).
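Proof. (a) Clearly,

$$p_{ii}^{(a+m+b)} \ge p_{ij}^{(a)} p_{jj}^{(m)} p_{ji}^{(b)} \tag{5.5}$$

for all positive integers $a, m, b$. Choose $a$ and $b$ such that $p_{ij}^{(a)} > 0$ and $p_{ji}^{(b)} > 0$. If $p_{jj}^{(m)} > 0$, then $p_{jj}^{(2m)} \ge p_{jj}^{(m)} p_{jj}^{(m)} > 0$, and

$$p_{ii}^{(a+2m+b)} \ge p_{ij}^{(a)} p_{jj}^{(2m)} p_{ji}^{(b)} > 0, \qquad p_{ii}^{(a+m+b)} \ge p_{ij}^{(a)} p_{jj}^{(m)} p_{ji}^{(b)} > 0. \tag{5.6}$$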
Therefore, $d$ (the period of $i$) divides $a + m + b$ and $a + 2m + b$, so that it divides the difference $m = (a + 2m + b) - (a + m + b)$. Since this holds for every $m$ with $p_{jj}^{(m)} > 0$, $d_i$ divides $d_j$; hence the period of $i$ does not exceed the period of $j$. By the same argument (since $i \leftrightarrow j$ is the same as $j \leftrightarrow i$), the period of $j$ does not exceed the period of $i$. Hence the period of $i$ equals the period of $j$. (b) Choose $a$ such that $p_{ji}^{(a)} > 0$. If $p_{ij}^{(m)} > 0$, $p_{ij}^{(n)} > 0$, then $p_{ii}^{(m+a)} \ge p_{ij}^{(m)} p_{ji}^{(a)} > 0$ and $p_{ii}^{(n+a)} \ge p_{ij}^{(n)} p_{ji}^{(a)} > 0$. Hence $d$, the period of $i$, divides $m + a$, $n + a$ and, therefore, $m - n = m + a - (n + a)$. Since this is true for all $m, n$ such that $p_{ij}^{(m)} > 0$, $p_{ij}^{(n)} > 0$, it means that the difference between any two integers in the set $A = \{n: p_{ij}^{(n)} > 0\}$ is divisible by $d$. This implies that there exists a unique integer $r_j$, $0 \le r_j \le d - 1$, such that $n \equiv r_j \pmod d$ for all $n \in A$ (i.e., $n = sd + r_j$ for some integer $s \ge 0$ where $s$ depends on $n$). ■

It is generally not true that the period of an essential state $i$ is $\min\{n \ge 1: p_{ii}^{(n)} > 0\}$. To see this consider the chain with state space $\{1, 2, 3, 4\}$ and transition matrix
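$$p = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & \tfrac12 & 0 & \tfrac12 \\ 1 & 0 & 0 & 0 \end{pmatrix}.$$

Schematically, only the following one-step transitions are possible:

$$1 \to 2 \to 3 \to 4 \to 1, \qquad 3 \to 2.$$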
Thus $p_{11}^{(2)} = 0$, $p_{11}^{(4)} > 0$, $p_{11}^{(6)} > 0$, etc., and $p_{11}^{(n)} = 0$ for all odd $n$. The states communicate with each other and their common period is 2, although $\min\{n: p_{11}^{(n)} > 0\} = 4$. Note that $\min\{n \ge 1: p_{ii}^{(n)} > 0\}$ is a multiple of $d_i$, since $d_i$ divides all $n$ for which $p_{ii}^{(n)} > 0$. Thus, $d_i \le \min\{n \ge 1: p_{ii}^{(n)} > 0\}$.
Proposition 5.3. Let $i \in \mathcal{E}$ have period $d > 1$. Let $C_r$ be the set of $j \in \mathcal{E}(i)$ such that $r_j = r$, where $r_j$ is the remainder term as defined in Proposition 5.2(b). Then (a) $C_0, C_1, \ldots, C_{d-1}$ are disjoint, $\bigcup_{r=0}^{d-1} C_r = \mathcal{E}(i)$. (b) If $j \in C_r$, then $p_{jk} > 0$ implies $k \in C_{r+1}$, where we take $r + 1 = 0$ if $r = d - 1$.
Proof. (a) Follows from Proposition 5.2(b). (b) Suppose $j \in C_r$ and $p_{ij}^{(n)} > 0$. Then $n = sd + r$ for some $s \ge 0$. Hence, if $p_{jk} > 0$ then
A CLASSIFICATION OF STATES OF A MARKOV CHAIN
125
$$p_{ik}^{(n+1)} \ge p_{ij}^{(n)} p_{jk} > 0, \tag{5.7}$$

which implies $k \in C_{r+1}$ (since $n + 1 = sd + r + 1 \equiv r + 1 \pmod d$), by Proposition 5.2(b). ■

Here is what Proposition 5.3 means. Suppose $i$ is an essential state and has a period $d > 1$. In one step (i.e., one time unit) the process can go from $i \in C_0$ only to some state in $C_1$ (i.e., $p_{ij} > 0$ only if $j \in C_1$). From states in $C_1$, in one step the process can go only to states in $C_2$. This means that in two steps the process can go from $i$ only to states in $C_2$ (i.e., $p_{ij}^{(2)} > 0$ only if $j \in C_2$), and so on. In $d$ steps the process can go from $i$ only to states in $C_d = C_0$, completing one cycle (of $d$ steps). Again in $d + 1$ steps the process can go from $i$ only to states in $C_1$, and so on. In general, in $sd + r$ steps the process can go from $i$ only to states in $C_r$. Schematically, one has the picture in Figure 5.1 for the case $d = 4$ and a fixed state $i \in C_0$ of period 4.
Example S.S. In the case of the unrestricted simple random walk, the period is 2 and all states are essential and communicate with each other. Fix $i = 0$. Then $C_0 = \{0, \pm 2, \pm 4, \ldots\}$, $C_1 = \{\pm 1, \pm 3, \pm 5, \ldots\}$. If we take $i$ to be any even integer, then $C_0$, $C_1$ are as above. If, however, we start with $i$ odd, then $C_0 = \{\pm 1, \pm 3, \pm 5, \ldots\}$, $C_1 = \{0, \pm 2, \pm 4, \ldots\}$.
$$C_0 \ni i \to j \in C_1 \to k \in C_2 \to l \in C_3 \to m \in C_0 \to \cdots$$

Figure 5.1
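For a finite chain the period and the sets $C_r$ of Proposition 5.3 can be read off from the positive entries of the powers $p^n$. The sketch below uses the $4 \times 4$ matrix of the example above, relabelling states 1, 2, 3, 4 as 0, 1, 2, 3, with exact fractions. It takes the gcd of the return times up to a cutoff `max_n`; that the cutoff (40) is ample for a chain this small is an assumption of the sketch, not something derived in the text. All names are hypothetical.

```python
from fractions import Fraction
from math import gcd


def mat_mult(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def powers(p, max_n):
    """Yield (n, p^n) for n = 1, ..., max_n."""
    power = p
    for n in range(1, max_n + 1):
        yield n, power
        power = mat_mult(power, p)


def period(p, i, max_n=40):
    """gcd of {1 <= n <= max_n : p_ii^(n) > 0}; 0 if no return is seen."""
    d = 0
    for n, pn in powers(p, max_n):
        if pn[i][i] > 0:
            d = gcd(d, n)
    return d


def cyclic_classes(p, i, max_n=40):
    """C_r = {j : p_ij^(n) > 0 for some n with n = r (mod d)}, r = 0, ..., d - 1."""
    d = period(p, i, max_n)
    classes = [set() for _ in range(d)]
    for n, pn in powers(p, max_n):
        for j, prob in enumerate(pn[i]):
            if prob > 0:
                classes[n % d].add(j)
    return [sorted(c) for c in classes]


half = Fraction(1, 2)
p = [[0, 1, 0, 0],          # state 1 -> 2
     [0, 0, 1, 0],          # state 2 -> 3
     [0, half, 0, half],    # state 3 -> 2 or 4
     [1, 0, 0, 0]]          # state 4 -> 1

print([n for n, pn in powers(p, 8) if pn[0][0] > 0])   # [4, 6, 8]
print(period(p, 0))                                    # 2
print(cyclic_classes(p, 0))                            # [[0, 2], [1, 3]]
```

In the original labels, $C_0 = \{1, 3\}$ and $C_1 = \{2, 4\}$, and every one-step transition moves from one of these sets to the other, as Proposition 5.3(b) requires.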
6 CONVERGENCE TO STEADY STATE FOR IRREDUCIBLE AND APERIODIC MARKOV PROCESSES ON FINITE SPACES

As will be demonstrated in this section, if the state space is finite, a complete analysis of the limiting behavior of $p^n$, as $n \to \infty$, may be carried out by elementary methods that also provide sharp rates of convergence to the so-called steady-state or invariant distributions. Although later, in Section 9, the asymptotic behavior of general Markov chains is analyzed in detail, including the law of large numbers and the central limit theorem for Markov chains that admit unique steady-state distributions, the methods of the present section are also suited for applications to certain more general (nonfinite) state spaces (e.g., closed and bounded intervals). These latter extensions are outlined in the exercises. First we consider what happens to the $n$-step transition law if all states are accessible from each other in one time step.
Proposition 6.1. Suppose $S$ is finite and $p_{ij} > 0$ for all $i, j$. Then there exists a unique probability distribution $\pi = \{\pi_j: j \in S\}$ such that

$$\sum_i \pi_i p_{ij} = \pi_j \quad \text{for all } j \in S \tag{6.1}$$

and

$$|p_{ij}^{(n)} - \pi_j| \le (1 - N\delta)^n \quad \text{for all } i, j \in S,\ n \ge 1, \tag{6.2}$$

where $\delta = \min\{p_{ij}: i, j \in S\}$ and $N$ is the number of elements in $S$. Also, $\pi_j \ge \delta$ for all $j \in S$ and $\delta \le 1/N$. […]
(6.8)
Now

$$M_j^{(n+1)} = \max_i p_{ij}^{(n+1)} = \max_i \Big(\sum_k p_{ik} p_{kj}^{(n)}\Big) \le \max_i \Big(\sum_k p_{ik} M_j^{(n)}\Big) = M_j^{(n)},$$

$$m_j^{(n+1)} = \min_i p_{ij}^{(n+1)} = \min_i \Big(\sum_k p_{ik} p_{kj}^{(n)}\Big) \ge \min_i \Big(\sum_k p_{ik} m_j^{(n)}\Big) = m_j^{(n)},$$

i.e., $M_j^{(n)}$ is nonincreasing and $m_j^{(n)}$ is nondecreasing in $n$. Since $M_j^{(n)}, m_j^{(n)}$ are bounded above by 1, (6.7) now implies that both sequences have the same limit, say $\pi_j$. Also, $\delta \le m_j^{(1)} \le m_j^{(n)} \le \pi_j \le M_j^{(n)}$ for all $j$ and $n$, and

$$m_j^{(n)} - M_j^{(n)} \le p_{ij}^{(n)} - \pi_j \le M_j^{(n)} - m_j^{(n)},$$

which, together with (6.8), implies the desired inequality (6.2). Finally, taking limits on both sides of the identity

$$p_{ij}^{(n+1)} = \sum_k p_{ik}^{(n)} p_{kj} \tag{6.9}$$
one gets $\pi_j = \sum_k \pi_k p_{kj}$, proving (6.1). Since $\sum_j p_{ij}^{(n)} = 1$, taking limits, as $n \to \infty$, it follows that $\sum_j \pi_j = 1$. To prove uniqueness of the probability distribution $\pi$ satisfying (6.1), let $\bar\pi = \{\bar\pi_j: j \in S\}$ be a probability distribution satisfying $\bar\pi' p = \bar\pi'$. Then by iteration it follows that

$$\bar\pi_j = \sum_i \bar\pi_i p_{ij} = (\bar\pi' p)_j = (\bar\pi' p^2)_j = \cdots = (\bar\pi' p^n)_j = \sum_i \bar\pi_i p_{ij}^{(n)}. \tag{6.10}$$

Taking limits as $n \to \infty$, one gets $\bar\pi_j = \sum_i \bar\pi_i \pi_j = \pi_j$. Thus, $\bar\pi = \pi$. ■
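A quick numerical check of Proposition 6.1, assuming a hand-picked $3 \times 3$ matrix with all entries positive ($\delta = 0.2$, $N = 3$, so $1 - N\delta = 0.4$). The sketch tracks the column maxima $M_j^{(n)}$ and minima $m_j^{(n)}$, confirms their monotonicity, and checks (6.2) against $\pi$ approximated by a high power of $p$; solving $\pi' p = \pi'$ by hand gives $\pi = (9/28, 12/28, 7/28)$. The floating-point tolerances are assumptions of the sketch.

```python
def mat_mult(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


p = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
N = len(p)
delta = min(min(row) for row in p)      # 0.2
rate = 1 - N * delta                    # 0.4

# pi approximated by a row of a high power of p (all rows agree in the limit)
high = p
for _ in range(200):
    high = mat_mult(high, p)
pi = high[0]

pn = p
prev_max = prev_min = None
for n in range(1, 21):
    col_max = [max(pn[i][j] for i in range(N)) for j in range(N)]   # M_j^(n)
    col_min = [min(pn[i][j] for i in range(N)) for j in range(N)]   # m_j^(n)
    if prev_max is not None:   # M_j^(n) nonincreasing, m_j^(n) nondecreasing
        assert all(a <= b + 1e-12 for a, b in zip(col_max, prev_max))
        assert all(a >= b - 1e-12 for a, b in zip(col_min, prev_min))
    # inequality (6.2)
    assert all(abs(pn[i][j] - pi[j]) <= rate ** n + 1e-12
               for i in range(N) for j in range(N))
    prev_max, prev_min = col_max, col_min
    pn = mat_mult(pn, p)

print([round(x, 6) for x in pi])
```

In this example the actual error decays faster than $0.4^n$; (6.2) is a guaranteed bound, not an exact rate.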
For an irreducible aperiodic Markov chain on a finite state space $S$ there is a positive integer $\nu$ such that

$$\delta' := \min_{i,j} p_{ij}^{(\nu)} > 0. \tag{6.11}$$

Applying Proposition 6.1 with $p$ replaced by $p^\nu$ one gets a probability $\pi = \{\pi_j: j \in S\}$ such that

$$\max_{i,j} |p_{ij}^{(n\nu)} - \pi_j| \le (1 - N\delta')^n, \qquad n = 1, 2, \ldots. \tag{6.12}$$

Now use the relations

$$|p_{ij}^{(n\nu+m)} - \pi_j| = \Big|\sum_k p_{ik}^{(m)} \big(p_{kj}^{(n\nu)} - \pi_j\big)\Big|$$

[…] $0$ and some positive integer $\nu$. (6.16)
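To illustrate (6.11)-(6.12) on a chain whose one-step matrix has zeros, consider (as an assumed example) the 3-state chain $0 \to 1 \to 2$, where state 2 moves to 0 or to 1 with probability 1/2 each. It is irreducible, and aperiodic because it has cycles of lengths 2 and 3. The sketch finds the smallest $\nu$ with all entries of $p^\nu$ positive and checks (6.12) along the times $n\nu$; here $\pi = (0.2, 0.4, 0.4)$ solves $\pi' p = \pi'$ by hand.

```python
def mat_mult(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def smallest_positive_power(p, max_power=100):
    """Smallest nu with min_{i,j} p_ij^(nu) > 0, as in (6.11), together with p^nu."""
    power = p
    for nu in range(1, max_power + 1):
        if min(min(row) for row in power) > 0:
            return nu, power
        power = mat_mult(power, p)
    raise ValueError("no strictly positive power found up to max_power")


p = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.5, 0.5, 0.0]]
pi = [0.2, 0.4, 0.4]                      # solves pi' p = pi' (checked by hand)
N = len(p)

nu, p_nu = smallest_positive_power(p)
delta_prime = min(min(row) for row in p_nu)
print(nu, delta_prime)                   # 5 0.125

power = p_nu                             # p^(n nu) for n = 1, 2, ...
for n in range(1, 11):
    err = max(abs(power[i][j] - pi[j]) for i in range(N) for j in range(N))
    assert err <= (1 - N * delta_prime) ** n + 1e-12      # inequality (6.12)
    power = mat_mult(power, p_nu)
```

Row 0 of $p^4$ still has a zero entry, which is why $\nu = 5$ here.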
CONVERGENCE TO STEADY STATE
129
The property (6.15) is important enough to justify a definition.
Definition 6.1. A probability measure $\pi$ satisfying (6.15) is said to be an invariant or steady-state distribution for $p$. Suppose that $\pi$ is an invariant distribution for $p$ and let $\{X_n\}$ be the Markov chain with transition law $p$ starting with initial distribution $\pi$. It follows from the Markov property, using (2.9), that

$$\begin{aligned}
P_\pi(X_m = i_0, X_{m+1} = i_1, \ldots, X_{m+n} = i_n) &= (\pi' p^m)_{i_0} p_{i_0 i_1} p_{i_1 i_2} \cdots p_{i_{n-1} i_n}\\
&= \pi_{i_0} p_{i_0 i_1} p_{i_1 i_2} \cdots p_{i_{n-1} i_n}\\
&= P_\pi(X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n),
\end{aligned} \tag{6.17}$$

for any given positive integer $n$ and arbitrary states $i_0, i_1, \ldots, i_n \in S$. In other words, the distribution of the process is invariant under time translation; i.e., $\{X_n\}$ is a stationary Markov process according to the following definition.
Definition 6.2. A stochastic process $\{Y_n\}$ is called a stationary stochastic process if for all $n, m$

$$P(Y_0 = i_0, Y_1 = i_1, \ldots, Y_n = i_n) = P(Y_m = i_0, Y_{1+m} = i_1, \ldots, Y_{n+m} = i_n). \tag{6.18}$$

Proposition 6.1 or its Corollary 6.2 establishes the existence and uniqueness of an invariant initial distribution that makes the process stationary. Moreover, the asymptotic convergence rate (relaxation time) may also be expressed in the form

$$|p_{ij}^{(n)} - \pi_j| \le c e^{-\lambda n} \tag{6.19}$$
for some $c, \lambda > 0$.
Suppose that $\{X_n\}$ is a stochastic process with state space $J = [c, d]$ and having the Markov property. Suppose also that the conditional distribution of $X_{n+1}$ given $X_n = x$ has a density $p(x, y)$ that is jointly continuous in $(x, y)$ and does not depend on $n \ge 1$. Given $X_0 = x_0$, the (conditional on $X_0$) joint density of $X_1, \ldots, X_n$ is given by

$$f_{(X_1, \ldots, X_n \mid X_0 = x_0)}(x_1, \ldots, x_n) = p(x_0, x_1)\, p(x_1, x_2) \cdots p(x_{n-1}, x_n). \tag{6.20}$$

If $X_0$ has a density $\mu(x)$, then the joint density of $X_0, \ldots, X_n$ is

$$f_{(X_0, \ldots, X_n)}(x_0, \ldots, x_n) = \mu(x_0)\, p(x_0, x_1) \cdots p(x_{n-1}, x_n). \tag{6.21}$$
Now let

$$\delta = \min_{x, y \in [c, d]} p(x, y). \tag{6.22}$$
In a manner analogous to the proof of Proposition 6.1, one can also obtain the following result (Exercise 9).
Proposition 6.3. If $\delta = \min_{x, y \in [c, d]} p(x, y) > 0$ then there is a continuous probability density function $\pi(x)$ such that

$$\int_c^d \pi(x)\, p(x, y)\, dx = \pi(y) \quad \text{for all } y \in [c, d] \tag{6.23}$$

and

$$|p^{(n)}(x, y) - \pi(y)| \le [1 - \delta(d - c)]^{n-1}\, \theta \quad \text{for all } x, y \in [c, d], \tag{6.24}$$

where

$$\theta = \max_{x, y, z \in [c, d]} \{p(x, y) - p(z, y)\}. \tag{6.25}$$
Here $p^{(n)}(x, y)$ is the $n$-step transition probability density function of $X_n$ given $X_0 = x$. Markov processes on general state spaces are discussed further in theoretical complements to this chapter.
Observe that if $A = ((a_{ij}))$ is an $N \times N$ matrix with strictly positive entries, then one may readily deduce from Proposition 6.1 that if, furthermore, $\sum_{j=1}^N a_{ij} = 1$ for each $i$, then the spectral radius of $A$, i.e., the magnitude of the largest eigenvalue of $A$, must be 1. Moreover, $\lambda = 1$ must be a simple eigenvalue (i.e., of multiplicity 1) of $A$. To see this let $z$ be a (left) eigenvector of $A$ corresponding to $\lambda = 1$. Then for $t$ sufficiently large, $z + t\pi$ is also a positive eigenvector (and normalizable), where $\pi$ is the invariant distribution (normalized positive eigenvector for $\lambda = 1$) of $A$. Thus, uniqueness makes $z$ a scalar multiple of $\pi$. The following theorem provides an extension of these results to the case of arbitrary positive matrices $A = ((a_{ij}))$, i.e., $a_{ij} > 0$ for $1 \le i, j \le N$. Use is not made of this result until Sections 11 and 12, so that it may be skipped on first reading. At this stage it may be regarded as an application of probability to analysis.
Theorem 6.4. [Perron-Frobenius]. Let $A = ((a_{ij}))$ be a positive $N \times N$ matrix. (a) There is a unique eigenvalue $\lambda_0$ of $A$ that has largest magnitude. Moreover, $\lambda_0$ is positive and has a corresponding positive eigenvector.
(b) Let $x$ be any nonnegative nonzero vector. Then

$$v = \lim_{n \to \infty} \lambda_0^{-n} A^n x \tag{6.26}$$

exists and is an eigenvector of $A$ corresponding to $\lambda_0$, unique up to a scalar multiple determined by $x$, but otherwise independent of $x$.

Proof.
Define $\Lambda_+ = \{\lambda > 0: Ax \ge \lambda x$ for some nonnegative nonzero vector $x\}$; here inequalities are to be interpreted componentwise. Observe that the set $\Lambda_+$ is nonempty and bounded above by $\|A\| := \sum_{i=1}^N \sum_{j=1}^N a_{ij}$. Let $\lambda_0$ be the least upper bound of $\Lambda_+$. There is a sequence $\{\lambda_n: n \ge 1\}$ in $\Lambda_+$ with limit $\lambda_0$ as $n \to \infty$. Let $\{x_n: n \ge 1\}$ be corresponding nonnegative vectors, normalized so that $\|x_n\| := \sum_{i=1}^N x_{n,i} = 1$, $n \ge 1$, for which $Ax_n \ge \lambda_n x_n$. Then, since $\|x_n\| = 1$, $n = 1, 2, \ldots$, $\{x_n\}$ must have a convergent subsequence, with limit denoted $x_0$, say. Therefore $Ax_0 \ge \lambda_0 x_0$ and hence $\lambda_0 \in \Lambda_+$. In fact, it follows from the least upper bound property of $\lambda_0$ that $Ax_0 = \lambda_0 x_0$. For otherwise there must be a component with strict inequality, say $\sum_j a_{1j} x_j - \lambda_0 x_1 = \delta > 0$, where $x_0 = (x_1, \ldots, x_N)'$, and $\sum_j a_{kj} x_j - \lambda_0 x_k \ge 0$, $k = 2, \ldots, N$. But then taking $y = (x_1 + \varepsilon, x_2, \ldots, x_N)'$ with $0 < \varepsilon < \delta/\lambda_0$, we get $Ay > \lambda_0 y$ with strict inequality in each component, so that $Ay \ge (\lambda_0 + \eta) y$ for some $\eta > 0$. This contradicts the maximality of $\lambda_0$. To prove that if $A$ is any […]
Corollary 6.5. Let $A = ((a_{ij}))$ be an $N \times N$ matrix with positive entries and let $B$ be the $(N-1) \times (N-1)$ matrix obtained from $A$ by striking out an $i$th row and $j$th column. Then the spectral radius of $B$ is strictly smaller than that of $A$.
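A minimal numerical sketch of Theorem 6.4(b) and Corollary 6.5, assuming an arbitrary positive $3 \times 3$ matrix. Starting from a nonnegative nonzero $x$, it iterates $x \mapsto Ax$ but rescales each iterate to have coordinate sum 1 instead of dividing by $\lambda_0^n$ as in (6.26), since $\lambda_0$ is not known in advance; this rescaling is a swapped-in normalization, not the text's. The rescaling factors then converge to $\lambda_0$. The sketch also compares $\lambda_0$ with the Perron roots of all $2 \times 2$ matrices obtained by striking out one row and one column. Names are hypothetical.

```python
def perron_root(a, iters=1000):
    """Power iteration for a positive square matrix a (list of lists).

    Returns (lambda_0, v) with v positive and normalized to coordinate sum 1.
    """
    n = len(a)
    x = [1.0 / n] * n                       # nonnegative, nonzero start
    lam = 0.0
    for _ in range(iters):
        y = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = sum(y)                        # = sum(Ax) / sum(x), because sum(x) = 1
        x = [v / lam for v in y]
    return lam, x


def strike(a, row, col):
    """Matrix obtained from a by deleting one row and one column."""
    return [[a[i][j] for j in range(len(a)) if j != col]
            for i in range(len(a)) if i != row]


A = [[1.0, 2.0, 1.0],
     [1.0, 1.0, 1.0],
     [2.0, 1.0, 3.0]]
lam0, v = perron_root(A)
residual = max(abs(sum(A[i][j] * v[j] for j in range(3)) - lam0 * v[i])
               for i in range(3))
print(round(lam0, 6), [round(c, 6) for c in v], residual < 1e-9)

# Corollary 6.5: every (N-1) x (N-1) minor has strictly smaller spectral radius
minor_roots = [perron_root(strike(A, r, c))[0] for r in range(3) for c in range(3)]
print(max(minor_roots) < lam0)
```

For this $A$ the characteristic polynomial is $\lambda^3 - 5\lambda^2 + 2\lambda + 1$, which gives an independent check on the computed $\lambda_0$.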
Proof. Let $\lambda_{00}$ be the largest positive eigenvalue of $B$ and without loss of generality take $B = ((a_{ij}: i, j = 1, \ldots, N-1))$. Since, for some positive $x$,

$$\sum_{j=1}^N a_{ij} x_j = \lambda_0 x_i, \qquad i = 1, \ldots, N, \quad x_i > 0,$$

we have

$$(Bx)_i = \sum_{j=1}^{N-1} a_{ij} x_j = \lambda_0 x_i - a_{iN} x_N < \lambda_0 x_i, \qquad i = 1, \ldots, N-1.$$
Thus, by the property (6.26) applied to $\lambda_{00}$ we must have $\lambda_{00}$ […] $> 0$ if $i$ is positive recurrent, and $\pi_i = 0$ if $i$ is null recurrent (use (9.32) […]
in the latter case). Thus (9.24) holds. Now $\theta(i) > 0$; for otherwise $T_i^{(r)} = 0$ with probability 1 for all $r$, implying $\rho_{ji} = 0$, which is ruled out by the assumption. The relation (9.24) implies $\pi_i > 0$, since $\pi_j > 0$ and $\theta(i) > 0$. Therefore, all states are positive recurrent. For part (c), it has been proved above that there exists a unique invariant probability distribution $\pi$ given by (9.35) if all states are positive recurrent. Conversely, suppose $\pi$ is an invariant probability distribution. We need to show that all states are positive recurrent. This is done by elimination of other possibilities. If the states are transient, then (9.36) holds; using this in (9.29) (or (9.30)) one would get $\pi_j = 0$ for all $j$, which is a contradiction. Similarly, null recurrence implies, by (9.30) and (9.35), that $\pi_j = 0$ for all $j$. Therefore, the states are all positive recurrent. Part (d) will follow from Theorem 9.2 if:
(i) The hypothesis (9.14) holds whenever

$$\sum_{i \in S} \pi_i |f(i)| < \infty, \tag{9.39}$$

and (ii)

$$E_j Z_0 = E_j\big(f(X_1) + \cdots + f(X_{\tau_j})\big) = \frac{1}{\pi_j} \sum_{i \in S} \pi_i f(i). \tag{9.40}$$
To verify (i) and (ii), first note that by the definition of $T_i^{(r)}$,

$$\sum_{m=\tau_j^{(r)}+1}^{\tau_j^{(r+1)}} |f(X_m)| = \sum_{i \in S} |f(i)|\, T_i^{(r)}. \tag{9.41}$$
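Since $E T_i^{(r)} = \theta(i) = \pi_i/\pi_j$ (see Eq. 9.24), taking expectations in (9.41) yields

$$E\Big(\sum_{m=\tau_j^{(r)}+1}^{\tau_j^{(r+1)}} |f(X_m)|\Big) = E\Big(\sum_{i \in S} |f(i)|\, T_i^{(r)}\Big) = \sum_{i \in S} |f(i)| \frac{\pi_i}{\pi_j}. \tag{9.42}$$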
The last equality follows upon interchanging the orders of summation and expectation, which by Fubini's theorem is always permissible if the summands are nonnegative. Therefore (9.14) follows from (9.39). Now as in (9.42),
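$$E_j Z_0 = E Z_1 = E\Big(\sum_{m=\tau_j^{(1)}+1}^{\tau_j^{(2)}} f(X_m)\Big) = E\Big(\sum_{i \in S} f(i)\, T_i^{(1)}\Big) = \sum_{i \in S} f(i)\, E T_i^{(1)} = \frac{1}{\pi_j} \sum_{i \in S} f(i)\, \pi_i, \tag{9.43}$$

where this time the interchange of the orders of summation and expectation is justified again using Fubini's theorem by finiteness of the double "integral." ■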
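Identity (9.43), $E_j Z_0 = \pi_j^{-1} \sum_i f(i)\pi_i$, can be checked exactly on a small chain. The sketch assumes a 3-state birth-death chain whose invariant distribution is $\pi = (1/4, 1/2, 1/4)$ and computes $h(i) = E_i[f(X_1) + \cdots + f(X_{\tau_j})]$ from the first-step equations $h(i) = \sum_k p_{ik}(f(k) + \mathbf{1}\{k \ne j\} h(k))$; then $E_j Z_0 = h(j)$. With $f \equiv 1$ the same computation gives $E_j \tau_j = 1/\pi_j$. The small Gauss-Jordan solver and all names are illustrative, not from the text.

```python
from fractions import Fraction as F


def solve(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [row[n] for row in m]


def cycle_sum(p, f, j):
    """E_j[f(X_1) + ... + f(X_{tau_j})], tau_j = first return time to j."""
    n = len(p)
    # h(i) - sum_{k != j} p_ik h(k) = sum_k p_ik f(k)
    a = [[(1 if i == k else 0) - (p[i][k] if k != j else 0) for k in range(n)]
         for i in range(n)]
    b = [sum(p[i][k] * f[k] for k in range(n)) for i in range(n)]
    return solve(a, b)[j]


p = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 2), F(1, 2)]]
pi = [F(1, 4), F(1, 2), F(1, 4)]
f = [F(1), F(0), F(3)]

for j in range(3):
    lhs = cycle_sum(p, f, j)
    rhs = sum(f[i] * pi[i] for i in range(3)) / pi[j]          # (9.43)
    print(j, lhs, rhs, cycle_sum(p, [F(1)] * 3, j) == 1 / pi[j])
```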
THE LAW OF LARGE NUMBERS
147
If the assumption that "all states communicate with each other" in Theorem 9.4 is dropped, then $S$ can be decomposed into a set $\mathcal{T}$ of inessential states and (disjoint) classes $S_1, S_2, \ldots, S_k$ of essential states. The transition probability matrix $p$ may be restricted to each one of the classes $S_1, \ldots, S_k$, and the conclusions of Theorem 9.2 will hold individually for each class. If more than one of these classes is positive recurrent, then more than one invariant distribution exists, and they are supported on disjoint sets. Since any convex combination of invariant distributions is again invariant, infinitely many invariant distributions exist in this case. The following result takes care of the set $\mathcal{T}$ of inessential states in this connection (also see Exercise 4).
Proposition 9.5 (a) If $j$ is inessential then it is transient. (b) Every invariant distribution assigns zero probability to inessential, transient, and null recurrent states.
Proof. (a) If $j$ is inessential then there exist $i \in S$ and $m \ge 1$ such that

$$p_{ji}^{(m)} > 0 \quad \text{and} \quad p_{ij}^{(n)} = 0 \quad \text{for all } n \ge 1. \tag{9.44}$$

Hence

$$P_j(N(j) < \infty) \ge P_j(X_m = i, X_n \ne j \text{ for } n > m) = p_{ji}^{(m)} P_i(X_n \ne j \text{ for } n > 0) = p_{ji}^{(m)} > 0. \tag{9.45}$$

By Proposition 8.1, $j$ is transient, since (9.45) says $j$ is not recurrent. (b) Next use (9.36), (9.35) and (9.30) and argue as in the proof of part (c) of Theorem 9.2 to conclude that $\pi_j = 0$ if $j$ is either transient or null recurrent. ■
Corollary 9.6. If $S$ is finite, then there exists at least one positive recurrent state, and therefore at least one invariant distribution $\pi$. This invariant distribution is unique if and only if all positive recurrent states communicate.
Proof. Suppose, if possible, that all states are either transient or null recurrent. Then

$$\lim_{n \to \infty} \frac{1}{n+1} \sum_{m=0}^n p_{ij}^{(m)} = 0 \quad \text{for all } i, j \in S. \tag{9.46}$$

Since $(n+1)^{-1} \sum_{m=0}^n p_{ij}^{(m)} \le 1$ for all $j$, and there are only finitely many states $j$, by Lebesgue's Dominated Convergence Theorem,
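$$\begin{aligned}
\sum_{j \in S} \lim_{n \to \infty} \frac{1}{n+1} \sum_{m=0}^n p_{ij}^{(m)} &= \lim_{n \to \infty} \sum_{j \in S} \frac{1}{n+1} \sum_{m=0}^n p_{ij}^{(m)} = \lim_{n \to \infty} \frac{1}{n+1} \sum_{m=0}^n \sum_{j \in S} p_{ij}^{(m)}\\
&= \lim_{n \to \infty} \frac{1}{n+1} \sum_{m=0}^n 1 = \lim_{n \to \infty} \frac{n+1}{n+1} = 1.
\end{aligned} \tag{9.47}$$

But the first term in (9.47) is zero by (9.46). We have reached a contradiction. Thus, there exists at least one positive recurrent state. The rest follows from Theorem 9.2 and the remark following its proof. ■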
10 THE CENTRAL LIMIT THEOREM FOR MARKOV CHAINS

The same method as used in Section 9 to obtain the law of large numbers may be used to derive a central limit theorem for $S_n = \sum_{m=0}^n f(X_m)$, where $f$ is a real-valued function on the state space $S$. Write

$$\mu = E_\pi f(X_0) = \sum_{i \in S} \pi_i f(i), \tag{10.1}$$

and assume that, for $Z_r = \sum_{m=\tau_j^{(r)}+1}^{\tau_j^{(r+1)}} f(X_m)$, $r = 0, 1, 2, \ldots$,

$$E_j(Z_0 - E_j Z_0)^2 < \infty. \tag{10.2}$$

Now replace $f$ by $\tilde f = f - \mu$, and write

$$\tilde S_n = \sum_{m=0}^n \tilde f(X_m) = \sum_{m=0}^n (f(X_m) - \mu), \qquad \tilde Z_r = \sum_{m=\tau_j^{(r)}+1}^{\tau_j^{(r+1)}} \tilde f(X_m) \quad (r = 0, 1, 2, \ldots). \tag{10.3}$$

Then by (9.40),

$$E_j \tilde Z_r = (E_j \tau_j)\, E_\pi \tilde f(X_0) = 0 \qquad (r = 0, 1, 2, \ldots). \tag{10.4}$$

Thus $\{\tilde Z_r: r = 1, 2, \ldots\}$ is an i.i.d. sequence with mean zero and finite variance

$$\sigma^2 = E_j \tilde Z_0^2. \tag{10.5}$$

Now apply the classical central limit theorem to this sequence. As $r \to \infty$, $(1/\sqrt r) \sum_{k=1}^r \tilde Z_k$ converges in distribution to the Gaussian law with mean zero and variance $\sigma^2$. Now express $\tilde S_n$ as in (9.6) with $f$ replaced by $\tilde f$, $S_n$ by $\tilde S_n$, and
149
THE CENTRAL LIMIT THEOREM FOR MARKOV CHAINS
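$Z_r$ by $\tilde Z_r$, to see that the limiting distribution of $(1/\sqrt n)\tilde S_n$ is the same as that of (Exercise 1)

$$\frac{1}{\sqrt n} \sum_{r=1}^{N_n} \tilde Z_r = \Big(\frac{N_n}{n}\Big)^{1/2} \frac{1}{\sqrt{N_n}} \sum_{r=1}^{N_n} \tilde Z_r. \tag{10.6}$$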
We shall need an extension of the central limit theorem that applies to sums of random numbers of i.i.d. random variables. We can get such a result as an extension of Corollary 7.2 in Chapter 0 as follows.
Proposition 10.1. Let $\{X_j: j \ge 1\}$ be i.i.d., $EX_j = 0$, $0 < \sigma^2 := EX_j^2 < \infty$. Let $\{\nu_n: n \ge 1\}$ be a sequence of nonnegative integer-valued random variables with

$$\lim_{n \to \infty} \frac{\nu_n}{n} = \alpha \quad \text{in probability} \tag{10.7}$$

for some constant $\alpha > 0$. Then $\sum_{j=1}^{\nu_n} X_j / \sqrt{\nu_n}$ converges in distribution to $N(0, \sigma^2)$.
Proof. Without loss of generality, let $\sigma = 1$. Write $S_n := X_1 + \cdots + X_n$. Choose $\varepsilon > 0$ arbitrarily. Then,

$$P\big(|S_{\nu_n} - S_{[n\alpha]}| > \varepsilon([n\alpha])^{1/2}\big) \le P\big(|\nu_n - [n\alpha]| \ge \varepsilon^3 [n\alpha]\big) + P\Big(\max_{m: |m - [n\alpha]| < \varepsilon^3 [n\alpha]} |S_m - S_{[n\alpha]}| \ge \varepsilon([n\alpha])^{1/2}\Big)$$

[…] $\infty$, by (10.7). The second term is estimated by Kolmogorov's Maximal Inequality (Chapter I, Corollary 13.7), as being no more than

$$\{\varepsilon([n\alpha])^{1/2}\}^{-2} (\varepsilon^3 [n\alpha]) = \varepsilon. \tag{10.9}$$
This shows that

$$\frac{S_{\nu_n} - S_{[n\alpha]}}{([n\alpha])^{1/2}} \to 0 \quad \text{in probability}. \tag{10.10}$$

Since $S_{[n\alpha]}/([n\alpha])^{1/2}$ converges in distribution to $N(0, 1)$, it follows from (10.10) that so does $S_{\nu_n}/([n\alpha])^{1/2}$. The desired convergence now follows from (10.7). ■
By Proposition 10.1, $N_n^{-1/2} \sum_{r=1}^{N_n} \tilde Z_r$ is asymptotically Gaussian with mean zero and variance $\sigma^2$. Since $N_n/n$ converges to $(E_j \tau_j^{(1)})^{-1}$, it follows that the expression in (10.6) is asymptotically Gaussian with mean zero and variance
$(E_j \tau_j^{(1)})^{-1} \sigma^2$. This is then the asymptotic distribution of $n^{-1/2} \tilde S_n$. Moreover, defining, as in Chapter I,

$$W_n(t) = \frac{\tilde S_{[nt]}}{\sqrt{n+1}}, \qquad \tilde W_n(t) = W_n(t) + \frac{(nt - [nt])\, \tilde f(X_{[nt]+1})}{\sqrt{n+1}} \quad (t \ge 0), \tag{10.11}$$

all the finite-dimensional distributions of $\{W_n(t)\}$, as well as $\{\tilde W_n(t)\}$, converge in distribution to those of Brownian motion with zero drift and diffusion coefficient

$$D = (E_j \tau_j^{(1)})^{-1} \sigma^2 \tag{10.12}$$

(Exercise 2). In fact, convergence of the full distribution may also be obtained by consideration of the above renewal argument. The precise form of the functional central limit theorem (FCLT) for Markov chains goes as follows (see theoretical complement 1).
Theorem 10.2. (FCLT). If $S$ is a positive recurrent class of states and if (10.2) holds then, as $n \to \infty$, $W_n(1) = (n+1)^{-1/2} \tilde S_n$ converges in distribution to a Gaussian law with mean zero and variance $D$ given by (10.12) and (10.5). Moreover, the stochastic process $\{W_n(t)\}$ (or $\{\tilde W_n(t)\}$) converges in distribution to Brownian motion with zero drift and diffusion coefficient $D$.
Rather than using (10.12), a more convenient way to compute $D$ is sometimes the following. Write $E_\pi$, $\mathrm{Var}_\pi$, $\mathrm{Cov}_\pi$ for expectation, variance, and covariance, respectively, when the initial distribution is $\pi$ (the unique invariant distribution). The following computation is straightforward. First write, for any two functions $h, g$ that are square summable with respect to $\pi$,

$$\langle h, g \rangle_\pi = \sum_{i \in S} h(i) g(i) \pi_i = E_\pi[h(X_0) g(X_0)]. \tag{10.13}$$
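Then,

$$\begin{aligned}
\mathrm{Var}_\pi\Big((n+1)^{-1/2} \sum_{m=0}^n \tilde f(X_m)\Big) &= E_\pi\Big(\sum_{m=0}^n \tilde f(X_m)\Big)^2 \Big/ (n+1)\\
&= E_\pi\Big[\sum_{m=0}^n \tilde f^2(X_m) + 2 \sum_{m=1}^n \sum_{m'=0}^{m-1} \tilde f(X_{m'})\, \tilde f(X_m)\Big] \Big/ (n+1)\\
&= \frac{1}{n+1} \sum_{m=0}^n E_\pi \tilde f^2(X_m)
\end{aligned}$$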
ABSORPTION PROBABILITIES
151
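$$\begin{aligned}
&\quad + \frac{2}{n+1} \sum_{m=1}^n \sum_{m'=0}^{m-1} E_\pi\big[\tilde f(X_{m'})\, E_\pi(\tilde f(X_m) \mid X_{m'})\big]\\
&= E_\pi \tilde f^2(X_0) + \frac{2}{n+1} \sum_{m=1}^n \sum_{m'=0}^{m-1} E_\pi\big[\tilde f(X_{m'})\, (p^{m-m'} \tilde f)(X_{m'})\big]\\
&= E_\pi \tilde f^2(X_0) + \frac{2}{n+1} \sum_{m=1}^n \sum_{m'=0}^{m-1} E_\pi\big[\tilde f(X_0)\, (p^{m-m'} \tilde f)(X_0)\big]\\
&= E_\pi \tilde f^2(X_0) + \frac{2}{n+1} \sum_{m=1}^n \sum_{m'=0}^{m-1} \langle \tilde f, p^{m-m'} \tilde f \rangle_\pi\\
&= E_\pi \tilde f^2(X_0) + 2 \Big\langle \tilde f, \frac{1}{n+1} \sum_{m=1}^n \sum_{k=1}^m p^k \tilde f \Big\rangle_\pi \qquad (k = m - m').
\end{aligned} \tag{10.14}$$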
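When the chain is finite, irreducible and aperiodic, the partial sums $\sum_{k=1}^m p^k \tilde f$ converge, so the Cesàro average in the last line of (10.14) tends to $\sum_{k \ge 1} p^k \tilde f$ and the limit of (10.14) is $E_\pi \tilde f^2(X_0) + 2 \sum_{k \ge 1} \langle \tilde f, p^k \tilde f \rangle_\pi$. The sketch below truncates that series (the truncation point is an assumption) for an assumed two-state chain with $p_{01} = a$, $p_{10} = b$ and $f$ the indicator of state 1. For this chain $\tilde f$ is an eigenvector of $p$ with eigenvalue $\lambda = 1 - a - b$, so $\langle \tilde f, p^k \tilde f \rangle_\pi = \pi_0 \pi_1 \lambda^k$ and the limit equals $\pi_0 \pi_1 (1 + \lambda)/(1 - \lambda)$, which serves as the check.

```python
def limiting_variance(p, f, pi, k_max=5000):
    """E_pi f~^2 + 2 * sum_{k=1}^{k_max} <f~, p^k f~>_pi, where f~ = f - E_pi f."""
    n = len(p)
    mu = sum(f[i] * pi[i] for i in range(n))
    ft = [f[i] - mu for i in range(n)]

    def inner(g, h):                       # <g, h>_pi as in (10.13)
        return sum(g[i] * h[i] * pi[i] for i in range(n))

    total = inner(ft, ft)
    g = ft[:]
    for _ in range(k_max):
        g = [sum(p[i][j] * g[j] for j in range(n)) for i in range(n)]   # g = p^k f~
        total += 2 * inner(ft, g)
    return total


a, b = 0.3, 0.1
p = [[1 - a, a], [b, 1 - b]]
pi = [b / (a + b), a / (a + b)]            # (0.25, 0.75)
f = [0.0, 1.0]                             # indicator of state 1
lam = 1 - a - b
closed_form = pi[0] * pi[1] * (1 + lam) / (1 - lam)
print(limiting_variance(p, f, pi), closed_form)
```

Positive correlation between successive states ($\lambda > 0$) inflates the variance above $\mathrm{Var}_\pi f = \pi_0 \pi_1$, here by a factor of 4.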
Now assume that the limit $\gamma = \lim$ […]

$$\ldots = \sum_k p_{ik}\, \alpha_j(k) \qquad (j \in A,\ i \in S). \tag{11.11}$$
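Denoting by $\alpha_j$ the vector $(\alpha_j(i): i \in S)$, viewed as a column vector, one may express (11.10) as

$$\alpha_j = p\, \alpha_j \qquad (j \in A). \tag{11.12}$$

Alternatively, (11.11) or (11.12) may be replaced by

$$\alpha_j(i) = \sum_k p_{ik}\, \alpha_j(k) \quad \text{for } i \notin A, \qquad \alpha_j(i) = \begin{cases} 1 & \text{if } i = j,\\ 0 & \text{if } i \in A,\ i \ne j \end{cases} \qquad (j \in A). \tag{11.13}$$

A function (or, vector) $\alpha = (\alpha(i): i \in S)$ is said to be $p$-harmonic on $B$ ($\subset S$) if

$$\alpha(i) = (p\alpha)(i) \quad \text{for } i \in B. \tag{11.14}$$

Hence $\alpha_j$ is $p$-harmonic on $A^c$ and has the (boundary) values on $A$,

$$\alpha_j(i) = \begin{cases} 1 & \text{if } i = j,\\ 0 & \text{if } i \in A,\ i \ne j. \end{cases} \tag{11.15}$$

We have thus proved part (a) of the following proposition.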
Proposition 11.2. Let $p$ be a transition probability matrix and $A$ a nonempty subset of $S$. (a) Then the distribution of $X_{\tau_A}$, as defined by (11.10), satisfies (11.13). (b) This is the unique bounded solution if and only if […] for all $i \in S$.
[…]

$$P_i(\tau_A > n) \downarrow P_i(\tau_A = \infty) \quad \text{as } n \uparrow \infty. \tag{11.17}$$

[…] Hence, if (11.16) holds, then

$$\lim_{n \to \infty} p_{ik}^{(n)} = 0 \quad \text{for all } i, k \in A^c. \tag{11.18}$$
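For a finite chain from which $A$ is reached with probability 1, (11.13) is a finite linear system whose bounded solution is $\alpha_j$ by Proposition 11.2(b). A minimal sketch with exact fractions and hypothetical names, assuming the gambler's-ruin chain on $\{0, \ldots, N\}$ with $p_{i,i+1} = p$, $p_{i,i-1} = q = 1 - p$ for $0 < i < N$ and $A = \{0, N\}$: it solves for $\alpha_N$ and compares with the classical formula $(1 - (q/p)^i)/(1 - (q/p)^N)$, valid for $p \ne q$.

```python
from fractions import Fraction as F


def solve(a, b):
    """Exact Gauss-Jordan elimination for a x = b."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [row[n] for row in m]


def absorption_probs(p, A, j):
    """alpha_j(i) = P_i(X_{tau_A} = j): p-harmonic off A, boundary values (11.15) on A."""
    n = len(p)
    inside = [i for i in range(n) if i not in A]
    boundary = {i: F(1 if i == j else 0) for i in A}
    # alpha(i) - sum_{k not in A} p_ik alpha(k) = sum_{k in A} p_ik alpha(k), i not in A
    a = [[(1 if i == k else 0) - p[i][k] for k in inside] for i in inside]
    b = [sum(p[i][k] * boundary[k] for k in A) for i in inside]
    values = dict(zip(inside, solve(a, b)))
    values.update(boundary)
    return [values[i] for i in range(n)]


N, up = 5, F(2, 3)
down = 1 - up
p = [[F(0)] * (N + 1) for _ in range(N + 1)]
p[0][0] = p[N][N] = F(1)                   # absorbing ends
for i in range(1, N):
    p[i][i + 1], p[i][i - 1] = up, down

alpha = absorption_probs(p, {0, N}, N)
r = down / up
print([str(x) for x in alpha])
print(all(alpha[i] == (1 - r ** i) / (1 - r ** N) for i in range(N + 1)))   # True
```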
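On the other hand, $p_{il}^{(n)} = P(\tau_A$ […] $\sum_k p_{ik}\, \alpha_{2N}(k)$, $\quad \alpha_{2N}(2N) = 1$, $\alpha_{2N}(0) = 0$, (11.27)

and $\alpha_0(i) = 1 - \alpha_{2N}(i)$. In view of (11.26) and (11.19) we have

$$i = E_i X_0 = E_i X_n = \sum_{k=0}^{2N} k\, p_{ik}^{(n)} \to 0 \cdot \alpha_0(i) + 2N \alpha_{2N}(i) = 2N \alpha_{2N}(i) \quad \text{for } i = 1, \ldots, 2N-1.$$

Therefore,

$$\alpha_{2N}(i) = \frac{i}{2N}, \qquad i = 0, 1, 2, \ldots, 2N, \tag{11.28}$$

$$\alpha_0(i) = \frac{2N - i}{2N}, \qquad i = 0, 1, 2, \ldots, 2N. \tag{11.29}$$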
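As a quick exact check on (11.28) and (11.29), assuming the binomial transition law $p_{ij} = \binom{2N}{j}(i/2N)^j(1 - i/2N)^{2N-j}$ of this example, the sketch verifies that $\alpha_{2N}(i) = i/2N$ and $\alpha_0(i) = (2N - i)/2N$ are $p$-harmonic at the interior states and take the required boundary values. The choice $2N = 10$ is arbitrary.

```python
from fractions import Fraction as F
from math import comb


def wright_fisher(two_n):
    """p_ij = C(2N, j) (i/2N)^j (1 - i/2N)^(2N - j), i, j = 0, ..., 2N."""
    return [[comb(two_n, j) * F(i, two_n) ** j * (1 - F(i, two_n)) ** (two_n - j)
             for j in range(two_n + 1)] for i in range(two_n + 1)]


two_n = 10
p = wright_fisher(two_n)
alpha_top = [F(i, two_n) for i in range(two_n + 1)]        # (11.28)
alpha_bottom = [1 - a for a in alpha_top]                   # (11.29)

harmonic = all(sum(p[i][k] * alpha_top[k] for k in range(two_n + 1)) == alpha_top[i]
               for i in range(1, two_n))
print(harmonic, alpha_top[0] == 0, alpha_top[two_n] == 1,
      alpha_bottom[0] == 1, alpha_bottom[two_n] == 0)
```

Harmonicity of $i/2N$ is just the statement that the binomial mean is $2N \cdot (i/2N) = i$.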
Check also that (11.29) satisfies (11.27). In order to estimate the rate at which fixation of opinion occurs, we shall calculate the eigenvalues of $\tilde p$ ($= p$ here). Let $v = (v_0, \ldots, v_{2N})'$ and consider the eigenvalue problem

$$\sum_{j=0}^{2N} p_{ij} v_j = \lambda v_i, \qquad i = 0, 1, \ldots, 2N. \tag{11.30}$$
The $r$th factorial moment of the binomial distribution $(p_{ij}: 0 \le j \le 2N)$ is

$$\begin{aligned}
\sum_{j=0}^{2N} j(j-1)\cdots(j-r+1)\, p_{ij} &= (2N)(2N-1)\cdots(2N-r+1) \sum_{j=r}^{2N} \binom{2N-r}{j-r} \Big(\frac{i}{2N}\Big)^j \Big(1 - \frac{i}{2N}\Big)^{2N-j}\\
&= \Big(\frac{i}{2N}\Big)^r (2N)(2N-1)\cdots(2N-r+1),
\end{aligned} \tag{11.31}$$

for $r = 1, 2, \ldots, 2N$. Equation (11.31) contains a transformation between "factorial powers" and "ordinary powers" that deserves to be examined for connections with (11.30). The "factorial powers" $(j)_r := j(j-1)\cdots(j-r+1)$ are simply polynomials in the "ordinary powers" and can be expressed as

$$j(j-1)\cdots(j-r+1) = \sum_{k=1}^r s_k^{(r)} j^k. \tag{11.32}$$

Likewise, "ordinary powers" $j^r$ can be expressed as polynomials in the "factorial
powers" as

$$j^r = \sum_{k=0}^r S_k^{(r)} (j)_k, \tag{11.33}$$

with the convention $(j)_0 = 1$. Note that $S_r^{(r)} = 1$ for all $r \ge 0$. The coefficients $\{s_k^{(r)}\}$, $\{S_k^{(r)}\}$ are commonly referred to as Stirling coefficients of the first and second kinds, respectively. Now every vector $v = (v_0, \ldots, v_{2N})'$ may be represented as the successive values of a unique (factorial) polynomial of degree (at most) $2N$ evaluated at $0, 1, \ldots, 2N$ (Exercise 7), i.e.,

$$v_j = \sum_{r=0}^{2N} a_r (j)_r \quad \text{for } j = 0, 1, \ldots, 2N. \tag{11.34}$$
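According to (11.24), (11.32),

$$\begin{aligned}
\sum_{j=0}^{2N} p_{ij} v_j &= \sum_{j=0}^{2N} p_{ij} \sum_{r=0}^{2N} a_r (j)_r = \sum_{r=0}^{2N} a_r \sum_{j=0}^{2N} p_{ij} (j)_r = \sum_{r=0}^{2N} a_r \frac{(2N)_r}{(2N)^r}\, i^r\\
&= \sum_{r=0}^{2N} a_r \frac{(2N)_r}{(2N)^r} \sum_{n=0}^r S_n^{(r)} (i)_n = \sum_{n=0}^{2N} \Big(\sum_{r=n}^{2N} a_r \frac{(2N)_r}{(2N)^r} S_n^{(r)}\Big) (i)_n.
\end{aligned} \tag{11.35}$$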
It is now clear that (11.30) holds if and only if

$$\lambda a_n = \sum_{r=n}^{2N} a_r \frac{(2N)_r}{(2N)^r} S_n^{(r)}, \qquad n = 0, 1, \ldots, 2N. \tag{11.36}$$
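In particular, taking $n = 2N$ and noting $S_{2N}^{(2N)} = 1$, we see that (11.36) holds if

$$\lambda = \lambda_{2N} := \frac{(2N)_{2N}}{(2N)^{2N}}, \qquad a_{2N}^{(2N)} = 1, \tag{11.37}$$

and $a_r = a_r^{(2N)}$, $r = 2N-1, \ldots, 0$, are solved recursively from (11.36). Next take $a_{2N}^{(2N-1)} = 0$, $a_{2N-1}^{(2N-1)} = 1$ and solve for $\lambda_{2N-1} = (2N)_{2N-1}/(2N)^{2N-1}$, etc. Then,

$$\lambda_r := \frac{(2N)_r}{(2N)^r} = \Big(1 - \frac{1}{2N}\Big) \cdots \Big(1 - \frac{r-1}{2N}\Big), \qquad 0 \le r \le 2N. \tag{11.38}$$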
Notice that $\lambda_0 = \lambda_1 = 1$. The next largest eigenvalue is $\lambda_2 = 1 - (1/2N)$. Let $V = ((v_{ij}))$ be the matrix whose columns are the eigenvectors obtained above. Writing $D = \mathrm{diag}(\lambda_0, \lambda_1, \ldots, \lambda_{2N})$, this means

$$pV = VD, \tag{11.39}$$

or

$$p = VDV^{-1}, \tag{11.40}$$

so that

$$p^m = VD^mV^{-1}. \tag{11.41}$$
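Therefore, writing $V^{-1} = ((v^{ij}))$, we have from (11.8)

$$P_i(\tau_{\{0,2N\}} > m) = \sum_{j=1}^{2N-1} p_{ij}^{(m)} = \sum_{j=1}^{2N-1} \sum_{k=0}^{2N} \lambda_k^m v_{ik} v^{kj} = \sum_{k=0}^{2N} \lambda_k^m v_{ik} \sum_{j=1}^{2N-1} v^{kj}. \tag{11.42}$$

Since the left side of (11.42) must go to zero as $m \to \infty$, the coefficients of $\lambda_0^m$ and $\lambda_1^m$ must be zero. Thus,

$$P_i(\tau_{\{0,2N\}} > m) = \sum_{k=2}^{2N} \lambda_k^m v_{ik} \sum_{j=1}^{2N-1} v^{kj} = \lambda_2^m \Big(v_{i2} \sum_{j=1}^{2N-1} v^{2j}\Big) + \sum_{k=3}^{2N} \Big(v_{ik} \sum_{j=1}^{2N-1} v^{kj}\Big) \lambda_k^m \sim (\text{const.})\, \lambda_2^m \quad \text{for large } m. \tag{11.43}$$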
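A floating-point sketch of (11.43), assuming $2N = 6$: it evolves the distribution of $X_m$ from $X_0 = i$, sums the mass on the interior states $1, \ldots, 2N - 1$ to get $P_i(\tau_{\{0,2N\}} > m)$, and compares the ratio of successive values with $\lambda_2 = 1 - 1/(2N) = 5/6$. Because the absorbing states never feed mass back into the interior, the interior mass is computed without cancellation even when it is tiny.

```python
from math import comb


def wright_fisher(two_n):
    return [[comb(two_n, j) * (i / two_n) ** j * (1 - i / two_n) ** (two_n - j)
             for j in range(two_n + 1)] for i in range(two_n + 1)]


def survival(p, i, m_max):
    """[P_i(tau_{0,2N} > m) for m = 0, ..., m_max]."""
    size = len(p)
    dist = [1.0 if k == i else 0.0 for k in range(size)]
    out = []
    for _ in range(m_max + 1):
        out.append(sum(dist[1:size - 1]))           # mass not yet absorbed
        dist = [sum(dist[k] * p[k][j] for k in range(size)) for j in range(size)]
    return out


two_n = 6
p = wright_fisher(two_n)
for i in (1, 3):
    s = survival(p, i, 100)
    print(i, s[100] / s[99], 1 - 1 / two_n)
```

The next eigenvalue is $\lambda_3 = 5/9$, so the correction to the ratio dies off like $(\lambda_3/\lambda_2)^m = (2/3)^m$.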
Example 2. (Bienaymé-Galton-Watson Simple Branching Process). A simple branching process was introduced as Example 3.7. The state $X_n$ of the process at time $n$ represents the total number of offspring in the $n$th generation of a population that evolves by i.i.d. random replications of parent individuals. If the offspring distribution is denoted by $f$ then, as explained in Section 3, the one-step transition probabilities are given by

$$p_{ij} = \begin{cases} f^{*i}(j) & \text{if } i \ge 1,\ j \ge 0,\\ 1 & \text{if } i = 0,\ j = 0,\\ 0 & \text{if } i = 0,\ j \ne 0. \end{cases} \tag{11.44}$$

According to the values of $p_{ij}$ at the boundary, the state $i = 0$ is absorbing (permanent extinction). Write $\rho_{i0}$ for the probability that eventually extinction occurs given $X_0 = i$. Also write $\rho = \rho_{10}$. Then $\rho_{i0} = \rho^i$, since each of the $i$ sequences of generations arising from the $i$ initial particles has the same chance
$\rho$ of extinction, and the $i$ sequences evolving independently must all be extinct in order that there may be eventual extinction, given $X_0 = i$. If $f(0) = 0$, then $\rho = \rho_{10} = 0$ and extinction is impossible. If $f(0) = 1$, then $\rho_{i0} = \rho^i = 1$ and extinction is certain (no matter what $X_0$ is). To avoid these and other trivialities, we assume (unless otherwise specified) […] $> 0$ for $0 < z < 1$, $\phi$ is strictly increasing. Also, since $\phi''(z)$ (which exists and is finite for $0 \le z < 1$) satisfies

$$\frac{d^2 \phi(z)}{dz^2} = \sum_{j=2}^\infty j(j-1) f(j) z^{j-2} > 0 \quad \text{for } 0 < z < 1, \tag{11.50}$$

the function $\phi$ is strictly convex on $[0, 1]$. In other words, the line segment joining any two points on the curve $y = \phi(z)$ lies strictly above the curve (except at the two points joined). Because $\phi(0) = f(0) > 0$ and $\phi(1) = \sum_j f(j) = 1$, the graph of $\phi$ looks like that of Figure 11.1 (curve a or b). The maximum of $\phi'(z)$ is $\mu$, which is attained at $z = 1$. Hence, in the case $\mu > 1$, the graph of $y = \phi(z)$ must lie below that of $y = z$ near $z = 1$ and, because $\phi(0) = f(0) > 0$, must cross the line $y = z$ at a point $z_0$, $0 < z_0 < 1$. Since the slope of the curve $y = \phi(z)$ continuously increases as $z$ increases in $(0, 1)$, $z_0$ is the unique solution of the equation $z = \phi(z)$ that is smaller than 1.
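Numerically, the extinction probability $\rho$ can be found by iterating $z \mapsto \phi(z)$ from $z = 0$: the iterates $\phi(0), \phi(\phi(0)), \ldots$ equal $P_1(X_n = 0)$, $n = 1, 2, \ldots$ (a standard fact about compositions of generating functions, not derived in this passage), they increase, and they converge to the smallest root of $z = \phi(z)$ in $[0, 1]$. The offspring laws below are assumptions chosen so that the answers can be checked by hand: $f = (1/4, 1/4, 1/2)$ has $\mu = 5/4$ and $z = \phi(z)$ reduces to $2z^2 - 3z + 1 = 0$, so $z_0 = 1/2$; $f = (1/2, 1/4, 1/4)$ has $\mu = 3/4$ and $\rho = 1$.

```python
def extinction_probability(f, iters=500):
    """Iterate z -> phi(z) = sum_j f[j] z^j starting from z = 0."""
    z = 0.0
    for _ in range(iters):
        z = sum(fj * z ** j for j, fj in enumerate(f))
    return z


print(extinction_probability([0.25, 0.25, 0.5]))   # supercritical, mu = 1.25
print(extinction_probability([0.5, 0.25, 0.25]))   # subcritical, mu = 0.75
```

In the critical case $\mu = 1$ the iterates still converge to 1, but only slowly, so a fixed iteration count like this one is not a reliable stopping rule there.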
Figure 11.1 (graph of $y = \phi(z)$ for $0 \le z \le 1$, starting at $\phi(0) = f(0)$, with $z_0$ and 1 marked on the $z$-axis)
In case $\mu$ […] $1\}$ an i.i.d. sequence of real-valued […] Model).
A MARKOVIAN APPROACH TO LINEAR TIME SERIES MODELS
167
random variables defined on some probability space $(\Omega, \mathcal{F}, P)$. Given an initial random variable $X_0$ independent of $\{\varepsilon_n\}$, define recursively the sequence of random variables $\{X_n: n \ge 0\}$ as follows:

$$X_0, \qquad X_1 := bX_0 + \varepsilon_1, \qquad X_{n+1} := bX_n + \varepsilon_{n+1} \quad (n \ge 0). \tag{13.1}$$
As X0 , X 1 ... . , Xn are determined by {X 0 , s„ .... E n }, and E" + , is independent of the latter, one has, for all Bore! sets C, P(Xn+t E
Cl {X0 , X 1 ...
.. Xn}) = [P(bx + c,,1 E C)]x=x„
= [P(En+ ^ E C — bx)] =x„ = Q(C — bXn), (13.2)
where Q is the common distribution of the random variables t.• n . Thus {Xn : n >, 0} is a Markov process on the state space S = ER', having the transition probability (of going from x to C in one step) p(x, C):= Q(C — bx),
(13.3)
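and initial distribution given by the distribution of X_0. The analysis of this Markov process is, however, facilitated more by its representation (13.1) than by an analytical study of the asymptotics of n-step transition probabilities. Note that successive iteration in (13.1) yields

$$X_1 = bX_0 + \varepsilon_1, \quad X_2 = bX_1 + \varepsilon_2 = b^2X_0 + b\varepsilon_1 + \varepsilon_2, \ \ldots, \quad X_n = b^nX_0 + b^{n-1}\varepsilon_1 + b^{n-2}\varepsilon_2 + \cdots + b\varepsilon_{n-1} + \varepsilon_n \quad (n \ge 1). \tag{13.4}$$

Since ε_1, …, ε_n are i.i.d. and independent of X_0, the vector (X_0, ε_n, ε_{n−1}, …, ε_1) has the same distribution as (X_0, ε_1, ε_2, …, ε_n). The distribution of X_n is, therefore, the same as that of

$$Y_n := b^nX_0 + \varepsilon_1 + b\varepsilon_2 + b^2\varepsilon_3 + \cdots + b^{n-1}\varepsilon_n \quad (n \ge 1). \tag{13.5}$$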
Assume now that

$$|b| < 1 \tag{13.6}$$

and |ε_n| ≤ c with probability 1 for some constant c. Then it follows from (13.5) that

$$Y_n \to \sum_{n=0}^{\infty} b^n \varepsilon_{n+1} \quad \text{a.s.}, \tag{13.7}$$

regardless of X_0. Let π denote the distribution of the random variable on the right side in (13.7). Then Y_n converges in distribution to π as n → ∞ (Exercise 1). Because the distribution of X_n is the same as that of Y_n, it follows that X_n converges in distribution to π. Therefore, π is the unique invariant distribution for the Markov process {X_n}, i.e., for p(x, dy) (Exercise 1).
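The convergence in (13.7) can be checked by simulation. The following Monte Carlo illustration uses assumed choices that are not part of the text's argument: b = 0.5, ε_n uniform on [−1, 1] (so c = 1), and X_0 = 10. Under these choices π, the law of Σ b^n ε_{n+1}, has mean 0 and variance Var(ε_1)/(1 − b²) = (1/3)/0.75 = 4/9. The influence of X_0 after 60 steps is b^60 · 10, which is negligible.

```python
import random
import statistics

def simulate_ar1(b, x0, n_steps, n_paths, seed=0):
    """Return X_n (n = n_steps) from n_paths independent runs of
    X_{k+1} = b * X_k + eps_{k+1}, with bounded noise eps ~ Uniform[-1, 1]."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_paths):
        x = x0
        for _ in range(n_steps):
            x = b * x + rng.uniform(-1.0, 1.0)
        finals.append(x)
    return finals

b = 0.5                                   # |b| < 1, as in (13.6)
samples = simulate_ar1(b, x0=10.0, n_steps=60, n_paths=20000)

# pi = law of sum_{n>=0} b^n eps_{n+1}: mean 0, variance Var(eps_1)/(1 - b^2), Var(eps_1) = 1/3.
predicted_var = (1.0 / 3.0) / (1.0 - b * b)
print(round(statistics.fmean(samples), 3),
      round(statistics.pvariance(samples), 3),
      round(predicted_var, 3))
```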
The assumption that the random variable ε_1 is bounded can be relaxed. Indeed, it suffices to assume

$$\sum_{n=1}^{\infty} P(|\varepsilon_1| > c\delta^n) < \infty \quad \text{for some } \delta,\ 0 < \delta < 1/|b|, \text{ and some } c > 0. \tag{13.8}$$

For (13.8) is equivalent to assuming Σ_n P(|ε_{n+1}| > cδ^n) < ∞ so that, by the Borel-Cantelli Lemma (see Chapter 0, Section 6), P(|ε_{n+1}| ≤ cδ^n for all but finitely many n) = 1. This implies that, with probability 1, |b^n ε_{n+1}| ≤ c(|b|δ)^n for all but finitely many n. Since |b|δ < 1, the series on the right side of (13.7) is convergent and is the limit of Y_n. It is simple to check that (13.8) holds if |b| < 1 and (Exercise 3)

$$E|\varepsilon_1|^r < \infty \quad \text{for some } r > 0. \tag{13.9}$$

The conditions (13.6) and (13.8) (or (13.9)) are therefore sufficient for the existence of a unique invariant probability π and for the convergence of X_n in distribution to π.
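Boundedness of ε_1 is not needed, as this sketch with standard Cauchy noise shows. The Cauchy choice is hypothetical. A Cauchy variable ε_1 has no mean, but E|ε_1|^r < ∞ for every r < 1, so (13.9) holds. A sum of independent Cauchy variables with scales s_k is Cauchy with scale Σ s_k. Hence π is Cauchy with scale Σ_n |b|^n = 1/(1 − |b|), and the median of |X| under π equals that scale.

```python
import math
import random
import statistics

def ar1_cauchy_endpoints(b, x0, n_steps, n_paths, seed=0):
    """X_n (n = n_steps) for X_{k+1} = b * X_k + eps_{k+1} with standard Cauchy noise eps."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_paths):
        x = x0
        for _ in range(n_steps):
            eps = math.tan(math.pi * (rng.random() - 0.5))   # inverse-CDF draw of a standard Cauchy variable
            x = b * x + eps
        out.append(x)
    return out

b = 0.5
samples = ar1_cauchy_endpoints(b, x0=10.0, n_steps=60, n_paths=20000)
# Under pi (Cauchy with scale 1/(1 - |b|) = 2), the median of |X| is that scale.
print(round(statistics.median([abs(s) for s in samples]), 3), 1 / (1 - abs(b)))
```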
Next, Example 1 is extended to multidimensional state space.

Example 2. (General Linear Time Series Model). Let {ε_n: n ≥ 1} be a sequence of i.i.d. random vectors with values in ℝ^m and common distribution Q, and let B be an m × m matrix with real entries b_{ij}. Suppose X_0 is an m-dimensional random vector independent of {ε_n}. Define recursively the sequence of random vectors

$$X_0, \qquad X_{n+1} := BX_n + \varepsilon_{n+1} \quad (n = 0, 1, 2, \ldots). \tag{13.10}$$
As in (13.2), (13.3), {X_n} is a Markov process with state space ℝ^m and transition probability

$$p(x, C) := Q(C - Bx) \quad (\text{for all Borel sets } C \subset \mathbb{R}^m). \tag{13.11}$$

Assume that

$$\|B^{n_0}\| < 1 \quad \text{for some positive integer } n_0. \tag{13.12}$$

Recall that the norm of a matrix H is defined by

$$\|H\| := \sup_{|x| = 1} |Hx|, \tag{13.13}$$
where |x| denotes the Euclidean length of x in ℝ^m. For a positive integer n > n_0 write n = jn_0 + j', where 0 ≤ j' < n_0. Then using the fact ‖B_1B_2‖ ≤ ‖B_1‖ ‖B_2‖ for arbitrary m × m matrices B_1, B_2 (Exercise 2), one gets

$$\|B^n\| = \|B^{jn_0}B^{j'}\| \le \|B^{n_0}\|^j\,\|B^{j'}\| \le c\,\|B^{n_0}\|^j, \qquad c := \max\{\|B^r\| : 0 \le r < n_0\}. \tag{13.14}$$
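Condition (13.12) can hold even when ‖B‖ itself exceeds 1. This sketch uses a hypothetical 2 × 2 matrix B whose eigenvalues are both 0.5. It computes the norm (13.13) as the largest singular value and searches for the smallest n_0 with ‖B^{n_0}‖ < 1. The helper names are illustrative, and the closed-form singular-value formula applies only to 2 × 2 matrices.

```python
import math

def matmul2(A, B):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def op_norm2(M):
    """Euclidean operator norm sup_{|x|=1} |Mx| of a real 2 x 2 matrix (largest singular value).

    The squared singular values add up to the sum of squared entries F and multiply to det(M)^2,
    so the larger one is the larger root of t^2 - F t + det^2 = 0.
    """
    (p, q), (r, s) = M
    frob2 = p * p + q * q + r * r + s * s
    det = p * s - q * r
    return math.sqrt((frob2 + math.sqrt(max(frob2 * frob2 - 4 * det * det, 0.0))) / 2)

def smallest_contracting_power(B, max_power=1000):
    """Smallest n_0 with ||B^{n_0}|| < 1, as in (13.12); None if there is none up to max_power."""
    P = B
    for n in range(1, max_power + 1):
        if op_norm2(P) < 1:
            return n
        P = matmul2(P, B)
    return None

B = [[0.5, 2.0], [0.0, 0.5]]   # hypothetical matrix: both eigenvalues are 0.5, yet ||B|| > 2
print(round(op_norm2(B), 4), smallest_contracting_power(B))
```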
From (13.12) and (13.14) it follows, as in Example 1, that the series Σ B^n ε_{n+1} converges a.s. in Euclidean norm if (Exercise 3), for some c > 0,

$$\sum_{n=1}^{\infty} P(|\varepsilon_1| > c\delta^n) < \infty \quad \text{for some } \delta \ldots$$

[…]

$$\ldots = HX_n + \varepsilon_{n+1}, \tag{13.34}$$
where H is the (p + q) x (p + q) matrix
[The display of the (p + q) × (p + q) matrix H is garbled in the source. Legible fragments show b- and d-coefficients (including d_1 and d_{q−1}) and otherwise entries 0 and 1.]
the first p rows and p columns of H being the matrix B in (13.27). Note that U_0, …, U_{p−1}, η_{p−q}, …, η_{p−1} determine X_0, so that X_0 is independent of η_p and, therefore, of ε_1. It follows by induction that X_n and ε_{n+1} are independent. Hence {X_n} is a Markov process on the state space ℝ^{p+q}. In order to apply the Lemma above, expand det(H − λI) in terms of the elements of its pth row to get (Exercise 5)

$$\det(H - \lambda I) = \det(B - \lambda I)\,(-\lambda)^q. \tag{13.35}$$
Therefore, the eigenvalues of H are q zeros and the roots of (13.29). Thus, one has the following proposition.

Proposition 13.2. Under the hypothesis of Proposition 13.1, the ARMA(p, q) process {X_n} has a unique invariant distribution π, and X_n converges in distribution to π no matter what the initial distribution is.

As a corollary, the time series {U_n} converges in distribution to π_U, given for all Borel sets C ⊂ ℝ^1 by

$$\pi_U(C) := \pi(\{x \in \mathbb{R}^{p+q} : x^{(1)} \in C\}), \tag{13.36}$$

(x^{(1)} denoting the first coordinate of x), no matter what the distribution of (U_0, U_1, …, U_{p−1}) is, provided the hypothesis of Proposition 13.2 is satisfied. In the case that ε_n is Gaussian, it is simple to check that under the hypothesis (13.12) in Example 2 the random vector V in (13.16) is Gaussian. Therefore, π is Gaussian, so that the stationary vector-valued process {X_n} with initial distribution π is Gaussian (Exercise 6). In particular, if η_n are Gaussian in Example 2(a), and the roots of the polynomial equation (13.29) lie inside the unit circle in the complex plane, then the stationary process {U_n}, obtained
when (U_0, U_1, …, U_{p−1}) have distribution π in Example 2(a), is Gaussian. A similar assertion holds for Example 2(b).
14 MARKOV PROCESSES GENERATED BY ITERATIONS OF I.I.D. MAPS

The method of construction of Markov processes illustrated for linear time series models in Section 13 extends to more general Markov processes. The present section is devoted to the construction and analysis of some nonlinear models. Before turning to these models, note that one may regard the process {X_n} in Example 13.1 (see (13.1)) as generated by successive iterations of an i.i.d. sequence of random maps α_1, α_2, …, α_n, … defined by

$$x \mapsto \alpha_n x = bx + \varepsilon_n \quad (n \ge 1),$$

{ε_n: n ≥ 1} being a sequence of i.i.d. real-valued random variables. Each α_n is a random (affine linear) map on the state space ℝ^1 into itself. The Markov sequence {X_n} is defined by

$$X_n = \alpha_n \cdots \alpha_1 X_0 \quad (n \ge 1), \tag{14.1}$$

where the initial X_0 is a real-valued random variable independent of the sequence of random maps {α_n: n ≥ 1}. A similar interpretation holds for the other examples of Section 13. Indeed it may be shown, under a very mild condition on the state space, that every Markov process in discrete time may be represented as (14.1) (see theoretical complement 1). Thus the method of the last section and the present one is truly a general device for constructing and analyzing Markov processes on general state spaces.

Example 1. (Iterations of I.I.D. Increasing Maps). Let the state space be an interval J, finite or infinite. On some probability space (Ω, ℱ, P) is given a sequence of i.i.d. continuous and increasing random maps {α_n: n ≥ 1} on J into itself. This means first of all that for each ω ∈ Ω, α_n(ω) is a continuous and increasing (i.e., nondecreasing) function on J into J. Second, there exists a set Γ of continuous increasing functions on J into J such that P(α_n ∈ Γ) = 1 for all n; Γ has a sigmafield 𝒢(Γ) generated by sets of the form {γ ∈ Γ: a < γx < b}, where a < b and x are arbitrary elements of J, and γx denotes the value of γ at x. The maps α_n on Ω into Γ are measurable, i.e., F_n := {ω ∈ Ω: α_n(ω) ∈ D} ∈ ℱ for every D ∈ 𝒢(Γ). Also, P(F_n) is the same for all n. Finally, {α_n: n ≥ 1} are independent, i.e., the events {α_n ∈ D_n} form an independent sequence for every given sequence {D_n}. For any finite set of functions γ_1, γ_2, …, γ_k in Γ, one defines the composition γ_1γ_2⋯γ_k in the usual manner. For example, γ_1γ_2x = γ_1(γ_2x), the value of γ_1 at the point γ_2x.
For each x ∈ J define the sequence of random variables

$$X_0(x) := x, \qquad X_n(x) := \alpha_n X_{n-1}(x) = \alpha_n\alpha_{n-1}\cdots\alpha_1 x \quad (n \ge 1). \tag{14.2}$$

In view of the independence of {α_n: n ≥ 1}, {X_n(x): n ≥ 0} is a Markov process (Exercise 1) on J, starting at x and having the (one-step) transition probability

$$p(y, C) := P(\alpha_n y \in C) = \mu(\{\gamma \in \Gamma : \gamma y \in C\}) \quad (C \text{ Borel subset of } J), \tag{14.3}$$

where μ is the common distribution of α_n.
It will be shown now that the following condition guarantees the existence of a unique invariant probability π as well as stability, i.e., convergence of X_n(x) in distribution to π for every initial state x. Assume

$$\delta_1 := P(X_{n_0}(x) \le z_0\ \forall x) > 0 \quad \text{and} \quad \delta_2 := P(X_{n_0}(x) \ge z_0\ \forall x) > 0 \tag{14.4}$$

for some z_0 ∈ J and some integer n_0. Define

$$\Delta_n := \sup_{x, y, z \in J} |P(X_n(x) \le z) - P(X_n(y) \le z)|. \tag{14.5}$$
For the existence of a unique invariant probability π and for stability it is enough to show that Δ_n → 0 as n → ∞. For this implies that P(X_n(x) ≤ z) converges, uniformly in z ∈ J, to a distribution function (of a probability measure on J). To see this last fact, observe that X_{n+m}(x) = α_{n+m}⋯α_1x has the same distribution as α_n⋯α_1α_{n+m}⋯α_{n+1}x, so that

|P(X_{n+m}(x) ≤ z) − P(X_n(x) ≤ z)| […] 1. Then,

$$\begin{aligned}
|P(X_{(j+1)n_0}(x) \le z) - P(X_{(j+1)n_0}(y) \le z)|
&= \Big|E\Big(\mathbf 1_{\{\alpha_{(j+1)n_0}\cdots\alpha_{jn_0+1}X_{jn_0}(x) \le z\}} - \mathbf 1_{\{\alpha_{(j+1)n_0}\cdots\alpha_{jn_0+1}X_{jn_0}(y) \le z\}}\Big)\Big| \\
&= \Big|E\Big(\mathbf 1_{\{X_{jn_0}(x) \in (\alpha_{(j+1)n_0}\cdots\alpha_{jn_0+1})^{-1}(-\infty, z]\}} - \mathbf 1_{\{X_{jn_0}(y) \in (\alpha_{(j+1)n_0}\cdots\alpha_{jn_0+1})^{-1}(-\infty, z]\}}\Big)\Big|.
\end{aligned} \tag{14.13}$$

Let

$$F_3 := \{\alpha_{(j+1)n_0}\cdots\alpha_{jn_0+1}x \le z_0\ \forall x\}, \qquad F_4 := \{\alpha_{(j+1)n_0}\cdots\alpha_{jn_0+1}x \ge z_0\ \forall x\}.$$
Take z < z_0 first. Then the inverse image of (−∞, z] in (14.13) is empty on F_4, so that the difference between the two indicator functions vanishes on F_4. On the complement F_4^c of F_4, the inverse image of (−∞, z] under the continuous increasing map α_{(j+1)n_0}⋯α_{jn_0+1} is an interval (−∞, Z'] ∩ J, where Z' is a random variable. Therefore, (14.13) leads to

$$|P(X_{(j+1)n_0}(x) \le z) - P(X_{(j+1)n_0}(y) \le z)| = \Big|E\,\mathbf 1_{F_4^c}\big(\mathbf 1_{\{X_{jn_0}(x) \le Z'\}} - \mathbf 1_{\{X_{jn_0}(y) \le Z'\}}\big)\Big|. \tag{14.14}$$

As F_4 and Z' are determined by α_{(j+1)n_0}, …, α_{jn_0+1}, and the latter are independent of X_{jn_0}(x), X_{jn_0}(y), one gets, by taking conditional expectation given α_{(j+1)n_0}, …, α_{jn_0+1},

$$|P(X_{(j+1)n_0}(x) \le z) - P(X_{(j+1)n_0}(y) \le z)| \le E(\mathbf 1_{F_4^c})\,\Delta_{jn_0} = (1 - \delta_2)\,\Delta_{jn_0} \le \delta\,\Delta_{jn_0} \le \delta^{j+1}. \tag{14.15}$$

Similarly, if z ≥ z_0, the inverse image in (14.13) is J on F_3. Therefore, the difference between the two indicator functions in (14.13) vanishes on F_3, and one has (14.14), (14.15) with F_4, δ_2 replaced by F_3 and δ_1. As (14.12) holds for j = 1 (see (14.6)), the induction is complete and (14.12) holds for all j ≥ 1. Since δ < 1, it follows that under the hypothesis (14.4), Δ_n → 0 exponentially fast as n → ∞ and, therefore, there exists a unique invariant probability.

If J is a closed bounded interval [a, b], then the condition (14.4) is essentially necessary for stability. To see this, define

$$Y_0(x) := x, \qquad Y_n(x) := \alpha_1\alpha_2\cdots\alpha_n x \quad (n \ge 1). \tag{14.16}$$
Then Y_n(x) and X_n(x) have the same distribution. Also,

$$Y_1(a) = \alpha_1 a \ge a, \quad Y_2(a) = Y_1(\alpha_2 a) \ge Y_1(a), \ \ldots, \quad Y_{n+1}(a) = Y_n(\alpha_{n+1}a) \ge Y_n(a),$$

i.e., the sequence of random variables {Y_n(a): n ≥ 0} is increasing. Similarly, {Y_n(b): n ≥ 0} is decreasing. Let the limits of these two sequences be Y_* and Y^*, respectively. As Y_n(a) ≤ Y_n(b) for all n, Y_* ≤ Y^*. If P(Y_* < Y^*) > 0, then Y_* and Y^* cannot have the same distribution. In other words, Y_n(a) (and, therefore, X_n(a)) and Y_n(b) (and, therefore, X_n(b)) converge in distribution to different limits π_1, π_2, say. On the other hand, if Y_* = Y^* a.s., then these limiting distributions are the same. Also, Y_n(a) ≤ Y_n(x) ≤ Y_n(b) for all x, so that Y_n(x) converges in distribution to the same limit π, whatever x. Therefore, π is the unique invariant probability. Assume that π does not assign all its mass to a single point. That is, rule out the case that with probability 1 all γ's in Γ have a common fixed point. Then, writing Y for the common limit Y_* = Y^*, there exist m < M such that P(Y < m) > 0 and P(Y > M) > 0. There exists n_0 such that P(Y_{n_0}(b) < m) > 0 and P(Y_{n_0}(a) > M) > 0. Now any z_0 ∈ [m, M] satisfies (14.4).
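The squeezing argument can be observed numerically. In this sketch J = [0, 1], and each α_n is one of two hypothetical increasing maps, x ↦ x/2 and x ↦ x/2 + 1/2, chosen with probability 1/2 each. These maps satisfy (14.4) with n_0 = 1 and z_0 = 1/2. The code builds Y_n(x) = α_1⋯α_n x as in (14.16) from one fixed draw of the maps. It then checks that Y_n(a) increases, that Y_n(b) decreases, and that the gap closes; for these maps the gap is exactly 2^{−n}.

```python
import random

def gamma_low(x):
    """Hypothetical increasing map of [0, 1] onto [0, 1/2]."""
    return x / 2

def gamma_high(x):
    """Hypothetical increasing map of [0, 1] onto [1/2, 1]."""
    return x / 2 + 0.5

def backward_paths(n_max, seed=0, a=0.0, b=1.0):
    """Return ([Y_n(a)], [Y_n(b)]) for n = 0..n_max, where Y_n(x) = alpha_1 alpha_2 ... alpha_n x."""
    rng = random.Random(seed)
    alphas = [rng.choice((gamma_low, gamma_high)) for _ in range(n_max)]  # alphas[k] plays alpha_{k+1}

    def Y(n, x):
        for k in reversed(range(n)):   # apply alpha_n first and alpha_1 last
            x = alphas[k](x)
        return x

    return [Y(n, a) for n in range(n_max + 1)], [Y(n, b) for n in range(n_max + 1)]

lows, highs = backward_paths(40)
print(lows[-1], highs[-1], highs[-1] - lows[-1])   # the gap is 2**-40 for these maps
```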
As an application of Example 1, consider the following example from economics.

Example 1(a). (A Descriptive Model of Capital Accumulation). Consider an economy that has a single producible good. The economy starts with an initial stock X_0 = x > 0 of this good, which is used to produce an output Y_1 in period 1. The output Y_1 is not a deterministic function of the input x. In view of the randomness of the state of nature, Y_1 takes one of the values f_r(x) with probability p_r > 0 (1 ≤ r ≤ N) […] > 0 and f_r''(x) < 0.
(ii) lim_{x↓0} f_r(x) = 0, lim_{x↓0} f_r'(x) > 1, lim_{x↑∞} f_r'(x) = 0.
(iii) If r > r', then f_r(x) > f_{r'}(x) for all x > 0.
The strict concavity of f_r in (i) reflects a law of diminishing returns, while (iii) assumes an ordering of the technologies or production functions f_r, from the least productive f_1 to the most productive f_N. A fraction β (0 < β < 1) of the output Y_1 is consumed, while the rest (1 − β)Y_1 is invested for the production in the next period. The total stock X_1 at hand for investment in period 1 is θX_0 + (1 − β)Y_1. Here 0 < θ < 1 is the rate of depreciation of capital used in production. This process continues indefinitely, each time with an independent choice of the production function (f_r with probability p_r, 1 ≤ r ≤ N) […] > 0 if x < a_N. As a consequence, if the initial state x is in [a_1, a_N], then the process {X_n: n ≥ 0} remains in [a_1, a_N] forever. In this case, one may take J = [a_1, a_N] to be the effective state space. Also, if x > a_1, then the nth iterate of g_1, namely g_1^{(n)}(x), decreases as n increases. For if x > a_1, then g_1(x) < x, g_1^{(2)}(x) = g_1(g_1(x)) < g_1(x), etc. The limit of this decreasing sequence is a fixed point of g_1 (Exercise 3) and, therefore, must be a_1. Similarly, if x < a_N, then g_N^{(n)}(x) increases, as n increases, to a_N. In particular,

$$\lim_{n\to\infty} g_1^{(n)}(a_N) = a_1, \qquad \lim_{n\to\infty} g_N^{(n)}(a_1) = a_N.$$

Thus, there exists an integer n_0 such that

$$g_1^{(n_0)}(a_N) < g_N^{(n_0)}(a_1). \tag{14.22}$$

[Figure 14.1 (image not recoverable; surviving labels mark the points a_1 and a_N).]
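The text defining g_r and a_r is missing from this extract. Since X_1 = θX_0 + (1 − β)Y_1 with Y_1 = f_r(X_0), this sketch assumes g_r(x) = θx + (1 − β)f_r(x) and takes a_r to be the positive fixed point of g_r. The parameter choices are hypothetical: θ = β = 1/2, N = 2, and f_r(x) = A_r√x with A_1 = 1 and A_2 = 2, which satisfy (i) to (iii). With these choices the code finds the smallest n_0 in (14.22).

```python
import math

THETA, BETA = 0.5, 0.5   # hypothetical values of theta and of the consumed fraction beta
A = [1.0, 2.0]           # hypothetical technologies f_r(x) = A[r-1] * sqrt(x), r = 1, ..., N (N = 2)

def g(r, x):
    """Next-period stock theta*x + (1 - beta)*f_r(x) when technology r is drawn (assumed form)."""
    return THETA * x + (1 - BETA) * A[r - 1] * math.sqrt(x)

def positive_fixed_point(r):
    # a = theta*a + (1 - beta)*A_r*sqrt(a) with a > 0  <=>  sqrt(a) = (1 - beta)*A_r/(1 - theta)
    return ((1 - BETA) * A[r - 1] / (1 - THETA)) ** 2

def smallest_n0(max_n=10_000):
    """Smallest n with g_1^(n)(a_N) < g_N^(n)(a_1), the inequality (14.22)."""
    N = len(A)
    down, up = positive_fixed_point(N), positive_fixed_point(1)
    for n in range(1, max_n + 1):
        down, up = g(1, down), g(N, up)
        if down < up:
            return n
    return None

a1, aN = positive_fixed_point(1), positive_fixed_point(len(A))
print(a1, aN, smallest_n0())
```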
This means that if z_0 ∈ [g_1^{(n_0)}(a_N), g_N^{(n_0)}(a_1)], then

P(X_{n_0}(x) ≤ z_0 ∀x) ≥ P(α_n = g_1 for 1 ≤ n ≤ n_0) […] ψ(y) for all x ∈ ℝ, (14.25)

where ψ(y) := min{φ(y − z): a […] > 0. Then (see theoretical complement 6.1) it follows that this Markov process has a unique invariant probability with a density π(y) and that the distribution of X_n converges to π(y) dy, whatever the initial state.

The following example illustrates the dramatic difference between the cases when a density exists and when it does not.

Example 2. Let S = [−2, 2] and consider the Markov process X_{n+1} = f(X_n) + ε_{n+1} (n ≥ 0), X_0 independent of {ε_n}, where {ε_n} is an i.i.d. sequence with values in [−1, 1], and

$$f(x) = \begin{cases} x + 1 & \text{if } -2 \le x \le 0, \\ x - 1 & \text{if } 0 < x \le 2. \end{cases} \tag{14.26}$$
First let ε_n be Bernoulli, P(ε_n = 1) = 1/2 = P(ε_n = −1). Then, with X_0 ≡ x ∈ (0, 2],

$$X_1(x) = \begin{cases} x - 2 & \text{with probability } 1/2, \\ x & \text{with probability } 1/2, \end{cases}$$

and X_1(x − 2) has the same distribution as X_1(x). It follows that

$$P(X_2(x) = x - 2 \mid X_1(x) = x) = \tfrac12 = P(X_2(x) = x - 2 \mid X_1(x) = x - 2),$$
$$P(X_2(x) = x \mid X_1(x) = x) = \tfrac12 = P(X_2(x) = x \mid X_1(x) = x - 2).$$

In other words, X_1(x) and X_2(x) are independent and have the same two-point distribution π_x. It follows that {X_n(x): n ≥ 1} is i.i.d. with common distribution π_x. In particular, π_x is an invariant initial distribution. If x ∈ [−2, 0], then {X_n(x): n ≥ 1} is i.i.d. with common distribution π_{x+2}, assigning probabilities 1/2 and 1/2 to {x + 2} and {x}. Thus, there is an uncountable family of invariant initial distributions {π_x: 0 < x ≤ 1} ∪ {π_{x+2}: −1 ≤ x ≤ 0}. On the other hand, suppose ε_n is uniform on [−1, 1], i.e., has the density 1/2 on [−1, 1] and zero outside. Check that (Exercise 6) {X_{2n}(x): n ≥ 1} is an i.i.d.
sequence whose common distribution does not depend on x and has a density […] c) and the remainder x − c is invested for production in the next period. The stock produced in period 1 is X_1 = ε_1(X_0 − c) = ε_1(x − c), where ε_1 is a nonnegative random variable. Again, after consumption, X_1 − c is invested in production of a stock of ε_2(X_1 − c), provided X_1 > c. If X_1 ≤ c, the agent is ruined. In general,

$$X_0 = x, \qquad X_{n+1} = \varepsilon_{n+1}(X_n - c) \quad (n \ge 0), \tag{14.28}$$

where {ε_n: n ≥ 1} is an i.i.d. sequence of nonnegative random variables. The state space may be taken to be [0, ∞) with absorption at 0. The probability of survival of the economic agent, starting with an initial stock x > c, is

$$\rho(x) := P(X_n > c \text{ for all } n \ge 0 \mid X_0 = x). \tag{14.29}$$

If δ = P(ε_1 = 0) > 0, then it is simple to check that P(ε_n = 0 for some n ≥ 1) = 1 (Exercise 8), so that ρ(x) = 0 for all x. Therefore, assume

$$P(\varepsilon_1 > 0) = 1. \tag{14.30}$$
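From (14.28) one gets, by successive iteration,

$$X_{n+1} > c \iff X_n > c + \frac{c}{\varepsilon_{n+1}} \iff X_{n-1} > c + \frac{c}{\varepsilon_n} + \frac{c}{\varepsilon_n\varepsilon_{n+1}} \iff \cdots \iff X_0 = x > c + \frac{c}{\varepsilon_1} + \frac{c}{\varepsilon_1\varepsilon_2} + \cdots + \frac{c}{\varepsilon_1\varepsilon_2\cdots\varepsilon_{n+1}}.$$

Hence, on the set {ε_n > 0 for all n},

$$\{X_n > c \text{ for all } n\} = \Big\{x > c + \frac{c}{\varepsilon_1} + \cdots + \frac{c}{\varepsilon_1\cdots\varepsilon_n} \text{ for all } n\Big\} = \Big\{\sum_{n=1}^{\infty} \frac{1}{\varepsilon_1\varepsilon_2\cdots\varepsilon_n} \le \frac{x}{c} - 1\Big\},$$

so that

$$\rho(x) = P\Big(\sum_{n=1}^{\infty} \frac{1}{\varepsilon_1\varepsilon_2\cdots\varepsilon_n} \le \frac{x}{c} - 1\Big) \quad (x > c).$$

Suppose first that E log ε_1 exists and E log ε_1 < 0. Then, by the Strong Law of Large Numbers,

$$\frac{1}{n}\sum_{r=1}^{n} \log \varepsilon_r \to E\log\varepsilon_1 \quad \text{a.s.}$$

[…]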
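The successive-iteration criterion above can be checked path by path. This sketch uses hypothetical choices: c = 1, initial stocks uniform on [1.5, 10], and shocks ε_n uniform on [0.5, 3]. It runs (14.28) forward for N = 30 periods and compares survival with the criterion x > c + c Σ_{k=1}^{N} 1/(ε_1⋯ε_k). This is a numerical sanity check, not a proof.

```python
import random

def survives_forward(x, c, eps):
    """True iff X_0, ..., X_N all exceed c, where X_0 = x, X_{n+1} = eps_{n+1}(X_n - c), N = len(eps)."""
    if x <= c:
        return False
    for e in eps:
        x = e * (x - c)
        if x <= c:
            return False
    return True

def survives_by_threshold(x, c, eps):
    """Successive-iteration criterion: x > c + c * sum_{k=1}^{N} 1/(eps_1 ... eps_k)."""
    total, prod = 0.0, 1.0
    for e in eps:
        prod *= e
        total += 1.0 / prod
    return x > c + c * total

rng = random.Random(3)
c, N, trials = 1.0, 30, 2000
agree = survivors = 0
for _ in range(trials):
    x = rng.uniform(1.5, 10.0)                        # hypothetical initial stock x > c
    eps = [rng.uniform(0.5, 3.0) for _ in range(N)]   # hypothetical positive i.i.d. shocks
    fwd = survives_forward(x, c, eps)
    agree += fwd == survives_by_threshold(x, c, eps)
    survivors += fwd
print(agree, survivors)
```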
In particular, for all sufficiently large t, say t ≥ T,

$$\exp\{-t(H + \gamma)\} < p_t(X_0, \ldots, X_{t-1}) < \exp\{-t(H - \gamma)\}$$

with probability at least 1 − δ. Let R_t denote the set consisting of all words a of length t such that e^{−t(H+γ)} < p_t(a) < e^{−t(H−γ)}. Fix t larger than T. Let S_t = {a^{(1)}, a^{(2)}, …, a^{(N_t(ε))}}. The sum of the probabilities of the M_t(ε), say, words a of length t in R_t that are counted among the N_t(ε) words in S_t equals Σ_{a ∈ S_t ∩ R_t} p_t(a) ≥ ε − δ, by definition of N_t(ε). Therefore,

$$M_t(\varepsilon)\,e^{-t(H-\gamma)} \ge \sum_{a \in S_t \cap R_t} p_t(a) \ge \varepsilon - \delta. \tag{15.8}$$

Also, none of the elements of S_t has probability less than exp{−t(H + γ)}, since the set of all a with p_t(a) > exp{−t(H + γ)} contains R_t and has total probability larger than 1 − δ > ε. Therefore,

$$N_t(\varepsilon)\,e^{-t(H+\gamma)} \le 1. \tag{15.9}$$

Taking logarithms in (15.8), and using M_t(ε) ≤ N_t(ε), we have

$$\frac{\log N_t(\varepsilon)}{t} \ge H - \gamma + \frac{\log(\varepsilon - \delta)}{t}. \tag{15.10}$$

Again taking logarithms and now combining this with (15.9), we get

$$H - \gamma + \frac{\log(\varepsilon - \delta)}{t} \le \frac{\log N_t(\varepsilon)}{t} \le H + \gamma.$$

Since γ and δ are arbitrarily small, the proof is complete. ∎
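N_t(ε) can be computed exactly in the simplest special case, an i.i.d. binary source with P(1) = 1/5. This source is a Markov chain whose rows are all equal, and the choice is hypothetical. The sketch assumes, as the argument above suggests, that N_t(ε) is the smallest number of words of length t with total probability at least ε. Exact integer arithmetic avoids underflow. The result log N_t(ε)/t can then be compared with H = −Σ p log p, using natural logarithms.

```python
import math

def log_N(t, eps_num=1, eps_den=2):
    """log N_t(eps), eps = eps_num/eps_den, for a hypothetical i.i.d. source with P(1) = 1/5, P(0) = 4/5.

    A word with k ones has probability 4**(t - k) / 5**t, so the most probable words are those
    with the fewest ones; exact integers avoid floating-point underflow for large t.
    """
    total = 5 ** t
    target = total * eps_num             # require cum * eps_den >= total * eps_num
    cum, count = 0, 0
    for k in range(t + 1):
        w = 4 ** (t - k)                 # numerator of the probability of one word with k ones
        c = math.comb(t, k)              # number of such words
        if (cum + c * w) * eps_den >= target:
            shortfall = target - cum * eps_den
            needed = -(-shortfall // (w * eps_den))   # ceiling division
            return math.log(count + needed)
        cum += c * w
        count += c
    raise ValueError("eps must lie in (0, 1]")

p = 0.2
H = -(p * math.log(p) + (1 - p) * math.log(1 - p))   # entropy rate in nats
for t in (100, 500, 2000):
    print(t, round(log_N(t) / t, 4), round(H, 4))
```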
Returning to the problem of calculating the compression coefficient, first let us show that μ ≥ H/log M. Let δ > 0 and let H' = H − 2δ < H. For an arbitrary given code, let

$$J_t = \{a : a \text{ is a word of length } t \text{ and } c(a) \le tH'/\log M\}. \tag{15.11}$$

Then

$$\#J_t \le M + M^2 + \cdots + M^{[tH'/\log M]} \le M^{[tH'/\log M]}\Big(1 + \frac{1}{M} + \frac{1}{M^2} + \cdots\Big) \le e^{tH'}\,\frac{M}{M-1}, \tag{15.12}$$

since the number of code-words of length k is M^k. Now observe that

$$t\mu_t = E\,c(X_1, \ldots, X_t) \ge \frac{tH'}{\log M}\,P\{(X_1, \ldots, X_t) \notin J_t\} = \frac{tH'}{\log M}\big[1 - P\{(X_1, \ldots, X_t) \in J_t\}\big]. \tag{15.13}$$

Therefore,

$$\mu = \limsup_{t\to\infty} \mu_t \ge \frac{H'}{\log M}\,\limsup_{t\to\infty}\big[1 - P\{(X_1, \ldots, X_t) \in J_t\}\big]. \tag{15.14}$$
Now observe that for any positive number ε < 1, for the probability P({(X_1, …, X_t) ∈ J_t}) to exceed ε it is necessary that N_t(ε) be smaller than #J_t. In view of (15.12) this means that N_t(ε) […] 0). […] x = 0, 1, 2, …. Assume that A_{n+1} is independent of X_0, …, X_n. Calculate the transition probabilities for {X_n}.
[…] > 0 for all i, j ∈ S by filling in the details of the following steps. (i) For a fixed (i, j), let B_{ij} = {ν ≥ 1: p_{ij}^{(ν)} > 0}. Then for each state j, B_{jj} is closed under addition. (ii) (Basic Number Theory Lemma) If B is a set of positive integers having greatest common divisor 1 and if B is closed under addition, then there is an integer b such that n ∈ B for all n ≥ b. [Hints: (a) Let G be the smallest additive subgroup of ℤ that contains B. Then argue that G = ℤ since if d is the smallest positive integer in G it will follow that if n ∈ B, then, since n = qd + r, 0 ≤ r < d, […] > (α + β)², then, writing n = q(α + β) + r, 0 ≤ r < α + β, […] > 0 by virtue of n > (r + 1)(α + β).] (iii) For each (i, j) there is an integer b_{ij} such that ν ≥ b_{ij} implies ν ∈ B_{ij}. [Hint: Obtain b_{jj} from (ii) applied to (i) and then choose k such that p_{ij}^{(k)} > 0. Check that b_{ij} = k + b_{jj} suffices.] (iv) Check that ν = max{b_{ij}: i, j ∈ S} suffices for the statement of the exercise.
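Part (ii) can be explored numerically. The sketch below uses a hypothetical helper, `threshold`. It takes a finite set of generators with greatest common divisor 1 and forms their additive closure, which is the smallest set closed under addition that contains them. It then returns the least b such that every n ≥ b lies in the closure. For example, if returns to j are possible at times 3 and 5, then B_jj contains every n ≥ 8.

```python
import math
from functools import reduce

def threshold(generators):
    """Least b such that every n >= b is a sum of (one or more) elements of `generators`."""
    gens = sorted(set(generators))
    if reduce(math.gcd, gens) != 1:
        raise ValueError("greatest common divisor must be 1")
    # Comfortably above the classical bound (g_min - 1)(g_max - 1) - 1 on the largest gap.
    limit = gens[-1] ** 2 + gens[-1]
    in_closure = [False] * (limit + 1)
    for n in range(1, limit + 1):
        # n is in the closure iff n is a generator or n - g is already in the closure for some g
        in_closure[n] = any(n == g or (n > g and in_closure[n - g]) for g in gens)
    last_gap = max((n for n in range(1, limit + 1) if not in_closure[n]), default=0)
    return last_gap + 1

print(threshold([3, 5]), threshold([6, 10, 15]))   # largest gaps are 7 and 29
```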
5. Classify the states in Exercises 2.4, 2.5, 2.6, 2.8 as essential and inessential states. Decompose the essential states into their respective equivalence classes.

6. Let p be the transition matrix on S = {0, 1, 2, 3} defined below.

[The 4 × 4 matrix is garbled in the source; its legible entries are 0, 1/2 and 1.]

Show that S is a single class of essential states of period 2 and calculate p^n for all n.

7. Use the strong Markov property to prove that if j is inessential then P_i(X_n = j for infinitely many n) = 0.

8. Show by induction on N that all states communicate in the Top-In Card Shuffling example of Exercises 2.3(ii) and 4.5.
EXERCISES
Exercises for Section II.6

1. (i) Check that "deterministic motion going one step to the right" on S = {0, 1, 2, …} provides a simple example of a homogeneous Markov chain for which there is no invariant distribution. (ii) Check that the "static evolution" corresponding to the identity matrix provides a simple example of a 2-state Markov chain with more than one invariant distribution (see Exercise 3.5(i)). (iii) Check that "deterministic cyclic motion" of oscillations between states provides a simple example of a 2-state Markov chain having a unique invariant distribution π that is not the limiting distribution lim_{n→∞} P(X_n = j) for any other initial distribution μ ≠ π (see Exercise 3.5(ii)). (iv) Show by direct calculation for the case of a 2-state Markov chain having strictly positive transition probabilities that there is a unique invariant distribution that is the limiting distribution for any initial distribution. Calculate the precise exponential rate of convergence.

2. Calculate the invariant distribution for Exercise 2.5. Calculate the so-called equilibrium price of the commodity.

3. Calculate the invariant distribution for Exercise 2.8.

4. Calculate the invariant distribution for Exercise 2.3(ii) (see Exercise 5.8).

5. Suppose that n states a_1, a_2, …, a_n are arranged counterclockwise in a circle. A particle jumps one unit in the clockwise direction with probability p, 0 ≤ p ≤ 1, or one unit in the counterclockwise direction with probability q = 1 − p. Calculate the invariant distribution.

6. (i) (Coupling Bound) If X and Y are arbitrary real-valued random variables and J an arbitrary interval, then show that |P(X ∈ J) − P(Y ∈ J)| ≤ P(X ≠ Y). (ii) (Doeblin's Coupling Method) Let p be an aperiodic irreducible finite-state transition matrix with invariant distribution π. Let {X_n} and {Y_n} be independent Markov chains with transition law p and having respective initial distributions π and δ_i. Let T denote the first time n that X_n = Y_n.
(a) Show P(T < ∞) = 1. [Hint: Let ν = min{n: p_{ij}^{(n)} > 0 for all i, j ∈ S}. Argue that

$$P(T > k\nu) \le P(X_{k\nu} \ne Y_{k\nu} \mid X_{(k-1)\nu} \ne Y_{(k-1)\nu}, \ldots, X_\nu \ne Y_\nu) \cdots P(X_{2\nu} \ne Y_{2\nu} \mid X_\nu \ne Y_\nu)\,P(X_\nu \ne Y_\nu) \le (1 - N\delta^2)^k,$$

where δ = min_{i,j} p_{ij}^{(ν)} > 0, N = |S|.]
(b) Show that {(X_n, Y_n)} is a Markov chain on S × S and T is a stopping time for this Markov chain.
(c) Define {Z_n} to be the process obtained by observing {Y_n} until it meets {X_n} and then watching {X_n} from then on. Show that {Z_n} is a Markov chain with transition law p and initial distribution δ_i.
(d) Show |P(Z_n = j) − P(X_n = j)| ≤ P(T ≥ n).
(e) Show |p_{ij}^{(n)} − π_j| ≤ P(T ≥ n) ≤ (1 − Nδ²)^{[n/ν]}.
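Here is a simulation sketch of Exercise 6(ii). It uses a hypothetical 3-state matrix whose entries are all positive, so ν = 1 and δ = 0.2. The code estimates P(T > n) for the independent pair (X_n, Y_n) with X_0 ~ π and Y_0 = i. It compares that estimate with the exact distance max_j |p_{ij}^{(n)} − π_j| and with the bound (1 − Nδ²)^n from hint (a). Since Z_n = X_n on {T ≤ n}, the distance is at most P(T > n), which in turn is at most P(T ≥ n).

```python
import random

P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]            # hypothetical aperiodic irreducible transition matrix
S = list(range(len(P)))
delta = min(min(row) for row in P)

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in S) for j in S] for i in S]

def matpow(A, n):
    R = [[float(i == j) for j in S] for i in S]
    for _ in range(n):
        R = matmul(R, A)
    return R

pi = matpow(P, 200)[0]           # rows of p^(n) approach the invariant distribution pi

def coupling_tail(i, n_max, n_paths=20000, seed=1):
    """Estimate P(T > n), n = 0..n_max: X_0 ~ pi, Y_0 = i, independent moves with law P,
    T = first n with X_n = Y_n."""
    rng = random.Random(seed)
    survive = [0] * (n_max + 1)
    for _ in range(n_paths):
        x = rng.choices(S, weights=pi)[0]
        y = i
        for n in range(n_max + 1):
            if x == y:
                break                        # T = n, so T > m fails for every m >= n
            survive[n] += 1                  # X_m != Y_m for all m <= n, i.e. T > n
            x = rng.choices(S, weights=P[x])[0]
            y = rng.choices(S, weights=P[y])[0]
    return [c / n_paths for c in survive]

i, n_max = 0, 6
tail = coupling_tail(i, n_max)
for n in range(1, n_max + 1):
    dist = max(abs(matpow(P, n)[i][j] - pi[j]) for j in S)
    bound = (1 - len(S) * delta ** 2) ** n
    print(n, round(dist, 4), round(tail[n], 4), round(bound, 4))
```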
7. Let A = ((a_ij)) be an N × N matrix. Suppose that A is a transition probability matrix with strictly positive entries a_ij. (i) Show that the spectral radius, i.e., the magnitude of the largest eigenvalue of A, must be 1. [Hint: Check first that λ is an eigenvalue of A (in the sense that Ax = λx has a nonzero solution x = (x_1, …, x_N)') if and only if zA = λz has a nonzero solution z = (z_1, …, z_N); recall that det(B) = det(B') for any N × N matrix B.] (ii) Show that λ = 1 must be a simple eigenvalue of A (i.e., geometric multiplicity 1). [Hint: Suppose z is any (left) eigenvector corresponding to λ = 1. By the results of this section there must be an invariant distribution (positive eigenvector) π. For t sufficiently large z + tπ is also positive (and normalizable).]
*8. Let A = ((a_ij)) be an N × N matrix with positive entries. Show that the spectral radius is also given by min{λ > 0: Ax ≤ λx for some positive x}. [Hint: A and its transpose A' have the same eigenvalues (why?) and therefore the same spectral radius. A' is adjoint to A with respect to the usual (dot) inner product in the sense ⟨Ax, y⟩ = ⟨x, A'y⟩ for all x, y, where ⟨u, v⟩ = Σ_{i=1}^N u_i v_i. Apply the maximal property to the spectral radius of A'.]

9. Let p(x, y) be a continuous function on [c, d] × [c, d] with c < d. Assume that p(x, y) > 0 and ∫_c^d p(x, y) dy = 1. Let Ω denote the space of all sequences ω = (x_0, x_1, …) of numbers x_i ∈ [c, d]. Let ℱ_0 denote the class of all finite-dimensional sets A of the form A = {ω = (x_0, x_1, …) ∈ Ω: a_i < x_i < b_i, i = 0, 1, …, n}, where c ≤ a_i < b_i ≤ d for each i. Define P_x(A) for such a set A by

$$P_x(A) = \int_{a_1}^{b_1}\cdots\int_{a_n}^{b_n} p(x, y_1)\,p(y_1, y_2)\cdots p(y_{n-1}, y_n)\,dy_n\cdots dy_1 \quad \text{for } x \in [a_0, b_0].$$

Define P_x(A) = 0 if x < a_0 or x > b_0. The Kolmogorov Extension Theorem assures us that P_x has a unique extension to a probability measure defined for all events in the smallest sigmafield ℱ of subsets of Ω that contains ℱ_0. For any nonnegative integrable function γ with integral 1, define

$$P_\gamma(A) = \int P_x(A)\,\gamma(x)\,dx, \qquad A \in \mathcal F.$$

Let X_n denote the nth coordinate projection mapping on Ω. Then {X_n} is said to be a Markov chain on the state space S = [c, d] with transition density p(x, y) and initial density γ under P_γ. Under P_x the process is said to have initial state x. (i) Prove the Markov property for {X_n}; i.e., the conditional distribution of X_{n+1} given X_0, …, X_n is p(X_n, y) dy. (ii) Compute the distribution of X_n under P_γ. (iii) Show that under P_γ the conditional distribution of X_n given X_0 = x_0 is p^{(n)}(x_0, y) dy, where

$$p^{(n)}(x, y) = \int_c^d p^{(n-1)}(x, z)\,p(z, y)\,dz \quad \text{and} \quad p^{(1)} = p.$$

(iv) Show that if δ = inf{p(x, y): x, y ∈ [c, d]} > 0, then

$$\int_c^d |p(x, y) - p(z, y)|\,dy \le 2[1 - \delta(d - c)]$$

by breaking the integral into two terms involving y such that p(x, y) > p(z, y) and those y such that p(x, y) ≤ p(z, y). (v) Show that there is a continuous strictly positive function π(y) such that

$$\max\{|p^{(n)}(x, y) - \pi(y)| : c \le x, y \le d\} \le [1 - \delta(d - c)]^{n-1}\rho,$$

where ρ = max{|p(x, y) − p(z, y)|: c ≤ x, y, z ≤ d} < ∞. (vi) Prove that π is an invariant distribution and moreover that under the present conditions it is unique. (vii) Show that P(X_n ∈ (a, b) i.o.) = 1, for any c […] m).]
10. (Random Walk on the Circle) Let {X_n} be an i.i.d. sequence of random variables taking values in [0, 1] and having a continuous p.d.f. f(x). Let {S_n} be the process on [0, 1) defined by

$$S_n = x + X_1 + \cdots + X_n \mod 1,$$

where x ∈ [0, 1). (i) Show that {S_n} is a Markov chain and calculate its transition density. (ii) Describe the time asymptotic behavior of p^{(n)}(x, y). (iii) Describe the invariant distribution.
11. (The Umbrella Problem) A person who owns r umbrellas distributes them between home and office according to the following routine. If it is raining upon departure from either place, an event that has probability p, say, then an umbrella is carried to the other location if available at the location of departure. If it is not raining, then an umbrella is not carried. Let X_n denote the number of available umbrellas at whatever place the person happens to be departing on the nth trip. (i) Determine the transition probability matrix and the invariant distribution (equilibrium). (ii) Let 0 < α < 1. How many umbrellas should the person own so that the probability of getting wet under the equilibrium distribution is at most α against a climate (p)? What number works against all possible climates for the probability α?
Exercises for Section II.7

1. Calculate the invariant distributions for Exercise 2.6.

2. Calculate the invariant distribution for Exercise 5.6.

3. A transition matrix p is called doubly stochastic if its transpose p' is also a transition matrix; i.e., if the elements of each column add to 1. (i) Show that the vector consisting entirely of 1's is invariant under p and can be normalized to a probability distribution if S is finite. (ii) Under what additional conditions is this distribution the unique invariant distribution?

4. (i) Suppose that π = (π_i) is an invariant distribution for p. The distribution P_π of the process is called time-reversible if π_i p_ij = π_j p_ji for all i, j ∈ S [π is often said to be time-reversible (with respect to p) as well]. Show that if S is finite and p is doubly stochastic, then the (discrete) uniform distribution makes the process time-reversible if and only if p is symmetric. (*ii) Suppose that {X_n} is a Markov chain with invariant distribution π and started in π. Then {X_n} is a stationary process and therefore has an extension backward in time to n = 0, ±1, ±2, …. [Use Kolmogorov's Extension Theorem.] Define the time-reversed process by Y_n = X_{−n}. Show that the reversed process {Y_n} is a Markov chain with 1-step transition probabilities q_ij = π_j p_ji/π_i. (iii) Show that under the time-reversibility condition (i), the processes in (ii), {Y_n} and {X_n}, have the same distribution; i.e., in equilibrium a movie of the evolution looks the same statistically whether run forward or backward in time.
(iv) Show that an irreducible Markov chain on a state space S with an invariant initial distribution π is time-reversible if and only if (Kolmogorov Condition):

$$p_{ii_1}\,p_{i_1i_2}\cdots p_{i_ki} = p_{ii_k}\,p_{i_ki_{k-1}}\cdots p_{i_1i} \quad \text{for all } i, i_1, \ldots, i_k \in S,\ k \ge 1.$$

(v) If there is a j ∈ S such that p_ij > 0 for all i ≠ j in (iv), then for time-reversibility it is both necessary and sufficient that p_ij p_jk p_ki = p_ik p_kj p_ji for all i, j, k.

5. (Random Walk on a Tree) A tree graph on r vertices v_1, v_2, …, v_r is a connected graph that contains no cycles. [That is, there is given a collection of unordered pairs of distinct vertices (called edges) with the following property: Any two distinct vertices u, v ∈ S are uniquely connected in the sense that there is a unique sequence e_1, e_2, …, e_n of edges e_i = {v_{k_i}, v_{m_i}} such that u ∈ e_1, v ∈ e_n, e_i ∩ e_{i+1} ≠ ∅, i = 1, …, n − 1.] The degree ν_i of the vertex v_i represents the number of vertices adjacent to v_i, where u, v ∈ S are called adjacent if there is an edge {u, v}. By a tree random walk on a given tree graph we mean a Markov chain on the state space S = {v_1, v_2, …, v_r} that at each time step n changes its state v_i to one of its ν_i adjacent states, selected at random with equal probabilities and independently of its states prior to time n. (i) Explain why such a Markov chain must have a unique invariant distribution. (ii) Calculate the invariant distribution in terms of the vertex degrees ν_i, i = 1, …, r. (iii) Show that the invariant distribution makes the tree random walk time-reversible.

6. Let {X_n} be a Markov chain on S and define Y_n = (X_n, X_{n+1}), n = 0, 1, 2, …. (i) Show that {Y_n} is a Markov chain on S' = {(i, j) ∈ S × S: p_ij > 0}. (ii) Show that if {X_n} is irreducible and aperiodic then so is {Y_n}. (iii) Show that if {X_n} has invariant distribution π = (π_i) then {Y_n} has invariant distribution (π_i p_ij).

7. Let {X_n} be an irreducible Markov chain on a finite state space S. Define a graph G having states of S as vertices with edges joining i and j if and only if either p_ij > 0 or p_ji > 0. (i) Show that G is connected; i.e., for any two sites i and j there is a path of edges from i to j. (ii) Show that if {X_n} has an invariant distribution π then for any A ⊂ S,

$$\sum_{i \in A}\sum_{j \in S\setminus A} \pi_i\,p_{ij} = \sum_{i \in A}\sum_{j \in S\setminus A} \pi_j\,p_{ji}$$

(i.e., the net probability flux across a cut of S into complementary subsets A, S\A is in balance). (iii) Show that if G contains no cycles (i.e., is a tree graph in the sense of Exercise 5), then the process is time-reversible started with π.
Exercises for Section II.8

1. Verify (8.10) using summation by parts as indicated. [Hint: Let Z be nonnegative integer-valued. Then

$$\sum_{r=0}^{\infty} P(Z > r) = \sum_{r=0}^{\infty}\sum_{n=r+1}^{\infty} P(Z = n).$$

Now interchange the order of summation.]

2. Classify the states in Examples 3.1 to 3.7 as transient or recurrent.

3. Show that if j is transient and i → j, then Σ_{n=0}^∞ p_{ij}^{(n)} < ∞ and, in particular, p_{ij}^{(n)} → 0 as n → ∞. [Hint: Represent N(j) as a sum of indicator variables and use (8.11).]
4. Classify the states for the models in Exercises 2.6, 2.7, 2.8, 2.9 as transient or recurrent. 5. Classify the states for $\{R_n\} = \{|S_n|\}$, where $S_n$ is the simple symmetric random walk starting at 0 (see Exercise 1.8). 6. Show that inessential states are transient. 7. (A Birth or Collapse Model) Let $p_{i,i+1} = \frac{1}{i+1}$, $p_{i,0} = \frac{i}{i+1}$, $i = 0, 1, 2, \ldots$. Determine whether $p$ is transient or recurrent. 8. Solve Exercise 7 when $p_{i,0} = \frac{1}{i+1}$, $p_{i,i+1} = \frac{i}{i+1}$, $i \ge 1$, $p_{0,1} = 1$. 9. Let $p_{i,i+1} = p$, $p_{i,0} = q$, $i = 0, 1, 2, \ldots$. Classify the states of $S = \{0, 1, 2, \ldots\}$ as transient or recurrent ($0 < p < 1$, $q = 1 - p$).
10. Fix $i, j \in S$. Write $r_n = P_i(X_n = j) = p_{ij}^{(n)}$ $(n \ge 1)$, %=P i (X,,,^jform 1). (ii) Sum (i) over $n$ to give an alternative proof of (8.11). (iii) Use (i) to indicate how one may compute the distribution of the first visit to state $j$ (after time zero), starting in state $i$, in terms of $p_{ij}^{(n)}$ $(n \ge 1)$.
11. (i) Show that if $\|\cdot\|$ and $\|\cdot\|_0$ are any two norms on $\mathbb{R}^k$, then there are positive constants $c_1, c_2$ such that $c_1\|x\| \le \|x\|_0 \le c_2\|x\|$ for all $x$. (ii) Verify (9.8) under the assumption (9.4). [Hint: Show that $P(|Y_n| > \varepsilon \text{ i.o.}) = 0$ for every $\varepsilon > 0$.] 2. Verify for (9.12) that $\lim_{n\to\infty} n/N_n = E(\tau_j^{(2)} - \tau_j^{(1)})$ provided $E_j(\tau_j^{(1)}) < \infty$.
3. Prove $\lim_{n\to\infty} n/N_n = \infty$ $P_i$-a.s. for a null-recurrent state $j$ such that $i \leftrightarrow j$. 4. Show that positive and null recurrence are class properties. 5. Let $\{X_n\}$ be an irreducible aperiodic positive recurrent Markov chain having transition matrix $p$ on a denumerable state space $S$. Define a transition matrix $q$ on $S \times S$ by $q_{(i,j),(k,l)} = p_{ik}p_{jl}$, $(i, j), (k, l) \in S \times S$. Let $\{Z_n\}$ be a Markov chain on $S \times S$ with transition law $q$. Define $T_D = \inf\{n : Z_n \in D\}$, where $D = \{(j, j) : j \in S\} \subset S \times S$. Show that if $P_{(i,j)}(T_D < \infty) = 1$ for all $i, j$ then for all $i, j \in S$, $\lim_{n\to\infty} p_{ij}^{(n)} = \pi_j$, where $\{\pi_j\}$ is the unique invariant distribution. [Hint: Use the coupling method described in Exercise 6.6 for finite state space.]
6. (General Birth–Collapse) Let $p$ be a transition probability matrix on $S = \{0, 1, 2, \ldots\}$ of the form $p_{i,i+1} = p_i$, $p_{i,0} = 1 - p_i$, $i = 0, 1, 2, \ldots$, O
$\frac{1}{2}$ for $i \ge 1$. Explain why for fixed $i > 0$,
1 ^ j e S, n,„-,
does not converge to the invariant distribution $\delta_0(\{j\})$ (as $n \to \infty$). How can this be modified to get convergence? 14. (Iterated Averaging)
(i) Let $a_1, a_2, a_3$ be three numbers. Define $a_4 = (a_1 + a_2 + a_3)/3$, $a_5 = (a_2 + a_3 + a_4)/3$, .... Show that $\lim_{n\to\infty} a_n = (a_1 + 2a_2 + 3a_3)/6$. (ii) Let $p$ be an irreducible, aperiodic, positive recurrent transition law and let $a_1, a_2, \ldots$ be any bounded sequence of numbers. Show that $\lim_{n\to\infty} \sum_j p_{ij}^{(n)} a_j = \sum_j a_j \pi_j$, where $(\pi_j)$ is the invariant distribution of $p$. Show that the result of (i) is a special case.
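Part (i) of Exercise 14 is easy to test numerically. Each averaging step leaves the quantity $a_n + 2a_{n+1} + 3a_{n+2}$ unchanged, and that is where the weights 1, 2, 3 come from. Here is a minimal sketch (the function name is hypothetical):

```python
def iterated_average(a1, a2, a3, steps=200):
    """Apply a_{n+3} = (a_n + a_{n+1} + a_{n+2}) / 3 repeatedly and return the newest term."""
    window = (a1, a2, a3)
    for _ in range(steps):
        # Slide the window: drop the oldest term, append the average of the three.
        window = (window[1], window[2], sum(window) / 3)
    return window[-1]

for a1, a2, a3 in [(0.0, 0.0, 6.0), (1.0, -2.0, 5.0)]:
    predicted = (a1 + 2 * a2 + 3 * a3) / 6
    print(iterated_average(a1, a2, a3), predicted)
```

The error shrinks geometrically, because the other roots of $3x^3 = x^2 + x + 1$ have modulus $1/\sqrt{3}$. A couple of hundred steps is therefore far more than enough.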
Exercises for Section II.10
1. (i) Let $Y_1, Y_2, \ldots$ be i.i.d. with $EY_1^2 < \infty$. Show that $\max(Y_1, \ldots, Y_n)/\sqrt{n} \to 0$ a.s. as $n \to \infty$. [Hint: Show that $P(Y_n^2 > n\varepsilon \text{ i.o.}) = 0$ for every $\varepsilon > 0$.] (ii) Verify that $n^{-1/2}S_n$ has the same limiting distribution as (10.6).
2. Let $\{W_n(t) : t \ge 0\}$ be the path process defined in (10.7). Let $t_1 < t_2 < \cdots < t_k$, $k \ge 1$, be an arbitrary finite set of time points. Show that $(W_n(t_1), \ldots, W_n(t_k))$ converges in distribution as $n \to \infty$ to the multivariate Gaussian distribution with mean zero and variance–covariance matrix $((D\min\{t_i, t_j\}))$, where $D$ is defined by (10.12). 3. Suppose that $\{X_n\}$ is a Markov chain with state space $S = \{1, 2, \ldots, r\}$ having unique invariant distribution $(\pi_i)$. Let N(i)= #{k_ 1, in terms of the parameters p, o and Poi 2. Let $\{X_n\}$ be a three-state Markov chain on $S = \{0, 1, 2\}$, where 0, 1, 2 are arranged counterclockwise on a circle, and at each time a transition occurs one unit clockwise with probability $p$ or one unit counterclockwise with probability $1 - p$. Let $\tau_0$ denote the time of the first return to 0. Calculate $P(\tau_0 > n)$, $n \ge 1$. 3. Let $\tau_0$ denote the first time, starting in state 2, that the Markov chain in Exercise 5.6 reaches state 0. Calculate $P_2(\tau_0 > n)$.
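Exercise 2 lends itself to a taboo-probability computation. Here $P(\tau_0 > n)$ is the total probability of paths that leave 0 and then avoid 0 for $n - 1$ further steps. Since 0, 1, 2 are placed counterclockwise, the sketch assumes that a clockwise move takes $i$ to $i - 1 \pmod 3$. The function name is hypothetical.

```python
def tail_first_return(p, n):
    """P(tau_0 > n), n >= 1, for the three-state circular chain started at 0.

    Assumed convention: a clockwise move takes i to i - 1 (mod 3) and has
    probability p; a counterclockwise move takes i to i + 1 (mod 3).
    """
    def step(i, j):
        if j == (i - 1) % 3:
            return p
        if j == (i + 1) % 3:
            return 1 - p
        return 0.0

    mass = {j: step(0, j) for j in (1, 2)}          # after the first step, X_1 != 0
    for _ in range(n - 1):                          # n - 1 more steps that avoid 0
        mass = {k: sum(mass[j] * step(j, k) for j in (1, 2)) for k in (1, 2)}
    return sum(mass.values())

print([round(tail_first_return(0.3, n), 4) for n in range(1, 6)])
```

As a sanity check, take $p = 1/2$. Each step after the first then avoids 0 with probability 1/2, so the output should equal $2^{-(n-1)}$.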
4. Verify that the Markov chains starting at $i$ having transition probabilities $p$ and $\tilde p$, and viewed up to time $\tau_A$, have the same distribution by calculating the probabilities of the event $\{X_0 = i, X_1 = i_1, \ldots, X_m = i_m, \tau_A = m\}$ under each of $p$ and $\tilde p$. 5. Write out a detailed explanation of (11.22). 6. Explain the calculation of (11.28) and (11.29) as given in the text using earlier results on the long-term behavior of transition probabilities. 7. (Collocation) Show that there is a unique polynomial $p(x)$ of degree at most $k$ that takes prescribed (given) values $v_0, v_1, \ldots, v_k$ at any prescribed (given) distinct points $x_0, x_1, \ldots, x_k$, respectively; such a polynomial is called a collocation polynomial. [Hint: Write down a linear system with the coefficients $a_0, a_1, \ldots, a_k$ of $p(x)$ as the unknowns. To show the system is nonsingular, view the determinant as a polynomial and identify all of its zeros.] *8. (Absorption Rates and the Spectral Radius) Let $p$ be a transition probability matrix for a finite-state Markov chain and let $\tau_j$ be the time of the first visit to $j$. Use (11.4) and the results of the Perron–Frobenius Theorem 6.4 and its corollary to show that exponential rates of convergence (as obtained in (11.43)) can be anticipated more generally.
9. Let p be the transition probability matrix on S = {0, ± 1, ±2, ...} defined by ifi>0,j=0,1,2,...,i-1 ifi 0 is given by
=, (i /k).
10. Let $\{X_n\}$ be the simple branching process on $S = \{0, 1, 2, \ldots\}$ with offspring distribution $\{f_j\}$, $\sum_j j f_j \le 1$. (i) Show that all nonzero states in $S$ are transient and that $\lim_{n\to\infty} P_1(X_n = k) = 0$, $k = 1, 2, \ldots$.
(ii) Describe the unique invariant probability distribution for $\{X_n\}$. 11. (i) Suppose that in a certain society each parent has exactly two children, and both males and females are equally likely to occur. Show that passage of the family surname to descendants of males eventually stops. (ii) Calculate the extinction probability for the male lineage as in (i) if each parent has exactly three children. (iii) Prompted by an interest in the survival of family surnames, A. J. Lotka (1939), "Théorie Analytique des Associations Biologiques II," Actualités Scientifiques et Industrielles (N. 780), Hermann et Cie, Paris, used data for white males in the United States in 1920 to estimate the probability function $f$ for the number of male children of a white male. He estimated $f(0) = 0.4825$, $f(j) = (0.2126)(0.5893)^{j-1}$ $(j = 1, 2, \ldots)$.
(a) Calculate the mean number of male offspring. (b) Calculate the probability of survival of the family surname if there is only one male with the given name. (c) What is the survival probability for $j$ males with the given name under this model? (iv) (Maximal Branching) Consider the following modification of the simple branching process. If there are $k$ individuals in the $n$th generation, and $X_{n,1}, X_{n,2}, \ldots, X_{n,k}$ are independent random variables representing their respective numbers of offspring, then the $(n+1)$st generation will contain $Z_{n+1} = \max\{X_{n,1}, X_{n,2}, \ldots, X_{n,k}\}$ individuals. In terms of the survival of family names, one may assume that in this model only the son providing the largest number of grandsons is entitled to inherit the family title (due to J. Lamperti (1970), "Maximal Branching Processes and Long-Range Percolation," J. Appl. Probability, 7, pp. 89–98). (a) Calculate the transition law for the successive sizes of the generations when the offspring distribution function is $F(x)$, $x = 0, 1, 2, \ldots$. (b) Consider the case $F(x) = 1 - (1/x)$, $x = 1, 2, 3, \ldots$. Show that $\lim_{k\to\infty} P(Z_{n+1} \le kx \mid Z_n = k) = e^{-1/x}$, $x > 0$.
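For part (iii) of Exercise 11, Lotka's $f$ has a linear fractional generating function, $\varphi(s) = 0.4825 + 0.2126\,s/(1 - 0.5893\,s)$. The equation $s = \varphi(s)$ therefore reduces to a quadratic, and its smaller root is the extinction probability. The sketch below is only a numerical aid under that model, and the variable names are ours. The published coefficients sum to slightly more than 1, by about $1.5 \times 10^{-4}$ (presumably from rounding). As a result, the second root of the quadratic lies just below 1 rather than exactly at 1.

```python
import math

F0, A, R = 0.4825, 0.2126, 0.5893   # Lotka: f(0) = F0, f(j) = A * R**(j - 1) for j >= 1

def pgf(s):
    """Offspring generating function phi(s) = F0 + A*s / (1 - R*s), valid for |s| < 1/R."""
    return F0 + A * s / (1 - R * s)

# (a) Mean number of male children: sum_j j * A * R**(j - 1) = A / (1 - R)**2.
mean = A / (1 - R) ** 2

# s = phi(s) is equivalent (multiply by 1 - R*s) to R*s**2 - (1 + F0*R - A)*s + F0 = 0.
b = 1 + F0 * R - A
disc = b * b - 4 * R * F0
extinction = (b - math.sqrt(disc)) / (2 * R)   # smaller root = extinction probability

def survival(j=1):
    """(b), (c): probability that at least one of j independent male lines survives."""
    return 1 - extinction ** j

print(round(mean, 4), round(extinction, 4), round(survival(1), 4))
```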
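12. Let $f$ be the offspring distribution function for a simple branching process having finite second moment. Let $\mu = \sum_k k f(k)$, $\sigma^2 = \sum_k (k - \mu)^2 f(k)$. Show that, given $X_0 = 1$,
$\operatorname{Var} X_n = \sigma^2\mu^{n-1}(\mu^n - 1)/(\mu - 1)$ if $\mu \ne 1$, and $\operatorname{Var} X_n = n\sigma^2$ if $\mu = 1$.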
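One way to check the formula of Exercise 12 is through the recursion $\operatorname{Var} X_{n+1} = \mu^2 \operatorname{Var} X_n + \sigma^2\mu^n$. It follows from the conditional variance formula, because $E(X_{n+1} \mid X_n) = \mu X_n$ and $\operatorname{Var}(X_{n+1} \mid X_n) = \sigma^2 X_n$. The sketch below (hypothetical function names) iterates the recursion and compares the result with the closed form. It is a numerical check, not a proof.

```python
def branching_variances(mu, sigma2, n_max):
    """[Var X_1, ..., Var X_{n_max}] for X_0 = 1, via Var X_{n+1} = mu**2 Var X_n + sigma2 * mu**n."""
    var, out = 0.0, []                           # Var X_0 = 0
    for n in range(n_max):
        var = mu ** 2 * var + sigma2 * mu ** n   # uses E X_n = mu**n
        out.append(var)
    return out

def closed_form(mu, sigma2, n):
    """The expression to be proved in Exercise 12."""
    if mu == 1:
        return n * sigma2
    return sigma2 * mu ** (n - 1) * (mu ** n - 1) / (mu - 1)

for mu in (0.7, 1.0, 1.3):
    print(mu, branching_variances(mu, 2.0, 5), [closed_form(mu, 2.0, n) for n in range(1, 6)])
```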
13. Each of the following distributions below depends on a single parameter. Construct graphs of the nonextinction probability and the expected sizes of the successive generations as a function of the parameter.
(i) $f(j) = p$ if $j = 2$, $f(j) = q$ if $j = 0$, $f(j) = 0$ otherwise; (ii) $f(j) = qp^j$, $j = 0, 1, 2, \ldots$; (iii) $f(j) = \lambda^j e^{-\lambda}/j!$, $j = 0, 1, 2, \ldots$.
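Extinction probabilities for Exercise 13, and for part (ii) of Exercise 14, can be computed from a basic fact. With $\varphi$ the offspring generating function, $P(X_n = 0 \mid X_0 = 1) = \varphi^{(n)}(0)$ increases to the smallest root of $s = \varphi(s)$ in $[0, 1]$. The sketch assumes that family (iii) is Poisson with mean $\lambda$, and the function names are ours. The expected generation sizes are simply $\mu^n$, with $\mu = 2p$, $p/q$ and $\lambda$ for the three families.

```python
import math

def extinction_probability(pgf, iterations=100_000):
    """Smallest root of s = pgf(s) in [0, 1], as the limit of the increasing iterates pgf^(n)(0)."""
    s = 0.0
    for _ in range(iterations):
        s = pgf(s)
    return s

def two_or_none(p):      # (i)   f(2) = p, f(0) = q = 1 - p
    return lambda s: (1 - p) + p * s * s

def geometric(p):        # (ii)  f(j) = q p**j, j = 0, 1, 2, ...
    return lambda s: (1 - p) / (1 - p * s)

def poisson(lam):        # (iii) assumed Poisson: f(j) = lam**j e**(-lam) / j!
    return lambda s: math.exp(lam * (s - 1))

for lam in (0.8, 1.2, 1.5, 2.0):
    print(lam, 1 - extinction_probability(poisson(lam)))   # nonextinction probability

survival_electron = 1 - extinction_probability(poisson(1.01))
```

Near criticality the iteration converges slowly, because the contraction factor at the fixed point is close to 1. That is why a generous iteration count is used. For the electron multiplier with mean 1.01, the survival probability comes out a little under 0.02.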
14. (Electron Multiplier) A weak current of electrons may be amplified by a device consisting of a series of plates. Each electron, as it strikes a plate, gives rise to a random number of electrons, which go on to strike the next plate to produce more electrons, and so on. Use the simple branching process with a Poisson offspring distribution for the numbers of electrons produced at successive plates. (i) Calculate the mean and variance in the amplification of a single electron at the $n$th plate (see Exercise 12). (ii) Calculate the survival probability of a single electron in an infinite succession of plates if $\mu = 1.01$. 15. (A Generalized Gambler's Ruin) Gamblers 1 and 2 have respective initial capitals $i > 0$ and $c - i > 0$ (whole numbers) of dollars. The gamblers engage in a series of fair bets (in whole numbers of dollars) that stops when and only when one of the gamblers goes broke. Let $X_n$ denote gambler 1's capital at the $n$th play ($n \ge 1$), $X_0 = i$. Gambler 1 is allowed to select a different game (to bet) at each play. Each game must be fair in the sense that $E(X_n \mid X_{n-1}) = X_{n-1}$, $n = 1, 2, \ldots$, and the amounts wagered must be covered by the current capitals of the respective gamblers. Assume that gambler 1's selection of a game for the $(n+1)$st play ($n \ge 0$) depends only on the current capital $X_n$, so that $\{X_n\}$ is a Markov chain with stationary transition probabilities. Also assume that $P(X_{n+1} = i \mid X_n = i) < 1$, $0 < i < c$, although it may be possible to break even in a play. Calculate the probability that gambler 1 will eventually go broke. How does this compare to the classical gambler's ruin (win or lose one-dollar bets placed on outcomes of fair coin tosses)? [Hint: Check that $\alpha_c(i) = i/c$, $0 \le i \le c$, and $\alpha_0(i) = (c - i)/c$, $0 \le i \le c$, (uniquely) solve (11.13), using the equation $E(X_n \mid X_{n-1}) = X_{n-1}$.]
Exercises for Section II.12 1. (i) Show that for (12.10) it is sufficient to check that
$\frac{g_{b,c}(a)}{g_{b,c}(0)} = \frac{q(b, a)\,q(a, c)}{q(b, 0)\,q(0, c)}$, $a, b, c \in S$.
[Hint: $\sum_a g_{b,c}(a) = 1$ for all $b, c \in S$.]
(ii) Use (12.11), (12.12) to show that this condition can be expressed as 9(a) 9n,ß( 8 )
9e.o(b) 9e.e( 0 ) 9e.^(a) g9(0)
9e,e(b) 9e.,,(0)
a, b, cc S.
(iii) Consider four adjacent sites on $\mathbb{Z}$ in states $\alpha, \beta, a, b$, respectively. For notational convenience, let $[\alpha, \beta, a, b] = P(X_0 = \alpha, X_1 = \beta, X_2 = a, X_3 = b)$. Use condition (ii) of Theorem 12.1 to show $[\alpha, \beta, a, b] = P(X_0 = \alpha, X_2 = a, X_3 = b)\,g_{\alpha,a}(\beta)$ and, therefore,
$\frac{[\alpha, \beta, a, b]}{[\alpha, \beta', a, b]} = \frac{g_{\alpha,a}(\beta)}{g_{\alpha,a}(\beta')}$.
(iv) Along the lines of (iii) show that also
$\frac{[\alpha, \beta, a, b]}{[\alpha, \beta, a', b]} = \frac{g_{\beta,b}(a)}{g_{\beta,b}(a')}$.
Use the "substitution scheme" of (iii) and (iv) to verify (12.10) by checking (ii). 2. (i) Verify (12.13) for the case $n = 1$, $r = 2$. [Hints: Without loss of generality, take $h = 0$, and note
$P(X_1 = \beta, X_2 = \gamma \mid X_0 = \alpha, X_3 = b) = \frac{[\alpha, \beta, \gamma, b]}{\sum_{u,v}[\alpha, u, v, b]}$,
and using Exercise 1(iii)–(iv) and (12.10),
$\frac{[\alpha, u, v, b]}{[\alpha, u', v', b]} = \frac{g_{\alpha,v}(u)\,g_{u',b}(v)}{g_{\alpha,v}(u')\,g_{u',b}(v')} = \frac{q(\alpha, u)\,q(u, v)\,q(v, b)}{q(\alpha, u')\,q(u', v')\,q(v', b)}$.
Sum over $u, v \in S$ and then take $u' = \beta$, $v' = \gamma$.] (ii) Complete the proof of (12.13) for $n = 1$, $r \ge 2$ by induction and then for $n \ge 2$.
3. Verify (12.15). 4. Justify the limiting result in (12.17) as a consequence of Proposition 6.1. [Hint: Use Scheffé's Theorem, Chapter 0.] *5. Take $S = \{0, 1\}$, $g_{1,1}(1) = \mu$, $g_{0,0}(1) = \nu$. Find the transition matrix $q$ and the invariant initial distribution for the Markov random field viewed as a Markov chain.
Exercises for Section II.13 1. (i) Let $\{Y_n : n \ge 0\}$ be a sequence of random vectors with values in $\mathbb{R}^k$ which converge a.s. to a random vector $Y$. Show that the distribution $Q_n$ of $Y_n$ converges weakly to the distribution $Q$ of $Y$. (ii) Let $p(x, dy)$ be a transition probability on the state space $S = \mathbb{R}^k$, i.e., (a) for each $x \in S$, $p(x, dy)$ is a probability measure on $(\mathbb{R}^k, \mathcal{B}^k)$, and (b) for each $B \in \mathcal{B}^k$, $x \mapsto p(x, B)$ is a Borel-measurable function on $\mathbb{R}^k$. The $n$-step transition probability $p^{(n)}(x, dy)$ is defined recursively by
$p^{(1)}(x, dy) = p(x, dy)$, $p^{(n+1)}(x, B) = \int_S p(y, B)\,p^{(n)}(x, dy)$.
Suppose that $p^{(n)}(x, dy)$ converges weakly to the same probability measure $\pi(dy)$ for all $x$, and that $x \mapsto p(x, dy)$ is weakly continuous (i.e., $\int_S f(y)\,p(x, dy)$ is a continuous function of $x$ for every bounded continuous function $f$ on $S$). Show that $\pi$ is then the unique invariant probability for $p(x, dy)$, i.e., $\int_S p(x, B)\,\pi(dx) = \pi(B)$ for all $B \in \mathcal{B}^k$. [Hint: Let $f$ be bounded and continuous. Then $\int f(y)\,p^{(n+1)}(x, dy) \to \int f(y)\,\pi(dy)$, while $\int f(y)\,p^{(n+1)}(x, dy) = \int\left(\int f(z)\,p(y, dz)\right)p^{(n)}(x, dy) \to \int\left(\int f(z)\,p(y, dz)\right)\pi(dy)$.]
(iii) Extend (i) and (ii) to an arbitrary metric space $S$, and note that it suffices to require convergence of $n^{-1}\sum_{m=0}^{n-1}\int f(y)\,p^{(m)}(x, dy)$ to $\int f(y)\,\pi(dy)$ for all bounded continuous $f$ on $S$. 2. (i) Let $B_1, B_2$ be $m \times m$ matrices (with real or complex coefficients). Define $\|B\|$ as in (13.13), with the supremum over unit vectors in $\mathbb{R}^m$ or $\mathbb{C}^m$. Show that $\|B_1B_2\| \le \|B_1\|\,\|B_2\|$. (ii) Prove that if $B$ is an $m \times m$ matrix then $\|B\| \le m\max\{|b_{ij}| : 1 \le i, j \le m\}$. 0, then P(Ie 1 I > cb")
EEIZI — 1,
where Z = logI ,I —
" =1 log
_
n) _
[Hint: n =1
n=1
(ii) Show that if (13.15) holds then (13.16) converges. [Hint:
IIS"B"II
dfl("° B""II 1 "f"°)
log c
(5
where d = max{8'IIBII': 0
_< _< r
n o }. ]
(iii) Show that (13.15) holds if it holds for some $\delta < 1/r(B)$. [Hint: Use the Lemma.] 4. Suppose $\sum a_n z^n$ and $\sum b_n z^n$ are absolutely convergent and are equal for $|z| < r$, where $r$ is some positive number. Show that $a_n = b_n$ for all $n$. [Hint: Within its radius of
convergence a power series is infinitely differentiable and may be repeatedly differentiated term by term.] 5. (i) Prove that the determinant of an $m \times m$ matrix in triangular form equals the product of its diagonal elements. (ii) Check (13.28) and (13.35). 6. (i) Prove that under (13.15), $Y$ in (13.16) is Gaussian if the $\varepsilon_n$ are Gaussian. Calculate the mean vector and the dispersion matrix of $Y$ in terms of those of $\varepsilon_n$. (ii) Apply (i) to Examples 2(a) and 2(b). 7. (i) In Example 1 show that $|b| < 1$ is necessary for the existence of a unique invariant probability. (ii) Show by example that $|b| < 1$ is not sufficient for the existence of a unique invariant probability. [Hint: Find a distribution $Q$ of the noise $\varepsilon_n$ with an appropriately heavy tail.] 8. In Example 1, assume $E\varepsilon_n^2 < \infty$, and write $a = E\varepsilon_n$, $X_{n+1} = a + bX_n + \eta_{n+1}$, where $\eta_n = \varepsilon_n - a$ $(n \ge 1)$. The least squares estimates of $a, b$ are $\hat a_N, \hat b_N$, which minimize $\sum_{n=0}^{N-1}(X_{n+1} - a - bX_n)^2$ with respect to $a, b$. (i) Show that $\hat a_N = \bar Y - \hat b_N\bar X$, $\hat b_N = \sum_{n=0}^{N-1}X_{n+1}(X_n - \bar X)\big/\sum_{n=0}^{N-1}(X_n - \bar X)^2$, where $\bar X = N^{-1}\sum_{n=0}^{N-1}X_n$, $\bar Y = N^{-1}\sum_{n=1}^{N}X_n$. [Hint: Reparametrize to write $a + bX_n = a_1 + b(X_n - \bar X)$.]
(ii) In the case $|b| < 1$, prove that $\hat a_N \to a$ and $\hat b_N \to b$ a.s. as $N \to \infty$. 9. (i) In Example 2 let $m = 2$, $b_{11} = -4$, $b_{12} = 5$, $b_{21} = -10$, $b_{22} = 3$. Assume $\varepsilon_1$ has a finite absolute second moment. Does there exist a unique invariant probability? (ii) For the AR(2) or ARMA(2, q) models find a sufficient condition in terms of $\beta_0$ and $\beta_1$ for the existence of a unique invariant probability, assuming that $\eta_n$ has a finite $r$th moment for some $r > 0$.
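Exercise 8 can be explored by simulation. The sketch below simulates Example 1 in the form $X_{n+1} = a + bX_n + \eta_{n+1}$ with $|b| < 1$ and evaluates the least squares formulas of part (i). It assumes Gaussian noise $\eta_n \sim N(0, 1)$; that is our choice, since the exercise only requires the stated moment condition. The names are hypothetical.

```python
import random

def simulate_ar1(a, b, n, seed=1):
    """Simulate X_{k+1} = a + b*X_k + eta_{k+1} from X_0 = 0, with eta ~ N(0, 1) (assumed)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n):
        x.append(a + b * x[-1] + rng.gauss(0.0, 1.0))
    return x

def least_squares(x):
    """Return (a_hat, b_hat) from X_0, ..., X_N using the formulas of Exercise 8(i)."""
    n = len(x) - 1
    xs, ys = x[:-1], x[1:]                 # X_0..X_{N-1} and X_1..X_N
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b_hat = (sum(y * (u - x_bar) for u, y in zip(xs, ys))
             / sum((u - x_bar) ** 2 for u in xs))
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat

a_hat, b_hat = least_squares(simulate_ar1(a=1.0, b=0.5, n=100_000))
print(a_hat, b_hat)
```

With $N = 10^5$, the estimates should land close to the true $(a, b) = (1, 0.5)$. That is consistent with the strong consistency asked for in part (ii), although a single run proves nothing.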
Exercises for Section II.14 1. Prove that the process $\{X_n(x) : n \ge 0\}$ defined by (14.2) is a Markov process having the transition probability (14.3). Show that this remains true if the initial state $x$ is replaced by a random variable $X_0$ independent of $\{\alpha_n\}$. 2. Let $F_n(z) := P(X_n \le z)$, $n \ge 0$, be a sequence of distribution functions of random variables $X_n$ taking values in an interval $J$. Suppose that $F_{n+m}(z) - F_n(z)$ converges to zero uniformly for $z \in J$ as $n$ and $m$ go to $\infty$. Prove that $F_n(z)$ then converges uniformly (for all $z \in J$) to the distribution function $F(z)$ of a probability measure on $J$. 3. Let $g$ be a continuous function on a metric space $S$ (with metric $\rho$) into itself. If, for some $x \in S$, the iterates $g^{(n)}(x) := g(g^{(n-1)}(x))$, $n \ge 1$, converge to a point $x^* \in S$ as $n \to \infty$, then show that $x^*$ is a fixed point of $g$, i.e., $g(x^*) = x^*$. 4. Extend the strong Markov property (Theorem 4.1) to Markov processes $\{X_n\}$ on an interval $J$ (or, on 5. Let $\tau$ be an a.s. finite stopping time for the Markov process $\{X_n(x) : n \ge 0\}$ defined by (14.2). Assume that $X_\tau(x)$ belongs to an interval $J$ with probability 1 and $p^{(n)}(y, dz)$ converges weakly to a probability measure $\pi(dz)$ on $J$ for all $y \in J$. Assume also that $p(y, J) = 1$ for all $y \in J$.
(i) Prove that $p^{(n)}(x, dz)$ converges weakly to $\pi(dz)$. [Hint: $p^{(k)}(x, J) \to 1$ as $k \to \infty$, and $\int f(y)\,p^{(k+r)}(x, dy) = \int\left(\int f(z)\,p^{(r)}(y, dz)\right)p^{(k)}(x, dy)$.]
(ii) Assume the hypothesis above for all $x \in J$ (with $J$ not depending on $x$). Prove that $\pi$ is the unique invariant probability. 6. In Example 2, if the $\varepsilon_n$ are i.i.d. uniform on $[-1, 1]$, prove that $\{X_{2n}(x) : n \ge 1\}$ is i.i.d. with common p.d.f. given by (14.27) if $x \in [-2, 2]$. 7. In Example 2, modify $f$ as follows. Let $0 < \delta$ _ 2} is i.i.d. with common distribution —
ix (or, ix +2).
(iii) For $x \in (-\delta, \delta)$, $\{X_n(x) : n \ge 1\}$ is i.i.d. with common distribution it_ X+ ,. 8. In Example 3, assume $P(\varepsilon_1 = 0) > 0$ and prove that $P(\varepsilon_n = 0$ for some $n \ge 0) = 1$. 9. In Example 3, suppose $E\log\varepsilon_1 > 0$. (i) Prove that $\sum_{n=1}^{\infty}1/(\varepsilon_1\cdots\varepsilon_n)$ converges a.s. to a (finite) nonnegative random variable $Z$. (ii) Let $d_1 := \inf\{z > 0 : P(Z < z) > 0\}$, $d_2 := \sup\{z \ge 0 : P(Z \ge z) > 0\}$. Show that
$p(x) = 0$ if $x < c(d_1 + 1)$,
$p(x) \in (0, 1)$ if $c(d_1 + 1) < x < c(d_2 + 1)$,
$p(x) = 1$ if $x > c(d_2 + 1)$.
10. In Example 3, define $M := \sup\{z \ge 0 : P(\varepsilon_1 > z) > 0\}$. (i) Suppose $1 < M < \infty$. Then show that $p(x) = 0$ if $x < cM/(M - 1)$. (ii) If $M \le 1$, then show that $p(x) = 0$ for all $x$. (iii) Let $m$ be as in (14.34) and $M$ as above. Let $d_1, d_2$ be as defined in Exercise 9(ii). Show that
(a) $d_1 = \sum_{n=1}^{\infty}M^{-n} = \frac{1}{M - 1}$ if $M > 1$, $= \infty$ if $M \le 1$, and
(b) $d_2 = \sum_{n=1}^{\infty}m^{-n} = \frac{1}{m - 1}$ if $m > 1$, $= \infty$ if $m \le 1$.
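Exercise 3 can be illustrated with a concrete map. The sketch below takes $S = \mathbb{R}$ and $g = \cos$ (our choice), iterates $x_n = g(x_{n-1})$, and checks that the limit satisfies $g(x^*) = x^*$. Continuity is exactly what justifies $g(x^*) = \lim g(x_n) = \lim x_{n+1} = x^*$. The helper name is hypothetical.

```python
import math

def iterate_to_fixed_point(g, x0, tol=1e-12, max_iter=10_000):
    """Iterate x_n = g(x_{n-1}) from x0 until successive iterates differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        nxt = g(x)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    raise RuntimeError("iterates did not settle within max_iter steps")

x_star = iterate_to_fixed_point(math.cos, 1.0)
print(x_star, math.cos(x_star) - x_star)   # the residual is at round-off level
```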
Exercises for Section II.15 1. Let $0 < p \le 1$ and suppose $h(p)$ represents a measure of uncertainty regarding the occurrence of an event having probability $p$. Assume (a) $h(1) = 0$. (b) $h(p)$ is strictly decreasing for $0 < p \le 1$. (c) $h$ is continuous on $(0, 1]$. (d) $h(pr) = h(p) + h(r)$, 0
$\rho(y)$ for all $x, y$ in $S$. Then there is a unique invariant distribution $\pi$ such that
$\sup_B\left|\int_B p^{(n)}(x, y)\,\mu(dy) - \pi(B)\right| \le (1 - \varepsilon)^{n'}$,
where $\varepsilon = \int_S \rho(x)\,\mu(dx)$, $n' = [n/r]$, and the sup is over $B \in \mathcal{S}$.
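Proof. To see this define
$M_n(B) := \sup_{u \in S}\int_B p^{(n)}(u, y)\,\mu(dy)$, $m_n(B) := \inf_{u \in S}\int_B p^{(n)}(u, y)\,\mu(dy)$, $\delta_n := \sup_B\{M_n(B) - m_n(B)\}$. (T.6.3)
Then
(i) $\delta_n = \sup_{x, z}\frac{1}{2}\int_S\left|p^{(n)}(x, y) - p^{(n)}(z, y)\right|\mu(dy) \le 1$.
(ii) $\left|\int_B p^{(kr+r)}(x, y)\,\mu(dy) - \int_B p^{(kr+r)}(z, y)\,\mu(dy)\right| \le (1 - \varepsilon)[M_{kr}(B) - m_{kr}(B)]$.
(iii) The probability measure $\pi$ given by
$\pi(B) = \lim_{n\to\infty}\int_B p^{(n)}(x, y)\,\mu(dy)$ (T.6.4)
is well defined and
$\sup_B\left|\int_B p^{(n)}(x, y)\,\mu(dy) - \pi(B)\right| \le (1 - \varepsilon)^{n'}$. (T.6.5)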
Also, the following facts are simple to check. (iv) $\pi$ is absolutely continuous with respect to $\mu$ and therefore, by the Radon–Nikodym Theorem (Chapter 0), has a density $\pi(y)$ with respect to $\mu$. (v) For $B \in \mathcal{S}$ with $\pi(B) > 0$, one has $P_y(X_n \in B \text{ i.o.}) = 1$. 2. (Doeblin Condition) A well-known condition ensuring the existence of an invariant distribution is the following. Suppose there is a finite measure $\varphi$, an integer $r \ge 1$, and $\varepsilon > 0$ such that $p^{(r)}(x, B) \le 1 - \varepsilon$ whenever $\varphi(B) \le \varepsilon$. Under this condition there is a decomposition of the state space as $S = \bigcup_i S_i$ such that (i) each $S_i$ is closed in the sense that $p(x, S_i) = 1$ for $x \in S_i$ $(1 \le i \le m)$; (ii) $p$ restricted to $S_i$ has a unique invariant distribution $\pi_i$; (iii) $\frac{1}{n}\sum_{r=1}^{n}p^{(r)}(x, B) \to \pi_i(B)$ as $n \to \infty$ for $x \in S_i$.
Moreover, the convergence is exponentially fast and uniform in $x$ and $B$. It is easy to check that the condition of theoretical complement 1 above implies the Doeblin condition with $\varphi(dx) = \rho(x)\,\mu(dx)$. The more general connections between the Doeblin condition and the $\varphi(dx) = \rho(x)\,\mu(dx)$ condition in theoretical complement 1 can be found in detail in J. L. Doob (1953), Stochastic Processes, Wiley, New York, p. 190. 3. (Lengths of Increasing Runs) Here is an example where the more general theory applies. Let $X_1, X_2, \ldots$ be i.i.d. uniform on $[0, 1]$. Successive increasing runs among the values of the sequence $X_1, X_2, \ldots$ are defined by placing a marker at 0 and then between $X_j$ and $X_{j+1}$ whenever $X_j$ exceeds $X_{j+1}$, e.g., |0.20, 0.24, 0.60|0.50|0.30, 0.70|0.20 ... . Let $Y_n$ denote the initial (smallest) value in the $n$th run, and let $L_n$ denote the length of the $n$th run, $n = 1, 2, \ldots$. (i) $\{Y_n\}$ is a Markov chain on the state space $S = [0, 1]$ with transition density
$p(x, y) = e^{1-x}$ if $y < x$, $p(x, y) = e^{1-x} - e^{y-x}$ if $y > x$.
(ii) Applying theoretical complement 1, $\{Y_n\}$ has a unique invariant distribution $\pi$. Moreover, $\pi$ has density $\pi(y) = 2(1 - y)$, $0 < y < 1$. (iii) The limit in distribution of the length $L_n$ as $n \to \infty$ may also be calculated from that of $\{Y_n\}$, since
$P(L_n \ge m) = \int_0^1 P(L_n \ge m \mid Y_n = y)\,P(Y_n \in dy) = \int_0^1 \frac{(1 - y)^{m-1}}{(m - 1)!}\,P(Y_n \in dy)$,
and therefore
$P(L \ge m) := \lim_{n\to\infty}P(L_n \ge m) = \int_0^1 \frac{(1 - y)^{m-1}}{(m - 1)!}\,2(1 - y)\,dy$.
Note from this that $EL = \sum_{m=1}^{\infty}P(L \ge m) = 2$.
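The conclusions of theoretical complement 3 can be checked by simulation. The sketch below (hypothetical names, fixed seed) splits a long i.i.d. uniform sequence into increasing runs. It then compares empirical averages with three consequences of the text:

- $EL = 2$;
- the mean of the invariant density $2(1 - y)$ is $1/3$;
- $P(L = 1) = 1 - P(L \ge 2) = 1/3$.

```python
import random

def increasing_runs(num_values, seed=0):
    """Split i.i.d. uniforms into increasing runs.

    Returns (starts, lengths): the initial value Y_n and the length L_n of each run.
    The last run is dropped because the end of the sample may cut it short.
    """
    rng = random.Random(seed)
    starts, lengths = [], []
    prev = None
    for _ in range(num_values):
        x = rng.random()
        if prev is None or x < prev:      # a descent X_j > X_{j+1} starts a new run
            starts.append(x)
            lengths.append(1)
        else:
            lengths[-1] += 1
        prev = x
    return starts[:-1], lengths[:-1]

starts, lengths = increasing_runs(400_000)
mean_length = sum(lengths) / len(lengths)          # compare with EL = 2
mean_start = sum(starts) / len(starts)             # compare with 1/3
frac_length_one = lengths.count(1) / len(lengths)  # compare with P(L = 1) = 1/3
print(mean_length, mean_start, frac_length_one)
```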
Theoretical Complement to Section II.8 1. We learned about the random rounding problem from Andreas Weisshaar (1987), "Statistisches Runden in rekursiven, digitalen Systemen 1 und 2," Diplomarbeit erstellt am Institut für Netzwerk- und Systemtheorie, Universität Stuttgart. The stability condition of Example 8.1 is new. Computer simulations by Weisshaar suggest that the result holds more generally when the spectral radius is less than 1, but this has not been proved rigorously. For additional background on the applications of this technique to the design of digital filters, see R. B. Kieburtz, V. B. Lawrance, and K. V. Mina (1977), "Control of Limit Cycles in Recursive Digital Filters by Randomized Quantization," IEEE Trans. Circuits and Systems, CAS-24(6), pp. 291–299, and references therein. Theoretical Complements to Section II.9 1. The lattice case of Blackwell's Renewal Theorem (theoretical complement 2 below)
may be used to replace the large-scale Cesàro-type convergence obtained for the transition probabilities in (9.16) by the stronger elementwise convergence described as follows. (i) If $j \in S$ is recurrent with period 1, then for any $i \in S$,
$\lim_{n\to\infty}p_{ij}^{(n)} = \rho_{ij}\left(E_j\tau_j^{(1)}\right)^{-1}$,
where $\rho_{ij}$ is the probability $P_i(\tau_j^{(1)} < \infty)$ (see Eq. 8.4). (ii) If $j \in S$ is recurrent with period $d > 1$, then, by regarding $p^{(d)}$ as a one-step transition law,
$\lim_{n\to\infty}p_{jj}^{(nd)} = d\left(E_j\tau_j^{(1)}\right)^{-1}$.
To obtain these from the general renewal theory described below, take as the delay $Z_0$ the time to reach $j$ for the first time starting at $i$. The durations of the subsequent replacements $Z_1, Z_2, \ldots$ represent the lengths of times between returns to $j$.
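The elementwise limit in (i) can be checked on a small example. The sketch below uses a hypothetical irreducible aperiodic 3-state matrix. It computes $E_j\tau_j^{(1)}$ by first-step analysis on the states other than $j$, then compares $1/E_j\tau_j^{(1)}$ with a high power of the matrix. Here $\rho_{ij} = 1$, because the chain is irreducible and finite.

```python
import numpy as np

# Hypothetical irreducible, aperiodic transition matrix (rows sum to 1).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])

def mean_return_time(P, j):
    """E_j tau_j^(1) by first-step analysis.

    For i != j, h_i = E_i tau_j solves h_i = 1 + sum_{k != j} p_ik h_k;
    then E_j tau_j^(1) = 1 + sum_{k != j} p_jk h_k.
    """
    others = [k for k in range(P.shape[0]) if k != j]
    Q = P[np.ix_(others, others)]                      # moves that avoid j
    h = np.linalg.solve(np.eye(len(others)) - Q, np.ones(len(others)))
    return 1.0 + P[j, others] @ h

i, j = 0, 2
limit = 1.0 / mean_return_time(P, j)                   # rho_ij = 1 for this chain
p_n = np.linalg.matrix_power(P, 50)[i, j]
print(p_n, limit)
```

Consider also the period-2 chain that alternates between two states. For it, the same function returns $E_0\tau_0^{(1)} = 2$, which matches $p_{00}^{(2n)} = 1 = d/E_0\tau_0^{(1)}$ in (ii).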
2. (Renewal Theorems) Let $Z_0, Z_1, \ldots$ be independent nonnegative random variables such that $Z_1, Z_2, \ldots$ are i.i.d. with common (nondegenerate) distribution function $F$, and $Z_0$ has distribution function $G$. In the customary framework of renewal theory, components subject to failure (e.g., lightbulbs) are instantly replaced upon failure, and $Z_1, Z_2, \ldots$ represent the random durations of the successive replacements. The delay random variable $Z_0$ represents the length of time remaining in the life of the initial component with respect to some specified time origin. For example, if the initial component has age $a$ relative to the placement of the time origin, then one may take
$G(x) = \frac{F(x + a) - F(a)}{1 - F(a)}$.
Let
$S_n = Z_0 + Z_1 + \cdots + Z_n$, $n \ge 0$, (T.9.1)
and let
$N_t = \inf\{n \ge 0 : S_n > t\}$, $t \ge 0$. (T.9.2)
We will use the notation $S_n^0$, $N_t^0$ sometimes to identify cases when $Z_0 = 0$ a.s. Then $S_n$ is the time of the $(n+1)$st renewal and $N_t$ counts the number of renewals up to
and including time $t$. In the case that $Z_0 = 0$ a.s., the stochastic (counting) process $\{N_t\} \equiv \{N_t^0\}$ is called the (ordinary) renewal process. Otherwise $\{N_t\}$ is called a delayed renewal process. For simplicity, first restrict attention to the case of the ordinary renewal process. Let $\mu = EZ_1 < \infty$. Then $1/\mu$ is called the renewal rate. The interpretation as an average rate of renewals is reasonable since
$\frac{S_{N_t - 1}}{N_t} \le \frac{t}{N_t} < \frac{S_{N_t}}{N_t}$ (T.9.3)
and $N_t \to \infty$ as $t \to \infty$, so that by the strong law of large numbers
$\frac{N_t}{t} \to \frac{1}{\mu}$ a.s. as $t \to \infty$. (T.9.4)
Since $N_t$ is a stopping time for $\{S_n\}$ (for fixed $t \ge 0$), it follows from Wald's Equation (Chapter I, Corollary 13.2) that
$ES_{N_t} = \mu EN_t$. (T.9.5)
In fact, the so-called Elementary Renewal Theorem asserts that
$\frac{EN_t}{t} \to \frac{1}{\mu}$ as $t \to \infty$. (T.9.6)
To deduce this from the above, simply observe that $\mu EN_t = ES_{N_t} > t$ and therefore
$\liminf_{t\to\infty}\frac{EN_t}{t} \ge \frac{1}{\mu}$.
On the other hand, assume first that $Z_n \le C$ a.s. for each $n \ge 1$, where $C$ is a positive constant. This gives $\mu EN_t \le t + C$ and therefore, for this case, $\limsup_{t\to\infty}(EN_t/t) \le 1/\mu$. More generally, truncating the $Z_n$ at the level $C$ can only decrease the $S_n$, and therefore can only increase $N_t$ and $EN_t$. Applying this last inequality to the truncated process yields
$\limsup_{t\to\infty}\frac{EN_t}{t} \le \frac{1}{C\,P(Z_1 > C) + \int_{[0, C]}x\,F(dx)} \to \frac{1}{\mu}$ as $C \to \infty$. (T.9.7)
The above limits (T.9.4) and (T.9.6) also hold in the case that $\mu = \infty$, under the convention that $1/\infty$ is 0, by the SLLN and (T.9.7). Moreover, these asymptotics can now be applied to the delayed renewal process to get precisely the same conclusions for any given (initial) distribution $G$ of delay. With the special choice of $G = F_\infty$ defined by
$F_\infty(x) = \frac{1}{\mu}\int_0^x P(Z_1 > u)\,du$, $x \ge 0$, (T.9.8)
the corresponding delayed renewal process, called the equilibrium renewal process, has the property that for any $h, t \ge 0$,
$EN(t, t + h] = h/\mu$, (T.9.9)
where
$N(t, t + h] = N_{t+h} - N_t$. (T.9.10)
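Property (T.9.9) can be checked by Monte Carlo. The sketch below assumes, purely for illustration, lifetimes uniform on $[0, 2]$, so that $\mu = 1$. For this choice (T.9.8) gives $F_\infty(x) = x - x^2/4$ on $[0, 2]$, and the delay is sampled by inverting this distribution function. The function name and parameters are hypothetical.

```python
import math
import random

def equilibrium_count(t, h, reps=200_000, seed=2):
    """Monte Carlo estimate of E N(t, t+h] for an equilibrium renewal process.

    Assumed example: lifetimes uniform on [0, 2] (mu = 1), so that
    F_inf(x) = x - x**2 / 4 on [0, 2]; the delay Z_0 = 2 - 2*sqrt(1 - U)
    is obtained by inverting F_inf.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        s = 2.0 - 2.0 * math.sqrt(1.0 - rng.random())   # S_0 = Z_0 ~ F_inf
        while s <= t + h:
            if s > t:
                total += 1                    # renewal epoch S_n in (t, t + h]
            s += rng.uniform(0.0, 2.0)        # S_{n+1} = S_n + Z_{n+1}
    return total / reps

print(equilibrium_count(0.7, 0.5))            # compare with h / mu = 0.5
```

Replacing the equilibrium delay by $Z_0 = 0$ would generally make the estimate depend on $t$ for small $t$. Avoiding that dependence is the point of the special choice $G = F_\infty$.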
To prove this, define the renewal function $m(t) = EN_t$, $t \ge 0$, for $\{N_t\}$. Then for the general (delayed) process we have
$m(t) = EN_t = \sum_{n=1}^{\infty}P(N_t \ge n) = G(t) + \sum_{n=1}^{\infty}P(N_t \ge n + 1) = G(t) + \sum_{n=1}^{\infty}\int_0^t P(N_{t-u} \ge n)\,F(du) = G(t) + \int_0^t m(t - u)\,F(du) = G(t) + m * F(t)$, (T.9.11)
where $*$ denotes the convolution defined by
$m * F(t) = \int_0^t m(t - s)\,F(ds)$. (T.9.12)
Observe that $g(t) = t/\mu$, $t \ge 0$, solves the renewal equation (T.9.11) with $G = F_\infty$; i.e.,
$\frac{t}{\mu} = \frac{1}{\mu}\int_0^t(1 - F(u))\,du + \frac{1}{\mu}\int_0^t F(u)\,du = F_\infty(t) + \frac{1}{\mu}\int_0^t\int_0^u F(ds)\,du = F_\infty(t) + \int_0^t\frac{t - s}{\mu}\,F(ds)$. (T.9.13)
To finish the proof of (T.9.9), observe that $m(t) = EN_t$, $t \ge 0$, is the unique solution of (T.9.11) with $G = F_\infty$ among functions that are bounded on finite intervals, so that $m(t) = g(t)$. For if $r(t)$ is another such function, then by iterating we have
$r(t) = F_\infty(t) + \int_0^t r(t - u)\,F(du)$
$= F_\infty(t) + \int_0^t\left\{F_\infty(t - u) + \int_0^{t-u}r(t - u - s)\,F(ds)\right\}F(du)$
$= P(N_t \ge 1) + P(N_t \ge 2) + \int_0^t r(t - v)\,P(S_2^0 \in dv)$
$= \cdots = P(N_t \ge 1) + P(N_t \ge 2) + \cdots + P(N_t \ge n) + \int_0^t r(t - v)\,P(S_n^0 \in dv)$. (T.9.14)
Thus,
$r(t) = \sum_{n=1}^{\infty}P(N_t \ge n) = m(t)$
since
$\left|\int_0^t r(t - v)\,P(S_n^0 \in dv)\right| \le \sup_{0 \le s \le t}|r(s)|\,P(S_n^0 \le t) \to 0$ as $n \to \infty$.
Let $\{Z_n : n \ge 1\}$ and $\{\tilde Z_n : n \ge 1\}$ denote two independent sequences of renewal lifetime random variables with common distribution $F$, and let $Z_0$ and $\tilde Z_0$ be independent delays for the two sequences having distributions $G$ and $\tilde G = F_\infty$, respectively. The tilde (~) will be used in reference to quantities associated with the latter (equilibrium) process. Let $\varepsilon > 0$ and define
$\nu(\varepsilon) = \inf\{n \ge 0 : |S_n - \tilde S_{\tilde n}| < \varepsilon \text{ for some } \tilde n\}$, (T.9.18)
$\tilde\nu(\varepsilon) = \inf\{\tilde n \ge 0 : |S_n - \tilde S_{\tilde n}| < \varepsilon \text{ for some } n\}$. (T.9.19)
Suppose we have established that ($\varepsilon$-recurrence) $P(\nu(\varepsilon) < \infty) = 1$ (i.e., the coupling will occur). The event $\{\nu(\varepsilon) = n, \tilde\nu(\varepsilon) = \tilde n\}$ is determined by $Z_0, Z_1, \ldots, Z_n$ and $\tilde Z_0, \ldots, \tilde Z_{\tilde n}$. Hence the sequence of lifetimes $\{\tilde Z_{\tilde\nu(\varepsilon)+k} : k \ge 1\}$ may be replaced by the sequence $\{Z_{\nu(\varepsilon)+k} : k \ge 1\}$ without changing the distributions of $\{\tilde S_n\}$, $\{\tilde N_t\}$, etc. Then, after such a modification, for $\varepsilon < h/2$, observe with the aid of a simple figure that N(t+r,t+h—e]1 1S,w—t) -g
N(t + s, t + h — E] 1 ^ s „ ,, E N(t, t + h]I (s , (,) ,, I N(t, t
+ h]1 Is ,
II
+ N(t, t + h]l {s „)(= N(t, t + h])
N(t — E, t + h + E]1 (s
, } + N(t, t + h]l {s ,,,,,,,
N(t—e,t+h+e]+N(t,t+h]l 5 )>,}.
(T.9.21)
Taking expected values and noting the first, fifth, and seventh lines, we have the following coupling inequality:
$EN(t + \varepsilon, t + h - \varepsilon] - E(N(t, t + h]\,1_{\{S_{\nu(\varepsilon)} > t\}}) \le EN(t, t + h] \le EN(t - \varepsilon, t + h + \varepsilon] + E(N(t, t + h]\,1_{\{S_{\nu(\varepsilon)} > t\}})$. (T.9.22)
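Using (T.9.9), $EN(t + \varepsilon, t + h - \varepsilon] = \frac{h - 2\varepsilon}{\mu}$ and $EN(t - \varepsilon, t + h + \varepsilon] = \frac{h + 2\varepsilon}{\mu}$. Therefore,
$\left|EN(t, t + h] - \frac{h}{\mu}\right| \le \frac{2\varepsilon}{\mu} + E(N(t, t + h]\,1_{\{S_{\nu(\varepsilon)} > t\}})$. (T.9.23)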
Since $A = \{S_{\nu(\varepsilon)} > t\}$ is independent of $\{Z_{N_t+k} : k \ge 1\}$, we have
$E(N(t, t + h]\,1_{\{S_{\nu(\varepsilon)} > t\}}) \le E_0N_h\,P(S_{\nu(\varepsilon)} > t) = m(h)\,P(S_{\nu(\varepsilon)} > t)$, (T.9.24)
where $E_0$ denotes expected value for the process $N_h$ under zero delay. More precisely, $(t, t + h] \subset (t, S_{N_t} + h]$ and there are no renewals in $(t, S_{N_t})$, so $N(t, t + h] \le \inf\{k \ge 0 : Z_{N_t+1} + \cdots + Z_{N_t+k} > h\}$. In particular, noting (T.9.2), this upper bound is given by an ordinary (zero-delay) renewal process with renewal distribution $F$. It is independent of the event $A$ and furnishes the desired estimate (T.9.24). Now from (T.9.23) and (T.9.24) we have the estimate
$\left|EN(t, t + h] - \frac{h}{\mu}\right| \le m(h)\,P(S_{\nu(\varepsilon)} > t) + \frac{2\varepsilon}{\mu}$, (T.9.25)
which is enough, since $\varepsilon > 0$ is arbitrary, provided that the initial $\varepsilon$-recurrence assumption, $P(\nu(\varepsilon) < \infty) = 1$, can be established. So the bulk of the proof rests on showing that the coupling will eventually occur. The probability $P(\nu(\varepsilon) < \infty)$ can be analyzed separately for each of the two cases (i) and (ii) of Theorem T.9.1. First take the lattice case (ii) with lattice spacing (period) $d$. Note that for $\varepsilon < d$,
$\nu(\varepsilon) = \nu(0) = \inf\{n \ge 0 : S_n - \tilde S_{\tilde n} = 0 \text{ for some } \tilde n\}$. (T.9.26)
Also, by recurrence of the mean-zero random walk on the integers (theoretical complement 3.1 of Chapter I), we have $P(\nu(0) < \infty) = 1$. Moreover, $S_{\nu(0)}$ is a.s. a finite integral multiple of $d$. Taking $t = nd$ and $h = d$ with $\varepsilon = 0$, we have
$EN(nd, nd + h] \to \frac{h}{\mu} = \frac{d}{\mu}$ as $n \to \infty$. (T.9.27)
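The lattice statement (T.9.27) can be illustrated in the zero-delay case with span $d = 1$. There $EN(n - 1, n]$ is the probability $u_n$ of a renewal at time $n$. It satisfies the discrete renewal equation $u_n = \sum_{k=1}^n P(Z_1 = k)\,u_{n-k}$, with $u_0 = 1$. The lifetime law below is a hypothetical example, and the sketch only shows $u_n$ settling at $1/\mu$.

```python
def renewal_sequence(f, n_max):
    """u_n = P(renewal at time n), n = 0..n_max, for a zero-delay integer-valued renewal process.

    f maps k >= 1 to P(Z_1 = k); u_0 = 1 and u_n = sum_{k=1}^{n} f(k) u_{n-k}.
    """
    u = [1.0]
    for n in range(1, n_max + 1):
        u.append(sum(f.get(k, 0.0) * u[n - k] for k in range(1, n + 1)))
    return u

f = {1: 0.5, 2: 0.3, 3: 0.2}                 # hypothetical lifetime law, span d = 1
mu = sum(k * pk for k, pk in f.items())      # mean lifetime
u = renewal_sequence(f, 200)
print(u[200], 1 / mu)
```

A lifetime concentrated on $\{2\}$ has period $d = 2$. Then $u_n$ alternates between 1 and 0, and only the block counts $EN(2n, 2n + 2] = 1 = d/\mu$ converge, as in (T.9.27).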
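For case (i), observe by the Hewitt–Savage zero–one law (theoretical complement 1.2 of Chapter I), applied to the i.i.d. sequence $(Z_1, \tilde Z_1), (Z_2, \tilde Z_2), (Z_3, \tilde Z_3), \ldots$, that $P(R_n < \varepsilon \text{ i.o.} \mid \tilde Z_0 = z) = 0$ or 1, where
$R_n = \min\{\tilde S_{\tilde n} - S_n : \tilde S_{\tilde n} - S_n > 0, \tilde n \ge 0\} = \tilde S_{\tilde N_{S_n}} - S_n$.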
Now, the distribution of $\{\tilde S_{\tilde N_t + j} - t\}_j$ does not depend on $t$ (Exercise 7.5, Chapter
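IV). Three facts make $\{R_{n+k} : n \ge 0\}$ also have a distribution independent of $k$: this stationarity, the independence of $\{\tilde Z_j\}$ and $\{S_n\}$, and the fact that the distribution of $\{S_{k+n} - S_k : n \ge 0\}$ does not depend on $k$. Therefore, the probability $P(R_n < \varepsilon \text{ for some } n \ge k)$ does not depend on $k$, and thus
$\{R_n < \varepsilon \text{ i.o.}\} = \bigcap_{k=0}^{\infty}\{R_n < \varepsilon \text{ for some } n \ge k\}$ (T.9.28)
implies $P(R_n < \varepsilon \text{ i.o.}) = P(R_n < \varepsilon \text{ for some } n \ge 0) \le P(\nu(\varepsilon) < \infty)$. Now,
$\int_0^\infty P(R_n < \varepsilon \text{ i.o.} \mid \tilde Z_0 = z)\,P(\tilde Z_0 \in dz) = P(R_n < \varepsilon \text{ i.o.}) = P(R_n < \varepsilon \text{ for some } n) = \int_0^\infty P(R_n < \varepsilon \text{ for some } n \mid \tilde Z_0 = z)\,P(\tilde Z_0 \in dz)$. (T.9.29)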
It remains to prove that $P(R_n < \varepsilon \text{ for some } n \mid \tilde Z_0 = z) > 0$ (and therefore is 1) in (T.9.29). This follows from a final technical lemma, given below, on "points of increase" of distribution functions of sums of i.i.d. nonlattice positive random variables. A point $x$ is called a point of increase of a distribution function $F$ if $F(b) - F(a) > 0$ whenever $a < x < b$.
Lemma. Let $F$ be a nonlattice distribution function on $(0, \infty)$. The set $E$ of points of increase of the functions $F, F^{*2}, F^{*3}, \ldots$ is "asymptotically dense at $\infty$" in the sense that for any $\varepsilon > 0$ and $x$ sufficiently large, $E \cap (x, x + \varepsilon) \ne \emptyset$, i.e., the interval $(x, x + \varepsilon)$ meets $E$ for $x$ sufficiently large.
The following proof follows that in W. Feller (1971), An Introduction to Probability Theory and Its Applications, 2nd ed., Wiley, New York, p. 147.
Proof. Let $a, b \in E$, $0 < a < b$, $b - a < \varepsilon$, and let $I_n = [na, nb]$. For $n > a/(b - a)$ consecutive intervals $I_n$ and $I_{n+1}$ overlap, so each $x > a^2/(b - a)$ belongs to some $I_n$, $n \ge 1$. Since $E$ is easily checked to be closed under addition, the $n + 1$ points $na + k(b - a)$, $k = 0, 1, \ldots, n$, belong to $E$ and partition $I_n$ into $n$ subintervals of length $b - a < \varepsilon$. Thus each $x > a^2/(b - a)$ is at a distance at most $(b - a)/2 < \varepsilon/2$ from $E$. It remains to show that such $a, b$ exist for every $\varepsilon > 0$. In other words, we must show that if for some $\varepsilon > 0$, $b - a \ge \varepsilon$ for all $a < b$ in $E$, then $F$ must be a lattice distribution. To see this, say, without loss of generality, $\varepsilon \le b - a < 2\varepsilon$ for some $a, b \in E$. Then $E \cap I_n \subset \{na + k(b - a) : k = 0, 1, \ldots, n\}$. Since $(n + 1)a \in E \cap I_n$ for $a < n(b - a)$, $E \cap I_n$ must consist of multiples of $(b - a)$. Thus, if $c \in E$ then $c + k(b - a) \in I_n \cap E$ for $n$ sufficiently large, and so $c$ is a multiple of $(b - a)$.
Coupling approaches to the renewal theorem on which the preceding is based can be found in the papers of H. Thorisson (1987), "A Complete Coupling Proof of Blackwell's Renewal Theorem," Stoch. Proc. Appl., 26, pp. 87–97; K. Athreya, D. McDonald, P. Ney (1978), "Coupling and the Renewal Theorem," Amer. Math. Monthly, 85, pp. 809–814; T. Lindvall (1977), "A Probabilistic Proof of Blackwell's Renewal Theorem," Ann. Probab., 5, pp. 482–485. 3. (Birkhoff's Ergodic Theorem) Suppose $\{X_n : n \ge 0\}$ is a stochastic process on $(\Omega, \mathcal{F}, P)$ with values in $(S, \mathcal{S})$. The process $\{X_n\}$ is (strictly) stationary if for every pair of integers $m \ge 0$, $r \ge 1$, the distribution of $(X_0, X_1, \ldots, X_m)$ is the same as that
DISCRETE-PARAMETER MARKOV CHAINS
of $(X_r, X_{1+r}, \ldots, X_{m+r})$. An equivalent definition is: $\{X_n\}$ is stationary if the distribution $\mu$, say, of $X := (X_0, X_1, X_2, \ldots)$ is the same as that of $T^rX := (X_r, X_{1+r}, X_{2+r}, \ldots)$ for all $r \ge 0$. Recall that the distribution of $(X_r, X_{1+r}, \ldots)$ is the probability measure induced on $(S^\infty, \mathcal S^{\otimes\infty})$ by the map $\omega \to (X_r(\omega), X_{1+r}(\omega), X_{2+r}(\omega), \ldots)$. Here $S^\infty$ is the space of all sequences $x = (x_0, x_1, x_2, \ldots)$ with $x_i \in S$ for all $i$, and $\mathcal S^{\otimes\infty}$ is the smallest sigmafield containing the class of all sets of the form $C = \{x \in S^\infty: x_i \in B_i \text{ for } 0 \le i \le n\}$, where $n \ge 0$ and $B_i \in \mathcal S$ $(0 \le i \le n)$ are arbitrary. The shift transformation $T$ is defined by $Tx := (x_1, x_2, \ldots)$ on $S^\infty$, so that $T^rx = (x_r, x_{1+r}, x_{2+r}, \ldots)$. Denote by $\mathcal G$ the sigmafield generated by $\{X_n: n \ge 0\}$. That is, $\mathcal G$ is the class of all sets of the form $G = X^{-1}C = \{\omega \in \Omega: X(\omega) \in C\}$, $C \in \mathcal S^{\otimes\infty}$. For a set $G$ of this form, write $T^{-1}G = \{\omega \in \Omega: TX(\omega) \in C\} = \{(X_1, X_2, \ldots) \in C\} = \{X \in T^{-1}C\}$. Such a set $G$ is said to be invariant if $P(G \,\Delta\, T^{-1}G) = 0$, where $\Delta$ denotes the symmetric difference. By iteration it follows that if $G = \{X \in C\}$ is invariant then $P(G \,\Delta\, T^{-r}G) = 0$ for all $r \ge 0$, where $T^{-r}G = \{(X_r, X_{1+r}, X_{2+r}, \ldots) \in C\}$. Let $f$ be a real-valued measurable function on $(S^\infty, \mathcal S^{\otimes\infty})$. Then $\varphi(\omega) := f(X(\omega))$ is $\mathcal G$-measurable and, conversely, all $\mathcal G$-measurable functions are of this form. Such a function $\varphi = f(X)$ is invariant if $f(X) = f(TX)$ a.s. Note that $G = \{X \in C\}$ is invariant if and only if $\mathbf 1_G = \mathbf 1_C(X)$ is invariant. Again, by iteration, if $f(X)$ is invariant then $f(X) = f(T^rX)$ a.s. for all $r \ge 1$. Given any $\mathcal G$-measurable real-valued function $\varphi = f(X)$, the (extended real-valued) functions

$$\bar f(X) := \limsup_{n\to\infty} \frac{1}{n}\big(f(X) + f(TX) + \cdots + f(T^{n-1}X)\big)$$

and

$$\underline f(X) := \liminf_{n\to\infty} \frac{1}{n}\big(f(X) + \cdots + f(T^{n-1}X)\big)$$
are invariant, and the set $\{\bar f(X) = \underline f(X)\}$ is invariant. The class $\mathcal I$ of all invariant sets (in $\mathcal G$) is easily seen to be a sigmafield, which is called the invariant sigmafield. The invariant sigmafield $\mathcal I$ is said to be trivial if $P(G) = 0$ or $1$ for every $G \in \mathcal I$. Definition T.9.1. The process $\{X_n: n \ge 0\}$ and the shift transformation $T$ are said to be ergodic if $\mathcal I$ is trivial.
The next result is an important generalization of the classical strong law of large numbers (Chapter 0, Theorem 6.1).
Theorem T.9.2. (Birkhoff's Ergodic Theorem). Let $\{X_n: n \ge 0\}$ be a stationary sequence on the state space $S$ (having sigmafield $\mathcal S$). Let $f(X)$ be a real-valued $\mathcal G$-measurable function such that $E|f(X)| < \infty$. Then

(a) $\frac{1}{n}\sum_{r=0}^{n-1} f(T^rX)$ converges a.s. and in $L^1$ to an invariant random variable $g(X)$, and

(b) $g(X) = Ef(X)$ a.s. if $\mathcal I$ is trivial. ❑
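To make statement (a) concrete, here is a small simulation sketch. It assumes a hypothetical two-state Markov chain (the matrix `P`, its invariant law `pi` and the function values `f` are illustrative choices, not taken from the text). Since that chain is irreducible, its time averages $\frac1n\sum_{r<n} f(X_r)$ should settle near the space average $\int f\,d\pi$, which is what (b) predicts when $\mathcal I$ is trivial. The path starts at a fixed state rather than from $\pi$; for an irreducible finite chain this does not affect the a.s. limit.

```python
import random

def birkhoff_average(P, f, x0, n, seed=0):
    """Return (1/n) * sum_{r=0}^{n-1} f[X_r] along one simulated path of a finite chain."""
    rng = random.Random(seed)
    x, total = x0, 0.0
    for _ in range(n):
        total += f[x]
        u, acc = rng.random(), 0.0
        for y, pxy in enumerate(P[x]):   # sample the next state from row P[x]
            acc += pxy
            if u < acc:
                x = y
                break
        else:                            # guard against rounding in the row sum
            x = len(P[x]) - 1
    return total / n

# Hypothetical irreducible two-state chain and its invariant law (pi P = pi).
P = [[0.9, 0.1],
     [0.3, 0.7]]
pi = [0.75, 0.25]
f = [1.0, 5.0]

space_avg = sum(p * v for p, v in zip(pi, f))      # E_pi f = 0.75 * 1 + 0.25 * 5
time_avg = birkhoff_average(P, f, x0=0, n=200_000)
print(f"space average {space_avg:.3f}, time average {time_avg:.3f}")
```

With 200,000 steps the two averages typically agree to about two decimal places; changing `x0` or `seed` should not change the limit.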
We first need an inequality whose derivation below follows A. M. Garsia (1965), "A Simple Proof of E. Hopf's Maximal Ergodic Theorem," J. Math. Mech., 14, pp. 381-382. Write
THEORETICAL COMPLEMENTS
$$M_n(f) := \max\{0,\ f(X),\ f(X) + f(TX),\ \ldots,\ f(X) + \cdots + f(T^{n-1}X)\},$$
$$M_n(f\circ T) := \max\{0,\ f(TX),\ f(TX) + f(T^2X),\ \ldots,\ f(TX) + \cdots + f(T^nX)\},$$
$$M(f) := \lim_{n\to\infty} M_n(f) = \sup_n M_n(f). \tag{T.9.30}$$
Proposition T.9.3. (Maximal Ergodic Theorem). Under the hypothesis of Theorem T.9.2,

$$\int_{\{M(f) > 0\}\cap G} f(X)\,dP \ge 0 \qquad \forall\, G \in \mathcal I. \tag{T.9.31}$$
Proof. Note that $f(X) + M_n(f\circ T) = M_{n+1}(f)$ on the set $\{M_{n+1}(f) > 0\}$. Since $M_{n+1}(f) \ge M_n(f)$ and $\{M_n(f) > 0\} \subset \{M_{n+1}(f) > 0\}$, it follows that $f(X) \ge M_n(f) - M_n(f\circ T)$ on $\{M_n(f) > 0\}$. Also, $M_n(f) \ge 0$, $M_n(f\circ T) \ge 0$. Therefore,

$$\begin{aligned}
\int_{\{M_n(f) > 0\}\cap G} f(X)\,dP &\ge \int_{\{M_n(f) > 0\}\cap G} \big(M_n(f) - M_n(f\circ T)\big)\,dP\\
&= \int_G M_n(f)\,dP - \int_{\{M_n(f) > 0\}\cap G} M_n(f\circ T)\,dP\\
&\ge \int_G M_n(f)\,dP - \int_G M_n(f\circ T)\,dP = 0,
\end{aligned}$$

where the last equality follows from the invariance of $G$ and the stationarity of $\{X_n\}$. Thus, (T.9.31) holds with $\{M_n(f) > 0\}$ in place of $\{M(f) > 0\}$. Now let $n \uparrow \infty$. ■

Now consider the quantities
$$A_n(f) := \max\Big\{f(X),\ \frac{1}{2}\big(f(X) + f(TX)\big),\ \ldots,\ \frac{1}{n}\sum_{r=0}^{n-1} f(T^rX)\Big\},$$
$$A(f) := \lim_{n\to\infty} A_n(f) = \sup_n A_n(f).$$
The following is a consequence of Proposition T.9.3.
Corollary T.9.4. (Ergodic Maximal Inequality). Under the hypothesis of Theorem T.9.2 one has, for every $c \in \mathbb R^1$,

$$\int_{\{A(f) > c\}\cap G} f(X)\,dP \ge c\,P(\{A(f) > c\}\cap G) \qquad \forall\, G \in \mathcal I. \tag{T.9.32}$$
Proof. Apply Proposition T.9.3 to the function $f - c$ to get

$$\int_{\{M(f - c) > 0\}\cap G} f(X)\,dP \ge c\,P(\{M(f - c) > 0\}\cap G).$$

But $\{M_n(f - c) > 0\} = \{A_n(f - c) > 0\} = \{A_n(f) > c\}$, and $\{M(f - c) > 0\} = \bigcup_n \{A_n(f) > c\} = \{A(f) > c\}$. ■ We are now ready to prove Theorem T.9.2, using (T.9.31). Proof of Theorem T.9.2. Write
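$$\bar f(X) := \limsup_{n\to\infty}\frac{1}{n}\sum_{r=0}^{n-1} f(T^rX), \qquad \underline f(X) := \liminf_{n\to\infty}\frac{1}{n}\sum_{r=0}^{n-1} f(T^rX),$$
$$G_{c,d}(f) := \{\bar f(X) > c,\ \underline f(X) < d\} \ \ldots$$

… $\ge 0$, since $n^{-1}\sum_{r=0}^{n-1} f^+(T^rX) \to \overline{f^+}(X)$ a.s. and $n^{-1}\sum_{r=0}^{n-1} f^-(T^rX) \to \overline{f^-}(X)$ a.s., where $f^+ = \max\{f, 0\}$ and $-f^- = \min\{f, 0\}$. Assume then $f \ge 0$. First, by Fatou's Lemma and stationarity of $\{X_n\}$,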
$$E\bar f(X) \le \liminf_{n\to\infty} E\Big(\frac{1}{n}\sum_{r=0}^{n-1} f(T^rX)\Big) = Ef(X) < \infty.$$
To prove the $L^1$-convergence, it is enough to prove the uniform integrability of the sequence $\{(1/n)S_n(f): n \ge 1\}$, where $S_n(f) := \sum_{r=0}^{n-1} f(T^rX)$. Now since $f(X)$ is nonnegative and integrable, given $\varepsilon > 0$ there exists a constant $N_\varepsilon$ such that
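$\|f(X) - f_\varepsilon(X)\|_1 < \varepsilon$, where $f_\varepsilon := \min\{f, N_\varepsilon\}$ is the truncation of $f$ at level $N_\varepsilon$. Writing $S_n(f) = S_n(f - f_\varepsilon) + S_n(f_\varepsilon)$, using stationarity for the first term, $f_\varepsilon \le N_\varepsilon$ for the second, and then Chebyshev's inequality, one gets for every $\lambda > 0$ and $n \ge 1$

$$\int_{\{S_n(f)/n > \lambda\}} \frac{1}{n}S_n(f)\,dP \le \|f(X) - f_\varepsilon(X)\|_1 + N_\varepsilon P\Big(\Big\{\frac{1}{n}S_n(f) > \lambda\Big\}\Big) \le \varepsilon + N_\varepsilon\,Ef(X)/\lambda. \tag{T.9.36}$$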
It follows that the left side of (T.9.36) goes to zero as $\lambda \to \infty$, uniformly in $n$. Part (b) is an immediate consequence of part (a). ■ Notice that part (a) of Theorem T.9.2 also implies that $g(X) = E(f(X) \mid \mathcal I)$. Theorem T.9.2 is generally stated for any transformation $T$ on a probability space $(\Omega, \mathcal F, P)$ satisfying $P(T^{-1}G) = P(G)$ for all $G \in \mathcal F$. Such a transformation is called measure-preserving. If in this case we take $X$ to be the identity map, $X(\omega) = \omega$, then parts (a) and (b) hold without any essential change in the proof.
Theoretical Complements to Section II.10
1. To prove the FCLT for Markov-dependent summands as asserted in Theorem 10.2, first consider the normalized partial-sum process $\{X_t^{(n)}\}$ of $Z_1 + Z_2 + \cdots$. Since $Z_1, Z_2, \ldots$ are i.i.d. with finite second moment, the FCLT of Chapter I provides that $\{X_t^{(n)}\}$ converges in distribution to standard Brownian motion. The corresponding result for $\{W_n(t)\}$ follows by an application of the Maximal Inequality to show

$$\sup_{0 \le t \le 1}\big|X_t^{(n)} - W_n(t\,E_j\tau)\big| \to 0 \quad \text{in probability as } n \to \infty, \tag{T.10.1}$$

where $\tau$ is the first return time to $j$.

Theoretical Complements to Section II.12
There are specifications of local structure that are defined in a natural manner but for which there are no Gibbs states having the given structure when, for example, $\Lambda = \mathbb Z$ but $S$ is not finite. As an example, one can take $q$ to be the transition matrix of a (general) random walk on $S = \mathbb Z$ such that $q_{ij} = q_{0,j-i} > 0$ for all $i, j$. In this case no probability distribution on $S^{\mathbb Z}$ exists having the local structure furnished by (12.10). For proofs, refer to the papers of F. Spitzer (1974), "Phase Transition in One-Dimensional Nearest-Neighbor Systems," J. Functional Analysis, 20, pp. 240-254; H. Kesten (1975), "Existence and Uniqueness of Countable One-Dimensional Markov Random Fields," Ann. Probab., 4, pp. 557-569. The treatment here follows F. Spitzer (1971), "Random Fields and Interacting Particle Systems," MAA Lecture Notes, Washington, D.C.
Theoretical Complements to Section II.14 (Markov Processes and Iterations of I.I.D. Maps)

1. Let $p(x; dy)$ denote a transition probability on a state space $(S, \mathcal S)$; that is, (1) for each $x \in S$, $p(x; dy)$ is a probability measure on $(S, \mathcal S)$, and (2) for each $B \in \mathcal S$, $x \to p(x; B)$ is $\mathcal S$-measurable. We will assume that $S$ is a Borel subset of a complete separable metric space, and $\mathcal S$ its Borel sigmafield $\mathcal B(S)$. It may be shown that $S$ may be "relabeled" as a Borel subset $C$ of $[0, 1]$, with $\mathcal B(C)$ as the relabeling of $\mathcal B(S)$. (See H. L. Royden (1968), Real Analysis, 2nd ed., Macmillan, New York, pp. 326-327.) Therefore, without any essential loss of generality, we take $S$ to be a Borel subset of $[0, 1]$. For each $x \in S$, let $F_x(\cdot)$ denote the distribution function of $p(x; dy)$: $F_x(y) := p(x; S \cap (-\infty, y])$. Define $F_x^{-1}(t) := \inf\{y \in \mathbb R^1: F_x(y) > t\}$. Let $U$ be a random variable defined on some probability space $(\Omega, \mathcal F, P)$ whose distribution is uniform on $(0, 1)$. Then it is simple to check that $P(F_x^{-1}(U) \le y) \ge P(F_x(y) > U) = P(U < F_x(y)) = F_x(y)$, and $P(F_x^{-1}(U) \le y) \le P(F_x(y) \ge U) = P(U \le F_x(y)) = F_x(y)$. Therefore, $P(F_x^{-1}(U) \le y) = F_x(y)$; that is, the distribution of $F_x^{-1}(U)$ is $p(x; dy)$. Now let $U_1, U_2, \ldots$ be a sequence of i.i.d. random variables on $(\Omega, \mathcal F, P)$, each having the uniform distribution on $(0, 1)$. Let $X_0$ be a random variable with values in $S$, independent of $\{U_n\}$. Define $X_{n+1} = f(X_n, U_{n+1})$ $(n \ge 0)$, where $f(x, u) := F_x^{-1}(u)$. It then follows from the above that $\{X_n: n \ge 0\}$ is a Markov process having transition probability $p(x; dy)$ and initial distribution that of $X_0$. Of course, this type of representation of a Markov process having a given transition probability and a given initial distribution is not unique. For additional information, see R. M. Blumenthal and H. K. Corson (1972), "On Continuous Collections of Measures," Proc. 6th Berkeley Symposium on Math. Stat. and Prob., Vol. 2, pp. 33-40.
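The construction $X_{n+1} = F_{X_n}^{-1}(U_{n+1})$ can be sketched in a few lines for a kernel supported on finitely many points of $[0, 1]$. The three-point state space and the rows of `KERNEL` below are hypothetical; the sketch only checks that the empirical one-step frequencies produced by the iterated map agree with $p(x; dy)$, as the argument above predicts. (Python's `random()` draws from $[0, 1)$ rather than $(0, 1)$, which does not matter here.)

```python
import random
from collections import Counter

# Hypothetical kernel p(x; dy) supported on S = {0.0, 0.5, 1.0}, a Borel subset of [0, 1].
STATES = [0.0, 0.5, 1.0]
KERNEL = {
    0.0: [0.2, 0.5, 0.3],
    0.5: [0.6, 0.0, 0.4],
    1.0: [0.1, 0.1, 0.8],
}

def F(x, y):
    """Distribution function F_x(y) = p(x; S ∩ (-inf, y])."""
    return sum(w for s, w in zip(STATES, KERNEL[x]) if s <= y)

def F_inv(x, t):
    """F_x^{-1}(t) = inf{y : F_x(y) > t}; for a kernel on finitely many points the inf is a state."""
    for s in STATES:
        if F(x, s) > t:
            return s
    return STATES[-1]   # reached only if rounding leaves F_x(max S) <= t

def step(x, u):
    """The iterated map f(x, u) := F_x^{-1}(u)."""
    return F_inv(x, u)

rng = random.Random(1)
x, n = 0.0, 100_000
pair_counts, visits = Counter(), Counter()
for _ in range(n):
    y = step(x, rng.random())   # X_{n+1} = f(X_n, U_{n+1}) with U_{n+1} uniform
    pair_counts[(x, y)] += 1
    visits[x] += 1
    x = y

for (a, b), c in sorted(pair_counts.items()):
    print(f"p({a}; {{{b}}}): empirical {c / visits[a]:.3f}, kernel {KERNEL[a][STATES.index(b)]}")
```

Note that the transition $0.5 \to 0.5$, which has probability zero under the kernel, never occurs: for $x = 0.5$ the generalized inverse jumps directly from $0.0$ to $1.0$ at $t = 0.6$.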
2. Example 1 is essentially due to L. E. Dubins and D. A. Freedman (1966), "Invariant Probabilities for Certain Markov Processes," Ann. Math. Statist., 37, pp. 837-847. The assumption of continuity of the maps is not needed, as shown in J. A. Yahav (1975), "On a Fixed Point Theorem and Its Stochastic Equivalent," J. Appl. Probability, 12, pp. 605-611. An extension to multidimensional state space with an application to time series models may be found in R. N. Bhattacharya and O. Lee (1988), "Asymptotics of a Class of Markov Processes Which Are Not in General Irreducible," Ann. Probab., 16, pp. 1333-1347. Example 1(a) may be found in L. J. Mirman (1980), "One Sector Economic Growth and Uncertainty: A Survey," Stochastic Programming (M. A. H. Dempster, ed.), Academic Press, New York. It is shown in theoretical complement 4 below that the existence of a unique invariant probability implies ergodicity of a stationary Markov process. The SLLN then follows from Birkhoff's Ergodic Theorem (see Theorem T.9.2). Central limit theorems for normalized partial sums may be derived for appropriate functions on the state space, by Theorem T.13.3 in the theoretical complements of Chapter V. Also see Bhattacharya and Lee, loc. cit. Example 3 is due to M. Majumdar and R. Radner, unpublished manuscript. K. S. Chan and H. Tong (1985), "On the Use of Deterministic Lyapunov Function for the Ergodicity of Stochastic Difference Equations," Advances in Appl. Probability, 17, pp. 666-678, consider iterations of i.i.d. piecewise linear maps.

3. (Irreducible Markov Processes) A transition probability $p(x; dy)$ on the state space $(S, \mathcal S)$ is said to be $\varphi$-irreducible with respect to a sigmafinite nonzero measure $\varphi$ if, for each $x \in S$ and $B \in \mathcal S$ with $\varphi(B) > 0$, there exists an integer $n = n(x, B)$ such that
$p^{(n)}(x, B) > 0$. There is an extensive literature on the asymptotics of $\varphi$-irreducible Markov processes. We mention in particular N. Jain and B. Jamison (1967), "Contributions to Doeblin's Theorem of Markov Processes," Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 8, pp. 19-40; S. Orey (1971), Limit Theorems for Markov Chain Transition Probabilities, Van Nostrand, New York; R. L. Tweedie (1975), "Sufficient Conditions for Ergodicity and Recurrence of Markov Chains on a General State Space," Stochastic Process. Appl., 3, pp. 385-403. Irreducible Markov chains on countable state spaces $S$ are the simplest examples of $\varphi$-irreducible processes; here $\varphi$ is the counting measure, $\varphi(B) :=$ number of points in $B$. Some other examples are given in theoretical complements to Section II.6. There is no general theory that applies if $p(x; dy)$ is not $\varphi$-irreducible for any sigmafinite $\varphi$. The method of iterated maps provides one approach when the Markov process arises naturally in this manner. A simple example of a nonirreducible $p$ is given by Example 2. Another example, in which $p$ admits a unique invariant probability, is provided by the simple linear model $X_{n+1} = \frac{1}{2}X_n + \varepsilon_{n+1}$, where the $\varepsilon_n$ are i.i.d., $P(\varepsilon_n = 1) = P(\varepsilon_n = -1) = \frac{1}{2}$.
4. (Ergodicity, SLLN, and the Uniqueness of Invariant Probabilities) Suppose $\{X_n: n \ge 0\}$ is a stationary Markov process on a state space $(S, \mathcal S)$, having a transition probability $p(x; dy)$ and an invariant initial distribution $\pi$. We will prove the following result: The process $\{X_n\}$ is ergodic if and only if there does not exist an invariant probability $\pi'$ that is absolutely continuous with respect to $\pi$ and different from $\pi$. The crucial step in the proof is to show that every (shift) invariant bounded measurable function $h(X)$ is a.s. equal to a random variable $g(X_0)$, where $g$ is a measurable function on $(S, \mathcal S)$. Here $X = (X_0, X_1, X_2, \ldots)$, and we let $T$ denote the shift transformation and $\mathcal I$ the (shift) invariant sigmafield (see theoretical complement 9.3). Now if $h(X)$ is invariant, $h(X) = h(T^nX)$ a.s. for all $n \ge 1$. Then, by the Markov property, $E(h(X) \mid \sigma\{X_0, \ldots, X_n\}) = E(h(T^nX) \mid \sigma\{X_0, \ldots, X_n\}) = E(h(T^nX) \mid \sigma\{X_n\}) = g(X_n)$, where $g(x) := E(h(X_0, X_1, \ldots) \mid X_0 = x)$. By the Martingale Convergence Theorem (see theoretical complement 5.1 to Chapter IV, Theorem T.5.2), applied to the martingale $g(X_n) = E(h(X) \mid \sigma\{X_0, \ldots, X_n\})$, $g(X_n)$ converges a.s., and in $L^1$, to $E(h(X) \mid \sigma\{X_0, X_1, \ldots\}) = h(X)$. But $g(X_n) - h(X) = g(X_n) - h(T^nX)$ a.s. has the same distribution as $g(X_0) - h(X)$ for all $n \ge 1$. Therefore, $g(X_0) - h(X) = 0$ a.s., since the limit of $g(X_n) - h(X)$ is zero a.s. In particular, if $G \in \mathcal I$ then there exists $B \in \mathcal S$ such that $\{X_0 \in B\} = G$ a.s. This implies $\pi(B) = P(X_0 \in B) = P(G)$. If $\{X_n\}$ is not ergodic, then there exists $G \in \mathcal I$ such that $0 < P(G) < 1$ and, therefore, $0 < \pi(B) < 1$ for a corresponding set $B \in \mathcal S$ as above. But the probability measure $\pi_B$ defined by $\pi_B(A) := \pi(A \cap B)/\pi(B)$, $A \in \mathcal S$, is invariant. To see this, observe that $\int p(x; A)\,\pi_B(dx) = \int_B p(x; A)\,\pi(dx)/\pi(B) = P(X_0 \in B, X_1 \in A)/\pi(B) = P(X_1 \in B, X_1 \in A)/\pi(B)$ (since $\{X_0 \in B\}$ is invariant) $= P(X_0 \in A \cap B)/\pi(B)$ (by stationarity) $= \pi(A \cap B)/\pi(B) = \pi_B(A)$.
Since $\pi_B(B) = 1 > \pi(B)$, and $\pi_B$ is absolutely continuous with respect to $\pi$, one half of the italicized statement is proved. To prove the other half, suppose $\{X_n\}$ is ergodic and $\pi'$ is also invariant and absolutely continuous with respect to $\pi$. Fix $A \in \mathcal S$. By Birkhoff's Ergodic Theorem, and conditioning on $X_0$, $(1/n)\sum_{r=0}^{n-1} p^{(r)}(x; A)$ converges to $\pi(A)$ for all $x$ outside a set of zero $\pi$-measure; since $\pi' \ll \pi$, this exceptional set also has zero $\pi'$-measure. Now the invariance of $\pi'$ implies $\int (1/n)\sum_{r=0}^{n-1} p^{(r)}(x; A)\,\pi'(dx) = \pi'(A)$ for all $n$. Therefore, letting $n \to \infty$ and using bounded convergence, $\pi'(A) = \pi(A)$. Thus $\pi' = \pi$, completing the proof. As a very special case, the following strong law of large numbers (SLLN) for Markov processes on general state spaces is obtained: If $p(x; dy)$ admits a unique invariant probability $\pi$, and $\{X_n: n \ge 0\}$ is a Markov process with transition probability
$p$ and initial distribution $\pi$, then $(1/n)\sum_{r=0}^{n-1} f(X_r)$ converges to $\int f(x)\,\pi(dx)$ a.s., provided that $\int |f(x)|\,\pi(dx) < \infty$. This also implies, by conditioning on $X_0$, that this almost sure convergence holds under all initial states $x$ outside a set of zero $\pi$-measure.
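A finite, deliberately reducible example illustrates the construction in the first half of the proof. In the hypothetical chain below, $\{0, 1\}$ and $\{2, 3\}$ are closed classes, so under the invariant mixture `pi` the event $G = \{X_0 \in B\}$ with $B = \{0, 1\}$ is invariant with $P(G) = 1/2$, and the process is not ergodic. Exact rational arithmetic confirms that $\pi_B(A) = \pi(A \cap B)/\pi(B)$ is again invariant, absolutely continuous with respect to $\pi$, and different from $\pi$.

```python
from fractions import Fraction as Fr

# Hypothetical reducible chain on S = {0, 1, 2, 3}; {0, 1} and {2, 3} are closed classes.
P = [
    [Fr(1, 2), Fr(1, 2), 0, 0],
    [Fr(1, 4), Fr(3, 4), 0, 0],
    [0, 0, Fr(1, 3), Fr(2, 3)],
    [0, 0, Fr(1, 2), Fr(1, 2)],
]

def apply_T_star(mu):
    """One step of the distribution: (T* mu)(j) = sum_i mu(i) p(i; j)."""
    return [sum(mu[i] * P[i][j] for i in range(4)) for j in range(4)]

def restrict(pi, B):
    """pi_B(A) = pi(A ∩ B) / pi(B), returned as a vector on S."""
    mass = sum(pi[i] for i in B)
    return [pi[i] / mass if i in B else Fr(0) for i in range(4)]

# Invariant law: half of (1/3, 2/3) on {0, 1} plus half of (3/7, 4/7) on {2, 3}.
pi = [Fr(1, 6), Fr(1, 3), Fr(3, 14), Fr(2, 7)]
B = {0, 1}
pi_B = restrict(pi, B)

print(apply_T_star(pi) == pi)                      # pi is invariant
print(pi_B, apply_T_star(pi_B) == pi_B, pi_B != pi)
```

Started at state 0, this chain never leaves $B$, so its time averages converge to $\int f\,d\pi_B$ rather than to $\int f\,d\pi$: the failure of ergodicity in concrete form.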
5. (Ergodic Decomposition of a Compact State Space) Suppose $S$ is a compact metric space and $\mathcal S = \mathcal B(S)$ its Borel sigmafield. Let $p(x; dy)$ be a transition probability on $(S, \mathcal B(S))$ having the Feller property: $x \to p(x; dy)$ is weakly continuous on $S$ into $\mathcal P(S)$, the set of all probability measures on $(S, \mathcal B(S))$. Let $T^*$ denote the map on $\mathcal P(S)$ into $\mathcal P(S)$ defined by $(T^*\mu)(B) := \int p(x; B)\,\mu(dx)$ $(B \in \mathcal B(S))$. Then $T^*$ is weakly continuous. For if probability measures $\mu_n$ converge weakly to $\mu$ then, for every real-valued bounded continuous $f$ on $S$, $\int f\,d(T^*\mu_n) = \int\big(\int f(y)\,p(x; dy)\big)\mu_n(dx)$ converges to $\int\big(\int f(y)\,p(x; dy)\big)\mu(dx) = \int f\,d(T^*\mu)$, since $x \to \int f(y)\,p(x; dy)$ is continuous by the Feller property of $p$. Let us show that under the above hypothesis there exists at least one invariant probability for $p$. Fix $\mu \in \mathcal P(S)$. Consider the sequence of probability measures

$$\mu_n := \frac{1}{n}\sum_{r=0}^{n-1} T^{*r}\mu \qquad (n \ge 1),$$

where $T^{*0}\mu = \mu$, $T^{*1}\mu = T^*\mu$, and $T^{*(r+1)}\mu = T^*(T^{*r}\mu)$ $(r \ge 1)$.
Since $S$ is compact, by Prohorov's Theorem (see theoretical complement 8.2 of Chapter I), there exists a subsequence $\{\mu_{n'}\}$ of $\{\mu_n\}$ such that $\mu_{n'}$ converges weakly to a probability measure $\pi$, say. Then $T^*\mu_{n'}$ converges weakly to $T^*\pi$. On the other hand, for every real-valued bounded continuous $f$ on $S$,
$$\Big|\int f\,d\mu_{n'} - \int f\,d(T^*\mu_{n'})\Big| = \frac{1}{n'}\Big|\int f\,d\mu - \int f\,d(T^{*n'}\mu)\Big| \le \big(\sup\{|f(x)|: x \in S\}\big)\frac{2}{n'} \to 0$$
as $n' \to \infty$. Therefore, $\{\mu_{n'}\}$ and $\{T^*\mu_{n'}\}$ converge to the same limit. In other words, $\pi = T^*\pi$, or $\pi$ is invariant. This also shows that on a compact metric space, and with $p$ having the Feller property, if there exists a unique invariant probability $\pi$, then $(1/n)\sum_{r=0}^{n-1} T^{*r}\mu$ converges weakly to $\pi$, no matter what the initial distribution $\mu$ is. Next, consider the set $\mathcal M = \mathcal M_p$ of all invariant probabilities for $p$. This is a convex and (weakly) compact subset of $\mathcal P(S)$. Convexity is obvious. Weak compactness follows from the facts (i) $\mathcal P(S)$ is weakly compact (by Prohorov's Theorem), and (ii) $T^*$ is continuous for the weak topology on $\mathcal P(S)$. For, if $\mu_n \in \mathcal M$ and $\mu_n$ converges weakly to $\mu$, then $\mu_n = T^*\mu_n$ converges weakly to $T^*\mu$. Therefore, $T^*\mu = \mu$. Also, $\mathcal P(S)$ is a metric space (see, e.g., K. R. Parthasarathy (1967), Probability Measures on Metric Spaces, Academic Press, New York, p. 43). It now follows from the Krein-Milman Theorem (see H. L. Royden (1968), Real Analysis, 2nd ed., Macmillan, New York, p. 207) that $\mathcal M$ is the closed convex hull of its extreme points. Now if $\{X_n\}$ is not ergodic under an invariant initial distribution $\pi$, then, by the construction given in theoretical complement 4 above, there exists $B \in \mathcal B(S)$ such that $0 < \pi(B) < 1$ and $\pi = \pi(B)\pi_B + \pi(B^c)\pi_{B^c}$, with $\pi_B$ and $\pi_{B^c}$ mutually singular invariant probabilities. In other words, the set $K$, say, of extreme points of $\mathcal M$ comprises those $\pi$ such that $\{X_n\}$ with initial distribution $\pi$ is ergodic. Every $\pi \in \mathcal M$ is a (weak) limit of convex combinations of the form $\sum_i \lambda_i^{(n)}\mu_i^{(n)}$ $(n \to \infty)$, where $0 \le \lambda_i^{(n)} \le 1$, $\sum_i \lambda_i^{(n)} = 1$, $\mu_i^{(n)} \in K$.
Therefore, the limit $\pi$ may be expressed uniquely as $\pi = \int_K \mu\,m(d\mu)$, where $m$ is a probability measure on $(K, \mathcal B(K))$. This means, for every real-valued bounded continuous $f$, $\int_S f\,d\pi = \int_K\big(\int_S f\,d\mu\big)\,m(d\mu)$.
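On a finite state space (compact under the discrete metric, where every kernel has the Feller property) the averaging argument above can be checked directly. The sketch uses a hypothetical periodic two-state chain for which $T^{*r}\mu$ itself oscillates, while the Cesàro averages $\mu_n$ settle at the invariant law $(1/2, 1/2)$ and obey the bound $|\int f\,d\mu_n - \int f\,d(T^*\mu_n)| \le 2\sup|f|/n$ used in the proof.

```python
from fractions import Fraction as Fr

# Hypothetical periodic chain on S = {0, 1}: the walker always switches state.
P = [[0, 1],
     [1, 0]]

def T_star(mu):
    """(T* mu)(j) = sum_i mu(i) p(i; j)."""
    return [sum(mu[i] * P[i][j] for i in range(len(mu))) for j in range(len(mu))]

def cesaro(mu, n):
    """mu_n = (1/n) * sum_{r=0}^{n-1} T*^r mu, computed exactly."""
    avg = [Fr(0)] * len(mu)
    cur = list(mu)
    for _ in range(n):
        avg = [a + c for a, c in zip(avg, cur)]
        cur = T_star(cur)
    return [a / n for a in avg]

mu = [Fr(1), Fr(0)]
print(T_star(mu), T_star(T_star(mu)))   # T*^r mu alternates and does not converge
mu_n = cesaro(mu, 1001)
gap = max(abs(a - b) for a, b in zip(mu_n, T_star(mu_n)))
print(mu_n, gap)   # gap is taken over indicator functions f, so sup|f| = 1
```

Here $\mu_{1001} = (501/1001, 500/1001)$, the gap is $1/1001$, and $\mu_{1000}$ is exactly the invariant law.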
Theoretical Complements to Section II.15

1. For some of Claude Shannon's applications of information theory to language structure, see C. E. Shannon (1951), "Prediction and Entropy of Printed English," Bell System Tech. J., 30(1), pp. 50-64. The basic ideas originated in C. E. Shannon (1948), "A Mathematical Theory of Communication," Bell System Tech. J., 27, pp. 379-423, 623-656. There are a number of excellent textbooks and references devoted to this and other problems of information theory. A few standard references are: C. E. Shannon and W. Weaver (1949), The Mathematical Theory of Communication, University of Illinois Press, Urbana; and N. Abramson (1963), Information Theory and Coding, McGraw-Hill, New York.
CHAPTER III
Birth—Death Markov Chains
1 INTRODUCTION TO BIRTH-DEATH CHAINS

Each of the simple random walk examples described in Section I.3 has the special property that it does not skip states in its evolution. In this vein, we shall study time-homogeneous Markov chains called birth-death chains whose transition law takes the form

$$p_{ij} = \begin{cases} \beta_i & \text{if } j = i + 1,\\ \delta_i & \text{if } j = i - 1,\\ \alpha_i & \text{if } j = i,\\ 0 & \text{otherwise,} \end{cases} \tag{1.1}$$

where $\alpha_i + \beta_i + \delta_i = 1$. In particular, the displacement probabilities may depend on the state in which the process is located.
Example 1. (The Bernoulli-Laplace Model). A simple model to describe the mixing of two incompressible liquids in possibly different proportions can be obtained by the following considerations. Consider two containers labeled box I and box II, respectively, each having $N$ balls. Among the total of $2N$ balls, there are $2r$ red and $2w$ white balls, $1 \le r \le w$. At each instant of time, a ball is randomly selected from each of the boxes and moved to the other box. The state at each instant is the number of red balls in box I. In this example, the state space is $S = \{0, 1, \ldots, 2r\}$ and the evolution is a Markov chain on $S$ with transition probabilities given by …
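…

$$\rho_{yc} = 1 \ \text{for all } y > c \quad \text{iff} \quad \sum_{x=1}^{\infty}\frac{\delta_1\delta_2\cdots\delta_x}{\beta_1\beta_2\cdots\beta_x} = \infty,$$
$$\rho_{yc} < 1 \ \text{for all } y > c \quad \text{iff} \quad \sum_{x=1}^{\infty}\frac{\delta_1\delta_2\cdots\delta_x}{\beta_1\beta_2\cdots\beta_x} < \infty. \tag{2.14}$$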
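By relabeling the states $i$ as $-i$ $(i = 0, \pm 1, \pm 2, \ldots)$, one gets (Exercise 2)

$$\rho_{yd} = 1 \ \text{for all } y < d \quad \text{iff} \quad \sum_{x=-\infty}^{0}\frac{\beta_x\beta_{x+1}\cdots\beta_0}{\delta_x\delta_{x+1}\cdots\delta_0} = \infty \tag{2.15}$$

…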
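… $> 0$ for $i > 0$, $\beta_i + \delta_i \le 1$. For $c, d \in S$, the probability $\psi(y)$ is given by (2.10), and the probability $\rho_{y0}$, which is also interpreted as the probability of eventual absorption starting at $y > 0$, is given by

$$\rho_{y0} = \lim_{d\to\infty}\frac{\displaystyle\sum_{x=y}^{d-1}\frac{\delta_x\delta_{x-1}\cdots\delta_1}{\beta_x\beta_{x-1}\cdots\beta_1}}{\displaystyle 1 + \sum_{x=1}^{d-1}\frac{\delta_x\delta_{x-1}\cdots\delta_1}{\beta_x\beta_{x-1}\cdots\beta_1}} = 1 \quad \text{iff} \quad \sum_{x=1}^{\infty}\frac{\delta_1\delta_2\cdots\delta_x}{\beta_1\beta_2\cdots\beta_x} = \infty \qquad (\text{for } y > 0). \tag{2.19}$$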
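For finite $d$, the ratio inside the limit in (2.19) is the probability, starting from $y$, of reaching 0 before $d$. A quick exact check of this, sketched with hypothetical state-dependent birth and death probabilities (the helper names and parameter values are illustrative), is that the ratio equals 1 at $y = 0$, equals 0 at $y = d$, and satisfies the first-step equation $h(y) = \delta_y h(y - 1) + \beta_y h(y + 1) + \alpha_y h(y)$ for $0 < y < d$.

```python
from fractions import Fraction as Fr

def ratios(beta, delta, d):
    """r[0] = 1 and r[x] = (delta_1 ... delta_x) / (beta_1 ... beta_x) for 1 <= x <= d - 1."""
    r = [Fr(1)]
    for x in range(1, d):
        r.append(r[-1] * delta[x] / beta[x])
    return r

def reach_0_before_d(y, beta, delta, d):
    """sum_{x=y}^{d-1} r_x / (1 + sum_{x=1}^{d-1} r_x): the finite-d ratio in (2.19)."""
    r = ratios(beta, delta, d)
    return sum(r[y:d]) / sum(r[:d])

# Hypothetical state-dependent parameters on {1, ..., d - 1}.
d = 6
beta = {x: Fr(1, x + 2) for x in range(1, d)}
delta = {x: Fr(1, 3) for x in range(1, d)}
h = [reach_0_before_d(y, beta, delta, d) for y in range(d + 1)]
print(h)

# Symmetric case beta = delta = 1/2: every r_x = 1, so the ratio is (d - y)/d, which
# tends to 1 as d grows, in line with divergence of the series in (2.19).
half = {x: Fr(1, 2) for x in range(1, 100)}
print(reach_0_before_d(3, half, half, 10), reach_0_before_d(3, half, half, 100))
```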
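Whether or not the last series diverges,

$$\rho_{y0} \ge \delta_y\delta_{y-1}\cdots\delta_1 > 0 \qquad \text{for all } y > 0, \tag{2.20}$$

and

$$\rho_{yd} \le 1 - \delta_y\delta_{y-1}\cdots\delta_1 < 1 \quad \text{for } d > y > 0, \qquad \rho_{0d} = 0 \quad \text{for all } d > 0. \tag{2.21}$$

By (2.16) it follows that

$$\rho_{yy} < 1 \qquad (y > 0). \tag{2.22}$$

Thus, all nonzero states $y$ are transient.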
CASE IV. As a final illustration of transience-recurrence conditions, take the case of one reflecting boundary at 0 with $S = \{0, 1, 2, 3, \ldots\}$ and $p_{00} = 1 - \beta_0$, $p_{01} = \beta_0$, $p_{i,i+1} = \beta_i$, $p_{i,i-1} = \delta_i$, $p_{ii} = 1 - \beta_i - \delta_i$ for $i > 0$; $\beta_i > 0$ for all $i$, $\delta_i > 0$ for $i \ge 1$, $\beta_i + \delta_i \le 1$. Let us now see that all states are recurrent if and only if the infinite series in (2.19) diverges, i.e., if and only if $\rho_{y0} = 1$. First assume that the infinite series in (2.19) diverges, i.e., $\rho_{y0} = 1$ for all $y > 0$. Then condition on $X_1$ to get

$$\rho_{00} = (1 - \beta_0) + \beta_0\rho_{10}, \tag{2.23}$$

so that

$$\rho_{00} = 1. \tag{2.24}$$

Next look at (see Eq. 2.16)

$$\rho_{11} = \delta_1\rho_{01} + \beta_1\rho_{21} + (1 - \delta_1 - \beta_1). \tag{2.25}$$

Since $\rho_{20} = 1$ and the process does not skip states, $\rho_{21} = 1$. Also, $\rho_{01} = 1$ (Exercise 6). Thus, $\rho_{11} = 1$ and, proceeding by induction,

$$\rho_{yy} = 1 \qquad \text{for each } y > 0. \tag{2.26}$$
On the other hand, if the series in (2.19) converges, then $\rho_{y0} < 1$ for all $y > 0$. In particular, from (2.23), we see $\rho_{00} < 1$. Convergence of the series in (2.19) also gives $\rho_{yc} < 1$ for all $c < y$ by (2.12). Now apply (2.16) to get …
$$\ldots \qquad (i \ge 0), \tag{4.31}$$
$$p_{0i}^{(n)} = \frac{1}{\pi}\int_0^{\pi}(\cos\theta)^n\cos(i\theta)\,d\theta \qquad (i \ge 0).$$
An alternative calculation of (4.31) can be made by first noting that the condition (4.3) is valid for the sequence of weights $\{\pi_i\}$ given in (4.1) with $\pi_0 = 1$. This provides an inner product (sequence) space on which $p$ is a bounded self-adjoint linear transformation. The spectral theory extends to such settings as well. An example in which the birth-death parameters are state dependent (i.e., nonconstant) is given in the chapter application.

5 CHAPTER APPLICATION: THE EHRENFEST MODEL OF HEAT EXCHANGE

The Ehrenfest model illustrates the process of heat exchange between two bodies that are in contact and insulated from the outside. The temperatures are assumed to change in steps of one unit and are represented by the numbers of balls in two boxes. The two boxes are marked I and II, and there are $2d$ balls labeled $1, 2, \ldots, 2d$. Initially some of these balls are in box I and the remainder in box II. At each step a ball is chosen at random (i.e., with equal probabilities among ball numbers $1, 2, \ldots, 2d$) and moved from its box to the other box. If there
are $i$ balls in box I, then there are $2d - i$ balls in box II. Thus there is no overall heat loss or gain. Let $X_n$ denote the number of balls in box I after the $n$th trial. Then $\{X_n: n = 0, 1, \ldots\}$ is a Markov chain with state space $S = \{0, 1, 2, \ldots, 2d\}$ and transition probabilities

$$p_{i,i+1} = 1 - \frac{i}{2d}, \qquad p_{i,i-1} = \frac{i}{2d} \qquad \text{for } i = 1, 2, \ldots, 2d - 1,$$
$$p_{01} = 1, \qquad p_{2d,2d-1} = 1, \qquad p_{ij} = 0 \ \text{otherwise}. \tag{5.1}$$
This is a birth-death chain with two reflecting boundaries at 0 and $2d$. The transition probabilities are such that the mean change in temperature, in box I, say, at each step is proportional to the negative of the existing temperature gradient, or temperature difference, between the two bodies. We will first see that the model yields Newton's law of cooling at the level of the evolution of the averages. Assume that initially there are $i$ balls in box I. Let $Y_n = X_n - d$, the excess of the number of balls in box I over $d$. Writing $e_n = E_i(Y_n)$, the expected value of $Y_n$ given $X_0 = i$, one has
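$$\begin{aligned}
e_n &= E_i(X_n - d) = E_i\big[X_{n-1} - d + (X_n - X_{n-1})\big] = E_i(X_{n-1} - d) + E_i(X_n - X_{n-1})\\
&= e_{n-1} + E_i\Big(\frac{2d - X_{n-1}}{2d} - \frac{X_{n-1}}{2d}\Big) = e_{n-1} + E_i\Big(1 - \frac{X_{n-1}}{d}\Big)\\
&= e_{n-1} - \frac{e_{n-1}}{d} = \Big(1 - \frac{1}{d}\Big)e_{n-1}.
\end{aligned}$$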
Note that in evaluating $E_i(X_n - X_{n-1})$ we first calculated the conditional expectation of $X_n - X_{n-1}$ given $X_{n-1}$ and then took the expectation of this conditional mean. Now, by successive applications of the relation $e_n = (1 - 1/d)e_{n-1}$,

$$e_n = \Big(1 - \frac{1}{d}\Big)^n e_0 = \Big(1 - \frac{1}{d}\Big)^n E_i(X_0 - d) = (i - d)\Big(1 - \frac{1}{d}\Big)^n. \tag{5.2}$$
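Equation (5.2) can be checked exactly by propagating the distribution of $X_n$ with the transition probabilities (5.1). The sketch below does this in rational arithmetic for the illustrative values $d = 5$ and $X_0 = 9$ (hypothetical choices), and also confirms that the binomial law $\pi_j = \binom{2d}{j}2^{-2d}$, given below as (5.4), is left unchanged by one step of the chain.

```python
from fractions import Fraction as Fr
from math import comb

def ehrenfest_step(dist, d):
    """Push a distribution on {0, ..., 2d} through one step of the chain (5.1)."""
    n = 2 * d
    new = [Fr(0)] * (n + 1)
    for i, p in enumerate(dist):
        if i < n:
            new[i + 1] += p * Fr(n - i, n)   # p_{i,i+1} = 1 - i/(2d)
        if i > 0:
            new[i - 1] += p * Fr(i, n)       # p_{i,i-1} = i/(2d)
    return new

def mean_excess(dist, d):
    """e_n = E_i(X_n) - d."""
    return sum(j * p for j, p in enumerate(dist)) - d

d, i0, steps = 5, 9, 12
dist = [Fr(0)] * (2 * d + 1)
dist[i0] = Fr(1)
excess = []
for n in range(1, steps + 1):
    dist = ehrenfest_step(dist, d)
    excess.append(mean_excess(dist, d))

predicted = [(i0 - d) * (1 - Fr(1, d)) ** n for n in range(1, steps + 1)]
print(excess == predicted)

pi = [Fr(comb(2 * d, j), 2 ** (2 * d)) for j in range(2 * d + 1)]
print(ehrenfest_step(pi, d) == pi)
```

Because the recursion for $e_n$ is linear, the agreement is exact, not merely approximate, even though the distribution of $X_n$ itself keeps oscillating in parity.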
Suppose in the physical model the frequency of transitions is $\tau$ per second. Then in time $t$ there are $n = t\tau$ transitions. Write $\nu = -\tau\log(1 - 1/d)$. Then

$$e_n = (i - d)e^{-\nu t}, \tag{5.3}$$

which is Newton's law of cooling. The equilibrium distribution for the Ehrenfest model is easily seen, using (3.5), to be
$$\pi_j = \binom{2d}{j}2^{-2d}, \qquad j = 0, 1, \ldots, 2d. \tag{5.4}$$
That is, $\pi = (\pi_j: j \in S)$ is binomial with parameters $2d$, $\frac{1}{2}$. Note that $d$ is the (constant) mean temperature under equilibrium in (5.3). The physicists P. and T. Ehrenfest in 1907, and later Smoluchowski in 1916, used this model in order to explain an apparent paradox that at the turn of the century threatened to wreck Boltzmann's kinetic theory of matter. In the kinetic theory, heat exchange is a random process, while in thermodynamics it is an orderly irreversible progression toward equilibrium. In the present context, thermodynamic equilibrium would be achieved when the temperatures of the two bodies became equal, or at least approximately or macroscopically equal. But if one uses a kinetic model such as the one described above, from the state $i = d$ of thermodynamical equilibrium the system will eventually pass to a state of extreme disequilibrium (e.g., $i = 0$) owing to recurrence. This would contradict irreversibility of thermodynamics. However, one of the main objectives of kinetic theory was to explain thermodynamics, a largely phenomenological macroscopic-scale theory, starting from the molecular theory of matter. Historically it was Poincaré who first showed that statistical-mechanical systems have the recurrence property (theoretical complement 2). A scientist named Zermelo then forcefully argued that recurrence contradicted irreversibility. Although Boltzmann rightly maintained that the time required by the random process to pass from the equilibrium state to a state of macroscopic nonequilibrium would be so large as to be of no physical significance, his reasoning did not convince other physicists. The Ehrenfests and Smoluchowski finally resolved the dispute by demonstrating how large the passage time may be from $i = d$ to $i = 0$ in the present model. Let us now present in detail a method of calculating the mean first passage time $m_i = E_iT_0$, where $T_0 = \inf\{n \ge 0: X_n = 0\}$.
Since the method is applicable to general birth-death chains, consider a state space $S = \{0, 1, 2, \ldots, N\}$ and a reflecting chain with parameters $\beta_i$, $\delta_i = 1 - \beta_i$, such that $0 < \beta_i < 1$ for $1 \le$ …
1. Let $A_d$ be the set $\{\omega: X_0(\omega) = y,\ \{X_n(\omega): n \ge 0\}$ reaches $c$ before $d\}$, where $y > c$. Show that $A_d \uparrow A = \{\omega: X_0(\omega) = y,\ \{X_n(\omega): n \ge 0\}$ ever reaches $c\}$ as $d \uparrow \infty$.
2. Prove (2.15) by using (2.14) and looking at $\{-X_n: n \ge 0\}$.
… $j$, and $\le (1 - \beta_0\beta_1\cdots\beta_{N-1})^m$ if $i < j$. Here $T_j = \inf\{n \ge 1: X_n = j\}$. (ii) Use (i) to prove that $\rho_{ij} = P_i(T_j < \infty) = 1$ for all $i, j$.

6. Consider a birth-death chain on $S = \{0, 1, \ldots\}$ with 0 reflecting. Argue as in Exercise 5 to show that $\rho_{0y} = 1$ for all $y$.

7. Consider a birth-death chain on $S = \{0, 1, \ldots, N\}$ with 0 and $N$ absorbing. Calculate

$$\lim_{n\to\infty}\frac{1}{n}\sum_{m=1}^{n}p_{ij}^{(m)}$$

for all $i, j$.
8. Let 0 be a reflecting boundary for a birth-death chain on $S = \{\ldots, -3, -2, -1, 0\}$. Derive the necessary and sufficient condition for recurrence.

9. If 0 is absorbing, and $N$ reflecting, for a birth-death chain on $S = \{0, 1, \ldots, N\}$, then show that 0 is recurrent and all other states are transient.

10. Let $p$ be the transition probability matrix of a birth-death chain on $S = \{0, 1, 2, \ldots\}$ with
$$\beta_j = \frac{j + 2}{2(j + 1)}, \qquad \delta_j = \frac{j}{2(j + 1)}, \qquad j = 0, 1, 2, \ldots.$$
(i) Are the states transient or recurrent? (ii) Compute the probability of reaching $c$ before $d$, $c < d$, starting from state $i$, $c \le$ …

… $k > 0$ is the spring constant. In particular, it follows that $T_tx = xA(t)$, where
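$$A(t) = \begin{pmatrix} \cos(\gamma t) & -m\gamma\sin(\gamma t)\\[4pt] \dfrac{1}{m\gamma}\sin(\gamma t) & \cos(\gamma t) \end{pmatrix}, \qquad t \ge 0,$$

where $\gamma = \sqrt{k/m} > 0$.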
Notice that areas (2-dimensional phase-space volume) are preserved under $T_t$, since $\det A(t) = 1$. The motion is obviously periodic in this case.
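A short numerical sketch can confirm that $x \mapsto xA(t)$ solves the equations of motion $\dot q = p/m$, $\dot p = -kq$ of the harmonic oscillator (the form assumed here for Example 1), that $\det A(t) = 1$, and that the motion has period $2\pi/\gamma$. The mass, spring constant and initial point are hypothetical; the reference solution is a plain fourth-order Runge-Kutta integration.

```python
import math

def A(t, m, k):
    """Matrix with (q(t), p(t)) = (q, p) A(t) for q' = p/m, p' = -k q; gamma = sqrt(k/m)."""
    g = math.sqrt(k / m)
    c, s = math.cos(g * t), math.sin(g * t)
    return [[c, -m * g * s],
            [s / (m * g), c]]

def flow_exact(x, t, m, k):
    """Row vector x = (q, p) multiplied by A(t)."""
    a = A(t, m, k)
    return (x[0] * a[0][0] + x[1] * a[1][0],
            x[0] * a[0][1] + x[1] * a[1][1])

def flow_rk4(x, t, m, k, steps=10_000):
    """Integrate q' = p/m, p' = -k q with classical Runge-Kutta."""
    def field(q, p):
        return p / m, -k * q
    q, p = x
    h = t / steps
    for _ in range(steps):
        s1 = field(q, p)
        s2 = field(q + h / 2 * s1[0], p + h / 2 * s1[1])
        s3 = field(q + h / 2 * s2[0], p + h / 2 * s2[1])
        s4 = field(q + h * s3[0], p + h * s3[1])
        q += h / 6 * (s1[0] + 2 * s2[0] + 2 * s3[0] + s4[0])
        p += h / 6 * (s1[1] + 2 * s2[1] + 2 * s3[1] + s4[1])
    return q, p

m, k = 2.0, 8.0                                   # hypothetical values, gamma = 2
a = A(0.7, m, k)
print(a[0][0] * a[1][1] - a[0][1] * a[1][0])      # det A(t), equal to 1 up to rounding
print(flow_exact((1.0, 0.5), 1.3, m, k), flow_rk4((1.0, 0.5), 1.3, m, k))
```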
Example 2. A standard model in statistical mechanics is that of a system having $k$ (generalized) position coordinates $q_1, \ldots, q_k$ and $k$ corresponding (generalized) momentum coordinates $p_1, \ldots, p_k$. The law of evolution is usually cast in Hamiltonian form:

$$\frac{dq_i}{dt} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dt} = -\frac{\partial H}{\partial q_i}, \qquad i = 1, \ldots, k, \tag{T.5.2}$$
where $H \equiv H(q_1, \ldots, q_k, p_1, \ldots, p_k)$ is the Hamiltonian function representing the total energy (kinetic energy plus potential energy) of the system. Example 1 is of this form with $k = 1$, $H(q, p) = p^2/2m + kq^2/2$ (in $H$, $k$ denotes the spring constant). Writing $n = 2k$, $x_1 = q_1, \ldots, x_k = q_k$, $x_{k+1} = p_1, \ldots, x_{2k} = p_k$, this is also of the form (T.5.1) with
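$$f(x) = (f_1(x), \ldots, f_{2k}(x)) = \Big(\frac{\partial H}{\partial x_{k+1}}, \ldots, \frac{\partial H}{\partial x_{2k}}, -\frac{\partial H}{\partial x_1}, \ldots, -\frac{\partial H}{\partial x_k}\Big). \tag{T.5.3}$$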
Observe that for H sufficiently smooth, the flow in phase space is generally incompressible. That is,
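$$\operatorname{div} f(x) := \operatorname{trace}\Big(\frac{\partial f_i}{\partial x_j}\Big) = \sum_{i=1}^{k}\Big[\frac{\partial}{\partial x_i}\Big(\frac{\partial H}{\partial x_{k+i}}\Big) + \frac{\partial}{\partial x_{k+i}}\Big(-\frac{\partial H}{\partial x_i}\Big)\Big] = 0 \qquad \text{for all } x. \tag{T.5.4}$$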
Liouville first noticed the important fact that incompressibility gives the volume-preserving property of the flow in phase space. Liouville Theorem T.5.1. Suppose that $f(x)$ in (T.5.1) is such that $\operatorname{div} f(x) = 0$ for all $x$. Then for each bounded (measurable) set $D \subset \mathbb R^n$, $|T_tD| = |D|$ for all $t > 0$, where $|\cdot|$ denotes $n$-dimensional volume (Lebesgue measure). ❑ Proof. By the uniqueness condition stated at the outset we have $T_{t+h} = T_tT_h$ for all $t, h \ge 0$. So, by the change of variable formula,

$$|T_{t+h}D| = \int_{T_tD}\Big|\det\Big(\frac{\partial T_hx}{\partial x}\Big)\Big|\,dx.$$
To calculate the Jacobian, first note from (T.5.1) that

$$\frac{\partial T_hx}{\partial x} = I + \frac{\partial f}{\partial x}h + O(h^2) \qquad \text{as } h \to 0.$$
But, expanding the determinant and collecting terms, one sees for any matrix $M$ that $\det(I + hM) = 1 + h\operatorname{trace}(M) + O(h^2)$ as $h \to 0$.
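Thus, since $\operatorname{trace}(\partial f/\partial x) = \operatorname{div} f(x) = 0$,

$$\det\Big(\frac{\partial T_hx}{\partial x}\Big) = 1 + O(h^2) \qquad \text{as } h \to 0.$$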
It follows that for each $t \ge 0$,

$$|T_{t+h}D| = |T_tD| + O(h^2) \qquad \text{as } h \to 0,$$

or

$$\frac{d}{dt}|T_tD| = 0 \quad \text{and} \quad |T_0D| = |D|,$$

i.e., $t \to |T_tD|$ is constant with constant value $|D|$. ■
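The theorem can be illustrated numerically by estimating the Jacobian determinant $\det(\partial T_tx/\partial x)$ with finite differences. The sketch assumes two hypothetical planar vector fields: a pendulum with Hamiltonian $H(q, p) = p^2/2 - \cos q$, for which $\operatorname{div} f = 0$, and a damped oscillator $\dot q = p$, $\dot p = -q - cp$, for which $\operatorname{div} f = -c$. In the first case the determinant should stay at 1; in the second it should shrink like $e^{-ct}$, since in general $\frac{d}{dt}\det(\partial T_tx/\partial x) = \operatorname{div} f(T_tx)\,\det(\partial T_tx/\partial x)$.

```python
import math

def rk4_flow(field, x, t, steps=2000):
    """Approximate T_t x for x' = field(x) by classical Runge-Kutta; x is a tuple."""
    h = t / steps
    for _ in range(steps):
        s1 = field(x)
        s2 = field(tuple(xi + h / 2 * si for xi, si in zip(x, s1)))
        s3 = field(tuple(xi + h / 2 * si for xi, si in zip(x, s2)))
        s4 = field(tuple(xi + h * si for xi, si in zip(x, s3)))
        x = tuple(xi + h / 6 * (a + 2 * b + 2 * c + e)
                  for xi, a, b, c, e in zip(x, s1, s2, s3, s4))
    return x

def jacobian_det(field, x, t, eps=1e-6):
    """det(d T_t x / d x) for a planar flow, by central differences."""
    cols = []
    for j in range(2):
        shift = [0.0, 0.0]
        shift[j] = eps
        plus = rk4_flow(field, (x[0] + shift[0], x[1] + shift[1]), t)
        minus = rk4_flow(field, (x[0] - shift[0], x[1] - shift[1]), t)
        cols.append([(u - v) / (2 * eps) for u, v in zip(plus, minus)])
    # cols[j][i] approximates the derivative of component i with respect to x_j
    return cols[0][0] * cols[1][1] - cols[1][0] * cols[0][1]

def pendulum(x):
    """Hamiltonian field for H = p^2/2 - cos q: divergence zero."""
    q, p = x
    return (p, -math.sin(q))

C = 0.5
def damped(x):
    """Damped oscillator: divergence -C, so not volume preserving."""
    q, p = x
    return (p, -q - C * p)

t = 2.0
print(jacobian_det(pendulum, (1.0, 0.3), t))                  # close to 1
print(jacobian_det(damped, (1.0, 0.3), t), math.exp(-C * t))   # both close to e^{-1}
```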
2. Liouville's theorem becomes especially interesting when considered alongside the following theorem of Poincaré. Poincaré's Recurrence Theorem T.5.2. Let $T$ be any volume-preserving continuous one-to-one mapping of a bounded (measurable) region $D \subset \mathbb R^n$ onto itself. Then for each neighborhood $A$ of any point $x$ in $D$ and every $n$, however large, there is a subset $B$ of $A$ having positive volume such that for all $y \in B$, $T^ry \in A$ for some $r \ge n$. ❑
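Proof. Consider $A, T^{-n}A, T^{-2n}A, \ldots$. Then there are distinct times $i, j$ such that $|T^{-in}A \cap T^{-jn}A| \ne 0$; for otherwise

$$|D| \ge \Big|\bigcup_{i=0}^{\infty}T^{-in}A\Big| = \sum_{i=0}^{\infty}|T^{-in}A| = \sum_{i=0}^{\infty}|A| = +\infty.$$

Applying $T^{n\min(i,j)}$ and using volume preservation, it follows that $|A \cap T^{-n|j-i|}A| \ne 0$. Take $B = A \cap T^{-n|j-i|}A$, $r = n|j - i|$. ■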
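A rotation of the closed unit disk $D \subset \mathbb R^2$ through an angle $\theta$ is continuous, one-to-one, onto and area preserving, so the theorem applies to it. The sketch below (with the hypothetical choices $\theta = 1$ radian, a small disk $A$ around a point of $D$, and $n = 1000$) searches for a time $r \ge n$ at which $T^ry$ is back in $A$. For a rotation, recurrence is much stronger than the theorem guarantees: because $\theta/2\pi$ is irrational, every point of $A$ returns, not just the points of some positive-volume subset $B$.

```python
import math

def rotate(x, angle):
    """Rotation of the plane about the origin; it maps the closed unit disk onto itself."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])

def return_time(y, center, radius, n, theta=1.0, max_iter=10**6):
    """Smallest r >= n with T^r y in A = {z : |z - center| < radius}, T = rotation by theta."""
    for r in range(n, max_iter + 1):
        z = rotate(y, r * theta)     # T^r y computed directly, avoiding accumulated rounding
        if math.dist(z, center) < radius:
            return r
    return None

x0 = (0.6, 0.0)       # a point of D; A is the disk of radius 0.05 around it
y = (0.61, 0.01)      # a point of A
r = return_time(y, x0, 0.05, n=1000)
print(r, rotate(y, r * 1.0))
```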
3. S. Chandrasekhar (1943), "Stochastic Problems in Physics and Astronomy," Reviews of Modern Physics, 15, 1-89, contains a discussion of Boltzmann and Zermelo's classical analysis together with other applications of Markov chains to physics. More complete references on alternative derivations, as well as the computation of the mean recurrence time of a state, can be found in M. Kac (1947), "Random Walk and the Theory of Brownian Motion," American Mathematical Monthly, 54, 369-391; also see E. Waymire (1982), "Mixing and Cooling from a Probabilistic Point of View," SIAM Review, 24, 73-75.
CHAPTER IV
Continuous-Parameter Markov Chains I INTRODUCTION TO CONTINUOUS-TIME MARKOV CHAINS Suppose that {X,: t O} is a continuous-parameter stochastic process with a finite or denumerably infinite state space S. Just as in the discrete-parameter case, the Markov property here also refers to the property that the conditional distribution of the future, given past and present states of the process, does not depend on the past. In terms of finite-dimensional events, the Markov property requires that for arbitrary time points 0 < s o < s, < ... < s <S 0, t > 0),
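$$\sum_{j\in S} p_{ij}(t)\,p_{jk}(s) = p_{ik}(t+s) \qquad (i, k \in S;\ s > 0,\ t > 0), \tag{2.5}$$
which may also be expressed in matrix notation by the following so-called semigroup property:
$$p(t+s) = p(t)\,p(s) \qquad (s > 0,\ t > 0). \tag{2.6}$$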
Therefore, the transition matrices p(t) cannot be chosen arbitrarily. They must be so chosen as to satisfy the Chapman-Kolmogorov equations. It turns out that (2.5) is the only restriction required for consistency in the sense of prescribing finite-dimensional distributions as in Section 6, Chapter I. To see this, take an arbitrary initial distribution π and time points $0 < t_1 < t_2 < t_3$. For arbitrary states $i_0, i_1, i_2, i_3$, one has from (2.2) that
$$P_\pi(X_0 = i_0, X_{t_1} = i_1, X_{t_2} = i_2, X_{t_3} = i_3) = \pi_{i_0}\,p_{i_0i_1}(t_1)\,p_{i_1i_2}(t_2 - t_1)\,p_{i_2i_3}(t_3 - t_2), \tag{2.7}$$
as well as
$$P_\pi(X_0 = i_0, X_{t_1} = i_1, X_{t_3} = i_3) = \pi_{i_0}\,p_{i_0i_1}(t_1)\,p_{i_1i_3}(t_3 - t_1). \tag{2.8}$$
But consistency requires that (2.8) be obtained from (2.7) by summing over $i_2$. This sum is
$$\sum_{i_2}\pi_{i_0}\,p_{i_0i_1}(t_1)\,p_{i_1i_2}(t_2 - t_1)\,p_{i_2i_3}(t_3 - t_2) = \pi_{i_0}\,p_{i_0i_1}(t_1)\sum_{i_2}p_{i_1i_2}(t_2 - t_1)\,p_{i_2i_3}(t_3 - t_2). \tag{2.9}$$
By the Chapman-Kolmogorov equations (2.5), with $t = t_2 - t_1$, $s = t_3 - t_2$, one has
$$\sum_{i_2}p_{i_1i_2}(t_2 - t_1)\,p_{i_2i_3}(t_3 - t_2) = p_{i_1i_3}(t_3 - t_1),$$
showing that the right sides of (2.8) and (2.9) are indeed equal. Thus, if (2.5) holds, then (2.2) defines joint distributions consistently, i.e., the joint distribution at any finite set of points as specified by (2.2) equals the probability obtained by summing successive probabilities of a joint distribution (like (2.2)) involving a larger set of time points, over states belonging to the additional time points. Suppose now that p(t) is given for $0 < t \le t_0$, for some $t_0 > 0$, and the transition probability matrices satisfy (2.6). Since any $t > t_0$ may be expressed uniquely as $t = rt_0 + s$, where r is a positive integer and $0 < s \le t_0$, by (2.6) we have
$$p(t) = p(rt_0 + s) = p(t_0)\,p((r-1)t_0 + s) = p^2(t_0)\,p((r-2)t_0 + s) = \cdots = p^r(t_0)\,p(s).$$
Thus, it is enough to specify p(t) on any interval $0 < t \le t_0$, however small $t_0 > 0$ may be. In fact, we will see that under certain further conditions p(t) is determined by its values for infinitesimal times, i.e., in the limit as $t_0 \to 0$. From now on we shall assume that
$$\lim_{t\downarrow 0} p_{ij}(t) = \delta_{ij}, \tag{2.10}$$
where Kronecker's delta is given by $\delta_{ij} = 1$ if i = j, $\delta_{ij} = 0$ if i ≠ j. This condition is very reasonable in most circumstances. Namely, it requires that with probability 1, the process spends a positive (but variable) amount of time in the initial state i before moving to a different state j. The relations (2.10) are also expressed as
$$\lim_{t\downarrow 0} p(t) = I, \tag{2.11}$$
where I is the identity matrix, with 1's along the diagonal and 0's elsewhere.
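The consistency argument in (2.7)-(2.9) and the semigroup property (2.6) can be checked numerically. The sketch below borrows the closed-form two-state transition matrix from Example 1 of Section 3 (rates $q_{01} = \beta$, $q_{10} = \delta$); the initial distribution, time points, and rate values used in the checks are arbitrary, and the function names are hypothetical.

```python
import math


def two_state_p(t, beta, delta):
    """Closed-form p(t) of the two-state chain with q01 = beta, q10 = delta."""
    c = beta + delta
    e = math.exp(-c * t)
    return [[(delta + beta * e) / c, (beta - beta * e) / c],
            [(delta - delta * e) / c, (beta + delta * e) / c]]


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def consistency_gap(pi, t1, t2, t3, beta, delta):
    """Largest |(sum over i2 of (2.7)) - (2.8)| over all states i0, i1, i3."""
    p_a = two_state_p(t1, beta, delta)
    p_b = two_state_p(t2 - t1, beta, delta)
    p_c = two_state_p(t3 - t2, beta, delta)
    p_ac = two_state_p(t3 - t1, beta, delta)
    gap = 0.0
    for i0 in range(2):
        for i1 in range(2):
            for i3 in range(2):
                summed = sum(pi[i0] * p_a[i0][i1] * p_b[i1][i2] * p_c[i2][i3]
                             for i2 in range(2))
                direct = pi[i0] * p_a[i0][i1] * p_ac[i1][i3]
                gap = max(gap, abs(summed - direct))
    return gap


def semigroup_gap(t, s, beta, delta):
    """Largest entrywise |p(t + s) - p(t) p(s)|, a numerical check of (2.6)."""
    lhs = two_state_p(t + s, beta, delta)
    rhs = matmul(two_state_p(t, beta, delta), two_state_p(s, beta, delta))
    return max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
```

Both gaps should be at the level of floating-point rounding, which is what (2.5) predicts.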
We shall also write
$$p(0) = I. \tag{2.12}$$
Then (2.11) expresses the fact that p(t), 0 ≤ t < ∞, is (componentwise) continuous at t = 0 as a function of t. It may actually be shown that, owing to the rich additional structure reflected in (2.6), continuity implies that p(t) is in fact differentiable in t, i.e., the derivatives $p'_{ij}(t) = dp_{ij}(t)/dt$ exist for all pairs (i, j) of states and all t ≥ 0. At t = 0, of course, "derivative" refers to the right-hand derivative. In particular, the parameters $q_{ij}$ given by
$$q_{ij} = \lim_{t\downarrow 0}\frac{p_{ij}(t) - p_{ij}(0)}{t} = \lim_{t\downarrow 0}\frac{p_{ij}(t) - \delta_{ij}}{t} \tag{2.13}$$
are well defined. Instead of proving differentiability from continuity for transition probabilities, which is nontrivial, we shall assume from now on that $p_{ij}(t)$ has a finite derivative for all (i, j) as part of the required structure. Also, we shall write
$$Q = ((q_{ij})), \tag{2.14}$$
for $q_{ij}$ defined in (2.13). The quantities $q_{ij}$ are referred to as the infinitesimal transition rates and Q the (formal) infinitesimal generator. Note that (2.13) may be expressed equivalently as
$$p_{ij}(\Delta t) = \delta_{ij} + q_{ij}\,\Delta t + o(\Delta t) \qquad\text{as } \Delta t \to 0. \tag{2.15}$$
Suppose for the time being that S is finite. Since the derivative of a finite sum equals the sum of the derivatives, it follows by differentiating both sides of (2.5) with respect to t and setting t = 0 that
$$p'_{ik}(s) = \sum_{j\in S}p'_{ij}(0)\,p_{jk}(s) = \sum_{j\in S}q_{ij}\,p_{jk}(s), \qquad i, k \in S, \tag{2.16}$$
or, in matrix notation after relabeling s as t in (2.16) for notational convenience,
$$p'(t) = Q\,p(t) \qquad (t \ge 0). \tag{2.17}$$
The system of equations (2.16) or (2.17) is called Kolmogorov's backward equations. One may also differentiate both sides of (2.5) with respect to s and then set s = 0 to get Kolmogorov's forward equations for a finite state space S,
$$p'_{ik}(t) = \sum_{j\in S}p_{ij}(t)\,q_{jk}, \qquad i, k \in S, \tag{2.18}$$
or, in matrix notation,
$$p'(t) = p(t)\,Q. \tag{2.19}$$
Since $p_{ij}(t)$ are transition probabilities,
$$\sum_{j\in S}p_{ij}(t) = 1 \qquad\text{for all } i \in S. \tag{2.20}$$
Differentiating (2.20) term by term and setting t = 0,
$$\sum_{j\in S}q_{ij} = 0. \tag{2.21}$$
Note that
$$q_{ij} := p'_{ij}(0) \ge 0 \quad\text{for } i \ne j, \qquad q_{ii} := p'_{ii}(0) \le 0, \tag{2.22}$$
in view of the fact that $p_{ij}(t) \ge 0 = p_{ij}(0)$ for i ≠ j, and $p_{ii}(t) \le 1 = p_{ii}(0)$. In the general case of a countable state space S, the term-by-term differentiation used to derive Kolmogorov's equations may not always be justified. Conditions are given in the next two sections for the validity of these equations for transition probabilities on denumerable state spaces. However, regardless of whether or not the differential equations are valid for given transition probabilities p(t), we shall refer to the equations in general as Kolmogorov's backward and forward equations, respectively.

Example 1. (Compound Poisson). From (1.7),
$$p_{ij}(t) = \sum_{k=0}^{\infty}f^{*k}(j-i)\,\frac{\lambda^kt^k}{k!}\,e^{-\lambda t} = \delta_{ij}e^{-\lambda t} + f(j-i)\,\lambda t\,e^{-\lambda t} + o(t) \qquad\text{as } t \downarrow 0. \tag{2.23}$$
Therefore,
$$q_{ij} = p'_{ij}(0) = \lambda f(j-i) \quad (i \ne j), \qquad\text{and}\qquad q_{ii} = p'_{ii}(0) = -\lambda\,(1 - f(0)). \tag{2.24}$$
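Relation (2.15) suggests estimating each rate $q_{ij}$ by the difference quotient $(p_{ij}(h) - \delta_{ij})/h$ for small h. The sketch below does this for the compound Poisson law, evaluating $p_{ij}(h)$ by truncating the series (2.23); the jump distribution f on {-1, 0, 1}, the rate λ = 2, the step h, and the truncation level are all hypothetical choices, and the estimates should agree with (2.24) up to an O(h) error.

```python
import math


def convolve(f, g):
    """Convolution of two pmfs on the integers, each stored as {jump: probability}."""
    out = {}
    for a, pa in f.items():
        for b, pb in g.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out


def compound_poisson_p(d, t, lam, f, kmax=30):
    """p_ij(t) for j - i = d, from the series (2.23) truncated after k = kmax."""
    fk = {0: 1.0}                          # f^{*0}: point mass at 0
    total = 0.0
    for k in range(kmax + 1):
        poisson_k = math.exp(-lam * t) * (lam * t) ** k / math.factorial(k)
        total += fk.get(d, 0.0) * poisson_k
        fk = convolve(fk, f)               # advance to f^{*(k+1)}
    return total


def rate_estimate(d, lam, f, h=1e-6):
    """Difference quotient (p_ij(h) - delta_ij) / h, approximating q_ij via (2.15)."""
    delta = 1.0 if d == 0 else 0.0
    return (compound_poisson_p(d, h, lam, f) - delta) / h
```

With f = {-1: 0.3, 0: 0.2, 1: 0.5} and λ = 2, the estimates should be close to 1.0 for d = 1, 0.6 for d = -1, and -1.6 for d = 0, and they sum to about 0 as (2.21) requires.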
3 SOLUTIONS TO KOLMOGOROV'S EQUATIONS IN EXPONENTIAL FORM

We saw in Section 2 that if p(t) is a transition probability law on a finite state space S with Q = p'(0), then p satisfies Kolmogorov's backward equation
$$p'_{ij}(t) = \sum_k q_{ik}\,p_{kj}(t), \qquad i, j \in S,\ t \ge 0, \tag{3.1}$$
and Kolmogorov's forward equation
$$p'_{ij}(t) = \sum_k p_{ik}(t)\,q_{kj}, \qquad i, j \in S,\ t \ge 0, \tag{3.2}$$
where Q = ((q_ij)) satisfies the conditions in (2.21) and (2.22). The important problem is, however, to construct transition probabilities p(t) having prescribed infinitesimal transition rates Q = ((q_ij)) satisfying
$$q_{ij} \ge 0 \quad\text{for } i \ne j, \qquad q_{ii} = -\sum_{j\ne i}q_{ij} \le 0. \tag{3.3}$$
In the case that S is finite it is known from the theory of ordinary differential equations that, subject to the initial condition p(0) = I, the unique solution to (3.1) is given by
$$p(t) = e^{tQ}, \qquad t \ge 0, \tag{3.4}$$
where the matrix $e^{tQ}$ is defined by
$$e^{tQ} = \sum_{n=0}^{\infty}\frac{(tQ)^n}{n!} = I + \sum_{n=1}^{\infty}\frac{t^n}{n!}\,Q^n. \tag{3.5}$$
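Example 1. Consider the case S = {0, 1} for a general two-state Markov chain with rates
$$-q_{00} = q_{01} = \beta, \qquad -q_{11} = q_{10} = \delta.$$
Then, observing that
$$Q^2 = -(\beta + \delta)\,Q \tag{3.6}$$
and iterating, we have
$$Q^n = (-1)^{n-1}(\beta + \delta)^{n-1}\,Q \qquad\text{for } n = 1, 2, \ldots. \tag{3.7}$$
Therefore,
$$p(t) = e^{tQ} = I - \frac{1}{\beta + \delta}\left(e^{-t(\beta+\delta)} - 1\right)Q = \frac{1}{\beta + \delta}\begin{pmatrix}\delta + \beta e^{-(\beta+\delta)t} & \beta - \beta e^{-(\beta+\delta)t}\\ \delta - \delta e^{-(\beta+\delta)t} & \beta + \delta e^{-(\beta+\delta)t}\end{pmatrix}.$$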
It is also simple, however, to solve the (forward) equations directly in this case (Exercise 3).
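As a cross-check of (3.5)-(3.7), the sketch below sums the exponential series for a two-state generator and compares it with the closed form of Example 1, then verifies the identity (3.6) and the backward equation (3.1) by a central difference. The parameter values (β = 1.5, δ = 0.5, t = 0.8), the number of series terms, and the function names are assumptions made only for the illustration.

```python
import math


def two_state_generator(beta, delta):
    """Q for the two-state chain: -q00 = q01 = beta, -q11 = q10 = delta."""
    return [[-beta, beta], [delta, -delta]]


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def expm_series(q, t, terms=60):
    """Truncated series (3.5): the sum over n < terms of (tQ)^n / n!."""
    n = len(q)
    term = [[float(i == j) for j in range(n)] for i in range(n)]   # (tQ)^0 / 0! = I
    total = [row[:] for row in term]
    for k in range(1, terms):
        # (tQ)^k / k! = [(tQ)^(k-1) / (k-1)!] * Q * (t / k)
        term = [[x * t / k for x in row] for row in matmul(term, q)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total


def two_state_closed_form(beta, delta, t):
    """The closed-form p(t) of Example 1."""
    c = beta + delta
    e = math.exp(-c * t)
    return [[(delta + beta * e) / c, (beta - beta * e) / c],
            [(delta - delta * e) / c, (beta + delta * e) / c]]
```

The series is used here only because it mirrors (3.5); for larger or stiffer generators a scaling-and-squaring routine would be the usual numerical choice.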
In the case that S is countably infinite, results analogous to those for the finite case can be obtained under the following fairly restrictive condition, $\Lambda := \sup_{i\in S}|q_{ii}|$ … 0, all i ∈ S) clearly satisfy these equations, one has $H_i(t) = 1$ for all t by uniqueness of such solutions. Thus, the solutions (3.11) have been shown to satisfy all conditions for being transition probabilities except for nonnegativity (Exercise 5). Nonnegativity will also follow as a consequence of a more general method of construction of solutions given in the next section. When it applies, the exponential form (3.4) (equivalently, (3.11)) is especially suitable for calculations of transition probabilities by spectral methods, as will be seen in Section 9.
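One standard way to see nonnegativity in the bounded-rates case is uniformization, a swapped-in technique rather than the method of this section: for any constant $\bar\Lambda \ge \sup_i|q_{ii}|$, the matrix $P = I + Q/\bar\Lambda$ is stochastic and $e^{tQ} = e^{-\bar\Lambda t}e^{t\bar\Lambda P} = \sum_{k\ge 0}e^{-\bar\Lambda t}\frac{(\bar\Lambda t)^k}{k!}P^k$, a mixture of nonnegative matrices. The sketch below checks this numerically on a hypothetical three-state generator against a plain Taylor series; the parameter `lam_bar`, the truncation levels, and the function names are assumptions.

```python
import math


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def uniformized_p(q, t, lam_bar, terms=200):
    """p(t) = sum_k exp(-lam_bar t) (lam_bar t)^k / k! * P^k with P = I + Q / lam_bar.

    Requires lam_bar >= max_i |q_ii|, so that P is a stochastic matrix.
    """
    n = len(q)
    P = [[float(i == j) + q[i][j] / lam_bar for j in range(n)] for i in range(n)]
    assert all(x >= 0.0 for row in P for x in row), "lam_bar is too small"
    power = [[float(i == j) for j in range(n)] for i in range(n)]   # P^0
    weight = math.exp(-lam_bar * t)                                  # Poisson weight for k = 0
    result = [[0.0] * n for _ in range(n)]
    for k in range(terms):
        for i in range(n):
            for j in range(n):
                result[i][j] += weight * power[i][j]   # every summand is nonnegative
        power = matmul(power, P)
        weight *= lam_bar * t / (k + 1)
    return result


def expm_series(q, t, terms=80):
    """Plain truncated Taylor series for exp(tQ), used only as a cross-check."""
    n = len(q)
    term = [[float(i == j) for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for k in range(1, terms):
        term = [[x * t / k for x in row] for row in matmul(term, q)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total
```

Unlike the Taylor series, whose terms alternate in sign, every term of the uniformized sum is nonnegative, which is why nonnegativity of p(t) is immediate in this representation.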
Example 2. (Poisson Process). The Poisson process with parameter λ > 0 was introduced in Example 1.1. Alternatively, the process may be regarded as a Markov process on the state space S = {0, 1, 2, ...} with prescribed infinitesimal transition rates of the form
$$q_{ij} = \lambda \ \text{ for } j = i + 1,\ i = 0, 1, 2, \ldots; \qquad q_{ii} = -\lambda,\ i = 0, 1, 2, \ldots; \qquad q_{ij} = 0 \ \text{ otherwise}. \tag{3.15}$$
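By induction it will follow that the (i, j) element $q_{ij}^{(n)}$ of $Q^n$ is
$$q_{ij}^{(n)} = \begin{cases}(-1)^{n-(j-i)}\dbinom{n}{j-i}\lambda^n & \text{if } 0 \le j - i \le n,\\ 0 & \text{otherwise}.\end{cases} \tag{3.16}$$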
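Therefore, the exponential formula (3.4) [(3.11)] gives for j ≥ i, t > 0,
$$\begin{aligned}p_{ij}(t) &= \delta_{ij} + t\,q_{ij} + \frac{t^2}{2!}\,q_{ij}^{(2)} + \cdots = \sum_{k=0}^{\infty}\frac{t^{j-i+k}}{(j-i+k)!}\,q_{ij}^{(j-i+k)}\\ &= \sum_{k=0}^{\infty}\frac{t^{j-i+k}}{(j-i+k)!}\,(-1)^k\binom{j-i+k}{j-i}\lambda^{j-i+k} = \frac{(\lambda t)^{j-i}}{(j-i)!}\sum_{k=0}^{\infty}\frac{(-\lambda t)^k}{k!} = \frac{(\lambda t)^{j-i}}{(j-i)!}\,e^{-\lambda t}.\end{aligned} \tag{3.17}$$
Likewise,
$$p_{ij}(t) = 0 \qquad\text{for } j < i. \tag{3.18}$$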
Thus, the transition probabilities coincide with those given in Example 1.
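The computation (3.16)-(3.17) can be reproduced numerically. The sketch below evaluates the truncated exponential series using the closed-form powers (3.16), checks those powers against brute-force powers of a truncated generator, and compares the sum with the Poisson probabilities; the truncation sizes and the values λ = 2, t = 1 are arbitrary, and the helper names are hypothetical.

```python
import math


def q_power_entry(n, d, lam):
    """(i, j) entry of Q^n for the Poisson generator with d = j - i, as in (3.16)."""
    if 0 <= d <= n:
        return (-1) ** (n - d) * math.comb(n, d) * lam ** n
    return 0.0


def p_from_series(d, t, lam, nmax=80):
    """Exponential formula: p_ij(t) = sum_n t^n / n! * q_ij^{(n)}, truncated at nmax."""
    return sum(t ** n / math.factorial(n) * q_power_entry(n, d, lam) for n in range(nmax + 1))


def truncated_generator(lam, size):
    """Generator (3.15) restricted to states 0..size-1.

    Truncation does not change the entries (Q^n)[i][j] with j < size, because Q is
    upper bidiagonal and every path from i to j only visits states between i and j.
    """
    q = [[0.0] * size for _ in range(size)]
    for i in range(size):
        q[i][i] = -lam
        if i + 1 < size:
            q[i][i + 1] = lam
    return q
```

For example, `p_from_series(3, 1.0, 2.0)` should agree with $e^{-2}\,2^3/3!$ to about twelve digits.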
4 SOLUTIONS TO KOLMOGOROV'S EQUATIONS BY SUCCESSIVE APPROXIMATIONS

For the general problem of constructing transition probabilities ((p_ij(t))) having a prescribed set of infinitesimal transition rates given by Q = ((q_ij)), where
$$-q_{ii} = \sum_{j\ne i}q_{ij}, \qquad q_{ij} \ge 0 \quad\text{for } i \ne j, \tag{4.1}$$
the method of successive approximations will be used in this section. The main result provides a solution to the backward equations
$$p'_{ij}(t) = \sum_k q_{ik}\,p_{kj}(t), \qquad i, j \in S, \tag{4.2}$$
under the initial condition
$$p_{ij}(0) = \delta_{ij}, \qquad i, j \in S. \tag{4.3}$$
The precise statement of this result is the following.

Theorem 4.1. Given any Q satisfying (4.1) there exists a smallest nonnegative solution $\bar p(t)$ of the backward equations (4.2) satisfying (4.3). This solution satisfies
$$\sum_{j\in S}\bar p_{ij}(t) \le 1 \qquad\text{for all } i \in S,\ \text{all } t \ge 0. \tag{4.4}$$
In case equality holds in (4.4) for all i ∈ S and t ≥ 0, there does not exist any other nonnegative solution of (4.2) that satisfies (4.3) and (4.4).

Proof. Write $\lambda_i = -q_{ii}$ (i ∈ S). Multiplying both sides of the backward equations (4.2) by the "integrating factor" $e^{\lambda_i s}$ one obtains
$$e^{\lambda_i s}\,p'_{ik}(s) = e^{\lambda_i s}\sum_{j\in S}q_{ij}\,p_{jk}(s),$$
or
$$\frac{d}{ds}\left(e^{\lambda_i s}p_{ik}(s)\right) = \lambda_ie^{\lambda_i s}p_{ik}(s) + e^{\lambda_i s}\sum_{j\in S}q_{ij}\,p_{jk}(s) = \sum_{j\ne i}e^{\lambda_i s}q_{ij}\,p_{jk}(s).$$
On integration between 0 and t one has, remembering that $p_{ik}(0) = \delta_{ik}$,
$$e^{\lambda_i t}p_{ik}(t) = \delta_{ik} + \sum_{j\ne i}\int_0^t e^{\lambda_i s}q_{ij}\,p_{jk}(s)\,ds,$$
or
$$p_{ik}(t) = \delta_{ik}e^{-\lambda_i t} + \sum_{j\ne i}\int_0^t e^{-\lambda_i(t-s)}q_{ij}\,p_{jk}(s)\,ds \qquad (t \ge 0;\ i, k \in S). \tag{4.5}$$
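Reversing the steps shows that (4.2) together with (4.3) follow from (4.5). Thus (4.2) and (4.3) are equivalent to the system of integral equations (4.5). To solve the system (4.5) start with the first approximation
$$p_{ik}^{(0)}(t) = \delta_{ik}e^{-\lambda_i t} \qquad (i, k \in S,\ t \ge 0), \tag{4.6}$$
and compute successive approximations, recursively, by
$$p_{ik}^{(n)}(t) = \delta_{ik}e^{-\lambda_i t} + \sum_{j\ne i}\int_0^t e^{-\lambda_i(t-s)}q_{ij}\,p_{jk}^{(n-1)}(s)\,ds \qquad (n \ge 1). \tag{4.7}$$
It then follows from (4.7) by induction that $p_{ik}^{(n+1)}(t) \ge p_{ik}^{(n)}(t)$ for all n ≥ 0. Thus, $\bar p_{ik}(t) = \lim_{n\to\infty}p_{ik}^{(n)}(t)$ exists. Taking limits on both sides of (4.7), by monotone convergence, yields
$$\bar p_{ik}(t) = \delta_{ik}e^{-\lambda_i t} + \sum_{j\ne i}\int_0^t e^{-\lambda_i(t-s)}q_{ij}\,\bar p_{jk}(s)\,ds. \tag{4.8}$$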
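Hence, $\bar p_{ik}(t)$ satisfy (4.5). Also, $\bar p_{ik}(t) \ge p_{ik}^{(0)}(t) \ge 0$. Further, $\sum_{k\in S}p_{ik}^{(0)}(t) \le 1$ for all t ≥ 0 and all i. Assuming, as induction hypothesis, that $\sum_{k\in S}p_{ik}^{(n-1)}(t) \le 1$ for all t ≥ 0 and all i, it follows from (4.7) that
$$\begin{aligned}\sum_{k\in S}p_{ik}^{(n)}(t) &= e^{-\lambda_i t} + \sum_{j\ne i}\int_0^t e^{-\lambda_i(t-s)}q_{ij}\Big(\sum_{k\in S}p_{jk}^{(n-1)}(s)\Big)ds\\ &\le e^{-\lambda_i t} + \sum_{j\ne i}\int_0^t e^{-\lambda_i(t-s)}q_{ij}\,ds = e^{-\lambda_i t} + \lambda_ie^{-\lambda_i t}\int_0^t e^{\lambda_i s}\,ds = 1.\end{aligned}$$
Hence, $\sum_{k\in S}p_{ik}^{(n)}(t) \le 1$ for all n ≥ 0, and the same must be true for
$$\sum_{k\in S}\bar p_{ik}(t) = \lim_{n\to\infty}\sum_{k\in S}p_{ik}^{(n)}(t).$$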
We now show that $\bar p(t)$ is the smallest nonnegative solution of (4.5). Suppose p(t) is any other nonnegative solution. Then obviously $p_{ik}(t) \ge \delta_{ik}e^{-\lambda_i t} = p_{ik}^{(0)}(t)$ for all i, k, t. Assuming, as induction hypothesis, $p_{ik}(t) \ge p_{ik}^{(n-1)}(t)$ for all i, k ∈ S, t ≥ 0, it follows from the fact that $p_{ik}(t)$ satisfies (4.5) that
$$p_{ik}(t) \ge \delta_{ik}e^{-\lambda_i t} + \sum_{j\ne i}\int_0^t e^{-\lambda_i(t-s)}q_{ij}\,p_{jk}^{(n-1)}(s)\,ds = p_{ik}^{(n)}(t).$$
Hence, $p_{ik}(t) \ge p_{ik}^{(n)}(t)$ for all n ≥ 0 and, therefore, $p_{ik}(t) \ge \bar p_{ik}(t)$ for all i, k ∈ S and all t ≥ 0. The last assertion of the theorem is almost obvious. For if equality holds in (4.4) for $\bar p(t)$, for all i and all t ≥ 0, and p(t) is another transition probability matrix, then, by the above, $p_{ik}(t) \ge \bar p_{ik}(t)$ for all i, k, and t ≥ 0. If strict inequality holds for some $t = t_0$, $i = i_0$ and some k, then summing over k one gets
$$\sum_{k\in S}p_{i_0k}(t_0) > \sum_{k\in S}\bar p_{i_0k}(t_0) = 1,$$
contradicting the hypothesis that $p(t_0)$ is a transition probability matrix. ∎
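The monotone scheme (4.6)-(4.7) used in the proof can also be run numerically. The sketch below iterates it for the two-state chain of Section 3, Example 1, replacing each time integral by the trapezoidal rule on a uniform grid (a discretization assumption not in the text); the approximations increase with n, and the last one is close to the closed-form p(t). The rates, grid size, iteration count, and function name are hypothetical choices.

```python
import math


def successive_approximations(beta, delta, T=1.0, steps=100, iterations=25):
    """Iterates (4.6)-(4.7) for the two-state chain on the grid t_m = m * T / steps.

    Returns (grid, approx) with approx[n][i][k][m] approximating p_ik^{(n)}(t_m).
    Time integrals are replaced by the trapezoidal rule.
    """
    lam = [beta, delta]            # lambda_i = -q_ii
    h = T / steps
    grid = [m * h for m in range(steps + 1)]
    # (4.6): p_ik^{(0)}(t) = delta_ik * exp(-lambda_i t)
    current = [[[math.exp(-lam[i] * t) if i == k else 0.0 for t in grid]
                for k in range(2)] for i in range(2)]
    approx = [current]
    for _ in range(iterations):
        nxt = [[[0.0] * (steps + 1) for _ in range(2)] for _ in range(2)]
        for i in range(2):
            j, q_ij = 1 - i, lam[i]    # two states: the only j != i has q_ij = -q_ii
            for k in range(2):
                for m, t in enumerate(grid):
                    if m == 0:
                        integral = 0.0
                    else:
                        vals = [math.exp(-lam[i] * (t - grid[r])) * q_ij * current[j][k][r]
                                for r in range(m + 1)]
                        integral = h * (0.5 * vals[0] + sum(vals[1:m]) + 0.5 * vals[m])
                    start = math.exp(-lam[i] * t) if i == k else 0.0
                    nxt[i][k][m] = start + integral
        current = nxt
        approx.append(current)
    return grid, approx
```

Because every trapezoidal weight is nonnegative, the discretized map is monotone, so the increase $p^{(n)} \le p^{(n+1)}$ from the proof survives the discretization.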
Note that we have not proved that $\bar p(t)$ satisfies the Chapman-Kolmogorov equation (2.6). This may be proved by using Laplace transforms (Exercise 6). It is also the case that the forward equations (2.18) (or (2.19)) always hold for the minimal solution $\bar p(t)$ (Exercise 5). In the case that (3.9) holds, i.e., the bounded rates condition, there is only one solution satisfying the backward equations and the initial conditions and, therefore, $\bar p_{ik}(t)$ is given by the exponential representation on the right side of (3.11). Of course, the solution may be unique even otherwise. We will come back to this question and the probabilistic implications of nonuniqueness in the next section. Finally, the upshot of all this is that the Markov process is, under certain circumstances, specified by an initial distribution π and a matrix Q satisfying (4.1). In any case, the minimal solution always exists, although the total mass may be less than 1.

Example 1. Another simple model of "contagion" or "accident proneness" for actuarial mathematics, this one having a homogeneous transition law, is obtained as follows. Suppose that the probability of an accident in t to t + Δt is $(\nu + r\lambda)\Delta t + o(\Delta t)$ given that r accidents have occurred previous to time t, for ν, λ > 0. We have a pure birth process on S = {0, 1, 2, ...} having infinitesimal parameters $q_{ii} = -(\nu + i\lambda)$, $q_{i,i+1} = \nu + i\lambda$, $q_{ik} = 0$ if k < i or if k > i + 1. The forward equations yield
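$$p'_{0n}(t) = -(\nu + n\lambda)\,p_{0n}(t) + (\nu + (n-1)\lambda)\,p_{0,n-1}(t) \quad (n \ge 1), \qquad p'_{00}(t) = -\nu\,p_{00}(t). \tag{4.9}$$
Clearly, $p_{00}(t) = e^{-\nu t}$, and
$$p'_{01}(t) = -(\nu + \lambda)\,p_{01}(t) + \nu\,p_{00}(t) = -(\nu + \lambda)\,p_{01}(t) + \nu e^{-\nu t}. \tag{4.10}$$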
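This equation can be solved with the aid of an integrating factor as follows. Let $g(t) = e^{(\nu+\lambda)t}p_{01}(t)$. Then (4.10) may be expressed as
$$\frac{dg(t)}{dt} = \nu e^{\lambda t},$$
or
$$g(t) = g(0) + \int_0^t \nu e^{\lambda u}\,du = \int_0^t \nu e^{\lambda u}\,du = \frac{\nu}{\lambda}\left(e^{\lambda t} - 1\right).$$
Therefore,
$$p_{01}(t) = e^{-(\nu+\lambda)t}g(t) = \frac{\nu}{\lambda}\left(e^{-\nu t} - e^{-(\nu+\lambda)t}\right) = \frac{\nu}{\lambda}\,e^{-\nu t}\left(1 - e^{-\lambda t}\right). \tag{4.11}$$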
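Next
$$p'_{02}(t) = -(\nu + 2\lambda)\,p_{02}(t) + (\nu + \lambda)\,p_{01}(t) = -(\nu + 2\lambda)\,p_{02}(t) + (\nu + \lambda)\,\frac{\nu}{\lambda}\,e^{-\nu t}\left(1 - e^{-\lambda t}\right). \tag{4.12}$$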
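Therefore, as in (4.10) and (4.11),
$$\begin{aligned}p_{02}(t) &= e^{-(\nu+2\lambda)t}\int_0^t e^{(\nu+2\lambda)u}\,\frac{(\nu+\lambda)\nu}{\lambda}\,e^{-\nu u}\left(1 - e^{-\lambda u}\right)du = e^{-(\nu+2\lambda)t}\,\frac{\nu(\nu+\lambda)}{\lambda}\int_0^t\left(e^{2\lambda u} - e^{\lambda u}\right)du\\ &= e^{-(\nu+2\lambda)t}\,\frac{\nu(\nu+\lambda)}{\lambda}\left[\frac{e^{2\lambda t} - 1}{2\lambda} - \frac{e^{\lambda t} - 1}{\lambda}\right] = \frac{\nu(\nu+\lambda)}{2\lambda^2}\left[e^{-\nu t} - 2e^{-(\nu+\lambda)t} + e^{-(\nu+2\lambda)t}\right]\\ &= \frac{\nu(\nu+\lambda)}{2\lambda^2}\,e^{-\nu t}\left[1 - e^{-\lambda t}\right]^2.\end{aligned}$$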
Assume, as induction hypothesis, that
$$p_{0n}(t) = \frac{\nu(\nu+\lambda)\cdots(\nu+(n-1)\lambda)}{n!\,\lambda^n}\,e^{-\nu t}\left[1 - e^{-\lambda t}\right]^n.$$
Then
$$p'_{0,n+1}(t) = -(\nu + (n+1)\lambda)\,p_{0,n+1}(t) + (\nu + n\lambda)\,p_{0n}(t)$$
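yields
$$p_{0,n+1}(t) = e^{-(\nu+(n+1)\lambda)t}\int_0^t e^{(\nu+(n+1)\lambda)u}(\nu + n\lambda)\,p_{0n}(u)\,du = \frac{\nu(\nu+\lambda)\cdots(\nu+n\lambda)}{n!\,\lambda^n}\,e^{-(\nu+(n+1)\lambda)t}\int_0^t e^{(n+1)\lambda u}\left[1 - e^{-\lambda u}\right]^n du. \tag{4.13}$$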
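Now, setting $x = e^{\lambda u}$,
$$\int_0^t e^{(n+1)\lambda u}\left[1 - e^{-\lambda u}\right]^n du = \frac{1}{\lambda}\int_1^{e^{\lambda t}}x^n\left(1 - \frac{1}{x}\right)^n dx = \frac{1}{\lambda}\int_1^{e^{\lambda t}}(x - 1)^n\,dx = \frac{\left(e^{\lambda t} - 1\right)^{n+1}}{\lambda\,(n+1)}.$$
Hence,
$$p_{0,n+1}(t) = \frac{\nu(\nu+\lambda)\cdots(\nu+n\lambda)}{(n+1)!\,\lambda^{n+1}}\,e^{-\nu t}\left(1 - e^{-\lambda t}\right)^{n+1}. \tag{4.14}$$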
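Formula (4.14) can be checked against a direct numerical solution of the forward equations (4.9). In the sketch below the system for $p_{00}, \ldots, p_{0N}$ is integrated with the classical fourth-order Runge-Kutta method (a swapped-in numerical technique); because $p'_{0n}$ involves only $p_{0n}$ and $p_{0,n-1}$, truncating at N is exact for n ≤ N. The values ν = 1, λ = 0.5, t = 1, the cutoff N, the step count, and the function names are hypothetical. With ν/λ = 2 the answer is the negative binomial law $(n+1)e^{-2\lambda t}(1 - e^{-\lambda t})^n$, which the checks also use.

```python
import math


def forward_rhs(p, nu, lam):
    """Right side of the forward equations (4.9) for the row (p_00, ..., p_0N)."""
    out = [-nu * p[0]]
    for n in range(1, len(p)):
        out.append(-(nu + n * lam) * p[n] + (nu + (n - 1) * lam) * p[n - 1])
    return out


def solve_forward(nu, lam, t, N=30, steps=2000):
    """Classical RK4 for (4.9) on [0, t], starting from p_0n(0) = delta_0n."""
    p = [1.0] + [0.0] * N
    h = t / steps
    for _ in range(steps):
        k1 = forward_rhs(p, nu, lam)
        k2 = forward_rhs([x + 0.5 * h * k for x, k in zip(p, k1)], nu, lam)
        k3 = forward_rhs([x + 0.5 * h * k for x, k in zip(p, k2)], nu, lam)
        k4 = forward_rhs([x + h * k for x, k in zip(p, k3)], nu, lam)
        p = [x + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p


def closed_form(n, nu, lam, t):
    """p_0n(t) from (4.14); the empty product gives p_00(t) = exp(-nu t)."""
    coeff = 1.0
    for m in range(n):
        coeff *= nu + m * lam
    coeff /= math.factorial(n) * lam ** n
    return coeff * math.exp(-nu * t) * (1.0 - math.exp(-lam * t)) ** n
```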
5 SAMPLE PATH ANALYSIS AND THE STRONG MARKOV PROPERTY

Let Q = ((q_ij)) be transition rates satisfying (4.1) and such that the corresponding Kolmogorov backward equation admits a unique (transition probability semigroup) solution p(t) = ((p_ij(t))). Given an initial distribution π on S there is a Markov process $\{X_t\}$ with transition probabilities p(t), t ≥ 0, and initial distribution π having right-continuous sample paths. Indeed, the process $\{X_t\}$ may be constructed as coordinate projections on the space Ω of right-continuous step functions on [0, ∞) with values in S (theoretical complement 5.3). Our purpose in the present section is to analyze the probabilistic nature of the process $\{X_t\}$. First we consider the distribution of the time spent in the initial state.

Proposition 5.1. Let the Markov chain $\{X_t: 0 \le t < \infty\}$ have the initial state i and let $T_0 = \inf\{t > 0: X_t \ne i\}$. Then $T_0$ has an exponential distribution with parameter $-q_{ii}$. In the case $q_{ii} = 0$, the degeneracy of the exponential distribution can be interpreted to mean that i is an absorbing state, i.e., $P_i(T_0 = \infty) = 1$.
Proof. Choose and fix t > 0. For each integer n ≥ 1 define the finite-dimensional event
$$A_n = \{X_{(m/2^n)t} = i \ \text{ for } m = 0, 1, \ldots, 2^n\}.$$
The events $A_n$ are decreasing as n increases and
$$A := \lim_{n\to\infty}A_n := \bigcap_{n=1}^{\infty}A_n = \{X_u = i \text{ for all } u \text{ in } [0, t] \text{ which is a binary rational multiple of } t\} = \{T_0 > t\}.$$
To see why the last equality holds, first note that $\{T_0 > t\} = \{X_u = i \text{ for all } u \text{ in } [0, t]\} \subset A$. On the other hand, since the sample paths are step functions, if a sample path is not in $\{T_0 > t\}$ then there occurs a jump to a state j, different from i, at some time $t_0$ ($0 < t_0 \le t$). The case $t_0 = t$ may be excluded, since such a path is not in A. Because each sample path is a right-continuous step function, there is a time point $t_1 > t_0$ such that $X_u = j$ for $t_0 \le u < t_1$. Since there is some u of the form $u = (m/2^n)t < t$ in every nondegenerate interval, it follows that $X_u = j$ for some u of the form $u = (m/2^n)t < t$; this implies that this sample path is not in $A_n$ for that n and, hence, not in A. Therefore, $\{T_0 > t\} \supset A$. Now note by (2.2)
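$$P_i(T_0 > t) = P_i(A) = \lim_{n\to\infty}P_i(A_n) = \lim_{n\to\infty}\left[p_{ii}\!\left(\frac{t}{2^n}\right)\right]^{2^n} = \lim_{n\to\infty}\left[1 + \frac{t}{2^n}\,q_{ii} + o\!\left(\frac{t}{2^n}\right)\right]^{2^n} = e^{tq_{ii}}, \tag{5.1}$$
proving that $T_0$ is exponentially distributed with parameter $-q_{ii}$. If $q_{ii} = 0$, the above calculation shows that $P_i(T_0 > t) = e^0 = 1$ for all t > 0. Hence $P_i(T_0 = \infty) = 1$. ∎

The following random times are basic to the description of the evolution of continuous time Markov chains:
$$\tau_0 = 0, \quad \tau_1 = \inf\{t > 0: X_t \ne X_0\}, \quad T_0 = \tau_1, \qquad \tau_n = \inf\{t > \tau_{n-1}: X_t \ne X_{\tau_{n-1}}\}, \quad T_n = \tau_{n+1} - \tau_n \quad\text{for } n \ge 1. \tag{5.2}$$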
Thus, $T_0$ is the holding time in the initial state, $T_1$ is the holding time in the state to which the process jumps the first time, and so on. Generally, $T_n$ is the holding time in the state to which the process jumps at its nth transition. As usual, $P_i$ denotes the distribution of the process $\{X_t\}$ under $X_0 = i$. As might be guessed, given the past up to and including time $T_0$, the process evolves from time $T_0$ onwards as the original Markov process would with initial state $X_{T_0}$. More generally, given the sample path of the process up to time $\tau_n = T_0 + T_1 + \cdots + T_{n-1}$, the conditional distribution of $\{X_{\tau_n + t}: t \ge 0\}$ is $P_{X_{\tau_n}}$; it depends only on the (present) state $X_{\tau_n}$ and on nothing else in the past.
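The description of the chain through holding times can be turned into a simulation. The sketch below assumes the usual jump-chain recipe (hold in state s for an exponential time with parameter $-q_{ss}$, then move to j ≠ s with probability $q_{sj}/(-q_{ss})$), which this section goes on to justify, and compares the simulated $P_0(X_t = 0)$ with the closed form of Section 3, Example 1. It also evaluates the dyadic products $[p_{ii}(t/2^n)]^{2^n}$ of (5.1). The seed, sample size, rates, and function names are arbitrary choices for illustration.

```python
import math
import random


def simulate_state_at(q, i, t, rng):
    """X_t for the chain started at i, built from exponential holding times.

    In state s the holding time is Exp(-q[s][s]); the next state is j != s with
    probability q[s][j] / (-q[s][s]).
    """
    state, clock = i, 0.0
    while True:
        rate = -q[state][state]
        if rate == 0.0:                       # absorbing state: T_0 = infinity
            return state
        clock += rng.expovariate(rate)        # holding time in the current state
        if clock > t:
            return state
        others = [j for j in range(len(q)) if j != state]
        state = rng.choices(others, weights=[q[state][j] for j in others])[0]


def dyadic_holding_probability(p_ii, t, n):
    """(p_ii(t / 2^n))^(2^n) = P_i(A_n), which decreases to P_i(T_0 > t) by (5.1)."""
    return p_ii(t / 2 ** n) ** (2 ** n)
```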
Although this seems intuitively clear from the Markov property, the time $T_0$ is not a constant, as in the Markov property, but a random variable. The italicized statement above is a case of an extension of the Markov property known as the strong Markov property. To state this property we introduce a class of random times called stopping times or Markov times. A stopping time τ is a random time, i.e., a random variable with values in [0, ∞], with the property that for every fixed time s, the occurrence or nonoccurrence of the event {τ ≤ s} …
ag,(' (p) = 0. at )
o
(5.29)
q 1j =0,and
d2V l( r )
dr2 =A I j(j-1)fjrj i 0,
0oo. 2f
(5.39)
The lemma provides a way to compute the conditional moments as follows:
$$E(L^r \mid M_L = n) = r\int_0^\infty t^{r-1}\,P(L > t \mid M_L = n)\,dt \tag{5.40}$$
and
$$P(L > t \mid M_L = n) = \frac{P(M_L = n) - P(L \le t, M_L = n)}{P(M_L = n)} = \frac{P(M_L = n) - P(X_t = 0, M_L = n)}{P(M_L = n)}. \tag{5.41}$$
Now, letting
$$h_n = P(M_L = n)\,E(L^r \mid M_L = n), \qquad n \ge 1, \tag{5.42}$$
we have for 0 < v < 1, using (5.33), (5.37), (5.40) and (5.41),
$$\sum_{n=1}^{\infty}h_nv^n = r\int_0^\infty t^{r-1}\left\{\left(1 - (1-v)^{1/2}\right) - g(t, 0, v)\right\}dt$$
= rr°°
C C z
tr-1 C _ 1
o
f
—
1A(c,-c C zC 1e 2)r
dt -C2)u C 1a(c, C2z— 1 e
^
1
C C _r-1J r t d t. Cl)e#M(cl-CZ)`^ 1 — ( 2 C 22 — C ie
(5.43)
Ix(c1-C2)1
Expanding the integrand as a geometric series, one obtains 0on
(v) =
r CI(C2 — C2) t r-1
C2
i
n
Jo
e(n+1)z(c1 cz)t dt (
2
C 21 C
(C — C l r+1 y (C = irr \nn—r r l n=1 ) 2
= 2 ^r (1 — v)_
1)/2
)
"
f"
S r—
le—s ds
(Clinn-r. n=1
(5.44)
C2J
In the case r = 1,
$$h(v) = -\frac{2}{\lambda}\log\left(1 - \frac{c_1(v)}{c_2(v)}\right) = -\frac{2}{\lambda}\log\frac{2(1-v)^{1/2}}{1 + (1-v)^{1/2}}. \tag{5.45}$$
To invert h(v) in the case r = 1, consider that vh'(v) is the generating function of $\{nh_n\}$. Thus, differentiating one gets that
$$vh'(v) = \lambda^{-1}(1-v)^{-1}\left[1 - (1-v)^{1/2}\right] = \lambda^{-1}\left\{(1-v)^{-1} - (1-v)^{-1/2}\right\} \tag{5.46}$$
and, therefore, after expanding $(1-v)^{-1}$ and $(1-v)^{-1/2}$ in a Taylor series about v = 0 and equating coefficients in the left and rightmost expansions in (5.46), it follows that
$$nP(M_L = n)\,E(L \mid M_L = n) = \lambda^{-1}\left[1 + (-1)^{n+1}\binom{-1/2}{n}\right]. \tag{5.47}$$
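The passage from (5.45) to (5.47) can be cross-checked numerically. The sketch below, taking λ = 1 as an arbitrary normalization, differentiates h(v) from (5.45) by a central difference to confirm (5.46), sums the power series with the coefficients (5.47), and verifies the identity $(-1)^n\binom{-1/2}{n} = \binom{2n}{n}/4^n$, which is the form convenient for Stirling-formula asymptotics. The function names are hypothetical.

```python
import math
from fractions import Fraction


def gen_binom(alpha, n):
    """Generalized binomial coefficient C(alpha, n) = alpha (alpha-1) ... (alpha-n+1) / n!."""
    out = Fraction(1)
    for k in range(n):
        out = out * (alpha - k) / (k + 1)
    return out


def coefficient_547(n, lam=Fraction(1)):
    """Right side of (5.47): lam^{-1} [1 + (-1)^{n+1} C(-1/2, n)], exactly."""
    return (1 + (-1) ** (n + 1) * gen_binom(Fraction(-1, 2), n)) / lam


def h_r1(v, lam=1.0):
    """h(v) in the case r = 1, from (5.45)."""
    w = math.sqrt(1.0 - v)
    return -(2.0 / lam) * math.log(2.0 * w / (1.0 + w))
```

For instance, `coefficient_547(1)` is exactly 1/2, matching the coefficient of v in $(1-v)^{-1} - (1-v)^{-1/2}$.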
In particular, using (5.39) and Stirling's formula, it follows that E(n - '' z )LI ML =n)' 2,
as n—•oc.
(5.48)
The asymptotic behavior of higher moments $E\big((n^{-1/2}\lambda L)^r \mid M_L = n\big)$, r ≥ 2, may also be determined from (5.44) (theoretical complement 4). Moreover, it may be shown that as n → ∞, the conditional distribution of $n^{-1/2}\lambda L$ given $M_L = n$ has a limiting distribution that coincides with the distribution of the maximum of a (suitably scaled) Brownian excursion process as described in Exercise 12.7, Chapter I (see theoretical complement 4).
6 THE MINIMAL PROCESS AND EXPLOSION

The Markov process with infinitesimal generator Q as constructed in Theorem 5.4 is called the minimal process with infinitesimal generator Q because it gives the process only up to the time
$$\zeta = T_0 + T_1 + T_2 + \cdots = \sum_{n=0}^{\infty}T_n. \tag{6.1}$$
The time ζ is called the explosion time. What we have shown in general is that every Markov chain with infinitesimal generator Q and initial distribution π is given, up to the explosion time, by the process $\{X_t\}$ described in Theorem 5.4. If ζ = ∞ with probability 1, then the minimal process is the only one with infinitesimal generator Q and we have uniqueness. In the case $P_i(\zeta = \infty) < 1$, i.e., if explosion does take place with positive probability, then there are various ways of continuing the minimal process for t ≥ ζ so as to preserve the Markov property and the backward equations (2.16) for the continued process $\{\tilde X_t\}$. One such way is to fix an arbitrary probability distribution ψ on S and let $\tilde X_\zeta = j$ with probability $\psi_j$ (j ∈ S). For t > ζ the continued process then evolves as a new (minimal) process
with initial state j and infinitesimal generator Q until a second explosion occurs, at which time $\{\tilde X_t\}$ is assigned a state according to ψ independently of the assignment at the time of the first explosion ζ, and so on. The transition probabilities $\tilde p_{ik}(t)$, say, of $\{\tilde X_t\}$ clearly satisfy (5.9), the integral version of the backward equations (2.16), since the derivation (5.10) is based on conditioning with respect to $T_0$, which is surely less than ζ, and on the Markov property. Unfortunately, the forward equations (2.19) do not hold for $\tilde p$, since the conditioning in the corresponding integral equations is with respect to the final jump before time t, and one or more explosions may have already occurred by time t. We have already shown in Section 3 that if $\{q_{ii}: i \in S\}$ is bounded, the backward equations have a unique solution. This solution p(t) gives rise to a unique Markov process. Therefore we have the following elementary criterion for nonexplosion.

Proposition 6.1. If $\sup_{i\in S}\lambda_i$ $\left(= \sup_i|q_{ii}| = \sup_i\sum_{j\ne i}q_{ij}\right)$ is finite, then $P_i(\zeta = \infty) = 1$ for all i ∈ S, and the minimal process having infinitesimal generator Q and an arbitrary initial distribution is the only Markov process with infinitesimal generator Q.

Definition 6.1. A Markov process $\{X_t\}$ for which $P_i(\zeta = \infty) = 1$ for all i ∈ S is called conservative or nonexplosive. Compound Poisson processes are conservative, as may be checked from
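$$\sum_{j\in S}\sum_{n=0}^{\infty}e^{-\lambda t}\frac{(\lambda t)^n}{n!}\,f^{*n}(j-i) = \sum_{n=0}^{\infty}e^{-\lambda t}\frac{(\lambda t)^n}{n!}\sum_{j\in S}f^{*n}(j-i) = \sum_{n=0}^{\infty}e^{-\lambda t}\frac{(\lambda t)^n}{n!} = 1. \tag{6.2}$$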
Example 1. (Pure Birth Process). A pure birth process has state space S = {0, 1, 2, ...}, and its generator Q has elements $q_{ii} = -\lambda_i$, $q_{i,i+1} = \lambda_i$, $q_{ij} = 0$ for j > i + 1 or j < i, i ∈ S. Note that this has the same degenerate (i.e., completely deterministic) embedded spatial transition matrix as the Poisson process. However, the parameters $\lambda_i$ of the exponential holding times are not assumed to be equal (spatially constant) here. Fix an initial state i. Then the embedded chain $\{Y_n: n = 0, 1, \ldots\}$ has only one possible realization, namely, $Y_n = n + i$ (n ≥ 0). Therefore, the holding times $T_0, T_1, \ldots$ are (unconditionally) independent and exponentially distributed with parameters $\lambda_i, \lambda_{i+1}, \ldots$. Consider the explosion time
$$\zeta = T_0 + T_1 + \cdots = \sum_{m=0}^{\infty}T_m.$$
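Whether ζ is finite depends on how fast the rates $\lambda_n$ grow, and this can be seen concretely. The sketch below starts at i = 0 and uses two hypothetical rate sequences. It evaluates partial products of $\varphi(s) = E_ie^{-s\zeta}$ at s = 1. For $\lambda_n = n + 1$ (so $\sum 1/\lambda_n = \infty$) the product over the first M holding times equals 1/(M + 1) and tends to 0. For $\lambda_n = (n+1)^2$ it converges to the classical value π/sinh π, and the simulated ζ has mean close to $\sum_n 1/\lambda_n = \pi^2/6 < \infty$, so that ζ < ∞ with probability one. Truncation lengths, seed, and function names are assumptions.

```python
import math
import random


def laplace_partial(rates, s):
    """Partial product prod_m rates[m] / (rates[m] + s) = E exp(-s (T_0 + ... + T_{M-1}))."""
    out = 1.0
    for lam in rates:
        out *= lam / (lam + s)
    return out


def sample_partial_zeta(rates, rng):
    """One draw of T_0 + ... + T_{M-1}, with independent T_m ~ Exp(rates[m])."""
    return sum(rng.expovariate(lam) for lam in rates)
```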
First assume
$$\sum_{n=0}^{\infty}\frac{1}{\lambda_n} = \infty. \tag{6.3}$$
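For s > 0, consider
$$\varphi(s) = E_ie^{-s\zeta} = E_ie^{-s\sum_mT_m} = \prod_{m=0}^{\infty}E_ie^{-sT_m} = \prod_{m=0}^{\infty}\frac{\lambda_{m+i}}{\lambda_{m+i} + s} = \prod_{m=0}^{\infty}\frac{1}{1 + s/\lambda_{m+i}}.$$
Choose and fix s > 0. Then,
$$\log\varphi(s) = -\sum_{m=0}^{\infty}\log\left(1 + \frac{s}{\lambda_{m+i}}\right) \le -\sum_{m=0}^{\infty}\frac{s/\lambda_{m+i}}{1 + s/\lambda_{m+i}} = -\sum_{m=0}^{\infty}\frac{s}{\lambda_{m+i} + s} = -\infty, \tag{6.4}$$
since $\log(1 + x) \ge x/(1 + x)$ for x ≥ 0 (Exercise 1), and the last series diverges by (6.3) because $s/(\lambda + s) \ge \min\{1/2,\ s/(2\lambda)\}$. Hence $\log\varphi(s) = -\infty$, i.e., φ(s) = 0. This is possible only if $P_i(\zeta = \infty) = 1$, since $e^{-s\zeta} > 0$ whenever ζ < ∞. Thus if (6.3) holds, then explosion is impossible. Next suppose that … 0, Δ = vτ → 0, n, m → ∞ such that x = mΔ, t = nτ, these equations go over to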
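$$\frac{\partial u^+}{\partial t} = v\frac{\partial u^+}{\partial x} - au^+ + au^-, \tag{7.29}$$
$$\frac{\partial u^-}{\partial t} = -v\frac{\partial u^-}{\partial x} - au^- + au^+. \tag{7.30}$$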
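Let, for the solutions $u^+, u^-$ to these equations,
$$u = \frac{u^+ + u^-}{2} \qquad\text{and}\qquad w = \frac{u^+ - u^-}{2}. \tag{7.31}$$
By combining (7.29) and (7.30) we get the so-called transmission line equations for u and w,
$$\frac{\partial u}{\partial t} = v\frac{\partial w}{\partial x}, \tag{7.32}$$
$$\frac{\partial w}{\partial t} = v\frac{\partial u}{\partial x} - 2aw. \tag{7.33}$$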
Now w can be eliminated by differentiating (7.32) with respect to t and (7.33) with respect to x and combining the equations; since $\partial w/\partial x = v^{-1}\,\partial u/\partial t$ by (7.32), this results in the telegrapher's equation for u,
$$\frac{\partial^2u}{\partial t^2} + 2a\frac{\partial u}{\partial t} = v^2\frac{\partial^2u}{\partial x^2}.$$
The initial conditions can also be checked by passage to the limit. The natural Monte Carlo simulation for solving the telegrapher's equation suggested by the above analysis is to start a large number of noninteracting particles in motion according to the above scheme, say half of them starting to the right and the other half going to the left initially. One then approximates u(t, x), t = nτ, x = mΔ, by the arithmetic mean $N^{-1}\sum_{k=1}^{N}\varphi(x + S_n^{(k)})$, where N is the number of particles. However, there is a practical issue that makes the approach infeasible. Namely, for a small time-space grid, the probability p = aτ of a reversal will also be very small; the smaller the grid size, the larger the mean number of steps to reversal and its variance. This means that an extremely large number of particle evolutions will have to be simulated in order to keep the fluctuations in the average down. Mark Kac first suggested that for this problem it is possible, and more practical, to use continuous-time simulations. The idea is to consider directly the time to velocity reversal. In the above scheme this time is $\tau N_\tau$, where $N_\tau$, the number of time steps until reversal, is geometrically distributed with parameter p = aτ. In the limit as τ → 0 this time therefore converges in distribution to the exponential distribution with parameter a. So it at least seems reasonable to consider the continuous-parameter position process $\{Y_t\}$ defined by the motion of a particle that, starting from the origin, travels at a rate v for an exponentially distributed length of time, reverses its direction, and then travels again for an (independent) exponentially distributed length of time at the speed v before again reversing its direction, and so on. The Poisson process of reversal times can be accurately simulated.
To make the above ideas firm, let $\{N_t\}$ be the continuous-parameter Poisson process with parameter a. Let ε be a ±1-valued random variable independent of $\{N_t\}$ with $P(\varepsilon = 1) = \tfrac12$. The velocity process is the two-state Markov process $\{V_t\}$ defined by
$$V_t = v\,\varepsilon\,(-1)^{N_t}, \qquad t \ge 0. \tag{7.34}$$
The position process, starting from x, is therefore given by
$$Y_t = x + \int_0^tV_s\,ds = x + v\varepsilon\int_0^t(-1)^{N_s}\,ds. \tag{7.35}$$
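Kac's continuous-time scheme is easy to simulate directly from (7.34)-(7.35). The sketch below draws exponential times between reversals and integrates the piecewise-constant velocity exactly. As a check it uses a standard fact not stated in the text: $E\,V_sV_u = v^2e^{-2a|s-u|}$, so for x = 0, $E\,Y_t^2 = \frac{v^2}{2a^2}\left(2at - 1 + e^{-2at}\right)$. The parameter values, seed, and function name are hypothetical.

```python
import math
import random


def telegraph_position(t, v, a, rng, x=0.0):
    """Y_t of (7.35): speed v, direction reversed at the epochs of a Poisson(a) process."""
    sign = 1 if rng.random() < 0.5 else -1   # the random initial direction epsilon
    y, clock = x, 0.0
    while True:
        hold = rng.expovariate(a)            # exponential time to the next reversal
        if clock + hold >= t:
            return y + sign * v * (t - clock)
        y += sign * v * hold
        clock += hold
        sign = -sign
```

Averaging φ over many such draws of $x + Y_t$ is the continuous-time Monte Carlo approximation of u(t, x) suggested above, with no grid parameter τ to drive the variance up.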
Although $\{Y_t\}$ is not a Markov process, the joint evolution $\{(Y_t, V_t)\}$ of position and velocity is a Markov process, as is the velocity process $\{V_t\}$ alone. This is seen as follows.
P(a … 0. As in the case of discrete time, write i → j if $p_{ij}(t) > 0$ for some t > 0. States i and j communicate, denoted i ↔ j, if i → j and j → i. Say "i is essential" if i → j implies j → i for all j; otherwise say "i is inessential." The following analog of Proposition 5.1 of Chapter II is proved in exactly the same manner, replacing the n-step transition probabilities $p_{ij}^{(n)}$ by $p_{ij}(t)$, etc. Let ℰ denote the set of all essential states.
Proposition 8.1 (a) For every i there exists at least one j such that i → j. (b) If i → j, j → k then i → k. (c) If i is essential then i ↔ i. (d) If i is essential and i → j then "j is essential" and i ↔ j. (e) On ℰ the relation "↔" is an equivalence relation, i.e., reflexive, symmetric, and transitive.

By (e) of Proposition 8.1, ℰ decomposes into disjoint equivalence classes of states such that members of a class communicate with each other. If i and j belong to different classes then i ↛ j, j ↛ i. A significant departure from the discrete-parameter case is that states in
continuous parameter chains are not periodic. More precisely, one has the following proposition.

Proposition 8.2 (a) For each state i, $p_{ii}(t) > 0$ for all t > 0. (b) For each pair i, j of distinct states, either $p_{ij}(t) = 0$ for all t > 0 or $p_{ij}(t) > 0$ for all t > 0.
Proof. (a) If there is a positive $t_0 > 0$ such that $p_{ii}(t_0) = 0$, then since
$$\left[p_{ii}\!\left(\frac{t_0}{n}\right)\right]^n \le p_{ii}(t_0) = 0$$
for all positive integers n, one has $p_{ii}(t_0/n) = 0$ for all n. Taking the limit as n → ∞, one gets by continuity that $\lim_n p_{ii}(t_0/n) = 0$, which contradicts continuity at t = 0 and the requirement $p_{ii}(0) = 1$. (b) Let i, j be two distinct states. Suppose $t_0$ is a positive number such that $p_{ij}(t_0) = 0$. Since $p_{ij}(t_0) \ge p_{ii}(t_0 - s)\,p_{ij}(s)$ for all s, 0 < s < $t_0$, and since $p_{ii}(t_0 - s) > 0$, it follows that $p_{ij}(s) = 0$ for all s … 0. For, if $k_{ij'} > 0$ and $k_{j'j} > 0$, then no matter how small t is (t > 0), one has, denoting by $\{Y_n: n = 0, 1, 2, \ldots\}$ the embedded discrete parameter Markov chain, $p_{ij}(t) \ge P_i(X_{T_0} = j', X_{T_0+T_1} = j, \ldots)$ …
c=e'Ieto>>0.
By consideration of the p.g.f. k(z, t) of the conditional distribution of $X_t$ given $\{X_t > 0, X_0 = 1\}$, one may obtain the existence of a nondegenerate limit distribution in the limit as t → ∞ having p.g.f. of the form (Exercise 4*)
$$k(z, \infty) := \lim_{t\to\infty}k(z, t) = 1 - \exp\left\{K\int_0^z\frac{ds}{h(s)}\right\}. \tag{10.34}$$
11 CHAPTER APPLICATION: AN INTERACTING SYSTEM
THE SIMPLE SYMMETRIC VOTER MODEL

Although their interpretations vary, the voter model was independently introduced by P. Clifford and A. Sudbury and by R. Holley and T. Liggett. In either case, one considers a distribution of ±1's at the points of the d-dimensional integer lattice $\mathbb{Z}^d$ at time t = 0. In the demographic interpretation one imagines the sites of $\mathbb{Z}^d$ as the locations of a species of one of the two types, +1 or -1. In the course of time, the type of species occupying a particular site can change owing to invasion by the opposition. The invasion is by occupants of neighboring sites and occurs at a rate proportional to the number of neighboring sites maintained by the opposing species. In the sociopolitical version, the values ±1 represent opposing opinions (pro or con) held by occupants of locations indexed by $\mathbb{Z}^d$. As time evolves, a voter may change opinion on the issue. The rate at which a voter's position on the issue changes is proportional to the number of neighbors who hold the opposing opinion. This model is also related to a tumor-growth model introduced by T. Williams and R. Bjerknes, which, in fact, is now often referred to as the biased voter model. If one assumes the mechanism for cell division to be the same for abnormal as for normal cells in the Williams-Bjerknes model (i.e., no "carcinogenic advantage" in their model), then one obtains the voter model discussed here (see theoretical complement 4 for references). While mathematical methods and theories for this and the more general models of this type are relatively recent, quite a bit can be learned about the voter model by applying some of the basic results of this chapter.
A sample configuration of ±1-values is represented by points σ = (σ_n : n ∈ ℤ^d) in the (uncountable) product space S = {1, −1}^{ℤ^d}. The evolution of configurations for the voter model will be defined by a Markov process on S. The desired transition rates are such that in the time interval from t to t + Δt the configuration may change from σ, by a flip at some site m, to the configuration σ^{(m)}, where σ^{(m)}_n = −σ_n if n = m and σ^{(m)}_n = σ_n otherwise. This occurs with probability c_m(σ)Δt + o(Δt), where c_m(σ) is the number of neighbors n of m such that σ_n ≠ σ_m; two sites that differ by one unit in one (and only one) of the coordinate directions are defined to be neighbors. More complicated changes involving flips at two or more sites are to occur with probability o(Δt) as Δt → 0. It is instructive to look closely at the flip rates in the one-dimensional case. For a configuration σ ∈ S, a flip at site m occurs at unit rate, c_m(σ) = 1, if the neighboring sites m − 1 and m + 1 have opposite values (opinions), i.e., σ_{m−1}σ_{m+1} = −1. If, on the other hand, the values at m − 1 and m + 1 agree with each other but disagree with the value at m, then the flip rate at m is proportionately increased to c_m(σ) = 2. If both neighboring values agree with each other as well as with the value at m, then the flip mechanism is turned off, c_m(σ) = 0. Thus there is a local tendency toward consensus within the system. Observe that c_m(σ) may be expressed as a local averaging via
$$c_m(\sigma) = 1 - \tfrac{1}{2}\,\sigma_m\,(\sigma_{m-1} + \sigma_{m+1}). \qquad (11.1)$$
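As a quick sanity check of (11.1), the following minimal Python sketch compares the definition of c_m(σ) as a count of disagreeing neighbors with the local-averaging formula on all eight local patterns (σ_{m−1}, σ_m, σ_{m+1}). The function names are hypothetical and are not part of the text.

```python
import itertools


def rate_by_counting(left, centre, right):
    """c_m(sigma) as the number of neighbours of m whose value differs from sigma_m."""
    return (left != centre) + (right != centre)


def rate_by_averaging(left, centre, right):
    """Local-averaging form (11.1): c_m = 1 - (1/2) sigma_m (sigma_{m-1} + sigma_{m+1})."""
    # sigma_m * (sigma_{m-1} + sigma_{m+1}) is -2, 0 or 2, so integer division is exact.
    return 1 - centre * (left + right) // 2


# Compare the two expressions on every pattern (sigma_{m-1}, sigma_m, sigma_{m+1}).
for left, centre, right in itertools.product((-1, 1), repeat=3):
    assert rate_by_counting(left, centre, right) == rate_by_averaging(left, centre, right)
```

The pattern (+1, −1, +1) gives rate 2, a mixed pair of neighbors gives rate 1, and unanimous agreement gives rate 0, as described above.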
More generally, in d dimensions the rates c_m(σ) may be expressed likewise as
$$c_m(\sigma) = d\Big(1 - \sigma_m \sum_n p_{mn}\,\sigma_n\Big) = 2d \sum_{n:\,\sigma_n \neq \sigma_m} p_{mn}, \qquad (11.2)$$
where p = ((p_{mn})) is the transition probability matrix on ℤ^d of the simple symmetric random walk on ℤ^d, i.e.,
$$p_{mn} = \begin{cases} \dfrac{1}{2d}, & \text{if } m \text{ and } n \text{ are neighbors},\\[4pt] 0, & \text{otherwise.} \end{cases}$$
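The same comparison can be sketched in d dimensions for the first form of (11.2). Because ℤ^d is infinite, the sketch assumes a finite torus (ℤ/Lℤ)^d with L ≥ 3. This keeps the 2d nearest neighbors of every site distinct, so the local formula is unchanged. Exact rational arithmetic (fractions.Fraction) avoids rounding in the factor 1/(2d). All names are hypothetical.

```python
import itertools
import random
from fractions import Fraction


def neighbours(m, L):
    """Nearest neighbours of site m on the torus (Z/LZ)^d, where d = len(m)."""
    out = []
    for axis in range(len(m)):
        for step in (-1, 1):
            n = list(m)
            n[axis] = (n[axis] + step) % L
            out.append(tuple(n))
    return out


def disagreeing_neighbours(sigma, m, L):
    """c_m(sigma) as the number of neighbours n with sigma_n != sigma_m."""
    return sum(1 for n in neighbours(m, L) if sigma[n] != sigma[m])


def rate_from_averaging(sigma, m, L):
    """First form of (11.2): d * (1 - sigma_m * sum_n p_mn sigma_n), p_mn = 1/(2d) for neighbours."""
    d = len(m)
    local_average = Fraction(sum(sigma[n] for n in neighbours(m, L)), 2 * d)
    return d * (1 - sigma[m] * local_average)


def check_rates(L=4, d=3, trials=20, seed=0):
    """Compare both expressions at every site for random +-1 configurations."""
    rng = random.Random(seed)
    sites = list(itertools.product(range(L), repeat=d))
    for _ in range(trials):
        sigma = {s: rng.choice((-1, 1)) for s in sites}
        for m in sites:
            if disagreeing_neighbours(sigma, m, L) != rate_from_averaging(sigma, m, L):
                return False
    return True
```

The second form of (11.2) is the counting definition itself, because 2d·p_{mn} = 1 for each neighbor.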
This helps explain the terminology simple symmetric voter model. The first issue one must contend with is the existence of a Markov evolution {σ(t): t ≥ 0} on the uncountable state space S having the prescribed infinitesimal transition rates. In general this can itself be a mathematically nontrivial matter when it comes to describing infinite interacting systems. However, for the special flip rates desired for the voter model, a relatively simple graphical construction of the process is possible. It is called the percolation construction because of the "fluid flow" interpretation described below.
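CONTINUOUS-PARAMETER MARKOV CHAINS

[Figure 11.1. Space-time diagram: horizontal axis Space (sites m, n), vertical time axes (Time); the initial values (+) and (−) of σ(0) are marked along the space axis.]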
Consider a space-time diagram of ℤ^d × [0, ∞) in which a "vertical time axis" is located at each site m ∈ ℤ^d, and imagine the points of ℤ^d as being laid out along a "horizontal axis"; see Figure 11.1. The basic idea is this. Imagine each vertical time axis as a wire that transports negative charge (opinion) upward from the (initially) negatively charged sites m such that σ_m(0) = −1. For a certain random length of time (wire), the voter opinion at m will coincide with σ_m(0). Then this influence will be blocked, and the voter will receive the opinion of a randomly selected neighbor. Place a δ at the blockage time, and draw an arrow from the wire at the randomly selected site to the wire at m at this location of the blockage. The symbol δ stands for "death of influence from (directly) below." If the neighbor is conducting negative flow from some point below up to this time, then the flow will be transferred across the arrow and up. So, while the source of influence at m might have changed, the opinion has not. However, if the selected neighbor is not conducting negative flow at this time, then a portion of the wire at m above this time will be given positive charge (opinion). Likewise, at certain times positive portions of the wire at a site can become negatively charged by the occurrence of an arrow from a randomly selected neighbor conducting negative flow across the arrow. This "space-time" flow of influence is qualitatively correct. It remains to select the distribution of occurrence times of δ's and arrows with the Markov property of the evolution in mind. The specification of the average density of occurrences of δ's and arrows will then furnish the rate-parameter values. Let {N_m(t): t ≥ 0}, m ∈ ℤ^d, be independent Poisson processes with intensity 2d. The construction of a Poisson process having right-continuous unit-jump sample functions is equivalent to the construction of a sequence {T_m(k): k ≥ 1} of i.i.d.
exponential inter-arrival times (see Section 5). The construction of countably many independent versions is made possible by Kolmogorov's existence theorem (theoretical complement 6.1 of Chapter I). Let {U_m(k): k ≥ 1}, m ∈ ℤ^d, be independent i.i.d. sequences, independent of the processes
{N_m(t)}, m ∈ ℤ^d, where P(U_m(k) = n) = p_{mn}. At the kth occurrence time τ_m(k) := T_m(1) + ⋯ + T_m(k) of the Poisson process at m ∈ ℤ^d, a neighboring site of m, represented by U_m(k), is randomly selected. One may also consider independent Poisson processes {N_{mn}(t): t ≥ 0}, m, n ∈ ℤ^d, with respective
intensity parameters 2dp_{mn} representing those (Poisson) times at m when the neighboring site n is picked. Notice that 2dp_{mn} is either 1 or 0 according to whether m and n are neighbors or not. At the kth event time τ = τ_m(k) of the process {N_m(t)}, let n = U_m(k) be the corresponding neighbor selected. Place an arrow from (n, τ) to (m, τ), and attach the symbol δ to the point (m, τ) at the arrowhead. For a given initial configuration σ(0) = (σ_m(0)), define a flow to be initiated at the sites m such that σ_m(0) = −1. The flow passes vertically until it is stopped from moving upward by the occurrence of a δ. It also passes horizontally across arrows in the direction of the arrows. The configuration at time t is defined by
$$\sigma_n(t) = \begin{cases} -1, & \text{if the (negative) flow reaches } (n, t) \text{ from some initial point } (m, 0),\\ +1, & \text{otherwise.} \end{cases} \qquad (11.3)$$
More precisely, the flow is defined to reach (n, t) from (m, 0) if there are times 0 = t_0 … 0, σ ∈ S,
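$$\frac{\partial}{\partial t} T_t f(\sigma) = T_t(Af)(\sigma) = E_\sigma\{Af(\sigma(t))\},$$
where, for such functions f,
$$Af(\sigma) = \lim_{t \downarrow 0} \frac{T_t f(\sigma) - f(\sigma)}{t} \qquad (11.6)$$
$$= \lim_{t \downarrow 0} \frac{E_\sigma\{f(\sigma(t)) - f(\sigma)\}}{t} = \sum_m \{f(\sigma^{(m)}) - f(\sigma)\}\, c_m(\sigma). \qquad (11.7)$$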
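The graphical construction translates directly into an event-driven simulation of the process with generator (11.7). The Python sketch below is a minimal illustration under two assumptions that are not made in the text. First, the lattice is replaced by a finite ring ℤ/Lℤ (d = 1). Second, the independent rate-2 clocks are superposed into one rate-2L Poisson stream whose rings are assigned to uniformly chosen sites. At each ring the chosen site copies a uniformly chosen neighbor, which is the arrow-and-δ rule. The function name is hypothetical.

```python
import random


def simulate_voter_ring(sigma0, t_max, seed=None):
    """Simulate the simple symmetric voter model on a ring of len(sigma0) sites (d = 1).

    Each site m carries a rate-2 (= 2d) Poisson clock.  When it rings, a neighbour
    n = m +- 1 is chosen with probability p_mn = 1/2, an arrow is drawn from n to m,
    and m adopts the current opinion of n.  Site m therefore flips at rate
    2 * (fraction of disagreeing neighbours) = c_m(sigma).

    Returns (configuration, stopping time); the stopping time equals t_max unless
    consensus is reached earlier.
    """
    rng = random.Random(seed)
    sigma = list(sigma0)
    L = len(sigma)
    t = 0.0
    while True:
        if len(set(sigma)) == 1:          # consensus: every flip rate is zero
            return sigma, t
        # Superposing L independent rate-2 clocks gives a single rate-2L clock
        # whose rings fall on a uniformly chosen site.
        t += rng.expovariate(2.0 * L)
        if t >= t_max:
            return sigma, t_max
        m = rng.randrange(L)
        n = (m + rng.choice((-1, 1))) % L
        sigma[m] = sigma[n]
```

On a finite ring, consensus (all +1 or all −1) is absorbing and is reached with probability one, so the sketch stops as soon as it occurs.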
Let μ be an arbitrary initial probability distribution on (S, 𝒮) and let μ_t denote the corresponding distribution of σ(t), with μ_0 = μ. The (spatial) block correlations of the distribution μ_t over the finite set of sites D = {n_1, …, n_k}, say, are the multivariate moments of σ_{n_1}(t), …, σ_{n_k}(t) defined by
(t, D) = E µ { Qn (t)
x
... X a(t)}.
(11.8)
Using standard inclusion–exclusion calculations, one can verify that a probability distribution on (S, 𝒮) is uniquely determined by its block correlations. Let φ_σ(t, D) = φ_{δ_σ}(t, D). Applying equation (11.6) to the function f(σ) := σ_{n_1} × ⋯ × σ_{n_k}, σ ∈ S, we get an equation describing the evolution of the block correlations as follows:
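$$\begin{aligned}
\frac{\partial}{\partial t}\varphi_\sigma(t, D) &= E_\sigma \sum_m \Big\{\prod_{k \in D} \sigma_k^{(m)}(t) - \prod_{k \in D} \sigma_k(t)\Big\}\, c_m(\sigma(t))\\
&= -2E_\sigma\Big\{\sum_{m \in D} \Big[\prod_{k \in D} \sigma_k(t)\Big]\, c_m(\sigma(t))\Big\}\\
&= -2d \sum_{m \in D} E_\sigma\Big\{\Big[\prod_{k \in D} \sigma_k(t)\Big]\Big(1 - \sigma_m(t) \sum_n p_{mn}\,\sigma_n(t)\Big)\Big\}\\
&= -2d|D|\,\varphi_\sigma(t, D) + 2d \sum_{m \in D} \sum_n p_{mn}\,\varphi_\sigma\big(t, (D \setminus \{m\})\,\Delta\,\{n\}\big), \qquad (11.9)
\end{aligned}$$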
where Δ denotes symmetric difference, A Δ B := (A ∩ B^c) ∪ (A^c ∩ B), A, B ⊂ ℤ^d, and |D| is the cardinality of D. Taking expected values, we get from (11.9) that
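$$\begin{aligned}
\frac{\partial \varphi_\mu(t, D)}{\partial t} &= -2d|D|\,\varphi_\mu(t, D) + 2d \sum_{m \in D} \sum_n p_{mn}\,\varphi_\mu\big(t, (D \setminus \{m\})\,\Delta\,\{n\}\big)\\
&= \sum_{m \in D} \sum_n \big\{\varphi_\mu\big(t, (D \setminus \{m\})\,\Delta\,\{n\}\big) - \varphi_\mu(t, D)\big\}\, 2d\,p_{mn}. \qquad (11.10)
\end{aligned}$$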
As a warm-up to the equations (11.10), take D = {m} and suppose that μ is an invariant (equilibrium) distribution. Then E_μσ_m = φ_μ(0, {m}) is a harmonic function for the random walk. Indeed, by invariance φ_μ(t, {m}) does not depend on t, so the left-hand side of (11.10) is zero. Since (D ∖ {m}) Δ {n} = {n} when D = {m}, the equations show that φ_μ has the averaging property (harmonicity)
$$\varphi_\mu(0, \{m\}) = \sum_n \varphi_\mu(0, \{n\})\, p_{mn}. \qquad (11.11)$$
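At t = 0 and for a fixed configuration σ, (11.9) reduces to an identity for the product function f_D(σ) = ∏_{k∈D} σ_k:
A f_D(σ) = −2d|D| f_D(σ) + 2d Σ_{m∈D} Σ_n p_{mn} f_{(D∖{m})Δ{n}}(σ).
This identity can be checked exhaustively on a small example. The Python sketch below does this on a ring ℤ/Lℤ with d = 1, an assumption made so that the check is finite. It computes the left side directly from the generator (11.7) and the right side from (11.9). The names are hypothetical.

```python
import itertools


def neighbours(m, L):
    """The two neighbours of site m on the ring Z/LZ (d = 1, L >= 3)."""
    return ((m - 1) % L, (m + 1) % L)


def block(sigma, D):
    """f_D(sigma) = product of sigma_k over k in D (the empty product is 1)."""
    prod = 1
    for k in D:
        prod *= sigma[k]
    return prod


def generator_on_block(sigma, D, L):
    """A f_D(sigma) = sum_m {f_D(sigma^(m)) - f_D(sigma)} c_m(sigma), as in (11.7)."""
    total = 0
    for m in range(L):
        c_m = sum(1 for n in neighbours(m, L) if sigma[n] != sigma[m])
        flipped = list(sigma)
        flipped[m] = -flipped[m]
        total += (block(flipped, D) - block(sigma, D)) * c_m
    return total


def right_side_11_9(sigma, D, L):
    """-2d|D| f_D + 2d sum_{m in D} sum_n p_mn f_{(D - {m}) sym-diff {n}}, with d = 1.

    For d = 1, 2d = 2 and 2d p_mn = 1 for each of the two neighbours of m.
    """
    value = -2 * len(D) * block(sigma, D)
    for m in D:
        for n in neighbours(m, L):
            value += block(sigma, (set(D) - {m}) ^ {n})
    return value


def check_11_9(L=6):
    """Compare both sides over all configurations and all finite site sets D."""
    for sigma in itertools.product((-1, 1), repeat=L):
        for r in range(L + 1):
            for D in itertools.combinations(range(L), r):
                if generator_on_block(sigma, D, L) != right_side_11_9(sigma, D, L):
                    return False
    return True
```

Taking D = {m}, this identity is the t = 0 version of the computation that leads to (11.11).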
Now, for the simple symmetric random walk on ℤ^d, one can show that the only bounded solutions to (11.11) are constants (theoretical complement 3). Therefore, the distribution of σ_m under an invariant equilibrium distribution is independent of m in this case. From here on we will restrict our consideration to the long-time behavior of various translation-invariant initial distributions with, say, E_μσ_m = 2p − 1, … the annihilating random walk, the solution to (11.13), (11.14) is given by
$$\varphi_\sigma(t, D) = E_D\,\varphi_\sigma(0, D_t) = E_D\Big\{\prod_{m \in D_t} \sigma_m\Big\}. \qquad (11.16)$$
The representation (11.16) is known as the duality equation and is the basis for the proof of the following major result.

Theorem 11.1. Let p = ((p_{mn})) be the transition probability matrix of the simple symmetric random walk on ℤ^d associated with the simple symmetric voter model on ℤ^d. For an initial probability distribution μ satisfying (11.12) we have the following: (a) If d ≤ 2, then μ_t converges in distribution to pδ_{σ^+} + (1 − p)δ_{σ^−} as t → ∞, where σ^+ and σ^− denote the configurations that are identically +1 and identically −1, respectively. (b) If d ≥ 3, then for each p ∈ (0, 1) there is a distinct translation-invariant equilibrium distribution ν^{(p)} on (S, 𝒮), which is not a mixture of δ_{σ^+} and δ_{σ^−}, such that E_{ν^{(p)}}σ_m = 2p − 1. Moreover, if the initial distribution μ is that of independent ±1-valued Bernoulli random variables with probability parameter determined by (11.12), then μ_t converges in distribution to ν^{(p)} as t → ∞.

Proof. To prove (a), first consider that
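$$E_\mu\sigma_m(t) = \varphi_\mu(t, \{m\}) = \int_S \varphi_\sigma(t, \{m\})\,\mu(d\sigma) = \int_S E_{\{m\}}\Big[\prod_{k \in D_t} \sigma_k\Big]\mu(d\sigma) = \sum_k p_{mk}(t)\,E_\mu\sigma_k = 2p - 1 \quad \text{for all } t \ge 0, \qquad (11.17)$$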
where p_{mk}(t) is the transition law for the continuous-time random walk. Also, for distinct sites n and m in ℤ^d, we have
$$\varphi_\mu(t, \{n, m\}) = P_\mu(\sigma_m(t) = \sigma_n(t)) - P_\mu(\sigma_m(t) \neq \sigma_n(t)) = 1 - 2P_\mu(\sigma_m(t) \neq \sigma_n(t)). \qquad (11.18)$$
Therefore, to prove (a) it is enough to show that φ_μ(t, {n, m}) → 1 as t → ∞. Now, the difference of the two independent simple symmetric random walks starting at m and n evolves as a (symmetrized) continuous-parameter simple symmetric random walk. Hence it follows from recurrence when d ≤ 2 that |D_t| → 0 a.s. as t → ∞; i.e., the particles will eventually meet and annihilate. Using the duality relation and the Lebesgue Dominated Convergence Theorem, we therefore have
$$\lim_{t \to \infty} \varphi_\mu(t, \{n, m\}) = \lim_{t \to \infty} \int_S E_{\{n, m\}}\Big[\prod_{k \in D_t} \sigma_k\Big]\mu(d\sigma) = 1. \qquad (11.19)$$
To prove (b), on the other hand, take for μ the Bernoulli product distribution
of independent values subject to (11.12), and consider the duality relation
$$\varphi_\mu(t, D) = E_\mu E_D\Big\{\prod_{m \in D_t} \sigma_m\Big\} = E_D E_\mu\Big\{\prod_{m \in D_t} \sigma_m\Big\} = E_D \prod_{m \in D_t} E_\mu\sigma_m = E_D(2p - 1)^{|D_t|}, \qquad (11.20)$$
where |D_t| denotes the cardinality of D_t. Now, since |D_t| is a.s. nonincreasing, the limit, denoted |D_∞|, exists a.s. By transience, this limit is positive with nonzero probability. The limit distribution ν^{(p)} is defined through its block correlations accordingly. That is,
$$\int_S \Big\{\prod_{m \in D} \sigma_m\Big\}\,\nu^{(p)}(d\sigma) = E_D(2p - 1)^{|D_\infty|}. \qquad (11.21)$$
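The dual process in (11.16) through (11.21) can be sampled directly. In the Python sketch below, read off from (11.10), each particle of D_t jumps at rate 2d to a uniformly chosen neighbor, and a jump onto an occupied site removes both particles. The sketch runs on a finite torus (ℤ/Lℤ)^d, which is an assumption of the sketch. It therefore illustrates the mechanics of annihilation and of the estimator E_D(2p − 1)^{|D_t|} in (11.20), not the transience argument on ℤ^d itself. Names are hypothetical.

```python
import random


def annihilating_walk(D, t, L, d, rng):
    """Run the annihilating random walk D_t on the torus (Z/LZ)^d, L >= 3, up to time t.

    Per (11.10), each particle at m jumps at rate 2d to a neighbour n chosen with
    probability p_mn = 1/(2d), and D becomes (D minus {m}) sym-diff {n}: if n was
    already occupied, both particles disappear.
    Returns the final set of occupied sites and the list of sizes after each event.
    """
    occupied = set(D)
    sizes = [len(occupied)]
    clock = 0.0
    while occupied:
        clock += rng.expovariate(2 * d * len(occupied))
        if clock >= t:
            break
        m = rng.choice(sorted(occupied))
        axis = rng.randrange(d)
        n = list(m)
        n[axis] = (n[axis] + rng.choice((-1, 1))) % L
        # m leaves; n is added if it was empty and removed (annihilation) if occupied.
        occupied ^= {m, tuple(n)}
        sizes.append(len(occupied))
    return occupied, sizes


def dual_estimate(D, t, p, L, d, n_samples=2000, seed=0):
    """Monte Carlo estimate of E_D (2p - 1)^{|D_t|}, the right side of (11.20)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        final, _ = annihilating_walk(D, t, L, d, rng)
        total += (2 * p - 1) ** len(final)
    return total / n_samples
```

Two structural facts visible in (11.10) can be read off from any run: |D_t| never increases, and it changes only by 2, so its parity is preserved.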
Certainly it follows that ν^{(p)} is translation-invariant. Being a time-asymptotic distribution of the evolution, it is also expected to be invariant under the (further) evolution. More precisely, let ν_t denote the distribution of σ(t) when σ(0) has distribution ν^{(p)}. Then for any bounded continuous real-valued function f on S we have, by the Chapman–Kolmogorov equations (semigroup property),
$$\int_S f(\eta)\,\nu_t(d\eta) = \int_S \int_S f(\eta)\,p(t; \sigma, d\eta)\,\nu^{(p)}(d\sigma) = \lim_{s \to \infty} \int_S \int_S f(\eta)\,p(t; \sigma, d\eta)\,\mu_s(d\sigma), \qquad (11.22)$$
since for fixed continuous f and t > 0 the bounded function σ ↦ ∫_S f(η) p(t; σ, dη) is continuous (theoretical complement) and, as noted earlier, by compactness of S, μ_s converges to ν^{(p)} in distribution (i.e., weakly) as s → ∞. Now, from (11.22) and the Chapman–Kolmogorov equations again, one has
$$\int_S f(\eta)\,\nu_t(d\eta) = \lim_{s \to \infty} \int_S f(\eta)\,\mu_{t+s}(d\eta) = \int_S f(\eta)\,\nu^{(p)}(d\eta). \qquad (11.23)$$
Thus, since the integrals of continuous functions on S with respect to the distributions ν_t(dη) and ν^{(p)}(dη) coincide, the two measures must be the same (see theoretical complement 8.6 of Chapter I). ∎

The Feller property states that, for each bounded continuous function f on S, the function σ ↦ E_σ f(σ(t)) = ∫_S f(η) p(t; σ, dη) is also a bounded continuous function. As illustrated in the above proof (Eq. 11.22), this property is essential for confirming one's intuition about invariance properties of long-time limits under weak convergence. The other important role played by topology in the above was the use of compactness to obtain weak convergence from finite-dimensional calculations.
EXERCISES

Exercises for Section IV.1

1. Show that in order to establish the Markov property it is enough to check Eq. 1.1 for only one "future" time point t, for arbitrary t > s.
2. Show that a continuous-parameter process with independent increments has the Markov property. Show also that such a process has a homogeneous transition law if and only if, for every h > 0, the distribution of X_{t+h} − X_t is the same for all t ≥ 0.
3. Show that, given a Poisson process {X_t} as in Example 1 with ∫_0^∞ ρ(u) du = ∞, there exists an increasing transformation φ: [0, ∞) → [0, ∞) such that the process {X_{φ(t)}} is homogeneous with parameter λ = 1.
4. Prove that the compound Poisson process (Example 2) has independent increments.
5. Consider a compound Poisson process {X_t} with state space ℝ^1 and an arbitrary jump distribution μ(dx) on ℝ^1. (i) Show that {X_t} is a Markov process and compute its transition probability p(t; x, B) := P(X_t ∈ B | X_0 = x).
(ii) Compute the characteristic function of X_t.
6. (Doubly Stochastic Poisson or Cox Process) Suppose that the parameter Λ (mean rate) of a homogeneous Poisson process {X_t} is random with distribution μ(dλ) = f(λ) dλ on (0, ∞). In other words, conditionally given Λ = λ_0, {X_t} is a Poisson process with parameter λ_0. (i) Show that {X_t} is not a process with independent increments. (ii) Show that {X_t} is (generally) not a Markov process. (iii) Compute the distribution of X_t for arbitrary but fixed t > 0. (iv) Compute Cov(X_s, X_t).
7. Generalize Exercise 6 to compound Poisson processes.
8. The lifetimes of elements of a certain type are independent and exponentially distributed with parameter α > 0. At time t = 0 there are X_0 = n living elements present. Let X_t denote the number alive at time t. Show that {X_t} is a Markov process and calculate its transition probabilities.
9. Let {X_t: t ≥ 0} be a process starting at x with (stationary) independent increments, and EX_t² < ∞. Assuming EX_t and EX_t² are continuous, prove the following. (i) EX_t = mt + x, Var X_t = σ²t for some constants m and σ². (ii) (X_t − mt)/√t converges in distribution to the Gaussian distribution with mean 0 and variance σ², as t → ∞.
10. (i) Show that the sum of a finite number of independent real-valued stochastic processes, each having independent increments, is also a process with independent increments. (ii) Let {N_t^{(i)}} (i = 1, 2, …, k) be independent Poisson processes with mean
parameters λ_i (i = 1, 2, …, k). If c_1, c_2, …, c_k are arbitrary distinct positive constants, show that {Σ_{i=1}^k c_i N_t^{(i)}} is a compound Poisson process and compute its jump distribution and the (Poisson) mean rate of occurrences. (*iii) Prove that every compound Poisson process is a limit in distribution of superpositions of independent Poisson processes as described in (ii). [Hint: Compute the characteristic function of respective increments.]
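Following the hint to Exercise 10(ii), the two characteristic functions can be compared numerically. The Python sketch below evaluates E exp{iu Σ c_i N_t^{(i)}} as a product of Poisson characteristic functions. It then compares this with the characteristic function of a compound Poisson process. The rate Σ_i λ_i and the jump law P(jump = c_i) = λ_i/Σ_j λ_j plugged in are the candidate answer being tested. The numerical values in the usage check are arbitrary, and the function names are hypothetical.

```python
import cmath


def cf_superposition(u, t, rates, consts):
    """E exp(i u sum_k c_k N_t^(k)) for independent Poisson processes N^(k) with the given rates."""
    value = 1.0 + 0.0j
    for lam, c in zip(rates, consts):
        # Characteristic function of Poisson(lam * t) evaluated at u * c.
        value *= cmath.exp(lam * t * (cmath.exp(1j * u * c) - 1))
    return value


def cf_compound_poisson(u, t, rate, jumps, probs):
    """E exp(i u X_t) when X_t is a sum of Poisson(rate * t) many i.i.d. jumps.

    The jumps have P(jump = jumps[k]) = probs[k].
    """
    phi = sum(q * cmath.exp(1j * u * c) for c, q in zip(jumps, probs))
    return cmath.exp(rate * t * (phi - 1))
```

Agreement for every u and t is what the hint asks one to establish algebraically. A numerical check is only a consistency test.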
Exercises for Section IV.2

1. Check the Kolmogorov consistency condition for P_x defined by Eq. 2.2, assuming the Chapman–Kolmogorov condition (2.5).
2. There are n identical components in a system that operate independently. When a component fails, it undergoes repair, and after repair is placed back into the system. Assume that for a component the operating times between successive failures are i.i.d. exponential with mean 1/λ, and that these are independent of the successive repair times, which are i.i.d. exponential with mean 1/μ. The state of the system is the number of components in operation. Determine the infinitesimal generator of this Markov process.
3. For the process {X_t} in Exercise 1.8, give the corresponding infinitesimal generator and Kolmogorov's backward and forward equations.
4. Let {X_t} be a birth–death process on S = {0, 1, 2, …} with q_{i,i+1} = iβ, q_{i,i−1} = iδ, i ≥ 0, where β, δ > 0, q_{ij} = 0 if |j − i| > 1, and q_{0j} = 0. Let
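m_i(t) = E_iX_t,  s_i(t) = E_iX_t².
(i) Use the forward equation to show m_i'(t) = (β − δ)m_i(t), m_i(0) = i.
(ii) Show m_i(t) = ie^{(β−δ)t}.
(iii) Show s_i'(t) = 2(β − δ)s_i(t) + (β + δ)m_i(t).
(iv) Show that
$$s_i(t) = \begin{cases} i e^{2(\beta-\delta)t}\Big[i + \dfrac{\beta+\delta}{\beta-\delta}\big(1 - e^{-(\beta-\delta)t}\big)\Big], & \text{if } \beta \neq \delta,\\[6pt] i(i + 2\beta t), & \text{if } \beta = \delta. \end{cases}$$
(v) Calculate Var_i X_t.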
*5. Consider a compound Poisson process {X_t} on ℝ^1 with an arbitrary jump distribution μ(dx).
(i) For a given bounded continuous function f on ℝ^1 compute u(t, x) := E(f(X_t) | X_0 = x), and show that
$$\frac{\partial}{\partial t}u(t, x) = -\lambda u(t, x) + \lambda \int u(t, x + y)\,\mu(dy) = \lambda \int \big(u(t, x + y) - u(t, x)\big)\,\mu(dy).$$
(ii) Show that the limit of the last expression is
$$\lim_{t \downarrow 0} \frac{\partial}{\partial t}u(t, x) = (Qf)(x),$$
where Q is the integral operator (Qf)(x) = λ∫(f(x + y) − f(x)) μ(dy). (iii) Write the (backward) equation in (i) above for u(t, x) in terms of Q.
6. Let {Y_t} be a nonhomogeneous Markov chain with transition probabilities p_{ij}(s, t) = P(Y_t = j | Y_s = i), continuous for 0 ≤ s ≤ t, with p_{ij}(s, s) = δ_{ij}, i, j ∈ S, and such that
$$\lim_{t \downarrow s} \frac{p_{ij}(s, t) - \delta_{ij}}{t - s} = q_{ij}(s)$$
exists and is finite for each s. (i) Show that the Chapman–Kolmogorov equations take the form
$$p_{ik}(s, t) = \sum_j p_{ij}(s, r)\,p_{jk}(r, t), \qquad s < r < t.$$
(ii) For finite S show that the backward and forward equations, respectively, take the forms below:
$$\text{(backward)} \qquad \frac{\partial p_{ik}(s, t)}{\partial s} = -\sum_j q_{ij}(s)\,p_{jk}(s, t),$$
$$\text{(forward)} \qquad \frac{\partial p_{ik}(s, t)}{\partial t} = \sum_j p_{ij}(s, t)\,q_{jk}(t).$$
7. Consider a collection of particles that act independently in giving rise to succeeding generations of particles. Suppose that each particle, from the time it appears, waits a random length of time having an exponential distribution with parameter λ, and then either splits into two particles with probability p or disappears with probability q = 1 − p. Find the generator of this Markov process on the state space S = {0, 1, 2, 3, …}, the state being the number of particles present.
8. In Exercise 7 above, suppose that new particles immigrate into the system (independently of particles present) at random times that form a Poisson process with parameter γ, and then give rise to succeeding generations as described in Exercise 7. Compute the generator of this Markov process.
*9. Let {X_t} be a Markov process with state space an arbitrary measurable space (S, 𝒮). Let B(S) be the space of (Borel-)measurable bounded real-valued functions on S with the uniform norm ‖f‖ = sup_{x∈S}|f(x)|, f ∈ B(S). Then (B(S), ‖·‖) is a Banach space. Define T_t f(x) = E_x f(X_t), t ≥ 0, x ∈ S, for f ∈ B(S). Also, for f ∈ B(S) such that lim_{t↓0}{(T_t f − f)/t} exists in (B(S), ‖·‖), say that f belongs to the domain of Q and define Qf = lim_{t↓0}{(T_t f − f)/t}. Show that if the domain of Q is all of B(S), then Q must be a bounded linear operator on (B(S), ‖·‖); i.e., Q is continuous on (B(S), ‖·‖). [Hint: T_t f(x) = f(x) + ∫_0^t T_sQf(x) ds, t > 0, x ∈ S, f ∈ B(S). Apply the closed graph theorem from functional analysis.]
10. (Continuous-Parameter Pólya Process) Fix τ > 0 and consider a box containing r = r(τ) red balls and b = b(τ) black balls. Every τ units of time a ball is randomly selected, its color is noted, and it is placed back in the box together with c = c(τ) balls of the same color. Let S_{nτ} denote the number of red balls sampled by time nτ, n = 0, 1, 2, …, with S_0 = 0. As in Eq. 3.14 of Chapter II, {S_{nτ}} is a discrete-parameter nonhomogeneous Markov chain with one-step transition probabilities
$$p_{i,i}(n\tau, (n+1)\tau) = 1 - p_{i,i+1}(n\tau, (n+1)\tau),$$
and
$$p_{i,i+1}(n\tau, (n+1)\tau) = P(S_{(n+1)\tau} = i + 1 \mid S_{n\tau} = i) = \frac{r + ci}{r + b + nc}, \qquad i = 0, 1, \ldots.$$
Let ρ = ρ(τ) = r/(r + b), γ = γ(τ) = c/(r + b), and t = nτ. Then the probability of a transition from i to i + 1 in the time from t to t + τ is given by
$$p_{i,i+1}(t, t + \tau) = \frac{\rho + \gamma i}{1 + \gamma t/\tau}.$$
Note that ρ = r/(r + b) is the probability of selecting a red ball at the nth trial and nρ = (ρ/τ)t is the expected number of red balls sampled by time t = nτ; i.e., ρ/τ is the mean sampling rate of red balls. Suppose that ρ/τ = ρ(τ)/τ → 1 and γ/τ = γ(τ)/τ → γ_0 > 0 as τ → 0.
(i) For fixed τ > 0, use a combinatorial argument to show that the distribution of S_{nτ} is given by
$$P(S_{n\tau} = j) = \binom{n}{j}\frac{r(r + c)\cdots(r + jc - c)\; b(b + c)\cdots(b + (n - j)c - c)}{(b + r)(b + r + c)\cdots(b + r + nc - c)}, \qquad j = 0, 1, \ldots, n.$$
(ii) Show that in the limit as τ → 0, with t = nτ fixed, the distribution in (i) converges to
$$f_j(t) = t^j (1 + \gamma_0 t)^{-j - 1/\gamma_0}\,\frac{((j - 1)\gamma_0 + 1)\cdots(2\gamma_0 + 1)(\gamma_0 + 1)}{j!}, \qquad j = 0, 1, 2, \ldots,$$
which is a negative binomial p.m.f.
(iii) Show that
$$\sum_{j=0}^{\infty} j f_j(t) = t \qquad \text{and} \qquad \sum_{j=0}^{\infty} (j - t)^2 f_j(t) = \gamma_0 t^2 + t.$$
(iv) The continuous-parameter Pólya process is often defined as a nonhomogeneous (pure birth) Markov chain {Y_t} starting at 0 on S = {0, 1, 2, …} with transition probabilities denoted by P(Y_t = j | Y_s = i) = p_{ij}(s, t), j, i = 0, 1, 2, …, 0 ≤ s … 0.
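Two closed forms stated in these exercises, for Exercises 2.4 and 2.10, can be checked numerically. The Python sketch below sums the negative binomial p.m.f. of Exercise 10(ii) to confirm the mean and variance in (iii). It generates the p.m.f. through the ratio f_{j+1}(t)/f_j(t) = t(jγ_0 + 1)/((1 + γ_0 t)(j + 1)) to avoid overflow. It also integrates the moment equations of Exercise 4(i) and (iii) by the classical fourth-order Runge–Kutta scheme and compares the result with (ii) and (iv). Function names, truncation level, and parameter values are illustrative assumptions, and numerical agreement is not a proof.

```python
import math


def polya_limit_pmf(t, g0, jmax):
    """f_0(t), ..., f_{jmax-1}(t) of Exercise 10(ii), computed by the ratio recursion."""
    probs = [(1 + g0 * t) ** (-1 / g0)]          # f_0(t)
    for j in range(jmax - 1):
        probs.append(probs[-1] * t * (j * g0 + 1) / ((1 + g0 * t) * (j + 1)))
    return probs


def polya_moments(t, g0, jmax=500):
    """Total mass, mean and variance of the (truncated) p.m.f."""
    probs = polya_limit_pmf(t, g0, jmax)
    mass = math.fsum(probs)
    mean = math.fsum(j * q for j, q in enumerate(probs))
    var = math.fsum((j - t) ** 2 * q for j, q in enumerate(probs))
    return mass, mean, var


def birth_death_moments_rk4(i, beta, delta, t, steps=2000):
    """RK4 for m' = (beta - delta) m, s' = 2(beta - delta) s + (beta + delta) m,
    started from m(0) = i, s(0) = i^2 (Exercise 4(i) and (iii))."""
    a, b = beta - delta, beta + delta

    def rhs(m, s):
        return a * m, 2 * a * s + b * m

    m, s, h = float(i), float(i * i), t / steps
    for _ in range(steps):
        k1 = rhs(m, s)
        k2 = rhs(m + h / 2 * k1[0], s + h / 2 * k1[1])
        k3 = rhs(m + h / 2 * k2[0], s + h / 2 * k2[1])
        k4 = rhs(m + h * k3[0], s + h * k3[1])
        m += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        s += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return m, s


def birth_death_moments_closed(i, beta, delta, t):
    """Closed forms of Exercise 4(ii) and (iv)."""
    a, b = beta - delta, beta + delta
    m = i * math.exp(a * t)
    if a == 0:
        s = i * (i + 2 * beta * t)
    else:
        s = i * math.exp(2 * a * t) * (i + b / a * (1 - math.exp(-a * t)))
    return m, s
```

Subtracting the square of the first moment from the second gives the variance asked for in Exercise 4(v).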
2. Write out the third iterate (of Eq. 4.7) p^{(3)}(t) for the case S = {1, 2, 3}, q_{11} = −λ, q_{12} = λ, q_{22} = −μ, q_{23} = μ, q_{33} = −γ, q_{32} = γ, q_{ij} = 0 otherwise for i, j ∈ S.
3. (A Pure Death Process) Let S = {0, 1, 2, …} and let q_{i,i−1} = δ > 0, q_{ii} = −δ, i ≥ 1, q_{ij} = 0 otherwise. (i) Calculate p_{ij}(t), t ≥ 0, using successive approximations. (ii) Calculate E_iX_t and Var_i X_t.
4. Calculate approximations p^{(2)}(t) for Exercises 2.7 and 2.8.
5. Show that the minimal solution p(t) also satisfies the forward equation. [Hint: Consider the integral version.]
6. Show that the minimal solution p(t), t ≥ 0, is a semigroup (i.e., that it satisfies the Chapman–Kolmogorov equations) by the following procedure. (i) Let f be a continuous bounded function on [0, ∞) with Laplace transform
$$\hat f(\nu) = \int_0^\infty e^{-\nu t} f(t)\,dt.$$
Define F(s, t) = f(s + t), s, t ≥ 0, and let
$$\hat F(\nu, \mu) = \int_0^\infty \int_0^\infty e^{-\nu s - \mu t} F(s, t)\,ds\,dt \qquad (\mu > 0, \nu > 0)$$
be the bivariate Laplace transform of F. Show that F̂ satisfies the resolvent equation
$$\hat F(\nu, \mu) = -\frac{\hat f(\nu) - \hat f(\mu)}{\nu - \mu}, \qquad \nu \neq \mu.$$
[Hint: Write u = s + t, v = −s + t, 0 … 0 if and only if
$$-\frac{\hat p(\nu) - \hat p(\mu)}{\nu - \mu} = \hat p(\mu)\hat p(\nu), \qquad \nu \neq \mu.$$
(iii) Show that the backward and forward equations (see also Exercise 5) transform, respectively, as [B]: νp̂(ν) = I + Qp̂(ν), [F]: νp̂(ν) = I + p̂(ν)Q.
(iv) Use the backward equation to show μp̂(ν) = A + Qp̂(ν) for A = I + (μ − ν)p̂(ν). Use nonnegativity of p̂(ν) and A for μ > ν to show by
induction, using (4.7), that p̂(ν) ≥ p̂^{(n)}(μ)A for n = 0, 1, 2, …, and therefore
$$\hat p(\nu) \ge \hat p(\mu)A = \hat p(\mu) + (\mu - \nu)\hat p(\mu)\hat p(\nu), \qquad \mu > \nu.$$
(v) Use [F] to prove the reverse inequality and hence (ii). [Hint: Check that r(ν) = p̂(μ) + (μ − ν)p̂(μ)p̂(ν) solves [F] and use minimality.]
7. Compute p_{ik}(t) for all i, k in Example 1.
Exercises for Section IV.5

1. Show that the holding time T_n (n ≥ 1) is not a stopping time.
2. Let A(λ), B(λ), B(0) denote the events {s < T_0 … 0. Let T_0 denote the time of the first arrival. (i) Let N = X_{2T_0} − X_{T_0}, and calculate Cov(N, T_0). (ii) Calculate the (conditional) expected value of the time T_0 + ⋯ + T_{r−1} of the rth arrival given X_t = n ≥ r.
9. Suppose that two colonies, 1 and 2, start as single units and independently undergo growth by pure birth processes with rates β_1, β_2, respectively. Calculate the expected size of colony 1 at the time when the first offspring is produced in colony 2.
10. Consider a pure-death process with rates −q_{ii} = q_{i,i−1} = λ > 0, i ≥ 1, q_{ij} = 0 otherwise. (i) Calculate p_{ij}(t). (ii) If initially there are n_1 particles of type 1 and n_2 particles of type 2 that independently undergo pure death at rates δ_1, δ_2, respectively, calculate the expected number of type 1 particles at the time of extinction of the type 2 particles.
11. Let {N_t} be a homogeneous Poisson process with parameter λ > 0. Define X_t = (−1)^{N_t}, t ≥ 0. Show that {X_t} is a Markov process and compute its transition probabilities.
12. Consider all rooted binary tree graphs having n sources (i.e., degree-one vertices excluding the root). Call any edge incident to a source vertex an external edge and call the others internal (see Fig. Ex.IV.5). [By a rooted tree is meant a tree graph in which a vertex is singled out as the root. Other graph-theory terminology is described in Exercise 7.5 of Chapter II.]
[Figure Ex. IV.5: a rooted binary tree with n = 4 sources; labels: Root, source, Internal.]
(i) Show that a rooted binary tree with n sources has n external edges and n − 1 internal edges, n ≥ 1. In particular, the total number of edges is 2n − 1, which is also the total number of vertices excluding the root. (ii) Show that the following code establishes a one-to-one correspondence between the collection of all rooted binary tree graphs with n sources and the collection of all simple polygonal paths from (0, 0) to (2n − 1, −1) in steps of ±1 that do not touch or cross the line y = −1 prior to x = 2n − 1. Starting with the edge incident to the root, traverse the tree along the leftmost path until reaching the leftmost source. Then follow back until reaching a junction leading to the next leftmost source, and so on. On the first (and only on the first) traverse of an edge, record + if it is internal and − if it is external. The resulting path of length 2n − 1 of (+)s and (−)s furnishes the coding of the tree in the form of a "random walk excursion."
(iii) Use (ii) and the reflection principle (Section 4, Chapter I) to calculate the distribution of M_L in Example 3. (iv) Let g(x) = E x^{M_L} denote the probability-generating function of M_L. Use the recursive structure of the tree to establish the quadratic equation g(x) = ½x + ½g²(x). (v) Use (iv) to give another derivation of the distribution of M_L.
13. Show that the holding time in state j for Example 2 is exponentially distributed with parameter λ_j = jλ (j ≥ 1).
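For small n, the coding of Exercise 12(ii) can be checked by enumeration. The Python sketch below represents a rooted binary tree as nested pairs, a representation chosen here and not taken from the text. The string "leaf" stands for an edge ending at a source. The code is read off by the left-first traversal described in (ii). The check confirms that distinct trees receive distinct codes and that the codes are exactly the ±1 paths of length 2n − 1 from (0, 0) that first reach −1 at the last step. Both collections have the Catalan number C_{n−1} of elements. Names are hypothetical.

```python
import itertools
from math import comb


def binary_trees(n):
    """All rooted binary trees with n sources, as nested pairs; "leaf" is an external edge."""
    if n == 1:
        return ["leaf"]
    trees = []
    for k in range(1, n):                 # k sources in the left branch, n - k in the right
        for left in binary_trees(k):
            for right in binary_trees(n - k):
                trees.append((left, right))
    return trees


def tree_code(tree):
    """Left-first depth-first traversal, recording each edge on its first traverse:
    '+' for an internal edge, '-' for an external edge (one ending at a source)."""
    if tree == "leaf":
        return "-"
    left, right = tree
    return "+" + tree_code(left) + tree_code(right)


def is_excursion(word):
    """Does the +-1 path from (0, 0) reach height -1 for the first time at its last step?"""
    height = 0
    for position, step in enumerate(word, start=1):
        height += 1 if step == "+" else -1
        if height == -1:
            return position == len(word)
    return False


def check_bijection(n):
    """Codes are distinct, number C_{n-1}, and coincide with the set of excursions."""
    codes = [tree_code(t) for t in binary_trees(n)]
    excursions = {"".join(w) for w in itertools.product("+-", repeat=2 * n - 1)
                  if is_excursion("".join(w))}
    catalan = comb(2 * n - 2, n - 1) // n
    return len(set(codes)) == len(codes) == catalan and set(codes) == excursions
```

For example, the tree whose left branch splits once more is coded "++---", while its mirror image is coded "+-+--". A successful check for small n is evidence for the correspondence, not a proof of it.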
Exercises for Section IV.6

1. Show that (1 + x) log(1 + x) ≥ x for all x ≥ 0.
2. Show that (i) the backward equation holds for p_{ij}(t), and (ii) {p(t): t ≥ 0} are transition probabilities.
3. Show that {p(t): t ≥ 0} are transition probabilities on S ∪ {Δ} satisfying the backward and forward equations.
4. (i) Consider an initial mass of size x_0 that grows (deterministically) to a size x_t at time t at a rate that depends upon size; say, dx_t/dt = f(x_t), t ≥ 0. Give an example of a growth-rate function f(x), x ≥ 0, such that the mass will grow without bound within a finite time, i.e., lim_{t↑t_0} x_t = ∞ for some finite t_0 > 0. [Hint: Consider a case in which each "element" of mass grows at a rate proportional to the total mass, so that the total mass itself grows at a rate proportional to the mass squared.] (ii) Let X_t denote the number of reactions that have occurred on or before time t in a chain reaction process. Suppose that {X_t} is a pure birth process starting at 0. Show that the expected number of reactions that occur in a finite time interval [0, t] need not be finite.
5. (A Bus Stop Problem) A passenger regularly travels from home to office either by bus or by walking. The travel time by walking is a constant t_w, whereas the travel
time by bus from the stop to work is a random variable, independent of the bus arrival time, with a continuous distribution having mean t_b < t_w. Buses arrive randomly at the home stop according to a Poisson process with intensity parameter λ. The passenger uses the following strategy to decide whether to walk or ride: if the bus arrives within c time units, then ride the bus; if the wait reaches c, then walk. Determine (optimality) conditions on c that minimize the average travel time in terms
of t_b, t_w, λ. [Note: The solutions c = 0 (always walk) and c = ∞ (always ride) are permitted.]
6. Consider a single telephone line that is either free (state 0) or busy (state 1). Suppose incoming calls form a Poisson process with mean rate (intensity) λ per minute. The successive durations of calls are i.i.d. exponential random variables with parameter μ (mean duration of a call being μ^{−1} minutes), independent of the Poisson process of incoming calls. If a call arrives at a time when the line is busy, the call is simply lost. (i) Give the corresponding prescription of Kolmogorov's backward and forward equations for the state of the line. (ii) Solve the forward equation.
Exercises for Section IV.7

1. Show that the process {X_t} described noncanonically in Example 2 is a Markov process.
2. Prove Eq. 7.11.
3. Let {X_t} be the Yule linear growth process of Example 2. Show that (i) E_iX_t = ie^{λt}. (ii) Var_i X_t = ie^{λt}(e^{λt} − 1) = 2ie^{3λt/2} sinh(λt/2).
4. Show that the N-server queue process {X_t} is a Markov process.
*5. (Renewal Age Process) Let T_1, T_2, … be an i.i.d. sequence of positive random
variables with a continuous (lifetime) distribution function F. Let S_0 = 0, S_n = T_1 + T_2 + ⋯ + T_n, n ≥ 1. With A_0 = 0, define for t > 0 the age at time t by A_t := t − max{S_k: S_k ≤ t, k ≥ 0}. Then {a + A_t} is the process starting at a ≥ 0. (i) Show that {A_t} has the Markov property; i.e., for arbitrary time points 0 ≤ s_0 < s_1 < ⋯ < s_k < s < t < t_1 < ⋯ < t_n, the conditional distribution of A_t, A_{t_1}, …, A_{t_n}, given A_{s_0}, …, A_{s_k}, A_s, does not depend on A_{s_0}, …, A_{s_k}. (ii) The failure rate (also called hazard rate or force of mortality) for the objects being renewed is defined by h(t) = f(t)/[1 − F(t)], where f is the p.d.f. (assumed to exist) of F. For continuously differentiable functions g having bounded derivatives, show, for a ≥ 0,
$$Qg(a) := \lim_{t \downarrow 0} \frac{E_a g(A_t) - g(a)}{t} = h(a)\{g(0) - g(a)\} + \frac{dg}{da}(a).$$
(iii) Show that μ(a) = Z^{−1} exp{−∫_0^a h(u) du}, a ≥ 0, where Z is the normalization constant, solves ∫_0^∞ μ(a)Qg(a) da = 0. In particular, show that μ is the density of an invariant probability for {A_t}. (iv) Show that (i) holds also for the residual lifetime process {S_{N_t} − t}, where N_t = inf{n ≥ 0: S_n > t}. For ET_1 < ∞, show that π(t) = (1 − F(t))/ET_1, t ≥ 0, is the density of an invariant probability. [The above has an interesting generalization by
F. Spitzer (1986), "A Multidimensional Renewal Theorem," in Probability, Statistical Mechanics, and Number Theory, Advances in Mathematics Supplementary Studies, Vol. 9 (G. C. Rota, ed.), pp. 147–155.]
6. (Thinned Poisson) Let {τ_n} be the sequence of occurrence times for a Poisson process {N_t} with parameter λ. Let {ε_n} be an i.i.d. sequence, independent of the Poisson process, of Bernoulli 0-1-valued random variables such that P(ε_n = 0) = p, 0 …
… 1_{[0,t]}(τ_n), Y_0 = 0, where N_t = max{n: τ_n < t} is the Poisson counting process. Show that {Y_t} is a Poisson process with intensity parameter (1 − p)λ.
7. (i) (Vibrating String) For Example 5, check that in the (deterministic) case
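β = α = 0, u(x, t) = ½{φ(x + vt) + φ(x − vt)}. In particular, check that u(x, t) solves
$$\frac{\partial^2 u}{\partial t^2} = v^2 \frac{\partial^2 u}{\partial x^2}, \qquad u(x, 0) = \varphi(x), \qquad \frac{\partial u}{\partial t}(x, t) = 0 \ \text{ at } t = 0.$$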
(ii) Check that there is a randomization of time represented by a nonnegative nondecreasing stochastic process {T_t} such that u(x, t) = ½{Eφ(x + vT_t) + Eφ(x − vT_t)}. [Hint: Simply consider (7.37).]
8. (Diffusion limit) Check that in the limit α → ∞, v → ∞, 2α/v² = D^{−1} > 0, the diffusion equation
$$\frac{\partial u}{\partial t} = D\,\frac{\partial^2 u}{\partial x^2}$$
is consistent with the telegrapher's equation.
9. The flow of electricity through a coaxial cable is typically described by the telegrapher's equation (7.19) or (7.32), (7.33), where u(x, t) represents the (instantaneous) voltage and w(x, t) the current at a distance x from the sending end of the cable. The parameters α, β, and v² can be interpreted in terms of the electrical properties of the cable as outlined in this exercise. If one ignores leakage (conductance due to inadequate insulation), then the circuit diagram for the segment of cable from x to x + Δx may be depicted as in Figure Ex.IV.7 below. Here the parameters R, L, and C are the resistance, inductance, and capacitance per unit length. These parameters are defined in accordance with certain physical principles. For example, Ohm's law says that the ratio of the voltage drop across a resistor to the current through the resistor is a constant (called the resistance), given here by RΔx. Thus, the voltage drop across the resistor is RΔx w(x, t). Likewise,
CONTINUOUS-PARAMETER MARKOV CHAINS
[Figure Ex.IV.7: circuit diagram for the cable segment from x to x + Δx, with elements R Δx, L Δx, and C Δx; terminal voltages u(x, t) and u(x + Δx, t).]
when there is a change in the flow of current $\partial w/\partial t$ in an inductor, then there is a corresponding voltage drop of $L\,\Delta x\,\partial w/\partial t$. According to Kirchhoff's second law, the sum of potential drops (as measured by a voltmeter) around a closed loop in an electric network is zero. Thus, for $\Delta x$ small, one has
$$u(x + \Delta x, t) - u(x, t) + R\,\Delta x\,w(x, t) + L\,\Delta x\,\frac{\partial w}{\partial t} = 0.$$
The nature of a capacitor is such that the ratio of the charge stored to the voltage drop is the capacitance $C\,\Delta x$. Thus, the capacitor current, being the time rate of change of charge, is given by $C\,\Delta x\,\partial u/\partial t$. According to Kirchhoff's first law, the sum of the currents flowing toward any point in an electrical network is zero (i.e., charge is conserved). Thus, for $\Delta x$ small, one has
$$w(x + \Delta x, t) - w(x, t) + C\,\Delta x\,\frac{\partial u}{\partial t} = 0.$$
(i) Use the above to complete the derivation of the transmission line equations and the telegrapher's equation (for twice-continuously differentiable functions) and give the corresponding electrical interpretation of the parameters $\alpha$, $\beta$, and $v^2$. (ii) In the case when leakage is present there is an additional parameter (loss factor) $G$, called the conductance per unit length, such that the leakage current is proportional to the voltage: $G\,\Delta x\,u(x, t)$. In this case the term $G\,\Delta x\,u(x, t)$ is added to the left-hand side in Kirchhoff's first law, and one sees that the voltage and current satisfy transmission line equations of exactly the same general form. Show that precisely the same equation is satisfied by both voltage and current.
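A sketch of the elimination step behind part (i) of Exercise 9 (not the book's own solution; it assumes $u$ and $w$ are twice continuously differentiable so that mixed partial derivatives commute). Dividing the two Kirchhoff relations by $\Delta x$ and letting $\Delta x \to 0$ gives the transmission line equations without leakage:
$$\frac{\partial u}{\partial x} + R\,w + L\,\frac{\partial w}{\partial t} = 0, \qquad \frac{\partial w}{\partial x} + C\,\frac{\partial u}{\partial t} = 0.$$
Differentiating the first relation in $x$ and substituting the second,
$$\frac{\partial^2 u}{\partial x^2} = -R\,\frac{\partial w}{\partial x} - L\,\frac{\partial^2 w}{\partial t\,\partial x} = RC\,\frac{\partial u}{\partial t} + LC\,\frac{\partial^2 u}{\partial t^2},
\qquad\text{so}\qquad
\frac{\partial^2 u}{\partial t^2} + \frac{R}{L}\,\frac{\partial u}{\partial t} = \frac{1}{LC}\,\frac{\partial^2 u}{\partial x^2}.$$
Differentiating the second relation in $x$ and substituting the first gives the same equation for $w$. If the telegrapher's equation is written as $\partial^2 u/\partial t^2 + 2\alpha\,\partial u/\partial t = v^2\,\partial^2 u/\partial x^2$ (an assumed form, chosen because it agrees with the diffusion limit $2\alpha/v^2 = D^{-1}$ in Exercise 8), the comparison suggests $2\alpha = R/L$ and $v^2 = 1/(LC)$; this identification should be checked against (7.19), and the role of $\beta$ in (7.32), (7.33) is not addressed by this sketch.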
10. In Example 4, suppose that the service time distribution is arbitrary, with distribution function $F$. Determine the distribution of $W_t$.
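Returning to the thinned Poisson process of Exercise 6 above, a quick Monte Carlo sanity check (a sketch, not a proof) is to simulate exponential inter-arrival times with rate $\lambda$, keep each arrival whose mark $\varepsilon_n$ equals 1, and compare the sample mean and variance of $Y_t$ with $(1 - p)\lambda t$, since a Poisson count has equal mean and variance. The parameter values and function names below are illustrative choices, not taken from the text.

```python
import random


def thinned_count(lam, p, t_end, rng):
    """Return Y_{t_end}: the number of Poisson(lam) arrivals in (0, t_end] whose mark is 1.

    Marks eps_n are i.i.d. with P(eps_n = 0) = p, so each arrival is kept with prob 1 - p.
    """
    t, kept = 0.0, 0
    while True:
        t += rng.expovariate(lam)  # i.i.d. exponential inter-arrival times with rate lam
        if t > t_end:
            return kept
        if rng.random() >= p:  # eps_n = 1 with probability 1 - p
            kept += 1


def sample_mean_and_variance(lam, p, t_end, reps, seed=0):
    """Monte Carlo estimates of E Y_{t_end} and Var Y_{t_end}."""
    rng = random.Random(seed)
    counts = [thinned_count(lam, p, t_end, rng) for _ in range(reps)]
    mean = sum(counts) / reps
    var = sum((c - mean) ** 2 for c in counts) / (reps - 1)
    return mean, var


if __name__ == "__main__":
    lam, p, t_end = 3.0, 0.4, 10.0
    mean, var = sample_mean_and_variance(lam, p, t_end, reps=20000)
    # For a Poisson((1 - p) * lam * t_end) count, both numbers should be close to 18.
    print(f"mean={mean:.3f} variance={var:.3f} target={(1 - p) * lam * t_end:.1f}")
```

Agreement of mean and variance is only a necessary condition; the exercise asks for the full Poisson process property (independent Poisson increments).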
Exercises for Section IV.8 1. Let $Z_i$ ($i \ge 1$) be a sequence of i.i.d. nonnegative random variables with $P(Z_1 > 0) > 0$. Prove that $\sum_{i=1}^n Z_i \to \infty$ almost surely as $n \to \infty$.
2. Prove that the first and third terms on the right side of Eq. 8.13 go to zero (almost surely) as $t \to \infty$, provided (8.10) holds.
EXERCISES
3. When is a compound Poisson process (i) a pure birth process, (ii) a birth–death process? Write down $q_{ij}$, $i, j \in S$, in these cases. 4. Let $\{X_t\}$ be a birth–death chain, and let $c < x$ … $0$ and consider the discrete-parameter one-step transition probability matrix $q = ((p_{ij}(t)))$. Then $q^n = ((p_{ij}(nt)))$, $n = 0, 1, 2, \ldots$. Use the semigroup property to show that $\lim_{n \to \infty} q_{ij}^{(n)}$ (exists and) does not depend on $t > 0$.] 7. Suppose that $\{X_t\}$ is irreducible and positive-recurrent. Let $\{Y_n\}$ be the embedded discrete-parameter Markov chain with transition matrix $K$. Let $\pi$ be the invariant distribution for $\{X_t\}$, so that $\pi' Q = 0$. Calculate the invariant distribution for $\{Y_n\}$. 8. Let $\{X_t\}$ be a positive-recurrent birth–death process on $S = \{0, 1, 2, \ldots\}$ with birth and death rates $\beta_i$ and $\delta_i$, respectively. Show that …
where $\pi$ is the invariant initial distribution. 9. Consider the $N$-server queue under the invariant initial distribution (8.27). (i) Show that the expected number of customers waiting to be served (excluding those being served) is given by
$$\frac{\rho\,(N\rho)^N}{N!\,(1 - \rho)^2}\,\pi_0,$$
where $\rho = \lambda/(N\beta)$ $(< 1)$ is referred to as the traffic intensity parameter, i.e., the mean number of arrivals within the time $\beta^{-1}/N$. (ii) Show that the average length of time a customer must wait for service is
$$\frac{(N\rho)^N}{N(1 - \rho)^2\,\beta\,N!}\,\pi_0.$$
(iii) A particular hospital ward receives patients according to a Poisson process at an average rate of $\lambda = 2$ per day. The average length of stay is 5 days. Assuming the length of stay to be exponentially distributed, how many beds are necessary in order to achieve an equilibrium distribution for the total number of patients admitted and waiting admission? What will be the average waiting time for admission? 10. Let $\{X_t\}$ be the continuous-parameter Markov (binary) branching process with offspring distribution $f(0) = \delta$, $f(2) = \beta$, $\beta + \delta = 1$, and parameter $\lambda$. (i) Show that $\{X_t\}$ is a birth–death process with linear rates. (ii) Let $g_t(r) \equiv g_t^{(i)}(r) := \sum_j P_i(X_t = j)\,r^j$ denote the p.g.f. of the $P_i$-distribution of $X_t$. Show that the forward equations transform as
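$$\frac{\partial g_t}{\partial t} = [\beta'(r^2 - r) + \delta'(1 - r)]\,\frac{\partial g_t}{\partial r}, \qquad \beta' = \lambda\beta, \quad \delta' = \lambda\delta.$$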
(iii) Solve the equation for $g_t$. 11. Suppose that $\{X_t\}$ has transition probabilities $p_{ij}(t)$ and infinitesimal transition rates given by $Q = ((q_{ij}))$. Suppose that $\pi$ is a probability distribution satisfying $\pi' Q = 0$. If $\sum_i \lambda_i \pi_i < \infty$, where $\lambda_i = -q_{ii}$, then show that $\pi$ is an invariant initial distribution. [Hint: Differentiation of $\sum_i \pi_i p_{ij}(t)$ term by term is allowed.] 12. If $S$ comprises a single communicating class and $\pi$ is an invariant probability, then all states are positive recurrent. [Hint: Consider the discrete-time chain $X_n$, $n \ge 0$. Use Theorem 9.4 of Chapter II. Note that the time defined by (8.4) is smaller than the second passage time to state $i$ for the discrete-parameter chain.] 13. Show that positive recurrence is a class property. [Hint: Use arguments analogous to those used in the discrete-time case.] 14. Verify the forward equations for Example 5.3. [Hint: Use Exercise 4.5 and Example
8.2.] Exercises for Section IV.9 1. Calculate the spectral representation of the transition probabilities $p_{ij}(t)$ for the continuous-parameter simple symmetric random walk on $S = \{0, 1, 2, \ldots\}$ with one reflecting boundary at 0. [Hint: Use (9.19).] 2. (Time Reversibility) Let $\{X_t\}$ be an irreducible positive recurrent Markov chain with
invariant initial distribution $\pi$. (i) Show that $\{X_t\}$ is a stationary process; i.e., for any $h > 0$ and $0 < t_1 < t_2 < \cdots < t_k$, $(X_{t_1}, \ldots, X_{t_k})$ and $(X_{t_1 + h}, \ldots, X_{t_k + h})$ have the same distribution. (ii) Show that $\{X_t\}$ is time-reversible, in the sense that for any $0 \le t_1 < t_2 < \cdots < t_k < T$, $(X_{t_1}, \ldots, X_{t_k})$ and $(X_{T - t_1}, \ldots, X_{T - t_k})$ have the same distribution, if and only if
$$\pi_i\,p_{ij}(t) = \pi_j\,p_{ji}(t) \qquad \text{for all } i, j \in S,\ t \ge 0.$$
This last property is sometimes referred to as time-reversibility of $\pi$. (iii) Show that $\{X_t\}$ is time-reversible if and only if $\pi_i q_{ij} = \pi_j q_{ji}$ for all $i, j \in S$. (iv) Show that $\{X_t\}$ is time-reversible if and only if the discrete-parameter embedded chain is time-reversible; see Exercise 7.4 of Chapter II. (v) Show that the chemical reaction kinetics example is time-reversible under the invariant initial distribution.
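A small exact check of the criterion in part (iii), $\pi_i q_{ij} = \pi_j q_{ji}$, using rational arithmetic. The birth–death rates and the three-state cycle below are hypothetical examples chosen for illustration: the birth–death chain satisfies detailed balance, while the cycle $0 \to 1 \to 2 \to 0$ has an invariant distribution ($\pi' Q = 0$) but is not time-reversible.

```python
from fractions import Fraction


def birth_death_generator(births, deaths):
    """Generator Q on {0, ..., N}: births[i] is the rate i -> i+1, deaths[i] the rate i -> i-1."""
    n = len(births)
    q = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        if i + 1 < n:
            q[i][i + 1] = Fraction(births[i])
        if i > 0:
            q[i][i - 1] = Fraction(deaths[i])
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)  # rows sum to zero
    return q


def birth_death_invariant(births, deaths):
    """pi_i proportional to prod_{k=1}^{i} births[k-1] / deaths[k]."""
    weights = [Fraction(1)]
    for i in range(1, len(births)):
        weights.append(weights[-1] * Fraction(births[i - 1], deaths[i]))
    total = sum(weights)
    return [w / total for w in weights]


def is_invariant(pi, q):
    """Check pi' Q = 0 exactly."""
    n = len(pi)
    return all(sum(pi[i] * q[i][j] for i in range(n)) == 0 for j in range(n))


def is_time_reversible(pi, q):
    """Check the detailed balance condition pi_i q_ij = pi_j q_ji for all i, j."""
    n = len(pi)
    return all(pi[i] * q[i][j] == pi[j] * q[j][i] for i in range(n) for j in range(n))


if __name__ == "__main__":
    births, deaths = [2, 2, 2, 2, 0], [0, 1, 3, 3, 3]
    q = birth_death_generator(births, deaths)
    pi = birth_death_invariant(births, deaths)
    print(pi, is_invariant(pi, q), is_time_reversible(pi, q))

    # Three-state cycle 0 -> 1 -> 2 -> 0 at unit rate: uniform pi is invariant, not reversible.
    cycle = [[Fraction(v) for v in row] for row in ([-1, 1, 0], [0, -1, 1], [1, 0, -1])]
    uniform = [Fraction(1, 3)] * 3
    print(is_invariant(uniform, cycle), is_time_reversible(uniform, cycle))
```

Exact fractions are used so that the equalities in (iii) are tested without rounding error.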
3. (Relaxation and Maximal Correlation) Let $\{X_t\}$ be an irreducible finite-state Markov chain on $S$ with transition probabilities $p(t) = ((p_{ij}(t)))$, $t \ge 0$, and infinitesimal rates $Q = ((q_{ij}))$. Let $\pi$ be the unique invariant distribution and assume the distribution to be time-reversible as defined in Exercise 2(ii) above. Define the maximal correlation by $\rho(t) = \sup_{f, g} \operatorname{Corr}_\pi(f(X_t), g(X_0))$, where the supremum is over all real-valued functions $f, g$ on $S$ and
$$\operatorname{Corr}_\pi(f(X_t), g(X_0)) = \frac{E_\pi\{[f(X_t) - E_\pi f(X_t)][g(X_0) - E_\pi g(X_0)]\}}{\{\operatorname{Var}_\pi f(X_t)\}^{1/2}\,\{\operatorname{Var}_\pi g(X_0)\}^{1/2}}.$$
Show that
$$\rho(t) = e^{\lambda_1 t}, \qquad t \ge 0,$$
where $\lambda_1 < 0$ is the largest nontrivial eigenvalue of $Q$. The parameter $\gamma = -1/\lambda_1$ is called the relaxation time or correlation length parameter in this context. [Hint: For $f, g$ such that $E_\pi f = E_\pi g = 0$ and $\|f\|_\pi = \|g\|_\pi = 1$, one has $\operatorname{Corr}_\pi(f(X_t), g(X_0)) = (p(t)f, g)_\pi$. Since $p(t)$ is self-adjoint with respect to $(\cdot, \cdot)_\pi$ by time-reversibility, there is an orthonormal basis $\{\varphi_n\}$ of eigenvectors of $p(t)$. Since
$$p(t) = e^{Qt} = \sum_{k=0}^{\infty} \frac{Q^k t^k}{k!},$$
the eigenvalues of $p(t)$ are of the form $e^{\lambda_n t}$, where $\lambda_n$ are eigenvalues of $Q$. Take $f = g = \varphi_1$ to get $\operatorname{Corr}_\pi = e^{\lambda_1 t}$ and, for the general case (centered and scaled), expand $f$ and $g$ in terms of $\{\varphi_n\}$, i.e., $f = \sum_n (f, \varphi_n)_\pi \varphi_n$, $g = \sum_n (g, \varphi_n)_\pi \varphi_n$, to show $|\operatorname{Corr}_\pi(f(X_t), g(X_0))| \le e^{\lambda_1 t}$. Use the simple inequality $ab \le (a^2 + b^2)/2$.] 4. Let $\{X_t\}$ be an irreducible positive recurrent finite-state Markov chain with initial distribution $\mu$. Let $\mu_j(t) = P_\mu(X_t = j)$, $j \in S$, and let $\pi$ denote the invariant initial distribution. Also let $Q = ((q_{ij}))$ be the matrix of infinitesimal rates for $\{X_t\}$. Show that (i)
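$$\frac{d\mu_j(t)}{dt} = \sum_{i \in S}\{\mu_i(t)\,q_{ij} - \mu_j(t)\,q_{ji}\}, \quad t \ge 0,\ j \in S, \qquad \mu(0) = \mu.$$
(ii) Suppose that $\pi$ is a time-reversible invariant distribution and define
$$\gamma_{ij} = \frac{1}{\pi_i q_{ij}} = \frac{1}{\pi_j q_{ji}}$$
(possibly infinite). Show that
$$\frac{d\mu_j(t)}{dt} = \sum_{k \in S} \frac{1}{\gamma_{jk}}\left(\frac{\mu_k(t)}{\pi_k} - \frac{\mu_j(t)}{\pi_j}\right), \qquad j \in S.$$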
Consider an electrical network in which the states of $S$ are the nodes and $\gamma_{jk}$ is the resistance in a wire connecting $j$ and $k$. Suppose also that each node $j$ carries a capacitance $\pi_j$. Then these equations are Kirchhoff's equations for the spread
of an initial "electrical charge" $\mu_j(0) = \mu_j$, $j \in S$, with time (see also Exercise 7.9). The potential energy stored in the capacitors at the nodes at time $t$, when the initial distribution is $\mu(0) = \mu$, is given by
$$U(t) = U(\mu(t)) = \frac{1}{2}\sum_{i \in S} \frac{(\mu_i(t))^2}{\pi_i}.$$
One expects that as time progresses energy will be dissipated as heat in the wires. (iii) Show that if $\mu \ne \pi$, then $U$ is strictly decreasing as a function of time. (iv) Calculate $U(\mu(t))$ for the two-state (flip-flop) Example 3.1 in the cases $\mu(0) = \delta_{\{i\}}$, $i = 0, 1$.
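For part (iv), a numerical check of a hand calculation can be set up as follows. The sketch assumes that the flip-flop chain of Example 3.1 has rate $\alpha$ from 0 to 1 and rate $\beta$ from 1 to 0 (this parametrization is an assumption, not taken from the text). It uses $\mu(t) = \pi + (\mu(0) - \pi)e^{-(\alpha + \beta)t}$, verifies by a finite difference that this solves the Kirchhoff form of the equations in (ii), and evaluates $U(\mu(t))$ on a time grid.

```python
import math


def flip_flop(alpha, beta):
    """Two-state chain, rates alpha (0 -> 1) and beta (1 -> 0); returns c = alpha + beta and pi."""
    c = alpha + beta
    return c, (beta / c, alpha / c)


def mu_at(mu0, alpha, beta, t):
    """Distribution at time t: mu(t) = pi + (mu(0) - pi) * exp(-c t)."""
    c, pi = flip_flop(alpha, beta)
    decay = math.exp(-c * t)
    return tuple(pi[j] + (mu0[j] - pi[j]) * decay for j in range(2))


def kirchhoff_rhs(mu, alpha, beta):
    """Right side of d mu_j/dt = sum_k (1/gamma_jk)(mu_k/pi_k - mu_j/pi_j)."""
    _, pi = flip_flop(alpha, beta)
    conductance = pi[0] * alpha  # 1/gamma_01 = pi_0 q_01 = pi_1 q_10
    flow = conductance * (mu[1] / pi[1] - mu[0] / pi[0])
    return (flow, -flow)


def energy(mu, pi):
    """U(mu) = (1/2) * sum_i mu_i^2 / pi_i."""
    return 0.5 * sum(m * m / p for m, p in zip(mu, pi))


if __name__ == "__main__":
    alpha, beta = 1.0, 3.0
    _, pi = flip_flop(alpha, beta)
    for start in ((1.0, 0.0), (0.0, 1.0)):  # mu(0) = delta_{0}, delta_{1}
        values = [energy(mu_at(start, alpha, beta, 0.1 * k), pi) for k in range(30)]
        print(start, [round(v, 4) for v in values[:5]], "->", round(values[-1], 6))
```

Expanding the square shows $U(\mu(t)) = \frac{1}{2}\big[1 + e^{-2(\alpha+\beta)t}\sum_i (\mu_i(0) - \pi_i)^2/\pi_i\big]$ for this chain, which decreases strictly to $U(\pi) = \frac{1}{2}$; the printed values follow that pattern.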
5. As in Exercise 4 above, let $\{X_t\}$ be an irreducible positive recurrent finite-state Markov chain with initial distribution $\mu$. Let $\mu_j(t) = P_\mu(X_t = j)$, $j \in S$, and let $\pi$ denote the invariant initial distribution. Let $h$ be a strictly concave function on $[0, \infty)$ and define the $h$-entropy of the distribution at time $t$ by
$$H(t) = H(\mu(t)) = \sum_{j \in S} \pi_j\,h\!\left(\frac{\mu_j(t)}{\pi_j}\right), \qquad t \ge 0.$$
(i) Show for $\mu \ne \pi$ that $H(t)$ is strictly increasing as a function of $t$ and $\lim_{t \to \infty} H(\mu(t)) = H(\pi)$. (ii) Statistical entropy is defined by taking $h(x) = -x \log x$ in (i). Calculate $H(\mu(t))$ for the two-state (flip-flop) Example 3.1 when $\mu(0) = \delta_{\{i\}}$, $i = 0, 1$. Exercises for Section IV.10 1. Calculate the distribution $G(t; \lambda_1, \ldots, \lambda_i)$ appearing in Eq. 10.4. [Hint: Apply the partial fractions decomposition to the characteristic function of the sum.] 2. Calculate the distribution of the time until absorption at $j = 0$ for the pure death process on $S = \{0, 1, 2, \ldots\}$ starting at $i$, with infinitesimal rates $q_{ij} = i\delta$ if $j = i - 1$, $i \ge 1$, $q_{ii} = -i\delta$, $q_{ij} = 0$ otherwise. What is the average time to absorption? 3. A zero-seeking particle in state $i > 0$ waits for an exponentially distributed time with parameter $\lambda$ and then jumps to a position uniformly distributed over $0, 1, 2, \ldots, i - 1$. If in state $i < 0$, it holds for an exponential time with parameter $\lambda$ and then jumps to a position uniformly distributed over $i + 1, i + 2, \ldots, -1, 0$. Once in state 0 it stays there. Calculate the average time to reach zero, starting in state $i$.
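Exercise 3 lends itself to first-step analysis: conditioning on the first jump gives $m_0 = 0$ and $m_k = 1/\lambda + (m_0 + \cdots + m_{k-1})/k$ for $k \ge 1$, and the same values for $-k$ by symmetry. The sketch below solves this recursion exactly with rational arithmetic; it assumes the same holding rate $\lambda$ on both sides of 0 (the source text is garbled at that point), and the function name is illustrative.

```python
from fractions import Fraction


def mean_time_to_zero(i, lam):
    """Mean time for the zero-seeking particle started at i to reach 0 (first-step analysis).

    m_0 = 0 and, for k >= 1, m_k = 1/lam + (m_0 + ... + m_{k-1}) / k: the holding time has
    mean 1/lam and the next state is uniform on {0, ..., k-1}. State -k behaves like k.
    """
    lam = Fraction(lam)
    m = [Fraction(0)]
    running_sum = Fraction(0)  # holds m_0 + ... + m_{k-1} inside the loop
    for k in range(1, abs(i) + 1):
        running_sum += m[-1]
        m.append(1 / lam + running_sum / k)
    return m[abs(i)]


if __name__ == "__main__":
    for i in (-3, 0, 1, 2, 5):
        print(i, mean_time_to_zero(i, 2))
```

Subtracting consecutive instances of the recursion gives $m_k = m_{k-1} + 1/(k\lambda)$, so the exact output agrees with $m_i = \frac{1}{\lambda}\sum_{k=1}^{|i|} \frac{1}{k}$.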
*4. (i) Under the conditions stated in Example 2, show that, conditioned on nonextinction at time $t$, $X_t$ has a limit distribution as $t \to \infty$ with p.g.f. given by Eq. 10.34. [Hint: Check that $\sum_{j \ge 0} P_1(X_t = j \mid \tau_0 > t)\,r^j$ may be expressed as $\{g_t^{(1)}(r) - g_t^{(1)}(0)\}/(1 - g_t^{(1)}(0))$. Apply (10.28) to get
$$\frac{H^{-1}(t + H(r)) - H^{-1}(t + H(0))}{1 - H^{-1}(t + H(0))},$$
which by (10.31) is, asymptotically as $t \to \infty$, given by $1 - \exp\{x_1(H(r) - H(0))\} \times (1 + o(1))$. Finally, apply (10.26).]
THEORETICAL COMPLEMENTS
(ii) Determine the precise form of the distribution of the limit in the binary case … $Q$ can be factored as an $n$-fold convolution of a probability distribution $Q_n$ on $(\mathbb{R}^1, \mathcal{B}^1)$, i.e., $Q = Q_n * \cdots * Q_n$ ($n$-fold), $n \ge 1$. Familiar examples are the Normal, Gamma, Poisson, and Cauchy distributions, as well as the first passage time distribution of the Brownian motion (Chapter I). A basic property of infinitely divisible distributions is that the characteristic function $\hat{Q}(\xi) = \int e^{i\xi x}\,Q(dx)$, $\xi \in \mathbb{R}^1$, is never zero. To see this, observe that by considering $\varphi(\xi) = |\hat{Q}(\xi)|^2$ if necessary, one may assume to be given a real nonnegative characteristic function which, for any $n \ge 1$, factors into an $n$-fold product of real nonnegative characteristic functions $\varphi_n$ (of some probability distributions $\mu_n$, say). Since $\varphi(0) = 1$ and $\varphi$ is continuous, there is an interval $[-r, r]$ on which $\varphi(\xi)$ is strictly positive. Now, taking the unique version of $\log \varphi(\xi)$ which makes $\log \varphi(\xi)$ continuous and $\log \varphi(0) = 0$, …
$$X_t - X_s = \sum_{k=1}^{n} (X_{t_k} - X_{t_{k-1}}), \qquad t_k = s + (t - s)\frac{k}{n}, \quad k = 0, 1, \ldots, n.$$
Conversely, given an infinitely divisible distribution $Q$, there is a family of probability measures $Q_t$, $t > 0$, such that $Q_1 = Q$ and $Q_t * Q_s = Q_{t+s}$, obtained, for example, through their respective characteristic functions by taking $\varphi_t(\xi) = (\varphi(\xi))^t := \exp\{t \log \varphi(\xi)\}$, since $\varphi(\xi) \ne 0$ for $\xi \in \mathbb{R}^1$. Thus one obtains a consistent specification of the finite-dimensional distributions of a stochastic process $\{X_t\}$ having stationary independent increments starting at 0 such that $X_t - X_s$ has distribution $Q_{t-s}$, $0 \le s < t$.
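For the Poisson distribution the construction $\varphi_t = \exp\{t \log \varphi\}$ can be checked directly. With $\varphi(\xi) = \exp\{\lambda(e^{i\xi} - 1)\}$, the continuous logarithm is $\lambda(e^{i\xi} - 1)$, and $\varphi_t$ is the characteristic function of the Poisson distribution with mean $\lambda t$. The sketch (illustrative parameter values) verifies the convolution property $Q_t * Q_s = Q_{t+s}$ numerically and shows that a fractional power taken through the principal branch of the complex logarithm can give a different value, which is why the continuous version of $\log \varphi$ is needed.

```python
import cmath
import math


def poisson_pmf(mean, kmax):
    """P(N = k), k = 0, ..., kmax, for N Poisson with the given mean."""
    return [math.exp(-mean) * mean**k / math.factorial(k) for k in range(kmax + 1)]


def convolve(p, q):
    """Distribution of the sum of independent integer variables (exact up to min length)."""
    n = min(len(p), len(q))
    return [sum(p[j] * q[k - j] for j in range(k + 1)) for k in range(n)]


def char_fn_from_pmf(pmf, xi):
    """sum_k pmf[k] * exp(i xi k)."""
    return sum(prob * cmath.exp(1j * xi * k) for k, prob in enumerate(pmf))


if __name__ == "__main__":
    lam, t, s = 1.5, 0.7, 1.8
    lhs = convolve(poisson_pmf(lam * t, 40), poisson_pmf(lam * s, 40))
    rhs = poisson_pmf(lam * (t + s), 40)
    print("convolution error:", max(abs(a - b) for a, b in zip(lhs, rhs)))

    lam, xi, t = 5.0, math.pi / 2, 0.5
    continuous_log = lam * (cmath.exp(1j * xi) - 1)      # continuous branch of log phi
    phi = cmath.exp(continuous_log)
    phi_t = cmath.exp(t * continuous_log)                # ch.f. of Poisson(lam * t)
    principal_power = cmath.exp(t * cmath.log(phi))      # principal branch of log
    print("series check:", abs(phi_t - char_fn_from_pmf(poisson_pmf(lam * t, 80), xi)))
    print(phi_t, principal_power)                        # here they differ by a sign
```

In the second example the imaginary part of $\lambda(e^{i\xi} - 1)$ is 5, which lies outside $(-\pi, \pi]$, so the principal logarithm is off by $2\pi i$ and the power with $t = 1/2$ picks up a factor $e^{-i\pi} = -1$.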
The process $\{X_t\}$ with stationary independent increments is a Markov process on $\mathbb{R}^1$ starting at 0 having stationary transition probabilities given by
$$p(t; x, B) = P(X_{t+s} \in B \mid X_s = x) = Q_t(B - x), \qquad t > 0,\ B \in \mathcal{B}^1,$$
where $B - x := \{y - x: y \in B\}$ is a translate of $B$. 2. (Lévy–Khinchine Representation) The Brownian motion process and the compound Poisson process are simple examples of processes having stationary independent increments. While the Brownian motion is the only one of these that is a.s. continuous, both are continuous in probability (called stochastic continuity). That is, for any $t_0 \ge 0$, $X_t \to X_{t_0}$ in probability as $t \to t_0$, since (i) for the Brownian motion, a.s. convergence implies convergence in probability and (ii) for the compound Poisson process, $P(|X_t - X_s| > \varepsilon) \le 1 - e^{-\lambda|t - s|} = O(|t - s|)$ for $\varepsilon > 0$. Although the sample paths of the Poisson and compound Poisson processes have jump discontinuities, stochastic continuity means that there are no fixed discontinuities. Stochastic processes $\{X_t\}$ and $\{Y_t\}$ defined on the same probability space and having the same index set are said to be stochastically equivalent if $P(X_t = Y_t) = 1$ for each $t$. Stochastically equivalent processes must have the same finite-dimensional distributions. As an application of the fundamental theorem on the sample path regularity of stochastically continuous submartingales given in theoretical complement 5.2, it will follow that a stochastically continuous process $\{X_t\}$ with independent increments is equivalent to a stochastic process $\{Y_t\}$ having the property that almost all of its sample paths are right-continuous and have left-hand limits at each $t$, i.e., have at most jump discontinuities of the first kind. Moreover, the process $\{Y_t\}$ is unique in the sense that if $\{Y'_t\}$ is any other such process, then $P(Y_t = Y'_t \text{ for every } t) = 1$. Without loss of generality we may assume the given process $\{X_t\}$ with stationary independent increments to have at most jump discontinuities of the first kind. Such a process is called a (homogeneous) Lévy process. The prefix homogeneous refers only to stationarity of the increments. Theorem T.1.1. If $\{X_t\}$ is a (nondegenerate) homogeneous Lévy process having a.s.
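continuous sample paths, then $\{X_t\}$ must be Brownian motion. ❑ Proof. Let $s < t$. By sample path continuity, for any $\varepsilon > 0$ there is a number $\delta = \delta(\varepsilon) > 0$ such that
$$P(|X_u - X_v| < \varepsilon \text{ whenever } |u - v| < \delta,\ s \le u, v \le t) > 1 - \varepsilon.$$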
Let $\{\varepsilon_n\}$ be a sequence of positive numbers decreasing to zero and partition $(s, t]$ into subintervals $s = t_0^{(n)} < t_1^{(n)} < \cdots < t_{k_n}^{(n)} = t$ of lengths less than $\delta_n = \delta(\varepsilon_n)$. Then
$$X_t - X_s = \sum_{j=1}^{k_n} \big(X_{t_j^{(n)}} - X_{t_{j-1}^{(n)}}\big) =: \sum_{j=1}^{k_n} X_j^{(n)},$$
where $X_1^{(n)}, \ldots, X_{k_n}^{(n)}$ are independent (triangular array). Observe that the truncated random variables
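$$\tilde{X}_j^{(n)} := X_j^{(n)}\,\mathbf{1}_{\{|X_j^{(n)}| < \varepsilon_n\}}, \qquad j = 1, \ldots, k_n,$$
are also independent and $\sum_{j=1}^{k_n} \tilde{X}_j^{(n)} \to X_t - X_s$ in probability as $n \to \infty$, since
$$P\Big(X_t - X_s = \sum_{j=1}^{k_n} \tilde{X}_j^{(n)}\Big) > 1 - \varepsilon_n.$$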
The result now follows by an application of the Lindeberg CLT (Chapter 0, Theorem 7.1). ■ Within this same context, the other extreme is represented by the following. Theorem T.1.2. Let $\{X_t\}$ be a homogeneous Lévy process almost all of whose sample paths are step functions with unit jumps. Then $\{X_t\}$ is a Poisson process. ❑ The proof of Theorem T.1.2 will be based on the following basic coupling lemma. Lemma (Coupling Bound). Let $X$ and $Y$ be arbitrary random variables. Then for any (Borel) set $B$, $|P(X \in B) - P(Y \in B)| \le P(X \ne Y)$. Proof. First consider the case $P(X \in B) \ge P(Y \in B)$. Then $0 \le$ …
since $1 - e^{-p_i} \le p_i$. Thus, (T.1.4) and (T.1.5) prove (i). To derive the estimate (ii), let $\lambda > 0$ be arbitrary and use the triangle inequality to write
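$$\begin{aligned}
\Big|P\Big(\sum_{i=1}^n \varepsilon_i \in J\Big) - F_\lambda(J)\Big|
&\le \Big|P\Big(\sum_{i=1}^n \varepsilon_i \in J\Big) - F_{p^{(n)}}(J)\Big| + |F_{p^{(n)}}(J) - F_\lambda(J)| \\
&\le \sum_{i=1}^n p_i^2 + |F_{p^{(n)}}(J) - F_\lambda(J)| \\
&\le \max_{1 \le i \le n} p_i \sum_{i=1}^n p_i + |F_{p^{(n)}}(J) - F_\lambda(J)|.
\end{aligned}\tag{T.1.6}$$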
To complete the proof, we simply need to show that $|F_{p^{(n)}}(J) - F_\lambda(J)| \le \big|\sum_{i=1}^n p_i - \lambda\big|$.
For this, first suppose $p^{(n)} = \sum_{i=1}^n p_i > \lambda$ and let $Z_1$, $Z_2$ be independent Poisson random variables with parameters $\lambda$ and $\sum_{i=1}^n p_i - \lambda$. Again use the coupling bound lemma to bound the last term in (T.1.6) by
$$P(Z_1 \ne Z_1 + Z_2) = P(Z_2 \ne 0) = 1 - \exp\Big[-\Big(\sum_{i=1}^n p_i - \lambda\Big)\Big] \le \sum_{i=1}^n p_i - \lambda. \tag{T.1.7}$$
The symmetrical argument works for $\sum_{i=1}^n p_i < \lambda$ also.
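The two estimates of the Poisson approximation lemma can be checked numerically. The sketch assumes, as the surrounding argument indicates, that $\varepsilon_1, \ldots, \varepsilon_n$ are independent Bernoulli variables with $P(\varepsilon_i = 1) = p_i$ and that $F_\mu$ denotes the Poisson distribution with mean $\mu$. It computes the exact distribution of $\sum \varepsilon_i$ and the supremum over sets $J$ of $|P(\sum \varepsilon_i \in J) - F_\lambda(J)|$ (the total variation distance), then compares it with $\sum p_i^2$ and with the bound $\max_i p_i \sum_i p_i + |\sum_i p_i - \lambda|$ from (T.1.6) and (T.1.7). The probabilities $p_i$ are arbitrary illustrative values.

```python
import math


def bernoulli_sum_pmf(ps):
    """Exact distribution of eps_1 + ... + eps_n for independent Bernoulli(p_i) variables."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)      # eps_i = 0
            nxt[k + 1] += mass * p        # eps_i = 1
        pmf = nxt
    return pmf


def poisson_pmf(mean, kmax):
    return [math.exp(-mean) * mean**k / math.factorial(k) for k in range(kmax + 1)]


def sup_distance_to_poisson(ps, lam):
    """sup_J |P(sum eps_i in J) - F_lam(J)|, computed as the total variation distance."""
    pmf = bernoulli_sum_pmf(ps)
    pois = poisson_pmf(lam, len(pmf) - 1)
    tail = max(0.0, 1.0 - math.fsum(pois))  # Poisson mass above n, where the sum has none
    return 0.5 * (math.fsum(abs(a - b) for a, b in zip(pmf, pois)) + tail)


if __name__ == "__main__":
    ps = [0.1, 0.05, 0.2, 0.02] * 10
    total = sum(ps)
    print("estimate (i): distance", sup_distance_to_poisson(ps, total), "<=", sum(p * p for p in ps))
    for lam in (total - 0.5, total + 0.3):
        bound = max(ps) * total + abs(total - lam)
        print(f"estimate (ii): lam={lam:.2f} distance={sup_distance_to_poisson(ps, lam):.5f} bound={bound:.5f}")
```

Taking the supremum over all sets $J$ is the strongest form of the comparison, since the estimates in the text hold for each $J$ separately.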
Proof of Theorem T.1.2. It is enough to show that for each $t > 0$, $X_t$ has a Poisson distribution with $EX_t = \lambda t$ for some $\lambda > 0$. Partition $(0, t]$ into $n$ intervals of the form $(t_{i-1}, t_i]$, $i = 1, \ldots, n$, having equal lengths $\Delta = t/n$. Let $A_i^{(n)} = \{X_{t_i} - X_{t_{i-1}} \ge 1\}$ and $S_n = \sum_{i=1}^n \mathbf{1}_{A_i^{(n)}}$ (the number of time intervals with at least one jump occurrence). Let $D$ denote the shortest distance between jumps in the path $X_s$, $0 \le s \le t$. Then $P(X_t \ne S_n) \le P(0 < D < t/n) \to 0$ as $n \to \infty$, since $\{X_t \ne S_n\}$ implies that there is at least one interval containing two or more jumps. Now, by the Poisson approximation lemma (ii), taking $J = \{m\}$, we have
$$\Big|P(S_n = m) - e^{-\lambda t}\frac{(\lambda t)^m}{m!}\Big| \le |np_n - \lambda t| + \frac{(np_n)^2}{n},$$
where $p_n = p_i^{(n)} := P(A_i^{(n)})$, $i = 1, \ldots, n$, and $\lambda > 0$ is arbitrary. Thus, using the triangle inequality and the coupling bound lemma,
i = 1, ... , n, and A > 0 is arbitrary. Thus, using the triangle inequality and the coupling bound lemma, P(XX
= m )-- e xS -
IP(X, =m )—P(Sn=m )I
+
m!
P(Sn=m )
--- e -zl m!
P(X, 0 Sn) + Inpn — AI + npn
=o( 1 )+Inpn — Al+np^.
(T.1.8)
The proof will be completed by determining A such that np" — Al —• 0, at least along
a subsequence of $\{np_n\}$. For then we also have $np_n^2 = (np_n)\,p_n \to 0$ as $n \to \infty$, since $p_n = P(A_1^{(n)}) \to 0$ by stochastic continuity. But, since $P(X_t = 0) = P\big(\bigcap_{i=1}^n (A_i^{(n)})^c\big) = (1 - p_n)^n$, it follows that $P(X_t = 0) > 0$; for otherwise $P(A_i^{(n)}) = 1$ for each $i$ and therefore $X_t \ge n$ for all $n$, which is not possible under the assumed sample path regularity. Therefore $np_n \le -n\log(1 - p_n) = -\log P(X_t = 0)$ for all $n$. Since $\{np_n\}$ is a bounded sequence of positive numbers, there is at least one limit point, which may be written as $\lambda t$ with $\lambda > 0$. This provides the desired $\lambda$. ■ The remarkable fact which we wish to record here (with a sketch of the proof and examples) is that every (homogeneous) Lévy process may be represented as the sum of a (possibly degenerate) Brownian motion and a limit of independent superpositions of compound Poisson processes with varying jump sizes. Observe that if the sample paths of $\{X_t\}$ are step functions of a fixed jump size $y \ne 0$, then $\{y^{-1}X_t\}$ is a Poisson process by Theorem T.1.2. The idea is that by removing (subtracting) the jumps of various sizes from $\{X_t\}$ one arrives at an independent homogeneous Lévy process with continuous sample paths. According to Theorem T.1.1, therefore, this is a Brownian motion process. More precisely, let $\{X_t\}$ be a homogeneous Lévy process and let $S = [0, T) \times \mathbb{R}^1$ for fixed $T > 0$. Let $\mathcal{B}(S)$ be the Borel sigmafield of $S$, and let $\mathcal{B}_+(S) = \{A \in \mathcal{B}(S): \rho(A, [0, T) \times \{0\}) > 0\}$, where $\rho(A, B) = \inf\{|x - y|: x \in A, y \in B\}$. The space–time jump set of $\{X_t\}$ is defined by $J = J(\omega) = \{(t, y) \in S: y = X_t(\omega) - X_{t-}(\omega) \ne 0\}$ … , where $K$ is a nonnegative measure on $\mathbb{R}^1$. Example 1. $K = 0$. Then $\{X_t\}$ is Brownian motion with drift $m$ and diffusion
coefficient $\sigma^2$. Example 2. Suppose that
$$r = \lim_{\varepsilon \downarrow 0} \int_{|y| > \varepsilon} \frac{y}{1 + y^2}\,K(dy)$$
exists. Then $X_t = (m + r)t + \sigma B_t + \int y\,\nu_t(dy)$.
In this case
$$E e^{i\xi X_t} = \exp\Big\{it(m + r)\xi - \frac{t\sigma^2\xi^2}{2} + t\lim_{\varepsilon \downarrow 0}\int_{|y| > \varepsilon}[e^{i\xi y} - 1]\,K(dy)\Big\}.$$
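Example 3. If $m + r = 0$ and $\sigma^2 = 0$, then
$$X_t = \int y\,\nu_t(dy) \qquad\text{and}\qquad E e^{i\xi X_t} = \exp\Big\{\lim_{\varepsilon \downarrow 0}\, t\int_{\{|y| > \varepsilon\}} (e^{i\xi y} - 1)\,K(dy)\Big\}.$$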
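Example 4. If $\lambda = K(\mathbb{R}^1)$ … $0$, by the properties (i) and (ii), $\|f - T_{t/\lambda} f\| \to 0$ as $\lambda \to \infty$, and the integrand is bounded by $2e^{-t}\|f\|$ since $\|T_t f\| \le \|f\|$ for any $t \ge 0$. By Lebesgue's Dominated Convergence Theorem (Chapter 0) it follows that $\|f - \lambda R_\lambda f\| \to 0$ as $\lambda \to \infty$; i.e., $\lambda R_\lambda f \to f$ uniformly on $S$ as $\lambda \to \infty$. With this property of the resolvent in mind, consider the process $\{Y_t\}$ defined by
$$Y_t = e^{-\lambda t} R_\lambda f(X_t), \qquad t \ge 0, \tag{T.5.9}$$
where $f \in C(S)$ is a fixed but arbitrary nonnegative function on $S$. Then $\{Y_t\}$ is a supermartingale with respect to $\mathcal{F}_t = \sigma\{X_s: s \le t\}$, $t \ge 0$, since $E|Y_t| < \infty$, $Y_t$ is $\mathcal{F}_t$-measurable, one has $T_t f(x) \ge 0$ ($x \in S$, $t \ge 0$), and
$$\begin{aligned}
E\{Y_{t+h} \mid \mathcal{F}_t\} &= e^{-\lambda(t+h)} E\{R_\lambda f(X_{t+h}) \mid \mathcal{F}_t\} = e^{-\lambda(t+h)}\, T_h R_\lambda f(X_t) \\
&= e^{-\lambda(t+h)} \int_0^\infty e^{-\lambda s}\, T_{s+h} f(X_t)\,ds = e^{-\lambda t} \int_h^\infty e^{-\lambda s}\, T_s f(X_t)\,ds \\
&\le e^{-\lambda t} \int_0^\infty e^{-\lambda s}\, T_s f(X_t)\,ds = Y_t.
\end{aligned}\tag{T.5.10}$$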
Applying the martingale regularity result of Theorem T.5.3, we obtain a version $\{\tilde{Y}_t\}$ of $\{Y_t\}$ whose sample paths are a.s. right continuous with left-hand limits at each $t$.
Thus, the same is true for $\{\lambda e^{\lambda t}\tilde{Y}_t\}$, a version of $\{\lambda R_\lambda f(X_t)\}$. Since $\lambda R_\lambda f \to f$ uniformly as $\lambda \to \infty$, the process $\{f(X_t)\}$ must, therefore, a.s. have left- and right-hand limits at each $t$. The same will be true for any $f \in C(S)$, since one can write $f = f^+ - f^-$ with $f^+$ and $f^-$ continuous nonnegative functions on $S$. So we have that for each $f \in C(S)$, $\{f(X_t)\}$ is a process whose left- and right-hand limits exist at each $t$ (with probability 1). As remarked at the outset, it will be enough to argue that this means that the process $\{X_t\}$ will a.s. have left- and right-hand limits. This is where compactness of $S$ enters the argument. Since $S$ is a compact metric space, it has a countable dense subset $\{x_n\}$. The functions $f_n: S \to \mathbb{R}^1$ defined by $f_n(x) = \rho(x, x_n)$, $x \in S$, are continuous for the metric $\rho$ and separate points of $S$ in the sense that if $x \ne y$, then for some $n$, $f_n(x) \ne f_n(y)$. In view of the above, for each $n$, $\{f_n(X_t)\}$ is a process whose left- and right-hand limits exist at each $t$ with probability 1. Since a countable union of events of probability 0 has probability 0, it follows that, with probability 1, for all $n$ the left- and right-hand limits exist at each $t$ for $\{f_n(X_t)\}$. But this means that with probability one the left- and right-hand limits exist at each $t$ for $\{X_t\}$, since the $f_n$'s separate points; i.e., if either limit, say the left, fails to exist at some $t'$, then, by compactness of $S$, the sample path $t \to X_t$ must have at least two distinct limit points as $t \uparrow t'$, contradicting the corresponding property for all the processes $\{f_n(X_t)\}$, $n = 1, 2, \ldots$.
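For a finite state space (a compact metric space) the resolvent has the explicit form $R_\lambda f = \int_0^\infty e^{-\lambda s} T_s f\,ds = (\lambda I - Q)^{-1} f$, so the convergence $\lambda R_\lambda f \to f$ used above can be observed directly. The sketch uses a hypothetical two-state chain with rates $\alpha$ (0 to 1) and $\beta$ (1 to 0); for it, $\lambda R_\lambda f - f = -\frac{\alpha + \beta}{\lambda + \alpha + \beta}(f - \pi f)$, where $\pi f$ denotes the constant $\pi_0 f(0) + \pi_1 f(1)$, so the sup-norm error decays like $1/\lambda$.

```python
def resolvent_two_state(alpha, beta, lam, f):
    """R_lam f = (lam I - Q)^{-1} f for Q = [[-alpha, alpha], [beta, -beta]]."""
    a11, a12 = lam + alpha, -alpha
    a21, a22 = -beta, lam + beta
    det = a11 * a22 - a12 * a21  # equals lam * (lam + alpha + beta) > 0
    return ((a22 * f[0] - a12 * f[1]) / det, (-a21 * f[0] + a11 * f[1]) / det)


def sup_error(alpha, beta, lam, f):
    """Sup norm of lam * R_lam f - f over the two states."""
    r = resolvent_two_state(alpha, beta, lam, f)
    return max(abs(lam * r[j] - f[j]) for j in range(2))


if __name__ == "__main__":
    alpha, beta, f = 1.0, 3.0, (2.0, -1.0)
    for lam in (1.0, 10.0, 100.0, 1000.0):
        print(lam, sup_error(alpha, beta, lam, f))
```

With $\alpha = 1$, $\beta = 3$, and $f = (2, -1)$ one has $\pi = (3/4, 1/4)$, $\pi f = 5/4$, and the error equals $9/(\lambda + 4)$.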
In the case that $S$ is locally compact, one may adjoin a point at infinity, denoted $\Delta$ $(\notin S)$, to $S$. The topology of the one-point compactification on $\bar{S} = S \cup \{\Delta\}$ defines a neighborhood system for $\Delta$ by complements of compact subsets of $S$. Let $\mathcal{B}(\bar{S})$ be the sigmafield generated by $\{\mathcal{B}(S), \{\Delta\}\}$. The transition probability function $p(t; x, B)$ ($t \ge 0$, $x \in S$, $B \in \mathcal{B}(S)$) is extended to $\bar{p}(t; x, B)$ ($t \ge 0$, $x \in \bar{S}$, $B \in \mathcal{B}(\bar{S})$) by making $\Delta$ an absorbing state; i.e., $\bar{p}(t; \Delta, B) = 1$ if $\Delta \in B$, $B \in \mathcal{B}(\bar{S})$, and $\bar{p}(t; \Delta, B) = 0$ otherwise. If the conditions of Theorem T.5.4 are fulfilled for $\bar{p}$, then one obtains a regular process $\{\bar{X}_t\}$ with state space $\bar{S}$. Defining
$$\tau_\Delta = \inf\{t > 0: \bar{X}_t = \Delta\},$$
the basic Theorem T.5.4 provides a process $\{X_t: t < \tau_\Delta\}$ with state space $S$ whose sample paths are right continuous with left-hand limits at each $t < \tau_\Delta$. A detailed treatment of this case can be found in K. L. Chung (1982), Lectures from Markov Processes to Brownian Motion, Springer-Verlag, New York. While the one-point compactification is natural on analytic grounds, it is not always probabilistically natural, since in general a given process may escape the state space in a variety of manners. For example, in the case of birth–death processes with state space $S = \mathbb{Z}$ one may want to consider escapes through the positive integers as distinct from escapes through the negative integers. These matters are beyond the present scope. 4. (Tauberian Theorem) According to (5.44), in the case $r \ge 2$ the generating function
$h(v)$ can be expressed in the form
$$h(v) \sim (1 - v)^{-\rho}\,K((1 - v)^{-1}) \qquad \text{as } v \to 1^-, \tag{T.5.12}$$
where $\rho \ge 0$ and $K(x)$ is a slowly varying function as $x \to \infty$; that is, $K(tx)/K(t) \to 1$ as $t \to \infty$ for each $x > 0$. Examples of slowly varying functions at infinity are constants, various powers of $|\log x|$, and the coefficient appearing in (5.44) as a function of $x = (1 - v)^{-1}$. In the case $r = 1$, the generating function $vh'(v)$ for $\{nh_n\}$ is also of
this form asymptotically with $\rho = 1$. Likewise, in the case of (5.37) one can differentiate to get that the generating function of $\{(n + 1)P(M_L = n + 1)\}$ is asymptotically of the form $\frac{1}{2}(1 - v)^{-1/2}$ as $v \to 1^-$. Let $\hat{a}(v)$ be the generating function for a sequence $\{a_n\}$ of nonnegative real numbers and suppose
$$\hat{a}(v) = \sum_{k=0}^{\infty} a_k v^k \tag{T.5.13}$$
converges for $0 \le v < 1$. The Tauberian theorem provides the asymptotic growth of the sums
$$\sum_{k=0}^{n} a_k \sim \frac{1}{\rho\,\Gamma(\rho)}\,n^{\rho} K(n) \qquad \text{as } n \to \infty, \tag{T.5.14}$$
from that of $\hat{a}(v)$ of the form
$$\hat{a}(v) \sim (1 - v)^{-\rho}\,K((1 - v)^{-1}) \qquad \text{as } v \to 1^-, \tag{T.5.15}$$
as in (T.5.12). Under additional regularity (e.g., monotonicity) of the terms $\{a_n\}$ it is often possible to deduce from this the asymptotic behavior of the terms themselves (i.e., of the differenced sums) as
$$a_n \sim \frac{1}{\Gamma(\rho)}\,n^{\rho - 1} K(n) \qquad \text{as } n \to \infty. \tag{T.5.16}$$
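A numerical illustration of (T.5.14) and (T.5.16), using an example chosen here rather than one from the text: $\hat{a}(v) = (1 - v)^{-1/2}$ has $\rho = 1/2$, $K \equiv 1$, and monotone coefficients $a_k = \binom{2k}{k}4^{-k}$. The partial sums should then behave like $n^{1/2}/(\frac{1}{2}\Gamma(\frac{1}{2}))$ and the terms like $n^{-1/2}/\Gamma(\frac{1}{2})$; the sketch computes both ratios to these predictions.

```python
import math


def coefficients(n):
    """a_k = C(2k, k) / 4^k for k = 0..n: the power series coefficients of (1 - v)^(-1/2)."""
    a = [1.0]
    for k in range(1, n + 1):
        a.append(a[-1] * (2 * k - 1) / (2 * k))  # ratio a_k / a_{k-1}
    return a


def tauberian_ratios(n, rho=0.5):
    """Ratios of the exact partial sum and term to the predictions (T.5.14), (T.5.16), K = 1."""
    a = coefficients(n)
    partial_sum = math.fsum(a)
    sum_prediction = n**rho / (rho * math.gamma(rho))
    term_prediction = n ** (rho - 1) / math.gamma(rho)
    return partial_sum / sum_prediction, a[n] / term_prediction


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        print(n, tauberian_ratios(n))
```

Both ratios approach 1; for this sequence the partial sums are exactly $(2n + 1)a_n$, which makes the rate of convergence easy to see.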
An especially simple proof can be found in W. Feller (1971), An Introduction to Probability Theory and Its Applications, Vol. II, Wiley, New York, pp. 442-447. Using the Tauberian theorem one can compute the asymptotic form of the $r$th moments ($r \ge 1$) of $n^{-1/2}L$ given $M_L = n$ as $n \to \infty$. In fact, one obtains that for $r \ge 1$,
$$E\{(n^{-1/2}L)^r \mid M_L = n\} \sim (2/\ldots)^r\,\mu^*_+(r) \qquad \text{as } n \to \infty, \tag{T.5.17}$$
where $\mu^*_+(r)$ is the $r$th moment of the maximum of the Brownian excursion process given in Exercise 12.8(iii) of Chapter I. Moreover, in view of the last equation in the hint following that exercise, one can check that the moments $\mu^*_+(r)$ uniquely determine the distribution function by checking that the moment-generating function with these coefficients has an infinite radius of convergence. From this one obtains convergence in distribution to that of the maximum of an (appropriately scaled) Brownian excursion. This result, which was motivated by considerations of the main channel length as an extreme value of a river network, can be found in V. K. Gupta, O. Mesa and E. Waymire (1990), "Tree Dependent Extreme Values: The Exponential Case," J. Appl. Probability, in press. A more comprehensive treatment of this problem is given by R. Durrett, H. Kesten and E. Waymire (1989), "Random Heights of Weighted Trees," MSI Report, Cornell University, Ithaca. Problems of this type also occur in the analysis of tree search algorithms in computer science (see P. Flajolet and A. M. Odlyzko (1982), "The Average Height of Binary Trees and Other Simple Trees," J. Comput. System Sci., 25, pp. 171-213).
Theoretical Complements to Section IV.11
1. A (noncanonical) probability space for the voter model is furnished by the graphical percolation construction. This approach was created by T. E. Harris (1978), "Additive Set-Valued Markov Processes and Percolation Methods," Ann. Probab., 6, pp. 355-378. To construct $\Omega$, first, for each $m \in \mathbb{Z}^d$ and $n$ belonging to the boundary set $\partial(m)$ of nearest neighbors of $m$, let $\Omega_{m,n}$ denote the collection of right-continuous nondecreasing unit-jump step functions $\omega_{m,n}(t)$, $t \ge 0$, such that $\omega_{m,n}(t) \to \infty$ as $t \to \infty$. By the term "step function," it is implied that there are at most finitely many jumps in any bounded interval (nonexplosive). Let $P_{m,n}$ be the (canonical) Poisson probability distribution on $(\Omega_{m,n}, \mathcal{F}_{m,n})$ with intensity 1, where $\mathcal{F}_{m,n}$ is the sigmafield generated by events of the form $\{\omega \in \Omega_{m,n}: \omega(t) \le k\}$, $t \ge 0$, $k = 0, 1, 2, \ldots$. Define $(\Omega', \mathcal{F}', P')$ as the product probability space $\Omega' = \prod \Omega_{m,n}$, $\mathcal{F}' = \prod \mathcal{F}_{m,n}$, $P' = \prod P_{m,n}$, where the products are over $m \in \mathbb{Z}^d$, $n \in \partial(m)$. To get $\Omega$, remove from $\Omega'$ any $\omega = ((\omega_{m,n}(t): t \ge 0): m \in \mathbb{Z}^d, n \in \partial(m))$ such that two or more $\omega_{m,n}(t)$ have a jump at the same time $t$. Then $\mathcal{F}$ $(= \Omega \cap \mathcal{F}')$ and $P$ are obtained by the corresponding restriction to $\Omega$ (and measure-theoretical completion). The percolation flow structure can now be defined on $(\Omega, \mathcal{F}, P)$ as in the text, but sample pointwise for each $\omega = ((\omega_{m,n}(t): t \ge 0): m \in \mathbb{Z}^d,$
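$n \in \partial(m)) \in \Omega$. For a given initial configuration $\eta \in S$ let
$D = D(\eta) := \{m \in \mathbb{Z}^d: \eta_m = -1\}$. A sample path of the process started at $\eta$, denoted $\sigma^{D(\eta)}(t, \omega) = (\sigma_n^{D(\eta)}(t, \omega): n \in \mathbb{Z}^d)$, $t \ge 0$, $\omega \in \Omega$, is given in terms of the percolation flow on $\omega$ by
$$\sigma_n^{D(\eta)}(t, \omega) = \begin{cases} -1 & \text{if the flow on } \omega \text{ reaches } (n, t) \text{ from some } (m, 0),\ m \in D, \\ +1 & \text{otherwise.} \end{cases} \tag{T.11.1}$$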
The Markov property follows from the following basic property of the construction:
$$\sigma^{D(\eta)}(t + s, \omega) = \sigma^{D(\sigma^{D(\eta)}(s, \omega))}(t, U_s\omega), \tag{T.11.2}$$
where, for $\omega \in \Omega$ as above, $U_s: \Omega \to \Omega$ is given by
$$U_s(\omega) = ((\omega_{m,n}(t + s): t \ge 0): m \in \mathbb{Z}^d,\ n \in \partial(m)). \tag{T.11.3}$$
The Markov property follows from (T.11.2) as a consequence of the invariance of $P$ under the map $U_s: \Omega \to \Omega$ for each $s$; this is easily checked for the Poisson distribution with constant intensity and, because of independence, it is enough. 2. The distribution of the process $\{\sigma(t)\}$ started at $\eta \in S = \{-1, 1\}^{\mathbb{Z}^d}$ is the induced probability measure $P_\eta$ defined by
$$P_\eta(B) = P(\{\sigma^{D(\eta)}(t)\} \in B), \qquad B \in \mathcal{S}. \tag{T.11.4}$$
A metric for $S$, which gives it the so-called product space topology, is defined by
$$\rho(\sigma, \sigma') = \sum_{n \in \mathbb{Z}^d} 2^{-|n|}\,|\sigma_n - \sigma'_n|, \tag{T.11.5}$$
where $|n| = |(n_1, \ldots, n_d)| := \max(|n_1|, \ldots, |n_d|)$. Note that this metric is possible largely because of the denumerability of $\mathbb{Z}^d$. In any case, this metric makes the facts that (i) $\mathcal{S}$ is the Borel sigmafield and (ii) $S$ is compact rather straightforward exercises. Compactness is in fact true for arbitrary products of compact spaces under the product topology, but this is a much deeper result (called Tychonoff's Theorem).
To prove the Feller property, it is sufficient to consider continuity of the mappings of the form $\eta \to P_\eta(\sigma_n(t) = -1,\ n \in F)$ at $\eta \in S$, for fixed $t > 0$ and finite sets $F \subset \mathbb{Z}^d$. Inclusion–exclusion principles can then be used to get the continuity of $\eta \to P_\eta(B)$ for all finite-dimensional sets $B \in \mathcal{S}$. The rest will follow from compactness (tightness). For simplicity, first consider the map $\eta \to P_\eta(\sigma_n(t) = -1)$, for $F = \{n\}$. Open neighborhoods of $\eta \in S$ are provided by sets of the form
$$R_A(\eta) = \{\zeta \in S: \zeta_m = \eta_m \text{ for all } m \in A\}, \tag{T.11.6}$$
where $A$ is a finite subset of $\mathbb{Z}^d$, say
$$A = \{m = (m_1, \ldots, m_d) \in \mathbb{Z}^d: |m_i| \le k,\ i = 1, \ldots, d\}, \qquad k > 0. \tag{T.11.7}$$
Now, in view of the simple percolation structure for the voter model, regardless of the initial configuration $\eta$, one has
$$\sigma_n^{(\eta)}(t) = \sigma_{m(n,t)}(0) \qquad \text{a.s.}, \tag{T.11.8}$$
where $m(n, t)$ is some (random) site, which does not depend on the initial configuration, obtained by following backwards through the percolation diagram down and against the direction of arrows. Thus, if $\{\eta_k\}$ is a sequence of configurations in $S$ that converges to $\eta$ in the metric $\rho$ as $k \to \infty$, then $\sigma_n^{(\eta_k)}(t) \to \sigma_n^{(\eta)}(t)$ a.s. as $k \to \infty$, since $\eta_k$ eventually agrees with $\eta$ at the site $m(n, t)$. Thus, by Lebesgue's Dominated Convergence Theorem (Chapter 0), one has
$$1 - 2P_{\eta_k}(\sigma_n(t) = -1) = E\sigma_n^{(\eta_k)}(t) \to E\sigma_n^{(\eta)}(t) = 1 - 2P_\eta(\sigma_n(t) = -1).$$
Thus, $P_{\eta_k}(\sigma_n(t) = -1) \to P_\eta(\sigma_n(t) = -1)$ as $k \to \infty$. 3. A real-valued function $\varphi(m)$, $m \in \mathbb{Z}^d$, such that for each $m \in \mathbb{Z}^d$,
$$\frac{1}{2d}\sum_{n \in \partial(m)} \varphi(n) = \varphi(m), \tag{T.11.9}$$
where $\partial(m)$ denotes the set of nearest neighbors of $m$, is said to be harmonic (with respect to the discrete Laplacian on $\mathbb{Z}^d$). Note that (T.11.9) may be expressed as
$$\varphi(m) = E_m\varphi(Z_1) = E_m\varphi(Z_k), \qquad k \ge 1, \tag{T.11.10}$$
where $\{Z_k\}$ is the simple symmetric random walk starting at $Z_0 = m$. Theorem T.11.1 (Boundedness Principle). Let $\varphi$ be a real-valued bounded harmonic function on $\mathbb{Z}^d$. Then $\varphi$ is a constant function. ❑ Proof. Suppose that $\{(Z_k, Z'_k)\}$ is a coupling of two copies of $\{Z_k\}$; i.e., a Markov chain on $\mathbb{Z}^d \times \mathbb{Z}^d$ such that the marginal processes $\{Z_k\}$ and $\{Z'_k\}$ are each Markov
chains on $\mathbb{Z}^d$ with the same transition law as $\{Z_k\}$. Then, if $\tau = \inf\{k \ge 1: Z_k = Z'_k\} < \infty$ a.s., one may define $Z'_k = Z_k$ for all $k > \tau$ without disrupting the property that the process $\{(Z_k, Z'_k)\}$ is a coupling of $\{Z_k\}$. With this as the case, by the boundedness of $\varphi$ and (T.11.10), one obtains
$$\begin{aligned}
|\varphi(n) - \varphi(m)| &= |E_n\varphi(Z_k) - E_m\varphi(Z'_k)| = |E_{(n,m)}\{\varphi(Z_k) - \varphi(Z'_k)\}| \le E_{(n,m)}|\varphi(Z_k) - \varphi(Z'_k)| \\
&\le 2B\,P_{(n,m)}(Z_k \ne Z'_k) = 2B\,P_{(n,m)}(\tau > k),
\end{aligned}\tag{T.11.11}$$
where $B = \sup_x|\varphi(x)|$. Letting $k \to \infty$ it would then follow that $\varphi(n) = \varphi(m)$. The success of this approach rests on the construction of a coupling $\{(Z_k, Z_k')\}$ with $\tau < \infty$ a.s.; such a coupling is called a successful coupling. The independent coupling is the simplest to try. To see how it works, take $d = 1$ and let $\{Z_k\}$ and $\{Z_k'\}$ be independent simple symmetric random walks on $\mathbb{Z}$ starting at $n$, $m$, with $n - m$ even. Then $\{(Z_k, Z_k')\}$ is a successful coupling since $\{Z_k - Z_k'\}$ is easily checked to be a recurrent random walk using the results of Chapters II and IV or the theoretical complement to Section 3 of Chapter I. This would also work for $d = 2$, but it fails for $d \ge 3$ owing to transience. In any case, here is another coupling that is easily checked to be successful for any $d$. Let $\{(Z_k, Z_k')\}$ be the Markov chain on $\mathbb{Z}^d \times \mathbb{Z}^d$ starting at $(n, m)$ and having the stationary one-step transition probabilities furnished by the following rules of motion. At each time, first select a (common) coordinate axis at random (each having probability $1/d$). From $(n, m)$ such that the coordinates of $n$, $m$ differ along the selected axis, independently select $\pm$ directions (each having probability $\tfrac12$) along the selected axis for displacements of the components $n$, $m$. If, on the other hand, the coordinates of $n$, $m$ agree along the selected axis, then randomly select a common $\pm$ direction along the axis for displacement of both components $n$, $m$. Then the process $\{(Z_k, Z_k')\}$ with this transition law is a coupling. That it is successful for all $(n, m)$ whose corresponding coordinates have the same parity follows from the recurrence of the simple symmetric random walk on $\mathbb{Z}^1$, since it guarantees with probability 1 that each of the $d$ coordinates will eventually line up (and a coordinate that has lined up stays lined up). Thus, one obtains $\varphi(n) = \varphi(m)$ for all $n$, $m$ whose corresponding coordinates have the same parity. This is enough by (T.11.9) and the maximum principle for harmonic functions described below. ■
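As a numerical illustration (not a proof), the coordinatewise coupling just described can be simulated. The Python sketch below follows the stated rules of motion; the function names `step_coupled` and `coupling_time` and the cap `max_steps` are our own illustrative choices. Because coupling times are heavy-tailed, a finite run can only suggest, not establish, that $\tau < \infty$ a.s.

```python
import random

def step_coupled(z, zp, rng):
    """One transition of the coupled chain {(Z_k, Z'_k)} on Z^d x Z^d,
    following the rules of motion in the proof of Theorem T.11.1."""
    d = len(z)
    axis = rng.randrange(d)              # common axis, probability 1/d each
    z, zp = list(z), list(zp)
    if z[axis] != zp[axis]:
        # coordinates differ along this axis: independent +/-1 displacements
        z[axis] += rng.choice((-1, 1))
        zp[axis] += rng.choice((-1, 1))
    else:
        # coordinates agree: a common +/-1 displacement keeps them agreeing
        step = rng.choice((-1, 1))
        z[axis] += step
        zp[axis] += step
    return tuple(z), tuple(zp)

def coupling_time(n, m, rng, max_steps=10**6):
    """tau = inf{k >= 1 : Z_k = Z'_k} for the walks started at n and m,
    or None if they have not met within max_steps steps."""
    z, zp = tuple(n), tuple(m)
    for k in range(1, max_steps + 1):
        z, zp = step_coupled(z, zp, rng)
        if z == zp:
            return k
    return None
```

For instance, `coupling_time((0, 0, 0), (2, -2, 4), random.Random(1))` will typically return a finite meeting time even though $d = 3$, whereas walks whose coordinates differ in parity never meet, since each coordinate difference changes by an even amount.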
The following two results provide fundamental properties of harmonic functions given on a suitable domain $D$. Their proofs are obtained by iterating the averaging property out to the boundary, just as in the one-dimensional case of Exercise 3.16, Chapter I. First some preliminaries about the domain. We require that $D = D^\circ \cup \partial D$, where $D^\circ$, $\partial D$ are finite, disjoint subsets of $\mathbb{Z}^d$ such that (i) $\partial(n) \subset D$ for each $n \in D^\circ$, where $\partial(n)$ denotes the set of nearest neighbors of $n$; (ii) $\partial(m) \cap D^\circ \ne \emptyset$ for each $m \in \partial D$; (iii) for each $n, m \in D^\circ$ there is a path of respective nearest neighbors $n_1, n_2, \ldots, n_k$ in $D^\circ$ such that $n_1 \in \partial(n)$, $n_k \in \partial(m)$.
Theorem T.11.2. (Maximum Principle). A real-valued function $\varphi$ on a domain $D$, as defined above, that is harmonic on $D^\circ$, i.e.,
$$\varphi(m) = \frac{1}{2d}\sum_{n \in \partial(m)}\varphi(n), \quad m \in D^\circ,$$
takes its maximum and minimum values on $\partial D$. ❑
In particular, it follows from the maximum principle that if $\varphi$ also takes an extreme value on $D^\circ$, then it must be constant on $D^\circ$. This can be used to complete the proof of Theorem T.11.1 by suitably constructing a domain $D$ with any of the $2^d$ coordinate parities on $D^\circ$ and $\partial D$ desired.
Theorem T.11.3. (Uniqueness Principle). If $\varphi_1$ and $\varphi_2$ are harmonic functions on $D$, as defined in Theorem T.11.2, and if $\varphi_1 = \varphi_2$ on $\partial D$, then $\varphi_1 = \varphi_2$ on $D$. ❑
Proof. To prove the uniqueness principle, simply note that $\varphi_1 - \varphi_2$ is harmonic with zero boundary values and hence, by the Maximum Principle, has zero extreme values. ■
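A small numerical check of Theorems T.11.2 and T.11.3, under assumptions of our own: $d = 2$, $D^\circ = \{1, \ldots, L-1\}^2$, and $\partial D$ the edge points of $\{0, \ldots, L\}^2$ other than the four corners, so that conditions (i)–(iii) hold. The sketch (hypothetical names `square_domain`, `solve_dirichlet`) computes the harmonic function with prescribed boundary values by Gauss–Seidel sweeps of the averaging property, i.e. by repeatedly "averaging out to the boundary" as in the proofs.

```python
def square_domain(L):
    """Interior {1..L-1}^2 and boundary = non-corner edge points of {0..L}^2."""
    interior = [(i, j) for i in range(1, L) for j in range(1, L)]
    boundary = []
    for k in range(1, L):
        boundary += [(0, k), (L, k), (k, 0), (k, L)]
    return interior, boundary

def neighbors(m):
    """The 2d = 4 nearest neighbours of a site m in Z^2."""
    i, j = m
    return [(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)]

def solve_dirichlet(interior, boundary_values, init=0.0, tol=1e-12, max_sweeps=100000):
    """Harmonic extension of boundary_values (a dict on the boundary) to the
    interior: Gauss-Seidel sweeps replacing phi(m) by its neighbour average."""
    phi = dict(boundary_values)
    for m in interior:
        phi[m] = init
    for _ in range(max_sweeps):
        change = 0.0
        for m in interior:
            new = sum(phi[n] for n in neighbors(m)) / 4.0
            change = max(change, abs(new - phi[m]))
            phi[m] = new
        if change < tol:
            break
    return phi
```

With boundary values $i^2 - j^2$, which are discretely harmonic, the computed interior values reproduce $i^2 - j^2$; starting the sweeps from different initial guesses gives the same limit, as the uniqueness principle predicts, and the interior values stay between the boundary minimum and maximum.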
The coupling described in Theorem T.11.1 can be found in T. Liggett (1985), Interacting Particle Systems, Springer-Verlag, New York, pp. 67-69, together with another coupling to prove the boundedness principle (Choquet-Deny theorem) for the more general case of an irreducible random walk on $\mathbb{Z}^d$.
4. Additional references: The voter model was first considered in the papers by P. Clifford and A. Sudbury (1973), "A Model for Spatial Conflict," Biometrika, 60, pp. 581-588, and independently by R. Holley and T. M. Liggett (1975), "Ergodic Theorems for Weakly Interacting Infinite Systems and the Voter Model," Ann. Probab., 3, pp. 643-663. The approach in Section 11 essentially follows F. Spitzer (1981), "Infinite Systems with Locally Interacting Components," Ann. Probab., 9, pp. 349-364. The so-called biased voter (tumor-growth) model originated in T. Williams and R. Bjerknes (1972), "Stochastic Model for Abnormal Clone Spread Through Epithelial Basal Layer," Nature, 236, pp. 19-21. Much of the modern interest in the mathematical theory of infinite systems of interacting components from the point of view of continuous-time Markov evolutions was inspired by the fundamental papers of Frank Spitzer (1970), "Interaction of Markov Processes," Advances in Math., 5, pp. 246-290, and by R. L. Dobrushin (1971), "Markov Processes With a Large Number of Locally Interacting Components," Problems Inform. Transmission, 7, pp. 149-164, 235-241. Since then several books and monographs have been written on the subject, the most comprehensive being that of T. Liggett (1985), loc. cit. Other modern books and monographs on the subject are those of F. Spitzer (1971), Random Fields and Interacting Particle Systems, MAA, Summer Seminar Notes; D. Griffeath (1979), Additive and Cancellative Interacting Particle Systems, Lecture Notes in Math., No. 724, Springer-Verlag, New York; R. Kindermann and J. L. Snell (1980), Markov Random Fields and Their Applications, Contemporary Mathematics Series, Vol. 1, AMS, Providence, RI; R. Durrett (1988), Lecture Notes on Particle Systems and Percolation, Wadsworth, Brooks/Cole, San Francisco.
CHAPTER V
Brownian Motion and Diffusions
1 INTRODUCTION AND DEFINITION
One-dimensional unrestricted diffusions are Markov processes in continuous time with state space $S = (a, b)$, $-\infty \le a$ […] $> 0$ introduced in Chapter I as a limit of simple random walks. In this case, $S = (-\infty, \infty)$. The transition probability distribution $p(t; x, dy)$ of $Y_{s+t}$, given $Y_s = x$, has a density given by
$$p(t; x, y) = \frac{1}{(2\pi\sigma^2 t)^{1/2}}\exp\left\{-\frac{(y - x - \mu t)^2}{2\sigma^2 t}\right\}. \tag{1.1}$$
Because $\{Y_t\}$ has independent increments, the Markov property follows directly. Notice that (1.1) does not depend on $s$; i.e., $\{Y_t\}$ is a time-homogeneous Markov process. As before, by a Markov process we mean one having a time-homogeneous transition law unless stated otherwise. One may imagine a Markov process $\{X_t\}$ that has continuous sample paths but that is not a process with independent increments. Suppose that, given $X_s = x$, for (infinitesimally) small times $t$, the displacement $X_{s+t} - X_s = X_{s+t} - x$ has mean and variance approximately $t\mu(x)$ and $t\sigma^2(x)$, respectively. Here $\mu(x)$ and $\sigma^2(x)$ are functions of the state $x$, and not constants as in the case of $\{Y_t\}$. The distinction between $\{Y_t\}$ and $\{X_t\}$ is analogous to that between a simple random walk and a birth–death chain. More precisely, suppose
$$\begin{aligned} E(X_{s+t} - X_s \mid X_s = x) &= t\mu(x) + o(t),\\ E((X_{s+t} - X_s)^2 \mid X_s = x) &= t\sigma^2(x) + o(t),\\ E(|X_{s+t} - X_s|^3 \mid X_s = x) &= o(t), \end{aligned} \tag{1.2}$$
hold, as $t \downarrow 0$, for every $x \in S$.
Note that (1.2) holds for Brownian motions (Exercise 1). A more general formulation of the existence of infinitesimal mean and variance parameters, which does not require the existence of finite moments, is the following. For every $\varepsilon > 0$ assume that
$$\begin{aligned} E((X_{s+t} - X_s)\mathbf{1}_{\{|X_{s+t} - X_s| \le \varepsilon\}} \mid X_s = x) &= t\mu(x) + o(t),\\ E((X_{s+t} - X_s)^2\mathbf{1}_{\{|X_{s+t} - X_s| \le \varepsilon\}} \mid X_s = x) &= t\sigma^2(x) + o(t),\\ P(|X_{s+t} - X_s| > \varepsilon \mid X_s = x) &= o(t), \end{aligned} \tag{1.2'}$$
hold as $t \downarrow 0$. It is a simple exercise to show that (1.2) implies (1.2)' (Exercise 2). However, there are many Markov processes with continuous sample paths for which (1.2)' holds, but not (1.2).
Example 1. (One-to-One Functions of Brownian Motion). Let $\{B_t\}$ be a standard one-dimensional Brownian motion and $\varphi$ a twice continuously differentiable strictly increasing function of $(-\infty, \infty)$ onto $(a, b)$. Then $\{X_t\} := \{\varphi(B_t)\}$ is a time-homogeneous Markov process with continuous sample paths. Take $\varphi(x) = e^{x^3}$. Now check that $E|X_t| = \infty$ for all $t > 0$, so that (1.2) does not hold. On the other hand, (1.2)' does hold (as explained more generally in Section 3).
Definition 1.1. A Markov process $\{X_t\}$ on the state space $S = (a, b)$ is said to be a diffusion with drift coefficient $\mu(x)$ and diffusion coefficient $\sigma^2(x) > 0$ if (i) it has continuous sample paths, and (ii) relations (1.2)' hold for all $x$. If the transition probability distribution $p(t; x, dy)$ has a density $p(t; x, y)$, then, for (Borel) subsets $B$ of $S$,
$$p(t; x, B) = \int_B p(t; x, y)\,dy. \tag{1.3}$$
It is known that a strictly positive and continuous density exists under Condition (1.1) below, in the case $S = (-\infty, \infty)$ (see theoretical complement 1). Since any open interval $(a, b)$ can be transformed into $(-\infty, \infty)$ by a strictly increasing smooth map, Condition (1.1) may be applied to $S = (a, b)$ after transformation (see Section 3). Below, $\sigma(x)$ denotes either the positive square root of $\sigma^2(x)$ for all $x$ or the negative square root for all $x$.
Condition (1.1). The functions $\mu(x)$, $\sigma(x)$ are continuously differentiable, with bounded derivatives on $S = (-\infty, \infty)$. Also, $\sigma''$ exists and is continuous, and $\sigma^2(x) > 0$ for all $x$. If $S = (a, b)$, then assume the above conditions for the relabeled process under some smooth and strictly increasing transformation onto $(-\infty, \infty)$.
Although the results presented in this chapter are true under (1.2)', in order to make the calculations less technical we will assume (1.2). It turns out that Condition (1.1) guarantees (1.2). From the Markov property the joint density of $X_{t_1}, X_{t_2}, \ldots, X_{t_n}$, given $X_0 = x$, for $0 < t_1 < t_2 < \cdots < t_n$ is given by the product
$$p(t_1; x, y_1)\,p(t_2 - t_1; y_1, y_2)\cdots p(t_n - t_{n-1}; y_{n-1}, y_n).$$
Therefore, for an initial distribution $\pi$,
$$P_\pi(X_{t_1} \in B_1, \ldots, X_{t_n} \in B_n) = \int_S\int_{B_1}\cdots\int_{B_n} p(t_1; x, y_1)\cdots p(t_n - t_{n-1}; y_{n-1}, y_n)\,dy_n\cdots dy_1\,\pi(dx). \tag{1.4}$$
As usual $P_\pi$ denotes the distribution of the process $\{X_t\}$ for the initial distribution $\pi$. In the case $\pi = \delta_x$ we write $P_x$ in place of $P_{\delta_x}$. Likewise, $E_\pi$, $E_x$ are the corresponding expectations.
Example 2. (Ornstein–Uhlenbeck Process). Let $V_t$ denote the random velocity of a large solute molecule immersed in a liquid at rest. For simplicity, let $V_t$ denote the vertical component of the velocity. The solute molecule is subject to two forces: gravity and friction. It turns out that the gravitational force is negligible compared to the frictional force exerted on the solute molecule by the liquid. The frictional force is directed opposite to the direction of motion and is proportional to the velocity in magnitude. In the absence of statistical fluctuations, one would therefore have $m(dV_t/dt) = -\beta V_t$, where $m$ is the mass of the solute molecule and $\beta$ is the constant of proportionality known as the coefficient of friction. However, this frictional force $-\beta V_t$ may be thought of as the mean of a large number of random molecular collisions. Assuming that the central limit theorem applies to the superposition of displacements due to a large number of such collisions, the change in momentum $m\,dV_t$ over a time interval $(t, t + dt)$ is approximately Gaussian, provided $dt$ is such that the mean number of collisions during $(t, t + dt)$ is large. The mean and variance of this Gaussian distribution are both proportional to the number of collisions, i.e., to $dt$. Therefore, one may take the (local) mean of the increment $dV_t$ to be $-(\beta/m)V_t\,dt$ and the (local) variance to be $\sigma^2\,dt$, i.e.,
$$\begin{aligned} E(V_{t+dt} - V_t \mid V_t = v) &= -(\beta/m)v\,dt + o(dt),\\ E((V_{t+dt} - V_t)^2 \mid V_t = v) &= \sigma^2\,dt + o(dt). \end{aligned} \tag{1.5}$$
Therefore, a reasonable model for the velocity process is a diffusion with drift $-(\beta/m)v$ and diffusion coefficient $\sigma^2 > 0$, called the Ornstein–Uhlenbeck process.
An important problem is to determine the transition probabilities for diffusions having given drift and diffusion coefficients. Various approaches to this problem will be developed in this chapter, but in the meantime for the Ornstein-Uhlenbeck process we can check that the transition function given by
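$$p(t; x, y) = \left[\pi\sigma^2\beta^{-1}m\left(1 - e^{-2\beta m^{-1}t}\right)\right]^{-1/2}\exp\left\{-\frac{\beta\left(y - xe^{-\beta m^{-1}t}\right)^2}{\sigma^2 m\left(1 - e^{-2\beta m^{-1}t}\right)}\right\} \quad (t > 0,\ -\infty < x, y < \infty) \tag{1.6}$$
furnishes the solution. In other words, given $V_0 = x$, $V_t$ is Gaussian with mean $xe^{-\beta t/m}$ and variance $\tfrac12\sigma^2(1 - e^{-2\beta t/m})\beta^{-1}m$. From this one may check (1.5) directly.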
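The claim that (1.6) yields (1.5) can be checked numerically. The sketch below (function names are ours) reads off the Gaussian mean $xe^{-\beta t/m}$ and variance $\tfrac12\sigma^2(1 - e^{-2\beta t/m})\beta^{-1}m$ and forms the ratios appearing in (1.5) for a small $t$; `math.expm1` is used to avoid cancellation in $1 - e^{-u}$ for small $u$.

```python
import math

def ou_mean_var(x, t, beta, m, sigma2):
    """Mean and variance of V_t given V_0 = x, read off from the Gaussian
    transition density (1.6) of the Ornstein-Uhlenbeck process."""
    mean = x * math.exp(-beta * t / m)
    var = -0.5 * sigma2 * math.expm1(-2.0 * beta * t / m) * m / beta
    return mean, var

def local_moments(x, t, beta, m, sigma2):
    """(E[V_t - x | V_0 = x] / t, E[(V_t - x)^2 | V_0 = x] / t).
    By (1.5) these approach -(beta/m) * x and sigma2 as t -> 0."""
    _, var = ou_mean_var(x, t, beta, m, sigma2)
    shift = x * math.expm1(-beta * t / m)      # mean - x, computed stably
    return shift / t, (var + shift ** 2) / t
```

For large $t$ the variance approaches the stationary value $\sigma^2 m/(2\beta)$.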
Example 3. (A Nonhomogeneous Diffusion). Suppose $\{X_t\}$ is a diffusion with drift $\mu(x)$ and diffusion coefficient $\sigma^2(x)$. Let $f$ be a continuous function on $[0, \infty)$. Define
$$Z_t = X_t + \int_0^t f(u)\,du, \quad t \ge 0. \tag{1.7}$$
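Then, since $f$ is a deterministic function, the process $\{Z_t\}$ inherits the Markov property from $\{X_t\}$. Moreover, $\{Z_t\}$ has continuous sample paths. Now,
$$\begin{aligned} E\{Z_{s+t} - Z_s \mid Z_s = z\} &= E\Big\{X_{s+t} - X_s \,\Big|\, X_s = z - \int_0^s f(u)\,du\Big\} + \int_s^{s+t} f(u)\,du\\ &= \mu\Big(z - \int_0^s f(u)\,du\Big)t + f(s)t + o(t). \end{aligned} \tag{1.8}$$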
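Also,
$$\begin{aligned} E\{(Z_{s+t} - Z_s)^2 \mid Z_s = z\} &= E\Big\{\Big(X_{s+t} - X_s + \int_s^{s+t} f(u)\,du\Big)^2 \,\Big|\, X_s = z - \int_0^s f(u)\,du\Big\}\\ &= \sigma^2\Big(z - \int_0^s f(u)\,du\Big)t + o(t). \end{aligned} \tag{1.9}$$
Similarly,
$$E\{|Z_{s+t} - Z_s|^3 \mid Z_s = z\} = o(t). \tag{1.10}$$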
In this case $\{Z_t\}$ is referred to as a diffusion with (nonhomogeneous) drift and diffusion coefficients, respectively, given by
$$\mu(s, z) = \mu\Big(z - \int_0^s f(u)\,du\Big) + f(s), \qquad \sigma^2(s, z) = \sigma^2\Big(z - \int_0^s f(u)\,du\Big). \tag{1.11}$$
Note that if $\{X_t\}$ is a Brownian motion with drift $\mu$ and diffusion coefficient $\sigma^2$, and if $f(t) := \nu$ is constant, then $\{Z_t\}$ is Brownian motion with drift $\mu + \nu$ and diffusion coefficient $\sigma^2$.
A rigorous construction of unrestricted diffusions by the method of stochastic differential equations is given in Chapter VII. An alternative method, similar to that of Chapter IV, Sections 3 and 4, is to solve Kolmogorov's backward or forward equations for the transition probability density. The latter is then used to construct probability measures $P_\pi$, $P_x$ on the space of all continuous functions on $[0, \infty)$. The Kolmogorov equations, derived in the next section, may be solved by the methods of partial differential equations (PDE) (see theoretical complement 1). Although the general PDE solution is not derived in this book, the method is illustrated by two examples in Section 5. A third method of construction of diffusions is based on approximation by birth–death chains. This method is outlined in Section 4. Diffusions with boundary conditions are constructed from unrestricted ones in Sections 6 and 7 by a probabilistic method. The PDE method, based on eigenfunction expansions, is described and illustrated in Section 8.
The aim of the present chapter is to provide a systematic development of some of the most important aspects of diffusions. Computational expressions and examples are emphasized. The development proceeds in a manner analogous to that pursued in previous chapters, and, as before, there is a focus on large-time behavior.
2 KOLMOGOROV'S BACKWARD AND FORWARD EQUATIONS, MARTINGALES
In Section 2 of Chapter IV, Kolmogorov's equations were derived for continuous-parameter Markov chains. The backward equations for diffusions are derived in Subsection 2.1 below, together with an important connection between Markov processes and martingales. The forward, or Fokker–Planck, equation is obtained in Subsection 2.2. Although the latter plays a less important mathematical role in the study of diffusions, it arises more naturally than the backward equation in physical applications.
2.1 The Backward Equation and Martingales
Suppose $\{X_t\}$ is a Markov process on a metric space $S$, having right-continuous sample paths and a transition probability $p(t; x, dy)$. Continuous-parameter Markov chains on countable state spaces and diffusions on $S = (a, b)$ are examples of such processes. On the set $B(S)$ of all real-valued, bounded, Borel measurable functions $f$ on $S$ define the transition operator
$$(T_t f)(x) := E(f(X_t) \mid X_0 = x) = \int f(y)\,p(t; x, dy) \quad (t > 0). \tag{2.1}$$
Then $T_t$ is a bounded linear operator on $B(S)$ when the latter is given the sup norm defined by
$$\|f\| := \sup\{|f(y)| : y \in S\}. \tag{2.2}$$
Indeed $T_t$ is a contraction, i.e., $\|T_t f\| \le \|f\|$ for all $f \in B(S)$. For,
$$|(T_t f)(x)| \le \int|f(y)|\,p(t; x, dy) \le \|f\|. \tag{2.3}$$
The family of transition operators $\{T_t : t > 0\}$ has the semigroup property,
$$T_{s+t} = T_s T_t, \tag{2.4}$$
where the right side denotes the composition of two maps. This relation follows from
$$(T_{s+t}f)(x) = E(f(X_{s+t}) \mid X_0 = x) = E[E(f(X_{s+t}) \mid X_s) \mid X_0 = x] = E[(T_t f)(X_s) \mid X_0 = x] = T_s(T_t f)(x). \tag{2.5}$$
The relation (2.4) also implies that the transition operators commute,
$$T_s T_t = T_t T_s. \tag{2.6}$$
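To illustrate (2.1)–(2.6) concretely, the sketch below (hypothetical names) takes $\{X_t\}$ to be Brownian motion with drift $\mu$ and diffusion coefficient $\sigma^2$, whose transition law is the Gaussian (1.1), and evaluates $(T_t f)(x) = E f(x + \mu t + \sigma\sqrt{t}\,Z)$, $Z \sim N(0, 1)$, by a fixed quadrature rule. Applying the operator twice checks the semigroup property (2.4) and the commutation (2.6) up to quadrature error; for $f = \cos$ the exact value $\cos(x + \mu t)e^{-\sigma^2 t/2}$ is available for comparison.

```python
import math

# Quadrature for E[h(Z)], Z ~ N(0, 1): uniform grid of spacing 0.05 on [-10, 10].
# For smooth integrands with Gaussian decay this simple rule is extremely accurate.
_H = 0.05
_NODES = [-10.0 + _H * k for k in range(401)]
_WEIGHTS = [_H * math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi) for z in _NODES]

def transition_operator(f, t, mu=0.0, sigma=1.0):
    """Return the function T_t f for Brownian motion with drift mu and diffusion
    coefficient sigma**2: (T_t f)(x) = E f(x + mu*t + sigma*sqrt(t)*Z)."""
    scale = sigma * math.sqrt(t)
    def Ttf(x):
        return sum(w * f(x + mu * t + scale * z) for z, w in zip(_NODES, _WEIGHTS))
    return Ttf
```

Since `transition_operator` returns a function, `transition_operator(transition_operator(f, t), s)` is literally the composition $T_s T_t f$ of (2.4).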
As discussed in Section 2 of Chapter IV for the case of continuous-parameter Markov chains, the semigroup relation (2.4) implies that the behavior of $T_t$ near $t = 0$ completely determines the semigroup. Observe also that if $f \in C_b(S)$, the set of all real bounded continuous functions on $S$, then $(T_t f)(x) = E(f(X_t) \mid X_0 = x) \to f(x)$ as $t \downarrow 0$, by the right continuity of the sample paths (and bounded convergence). It then turns out, as in Chapter IV, that the derivative (operator) of the function $t \mapsto T_t$ at $t = 0$ determines $\{T_t : t > 0\}$ (see theoretical complement 3).
Definition 2.1. The infinitesimal generator $A$ of $\{T_t : t > 0\}$, or of the Markov process $\{X_t\}$, is the linear operator $A$ defined by
$$(Af)(x) = \lim_{s \downarrow 0}\frac{(T_s f)(x) - f(x)}{s} \tag{2.7}$$
for all $f \in B(S)$ such that the right side converges to some function uniformly in $x$. The class of all such $f$ comprises the domain $\mathcal{D}_A$ of $A$. In order to determine $A$ explicitly in the case of a diffusion $\{X_t\}$, let $f$ be a bounded twice continuously differentiable function. Fix $x \in S$, and a $\delta > 0$, however small. Find $\varepsilon > 0$ such that $|f''(x) - f''(y)| < \delta$ for all $y$ such that $|y - x| \le \varepsilon$. Write
$$(T_t f)(x) = E(f(X_t)\mathbf{1}_{\{|X_t - x| \le \varepsilon\}} \mid X_0 = x) + E(f(X_t)\mathbf{1}_{\{|X_t - x| > \varepsilon\}} \mid X_0 = x). \tag{2.8}$$
Now by a Taylor expansion of $f(X_t)$ around $x$,
$$\begin{aligned} E(f(X_t)\mathbf{1}_{\{|X_t - x| \le \varepsilon\}} \mid X_0 = x) = E\Big[\Big\{&f(x) + (X_t - x)f'(x) + \tfrac12(X_t - x)^2 f''(x)\\ &+ \tfrac12(X_t - x)^2\big(f''(\xi_t) - f''(x)\big)\Big\}\mathbf{1}_{\{|X_t - x| \le \varepsilon\}} \,\Big|\, X_0 = x\Big], \end{aligned} \tag{2.9}$$
where $\xi_t$ lies between $x$ and $X_t$. By the relations (1.2)', the expectations of the first three terms on the right add up to
$$f(x) + t\mu(x)f'(x) + \tfrac12 t\sigma^2(x)f''(x) + o(t) \quad (t \downarrow 0). \tag{2.10}$$
The expectation of the remainder term in (2.9) is, in absolute value, less than
$$\tfrac12\delta\,E((X_t - x)^2\mathbf{1}_{\{|X_t - x| \le \varepsilon\}} \mid X_0 = x) = \tfrac12\delta\sigma^2(x)t + o(t) \quad (t \downarrow 0). \tag{2.11}$$
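The relations (2.8)–(2.11) lead to
$$\limsup_{t \downarrow 0}\left|\frac{(T_t f)(x) - f(x)}{t} - \Big\{\mu(x)f'(x) + \tfrac12\sigma^2(x)f''(x)\Big\}\right| \le \tfrac12\delta\sigma^2(x) + \limsup_{t \downarrow 0}\frac{\|f\|}{t}P(|X_t - x| > \varepsilon \mid X_0 = x). \tag{2.12}$$
The last limit on the right is zero by the last relation in (1.2)'. As $\delta > 0$ is arbitrary it now follows that
$$\lim_{t \downarrow 0}\frac{(T_t f)(x) - f(x)}{t} = \mu(x)f'(x) + \tfrac12\sigma^2(x)f''(x). \tag{2.13}$$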
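The limit (2.13) can be observed numerically for a diffusion with nonconstant drift. Assuming the Ornstein–Uhlenbeck transition law (1.6), written with $\gamma = \beta/m$ so that $\mu(x) = -\gamma x$ and $\sigma^2(x) \equiv \sigma^2$, and taking $f = \cos$ (bounded, with bounded derivatives), $(T_t f)(x)$ has a closed form. The sketch (names ours) compares the difference quotient in (2.13) with $\mu(x)f'(x) + \tfrac12\sigma^2(x)f''(x)$.

```python
import math

def ou_transition_cos(x, t, gamma, sigma2):
    """(T_t f)(x) for f = cos and the Ornstein-Uhlenbeck diffusion with drift
    -gamma*x and constant diffusion coefficient sigma2.  Given X_0 = x, X_t is
    Gaussian with mean x*exp(-gamma t) and variance
    sigma2*(1 - exp(-2 gamma t))/(2 gamma); E cos(X) = cos(mean)*exp(-var/2)."""
    mean = x * math.exp(-gamma * t)
    var = -sigma2 * math.expm1(-2.0 * gamma * t) / (2.0 * gamma)
    return math.cos(mean) * math.exp(-var / 2.0)

def generator_cos(x, gamma, sigma2):
    """Right side of (2.13) for f = cos: mu(x) f'(x) + (1/2) sigma^2 f''(x)."""
    return (-gamma * x) * (-math.sin(x)) + 0.5 * sigma2 * (-math.cos(x))

def difference_quotient(x, t, gamma, sigma2):
    """((T_t f)(x) - f(x)) / t, the quantity whose limit is taken in (2.13)."""
    return (ou_transition_cos(x, t, gamma, sigma2) - math.cos(x)) / t
```

For $t = 10^{-6}$ the two quantities agree to several decimal places at each of a handful of test points.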
Does the above computation prove that a bounded twice continuously differentiable $f$ belongs to $\mathcal{D}_A$? Not quite, for the limit (2.13) has not been shown to be uniform in $x$. There are three sources of nonuniformity. One is that the $o(t)$ terms in (1.2)' need not be uniform in $x$. The second is that the $o(t)$ term in (2.10) may not be uniform in $x$, even if those in (1.2)' are; for $\mu(x)f'(x)$, $\sigma^2(x)f''(x)$ may not be bounded. The third source of nonuniformity arises from the fact that given $\delta > 0$ there may not exist an $\varepsilon$ independent of $x$ such that $|f''(y) - f''(x)| < \delta$ […]
$$\tau := \inf\{t \ge 0 : X_t = c \text{ or } d\}. \tag{2.26}$$
Let $x \in (c, d)$. By the optional stopping result Proposition 13.9 of Chapter I,
$$E_x Z_\tau = E_x Z_0. \tag{2.27}$$
But $Z_0 = f(X_0) = f(x)$ under $P_x$, so that $E_x Z_0$ is the right side of (2.24). Now check that $Af(x) = 0$ for $c < x < d$, so that $(Af)(X_s) = 0$ for $0 < s$ […] $(t > 0)$. (2.41)
For a physical interpretation of the forward equation (2.41), consider a dilute concentration of solute molecules diffusing along one direction in a possibly nonhomogeneous fluid. An individual molecule's position, say in the $x$-direction, is a Markov process with drift $\mu(x)$ and diffusion coefficient $\sigma^2(x)$. Given an initial concentration $c_0(x)$, the concentration $c(t, x)$ at $x$ at time $t$ is given by
$$c(t, x) = \int_S c_0(z)\,p(t; z, x)\,dz, \tag{2.42}$$
where $p(t; z, x)$ is the transition probability density of the position process of an individual solute molecule. Therefore, $c(t, x)$ satisfies Kolmogorov's forward equation
$$\frac{\partial c(t, x)}{\partial t} = A^*c(t, x) = -\frac{\partial}{\partial x}J(t, x), \tag{2.43}$$
with $J$ given by
$$J(t, x) = -\frac{\partial}{\partial x}\Big[\tfrac12\sigma^2(x)c(t, x)\Big] + \mu(x)c(t, x). \tag{2.44}$$
The Kolmogorov forward equation is also referred to as the Fokker–Planck equation in this context. Now the increase in the amount of solute in a small region $[x, x + \Delta x]$ during a small time interval $[t, t + \Delta t]$ is approximately
$$\frac{\partial c(t, x)}{\partial t}\,\Delta t\,\Delta x. \tag{2.45}$$
On the other hand, if $v(t, x)$ denotes the velocity of the solute at $x$ at time $t$, moving as a continuum, then a fluid column of length approximately $v(t, x)\Delta t$ flows into the region at $x$ during $[t, t + \Delta t]$. Hence the amount of solute that flows into the region at $x$ during $[t, t + \Delta t]$ is approximately $v(t, x)c(t, x)\Delta t$, while the amount passing out at $x + \Delta x$ during $[t, t + \Delta t]$ is approximately $v(t, x + \Delta x)c(t, x + \Delta x)\Delta t$. Therefore, the increase in the amount of solute in $[x, x + \Delta x]$ during $[t, t + \Delta t]$ is approximately
$$[v(t, x)c(t, x) - v(t, x + \Delta x)c(t, x + \Delta x)]\,\Delta t. \tag{2.46}$$
Equating (2.45) and (2.46) and dividing by $\Delta t\,\Delta x$, one gets
$$\frac{\partial c(t, x)}{\partial t} = -\frac{\partial}{\partial x}\big(v(t, x)c(t, x)\big). \tag{2.47}$$
Equation (2.47) is generally referred to as the equation of continuity or the equation of mass conservation. The quantity v(t, x)c(t, x) is called the flux of the solute, which is seen to be the rate per unit time at which the solute passes out at x (at time t) in the positive x-direction. In the present case, therefore, the flux is given by (2.44).
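The flux form (2.43)–(2.47) makes conservation of solute transparent, and it suggests a finite-volume discretization. The following is a minimal explicit sketch, not a method from the text: the name `fp_step`, the uniform grid, and the zero-flux condition at the two ends of a truncated interval are our assumptions. The flux (2.44) is evaluated at cell edges and each cell gains what flows in minus what flows out, so the total amount $\sum_i c_i\,\Delta x$ telescopes and is conserved up to rounding.

```python
import math

def fp_step(c, x_edges, mu, sigma2, dt):
    """One explicit finite-volume step for dc/dt = -dJ/dx with the flux
    J = mu*c - d(sigma2*c/2)/dx of (2.44).  c[i] is the concentration in the
    cell [x_edges[i], x_edges[i+1]] of a uniform grid.
    Assumption: no flux passes through the two outermost edges."""
    n = len(c)
    dx = x_edges[1] - x_edges[0]
    centers = [0.5 * (x_edges[i] + x_edges[i + 1]) for i in range(n)]
    J = [0.0] * (n + 1)                       # J[k]: flux through edge k
    for k in range(1, n):
        advective = mu(x_edges[k]) * 0.5 * (c[k - 1] + c[k])
        diffusive = (0.5 * sigma2(centers[k]) * c[k]
                     - 0.5 * sigma2(centers[k - 1]) * c[k - 1]) / dx
        J[k] = advective - diffusive
    # continuity equation (2.47): change in a cell = inflow - outflow
    return [c[i] - dt * (J[i + 1] - J[i]) / dx for i in range(n)]
```

The explicit step is only stable for small time steps (roughly $\Delta t \le \Delta x^2/\max\sigma^2$). With $\mu(x) = -x$ and $\sigma^2 \equiv 1$ the mean of the computed concentration decays approximately like $e^{-t}$, as it does for the continuous equation.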
3 TRANSFORMATION OF THE GENERATOR UNDER RELABELING OF THE STATE SPACE
In many applications the state space of the diffusion of interest is a finite or a semi-infinite open interval. For example, in an economic model the quantity of interest, say price, demand, supply, etc., is typically nonnegative; in a genetic model the gene frequency is a proportion in the interval $(0, 1)$. It is often the case that the end points or boundaries of the state space in such models cannot be reached from the interior. In other words, owing to some built-in mechanism the boundaries are inaccessible. A simple way to understand these processes is to think of them as strictly monotone functions of diffusions on $S = \mathbb{R}^1$. Let
$$A = \tfrac12\sigma^2(x)\frac{d^2}{dx^2} + \mu(x)\frac{d}{dx} \tag{3.1}$$
be the generator of a diffusion on $S = (-\infty, \infty)$. That is to say, we consider a diffusion with coefficients $\mu(x)$, $\sigma^2(x)$. Let $\varphi$ be a continuous one-to-one map of $S$ onto $\tilde S = (a, b)$, where $-\infty \le a$ […] $> 0$.
Proof. Let $r > 2$, $\varepsilon' > 0$ be given. Fix $\theta > 0$. We will show that the left side of
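(3.2) is less than $\theta t$ for all sufficiently small $t$. For this, write $\delta = (\theta/(2\sigma^2(x)))^{1/(r-2)}$. Then the left side of (3.2) is no more than
$$\begin{aligned} &E(|X_{s+t} - X_s|^r\mathbf{1}_{\{|X_{s+t} - X_s| \le \delta\}} \mid X_s = x) + E(|X_{s+t} - X_s|^r\mathbf{1}_{\{\delta < |X_{s+t} - X_s| \le \varepsilon'\}} \mid X_s = x)\\ &\quad\le \delta^{r-2}E(|X_{s+t} - X_s|^2\mathbf{1}_{\{|X_{s+t} - X_s| \le \delta\}} \mid X_s = x) + \varepsilon'^{\,r}P(|X_{s+t} - X_s| > \delta \mid X_s = x)\\ &\quad= \frac{\theta}{2\sigma^2(x)}\big(\sigma^2(x)t + o(t)\big) + \varepsilon'^{\,r}o(t) = \frac{\theta t}{2} + o(t). \end{aligned} \tag{3.3}$$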
The last step uses the fact that (1.2)' holds for all $\varepsilon > 0$. Now the term $o(t)$ is smaller than $\theta t/2$ for all sufficiently small $t$. ■
Proposition 3.1. Let $\{X_t\}$ be a diffusion on $S = (c, d)$ having drift and diffusion coefficients $\mu(x)$, $\sigma^2(x)$, respectively. If $\varphi$ is a three times continuously differentiable function on $(c, d)$ onto $(a, b)$ such that $\varphi'$ is either strictly positive or strictly negative, then $\{Z_t := \varphi(X_t)\}$ is a diffusion on $(a, b)$ whose drift $\tilde\mu(\cdot)$ and diffusion coefficient $\tilde\sigma^2(\cdot)$ are given by
$$\begin{aligned} \tilde\mu(z) &= \varphi'(\varphi^{-1}(z))\,\mu(\varphi^{-1}(z)) + \tfrac12\varphi''(\varphi^{-1}(z))\,\sigma^2(\varphi^{-1}(z)),\\ \tilde\sigma^2(z) &= \big(\varphi'(\varphi^{-1}(z))\big)^2\sigma^2(\varphi^{-1}(z)), \qquad z \in (a, b). \end{aligned} \tag{3.4}$$
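Formula (3.4) is mechanical to evaluate once $\varphi'$, $\varphi''$ and $\varphi^{-1}$ are known. The helper below (a hypothetical name; the derivatives are supplied by hand rather than computed symbolically) returns $\tilde\mu$ and $\tilde\sigma^2$ as functions. With $\varphi(x) = e^x$ it reproduces the geometric Brownian motion coefficients $\tilde\mu(z) = z\mu(\log z) + \tfrac12 z\sigma^2(\log z)$ and $\tilde\sigma^2(z) = z^2\sigma^2(\log z)$.

```python
import math

def transformed_coefficients(dphi, d2phi, phi_inv, mu, sigma2):
    """Drift and diffusion coefficient of Z_t = phi(X_t) from (3.4), given
    phi', phi'', the inverse map phi^{-1}, and the coefficients of X_t."""
    def mu_tilde(z):
        x = phi_inv(z)
        return dphi(x) * mu(x) + 0.5 * d2phi(x) * sigma2(x)
    def sigma2_tilde(z):
        x = phi_inv(z)
        return dphi(x) ** 2 * sigma2(x)
    return mu_tilde, sigma2_tilde

# Geometric Brownian motion: X_t Brownian motion with drift 0.1 and
# diffusion coefficient 0.04, Z_t = exp(X_t).
gbm_mu, gbm_sigma2 = transformed_coefficients(
    math.exp, math.exp, math.log, lambda x: 0.1, lambda x: 0.04)
```

A linear relabeling $\varphi(x) = 2x + 1$ simply scales the drift by 2 and the diffusion coefficient by 4, which makes a convenient sanity check.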
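Proof. By a Taylor expansion,
$$\begin{aligned} Z_{s+t} - Z_s &= \varphi(X_{s+t}) - \varphi(X_s)\\ &= (X_{s+t} - X_s)\varphi'(X_s) + \frac{1}{2!}(X_{s+t} - X_s)^2\varphi''(X_s) + \frac{1}{3!}(X_{s+t} - X_s)^3\varphi'''(\xi), \end{aligned} \tag{3.5}$$
where $\xi$ lies between $X_s$ and $X_{s+t}$. Fix $\varepsilon > 0$, $z \in (a, b)$. There exist positive constants $\delta_1(\varepsilon)$, $\delta_2(\varepsilon)$ such that $\varphi^{-1}$ maps the interval $[z - \varepsilon, z + \varepsilon]$ onto $[\varphi^{-1}(z) - \delta_1(\varepsilon), \varphi^{-1}(z) + \delta_2(\varepsilon)]$. Write $x = \varphi^{-1}(z)$, and let $\delta_m = \min\{\delta_1(\varepsilon), \delta_2(\varepsilon)\}$, $\delta_M = \max\{\delta_1(\varepsilon), \delta_2(\varepsilon)\}$. Then
$$\begin{aligned} \mathbf{1}_{\{|Z_{s+t} - z| \le \varepsilon\}} &= \mathbf{1}_{\{x - \delta_1(\varepsilon) \le X_{s+t} \le x + \delta_2(\varepsilon)\}},\\ \mathbf{1}_{\{|X_{s+t} - x| \le \delta_m\}} &\le \mathbf{1}_{\{|Z_{s+t} - z| \le \varepsilon\}} \le \mathbf{1}_{\{|X_{s+t} - x| \le \delta_M\}}. \end{aligned} \tag{3.6}$$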
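Therefore,
$$\begin{aligned} E((X_{s+t} - X_s)\mathbf{1}_{\{|Z_{s+t} - z| \le \varepsilon\}} \mid Z_s = z) &= E((X_{s+t} - x)\mathbf{1}_{\{|Z_{s+t} - z| \le \varepsilon\}} \mid X_s = x)\\ &= E((X_{s+t} - x)\mathbf{1}_{\{|X_{s+t} - x| \le \delta_m\}} \mid X_s = x)\\ &\quad + E((X_{s+t} - x)\mathbf{1}_{\{|X_{s+t} - x| > \delta_m,\ |Z_{s+t} - z| \le \varepsilon\}} \mid X_s = x). \end{aligned} \tag{3.7}$$
In view of the last inequality in (3.6), the last expectation is bounded in magnitude by $\delta_M P(|X_{s+t} - x| > \delta_m \mid X_s = x)$, which is of the order $o(t)$ as $t \downarrow 0$, by the last relation in (1.2)'. Also, by the first relation in (1.2)',
$$E((X_{s+t} - X_s)\mathbf{1}_{\{|X_{s+t} - x| \le \delta_m\}} \mid X_s = x) = \mu(x)t + o(t).$$
Therefore,
$$E((X_{s+t} - X_s)\mathbf{1}_{\{|Z_{s+t} - z| \le \varepsilon\}} \mid Z_s = z) = \mu(x)t + o(t). \tag{3.8}$$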
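In the same manner, one has
$$\begin{aligned} E((X_{s+t} - X_s)^2\mathbf{1}_{\{|Z_{s+t} - z| \le \varepsilon\}} \mid Z_s = z) &= E((X_{s+t} - X_s)^2\mathbf{1}_{\{|X_{s+t} - x| \le \delta_m\}} \mid X_s = x)\\ &\quad + O\big(\delta_M^2 P(|X_{s+t} - x| > \delta_m \mid X_s = x)\big) = \sigma^2(x)t + o(t). \end{aligned} \tag{3.9}$$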
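Also, by the Lemma above,
$$E(|X_{s+t} - X_s|^3|\varphi'''(\xi)|\mathbf{1}_{\{|Z_{s+t} - z| \le \varepsilon\}} \mid Z_s = z) \le c\,E(|X_{s+t} - X_s|^3\mathbf{1}_{\{|X_{s+t} - x| \le \delta_M\}} \mid X_s = x)$$
[…]
$$P(|Z_{s+t} - z| > \varepsilon \mid Z_s = z) \le P(|X_{s+t} - x| > \delta_m \mid X_s = x) = o(t), \tag{3.14}$$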
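by the last condition in (1.2)'. ■
Example 1. (Geometric Brownian Motion). Let $S = (-\infty, \infty)$, $\tilde S = (0, \infty)$, $\varphi(x) = e^x$. If $A$ is given by (3.1), then the coefficients of $\tilde A$ are $\tilde\mu(z) = z\mu(\log z) + \tfrac12 z\sigma^2(\log z)$, ($0 < z$ […] 0, having transition probabilities $p_{ij}$ of going from $i\Delta$ to $j\Delta$ in one step given by
$$\begin{aligned} \beta_i^{(\Delta)} := p_{i,i+1} &= \frac{\sigma^2(i\Delta)\varepsilon}{2\Delta^2} + \frac{\mu(i\Delta)\varepsilon}{2\Delta},\\ \delta_i^{(\Delta)} := p_{i,i-1} &= \frac{\sigma^2(i\Delta)\varepsilon}{2\Delta^2} - \frac{\mu(i\Delta)\varepsilon}{2\Delta},\\ p_{ii} &= 1 - \beta_i^{(\Delta)} - \delta_i^{(\Delta)} = 1 - \frac{\sigma^2(i\Delta)\varepsilon}{\Delta^2}, \end{aligned} \tag{4.2}$$
with the parameter $\varepsilon$ given by
$$\varepsilon = \frac{\Delta^2}{\sigma_0^2}. \tag{4.3}$$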
Note that under Condition (1.1) and boundedness of $\mu(x)$, $\sigma^2(x)$, the quantities $\beta_i^{(\Delta)}$, $\delta_i^{(\Delta)}$, and $1 - \beta_i^{(\Delta)} - \delta_i^{(\Delta)}$ are nonnegative for sufficiently small $\Delta$. We shall let $\varepsilon$ be the actual time in between two successive transitions. Note that, given that the process is at $x = i\Delta$, the mean displacement in a single step in time $\varepsilon$ is
$$\Delta\beta_i^{(\Delta)} - \Delta\delta_i^{(\Delta)} = \mu(i\Delta)\varepsilon = \mu(x)\varepsilon. \tag{4.4}$$
Hence, the instantaneous rate of mean displacement per unit time, when the process is at $x$, is $\mu(x)$. Also, the mean squared displacement in a single step is
$$\Delta^2\beta_i^{(\Delta)} + (-\Delta)^2\delta_i^{(\Delta)} = \sigma^2(i\Delta)\varepsilon = \sigma^2(x)\varepsilon. \tag{4.5}$$
Therefore, $\sigma^2(x)$ is the instantaneous rate of mean squared displacement per unit time. Conversely, in order that (4.4), (4.5) may hold one must have the birth–death parameters (4.2) (Exercise 1). The choice (4.3) of $\varepsilon$ guarantees the nonnegativity of the transition probabilities $p_{ii}$, $p_{i,i-1}$, $p_{i,i+1}$. Just as the simple random walk approximates Brownian motion under proper scaling of states and time, the above birth–death chain approximates the diffusion with coefficients $\mu(x)$ and $\sigma^2(x)$. Indeed, the approximation of Brownian motion by simple random walk described in Section 8 of Chapter I follows as a special case of the following result, which we state without proof (see theoretical complement 1). Below $[r]$ is the integer part of $r$.
Theorem 4.1. Let $\{Y_n^{(\Delta)} : n = 0, 1, 2, \ldots\}$ be a discrete-parameter birth–death chain on $S = \{0, \pm\Delta, \pm2\Delta, \ldots\}$ with one-step transition probabilities (4.2) and with $Y_0^{(\Delta)} = [x_0/\Delta]\Delta$, where $x_0$ is a fixed number. Define the stochastic process
$$\{X_t^{(\Delta)}\} := \{Y^{(\Delta)}_{[t/\varepsilon]}\} \quad (t \ge 0). \tag{4.6}$$
Then, as $\Delta \downarrow 0$, $\{X_t^{(\Delta)}\}$ converges in distribution to a diffusion $\{X_t\}$ with drift $\mu(x)$ and diffusion coefficient $\sigma^2(x)$, starting at $x_0$.
As a consequence of Theorem 4.1 it follows that for any bounded continuous function $f$ we have
$$\lim_{\Delta \downarrow 0}E\{f(X_t^{(\Delta)}) \mid X_0^{(\Delta)} = [x/\Delta]\Delta\} = E_x f(X_t). \tag{4.7}$$
By (2.15), in the case of a smooth initial function $f$, the function $u(t, x) := E_x f(X_t)$ solves the initial value problem
$$\frac{\partial u}{\partial t} = \tfrac12\sigma^2(x)\frac{\partial^2 u}{\partial x^2} + \mu(x)\frac{\partial u}{\partial x}, \qquad \lim_{t \downarrow 0}u(t, x) = f(x). \tag{4.8}$$
The expectation on the left side of (4.7) may be expressed as
$$u^{(\Delta)}(n, i) := \sum_j f(j\Delta)\,p_{ij}^{(n)}, \tag{4.9}$$
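where $n = [t/\varepsilon]$, $i = [x/\Delta]$, and $p_{ij}^{(n)}$ are the $n$-step transition probabilities. It is illuminating to check that the function $(n, i) \mapsto u^{(\Delta)}(n, i)$ satisfies a difference equation that is a discretized version of the differential equation (4.8). To see this, note that
$$\begin{aligned} p_{ij}^{(n+1)} &= p_{ii}p_{ij}^{(n)} + p_{i,i+1}p_{i+1,j}^{(n)} + p_{i,i-1}p_{i-1,j}^{(n)}\\ &= (1 - \beta_i^{(\Delta)} - \delta_i^{(\Delta)})p_{ij}^{(n)} + \beta_i^{(\Delta)}p_{i+1,j}^{(n)} + \delta_i^{(\Delta)}p_{i-1,j}^{(n)}\\ &= p_{ij}^{(n)} + \beta_i^{(\Delta)}(p_{i+1,j}^{(n)} - p_{ij}^{(n)}) - \delta_i^{(\Delta)}(p_{ij}^{(n)} - p_{i-1,j}^{(n)}), \end{aligned} \tag{4.10}$$
or,
$$p_{ij}^{(n+1)} - p_{ij}^{(n)} = \frac{\mu(i\Delta)\varepsilon}{2\Delta}\big\{(p_{i+1,j}^{(n)} - p_{ij}^{(n)}) + (p_{ij}^{(n)} - p_{i-1,j}^{(n)})\big\} + \frac{\sigma^2(i\Delta)\varepsilon}{2\Delta^2}\big(p_{i+1,j}^{(n)} - 2p_{ij}^{(n)} + p_{i-1,j}^{(n)}\big). \tag{4.11}$$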
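Multiplying by $f(j\Delta)$ and summing over $j$ one gets a corresponding equation for $u^{(\Delta)}(n, i)$. As $\Delta \downarrow 0$ the state space $S = \{0, \pm\Delta, \pm2\Delta, \ldots\}$ approximates $\mathbb{R}^1 = (-\infty, \infty)$, provided we think of the state $j\Delta$ as representing an interval of width $\Delta$ around $j\Delta$. Accordingly, think of spreading the probability $p_{ij}^{(n)}$ over this interval. Thus, one introduces the approximate density $y \mapsto p^{(\Delta)}(t; x, y)$ at time $t = n\varepsilon$ for states $x = i\Delta$, $y = j\Delta$, by
$$p^{(\Delta)}(n\varepsilon; i\Delta, j\Delta) := \frac{p_{ij}^{(n)}}{\Delta}. \tag{4.12}$$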
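The approximate density (4.12) can be computed directly. The sketch below (hypothetical names) builds the one-step probabilities (4.2) with $\varepsilon = \Delta^2/\sigma_0^2$ as in (4.3) and obtains row $i$ of the $n$-step matrix by $n$ vector–matrix products, which gives the same numbers as forming the full $n$th power but at lower cost; dividing by $\Delta$ gives $p^{(\Delta)}$. Here $\sigma_0^2$ is a user-supplied constant, assumed to be at least $\sup_x\sigma^2(x)$ so that $p_{ii} \ge 0$.

```python
import math

def bd_step_probs(i, delta, eps, mu, sigma2):
    """(p_{i,i-1}, p_{ii}, p_{i,i+1}) from (4.2) at the state i*delta."""
    x = i * delta
    a = sigma2(x) * eps / (2.0 * delta ** 2)
    b = mu(x) * eps / (2.0 * delta)
    return a - b, 1.0 - 2.0 * a, a + b

def bd_density(x0, y, t, delta, mu, sigma2, sigma0_sq):
    """Approximate density p^(Delta)(t; x0, y) of (4.12): p_ij^(n) / delta with
    eps = delta**2 / sigma0_sq, n = [t/eps], i = [x0/delta], j = [y/delta]."""
    eps = delta ** 2 / sigma0_sq
    n = int(t / eps)
    dist = {math.floor(x0 / delta): 1.0}      # law of Y_k^(Delta), k = 0
    for _ in range(n):                         # one vector-matrix product per step
        new = {}
        for i, p in dist.items():
            down, stay, up = bd_step_probs(i, delta, eps, mu, sigma2)
            new[i - 1] = new.get(i - 1, 0.0) + p * down
            new[i] = new.get(i, 0.0) + p * stay
            new[i + 1] = new.get(i + 1, 0.0) + p * up
        dist = new
    return dist.get(math.floor(y / delta), 0.0) / delta
```

For standard Brownian motion ($\mu \equiv 0$, $\sigma^2 \equiv 1$) with $\Delta = 0.125$, $\sigma_0^2 = 2$ and $t = 1$, the approximation at $y = 0.5$ agrees with the $N(0, 1)$ density $e^{-1/8}/\sqrt{2\pi} \approx 0.352$ to within $5 \times 10^{-3}$.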
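By dividing both sides of (4.11) by $\Delta\varepsilon$, one then arrives at the difference equation
$$\begin{aligned} \frac{p^{(\Delta)}((n+1)\varepsilon; i\Delta, j\Delta) - p^{(\Delta)}(n\varepsilon; i\Delta, j\Delta)}{\varepsilon} &= \mu(i\Delta)\frac{p^{(\Delta)}(n\varepsilon; (i+1)\Delta, j\Delta) - p^{(\Delta)}(n\varepsilon; (i-1)\Delta, j\Delta)}{2\Delta}\\ &\quad + \tfrac12\sigma^2(i\Delta)\frac{p^{(\Delta)}(n\varepsilon; (i+1)\Delta, j\Delta) - 2p^{(\Delta)}(n\varepsilon; i\Delta, j\Delta) + p^{(\Delta)}(n\varepsilon; (i-1)\Delta, j\Delta)}{\Delta^2}. \end{aligned} \tag{4.13}$$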
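This is a difference-equation version of the partial differential equation
$$\frac{\partial p(t; x, y)}{\partial t} = \mu(x)\frac{\partial p(t; x, y)}{\partial x} + \tfrac12\sigma^2(x)\frac{\partial^2 p(t; x, y)}{\partial x^2} \quad \text{for } t > 0,\ -\infty < x, y < \infty, \tag{4.14}$$
at grid points $(t, x, y) = (n\varepsilon, i\Delta, j\Delta)$. Thus, the transition probability density $p(t; x, y)$ of the diffusion $\{X_t\}$ may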
be approximately computed by computing $p^{(\Delta)}$. The latter computation only involves raising a matrix to the $n$th power, a fairly standard computational task. In addition, Theorem 4.1 may be used to derive the probability $\psi(x)$ that $\{X_t\}$ reaches $c$ before $d$, starting at $x \in (c, d)$, by taking the limit, as $\Delta \downarrow 0$, of the corresponding probability for the approximating birth–death chain $\{Y_n^{(\Delta)}\}$. In other words (see Section 2 of Chapter III, relation (2.10)),
$$\psi(x) = \lim_{\Delta \downarrow 0}\frac{\displaystyle\sum_{r=i}^{j-1}\prod_{l=i'+1}^{r}\frac{\delta_l^{(\Delta)}}{\beta_l^{(\Delta)}}}{\displaystyle 1 + \sum_{r=i'+1}^{j-1}\prod_{l=i'+1}^{r}\frac{\delta_l^{(\Delta)}}{\beta_l^{(\Delta)}}}, \tag{4.15}$$
where $i' = [c/\Delta]$, $i = [x/\Delta]$, $j = [d/\Delta]$.
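The limit in (4.15) is (Exercise 2)
$$\psi(x) = \lim_{\Delta \downarrow 0}\frac{\displaystyle\sum_{r=i}^{j-1}\exp\Big\{-\int_c^{r\Delta}\big(2\mu(y)/\sigma^2(y)\big)\,dy\Big\}}{\displaystyle 1 + \sum_{r=i'+1}^{j-1}\exp\Big\{-\int_c^{r\Delta}\big(2\mu(y)/\sigma^2(y)\big)\,dy\Big\}}, \tag{4.16}$$
that is,
$$\psi(x) = \frac{\displaystyle\int_x^d\exp\Big\{-\int_c^z\big(2\mu(y)/\sigma^2(y)\big)\,dy\Big\}\,dz}{\displaystyle\int_c^d\exp\Big\{-\int_c^z\big(2\mu(y)/\sigma^2(y)\big)\,dy\Big\}\,dz}, \tag{4.17}$$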
which confirms the computation (2.24) given in Section 2. This leads to necessary and sufficient conditions for transience and recurrence of diffusions (Exercise 3). This is analogous to the derivation of the corresponding probabilities for Brownian motion given in Section 9 of Chapter I. Alternative derivations of (4.17) are given in Section 9 (see Eq. 9.23 and Exercise 9.2), in addition to Section 2 (Eq. 2.24).
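Both (4.15) and (4.17) are easy to evaluate, which gives a quick consistency check. The sketch below (names ours) computes $\psi(x)$ from (4.17) by the trapezoidal rule, and the ratio inside the limit in (4.15) for a fixed small $\Delta$, using $\delta_l^{(\Delta)}/\beta_l^{(\Delta)} = (\sigma^2(l\Delta) - \mu(l\Delta)\Delta)/(\sigma^2(l\Delta) + \mu(l\Delta)\Delta)$ from (4.2). For Brownian motion with constant drift $\mu$ and diffusion coefficient $\sigma^2$, (4.17) reduces to $\psi(x) = (e^{-2\mu(x-c)/\sigma^2} - e^{-2\mu(d-c)/\sigma^2})/(1 - e^{-2\mu(d-c)/\sigma^2})$, which serves as a reference.

```python
import math

def psi_integral(x, c, d, mu, sigma2, n=4000):
    """psi(x) of (4.17), the probability of reaching c before d from x,
    by trapezoidal quadrature on a uniform grid of n cells over [c, d]."""
    h = (d - c) / n
    zs = [c + k * h for k in range(n + 1)]
    g = [2.0 * mu(z) / sigma2(z) for z in zs]
    log_s = [0.0]                      # log s(z_k) = -int_c^{z_k} 2 mu / sigma^2
    for k in range(1, n + 1):
        log_s.append(log_s[-1] - 0.5 * h * (g[k - 1] + g[k]))
    s = [math.exp(v) for v in log_s]
    S = [0.0]                          # S[k] = int_c^{z_k} s(z) dz
    for k in range(1, n + 1):
        S.append(S[-1] + 0.5 * h * (s[k - 1] + s[k]))
    u = (x - c) / h                    # linear interpolation of S at x
    k = min(int(u), n - 1)
    S_x = S[k] + (u - k) * (S[k + 1] - S[k])
    return (S[n] - S_x) / S[n]

def psi_birth_death(x, c, d, delta, mu, sigma2):
    """The ratio inside the limit (4.15) for the chain (4.2); the common
    factor eps/(2 delta^2) cancels in delta_l / beta_l."""
    i_c, i, j = math.floor(c / delta), math.floor(x / delta), math.floor(d / delta)
    rho = [1.0]                        # rho[r - i_c] = prod_{l=i_c+1}^{r} delta_l/beta_l
    for l in range(i_c + 1, j):
        y = l * delta
        rho.append(rho[-1] * (sigma2(y) - mu(y) * delta) / (sigma2(y) + mu(y) * delta))
    return sum(rho[i - i_c:]) / sum(rho)
```

With $c = 0$, $d = 1$, $\mu = 0.5$, $\sigma^2 = 1$ and $\Delta = 1/1024$, both functions agree with the closed form at $x = 0.25$ to within a few thousandths or better.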
5 TRANSITION PROBABILITIES FROM THE KOLMOGOROV EQUATIONS: EXAMPLES
Under Condition (1.1) the Kolmogorov equations uniquely determine the transition probabilities. However, solutions are generally not obtainable in closed form, although numerical methods based on the scheme described in
Section 4 sometimes provide practical approximations. Two examples for which the solutions can be obtained explicitly from the Kolmogorov equations alone are given here.
Example 1. (Brownian Motion). Brownian motion is a diffusion with constant drift and diffusion coefficients. First assume that the drift is zero. Let the diffusion coefficient be $\sigma^2$. Then the forward, or Fokker–Planck, equation for $p(t; x, y)$ is
$$\frac{\partial p(t; x, y)}{\partial t} = \tfrac12\sigma^2\frac{\partial^2 p(t; x, y)}{\partial y^2} \quad (t > 0,\ -\infty < x < \infty,\ -\infty < y < \infty). \tag{5.1}$$
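Let the Fourier transform of $p(t; x, y)$ as a function of $y$ be denoted by
$$\hat p(t; x, \xi) = \int e^{i\xi y}p(t; x, y)\,dy. \tag{5.2}$$
Then (5.1) becomes
$$\frac{\partial\hat p(t; x, \xi)}{\partial t} = -\tfrac12\sigma^2\xi^2\hat p(t; x, \xi), \tag{5.3}$$
or,
$$\frac{\partial}{\partial t}\Big(e^{\frac12\sigma^2\xi^2 t}\,\hat p(t; x, \xi)\Big) = 0, \tag{5.4}$$
whose general solution is
$$\hat p(t; x, \xi) = c(x, \xi)\exp\{-\tfrac12\sigma^2\xi^2 t\}. \tag{5.5}$$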
Now p is the characteristic function of the distribution of X, given X 0 = x, and as t 10 this distribution converges to the distribution of X0 that is degenerate at x. Therefore,
c(x, Z) = lim p(t; x, Z) = E(e`
X °)
= e`x,
(5.6)
tjo
and we obtain ^(t; x, l;) = exp {il x — a 2 2 t}.
(5.7)
But the right side is the characteristic function of the normal distribution with mean $x$ and variance $t\sigma^2$. Therefore,
$$p(t; x, y) = (2\pi\sigma^2 t)^{-1/2}\exp\left\{-\frac{(y - x)^2}{2\sigma^2 t}\right\} \qquad (t > 0,\ -\infty < x, y < \infty). \qquad (5.8)$$
The transition probability density of a Brownian motion with nonzero drift may be obtained in the same manner as above (Exercise 1).
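The Fourier-transform argument can be checked numerically. The sketch below integrates $e^{i\xi y}$ against the density (5.8) with the trapezoid rule and compares the result with the closed form (5.7). The helper names are illustrative, not from the text.

```python
import cmath
import math


def bm_density(t, x, y, sigma2):
    """Transition density (5.8) of Brownian motion with zero drift."""
    return math.exp(-(y - x) ** 2 / (2.0 * sigma2 * t)) / math.sqrt(2.0 * math.pi * sigma2 * t)


def fourier_numeric(t, x, xi, sigma2, width=12.0, n=4000):
    """Trapezoid approximation of (5.2) over [x - L, x + L], L = width * sqrt(sigma2 * t)."""
    half = width * math.sqrt(sigma2 * t)
    a = x - half
    h = 2.0 * half / n
    total = 0j
    for k in range(n + 1):
        y = a + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * cmath.exp(1j * xi * y) * bm_density(t, x, y, sigma2)
    return total * h


def fourier_closed(t, x, xi, sigma2):
    """Characteristic function (5.7): exp{i xi x - sigma2 xi^2 t / 2}."""
    return cmath.exp(1j * xi * x - 0.5 * sigma2 * xi ** 2 * t)
```

Truncating at 12 standard deviations is an assumption that makes the neglected tails negligible for these examples.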
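Example 2. (The Ornstein–Uhlenbeck Process). $S = (-\infty, \infty)$, $\mu(x) = -\gamma x$, $\sigma^2(x) := \sigma^2 > 0$. Here $\gamma$ is a (positive) constant. Fix an initial state $x$. As a function of $t$ and $y$, $p(t; x, y)$ satisfies the forward equation
$$\frac{\partial p}{\partial t} = \frac{1}{2}\sigma^2\,\frac{\partial^2 p}{\partial y^2} + \gamma\,\frac{\partial}{\partial y}\bigl(y\,p(t; x, y)\bigr). \qquad (5.9)$$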
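Let $\hat p$ be the Fourier transform of $p$ as a function of $y$,
$$\hat p(t; \xi) = \int_{-\infty}^{\infty} e^{i\xi y}\,p(t; x, y)\,dy. \qquad (5.10)$$
Then, upon integration by parts,
$$\int_{-\infty}^{\infty} e^{i\xi y}\,\frac{\partial p(t; x, y)}{\partial y}\,dy = -i\xi\,\hat p(t; \xi), \qquad \int_{-\infty}^{\infty} e^{i\xi y}\,\frac{\partial^2 p(t; x, y)}{\partial y^2}\,dy = (-i\xi)^2\,\hat p(t; \xi) = -\xi^2\,\hat p(t; \xi),$$
$$\int_{-\infty}^{\infty} e^{i\xi y}\,y\,\frac{\partial p(t; x, y)}{\partial y}\,dy = -\int_{-\infty}^{\infty}\frac{\partial}{\partial y}\bigl(y\,e^{i\xi y}\bigr)\,p(t; x, y)\,dy = -\hat p(t; \xi) - \xi\,\frac{\partial \hat p(t; \xi)}{\partial \xi}. \qquad (5.11)$$
Here we have assumed that $y\,(\partial p/\partial y)$ and $\partial^2 p/\partial y^2$ are integrable, and that $p$, $y\,p$ and $\partial p/\partial y$ go to zero as $|y| \to \infty$. Write (5.9) as $\partial p/\partial t = \frac{1}{2}\sigma^2\,\partial^2 p/\partial y^2 + \gamma p + \gamma y\,\partial p/\partial y$. Taking Fourier transforms on both sides, one has
$$\frac{\partial \hat p(t; \xi)}{\partial t} = -\frac{1}{2}\sigma^2\xi^2\,\hat p + \gamma\,\hat p + \gamma\Bigl(-\hat p - \xi\,\frac{\partial \hat p}{\partial \xi}\Bigr) = -\frac{1}{2}\sigma^2\xi^2\,\hat p - \gamma\xi\,\frac{\partial \hat p}{\partial \xi}.$$
Therefore,
$$\frac{\partial \hat p(t; \xi)}{\partial t} + \gamma\xi\,\frac{\partial \hat p(t; \xi)}{\partial \xi} = -\frac{1}{2}\sigma^2\xi^2\,\hat p(t; \xi). \qquad (5.12)$$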
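Thus, we have reduced the second-order partial differential equation (5.9) to the first-order equation (5.12). The left side of (5.12) is the directional derivative of $\hat p$ along the vector $(1, \gamma\xi)$ in the $(t, \xi)$-plane. Let $a(t) = d\,\exp\{\gamma t\}$. Then (5.12) yields
$$\frac{d}{dt}\,\hat p(t; a(t)) = -\frac{1}{2}\sigma^2 a^2(t)\,\hat p(t; a(t)), \qquad (5.13)$$
or
$$\hat p(t; a(t)) = c(x, d)\exp\left\{-\frac{\sigma^2}{2}\int_0^t a^2(s)\,ds\right\} = c(x, d)\exp\left\{-\frac{\sigma^2 d^2}{4\gamma}\bigl(e^{2\gamma t} - 1\bigr)\right\}.$$
For arbitrary $t$ and $\xi$ one may choose $d = \xi e^{-\gamma t}$, so that $a(t) = \xi$, and get
$$\hat p(t; \xi) = c(x, \xi e^{-\gamma t})\exp\left\{-\frac{\sigma^2\xi^2}{4\gamma}\bigl(1 - e^{-2\gamma t}\bigr)\right\}. \qquad (5.14)$$
Letting $t \downarrow 0$, one then has
$$\lim_{t \downarrow 0}\hat p(t; \xi) = c(x, \xi). \qquad (5.15)$$
On the other hand, as $t \downarrow 0$, $\hat p(t; \xi)$ converges to the Fourier transform of $\delta_x$, which is $\exp\{i\xi x\}$. Thus, $c(x, \xi) = \exp\{i\xi x\}$ for every real $\xi$, so that
$$\hat p(t; \xi) = \exp\left\{i\xi x e^{-\gamma t} - \frac{\sigma^2}{4\gamma}\bigl(1 - e^{-2\gamma t}\bigr)\xi^2\right\}. \qquad (5.16)$$
This is the Fourier transform of a Gaussian density with mean $x e^{-\gamma t}$ and variance $(\sigma^2/2\gamma)(1 - e^{-2\gamma t})$. Therefore,
$$p(t; x, y) = \left(2\pi\,\frac{\sigma^2}{2\gamma}\bigl(1 - e^{-2\gamma t}\bigr)\right)^{-1/2}\exp\left\{-\frac{\gamma\,(y - x e^{-\gamma t})^2}{\sigma^2\bigl(1 - e^{-2\gamma t}\bigr)}\right\}. \qquad (5.17)$$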
Note that the above derivation does not require that $\gamma$ be a positive parameter. However, observe that if $\gamma = \beta/m > 0$, then letting $t \to \infty$ in (5.16) we obtain the characteristic function of the Gaussian distribution with mean 0 and variance $\sigma^2/2\gamma = m\sigma^2/2\beta$ (the Maxwell–Boltzmann velocity distribution). It follows that
$$\pi(v) = \left(\frac{\pi\sigma^2}{\gamma}\right)^{-1/2}\exp\{-\gamma v^2/\sigma^2\}, \qquad -\infty < v < \infty,$$
is the p.d.f. of the invariant initial distribution of the process. With $\pi$ as the initial distribution, $\{V_t\}$ is a stationary Gaussian process. The stationarity may be viewed as an "equilibrium" status. In it, energy exchanges between the particle and the fluid by thermal agitation and viscous dissipation have reached a balance (on the average). Observe that the average kinetic energy is given by $E_\pi(\frac{1}{2}mV_t^2) = (m^2/4\beta)\,\sigma^2$.
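Here is a small sketch of the Ornstein–Uhlenbeck transition law (5.16)–(5.17), with illustrative function names. The kernel is Gaussian with mean $xe^{-\gamma t}$ and variance $(\sigma^2/2\gamma)(1-e^{-2\gamma t})$. This implies two properties:
- Running the kernel for time $s$ and then for time $t$ must give the same law as running it for time $s+t$ (the Chapman–Kolmogorov equation). `compose` checks this by propagating mean and variance.
- The variance must tend to $\sigma^2/2\gamma$ as $t \to \infty$.

```python
import math


def ou_mean_var(t, x, gamma, sigma2):
    """Mean x e^{-gamma t} and variance (sigma2 / 2 gamma)(1 - e^{-2 gamma t}) from (5.16)."""
    mean = x * math.exp(-gamma * t)
    var = sigma2 / (2.0 * gamma) * (1.0 - math.exp(-2.0 * gamma * t))
    return mean, var


def ou_density(t, x, y, gamma, sigma2):
    """Transition density (5.17)."""
    mean, var = ou_mean_var(t, x, gamma, sigma2)
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def compose(s, t, x, gamma, sigma2):
    """Mean and variance of X_{s+t} obtained by applying the kernel for time s and then
    for time t: X_{s+t} = e^{-gamma t} X_s + independent N(0, var(t)) noise."""
    m_s, v_s = ou_mean_var(s, x, gamma, sigma2)
    decay = math.exp(-gamma * t)
    _, v_t = ou_mean_var(t, 0.0, gamma, sigma2)
    return m_s * decay, v_s * decay ** 2 + v_t
```

With $\gamma = \beta/m$, the stationary variance $\sigma^2/2\gamma$ times $m/2$ reproduces the kinetic energy $(m^2/4\beta)\sigma^2$ stated above.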
6 DIFFUSIONS WITH REFLECTING BOUNDARIES

So far we have considered unrestricted diffusions on $S = (-\infty, \infty)$, or on open intervals. End points of the state space of a diffusion that cannot be reached from the interior are said to be inaccessible. In this section and the next we look at diffusions restricted to subintervals of $S$ with one or more end points accessible from the interior. In order to continue the process after it reaches a boundary point, one must specify some boundary behavior, or boundary condition, consistent with the requirement that the process be Markovian. One such boundary condition, known as the reflecting boundary condition or the Neumann boundary condition, is discussed in this section. Subsection 6.2 may be read independently of Subsection 6.1.
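6.1 Reflecting Diffusions as Limits of Birth–Death Chains

As an aid to intuition let us first see, in the spirit of Section 4, how a reflecting diffusion on $S = [0, \infty)$ may be viewed as a limit of reflecting birth–death chains. Let $\mu(x)$ and $\sigma^2(x)$ satisfy Condition (1.1) on $[0, \infty)$ and let $\mu(x)$ be bounded. For sufficiently small $\Delta > 0$ one may consider a discrete-parameter birth–death chain on $S = \{0, \Delta, 2\Delta, \ldots\}$ with one-step transition probabilities
$$p_{i,i+1} = \beta_i = \frac{\sigma^2(i\Delta)\,\varepsilon}{2\Delta^2} + \frac{\mu(i\Delta)\,\varepsilon}{2\Delta}, \qquad p_{i,i-1} = \delta_i = \frac{\sigma^2(i\Delta)\,\varepsilon}{2\Delta^2} - \frac{\mu(i\Delta)\,\varepsilon}{2\Delta},$$
$$p_{ii} = 1 - (\beta_i + \delta_i) = 1 - \frac{\sigma^2(i\Delta)\,\varepsilon}{\Delta^2} \qquad (i \ge 1), \qquad (6.1)$$
and
$$p_{01} = \beta_0 = \frac{\sigma^2(0)\,\varepsilon}{2\Delta^2} + \frac{\mu(0)\,\varepsilon}{2\Delta}, \qquad p_{00} = 1 - \beta_0 = 1 - \frac{\sigma^2(0)\,\varepsilon}{2\Delta^2} - \frac{\mu(0)\,\varepsilon}{2\Delta}. \qquad (6.2)$$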
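Exactly as in Section 4, the backward difference equations (4.11) or (4.13) are obtained for $i \ge 1$, $j \ge 0$. These are the discretized difference equations for
$$\frac{\partial p(t; x, y)}{\partial t} = \mu(x)\,\frac{\partial p(t; x, y)}{\partial x} + \frac{1}{2}\sigma^2(x)\,\frac{\partial^2 p(t; x, y)}{\partial x^2}, \qquad t > 0,\ x > 0,\ y > 0. \qquad (6.3)$$
The backward boundary condition for the diffusion is obtained in the limit as $\Delta \downarrow 0$ from the corresponding equations for the chain for $i = 0$, $j \ge 0$:
$$p_{0j}^{(n+1)} = p_{01}\,p_{1j}^{(n)} + p_{00}\,p_{0j}^{(n)} = \Bigl(\frac{\sigma^2(0)\varepsilon}{2\Delta^2} + \frac{\mu(0)\varepsilon}{2\Delta}\Bigr)p_{1j}^{(n)} + \Bigl(1 - \frac{\sigma^2(0)\varepsilon}{2\Delta^2} - \frac{\mu(0)\varepsilon}{2\Delta}\Bigr)p_{0j}^{(n)},$$
or
$$p_{0j}^{(n+1)} - p_{0j}^{(n)} = \frac{\sigma^2(0)\varepsilon}{2\Delta^2}\bigl(p_{1j}^{(n)} - p_{0j}^{(n)}\bigr) + \frac{\mu(0)\varepsilon}{2\Delta}\bigl(p_{1j}^{(n)} - p_{0j}^{(n)}\bigr) \qquad (j \ge 0). \qquad (6.4)$$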
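In the notation of (4.12) this leads to
$$\frac{2\Delta^2}{\varepsilon}\Bigl(p^{(\Delta)}((n+1)\varepsilon; 0, j\Delta) - p^{(\Delta)}(n\varepsilon; 0, j\Delta)\Bigr) = \sigma^2(0)\Bigl(p^{(\Delta)}(n\varepsilon; \Delta, j\Delta) - p^{(\Delta)}(n\varepsilon; 0, j\Delta)\Bigr) + \mu(0)\Bigl(p^{(\Delta)}(n\varepsilon; \Delta, j\Delta) - p^{(\Delta)}(n\varepsilon; 0, j\Delta)\Bigr)\Delta. \qquad (6.5)$$
Fix $y \ge 0$, $t > 0$. For $n = [t/\varepsilon]$, $j = [y/\Delta]$, the left side of (6.5) is approximately
$$2\Delta^2\Bigl(\frac{\partial p(t; x, y)}{\partial t}\Bigr)_{x=0} \approx 0,$$
while the right side is approximately
$$\Delta\,\sigma^2(0)\Bigl(\frac{\partial p(t; x, y)}{\partial x}\Bigr)_{x=0} + \Delta^2\mu(0)\Bigl(\frac{\partial p(t; x, y)}{\partial x}\Bigr)_{x=0}.$$
Dividing by $\Delta\,\sigma^2(0) > 0$ and letting $\Delta \downarrow 0$, one has
$$\Bigl(\frac{\partial p(t; x, y)}{\partial x}\Bigr)_{x=0} = 0 \qquad (t > 0,\ y > 0). \qquad (6.6)$$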
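To see (6.1)–(6.2) and the boundary condition (6.6) at work, the sketch below evolves the reflecting chain exactly. It propagates the chain's distribution one step at a time for Brownian motion ($\mu \equiv 0$, $\sigma^2 \equiv 1$). It then compares the resulting density estimate with the image formula $q(t; x, y) = p(t; x, y) + p(t; x, -y)$ of Theorem 6.1.

The following are assumptions of the sketch:
- the time step $\varepsilon = \Delta^2/2$;
- the truncation level `top`;
- the function names.

```python
import math


def reflecting_chain_density(x, y, t, delta=0.02, top=6.0):
    """Evolve the reflecting birth-death chain (6.1)-(6.2) with mu = 0, sigma^2 = 1 and
    time step eps = delta**2 / 2, starting from the state nearest x, and return
    P(chain is at the state nearest y after about t/eps steps) / delta.

    States above `top` are dropped; their probability is negligible for the example
    parameters used here (an assumption of this sketch)."""
    eps = delta ** 2 / 2.0
    beta = eps / (2.0 * delta ** 2)  # beta_i = delta_i = 1/4 because mu = 0
    n_states = int(round(top / delta)) + 1
    prob = [0.0] * n_states
    prob[int(round(x / delta))] = 1.0
    for _ in range(int(round(t / eps))):
        new = [0.0] * n_states
        for i, mass in enumerate(prob):
            if mass == 0.0:
                continue
            if i == 0:
                new[0] += (1.0 - beta) * mass  # p_00 = 1 - beta_0
                new[1] += beta * mass          # p_01 = beta_0
            else:
                new[i] += (1.0 - 2.0 * beta) * mass  # p_ii
                new[i - 1] += beta * mass            # p_{i,i-1}
                if i + 1 < n_states:
                    new[i + 1] += beta * mass        # p_{i,i+1}
        prob = new
    return prob[int(round(y / delta))] / delta


def image_density(x, y, t):
    """q(t; x, y) = p(t; x, y) + p(t; x, -y) from (6.9) for standard Brownian motion."""
    def gauss(z):
        return math.exp(-z * z / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)
    return gauss(y - x) + gauss(y + x)
```

With $\mu \equiv 0$ the boundary rule $p_{00} = 3/4$, $p_{01} = 1/4$ makes the chain a lazy symmetric walk folded about $-\Delta/2$. This is why it matches the image density up to an error of order $\Delta$.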
The equations (6.3) and (6.6), together with an initial condition $p(0; x, \cdot) = \delta_x$, determine a transition probability density of a Markov process on $[0, \infty)$. This process has continuous sample paths and satisfies the infinitesimal conditions (1.2) on the interior $(0, \infty)$. It is called the diffusion on $[0, \infty)$ with reflecting boundary at 0 and drift and diffusion coefficients $\mu(x)$, $\sigma^2(x)$. The equations (6.3) and (6.6) are called the Kolmogorov backward equation and backward boundary condition, respectively, for this diffusion. This particular boundary condition (6.6) is also known as a Neumann boundary condition. The precise nature of the approximation of the reflecting diffusion by the corresponding reflecting birth–death chain is the same as described in Theorem 4.1.

6.2 Sample Path Construction of Reflecting Diffusions

Independently of the above heuristic considerations, we shall give in the rest of this section complete probabilistic descriptions of diffusions reflecting at one or two boundary points, with a treatment of the periodic boundary along the way. The general method described here is sometimes called the method of images.

ONE-POINT BOUNDARY CASE. (Diffusion on $S = [0, \infty)$ with "0" as a Reflecting Boundary). First we consider a special case for which the probabilistic description of "reflection" is simple. Let $\mu(x)$, $\sigma^2(x)$ be defined on $S = [0, \infty)$ and satisfy Condition (1.1) on $S$. Assume also that $\mu(0) = 0$. Now extend the coefficients $\mu(\cdot)$, $\sigma^2(\cdot)$ on $\mathbb{R}^1$ by setting
$$\mu(-x) = -\mu(x), \qquad (6.7)$$
$$\sigma^2(-x) = \sigma^2(x) \qquad (x > 0). \qquad (6.8)$$
Although $\mu(\cdot)$, $\sigma^2(\cdot)$ so obtained may no longer be twice differentiable on $\mathbb{R}^1$, they are Lipschitzian. This suffices for the construction of a diffusion with these coefficients (Chapter VII).

Theorem 6.1. Let $\{X_t\}$ denote a diffusion on $\mathbb{R}^1$ with the extended coefficients $\mu(\cdot)$, $\sigma^2(\cdot)$ defined above. Then $\{|X_t|\}$ is a Markov process on the state space $S = [0, \infty)$, whose transition probability density $q(t; x, y)$ is given by
$$q(t; x, y) = p(t; x, y) + p(t; x, -y) \qquad (x, y \in [0, \infty)), \qquad (6.9)$$
where $p(t; x, y)$ is the transition probability density of $\{X_t\}$. Further, $q$ satisfies the backward equation
$$\frac{\partial q(t; x, y)}{\partial t} = \frac{1}{2}\sigma^2(x)\,\frac{\partial^2 q}{\partial x^2} + \mu(x)\,\frac{\partial q}{\partial x} \qquad (t > 0;\ x > 0,\ y \ge 0), \qquad (6.10)$$
and the backward boundary condition
$$\Bigl(\frac{\partial q(t; x, y)}{\partial x}\Bigr)_{x=0} = 0 \qquad (t > 0;\ y \ge 0). \qquad (6.11)$$
Proof. First note that the two Markov processes $\{X_t\}$ and $\{-X_t\}$ on $\mathbb{R}^1$ have the same drift and diffusion coefficients (use Proposition 3.1, or see Exercise 1). Therefore, they have the same transition probability density function $p$. So the conditional density $p(t; x, y)$ of $X_t$ at $y$ given $X_0 = x$ is the same as the conditional density of $-X_t$ at $y$ given $-X_0 = x$. But the latter is the conditional density of $X_t$ at $-y$ given $X_0 = -x$. Hence,
$$p(t; x, y) = p(t; -x, -y). \qquad (6.12)$$
In order to show that $\{Y_t := |X_t|\}$ is a Markov process on $S = [0, \infty)$, consider an arbitrary real-valued bounded (Borel measurable or continuous) function $g$ on $[0, \infty)$, and write $h(x) = g(|x|)$. As usual, write $P_x$ for the distribution of $\{X_t\}$ starting at $x$ and $E_x$ for the corresponding expectations. Then one has, for $E_x(g(Y_{s+t}) \mid \{Y_u: 0 \le u \le s\})$,
$$E_x\bigl(g(|X_{s+t}|) \mid \{|X_u|: 0 \le u \le s\}\bigr) = E_x\Bigl[E\bigl(g(|X_{s+t}|) \mid \{X_u: 0 \le u \le s\}\bigr) \Bigm| \{|X_u|: 0 \le u \le s\}\Bigr]$$
$$= E_x\Bigl[E\bigl(h(X_{s+t}) \mid \{X_u: 0 \le u \le s\}\bigr) \Bigm| \{|X_u|: 0 \le u \le s\}\Bigr] = \cdots$$
… $0$; $x, y \in [0, d)$; $m = 0, \pm 1, \pm 2, \ldots$). (6.24)

Using (6.24) in (6.23), and using the fact that $g(md + z') = f(z')$ for $z' \in [0, d)$, one gets
$$E\bigl[f(Z_{s+t}) \mid \{X_u: 0 \le u \le s\}\bigr] = \sum_{m=-\infty}^{\infty}\int_0^d g(md + z')\,p(t; X_s, md + z')\,dz' = \int_0^d f(z')\sum_{m=-\infty}^{\infty}p(t; X_s, md + z')\,dz'. \qquad (6.25)$$
Since $Z_s = X_s \pmod d$, there exists a unique integer $m_0 = m_0(X_s)$ such that $X_s = m_0 d + Z_s$. Then
$$p(t; X_s, md + z') = p(t; m_0 d + Z_s, md + z') = p(t; Z_s, (m - m_0)d + z'),$$
by (6.24). Hence (6.25) reduces to (taking $m' = m - m_0$)
$$E\bigl[f(Z_{s+t}) \mid \{X_u: 0 \le u \le s\}\bigr] = \int_0^d f(z')\sum_{m'=-\infty}^{\infty}p(t; Z_s, m'd + z')\,dz' = \int_0^d f(z')\,q(t; Z_s, z')\,dz'. \qquad (6.26)$$
The desired relation (6.22) is obtained by taking conditional expectations of the extreme left and right sides of (6.26) with respect to $\{Z_u: 0 \le u \le s\}$. Note that $\{Z_u: 0 \le u \le s\}$ is determined by $\{X_u: 0 \le u \le s\}$ (i.e., $\sigma\{Z_u: 0 \le u \le s\} \subset \sigma\{X_u: 0 \le u \le s\}$) … Let $\varphi$ be an arbitrary (measurable) function on $S$. Then $\{\varphi(X_t)\}$ is a Markov process on $S' := \varphi(S)$ if $T_t(f \circ \varphi)$ is a function of $\varphi$ for every bounded (measurable) $f$ on $S'$.
TWO-POINT BOUNDARY CASE. (Diffusions on $S = [0, 1]$ with "0" and "1" Reflecting). To illustrate the ideas for the general case, let us first see how to obtain a probabilistic construction of a Brownian motion on $[0, 1]$ with zero drift and both boundary points 0, 1 reflecting. This leads to another application of the above general principle 6.3, in the following example.

Example 4. Let $\{X_t\}$ be an unrestricted Brownian motion with zero drift, starting at $x \in [0, 1]$. Define $Z_t^{(1)} := X_t \pmod 2$. By Theorem 6.2, $\{Z_t^{(1)}\}$ is a diffusion on $[0, 2]$ (with "0" and "2" identified). Hence $\{Z_t^{(2)} := Z_t^{(1)} - 1\}$ is a diffusion on $[-1, 1]$ whose transition probability density is (see (6.27))
$$q^{(2)}(t; z, z') = \sum_{m=-\infty}^{\infty}(2\pi\sigma^2 t)^{-1/2}\exp\left\{-\frac{(2m + z' - z)^2}{2\sigma^2 t}\right\} \qquad \text{for } -1 \le z, z' \le 1. \qquad (6.28)$$
In particular,
$$q^{(2)}(t; z, z') = q^{(2)}(t; -z, -z'). \qquad (6.29)$$
It now follows, exactly as in the proof of Theorem 6.1, that $\{Z_t\} = \{\varphi(Z_t^{(2)})\} := \{|Z_t^{(2)}|\}$ is a Markov process on $[0, 1]$. For, if $f$ is a continuous function on $[0, 1]$, then
$$E[f(Z_t) \mid Z_0^{(2)} = z] = E[f(|Z_t^{(2)}|) \mid Z_0^{(2)} = z] = \int_{-1}^{1}f(|z'|)\,q^{(2)}(t; z, z')\,dz'$$
$$= \int_{-1}^{0}f(|z'|)\,q^{(2)}(t; z, z')\,dz' + \int_0^1 f(|z'|)\,q^{(2)}(t; z, z')\,dz' = \int_0^1 f(z')\bigl(q^{(2)}(t; z, -z') + q^{(2)}(t; z, z')\bigr)\,dz'$$
$$= \int_0^1 f(z')\bigl(q^{(2)}(t; -z, z') + q^{(2)}(t; z, z')\bigr)\,dz' \quad (\text{by } (6.29)) \quad = \int_0^1 f(z')\,q(t; |z|, z')\,dz', \text{ say}, \qquad (6.30)$$
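which is a function of $\varphi(z) = |z|$. Hence, $\{Z_t\}$ is a Markov process on $[0, 1]$ whose transition probability density is
$$q(t; x, y) = q^{(2)}(t; -x, y) + q^{(2)}(t; x, y) = (2\pi\sigma^2 t)^{-1/2}\sum_{m=-\infty}^{\infty}\left[\exp\left\{-\frac{(2m + y - x)^2}{2\sigma^2 t}\right\} + \exp\left\{-\frac{(2m + y + x)^2}{2\sigma^2 t}\right\}\right] \qquad \text{for } x, y \in [0, 1]. \qquad (6.31)$$
Note that
$$\frac{\partial q(t; x, y)}{\partial t} = \frac{\partial q^{(2)}(t; -x, y)}{\partial t} + \frac{\partial q^{(2)}(t; x, y)}{\partial t} = \sum_{m=-\infty}^{\infty}\frac{\partial}{\partial t}\bigl(p(t; -x, 2m + y) + p(t; x, 2m + y)\bigr), \qquad (6.32)$$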
where $p(t; x, y)$ is the transition probability density of a Brownian motion with zero drift and diffusion coefficient $\sigma^2$. Hence
$$\frac{\partial q(t; x, y)}{\partial t} = \frac{1}{2}\sigma^2\,\frac{\partial^2 q(t; x, y)}{\partial x^2} \qquad (x \in (0, 1),\ y \in [0, 1]). \qquad (6.33)$$
Also, the first equality in (6.31) shows that
$$\Bigl(\frac{\partial q(t; x, y)}{\partial x}\Bigr)_{x=0} = 0 \qquad (y \in [0, 1]). \qquad (6.34)$$
Since $q^{(2)}(t; -x, y) + q^{(2)}(t; x, y)$ is symmetric about $x = 1$ (see (6.21)), one has
$$\Bigl(\frac{\partial q(t; x, y)}{\partial x}\Bigr)_{x=1} = 0 \qquad (y \in [0, 1]). \qquad (6.35)$$
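The image series (6.31) converges very fast for moderate $t$, so a truncated sum is enough for a numerical check. The sketch below uses illustrative names, and truncating at $|m| \le 20$ is an assumption adequate for $t$ not too small. It verifies three things:
- $q(t; x, \cdot)$ integrates to 1 over $[0, 1]$;
- the one-sided slopes in $x$ at the endpoints are close to zero, as (6.34)–(6.35) require;
- $q$ is symmetric in $x$ and $y$.

```python
import math


def q_reflect(t, x, y, sigma2=1.0, terms=20):
    """Truncated image series (6.31) for zero-drift Brownian motion on [0, 1]
    reflected at both 0 and 1 (terms m = -terms, ..., terms)."""
    var2 = 2.0 * sigma2 * t
    total = 0.0
    for m in range(-terms, terms + 1):
        total += math.exp(-(2 * m + y - x) ** 2 / var2) + math.exp(-(2 * m + y + x) ** 2 / var2)
    return total / math.sqrt(math.pi * var2)


def mass(t, x, n=2000, sigma2=1.0):
    """Midpoint rule for the integral of q(t; x, y) over y in [0, 1]."""
    h = 1.0 / n
    return h * sum(q_reflect(t, x, (k + 0.5) * h, sigma2) for k in range(n))


def endpoint_slope(t, end, y, h=1e-5, sigma2=1.0):
    """One-sided difference quotient of q in the backward variable x at x = end (0 or 1),
    approximating the left sides of (6.34) and (6.35)."""
    if end == 0:
        return (q_reflect(t, h, y, sigma2) - q_reflect(t, 0.0, y, sigma2)) / h
    return (q_reflect(t, 1.0, y, sigma2) - q_reflect(t, 1.0 - h, y, sigma2)) / h
```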
Thus, $q$ satisfies Kolmogorov's backward equation (6.33) and the backward boundary conditions (6.34), (6.35). In precisely the same manner as in Example 4, we may arrive at a more general result. In order to state it, consider $\mu(\cdot)$, $\sigma^2(\cdot)$ satisfying Condition (1.1) on $[0, 1]$. Assume
$$\mu(0) = 0 = \mu(1). \qquad (6.36)$$
Extend $\mu(\cdot)$, $\sigma^2(\cdot)$ to $(-\infty, \infty)$ as follows. First set
$$\mu(-x) = -\mu(x), \qquad \sigma^2(-x) = \sigma^2(x) \qquad \text{for } x \in [0, 1], \qquad (6.37)$$
and then set
$$\mu(x + 2m) = \mu(x), \qquad \sigma^2(x + 2m) = \sigma^2(x) \qquad \text{for } x \in [-1, 1],\ m = 0, \pm 1, \pm 2, \ldots. \qquad (6.38)$$
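The extension (6.37)–(6.38) is mechanical, and a short sketch makes the role of (6.36) visible. The hypothetical helper below builds the odd, 2-periodic drift and the even, 2-periodic diffusion coefficient from functions given on $[0, 1]$. With $\mu(0) = \mu(1) = 0$ the extended drift is continuous at $x = \pm 1$. For a drift such as $\mu \equiv 1$, it jumps from $1$ to $-1$ there.

```python
import math


def extend_coefficients(mu, sigma2):
    """Return mu_ext, sigma2_ext on the whole line, built from mu, sigma2 given on [0, 1]
    by (6.37) (odd / even reflection about 0) and (6.38) (2-periodic continuation)."""
    def to_base_period(x):
        # Map x to r in [-1, 1) such that x - r is an even integer.
        r = math.fmod(x + 1.0, 2.0)
        if r < 0.0:
            r += 2.0
        return r - 1.0

    def mu_ext(x):
        r = to_base_period(x)
        return mu(r) if r >= 0.0 else -mu(-r)

    def sigma2_ext(x):
        return sigma2(abs(to_base_period(x)))

    return mu_ext, sigma2_ext
```

Usage: `extend_coefficients(lambda x: math.sin(math.pi * x), lambda x: 1.0 + x * x)` gives a continuous extension, because $\sin(\pi x)$ vanishes at 0 and 1.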
Theorem 6.4. Let $\mu(\cdot)$, $\sigma^2(\cdot)$ be extended as above, and let $\{X_t\}$ be a diffusion on $S = (-\infty, \infty)$ having these coefficients. Define $\{Z_t^{(1)}\} := \{X_t \pmod 2\}$ and $\{Z_t\} := \{|Z_t^{(1)} - 1|\}$. Then $\{Z_t\}$ is a diffusion with coefficients $\mu(\cdot)$ and $\sigma^2(\cdot)$ on $[0, 1]$, and reflecting boundary points 0, 1.

The condition (6.36) guarantees continuity of the extended coefficients. However, if the given $\mu(\cdot)$ on $[0, 1]$ does not satisfy this condition, one may modify $\mu(x)$ at $x = 0$ and $x = 1$ as in (6.36). Although this makes $\mu(\cdot)$ discontinuous, Theorem 6.4 may be shown to go through (see theoretical complements 1 and 2).

7 DIFFUSIONS WITH ABSORBING BOUNDARIES

Diffusions with absorbing boundaries are rather simple to describe. Upon arrival at a boundary point (state) the process is to remain in that state for all times thereafter. In particular, this entails jumps in the transition probability distribution at absorbing states.

7.1 One-Point Boundary Case (Diffusions on $S = [a, \infty)$ with Absorption at $a$)

Let $\mu(\cdot)$, $\sigma^2(\cdot)$ be defined on $S$ and satisfy Condition (1.1). Extend $\mu(\cdot)$, $\sigma^2(\cdot)$ to all of $\mathbb{R}^1$ in some (arbitrary) manner such that Condition (1.1) holds on $\mathbb{R}^1$
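for this extension. Let $\{X_t\}$ denote a diffusion on $\mathbb{R}^1$ having these coefficients, and starting at $x \in [a, \infty)$. Define a new stochastic process $\{\tilde X_t\}$ by
$$\tilde X_t = \begin{cases} X_t & \text{if } t < \tau_a, \\ X_{\tau_a} = a & \text{if } t \ge \tau_a, \end{cases} \qquad (7.1)$$
where $\tau_a$ is the first passage time to $a$ defined by
$$\tau_a := \inf\{t \ge 0: X_t = a\}. \qquad (7.2)$$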
Note that $\tau_a \equiv \tau_a(\{X_t\})$ is a function of the process $\{X_t\}$.

Theorem 7.1. The process $\{\tilde X_t\}$ is a time-homogeneous Markov process.

Proof. Let $B$ be a Borel subset of $(a, \infty)$. Then $P(\tilde X_{s+t} \in B, \text{ and } \tau_a \ldots$ … $0\}$, and using the Markov property of $\{X_t\}$ one gets, on the set $\{\tau_a > s\}$,
$$P(\tilde X_{s+t} \in B, \tau_a > s \mid \{X_u: 0 \le u \le s\}) = P(X_{s+t} \in B, \tau_a > s, \text{ and } \tau_a > s + t \mid \{X_u: 0 \le u \ldots$$
… $t \mid \{X_0 = x\})$ …
$$\tilde p(t; x, B) = P(X_t \in B, \text{ and } \tau_a > t \mid \{X_0 = x\}) \le P(X_t \in B \mid \{X_0 = x\}) = p(t; x, B) \qquad (B \subset (a, \infty)), \qquad (7.8)$$
Therefore $\tilde p(t; x, dy)$ is given by a density $p^0(t; x, y)$, say, on $(a, \infty)$, with $p^0(t; x, y) \le p(t; x, y)$ (for $x > a$, $y > a$) (Exercise 1). One may then rewrite (7.7) as
$$\tilde p(t; x, B) = \begin{cases} \int_B p^0(t; x, y)\,dy & \text{if } B \subset (a, \infty) \text{ and } x > a, \\ P_x(\tau_a \le t) & \text{if } B = \{a\}, \end{cases} \qquad (7.9)$$
where $P_x$ denotes the distribution of $\{X_t\}$ starting at $x$. Thus for the analytical determination of $\tilde p(t; x, dy)$ one needs to find the density $p^0(t; x, y)$ and the function $(t, x) \mapsto P_x(\tau_a \le t)$ for $x > a$. It is shown in Section 15 that $p^0(t; x, y)$ satisfies the same backward equation as does $p(t; x, y)$ (also see Exercise 2),
$$\frac{\partial p^0(t; x, y)}{\partial t} = \frac{1}{2}\sigma^2(x)\,\frac{\partial^2 p^0}{\partial x^2} + \mu(x)\,\frac{\partial p^0}{\partial x} \qquad (t > 0;\ x > a,\ y > a), \qquad (7.10)$$
and the Dirichlet boundary condition
$$\lim_{x \downarrow a}p^0(t; x, y) = 0. \qquad (7.11)$$
Indeed, (7.11) may be derived from the relation (see (7.9))
$$\int_{(a, \infty)}p^0(t; x, y)\,dy = P_x(\tau_a > t) \qquad (x > a,\ t > 0), \qquad (7.12)$$
noting that as $x \downarrow a$ the probability on the right side goes to zero. If one assumes that $p^0(t; x, y)$ has a limit as $x \downarrow a$, then this limit must be zero. By the same method as used in the derivation of (6.17) it will follow (Exercise 3) that $p^0$ satisfies the forward equation
$$\frac{\partial p^0(t; x, y)}{\partial t} = \frac{\partial^2}{\partial y^2}\Bigl(\frac{1}{2}\sigma^2(y)\,p^0\Bigr) - \frac{\partial}{\partial y}\bigl(\mu(y)\,p^0\bigr) \qquad (t > 0;\ x > a,\ y > a), \qquad (7.13)$$
and the forward boundary condition
$$\lim_{y \downarrow a}p^0(t; x, y) = 0 \qquad (t > 0;\ x > a). \qquad (7.14)$$
For a Brownian motion on $[0, \infty)$ with zero as an absorbing boundary, $p^0$ is calculated analytically in Section 8. A purely probabilistic derivation of $p^0$ is sketched in Exercises 11.5, 11.6, and 11.11.
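For zero drift, the method of images of Section 6 also gives $p^0$, this time with a minus sign: $p^0(t; x, y) = p(t; x, y) - p(t; x, -y)$ for $x, y > 0$. This is the case $\mu = 0$ of the formula obtained in Section 8. The sketch below uses illustrative names and $\sigma^2 = 1$ by default. It checks that this density vanishes at $y = 0$, as in (7.14). It also checks that its total mass (7.12) agrees with $P_x(\tau_0 > t) = \mathrm{erf}\bigl(x/\sqrt{2\sigma^2 t}\bigr)$, the reflection-principle value $1 - P_0(\tau_x \le t)$ of Section 11.

```python
import math


def p_absorbed(t, x, y, sigma2=1.0):
    """Image-method density for driftless Brownian motion on (0, inf) killed at 0:
    p0(t; x, y) = p(t; x, y) - p(t; x, -y)."""
    c = 1.0 / math.sqrt(2.0 * math.pi * sigma2 * t)
    return c * (math.exp(-(y - x) ** 2 / (2.0 * sigma2 * t)) - math.exp(-(y + x) ** 2 / (2.0 * sigma2 * t)))


def survival_numeric(t, x, sigma2=1.0, upper=40.0, n=40000):
    """Midpoint rule for the left side of (7.12), truncated to (0, upper)."""
    h = upper / n
    return h * sum(p_absorbed(t, x, (k + 0.5) * h, sigma2) for k in range(n))


def survival_exact(t, x, sigma2=1.0):
    """P_x(tau_0 > t) = 2 Phi(x / (sigma sqrt t)) - 1 = erf(x / sqrt(2 sigma2 t))."""
    return math.erf(x / math.sqrt(2.0 * sigma2 * t))
```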
7.2 Two-Point Boundary Case (Diffusions on $S = [a, b]$ with Two Absorbing Boundary Points $a$, $b$)

Let $\mu(\cdot)$, $\sigma^2(\cdot)$ be defined on $[a, b]$. Extend these to $\mathbb{R}^1$ in any manner such that Condition (1.1) is satisfied. Let $\{X_t\}$ be a diffusion on $\mathbb{R}^1$ having these extended coefficients, starting at a point in $[a, b]$. Define the stopped process $\{\tilde X_t\}$ by
$$\tilde X_t = \begin{cases} X_t & \text{if } t \le \tau, \\ X_\tau & \text{if } t > \tau, \end{cases} \qquad (7.15)$$
where $\tau$ is the first passage time to the boundary,
$$\tau := \inf\{t \ge 0: X_t = a \text{ or } X_t = b\}. \qquad (7.16)$$
Virtually the same proof as given for Theorem 7.1 applies to show that $\{\tilde X_t\}$ is a Markov process on $[a, b]$.

Definition 7.2. The process $\{\tilde X_t\}$ in (7.15) is called a diffusion on $[a, b]$ with coefficients $\mu(\cdot)$, $\sigma^2(\cdot)$ and with two absorbing boundaries $a$, $b$.

Once again, the transition probability $\tilde p(t; x, dy)$ of $\{\tilde X_t\}$ is given by a density $p^0(t; x, y)$ when restricted to the interior $(a, b)$,
$$\tilde p(t; x, B) = \int_B p^0(t; x, y)\,dy \qquad (t > 0;\ a < x < b,\ B \subset (a, b)). \qquad (7.17)$$
This density has total mass less than 1,
$$\int_{(a, b)}p^0(t; x, y)\,dy = P_x(\tau > t) \qquad (t > 0;\ x \in (a, b)), \qquad (7.18)$$
where $P_x$ is the distribution of $\{X_t\}$ starting at $x$. Also, by the same argument as in the one-point boundary case,
$$\lim_{y \downarrow a}p^0(t; x, y) = 0, \qquad (7.19)$$
$$\lim_{y \uparrow b}p^0(t; x, y) = 0 \qquad (t > 0;\ x \in (a, b)). \qquad (7.20)$$
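For Brownian motion on $(0, d)$ with both ends absorbing, $p^0$ can be written as a sine series. This standard eigenfunction expansion is assumed here, in the spirit of the Section 8 calculations:
$$p^0(t; x, y) = \frac{2}{d}\sum_{m \ge 1} e^{-\sigma^2\pi^2 m^2 t/(2d^2)}\sin(m\pi x/d)\sin(m\pi y/d).$$
The sketch below, with illustrative names, checks the boundary conditions (7.19)–(7.20). It also compares a numerical integral of $p^0$ with the termwise integral of the series, i.e. with $P_x(\tau > t)$ in (7.18).

```python
import math


def p0_two_absorbing(t, x, y, d=1.0, sigma2=1.0, terms=60):
    """Truncated sine series for p0(t; x, y): zero-drift Brownian motion on (0, d),
    absorbed at 0 and at d (standard eigenfunction expansion, assumed here)."""
    total = 0.0
    for m in range(1, terms + 1):
        k = m * math.pi / d
        total += math.exp(-0.5 * sigma2 * k * k * t) * math.sin(k * x) * math.sin(k * y)
    return 2.0 * total / d


def survival_numeric(t, x, d=1.0, sigma2=1.0, n=2000):
    """Midpoint rule for the left side of (7.18), the integral of p0 over (0, d)."""
    h = d / n
    return h * sum(p0_two_absorbing(t, x, (k + 0.5) * h, d, sigma2) for k in range(n))


def survival_series(t, x, d=1.0, sigma2=1.0, terms=60):
    """Termwise integral of the series: only odd m contribute, each with weight 4/(m pi)."""
    total = 0.0
    for m in range(1, terms + 1, 2):
        k = m * math.pi / d
        total += 4.0 / (m * math.pi) * math.exp(-0.5 * sigma2 * k * k * t) * math.sin(k * x)
    return total
```

The truncation at 60 terms is adequate for the moderate times used here. For very small $t$ the series converges slowly, and more terms would be needed.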
Unlike the one-point boundary case, however, $p^0$ does not completely determine $\tilde p(t; x, dy)$ in the present case. For this, one also needs to calculate the probabilities
$$\tilde p(t; x, \{a\}) := P_x(\tau \le t, X_\tau = a), \qquad \tilde p(t; x, \{b\}) := P_x(\tau \le t, X_\tau = b). \qquad (7.21)$$
In order to calculate these probabilities, let $A$ denote the event that the diffusion $\{X_t\}$, starting at $x$, reaches $a$ before reaching $b$. Let $D_t$ be the event that by time $t$ the process does not reach either $a$ or $b$, but eventually reaches $a$ before $b$. Then $D_t \subset A$ and
$$\tilde p(t; x, \{a\}) = P_x(A \setminus D_t) = P_x(A) - P_x(D_t) = \psi(x) - P_x(D_t), \qquad (7.22)$$
where $\psi(x)$ is the probability that starting at $x$ the diffusion $\{X_t\}$ reaches $a$ before $b$. Conditioning on $X_t$, one gets for $t > 0$, $a < x < b$,
$$P_x(D_t) = \int_{(a, b)}P_x(D_t \mid X_t = y)\,p^0(t; x, y)\,dy = \int_{(a, b)}\psi(y)\,p^0(t; x, y)\,dy. \qquad (7.23)$$
Substituting (7.23) in (7.22) yields
$$\tilde p(t; x, \{a\}) = \psi(x) - \int_{(a, b)}\psi(y)\,p^0(t; x, y)\,dy \qquad (t > 0,\ a < x < b). \qquad (7.24)$$
Therefore we have the following result.
Proposition 7.2. Let $\mu(x)$ and $\sigma^2(x)$ satisfy Condition (1.1) on $(-\infty, \infty)$. Let $S = [a, b]$ for some $a < b$. Then the probability $\tilde p(t; x, \{a\})$ is given by
$$\tilde p(t; x, \{a\}) = \psi(x) - \int_a^b \psi(y)\,p^0(t; x, y)\,dy \qquad (t > 0,\ a < x < b), \qquad (7.25)$$
where $\psi(y) = P_y(X_\tau = a)$, $P_y$ being the distribution of the unrestricted process $\{X_t\}$ starting at $y$.

The function $\psi(x)$ in Proposition 7.2 is given by (2.24) as well as (4.17). It is also calculated in Section 8. There it is shown that $\psi(x)$ can be obtained as the solution to a boundary-value problem, namely the continuous (or differential equation) analog of the discrete (or difference equation) boundary-value problem (Chapter III, Eqs. 2.4–2.5) for the birth–death process. It is given by
$$\psi(x) = \frac{\int_x^b \exp\left\{-\int_a^z \frac{2\mu(y)}{\sigma^2(y)}\,dy\right\} dz}{\int_a^b \exp\left\{-\int_a^z \frac{2\mu(y)}{\sigma^2(y)}\,dy\right\} dz} \qquad (a < x < b).$$
… $W_n$. (8.8)
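Consider the superposition defined by $u(t, x) = \sum e^{\cdots}$ … $0$. We seek the transition density $p$ that satisfies
$$\frac{\partial p}{\partial t} = \frac{1}{2}\sigma^2\,\frac{\partial^2 p}{\partial x^2} \quad \text{for } t > 0,\ x > 0,\ y \ge 0, \qquad \Bigl(\frac{\partial p}{\partial x}\Bigr)_{x=0} = 0 \quad \text{for } t > 0,\ y \ge 0. \qquad (8.25)$$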
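This diffusion is the limiting form of that in Example 1 as $d \uparrow \infty$. Letting $d \uparrow \infty$ in (8.23) one has
$$p(t; x, y) = \lim_{d \to \infty}\frac{1}{d}\sum_{m=-\infty}^{\infty}\exp\left\{-\frac{\sigma^2\pi^2(m/d)^2 t}{2}\right\}\cos\frac{m\pi x}{d}\cos\frac{m\pi y}{d} = \int_{-\infty}^{\infty}\exp\left\{-\frac{\sigma^2\pi^2 u^2 t}{2}\right\}\cos(\pi x u)\cos(\pi y u)\,du$$
$$= \frac{1}{2}\int_{-\infty}^{\infty}\exp\left\{-\frac{\sigma^2\pi^2 u^2 t}{2}\right\}\bigl[\cos(\pi(x + y)u) + \cos(\pi(x - y)u)\bigr]\,du = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-t\sigma^2\xi^2/2}\bigl[\cos((x + y)\xi) + \cos((x - y)\xi)\bigr]\,d\xi$$
$$= (2\pi\sigma^2 t)^{-1/2}\left(\exp\left\{-\frac{(x + y)^2}{2\sigma^2 t}\right\} + \exp\left\{-\frac{(x - y)^2}{2\sigma^2 t}\right\}\right) \qquad (t > 0,\ 0 < x, y < \infty).$$
… $0$; Zero an Absorbing Boundary). The transition density function $p(t; x, y)$ for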
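$x, y \in (0, \infty)$ is obtained as a limit of (8.33) as $d \uparrow \infty$. That is,
$$p(t; x, y) = \exp\left\{\frac{\mu(y - x)}{\sigma^2} - \frac{\mu^2 t}{2\sigma^2}\right\}\int_0^\infty 2\exp\left\{-\frac{\pi^2\sigma^2 t u^2}{2}\right\}\sin(\pi x u)\sin(\pi y u)\,du$$
$$= \exp\left\{\frac{\mu(y - x)}{\sigma^2} - \frac{\mu^2 t}{2\sigma^2}\right\}\frac{1}{\pi}\int_0^\infty\exp\left\{-\frac{\sigma^2 t\xi^2}{2}\right\}\bigl[\cos(\xi(x - y)) - \cos(\xi(x + y))\bigr]\,d\xi$$
$$= \exp\left\{\frac{\mu(y - x)}{\sigma^2} - \frac{\mu^2 t}{2\sigma^2}\right\}\frac{1}{2\pi}\int_{-\infty}^{\infty}\bigl(e^{-i\xi(x - y)} - e^{-i\xi(x + y)}\bigr)e^{-\sigma^2 t\xi^2/2}\,d\xi$$
$$= \exp\left\{\frac{\mu(y - x)}{\sigma^2} - \frac{\mu^2 t}{2\sigma^2}\right\}(2\pi\sigma^2 t)^{-1/2}\left[\exp\left\{-\frac{(x - y)^2}{2\sigma^2 t}\right\} - \exp\left\{-\frac{(x + y)^2}{2\sigma^2 t}\right\}\right]. \qquad (8.39)$$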
Integration of (8.39) with respect to $y$ over $(0, \infty)$ yields $p(t; x, \{0\}^c)$. This is the probability that, starting from $x$, the first passage time to zero is greater than $t$. Examples 3 and 4 yield, for a diffusion with constant coefficients over the time period $[0, t]$:
- the distribution of the maximum;
- the distribution of the minimum;
- the joint distribution of the maximum, the minimum and the state at time $t$.

See Exercises 4–6.
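Integrating (8.39) numerically gives a useful check on the formula. The sketch below compares the integral of (8.39) over $(0, \infty)$ with a well-known closed form for Brownian motion with drift $\mu$ started at $x > 0$:
$$P_x(\tau_0 > t) = \Phi\bigl((x + \mu t)/(\sigma\sqrt t)\bigr) - e^{-2\mu x/\sigma^2}\,\Phi\bigl((-x + \mu t)/(\sigma\sqrt t)\bigr).$$
That closed form is quoted from the standard literature rather than from this section. The function names and the truncation of the $y$-range are assumptions of the sketch.

```python
import math


def p0_drift(t, x, y, mu, sigma2):
    """Density (8.39): Brownian motion with drift mu and diffusion coefficient sigma2,
    started at x > 0 and killed on reaching 0."""
    s2t = sigma2 * t
    girsanov = math.exp(mu * (y - x) / sigma2 - mu * mu * t / (2.0 * sigma2))
    images = math.exp(-(x - y) ** 2 / (2.0 * s2t)) - math.exp(-(x + y) ** 2 / (2.0 * s2t))
    return girsanov * images / math.sqrt(2.0 * math.pi * s2t)


def survival_numeric(t, x, mu, sigma2, n=40000):
    """Midpoint rule for the integral of (8.39) over y in (0, upper), where upper lies
    12 standard deviations beyond x + |mu| t."""
    upper = x + abs(mu) * t + 12.0 * math.sqrt(sigma2 * t)
    h = upper / n
    return h * sum(p0_drift(t, x, (k + 0.5) * h, mu, sigma2) for k in range(n))


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def survival_exact(t, x, mu, sigma2):
    """Closed form for P_x(tau_0 > t) with constant drift (standard result, quoted)."""
    s = math.sqrt(sigma2 * t)
    return normal_cdf((x + mu * t) / s) - math.exp(-2.0 * mu * x / sigma2) * normal_cdf((-x + mu * t) / s)
```

For positive drift, letting $t \to \infty$ in the closed form gives $1 - e^{-2\mu x/\sigma^2}$, the probability of never reaching 0.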
9 TRANSIENCE AND RECURRENCE OF DIFFUSIONS

Let $\{X_t: t \ge 0\}$ be a diffusion on an interval $S$, with drift and diffusion coefficients $\mu(x)$ and $\sigma^2(x)$, starting at $x$. Let $[c, d] \subset S$, $c < d$, and let
$$\psi(x) = P(\{X_t\} \text{ reaches } c \text{ before } d), \qquad c < x < d. \qquad (9.1)$$
Assume, as always, that Condition (1.1) holds. Criteria for transience and recurrence of diffusions may be derived from the computation of $\psi(x)$ given in Section 4 (see Eq. (4.17)). This section gives a different method of computation, based on solving a differential equation governing $\psi$. Recall Chapter III, Eqs. 2.4, 2.5, for the analogous discrete problem in the case of the birth–death chain. The present method is similar to that of Proposition 2.5, but does not make use of martingales. It does, however, use the fact derived in Exercise 3.5, namely,
$$P_x\Bigl(\max_{0 \le t \le h}|X_t - x| > \varepsilon\Bigr) = o(h)$$
… $x$, then $\rho_{xy}$ is obtained from (9.20) by letting $d = y$ and $c \downarrow a$. (b) If $y < x$, then use (9.18) with $c = y$, and let $d \uparrow b$.
Definition 9.1. A state $y$ is recurrent if $\rho_{xy} = 1$ for all $x \in S$ such that $\rho_{yx} > 0$, and is transient otherwise. If all states in $S$ are recurrent, then the diffusion is said to be recurrent. The following corollary is a useful consequence of Proposition 9.2.
Corollary 9.3. A diffusion on $S = (a, b)$ with coefficients $\mu(x)$, $\sigma^2(x)$ is recurrent if and only if $s(a) = -\infty$ and $s(b) = \infty$.

If $S$ has one or two boundary points, then $\rho_{xy}$ is given by the following proposition. The modifications of the above needed to get these conditions are left to exercises.

Proposition 9.4. (a) Suppose $S = [a, b)$ and $a$ is reflecting. Then the diffusion is recurrent if and only if $s(b) = \infty$. If, on the other hand, $s(b) < \infty$, then one has $\rho_{xy} = 1$ for $y > x$ and
$$\rho_{xy} = \frac{s(b) - s(x)}{s(b) - s(y)} = \frac{\int_x^b \exp\{-I(x_0, z)\}\,dz}{\int_y^b \exp\{-I(x_0, z)\}\,dz}, \qquad y < x. \qquad (9.26)$$
(b) Suppose $S = [a, b)$ and $a$ is absorbing. Then the only recurrent state is $a$, and … $s(x) - s(a)$ … if $0 < x$ … $-\infty$, (10.20) … $m(b) < \infty$, (10.21) … and $s(b) = +\infty$, …

For intervals $S$ having boundary points one may prove the following result. The modifications that give this result are left to exercises.

Proposition 10.3. (a) Suppose $S = [a, b)$ with $a$ a reflecting boundary. Then the diffusion is positive recurrent if and only if $s(b) = \infty$ and $m(b) < \infty$. (b) Suppose $S = [a, b]$ with $a$ and $b$ reflecting boundaries. Then the diffusion is positive recurrent.
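11 STOPPING TIMES AND THE STRONG MARKOV PROPERTY

Take $\Omega$ to be the set of all possible paths, i.e., the set of all continuous functions $\omega$ on the time interval $[0, \infty)$ into the interval $S$ (state space). In this case $X_t$ is the value of the trajectory (function) at time $t$, $X_t(\omega) = \omega_t$. The sigmafield $\mathcal{F}$ is the smallest sigmafield that includes all finite-dimensional events of the form $\{X_{t_i} \in B_i \text{ for } 1 \le i \le n\}$, where $0 \le t_1 < t_2 < \cdots < t_n$ …
$$\cdots = E\Bigl(E\bigl[f(X_{s_j \wedge t_1}, \ldots, X_{s_j \wedge t_m})\,\mathbf{1}_{\{\tau = s_j\}}\,h(X_{s_j + t'_1}, \ldots, X_{s_j + t'_r}) \bigm| \mathcal{F}_{s_j}\bigr]\Bigr) = E\Bigl(f(X_{s_j \wedge t_1}, \ldots, X_{s_j \wedge t_m})\,\mathbf{1}_{\{\tau = s_j\}}\,E\bigl[h(X_{s_j + t'_1}, \ldots, X_{s_j + t'_r}) \bigm| \mathcal{F}_{s_j}\bigr]\Bigr) \qquad (11.14)$$
since $X_{s_j \wedge t_1}, \ldots, X_{s_j \wedge t_m}$ are determined by $\{X_s: 0 \le s \le s_j\}$ … $\ldots, X_{\tau \wedge t_m})\,\mathbf{1}_{\{\tau = s_j\}}\,[E_y h(X_{t'_1}, \ldots, X_{t'_r})]_{y = X_{s_j}})$.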
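Therefore,
$$E\bigl(f(X_{\tau \wedge t_1}, \ldots, X_{\tau \wedge t_m})\,h(X_{\tau + t'_1}, \ldots, X_{\tau + t'_r})\,\mathbf{1}_{\{\tau = s_j\}}\bigr) = E\bigl(f(X_{\tau \wedge t_1}, \ldots, X_{\tau \wedge t_m})\,\mathbf{1}_{\{\tau = s_j\}}\,[E_y h(X_{t'_1}, \ldots, X_{t'_r})]_{y = X_\tau}\bigr). \qquad (11.15)$$
Summing (11.15) over $j$ one gets $E\bigl(f(X_{\tau \wedge t_1}, \ldots, X_{\tau \wedge t_m})\,h(X_{\tau + t'_1}, \ldots, X_{\tau + t'_r})\ldots$ … $a) - P_0(M_t \ge a, X_t > a) = P_0(M_t \ge a) - P_0(X_t > a)$ (11.36)
since $\{X_t > a\} \subset \{M_t \ge a\}$. Thus,
$$P_0(M_t \ge a) = 2P_0(X_t > a). \qquad (11.37)$$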
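Hence,
$$P_0(\tau_a \le t) = P_0(M_t \ge a) = 2\int_a^\infty(2\pi\sigma^2 t)^{-1/2}e^{-y^2/(2\sigma^2 t)}\,dy = \Bigl(\frac{2}{\pi}\Bigr)^{1/2}\int_{a/(\sigma\sqrt t)}^\infty e^{-z^2/2}\,dz. \qquad (11.38)$$
Thus, the probability density function (p.d.f.) of $\tau_a$ is given by
$$f_a(t) = \frac{a}{\sigma(2\pi)^{1/2}}\,t^{-3/2}e^{-a^2/(2\sigma^2 t)}, \qquad 0 < t < \infty. \qquad (11.39)$$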
The p.d.f. of $M_t$ is obtained by differentiating the first integral in (11.38) with respect to $a$, namely,
$$g_t(a) = 2(2\pi\sigma^2 t)^{-1/2}e^{-a^2/(2\sigma^2 t)}, \qquad a \ge 0 \ldots$$
… $\eta_{2r} := \inf\{t > \eta_{2r-1}: X_t = y\}$, (12.2)
$(r = 1, 2, \ldots)$.

Proof. By the strong Markov property, the conditional distribution of $Z_r$, given the past up to time $\eta_{2r}$, is the same as the distribution of $\int_0^{\eta_2} f(X_s)\,ds$ with initial state $y$. This last distribution does not change with the sample point $\omega$. Therefore $Z_r$ is independent of the past up to time $\eta_{2r}$. In particular, $Z_r$ is independent of $Z_1, Z_2, \ldots, Z_{r-1}$.
The main result may now be derived.

Theorem 12.2. Suppose that the diffusion is positive recurrent on $S = (a, b)$. (a) Then there exists a unique invariant distribution $\pi(dx)$. (b) For every real-valued $f$ such that
$$\int_S |f(x)|\,\pi(dx) < \infty, \qquad (12.3)$$
INVARIANT DISTRIBUTIONS AND THE STRONG LAW OF LARGE NUMBERS
433
the strong law of large numbers holds, i.e., with probability 1,
$$\lim_{t \to \infty}\frac{1}{t}\int_0^t f(X_s)\,ds = \int_S f(x)\,\pi(dx), \qquad (12.4)$$
no matter what the initial distribution may be. (c) If the end points $a$, $b$ of $S$ are inaccessible or reflecting, then the invariant probability has a density $\pi(x)$. This density is the unique normalized integrable solution of $A^*\pi(x) = 0$, i.e.,
$$\frac{1}{2}\frac{d^2}{dx^2}\bigl(\sigma^2(x)\pi(x)\bigr) - \frac{d}{dx}\bigl(\mu(x)\pi(x)\bigr) = 0 \qquad \text{for } x \in S, \qquad (12.5)$$
or simply of
$$\frac{1}{2}\frac{d}{dx}\bigl(\sigma^2(x)\pi(x)\bigr) - \mu(x)\pi(x) = 0. \qquad (12.6)$$
Indeed, the invariant measure is the normalized speed measure,
$$\pi(x) = \frac{m'(x)}{m(b) - m(a)}. \qquad (12.7)$$
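The formula (12.7) can be evaluated numerically on a grid. The sketch below assumes that $m'(x)$ is proportional to $\exp\{I(x)\}/\sigma^2(x)$, with $I(x) = \int 2\mu(y)/\sigma^2(y)\,dy$. Any such function satisfies (12.6), and the proportionality constant cancels after normalization. For the Ornstein–Uhlenbeck drift $\mu(x) = -\gamma x$ with constant $\sigma^2$, the output should be the Gaussian density with variance $\sigma^2/2\gamma$ found in Section 5. Names and grid parameters are illustrative.

```python
import math


def invariant_density(mu, sigma2, a, b, n=4000):
    """Sketch of (12.7) on the midpoint grid of [a, b]: pi(x) proportional to
    exp{I(x)} / sigma2(x), where I is a cumulative trapezoid integral of
    2 mu / sigma2 started at the first grid point (additive constants cancel)."""
    h = (b - a) / n
    xs = [a + (k + 0.5) * h for k in range(n)]
    rate = [2.0 * mu(x) / sigma2(x) for x in xs]
    big_i = [0.0]
    for k in range(1, n):
        big_i.append(big_i[-1] + 0.5 * (rate[k - 1] + rate[k]) * h)
    top = max(big_i)  # subtract the maximum before exponentiating to avoid overflow
    weights = [math.exp(v - top) / sigma2(x) for v, x in zip(big_i, xs)]
    total = h * sum(weights)
    return xs, [w / total for w in weights]
```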
Proof. Positive recurrence implies $E_y(\eta_{2r} - \eta_{2r-2}) = E_y\eta_2 < \infty$. Hence, by the classical strong law of large numbers,
$$\frac{\eta_{2r}}{r} = \frac{1}{r}\sum_{r'=1}^{r}\bigl(\eta_{2r'} - \eta_{2(r'-1)}\bigr) \longrightarrow E_y\eta_2 \qquad (12.8)$$
with $P_y$-probability 1, as $r \to \infty$. Let $f$ be a bounded real-valued (measurable) function on $S$. Then applying the strong law to the sequence $Z_r$ ($r = 0, 1, 2, \ldots$; $\eta_0 = 0$) in (12.1) one gets
$$\lim_{r \to \infty}\frac{1}{r}\sum_{r'=1}^{r}Z_{r'-1} = E_y Z_0 = E_y\int_0^{\eta_2}f(X_s)\,ds. \qquad (12.9)$$
As in Section 9 of Chapter II (or Section 8 of Chapter IV), one has (Exercise 1)
$$\lim_{r \to \infty}\frac{1}{\eta_{2r}}\int_0^{\eta_{2r}}f(X_s)\,ds = \lim_{t \to \infty}\frac{1}{t}\int_0^t f(X_s)\,ds \qquad (12.10)$$
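for every $f$ such that
$$E_y\int_0^{\eta_2}|f(X_s)|\,ds < \infty. \qquad (12.11)$$
Combining (12.8)–(12.10), one gets
$$\lim_{t \to \infty}\frac{1}{t}\int_0^t f(X_s)\,ds = \frac{E_y\int_0^{\eta_2}f(X_s)\,ds}{E_y\eta_2} \qquad (12.12)$$
for all $f$ satisfying (12.11). In the special case $f = \mathbf{1}_B$ with $B$ a Borel subset of $S$, (12.12) becomes
$$\lim_{t \to \infty}\frac{1}{t}\int_0^t \mathbf{1}_B(X_s)\,ds = \pi(B), \qquad (12.13)$$
where
$$\pi(B) := \frac{1}{E_y\eta_2}\,E_y\int_0^{\eta_2}\mathbf{1}_B(X_s)\,ds. \qquad (12.14)$$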
Thus, (12.13) says that the limiting proportion of time the process spends in a set $B$ equals the expected amount of time it spends in $B$ during a single cycle relative to (i.e., divided by) the mean length of the cycle. It is simple to check from (12.14) that $\pi$ is a probability measure on $S$ (Exercise 2). Also, let $f = \sum_{i=1}^n a_i\mathbf{1}_{B_i}$, where $a_1, a_2, \ldots, a_n$ are real numbers and $B_1, B_2, \ldots, B_n$ are pairwise disjoint (Borel) subsets of $S$. Then
$$\frac{1}{E_y\eta_2}E_y\int_0^{\eta_2}f(X_s)\,ds = \sum_{i=1}^n a_i\,\frac{1}{E_y\eta_2}E_y\int_0^{\eta_2}\mathbf{1}_{B_i}(X_s)\,ds = \sum_{i=1}^n a_i\,\pi(B_i) = \int_S f(x)\,\pi(dx).$$
The equality
$$\frac{1}{E_y\eta_2}E_y\int_0^{\eta_2}f(X_s)\,ds = \int_S f(x)\,\pi(dx) \qquad (12.15)$$
may now be extended to all $f$ satisfying (12.11) (Exercise 2). Combining (12.12) and (12.15), one has
$$\lim_{t \to \infty}\frac{1}{t}\int_0^t f(X_s)\,ds = \int_S f(x)\,\pi(dx) \qquad (12.16)$$
with $P_y$-probability 1, for all $f$ satisfying (12.11). Taking expectations in (12.16) for bounded $f$ one gets, by arguments used in Section 9 of Chapter II,
$$\lim_{t \to \infty}\frac{1}{t}\int_0^t E_y f(X_s)\,ds = \int_S f(x)\,\pi(dx) \qquad (12.17)$$
for all $y$. Writing
$$(T_s f)(y) = E_y f(X_s) = \int_S f(z)\,p(s; y, z)\,dz, \qquad (12.18)$$
one may express (12.17) as
$$\lim_{t \to \infty}\frac{1}{t}\int_0^t (T_s f)(y)\,ds = \int_S f(x)\,\pi(dx) \qquad (12.19)$$
for all $y \in S$. But the left side also equals, for any given $h > 0$,
$$\lim_{t \to \infty}\frac{1}{t}\int_h^{t+h}(T_s f)(y)\,ds = \lim_{t \to \infty}\frac{1}{t}\int_0^t (T_{s+h}f)(y)\,ds = \lim_{t \to \infty}\frac{1}{t}\int_0^t\bigl(T_s(T_h f)\bigr)(y)\,ds = \int_S (T_h f)(x)\,\pi(dx). \qquad (12.20)$$
The last equality follows from (12.19) applied to the function $T_h f$. It follows from (12.19) and (12.20) that, at least for all bounded (measurable) $f$, one has
$$\int_S (T_h f)(x)\,\pi(dx) = \int_S f(x)\,\pi(dx) \qquad (h > 0). \qquad (12.21)$$
Specializing this to $f = \mathbf{1}_B$ one has $(T_h f)(x) = p(h; x, B)$, so that (12.21) yields
$$P_\pi(X_h \in B) = E_\pi P(X_h \in B \mid X_0) = \int_S p(h; x, B)\,\pi(dx) = \int_S \mathbf{1}_B(x)\,\pi(dx) = \int_B \pi(dx) = \pi(B) = P_\pi(X_0 \in B), \qquad (12.22)$$
proving that $\pi$ is an invariant probability. The proof of parts (a), (b) of Theorem 12.2 is now complete, except for uniqueness. Uniqueness may be proved in the same manner as in Section 6 of Chapter II (Exercise 3). In order to prove (c), first note that $\pi(x)$ given by (12.7) is a probability density function (p.d.f.) which satisfies (12.6) and, therefore, (12.5). To prove
its invariance one needs to check (see (12.14)), for $\pi$ given by (12.7),
$$\frac{1}{E_y\eta_2}E_y\int_0^{\eta_2}f(X_s)\,ds = \int_S f(z)\,\pi(z)\,dz = \frac{1}{m(b) - m(a)}\int_a^b f(z)\,dm(z), \qquad (12.23)$$
where $f$ is an arbitrary bounded measurable function on $S$, and $a$, $b$ are the end points of $S$. But, as in the proof of (10.17) and (10.18) (Exercise 4), for $x < y$,
$$E_x\int_0^{\tau_y}f(X_s)\,ds = \int_x^y\Bigl(\int_a^z f(u)\,dm(u)\Bigr)ds(z), \qquad E_y\int_0^{\tau_x}f(X_s)\,ds = \int_x^y\Bigl(\int_z^b f(u)\,dm(u)\Bigr)ds(z). \qquad (12.24)$$
Hence, by the strong Markov property,
$$E_y\int_0^{\eta_2}f(X_s)\,ds = E_y\int_0^{\tau_x}f(X_s)\,ds + E_x\int_0^{\tau_y}f(X_s)\,ds = \int_x^y\Bigl(\int_a^b f(u)\,dm(u)\Bigr)ds(z) = s(x; y)\int_a^b f(u)\,dm(u). \qquad (12.25)$$
In particular, taking $f \equiv 1$,
$$E_y\eta_2 = s(x; y)\bigl(m(b) - m(a)\bigr). \qquad (12.26)$$
Dividing (12.25) by (12.26), one arrives at (12.23).
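The strong law (12.4), together with the identification of the limit in (12.23), can be illustrated by simulation. The sketch below (illustrative names and parameters) does the following:
- It generates an Ornstein–Uhlenbeck path using the exact Gaussian transition of Section 5.
- It forms the time average of $f(X_s)$.

For $f(x) = x^2$ the limit should be $\sigma^2/2\gamma$, whatever the starting point. The fixed seed only makes the run reproducible; the result is still a random approximation.

```python
import math
import random


def ou_time_average(f, gamma=1.0, sigma2=1.0, x0=3.0, dt=0.01, steps=200_000, seed=12345):
    """Approximate (1/t) * integral_0^t f(X_s) ds along one simulated Ornstein-Uhlenbeck
    path (t = steps * dt), using the exact Gaussian transition (5.17) at each step and a
    left-endpoint Riemann sum for the time integral."""
    rng = random.Random(seed)
    decay = math.exp(-gamma * dt)
    step_sd = math.sqrt(sigma2 / (2.0 * gamma) * (1.0 - decay * decay))
    x, acc = x0, 0.0
    for _ in range(steps):
        acc += f(x) * dt
        x = x * decay + step_sd * rng.gauss(0.0, 1.0)
    return acc / (steps * dt)
```

With the default parameters, the limit for $f(x) = x^2$ is $0.5$, and for $f(x) = x$ it is $0$. Over a time horizon of 2000 units, the fluctuations are a few percent.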
The following alternative argument is instructive, and may be justified under appropriate assumptions on the transition probability density $p(t; x, y)$ (Exercise 5). By the backward equation (2.15) and integration by parts one has, for the function $\pi(x)$ given by (12.7),
$$\frac{\partial}{\partial t}\int_S p(t; x, y)\,\pi(x)\,dx = \int_S\frac{\partial p(t; x, y)}{\partial t}\,\pi(x)\,dx = \int_S\bigl(A_x p(t; x, y)\bigr)\pi(x)\,dx = \int_S p(t; x, y)\,(A^*\pi)(x)\,dx = 0 \qquad (t > 0). \qquad (12.27)$$
Since $\int_S p(t; x, y)\,\pi(x)\,dx$ is the p.d.f. of $X_t$ when the initial p.d.f. is $\pi(x)$, it follows from (12.27) that the distribution of $X_t$ does not change for $t > 0$. But $X_t \to X_0$ a.s. as $t \downarrow 0$, so $X_t$ converges in distribution to $\pi(x)\,dx$. Therefore, $X_t$ has p.d.f. $\pi(x)$ for every $t > 0$.

Example 1. (Price Adjustment in a Two-Commodity Model). The excess
demand function $z(x) = (z_1(x), z_2(x))$ of two commodities is a function of prices $x = (x_1, x_2) \in \Delta := \{(x_1, x_2): x_1 > 0, x_2 > 0, x_1 + x_2 = 1\}$ such that Walras' law holds, namely,
$$x_1 z_1(x) + x_2 z_2(x) = 0, \qquad x \in \Delta. \qquad (12.28)$$
In view of (12.28) one may concentrate on the behavior of just one excess demand, say z l (x). Price adjustments in a market usually depend on excess demands. Since x 2 = 1 — x 1 , one may consider the price X 1 (t) of the first commodity as a diffusion on (0, 1) with a drift function determined by the excess demand z l at this price. For example, take the generator of such a diffusion to be of the form Ä
= 12 62
O x
dz + ,O x d (0 <x 0 such that Qz (x) , Qö.
(12.31)
Then the boundaries 0 and 1 become inaccessible. Indeed, it is easy to check in this case that $s(0) = -\infty$, $s(1) = \infty$, and that the diffusion on $S = (0, 1)$ is positive recurrent; therefore, 0 and 1 cannot be reached from $S$ (Exercise 6). There is, therefore, a unique steady-state or equilibrium distribution $\pi(dx) = \pi(x)\,dx$, and, no matter where the system starts, the distribution of $X_1(t)$ will approach this steady-state distribution as $t \to \infty$ (see theoretical complement 1).

Example 2. (Stochastic Changes in the Size of an Industry). By the "size" of a competitive industry is meant its productive capacity. The profit level $f(y)$ for a given size $y$ is the rate of additional return (profit) per unit of increase in the industry size from its present size $y$. The law of diminishing returns requires that $f'(y) < 0$ … $p > f(\infty)$. In a stochastic dynamic model one may take the industry
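size $Y_t$ at time $t$ to be a diffusion on $S = (0, \infty)$ with drift …

For $\varepsilon > 0$,
$$\begin{aligned}
(T_\varepsilon h)(x) &= \lim_{s\to\infty}\int_0^s (T_\varepsilon T_u f)(x)\,du = \lim_{s\to\infty}\int_0^s (T_{\varepsilon+u} f)(x)\,du = \lim_{s\to\infty}\int_\varepsilon^{s+\varepsilon} (T_v f)(x)\,dv\\
&= \lim_{s\to\infty}\int_0^{s+\varepsilon} (T_v f)(x)\,dv - \int_0^\varepsilon (T_v f)(x)\,dv = h(x) - \int_0^\varepsilon (T_v f)(x)\,dv.
\end{aligned} \tag{13.11}$$
Hence,
$$Ah(x) = \lim_{\varepsilon\downarrow 0}\frac{(T_\varepsilon h)(x) - h(x)}{\varepsilon} = -\lim_{\varepsilon\downarrow 0}\frac{1}{\varepsilon}\int_0^\varepsilon (T_v f)(x)\,dv = -(T_0 f)(x) = -f(x). \tag{13.12}$$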
These calculations provide the following result.

Proposition 13.2. The variance of the limiting Gaussian distribution in Theorem 13.1 is given by
$$\sigma^2 := 2\int_S f(x)\,h(x)\,\pi(x)\,dx, \tag{13.13}$$
where $h$ is a solution to (13.9) in $L^2(S, \pi)$.
It may be shown that if there exists a function $h$ in $L^2(S, \pi)$ satisfying (13.9) for a given $f$ in $L^2(S, \pi)$ with $E_\pi f = 0$, then such an $h$ is unique up to the addition of a constant (see theoretical complement 4). The condition $E_\pi f = 0$ ensures that in this case the expression (13.13) does not change if a constant is added to $h$. Also note that, in order that (13.9) may admit a solution $h$, one must have $E_\pi f = 0$. This is seen by integrating both sides of (13.9) with respect to $\pi(x)\,dx$ and reducing the first integral by integration by parts. That is,
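$$-\int_S f(x)\,\pi(x)\,dx = \int_S Ah(x)\,\pi(x)\,dx = \int_S h(x)\,(A^*\pi)(x)\,dx = \int_S h(x)\Big[\tfrac{1}{2}\big(\sigma^2(x)\pi(x)\big)'' - \big(\mu(x)\pi(x)\big)'\Big]\,dx = 0, \tag{13.14}$$
using (12.5).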
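As a numerical sanity check of (13.12)–(13.13), the sketch below uses the Ornstein–Uhlenbeck generator $A = \tfrac12\sigma^2\,d^2/dx^2 - \gamma x\,d/dx$ (an illustrative choice, not part of this section) with $f(x) = x$. Its invariant density $\pi$ is $N(0, \sigma^2/2\gamma)$, so $E_\pi f = 0$, and $h(x) = x/\gamma$ satisfies $Ah = -f$ (assuming, as (13.12) indicates, that (13.9) is the equation $Ah = -f$). Formula (13.13) then predicts the limiting variance $\sigma^2/\gamma^2$; the code evaluates the integral by the trapezoidal rule and checks the generator equation by finite differences. The helper names are hypothetical.

```python
import numpy as np


def trapezoid(y: np.ndarray, x: np.ndarray) -> float:
    """Composite trapezoidal rule (written out to avoid NumPy version differences)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def ou_limit_variance(sigma: float, gamma: float, n_grid: int = 200_001) -> float:
    """Evaluate 2 * int f h pi dx, eq. (13.13), for f(x) = x and the OU generator."""
    var_pi = sigma**2 / (2.0 * gamma)           # variance of the invariant law N(0, var_pi)
    x = np.linspace(-12.0, 12.0, n_grid) * np.sqrt(var_pi)
    pi = np.exp(-x**2 / (2.0 * var_pi)) / np.sqrt(2.0 * np.pi * var_pi)
    f = x                                       # centered: E_pi f = 0
    h = x / gamma                               # candidate solution of A h = -f
    return 2.0 * trapezoid(f * h * pi, x)


def generator_residual(sigma: float, gamma: float) -> float:
    """Max |A h + f| on a grid, with A h = 0.5 sigma^2 h'' - gamma x h'."""
    x = np.linspace(-3.0, 3.0, 601)
    h = x / gamma
    h1 = np.gradient(h, x)                      # h'
    h2 = np.gradient(h1, x)                     # h''
    ah = 0.5 * sigma**2 * h2 - gamma * x * h1
    return float(np.max(np.abs(ah + x)))


sigma, gamma = 1.0, 2.0
print(ou_limit_variance(sigma, gamma))   # close to sigma^2 / gamma^2 = 0.25
print(generator_residual(sigma, gamma))  # close to 0
```

The prediction $\sigma^2/\gamma^2$ is also the leading coefficient of $\operatorname{Var}\int_0^t X_s\,ds$ for large $t$, which is what Theorem 13.1 concerns.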
14 INTRODUCTION TO MULTIDIMENSIONAL BROWNIAN MOTION AND DIFFUSIONS

A $k$-dimensional standard Brownian motion with initial state $x = (x^{(1)}, x^{(2)}, \ldots, x^{(k)})$ is the process $\{B_t = (B_t^{(1)}, \ldots, B_t^{(k)}): t \ge 0\}$, where $\{B_t^{(i)}\}$, $1 \le i \le k$, are $k$ independent one-dimensional Brownian motions starting at $x^{(i)}$ ($1 \le i \le k$). It is easily seen to be Markovian, and the conditional density of $B_{t+s}$ given $\{B_u: 0 \le u \le s\}$ is a Gaussian ($k$-dimensional) density with mean vector $B_s$ and variance–covariance matrix $tI$, where $I$ is the $(k \times k)$ identity matrix. The transition density function is
$$p(t; x, y) = (2\pi t)^{-k/2}\exp\Big\{-\frac{1}{2t}\sum_{i=1}^k \big(y^{(i)} - x^{(i)}\big)^2\Big\} = \prod_{i=1}^k (2\pi t)^{-1/2}\exp\Big\{-\frac{(y^{(i)} - x^{(i)})^2}{2t}\Big\}, \tag{14.1}$$
where $x = (x^{(1)}, \ldots, x^{(k)})$, $y = (y^{(1)}, \ldots, y^{(k)})$.
As in the one-dimensional case, the Markov property also follows from the fact that $\{B_t: t \ge 0\}$ is a (vector-valued) process with independent increments. It is straightforward to check that $p$ satisfies the backward equation
$$\frac{\partial p}{\partial t} = \frac{1}{2}\sum_{i=1}^k \frac{\partial^2 p}{\partial (x^{(i)})^2}, \tag{14.2}$$
as well as the forward equation
$$\frac{\partial p}{\partial t} = \frac{1}{2}\sum_{i=1}^k \frac{\partial^2 p}{\partial (y^{(i)})^2}. \tag{14.3}$$
A Brownian motion $\{X_t = (X_t^{(1)}, \ldots, X_t^{(k)})\}$ with drift vector $\mu = (\mu^{(1)}, \ldots, \mu^{(k)})$ and diffusion matrix $D = ((d_{ij}))$ is defined by
$$X_t = x_0 + t\mu + \sigma B_t, \tag{14.4}$$
where $X_0 = x_0 = (x_0^{(1)}, \ldots, x_0^{(k)})$ is the initial state, $\sigma$ is a $k \times k$ matrix satisfying $\sigma\sigma' = D$, and $B_t = (B_t^{(1)}, \ldots, B_t^{(k)})$ is a standard Brownian motion with initial state zero (vector). For each $t > 0$, $\sigma B_t$ is a nonsingular linear transformation of a Gaussian vector $B_t$ whose mean (vector) is zero and whose variance–covariance matrix is $tI$. Therefore, $\sigma B_t$ is a Gaussian random vector with mean vector zero and variance–covariance matrix $\sigma(tI)\sigma' = t\sigma\sigma' = tD = ((t d_{ij}))$. Hence $X_t$ is Gaussian with mean $x_0 + t\mu$ and variance–covariance matrix $tD$. Since $X_{t+s} - X_t = s\mu + \sigma(B_{t+s} - B_t)$, $\{X_t\}$ is a process with independent increments (and is, therefore, Markovian) having the transition density function
$$p(t; x, y) = \frac{1}{(2\pi t)^{k/2}(\det D)^{1/2}}\exp\Big\{-\frac{1}{2t}\sum_{i=1}^k\sum_{j=1}^k d^{ij}\big(y^{(i)} - x^{(i)} - t\mu^{(i)}\big)\big(y^{(j)} - x^{(j)} - t\mu^{(j)}\big)\Big\} \qquad (x, y \in \mathbb{R}^k;\ t > 0). \tag{14.5}$$
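Here $((d^{ij})) = D^{-1}$. This transition probability density satisfies the backward and forward equations (Exercise 3)
$$\frac{\partial p}{\partial t} = \frac{1}{2}\sum_{i=1}^k\sum_{j=1}^k d_{ij}\frac{\partial^2 p}{\partial x^{(i)}\partial x^{(j)}} + \sum_{i=1}^k \mu^{(i)}\frac{\partial p}{\partial x^{(i)}}, \qquad \frac{\partial p}{\partial t} = \frac{1}{2}\sum_{i=1}^k\sum_{j=1}^k d_{ij}\frac{\partial^2 p}{\partial y^{(i)}\partial y^{(j)}} - \sum_{i=1}^k \mu^{(i)}\frac{\partial p}{\partial y^{(i)}}. \tag{14.6}$$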
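The construction (14.4) translates directly into a sampler. The sketch below is illustrative only: it takes $\sigma$ to be the Cholesky factor of $D$ (one of many matrices with $\sigma\sigma' = D$; the text does not prescribe a choice), samples $X_t$, and evaluates the density (14.5). The function names and parameter values are hypothetical.

```python
import numpy as np


def sample_bm_with_drift(x0, mu, D, t, n, rng):
    """Draw n samples of X_t = x0 + t*mu + sigma B_t, eq. (14.4), with sigma sigma' = D."""
    sigma = np.linalg.cholesky(D)                            # lower triangular, sigma @ sigma.T == D
    b_t = np.sqrt(t) * rng.standard_normal((n, len(x0)))     # rows ~ N(0, t I)
    return x0 + t * mu + b_t @ sigma.T                       # each row is x0 + t mu + sigma b


def transition_density(t, x, y, mu, D):
    """Evaluate p(t; x, y) in (14.5) for t > 0."""
    k = len(x)
    r = y - x - t * mu
    quad = r @ np.linalg.solve(D, r)                         # r' D^{-1} r
    norm = (2.0 * np.pi * t) ** (k / 2) * np.sqrt(np.linalg.det(D))
    return np.exp(-quad / (2.0 * t)) / norm


rng = np.random.default_rng(0)
x0 = np.array([1.0, -1.0])
mu = np.array([0.5, 0.0])
D = np.array([[2.0, 0.5], [0.5, 1.0]])
t = 2.0
X = sample_bm_with_drift(x0, mu, D, t, 200_000, rng)
print(X.mean(axis=0))           # approximately x0 + t*mu = [2, -1]
print(np.cov(X, rowvar=False))  # approximately t*D
```

With $D = I$ and $\mu = 0$, `transition_density` reduces to the product form (14.1).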
More generally, we have the following.
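Definition 14.1. A $k$-dimensional diffusion with (nonconstant) drift vector $\mu(x) = (\mu^{(1)}(x), \ldots, \mu^{(k)}(x))$ and (nonconstant) diffusion matrix $D(x) = ((d_{ij}(x)))$ is a Markov process whose transition density function $p(t; x, y)$ satisfies the Kolmogorov equations
$$\frac{\partial p}{\partial t} = \frac{1}{2}\sum_{i,j=1}^k d_{ij}(x)\frac{\partial^2 p}{\partial x^{(i)}\partial x^{(j)}} + \sum_{i=1}^k \mu^{(i)}(x)\frac{\partial p}{\partial x^{(i)}}, \qquad \frac{\partial p}{\partial t} = \frac{1}{2}\sum_{i,j=1}^k \frac{\partial^2\big(d_{ij}(y)p\big)}{\partial y^{(i)}\partial y^{(j)}} - \sum_{i=1}^k \frac{\partial\big(\mu^{(i)}(y)p\big)}{\partial y^{(i)}}. \tag{14.7}$$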
… 0} is a standard Brownian motion. The equations (14.8) (there are $k$ equations corresponding to the $k$ components of the vector $X_t$) may be roughly interpreted as follows: Given $\{X_u: 0 \le u$ … 2,

for $k > 2$, there is a positive probability that $\{X_t\}$ will never reach a ball if it starts from a point outside. Recall that an analogous result is true for multidimensional simple symmetric random walks (see Section 5 of Chapter I). In one respect, however, multidimensional random walks on integer lattices differ from multidimensional Brownian motions. In the former case, in view of
the countability of the state space, the random walk reaches every state with positive probability. This is not true for multidimensional Brownian motions. Indeed, by letting $c \downarrow 0$ in (14.25) one gets, for $0 < |x - a| < d$,
$$P_x\big(\{X_t\} \text{ reaches } a \text{ before } \partial B(a{:}\,d)\big) = 0. \tag{14.27}$$
Letting $d \uparrow \infty$ in (14.27), it follows that
$$P_x\big(\{X_t\} \text{ ever reaches } a\big) = 0 \qquad (0 < |x - a|). \tag{14.28}$$
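…
$$= P\big(\{X_t^{(1)} \in I_1\} \cap \{T_{\partial G} > t\} \mid X_0^{(1)} = x^{(1)}\big) \times P\big((X_t^{(2)}, \ldots, X_t^{(k)}) \in C \mid X_0^{(i)} = x^{(i)} \text{ for } 2 \le i \le k\big)$$
… 0 and $F''(r) + F'(r)\beta(r)/r = 0$ for $r > c$. Hence, by (15.35) and the definition of $\beta$, writing $r = |x|$, $A_x$ …

… ($k > 1$) is said to be recurrent if, for every pair $x \ne y$,
$$P_x\big(|X_t - y| < \varepsilon \text{ for some } t \ge 0\big) = 1 \quad \text{for every } \varepsilon > 0. \tag{15.48}$$
A diffusion is transient if
$$P_x\big(|X_t| \to \infty \text{ as } t \to \infty\big) = 1 \tag{15.49}$$
for all $x$.

Theorem 15.3. Assume that the conditions (1)–(4) in Section 14 hold. (a) If $\int_c^\infty e^{-I(u)}\,du = \infty$ for some $c > 0$, then the diffusion is recurrent. (b) If $\int_c^\infty e^{-I(u)}\,du < \infty$ for some $c > 0$, then the diffusion is transient.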
Proof. (a) Fix $x_0$, $y$ ($x_0 \ne y$) and $\varepsilon > 0$. Let $d > \max\{|x_0|, |y| + \varepsilon\}$. Choose $c$, $0 < c < d$, such that $\{|x| = c\}$ is disjoint from $B(y; \varepsilon)$. Define the stopping times
$$\tau_1 := \inf\{t \ge 0: |X_t| = d\}, \qquad \tau_2 := \inf\{t \ge \tau_1: |X_t| = c\},$$
$$\tau_{2n+1} := \inf\{t \ge \tau_{2n}: |X_t| = d\}, \qquad \tau_{2n} := \inf\{t \ge \tau_{2n-1}: |X_t| = c\}. \tag{15.50}$$
Now the function $x \mapsto P_x(|X_t| = d \text{ for some } t \ge 0)$, $|x| < d$, is the solution to the Dirichlet problem (15.23) with $G = B(0; d)$ and boundary function $f \equiv 1$ on $\{|x| = d\}$. But the constant function 1 is a solution to this Dirichlet problem. By the
uniqueness of the solution,
$$P_x(|X_t| = d \text{ for some } t \ge 0) = 1 \quad \text{for } |x| < d. \tag{15.51}$$
This means
$$P_x(\tau_1 < \infty) = 1 \qquad (|x| < d). \tag{15.52}$$
As $\tau_2$ is the first hitting time of $\{|x| = c\}$ by the after-$\tau_1$ process $X_{\tau_1}^+$, it follows from (15.52), the strong Markov property, and (15.46) that …
$$\ldots\ \psi(x) < 1 \qquad (|x| = c). \tag{15.59}$$
Therefore, from (15.56),
$$P_{x_0}\big(A_n \mid \mathcal{F}_{\tau_{2n}}\big) \le 1 - \delta_0 \qquad (n \ge 1). \tag{15.60}$$
Hence (Exercise 17),
$$P_{x_0}\big(\{X_t\} \text{ does not reach } \partial B(y; \varepsilon)\big) \le P_{x_0}(A_1 \cap \cdots \cap A_n) \le (1 - \delta_0)^n \quad \text{for all } n, \tag{15.61}$$
which implies (15.48) with $x = x_0$. (b) The proof is analogous to the proof of the transience of Brownian motion for $k \ge 3$, as sketched in Exercise 14.5 and Exercise 18. ∎
16 REFLECTING BOUNDARY CONDITIONS FOR MULTIDIMENSIONAL DIFFUSIONS

The probabilistic construction of reflecting diffusions is not quite as simple as that of absorbing diffusions. Let us begin with an example.

Example 1. (Reflecting Brownian Motion on a Half-Space). Take $S = H^k \cup \partial H^k = \bar{H}^k$, where
$$H^k = \{x \in \mathbb{R}^k: x^{(1)} > 0\}, \qquad \partial H^k = \{x \in \mathbb{R}^k: x^{(1)} = 0\}.$$
First consider the case $k = 1$. This is dealt with in detail in Subsection 6.2, Example 1. In this case a reflecting Brownian motion $\{Y_t\}$ with zero drift and diffusion coefficient $\sigma^2 > 0$ is given by
$$Y_t = |X_t|, \tag{16.1}$$
where $\{X_t\}$ is a one-dimensional (unrestricted) Brownian motion with zero drift and diffusion coefficient $\sigma^2 > 0$. The transition probability density of $\{Y_t\}$ is given by
$$q_1(t; x, y) = p_1(t; x, y) + p_1(t; x, -y) \qquad (x, y \ge 0), \tag{16.2}$$
where $p_1(t; x, y)$ is the transition probability density of $\{X_t\}$,
$$p_1(t; x, y) = (2\pi\sigma^2 t)^{-1/2}\exp\Big\{-\frac{(y - x)^2}{2\sigma^2 t}\Big\} \qquad (x, y \in \mathbb{R}). \tag{16.3}$$
For $k > 1$, consider $k$ independent Brownian motions $\{X_t^{(j)}\}$, $1 \le j \le k$, each having drift zero and diffusion coefficient $\sigma^2$. Then $\{Y_t := (|X_t^{(1)}|, X_t^{(2)}, \ldots, X_t^{(k)})\}$ is a Brownian motion in $\bar{H}^k := \{x \in \mathbb{R}^k: x^{(1)} \ge 0\} = H^k \cup \partial H^k$ with normal reflection at the boundary $\{x \in \mathbb{R}^k: x^{(1)} = 0\} = \partial H^k$. Its transition probability density function $q$ is given by
$$q(t; x, y) = q_1(t; x^{(1)}, y^{(1)})\prod_{j=2}^k p_1(t; x^{(j)}, y^{(j)}) \qquad \big(x = (x^{(1)}, \ldots, x^{(k)}),\ y = (y^{(1)}, \ldots, y^{(k)}) \in \bar{H}^k\big). \tag{16.4}$$
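A quick Monte Carlo sketch of (16.1)–(16.2) for $k = 1$: it samples $Y_t = |x + \sigma B_t|$ and compares the empirical probability of $\{Y_t \le b\}$ with the integral of $q_1(t; x, \cdot)$ over $[0, b]$, and checks that $q_1(t; x, \cdot)$ integrates to 1 on $[0, \infty)$. The parameter values and helper names are arbitrary illustrations; by (16.4) the remaining coordinates are ordinary Brownian motions and need no special treatment.

```python
import numpy as np


def p1(t, x, y, sigma):
    """Free Brownian transition density (16.3)."""
    return np.exp(-(y - x) ** 2 / (2 * sigma**2 * t)) / np.sqrt(2 * np.pi * sigma**2 * t)


def q1(t, x, y, sigma):
    """Reflected density (16.2): the image term p1(t; x, -y) is added, for x, y >= 0."""
    return p1(t, x, y, sigma) + p1(t, x, -y, sigma)


def integrate(fvals, grid):
    """Trapezoidal rule."""
    return float(np.sum((fvals[1:] + fvals[:-1]) * np.diff(grid)) / 2)


t, x, sigma, b = 0.7, 0.3, 1.2, 0.5
grid = np.linspace(0.0, b, 20_001)
prob_formula = integrate(q1(t, x, grid, sigma), grid)

rng = np.random.default_rng(1)
samples = np.abs(x + sigma * np.sqrt(t) * rng.standard_normal(400_000))  # Y_t = |X_t|
prob_mc = float(np.mean(samples <= b))
print(prob_formula, prob_mc)   # should agree up to Monte Carlo error

big = np.linspace(0.0, 20.0, 200_001)
total = integrate(q1(t, x, big, sigma), big)
print(total)                    # close to 1: q1 is a probability density on [0, inf)
```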
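This transition density satisfies Kolmogorov's backward equation
$$\frac{\partial q}{\partial t} = \frac{1}{2}\sigma^2\Delta_x q(t; x, y) \qquad (t > 0;\ x \in H^k,\ y \in \bar{H}^k), \tag{16.5}$$
where $\Delta_x = \sum_{j=1}^k \partial^2/\partial (x^{(j)})^2$, and the backward boundary condition
$$\frac{\partial}{\partial x^{(1)}}\,q(t; x, y) = 0 \qquad (t > 0;\ x \in \partial H^k,\ y \in \bar{H}^k). \tag{16.6}$$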
The following extension of Theorem 6.1 provides a class of diffusions on $\bar{H}^k$ with normal reflection at the boundary.

Suppose that drift coefficients $\mu^{(i)}(x)$, $1 \le i \le k$, and diffusion coefficients $d_{ij}(x)$ are prescribed on $\mathbb{R}^k$ satisfying the assumptions (1)–(4) in Section 14. Assume also that $\mu^{(1)}(x) = 0$, $d_{1j}(x) = 0$ ($2 \le j$ … $= (\partial_1 p)(t; x, y)$, (16.13) … $(\partial_t\partial_1 p)(t; x, y) = -(\ldots\,\partial_j p)(t; x, y)$, ($2 \le j$ … $> 0\}$ and its boundary is $\partial H := \{\gamma \cdot x = 0\}$. Let $\mu = (\mu^{(1)}, \ldots, \mu^{(k)})$ be an arbitrary vector in $\mathbb{R}^k$ and $D := ((d_{ij}))$ an arbitrary $k \times k$ positive definite matrix. We write, for every real-valued differentiable function $f$ (on $\mathbb{R}^k$ or $\bar{H}$), $\operatorname{grad} f$ for the gradient of $f$, i.e.,
$$(\operatorname{grad} f)(x) := \Big(\frac{\partial f(x)}{\partial x^{(1)}}, \ldots, \frac{\partial f(x)}{\partial x^{(k)}}\Big). \tag{16.24}$$
Also write $D^{1/2}$ for the positive definite matrix such that $D^{1/2}D^{1/2} = D$, and $D^{-1/2}$ for its inverse.

Proposition 16.2. On the half-space $\bar{H}$ there exists a Markov process $\{Z_t\}$ with continuous sample paths whose transition probability density $r$ satisfies the Kolmogorov backward equation
$$\frac{\partial r(t; z, w)}{\partial t} = \frac{1}{2}\sum_{i,j=1}^k d_{ij}\frac{\partial^2 r}{\partial z^{(i)}\partial z^{(j)}} + \sum_{i=1}^k \mu^{(i)}\frac{\partial r}{\partial z^{(i)}} \qquad (t > 0;\ z \in H,\ w \in \bar{H}), \tag{16.25}$$
and the backward boundary condition
$$(D\gamma)\cdot(\operatorname{grad} r)(t; z, w) = 0 \qquad (t > 0;\ z \in \partial H,\ w \in \bar{H}). \tag{16.26}$$
Further, $\{Z_t\}$ has the representation
$$Z_t := D^{1/2} O\,Y_t, \tag{16.27}$$
where $O$ is an orthogonal transformation such that $O'$ maps $D^{1/2}\gamma/|D^{1/2}\gamma|$ into $e := (1, 0, \ldots, 0)$, and $\{Y_t\}$ is a reflecting Brownian motion on $\bar{H}^k$ having drift vector $\nu := (D^{1/2}O)^{-1}\mu$ and diffusion matrix $I$.
Proof. Let $\{Y_t\}$ denote a reflecting Brownian motion on the half-space $\bar{H}^k$ and $O$ an orthogonal matrix as described. Then $\{Z_t\}$, defined by (16.27), is a Markov process on the state space $\bar{H}$ (Exercise 3). Let $q$, $r$ denote the transition probability densities of $\{Y_t\}$, $\{Z_t\}$, respectively. Then, writing $T := D^{1/2}O$,
$$r(t; z, w) = (\det D^{1/2})^{-1}\,q(t; T^{-1}z, T^{-1}w). \tag{16.28}$$
Since, whenever $\{B_t\}$ is a standard Brownian motion, $\{TB_t\}$ is a Brownian motion with dispersion matrix $TT' = D$, one may easily guess that $r$ satisfies the backward equation (16.25). To verify (16.25) by direct computation, use the fact that
$$\frac{\partial q(t; x, y)}{\partial t} = \frac{1}{2}\Delta_x q + \sum_{i=1}^k \nu^{(i)}\frac{\partial q}{\partial x^{(i)}}, \tag{16.29}$$
and that, for every real-valued twice-differentiable function $f$ on $\mathbb{R}^k$, the following standard rules on the differentiation of composite functions apply (Exercise 4):
$$T'\operatorname{grad}(f \circ T^{-1}) = (\operatorname{grad} f) \circ T^{-1}, \qquad T'\big(\big(\partial_i\partial_j(f \circ T^{-1})\big)\big)\,T = \big((\partial_i\partial_j f)\big) \circ T^{-1}. \tag{16.30}$$
This is used with $f$ as the function $x \mapsto q(t; x, T^{-1}w)$, for fixed $t$ and $T^{-1}w$, to arrive at (16.25) using (16.28) and (16.29). The boundary condition $\partial q/\partial x^{(1)} \equiv e \cdot \operatorname{grad} q = 0$ on $\partial H^k$ becomes, using the first relation in (16.30),
$$e \cdot (T'\operatorname{grad} r)(t; z, w) = 0 \qquad (z \in \partial H = \{\gamma \cdot z = 0\}). \tag{16.31}$$
Recall that $O'D^{1/2}\gamma = |D^{1/2}\gamma|\,e$ and $T = D^{1/2}O$, so that $Te = D\gamma/|D^{1/2}\gamma|$; this expresses (16.31) as
$$(D\gamma)\cdot(\operatorname{grad} r)(t; z, w) = 0 \qquad \text{if } z \in \partial H. \tag{16.32}$$
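The linear algebra behind Proposition 16.2 can be checked numerically. The sketch below builds $D^{1/2}$ from an eigendecomposition and takes $O$ to be a Householder reflection exchanging $e$ and $u := D^{1/2}\gamma/|D^{1/2}\gamma|$; this is one convenient orthogonal $O$, not one prescribed by the text. It then verifies that $Te$ is a multiple of the conormal $D\gamma$ (the step from (16.31) to (16.32)), that $TT' = D$, and that $T = D^{1/2}O$ maps points of $\bar{H}^k$ into $\bar{H} = \{\gamma \cdot z \ge 0\}$. The matrix, vectors, and helper names are hypothetical.

```python
import numpy as np


def sqrtm_spd(D):
    """Symmetric positive definite square root D^{1/2} via eigendecomposition."""
    w, V = np.linalg.eigh(D)
    return (V * np.sqrt(w)) @ V.T


def householder_to(u):
    """Orthogonal (symmetric) O with O e = u and O' u = e, for a unit vector u."""
    k = len(u)
    e = np.zeros(k)
    e[0] = 1.0
    v = e - u
    if np.allclose(v, 0.0):
        return np.eye(k)
    return np.eye(k) - 2.0 * np.outer(v, v) / (v @ v)


D = np.array([[2.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 1.5]])
gamma = np.array([1.0, -2.0, 0.5])     # normal vector of the half-space {gamma . z >= 0}
mu = np.array([0.4, 0.1, -0.3])

Dh = sqrtm_spd(D)
u = Dh @ gamma / np.linalg.norm(Dh @ gamma)
O = householder_to(u)
T = Dh @ O
nu = np.linalg.solve(T, mu)            # drift of {Y_t}: (D^{1/2} O)^{-1} mu

e = np.array([1.0, 0.0, 0.0])
print(T @ e * np.linalg.norm(Dh @ gamma) - D @ gamma)   # ~ 0: T e is parallel to D gamma

rng = np.random.default_rng(2)
Y = rng.standard_normal((1000, 3))
Y[:, 0] = np.abs(Y[:, 0])                 # points of the closed half-space {y1 >= 0}
print((Y @ T.T @ gamma).min() >= -1e-12)  # gamma . (T y) >= 0 for all of them
```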
∎

Since a diffusion on $\bar{H}^k$ with spatially varying drift coefficients and constant diffusion matrix $I$ may be constructed by the method of Proposition 16.1, the preceding result may be proved with $\{Y_t\}$ taken as such a diffusion. This leads to a Markov process (diffusion) on $\bar{H}$ whose transition probability density $r$
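satisfies the backward equation
$$\frac{\partial r(t; z, w)}{\partial t} = \frac{1}{2}\sum_{i,j=1}^k d_{ij}\frac{\partial^2 r}{\partial z^{(i)}\partial z^{(j)}} + \sum_{i=1}^k \mu^{(i)}(z)\frac{\partial r}{\partial z^{(i)}}\ \ldots$$
… $|(\operatorname{grad}\varphi)(x)| \ge c > 0$ for $x \in \partial G := \{x: \varphi(x) = 0\}$. (16.35)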
Write $G := \{\varphi(x) > 0\}$. Let $D(x) := ((d_{ij}(x)))$ and $\mu^{(i)}(x)$ ($1 \le i \le k$) satisfy the assumptions (1)–(4) of Section 14 on $\bar{G}$.
Definition 16.2. Let $\{X_t\}$ be a Markov process on the state space $\bar{G}$ in (16.34) that has continuous sample paths and a transition probability density $q$ satisfying the Kolmogorov backward equation
$$\frac{\partial q(t; x, y)}{\partial t} = Aq := \frac{1}{2}\sum_{i,j=1}^k d_{ij}(x)\frac{\partial^2 q}{\partial x^{(i)}\partial x^{(j)}} + \sum_{i=1}^k \mu^{(i)}(x)\frac{\partial q}{\partial x^{(i)}} \qquad (t > 0;\ x \in G,\ y \in \bar{G}), \tag{16.36}$$
and the backward boundary condition
$$\big(D(x)(\operatorname{grad}\varphi)(x)\big)\cdot\operatorname{grad} q = 0 \qquad (t > 0;\ x \in \partial G,\ y \in \bar{G}). \tag{16.37}$$
Then $\{X_t\}$ is called a reflecting diffusion on $\bar{G}$ having drift coefficients $\mu^{(i)}(\cdot)$ and diffusion coefficients $d_{ij}(\cdot)$ ($1 \le i, j \le k$). The vector $D(x)(\operatorname{grad}\varphi)(x)$ at $x \in \partial G$ (or any nonzero multiple of it) is said to be conormal to the boundary at $x$. The reflection of $\{X_t\}$ is said to be in the direction of the conormal to the boundary.
Note that, according to standard terminology, $(\operatorname{grad}\varphi)(x)$ is normal to the boundary of $G$ at $x \in \partial G$.
So far we have considered domains (16.34) with $\varphi(x) = x^{(1)}$. Now let $\varphi(x) = 1 - |x|^2$, so that $\bar{G} = \{x \in \mathbb{R}^k: |x| \le 1\}$ is the closed unit ball.
Example 2. The reflecting standard Brownian motion $\{X_t\}$ on the unit ball $\bar{B}(0{:}\,1) = \{x \in \mathbb{R}^k: |x| \le 1\}$ has a transition probability density $p$ satisfying the backward equation
$$\frac{\partial p(t; x, y)}{\partial t} = \frac{1}{2}\Delta_x p(t; x, y) \qquad (t > 0;\ |x| < 1,\ |y| \le 1), \tag{16.38}$$
and the boundary condition
$$\sum_{i=1}^k x^{(i)}\frac{\partial p(t; x, y)}{\partial x^{(i)}} = 0 \qquad (t > 0;\ |x| = 1,\ |y| \le 1). \tag{16.39}$$
It is useful to note that the radial motion $\{R_t := |X_t|\}$ is a Markov process on $[0, 1]$. This follows from the fact that the Laplacian $\Delta_x$ transforms radial functions into radial functions (Proposition 14.1) and the boundary condition is radial in nature. Indeed, if $f$ is a twice continuously differentiable function on $[0, 1]$ and $g(x) = f(|x|)$, then $g$ is a twice continuously differentiable radial function on $\bar{B}(0{:}\,1)$, and the function
$$u(t, x) := (T_t g)(x) = E_x g(X_t) = E_x f(|X_t|) \tag{16.40}$$
is the solution to the initial-value problem
$$\frac{\partial u(t, x)}{\partial t} = \frac{1}{2}\Delta_x u(t, x) \quad (t > 0;\ |x| < 1), \qquad \sum_{i=1}^k x^{(i)}\frac{\partial u(t, x)}{\partial x^{(i)}} = 0 \quad (|x| = 1), \qquad \lim_{t\downarrow 0} u(t, x) = f(|x|) \quad (|x| \le 1). \tag{16.41}$$
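Let $v(t, r)$ be the unique solution to
$$\frac{\partial v(t, r)}{\partial t} = \frac{1}{2}\frac{\partial^2 v}{\partial r^2} + \frac{k - 1}{2r}\frac{\partial v}{\partial r} \quad (t > 0;\ 0 < r < 1), \qquad \frac{\partial v(t, r)}{\partial r} = 0 \quad (t > 0;\ r = 1), \qquad \lim_{t\downarrow 0} v(t, r) = f(r). \tag{16.42}$$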
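The identity behind (16.42) is that, for $g(x) = f(|x|)$ on $\mathbb{R}^k$, $\Delta_x g(x) = f''(r) + \frac{k-1}{r}f'(r)$ with $r = |x|$ (the radial case of Proposition 14.1). The sketch below checks this with central finite differences; the test function $f(r) = e^{-r^2}$, the evaluation point, and the step size are arbitrary choices.

```python
import numpy as np


def laplacian_fd(g, x, h=1e-3):
    """Central-difference approximation of the Laplacian of g at the point x."""
    total = 0.0
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = h
        total += (g(x + step) - 2.0 * g(x) + g(x - step)) / h**2
    return total


def radial_operator(f1, f2, r, k):
    """f''(r) + (k - 1) f'(r) / r, i.e. twice the generator in (16.42) applied to f."""
    return f2(r) + (k - 1) * f1(r) / r


f = lambda r: np.exp(-r**2)
f1 = lambda r: -2.0 * r * np.exp(-r**2)               # f'
f2 = lambda r: (4.0 * r**2 - 2.0) * np.exp(-r**2)     # f''

for k in (2, 3, 5):
    x = np.linspace(0.1, 0.4, k)                      # a point with 0 < |x| < 1
    lhs = laplacian_fd(lambda z: f(np.linalg.norm(z)), x)
    rhs = radial_operator(f1, f2, np.linalg.norm(x), k)
    print(k, lhs, rhs)                                 # the two values should agree closely
```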
… $(t > 0;\ y \in G)$. (17.4)

Here, $n(x) = (0, x')$ is the normal (of length $a$) at the boundary point $x = (x^{(1)}, x')$.
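The Fokker–Planck or forward equation may now be shown, by arguments entirely analogous to those given in Sections 2 and 6, to be
$$\frac{\partial p}{\partial t} = \frac{1}{2}D_0\Delta_y p - \frac{\partial}{\partial y^{(1)}}\big(F(y')p\big) \qquad (y \in G,\ t > 0). \tag{17.5}$$
The forward boundary condition is the no-flux condition (see Section 2)
$$y^{(2)}\frac{\partial p}{\partial y^{(2)}} + y^{(3)}\frac{\partial p}{\partial y^{(3)}} \equiv n(y)\cdot\operatorname{grad} p = 0 \qquad (y \in \partial G,\ t > 0). \tag{17.6}$$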
By using the divergence theorem, which is a multidimensional analog of integration by parts, it is simple to check, in the manner of Sections 2 and 6, that (17.5) and (17.6) are indeed the forward (or adjoint) conditions. Given an initial solute concentration distribution $c_0(dx)$, the solute concentration $c(t; y)$ at $y$ at time $t$ is then given by
$$c(t; y) = \int_G p(t; x, y)\,c_0(dx), \tag{17.7}$$
and it satisfies the Fokker–Planck equations (17.5) and (17.6). Since (a) the diffusion matrix is $D_0 I$, (b) the drift velocity is along the $x^{(1)}$-axis and depends only on $x^{(2)}, x^{(3)}$, and (c) the boundary condition only involves $x^{(2)}, x^{(3)}$, it should be at least intuitively clear that $\{X_t\}$ has the following representation: 1. $\{X_t' := (X_t^{(2)}, X_t^{(3)})\}$ is a two-dimensional reflecting Brownian motion on
the disc $B_a := \{|y'| \le a\}$ … 0. It follows, as in Proposition 6.3 of Chapter II, that for some positive constants $c_1$, $c_2$ one has
$$\max_{x', y' \in B_a} |p'(t; x', y') - \gamma(y')| \le c_1 e^{-c_2 t} \qquad (t > 0), \tag{17.10}$$
where $\gamma(y')$ is the unique invariant probability for $p'$. Let us check that $\gamma(y')$ is the uniform density,
$$\gamma(y') = \frac{1}{\pi a^2} \qquad (|y'| \le a). \tag{17.11}$$
It is enough to show that
$$\frac{\partial}{\partial t}\int_{\{|x'| \le a\}} p'(t; x', y')\,\gamma(x')\,dx' = 0. \tag{17.12}$$
But $p'$ satisfies the backward equation
$$\frac{\partial p'}{\partial t} = \frac{1}{2}D_0\Delta_{x'} p' \qquad (|x'| < a), \tag{17.13}$$
along with the boundary condition
$$x^{(2)}\frac{\partial p'}{\partial x^{(2)}} + x^{(3)}\frac{\partial p'}{\partial x^{(3)}} = 0 \qquad (|x'| = a). \tag{17.14}$$
Interchanging the order of differentiation and integration on the left side of (17.12), and using (17.13), (17.14), and the divergence theorem, (17.12) is established. In other words, the adjoint operator, which in this symmetric case is the same as the infinitesimal generator, annihilates constants.
As a consequence of (17.10), the concentration $c(t; y)$ gets uniformized, or becomes constant, in the $y'$-plane. That is,
$$\tilde{c}(t; y') := \int_{-\infty}^{\infty} c(t; y)\,dy^{(1)} \longrightarrow \tilde{c}_0 := \frac{\int_G c_0(dx)}{\pi a^2} \tag{17.15}$$
exponentially fast as $t \to \infty$. Of much greater interest, however, is the asymptotic behavior of the concentration in the $y^{(1)}$-direction, i.e., of
$$\bar{c}(t; y^{(1)}) := \int_{\{|y'| \le a\}} c(t; y)\,dy' = \int_G c_0(dx)\int_{\{|y'| \le a\}} p(t; x, y)\,dy'. \tag{17.16}$$
The study of the asymptotics of $\bar{c}(t; y^{(1)})$ is further simplified by the fact that the radial process $\{R_t' := |X_t'|\}$ is Markovian. This is a consequence of the facts that (1) the backward operator $\frac{1}{2}D_0\Delta_{x'}$ of $\{X_t'\}$ transforms all smooth radial functions on the disc $B_a$ into radial functions and (2) the boundary condition (17.14) is radial, asserting that the derivative in the radial direction at the boundary vanishes (see Proposition 14.1). Hence $\{R_t'\}$ is a diffusion on $[0, a]$ whose transition probability density is
$$q(t; r, r') := \int_{\{|y'| = r'\}} p'(t; x', y')\,s_{r'}(dy') \qquad (|x'| = r), \tag{17.17}$$
where $s_{r'}(dy')$ is the arc length measure on the circle $\{|y'| = r'\}$. The infinitesimal generator of $\{R_t'\}$ is given by the backward operator
$$A_R := \frac{1}{2}D_0\Big(\frac{d^2}{dr^2} + \frac{1}{r}\frac{d}{dr}\Big), \qquad 0 < r < a, \tag{17.18}$$
and the backward reflecting boundary condition
$$\left.\frac{d}{dr}\right|_{r = a} = 0. \tag{17.19}$$
One may arrive at (17.18) and (17.19) from (17.17), (17.13), and (17.14), as in Example 2 of Section 16 (see Eq. (16.42) for $k = 2$). It follows from (17.10), (17.11), and (17.17) that $\{R_t'\}$ has the unique invariant density
$$\pi(r) = \frac{2r}{a^2}, \qquad 0 < r < a. \tag{17.20}$$
… $\infty$. Hence, by Scheffé's theorem (Section 3 of Chapter 0), the density of $Y_t^{(n)}$ converges to that of $N(0, Dt)$ in the $L^1$-norm. Since the distribution of $X_t^{(1)}$ has the density $\bar{c}(t; \cdot)/C_0$, where $C_0 := \int_G c_0(dx)$ is the total amount of solute present, the density of $Y_t^{(n)}$ is given by
$$z \mapsto n^{1/2}\,\bar{c}\big(nt;\ n^{1/2}z + \tfrac{1}{2}U_0 nt\big)/C_0. \tag{17.31}$$
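One therefore has, writing $\varepsilon = n^{-1/2}$,
$$\int\Big|\varepsilon^{-1}\bar{c}\big(\varepsilon^{-2}t;\ \varepsilon^{-1}z + \tfrac{1}{2}U_0\varepsilon^{-2}t\big)/C_0 - (2\pi Dt)^{-1/2}\exp\Big\{-\frac{z^2}{2Dt}\Big\}\Big|\,dz \to 0 \qquad \text{as } \varepsilon \downarrow 0. \tag{17.32}$$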
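Another way of expressing (17.32) is by changing variables $z' = \varepsilon^{-1}z$, $t' = \varepsilon^{-2}t$. Then (17.32) becomes
$$\int\Big|\bar{c}\big(t';\ z' + \tfrac{1}{2}U_0 t'\big)/C_0 - (2\pi Dt')^{-1/2}\exp\Big\{-\frac{z'^2}{2Dt'}\Big\}\Big|\,dz' \to 0 \qquad \text{as } t' \uparrow \infty. \tag{17.33}$$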
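The centering velocity $\tfrac12 U_0$ in (17.31)–(17.33) is the average of the axial drift over the invariant law of $X_t'$, which by (17.11) is uniform on the disc, i.e. has radial density (17.20). The sketch below assumes, for illustration only, the Poiseuille profile $F(y') = U_0(1 - |y'|^2/a^2)$ (the profile is not restated in this part of the text) and evaluates $\int_0^a F(r)\,\pi(r)\,dr$ both by quadrature and by sampling uniform points in $B_a$. Function names and parameter values are hypothetical.

```python
import numpy as np


def poiseuille(r, U0, a):
    """Assumed axial velocity profile F at radius r (illustrative choice)."""
    return U0 * (1.0 - (r / a) ** 2)


def mean_drift_quadrature(U0, a, n=100_001):
    """Trapezoidal rule for int_0^a F(r) * (2 r / a^2) dr, using pi(r) of (17.20)."""
    r = np.linspace(0.0, a, n)
    vals = poiseuille(r, U0, a) * 2.0 * r / a**2
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(r)) / 2.0)


def mean_drift_monte_carlo(U0, a, n, rng):
    """Average F over points drawn uniformly from the disc B_a (invariant law (17.11))."""
    radius = a * np.sqrt(rng.uniform(size=n))   # sqrt makes the points uniform in area
    return float(np.mean(poiseuille(radius, U0, a)))


U0, a = 3.0, 0.5
print(mean_drift_quadrature(U0, a))                                       # ~ U0 / 2 = 1.5
print(mean_drift_monte_carlo(U0, a, 400_000, np.random.default_rng(3)))  # ~ 1.5
```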
From a practical point of view, (17.33) says that at times $t'$ much larger than the relaxation time over which the error in (17.10) becomes negligible, the concentration along the capillary axis becomes Gaussian. The center of mass of the solute moves with velocity $\frac{1}{2}U_0$ along the capillary axis, with a dispersion $D$ per unit of time. The relaxation time in (17.10) is $1/c_2$, and $-c_2$ is estimated by the first (i.e., closest to zero) nonzero eigenvalue of $\frac{1}{2}D_0\Delta_{x'}$ on the disc $B_a$ with the no-flux, or Neumann, boundary condition.

It remains to give a proof of the representation (17.8). First, let us show that $\{X_t = (X_t^{(1)}, X_t')\}$ as defined by (17.8) is a time-homogeneous Markov process. Fix $s$, $t$ positive, and let the initial state be $x_0 = (x_0^{(1)}, x_0')$. Write
$$X_{t+s}^{(1)} = X_t^{(1)} + \int_0^s F\big((X_t'^{+})_u\big)\,du + \sqrt{D_0}\,(B_{t+s} - B_t), \tag{17.34}$$
where $X_t'^{+}$ is the after-$t$ process $\{(X_t'^{+})_u := X_{t+u}': u \ge 0\}$. Let $g$ be a bounded measurable function on $\bar{G}$. Since $\{B_u: 0$ … and $X_t'^{+}$, the inner conditional expectation on the right in (17.35) may be expressed as
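$$E\Big(h\Big(s,\ X_t^{(1)} + \int_0^s F\big((X_t'^{+})_u\big)\,du,\ X_{t+s}'\Big) \,\Big|\, \{B_u: 0 \le u \le t\}, \{X_u': 0 \le u \le t\}\Big), \tag{17.37}$$
where
$$h(s, v, z') = E\,g\big(v + \sqrt{D_0}\,(B_{t+s} - B_t),\ z'\big). \tag{17.38}$$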
To evaluate (17.37) use the facts that (1) $X_t^{(1)}$ is determined by the conditioning variables, and (2) $\{X_t'\}$ is independent of $\{B_t\}$, so that the conditional distribution of $X_t'^{+}$ given $\{B_u: 0 \le u \le t\}$, $\{X_u': 0 \le u \le t\}$ is the same as that given $\{X_u': 0 \le u \le t\}$. Then (17.37) becomes
$$\Big(E\Big(h\Big(s,\ y + \int_0^s F\big((X_t'^{+})_u\big)\,du,\ X_{t+s}'\Big) \,\Big|\, \{X_u': 0 \le u \le t\}\Big)\Big)_{y = X_t^{(1)}}. \tag{17.39}$$
But $\{X_t'\}$ is a time-homogeneous Markov process. Therefore, (17.39) becomes
$$\big(h_1(s, y, X_t')\big)_{y = X_t^{(1)}} = h_1(s, X_t^{(1)}, X_t'), \tag{17.40}$$
where $h_1$ is some function on $[0, \infty) \times \bar{G}$. The expression (17.40) is already determined by $\{X_u: 0 \le u \le t\}$. Therefore, the outer conditional expectation in (17.35) is the same as (17.40), completing the proof that $\{X_t\}$ is a time-homogeneous Markov process. Observe next that, by Proposition 14.1, if $\{Z_t := (Z_t^{(1)}, Z_t')\}$ is a Markov process on $\bar{G}$ whose transition probability density satisfies the Kolmogorov equations (17.3) and (17.4), then $\{Z_t'\}$ is a Markov process on the disc $B_a$ whose transition probability density satisfies the Kolmogorov equations (17.13) and (17.14). Since the latter are also satisfied by $\{X_t'\}$, (17.8(i)) is established. It is enough then to identify the conditional probability density of $X_t^{(1)}$, given $X_0 = x$, with that of $Z_t^{(1)}$, given $Z_0 = x$. Let $g$ be a smooth bounded function on $\mathbb{R}^1$, and let $\tilde{g}$ denote the function on $\bar{G}$ defined by $\tilde{g}(x) = g(x^{(1)})$. Then, denoting by $E_x$
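expectation given $X_0 = x$,
$$(T_t\tilde{g})(x) := E_x\tilde{g}(X_t) = E_x g(X_t^{(1)}). \tag{17.41}$$
By (17.8), as $t \downarrow 0$ one has, by a Taylor expansion,
$$\begin{aligned}
(T_t\tilde{g})(x) &= g(x^{(1)}) + g'(x^{(1)})\,E_x\Big(\int_0^t F(X_s')\,ds + \sqrt{D_0}\,B_t\Big) + \tfrac{1}{2}g''(x^{(1)})\,E_x\Big(\int_0^t F(X_s')\,ds + \sqrt{D_0}\,B_t\Big)^2 + o(t)\\
&= g(x^{(1)}) + g'(x^{(1)})F(x')\,t + \tfrac{1}{2}g''(x^{(1)})D_0\,t + o(t),
\end{aligned} \tag{17.42}$$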
where $g'$, $g''$ are the first and second derivatives of $g$. It follows that
$$\frac{\partial}{\partial t}(T_t\tilde{g})(x)\Big|_{t=0} = F(x')\,g'(x^{(1)}) + \tfrac{1}{2}D_0\,g''(x^{(1)}). \tag{17.43}$$
But the right side is $A\tilde{g}$, where $A$ is as in (17.3). Therefore, the infinitesimal generator of $\{X_t\}$ agrees with that of $\{Z_t\}$ for functions depending only on the first coordinate. In particular, the backward equation for $T_t\tilde{g}$ becomes
$$\frac{\partial}{\partial t}T_t\tilde{g}(x) = A\,T_t\tilde{g}(x), \tag{17.44}$$
so that, by the uniqueness of the solution to the initial-value problem, $E_x g(X_t^{(1)}) = (T_t\tilde{g})(x) = E\big(g(Z_t^{(1)}) \mid Z_0 = x\big)$. Thus, the conditional distribution of $X_t^{(1)}$, given $X_0 = x$, is the same as that of $Z_t^{(1)}$, given $Z_0 = x$. This completes the proof of (17.8). It may be noted that, although the kinetic-theoretic arguments given earlier provide a justification for the validity of the Fokker–Planck equations (17.5) and (17.6) (with $c(t; y)$ replacing $p(t; x, y)$), they are not needed for the analysis of the asymptotics of $c(t; y)$ carried out above. We have simply given a probabilistic proof, using the central limit theorem for Markov processes, of an important analytical result.
EXERCISES

Exercises for Section V.1

1. Check (1.2) for (i) a Brownian motion with drift $\mu$ and diffusion coefficient $\sigma^2 > 0$, and (ii) the Ornstein–Uhlenbeck process.
2. Show that if (1.2) holds then so does (1.2)' for every $\varepsilon > 0$.

3. Suppose that $\{X_t\}$ is a Markov process on $S = (a, b)$, and $\varphi$ is a continuous strictly monotone function on $(a, b)$ onto $(c, d)$. (i) Prove that $\{\varphi(X_t)\}$ is a Markov process. (ii) Compute the transition probability density for the process $\{X_t := \exp\{B_t\}\}$ in Example 1.

4. Consider the Ornstein–Uhlenbeck velocity process $\{V_t\}$ starting at $V_0 = v$. Let $\gamma = \beta/m$ (Example 2). (i) Show that
$$\operatorname{Cov}(V_t, V_{t+s}) = \frac{\sigma^2}{2\gamma}\big(1 - e^{-2\gamma t}\big)e^{-\gamma s}, \qquad s > 0.$$
(ii) Calculate the limit of the transition probability density $p(t; x, y)$ in (1.6) as $t \to \infty$ if (a) $\gamma$ … 0. (iii) Define the Ornstein–Uhlenbeck position process by $X_t = x + \int_0^t V_s\,ds$, $t \ge 0$. (a) Show that $\{X_t\}$ is a Gaussian process. (b) Show $EX_t = x + v\gamma^{-1}(1 - e^{-\gamma t})$. (c) Show
$$\operatorname{Var}X_t = \frac{\sigma^2}{\gamma^2}\,t - \frac{3\sigma^2}{2\gamma^3} + \frac{\sigma^2}{2\gamma^3}\big(4e^{-\gamma t} - e^{-2\gamma t}\big).$$
(d) Show
$$\operatorname{Cov}(X_s, X_{s+t} - X_s) = \frac{\sigma^2}{2\gamma^3}\big(1 - e^{-\gamma t}\big)\big(1 - e^{-\gamma s}\big)^2.$$
(iv) Explain why $\{X_t\}$ is not a Markov process. (v) Write $\gamma = \beta m^{-1}$ and let $\sigma^2 = \gamma^2\sigma_0^2$ for some $\sigma_0^2 > 0$. Use (iii) to show that as $\gamma \to \infty$ the process $\{X_t\}$ converges to a Brownian motion with zero drift and diffusion coefficient $\sigma_0^2$.

5. In Example 3, take $\{X_t\}$ to be an Ornstein–Uhlenbeck process and $f(t) = ct$ ($c \ne 0$). Compute the transition probability density of $\{Z_t = X_t + ct\}$.

6. Let $v \in \mathbb{R}^1$, $v \ne 0$, be an arbitrary (nonrandom) constant. Define a deterministic motion by $X_t = X_0 + vt$, $t \ge 0$, i.e., with randomness only in the initial distribution of $X_0$. Show that $\{X_t\}$ is a Markov process having continuous sample paths and satisfying (1.2) but with $\sigma^2 = 0$. Does the transition probability distribution have a density?
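As a numerical cross-check of Exercise 4(iii)(c) and (d) (not a solution), one route assumes the parametrization $dV_t = -\gamma V_t\,dt + \sigma\,dB_t$, under which $X_t - EX_t = \frac{\sigma}{\gamma}\int_0^t \big(1 - e^{-\gamma(t-u)}\big)\,dB_u$. The Itô isometry then gives $\operatorname{Var}X_t = \frac{\sigma^2}{\gamma^2}\int_0^t (1 - e^{-\gamma w})^2\,dw$ and $\operatorname{Cov}(X_s, X_{s+t} - X_s) = \frac{\sigma^2}{\gamma^2}(1 - e^{-\gamma t})\int_0^s (e^{-\gamma w} - e^{-2\gamma w})\,dw$. The sketch compares these integrals, computed by the trapezoidal rule, with the closed forms stated in (c) and (d); parameter values are arbitrary.

```python
import numpy as np


def trapezoid(y, x):
    """Composite trapezoidal rule."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def var_closed_form(t, sigma, gamma):
    """Closed form of Var X_t from Exercise 4(iii)(c)."""
    return ((sigma**2 / gamma**2) * t - 3 * sigma**2 / (2 * gamma**3)
            + sigma**2 / (2 * gamma**3) * (4 * np.exp(-gamma * t) - np.exp(-2 * gamma * t)))


def var_integral(t, sigma, gamma, n=200_001):
    """(sigma/gamma)^2 * int_0^t (1 - e^{-gamma w})^2 dw, by the Ito isometry."""
    w = np.linspace(0.0, t, n)
    return (sigma / gamma) ** 2 * trapezoid((1 - np.exp(-gamma * w)) ** 2, w)


def cov_closed_form(s, t, sigma, gamma):
    """Closed form of Cov(X_s, X_{s+t} - X_s) from Exercise 4(iii)(d)."""
    return sigma**2 / (2 * gamma**3) * (1 - np.exp(-gamma * t)) * (1 - np.exp(-gamma * s)) ** 2


def cov_integral(s, t, sigma, gamma, n=200_001):
    """(sigma/gamma)^2 (1 - e^{-gamma t}) int_0^s (e^{-gamma w} - e^{-2 gamma w}) dw."""
    w = np.linspace(0.0, s, n)
    inner = trapezoid(np.exp(-gamma * w) - np.exp(-2 * gamma * w), w)
    return (sigma / gamma) ** 2 * (1 - np.exp(-gamma * t)) * inner


print(var_closed_form(1.3, 0.8, 1.7), var_integral(1.3, 0.8, 1.7))
print(cov_closed_form(0.9, 0.4, 0.8, 1.7), cov_integral(0.9, 0.4, 0.8, 1.7))
```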
Exercises for Section V.2

1. If $\int_a^b f(x)h_1(x)\,dx = \int_a^b f(x)h_2(x)\,dx$ for all twice continuously differentiable functions $f$ vanishing outside some closed, bounded subinterval of $(a, b)$, then prove that $h_1(x) = h_2(x)$ outside a set of Lebesgue measure zero.
2. Show that the derivations of (2.13) and of the backward equation (2.15) do not require the assumption of the existence of a density of the transition probability.
3. Let $f$ be a twice continuously differentiable function on $[c, d]$. Show that, given $\varepsilon > 0$, there exists a twice-differentiable $g$ on $\mathbb{R}^1$ such that $g = f$ on $[c, d]$ and $g$ vanishes outside $[c - \varepsilon, d + \varepsilon]$.
4. (i) Show that (1.2) in Section 1 holds uniformly for all $x$ in the following cases: (a) $\mu(x) = 0$, $\sigma^2(x) = \sigma^2$; (b) $\mu(x) = \mu$, $\sigma^2(x) = \sigma^2$. (ii) Show that, for the Ornstein–Uhlenbeck process, $P_x(|X_t - x| > \varepsilon) = o(t)$ does not hold uniformly in $x$.
5. Let $p(t; x, y)$ be the transition probability density of a diffusion on $\mathbb{R}^1$ and $\lambda$ a real number. Write $q(t; x, y) := \exp\{-\lambda t\}\,p(t; x, y)$. (i) Show that $q$ satisfies the Chapman–Kolmogorov equation (2.2), and the backward and forward equations
$$\frac{\partial q}{\partial t} = \mu(x)\frac{\partial q}{\partial x} + \frac{1}{2}\sigma^2(x)\frac{\partial^2 q}{\partial x^2} - \lambda q, \qquad \frac{\partial q}{\partial t} = -\frac{\partial}{\partial y}\big(\mu(y)q\big) + \frac{\partial^2}{\partial y^2}\big(\tfrac{1}{2}\sigma^2(y)q\big) - \lambda q.$$
(ii) (Initial-Value Problem) Let $f$ be a bounded continuous function and define $(T_t f)(x) := \int f(y)\,q(t; x, y)\,dy$. Show that the function $u(t, x) := (T_t f)(x)$ solves the initial-value problem
$$\frac{\partial u}{\partial t} = \mu(x)\frac{\partial u}{\partial x} + \frac{1}{2}\sigma^2(x)\frac{\partial^2 u}{\partial x^2} - \lambda u \quad (x \in \mathbb{R}^1,\ t > 0), \qquad \lim_{t\downarrow 0} u(t, x) = f(x).$$
(iii) (Killing at a Constant Rate) Let $\{X_t\}$ be a diffusion with transition probability density $p$, and $\zeta$ an exponentially distributed random variable independent of $\{X_t\}$ and having parameter $\lambda$. Then $q$ may be interpreted as the (defective) transition probability density of the process $\{X_t\}$ killed at time $\zeta$. More specifically, show that the function $u$ in (ii) may be represented as $E\big(f(X_t)\mathbf{1}_{\{\zeta > t\}} \mid X_0 = x\big)$. (iv) (Killing) Let $\lambda(x)$ be a continuous, nonnegative function on $\mathbb{R}^1$. Define the operators $T_t$, acting on bounded continuous functions $f$, by
$$(T_t f)(x) := E\Big(f(X_t)\exp\Big\{-\int_0^t \lambda(X_s)\,ds\Big\} \,\Big|\, X_0 = x\Big),$$
where $\{X_t\}$ is a diffusion on $\mathbb{R}^1$. Show that the operators $T_t$ have the semigroup property, and that $u(t, x) := (T_t f)(x)$ solves the initial-value problem in (ii) with $\lambda = \lambda(x)$. (v) Note that $(T_t f)(x)$ is finite (indeed, $T_t f$ is bounded if $f$ is bounded) if $\lambda(x) \ge 0$. One may also express $T_t f$ as
$$(T_t f)(x) = E\big(f(X_t)\mathbf{1}_{\{\zeta_{\{X\}} > t\}} \mid X_0 = x\big),$$
where, conditionally given the sample path $\{X_s: s \ge 0\}$, the killing time $\zeta_{\{X\}}$ has the (nonhomogeneous) exponential distribution
$$P\big(\zeta_{\{X\}} > t \mid \{X_s: s \ge 0\}\big) = \exp\Big\{-\int_0^t \lambda(X_s)\,ds\Big\}.$$
The killing times $\zeta_{\{X\}}$ may take the value $+\infty$ with positive probability. If $\{X_t\}$ is defined on a probability space of trajectories $(\Omega_1, \mathcal{F}_1, P_1)$, then show that by enlarging the space appropriately one may define both $\{X_t\}$ and $\zeta$ on a common probability space $(\Omega, \mathcal{F}, P)$. [Hint: First construct the product space $(\Omega_2, \mathcal{F}_2, P_2)$ for an independent family of random variables $\{\zeta_{\omega_1}\}$ indexed by the set $\Omega_1$ of all trajectories $\omega_1$. That is, $\Omega_2 = \times_{\omega_1 \in \Omega_1} I_{\omega_1}$, where $I_{\omega_1} = [0, \infty]$ for all $\omega_1 \in \Omega_1$. Then take the product space $(\Omega = \Omega_1 \times \Omega_2,\ \mathcal{F} = \mathcal{F}_1 \otimes \mathcal{F}_2,\ P = P_1 \times P_2)$.]
(vi) (Feynman–Kac Formula) If $\lambda(x)$ is bounded below and continuous, and not necessarily nonnegative, show that $T_t f$ given in (iv) is well defined, defines a semigroup, and solves the initial-value problem (ii) with $\lambda = \lambda(x)$.

6. (Adjoint Diffusions) Let $p(t; x, y)$ denote the transition probability density of a diffusion on $\mathbb{R}^1$ with coefficients satisfying Condition (1.1). In addition, assume that
$$\frac{1}{2}\frac{d^2}{dx^2}\big(\sigma^2(x)\big) - \frac{d}{dx}\mu(x) = 0 \qquad (x \in \mathbb{R}^1).$$
(i) Show that $\tilde{p}(t; x, y) := p(t; y, x)$ is a transition probability density of a diffusion on $\mathbb{R}^1$ and compute its drift and diffusion coefficients. What is the forward equation for $\tilde{p}$? (ii) (Time Reversibility) Let $\{X_t\}$ and $\{Y_t\}$ be diffusions with transition probability densities $p$ and $\tilde{p}$ as above, with $X_0 \equiv x$, $Y_0 \equiv y$. Prove that for arbitrary time points $0 < t_1 < \cdots < t_n < t$ the conditional p.d.f. of $(X_{t_1}, X_{t_2}, \ldots, X_{t_n})$ given $X_t = y$, evaluated at $(x_1, x_2, \ldots, x_n)$, is the same as the conditional p.d.f. of $(Y_{t-t_n}, \ldots, Y_{t-t_1})$ given $Y_t = x$, evaluated at $(x_n, x_{n-1}, \ldots, x_1)$. Thus, the sample path of one process over any finite time period cannot be distinguished probabilistically from that of the other with the direction of time reversed.

7. (Sources and Sinks) Consider the Fokker–Planck equation with a source (or sink) term $h(t, y)$,
$$\frac{\partial c(t, y)}{\partial t} = -\frac{\partial}{\partial y}\big(\mu(y)c(t, y)\big) + \frac{\partial^2}{\partial y^2}\big(\tfrac{1}{2}\sigma^2(y)c(t, y)\big) + h(t, y),$$
where $h(t, y)$ is a bounded continuous function of $t$ and $y$, and $\int |h(t, x)|\,p(t; x, y)\,dx < \infty$. One may interpret $h(t, y)$ (in case $h \ge 0$) as the rate at which new solute particles are created at $y$ at time $t$, providing an additional source for the change in concentration with time other than the flux. The contribution of the initial concentration $c_0$ to the concentration at $y$ at time $t$ is
$$c_1(t, y) := \int c_0(x)\,p(t; x, y)\,dx.$$
The contribution due to the $h(s, x)\,\Delta x\,\Delta s$ new particles created in the region $[x, x + \Delta x]$ during the time $[s, s + \Delta s]$ is $h(s, x)\,p(t - s; x, y)\,\Delta s\,\Delta x$ ($0 \le s < t$). Integrating, the overall contribution from this to the concentration at $y$ at time $t$ is
$$c_2(t, y) = \int_0^t\int_{-\infty}^{\infty} h(s, x)\,p(t - s; x, y)\,dx\,ds.$$
(i) (Duhamel's Principle) Show that $c_1$ satisfies the homogeneous Fokker–Planck equation with $h = 0$ and initial concentration $c_0$, while $c_2$ satisfies the nonhomogeneous Fokker–Planck equation above with initial concentration zero. Hence $c := c_1 + c_2$ solves the nonhomogeneous Fokker–Planck equation above with initial concentration $c_0$. Assume here that $c_0(x)$ is integrable, and
$$\lim_{s\downarrow 0}\int h(t - s, x)\,p(s; x, y)\,dx = h(t, y).$$
Show that this last condition is satisfied, for example, under the hypothesis of Exercise 6. *Can you make use of Exercise 5 to verify it more generally? (ii) Apply (i) to the case in which $\mu(x)$, $\sigma^2(x)$, $h(t, x)$ are constants. (iii) Solve the corresponding initial-value problem for the backward equation.
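Exercise 5(iii) can be illustrated by simulation. The sketch below assumes $\{X_t\}$ is a Brownian motion with zero drift and diffusion coefficient $\sigma^2$ and takes $f(x) = \cos x$ (both illustrative choices). Then $E_x f(X_t) = e^{-\sigma^2 t/2}\cos x$, so the function of Exercise 5(ii) is $u(t, x) = e^{-(\lambda + \sigma^2/2)t}\cos x$, which indeed satisfies $\partial u/\partial t = \frac12\sigma^2 u'' - \lambda u$. A Monte Carlo estimate of $E\big(f(X_t)\mathbf{1}_{\{\zeta > t\}} \mid X_0 = x\big)$, with $\zeta$ exponential and independent of the path, should match it up to sampling error.

```python
import numpy as np


def killed_expectation_mc(t, x, sigma, lam, n, rng):
    """Monte Carlo estimate of E[f(X_t) 1{zeta > t} | X_0 = x] with f = cos."""
    x_t = x + sigma * np.sqrt(t) * rng.standard_normal(n)   # Brownian motion at time t
    zeta = rng.exponential(scale=1.0 / lam, size=n)         # independent killing time, rate lam
    return float(np.mean(np.cos(x_t) * (zeta > t)))


def killed_expectation_exact(t, x, sigma, lam):
    """u(t, x) = exp(-lam t) * E_x cos(X_t) = exp(-(lam + sigma^2 / 2) t) cos x."""
    return float(np.exp(-(lam + 0.5 * sigma**2) * t) * np.cos(x))


rng = np.random.default_rng(4)
t, x, sigma, lam = 0.8, 0.4, 1.1, 0.6
print(killed_expectation_mc(t, x, sigma, lam, 500_000, rng))
print(killed_expectation_exact(t, x, sigma, lam))
```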
Exercises for Section V.3

1. Derive (3.21) using (3.15) and Proposition 3.1.

2. Show that the process $\{Z_t\}$ of Example 4 does not have a finite first moment.

3. By a change of variables, directly derive the backward equation satisfied by the transition probability density $\tilde{p}$ of $\{Z_t\} := \{\varphi(X_t)\}$, using the corresponding equation for $p$. From this read off the drift and diffusion coefficients of $\{Z_t\}$.

4. (Natural Scale for Diffusions) For given drift and diffusion coefficients $\mu(\cdot)$, $\sigma^2(\cdot) > 0$ on $S = (a, b)$, define
$$I(x, z) := \int_x^z \frac{2\mu(y)}{\sigma^2(y)}\,dy \qquad (x \le z),$$
with the usual convention $I(x, z) := -I(z, x)$ for $z < x$. Fix an arbitrary $x_0 \in S$, and define the scale function
$$s(x) = \int_{x_0}^x \exp\{-I(x_0, z)\}\,dz,$$
again following the convention $\int_{x_0}^x = -\int_x^{x_0}$ for $x < x_0$. If $\{X_t\}$ is a diffusion on $S$ with coefficients $\mu(\cdot)$, $\sigma^2(\cdot)$, find the drift and diffusion coefficients of $\{s(X_t)\}$ (see Section 9 for a justification of the nomenclature).
5. Use Corollary 2.4 to prove that the following stronger version of the last relation
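in (1.2)' holds: for each $x \in \mathbb{R}^1$ and each $\varepsilon > 0$,
$$P_x\Big(\max_{0 \le t \le h}|X_t - x| > \varepsilon\Big) = o(h) \qquad \text{as } h \downarrow 0.$$
… Show that $\{Y_t\}$ is an Ornstein–Uhlenbeck process. Note that continuity of sample paths follows from that of $\{B(t)\}$.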
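For Exercise 4 of this set, the scale function can be tabulated numerically. The sketch below (helper names are hypothetical) takes $x_0 = c$, computes $s$ on a grid over $[c, d]$ by two cumulative trapezoidal integrations, and evaluates $(s(d) - s(y))/(s(d) - s(c))$, the probability of reaching $c$ before $d$ from $y$ once the process is put on its natural scale. For constant $\mu$ and $\sigma^2$ it compares with the closed form $(e^{-\theta y} - e^{-\theta d})/(e^{-\theta c} - e^{-\theta d})$, $\theta = 2\mu/\sigma^2$, and in the driftless case with $(d - y)/(d - c)$.

```python
import numpy as np


def cumtrapz(y, x):
    """Cumulative trapezoidal integral, equal to 0 at x[0]."""
    out = np.zeros_like(y)
    out[1:] = np.cumsum((y[1:] + y[:-1]) * np.diff(x) / 2.0)
    return out


def scale_function(mu, sigma2, c, d, n=20_001):
    """Grid values of s(x) = int_c^x exp{-I(c, z)} dz on [c, d], taking x_0 = c."""
    x = np.linspace(c, d, n)
    i_vals = cumtrapz(2.0 * mu(x) / sigma2(x), x)   # I(c, x)
    s = cumtrapz(np.exp(-i_vals), x)                 # s(x), with s(c) = 0
    return x, s


def prob_hit_c_before_d(mu, sigma2, c, d, y):
    """(s(d) - s(y)) / (s(d) - s(c)), interpolating s at the starting point y."""
    x, s = scale_function(mu, sigma2, c, d)
    s_y = np.interp(y, x, s)
    return float((s[-1] - s_y) / (s[-1] - s[0]))


mu0, sig2 = 0.7, 1.3
theta = 2 * mu0 / sig2
c, d, y = -1.0, 2.0, 0.25
numeric = prob_hit_c_before_d(lambda x: mu0 + 0 * x, lambda x: sig2 + 0 * x, c, d, y)
closed = (np.exp(-theta * y) - np.exp(-theta * d)) / (np.exp(-theta * c) - np.exp(-theta * d))
print(numeric, closed)
```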
Exercises for Section V.4

1. Show that (4.4), (4.5) are solved by (4.2).

2. Derive (4.17).

3. (i) Using (4.17), derive the probability $\rho_x$ that, starting at $x$, $\{X_t\}$ ever reaches $c$. (ii) Find a necessary and sufficient condition for $\{X_t\}$ to be recurrent.

4. Apply the criterion in Exercise 3(ii) to the Ornstein–Uhlenbeck process. What if $\gamma = \beta/m > 0$?

5. (i) Using the results of Section 3 of Chapter III, give an informal derivation of a necessary and sufficient condition for the existence of a unique invariant probability distribution for $\{X_t\}$ (i.e., for $p(t; x, y)\,dy$).
(ii) Compute this invariant density. (iii) Show that the Ornstein-Uhlenbeck process $\{V_t\}$ has a unique invariant distribution and compute this distribution (called the Maxwell-Boltzmann velocity distribution). (iv) Show that under the invariant (Maxwell-Boltzmann) distribution,
$$\mathrm{Cov}_\pi(V_t, V_{t+s}) = \frac{\sigma^2}{2\gamma}\,e^{-\gamma s}, \qquad s > 0.$$
(v) Calculate the average kinetic energy $E(\tfrac12 m V_t^2)$ of an Ornstein-Uhlenbeck particle under the invariant initial distribution. (vi) According to Bochner's Theorem (Chapter 0, Section 8), one can check that $\rho(s) = \mathrm{Cov}_\pi(V_t, V_{t+s})$ in (iv) is the Fourier transform of a measure $\mu$ (the spectral distribution). Calculate $\mu$. 6. Briefly indicate how the Ornstein-Uhlenbeck process may be viewed as a limit of the Ehrenfest model (see Section 5 of Chapter III) as the number of balls $d$ goes to infinity. 7. Let
$$s(x) = \int_{x_0}^{x} \exp\Big\{-\int_{x_0}^{z} \frac{2\mu(y)}{\sigma^2(y)}\,dy\Big\}\,dz$$
be the so-called scale function on $S = (a, b)$, where $x_0$ is some point in $S$. If $\{X_t\}$ is a diffusion on $S$ with coefficients $\mu(\cdot)$, $\sigma^2(\cdot)$, then show that the diffusion $\{Y_t := s(X_t)\}$ on $(s(a), s(b))$ has the property
$$P(\{Y_t\} \text{ reaches } c \text{ before } d \mid Y_0 = y) = \frac{d - y}{d - c}, \qquad c < y < d.$$
[…] $(t > 0,\ -\infty < x, y < \infty)$, along the lines of the derivation of the backward equation (4.14) given in this section. [Hint: $p^{(n+1)} = p^{(n)}p$.]
9. (Maxwell-Boltzmann Velocities) A monatomic gas consists of a large number $N$ ($\approx 10^{23}$) of molecules of identical masses $m$ each. Label the velocity of the $i$th molecule by $\mathbf{v}_i = (v_{3i-2}, v_{3i-1}, v_{3i})$, for $i = 1, 2, \ldots, N$. To say that the gas is an ideal gas means that molecular interactions are ignored. In particular, the only energy represented in an ideal monatomic gas is the translational kinetic energy of individual molecular motions. The total energy is therefore
$$T = T(v_1, v_2, \ldots, v_{3N}) = \sum_{j=1}^{3N} \tfrac12 m v_j^2 = \tfrac12 m\|v\|_2^2, \qquad v = (v_1, \ldots, v_{3N}) \in \mathbb{R}^{3N},$$
where $\|v\|_2^2 := \sum_{j=1}^{3N} v_j^2$. Define equilibrium for a gas in a closed system of energy $E$
to mean that the velocities are (purely random) uniformly distributed over the energy surface $S$ given by the surface of the $3N$-dimensional ball of radius $R = (2E/m)^{1/2}$, i.e.,
$$S = \Big\{(v_1, v_2, \ldots, v_{3N}): \sum_{j=1}^{3N} v_j^2 = \frac{2E}{m}\Big\}.$$
One may also define temperature in proportion to $E$. (i) Show that the distribution of the $j$th component of velocity is given by $P(a$ […] $> 0$. (iii) How do the above calculations change when the $\ell^2$-norm $\|\cdot\|_2$ defined above is replaced by the $\ell^p$-norm defined by $\|v\|_p^p := \sum_{j=1}^{3N} |v_j|^p$, where $p \ge 1$ is fixed? (This interesting generalization was brought to our attention by Professor Svetlozar T. Rachev.)
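Part (i) of Exercise 9 can be explored numerically before it is proved. The Python sketch below (an illustration with arbitrary values of $N$, $E$ and $m$; the helper name is hypothetical) samples points uniformly on the energy surface by normalizing a vector of independent standard normal variables, a standard way of generating the uniform distribution on a sphere. By symmetry each coordinate has mean 0 and variance $R^2/(3N) = 2E/(3Nm)$, and for large $N$ a single coordinate should look approximately normal, which is the Maxwell-Boltzmann picture.

```python
import numpy as np

def sample_energy_surface(n_dim: int, radius: float, n_samples: int,
                          rng: np.random.Generator) -> np.ndarray:
    """Uniform samples on the sphere {v in R^n_dim : ||v||_2 = radius}."""
    g = rng.standard_normal((n_samples, n_dim))
    # An isotropic Gaussian vector divided by its length is uniform on the unit sphere.
    return radius * g / np.linalg.norm(g, axis=1, keepdims=True)

rng = np.random.default_rng(0)
N, E, m = 100, 150.0, 1.0          # hypothetical gas: 100 molecules, 3N = 300 coordinates
n_dim = 3 * N
radius = np.sqrt(2.0 * E / m)
v = sample_energy_surface(n_dim, radius, n_samples=10_000, rng=rng)

v1 = v[:, 0]                         # one velocity component
target_var = 2.0 * E / (n_dim * m)   # exact variance R^2 / (3N), by symmetry
frac_within_1sd = np.mean(np.abs(v1) < np.sqrt(target_var))
print(v1.var(), target_var, frac_within_1sd)   # the fraction should be roughly 0.68
```

The mean kinetic energy per coordinate is then $\tfrac12 m \cdot 2E/(3Nm) = E/(3N)$, consistent with defining temperature in proportion to $E$.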
Exercises for Section V.5 1. Extend the argument given in Example 1 to compute the transition probability density of a Brownian motion with drift $\mu$ and diffusion coefficient $\sigma^2 > 0$. 2. Use the method of Example 2 to compute the transition probability density of a diffusion with drift $\mu(x) = -\gamma x - g$ and diffusion coefficient $\sigma^2 > 0$. 3. (i) Use Kolmogorov's backward equation and the Fourier transform to solve the initial-value problem
$$\frac{\partial u}{\partial t} = \frac12\sigma^2\frac{\partial^2 u}{\partial x^2} + \mu\frac{\partial u}{\partial x} \qquad (t > 0,\ -\infty < x < \infty),$$
[…] $> 0$ for all $t > 0$ and $x, y \in (-\infty, \infty)$ for every diffusion whose drift and diffusion coefficients satisfy Condition (1.1). Prove that the Markov process $\{Z_t\}$ on the circle in Theorem 6.2 has a unique invariant probability $m(y)\,dy$, and that the transition probability density $q(t; x, y)$ of $\{Z_t\}$ converges to $m(y)$ in the sense that
$$\int_S |q(t; x, y) - m(y)|\,dy \to 0 \qquad \text{as } t \to \infty,$$
the convergence being exponentially fast and uniform in $x$. (ii) Compute $m(y)$. [Hint: Differentiate $\int T_t f(x)\,m(x)\,dx = \int f(x)\,m(x)\,dx$ with respect to $t$, to get $\int (Af)(x)\,m(x)\,dx = 0$ for all twice-differentiable $f$ on $S$ with compact support, so that $A^*m = 0$.]
9. Derive the forward boundary conditions for a reflecting diffusion on $[0, 1]$. 10. Suppose (a) $\{X_t\}$ and $\{-X_t\}$ have the same transition probability density, and (b) $\{1 + X_t\}$ and $\{1 - X_t\}$ have the same transition probability density. Show that the drift and diffusion coefficients of $\{X_t\}$ must be periodic with period 2.
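For Exercise 2 above (drift $-\gamma x - g$, constant $\sigma^2$), one expects a Gaussian answer: with $Y = X + g/\gamma$ the equation becomes that of an Ornstein-Uhlenbeck process. The Python sketch below writes down that candidate density (our own derivation via this substitution, valid for $\gamma \ne 0$, not quoted from the text; the function names are hypothetical) and checks by finite differences that it satisfies Kolmogorov's backward equation $\partial p/\partial t = \mu(x)\,\partial p/\partial x + \frac12\sigma^2\,\partial^2 p/\partial x^2$ in the variable $x$.

```python
import math

def ou_drift_density(t: float, x: float, y: float, gamma: float, g: float, sigma2: float) -> float:
    """Candidate transition density for drift mu(x) = -gamma*x - g, diffusion sigma2 (gamma != 0).

    With Y = X + g/gamma, Y is Ornstein-Uhlenbeck: Y_t ~ N(Y_0 e^{-gamma t}, v_t),
    v_t = sigma2 (1 - e^{-2 gamma t}) / (2 gamma), which is positive for either sign of gamma.
    """
    mean = -g / gamma + (x + g / gamma) * math.exp(-gamma * t)
    var = sigma2 * (1.0 - math.exp(-2.0 * gamma * t)) / (2.0 * gamma)
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def backward_residual(t, x, y, gamma, g, sigma2, h=1e-4):
    """dp/dt - [mu(x) dp/dx + (sigma2/2) d^2p/dx^2], using central differences in t and x."""
    p = lambda tt, xx: ou_drift_density(tt, xx, y, gamma, g, sigma2)
    dp_dt = (p(t + h, x) - p(t - h, x)) / (2 * h)
    dp_dx = (p(t, x + h) - p(t, x - h)) / (2 * h)
    d2p_dx2 = (p(t, x + h) - 2 * p(t, x) + p(t, x - h)) / h ** 2
    mu_x = -gamma * x - g
    return dp_dt - (mu_x * dp_dx + 0.5 * sigma2 * d2p_dx2)

r = backward_residual(t=0.7, x=0.4, y=-0.2, gamma=1.3, g=0.5, sigma2=0.8)
print(r)   # small, up to finite-difference error
```

The residual is only as small as the finite-difference error allows, so this is a consistency check on the candidate formula, not a proof.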
Exercises for Section V.7
1. Use the Radon-Nikodym Theorem (Section 4 of Chapter 0) to prove the existence of a nonnegative function $y \to p^0(t; x, y)$ such that $\int_B p^0(t; x, y)\,dy = p^0(t; x, B)$ for every Borel subset $B$ of $(a, \infty)$. Show that $p^0(t; x, y) \le p(t; x, y)$, where $p$ is the transition probability density of the unrestricted process $\{X_t\}$.
2. Let $T_t$ ($t > 0$) be the semigroup of transition operators for the Markov process $\{X_t\}$ in Theorem 7.1, i.e.,
$$(T_t f)(x) = E_x f(X_t) = \int_{[a, \infty)} f(y)\,p(t; x, dy)$$
for all bounded continuous $f$ on $[a, \infty)$.
(i) If $f(a) = 0$, show that (a) $(T_t f)(x) = \int_{(a, \infty)} f(y)\,p^0(t; x, y)\,dy$ $(x > a)$, and (b) $(T_t f)(x) \to 0$ as $x \downarrow a$. [Hint: Use (7.11).] (ii) Use (i) to show that $T_t^0$ ($t > 0$), defined by
$$(T_t^0 f)(x) := \int_{(a, \infty)} f(y)\,p^0(t; x, y)\,dy \qquad (x > a),$$
is a semigroup of operators on the class $C$ of all bounded continuous functions on $[a, \infty)$ vanishing at $a$. Thus $T_t^0$ is the restriction of $T_t$ to $C$ (and $T_t^0 C \subset C$). (iii) Let $A^0$ denote the infinitesimal generator of $T_t^0$, i.e., the restriction of the infinitesimal generator of $T_t$ to $C$. Give at least a rough argument to show that (7.10) holds (along with (7.11)). 3. Derive (7.13) and (7.14). 4. (Method of Images) Consider the drift and diffusion coefficients $\mu(\cdot)$, $\sigma^2(\cdot)$ defined on $[0, \infty)$. Assume $\mu(0) = 0$. Extend $\mu(\cdot)$, $\sigma^2(\cdot)$ to $\mathbb{R}^1$ as in (6.8) by setting $\mu(-x) = -\mu(x)$, $\sigma^2(-x) = \sigma^2(x)$ for $x > 0$. Let $p^0$ denote the (defective) transition probability density on $[0, \infty)$ (extended by continuity to 0) defined by (7.9). Let $p$ denote the transition probability density for a diffusion on $\mathbb{R}^1$ having the extended coefficients above. Show that
$$p^0(t; x, y) = p(t; x, y) - p(t; x, -y) = p(t; x, y) - p(t; -x, y) \qquad (t > 0;\ x, y \ge 0).$$
[Hint: Show that $p^0$ satisfies the Chapman-Kolmogorov equation (2.30) on $S = [0, \infty)$, the backward equation (7.10), the boundary condition (7.11), and the initial condition $p^0(t; x, y)\,dy \to \delta_x(dy)$ as $t \downarrow 0$ (i.e., $(T_t^0 f)(x) \to f(x)$ as $t \downarrow 0$ for every bounded continuous function $f$ on $[0, \infty)$ vanishing at 0). Then appeal to the uniqueness of such a solution (see theoretical complements 2.3, 8.1). An alternative probabilistic derivation is given in Exercise 11.11.] 5. (Method of Images) Let $\mu(\cdot)$, $\sigma^2(\cdot)$ be drift and diffusion coefficients defined on $[0, d]$ with $\mu(0) = \mu(d) = 0$, as in (6.36). Extend $\mu(\cdot)$, $\sigma^2(\cdot)$ to $\mathbb{R}^1$ as in (6.37) and (6.38) (with $d$ in place of 1). Let $p$ denote the transition probability density of a diffusion on $\mathbb{R}^1$ having these extended coefficients. Define
$$q(t; x, y) := \sum_{m=-\infty}^{\infty} p(t; x, y + 2md), \qquad -d < x, y < d,$$
$$p^0(t; x, y) := q(t; x, y) - q(t; x, -y), \qquad 0 \le x, y \le d.$$
(i) Prove that $p^0$ is the density component of the transition probability of the diffusion on $[0, d]$ absorbed at the boundary $\{0, d\}$. [Hint: Check (2.30), (7.10), and (7.11) on $[0, d]$, as well as the initial condition: $(T_t^0 f)(x) \to f(x)$ as $t \downarrow 0$ for every continuous $f$ on $[0, d]$ with $f(0) = f(d) = 0$.] (ii) For the special case $\mu(\cdot) = 0$, $\sigma^2(\cdot) = \sigma^2$, compute $p^0$.
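6. (Duhamel's Principle) (i) (Nonhomogeneous Equation) Solve the initial-boundary-value problem
$$\frac{\partial}{\partial t}u(t, x) = \mu(x)\frac{\partial u}{\partial x} + \frac12\sigma^2(x)\frac{\partial^2 u}{\partial x^2} + h(t, x) \qquad (t > 0,\ a < x < b);$$
$$\lim_{x \downarrow a} u(t, x) = 0, \qquad \lim_{x \uparrow b} u(t, x) = 0 \qquad (t > 0);$$
[…] $(a \le x \le b)$ […] $(t > 0;\ a < x < b)$;
$$\lim_{x \downarrow a} u(t, x) = \alpha, \qquad \lim_{x \uparrow b} u(t, x) = \beta; \qquad \lim_{t \downarrow 0} u(t, x) = f(x),$$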
where $f$ is a bounded continuous function on $[a, b]$ with $f(a) = \alpha$, $f(b) = \beta$. [Hint: Let $u_0(x)$ satisfy the differential equation and the boundary conditions. Find $u_1(t, x)$ which satisfies the differential equation, zero boundary conditions, and the initial condition $\lim_{t \downarrow 0} u_1(t, x) = f(x) - u_0(x)$. Then consider $u_0(x) + u_1(t, x)$.] (iii) (Time-Dependent Boundary Values) In (ii) take $\alpha = \alpha(t)$, $\beta = \beta(t)$, where $\alpha(t)$, $\beta(t)$ are continuously differentiable functions on $[0, \infty)$. [Hint: Let $u_0(t, x) := \alpha(t) + ((\beta(t) - \alpha(t))/(b - a))(x - a)$. Find $u_1(t, x)$ solving the nonhomogeneous equation
$$\frac{\partial u_1}{\partial t} = \mu(x)\frac{\partial u_1}{\partial x} + \frac12\sigma^2(x)\frac{\partial^2 u_1}{\partial x^2} + \mu(x)\frac{\partial u_0}{\partial x} - \frac{\partial u_0}{\partial t}$$
with zero boundary conditions and initial condition $\lim_{t \downarrow 0} u_1(t, x) = f(x) - u_0(0, x)$.]
7. (Maximum Principle) (i) Let $u(t, x)$ be twice differentiable with respect to $x$ and once with respect to $t$ on $(0, T] \times (a, b)$, continuous on $[0, T] \times [a, b]$, and satisfy $(\partial u/\partial t) - Au < 0$. Show that the maximum value of $u$ is attained either (a) initially, i.e., at a point $(0, x_0)$, or (b) at a point $(t, a)$ or $(t, b)$ for some $t \in (0, T]$. [Hint: Suppose not. Let the maximum value be attained at a point $(t_0, x_0)$ with $0 < t_0 \le T$ and $x_0 \in (a, b)$. Then $(Au)(t_0, x_0) \le 0$, so that $(\partial u/\partial t)(t_0, x_0) < 0$. But this means $u(t, x_0) > u(t_0, x_0)$ for some $t < t_0$.] (ii) Extend (i) to the case $\partial u/\partial t - Au \le 0$. [Hint: Let $\theta$ be a continuous function on $[a, b]$ satisfying $A\theta = 1$. For each $\varepsilon > 0$, consider $u_\varepsilon(t, x) = u(t, x) + \varepsilon\theta(x)$.] (iii) Let $u(x)$ be twice differentiable on $(a, b)$, continuous on $[a, b]$, and satisfy $(Au)(x) \ge 0$ for $a < x < b$. Show that the maximum value of $u$ is attained at $x = a$ or $x = b$. [Hint: First assume $Au > 0$; if the maximum value is attained at $x_0 \in (a, b)$, then $Au(x_0) \le 0$.] (iv) Prove that the solutions to the initial-boundary-value problems in Exercise 6 are unique.
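For $\mu \equiv 0$ and constant $\sigma^2$ in Exercise 4 above, the extended coefficients are just those of Brownian motion on $\mathbb{R}^1$, so the image formula reduces to a difference of two Gaussian densities. The Python sketch below (our illustration; the function names are hypothetical) evaluates this $p^0$ and checks numerically that its total mass equals $2\Phi(x/(\sigma\sqrt t)) - 1$, the probability that the path started at $x > 0$ has not hit 0 by time $t$; that value follows by integrating the two Gaussian terms over $y > 0$.

```python
import math

def gauss(t: float, z: float, sigma2: float) -> float:
    """Density of N(0, sigma2 * t) at z."""
    return math.exp(-z * z / (2 * sigma2 * t)) / math.sqrt(2 * math.pi * sigma2 * t)

def p_absorbed_halfline(t: float, x: float, y: float, sigma2: float = 1.0) -> float:
    """Image formula p0(t; x, y) = p(t; x, y) - p(t; x, -y) for Brownian motion absorbed at 0."""
    return gauss(t, y - x, sigma2) - gauss(t, y + x, sigma2)

def survival_numeric(t: float, x: float, sigma2: float = 1.0,
                     y_max: float = 20.0, n: int = 20000) -> float:
    """Trapezoid approximation of int_0^inf p0(t; x, y) dy, truncated at y_max."""
    h = y_max / n
    total = 0.5 * (p_absorbed_halfline(t, x, 0.0, sigma2) + p_absorbed_halfline(t, x, y_max, sigma2))
    for k in range(1, n):
        total += p_absorbed_halfline(t, x, k * h, sigma2)
    return total * h

t, x = 0.5, 0.8
numeric = survival_numeric(t, x)
exact = math.erf(x / math.sqrt(2 * t))   # equals 2*Phi(x / sqrt(t)) - 1 when sigma2 = 1
print(numeric, exact)
```

Note that $p^0(t; 0, y) = 0$ identically, which is the absorbing boundary condition at 0 seen from the starting point.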
[…]
$$\frac{1}{\sqrt{2\pi\sigma^2 t}}\sum_{m=-\infty}^{\infty}\Big[\exp\Big\{-\frac{(2md + y - x)^2}{2\sigma^2 t}\Big\} - \exp\Big\{-\frac{(2md + y + x)^2}{2\sigma^2 t}\Big\}\Big] = \frac{2}{d}\sum_{m=1}^{\infty}\exp\Big\{-\frac{\sigma^2\pi^2m^2 t}{2d^2}\Big\}\sin\frac{m\pi x}{d}\,\sin\frac{m\pi y}{d} \qquad (0 \le x, y \le d).$$
Use these to derive Jacobi's identity for the theta function,
$$\Theta(z) := \sum_{m=-\infty}^{\infty}\exp\{-\pi m^2 z\} = \frac{1}{\sqrt z}\,\Theta\Big(\frac1z\Big) \qquad (z > 0).$$
[Hint: Compare (8.23) with (6.31), and (8.33) with Exercise 7.5(ii).]
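The image-series and sine-series expressions for $p^0$ displayed above, and Jacobi's identity, are easy to check numerically once the series are truncated. The following Python sketch (ours; the truncation levels are arbitrary choices) compares the two forms of $p^0$ for Brownian motion on $[0, d]$ absorbed at $0$ and $d$, and then compares $\Theta(z)$ with $z^{-1/2}\Theta(1/z)$.

```python
import math

def p0_images(t: float, x: float, y: float, d: float, sigma2: float = 1.0, M: int = 50) -> float:
    """Image series: Gaussian terms at 2md + y - x minus those at 2md + y + x, |m| <= M."""
    c = 1.0 / math.sqrt(2 * math.pi * sigma2 * t)
    total = 0.0
    for m in range(-M, M + 1):
        total += math.exp(-(2 * m * d + y - x) ** 2 / (2 * sigma2 * t))
        total -= math.exp(-(2 * m * d + y + x) ** 2 / (2 * sigma2 * t))
    return c * total

def p0_sines(t: float, x: float, y: float, d: float, sigma2: float = 1.0, M: int = 200) -> float:
    """Eigenfunction series: (2/d) sum_{m=1}^{M} exp(-sigma2 pi^2 m^2 t / (2 d^2)) sin(m pi x/d) sin(m pi y/d)."""
    total = 0.0
    for m in range(1, M + 1):
        total += (math.exp(-sigma2 * math.pi ** 2 * m ** 2 * t / (2 * d ** 2))
                  * math.sin(m * math.pi * x / d) * math.sin(m * math.pi * y / d))
    return 2.0 * total / d

def theta(z: float, M: int = 50) -> float:
    """Theta(z) = sum_{m=-M}^{M} exp(-pi m^2 z), truncated."""
    return sum(math.exp(-math.pi * m * m * z) for m in range(-M, M + 1))

print(p0_images(0.3, 0.4, 0.7, 1.0), p0_sines(0.3, 0.4, 0.7, 1.0))
print(theta(0.37), theta(1 / 0.37) / math.sqrt(0.37))
```

The image series converges fast for small $t$ and the sine series for large $t$; their agreement for every $t$ is exactly what drives Jacobi's identity.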
9. (Hermite Polynomials and the Ornstein-Uhlenbeck Process) Consider the generator $A = \frac12\,d^2/dx^2 - x\,d/dx$ on the state space $\mathbb{R}^1$. (i) Prove that $A$ is symmetric on an appropriate dense subspace of $L^2(\mathbb{R}^1, e^{-x^2}dx)$. (ii) Check that $A$ has eigenfunctions $H_n(x) := (-1)^n\exp\{x^2\}(d^n/dx^n)\exp\{-x^2\}$, the so-called Hermite polynomials, with corresponding eigenvalues $-n$ ($n = 0, 1, 2, \ldots$). (iii) Give some justification for the expansion
$$p(t; x, y) = e^{-y^2}\sum_{n=0}^{\infty} c_n e^{-nt} H_n(x) H_n(y), \qquad c_n := \frac{1}{2^n\,n!\,\sqrt\pi}.$$
10. According to the theory of Fourier series, the functions $\cos nx$ ($n = 0, 1, 2, \ldots$), $\sin nx$ ($n = 1, 2, \ldots$) form a complete orthogonal system in $L^2[-\pi, \pi]$. Use this to prove the following: (i) The functions $\cos nx$ ($n = 0, 1, 2, \ldots$) form a complete orthogonal sequence in $L^2[0, \pi]$. [Hint: Let $f \in L^2[0, \pi]$. Make an even extension of $f$ to $[-\pi, \pi]$, and show that this $f$ may be expanded in $L^2[-\pi, \pi]$ in terms of $\cos nx$ ($n = 0, 1, \ldots$).] (ii) The functions $\sin nx$ ($n = 1, 2, \ldots$) form a complete orthogonal sequence in $L^2[0, \pi]$. [Hint: Extend $f$ to $[-\pi, \pi]$ by setting $f(-x) = -f(x)$ for $x \in [-\pi, 0)$.]
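Part (ii) of Exercise 9, and the normalization $\int H_m H_n e^{-x^2}\,dx = \delta_{mn}\,2^n n!\sqrt\pi$ that fixes the constants in the expansion of part (iii), can be checked with NumPy's `numpy.polynomial.hermite` module, whose `Hermite` class uses the same (physicists') polynomials $H_n$ as the Rodrigues formula in Exercise 9. The sketch below (ours) applies $A = \frac12 d^2/dx^2 - x\,d/dx$ to $H_n$ and compares the result with $-nH_n$, then uses Gauss-Hermite quadrature for the weight $e^{-x^2}$ to form the Gram matrix.

```python
import math
import numpy as np
from numpy.polynomial.hermite import Hermite, hermgauss

x = np.linspace(-2.0, 2.0, 9)
eigen_ok = []
for n in range(6):
    H = Hermite.basis(n)                       # physicists' Hermite polynomial H_n
    AH = 0.5 * H.deriv(2)(x) - x * H.deriv(1)(x)
    eigen_ok.append(np.allclose(AH, -n * H(x)))

nodes, weights = hermgauss(20)                 # exact for polynomial integrands of degree <= 39
gram = np.array([[np.sum(weights * Hermite.basis(i)(nodes) * Hermite.basis(j)(nodes))
                  for j in range(6)] for i in range(6)])
expected = np.diag([2.0 ** n * math.factorial(n) * math.sqrt(math.pi) for n in range(6)])
print(eigen_ok, np.allclose(gram, expected))
```

Since $e^{-x^2}/\sqrt\pi$ is the invariant density of this Ornstein-Uhlenbeck process, letting $t \to \infty$ in the expansion of part (iii) leaves only the $n = 0$ term, $e^{-y^2}/\sqrt\pi$.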
Exercises for Section V.9 1. Check (9.13). 2. Derive (9.20) by solving the appropriate two-point boundary-value problem.
3. Suppose $S = [a, b)$ and $a$ is reflecting. Show that the diffusion is recurrent if and only if $s(b) = \infty$. If, on the other hand, $s(b) < \infty$, then one has $\rho_{xy} = 1$ for $y > x$ and
$$\rho_{xy} = \frac{s(b) - s(x)}{s(b) - s(y)} = \frac{\int_x^b \exp\{-I(x_0, z)\}\,dz}{\int_y^b \exp\{-I(x_0, z)\}\,dz}, \qquad y < x.$$
[Hint: For $y > x$, assume the fact that the transition probability density $p(t; x, y)$ is strictly positive and continuous in $x, y$ for each $t > 0$, to show that $\delta := \min\{P_z(X_{t_0} > y): a \le z \le y\} > 0$ for $t_0 > 0$.]
4. Suppose $S = [a, b)$ and $a$ is absorbing. Then show that no state, other than $a$, is recurrent and that
$$\rho_{xy} = \frac{s(x) - s(a)}{s(y) - s(a)} \qquad \text{if } a < x \le y,$$
[…] if $s(b) = \infty$, and $a$ […] 0;
(iii) $S = (-\infty, \infty)$, $\mu(x) = -\beta x$, $\sigma^2(x) = \sigma^2 > 0$ (consider separately $\beta > 0$, $\beta < 0$); (iv) $S = [0, \infty)$, $\mu(x) \equiv \mu$, $\sigma^2(x) = \sigma^2 > 0$, 0 reflecting (consider separately the cases $\mu < 0$, $\mu = 0$, $\mu > 0$); (v) $S = (0, 1)$, $\mu(x) \to \infty$ as $x \downarrow 0$, $\mu(x) \to -\infty$ as $x \uparrow 1$, $\sigma^2(x) > \sigma^2 > 0$ for all $x$. 7. Suppose $S = [a, b]$ with $a$ absorbing and $b$ reflecting. Show that all states but $a$ are transient, $\rho_{xy} = 1$ for $y < x$, and
$$\rho_{xy} = \frac{s(x) - s(a)}{s(y) - s(a)}, \qquad y > x.$$
8. Suppose $S = [a, b]$ with both boundaries absorbing. Show that all states, other than $a$ and $b$, are transient and
$$\rho_{xy} = \frac{s(b) - s(x)}{s(b) - s(y)} \quad (a \le y < x), \qquad \rho_{xy} = \frac{s(x) - s(a)}{s(y) - s(a)} \quad (x < y \le b).$$
[…] $> 0$, show that $M(x)$, defined by (10.3), is bounded on $[c, d]$. [Hint: Use Proposition 13.5 of Chapter I.] (ii) Let $\tau$ be as in (10.2), $x \in (c, d)$. Assume that
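$$P_x\Big(\max_{0 \le t \le h}|X_t - x| > \varepsilon\Big) = o(h) \qquad \text{as } h \downarrow 0$$
for all $\varepsilon > 0$ (this is proved in Exercise 3.5). Show that $E_x\big(\mathbf{1}_{\{\tau \le h\}}M(X_h)\big) = o(h)$ as $h \downarrow 0$. Check the assumption in the case of constant coefficients.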
2. Show that (10.15) holds if $s(b) = \infty$. 3. Assume $s(b) = \infty$. (i) Show that (10.16) is necessary and sufficient for finiteness of $E_x\tau_c$ $(c < x)$. (ii) Prove (10.17) under the assumption (10.16). 4. Suppose $s(a) = -\infty$. (i) Prove that, for $x < d$, $E_x\tau_d$ […] (ii) Derive (10.19).
[…] $> 0$, and proceed as in Exercise 9.3, or use Proposition 13.5 of Chapter I.] Show that (10.19) holds in case of positive recurrence. (ii) Suppose $S = [a, b]$ with $a$ and $b$ reflecting boundaries. Show that the diffusion is positive recurrent. Show that (10.17) and (10.19) hold. 6. Apply Proposition 10.2 and Exercise 5 above (as well as Exercise 9.6) to classify the diffusions in Exercise 9.6 into transient, null recurrent and positive recurrent ones. 7. Let $[c, d] \subset S$. Solve the two-point boundary-value problem $Af(x) = -g(x)$ for $c < x < d$, $\lim_{x \downarrow c} f(x) = \alpha$, $\lim_{x \uparrow d} f(x) = \beta$, where $g$ is a given bounded measurable (or continuous) function on $[c, d]$ and $\alpha$, $\beta$ are given constants. Show that the solution represents
$$E_x\int_0^{\tau_c \wedge \tau_d} g(X_s)\,ds + \alpha P_x(\tau_c < \tau_d) + \beta P_x(\tau_d < \tau_c).$$
8. Show that, under natural scale (i.e., $s(x) = x$), $m_1'(x) \ge m_2'(x)$ for all $x$ implies $M_1(x) \ge M_2(x)$ for all $x$.
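Exercise 7 above can also be approached numerically. The Python sketch below (ours; the name `solve_two_point_bvp` and the uniform-grid central-difference scheme are our choices, not the book's) solves $Af = -g$ on $(c, d)$ with $f(c) = \alpha$, $f(d) = \beta$, for $A = \mu(x)\,d/dx + \frac12\sigma^2(x)\,d^2/dx^2$. With $g \equiv 1$ and $\alpha = \beta = 0$, the representation in Exercise 7 identifies $f(x)$ with the mean exit time $E_x(\tau_c \wedge \tau_d)$, which for standard Brownian motion is $(x - c)(d - x)$.

```python
import numpy as np

def solve_two_point_bvp(mu, sigma2, g, c, d, alpha, beta, n=400):
    """Central differences for mu(x) f' + (sigma2(x)/2) f'' = -g(x), f(c) = alpha, f(d) = beta.

    mu, sigma2, g are applied to NumPy arrays (scalar return values are broadcast).
    Returns the grid x and the approximate solution f(x).
    """
    x = np.linspace(c, d, n + 1)
    h = (d - c) / n
    xi = x[1:-1]                                   # interior nodes
    ones = np.ones_like(xi)
    a = 0.5 * sigma2(xi) * ones / h**2             # weight from (sigma2/2) f''
    b = mu(xi) * ones / (2.0 * h)                  # weight from mu f'
    rhs = -g(xi) * ones
    k = n - 1
    A = np.zeros((k, k))
    i = np.arange(k)
    A[i, i] = -2.0 * a
    A[i[1:], i[1:] - 1] = (a - b)[1:]              # coefficient of f_{i-1}
    A[i[:-1], i[:-1] + 1] = (a + b)[:-1]           # coefficient of f_{i+1}
    rhs[0] -= (a - b)[0] * alpha                   # move known boundary values to the right side
    rhs[-1] -= (a + b)[-1] * beta
    f = np.empty(n + 1)
    f[0], f[-1] = alpha, beta
    f[1:-1] = np.linalg.solve(A, rhs)
    return x, f

# Standard Brownian motion (mu = 0, sigma^2 = 1), g = 1, alpha = beta = 0 on [0, 2]:
# f(x) = E_x(tau_0 ^ tau_2) = x (2 - x).
x, f = solve_two_point_bvp(lambda x: 0.0, lambda x: 1.0, lambda x: 1.0, 0.0, 2.0, 0.0, 0.0)
print(np.max(np.abs(f - x * (2.0 - x))))
```

With $g \equiv 0$, $\alpha = 0$, $\beta = 1$ the same routine returns $P_x(\tau_d < \tau_c)$, which should match $(s(x) - s(c))/(s(d) - s(c))$ up to discretization error.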
Exercises for Section V.11 1. Prove that $\tau_c$ and $\tau_c \wedge \tau_d$ are stopping times, by showing that they are measurable with respect to […], respectively. 2. Prove that $\tau_1 \wedge \tau_2$ is a stopping time if $\tau_1$ and $\tau_2$ are. 3. (i) Suppose that $E(Z\,h(X_{\tau + t_1}, \ldots, X_{\tau + t_k}) \mid \mathcal{F}_\tau)$ […] and $P(\tau_{2n+1} < \infty \mid \tau_{2n} < \infty)$ […] $= 1$. (c) By (b), $P(\tau_{2n} = \infty$ for some $n) = 1$.] (ii) From (i) conclude that $P(|B_t| \to \infty$ as $t \to \infty) = 1$. 6. Let $\{B_t\}$ be a two-dimensional standard Brownian motion. Let $B = \{z: |z - z_0| < \varepsilon\}$ with $z_0$ and $\varepsilon > 0$ arbitrary. Prove that $P(\sup\{t \ge 0: B_t \in B\} = \infty) = 1$. 7. Let $\{X_t\}$ be a $k$-dimensional diffusion with periodic drift and diffusion coefficients, the period being $d > 0$ in each coordinate. Write $Z_t^{(i)} := X_t^{(i)} \pmod d$, $Z_t := (Z_t^{(1)}, \ldots, Z_t^{(k)})$. (i) Use Proposition 14.1 to show that $\{Z_t\}$ is a Markov process on the torus $T = [0, d)^k$, which may be regarded as a Cartesian product of $k$ circles. (ii) Assuming that the transition probability density $p(t; x, y)$ of $\{X_t\}$ is continuous and positive for all $t > 0$ and all $x, y$, prove that the transition probability $q(t; z, z')\,dz'$ of $\{Z_t\}$ admits a unique invariant probability $\pi(z')\,dz'$ and that
$\max_{z, z' \in T}|q(t; z, z') - \pi(z')|$ […] 1) and $\varphi$ a bounded Borel measurable function on $\partial G$. Suppose $P_x(\tau_{\partial G} < \infty) = 1$ for all $x \in G$, where $P_x$ denotes the distribution of $\{B_t\}$ starting at $x$. Show that $u(x) := E_x\varphi(B_{\tau_{\partial G}})$ has the mean value property in $G$. [Hint: $u(x) = E_x u(B_{\tau_{\partial B(x:r)}})$ if $\{y: |y - x| \le r\} \subset G$, by the strong Markov property. Next note that the $P_0$-distribution of $\{B_t\}$ is the same as that of $\{OB_t\}$ for every orthogonal transformation $O$.] (ii) Show that if $u$ has the mean value property in $G$, then $u$ is infinitely differentiable. [Hint: Fix $x \in G$. Let $B(x:2r) \subset G$, and let $\psi$ be an infinitely differentiable, radially symmetric p.d.f. vanishing outside $B(0:r)$. Let $\tilde u := u$ on $B(x:2r)$ and zero outside. Show that $u = \tilde u * \psi$ on $B(x:r)$.] (iii) Show that if $u$ has the mean value property in $G$, then it is harmonic in $G$, i.e., $\Delta u = 0$ in $G$. [Hint:
$$u(x) = u(y) + \sum_i (x^{(i)} - y^{(i)})\frac{\partial u}{\partial x^{(i)}}(y) + \frac12\sum_{i,j}(x^{(i)} - y^{(i)})(x^{(j)} - y^{(j)})\frac{\partial^2 u}{\partial x^{(i)}\partial x^{(j)}}(y) + O(\rho^3),$$
where $\rho = |x - y|$. Integrate with respect to the uniform distribution on $\{x: |x - y| = \rho\}$. Show that the integral of the first sum is zero, that of the second sum is $(\rho^2/2k)\Delta u(y)$, and that of the remainder is $O(\rho^3)$, so that $(\rho^2/2k)\Delta u(y) + O(\rho^3)$ is zero for small $\rho$.] (iv) Show that a harmonic function in $G$ has the mean value property. [Hint: Use the divergence theorem.] 11. Check that (i) the function $\psi_f$ in (15.28) is harmonic,
(ii) $\int_{\partial G}\psi(x; y)\,s(dy) = 1$ for $x \in G$, and (iii) $\psi(x; y)\,s(dy)$ converges weakly to $\delta_{y_0}(dy)$ as $x \to y_0 \in \partial G$.
12. (Maximum Principle) Let $u$ be harmonic in a connected open set $G \subset \mathbb{R}^k$. Show that $u$ cannot attain its infimum or supremum in $G$ unless $u$ is constant in $G$. [Hint: Use the mean value property.] 13. (Dirichlet Problem) Let $G$ be a bounded, connected, open subset of $\mathbb{R}^k$. Given a continuous function $\varphi$ on $\partial G$, show, using Exercises 10 and 12, that (i) $u(x) := E_x\varphi(B_{\tau_{\partial G}})$ is a solution of the Dirichlet problem
$$\Delta u(x) = 0 \quad \text{for } x \in G, \qquad u(x) = \varphi(x) \quad \text{for } x \in \partial G,$$
and (ii) if $u$ is continuous at the boundary, i.e., $u(y) = \lim u(x)$ as $x\ (\in G) \to y \in \partial G$, then this solution is unique in the class of solutions which are continuous on $\bar G$.
14. (Poisson's Equation) Let $G$ be as in Exercise 13. Give at least an informal proof, akin to that of (15.25), that
$$u(x) := E_x\int_0^{\tau_{\partial G}} g(X_s)\,ds + E_x\varphi(X_{\tau_{\partial G}})$$
satisfies Poisson's equation
$$\tfrac12\Delta u(x) = -g(x) \quad \text{for } x \in G, \qquad u(x) = \varphi(x) \quad \text{for } x \in \partial G.$$
Here $g$, $\varphi$ are given continuous functions on $\bar G$ and $\partial G$, respectively. Assuming $u$ is continuous at the boundary, show that this solution of Poisson's equation is unique in the class of all continuous solutions on $\bar G$. 15. Extend the function $F$ in (15.37) to a twice continuously differentiable function on $[0, \infty)$ vanishing outside $[c_1, d_1]$, where $0 < c_1 < c < d < d_1$ […] $M$, where $z \in \mathbb{R}^k$, $\delta > 0$, $M > 0$ are given. Apply Theorem 15.3 (and Exercise 19) to decide whether the diffusion is recurrent or transient.
(b) […]
21. For a diffusion having a positive and continuous (in $x, y$) transition density $p(t; x, y)$ $(t > 0)$, prove that the $P_x$-distribution of $\tau_{\partial B(0:d)}$ has a finite m.g.f. in a neighborhood of zero if $|x| < d$.
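The probabilistic solution of the Dirichlet problem in Exercise 13 suggests a simple Monte Carlo method, usually called the walk-on-spheres method, which rests on the mean value property of Exercise 10: from the current point, Brownian motion first leaves the largest ball around it contained in $G$ at a uniformly distributed point of that ball's boundary sphere. The Python sketch below (ours; the unit disk in $\mathbb{R}^2$, the boundary data $\varphi(y) = y_1^2 - y_2^2$ and the stopping tolerance `eps` are illustrative assumptions) estimates $u(x) = E_x\varphi(B_{\tau_{\partial G}})$. Since $y_1^2 - y_2^2$ is harmonic, the exact answer is $x_1^2 - x_2^2$.

```python
import math
import random

def walk_on_spheres_disk(x: float, y: float, phi, rng: random.Random, eps: float = 1e-4) -> float:
    """One sample of phi at the exit point of planar Brownian motion started at (x, y) in the unit disk."""
    while True:
        r = 1.0 - math.hypot(x, y)             # radius of the largest circle around (x, y) inside the disk
        if r < eps:
            norm = math.hypot(x, y)
            return phi(x / norm, y / norm)     # project the nearly-exited point onto the boundary
        angle = rng.uniform(0.0, 2.0 * math.pi)
        x += r * math.cos(angle)               # uniform exit point on that circle
        y += r * math.sin(angle)

def dirichlet_estimate(x: float, y: float, phi, n_walks: int = 10_000, seed: int = 1) -> float:
    """Monte Carlo estimate of u(x, y) = E phi(B_tau), averaging independent walks."""
    rng = random.Random(seed)
    return sum(walk_on_spheres_disk(x, y, phi, rng) for _ in range(n_walks)) / n_walks

phi = lambda u, v: u * u - v * v              # boundary data; u^2 - v^2 is harmonic in the plane
estimate = dirichlet_estimate(0.3, 0.2, phi)
print(estimate, 0.3 ** 2 - 0.2 ** 2)
```

The walk never simulates the Brownian path itself; it only uses the rotational symmetry and the strong Markov property, as in the hint to Exercise 10(i).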
Exercises for Section V.16 1. Compute the transition probability density of a standard Brownian motion on $G = \{x \in \mathbb{R}^k: x^{(i)} \ge 0 \text{ for } 1 \le i \le k\}$ with (normal) reflection at the boundary $(k > 1)$.
2. (i) Compute the transition probability density $q$ of a standard Brownian motion on $G = \{x \in \mathbb{R}^k: 0 \le x^{(i)} \le 1 \text{ for } 1 \le i \le k\}$ with (normal) reflection at the boundary.
(ii) Prove that
$$\max_{x, y \in \bar G}|q(t; x, y) - 1| \le c_1 e^{-c_2 t} \qquad (t > 0),$$
for some positive constants $c_1$, $c_2$.
3. Prove that $\{Z_t\}$ defined by (16.27) is a Markov process on $H = \{x \in \mathbb{R}^k: \gamma \cdot x > 0\}$. 4. Check (16.30). 5. Check that the function $v(t, |x|)$ given by the solution of (16.42) (with $r = |x|$) satisfies (16.41).
THEORETICAL COMPLEMENTS
Theoretical Complements to Section V.1 1. A construction of diffusions on $S = \mathbb{R}^1$ by the method of stochastic differential equations is given in Chapter VII (Theorem 2.2), under the assumption that $\mu(\cdot)$, $\sigma(\cdot)$ are Lipschitzian. If it is also assumed that $\sigma(x)$ is nonzero for all $x$, then it follows from K. Ito and H. P. McKean, Jr. (1965), Diffusion Processes and Their Sample Paths, Springer-Verlag, New York, pp. 149-158, that the transition probability $p(t; x, dy)$ has a positive density $p(t; x, y)$ for $t > 0$, which is once continuously differentiable in $t$ and twice in $x$. Most of the main results of Chapter V have been proved without assuming existence of a smooth transition probability density. Ito and McKean, loc. cit., contains a comprehensive account of one-dimensional diffusions.
Theoretical Complements to Section V.2 1. For the Markov semigroup $\{T_t\}$ defined by (2.1) in the case where $p(t; x, dy)$ is the transition probability of a diffusion $\{X_t\}$, it may be shown that every twice continuously differentiable function $f$ vanishing outside a compact subset of the state space belongs to $\mathcal{D}_A$. A proof of this is sketched in Exercise 3.12 of Chapter VII. As a consequence, by Theorem 2.3, $f(X_t) - \int_0^t Af(X_s)\,ds$ is a martingale. More generally, it is proved in Chapter VII, Corollary 3.2, that this martingale property holds for every twice continuously differentiable $f$ such that $f$, $f'$, $f''$ are bounded. Many important properties of diffusions may be deduced from this martingale property (see Sections 3, 4 of Chapter VII).
2. (Progressive Measurability) The assumption of right continuity of $t \to X_t(\omega)$ for every $\omega \in \Omega$ in Theorem 2.3 ensures that $\{X_t\}$ is progressively measurable with respect to $\{\mathcal{F}_t\}$; that is, for each $t > 0$ the map $(s, \omega) \to X_s(\omega)$ on $[0, t] \times \Omega$ into $S$ is measurable with respect to the product sigmafield $\mathcal{B}([0, t]) \otimes \mathcal{F}_t$ (on $[0, t] \times \Omega$) and the Borel sigmafield $\mathcal{B}(S)$ on $S$. Before turning to the proof, note that as a consequence of the progressive measurability of $\{X_t\}$ the integral $\int_0^t Af(X_s)\,ds$ is well defined, is $\mathcal{F}_t$-measurable, and has a finite expectation, by Fubini's Theorem.
Lemma 1. Let $(\Omega, \mathcal{F})$ be a measurable space. Suppose $S$ is a metric space and $s \to X(s, \omega)$ is right continuous on $[0, \infty)$ into $S$, for every $\omega \in \Omega$. Let $\{\mathcal{F}_t\}$ be an increasing family of sub-sigmafields of $\mathcal{F}$ such that $X_t$ is $\mathcal{F}_t$-measurable for every $t \ge 0$. Then $\{X_t\}$ is progressively measurable with respect to $\{\mathcal{F}_t\}$. ❑
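Proof. Fix $t > 0$. Let $f$ be a real-valued bounded continuous function on $S$. Consider the function $(s, \omega) \to f(X_s(\omega))$ on $[0, t] \times \Omega$. Define
$$g_n(s, \omega) := f(X_{2^{-n}t}(\omega))\,\mathbf{1}_{[0, 2^{-n}t]}(s) + \sum_{j=2}^{2^n} f(X_{j2^{-n}t}(\omega))\,\mathbf{1}_{((j-1)2^{-n}t,\ j2^{-n}t]}(s). \qquad \text{(T.2.1)}$$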
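As each summand in (T.2.1) is the product of an $\mathcal{F}_t$-measurable function of $\omega$ and a $\mathcal{B}([0, t])$-measurable function of $s$, $g_n$ is $\mathcal{B}([0, t]) \otimes \mathcal{F}_t$-measurable. Also, for each $(s, \omega) \in [0, t] \times \Omega$, $g_n(s, \omega) \to f(X_s(\omega))$, by right continuity. Therefore, $(s, \omega) \to f(X_s(\omega))$ is $\mathcal{B}([0, t]) \otimes \mathcal{F}_t$-measurable. Since the indicator function of a closed set $F \subset S$ may be expressed as a pointwise limit of continuous functions, it follows that $(s, \omega) \to \mathbf{1}_F(X_s(\omega))$ is $\mathcal{B}([0, t]) \otimes \mathcal{F}_t$-measurable. Finally, $\mathcal{C} := \{B \in \mathcal{B}(S): (s, \omega) \to \mathbf{1}_B(X_s(\omega))$ is $\mathcal{B}([0, t]) \otimes \mathcal{F}_t$-measurable on $[0, t] \times \Omega\}$ equals $\mathcal{B}(S)$, by Dynkin's Pi-Lambda Theorem. ■

Another significant consequence of progressive measurability is the following result, which makes the statement of the strong Markov property of continuous-parameter Markov processes complete (see Chapter IV, Proposition 5.2; Chapter V, Theorem 11.1; and Exercise 11.9).

Lemma 2. Under the hypothesis of Lemma 1, (a) every $\{\mathcal{F}_t\}$-stopping time $\tau$ is $\mathcal{F}_\tau$-measurable, where $\mathcal{F}_\tau := \{A \in \mathcal{F}: A \cap \{\tau \le t\} \in \mathcal{F}_t\ \forall t \ge 0\}$, and (b) $X_\tau\mathbf{1}_{\{\tau < \infty\}}$ is $\mathcal{F}_\tau$-measurable. ❑

Proof. The first part is obvious. For part (b), fix $t > 0$. On the set $\Omega_t := \{\tau \le t\}$, $X_\tau$ is the composition of the maps (i) $\omega \to (\tau(\omega), \omega)$ (on $\Omega_t$ into $[0, t] \times \Omega_t$) and (ii) $(s, \omega) \to X_s(\omega)$ (on $[0, t] \times \Omega_t$ into $S$). Let $\mathcal{F}_t \cap \Omega_t := \{A \cap \Omega_t: A \in \mathcal{F}_t\} = \{A \in \mathcal{F}_t: A \subset \Omega_t\}$ be the trace of the sigmafield $\mathcal{F}_t$ on $\Omega_t$. If $r \in [0, t]$ and $A \in \mathcal{F}_t \cap \Omega_t$, one has
$$\{\omega \in \Omega_t: (\tau(\omega), \omega) \in [0, r] \times A\} = \{\tau \le r\} \cap A \in \mathcal{F}_t \cap \Omega_t.$$
Thus the map (i) is measurable with respect to $\mathcal{F}_t \cap \Omega_t$ (on the domain $\Omega_t$) and $\mathcal{B}([0, t]) \otimes (\mathcal{F}_t \cap \Omega_t)$ (on the range $[0, t] \times \Omega_t$). Next, the map $(s, \omega) \to X_s(\omega)$ is measurable with respect to $\mathcal{B}([0, t]) \otimes \mathcal{F}_t$ (on $[0, t] \times \Omega$) and $\mathcal{B}(S)$ (on $S$), in view of the progressive measurability of $\{X_s\}$. Since $[0, t] \times \Omega_t \in \mathcal{B}([0, t]) \otimes \mathcal{F}_t$, and the restriction of a measurable function to a measurable set is measurable with respect to the trace sigmafield, the map (ii) is $\mathcal{B}([0, t]) \otimes (\mathcal{F}_t \cap \Omega_t)$-measurable. Hence $\omega \to X_{\tau(\omega)}(\omega)$ is $(\mathcal{F}_t \cap \Omega_t)$-measurable on $\Omega_t$. Therefore, for every $B \in \mathcal{B}(S)$,
$$\{\omega \in \Omega: X_{\tau(\omega)}(\omega) \in B\} \cap \Omega_t = \{\omega \in \Omega_t: X_{\tau(\omega)}(\omega) \in B\} \in \mathcal{F}_t.$$
■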
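The approximation (T.2.1) in the proof of Lemma 1 evaluates the path at the right endpoint of the dyadic interval containing $s$, and right continuity is exactly what makes $g_n(s, \omega) \to f(X_s(\omega))$. The small Python sketch below (ours; the step path with jumps at $1/3$ and $2/3$ and the choice $f = \tanh$ are hypothetical examples) computes that right endpoint and shows the convergence, including at the jump time $s = 1/3$.

```python
import math

def dyadic_right_endpoint(s: float, t: float, n: int) -> float:
    """Right endpoint of the dyadic interval of [0, t] in (T.2.1) that contains s."""
    first = t / 2 ** n
    if s <= first:                       # the first interval [0, 2^{-n} t] is closed
        return first
    j = math.ceil(s * 2 ** n / t)        # s lies in ((j - 1) 2^{-n} t, j 2^{-n} t]
    return j * t / 2 ** n

def g_n(f, path, s: float, t: float, n: int) -> float:
    """g_n(s) = f(path evaluated at the right endpoint of the dyadic interval containing s)."""
    return f(path(dyadic_right_endpoint(s, t, n)))

# A right-continuous step path with jumps at 1/3 and 2/3; f = tanh is bounded and continuous.
path = lambda s: math.floor(3.0 * s)
f = math.tanh
for s in [0.1, 1.0 / 3.0, 0.5, 0.9]:
    print(s, [g_n(f, path, s, 1.0, n) for n in (2, 5, 20)], f(path(s)))
```

At $s = 0.9$ the coarse level $n = 2$ still picks up the value after the next jump, while finer levels agree with $f(X_s)$; using left endpoints instead would fail at $s = 1/3$.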
3. (Semigroup Theory and Feller's Construction of One-Dimensional Diffusions) In a series of articles in the 1950s, beginning with "The Parabolic Differential Equation and the Associated Semigroups of Transformations," Ann. Math., 55 (1952), pp. 468-519, W. Feller constructed all nonsingular Markov processes on an interval $S$ having continuous sample paths. Nonsingularity here means that, starting from any point in the interior of $S$, the process can move to the right as well as to the left with positive probability. If $S = (a, b)$, and $a$, $b$ cannot be reached from the interior, then such a process is completely characterized by a strictly increasing continuous scale function $s(x)$ and a strictly increasing right-continuous speed function $m(x)$. The role of $s(x)$ is the determination of the probability $\varphi(x)$ of reaching $d$ before $c$, starting from $x$, where $a$ […] $> 0$, $\lambda - A$ is one-to-one (on $\mathcal{D}_A$) and onto $B$ with $\|(\lambda - A)^{-1}\| \le 1/\lambda$. A simple proof of this theorem for the case $B = C[a, b]$, the set of all continuous functions on $(a, b)$ with finite limits at $a$ and $b$, and for closed linear subspaces of $C[a, b]$, may be found in P. Mandl (1968),
Analytical Treatment of One-Dimensional Markov Processes, Springer-Verlag, New York, pp. 2-5. It may be noted that if $A$ is a bounded operator on $B$, then $T_t = \exp\{tA\}$ (see Section 3 of Chapter IV). But the differential operator $A = (d/dm(x))(d/ds(x))$ is unbounded on $C[a, b]$. Consider the case $B = C_0(a, b)$, the set of all continuous functions on $(a, b)$ having limit zero at $a$ and $b$. $C_0(a, b)$ is given the supnorm. Let $A = (d/dm(x))(d/ds(x))$ with $\mathcal{D}_A$ comprising the set of all $f \in B = C_0(a, b)$ such that $Af \in B$. Suppose $A$, $\mathcal{D}_A$ satisfy the hypothesis of the Hille-Yosida Theorem, and let $\{T_t\}$ be the corresponding contraction semigroup. It is simple to check that $(\lambda - A)^{-1}f$ equals $g = \int_0^\infty e^{-\lambda t}T_t f\,dt$ for each $\lambda > 0$; that is, $(\lambda - A)g = f$. Suppose that the resolvent operator $(\lambda - A)^{-1}$ is positive: if $f \ge 0$ then $(\lambda - A)^{-1}f \ge 0$. Then, by the uniqueness theorem for Laplace transforms, $T_t f \ge 0$ if $f \ge 0$. In this case $f \to T_t f(x)$ is a positive bounded linear functional on $C_0(a, b)$. Therefore, by the Riesz Representation Theorem (see H. L. Royden (1968), Real Analysis, 2nd ed., Macmillan, New York, pp. 310-311), there exists a finite measure $p(t; x, dy)$ on $S = (a, b)$ such that $T_t f(x) = \int f(y)\,p(t; x, dy)$ for every $f \in C_0(a, b)$. In view of the contraction property, $p(t; x, S) \le 1$. In order to verify the hypothesis of the Hille-Yosida Theorem in the case where $a$, $b$ are inaccessible, construct for each $\lambda > 0$ two positive solutions $u_1$, $u_2$ of $(\lambda - A)u = 0$, $u_1$ increasing and $u_2$ decreasing. Then define the symmetric Green's function $G_\lambda(x, y) = W^{-1}u_1(x)u_2(y)$ for $a < x \le y < b$, extended to $x > y$ by symmetry. Here the Wronskian $W$, given by $W := u_2(x)\,du_1(x)/ds(x) - u_1(x)\,du_2(x)/ds(x)$, is independent of $x$. For $f \in C_0(a, b)$ the function $g(x) = G_\lambda f(x) := \int G_\lambda(x, y)f(y)\,dm(y)$ is the unique solution in $C_0(a, b)$ of $(\lambda - A)g = f$. In other words, $G_\lambda = (\lambda - A)^{-1}$. The positivity of $(\lambda - A)^{-1}$ follows from that of $G_\lambda(x, y)$. Also, one may directly check that $G_\lambda 1 = 1/\lambda$. This implies $p(t; x, (a, b)) = 1$.
For details of this, and for the proof that a Markov process on $S = (a, b)$ with this transition probability may be constructed on the space $C([0, \infty): S)$ of continuous trajectories, see Mandl (1968), loc. cit., pp. 14-17, 21-38.
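As an illustration of the Green's function construction (ours, not the book's), take standard Brownian motion on $S = (-\infty, \infty)$, for which $s(x) = x$ and $A = \frac12 d^2/dx^2 = (d/dm(x))(d/ds(x))$ with $m(x) = 2x$. Then $u_1(x) = e^{rx}$, $u_2(x) = e^{-rx}$ with $r = \sqrt{2\lambda}$, $W = 2r$, and $G_\lambda(x, y) = e^{-r|x - y|}/(2r)$. The Python sketch below checks numerically that $G_\lambda 1 = \int G_\lambda(x, y)\,dm(y) = 1/\lambda$, and that the density $2G_\lambda(x, y)$ of $(\lambda - A)^{-1}$ with respect to $dy$ agrees with the Laplace transform $\int_0^\infty e^{-\lambda t}p(t; x, y)\,dt$ of the Brownian transition density; the quadrature rules and cutoffs are our choices.

```python
import math

def green_bm(lam: float, x: float, y: float) -> float:
    """G_lambda(x, y) = u1(min(x, y)) u2(max(x, y)) / W for A = (1/2) d^2/dx^2, s(x) = x."""
    r = math.sqrt(2.0 * lam)
    wronskian = 2.0 * r                # u2 u1' - u1 u2' with u1 = e^{rx}, u2 = e^{-rx}
    return math.exp(-r * abs(x - y)) / wronskian

def green_applied_to_one(lam: float, x: float, half_width: float = 30.0, n: int = 60000) -> float:
    """Trapezoid value of int G_lambda(x, y) dm(y), with dm(y) = 2 dy; x sits on the grid (the kink)."""
    h = 2.0 * half_width / n
    vals = [2.0 * green_bm(lam, x, x - half_width + k * h) for k in range(n + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

def laplace_of_heat_kernel(lam: float, x: float, y: float, T: float = 40.0, n: int = 20000) -> float:
    """Simpson value of int_0^T e^{-lam t} (2 pi t)^{-1/2} exp(-(y - x)^2 / (2t)) dt, for y != x."""
    z2 = (y - x) ** 2
    def integrand(t: float) -> float:
        return 0.0 if t == 0.0 else math.exp(-lam * t - z2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)
    h = T / n
    total = integrand(0.0) + integrand(T)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    return total * h / 3.0

lam, x, y = 1.5, 0.2, 0.9
print(green_applied_to_one(lam, x), 1.0 / lam)
print(2.0 * green_bm(lam, x, y), laplace_of_heat_kernel(lam, x, y))
```

Here both boundaries $\pm\infty$ are inaccessible, so $G_\lambda 1 = 1/\lambda$ corresponds to $p(t; x, \mathbb{R}^1) = 1$.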
4. For a diffusion on $S = (a, b)$, consider the semigroup $\{T_t\}$ in (2.1). Under Condition (1.1), it is easy to check on integration by parts that the infinitesimal generator $A$ is self-adjoint on $L^2(S, \pi(y)\,dy)$, where $\pi(y)\,dy$ is the speed measure whose distribution function is the speed function (see Eq. 8.1). That is,
$$\int Af(y)\,g(y)\,\pi(y)\,dy = \int f(y)\,Ag(y)\,\pi(y)\,dy$$
for all twice continuously differentiable $f$, $g$ vanishing outside compacts. It follows that $T_t$ is self-adjoint and, therefore, the transition probability density $q(t; x, y) := p(t; x, y)/\pi(y)$ with respect to $\pi(y)\,dy$ is symmetric in $x$ and $y$. That is, $q(t; x, y) = q(t; y, x)$. Since $p$ satisfies the backward equation $\partial p/\partial t = A_x p$, it follows that so does $q$: $\partial q/\partial t = A_x q$. By symmetry of $q$ it now follows that $\partial q/\partial t = A_y q$. Here $A_x q$, $A_y q$ denote the application of $A$ to $x \to q(t; x, y)$ and $y \to q(t; x, y)$, respectively. The equation $\partial q/\partial t = A_y q$ easily reduces to $\partial p/\partial t = A_y^* p$, as given in (2.41). For details see Ito and McKean (1965), loc. cit., pp. 149-158. For more general treatments of the relations between Markov processes and semigroup theory, see E. B. Dynkin (1965), Markov Processes, Vol. 1, Springer-Verlag, New York, and S. N. Ethier and T. G. Kurtz (1986), Markov Processes: Characterization and Convergence, Wiley, New York.
Theoretical Complements to Section V.4 1. Theorem 4.1 is a special case of a more general result on the approximation of diffusions by discrete-parameter Markov chains, as may be found in D. W. Stroock and S. R. S. Varadhan (1979), Multidimensional Diffusion Processes, Springer-Verlag, New York, Theorem 11.2.3. From the point of view of numerical analysis, (4.13) is the discretized, or difference-equation, version of the backward equation (4.14), and is usually solved by matrix methods.
Theoretical Complements to Section V.5 1. The general PDE (partial differential equations) method for solving the Kolmogorov equations, or second-order parabolic equations, may be found in A. Friedman (1964), Partial Differential Equations of Parabolic Type, Prentice-Hall, Englewood Cliffs. In particular, the existence of a smooth positive fundamental solution, or transition density, is proved there.
Theoretical Complements to Section V.6 1. Suppose two Borel-measurable functions $\mu(\cdot)$, $\sigma^2(\cdot) > 0$ are given on $S = (a, b)$ such that (i) $\mu(\cdot)$ is bounded on compact subsets of $S$, (ii) $\sigma^2(\cdot)$ is bounded away from zero and infinity on compact subsets of $S$, and (iii) $a$ and $b$ are inaccessible (see theoretical complement 2.1 above). Then Feller's construction provides a Markov process on $S$ having continuous sample paths, with a scale function $s(\cdot)$ and speed function $m(\cdot)$ expressed as functions of $\mu(\cdot)$, $\sigma^2(\cdot)$ as described in theoretical complement 2.1. Hence continuity of $\mu(\cdot)$ is not needed for this construction. 2. The general principle Proposition 6.3 is very useful in proving that certain functions $\{\varphi(X_t)\}$ of Markov processes $\{X_t\}$ are also Markov. But how does one find such functions? It turns out that in all the cases considered in this book the function $\varphi$ is a maximal invariant of a group of transformations $G$ under which the transition probability is invariant. Here invariance of the transition probability means $p(t; gx, g(B)) = p(t; x, B)$ for all $t > 0$, $x \in S$, $g \in G$, and Borel subsets $B$ of the metric space $S$. A function $\varphi$ is said to be a maximal invariant if (i) $\varphi(gx) = \varphi(x)$ for all $g \in G$, $x \in S$, and (ii) every measurable invariant function is a (measurable) function of $\varphi$. For each $x \in S$ the orbit of $x$ (under $G$) is the set $o(x) := \{gx: g \in G\}$. Each invariant function is constant on orbits. Let $S'$ be a metric space and $\varphi$ a measurable function on $S$ onto $S'$ such that (i)$'$ $\varphi$ is constant on orbits, and (ii)$'$ $\varphi(x) \ne \varphi(y)$ if $o(x) \ne o(y)$. In other words […] 0.
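First, let us show that $\{X_n\}$ so extended is an $\{\mathcal{F}_n\}$-martingale difference sequence, where $\mathcal{F}_n := \sigma\{X_j: -\infty < j \le n\}$ […] 0, $E(X_{n+1} \mid \mathcal{F}_n) = 0$, i.e.,
$$E(\mathbf{1}_A X_{n+1}) = 0 \quad \text{for all } A \in \mathcal{F}_n. \qquad \text{(T.13.16)}$$
Suppose $A$ is a finite-dimensional event in $\mathcal{F}_n$, say $A \in \sigma\{X_{n-j}: 0 \le j \le m\}$. Note that the (joint) distribution of $(X_{n-m}, \ldots, X_n, X_{n+1})$ is the same as that of any $m + 2$ consecutive terms of the sequence $\{X_n: n \ge 0\}$, e.g., $(X_0, X_1, \ldots, X_{m+1})$. Therefore, for such a choice of $A$, (T.13.16) becomes $E(\mathbf{1}_A X_{m+1}) = 0$ for all $A \in \sigma\{X_0, \ldots, X_m\}$, which is clearly true by the martingale difference property of $\{X_n: n \ge 0\}$. Now consider the class $\mathcal{C}_n$ of sets $A$ in $\mathcal{F}_n$ such that $E(\mathbf{1}_A X_{n+1}) = 0$. It is simple to check that $\mathcal{C}_n$ is closed under complements (since $EX_{n+1} = 0$) and under countable disjoint unions (by dominated convergence), i.e., it is a $\lambda$-system. Since $\mathcal{C}_n \supset \sigma\{X_{n-j}: 0 \le j \le m\}$ for all $m \ge 0$, and the union of these sigmafields is a $\pi$-system generating $\mathcal{F}_n$, Dynkin's Pi-Lambda Theorem gives $\mathcal{C}_n = \mathcal{F}_n$. Next define $\sigma_n^2 := E(X_n^2 \mid \mathcal{F}_{n-1})$ $(n \ge 1)$. Then $\{\sigma_n^2\}$ is a stationary ergodic sequence, and $E\sigma_n^2 = \sigma^2 > 0$. In particular, $s_n^2/n = \sum_{m=1}^n \sigma_m^2/n \to \sigma^2$ a.s., where $s_n^2 = \sum_{m=1}^n \sigma_m^2$. Observe that $s_n^2 \to \infty$ almost surely. Write $X_{k,n} := X_k/\sqrt n$, $k_n = n$. Then $\sigma_{k,n}^2 = \sigma_k^2/n$, and $s_{n,n}^2 = s_n^2/n \to \sigma^2$ a.s., so that condition (i) of Theorem T.13.1 is checked, assuming $\sigma^2 = 1$ without loss of generality. It remains to check the conditional Lindeberg condition (ii). But $L_{n,n}(\varepsilon)$ is nonnegative, and $EL_{n,n}(\varepsilon) = E(X_1^2\mathbf{1}_{\{|X_1| > \varepsilon\sqrt n\}}) \to 0$. Therefore, $L_{n,n}(\varepsilon) \to 0$ in probability. ■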
2. We now apply Theorem T.13.2 to Markov processes. Let $\{X_n : n \ge 0\}$ be a Markov process on a state space $S$ (with sigmafield $\mathcal{S}$), having a transition probability $p(x; dy)$. Assume that $p(x; dy)$ admits an invariant probability $\pi$ and that under this initial invariant distribution the stationary process $\{X_n : n \ge 0\}$ is ergodic. Assume $X_0$ has distribution $\pi$. Consider a real-valued function $f$ on $S$ such that $Ef^2(X_0) < \infty$. Write $f'(x) = f(x) - \bar f$, where $\bar f = \int f\,d\pi$. Write $T$ for the transition operator, $(Tg)(x) := \int g(y)\,p(x; dy)$. Then $(T^m f')(x) = (T^m f)(x) - \bar f$ for all $m \ge 0$. Also, $Ef'(X_0) = 0$. Suppose the series $h_n(x) := \sum_{m=0}^{n}(T^m f')(x)$ converges in $L^2(S, \pi)$ to $h$, i.e.,
$$\int (h_n - h)^2\,d\pi \to 0 \quad \text{as } n \to \infty. \tag{T.13.17}$$
This is true, e.g., for all $f$ in case $S$ is finite and the chain is aperiodic. In case (T.13.17) holds, $h$ satisfies $h(x) - (Th)(x) = f'(x)$, or
$$(I - T)h = f', \tag{T.13.18}$$
i.e., $f'$ belongs to the range of $I - T$ regarded as an operator on $L^2(S, \mathcal{S}, \pi)$. Now it is simple to check that
$$h(X_n) - (Th)(X_{n-1}) \quad (n \ge 1) \tag{T.13.19}$$
is a martingale difference sequence. It is also stationary and ergodic. Write
$$Z_n := \sum_{m=1}^{n}\big(h(X_m) - (Th)(X_{m-1})\big). \tag{T.13.20}$$
Then, by Theorem T.13.2, $Z_n/\sqrt{n}$ converges in distribution to $N(0, \sigma^2)$, where
$$\sigma^2 = E\big(h(X_1) - (Th)(X_0)\big)^2 = Eh^2(X_1) + E(Th)^2(X_0) - 2E[(Th)(X_0)h(X_1)]. \tag{T.13.21}$$
Since $E[h(X_1) \mid X_0] = (Th)(X_0)$, we have $E[(Th)(X_0)h(X_1)] = E(Th)^2(X_0)$, so that (T.13.21) reduces to
$$\sigma^2 = Eh^2(X_1) - E(Th)^2(X_0) = \int h^2\,d\pi - \int (Th)^2\,d\pi. \tag{T.13.22}$$
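Also, by (T.13.18),
$$Z_n = \sum_{m=1}^{n}\big(h(X_m) - (Th)(X_{m-1})\big) = \sum_{m=0}^{n-1}\big(h(X_m) - (Th)(X_m)\big) + h(X_n) - h(X_0) = \sum_{m=0}^{n-1} f'(X_m) + h(X_n) - h(X_0).$$
Since
$$E\Big[\frac{h(X_n) - h(X_0)}{\sqrt{n}}\Big]^2 \le \frac{2}{n}\big(Eh^2(X_n) + Eh^2(X_0)\big) = \frac{4}{n}\int h^2\,d\pi \to 0, \tag{T.13.23}$$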
THEORETICAL COMPLEMENTS
and
$$\frac{1}{\sqrt{n}}\sum_{m=0}^{n-1} f'(X_m) = \frac{Z_n}{\sqrt{n}} - \frac{1}{\sqrt{n}}\big(h(X_n) - h(X_0)\big), \tag{T.13.24}$$
it follows that the left side converges in distribution to $N(0, \sigma^2)$ as $n \to \infty$. We have arrived at a result of M. I. Gordin and B. A. Lifsic (1978), "The Central Limit Theorem for Stationary Ergodic Markov Processes," Dokl. Akad. Nauk SSSR, 19, pp. 392-393.
Theorem T.13.3. (CLT for Discrete-Parameter Markov Processes). Assume $p(x; dy)$ admits an invariant probability $\pi$ and, under the initial distribution $\pi$, $\{X_n\}$ is ergodic. Assume also that $f' := f - \bar f$ is in the range of $I - T$. Then
$$\frac{1}{\sqrt{n}}\sum_{m=0}^{n-1}\big(f(X_m) - \bar f\big) \to N(0, \sigma^2) \quad \text{as } n \to \infty, \tag{T.13.25}$$
where the convergence is in distribution, and $\sigma^2$ is given by (T.13.22). ❑
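For a finite state space the quantities in (T.13.17)-(T.13.22) can be computed with a few lines of linear algebra. The Python sketch below (NumPy; the function names are illustrative, not from the text) solves the Poisson equation (T.13.18) directly instead of summing the series $h_n$, pins down the free additive constant by requiring $\int h\,d\pi = 0$ (this does not change (T.13.22), since $\int Th\,d\pi = \int h\,d\pi$), and then evaluates $\sigma^2$ from (T.13.22). It assumes the chain is irreducible, so that $\pi$ is unique.

```python
import numpy as np

def invariant_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 (assumes a unique invariant law)."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def clt_variance(P, f):
    """sigma^2 of (T.13.22), with h solving the Poisson equation (T.13.18)."""
    P = np.asarray(P, dtype=float)
    f = np.asarray(f, dtype=float)
    n = P.shape[0]
    pi = invariant_distribution(P)
    f_prime = f - pi @ f                     # f' = f - fbar
    # (I - T)h = f' fixes h only up to constants; add the row pi @ h = 0.
    M = np.vstack([np.eye(n) - P, pi])
    rhs = np.append(f_prime, 0.0)
    h, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    Th = P @ h                               # (Th)(x) = sum_y h(y) p(x; y)
    return pi @ h**2 - pi @ Th**2            # int h^2 dpi - int (Th)^2 dpi

# Two-state chain with switching probabilities a = 0.3, b = 0.2; f = indicator of state 1.
P = np.array([[0.7, 0.3], [0.2, 0.8]])
print(clt_variance(P, [0.0, 1.0]))          # approximately 0.72
```

For a two-state chain with switching probabilities $a$, $b$ and $f$ the indicator of the second state, this agrees with the closed form $\pi_0\pi_1(2 - a - b)/(a + b)$, here $0.24 \cdot 3 = 0.72$.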
Some applications of this theorem in the context of processes such as considered in Sections 13, 14 of Chapter II may be found in R. N. Bhattacharya and O. Lee (1988), "Asymptotics of a Class of Markov Processes Which Are Not In General Irreducible," Ann. Probab., 16, pp. 1333-1347, and "Ergodicity and Central Limit Theorems for a Class of Markov Processes," J. Multivariate Analysis, 27, pp. 80-90.
3. For continuous-parameter Markov processes such as diffusions, the following theorem applies (see R. N. Bhattacharya (1982), "On the Functional Central Limit Theorem and the Law of the Iterated Logarithm for Markov Processes," Z. Wahrscheinlichkeitstheorie und Verw. Geb., 60, pp. 185-201).
Theorem T.13.4. (CLT for Continuous-Parameter Markov Processes). Let $\{X_t\}$ be a stationary ergodic Markov process on a metric space $S$, having right-continuous sample paths. Let $\pi$ denote the (stationary) distribution of $X_t$ and $A$ the infinitesimal generator of the process on $L^2(S, \pi)$. If $f$ belongs to the range of $A$, then $t^{-1/2}\int_0^t f(X_s)\,ds$ converges in distribution to $N(0, \sigma^2)$, with $\sigma^2 = -2\int f(x)g(x)\,\pi(dx)$, where $g \in \mathcal{D}_A$ and $Ag = f$. ❑
Proof. It follows from Theorem 2.3 that
$$Z_t := g(X_t) - \int_0^t Ag(X_s)\,ds = g(X_t) - \int_0^t f(X_s)\,ds \quad (t \ge 0)$$
is a square integrable martingale. By hypothesis, $Z_n - Z_{n-1}$ $(n \ge 1)$ is a stationary ergodic sequence of martingale differences. Therefore, Theorem T.13.2 applies and $(Z_n - Z_0)/\sqrt{n}$ converges in distribution to $N(0, \sigma^2)$, where $\sigma^2 = E(Z_1 - Z_0)^2$, i.e.,
$$\sigma^2 = E\Big[g(X_1) - g(X_0) - \int_0^1 Ag(X_s)\,ds\Big]^2. \tag{T.13.26}$$
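But $E\big(n^{-1/2}g(X_n)\big)^2 = E\big(n^{-1/2}g(X_0)\big)^2 = n^{-1}Eg^2(X_0) \to 0$ as $n \to \infty$. Therefore, since $Z_n - Z_0 = g(X_n) - g(X_0) - \int_0^n f(X_s)\,ds$, the variable $-n^{-1/2}\int_0^n f(X_s)\,ds$, and hence (by the symmetry of the normal law) $n^{-1/2}\int_0^n f(X_s)\,ds$, converges in distribution to $N(0, \sigma^2)$. Also, for each positive integer $n$, $\{Z_{k/n} - Z_{(k-1)/n} : 1 \le k \le n\}$ are stationary martingale differences, so that
$$\begin{aligned} \sigma^2 = E(Z_1 - Z_0)^2 &= \sum_{k=1}^{n} E\big(Z_{k/n} - Z_{(k-1)/n}\big)^2 = nE\big(Z_{1/n} - Z_0\big)^2 \\ &= nE\Big[g(X_{1/n}) - g(X_0) - \int_0^{1/n} Ag(X_s)\,ds\Big]^2 \\ &= nE\big(g(X_{1/n}) - g(X_0)\big)^2 + nE\Big(\int_0^{1/n} Ag(X_s)\,ds\Big)^2 - 2nE\Big[\big(g(X_{1/n}) - g(X_0)\big)\int_0^{1/n} Ag(X_s)\,ds\Big]. \end{aligned} \tag{T.13.27}$$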
Now,
$$\begin{aligned} nE\big(g(X_{1/n}) - g(X_0)\big)^2 &= n\big[Eg^2(X_{1/n}) + Eg^2(X_0) - 2Eg(X_{1/n})g(X_0)\big] = n\Big[2Eg^2(X_0) - 2\int g(x)\,T_{1/n}g(x)\,\pi(dx)\Big] \\ &= n\Big[2Eg^2(X_0) - 2\int g(x)\Big(g(x) + \int_0^{1/n} T_sAg(x)\,ds\Big)\pi(dx)\Big] \\ &= -2n\int\!\!\int_0^{1/n} g(x)\,T_sAg(x)\,ds\,\pi(dx) \to -2\int g(x)Ag(x)\,\pi(dx), \end{aligned} \tag{T.13.28}$$
as $n \to \infty$, since $T_sAg \to Ag$ in $L^2(S, \pi)$ as $s \downarrow 0$. Also, by the Cauchy–Schwarz inequality,
$$E\Big(\int_0^{1/n} Ag(X_s)\,ds\Big)^2 \le E\Big(\frac{1}{n}\int_0^{1/n}\big(Ag(X_s)\big)^2\,ds\Big) = \frac{1}{n^2}E(Ag)^2(X_0). \tag{T.13.29}$$
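Applying (T.13.28) and (T.13.29) (together with the Cauchy–Schwarz inequality), the product term on the extreme right side of (T.13.27) is seen to go to zero. Therefore, $\sigma^2 = -2\int g(x)Ag(x)\,\pi(dx) + o(1)$ as $n \to \infty$, which implies $\sigma^2 = -2\int g(x)Ag(x)\,\pi(dx) = -2\int f(x)g(x)\,\pi(dx)$. This proves the result for the sequence $t = n$. But if
$$M_n := \max\Big\{n^{-1/2}\int_{k-1}^{k} |f(X_s)|\,ds : 1 \le k \le n\Big\},$$
then
$$P(M_n > \varepsilon) \le \sum_{k=1}^{n} P\Big(n^{-1/2}\int_{k-1}^{k} |f(X_s)|\,ds > \varepsilon\Big) = nP\Big(\int_0^1 |f(X_s)|\,ds > \varepsilon\sqrt{n}\Big) \to 0 \quad \text{as } n \to \infty,$$
since $E\big(\int_0^1 |f(X_s)|\,ds\big)^2 < \infty$ (for $Y := \int_0^1 |f(X_s)|\,ds$ one has $nP(Y > \varepsilon\sqrt{n}) \le \varepsilon^{-2}E\big(Y^2\mathbf{1}_{\{Y > \varepsilon\sqrt{n}\}}\big) \to 0$). ■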
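As a hedged numerical illustration of Theorem T.13.4 (the example is ours, not the text's), take a stationary Ornstein–Uhlenbeck process with generator $Ag(x) = \frac{\sigma_0^2}{2}g''(x) - \theta xg'(x)$, parameters $\theta$, $\sigma_0$ chosen only for illustration, whose invariant law is $N(0, v)$ with $v = \sigma_0^2/(2\theta)$, and let $f(x) = x$. Then $g(x) = -x/\theta$ solves $Ag = f$, so the theorem gives $\sigma^2 = -2\int fg\,d\pi = 2v/\theta = \sigma_0^2/\theta^2$. When $g = -\int_0^\infty T_sf\,ds$, as in the range criterion below, $-2\int fg\,d\pi$ equals $2\int_0^\infty \operatorname{Cov}(f(X_0), f(X_s))\,ds$; the sketch checks both forms by quadrature, using the standard covariance $ve^{-\theta s}$ of this process.

```python
import numpy as np

def trapezoid(y, x):
    """Plain trapezoidal rule (avoids depending on a specific NumPy version)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

# Hypothetical Ornstein-Uhlenbeck parameters: dX = -theta X dt + sig0 dB.
theta, sig0 = 1.5, 0.8
v = sig0**2 / (2.0 * theta)               # stationary variance, pi = N(0, v)

# Generator form of Theorem T.13.4: sigma^2 = -2 * int f g dpi with A g = f.
x = np.linspace(-12.0 * np.sqrt(v), 12.0 * np.sqrt(v), 200_001)
pi_density = np.exp(-x**2 / (2.0 * v)) / np.sqrt(2.0 * np.pi * v)
f = x                                      # mean zero under pi
g = -x / theta                             # A g = -theta * x * g'(x) = x = f, since g'' = 0
sigma2_generator = -2.0 * trapezoid(f * g * pi_density, x)

# Same quantity as 2 * int_0^inf Cov(X_0, X_s) ds, with Cov = v * exp(-theta * s).
s = np.linspace(0.0, 40.0 / theta, 200_001)
sigma2_autocov = 2.0 * trapezoid(v * np.exp(-theta * s), s)

print(sigma2_generator, sigma2_autocov, sig0**2 / theta**2)
```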
In order to check that $f - \bar f$ belongs to the range of $A$ it is enough to show that $-\int_0^t T_s(f - \bar f)(x)\,ds$ converges in $L^2(S, \pi)$ as $t \to \infty$. For if the convergence is to a function $g$ in $L^2$, then
$$\frac{1}{h}(T_hg - g)(x) = \frac{1}{h}\int_0^h T_s(f - \bar f)(x)\,ds \to f - \bar f \quad \text{in } L^2 \text{ as } h \downarrow 0.$$
In other words, $Ag = f - \bar f$. Now,
$$\int_0^t T_s(f - \bar f)(x)\,ds = \int_0^t\int_S f(y)\big(p(s; x, dy) - \pi(dy)\big)\,ds. \tag{T.13.26}$$
Therefore, one simple sufficient condition that $f - \bar f$ belongs to the range of $A$ for every $f \in L^2(S, \pi)$ is $\sup\{|p(s; x, B) - \pi(B)| : B \in \mathcal{B}(S),\ x \in S\} \to 0$

1. r-dimensional Brownian motion on the disc {|x|

An alternative renewal approach may be found in R. N. Bhattacharya and S. Ramasubramanian (1982), "Recurrence and Ergodicity of Diffusions," J. Multivariate Analysis, 12, pp. 95-122.

4. It is shown in Bhattacharya (1982), loc. cit., that ergodicity of the Markov process $\{X_t\}$ under the stationary initial distribution $\pi$ is equivalent to the null space of $A$ being the space of constants.
Theoretical Complements to Section V.14

1. To state Proposition 14.1 more precisely, suppose $\{X_t\}$ is a Markov process on a metric space $S$ having an infinitesimal generator $A$ on a domain $\mathcal{D}_A$, and transition operators $T_t$ on the Banach space $C_b(S)$ of all real-valued bounded continuous functions $f$ on $S$, with the sup norm $\|f\|$. Let $\varphi$ be a continuous function on $S$ onto a metric space $S'$. Let $\mathcal{G}$ be a class of real-valued, bounded and continuous functions $g$ on $S'$ having the following two properties: (i) if $g \in \mathcal{G}$ then $g \circ \varphi \in \mathcal{D}_A$ and $A(g \circ \varphi) = h \circ \varphi$ for some bounded continuous $h$ on $S'$; (ii) $\mathcal{G}$ is a determining class for probability measures on $(S', \mathcal{B}(S'))$, that is, if $\mu$, $\nu$ are two probability measures such that $\int g\,d\mu = \int g\,d\nu$ for all $g \in \mathcal{G}$, then $\mu = \nu$. Then $\{\varphi(X_t)\}$ is a Markov process on $S'$.
To see this consider the subspace $\mathbb{B}$ of $C_b(S)$ comprising all functions $g \circ \varphi$, $g$ in the closure of the linear span $\mathcal{L}(\mathcal{G})$ of $\mathcal{G}$. Then $\mathbb{B}$ is a Banach space, and $A$ is the generator of a semigroup on $\mathbb{B}$ by the Hille–Yosida Theorem and property (i). This implies in particular that $T_t$ maps $\mathbb{B}$ into itself, so that $T_t(g \circ \varphi)$ is of the form $h_t \circ \varphi$ for some bounded continuous $h_t$ on $S'$. Since $\mathcal{G}$ is a determining class, it follows that $T_t(g \circ \varphi)$ is of the form $h_t \circ \varphi$ for every bounded measurable $g$ on $S'$. Therefore, Proposition 6.3 applies, showing that $\{\varphi(X_t)\}$ is a Markov process on $S'$.

Theoretical Complements to Section V.15

1. The Markov property of $\{X_t\}$ as defined by (15.1) does not require any assumption on $\partial G$. But for the continuity at the boundary of $x \mapsto T_tg(x) := E_xg(X_t)$ for bounded continuous $g$ on $\bar G$, or of the solution $\psi_f$ of the Dirichlet problem (15.23) for continuous bounded $f$ on $\partial G$, some "smoothness" of $\partial G$ is needed. The minimal such requirement is that every point $b$ of $\partial G$ be regular, that is, for every $t > 0$, $P_x(\tau_{\partial G} > t) \to 0$ as $x \to b$, $x \in G$. Here $\tau_{\partial G}$ is as in (15.2). A simple sufficient condition for the regularity of $b$ is that it be a Poincaré point, that is, there exists a truncated cone contained in $\mathbb{R}^k \setminus G$ with vertex at $b$. For the case of a standard Brownian motion on $\mathbb{R}^k$, simple proofs of these facts may be found in E. B. Dynkin and A. A. Yushkevich (1969), Markov Processes: Theorems and Problems, Plenum Press, New York, pp. 51-62. Analytical proofs for general diffusions may be found in D. Gilbarg and N. S. Trudinger (1977), Elliptic Partial Differential Equations of Second Order, Springer-Verlag, New York, p. 196, for the Dirichlet problem, and A. Friedman (1964), Partial Differential Equations of Parabolic Type, Prentice-Hall, Englewood Cliffs.

2. Let $f$ be a bounded continuous function that vanishes outside $G$. Define $T_t^0f(x) := E\big(f(X_t)\mathbf{1}_{\{\tau_{\partial G} > t\}} \mid X_0 = x\big)$. If $b$ is a regular boundary point then as $x \to b$ $(x \in G)$, $T_t^0f(x) \to 0$.
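In other words, $T_t^0C_0 \subset C_0$, where $C_0$ is the class of all bounded continuous functions on $G$ vanishing at the boundary. If $f$ is a twice continuously differentiable function in $C_0$ with compact support then one may show that $f \in \mathcal{D}_{A^0}$, the domain of the infinitesimal generator $A^0$ of the semigroup $\{T_t^0\}$, and that $(A^0f)(x) = (Af)(x)$ for all $x \in G$. For this one needs the estimate $P_x(\tau_{\partial G}$ 0 for all $x \in G$, where $G$ is a bounded open set. Let us show that the maximum of $u$ cannot be attained in the interior. For if it were attained at $x_0 \in G$, then $(\partial u/\partial x^{(i)})_{x_0} = 0$, $1 \le i \le k$, and $B := ((\partial^2u/\partial x^{(i)}\partial x^{(j)}))_{x_0}$ is negative semidefinite. This implies that the eigenvalues of $CB$ are all real and nonpositive, where $C := ((d_{ij}(x_0)))$. For if $\lambda$ is an eigenvalue of $CB$, then $CBz = \lambda z$ for some nonzero $z \in \mathbb{C}^k$ and, using the inner product $\langle\,,\rangle$ on $\mathbb{C}^k$, $0 \le \langle CBz, Bz\rangle = \lambda\langle z, Bz\rangle$, while $\langle z, Bz\rangle \le 0$; and if $\langle z, Bz\rangle = 0$, then $Bz = 0$ and $\lambda = 0$. Therefore, $\lambda \le 0$. It follows now that $\sum_{i,j} d_{ij}(x_0)(\partial^2u/\partial x^{(i)}\partial x^{(j)})_{x_0} = \operatorname{trace}(CB) \le 0$, so that $Au(x_0) \le 0$. Now consider $u$ such that $Au(x) \ge 0$ for all $x \in G$. For $\varepsilon > 0$ consider the function $u_\varepsilon(x) := u(x) + \varepsilon\exp\{\gamma x^{(1)}\}$. Then $Au_\varepsilon(x) = Au(x) + \varepsilon\big[\tfrac12 d_{11}(x)\gamma^2 + \mu^{(1)}(x)\gamma\big]\exp\{\gamma x^{(1)}\}$.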
Choose $\gamma$ sufficiently large that the expression within square brackets is strictly positive for all $x \in \bar G$. Then $Au_\varepsilon(x) > 0$ for $x \in G$, for all $\varepsilon > 0$. Applying the conclusion of the preceding paragraph, conclude that the maximum of $u_\varepsilon$ is attained on $\partial G$, say at $x_\varepsilon$. Since $\partial G$ is compact, choosing a convergent subsequence of $x_\varepsilon$ as $\varepsilon \downarrow 0$ through a sequence, it follows that the maximum of $u$ is attained on $\partial G$. Finally, if $Au(x) = 0$ for all $x \in G$, then applying the above result to both $u$ and $-u$ one gets: $u$ attains its maximum and minimum on $\partial G$. Note that for the above proof it is enough to require that $\mu^{(i)}(x)$, $d_{ij}(x)$, $1 \le i, j \le k$, are continuous on $\bar G$, and $d_{ii}(x) > 0$ on $\bar G$ for some $i$. Under the additional hypothesis that $((d_{ij}(x)))$ is positive definite on $G$, one can prove the strong maximum principle: Suppose $G$ is a bounded and connected open set. If $Au(x) = 0$ for all $x \in G$, and $u$ is continuous on $\bar G$, then $u$ cannot attain its maximum or minimum in $G$, unless $u$ is constant.
The probabilistic argument for the strong maximum principle is illuminating. Under the conditions (1)-(4) of Section 14 it may be shown that the support of the probability measure $P_x$ is the set of all continuous functions on $[0, \infty)$ into $\mathbb{R}^k$, starting at $x$ (see theoretical complements to Section VII.3). Therefore, for any ball $B \ni x$, the $P_x$-distribution $\theta(x; dy)$ of $X_{\tau_{\partial B}}$ has support $\partial B$. Now suppose $u$ is continuous on $\bar G$ and $Au(x) = 0$ for all $x \in G$. Suppose $u(x_0) = \max\{u(x) : x \in \bar G\}$ for some $x_0 \in G$. Then for every closed ball $\bar B_\varepsilon = \{x : |x - x_0| \le \varepsilon\} \subset G$, the strong Markov property yields $u(x_0) = E_{x_0}u(X_{\tau_{\partial B_\varepsilon}})$, in view of the representation (15.22). (Also see Exercise 3.14 of Chapter VII.) That is,
$$u(x_0) = \int_{\partial B_\varepsilon} u(y)\,\theta_\varepsilon(x_0; dy), \tag{T.15.1}$$
where $\theta_\varepsilon(x_0; dy)$ is the $P_{x_0}$-distribution of $X_{\tau_{\partial B_\varepsilon}}$. Since $u$ is continuous on $\partial B_\varepsilon$ and the probability measure $\theta_\varepsilon(x_0; dy)$ has the full support $\partial B_\varepsilon$, the right side is strictly smaller than the maximum value of $u$ on $\bar B_\varepsilon$, namely $u(x_0)$, unless $u(y) = u(x_0)$ for all $y \in \partial B_\varepsilon$. By letting $\varepsilon$ vary, this constancy extends to the maximal closed ball with center $x_0$ contained in $G$. By connectedness of $G$, the proof is completed.

4. Let $G$ be a bounded open set such that there exists a twice continuously differentiable real-valued function $\varphi$ on $\mathbb{R}^k$ such that $G = \{x : \varphi(x) < c\}$, $\partial G = \{x : \varphi(x) = c\}$, for some $c \in \mathbb{R}$. Assume also that $|\operatorname{grad}\varphi|$ is bounded away from zero on $\partial G$. If $u$ is a twice continuously differentiable function in $G$ such that $u$,
$$\max_{a \in A}\Big[g_{N-1}(y, a) + \sum_{z \in S} h_N(z)p(z; y, a)\Big] = g_{N-1}(y, f_{N-1}(y)) + \sum_{z \in S} h_N(z)p(z; y, f_{N-1}(y)) = h_{N-1}(y), \tag{1.13}$$
DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
say. In general, let $f_k$ be a function (on $S$ into $A$) such that
$$\max_{a \in A}\Big[g_k(y, a) + \sum_{z \in S} h_{k+1}(z)p(z; y, a)\Big] = g_k(y, f_k(y)) + \sum_{z \in S} h_{k+1}(z)p(z; y, f_k(y)) = h_k(y) \quad (k = N-1, N-2, \dots, 0), \tag{1.14}$$
say, obtained successively (i.e., by backward recursion) starting from (1.11) and (1.12).
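The backward recursion (1.14) is straightforward to carry out when $S$ and $A$ are finite. The Python sketch below assumes, as (1.13) suggests, that (1.11)-(1.12) start the recursion with $h_N(y) = \max_a g_N(y, a)$; the array layouts `g[k][y, a]` $= g_k(y, a)$ and `p[y, a, z]` $= p(z; y, a)$ and the function names are choices made for this illustration. A brute-force search over all deterministic Markovian policies is included as a check of Theorem 1.2 on tiny examples.

```python
import itertools
import numpy as np

def backward_recursion(g, p):
    """Value functions h_k and Markovian decision rules f_k* from (1.14).

    g[k] has shape (|S|, |A|) with g[k][y, a] = g_k(y, a), k = 0..N;
    p has shape (|S|, |A|, |S|) with p[y, a, z] = p(z; y, a).
    """
    N = len(g) - 1
    h = [None] * (N + 1)
    f = [None] * (N + 1)
    h[N] = g[N].max(axis=1)                 # assumed start: h_N(y) = max_a g_N(y, a)
    f[N] = g[N].argmax(axis=1)
    for k in range(N - 1, -1, -1):          # k = N-1, N-2, ..., 0
        q = g[k] + p @ h[k + 1]             # q[y, a] = g_k(y, a) + sum_z h_{k+1}(z) p(z; y, a)
        f[k] = q.argmax(axis=1)
        h[k] = q.max(axis=1)
    return h, f

def brute_force_values(g, p):
    """Best expected total reward over all deterministic Markovian policies, per start state."""
    S, A = g[0].shape
    N = len(g) - 1
    rules = list(itertools.product(range(A), repeat=S))   # all maps S -> A
    best = np.full(S, -np.inf)
    for policy in itertools.product(rules, repeat=N + 1):
        for x0 in range(S):
            dist = np.zeros(S)
            dist[x0] = 1.0
            total = 0.0
            for k, rule in enumerate(policy):
                total += sum(dist[y] * g[k][y, rule[y]] for y in range(S))
                dist = sum(dist[y] * p[y, rule[y]] for y in range(S))
            best[x0] = max(best[x0], total)
    return best

rng = np.random.default_rng(0)
S, A, N = 2, 2, 2
g = [rng.random((S, A)) for _ in range(N + 1)]
p = rng.random((S, A, S))
p /= p.sum(axis=2, keepdims=True)
h, f = backward_recursion(g, p)
print(h[0], brute_force_values(g, p))
```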
Theorem 1.2. The Markovian policy $f^* = (f_0^*, f_1^*, \dots, f_N^*)$ is optimal.

Proof. Fix an initial distribution $\pi_0$. Let $f = (f_0, f_1, \dots, f_N)$ be any given policy. Define the policies
$$f^{(k)} := (f_0, \dots, f_{k-1}, f_k^*, \dots, f_N^*) \quad (k = 1, \dots, N), \qquad f^{(0)} := f^*. \tag{1.15}$$
We will show that
$$J_{\pi_0}^{f} \le J_{\pi_0}^{f^{(k)}} \quad (k = 0, 1, 2, \dots, N). \tag{1.16}$$
First note that the joint distribution of $X_0, X_1, \dots, X_k$ is the same under $f$ and under $f^{(k)}$ (i.e., $\pi_0^{f,k} = \pi_0^{f^{(k)},k}$). In particular,
$$E_{\pi_0}^{f}\sum_{k'=0}^{k-1} g_{k'}(X_{k'}, a_{k'}) = E_{\pi_0}^{f^{(k)}}\sum_{k'=0}^{k-1} g_{k'}(X_{k'}, a_{k'}). \tag{1.17}$$
Now,
$$E_{\pi_0}^{f}[g_N(X_N, a_N) \mid X_0 = x_0, X_1 = x_1, \dots, X_N = x_N] = g_N(x_N, a_N) \le g_N(x_N, f_N^*(x_N)) = h_N(x_N) = E_{\pi_0}^{f^{(N)}}[g_N(X_N, a_N) \mid X_0 = x_0, \dots, X_N = x_N]. \tag{1.18}$$
Since $\pi_0^{f,N} = \pi_0^{f^{(N)},N}$, averaging over $x_0, x_1, \dots, x_N$ one gets $E_{\pi_0}g_N(X_N, a_N)$

for all $x_{k+1} \in S$. Then
$$\begin{aligned} E_{\pi_0}\Big[\sum_{k'=k}^{N} g_{k'}(X_{k'}, a_{k'}) \,\Big|\, X_0 = x_0, \dots, X_k = x_k\Big] &= g_k(x_k, a_k) + E_{\pi_0}\Big[E_{\pi_0}\Big(\sum_{k'=k+1}^{N} g_{k'}(X_{k'}, a_{k'}) \,\Big|\, X_0, \dots, X_{k+1}\Big) \,\Big|\, X_0 = x_0, \dots, X_k = x_k\Big] \\ &= g_k(x_k, a_k) + E_{\pi_0}[h_{k+1}(X_{k+1}) \mid X_0 = x_0, \dots, X_k = x_k] \\ &= g_k(x_k, a_k) + \sum_{z \in S} h_{k+1}(z)p(z; x_k, a_k) \end{aligned} \tag{1.24}$$
0 a diffusion coefficient specified on $(0, \infty)$, satisfying appropriate smoothness conditions. Let $\psi^{(i)}(z)$ denote the probability that a diffusion with coefficients $\mu^{(i)}(\cdot)$, $\sigma^2(\cdot)$ reaches 0 before reaching $d$, starting at $z$. Here $d > 0$ is a fixed number. It is intuitively clear that, for the same diffusion coefficient $\sigma^2(\cdot)$, the larger the drift $\mu(\cdot)$, the better the chance of staying away from zero. Here is a proof.

Proposition 3.1. Consider two diffusions on $[0, \infty)$, with absorption at 0, having a common diffusion coefficient $\sigma^2(\cdot)$ but different drifts $\mu^{(i)}(\cdot)$ $(i = 1, 2)$ satisfying $\mu^{(1)}(z) \le \mu^{(2)}(z)$ for every $z > 0$. Then $\psi^{(2)}(z) \le \psi^{(1)}(z)$ for all $z > 0$.

Proof. By relation (9.19) in Chapter V,
$$\psi^{(i)}(z) = \frac{\int_z^d \exp\big\{-\int_0^u \frac{2\mu^{(i)}(y)}{\sigma^2(y)}\,dy\big\}\,du}{\int_0^d \exp\big\{-\int_0^u \frac{2\mu^{(i)}(y)}{\sigma^2(y)}\,dy\big\}\,du}. \tag{3.7}$$
OPTIMAL CONTROL OF DIFFUSIONS
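Write
$$\eta(z) := \mu^{(2)}(z) - \mu^{(1)}(z), \qquad \mu_\varepsilon(z) := \mu^{(1)}(z) + \varepsilon\eta(z),$$
and let $F(\varepsilon; z)$ denote the probability of reaching 0 before $d$ starting at $z$, for a diffusion with drift coefficient $\mu_\varepsilon(\cdot)$ and diffusion coefficient $\sigma^2(\cdot)$. It is straightforward to check (Exercise 2) that
$$F(\varepsilon; z) = \psi^{(1)}(z)\big(1 - \varepsilon\gamma(z)\big) + O(\varepsilon^2) \quad \text{as } \varepsilon \to 0, \tag{3.8}$$
where, writing $w(u) := \exp\{-\int_0^u 2\mu^{(1)}(y)/\sigma^2(y)\,dy\}$ and $\kappa(u) := \int_0^u 2\eta(y)/\sigma^2(y)\,dy$,
$$\gamma(z) := \frac{\int_z^d \kappa(u)w(u)\,du}{\int_z^d w(u)\,du} - \frac{\int_0^d \kappa(u)w(u)\,du}{\int_0^d w(u)\,du}. \tag{3.9}$$
Since $\eta(z) \ge 0$ for all $z$, $\kappa(\cdot)$ is nondecreasing, so its $w$-weighted average over $[z, d]$ is at least its $w$-weighted average over $[0, d]$. It follows that
$$\gamma(z) > 0 \quad \text{unless } \mu^{(1)}(\cdot) = \mu^{(2)}(\cdot). \tag{3.10}$$
Therefore, barring the case of identical drifts, one has
$$\frac{\partial F(\varepsilon; z)}{\partial\varepsilon}\Big|_{\varepsilon=0} = -\psi^{(1)}(z)\gamma(z) < 0. \tag{3.11}$$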
Hence, $F(\varepsilon; z)$ is strictly decreasing in $\varepsilon$ in a neighborhood of $\varepsilon = 0$, say on $[0, \varepsilon_0)$. Since one can continue beyond $\varepsilon_0$, by replacing $\mu^{(1)}(\cdot)$ by $\mu_{\varepsilon_0}(\cdot)$, the supremum of all such $\varepsilon_0$ is $\infty$. In particular, one can take $\varepsilon = 1$. ■

Example 1. (Survival Under Uncertainty). Consider first the following discrete-time model, in which $h$ denotes the length of time between successive periods. Let $Z_t$ denote an agent's capital at time $t$. His capital at time $t + h$ is given by
$$Z_0 = z > 0, \qquad Z_{t+h} = \exp\{W_{n+1}\}(Z_t - ah) \quad (t = nh;\ n = 0, 1, \dots), \tag{3.12}$$
where $a$ is a positive constant and $\{W_n : n = 1, 2, \dots\}$ is a sequence of i.i.d. Gaussian random variables with mean $mh$ and variance $vh$. The constant $a$ denotes a fixed consumption rate per unit of time, so that $Z_t - ah$ is the amount available for investment in the next period. The random exponential term represents the uncertain rate of return. For the Markov process $\{Z_{nh}\}$, it is easy to check (Exercise 3) that
$$\begin{aligned} E(Z_{t+h} \mid Z_t) &= \exp\{mh + \tfrac12 vh\}(Z_t - ah), \\ E(Z_{t+h}^2 \mid Z_t) &= \exp\{2mh + 2vh\}(Z_t - ah)^2, \\ E(Z_{t+h} - Z_t \mid Z_t) &= [(m + \tfrac12 v)Z_t - a]h + o(h), \\ E((Z_{t+h} - Z_t)^2 \mid Z_t) &= vZ_t^2h + o(h), \\ E(|Z_{t+h} - Z_t|^3 \mid Z_t) &= o(h), \quad \text{as } h \downarrow 0. \end{aligned} \tag{3.13}$$
Therefore, one may, in the limit as $h \downarrow 0$, model $Z_t$ by a diffusion with drift $\mu(\cdot)$ and diffusion coefficient $\sigma^2(\cdot)$ given by
$$\mu(z) = (m + \tfrac12 v)z - a, \qquad \sigma^2(z) = vz^2. \tag{3.14}$$
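The limits in (3.13)-(3.14) can be checked numerically from the exact conditional moments in the first two lines of (3.13). The Python sketch below, with parameter values chosen only for illustration, evaluates $E(Z_{t+h} - Z_t \mid Z_t = z)/h$ and $E((Z_{t+h} - Z_t)^2 \mid Z_t = z)/h$ in closed form and shows them approaching $\mu(z)$ and $\sigma^2(z)$ of (3.14) as $h$ decreases.

```python
import math

def scaled_increment_moments(z, m, v, a, h):
    """Exact E(Z_{t+h} - Z_t | Z_t = z)/h and E((Z_{t+h} - Z_t)^2 | Z_t = z)/h under (3.12)."""
    g1 = math.exp(m * h + 0.5 * v * h)       # E exp(W) for W ~ N(mh, vh)
    g2 = math.exp(2 * m * h + 2 * v * h)     # E exp(2W)
    base = z - a * h                         # capital left for investment
    first = g1 * base - z
    second = g2 * base**2 - 2 * z * g1 * base + z**2
    return first / h, second / h

z, m, v, a = 2.0, 0.05, 0.04, 0.1           # hypothetical values
for h in (1e-1, 1e-3, 1e-5):
    print(h, scaled_increment_moments(z, m, v, a, h))
print("limits from (3.14):", (m + 0.5 * v) * z - a, v * z**2)
```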
The state space of this diffusion should be taken to be $[0, \infty)$, since negative capital is not allowed. One takes "0" as an absorbing state, which when reached indicates the agent's economic ruin. In the discrete-time model, suppose that the agent is free to choose $(m, v)$ for the next period depending on the capital in the current period. This choice is restricted to a set $C \subset \mathbb{R} \times (0, \infty)$. In the diffusion approximation, this amounts to a choice of drift and diffusion coefficients,
$$\mu(z) = \big(m(z) + \tfrac12 v(z)\big)z - a, \qquad \sigma^2(z) = v(z)z^2, \tag{3.15}$$
such that
$$(m(z), v(z)) \in C \quad \text{for every } z \in (0, \infty). \tag{3.16}$$
The object is to choose $m(z)$, $v(z)$ in such a way as to minimize the probability of ruin. The following assumptions on $C$ are needed:
$$\begin{aligned} &\text{(i)}\ 0 < v_* := \inf\{v : (m, v) \in C\} < v^* := \sup\{v : (m, v) \in C\} < \infty, \\ &\text{(ii)}\ f(v) := \sup\{m : (m, v) \in C\} < \infty, \text{ and } (f(v), v) \in C \text{ for each } v \in [v_*, v^*], \\ &\text{(iii)}\ v \mapsto f(v) \text{ is twice continuously differentiable and concave on } (v_*, v^*), \\ &\text{(iv)}\ \lim_{v \downarrow v_*} f'(v) = \infty, \quad \lim_{v \uparrow v^*} f'(v) = -\infty. \end{aligned} \tag{3.17}$$
First fix a $d > 0$. Suppose that the agent wants to quit while ahead, i.e., when a capital $d$ is reached. The goal is to maximize the probability of reaching $d$ before 0. This is equivalent to minimizing the probability $\psi(z)$ of reaching 0 before $d$, starting at $z$. It follows from Proposition 3.1 that an optimal choice of $(m(z), v(z))$ should be of the form
$$(m(z), v(z)) = (f(v(z)), v(z)). \tag{3.18}$$
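The reduction (3.18) rests on the comparison in Proposition 3.1, which can be checked numerically straight from formula (3.7). The Python sketch below (names and coefficients are illustrative assumptions) evaluates (3.7) by the trapezoidal rule on a grid of $[0, d]$. It assumes $2\mu/\sigma^2$ is integrable near 0, which holds, e.g., when $\mu$ is bounded and $\sigma^2$ is bounded away from zero.

```python
import numpy as np

def cumulative_trapezoid(y, x):
    """Cumulative trapezoidal integral of y over the grid x, starting from 0."""
    out = np.zeros_like(y)
    out[1:] = np.cumsum((y[1:] + y[:-1]) * np.diff(x) / 2.0)
    return out

def ruin_probability(mu, sigma2, d, n=20_001):
    """Grid approximation of psi(z) in (3.7): P_z(reach 0 before d), 0 <= z <= d."""
    z = np.linspace(0.0, d, n)
    inner = cumulative_trapezoid(2.0 * mu(z) / sigma2(z), z)   # int_0^u 2 mu / sigma^2
    w = np.exp(-inner)
    s = cumulative_trapezoid(w, z)                             # s(u) = int_0^u w
    return z, (s[-1] - s) / s[-1]                              # int_z^d w / int_0^d w

# Hypothetical coefficients with mu1 <= mu2 and a common sigma^2.
sigma2 = lambda y: 1.0 + 0.5 * y**2
mu1 = lambda y: 0.1 + 0.0 * y
mu2 = lambda y: 0.1 + 0.4 * y / (1.0 + y)
z, psi1 = ruin_probability(mu1, sigma2, d=3.0)
_, psi2 = ruin_probability(mu2, sigma2, d=3.0)
# Proposition 3.1 predicts psi2 <= psi1 everywhere on [0, d].
print(np.all(psi2 <= psi1 + 1e-12))
```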
The problem of optimization has now been reduced to a choice of $v(z)$. The dynamic programming equation for this choice is contained in the following proposition. To state it define, for each constant $v \in [v_*, v^*]$, the infinitesimal generator $A_v$ by
$$(A_vg)(z) := \tfrac12 vz^2g''(z) + \{(f(v) + \tfrac12 v)z - a\}g'(z). \tag{3.19}$$
For a given measurable function $v(\cdot)$ on $(0, \infty)$ into $[v_*, v^*]$ define the infinitesimal generator $A_{v(\cdot)}$ by
$$(A_{v(\cdot)}g)(z) := \tfrac12 v(z)z^2g''(z) + \{(f(v(z)) + \tfrac12 v(z))z - a\}g'(z). \tag{3.20}$$
Fix $d > 0$. Let $\psi_{v(\cdot)}(z)$ denote the probability that a diffusion having generator $A_{v(\cdot)}$, starting at $z \in [0, d]$, reaches 0 before $d$. For simplicity consider $v(\cdot)$ to be differentiable, although the arguments are valid for all measurable $v(\cdot)$ (see theoretical complement 2). Write
$$\psi(z) := \inf_{v(\cdot)}\psi_{v(\cdot)}(z). \tag{3.21}$$
Proposition 3.2. Assume the hypothesis (3.17) on $C$. Then the dynamic programming equation
$$\min_{v \in [v_*, v^*]} A_v\psi(z) = 0$$
for 0 0 is the discount rate. The given functions $\mu^{(i)}(y, c)$ $(1 \le i \le k)$ are appropriately smooth on $G \times C$, as are $d_{ij}(y, c)$ $(1 \le i, j \le k)$, and $D(y, c)$ is positive definite. Once again, the set of feasible controls $c(\cdot)$ may vary. For example, it may be the class of all continuously differentiable functions on $G$ into $C$. The function $r(y, c)$ is continuous and bounded on $G \times C$. If, under some feasible control $c(\cdot)$, the diffusion $\{X_t\}$ can reach $\partial G$ with positive probability, then in (3.34) the upper limit of integration should be replaced by $\tau_{\partial G}$. Let
$$J(x) := \sup_{c(\cdot)} J^{c(\cdot)}(x). \tag{3.35}$$
In order to derive the dynamic programming equation informally, let $c^*(\cdot)$ be an optimal control. Consider a control $c_1(\cdot)$ which, starting at $x$, takes the action $c$ initially over the period $[0, t]$ and from then on uses the optimal control $c^*(\cdot)$. Then, as in the derivation of (3.2), (3.3), with $T_t$ the transition operator of the diffusion under the constant control $c$,
$$\begin{aligned} J(x) \ge J^{c_1(\cdot)}(x) &= E_x\int_0^t e^{-\beta s}r(X_s, c)\,ds + E_x\Big[e^{-\beta t}\int_0^\infty e^{-\beta s'}r\big(X_{t+s'}, c^*(X_{t+s'})\big)\,ds'\Big] \\ &= E_x\int_0^t e^{-\beta s}r(X_s, c)\,ds + e^{-\beta t}E_xJ(X_t) \\ &= t\,r(x, c) + o(t) + e^{-\beta t}T_tJ(x), \end{aligned}$$
or,
$$\frac{1 - e^{-\beta t}}{t}J(x) \ge r(x, c) + e^{-\beta t}\,\frac{T_tJ(x) - J(x)}{t} + o(1). \tag{3.36}$$
Letting $t \downarrow 0$ in (3.36) one arrives at
$$\beta J(x) \ge r(x, c) + A_cJ(x). \tag{3.37}$$
One expects equality if the right side of (3.37) (or (3.36)) is maximized with respect to $c$, leading to the dynamic programming equation
$$\beta J(x) = \sup_{c \in C}\{r(x, c) + A_cJ(x)\}. \tag{3.38}$$
In the example below, $k = 1$.

Example 2. The state space is $G = (0, \infty)$. Let $c$ denote the consumption rate of an economic agent, i.e., the "fraction" of stock consumed per unit time. The utility, or reward rate, is
$$r(x, c) = \frac{(cx)^\gamma}{\gamma}, \tag{3.39}$$
where $\gamma$ is a constant, $0 < \gamma < 1$. The stock $X_t$ corresponding to a constant control $c$ is a diffusion with drift and diffusion coefficients
$$\mu(x, c) = \delta x - cx, \qquad \sigma^2(x, c) = \sigma^2x^2, \tag{3.40}$$
where $\delta > 1$ is a given constant representing the growth rate of capital, while $c$ is the depletion rate due to consumption. For any given feasible control $c(\cdot)$ one replaces $c$ by $c(x)$ in (3.40), giving rise to a corresponding diffusion $\{X_t\}$. It is simple to check that such a diffusion never reaches the boundary (Exercise 4). The dynamic programming equation (3.38) becomes
$$\beta J(x) = \max_{c \in [0, 1]}\Big\{\frac{c^\gamma}{\gamma}x^\gamma + \tfrac12\sigma^2x^2J''(x) + (\delta x - cx)J'(x)\Big\}. \tag{3.41}$$
Let us for the moment assume that the maximum in (3.41) is attained in the interior so that, differentiating the expression within braces with respect to $c$, one gets
$$c^{\gamma-1}x^\gamma = xJ'(x), \quad \text{or,} \quad c^*(x) = \frac{1}{x}\big(J'(x)\big)^{1/(\gamma-1)}. \tag{3.42}$$
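Since the function within braces in (3.41) is strictly concave in $c$, it follows that (3.42) is the unique maximum, provided it lies in $(0, 1)$. Substituting (3.42) in (3.41) one gets
$$\tfrac12\sigma^2x^2J''(x) + \delta xJ'(x) - \Big(1 - \frac{1}{\gamma}\Big)\big(J'(x)\big)^{\gamma/(\gamma-1)} - \beta J(x) = 0. \tag{3.43}$$
Try the "trial solution"
$$J(x) = dx^\gamma. \tag{3.44}$$
Then (3.43) becomes, as $x^\gamma$ is a factor in (3.43),
$$\tfrac12\sigma^2d\gamma(\gamma - 1) + d\delta\gamma - \Big(1 - \frac{1}{\gamma}\Big)(d\gamma)^{\gamma/(\gamma-1)} - \beta d = 0, \tag{3.45}$$
i.e.,
$$d = \frac{1}{\gamma}\Big(\frac{1 - \gamma}{\beta + \frac12\sigma^2\gamma(1 - \gamma) - \delta\gamma}\Big)^{1-\gamma}. \tag{3.46}$$
Hence, from (3.42), (3.44), and (3.46),
$$c^* := c^*(x) = (\gamma d)^{1/(\gamma-1)} = \frac{\beta + \frac12\sigma^2\gamma(1 - \gamma) - \delta\gamma}{1 - \gamma}. \tag{3.47}$$
Thus $c^*(x)$ is independent of $x$. Of course one needs to assume here that this constant is positive, since a zero value leads to the minimum expected discounted reward zero. Hence,
$$\beta + \tfrac12\sigma^2\gamma(1 - \gamma) - \delta\gamma > 0. \tag{3.48}$$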
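The closed forms (3.46)-(3.47) are easy to sanity-check numerically. The Python sketch below uses hypothetical parameter values, chosen so that $\delta > 1$, (3.48) holds and $c^* < 1$; it computes $d$ and $c^*$, then verifies that $J(x) = dx^\gamma$ satisfies (3.41) with the maximum over $c \in [0, 1]$ attained at $c^*$ for several values of $x$.

```python
import numpy as np

# Hypothetical parameters: delta > 1, (3.48) holds, and c* lands in (0, 1).
beta, gamma, sigma, delta = 1.0, 0.5, 0.4, 1.2

c_star = (beta + 0.5 * sigma**2 * gamma * (1 - gamma) - delta * gamma) / (1 - gamma)  # (3.47)
assert 0 < c_star < 1, "parameters violate (3.48) or give c* >= 1"
d = c_star ** (gamma - 1) / gamma                  # (3.46), i.e. (gamma d)^{1/(gamma-1)} = c*

def J(x):
    return d * x**gamma                            # trial solution (3.44)

def braces(x, c):
    """Expression inside the braces of (3.41) for J(x) = d x^gamma."""
    dJ = d * gamma * x**(gamma - 1)
    d2J = d * gamma * (gamma - 1) * x**(gamma - 2)
    return (c * x)**gamma / gamma + 0.5 * sigma**2 * x**2 * d2J + (delta - c) * x * dJ

cs = np.linspace(0.0, 1.0, 100_001)
for x in (0.5, 1.0, 4.0):
    residual = beta * J(x) - braces(x, c_star)     # should vanish by (3.41)
    c_best = cs[np.argmax(braces(x, cs))]          # grid maximizer over [0, 1]
    print(x, residual, c_best, c_star)
```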
For feasibility, one must also require that $c^* \le 1$; i.e., if the expression on the extreme right in (3.47) is greater than 1, then the maximum in (3.41) is attained always at $c^* = 1$ (Exercise 5). Therefore,
$$c^* = \min\{1, d'\}, \tag{3.49}$$
where $d'$ is the extreme right side of (3.47). We have arrived at the following.

Proposition 3.3. Assume $\delta > 1$ and (3.48). Then the control $c^*(\cdot) = \min\{1, d'\}$ is optimal in the class of all continuously differentiable controls.

Proof. It has been shown above that $c^*(\cdot)$ is the unique solution to the dynamic programming equation (3.38) with $J = J^{c^*}$. Let $c(\cdot)$ be any continuously differentiable control. Then it is simple to check (see Exercise 6) that
$$\beta J^{c(\cdot)}(x) = r(x, c(x)) + A_{c(\cdot)}J^{c(\cdot)}(x). \tag{3.50}$$
Using (3.38) with $J = J^{c^*}$ one then has for the function
$$h(x) := J^{c^*}(x) - J^{c(\cdot)}(x), \tag{3.51}$$
the inequality
$$\begin{aligned} \beta h(x) &= r(x, c^*) + A_{c^*}J^{c^*}(x) - \{r(x, c(x)) + A_{c(\cdot)}J^{c(\cdot)}(x)\} \\ &\ge r(x, c(x)) + A_{c(\cdot)}J^{c^*}(x) - \{r(x, c(x)) + A_{c(\cdot)}J^{c(\cdot)}(x)\} = A_{c(\cdot)}h(x), \end{aligned} \tag{3.52}$$
or,
$$g(x) := (\beta - A_{c(\cdot)})h(x) \ge 0. \tag{3.53}$$
Since the nonnegative function
$$h_1(x) := E_x^{c(\cdot)}\Big(\int_0^\infty e^{-\beta s}g(X_s)\,ds\Big) \tag{3.54}$$
satisfies the equation (Exercise 6)
$$(\beta - A_{c(\cdot)})h_1(x) = g(x), \tag{3.55}$$
one has, assuming uniqueness of the solution to (3.55) (see theoretical complement 3), $h(x) = h_1(x)$. Therefore, $h(x) \ge 0$. ■

Under the optimal policy $c^*$ the growth of the capital (or stock) $X_t$ is that of a diffusion on $(0, \infty)$ with drift and diffusion coefficients
$$\mu(x) := (\delta - c^*)x, \qquad \sigma^2(x) := \sigma^2x^2. \tag{3.56}$$
By a change of variables (see Example 3.1 of Chapter V), the process $\{Y_t := \log X_t\}$ is a Brownian motion on $\mathbb{R}$ with drift $\delta - c^* - \frac12\sigma^2$ and diffusion coefficient $\sigma^2$. Thus, $\{X_t\}$ is a geometric Brownian motion.
4 OPTIMAL STOPPING AND THE SECRETARY PROBLEM

On a probability space $(\Omega, \mathcal{F}, P)$ an increasing sequence $\{\mathcal{F}_n : n \ge 0\}$ of sub-sigmafields of $\mathcal{F}$ is given. For example, one may have $\mathcal{F}_n = \sigma\{X_0, X_1, \dots, X_n\}$, where $\{X_n\}$ is a sequence of random variables. Also given are real-valued integrable random variables $\{Y_n : n \ge 0\}$, $Y_n$ being $\mathcal{F}_n$-measurable. The objective is to find an $\{\mathcal{F}_n\}$-stopping time $\tau_m^*$ that minimizes $EY_\tau$ in the class $\mathcal{T}_m$ of all $\{\mathcal{F}_n\}$-stopping times $\tau \le m$. One may think of $Y_n$ as the loss incurred by stopping at time $n$, and $m$ as the maximum number of observations allowed.
This problem is solved by backward recursion in much the same way as the finite-horizon dynamic programming problem was solved in Section 1. We first give a somewhat heuristic derivation of $\tau_m^*$. If $\mathcal{F}_n = \sigma\{X_0, \dots, X_n\}$ $(n \ge 0)$, and $X_0, X_1, \dots, X_{m-1}$ have been observed, then by stopping at time $m - 1$ the loss incurred would be $Y_{m-1}$. On the other hand, if one decided to continue sampling then the loss would be $Y_m$. But $Y_m$ is not known yet, since $X_m$ has not been observed at the time the decision is made to stop or not to stop sampling. The (conditional) expected value of $Y_m$, given $X_0, \dots, X_{m-1}$, must then be compared to $Y_{m-1}$. In other words, $\tau_m^*$ is given, on the set $\{\tau_m^* \ge m - 1\}$, by
$$\tau_m^* = \begin{cases} m - 1 & \text{if } Y_{m-1} \le E(Y_m \mid \mathcal{F}_{m-1}), \\ m & \text{if } Y_{m-1} > E(Y_m \mid \mathcal{F}_{m-1}). \end{cases} \tag{4.1}$$
As a consequence of such a stopping rule, one's expected loss, given $\{X_0, \dots, X_{m-1}\}$, is
$$V_{m-1} := \min\{Y_{m-1}, E(Y_m \mid \mathcal{F}_{m-1})\} \quad \text{on } \{\tau_m^* \ge m - 1\}. \tag{4.2}$$
Similarly, suppose one has already observed $X_0, X_1, \dots, X_{m-2}$ (so that $\tau_m^* \ge m - 2$). Then one should continue sampling only if $Y_{m-2}$ is greater than the conditional expectation (given $\{X_0, \dots, X_{m-2}\}$) of the loss that would result from continued sampling. That is,
$$\tau_m^* \begin{cases} = m - 2 & \text{if } Y_{m-2} \le E(V_{m-1} \mid \mathcal{F}_{m-2}), \\ \ge m - 1 & \text{if } Y_{m-2} > E(V_{m-1} \mid \mathcal{F}_{m-2}), \end{cases} \quad \text{on } \{\tau_m^* \ge m - 2\}. \tag{4.3}$$
The conditional expected loss, given $\{X_0, \dots, X_{m-2}\}$, is then
$$V_{m-2} := \min\{Y_{m-2}, E(V_{m-1} \mid \mathcal{F}_{m-2})\} \quad \text{on } \{\tau_m^* \ge m - 2\}. \tag{4.4}$$
Proceeding backward in this manner one finally arrives at
$$\tau_m^* \begin{cases} = 0 & \text{if } Y_0 \le E(V_1 \mid \mathcal{F}_0), \\ \ge 1 & \text{if } Y_0 > E(V_1 \mid \mathcal{F}_0), \end{cases} \quad \text{on } \{\tau_m^* \ge 0\} = \Omega. \tag{4.5}$$
The conditional expectation of the loss, given $\mathcal{F}_0$, is then
$$V_0 := \min\{Y_0, E(V_1 \mid \mathcal{F}_0)\}. \tag{4.6}$$
More precisely, $V_j$ are defined by backward recursion,
$$V_m := Y_m, \qquad V_j := \min\{Y_j, E(V_{j+1} \mid \mathcal{F}_j)\} \quad (j = m - 1, m - 2, \dots, 0), \tag{4.7}$$
and the stopping time $\tau_m^*$ is defined by
$$\tau_m^* := \min\{j : 0 \le j \le m,\ Y_j = V_j\}. \tag{4.8}$$
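When $\mathcal{F}_j = \sigma\{X_0, \dots, X_j\}$ for a finite-state Markov chain $\{X_j\}$ with transition matrix $P$, and the loss $Y_j$ depends on $X_j$ alone (both are assumptions made only for this illustration), the conditional expectations in (4.7) become matrix-vector products, and (4.7)-(4.8) can be computed exactly. A minimal Python sketch with hypothetical names; the second function evaluates $E(Y_{\tau_m^*})$ by propagating the law of the unstopped chain forward, which should reproduce $V_0$.

```python
import numpy as np

def optimal_stopping(P, loss):
    """Backward recursion (4.7) and the stopping sets of (4.8).

    P[x, y] : transition probabilities of the chain X_j;
    loss[j] : array with loss[j][x] = Y_j on {X_j = x}, j = 0..m.
    Returns V (V[j][x] = V_j on {X_j = x}) and boolean arrays stop[j]
    marking the states where Y_j = V_j, so tau* = min{j : stop[j][X_j]}.
    """
    m = len(loss) - 1
    Y = [np.asarray(l, dtype=float) for l in loss]
    V = [None] * (m + 1)
    stop = [None] * (m + 1)
    V[m] = Y[m]                                  # V_m = Y_m
    stop[m] = np.ones(Y[m].shape, dtype=bool)    # tau* <= m always
    for j in range(m - 1, -1, -1):
        cont = P @ V[j + 1]                      # E(V_{j+1} | X_j = x)
        stop[j] = Y[j] <= cont                   # stop iff Y_j = V_j, cf. (4.1)
        V[j] = np.minimum(Y[j], cont)            # (4.7)
    return V, stop

def expected_loss_of_rule(P, loss, stop, x0):
    """E(Y_tau) for tau = first j with stop[j][X_j], starting from X_0 = x0."""
    mass = np.zeros(P.shape[0])
    mass[x0] = 1.0                               # law of X_j restricted to {tau >= j}
    total = 0.0
    for j, s in enumerate(stop):
        total += mass[s] @ np.asarray(loss[j], dtype=float)[s]
        mass = np.where(s, 0.0, mass) @ P
    return total

# Hypothetical example: a lazy walk on three states, losses growing with time.
P = np.array([[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]])
loss = [np.array([3.0, 1.0, 2.0]) + 0.2 * j for j in range(6)]
V, stop = optimal_stopping(P, loss)
print(V[0], expected_loss_of_rule(P, loss, stop, x0=0))
```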
Although the optimality of $\tau_m^*$ is intuitively clear, a formal proof is worthwhile. First we need the following extension of the Optional Stopping Theorem (Chapter I, Theorem 13.3). A sequence $\{S_j : j \ge 0\}$ is an $\{\mathcal{F}_j\}$-submartingale if $E(S_j \mid \mathcal{F}_{j-1}) \ge S_{j-1}$ a.s. for all $j \ge 1$.

Theorem 4.1. (Optional Stopping Theorem for Submartingales). Let $\{S_j : j \ge 0\}$ be an $\{\mathcal{F}_j\}$-submartingale, $\tau$ a stopping time such that (i) $P(\tau < \infty) = 1$, (ii) $E|S_\tau| < \infty$, and (iii) $\lim_{m \to \infty} E(S_m\mathbf{1}_{\{\tau > m\}}) = 0$. Then $ES_\tau \ge ES_0$, with equality in the case $\{S_j\}$ is an $\{\mathcal{F}_j\}$-martingale.

Proof. The proof is almost the same as that of Theorem 13.1 of Chapter I if we write $X_j := S_j - S_{j-1}$ $(j \ge 1)$, $X_0 = S_0$, and note that $E(X_j \mid \mathcal{F}_{j-1}) \ge 0$ and accordingly change (13.8) of Chapter I to
$$E\big(X_j\mathbf{1}_{\{\tau \ge j\}}\big) = E\big[\mathbf{1}_{\{\tau \ge j\}}E(X_j \mid \mathcal{F}_{j-1})\big] \ge 0,$$
and replace the equalities in (13.9) and (13.11) by the corresponding inequalities. ■
The main theorem of this section may be proved now. Theorem 4.1 is needed for the proof of part (c), and we use it only for $\tau$
$$= E\big(ZV_{\tau_m^* \wedge j}\mathbf{1}_{\{\tau_m^* \le j\}}\big) + E\big(ZV_j\mathbf{1}_{\{\tau_m^* > j\}}\big). \tag{4.11}$$
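But, on $\{\tau_m^* > j\}$, $V_j = E(V_{j+1} \mid \mathcal{F}_j)$. Also, $\{\tau_m^* > j\} \in \mathcal{F}_j$. Therefore,
$$E\big(ZV_j\mathbf{1}_{\{\tau_m^* > j\}}\big) = E\big(Z\mathbf{1}_{\{\tau_m^* > j\}}E(V_{j+1} \mid \mathcal{F}_j)\big) = E\big(ZV_{j+1}\mathbf{1}_{\{\tau_m^* > j\}}\big) = E\big(ZV_{\tau_m^* \wedge (j+1)}\mathbf{1}_{\{\tau_m^* > j\}}\big). \tag{4.12}$$
Using (4.12) in (4.11) one gets (4.10), since $\tau_m^* \wedge j = \tau_m^* \wedge (j + 1)$ on $\{\tau_m^* \le j\}$.

(c) Let $\tau \in \mathcal{T}_m$. Since $Y_j \ge V_j$ for all $j$ (see (4.7)), one has $Y_\tau \ge V_\tau$. By (4.7), Theorem 4.1, and the submartingale property of $\{V_j\}$ it now follows that
$$E(Y_\tau) \ge E(V_\tau) \ge E(V_0). \tag{4.13}$$
This gives the first relation in (4.9). The second relation in (4.9) follows by the martingale property of $\{V_{\tau_m^* \wedge j}\}$ (and Theorem 4.1). Note $Y_{\tau_m^*} = V_{\tau_m^*}$.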
In the minimization of $EY_\tau$ over $\mathcal{T}_m$, $Y_n$ need not be $\mathcal{F}_n$-measurable. In such cases one may replace $Y_n$ by $E(Y_n \mid \mathcal{F}_n) = U_n$, say, and note that, for every $\tau \in \mathcal{T}_m$,
$$E(Y_\tau) = \sum_{j=0}^{m} E\big(Y_j\mathbf{1}_{\{\tau = j\}}\big) = \sum_{j=0}^{m} E\big[E(Y_j\mathbf{1}_{\{\tau = j\}} \mid \mathcal{F}_j)\big] = \sum_{j=0}^{m} E\big[\mathbf{1}_{\{\tau = j\}}U_j\big] = EU_\tau.$$
Hence the minimization of $EY_\tau$ reduces to the minimization of $EU_\tau$ over $\mathcal{T}_m$. Also, instead of minimization one could as easily maximize $EY_\tau$ over $\mathcal{T}_m$. Simply replace min by max in (4.7), and replace "≥" by "≤"

then $a_j < 1$ so that (i) $Y_j > E(V_{j+1} \mid \mathcal{F}_j) = (j/m)a_j$ on $\{X_j = M_j\}$ and (ii) $0 = Y_j < E(V_{j+1} \mid \mathcal{F}_j)$ on $\{X_j < M_j\}$. Simply stated, the optimal stopping rule is to draw $j^*$ observations and then continue sampling until an observation larger than all the preceding shows up (and if this does not happen, stop after the last observation has been drawn). The maximal probability of stopping at the maximum value is then
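$$E(V_1) = \frac{j^*}{m}\Big(\frac{1}{j^*} + \frac{1}{j^* + 1} + \cdots + \frac{1}{m - 1}\Big). \tag{4.25}$$
Finally, note that, as $m \to \infty$,
$$a_{j^*} = \frac{1}{j^*} + \frac{1}{j^* + 1} + \cdots + \frac{1}{m - 1} \approx 1, \tag{4.26}$$
where the difference between the two sides of the relation "≈" goes to zero. This follows since $j^*$ must go to infinity (as the series $\sum(1/j)$ diverges and $j^*$ is defined by (4.20)) and $a_{j^*} > 1$, $a_{j^*+1} < 1$, while $a_{j^*} - a_{j^*+1} = 1/j^* \to 0$. Now,
$$a_{j^*} = \frac{1}{m}\sum_{i=j^*}^{m-1}\frac{1}{i/m} \approx \int_{j^*/m}^{1}\frac{1}{x}\,dx = -\log(j^*/m). \tag{4.27}$$
Combining (4.26) and (4.27) one gets
$$-\log\frac{j^*}{m} \approx 1, \quad \text{i.e.,} \quad \frac{j^*}{m} \approx e^{-1}, \tag{4.28}$$
where the ratio of the two sides of "≈" goes to one, as $m \to \infty$. Thus,
$$\lim_{m \to \infty}\frac{j^*}{m} = e^{-1}, \qquad \lim_{m \to \infty}E(V_1) = e^{-1}. \tag{4.29}$$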
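The quantities in (4.25)-(4.29) are easy to tabulate. The Python sketch below takes $j^*$ to be the largest $j$ with $a_j = 1/j + \cdots + 1/(m - 1) \ge 1$ (our reading of the definition (4.20), which is not reproduced here), evaluates (4.25), and cross-checks it against a direct maximization over $r$ of $(r/m)(1/r + \cdots + 1/(m - 1))$, the success probability of the rule that passes over the first $r$ observations and then stops at the first one exceeding all its predecessors.

```python
import math

def success_probability(r, m):
    """P(stop at the overall maximum) when the first r of m observations are passed over (1 <= r < m)."""
    return (r / m) * sum(1.0 / i for i in range(r, m))

def secretary(m):
    """Return j* and the maximal success probability (4.25), for m >= 2."""
    a = 0.0
    for j in range(m - 1, 0, -1):      # a_j = 1/j + ... + 1/(m-1), built from the top
        a += 1.0 / j
        if a >= 1.0:                   # assumed reading of (4.20): largest j with a_j >= 1
            return j, (j / m) * a
    raise ValueError("m must be at least 2")

for m in (10, 100, 1000):
    j_star, p = secretary(m)
    best = max(success_probability(r, m) for r in range(1, m))
    print(m, j_star, j_star / m, p, best, 1 / math.e)
```

The cross-check works because $(r + 1)a_{r+1} - ra_r = a_{r+1} - 1$, so the success probability increases in $r$ exactly while $a_{r+1}$ exceeds 1; for $m = 10$ this gives $j^* = 3$ and probability about $0.3987$.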
CHAPTER APPLICATION
5 CHAPTER APPLICATION: OPTIMALITY OF (S, s) POLICIES IN INVENTORY PROBLEMS

Suppose a company has an amount $x$ of a commodity on stock at the beginning of a period. The problem is to determine the amount $a$ of additional stock that should be ordered to meet a random demand $W$ at the end of this period. The cost of ordering $a$ units is
$$\mathcal{C}(a) := \begin{cases} 0 & \text{if } a = 0, \\ K + ca & \text{if } a > 0, \end{cases} \tag{5.1}$$
where $K \ge 0$ is the reorder cost and $c > 0$ is the cost per unit. There is a holding cost $h > 0$ per unit of stock left unsold, and a depletion cost $d > 0$ per unit of unmet demand. Thus the expected total cost is
$$I(x, a) := \mathcal{C}(a) + L(x + a), \tag{5.2}$$
where
$$L(y) := E\big(h\max\{0, y - W\} + d\max\{0, W - y\}\big). \tag{5.3}$$
In this model $x$ may be negative, but $a \ge 0$. The objective is to find, for each $x$, the value $a = f^*(x)$ that minimizes (5.2). Assume
$$EW < \infty, \qquad d > c > 0, \qquad h > 0. \tag{5.4}$$
Consider the function $G$ on $\mathbb{R}$,
$$G(y) := cy + L(y). \tag{5.5}$$
Then,
$$I(x, a) = \begin{cases} G(x) - cx & \text{if } a = 0, \\ G(x + a) + K - cx & \text{if } a > 0. \end{cases} \tag{5.6}$$
Minimizing $I(x, a)$ over $a \ge 0$ is equivalent to minimizing $I(x, a) + cx$ over $a \ge 0$, or over $x + a \ge x$. But
$$I(x, a) + cx = \begin{cases} G(x) & \text{if } a = 0, \\ G(x + a) + K & \text{if } a > 0. \end{cases} \tag{5.7}$$
Thus, the optimum $a = f^*(x)$ equals $y^*(x) - x$, where $y = y^*(x)$ minimizes the
function (on [x, oo))
G(Y; x) {G(x) G(y) + K
if y = x, if y> x,
(5.8)
over y > x. Now the function y - L(y) is convex, since the functions y -+ max{0, y - w}, max{0, w - y} are convex for each w. Therefore, G(y) is convex. Clearly, G(y) goes to 0o as y -> oo. Also, for y < 0, G(y) > cy + djyl. Hence G(y) -+ oo as y -. - oo, as d > c > 0. Thus, G(y) has a minimum at y = S. say. Let s < S be such that
G(s)
=
K
+
G(S).
(5.9)
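If $K = 0$, one may take $s = S$. If $K > 0$, then such an $s < S$ exists, since $G(y) \to \infty$ as $y \to -\infty$ and $G$ is continuous. Observe also that $G$ decreases on $(-\infty, S]$ and increases on $(S, \infty)$, by convexity. Therefore, for $x \in (-\infty, s]$, $G(x) \ge G(s) = K + G(S)$, and the minimum of $G(\cdot\,; x)$ in (5.8) is attained at $y = y^*(x) = S$. On the other hand, for $x \in (s, S]$, $G(x) \le G(s) = K + G(S) \le K + G(y)$ for all $y > x$, so that $y^*(x) = x$; and for $x > S$, $G(x) < G(y) + K$ for all $y > x$, since $G$ is increasing on $(S, \infty)$. Hence $y^*(x) = x$ for $x \in (s, \infty)$. Since $f^*(x) = y^*(x) - x$, we have proved the following.

Proposition 5.1. Assume (5.4). There exist two numbers $s \le S$ such that

$$f^*(x) = \begin{cases} S - x & \text{if } x \le s,\\ 0 & \text{if } x > s.\end{cases} \tag{5.10}$$

The minimum cost function is

$$I(x) := I(x, f^*(x)) = \varphi(f^*(x)) + L(x + f^*(x)) = \begin{cases} K + G(S) - cx & \text{if } x \le s,\\ G(x) - cx & \text{if } x > s.\end{cases} \tag{5.11}$$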
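The construction of $S$ and $s$ in the proof of Proposition 5.1 can be carried out numerically. The sketch below assumes, beyond the text, that the demand $W$ takes finitely many nonnegative values, so that $G$ in (5.5) is convex and piecewise linear with kinks at the atoms of $W$; a minimizer $S$ can then be taken among those atoms, and $s$ is found by bisection on $(-\infty, S]$, where $G$ is decreasing. The helper names and the demand law and costs in the usage lines are illustrative assumptions only.

```python
def L(y, demand, h, d):
    """Expected holding/depletion cost (5.3) for a finite demand law {w: P(W = w)}."""
    return sum(p * (h * max(0.0, y - w) + d * max(0.0, w - y)) for w, p in demand.items())


def G(y, demand, c, h, d):
    """G(y) = c*y + L(y), eq. (5.5)."""
    return c * y + L(y, demand, h, d)


def s_S_levels(demand, K, c, h, d, tol=1e-12):
    """Return (s, S) of Proposition 5.1; assumes d > c > 0 and h > 0 as in (5.4)."""
    # G is convex and piecewise linear with kinks at the demand atoms,
    # so a minimizer S can be chosen among those atoms.
    S = min(demand, key=lambda w: G(w, demand, c, h, d))
    target = K + G(S, demand, c, h, d)
    if K == 0:
        return S, S
    # Walk left until G(lo) >= K + G(S); this stops because G -> infinity as y -> -infinity.
    lo = S - 1.0
    while G(lo, demand, c, h, d) < target:
        lo = S - 2.0 * (S - lo)
    hi = S
    # Bisection for G(s) = K + G(S) on [lo, S], where G is decreasing.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if G(mid, demand, c, h, d) >= target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi), S


def order_quantity(x, s, S):
    """(S, s) rule (5.10): order up to S when the stock x is at or below s."""
    return S - x if x <= s else 0.0


# Illustrative data: W in {0, 1, 2} with probabilities 0.2, 0.5, 0.3; c = h = 1, d = 3, K = 2.
demand = {0: 0.2, 1: 0.5, 2: 0.3}
s, S = s_S_levels(demand, K=2.0, c=1.0, h=1.0, d=3.0)
print(s, S)  # s is approximately -0.4 and S == 1
```

For this data $G$ has slopes $-2, -1.2, 0.8, 2$ on the pieces cut at $0, 1, 2$, so by hand $S = 1$, $G(1) = 2.1$, and $G(s) = 4.1$ gives $s = -0.4$.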
Instead of fixed costs per unit for holding and depletion, one may assume that $L(y)$ is given by

$$L(y) := E\big(H(\max\{0, y - W\}) + D(\max\{0, W - y\})\big). \tag{5.12}$$

Here $H$ and $D$ are convex and increasing on $[0, \infty)$, $H(0) = 0 = D(0)$. The assumption (5.4) may now be relaxed to

$$L(y) < \infty \ \text{ for all } y, \qquad \lim_{x \to -\infty}\big(cx + L(x)\big) > K + cS + L(S), \tag{5.13}$$
where $S$ is a point where $G$ is minimum.

Next consider an $(N + 1)$-period dynamic inventory problem. If the initial stock is $x = X_0$ (state) and $a = a_0$ units (action) are ordered, then the expected total cost in period 0 is

$$g_0(x, a) := E\big(\varphi(a) + h\max\{0, x + a - W_1\} + d\max\{0, W_1 - x - a\}\big) = I(x, a). \tag{5.14}$$

We denote by $W_1, W_2, \ldots, W_N$ i.i.d. random demands arising at the end of periods $0, 1, \ldots, N - 1$. Assumption (5.4) is still in force, with $W$ a generic random variable having the same distribution as the $W_k$. The state $X_1$ in period 1 is given by $X_1 = x + a - W_1 = X_0 + a_0 - W_1$ and, in general,

$$X_k = X_{k-1} + a_{k-1} - W_k, \tag{5.15}$$

where $X_{k-1}$ is the state in period $k - 1$, and $a_{k-1}$ is the action taken in that period. Thus, the transition probability law $p(dz; x, a)$ is the distribution of $x + a - W$. The conditional expectation of the cost for period $k$, given $X_k = x$, $a_k = a$, is

$$g_k(x, a) := \delta^k I(x, a) \qquad (k = 0, 1, \ldots, N), \tag{5.16}$$

where $\delta$ is the discount factor, $0 < \delta < \infty$. The objective is to minimize the total $(N + 1)$-period expected discounted cost

$$J_{\mathbf f}(x) := E_x\sum_{k=0}^{N}\delta^k I(X_k, a_k), \tag{5.17}$$

over all policies $\mathbf f = (f_0, f_1, \ldots, f_N)$. By Theorem 1.2 (changing max to min), an optimal policy is $\mathbf f^* = (f_0^*, f_1^*, \ldots, f_N^*)$, which is Markovian and is given by backward recursion (see Eqs. 1.11–1.14). In other words, $f_N^*$ is given by (5.10),

$$f_N^*(x) = \begin{cases} S - x & \text{if } x \le s,\\ 0 & \text{if } x > s,\end{cases} \tag{5.18}$$

and $f_k^*$ $(0 \le k \le N - 1)$ are given recursively by (1.14). Let us determine $f_{N-1}^*$. For this one minimizes

$$g_{N-1}(x, a) + Eh_N(x + a - W), \tag{5.19}$$
where $g_{N-1}$ is given by (5.16) and $h_N(x)$ is the minimum of $g_N(x, a) = \delta^N I(x, a)$, i.e. (see Eq. 5.11), $h_N(x) = \delta^N I(x)$. Thus, (5.19) becomes

$$\delta^{N-1}\big[I(x, a) + \delta EI(x + a - W)\big], \tag{5.20}$$

and its minimization is equivalent to that of

$$I_{N-1}(x, a) := I(x, a) + \delta EI(x + a - W). \tag{5.21}$$

Write

$$G_{N-1}(y) := cy + L(y) + \delta EI(y - W). \tag{5.22}$$

Then (see the analogous relations (5.5)–(5.8)),

$$I_{N-1}(x, a) + cx = \begin{cases} G_{N-1}(x) & \text{if } a = 0,\\ G_{N-1}(x + a) + K & \text{if } a > 0.\end{cases} \tag{5.23}$$

Hence, the minimization of $I_{N-1}(x, a)$ over $a \ge 0$ is equivalent to the minimization of the function (on $[x, \infty)$)

$$G_{N-1}(y; x) := \begin{cases} G_{N-1}(x) & \text{if } y = x,\\ G_{N-1}(y) + K & \text{if } y > x,\end{cases} \tag{5.24}$$

over $y \ge x$. The corresponding points $f_{N-1}^*(x)$, $y_{N-1}^*(x)$, where these minima are achieved, are related by

$$f_{N-1}^*(x) = y_{N-1}^*(x) - x. \tag{5.25}$$
The main difficulty in extending the one-period argument here is that the function $G_{N-1}$ is not in general convex, since $I$ is not in general convex. Indeed (see (5.11)), to the left of the point $s$ it is linear with slope $-c$, while the right-hand derivative at $s$ is $G'(s) - c < -c$, since $G'(s) < 0$ when $K > 0$. One can easily check now that $I$ is not convex if $K > 0$. It may be shown, however, that $G_{N-1}$ is $K$-convex in the following sense.

Definition 5.1. Let $K \ge 0$. A function $g$ on an interval $J$ is said to be $K$-convex if, for all $y_1 < y_2 < y_3$ in $J$,

$$K + g(y_3) \ge g(y_2) + \frac{y_3 - y_2}{y_2 - y_1}\big(g(y_2) - g(y_1)\big). \tag{5.26}$$
Thus 0-convexity is the same as convexity, and a convex function is $K$-convex for all $K \ge 0$. We will show a little later that (i) $G_{N-1}$ is $K$-convex on $\mathbb{R}^1$, and (ii) $G_{N-1}(y) \to \infty$ as $|y| \to \infty$.

Assuming (i) and (ii), we now prove the existence of two numbers $s_{N-1} \le S_{N-1}$ such that

$$y_{N-1}^*(x) := \begin{cases} S_{N-1} & \text{if } x \le s_{N-1},\\ x & \text{if } x > s_{N-1},\end{cases} \tag{5.27}$$

minimizes $G_{N-1}(y; x)$ over the interval $[x, \infty)$. For this let $S_{N-1}$ be a point where $G_{N-1}$ attains its minimum value, and let $s_{N-1}$ be the smallest number $\le S_{N-1}$ such that

$$G_{N-1}(s_{N-1}) = K + G_{N-1}(S_{N-1}). \tag{5.28}$$

Such a number $s_{N-1}$ exists, since $G_{N-1}$ is continuous and $G_{N-1}(x) \to \infty$ as $x \to -\infty$. Now $G_{N-1}$ is decreasing on $(-\infty, s_{N-1}]$. To see this, let $y_1 < y_2 \le s_{N-1}$ and apply (5.26) with $y_3 = S_{N-1}$ to get

$$K + G_{N-1}(S_{N-1}) \ge G_{N-1}(y_2) + \frac{S_{N-1} - y_2}{y_2 - y_1}\big(G_{N-1}(y_2) - G_{N-1}(y_1)\big). \tag{5.29}$$

Also, $G_{N-1}(y_2) \ge K + G_{N-1}(S_{N-1})$, since $y_2 \le s_{N-1}$. Hence

$$K + G_{N-1}(S_{N-1}) \ge K + G_{N-1}(S_{N-1}) + \frac{S_{N-1} - y_2}{y_2 - y_1}\big(G_{N-1}(y_2) - G_{N-1}(y_1)\big),$$

so that $G_{N-1}(y_2) \le G_{N-1}(y_1)$. Therefore, if $x \le s_{N-1}$, then $G_{N-1}(x) \ge G_{N-1}(s_{N-1}) = G_{N-1}(S_{N-1}) + K$ and, for all $y > x$, $G_{N-1}(y; x) = G_{N-1}(y) + K \ge G_{N-1}(S_{N-1}) + K$. Hence, the minimum of $G_{N-1}(\cdot\,; x)$ on $[x, \infty)$ is attained at $y = S_{N-1}$, proving the first half of (5.27). In order to prove the second half of (5.27) it is enough to show that $G_{N-1}(x)$ … 0, then $\delta_1 J_1 + \delta_2 J_2$ is $(\delta_1 K_1 + \delta_2 K_2)$-convex for all $\delta_1 \ge 0$, $\delta_2 \ge 0$. For positive $\delta$ it now follows from (5.31) that $G_{N-1}$ is $K$-convex if $I$ is $K$-convex. Recall the definition of $I$ (see Eq. 5.11),
$$I(x) = \begin{cases} K + G(S) - cx & \text{if } x \le s,\\ G(x) - cx & \text{if } x > s.\end{cases} \tag{5.32}$$
We want to show

$$K + I(x_3) \ge I(x_2) + \frac{x_3 - x_2}{x_2 - x_1}\big(I(x_2) - I(x_1)\big) \qquad \text{for all } x_1 < x_2 < x_3. \tag{5.33}$$
If $x_3 \le s$, then linearity of $I$ in $(-\infty, s]$ implies convexity and, therefore, $K$-convexity in $(-\infty, s]$. Similarly, if $x_1 > s$, (5.33) is trivially true since $G(x) - cx$ is convex and, therefore, $K$-convex. Consider then $x_1 \le s < x_3$. Distinguish two cases.

CASE I. $x_1 \le s < x_2$. In this case (5.33) is equivalent to

$$K + G(x_3) - cx_3 \ge G(x_2) - cx_2 + \frac{x_3 - x_2}{x_2 - x_1}\big(G(x_2) - cx_2 - K - G(S) + cx_1\big). \tag{5.34}$$

Canceling $cx_i$ $(i = 1, 2, 3)$ from both sides and recalling that $K + G(S) = G(s)$,
(5.34) becomes equivalent to

$$K + G(x_3) \ge G(x_2) + \frac{x_3 - x_2}{x_2 - x_1}\big(G(x_2) - G(s)\big). \tag{5.35}$$
If $G(x_2) \ge G(s)$, then, since $x_2 - x_1 \ge x_2 - s$, the right side of (5.35) is no more than

$$G(x_2) + \frac{x_3 - x_2}{x_2 - s}\big(G(x_2) - G(s)\big),$$

which is no more than $G(x_3)$ by convexity of $G$. Hence (5.35) holds. Suppose $G(x_2) < G(s)$. The left side of (5.35) is no less than $K + G(S) = G(s)$, and (5.35) will be proved if it holds with the left side replaced by $G(s)$, i.e., if

$$G(s) \ge G(x_2) + \frac{x_3 - x_2}{x_2 - x_1}\big(G(x_2) - G(s)\big). \tag{5.36}$$

By simple algebra, (5.36) is equivalent to $G(s) \ge G(x_2)$, which has been assumed to be the case.
CASE II. $x_1 < x_2 \le s < x_3$. In this case (5.33) is equivalent to

$$K + G(x_3) - cx_3 \ge K + G(S) - cx_2 + \frac{x_3 - x_2}{x_2 - x_1}\,(-cx_2 + cx_1) = K + G(S) - cx_3,$$

which is obviously true, since $G(x_3) \ge G(S)$. Thus, $I$ is $K$-convex, so that $G_{N-1}$ is $K$-convex, and the proof of (5.27) is complete.
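Definition 5.1 and the case analysis above can be checked numerically on a grid. The sketch below uses assumed data that is not in the text (demand $W \in \{0, 1, 2\}$ with probabilities 0.2, 0.5, 0.3, and $c = h = 1$, $d = 3$, $K = 2$, for which a hand computation gives $S = 1$ and $s = -0.4$). The hypothetical helper `is_K_convex` tests (5.26) on every grid triple; it is meant to show that the function $I$ of (5.32) is $K$-convex but, because of its downward kink at $s$, not convex.

```python
from itertools import combinations


def is_K_convex(g, K, points, tol=1e-9):
    """Check inequality (5.26) for every triple y1 < y2 < y3 taken from `points`."""
    ys = sorted(points)
    vals = {y: g(y) for y in ys}
    for y1, y2, y3 in combinations(ys, 3):  # combinations keep the sorted order
        rhs = vals[y2] + (y3 - y2) / (y2 - y1) * (vals[y2] - vals[y1])
        if K + vals[y3] < rhs - tol:
            return False
    return True


# Assumed one-period data: W in {0, 1, 2} w.p. 0.2, 0.5, 0.3; c = h = 1, d = 3, K = 2.
demand = {0: 0.2, 1: 0.5, 2: 0.3}
c, h, d, K = 1.0, 1.0, 3.0, 2.0


def G(y):
    """G(y) = c*y + L(y), eqs. (5.3) and (5.5)."""
    return c * y + sum(p * (h * max(0.0, y - w) + d * max(0.0, w - y)) for w, p in demand.items())


# By hand: G has slopes -2, -1.2, 0.8, 2 on the pieces cut at 0, 1, 2, so S = 1,
# and G(-0.4) = K + G(1) = 4.1 gives s = -0.4.
S, s = 1.0, -0.4


def I_min(x):
    """Minimum one-period cost, eq. (5.32)."""
    return K + G(S) - c * x if x <= s else G(x) - c * x


grid = [round(-2.0 + 0.1 * i, 10) for i in range(51)]
print(is_K_convex(G, 0.0, grid))      # G is convex
print(is_K_convex(I_min, K, grid))    # I is K-convex
print(is_K_convex(I_min, 0.0, grid))  # but I is not convex
```

The triple $(-0.5, -0.4, -0.3)$ straddling $s$ is where ordinary convexity fails: the slope of $I$ drops from $-c = -1$ to $G'(s) - c = -3$.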
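The minimum value of (5.19), or (5.20), is

$$\delta^{N-1}I_{N-1}(x), \tag{5.37}$$

where (see Eqs. 5.23, 5.25, 5.27)

$$I_{N-1}(x) := \min_{a \ge 0}I_{N-1}(x, a) = \begin{cases} K + G_{N-1}(S_{N-1}) - cx & \text{if } x \le s_{N-1},\\ G_{N-1}(x) - cx & \text{if } x > s_{N-1}.\end{cases} \tag{5.38}$$

The equation for the determination of $f_{N-2}^*$ is then (see Eq. 1.14)

$$\min_{a \ge 0}\big[\delta^{N-2}I(x, a) + \delta^{N-1}EI_{N-1}(x + a - W)\big] = \delta^{N-2}\min_{a \ge 0}\big[I(x, a) + \delta EI_{N-1}(x + a - W)\big]. \tag{5.39}$$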
This problem is mathematically equivalent to the one just considered, since (5.32) and (5.38) have the same structure, except that $G$ is convex while $G_{N-1}$ is $K$-convex. But only $K$-convexity of $G$ was used in the proof of $K$-convexity of $I(x)$. Proceeding iteratively, we arrive at the following.

Proposition 5.2. Assume (5.4). Then there exists an optimal Markovian policy $\mathbf f^* = (f_0^*, \ldots, f_N^*)$ of the $(S, s)$ type for the $(N + 1)$-period dynamic inventory model, i.e., there exist $s_k \le S_k$ $(k = 0, 1, \ldots, N)$ such that during period $k$ it is optimal to order $S_k - x$ if the stock $x$ is $\le s_k$, and to order nothing if $x > s_k$. Also $s_k < S_k$ for all $k$ if $K > 0$, and $s_k = S_k$ if $K = 0$.

Finally, consider the infinite-horizon problem of minimizing, for a given $\delta$, $0$ … $S$.
Here $S$ is the point where the function (5.48) attains its minimum.

EXERCISES

Exercises for Section VI.1

1. Consider the following inventory control problem. Let the possible amounts of a commodity that a business can stock be 0, 1, 2 (states). In each period $k = 0, 1, 2$ $(N = 2)$ the stock is replenished by ordering 0, 1, or 2 units (actions), but not more than 2 units can be stored. There is a random demand that arises at the end of each period, demands over different periods being independent of each other and identically distributed. The possible values of demand are 0, 1, 2, which occur with probabilities 0.2, 0.5, and 0.3, respectively. Thus the transition probability $p(z; x, a)$ is the distribution of $\max\{0, x + a - W\} \wedge 2$, where $W$ is the random demand. Find an optimal policy to minimize the total expected cost if in each period the cost is the sum of (a) the cost of ordering $a$ units at the rate of \$1 per unit, (b) the cost of storing the excess supply $\max\{0, x + a - W\}$ at the rate of \$1 per unit, and (c) a penalty of \$1 per unit for excess demand $\max\{0, W - x - a\}$.

2. (Taxicab Operation) The area of operation of a cab driver comprises three towns, $T_1, T_2, T_3$ (states). To get a fare, the driver may follow one of three courses (actions): pull over and wait for a radio call ($a_1$), go to the nearest taxi stand and wait in line ($a_2$), or go on cruising until hailed by a passenger ($a_3$). The action $a_1$ is not available if the driver is in town $T_1$, as there is no radio service in this town. There is a known probability $p(T_j; T_i, a_k)$ that the next trip will be to town $T_j$, if the cab is in town $T_i$ and follows the course of action $a_k$. The corresponding reward is $g(T_i, a_k, T_j)$. (i) Write down the backward recursion relations for maximizing the total expected reward over a finite horizon. (ii) Find the optimal policy for the case $N = 2$, if the transition probabilities and rewards are as follows:

[Table: columns State; Action; Probability of transition to state $T_1$, $T_2$, $T_3$; Reward if trip is to state $T_1$, $T_2$, $T_3$. Rows: state $T_1$ with actions $a_2$, $a_3$; state $T_2$ with actions $a_1$, $a_2$, $a_3$; state $T_3$ with actions $a_1$, $a_2$, $a_3$.]
3. Prove Theorem 1.3 along the lines of the proof of Theorem 1.2.
Exercises for Section VI.2

1. Suppose $S = \{x_1, x_2, \ldots\}$ is countable and $A$ is a compact metric space. For each $\varepsilon \in (0, \delta)$ let $\{f_k^{\varepsilon}: k = 0, 1, \ldots\}$ be a sequence of functions on $S$ into $A$. (i) Show that there exists a sequence $\varepsilon_n \downarrow 0$ such that $f_0^{\varepsilon_n}(x_1)$ converges to some point $f_0(x_1)$, say, in $A$. [Hint: $A$ is a compact metric space.] (ii) Show that there exists a subsequence $\{\varepsilon_{n,2}: n = 1, 2, \ldots\}$ of $\{\varepsilon_n: n = 1, 2, \ldots\}$ such that $f_0^{\varepsilon_{n,2}}(x_2)$ converges to some point $f_0(x_2)$ in $A$. (iii) Having constructed $\{\varepsilon_{n,j}: n = 1, 2, \ldots\}$ in this manner, find a subsequence $\{\varepsilon_{n,j+1}: n = 1, 2, \ldots\}$ of $\{\varepsilon_{n,j}: n = 1, 2, \ldots\}$ such that $f_0^{\varepsilon_{n,j+1}}(x_{j+1})$ converges to some point $f_0(x_{j+1})$, say. (iv) Define $\delta_n := \varepsilon_{n,n}$ and show that $f_0^{\delta_n}(x_j) \to f_0(x_j)$ for all $j = 1, 2, \ldots$, as $n \to \infty$. (v) Use (i)–(iv) to find a subsequence $\{\delta_{n,1}: n \ge 1\}$ of $\{\delta_n\}$ such that $f_1^{\delta_{n,1}}(x)$ converges to $f_1(x)$, say, for every $x \in S$, as $n \to \infty$. (vi) Proceeding in this manner, find a subsequence $\{\delta_{n,k+1}: n \ge 1\} \subset \{\delta_{n,k}: n \ge 1\}$ such that $f_{k+1}^{\delta_{n,k+1}}(x) \to f_{k+1}(x)$, say, for every $x \in S$. (vii) Let $\delta_n' := \delta_{n,n}$. Show that $f_k^{\delta_n'}(x) \to f_k(x)$ for every $k \ge 0$ and every $x \in S$, as $n \to \infty$.
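2. Let the elements of a finite set $S$ be labeled $1, 2, \ldots, m$.

(i) Show that the set $B(S)$ of all real-valued functions on $S$ may be identified with $\mathbb{R}^m$, and the "sup norm" on $B(S)$ corresponds to the norm $|x| := \max\{|x^{(i)}|: 1 \le i$ … 0. Define $f(s) = B_s$, and

$$f_n(s) := B_{r2^{-n}t} \qquad \text{if } r2^{-n}t \le s < (r + 1)2^{-n}t \quad (0 \le r \le 2^n - 1).$$

Then

$$\int_0^t f_n(s)\,dB_s = \sum_{r=0}^{2^n-1}B_{r2^{-n}t}\big(B_{(r+1)2^{-n}t} - B_{r2^{-n}t}\big) = \tfrac12\big(B_t^2 - B_0^2\big) - \tfrac12\sum_{r=0}^{2^n-1}\big(B_{(r+1)2^{-n}t} - B_{r2^{-n}t}\big)^2 \longrightarrow \tfrac12\big(B_t^2 - B_0^2\big) - \tfrac12 t \quad \text{a.s. as } n \to \infty. \tag{1.34}$$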
The last convergence follows from an application of the Borel–Cantelli Lemma (Exercise 4). Hence,

$$\int_0^t B_s\,dB_s = \tfrac12\big(B_t^2 - B_0^2\big) - \tfrac12 t. \tag{1.35}$$
Notice that a formal application of ordinary calculus would yield $\int_0^t B_s\,dB_s = \frac12(B_t^2 - B_0^2)$. Also, if one replaced the values of $f$ at the left end points of the subintervals by those at the right end points, then one would get, in place of the first sum in (1.34),

$$\sum_{r=0}^{2^n-1}B_{(r+1)2^{-n}t}\big(B_{(r+1)2^{-n}t} - B_{r2^{-n}t}\big) = \tfrac12\sum_{r=0}^{2^n-1}\big(B_{(r+1)2^{-n}t} - B_{r2^{-n}t}\big)^2 + \tfrac12\big(B_t^2 - B_0^2\big),$$

which converges a.s. to $\frac12 t + \frac12(B_t^2 - B_0^2)$. Observe finally that the stochastic integral in (1.2) is well defined if $\{\sigma(X_s)\}$ belongs to $\mathcal M[0, \infty)$. In the next section, it is shown that there exists a unique continuous nonanticipative solution of (1.2) if $\mu(\cdot)$ and $\sigma(\cdot)$ are Lipschitzian and, in particular, $\{\sigma(X_s)\} \in \mathcal M[0, \infty)$.
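The contrast between left and right end points can be seen numerically. The sketch below is plain Python and assumes $B_0 = 0$ and $t = 1$; the function name, the seed and the number of dyadic levels are arbitrary choices. It simulates one Brownian path on the grid $r2^{-n}t$ and compares both sums with $\frac12 B_t^2 \mp \frac12 t$.

```python
import math
import random


def endpoint_sums(n=16, t=1.0, seed=1):
    """Left- and right-end-point sums for the integral of B dB over [0, t], with B_0 = 0."""
    rng = random.Random(seed)
    steps = 2 ** n
    dt = t / steps
    b = 0.0
    left = right = qv = 0.0
    for _ in range(steps):
        db = rng.gauss(0.0, math.sqrt(dt))  # increment over [r 2^-n t, (r+1) 2^-n t]
        left += b * db                      # integrand at the left end point, as in (1.34)
        right += (b + db) * db              # integrand at the right end point
        qv += db * db                       # sum of squared increments
        b += db
    return b, left, right, qv


b, left, right, qv = endpoint_sums()
print(left - (0.5 * b * b - 0.5))   # small: compare (1.35) with B_0 = 0, t = 1
print(right - (0.5 * b * b + 0.5))  # small: the right-end-point limit
print(qv)                           # close to t = 1
```

Algebraically, `left` equals $\frac12 B_t^2 - \frac12$`qv` and `right` equals $\frac12 B_t^2 + \frac12$`qv` for every path; only the convergence of the squared-increment sum to $t$ is probabilistic.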
2 CONSTRUCTION OF DIFFUSIONS AS SOLUTIONS OF STOCHASTIC DIFFERENTIAL EQUATIONS

In the last section, a precise meaning was given to the stochastic differential equation (1.1) in terms of its integral version (1.2). The present section is devoted to the solution of (1.1) (or (1.2)).

2.1 Construction of One-Dimensional Diffusions

Let $\mu(x)$ and $\sigma(x)$ be two real-valued functions on $\mathbb{R}^1$ that are Lipschitzian, i.e., there exists a constant $M > 0$ such that $|\mu(x) - \mu(y)|$ … $a\}$ belonging to $\mathcal M[a, \infty)$ that satisfies (2.2).
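Proof. We prove "existence" by the method of iterations. Let

$$X_t^{(0)} := X_a, \tag{2.3}$$

$$X_t^{(n+1)} := X_a + \int_a^t \mu(X_s^{(n)})\,ds + \int_a^t \sigma(X_s^{(n)})\,dB_s, \qquad a \le t \le T. \tag{2.4}$$

For example,

$$\begin{aligned}
X_t^{(1)} &= X_a + \mu(X_a)(t - a) + \sigma(X_a)(B_t - B_a),\\
X_t^{(2)} &= X_a + \int_a^t \mu\big(X_a + \mu(X_a)(s - a) + \sigma(X_a)(B_s - B_a)\big)\,ds\\
&\quad + \int_a^t \sigma\big(X_a + \mu(X_a)(s - a) + \sigma(X_a)(B_s - B_a)\big)\,dB_s,
\end{aligned} \tag{2.5}$$

and, in general,

$$X_t^{(n+1)} - X_t^{(n)} = \int_a^t\big(\mu(X_s^{(n)}) - \mu(X_s^{(n-1)})\big)\,ds + \int_a^t\big(\sigma(X_s^{(n)}) - \sigma(X_s^{(n-1)})\big)\,dB_s \qquad (a \le t \le T,\ n \ge 1). \tag{2.6}$$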
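Write

$$D_t^{(n)} := E\Big(\max_{a \le s \le t}\big(X_s^{(n)} - X_s^{(n-1)}\big)^2\Big). \tag{2.7}$$

Taking expectations of the maximum in (2.6), over $a \le t \le T$, and using (2.1), we get

$$D_T^{(n+1)} \le 2M^2E\Big(\int_a^T\big|X_s^{(n)} - X_s^{(n-1)}\big|\,ds\Big)^2 + 2E\max_{a \le t \le T}\cdots$$

… (2.17) as $n \to \infty$ (Exercise 2). Therefore, $\{X_t: a \le t \le T\}$ is in $\mathcal M[a, T]$, and

$$\int_a^T E\big(X_t^{(n)} - X_t\big)^2\,dt \to 0 \qquad \text{as } n \to \infty. \tag{2.18}$$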
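To prove that $X_t$ satisfies (2.2), note that

$$\max_{a \le t \le T}\Big|\int_a^t \mu(X_s^{(n)})\,ds + \int_a^t \sigma(X_s^{(n)})\,dB_s - \int_a^t \mu(X_s)\,ds - \int_a^t \sigma(X_s)\,dB_s\Big| \le M\int_a^T\big|X_s^{(n)} - X_s\big|\,ds + \max_{a \le t \le T}\Big|\int_a^t\big(\sigma(X_s^{(n)}) - \sigma(X_s)\big)\,dB_s\Big|. \tag{2.19}$$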
The first term on the right side goes to zero, as proved above. For the second term, use the Maximal Inequality (Chapter I, Theorem 13.6) to get

$$P\Big(\max_{a \le t \le T}\Big|\int_a^t\big(\sigma(X_s^{(n)}) - \sigma(X_s)\big)\,dB_s\Big| > \frac1k\Big) \le k^2E\int_a^T\big(\sigma(X_s^{(n)}) - \sigma(X_s)\big)^2\,ds \le k^2M^2(T - a)\Big(\sum_{r=n+1}^{\infty}\big(D_T^{(r)}\big)^{1/2}\Big)^2. \tag{2.20}$$

Since the last expression in (2.20) is summable in $n$ (Exercise 2), it follows from the Borel–Cantelli Lemma that

$$\max\Big\{\Big|\int_a^t\big(\sigma(X_s^{(n)}) - \sigma(X_s)\big)\,dB_s\Big|: a \le t \le T\Big\} < \frac1k$$

for all sufficiently large $n$, outside a set $N_k$ of zero probability. Let $N_0 := \bigcup\{N_k: 1 \le k < \infty\}$. Then $N_0$ has zero probability and, outside $N_0$, the second term on the right side of (2.19) goes to zero. Thus, the right side of (2.4) converges uniformly on $[a, T]$ to the right side of (2.2) as $n \to \infty$, outside a set of zero probability. Since $X_t^{(n+1)}$ converges to $X_t$ uniformly on $[a, T]$ outside a set of zero probability, (2.2) holds on $[a, T]$ outside a set of zero probability.

It remains to prove the uniqueness of the solution to (2.2). Let $\{Y_t: a \le t \le T\}$ be another solution. Then write $\varphi_t := E\big(\max\{(X_s - Y_s)^2: a \le s \le t\}\big)$ and use (2.7)–(2.11) with $\varphi_t$ in place of $D_t^{(n+1)}$, $X_t$ in place of $X_t^{(n)}$, and $Y_t$ in place of $X_t^{(n-1)}$, to get

$$\varphi_T \le c_1\int_a^T\varphi_s\,ds. \tag{2.21}$$

Since $t \mapsto \varphi_t$ is nondecreasing, iteration of (2.21) leads, just as in (2.12), (2.13), to

$$\varphi_T \le c_2\,\frac{\big(c_1(T - a)\big)^n}{n!} \to 0 \qquad \text{as } n \to \infty. \tag{2.22}$$

Hence $\varphi_T = 0$, i.e., $X_t = Y_t$ on $[a, T]$ outside a set of zero probability. ∎
The next result identifies the stochastic process $\{X_t: t \ge 0\}$ solving (2.2), in the case $a = 0$, as a diffusion with drift $\mu(\cdot)$, diffusion coefficient $\sigma^2(\cdot)$, and initial distribution given by the distribution of the random variable $X_0$.

Theorem 2.2. Assume (2.1). For each $x \in \mathbb{R}^1$ let $\{X_t^x: t \ge 0\}$ denote the unique continuous solution in $\mathcal M[0, \infty)$ of Itô's integral equation

$$X_t^x = x + \int_0^t \mu(X_s^x)\,ds + \int_0^t \sigma(X_s^x)\,dB_s \qquad (t \ge 0). \tag{2.23}$$

Then $\{X_t^x\}$ is a diffusion on $\mathbb{R}^1$ with drift $\mu(\cdot)$ and diffusion coefficient $\sigma^2(\cdot)$, starting at $x$.
Proof. By the additivity of the Riemann and stochastic integrals,

$$X_t^x = X_s^x + \int_s^t \mu(X_u^x)\,du + \int_s^t \sigma(X_u^x)\,dB_u \qquad (t \ge s). \tag{2.24}$$

Consider also the equation, for $z \in \mathbb{R}^1$,

$$X_t = z + \int_s^t \mu(X_u)\,du + \int_s^t \sigma(X_u)\,dB_u \qquad (t \ge s). \tag{2.25}$$

Let us write the solution to (2.25) (in $\mathcal M[s, \infty)$) as $\theta(s, t; z, B_s^+)$, where $B_s^+ := \{B_u - B_s: s \le u \le t\}$. It may be seen from the successive approximation scheme (2.3)–(2.5) that $\theta(s, t; z, B_s^+)$ is measurable in $(z, B_s^+)$ (theoretical complement 1). As $\{X_u^x: u \ge s\}$ is a continuous stochastic process in $\mathcal M[s, \infty)$ and is, by (2.24), a solution to (2.25) with $z = X_s^x$, it follows from the uniqueness of this solution that

$$X_t^x = \theta(s, t; X_s^x, B_s^+) \qquad (t \ge s). \tag{2.26}$$
Since $X_s^x$ is $\mathcal F_s$-measurable and $\mathcal F_s$ and $B_s^+$ are independent, (2.26) implies (see Section 4 of Chapter 0, part (b) of the theorem on Independence and Conditional Expectation) that the conditional distribution of $X_t^x$ given $\mathcal F_s$ is the distribution of $\theta(s, t; z, B_s^+)$, say $q(s, t; z, dy)$, evaluated at $z = X_s^x$. Since $\sigma\{X_u^x: 0 \le u \le s\} \subset \mathcal F_s$, the Markov property is proved. To prove homogeneity of this Markov process, notice that for every $h > 0$, the solution $\theta(s + h, t + h; z, B_{s+h}^+)$ of

$$X_{t+h} = z + \int_{s+h}^{t+h}\mu(X_u)\,du + \int_{s+h}^{t+h}\sigma(X_u)\,dB_u \qquad (t \ge s) \tag{2.27}$$

has the same distribution as that of (2.25). This fact is verified by noting that the successive approximations (2.3)–(2.5) yield the same functions in the two cases except that, in the case of (2.27), $B_s^+$ is replaced by $B_{s+h}^+$. But $B_s^+$ and $B_{s+h}^+$ have the same distribution, so that $q(s + h, t + h; z, dy) = q(s, t; z, dy)$. This proves homogeneity. To prove that $\{X_t^x\}$ is a diffusion in the sense of (1.2) or (1.2)' of Chapter V, assume for the sake of simplicity that $\mu(\cdot)$ and $\sigma^2(\cdot)$ are bounded (see Exercise 7 for the general case). Then

$$E(X_t^x - x) = E\int_0^t \mu(X_s^x)\,ds + E\int_0^t \sigma(X_s^x)\,dB_s = E\int_0^t \mu(X_s^x)\,ds = t\mu(x) + \int_0^t E\big(\mu(X_s^x) - \mu(x)\big)\,ds = t\mu(x) + o(t), \tag{2.28}$$

since $E\mu(X_s^x) - \mu(x) \to 0$ as $s \downarrow 0$. Next,
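$$E(X_t^x - x)^2 = E\Big(\int_0^t \mu(X_s^x)\,ds\Big)^2 + 2E\Big[\Big(\int_0^t \mu(X_s^x)\,ds\Big)\Big(\int_0^t \sigma(X_s^x)\,dB_s\Big)\Big] + E\Big(\int_0^t \sigma(X_s^x)\,dB_s\Big)^2 = O(t^2) + \int_0^t E\sigma^2(X_s^x)\,ds + o(t) = O(t) \qquad \text{as } t \downarrow 0. \tag{2.29}$$

For, by the Schwarz Inequality,

$$\Big|E\Big[\Big(\int_0^t \mu(X_s^x)\,ds\Big)\Big(\int_0^t \sigma(X_s^x)\,dB_s\Big)\Big]\Big| \le \Big[E\Big(\int_0^t \mu(X_s^x)\,ds\Big)^2\Big]^{1/2}\Big[E\Big(\int_0^t \sigma(X_s^x)\,dB_s\Big)^2\Big]^{1/2} = O(t)\Big[\int_0^t E\sigma^2(X_s^x)\,ds\Big]^{1/2} = O(t)\,O(t^{1/2}) = o(t) \qquad \text{as } t \downarrow 0,$$

leading to

$$E(X_t^x - x)^2 = \int_0^t E\sigma^2(X_s^x)\,ds + o(t) \qquad \text{as } t \downarrow 0. \tag{2.30}$$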
Now, as in (2.28),

$$\int_0^t E\sigma^2(X_s^x)\,ds = \int_0^t \sigma^2(x)\,ds + \int_0^t E\big(\sigma^2(X_s^x) - \sigma^2(x)\big)\,ds = t\sigma^2(x) + o(t) \qquad \text{as } t \downarrow 0. \tag{2.31}$$

The last condition (1.2)' of Chapter V is checked in Section 3, Corollary 3.5. ∎
2.2 Construction of Multidimensional Diffusions

In order to construct multidimensional diffusions by the method of Itô, it is necessary to define stochastic integrals for vector-valued integrands with respect to the increments of a multidimensional Brownian motion. This turns out to be rather straightforward. Let $\{\mathbf B_t = (B_t^{(1)}, \ldots, B_t^{(k)})\}$ be a standard $k$-dimensional Brownian motion. Assume (1.4) for $\{\mathbf B_t\}$. Define a vector-valued stochastic process

$$\mathbf f = \{\mathbf f(t) = (f^{(1)}(t), \ldots, f^{(k)}(t)): \alpha \le t \le \beta\}$$

to be a nonanticipative step functional on $[\alpha, \beta]$ if (1.5) holds for some finite set of time points $t_0 = \alpha < t_1 < \cdots$ …

$$\cdots = E\Big[f^{(i)}(s)\big(B_{s'}^{(i)} - B_s^{(i)}\big)f^{(j)}(u)\,E\big(B_{u'}^{(j)} - B_u^{(j)} \,\big|\, \mathcal F_u\big)\,\Big|\,\mathcal F_\alpha\Big] = 0. \tag{2.39}$$

Also, by the independence of $\mathcal F_s$ and $\{\mathbf B_{s'} - \mathbf B_s: s' \ge s\}$, and by the independence of $\{B_t^{(i)}\}$ and $\{B_t^{(j)}\}$ for $i \ne j$, $E[f$ … $a\}$ of $\mathcal M[a, \infty)$ such that (2.43) holds.

Theorem 2.5. Suppose $\mu(\cdot)$, $\sigma(\cdot)$ satisfy (2.42), and let $\{X_t^x: t \ge 0\}$ denote the unique (up to a $P$-null set) continuous nonanticipative functional in $\mathcal M[0, \infty)$ satisfying Itô's integral equation

$$X_t^x = x + \int_0^t \mu(X_s^x)\,ds + \int_0^t \sigma(X_s^x)\,d\mathbf B_s \qquad (t \ge 0). \tag{2.45}$$
Then $\{X_t^x\}$ is a diffusion on $\mathbb{R}^k$ with drift $\mu(\cdot)$ and diffusion matrix $\sigma(\cdot)\sigma'(\cdot)$, $\sigma'(\cdot)$ being the transpose of $\sigma(\cdot)$. It may be noted that in Theorems 2.4 and 2.5 it is not assumed that $\sigma(x)$ is positive definite. Positive definiteness guarantees the existence of a density for the transition probability and its smoothness (see theoretical complement 5.1 of Chapter V), but is not needed for the Markov property.

Example. Let $k = 1$, $\mu(x) = -\gamma x$, $\sigma(x) = \sigma$. Then the successive approximations
$X_t^{(n)}$ (see Eqs. 2.3–2.5) are given by (assuming $B_0 = 0$)

$$\begin{aligned}
X_t^{(0)} &= X_0,\\
X_t^{(1)} &= X_0 + \int_0^t(-\gamma X_0)\,ds + \sigma B_t = X_0(1 - t\gamma) + \sigma B_t,\\
X_t^{(2)} &= X_0 + \int_0^t -\gamma\{X_0(1 - s\gamma) + \sigma B_s\}\,ds + \sigma B_t = \Big(1 - t\gamma + \frac{t^2\gamma^2}{2!}\Big)X_0 - \gamma\sigma\int_0^t B_s\,ds + \sigma B_t,\\
X_t^{(3)} &= X_0 + \int_0^t -\gamma\Big\{\Big(1 - s\gamma + \frac{s^2\gamma^2}{2!}\Big)X_0 - \gamma\sigma\int_0^s B_u\,du + \sigma B_s\Big\}\,ds + \sigma B_t\\
&= \Big(1 - t\gamma + \frac{t^2\gamma^2}{2!} - \frac{t^3\gamma^3}{3!}\Big)X_0 + \gamma^2\sigma\int_0^t\Big(\int_0^s B_u\,du\Big)ds - \gamma\sigma\int_0^t B_s\,ds + \sigma B_t\\
&= \Big(1 - t\gamma + \frac{t^2\gamma^2}{2!} - \frac{t^3\gamma^3}{3!}\Big)X_0 + \gamma^2\sigma\int_0^t(t - u)B_u\,du - \gamma\sigma\int_0^t B_s\,ds + \sigma B_t.
\end{aligned} \tag{2.46}$$
Assume, as an induction hypothesis,

$$X_t^{(n)} = \sum_{m=0}^{n}\frac{(-t\gamma)^m}{m!}X_0 + \sum_{m=1}^{n-1}(-\gamma)^m\sigma\int_0^t\frac{(t - s)^{m-1}}{(m - 1)!}B_s\,ds + \sigma B_t. \tag{2.47}$$

Now use (2.4) to check that (2.47) holds with $n + 1$ replacing $n$. Therefore, (2.47) holds for all $n$. But as $n \to \infty$, the right side converges (for every $\omega \in \Omega$) to

$$e^{-t\gamma}X_0 - \gamma\sigma\int_0^t e^{-\gamma(t - s)}B_s\,ds + \sigma B_t. \tag{2.48}$$

Hence $X_t$ equals (2.48). In particular, with $X_0 = x$,

$$X_t = e^{-\gamma t}x - \gamma\sigma\int_0^t e^{-\gamma(t - s)}B_s\,ds + \sigma B_t. \tag{2.49}$$

As a special case, for $\gamma > 0$, $\sigma \ne 0$, (2.49) gives a representation of an Ornstein–Uhlenbeck process as a functional of a Brownian motion.
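The successive approximations (2.3)–(2.4) can also be run on a computer once the integrals are replaced by left-point sums on a time grid. The sketch below does this for the Example ($\mu(x) = -\gamma x$, $\sigma(x) = \sigma$, $B_0 = 0$); the function name, grid, parameters and seed are arbitrary assumptions. On the grid the iteration converges to the Euler–Maruyama recursion driven by the same Brownian increments (a swapped-in discretization, not part of the text), and that recursion is in turn close to the closed form (2.49) evaluated with a Riemann sum.

```python
import math
import random


def ou_check(x0=1.0, gamma=1.0, sigma=0.5, T=1.0, n=1000, iters=40, seed=3):
    rng = random.Random(seed)
    dt = T / n
    dB = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]
    B = [0.0]
    for inc in dB:
        B.append(B[-1] + inc)

    # Discretized (2.4): X^(k+1)_j = x0 + sum_{i<j} [mu(X^(k)_i) dt + sigma dB_i], mu(x) = -gamma x.
    X = [x0] * (n + 1)  # X^(0) = X_0, as in (2.3)
    for _ in range(iters):
        new, acc = [x0], 0.0
        for i in range(n):
            acc += -gamma * X[i] * dt + sigma * dB[i]
            new.append(x0 + acc)
        X = new

    # Euler-Maruyama with the same increments: the fixed point of the discretized map.
    E = [x0]
    for i in range(n):
        E.append(E[-1] - gamma * E[-1] * dt + sigma * dB[i])

    # Closed form (2.49) at time T, with the ds-integral replaced by a left-point sum.
    integral = sum(math.exp(-gamma * (T - i * dt)) * B[i] * dt for i in range(n))
    closed = math.exp(-gamma * T) * x0 - gamma * sigma * integral + sigma * B[n]
    return X[n], E[n], closed


picard, euler, closed = ou_check()
print(picard - euler)  # essentially zero after 40 iterations
print(euler - closed)  # of the order of the grid size dt
```

Because the drift is linear, the error of the $k$-th iterate is bounded by $(\gamma T)^k/k!$ times the initial error, the discrete analogue of the factorial bound in (2.22).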
3 ITÔ'S LEMMA

Brownian paths $s \mapsto B_s$ have finite quadratic variation on every finite time interval $[0, t]$ in the sense that

$$\max_{1 \le N \le 2^n}\Big|\sum_{m=0}^{N-1}\big(B_{(m+1)2^{-n}t} - B_{m2^{-n}t}\big)^2 - N2^{-n}t\Big| \to 0 \qquad \text{a.s. as } n \to \infty. \tag{3.1}$$

This is easily checked by recognizing that the expression $Z_N$, say, within the absolute value signs is, for each $n$, a martingale $(1 \le N \le 2^n)$, so that the Maximal Inequality (Chapter I, Eq. 13.56) may be used to prove (3.1) (Exercise 1). Notice that the quadratic variation of $\{B_t\}$ over an interval equals the length of the interval. A consequence of (3.1) is a curious and extremely important "chain rule" for the stochastic calculus, called Itô's Lemma. The present section is devoted to a derivation of this chain rule and some applications. For an intuitive understanding of this chain rule, consider a nonanticipative functional of the form

$$Y(t) = Y(0) + \int_0^t f(s)\,ds + \int_0^t g(s)\,dB_s, \tag{3.2}$$

where $f, g \in \mathcal M[0, T]$, and $Y(0)$ is $\mathcal F_0$-measurable and square integrable. One may express (3.2) in the differential form

$$dY(t) = f(t)\,dt + g(t)\,dB_t. \tag{3.3}$$

Suppose that $\varphi$ is a real-valued twice continuously differentiable function on $\mathbb{R}^1$, say with bounded derivatives $\varphi'$, $\varphi''$. Then Itô's Lemma says

$$d\varphi(Y(t)) = \varphi'(Y(t))\,dY(t) + \tfrac12\varphi''(Y(t))g^2(t)\,dt = \big\{\varphi'(Y(t))f(t) + \tfrac12\varphi''(Y(t))g^2(t)\big\}\,dt + \varphi'(Y(t))g(t)\,dB_t. \tag{3.4}$$
In other words,

$$\varphi(Y(t)) = \varphi(Y(0)) + \int_0^t\big\{\varphi'(Y(s))f(s) + \tfrac12\varphi''(Y(s))g^2(s)\big\}\,ds + \int_0^t \varphi'(Y(s))g(s)\,dB_s. \tag{3.5}$$

Observe that a formal application of ordinary calculus would give

$$d\varphi(Y(t)) = \varphi'(Y(t))\,dY(t) = \varphi'(Y(t))f(t)\,dt + \varphi'(Y(t))g(t)\,dB_t.$$

The extra term $\frac12\varphi''(Y(t))g^2(t)\,dt$ appearing in (3.4) arises because

$$(dY(t))^2 = g^2(t)(dB_t)^2 + o(dt) = g^2(t)\,dt + o(dt). \tag{3.6}$$
Since the term $g^2(t)\,dt$ cannot be neglected in computing the differential $d\varphi(Y(t))$, one must expand $\varphi(Y(t + dt))$ around $Y(t)$ in a Taylor expansion including the second derivative of $\varphi$. The same argument as above applied to a function $\varphi(t, y)$ on $[0, T] \times \mathbb{R}^1$, such that $\varphi_0 := \partial\varphi/\partial t$, $\varphi'$, $\varphi''$ are continuous and bounded, leads to

$$d\varphi(t, Y(t)) = \big\{\varphi_0(t, Y(t)) + \varphi'(t, Y(t))f(t) + \tfrac12\varphi''(t, Y(t))g^2(t)\big\}\,dt + \varphi'(t, Y(t))g(t)\,dB_t. \tag{3.7}$$
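A quick numerical check of (3.4) takes $Y = B$ (i.e., $f \equiv 0$, $g \equiv 1$) and the test function $\varphi(y) = y^3$, an assumption made for illustration; its derivatives are not bounded, so this lies outside the stated hypotheses. Itô's Lemma predicts $B_T^3 = \int_0^T 3B_s^2\,dB_s + \int_0^T 3B_s\,ds$, the second integral being the correction $\frac12\varphi''(B_s)\,ds$ that ordinary calculus would miss. The function name, seed and grid are arbitrary.

```python
import math
import random


def ito_cubic(n=2 ** 17, T=1.0, seed=7):
    """Return B_T**3, a left-point sum for the dB-integral, and the Ito correction term."""
    rng = random.Random(seed)
    dt = T / n
    b = 0.0
    stochastic = 0.0  # sum of phi'(B) dB = 3 B^2 dB at left end points
    correction = 0.0  # sum of (1/2) phi''(B) dt = 3 B dt
    for _ in range(n):
        db = rng.gauss(0.0, math.sqrt(dt))
        stochastic += 3.0 * b * b * db
        correction += 3.0 * b * dt
        b += db
    return b ** 3, stochastic, correction


lhs, stochastic, correction = ito_cubic()
print(lhs - (stochastic + correction))  # small: Ito's Lemma
print(lhs - stochastic)                 # matches the correction term up to the same small error
```

The residual equals $3\sum_i B_{t_i}\big((\Delta B_i)^2 - \Delta t\big) + \sum_i(\Delta B_i)^3$, which is small precisely because of the quadratic variation property (3.1).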
To state an extension to multidimensions, let $\varphi(t, y)$ be a function on $[0, T] \times \mathbb{R}^m$ that is once continuously differentiable in $t$ and twice in $y$. Write

$$\partial_0\varphi(t, y) := \frac{\partial\varphi(t, y)}{\partial t}, \qquad \partial_r\varphi(t, y) := \frac{\partial\varphi(t, y)}{\partial y^{(r)}} \quad (1 \le r \le m). \tag{3.8}$$

Let $\{\mathbf B_t\}$ be a $k$-dimensional standard Brownian motion satisfying conditions (1.4(i), (ii)) (with $B_s$ replaced by $\mathbf B_s$). Suppose $Y(t)$ is a vector of $m$ processes of the form $Y(t) = (Y^{(1)}(t), \ldots, Y^{(m)}(t))$,

$$Y^{(r)}(t) = Y^{(r)}(0) + \int_0^t f_r(s)\,ds + \int_0^t \mathbf g_r(s)\cdot d\mathbf B_s \qquad (1 \le r \le m). \tag{3.9}$$

Here $f_1, \ldots, f_m$ are real-valued and $\mathbf g_1, \ldots, \mathbf g_m$ vector-valued (with values in $\mathbb{R}^k$) nonanticipative functionals belonging to $\mathcal M[0, T]$. Also $Y^{(r)}(0)$ are $\mathcal F_0$-measurable square integrable random variables. One may express (3.9) in the differential form

$$dY^{(r)}(t) = f_r(t)\,dt + \mathbf g_r(t)\cdot d\mathbf B_t \qquad (1 \le r \le m) \ldots$$

…

$$\begin{aligned}
&C(x) := 2\sum_i x^{(i)}\mu^{(i)}(x),\\
&\bar\beta(r) := \max_{|x| = r}\frac{B(x) + C(x)}{d(x)} - 1, \qquad \underline\beta(r) := \min_{|x| = r}\frac{B(x) + C(x)}{d(x)} - 1,\\
&\bar\alpha(r) := \max_{|x| = r}d(x), \qquad \underline\alpha(r) := \min_{|x| = r}d(x),\\
&\bar I(r) := \int_c^r\frac{\bar\beta(u)}{u}\,du, \qquad \underline I(r) := \int_c^r\frac{\underline\beta(u)}{u}\,du,
\end{aligned} \tag{3.23}$$

where $c > 0$ is a given constant. Also note that (see Eq. 15.35 of Chapter V) for every $F$ that is twice differentiable on $(0, \infty)$, $A\varphi$ for $\varphi(x) := F(|x|)$ is given by

$$2(A\varphi)(x) = d(x)F''(|x|) + \frac{B(x) + C(x) - d(x)}{|x|}F'(|x|) \qquad (|x| > 0). \tag{3.24}$$
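Proposition 3.3. Let $\mu(\cdot)$, $\sigma(\cdot)$ be Lipschitzian, and $\sigma(x)$ nonsingular for all $x$. Suppose that, for some $c > 0$,

$$\int_c^\infty\exp\{-\bar I(u)\}\,du = \infty, \qquad \int_c^\infty\frac{1}{\underline\alpha(u)}\exp\{\bar I(u)\}\,du < \infty. \tag{3.25}$$

Then

$$E\big(\tau_{\partial B(0:r_0)} \mid X_0 = x\big) < \infty \qquad (|x| > r_0 > 0), \tag{3.26}$$

where

$$\tau_{\partial B(0:r)} := \inf\{t \ge 0: |X_t| = r\}. \tag{3.27}$$

Proof. First note that if (3.25) holds for some $c > 0$, then it holds for all $c > 0$ (Exercise 2). Let $c = r_0 > 0$. Define

$$F(r) := \int_{r_0}^r\exp\{-\bar I(u)\}\Big(\int_u^\infty\frac{1}{\underline\alpha(v)}\exp\{\bar I(v)\}\,dv\Big)du \qquad (r \ge r_0), \tag{3.28}$$

and

$$\varphi(x) := F(|x|), \qquad |x| \ge r_0. \tag{3.29}$$

Note that $F'(r) > 0$ and $F''(r) + (\bar\beta(r)/r)F'(r) = -1/\underline\alpha(r)$ for $r > r_0$. Hence, by (3.24) and since $d(x) \ge \underline\alpha(|x|)$,

$$2(A\varphi)(x) \le -\frac{d(x)}{\underline\alpha(|x|)} \le -1 \qquad \text{for } |x| \ge r_0. \tag{3.30}$$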
Fix $x$ such that $|x| > r_0$. By Corollary 3.2(b), and optional stopping (Proposition 13.9 of Chapter I; also see Exercises 3, 4),

$$EZ_{\eta_N} = EZ_0 = \varphi(x) = F(|x|), \tag{3.31}$$

where $Z_t$ is as in (3.22) with $\varphi(t, y) = \varphi(y)$ and $\varphi_0(t, y) = 0$, and $\eta_N$ is the $\{\mathcal F_t\}$-stopping time

$$\eta_N := \inf\{t \ge 0: |X_t^x| = r_0 \text{ or } N\} \qquad (r_0 < |x| < N). \tag{3.32}$$

Using (3.30) in (3.31), the following inequality is obtained:

$$2EF(|X_{\eta_N}^x|) - 2F(|x|) = E\int_0^{\eta_N}2(A\varphi)(X_s^x)\,ds \le -E(\eta_N). \tag{3.33}$$

Now the first relation in (3.25) implies that $\tau_{\partial B(0:r_0)} < \infty$ a.s. (see Corollary 15.2 of Chapter V, or Exercise 5). Therefore, $\eta_N \to \tau_{\partial B(0:r_0)}$ a.s. as $N \to \infty$, so that (3.33), together with $F \ge 0$ and monotone convergence, yields

$$E\big(\tau_{\partial B(0:r_0)} \mid X_0 = x\big) \le 2F(|x|). \tag{3.34}$$

∎
As in the one-dimensional case (see Section 12 of Chapter V), it may be shown that (3.26) implies the existence of a unique invariant distribution of the diffusion (theoretical complement 5). As a second application of Itô's Lemma, let us obtain an estimate of the fourth moment of a stochastic integral.

Proposition 3.4. Let $\mathbf f = \{(f_1(t), \ldots, f_k(t)): 0 \le t \le T\}$ be a nonanticipative functional on $[0, T]$ satisfying

$$E\int_0^T f_i^4(t)\,dt < \infty \qquad (i = 1, \ldots, k). \tag{3.35}$$

Then

$$E\Big(\int_0^T \mathbf f(t)\cdot d\mathbf B_t\Big)^4 \le 9k^3T\sum_{i=1}^k\int_0^T\{Ef_i^4(t)\}\,dt. \tag{3.36}$$

Proof. Since $\mathbf f(t)\cdot d\mathbf B_t = \sum_{i=1}^k f_i(t)\,dB_t^{(i)}$, by convexity of $z \mapsto z^4$,

$$E\Big(\int_0^T \mathbf f(t)\cdot d\mathbf B_t\Big)^4 = k^4E\Big(\frac1k\sum_{i=1}^k\int_0^T f_i(t)\,dB_t^{(i)}\Big)^4 \le k^4E\Big(\frac1k\sum_{i=1}^k\Big(\int_0^T f_i(t)\,dB_t^{(i)}\Big)^4\Big) = k^3\sum_{i=1}^k E\Big(\int_0^T f_i(t)\,dB_t^{(i)}\Big)^4. \tag{3.37}$$

Hence, it is enough to prove

$$E\Big(\int_0^T g(t)\,dB_t\Big)^4 \le 9T\int_0^T\{Eg^4(t)\}\,dt \tag{3.38}$$

for a real-valued nonanticipative functional $g$ satisfying

$$E\int_0^T g^4(t)\,dt < \infty. \tag{3.39}$$
First suppose $g$ is a bounded nonanticipative step functional. Then by an explicit computation (Exercise 6) the nonanticipative functional $s \mapsto \big(\int_0^s g(u)\,dB_u\big)^3g(s)$ belongs to $\mathcal M[0, T]$. One may then apply Itô's Lemma (see Eq. 3.5, or Eq. 3.20 with $k = 1$) to $\varphi(y) = y^4$ and $Y(t) = \int_0^t g(s)\,dB_s$ to get

$$\Big(\int_0^t g(s)\,dB_s\Big)^4 = 6\int_0^t\Big(\int_0^s g(u)\,dB_u\Big)^2g^2(s)\,ds + 4\int_0^t\Big(\int_0^s g(u)\,dB_u\Big)^3g(s)\,dB_s.$$
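One way to reach the constant $9T$ in (3.38) from this identity is sketched below, as an illustration rather than as the argument of the original text. It assumes, as above, that $g$ is a bounded step functional, so that the $dB_s$-integral has mean zero. Write $Y(t) := \int_0^t g(s)\,dB_s$, $a(t) := EY^4(t)$ and $b(t) := \int_0^t a^{1/2}(s)\big(Eg^4(s)\big)^{1/2}\,ds$. Taking expectations and applying the Schwarz Inequality to $E[Y^2(s)g^2(s)]$,

$$a(t) = 6\int_0^t E\big[Y^2(s)g^2(s)\big]\,ds \le 6\,b(t), \qquad b'(t) = a^{1/2}(t)\big(Eg^4(t)\big)^{1/2} \le \big(6\,b(t)\big)^{1/2}\big(Eg^4(t)\big)^{1/2}.$$

Hence $\frac{d}{dt}b^{1/2}(t) \le \frac{\sqrt6}{2}\big(Eg^4(t)\big)^{1/2}$ wherever $b(t) > 0$; integrating from $0$ to $T$ and using the Schwarz Inequality once more,

$$EY^4(T) = a(T) \le 6\,b(T) \le 6\cdot\frac64\Big(\int_0^T\big(Eg^4(t)\big)^{1/2}\,dt\Big)^2 \le 9T\int_0^T Eg^4(t)\,dt.$$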
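… $\varepsilon > 0$,

$$P\big(|X_t^x - x| > \varepsilon\big) = o(t) \qquad \text{as } t \downarrow 0. \tag{3.43}$$

Proof. By Chebyshev's Inequality, $P(|X_t^x - x| > \varepsilon) \le E(X_t^x - x)^4/\varepsilon^4$. But if $\mu(\cdot)$ and $\sigma(\cdot)$ are both bounded by $c$, then

$$E(X_t^x - x)^4 = E\Big(\int_0^t \mu(X_s^x)\,ds + \int_0^t \sigma(X_s^x)\,dB_s\Big)^4 \le 2^3\Big\{E\Big(\int_0^t \mu(X_s^x)\,ds\Big)^4 + E\Big(\int_0^t \sigma(X_s^x)\,dB_s\Big)^4\Big\} \le 8\{(ct)^4 + 9t^2c^4\} = O(t^2) = o(t) \qquad \text{as } t \downarrow 0.$$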
For another application of Itô's Lemma (see Eq. 3.19), let

$$Y(t) := X_t^x - X_t^y = x - y + \int_0^t\big(\mu(X_s^x) - \mu(X_s^y)\big)\,ds + \int_0^t\big(\sigma(X_s^x) - \sigma(X_s^y)\big)\,d\mathbf B_s, \tag{3.44}$$

and $\varphi(z) = |z|^2$, to get

$$|X_t^x - X_t^y|^2 = |x - y|^2 + \int_0^t\Big\{2(X_s^x - X_s^y)\cdot\big(\mu(X_s^x) - \mu(X_s^y)\big) + \sum_{r=1}^k\big|\sigma_r(X_s^x) - \sigma_r(X_s^y)\big|^2\Big\}\,ds + \int_0^t 2(X_s^x - X_s^y)\cdot\big(\sigma(X_s^x) - \sigma(X_s^y)\big)\,d\mathbf B_s. \tag{3.45}$$
Note that, since $X_t^x, X_t^y \in \mathcal M[0, \infty)$ and $\mu(\cdot)$ and $\sigma(\cdot)$ are Lipschitzian, the expectation of the stochastic integral is zero, so that

$$E|X_t^x - X_t^y|^2 = |x - y|^2 + \int_0^t\Big\{2E(X_s^x - X_s^y)\cdot\big(\mu(X_s^x) - \mu(X_s^y)\big) + \sum_{r=1}^k E\big|\sigma_r(X_s^x) - \sigma_r(X_s^y)\big|^2\Big\}\,ds. \tag{3.46}$$

In particular, the left side is an absolutely continuous function of $t$ whose density satisfies

$$\frac{d}{dt}E|X_t^x - X_t^y|^2 \le 2ME|X_t^x - X_t^y|^2 + kM^2E|X_t^x - X_t^y|^2. \tag{3.47}$$

Integrate (3.47) to obtain

$$E|X_t^x - X_t^y|^2 \le |x - y|^2e^{(2M + kM^2)t}. \tag{3.48}$$
As a consequence, the diffusion has the Feller property: if $y \to x$, then $p(t; y, dz)$ converges weakly to $p(t; x, dz)$ for every $t \ge 0$. To deduce this property, note that (3.48) implies that, for every bounded Lipschitzian function $h$ on $\mathbb{R}^k$ with $|h(z) - h(z')| \le c|z - z'|$,

$$|Eh(X_t^y) - Eh(X_t^x)| = |E(h(X_t^y) - h(X_t^x))| \le cE|X_t^y - X_t^x| \le c\big(E|X_t^y - X_t^x|^2\big)^{1/2} \le c|x - y|e^{(M + kM^2/2)t} \to 0 \qquad \text{as } y \to x. \tag{3.49}$$

Now apply Theorem 5.1 of Chapter 0. To state the Feller property another way, write $T_t$ for the transition operator

$$(T_tf)(x) := Ef(X_t^x) = \int f(y)\,p(t; x, dy). \tag{3.50}$$

The Feller property says that if $f$ is bounded and continuous, so is $T_tf$. A number of other applications of Itô's Lemma are sketched in the Exercises.
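The estimate (3.48) behind the Feller property can be illustrated by driving two copies of a diffusion, started at $x$ and at $y$, with the same Brownian increments. The sketch below assumes a one-dimensional example that is not in the text, $\mu(x) = \sin x$ (Lipschitz constant $M = 1$) with constant $\sigma$, and uses the Euler–Maruyama scheme; the function name, seed and grid are arbitrary. With additive noise the noise cancels in the difference $\Delta_j$ of the two chains, and each Euler step gives $|\Delta_{j+1}| \le (1 + M\,dt)|\Delta_j|$, so in this special case $|X_T^x - X_T^y|^2 \le |x - y|^2e^{2MT} \le |x - y|^2e^{(2M + kM^2)T}$ holds pathwise, not only in mean square.

```python
import math
import random


def coupled_endpoints(x, y, T=1.0, n=2000, sigma=0.3, seed=11):
    """Euler-Maruyama for dX = sin(X) dt + sigma dB from x and y, using the same increments."""
    rng = random.Random(seed)
    dt = T / n
    a, b = x, y
    for _ in range(n):
        db = rng.gauss(0.0, math.sqrt(dt))
        a += math.sin(a) * dt + sigma * db
        b += math.sin(b) * dt + sigma * db
    return a, b


M, k, T = 1.0, 1, 1.0
for y in (0.5, 0.1, 0.01):
    a, b = coupled_endpoints(0.0, y, T=T)
    bound = y ** 2 * math.exp((2 * M + k * M ** 2) * T)  # right side of (3.48) with x = 0
    print(y, (a - b) ** 2, bound)
```

As $y \to x = 0$ the coupled endpoints merge linearly in $|x - y|$, which is the mechanism used in (3.49).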
4 CHAPTER APPLICATION: ASYMPTOTICS OF SINGULAR DIFFUSIONS

A significant advantage of the theory of stochastic differential equations over other methods of construction of diffusions lies in its ability to construct and analyze with relative ease those diffusions whose diffusion matrices $D(x) := \sigma(x)\sigma'(x)$ are singular. Such diffusions are known as singular, or degenerate, diffusions. Observe that in Section 2 the only assumption made on the coefficients is that they are Lipschitzian. As we shall see in this section, the stochastic integral representation (2.45) and Itô's Lemma are effective tools in analyzing the asymptotic behavior of these diffusions. Notice, on the other hand, that the method of Section 15 of Chapter V (also see Section 3 of the present chapter) does not work for analyzing transience, recurrence, etc., as quantities such as $\bar\beta(r)$ may not be finite. Singular diffusions arise in many different contexts. Suppose that the velocity $V_t$ of a particle satisfies the stochastic differential equation

$$dV_t = \mu_0(V_t)\,dt + \sigma_0(V_t)\,d\mathbf B_t^{(1)}, \tag{4.1}$$
where $\{\mathbf B_t^{(1)}\}$ is a standard three-dimensional Brownian motion. The position $Y_t$ of the particle satisfies

$$dY_t = V_t\,dt. \tag{4.2}$$

The process $\{X_t := (V_t, Y_t)\}$ is then a six-dimensional singular diffusion governed by the stochastic differential equation

$$dX_t = \mu(X_t)\,dt + \sigma(X_t)\,d\mathbf B_t, \tag{4.3}$$

where

$$\mu(x) = \begin{pmatrix}\mu_0(v)\\ v\end{pmatrix}, \qquad \sigma(x) = \begin{pmatrix}\sigma_0(v) & 0\\ 0 & 0\end{pmatrix},$$

writing $x = (v, y)$ and $0$ for a $3 \times 3$ null matrix. Here $\{\mathbf B_t\}$ is a six-dimensional standard Brownian motion whose first three coordinates comprise $\{\mathbf B_t^{(1)}\}$. Often, as in the case of the Ornstein–Uhlenbeck process (see Example 1.2 in Chapter V, and Exercise 14.1(iii) of Chapter V), the velocity process has a unique invariant distribution. The position process, being an integral of velocity, is usually asymptotically Gaussian by an application of the central limit theorem. Thus, the asymptotic properties of $X_t = (V_t, Y_t)$ may in this case be deduced from those of the nonsingular diffusion $\{V_t\}$. On the other hand, there are many problems arising in applications in which the analysis is not as straightforward. One may think of a deterministic process $U_t = (U_t^{(1)}, U_t^{(2)})$ governed by a system of ordinary differential equations $dU_t = \mu(U_t)\,dt$, having a unique fixed point $u^*$ such that $U_t \to u^*$ as $t \to \infty$, no matter what the initial value $U_0$ may be. Suppose now that a noise is superimposed that affects only the second component $U_t^{(2)}$ directly. The perturbed system $X_t = (X_t^{(1)}, X_t^{(2)})$ may then be governed by an equation of the form (4.3) with
$$\sigma(x) = \begin{pmatrix}0 & 0\\ 0 & \tilde\sigma(x)\end{pmatrix},$$

where $\tilde\sigma(x)$ is nonsingular. The following result may be thought to deal with this kind of phenomenon, although $\sigma(x)$ need not be singular. For its statement, use the notation

$$J(x) := ((J_{ij}(x))), \qquad J_{ij}(x) := \frac{\partial\mu^{(i)}}{\partial x^{(j)}}(x). \tag{4.4}$$

Theorem 4.1. Let $\mu(\cdot)$, $\sigma(\cdot)$ be Lipschitzian on $\mathbb{R}^k$, with

$$\|\sigma(x) - \sigma(y)\| \le \lambda_0|x - y| \qquad \text{for all } x, y \in \mathbb{R}^k. \tag{4.5}$$
Assume that, for all $x$, the eigenvalues of the symmetric matrix $\tfrac12(J(x) + J'(x))$ are all less than or equal to $-\lambda_1 < 0$, where $k\lambda_0^2 < 2\lambda_1$. Then there exists a unique invariant distribution $\pi(dz)$ for the diffusion, and $p(t; x, dz)$ converges weakly to $\pi(dz)$ as $t \to \infty$, for every $x \in \mathbb{R}^k$.
Proof. The first step is to show that, for some $x$, the family $\{p(t; x, dy): t \ge 0\}$ is tight, i.e., given $\varepsilon > 0$ there exists $c_\varepsilon < \infty$ such that
$$p(t; x, \{y: |y| > c_\varepsilon\}) < \varepsilon \quad (t \ge 0). \tag{4.6}$$
It will then follow (see Chapter 0, Theorem 5.1) that there exist $t_n \to \infty$ and a probability measure $\pi$, perhaps depending on $x$, such that
$$p(t_n; x, dy) \to \pi(dy) \quad \text{weakly as } n \to \infty. \tag{4.7}$$
We will actually prove that
$$\sup_{t \ge 0} E|X_t^x|^2 < \infty, \tag{4.8}$$
where $\{X_t^x\}$ is the solution to (4.3) with $X_0 = x$. Clearly, (4.6) follows from (4.8) by means of Chebyshev's Inequality,
$$p(t; x, \{|y| > c\}) = P(|X_t^x| > c) \le \frac{1}{c^2}\,E|X_t^x|^2.$$
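In order to prove (4.8), apply Itô's Lemma (see Eq. 3.20) to the function $\varphi(y) = |y|^2$ to get
$$|X_t^x|^2 = |x|^2 + \int_0^t (A\varphi)(X_s^x)\,ds + \sum_{r=1}^k \int_0^t \partial_r\varphi(X_s^x)\,\sigma_r(X_s^x)\cdot dB_s, \tag{4.9}$$
where $\sigma_r(\cdot)$ denotes the $r$th row of $\sigma(\cdot)$.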
594
AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
Now check that
$$(A\varphi)(y) = 2y\cdot\mu(y) + \sum_{i,j=1}^k \sigma_{ij}^2(y) = 2y\cdot\mu(y) + \operatorname{tr}(\sigma(y)\sigma'(y)), \tag{4.10}$$
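where $\operatorname{tr} D$ denotes the trace of $D$, i.e., the sum of the diagonal elements of $D$. Now, by a one-term Taylor expansion,
$$\begin{aligned} \mu^{(r)}(y) - \mu^{(r)}(0) &= \int_0^1 y\cdot\operatorname{grad}\mu^{(r)}(\theta y)\,d\theta, \\ \mu(y) - \mu(0) &= \int_0^1 J(\theta y)\,y\,d\theta, \\ y\cdot\mu(y) &= y\cdot(\mu(y) - \mu(0)) + y\cdot\mu(0) = \int_0^1 (y'J(\theta y)y)\,d\theta + y\cdot\mu(0). \end{aligned} \tag{4.11}$$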
Now,
$$y'J(\theta y)y = y'J'(\theta y)y = \tfrac12\,y'(J(\theta y) + J'(\theta y))y \le -\lambda_1|y|^2. \tag{4.12}$$
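The inequality in (4.12) uses only the symmetric part of $J$: $y'Jy = \tfrac12 y'(J + J')y$ is at most the largest eigenvalue of $\tfrac12(J + J')$ times $|y|^2$. A small NumPy sketch of this fact, with a hypothetical nonsymmetric matrix `J` standing in for $J(\theta y)$ (not from the text):

```python
import numpy as np

# Hypothetical Jacobian (illustration only); it need not be symmetric.
J = np.array([[-3.0, 1.5],
              [-0.5, -2.0]])

sym = 0.5 * (J + J.T)
lam1 = -np.linalg.eigvalsh(sym).max()   # largest eigenvalue of the symmetric part is -lam1

rng = np.random.default_rng(0)
ys = rng.normal(size=(1000, 2))
quad = np.einsum("ni,ij,nj->n", ys, J, ys)      # y'Jy for each sample y
bound = -lam1 * np.einsum("ni,ni->n", ys, ys)   # -lam1 |y|^2
print(lam1, np.all(quad <= bound + 1e-12))
```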
Using this in (4.11),
$$2y\cdot\mu(y) \le -2\lambda_1|y|^2 + 2|\mu(0)|\,|y|. \tag{4.13}$$
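Also,
$$\operatorname{tr}(\sigma(y)\sigma'(y)) = \operatorname{tr}\{(\sigma(y) - \sigma(0))(\sigma(y) - \sigma(0))' + \sigma(0)(\sigma(y) - \sigma(0))' + (\sigma(y) - \sigma(0))\sigma'(0) + \sigma(0)\sigma'(0)\}. \tag{4.14}$$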
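Since every element of a matrix is bounded by its norm, the trace of a $k \times k$ matrix is no more than $k$ times its norm. Therefore, using (4.5) and $\|DE\| \le \|D\|\,\|E\|$, (4.14) leads to
$$\operatorname{tr}(\sigma(y)\sigma'(y)) \le k\lambda_0^2|y|^2 + 2k\|\sigma(0)\|\lambda_0|y| + k\|\sigma(0)\|^2. \tag{4.15}$$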
Substituting (4.13) and (4.15) into (4.10), and using the fact that $|y| \le \delta|y|^2 + 1/\delta$ for all $\delta > 0$, there exist $\delta_1 > 0$, $\delta_2 > 0$ such that $2\lambda_1 - k\lambda_0^2 - \delta_1 > 0$ and
$$(A\varphi)(y) \le -(2\lambda_1 - k\lambda_0^2 - \delta_1)|y|^2 + \delta_2. \tag{4.16}$$
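One explicit choice of the constants in (4.16), worked out here for concreteness (any admissible choice serves the proof), is as follows. Write $b := 2|\mu(0)| + 2k\|\sigma(0)\|\lambda_0$. By (4.10), (4.13) and (4.15),
$$(A\varphi)(y) \le -(2\lambda_1 - k\lambda_0^2)|y|^2 + b|y| + k\|\sigma(0)\|^2 .$$
Fix any $\delta_1 \in (0,\, 2\lambda_1 - k\lambda_0^2)$. If $b > 0$, applying $|y| \le \delta|y|^2 + 1/\delta$ with $\delta = \delta_1/b$ gives $b|y| \le \delta_1|y|^2 + b^2/\delta_1$, so (4.16) holds with
$$\delta_2 = \frac{b^2}{\delta_1} + k\|\sigma(0)\|^2$$
(or any larger positive number; if $b = 0$ the term $b|y|$ is simply absent).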
Now take expectations in (4.9) to get
$$E|X_t^x|^2 = |x|^2 + \int_0^t E(A\varphi)(X_s^x)\,ds. \tag{4.17}$$
In arriving at (4.17), use the facts (i) $|A\varphi(y)| \le c|y|^2 + c'$ for some positive $c, c'$, so that $\int_0^t E|A\varphi(X_s^x)|\,ds < \infty$, and (ii) the integrand of the stochastic integral in (4.9) is in $\mathcal{M}[0, \infty)$. Now (4.17) implies that $t \mapsto E|X_t^x|^2$ is absolutely continuous with a density satisfying, by (4.16),
$$\frac{d}{dt}E|X_t^x|^2 = E(A\varphi)(X_t^x) \le -(2\lambda_1 - k\lambda_0^2 - \delta_1)E|X_t^x|^2 + \delta_2. \tag{4.18}$$
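In other words, writing $\delta' := 2\lambda_1 - k\lambda_0^2 - \delta_1 > 0$ and $\theta(t) := E|X_t^x|^2$,
$$\frac{d}{dt}\left(e^{\delta' t}\theta(t)\right) \le \delta_2 e^{\delta' t}, \qquad \theta(t) \le \left\{\theta(0) + (\delta_2/\delta')(e^{\delta' t} - 1)\right\}e^{-\delta' t}. \tag{4.19}$$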
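To see that (4.19) yields (4.8), rearrange its right side (a one-line step added here for clarity):
$$\theta(t) \le \theta(0)e^{-\delta' t} + \frac{\delta_2}{\delta'}\left(1 - e^{-\delta' t}\right) \le \max\left\{|x|^2,\ \frac{\delta_2}{\delta'}\right\} \quad (t \ge 0),$$
since the middle expression is a convex combination, with weights $e^{-\delta' t}$ and $1 - e^{-\delta' t}$, of $\theta(0) = |x|^2$ and $\delta_2/\delta'$. Hence $\sup_{t \ge 0} E|X_t^x|^2 < \infty$.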
0. In other words, it is an invariant (initial) distribution. To prove uniqueness, let $\pi'$ be any invariant probability. Then for all bounded continuous $f$,
$$\int (T_n f)(z)\,\pi'(dz) = \int f(z)\,\pi'(dz) \quad \forall n. \tag{4.33}$$
But, by (4.29), the left side converges to $\int f(z)\,\pi(dz)$. Therefore,
$$\int f(z)\,\pi(dz) = \int f(z)\,\pi'(dz), \tag{4.34}$$
implying $\pi' = \pi$. □
Consider next linear stochastic differential equations of the form
$$dX_t = CX_t\,dt + \sigma\,dB_t, \tag{4.35}$$
where $C$ and $\sigma$ are constant $k \times k$ matrices. This is a continuous-time analog of the difference equation (13.10) of Chapter II for linear time series models. It is simple to check (Exercise 1) that
$$X_t^x := e^{tC}x + \int_0^t e^{(t-s)C}\sigma\,dB_s \tag{4.36}$$
is a solution to (4.35) with initial state $x$. By uniqueness, it is the solution with $X_0 = x$. Observe that, being a limit of linear combinations of Gaussians, $X_t^x$ is Gaussian with mean vector $e^{tC}x$ and dispersion matrix (Exercise 2)
$$\Sigma(t) = E\left(\int_0^t e^{(t-s)C}\sigma\,dB_s\right)\left(\int_0^t e^{(t-s)C}\sigma\,dB_s\right)' = \int_0^t e^{(t-s)C}\sigma\sigma'e^{(t-s)C'}\,ds = \int_0^t e^{uC}\sigma\sigma'e^{uC'}\,du. \tag{4.37}$$
Suppose that the real parts of the eigenvalues $\lambda_1, \ldots, \lambda_k$, say, of $C$ are negative. This does not necessarily imply that the eigenvalues of $C + C'$ are all negative (Exercise 3). Therefore, Theorem 4.1 does not quite apply. But, writing $[t]$ for the integer part of $t$ and $B = e^{C}$,
$$\|e^{tC}\| \le \|e^{[t]C}\|\,\|e^{(t - [t])C}\| \le \|B^{[t]}\|\max\{\|e^{sC}\|: 0 \le s \le 1\} \to 0 \tag{4.38}$$
exponentially fast, as $t \to \infty$, by the Lemma in Section 13 of Chapter II. For the eigenvalues of $B$ are $e^{\lambda_i}$ $(1 \le i \le k)$, all smaller than 1 in modulus. It follows from the exponential convergence (4.38) that $e^{tC}x \to 0$ as $t \to \infty$, and the integral in (4.37) converges to
$$\Sigma := \int_0^\infty e^{uC}\sigma\sigma'e^{uC'}\,du. \tag{4.39}$$
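The point of (4.38) is that only the eigenvalues of $C$ matter, not the sign of $C + C'$. The NumPy sketch below uses a hypothetical matrix `C` (chosen for illustration, not from the text) whose eigenvalues are both $-1$, while $C + C'$ has the positive eigenvalue $8$; for this particular $C$ the exponential has the closed form $e^{tC} = e^{-t}\begin{pmatrix}1 & 10t\\ 0 & 1\end{pmatrix}$. The norm $\|e^{tC}\|$ first grows above 1 and then decays exponentially.

```python
import numpy as np

# Hypothetical example (not from the text): eigenvalues of C are both -1,
# yet C + C' has a positive eigenvalue, so Theorem 4.1's hypothesis fails.
C = np.array([[-1.0, 10.0],
              [0.0, -1.0]])

print(np.linalg.eigvals(C))          # both equal to -1
print(np.linalg.eigvalsh(C + C.T))   # -12 and 8

def exp_tC(t):
    # Closed form for this particular C: C = -I + N with N nilpotent, N^2 = 0.
    return np.exp(-t) * np.array([[1.0, 10.0 * t], [0.0, 1.0]])

for t in [0.0, 0.5, 1.0, 5.0, 20.0]:
    print(t, np.linalg.norm(exp_tC(t), 2))  # grows at first, then decays like t e^{-t}
```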
The following result has, therefore, been proved.
Proposition 4.2. The diffusion governed by (4.35) has a unique invariant distribution if all the eigenvalues of $C$ have negative real parts. The invariant distribution is Gaussian with zero mean vector and dispersion matrix $\Sigma$ given by (4.39).
Under the hypothesis of Proposition 4.2, the diffusion governed by (4.35) may be thought of as the multivariate Ornstein-Uhlenbeck process.
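As a numerical cross-check of Proposition 4.2, the sketch below computes $\Sigma$ in two ways. It relies on a standard fact not stated in the text: integrating $\frac{d}{du}\left(e^{uC}\sigma\sigma'e^{uC'}\right)$ over $[0, t]$ in (4.37) gives $C\Sigma(t) + \Sigma(t)C' + \sigma\sigma' = e^{tC}\sigma\sigma'e^{tC'}$, whose right side tends to 0 by (4.38), so $\Sigma$ solves the Lyapunov equation $C\Sigma + \Sigma C' + \sigma\sigma' = 0$. The helper names `expm`, `sigma_integral` and `sigma_lyapunov` are hypothetical; `expm` is a bare-bones stand-in for a library matrix exponential, and (4.39) is truncated at a finite horizon $T$.

```python
import numpy as np

def expm(A, terms=30):
    """Matrix exponential by scaling and squaring of a truncated Taylor series."""
    k = A.shape[0]
    norm = np.linalg.norm(A, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A_s = A / 2.0**s
    result, term = np.eye(k), np.eye(k)
    for n in range(1, terms):
        term = term @ A_s / n
        result = result + term
    for _ in range(s):
        result = result @ result
    return result

def sigma_integral(C, sig, T=30.0, steps=6000):
    """Trapezoidal approximation of (4.39), truncated at u = T."""
    k = C.shape[0]
    Q = sig @ sig.T
    h = T / steps
    step = expm(h * C)      # e^{hC}; e^{uC} is built up by repeated products
    E = np.eye(k)
    total = 0.5 * Q         # integrand at u = 0
    for i in range(1, steps + 1):
        E = E @ step
        value = E @ Q @ E.T
        total += value if i < steps else 0.5 * value
    return h * total

def sigma_lyapunov(C, sig):
    """Solve C S + S C' + sig sig' = 0, using vec(CS + SC') = (I kron C + C kron I) vec(S)."""
    k = C.shape[0]
    I = np.eye(k)
    K = np.kron(I, C) + np.kron(C, I)
    vec_S = np.linalg.solve(K, -(sig @ sig.T).flatten(order="F"))
    return vec_S.reshape((k, k), order="F")

C = np.array([[-1.0, 10.0], [0.0, -1.0]])   # eigenvalues -1, -1
sig = np.eye(2)
print(sigma_integral(C, sig))
print(sigma_lyapunov(C, sig))   # both approximately [[25.5, 2.5], [2.5, 0.5]]
```

Solving the linear Lyapunov equation is much cheaper than quadrature; the quadrature is included only to confirm that both routes agree with (4.39).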
EXERCISES
Exercises for Section VII.1
1. Let $s = t_{0,n} < t_{1,n} < \cdots < t_{k_n,n} = t$ be, for each $n$, a partition of the interval $[s, t]$ such that $\delta_n := \max\{t_{i+1,n} - t_{i,n}: 0 \le i \le k_n - 1\} \to 0$ as $n \to \infty$.
(i) Prove that
$$s_n := \sum_{i=0}^{k_n-1}\left(B_{t_{i+1,n}} - B_{t_{i,n}}\right)^2$$
converges to $t - s$ in probability as $n \to \infty$. [Hint: $E(s_n - (t - s))^2 \to 0$.]
(ii) Prove that
$$\gamma_n := \sum_{i=0}^{k_n-1}\left|B_{t_{i+1,n}} - B_{t_{i,n}}\right| \to \infty$$
in probability as $n \to \infty$. [Hint: $\gamma_n \ge s_n/\max_i|B_{t_{i+1,n}} - B_{t_{i,n}}|$.]
(iii) Let $\pi$ denote an arbitrary subdivision $s = t_0 < t_1 < \cdots$ 0,
then the same holds for all $c > 0$.
3. Let $\mu(\cdot)$, $\sigma(\cdot)$ be Lipschitzian and $d_{11}(x) = |\sigma_1(x)|^2 > 0$ for all $x$. Prove that $E\tau < \infty$, where $\tau := \inf\{t \ge 0: |X_t^x| = d\}$ and $|x| < d$. [Hint: For a sufficiently large $c$, the function $\varphi(x) := -\exp\{cx^{(1)}\}$ in $\{|x| \le d\}$ (extended suitably) satisfies $A\varphi(x) < 0$ in $\{|x| \le d\}$. Apply Itô's Lemma to $\varphi$ and optional {lxs 0 is chosen so that $r_0 - s > 0$. Show that $\varphi(x) := F(|x|)$ is twice continuously differentiable on $\mathbb{R}^k$, vanishing outside a compact set. Apply Itô's Lemma to $\varphi$ to derive (3.31).
5. Let
$$F(r) := \int_c^r \exp\{-I(u)\}\,du, \quad c \le r \le d,$$
where $I(u)$ is defined by (3.23). (i) Define $\varphi(x) := F(|x|)$ for $c \le |x| \le d$ and obtain a twice continuously differentiable extension of $\varphi$ vanishing outside a compact set. (ii) For the function $\varphi$ in (i) check that $A\varphi(x) \le 0$ for $c \le |x| \le d$, and use Itô's Lemma to derive a lower bound for $P(\tau_{\partial B(0:c)} < \tau_{\partial B(0:d)})$, where $\tau_{\partial B(0:r)} := \inf\{t \ge 0: |X_t^x| = r\}$. (iii) Use (ii) to prove that
$$P(\tau_{\partial B(0:r_0)} < \infty) = 1 \quad \text{for } |x| > r_0,$$
provided
$$\int_c^\infty \exp\{-I(u)\}\,du = \infty \quad \text{for some } c > 0.$$
(iv) Use an argument similar to that outlined in (i)–(iii) to prove that $P(\tau_{\partial B(0:r_0)}$
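0 for all $x$. (i) Use Itô's Lemma to compute (a) $\psi(x) := P(\{X_t^x\} \text{ reaches } c \text{ before } d)$, $c < x < d$; (b) $\rho_{xc} := P(\{X_t^x\} \text{ ever reaches } c)$, $x > c$; (c) $\rho_{xd} := P(\{X_t^x\} \text{ ever reaches } d)$, $x < d$. [Hint: Consider $\varphi(y) := \int_c^y \exp\{-I(c, r)\}\,dr$ for $c \le y \le d$, where $I(c, r) := \int_c^r (2\mu(z)/\sigma^2(z))\,dz$.]
(ii) Compute (a) $E(\tau_c \wedge \tau_d)$, where $\tau_r := \inf\{t \ge 0: X_t^x = r\}$ and $c < x < d$; (b) $E\tau_c$ $(x > c)$; (c) $E\tau_d$ $(x$
$$EM_n^p = E\left(p\int_0^{M_n}\lambda^{p-1}\,d\lambda\right) = p\int_0^\infty \lambda^{p-1}P(M_n \ge \lambda)\,d\lambda \quad \text{(by Fubini)}$$
$$\le p\int_0^\infty \lambda^{p-2}\left(\int_{\{M_n \ge \lambda\}}|Z_n|\,dP\right)d\lambda = p\int |Z_n|\left(\int_0^{M_n}\lambda^{p-2}\,d\lambda\right)dP = \frac{p}{p-1}\int |Z_n|\,M_n^{p-1}\,dP$$
$$\le \frac{p}{p-1}\,(EM_n^p)^{(p-1)/p}\,(E|Z_n|^p)^{1/p} \quad \text{(by Hölder's Inequality).]}$$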
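(iii) Let $\{Z_t: t \ge 0\}$ be a continuous-parameter $\{\mathcal{F}_t\}$-martingale, having a.s. continuous sample paths. If $E|Z_t|^p < \infty$ for all $t \ge 0$, and for some $p \ge 1$, then show that
$$P\left(\max_{0 \le t \le T}|Z_t| \ge \lambda\right) \le \frac{E|Z_T|^p}{\lambda^p} \quad (\lambda > 0,\ T > 0),$$
and, if $p > 1$, show that
$$E\left(\max_{0 \le t \le T}|Z_t|\right)^p \le \left(\frac{p}{p-1}\right)^p E|Z_T|^p.$$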
11. (i) Use the result in 10(i) to prove, for every $\varepsilon > 0$, that $P(|X_t^x - x| \ge \varepsilon) = o(t)$ as $t \downarrow 0$, (a) uniformly for all $x$ in a bounded interval, if $\mu(\cdot)$, $\sigma(\cdot)$ are Lipschitzian, and (b) uniformly for all $x$ in $\mathbb{R}^1$ if $\mu(\cdot)$, $\sigma(\cdot)$ are bounded as well as Lipschitzian. (ii) Extend (i) to multidimension. (iii) Let $\mu(\cdot)$, $\sigma(\cdot)$ be Lipschitzian. Prove that
$$P\left(\max_{0 \le s \le t}|X_s^x - x| \ge \varepsilon\right) = o(t) \quad \text{as } t \downarrow 0,$$
uniformly for every bounded set of $x$'s. Show that the convergence is uniform for all $x \in \mathbb{R}^k$ if $\mu(\cdot)$, $\sigma(\cdot)$ are bounded as well as Lipschitzian. [Hint:
$$P\left(\max_{0 \le s \le t}|X_s^x - x| \ge \varepsilon\right) \le P\left(\int_0^t |\mu(X_s^x)|\,ds \ge \frac{\varepsilon}{2}\right) + P\left(\max_{0 \le s \le t}\left|\int_0^s \sigma(X_u^x)\,dB_u\right| \ge \frac{\varepsilon}{2}\right) = I_1 + I_2,$$
say. To estimate $I_1$ note that $|\mu(X_s^x)| \le |\mu(x)| + M|X_s^x - x|$, and use Chebyshev's Inequality for the fourth moment and Exercise 10(iii). To estimate $I_2$ use Exercise 9(iii) with $p = 4$, Proposition 3.4 (or Exercise 8(ii) with $m = 2$) and Exercise 10(iii).]
12. (The Semigroup $\{T_t\}$) Let $\mu(\cdot)$, $\sigma(\cdot)$ be Lipschitzian. (i) Show that $\|T_t f - f\| \to 0$ as $t \downarrow 0$, for every real-valued Lipschitzian $f$ on $\mathbb{R}^k$ vanishing outside a bounded set. Here $\|\cdot\|$ denotes the "sup norm." [Hint: Use
Exercise 10.] (ii) Let $f$ be a real-valued twice continuously differentiable function on $\mathbb{R}^k$. (a) If $Af$ and $\operatorname{grad} f$ are polynomially bounded (i.e., $|Af(x)| \le c_1(1 + |x|^{m_1})$, $|\operatorname{grad} f(x)| \le c_2(1 + |x|^{m_2})$), show that $T_t f(x) \to f(x)$ as $t \downarrow 0$, uniformly on every bounded set of $x$'s. (b) If $Af$ is bounded and $\operatorname{grad} f$ polynomially bounded, show that $\|T_t f - f\| \to 0$. [Hint: Use Itô's Lemma.] (iii) (Infinitesimal Generator) Let $\mathcal{D}$ denote the set of all real-valued bounded continuous functions $f$ on $\mathbb{R}^k$ such that $\|t^{-1}(T_t f - f) - g\| \to 0$ as $t \downarrow 0$, for some bounded continuous $g$ (i.e., $g = ((d/dt)T_t f)_{t=0}$, where the convergence of the difference quotient to the derivative is uniform on $\mathbb{R}^k$). Write $g = \mathcal{A}f$ for such $f$, and call $\mathcal{A}$ the infinitesimal generator of $\{T_t\}$, and $\mathcal{D}$ the domain of $\mathcal{A}$. Show that every twice continuously differentiable $f$, which vanishes outside some bounded set, belongs to $\mathcal{D}$ and that for such $f$, $\mathcal{A}f = Af$ where $A$ is given
605
EXERCISES
by (3.21). [Hint: Suppose $f(x) = 0$ for $|x| > N$. For each $\varepsilon > 0$ and $x \in \mathbb{R}^k$, $|T_t f(x) - f(x) - tAf(x)| \le t\,\omega(Af: \varepsilon) + 2t\|Af\|\,P(\max_{0 \le s \le t}|X_s^x - x|$
0, and twice continuously differentiable in $x$ on $\mathbb{R}^k$, satisfying
$$\frac{\partial u(t, x)}{\partial t} = Au(t, x) + V(x)u(t, x) \quad (t > 0,\ x \in \mathbb{R}^k), \qquad u(0, x) = f(x),$$
where $f$ is a polynomially bounded continuous function on $\mathbb{R}^k$, and $V$ is a continuous function on $\mathbb{R}^k$ that is bounded above. If $\operatorname{grad} u(t, x)$ and $Au(t, x)$ are polynomially
607
THEORETICAL COMPLEMENTS
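bounded in $x$ uniformly for every compact set of $t$ in $(0, \infty)$, then
$$u(t, x) = E\left(f(X_t^x)\exp\left\{\int_0^t V(X_s^x)\,ds\right\}\right) \quad (t \ge 0,\ x \in \mathbb{R}^k).$$
[Hint: For each $t > 0$, apply Itô's Lemma to
$$Y(s) = (Y_1(s), Y_2(s)) = \left(X_s^x,\ \int_0^s V(X_{s'}^x)\,ds'\right) \quad \text{and} \quad w(s, y) = u(t - s, y_1)\exp\{y_2\},$$
where $y = (y_1, y_2)$.]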
Exercises for Section VII.4
1. (i) Use the method of successive approximation to show that (4.36) is the solution to (4.35). (ii) Use Itô's Lemma to check that (4.36) solves (4.35).
2. Check that (4.37) is the dispersion matrix of the solution $\{X_t^x\}$ of the linear stochastic differential equation (4.35).
3. (i) Let $C$ be a $k \times k$ matrix such that $C + C'$ is negative definite (i.e., the eigenvalues of $C + C'$ are all negative). Prove that the real parts of all eigenvalues of $C$ are negative. (ii) Give an example of a $2 \times 2$ matrix $C$ whose eigenvalues have negative real parts, but $C + C'$ is not negative definite.
4. In (4.35) let $C = -cI$ where $c > 0$ and $I$ is the $k \times k$ identity matrix. Compute the dispersion matrices $\Sigma(t)$, $\Sigma$ explicitly in this case, and show that they are singular if and only if $\sigma$ is singular.
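The two limits in Exercise 1 of Section VII.1 can be previewed by simulation. The NumPy sketch below uses arbitrary choices (the seed, the interval $[s, t] = [0, 1]$, and a uniform partition into $n = 100{,}000$ pieces): it draws the Brownian increments over one fine partition and evaluates $s_n$ and $\gamma_n$. The first is close to $t - s = 1$, while the second is large, of order $\sqrt{2n/\pi} \approx 252$.

```python
import numpy as np

rng = np.random.default_rng(12345)
s, t, n = 0.0, 1.0, 100_000                          # uniform partition of [s, t] into n pieces
dB = rng.normal(0.0, np.sqrt((t - s) / n), size=n)   # independent increments B_{t_{i+1}} - B_{t_i}

quadratic_variation = np.sum(dB**2)   # s_n in Exercise 1(i): tends to t - s in probability
total_variation = np.sum(np.abs(dB))  # gamma_n in Exercise 1(ii): tends to infinity
print(quadratic_variation, total_variation)
```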
THEORETICAL COMPLEMENTS
Theoretical Complements to Section VII.1
1. (P-Completion of Sigmafields) Let $(\Omega, \mathcal{F}, P)$ be a probability space, $\mathcal{G}$ a subsigmafield of $\mathcal{F}$. The $P$-completion $\bar{\mathcal{G}}$ of $\mathcal{G}$ is the sigmafield $\{G \cup N: G \in \mathcal{G},\ N \subset M \in \mathcal{F} \text{ such that } P(M) = 0\}$. The proof of Lemma 1 of theoretical complement V.1.1 shows that if $\mathcal{F}_t$ is $P$-complete for all $t \ge 0$ then the stochastic process $\{f(t): t \ge 0\}$ is progressively measurable provided (1) $f(t)$ is $\mathcal{F}_t$-measurable for all $t$, and (2) $t \mapsto f(t)$ is almost surely right-continuous. Several times in Section 1 we have constructed processes $\{f(t)\}$ that have continuous sample paths almost surely, and which have the property that $f(t)$ is $\bar{\mathcal{F}}_t$-measurable. These are nonanticipative in view of the fact that $\bar{\mathcal{F}}_t$ is $P$-complete.
$$\int_\alpha^t g(s)\,dB_s = \int_\alpha^t h(s)\,dB_s \quad \text{for } \alpha \le t \le \beta, \text{ a.s.}$$
Without loss of generality one may take, for each $n$, the same partition $\{t_i^{(n)}: 0 \le i \le N_n\}$ of $[\alpha, \beta]$ for both $g_n$, $h_n$. Modify $h_n$ if necessary, so that $h_n(t_i^{(n)}, \omega) = g_n(t_i^{(n)}, \omega)$ for $\omega \in \{h(s) = g(s) \text{ for } \alpha \le s \le t_i^{(n)}\}$ $(0 \le i$
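$\cdots\,\partial_r\partial_{r'}\varphi(t_i^{(n)}, Y(t_i^{(n)})) + R_{n,i}\}$, (T.3.4)
where $R_{n,i} \to 0$ uniformly in $i$ (for each $\omega$), because $(u, y) \mapsto \partial_r\partial_{r'}\varphi(u, y)$ and $t \mapsto Y(t)$ are continuous. Using (3.9) the first sum in (T.3.4) is expressed as
$$\sum_r\sum_{i=0}^{N_n-1}\left\{(t_{i+1}^{(n)} - t_i^{(n)})f_r(s) + g_r(s)\cdot(B_{t_{i+1}^{(n)}} - B_{t_i^{(n)}})\right\}\partial_r\varphi(t_i^{(n)}, Y(t_i^{(n)})) \longrightarrow \sum_r\left[\int_s^t f_r(u)\,\partial_r\varphi(u, Y(u))\,du + \int_s^t \partial_r\varphi(u, Y(u))\,g_r(u)\cdot dB_u\right] \tag{T.3.5}$$
in probability, since $\partial_r\varphi$ is continuous and (see Exercise 1.3)
$$\sum_{i=0}^{N_n-1}\int_{t_i^{(n)}}^{t_{i+1}^{(n)}}\left(\partial_r\varphi(t_i^{(n)}, Y(t_i^{(n)})) - \partial_r\varphi(u, Y(u))\right)g_r(s)\cdot dB_u = \int_s^t h_n(u)\cdot dB_u,$$
say, where $\int_s^t |h_n(u)|^2\,du \to 0$ a.s. It remains to find the limiting value of the second sum in (T.3.4) (excluding the remainder). For this, write
$$\sum_{i=0}^{N_n-1}\left(Y^{(r)}(t_{i+1}^{(n)}) - Y^{(r)}(t_i^{(n)})\right)\left(Y^{(r')}(t_{i+1}^{(n)}) - Y^{(r')}(t_i^{(n)})\right)\partial_r\partial_{r'}\varphi(t_i^{(n)}, Y(t_i^{(n)}))$$
$$= \sum_{i=0}^{N_n-1}\left\{(t_{i+1}^{(n)} - t_i^{(n)})f_r(s) + g_r(s)\cdot(B_{t_{i+1}^{(n)}} - B_{t_i^{(n)}})\right\}\left\{(t_{i+1}^{(n)} - t_i^{(n)})f_{r'}(s) + g_{r'}(s)\cdot(B_{t_{i+1}^{(n)}} - B_{t_i^{(n)}})\right\}\partial_r\partial_{r'}\varphi(t_i^{(n)}, Y(t_i^{(n)})). \tag{T.3.6}$$
By (3.1), (3.15) and (3.16), (T.3.6) converges a.s. (for a suitable sequence of partitions) to
$$\lim_n\sum_{i=0}^{N_n-1}\sum_{j,j'=1}^k g_r^{(j)}(s)g_{r'}^{(j')}(s)\left(B^{(j)}_{t_{i+1}^{(n)}} - B^{(j)}_{t_i^{(n)}}\right)\left(B^{(j')}_{t_{i+1}^{(n)}} - B^{(j')}_{t_i^{(n)}}\right)\partial_r\partial_{r'}\varphi(t_i^{(n)}, Y(t_i^{(n)}))$$
$$= \sum_{j=1}^k g_r^{(j)}(s)g_{r'}^{(j)}(s)\lim_n\sum_{i=0}^{N_n-1}(t_{i+1}^{(n)} - t_i^{(n)})\,\partial_r\partial_{r'}\varphi(t_i^{(n)}, Y(t_i^{(n)})) = \int_s^t\sum_{j=1}^k g_r^{(j)}(u)g_{r'}^{(j)}(u)\,\partial_r\partial_{r'}\varphi(u, Y(u))\,du, \tag{T.3.7}$$
the cross terms with $j \ne j'$ vanishing in the limit. The proof is completed by adding $J_0$, $J_{11}$, and $J_{12}$.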
It may be noted that the proof goes through for $f_r, g_r \in \mathcal{L}[0, T]$, $1 \le r \le k$, and $\varphi(t, y)$ twice continuously differentiable in $y$ and once in $t$.
2. (Explosion) Assume that $\mu(\cdot)$, $\sigma(\cdot)$ are locally Lipschitzian, that is, (2.42) holds for $|x| \le n$, $|y| \le n$, with $M = M_n$, where $M_n$ may go to infinity as $n \to \infty$. One may still construct a diffusion on $\mathbb{R}^k$ satisfying (2.43) up to an explosion time $\zeta$. For this, let $\{X_{t,n}^x: t \ge 0\}$ be the solution of (2.43) with globally Lipschitzian coefficients $\mu_n(\cdot)$, $\sigma_n(\cdot)$ satisfying
$$\mu_n(y) = \mu(y) \quad \text{and} \quad \sigma_n(y) = \sigma(y) \quad \text{for } |y| \le n. \tag{T.3.8}$$
We may, for example, let $\mu_n(y) = \mu(ny/|y|)$ and $\sigma_n(y) = \sigma(ny/|y|)$ for $|y| > n$, so that (2.42) holds for all $x, y$ with $M = M_n$. Let us show that, if $|x| \le n$,
$$X_{t,m}^x(\omega) = X_{t,n}^x(\omega) \quad \text{for } 0 \le t < \zeta_n(\omega) \quad (m \ge n) \tag{T.3.9}$$
outside a set of zero probability, where
$$\zeta_n := \inf\{t \ge 0: |X_{t,n}^x| = n\}. \tag{T.3.10}$$
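To prove (T.3.9), first note that if $m \ge n$ then $\mu_m(y) = \mu_n(y)$ and $\sigma_m(y) = \sigma_n(y)$ for $|y| \le n$, so that
$$\mu_m(X_{t,n}^x) = \mu_n(X_{t,n}^x) \quad \text{and} \quad \sigma_m(X_{t,n}^x) = \sigma_n(X_{t,n}^x) \quad \text{for } 0 \le t < \zeta_n(\omega). \tag{T.3.11}$$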
It follows from (T.3.11) (see the argument in theoretical complement 1.2) that
$$\int_0^t \mu_m(X_{s,n}^x)\,ds + \int_0^t \sigma_m(X_{s,n}^x)\,dB_s = \int_0^t \mu_n(X_{s,n}^x)\,ds + \int_0^t \sigma_n(X_{s,n}^x)\,dB_s \quad \text{for } 0 \le t < \zeta_n \tag{T.3.12}$$
almost surely. Therefore, a.s.,
$$X_{t,m}^x - X_{t,n}^x = \int_0^t [\mu_m(X_{s,m}^x) - \mu_m(X_{s,n}^x)]\,ds + \int_0^t [\sigma_m(X_{s,m}^x) - \sigma_m(X_{s,n}^x)]\,dB_s \quad \text{for } 0 \le t < \zeta_n. \tag{T.3.13}$$
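This implies
$$\varphi(t) := E\left(|X_{t,m}^x - X_{t,n}^x|^2\,\mathbf{1}_{\{\zeta_n > t\}}\right) \le 2tM_m^2\int_0^t \varphi(s)\,ds + 2M_m^2\int_0^t \varphi(s)\,ds = 2M_m^2(t + 1)\int_0^t \varphi(s)\,ds. \tag{T.3.14}$$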
By iteration (see (2.21) and (2.22)), $\varphi(t) = 0$. In other words, on $\{\zeta_n > t\}$, $X_{t,m}^x = X_{t,n}^x$ a.s., for each $t > 0$. Since $t \mapsto X_{t,m}^x$, $X_{t,n}^x$ are continuous, it follows that, outside a set of probability 0, $X_{s,m}^x = X_{s,n}^x$ for $0 \le s < \zeta_n$, $m \ge n$. In particular, $\zeta_n \uparrow$ a.s. to (the explosion time) $\zeta$, say, as $n \uparrow \infty$, and
$$X_t^x(\omega) = \lim_{n\to\infty} X_{t,n}^x(\omega), \quad t < \zeta(\omega), \tag{T.3.15}$$
exists a.s., and has a.s. continuous sample paths on $[0, \zeta(\omega))$. If $P(\zeta < \infty) > 0$, then we say that explosion occurs. In this case one may continue $\{X_t^x\}$ for $t \ge \zeta(\omega)$ by setting
$$X_t^x(\omega) = \text{"}\infty\text{"}, \quad t \ge \zeta(\omega), \tag{T.3.16}$$
where "$\infty$" is a new state, usually taken to be the point at infinity in the one-point compactification of $\mathbb{R}^k$ (see H. L. Royden (1963), Real Analysis, 2nd ed., Macmillan, New York, p. 168). Using the Markov property of $\{X_{t,n}^x\}$ for each $n$, it is not difficult to show that $\{X_t^x: t \ge 0\}$ defined by (T.3.15) and (T.3.16) is a Markov process on $\mathbb{R}^k \cup \{\text{"}\infty\text{"}\}$, called the minimal diffusion generated by
$$A = \frac12\sum_{i,j} a_{ij}(x)\frac{\partial^2}{\partial x_i\,\partial x_j} + \sum_i \mu_i(x)\frac{\partial}{\partial x_i}.$$
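If $k = 1$ one often takes, instead of (T.3.16),
$$X_t^x = \begin{cases} -\infty & \text{for } t \ge \zeta(\omega) \text{ on } \{\lim_{t\uparrow\zeta} X_t^x = -\infty\},\\ +\infty & \text{for } t \ge \zeta(\omega) \text{ on } \{\lim_{t\uparrow\zeta} X_t^x = +\infty\}.\end{cases} \tag{T.3.17}$$
Write $\zeta = \tau_{-\infty}$ on $\{\lim_{t\uparrow\zeta} X_t^x = -\infty\}$, $\zeta = \tau_{+\infty}$ on $\{\lim_{t\uparrow\zeta} X_t^x = +\infty\}$. Write $\tau_z := \inf\{t \ge 0: X_t^x = z\}$.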
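Theorem T.3.1. (Feller's Test for Explosion). Let $k = 1$ and $I(z) := \int_0^z (2\mu(z')/\sigma^2(z'))\,dz'$. Then
(a) $P(\tau_{+\infty} < \infty) = 0$ if and only if $\displaystyle\int_0^\infty \exp\{-I(y)\}\left[\int_0^y (\exp\{I(z)\}/\sigma^2(z))\,dz\right]dy = \infty$;
(b) $P(\tau_{-\infty} < \infty) = 0$ if and only if $\displaystyle\int_{-\infty}^0 \exp\{-I(y)\}\left[\int_y^0 (\exp\{I(z)\}/\sigma^2(z))\,dz\right]dy = \infty$.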
Proof. (a) Assuming that the integral on the right in (a) diverges, we will construct a nonnegative, increasing and twice continuously differentiable function $\varphi$ on $[0, \infty)$ such that (i) $A\varphi(y) \le \varphi(y)$ for all $y$ and (ii) $\varphi(y) \to \infty$ as $y \to \infty$. Granting the existence of such a function $\varphi$ for the moment (and extending it to a twice continuously differentiable function on $(-\infty, 0]$), apply Itô's Lemma to the function $\varphi(t, y) := e^{-t}\varphi(y)$ to get, for all $x > 0$,
$$E\varphi\left(\tau_n \wedge \tau_0 \wedge t,\ X^x_{\tau_n\wedge\tau_0\wedge t}\right) - \varphi(x) = E\int_0^{\tau_n\wedge\tau_0\wedge t} e^{-s}\left[A\varphi(X_s^x) - \varphi(X_s^x)\right]ds \le 0.$$
This means $\varphi(x) \ge E\varphi(\tau_n \wedge \tau_0 \wedge t, X^x_{\tau_n\wedge\tau_0\wedge t})$, so that
$$\varphi(x) \ge e^{-t}\varphi(n)P(\tau_n < \tau_0 \wedge t).$$
Letting $n \to \infty$ one has, for all $t \ge 0$,
$$P(\tau_{+\infty} < \tau_0 \wedge t) \le e^t\varphi(x)\lim_{n\to\infty}\frac{1}{\varphi(n)} = 0. \tag{T.3.18}$$
Thus, $P(\tau_{+\infty} < t, \tau_0 = \infty) = 0$ for all $t \ge 0$, so that $P(\tau_{+\infty} < \infty, \tau_0 = \infty) = 0$. But on the set $\{\tau_0 < \infty\}$, $\tau_{+\infty} > \tau_0$ by definition of $\tau_{+\infty}$. Therefore, $P(\tau_{+\infty} < \infty, \tau_0 < \infty) = 0$. This implies $P(\tau_{+\infty} < \infty) = 0$. It remains to construct $\varphi$. Let $\varphi_0(z) = 1$ $(z \ge 0)$ and define, recursively,
$$\varphi_n(z) = 2\int_0^z \exp\{-I(y)\}\left[\int_0^y \frac{\exp\{I(z')\}\varphi_{n-1}(z')}{\sigma^2(z')}\,dz'\right]dy \quad (z \ge 0;\ n \ge 1). \tag{T.3.19}$$
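Then $\varphi_n \ge 0$, $\varphi_n(z) \uparrow$ as $z \uparrow$, and
$$\varphi_n(z) \le \frac{1}{n!}\left(2\int_0^z \exp\{-I(y)\}\left[\int_0^y \frac{\exp\{I(z')\}}{\sigma^2(z')}\,dz'\right]dy\right)^n = \frac{\varphi_1^n(z)}{n!} \quad (n \ge 0). \tag{T.3.20}$$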
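To prove (T.3.20), note that it holds for $n = 0$ and that $\varphi_n'(z) \le \varphi_{n-1}(z)\varphi_1'(z)$ $(n \ge 1)$, since $\varphi_{n-1}$ is nondecreasing. Now use induction on $n$. It follows from (T.3.20) that the series $\varphi(z) := \sum_{n=0}^\infty \varphi_n(z)$ converges uniformly on compacts. Also, it is simple to check that $A\varphi_n(z) = \varphi_{n-1}(z)$ for all $n \ge 1$, so that $A$ may be applied term by term to yield $A\varphi(z) = \varphi(z)$ $(z \ge 0)$. By assumption, $\varphi_1(z) \to \infty$ as $z \to \infty$. Therefore, $\varphi(z) \to \infty$ as $z \to \infty$. Next assume that the integral on the right in (a) converges. Again, applying Itô's Lemma to $\varphi(t, y) := e^{-t}\varphi(y)$, we get
$$\varphi(x) = E\varphi\left(\tau_n \wedge \tau_0 \wedge t,\ X^x_{\tau_n\wedge\tau_0\wedge t}\right) \le \varphi(n)P(\tau_n \le \tau_0 \wedge t) + \varphi(0)P(\tau_0 \le \tau_n \wedge t) + e^{-t}\varphi(n)P(t < \tau_n \wedge \tau_0)$$
$$= \varphi(n)\left\{P(\tau_n \le \tau_0 \wedge t) + e^{-t}P(\tau_n \wedge \tau_0 > t)\right\}. \tag{T.3.21}$$
Letting $t \to \infty$ we get $\varphi(x) \le \varphi(n)\{P(\tau_n < \tau_0, \tau_0 < \infty) + P(\tau_n < \tau_0, \tau_0 = \infty)\} \le \varphi(n)P(\tau_n < \infty)$, so that $P(\tau_{+\infty} < \infty) \ge \varphi(x)/\varphi(\infty) > 0$.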
The proof of (b) is essentially analogous.
□
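Two worked cases (illustrations added here, not taken from the text) show how the criterion in Theorem T.3.1(a) is applied. In both, $\sigma \equiv 1$, so $I(z) = 2\int_0^z \mu(z')\,dz'$.
$$\mu \equiv 0: \qquad I \equiv 0, \qquad \int_0^\infty\left[\int_0^y dz\right]dy = \int_0^\infty y\,dy = \infty,$$
so $P(\tau_{+\infty} < \infty) = 0$; Brownian motion does not explode, as expected.
$$\mu(z) = z^3: \qquad I(z) = \frac{z^4}{2}, \qquad \int_0^y e^{z^4/2}\,dz \sim \frac{e^{y^4/2}}{2y^3} \quad (y \to \infty),$$
by l'Hôpital's rule, so the integrand $e^{-y^4/2}\int_0^y e^{z^4/2}\,dz$ behaves like $1/(2y^3)$ and the outer integral converges. Hence $P(\tau_{+\infty} < \infty) > 0$: the diffusion $dX_t = X_t^3\,dt + dB_t$ reaches $+\infty$ in finite time with positive probability.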
REMARK. It should be noted that $P(\tau_{+\infty} < \infty) > 0$ means $+\infty$ is accessible. For diffusions on $S = (a, b)$ with $a$ and/or $b$ finite, the criterion of accessibility mentioned in theoretical complement V.2.3 may be derived in exactly the same manner. In multidimension ($k > 1$) the following criteria may be derived. We use the notation (3.23) for the following statement.
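Theorem T.3.2. (Has'minskii's Test for Explosion).
(a) $P(\zeta < \infty) = 0$ if $\displaystyle\int_{r_0}^\infty e^{-\bar I(r)}\left(\int_{r_0}^r \frac{\exp\{\bar I(u)\}}{\bar\alpha(u)}\,du\right)dr = \infty$;
(b) $P(\zeta < \infty) > 0$ if $\displaystyle\int_{r_0}^\infty e^{-\underline I(r)}\left(\int_{r_0}^r \frac{\exp\{\underline I(u)\}}{\underline\alpha(u)}\,du\right)dr < \infty$.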
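Proof. (a) Assume that the right side of (a) diverges. Fix $r_0 \in [0, |x|)$. Define the following functions on $(r_0, \infty)$:
$$\varphi_0(v) = 1, \qquad \varphi_n(v) = \int_{r_0}^v e^{-\bar I(r)}\left(\int_{r_0}^r \frac{\exp\{\bar I(u)\}\varphi_{n-1}(u)}{\bar\alpha(u)}\,du\right)dr \quad (n \ge 1). \tag{T.3.22}$$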
Then the radial functions $\varphi_n(|y|)$ on $\{r_0 \le |y| < \infty\}$ satisfy $A\varphi_n(|y|) \le \varphi_{n-1}(|y|)$. Now, as in the proof of Theorem T.3.1, apply Itô's Lemma to the function _ 0: IXXI = R }, E(p(t, o /\ tR A t, X,' TRn,) 0. It follows that $P(\zeta < \infty) = 0$. (b) This follows by replacing upper bars by lower bars in the definition (T.3.22) and noting that for the resulting functions, $A\varphi_n(|y|) \ge \varphi_{n-1}(|y|)$. The rest of the proof follows the proof of the second half of part (a) of Theorem T.3.1. □
Theorem T.3.1 is due to W. Feller (1952), "The Parabolic Differential Equations and the Associated Semigroups of Transformations," Ann. of Math., 55, pp. 468–519. Theorem T.3.2 is due to R. Z. Has'minskii (1960), "Ergodic Properties of Recurrent Diffusion Processes and Stabilization of the Solution of the Cauchy Problem for Parabolic Equations," Theor. Probability Appl., 5, pp. 196–214.
3.
(Cameron–Martin–Girsanov Theorem) Let $\mu(\cdot)$, $\sigma(\cdot)$ be Lipschitzian, $\sigma(\cdot)$ nonsingular and $\sigma^{-1}(\cdot)$ Lipschitzian. Let $\{X_t^x\}$ be the solution of
$$X_t^x = x + \int_0^t \sigma(X_s^x)\,dB_s \quad (t \ge 0). \tag{T.3.23}$$
Define
$$\exp\left\{\int_0^t \sigma^{-1}(X_s^x)\mu(X_s^x)\cdot dB_s - \frac12\int_0^t |\sigma^{-1}(X_s^x)\mu(X_s^x)|^2\,ds\right\}$$
$(0 \le s$ 0, $T > 0$ be given. Define a continuously differentiable (in $t$) function $c(t) = (c_1(t), \ldots, c_k(t))$ that vanishes for $t > 2T$ and such that $c_i(t) = (d/dt)u_i(t)$, $1 \le i$ 0).
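(T.3.36)
In view of Theorem T.3.5, it is enough to show
$$P\left(\left|X_t^x - x - \int_0^t c(s)\,ds\right| < \varepsilon \text{ for all } t \in [0, T]\right) > 0. \tag{T.3.37}$$
In turn, (T.3.37) follows if we can find a Lipschitzian vector field $b(\cdot)$ such that the solution $\{Y_t\}$ of
$$Y_t = x + \int_0^t c(s)\,ds + \int_0^t b(Y_s)\,ds + \int_0^t \sigma(Y_s)\,dB_s \quad (t \ge 0) \tag{T.3.38}$$
satisfies
$$P\left(\left|Y_t - x - \int_0^t c(s)\,ds\right| < \varepsilon \text{ for all } t \in [0, T]\right) > 0. \tag{T.3.39}$$
But (T.3.39) will follow if one may find $b(\cdot)$ such that $E\tau_{\partial B(x:\varepsilon)} > T$, where $\tau_{\partial B(x:\varepsilon)} := \inf\{t \ge 0: |Y_t - x - \int_0^t c(s)\,ds| \ge \varepsilon\}$, since $\tau_{\partial B(x:\varepsilon)} \le T$ a.s. would force $E\tau_{\partial B(x:\varepsilon)} \le T$. The following lemma shows that such a $b(\cdot)$ exists. □
Lemma. Let $\sigma(\cdot)$ be nonsingular and Lipschitzian. Then, for every $\varepsilon > 0$,
$$\sup E\tau_{\partial B(x:\varepsilon)} = \infty, \tag{T.3.40}$$
where the supremum is over the class of all Lipschitzian vector fields $b(\cdot)$.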
Proof. Without loss of generality, take $x = 0$. (This amounts to a translation of the coefficients by $x$.) Write $Z_t = Y_t - \int_0^t c(s)\,ds$. For any given pair $b(\cdot)$, $\sigma(\cdot)$, with $((d_{ij}(\cdot))) = \sigma(\cdot)\sigma'(\cdot)$, define
$$d(t, z) := \sum_{i,j} d_{ij}\left(z + \int_0^t c(s)\,ds\right)\frac{z_i z_j}{|z|^2}, \qquad B(t, z) := \sum_i d_{ii}\left(z + \int_0^t c(s)\,ds\right),$$
$$C(t, z) := 2z\cdot b\left(z + \int_0^t c(s)\,ds\right),$$
$$\bar\beta(r) := \sup_{t \ge 0,\ |z| = r}\frac{B(t, z) + C(t, z)}{d(t, z)} - 1, \qquad \bar\alpha(r) := \sup_{t \ge 0,\ |z| = r} d(t, z).$$
Similarly $\underline\beta(r)$, $\underline\alpha(r)$ are defined by replacing "sup" by "inf" in the last two expressions. Finally, let
$$\bar I(r) := \int_a^r \frac{\bar\beta(u)}{u}\,du, \qquad \underline I(r) := \int_a^r \frac{\underline\beta(u)}{u}\,du,$$
where $a$ is an arbitrary positive constant. Note that $\{Z_t\}$ is a (nonhomogeneous) diffusion on $\mathbb{R}^k$ whose drift coefficient is $b(t, z) = b(z + \int_0^t c(s)\,ds)$, and whose diffusion matrix is $D(t, z) := \sigma(t, z)\sigma'(t, z)$ where $\sigma(t, z) = \sigma(z + \int_0^t c(s)\,ds)$. The infinitesimal generator of this diffusion is denoted $A_t$. Now check that
$$\liminf_{r \downarrow 0}\bar\beta(r) \ge k - 1. \tag{T.3.41}$$
Define
$$F(r) := \int_r^\varepsilon\left(\int_0^u \frac{1}{\bar\alpha(v)}\exp\left\{-\int_v^u \frac{\bar\beta(v')}{v'}\,dv'\right\}dv\right)du. \tag{T.3.42}$$
Using (T.3.41) it is simple to check that $F(r)$ is finite for $0 \le r \le \varepsilon$. Also, for $r > 0$,
$$F'(r) = -\int_0^r \frac{1}{\bar\alpha(v)}\exp\left\{-\int_v^r \frac{\bar\beta(v')}{v'}\,dv'\right\}dv < 0, \qquad F''(r) = -\frac{1}{\bar\alpha(r)} - \frac{\bar\beta(r)}{r}F'(r). \tag{T.3.43}$$
Define $\varphi(z) = F(|z|)$. By (3.24) and (T.3.43) it follows that
$$2A_t\varphi(z) \ge -1 \quad \text{for } 0 < |z| < \varepsilon.$$
Therefore, by Itô's Lemma applied to $\varphi(Z_t)$ (and using the fact that $\varphi = 0$ on $\partial B(0:\varepsilon)$),
$$0 - \varphi(0) \ge -\tfrac12\,E\tau_{\partial B(0:\varepsilon)},$$
so that
$$E\tau_{\partial B(0:\varepsilon)} \ge 2\varphi(0) = 2F(0). \tag{T.3.44}$$
Choose $b(z) = -Mz$ $(M > 0)$. Then it is simple to check that $F(0) \to \infty$ as $M \to \infty$. □
Corollary T.3.7. (Maximum Principle for Elliptic Equations). Suppose $\mu(\cdot)$ and $\sigma(\cdot)$ are locally Lipschitzian and $\sigma(\cdot)$ nonsingular. Let $A$ be as in (T.3.31). Let $G$ be a bounded, connected, open set, and $u(\cdot)$ a continuous function on $\bar G$, twice continuously differentiable in $G$, such that $Au(x) = 0$ for all $x \in G$. Then $u(\cdot)$ cannot attain its maximum or minimum in $G$, unless $u$ is constant on $\bar G$.
Proof. If possible, let $x_0 \in G$ be such that $u(x) \le u(x_0)$ for all $x \in G$. Let $\delta > 0$ be such that $B(x_0:2\delta) \subset G$. Let $\varphi$ be a bounded, twice continuously differentiable function on $\mathbb{R}^k$ with bounded first and second derivatives, satisfying $\varphi(x) = u(x)$ on $\bar B(x_0:\delta)$. Let $\{X_t\}$ be a diffusion with coefficients $\mu(\cdot)$, $\sigma(\cdot)$, starting at $x_0$. Apply Itô's Lemma to get $E\varphi(X_{\tau_{\partial B(x_0:\delta)}}) = \varphi(x_0)$, or, denoting by $\pi$ the distribution of $X_{\tau_{\partial B(x_0:\delta)}}$,
$$\int_{\partial B(x_0:\delta)} u(y)\,\pi(dy) = u(x_0). \tag{T.3.45}$$
But $u(y) \le u(x_0)$ for all $y \in \partial B(x_0:\delta)$. If $u(y_0) < u(x_0)$ for some $y_0 \in \partial B(x_0:\delta)$ then, by the continuity of $u$, there is a neighborhood of $y_0$ on $\partial B(x_0:\delta)$ in which $u(y) < u(x_0)$. But Theorem T.3.6 implies that $\pi$ assigns positive probability to this neighborhood, which would imply that the left side of (T.3.45) is smaller than its right side. Thus $u(y) = u(x_0)$ on $\partial B(x_0:\delta)$. By letting $\delta$ vary, it follows that $u$ is constant on every open ball $B$ contained in $G$ centered at $x_0$. Let $D = \{y \in G: u(y) = u(x_0)\}$. For each $y \in D$ there exists, by the same argument as above, an open ball centered at $y$ on which $u$ equals $u(x_0)$. Thus $D$ is open. But $D$ is also closed (in $G$), since $u$ is continuous. By connectedness of $G$, $D = G$. □
D. W. Stroock and S. R. S. Varadhan (1970), "On the Support of Diffusion Processes with Applications to the Strong Maximum Principle," Proc. Sixth Berkeley Symp. on Math. Statist. and Probability, Vol. III, pp. 333–360, contains much more than the results described above.
5. (Positive Recurrence and the Existence of a Unique Invariant Probability) Consider a positive recurrent diffusion $\{X_t\}$ on $\mathbb{R}^k$. Fix two positive numbers $r_1 < r_2$. Define
$$\eta_1 := \inf\{t \ge 0: |X_t| = r_1\}, \quad \eta_{2i} := \inf\{t \ge \eta_{2i-1}: |X_t| = r_2\}, \quad \eta_{2i+1} := \inf\{t \ge \eta_{2i}: |X_t| = r_1\} \quad (i \ge 1). \tag{T.3.46}$$
The random variables $\eta_i$ are a.s. finite. By the strong Markov property $\{X_{\eta_{2i-1}}: i \ge 1\}$ is a Markov chain on the state space $\partial B(0:r_1) = \{|y| = r_1\}$, having a (one-step) transition probability
$$\pi_1(y, B) := P\left(X_{\eta_{2i+1}} \in B \mid \mathcal{F}_{\eta_{2i-1}}\right) \text{ on } \{X_{\eta_{2i-1}} = y\} = \int_{\partial B(0:r_2)} H_1(z, B)\,H_2(y, dz) \quad (|y| = r_1,\ B \in \mathcal{B}(\partial B(0:r_1))), \tag{T.3.47}$$
where $\mathcal{F}_{\eta_{2i-1}}$ is the pre-$\eta_{2i-1}$ sigmafield, and $H_i(y, dz)$ is the distribution of $X^y_{\tau_{\partial B(0:r_i)}}$, the random point where $\{X_t^y\}$ hits $\partial B(0:r_i)$ at the time of its first passage to $\partial B(0:r_i)$. Now for all $z_1, z_2 \in \partial B(0:r_1)$, $H_2(z_1, dy)$, $H_2(z_2, dy)$ are absolutely continuous with respect to each other and there exists a positive constant $c$ (independent of $z_1, z_2$) such that
$$\frac{dH_2(z_1, dy)}{dH_2(z_2, dy)} \ge c \quad (|z_i| = r_1 \text{ for } i = 1, 2). \tag{T.3.48}$$
This is essentially Harnack's inequality (see D. Gilbarg and N. S. Trudinger (1977), Elliptic Partial Differential Equations of Second Order, Springer-Verlag, New York, p. 189). It follows from (T.3.47) and (T.3.48) that
$$\frac{d\pi_1(y, dz)}{d\pi_1(y_1, dz)} \ge c \quad (y, y_1 \in \partial B(0:r_1)). \tag{T.3.49}$$
This implies (see theoretical complements II.6) that there exists a unique invariant probability $\mu_1$ for the Markov chain $\{X_{\eta_{2i-1}}: i \ge 1\}$ and the $n$-step transition probability $\pi_1^{(n)}(y, B)$ converges exponentially fast to $\mu_1(B)$ as $n \to \infty$, uniformly for all $y \in \partial B(0:r_1)$ and all $B \in \mathcal{B}(\partial B(0:r_1))$. Now let $\{X_t\}$ be the above diffusion with initial distribution $\mu_1$, so that $\{X_{\eta_{2i-1}}: i \ge 1\}$ is a stationary process, as is the process
$$Z_i(f) := \int_{\eta_{2i-1}}^{\eta_{2i+1}} f(X_s)\,ds \quad (i \ge 1),$$
where $f$ is a real-valued bounded Borel-measurable function on $\mathbb{R}^k$.
Lemma. The tail sigmafield of $\{Z_i(f): i \ge 1\}$ is trivial.
Proof. Let $A \in \mathcal{T} := \bigcap_{n \ge 1}\sigma\{Z_i(f): i \ge n\}$, the tail sigmafield, and $B \in \sigma\{Z_i(f): 1 \le i \le n\} \subset \mathcal{F}_{\eta_{2n+1}}$ (the pre-$\eta_{2n+1}$ sigmafield of $\{X_t\}$). For every positive integer $m$, $A$ belongs to the sigmafield of the after-$\eta_{2(n+m)+1}$ process $X^+_{\eta_{2(n+m)+1}}$ and, therefore, may be expressed as $\{X^+_{\eta_{2(n+m)+1}} \in A_{n+m}\}$ a.s., for some Borel set $A_{n+m}$ of $C([0, \infty): \mathbb{R}^k)$. By the strong Markov property,
$$P\left(A \mid \mathcal{F}_{\eta_{2(n+m)+1}}\right) = \left(P_x(A_{n+m})\right)_{x = X_{\eta_{2(n+m)+1}}} = \varphi_{n+m}\left(X_{\eta_{2(n+m)+1}}\right),$$
say. Hence,
$$P\left(A \mid \mathcal{F}_{\eta_{2n+1}}\right) = E\left(\varphi_{n+m}(X_{\eta_{2(n+m)+1}}) \mid \mathcal{F}_{\eta_{2n+1}}\right) = \left(\int \varphi_{n+m}(z)\,\pi_1^{(m)}(x, dz)\right)_{x = X_{\eta_{2n+1}}}.$$
But $\pi_1^{(m)}(x, B) \to \mu_1(B)$ as $m \to \infty$, uniformly for all $x$ and $B$. Therefore
$$P\left(A \mid \mathcal{F}_{\eta_{2n+1}}\right) - \int \varphi_{n+m}(z)\,\mu_1(dz) \to 0 \quad \text{a.s. as } m \to \infty. \tag{T.3.50}$$
But $\int \varphi_{n+m}(z)\,\mu_1(dz) = E\varphi_{n+m}(X_{\eta_{2(n+m)+1}}) = P(A)$, by stationarity. Therefore, (T.3.50) implies, for every $B \in \sigma\{Z_i(f): 1 \le i \le n\}$,
$$P(B \cap A) - P(B)P(A) \to 0 \quad \text{as } m \to \infty. \tag{T.3.51}$$
But the left side of (T.3.51) does not depend on $m$. Therefore, $P(B \cap A) = P(B)P(A)$. Hence $\sigma\{Z_i(f): 1 \le i \le n\}$ and the tail sigmafield $\mathcal{T}$ are independent for every $n \ge 1$. This implies that $\mathcal{T}$ is independent of $\sigma\{Z_i(f): i \ge 1\}$. In particular, $\mathcal{T}$ is independent of itself, so that $P(A) = P(A \cap A) = P(A)P(A)$. Thus, $P(A) = 0$ or $1$. □
Using the above lemma and the Ergodic Theorem (theoretical complements II.9), instead of the classical strong law of large numbers, we may show, as in Section 9 of Chapter II, or Section 12 of Chapter V, that
$$\lim_{t\to\infty}\frac1t\int_0^t f(X_s)\,ds = \frac{E\int_{\eta_1}^{\eta_3} f(X_s)\,ds}{E(\eta_3 - \eta_1)} = \int f(x)\,m(dx) \quad \text{a.s.}, \tag{T.3.52}$$
say, where $m$ is the probability measure on $(\mathbb{R}^k, \mathcal{B}^k)$ with $m(B)$ equal to the average amount of time the process $\{X_t\}$ spends in the set $B$ during a cycle $[\eta_{2n+1}, \eta_{2n+3}]$, relative to the mean cycle length. It follows from (T.3.52) that $m$ is the unique invariant probability for the diffusion. The criterion (3.25) (Proposition 3.3) for positive recurrence is due to R. Z. Has'minskii (1960), "Ergodic Properties of Recurrent Diffusion Processes and Stabilization of the Solution to the Cauchy Problem for Parabolic Equations," Theor. Probability Appl., 5, pp. 196–214. A proof and some extensions may be found in R. N. Bhattacharya (1978), "Criteria for Recurrence and Existence of Invariant Measures for Multidimensional Diffusions," Ann. Probab., 6, pp. 541–553; Correction Note (1980), Ann. Probab., 8. The proof in the last article does not require Harnack's inequality.
Theoretical Complements to Section VII.4
1. Theorem 4.1 and Proposition 4.2 are special cases of more general results contained in G. Basak (1989), "Stability and Functional Central Limit Theorems for Degenerate Diffusions," Ph.D. Dissertation, Indiana University. Two comprehensive accounts of the theory of stochastic differential equations are N. Ikeda and S. Watanabe (1981), Stochastic Differential Equations and Diffusion Processes, North-Holland, Amsterdam, and I. Karatzas and S. E. Shreve (1988), Brownian Motion and Stochastic Calculus, Springer-Verlag, New York.
CHAPTER 0
A Probability and Measure Theory Overview
1 PROBABILITY SPACES
Underlying the mathematical description of random variables and events is the notion of a probability space $(\Omega, \mathcal{F}, P)$. The sample space $\Omega$ is a nonempty set that represents the collection of all possible outcomes of an experiment. The elements of $\Omega$ are called sample points. The sigmafield $\mathcal{F}$ is a collection of subsets of $\Omega$ that includes the empty set $\varnothing$ (the "impossible event") as well as the set $\Omega$ (the "sure event") and is closed under the set operations of complements and finite or denumerable unions and intersections. The elements of $\mathcal{F}$ are called measurable events, or simply events. The probability measure $P$ is an assignment of probabilities to events (sets) in $\mathcal{F}$ that is subject to the conditions that (i) $0 \le P(F) \le 1$, for each $F \in \mathcal{F}$, (ii) $P(\varnothing) = 0$, $P(\Omega) = 1$, and (iii) $P(\bigcup_i F_i) = \sum_i P(F_i)$ for any finite or denumerable sequence of mutually exclusive (pairwise disjoint) events $F_i$, $i = 1, 2, \ldots$, belonging to $\mathcal{F}$. The closure properties of $\mathcal{F}$ ensure that the usual applications of set operations in representing events do not lead to nonmeasurable events for which no (consistent) assignment of probability is possible. The required countable additivity property (iii) gives probabilities a sufficiently rich structure for doing calculations and approximations involving limits. Two immediate consequences of (iii) are the following so-called continuity properties: if $A_1 \subset A_2 \subset \cdots$ is a nondecreasing sequence of events in $\mathcal{F}$ then, thinking of $\bigcup_{n=1}^\infty A_n$ as the "limiting event" for such sequences,
$$P\left(\bigcup_{n=1}^\infty A_n\right) = \lim_{n\to\infty} P(A_n). \tag{1.1}$$
To prove this, disjointify $\{A_n\}$ by $B_n = A_n - A_{n-1}$, $n \ge 1$, $A_0 = \varnothing$, and apply (iii) to $\bigcup_{n=1}^\infty B_n = \bigcup_{n=1}^\infty A_n$. By considering complements, one gets for decreasing measurable
625
626
A PROBABILITY AND MEASURE THEORY OVERVIEW
events A,
A 2 • • • that Pj (l A \n—,
nt =
tim P(A,).
(1.2)
n
While (1.1) holds for all countably additive set functions u (in place of P) on F, finite or not, (1.2) does not in general hold if p(A n is not finite for at least some n
$$\ldots \ge m(EX)\,E(X - EX) + E\varphi(EX) = \varphi(EX). \tag{2.6}$$
This inequality is strict if $\varphi$ is strictly convex and $X$ is not degenerate. More generally, convexity means that for any $x \ne y \in I$, $\varphi(tx + (1 - t)y) \le t\varphi(x) + (1 - t)\varphi(y)$, $0 < t < 1$; likewise strict convexity means strict inequality here. One may show that (2.5) still holds if $m(x)$ is taken to be the right-hand derivative of $\varphi$ at $x$. The general inequality (2.6) may then be stated as follows.
Proposition 2.2. (Jensen's Inequality). If $\varphi$ is a convex function on the interval $I$ and if $X$ is a random variable taking its values in $I$, then

$$\varphi(EX) \le E\varphi(X) \tag{2.7}$$

provided that the indicated expected values exist. Moreover, if $\varphi$ is strictly convex, then equality holds if and only if the distribution of $X$ is concentrated at a point (degenerate). From Jensen's inequality we see that if $X$ is a random variable with finite $p$th absolute moment then for $0 < r < p$, writing $|X|^p = (|X|^r)^{p/r}$, we get $E|X|^p \ge (E|X|^r)^{p/r}$. That is, taking the $p$th root, $(E|X|^p)^{1/p} \ge (E|X|^r)^{1/r}$.
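The monotonicity of $r \mapsto (E|X|^r)^{1/r}$ just derived, and Jensen's inequality (2.7) itself, can be checked on a small example. The sketch below uses a hypothetical four-point distribution and the convex function $\varphi = \exp$. The values and probabilities are assumptions chosen only for illustration.

```python
import math

# Hypothetical finite distribution of X (illustrative assumption).
values = [-2.0, 0.5, 1.0, 3.0]
probs = [0.1, 0.4, 0.3, 0.2]

def expect(g):
    """E g(X) for the finite distribution above."""
    return sum(p * g(x) for x, p in zip(values, probs))

def lr_norm(r):
    """(E|X|^r)^(1/r) for r > 0."""
    return expect(lambda x: abs(x) ** r) ** (1.0 / r)

# Jensen (2.7) with phi = exp: E exp(X) - exp(EX) >= 0, strictly here since X is not degenerate.
jensen_gap = expect(math.exp) - math.exp(expect(lambda x: x))

# (E|X|^r)^(1/r) should be nondecreasing in r.
rs = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0]
norms = [lr_norm(r) for r in rs]
```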
$\ldots > 1$, $q > 1$ and let $f$ and $g$ be functions such that $|f|^p$ and $|g|^q$ are both $\mu$-integrable. If $p$ and $q$ are conjugate in the sense $1/p + 1/q = 1$, then using convexity of the function $e^x$ we obtain

$$|f|\cdot|g| = \exp\Big\{\tfrac{1}{p}\log|f|^p + \tfrac{1}{q}\log|g|^q\Big\} \ldots$$
$$\ldots\quad E|X|^p \ge E\big(\mathbf{1}_{\{|X| \ge \varepsilon\}}|X|^p\big) \ge \varepsilon^p P(|X| \ge \varepsilon) \qquad (\varepsilon > 0,\ p > 0). \tag{2.16}$$
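Inequality (2.16) is easy to test exactly for a distribution with finitely many atoms. The sketch below uses an assumed five-point law and checks $E|X|^p \ge \varepsilon^p P(|X| \ge \varepsilon)$ over a grid of $\varepsilon$ and $p$. Nothing here comes from the text beyond (2.16) itself.

```python
# Hypothetical finite distribution of X (illustrative assumption).
values = [-3.0, -1.0, 0.0, 2.0, 5.0]
probs = [0.05, 0.25, 0.40, 0.20, 0.10]

def abs_moment(p):
    """E|X|^p."""
    return sum(q * abs(x) ** p for x, q in zip(values, probs))

def tail_prob(eps):
    """P(|X| >= eps)."""
    return sum(q for x, q in zip(values, probs) if abs(x) >= eps)

def bound_2_16_holds(eps, p):
    # (2.16): E|X|^p >= eps^p P(|X| >= eps); a tiny tolerance absorbs rounding.
    return abs_moment(p) >= eps ** p * tail_prob(eps) - 1e-12

checks = [bound_2_16_holds(e, p) for e in (0.5, 1.0, 2.0, 4.0, 6.0) for p in (0.5, 1.0, 2.0, 3.0)]
```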
3 LIMITS AND INTEGRATION

A sequence of measurable functions $\{f_n: n \ge 1\}$ on a measure space $(\Omega, \mathcal{F}, \mu)$ is said to converge in measure (or in probability in case $\mu$ is a probability) to $f$ if, for every $\varepsilon > 0$,

$$\mu(\{|f_n - f| > \varepsilon\}) \to 0 \quad \text{as } n \to \infty. \tag{3.1}$$
A sequence $\{f_n\}$ is said to converge almost everywhere (abbreviated a.e.) to $f$ if

$$\mu(\{f_n \not\to f\}) = 0. \tag{3.2}$$
If $\mu$ is a probability measure and (3.2) holds, one says that $\{f_n\}$ converges almost surely (a.s.) to $f$. A sequence $\{f_n\}$ is said to be Cauchy in measure if, for every $\varepsilon > 0$,

$$\mu(\{|f_n - f_m| > \varepsilon\}) \to 0 \quad \text{as } n, m \to \infty. \tag{3.3}$$
Given such a sequence, one may find an increasing sequence of positive integers $\{n_k: k \ge 1\}$ such that

$$\mu\big(\{|f_{n_k} - f_{n_{k+1}}| > (\tfrac12)^k\}\big) < (\tfrac12)^k \qquad (k = 1, 2, \ldots). \tag{3.4}$$
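The sets $B_m := \bigcup_{k=m}^{\infty}\{|f_{n_k} - f_{n_{k+1}}| > 2^{-k}\}$ form a decreasing sequence converging to $B := \{|f_{n_k} - f_{n_{k+1}}| > 2^{-k}$ for infinitely many integers $k\}$. Now $\mu(B_m) \le 2^{-m+1}$. Therefore, $\mu(B) = 0$. Since $B^c = \{|f_{n_k} - f_{n_{k+1}}| \le 2^{-k}$ for all sufficiently large $k\}$, on $B^c$ the sequence $\{f_{n_k}: k \ge 1\}$ converges to a function $f$. Therefore, $\{f_{n_k}\}$ converges to $f$ a.e. Further, for any $\varepsilon > 0$ and all $m$,

$$\mu(\{|f_n - f| > \varepsilon\}) \le \mu(\{|f_n - f_{n_m}| > \tfrac{\varepsilon}{2}\}) + \mu(\{|f_{n_m} - f| > \tfrac{\varepsilon}{2}\}). \tag{3.5}$$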
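The first term on the right goes to zero as $n$ and $n_m$ go to infinity. Also, since $|f_{n_m} - f| \le \sum_{k \ge m} 2^{-k} = 2^{-m+1}$ on $B_m^c$,

$$\{|f_{n_m} - f| > \tfrac{\varepsilon}{2}\} \subset B_m \quad \text{if } 2^{-m+1} < \tfrac{\varepsilon}{2}.$$

As $\mu(B_m) \to 0$ as $m \to \infty$, so does the second term on the right in (3.5). Therefore, $\{f_n: n \ge 1\}$ converges to $f$ in measure. Conversely, if $\{f_n\}$ converges to $f$ in measure then, for every $\varepsilon > 0$,

$$\mu(\{|f_n - f_m| > \varepsilon\}) \le \mu(\{|f_n - f| > \tfrac{\varepsilon}{2}\}) + \mu(\{|f_m - f| > \tfrac{\varepsilon}{2}\}) \to 0$$

as $n, m \to \infty$. That is, $\{f_n\}$ is Cauchy in measure. We have proved parts (a), (b) of the following result.

Proposition 3.1. Let $(\Omega, \mathcal{F}, \mu)$ be a measure space on which is defined a sequence $\{f_n\}$ of measurable functions.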
(a) If $\{f_n\}$ converges in measure to $f$ then $\{f_n\}$ is Cauchy in measure and there exists a subsequence $\{f_{n_k}: k \ge 1\}$ which converges a.e. to $f$.
(b) If $\{f_n\}$ is Cauchy in measure then there exists $f$ such that $\{f_n\}$ converges to $f$ in measure.
(c) If $\mu$ is finite and $\{f_n\}$ converges to $f$ a.e., then $\{f_n\}$ converges to $f$ in measure.

Part (c) follows from the relations

$$\mu(\{|f_n - f| > \varepsilon\}) \le \mu\Big(\bigcup_{m=n}^{\infty}\{|f_m - f| > \varepsilon\}\Big) \to 0 \quad \text{as } n \to \infty. \tag{3.6}$$
Note that the sets $A_n$, say, within parentheses on the right side are decreasing to $A := \{|f_m - f| > \varepsilon$ for infinitely many $m\}$, and $\mu(A) = 0$ because $f_n \to f$ a.e. As remarked in Section 1, following (1.2), the convergence of $\mu(A_n)$ to $\mu(A)$ holds because $\mu(A_n)$ is finite for all $n$.

The first important result on interchanging the order of limit and integration is the following.

Theorem 3.2. (The Monotone Convergence Theorem). If $\{f_n\}$ is an increasing sequence of nonnegative measurable functions on a measure space $(\Omega, \mathcal{F}, \mu)$, then

$$\lim_{n\to\infty}\int f_n\,d\mu = \int f\,d\mu, \tag{3.7}$$

where $f = \lim_n f_n$.

Proof. If $\{f_n\}$ is a sequence of simple functions then (3.7) is simply the definition of $\int f\,d\mu$. In the general case, for each $n$ let $\{f_{n,k}: k \ge 1\}$ be an increasing sequence of nonnegative simple functions converging a.e. to $f_n$ (as $k \to \infty$). Then $g_k := \max\{f_{n,k}: 1 \le n \le k\}$ $(k \ge 1)$ is an increasing sequence of simple functions and, for $k \ge n$,

$$f_{n,k} \le g_k \le f_k, \tag{3.8}$$

as $f_{n,k} \le f_n \le f_k$ for $1 \le n \le k$. By the Domination Inequality

$$\int f_{n,k}\,d\mu \le \int g_k\,d\mu \le \ldots$$

$\ldots$ is a Cauchy sequence (i.e., $\|f_n - f_m\|_p \to 0$ as $n, m \to \infty$), then there exists $f$ in $L^p$ such that $\|f_n - f\|_p \to 0$. For this last important fact, note first that $\|f_n - f_m\|_p \to 0$ easily implies that $\{f_n\}$ is Cauchy in measure, since by Chebyshev's inequality $\mu(\{|f_n - f_m| > \varepsilon\}) \le \varepsilon^{-p}\|f_n - f_m\|_p^p$. Therefore, by Proposition 3.1, $\{f_n\}$ converges to some $f$ in measure. As a consequence, $\{|f_n|^p\}$ converges to $|f|^p$ in measure. It then follows by Fatou's Lemma applied to $\{|f_n|^p\}$ that $\int |f|^p\,d\mu < \infty$, i.e., $f \in L^p$. Applying Fatou's Lemma to the sequence $\{|f_n - f_m|^p: m \ge 1\}$ one gets

$$\liminf_{m\to\infty}\int |f_n - f_m|^p\,d\mu \ge \int |f_n - f|^p\,d\mu \ldots$$

$\ldots\, p$, then $\{|X_n|^p\}$ is uniformly integrable. There is one important case where convergence a.e. implies $L^1$-convergence. This is as follows.

Theorem 3.7. (Scheffé's Theorem). Let $(\Omega, \mathcal{F}, \mu)$ be a measure space. Let $f_n$ $(n \ge 1)$, $f$ be p.d.f.'s with respect to $\mu$, i.e., $f_n$, $f$ are nonnegative, and $\int f_n\,d\mu = 1$ for all $n$, $\int f\,d\mu = 1$. If $f_n$ converges a.e. to $f$, then $f_n \to f$ in $L^1$.
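Here is a standard instance of the theorem just stated:

* $\mu$ is counting measure on $\{0, 1, 2, \ldots\}$;
* $f_n$ is the Binomial$(n, \lambda/n)$ probability mass function;
* $f$ is the Poisson$(\lambda)$ mass function.

Since $f_n(k) \to f(k)$ for every $k$, Scheffé's theorem gives $\sum_k |f_n(k) - f(k)| \to 0$. The sketch below computes these $L^1$ distances. The choice $\lambda = 2$ is arbitrary. The code works with logarithms so that the factorials do not overflow.

```python
import math

def log_binom_pmf(n, p, k):
    # log of C(n, k) p^k (1 - p)^(n - k), via lgamma to avoid huge integers
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def log_poisson_pmf(lam, k):
    return -lam + k * math.log(lam) - math.lgamma(k + 1)

def l1_distance(n, lam):
    """sum over k >= 0 of |f_n(k) - f(k)|, f_n = Binomial(n, lam/n), f = Poisson(lam)."""
    p = lam / n
    head = sum(abs(math.exp(log_binom_pmf(n, p, k)) - math.exp(log_poisson_pmf(lam, k)))
               for k in range(n + 1))
    # For k > n the binomial mass is zero, so only the Poisson tail remains.
    poisson_tail = max(0.0, 1.0 - sum(math.exp(log_poisson_pmf(lam, k)) for k in range(n + 1)))
    return head + poisson_tail

ns = [5, 20, 100, 1000]
dists = [l1_distance(n, 2.0) for n in ns]
```

Le Cam's inequality bounds this distance by $2\lambda^2/n$, which is consistent with the decay seen in `dists`.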
Proof. Recall that for every real-valued function $g$ on $\Omega$, one has $g = g^+ - g^-$, $|g| = g^+ + g^-$, where $g^+ = \max\{g, 0\}$, $g^- = -\min\{g, 0\}$. One has

$$0 = \int (f - f_n)\,d\mu = \int (f - f_n)^+\,d\mu - \int (f - f_n)^-\,d\mu,$$

so that

$$\int (f - f_n)^+\,d\mu = \int (f - f_n)^-\,d\mu \ldots$$

$$\ldots \le \|f\|_2\,\|g\|_2. \tag{3.23}$$
4 PRODUCT MEASURES AND INDEPENDENCE, RADON–NIKODYM THEOREM AND CONDITIONAL PROBABILITY

If $(S_1, \mathcal{S}_1, \mu_1)$, $(S_2, \mathcal{S}_2, \mu_2)$ are two measure spaces, then the product space $(S, \mathcal{S}, \mu)$ is a measure space where (i) $S$ is the Cartesian product $S_1 \times S_2$; (ii) $\mathcal{S} = \mathcal{S}_1 \otimes \mathcal{S}_2$ is the smallest sigmafield containing the class $\mathcal{R}$ of all measurable rectangles $\{B_1 \times B_2: B_1 \in \mathcal{S}_1, B_2 \in \mathcal{S}_2\}$; and (iii) $\mu$ is the product measure $\mu_1 \times \mu_2$ on $\mathcal{S}$ determined by the requirement

$$\mu(B_1 \times B_2) = \mu_1(B_1)\mu_2(B_2) \qquad (B_1 \in \mathcal{S}_1,\ B_2 \in \mathcal{S}_2). \tag{4.1}$$
As the intersection of two measurable rectangles is a measurable rectangle, $\mathcal{R}$ is closed
under finite intersections. Then the class $\mathcal{C}$ of all finite disjoint unions of sets in $\mathcal{R}$ is a field. By finite additivity and (4.1), $\mu$ extends to $\mathcal{C}$ as a countably additive set function. Finally, Carathéodory's Extension Theorem extends $\mu$ uniquely to a measure on $\mathcal{S} = \sigma(\mathcal{C})$, the smallest sigmafield containing $\mathcal{C}$.

For each $B \in \mathcal{S}$, every $x$-section $B_{(x)} := \{y: (x, y) \in B\}$ is in $\mathcal{S}_2$. This is clearly true for measurable rectangles $B = B_1 \times B_2$. The class $\mathcal{L}$ of all sets in $\mathcal{S}$ for which the assertion is true is a $\lambda$-class (or, a lambda class). That is, (i) $S \in \mathcal{L}$, (ii) $A, B \in \mathcal{L}$ and $A \subset B$ imply $B \setminus A \in \mathcal{L}$, and (iii) $A_n \in \mathcal{L}$ $(n \ge 1)$, $A_n \uparrow A$ imply $A \in \mathcal{L}$. We state without proof the following useful result from which the measurable-sections property asserted above for all $B \in \mathcal{S}$ follows.

Theorem 4.1. (Dynkin's Pi-Lambda Theorem).¹ Suppose a class $\mathcal{P}$ is closed under finite intersections, a class $\mathcal{L}$ is a lambda class, and $\mathcal{P} \subset \mathcal{L}$. Then $\sigma(\mathcal{P}) \subset \mathcal{L}$, where $\sigma(\mathcal{P})$ is the smallest sigmafield containing $\mathcal{P}$.

In view of the measurable-section property, for each $B \in \mathcal{S}$ one may define the functions $x \mapsto \mu_2(B_{(x)})$ and $y \mapsto \mu_1(B^{(y)})$, where $B^{(y)} := \{x: (x, y) \in B\}$. These are measurable functions on $(S_1, \mathcal{S}_1, \mu_1)$ and $(S_2, \mathcal{S}_2, \mu_2)$, respectively, and one has

$$\mu(B) = \int_{S_1}\mu_2(B_{(x)})\,\mu_1(dx) = \int_{S_2}\mu_1(B^{(y)})\,\mu_2(dy) \qquad (B \in \mathcal{S}). \tag{4.2}$$

This last assertion holds for $B = B_1 \times B_2 \in \mathcal{R}$, by (4.1) and the relations $\mu_2(B_{(x)}) = \mu_2(B_2)\mathbf{1}_{B_1}(x)$, $\mu_1(B^{(y)}) = \mu_1(B_1)\mathbf{1}_{B_2}(y)$. The proof is now completed for all $B \in \mathcal{S}$ by the Pi-Lambda Theorem. It follows that if $f(x, y)$ is a simple function on $(S, \mathcal{S}, \mu)$, then for each $x \in S_1$ the function $y \mapsto f(x, y)$ on $(S_2, \mathcal{S}_2, \mu_2)$ is measurable, and for each $y \in S_2$ the function $x \mapsto f(x, y)$ on $(S_1, \mathcal{S}_1, \mu_1)$ is measurable. Further, for all such $f$,

$$\int_{S_1}\Big(\int_{S_2} f(x, y)\,\mu_2(dy)\Big)\mu_1(dx) = \int_S f\,d\mu = \int_{S_2}\Big(\int_{S_1} f(x, y)\,\mu_1(dx)\Big)\mu_2(dy). \tag{4.3}$$
By the usual approximation of measurable functions by simple functions one arrives at the following theorem.

Theorem 4.2. (Fubini's Theorem). (a) If $f$ is integrable on the product space $(S_1 \times S_2, \mathcal{S}_1 \otimes \mathcal{S}_2, \mu_1 \times \mu_2) = (S, \mathcal{S}, \mu)$, then
(i) $x \mapsto \int_{S_2} f(x, y)\,\mu_2(dy)$ is measurable and integrable on $(S_1, \mathcal{S}_1, \mu_1)$;
(ii) $y \mapsto \int_{S_1} f(x, y)\,\mu_1(dx)$ is measurable and integrable on $(S_2, \mathcal{S}_2, \mu_2)$;
(iii) one has the equalities (4.3).
(b) If $f$ is nonnegative measurable on $(S, \mathcal{S}, \mu)$ then (4.3) holds, whether the integrals are finite or not.

The concept of a product space extends to an arbitrary but finite number of components $(S_i, \mathcal{S}_i, \mu_i)$ $(1 \le i \le k)$. In this case $S = S_1 \times S_2 \times \cdots \times S_k$, and $\mathcal{S} = \mathcal{S}_1 \otimes \mathcal{S}_2 \otimes \cdots \otimes \mathcal{S}_k$ is the smallest sigmafield containing the class $\mathcal{R}$ of all measurable rectangles $B = B_1 \times B_2 \times \cdots \times B_k$ $(B_i \in \mathcal{S}_i,\ 1 \le i \le k)$. The sigmafield $\mathcal{S}$ is called the product sigmafield, while $\mu = \mu_1 \times \cdots \times \mu_k$ is the product measure determined by

$$\mu(B_1 \times B_2 \times \cdots \times B_k) = \mu_1(B_1)\mu_2(B_2)\cdots\mu_k(B_k) \tag{4.4}$$

for elements in $\mathcal{R}$.

¹ P. Billingsley (1986), Probability and Measure, 2nd ed., Wiley, New York, p. 36.

Fubini's theorem extends in a straightforward manner, integrating
$f$ first with respect to one coordinate keeping the $k - 1$ others fixed, then integrating the resulting function of the other variables with respect to a second coordinate keeping the remaining $k - 2$ fixed, and so on until the function of a single variable is integrated with respect to the last remaining coordinate. The order in which the variables are integrated is immaterial; the final result equals $\int_S f\,d\mu$.

Product probabilities arise as joint distributions of independent random variables. Let $(\Omega, \mathcal{F}, P)$ be a probability space on which are defined measurable functions $X_i$ $(1 \le i \le k)$, with $X_i$ taking values in $S_i$, which is endowed with a sigmafield $\mathcal{S}_i$ $(1 \le i \le k)$. The measurable functions (or random variables) $X_1, X_2, \ldots, X_k$ are said to be independent if

$$P(X_1 \in B_1, X_2 \in B_2, \ldots, X_k \in B_k) = P(X_1 \in B_1)\cdots P(X_k \in B_k) \quad \text{for all } B_1 \in \mathcal{S}_1, \ldots, B_k \in \mathcal{S}_k. \tag{4.5}$$

In other words the (joint) distribution of $(X_1, \ldots, X_k)$ is a product measure. If $f_i$ is an integrable function on $(S_i, \mathcal{S}_i, \mu_i)$ $(1 \le i \le k)$, then the function

$$f: (x_1, x_2, \ldots, x_k) \mapsto f_1(x_1)f_2(x_2)\cdots f_k(x_k)$$

is integrable on $(S, \mathcal{S}, \mu)$, and one has

$$\int_S f\,d\mu = \prod_{i=1}^{k}\int_{S_i} f_i\,d\mu_i \tag{4.6}$$

or, when $\mu_i$ is the distribution of $X_i$ and $X_1, \ldots, X_k$ are independent,

$$E\Big(\prod_{i=1}^{k} f_i(X_i)\Big) = \prod_{i=1}^{k} E f_i(X_i). \tag{4.7}$$
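For measures with finitely many atoms the integrals in (4.3) are finite sums. Fubini's theorem then reduces to exchanging the order of summation, and (4.7) to factoring a product of sums. The sketch below uses two hypothetical point-mass measures and exact rational arithmetic. The test functions are arbitrary choices.

```python
from fractions import Fraction as Fr

# Two hypothetical finite measures given by their point masses (illustrative assumption).
mu1 = {0: Fr(1, 2), 1: Fr(1, 3), 2: Fr(1, 6)}
mu2 = {-1: Fr(1, 4), 1: Fr(3, 4)}

# Product measure (4.1) on singletons: mu({(x, y)}) = mu1({x}) mu2({y}).
mu = {(x, y): p * q for x, p in mu1.items() for y, q in mu2.items()}

def f(x, y):
    return Fr((x + y) ** 2 + x)  # a test function that is not a product

# The three members of (4.3).
integral_S = sum(f(x, y) * m for (x, y), m in mu.items())
y_inside = sum(sum(f(x, y) * q for y, q in mu2.items()) * p for x, p in mu1.items())
x_inside = sum(sum(f(x, y) * p for x, p in mu1.items()) * q for y, q in mu2.items())

# (4.6)/(4.7): for a product f1(x) f2(y) the integral factors.
def f1(x):
    return Fr(x * x + 1)

def f2(y):
    return Fr(y + 2)

joint = sum(f1(x) * f2(y) * m for (x, y), m in mu.items())
factored = sum(f1(x) * p for x, p in mu1.items()) * sum(f2(y) * q for y, q in mu2.items())
```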
A sequence $\{X_n\}$ of random variables is said to be independent if every finite subcollection is. Two sigmafields $\mathcal{F}_1$, $\mathcal{F}_2$ are independent if $P(F_1 \cap F_2) = P(F_1)P(F_2)$ for all $F_1 \in \mathcal{F}_1$, $F_2 \in \mathcal{F}_2$. Two families of random variables $\{X_\lambda: \lambda \in \Lambda_1\}$ and $\{Y_\lambda: \lambda \in \Lambda_2\}$ are independent of each other if $\sigma\{X_\lambda: \lambda \in \Lambda_1\}$ and $\sigma\{Y_\lambda: \lambda \in \Lambda_2\}$ are independent. Here $\sigma\{X_\lambda: \lambda \in \Lambda_1\}$ is the smallest sigmafield with respect to which all the $X_\lambda$ are measurable. Events $A_1, A_2, \ldots, A_k$ are independent if the corresponding indicator functions $\mathbf{1}_{A_1}, \mathbf{1}_{A_2}, \ldots, \mathbf{1}_{A_k}$ are independent. This is equivalent to requiring $P(B_1 \cap B_2 \cap \cdots \cap B_k) = P(B_1)P(B_2)\cdots P(B_k)$ for all choices $B_1, \ldots, B_k$ with $B_i = A_i$ or $A_i^c$.

Before turning to Kolmogorov's definition of conditional probabilities, it is necessary to state an important result from measure theory. To motivate it, let $(\Omega, \mathcal{F}, \mu)$ be a measure space and let $f$ be an integrable function on it. Then the set function $\nu$ defined by

$$\nu(F) := \int_F f\,d\mu \qquad (F \in \mathcal{F}) \tag{4.8}$$
is a countably additive set function, or a signed measure, on $\mathcal{F}$ with the property

$$\nu(F) = 0 \quad \text{if } \mu(F) = 0. \tag{4.9}$$

A signed measure $\nu$ on $\mathcal{F}$ is said to be absolutely continuous with respect to $\mu$, denoted $\nu \ll \mu$, if (4.9) holds. The theorem below says that, conversely, $\nu \ll \mu$ implies the existence of an $f$ such that (4.8) holds, if $\mu$, $\nu$ are sigmafinite. A countably additive set function $\nu$ is sigmafinite if there exist $B_n$ $(n \ge 1)$ such that (i) $\bigcup_{n \ge 1} B_n = \Omega$, (ii) $|\nu(B_n)| < \infty$ for all $n$. All measures in this book are assumed to be sigmafinite.

Theorem 4.3. (Radon–Nikodym Theorem).² Let $(\Omega, \mathcal{F}, \mu)$ be a measure space. If a finite signed measure $\nu$ on $\mathcal{F}$ is absolutely continuous with respect to $\mu$ then there exists an a.e. unique integrable function $f$, called the Radon–Nikodym derivative of $\nu$ with respect to $\mu$, such that $\nu$ has the representation (4.8).

Next, the conditional probability $P(A \mid B)$ of an event $A$ given another event $B$ is defined in classical probability as

$$P(A \mid B) := \frac{P(A \cap B)}{P(B)} \tag{4.10}$$

provided $P(B) > 0$. To introduce Kolmogorov's extension of this classical notion, let $(\Omega, \mathcal{F}, P)$ be a probability space and $\{B_n\}$ a countable partition of $\Omega$ by sets $B_n$ in $\mathcal{F}$. Let $\mathcal{G}$ denote the sigmafield generated by this partition, $\mathcal{G} = \sigma\{B_n\}$. That is, $\mathcal{G}$ comprises all countable disjoint unions of sets in $\{B_n\}$. Given an event $A \in \mathcal{F}$, one defines $P(A \mid \mathcal{G})$, the conditional probability of $A$ given $\mathcal{G}$, by the $\mathcal{G}$-measurable random variable

$$P(A \mid \mathcal{G})(\omega) := \frac{P(A \cap B_n)}{P(B_n)} \quad \text{for } \omega \in B_n, \tag{4.11}$$

if $P(B_n) > 0$, and an arbitrary constant $c_n$, say, if $P(B_n) = 0$. Check that

$$\int_D P(A \mid \mathcal{G})\,dP = P(A \cap D) \quad \text{for all } D \in \mathcal{G}. \tag{4.12}$$
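If $X$ is a random variable with $E|X| < \infty$, then one defines $E(X \mid \mathcal{G})$, the conditional expectation of $X$ given $\mathcal{G}$, as the $\mathcal{G}$-measurable random variable

$$E(X \mid \mathcal{G})(\omega) := \begin{cases} \dfrac{1}{P(B_n)}\displaystyle\int_{B_n} X\,dP & \text{for } \omega \in B_n, \text{ if } P(B_n) > 0,\\[1ex] c_n & \text{for } \omega \in B_n, \text{ if } P(B_n) = 0, \end{cases} \tag{4.13}$$

where $c_n$ are arbitrarily chosen constants (e.g., $c_n = 0$ for all $n$). From (4.13) one easily verifies the equality

$$\int_D E(X \mid \mathcal{G})\,dP = \int_D X\,dP \quad \text{for all } D \in \mathcal{G}. \tag{4.14}$$

Note that $P(A \mid \mathcal{G}) = E(\mathbf{1}_A \mid \mathcal{G})$, so that (4.12) is a special case of (4.14).

² P. Billingsley (1986), loc. cit., p. 443.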
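On a finite sample space the partition formula (4.13) is just an average over each block. The identity (4.14) can then be verified for every set in $\mathcal{G} = \sigma\{B_n\}$. The sketch below makes the following assumptions, chosen only for illustration:

* $\Omega$ has six points, with $P(\{\omega\}) = (\omega + 1)/21$;
* the partition has three blocks;
* $X(\omega) = \omega^2$.

```python
from fractions import Fraction as Fr
from itertools import combinations

# Hypothetical finite probability space (illustrative assumption).
P = {w: Fr(w + 1, 21) for w in range(6)}              # P({w}) = (w + 1)/21, sums to 1
blocks = [frozenset({0, 1}), frozenset({2, 3, 4}), frozenset({5})]  # the partition {B_n}
X = {w: Fr(w * w) for w in range(6)}

def integral(Z, D):
    """Integral of Z over the event D with respect to P."""
    return sum(Z[w] * P[w] for w in D)

def cond_exp(Z):
    """E(Z | G) as in (4.13): on each block B_n, the P-average of Z over B_n."""
    out = {}
    for B in blocks:
        pB = sum(P[w] for w in B)
        value = integral(Z, B) / pB if pB > 0 else Fr(0)  # c_n = 0 when P(B_n) = 0
        for w in B:
            out[w] = value
    return out

EXG = cond_exp(X)

# G = sigma{B_n} consists of all unions of blocks; check (4.14) on each of them.
G = [frozenset().union(*c) for r in range(len(blocks) + 1) for c in combinations(blocks, r)]
check_4_14 = all(integral(EXG, D) == integral(X, D) for D in G)
```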
One may express (4.14) by saying that $E(X \mid \mathcal{G})$ is a $\mathcal{G}$-measurable random variable whose integral over each $D \in \mathcal{G}$ equals the integral of $X$ over $D$. By taking $D = B_n$ in (4.14), on the other hand, one derives (4.13). Thus, the italicized statement above may be taken to be the definition of $E(X \mid \mathcal{G})$. This is Kolmogorov's definition of $E(X \mid \mathcal{G})$, which, however, holds for any subsigmafield $\mathcal{G}$ of $\mathcal{F}$, whether generated by a countable partition or not. To see that a $\mathcal{G}$-measurable function $E(X \mid \mathcal{G})$ satisfying (4.14) exists and is unique, no matter what sigmafield $\mathcal{G} \subset \mathcal{F}$ is given, consider the set function $\nu$ defined on $\mathcal{G}$ by

$$\nu(D) := \int_D X\,dP \qquad (D \in \mathcal{G}). \tag{4.15}$$

Then $\nu(D) = 0$ if $P(D) = 0$. Consider the restriction of $P$ to $\mathcal{G}$. Then one has $\nu \ll P$ on $\mathcal{G}$. By the Radon–Nikodym Theorem, there exists a unique (up to a $P$-null set) $\mathcal{G}$-measurable function, denoted $E(X \mid \mathcal{G})$, such that (4.14) holds.

The simple interpretation of $E(X \mid \mathcal{G})$ in (4.13) as the average of $X$ on each member of the partition generating $\mathcal{G}$ is lost in this abstract definition for more general sigmafields $\mathcal{G}$. But the italicized definition of $E(X \mid \mathcal{G})$ above still retains the intuitive idea of conditioning. Given the information embodied in $\mathcal{G}$, the reassessed (or conditional) probability of $A$, or expectation of $X$, must depend only on this information, i.e., must be $\mathcal{G}$-measurable. It must also give the correct probabilities, or expectations, when integrated over events in $\mathcal{G}$. Here is a list of some of the commonly used properties of conditional expectations.
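Theorem 4.4. (Basic Properties of Conditional Expectations). Let $(\Omega, \mathcal{F}, P)$ be a probability space, $\mathcal{G}$ a subsigmafield of $\mathcal{F}$.
(a) If $X$ is $\mathcal{G}$-measurable and integrable then $E(X \mid \mathcal{G}) = X$.
(b) (Linearity) If $X$, $Y$ are integrable and $c$, $d$ constants, then $E(cX + dY \mid \mathcal{G}) = cE(X \mid \mathcal{G}) + dE(Y \mid \mathcal{G})$.
(c) (Order) If $X$, $Y$ are integrable and $X \ldots$

$$\ldots \ge \varphi\big(E(X \mid \mathcal{G})\big). \tag{4.18}$$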
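Proof. In (2.5) take $y = X$, $x = E(X \mid \mathcal{G})$, to get

$$\varphi(X) \ge \varphi\big(E(X \mid \mathcal{G})\big) + m\big(E(X \mid \mathcal{G})\big)\big(X - E(X \mid \mathcal{G})\big).$$

Now take conditional expectations, given $\mathcal{G}$, on both sides.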
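As an immediate corollary to (4.18) it follows that the operation of taking conditional expectation is a contraction on $L^p(\Omega, \mathcal{F}, P)$. That is, if $X \in L^p$ for some $p \ge 1$, then

$$\|E(X \mid \mathcal{G})\|_p \le \|X\|_p. \tag{4.19}$$

To see this, apply (4.18) with the convex function $\varphi(x) = |x|^p$ to get $|E(X \mid \mathcal{G})|^p \le E(|X|^p \mid \mathcal{G})$, and take expectations of both sides.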
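If $X \in L^2$, then in fact $E(X \mid \mathcal{G})$ is the orthogonal projection of $X$ onto $L^2(\Omega, \mathcal{G}, P) \subset L^2(\Omega, \mathcal{F}, P)$. For if $Y$ is an arbitrary element of $L^2(\Omega, \mathcal{G}, P)$, then

$$\begin{aligned} E(X - Y)^2 &= E\big(X - E(X \mid \mathcal{G}) + E(X \mid \mathcal{G}) - Y\big)^2\\ &= E\big(X - E(X \mid \mathcal{G})\big)^2 + E\big(E(X \mid \mathcal{G}) - Y\big)^2 + 2E\big[\big(E(X \mid \mathcal{G}) - Y\big)\big(X - E(X \mid \mathcal{G})\big)\big]. \end{aligned}$$

The last term on the right side vanishes, on first taking conditional expectation given $\mathcal{G}$ (see Basic Property (e)). Hence, for all $Y \in L^2(\Omega, \mathcal{G}, P)$,

$$E(X - Y)^2 = E\big(X - E(X \mid \mathcal{G})\big)^2 + E\big(E(X \mid \mathcal{G}) - Y\big)^2 \ge E\big(X - E(X \mid \mathcal{G})\big)^2. \tag{4.20}$$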
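The Pythagorean identity (4.20) and the contraction (4.19) with $p = 2$ can be checked exactly on a finite space. The sketch below makes the following assumptions, chosen only for illustration:

* $\Omega$ has six points, with $P(\{\omega\}) = (\omega + 1)/21$;
* the partition has three blocks;
* $X(\omega) = \omega^2$;
* $Y$ ranges over a small grid of $\mathcal{G}$-measurable random variables, that is, functions constant on each block.

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical six-point space and three-block partition (illustrative assumption).
P = {w: Fr(w + 1, 21) for w in range(6)}
blocks = [(0, 1), (2, 3, 4), (5,)]
X = {w: Fr(w * w) for w in range(6)}

def E(Z):
    return sum(Z[w] * P[w] for w in P)

def cond_exp(Z):
    """E(Z | G) for G generated by `blocks` (every block has positive probability)."""
    out = {}
    for B in blocks:
        avg = sum(Z[w] * P[w] for w in B) / sum(P[w] for w in B)
        out.update({w: avg for w in B})
    return out

CX = cond_exp(X)
resid_sq = E({w: (X[w] - CX[w]) ** 2 for w in P})

pythagoras_ok = True
for vals in product([Fr(-1), Fr(0), Fr(2)], repeat=len(blocks)):
    Y = {w: v for B, v in zip(blocks, vals) for w in B}  # constant on blocks: G-measurable
    lhs = E({w: (X[w] - Y[w]) ** 2 for w in P})
    rhs = resid_sq + E({w: (CX[w] - Y[w]) ** 2 for w in P})
    pythagoras_ok = pythagoras_ok and lhs == rhs and lhs >= resid_sq

# (4.19) with p = 2: E(E(X | G)^2) <= E(X^2).
contraction_ok = E({w: c ** 2 for w, c in CX.items()}) <= E({w: x ** 2 for w, x in X.items()})
```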
Note that $X$ has the orthogonal decomposition $X = E(X \mid \mathcal{G}) + (X - E(X \mid \mathcal{G}))$.

Finally, the classical notion of conditional probability as a reassessed probability measure may be recovered under fairly general conditions. The technical difficulty concerns pairwise disjoint sequences of events $\{A_n\}$. For every given such sequence one may assert the equality $P(\bigcup_n A_n \mid \mathcal{G})(\omega) = \sum_n P(A_n \mid \mathcal{G})(\omega)$ only for all $\omega$ outside a $P$-null set (Basic Properties (b), (f)), and this null set may depend on the sequence. Since in general there are uncountably many such sequences, there may not exist any choice of versions of $P(A \mid \mathcal{G})$ for all $A \in \mathcal{F}$ such that for each $\omega$, outside a set of zero probability, $A \mapsto P(A \mid \mathcal{G})(\omega)$ is a probability measure on $\mathcal{F}$. When such a choice is possible, the corresponding family $\{P(A \mid \mathcal{G}): A \in \mathcal{F}\}$ is called a regular conditional probability.

The problem becomes somewhat simpler if one does not ask for a conditional probability measure on $\mathcal{F}$, but on a smaller sigmafield. For example, one may consider the sigmafield $\sigma(Y) = \{Y^{-1}(B): B \in \mathcal{S}\}$ where $Y$ is a measurable function on $(\Omega, \mathcal{F}, P)$ into $(S, \mathcal{S})$. A function $(\omega, B) \mapsto Q_\omega(B \mid \mathcal{G})$ on $\Omega \times \mathcal{S}$ into $[0, 1]$ is said to be a conditional distribution of $Y$ given $\mathcal{G}$ if:

(i) for each $B \in \mathcal{S}$, $Q_\omega(B \mid \mathcal{G}) = P(\{Y \in B\} \mid \mathcal{G})(\omega) = E(\mathbf{1}_B(Y) \mid \mathcal{G})(\omega)$ for all $\omega$ outside a $P$-null set, and

(ii) for each $\omega \in \Omega$, $B \mapsto Q_\omega(B \mid \mathcal{G})$ is a probability measure on $(S, \mathcal{S})$.

Note that (i), (ii) say that there is a regular conditional probability on $\sigma(Y)$. If there exists a conditional distribution $Q_\omega(B \mid \mathcal{G})$ of $Y$ given $\mathcal{G}$, then it is simple to check that

$$E(\varphi(Y) \mid \mathcal{G})(\omega) = \int_S \varphi(y)\,Q_\omega(dy \mid \mathcal{G}) \quad \text{a.s.} \tag{4.21}$$
for every measurable $\varphi$ on $(S, \mathcal{S})$ such that $\varphi(Y)$ is integrable. Conditional distributions do exist if $S$ is a (Borel subset of a) complete separable metric space and $\mathcal{S}$ its Borel sigmafield. In this book we often write $E(Z \mid \{X_\lambda: \lambda \in \Lambda\})$ in place of $E(Z \mid \sigma\{X_\lambda: \lambda \in \Lambda\})$ for simplicity.
5 CONVERGENCE IN DISTRIBUTION IN FINITE DIMENSIONS

A sequence of probability measures $\{P_n: n = 1, 2, \ldots\}$ on $(\mathbb{R}^1, \mathcal{B}^1)$ is said to converge weakly to a probability measure $P$ (on $\mathbb{R}^1$) if

$$\lim_{n\to\infty}\int_{\mathbb{R}^1}\psi(x)\,dP_n(x) = \int_{\mathbb{R}^1}\psi(x)\,dP(x) \tag{5.1}$$
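holds for all bounded continuous functions $\psi$ on $\mathbb{R}^1$. It is actually sufficient to verify (5.1) for those continuous functions $\psi$ that vanish outside some finite interval. For suppose (5.1) holds for all such functions. Let $\psi$ be an arbitrary bounded continuous function, $|\psi(x)| \le c$ for all $x$. For notational convenience write $\{x \in \mathbb{R}^1: |x| \ge N\} = \{|x| \ge N\}$, etc. Given $\varepsilon > 0$ there exists $N$ such that $P(\{|x| \ge N\}) < \varepsilon/2c$. Let $\theta_N$, $\varphi_N$ be as in Figure 5.1(a), (b). Then

$$\liminf_{n\to\infty} P_n(\{|x| \le N + 1\}) \ge \lim_{n\to\infty}\int\theta_N(x)\,dP_n(x) = \int\theta_N(x)\,dP(x) \ge P(\{|x| \le N\}) > 1 - \frac{\varepsilon}{2c},$$

so that

$$\limsup_{n\to\infty} P_n(\{|x| > N + 1\}) = 1 - \liminf_{n\to\infty} P_n(\{|x| \le N + 1\}) < \frac{\varepsilon}{2c}. \tag{5.2}$$

Since $\psi\varphi_N$ vanishes outside a finite interval, and $|\psi - \psi\varphi_N| \le c\,\mathbf{1}_{\{|x| > N + 1\}}$ (see Figure 5.1(b)), we have

$$\begin{aligned} \limsup_{n\to\infty}\Big|\int\psi\,dP_n - \int\psi\,dP\Big| &\le \limsup_{n\to\infty}\Big|\int\psi\varphi_N\,dP_n - \int\psi\varphi_N\,dP\Big| + \limsup_{n\to\infty}\big(cP_n(\{|x| > N + 1\}) + cP(\{|x| > N + 1\})\big)\\ &= \limsup_{n\to\infty} cP_n(\{|x| > N + 1\}) + cP(\{|x| > N + 1\}) < \varepsilon. \end{aligned}$$

Since $\varepsilon > 0$ is arbitrary, $\int\psi\,dP_n \to \int\psi\,dP$, and the proof of the italicized statement above is complete. Let us now show that it is enough to verify (5.1) for every infinitely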
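differentiable function that vanishes outside a finite interval. For each $\varepsilon > 0$ define the function

$$p_\varepsilon(x) = \begin{cases} d(\varepsilon)\exp\big\{-(1 - x^2/\varepsilon^2)^{-1}\big\} & \text{for } |x| < \varepsilon,\\ 0 & \text{for } |x| \ge \varepsilon, \end{cases} \tag{5.3}$$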
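where $d(\varepsilon)$ is so chosen as to make $\int p_\varepsilon(x)\,dx = 1$. One may check that $p_\varepsilon(x)$ is infinitely differentiable in $x$. Now let $\psi$ be a continuous function that vanishes outside a finite interval. Then $\psi$ is uniformly continuous and, therefore, $\delta(\varepsilon) = \sup\{|\psi(x) - \psi(y)|: |x - y| \le \varepsilon\} \to 0$ as $\varepsilon \downarrow 0$. Define

$$\psi_\varepsilon(x) := \psi * p_\varepsilon(x) := \int\psi(x - y)\,p_\varepsilon(y)\,dy, \tag{5.4}$$

and note that, since $\psi_\varepsilon(x)$ is an average over values of $\psi$ within the interval $(x - \varepsilon, x + \varepsilon)$, $|\psi_\varepsilon(x) - \psi(x)| \le \delta(\varepsilon)$ $\ldots$

$\ldots\ \varepsilon > 0$ there exists $\eta(\varepsilon) > 0$ such that $|F(x) - F(x_0)| < \varepsilon$ for $|x - x_0| \le \eta(\varepsilon)$. Let $\psi_1(x) = 1$ for $x \le x_0$, $\psi_1(x) = 0$ for $x > x_0 + \eta(\varepsilon)$, and let $\psi_1(x)$ be linearly interpolated for $x_0 < x < x_0 + \eta(\varepsilon)$. Similarly, define $\psi_2(x) = 1$ for $x \le x_0 - \eta(\varepsilon)$, $\psi_2(x) = 0$ for $x \ge x_0$, and linearly interpolated in the interval $(x_0 - \eta(\varepsilon), x_0)$. Then, using (5.1),

$$\limsup_{n\to\infty} F_n(x_0) \le \lim_{n\to\infty}\int\psi_1\,dP_n = \int\psi_1\,dP \le F(x_0 + \eta(\varepsilon)) < F(x_0) + \varepsilon,$$
$$\liminf_{n\to\infty} F_n(x_0) \ge \lim_{n\to\infty}\int\psi_2\,dP_n = \int\psi_2\,dP \ge F(x_0 - \eta(\varepsilon)) > F(x_0) - \varepsilon.$$

Since $\varepsilon > 0$ is arbitrary, $\lim_{n\to\infty} F_n(x_0) = F(x_0)$. $\ldots$

$\ldots\ \varepsilon > 0$ there is an interval $(a, b]$ in $\mathbb{R}^1$ such that $P_n((a, b]) = F_n(b) - F_n(a) > 1 - \varepsilon$ for each $n \ge 1$. It is clear that if a sequence of probability measures on $\mathbb{R}^1$ (or $\mathbb{R}^k$) converges weakly to a probability measure $P$, then $\{P_n\}$ is tight. Conversely, one has the following result.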
Theorem 5.2. Suppose a sequence of probability measures $\{P_n\}$ on $\mathbb{R}^1$ is tight. Then it has a subsequence converging weakly to a probability measure $P$.

Proof. Let $F_n$ be the d.f. of $P_n$. Let $\{r_1, r_2, \ldots\}$ be an enumeration of the rationals.

* Since $\{F_n(r_1)\}$ is bounded, there exists a subsequence $\{n_1\}$ (i.e., $\{1_1, 2_1, \ldots, n_1, \ldots\}$) of the positive integers such that $F_{n_1}(r_1)$ is convergent.
* Since $\{F_{n_1}(r_2)\}$ is bounded, there exists a subsequence $\{n_2\}$ of $\{n_1\}$ such that $F_{n_2}(r_2)$ is convergent. Then $\{F_{n_2}(r_i)\}$ $(i = 1, 2)$ converges.
* Continue in this manner.

Consider the "diagonal" sequence $1_1, 2_2, \ldots, k_k, \ldots$. Then $\{F_{n_n}(r_i)\}$ converges, as $n \to \infty$, for every $i$, as $\{n_n: n \ge i\}$ is a subsequence of $\{n_i: n \ge 1\}$. Let us write $n_n = n'$. Define

$$G(r_i) := \lim_{n'} F_{n'}(r_i) \quad (i = 1, 2, \ldots), \qquad F(x) := \inf\{G(r_i): r_i > x\} \quad (x \in \mathbb{R}^1). \tag{5.6}$$
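Then $F$ is a nondecreasing and right-continuous function on $\mathbb{R}^1$, $0 \le F \le 1$. Let $x$ be a point of continuity of $F$. Let $x' < y' < x < y'' < x''$ with $y'$, $y''$ rational. Then

$$F(x') \le G(y') = \lim_{n'} F_{n'}(y') \ldots$$

$\ldots\ \varepsilon > 0$, there exist $x_\varepsilon$, $y_\varepsilon$ such that $F_n(x_\varepsilon) < \varepsilon$ and $F_n(y_\varepsilon) > 1 - \varepsilon$ for all $n$. Let $x'_\varepsilon$, $y'_\varepsilon$ be points of continuity of $F$ such that $x'_\varepsilon < x_\varepsilon$ and $y'_\varepsilon > y_\varepsilon$. Then

$$F(x'_\varepsilon) = \lim_{n'} F_{n'}(x'_\varepsilon) \le \varepsilon, \qquad F(y'_\varepsilon) = \lim_{n'} F_{n'}(y'_\varepsilon) \ge 1 - \varepsilon.$$

This shows $\lim_{x \downarrow -\infty} F(x) = 0$, $\lim_{x \uparrow \infty} F(x) = 1$.

A similar proof applies to $P_n$, $P$ on $\mathbb{R}^k$.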
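The smoothing step built on (5.3) and (5.4) can be tried numerically. The sketch below is only an approximation. It replaces the integral in (5.4) by a midpoint sum whose weights are renormalized on the same grid; the renormalization stands in for $d(\varepsilon)$. As a result, each smoothed value is still a weighted average of $\psi$ over $(x - \varepsilon, x + \varepsilon)$.

The test function is the tent $\psi(x) = \max(0, 1 - |x|)$, an assumed example. It is Lipschitz with constant 1, so $\delta(\varepsilon) \le \varepsilon$, and the observed uniform error should stay below $\varepsilon$.

```python
import math

def bump(x, eps):
    """Unnormalized p_eps of (5.3): exp{-(1 - x^2/eps^2)^(-1)} for |x| < eps, else 0."""
    if abs(x) >= eps:
        return 0.0
    return math.exp(-1.0 / (1.0 - (x / eps) ** 2))

def mollify(psi, eps, m=400):
    """Approximate x -> (psi * p_eps)(x) of (5.4) by a midpoint sum on (-eps, eps).

    The weights are renormalized to sum to 1, so each output is a weighted
    average of psi over (x - eps, x + eps)."""
    h = 2.0 * eps / m
    ys = [-eps + (i + 0.5) * h for i in range(m)]
    ws = [bump(y, eps) for y in ys]
    total = sum(ws)
    ws = [w / total for w in ws]
    return lambda x: sum(w * psi(x - y) for w, y in zip(ws, ys))

def tent(x):
    # Continuous, zero outside [-1, 1], Lipschitz with constant 1, so delta(eps) <= eps.
    return max(0.0, 1.0 - abs(x))

def sup_error(eps, points=201):
    """max |psi_eps(x) - psi(x)| over an evenly spaced grid on [-1.5, 1.5]."""
    smooth = mollify(tent, eps)
    xs = [-1.5 + 3.0 * i / (points - 1) for i in range(points)]
    return max(abs(smooth(x) - tent(x)) for x in xs)

errors = {eps: sup_error(eps) for eps in (0.2, 0.1, 0.05)}
```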
6 CLASSICAL LAWS OF LARGE NUMBERS

Theorem 6.1. (Strong Law of Large Numbers). Let $\{X_n\}$ be a sequence of pairwise independent and identically distributed random variables defined on a probability space
$(\Omega, \mathcal{F}, P)$. If $E|X_1| < \infty$ then with probability 1,

$$\lim_{n\to\infty}\frac{X_1 + \cdots + X_n}{n} = EX_1. \tag{6.1}$$
The proof we present here is due to Etemadi.³ It is based on Part 1 of the following Borel–Cantelli Lemmas. Part 2 is also important and it is cited here for completeness. However, it is not used in Etemadi's proof of the SLLN.

Lemma 6.1. (Borel–Cantelli). Let $\{A_n\}$ be any sequence of events in $\mathcal{F}$.

Part 1. If $\sum_{n=1}^{\infty} P(A_n) < \infty$ then

$$P(A_n \text{ i.o.}) := P\Big(\bigcap_{n=1}^{\infty}\bigcup_{k=n}^{\infty} A_k\Big) = 0.$$

Part 2. If $A_1, A_2, \ldots$ are independent events and if $\sum_{n=1}^{\infty} P(A_n)$ diverges then $P(A_n \text{ i.o.}) = 1$.
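Proof. For Part 1 observe that the sequence of events $B_n = \bigcup_{k=n}^{\infty} A_k$, $n = 1, 2, \ldots$, is a decreasing sequence. Therefore, we have by the continuity property (1.2),

$$P\Big(\bigcap_{n=1}^{\infty}\bigcup_{k=n}^{\infty} A_k\Big) = \lim_{n\to\infty} P\Big(\bigcup_{k=n}^{\infty} A_k\Big) \le \lim_{n\to\infty}\sum_{k=n}^{\infty} P(A_k) = 0.$$

For Part 2 note that, by (1.1) and the independence of the events $A_k^c$,

$$P(\{A_n \text{ i.o.}\}^c) = P\Big(\bigcup_{n=1}^{\infty}\bigcap_{k=n}^{\infty} A_k^c\Big) = \lim_{n\to\infty} P\Big(\bigcap_{k=n}^{\infty} A_k^c\Big) = \lim_{n\to\infty}\prod_{k=n}^{\infty} P(A_k^c).$$

But, since $1 - x \le e^{-x}$,

$$\prod_{k=n}^{\infty} P(A_k^c) = \lim_{m\to\infty}\prod_{k=n}^{m}\big(1 - P(A_k)\big) \le \lim_{m\to\infty}\exp\Big\{-\sum_{k=n}^{m} P(A_k)\Big\} = 0.$$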
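The key step in Part 2 is the bound $\prod(1 - P(A_k)) \le \exp\{-\sum P(A_k)\}$. The sketch below takes the assumed choice $P(A_k) = 1/k$, whose series diverges. The product then telescopes to $(n - 1)/m$, which makes both the inequality and the limit $0$ easy to see.

```python
import math
from fractions import Fraction

def tail_product(n, m):
    """prod_{k=n}^{m} (1 - P(A_k)) for the assumed choice P(A_k) = 1/k (n >= 2)."""
    prod = Fraction(1)
    for k in range(n, m + 1):
        prod *= 1 - Fraction(1, k)
    return prod

def exp_bound(n, m):
    """exp{-sum_{k=n}^{m} P(A_k)}, an upper bound since 1 - x <= e^(-x)."""
    return math.exp(-sum(1.0 / k for k in range(n, m + 1)))

n = 5
rows = [(m, tail_product(n, m), exp_bound(n, m)) for m in (10, 100, 1000)]
```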
Without loss of generality we may assume for the proof of the SLLN that the random variables $X_n$ are nonnegative. Otherwise we can write $X_n = X_n^+ - X_n^-$, where $X_n^+ = \max(X_n, 0)$ and $X_n^- = -\min(X_n, 0)$ are both nonnegative random variables. The result in the nonnegative case then yields that

$$\frac{S_n}{n} = \frac{1}{n}\sum_{k=1}^{n} X_k^+ - \frac{1}{n}\sum_{k=1}^{n} X_k^-$$

converges to $EX_1^+ - EX_1^- = EX_1$ with probability 1.

³ N. Etemadi (1983), "On the Laws of Large Numbers for Nonnegative Random Variables," J. Multivariate Analysis, 13, pp. 187–193.
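Truncate the variables $X_n$ by $Y_n = X_n\mathbf{1}_{\{X_n \le n\}}$. Then $Y_n$ has moments of all orders. Let $T_n = \sum_{k=1}^{n} Y_k$ and consider the sequence $\{T_n\}$ on the "fast" time scale $\tau_n = [\alpha^n]$, for a fixed $\alpha > 1$, where brackets $[\ ]$ denote the integer part. Let $\varepsilon > 0$. Then by Chebyshev's Inequality and pairwise independence,

$$\begin{aligned} P\Big(\Big|\frac{T_{\tau_n} - ET_{\tau_n}}{\tau_n}\Big| > \varepsilon\Big) &\le \frac{\operatorname{Var}(T_{\tau_n})}{\varepsilon^2\tau_n^2} = \frac{1}{\varepsilon^2\tau_n^2}\sum_{k=1}^{\tau_n}\operatorname{Var} Y_k \le \frac{1}{\varepsilon^2\tau_n^2}\sum_{k=1}^{\tau_n} EY_k^2\\ &= \frac{1}{\varepsilon^2\tau_n^2}\sum_{k=1}^{\tau_n} E\{X_k^2\mathbf{1}_{\{X_k \le k\}}\} = \frac{1}{\varepsilon^2\tau_n^2}\sum_{k=1}^{\tau_n} E\{X_1^2\mathbf{1}_{\{X_1 \le k\}}\}\\ &\le \frac{1}{\varepsilon^2\tau_n^2}\,\tau_n E\{X_1^2\mathbf{1}_{\{X_1 \le \tau_n\}}\} = \frac{1}{\varepsilon^2\tau_n}E\{X_1^2\mathbf{1}_{\{X_1 \le \tau_n\}}\}. \end{aligned} \tag{6.2}$$

Therefore,

$$\sum_{n=1}^{\infty} P\Big(\Big|\frac{T_{\tau_n} - ET_{\tau_n}}{\tau_n}\Big| > \varepsilon\Big) \le \sum_{n=1}^{\infty}\frac{1}{\varepsilon^2\tau_n}E\{X_1^2\mathbf{1}_{\{X_1 \le \tau_n\}}\} = \frac{1}{\varepsilon^2}E\Big\{X_1^2\sum_{n=1}^{\infty}\frac{1}{\tau_n}\mathbf{1}_{\{X_1 \le \tau_n\}}\Big\}. \tag{6.3}$$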
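Let $x > 0$ and let $N = \min\{n \ge 1: \tau_n \ge x\}$. Then $\alpha^N \ge x$ and, since $y \le 2[y]$ for any $y \ge 1$,

$$\sum_{n=1}^{\infty}\frac{1}{\tau_n}\mathbf{1}_{\{x \le \tau_n\}} = \sum_{n \ge N}\frac{1}{[\alpha^n]} \le 2\sum_{n \ge N}\alpha^{-n} = \frac{2\alpha^{-N}\alpha}{\alpha - 1} \le \frac{k}{x},$$

where $k = 2\alpha/(\alpha - 1)$. Therefore,

$$\sum_{n=1}^{\infty}\frac{1}{\tau_n}\mathbf{1}_{\{X_1 \le \tau_n\}} \le \frac{k}{X_1} \quad \text{for } X_1 > 0.$$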
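The elementary bound $\sum_n \tau_n^{-1}\mathbf{1}_{\{x \le \tau_n\}} \le k/x$ with $k = 2\alpha/(\alpha - 1)$ can be checked numerically. The sketch below replaces the infinite series by its first `n_max` terms, plus the geometric bound $2\sum_{n > n_{\max}}\alpha^{-n}$ for the remainder. The quantity compared with $k/x$ is therefore an upper bound for the full series. The grids of $\alpha$ and $x$ used in the check are arbitrary.

```python
import math

def fast_times(alpha, n_max):
    """tau_n = [alpha^n] (integer part) for n = 1, ..., n_max."""
    return [math.floor(alpha ** n) for n in range(1, n_max + 1)]

def series_upper_bound(alpha, x, n_max=400):
    """First n_max terms of sum_n (1/tau_n) 1{x <= tau_n}, plus
    2 * sum_{n > n_max} alpha^(-n) to cover the remainder (using y <= 2[y])."""
    head = sum(1.0 / t for t in fast_times(alpha, n_max) if x <= t)
    tail = 2.0 * alpha ** (-n_max) / (alpha - 1.0)
    return head + tail

def lemma_holds(alpha, x):
    k = 2.0 * alpha / (alpha - 1.0)
    return series_upper_bound(alpha, x) <= k / x
```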
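So

$$\sum_{n=1}^{\infty} P\Big(\Big|\frac{T_{\tau_n} - ET_{\tau_n}}{\tau_n}\Big| > \varepsilon\Big) \le \frac{k}{\varepsilon^2}EX_1 < \infty.$$

$\ldots\ du = EX_1\ \ldots$

Since the $X_k$ are nonnegative, for $\tau_n \le k \le \tau_{n+1}$,

$$\frac{\tau_n}{\tau_{n+1}}\cdot\frac{S_{\tau_n}}{\tau_n} \le \frac{S_k}{k} \le \frac{\tau_{n+1}}{\tau_n}\cdot\frac{S_{\tau_{n+1}}}{\tau_{n+1}}. \tag{6.8}$$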
But $\tau_{n+1}/\tau_n \to \alpha$, so that now we get with probability 1,

$$\frac{1}{\alpha}EX_1 \le \liminf_{k\to\infty}\frac{S_k}{k} \le \limsup_{k\to\infty}\frac{S_k}{k} \le \alpha EX_1. \tag{6.9}$$

Take the intersection of all such events for rational $\alpha > 1$ to get $\lim_{k\to\infty} S_k/k = EX_1$ with probability 1. This is the Strong Law of Large Numbers (SLLN).

The above proof of the SLLN is really quite remarkable, as the following observations show. First, pairwise independence is only used to make sure that the positive and negative parts of $X_n$, and their truncations, remain pairwise uncorrelated for the calculation of the variance of $T_k$ as the sum of the variances. Observe that if the random variables are all of the same sign, say nonnegative, and bounded, then it suffices to require that they merely be uncorrelated for the same proof to go through. However, this means that, if the random variables are bounded, then it suffices that they be uncorrelated to get the SLLN; for one may simply add a sufficiently large constant to make them all positive. Thus, we have the following.

Corollary 6.2. If $X_1, X_2, \ldots$ is a sequence of mean zero uncorrelated random variables that are uniformly bounded, then with probability 1,

$$\frac{X_1 + \cdots + X_n}{n} \to 0 \quad \text{as } n \to \infty.$$
7 CLASSICAL CENTRAL LIMIT THEOREMS

In view of the great importance of the central limit theorem (CLT) we shall give a general but self-contained version due to Lindeberg. This version is applicable to non-identically distributed summands.

Theorem 7.1. (Lindeberg's CLT). For each $n$, let $X_{1,n}, \ldots, X_{k_n,n}$ be independent random
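variables satisfying

$$EX_{j,n} = 0, \qquad \sigma_{j,n} := (EX_{j,n}^2)^{1/2} < \infty, \qquad \sum_{j=1}^{k_n}\sigma_{j,n}^2 = 1, \tag{7.1}$$

and, for each $\varepsilon > 0$,

$$\text{(Lindeberg condition)} \qquad \lim_{n\to\infty}\sum_{j=1}^{k_n} E\big(X_{j,n}^2\mathbf{1}_{\{|X_{j,n}| > \varepsilon\}}\big) = 0. \tag{7.2}$$

Then $\sum_{j=1}^{k_n} X_{j,n}$ converges in distribution to the standard normal law $N(0, 1)$.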
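Proof. Let $\{Z_j: j \ge 1\}$ be a sequence of i.i.d. $N(0, 1)$ random variables, independent of $\{X_{j,n}: 1 \le j \le k_n\}$. Write

$$Z_{j,n} := \sigma_{j,n}Z_j \qquad (1 \le j \le k_n), \tag{7.3}$$

so that $EZ_{j,n} = 0 = EX_{j,n}$, $EZ_{j,n}^2 = \sigma_{j,n}^2 = EX_{j,n}^2$. Define

$$U_{m,n} := \sum_{j=1}^{m} X_{j,n} + \sum_{j=m+1}^{k_n} Z_{j,n} \quad (0 \le m \le k_n), \qquad V_{m,n} := U_{m,n} - X_{m,n} \quad (1 \le m \le k_n), \tag{7.4}$$

with empty sums taken to be zero.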
Let $f$ be a real-valued function on $\mathbb{R}^1$ such that $f$, $f'$, $f''$, $f'''$ are bounded. Recall the following version of the Taylor expansion, which is easy to check by integration by parts,

$$f(x + h) = f(x) + hf'(x) + \frac{h^2}{2!}f''(x) + h^2\int_0^1 (1 - \theta)\{f''(x + \theta h) - f''(x)\}\,d\theta \qquad (x, h \in \mathbb{R}^1). \tag{7.5}$$

Taking $x = V_{m,n}$, $h = X_{m,n}$ in (7.5), one gets

$$Ef(U_{m,n}) = Ef(V_{m,n} + X_{m,n}) = Ef(V_{m,n}) + E\big(X_{m,n}f'(V_{m,n})\big) + \tfrac12 E\big(X_{m,n}^2f''(V_{m,n})\big) + E(R_{m,n}), \tag{7.6}$$

where

$$R_{m,n} := X_{m,n}^2\int_0^1 (1 - \theta)\{f''(V_{m,n} + \theta X_{m,n}) - f''(V_{m,n})\}\,d\theta. \tag{7.7}$$
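The integral form of the remainder in (7.5) drives the estimates (7.12) and (7.14), and it can be checked numerically. The sketch below takes $f = \sin$, so that $f'' = -\sin$ and all derivatives are bounded. It evaluates the $\theta$-integral by a midpoint rule and compares both sides of (7.5) at a few arbitrary points.

```python
import math

def taylor_remainder(f2, x, h, m=2000):
    """h^2 * int_0^1 (1 - t) {f''(x + t h) - f''(x)} dt, by the midpoint rule with m cells."""
    dt = 1.0 / m
    total = 0.0
    for i in range(m):
        t = (i + 0.5) * dt
        total += (1.0 - t) * (f2(x + t * h) - f2(x))
    return h * h * total * dt

def second_derivative(u):
    return -math.sin(u)  # f = sin, so f'' = -sin

def taylor_gap(x, h):
    """|f(x + h) - right side of (7.5)| for f = sin."""
    rhs = (math.sin(x) + h * math.cos(x) + 0.5 * h * h * second_derivative(x)
           + taylor_remainder(second_derivative, x, h))
    return abs(math.sin(x + h) - rhs)

taylor_gaps = [taylor_gap(x, h) for x in (-1.0, 0.3, 2.0) for h in (-0.7, 0.1, 1.5)]
```

The remainder also obeys the crude bound $|R| \le \tfrac12|h|^3\|f'''\|_\infty$ used for $R'_{m,n}$ in the proof.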
As $X_{m,n}$ and $V_{m,n}$ are independent, and $EX_{m,n} = 0$, $EX_{m,n}^2 = \sigma_{m,n}^2$, (7.6) reduces to

$$Ef(U_{m,n}) = Ef(V_{m,n}) + \frac{\sigma_{m,n}^2}{2}Ef''(V_{m,n}) + E(R_{m,n}). \tag{7.8}$$
Also $U_{m-1,n} = V_{m,n} + Z_{m,n}$, and $V_{m,n}$ and $Z_{m,n}$ are independent. Therefore, exactly as
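above one gets, using $EZ_{m,n} = 0$, $EZ_{m,n}^2 = \sigma_{m,n}^2$,

$$Ef(U_{m-1,n}) = Ef(V_{m,n}) + \frac{\sigma_{m,n}^2}{2}Ef''(V_{m,n}) + E(R'_{m,n}), \tag{7.9}$$

where

$$R'_{m,n} := Z_{m,n}^2\int_0^1 (1 - \theta)\{f''(V_{m,n} + \theta Z_{m,n}) - f''(V_{m,n})\}\,d\theta. \tag{7.10}$$

Hence, subtracting (7.9) from (7.8),

$$|Ef(U_{m,n}) - Ef(U_{m-1,n})| \le E|R_{m,n}| + E|R'_{m,n}| \qquad (1 \le m \le k_n). \tag{7.11}$$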
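Now, given an arbitrary $\varepsilon > 0$,

$$\begin{aligned} E|R_{m,n}| &= E\big(|R_{m,n}|\mathbf{1}_{\{|X_{m,n}| > \varepsilon\}}\big) + E\big(|R_{m,n}|\mathbf{1}_{\{|X_{m,n}| \le \varepsilon\}}\big)\\ &\le E\Big[X_{m,n}^2\mathbf{1}_{\{|X_{m,n}| > \varepsilon\}}\int_0^1 (1 - \theta)\,2\|f''\|_\infty\,d\theta\Big] + E\Big[X_{m,n}^2\mathbf{1}_{\{|X_{m,n}| \le \varepsilon\}}\int_0^1 (1 - \theta)|X_{m,n}|\,\|f'''\|_\infty\,d\theta\Big]\\ &\le \|f''\|_\infty E\big(X_{m,n}^2\mathbf{1}_{\{|X_{m,n}| > \varepsilon\}}\big) + \frac{\varepsilon}{2}\sigma_{m,n}^2\|f'''\|_\infty. \end{aligned} \tag{7.12}$$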
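We have used the notation $\|g\|_\infty := \sup\{|g(x)|: x \in \mathbb{R}^1\}$. By (7.1), (7.2), and (7.12),

$$\limsup_{n\to\infty}\sum_{m=1}^{k_n} E|R_{m,n}| \le \frac{\varepsilon}{2}\|f'''\|_\infty.$$

As $\varepsilon > 0$ is arbitrary,

$$\lim_{n\to\infty}\sum_{m=1}^{k_n} E|R_{m,n}| = 0. \tag{7.13}$$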
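Also,

$$E|R'_{m,n}| \le E\Big[Z_{m,n}^2\int_0^1 (1 - \theta)\|f'''\|_\infty|Z_{m,n}|\,d\theta\Big] = \tfrac12\|f'''\|_\infty E|Z_{m,n}|^3 = \tfrac12\|f'''\|_\infty\sigma_{m,n}^3E|Z_1|^3 \le c\,\sigma_{m,n}^2\max_{1 \le l \le k_n}\sigma_{l,n}, \tag{7.14}$$

where $c = \frac12\|f'''\|_\infty E|Z_1|^3$. Now, for each $\delta > 0$,

$$\sigma_{m,n}^2 = E\big(X_{m,n}^2\mathbf{1}_{\{|X_{m,n}| > \delta\}}\big) + E\big(X_{m,n}^2\mathbf{1}_{\{|X_{m,n}| \le \delta\}}\big) \le E\big(X_{m,n}^2\mathbf{1}_{\{|X_{m,n}| > \delta\}}\big) + \delta^2,$$

which implies that

$$\max_{1 \le m \le k_n}\sigma_{m,n}^2 \le \sum_{m=1}^{k_n} E\big(X_{m,n}^2\mathbf{1}_{\{|X_{m,n}| > \delta\}}\big) + \delta^2.$$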
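Therefore, by (7.2), $\limsup_n \max_{1 \le m \le k_n}\sigma_{m,n}^2 \le \delta^2$ for every $\delta > 0$, i.e.,

$$\max_{1 \le m \le k_n}\sigma_{m,n} \to 0 \quad \text{as } n \to \infty. \tag{7.15}$$

From (7.14), (7.15) and $\sum_{m=1}^{k_n}\sigma_{m,n}^2 = 1$ one gets

$$\sum_{m=1}^{k_n} E|R'_{m,n}| \le c\max_{1 \le m \le k_n}\sigma_{m,n} \to 0 \quad \text{as } n \to \infty. \tag{7.16}$$

Combining (7.13) and (7.16), one finally gets

$$|Ef(U_{k_n,n}) - Ef(U_{0,n})| \le \sum_{m=1}^{k_n}\big(E|R_{m,n}| + E|R'_{m,n}|\big) \to 0 \quad \text{as } n \to \infty. \tag{7.17}$$

But $U_{0,n}$ is a standard normal random variable. Hence,

$$Ef\Big(\sum_{j=1}^{k_n} X_{j,n}\Big) - \int_{\mathbb{R}^1} f(y)(2\pi)^{-1/2}\exp\{-\tfrac12 y^2\}\,dy \to 0 \quad \text{as } n \to \infty.$$

By Theorem 5.1, the proof is complete.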
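It has been shown by Feller⁴ that in the presence of the uniform asymptotic negligibility condition (7.15), the Lindeberg condition is also necessary for the CLT to hold.

Corollary 7.2. (The Classical CLT). Let $\{X_j: j \ge 1\}$ be i.i.d., $EX_j = \mu$, $0 < \sigma^2 := \operatorname{Var} X_j < \infty$. Then $\sum_{j=1}^{n}(X_j - \mu)/(\sigma\sqrt{n})$ converges in distribution to $N(0, 1)$.

Proof. Let $X_{j,n} = (X_j - \mu)/(\sigma\sqrt{n})$, $k_n = n$, and apply Theorem 7.1. The Lindeberg condition holds because $\sum_{j=1}^{n} E\big(X_{j,n}^2\mathbf{1}_{\{|X_{j,n}| > \varepsilon\}}\big) = \sigma^{-2}E\big((X_1 - \mu)^2\mathbf{1}_{\{|X_1 - \mu| > \varepsilon\sigma\sqrt{n}\}}\big) \to 0$ by the dominated convergence theorem.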
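In the i.i.d. setting of Corollary 7.2 the Lindeberg sum reduces to $\sigma^{-2}E\big((X_1 - \mu)^2\mathbf{1}_{\{|X_1 - \mu| > \varepsilon\sigma\sqrt{n}\}}\big)$. For a bounded summand it is exactly zero once $\varepsilon\sigma\sqrt{n}$ exceeds $\max|X_1 - \mu|$. The sketch below computes it for a fair die, an illustrative assumption, using exact fractions for the moments.

```python
from fractions import Fraction as Fr
import math

# Hypothetical summand law: a fair die, values 1..6 with probability 1/6 each.
support = [Fr(v) for v in range(1, 7)]
p = Fr(1, 6)
mu = sum(v * p for v in support)                  # mean
var = sum((v - mu) ** 2 * p for v in support)     # variance sigma^2

def lindeberg_sum(n, eps):
    """sum_{j<=n} E(X_{j,n}^2 1{|X_{j,n}| > eps}) for X_{j,n} = (X_j - mu)/(sigma sqrt n).

    By identical distribution this equals E((X - mu)^2 1{|X - mu| > eps sigma sqrt n}) / sigma^2."""
    threshold = eps * math.sqrt(float(var) * n)
    tail = sum((v - mu) ** 2 * p for v in support if abs(float(v - mu)) > threshold)
    return tail / var
```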
Corollary 7.3. (Liapounov's CLT). For each $n$ let $X_{1,n}, X_{2,n}, \ldots, X_{k_n,n}$ be $k_n$ independent random variables such that

$$\sum_{j=1}^{k_n} EX_{j,n} = \mu, \qquad \sum_{j=1}^{k_n}\operatorname{Var} X_{j,n} = \sigma^2 > 0,$$

$$\text{(Liapounov condition)} \qquad \lim_{n\to\infty}\sum_{j=1}^{k_n} E|X_{j,n} - EX_{j,n}|^{2+\delta} = 0 \tag{7.18}$$

for some $\delta > 0$. Then $\sum_{j=1}^{k_n} X_{j,n}$ converges in distribution to the Gaussian law with mean $\mu$ and variance $\sigma^2$.

Proof. By normalizing one may assume, without loss of generality, that

$$EX_{j,n} = 0, \qquad \sum_{j=1}^{k_n} EX_{j,n}^2 = 1.$$
⁴ P. Billingsley (1986), loc. cit., p. 373.
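It then remains to show that the hypothesis of the corollary implies the Lindeberg condition (7.2). This is true since, for every $\varepsilon > 0$,

$$\sum_{j=1}^{k_n} E\big(X_{j,n}^2\mathbf{1}_{\{|X_{j,n}| > \varepsilon\}}\big) \le \sum_{j=1}^{k_n} E\frac{|X_{j,n}|^{2+\delta}}{\varepsilon^{\delta}} \to 0 \quad \text{as } n \to \infty, \tag{7.19}$$

by (7.18).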
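The inequality behind (7.19), $E(X^2\mathbf{1}_{\{|X| > \varepsilon\}}) \le E|X|^{2+\delta}/\varepsilon^{\delta}$, holds because $|X|^{\delta}/\varepsilon^{\delta} > 1$ on $\{|X| > \varepsilon\}$. The sketch below checks it over a grid of $\varepsilon$ and $\delta$ for a hypothetical four-point distribution.

```python
# Hypothetical four-point distribution of X (illustrative assumption).
values = [-2.0, -0.5, 0.5, 3.0]
probs = [0.2, 0.3, 0.3, 0.2]

def truncated_second_moment(eps):
    """E(X^2 1{|X| > eps})."""
    return sum(q * x * x for x, q in zip(values, probs) if abs(x) > eps)

def liapounov_bound(eps, delta):
    """E|X|^(2 + delta) / eps^delta."""
    return sum(q * abs(x) ** (2 + delta) for x, q in zip(values, probs)) / eps ** delta

inequality_ok = all(
    truncated_second_moment(e) <= liapounov_bound(e, d) + 1e-12
    for e in (0.25, 0.5, 1.0, 2.5, 4.0)
    for d in (0.1, 0.5, 1.0, 2.0)
)
```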
Observe that the most crucial property of the normal distribution used in the proof of Theorem 7.1 is that the sum of independent normal random variables is normal. In other words, the normal distribution is infinitely divisible. In fact, the normal distribution $N(0, 1)$ may be realized as the distribution of the sum of independent normal random variables having zero means and variances $\sigma_i^2$ for any arbitrarily specified set of nonnegative numbers $\sigma_i^2$ adding up to 1. Another well-known infinitely divisible distribution is the Poisson distribution. The following multidimensional version of Corollary 7.2 may be proved along the lines of the proof of Theorem 7.1.

Theorem 7.4. (Multivariate Classical CLT). Let $\{X_n: n = 1, 2, \ldots\}$ be a sequence of i.i.d. random vectors with values in $\mathbb{R}^k$. Let $EX_1 = \mu$ and assume that the dispersion matrix (i.e., variance-covariance matrix) $D$ of $X_1$ is nonsingular. Then as $n \to \infty$, $n^{-1/2}(X_1 + \cdots + X_n - n\mu)$ converges in distribution to the Gaussian probability measure with mean zero and dispersion matrix $D$.
8 FOURIER SERIES AND THE FOURIER TRANSFORM

Consider a real- or complex-valued periodic function $f$ on the real line. By changing the scale, if necessary, one may take the period to be $2\pi$. Is it possible to represent $f$ as a superposition of the periodic functions ("waves") $\cos nx$, $\sin nx$ of frequency $n$ $(n = 0, 1, 2, \ldots)$? The Weierstrass approximation theorem (Theorem 8.1) says that every continuous periodic function $f$ of period $2\pi$ is the limit (in the sense of uniform convergence of functions) of a sequence of trigonometric polynomials, i.e., functions of the form

$$\sum_{n=-T}^{T} c_n e^{inx} = c_0 + \sum_{n=1}^{T}(a_n\cos nx + b_n\sin nx).$$
The theory of Fourier series says, among other things, that with the weaker notion of L²-convergence the approximation holds for a wider class of functions, namely for all square integrable functions f on [−π, π]. Here square integrability means that f is measurable and that ∫_{−π}^{π} |f(x)|² dx < ∞. This class of functions is denoted by L²[−π, π]. The successive coefficients c_n for this approximation are the so-called Fourier coefficients:
$$c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\,e^{-inx}\,dx \qquad (n = 0, \pm1, \pm2, \dots). \tag{8.1}$$
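The coefficients (8.1) can be approximated numerically. The sketch below uses the midpoint rule with m subintervals on [−π, π]; the default m = 4096 is an arbitrary choice. It is checked against f(x) = x, for which integration by parts gives c_0 = 0 and c_n = i(−1)^n/n for n ≠ 0.

Applied to f(x) = e^{imx}, the same rule reproduces the orthonormality relations stated next in (8.2), up to rounding, whenever |n − m| < m_subintervals.

```python
import cmath
import math

def fourier_coefficient(f, n, m=4096):
    """Approximate c_n = (1/(2*pi)) * integral of f(x) e^{-inx} over [-pi, pi],
    as in (8.1), by the midpoint rule with m subintervals."""
    h = 2 * math.pi / m
    total = 0j
    for k in range(m):
        x = -math.pi + (k + 0.5) * h  # midpoint of the k-th subinterval
        total += f(x) * cmath.exp(-1j * n * x)
    return total * h / (2 * math.pi)

# Check against f(x) = x, for which c_0 = 0 and c_n = i(-1)^n / n (n != 0).
for n in (1, 2, 3):
    print(n, fourier_coefficient(lambda x: x, n), 1j * (-1) ** n / n)
```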
A PROBABILITY AND MEASURE THEORY OVERVIEW
The functions exp{inx} (n = 0, ±1, ±2, ...) form an orthonormal set:
$$\frac{1}{2\pi}\int_{-\pi}^{\pi} e^{inx}e^{-imx}\,dx = \begin{cases} 0 & \text{for } n \ne m,\\ 1 & \text{for } n = m,\end{cases} \tag{8.2}$$
so that the Fourier series of f, written formally, without regard to convergence for the time being, as
$$\sum_{n=-\infty}^{\infty} c_n e^{inx} \tag{8.3}$$
is a representation of f as a superposition of orthogonal components. To make matters precise we first prove the following theorem. Theorem 8.1. Let f be a continuous periodic function of period 2π. Then, given δ > 0, there exists a trigonometric polynomial Σ_{n=−N}^{N} d_n exp{inx} such that
$$\sup_{x}\Big|f(x) - \sum_{n=-N}^{N} d_n e^{inx}\Big| < \delta.$$
… ε > 0, it follows from (8.5) that k_N(x) goes to zero uniformly on [−π, −ε] ∪ [ε, π], so that
$$\int_{[-\pi,-\varepsilon]\cup[\varepsilon,\pi]} k_N(x)\,dx \to 0 \quad\text{as } N\to\infty. \tag{8.6}$$
In other words, k_N(x) dx converges weakly to δ_0(dx), the point mass at 0, as N → ∞. Consider now the approximation f_N of f defined by
$$f_N(x) := \int_{-\pi}^{\pi} f(y)\,k_N(x-y)\,dy = \sum_{n=-N}^{N}\Big(1-\frac{|n|}{N+1}\Big)c_n e^{inx}, \tag{8.7}$$
where c_n is the nth Fourier coefficient of f. By changing variables and using the periodicity of f and k_N, one may express f_N as
$$f_N(x) = \int_{-\pi}^{\pi} f(x-y)\,k_N(y)\,dy.$$
Therefore, writing M = sup{|f(x)|: x ∈ ℝ}, and δ_ε = sup{|f(y) − f(y′)|: |y − y′| < ε}, … ‖f − Σ_{n=−N}^{N} c_n exp{inx}‖² decreases as N increases and that
$$\lim_{N\to\infty}\Big\|f - \sum_{n=-N}^{N} c_n e^{inx}\Big\|^2 = \|f\|^2 - \sum_{n=-\infty}^{\infty}|c_n|^2. \tag{8.12}$$
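The means (8.7) used for Theorem 8.1 can be checked numerically. Take the continuous periodic function f(x) = |x|, which satisfies f(−π) = f(π). For this f, c_0 = π/2 and c_n = ((−1)^n − 1)/(πn²) for n ≠ 0.

The sketch measures the largest deviation of f_N = Σ_{|n|≤N} (1 − |n|/(N+1)) c_n e^{inx} from f on a grid. The 201-point grid is an arbitrary choice. The deviations shrink as N grows, consistent with the uniform approximation asserted by Theorem 8.1.

```python
import cmath
import math

def abs_coeff(n):
    """Fourier coefficients (8.1) of f(x) = |x| on [-pi, pi], in closed form."""
    if n == 0:
        return math.pi / 2
    return ((-1) ** n - 1) / (math.pi * n * n)

def fejer_mean(coeff, N, x):
    """f_N(x) = sum_{|n| <= N} (1 - |n|/(N+1)) c_n e^{inx}, the right side of (8.7)."""
    total = 0j
    for n in range(-N, N + 1):
        total += (1 - abs(n) / (N + 1)) * coeff(n) * cmath.exp(1j * n * x)
    return total.real  # c_n = c_{-n} is real for this f, so f_N is real

def sup_error(N, grid=201):
    """Largest |f_N(x) - |x|| over an evenly spaced grid on [-pi, pi]."""
    xs = [-math.pi + 2 * math.pi * k / (grid - 1) for k in range(grid)]
    return max(abs(fejer_mean(abs_coeff, N, x) - abs(x)) for x in xs)

for N in (8, 64, 256):
    print(N, sup_error(N))
```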
To prove that the right side of (8.12) vanishes, first assume that f is continuous and f(−π) = f(π). Given ε > 0 there exists, by Theorem 8.1, a trigonometric polynomial Σ_{n=−N_0}^{N_0} d_n exp{inx} such that
$$\max_{x}\Big|f(x) - \sum_{n=-N_0}^{N_0} d_n e^{inx}\Big| < \varepsilon.$$
This implies
$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\Big|f(x) - \sum_{n=-N_0}^{N_0} d_n e^{inx}\Big|^2 dx < \varepsilon^2. \tag{8.13}$$
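But, by (8.9), f(x) − Σ_{n=−N_0}^{N_0} c_n exp{inx} is orthogonal to exp{imx} (m = 0, ±1, ..., ±N_0), so that
$$\begin{aligned}\frac{1}{2\pi}\int_{-\pi}^{\pi}\Big|f(x) - \sum_{n=-N_0}^{N_0} d_n e^{inx}\Big|^2 dx &= \frac{1}{2\pi}\int_{-\pi}^{\pi}\Big|f(x) - \sum_{n=-N_0}^{N_0} c_n e^{inx} + \sum_{n=-N_0}^{N_0}(c_n - d_n)e^{inx}\Big|^2 dx\\ &= \frac{1}{2\pi}\int_{-\pi}^{\pi}\Big|f(x) - \sum_{n=-N_0}^{N_0} c_n e^{inx}\Big|^2 dx + \frac{1}{2\pi}\int_{-\pi}^{\pi}\Big|\sum_{n=-N_0}^{N_0}(c_n - d_n)e^{inx}\Big|^2 dx.\end{aligned} \tag{8.14}$$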
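Hence, by (8.13) and (8.14),
$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\Big|f(x) - \sum_{n=-N_0}^{N_0} c_n e^{inx}\Big|^2 dx < \varepsilon^2,$$
and, since ‖f − Σ_{n=−N}^{N} c_n exp{inx}‖² decreases as N increases,
$$\lim_{N\to\infty}\Big\|f - \sum_{n=-N}^{N} c_n e^{inx}\Big\|^2 \le \varepsilon^2. \tag{8.15}$$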
Since ε > 0 is arbitrary, it follows that
$$\lim_{N\to\infty}\Big\|f - \sum_{n=-N}^{N} c_n e^{inx}\Big\| = 0, \tag{8.16}$$
and, by (8.12),
$$\|f\|^2 = \sum_{n=-\infty}^{\infty}|c_n|^2. \tag{8.17}$$
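Here is a numerical check of (8.17). It uses the normalization ‖g‖² = (1/(2π))∫_{−π}^{π}|g(x)|² dx, under which (8.17) reads ‖f‖² = Σ|c_n|². For f(x) = x, ‖f‖² = π²/3 and |c_n| = 1/|n| for n ≠ 0.

By orthogonality (8.2), the residual ‖f − Σ_{|n|≤N} c_n e^{inx}‖² equals ‖f‖² − Σ_{|n|≤N}|c_n|², which should decrease to 0. The sketch computes ‖f‖² by a midpoint rule (20000 points, an arbitrary choice) and uses the closed-form c_n.

```python
import math

def norm_squared(f, m=20000):
    """||f||^2 = (1/(2*pi)) * integral of |f(x)|^2 over [-pi, pi] (midpoint rule)."""
    h = 2 * math.pi / m
    s = sum(abs(f(-math.pi + (k + 0.5) * h)) ** 2 for k in range(m))
    return s * h / (2 * math.pi)

def partial_energy(N):
    """sum_{|n| <= N} |c_n|^2 for f(x) = x, using c_0 = 0 and |c_n| = 1/|n|."""
    return 2 * sum(1.0 / (n * n) for n in range(1, N + 1))

lhs = norm_squared(lambda x: x)  # approximately pi^2 / 3
for N in (10, 100, 1000, 10000):
    # residual ||f - S_N f||^2 = ||f||^2 - sum_{|n|<=N} |c_n|^2, which decreases to 0
    print(N, lhs - partial_energy(N))
```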
This completes the proof of convergence for continuous periodic f. Now it may be shown that given a square integrable f and ε > 0, there exists a continuous periodic g such that ‖f − g‖ < ε/2. Also, letting Σ d_n exp{inx} and Σ c_n exp{inx} be the Fourier series of g and f, respectively, there exists N_1 such that ‖g − Σ_{n=−N_1}^{N_1} d_n exp{inx}‖ …
Since the continuous functions Σ_{n=−N}^{N} c_n exp{inx} converge uniformly (as N → ∞) to Σ_{n=−∞}^{∞} c_n exp{inx}, the latter must be a continuous function, say h. Uniform convergence to h also implies convergence in norm to h. Since Σ_{n=−N}^{N} c_n exp{inx} also converges in norm to f, f(x) = h(x) for all x. For if the two continuous functions f and h are not identically equal, then
$$\int_{-\pi}^{\pi}|f(x) - h(x)|^2\,dx > 0,$$
which contradicts ‖f − h‖ = 0. ∎
For a finite measure (or a finite signed measure) μ on the circle [−π, π) (identifying −π and π), the nth Fourier coefficient of μ is defined by
$$c_n = \frac{1}{2\pi}\int_{[-\pi,\pi)} e^{-inx}\,\mu(dx) \qquad (n = 0, \pm1, \dots). \tag{8.23}$$
If μ has a density f, then (8.23) is the same as the nth Fourier coefficient of f given by (8.1). Proposition 8.3. A finite measure μ on the circle is determined by its Fourier coefficients. Proof. Approximate the measure μ(dx) by g_N(x) dx, where
$$g_N(x) := \int_{[-\pi,\pi)} k_N(x-y)\,\mu(dy) = \sum_{n=-N}^{N}\Big(1-\frac{|n|}{N+1}\Big)c_n e^{inx}, \tag{8.24}$$
with c_n defined by (8.23). For every continuous periodic function h (i.e., for every continuous function on the circle),
$$\int_{[-\pi,\pi)} h(x)\,g_N(x)\,dx = \int_{[-\pi,\pi)}\Big(\int_{[-\pi,\pi)} h(x)\,k_N(x-y)\,dx\Big)\mu(dy). \tag{8.25}$$
As N → ∞, the probability measure k_N(x − y) dx = k_N(y − x) dx on the circle converges weakly to δ_y(dx). Hence, the inner integral on the right side of (8.25) converges to h(y). Since the inner integral is bounded by sup{|h(y)|: y ∈ ℝ}, Lebesgue's Dominated Convergence Theorem implies that
$$\lim_{N\to\infty}\int_{[-\pi,\pi)} h(x)\,g_N(x)\,dx = \int_{[-\pi,\pi)} h(y)\,\mu(dy). \tag{8.26}$$
This means that μ is determined by {g_N : N ≥ 1}. The latter in turn are determined by {c_n}. ∎
We are now ready to answer an important question: When is a given sequence {c_n : n = 0, ±1, ...} the sequence of Fourier coefficients of a finite measure on the circle? A sequence of complex numbers {c_n : n = 0, ±1, ±2, ...} is said to be positive definite if for any finite sequence of complex numbers {z_j : 1 ≤ j ≤ N}, one has
$$\sum_{1\le j,k\le N} c_{j-k}\,z_j\,\bar z_k \ge 0. \tag{8.27}$$
Theorem 8.4. (Herglotz's Theorem). {c_n : n = 0, ±1, ...} is the sequence of Fourier coefficients of a probability measure on the circle if and only if it is positive definite, and c_0 = 1. Proof. (Necessity). If μ is a probability measure on the circle, and {z_j : 1 ≤ j ≤ N} a given finite sequence of complex numbers, then
$$\begin{aligned}\sum_{1\le j,k\le N} c_{j-k}\,z_j\bar z_k &= \frac{1}{2\pi}\sum_{1\le j,k\le N} z_j\bar z_k\int_{[-\pi,\pi)} e^{-i(j-k)x}\,\mu(dx) = \frac{1}{2\pi}\int_{[-\pi,\pi)}\Big(\sum_{j=1}^{N} z_j e^{-ijx}\Big)\Big(\sum_{k=1}^{N}\bar z_k e^{ikx}\Big)\mu(dx)\\ &= \frac{1}{2\pi}\int_{[-\pi,\pi)}\Big|\sum_{j=1}^{N} z_j e^{-ijx}\Big|^2\mu(dx) \ge 0.\end{aligned} \tag{8.28}$$
Also, c_0 = ∫ μ(dx) = 1.
(Sufficiency). Take z_j = exp{i(j − 1)x}, j = 1, 2, ..., N + 1 in (8.27) to get
$$g_N(x) := \frac{1}{N+1}\sum_{0\le j,k\le N} c_{j-k}\,e^{i(j-k)x} \ge 0. \tag{8.29}$$
… If μ({π}) > 0, one may change μ to μ′, where μ′({−π}) = μ({−π}) + μ({π}) and μ′ = μ on (−π, π), to get a probability measure μ′ on the circle whose Fourier coefficients are c_n. Note that (8.33) holds with μ replaced by μ′, because exp{−inπ} = exp{inπ}. ∎
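Identity (8.28) writes the quadratic form (8.27) as an integral of |Σ z_j e^{−ijx}|², which is why it is nonnegative. The sketch below checks this numerically using a hypothetical three-atom probability measure μ = Σ_k w_k δ_{x_k} on the circle. It evaluates (8.27) with the coefficients (8.23) for random complex z_j.

For contrast, it also evaluates the form for the sequence c_0 = 1, c_{±1} = 0.6, c_n = 0 otherwise. This sequence was chosen here for illustration. It violates (8.27), so by Theorem 8.4 it cannot be the coefficient sequence of a probability measure.

```python
import cmath
import math
import random

def measure_coeff(atoms):
    """Return n -> c_n from (8.23) for the discrete measure sum_k w_k * delta_{x_k}."""
    def c(n):
        return sum(w * cmath.exp(-1j * n * x) for x, w in atoms) / (2 * math.pi)
    return c

def quadratic_form(c, z):
    """Left side of (8.27): sum over j, k of c_{j-k} z_j conj(z_k)."""
    N = len(z)
    return sum(c(j - k) * z[j] * z[k].conjugate() for j in range(N) for k in range(N))

# Hypothetical probability measure with atoms at -1.0, 0.3, 2.5 (weights 0.5, 0.2, 0.3).
atoms = [(-1.0, 0.5), (0.3, 0.2), (2.5, 0.3)]
coeff = measure_coeff(atoms)
rng = random.Random(7)
values = []
for _ in range(50):
    z = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(8)]
    values.append(quadratic_form(coeff, z))
print(min(v.real for v in values), max(abs(v.imag) for v in values))

def bad(n):
    """c_0 = 1, c_{+1} = c_{-1} = 0.6, c_n = 0 otherwise (not positive definite)."""
    return {0: 1.0, 1: 0.6, -1: 0.6}.get(n, 0.0)

z_alt = [complex((-1) ** j) for j in range(10)]
print(quadratic_form(bad, z_alt))  # about -0.8, since 10 - 2 * 9 * 0.6 = -0.8
```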
Corollary 8.5. A sequence {c_n} of complex numbers is the sequence of Fourier coefficients of a finite measure on the circle [−π, π) if and only if {c_n} is positive definite. Proof. Since the measure μ = 0 has Fourier coefficients c_n = 0 for all n, and the latter trivially comprise a positive definite sequence, it is enough to prove the correspondence between nonzero positive definite sequences and nonzero finite measures. It follows from Theorem 8.4, by normalization, that this correspondence is one-to-one between positive definite sequences {c_n} with c_0 = c > 0 and measures on the circle having total mass c. ∎ The Fourier transform of an integrable (real- or complex-valued) function f on (−∞, ∞) is the function f̂ on (−∞, ∞) defined by
$$\hat f(\xi) = \int_{-\infty}^{\infty} e^{i\xi y}f(y)\,dy, \qquad -\infty < \xi < \infty.$$
… ε > 0 there exists a step function f_ε such that
$$\|f - f_\varepsilon\|_1 = \int_{-\infty}^{\infty}|f(y) - f_\varepsilon(y)|\,dy < \varepsilon.$$
… Since ε > 0 is arbitrary,
$$\hat f(\xi) \to 0 \quad\text{as } |\xi|\to\infty. \tag{8.37}$$
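The step-function approximation above reduces (8.37) to step functions, whose transforms are explicit. For f = Σ_k a_k 1_{[l_k, r_k)},

f̂(ξ) = Σ_k a_k (e^{iξr_k} − e^{iξl_k})/(iξ), so |f̂(ξ)| ≤ 2Σ_k|a_k|/|ξ|.

Combined with |f̂(ξ) − f̂_ε(ξ)| ≤ ‖f − f_ε‖_1 < ε, this gives the usual proof. The sketch below evaluates the explicit formula and the bound for a hypothetical step function.

```python
import cmath

def step_transform(steps, xi):
    """Transform (integral of e^{i xi y} f(y) dy) of f = sum_k a_k * 1_[l_k, r_k),
    in closed form; valid for xi != 0."""
    return sum(a * (cmath.exp(1j * xi * r) - cmath.exp(1j * xi * l)) / (1j * xi)
               for a, l, r in steps)

# Hypothetical step function, given as (a_k, l_k, r_k) triples.
steps = [(1.0, 0.0, 1.0), (-2.0, 1.5, 2.0), (0.5, -3.0, -1.0)]
bound = 2 * sum(abs(a) for a, _, _ in steps)  # |f_hat(xi)| <= bound / |xi|
for xi in (1.0, 10.0, 100.0, 1000.0):
    print(xi, abs(step_transform(steps, xi)), bound / xi)
```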
The property (8.37) is generally referred to as the Riemann-Lebesgue Lemma. If f is continuously differentiable and f, f′ are both integrable, then integration by parts yields
$$\widehat{f'}(\xi) = -i\xi\,\hat f(\xi). \tag{8.38}$$
The boundary terms in deriving (8.38) vanish, for if f′ is integrable (as well as f) then f(x) → 0 as x → ±∞. More generally, if f is r-times continuously differentiable and f^{(j)}, 0 ≤ j ≤ r, are all integrable, then one may repeat the relation (8.38) to get
$$\widehat{f^{(r)}}(\xi) = (-i\xi)^r\,\hat f(\xi). \tag{8.39}$$
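Relation (8.38) can be checked numerically for the Gaussian f(y) = e^{−y²/2}. Under the convention f̂(ξ) = ∫ e^{iξy} f(y) dy, its transform is √(2π) e^{−ξ²/2}.

The sketch approximates both transforms by a midpoint rule on [−12, 12]. The cutoff and grid size are arbitrary choices that suffice for this rapidly decaying f. It then compares the transform of f′ with −iξ f̂(ξ).

```python
import cmath
import math

def fourier_transform(f, xi, L=12.0, m=4000):
    """Midpoint-rule approximation of the integral of e^{i xi y} f(y) dy over [-L, L];
    adequate here because the integrands below decay like exp(-y^2/2)."""
    h = 2 * L / m
    total = 0j
    for k in range(m):
        y = -L + (k + 0.5) * h
        total += cmath.exp(1j * xi * y) * f(y)
    return total * h

def f(y):
    return math.exp(-y * y / 2)

def f_prime(y):
    return -y * math.exp(-y * y / 2)

for xi in (0.5, 1.0, 2.0):
    lhs = fourier_transform(f_prime, xi)       # transform of f'
    rhs = -1j * xi * fourier_transform(f, xi)  # right side of (8.38)
    print(xi, lhs, rhs)
```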
In particular, (8.39) implies that if f, f′, f″ are integrable then f̂ is integrable: |f̂(ξ)| ≤ ‖f‖_1, and, by (8.39) with r = 2, |f̂(ξ)| ≤ ‖f″‖_1/ξ². It is instructive to consider the Fourier transform as a limiting version of a Fourier series. For this purpose, suppose that f is differentiable and vanishes outside a finite interval, and that f′ is square integrable. Then, for all sufficiently large integers N, the function
$$g_N(x) := f(Nx)$$
vanishes outside (−π, π). Let
$$\sum_{n} c_{n,N}\,e^{inx}, \qquad \sum_{n} c'_{n,N}\,e^{inx} \tag{8.40}$$
be the Fourier series of g_N and its derivative g_N′, respectively. Then
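$$c_{n,N} = \frac{1}{2\pi}\int_{-\pi}^{\pi} g_N(x)\,e^{-inx}\,dx = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(Nx)\,e^{-inx}\,dx = \frac{1}{2N\pi}\int_{-N\pi}^{N\pi} f(y)\,e^{-iny/N}\,dy = \frac{1}{2N\pi}\,\hat f\Big(-\frac{n}{N}\Big). \tag{8.41}$$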
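Now write A = (2Σ_{n=1}^{∞} n^{−2})^{1/2}. Using c′_{n,N} = in c_{n,N}, the Cauchy–Schwarz inequality and (8.17),
$$\begin{aligned}\sum_{n}|c_{n,N}| &= |c_{0,N}| + \sum_{n\ne 0}\frac{1}{|n|}\,|n\,c_{n,N}| \le |c_{0,N}| + A\Big(\sum_{n\ne 0}|n\,c_{n,N}|^2\Big)^{1/2} = |c_{0,N}| + A\Big(\sum_{n\ne 0}|c'_{n,N}|^2\Big)^{1/2}\\ &\le \frac{1}{2\pi}\int_{-\pi}^{\pi}|g_N(x)|\,dx + A\Big(\frac{1}{2\pi}\int_{-\pi}^{\pi}|g_N'(x)|^2\,dx\Big)^{1/2} < \infty.\end{aligned}$$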
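Therefore, for all sufficiently large N, the following convergence is uniform:
$$f(z) = g_N\Big(\frac{z}{N}\Big) = \sum_{n=-\infty}^{\infty} c_{n,N}\,e^{inz/N} = \sum_{n=-\infty}^{\infty}\frac{1}{2N\pi}\,\hat f\Big(-\frac{n}{N}\Big)e^{inz/N}. \tag{8.42}$$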
Letting N → ∞ in (8.42), if f̂ ∈ L¹(ℝ, dx), one gets the Fourier inversion formula,
$$f(z) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\hat f(-y)\,e^{iyz}\,dy = \frac{1}{2\pi}\int_{-\infty}^{\infty}\hat f(\xi)\,e^{-i\xi z}\,d\xi. \tag{8.43}$$
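The right side of (8.42) is a Riemann sum, with mesh 1/N, for the first integral in (8.43). The sketch below takes the Gaussian f(y) = e^{−y²/2}, with f̂(ξ) = √(2π) e^{−ξ²/2}.

This f does not vanish outside a finite interval, so (8.42) is not exact for it. By the Poisson summation formula, the sum instead equals the periodization Σ_k f(z + 2πNk), apart from a negligible truncation error. The aliasing terms with k ≠ 0 shrink rapidly as N grows. The truncation |n| ≤ 60N is an arbitrary choice.

```python
import cmath
import math

def gaussian_hat(xi):
    """Transform of f(y) = exp(-y^2/2), with f_hat(xi) = integral of e^{i xi y} f(y) dy."""
    return math.sqrt(2 * math.pi) * math.exp(-xi * xi / 2)

def riemann_inversion(fhat, z, N, M):
    """Right side of (8.42), truncated to |n| <= M:
    (1/(2 N pi)) * sum_n fhat(-n/N) * exp(i n z / N)."""
    total = 0j
    for n in range(-M, M + 1):
        total += fhat(-n / N) * cmath.exp(1j * n * z / N)
    return total / (2 * N * math.pi)

z = 0.7
exact = math.exp(-z * z / 2)
for N in (1, 2, 4):
    approx = riemann_inversion(gaussian_hat, z, N, 60 * N)
    print(N, approx.real, exact, abs(approx - exact))
```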
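One may show that this formula holds for all f such that both f and f̂ are integrable. Next, any f that vanishes outside a finite interval and is square integrable is automatically integrable, and for such an f one has, for all sufficiently large N,
$$\sum_{n}|c_{n,N}|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi}|g_N(x)|^2\,dx = \frac{1}{2\pi N}\int_{-\infty}^{\infty}|f(y)|^2\,dy, \qquad \sum_{n}|c_{n,N}|^2 = \frac{1}{4N^2\pi^2}\sum_{n}\Big|\hat f\Big(-\frac{n}{N}\Big)\Big|^2,$$
so that
$$\frac{1}{N}\sum_{n=-\infty}^{\infty}\Big|\hat f\Big(\frac{n}{N}\Big)\Big|^2 = 2\pi\int_{-\infty}^{\infty}|f(y)|^2\,dy. \tag{8.44}$$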
Therefore, letting N ↑ ∞, one has the Plancherel identity
$$\int_{-\infty}^{\infty}|\hat f(\xi)|^2\,d\xi = 2\pi\int_{-\infty}^{\infty}|f(y)|^2\,dy. \tag{8.45}$$
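Identity (8.44) can be checked exactly for f = 1_{[0,1]}, which vanishes outside a finite interval and is square integrable. Here |f̂(ξ)|² = 4 sin²(ξ/2)/ξ² and 2π∫|f|² = 2π.

Since [0, 1] ⊂ (−Nπ, Nπ) for every N ≥ 1, (8.44) holds exactly for each N. The sketch evaluates a truncation of its left side at |n| ≤ M (M = 100000 is an arbitrary choice). The printed values should differ from 2π only by a truncation error roughly proportional to N/M.

```python
import math

def indicator_hat_sq(xi):
    """|f_hat(xi)|^2 for f = 1_[0,1]: 4 sin^2(xi/2) / xi^2, with the value 1 at xi = 0."""
    if xi == 0:
        return 1.0
    return 4 * math.sin(xi / 2) ** 2 / (xi * xi)

def lattice_energy(N, M):
    """(1/N) * sum_{|n| <= M} |f_hat(n/N)|^2: the left side of (8.44), truncated at |n| = M."""
    return sum(indicator_hat_sq(n / N) for n in range(-M, M + 1)) / N

target = 2 * math.pi  # 2*pi * integral of |f(y)|^2 dy, which is 2*pi for f = 1_[0,1]
for N in (1, 2, 8):
    print(N, lattice_energy(N, 100000), target)
```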
Now every square integrable f on (−∞, ∞) may be approximated in norm arbitrarily closely by a continuous function that vanishes outside a finite interval. Since (8.45) holds for the latter, it also holds for every f that is integrable as well as square integrable. We have, therefore, the following. Theorem 8.6. (a) If f and f̂ are both integrable, then the Fourier inversion formula (8.43) holds. (b) If f is integrable as well as square integrable, then the Plancherel identity (8.45) holds. Next define the Fourier transform μ̂ of a finite measure μ by setting
$$\hat\mu(\xi) = \int_{-\infty}^{\infty} e^{i\xi x}\,\mu(dx). \tag{8.46}$$
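When μ is a probability measure, (8.46) is its characteristic function. For the Poisson distribution with mean λ, mentioned above as infinitely divisible, summing the series gives μ̂(ξ) = exp(λ(e^{iξ} − 1)).

The sketch compares that closed form with the defining sum over the probability mass function, truncated at k = 80. The values λ = 3 and k = 80 are arbitrary choices.

```python
import cmath
import math

def poisson_cf_from_pmf(lam, xi, kmax=80):
    """mu_hat(xi) = sum_k e^{i xi k} P(X = k) for X ~ Poisson(lam), truncated at k = kmax."""
    p = math.exp(-lam)  # P(X = 0)
    total = 0j
    for k in range(kmax + 1):
        total += cmath.exp(1j * xi * k) * p
        p *= lam / (k + 1)  # P(X = k + 1) = P(X = k) * lam / (k + 1)
    return total

lam = 3.0
for xi in (0.0, 0.5, 2.0):
    closed_form = cmath.exp(lam * (cmath.exp(1j * xi) - 1))
    print(xi, poisson_cf_from_pmf(lam, xi), closed_form)
```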
If μ is a finite signed measure, i.e., μ = μ_1 − μ_2 where μ_1, μ_2 are finite measures, then one also defines μ̂ by (8.46) directly, or by setting μ̂ = μ̂_1 − μ̂_2. In particular, if μ(dx) = f(x) dx, where f is real-valued and integrable, then μ̂ = f̂. If μ is a probability measure, then μ̂ is also called the characteristic function of μ (or of any random variable whose distribution is μ). We next consider the convolution of two integrable functions f, g:
$$f*g(x) = \int_{-\infty}^{\infty} f(x-y)\,g(y)\,dy, \qquad -\infty < x < \infty.$$
… n ≥ 1: p_ii^(n) > 0}).
• p. 127, (6.7): Superscript N + 1 on the left side should be n + 1.
• p. 130, line 14 from bottom, should read: spectral radius is the largest magnitude of eigenvalues of A.
• p. 135, line below (8.2): "return" times should be "passage" times.
• p. 137, line 13 from top: The reference should be to Exercise 6 (not Exercise 7).
• p. 137, (8.15): Should be (2p − 1) in place of (1 − 2p).
• p. 140, (9.8): First sum should start with m = n + 1.
• p. 150, (10.11): Replace X_{[nt]+1} by f(X_{[nt]+1}).
• p. 173, line before (13.35): Replace "p-th row" by "last row".
• p. 194, Exc. 4(iv), should read: Zn = X, n 1'.
• p. 197, Exc. 6(c), last line should read: with transition law p and starting at i.
• p. 197, Exc. 7(i), should read: the largest magnitude of the eigenvalues of A.
• p. 200, Exc. 5, should read: A tree graph on r vertices v_1, ..., v_r is a connected graph that contains no cycles. [That is, ... unique sequence e_1, e_2, ..., e_r of edges e_i = {v_{k_i}, v_{m_i}} such that u ∈ e_1, e_i ∩ e_{i+1} ≠ ∅, i = 1, ..., r − 1.] The degree....
ERRATA
• p. 202, Exc. 1(ii): Insert n in hint: P(|Y| > nε i.o.).
• p. 210, line 7 from bottom: cEizi 1 should be EZ I (delete e and -1).
• p. 225, line 2 from bottom: Replace T.9.1 by T.9.2.
• p. 234, 1st line of (1.2), should read: p_{i,i+1} =
(w+r z)(2T-2>
• p. 234, 2nd line of (1.2), should read:
i(2r -i)
• p. 236, (2.16): Last term should be (1 - 6,
- ,Qy )p w
(w -'+z)(w +r-i)
(i.e., missing factor pvv ).
• p. 239, (3.6): Right side of 2nd equation in (3.6) should be π_k in place of j.
• p. 240, 6 lines from bottom: Replace first denominator (2 − r + j) in (3.14) by (w − r + j).
• p. 256, Exc. 4, should read: α_1 = e^{−λ_1} is the largest nontrivial (i.e., ≠ 1) eigenvalue of p; φ_1 is the eigenvector corresponding to α_1.
• p. 262, 2nd line of (1.4), should read: j < i.
• p. 278, lines 7, 8 from top: X_{T_1} should be X_{T_2}, and X_{T_n} should be X_{T_{n+1}}.
• p. 279, line 4 from top: Insert at the end: on the set {Y_k = i_k : k > 0}.
• p. 280, 2nd line of Proposition 5.6: Replace {Y_t} by {X_t}.
• p. 285, line 4 from bottom: The reference should be to Example 8.3 (not Example 8.2).
• p. 290, line 7 from top: A,,,, +1 should be A, +i.
• p. 293, 3rd line of (7.2): Capital T in center of inequality should be lowercase t.
• p. 302, line 6 from top: {V_t : 0 < u < t} should be {V_u : 0 < u < t}.
• p. 303, line 5 from top: o(T) should be o(1).
• p. 307, (8.15): The right side should be divided by A2.
• p. 307, 3rd line after (8.15): The right side of the display should be divided by A.
• p. 309, 12 lines from top: The second and third sums in display (8.26) are E
N) _ N ^0" (y N
ß
• p. 309, (8.27): Delete
N
pyrv , -y
,
(^
+p) I (a+jR) ° (a +N,3) wherever it appears in display (8.27).
• p. 325, (11.2): First sum is over n ∈ ℤ^d.
• p. 326, Figure 11.1: (σ_0(0)) should be (σ_n(0)).
• p. 329, (11.10): Insert d: Should be 2dp_mn in second line of display (11.10).
• p. 333, Exc. 3: p(u) > 0.
• p. 334, line 1 from top: Replace "positive" by "nonzero".
• p. 357, lines 3, 4 above (T.5.1): T_k = N should be T_{2k} = N and X_k = N should be X_{2k} = N.
• p. 357, (T.5.3) and 2nd line from bottom: Replace the upper limit of the sum by [N/2] + 1 (not [N/2]).
• p. 363, line 6 from bottom: Replace (S, 𝒯) by (E, 𝒢).
• p. 363, line 5 from bottom: Replace F by 𝒢.
• p. 364, line 8 from top: Replace F by 𝒢.
• p. 364, line 14 from bottom, should read: to y in the metric ρ as k → ∞, then a° ( "' k) (t) _ am (nk) ( 0 ) + a(0) ( 0 ) _ aD(7n)(t) a.s. as k → ∞. m(n,t) m(n,t)
• p. 364, line 12 from bottom, display should read: 1-2P.n , (o (t) = -1) = EQD""` ) (t) -f Ea n (t) = 1 - 2P.,(o n (t) = -1).
• p. 373, (2.12): tfH should be ‖f‖.
• p. 374, line 2 from top: l 1(y) - f (x) should be |f″(y) − f″(x)|.
• p. 375, line 3 from bottom: x = X. should be x = X_s.
• p. 380, (2.44): Delete square brackets on the right-hand side, and insert one parenthesis after −∂/∂x.
• p. 400, (6.27): Insert a minus sign in the exponential.
• p. 414, (8.38), 1st line: Replace by d d x.
• p. 439, 2nd line of (13.3): Replace f o fo by fo fo in first term and insert ds′ ds at the end of the line.
• p. 476, Exc. 4(v): σ² = γσ_0 should be σ² = γ²σ_0².
• p. 477, Exc. 5(i): Reference to (2.4) in place of (2.2).
• p. 485, line 9: Insert ds in the double integral.
• p. 487, Exc. 9(ii): The eigenvalues are 0, −1, −2, ... .
• p. 505, line 18 from bottom: "nonnegative" should be "nonpositive".
• p. 518, line 5 from top: The publisher is Wiley Eastern Division (not Eka Press).
• p. 564, line 20: In the definition of P-complete, replace N ∈ ℱ_t by N ∈ ℱ.
• p. 565, (1.9): dB_t should be dt.
• p. 566, (1.18): Should be P_α, not P_β, in (1.18).
• p. 569, line 2: Insert (β − α) to read α + n(β − α).