Hoel, Port, and Stone: Introduction to Stochastic Processes

The Houghton Mifflin Series in Statistics under the Editorship of Herman Chernoff
LEO BREIMAN
Probability and Stochastic Processes: With a View Toward Applications
Statistics: With a View Toward Applications

PAUL G. HOEL, SIDNEY C. PORT, AND CHARLES J. STONE
Introduction to Probability Theory
Introduction to Statistical Theory
Introduction to Stochastic Processes

PAUL F. LAZARSFELD AND NEIL W. HENRY
Latent Structure Analysis

GOTTFRIED E. NOETHER
Introduction to Statistics: A Fresh Approach

Y. S. CHOW, HERBERT ROBBINS, AND DAVID SIEGMUND
Great Expectations: The Theory of Optimal Stopping

I. RICHARD SAVAGE
Statistics: Uncertainty and Behavior
Introduction to Stochastic Processes

Paul G. Hoel
Sidney C. Port
Charles J. Stone
University of California, Los Angeles

HOUGHTON MIFFLIN COMPANY
BOSTON
New York   Dallas   Atlanta   Geneva, Illinois   Palo Alto
COPYRIGHT © 1972 BY HOUGHTON MIFFLIN COMPANY.

All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, or by any information storage or retrieval system, without permission in writing from the publisher.

PRINTED IN THE U.S.A.
LIBRARY OF CONGRESS CATALOG CARD NUMBER: 79-165035
ISBN: 0-395-12076-4
General Preface

This three-volume series grew out of a three-quarter course in probability, statistics, and stochastic processes taught for a number of years at UCLA. We felt a need for a series of books that would treat these subjects in a way that is well coordinated, but which would also give adequate emphasis to each subject as being interesting and useful on its own merits.

The first volume, Introduction to Probability Theory, presents the fundamental ideas of probability theory and also prepares the student both for courses in statistics and for further study in probability theory, including stochastic processes.

The second volume, Introduction to Statistical Theory, develops the basic theory of mathematical statistics in a systematic, unified manner. Together, the first two volumes contain the material that is often covered in a two-semester course in mathematical statistics.

The third volume, Introduction to Stochastic Processes, treats Markov chains, Poisson processes, birth and death processes, Gaussian processes, Brownian motion, and processes defined in terms of Brownian motion by means of elementary stochastic differential equations.
Preface

In recent years there has been an ever increasing interest in the study of systems which vary in time in a random manner. Mathematical models of such systems are known as stochastic processes. In this book we present an elementary account of some of the important topics in the theory of such processes. We have tried to select topics that are conceptually interesting and that have found fruitful application in various branches of science and technology.

A stochastic process can be defined quite generally as any collection of random variables X(t), t ∈ T, defined on a common probability space, where T is a subset of (−∞, ∞) and is thought of as the time parameter set. The process is called a continuous parameter process if T is an interval having positive length and a discrete parameter process if T is a subset of the integers. If the random variables X(t) all take on values from the fixed set 𝒮, then 𝒮 is called the state space of the process.

Many stochastic processes of theoretical and applied interest possess the property that, given the present state of the process, the past history does not affect conditional probabilities of events defined in terms of the future. Such processes are called Markov processes. In Chapters 1 and 2 we study Markov chains, which are discrete parameter Markov processes whose state space is finite or countably infinite. In Chapter 3 we study the corresponding continuous parameter processes, with the "Poisson process" as a special case.

In Chapters 4-6 we discuss continuous parameter processes whose state space is typically the real line. In Chapter 4 we introduce Gaussian processes, which are characterized by the property that every linear combination involving a finite number of the random variables X(t), t ∈ T, is normally distributed. As an important special case, we discuss the Wiener process, which arises as a mathematical model for the physical phenomenon known as "Brownian motion."

In Chapter 5 we discuss integration and differentiation of stochastic processes. There we also use the Wiener process to give a mathematical model for "white noise." In Chapter 6 we discuss solutions to nonhomogeneous ordinary differential equations having constant coefficients whose right-hand side is either a stochastic process or white noise. We also discuss estimation problems involving stochastic processes, and briefly consider the "spectral distribution" of a process.
This text has been designed for a one-semester course in stochastic processes. Written in close conjunction with Introduction to Probability Theory, the first volume of our three-volume series, it assumes that the student is acquainted with the material covered in a one-semester course in probability for which elementary calculus is a prerequisite.

Some of the proofs in Chapters 1 and 2 are somewhat more difficult than the rest of the text, and they appear in appendices to these chapters. These proofs and the starred material in Section 2.6 probably should be omitted or discussed only briefly in an elementary course.

An instructor using this text in a one-quarter course will probably not have time to cover the entire text. He may wish to cover the first three chapters thoroughly and the remainder as time permits, perhaps discussing those topics in the last three chapters that involve the Wiener process. Another option, however, is to emphasize continuous parameter processes by omitting or skimming Chapters 1 and 2 and concentrating on Chapters 3-6. (For example, the instructor could skip Sections 1.6.1, 1.6.2, 1.9, 2.2.2, 2.5.1, 2.6.1, and 2.8.) With some aid from the instructor, the student should be able to read Chapter 3 without having studied the first two chapters thoroughly. Chapters 4-6 are independent of the first two chapters and depend on Chapter 3 only in minor ways, mainly in that the Poisson process introduced in Chapter 3 is used in examples in the later chapters. The properties of the Poisson process that are needed later are summarized in Chapter 4 and can be regarded as axioms for the Poisson process.

The authors wish to thank the UCLA students who tolerated preliminary versions of this text and whose comments resulted in numerous improvements. Mr. Luis Gorostiza obtained the answers to the exercises and also made many suggestions that resulted in significant improvements. Finally, we wish to thank Mrs. Ruth Goldstein for her excellent typing.
Table of Contents

1  Markov Chains  1
   1.1  Markov chains having two states  2
   1.2  Transition function and initial distribution  5
   1.3  Examples  6
   1.4  Computations with transition functions  12
        1.4.1  Hitting times  14
        1.4.2  Transition matrix  16
   1.5  Transient and recurrent states  17
   1.6  Decomposition of the state space  21
        1.6.1  Absorption probabilities  25
        1.6.2  Martingales  27
   1.7  Birth and death chains  29
   1.8  Branching and queuing chains  33
        1.8.1  Branching chain  34
        1.8.2  Queuing chain  36
   1.9  Appendix: proof of results for the branching and queuing chains  36
        1.9.1  Branching chain  38
        1.9.2  Queuing chain  39

2  Stationary Distributions of a Markov Chain  47
   2.1  Elementary properties of stationary distributions  47
   2.2  Examples  49
        2.2.1  Birth and death chain  50
        2.2.2  Particles in a box  53
   2.3  Average number of visits to a recurrent state  56
   2.4  Null recurrent and positive recurrent states  60
   2.5  Existence and uniqueness of stationary distributions  63
        2.5.1  Reducible chains  67
   2.6  Queuing chain  69
        2.6.1  Proof  70
   2.7  Convergence to the stationary distribution  72
   2.8  Appendix: proof of convergence  75
        2.8.1  Periodic case  77
        2.8.2  A result from number theory  79

3  Markov Pure Jump Processes  84
   3.1  Construction of jump processes  84
   3.2  Birth and death processes  89
        3.2.1  Two-state birth and death process  92
        3.2.2  Poisson process  94
        3.2.3  Pure birth process  98
        3.2.4  Infinite server queue  99
   3.3  Properties of a Markov pure jump process  102
        3.3.1  Applications to birth and death processes  104

4  Second Order Processes  111
   4.1  Mean and covariance functions  111
   4.2  Gaussian processes  119
   4.3  The Wiener process  122

5  Continuity, Integration, and Differentiation of Second Order Processes  128
   5.1  Continuity assumptions  128
        5.1.1  Continuity of the mean and covariance functions  128
        5.1.2  Continuity of the sample functions  130
   5.2  Integration  132
   5.3  Differentiation  135
   5.4  White noise  141

6  Stochastic Differential Equations, Estimation Theory, and Spectral Distributions  152
   6.1  First order differential equations  154
   6.2  Differential equations of order n  159
        6.2.1  The case n = 2  166
   6.3  Estimation theory  170
        6.3.1  General principles of estimation  173
        6.3.2  Some examples of optimal prediction  174
   6.4  Spectral distribution  177

Answers to Exercises  190
Glossary of Notation  199
Index  201
1
Markov Chains
Consider a system that can be in any one of a finite or countably infinite number of states. Let 𝒮 denote this set of states. We can assume that 𝒮 is a subset of the integers. The set 𝒮 is called the state space of the system. Let the system be observed at the discrete moments of time n = 0, 1, 2, . . . , and let Xn denote the state of the system at time n. Since we are interested in non-deterministic systems, we think of Xn, n ≥ 0, as random variables defined on a common probability space. Little can be said about such random variables unless some additional structure is imposed upon them.

The simplest possible structure is that of independent random variables. This would be a good model for such systems as repeated experiments in which future states of the system are independent of past and present states. In most systems that arise in practice, however, past and present states of the system influence the future states even if they do not uniquely determine them.

Many systems have the property that given the present state, the past states have no influence on the future. This property is called the Markov property, and systems having this property are called Markov chains. The Markov property is defined precisely by the requirement that

(1)  P(Xn+1 = xn+1 | X0 = x0, . . . , Xn = xn) = P(Xn+1 = xn+1 | Xn = xn)

for every choice of the nonnegative integer n and the numbers x0, . . . , xn+1, each in 𝒮. The conditional probabilities P(Xn+1 = y | Xn = x) are called the transition probabilities of the chain. In this book we will study Markov chains having stationary transition probabilities, i.e., those such that P(Xn+1 = y | Xn = x) is independent of n. From now on, when we say that Xn, n ≥ 0, forms a Markov chain, we mean that these random variables satisfy the Markov property and have stationary transition probabilities.

The study of such Markov chains is worthwhile from two viewpoints. First, they have a rich theory, much of which can be presented at an elementary level. Secondly, there are a large number of systems arising in practice that can be modeled by Markov chains, so the subject has many useful applications.
In order to help motivate the general results that will be discussed later, we begin by considering Markov chains having only two states.

1.1. Markov chains having two states
For an example of a Markov chain having two states, consider a machine that at the start of any particular day is either broken down or in operating condition. Assume that if the machine is broken down at the start of the nth day, the probability is p that it will be successfully repaired and in operating condition at the start of the (n + 1)th day. Assume also that if the machine is in operating condition at the start of the nth day, the probability is q that it will have a failure causing it to be broken down at the start of the (n + 1)th day. Finally, let π0(0) denote the probability that the machine is broken down initially, i.e., at the start of the 0th day.

Let the state 0 correspond to the machine being broken down and let the state 1 correspond to the machine being in operating condition. Let Xn be the random variable denoting the state of the machine at time n. According to the above description,

P(Xn+1 = 1 | Xn = 0) = p,
P(Xn+1 = 0 | Xn = 1) = q,

and

P(X0 = 0) = π0(0).

Since there are only two states, 0 and 1, it follows immediately that

P(Xn+1 = 0 | Xn = 0) = 1 − p,
P(Xn+1 = 1 | Xn = 1) = 1 − q,

and that the probability π0(1) of being initially in state 1 is given by

π0(1) = P(X0 = 1) = 1 − π0(0).

From this information, we can easily compute P(Xn = 0) and P(Xn = 1). We observe that

P(Xn+1 = 0) = P(Xn = 0 and Xn+1 = 0) + P(Xn = 1 and Xn+1 = 0)
  = P(Xn = 0)P(Xn+1 = 0 | Xn = 0) + P(Xn = 1)P(Xn+1 = 0 | Xn = 1)
  = (1 − p)P(Xn = 0) + qP(Xn = 1)
  = (1 − p)P(Xn = 0) + q(1 − P(Xn = 0))
  = (1 − p − q)P(Xn = 0) + q.
Now P(X0 = 0) = π0(0), so

P(X1 = 0) = (1 − p − q)π0(0) + q

and

P(X2 = 0) = (1 − p − q)P(X1 = 0) + q
  = (1 − p − q)^2 π0(0) + q[1 + (1 − p − q)].

It is easily seen by repeating this procedure n times that

(2)  P(Xn = 0) = (1 − p − q)^n π0(0) + q Σ_{i=0}^{n−1} (1 − p − q)^i.

In the trivial case p = q = 0, it is clear that for all n

P(Xn = 0) = π0(0)  and  P(Xn = 1) = π0(1).

Suppose now that p + q > 0. Then by the formula for the sum of a finite geometric progression,

Σ_{i=0}^{n−1} (1 − p − q)^i = (1 − (1 − p − q)^n) / (p + q).

We conclude from (2) that

(3)  P(Xn = 0) = q/(p + q) + (1 − p − q)^n (π0(0) − q/(p + q)),

and consequently that

(4)  P(Xn = 1) = p/(p + q) + (1 − p − q)^n (π0(1) − p/(p + q)).
Suppose that p and q are neither both equal to zero nor both equal to one. Then 0 < p + q < 2, which implies that |1 − p − q| < 1. In this case we can let n → ∞ in (3) and (4) and conclude that

(5)  lim_{n→∞} P(Xn = 0) = q/(p + q)  and  lim_{n→∞} P(Xn = 1) = p/(p + q).
We can also obtain the probabilities q/(p + q) and p/(p + q) by a different approach. Suppose we want to choose π0(0) and π0(1) so that P(Xn = 0) and P(Xn = 1) are independent of n. It is clear from (3) and (4) that to do this we should choose

π0(0) = q/(p + q)  and  π0(1) = p/(p + q).

Thus we see that if Xn, n ≥ 0, starts out with the initial distribution

P(X0 = 0) = q/(p + q)  and  P(X0 = 1) = p/(p + q),

then for all n

P(Xn = 0) = q/(p + q)  and  P(Xn = 1) = p/(p + q).
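The recursion, the closed form (3), and the limits (5) are easy to check numerically. Below is a minimal sketch; the parameter values p = 0.3, q = 0.1, and π0(0) = 0.9 are illustrative choices, not from the text.

```python
# Two-state chain of Section 1.1: state 0 = broken down, state 1 = operating.
# Illustrative parameters (not from the text).
p, q, pi0 = 0.3, 0.1, 0.9   # pi0 = P(X0 = 0)

def prob0_recursive(n):
    """P(Xn = 0) by iterating P(X_{m+1} = 0) = (1 - p - q) P(X_m = 0) + q."""
    u = pi0
    for _ in range(n):
        u = (1 - p - q) * u + q
    return u

def prob0_closed(n):
    """P(Xn = 0) by the closed form (3)."""
    s = p + q
    return q / s + (1 - s) ** n * (pi0 - q / s)

# The recursion and the closed form agree.
for n in (0, 1, 5, 50):
    assert abs(prob0_recursive(n) - prob0_closed(n)) < 1e-12

# For large n both approach the limit q/(p + q) from (5).
print(prob0_closed(200), q / (p + q))
```

Starting instead from π0(0) = q/(p + q) leaves prob0_recursive(n) equal to q/(p + q) for every n, in agreement with the stationary initial distribution found above.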
The description of the machine is vague because it does not really say whether Xn, n ≥ 0, can be assumed to satisfy the Markov property. Let us suppose, however, that the Markov property does hold. We can use this added information to compute the joint distribution of X0, X1, . . . , Xn. For example, let n = 2 and let x0, x1, and x2 each equal 0 or 1. Then

P(X0 = x0, X1 = x1, and X2 = x2)
  = P(X0 = x0 and X1 = x1)P(X2 = x2 | X0 = x0 and X1 = x1)
  = P(X0 = x0)P(X1 = x1 | X0 = x0)P(X2 = x2 | X0 = x0 and X1 = x1).

Now P(X0 = x0) and P(X1 = x1 | X0 = x0) are determined by p, q, and π0(0); but without the Markov property, we cannot evaluate P(X2 = x2 | X0 = x0 and X1 = x1) in terms of p, q, and π0(0). If the Markov property is satisfied, however, then

P(X2 = x2 | X0 = x0 and X1 = x1) = P(X2 = x2 | X1 = x1),

which is determined by p and q. In this case

P(X0 = x0, X1 = x1, and X2 = x2)
  = P(X0 = x0)P(X1 = x1 | X0 = x0)P(X2 = x2 | X1 = x1).

For example,

P(X0 = 0, X1 = 1, and X2 = 0) = P(X0 = 0)P(X1 = 1 | X0 = 0)P(X2 = 0 | X1 = 1)
  = π0(0)pq.

The reader should check the remaining entries in the following table, which gives the joint distribution of X0, X1, and X2.

x0   x1   x2   P(X0 = x0, X1 = x1, and X2 = x2)
0    0    0    π0(0)(1 − p)^2
0    0    1    π0(0)(1 − p)p
0    1    0    π0(0)pq
0    1    1    π0(0)p(1 − q)
1    0    0    (1 − π0(0))q(1 − p)
1    0    1    (1 − π0(0))qp
1    1    0    (1 − π0(0))(1 − q)q
1    1    1    (1 − π0(0))(1 − q)^2
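One way to check the table is to generate all eight products mechanically. The sketch below does so with illustrative values of p, q, and π0(0) (not from the text); the eight probabilities should sum to 1, and the (0, 1, 0) entry should match the value π0(0)pq computed above.

```python
from itertools import product

# Illustrative parameter values (not from the text).
p, q, pi00 = 0.3, 0.1, 0.9   # pi00 = pi_0(0)

def step(x, y):
    """One-step transition probability P(x, y) of the machine chain."""
    return {(0, 0): 1 - p, (0, 1): p, (1, 0): q, (1, 1): 1 - q}[(x, y)]

def initial(x):
    """Initial distribution pi_0(x)."""
    return pi00 if x == 0 else 1 - pi00

# Joint distribution of (X0, X1, X2) under the Markov property.
table = {(x0, x1, x2): initial(x0) * step(x0, x1) * step(x1, x2)
         for x0, x1, x2 in product((0, 1), repeat=3)}

assert abs(sum(table.values()) - 1.0) < 1e-12
assert abs(table[(0, 1, 0)] - pi00 * p * q) < 1e-12   # the entry computed in the text
```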
1.2. Transition function and initial distribution
Let Xn, n ≥ 0, be a Markov chain having state space 𝒮. (The restriction to two states is now dropped.) The function P(x, y), x ∈ 𝒮 and y ∈ 𝒮, defined by

(6)  P(x, y) = P(X1 = y | X0 = x),  x, y ∈ 𝒮,

is called the transition function of the chain. It is such that

(7)  P(x, y) ≥ 0,  x, y ∈ 𝒮,

and

(8)  Σ_y P(x, y) = 1,  x ∈ 𝒮.
Since the Markov chain has stationary transition probabilities, we see that

(9)  P(Xn+1 = y | Xn = x) = P(x, y),  n ≥ 1.

It now follows from the Markov property that

(10)  P(Xn+1 = y | X0 = x0, . . . , Xn−1 = xn−1, Xn = x) = P(x, y).

In other words, if the Markov chain is in state x at time n, then no matter how it got to x, it has probability P(x, y) of being in state y at the next step. For this reason the numbers P(x, y) are called the one-step transition probabilities of the Markov chain.

The function π0(x), x ∈ 𝒮, defined by

(11)  π0(x) = P(X0 = x),

is called the initial distribution of the chain. It is such that

(12)  π0(x) ≥ 0,  x ∈ 𝒮,

and

(13)  Σ_x π0(x) = 1.
The joint distribution of X0, . . . , Xn can easily be expressed in terms of the transition function and the initial distribution. For example,

P(X0 = x0, X1 = x1) = P(X0 = x0)P(X1 = x1 | X0 = x0)
  = π0(x0)P(x0, x1).

Also,

P(X0 = x0, X1 = x1, X2 = x2)
  = P(X0 = x0, X1 = x1)P(X2 = x2 | X0 = x0, X1 = x1)
  = π0(x0)P(x0, x1)P(X2 = x2 | X0 = x0, X1 = x1).

Since Xn, n ≥ 0, satisfies the Markov property and has stationary transition probabilities, we see that

P(X2 = x2 | X0 = x0, X1 = x1) = P(X2 = x2 | X1 = x1) = P(X1 = x2 | X0 = x1) = P(x1, x2).

Thus

P(X0 = x0, X1 = x1, X2 = x2) = π0(x0)P(x0, x1)P(x1, x2).

By induction it is easily seen that

(14)  P(X0 = x0, . . . , Xn = xn) = π0(x0)P(x0, x1) · · · P(xn−1, xn).

It is usually more convenient, however, to reverse the order of our definitions. We say that P(x, y), x ∈ 𝒮 and y ∈ 𝒮, is a transition function if it satisfies (7) and (8), and we say that π0(x), x ∈ 𝒮, is an initial distribution if it satisfies (12) and (13). It can be shown that given any transition function P and any initial distribution π0, there is a probability space and random variables Xn, n ≥ 0, defined on that space satisfying (14). It is not difficult to show that these random variables form a Markov chain having transition function P and initial distribution π0.

The reader may be bothered by the possibility that some of the conditional probabilities we have discussed may not be well defined. For example, the left side of (1) is not well defined if

P(X0 = x0, . . . , Xn = xn) = 0.

This difficulty is easily resolved. Equations (7), (8), (12), and (13) defining the transition functions and the initial distributions are well defined, and Equation (14) describing the joint distribution of X0, . . . , Xn is well defined. It is not hard to show that if (14) holds, then (1), (6), (9), and (10) hold whenever the conditional probabilities in the respective equations are well defined. The same qualification holds for other equations involving conditional probabilities that will be obtained later.

It will soon be apparent that the transition function of a Markov chain plays a much greater role in describing its properties than does the initial distribution. For this reason it is customary to study simultaneously all Markov chains having a given transition function. In fact we adhere to the usual convention that by "a Markov chain having transition function P," we really mean the family of all Markov chains having that transition function.
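The existence result quoted above can be made concrete by simulation: given a transition function P and an initial distribution π0 on a finite state space, the sketch below draws a path X0, . . . , Xn whose joint probabilities are exactly the products in (14). The two-state machine chain, with illustrative parameters not fixed by the text, serves as the example.

```python
import random

random.seed(0)

# Two-state machine chain with illustrative parameters (not from the text).
p, q = 0.3, 0.1
P = {0: {0: 1 - p, 1: p}, 1: {0: q, 1: 1 - q}}   # transition function
pi0 = {0: 0.9, 1: 0.1}                            # initial distribution

def draw(dist):
    """Draw a state from a finite distribution {state: probability}."""
    u, acc = random.random(), 0.0
    for state, prob in dist.items():
        acc += prob
        if u < acc:
            return state
    return state  # guard against floating-point rounding

def sample_path(n):
    """Sample X0, ..., Xn; by construction the joint probabilities follow (14)."""
    path = [draw(pi0)]
    for _ in range(n):
        path.append(draw(P[path[-1]]))
    return path

print(sample_path(10))
```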
1.3. Examples

In this section we will briefly describe several interesting examples of Markov chains. These examples will be further developed in the sequel.

Example 1. Random walk. Let ξ1, ξ2, . . . be independent integer-valued random variables having common density f. Let X0 be an integer-valued random variable that is independent of the ξi's and set

Xn = X0 + ξ1 + · · · + ξn.

The sequence Xn, n ≥ 0, is called a random walk. It is a Markov chain whose state space is the integers and whose transition function is given by
P(x, y) = f(y − x).

To verify this, let π0 denote the distribution of X0. Then

P(X0 = x0, . . . , Xn = xn)
  = P(X0 = x0, ξ1 = x1 − x0, . . . , ξn = xn − xn−1)
  = P(X0 = x0)P(ξ1 = x1 − x0) · · · P(ξn = xn − xn−1)
  = π0(x0)f(x1 − x0) · · · f(xn − xn−1)
  = π0(x0)P(x0, x1) · · · P(xn−1, xn),

and thus (14) holds.

Suppose a "particle" moves along the integers according to this Markov chain. Whenever the particle is in x, regardless of how it got there, it jumps to state y with probability f(y − x).

As a special case, consider a simple random walk in which f(1) = p, f(−1) = q, and f(0) = r, where p, q, and r are nonnegative and sum to one. The transition function is given by

P(x, y) = p,  y = x + 1,
          q,  y = x − 1,
          r,  y = x,
          0,  elsewhere.

Let a particle undergo such a random walk. If the particle is in state x at a given observation, then by the next observation it will have jumped to state x + 1 with probability p and to state x − 1 with probability q; with probability r it will still be in state x.
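The simple random walk is easy to simulate directly from its increment density f. A minimal sketch, with the illustrative choice p = q = 0.5, r = 0 (not fixed by the text):

```python
import random

random.seed(1)

# Simple random walk: f(1) = p, f(-1) = q, f(0) = r. Illustrative values.
p, q, r = 0.5, 0.5, 0.0

def walk(x0, n):
    """Return the path X0, ..., Xn of a simple random walk started at x0."""
    x, path = x0, [x0]
    for _ in range(n):
        u = random.random()
        x += 1 if u < p else (-1 if u < p + q else 0)
        path.append(x)
    return path

path = walk(0, 10)
print(path)
# Successive states differ by +1, -1, or 0, matching the transition function.
assert all(b - a in (-1, 0, 1) for a, b in zip(path, path[1:]))
```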
Example 2. Ehrenfest chain. The following is a simple model of the exchange of heat or of gas molecules between two isolated bodies. Suppose we have two boxes, labeled 1 and 2, and d balls labeled 1, 2, . . . , d. Initially some of these balls are in box 1 and the remainder are in box 2. An integer is selected at random from 1, 2, . . . , d, and the ball labeled by that integer is removed from its box and placed in the opposite box. This procedure is repeated indefinitely with the selections being independent from trial to trial. Let Xn denote the number of balls in box 1 after the nth trial. Then Xn, n ≥ 0, is a Markov chain on 𝒮 = {0, 1, 2, . . . , d}.
The transition function of this Markov chain is easily computed. Suppose that there are x balls in box 1 at time n. Then with probability x/d the ball drawn on the (n + 1)th trial will be from box 1 and will be transferred to box 2. In this case there will be x − 1 balls in box 1 at time n + 1. Similarly, with probability (d − x)/d the ball drawn on the (n + 1)th trial will be from box 2 and will be transferred to box 1, resulting in x + 1 balls in box 1 at time n + 1. Thus the transition function of this Markov chain is given by

P(x, y) = x/d,      y = x − 1,
          1 − x/d,  y = x + 1,
          0,        elsewhere.

Note that the Ehrenfest chain can in one transition only go from state x to x − 1 or x + 1 with positive probability.

A state a of a Markov chain is called an absorbing state if P(a, a) = 1 or, equivalently, if P(a, y) = 0 for y ≠ a. The next example uses this definition.
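The Ehrenfest transition function is simple enough to write down and sanity-check in a few lines. A sketch with d = 5, an arbitrary illustrative choice:

```python
d = 5  # number of balls; illustrative value

def P(x, y):
    """Ehrenfest chain transition function on {0, 1, ..., d}."""
    if y == x - 1:
        return x / d
    if y == x + 1:
        return 1 - x / d
    return 0.0

# Each row of P sums to one, as required by (8).
for x in range(d + 1):
    assert abs(sum(P(x, y) for y in range(d + 1)) - 1.0) < 1e-12

# From state x the chain can only move to x - 1 or x + 1.
assert P(2, 2) == 0.0 and P(2, 1) == 2 / 5 and P(2, 3) == 3 / 5
```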
Example 3. Gambler's ruin chain. Suppose a gambler starts out with a certain initial capital in dollars and makes a series of one dollar bets against the house. Assume that he has respective probabilities p and q = 1 − p of winning and losing each bet, and that if his capital ever reaches zero, he is ruined and his capital remains zero thereafter. Let Xn, n ≥ 0, denote the gambler's capital at time n. This is a Markov chain in which 0 is an absorbing state, and for x ≥ 1

(15)  P(x, y) = q,  y = x − 1,
              p,  y = x + 1,
              0,  elsewhere.

Such a chain is called a gambler's ruin chain on 𝒮 = {0, 1, 2, . . .}. We can modify this model by supposing that if the capital of the gambler increases to d dollars he quits playing. In this case 0 and d are both absorbing states, and (15) holds for x = 1, . . . , d − 1.
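Absorption probabilities for the modified chain on {0, 1, . . . , d} are easy to estimate by simulation. The sketch below, with the illustrative parameters p = 0.4, d = 10, and initial capital 5 (none fixed by the text), plays many independent games and counts how often the gambler goes broke; with these parameters the estimate comes out near 0.88.

```python
import random

random.seed(2)

def ruin_probability(p, d, x, trials=20000):
    """Estimate the probability of absorption at 0 for a gambler's ruin chain
    on {0, ..., d} with win probability p and starting capital x."""
    ruined = 0
    for _ in range(trials):
        capital = x
        while 0 < capital < d:
            capital += 1 if random.random() < p else -1
        ruined += capital == 0
    return ruined / trials

# Illustrative parameters (not from the text).
est = ruin_probability(p=0.4, d=10, x=5)
print(est)
```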
For an alternative interpretation of the latter chain, we can assume that two gamblers are making a series of one dollar bets against each other and that between them they have a total capital of d dollars. Suppose the first gambler has probability p of winning any given bet, and the second gambler has probability q = 1 − p of winning. The two gamblers play until one of them goes broke. Let Xn denote the capital of the first gambler at time n. Then Xn, n ≥ 0, is a gambler's ruin chain on {0, 1, . . . , d}.

Example 4. Birth and death chain. Consider a Markov chain either on 𝒮 = {0, 1, 2, . . .} or on 𝒮 = {0, 1, . . . , d} such that starting from x the chain will be at x − 1, x, or x + 1 after one step. The transition function of such a chain is given by

P(x, y) = qx,  y = x − 1,
          rx,  y = x,
          px,  y = x + 1,
          0,   elsewhere,

where px, qx, and rx are nonnegative numbers such that px + qx + rx = 1.

The Ehrenfest chain and the two versions of the gambler's ruin chain are examples of birth and death chains. The phrase "birth and death" stems from applications in which the state of the chain is the population of some living system. In these applications a transition from state x to state x + 1 corresponds to a "birth," while a transition from state x to state x − 1 corresponds to a "death."

In Chapter 3 we will study birth and death processes. These processes are similar to birth and death chains, except that jumps are allowed to occur at arbitrary times instead of just at integer times. In most applications, the models discussed in Chapter 3 are more realistic than those obtainable by using birth and death chains.
Example 5. Queuing chain. Consider a service facility such as a checkout counter at a supermarket. People arrive at the facility at various times and are eventually served. Those customers that have arrived at the facility but have not yet been served form a waiting line or queue. There are a variety of models to describe such systems. We will consider here only one very simple and somewhat artificial model; others will be discussed in Chapter 3.

Let time be measured in convenient periods, say in minutes. Suppose that if there are any customers waiting for service at the beginning of any given period, exactly one customer will be served during that period, and that if there are no customers waiting for service at the beginning of a period, none will be served during that period. Let ξn denote the number of new customers arriving during the nth period. We assume that ξ1, ξ2, . . . are independent nonnegative integer-valued random variables having common density f.
Let X0 denote the number of customers present initially, and for n ≥ 1, let Xn denote the number of customers present at the end of the nth period. If Xn = 0, then Xn+1 = ξn+1; and if Xn ≥ 1, then Xn+1 = Xn + ξn+1 − 1. It follows without difficulty from the assumptions on ξn, n ≥ 1, that Xn, n ≥ 0, is a Markov chain whose state space is the nonnegative integers and whose transition function P is given by

P(0, y) = f(y)

and

P(x, y) = f(y − x + 1),  x ≥ 1.
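The queue dynamics just described translate directly into a simulation. The sketch below uses an illustrative arrival density f on {0, 1, 2} (the text leaves f unspecified):

```python
import random

random.seed(3)

# Illustrative arrival density f on {0, 1, 2} (not specified in the text).
arrivals = [0, 1, 2]
f = [0.5, 0.3, 0.2]

def queue_path(x0, n):
    """Simulate the queuing chain: X_{n+1} = xi_{n+1} if X_n = 0,
    and X_{n+1} = X_n + xi_{n+1} - 1 if X_n >= 1."""
    x, path = x0, [x0]
    for _ in range(n):
        xi = random.choices(arrivals, weights=f)[0]
        x = xi if x == 0 else x + xi - 1
        path.append(x)
    return path

print(queue_path(2, 15))
```

Since this f has mean 0.7 < 1, arrivals are on average slower than service and the simulated queue tends to hover near empty rather than grow.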
Example 6. Branching chain. Consider particles such as neutrons or bacteria that can generate new particles of the same type. The initial set of objects is referred to as belonging to the 0th generation. Particles generated from the nth generation are said to belong to the (n + 1)th generation. Let Xn, n ≥ 0, denote the number of particles in the nth generation.

Nothing in this description requires that the various particles in a generation give rise to new particles simultaneously. Indeed at a given time, particles from several generations may coexist.

A typical situation is illustrated in Figure 1: one initial particle gives rise to two particles. Thus X0 = 1 and X1 = 2. One of the particles in the first generation gives rise to three particles and the other gives rise to one particle, so that X2 = 4. We see from Figure 1 that X3 = 2. Since neither of the particles in the third generation gives rise to new particles, we conclude that X4 = 0 and consequently that Xn = 0 for all n ≥ 4. In other words, the progeny of the initial particle in the zeroth generation become extinct after three generations.

Figure 1
In order to model this system as a Markov chain, we suppose that each particle gives rise to ξ particles in the next generation, where ξ is a nonnegative integer-valued random variable having density f. We suppose that the numbers of offspring of the various particles in the various generations are chosen independently according to the density f.

Under these assumptions Xn, n ≥ 0, forms a Markov chain whose state space is the nonnegative integers. State 0 is an absorbing state, for if there are no particles in a given generation, there will not be any particles in the next generation either. For x ≥ 1,

P(x, y) = P(ξ1 + · · · + ξx = y),

where ξ1, . . . , ξx are independent random variables having common density f. In particular, P(1, y) = f(y), y ≥ 0.

If a particle gives rise to ξ = 0 particles, the interpretation is that the particle dies or disappears. Suppose a particle gives rise to ξ particles, which in turn give rise to other particles; but after some number of generations, all descendants of the initial particle have died or disappeared (see Figure 1). We describe such an event by saying that the descendants of the original particle eventually become extinct.

An interesting problem involving branching chains is to compute the probability ρ of eventual extinction for a branching chain starting with a single particle or, equivalently, the probability that a branching chain starting at state 1 will eventually be absorbed at state 0. Once we determine ρ, we can easily find the probability that in a branching chain starting with x particles the descendants of each of the original particles eventually become extinct. Indeed, since the particles are assumed to act independently in giving rise to new particles, the desired probability is just ρ^x.

The branching chain was used originally to determine the probability that the male line of a given person would eventually become extinct. For this purpose only male children would be included in the various generations.
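The extinction probability ρ can be estimated by simulating many independent lines of descent. The sketch below uses an illustrative offspring density f(0) = 0.25, f(1) = 0.25, f(2) = 0.5 (not from the text); since the mean number of offspring is 1.25 > 1, extinction is possible but not certain, and for this particular f the estimate settles near 0.5.

```python
import random

random.seed(4)

# Illustrative offspring density (not from the text): f(0), f(1), f(2).
f = {0: 0.25, 1: 0.25, 2: 0.5}

def offspring():
    """Draw one particle's number of offspring from the density f."""
    return random.choices(list(f), weights=list(f.values()))[0]

def goes_extinct(max_generations=60, cap=1000):
    """Simulate one branching chain from X0 = 1; report whether the line dies out.
    Populations above `cap` are treated as surviving, since extinction from such
    a large population is extremely unlikely."""
    x = 1
    for _ in range(max_generations):
        if x == 0:
            return True
        if x > cap:
            return False
        x = sum(offspring() for _ in range(x))
    return False

trials = 5000
rho_hat = sum(goes_extinct() for _ in range(trials)) / trials
print(rho_hat)
```

Starting from x particles instead, the estimated extinction probability would be close to rho_hat ** x, reflecting the independence argument above.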
Example 7. Consider a gene composed of d subunits, where d is some positive integer and each subunit is either normal or mutant in form. Consider a cell with a gene composed of m mutant subunits and d − m normal subunits. Before the cell divides into two daughter cells, the gene duplicates. The corresponding gene of one of the daughter cells is composed of d units chosen at random from the 2m mutant subunits and the 2(d − m) normal subunits. Suppose we follow a fixed line of descent from a given gene. Let X0 be the number of mutant subunits initially present, and let Xn, n ≥ 1, be the number present in the nth descendant gene. Then Xn, n ≥ 0, is a Markov chain on 𝒮 = {0, 1, 2, . . . , d} and, since the daughter gene's d subunits are drawn at random from 2x mutant and 2(d − x) normal subunits,

P(x, y) = C(2x, y) C(2(d − x), d − y) / C(2d, d),

where C(n, k) denotes a binomial coefficient. States 0 and d are absorbing states for this chain.
Computati ons �N ith transition functions
Let X_n, n ≥ 0, be a Markov chain on 𝒮 having transition function P. In this section we will show how various conditional probabilities can be expressed in terms of P. We will also define the n-step transition function of the Markov chain. We begin with the formula

(16)  P(X_{n+1} = x_{n+1}, …, X_{n+m} = x_{n+m} | X_0 = x_0, …, X_n = x_n) = P(x_n, x_{n+1}) ⋯ P(x_{n+m−1}, x_{n+m}).

To prove (16) we write the left side of this equation as

  P(X_0 = x_0, …, X_{n+m} = x_{n+m}) / P(X_0 = x_0, …, X_n = x_n).

By (14) this ratio equals

  π_0(x_0) P(x_0, x_1) ⋯ P(x_{n+m−1}, x_{n+m}) / [π_0(x_0) P(x_0, x_1) ⋯ P(x_{n−1}, x_n)],

which reduces to the right side of (16). It is convenient to rewrite (16) as

(17)  P(X_{n+1} = y_1, …, X_{n+m} = y_m | X_0 = x_0, …, X_{n−1} = x_{n−1}, X_n = x) = P(x, y_1) P(y_1, y_2) ⋯ P(y_{m−1}, y_m).
Let A_0, …, A_{n−1} be subsets of 𝒮. It follows from (17) and Exercise 4(a) that

(18)  P(X_{n+1} = y_1, …, X_{n+m} = y_m | X_0 ∈ A_0, …, X_{n−1} ∈ A_{n−1}, X_n = x) = P(x, y_1) P(y_1, y_2) ⋯ P(y_{m−1}, y_m).

Let B_1, …, B_m be subsets of 𝒮. It follows from (18) and Exercise 4(b) that

(19)  P(X_{n+1} ∈ B_1, …, X_{n+m} ∈ B_m | X_0 ∈ A_0, …, X_{n−1} ∈ A_{n−1}, X_n = x) = Σ_{y_1∈B_1} ⋯ Σ_{y_m∈B_m} P(x, y_1) ⋯ P(y_{m−1}, y_m).
The m-step transition function P^m(x, y), which gives the probability of going from x to y in m steps, is defined by

(20)  P^m(x, y) = Σ_{y_1} ⋯ Σ_{y_{m−1}} P(x, y_1) P(y_1, y_2) ⋯ P(y_{m−1}, y)

for m ≥ 2, by P^1(x, y) = P(x, y), and by

  P^0(x, y) = 1 if x = y, and P^0(x, y) = 0 elsewhere.

We see by setting B_1 = ⋯ = B_{m−1} = 𝒮 and B_m = {y} in (19) that

(21)  P(X_{n+m} = y | X_0 ∈ A_0, …, X_{n−1} ∈ A_{n−1}, X_n = x) = P^m(x, y).

In particular, by setting A_0 = ⋯ = A_{n−1} = 𝒮, we see that

(22)  P(X_{n+m} = y | X_n = x) = P^m(x, y).

It also follows from (21) that

(23)  P(X_{n+m} = y | X_0 = x, X_n = z) = P^m(z, y).

Since (see Exercise 4(c))

  P^{n+m}(x, y) = P(X_{n+m} = y | X_0 = x)
    = Σ_z P(X_n = z | X_0 = x) P(X_{n+m} = y | X_0 = x, X_n = z)
    = Σ_z P^n(x, z) P(X_{n+m} = y | X_0 = x, X_n = z),

we conclude from (23) that

(24)  P^{n+m}(x, y) = Σ_z P^n(x, z) P^m(z, y).
For Markov chains having a finite number of states, (24) allows us to think of P^n as the nth power of the matrix P, an idea we will pursue in Section 1.4.2.

Let π_0 be an initial distribution for the Markov chain. Since

  P(X_n = y) = Σ_x P(X_0 = x, X_n = y) = Σ_x P(X_0 = x) P(X_n = y | X_0 = x),

we see that

(25)  P(X_n = y) = Σ_x π_0(x) P^n(x, y).

This formula allows us to compute the distribution of X_n in terms of the initial distribution π_0 and the n-step transition function P^n.
For an alternative method of computing the distribution of X_n, observe that

  P(X_{n+1} = y) = Σ_x P(X_n = x, X_{n+1} = y) = Σ_x P(X_n = x) P(X_{n+1} = y | X_n = x),

so that

(26)  P(X_{n+1} = y) = Σ_x P(X_n = x) P(x, y).
If we know the distribution of X_0, we can use (26) to find the distribution of X_1. Then, knowing the distribution of X_1, we can use (26) to find the distribution of X_2. Similarly, we can find the distribution of X_n by applying (26) n times.

We will use the notation P_x(·) to denote probabilities of various events defined in terms of a Markov chain starting at x. Thus

  P_x(X_1 ≠ a, X_2 ≠ a, X_3 = a)

denotes the probability that a Markov chain starting at x is in state a at time 3 but not at time 1 or at time 2. In terms of this notation, (19) can be rewritten as

(27)  P(X_{n+1} ∈ B_1, …, X_{n+m} ∈ B_m | X_0 ∈ A_0, …, X_{n−1} ∈ A_{n−1}, X_n = x) = P_x(X_1 ∈ B_1, …, X_m ∈ B_m).
1.4.1. Hitting times. Let A be a subset of 𝒮. The hitting time T_A of A is defined by

  T_A = min{n > 0 : X_n ∈ A}

if X_n ∈ A for some n > 0, and by T_A = ∞ if X_n ∉ A for all n > 0. In other words, T_A is the first positive time the Markov chain is in (hits) A. Hitting times play an important role in the theory of Markov chains. In this book we will be interested mainly in hitting times of sets consisting of a single point. We denote the hitting time of a point a ∈ 𝒮 by T_a rather than by the more cumbersome notation T_{{a}}.

An important equation involving hitting times is given by

(28)  P^n(x, y) = Σ_{m=1}^{n} P_x(T_y = m) P^{n−m}(y, y),   n ≥ 1.
In order to verify (28) we note that the events {T_y = m, X_n = y}, 1 ≤ m ≤ n, are disjoint and that

  {X_n = y} = ∪_{m=1}^{n} {T_y = m, X_n = y}.
We have in effect decomposed the event {X_n = y} according to the hitting time of y. We see from this decomposition that

  P_x(X_n = y) = Σ_{m=1}^{n} P_x(T_y = m, X_n = y)
    = Σ_{m=1}^{n} P_x(T_y = m) P(X_n = y | X_0 = x, T_y = m)
    = Σ_{m=1}^{n} P_x(T_y = m) P(X_n = y | X_0 = x, X_1 ≠ y, …, X_{m−1} ≠ y, X_m = y)
    = Σ_{m=1}^{n} P_x(T_y = m) P^{n−m}(y, y),
and hence that (28) holds. Exa m ple 8. Show that if Px(Ta < n), n > 1 .
a is an absorbing state, then p n(x, a)
=
If a is an absorbing state, then p n - m(a, a) = 1 for 1 < m < n, and hence (28) implies that
pn(x, a)
n
L Px(Ta = m)pn - m(a, a)
=
m= l n L Px( Ta m= l
=
=
m)
= Px( Ta < n).
Observe that

  P_x(T_y = 1) = P(x, y)

and that

  P_x(T_y = 2) = Σ_{z≠y} P_x(X_1 = z, X_2 = y) = Σ_{z≠y} P(x, z) P(z, y).

For higher values of n the probabilities P_x(T_y = n) can be found by using the formula

(29)  P_x(T_y = n + 1) = Σ_{z≠y} P(x, z) P_z(T_y = n),   n ≥ 1.
This formula is a consequence of (27), but it should also be directly obvious. For in order to go from x to y for the first time at time n + 1, it is necessary to go to some state z ≠ y at the first step and then go from z to y for the first time at the end of n additional steps.
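The recursion (29) is easy to compute with. The sketch below is an illustration, not part of the text; it applies (29) to the gambler's ruin chain on {0, 1, 2, 3} with the assumed values p = q = 1/2, and uses Example 8 to recover an absorption probability by summing first-passage probabilities.

```python
# First-passage probabilities P_x(T_y = n) via the recursion (29):
#   P_x(T_y = 1)     = P(x, y),
#   P_x(T_y = n + 1) = sum over z != y of P(x, z) P_z(T_y = n).
# Illustrated on the gambler's ruin chain on {0, 1, 2, 3} with p = q = 1/2.

P = {
    0: {0: 1.0},
    1: {0: 0.5, 2: 0.5},
    2: {1: 0.5, 3: 0.5},
    3: {3: 1.0},
}
states = list(P)

def first_passage(y, n_max):
    """Return f[n][x] = P_x(T_y = n) for n = 1, ..., n_max."""
    f = {1: {x: P[x].get(y, 0.0) for x in states}}
    for n in range(1, n_max):
        f[n + 1] = {
            x: sum(p * f[n][z] for z, p in P[x].items() if z != y)
            for x in states
        }
    return f

f0 = first_passage(0, 50)
# By Example 8, P^n(x, 0) = P_x(T_0 <= n) for the absorbing state 0;
# summing over n approximates P_1(T_0 < infinity), which is 2/3 here.
ruin_1 = sum(f0[n][1] for n in f0)
print(round(ruin_1, 6))  # -> 0.666667
```

The truncation at 50 steps is harmless here because the first-passage probabilities decay geometrically.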
1.4.2. Transition matrix. Suppose now that the state space 𝒮 is finite, say 𝒮 = {0, 1, …, d}. In this case we can think of P as the transition matrix having d + 1 rows and columns given by

        0       ⋯    d
  0   P(0, 0)   ⋯  P(0, d)
  ⋮      ⋮            ⋮
  d   P(d, 0)   ⋯  P(d, d)

For example, the transition matrix of the gambler's ruin chain on {0, 1, 2, 3} is

        0   1   2   3
  0   [ 1   0   0   0 ]
  1   [ q   0   p   0 ]
  2   [ 0   q   0   p ]
  3   [ 0   0   0   1 ]

Similarly, we can regard P^n as an n-step transition matrix. Formula (24) with m = n = 1 becomes

  P^2(x, y) = Σ_z P(x, z) P(z, y).

Recalling the definition of ordinary matrix multiplication, we observe that the two-step transition matrix P^2 is the product of the matrix P with itself. More generally, by setting m = 1 in (24) we see that

(30)  P^{n+1}(x, y) = Σ_z P^n(x, z) P(z, y).
It follows from (30) by induction that the n-step transition matrix P^n is the nth power of P.

An initial distribution π_0 can be thought of as a (d + 1)-dimensional row vector

  π_0 = (π_0(0), …, π_0(d)).

If we let π_n denote the (d + 1)-dimensional row vector

  π_n = (P(X_n = 0), …, P(X_n = d)),

then (25) and (26) can be written respectively as

  π_n = π_0 P^n   and   π_{n+1} = π_n P.

The two-state Markov chain discussed in Section 1.1 is one of the few examples where P^n can be found very easily.
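These matrix identities are easy to check numerically. The sketch below is an illustration, not part of the text; it uses the gambler's ruin matrix on {0, 1, 2, 3} with the arbitrary choices p = 0.4, q = 0.6, and compares (25) (one matrix power) with n applications of (26).

```python
# P^n as the nth matrix power of P (formula (30)), and the row-vector
# identities pi_n = pi_0 P^n (25) and pi_{n+1} = pi_n P (26).
import numpy as np

p, q = 0.4, 0.6                      # assumed win/lose probabilities
P = np.array([
    [1, 0, 0, 0],
    [q, 0, p, 0],
    [0, q, 0, p],
    [0, 0, 0, 1],
])

Pn = np.linalg.matrix_power(P, 8)    # 8-step transition matrix

pi0 = np.array([0.0, 1.0, 0.0, 0.0]) # start in state 1
pi8 = pi0 @ Pn                       # distribution of X_8, by (25)

pi = pi0.copy()                      # (26) applied 8 times
for _ in range(8):
    pi = pi @ P

print(np.allclose(pi, pi8))          # -> True
```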
Example 9. Consider the two-state Markov chain having one-step transition matrix

  P = [ 1 − p     p   ]
      [   q     1 − q ],

where p + q > 0. Find P^n.
In order to find P^n(0, 0) = P_0(X_n = 0), we set π_0(0) = 1 in (3) and obtain

  P^n(0, 0) = q/(p + q) + (p/(p + q))(1 − p − q)^n.

In order to find P^n(0, 1) = P_0(X_n = 1), we set π_0(1) = 0 in (4) and obtain

  P^n(0, 1) = p/(p + q) − (p/(p + q))(1 − p − q)^n.

Similarly, we conclude that

  P^n(1, 0) = q/(p + q) − (q/(p + q))(1 − p − q)^n

and

  P^n(1, 1) = p/(p + q) + (q/(p + q))(1 − p − q)^n.

It follows that

  P^n = 1/(p + q) [ q  p ]  +  (1 − p − q)^n/(p + q) [  p  −p ]
                  [ q  p ]                            [ −q   q ].
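As a numerical check, not part of the text, the closed form for P^n can be compared with a direct matrix power; the values of p, q, and n below are arbitrary choices.

```python
# Verify the closed-form n-step matrix of Example 9 against matrix powers.
import numpy as np

p, q, n = 0.3, 0.5, 7                 # assumed parameters
P = np.array([[1 - p, p],
              [q, 1 - q]])

r = (1 - p - q) ** n
closed = (np.array([[q, p], [q, p]])
          + r * np.array([[p, -p], [-q, q]])) / (p + q)

print(np.allclose(np.linalg.matrix_power(P, n), closed))  # -> True
```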
1.5. Transient and recurrent states
Let X_n, n ≥ 0, be a Markov chain having state space 𝒮 and transition function P. Set

  ρ_xy = P_x(T_y < ∞).
Then ρ_xy denotes the probability that a Markov chain starting at x will be in state y at some positive time. In particular, ρ_yy denotes the probability that a Markov chain starting at y will ever return to y. A state y is called recurrent if ρ_yy = 1 and transient if ρ_yy < 1. If y is a recurrent state, a Markov chain starting at y returns to y with probability one. If y is a transient state, a Markov chain starting at y has positive probability 1 − ρ_yy of never returning to y. If y is an absorbing state, then P_y(T_y = 1) = P(y, y) = 1 and hence ρ_yy = 1; thus an absorbing state is necessarily recurrent.

Let 1_y(z), z ∈ 𝒮, denote the indicator function of the set {y} defined by

  1_y(z) = 1 if z = y, and 1_y(z) = 0 if z ≠ y.
Let N(y) denote the number of times n ≥ 1 that the chain is in state y. Since 1_y(X_n) = 1 if the chain is in state y at time n and 1_y(X_n) = 0 otherwise, we see that

(31)  N(y) = Σ_{n=1}^{∞} 1_y(X_n).

The event {N(y) ≥ 1} is the same as the event {T_y < ∞}. Thus

  P_x(N(y) ≥ 1) = P_x(T_y < ∞) = ρ_xy.
Let m and n be positive integers. By (27), the probability that a Markov chain starting at x first visits y at time m and next visits y n units of time later is P_x(T_y = m) P_y(T_y = n). Thus

  P_x(N(y) ≥ 2) = Σ_{m=1}^{∞} Σ_{n=1}^{∞} P_x(T_y = m) P_y(T_y = n) = ρ_xy ρ_yy.

Similarly we conclude that

(32)  P_x(N(y) ≥ m) = ρ_xy ρ_yy^{m−1},   m ≥ 1.

Since

  P_x(N(y) = m) = P_x(N(y) ≥ m) − P_x(N(y) ≥ m + 1),

it follows from (32) that

(33)  P_x(N(y) = m) = ρ_xy ρ_yy^{m−1} (1 − ρ_yy),   m ≥ 1.

Also

  P_x(N(y) = 0) = 1 − P_x(N(y) ≥ 1),

so that

(34)  P_x(N(y) = 0) = 1 − ρ_xy.

These formulas are intuitively obvious. To see why (33) should be true, for example, observe that a chain starting at x visits state y exactly m times if and only if it visits y for a first time, returns to y m − 1 additional times, and then never again returns to y.
We use the notation E_x(·) to denote expectations of random variables defined in terms of a Markov chain starting at x. For example,

(35)  E_x(1_y(X_n)) = P_x(X_n = y) = P^n(x, y).

It follows from (31) and (35) that

  E_x(N(y)) = Σ_{n=1}^{∞} E_x(1_y(X_n)) = Σ_{n=1}^{∞} P^n(x, y).

Set

  G(x, y) = E_x(N(y)) = Σ_{n=1}^{∞} P^n(x, y).
Then G(x, y) denotes the expected number of visits to y for a Markov chain starting at x.

Theorem 1
(i) Let y be a transient state. Then P_x(N(y) < ∞) = 1 and

(36)  G(x, y) = ρ_xy / (1 − ρ_yy),   x ∈ 𝒮,

which is finite for all x ∈ 𝒮.
(ii) Let y be a recurrent state. Then P_y(N(y) = ∞) = 1 and G(y, y) = ∞. Also

(37)  P_x(N(y) = ∞) = ρ_xy,   x ∈ 𝒮.

If ρ_xy = 0, then G(x, y) = 0, while if ρ_xy > 0, then G(x, y) = ∞.

This theorem describes the fundamental difference between a transient state and a recurrent state. If y is a transient state, then no matter where the Markov chain starts, it makes only a finite number of visits to y and the expected number of visits to y is finite. Suppose instead that y is a recurrent state. Then if the Markov chain starts at y, it returns to y infinitely often. If the chain starts at some other state x, it may be impossible for it to ever hit y. If it is possible, however, and the chain does visit y at least once, then it does so infinitely often.
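When there are only finitely many transient states, G(x, y) for transient x and y can also be computed in closed form: writing Q for the restriction of P to the transient states, Σ_{n≥0} Q^n = (I − Q)^{-1}, so G = (I − Q)^{-1} − I on those states (the n = 0 term is dropped because G sums P^n from n = 1). The text does not make this observation here; the sketch below uses a made-up three-state chain with state 0 absorbing and states 1, 2 transient.

```python
# Expected number of visits G(x, y) between transient states via the
# inverse (I - Q)^{-1}, compared with a truncation of the defining series
# G(x, y) = sum over n >= 1 of P^n(x, y).
import numpy as np

P = np.array([
    [1.0,  0.0,  0.0],    # state 0 absorbing
    [0.25, 0.5,  0.25],   # states 1, 2 transient (assumed example)
    [0.2,  0.4,  0.4],
])
Q = P[1:, 1:]                        # transient-to-transient block
N = np.linalg.inv(np.eye(2) - Q)     # sum of Q^n over n >= 0
G = N - np.eye(2)                    # drop the n = 0 term

S = sum(np.linalg.matrix_power(P, n) for n in range(1, 200))
print(np.allclose(G, S[1:, 1:]))     # -> True
```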
Proof. Let y be a transient state. Since 0 ≤ ρ_yy < 1, it follows from (32) that

  P_x(N(y) = ∞) = lim_{m→∞} P_x(N(y) ≥ m) = lim_{m→∞} ρ_xy ρ_yy^{m−1} = 0.

By (33)

  G(x, y) = E_x(N(y)) = Σ_{m=1}^{∞} m P_x(N(y) = m) = Σ_{m=1}^{∞} m ρ_xy ρ_yy^{m−1} (1 − ρ_yy).

Substituting t = ρ_yy in the power series

  Σ_{m=1}^{∞} m t^{m−1} = 1/(1 − t)^2,

we conclude that

  G(x, y) = ρ_xy / (1 − ρ_yy) < ∞.

This completes the proof of (i).

Now let y be recurrent. Then ρ_yy = 1 and it follows from (32) that

  P_x(N(y) = ∞) = lim_{m→∞} P_x(N(y) ≥ m) = lim_{m→∞} ρ_xy ρ_yy^{m−1} = ρ_xy.

In particular, P_y(N(y) = ∞) = 1. If a nonnegative random variable has positive probability of being infinite, its expectation is infinite. Thus

  G(y, y) = E_y(N(y)) = ∞.

If ρ_xy = 0, then P_x(T_y = m) = 0 for all finite positive integers m, so (28) implies that P^n(x, y) = 0, n ≥ 1; thus G(x, y) = 0 in this case. If ρ_xy > 0, then P_x(N(y) = ∞) = ρ_xy > 0 and hence G(x, y) = E_x(N(y)) = ∞. This completes the proof of Theorem 1. ∎

Let y be a transient state. Since

  Σ_{n=1}^{∞} P^n(x, y) = G(x, y) < ∞,   x ∈ 𝒮,

we see that

(38)  lim_{n→∞} P^n(x, y) = 0,   x ∈ 𝒮.
A Markov chain is called a transient chain if all of its states are transient and a recurrent chain if all of its states are recurrent. It is easy to see that a Markov chain having a finite state space must have at least one recurrent state and hence cannot possibly be a transient chain. For if 𝒮 is finite and all states are transient, then by (38)

  0 = Σ_{y∈𝒮} lim_{n→∞} P^n(x, y) = lim_{n→∞} Σ_{y∈𝒮} P^n(x, y) = lim_{n→∞} P_x(X_n ∈ 𝒮) = lim_{n→∞} 1 = 1,

which is a contradiction. (The interchange of sum and limit is justified because 𝒮 is finite.)
1.6. Decomposition of the state space
Let x and y be two not necessarily distinct states. We say that x leads to y if ρ_xy > 0. It is left as an exercise for the reader to show that x leads to y if and only if P^n(x, y) > 0 for some positive integer n. It is also left to the reader to show that if x leads to y and y leads to z, then x leads to z.

Theorem 2. Let x be a recurrent state and suppose that x leads to y. Then y is recurrent and ρ_xy = ρ_yx = 1.

Proof. We assume that y ≠ x, for otherwise there is nothing to prove. Since

  ρ_xy = P_x(T_y < ∞) > 0,

we see that P_x(T_y = n) > 0 for some positive integer n. Let n_0 be the least such positive integer, i.e., set

(39)  n_0 = min{n ≥ 1 : P_x(T_y = n) > 0}.

It follows easily from (39) and (28) that P^{n_0}(x, y) > 0 and

(40)  P^m(x, y) = 0,   1 ≤ m < n_0.

Since P^{n_0}(x, y) > 0, we can find states y_1, …, y_{n_0−1} such that

  P_x(X_1 = y_1, …, X_{n_0−1} = y_{n_0−1}, X_{n_0} = y) = P(x, y_1) ⋯ P(y_{n_0−1}, y) > 0.

None of the states y_1, …, y_{n_0−1} equals x or y; for if one of them did equal x or y, it would be possible to go from x to y with positive probability in fewer than n_0 steps, in contradiction to (40).
We will now show that ρ_yx = 1. Suppose on the contrary that ρ_yx < 1. Then a Markov chain starting at y has positive probability 1 − ρ_yx of never hitting x. More to the point, a Markov chain starting at x has the positive probability

  P(x, y_1) ⋯ P(y_{n_0−1}, y)(1 − ρ_yx)

of visiting the states y_1, …, y_{n_0−1}, y successively in the first n_0 times and never returning to x after time n_0. But if this happens, the Markov chain never returns to x at any time n ≥ 1, so we have contradicted the assumption that x is a recurrent state. Thus ρ_yx = 1.

Since ρ_yx = 1, there is a positive integer n_1 such that P^{n_1}(y, x) > 0. Now

  P^{n_1+n+n_0}(y, y) = P_y(X_{n_1+n+n_0} = y) ≥ P_y(X_{n_1} = x, X_{n_1+n} = x, X_{n_1+n+n_0} = y) = P^{n_1}(y, x) P^n(x, x) P^{n_0}(x, y).

Hence

  G(y, y) ≥ Σ_{n=1}^{∞} P^{n_1+n+n_0}(y, y) ≥ P^{n_1}(y, x) P^{n_0}(x, y) Σ_{n=1}^{∞} P^n(x, x) = P^{n_1}(y, x) P^{n_0}(x, y) G(x, x) = +∞,

from which it follows that y is also a recurrent state. Since y is recurrent and y leads to x, we see from the part of the theorem that has already been verified that ρ_xy = 1. This completes the proof. ∎
A nonempty set C of states is said to be closed if no state inside of C leads to any state outside of C, i.e., if

(41)  ρ_xy = 0,   x ∈ C and y ∉ C.

Equivalently (see Exercise 16), C is closed if and only if

(42)  P^n(x, y) = 0,   x ∈ C, y ∉ C, and n ≥ 1.

Actually, even from the weaker condition

(43)  P(x, y) = 0,   x ∈ C and y ∉ C,

we can prove that C is closed. For if (43) holds, then for x ∈ C and y ∉ C

  P^2(x, y) = Σ_{z∈𝒮} P(x, z) P(z, y) = Σ_{z∈C} P(x, z) P(z, y) = 0,
and (42) follows by induction. If C is closed, then a Markov chain starting in C will, with probability one, stay in C for all time. If a is an absorbing state, then {a} is closed.

A closed set C is called irreducible if x leads to y for all choices of x and y in C. It follows from Theorem 2 that if C is an irreducible closed set, then either every state in C is recurrent or every state in C is transient. The next result is an immediate consequence of Theorems 1 and 2.

Corollary 1. Let C be an irreducible closed set of recurrent states. Then ρ_xy = 1, P_x(N(y) = ∞) = 1, and G(x, y) = ∞ for all choices of x and y in C.

An irreducible Markov chain is a chain whose state space is irreducible, that is, a chain in which every state leads back to itself and also to every other state. Such a Markov chain is necessarily either a transient chain or a recurrent chain. Corollary 1 implies, in particular, that an irreducible recurrent Markov chain visits every state infinitely often with probability one.

We saw in Section 1.5 that if 𝒮 is finite, it contains at least one recurrent state. The same argument shows that any finite closed set of states contains at least one recurrent state. Now let C be a finite irreducible closed set. We have seen that either every state in C is transient or every state in C is recurrent, and that C has at least one recurrent state. It follows that every state in C is recurrent. We summarize this result:

Theorem 3. Let C be a finite irreducible closed set of states. Then every state in C is recurrent.
(:onsider a Markov chain having a finite nu]mber of states. Theorem 3 implies that if the chain is irreducible it must be recurrent. If the chain is not irreducible, we can use Theorems 2 and 3 to determine which states are recurrent and which are transient. IExa m ple 1 0.
Consider a Markov chain having the transition matrix 0
1
2:
0
2
1
0
0
5
5
0
0
0
1. 4
3
0
5
0
4
1
1 "2 1
0
0
0
1. 4 2
3
0
0 1
5
4
0
0
0
5
0
1 "2
0
0
1 "6
1 "3
0
1. 4
0
1 "2
5
0 1
1 "2
.3. 4
Determine which states are recurrent and which states are transient.
As a first step in studying this Markov chain, we determine by inspection which states lead to which other states. This can be indicated in matrix form as

        0   1   2   3   4   5
  0     +   0   0   0   0   0
  1     +   +   +   +   +   +
  2     +   +   +   +   +   +
  3     0   0   0   +   +   +
  4     0   0   0   +   +   +
  5     0   0   0   +   +   +

The x, y element of this matrix is + or 0 according as ρ_xy is positive or zero, i.e., according as x does or does not lead to y. Of course, if P(x, y) > 0, then ρ_xy > 0. The converse is certainly not true in general. For example, P(2, 0) = 0; but

  P^2(2, 0) = P(2, 1) P(1, 0) = (1/5)(1/4) = 1/20 > 0,

so that ρ_20 > 0.

State 0 is an absorbing state, and hence also a recurrent state. We see clearly from the matrix of +'s and 0's that {3, 4, 5} is an irreducible closed set. Theorem 3 now implies that 3, 4, and 5 are recurrent states. States 1 and 2 both lead to 0, but neither can be reached from 0. We see from Theorem 2 that 1 and 2 must both be transient states. In summary, states 1 and 2 are transient, and states 0, 3, 4, and 5 are recurrent.

Let 𝒮_T denote the collection of transient states in 𝒮, and let 𝒮_R denote the collection of recurrent states in 𝒮. In Example 10, 𝒮_T = {1, 2} and 𝒮_R = {0, 3, 4, 5}. The set 𝒮_R can be decomposed into the disjoint irreducible closed sets C_1 = {0} and C_2 = {3, 4, 5}. The next theorem shows that such a decomposition is always possible whenever 𝒮_R is nonempty.

Theorem 4. Suppose that the set 𝒮_R of recurrent states is nonempty. Then 𝒮_R is the union of a finite or countably infinite number of disjoint irreducible closed sets C_1, C_2, … .
Proof. Choose x ∈ 𝒮_R and let C be the set of all states y in 𝒮_R such that x leads to y. Since x is recurrent, ρ_xx = 1 and hence x ∈ C. We will now verify that C is an irreducible closed set. Suppose that y is in C and y leads to z. Since y is recurrent, it follows from Theorem 2 that z is recurrent. Since x leads to y and y leads to z, we conclude that x leads to z. Thus z is in C. This shows that C is closed. Suppose that y and z are both in C. Since x is recurrent and x leads to y, it follows from Theorem 2 that y leads to x. Since y leads to x and x leads to z, we conclude that y leads to z. This shows that C is irreducible.

To complete the proof of the theorem, we need only show that if C and D are two irreducible closed subsets of 𝒮_R, they are either disjoint or identical. Suppose they are not disjoint and let x be in both C and D. Choose y in C. Now x leads to y, since x is in C and C is irreducible. Since D is closed, x is in D, and x leads to y, we conclude that y is in D. Thus every state in C is also in D. Similarly every state in D is also in C, so that C and D are identical. ∎

We can use our decomposition of the state space of a Markov chain to understand the behavior of such a system. If the Markov chain starts out in one of the irreducible closed sets C_i of recurrent states, it stays in C_i forever and, with probability one, visits every state in C_i infinitely often. If the Markov chain starts out in the set of transient states 𝒮_T, it either stays in 𝒮_T forever or, at some time, enters one of the sets C_i and stays there from that time on, again visiting every state in that C_i infinitely often.
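For a finite chain the classification carried out in Example 10 can be mechanized: compute which states lead to which (the transitive closure of the relation P(x, y) > 0), and use the fact, which follows from Theorems 2 and 3, that a state of a finite chain is recurrent exactly when every state it leads to leads back to it. The sketch below is an illustration, not part of the text, applied to the transition matrix of Example 10.

```python
# Classify the states of a finite Markov chain as recurrent or transient,
# using the transition matrix of Example 10.
P = [
    [1,    0,    0,    0,   0,   0  ],
    [1/4,  1/2,  1/4,  0,   0,   0  ],
    [0,    1/5,  2/5,  1/5, 0,   1/5],
    [0,    0,    0,    1/6, 1/3, 1/2],
    [0,    0,    0,    1/2, 0,   1/2],
    [0,    0,    0,    1/4, 0,   3/4],
]
n = len(P)
leads = [[P[x][y] > 0 for y in range(n)] for x in range(n)]
for via in range(n):                 # Warshall's transitive closure
    for x in range(n):
        for y in range(n):
            leads[x][y] = leads[x][y] or (leads[x][via] and leads[via][y])

recurrent = set()
for x in range(n):
    # x is recurrent iff everything x leads to leads back to x
    # (finite chains only); then x's whole communicating class is recurrent.
    if all(not leads[x][y] or leads[y][x] for y in range(n)):
        recurrent |= {y for y in range(n) if leads[x][y] and leads[y][x]} | {x}

print(sorted(recurrent))             # -> [0, 3, 4, 5]
```

This reproduces the conclusion of Example 10: states 1 and 2 are transient and states 0, 3, 4, 5 are recurrent.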
1.6.1. Absorption probabilities. Let C be one of the irreducible closed sets of recurrent states, and let ρ_C(x) = P_x(T_C < ∞) be the probability that a Markov chain starting at x eventually hits C. Since the chain remains permanently in C once it hits that set, we call ρ_C(x) the probability that a chain starting at x is absorbed by the set C. Clearly ρ_C(x) = 1, x ∈ C, and ρ_C(x) = 0 if x is a recurrent state not in C.

It is not so clear how to compute ρ_C(x) for x ∈ 𝒮_T, the set of transient states. If there are only a finite number of transient states, and in particular if 𝒮 itself is finite, it is always possible to compute ρ_C(x), x ∈ 𝒮_T, by solving a system of linear equations in which there are as many equations as unknowns, i.e., members of 𝒮_T. To understand why this is the case, observe that if x ∈ 𝒮_T, a chain starting at x can enter C only by entering C at time 1 or by being in 𝒮_T at time 1 and entering C at some future time. The former event has probability Σ_{y∈C} P(x, y) and the latter event has probability Σ_{y∈𝒮_T} P(x, y) ρ_C(y). Thus

(44)  ρ_C(x) = Σ_{y∈C} P(x, y) + Σ_{y∈𝒮_T} P(x, y) ρ_C(y),   x ∈ 𝒮_T.

Equation (44) holds whether 𝒮_T is finite or infinite, but it is far from clear how to solve (44) for the unknowns ρ_C(x), x ∈ 𝒮_T, when 𝒮_T is infinite. An additional difficulty is that if 𝒮_T is infinite, then (44) need not have a unique solution. Fortunately this difficulty does not arise if 𝒮_T is finite.
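For the chain of Example 10 with C = {0}, the system (44) is two equations in the two unknowns ρ_C(1) and ρ_C(2), and can be solved directly; the sketch below (an illustration, not part of the text) reads the blocks off Example 10's transition matrix and solves (I − Q)ρ = b.

```python
# Solve (44) for Example 10 with C = {0}: the transient states are
# S_T = {1, 2}, so for x in {1, 2}
#   rho_C(x) = P(x, 0) + P(x, 1) rho_C(1) + P(x, 2) rho_C(2).
import numpy as np

Q = np.array([[1/2, 1/4],      # P(1,1), P(1,2)
              [1/5, 2/5]])     # P(2,1), P(2,2)
b = np.array([1/4, 0.0])       # P(1,0), P(2,0)

rho = np.linalg.solve(np.eye(2) - Q, b)
print(rho)                     # absorption probabilities into {0} from 1, 2
```

The solution is ρ_C(1) = 3/5 and ρ_C(2) = 1/5; the complementary probabilities 2/5 and 4/5 are absorption into the other closed set {3, 4, 5}.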
Theorem 5. Suppose the set 𝒮_T of transient states is finite and let C be an irreducible closed set of recurrent states. Then the system of equations

(45)  f(x) = Σ_{y∈C} P(x, y) + Σ_{y∈𝒮_T} P(x, y) f(y),   x ∈ 𝒮_T,

has the unique solution

(46)  f(x) = ρ_C(x),   x ∈ 𝒮_T.

Proof. If (45) holds, then

  f(y) = Σ_{z∈C} P(y, z) + Σ_{z∈𝒮_T} P(y, z) f(z),   y ∈ 𝒮_T.

Substituting this into (45), we find that

  f(x) = Σ_{y∈C} P(x, y) + Σ_{y∈𝒮_T} Σ_{z∈C} P(x, y) P(y, z) + Σ_{y∈𝒮_T} Σ_{z∈𝒮_T} P(x, y) P(y, z) f(z).

The sum of the first two terms is just P_x(T_C ≤ 2), and the third term reduces to Σ_{z∈𝒮_T} P^2(x, z) f(z), which is the same as Σ_{y∈𝒮_T} P^2(x, y) f(y). Thus

  f(x) = P_x(T_C ≤ 2) + Σ_{y∈𝒮_T} P^2(x, y) f(y).
p_x > 0 for x ≥ 0 and q_x > 0 for x ≥ 1. We will determine when such a chain is recurrent and when it is transient. As a special case of (59),

(61)  P_1(T_0 < T_n) = …

and hence

(67)  P_0(T_0 < ∞) = P(0, 0) + P(0, 1) P_1(T_0 < ∞) …
1.8.2. Queuing chain. For μ > 1 the queuing chain is transient. In discussing the case μ ≤ 1, we will assume that the chain is irreducible (see Exercises 37 and 38 for necessary and sufficient conditions for irreducibility and for results when the queuing chain is not irreducible). Suppose first that μ < 1. Then on the average fewer than one new customer will enter the queue in unit time. Since one customer is served whenever the queue is nonempty, we would expect that, regardless of the initial length of the queue, it will become empty at some future time. This is indeed the case and, in particular, 0 is a recurrent state. The case μ = 1 is borderline, but again it turns out that 0 is a recurrent state. Thus if μ ≤ 1 and the queuing chain is irreducible, it is recurrent. The proof of these results will be given in the appendix.
A P P E N DIX
Proof of results for the bra nch i ng and q ueu i ng chai ns
In this section we will verify the results discussed in Section 1 .8. To do so we need the following.
Let be the probability generating function o.f a nonnegative integer-v'alued random variable e and set It = Ee (l1vith It = + 00 if e does not have finite expectation). If It < 1 and p(e = 1 ) < 1 , the equation (t ) = t (71) has no roots in [0, 1) . If It > 1 , then (71 ) has a unique root Po in [0, 1). Theorem 6
1.9.
Proof o f results for the branching and queuing ch.ins
37
Graphs of (J)(t), 0 < t < 1 , in three typical cases corresponding to Jl II are shown in Figure: 2. The fact that Jl is the left-hand derivative of (J)(t) at t = 1 plays a fundamental role in the proof of Theorem 6. y
y
y
y
t
J.L < 1
t
=
<J> ( t )
Po
Figure 2
Proof Let / denote the density of �. Then (J)(t) = J(O) + J( I)t
and
(J) ' (t) = f(l) + 2f (2)t
Thus (J)(O) = f (O), (J)( I ) = 1 , and lim (J) '(t) = .t(l)
t-+ 1
Suppose first that Jl
'(t) < 1 .
t -+ 1
Sinc:e �'(t) is nondecreasing in t, 0 < t < 1 , wle conclude that 1 , so by the continuity of 1 ' there is a number t o such that ° < to < 1 and fl>1' ( t ) > 1 for to < t < 1 . It follows from the mean value theorem that ( 1 ) - ( to)
> 1.
1 - to
Since ( 1 ) 1 , we conclude that ( to) - to < 0. Now (t ) -- t is continuous in t and nonnegative at t = 0, so by the intermediate value theorem it must have a zero P o on [0, to). Thus (71 ) has a root Po in [0, 1). We will complet,e the proof of the theorem by showing that there is only one such root. Suppose that ° < P o < PI < 1 , (p o) = Po , and (Pt ) = P l . Then the function ( t) - t vanishes at P o , PI ' and 1 ; hence by Rolle ' s thleorem its first derivative has at least two roots in (o�� 1). By another application of Rolle ' s theorem its second derivative (t ) has at least one root in (0, 1). But if J1 > 1 , then at least one of the numbers f (2), f (3) , . . . is strictly positive, and hence fl>" ( t ) = 2f (2) + 3 · 2f(3) t + . · · =
"
has no roots in (0, 1). This contradiction shows that (t ) = t has a unique root in [0, 1). I 1 . 9.1 .
Using Theorem 6 we see that the results fOir J1 < 1 follow as indicated in Section 1 .8. 1 . Suppose J1 > 1 . It follows from Theorem 6 that P equals Po or 1 , where Po is the unique root of the equation (t ) = t in [0, 1). We will sho'w that P always equals P o . First we observe that since the initial particles act independently in giving rise to their offspring, the probability Py(To < n) that the de scendants of each of the y > 1 particles becom�e extinct by time n is given by B ra nch i ng cha i n .
P:y{ To < n)
=
(P1(To
Consequently for n > 0 by Exercise 9(a)
P1( To < n + 1 )
�� n))Y .
00
=
P(I , 0) + � P( I , Y)Py( To < n)
=
P( l , 0) + � P( l , y)(P1( To < n))Y y= l
y= l
= f(O) +
00
00
� f( y)(P t {To < n))Y, y= 1
1.9.
Proo/� of results for the branching and queuing cbains
39
and hence n > O.
(72) 'We will use (72) to prove by induction that n > o.
(73) Now
jDl (To < 0) = 0 < Po ,
o. Suppose that (73) holds for a given value of so that (73) is true for n n. Since fl>(t) is increasing in t, we conclude from (72) that :=
P 1 (To < n + 1)
==
(P 1 (To < n)) < O. JBy letting n -+ 00 in (73) we see that P =
P 1 (To < 00 ) = lim P1(To < n) < P o . n -+ oo
Since P is one of the two numbers Po o r 1 , it must be the number P o. 1 . �• . 2 .
We will now vc�rify the results of Section 1 .8.2. Let ' denote the number of customers arriving during the nth time period. Then ' 1 ' e 2 ' . . . are independent randorn variables having COlIlmon density f, mean 11, and probability generating function . It follows from Exercise 9(b) and the identity P(O, z) = P(I , z) , valid for a queuing chain, that Poo = P I 0 . We "rill show that the number P P oo = P I 0 satisfies the equation Queu i ng cha i n . n
==
(74)
fl>(p) = p .
If 0 is a recurrent state, p = 1 and (74) follows immediately from th(� fact that (1) = 1 . To verify (74) in general, we observe first that by Ex��rcise 9(b) 00
Poo = P(O, 0) + � P(O, y)pyo, y= 1
i.e. , that
00
(75)
p = f(O) + � f( y)pyo · y= 1 In order to compute PyO ' y = 1 , 2, . . . , we consider a queuing chain start ing at the positive integer y. For n = 1 , 2, . . . , the event { T - 1 = n} Y occurs if and only if
n = min (m > 0 : Y + (' 1 - 1) + . . . = min
(m > 0 : j� 1 + · · · + e
m
=
+.
(' - 1) = m
m -- 1 ),
y
- 1)
Markov C�h.in.
40
that is, if and only if n is the smallest positive integer m such that the number of new customc�rs entering the queue: by time m is one less than th,e number served by time m. Thus Py(T - 1 = n) is independent of y, Y and consequently Py , y- l = Py(Ty- 1 < ex) ) is independent of �y for y = 1 , 2, . . . Since P I 0 = p, we see that .
Py,y- l = Py- l ,y - 2 = . . . = PI0 = p . Now the queuing chain can go at most one stC!P to the left at a time" so in order to go from state Y' > 0 to state 0 it must pass through all the inter vening states y 1, . . . , 1 . By applying the! Markov property we can conclude (see Exercise 3 9) that -
PyO -- Py ,y - 1 Py - 1 ,y - 2 . . . p 1 0
(7�6)
-
y P
•
It follows from (75) and (76) that
P
==
00
1(0) + y�1 f( y)pY =
==
Cl>(p),
so that (74) holds. Using (74) and Theorc�m 6 it is easy to see that if It < 1 and the queuing ch.ain is irreducible, then the chain is recurrent. For p satisfies (74) a:nd by Theorem 6 this equation has no roots in [0, 1 ) (observe that P(e 1 1) < 1 if the queuing chain is irreducible). We conclude that p = 1 . Since P oo = p , state 0 is recurrent, and thus since the chain is irreducible, all states are recurrent. Suppose now that Jl :> 1 . Again p satisfies (74) which, by Theor'em 6, ha.s a unique root Po in [0, 1 ) . Thus P equals either Po or 1 . We will prove that p = Po . To this end we first observ.e that by Exercise 9(a) =
P 1 (To
< n +
1 ) = P(1 ,
00
0) + �1 }'(1, y)Py(To < n), y=
which can be rewritten as (77)
P 1 (To
00
< n + 1 ) = 1(0) + �
y= 1
f( y)Py(To
< n) .
We claim next that y
(78)
> 1 and n > o.
To verify (78) observe that if a queuing chain starting at y reaches 0 in n or fewer steps, it must r�each y 1 in Il or fe,�er steps, go from y 1 to Y 2 in n or fewer steps, etc. By applying the Markov property w'e can conclude (see Exercise 39) that -
-
(79)
Py(To
-.
< n) < l�y(TY - l < n)Py _ 1 (Ty - 2 < n) · · · P1(To < n).
Exercises

7. Let X_n, n ≥ 0, be a Markov chain. Show that

  P(X_0 = x_0 | X_1 = x_1, …, X_n = x_n) = P(X_0 = x_0 | X_1 = x_1).
8. Let x and y be distinct states of a Markov chain having d < ∞ states, and suppose that x leads to y. Let n_0 be the smallest positive integer such that P^{n_0}(x, y) > 0, and let x_1, …, x_{n_0−1} be states such that

  P(x, x_1) P(x_1, x_2) ⋯ P(x_{n_0−2}, x_{n_0−1}) P(x_{n_0−1}, y) > 0.

(a) Show that x, x_1, …, x_{n_0−1}, y are distinct states.
(b) Use (a) to show that n_0 ≤ d − 1.
(c) Conclude that P_x(T_y ≤ d − 1) > 0.
9. Use (29) to verify the following identities:
(a) P_x(T_y ≤ n + 1) = P(x, y) + Σ_{z≠y} P(x, z) P_z(T_y ≤ n),   n ≥ 0;
(b) ρ_xy = P(x, y) + Σ_{z≠y} P(x, z) ρ_zy.
10. Consider the Ehrenfest chain with d = 3.
(a) Find P_x(T_0 = n) for x ∈ 𝒮 and 1 ≤ n ≤ 3.
(b) Find P, P^2, and P^3.
(c) Let π_0 be the uniform distribution π_0 = (1/4, 1/4, 1/4, 1/4). Find π_1, π_2, and π_3.
11. Consider the genetics chain from Example 7 with d = 3.
(a) Find the transition matrices P and P^2.
(b) If π_0 = (0, 1/2, 1/2, 0), find π_1 and π_2.
(c) Find P_x(T_{{0,3}} = n), x ∈ 𝒮, for n = 1 and n = 2.
12. Consider the Markov chain having state space {0, 1, 2} and transition matrix

  P = [   0    1   0 ]
      [ 1 − p  0   p ]
      [   0    1   0 ].

(a) Find P^2. (b) Show that P^4 = P^2. (c) Find P^n, n ≥ 1.
1,
Let Xm n > 0, be a J�arkov chain whose state space f/ is a subset of {O, 2, . . . } and whose transition function P is such that
� y
yP(x, y) = Ax + B,
X
E f/,
1,
for some constants A and B. (a) Show that EXn + I = AEXn + B. (b) Show that if A ¥= then EXn = 14
B 1 - A
(
+ An EXo
B
_
1,
1 - A
)
.
Let Xm n > 0, be the Ehrenfest chain on {O, . . . , d} . Show that the assumption of Exercise 1 3 holds and use that exercise to conlpute
Ex(Xn) ·
15
Let y be a transient state. Use ( 36) to sho'N that for all x 00
00
� pn(x, y) < � pn( y, y) . 111= 0 n=O 16
Show that Pxy > 0 if and only if p n(x, y) > 0 for some positive integer n.
Show that if x leads tiQ y and y leads to z, then x leads to z. 1 8 Consider a Markov chain on the nonnegative integers such that, starting from x, the chain goes to state .X + with probability p, o < p < 1 , and goes to state 0 with probability - p. (a) Show that this chain is irreducible. (b) Find Po ( To = n) , n > 1 . (c) Show that the chain is recurrent. 17
11
44
Marko v .r:hains
1 !t
space {O, 1 , . . . , 6} and
Consider a Markov chain having stat,e transition matrix 0 1 3 4 5 10 t 0 t i- t 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 3 0 0 0 0 t 0 4 0 0 0 0 t 1 5 6 0 0 0 0 0 '1-
2
6 0 0 0 0
.2
2()
t
0
!-
(a) Determine which states are transient and which states are recurrent.
(b) Find ρ_{0y}, y = 0, ..., 6.
20  Consider the Markov chain on {0, 1, ..., 5} having transition matrix

[6 × 6 transition matrix, not legible in this copy]
(a) Determine which states are transient and which are recurrent.
(b) Find ρ_{{0,1}}(x), x = 0, ..., 5.
21  Consider a Markov chain on {0, 1, ..., d} satisfying (51) and having no absorbing states other than 0 and d. Show that the states 1, ..., d − 1 each lead to 0, and hence that each is a transient state.
22  Show that the genetics chain introduced in Example 7 satisfies Equation (51).
23  A certain Markov chain that arises in genetics has states 0, 1, ..., 2d and transition function

P(x, y) = C(2d, y) (x/2d)^y (1 − x/2d)^{2d−y}.

Find ρ_{{0}}(x), 0 ≤ x ≤ 2d.
24  Consider a gambler's ruin chain on {0, 1, ..., d}. Find P_x(T_0 = n), n ≥ 1.
26  Consider a birth and death chain on the nonnegative integers with p_x > 0 for x ≥ 0 and q_x > 0 for x ≥ 1, and set γ_0 = 1 and γ_y = (q_1 ··· q_y)/(p_1 ··· p_y), y ≥ 1.
(a) Show that if Σ_{y=0}^{∞} γ_y = ∞, then ρ_{x0} = 1, x ≥ 1.
(b) Show that if Σ_{y=0}^{∞} γ_y < ∞, then

ρ_{x0} = Σ_{y=x}^{∞} γ_y / Σ_{y=0}^{∞} γ_y,  x ≥ 1.

27  Consider a gambler's ruin chain on {0, 1, 2, ...}.
(a) Show that if q ≥ p, then ρ_{x0} = 1, x ≥ 1.
(b) Show that if q < p, then ρ_{x0} = (q/p)^x, x ≥ 1. Hint: Use Exercise 26.
28  Consider an irreducible birth and death chain on the nonnegative integers. Show that if p_x ≤ q_x for x ≥ 1, the chain is recurrent.
29  Consider an irreducible birth and death chain on the nonnegative integers such that

q_x/p_x = (x/(x + 1))^2,  x ≥ 1.

(a) Show that this chain is transient.
(b) Find ρ_{x0}, x ≥ 1. Hint: Use Exercise 26 and the formula Σ_{y=1}^{∞} 1/y^2 = π^2/6.
30  Consider the birth and death chain in Example 13. (a) Compute P_x(T_0 < ∞), x > 0.
31  Consider a branching chain such that f(1) < 1. Show that every state other than 0 is transient.
32  Consider the branching chain described in Example 14. If a given man has two boys and one girl, what is the probability that his male line will continue forever?
33  Consider a branching chain with f(0) = f(3) = 1/2. Find the probability ρ of extinction.
34  Consider a branching chain with f(x) = p(1 − p)^x, x ≥ 0, where 0 < p < 1. Show that ρ = 1 if p ≥ 1/2 and that ρ = p/(1 − p) if p < 1/2.
35  Let X_n, n ≥ 0, be a branching chain. Show that E_x(X_n) = xμ^n. Hint: See Exercise 13.
36  Let X_n, n ≥ 0, be a branching chain and suppose that the associated random variable ξ has finite variance σ^2.
(a) Show that E[X_{n+1}^2 | X_n = x] = xσ^2 + x^2 μ^2.
(b) Use Exercise 35 to show that E_x(X_{n+1}^2) = xμ^n σ^2 + μ^2 E_x(X_n^2). Hint: Use the formula EY = Σ_x P(X = x) E[Y | X = x].
(c) Show that

E_x(X_n^2) = xσ^2 (μ^{n−1} + ··· + μ^{2(n−1)}) + x^2 μ^{2n},  n ≥ 1.

(d) Show that if there are x particles initially, then for n ≥ 1

Var_x(X_n) = xσ^2 μ^{n−1} (1 − μ^n)/(1 − μ)  if μ ≠ 1,  and  Var_x(X_n) = nxσ^2  if μ = 1.
37  Consider the queuing chain.
(a) Show that if either f(0) = 0 or f(0) + f(1) = 1, the chain is not irreducible.
(b) Show that if f(0) > 0 and f(0) + f(1) < 1, the chain is irreducible. Hint: First verify that (i) ρ_xy > 0 for 0 ≤ y ≤ x; and (ii) if x_0 ≥ 2 and f(x_0) > 0, then ρ_{0, x_0 + n(x_0 − 1)} > 0 for n ≥ 0.
38  Determine which states of the queuing chain are absorbing, which are recurrent, and which are transient, when the chain is not irreducible. Consider the following four cases separately (see Exercise 37):
(a) f(1) = 1;
(b) f(0) > 0, f(1) > 0, and f(0) + f(1) = 1;
(c) f(0) = 1;
(d) f(0) = 0 and f(1) < 1.
39  Consider the queuing chain.
(a) Show that for y ≥ 2 and m a positive integer

P_y(T_0 = m) = Σ_{k=1}^{m−1} P_y(T_{y−1} = k) P_{y−1}(T_0 = m − k).

(b) By summing the equation in (a) on m = 1, 2, ..., show that

ρ_{y0} = ρ_{y,y−1} ρ_{y−1,0},  y ≥ 2.

(c) Why does Equation (76) follow from (b)?
(d) By summing the equation in (a) on m = 1, 2, ..., n, show that

P_y(T_0 ≤ n) ≤ P_y(T_{y−1} ≤ n) P_{y−1}(T_0 ≤ n),  y ≥ 2.

(e) Why does Equation (79) follow from (d)?
40  Verify that (81) follows from (80) by induction.
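Several of these exercises lend themselves to numerical checking. The sketch below, for instance, verifies the affine-mean property of Exercises 13 and 14 for the Ehrenfest chain with d = 3 (for which A = 1 − 2/d and B = 1) and compares the recursion E(X_{n+1}) = A E(X_n) + B with the closed form of Exercise 13(b). The parameter choices are illustrative only.

```python
from fractions import Fraction

d = 3
S = range(d + 1)

# Ehrenfest transition function: from x, move to x-1 with probability x/d,
# and to x+1 with probability 1 - x/d.
def P(x, y):
    if y == x - 1:
        return Fraction(x, d)
    if y == x + 1:
        return Fraction(d - x, d)
    return Fraction(0)

# sum_y y P(x, y) should equal A*x + B with A = 1 - 2/d and B = 1.
A, B = 1 - Fraction(2, d), Fraction(1)
for x in S:
    assert sum(y * P(x, y) for y in S) == A * x + B

# Iterate E(X_{n+1}) = A E(X_n) + B from X_0 = 0 and compare with the
# closed form B/(1-A) + A**n * (E(X_0) - B/(1-A)) of Exercise 13(b).
m = Fraction(0)          # E(X_0)
for n in range(1, 6):
    m = A * m + B
    closed = B / (1 - A) + A**n * (0 - B / (1 - A))
    assert m == closed

print(m)   # E(X_5) starting from X_0 = 0
```

Exact rational arithmetic (`Fraction`) is used so that the identity can be checked with equality rather than floating-point tolerance.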
2  Stationary Distributions of a Markov Chain
Let X_n, n ≥ 0, be a Markov chain having state space 𝒮 and transition function P. If π(x), x ∈ 𝒮, are nonnegative numbers summing to one, and if

(1)  Σ_x π(x) P(x, y) = π(y),  y ∈ 𝒮,

then π is called a stationary distribution. Suppose that a stationary distribution π exists and that

(2)  lim_{n→∞} P^n(x, y) = π(y),  y ∈ 𝒮.

Then, as we will soon see, regardless of the initial distribution of the chain, the distribution of X_n approaches π as n → ∞. In such cases, π is sometimes called the steady state distribution. In this chapter we will determine which Markov chains have stationary distributions, when there is such a unique distribution, and when (2) holds.

2.1. Elementary properties of stationary distributions
Let π be a stationary distribution. Then

Σ_x π(x) P^2(x, y) = Σ_x π(x) Σ_z P(x, z) P(z, y)
                  = Σ_z (Σ_x π(x) P(x, z)) P(z, y)
                  = Σ_z π(z) P(z, y)
                  = π(y).

Similarly, by induction based on the formula

P^{n+1}(x, y) = Σ_z P^n(x, z) P(z, y),

we conclude that for all n

(3)  Σ_x π(x) P^n(x, y) = π(y),  y ∈ 𝒮.
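Equation (3) can be verified numerically on a small example. The sketch below uses the two-state chain of Section 1.1 with the arbitrary choice p = 0.3 and q = 0.1, for which π = (q/(p + q), p/(p + q)) = (0.25, 0.75).

```python
p, q = 0.3, 0.1
P = [[1 - p, p], [q, 1 - q]]
pi = [q / (p + q), p / (p + q)]   # stationary distribution of the two-state chain

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def vecmat(v, A):
    return [sum(v[x] * A[x][y] for x in range(2)) for y in range(2)]

# Check (3): pi P^n = pi for n = 1, ..., 5.
Pn = [[1.0, 0.0], [0.0, 1.0]]     # P^0
for n in range(1, 6):
    Pn = matmul(Pn, P)
    piPn = vecmat(pi, Pn)
    assert all(abs(piPn[y] - pi[y]) < 1e-12 for y in range(2))

print(pi)   # [0.25, 0.75]
```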
If X_0 has the stationary distribution π for its initial distribution, then (3) implies that for all n

(4)  P(X_n = y) = π(y),  y ∈ 𝒮,

and hence that the distribution of X_n is independent of n. Suppose conversely that the distribution of X_n is independent of n. Then the initial distribution π_0 is such that

π_0(y) = P(X_0 = y) = P(X_1 = y) = Σ_x π_0(x) P(x, y).

Consequently π_0 is a stationary distribution. In summary, the distribution of X_n is independent of n if and only if the initial distribution is a stationary distribution.
Suppose now that π is a stationary distribution and that (2) holds. Let π_0 be the initial distribution. Then
(5)  P(X_n = y) = Σ_x π_0(x) P^n(x, y),  y ∈ 𝒮.

By using (2) and the bounded convergence theorem stated in Section 2.5, we can let n → ∞ in (5), obtaining

lim_{n→∞} P(X_n = y) = Σ_x π_0(x) π(y).

Since Σ_x π_0(x) = 1, we conclude that

(6)  lim_{n→∞} P(X_n = y) = π(y),  y ∈ 𝒮.

Formula (6) states that, regardless of the initial distribution, for large values of n the distribution of X_n is approximately equal to the stationary distribution π. It implies that π is the unique stationary distribution. For if there were some other stationary distribution we could use it for the initial distribution π_0. From (4) and (6) we would conclude that π_0(y) = π(y), y ∈ 𝒮.
Consider a system described by a Markov chain having transition function P and unique stationary distribution π. Suppose we start observing the system after it has been going on for some time, say n_0 units of time for some large positive integer n_0. In effect, we observe

Y_n = X_{n_0 + n},  n ≥ 0.

The random variables Y_n, n ≥ 0, also form a Markov chain with transition function P. In order to determine unique probabilities for events defined in terms of the Y_n chain, we need to know its initial distribution, which is the same as the distribution of X_{n_0}. In most practical applications it is very
hard to determine this distribution exactly. We may have no choice but to assume that Y_n, n ≥ 0, has the stationary distribution π for its initial distribution. This is a reasonable assumption if (2) holds and n_0 is large.

2.2. Examples

In this section we will consider some examples in which we can show directly that a unique stationary distribution exists and find simple formulas for it.
In Section 1.1 we discussed the two-state Markov chain on 𝒮 = {0, 1} having transition matrix

           0       1
    0   [ 1−p      p  ]
    1   [  q     1−q  ]

We saw that if p + q > 0, the chain has a unique stationary distribution π, determined by

π(0) = q/(p + q)  and  π(1) = p/(p + q).
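The convergence in (2) is also easy to see numerically for this chain. A sketch, again with the arbitrary choice p = 0.3 and q = 0.1 (so that 0 < p + q < 2):

```python
p, q = 0.3, 0.1
P = [[1 - p, p], [q, 1 - q]]
pi = [q / (p + q), p / (p + q)]

# Raise P to a moderately large power by repeated multiplication.
Pn = P
for _ in range(199):
    Pn = [[sum(Pn[i][k] * P[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]

# Every row of P^200 is now essentially the stationary distribution.
for x in range(2):
    for y in range(2):
        assert abs(Pn[x][y] - pi[y]) < 1e-9

print(Pn)
```

The convergence is geometric at rate |1 − p − q| = 0.6, so 200 steps are far more than enough here.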
We also saw that if 0 < p + q < 2, then (2) holds for this chain.
If a_u = 0 for u > y, then

c_x = Σ_{z=0}^{x} a_z b_{x−z} = Σ_{z=0}^{min(x,y)} a_z b_{x−z}.

Using this with (17) and the binomial expansion and simplifying slightly, we conclude that

(18)  P^n(x, y) = e^{−λ(1−p^n)/q} Σ_{z=0}^{min(x,y)} C(x, z) (p^n)^z (1 − p^n)^{x−z} [λ(1 − p^n)/q]^{y−z}/(y − z)!.

Since 0 < p < 1, we see that p^n → 0 as n → ∞, and hence P^n(x, y) → e^{−λ/q} (λ/q)^y / y! for all x and y.
Thus (2) holds for this chain, and consequently the distribution π given by (16) is the unique stationary distribution of the chain.

2.3. Average number of visits to a recurrent state

Consider an irreducible birth and death chain with stationary distribution π. Suppose that P(x, x) = r_x = 0, x ∈ 𝒮, as in the Ehrenfest chain and the gambler's ruin chain. Then at each transition the birth and death chain moves either one step to the right or one step to the left. Thus the chain can return to its starting point only after an even number of transitions. In other words, P^n(x, x) = 0 for odd values of n. For such a chain the formula

lim_{n→∞} P^n(x, y) = π(y),  y ∈ 𝒮,

clearly fails to hold.
There is a way to handle such situations. Let a_n, n > 0, be a sequence of numbers. If

(20)  lim_{n→∞} a_n = L

for some finite number L, then

(21)  lim_{n→∞} (1/n) Σ_{m=1}^{n} a_m = L.

Formula (21) can hold, however, even if (20) fails to hold. For example, if a_n = 0 for n odd and a_n = 1 for n even, then a_n has no limit as n → ∞, but

lim_{n→∞} (1/n) Σ_{m=1}^{n} a_m = 1/2.

In this section we will show that

lim_{n→∞} (1/n) Σ_{m=1}^{n} P^m(x, y)

exists for every pair x, y of states for an arbitrary Markov chain. In Section 2.5 we will use the existence of these limits to determine which Markov chains have stationary distributions and when there is such a unique distribution.
Recall that

1_y(z) = 1 if z = y,  and  1_y(z) = 0 if z ≠ y,

and that

(22)  E_x(1_y(X_m)) = P^m(x, y).

Set

N_n(y) = Σ_{m=1}^{n} 1_y(X_m)

and

G_n(x, y) = Σ_{m=1}^{n} P^m(x, y).

Then N_n(y) denotes the number of visits of the Markov chain to y during times m = 1, ..., n. The expected number of such visits for a chain starting at x is given according to (22) by

(23)  E_x(N_n(y)) = G_n(x, y).
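The visit counts N_n(y) are easy to study by simulation. The sketch below is a hypothetical illustration, assuming the Ehrenfest chain with d = 3, whose stationary distribution is binomial with parameters 3 and 1/2 (Exercise 7), so that π(0) = 1/8; it runs one long path and computes the observed proportion N_n(0)/n.

```python
import random

random.seed(1)
d, y, n = 3, 0, 200000

# Simulate one path of the Ehrenfest chain started at x = 0 and count
# the visits N_n(y) to state y during times 1, ..., n.
x = 0
visits = 0
for _ in range(n):
    if random.random() < x / d:
        x -= 1          # one of the x balls in urn I moves out
    else:
        x += 1          # one of the d - x balls moves in
    if x == y:
        visits += 1

frequency = visits / n
print(frequency)        # long-run fraction of time at 0; pi(0) = 1/8 here
```

The observed fraction should be close to 1/8 even though the chain is periodic, since N_n(y)/n is a time average.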
Let y be a transient state. Then

lim_{n→∞} N_n(y) = N(y) < ∞  with probability one,
and

lim_{n→∞} G_n(x, y) = G(x, y) < ∞,  x ∈ 𝒮.

It follows that

(24)  lim_{n→∞} N_n(y)/n = 0  with probability one,

and that

(25)  lim_{n→∞} G_n(x, y)/n = 0,  x ∈ 𝒮.
Observe that N_n(y)/n is the proportion of the first n units of time that the chain is in state y and that G_n(x, y)/n is the expected value of this proportion for a chain starting at x.
Suppose now that y is a recurrent state. Let m_y = E_y(T_y) denote the mean return time to y for a chain starting at y if this return time has finite expectation, and set m_y = ∞ otherwise. Let 1_{T_y < ∞} denote the random variable that is 1 if T_y < ∞ and 0 if T_y = ∞.

Theorem 1  Let y be a recurrent state. Then

(26)  lim_{n→∞} N_n(y)/n = 1_{T_y < ∞}/m_y  with probability one,

and

(27)  lim_{n→∞} G_n(x, y)/n = ρ_xy/m_y,  x ∈ 𝒮.

Proof. Consider first a chain starting at y, for which (26) reduces to

(29)  lim_{n→∞} N_n(y)/n = 1/m_y  with probability one.

For r ≥ 1 let T_y^r denote the time of the rth visit to y, so that

T_y^r = min(n ≥ 1 : N_n(y) = r).

Set W_y^1 = T_y^1 = T_y and for r ≥ 2 let W_y^r = T_y^r − T_y^{r−1} denote the waiting time between the (r − 1)th visit to y and the rth visit to y. Clearly

T_y^r = W_y^1 + ··· + W_y^r.

The random variables W_y^1, W_y^2, ... are independent and identically distributed and hence they have common mean E_y(W_y^1) = E_y(T_y) = m_y. This result should be intuitively obvious, since every time the chain returns to y it behaves from then on just as would a chain starting out initially at y. One can give a rigorous proof of this result by using (27) of Chapter 1 to show that for r ≥ 1

P_y(W_y^{r+1} = m | W_y^1 = m_1, ..., W_y^r = m_r) = P_y(T_y = m),

and then showing by induction that

P_y(W_y^1 = m_1, ..., W_y^r = m_r) = P_y(W_y^1 = m_1) ··· P_y(W_y^1 = m_r).

The strong law of large numbers implies that

lim_{k→∞} (W_y^1 + W_y^2 + ··· + W_y^k)/k = m_y  with probability one,
i.e., that

(30)  lim_{k→∞} T_y^k/k = m_y  with probability one.

Set r = N_n(y). By time n the chain has made exactly r visits to y. Thus the rth visit to y occurs on or before time n, and the (r + 1)th visit to y occurs after time n; that is,

T_y^{N_n(y)} ≤ n < T_y^{N_n(y)+1},

and hence

T_y^{N_n(y)}/N_n(y) ≤ n/N_n(y) < T_y^{N_n(y)+1}/N_n(y),

or at least these results hold for n large enough so that N_n(y) ≥ 1. Since N_n(y) → ∞ with probability one as n → ∞, these inequalities and (30) together imply that

lim_{n→∞} n/N_n(y) = m_y  with probability one,

or, equivalently, that (29) holds.
Let y be a recurrent state as before, but let X_0 have an arbitrary distribution. Then the chain may never reach y. If it does reach y, however, the above argument is valid; and hence, with probability one, N_n(y)/n → 1_{T_y < ∞}/m_y as n → ∞. Thus (26) is valid.
By definition 0 ≤ N_n(y) ≤ n, and hence

(31)  0 ≤ N_n(y)/n ≤ 1.

A theorem from measure theory, known as the dominated convergence theorem, allows us to conclude from (26) and (31) that

lim_{n→∞} E_x(N_n(y)/n) = E_x(1_{T_y < ∞}/m_y) = P_x(T_y < ∞)/m_y = ρ_xy/m_y,

and hence from (23) that (27) holds. This completes the proof of Theorem 1.
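The limit (29) can be checked by simulation. A sketch for the two-state chain of Section 1.1 with the arbitrary choice p = q = 1/2, for which π(0) = 1/2 and hence m_0 = 1/π(0) = 2 by formula (36) of Theorem 5 below:

```python
import random

random.seed(3)
p = q = 0.5
trials = 100000

def step(x):
    # Two-state chain: P(0, 1) = p and P(1, 0) = q.
    if x == 0:
        return 1 if random.random() < p else 0
    return 0 if random.random() < q else 1

# Estimate the mean return time m_0 = E_0(T_0) by averaging many
# independent return times to 0.
total = 0
for _ in range(trials):
    x, n = 0, 0
    while True:
        x = step(x)
        n += 1
        if x == 0:
            break
    total += n

m0 = total / trials
print(m0)   # should be close to 2
```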
2.4. Null recurrent and positive recurrent states

A recurrent state y is called null recurrent if m_y = ∞. From Theorem 1 we see that if y is null recurrent, then

(32)  lim_{n→∞} G_n(x, y)/n = lim_{n→∞} (Σ_{m=1}^{n} P^m(x, y))/n = 0,  x ∈ 𝒮.
(It can be shown that if y is null recurrent, then

(33)  lim_{n→∞} P^n(x, y) = 0,  x ∈ 𝒮,

which is a stronger result than (32). We will not prove (33), since it will not be needed later and its proof is rather difficult.)
A recurrent state y is called positive recurrent if m_y < ∞. It follows from Theorem 1 that if y is positive recurrent, then

lim_{n→∞} G_n(y, y)/n = 1/m_y > 0.

Thus (32) and (33) fail to hold for positive recurrent states.
Consider a Markov chain starting out in a recurrent state y. It follows from Theorem 1 that if y is null recurrent, then, with probability one, the proportion of time the chain is in state y during the first n units of time approaches zero as n → ∞. On the other hand, if y is a positive recurrent state, then, with probability one, the proportion of time the chain is in state y during the first n units of time approaches the positive limit 1/m_y as n → ∞.
The next result is closely related to Theorem 2 of Chapter 1.

Theorem 2  Let x be a positive recurrent state and suppose that x leads to y. Then y is positive recurrent.
Proof. It follows from Theorem 2 of Chapter 1 that y leads to x. Thus there exist positive integers n_1 and n_2 such that

P^{n_1}(y, x) > 0  and  P^{n_2}(x, y) > 0.

Now

P^{n_1 + m + n_2}(y, y) ≥ P^{n_1}(y, x) P^m(x, x) P^{n_2}(x, y),

and by summing on m = 1, 2, ..., n and dividing by n, we conclude that

(G_{n_1 + n + n_2}(y, y) − G_{n_1 + n_2}(y, y))/n ≥ P^{n_1}(y, x) P^{n_2}(x, y) G_n(x, x)/n.

As n → ∞, the left side of this inequality converges to 1/m_y and the right side converges to

P^{n_1}(y, x) P^{n_2}(x, y)/m_x > 0,

and consequently m_y < ∞. This shows that y is positive recurrent.
From this theorem and from Theorem 2 of Chapter 1 we see that if C is an irreducible closed set, then every state in C is transient, every state in C is null recurrent, or every state in C is positive recurrent. A Markov chain is called a null recurrent chain if all its states are null recurrent and a positive recurrent chain if all its states are positive recurrent. We see therefore that an irreducible Markov chain is a transient chain, a null recurrent chain, or a positive recurrent chain.
If C is a finite closed set of states, then C has at least one positive recurrent state. For

Σ_{y∈C} P^m(x, y) = 1,  x ∈ C,

and by summing on m = 1, ..., n and dividing by n we find that

Σ_{y∈C} G_n(x, y)/n = 1,  x ∈ C.

If C is finite and each state in C is transient or null recurrent, then (25) holds and hence

1 = lim_{n→∞} Σ_{y∈C} G_n(x, y)/n = Σ_{y∈C} lim_{n→∞} G_n(x, y)/n = 0,

a contradiction.
We are now able to sharpen Theorem 3 of Chapter 1.

Theorem 3  Let C be a finite irreducible closed set of states. Then every state in C is positive recurrent.
Proof. The proof of this theorem is now almost immediate. Since C is a finite closed set, there is at least one positive recurrent state in C. Since C is irreducible, every state in C is positive recurrent by Theorem 2.

Corollary 2  An irreducible Markov chain having a finite number of states is positive recurrent.

Corollary 3  A Markov chain having a finite number of states has no null recurrent states.

Proof. Corollary 2 follows immediately from Theorem 3. To verify Corollary 3, observe that if y is a recurrent state, then, by Theorem 4 of Chapter 1, y is contained in an irreducible closed set C of recurrent states. Since C is necessarily finite, it follows from Theorem 3 that all states in C, including y itself, are positive recurrent. Thus every recurrent state is positive recurrent, and hence there are no null recurrent states.
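By Corollary 2, a finite irreducible chain also has a stationary distribution (Theorem 5 of the next section), with π(x) = 1/m_x. A rough numerical illustration, using the 3 × 3 matrix from Exercise 1 at the end of this chapter:

```python
P = [[0.4, 0.4, 0.2],
     [0.3, 0.4, 0.3],
     [0.2, 0.4, 0.4]]

# Power iteration: push an arbitrary initial distribution through P many times.
pi = [1.0, 0.0, 0.0]
for _ in range(200):
    pi = [sum(pi[x] * P[x][y] for x in range(3)) for y in range(3)]

# pi should now satisfy pi P = pi ...
piP = [sum(pi[x] * P[x][y] for x in range(3)) for y in range(3)]
assert all(abs(piP[y] - pi[y]) < 1e-12 for y in range(3))

# ... and the mean return times can be read off as m_x = 1/pi(x).
m = [1 / pi[x] for x in range(3)]
print(pi, m)   # pi is approximately (0.3, 0.4, 0.3)
```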
Example 4.  Consider the Markov chain described in Example 10 of Chapter 1. We have seen that 1 and 2 are transient states and that 0, 3, 4, and 5 are recurrent states. We now see that these recurrent states are necessarily positive recurrent.
2.!5 .
Existence and u n iq ueness of stationary d i stri butio n!5
In this section we will determine which Markov chains have stationary distributions and when there is a unique such distribution. In our dis cussion we will need to interchange summations and limits on several occasions. This is justified by the following standard elementary result in analysis, which we state without proof. Let a(x) , x E !/, be nonnegative numbers havlng finite SUn-l, and let bn (x) , x E !/ and Il > 1 , be such that I bn (x) I < 1, x E !/ and n > 1 , and Bounded Convergence Theorem .
lim bn(x) = b(x) , n� oo
Then
X E !/.
lim � a(x) bn(x) = � a (x) b (x). n� cx:) x
x
Let n be a stationary distribution and let m be a positive integer. by (3) � n(z)pm(z, x) n(x) . =
z
=
Summing this equation on m clude that
(34)
1 , 2, . . . , n and dividing by
(z x) � n(z) Gn ,
z
Then
n
=
n(x) ,
n, we con
X E fI'.
Theorem 4 Let ,roc be a stationary distribution. If x is a transient state or a null recurrent state, then n(x) = o .
.Proof
(35)
If x is a transient state or a null recurrent state, 1·Inl
n� oo
Gn(z , x) = n
0,
X E fI',
as shown in Sections 2.3 and 2.4. It follows from (34), (35), and the bounded convergence theorem that Gn(z, x) = i.J n( z) n(x) = 1·1m '" n� cx> n
as desired.
z
0, I
64
Stationary Distributions of a Marko v Chain
It follows from this theorem that a Markov chain with no positive re1current states does not have a stationary distribution. Theorem 5 An irreducible positive recurrent Markov chain has a unique stationary distribution n, given by
(36) Proof
n(x)
=
1
-
mx
t/.
X E
,
It follows from Theorem 1 and the assumptions of this theorem
that (z, x) = � lim Gn , n -+ ocl n mx
(37)
x,
Z E
f/ .
Suppose n is a stationary distribution. We see from (34), (37), and the bounded convergence theorem that n
( x)
=
. G fz x) hm � n(z) 2-�_'_ n -+ oo n
=
- � n ( z)
z
1
mx
=
z
1
-
mx
.
Thus if there is a stationary distribution, it must be given by (36). To complete the proof of the theorem we need to show that the fUllction ?r ex) , x E f/ , defined by (36) is indeed a stationary distribution. It is clearly nonnegative, so we need only show that (3 :8) and 1
� -- P(x, y ) x m: x
(39)
=
1
-
my
.y E
,
f/ .
Toward this end we observe first that
� P'"(z, x) x Summing on m
=
=
1.
1, . . . , n and dividing by 1t�, we conclude that Z E
(40)
f/ .
N�ext we observe that by (24) of Chapter 1
� P'"(z, x)P(x, y) x
=
P'" + 1 ( Z, y).
2. 5.
Exists1nce and uniqueness o�r: stationary distributiolns
By again summing on m
(41)
1 , . . . , n and dividing by
x) P(x, y)
I: GnCz, x
=
n
65
Gn + 1 (z, y)
=
n, we conclude� that
P(z, y)
_
n
n
.
If f/ is finite, we conc1lude from (37) and (40) that · 1 - 11m
� i.J
n-' oo x
Gn(z, n
x)
�I
1
- i..x , mx _
'
i.e. , that (38) holds. Silnilarly, we conclude that (39) holds by le:tting n -+ 00 in (41). This com,pletes the proof of the theorem if f/ is finite .. 1rhe argument to complete the proof for f/ infinite is more complicated, sin4�e we cannot directly interchange limits and sums as we did for f/ jfinite (th�e bounded convergence theorem is not applicable). Let f/ 1 be a finite subset of f/ . We see from (40) that
I: Gn(z,
x e9'1
n
x) < 1 ,
Since [f'1 is finite, we can let n --+ 00 in this inequality and conclude from (37) that
Th(� last inequality holds for any finite subset
�l) 1
of f/, and hence
(42) For if the sum of l /mx over x E f/ exceeded 1 , the sum over some 1inite subset of f/ would also exceed 1 . Similarly, we conclude from (41) that if f/ 1 is a finite subset of f/, then � I.J
x efl'l
Gn(z, n
x) P(x, y
)
0 for some n > 1 , i.e., such that p)(;X = Px{ Tx < (0) > o. We define its perio( dx by
dx
=:
l
g.c.d. {n
>
1 : P"(x, x)
>
O} .
Then 1 < d� < min (n
>
1 : P"(x, x)
>
0) .
If P(x, x) > 0, then dx = 1 . If x and y are two states, each of which leads to the other, then dx For let n 1 and n2 be positive integers such that and
=
dy e
2. 7.
Con v�Jrgence to the stBtiont.ry distribution
pi + n 2 (x ,
73
pn l (x , y)pn2( y, x) > 0, and hence dx is a divisor of n l + n 2 0 If p n( y, y) > 0, then pn l + 1I + n2(x, x) > p n l (x, y)pn( y, y)pn 2( y, x) > 0, x)
>
so that dx is a divisor of n l + n + n 2 0 Since dx is a divisor of n l -t- n 2 , it lDust be a divisor of n. Thus dx is a divisor of all numbers in the set {n � 1 : pll( y, y) > O}. Since dy is the largest such divisor, we con4:lude tha.t dx < dy e Similarly ely < dx , and hence dx = dy e ��e have shown, in oth,er words, that the stat(�s in an irreducible Markov chain have common period d. We say that the 4:hain is periodic with period d if d > 1 and aperiodic if d = 1 . A simple sufficient condition for an irr(�ducible Markov chain to be aperiodic is that P(x, x) > 0 for some x E: Y. Since P(O, 0) = ,/(0) > 0 for an irreducible queuing chain, such a chain is necessarily aperiodic. I:xa m ple 7.
Determine the period of an irreducible birth and death
chain. If
some rx > 0, then }>(x, x) = rx > 0, and the birth and death chain is aperiodic. In particular, the modified Ehrenfest chain in Examp}(� 3 is ape�riodic. Suppose rx = 0 for all x. Then in one transition the state of the c;hain changes either from an odd numbered state to an even numbered state or fro:m an even numbered state to an odd numbered state. In particular, a chain can return to its initial state only after an (�ven number of transitions. Thus the period of the chain is 2 or a mUltiple of 2. Since 2 p' (0, 0) = PO q l > 0, we conclude that the chain is periodic with pc�riod 2. In particular, the Ehrenfest chain introduc:ed in Example 2 of (�hapter 1 is periodic with period 2.
Let �r,., n > 0, be an irre(lucible positive recurrent ..Markov cha in having stationary distribution no If the cha in is aperiodlc, litn pn ( x, y) = n( y) , (55) x, Y E ff. n -' oo If the chain is periodic lwith period d, then for each pair x, y of states in .[/ there is an integer f, 0 < f < d, such that p n(x, y) = 0 unle.ss rl = md + r for some nonnegative integer m �, and lim p md + r(x, y) = dn( y) . ( 5 6) m '''' oo Theorem 7
74
Stationary Distributions of a Marko v Chain
For an illustration of the second half of this theorem, consider an irreducible positive recurrent birth and death chain which is periodic with period 2. If y x is even, then p 2m+ l ex, y) = 0 for all m > 0 and -
lim p 2 m(X, y) = 2n( y).
If y
-
m -' oo x is odd, then p. 2 m(x, y) = 0 for all rn > 1 and lim p 2 m+ l ex , y) = 2n( y). m -' oo
We will prove this theorem in an appendix to this chapter, which can be olmitted with no loss of continuity. Exa m pl e 8.
Deterrnine the asymptotic behavior of the matrix p n for the transition matrix P (a) from Example 3, (b) from Example 2. (a) The transition matrix P from Example 3 corresponds to an aperiodic irreducible Markov chain on {O, 1 , 2, 3} having the stationary distribution given by nCO) = t,
n(l ) = i ,
n (2)
It follows from Theorem 7 that for t t t t
n
=
i,
large
i i i i
i i i i
t t t t
0 i 0 i 0 t 0 i 0 i 0
0
and
n(3) = i .
(b) The transition matrix P from Example 2 corresponds to a periodic irreducible Markov chain on {O, 1 , 2, 3} having period 2 and the same stationary distribution as the chain in Exam.ple 3. From the discussion following the statement of Theorem 7, we conclude that for n large and even
p n ...:. while for n large and odd
p n ...:.
!
0 i ! 0 0 i t 0
!
0
t
0 ! i 0 0 �4 i 0
2. 8.
75
Prootr of con vergence
A P P E N D IX 2.1�.
Proof of convelrgence
�we
will first prove l�heorem 7 in the ap��riodic case. Consid(!r an apc!riodic, irreducible, positive recurrent Markov chain having transition fUIlction P, state space fI', and stationary distribution Te. We will now verify that the conclusion of Theorem 7 holds for such a chain. �Choose a E f/ and let .l be the set of positive integers defined by [ = {n > 0 : p n (a, a)
::::>
O} .
Then (i) g.c.d. [ = 1 ; (ii) if m E l and n
IE
[, then m + n E I.
Property (ii) follows frol1n the inequality
p m+n (a, a)
>
p m (a, a) pn (a, a).
Properties (i) and (ii) irnply that there is a positive integer n 1 such that n E: [ for all n > n 1 • For completeness we will prove this number theoretic result in Section 2. 8.2. Using this result we (�onclude that p n(a, a) > 0 for n � n 1 • 1Let x and y be any pair of states in f/. Since the chain is irredu�cible, the:re exist positive integ(�rs n 2 and n 3 such that
pn2(X , a)
>
0
and
Then for n > n l
pn2 + n + n3(x, y)
pn2 (x, a)pn(a, a)pn 3 (a, y) > o. W(! have shown, in other words, that for every pair x, y of states in f/ there is a positive integer no such that >
(57) Set
f/ 2
==
{(x, y) : x E f/ and y E
fI'} .
ThIen f/ 2 is the set of ordered pairs of elements in f/ . We will consider a Markov chain (Xn' Yn) having state space f/ 2 and transition function P2 defined by It follows that Xn , n > 0, and Ym n > 0, are each Markov chains having transition functiop. P, and the successive transitions of the Xn chain and the: Yn chain are chosen independently of each other.
76
Stationary Dist4ributions of a Marko v Chain
We will now develop properties of the lMarkov chain (Xn ' 1';,.). In particular, we will show that this chain is an aperiodic, irreducible, positive recurrent Markov chain. We will then use this chain to verify the conclusion of the theor{�m. Choose (xo, Yo) E [/2 and (x, y) E f/ 2 . By (57) there is an no > 0 such that and Then (58) W'e conclude from (58) that the chain is both irreducible and aperiodic. The distribution rc 2 on fl' 2 defined by rc.2 (xo, Yo) = rc(xo)rc( Yo) is stationary distribution. For
�
( XO,Yo) e f/2
rc .2 (xo, Y o)P.2« xo, Yo), (x, = �
a
y))
� rc(xo)rc( Yo)P(x o, x)P( Y o, y)
Xo e f/ yo e f/
=
rc(x)rc( y)
=
1t .2 (x,
y) .
Thus the chain on fl' 2 is positive recurrent ; in particular, it is recurr{�nt. Set T' = min (n > 0 : Xn == Yn). Choose a E f/. Since tbe (Xn ' Yn) chain is recurrent, 1(a,a)
==
min (n >
0
:
(Xm �!) = (a, a))
is jfinite with probability one. Clearly T < 1(a ,a ' and hence T is finit�� with ) probability one. For any n > 1 (regardless of the distribution of (Xo' Yo)) (59)
P(Xn = y, T
< n) =
P( Yn = y, T
< n),
YE
fI' .
This formula is intuitiv��ly reasonable since the two chains are indistin guishable for n > T. To make this argument precise, we choose 1 :� m < n. Then for Z E f/
(60)
P (Xn = y I T = m, Xm = Ym = z)
P( Yn = y I T = m, Xm = Ym = z), m since both conditional probabilities equal p, n - (z, y). Now the event =
{ I' L n} is the union of the disjoint events {T = m, Xm = Ym = z},
1 < m < n and Z E f/,
2. 8.
Proot� of convergence
77
so it follows from (60) and Exercise 4(d) of Chapter 1 that P(Xn =
Y I T < n)
=
P( Yn =
y I T < n)
and hence that (59) holds. ]jquation (59) implies that P(Xn =
y) = P�(Xn = y, T < n) + }>(Xn = y, T > n) = p·( Yn = y, T < n) + P�(Xn = y, T > n) < p·( Yn = y) + p e T > n)
and similarly that P( Yn =
y) < P(Xn = y) + }>(T > n) .
Th,erefore for n > 1 IP(Xn =
(61)
y) - P( Yn = y) 1 < pe T )� n) ,
YE
ff.
Since T is finite with probability one, lim peT > n) = o.
(62) We:
n -' oo
conclude from (61) and (62) that
(63)
lim (P(Xn
==
y)
- P( Yn = y)) = 0,
yE
ff.
lJsing (63), we can easily complete the proof of Theorem 7. Choose x E f/ and let the initial distribution of ( Xn ' Yn) be such that P(Xo = x) = 1 and
Yo
E ff.
Since X", n > 0 , and Yn , n > 0, are each Markov chains with transition function P, we see that P(Xn =
(64)
y) = pn(x, y) ,
Y E f/ ,
and P ( }� = y) = n( y) ,
(65)
Y IE
ff.
Thus by (63)-(65) n lim (p (x , y) - n( y))
=
lim (P(Xn = y) - P( Yn
=
y)) = 0,
and hence the conclusion of Theorem 7 holds. We first conside�r a slight extension of Th(�orem 7 in the aperiodic case. Let C be an irreducible closed set of positive recurrent states such that each state in C has period 1 , and let 1t 2.8.1 .
Peri od i c case.
78
Stationary Distributions of a Marko v Chain
bc� the unique stationary distribution concentrated on C. By looking at the M[arkov chain restricted to C, we conclude that lim p n{x, y) = n{ y) =
n -' oo
�,
x,
my
y
E C.
In particular, if y is any positive recurrent state having period 1 , then by letting C be the irreducible closed set containing y, we see that lim pn{ y, y) =
(66)
n -' oo
�. my
We now proceed with the proof of Theorem 7 in the periodic case. Lc�t Xm n > 0, be an irr1educible positive recurrent Markov chain which is p{�riodic with period d :::> 1 . Set Ym = Xmd , m > O. Then Ym , m > 0, is a p d . Choose y f/. Then M[arkov chain having transition function Q
E
==
g.c . d. {m I Qm{y, y) > O} = g.c.d. {nl I p md {y, y) > O} =
! g.c.d. d
{n I
pn (y, y)
>
O}
= 1.
Thus all states have period 1 with respect to the Ym chain. Let the Xn chain and hence also the Ym chain start at y. Since the XII chain first returns to y at some mUltiple of d, it follows that the expected return time to y for thle Ym chain is d - 1 my, where my is the expected return time to y for the Xn chain. In particu1lar, y is a positive recurrent state for a Markov chain having transition function Q. By applying (66) to this transition function we conclude that lim Qm( y, y) =
m -' oo
�
my
= dn( y) ,
and thus that
Y E f/.
lim .p md{ y, y) = dn{ y),
(67)
m -' oo
Let x and y be any pair of states in f/ and set '1
= min (n : p n{x, y)
:>
0).
Then, in particular, pr l (x, y) > o. We will show that p n(x, y) > 0 only if n ' 1 is an integral 1nultiple of d. Choose nl such that lr;)n1( y, x) > O. Then -
p rl + n l ( y, y)
>
pn l(y, x)prl{x, y)
> 0,
2. 8.
Proof' of con vergence
79
and hence ' 1 + n 1 is an integral multiple of cl. If p n(x , y) > 0, th(�n by the: same argument n + n 1 is an integral multiple of d, and therefore so is n ' 1 . Thus, n = kd +. ' 1 for some nonnegative integer k. 1rhere is a nonnegative integer m1 such that ' 1 = mId + " vvhere o ::::; , < d. We conclud,e that --
(68)
p n(x,
n
unless
y) = 0
=
md + ,
for some nonnegative integer m. It follows from (6 8 ) and from (28) of Chapter 1 that (69)
pmd + r(x, y)
==
m L Px(Ty k=O
=
kd + r)p (m - k )d(y, y).
{
Set
p(m - k )d( y " y) = (k) am 0,
o < k
m.
m,
Th�en by (67) for each fixed k lim am(k)
m -' oo
=
dn( y).
We: can apply the bounded convergence theorem (with f/ replaced by {O, 1, 2, . . . }) to conclude from (69) that lim pmd+ r(x, y)
m -' oo
00
dn( y) L Px{'Ty = kd + r) k= O = d n( Y)Px( Ty < ex)) =
= dn( y),
and hence that (56) holds. This completes the proof of Theorem 7. 2 .1t 2.
A resu lt from n u m ber theory.
I
Let I be a nonempty set
of positive integers such that (i) g.c.d. 1 = 1 ; (ii) if m and n are in 1'1 then m + n is in I. Thc�n there is an n o such that n E I for all n > n o . '�e will first prove that I contains two consecutive integers. Suppose otherwise. Then there :is an integer k > 2 and an n 1 E I such that n 1 + k E I and any two distinct integers in I differ by at least k. It follows frolm property (i) that thlere is an n E I such that k is not a divisor iQf n. We can write n = mk + "
80
Stationary Dist�ributions of a Markov ,Chain
where m is a nonnegativ(� integer and 0 < r < k. It follows from property (ii) that (m + 1 ) (nl + k) and n + {m + l )nl are each in I. Their di1[erence is
(m + 1 ) (nl + k) -- n
-
(In + l)nl =
k + mk - n
=
k
-
r,
which is positive and smlaller than k. This contradicts the definition of k. We have shown that I contains two conse:cutive integers, say nl and nl + 1 . Let n > ni. Jlhen there are nonnegative integers m and ,. such that 0 < r < nl and n
Thus n
=
-
ni
=
mn l +
4r(nl + 1) + (nl
r
-
Jr.
+ m)nl
'
which is in I by property (ii). This shows that n E I for all n > no
=
i
I
n .
Exercises 1
Consider a Markov chain having matrix 0 o .4 1 .3 2 .2
[
state space {O, 1 , 2} and transition
]
1 2 .4 .2 .4 .3 .4 .4
It
Show that this chain has a unique stationary distribution n and find n. 2 Consider a Markov chain having transition function P such that P{x, y) = ay, x E ff and y E ff, where the ay's are constants. Show that the chain has a unique stationary distribution n , given by n( y) = 3: 4
5
exy , y E 9'. Let n be a stationary distribution of a M[arkov chain. Show that if n{x) > 0 and x leads to y, then n{ y) > O. Let 1t be a stationary distribution of a Markov chain. Suppose that y and z are two states such that for some constant C })(x, y) Show that n{ y) = c1r{z) .
=
cP{x, z),
X E 9'.
Let no and 1t 1 be distinct stationary distributions for a Markov (�hain. (a) Show that for 0 < a < 1 , the function n« defined by
n«{x) = ( 1
-
a) n o(x) + all: l {x),
is a stationary distribution.
X E 9',
81
Exercises
(b) Show that distinct values of C( dett:�rmine distinct stationary distributions 1t(%. JHint : Choose Xo E f/ such that 1to(xo) =1= 1tl(XO) and show that 1t(%(xo) = 1tp(xo ) implies that C( = p. 6 Consider a birth and death chain on the nonnegative integers and suppose that po = 1 , 1'Jx = P > 0 for x > 1 :, and qx = q = 1 p' > 0 for x > 1 . Find the stationary distribution when it exists. 7 (a) Find the stationary distribution of the lEhrenfest chain. (b) Find the mean and variance of this distribution. S For general d, find the transition function of the modified Ehrenfest chain introduced in E�ample 3, and show that this chain has the same stationary distribution as does the original l�hrenfest chain. 9 Find the stationary distribution of the birth and death chain described in Exercise 2 of Chapter 1 . Hint : Use the formula -
( d) 2 ()
10
+
•
•
•
+
(dd) 2 (2dd ) · =
Let Xm n > 0, be a positive recurrent irreducible birth and death chain, and suppose that Xo has the stationary distribution 1t. Show that
P (Xo =
y I Xl =
x) = P (x , y},
x, Y E f/ .
Hint : Use the definition of 1tx given by (9). 11
Let Xm n > 0, be the :Markov chain introduced in Section 2.2.2. Show that if Xo has a Poisson distribution with parameter t, then Xn has a Poisson distribution �{ith parameter
12
Let Xm n � 0, be as in Exercise 1 1 . Show that
Hint : Use the result of Exercise 1 1 and equate coefficients of tX in the 13
14
appropriate power series. Let Xm n > 0, be as in Exercise 1 1 and suppose that Xo has the stationary distribution. Use th(� result of Exercise 1 2 to find cov (Xm' x,n + .), m � 0 and n > o. Consider a Markov ch.ain on the nonnegativc� integers having transition p, wrhere function P given by P(x, x 1) = p and P(x, O) = 1 + o < p < 1 . Show that this chain has a unique stationary distribution 1t and find 1t. -
Stationary Distjributions of a Markov Chain
82
1 5.
The transition function of a Markov chain is called doubly stochastic if
� P(x , y) x ef/
1 Ei
=
1,
y
E f/.
What i s the stationary distribution of an irreducible Markov chain having d < 00 states and a doubly stochastic transition function ? Consider an irreducible Markov chain having finite state space f/, transition function .P such that P(x, x) 0, X E f/ and stationary distribution n. Let Px , x E f/, be such that 0 < Px < 1 , and let Q(x, y), x E f/ and Y f/, be defined by
E
:=
Q�(x, x) and
=
1
-
Px y
#=
x.
Show that Q is the transition function of a.n irreducible Markov chain having state space 9') and stationary distribution n', defined by
n' (x)
1 7'
1 S1
1 91
20
=
1 Px- n(x )
Ly ef/ p;
l n( y) '
X E f/.
The interpretation of the chain with tra.nsition function Q is that starting from x, it has probability 1 - Px of remaining in x and prob ability Px of jumping according to the transition function P. Consider the Ehrenfest chain. Suppose that initially all of the balls are in the second box. Find the expected amount of time until the system returns to that state. Hint: Use th,e result of Exercise 7(a). A particle moves according to a Markov (;hain on { I , 2, . . . , c .+ d}, where c and d are positive integers. Starting from any one of the first c states, the particle jumps in one transition to a state chosen uniformly from the last d states ; starting from any of the last d states, the particle jumps in one transition to a state chosen uniformly from the Jfirst c states. (a) Show that the chain is irreducible. (b) Find the stationary distribution. Consider a Markov chain having the transition matrix glve�n by Exercise 19 of Chapter 1 . (a) Find the stationary distribution concentrated on each of the irreducible closed sets. (b) Find limn -+ oo Gn(x, y)jn. Consider a Markov chain having transition matrix as in Exercise 20 of Chapter 1 . (a) Find the stationary distribution concentrated on each of the� irre ducible closed sets. (b) Find limn -+ oo Gn(x, y)jn.
83
Exercises
21
22
Let Xm n > 0, be the Ehrenfest chain with ,I = 4 and Xo = o. (a) Fjnd the approxirnate distribution of X� for n large and even. (b) Find the approxirnate distribution of X;ra for n large and odd. Consider a Markov chain on {O, 1 , 2} having transition matrix 0
P =
23
1
2
[ [! � �l .
(a) Show that the chain is irreducible. (b) Find the period. (c) Find the stationary distribution. Consider a Markov chain on {O, 1 , 2, 3, 4} having transition matrix 0 0 0 1 0 P= 2 0 3 1 4 1
1
1
3"
0 0 0 0
2 3 10 3 l. 0 4 ° t 0 0 0 0
(a) Show that the chain is irreducible. (b) Find the period. (c) Find the stationary distribution.
4 0 ! ! . 0 0
Markov Pure Jump Processes
3
Consider again a system that at any time can be in one of a finite or countably infinite set f/ of states. We call f/ the state space of the system. In Chapters 1 and 2 we studied the behavior of such systems at integer times. In this chapter we will study the behavior of such systems over all times t > o. 3.1 .
Cc)nst ruct i o n of j u m p p rocesses
Consid,er a system starting in state Xo at time o. We suppose that the system remains in state Xo until some positive time 't'l ' at which time the system jumps to a new state Xl :F Xo . We allow the possibility that the systenrl remains permanently in state xo , in which case we set 't'l 00 . If 't' l i s finite, upon reaching X l the system remains there until some time 't' 2 > 't'l ,�hen it jumps to state X2 :F Xl . If the system never leaves Xl ' we set 't' 2 00 . This procedure is repeated indefinitely. If some 't'm 00 , we set 't' n 00 for n > m. Let X(t ) denote the state of the system at time t, defined by =
=
=
=
, Xl'
't'l
O. But this is not necessarily the case. Consid�er, for example, a ball bouncing on the floor. Let the state of the systenl be the number of bounces it has made. We make the physically reasonable assumption that the time in seconds between the nth bounce and the (n + I )th bounce is 2 - n • Then xn n and =
't'
n
=
1 + ! + 2
·
·
·
84
+
1 2n - l
_
_
=
2
_
1 . 2n - l
_ _
3. 1.
Construction of jump procf.�sses
85
W,e see that Tn < 2 and 'tn � 2 as n � 00 . Thus (1) defines X(t) only for o :< t < 2. By the time t = 2 the ball will have made an infinite number of bounces. In this case it would be appropriate to define X(t) = eX) for
t
��
2.
In general, if
(2)
n -+ oo
we: say that the X(t) process explodes. If the X(t) process does not explode, i.e., if
(3)
lim
= 00 ,
Tn
n-+ oo
th�en (1) does define X(t ) for all t > o. We will now specify a probability structur�e for such a jump process. We suppose that all states are of one of two types, absorbing or non absorbing. Once the process reaches an absorbing state, it remains there permanently. With each non-absorbing state x, there is associated a distribution function Fx{it ), - 00 < t < 00 , which vanishes for t < 0, and transition probabilities Q�xy, Y f/, which are nonnegative and such that Qxx = 0 and
E
(4)
x remains there for a random length of tirne T 1 having distribution function Fx and then jumps to state X(T 1) = Y with probability QXY' Y f/,. We assume that 'r 1 and X(T 1 ) are chosen A process starting at
E
independently of each other, i.e., that
Px(T 1
0,
which we will now verify. If x is an absorbing state, (8) reduces to the obvious fact that
t
>
o.
Suppose x is not an absorbing state. Then for a process starting at x, the� event {-r:l < t, X(T1) = z and X(t) = y} occurs if and only if th(� first jump occurs at some time s < t and takes the process to z, and the process goc�s from z to y in the r(�maining t s units of time. Thus
-
88
Mrarko v Pure Jump Proc�esses
so jpx(7: 1
t and X(t) ==
=
y) +
(�
Px(7:1
O.
It follows from (9) that PXy(t) is continuous in t for t > O. Therefore th�e integrand in (9) is a. continuous function, so we can differentiate the right side. We obtain (10)
P�y(t)
=
-
qxPXy(t) + qx
In particular,
P�y(O)
:=
-
�
x y(t), z ;# x Q z]>Z
qxPXy(O) + qx I:x QxzPZy(O) z ;#
Set
x, Y E= f/ .
(1 1) Then (12)
qxy
It follows from (1 2) that (13)
t
=
y == x,
Y
9t:
x.
>
O.
3.2.
Birth and death processes
89
E
The quantities qxy, x !;'fJ and Y E Y, are called the infinitesimal parameters of the process. These parameters determine qx and Qxy, and thus by our construction determine a unique Markov pure jump process. We can rewrite (10) in terms of the infinitesimal parameters as
(14)
>
t
z
o.
This equation is known as the backward equation. If Y is finite, we can differentiate the Chapman-Kolmogorov equation with respect to s, obtaining P�y( t + s)
( 15)
In particular,
==
� Pxz(t)P�y(s),
s > 0 and t
z
P�y(t) =
� Pxz(t)P�y(O),
t
z
>
>
o.
0,
or equivalently, (16)
>
t
z
o.
Formula (1 6) is known as the forward equation. It can be shown that (I S) and ( 1 6) hold even if fI' is infinite, but the proofs are not easy and 'will be o:mitted. In Section 3.2 we will describe some examples in which the backward or forward equation can be used to find explicit formulas for Pxy(t). 3.2.
B i rth and death processes
= {O, 1 , . . . , d} or Y = {O, 1 , 2, . . } . By a birth and death process on f/ we mean a Markov pure jump process on f/ having infinites irnal parameters qxy suc�h that
Let
Y
.
qxy = 0,
Iy
-
xl
>
1.
Thus a birth and death process starting at x can in one jump go only to the states x l or x + 1 . The parameters Ax = qx,x + 1 , X f/, and /lx = qx , x - 1 ' X E Y, are: called r(�spectively the birth rates and death rates of the process. The parameters qx and QXY of the process can be expressed simply in terms of the birth and death rates. By (1 3) -
E
so that (1 7)
and
M'arko v Pure Jump Pro(:esses
90
Thus x is an absorbing state if and only if A;t = /lx = O. If x is absorbing state, then by (1 2)
( 1 8)
Qxy '.--
Ax Ax
/lx
+ /lx Ax + /lx
,
y =: x - 1 ,
,
y = x + 1,
else,where.
0,
a.
non
E
A birth and death proce:ss is called a pure birth process if /lx = 0, X f/ , and a pure death process if Ax = 0, X E f/ . A pure birth process can move only to the right, and a pure death process can move only to the left. IGiven nonnegative numbers Ax , x E f/ , and /lx , x E f/, it is natural to ask whether there is a birth and death process corresponding to these parameters. Of course, /l o = ° is a necessary requirement, as is Ad = 0 if f/ is finite. The only additional problem is that explosions must be ruled out if f/ is infinite. It is not difficult to derive a necessary and sujfficient condition for the process to be non-explosive. A simple sufficient condition for the process to be non-explosiv1e is that for some positive numbers A and B x �� O.
This condition holds in all the examples we will consider. In finding the birth and death rates of specific processes, we will use SOlne standard properties of independent exponentially distributed random variables. Let � 1 " ' . ' �n be independent random variables having exponential distributions with respective parameters Ci l , . . . , Cin • Then min (�l ' . . . ' �n) has an exponential distribution with paralneter Ci t + · · · + Cin and
k
(19)
Moreover, with probability one, the random variables �l' n distinct values. �ro verify these results we observe first that
P(min (�l ' . · , �n) ,.
>
t) = P( � 1
>
= P( � 1 >
•
= 1,
•
•
. . . , n.
, �n take on
t, . . . , �n > t) t) · · · P( �n > t)
and hence that min (� 1 ' . . . , �n) has the indicat��d exponential distribution.
3. 2.
91
Birtb and death processes
Set
Then 11k has an exponential distribution with parameter
and � k and 11k are independent. Thus P( 'k
=
min (' 1 ' . . .
, � n))
1 "
--
Qx ,x + 1
==
P
and
Qx , x - l
-1-
p.
92
M'arko v Pure Jump Processes
State 0 is an absorbing state. Since Ax w�� conclude that and
=
Ilx
qx�Qx ,x + 1 and Ilx
=
xq(1 -- p),
=
qxQ,x , x - 1 ,
x > 0.
In the preceding exam.ple we did not actually prove that the process is a birth and death process'l i.e., that it "starts from scratch" after making a junlp. This intuitively reasonable property basically depends on thle fact that an exponentially distributed random variable , satisfies the forrnula
P(,
>
t
+
s
I , > s)
=
P(,
>
S,
t),
t
> 0,
but a rigorous proof is complicated. By (1 7) and the definition of Ax and Ilx' the backward and forward equations for a birth and death process can be written respectively as (20)
P�y(t)
=
IlxPX - l ,y(t) - (Ax
P�y(t)
=
Ay - 1Px,y - l (t) - (Ay
+
Ilx)PXy(t)
AxPx + 1 ,y(t),
+
t
> 0,
and (2 1 )
In (21 ) we set A - 1
Ild + l
=
o.
=
+
lly)P.Xy(t)
+
lly + 1 Px ,y + l (t), t > o.
0, and if [/ = {O, . " . , d} for d
O.
(28)
1 in the backward equation, or by subtracting Poo(t ) and By setting y Pl O(t) from one, we conclude that =
(29)
P O l (t)
=
P 1 1 ( t)
=
A
A
e - (). + p)t
__ ,..
A + /1
A + /1
t > 0,
,
and
(30)
A )., + /1
+
/1 A +
/1
e
-
(). + p)t ,
t > O.
.From (27)-(30) we see� that
(3 1 )
lim PXy(t) -+ t + 00
where
nCO)
(32)
=
/1 A +
n( l )
and
_
__
n( y),
=
/1
=
A -
-- -
A + /1
.
If �n; o is the initial distribution of the process, then by (2 7) and (28)
P(X (t)
=
0) = n o(O)P oo (t) + ( 1 =
��- +
(
n o(O)
. -�A +- /1
(
n o( l )
A +- /1
Sirnilarly,
P(X(t)
=
1)
=
+
-
n o(O))P 1 0 (t)
)
t > 0.
)
t > 0.
-
/1 e - (). + p)t, A + JU
-
A e - (). + p)t, ,A + JU
Thus P(X(t ) 0) and J·(X(t) 1 ) are indep(�ndent of t if and only if no is the distribution n given by (32). =
3.�!.2. o ::; t
=
Poi sso n proct�ss. Consider a pure birth process .X (t ), < 00 , on the nonnegative integers such that
x
> 0.
Since a pure birth process can nlove only to the right,
(33)
(34)
y
<x
and t >
t >
o.
o.
3.2.
95
Birth and death processes
=1= 0 is
The forward equation for y
P�y( t)
Fro m (23) we see that
=:
APx,y - l ( t) - APxy( t) ,
Px,(t) = e - ltpx'(O) + A.
f�
t >
o.
e - l ( t - s) PX , Y _ l (S) ds ,
t >
o.
Since PXY(O) = �XY ' we conclude that for y > x PXy(t) - A.
(35)
_
"
ft 0
e - A ( t - S) Px,y - t ( s) ds ,
t >
o.
It follows from (3 4) and (35) that
and hence by using (35) once more that
Px, x + (t) = A 2
f
t
o
e - A ( t - s ) Ase - AS ds
=
A 2 e - At
f
t
0
(At) 2 - A e t. s ds = 2
By induction y - x e - At (At) Pxy ( t) '
(36)
'--
(y
_
x) !
o < x
�;: y
and t
>
o.
Formulas (33) and (36) imply that
t >
P:x.;y(t) = PO ,y - x(t),
(37)
0,
and that if X(O) = x , then X(t) - x has a Poisson distribution with parameter A t . In general, for 0 < S < 1, X(t) - X es) has a Poisson distribution with parameter A(t - s). For if 0 < s < 1 and y is a nonnegative integer, then
P(X( t) - Xe s) = y) =
1 the time Tn of the nth jump equals the time Tn when the process hits state n. When the Poisson process is used to model levents oc�curring in time as d(�scribed above, the common time Tn = Tn is the tilne when the nth event occurs. The Poisson process can be used to construct a variety of other processes.
3.2.
97
Birth and death processes
IExa m ple 2 .
Consider the branching process introduced in Example 1 . Suppose that new particles immigrate into the system at random times that form a Poisson process with parameter A and th(�n give rise to succeedilng generations as described in Example 1 . Find th�� birth and death rates of this birth and death process. B ra n c h i ng
process with
i rn m i g ratio n .
Suppose there are initially x particles pres1ent. Let C; l' . . . , C;x be the tinles at which these particles split apart or disappear, and let " be the first tinle a new particle enters the system. We inte�rpret the description of the system as implying that " is independent of C; 1 ' , C;x . Then ' 1 ' . . . , 'x, " are independent exponentially distributed random variables having res.pective parameters q, . . . , q, A. Thus •
=
'r 1
P('l' 1 =
x
-t-
.
min (, l ' . . . , 'x , ,,)
=
is �exponential1y distributed with parameter q},�
The event { X('l' 1 )
.
= ,, ) =
A
xq +
it ,
•
•
•
A, and by
( 1 9)
·
= "
I } occurs if either ! 1 'l' 1 = min (c; l '
xq +
or
, C;;�)
and a particle splits into two new particles at time 'l' 1 . Thus Qx,x' + 1
Also,
-
A
xq + A
Q�x ,X - 1 -
+
xq p. x q ·+ A --
xq (1 xq + A
_ .
p).
WC� conclude that
and Ilx = qxQx ,X - 1 = xq (1
-- p).
It is also possible to c:onstruct a Poisson process with parameter A on - (XJ < t < 00 . We first construct two independent Poisson pro(;esses X1 ( t ) , 0 < t < 00, and X2 ( t ) , 0 < t < 00, both having parameter A . Wc� then define X(t), - CX) < t < 00, by X(t)
=
{ - X 1 ( - t), X 2 (t),
t < t >
0,
o.
98
"�arkov Pure Jump Projcesses
It is easy to show that the process X{t), satisfies the following three properties :
(iii)
•
3.,2.3.
0 : XCI )
=:
=
y).
If X(O) = y, then min (t > 0 : X(t) = y) O. A more useful random variable in this case is the time Ty of the first return to y after the process leaves y. Both cases are� covered by setting
( t > Tl : X(t)
=
y).
H�ere 't 1 is the time of thc:� first jump. If T 1 = ($.) or X(t) #= y for all t > T l ' w(� set Ty = 00 . If x is an absorbing state, set Pxy = �xy ; and if x is a non-absorbing state, set Pxy = Px(Ty < 00 ).
A state y E f/ is called recurrent if Pyy
=
1 and transient if Pyy < 1 . The process is said to be a recurrent process if all of its states are recurrent and a transient process if all of its states are transient. The process is called irreducible if Pxy > 0 for all choices of x E f/ and y E f/. The function P(x, y) QXY ' x E f/ and y E f/, is the transition function of a Markov chain called the embedded cha in.. The quantities Pxy for this Markov chain are equal to the corresponding quantities for the Markov pure jump process. This relationship shows that results of Chapter 1 involving only the numbers Pxy are also valid in the present context. In particular, an irreduciblt� .process is either a recurrent process or a transient process. It is recurrent if and only if the embledded chain is recurrent. If n(x), x E f/, are nonnegative numbers su]mming to one and if :=
� n(x)Px:�(t) = n( y), x th�en n is called a stationary distribution.
(46)
tion n for its initial distribution, then
P(X(t) so that
X(t)
=
has distriblition
y)
=
Y E 9() and
0,
If X(0) has a stationary distribu
� n(x)PXy( t ) x
t>
n for all t > o.
=
n(y),
3.3.
1 03
Propt3rties of a Marko v pure jump process
If (46) holds and f/ is finite, we can differentiate this equation and obtain (47)
� n(x)PI�y(t) x
In particular, by setting
t
=
=
f/ and
t > O.
0 in (47), we conclude from (1 1 ) that
�� n(x) qXY
(48)
y E
0,
oX
=
0,
E;
Y
f/.
It can be shown that (47) and (48) are valid c�ven if f/ is an infinite set. Suppose, conversely, that (48) holds. If f/ is Jfinite we conclude frolm the backward equation ( 1 4) that
� �x n(x)PXy(t) dt
= = = =
� n(x)P�y(t) x
� n(x) (� qxzPzi t»)
� (� n(x qxz) PZy(t) )
O.
Thus
� n(x)PXy(t) x is a constant in
t and
thc� constant value is given by
� n(x)PXY(O) x
=
� n(x) 0,
6
fT(t)
=
{ 0,
t
0.
O. The Markov (�hains discussed in Chapters of
=
.
.
.
1 and 2 are discrete pararneter processes, while the pure jump processes discussed in Chapter 3 are continuous parameter processes.
A stochastic process X (t), t E T, is called a second order process if EX2(t ) < 00 for each
t
E
T. Second order processes and random variables defined in terms of
them by various "linear" operations including integration and differentiation are the subjects of this and the next two chapters. We �rill obtain formulas for the means, variances, and covariances of such random variables. We will consider continuous parameter processes almost exclusively in these three chapters.
Since no new techniques are needed for handling the analogous
results for discrete paranleter processes, little would be gained by treating such processes in detail .
4.1 .
M ea n a n d cova r i a nce functi o ns
Let
X (t ) , t E 1�, be a second order process.. The mean function /lx(t ) ,
t E T, of the process is defined by /lx(t ) The
=
EX(t ) .
covariance function rx(s, t ) , s E T and t E :T, is defined by rx(s, t )
==
cov
(X (s) , X (t ))
=
EX(s)X (t ) - EX(s)EX (t ).
This function is also called the auto-covariance function to distinguish it from the cross-covariance function which will be defined later. 111
Since
1 12
Second Order Pro(�esses
Var X(t) = cov (X(t), ..:f(t)), the variance of X(t ) can be expressed in terms of the covariance function as (1)
Var X(t) = rx(t, t),
t E T.
By a finite linear combination of the randolm variables X(t), t rnlean a random variab l�� of the form
n � bjX(tj), j= 1 where n is a positive int(�ger, t , tn are points in T, and b 1 , 1,
•
.
.
.
.
E
.
T, we
, bn are
real constants. The cova.riance between two such finite linear combinations is given by
=
In particular,
n � � a i bjrx(sb tj)' i= 1 j= 1 m
(2) It follows immediately from the definition of the covariance fUIlction that it is symmetric in s and t, i .e., that (3)
rx(s, t) = rx(t, s),
s, t E T.
It is also nonnegative deJ7nite. That is, if n is a positive integer, t l ' ar�e in T, and h 1' . . . , hn are real numbers, the:n
n n � � b bj rx{tb tj) It = 1 j = 1 i
;;:�
·
.
. , tn
O.
This is an immediate consequence of (2) . We say that X(t), - CfJ < t < 00 , is a second order stationary process if for every number 't' the second order process Y(t), - 00 < t .< 00 , defined by - 00 .< t < 00 , Y(t) X(t + 't'), ==
ha.s the same mean and covariance functions. as the X(t) process. It is left: as an exercise for thc� reader to show that this is the case if and only if flx( t) is independent of t and rx(s, t ) depends only on the diffe:rence between s and t. Let X(t), - 00 < t o.
>
o.
]
146
Continuity, Integration, and Differentiation
Let / be a continuously differentiable function on (
-
00 ,
b]
such that
It can be shown that
f oo
J(t) dW(t)
=
a
��
oo
f
J(t) dW(t)
exists and is finite with probability one, and that the random variable
foo
J (t) d W (t)
is normally distributed with mean zero and variance
Let 9 be a continuously differentiable function on
It can be shown that under these conditions
( - 00 , c]
(42)
such that
holds with a =
-
00 ,
I.e. ,
(43)
E
[f
J(t) d W(t) 00
Exa m p l e 3.
C( i s
get) d W (t) 00
]
f�:(b.C)
= a2
J(t)g(t) dt.
Let X(t ), - 00 < t < 00 , be defined by
X(t) where
f
=
foo
elZ( t - u )
dW(u ) ,
- 00
t
Find the mean and covariance function of the X(t) process. Set
X(t) =
6;
f� W(s) ds,
T
J( X(t) dt o
)
= o.
00 , be as in Exercise 6 and suppose that
>
O.
149
Exercises
Use the result of Exercise 6(a) to show that
( J
00
lim T Var ! T T -+
OT 0
)
X(t) dt = 2 Jf rx(t) dt = 0
Hint : Observe that for 0
1
T
t rx{t) - dt o T
8
<
o.
Since integrals with respect to white noise have mean zero, t �: o.
(1 .1 )
From Example 2 of Chapter 5 and the formula - 2Cla� that s >
(12)
=
0 and
2ao at , vve see
t > o.
In particular, t > o.
'We assume throughout the remainder of this section that Cl is negative. Then ( 1 3)
Xo (t)
=
�
ao
I
t
- 00
e«(t - s) d W ( s)
=
- at/ao
6. 1.
First order differential equations
157
is well defined as we saw in our discussion of Example 3 of Chapter 5. Also for - 00 < t < Cf:.)
Xo (t) = � ao
J'O -
00
elZ(t - s)
dW(s)
+
� (t elZ(t - s) dW(s) ao J o
� rr elZ(t - s) dW(s), ao J o which agrees with (9) for t o = O. Thus Xo (t), - 00 < t < 00 , satisfies (6) on ( - 00 , 00 ). Our goal is to demonstrate that Xo(t) is the only second =
Xo(O)e«'
+
order stationary process. to do so. We see from Example 3 of Chapter 5 that Xo (t ) is a second order stationary process having zero means and covariance function (1 4)
- 00 < t < 00 . It is not difficult to sho�r that the Xo(t) process i s a Gaussian process. Let X(t), - 00 < t 0
E(X( t)
--
X O ( t)) 2 =
E(Ceat) 2 = e2atEC 2 ,
and hence lim E(X(t) - X O( t)) 2 = o.
( 1 7)
t-t· +
Since
00
E(X(t) - XO (t)) 2 = (EX(t) - EXO (t)) 2 -r Var (X(t) - Xo (t)) = (EX(t)) 2 + Var (X(t) - Xo (t)) > (EX(t )) 2 , wc:� see from ( 1 7) that lim flx( t) = o.
( 18)
t-+ +
00
It follows from ( 1 7) and Schwarz' s inequality (see the proof of Equation (2) of Chapter 5) that
(19)
lirn ( rx(s , t) - ,.xo(s , t)) = O.
s,t-+ +
00
W�e summarize ( 1 7)--( 1 9) : any second order process X(t ) that satisfi�es (6) on [0, 00 ) is asymptotically equal to the second order stationary solution Xo (t) of (6) on ( - 00 , 00 ), which has zero mea.ns and covariance function given by (14).
6.2.
Diff.9rential equations of o�rder n
759
Let X(t), 0 < t < 00 , be the solution to (6) on [0, (0) satisfying the initial condition X(O) = xo , where Xo is some real constant. From (1 1), (1 2), and (14), we see directly that Exa m pl e 1 .
lim Jlx( t) = lim
t -t· +
and that
lim (rx(s , t) -
s,t-+ + 00
6,. 2.
t-+ + 00
00
rxo( s ,
x o ecu
-q
t)) = lim
t
s, -+ + oc)
Differential eql uations of order
= 0 2
2ao a 1
ea(s + t)
=
o.
n
In this section we will describe the extensions of the results of Section 6. 1 to solutions of nth order stochastic differential equations. Before doing so, however, we will briefly review the deterministic theory in a fornl con vt:�nient for our purposes. Consider the homogeneous differential equation (20)
where ao, at, . . . , an are: real constants with ao :1= O. By a solution to (20) 011 an interval, we mean a function 4> (t) which is n times differentiable and such that
011
that interval. For each j, 1 < j < n, there is a solution 4>j to the homoge:neous differential equation on ( - 00 , (0) such that k = j - 1, 0 < k < n - 1
and
k :1= j - 1 .
t
These functions are rc�al-valued. If n = 1 , then 4>t(t) = ea , where C( = - a t/ao . In Section 6.2. 1 we will find formulas for 4> 1 and 4> 2 when n = 2. For any choice of the: n numbers e 1 , . . . , em the function
is the unique solution to (20) satisfying the initial conditions
Wre can write this solution in the form
160
Stoc��Bstic Differential EqU'ations
The polynomial
is called the characteristic polynomial of th,e left side of (20). I�y the fundamental theorem of algebra, it can be fa(�tored as where r 1 , . , rn are roots of the equation p(r) = O. These roots are not nt:�cessari1y distinct and may be complex-valu�ed. If the roots are distinct, then .
.
are solutions to (20), and any solution to (20) can be written as a linear combination of these solutions (i.e., these solutions form a basis �or the space of all solutions to (20)). If root r i iis repeated n i times in the fa,ctorization of the cha.racteristic polynomial, then are all solutions to (20). As i varies, we obtain L i n i = n solutions in this way, which again form a basis for the space of all solutions to (20). The left side of (20) is stable if every solution to (20) vanishes at 00 . The specific form of the solutions to (20) described in the previous para.graph shows that the left side� of (20) is stable if a.nd only if the roots of the characteristic polynomial all have negative real parts. Consider next the nonhomogeneous differential equation
aox(n)(t)
(22)
,�
a 1 x(n - l ) (t)
+ ··· +
anx(t) = y(t)
for a continuous function y(t). To find the ge�neral solution to (22) on an interval, we need only find one solution to (22) and add the g�eneral solution to the corresponding homogeneous differential equation. One method of finding a specific solution to (22) involves the irnpulse response function h(t), t > 0, defined as that solution to the homogeneous diJffe rential equation (20) satisfying the initial conditions
h (n - l )(O) = ! . ao It is convenient to define h(t) for all t by setting h(t) = 0, t < O. It follows from (21) that t > 0, t < O. h(O) = · · · = h ( n - 2 )(0) = 0
and
The function x(t) =
( t h( t J to
-
s)y(s) ds
6. 2.
Differential equations of order n
767
is easily shown to be the solution to (22) on an interval containing t o as its left endpoint and satisfying the initial conditions
x(to) = · · · = x(n - l )(to) = O. Suppose now that the left side of (22) is stable. Then h (t ) ponentially fast" as t � 00 and, in particular,
(23)
f�oo I h(t)1 dt
-+
0 "ex
and
< 00
If y(t), - 00 < t < 00 , is continuous and does not grow too fast as t --+ - 00 , e.g., if for all then
c > 0,
f oo h(t - s)y(s) ds f�oo h(t - s)y(s) ds
x(t) = =
defines a solution to (22) on (- 00 , 00 ) . (The reason h(t) is calle:d the "impulse response function" is that if y(t), - 00 < t < 00 , is a "unit impulse at time 0," then the solution to (22) is
x( t) =
f�
00
h( t - s)y(s) ds = h(t),
so that h(t) is the response at time t to a unit impulse at time 0.) With this background we are now ready to discuss the nth order stochastic differential equation
where W'(t) is white noise with parameter (}"2 . This equation is not well de:fined in its original form. We say that the stochastic process X(t) is a solution to (24) on an interval containing the point to if it is n - 1 times differentiable on that interval and satisfies the integrated form of (24), namely,
(25)
a o(X (n - l ) (t) - x (n - l ) ( to))
+ ··· + +
on that interval.
an
a n - 1(X(t) - X( to))
J't Xes) ds to
=
Wet) - W(to)
Stochastic Differential Equ.ltions
162
Theorem 1
The .process X(t), t > to , ciefined by
(t
X(t) = J h(t - s) dW(s), to is a solution to (24) on [to , 00) sati�fying the initial conditions X(to) = · · · = x 2. Then X(t) =
f� h(t - s) dW(s),
which by Equation (32) of Chapter 5 can be rc�written as
X(t) = h(O)W(t) Since h (O) = O� we see that
X(t) =
(26)
It follows from (26) that
+
f� h'(t - s) W(s) ds
.
f� h'(t - s)W(s) ds.
f� X(s) ds f� (f: h'(s - u)W(u) dU) ds f� W(u) ({ h'(s - u) dS) du f� W(u)(h(t - u) - h(O» du o =
=
=
We replace the dummy variable h (O) = 0, and obtain
u
by s in the last integral, note again that
f� X(s) ds f� h(t =
(27)
-
s) W (s) ds .
In order to find X'(t) from (26), we will use the calculus formula (28)
d dt
it f(s�� t) ds to
=
f(t, t)
+
it ata f(s, t) ds, to
-
which is a consequence of the chain rule. It follows from (26) and (28) that
X '(t) = h'(O)W(t)
+
f� h"(t - s) W(s) ds
.
6.2.
Differential equations of order n
763
If n > 2, then h'(O) = 0, and hence
f� h"(t - s) W(s) ds.
X '(t) =
By repeated differentiation we conclude that (29)
X (j )(t) =
f� h(i + 1 )(t - s) W(s) ds,
<j < n -
o
1.
Since h (n - l ) (o) = l/ao , we find by differentiating (29) with j = n - 2 that
x (n - l )(t) = W(t) ao
(30)
+
r h (n)(t - s)W(s) ds.
Jo
From (29) and (30), we see that
X(O) = X ' (O) = · · · = x (n - l ) (o) = O.
(3 1)
It follows from (27), (29), and (30) that
a ox (n - 1 )(t)
+
·
·
·
+
an - 1 X(t)
= W(t) +
an
+
f� X(s) ds
f: (aoh(n)(t - s)
+
·
·
·
+
an h(t - s» W(s) ds.
Since h(t) satisfies the homogeneous differential equation integral vanishes, and hence
(32)
aox (n - O (t)
+ ... +
an - 1 X(t)
+
an
f: X(s) ds
(20), the last
= W(t).
We see from (3 1) and (32) that (25) holds with to = O. This completes the proof of the theorem. I The general solution to (24) on [to, (0) is gjven by (33)
x (n - l )(tO)4>n(t - to) (t h (t _ s) dW(s), + J to
X(t) = X(tO)4> l (t - to)
+ ... +
t > to .
In more detail, let C1 , . . . , Cn be any n randolll variables. Then the process X(t), t > to , defined by (34)
X(t) = C 1 (t) = a o r 2 erit + a l rert + a 2 ert = (aor 2 + a1 r + a 2 )ert = p(r)ert .
Thus φ(t) = e^{rt} satisfies the homogeneous equation if and only if p(r) = 0, i.e., if and only if r is a root of the characteristic polynomial. In order to obtain specific formulas for φ₁(t) and φ₂(t) we must distinguish three separate cases corresponding to positive, negative, and zero values of the discriminant a₁² − 4a₀a₂ of the characteristic polynomial.

Case 1. a₁² − 4a₀a₂ > 0. The characteristic polynomial has two distinct real roots r₁ and r₂. The functions e^{r₁t} and e^{r₂t} are solutions to the homogeneous equation, as is any linear combination c₁e^{r₁t} + c₂e^{r₂t}, where c₁ and c₂ are constants. We now choose c₁ and c₂ so that the solution satisfies the initial conditions φ₁(0) = 1 and φ₁′(0) = 0. Since

φ₁(0) = c₁ + c₂  and  φ₁′(0) = r₁c₁ + r₂c₂,

we obtain two equations in the two unknowns c₁ and c₂, namely,

c₁ + c₂ = 1  and  r₁c₁ + r₂c₂ = 0,

which have the unique solution

c₁ = −r₂/(r₁ − r₂)  and  c₂ = r₁/(r₁ − r₂).

Stochastic Differential Equations

Thus

φ₁(t) = (r₁e^{r₂t} − r₂e^{r₁t})/(r₁ − r₂).

By similar methods we find that the solution φ₂ to the homogeneous equation having initial conditions φ₂(0) = 0 and φ₂′(0) = 1 is

φ₂(t) = (e^{r₁t} − e^{r₂t})/(r₁ − r₂).
Case 2. a₁² − 4a₀a₂ < 0. The characteristic polynomial has two distinct complex-valued roots

r₁ = (−a₁ + i√(4a₀a₂ − a₁²))/(2a₀)  and  r₂ = (−a₁ − i√(4a₀a₂ − a₁²))/(2a₀).

In terms of r₁ and r₂ the functions φ₁(t) and φ₂(t) are given by the same formulas as in Case 1. Alternatively, using the formula e^{iθ} = cos θ + i sin θ and elementary algebra, we can rewrite these formulas as

φ₁(t) = e^{αt}(cos βt − (α/β) sin βt)

and

φ₂(t) = (1/β) e^{αt} sin βt,

where α and β are the real numbers defined by r₁ = α + iβ, that is,

α = −a₁/(2a₀)  and  β = √(4a₀a₂ − a₁²)/(2a₀).

It is clear from these formulas that φ₁(t) and φ₂(t) are real-valued functions.

Case 3. a₁² − 4a₀a₂ = 0. The characteristic polynomial has the unique real root

r₁ = −a₁/(2a₀).

One solution to the homogeneous equation is φ(t) = e^{r₁t}. A second such solution is φ(t) = te^{r₁t}:

a₀φ″(t) + a₁φ′(t) + a₂φ(t) = a₀(r₁²t + 2r₁)e^{r₁t} + a₁(r₁t + 1)e^{r₁t} + a₂te^{r₁t}
  = (a₀r₁² + a₁r₁ + a₂)te^{r₁t} + (2a₀r₁ + a₁)e^{r₁t}
  = 0.

Thus φ₁(t) = c₁e^{r₁t} + c₂te^{r₁t} is a solution to the homogeneous equation for arbitrary constants c₁ and c₂. Choosing c₁ and c₂ so that φ₁(0) = 1 and φ₁′(0) = 0, we find that

φ₁(t) = e^{r₁t}(1 − r₁t).

Similarly, the solution φ₂ satisfying the initial conditions φ₂(0) = 0 and φ₂′(0) = 1 is found to be

φ₂(t) = te^{r₁t}.
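The three cases can be combined into a single routine. The following Python sketch is ours (the text contains no code, and the function names are invented); it builds φ₁ and φ₂ from a₀, a₁, a₂ according to the sign of the discriminant, and lets one check the initial conditions and the homogeneous equation by finite differences.

```python
import math

def fundamental_solutions(a0, a1, a2):
    """Return (phi1, phi2) for a0*phi'' + a1*phi' + a2*phi = 0 with
    phi1(0) = 1, phi1'(0) = 0 and phi2(0) = 0, phi2'(0) = 1."""
    disc = a1 * a1 - 4.0 * a0 * a2
    if disc > 0:                      # Case 1: two distinct real roots
        r1 = (-a1 + math.sqrt(disc)) / (2.0 * a0)
        r2 = (-a1 - math.sqrt(disc)) / (2.0 * a0)
        phi1 = lambda t: (r1 * math.exp(r2 * t) - r2 * math.exp(r1 * t)) / (r1 - r2)
        phi2 = lambda t: (math.exp(r1 * t) - math.exp(r2 * t)) / (r1 - r2)
    elif disc < 0:                    # Case 2: complex conjugate roots
        alpha = -a1 / (2.0 * a0)
        beta = math.sqrt(-disc) / (2.0 * a0)
        phi1 = lambda t: math.exp(alpha * t) * (math.cos(beta * t)
                                                - (alpha / beta) * math.sin(beta * t))
        phi2 = lambda t: math.exp(alpha * t) * math.sin(beta * t) / beta
    else:                             # Case 3: double real root
        r1 = -a1 / (2.0 * a0)
        phi1 = lambda t: math.exp(r1 * t) * (1.0 - r1 * t)
        phi2 = lambda t: t * math.exp(r1 * t)
    return phi1, phi2

def ode_residual(phi, a0, a1, a2, t, dt=1e-4):
    """Finite-difference value of a0*phi'' + a1*phi' + a2*phi at t."""
    d1 = (phi(t + dt) - phi(t - dt)) / (2.0 * dt)
    d2 = (phi(t + dt) - 2.0 * phi(t) + phi(t - dt)) / (dt * dt)
    return a0 * d2 + a1 * d1 + a2 * phi(t)
```

The coefficient triples (1, 3, 2), (1, 2, 2), and (1, 2, 1) exercise Cases 1, 2, and 3 respectively.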
Suppose that the left side of (48) is stable. Then the stationary solution X₀(t) to (48) on (−∞, ∞) has the covariance function given by (39). Since

h(t) = φ₂(t)/a₀ for t ≥ 0,  and  h(t) = 0 for t < 0,

we can use our formulas for φ₂(t) to compute r_{X₀}(t). The indicated integration is straightforward and leads to the result that

(50)  r_{X₀}(t) = (σ²/(2a₁a₂)) φ₁(|t|),  −∞ < t < ∞,

in all three cases for n = 2. In particular,

(51)  Var X₀(t) = σ²/(2a₁a₂),  −∞ < t < ∞.
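Formula (50) is easy to test numerically. The sketch below is our own; it assumes, consistently with the preceding sections, that for a white noise input with parameter σ² the stationary covariance (39) takes the form r_{X₀}(t) = σ² ∫₀^∞ h(s)h(s + |t|) ds. Taking a₀ = 1, a₁ = 3, a₂ = 2 (Case 1, with roots −1 and −2) and σ² = 1, we have h(t) = e^{−t} − e^{−2t}, and (50) predicts r_{X₀}(t) = (1/12)(2e^{−|t|} − e^{−2|t|}).

```python
import math

def h(t):
    # impulse response for a0=1, a1=3, a2=2: h(t) = phi2(t)/a0 = e^-t - e^-2t
    return math.exp(-t) - math.exp(-2.0 * t)

def cov_numeric(t, upper=30.0, n=100000):
    # midpoint rule for sigma^2 * integral_0^inf h(s) h(s + t) ds, sigma^2 = 1
    ds = upper / n
    return sum(h((k + 0.5) * ds) * h((k + 0.5) * ds + t) for k in range(n)) * ds

def cov_formula(t):
    # (sigma^2 / (2 a1 a2)) * phi1(|t|) = (1/12) (2 e^-|t| - e^-2|t|)
    u = abs(t)
    return (2.0 * math.exp(-u) - math.exp(-2.0 * u)) / 12.0
```

At t = 0 both expressions reduce to the stationary variance σ²/(2a₁a₂) = 1/12, in agreement with (51).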
Example 2.  Consider the stochastic differential equation

X″(t) + 2X′(t) + 2X(t) = W′(t).

(a) Suppose X(t), 0 ≤ t < ∞, is the solution to this equation on [0, ∞) having the initial conditions X(0) = 0 and X′(0) = 1. Find the distribution of X(t) at the first positive time t such that EX(t) = 0.
(b) Consider the stationary solution X₀(t), −∞ < t < ∞, to this equation on (−∞, ∞). Find the first positive time t such that X₀(0) and X₀(t) are uncorrelated.

Since a₁² − 4a₀a₂ = 4 − 8 = −4 < 0, Case 2 is applicable. Now

α = −2/2 = −1  and  β = √(8 − 4)/2 = 1.

Thus

φ₁(t) = e^{−t}(cos t + sin t)  and  h(t) = φ₂(t) = e^{−t} sin t,  t ≥ 0.

The mean and variance of the solution having the initial conditions indicated in (a) are given according to (36) and (37) by

EX(t) = φ₂(t) = e^{−t} sin t

and

Var X(t) = σ² ∫₀ᵗ h²(s) ds = σ² ∫₀ᵗ e^{−2s} sin² s ds.

Evaluating the last integral, we find that

Var X(t) = (σ²/8)[1 + e^{−2t}(cos 2t − sin 2t − 2)].

The first positive time t such that EX(t) = 0 is t = π. We see that X(π) is normally distributed with mean 0 and variance σ²(1 − e^{−2π})/8.

The covariance function of the stationary solution to the differential equation is given, according to (50), by

r_{X₀}(t) = (σ²/8) e^{−|t|}(cos |t| + sin |t|),  −∞ < t < ∞.
Thus the first positive time t such that X₀(0) and X₀(t) are uncorrelated is t = 3π/4.
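The integration asserted in Example 2 can be verified numerically. The following sketch is ours (with σ² = 1); it compares a midpoint-rule evaluation of ∫₀ᵗ e^{−2s} sin² s ds with the closed form (1/8)[1 + e^{−2t}(cos 2t − sin 2t − 2)], and checks that at t = π the variance reduces to (1 − e^{−2π})/8.

```python
import math

def var_numeric(t, n=100000):
    # sigma^2 * integral_0^t e^{-2s} sin^2(s) ds with sigma^2 = 1, midpoint rule
    ds = t / n
    return sum(math.exp(-2.0 * (k + 0.5) * ds) * math.sin((k + 0.5) * ds) ** 2
               for k in range(n)) * ds

def var_formula(t):
    return (1.0 + math.exp(-2.0 * t) * (math.cos(2.0 * t) - math.sin(2.0 * t) - 2.0)) / 8.0
```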
6.3. Estimation theory
In this section we will study problems of estimating a random variable Y by a random variable Ŷ, where Ŷ is required to be defined in terms of a given stochastic process X(t), t ∈ T. In terms of the probability space Ω, we observe a sample function X(t, ω), t ∈ T, and use this information to construct an estimate Ŷ(ω) of Y(ω). Estimation theory is concerned with methods for choosing good estimators.

Example 3.  Let X(t), 0 ≤ t < ∞, be a second order process and let 0 ≤ t₀ < t₁. The problem of estimating X(t₁) from X(t), 0 ≤ t ≤ t₀, is called a prediction problem. We think of t₀ as the present, t < t₀ as the past, and t > t₀ as the future. A prediction problem, then, involves estimating future values of a stochastic process from its past and present values. In the absence of any general theory one can only use some intuitively reasonable estimates. We could, for example, estimate X(t₁) by the present value X(t₀). If the X(t) process is differentiable, we could estimate X(t₁) by X(t₀) + (t₁ − t₀)X′(t₀).
Example 4.  Let S(t), 0 ≤ t ≤ 1, be a second order process. Let N(t), 0 ≤ t ≤ 1, be a second order process independent of the S(t) process and having zero means. Problems of estimating some random variable defined in terms of the S(t) process, based on observation of the process X(t) = S(t) + N(t), 0 ≤ t ≤ 1, are called filtering problems. Thinking of the S(t) process as a "signal" and of the N(t) process as noise, we wish to filter out most of the noise without appreciably distorting the signal. Suppose we want to estimate the S(t) process at some fixed value of t, say t = 1/2. If the signal varies slowly in time and the noise oscillates rapidly about zero, it might be reasonable to estimate S(1/2) by

(1/2ε) ∫_{1/2−ε}^{1/2+ε} X(t) dt

for some suitable ε between 0 and 1/2.
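A discrete caricature of this smoothing idea (ours, not from the text): sample X(t) = S(t) + N(t) on a fine grid, with a slowly varying signal and zero-mean noise drawn independently at the grid points, and estimate S(1/2) by averaging X over the window (1/2 − ε, 1/2 + ε).

```python
import math
import random

random.seed(1)
dt = 0.001
ts = [k * dt for k in range(1001)]                 # grid on [0, 1]
signal = {t: math.sin(math.pi * t) for t in ts}    # slowly varying "signal"
noise = {t: random.gauss(0.0, 1.0) for t in ts}    # rapidly varying zero-mean "noise"
x = {t: signal[t] + noise[t] for t in ts}          # observed process

eps = 0.1
window = [t for t in ts if abs(t - 0.5) <= eps]
estimate = sum(x[t] for t in window) / len(window)  # discretized (1/2eps) integral
```

With about 200 grid points in the window the averaged noise has standard deviation near 1/√200 ≈ 0.07, so the windowed estimate is close to S(1/2) = 1 even though each individual observation carries noise of standard deviation 1.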
We have discussed two examples of estimation problems and described some ad hoc estimators. In order to formulate estimation as a precise mathematical problem, we need some criterion to use in comparing the accuracy of possible estimators. We will use mean square error as our measure of accuracy. We will estimate random variables Y having finite second moment by random variables Z also having finite second moment. The mean square error of the estimate is E(Z − Y)². If Z₁ and Z₂ are two estimators of Y such that E(Z₁ − Y)² < E(Z₂ − Y)², then Z₁ is considered to be the better estimator.

In any particular estimation problem we must estimate some random variable Y in terms of a process X(t), t ∈ T. A random variable Z is an allowable estimator only if it is defined in terms of the X(t) process. We may further restrict the allowable estimators by requiring that they depend on the X(t) process in some suitably simple manner. In any case we obtain some collection ℳ of random variables which we consider to be the allowable estimators. An optimal estimator of Y is a random variable Ŷ in ℳ such that

(52)  E(Ŷ − Y)² = min_{Z ∈ ℳ} E(Z − Y)².

The estimators are required to have finite second moment, so that

(i) if Z is in ℳ, then EZ² < ∞.

In almost all cases of interest, ℳ is such that

(ii) if Z₁ and Z₂ are in ℳ and a₁ and a₂ are real constants, then a₁Z₁ + a₂Z₂ is in ℳ.

If condition (ii) holds, then ℳ is a vector space. To verify that optimal estimators exist, it is usually necessary for ℳ to be such that

(iii) if Z₁, Z₂, ... are in ℳ and Z is a random variable such that limₙ→∞ E(Zₙ − Z)² = 0, then Z is in ℳ.

Condition (iii) states that if Z is the mean square limit of random variables in ℳ, then Z is in ℳ. In other words, this condition states that ℳ is closed under mean square convergence.

Example 5.  Linear estimation.  Consider a second order process X(t), t ∈ T. Let ℳ₀ be the collection of all random variables that are of the form of a constant plus a finite linear combination of the random variables X(t), t ∈ T. Thus a random variable is in ℳ₀ if and only if it is of the form

a + b₁X(s₁) + ··· + bₙX(sₙ)

for some positive integer n, some numbers s₁, ..., sₙ each in T, and some real numbers a, b₁, ..., bₙ. The collection ℳ₀ satisfies (i) and (ii), but it does not in general satisfy (iii), because certain random variables involving integration or differentiation, e.g., X′(t₁) for some t₁ ∈ T, may be well defined in terms of the X(t) process but not be in ℳ₀. Such random variables, however, can be mean square limits of random variables in ℳ₀ under appropriate conditions, as we saw in Section 5.3. This leads us to consider the collection ℳ of all random variables which arise as mean square limits of random variables in ℳ₀. Clearly ℳ contains ℳ₀. It can be shown that ℳ satisfies conditions (i), (ii), and (iii). Estimation problems involving this choice of ℳ are called linear estimation problems.

Example 6.  Nonlinear estimation.  Let X(t), t ∈ T, be a second order process as in the previous example. Let ℳ₀ be the collection of all random variables having finite second moment and of the form

f(X(s₁), ..., X(sₙ)),

where n ranges over all positive integers, s₁, ..., sₙ range over T, and f is an arbitrary real-valued function on Rⁿ (subject to a technical condition involving "measurability"). Again ℳ₀ satisfies conditions (i) and (ii) but not necessarily (iii). The larger collection ℳ of all random variables arising as mean square limits of random variables in ℳ₀ satisfies all three conditions. Estimation problems involving this choice of ℳ are called nonlinear estimation problems.

The extension from ℳ₀ to ℳ in the above two examples is necessary only if the parameter set T is infinite. If T is a finite set, then ℳ₀ = ℳ in these examples.
6.3.1. General principles of estimation.  Most methods for finding optimal estimators are based on the following theorem.
Theorem 2.  Let ℳ satisfy conditions (i) and (ii). Then Ŷ ∈ ℳ is an optimal estimator of Y if and only if

(53)  E[(Ŷ − Y)Z] = 0,  Z ∈ ℳ.

If Ŷ and Ỹ are both optimal estimators of Y, then E(Ŷ − Ỹ)² = 0, and hence Ŷ = Ỹ with probability one; in this sense the optimal estimator of Y is uniquely determined.
Two random variables Z₁ and Z₂, each having finite second moment, are said to be orthogonal to each other if EZ₁Z₂ = 0. Theorem 2 asserts that the optimal estimator of Y in terms of random variables lying in ℳ is the unique random variable Ŷ in ℳ such that Ŷ − Y is orthogonal to all the random variables lying in ℳ (see Figure 2).

Figure 2
Proof.  Let Ŷ ∈ ℳ be an optimal estimator of Y and let Z be in ℳ. Then by condition (ii), Ŷ + aZ is in ℳ. It follows from (52) that

E(Ŷ − Y)² ≤ E(Ŷ + aZ − Y)²,  −∞ < a < ∞.

In other words, the function f defined by

f(a) = E(Ŷ + aZ − Y)² = E(Ŷ − Y)² + 2aE[(Ŷ − Y)Z] + a²EZ²

has a minimum at a = 0. Thus

0 = f′(0) = 2E[(Ŷ − Y)Z],

which shows that (53) holds.
Suppose now that Ŷ ∈ ℳ and (53) holds. Let Ỹ be any random variable in ℳ. Then

E(Ỹ − Y)² = E(Ỹ − Ŷ + Ŷ − Y)² = E(Ŷ − Y)² + 2E[(Ŷ − Y)(Ỹ − Ŷ)] + E(Ỹ − Ŷ)².

Since Ỹ − Ŷ is in ℳ, we can apply (53) with Z = Ỹ − Ŷ to conclude that E[(Ŷ − Y)(Ỹ − Ŷ)] = 0, and hence that

(54)  E(Ỹ − Y)² = E(Ŷ − Y)² + E(Ỹ − Ŷ)².

Since E(Ỹ − Ŷ)² ≥ 0, (54) shows that Ŷ is at least as good an estimator of Y as is Ỹ. Since Ỹ is an arbitrary random variable in ℳ, Ŷ is an optimal estimator of Y. If Ỹ is also an optimal estimator of Y, then by (54) we see that E(Ỹ − Ŷ)² = 0. This completes the proof of the theorem.
E (Y - Y)
(55)
Since the random variable X(t) is in
=
Jt 0
0. c
c/H
for t
E
T,
E ( Y - Y)X(t) 0, t E T. Conversely, if Y E vii satisfies (55) and (56), then f is the optimal linear ( 56)
=
estimator of Y. The proof of this result is left as an exercise. Let X(t), t E T, be a s��cond order process and let Ybe a random va.riable as before. Suppose no�r that for every positive integer n and every (;hoice of Sl ' . . . , Sn all in T, the random variables X(St ), . . . , X(sn), Y have a joint normal distribution. It can be shown that in this case the optimal linear estimator of Y and the optimal nonlinc�ar estimator of Y coincide. The proof depends basically on the fact that if X(SI), . . . , X(sn), Y have a joint normal distribution, then
for suitable constants
Q,
b1,
•
•
•
, bn •
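Conditions (55) and (56) reduce, when only finitely many values X(s₁), ..., X(sₙ) are used, to a system of linear (normal) equations for the coefficients. Here is a small sketch of ours for a zero-mean process with the hypothetical covariance r(s, t) = e^{−|s−t|}, observed at s₁ = 0 and s₂ = 1, with Y = X(2); for this covariance the solution turns out to be b₁ = 0 and b₂ = e^{−1}, so the optimal linear predictor depends only on the latest observation.

```python
import math

def r(s, t):
    # hypothetical covariance function of a zero-mean second order process
    return math.exp(-abs(s - t))

s = [0.0, 1.0]          # observation times s1, s2
y_time = 2.0            # we predict Y = X(2)

# Normal equations (56): sum_j b_j r(s_i, s_j) = r(s_i, y_time), i = 1, 2.
a11, a12 = r(s[0], s[0]), r(s[0], s[1])
a21, a22 = r(s[1], s[0]), r(s[1], s[1])
c1, c2 = r(s[0], y_time), r(s[1], y_time)

# Solve the 2x2 system by Cramer's rule.
det = a11 * a22 - a12 * a21
b1 = (c1 * a22 - a12 * c2) / det
b2 = (a11 * c2 - c1 * a21) / det
```

The orthogonality conditions E[(Ŷ − Y)X(sᵢ)] = 0 hold by construction, which is exactly what the test below verifies.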
6.3.2. Some examples of optimal prediction.  We will close this section by discussing some examples of prediction problems in which the optimal predictor takes on a particularly simple form.
Example 7.  Let W′(t) represent white noise with parameter σ², and let the observed process X(t), 0 ≤ t < ∞, be the solution to the differential equation

(57)  a₀X″(t) + a₁X′(t) + a₂X(t) = W′(t),  0 ≤ t < ∞,

satisfying the deterministic initial conditions

X(0) = x₀  and  X′(0) = v₀.

Let 0 ≤ t₁ < t₂. Find the optimal linear predictor of X(t₂) in terms of X(t), 0 ≤ t ≤ t₁, and find the mean square error of prediction.

As we saw in Section 6.2, we can write the solution to (57) as

(58)  X(t) = X(0)φ₁(t) + X′(0)φ₂(t) + ∫₀ᵗ h(t − s) dW(s),  0 ≤ t < ∞,
where φ₁, φ₂, and h are defined explicitly in Section 6.2.1. We have similarly that

X(t) = X(t₁)φ₁(t − t₁) + X′(t₁)φ₂(t − t₁) + ∫_{t₁}^t h(t − s) dW(s),  t ≥ t₁.

Set

X̂(t₂) = X(t₁)φ₁(t₂ − t₁) + X′(t₁)φ₂(t₂ − t₁).

Then

X̂(t₂) − X(t₂) = −∫_{t₁}^{t₂} h(t₂ − s) dW(s).

We will show that X̂(t₂) is the optimal linear predictor of X(t₂) in terms of X(t), 0 ≤ t ≤ t₁.

We note first that

E(X̂(t₂) − X(t₂)) = −E ∫_{t₁}^{t₂} h(t₂ − s) dW(s) = 0.

By (41) of Chapter 5,

E[∫₀ᵗ h(t − s) dW(s) ∫_{t₁}^{t₂} h(t₂ − s) dW(s)] = 0,  0 ≤ t ≤ t₁.

Using (58) and the fact that X(0) and X′(0) have the respective deterministic values x₀ and v₀, we now conclude that for 0 ≤ t ≤ t₁

E[X(t)(X̂(t₂) − X(t₂))] = E[(x₀φ₁(t) + v₀φ₂(t) + ∫₀ᵗ h(t − s) dW(s)) × (−∫_{t₁}^{t₂} h(t₂ − s) dW(s))] = 0.
Thus, to show that X̂(t₂) is the optimal linear predictor of X(t₂) in terms of X(t), 0 ≤ t ≤ t₁, it is enough to show that X̂(t₂) is the limit in mean square of linear combinations of the random variables X(t), 0 ≤ t ≤ t₁. To do this we need only show that X′(t₁) is such a limit in mean square. But from Equation (24) of Chapter 5 we see that X′(t₁) is the limit in mean square of the random variables

(X(t₁) − X(t₁ − 1/n)) / (1/n)

as n → ∞. This concludes the proof that X̂(t₂) is the desired optimal predictor of X(t₂).

The mean square error of the predictor X̂(t₂) is

E(X̂(t₂) − X(t₂))² = σ² ∫_{t₁}^{t₂} h²(t₂ − s) ds,

or

E(X̂(t₂) − X(t₂))² = σ² ∫₀^{t₂−t₁} h²(s) ds.

There are several worthwhile observations to be made concerning this example. First, X̂(t), t ≥ t₁, can be uniquely defined as that function which satisfies the homogeneous equation

a₀X̂″(t) + a₁X̂′(t) + a₂X̂(t) = 0,  t > t₁,

and the initial conditions

X̂(t₁) = X(t₁)  and  X̂′(t₁) = X′(t₁).

Secondly, the mean square error of prediction depends only on the distance between t₁ and t₂ and is an increasing function of that distance. Let δ be any positive number less than t₁. Then the predictor X̂(t₂) is the limit in mean square of linear combinations of the random variables X(t), t₁ − δ ≤ t ≤ t₁. Thus in predicting X(t₂) in terms of X(t), 0 ≤ t ≤ t₁, we need only observe X(t), t₁ − δ ≤ t ≤ t₁, for an arbitrarily small positive number δ. Finally, since the X(t) process is a Gaussian process, the optimal linear predictor X̂(t₂) of X(t₂) in terms of X(t), 0 ≤ t ≤ t₁, is also the optimal nonlinear predictor.
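As an illustration of the first observation (ours; the observed values below are hypothetical), take the coefficients of Example 2, so that φ₁(t) = e^{−t}(cos t + sin t) and φ₂(t) = e^{−t} sin t. The function X̂(t) built from any values X(t₁) and X′(t₁) then satisfies the homogeneous equation X̂″ + 2X̂′ + 2X̂ = 0 for t > t₁ with those initial values, which can be checked by finite differences.

```python
import math

def phi1(t):
    return math.exp(-t) * (math.cos(t) + math.sin(t))

def phi2(t):
    return math.exp(-t) * math.sin(t)

t1 = 0.8
x_t1, v_t1 = 1.3, -0.4             # hypothetical observed X(t1), X'(t1)

def xhat(t):
    # the predictor, viewed as a function of t >= t1
    return x_t1 * phi1(t - t1) + v_t1 * phi2(t - t1)

def residual(t, dt=1e-4):
    # finite-difference value of xhat'' + 2 xhat' + 2 xhat at t
    d1 = (xhat(t + dt) - xhat(t - dt)) / (2.0 * dt)
    d2 = (xhat(t + dt) - 2.0 * xhat(t) + xhat(t - dt)) / (dt * dt)
    return d2 + 2.0 * d1 + 2.0 * xhat(t)
```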
The results of Example 7 are readily extended to prediction of stochastic processes defined as solutions to differential equations of order n having white noise inputs. Suppose that X(t), t ≥ 0, is defined by requiring that

(59)  a₀X^(n)(t) + a₁X^(n−1)(t) + ··· + aₙX(t) = W′(t)

on 0 ≤ t < ∞, and that X(0), ..., X^(n−1)(0) take on n respective deterministic values. Let φ₁, ..., φₙ and h be as in Section 6.2. Then for 0 ≤ t₁ < t₂, the optimal (linear or nonlinear) predictor X̂(t₂) of X(t₂) given X(t), 0 ≤ t ≤ t₁, is given by

(60)  X̂(t₂) = X(t₁)φ₁(t₂ − t₁) + ··· + X^(n−1)(t₁)φₙ(t₂ − t₁).

The corresponding function X̂(t), t ≥ t₁, is the unique function that satisfies the homogeneous equation

(61)  a₀X̂^(n)(t) + a₁X̂^(n−1)(t) + ··· + aₙX̂(t) = 0,  t > t₁,

and the initial conditions

(62)  X̂^(j)(t₁) = X^(j)(t₁),  j = 0, 1, ..., n − 1.

The mean square error of prediction is given by

(63)  E(X̂(t₂) − X(t₂))² = σ² ∫₀^{t₂−t₁} h²(s) ds.
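Formula (63) can be checked by simulation. The sketch below is ours: it discretizes the equation of Example 2 by an Euler scheme (our choice of method, not from the text), forms the predictor (60) from the simulated X(t₁) and X′(t₁), and compares the empirical mean square prediction error over many paths with σ² ∫₀^{t₂−t₁} h²(s) ds, which for t₂ − t₁ = 1 and σ² = 1 equals (1/8)[1 + e^{−2}(cos 2 − sin 2 − 2)] ≈ 0.0687.

```python
import math
import random

random.seed(7)
a0, a1, a2 = 1.0, 2.0, 2.0          # Example 2 coefficients, sigma^2 = 1
phi1 = lambda t: math.exp(-t) * (math.cos(t) + math.sin(t))
phi2 = lambda t: math.exp(-t) * math.sin(t)

dt, t1, t2 = 0.005, 1.0, 2.0
n1, n2 = round(t1 / dt), round(t2 / dt)

sq_errors = []
for _ in range(2000):
    x, v = 0.0, 0.0                  # deterministic initial conditions
    xt1 = vt1 = 0.0
    for k in range(n2):
        if k == n1:
            xt1, vt1 = x, v          # state at the "present" time t1
        dw = random.gauss(0.0, math.sqrt(dt))
        # Euler step for a0 X'' + a1 X' + a2 X = W'
        x, v = x + v * dt, v + (-(a1 * v + a2 * x) * dt + dw) / a0
    xhat = xt1 * phi1(t2 - t1) + vt1 * phi2(t2 - t1)   # predictor (60)
    sq_errors.append((xhat - x) ** 2)

empirical_mse = sum(sq_errors) / len(sq_errors)
theoretical_mse = (1.0 + math.exp(-2.0) * (math.cos(2.0) - math.sin(2.0) - 2.0)) / 8.0
```

The discretization and Monte Carlo noise make the agreement only approximate; a few percent of relative error is expected at this step size and path count.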
Suppose now that the left side of (59) is stable and let X(t), −∞ < t < ∞, be the stationary solution to (59) on (−∞, ∞). Then for −∞ < t₁ < t₂, the optimal (linear or nonlinear) predictor X̂(t₂) of X(t₂) in terms of X(t), −∞ < t ≤ t₁, is again given by (60) or (61)–(62), and (63) remains valid.

6.4. Spectral distribution
Let X(t), −∞ < t < ∞, be the stationary process obtained by applying the filter with impulse response function h to a second order stationary process Y(t), −∞ < t < ∞, having covariance function r_Y and spectral density f_Y, so that

X(t) = ∫_{−∞}^∞ h(t − s)Y(s) ds.

We will verify that the spectral density of the X(t) process is

(66)  f_X(λ) = f_Y(λ)|H(λ)|²,  −∞ < λ < ∞,

where

H(λ) = ∫_{−∞}^∞ e^{−iλt} h(t) dt

is the frequency response function corresponding to h. The covariance function of the X(t) process is

r_X(t) = ∫_{−∞}^∞ ∫_{−∞}^∞ h(−u)h(t − v) r_Y(v − u) du dv,

so that

f_X(λ) = (1/2π) ∫_{−∞}^∞ e^{−iλt} r_X(t) dt.

Since

∫_{−∞}^∞ e^{−iλ(t−v)} h(t − v) dt = e^{−iλv} ∫_{−∞}^∞ e^{−iλt} h(t) dt = e^{−iλv} H(λ)

and

∫_{−∞}^∞ e^{−iλv} r_Y(v − u) dv = e^{−iλu} ∫_{−∞}^∞ e^{−iλv} r_Y(v) dv = 2π e^{−iλu} f_Y(λ),

we find that

f_X(λ) = f_Y(λ)H(λ) ∫_{−∞}^∞ e^{−iλu} h(−u) du = f_Y(λ)H(λ) ∫_{−∞}^∞ e^{iλu} h(u) du = f_Y(λ)H(λ)H(−λ).

It is left as an exercise for the reader to show that

(67)  H(λ)H(−λ) = |H(λ)|².

From this and the preceding result, we obtain (66) as desired. Of course, in order for (66) to be useful we must be able to compute the frequency response function H. This turns out to be surprisingly easy:
(68)  H(λ) = 1/(a₀(iλ)ⁿ + a₁(iλ)^{n−1} + ··· + aₙ),  −∞ < λ < ∞.
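Formulas (66) and (68) can be checked against the variance computed in Section 6.2. The sketch below is ours; it assumes the standard convention, consistent with (51), that white noise with parameter σ² has constant spectral density σ²/2π. For the equation of Example 2, integrating f_X(λ) = (σ²/2π)|H(λ)|² over the line should then recover Var X₀(t) = σ²/(2a₁a₂) = 1/8 (taking σ² = 1).

```python
import math

a0, a1, a2 = 1.0, 2.0, 2.0   # Example 2: X'' + 2X' + 2X = W'(t)

def H(lam):
    # frequency response (68): 1 / (a0 (i lam)^2 + a1 (i lam) + a2)
    z = 1j * lam
    return 1.0 / (a0 * z * z + a1 * z + a2)

def stationary_variance(sigma2=1.0, cutoff=100.0, n=200000):
    """Midpoint-rule integral of f_X(lam) = (sigma2 / 2 pi) |H(lam)|^2
    over [-cutoff, cutoff]; the tail beyond the cutoff is negligible."""
    dlam = 2.0 * cutoff / n
    total = 0.0
    for k in range(n):
        lam = -cutoff + (k + 0.5) * dlam
        total += abs(H(lam)) ** 2
    return sigma2 * total * dlam / (2.0 * math.pi)
```

The identity H(−λ) = conjugate of H(λ), which gives (67), can also be checked pointwise.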
We will now prove (68). The impulse response function h is such that if y(t), −∞ < t < ∞, is a bounded continuous function and

x(t) = ∫_{−∞}^∞ h(t − s)y(s) ds,  −∞ < t < ∞,

then

(69)  a₀x^(n)(t) + a₁x^(n−1)(t) + ··· + aₙx(t) = y(t).

This is true even if y(t) is a complex-valued function. Choose −∞ < λ < ∞ and set

y(t) = e^{iλt},  −∞ < t < ∞.

Then

x(t) = ∫_{−∞}^∞ h(t − s)e^{iλs} ds.

By setting u = t − s, we see that

x(t) = e^{iλt} ∫_{−∞}^∞ e^{−iλu} h(u) du = H(λ)e^{iλt}.

Substituting x(t) = H(λ)e^{iλt} and y(t) = e^{iλt} into (69), we obtain

(a₀(iλ)ⁿ + a₁(iλ)^{n−1} + ··· + aₙ)H(λ)e^{iλt} = e^{iλt},

and (68) follows.
Exercises

... where the V₀(t) process, t ≥ 0, is from Exercise 2.

(d) Set

X₀(t) = ∫₀ᵗ V₀(s) ds,  t ≥ 0.

Show that the X₀(t) process satisfies the stochastic differential equation

mX″(t) + fX′(t) = W′(t),  t ≥ 0.

(e) Express X₀(t) in terms of white noise.
(f) Find the mean and variance of X₀(t).
4  Is there a stationary solution to the stochastic differential equation

a₀X′(t) + a₁X(t) = W′(t)

on −∞ < t < ∞ if α = −a₁/a₀ is positive? If so, how can it be expressed in terms of white noise?

5  Let c be a real constant.
(a) Define precisely what should be meant by a solution to the stochastic differential equation

a₀X′(t) + a₁X(t) = c + W′(t).

(b) Find the general solution to this equation on 0 ≤ t < ∞.
... for s ≥ 0 and t ≥ 0 if and only if −∞ < ... ; r(s, t) = f(s)f(t), s ≥ 0, t ≥ 0, if and only if 0 < ... .

16  ...
17  ...
18  ...
19  ...

Answers

CHAPTER 1

... P₀₆ ... , p_{0,1}(3) = ... , 0 < x < d; ... (b − a), a < x < b; ... .
33  (√5 − 1)/2.
38  (a) 0 is transient and all other states are absorbing (and hence recurrent). (b) 0 and 1 are recurrent and all other states are transient. (c) 0 is absorbing and all other states are transient. (d) All states are transient.

CHAPTER 2

1  π(0) = .3, π(1) = .3, π(2) = .4.
6  When p < 1/2, the stationary distribution exists and is given by

π(0) = (1 − 2p)/(2(1 − p))  and  π(x) = ((1 − 2p)/(2(1 − p)^{x+1})) p^{x−1},  x ≥ 1.
7  (a) π(x) = C(d, x)/2^d, x = 0, 1, ..., d, which is the binomial distribution with parameters d and 1/2. (b) mean d/2 and variance d/4.
8  P(x, y) = x/2d for y = x − 1, (2d − x)/2d for y = x + 1, and 0 elsewhere.
9  π(x) = C(d, x)²/C(2d, d).
13  (1/q)pⁿ.
14  π(x) = ... .
15  π(x) = 1/d, ... .
17  2d.
18  (b) π(x) = ... .
19  (a) π(x) = (1 − p)pˣ, x ≥ 0; ... .
20  (a) lim_{n→∞} ... (b) ... .
C H A PT E R 3
= = -----
2 Po o (/ )
AO
Po 1 (/)
Ao
J.ll Al
+
+
_� e- ).ot +
+
A1
J.l l
_ _
Al + iiI
Ao + Al +
>
1
iiI
7 P(X( T)
n)
=
8 P(X(T ') J
n)
=
1 0 (a) P�y(/) P�x( /) (b) Pxx(/) (c) Pxy( t )
=
=
= =
(d) PX,Jc - 1 (/) and 1 1 EX(t )
;
=
1 2 (a) P�yl(t) 1 3 (b) Sx(/) and
=
and
=
+1
f:
0,
J.lx - 1 - J.lx =
( 1 - e- I' t)
J.lx1e- IJ',x t,
+
J.lx - 1
xe- I't,
[
xe 2 (). - p )t x
+
x(x
+
=
�
Y
1
+
e- ().O + ).l + Jll)t
0,
5 Tm has the gamma density with parameters 1
+
(A o
J.l l
A_ O __
Ao
+
+
A oJ.ll
A
=
J.l .
+
xe- I't
(y
+
)
( 1 - e- I' t).
1 )J.lPx,y + 1 (/ ) .
14  (a) μₓ = xμ. (b) Pₓᵧ(t) = ... . (c) ... .
...  (b) transient.
21  π(x) = ... .