INTRODUCTION
TO THE THEORY OF RANDOM PROCESSES
I. I. GIKHMAN
A. V. SKOROKHOD
Kiev State University
TRANSLATED BY SCRIPTA TECHNICA, INC.
W. B. SAUNDERS COMPANY Philadelphia
London
Toronto
W. B. Saunders Company: West Washington Square Philadelphia, Pa. 19105 12 Dyott Street London, W.C.1
1835 Yonge Street Toronto 7, Ontario
Originally published in 1965 as Vvedenie v teoriyu sluchainykh protsessov by the Nauka Press, Moscow
Introduction to the Theory of Random Processes © 1969 by W. B. Saunders Company. Copyright under the International Copyright Union. All rights reserved. This book is protected by copyright. No part of it may be duplicated or reproduced in any manner without written permission from the publisher. Made in the United States of America. Press of W. B. Saunders Company. Library of Congress catalog card number 68-18281.
Preface to the English Translation
The present work of Gikhman and Skorokhod is the first publication since the now classic "Stochastic Processes" by J. Doob to survey in a rigorous way the more modern results in the theory of stochastic processes and to link them to earlier developments in the subject. Not only are the standard classes of stochastic processes discussed, but much of the authors' original work on limit theorems in stochastic processes is also included. A significant feature of the book is a unique and beautiful treatment of processes with independent increments, which assumes no prior knowledge of the theory of infinitely divisible laws.
The book is appropriate for students who have a sound background in probability from a measure-theoretic viewpoint and will, undoubtedly, be welcome as a graduate text. For reference purposes the authors have included a comprehensive discussion of measure theory and of the basic ideas of probability. The authors take great care to state the topological assumptions underlying each theorem, although occasionally a result may be stated in slightly greater generality than seems warranted by the proof. The book contains a wealth of results, ideas, and techniques, the deepest appreciation of which demands a most careful reading. Certainly, this is not a book for the indolent. The English translation was reviewed by H. Dym, W. Rosenkrantz, S. Sawyer, D. Stroock, S. R. S. Varadhan, and myself. Although
it was our policy not to make major revisions of the manuscript, we corrected many small inadvertent errors. WARREN M. HIRSCH Courant Institute of Mathematical Sciences New York University
Introduction
The current literature on the theory of random processes is quite extensive. In addition to textbooks and monographs especially devoted to this theory and its various divisions, there are many technical books, for the most part dealing with automation and radio electronics, in which considerable space is given to the theory of random processes.
From the point of view of instruction this literature can be divided into two groups, the first consisting of serious and lengthy monographs whose difficulty hinders their use as beginning textbooks,
and the second consisting of books that are either elementary or written for engineering students.
There are no books in the [Russian]
literature that are designed for rigorous exposition and are at the same time suitable for elementary instruction. The authors have therefore decided to write the present book, based on material they have expounded in a number of courses at the University of Kiev. The first five chapters of the book are devoted to general questions in the theory of random processes (including measure theory and axiomatization of probability theory);
Chapters 6 through 9 are devoted to more specialized questions (processes with independent increments, Markov processes, and limit theorems for random processes). The book is designed for persons who have had a general course in probability theory and are ready to begin the study of the theory of random processes. The authors hope that it will prove to be useful for students in the universities and also for specialists, other than mathematicians, who wish to familiarize themselves with the fundamental methods and results of the theory in a rigorous though not the most general and exhaustive approach.
The authors have not undertaken to treat all branches of the theory. Certain questions and methods that are well covered in
current [Russian] literature are omitted. (These include the semigroup theory of Markov processes, ergodic properties of Markov processes, martingales, and generalized random processes.) On the
other hand, questions that, up to the present, have not been included in books on the theory of random processes (such as limit theorems
for random processes) but that play an important role in contemporary theory, are considered. The theory of random processes has recently developed into a separate branch of probability theory. Because the theory of random processes is still so closely related to other divisions of probability theory, the boundaries between this theory and its divisions are often difficult to determine precisely. For example, the theory of random processes is related to the theory of summation of independent
random variables by the division of probability theory that studies processes with independent increments, and to mathematical statistics
by statistical problems in the theory of random processes.
Let us characterize the problems in the theory of random processes that, from our point of view, may be considered basic. 1. The first problem in the theory of random processes is the construction of a mathematical model that allows a rigorous (formal) definition of a random process, and the investigation of the general properties of that model. 2. A second problem is the classification of random processes. Obviously every classification is arbitrary to some extent. Therefore we need to begin with specific principles that at least indicate the direction the classification will take. The existing classification
in the theory of random processes separates from the entire set of random processes certain classes that admit a more or less constructive description. Every class is characterized by the property that only a finite number of functional characteristics need to be specified
in order to single out from the entire class an individual random process.
Sometimes, we consider classes of processes that admit a uniform solution to a specified set of problems. In considering such classes,
we are usually not interested in the difference between random processes, if the characteristics necessary for the solution of these
problems coincide for them. We might mention the following broad classes of processes: (1) processes with independent increments, (2) Markov processes, (3) Gaussian processes, (4) processes that are stationary in the narrow sense, and (5) processes that are stationary in the broad sense. (We might include in this last group processes with stationary increments.)
3. The third problem, closely associated with the preceding one, consists in finding, for various classes of random processes, an
analytical apparatus that will enable us to calculate the probabilistic characteristics of random processes. Such an apparatus has been constructed for the simplest probabilistic characteristics, and it uses,
as a rule, either the theory of differential equations (ordinary and partial) and integrodifferential equations (in the case of Markov processes and processes with independent increments) or the theory of integral equations with symmetric kernel (in the case of Gaussian processes) or Fourier transformations and the theory of functions of a complex variable (for processes with independent increments and for stationary processes). 4. We need to single out a class of problems that has played an important role in the development of certain branches of the theory of random processes and that is of great practical significance.
In its general form, the problem consists in the best determination
of the value of some functional of a process from the values of other functionals of the same process. An example of such a problem is the problem of prediction: from observation of a process
over a certain interval of time, determine the value of the process at some instant of time outside that interval. Under certain restrictions, prediction problems have been solved for processes that are stationary in the broad sense (see Chapter V). 5. An important class of problems in the theory of random processes is the study of various transformations of random processes.
These transformations are used to study complicated processes by reducing them to simpler ones. We might include with the study of transformations of random processes the theory of differential and integral equations involving random processes.
This class of problems also includes limit theorems for random processes since the operation of taking the limit is a
sort of transformation.
At the present time the principal fields of application of the theory of random processes are electronics (which deals primarily
with processes that are stationary in the broad sense and with Gaussian processes) and cybernetics (which deals with processes that
are stationary in the narrow sense and with Markov processes). In mathematical economics and mathematical biology we use Markov processes of a different sort. In the molecular theory of
gases we use the process of Brownian motion; in the theory of showers of cosmic particles we apply Markov processes and processes with independent increments.
In general, the methods of the theory of random processes are finding ever new fields of application, and today every one of the natural sciences has felt the influence of this theory, at least to some degree.
Let us characterize briefly the features of the contents of the present book. The first chapter is devoted to random processes in the broad sense. This is the name we have given to the portion of the theory of random processes that deals only with distributions of finite sets of values of a random process. This portion is very close to elementary probability theory, involves no complicated mathematical concepts, and is sufficient for many applications. For a more profound study of the theory of random processes, a more highly developed theory of measure and integration is necessary. Therefore, following Chapter I we expound all the necessary information in this field (Chapter II), and on the basis of this information we construct an axiomatization of probability theory (Chapter III). We also consider the general questions in the theory of random functions and, after that, specific classes of random processes and special questions in the theory. Among random processes, extensive treatment is given to processes with independent increments (to which one chapter is devoted) and Markov processes (to which two chapters are devoted). Stationary processes are considered to some extent in Chapter I and again in Chapter V, which is devoted to linear transformations of random processes.
Chapter V also takes up the problem of linear prediction. An entire chapter is devoted to limit theorems for random processes. In this chapter basic attention is given to processes with independent increments and Markov processes.
Most of the constructions are made for the case in which a random process assumes values belonging to a finite-dimensional Euclidean space. In a few cases we consider complex-valued onedimensional and multidimensional processes and also processes with values belonging to a complete metric space. We assume that the reader is familiar with the basic concepts of linear algebra, which is particularly important for the study of Gaussian processes, and the theory of Hilbert spaces, which is used in the study of linear transformations of random processes. The reader should also have some familiarity with functional analysis (complete metric spaces, compact spaces, etc.).
We have not attempted to give a complete bibliography of works on the theory of random processes. In addition to books cited in the text, the bibliography includes only the basic books on the theory of random processes and probability theory that exist in
Russian, as well as articles in which the fundamental results in this field first appeared. The book is divided into chapters and the chapters into sections. The basic formulas and also the theorems, lemmas, and definitions are numbered afresh in each section. A reference to a theorem or
formula in the same section is indicated only by the number of the theorem or formula. If a reference is made to a theorem or formula in another section of the same chapter, the section number is added. If the reference is made to another chapter, the chapter number is added. The authors express their gratitude to colleagues and students in the Department of Probability Theory and Mathematical Statistics
of Kiev State University for the help they have given in the preparation of this book. Kiev
October 21, 1963
THE AUTHORS.
Contents

1  RANDOM PROCESSES IN THE BROAD SENSE
   1. Definitions
   2. Correlation Functions (Covariance Functions)
   3. Gaussian Random Functions
   4. Oscillations with Random Parameters
   5. The Spectral Representations of the Correlation Function of a Stationary Process and of the Structural Function of a Process with Stationary Increments

2  MEASURE THEORY
   1. Measure
   2. Measurable Functions
   3. Convergence in Measure
   4. Integrals
   5. Interchanging Limits and Integrations. Lp Spaces
   6. Absolute Continuity of Measures. Mappings
   7. Extension of Measures
   8. The Product of Two Measures

3  AXIOMATIZATION OF PROBABILITY THEORY
   1. Probability Spaces
   2. Construction of Probability Spaces
   3. Independence
   4. Series of Independent Random Variables
   5. Ergodic Theorems
   6. Conditional Probabilities and Conditional Mathematical Expectations

4  RANDOM FUNCTIONS
   1. Definition of a Random Function
   2. Separable Random Functions
   3. Measurable Random Functions
   4. Conditions for Nonexistence of Discontinuities of the Second Kind
   5. Continuous Random Functions

5  LINEAR TRANSFORMATIONS OF RANDOM PROCESSES
   1. Hilbert Spaces
   2. Hilbert Random Functions
   3. Stochastic Measures and Integrals
   4. Integral Representations of Random Functions
   5. Linear Transformations
   6. Physically Realizable Filters
   7. Prediction and Filtering of Stationary Processes
   8. General Theorems on the Prediction of Stationary Processes

6  PROCESSES WITH INDEPENDENT INCREMENTS
   1. Measures Constructed from the Jumps of a Process
   2. Continuous Components of a Process with Independent Increments
   3. Representation of Stochastically Continuous Processes with Independent Increments
   4. Properties of the Sample Functions of a Stochastically Continuous Process with Independent Increments
   5. Processes of Brownian Motion
   6. On the Growth of Homogeneous Processes with Independent Increments

7  JUMP MARKOV PROCESSES
   1. Transition Probabilities
   2. Homogeneous Processes with Countably Many States
   3. Jump Processes
   4. Examples
   5. Branching Processes
   6. The General Definition of a Markov Process
   7. The Basic Properties of Jump Processes

8  DIFFUSION PROCESSES
   1. Diffusion Processes in the Broad Sense
   2. Ito's Stochastic Integral
   3. Existence and Uniqueness of Solutions of Stochastic Differential Equations
   4. Differentiability of Solutions of Stochastic Equations with Respect to Initial Conditions
   5. The Method of Differential Equations
   6. One-Dimensional Diffusion Processes with Absorption

9  LIMIT THEOREMS FOR RANDOM PROCESSES
   1. Weak Convergence of Distributions in a Metric Space
   2. Limit Theorems for Continuous Processes
   3. Convergence of Sequences of Sums of Independent Random Variables to Processes of Brownian Motion
   4. Convergence of a Sequence of Markov Chains to a Diffusion Process
   5. The Space of Functions without Discontinuities of the Second Kind
   6. Convergence of a Sequence of Sums of Identically Distributed Independent Random Variables to a Homogeneous Process with Independent Increments
   7. Limit Theorems for Functionals of Integral Form
   8. Application of Limit Theorems to Statistical Criteria

BIBLIOGRAPHIC NOTES
BIBLIOGRAPHY
INDEX OF SYMBOLS
INDEX
I RANDOM PROCESSES IN THE BROAD SENSE

1. DEFINITIONS

The course of a random process, like that of a deterministic process, is described by some function ξ(θ) (which may assume real, complex, or vector values), where θ assumes values in a reference set Θ. As θ varies, ξ(θ) describes the evolution of the process. (Of course, the way in which the process evolves is random, and each of the functions describes only one of the possible ways in which the process may develop.) These functions are called sample functions of the random process. For each fixed θ, the quantity ξ(θ) is random. To be able to apply mathematical methods to the questions that we are studying, it is natural to assume that ξ(θ) is a random variable (possibly vector-valued) in the probabilistic sense. Consequently, by a random process we mean a family of random variables ξ(θ) depending on a parameter θ that assumes values in some set Θ.

If the set Θ is arbitrary, then instead of the term "random process" it is more convenient to use the term "random function" and to reserve the name "random process" for those cases in which the parameter θ is interpreted as time. When the argument of a random function is a spatial variable, this function is also called a random field. This definition of a random process, or a random function as we have just agreed to call it, needs to be made more precise. For the sake of simplicity we shall speak of a random function that assumes real values. First of all we need to make clear just what is meant by "a family of random variables depending on a parameter θ." We recall that, in accordance with the principles of probability theory, a finite sequence of random variables

ξ_1, ξ_2, …, ξ_n

is completely characterized by the joint distribution function

F(x_1, x_2, …, x_n) = P{ξ_1 < x_1, ξ_2 < x_2, …, ξ_n < x_n}.
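A family of random variables indexed by a parameter can be simulated directly, and the joint distribution function above estimated empirically. A minimal Python sketch; the harmonic process ξ(θ) = A cos θ + B sin θ with standard normal A, B, and the particular parameter values, are illustrative assumptions, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths = 100_000
A = rng.normal(size=n_paths)     # one (A, B) draw per sample function
B = rng.normal(size=n_paths)

def xi(theta):
    """Values of the illustrative process xi(theta) = A cos(theta) + B sin(theta)."""
    return A * np.cos(theta) + B * np.sin(theta)

# Empirical joint distribution function of (xi(theta1), xi(theta2)).
def F_emp(x1, x2, theta1=0.3, theta2=1.1):
    return np.mean((xi(theta1) < x1) & (xi(theta2) < x2))

# Here each xi(theta) is N(0, 1), so F(0, +inf) should be near 1/2.
p = F_emp(0.0, np.inf)
```

Each fixed θ gives an ordinary random variable; fixing several θ's gives a random vector whose distribution function the sketch estimates by counting.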
… Σ_{k=1}^s j_k ≤ p, if ξ(θ) ∈ L_p(Θ). A precise converse of this assertion holds for moment functions with even indices. Let Δ_{h_j} denote the operation of taking the symmetric finite difference with respect to the variable λ_j, and let Δ_{h_j}^j denote its j-th iterate:

Δ_{h_j} ψ(θ_1, …, θ_s; λ_1, …, λ_s) = ψ(θ_1, …, θ_s; λ_1, …, λ_j + h_j, …, λ_s) − ψ(θ_1, …, θ_s; λ_1, …, λ_j − h_j, …, λ_s),

Δ_{h_j}^j ψ(θ_1, …, θ_s; λ_1, …, λ_s) = Σ_{r=0}^j (−1)^r C_j^r ψ(θ_1, …, θ_s; λ_1, …, λ_j + (j − 2r)h_j, …, λ_s).
2. COVARIANCE FUNCTIONS
(C_j^r = j!/[r!(j − r)!] is the binomial coefficient.) Then

Δ_{h_1}^{2j_1} Δ_{h_2}^{2j_2} ⋯ Δ_{h_s}^{2j_s} ψ(θ_1, …, θ_s; λ_1, …, λ_s)|_{λ=0}
    = M ∏_{k=1}^s Σ_{r=0}^{2j_k} (−1)^r C_{2j_k}^r e^{i(2j_k − 2r)h_k ξ(θ_k)}
    = M ∏_{k=1}^s (e^{i h_k ξ(θ_k)} − e^{−i h_k ξ(θ_k)})^{2j_k}
    = ∏_{k=1}^s h_k^{2j_k} (2i)^{2Σ_{k=1}^s j_k} M ∏_{k=1}^s (sin h_k ξ(θ_k) / (h_k ξ(θ_k)))^{2j_k} [ξ(θ_k)]^{2j_k},

or

(−1)^{Σ_{k=1}^s j_k} Δ_{h_1}^{2j_1} ⋯ Δ_{h_s}^{2j_s} ψ|_{λ=0} / ∏_{k=1}^s (2h_k)^{2j_k}
    = M ∏_{k=1}^s (sin h_k ξ(θ_k) / (h_k ξ(θ_k)))^{2j_k} [ξ(θ_k)]^{2j_k}.

From this, by using Fatou's lemma (see Chapter II, Section 5), we get

(−1)^{Σ_{k=1}^s j_k} lim_{h_k→0, k=1,2,…,s} Δ_{h_1}^{2j_1} ⋯ Δ_{h_s}^{2j_s} ψ|_{λ=0} / ∏_{k=1}^s (2h_k)^{2j_k} ≥ M ∏_{k=1}^s [ξ(θ_k)]^{2j_k}.

The expression on the left-hand side of this inequality coincides with the derivative

(−1)^{Σ_{k=1}^s j_k} ∂^{2Σ j_k} ψ / (∂λ_1^{2j_1} ⋯ ∂λ_s^{2j_s})

at the point λ = 0 if ψ is 2Σ_{k=1}^s j_k times differentiable. Thus we have:

Theorem 2. If the characteristic function ψ(θ_1, …, θ_s; λ_1, …, λ_s) is p times differentiable, where p is an even integer, then there exist moment functions of order q ≤ p and they can be calculated from formula (2).
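The mechanism behind Theorem 2 can be checked numerically for a single standard normal variable, whose characteristic function ψ(λ) = e^{−λ²/2} is smooth: the normalized symmetric differences defined above converge to the even-order moments Mξ² = 1 and Mξ⁴ = 3. A sketch under these illustrative assumptions (the step h is arbitrary):

```python
import numpy as np
from math import comb

def psi(lam):
    """Characteristic function of a standard normal variable (assumed example)."""
    return np.exp(-lam**2 / 2)

def sym_diff(f, lam, h, j):
    """j-th iterated symmetric difference, as in the text:
    sum_{r=0}^{j} (-1)^r C_j^r f(lam + (j - 2r) h)."""
    return sum((-1)**r * comb(j, r) * f(lam + (j - 2*r) * h) for r in range(j + 1))

h = 0.01
# (-1)^j * Delta_h^{2j} psi(0) / (2h)^{2j} approximates M xi^{2j}.
m2 = -sym_diff(psi, 0.0, h, 2) / (2*h)**2    # should be near M xi^2 = 1
m4 = sym_diff(psi, 0.0, h, 4) / (2*h)**4     # should be near M xi^4 = 3
```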
In addition to moment functions, we often consider central moment functions

μ⁰_{j_1 ⋯ j_s}(θ_1, …, θ_s) = M([ξ(θ_1) − m_1(θ_1)]^{j_1} [ξ(θ_2) − m_1(θ_2)]^{j_2} ⋯ [ξ(θ_s) − m_1(θ_s)]^{j_s}),   (3)

which are the moment functions of the centered random variable ξ_1(θ) = ξ(θ) − m_1(θ), whose mathematical expectation equals 0 for arbitrary θ ∈ Θ. Among the moment functions, those of the first two orders are of special significance:

m(θ) = m_1(θ) = Mξ(θ),   (4)
R(θ_1, θ_2) = μ_{11}(θ_1, θ_2) = M([ξ(θ_1) − m(θ_1)][ξ(θ_2) − m(θ_2)]).   (5)

The function m(θ) is called the mean value and R(θ_1, θ_2) the covariance function. For θ_1 = θ_2 = θ, the covariance function gives the variance σ²(θ) of the random variable ξ(θ): R(θ, θ) = σ²(θ). For a stationary process (Θ = Z), it is obvious that

m(t) = m = const,   (6)

R(t_1, t_2) = R(t_1 − t_2, 0) = R(t_1 − t_2),   (7)
that is, the covariance function depends only on the difference in the arguments. The function R(t) = R(t1 + t, t1) is also called a covariance function of a stationary process. Of course if equations (6)
and (7) are satisfied for some process, it does not follow that the process is stationary. Still, we often encounter problems whose solution depends only
on the values of the first two moments of a random function e(t). For such problems, the condition that the process be stationary reduces to conditions (6) and (7). Therefore it is natural to consider the following important class of processes (introduced by A. Ya. Khinchin).
Definition 3. A random process ξ(t) is said to be stationary in the broad sense if Mξ²(t) < ∞ and

Mξ(t) = m = const,   M([ξ(t_1) − m][ξ(t_2) − m]) = R(t_1 − t_2).

We note that for a process that is stationary in the broad sense the variance σ² of the random variable ξ(t) is independent of t:

σ² = R(0) = M[ξ(t) − m]².
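The defining properties of broad-sense stationarity can be checked by Monte Carlo. A sketch assuming the illustrative process ξ(t) = m + a cos(ωt + φ) with random phase φ uniform on [0, 2π), for which Mξ(t) = m and R(τ) = (a²/2) cos ωτ; the constants are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, m_true, a, w = 200_000, 2.0, 1.0, 3.0
phi = rng.uniform(0.0, 2 * np.pi, size=n_paths)   # random phase, one per path

def xi(t):
    """Illustrative broad-sense stationary process."""
    return m_true + a * np.cos(w * t + phi)

t1, tau = 0.7, 0.4
x, y = xi(t1), xi(t1 + tau)
m_hat = x.mean()                                   # estimate of m
R_hat = np.mean((x - x.mean()) * (y - y.mean()))   # estimate of R(tau)
R_theory = (a**2 / 2) * np.cos(w * tau)
```

The estimates depend only on τ = t₂ − t₁, not on t₁ itself, which is exactly conditions (6) and (7).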
The quantity

r(θ_1, θ_2) = R(θ_1, θ_2) / (σ(θ_1)σ(θ_2)) = R(θ_1, θ_2) / √(R(θ_1, θ_1) R(θ_2, θ_2))

is called the coefficient of correlation of the random variables ξ(θ_1) and ξ(θ_2). If ξ(θ_1) and ξ(θ_2) are independent, the coefficient of correlation is 0. The converse is not generally true. However, in the important particular case in which the random variables ξ(θ_1) and ξ(θ_2) have a joint normal distribution, they are independent if the coefficient of correlation is 0 or, what amounts to the same thing, if the covariance function R(θ_1, θ_2) is identically zero. In the general case, if two random variables ξ and η with finite second-order moments satisfy the condition R_{ξη} = M[(ξ − Mξ)(η − Mη)] = 0,
they are said to be uncorrelated.
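A standard numerical illustration of the remark that zero correlation does not imply independence; the choice η = ξ² with ξ standard normal is an assumption for the sketch, not from the text:

```python
import numpy as np

rng = np.random.default_rng(2)
xi = rng.normal(size=500_000)
eta = xi**2                       # functionally dependent on xi

# Sample covariance R_{xi,eta}; since M xi^3 = 0, the true value is 0.
R = np.mean((xi - xi.mean()) * (eta - eta.mean()))

# Dependence still shows in higher moments: M[xi^2 * eta] = M xi^4 = 3,
# while M xi^2 * M eta = 1.
m22 = np.mean(xi**2 * eta)
```

So ξ and η are uncorrelated yet strongly dependent; only in the jointly Gaussian case does uncorrelatedness imply independence.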
Analogously, we may say that in those branches of the theory that deal only with first- and second-order moments, the concept of uncorrelatedness of random variables replaces the concept of independence of random variables. The coefficient of correlation of a pair of random variables is a measure of the linear dependence between them; that is, the coefficient of correlation shows with what accuracy one of the random variables can be linearly expressed in terms of the other. Let us clarify this. For the measure of error δ of the approximate equation ξ ≈ aη + b, where a and b are real numbers, we take the quantity δ = √(M[ξ − (aη + b)]²). Then

δ² = M[(ξ − Mξ) − a(η − Mη) + (Mξ − aMη − b)]²,

so that

δ² = Dξ + a²Dη + (Mξ − aMη − b)² − 2aR_{ξη}
   = (aσ_η − r_{ξη}σ_ξ)² + σ_ξ²(1 − r_{ξη}²) + (Mξ − aMη − b)²,

where σ_ξ² = Dξ and σ_η² = Dη are the variances of the variables ξ and η respectively. This expression attains its minimum when

a = R_{ξη}/σ_η² = (σ_ξ/σ_η) r_{ξη}   and   b = Mξ − aMη,

and this minimum is equal to min δ² = σ_ξ²(1 − r_{ξη}²). Thus the greater in absolute value the coefficient of correlation between two random variables, the greater the accuracy with which one of them can be represented as a linear function of the other.

We often consider complex-valued random functions ζ(θ), which
can be represented in the form ζ(θ) = ξ(θ) + iη(θ); we can also regard them as two-dimensional vector-valued random functions. For a complex-valued function, the relation ζ(θ) ∈ L_2(Θ) means that M|ζ(θ)|² < ∞ for θ ∈ Θ, that is, ξ(θ) ∈ L_2(Θ) and η(θ) ∈ L_2(Θ). The covariance function of a complex random function is defined by the equation

R(θ_1, θ_2) = M([ζ(θ_1) − Mζ(θ_1)] \overline{[ζ(θ_2) − Mζ(θ_2)]}),

where
the vinculum denotes the complex conjugate. Let us note certain properties of covariance functions:

1. R(θ, θ) ≥ 0, with equality holding if and only if ζ(θ) is constant with probability 1;

2. R(θ_1, θ_2) = \overline{R(θ_2, θ_1)};   (8)

3. |R(θ_1, θ_2)|² ≤ R(θ_1, θ_1) R(θ_2, θ_2);   (9)

4. for every n, points θ_1, θ_2, …, θ_n and complex numbers λ_1, λ_2, …, λ_n,

Σ_{j,k=1}^n R(θ_j, θ_k) λ_j \overline{λ_k} ≥ 0.   (10)

The first two assertions are obvious. The third is obtained as a consequence of the Cauchy-Schwarz inequality (M|ξ\overline{η}|)² ≤ M|ξ|² M|η|². To prove 4, it suffices to note that

Σ_{j,k=1}^n R(θ_j, θ_k) λ_j \overline{λ_k} = M Σ_{j,k=1}^n ζ⁰(θ_j) \overline{ζ⁰(θ_k)} λ_j \overline{λ_k} = M |Σ_{j=1}^n ζ⁰(θ_j) λ_j|² ≥ 0,

where ζ⁰(θ) = ζ(θ) − Mζ(θ). We note that properties 1, 2, and 3 follow from property 4. A function R(θ_1, θ_2) that satisfies property 4 is called a nonnegative-definite kernel on Θ.

A complex-valued process ζ(t) is said to be stationary in the broad sense if m(t) = Mζ(t) = const and R(t_1, t_2) = R(t_1 − t_2). For processes that are stationary in the broad sense, properties 1 to 4 of a covariance function take the forms

1'. R(0) ≥ 0;

2'. R(t) = \overline{R(−t)};   (11)

3'. |R(t)| ≤ R(0);   (12)

4'. Σ_{j,k=1}^n R(t_j − t_k) λ_j \overline{λ_k} ≥ 0.   (13)
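Property 4' says that every matrix (R(t_j − t_k)) built from a stationary covariance function must be nonnegative-definite, which is easy to test numerically. A sketch assuming the covariance function R(t) = e^{−|t|} (that of an Ornstein-Uhlenbeck process; the grid of points is arbitrary):

```python
import numpy as np

def R(t):
    """Assumed stationary covariance function (Ornstein-Uhlenbeck type)."""
    return np.exp(-np.abs(t))

t = np.linspace(0.0, 5.0, 40)
K = R(t[:, None] - t[None, :])   # the matrix (R(t_j - t_k))

# Nonnegative-definiteness (property 4') <=> all eigenvalues >= 0.
min_eig = np.linalg.eigvalsh(K).min()
```

A function failing this test for some choice of points cannot be the covariance function of any broad-sense stationary process.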
Let ζ_1(θ) and ζ_2(θ) denote two random functions belonging to L_2(Θ). To characterize the degree of linear dependence between two such functions, we introduce the joint covariance function.

Definition 4. The joint covariance function of two random functions ζ_1(θ) and ζ_2(θ) in L_2(Θ) is defined as the quantity

R_{ζ_1ζ_2}(θ_1, θ_2) = M([ζ_1(θ_1) − Mζ_1(θ_1)] \overline{[ζ_2(θ_2) − Mζ_2(θ_2)]}).

Suppose that Θ is an interval of the real axis Z. Then two processes ζ_1(t) and ζ_2(t) are said to be stationarily connected if each is stationary in the broad sense and R_{ζ_1ζ_2}(θ_1, θ_2) = R_{ζ_1ζ_2}(θ_1 − θ_2). Suppose that we are given a sequence of complex-valued random functions ζ_1(θ), ζ_2(θ), …, ζ_r(θ), with ζ_i(θ) ∈ L_2(Θ), i = 1, 2, …, r. Let us agree to treat this sequence as a single r-dimensional complex-valued random function ζ(θ) = {ζ_1(θ), ζ_2(θ), …, ζ_r(θ)}, θ ∈ Θ. If ξ and η are two vectors ξ = (ξ_1, ξ_2, …, ξ_r), η = (η_1, η_2, …, η_r), we shall let ξη* denote the matrix

ξη* = (ξ_i \overline{η_j})_{i,j=1}^r ,

whose entry in row i and column j is ξ_i \overline{η_j}.
We set

m(θ) = Mζ(θ) = {Mζ_1(θ), Mζ_2(θ), …, Mζ_r(θ)},

R(θ_1, θ_2) = (R_{ij}(θ_1, θ_2)) = M([ζ(θ_1) − m(θ_1)][ζ(θ_2) − m(θ_2)]*)
    = (M{[ζ_i(θ_1) − m_i(θ_1)] \overline{[ζ_j(θ_2) − m_j(θ_2)]}}),   i, j = 1, …, r.

The function m(θ) is an r-dimensional complex vector-valued function. It is called the mean value of the vector-valued random function ζ(θ). The matrix R(θ_1, θ_2) is called the covariance matrix of ζ(θ). If Θ = Z and m(t) = m = const, R(t_1, t_2) = R(t_1 − t_2), the process ζ(t) is said to be stationary in the broad sense.
Corresponding to properties 1 to 4 of covariance functions are the following properties of the covariance matrix of a random function:

1. R(θ, θ) is a nonnegative-definite matrix:

Σ_{j,k=1}^r R_{jk}(θ, θ) x_j \overline{x_k} = M |Σ_{j=1}^r x_j [ζ_j(θ) − m_j(θ)]|² ≥ 0;   (14)

2. R(θ_1, θ_2)* = R(θ_2, θ_1),   (15)

where the asterisk denotes the conjugate transpose of the matrix;

3. |R_{jk}(θ_1, θ_2)|² ≤ R_{jj}(θ_1, θ_1) R_{kk}(θ_2, θ_2),   j, k = 1, …, r;   (16)

4. for arbitrary n, points θ_1, …, θ_n and a sequence of complex vectors Λ_1, Λ_2, …, Λ_n,

Σ_{j,k=1}^n (R(θ_j, θ_k)Λ_k, Λ_j) ≥ 0.   (17)

This last condition is equivalent to:

4'. For an arbitrary sequence of matrices A_1, …, A_n, the matrix Σ_{j,k=1}^n A_j* R(θ_j, θ_k) A_k is nonnegative-definite.

Properties 1 and 2 are obvious. To prove property 3, let us use the Cauchy-Schwarz inequality for the mathematical expectation:

|R_{jk}(θ_1, θ_2)|² = |M[(ζ_j(θ_1) − m_j(θ_1)) \overline{(ζ_k(θ_2) − m_k(θ_2))}]|² ≤ R_{jj}(θ_1, θ_1) R_{kk}(θ_2, θ_2).

To prove property 4, let us set Λ_k = (a_{k1}, …, a_{kr}). Then

Σ_{j,k=1}^n (R(θ_j, θ_k)Λ_k, Λ_j) = Σ_{j,k=1}^n Σ_{p,q=1}^r R_{pq}(θ_j, θ_k) a_{kq} \overline{a_{jp}}
    = M |Σ_{j=1}^n Σ_{p=1}^r (ζ_p(θ_j) − m_p(θ_j)) \overline{a_{jp}}|² ≥ 0.
In conclusion let us look at some modifications in the preceding definitions.

Let us agree to call a process ζ(t), t ∈ Z, a process belonging to the class L_2 (or L_2(Z)) if for arbitrary t_1, t_2 ∈ Z, M|ζ(t_2) − ζ(t_1)|² < ∞, where |ζ(t)| denotes the norm of the vector. For processes belonging to the class L_2, we introduce the vector-valued function

m(t_1, t_2) = M{ζ(t_2) − ζ(t_1)} = {m_1(t_1, t_2), m_2(t_1, t_2), …, m_r(t_1, t_2)},

which we call the mean value of the increment of the process, and we introduce the matrix

D(t_1, t_2, t_3, t_4) = (D_{jk}(t_1, t_2, t_3, t_4))
    = (M{[ζ_j(t_2) − ζ_j(t_1) − m_j(t_1, t_2)] \overline{[ζ_k(t_4) − ζ_k(t_3) − m_k(t_3, t_4)]}}),   j, k = 1, …, r,

which we call the structural matrix of the process ζ(t). If the functions m(t_1, t_2) and D(t_1, t_2, t_3, t_4) are independent of a displacement of the arguments, that is, if

m(t_1 + h, t_2 + h) = m(t_1, t_2),
D(t_1 + h, t_2 + h, t_3 + h, t_4 + h) = D(t_1, t_2, t_3, t_4),

where t_i ∈ Z, t_i + h ∈ Z, i = 1, …, 4, then ζ(t) is called a process with stationary increments in the broad sense.

For a process with stationary increments, m(h) = m(t, t + h) is an additive function of h. If we make the additional requirement that the function m(h) be continuous or bounded on some interval, it then follows that m(h) is linear, that is, m(h) = (m_1h, m_2h, …, m_rh). For real processes the structural function D(t_1, t_2, t_3, t_4) can be expressed in terms of the simpler function D(t_1, t_2) = D(t_1, t_2, t_1, t_2), which is also called the structural function of the process. Indeed,

D(t_1, t_4) = M([ζ(t_4) − ζ(t_3) − m(t_3, t_4) + ζ(t_3) − ζ(t_1) − m(t_1, t_3)]
              × [ζ(t_4) − ζ(t_3) − m(t_3, t_4) + ζ(t_3) − ζ(t_1) − m(t_1, t_3)])
    = D(t_3, t_4) + 2D(t_1, t_3, t_3, t_4) + D(t_1, t_3),

so that

D(t_1, t_3, t_3, t_4) = (1/2)[D(t_1, t_4) − D(t_1, t_3) − D(t_3, t_4)].

Furthermore,

D(t_1, t_2, t_3, t_4) = D(t_1, t_2, t_2, t_4) − D(t_1, t_2, t_2, t_3).

The last two formulas together express the function D(t_1, t_2, t_3, t_4) in terms of the function D(t_1, t_2).
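The reduction of D(t_1, t_2, t_3, t_4) to the two-argument structural function can be verified on simulated data. A sketch assuming a Brownian-motion-like real process (independent normal increments); the grid and time indices are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n_paths, n_steps, dt = 100_000, 64, 1.0 / 64
steps = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
# path[:, k] is the value of the process at t_k = k*dt on each sample path.
path = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(steps, axis=1)], axis=1)

def incr(i, j):
    """Centered increment zeta(t_j) - zeta(t_i) - m(t_i, t_j), one per path."""
    d = path[:, j] - path[:, i]
    return d - d.mean()

def D(i1, i2, i3, i4):
    """Empirical structural function D(t_i1, t_i2, t_i3, t_i4) (real process)."""
    return np.mean(incr(i1, i2) * incr(i3, i4))

i1, i3, i4 = 5, 20, 50
lhs = D(i1, i3, i3, i4)
rhs = 0.5 * (D(i1, i4, i1, i4) - D(i1, i3, i1, i3) - D(i3, i4, i3, i4))
var_incr = D(i1, i4, i1, i4)      # for Brownian motion this is t_4 - t_1
```

For this process the increments over disjoint intervals are independent, so both sides are near zero, and the identity holds exactly for the empirical averages as well.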
3. GAUSSIAN RANDOM FUNCTIONS
In many practical problems an important role is played by random functions for which the family of joint distributions defining the random function consists of Gaussian (normal) distributions. First we shall give the definition and basic properties of a multi-dimensional Gaussian distribution.

Definition 1. A random vector $\xi = (\xi_1, \xi_2, \ldots, \xi_n)$ is said to have a Gaussian (normal) distribution if its characteristic function can be written in the form
$$\psi(t_1, t_2, \ldots, t_n) = M \exp\{i(t, \xi)\} = \exp\left\{i(m, t) - \frac{1}{2}(At, t)\right\}, \qquad (1)$$
where $m = (m_1, m_2, \ldots, m_n)$ and $t = (t_1, t_2, \ldots, t_n)$ are vectors and $A = (\lambda_{jk})$, $j, k = 1, \ldots, n$, is a nonnegative-definite real symmetric matrix. Here $(\alpha, \beta)$ denotes the scalar product of the vectors $\alpha$ and $\beta$, so that
$$(m, t) = \sum_{k=1}^{n} m_k t_k, \qquad (At, t) = \sum_{j,k=1}^{n} \lambda_{jk} t_j t_k.$$
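As a quick numerical illustration (a sketch, not from the text; the particular $m$, $A$, and $t$ below are arbitrary choices), formula (1) can be compared against a Monte Carlo estimate of $M\exp\{i(t,\xi)\}$:

```python
# Sketch (not from the text): Monte Carlo check of formula (1),
# psi(t) = M exp{i(t, xi)} = exp{i(m, t) - (1/2)(At, t)},
# for an arbitrarily chosen 2-dimensional Gaussian vector.
import numpy as np

rng = np.random.default_rng(0)
m = np.array([1.0, -0.5])
A = np.array([[2.0, 0.6],
              [0.6, 1.0]])          # real symmetric nonnegative-definite

xi = rng.multivariate_normal(m, A, size=400_000)
t = np.array([0.3, -0.7])

empirical = np.mean(np.exp(1j * (xi @ t)))            # M exp{i(t, xi)}
theoretical = np.exp(1j * (m @ t) - 0.5 * (t @ A @ t))
assert abs(empirical - theoretical) < 1e-2
```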
The following theorem serves as a formal justification of the definition that we have just given.

Theorem 1. For the function $\psi(t) = \exp\{i(m, t) - \frac{1}{2}(At, t)\}$ to be the characteristic function of an $n$-dimensional random vector $\xi$, it is necessary and sufficient that the real matrix $A$ be nonnegative-definite and symmetric. The rank of the matrix $A$ is equal to the dimension of the subspace in which the distribution of the vector $\xi$ can be concentrated.
Proof of the Necessity. Suppose that the characteristic function $\psi(t)$ of a random vector $\xi$ is given by formula (1). If we differentiate it first with respect to $t_j$ and then with respect to $t_k$ and then set $t = 0$, we see that the distribution has finite second moments (Theorem 2, Section 2) and that
$$\left.\frac{\partial \psi}{\partial t_j}\right|_{t=0} = iM\xi_j = i m_j, \qquad (2)$$
$$\left.\frac{\partial^2 \psi}{\partial t_j\, \partial t_k}\right|_{t=0} = -M\xi_j \xi_k = -m_j m_k - \lambda_{jk}. \qquad (3)$$
It follows from these formulas that the matrix $A$ is real, symmetric, and nonnegative-definite:
$$(At, t) = M\left(\sum_{j=1}^{n} (\xi_j - m_j) t_j\right)^{2} \geq 0. \qquad (4)$$
RANDOM PROCESSES IN THE BROAD SENSE
If the rank of the matrix $A$ is equal to $r$ ($r < n$), then by making a suitable change of variables $t_j = \sum_{k=1}^{n} a_{jk}\tau_k$ we can reduce the quadratic form to principal axes:
$$(At, t) = \sum_{k=1}^{r} \lambda_k \tau_k^2, \qquad \lambda_k = M\left[\sum_{j=1}^{n} (\xi_j - m_j) a_{jk}\right]^{2},$$
so that $\sum_{j=1}^{n} (\xi_j - m_j)a_{jk} = 0$ with probability 1 for $k = r + 1, \ldots, n$. These relations show that with probability 1 there exist $n - r$ linearly independent relationships among the components of the vector $\xi$, and hence that its distribution is concentrated in the $r$-dimensional hyperplane defined by $\sum_{j=1}^{n}(x_j - m_j)a_{jk} = 0$, $k = r + 1, \ldots, n$.

Proof of the Sufficiency. Let us first suppose that $A$ is a positive-definite symmetric matrix. The function
$$\psi(t) = \exp\left\{i(m, t) - \frac{1}{2}(At, t)\right\}$$
is absolutely integrable and differentiable. Consequently we can apply Fourier's inversion formula to it:
$$\psi(t) = \int f(x)\, e^{i(x, t)}\, dx, \qquad f(x) = \frac{1}{(2\pi)^n}\int \psi(t)\, e^{-i(t, x)}\, dt. \qquad (5)$$
These integrals are $n$-dimensional, and $dx$ and $dt$ denote $n$-dimensional elements of volume. Let $C$ denote an orthogonal matrix that reduces $A$ to diagonal form, so that $C^*AC = D$, where $D = (\lambda_i \delta_{ik})$, $i, k = 1, \ldots, n$, with $\lambda_i > 0$, and where $C^*$ is the adjoint of $C$. (Here we note that $C$ is real and orthogonal, and thus $C^*$ coincides both with the transpose and with the inverse of $C$: $C^* = C' = C^{-1}$.) Let us make a change of the variables of integration by setting
$$t = Cu \quad \text{or} \quad u = C^* t, \qquad u = (u_1, u_2, \ldots, u_n).$$
Since the element of volume is not changed under an orthogonal transformation, it follows that
$$f(x) = \frac{1}{(2\pi)^n} \int \exp\left\{-i(x - m, Cu) - \frac{1}{2}(ACu, Cu)\right\} du.$$
We have
$$(ACu, Cu) = (C^*ACu, u) = \sum_{k=1}^{n} \lambda_k u_k^2, \qquad (x - m, Cu) = (C^*(x - m), u) = \sum_{k=1}^{n} x_k^* u_k,$$
where $x_k^*$ is the $k$th component of the vector $x^* = C^*(x - m)$. Therefore,
$$f(x) = \frac{1}{(2\pi)^n}\prod_{k=1}^{n}\int_{-\infty}^{\infty}\exp\left\{-i x_k^* u_k - \frac{1}{2}\lambda_k u_k^2\right\} du_k = \prod_{k=1}^{n}\frac{1}{\sqrt{2\pi\lambda_k}}\, e^{-x_k^{*2}/2\lambda_k} = (2\pi)^{-n/2}\left(\prod_{k=1}^{n}\lambda_k\right)^{-1/2} e^{-\frac{1}{2}(D^{-1}x^*,\, x^*)}.$$
Furthermore, $\prod_{k=1}^{n}\lambda_k = |A|$, where $|A|$ is the determinant of the matrix $A$, and
$$(D^{-1}x^*, x^*) = (D^{-1}C^*(x - m), C^*(x - m)) = (CD^{-1}C^*(x - m), x - m) = ((CDC^*)^{-1}(x - m), x - m) = (A^{-1}(x - m), x - m),$$
where $A^{-1}$ is the inverse of the matrix $A$. Finally we obtain
$$f(x) = \frac{1}{\sqrt{(2\pi)^n |A|}}\exp\left\{-\frac{1}{2}(A^{-1}(x - m), x - m)\right\} = \frac{1}{\sqrt{(2\pi)^n |A|}}\exp\left\{-\frac{1}{2|A|}\sum_{j,k=1}^{n} A_{kj}(x_j - m_j)(x_k - m_k)\right\}, \qquad (6)$$
where the $A_{kj}$ are the cofactors of the elements of the matrix $A$. It follows from (6) that $f(x) > 0$, and it follows from (5) that
$$\int f(x)\, dx = \psi(0) = 1.$$
Thus the function $f(x)$ can be regarded as an $n$-dimensional distribution density, and $\psi(t)$ is its characteristic function.
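The cofactor form of (6) can be checked against the inverse-matrix form numerically; the particular $A$, $m$, and $x$ below are arbitrary (a sketch, not from the text):

```python
# Sketch (not from the text): the cofactor form of formula (6) agrees
# with the inverse-matrix form of the Gaussian density.
import numpy as np

A = np.array([[2.0, 0.5],
              [0.5, 1.5]])            # positive-definite symmetric
m = np.array([0.5, -1.0])
x = np.array([1.2, 0.3])
n = len(m)

detA = np.linalg.det(A)
cof = np.linalg.inv(A).T * detA       # matrix of cofactors A_jk
d = x - m

quad_cofactor = (cof @ d) @ d / detA      # (1/|A|) sum_jk A_jk d_j d_k
quad_inverse = d @ np.linalg.inv(A) @ d   # (A^{-1}(x - m), x - m)
assert abs(quad_cofactor - quad_inverse) < 1e-12

f = np.exp(-0.5 * quad_inverse) / np.sqrt((2 * np.pi) ** n * detA)
assert f > 0.0
```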
Turning to the general case, let us suppose that the matrix $A$ has rank $r$ (where $r < n$) and that $C$ is an orthogonal transformation that reduces $A$ to diagonal form: $C^*AC = D$, where $D$ is a diagonal matrix whose diagonal elements $\lambda_k$ are zero for $k = r + 1, \ldots, n$ and positive for $k = 1, 2, \ldots, r$. Suppose that $\lambda_j^{\varepsilon} = \lambda_j$ for $j = 1, \ldots, r$ but that $\lambda_j^{\varepsilon} = \varepsilon$ for $j = r + 1, \ldots, n$, and let $D_{\varepsilon}$ denote the corresponding diagonal matrix. Then $A_{\varepsilon} = CD_{\varepsilon}C^*$ is a positive-definite matrix and
$$\psi_{\varepsilon}(t) = \exp\left\{i(m, t) - \frac{1}{2}(A_{\varepsilon}t, t)\right\}$$
is the characteristic function of some distribution. As $\varepsilon \to 0$, the function $\psi_{\varepsilon}(t)$ converges uniformly to $\psi(t)$. Hence $\psi(t)$ is the characteristic function of some distribution. As shown above, this distribution is concentrated in an $r$-dimensional hyperplane, so that it has no density. Such a distribution is called an improper Gaussian
distribution.
Corollary 1. In the expression (1) for the characteristic function of a Gaussian distribution, $m = (m_1, m_2, \ldots, m_n)$ is the vector of mathematical expectations and $A$ is the covariance matrix:
$$m = M\xi, \qquad \lambda_{jk} = M[(\xi_j - m_j)(\xi_k - m_k)].$$
This corollary follows immediately from formulas (2) and (3).

Corollary 2. If the covariance matrix $A$ of a Gaussian random vector $\xi$ is nondegenerate, there exists an $n$-dimensional distribution density $f(x)$ defined by formula (6).

Corollary 3. The joint distribution of an arbitrary group of components of a Gaussian random vector is Gaussian.

Theorem 2. If a random vector $\xi = (\xi_1, \xi_2, \ldots, \xi_n)$ has a Gaussian distribution and if the random vectors $\xi' = (\xi_1, \ldots, \xi_r)$ and $\xi'' = (\xi_{r+1}, \ldots, \xi_n)$ (for $r < n$) are uncorrelated, then the vectors $\xi'$ and $\xi''$ are independent.
Proof. The fact that $\xi'$ and $\xi''$ are uncorrelated implies that $M\xi_i\xi_j - M\xi_i\, M\xi_j = 0$ for $i = 1, \ldots, r$, $j = r + 1, \ldots, n$. Therefore
$$\psi(t) = \exp\left\{i(m', t') + i(m'', t'') - \frac{1}{2}\sum_{j,k=1}^{r}\lambda_{jk}t_j t_k - \frac{1}{2}\sum_{j,k=r+1}^{n}\lambda_{jk}t_j t_k\right\},$$
where
$$m' = (m_1, m_2, \ldots, m_r), \quad m'' = (m_{r+1}, \ldots, m_n), \qquad t' = (t_1, \ldots, t_r), \quad t'' = (t_{r+1}, \ldots, t_n).$$
The preceding formula can be rewritten in the form
$$\psi(t) = M\exp\{i(t', \xi') + i(t'', \xi'')\} = \psi'(t')\,\psi''(t''),$$
where $\psi'(t')$ and $\psi''(t'')$ are the characteristic functions of the vectors $\xi'$ and $\xi''$. This relation proves the independence of $\xi'$ and $\xi''$.

Let $\mathfrak{A} = \|a_{jk}\|$ (for $j = 1, \ldots, h$ and $k = 1, \ldots, n$) denote an arbitrary rectangular matrix, and set $\eta = \mathfrak{A}\xi$; that is,
$$\eta = (\eta_1, \ldots, \eta_h), \qquad \eta_j = \sum_{k=1}^{n} a_{jk}\xi_k, \quad j = 1, \ldots, h.$$
The vector $\eta$ is a linear transform of the vector $\xi$.

Theorem 3. Linear transformations of random vectors map Gaussian distributions into Gaussian distributions.

Proof. Let $\psi_\eta(t_1, \ldots, t_h)$ denote the characteristic function of the vector $\eta$. Then
$$\psi_\eta(t_1, \ldots, t_h) = M\exp\left\{i\sum_{j=1}^{h} t_j\eta_j\right\} = M\exp\left\{i\sum_{k=1}^{n}\left(\sum_{j=1}^{h} t_j a_{jk}\right)\xi_k\right\} = \exp\left\{i(t, \mathfrak{A}m) - \frac{1}{2}(\mathfrak{A}A\mathfrak{A}'t, t)\right\}; \qquad (7)$$
that is, $\eta$ has a Gaussian distribution with mathematical expectation $\mathfrak{A}m$ and with covariance matrix $A_\eta = \mathfrak{A}A\mathfrak{A}'$.

Theorem 4. Let $\xi^{(a)}$ (where $a = 1, 2, \ldots$) denote a sequence of $n$-dimensional vectors having Gaussian distributions with parameters $(m^{(a)}, A^{(a)})$. The sequence of distributions of the vectors $\xi^{(a)}$ converges weakly (converges in distribution) to some limiting distribution if and only if
$$m^{(a)} \to m, \qquad A^{(a)} \to A. \qquad (8)$$
The limiting distribution is then also a Gaussian distribution, with parameters $m$ and $A$.

For a sequence of distributions of random vectors $\xi^{(a)}$ to converge weakly to a limit, it is necessary and sufficient that the sequence of their characteristic functions $\psi^{(a)}(t)$ converge to a continuous function. Let us consider the sequence $\{\ln \psi^{(a)}(t)\}$, where
$$\ln \psi^{(a)}(t) = i(m^{(a)}, t) - \frac{1}{2}(A^{(a)}t, t),$$
in some neighborhood of the point $t = 0$. For this sequence to converge it is necessary and sufficient that conditions (8) be satisfied. If conditions (8) are satisfied, then $\psi^{(a)}(t) \to \psi(t) = \exp\{i(m, t) - \frac{1}{2}(At, t)\}$ for all $t$; that is, a limiting distribution exists and is Gaussian.
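A sampling illustration of Theorem 3's parameters (a sketch, not from the text; the matrices $m$, $A$, $\mathfrak{A}$ below are arbitrary choices):

```python
# Sketch (not from the text): if eta = W xi with xi Gaussian (m, A),
# then eta is Gaussian with mathematical expectation W m and
# covariance matrix W A W' (formula (7)); sampling check.
import numpy as np

rng = np.random.default_rng(1)
m = np.array([1.0, 0.0, -1.0])
A = np.array([[1.0, 0.3, 0.0],
              [0.3, 2.0, 0.5],
              [0.0, 0.5, 1.5]])
W = np.array([[1.0, -1.0, 0.0],
              [0.5,  0.5, 2.0]])      # arbitrary 2 x 3 rectangular matrix

xi = rng.multivariate_normal(m, A, size=200_000)
eta = xi @ W.T                        # each row is one sample of W xi

assert np.allclose(eta.mean(axis=0), W @ m, atol=0.03)
assert np.allclose(np.cov(eta.T), W @ A @ W.T, atol=0.15)
```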
Let us turn now to random functions. A real $r$-dimensional random function $\xi(\theta) = \{\xi_1(\theta), \ldots, \xi_r(\theta)\}$ is said to be Gaussian if for every $n$ the joint distribution of all components of the random vectors
$$\xi(\theta_1), \xi(\theta_2), \ldots, \xi(\theta_n) \qquad (9)$$
is Gaussian. The covariance matrix $R$ of the joint distribution of a sequence of random vectors (9) is $rn \times rn$, and it can be partitioned into square $r \times r$ cells as follows:
$$R = \begin{pmatrix} R(\theta_1, \theta_1) & R(\theta_1, \theta_2) & \cdots & R(\theta_1, \theta_n) \\ R(\theta_2, \theta_1) & R(\theta_2, \theta_2) & \cdots & R(\theta_2, \theta_n) \\ \vdots & & & \vdots \\ R(\theta_n, \theta_1) & R(\theta_n, \theta_2) & \cdots & R(\theta_n, \theta_n) \end{pmatrix},$$
where $R(\theta_1, \theta_2)$ is the covariance matrix of the function $\xi(\theta)$. The matrix $R$ is real and nonnegative-definite.
The converse is obvious. Specifically, for any real vector-valued function $m(\theta)$ and any real nonnegative-definite matrix function $R(\theta_1, \theta_2)$, $\theta_i \in \Theta$ (for $i = 1, 2$), there exists an $r$-dimensional Gaussian random function (in the broad sense) for which $m(\theta)$ is the mathematical-expectation vector and $R(\theta_1, \theta_2)$ is the covariance matrix.
The moments of a real-valued Gaussian random function can be obtained from the expansion of the characteristic function. Confining ourselves to the case of central moments, we set $m(\theta) = 0$. Then
$$\psi(\theta_1, \ldots, \theta_s;\ t_1, \ldots, t_s) = e^{-\frac{1}{2}(At, t)} = 1 - \frac{1}{2}(At, t) + \frac{1}{2^2\, 2!}(At, t)^2 - \cdots + \frac{(-1)^n}{2^n\, n!}(At, t)^n + \cdots,$$
where $A = (R(\theta_j, \theta_k))$ for $j, k = 1, \ldots, s$. From this we obtain, for an arbitrary moment function of odd order, $\mu_{j_1 \cdots j_s}(\theta_1, \ldots, \theta_s) = 0$ if $\sum_{k=1}^{s} j_k = 2n + 1$. For central moment functions of even order,
$$\mu_{j_1 \cdots j_s}(\theta_1, \ldots, \theta_s) = \left.\frac{\partial^{2n}}{\partial t_1^{j_1}\cdots\partial t_s^{j_s}}\,\frac{(At, t)^n}{2^n\, n!}\right|_{t=0}, \qquad \sum_{k=1}^{s} j_k = 2n. \qquad (10)$$
For example, for fourth-order moment functions we have the following formulas:
$$\mu_4(\theta) = 3R^2(\theta, \theta), \qquad \mu_{31}(\theta_1, \theta_2) = 3R(\theta_1, \theta_1)R(\theta_1, \theta_2),$$
$$\mu_{211}(\theta_1, \theta_2, \theta_3) = R(\theta_1, \theta_1)R(\theta_2, \theta_3) + 2R(\theta_1, \theta_2)R(\theta_1, \theta_3),$$
$$\mu_{1111}(\theta_1, \theta_2, \theta_3, \theta_4) = R(\theta_1, \theta_2)R(\theta_3, \theta_4) + R(\theta_1, \theta_3)R(\theta_2, \theta_4) + R(\theta_1, \theta_4)R(\theta_2, \theta_3).$$
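The last pairing formula can be checked by simulation (a sketch, not from the text; the covariance matrix $R$ below is an arbitrary admissible choice):

```python
# Sketch (not from the text): Monte Carlo check of the fourth-order
# pairing formula mu_1111 = R12 R34 + R13 R24 + R14 R23 for a
# zero-mean Gaussian vector.
import numpy as np

rng = np.random.default_rng(2)
R = np.array([[1.0, 0.5, 0.2, 0.1],
              [0.5, 1.0, 0.4, 0.3],
              [0.2, 0.4, 1.0, 0.6],
              [0.1, 0.3, 0.6, 1.0]])   # positive-definite covariance

x = rng.multivariate_normal(np.zeros(4), R, size=2_000_000)
empirical = np.mean(x[:, 0] * x[:, 1] * x[:, 2] * x[:, 3])
pairing = R[0, 1] * R[2, 3] + R[0, 2] * R[1, 3] + R[0, 3] * R[1, 2]
assert abs(empirical - pairing) < 0.02
```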
In the general case,
$$\mu_{j_1 \cdots j_s}(\theta_1, \ldots, \theta_s) = \sum \prod R(\theta_p, \theta_q), \qquad (11)$$
the structure of which can be described as follows. We write the points $\theta_1, \ldots, \theta_s$ in order, where $\theta_k$ is repeated until it appears $j_k$ times. We partition this sequence into arbitrary pairs. Then we take the product on the right side of formula (11) over all pairs of this partition, and we take the sum over all partitions (pairs that differ only by a permutation of the elements are considered as a single pair). The assertion follows immediately from formula (10).

The fact that Gaussian random functions play an important role in practical problems can be explained in part as follows: Under quite broad conditions, the sum of a large number of independent small (in absolute value) random functions is approximately a Gaussian random function, regardless of the probabilistic nature of the individual terms. This so-called theorem on the normal correlation is a multi-dimensional generalization of the central limit theorem. Here is one of its simpler formulations:

Theorem 5. Let $\{\eta_n\}$ denote a sequence of sums of random functions $\eta_n(\theta) = \sum_{k=1}^{m_n} \alpha_{nk}(\theta)$, $\theta \in \Theta$, $n = 1, 2, \ldots$. Suppose that the following three conditions are satisfied:

a. For fixed $n$, the random variables $\alpha_{n1}(\theta_1), \alpha_{n2}(\theta_2), \ldots, \alpha_{nm_n}(\theta_{m_n})$ are mutually independent for arbitrary $\theta_1, \theta_2, \ldots, \theta_{m_n}$, possess second-order moments, and satisfy $M\alpha_{nk}(\theta) = 0$ and $M\alpha_{nk}^2(\theta) = b_{nk}^2(\theta)$, with $\max_k b_{nk}^2(\theta) \to 0$ as $n \to \infty$;

b. The sequence of covariance functions $R_n(\theta_1, \theta_2) = M[\eta_n(\theta_1)\eta_n(\theta_2)]$ converges as $n \to \infty$ to some limit $\lim_{n\to\infty} R_n(\theta_1, \theta_2) = R(\theta_1, \theta_2)$;

c. For every $\theta$, the sums $\eta_n(\theta) = \sum_{k=1}^{m_n}\alpha_{nk}(\theta)$ satisfy Lindeberg's condition: for arbitrary positive $\tau$,
$$\frac{1}{B_n^2}\sum_{k=1}^{m_n}\int_{|x| > \tau B_n} x^2\, dH_{nk}(\theta, x) \to 0,$$
where $H_{nk}(\theta, x)$ is the distribution function of the random variable $\alpha_{nk}(\theta)$, and $B_n^2 = \sum_{k=1}^{m_n} b_{nk}^2(\theta) = R_n(\theta, \theta)$.

Then the sequence $\{\eta_n(\theta)\}$ converges weakly as $n \to \infty$ to a Gaussian random function with mathematical expectation zero and covariance function $R(\theta_1, \theta_2)$.
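A minimal scalar illustration (a sketch, not from the text): for summands that are independent uniforms on $[-1/2, 1/2]$, the characteristic function of the normalized sum is known in closed form and converges to the Gaussian limit $e^{-t^2/2}$:

```python
# Sketch (not from the text): scalar special case of the limit theorem.
# For a sum of n iid uniform variables on [-1/2, 1/2], normalized by
# B_n = sqrt(n/12), the characteristic function approaches exp(-t^2/2).
import math

def cf_normalized_sum(t, n):
    # characteristic function of (X_1 + ... + X_n)/B_n,
    # X_k uniform on [-1/2, 1/2], B_n^2 = n/12
    s = t / math.sqrt(n / 12.0)
    return (math.sin(s / 2.0) / (s / 2.0)) ** n

for t in (0.5, 1.0, 2.0):
    assert abs(cf_normalized_sum(t, 2000) - math.exp(-t * t / 2.0)) < 1e-3
```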
Proof of this theorem reduces to one of the variants of the one-dimensional central limit theorem. We recall the necessary formulation:

Theorem 6. Suppose that $\zeta_n = \sum_{k=1}^{m_n}\xi_{nk}$, for $n = 1, 2, \ldots$, where the random variables $\xi_{nk}$ satisfy the following conditions:

a. For fixed $n$, the random variables $\xi_{n1}, \xi_{n2}, \ldots, \xi_{nm_n}$ are nondegenerate and mutually independent, with $M\xi_{nk} = 0$ and $M\xi_{nk}^2 = b_{nk}^2$;

b. For arbitrary $\tau > 0$, Lindeberg's condition is satisfied:
$$\frac{1}{B_n^2}\sum_{k=1}^{m_n}\int_{|x| > \tau B_n} x^2\, dH_{nk}(x) \to 0 \qquad \text{as } n \to \infty,$$
where $B_n^2 = \sum_{k=1}^{m_n} b_{nk}^2$ and the $H_{nk}(x)$ are the distribution functions of the variables $\xi_{nk}$.

Then the sequence of the distributions of the variables $\zeta_n / B_n$ converges as $n \to \infty$ to a Gaussian distribution with parameters 0 and 1 (cf. Gnedenko [1963], p. 306).
Consider the characteristic function
$$\psi_n(\theta_1, \ldots, \theta_s;\ tt_1, \ldots, tt_s) = M\exp\left\{it\sum_{j=1}^{s} t_j\eta_n(\theta_j)\right\},$$
which for $t = 1$ is the characteristic function of the joint distribution of the variables $\eta_n(\theta_1), \ldots, \eta_n(\theta_s)$, and for fixed $t_1, \ldots, t_s$ is the characteristic function of the random variable $\zeta_n = \sum_{j=1}^{s} t_j\eta_n(\theta_j)$. The quantity $\zeta_n$ can be written in the form
$$\zeta_n = \sum_{k=1}^{m_n}\beta_{nk}, \qquad \beta_{nk} = \sum_{j=1}^{s} t_j\alpha_{nk}(\theta_j), \quad k = 1, \ldots, m_n.$$
Here
$$M\beta_{nk} = 0, \qquad b_{nk}^2 = M\beta_{nk}^2 = \sum_{j,p=1}^{s} t_j t_p\, M[\alpha_{nk}(\theta_j)\alpha_{nk}(\theta_p)], \qquad B_n^2 = \sum_{k=1}^{m_n} b_{nk}^2 = \sum_{j,p=1}^{s} t_j t_p R_n(\theta_j, \theta_p).$$
From this we see that
$$\max_k b_{nk}^2 \leq \sum_{j,p=1}^{s}|t_j t_p| \max_k b_{nk}(\theta_j)\max_k b_{nk}(\theta_p) \to 0 \qquad \text{as } n \to \infty,$$
and that $B_n^2 \to B^2 = \sum_{j,p=1}^{s} t_j t_p R(\theta_j, \theta_p)$. If $B^2 = 0$, then $\{\zeta_n\}$ converges weakly to zero and $\psi_n(\theta_1, \ldots, \theta_s;\ t_1, \ldots, t_s) \to 1$, which is a special case of the assertion of the theorem. For $B^2 > 0$ we verify the satisfaction of Lindeberg's condition for the variables $\beta_{nk}$. In this case it is sufficient to show that, for arbitrary $\tau > 0$,
$$\sum_{k=1}^{m_n} M\left[g_\tau\left(\sum_{j=1}^{s} t_j\alpha_{nk}(\theta_j)\right)\left(\sum_{j=1}^{s} t_j\alpha_{nk}(\theta_j)\right)^{2}\right] \to 0 \qquad \text{as } n \to \infty,$$
where $g_\tau(x) = 0$ for $|x| \leq \tau$ and $g_\tau(x) = 1$ for $|x| > \tau$. If $t_p\alpha_{nk}(\theta_p)$ is the greatest in absolute value of the $t_j\alpha_{nk}(\theta_j)$ (for $j = 1, 2, \ldots, s$), then
$$g_\tau\left(\sum_{j=1}^{s} t_j\alpha_{nk}(\theta_j)\right)\left(\sum_{j=1}^{s} t_j\alpha_{nk}(\theta_j)\right)^{2} \leq s^2\, g_{\tau/s}\big(t_p\alpha_{nk}(\theta_p)\big)\big(t_p\alpha_{nk}(\theta_p)\big)^{2}.$$
Therefore we always have
$$g_\tau\left(\sum_{j=1}^{s} t_j\alpha_{nk}(\theta_j)\right)\left(\sum_{j=1}^{s} t_j\alpha_{nk}(\theta_j)\right)^{2} \leq s^2\sum_{j=1}^{s} g_{\tau/s}\big(t_j\alpha_{nk}(\theta_j)\big)\big(t_j\alpha_{nk}(\theta_j)\big)^{2},$$
and hence
$$\sum_{k=1}^{m_n} M\left[g_\tau\left(\sum_{j=1}^{s} t_j\alpha_{nk}(\theta_j)\right)\left(\sum_{j=1}^{s} t_j\alpha_{nk}(\theta_j)\right)^{2}\right] \leq s^2\sum_{j=1}^{s} t_j^2\sum_{k=1}^{m_n}\int_{|x| > \tau/(s|t_j|)} x^2\, dH_{nk}(\theta_j, x) \to 0 \qquad \text{as } n \to \infty.$$
Thus the central limit theorem is applicable to the quantities $\beta_{nk}$. By virtue of this theorem,
$$M\exp\left\{it\,\frac{\zeta_n}{B_n}\right\} \to e^{-t^2/2}, \qquad \text{so that} \qquad Me^{it\zeta_n} \to e^{-B^2 t^2/2} = \exp\left\{-\frac{t^2}{2}\sum_{j,p=1}^{s} t_j t_p R(\theta_j, \theta_p)\right\}.$$
Here, if we set $t = 1$, we see that the sequence of characteristic
functions of the joint distributions of the quantities $\eta_n(\theta_j)$, $j = 1, \ldots, s$, converges as $n \to \infty$ to the characteristic function of a Gaussian distribution. The continuity of the correspondence between the distributions and the characteristic functions implies the conclusion of the theorem.

A vector process $\xi(t)$, for $t \in [0, T]$, is called a process with independent increments if, for arbitrary $0 \leq t_1 < t_2 < \cdots < t_n \leq T$, the increments $\xi(t_2) - \xi(t_1), \xi(t_3) - \xi(t_2), \ldots, \xi(t_n) - \xi(t_{n-1})$ are mutually independent.

(a) for $u_2 > u_1$, the matrix $F(u_2) - F(u_1)$ is nonnegative-definite;

(b)
$$\int_{-\infty}^{\infty} k(u)\, d[\operatorname{tr} F(u)] < \infty, \qquad (10)$$
where $k(u) = \dfrac{1}{1 + u^2}$.

Proof.
Let us set
$$\xi_m(k) = \xi\left(\frac{k+1}{2^m}\right) - \xi\left(\frac{k}{2^m}\right), \qquad k = 0, \pm 1, \pm 2, \ldots; \quad m = 1, 2, \ldots. \qquad (11)$$
For arbitrary fixed $m$, the sequence $\xi_m(k)$, $k = 0, \pm 1, \pm 2, \ldots$, is a process with discrete time that is stationary in the broad sense. On the basis of Theorem 3, the covariance matrix $R_m(k)$ of the sequence (11) has the following representation:
$$R_m(k) = \int_{-\pi}^{\pi} e^{iku}\, dF_m(u), \qquad (12)$$
where the $F_m(u)$ satisfy condition (a) and $\operatorname{tr}\{F_m(\pi) - F_m(-\pi - 0)\} < \infty$. It is more convenient to write formula (12) in the form
$$R_m(k) = \int_{-2^m\pi}^{2^m\pi} e^{iku/2^m}\, d\tilde{F}_m(u), \qquad (13)$$
where
$$\tilde{F}_m(u) = 0 \quad \text{for } u \leq -2^m\pi, \qquad \tilde{F}_m(u) = F_m\left(\frac{u}{2^m}\right) - F_m(-\pi) \quad \text{for } -2^m\pi < u \leq 2^m\pi, \qquad \tilde{F}_m(u) = F_m(\pi) - F_m(-\pi) \quad \text{for } u > 2^m\pi.$$
Since the increment of the process over a dyadic interval is the sum of its increments over the constituent intervals of length $1/2^m$, it follows that for numbers $t_1, t_2, t_3, t_4$ of the form $k/2^m$ (where $k = 0, \pm 1, \pm 2, \ldots$) we obtain
$$D(t_1, t_2, t_3, t_4) = \sum_{j=1}^{(t_2 - t_1)2^m}\ \sum_{l=1}^{(t_4 - t_3)2^m} R_m\big((t_1 - t_3)2^m + j - l\big).$$
Substituting (13) and summing the geometric progressions, we find
$$D(t_1, t_2, t_3, t_4) = \int_{-\infty}^{\infty} \frac{e^{iut_2} - e^{iut_1}}{e^{iu/2^m} - 1}\cdot\frac{e^{-iut_4} - e^{-iut_3}}{e^{-iu/2^m} - 1}\, d\tilde{F}_m(u),$$
which can also be written in the form
$$D(t_1, t_2, t_3, t_4) = \int_{-\infty}^{\infty} \frac{(e^{iut_2} - e^{iut_1})(e^{-iut_4} - e^{-iut_3})}{u^2 k(u)}\, dH_m(u), \qquad (14)$$
where
$$H_m(u) = \int_{-\infty}^{u} \frac{a^2 k(a)}{4\sin^2(a/2^{m+1})}\, d\tilde{F}_m(a). \qquad (15)$$
Let us now show that the sequence of matrix functions $H_m(u)$ is weakly compact. This means that the sequence $\{H_m(u)\}$ contains a subsequence $\{H_{m_k}(u)\}$ such that for an arbitrary function $f(u)$ that is bounded and continuous on $(-\infty, \infty)$,
$$\lim_{k\to\infty}\int f(u)\, dH_{m_k}(u) = \int f(u)\, dH(u),$$
where $H(u)$ is some matrix whose elements are functions of bounded variation. The matrix $H(u)$ is then the weak limit of the sequence of the matrices $H_{m_k}(u)$. On the basis of Helly's theorem, the sequence $\{H_m(u)\}$ is weakly compact if the norms of the matrices $H_m(u)$ are uniformly bounded (with respect to $m$) and $\int_{|u|>A}\|dH_m(u)\| \to 0$ as $A \to \infty$ uniformly with respect to $m$. Let us show that these conditions are satisfied.
We have
$$D(t, t+h, t, t+h) = M\big([\xi(t+h) - \xi(t)][\xi(t+h) - \xi(t)]^*\big) = \int_{-\infty}^{\infty}\frac{4\sin^2(uh/2)}{u^2 k(u)}\, dH_m(u), \qquad t = \frac{k}{2^m}, \quad h = \frac{j}{2^m}. \qquad (16)$$
Let us set $\psi(h) = M|\xi(t+h) - \xi(t)|^2$. By the hypothesis of the theorem, $\psi(h) \to 0$ as $h \to 0$. From this it follows that the function $\psi(h)$ is continuous. This is true because
$$|\psi(h'') - \psi(h')| \leq M\Big(\|\xi(t+h') - \xi(t+h'')\|\big[\|\xi(t+h'') - \xi(t)\| + \|\xi(t+h') - \xi(t)\|\big]\Big) \leq \sqrt{\psi(|h'-h''|)}\left(\sqrt{\psi(h'')} + \sqrt{\psi(h')}\right).$$
It follows from (16) that for $A > 0$ with $A|h| \leq \pi$,
$$\psi(h) \geq \|D(t, t+h, t, t+h)\| \geq \frac{4}{\pi^2}\, h^2\int_{-A}^{A} dH_m^{(kk)}(u), \qquad (17)$$
where $H_m(u) = (H_m^{(rs)}(u))$ for $r, s = 1, \ldots, r$; here we have used the inequality
$$\frac{4\sin^2(uh/2)}{u^2} \geq \frac{4}{\pi^2}\, h^2 \qquad \text{for } |uh| \leq \pi.$$
Furthermore, since $u^2 k(u) \leq 1$, for $A \geq 1$ we have
$$\psi(h) \geq 4\int_{|u|>A}\sin^2\frac{uh}{2}\, dH_m^{(kk)}(u) = 2\int_{|u|>A}(1 - \cos uh)\, dH_m^{(kk)}(u).$$
Integrating this inequality with respect to $h$ over $(0, \bar{h})$, we obtain
$$\frac{1}{\bar{h}}\int_0^{\bar{h}}\psi(h)\, dh \geq 2\int_{|u|>A}\left(1 - \frac{\sin u\bar{h}}{u\bar{h}}\right) dH_m^{(kk)}(u) \geq 2\left(1 - \frac{1}{A\bar{h}}\right)\int_{|u|>A} dH_m^{(kk)}(u). \qquad (18)$$
Since the left-hand member of this inequality approaches 0 independently of $m$ as $\bar{h} \to 0$, it follows that $\int_{|u|>A} dH_m^{(kk)}(u) \to 0$ uniformly with respect to $m$ as $A \to \infty$. From the nonnegative-definiteness of the matrix $\Delta H_m$ we have $\|\Delta H_m(u)\| \leq \operatorname{tr}\Delta H_m$, so that
$$\int_{|u|>A}\|dH_m(u)\| \leq \int_{|u|>A}\operatorname{tr}\, dH_m(u) \to 0 \qquad \text{as } A \to \infty,$$
uniformly with respect to $m$; and by virtue of (17),
$$\int_{|u|\leq A} dH_m^{(kk)}(u) \leq \frac{\pi^2}{4h^2}\,\psi(h), \qquad |h| \leq \frac{\pi}{A},$$
so that the norms of the matrices $H_m(u)$ are uniformly bounded. These inequalities prove the weak compactness of the sequence of matrices $H_m(u)$.
If we now take the limit in (14) with respect to the subsequence of indices $m_k$ such that $\{H_{m_k}(u)\}$ converges weakly to $H(u)$, we obtain
$$D(t_1, t_2, t_3, t_4) = \int_{-\infty}^{\infty}\frac{(e^{iut_2} - e^{iut_1})(e^{-iut_4} - e^{-iut_3})}{u^2 k(u)}\, dH(u).$$
This equation is valid for all dyadic $t_1, t_2, t_3, t_4$. Obviously $H(u_2) - H(u_1)$ is a nonnegative-definite matrix. Since the left and right members of the formula are continuous functions and since these two members coincide on an everywhere-dense subset of the values of $t_1, t_2, t_3, t_4$, they are equal for all values of these variables. It only remains for us to set
$$F(u) = \int_{-\infty}^{u}\frac{dH(v)}{k(v)}$$
to obtain the desired result.

Let us now look at a scalar random field $\xi(x)$ in $n$-dimensional space $E_n$: $x = (x_1, x_2, \ldots, x_n)$, where $-\infty < x_i < \infty$. This field is said to be homogeneous if
$$M\xi(x) = a = \text{const}, \qquad R[x_1 + z, x_2 + z] = M\big([\xi(x_1 + z) - a][\xi(x_2 + z) - a]\big) = R(x_1, x_2).$$
If we set $z = -x_1$ in this last condition, we obtain
$$R(x_1, x_2) = R(0, x_2 - x_1).$$
The last equation means that the covariance between the random variables $\xi(x_1)$ and $\xi(x_2)$ depends only on the vector $x_2 - x_1$ connecting the points $x_1$ and $x_2$. The function $R(x) = R(z, z + x)$ is also called the covariance function of the homogeneous random field.
It is a nonnegative-definite function of $n$ variables; that is, the quadratic form
$$\sum_{i,k=1}^{N} R(x_i - x_k)\lambda_i\bar{\lambda}_k$$
is nonnegative for an arbitrary choice of $N$ and of points $x_1, x_2, \ldots, x_N$. The function $R(x)$ is continuous if and only if the random field $\xi(x)$ satisfies the condition
$$M|\xi(x + z) - \xi(x)|^2 \to 0 \qquad \text{as } z \to 0. \qquad (19)$$
The Bochner–Khinchin theorem for nonnegative-definite functions of a single variable can be carried over, almost without change in the course of the proof, to functions of several variables. Thus the covariance function of a homogeneous field satisfying condition (19) has a representation of the form
$$R(x) = \int_{E_n} e^{i(x,v)}\, d\sigma(v),$$
where $(x, v)$ denotes the scalar product of the $n$-dimensional vectors $x$ and $v$, and $\sigma(A)$ is a finite measure on $E_n$.

A random field is said to be isotropic if the covariance function $R(x_1, x_2)$ depends only on the distance between the points $x_1$ and $x_2$. If in addition it is homogeneous, then $R(x_1, x_2) = R(\rho)$, where $\rho$ is the distance between the points $x_1$ and $x_2$:
$$\rho = \sqrt{\sum_{i=1}^{n}\big(x_i^{(1)} - x_i^{(2)}\big)^2}.$$
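For instance, $R(x) = \exp(-|x|^2/2)$ is the transform of a Gaussian measure $\sigma$, so by the representation above the matrix $(R(x_i - x_k))$ must be nonnegative-definite at any choice of points; a numerical check (a sketch, not from the text):

```python
# Sketch (not from the text): R(x) = exp(-|x|^2 / 2) yields a
# nonnegative-definite matrix R(x_i - x_k) at arbitrary points of E_3,
# as required of the covariance function of a homogeneous field.
import numpy as np

rng = np.random.default_rng(3)
pts = rng.normal(size=(40, 3))                  # 40 arbitrary points in E_3
diff = pts[:, None, :] - pts[None, :, :]
Rmat = np.exp(-0.5 * np.sum(diff ** 2, axis=-1))

eigenvalues = np.linalg.eigvalsh(Rmat)
assert eigenvalues.min() > -1e-10               # nonnegative-definite
```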
We will find a representation of the covariance function of a homogeneous isotropic field. Let us look at the expression for the covariance function of a homogeneous field,
$$R(\rho) = \int_{E_n} e^{i(x,v)}\, d\sigma(v),$$
and let us integrate this expression over the surface of the sphere $S_\rho$ of radius $\rho$. Reversing the order of integration, we obtain
$$\frac{2\pi^{n/2}}{\Gamma(n/2)}\,\rho^{n-1} R(\rho) = \int_{E_n}\left\{\int_{S_\rho} e^{i(x,v)}\, ds\right\} d\sigma(v). \qquad (20)$$
Let $f(x)$ denote an arbitrary integrable function in $E_n$ and let $V_\rho$ denote the ball of radius $\rho$ with center at a fixed point. Then
$$\frac{d}{d\rho}\int_{V_\rho} f(x)\, dx_1\cdots dx_n = \int_{S_\rho} f(x)\, ds,$$
where the integral on the right is over the surface $S_\rho$ of the ball $V_\rho$. Let us use this formula to evaluate the inner integral on the right side of formula (20). Shifting to spherical coordinates in $n$-dimensional space (cf. G. M. Fikhtengol'ts, vol. III, p. 401) and taking for $\varphi_1$ the angle between the vectors $x$ and $v$, we obtain
$$\int_{V_\rho} e^{i(x,v)}\, dx_1\cdots dx_n = \int_0^\rho\int_0^\pi\cdots\int_0^{2\pi} e^{ir|v|\cos\varphi_1}\, r^{n-1}\sin^{n-2}\varphi_1\sin^{n-3}\varphi_2\cdots\sin\varphi_{n-2}\, dr\, d\varphi_1\cdots d\varphi_{n-1}$$
$$= \frac{2\pi^{(n-1)/2}}{\Gamma\left(\dfrac{n-1}{2}\right)}\int_0^\rho\int_0^\pi e^{ir|v|\cos\varphi_1}\, r^{n-1}\sin^{n-2}\varphi_1\, dr\, d\varphi_1.$$
Furthermore,
$$\int_0^\pi e^{ir|v|\cos\varphi_1}\sin^{n-2}\varphi_1\, d\varphi_1 = \sum_{k=0}^{\infty}\frac{(ir|v|)^k}{k!}\int_0^\pi\cos^k\varphi_1\sin^{n-2}\varphi_1\, d\varphi_1;$$
the terms with odd $k$ vanish, while
$$\int_0^\pi\cos^{2k}\varphi_1\sin^{n-2}\varphi_1\, d\varphi_1 = \frac{\Gamma\left(k + \dfrac{1}{2}\right)\Gamma\left(\dfrac{n-1}{2}\right)}{\Gamma\left(k + \dfrac{n}{2}\right)},$$
so that
$$\int_0^\rho\int_0^\pi e^{ir|v|\cos\varphi_1}\, r^{n-1}\sin^{n-2}\varphi_1\, dr\, d\varphi_1 = \Gamma\left(\frac{n-1}{2}\right)\sum_{k=0}^{\infty}\frac{(-1)^k\,|v|^{2k}}{(2k)!}\,\frac{\Gamma\left(k + \dfrac{1}{2}\right)}{\Gamma\left(k + \dfrac{n}{2}\right)}\,\frac{\rho^{2k+n}}{2k+n}.$$
Using the formula for the gamma function at half-integral values,
$$\Gamma\left(k + \frac{1}{2}\right) = \frac{\sqrt{\pi}\,\Gamma(2k)}{2^{2k-1}\,\Gamma(k)} = \frac{\sqrt{\pi}\,(2k)!}{2^{2k}\, k!},$$
we obtain
$$\int_{V_\rho} e^{i(x,v)}\, dx_1\cdots dx_n = \pi^{n/2}\rho^n\sum_{k=0}^{\infty}\frac{(-1)^k}{k!\,\Gamma\left(k + \dfrac{n}{2} + 1\right)}\left(\frac{\rho|v|}{2}\right)^{2k} = (2\pi)^{n/2}\left(\frac{\rho}{|v|}\right)^{n/2} J_{n/2}(\rho|v|),$$
where
$$J_\nu(z) = \sum_{k=0}^{\infty}\frac{(-1)^k}{k!\,\Gamma(k + \nu + 1)}\left(\frac{z}{2}\right)^{2k+\nu}$$
is the Bessel function of the first kind of order $\nu$. Consequently, differentiating with respect to $\rho$,
$$\int_{S_\rho} e^{i(x,v)}\, ds = (2\pi\rho)^{n/2}\,|v|^{-(n-2)/2}\, J_{(n-2)/2}(\rho|v|).$$
In particular, the integral depends only on $|v|$. We introduce the positive parameter $\lambda$ and we set $g(\lambda) = \sigma(V_\lambda)$, $\lambda > 0$, where $V_\lambda$ is the ball of radius $\lambda$. This last formula and formula (20) yield
$$R(\rho) = 2^{(n-2)/2}\,\Gamma\left(\frac{n}{2}\right)\int_0^\infty\frac{J_{(n-2)/2}(\lambda\rho)}{(\lambda\rho)^{(n-2)/2}}\, dg(\lambda), \qquad (21)$$
where $g(\lambda)$ is an increasing function, $g(0) = 0$, and
$$g(+\infty) = \sigma(E_n) = R(0) < \infty.$$
Thus we have obtained:

Theorem 5. For $R(\rho)$ ($0 \leq \rho < \infty$) to be the covariance function of a homogeneous isotropic $n$-dimensional random field satisfying condition (19), it is necessary and sufficient that it have a representation of the form (21), where $g(\lambda)$ is a bounded nondecreasing function.
For $n = 2$ the formula takes the following simple form:
$$R(\rho) = \int_0^\infty J_0(\lambda\rho)\, dg(\lambda), \qquad (22)$$
and for $n = 3$,
$$R(\rho) = \int_0^\infty\frac{\sin\lambda\rho}{\lambda\rho}\, dg(\lambda). \qquad (23)$$
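The kernel in (23) can be obtained directly by averaging $e^{i(x,v)}$ over the directions of $v$; a numerical quadrature check for $n = 3$ (a sketch, not from the text; the values of $\lambda$ and $\rho$ are arbitrary):

```python
# Sketch (not from the text): for n = 3, averaging e^{i(x, v)} over the
# directions of v with |v| = lam, at |x| = rho, gives the kernel
# sin(lam * rho) / (lam * rho) appearing in (23).
import numpy as np

lam, rho = 2.0, 1.5
theta = np.linspace(0.0, np.pi, 200_001)
# the surface average reduces to (1/2) * integral of
# e^{i * lam * rho * cos(theta)} * sin(theta) over [0, pi]
integrand = np.exp(1j * lam * rho * np.cos(theta)) * np.sin(theta)
h = theta[1] - theta[0]
avg = 0.5 * h * (np.sum(integrand) - 0.5 * (integrand[0] + integrand[-1]))

assert abs(avg - np.sin(lam * rho) / (lam * rho)) < 1e-7
```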
II
MEASURE THEORY
We assume that the reader is familiar with the elements of the set-theoretic construction of probability theory. Therefore, in the present chapter we have omitted elementary examples and details that give the intuitive basis underlying the formal definitions (see, for example, Gnedenko, 1967).

The starting point in probability theory is the assumption that one can define a set $U$ and a class $\mathfrak{S}$ of subsets of $U$ so that every event $A$, for which it is meaningful to speak of its probability within the framework of a particular problem, can be interpreted as some subset of the set $U$ belonging to $\mathfrak{S}$. Since an arbitrary event $A$ interpreted in this manner can be regarded as the union of the elements of $U$ that belong to $A$, we call the points in the set $U$ elementary events and the set $U$ itself the space of elementary events.

For example, if an experiment consists of drawing the graph of a continuous random function in the course of a fixed interval of time $[a, b]$, then $U$ can be understood as the space of continuous functions on the interval $[a, b]$.

In what follows the events are identified with the sets corresponding to them. Obviously the certain event then coincides with the set $U$ and the impossible event coincides with the empty set; the union, intersection, and difference of two or more events then coincide with the set-theoretic union, intersection, and difference of the corresponding sets. The incompatibility of a class of events means that the intersection of the corresponding sets is empty. If $A$ is an event, the complementary event $\bar{A}$ is the set-theoretic complement of $A$ in $U$.

Furthermore, to every event $A \in \mathfrak{S}$ is assigned a nonnegative number $p(A)$ called the probability of the event $A$. It is natural to require that the class $\mathfrak{S}$ of events and their probabilities enjoy the following properties (with which we are familiar from elementary probability theory): (a) the difference of two events and the union of an arbitrary sequence of events in the class $\mathfrak{S}$ are events (that is, they belong to $\mathfrak{S}$); (b) the probability of the union of an arbitrary sequence of incompatible events is equal to the sum of the probabilities of the events of the given sequence; and (c) the probability of the certain event is equal to 1.

The mathematical apparatus with which we formulate the basic assumptions and concepts of probability theory and derive the general theorems is the abstract theory of measure and integration. The material from this theory that we shall need in this book is expounded in the present chapter.
1.
MEASURE
Let $U$ denote an abstract set, which we shall call a space. We shall indicate subsets of $U$ by italic letters and classes of subsets of $U$ by German letters (capitals in both cases). We assume that the definitions and simplest properties of the algebraic operations on sets are known, and we mention only the frequently used duality relationships
$$\overline{\bigcap_k A_k} = \bigcup_k \overline{A}_k \qquad (1)$$
and
$$\overline{\bigcup_k A_k} = \bigcap_k \overline{A}_k, \qquad (2)$$
where the index $k$ ranges over an arbitrary (finite or infinite) set of values (cf., for example, Kolmogorov and Fomin).

Definition 1. A nonempty class $\mathfrak{R}$ of subsets of $U$ is called an algebra of sets of $U$ if it enjoys the following properties:

a. $A \in \mathfrak{R}$ and $B \in \mathfrak{R}$ imply $A \cup B \in \mathfrak{R}$;
b. $A \in \mathfrak{R}$ implies $\bar{A} \in \mathfrak{R}$.

Let us give some of the simpler consequences of this definition. Since $A \cup \bar{A} = U$, the relation $A \in \mathfrak{R}$ implies $U \in \mathfrak{R}$. This in turn implies that the empty set belongs to the algebra of sets. Furthermore, if $A \in \mathfrak{R}$ and $B \in \mathfrak{R}$, then on the basis of relationships (1) and (2),
$$A \cap B = \overline{\bar{A} \cup \bar{B}} \in \mathfrak{R}, \qquad A \setminus B = A \cap \bar{B} \in \mathfrak{R};$$
that is, the intersection and difference of two sets belonging to the algebra $\mathfrak{R}$ also belong to $\mathfrak{R}$. From this it follows by induction that the union and intersection of an arbitrary finite number of sets belonging to the algebra $\mathfrak{R}$ also belong to $\mathfrak{R}$. With respect to the unions and intersections of a countably infinite collection of sets in $\mathfrak{R}$, the latter assertion generally ceases to be valid. Therefore we introduce the following definition, which plays a fundamental role.
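On a finite space these closure properties can be verified mechanically (a sketch, not from the text; the space $U$ and the generating sets are arbitrary choices):

```python
# Sketch (not from the text): on a finite space U, closing a class of
# subsets under union and complement yields an algebra; the consequences
# noted above (U, the empty set, intersections, and differences belong
# to the algebra) can then be checked directly.
U = frozenset(range(6))
seed = {frozenset({0, 1}), frozenset({1, 2, 3})}

alg = set(seed)
changed = True
while changed:
    changed = False
    for A in list(alg):
        comp = U - A                     # complement of A in U
        if comp not in alg:
            alg.add(comp)
            changed = True
        for B in list(alg):
            union = A | B
            if union not in alg:
                alg.add(union)
                changed = True

assert U in alg and frozenset() in alg
for A in alg:
    for B in alg:
        assert (A & B) in alg and (A - B) in alg
```

Here the two generating sets produce four atoms, so the resulting algebra has $2^4 = 16$ elements.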
Definition 2. An algebra of sets $\mathfrak{S}$ is called a $\sigma$-algebra if for an arbitrary sequence of sets $A_k \in \mathfrak{S}$, where $k = 1, 2, \ldots$, we have $\bigcup_{k=1}^{\infty} A_k \in \mathfrak{S}$. The sets $A \in \mathfrak{S}$ are said to be $\mathfrak{S}$-measurable. (Since $\bigcap_{k=1}^{\infty} A_k = \overline{\bigcup_{k=1}^{\infty}\overline{A}_k}$, the intersection of an arbitrary countable collection of sets belonging to $\mathfrak{S}$ also belongs to $\mathfrak{S}$.)

Theorem 1. For every class of sets $\mathfrak{N}$ there exists a smallest $\sigma$-algebra $\mathfrak{S}$ containing $\mathfrak{N}$. This $\sigma$-algebra is called the $\sigma$-algebra generated by the class $\mathfrak{N}$, and is denoted by $\sigma\{\mathfrak{N}\}$.

It is easy to prove the existence of such a $\sigma$-algebra. There exist $\sigma$-algebras containing $\mathfrak{N}$: to exhibit one, it suffices to take the class of all subsets of the set $U$. Noting that the intersection of an arbitrary family of $\sigma$-algebras is again a $\sigma$-algebra, we see that the intersection of all $\sigma$-algebras containing $\mathfrak{N}$ is the minimal $\sigma$-algebra containing $\mathfrak{N}$.

Definition 3. In a metric space, the $\sigma$-algebra of sets generated by the class $\mathfrak{G}$ of open sets is called the $\sigma$-algebra of Borel sets, and its elements are called Borel sets.

Obviously the $\sigma$-algebra generated by the closed sets of a metric space coincides with the $\sigma$-algebra of Borel sets. We can easily see that in a separable metric space the $\sigma$-algebra of Borel sets is the $\sigma$-algebra generated by the set of open (or closed) spheres. On the real line the $\sigma$-algebras generated by open, closed, or even half-open half-closed intervals coincide with the $\sigma$-algebra of Borel sets. In $n$-dimensional Euclidean space $E_n$ we choose, for a system of sets generating the $\sigma$-algebra of Borel sets, the systems of closed, open, or half-open half-closed parallelepipeds (or intervals) $J[a, b]$, $J(a, b)$, $J[a, b)$, $J(a, b]$. (If $a = (a_1, a_2, \ldots, a_n)$ and $b = (b_1, b_2, \ldots, b_n)$, then $J[a, b) = \{(x_1, x_2, \ldots, x_n) : a_i \leq x_i < b_i,\ i = 1, 2, \ldots, n\}$; the other intervals are defined in analogous fashion.)

Suppose that to every set $A$ in a certain class of sets $\mathfrak{M}$ we assign a definite number $W = W(A)$, which may be $+\infty$ or $-\infty$. This defines a set function $W$ on $\mathfrak{M}$ into the set of real numbers,
$A \mapsto W = W(A)$.

Definition 4. A set function $W$ is said to be additive (or finitely additive) if it assumes infinite values of only one sign and if, for an arbitrary finite sequence of sets $A_k \in \mathfrak{M}$ (for $k = 1, 2, \ldots, n$) that are pairwise disjoint (that is, $A_k \cap A_r = \varnothing$ for $k \neq r$, where $k, r = 1, 2, \ldots, n$ and $\varnothing$ denotes the empty set) such that
$$\bigcup_{k=1}^{n} A_k \in \mathfrak{M},$$
we find that
$$W\left(\bigcup_{k=1}^{n} A_k\right) = \sum_{k=1}^{n} W(A_k).$$
If this equation holds for an arbitrary countable collection of sets, that is, if
$$W\left(\bigcup_{k=1}^{\infty} A_k\right) = \sum_{k=1}^{\infty} W(A_k)$$
for an arbitrary sequence of sets $A_k \in \mathfrak{M}$ with $A_k \cap A_r = \varnothing$ whenever $k \neq r$ and with $\bigcup_{k=1}^{\infty} A_k \in \mathfrak{M}$, then the set function $W = W(A)$ is said to be countably additive (or completely additive).
Definition 5. A countably additive nonnegative set function $\mu = \mu(A)$ defined on a $\sigma$-algebra of sets $\mathfrak{S}$ and satisfying the equation $\mu(\varnothing) = 0$ is called a measure.

If a $\sigma$-algebra of sets $\mathfrak{S}$ is defined on a set $U$ and a measure $\mu$ is defined on $\mathfrak{S}$, then the set $U$ is called a space with measure $\{U, \mathfrak{S}, \mu\}$, or a measurable space. The latter term will be applied to a set $U$ with a fixed $\sigma$-algebra of sets $\mathfrak{S}$ even when the measure $\mu$ is not given. We can easily see that the condition $\mu(\varnothing) = 0$ is equivalent to the condition that $\mu(A)$ is not identically equal to $+\infty$ for all $A \in \mathfrak{S}$. An arbitrary set $A \in \mathfrak{S}$ of a space with measure $\{U, \mathfrak{S}, \mu\}$ can itself be regarded as a space with measure $\{A, \mathfrak{S}_A, \mu_A\}$, where $\mathfrak{S}_A$ is the $\sigma$-algebra of subsets of $A$ of the form $A \cap B$ for arbitrary $B \in \mathfrak{S}$, and $\mu_A(C) = \mu(C)$ for every $C \in \mathfrak{S}_A$.

We now present a few properties of measures.

Theorem 2.
a. If $A$ and $B \supset A$ belong to $\mathfrak{S}$, then $\mu(A) \leq \mu(B)$, and if $\mu(A) \neq \infty$, then $\mu(B \setminus A) = \mu(B) - \mu(A)$.
b. If $\{A_n\}$ is a finite or countable sequence of sets belonging to $\mathfrak{S}$, then $\mu\left(\bigcup_n A_n\right) \leq \sum_n \mu(A_n)$.
c. If $\{A_n\}$ is an increasing sequence of sets in $\mathfrak{S}$, that is, if $A_{n+1} \supset A_n$ for $n = 1, 2, \ldots$, then
$$\lim_{n\to\infty}\mu(A_n) = \mu\left(\bigcup_n A_n\right). \qquad (3)$$
d. If $\{A_n\}$, for $n = 1, 2, \ldots$, is a decreasing sequence of sets in $\mathfrak{S}$ and if $\mu(A_1) < \infty$, then
$$\lim_{n\to\infty}\mu(A_n) = \mu\left(\bigcap_{n=1}^{\infty} A_n\right). \qquad (4)$$
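Properties (c) and (d) can be illustrated on the simplest possible measure (a sketch, not from the text; the normalized counting measure on a finite set is an arbitrary choice):

```python
# Sketch (not from the text): properties (c) and (d) of Theorem 2 for
# the normalized counting measure on the finite space U = {0, ..., 99}.
from fractions import Fraction

U = set(range(100))

def mu(A):
    return Fraction(len(A), len(U))

# (c) increasing sets A_n = {0, ..., n-1}: mu(A_n) increases to mu(union)
increasing = [set(range(n)) for n in range(1, 101)]
assert mu(set.union(*increasing)) == mu(increasing[-1]) == Fraction(1)

# (d) decreasing sets B_n = {n, ..., 99}: mu(B_n) decreases to
# mu(intersection); note mu(B_1) < infinity holds trivially here
decreasing = [set(range(n, 100)) for n in range(101)]
assert mu(set.intersection(*decreasing)) == Fraction(0) == mu(decreasing[-1])
assert [mu(B) for B in decreasing] == [Fraction(100 - n, 100) for n in range(101)]
```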
Proof.
a. Since $B \setminus A \in \mathfrak{S}$ and $B = A \cup (B \setminus A)$ (for $A \subset B$), we have $\mu(B) = \mu(A) + \mu(B \setminus A)$.

b. Let us set $C_1 = A_1$ and $C_n = A_n \setminus \left(\bigcup_{k=1}^{n-1} A_k\right)$ for $n = 2, 3, \ldots$. Then the sets $C_n$ belong to $\mathfrak{S}$ and they are pairwise disjoint (that is, $C_n \cap C_r = \varnothing$ for $n \neq r$). Furthermore, $\bigcup_{n=1}^{\infty} C_n = \bigcup_{n=1}^{\infty} A_n$ and $\mu(C_n) \leq \mu(A_n)$. Therefore
$$\mu\left(\bigcup_{n=1}^{\infty} A_n\right) = \mu\left(\bigcup_{n=1}^{\infty} C_n\right) = \sum_{n=1}^{\infty}\mu(C_n) \leq \sum_{n=1}^{\infty}\mu(A_n).$$

c. If $A_n \subset A_{n+1}$ for $n = 1, 2, \ldots$, we obtain, in the notation used above, $C_n = A_n \setminus A_{n-1}$ and $\mu(C_n) = \mu(A_n) - \mu(A_{n-1})$ if $\mu(A_{n-1}) \neq \infty$ (here $A_0 = \varnothing$). Let us first suppose that $\mu(A_n) \neq \infty$ for every $n$. Then
$$\mu\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty}\mu(C_n) = \sum_{n=1}^{\infty}[\mu(A_n) - \mu(A_{n-1})] = \lim_{n\to\infty}\mu(A_n).$$
On the other hand, if $\mu(A_{n_0}) = \infty$ for some $n = n_0$, then for $n \geq n_0$ we have a fortiori $\mu(A_n) = \infty$ and $\mu\left(\bigcup_{n=1}^{\infty} A_n\right) = \infty$.

d. Let us set $B_n = A_1 \setminus A_n$ for $n = 1, 2, \ldots$. The sets $B_n$ belong to the $\sigma$-algebra $\mathfrak{S}$, they increase monotonically (that is, $B_n \subset B_{n+1}$), and from (c), $\mu\left(\bigcup_{n=1}^{\infty} B_n\right) = \lim_{n\to\infty}\mu(B_n)$. On the other hand, $\bigcap_{n=1}^{\infty} A_n = A_1 \setminus \bigcup_{n=1}^{\infty} B_n$. Therefore
$$\mu\left(\bigcap_{n=1}^{\infty} A_n\right) = \mu(A_1) - \mu\left(\bigcup_{n=1}^{\infty} B_n\right) = \mu(A_1) - \lim_{n\to\infty}\mu(B_n) = \mu(A_1) - \lim_{n\to\infty}[\mu(A_1) - \mu(A_n)] = \lim_{n\to\infty}\mu(A_n).$$

Definition 6. Let $\{A_n\}$, for $n = 1, 2, \ldots$, denote an infinite sequence of sets. The limit superior $\varlimsup A_n$ of the sequence $\{A_n\}$ is defined as the set consisting of those points of $U$ that belong to infinitely many of the sets $A_n$. The limit inferior $\varliminf A_n$ of the sequence $\{A_n\}$ is defined as the set of those points of the space $U$ that belong to all except possibly finitely many of the sets $A_n$, for $n = 1, 2, \ldots$. Thus
$$\varlimsup A_n = \bigcap_{n=1}^{\infty}\bigcup_{k=n}^{\infty} A_k, \qquad (5)$$
$$\varliminf A_n = \bigcup_{n=1}^{\infty}\bigcap_{k=n}^{\infty} A_k. \qquad (6)$$
If {An}, for n = 1 , 2, .. , is an increasing sequence, then lim An=
1.
MEASURE
45
An. On the other hand, if {An} is a decreasing lim A. = sequence, then lim A. = lim A. = nn-1 An. It follows from (5) and (6) that the limits superior and inferior of a sequence of sets belonging to a a-algebra c also belongs to Cam. If i denotes a measure
on e, then it follows from assertions c and d of Theorem 2 that a(lim An) = lim
n-.m
p(k-n
fI(lim An) = lim lt(1 Ak) n-.m
(7)
/ Ak)
k-n1
/
,
(8)
with equation 7 holding if the measure p is finite. Definition 7. A sequence of sets {An}, for n = 1, 2, , is said to be convergent if lim A,, = lim A. In this case the common value of the limits superior and inferior of the sequence {An} is called the limit of the sequence {An}: lim An = lim An = lim A. It follows from our definition of convergence of a sequence of sets that every point u e U either belongs to only a finite number of the sets A. or belongs to all the A. from some n on. It follows from what was said above that every monotonic sequence is convergent.
Since
II(n Ak) < 1e('4,) < '2(U Ak) it follows on the basis of formulas (3) and (4) that for every convergent sequence {An} of sets A. and every finite measure j_t(lim An) = lim P(An)
.
(9)
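The set limits in formulas (5) and (6) can be checked concretely on a periodic sequence of finite sets. The following Python sketch is ours, not the text's; a long truncation horizon stands in for the infinite tails, which is valid here because every sufficiently late tail of an eventually periodic sequence contains a full period:

```python
# Finite-horizon illustration of formulas (5) and (6):
# lim sup A_n = points lying in infinitely many A_n,
# lim inf A_n = points lying in all but finitely many A_n.
def tail_union(sets, n):
    return set().union(*sets[n:])

def tail_intersection(sets, n):
    return set.intersection(*map(set, sets[n:]))

# A_n alternates between {1, 2} and {2, 3}.
A = [{1, 2} if n % 2 == 0 else {2, 3} for n in range(100)]
N = 50  # use only tails that still contain a full period

lim_sup = set.intersection(*(tail_union(A, n) for n in range(N)))
lim_inf = set().union(*(tail_intersection(A, n) for n in range(N)))
print(lim_sup)  # {1, 2, 3}: each of these points occurs infinitely often
print(lim_inf)  # {2}: only 2 lies in all but finitely many A_n
```

Since this sequence does not converge in the sense of Definition 7 (the two limits differ), it also illustrates why convergence of sets is a genuine restriction.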
We now introduce the following useful concept:

Definition 8. A class 𝔐 of sets is said to be monotonic if the convergence of an arbitrary monotonic sequence of sets A_n ∈ 𝔐, for n = 1, 2, …, implies that the limit of such a sequence belongs to 𝔐.

Since the intersection of monotonic classes is a monotonic class, it follows that corresponding to an arbitrary class 𝔄 of sets there is a minimal monotonic class m{𝔄} containing 𝔄. Obviously, every σ-algebra is a monotonic class, and every algebra that is a monotonic class is a σ-algebra: ∪_{n=1}^∞ A_n = lim_{n→∞} ∪_{k=1}^n A_k. In many cases we need to show that a particular class of sets contains the minimal σ-algebra generated by a given algebra, for which the following theorem is useful.

Theorem 3. The minimal monotonic class m{𝔄} containing the algebra 𝔄 coincides with the minimal σ-algebra σ{𝔄}.
On the basis of the remark made above it is sufficient to show that m{𝔄} ⊃ σ{𝔄}, and to do this it is sufficient to show that m{𝔄} is an algebra. Let ℜ{A} denote the class of all sets B such that

    A ∪ B ∈ m{𝔄} , A\B ∈ m{𝔄} , B\A ∈ m{𝔄} .    (10)

The class ℜ{A} is monotonic: if {B_n} is a monotonic sequence of sets and each B_n satisfies conditions (10), then A ∪ B_n, A\B_n, and B_n\A are also monotonic sequences of sets and

    lim (A ∪ B_n) = A ∪ lim B_n ∈ m{𝔄} , lim (A\B_n) = A\lim B_n ∈ m{𝔄} , lim (B_n\A) = (lim B_n)\A ∈ m{𝔄} ;

that is, lim B_n ∈ ℜ{A}. If A ∈ 𝔄, then ℜ{A} ⊃ 𝔄. Consequently, m{𝔄} ⊂ ℜ{A}; that is, for every F ∈ m{𝔄} we have F ∈ ℜ{A}. It then follows from the definition of ℜ{F} that A ∈ ℜ{F}. Since this holds for every A ∈ 𝔄, we have ℜ{F} ⊃ 𝔄, and just as above it follows from the monotonicity of ℜ{F} that m{𝔄} ⊂ ℜ{F}. This means that relations (10) hold for arbitrary A and B in m{𝔄}; that is, m{𝔄} is an algebra of sets.

Let us pause to look at arbitrary countably additive set functions defined on a σ-algebra 𝔖. We shall call them charges. Since every charge is the difference between two measures, the study of charges reduces to the study of measures. This follows immediately from Theorem 4 below.
Definition 9. For arbitrary A ∈ 𝔖, the quantities

    W^+(A) = sup_{A'⊂A, A'∈𝔖} W(A') , W^−(A) = −inf_{A'⊂A, A'∈𝔖} W(A')    (11)

are called respectively the positive and negative variations of the charge W on the set A, and the quantity

    |W|(A) = W^+(A) + W^−(A)    (12)

is called the absolute variation. We note that for arbitrary A ∈ 𝔖,

    |W|(A) ≥ |W(A)| .    (13)

It follows immediately from the definition that W^+ and W^− are nonnegative and nondecreasing set functions: if A ⊂ B, then

    0 ≤ W^±(A) ≤ W^±(B) .    (14)

Furthermore,

    W^±(A_1 ∪ A_2) ≤ W^±(A_1) + W^±(A_2) .    (15)
Throughout the remaining portion of this section we shall assume that the space U and a σ-algebra 𝔖 on it are fixed. All the sets that we shall consider are assumed to be 𝔖-measurable.
Lemma 1. If W(A) < ∞ for every A ∈ 𝔖, then W^+(U) < ∞.

To prove this, let us assume the opposite, namely that W^+(U) = ∞. Let us show that in this case there exists, for arbitrary c > 0, a set A such that W(A) > c and W^+(A) = ∞. We also prove this assertion by contradiction. If it is not valid, there exists an A_1 such that W(A_1) > c and W^+(A_1) < ∞. If we set A_2 = U\A_1 in inequality (15), we obtain the result that W^+(U\A_1) = ∞. Repeating the above reasoning with U replaced by U\A_1, we see that there exists an A_2 ⊂ U\A_1 such that W(A_2) > c and W^+(A_2) < ∞. We obtain by induction an infinite sequence of sets A_1, A_2, …, belonging to 𝔖, pairwise disjoint, and such that W(A_n) > c, so that

    W(∪_{n=1}^∞ A_n) = Σ_{n=1}^∞ W(A_n) = ∞ ,

which contradicts the hypothesis of the lemma. Thus for every c there exists an A such that W(A) > c and W^+(A) = ∞. Let us take successively c = 1, 2, …, n, …, and then apply what we have just proven to construct a sequence of sets B_n such that for each n, W(B_n) > n, W^+(B_n) = ∞, and B_{n+1} ⊂ B_n. For this we first use U to find B_1, then B_1 to find B_2 ⊂ B_1, and so forth. Let us set D = ∩_{n=1}^∞ B_n. Then

    W(B_1\D) = lim_{n→∞} [W(B_1) − W(B_n)] = −∞ ,

so that W(D) = +∞, which is impossible. This contradiction completes the proof of the lemma.

Theorem 4 (Hahn). Let W denote an arbitrary charge on a σ-algebra 𝔖. Then U can be partitioned into two sets P and N such that U = P ∪ N, P ∩ N = ∅, and for every A ∈ 𝔖,

    W(A ∩ P) ≥ 0 , W(A ∩ N) ≤ 0 .

Proof. Let us set β = sup_{A∈𝔖} W(A) = W^+(U); by Lemma 1, β < ∞. Let us choose sets C_n such that W(C_n) > β − 2^{−n} and set

    P = lim sup C_n = ∩_{n=1}^∞ ∪_{k=n}^∞ C_k .

If A ⊂ C_n, then W(A) = W(C_n) − W(C_n\A) > (β − 2^{−n}) − β = −2^{−n}. Therefore, for arbitrary A ⊂ P we obtain from the relation A ⊂ ∪_{k=n}^∞ C_k the result

    W(A) = Σ_{k=n}^∞ W(A ∩ (C_k\∪_{j=n}^{k−1} C_j)) ≥ −Σ_{k=n}^∞ 2^{−k} ,

or, taking the limit as n → ∞, we find

    W(A) ≥ 0 .    (16)
Let us now set

    N = U\P = ∪_{n=1}^∞ D_n , where D_n = ∩_{k=n}^∞ (U\C_k) .

If A ∩ C_n = ∅, then W(A) < 2^{−n}. (This is true because the inequality W(A) ≥ 2^{−n} would imply that W(A ∪ C_n) = W(A) + W(C_n) > β, which is impossible.) From this it follows that the relation

    A ∩ (∪_{k=n}^∞ C_k) = ∅

implies that W(A) ≤ 0. Therefore if A ⊂ N, then, setting D_0 = ∅,

    A = ∪_{n=1}^∞ (A ∩ D_n) , W(A) = Σ_{n=1}^∞ W(A ∩ (D_n\D_{n−1})) ≤ 0 .

Thus

    W(A) ≤ 0 for every A ⊂ N .    (17)

This completes the proof of the theorem.
Corollary 1. The positive, negative, and absolute variations of a charge are measures, and

    W^+(A) = W(A ∩ P) , W^−(A) = −W(A ∩ N) ,    (18)

    |W|(A) = W(A ∩ P) − W(A ∩ N) .    (19)

Corollary 2. An arbitrary charge can be represented as the difference of two measures:

    W(A) = W^+(A) − W^−(A) .    (20)

Corollary 3.

    sup_{A'⊂A, A'∈𝔖} |W(A')| ≤ |W|(A) ≤ 2 sup_{A'⊂A, A'∈𝔖} |W(A')| .    (21)

Proof. Formula (20) follows from (18) since

    W(A) = W(A ∩ P) + W(A ∩ N) = W^+(A) − W^−(A) ,

and inequalities (21) follow from (20) and (19).
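On a finite space a charge is determined by a weight w(u) at each point, the Hahn decomposition is simply the split of points by the sign of their weight, and formulas (18)-(21) can be verified exhaustively. A Python sketch (the space and the weights are our invented example, not the text's):

```python
from itertools import chain, combinations

# A charge on a finite space U, given by point weights w(u).
w = {'a': 2.0, 'b': -1.5, 'c': 0.5, 'd': -0.5}
U = set(w)

# Hahn decomposition: P carries nonnegative weight, N negative weight.
P = {u for u, x in w.items() if x >= 0}
N = {u for u, x in w.items() if x < 0}

def W(A):  # the charge of a set A
    return sum(w[u] for u in A)

W_plus = W(U & P)             # W+(U) = W(U n P), formula (18)
W_minus = -W(U & N)           # W-(U) = -W(U n N), formula (18)
total_var = W_plus + W_minus  # |W|(U), formula (12)

# Jordan decomposition (20): W = W+ - W-.
assert abs(W(U) - (W_plus - W_minus)) < 1e-12

# Inequality (21), with the sup taken over all subsets of U.
subsets = chain.from_iterable(combinations(sorted(U), r) for r in range(len(U) + 1))
sup_abs = max(abs(W(s)) for s in subsets)
assert sup_abs <= total_var <= 2 * sup_abs
print(P, N, W_plus, W_minus, total_var)
```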
Let 𝔚 denote the set of all finite charges on a σ-algebra 𝔖. This set is a linear space if the sum of two charges and the product of a charge and a number are defined in the natural manner:

    (W_1 + W_2)(A) = W_1(A) + W_2(A) , (tW)(A) = tW(A) .

We now define a norm on 𝔚:

    ‖W‖ = |W|(U) .

It follows from formulas (18) that ‖tW‖ = |t| ‖W‖, and from (15) that ‖W_1 + W_2‖ ≤ ‖W_1‖ + ‖W_2‖. Thus 𝔚 becomes a normed space. Convergence in 𝔚 is called convergence in variation. If {W_n}, for n = 1, 2, …, converges to W in variation, that is, if ‖W − W_n‖ → 0 as n → ∞, then W_n(A) → W(A) uniformly over all sets A ∈ 𝔖:

    sup_{A∈𝔖} |W(A) − W_n(A)| ≤ ‖W − W_n‖

(cf. inequality (21)).

Theorem 5. The space 𝔚 with norm ‖W‖ = |W|(U) is a Banach space (that is, a complete normed linear space).

We need prove only the completeness of the space. Suppose that ‖W_n − W_{n'}‖ → 0 as n, n' → ∞. For every A ∈ 𝔖 the sequence of numbers W_n(A) is a fundamental sequence and it converges to a finite limit. Let us set W_0(A) = lim_{n→∞} W_n(A). The set function W_0(A) is defined on 𝔖 and is finite and additive. Let us show that it is countably additive. Let {A_k}, for k = 1, 2, …, denote a sequence of disjoint sets in 𝔖. Then

    |W_0(∪_{k=1}^∞ A_k) − Σ_{k=1}^n W_0(A_k)| = |W_0(∪_{k=n+1}^∞ A_k)| = lim_{p→∞} |W_{m+p}(∪_{k=n+1}^∞ A_k)| ≤ |W_m(∪_{k=n+1}^∞ A_k)| + lim_{p→∞} ‖W_{m+p} − W_m‖ .    (22)

The right-hand member of this inequality can be made arbitrarily small by a suitable choice of m and n. It follows from (21) that {W_n} converges in variation to W_0. This completes the proof of the theorem.
2. MEASURABLE FUNCTIONS
From an intuitive point of view, a random variable ξ is a (variable) number that corresponds to each possible outcome of an experiment. Since the outcomes of an experiment are described by elementary events, a random variable can be regarded as a function of an elementary event, ξ = f(u) for u ∈ U. On the other hand, in elementary probability theory a random variable ξ is completely characterized by its distribution function F(x) = P{ξ < x}. Corresponding to the event {ξ < x} is the set of elementary events {u; f(u) < x}. Therefore, for it to be meaningful to speak of a distribution function of a random variable, the set {u; f(u) < x} must for arbitrary real x belong to 𝔖. In this section we shall study the class of functions defined on a measurable space {U, 𝔖, μ} which enjoy this property.

Definition 1. Let 𝔖 denote a σ-algebra of sets of the space U. Let f(u) denote a function defined on an 𝔖-measurable set M and assuming real values (and possibly the values ±∞). Such a function f(u) is said to be 𝔖-measurable if for every real x the set {u; f(u) < x} is 𝔖-measurable.
We note a few properties of measurable functions.

Theorem 1. Let A denote an arbitrary Borel set in the n-dimensional space E_n and let f_1(u), …, f_n(u) denote 𝔖-measurable functions all defined on the same set M ∈ 𝔖. Then the set

    {u; u ∈ M, (f_1(u), f_2(u), …, f_n(u)) ∈ A}

is 𝔖-measurable.

Proof. Since

    {u; u ∈ M, (f_1(u), …, f_n(u)) ∈ A'\A''} = {u; u ∈ M, (f_1(u), …, f_n(u)) ∈ A'}\{u; u ∈ M, (f_1(u), …, f_n(u)) ∈ A''} ,

    {u; u ∈ M, (f_1(u), …, f_n(u)) ∈ ∪_{k=1}^∞ A^{(k)}} = ∪_{k=1}^∞ {u; u ∈ M, (f_1(u), …, f_n(u)) ∈ A^{(k)}} ,

the class 𝔄 of sets A contained in E_n such that the set {u; u ∈ M, (f_1(u), …, f_n(u)) ∈ A} is 𝔖-measurable constitutes a σ-algebra. Furthermore, 𝔄 contains the n-dimensional infinite intervals

    I = {(x_1, …, x_n); x_1 < a_1, …, x_n < a_n} ,

since

    {u; u ∈ M, (f_1(u), …, f_n(u)) ∈ I} = ∩_{k=1}^n {u; u ∈ M, f_k(u) < a_k} .

Consequently 𝔄 contains all Borel sets in E_n.

Corollary 1. If f(u) is an 𝔖-measurable function, then for every x the sets

    {u; u ∈ M, f(u) ≤ x} , {u; u ∈ M, f(u) > x} , {u; u ∈ M, f(u) ≥ x} , {u; u ∈ M, f(u) = x} , {u; u ∈ M, a < f(u) < b} , etc.,

are 𝔖-measurable.
REMARK 1. As one can see from the proof of Theorem 1, the assertion in that theorem holds for an arbitrary function f(u) defined on an 𝔖-measurable set M and satisfying the condition {u; u ∈ M, f(u) ∈ K} ∈ 𝔖 for every K in a class of sets that generates a σ-algebra containing 𝔅_1 (the σ-algebra of Borel sets in E_1). In particular, the function f(u) defined on M ∈ 𝔖 is 𝔖-measurable if for arbitrary real x one of the following systems of sets

    {u; u ∈ M, f(u) ≤ x} , {u; u ∈ M, x < f(u)} , {u; u ∈ M, x ≤ f(u)}

is 𝔖-measurable (x may range only over an arbitrary everywhere-dense set).
Theorem 2. Let {f_n(u), n = 1, 2, …, u ∈ M} denote a sequence of 𝔖-measurable functions. Then the functions

    sup_n f_n(u) , inf_n f_n(u) , lim sup_{n→∞} f_n(u) , lim inf_{n→∞} f_n(u)

are 𝔖-measurable.

The proof follows from the relations:

    {u; u ∈ M, sup_n f_n(u) > x} = ∪_{n=1}^∞ {u; u ∈ M, f_n(u) > x} ,

    {u; u ∈ M, inf_n f_n(u) < x} = ∪_{n=1}^∞ {u; u ∈ M, f_n(u) < x} ,

    {u; u ∈ M, lim sup_n f_n(u) < x} = ∪_{k=1}^∞ ∪_{n=1}^∞ ∩_{j=n}^∞ {u; u ∈ M, f_j(u) < x − 1/k} ,

    {u; u ∈ M, lim inf_n f_n(u) > x} = ∪_{k=1}^∞ ∪_{n=1}^∞ ∩_{j=n}^∞ {u; u ∈ M, f_j(u) > x + 1/k} .
Definition 2. The characteristic function χ_A(u) of a set A is defined as the function that is equal to 1 for u ∈ A and equal to 0 for u ∉ A.

Note the following obvious relations:

    χ_{A∩B}(u) = χ_A(u) χ_B(u) ;    (1)

    χ_{A∪B}(u) = χ_A(u) + χ_B(u)  (A ∩ B = ∅) ;    (2)

    χ_{U\A}(u) = 1 − χ_A(u) ;    (3)

    χ_{lim sup A_n}(u) = lim sup_{n→∞} χ_{A_n}(u) ;    (4)

    χ_{lim inf A_n}(u) = lim inf_{n→∞} χ_{A_n}(u) .    (5)
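Relations (1)-(3) are pointwise identities and can be checked mechanically; a small Python sketch (the universe and the sets are our choice):

```python
# Characteristic (indicator) functions on a small universe; relations (1)-(3).
U = {0, 1, 2, 3, 4}
A, B = {0, 1, 2}, {2, 3}

def chi(S):
    return lambda u: 1 if u in S else 0

for u in U:
    assert chi(A & B)(u) == chi(A)(u) * chi(B)(u)   # (1): intersection = product
    assert chi(U - A)(u) == 1 - chi(A)(u)           # (3): complement = 1 - chi

C, D = {0}, {3, 4}  # disjoint sets, so relation (2) applies
for u in U:
    assert chi(C | D)(u) == chi(C)(u) + chi(D)(u)   # (2): disjoint union = sum
print("relations (1)-(3) verified pointwise")
```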
Definition 3. An 𝔖-measurable function f(u) is called a simple function if there exists a finite collection of sets, each contained in the domain of definition of f and together covering this domain of definition, such that f assumes a constant finite value on each member of the collection (though possibly differing from member to member).

Suppose that a simple function f(u) is defined on a set M ∈ 𝔖 and assumes the values a_1, a_2, …, a_n (where a_i ≠ a_j if i ≠ j, for i, j = 1, …, n). Let us set A_j = {u; u ∈ M, f(u) = a_j} for j = 1, …, n. Then the A_j are 𝔖-measurable and

    f(u) = Σ_{j=1}^n a_j χ_{A_j}(u) , u ∈ M ,    (6)

where χ_{A_j}(u) is the characteristic function of the set A_j. On the other hand, every function that can be represented in the form (6) is a simple function defined on M.

Theorem 3. For a function f(u) (where u ∈ M ∈ 𝔖) to be 𝔖-measurable, it is necessary and sufficient that it be the limit of a sequence of simple functions that converges everywhere on M.

Proof. The sufficiency follows from Theorem 2. To prove the necessity we set

    A_{N,−2^N N} = {u; u ∈ M, f(u) < −N} ; A_{N,2^N N+1} = {u; u ∈ M, f(u) ≥ N} ;

    A_{N,k} = {u; u ∈ M, (k−1)/2^N ≤ f(u) < k/2^N} , k = −2^N N + 1, …, 2^N N ;

    f_N(u) = Σ_{k=−2^N N}^{2^N N+1} ((k−1)/2^N) χ_{A_{N,k}}(u) , u ∈ M .

Then |f_N(u) − f(u)| ≤ 2^{−N} if |f(u)| < N, f_N(u) = N if f(u) ≥ N, and f_N(u) ≤ −N if f(u) < −N. Consequently, lim_{N→∞} f_N(u) = f(u), u ∈ M.
This completes the proof of the theorem.

REMARK 2. If f(u) is nonnegative (or at least bounded below) and 𝔖-measurable, it is the limit of an everywhere-convergent nondecreasing sequence of simple functions. To see this, we note that the functions f_N(u) constitute in this case a nondecreasing sequence beginning with some number N.
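The staircase functions f_N from the proof of Theorem 3 can be computed directly: on the cell (k−1)/2^N ≤ f(u) < k/2^N the value is (k−1)/2^N, with truncation at ±N. A Python sketch computing the same values pointwise (the sample function and test points are our choice):

```python
import math

# Dyadic staircase approximation from the proof of Theorem 3.
def f_N(f, N):
    def approx(u):
        x = f(u)
        if x >= N:
            return N                      # top cell, value N
        if x < -N:
            return -N - 2.0 ** (-N)       # bottom cell, value (k-1)/2^N <= -N
        return math.floor(x * 2 ** N) / 2 ** N  # (k-1)/2^N on the middle cells
    return approx

f = lambda u: math.sin(u) * 3.0
pts = [0.0, 0.3, 1.1, 2.5, -1.7]
for N in (2, 5, 10):
    for u in pts:
        if abs(f(u)) < N:  # the bound |f_N - f| <= 2^(-N) holds where |f| < N
            assert abs(f_N(f, N)(u) - f(u)) <= 2.0 ** (-N) + 1e-12
print("dyadic approximation within 2^(-N) on {|f| < N}")
```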
Let us now look at functions g(x) defined on some metric space R and taking values in the extended real line (that is, the set of real numbers with the values ±∞ included). Let 𝔅 denote the σ-algebra of Borel sets contained in R.

Definition 4. A function f(x), for x ∈ R, is called a Borel function if for arbitrary real a the set {x; f(x) < a} is a Borel set.

Definition 5. A Baire function is defined as a function belonging to the smallest class B of functions defined on R that satisfies the following two conditions: (a) B contains all continuous functions; (b) B is closed under passage to the limit; that is, if {f_n(x)}, for n = 1, 2, …, is an arbitrary sequence of functions f_n(x) ∈ B that converges in R, then lim_{n→∞} f_n(x) ∈ B.
Theorem 4. The classes of Borel and Baire functions coincide.

Proof. Let Q denote the class of Borel functions. Q contains all continuous functions and is closed under passage to the limit:

    {x; lim_{n→∞} f_n(x) < a} = ∪_{k=1}^∞ ∪_{n=1}^∞ ∩_{j=n}^∞ {x; f_j(x) ≤ a − 1/k} ,

so that Q contains the class B of Baire functions. Conversely, let 𝔄 denote the class of Borel sets whose characteristic functions are Baire functions.
a. 𝔄 contains every closed set F. Setting f_ε(x) = max{0, 1 − ρ(x, F)/ε}, where ρ(x, F) is the distance from x to the closed set F, the function f_ε(x) is continuous and lim_{ε→0} f_ε(x) = χ_F(x). Consequently F ∈ 𝔄.
b. The class 𝔄 is monotonic. Let {A_n} denote a monotonic sequence of sets A_n ∈ 𝔄 and define A_0 = lim A_n. Then χ_{A_0}(x) = lim χ_{A_n}(x) ∈ B. It follows from a and b that 𝔄 contains all Borel sets in the space (Theorem 3, Section 1).
c. Let f(x) denote an arbitrary Borel function. On the basis of Theorem 3, there exists a sequence of simple Borel functions f_N(x) such that

    f(x) = lim_{N→∞} f_N(x) .    (7)

The simple functions f_N(x) admit a representation of the form (6) in which the A_j are Borel sets. Since the class B constitutes a linear space, simple Borel functions belong to B. Since B is closed under passage to the limit, on the basis of equation (7) an arbitrary Borel function is a Baire function. This completes the proof of the theorem.

Let us now look at further properties of measurable functions.

Theorem 5. Let f_1(u), …, f_n(u) denote finite 𝔖-measurable functions defined on an 𝔖-measurable set M and let φ(t_1, …, t_n) denote a Borel function in the n-dimensional space E_n. Then the function φ(f_1(u), …, f_n(u)), for u ∈ M, is 𝔖-measurable.
Proof. For arbitrary real a, the set

    B_a = {(t_1, …, t_n); φ(t_1, …, t_n) < a}

is a Borel subset of E_n. The set

    {u; u ∈ M, φ(f_1(u), f_2(u), …, f_n(u)) < a} = {u; u ∈ M, (f_1(u), …, f_n(u)) ∈ B_a}

is 𝔖-measurable on the basis of Theorem 1. This completes the
proof of the theorem.

Corollary 1. If f and g are 𝔖-measurable finite functions, then the functions f ± g, fg, and 1/g are also 𝔖-measurable. (Here 1/x must be assigned some unique value for x = 0.)

This follows from the fact that the functions x ± y, xy, and 1/x are Borel functions.

Corollary 2. For any two 𝔖-measurable functions f(u) and g(u), where u ∈ M, the sets {u; u ∈ M, f(u) < g(u)} and {u; u ∈ M, f(u) = g(u)} are measurable.

The proof follows from the measurability of the function f(u) − g(u).

Definition 6. Let μ denote a measure with domain of definition 𝔖. Two functions f and g are said to be equivalent (more precisely, μ-equivalent) on a set M ∈ 𝔖 if the set A = {u; u ∈ M, f(u) ≠ g(u)} is 𝔖-measurable and μ(A) = 0.

Definition 7. A σ-algebra of sets 𝔖 is said to be complete (or μ-complete, or complete with respect to the measure μ) if an arbitrary subset N' of a set N of μ-measure 0 is 𝔖-measurable; that is, if the relations N' ⊂ N, N ∈ 𝔖, and μ(N) = 0 imply

    N' ∈ 𝔖 .    (8)
The measure μ defined on a μ-complete σ-algebra of sets is also said to be complete. Of course, relation (8) implies that μ(N') = 0.

Theorem 6. If 𝔖 is a μ-complete σ-algebra, if f(u), for u ∈ M, is an 𝔖-measurable function, and if the functions f(u) and g(u) are equivalent on M, then g(u), for u ∈ M, is also 𝔖-measurable.

It follows from the hypothesis of the theorem that for arbitrary real a,

    {u; u ∈ M, g(u) < a} = ({u; u ∈ M, f(u) < a}\N') ∪ N'' ,

where N' and N'' are subsets of the set N = {u; u ∈ M, f(u) ≠ g(u)} of μ-measure 0. By virtue of the completeness of the measure, the set {u; u ∈ M, g(u) < a} is 𝔖-measurable.

The set of all 𝔖-measurable functions defined on M and equivalent to a given function f(u) is obviously some complete equivalence class of functions. In many cases there is no point in distinguishing among equivalent functions. Then the word "function" actually refers to an entire class of 𝔖-measurable functions that are equivalent to each other. In what follows we shall often proceed from this point of view.

Let us make some remarks concerning terminology. A certain property is said to hold μ-almost-everywhere on M if the μ-measure
of the set of points at which this property does not hold is equal to 0. For example, if two functions f and g are equivalent on M, we may say that f and g coincide μ-almost-everywhere on M. A sequence of functions f_n(u), for n = 1, 2, …, where u ∈ M, is said to converge μ-almost-everywhere to the function f(u) on M if the μ-measure of the set of those points u ∈ M at which lim f_n(u) does not exist or does not coincide with f(u) is equal to 0. If a property holds μ-almost-everywhere on a set M, let us indicate this property with the expression (mod μ) instead of the more cumbrous "μ-almost-everywhere." Thus if f and g are equivalent on a set M, we can write f(u) = g(u), u ∈ M (mod μ). Similarly, if {f_n(u)} converges to f(u) μ-almost-everywhere on M, we can write simply lim f_n(u) = f(u), u ∈ M (mod μ).

3. CONVERGENCE IN MEASURE
A sequence of random variables ξ_n is said to converge in probability to a random variable ξ if for arbitrary ε > 0, P{|ξ_n − ξ| > ε} → 0 as n → ∞, and we indicate this fact by writing ξ = P-lim ξ_n. Corresponding to this definition in the general theory of functions is the following. Let {U, 𝔖, μ} denote a space with a measure and let {f_n(u)}, for n = 1, 2, …, denote a sequence of μ-almost-everywhere finite 𝔖-measurable functions on U.

Definition 1. A sequence {f_n(u)} is said to converge in μ-measure to an 𝔖-measurable function f(u) if for arbitrary ε > 0, μ{u; |f_n(u) − f(u)| > ε} → 0 as n → ∞. We indicate this by writing f(u) = μ-lim f_n(u).

In this section we shall consider those properties of sequences of functions that are related to convergence in measure and we
shall look at the relation between convergence in measure and ordinary convergence (convergence at each point of some set). Some measurable space {U, 𝔖, μ} is considered fixed. All the functions in question will be assumed 𝔖-measurable and finite (mod μ) even though this may not be explicitly stated.

Let {f_n(u)}, for n = 1, 2, …, denote a given sequence of measurable finite (mod μ) functions on U. Let S denote the set of points of U at which the sequence {f_n(u)} converges to a finite limit and let D denote the set of points at which this sequence diverges. Then

    S = ∩_{k=1}^∞ ∪_{n=1}^∞ ∩_{m=1}^∞ {u; |f_n(u) − f_{n+m}(u)| < 1/k} ,    (1)

    D = ∪_{k=1}^∞ ∩_{n=1}^∞ ∪_{m=1}^∞ {u; |f_n(u) − f_{n+m}(u)| > 1/k} ;    (2)

that is, the sets S and D are 𝔖-measurable, so that it is always meaningful to speak of the measure of the set on which a sequence {f_n(u)} converges or diverges. If μ(D) = 0, the sequence of functions converges μ-almost-everywhere. We set f(u) = lim f_n(u) for u ∈ S, and we extend the definition of f to the set U\S = D by setting f(u) = 0 for u ∈ D. Then lim f_n(u) = f(u) (mod μ), and the function f(u) is the finite limit (mod μ) of the sequence {f_n(u)} on U.

Theorem 1. Let μ denote a finite measure. If a sequence {f_n(u)} of functions converges (mod μ) to a finite (mod μ) function f(u) on U, then {f_n(u)} converges to f(u) in μ-measure.

Proof. Let D denote the subset of U on which the sequence {f_n(u)} does not converge to f(u). Define

    D_{km} = ∪_{n=m}^∞ {u; |f_n(u) − f(u)| > 1/k} , D_k = ∩_{m=1}^∞ D_{km} , D = ∪_{k=1}^∞ D_k .

The sets D_k constitute an increasing sequence and the sets D_{km} constitute, for fixed k, a decreasing sequence. Since μ(D) = 0, we have μ(D_k) = 0 for every k; and since the measure μ is finite, μ(D_k) = lim_{m→∞} μ(D_{km}) = 0. Thus for every k and ε > 0 there exists an m such that

    μ{u; |f_n(u) − f(u)| > 1/k} < ε

for all n ≥ m, which means that the sequence {f_n(u)} converges to f(u) in μ-measure.
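Theorem 1 is easy to watch numerically. For f_n(u) = u^n on [0, 1] with Lebesgue measure, f_n → 0 at every point u < 1, and μ{u; |f_n(u)| > ε} = 1 − ε^{1/n} → 0. A Python sketch estimating this measure on a uniform grid (the grid is our crude stand-in for Lebesgue measure):

```python
# f_n(u) = u**n on [0, 1] converges to 0 at every u < 1, hence in measure.
eps = 0.1

def measure_above(n, pts=20_000):
    # fraction of grid points where |f_n| exceeds eps, approximating
    # mu{u : u**n > eps} = 1 - eps**(1/n)
    grid = [(i + 0.5) / pts for i in range(pts)]
    return sum(1 for u in grid if u ** n > eps) / pts

for n in (1, 5, 25, 125):
    print(n, measure_above(n))   # decreases toward 0 as n grows
```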
The following theorem asserts that a sequence of functions that converges in μ-measure has no more than one limit (mod μ).

Theorem 2. If f(u) = μ-lim f_n(u) and g(u) = μ-lim f_n(u) (for u ∈ M), then the functions f and g are equivalent on M.

This is true because

    μ{u; u ∈ M, f ≠ g} = lim_{k→∞} μ{u; u ∈ M, |f(u) − g(u)| > 1/k}

and

    μ{u; u ∈ M, |f(u) − g(u)| > 1/k} ≤ μ{u; u ∈ M, |f(u) − f_n(u)| + |f_n(u) − g(u)| > 1/k} ≤ μ{u; u ∈ M, |f(u) − f_n(u)| > 1/2k} + μ{u; u ∈ M, |f_n(u) − g(u)| > 1/2k} → 0 as n → ∞ .
,
is said to
0 as L ---+ o
, converges Theorem 3. If a sequence { f (u)}, for n = 1, 2, in It-measure to a finite (mod ,u) function f(u) on U, it is bounded on U with respect to the measure lt. Proof. For arbitrary L,
If (u) - f(u) > 2
1iu; u e M, I fn(u) I > L} < It
+ p{u; u e M, I f(u) I>
J
1}
J
(3)
2 sequence The sets {u; u e M, I f(u) I > L/2} constitute a decreasing as L oc and
fl {u; If(u) I > 2
{u;
I f(u)
I=
}
Therefore the second term on the right side of formula (3) approaches 0. That the same is true of the first term follows from the convergence in measure of {fn} to f. Let us find a necessary and sufficient condition for convergence
in measure of a sequence of functions. We introduce a useful auxiliary concept:
Definition 3. A sequence {f(u)}, for n = 1, 2, , is said to be almost uniformly convergent on U if for arbitrary s > 0 there exists a set H such that p(H) < s and the sequence {f(u)} converges uniformly on U\H. The concept of an almost uniformly convergent sequence should
not be confused with the concept of a sequence that converges uniformly almost everywhere on U. for n = 1, 2, Lemma 1. If a sequence {
, of Cs-measurable
finite (mod p) functions converges almost uniformly on U, it converges
59
3. CONVERGENCE IN MEASURE
almost everywhere on U. Proof. For every integer k, there exists a set Hk such that p(Hk) < 2-11, and the sequence { fn(u)} converges uniformly on U\Hk.
Then the sequence { fn(u)} converges on an arbitrary set M, = U\ U;=, Hk and consequently converges on the set u;=1 M. The set p=1
H=U\UMp=fIUHk, p=1 k=p
where the sequence { fn(u)} (for n = 1, 2,
.
) may diverge, has
measure
lt(H) = lim P((J Hk) = 0 . k=p
/
Definition 4. A sequence { fn(u)} of Cam-measurable finite (mod
functions is said to be fundamental in p-measure if for arbitrary oc. s > 0, p{u; I fn(u) - fm(u) I > s} 0 as n, m
Theorem 4. If a sequence { fn(u)} of Cs-measurable finite (mod p) functions is fundamental in p-measure it contains an almost uniformly convergent subsequence {fnk(u)}, for k = 1, 2. Let us find an nk such that
Proof.
p{u; I f -(u) - fm(u) I > 2k} < 2k
for n, m > nk. Without loss of generality we may assume that the sequence {nk} is an increasing sequence. Suppose that Ek = {u; I J nk(u) - fnk+l(u) I >
2k }
Then if u U;=, Ej and i, j > k (for i < j), it follows that
I AM -
{{''
,/ n j (u) I {'
C IJ nd(u) - fnti+1(u) I + I Jni+1(u) 2°
2111 +
.
+
21-1
{
- A+2(u) I + ... + I l ny-1(u) - fnj(u)I
- 2-1
that is, the sequence { fnk(u)} converges uniformly on the set U\Hk, where
Hk = U Ek and u(Hk) < j fY(Ej) < 21-k j=k j=k in other words, the sequence { fnk(u)} converges almost uniformly on U. Theorem 5. For a sequence { fn(u)} of functions to converge in
μ-measure, it is necessary and sufficient that it be a fundamental sequence in μ-measure.
Proof of the Necessity. If {f_n(u)} converges in measure to the function f(u), then

    μ{u; |f_n(u) − f_m(u)| > ε} ≤ μ{u; |f_n(u) − f(u)| > ε/2} + μ{u; |f(u) − f_m(u)| > ε/2} → 0 as n, m → ∞ .

Proof of the Sufficiency. If a sequence {f_n(u)} is fundamental in measure, it contains by virtue of Theorem 4 a subsequence that converges almost uniformly, and hence in measure, to some finite (mod μ) 𝔖-measurable function f(u). Then

    μ{u; |f(u) − f_n(u)| > ε} ≤ μ{u; |f(u) − f_{n_k}(u)| > ε/2} + μ{u; |f_{n_k}(u) − f_n(u)| > ε/2} .
}
On the basis of the choice of sequence { the first term on the right side of the inequality approaches zero as k - and the
second term approaches zero as k, n ---, - by virtue of the fact that the sequence {
is fundamental in measure.
This completes
the proof of the theorem. Corollary. For a sequence {f(u)} to converge in p-measure to f(u), it is necessary and sufficient that every subsequence of the functions f(u) that to f(u) almost uniformly-The
necessity follows from Theorems 4 and 5. To prove the sufficiency, we note that if the sequence {f_n(u)} does not converge to f(u) in measure, there exists a sequence of indices n_k such that μ{u; |f_{n_k}(u) − f(u)| > ε} > δ for some ε > 0 and δ > 0. But this contradicts the assumption that the sequence {f_{n_k}(u)} contains a subsequence that converges almost uniformly to f(u). If μ(U) < ∞, we can replace almost uniform convergence in the statement of the corollary with convergence (mod μ).

Theorem 6. If μ(U) < ∞, if the sequences {f_n^{(k)}(u)}, for k = 1, 2, …, s, of finite (mod μ) functions converge (in μ-measure) as n → ∞ to functions g_k(u) = μ-lim f_n^{(k)}(u), and if φ(t_1, …, t_s) is an arbitrary continuous function of s variables t_j, where −∞ < t_j < ∞ for j = 1, …, s, then

    φ(g_1(u), g_2(u), …, g_s(u)) = μ-lim φ(f_n^{(1)}(u), f_n^{(2)}(u), …, f_n^{(s)}(u)) .
61
,
An arbitrary Proof. Let us set F (u) = q?( fnl)(u), sequence of indices n, contains a subsequence {nkT} such that the sequence { f kr(u)} converges n-almost-everywhere to g ,(u) (for j = It follows from the continuity of the function 1, , s) as n contains in turn (Atl, , t,) that an arbitrary subsequence , g,(u)) p-almosta subsequence that converges to F(u) = 9?(g1(u), everywhere as n --. oo . This, together with the above corollary,
yields the desired result. Corollary.
If p-lim ff(u) = flu) and p-lim g (u) = g(u),
then
p-lim (af (u) + /9gJu)) = a(u) + Qg(u) , p-lim ff(u)gn(u) = J (u) g(u) REMARK. It follows from the proof of Theorem 6 that the conclusion of that theorem remains in force even when the func, t,) is not necessarily continuous but has the property tion q,(t1, , g,(u)) e A} = 0 where A is the set of points of p{u; (g1(u), discontinuity of q. In particular, if p-lim g (u) = g(u) and g(u) # 0
(mod p), then the sequence {1/g,a(u)} converges in measure to 1/g(u).
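The corollary to Theorem 6 can be illustrated empirically. In the sketch below all functions are our invented examples and random sampling of [0, 1] stands in for Lebesgue measure; f_n → f and g_n → g uniformly, hence in measure, and the measure of the set where f_n g_n differs from fg by more than ε collapses to zero:

```python
import random
random.seed(0)

f = lambda u: u
g = lambda u: 1.0 + u * u

def f_n(u, n): return u + (-1) ** n / n            # converges uniformly to f
def g_n(u, n): return (1.0 + u * u) * (1 + 1 / n)  # converges uniformly to g

def meas_above(h, eps, samples=20_000):
    # Monte Carlo estimate of mu{u in [0,1] : |h(u)| > eps}
    pts = [random.random() for _ in range(samples)]
    return sum(1 for u in pts if abs(h(u)) > eps) / samples

eps = 0.05
for n in (2, 10, 100):
    d = meas_above(lambda u: f_n(u, n) * g_n(u, n) - f(u) * g(u), eps)
    print(n, d)   # drops to 0 once the perturbations are small enough
```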
4. INTEGRALS
In probability theory, we assign to a random variable ξ a specific number Mξ known as the mathematical expectation of the random variable ξ. If ξ assumes finitely many values x_1, x_2, …, x_n, the mathematical expectation is given by the formula

    Mξ = Σ_{i=1}^n x_i P{ξ = x_i} ,    (1)

and it enjoys the following properties:

    M(aξ + bη) = aMξ + bMη ,

and the inequality ξ ≤ η implies Mξ ≤ Mη.
We now define the mathematical expectation for more general cases. For an arbitrary random variable ξ, we construct a sequence of random variables ξ_n, each assuming finitely many values, that converge to ξ. We set Mξ = lim Mξ_n. This definition will be meaningful if (a) lim Mξ_n exists, (b) lim Mξ_n depends only on ξ and not on the particular choice of the sequence {ξ_n} of random variables approximating ξ, and (c) the extended definition of the mathematical expectation has "good" analytical properties. It has proven
impossible to extend the concept of mathematical expectation to all random variables and at the same time to satisfy these three conditions. However, we can do this for a rather broad class of random variables (for example, for all random variables that are bounded either above or below). Difficulties arise here because, in taking the limit in a sequence of sums of the form (1), the parts corresponding to the positive and the negative terms can both approach ±∞. Therefore it is expedient to consider first random variables that assume values of a single sign. When we shift from random variables to arbitrary functions defined on spaces with measure, the concept of mathematical expectation becomes the concept of an integral. The present section is devoted to carrying this out in the general case. We assume that some measurable space {U, 𝔖, μ} is fixed and that all the functions in question are 𝔖-measurable.
Let us first look at simple functions. Let f(u) denote a simple function defined on U and assuming the values c_1, c_2, …, c_n. The quantity

    ∫_U f(u) μ(du) = Σ_{k=1}^n c_k μ{u; f(u) = c_k}    (1')

is called the integral of the function f(u) over U. Let us note the simpler properties of integrals of simple functions:

    f(u) ≥ 0 implies ∫_U f(u) μ(du) ≥ 0 ;    (2)

    ∫_U kf(u) μ(du) = k ∫_U f(u) μ(du) , where k is an arbitrary constant ;    (3)

    ∫_U [f_1(u) + f_2(u)] μ(du) = ∫_U f_1(u) μ(du) + ∫_U f_2(u) μ(du) .    (4)
Only (4) needs to be proved. Suppose that f_i(u), for i = 0, 1, 2, assumes the values c_k^{(i)} (for k = 1, 2, …, m_i), that f_0(u) = f_1(u) + f_2(u), and that A_k^{(i)} = {u; f_i(u) = c_k^{(i)}}. Then

    ∪_{k=1}^{m_i} A_k^{(i)} = U ,

and either A_k^{(0)} ∩ A_r^{(1)} ∩ A_s^{(2)} = ∅ or c_k^{(0)} = c_r^{(1)} + c_s^{(2)}. Consequently,

    ∫_U f_0(u) μ(du) = Σ_{k=1}^{m_0} c_k^{(0)} μ(A_k^{(0)}) = Σ_{k=1}^{m_0} c_k^{(0)} Σ_{r,s} μ(A_k^{(0)} ∩ A_r^{(1)} ∩ A_s^{(2)})
    = Σ_{k,r,s} (c_r^{(1)} + c_s^{(2)}) μ(A_k^{(0)} ∩ A_r^{(1)} ∩ A_s^{(2)})
    = Σ_r c_r^{(1)} μ(A_r^{(1)}) + Σ_s c_s^{(2)} μ(A_s^{(2)})
    = ∫_U f_1(u) μ(du) + ∫_U f_2(u) μ(du) ,

which proves (4).
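For a measure concentrated on finitely many points, formula (1') is a finite sum and property (4) can be checked directly. A Python sketch (the measure and the functions are our example):

```python
# Integral of a simple function over a finite measure space, formula (1'),
# with a check of the linearity property (4).
mu = {'u1': 0.2, 'u2': 0.3, 'u3': 0.5}   # measure given by point masses

def integral(f):
    # sum over the distinct values c of f of c * mu{u : f(u) = c}
    values = set(f.values())
    return sum(c * sum(m for u, m in mu.items() if f[u] == c) for c in values)

f1 = {'u1': 1.0, 'u2': 1.0, 'u3': -2.0}
f2 = {'u1': 0.0, 'u2': 3.0, 'u3': 3.0}
f0 = {u: f1[u] + f2[u] for u in mu}      # the sum f1 + f2

assert abs(integral(f0) - (integral(f1) + integral(f2))) < 1e-12   # property (4)
print(integral(f1), integral(f2), integral(f0))
```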
Let us now define the integral of a simple function f(u) over an arbitrary set M ∈ 𝔖 by means of the formula

    ∫_M f(u) μ(du) = ∫_U f(u) χ_M(u) μ(du) ,    (5)

where χ_M(u) is the characteristic function of the set M. If M_1 and M_2 are 𝔖-measurable and have no points in common, then

    χ_{M_1∪M_2}(u) = χ_{M_1}(u) + χ_{M_2}(u) .

Consequently,

    ∫_{M_1∪M_2} f(u) μ(du) = ∫_{M_1} f(u) μ(du) + ∫_{M_2} f(u) μ(du) .    (6)

It follows from (2) and (4) that the inequality f(u) ≥ g(u) for u ∈ M implies

    ∫_M f(u) μ(du) ≥ ∫_M g(u) μ(du) .    (7)
Before generalizing the definition of an integral, let us prove:

Lemma 1. Let {g_n(u)}, for n = 1, 2, …, denote a nondecreasing sequence of nonnegative simple functions and suppose that lim g_n(u) ≥ h(u) for u ∈ M, where h(u) is a nonnegative simple function. Then

    lim_{n→∞} ∫_M g_n(u) μ(du) ≥ ∫_M h(u) μ(du) .    (8)

Proof. Let 0 ≤ h_1 < h_2 < … < h_q denote the values assumed by the function h(u). To prove inequality (8), let us confine ourselves to the case h_1 > 0. In the opposite case, it would suffice to prove inequality (8) for the set M\{u; h(u) = 0}. Let ε denote an arbitrary positive number and let Q_m, for each m = 1, 2, …, denote the set {u; u ∈ M, g_n(u) ≥ h(u) − ε for all n ≥ m}.
These Q_m constitute a nondecreasing sequence of 𝔖-measurable sets, and ∪_{m=1}^∞ Q_m = M. Suppose that μ(M) < ∞. Then lim μ(Q_m) = μ(M) and lim μ(M\Q_m) = 0. Furthermore, for n ≥ m,

    ∫_M g_n(u) μ(du) ≥ ∫_{Q_m} g_n(u) μ(du) ≥ ∫_{Q_m} (h(u) − ε) μ(du) ≥ ∫_{Q_m} h(u) μ(du) − εμ(Q_m) ≥ ∫_M h(u) μ(du) − εμ(M) − h_q μ(M\Q_m) .

If we let ε approach 0 and m approach ∞, this inequality becomes inequality (8). If μ(M) = ∞, let us take ε = h_1/2. Then for n ≥ m,

    ∫_M g_n(u) μ(du) ≥ (h_1/2) μ(Q_m) → ∞ as m → ∞ ,

which again yields (8).
Lemma 2. Let {f_n(u)} and {g_n(u)} denote two nondecreasing sequences of nonnegative simple functions and suppose that

lim_{n→∞} f_n(u) = lim_{n→∞} g_n(u)  for u ∈ M ∈ 𝔖.

Then

lim_{n→∞} ∫_M f_n(u)μ(du) = lim_{n→∞} ∫_M g_n(u)μ(du).

Proof. It follows from the hypothesis of the lemma that lim_{n→∞} f_n(u) ≥ g_m(u) for each m. On the basis of Lemma 1,

lim_{n→∞} ∫_M f_n(u)μ(du) ≥ ∫_M g_m(u)μ(du).

If we now let m approach ∞, we obtain

lim_{n→∞} ∫_M f_n(u)μ(du) ≥ lim_{m→∞} ∫_M g_m(u)μ(du).

By reversing the roles of the sequences {f_n(u)} and {g_n(u)} in this last inequality, we obtain the desired result.

Let us now define an integral in the general case.
Let f(u) denote an arbitrary 𝔖-measurable function on M. We define

f⁺(u) = max{f(u), 0},  f⁻(u) = −min{f(u), 0},  u ∈ M.

Then

f(u) = f⁺(u) − f⁻(u),  (9)

and the functions f⁺ and f⁻ are nonnegative and 𝔖-measurable.
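The decomposition (9) can be sketched in a few lines of code (the function chosen here is an arbitrary illustration, not from the text); note that the two parts are nonnegative and that f⁺ + f⁻ = |f|:

```python
# Sketch of the decomposition f = f_plus - f_minus used in formula (9).

def f_plus(f):
    return lambda u: max(f(u), 0.0)

def f_minus(f):
    return lambda u: -min(f(u), 0.0)

f = lambda u: u**3 - u            # an arbitrary function on the reals
fp, fm = f_plus(f), f_minus(f)

for u in (-2.0, -0.5, 0.0, 0.5, 2.0):
    assert fp(u) >= 0.0 and fm(u) >= 0.0          # both parts nonnegative
    assert abs(fp(u) - fm(u) - f(u)) < 1e-12      # f = f_plus - f_minus
    assert abs(fp(u) + fm(u) - abs(f(u))) < 1e-12 # |f| = f_plus + f_minus
```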
Definition 1. The quantity

∫_M f(u)μ(du) = lim_{n→∞} ∫_M f_n(u)μ(du),  (10)

where {f_n(u)} is an arbitrary nondecreasing sequence of nonnegative simple functions that converges to f(u), is called the integral of the 𝔖-measurable nonnegative function f(u) over the set M.

If f(u) is an arbitrary 𝔖-measurable function and if at least one of the integrals

∫_M f⁺(u)μ(du),  ∫_M f⁻(u)μ(du)  (11)

is finite, the integral of the function f(u) is defined by

∫_M f(u)μ(du) = ∫_M f⁺(u)μ(du) − ∫_M f⁻(u)μ(du).  (12)

If both integrals (11) are finite, the integral of the function f is also finite and the function f is said to be integrable over the set M.
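Definition 1 can be illustrated numerically with the classical dyadic approximating sequence f_n(u) = min(⌊2ⁿ f(u)⌋/2ⁿ, n). The discrete measure below (mass 1/N at N grid points of [0,1)) is a hypothetical stand-in for Lebesgue measure, chosen only to make the sketch runnable:

```python
# Approximate a nonnegative measurable f from below by the nondecreasing
# simple functions f_n(u) = min(floor(2^n f(u)) / 2^n, n), as in (10).
import math

def simple_approx(f, n):
    return lambda u: min(math.floor(f(u) * 2**n) / 2**n, float(n))

N = 10_000
points = [(k + 0.5) / N for k in range(N)]   # grid standing in for [0,1)

def integral(g):
    return sum(g(u) for u in points) / N

f = lambda u: u * u
vals = [integral(simple_approx(f, n)) for n in range(1, 12)]

# the integrals are nondecreasing in n and approach integral(f) ~ 1/3
assert all(a <= b + 1e-15 for a, b in zip(vals, vals[1:]))
assert abs(vals[-1] - integral(f)) < 1e-3
```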
This definition is meaningful: On the basis of Lemma 2, the definition of an integral of a nonnegative function is independent of the particular choice of approximating sequence in formula (10). In particular, for a nonnegative simple function the definition of the integral (10) coincides with the original definition. Furthermore,
for an arbitrary simple function, the functions f⁺ and f⁻ corresponding to it are simple, and on the basis of property (4), which asserts the linearity of the integral of a simple function, formula (12) for simple functions gives the originally defined value of the integral.

Let us now look at the basic properties of an integral.

Theorem 1. If μ(M) = 0 and f is an arbitrary function defined on M, then

∫_M f(u)μ(du) = 0.  (13)

This follows immediately from formulas (12), (10), and (1′).

Theorem 2.

∫_{M_1∪M_2} f(u)μ(du) = ∫_{M_1} f(u)μ(du) + ∫_{M_2} f(u)μ(du),  (14)

where M_1 ∩ M_2 = ∅, where M_1 and M_2 belong to 𝔖, and where one side or the other of the equation is meaningful.

It suffices to consider (14) for nonnegative functions. But for these functions, equation (14) follows from the definition of an integral and the fact that this equation holds for simple functions.
Theorem 3.

∫_M a f(u)μ(du) = a ∫_M f(u)μ(du).  (15)

Equation (15) is valid for simple functions. By taking limits, we extend it to arbitrary nonnegative functions, and by virtue of

(af)⁺ = a f⁺, (af)⁻ = a f⁻ for a ≥ 0;  (af)⁺ = −a f⁻, (af)⁻ = −a f⁺ for a < 0,

and formula (12), the result carries over to arbitrary 𝔖-measurable
functions whose integrals over M are meaningful.

Theorem 4.

∫_M [f(u) + g(u)]μ(du) = ∫_M f(u)μ(du) + ∫_M g(u)μ(du),  (16)

if the integrals on the right are not infinite and of opposite signs.

If the functions f(u) and g(u) are nonnegative on M, then (16) is valid because this equation holds for simple functions. The case in which f(u) and g(u) are both nonpositive reduces to this one by means of (15) with a = −1. Suppose now that f(u) ≥ 0 and g(u) ≤ 0 everywhere on M, that f(u) + g(u) ≥ 0 on M_1 ⊂ M, and that f(u) + g(u) < 0 on M_2 ⊂ M, where M_1 ∪ M_2 = M. Then, setting f(u) + g(u) = h(u), we obtain f(u) = h(u) + (−g(u)), so that

∫_{M_1} f(u)μ(du) = ∫_{M_1} h(u)μ(du) + ∫_{M_1} (−g(u))μ(du),

that is,

∫_{M_1} [f(u) + g(u)]μ(du) = ∫_{M_1} f(u)μ(du) + ∫_{M_1} g(u)μ(du),

if the right-hand member of this equation is not an indeterminate form of the type ∞ − ∞. Analogously, −g(u) = f(u) + (−h(u)) on M_2, and therefore

∫_{M_2} (−g(u))μ(du) = ∫_{M_2} f(u)μ(du) + ∫_{M_2} (−h(u))μ(du),

so that again

∫_{M_2} [f(u) + g(u)]μ(du) = ∫_{M_2} f(u)μ(du) + ∫_{M_2} g(u)μ(du),

if the right-hand member of this equation is not an indeterminate form of the type ∞ − ∞. By combining the integrals over the sets M_1 and M_2, we obtain formula (16), by virtue of (14).

Now let f(u) and g(u) denote any two functions defined on M. Let us set

M_1 = {u; f(u) ≥ 0, g(u) ≥ 0},  M_2 = {u; f(u) < 0, g(u) < 0},
M_3 = {u; f(u) ≥ 0, g(u) < 0},  M_4 = {u; f(u) < 0, g(u) ≥ 0}.

Then M = ∪_{i=1}^{4} M_i, where M_i ∩ M_j = ∅ for i ≠ j, and formula (16) has been proved for each of the sets M_i, i = 1, 2, 3, 4. If we now replace M in formula (16) with the set M_i for i = 1, 2, 3, 4 and add the equations obtained, we get, in view of (14), equation (16) in the general case.
Theorem 5. If f(u) and g(u) are equivalent on M, then

∫_M f(u)μ(du) = ∫_M g(u)μ(du).

Indeed,

∫_M f(u)μ(du) = ∫_M g(u)μ(du) + ∫_M [f(u) − g(u)]μ(du),

and

∫_M [f(u) − g(u)]μ(du) = 0

(by virtue of Theorem 1, applied to the set on which f and g differ, which has μ-measure 0).

Theorem 6. If f(u) ≥ g(u) for u ∈ M, we have

∫_M f(u)μ(du) ≥ ∫_M g(u)μ(du)  (17)
if both integrals exist.

Proof. The function φ(u) = f(u) − g(u) is nonnegative for u ∈ M. It follows from the definition of an integral that ∫_M φ(u)μ(du) ≥ 0, and hence

∫_M f(u)μ(du) − ∫_M g(u)μ(du) ≥ 0,

if the difference on the left is defined. On the other hand, if the integrals are infinite but of the same sign, (17) is obvious.

Theorem 7. If

∫_M f(u)μ(du) < +∞,

then μ{u; u ∈ M, f(u) = +∞} = 0. In particular, if the function f(u) is integrable, it is finite almost everywhere.
Proof. We have

∫_M f⁺(u)μ(du) ≥ ∫_M h(u)μ(du),

where h(u) = 0 if f⁺(u) < +∞ and h(u) = h if f⁺(u) = +∞, h being an arbitrary positive constant. Consequently,

∫_M f⁺(u)μ(du) ≥ h μ{u; u ∈ M, f(u) = +∞},

which can remain bounded as h → ∞ only if f(u) < +∞ almost everywhere.

Theorem 8. If f(u) is integrable on M, then |f(u)| is also integrable on M.

Proof. If f(u) is integrable on M, then

∫_M f⁺(u)μ(du) < ∞,  ∫_M f⁻(u)μ(du) < ∞,

and since |f(u)| = f⁺(u) + f⁻(u), the integrability of |f(u)| follows from Theorem 4.
lim_{n→∞} g_{kn}(u) = f_k(u). Consequently,

lim_{n→∞} h_n(u) ≥ lim_{k→∞} f_k(u) = f(u).

Comparing this with (1), we obtain

f(u) = lim_{n→∞} h_n(u).

It follows from the definition of an integral and formula (1) that

∫_U f(u)μ(du) = lim_{n→∞} ∫_U h_n(u)μ(du) ≤ lim_{n→∞} ∫_U f_n(u)μ(du).

The inequality in the opposite direction follows from the fact that f_n(u) ≤ f(u).

Corollary. If f(u) is integrable on U, then for every ε > 0 there exists a δ > 0 such that the inequality μ(A) < δ implies

∫_A |f(u)|μ(du) < ε.  (11)
Proof. Since integrability of the function f(u) implies integrability of |f(u)|, it will be sufficient to prove the corollary for nonnegative integrable functions. We note first of all that μ{u; f(u) = ∞} = 0. Therefore,

lim_{N→∞} μ{u; f(u) > N} = μ{u; f(u) = ∞} = 0

(cf. Theorem 1, Section 4). Consequently, for some N_0,

∫_{{u; f(u) > N_0}} f(u)μ(du) < ε/2.

Furthermore, for arbitrary N,

∫_A f(u)μ(du) ≤ Nμ(A) + ∫_{{u; f(u) > N}} f(u)μ(du).

Taking N = N_0 and μ(A) < δ = ε/2N_0, we obtain the desired result.

Theorem 5. Let {f_n(u)} denote a sequence of measurable functions that converges in measure to a function f(u) on U. Suppose that |f_n(u)| ≤ s(u) (mod μ), n = 1, 2, ..., where s(u) is an integrable function.
Then

lim_{n→∞} ∫_U f_n(u)μ(du) = ∫_U f(u)μ(du).

Proof. In accordance with Theorem 4 of Section 3, an arbitrary subsequence {f_{n′}(u)} contains a further subsequence {f_{n_k}(u)} that converges almost uniformly to f(u). It follows from Theorem 3 that

lim_{k→∞} ∫_U f_{n_k}(u)μ(du) = ∫_U f(u)μ(du).

Thus the bounded sequence of integrals ∫_U f_n(u)μ(du) has the unique accumulation point ∫_U f(u)μ(du). This completes the proof of the theorem.
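Theorem 5 (dominated convergence) can be watched numerically. The setup below is a hypothetical discrete stand-in for Lebesgue measure on [0,1): f_n(u) = uⁿ tends to 0 in measure, is dominated by the integrable constant 1, and its integrals (≈ 1/(n+1)) tend to 0:

```python
# Numerical illustration of dominated convergence: integrals of u**n on [0,1)
# (mimicked by a uniform discrete measure) decrease to 0 as n grows.

N = 100_000
points = [(k + 0.5) / N for k in range(N)]

def integral(g):
    return sum(g(u) for u in points) / N

ints = [integral(lambda u, n=n: u**n) for n in (1, 2, 4, 8, 16, 32)]
assert all(a > b for a, b in zip(ints, ints[1:]))   # strictly decreasing
assert ints[-1] < 0.05                              # ~ 1/33
```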
We present one more limit theorem. Here the requirement that an integrable majorant exist is replaced with another condition whose satisfaction is in many problems easier to verify.

Theorem 6. If a sequence {f_n(u)} converges in measure to a function f(u) for u ∈ U, where μ(U) < ∞, and if for some p > 1,

∫_U |f_n(u)|^p μ(du) ≤ c,  n = 1, 2, ...,

then

lim_{n→∞} ∫_U f_n(u)μ(du) = ∫_U f(u)μ(du).

Proof. Let ε denote an arbitrary positive number. Define

A_n(ε) = {u; |f(u) − f_n(u)| > ε}.

Then for N > 1/ε,

|∫_U f(u)μ(du) − ∫_U f_n(u)μ(du)| ≤ ∫_U |f(u) − f_n(u)|μ(du)
  = ∫_{U∖A_n(ε)} |f − f_n| dμ + ∫_{A_n(ε)∖A_n(N)} |f − f_n| dμ + ∫_{A_n(N)} |f − f_n| dμ
  ≤ εμ(U) + Nμ(A_n(ε)) + (1/N^{p−1}) ∫_{A_n(N)} |f(u) − f_n(u)|^p μ(du).

It follows from the inequality

((a + b)/2)^p ≤ (|a|^p + |b|^p)/2  for p ≥ 1

that

∫_{A_n(N)} |f(u) − f_n(u)|^p μ(du) ≤ 2^{p−1} ∫_{A_n(N)} [|f(u)|^p + |f_n(u)|^p] μ(du)
  ≤ 2^{p−1} ∫_U [|f(u)|^p + |f_n(u)|^p] μ(du).

On the other hand, since f(u) = μ-lim f_n(u), there exists a subsequence {f_{n_k}(u)} that converges to f(u) (mod μ). By virtue of Fatou's lemma and the hypothesis of the theorem,

∫_U |f(u)|^p μ(du) = ∫_U lim_{k→∞} |f_{n_k}(u)|^p μ(du) ≤ lim_{k→∞} ∫_U |f_{n_k}(u)|^p μ(du) ≤ c.

Thus

lim_{n→∞} |∫_U f(u)μ(du) − ∫_U f_n(u)μ(du)| ≤ εμ(U) + N lim_{n→∞} μ(A_n(ε)) + 2^p c/N^{p−1} = εμ(U) + 2^p c/N^{p−1}.
Since ε is an arbitrary positive number and N may be taken arbitrarily large, the proof is complete.

Let us now look at complex-valued functions f(u) defined on U. The function f(u) = f_1(u) + i f_2(u), where f_1(u) and f_2(u) are real, is said to be 𝔖-measurable if f_1(u) and f_2(u) are 𝔖-measurable. The function f(u) is said to be integrable if f_1(u) and f_2(u) are integrable. The integral of the function f(u) over U is defined by the equation

∫_U f(u)μ(du) = ∫_U f_1(u)μ(du) + i ∫_U f_2(u)μ(du).
The properties of an integral that were established for real functions are easily carried over to complex-valued functions. Through the end of the present section, equivalent functions are regarded as equal, so that the word "function" actually means an entire class of functions that are equivalent to each other.

Let L_p = L_p{U, 𝔖, μ} (for p ≥ 1) denote the class of all 𝔖-measurable functions defined on U, taking values in the set of complex numbers, and satisfying the inequality

∫_U |f(u)|^p μ(du) < ∞.

Minkowski's inequality, which is proved for integrals in abstract spaces in the same way as for ordinary spaces (see Hardy, Littlewood, and Polya; Heider & Simpson),

{∫_U |f(u) + g(u)|^p μ(du)}^{1/p} ≤ {∫_U |f(u)|^p μ(du)}^{1/p} + {∫_U |g(u)|^p μ(du)}^{1/p},

implies that the sum of two functions belonging to L_p also belongs to L_p. Obviously if f belongs to L_p, so does af, where a is a complex number. Thus L_p is a linear space. Defining the norm of an element f of L_p by

‖f‖ = {∫_U |f(u)|^p μ(du)}^{1/p},

we make L_p a normed space. Obviously

‖af‖ = |a| ‖f‖,  ‖f + g‖ ≤ ‖f‖ + ‖g‖,

so that the axioms of a normed space are satisfied. The distance between two functions f, g ∈ L_p is then defined by

ρ(f, g) = ‖f − g‖ = {∫_U |f(u) − g(u)|^p μ(du)}^{1/p}.

If a sequence {f_n(u)} of functions in L_p converges to f(u) in the sense of convergence in L_p, that is, if ‖f − f_n‖ → 0 as n → ∞, we say that {f_n(u)} converges in mean of order p to f(u). A sequence {f_n(u)} is said to be fundamental in L_p if ‖f_n − f_{n′}‖ → 0 as n, n′ → ∞.

It follows from the inequality

ε^p μ{u; |f(u) − g(u)| > ε} ≤ ‖f − g‖^p

that convergence in L_p implies convergence in measure.
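Minkowski's inequality, and hence the triangle inequality for the L_p norm, can be spot-checked on a finite measure space (the weights and values below are arbitrary illustrations):

```python
# Check of ||f+g||_p <= ||f||_p + ||g||_p for a finite weighted measure,
# where the L_p norm is (sum w_i |f_i|^p)^(1/p).

def lp_norm(f, w, p):
    return sum(wi * abs(fi) ** p for fi, wi in zip(f, w)) ** (1.0 / p)

w = [0.2, 0.5, 1.0, 0.3]                     # masses of a finite measure
f = [1.0, -2.0, 0.5, 3.0]
g = [-1.5, 0.25, 2.0, -0.5]
h = [fi + gi for fi, gi in zip(f, g)]        # f + g

for p in (1, 1.5, 2, 3):
    assert lp_norm(h, w, p) <= lp_norm(f, w, p) + lp_norm(g, w, p) + 1e-12
```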
for every ε > 0, there exists a δ > 0 such that the inequality μ(A) < δ implies the inequality ν(A) < ε. The sufficiency of this condition is obvious; the necessity follows from a property of the integral (cf. the corollary to Theorem 4 of Section 5).
6. ABSOLUTE CONTINUITY OF MEASURES. MAPPINGS
Theorem 2. If ν and μ are finite measures and …

λ(A) ≥ Σ_k λ(A ∩ E_k) + λ(A ∖ ∪_k E_k).

Since the inequality in the opposite direction always holds because of the subadditivity of the outer measure, the assertion is proved.

Lemma 8. Every subset N′ of a set N of outer measure zero (λ(N) = 0) has outer measure zero and is measurable.

Proof. If N′ ⊂ N and λ(N) = 0, then λ(N′) = 0 because of the subadditivity. Furthermore,

λ(A∖N′) ≤ λ(A) ≤ λ(A ∩ N′) + λ(A∖N′),  λ(A ∩ N′) = 0,

and consequently,

λ(A) = λ(A ∩ N′) + λ(A∖N′) = λ(A∖N′),

which proves the measurability of the set N′.
Theorem 1 is an immediate consequence of Lemmas 6-8.

Theorem 1 shows how we can use the outer measure of sets to construct a measure. However, the possibility is not ruled out that the σ-algebra of the sets that are measurable with respect to this measure is trivial, that is, that it consists only of the empty set and the entire space U. On the other hand, in most cases we are interested not in arbitrary measures but in measures that coincide with a given set function on some class of sets 𝔐 that does not constitute a σ-algebra. For example, let us look at a particular problem characterizing the statement of the question in the general case. Let F(x) denote the distribution function of some random variable ξ. We wish to construct a probability space {U, 𝔖, P} in which U is the real line (−∞, ∞), 𝔖 is the complete σ-algebra of sets containing all Borel subsets of the real line, and P is a probability measure on 𝔖 that is consistent with the given distribution function F(x), that is, such that

P{ξ < x} = P{(−∞, x)} = F(x).

The distribution function F(x) determines the probability that the value of the random variable ξ will fall in the left-closed right-open interval [a, b):

P{a ≤ ξ < b} = F(b) − F(a).

…

Letting ε approach 0, we obtain the desired result. Let us call the set function {λ, U} defined by equations (8) and (9) the Lebesgue outer measure corresponding to {m, 𝔐}. On the σ-algebra ℒ of all λ-measurable (in the sense of Caratheodory) sets, λ is a measure. Let us agree to call it the Lebesgue measure in the present section. We now consider the question: When is {λ, ℒ} the extension of a set function {m, 𝔐}? We note that if {m, 𝔐} has an extension as a measure, m must be an additive function on 𝔐.
In many cases the class of sets 𝔐 has the following structure:

Definition 4. A class of sets 𝔐 is said to be decomposable if for arbitrary A_1 and A_2 in 𝔐,

A_1 ∩ A_2 ∈ 𝔐,  A_1∖A_2 = ∪_{k=1}^{s} A_k*, where A_k* ∈ 𝔐, A_k* ∩ A_r* = ∅ for k ≠ r.
REMARK 1. The concept of a decomposable class of sets is close to the concept of an algebra of sets. Specifically, the class of sets of the form

B = (∪_{k=1}^{r} A_k) ∩ A,

where r is an arbitrary integer and the A_k are arbitrary sets in 𝔐, is an algebra of subsets of A.

Definition 5. A nonnegative additive set function, not identically +∞, that is defined on a decomposable class of sets 𝔐 is called an elementary measure.
Theorem 2. For the Lebesgue measure {λ, ℒ} constructed from an elementary measure {m, 𝔐} to be its extension, it is necessary and sufficient that for arbitrary A ∈ 𝔐 the relations ∪_{k=1}^∞ A_k ⊃ A, where each A_k ∈ 𝔐, imply

m(A) ≤ Σ_{k=1}^∞ m(A_k).  (10)

Proof of the Necessity. If {λ, ℒ} is the extension of {m, 𝔐}, then λ(A) = m(A) and inequality (10) follows from Lemma 9.

Proof of the Sufficiency. It follows from the definition of the outer measure that λ(A) ≤ m(A) if A ∈ 𝔐. On the other hand, if condition (10) is satisfied, then

λ(A) = inf Σ_{k=1}^∞ m(A_k) ≥ m(A),

from which we get

λ(A) = m(A) for every A ∈ 𝔐.  (11)

We must now show that all the A ∈ 𝔐 are measurable. From the subadditivity of the outer measure, we have for arbitrary Δ* ∈ 𝔐

λ(Δ*) ≤ λ(Δ* ∩ A) + λ(Δ*∖A).  (12)

Since Δ* ∩ A = Δ_0 and Δ*∖A = ∪_{k=1}^{s} Δ_k, where Δ_j ∈ 𝔐 (for j = 0, 1, ..., s), and since we may assume that the Δ_j are disjoint, it follows that

λ(Δ*) = m(Δ*) = Σ_{j=0}^{s} m(Δ_j) ≥ λ(Δ* ∩ A) + Σ_{k=1}^{s} m(Δ_k) ≥ λ(Δ* ∩ A) + λ(Δ*∖A).  (13)
Comparing (12) and (13), we obtain

λ(Δ*) = λ(Δ* ∩ A) + λ(Δ*∖A).  (14)

Let Δ denote an arbitrary subset of U. If λ(Δ) = ∞, we have

λ(Δ) ≥ λ(Δ ∩ A) + λ(Δ∖A).

On the other hand, if λ(Δ) < ∞, there exists a sequence of sets Δ_k ∈ 𝔐 such that Δ ⊂ ∪_{k=1}^∞ Δ_k and λ(Δ) + ε ≥ Σ_k λ(Δ_k), for ε > 0. On the basis of (14) we have

λ(Δ_k) = λ(Δ_k ∩ A) + λ(Δ_k∖A),

from which we get

λ(Δ) + ε ≥ Σ_{k=1}^∞ λ(Δ_k ∩ A) + Σ_{k=1}^∞ λ(Δ_k∖A) ≥ λ(Δ ∩ A) + λ(Δ∖A),

since

Δ ∩ A ⊂ ∪_{k=1}^∞ (Δ_k ∩ A) and Δ∖A ⊂ ∪_{k=1}^∞ (Δ_k∖A).

Thus, since ε is arbitrary, we have in all cases

λ(Δ) ≥ λ(Δ ∩ A) + λ(Δ∖A).

The inequality in the opposite direction follows from the subadditivity of the outer measure. Consequently, for arbitrary Δ ⊂ U and arbitrary A ∈ 𝔐,

λ(Δ) = λ(Δ ∩ A) + λ(Δ∖A),

that is, A is measurable. Thus 𝔐 ⊂ ℒ, and m(A) = λ(A) on 𝔐; that is, the Lebesgue measure {λ, ℒ} is the extension of the elementary measure {m, 𝔐}.
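Condition (10) for the basic elementary measure m([a,b)) = b − a can be probed numerically: any finite family of left-closed intervals covering [a,b) has total length at least b − a. The randomized cover construction below is purely illustrative:

```python
# Randomized check of subadditivity (10) for m([a,b)) = b - a on [0,1):
# if intervals [l_k, r_k) cover [0,1), then sum (r_k - l_k) >= 1.
import random

random.seed(0)
for _ in range(100):
    cover, left = [], 0.0
    while left < 1.0:
        right = min(left + random.uniform(0.05, 0.3), 1.2)
        start = max(0.0, left - random.uniform(0.0, 0.05))  # allow overlap
        cover.append((start, right))
        left = right
    total = sum(r - l for l, r in cover)
    assert total >= 1.0 - 1e-12      # covers of [0,1) have total length >= 1
```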
In connection with the solution obtained in Theorem 2 to the problem of the extension of an additive set function {m, 𝔐}, the question naturally arises: Is it possible to extend an elementary measure {m, 𝔐} in another manner and obtain a measure distinct from the Lebesgue measure {λ, ℒ}? In a certain sense the answer to this question is negative. We find that the extension of an elementary measure to the smallest σ-algebra σ{𝔐} containing 𝔐 is unique if it exists. We shall say that an elementary measure {m, 𝔐} is σ-finite if U can be represented as the union of countably many elementary sets, each of finite measure.

Theorem 3. The extension of a σ-finite elementary measure {m, 𝔐} as a measure {μ, σ{𝔐}} is unique.

Proof. Let Δ denote an arbitrary fixed set in 𝔐 such that m(Δ) < ∞. Let 𝔄_0 denote the class of sets of the form
B = ∪_{k=1}^∞ (A_k ∩ Δ),  A_k ∈ 𝔐, A_k ∩ A_j = ∅ for k ≠ j.

Then 𝔄_0 is an algebra of subsets of Δ. If {μ_1, σ{𝔐}} and {μ_2, σ{𝔐}} are two extensions of the elementary measure {m, 𝔐}, then by virtue of the additivity of a measure,

μ_1(B) = Σ_{k=1}^∞ m(A_k ∩ Δ) = μ_2(B).

Let 𝔎 denote the class of sets C for which μ_1(C) = μ_2(C). The class 𝔎 is a monotonic class, and 𝔎 ⊃ 𝔄_0; consequently, 𝔎 ⊃ σ{𝔄_0}. Now let E denote an arbitrary set belonging to σ{𝔐}. The set E is covered by a countable disjoint collection of sets Δ_i ∈ 𝔐, each of finite m-measure. Consequently, for arbitrary i we have μ_1(E ∩ Δ_i) = μ_2(E ∩ Δ_i), so that μ_1(E) = μ_2(E).

We shall now show the relation between the extension of an elementary measure on σ{𝔐} and the σ-algebra ℒ of all measurable sets.
Let 𝔐_σ (resp. 𝔐_δ) denote the class of sets that can be represented as the union (resp. intersection) of a countable collection of sets belonging to 𝔐, and let 𝔐_σδ (resp. 𝔐_δσ) denote the class of sets that can be represented as the intersection (resp. union) of a countable collection of sets belonging to 𝔐_σ (resp. 𝔐_δ). The classes 𝔐_σ, 𝔐_δ, 𝔐_σδ, 𝔐_δσ all belong to σ{𝔐}.

Theorem 4. If {λ, ℒ} is the Lebesgue measure that is the extension of a σ-finite elementary measure {m, 𝔐} satisfying condition (10), then for every set A ∈ ℒ there exists a measurable set E in 𝔐_σδ such that A ⊂ E and λ(A) = λ(E).

Proof.
Since

λ(A) = inf_{∪Δ_k ⊃ A} Σ_k m(Δ_k) = inf_{∪Δ_k ⊃ A} Σ_k λ(Δ_k),

there exists a sequence {Δ_k^(n)} of coverings of the set A (that is, ∪_{k=1}^∞ Δ_k^(n) ⊃ A for each n) such that

λ(A) = lim_{n→∞} Σ_k λ(Δ_k^(n)).

Suppose that E^(n) = ∪_k Δ_k^(n) and Ẽ^(n) = ∩_{m=1}^{n} E^(m). Then Ẽ^(n) ⊃ A and

λ(A) = lim_{n→∞} λ(Ẽ^(n)) = λ(E),

where E = ∩_{n=1}^∞ E^(n) is the limit of the decreasing sequence of sets Ẽ^(n) and E ∈ 𝔐_σδ. This completes the proof of the theorem.
Let 𝔑 denote the class of all subsets of sets in σ{𝔐} that are of μ-measure 0, where μ is some measure defined on σ{𝔐}:

𝔑 = {N; N ⊂ E, E ∈ σ{𝔐}, μ(E) = 0}.

Let 𝔖̄ denote the class of all sets of the form A = (E ∪ N′)∖N″, where E ∈ σ{𝔐}, N′ ∈ 𝔑, and N″ ∈ 𝔑. In short, 𝔖̄ is the class of sets that differ from the sets in σ{𝔐} by a subset of μ-measure 0. Let us set μ̄(A) = μ(E). This definition is unambiguous: If A = (E_1 ∪ N_1′)∖N_1″ = (E_2 ∪ N_2′)∖N_2″, then E_1∖E_2 ⊂ N_1″ ∪ N_2′ and E_2∖E_1 ⊂ N_1′ ∪ N_2″. Since E_1∖E_2 and E_2∖E_1 belong to σ{𝔐}, we have μ(E_2∖E_1) = μ(E_1∖E_2) = 0, so that μ̄(E_2) = μ̄(E_1). One can easily show that 𝔖̄ is a σ-algebra and that μ̄ is a measure on it.

Definition 6. The measure {μ̄, 𝔖̄} is called the completion of the measure {μ, σ{𝔐}}.
Theorem 5. If a σ-finite elementary measure {m, 𝔐} satisfies condition (10), then the completion {λ̄, 𝔖̄} of its extension {λ, σ{𝔐}} onto the minimal σ-algebra coincides with the Lebesgue measure {λ, ℒ}.

Proof. Since the σ-algebra ℒ is λ-complete, it follows that for E ∈ σ{𝔐} and N′, N″ ∈ 𝔑, the set A = (E ∪ N′)∖N″ belongs to ℒ. In other words, 𝔖̄ ⊂ ℒ. Suppose now that A is an arbitrary set of finite measure in ℒ. Then (in accordance with Theorem 4) there exists an E ⊃ A belonging to σ{𝔐} such that λ(E∖A) = 0 and λ(A) = λ(E). Let us set E∖A = F. Then there exists an E′ ⊃ F belonging to σ{𝔐} such that λ(E′) = λ(F) = 0. Thus

A = E∖F,  F ⊂ E′,  E′ ∈ σ{𝔐},  λ(E′) = 0,

that is, F ∈ 𝔑. In the general case, in which A is an arbitrary set belonging to ℒ, we apply the last relation to the sets A ∩ Δ_k. This completes the proof of the theorem.

These last theorems show that, in a certain sense, measurable sets do not differ greatly from sets belonging to the minimal σ-algebra σ{𝔐}. Specifically, a measurable set A differs from some E ∈ σ{𝔐} by a subset of a set in σ{𝔐} that is of measure 0. The following result is a particular case, but it will be used in what follows.
Let us first prove the theorem for the particular case
of A = U. Let {xn} denote a countable everywhere-dense set in
MEASURE THEORY
90
, let {Sk(xn)} denote the sequence of U. For each n = 1, 2, closed spheres of radius 1/k (for k = 1, 2, ) with center at the point xn. For each k let us choose nk so that ,u(U1 sk(in)' > P(U)
- 2k
Define nk
Sk = U Sk(in) . n=1
The set Sk is closed and admits a finite (1/k)-net.
Let us set
KE=fls,. k=1
Then K. is closed and it admits a finite (1/k)-net for arbitrary integral k; that is, K, is compact. On the other hand, It(U\KE) < i I- (U\Sk) k=1
Thus the assertion of the theorem holds when A coincides with the entire space U. Furthermore, an arbitrary closed set F contained in a complete separable metric space is itself complete and separable. Consequently, the theorem holds for an arbitrary closed set A = F.

Let 𝔎 denote the class of sets B that can be represented simultaneously in the form

B = ∪_{n=1}^∞ F_n = ∩_{n=1}^∞ G_n,

where the F_n are closed and the G_n are open. The class 𝔎 contains all closed sets and is an algebra of sets; therefore σ{𝔎} = 𝔅. Let 𝔎* denote the class of sets for which the assertion of the theorem is valid. Let {B_n}, for n = 1, 2, ..., denote a nondecreasing sequence of sets belonging to 𝔎*. Define

B_0 = ∪_{n=1}^∞ B_n.

Let {K_n} denote a sequence of compact sets such that K_n ⊂ B_n and μ(B_n∖K_n) < ε/2^{n+1}, and choose n_0 so that μ(B_0∖B_{n_0}) < ε/2. Then

μ(B_0 ∖ ∪_{n=1}^{n_0} K_n) ≤ μ(B_0∖B_{n_0}) + Σ_{n=1}^{n_0} μ(B_n∖K_n) < ε,

and ∪_{n=1}^{n_0} K_n is compact, so that B_0 ∈ 𝔎*.
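For a finite measure on the real line, the compact set of Theorem 6 can be found by trimming mass from the tails. The discrete measure and the greedy trimming routine below are hypothetical illustrations of the idea, not the book's construction:

```python
# For a finite discrete measure on R, find a compact interval K = [lo, hi]
# whose complement carries mass less than eps, by trimming light tail atoms.

def compact_core(atoms, eps):
    """atoms: list of (point, mass); returns (interval, kept mass, total)."""
    total = sum(m for _, m in atoms)
    pts = sorted(atoms)
    lo, hi, outside = 0, len(pts) - 1, 0.0
    while lo < hi:
        cand = min(pts[lo][1], pts[hi][1])
        if outside + cand >= eps:
            break
        if pts[lo][1] <= pts[hi][1]:
            outside += pts[lo][1]; lo += 1
        else:
            outside += pts[hi][1]; hi -= 1
    kept = sum(m for _, m in pts[lo:hi + 1])
    return (pts[lo][0], pts[hi][0]), kept, total

atoms = [(float(k), 2.0 ** -(k + 1)) for k in range(20)]  # masses 1/2, 1/4, ...
interval, kept, total = compact_core(atoms, eps=0.01)
assert total - kept < 0.01      # mass outside the compact core is < eps
```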
n′ > n_0 and n > n_0 together imply the inequality

P{|ξ_{n′} − ξ_n| > δ} < ε.

This condition is called the Cauchy condition with respect to convergence in probability of the sequence {ξ_n} (cf. Theorem 5, Section 3, Chapter II).
If P-lim ξ_j^(n) = η_j for j = 1, 2, ..., s, then for an arbitrary continuous function φ(t_1, t_2, ..., t_s), where −∞ < t_j < ∞ for j = 1, 2, ..., s, the sequence of random variables φ(ξ_1^(n), ..., ξ_s^(n)) converges in probability to the random variable φ(η_1, ..., η_s) (cf. Theorem 6, Section 3, Chapter II).

Definition 5. The integral

Mξ = ∫_U f(u)P(du),  ξ = f(u),

if it is meaningful, is called the mathematical expectation of the random variable ξ and is denoted by Mξ.

On the basis of Theorem 5, Section 4, Chapter II, the value of the mathematical expectation is independent of the function f(u) representing the random variable ξ. The mathematical expectation is a functional defined on some subset of random variables and
enjoying the following properties:

1°. M(αξ + βη) = αMξ + βMη, where α and β are constants and at least one of the quantities Mξ, Mη is finite.

2°. If χ_A is the characteristic function of the event A, that is, if χ_A(u) = 1 for u ∈ A and χ_A(u) = 0 for u ∉ A, then Mχ_A = P(A).

3°. If ξ ≤ η (mod P), then Mξ ≤ Mη.

4°. For an arbitrary sequence of nonnegative random variables {ξ_k},

M Σ_{k=1}^∞ ξ_k = Σ_{k=1}^∞ Mξ_k

(cf. Theorem 1, Section 5, Chapter II).

5°. If f(x) is a nonnegative nondecreasing function for x ≥ 0, then for a > 0

P{|ξ| ≥ a} ≤ Mf(|ξ|)/f(a).

This is known as "Chebyshev's inequality." The proof follows from the inequality f(|ξ|) ≥ f(a)χ_{[a,∞)}(|ξ|), where χ_{[a,∞)}(x) is the characteristic function of the infinite left-closed interval [a, ∞).

We now mention another inequality that is well known in analysis.

6°. If g(x) is a continuous convex (downward) function for all real x and if ξ is a random variable with finite mathematical expectation, then

Mg(ξ) ≥ g(Mξ).

This is known as "Jensen's inequality." The proof can be found in the book by Hardy, Littlewood, and Polya.

Let {U, 𝔖, P} denote a probability space and let X denote an arbitrary set with a fixed σ-algebra 𝔅 of its subsets.

Definition 6. Let ξ = f(u) denote a function defined on U with values in X and suppose that for arbitrary B ∈ 𝔅 we have {u; f(u) ∈ B} ∈ 𝔖. Then the function ξ = f(u) is called a random element with range in X.
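Chebyshev's inequality from property 5° holds for every distribution, in particular for an empirical one, so a simple Monte-Carlo sketch (uniform samples, f(x) = x², all choices illustrative) can never violate it:

```python
# Monte-Carlo check of Chebyshev's inequality P{|xi| >= a} <= M f(|xi|)/f(a)
# with f(x) = x**2, applied to the empirical distribution of uniform samples.
import random

random.seed(1)
xs = [random.uniform(-1.0, 1.0) for _ in range(200_000)]
second_moment = sum(x * x for x in xs) / len(xs)

for a in (0.25, 0.5, 0.75):
    p = sum(1 for x in xs if abs(x) >= a) / len(xs)
    assert p <= second_moment / a**2 + 1e-9   # Chebyshev bound
```

For the uniform distribution on [−1, 1] the true values are P{|ξ| ≥ a} = 1 − a against the bound (1/3)/a², so the inequality is far from tight here.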
In other words, ξ = f(u) is a random element if under the mapping u → x = f(u) we have f⁻¹(𝔅) ⊂ 𝔖. In the case in which X is a metric space, we always understand by 𝔅 the σ-algebra of Borel subsets of X.

A random element with range in a finite-dimensional linear space is called a random vector.

2. CONSTRUCTION OF PROBABILITY SPACES
Let {F_{θ_1,θ_2,...,θ_n}(x_1, x_2, ..., x_n)}, where n = 1, 2, ... and θ_k ∈ Θ for k = 1, 2, ..., n, denote a family of distribution functions satisfying the compatibility conditions (cf. Section 1, Chapter I) and describing a random function in the broad sense. Is it possible to define a probability space {U, 𝔖, P} and a family of random variables ξ_θ = f_θ(u), u ∈ U, θ ∈ Θ, on it in such a way that the joint distribution function of the sequence {ξ_{θ_1}, ξ_{θ_2}, ..., ξ_{θ_n}} coincides with the given function F_{θ_1,...,θ_n}(x_1, ..., x_n) for arbitrary n = 1, 2, ... and θ_k ∈ Θ, where k = 1, 2, ..., n?

The following problem is less difficult. Let F(x_1, ..., x_n) denote a distribution function of n random variables ξ_1, ξ_2, ..., ξ_n and let A denote an arbitrary n-dimensional Borel set. How should we define the probability of the event {ξ_1, ξ_2, ..., ξ_n} ∈ A? A precise statement of this problem follows: Let F(x_1, x_2, ..., x_n) denote a distribution function of n variables, let E_n denote n-dimensional space, and let 𝔅 denote the σ-algebra of all Borel subsets of E_n. How can one construct a probability space {E_n, 𝔅, P} in such a way that

F(a_1, a_2, ..., a_n) = P(I_{a_1,a_2,...,a_n})

for arbitrary real a_k (for k = 1, ..., n)? (Here, I_{a_1,a_2,...,a_n} denotes the n-dimensional orthant

I_{a_1,a_2,...,a_n} = {(x_1, ..., x_n); x_1 < a_1, ..., x_n < a_n}.)

We begin by solving the latter problem. Let a, b, x, ..., where a = (a_1, a_2, ..., a_n), denote points of the set E_n. Let us write a ≤ b if a_i ≤ b_i for i = 1, 2, ..., n. We shall refer to the set

I[a, b) = {x; a_i ≤ x_i < b_i, i = 1, ..., n},

where a ≤ b, as an n-dimensional left-closed right-open interval or, briefly, as a left-closed interval. Let 𝔍 denote the class of all left-closed intervals. This class constitutes a decomposable family:

I[a, b) ∩ I[c, d) = {x; a_i ≤ x_i < b_i, c_i ≤ x_i < d_i, i = 1, ..., n} = {x; max(a_i, c_i) ≤ x_i < min(b_i, d_i)} = I[a′, b′),
and the difference I[a, b)∖I[c, d) is the union of a finite number of disjoint left-closed intervals.

Let us define the probability that a random point (ξ_1, ξ_2, ..., ξ_n) will fall in a particular left-closed interval. We introduce the notation

Δ^(k)_{[a_k,b_k)} F(x) = F(x_1, ..., x_{k−1}, b_k, x_{k+1}, ..., x_n) − F(x_1, ..., x_{k−1}, a_k, x_{k+1}, ..., x_n).

Because of the monotonicity of F(x) with respect to each variable, the function Δ^(k)_{[a_k,b_k)} F(x) of the variables x_1, ..., x_{k−1}, x_{k+1}, ..., x_n is nonnegative and nondecreasing with respect to each variable. The probability-theoretic meaning of the quantity Δ^(k)_{[a_k,b_k)} F(x) is as follows: it is the probability that the inequalities

ξ_1 < x_1, ..., ξ_{k−1} < x_{k−1},  a_k ≤ ξ_k < b_k,  ξ_{k+1} < x_{k+1}, ..., ξ_n < x_n

will be satisfied. We obtain by induction

F(I[a, b)) = Δ^(1)_{[a_1,b_1)} Δ^(2)_{[a_2,b_2)} ⋯ Δ^(n)_{[a_n,b_n)} F(x) ≥ 0,  (1)

where F(I[a, b)) is the probability of the event (ξ_1, ξ_2, ..., ξ_n) ∈ I[a, b).
The quantity F(I[a, b)) can also be written

F(I[a, b)) = Σ_{t_1,...,t_n=0}^{1} (−1)^{t_1+⋯+t_n} F[b − t(b − a)],  (2)

where

t = (t_1, ..., t_n),  t(b − a) = [t_1(b_1 − a_1), ..., t_n(b_n − a_n)].
The function F(I[a, b)) is an additive function on 𝔍. Indeed, suppose we partition I[a, b) into the two left-closed intervals

I_1 = {x; a_1 ≤ x_1 < c_1, a_i ≤ x_i < b_i for i ≥ 2},  I_2 = {x; c_1 ≤ x_1 < b_1, a_i ≤ x_i < b_i for i ≥ 2},

where a_1 < c_1 < b_1. Since Δ^(1)_{[a_1,b_1)} F(x) = Δ^(1)_{[a_1,c_1)} F(x) + Δ^(1)_{[c_1,b_1)} F(x), we have

F(I[a, b)) = Δ^(1)_{[a_1,b_1)} Δ^(2)_{[a_2,b_2)} ⋯ Δ^(n)_{[a_n,b_n)} F(x)
  = Δ^(1)_{[a_1,c_1)} Δ^(2)_{[a_2,b_2)} ⋯ Δ^(n)_{[a_n,b_n)} F(x) + Δ^(1)_{[c_1,b_1)} Δ^(2)_{[a_2,b_2)} ⋯ Δ^(n)_{[a_n,b_n)} F(x)
  = F(I_1) + F(I_2).

The same relation holds if the left-closed interval I[a, b) is partitioned into two left-closed intervals by partitioning any one of the sides [a_k, b_k) into two parts. One can show by induction that the function F is additive for an arbitrary decomposition of I[a, b) into a union of left-closed intervals.
For the function F to be extendable as a measure to some σ-algebra of subsets of E_n, it is necessary and sufficient that it be countably subadditive (cf. Theorem 2, Section 7, Chapter II):

F(I) ≤ Σ_{k=1}^∞ F(I_k)  (3)

for an arbitrary system of left-closed intervals I_k (for k = 1, 2, ...) such that ∪_{k=1}^∞ I_k ⊃ I. Let us verify that this condition is satisfied in the present case. Let I[a_k, b_k) be denoted by I_k and let I[a_0, b_0) be denoted by I. Since the function F(x) is continuous from the left, there exists an ε_k = (ε_k^1, ..., ε_k^n), where ε_k^i > 0, such that

0 ≤ F(I[a_k − ε_k, b_k)) − F(I[a_k, b_k)) ≤ 2^{−k}η,

where η > 0 and k = 1, 2, .... The open intervals (a_k − ε_k, b_k) cover the closed interval [a_0, b_0 − ε]. By the Heine-Borel theorem, the collection of these open intervals contains a finite subcovering, say {(a_k − ε_k, b_k)} for k = 1, ..., N. Then the set of left-closed intervals {[a_k − ε_k, b_k)} for k = 1, ..., N covers the left-closed interval [a_0, b_0 − ε). Consider the collection of disjoint sets

[a_0, b_0 − ε) ∩ {[a_k − ε_k, b_k) ∖ ∪_{i=1}^{k−1} [a_i − ε_i, b_i)},  k = 1, ..., N,

each of which is the union of nonintersecting left-closed intervals Δ_j^(k) (for j = 1, 2, ..., m_k). Thus

[a_0, b_0 − ε) = ∪_{k=1}^{N} ∪_{j=1}^{m_k} Δ_j^(k)

and

F(I[a_0, b_0 − ε)) = Σ_{k=1}^{N} Σ_{j=1}^{m_k} F(Δ_j^(k)) ≤ Σ_{k=1}^{N} F(I[a_k − ε_k, b_k)).
g(|x + M(ζ_n − ζ_k)|) ≥ g(|x|). Thus

Mg(|ζ_n|) ≥ M Σ_{k=1}^{n} χ(A_k) g_k(ζ_k) ≥ M Σ_{k=1}^{n} χ(A_k) g(|ζ_k|) ≥ M Σ_{k=1}^{n} χ(A_k) g(t) = g(t) P{max_{1≤k≤n} |ζ_k| > t}

(here we use the fact that if the event A_k occurs, then |ζ_k| > t), which completes the proof.

Let us note a special case of Kolmogorov's inequality. Setting g(x) = x², we obtain

P{max_{1≤k≤n} |ζ_k| ≥ t} ≤ (1/t²) Σ_{k=1}^{n} σ_k²,  (2)

where σ_k² = Mξ_k² = Dξ_k.
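The special case (2) is easy to test by simulation. The experiment below uses independent ±1 steps (an illustrative choice; ζ_k here is the k-th partial sum, so Σ Dξ_k = n):

```python
# Monte-Carlo check of the special case (2) of Kolmogorov's inequality:
# P{ max_k |S_k| >= t } <= (1/t^2) * sum Var(xi_k) for centered +/-1 steps.
import random

random.seed(2)
n, t, trials = 50, 12.0, 20_000
hits = 0
for _ in range(trials):
    s, mx = 0, 0
    for _ in range(n):
        s += random.choice((-1, 1))
        mx = max(mx, abs(s))
    if mx >= t:
        hits += 1
p_hat = hits / trials
assert p_hat <= n / t**2        # bound = 50/144, with room to spare
```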
In the case in which we do not assume the existence of any mathematical expectations, instead of Kolmogorov's inequality we can use:

Theorem 2. If P{|ζ_n − ζ_k| ≤ t} ≥ a for k = 0, 1, ..., n, then

P{max_{0≤k≤n} |ζ_k| > 2t} ≤ (1/a) P{|ζ_n| > t}.  (3)

Proof. Define the events

A_k = {|ζ_0| ≤ 2t, ..., |ζ_{k−1}| ≤ 2t, |ζ_k| > 2t},  B_k = {|ζ_n − ζ_k| ≤ t}.

Then A_k ∩ B_k ⊂ {|ζ_n| > t}; the events A_k, for k = 0, 1, ..., n, are pairwise incompatible, and the events A_k and B_k (for n fixed, k = 0, 1, 2, ..., n) are independent. Therefore,

P{|ζ_n| > t} ≥ P{∪_{k=0}^{n} (A_k ∩ B_k)} = Σ_{k=0}^{n} P(A_k)P(B_k) ≥ a Σ_{k=0}^{n} P(A_k) = a P{max_{0≤k≤n} |ζ_k| > 2t},
which implies inequality (3).

The following inequality sometimes enables us to find a lower bound for the probability of the event {max_{1≤k≤n} |ζ_k| > t}:

Theorem 3. If Mξ_k = 0 and if |ξ_k| ≤ c with probability 1, then

P{max_{1≤k≤n} |ζ_k| ≤ t} ≤ (c + t)² / Σ_{k=1}^{n} Dξ_k.  (4)

Define E_{m,r} = {sup_{k,k′≥m} |ζ_k − ζ_{k′}| > 1/r}. Then

N = ∪_{r=1}^∞ ∩_{m=1}^∞ E_{m,r}.

From Kolmogorov's inequality we obtain

P(N) = lim_{r→∞} lim_{m→∞} P(E_{m,r}) ≤ lim_{r→∞} lim_{m→∞} r² Σ_{k=m}^∞ σ_k² = 0.
In the general case, the question of convergence of the series (9) is completely answered by:

Theorem 5 (Kolmogorov's three-series theorem). For the series (9) of independent random variables to converge, it is necessary (resp. sufficient) that for every (resp. some) c > 0 the three series

Σ_{n=1}^∞ P{|ξ_n| > c},  (10)

Σ_{n=1}^∞ Mξ̃_n,  (11)

Σ_{n=1}^∞ Dξ̃_n  (12)
converge, where ξ̃_n = ξ_n for |ξ_n| ≤ c and ξ̃_n = 0 for |ξ_n| > c.

Proof of the Sufficiency. According to Theorem 4, the series

Σ_{n=1}^∞ (ξ̃_n − Mξ̃_n)

converges with probability 1. Since the series (11) converges, it follows that the series Σ_{n=1}^∞ ξ̃_n converges. The series

Σ_{n=1}^∞ ξ_n − Σ_{n=1}^∞ ξ̃_n = Σ_{n=1}^∞ (ξ_n − ξ̃_n)

has only finitely many nonzero terms, on the basis of condition (10) and the Borel-Cantelli theorem (Theorem 2, Section 3). Therefore the series (9) converges with probability 1.

Proof of the Necessity. Suppose that the series (9) converges. Then the sequence of its terms converges to zero with probability 1, so that only finitely many terms of the series exceed the number c in absolute value. Therefore the series Σ_{n=1}^∞ ξ̃_n converges with probability 1. Let {η_n}, for n = 1, 2, ..., denote a sequence of independent random variables that do not depend on the sequence {ξ̃_n} and that have the same distributions as the ξ̃_n. Let us set ξ̂_n = ξ̃_n − η_n. Then the series Σ_{n=1}^∞ ξ̂_n converges with probability 1; also,

Dξ̂_n = 2Dξ̃_n,  |ξ̂_n| ≤ 2c,  Mξ̂_n = 0.
The convergence of the series E;6=1 n implies that P1 max
15n5
n+ _ k=1
k
Therefore, for some t, P{ max 115nS
k=1
k
0.
It follows from inequality (4) that for arbitrary n,
2E DEk = k=1 E k=1
(2c
t)z
+ a
which proves the convergence of the series (12). It then follows
5.
ERGODIC THEOREMS
123
from Theorem 4 that the series E7=1 ;, -
coverges with probability 1. This in turn implies convergence of the series (11). On the basis of Theorem 2 of Section 3, the series (10) must converge, can exceed c only for a since when the series (9) converges, I
I
finite number of values of n. This completes the proof of the theorem.
Corollary. For the series (9) of independent nonnegative random variables to converge, it is necessary (resp. sufficient) that for every (resp. some) c > 0 the series

Σ_{n=1}^∞ P{ξ_n > c},  Σ_{n=1}^∞ Mξ̃_n

converge.

Proof. For nonnegative random variables ξ̃_n we have Mξ̃_n² ≤ c Mξ̃_n. Therefore convergence of the series (11) implies convergence of the series Σ_{n=1}^∞ Mξ̃_n², and hence of the series (12).
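A hedged numerical illustration (not from the text; the particular series is our own choice): for ξ_n = ε_n/n with ε_n = ±1 equiprobable and c = 1, all three series (10)-(12) converge (the first is identically 0, the second is 0 term by term, the third is Σ 1/n²), so by Theorem 5 the random series Σ ξ_n converges with probability 1. Along one sample path, partial sums should therefore settle down:

```python
import random

# Partial sums of sum_n eps_n / n, eps_n = +-1 equiprobable: the
# three-series theorem predicts a.s. convergence, so the tail beyond
# N (whose variance is sum_{n>N} 1/n^2) is small.
def partial_sum(n_terms, seed):
    rng = random.Random(seed)
    s = 0.0
    for n in range(1, n_terms + 1):
        s += rng.choice((-1.0, 1.0)) / n
    return s

s_10k = partial_sum(10_000, seed=1)
s_100k = partial_sum(100_000, seed=1)   # same sample path, longer horizon
```

With the same seed the two sums share their first 10,000 terms, so their difference is exactly the tail of the series, which is small with overwhelming probability.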
5. ERGODIC THEOREMS
Let ξ(t) denote a stationary sequence, that is, a stationary random function that assumes real values and that is defined on the set Z of all integers (t = 0, ±1, ±2, …). Let R^Z denote the space of all real sequences ω = {a(t); t ∈ Z} and let 𝔅 denote the σ-algebra generated by the cylindrical sets in R^Z. The sequence ξ(t) defines a measure P_ξ(C) on 𝔅:

P_ξ(C) = P({ξ(t); t ∈ Z} ∈ C),  C ∈ 𝔅.

Let {P_ξ, 𝔅_ξ} denote the completion of the measure P_ξ. In R^Z we define an operation S representing time displacement: ω' = Sω if a'(t) = a(t + 1) for t ∈ Z, where ω = {a(t); t ∈ Z} and ω' = {a'(t); t ∈ Z}. The operation S has an inverse S⁻¹:

S⁻¹ω = ω'',  ω'' = {a''(t); t ∈ Z},  a''(t) = a(t − 1).
The condition for stationarity of the sequence ξ(t) means (cf. Section 1, Chapter I) that for an arbitrary cylindrical set C,

P_ξ(C) = P_ξ(SC).  (1)

Since a measure on cylindrical sets uniquely defines a measure on 𝔅 and on its completion 𝔅_ξ, equation (1) remains valid for arbitrary A ∈ 𝔅_ξ:

P_ξ(A) = P_ξ(SA),  A ∈ 𝔅_ξ.  (2)
Definition 1. Let {Ω, 𝔖, μ} denote a space with a measure μ and let T denote a measurable mapping of {Ω, 𝔖} into itself. The mapping T is said to be measure-preserving if for arbitrary A ∈ 𝔖, μ(T⁻¹A) = μ(A), where T⁻¹A is the complete preimage of the set A. A mapping T is said to be invertible if there exists a measurable transformation T⁻¹ such that TT⁻¹ = T⁻¹T = I, where I is the identity mapping. In this case the mapping T⁻¹ is called the inverse of T. The definition of a stationary sequence is equivalent to saying that a sequence {ξ(t); t ∈ Z} is stationary if the time-displacement operator S preserves the measure P_ξ in R^Z. Thus the problem of studying stationary sequences is included in the problem of studying measure-preserving invertible transformations (automorphisms) of some space with a measure.
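A hedged numerical sketch of Definition 1 (the map and the set are our own choices, not from the text): the doubling map T(w) = 2w (mod 1) on [0, 1) preserves Lebesgue measure μ, although it is not invertible. Since μ(T⁻¹A) = P{T(W) ∈ A} for W uniform on [0, 1), measure preservation can be checked by sampling:

```python
import random

# Doubling map T(w) = 2w mod 1: mu(T^{-1}A) should equal mu(A).
# We estimate P{T(W) in A} for uniform W and the set A = [0.1, 0.4),
# whose Lebesgue measure is 0.3.
rng = random.Random(7)
n = 200_000
hits = sum(1 for _ in range(n) if 0.1 <= (2 * rng.random()) % 1.0 < 0.4)
estimate = hits / n          # Monte Carlo estimate of mu(T^{-1}A)
```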
Let us now look at the question of the asymptotic behavior of the mean

(1/n) Σ_{k=0}^{n−1} f(T^k ω),  n → ∞,  (3)

where T^k is the kth power of the mapping T, f(ω) is an arbitrary 𝔖-measurable function, and {Ω, 𝔖, μ} is a space with measure μ such that μ(Ω) = 1. To understand the meaning of this problem, let us look at the case in which {Ω, 𝔖, μ} is our {R^Z, 𝔅_ξ, P_ξ} and T = S. Suppose that f(ω) is χ_A(ξ(0)), where χ_A(x) is the characteristic function of the Borel set A on the real line.
Then f(T^k ω) = χ_A(S^k ω) = χ_A(ξ(k)), and

(1/n) Σ_{k=0}^{n−1} f(T^k ω) = ν_n(A, ω)/n,  (4)

where ν_n(A, ω) is the number of terms in the sequence ξ(0), ξ(1), …, ξ(n − 1) with values in the set A; that is, ν_n(A, ω)/n is the frequency with which the first n terms of the sequence ξ(t) (for t = 0, 1, …, n − 1) fall in the set A. Thus the question we must consider is the question regarding the frequency with which the random variable ξ(t) assumes values in an arbitrary set A. Let us show first of all that the limit as n → ∞ of the mean (3) exists with probability 1. This
assertion constitutes the substance of the famous Birkhoff-Khinchin theorem.

Lemma 1. If T preserves the measure μ, if D ∈ 𝔖, and if f(ω) is an 𝔖-measurable nonnegative (or μ-integrable) function, then

∫_{T⁻¹D} f(Tω) μ(dω) = ∫_D f(ω) μ(dω).  (5)
Proof. If we set f(ω) = χ_A(ω), formula (5) becomes the equation

μ(T⁻¹(A ∩ D)) = μ(A ∩ D),

which is valid for arbitrary A and D ∈ 𝔖. From this it follows that (5) is valid for arbitrary 𝔖-measurable nonnegative and μ-integrable functions.
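Before turning to the combinatorial lemmas, the frequency interpretation (4) can be illustrated numerically (a hedged sketch; the rotation map and the interval are our own choices, not from the text): for the measure-preserving shift Tω = ω + α (mod 1) on [0, 1) with irrational α, the frequency of visits of an orbit to an interval A tends to the Lebesgue measure of A.

```python
import math

# T(w) = w + alpha (mod 1) preserves Lebesgue measure on [0, 1); for
# irrational alpha the frequency nu_n(A, w)/n of visits of the orbit
# w, Tw, T^2 w, ... to A = [0.2, 0.5) tends to the measure of A, 0.3.
alpha = math.sqrt(2) - 1      # irrational rotation number
w = 0.123                     # starting point omega
n = 200_000
visits = 0
for _ in range(n):
    if 0.2 <= w < 0.5:        # f = indicator chi_A
        visits += 1
    w = (w + alpha) % 1.0     # apply T
freq = visits / n             # nu_n(A, omega) / n
```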
The following lemma is of an elementary arithmetic nature. Let a₁, a₂, …, a_n denote a sequence of real numbers and let p denote a positive integer. We shall say that a term a_k of the sequence {a_k} is p-distinguished if at least one of the sums

a_k, a_k + a_{k+1}, …, a_k + a_{k+1} + … + a_{k+p−1}

is nonnegative. Thus a_k is 1-distinguished if and only if it is nonnegative.
Lemma 2. The sum of all p-distinguished terms is nonnegative.

Proof. Let a_{k₁} denote the first p-distinguished term of the sequence, and let a_{k₁} + a_{k₁+1} + … + a_{k₁+r}, where r ≤ p − 1, denote the nonnegative sum with the smallest number of terms. For h < r we have

a_{k₁} + a_{k₁+1} + … + a_{k₁+h} < 0,

and consequently a_{k₁+h+1} + … + a_{k₁+r} > 0; that is, all terms of the sequence a_{k₁}, a_{k₁+1}, …, a_{k₁+r} are p-distinguished, and their sum is nonnegative. We can extend this reasoning by considering the part of the sequence beginning with the term a_{k₁+r+1}. Thus every sequence is broken into finite blocks of p-distinguished terms, and each such block has nonnegative sum. The set of p-distinguished terms in the entire sequence coincides with the union of the sets of p-distinguished terms in the blocks chosen. This completes the proof of the lemma.
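Lemma 2 can be checked mechanically (a hedged sketch with our own discretization; sums are truncated at the end of the sequence, which does not affect the argument):

```python
import random

# A term a_k is p-distinguished if one of the sums a_k, a_k + a_{k+1},
# ... (at most p terms, truncated at the end of the list) is nonnegative;
# Lemma 2 asserts that the sum of all p-distinguished terms is >= 0.
def p_distinguished(a, p):
    marked = []
    for k in range(len(a)):
        s = 0.0
        for j in range(k, min(k + p, len(a))):
            s += a[j]
            if s >= 0:
                marked.append(k)
                break
    return marked

rng = random.Random(42)
for _ in range(200):
    a = [rng.uniform(-1, 1) for _ in range(50)]
    for p in (1, 2, 5):
        total = sum(a[k] for k in p_distinguished(a, p))
        assert total >= -1e-9          # Lemma 2 holds on every trial
```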
The following lemma constitutes a basic step in the proof of the Birkhoff-Khinchin theorem.

Lemma 3. Let f(ω) denote a μ-integrable function and define

E = {ω; Σ_{k=0}^{n−1} f(T^k ω) > 0 for some n = 1, 2, …}.

Then

∫_E f(ω) μ(dω) ≥ 0.  (6)

Proof. Consider the sequence f(ω), f(Tω), …, f(T^{N+p−1} ω).
Let s(ω) denote the sum of all p-distinguished terms of this sequence. On the basis of Lemma 2 we have s(ω) ≥ 0. Define
D_k = {ω; f(T^k ω) is a p-distinguished term}, for k = 0, 1, …, N + p − 1. Let χ_k(ω) denote the characteristic function of the set D_k. We note that

D₀ = {ω; sup_{n≤p} Σ_{k=0}^{n−1} f(T^k ω) > 0}  and  D_k = T⁻¹D_{k−1} for k ≤ N.

Therefore D_k = T⁻ᵏD₀ for k ≤ N. Consequently, by Lemma 1,

0 ≤ ∫ s(ω) μ(dω) = Σ_{k=0}^{N+p−1} ∫ χ_k(ω) f(T^k ω) μ(dω) ≤ (N + 1) ∫_{D₀} f(ω) μ(dω) + (p − 1) ∫ |f(ω)| μ(dω).

Dividing by N and letting first N → ∞ and then p → ∞ (the sets D₀ = D₀(p) increase to E as p → ∞), we obtain inequality (6).

Lemma 4. If E_α = {ω; sup_{n≥1} (1/n) Σ_{k=0}^{n−1} f(T^k ω) > α}, then

∫_{E_α} f(ω) μ(dω) ≥ α μ(E_α).  (9)
This is proved by applying Lemma 3 to an appropriate choice of the function f.

Theorem 1 (Birkhoff-Khinchin). Let {Ω, 𝔖, μ} denote a space with a measure, let T denote a measurable μ-measure-preserving mapping of {Ω, 𝔖} into itself, and let f(ω) denote an arbitrary μ-integrable function. Then the limit

lim_{n→∞} (1/n) Σ_{k=0}^{n−1} f(T^k ω) = f*(ω) (mod μ)  (10)

exists μ-almost-everywhere in Ω. The function f*(ω) is T-invariant; that is,

f*(Tω) = f*(ω) (mod μ),  (11)

and it is integrable. Also, if μ(Ω) < ∞, then

∫ f*(ω) μ(dω) = ∫ f(ω) μ(dω).  (12)
Proof. Without loss of generality we may assume that the function f(ω) is finite and nonnegative. Let us set

g*(ω) = lim sup_{n→∞} (1/n) Σ_{k=0}^{n−1} f(T^k ω),  g_*(ω) = lim inf_{n→∞} (1/n) Σ_{k=0}^{n−1} f(T^k ω).

We need to show that g*(ω) = g_*(ω) (mod μ). Suppose that

K_{αβ} = {ω; g_*(ω) < α < β < g*(ω)},  0 < α < β.

The functions g*(ω) and g_*(ω), and hence the set K_{αβ}, are T-invariant, and*

∫_{K_{αβ}} f(ω) μ(dω) ≥ β μ(K_{αβ}),  (13)
* Lemma 4 does not seem to be directly applicable. Instead, (13) seems to follow most simply by choosing 0 < ε < β and noting that

∫_{K_{αβ}} [f(ω) − (β − ε)] μ(dω) = ∫_{K_{αβ}} [(1/n) Σ_{k=0}^{n−1} f(T^k ω) − (β − ε)] μ(dω) ≥ 0

when n is chosen appropriately. This implies that

∫_{K_{αβ}} f(ω) μ(dω) ≥ (β − ε) μ(K_{αβ});

(13) now follows by letting ε ↓ 0.
and applying (9) to the function −f(ω), we obtain

∫_{K_{αβ}} (−f(ω)) μ(dω) ≥ −α μ(K_{αβ}),

that is,

∫_{K_{αβ}} f(ω) μ(dω) ≤ α μ(K_{αβ}).  (14)

Since β > 0, it follows from (13) that μ(K_{αβ}) < ∞; but then (13) and (14) are simultaneously possible if and only if μ(K_{αβ}) = 0. Thus the existence (mod μ) of the limit (10) is proven. Let us set f*(ω) = g*(ω). Then (10) is satisfied and the function f*(ω) is T-invariant everywhere in Ω. To prove formula (12), we set

A_{kn} = {ω; k/2ⁿ ≤ f*(ω) < (k + 1)/2ⁿ}.
0 as k ----> o o.
I
E Mk
Thus for arbitrary 0,
p{g(0,u)aB,k= 1, 2, ,g(0,u)0BI N} < >=2,...,n
2
If we assume that N > 1, then sup P{p(g(0, u), x) > 2N} < sup P{p(g(B1, u), x)
+ max p(g(01, u), g(0,, u)) > N} + P{p(g(B;, u), g(0, u)) > 1} < which completes the proof.
Definition 4. A random function g(θ, u) is said to be uniformly stochastically continuous on Θ if for arbitrarily small positive numbers ε and η there exists a δ > 0 such that

P{ρ(g(θ, u), g(θ', u)) > ε} < η  (4)

whenever r(θ, θ') < δ.

Theorem 4. If g(θ, u) is stochastically continuous on a compact set Θ, then g(θ, u) is uniformly stochastically continuous.
Proof. If this is not the case, there is a pair of positive numbers ε and ε₁ such that for arbitrary δ_n > 0 there is a pair of points θ_n and θ'_n for which r(θ_n, θ'_n) < δ_n and

P{ρ(g(θ_n, u), g(θ'_n, u)) > ε} ≥ ε₁.

We may assume that δ_n → 0 and θ_n → θ₀ as n → ∞. Then θ'_n → θ₀ as n → ∞, and

ε₁ ≤ P{ρ(g(θ_n, u), g(θ'_n, u)) > ε} ≤ P{ρ(g(θ_n, u), g(θ₀, u)) > ε/2} + P{ρ(g(θ₀, u), g(θ'_n, u)) > ε/2}.

This inequality contradicts the hypothesis of stochastic continuity, since both terms on the right tend to zero.

Theorem 5. Let Θ denote a separable space and let g(θ, u) denote a separable stochastically continuous random function. Then an arbitrary countable everywhere-dense set of points in Θ can serve as a set of separability for the random function g(θ, u).
RANDOM FUNCTIONS
156
Proof. Let V = {S} denote the countable set of spheres in Θ mentioned in the proof of Theorem 1, let J = {θ_k; k = 1, 2, …} denote a set of separability of the random function g(θ, u), let N denote the exceptional set of values of u that appears in the definition of separability, and let A denote an arbitrary countable everywhere-dense set of points in Θ. Let B(S, u) denote the closure of the set of values g(γ, u) as the point γ ranges over A ∩ S, and let N(S, k) denote the event that g(θ_k, u) ∉ B(S, u) while θ_k ∈ S. The events N(S, k) have probability 0. To see this, let {γ_r}, for r = 1, 2, …, denote an arbitrary sequence of points in A ∩ S that converges to θ_k. Then

P{g(θ_k, u) ∉ B(S, u)} ≤ P{lim_{r→∞} ρ(g(θ_k, u), g(γ_r, u)) > 0} ≤ lim_{n→∞} P{lim_{r→∞} ρ(g(θ_k, u), g(γ_r, u)) > 1/n} ≤ lim_{n→∞} lim_{r→∞} P{ρ(g(θ_k, u), g(γ_r, u)) > 1/n} = 0.

Suppose that

N' = ⋃_S ⋃_{θ_k ∈ S} N(S, k).

Then P(N') = 0. If u ∉ N ∪ N' and g(γ, u) ∈ F for all γ ∈ A ∩ G, where G is some open set and F ⊂ X is closed, then for every θ_k ∈ G and S such that θ_k ∈ S ⊂ G, we have

g(θ_k, u) ∈ B(S, u) ⊂ F.

From the definition of the set {θ_k}, it then follows that g(θ, u) ∈ F for all θ ∈ G and u ∉ N ∪ N'. Thus the set A satisfies the condition in the definition of a set of separability of a random function.

3. MEASURABLE RANDOM FUNCTIONS
Let Θ and X denote metric spaces with distances r(θ₁, θ₂) and ρ(x₁, x₂) respectively, let g(θ, u) denote a random function with range in X and domain of definition Θ, and let u denote an elementary event of the probability space {U, 𝔖, P}. Let us suppose that a σ-algebra 𝔄 of sets containing the Borel sets is defined on Θ and that a complete measure μ is defined on 𝔄. Let σ₀{𝔄 × 𝔖} denote the smallest σ-algebra generated in Θ × U by the product of the σ-algebras 𝔄 and 𝔖, and let σ{𝔄 × 𝔖} denote its completion with respect to the measure μ × P (cf. Chapter II, Section 8).
Definition 1. A random function g(θ, u) is said to be measurable if it is measurable with respect to σ{𝔄 × 𝔖}.

By definition a random function g(θ, u) is 𝔖-measurable for arbitrary θ ∈ Θ. On the other hand, if a random function is measurable, then on the basis of Fubini's theorem, g(θ, u) is 𝔄-measurable as a function of θ for P-almost-all u. In other words, its sample functions are 𝔄-measurable with probability 1.
Let us now look at conditions that ensure the existence of a measurable separable function stochastically equivalent to a given random function.

Theorem 1. Suppose that Θ and X are compact. If for μ-almost-all θ a random function g(θ, u) is stochastically continuous, then there exists a measurable separable random function g*(θ, u) that is stochastically equivalent to the function g(θ, u).

Proof. On the basis of Theorem 1 of Section 2, corresponding to the function g(θ, u) there is a stochastically equivalent separable random function g̃(θ, u). Let I denote the set of separability of the function g̃(θ, u). As in Section 2, A(G, u) denotes the closure of the range of g̃(θ, u) as θ ranges over the set G ∩ I, and A(θ, u) denotes the intersection of all sets of the form A(S, u), where S is an arbitrary open sphere in V that contains θ. By virtue of the separability, g̃(θ, u) ∈ A(θ, u) almost certainly (that is, for u ∉ N, where P(N) = 0). On the other hand, if

P{g'(θ, u) = g̃(θ, u), θ ∈ I} = 1

and g'(θ, u) ∈ A(θ, u) (for u ∉ N), then g'(θ, u) is also a separable random function (cf. Lemma 1, Section 2). Let us construct a function g*(θ, u) that is stochastically equivalent to the function
g(θ, u) and measurable with respect to the σ-algebra σ{𝔄 × 𝔖}. For arbitrary n, let us cover Θ with a finite number of spheres S_k^{(n)} ∈ V, for k = 1, 2, …, m_n, of diameter not exceeding 1/n. In each S_k^{(n)} let us choose a point θ_k^{(n)} ∈ I and let us set

g_n(θ, u) = g̃(θ_k^{(n)}, u)  for  θ ∈ S_k^{(n)} \ ⋃_{j=1}^{k−1} S_j^{(n)},

where k = 1, 2, …, m_n. Obviously the functions g_n(θ, u) are σ{𝔄 × 𝔖}-measurable. Furthermore,

ρ[g_n(θ, u), g̃(θ, u)] = ρ[g̃(θ_k^{(n)}, u), g̃(θ, u)]  (1)

if θ ∈ S_k^{(n)} \ ⋃_{j=1}^{k−1} S_j^{(n)}. Also, r(θ_k^{(n)}, θ) ≤ 1/n. If we set

G_{n,m}(θ) = P{u; ρ[g_n(θ, u), g_{n+m}(θ, u)] > ε},
then by virtue of the hypothesis of the theorem the function G_{n,m}(θ) approaches 0 as n → ∞ for μ-almost-all θ. Therefore

μ × P[{(θ, u); ρ(g_n(θ, u), g_{n+m}(θ, u)) > ε}] = ∫ G_{n,m}(θ) μ(dθ) → 0

as n → ∞; that is, the sequence {g_n(θ, u)} is fundamental with respect to the measure μ × P. It contains a subsequence {g_{n_k}(θ, u)} that converges (μ × P)-almost-everywhere to some σ{𝔄 × 𝔖}-measurable function ĝ(θ, u). Let M₁ denote the set of points (θ, u) at which this convergence does not take place. For (θ, u) ∉ M₁ we have ĝ(θ, u) ∈ A(θ, u). Since the set M₁ has measure 0, it follows that μ-almost-all of its θ-sections have P-measure 0 (cf. Section 8, Chapter II). Let K₁ denote the set of values of θ whose corresponding sections have nonzero P-measure. We set

g*(θ, u) = g̃(θ, u) for θ ∈ I ∪ K₁ ∪ K,  g*(θ, u) = ĝ(θ, u) for θ ∉ I ∪ K₁ ∪ K,

where K is the set of all θ at which the limit relation (2) of Section 2 is not satisfied. Then g*(θ, u) ∈ A(θ, u) (for u ∉ N); that is, g*(θ, u) is separable. Furthermore, it is σ{𝔄 × 𝔖}-measurable since it coincides with a measurable function almost everywhere in Θ × U (exclusive of the points θ ∈ K₁ ∪ K and u ∈ N). Furthermore, if θ ∉ K₁ ∪ K, then by virtue of (1) and the condition of stochastic continuity,

P{ĝ(θ, u) = g̃(θ, u)} = 1,

from which it follows that the random functions g*(θ, u) and g(θ, u) are stochastically equivalent. This completes the proof of the theorem.
We can make a number of statements generalizing Theorem 1.

REMARK 1. In Theorem 1 the requirement that the spaces Θ and X be compact can be replaced with the requirement that they be locally compact and separable. The compactness of the space X was necessary only so that we might refer to Theorem 1 of Section 2. Now, however, we can refer to Theorem 2 of Section 2. Here the separable and measurable representation g*(θ, u) of the function g(θ, u) assumes values, generally speaking, in some compact topological extension of the space X. Furthermore, if the space Θ is locally compact and separable, it can be represented as the union of countably many compact sets. The reasoning can be applied to each such compact set in particular. The assertion of the theorem then follows for their union also. Furthermore, the measure μ need not be finite; it is sufficient that it be σ-finite. From this we get:

REMARK 2. The assertion of Theorem 1 holds for the case in which Θ and X are finite-dimensional Euclidean spaces and the measure {μ, 𝔄} is Lebesgue measure in Θ.

Now we note that the proof of Theorem 1 would be simplified if we did not require separability of the measurable representation of the given random function. The set I would not come into the picture, and the points θ_k^{(n)} could be chosen arbitrarily from the corresponding sets. Of the properties of the space X we should need to use only its completeness. Thus we have:

REMARK 3. If X is a complete metric space, if Θ is a locally compact separable space, and if μ is a σ-finite measure on a σ-algebra containing the Borel subsets of Θ, then a random function g(θ, u), θ ∈ Θ, u ∈ U, with range in X that is stochastically continuous for μ-almost-all θ is stochastically equivalent to a measurable random function.

The following result, which has great significance, follows immediately from Fubini's theorem (Theorem 2, Section 8, Chapter II):
Theorem 2. Let ξ(θ) = g(θ, u) denote a measurable random function with real range. If

∫_Θ M|ξ(θ)| μ(dθ) < ∞,

then for an arbitrary set B ∈ 𝔄,

∫_B Mξ(θ) μ(dθ) = M ∫_B ξ(θ) μ(dθ).
The last equation indicates the commutativity of the symbols representing the mathematical expectation and integration with respect to a parameter.
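A hedged numerical sketch of this commutativity (the toy random function is our own choice, not from the text): for ξ(θ) = θ·u with u uniform on [0, 1] and μ Lebesgue measure on Θ = [0, 1], both iterated operations give ∫₀¹ θ·(1/2) dθ = 1/4.

```python
import random

# Mean of the path integrals vs. integral of the mean function.
def path_integral(u, m=200):
    # midpoint rule for integral_0^1 theta * u d theta (exact here: u / 2)
    return sum((k + 0.5) / m * u for k in range(m)) / m

rng = random.Random(0)
n = 2_000
mean_of_integral = sum(path_integral(rng.random()) for _ in range(n)) / n
integral_of_mean = sum((k + 0.5) / 200 * 0.5 for k in range(200)) / 200
```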
4. CONDITIONS FOR NONEXISTENCE OF DISCONTINUITIES OF THE SECOND KIND

Let ξ(t), where t ∈ [a, b], denote a random process with range in a complete metric space X.

Definition 1. If the sample functions of a process have, with probability 1, left- and right-hand limits at every t ∈ (a, b) and have a right-hand (resp. left-hand) limit at the point a (resp. the point b), the process is said to have no discontinuities of the second kind.

Throughout the present section we assume that the process ξ(t) is separable. We let J denote the set of separability of the process.
Definition 2. Let ε denote a positive number. A function y = f(t) with range in X is said to have no fewer than m ε-oscillations on a closed interval [a, b] if there exist points t₀, t₁, …, t_m, where a ≤ t₀ < t₁ < … < t_m ≤ b, such that

ρ(f(t_{k−1}), f(t_k)) > ε  for k = 1, …, m.

Lemma 1. For a function y = f(t) not to have discontinuities of the second kind on a closed interval [a, b], it is necessary and sufficient that for arbitrary ε > 0 it have only finitely many ε-oscillations on [a, b].

Proof of the Sufficiency. Let us prove the existence of the limit f(t − 0) for arbitrary t ∈ (a, b]. Let {t_n} denote an arbitrary sequence such that t_n ↑ t. There exist only finitely many indices n_k (where n_k < n_{k+1}) such that

ρ(f(t_{n_k}), f(t_{n_k+1})) > ε.

Consequently, from some m onward the inequality ρ(f(t_n), f(t_{n+k})) ≤ ε, for k > 0, holds for all n > m; that is, the sequence {f(t_n)} converges. This implies the existence of

f(t − 0) = lim_{τ ↑ t} f(τ).
Proof of the existence of f(t + 0) on [a, b) is analogous.

Proof of the Necessity. Suppose that one of the one-sided limits (for example, the left-hand limit) does not exist at a point t₀. Then there exists a sequence t_n ↑ t₀ such that for arbitrary n,

sup_{m>n} ρ(f(t_m), f(t_n)) > ε,

that is, there are infinitely many ε-oscillations.

REMARK 1. Definition 2 can be carried over in a trivial manner to random functions defined on an arbitrary set of real values of t.
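Definition 2 is easy to operationalize for a sampled real-valued function (a hedged sketch, not from the text; the greedy chain below yields a lower bound for the number of ε-oscillations, with ρ(x, y) = |x − y|):

```python
# Greedily build a chain t_0 < t_1 < ... of sample indices in which each
# new value differs from the previous chain value by more than eps; the
# chain length counts eps-oscillations of the sampled function.
def eps_oscillations(values, eps):
    count = 0
    anchor = values[0]
    for v in values[1:]:
        if abs(v - anchor) > eps:
            count += 1       # one more eps-oscillation
            anchor = v
    return count
```

For example, a sampled path oscillating between 0 and 1 has one ε-oscillation per swing for any ε < 1, while a monotone drift smaller than ε has none.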
If a sample function of a separable process ξ(t) has no fewer than m ε-oscillations on [a, b], it also has no fewer than m ε-oscillations on the set of separability J, except possibly for a set of sample functions N of probability 0.

Theorem 1. Suppose that a separable random process ξ(t), for t ∈ [a, b], with range in X satisfies the following conditions:

a. There exist numbers p, q, r, and C (where p > 0, q > 0, r > 0, and C > 0) such that, for arbitrary t₁, t₂, and t₃ such that a ≤ t₁ ≤ t₂ ≤ t₃ ≤ b, we have

M ρ^p[ξ(t₁), ξ(t₂)] ρ^q[ξ(t₂), ξ(t₃)] ≤ C |t₃ − t₁|^{1+r};  (1)

b. the process ξ(t) is stochastically continuous on [a, b].

Then ξ(t) has no discontinuities of the second kind.
Proof. It follows from (1) and Chebyshev's inequality that

P[{ρ[ξ(t₁), ξ(t₂)] ≥ ε₁} ∩ {ρ[ξ(t₂), ξ(t₃)] ≥ ε₂}] ≤ (M ρ^p[ξ(t₁), ξ(t₂)] ρ^q[ξ(t₂), ξ(t₃)]) / (ε₁^p ε₂^q) ≤ C |t₃ − t₁|^{1+r} / (ε₁^p ε₂^q).  (2)

In proving the theorem, we actually use inequality (2) instead of condition (1). It follows from (b) that for the set of separability we can choose an arbitrary countable set that is everywhere-dense in [a, b]. For such a set, let us take the set J of all dyadic numbers belonging to [a, b]. For the sake of convenience let us take [a, b] = [0, 1]. We shall break the proof into several steps.

1. Let A_{k,n} denote the event

{ρ[ξ((k − 1)/2ⁿ), ξ(k/2ⁿ)] ≤ ε_n},

where

ε_n = C^{1/(p+q)} 2^{−r(n−1)/(2(p+q))} = Lαⁿ,  α = 2^{−r/(2(p+q))} < 1,  L = (2^{r/2} C)^{1/(p+q)},

and

B_{k,n} = A_{k,n} ∪ A_{k+1,n} = {ρ[ξ((k − 1)/2ⁿ), ξ(k/2ⁿ)] ≤ ε_n} ∪ {ρ[ξ(k/2ⁿ), ξ((k + 1)/2ⁿ)] ≤ ε_n}.
0, the process e(t) has no discontinuities of the second kind.
It will be sufficient to show that with probability 1 every sample function of ξ(t) has only finitely many ε-oscillations. Let J denote the set of separability of the process ξ(t). Let us represent it in the form

J = ⋃_{n=1}^∞ I_n,

where I_n is an increasing sequence of sets, each consisting of finitely many elements. Let ε denote any positive number. We partition the interval [a, b] into m subintervals Δ_r, for r = 1, …, m, all of equal length, so that
2a(4, bma)='' e} _
M P{ρ(ξ(s), ξ(t)) > ε | 𝔉_s} ≤ α(ε, δ)  for |t − s| < δ.

Corollary. A separable stochastically continuous process with independent increments has no discontinuities of the second kind.
Proof. On the basis of the definition of processes with independent increments and uniform stochastic continuity (cf. Theorem 5, Section 2), we have

P{ρ(ξ(s), ξ(t)) > ε | 𝔉_s} = P{ρ(ξ(s), ξ(t)) > ε} ≤ α(ε, δ)  (mod P),

where |t − s| < δ, α(ε, δ) is independent of t and s, and α(ε, δ) → 0 as δ → 0. Thus the hypotheses of Theorem 2 are satisfied.
We recall that if a process is separable, the values of the sample functions ξ(t) are, with probability 1, the limiting values of sequences {ξ(t_i)} as t_i → t, where each t_i belongs to the set of separability. Hence if a process has no discontinuities of the second kind, then ξ(t) will, with probability 1, be equal to ξ(t − 0) or ξ(t + 0) for every t.
Theorem 3. If ξ(t) is a stochastically continuous process without discontinuities of the second kind, there exists a process ξ'(t) equivalent to it whose sample functions are continuous from the right (mod P).

Proof. Let A denote the event that the limit

lim_{n→∞} ξ(t + 1/n)

exists for each t ∈ [a, b]. The probability of this event is 1. Let us set ξ'(t) = lim_{n→∞} ξ(t + 1/n) for the outcome A and ξ'(t) = ξ(t) for the outcome Ā. We then have

{ξ'(t) ≠ ξ(t)} = ⋃_{m=1}^∞ {ρ(ξ(t), ξ'(t)) > 1/m} ∩ A,

P{ξ'(t) ≠ ξ(t)} = lim_{m→∞} P({ρ(ξ(t), ξ'(t)) > 1/m} ∩ A).

On the other hand,

P{ρ(ξ(t), ξ'(t)) > 1/m} = P{lim_{n→∞} ρ(ξ(t), ξ(t + 1/n)) > 1/m} ≤ lim_{n→∞} P{ρ(ξ(t), ξ(t + 1/n)) > 1/m} = 0,

by stochastic continuity. Thus P{ξ'(t) ≠ ξ(t)} = 0. We note that with outcome A the function ξ'(t) is continuous from the right. This completes the proof of the theorem.
One can prove by an analogous method the existence of a stochastically equivalent process that is continuous from the left.

5. CONTINUOUS RANDOM FUNCTIONS

Let Z denote the interval [a, b], let X denote a complete metric space, and let ξ(t) denote a random process defined on Z with range in X.

Definition 1. A process ξ(t), for t ∈ Z, is said to be continuous if almost all sample functions of the process are continuous on Z.

For processes without discontinuities of the second kind we can establish a rather simple sufficient condition for continuity.

Theorem 1. Let {t_{n,k}}, for n = 1, 2, … and k = 0, 1, …, m_n, denote a sequence of partitions of the interval [a, b]:

a = t_{n,0} < t_{n,1} < … < t_{n,m_n} = b,  λ_n = max_{1≤k≤m_n} (t_{n,k} − t_{n,k−1}) → 0  as n → ∞.

If a separable process ξ(t) has no discontinuities of the second kind, then the process is continuous if for every ε > 0,

Σ_{k=1}^{m_n} P{ρ[ξ(t_{n,k}), ξ(t_{n,k−1})] > ε} → 0  as n → ∞.  (1)

Proof. Let ν_ε (where 0 ≤ ν_ε ≤ ∞) denote the number of values of t at which ρ[ξ(t + 0), ξ(t − 0)] > 2ε, and let ν_ε^{(n)} denote the number of indices k for which ρ[ξ(t_{n,k}), ξ(t_{n,k−1})] > ε. Obviously

ν_ε ≤ lim inf_{n→∞} ν_ε^{(n)}.
On the other hand,

M ν_ε^{(n)} = Σ_{k=1}^{m_n} P{ρ[ξ(t_{n,k}), ξ(t_{n,k−1})] > ε}.

By virtue of Fatou's lemma (Theorem 2, Section 5, Chapter II),

M ν_ε ≤ M lim inf_{n→∞} ν_ε^{(n)} ≤ lim inf_{n→∞} M ν_ε^{(n)} = 0.

Thus Mν_ε = 0; that is, ν_ε = 0 with probability 1 for arbitrary ε > 0. Consequently, for arbitrary t we have ξ(t − 0) = ξ(t + 0) with probability 1. By virtue of the separability of the process,
ξ(t) = ξ(t − 0) = ξ(t + 0); that is, the process is continuous.

Theorem 2. Suppose that there exist three positive constants C, r, and p such that for arbitrary ε > 0,

P{ρ[ξ(t₁), ξ(t₂)] > ε} ≤ C |t₂ − t₁|^{1+r} / ε^p.  (2)

If the process ξ(t) is separable, it is continuous.

Proof. Condition (2) is a special case of condition (2) of Section 4 (with q = 0 and β = p). In addition, condition (2) ensures stochastic continuity of the process. Therefore the process ξ(t) has no discontinuities of the second kind. If {t_{n,k}}, for n = 1, 2, … and k = 0, 1, …, m_n, is a sequence of partitions of the interval [a, b], then

Σ_{k=1}^{m_n} P{ρ[ξ(t_{n,k}), ξ(t_{n,k−1})] > ε} ≤ Σ_{k=1}^{m_n} C |t_{n,k} − t_{n,k−1}|^{1+r} / ε^p ≤ (C(b − a)/ε^p) max_{1≤k≤m_n} |t_{n,k} − t_{n,k−1}|^r → 0

as

λ_n = max_{1≤k≤m_n} |t_{n,k} − t_{n,k−1}| → 0.

By virtue of Theorem 1, the process ξ(t) is continuous.

REMARK 1. Condition (2) of Theorem 2 can be replaced with the somewhat more stringent but, from a practical point of view, more convenient inequality

M ρ^p[ξ(t₁), ξ(t₂)] ≤ C |t₂ − t₁|^{1+r}.  (3)

Applying Chebyshev's inequality to the left-hand member of inequality (2) and keeping (3) in mind, we obtain the right-hand member of (2). These two theorems give only sufficient conditions for continuity of a random process. For the particular case of processes with independent increments, the conditions of Theorem 1 are also necessary.
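A hedged numerical check of the moment condition (3) for the classical example of a Wiener process (our own illustration, not from the text): increments are N(0, t₂ − t₁), so M|ξ(t₂) − ξ(t₁)|⁴ = 3|t₂ − t₁|², i.e. (3) holds with p = 4, C = 3, r = 1, and Remark 1 yields continuity of a separable version.

```python
import random

# Monte Carlo estimate of the fourth moment of a Wiener increment of
# length dt; theory gives exactly 3 * dt**2.
rng = random.Random(0)
dt = 0.01
n = 100_000
fourth_moment = sum(rng.gauss(0.0, dt ** 0.5) ** 4 for _ in range(n)) / n
```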
Theorem 3. If a process ξ(t), for t ∈ [a, b], with independent increments is continuous, then condition (1) is satisfied for an arbitrary sequence {t_{n,k}}, for n = 1, 2, … and k = 0, 1, …, m_n, of partitions of the interval [a, b] such that

max_{1≤k≤m_n} (t_{n,k} − t_{n,k−1}) → 0.
Proof. Let us set Δ_h = sup_{|t₁−t₂|≤h} ρ[ξ(t₁), ξ(t₂)]. Since the process ξ(t) is continuous on [a, b], it follows that Δ_h ↓ 0 as h ↓ 0 with probability 1. Therefore lim_{h→0} P{Δ_h > ε} = 0. On the other hand, if λ_n ≤ h, we have

P{Δ_h > ε} ≥ P{sup_k ρ[ξ(t_{n,k}), ξ(t_{n,k−1})] > ε} ≥ P{ρ[ξ(t_{n,1}), ξ(t_{n,0})] > ε} + P{ρ[ξ(t_{n,1}), ξ(t_{n,0})] ≤ ε} P{ρ[ξ(t_{n,2}), ξ(t_{n,1})] > ε} + … + Π_{k=1}^{m_n−1} P{ρ[ξ(t_{n,k}), ξ(t_{n,k−1})] ≤ ε} · P{ρ[ξ(t_{n,m_n}), ξ(t_{n,m_n−1})] > ε} ≥ P{Δ_h ≤ ε} Σ_{k=1}^{m_n} P{ρ[ξ(t_{n,k}), ξ(t_{n,k−1})] > ε},

from which it follows that for arbitrary ε > 0,

Σ_{k=1}^{m_n} P{ρ[ξ(t_{n,k}), ξ(t_{n,k−1})] > ε} ≤ P{Δ_h > ε} / P{Δ_h ≤ ε} → 0.
This completes the proof of the theorem.

From Theorem 2 of Section 2 and Theorem 1 of the present section we get another test for continuity of a process.

Theorem 4. If a process ξ(t) is separable and

lim_{h→0} α(ε, h)/h = 0  (4)

for arbitrary ε > 0, where α(ε, δ) is determined by formula (11) of Section 4, then the process ξ(t) is continuous.

Proof. Since satisfaction of condition (4) implies that the process ξ(t) has no discontinuities of the second kind, it will be sufficient to verify relation (1). Remembering that

P{ρ[ξ(t_{n,k}), ξ(t_{n,k−1})] > ε} ≤ α(ε, Δt_{n,k}),

where Δt_{n,k} = t_{n,k} − t_{n,k−1}, we obtain the result that

Σ_{k=1}^{m_n} P{ρ[ξ(t_{n,k}), ξ(t_{n,k−1})] > ε} ≤ (b − a) max_{1≤k≤m_n} [α(ε, Δt_{n,k}) / Δt_{n,k}] → 0

as λ_n → 0. This completes the proof of the theorem.
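A hedged illustration of condition (4) (our own example, not from the text), assuming that for a Wiener process α(ε, h) = sup_{|t−s|≤h} P{|ξ(t) − ξ(s)| > ε} = P{|N(0, h)| > ε} = erfc(ε/√(2h)): the Gaussian tail decays faster than any power of h, so α(ε, h)/h → 0.

```python
import math

# Ratio alpha(eps, h) / h for a Wiener process; it should decrease
# rapidly to 0 as h shrinks, verifying condition (4).
def alpha_over_h(eps, h):
    return math.erfc(eps / math.sqrt(2 * h)) / h

ratios = [alpha_over_h(0.1, h) for h in (1e-2, 1e-3, 1e-4)]
```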
Let us look at the condition for continuity of a Gaussian process ξ(t) that assumes real values. For the characteristic function of the random variable ξ(t) we have the expression

φ(λ, t) = M e^{iλξ(t)} = exp(i m(t)λ − σ²(t)λ²/2).

If a process ξ(t) is continuous, it follows from Lebesgue's theorem on taking the limit under the integral sign that φ(λ, t) is continuous with respect to t for arbitrary λ. This in turn implies continuity with respect to t of the function

ln φ(λ, t) = i m(t)λ − σ²(t)λ²/2,

and hence continuity of the functions

σ²(t) = −(1/λ²)[ln φ(λ, t) + ln φ(−λ, t)]  and  m(t) = (1/(iλ))[ln φ(λ, t) + σ²(t)λ²/2].

Thus continuity of the functions m(t) and σ²(t) is a necessary condition for continuity of a Gaussian process. Turning to sufficient conditions for continuity, let us suppose that m(t) = Mξ(t) = 0. (If this is not the case, we may consider the process ξ'(t) = ξ(t) − m(t) instead of ξ(t).) Let R(t₁, t₂) denote the correlation function of the process ξ(t). Then

M(ξ(t₂) − ξ(t₁))² = R(t₂, t₂) − 2R(t₁, t₂) + R(t₁, t₁) = ΔR,

M(ξ(t₂) − ξ(t₁))^{2m} = (2m − 1)!! |ΔR|^m.

Using Remark 1, we obtain the following result:

Theorem 5. If there exist C > 0 and α > 0 such that m(t) is continuous and

|R(t₂, t₂) − 2R(t₁, t₂) + R(t₁, t₁)| ≤ C |t₂ − t₁|^α,  t₁, t₂ ∈ [a, b],

then a separable Gaussian process with mathematical expectation m(t) and correlation function R(t₁, t₂) is continuous.

For Gaussian processes with independent increments we can go even farther. In such a case, R(t₁, t₂) = R(t₁, t₁) = σ²(t₁) for t₁ ≤ t₂, ΔR = σ²(t₂) − σ²(t₁), and

Σ_{k=1}^{m_n} P{|ξ(t_k) − ξ(t_{k−1})| > ε} ≤ (1/ε⁴) Σ_{k=1}^{m_n} M|ξ(t_k) − ξ(t_{k−1})|⁴ = (3/ε⁴) Σ_{k=1}^{m_n} [σ²(t_k) − σ²(t_{k−1})]².
→ 0 as n → ∞; that is, l.i.m. y_n = y₀ exists. Let us set y₀ = ψ(x). To prove the single-valuedness of this definition, let us take an arbitrary sequence {x'_n}, where each x'_n ∈ L(G₁), such that l.i.m. x'_n = x. We need to show that l.i.m. ψ(x'_n) = y₀. If x''_{2n} = x'_n and x''_{2n−1} = x_n (for n = 1, 2, …), then l.i.m. x''_n = x and, on the basis of what we have shown, l.i.m. ψ(x''_n) exists. From this it follows that

l.i.m. ψ(x''_{2n}) = l.i.m. ψ(x''_{2n−1}) = l.i.m. ψ(x_n) = y₀.

Now it is easy to show that the extended mapping y = ψ(x) is a one-to-one mapping of H(G₁) onto H(G₂) and that it preserves the scalar product. This completes the proof of the theorem.
2. HILBERT RANDOM FUNCTIONS

Let {U, 𝔖, P} denote a probability space.

Definition 1. The set of complex-valued random variables ζ(u), for u ∈ U, such that M|ζ|² < ∞ is called the Hilbert space L₂ = L₂(U, 𝔖, P) of random variables of the probability space {U, 𝔖, P}. The scalar product in L₂ is defined by

(ζ₁, ζ₂) = M ζ₁ ζ̄₂.

Corresponding to this definition, we have the norm of the random variable ζ:

‖ζ‖ = {M|ζ|²}^{1/2}.

Two random variables ζ and η are said to be orthogonal if

(ζ, η) = M ζ η̄ = 0.

The square of the norm ‖ζ‖² of a real random variable ζ coincides with the second-order moment M|ζ|², and if Mζ = 0 it coincides with the variance. If ζ and η are both real and Mζ = Mη = 0, their orthogonality implies that they are uncorrelated.

Definition 2. A complex-valued random function ζ(θ), for θ ∈ Θ, is called a Hilbert random function if

M|ζ(θ)|² < ∞.
n
k + E,k=1)1k ,
I cPk(t) 2
, is
Theorem 2. A measurable mean-square continuous Hilbert process ζ(t), for t ∈ [a, b], can be expanded in a series

ζ(t) = Σ_{k=1}^∞ ξ_k φ_k(t)  (16)

that converges in L₂ for every t ∈ [a, b]. In this expansion, {ξ_k} is an orthogonal sequence of random variables with M|ξ_k|² = λ_k, where the λ_k are the eigenvalues and the φ_k(t) are the eigenfunctions of the covariance B(t, τ) of the process.

REMARK 3. If the process ζ(t) is a Gaussian process, its mean-square derivative and all integrals of the form ∫ φ(t)ζ(t) dt are also Gaussian.

REMARK 4. If ζ(t) is a real Gaussian process and Mζ(t) = 0, then the coefficients ξ_k in the series (16) are independent Gaussian variables and the series (16) converges with probability 1 for every t. To see this, note that the variables ξ_k are orthogonal and Gaussian, and hence independent. For the series (16) to converge with probability 1 it is sufficient that the series

Σ_{k=1}^∞ M(ξ_k φ_k(t))² = Σ_{k=1}^∞ λ_k |φ_k(t)|²

converge. But we have already seen that this series converges (to B(t, t)).
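A hedged numerical check (our own illustration, not from the text): for the Wiener covariance B(t, s) = min(t, s) on [0, 1], the eigenpairs are λ_n = ((n + 1/2)π)⁻² and φ_n(t) = √2 sin((n + 1/2)πt), and the series Σ λ_n |φ_n(t)|² of Remark 4 indeed sums to B(t, t) = t:

```python
import math

# Diagonal of Mercer's expansion for B(t, s) = min(t, s):
# sum_n lambda_n * phi_n(t)^2 should converge to t on [0, 1].
def mercer_diagonal(t, terms=20_000):
    return sum(
        2.0 * math.sin((n + 0.5) * math.pi * t) ** 2
        / ((n + 0.5) * math.pi) ** 2
        for n in range(terms)
    )
```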
Example 3. Let us look at the expansion of a process of Brownian motion ζ(t) (where ζ(0) = 0 and Mζ²(t) = t) on the interval [0, 1] in an orthogonal series. Here Mζ(t) = 0 and B(t, s) = Mζ(t)ζ(s) = min(t, s). The eigenvalues and eigenfunctions of the kernel B(t, s) are easily found. From the equation

λ_n φ_n(t) = ∫₀¹ min(t, s) φ_n(s) ds = ∫₀ᵗ s φ_n(s) ds + t ∫ₜ¹ φ_n(s) ds

we see first of all that φ_n(0) = 0. Differentiating with respect to t, we obtain

λ_n φ'_n(t) = ∫ₜ¹ φ_n(s) ds,

from which we get φ'_n(1) = 0. By successive differentiation we arrive at the equation λ_n φ''_n(t) = −φ_n(t). The normalized solutions of this last equation that satisfy the boundary conditions are of the form

λ_n⁻¹ = (n + 1/2)² π²,  φ_n(t) = √2 sin((n + 1/2)πt),  n = 0, 1, ….

Thus
LINEAR TRANSFORMATION OF RANDOM PROCESSES
Thus

ξ(t) = √2 Σ_{n=0}^∞ ξ_n sin(n + 1/2)πt / ((n + 1/2)π),   (17)
where {ξ_n} is a sequence of independent Gaussian random variables with parameters 0 and 1. For fixed t this series converges with probability 1.

Another expansion of a process of Brownian motion can be obtained as follows. Let us set ξ̃(t) = ξ(t) − tξ(1). Then ξ̃(t) is a Gaussian process with covariance B₁(t, s) = min(t, s) − ts and Mξ̃(t) = 0. The eigenvalues and eigenfunctions of the kernel B₁(t, s) are found in the same way as in the preceding case. We again arrive at the equation λ_n φ_n″(t) = −φ_n(t), now with boundary conditions φ_n(0) = φ_n(1) = 0. The solutions of this equation with these boundary conditions are of the form

φ_n(t) = √2 sin nπt,  λ_n⁻¹ = n²π²,  n = 1, 2, ….
Thus

ξ(t) − tξ(1) = √2 Σ_{n=1}^∞ ξ_n sin nπt / (nπ),   (18)
where {ξ_n}, for n = 1, 2, …, is again a normalized sequence of independent Gaussian random variables. Moreover,

ξ_n = √2 nπ ∫₀¹ ξ̃(t) sin nπt dt.
Since Mξ(1) = 0, Mξ²(1) = 1, and

√2 ∫₀¹ M{(ξ(t) − tξ(1)) ξ(1)} sin nπt dt = 0,

if we set ξ₀ = ξ(1) we obtain

ξ(t) = tξ₀ + √2 Σ_{n=1}^∞ ξ_n sin nπt / (nπ),   (19)

and the sequence {ξ_n}, for n = 0, 1, 2, …, enjoys the same properties as the sequence {ξ_n}, n = 1, 2, …. The series (19) converges (for every t) with probability 1.

3. STOCHASTIC MEASURES AND INTEGRALS
In a number of problems an important role is played by integrals of the form ∫ f(t) ξ(dt).

In physical systems, however, it is impossible to anticipate the future. Therefore, for physically realizable systems,
h(t, τ) = 0 for t < τ.   (4)

Equation (4) is called the condition of physical realizability of the system. For systems satisfying condition (4), formula (1) assumes the form

z(t) = ∫_{−∞}^{t} h(t, τ) x(τ) dτ,   (5)

and if the system is homogeneous,

z(t) = ∫_{−∞}^{t} h(t − τ) x(τ) dτ = ∫₀^∞ h(u) x(t − u) du.   (6)

If a function is introduced at the input of the system beginning at the instant of time 0 (that is, x(τ) = 0 for τ < 0), then

z(t) = ∫₀ᵗ h(t − τ) x(τ) dτ.   (7)
In studying such systems, it is convenient to use not the Fourier transform but the Laplace transform

ẑ(p) = ∫₀^∞ e^{−pt} z(t) dt,  x̂(p) = ∫₀^∞ e^{−pt} x(t) dt.   (8)

It follows from formula (7) that

ẑ(p) = H(p) x̂(p)   (9)

for Re p > a if the functions e^{−at} h(t) and e^{−at} x(t) are absolutely integrable.
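In discrete time, formulas (6)–(7) reduce to a causal convolution sum, and the transform relation (9) becomes a product of transforms. A minimal sketch (the impulse response h and the input x are made-up illustrations):

```python
import numpy as np

# Causal impulse response h (h[t] = 0 for t < 0 is implicit: the array
# starts at t = 0) and an input switched on at time 0, as in formula (7).
h = np.array([1.0, 0.5, 0.25])
x = np.array([2.0, -1.0, 0.0, 3.0])

# z(t) = sum_{tau=0}^{t} h[t - tau] * x[tau]  -- discrete analogue of (7).
z = np.array([sum(h[t - tau] * x[tau]
                  for tau in range(t + 1) if t - tau < len(h))
              for t in range(len(x))])

# The same output via the library convolution, truncated to len(x).
assert np.allclose(z, np.convolve(h, x)[:len(x)])
```

The truncation expresses causality: output values up to time t depend only on inputs up to time t.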
5. LINEAR TRANSFORMATIONS

Let us turn to the basic theme of the present section, namely, linear transformations of random processes. For the most part, we shall consider homogeneous (with respect to time) transformations of stationary processes. In regard to the more general case, we shall confine ourselves to some simple comments.

Let ξ(t) (for −∞ < t < ∞) denote a measurable Hilbert process with covariance B(t, τ).
Suppose that the function B(t, t) is integrable with respect to t over every interval, and that for every fixed t the function h(t, τ) is integrable with respect to τ over every interval. Then with probability 1 the integral ∫ₐᵇ h(t, τ) ξ(τ) dτ exists for arbitrary a and b. Let us define the improper integral from −∞ to ∞ as the mean-square limit of the integral over finite intervals of integration as their endpoints approach −∞ and +∞:

∫_{−∞}^{∞} h(t, τ) ξ(τ) dτ = l.i.m._{a→−∞, b→∞} ∫ₐᵇ h(t, τ) ξ(τ) dτ.

From part 2 of Section 2, we recall that for this limit to exist it is necessary and sufficient that the integral

∫∫ h(t, τ₁) B(τ₁, τ₂) h̄(t, τ₂) dτ₁ dτ₂

exist in the sense of a Cauchy improper integral over the plane. If it exists for t ∈ Z, then the process η(t) = ∫_{−∞}^{∞} h(t, τ) ξ(τ) dτ, for t ∈ Z, is a Hilbert random process with covariance

B_η(t₁, t₂) = ∫∫ h(t₁, τ₁) B(τ₁, τ₂) h̄(t₂, τ₂) dτ₁ dτ₂.   (10)
Let us suppose now that ξ(t) is a stationary process in the broad sense with spectral function F(ω) and Mξ(t) = 0. This assumption will be retained until the end of the present section. The integral

η(t) = ∫_{−∞}^{∞} h(t − τ) ξ(τ) dτ   (11)

exists (in the sense mentioned above) if and only if the integral

∫∫ h(t − τ₁) R(τ₁ − τ₂) h̄(t − τ₂) dτ₁ dτ₂ = ∫∫ h(τ₁) R(τ₂ − τ₁) h̄(τ₂) dτ₁ dτ₂

exists, where R(t) is the covariance function of the process. For this in turn it is sufficient that the function h(t) be absolutely integrable over (−∞, ∞). In this case, by using the spectral representation of the covariance function R(t) (cf. (1), Section 5, Chapter I), we obtain the following expression for the covariance function R_η(t₁, t₂) of the process η(t):
R_η(t₁, t₂) = ∫∫ h(t₁ − τ₁) R(τ₁ − τ₂) h̄(t₂ − τ₂) dτ₁ dτ₂
= ∫∫∫ h(t₁ − τ₁) e^{iω(τ₁−τ₂)} h̄(t₂ − τ₂) dτ₁ dτ₂ dF(ω)
= ∫ e^{iω(t₁−t₂)} |H(iω)|² dF(ω) = R_η(t₁ − t₂).

Thus the process η(t) is also stationary in the broad sense.
Definition 1. For a process ξ(t), a transformation T is called an admissible filter (or, more briefly, a filter) if it is defined by formula (11), where h(t) is an absolutely integrable function, or if it is the mean-square limit (in L₂{ξ}) of a sequence of such transformations.

A condition for convergence of a sequence {η_n(t)} = {T_n ξ(t)} of transformations of the form (11), with impulse transfer functions h_n(t) and frequency characteristics H_n(iω), consists in the following:

M|η_n(t) − η_{n+m}(t)|² = ∫ |H_n(iω) − H_{n+m}(iω)|² dF(ω) → 0,   (12)

that is, in the requirement that the sequence {H_n(iω)} be a fundamental sequence in L₂{F}. But then the limit H(iω) = l.i.m. H_n(iω) exists in L₂{F}. This limit is called the frequency characteristic of the limiting filter. If η(t) = l.i.m. η_n(t), then

R_η(t) = ∫ e^{itω} |H(iω)|² dF(ω).   (13)

Conversely, every function H(iω) ∈ L₂{F} can be approximated, in the sense of convergence in L₂{F}, by functions that are the Fourier transforms of absolutely integrable functions. Thus it is convenient to define filters by their frequency characteristics.

Theorem 1. For a function H(iω) to be the frequency characteristic of an admissible filter, it is necessary and sufficient that H(iω) belong to L₂{F}. The covariance function of the process at the output of the filter with frequency characteristic H(iω) is given by formula (13).
If we recall the energy interpretation of the spectral function, it follows from formula (13) that |H(iω)|² shows by how much the energy of simple harmonic components of a process with frequencies in the interval (ω, ω + dω) is multiplied by passage through the filter.

Theorem 2. If a process ξ(t) at the input of a filter with frequency characteristic H(iω) has the spectral representation
ξ(t) = ∫ e^{itω} ρ(dω),   (14)

then the process η(t) at the output of the filter is of the form

η(t) = ∫ e^{itω} H(iω) ρ(dω).   (15)

Proof. If the filter has an absolutely integrable impulse transfer function, then on the basis of Lemma 5 of Section 3,

η(t) = ∫ h(t − τ) ξ(τ) dτ = ∫ e^{itω} H(iω) ρ(dω).

Proof in the general case is obtained by taking the limit with respect to sequences {H_n(iω)} that converge in L₂{F} to H(iω).

Let η_k(t) denote a process at the output of a filter with frequency characteristic H_k(iω) (for k = 1, 2). Let us find the mutual covariance function of the processes η₁(t) and η₂(t). It follows immediately from the isomorphism of the spaces L₂{ξ} and L₂{F} that

R₁₂(t) = M η₁(t + τ) η̄₂(τ) = ∫ e^{itω} H₁(iω) H̄₂(iω) dF(ω).   (16)
Let us give some examples of filters and their frequency characteristics.
1. A band filter admits (without modification) only harmonic components of a process with frequencies in a given interval (a, b). The frequency characteristic of the filter is equal to H(iω) = χ_{(a,b)}(ω), and the filter is admissible for an arbitrary process. The impulse transfer function is found from Fourier's formula:

h(t) = (1/2π) ∫ₐᵇ e^{iωt} dω = (e^{ibt} − e^{iat}) / (2πit).
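In discrete time a band filter can be sketched by zeroing the discrete Fourier coefficients outside the band; the sample size and band limits below are arbitrary illustrations. Parseval's identity then confirms that the energy of the filtered path is exactly the in-band energy:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1024
x = rng.standard_normal(N)              # a sample path of "white noise"

freqs = np.fft.fftfreq(N) * 2 * np.pi   # frequencies in [-pi, pi)
X = np.fft.fft(x)
band = (np.abs(freqs) > 0.5) & (np.abs(freqs) < 1.5)
Y = np.where(band, X, 0.0)              # H(i*omega) = indicator of the band
y = np.fft.ifft(Y).real                 # symmetric band keeps the path real

# Parseval: energy of the filtered path equals the in-band energy.
assert np.isclose(np.sum(y**2), np.sum(np.abs(Y)**2) / N)
assert np.sum(y**2) < np.sum(x**2)      # some energy was removed
```

Because the band is taken symmetric in |ω|, the Hermitian symmetry of the transform is preserved and the output stays real.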
2. A low-pass (or a high-pass) filter admits without modification only harmonic oscillations with frequencies not exceeding (respectively, not less than) some value b. Such a filter is admissible for an arbitrary process. Its frequency characteristic is equal to H(iω) = χ_{(−∞,b)}(ω) (respectively, χ_{(b,+∞)}(ω)), and the impulse transfer function does not exist.

3. Consider the operation of mean-square differentiation of a process that is stationary in the broad sense. A sufficient condition for a process ξ(t) to have a mean-square derivative is the existence of R″(0) (cf. Corollary 2, Section 2). This condition is equivalent to the requirement (cf. Theorem 2, Section 5, Chapter I) that

∫ ω² dF(ω) < ∞.   (17)
On the other hand, if this condition is satisfied, then

(e^{iωh} − 1)/h → iω  (in L₂{F}),

and in the relation

(ξ(t + h) − ξ(t))/h = ∫ e^{itω} ((e^{iωh} − 1)/h) ρ(dω)

we may take the limit as h → 0 under the stochastic integral sign. Consequently,

ξ′(t) = ∫ e^{itω} iω ρ(dω).   (18)

Thus, corresponding to the operation of differentiation is a filter with frequency characteristic iω, which is admissible for all stationary processes satisfying condition (17). The impulse transfer function does not exist, but the filter can be regarded as limiting (as ε → 0) for filters with impulse transfer functions of the form

h_ε(t) = −(sgn t)/ε² for |t| < ε,  h_ε(t) = 0 for |t| > ε,

to which the frequency characteristics

H_ε(iω) = (4i/(ωε²)) sin²(ωε/2)

correspond.

4. The time-displacement operation. Since

ξ(t + τ) = ∫ e^{itω} e^{iτω} ρ(dω),

corresponding to the time-displacement operation S_τ, defined by S_τ(ξ | t) = ξ(t + τ), we have the frequency characteristic H(iω) = e^{iτω}. There is no impulse transfer function.
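The approximation of differentiation by the filters h_ε can be checked numerically. The sketch below takes h_ε(t) = −sgn(t)/ε² on |t| < ε (the form consistent with the limit iω; the grid and tolerance are illustrative choices) and integrates its frequency characteristic directly:

```python
import numpy as np

def H_eps(omega, eps, n=20001):
    """Frequency characteristic: integral of h_eps(t)*exp(-i*omega*t) over
    |t| < eps, with h_eps(t) = -sgn(t)/eps**2, by the trapezoidal rule."""
    t = np.linspace(-eps, eps, n)
    f = -np.sign(t) / eps**2 * np.exp(-1j * omega * t)
    dt = t[1] - t[0]
    return np.sum((f[:-1] + f[1:]) / 2) * dt

# As eps -> 0 the characteristic approaches i*omega, the derivative filter.
for omega in (0.5, 2.0):
    assert abs(H_eps(omega, 1e-3) - 1j * omega) < 1e-4
```

The closed form of the integral is (4i/(ωε²)) sin²(ωε/2), whose Taylor expansion makes the convergence to iω explicit.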
5. Differential equations. Consider a filter defined by a linear differential equation with constant coefficients

Lη = Mξ,   (19)

where

L = a₀ dⁿ/dtⁿ + a₁ dⁿ⁻¹/dtⁿ⁻¹ + … + aₙ,  M = b₀ dᵐ/dtᵐ + b₁ dᵐ⁻¹/dtᵐ⁻¹ + … + bₘ.

Equation (19) is meaningful only when the process ξ(t) is m times
mean-square differentiable. Then we seek an n-times mean-square differentiable stationary process η(t) satisfying equation (19). Let us suppose that (19) has a stationary solution. Then η(t) can be represented in the form

η(t) = ∫ e^{iωt} H(iω) ρ(dω).

If we apply the operations L and M to the processes η(t) and ξ(t) respectively, we obtain

∫ e^{iωt} L(iω) H(iω) ρ(dω) = ∫ e^{iωt} M(iω) ρ(dω),

so that if L(iω) does not have real roots,

H(iω) = M(iω)/L(iω).   (20)

Conversely, if the process ξ(t) is m times mean-square differentiable, if M(iω) ∈ L₂{F}, and if L(iω) ≠ 0 (for −∞ < ω < ∞), then the process

η(t) = ∫ e^{iωt} (M(iω)/L(iω)) ρ(dω)

is n times mean-square differentiable and satisfies equation (19). Thus under the conditions M(iω) ∈ L₂{F} and L(iω) ≠ 0 there exists a unique filter satisfying the differential equation (19). We note, however, that it is possible to determine the solution of equation (19) under more general conditions. Let us suppose that the polynomial L(iω) has no real roots. A filter with frequency characteristic M(iω)/L(iω) then exists without the requirement that M(iω) belong to L₂{F}; it is sufficient that M(iω)/L(iω) belong to L₂{F}. This
will always be the case when the degree n of the polynomial L is not less than m. Thus for n ≥ m, the filter with frequency characteristic (20) whose denominator does not vanish for real ω is admissible for an arbitrary input process, and we identify the output process of the filter with the stationary solution of equation (19). Let us confine ourselves, as before, to differential equations for which the polynomial L(x) does not have purely imaginary roots. Let P(x) denote the polynomial part of the rational function M(x)/L(x) (which is nonzero only if m ≥ n), and let us decompose the remainder into partial fractions. Then

M(iω)/L(iω) = P(iω) + Σ_k Σ_s [ c_{ks}/(iω − p_k)^s + c′_{ks}/(iω − p′_k)^s ],

where

P(iω) = Σ_k a_k (iω)^k

for m ≥ n and P(iω) = 0 for m < n, and where the p_k (with Re p_k < 0) and the p′_k (with Re p′_k > 0) are the roots of the equation L(x) = 0. Since
1/(iω − p)^s = ∫₀^∞ (t^{s−1}/(s − 1)!) e^{pt} e^{−iωt} dt  (Re p < 0)

and

1/(iω − p)^s = −∫_{−∞}^0 (t^{s−1}/(s − 1)!) e^{pt} e^{−iωt} dt  (Re p > 0),
the output process η(t) of the filter can be represented in the form

η(t) = Σ_k a_k ξ^{(k)}(t) + ∫₀^∞ ξ(t − τ) G₁(τ) dτ + ∫₀^∞ ξ(t + τ) G₂(−τ) dτ,

where

G₁(t) = Σ_k Σ_s (c_{ks} t^{s−1}/(s − 1)!) e^{p_k t}  (t > 0),
G₂(t) = −Σ_k Σ_s (c′_{ks} t^{s−1}/(s − 1)!) e^{p′_k t}  (t < 0).
We note that if the polynomial L(x) has roots with positive real part, the corresponding filter is physically unrealizable.

6. PHYSICALLY REALIZABLE FILTERS
In the present section we shall consider the question: What spectral functions can be obtained at the output of a physically realizable filter? At the input of the filter we shall consider the random process that is simplest in a certain sense. The processes considered in the present section are invariably assumed to be homogeneous and stationary in the broad sense. Therefore, the word "stationary" will sometimes be omitted, and the words "in the broad sense" will always be omitted.

We shall begin by considering stationary sequences. We shall not carry over to sequences all the definitions and heuristic considerations that were given for processes with continuous time, although we shall use the corresponding terminology. Consider a
system whose states at the input and output are registered only at integer instants of time t = 0, ±1, ±2, …. Suppose that a unit impulse takes place at the input of the system at the instant of time 0. We let a_t denote the response of the system to that impulse at the instant t. If the system does not anticipate the future, then a_t = 0 for t < 0. If the system is homogeneous with respect to time, the response of the system to the unit impulse applied to the system at the instant τ is equal to a_{t−τ}. The response of a linear, homogeneous, physically realizable system at the instant t to the sequence of impulses ξ(n) (for −∞ < n < ∞) will be

η(t) = Σ_{n=−∞}^{∞} a_{t−n} ξ(n) = Σ_{n=0}^{∞} a_n ξ(t − n).   (1)
In a certain sense, the simplest hypothesis is that ξ(n) is a stationary sequence with mean value 0 and uncorrelated values:

Mξ(n) = 0,  M(ξ(n) ξ̄(m)) = δ_{nm}  (−∞ < n, m < ∞).

We shall call such a sequence uncorrelated. Its covariance function has the spectral representation

R(n) = (1/2π) ∫_{−π}^{π} e^{inω} dω.

Consequently, the spectral density of this sequence is constant.
For the series (1) to converge in mean-square in the case of an uncorrelated sequence ξ(n), it is necessary and sufficient that

Σ_{n=0}^∞ |a_n|² < ∞.   (2)

If this condition is satisfied, the process η(t) is also stationary in the broad sense, and

Mη(t) = 0,  R_η(t) = M(η(t + τ) η̄(τ)) = Σ_{n=0}^∞ a_{n+t} ā_n.   (3)

What sequences can be obtained in this manner?

Lemma 1. For a stationary sequence η(n) to be the response of a physically realizable filter to an uncorrelated sequence, it is necessary and sufficient that the spectrum of η(n) be absolutely continuous and that its spectral density f(ω) have a representation of the form

f(ω) = |g(e^{iω})|²,  g(e^{iω}) = Σ_{n=0}^∞ b_n e^{inω},  Σ_{n=0}^∞ |b_n|² < ∞.   (4)
Proof of the Necessity. Suppose that the sequence has a representation of the form (1). We set

g(e^{iω}) = (1/√(2π)) Σ_{n=0}^∞ a_n e^{inω}.   (5)
Then from Parseval's formula,

R_η(t) = Σ_{n=0}^∞ a_{n+t} ā_n = ∫_{−π}^{π} e^{itω} |g(e^{iω})|² dω,

that is, the sequence η(n) has an absolutely continuous spectrum with density f(ω) = |g(e^{iω})|², where g is defined by formula (5).

Proof of the Sufficiency. Suppose that η(n) is a sequence with covariance function

R_η(t) = ∫_{−π}^{π} e^{itω} f(ω) dω,

and suppose that this sequence admits a spectral representation

η(t) = ∫_{−π}^{π} e^{itω} ρ(dω).

Suppose that f(ω) = |g(e^{iω})|², where g(e^{iω}) is defined by relations (4).
On the σ-algebra of Borel subsets of the interval (−π, π), let us construct the stochastic measure

ζ(A) = (1/√(2π)) ∫ (χ_A(ω)/g(e^{iω})) ρ(dω).

Then

M(ζ(A) ζ̄(B)) = (1/2π) ∫ (χ_A(ω) χ_B(ω) / |g(e^{iω})|²) f(ω) dω = (1/2π) ∫_{A∩B} dω,

that is, ζ(A) is an orthogonal measure with structural function (1/2π) l(A ∩ B), where l denotes Lebesgue measure. By using Lemmas 2
and 1 of Section 3, we obtain

η(t) = ∫_{−π}^{π} e^{itω} ρ(dω) = ∫_{−π}^{π} e^{itω} √(2π) g(e^{iω}) ζ(dω) = Σ_{n=0}^∞ a_n ∫_{−π}^{π} e^{i(t−n)ω} ζ(dω) = Σ_{n=0}^∞ a_n ξ(t − n),

where a_n = √(2π) b_n,

ξ(t) = ∫_{−π}^{π} e^{itω} ζ(dω),

and

M(ξ(n) ξ̄(m)) = (1/2π) ∫_{−π}^{π} e^{i(n−m)ω} dω = δ_{nm}.

Thus the sequence ξ(n) is uncorrelated.

This lemma gives a simple answer to the question asked. But this answer is not sufficient for us in the general case, since we still
do not know when the spectral density can be represented by formula (4).
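For a finite impulse response, formula (3) and the representation (4) can be checked against each other numerically: the covariance Σ_n a_{n+t} ā_n must equal the Fourier coefficients of (1/2π)|Σ_n a_n e^{inω}|². The coefficients a_n below are a made-up illustration:

```python
import numpy as np

a = np.array([1.0, 0.5, 0.25])          # impulse response a_n (illustration)

def R(t):
    """Covariance (3): R_eta(t) = sum_n a_{n+t} a_n (a is real here)."""
    return float(sum(a[n + t] * a[n] for n in range(len(a) - t)))

# The same covariance from the spectral density (1/2pi)|sum_n a_n e^{inw}|^2.
# For a trigonometric polynomial, the mean over a uniform grid is exact.
w = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
A = sum(a[n] * np.exp(1j * n * w) for n in range(len(a)))
for t in range(3):
    spectral = np.mean(np.exp(1j * t * w) * np.abs(A) ** 2).real
    assert abs(spectral - R(t)) < 1e-9
```

The factor 1/(2π) is absorbed by taking the mean over the grid instead of the integral over (−π, π).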
Let us find conditions under which f(ω) admits a representation of the form (4). Let H₂ denote the set of all functions f(z) that are analytic in the disk D = {z : |z| < 1} and that satisfy the relation

‖f‖² = lim_{r↑1} ∫_{−π}^{π} |f(re^{iθ})|² dθ < ∞.

If f(z) = Σ_{n=0}^∞ a_n zⁿ, then

f(re^{iθ}) = Σ_{n=0}^∞ a_n rⁿ e^{inθ},

that is, the a_n rⁿ are the Fourier coefficients of the function f(re^{iθ}). On the basis of Parseval's equality,

(1/2π) ∫_{−π}^{π} |f(re^{iθ})|² dθ = Σ_{n=0}^∞ |a_n|² r^{2n}.

From this it is clear that f(z) ∈ H₂ if and only if

Σ_{n=0}^∞ |a_n|² < ∞.

Consequently, for every function f(z) ∈ H₂ it is possible to define a series f(e^{iθ}) = Σ_{n=0}^∞ a_n e^{inθ} that converges in L₂(l), where l is Lebesgue measure on (−π, π). The function f(z) (for |z| < 1) is determined by the function f(e^{iθ}) in accordance with Poisson's formula

f(re^{iθ}) = (1/2π) ∫_{−π}^{π} f(e^{iω}) P(r, θ, ω) dω,   (6)

where

P(r, θ, ω) = (1 − r²) / (1 − 2r cos(θ − ω) + r²) = Σ_{n=−∞}^{∞} r^{|n|} e^{in(θ−ω)}.
Proof of this assertion follows immediately from Parseval's equality. It is shown in the theory of functions (see Privalov, or Hoffman) that if the function f(e^{iω}) in formula (6) is Lebesgue-integrable, the limit lim_{r↑1} f(re^{iθ}) = f(e^{iθ}) exists for almost all θ. The function f(e^{iθ}) is called the limiting value of the function f(z) (for |z| < 1).

Theorem 1. Let f(ω) denote a nonnegative function that is Lebesgue-integrable on the interval [−π, π]. For the existence of a function g(z) ∈ H₂ such that

f(ω) = |g(e^{iω})|²,   (7)

it is necessary and sufficient that

∫_{−π}^{π} |ln f(ω)| dω < ∞.   (8)
Proof of the Necessity. Suppose that f(ω) = |g(e^{iω})|², where g(z) = Σ a_n zⁿ ∈ H₂, and let B denote the set of ω for which |g(re^{iω})| > 1. Then

∫_{−π}^{π} |ln |g(re^{iω})|| dω = 2 ∫_B ln |g(re^{iω})| dω − ∫_{−π}^{π} ln |g(re^{iω})| dω.

From Jensen's formula it follows that

(1/2π) ∫_{−π}^{π} ln |g(re^{iω})| dω = ln ( |a_{n₀}| r^{n₀} Π_k (r/|z_k|) ) ≥ C₁ > −∞

for r close to 1, where a_{n₀} is the first nonzero coefficient of g and the z_k are the zeros of the function g(z) inside the disk |z| < r. Consequently,

∫_{−π}^{π} |ln |g(re^{iω})|| dω ≤ 2 ∫_B ln |g(re^{iω})| dω + C ≤ 2 ∫_B |g(re^{iω})|² dω + C ≤ 4π Σ_{n=0}^∞ |a_n|² + C.

Applying Fatou's lemma, we obtain

∫_{−π}^{π} |ln |g(e^{iω})|| dω ≤ lim inf_{r↑1} ∫_{−π}^{π} |ln |g(re^{iω})|| dω < ∞,

which proves the necessity of condition (8).
which proves the necessity of condition (8). Proof of the Sufficiency. The function u(r, 0)
Suppose that condition (8) is satisfied.
27r JR rz
In f(w)P(r, 0, w)dw
is harmonic in the disk D: I z I < 1. Jensen's inequality that
We note that it follows from 1
rz
u(r, 0) - o.
.
rz
REMARK 2. The function φ(z) is an analytic function in D, and its real part has limiting values ln f(ω). Consequently,

φ(z) = (1/2π) ∫_{−π}^{π} ln f(ω) (e^{iω} + z)/(e^{iω} − z) dω.   (9)
Expanding the function g(z) = exp((1/2)φ(z)) in a power series g(z) = Σ_{n=0}^∞ b_n zⁿ, we obtain the following values for the coefficients a_n in formula (1′): a_n = √(2π) b_n. On the other hand, the expression for g(z) can be transformed as follows. Since

(e^{iω} + z)/(e^{iω} − z) = 1 + 2 Σ_{k=1}^∞ z^k e^{−ikω},

we have

g(z) = exp{ (1/4π) ∫_{−π}^{π} ln f(ω) dω + (1/2π) Σ_{k=1}^∞ d_k z^k },

where

d_k = ∫_{−π}^{π} e^{−ikω} ln f(ω) dω.
Setting

P = exp( (1/4π) ∫_{−π}^{π} ln f(ω) dω ),  exp( (1/2π) Σ_{k=1}^∞ d_k z^k ) = Σ_{k=0}^∞ c_k z^k,

we obtain

g(z) = P Σ_{k=0}^∞ c_k z^k.

Thus

a_k = √(2π) P c_k.

REMARK 3. The function g(z), whose existence was established by Theorem 1, is not uniquely defined. However, if g(z) satisfies the two conditions

a. g(z) ≠ 0 for z ∈ D,
b. g(0) > 0,

it is unique and hence coincides with the function that we have found. To see this, let g₁(z) and g₂(z) denote two such functions. Then ψ(z) = g₁(z)/g₂(z) is analytic in D, it does not vanish in D, and its absolute value is equal to 1 on the boundary of D. The function ln ψ(z) is analytic in D and its real part vanishes on the boundary of D. Therefore ln ψ(z) = ik, where k is real. Since ln ψ(0) is real, we have ln ψ(z) = 0.

Let ζ₁(θ) and ζ₂(θ), for θ ∈ Θ, denote two Hilbert random functions. Let H_{ζ_k} denote the closed linear hull of the system of random variables {ζ_k(θ), θ ∈ Θ} in L₂.
Definition 1. If H_{ζ₁} ⊂ H_{ζ₂}, then the random function ζ₁(θ) is said to be subordinate to ζ₂(θ). On the other hand, if H_{ζ₁} = H_{ζ₂}, then ζ₁(θ) and ζ₂(θ) are said to be equivalent.

REMARK 4. As one can see from the proof of Lemma 1, the sequences ξ(n) and η(n) are equivalent.
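The constant P = exp((1/4π) ∫ ln f(ω) dω) from the construction above can be computed numerically. As an illustration (the density is an assumed example, not from the text), take f(ω) = |1 + 0.5e^{iω}|²/(2π); its canonical factor with g(z) ≠ 0 in D and g(0) > 0 is g(z) = (1 + 0.5z)/√(2π), so P should equal g(0) = 1/√(2π):

```python
import numpy as np

# Assumed example density f(w) = |1 + 0.5 e^{iw}|^2 / (2 pi).
w = np.linspace(-np.pi, np.pi, 8192, endpoint=False)
f = np.abs(1.0 + 0.5 * np.exp(1j * w)) ** 2 / (2 * np.pi)

# P = exp((1/(4 pi)) * integral of ln f); the (1/(2 pi))-integral is a grid mean.
P = np.exp(0.5 * np.mean(np.log(f)))

# ln|1 + 0.5 e^{iw}| has mean zero over the circle, so P = g(0) = 1/sqrt(2 pi).
assert abs(P - 1 / np.sqrt(2 * np.pi)) < 1e-6
```

The mean-zero property of ln|1 + 0.5e^{iω}| is the mean-value property of the harmonic function ln|1 + 0.5z| at z = 0, so the check exercises exactly the Jensen-formula mechanism of the proof.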
Let us turn to processes with continuous time. Consider a process η(t) that has a representation of the form

η(t) = ∫₀^∞ a(τ) dξ(t − τ),   (10)

where ξ(t) is an integrated white noise, that is, a process with orthogonal increments such that MΔξ(t) = 0 and M|Δξ(t)|² = Δt. Corresponding to the process ξ(t) is an orthogonal stochastic measure on the σ-algebra of Lebesgue-measurable sets (cf. Section 3). The integral (10) exists if and only if

∫₀^∞ |a(t)|² dt < ∞.

Under the additional conditions that h(w) ≠ 0 (for Re w > 0) and h(1) > 0, the function h(w) is unique and is given by formula (21).

Theorem 4. For a stationary process η(t) (for −∞ < t < ∞) to have a representation of the form (10), it is necessary and sufficient that it have an absolutely continuous spectrum and that its spectral density satisfy condition (23).

7. PREDICTION AND FILTERING OF STATIONARY PROCESSES
An important problem in the theory of random processes, one that has numerous practical applications, is that of finding as close as possible an estimate of the value of a random variable ζ in terms of the values of the variables ξ_α (for α ∈ 𝔄). It is a matter of finding a function f(ξ_α | α ∈ 𝔄), depending on the ξ_α for α ∈ 𝔄, with least possible error in the approximate equation

ζ ≈ f(ξ_α | α ∈ 𝔄).   (1)
An example of such a problem is the prediction (extrapolation) of a random process. In this case, the problem is that of estimating the value of the random process at the instant t* from its values on some set of instants of time Z preceding t*. Another example is the problem of the filtering of a random process. This problem consists in the following: A process ξ(t) = η(t) + ζ(t), representing the sum of the "useful" signal ζ(t) and the "noise" η(t), is observed at instants t′ ∈ T′. The problem is to separate the noise from the signal; that is, for some t* ∈ Z we need to find the best approximation of ζ(t*) of the form

ζ(t*) ≈ ζ̂ = f(ξ(t′) | t′ ∈ T′).

The statement of the problem is not yet complete, since it has not been shown what is meant by "best approximation." Of course the criterion of optimality depends on the practical nature of the problem in question. With regard to the mathematical theory, the methods of solving this problem are primarily based on the mean-square deviation as a measure of accuracy of the approximate equation (1). The quantity

δ = {M[ζ − f(ξ_α | α ∈ 𝔄)]²}^{1/2}   (2)

is called the mean-square error of the approximate formula (1). The problem consists in determining the function f so that (2) is minimized. In the case in which 𝔄 is a finite set, we mean by f(ξ_α | α ∈ 𝔄) a measurable Borel function of the arguments ξ_α for α ∈ 𝔄. On the other hand, if 𝔄 is infinite, this symbol denotes a random variable that is measurable with respect to the σ-algebra 𝔉 = σ(ξ_α, α ∈ 𝔄).
In what follows we shall assume that both ζ and f(ξ_α | α ∈ 𝔄) have second-order moments. We define

γ = M{ζ | 𝔉}   (3)

(cf. Section 8, Chapter III). Then

δ² = M{ζ − f(ξ_α | α ∈ 𝔄)}² = M(ζ − γ)² + 2M(ζ − γ)(γ − f(ξ_α | α ∈ 𝔄)) + M(γ − f(ξ_α | α ∈ 𝔄))².
Since γ − f(ξ_α | α ∈ 𝔄) is 𝔉-measurable, we have

M(ζ − γ)(γ − f(ξ_α | α ∈ 𝔄)) = M M{(ζ − γ)(γ − f(ξ_α | α ∈ 𝔄)) | 𝔉} = M{(γ − f(ξ_α | α ∈ 𝔄)) M{(ζ − γ) | 𝔉}} = 0.

Thus

δ² = M(ζ − γ)² + M(γ − f(ξ_α | α ∈ 𝔄))²,

from which we get:

Theorem 1. Suppose that a random variable ζ has a finite second-order moment. An approximation of ζ with minimum mean-square error obtained with the aid of a σ{ξ_α, α ∈ 𝔄}-measurable random variable is unique (mod P) and is given by the formula

γ = M{ζ | 𝔉}.

REMARK 1. The estimate ζ̂ = γ of the random variable ζ is unbiased; that is,

Mγ = M M{ζ | 𝔉} = Mζ,

and the variables ζ − γ and ξ_α are, for arbitrary α ∈ 𝔄, uncorrelated:

M(ζ − γ) ξ_α = M M{(ζ − γ) ξ_α | 𝔉} = M ξ_α M{(ζ − γ) | 𝔉} = 0.
Unfortunately, the use of Theorem 1 to obtain actual approximation formulas is extremely difficult in practice. In the case of Gaussian random variables, however, we can proceed further. We note first of all that the simplest statement of the problem, leading in a number of cases to a final and analytically attainable solution, is the problem of finding the best approximation ζ̂ not in the class of all measurable functions of given random variables but in the narrower class of linear functions. More precisely, this means: Let {U, 𝔖, P} denote a basic probability space. Let us suppose that the variables ξ_α and ζ have finite second-order moments. We introduce the Hilbert space H{ξ_α, α ∈ 𝔄}, which is the closed linear hull of the variables ξ_α, for α ∈ 𝔄, together with all constants. The subspace H{ξ_α, α ∈ 𝔄} can be regarded as the set of all linear (nonhomogeneous) functions of the ξ_α with finite variances. The best linear approximation ζ̂ to the random variable ζ is that element of H{ξ_α, α ∈ 𝔄} that lies nearest ζ:

M|ζ̂ − ζ|² ≤ M|ζ′ − ζ|²

for arbitrary ζ′ ∈ H{ξ_α, α ∈ 𝔄}.
We know from the theory of Hilbert spaces (cf. Section 1 of this chapter) that the problem of finding the element ζ̂ in the subspace H₀ that lies nearest the given element ζ always has a unique solution: specifically, ζ̂ is the projection of ζ onto H₀. The element ζ̂ can always be determined (uniquely) from the system of equations (ζ − ζ̂, ζ′) = 0 for arbitrary ζ′ ∈ H₀, where (x, y) denotes the scalar product of x and y. In the present case this system of equations can be written in the form

M(ζ − ζ̂) ξ̄_α = 0,  α ∈ 𝔄.   (4)

Since 1 belongs to H{ξ_α, α ∈ 𝔄}, we have Mζ̂ = Mζ, so that the best linear estimates are necessarily unbiased. Furthermore, we may assume that Mξ_α = 0 for arbitrary α. Therefore, in what follows we shall confine ourselves to a study of the subspace of random variables in L₂{U, 𝔖, P} with mathematical expectation 0.
Of course, we do not always have grounds for assuming that a linear estimate of the quantity ζ is acceptable. For example, if

ξ(n) = e^{i(nφ + ψ)},

where φ and ψ are independent and uniformly distributed on (−π, π), then M(ξ(n) ξ̄(m)) = 0 for n ≠ m, and the best linear approximation of the variable ξ(m) from the values of all ξ(n) for n ≠ m is ξ̂(m) = 0; that is, it does not use the values of the variables ξ(n) at all, whereas any pair of observations ξ(k) and ξ(k + 1) is sufficient to determine the entire sequence ξ(n) precisely:

ξ(n) = (ξ(k + 1)/ξ(k))^{n−k} ξ(k).
Let us suppose now that all finite-dimensional distributions of the system {ζ, ξ_α, α ∈ 𝔄} are normal and Mξ_α = Mζ = 0. In this case it follows from the uncorrelatedness of the variables ζ − ζ̂ and ξ_α that they are independent. Therefore ζ − ζ̂ is independent of the σ-algebra 𝔉, and

M{ζ | 𝔉} = M{ζ − ζ̂ + ζ̂ | 𝔉} = M(ζ − ζ̂) + ζ̂ = ζ̂.

Theorem 2. For a system of Gaussian random variables (ζ, ξ_α, α ∈ 𝔄), the best approximation (from the standpoint of mean-square deviation) of the variable ζ with the aid of an 𝔉-measurable function coincides with the best linear approximation ζ̂ in H{ξ_α, α ∈ 𝔄}.
We now consider a number of particular problems on the construction of best linear approximations.

1. The number of random variables ξ_α (for α = 1, 2, …, n) is finite. This problem has a simple solution, as we know from linear algebra. Assuming that the ξ_α are linearly independent, we can construct the projection ζ̂ of the variable ζ onto the finite-dimensional space H₀ generated by the quantities ξ_α (for α = 1, …, n) by means of the formula
ζ̂ = −(1/Γ) det [ 0, ξ₁, …, ξₙ ; (ζ, ξ₁), (ξ₁, ξ₁), …, (ξₙ, ξ₁) ; … ; (ζ, ξₙ), (ξ₁, ξₙ), …, (ξₙ, ξₙ) ]

(the rows of the determinant are separated by semicolons), where Γ = Γ(ξ₁, ξ₂, …, ξₙ) is the Gram determinant of the system of vectors ξ₁, ξ₂, …, ξₙ,

Γ(ξ₁, ξ₂, …, ξₙ) = det ‖(ξ_i, ξ_j)‖_{i,j=1}^{n},

and where (ξ, η) = M(ξη̄). The mean-square error δ of the approximate equation ζ ≈ ζ̂ is equal to the length of the perpendicular dropped from the vector ζ onto the space H₀ and is given by the formula

δ² = Γ(ξ₁, ξ₂, …, ξₙ, ζ) / Γ(ξ₁, ξ₂, …, ξₙ).
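These formulas can be exercised on a small example by modeling random variables as vectors on a finite probability space, with (ξ, η) = Mξη̄ realized as an average of products. The vectors below are made up, and means are taken to be zero:

```python
import numpy as np

# Random variables as vectors in R^4 with (x, y) = x . y / 4
# (uniform probability on four outcomes; all means are zero here).
xi = np.array([[1.0, -1.0, 2.0, -2.0],
               [0.5,  0.5, -0.5, -0.5]])
zeta = np.array([2.0, -1.0, 1.0, -2.0])

def inner(x, y):
    return x @ y / 4.0

G = np.array([[inner(a, b) for b in xi] for a in xi])   # Gram matrix
r = np.array([inner(zeta, a) for a in xi])              # (zeta, xi_k)
coef = np.linalg.solve(G, r)                            # normal equations (4)
zeta_hat = coef @ xi                                    # the projection

delta2 = inner(zeta - zeta_hat, zeta - zeta_hat)

# Gram-determinant formula: delta^2 = Gamma(xi_1, xi_2, zeta)/Gamma(xi_1, xi_2).
full = np.vstack([xi, zeta])
G_full = np.array([[inner(a, b) for b in full] for a in full])
assert np.isclose(delta2, np.linalg.det(G_full) / np.linalg.det(G))
```

Solving the normal equations and evaluating the determinant ratio are two routes to the same δ², which is the content of the perpendicular-length formula above.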
2. Consider the problem of approximating a random variable ζ from the results of observation of a mean-square continuous random process ξ(t) on a finite interval of time T = [a, b]. Let R(t, τ) denote the covariance function of the process ξ(t). On the basis of Theorem 2 of Section 2, the process ξ(t) can be expanded in a series

ξ(t) = Σ_{k=1}^∞ √(λ_k) φ_k(t) ξ_k,

where {φ_k(t)} is an orthonormal sequence of eigenfunctions and the λ_k are the eigenvalues of the covariance function on (a, b),

λ_k φ_k(t) = ∫ₐᵇ R(t, s) φ_k(s) ds,

and where {ξ_k} is a normalized uncorrelated sequence, M ξ_k ξ̄_r = δ_{kr}. Obviously {ξ_k}, for k = 1, 2, …, constitutes a basis in H{ξ(t), t ∈ (a, b)}. Therefore

ζ̂ = Σ_{n=1}^∞ c_n ξ_n,

where

c_n = M ζ ξ̄_n = (1/√(λ_n)) ∫ₐᵇ M(ζ ξ̄(t)) φ_n(t) dt.

The mean-square error δ of the approximation can be found from the formula

δ² = M|ζ|² − M|ζ̂|² = M|ζ|² − Σ_{n=1}^∞ |c_n|².

In practice, the application of this method is made more difficult by the complication of calculating the eigenfunctions and eigenvalues of the kernel R(t, τ).
3. The integral equation for filtration and prediction. Wiener's method. Let ξ(t) and ζ(t), for t ∈ Z, denote two Hilbert random functions. Let us suppose that the process ξ(t) is observed over some set T of values of the argument t. Consider the problem
of determining the best approximation of the value of ζ(t₀), for t₀ ∈ Z, from the observed values of ξ(t), where t ∈ T. If we assume that the desired approximation is of the form

ζ̂(t₀) = ∫_T c(θ) ξ(θ) m(dθ),   (5)

where m is some measure on T, and if the conditions under which the above integral is meaningful are satisfied, then equation (4) takes the form

∫_T c(θ) R_{ξξ}(θ, t) m(dθ) = R_{ζξ}(t₀, t),  t ∈ T,   (6)

where R_{ξξ} is the covariance function of ξ(t) and R_{ζξ} is the mutual covariance function of ζ(t) and ξ(t). Equation (6) is a Fredholm integral equation of the first kind with symmetric (Hermitian) kernel. By no means does it always have a solution. However, if

∫_T M|ξ(t)|² m(dt) < ∞,

the integral equation (6) has a solution c(θ) ∈ L₂{m} if and only if the best linear approximation of ζ(t₀) is of the form (5).

Suppose that Z is the real axis, that T = (a, b), that the processes ξ and ζ are stationary and stationarily connected (in the broad sense), and that the measure m is Lebesgue measure. Then equation (6) takes the form

∫ₐᵇ c(θ) R_{ξξ}(θ − t) dθ = R_{ζξ}(t₀ − t),  t ∈ (a, b).   (7)
If ζ(t) = ξ(t) (for −∞ < t < ∞) and t₀ > b, that is, if the problem consists in finding an approximation for the quantity ξ(t₀) from the values of ξ(t) in the past, we shall call the problem one of pure prediction.
Let us look in greater detail at the problem of prediction of the variable ζ(t + T) from the results of observation of the process ξ(θ) up to the instant t, that is, for θ ≤ t. Let us treat the predicting variable ζ̂(t) as a function of t for fixed T. We can easily see that ζ̂(t) as defined by equation (5) is a stationary process. To see this, note that equation (7) is of the form

∫_{−∞}^{t} c_t(θ) R_{ξξ}(θ − u) dθ = R_{ζξ}(t + T − u),  u ≤ t.

The change of variables t − u = v, t − θ = τ transforms the preceding equation into the following:

∫₀^∞ c_t(t − τ) R_{ξξ}(v − τ) dτ = R_{ζξ}(T + v),  v ≥ 0.   (8)

From this we see that the function c_t(t − τ) is independent of t. Let us set c(τ) = c_t(t − τ). Equation (8) can now be written

∫₀^∞ c(τ) R_{ξξ}(v − τ) dτ = R_{ζξ}(T + v),  v ≥ 0,   (9)

and formula (5) for the predicting function takes the form

ζ̂(t) = ∫_{−∞}^{t} c(t − θ) ξ(θ) dθ = ∫₀^∞ c(τ) ξ(t − τ) dτ.   (10)
Thus the process ζ̂(t) = ζ̂_T(t) is stationary. It follows from formula (10) that c(t) is the impulse transfer function of a physically realizable filter that transforms the observed process into the best approximation of the quantity ζ(t + T).

It is easy to exhibit an expression for the mean-square error δ of the predicting function ζ̂(t). Since δ² is the square of the length of the perpendicular dropped from the vector ζ(t + T) onto H{ξ(τ), τ ≤ t}, we have

δ² = M|ζ(t + T)|² − M|ζ̂(t)|²,  M|ζ̂(t)|² = ∫₀^∞∫₀^∞ R_{ξξ}(τ₁ − τ₂) c(τ₁) c̄(τ₂) dτ₁ dτ₂.

Setting R_{ζζ}(0) = M|ζ(t)|² = σ² and shifting to the spectral representation of the covariance function R_{ξξ}(t), we obtain

δ² = σ² − ∫ |C(iω)|² dF_{ξξ}(ω),   (12)

where F_{ξξ}(ω) is the spectral function of the process ξ(t) and

C(iω) = ∫₀^∞ c(t) e^{−iωt} dt.
We shall explain briefly a method proposed by N. Wiener for solving equation (9). Suppose that the spectrum of the process ξ(t) is absolutely continuous and that the spectral density f_{ξξ}(ω) admits a factorization (cf. Theorem 3, Section 6):

f_{ξξ}(ω) = |h(iω)|²,  h(z) = (1/√(2π)) ∫_0^∞ a(t) e^{−zt} dt,  Re z > 0.

It follows from Parseval's equality for the Fourier transform that

R_{ξξ}(t) = ∫_{−∞}^∞ e^{iωt} |h(iω)|² dω = ∫_0^∞ a(t + u) ā(u) du.
Let us suppose also that the mutual spectral function of the processes
7.
PREDICTION AND FILTERING OF STATIONARY PROCESSES
233
ζ(t) and ξ(t) is absolutely continuous and that its density f_{ζξ}(ω) satisfies the condition

f_{ζξ}(ω)/h(iω) = k(iω) ∈ L₂.  (13)

Then

R_{ζξ}(t) = ∫_{−∞}^∞ e^{iωt} f_{ζξ}(ω) dω = ∫_{−∞}^∞ e^{iωt} k(iω) h(iω) dω = ∫_0^∞ b(t + u) ā(u) du,

where

b(t) = (1/√(2π)) ∫_{−∞}^∞ e^{iωt} k(iω) dω.
With the aid of the expressions obtained we can rewrite (9) as follows:

∫_0^∞ [b(T + v + u) − ∫_0^∞ c(τ) a(v − τ + u) dτ] ā(u) du = 0,  v ≥ 0.  (14)

A sufficient condition for equation (14) to hold is that c(t) satisfy the equation

b(T + x) = ∫_0^∞ c(v) a(x − v) dv,  x ≥ 0.  (15)
Equation (15) is of the same type as equation (9) except for the important fact that the function a(t) vanishes for negative values of t. If we write equation (15) in the form

b(T + x) = ∫_{−∞}^∞ c(v) a(x − v) dv,  x ≥ 0,  (16)

we can immediately solve it with the aid of the Laplace transform. Multiplying equation (16) by e^{−zx} and integrating from 0 to ∞, we obtain B_T(z) = C(z) h(z), where

B_T(z) = (1/√(2π)) ∫_0^∞ b(T + x) e^{−zx} dx,  C(z) = (1/√(2π)) ∫_0^∞ c(t) e^{−zt} dt.
Thus

C(z) = B_T(z)/h(z),  c(t) = (1/√(2π)) ∫_{−∞}^∞ e^{iωt} (B_T(iω)/h(iω)) dω,  (17)

where the expression for B_T(z), for Re z > 0, can be written in the form

B_T(z) = (1/2π) ∫_{−∞}^∞ (e^{iωT} f_{ζξ}(ω)/h(iω)) · dω/(z − iω).  (18)
Formulating the assumptions under which formulas (17) and (18) are valid is extremely laborious. In solving specific problems it is simpler to verify directly the validity of the proposed transformations that lead to the solution of the problem.

4. Yaglom's method. With this method, in contrast to that of Wiener, we seek not the impulse transfer function of the optimum filter, which may not even exist, but instead its frequency characteristic. We shall not give general formulas for solving the problem but shall present only a method of choosing the desired function by starting with the requirements that it must satisfy. In many important cases this choice is easy to make. Suppose that a two-dimensional stationary process (ξ(t), ζ(t)) has a spectral representation of the form

ξ(t) = ∫_{−∞}^∞ e^{iωt} μ(dω),  ζ(t) = ∫_{−∞}^∞ e^{iωt} μ_ζ(dω),

with spectral density matrix

( f_{ξξ}(ω)  f_{ξζ}(ω)
  f_{ζξ}(ω)  f_{ζζ}(ω) ).
As before, let us consider the problem of the best approximation of the quantity ζ(t + T) from the values of the process ξ(τ), for τ ≤ t. The predicting process ζ̂(t) is subordinate to ξ(t). Therefore

ζ̂(t) = ∫_{−∞}^∞ e^{iωt} c(iω) μ(dω),  ∫_{−∞}^∞ |c(iω)|² f_{ξξ}(ω) dω < ∞.  (19)
The equation defining the process ζ̂(t), namely

M ζ(t + T) ξ̄(τ) = M ζ̂(t) ξ̄(τ),  τ ≤ t,

takes the form

∫_{−∞}^∞ e^{iωv} {e^{iωT} f_{ζξ}(ω) − c(iω) f_{ξξ}(ω)} dω = 0,  v = t − τ ≥ 0.  (20)
In addition to conditions (19) and (20), we also have the requirement that c(iω) be the frequency characteristic of a physically realizable filter. These conditions will be satisfied if

a. the function f_{ξξ}(ω) is bounded;
b. c(iω) is the limiting value of a function c(z) ∈ H₂ (cf. Section 6);
c. ψ(iω) = e^{iωT} f_{ζξ}(ω) − c(iω) f_{ξξ}(ω) is the limiting value of a function ψ(z) ∈ H₂⁻, where H₂⁻ is defined analogously to H₂ except that H₂⁻ consists of functions that are analytic in the left half-plane.

To see the validity of this assertion, note that condition (b) implies that ∫_{−∞}^∞ |c(iω)|² dω < ∞, and this, together with condition
(a), ensures satisfaction of condition (19). Also, condition (b) implies that c(iω) is the frequency characteristic of a physically realizable filter. It follows from condition (c) that e^{iωT} f_{ζξ}(ω) − c(iω) f_{ξξ}(ω) is the Fourier transform of a function that, by virtue of equation (20), vanishes for positive values of the argument.

If we confine ourselves to condition (b), we rule out filters whose frequency characteristics can increase at infinity. Such frequency characteristics correspond to operations connected with differentiation of the process ξ(t) and are often encountered in the construction of optimum filters. Therefore it is desirable to replace condition (b) with a less restrictive one. Let us suppose that c(z) is a function that is analytic in the right half-plane and that |c(z)| increases no faster than the r-th power of |z|, for some r, as |z| → ∞. We define the functions c_n(z) by

c_n(z) = c(z) (1 + z/n)^{−(r+1)}.

Since c_n(z) ∈ H₂, we have

lim_{n→∞} ∫_{−∞}^∞ |c(iω) − c_n(iω)|² f_{ξξ}(ω) dω = 0

if condition (19) is satisfied. Thus c(iω) is the limit in L₂{F_{ξξ}} of frequency characteristics of admissible physically realizable filters. Therefore c(iω) is also the frequency characteristic of such a filter. Thus we have proved:

Theorem 3. If the spectral density f_{ξξ}(ω) of a process ξ(t) is bounded, then the three conditions

a. ∫_{−∞}^∞ |c(iω)|² f_{ξξ}(ω) dω < ∞;
b. c(iω) is the limiting value of a function c(z) that is analytic in the right half-plane and that increases no faster than some power of |z| as |z| → ∞;
c. ψ(iω) = e^{iωT} f_{ζξ}(ω) − c(iω) f_{ξξ}(ω) is the limiting value of a function ψ(z) ∈ H₂⁻;

determine uniquely the frequency characteristic c(iω) of the optimum filter approximating the quantity ζ(t + T). The mean-square error δ of the best approximation is equal to

δ = {M|ζ(t + T)|² − ∫_{−∞}^∞ |c(iω)|² f_{ξξ}(ω) dω}^{1/2}.  (21)

Example 1.
Consider the problem of a simple prediction of a
process ξ(t) (where ζ(t) = ξ(t)) with covariance function R(t) = σ² e^{−α|t|}, α > 0. The spectral density is easily found:

f_{ξξ}(ω) = (σ²α/π) · 1/(ω² + α²).

The analytic continuation of the function ψ(iω) is of the form

ψ(z) = (c(z) − e^{zT}) (σ²α/π) · 1/((z + α)(z − α)).

This function ψ(z) has only one pole in the left half-plane, z = −α. To neutralize it with the aid of a function c(z) that is analytic in the right half-plane, it suffices to set c(z) = const = e^{−αT}. Here condition (a) of Theorem 3 is satisfied. Thus

ζ̂(t) = e^{−αT} ∫_{−∞}^∞ e^{iωt} μ(dω) = e^{−αT} ξ(t);

that is, the best formula for the prediction of the quantity ξ(t + T) is

ξ̂(t + T) = e^{−αT} ξ(t),

which depends only on the value of ξ(τ) at the last observed instant of time. The mean-square error of the extrapolation is equal to

δ = σ √(1 − e^{−2αT}).

Example 2. Let us consider the problem of pure prediction of a process ξ(t), that is, of finding an estimate for ξ(t + T) from observed values of ξ(τ) for τ ≤ t. If the spectrum of the process ξ(t) is absolutely continuous and if condition (23) of Section 6 is satisfied, then the spectral density of the process admits a factorization f_{ξξ}(ω) = |h(iω)|², where h(z) belongs to H₂ and has no zeros in the right half-plane. Let us consider the case, very important for practical applications, in which h(z) is a rational function h(z) = P(z)/Q(z), where P(z) is a polynomial of degree m and Q(z) is a polynomial of degree n > m. Let us suppose also that the spectral density f_{ξξ}(ω) is bounded and that it does not vanish. Then the zeros of the polynomials P(z) and Q(z) lie in the left half-plane. Let P(z) and Q(z) be represented in the forms
P(z) = A ∏_j (z − z_j)^{α_j},  Q(z) = B ∏_{k=1}^q (z − z̃_k)^{β_k},

where

Σ_j α_j = m,  Σ_{k=1}^q β_k = n.

Let us define

P₁(z) = (−1)^m Ā ∏_j (z + z̄_j)^{α_j},  Q₁(z) = (−1)^n B̄ ∏_{k=1}^q (z + z̄̃_k)^{β_k}.
The analytic continuation of the function ψ(iω) is of the form

ψ(z) = (e^{zT} − c(z)) · (P(z) P₁(z))/(Q(z) Q₁(z)).

The function c(z) must be analytic in the right half-plane and ψ(z) must be analytic in the left half-plane. Therefore c(z) can have singularities only at the zeros of the polynomial P(z), where it may have poles. The order of such a pole cannot exceed the order of the corresponding zero of P(z). Therefore c(z) = M(z)/P(z), where M(z) is an analytic function that has no singularities for finite z. Since c(z) has no more than a power order of growth, M(z) is a polynomial. In view of the square-integrability of the absolute value of the function

c(iω) h(iω) = M(iω)/Q(iω),

the degree m₁ of the polynomial M(z) does not exceed n − 1, that is, m₁ ≤ n − 1. On the other hand, the indicated choice of the function c(z) ensures satisfaction of conditions (a) and (b) of Theorem 3. We must choose the polynomial M(z) in such a way that the function

ψ(z) = (e^{zT} P(z) − M(z)) · P₁(z)/(Q(z) Q₁(z)),

or, what amounts to the same thing, the function

ψ₁(z) = (e^{zT} P(z) − M(z))/Q(z),

has no poles in the left half-plane. For this it is necessary and sufficient that
(dⁱM(z)/dzⁱ)|_{z = z̃_k} = (dⁱ(e^{zT} P(z))/dzⁱ)|_{z = z̃_k},  i = 0, 1, …, β_k − 1,  k = 1, …, q.  (21′)
The problem of constructing a polynomial M(z) satisfying conditions (21′) is the usual problem in the theory of interpolation, and it always has a unique solution in the class of polynomials of degree at most n − 1. In finding the polynomial M(z), we automatically find the frequency characteristic of the optimum predicting filter, c(iω) = M(iω)/P(iω).
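For simple zeros of Q(z), conditions (21′) reduce to plain interpolation: M(z̃_k) = e^{z̃_k T} P(z̃_k) at each root z̃_k. A minimal numerical sketch (the polynomials, roots, and horizon T below are assumed for illustration, not taken from the text):

```python
import numpy as np

# Hypothetical example: h(z) = P(z)/Q(z), with Q having simple roots
# in the left half-plane.  For simple roots, (21') says the polynomial
# M of degree <= n-1 must interpolate exp(z*T)*P(z) at the roots of Q.
T = 0.7
P = np.poly1d([1.0, 0.5])                  # P(z) = z + 0.5      (m = 1)
q_roots = np.array([-1.0, -2.0, -3.0])     # roots of Q, all in Re z < 0
n = len(q_roots)                           # n = 3 > m

values = np.exp(q_roots * T) * P(q_roots)  # required values M(z_k)
M = np.poly1d(np.polyfit(q_roots, values, n - 1))

# psi_1 = (exp(zT)P(z) - M(z))/Q(z) then has no poles at the roots of Q
residuals = np.exp(q_roots * T) * P(q_roots) - M(q_roots)
assert np.allclose(residuals, 0.0, atol=1e-9)
```

With multiple roots one would match derivatives as well (Hermite interpolation), exactly as (21′) states.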
We may also use the following procedure for determining the function c(z). Let us decompose the functions P(z)Q⁻¹(z) and M(z)Q⁻¹(z) into partial fractions:
P(z)/Q(z) = Σ_{k=1}^q Σ_{j=1}^{β_k} c_{kj}/(z − z̃_k)^j,  M(z)/Q(z) = Σ_{k=1}^q Σ_{j=1}^{β_k} γ_{kj}/(z − z̃_k)^j.

For the function ψ₁(z) not to have poles at the points z̃_k, for k = 1, …, q, it is necessary and sufficient that

(dⁱ/dzⁱ)[(z − z̃_k)^{β_k} ψ₁(z)]|_{z = z̃_k} = 0,  i = 0, 1, …, β_k − 1,

where

ψ₁(z) = Σ_{k=1}^q Σ_{j=1}^{β_k} (c_{kj} e^{zT} − γ_{kj})/(z − z̃_k)^j.

A simple calculation shows that

γ_{kj} = [c_{kj} + T c_{k,j+1} + (T²/2!) c_{k,j+2} + ⋯ + (T^{β_k−j}/(β_k − j)!) c_{k,β_k}] e^{z̃_k T},  j = 1, …, β_k,  k = 1, …, q.

Knowing the coefficients γ_{kj}, we can write the expression for c(iω):

c(iω) = (1/h(iω)) Σ_{k=1}^q Σ_{j=1}^{β_k} γ_{kj}/(iω − z̃_k)^j.  (22)
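Example 1's prediction rule can be checked by discretizing the observed past and solving the normal equations directly; because of the exponential covariance, all the weight falls on the last observed value. The grid and parameters below are assumed for illustration:

```python
import numpy as np

# Check of Example 1: for R(t) = sigma^2 * exp(-alpha*|t|), the best
# linear predictor of xi(t+T) from samples xi(t), xi(t-h), ...,
# xi(t-(n-1)h) puts weight exp(-alpha*T) on xi(t) and zero elsewhere,
# with error sigma*sqrt(1 - exp(-2*alpha*T)).
sigma, alpha, T, h, n = 1.3, 0.8, 0.5, 0.25, 12

lags = np.arange(n) * h
R = sigma**2 * np.exp(-alpha * np.abs(lags[:, None] - lags[None, :]))
r = sigma**2 * np.exp(-alpha * (T + lags))   # Cov(xi(t+T), xi(t-k*h))

c = np.linalg.solve(R, r)                    # normal equations R c = r
assert np.isclose(c[0], np.exp(-alpha * T))
assert np.allclose(c[1:], 0.0, atol=1e-10)

err2 = sigma**2 - r @ c                      # squared mean-square error
assert np.isclose(err2, sigma**2 * (1 - np.exp(-2*alpha*T)))
```

The cross-covariance vector here is exactly e^{−αT} times the first column of R, which is why the solution is exact rather than approximate.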
Example 3. Suppose that we are observing a process ζ(τ) (for τ ≤ t) but that the results of our measurements of ζ(τ) are distorted by various errors, so that the observed values yield a function ξ(τ), for τ ≤ t, different from ζ(τ). Let us suppose that the magnitude of the error (or, as we say, the noise) η(t) = ξ(t) − ζ(t) is a stationary process with mean value 0. Suppose that we wish to find an estimate of the value of ζ(t + T) from the results of observation of the process ξ(τ) = ζ(τ) + η(τ) for τ ≤ t. Such problems are called filtering or smoothing problems (we say that we need to filter the noise η(t) out of the process ξ(t), or that the process ξ(t) must be "smoothed," that is, the irregular noise needs to be subtracted from it). Here, for T > 0, we have a problem of filtering with prediction and, for T < 0, a problem of filtering with lag.

Let us suppose that the noise η(t) and the process ζ(t) are uncorrelated and have spectral densities f_{ηη}(ω) and f_{ζζ}(ω). Then

R_{ξξ}(t) = R_{ηη}(t) + R_{ζζ}(t),  f_{ξξ}(ω) = f_{ηη}(ω) + f_{ζζ}(ω).

Since R_{ζξ}(t) = R_{ζζ}(t), there exists a mutual spectral density of the processes ζ(t) and ξ(t), and we have f_{ζξ}(ω) = f_{ζζ}(ω). Suppose that

f_{ζζ}(ω) = c₁/(ω² + α²),  f_{ηη}(ω) = c₂/(ω² + β²).

Then
f_{ξξ}(ω) = c₃(ω² + γ²)/((ω² + α²)(ω² + β²)),  c₃ = c₁ + c₂,  γ² = (c₁β² + c₂α²)/(c₁ + c₂).

For the function ψ(z) we obtain the expression

ψ(z) = (−c₁ e^{zT}(z² − β²) + c₃ c(z)(z² − γ²))/((z² − α²)(z² − β²)).

Suppose that T > 0. The function ψ(z) must be analytic in the left half-plane and it must belong to H₂⁻. For this it is necessary that the numerator vanish at the points z = −α and z = −β. This leads us to the equations

c(−α) = (c₁/c₃) e^{−αT} (α² − β²)/(α² − γ²),  c(−β) = 0.  (23)
Furthermore, c(z) must be analytic in the left half-plane (and also in the right half-plane, by virtue of condition (b)) except at the single point z = −γ, where it may have a simple pole. Thus c(z) = φ(z)/(z + γ), where φ(z) is an entire function. From the condition of finiteness of the integral

∫_{−∞}^∞ |c(iω)|² f_{ξξ}(ω) dω,

it follows that φ(z) is a linear function, φ(z) = Az + B. From (23) we obtain

c(z) = A (z + β)/(z + γ),  A = (c₁/c₃) · ((β + α)/(γ + α)) e^{−αT}.

Therefore the formula for optimum smoothing with prediction is of the form

ζ̂_T(t) = A ∫_{−∞}^∞ e^{iωt} ((iω + β)/(iω + γ)) μ(dω).

Remembering that (iω + γ)⁻¹ is the frequency characteristic of a physically realizable filter with impulse transfer function e^{−γt}, we obtain

ζ̂_T(t) = (c₁/c₃) · ((β + α)/(γ + α)) e^{−αT} [ξ(t) + (β − γ) ∫_0^∞ e^{−γs} ξ(t − s) ds].  (24)
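The constants of Example 3, with the values as reconstructed above, can be verified directly against the interpolation conditions (23); the numerical parameters below are assumed:

```python
import numpy as np

# Consistency check for Example 3 (T > 0), assumed parameter values:
# c(z) = A(z+beta)/(z+gamma) with
# A = (c1/c3)*(alpha+beta)/(alpha+gamma)*exp(-alpha*T)
# must satisfy c(-beta) = 0 and condition (23) at z = -alpha.
c1, c2, alpha, beta, T = 2.0, 0.5, 1.1, 1.9, 0.4
c3 = c1 + c2
gamma = np.sqrt((c1 * beta**2 + c2 * alpha**2) / c3)

A = (c1 / c3) * (alpha + beta) / (alpha + gamma) * np.exp(-alpha * T)
c = lambda z: A * (z + beta) / (z + gamma)

assert np.isclose(c(-beta), 0.0)
target = (c1 / c3) * np.exp(-alpha * T) * (alpha**2 - beta**2) / (alpha**2 - gamma**2)
assert np.isclose(c(-alpha), target)
```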
For T < 0, formula (24) is not valid. Formally, this is connected with the fact that the function ψ(z) is not bounded in the left half-plane in this case. For T < 0 the function ψ(z) may be determined from the following considerations. Suppose that

ψ₁(z) = −c₁ e^{zT}(z² − β²) + c₃ c(z)(z² − γ²).

Then c(z) must be analytic in the left half-plane except at the point z = −γ, and we have ψ₁(−α) = ψ₁(−β) = 0. Since

c(z) = (ψ₁(z) + c₁ e^{zT}(z² − β²))/(c₃(z² − γ²))

and c(z) is analytic in the right half-plane, it follows that ψ₁(z) is an entire function and that

ψ₁(γ) = −c₁ e^{γT}(γ² − β²).  (25)

If we set ψ₁(z) = A(z)(z + α)(z + β), the function A(z) must be an entire function. It follows from condition (a) of Theorem 3 that A(z) = const = A. The value of A is determined from equation (25):

A = −c₁ e^{γT} (γ − β)/(γ + α).

From this we get

c(iω) = (c₁/c₃) · ((γ + α)(ω² + β²) e^{iωT} − (β − γ)(iω + α)(iω + β) e^{γT})/((γ + α)(ω² + γ²)).  (26)
For the prediction and filtering of stationary sequences we apply methods analogous to those discussed for processes with continuous time. The general solution of the problem of prediction of stationary sequences is given in the next section. Here we shall confine ourselves to a single example.

Example 4. A process of autoregression is defined as a stationary sequence ξ(t) satisfying the finite-difference equation

a₀ξ(t) + a₁ξ(t − 1) + ⋯ + a_p ξ(t − p) = η(t)  (27)
and subordinate to η(t), where η(t) is an uncorrelated sequence such that Mη(t) = 0 and Dη(t) = σ². Suppose that

η(t) = ∫_{−π}^π e^{itω} ζ(dω)

is the spectral representation of the sequence η(t) and that ζ(ω) is a process with independent increments and with structural function (1/2π) l(A ∩ B), where l denotes Lebesgue measure. Then the spectral representation of the sequence ξ(t) must have the form

ξ(t) = ∫_{−π}^π e^{itω} φ(ω) ζ(dω),  where  ∫_{−π}^π |φ(ω)|² dω < ∞.  (28)
Substituting (28) into (27), we obtain

∫_{−π}^π e^{itω} P(e^{−iω}) φ(ω) ζ(dω) = ∫_{−π}^π e^{itω} ζ(dω),

where P(z) = Σ_{k=0}^p a_k z^k. From this it follows that

φ(ω) = 1/P(e^{−iω})  (mod l).

Let us suppose that the function P(z) has no zeros in the closed disk |z| ≤ 1. Then 1/P(z) ∈ H₂. If
1/P(z) = Σ_{k=0}^∞ b_k z^k  (b₀ = 1/a₀),

then

ξ(t) = Σ_{n=0}^∞ b_n η(t − n),

and we have obtained a representation of the sequence ξ(t) in the form of the response of a physically realizable filter to the uncorrelated sequence η(t).
Since

ξ(t) = (1/a₀)[η(t) − a₁ξ(t − 1) − ⋯ − a_p ξ(t − p)],  (29)

the optimum prediction ξ̂(t) of ξ(t) from given ξ(t − n), where n = 1, 2, …, is of the form

ξ̂(t) = −(1/a₀)(a₁ξ(t − 1) + a₂ξ(t − 2) + ⋯ + a_p ξ(t − p)).

The minimum mean-square error of the prognosis is equal to

{M|ξ(t) − ξ̂(t)|²}^{1/2} = σ/|a₀|.
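The autoregressive predictor can be sketched numerically (coefficients and noise scale below are assumed); the prediction residual is exactly η(t)/a₀, of standard deviation σ/|a₀|:

```python
import numpy as np

# Sketch of Example 4 with assumed coefficients: for the autoregression
# a0*x(t) + a1*x(t-1) + a2*x(t-2) = eta(t), the best one-step predictor
# is xhat(t) = -(a1*x(t-1) + a2*x(t-2))/a0.
a = np.array([1.0, -0.6, 0.25])               # a0, a1, a2  (p = 2)
assert np.all(np.abs(np.roots(a[::-1])) > 1)  # P(z) != 0 for |z| <= 1

rng = np.random.default_rng(0)
sigma = 0.7
eta = rng.normal(0.0, sigma, size=500)
x = np.zeros_like(eta)
for t in range(2, len(x)):                    # run the recursion (27)
    x[t] = (eta[t] - a[1]*x[t-1] - a[2]*x[t-2]) / a[0]

xhat = -(a[1]*x[1:-1] + a[2]*x[:-2]) / a[0]   # predictions of x[2:]
assert np.allclose(x[2:] - xhat, eta[2:] / a[0])
```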
Successive use of formula (29) enables us to obtain the optimum prediction several steps in advance.

8. GENERAL THEOREMS ON THE PREDICTION OF STATIONARY PROCESSES
In this section we shall consider certain general theorems on the prediction of stationary sequences and processes with respect to the infinite past. By a stationary process we mean a process that is stationary in the broad sense and that has mathematical expectation 0.

1. Prediction of stationary sequences
Let ξ(t), for t = 0, ±1, ±2, …, denote a stationary sequence. Let H_ξ denote the closed linear hull generated in L₂ by all the quantities ξ(t), and let H_ξ(t) denote the closed linear hull generated by the quantities ξ(n) for n ≤ t. Obviously H_ξ(t) ⊂ H_ξ(t + 1), and H_ξ is the closure of ∪_t H_ξ(t). Consider in H_ξ the operation S representing time displacement. For elements of H_ξ of the form η = Σ c_k ξ(t_k) this operation is defined by

Sη = Σ c_k ξ(t_k + 1).
The operation S has an inverse S⁻¹,

S⁻¹η = Σ c_k ξ(t_k − 1),

and it preserves the scalar product:

M(S(Σ_k c_k ξ(t_k)) · (S(Σ_r d_r ξ(τ_r)))‾) = Σ_k Σ_r c_k d̄_r M(ξ(t_k + 1) ξ̄(τ_r + 1)) = Σ_k Σ_r c_k d̄_r M(ξ(t_k) ξ̄(τ_r)) = M(Σ_k c_k ξ(t_k) · (Σ_r d_r ξ(τ_r))‾).

Therefore S can be extended as a continuous operator to H_ξ. It then becomes a unitary operator in H_ξ. We introduce the spectral representation of the sequence ξ(t) (cf. Remark 4, Section 4),

ξ(t) = ∫_{−π}^π e^{itω} μ(dω),

where μ is a spectral stochastic measure with structural function F. In what follows we shall not distinguish the measure F(A) from the spectral function (generated by it) of the sequence, F(ω) = F[−π, ω). We recall that a random variable η belongs to H_ξ if and only if η = ∫_{−π}^π φ(ω) μ(dω), where φ ∈ L₂{F}. Consider the sequence of random variables η(t) = S^t η (t = 0, ±1, ±2, …).
Lemma 1. The sequence η(t) is stationary and subordinate to ξ(t), and it has the spectral representation

η(t) = ∫_{−π}^π e^{itω} φ(ω) μ(dω).  (1)

That η(t) is subordinate to the process ξ(t) is obvious. That it is stationary follows from the unitariness of S:

M(η(t + τ) η̄(τ)) = (S^{t+τ}η, S^τη) = (S^tη, η) = M(η(t) η̄(0)).

Finally, the spectral representation (1) is easily verified for elements of the form η = Σ a_k ξ(t_k) (where φ(ω) = Σ a_k e^{it_kω}), and it is obtained for arbitrary η by passage to the limit.

We also note the following obvious properties of the operator S:

a. S H_ξ(t) = H_ξ(t + 1);
b. if ξ^{(p)}(t) is the projection of ξ(t) onto H_ξ(t − p), then S^q ξ^{(p)}(t) = ξ^{(p)}(t + q).
Since

M|ξ^{(p)}(t + q)|² = M|S^q ξ^{(p)}(t)|² = M|ξ^{(p)}(t)|²,

the quantity M|ξ^{(p)}(t)|² is independent of t. Therefore the quantity

δ²(p) = M|ξ(t) − ξ^{(p)}(t)|² = M|ξ(t)|² − M|ξ^{(p)}(t)|²,

which is equal to the square of the minimum mean-square error of the prediction of the variable ξ(t) with the aid of the quantities ξ(n), for n ≤ t − p, is also independent of t. Obviously,

δ²(1) ≤ δ²(2) ≤ ⋯ ≤ σ² = M|ξ(t)|².

The equation δ²(n) = σ² means that ξ(t) is, for arbitrary t, uncorrelated with all the variables ξ(k), for k ≤ t − n, so that knowledge of these terms yields nothing for the prediction of the variable ξ(t). If δ(1) = 0, then ξ(t) ∈ H_ξ(t − 1), so that H_ξ(t − 1) = H_ξ(t). Let us set H_{−∞} = ∩_t H_ξ(t). In the present case, H_ξ = H_{−∞}. This means that if we know the sequence of values of the process ξ(k), for k ≤ t₀, then all the subsequent terms of the sequence have with probability 1 a precise linear expression in terms of the observed values. In a certain sense the opposite situation occurs in the case in which H_{−∞} = 0, where 0 denotes the trivial subspace of H_ξ consisting only of the element 0. Here, knowing the terms of the sequence ξ(k) (for k ≤ n) yields little for the prediction of the variable ξ(n + t) for large values of t, since lim_{t→∞} M|ξ^{(t)}(n)|² = 0 and lim_{t→∞} δ²(t) = σ².

Definition 1. If H_ξ = H_{−∞}, the process ξ(t) is called a singular (or determined) process. If δ(1) > 0, the process ξ(t) is called an undetermined process. If H_{−∞} = 0, the process is called a regular (or completely undetermined) process.
Theorem 1. An arbitrary stationary sequence has a unique representation of the form

ξ(t) = ξ_s(t) + η(t),  (2)

where ξ_s(t) and η(t) are mutually uncorrelated sequences subordinate to ξ(t), where ξ_s(t) is singular, and where η(t) is regular.

Proof. Obviously, S H_{−∞} = H_{−∞}. Since S is a unitary operator, it follows that H_{−∞} reduces S; that is, S is a one-to-one mapping of the subspace H₁ = H_ξ ⊖ H_{−∞} onto itself (cf. Lemma 2, Section 1). Let ξ_s(0) denote the projection of ξ(0) onto H_{−∞} and let η(0) denote the projection of ξ(0) onto H₁. Let ξ_s(t) = S^t ξ_s(0) and η(t) = S^t η(0) for t = 0, ±1, ±2, …. Since ξ(0) = ξ_s(0) + η(0), we have ξ(t) = ξ_s(t) + η(t). Here the sequences η(t) and ξ_s(t) are stationary, mutually uncorrelated, and subordinate to ξ(t). Furthermore, since ξ_s(t) ∈ H_{−∞} and η(t) ∈ H_ξ(t), we have H_ξ(t) ∩ H_{−∞} ⊂ H_{ξ_s}(t); therefore H_{−∞} ⊂ H_{ξ_s}. On the other hand, the inclusion ξ_s(t) ∈ H_{−∞} implies that H_{ξ_s}(t) ⊂ H_{−∞}. Thus for arbitrary t we have H_{ξ_s}(t) = H_{−∞} = H_{ξ_s}; that is, the sequence ξ_s(t) is singular. Furthermore, the equation η(t) = ξ(t) − ξ_s(t) implies that η(t) ∈ H_ξ(t). Therefore ∩_t H_η(t) ⊂ H_{−∞}. On the other hand, by definition H_η(t) is orthogonal to H_{−∞}. Thus ∩_t H_η(t) = 0; that is, the process η(t) is regular. The uniqueness of the representation (2) follows from the fact that under the hypotheses of the theorem the projection of η(t) onto H_{−∞} is equal to 0 and H_{ξ_s} = H_{−∞}, and consequently ξ_s(t) is the orthogonal projection of ξ(t) onto H_{−∞}. This completes the proof of the theorem.
The sequences η(t) and ξ_s(t) are called respectively the regular and singular components of the process ξ(t).
Theorem 2. The regular component η(t) of a stationary sequence can be represented in the form

η(t) = Σ_{n=0}^∞ a(n) ζ(t − n),  (3)

where ζ(t) (for t = 0, ±1, …) is an uncorrelated sequence, where H_ζ(t) = H_η(t), and where Σ_{n=0}^∞ |a(n)|² < ∞.

Proof. We introduce the subspace G(t) = H_η(t) ⊖ H_η(t − 1). This subspace is one-dimensional (if it were the zero space we would have δ²(1) = 0 and η(t) would be a singular sequence). Let us choose in G(0) the unit vector ζ(0). Then the sequence ζ(t) = S^t ζ(0) is orthonormal (ζ(t) ∈ H_η(t) ⊖ H_η(t − 1), therefore ζ(t) is orthogonal to H_η(t − 1); also ζ(k) ∈ H_η(t − 1) for k < t), H_ζ(t) ⊂ H_η(t), and ∩_t H_ζ(t) ⊂ ∩_t H_η(t) = 0. This means that the sequence ζ(t) constitutes a basis in H_η. Expanding η(0) in elements of this basis, we obtain

η(0) = Σ_{n=0}^∞ a(n) ζ(−n),  Σ_{n=0}^∞ |a(n)|² = ||η(0)||² = M|η(0)|².
η^{(q)}(t) = Σ_{n=q}^∞ a_n ζ(t − n).  (5)

The value of the mean-square error is determined from the equation

δ²(q) = Σ_{n=0}^{q−1} |a_n|².  (6)
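Formula (6) can be illustrated directly: with orthonormal innovations ζ, dropping the first q terms of the moving-average representation loses exactly the variance carried by their coefficients. The coefficients below are assumed:

```python
import numpy as np

# Illustration of (3) and (6), with assumed coefficients a(n): for
# eta(t) = sum_n a[n]*zeta(t-n) with orthonormal uncorrelated zeta,
# the q-step prediction error is delta^2(q) = |a_0|^2+...+|a_{q-1}|^2.
a = np.array([1.0, 0.5, -0.25, 0.125])

def delta2(q):
    # the first q innovations are unpredictable; their variance is lost
    return float(np.sum(a[:q]**2))

assert delta2(1) == 1.0
assert np.isclose(delta2(2), 1.25)
assert np.isclose(delta2(len(a)), np.sum(a**2))   # total variance
assert all(delta2(q) <= delta2(q+1) for q in range(len(a)))
```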
We shall now obtain a formula for the best prediction that does not contain the sequence ζ(n). Since ζ(0) ∈ H_η, we have

ζ(0) = ∫_{−π}^π φ(ω) ν(dω),  ∫_{−π}^π |φ(ω)|² dF_η(ω) < ∞,

where ν is the spectral stochastic measure of the process η(t), F_η(ω) = ∫_{−π}^ω |g(e^{iλ})|² dλ, and g(e^{iω}) = (1/√(2π)) Σ_{n=0}^∞ a_n e^{−inω} (cf. Lemma 1, Section 6). Consequently (cf. Lemma 1),

ζ(t) = S^t ζ(0) = ∫_{−π}^π e^{itω} φ(ω) ν(dω).
To find the function φ(ω), let us use formula (3). We have

η(t) = Σ_{n=0}^∞ a_n ζ(t − n) = ∫_{−π}^π e^{itω} φ(ω) (Σ_{n=0}^∞ a_n e^{−inω}) ν(dω).

Comparing this with the equation η(t) = ∫_{−π}^π e^{itω} ν(dω), we obtain

φ(ω) = (√(2π) g(e^{iω}))⁻¹ = (Σ_{n=0}^∞ a_n e^{−inω})⁻¹.

We now have

η^{(q)}(t) = Σ_{n=q}^∞ a_n ζ(t − n) = ∫_{−π}^π e^{itω} (Σ_{n=q}^∞ a_n e^{−inω}) φ(ω) ν(dω),

so that

η^{(q)}(t) = ∫_{−π}^π e^{itω} [1 − g_q(e^{iω})/g(e^{iω})] ν(dω),  (7)

where

g_q(e^{iω}) = Σ_{n=0}^{q−1} b_n e^{−inω}.  (8)
We shall now demonstrate a method of determining the function g(z) = Σ_{n=0}^∞ b_n z^n, where b_n = (1/√(2π)) a_n. In doing so, we shall obtain both the general solution of the problem of the prediction of a stationary sequence and a formula for calculating the mean-square error of the prediction. The function g(z) ∈ H₂, and g(0) = a₀/√(2π) is real (cf. Remark 1). With the aid of this function the spectral density of the sequence η(t), f_η(ω) = |g(e^{iω})|², can be factored. On the basis of Remark 3 of Section 6, if g(z) has no zeros in the disk |z| < 1, it can be determined uniquely from f_η(ω). Therefore if the function g(z), constructed in accordance with Theorem 1, does not vanish for |z| < 1, it is identical to the function g(z) obtained in the proof of Theorem 1 of Section 6.

Lemma 3. For |z| < 1, we have g(z) ≠ 0.

Proof.
We note first of all that if f_η(ω) = |h(e^{iω})|², where h(z) = Σ_{n=0}^∞ c_n z^n with Σ |c_n|² < ∞, then δ²(1) ≥ 2π|c₀|². This is true because

M|η(0) − Σ_{k=1}^N d_k η(−k)|² = ∫_{−π}^π |1 − Σ_{k=1}^N d_k e^{−ikω}|² |Σ_{n=0}^∞ c_n e^{−inω}|² dω ≥ 2π|c₀|²,

since the constant term of the product (1 − Σ_{k=1}^N d_k e^{−ikω}) Σ_{n=0}^∞ c_n e^{−inω} is c₀. Since this is true for arbitrary d_k and N, we have

δ²(1) ≥ 2π|c₀|².  (9)
Let us suppose now that g(z₀) = 0 for some |z₀| < 1. The function g(z) = Σ_{n=0}^∞ b_n z^n vanishes at the point z₀; therefore g(z) = (z − z₀) Σ_{n=0}^∞ b_n′ z^n, where b₀′ = −b₀/z₀ = −a₀/(√(2π) z₀). Then

g(e^{iω}) = (1/√(2π)) Σ_{n=0}^∞ a_n e^{−inω} = (e^{−iω} − z₀) Σ_{n=0}^∞ b_n′ e^{−inω} = ((e^{−iω} − z₀)/(1 − z̄₀ e^{−iω})) Σ_{n=0}^∞ b_n″ e^{−inω},

where Σ b_n″ e^{−inω} = (1 − z̄₀ e^{−iω}) Σ b_n′ e^{−inω} and b₀″ = b₀′ = −a₀/(√(2π) z₀). Since |(e^{−iω} − z₀)/(1 − z̄₀ e^{−iω})| = 1, the function Σ b_n″ e^{−inω} yields another factorization of f_η(ω). It follows from (9) that

δ²(1) ≥ 2π|b₀′|² = |a₀|²/|z₀|²,

which by virtue of (6) is impossible for |z₀| < 1. This completes the proof of the lemma.

Corollary. In formula (7) for the best prediction, the function g(z) ∈ H₂ is uniquely determined (under the hypothesis that g(0) is positive) and it coincides with the function obtained in Theorem 1 of Section 6.
We have solved the problem of prediction for the regular part of an undetermined sequence. Now we need to clarify the following questions: How can we express the spectral density of the sequence η(t) in terms of the spectral function of the process ξ(t)? What is the form of the formula for prediction for the sequence ξ(t) expressed in terms of the characteristics of the process ξ(t)?
Lemma 4. Suppose that an undetermined process ξ(t) is represented in the form ξ(t) = η(t) + ξ_s(t), where η(t) and ξ_s(t) are uncorrelated, ξ_s(t) is a singular process, η(t) is regular, and F(ω), F_r(ω), and F_s(ω) are the spectral functions of the sequences ξ(t), η(t), and ξ_s(t). Then the equation

F(ω) = F_r(ω) + F_s(ω)  (10)

is the decomposition of the function F(ω) into an absolutely continuous component F_r(ω) and a singular component F_s(ω) with respect to Lebesgue measure.
Proof. Formula (10) follows from the uncorrelatedness of the sequences η(t) and ξ_s(t). We introduce the spectral representation of the uncorrelated sequence ζ(t) that appears in the representation (3):

ζ(t) = ∫_{−π}^π e^{itω} ζ̃(dω),  (11)

where ζ̃(A) is the stochastic measure with structural function (1/2π) l(A) (l denotes Lebesgue measure). Substituting (11) into (3), we obtain

η(t) = ∫_{−π}^π e^{itω} √(2π) g(e^{iω}) ζ̃(dω).

Suppose that

ξ_s(t) = ∫_{−π}^π e^{itω} μ_s(dω)  (12)

is the spectral representation of the sequence ξ_s(t). Then

ξ(t) = ∫_{−π}^π e^{itω} μ(dω) = ∫_{−π}^π e^{itω} (√(2π) g(e^{iω}) ζ̃(dω) + μ_s(dω)).  (13)

It follows from equation (13) that

∫_{−π}^π φ(ω) μ(dω) = ∫_{−π}^π φ(ω) (√(2π) g(e^{iω}) ζ̃(dω) + μ_s(dω))  (14)

for an arbitrary function φ(ω) ∈ L₂{F}.
We can write yet another spectral representation for ξ_s(t). Since ξ_s(0) ∈ H_ξ, we have

ξ_s(0) = ∫_{−π}^π φ_s(ω) μ(dω),

so that

ξ_s(t) = S^t ξ_s(0) = ∫_{−π}^π e^{itω} φ_s(ω) μ(dω).

Remembering (13) and (14), we obtain

ξ_s(t) = ∫_{−π}^π e^{itω} φ_s(ω) [√(2π) g(e^{iω}) ζ̃(dω) + μ_s(dω)].

Substituting the expression for ξ_s(t) given by equation (12) into this equation and transposing the second term in the above integral to
the left, we obtain

∫_{−π}^π e^{itω} (1 − φ_s(ω)) μ_s(dω) = ∫_{−π}^π e^{itω} φ_s(ω) √(2π) g(e^{iω}) ζ̃(dω).

The two sides of this equation contain elements of subspaces that are orthogonal to each other. Therefore they are both zero. Consequently,

φ_s(ω) = 1  (mod F_s),  φ_s(ω) g(e^{iω}) = 0  (mod l).

Since g(e^{iω}) can be equal to zero only on a set of Lebesgue measure 0, it follows that φ_s(ω) is equal to 0 almost everywhere with respect to Lebesgue measure. Let S denote the set on which φ_s(ω) = 1. Then l(S) = 0. Thus

F_s(A) = ∫_A |φ_s(ω)|² dF(ω) = F(A ∩ S),  F_r(A) = ∫_A |g(e^{iω})|² dω.

This completes the proof of the lemma.

Lemma 5. Suppose that φ₁(ω), φ₂(ω), and φ₃(ω) are such that the three integrals

∫_{−π}^π φ₁(ω) μ(dω),  ∫_{−π}^π φ₂(ω) ν(dω),  ∫_{−π}^π φ₃(ω) μ_s(dω)
are the projections of the quantities ξ(t + q), η(t + q), and ξ_s(t + q) onto H_ξ(t), H_η(t), and H_{ξ_s}(t) respectively. Then

φ₁(ω) = φ₂(ω) = φ₃(ω) = e^{i(t+q)ω} (1 − g_q(e^{iω})/g(e^{iω}))  (mod F).

Proof. In view of formula (7), it will be sufficient to prove that φ₁(ω) = φ₂(ω) = φ₃(ω). It follows from the equation

ξ(t + q) − ∫_{−π}^π φ₁(ω) μ(dω) = [η(t + q) − ∫_{−π}^π φ₁(ω) ν(dω)] + [ξ_s(t + q) − ∫_{−π}^π φ₁(ω) μ_s(dω)]  (15)
and the orthogonality of the terms in the bracketed expressions on the right that

δ₁²(q) = M|η(t + q) − ∫_{−π}^π φ₁(ω) ν(dω)|² + M|ξ_s(t + q) − ∫_{−π}^π φ₁(ω) μ_s(dω)|² ≥ δ²(q),

with equality possible only when

φ₁(ω) = φ₂(ω)  (mod F_r),

ξ_s(t + q) = ∫_{−π}^π φ₁(ω) μ_s(dω),  that is,  φ₁(ω) = φ₃(ω)  (mod F_s).

On the other hand, δ₁²(q) = δ²(q) by virtue of the definition of ξ_s(t). This completes the proof of the lemma.

The results obtained can be formulated as:
Theorem 4. Let ξ(t) denote an undetermined stationary sequence. Then the optimal prediction ξ^{(q)}(t) of the quantity ξ(t + q) from the results of observation of ξ(τ) for τ ≤ t is given by the formula

ξ^{(q)}(t) = ∫_{−π}^π e^{i(t+q)ω} [1 − g_q(e^{iω})/g(e^{iω})] μ(dω),

where μ is the spectral stochastic measure of the sequence ξ(t),

g(z) = Σ_{n=0}^∞ b_n z^n,  g_q(z) = Σ_{n=0}^{q−1} b_n z^n,

the function g(z) ∈ H₂ has no zeros in the disk |z| < 1, g(0) is positive, and |g(e^{iω})|² = f(ω), where f(ω) is the derivative of the absolutely continuous component of the spectral function of the sequence ξ(t). The square of the mean-square error of the prediction is equal to

δ²(q) = 2π exp{(1/2π) ∫_{−π}^π ln f(ω) dω} Σ_{n=0}^{q−1} |c_n|²,

where the c_n are determined from the equation

exp{Σ_{n=1}^∞ ((1/2π) ∫_{−π}^π e^{−inω} ln f(ω) dω) z^n} = Σ_{n=0}^∞ c_n z^n.

In particular,

δ²(1) = 2π exp{(1/2π) ∫_{−π}^π ln f(ω) dω}.
The theorem follows immediately from Lemmas 4 and 5 and formula (7) of the present section, and from Theorem 1 and Remark 2 of Section 6.

2. Prediction of stationary processes with continuous time
Let ξ(t) (for −∞ < t < ∞) denote a stationary process

ξ(t) = ∫_{−∞}^∞ e^{iωt} μ(dω),

where μ is an orthogonal stochastic measure on the real line (−∞ < ω < ∞) and

Mξ(t) = 0,  R(t) = M(ξ(t + u) ξ̄(u)) = ∫_{−∞}^∞ e^{iωt} dF(ω),  F(+∞) = σ².

We introduce the Hilbert space H_ξ = H{ξ(t), −∞ < t < ∞} and its subspaces H_ξ(t) = H{ξ(τ), τ ≤ t}. In H_ξ we define the group of operators representing time displacement, S_τ (for −∞ < τ < ∞), by setting

S_τ(Σ c_k ξ(t_k)) = Σ c_k ξ(t_k + τ)
and extending the definition of S_τ as a continuous operator to the entire space H_ξ. The S_τ constitute a group of unitary transformations of H_ξ. This group has the same properties, with obvious modifications, as the group of transformations S^t in the case of discrete time. The problem of optimal linear prediction for a process ξ(t) consists in finding a random variable ξ̂_T(t) ∈ H_ξ(t) such that

M|ξ(t + T) − ξ̂_T(t)|² ≤ M|ξ(t + T) − η|²

for an arbitrary element η of H_ξ(t). This problem has a unique solution: the variable ξ̂_T(t) is the projection of ξ(t + T) onto H_ξ(t). We set

δ(T) = {M|ξ(t + T) − ξ̂_T(t)|²}^{1/2}.

The quantity δ(T), the mean-square error of the prediction, is a nondecreasing function of T, and 0 ≤ δ(T) ≤ σ. If lim_{T→∞} δ(T) = σ, the process is said to be regular (completely undetermined). If δ(T₀) = 0 for some T₀, then H_ξ(t) ⊂ H_ξ(t − T₀) for arbitrary t. Consequently,

H_ξ(t) ⊂ ∩_{k=1}^∞ H_ξ(t − kT₀)

for arbitrary t, and δ(T) = 0 for all T > 0. In this case the process is said to be singular (determined). We shall call nonsingular processes undetermined processes.

The proof of Theorem 1 can be carried over directly to processes with continuous time: an arbitrary stationary process admits a decomposition of the form

ξ(t) = η(t) + ξ_s(t),

where η(t) is a regular and ξ_s(t) a singular process, the two being uncorrelated and subordinate to ξ(t). The analogue of Theorem 2 is:

Theorem 5. A regular stationary process η(t) can be represented
in the form

η(t) = ∫_{−∞}^t a(t − τ) ζ(dτ),  (16)

where ζ(τ) is an integrated white noise (a process with uncorrelated increments whose structural function is Lebesgue measure), H_ζ(t) = H_η(t), and

∫_0^∞ |a(t)|² dt < ∞,

where f(ω) is the derivative of the absolutely continuous component of the spectral measure F of the process ξ(t).
If ξ(t) = η(t) + ξ_s(t) is the decomposition of the process ξ(t) into regular and singular components and if, in accordance with Theorem 5,

η(t) = ∫_{−∞}^t a(t − τ) ζ(dτ),
Therefore

P{Σ_{j=1}^n η_{nj} > l} ≤ (P{Σ_{j=1}^n η_{nj} > 0})^{l+1}.

Taking the limit as n → ∞, we see that

P{ν(t, A) > l} ≤ (P{ν(t, A) > 0})^{l+1}.  (1)
Observe that P{ν(t, A) > 0} < 1. If this were not true, then we would obtain
P{ν(t, A) = 0} = ∏_{k=1}^n P{ν(t_k, A) − ν(t_{k−1}, A) = 0} = 0

for 0 = t₀ < t₁ < ⋯ < t_n = t, and hence there would exist t′ and t″ arbitrarily close to each other such that P{ν(t″, A) − ν(t′, A) ≥ 1} = 1, which would contradict the stochastic continuity of ν(t, A). The existence of Mν(t, A)^r for r > 0 follows from the inequality

Mν(t, A)^r ≤ Σ_{k=0}^∞ P{ν(t, A) > k}(k + 1)^r ≤ Σ_{k=0}^∞ (P{ν(t, A) > 0})^k (k + 1)^r < ∞.
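The moment bound above can be illustrated numerically; the Poisson distribution below is an assumed stand-in for the jump count ν(t, A):

```python
import math

# Illustration of M nu^r <= sum_k P{nu > k}(k+1)^r, with nu ~ Poisson
# (an assumed example distribution for the jump count).
lam, r, N = 1.5, 3, 60
pmf = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(N)]

moment = sum(p * k**r for k, p in enumerate(pmf))
tail = lambda k: sum(pmf[k+1:])                 # P{nu > k} (truncated)
bound = sum(tail(k) * (k + 1)**r for k in range(N))

assert moment <= bound
```

The inequality holds because Σ_{k<m} (k+1)^r ≥ m^r for every integer m ≥ 0.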
1.
MEASURES CONSTRUCTED FROM THE JUMPS OF A PROCESS
(The series Σ_{k=0}^∞ (k + 1)^r x^k converges for arbitrary |x| < 1.) This completes the proof of the theorem.

Corollary. Define Π(t, A) = Mν(t, A). Then the set function Π(t, A) is, for fixed t, a measure on 𝔅_ε.
Proof. If A = ∪_{k=1}^∞ A_k, where the A_k are pairwise disjoint, then ν(t, A) = Σ_{k=1}^∞ ν(t, A_k), and consequently Mν(t, A) = Σ_{k=1}^∞ Mν(t, A_k), in view of the fact that 0 ≤ Σ_{k=1}^n ν(t, A_k) ≤ ν(t, A).

To study the properties of the quantity ν(t, A), we find it useful to consider the process ξ(t, A) defined by

ξ(t, A) = Σ_{s≤t} [ξ(s + 0) − ξ(s − 0)] χ_A(ξ(s + 0) − ξ(s − 0)).
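As a concrete illustration (our own sketch, not the book's construction): for a compound Poisson process with jump rate lam and independent jump sizes, ν(t, A) simply counts the jumps whose size falls in A, and Π(t, A) = Mν(t, A) = lam·t·P{jump size ∈ A}, which is additive over disjoint sets. The names and parameter values below are illustrative assumptions.

```python
import random, math

# Hypothetical example: jumps arrive at Poisson rate lam, sizes uniform on (0,1);
# nu(t, A) counts jumps with size in A, so M nu(t, A) = lam * t * |A|.

def poisson(mean, rng):
    # Knuth's inversion-by-uniform-products sampler
    p, n, thr = 1.0, 0, math.exp(-mean)
    while True:
        p *= rng.random()
        if p <= thr:
            return n
        n += 1

def jump_counts(lam, t, sets, rng):
    """Return nu(t, A) for each half-open interval A = (lo, hi] in `sets`."""
    counts = [0] * len(sets)
    for _ in range(poisson(lam * t, rng)):
        x = rng.random()                      # jump size
        for i, (lo, hi) in enumerate(sets):
            if lo < x <= hi:
                counts[i] += 1
    return counts

rng = random.Random(0)
lam, t = 2.0, 3.0
sets = [(0.0, 0.5), (0.5, 1.0)]               # disjoint A1, A2 with A = A1 ∪ A2
trials = 20000
sums = [0, 0]
for _ in range(trials):
    c = jump_counts(lam, t, sets, rng)
    sums[0] += c[0]; sums[1] += c[1]
pi = [s / trials for s in sums]               # empirical Pi(t, A_i), both ≈ 3.0
print(pi)
```

The empirical means approximate Π(t, A₁) = Π(t, A₂) = 2·3·0.5 = 3, and their sum approximates Π(t, A₁ ∪ A₂), illustrating the additivity proved in the corollary.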
lim_{n→∞} P{Σ_{k=1}^n (ξ_nk, x_i) > a√(Σ_{k=1}^n D(ξ_nk, x_i)) + Σ_{k=1}^n (Mξ_nk, x_i)} ≥ (1/√(2π)) ∫_a^∞ e^{−u²/2} du,

lim_{n→∞} P{Σ_{k=1}^n (ξ_nk, x_i) < −a√(Σ_{k=1}^n D(ξ_nk, x_i)) + Σ_{k=1}^n (Mξ_nk, x_i)} ≥ (1/√(2π)) ∫_a^∞ e^{−u²/2} du.

But these last relations contradict the boundedness of Σ_{k=1}^n (Mξ_nk, x_i), which follows from relation (1). Thus Σ_{k=1}^n D(ξ_nk, x_i) is bounded for all i. We also note that, by Chebyshev's inequality,
ξ_ε1(t) = [ξ_ε1(t) − Σ_{k=2}^m ξ(t, Δ_k)] + Σ_{k=2}^m ξ(t, Δ_k).

The terms on the right are independent by virtue of Corollary 1 to Theorem 2 of Section 1. Therefore, for arbitrary x,

Σ_{k=2}^m D(ξ(t, Δ_k), x) ≤ D(ξ_ε1(t), x),

and hence for arbitrary x the series Σ_{k=2}^∞ D(ξ(t, Δ_k), x) converges. Let us choose a sequence {n_k} (where n₁ = 2) such that

Σ_{j=n_k}^∞ D(ξ(T, Δ_j), x_i) ≤ 1/k⁶  for i = 1, 2, ⋯, r.
Then the sequence

Σ_{j=2}^{n_k} [ξ(t, Δ_j) − Mξ(t, Δ_j)]   (2)

will, with probability 1, converge uniformly to some limit as k → ∞. To see this, note that

P{sup_{0≤t≤T} |Σ_{j=2}^{n_{k+1}} [ξ(t, Δ_j) − Mξ(t, Δ_j)] − Σ_{j=2}^{n_k} [ξ(t, Δ_j) − Mξ(t, Δ_j)]| > 1/k²}
≤ Σ_{i=1}^r P{sup_{0≤t≤T} |Σ_{j=n_k+1}^{n_{k+1}} (ξ(t, Δ_j) − Mξ(t, Δ_j), x_i)| > 1/(k²√r)}

= Σ_{i=1}^r lim_{m→∞} P{sup_{1≤l≤m} |Σ_{j=n_k+1}^{n_{k+1}} (ξ(lT/m, Δ_j) − Mξ(lT/m, Δ_j), x_i)| > 1/(k²√r)}

≤ Σ_{i=1}^r k⁴ r Σ_{j=n_k+1}^{n_{k+1}} D(ξ(T, Δ_j), x_i) ≤ r²/k².
(Here we used Kolmogorov's inequality: Theorem 1, Section 4, Chapter III.) Since the series Σ_{k=1}^∞ r²/k² converges, it follows from the Borel–Cantelli theorem (Theorem 2, Section 3, Chapter III) that the terms of the series

Σ_{k=1}^∞ Σ_{j=n_k+1}^{n_{k+1}} [ξ(t, Δ_j) − Mξ(t, Δ_j)]

are eventually majorized, with probability 1, by the terms of the convergent series Σ_k 1/k². Hence the sequence (2) converges uniformly with probability 1. Therefore there exists a process ξ₀(t) that is the uniform limit of the sequence

ξ_ε1(t) − Σ_{j=2}^{n_k} [ξ(t, Δ_j) − Mξ(t, Δ_j)].
2.
CONTINUOUS COMPONENTS OF A PROCESS
267
Since ξ(t, Δ_j) is a stochastically continuous process and sup_{0≤t≤T} M|ξ(t, Δ_j)|² < ∞, it follows from Theorem 6, Section 5, Chapter II, that lim_{t→s} Mξ(t, Δ_j) = Mξ(s, Δ_j). Consequently, the process ξ_ε1(t) − Σ_{j=2}^{n_k} [ξ(t, Δ_j) − Mξ(t, Δ_j)] fails with probability 1 to have jumps exceeding ε_{n_k} in absolute value, and the process ξ₀(t) (the uniform limit of such processes) is continuous with probability 1. We note that, in accordance with Kolmogorov's theorem (Theorem 4, Section 4, Chapter III), the series

Σ_{j=2}^∞ [ξ(t, Δ_j) − Mξ(t, Δ_j)]

converges by virtue of the convergence of the series Σ_{j=2}^∞ D(ξ(t, Δ_j), x) for every x. For every t, the sum of the series Σ_{j=2}^∞ [ξ(t, Δ_j) − Mξ(t, Δ_j)] coincides (mod P) with the sum of the series

Σ_{k=1}^∞ Σ_{j=n_k+1}^{n_{k+1}} [ξ(t, Δ_j) − Mξ(t, Δ_j)].
Thus we have

Theorem 1. For every separable stochastically continuous process ξ(t) with independent increments, there exists a continuous process ξ₀(t) such that

ξ(t) = ξ₀(t) + ξ(t, Δ₁) + Σ_{j=2}^∞ [ξ(t, Δ_j) − Mξ(t, Δ_j)].

Remark 1. The process ξ₀(t), being the limit of the sequence of processes ξ_ε1(t) − Σ_{j=2}^{n_k} [ξ(t, Δ_j) − Mξ(t, Δ_j)], is independent of each of the processes ξ(t, Δ_j), where j = 1, 2, ⋯, m, since

ξ₀(t) + Σ_{j=2}^m (ξ(t, Δ_j) − Mξ(t, Δ_j)) = ξ_ε1(t)

and M|ξ_ε1(t)|² < ∞.
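A small numerical illustration of Theorem 1 (our own construction, not taken from the book): build a process with independent increments as a Brownian part plus a compound Poisson jump part. The continuous component and the jump component are independent, so their variances add, which the simulation below checks. All names and parameter values are illustrative assumptions.

```python
import random, math

# xi(t) = sigma * w(t) + (sum of Poisson jumps); by independence of the two
# components, D xi(t) = sigma^2 t + lam * t * E[Y^2] for jump sizes Y.

def poisson(mean, rng):
    p, n, thr = 1.0, 0, math.exp(-mean)
    while True:
        p *= rng.random()
        if p <= thr:
            return n
        n += 1

def sample_xi(t, sigma, lam, rng):
    cont = sigma * math.sqrt(t) * rng.gauss(0.0, 1.0)     # continuous part xi0(t)
    jumps = sum(rng.uniform(-1.0, 1.0) for _ in range(poisson(lam * t, rng)))
    return cont + jumps

rng = random.Random(42)
t, sigma, lam = 2.0, 0.5, 1.5
xs = [sample_xi(t, sigma, lam, rng) for _ in range(40000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
expected = sigma**2 * t + lam * t * (1.0 / 3.0)           # E[Y^2] = 1/3 for U(-1,1)
print(round(var, 3), round(expected, 3))
```

The empirical variance is close to σ²t + λt·E[Y²] = 1.5, reflecting the independence of the continuous and jump components asserted in Remark 1.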
For every ρ > 0,

lim_{n→∞} Σ_{k=1}^n P{|ξ₀(t_nk) − ξ₀(t_{n,k−1})| > ρ} = 0.   (4)

On the basis of formula (4) we may assert that there exists a sequence {ρ_n} that converges to 0 as n → ∞ and such that

lim_{n→∞} Σ_{k=1}^n P{|ξ₀(t_nk) − ξ₀(t_{n,k−1})| > ρ_n} = 0.
Set η_nk = (z, ψ_{ρ_n}(ξ₀(t_nk) − ξ₀(t_{n,k−1}))), where ψ_ρ(x) = 0 for |x| > ρ and ψ_ρ(x) = x for |x| ≤ ρ. The quantities η_nk are bounded by the numbers |z|ρ_n.
Since

P{Σ_{k=1}^n η_nk ≠ (z, ξ₀(t₂) − ξ₀(t₁))} ≤ Σ_{k=1}^n P{η_nk ≠ (z, ξ₀(t_nk) − ξ₀(t_{n,k−1}))} ≤ Σ_{k=1}^n P{|ξ₀(t_nk) − ξ₀(t_{n,k−1})| > ρ_n},

the sequence {Σ_{k=1}^n η_nk} converges in probability to the quantity (z, ξ₀(t₂) − ξ₀(t₁)). Let us suppose that D(z, ξ₀(t₂) − ξ₀(t₁)) > 0. Then
lim_{n→∞} Σ_{k=1}^n Dη_nk ≥ D(z, ξ₀(t₂) − ξ₀(t₁)) > 0.

Therefore Theorem 5, Section 3, Chapter I, is applicable to the quantities

(η_nk − Mη_nk)/√(Σ_{k=1}^n Dη_nk),

and hence

lim_{n→∞} M exp{iλ (Σ_{k=1}^n η_nk − Σ_{k=1}^n Mη_nk)/√(Σ_{k=1}^n Dη_nk)} = e^{−λ²/2}.   (5)
Furthermore,

lim_{n→∞} M exp{iλ Σ_{k=1}^n η_nk} = M exp{iλ(z, ξ₀(t₂) − ξ₀(t₁))}.
From these two relations one easily shows that (z, ξ₀(t₂) − ξ₀(t₁)) has a normal distribution. On the other hand, if D(z, ξ₀(t₂) − ξ₀(t₁)) = 0, then (z, ξ₀(t₂) − ξ₀(t₁)) = M(z, ξ₀(t₂) − ξ₀(t₁)), and formula (3) is obviously valid. This completes the proof of the theorem. Remark 1.
The expressions M(z, ξ₀(t) − ξ₀(0)) and D(z, ξ₀(t) − ξ₀(0)) are continuous functions of t. It follows from the theorem just proved that

P{|(z, ξ₀(t₂)) − (z, ξ₀(t₁))| > ε} = (1/√(2π)) ∫_U e^{−u²/2} du,   (6)

where

U = {u: |u √(D(z, ξ₀(t₂) − ξ₀(t₁))) + M(z, ξ₀(t₂) − ξ₀(t₁))| > ε}.
If there were a t such that D(z, ξ₀(t₂) − ξ₀(t₁)) failed to approach 0 as t₁ → t and t₂ → t, the right-hand member of (6) would also fail to approach 0, and this would contradict the stochastic continuity of the process ξ₀(t). If M(z, ξ₀(t₂) − ξ₀(t₁)) failed to approach 0 as t₁ ↑ t and t₂ ↓ t, it would follow from (6) that

lim P{|(z, ξ₀(t₂) − ξ₀(t₁))| > ε} ≥ 1/2.

This inequality ensures the continuity of M(z, ξ₀(t) − ξ₀(0)).

Remark 2. If we let a(t) denote the quantity M(ξ₀(t) − ξ₀(0)) and if we let A(t) denote a nonnegative symmetric linear operator in X such that D(z, ξ₀(t) − ξ₀(0)) = (A(t)z, z), then the distribution of the process ξ₀(t) is determined by the characteristic function

M exp{i(z, ξ₀(t))} = M e^{i(z, ξ₀(0))} exp{i(a(t), z) − ½(A(t)z, z)}.   (7)
PROCESSES WITH INDEPENDENT INCREMENTS
3. REPRESENTATION OF STOCHASTICALLY CONTINUOUS PROCESSES WITH INDEPENDENT INCREMENTS
Let us consider stochastic integrals with respect to the measure ν(t, A). As we mentioned in Section 1, the measure ν(t, A) is a countably additive nonnegative function of the set A ∈ 𝔅_ε. Suppose that a measurable function φ(x) is bounded on every compact subset of the space X and is equal to 0 for |x| < ε (where ε is some positive number). Then the integral ∫ φ(x) ν(t, dx) can be defined in the usual way. This follows from the finiteness of the measure ν(t, A) on 𝔅_ε and also from the fact that ν(t, X_ρ) = 0, where X_ρ is the set of all x such that |x| > ρ and

ρ = max_{0≤s≤t} |ξ(s + 0) − ξ(s − 0)|.

Thus we are actually considering the integral only over the set {ε ≤ |x| ≤ ρ}, on which the function φ(x) is bounded. Let us show that

ξ(t, A) = ∫_A x ν(t, dx).   (1)
If A = ∪_{k=1}^n B_k, where the B_k are pairwise disjoint sets with diameters not exceeding δ and x_k ∈ B_k, then the differences ξ(t, B_k) − x_k ν(t, B_k), and hence

ξ(t, A) − Σ_k x_k ν(t, B_k),

are small when δ is small.
+
( J e1 1+ I x 2
II(t, A)
,
and the measure G(t, dx) is defined by the relation X 12 II(t, G(t, A) =
dx)
I
JA 1 +
Ix12
Formula (9) is convenient because the measure G(t, A) is finite and is defined on the σ-algebra of all Borel subsets of the space X. A random process ξ(t) with independent increments is said to be homogeneous if it is defined for t ≥ 0, ξ(0) = 0, and the distribution of the quantity ξ(t + h) − ξ(t) depends only on h.

Theorem 3. For a stochastically continuous homogeneous process ξ(t) with independent increments, there exist a vector a ∈ X, a symmetric linear nonnegative operator A, and a function Π(A) that is, for every positive ε, a finite measure on 𝔅_ε and satisfies the condition ∫_{|x|≤1} |x|² Π(dx) < ∞.
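The displayed formula of Theorem 3 is illegible in this copy; the conclusion is the Lévy–Khinchine representation, which in the notation above takes the standard form (our reconstruction):

```latex
M e^{i(z,\xi(t))}
  = \exp\Bigl\{ t \Bigl[\, i(a,z) - \tfrac12 (Az,z)
  + \int_X \Bigl( e^{i(z,x)} - 1 - \frac{i(z,x)}{1+|x|^2} \Bigr)\,\Pi(dx) \Bigr] \Bigr\}.
```

Here a is the drift vector, A the Gaussian covariance operator of the continuous component, and Π the jump (Lévy) measure of Section 1.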
Let this probability be denoted by p₁. Then we may write P{sup_s |ξ(s) − ξ(s_ν)| > 0} ≤ p₁^l → 0 as l → ∞; the process ξ(t) is
constant on ν + 1 adjacent intervals, and the quantity ν is finite with probability 1. This completes the proof of the theorem.

Suppose now that ξ(t) is a process with numerical values, that is, that X is the real line. Let us investigate the conditions under which the sample functions of the process ξ(t) are, with probability 1, monotonic functions.

Theorem 2. For the sample functions of a numerical separable stochastically continuous process ξ(t) with independent increments to be nondecreasing with probability 1, it is necessary and sufficient that the characteristic function of the variable ξ(t) be given by the formula

M e^{iλξ(t)} = M e^{iλξ(0)} exp{iλγ(t) + ∫_0^∞ (e^{iλx} − 1) Π(t, dx)},   (3)

where the measure Π satisfies the condition ∫_0^∞ x Π(t, dx) < ∞ and γ(t) is a nondecreasing function.
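The simplest process of the form (3) is a Poisson process (our example, not the book's): take γ(t) = 0 and let Π(t, dx) put mass a·t at x = 1, so M e^{iuN(t)} = exp{at(e^{iu} − 1)}, and every sample function is nondecreasing. The sketch below, with assumed illustrative parameters, compares the empirical characteristic function with this exact expression.

```python
import random, cmath, math

def poisson_sample(mean, rng):
    # Knuth's inversion sampler for a Poisson variate
    p, n, thr = 1.0, 0, math.exp(-mean)
    while True:
        p *= rng.random()
        if p <= thr:
            return n
        n += 1

rng = random.Random(7)
a, t, u = 1.2, 2.0, 0.8
samples = [poisson_sample(a * t, rng) for _ in range(30000)]
# increments of N are nonnegative by construction; check the characteristic fn
empirical = sum(cmath.exp(1j * u * n) for n in samples) / len(samples)
exact = cmath.exp(a * t * (cmath.exp(1j * u) - 1))
print(abs(empirical - exact))
```

The empirical and exact characteristic functions agree to within Monte Carlo error, consistent with formula (3) for this choice of γ and Π.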
Proof of the Necessity. If ξ(t) is nondecreasing, the process ξ(t) has only positive jumps; hence Π(t, A) = 0 for every set A lying on the negative half-line. We note also that in the present case the process ξ(t) − ξ(t, X₁) is also nondecreasing (since removal of the jumps does not destroy monotonicity). Similarly, the process

ξ(t, X_ε) − ξ(t, X₁)  for 0 < ε < 1

is also a monotonic process; also

0 ≤ ξ(t, X_ε) − ξ(t, X₁) ≤ ξ(t) − ξ(0) − ξ(t, X₁).

On the basis of the lemma of Section 2, M[ξ(t) − ξ(0) − ξ(t, X₁)] < ∞. Therefore
M[ξ(t, X_ε) − ξ(t, X₁)] = ∫_ε^1 x Π(t, dx) ≤ M[ξ(t) − ξ(0) − ξ(t, X₁)].

Taking the limit as ε → 0, we see that ∫_0^1 x Π(t, dx) has a finite value. We note also that the quantity ξ(t) − ξ(t, X_ε) decreases as ε → 0 (since more positive jumps are discarded with decreasing ε). Consequently the limit lim_{ε→0} [ξ(t) − ξ(t, X_ε)] = ξ₀(t) exists with probability 1 and the process ξ₀(t) is, with probability 1, continuous. As shown in Section 2, the increments of the process ξ₀(t) have Gaussian distributions. But the process ξ₀(t), being the limit of nondecreasing processes, is itself nondecreasing, so that

P{ξ₀(t) − ξ₀(0) ≥ 0} = 1.

It follows from this relation that D(ξ₀(t) − ξ₀(0)) = 0 (since a normally distributed variable α can be nonnegative with probability 1 only when Dα = 0). Thus

ξ₀(t) = ξ₀(0) + γ(t),  where γ(t) = M[ξ₀(t) − ξ₀(0)],

and hence γ does not decrease. Formula (3) can be obtained from the relation

M e^{iλξ(t)} = lim_{ε→0} M e^{iλξ₀(t)} M e^{iλξ(t, X_ε)}
if we keep in mind the form of the process ξ₀(t) and formula (5) of Section 3. This completes the proof of the necessity.

Proof of the Sufficiency. Let us show that P{ξ(t₂) − ξ(t₁) ≥ 0} = 1.
To do this, it will be sufficient to show that a random variable ξ whose characteristic function has the form

M e^{iλξ} = exp{∫_0^∞ (e^{iλx} − 1) dG(x)},   (4)

where G(x) is a monotonic bounded function, is with probability 1 nonnegative (ξ(t₂) − ξ(t₁) is the sum of γ(t₂) − γ(t₁) and the limit of quantities with characteristic functions of the form (4)). Let us set

F(x) = c[G(x) − G(+0)],  c = [G(+∞) − G(+0)]⁻¹,  m = G(+∞) − G(+0).

Then

M e^{iλξ} = e^{−m} Σ_{k=0}^∞ (m^k/k!) (∫_0^∞ e^{iλx} dF(x))^k,

so that the characteristic function of the variable ξ coincides with the characteristic function of the variable S_ν, where S₀ = 0 and S_n = ζ₁ + ⋯ + ζ_n. Here ζ₁, ζ₂, ⋯ is a sequence of independent identically distributed nonnegative variables with distribution function
4.
PROPERTIES OF THE SAMPLE FUNCTIONS
279
F(x), and ν is a Poisson random variable independent of the ζ_i. Consequently ξ is nonnegative. Thus

P{ξ(t₂) ≥ ξ(t₁)} = 1  for t₁ < t₂.
It follows from the last relation that the event that the inequality ξ(t₁) ≤ ξ(t₂) is satisfied for all pairs of rational t₁ and t₂ with t₁ < t₂ has probability 1. Using the fact that ξ(t) is separable and has no discontinuities of the second kind, we conclude that

P{ξ(t₁) ≤ ξ(t₂) for all t₁ < t₂} = 1.
This completes the proof of the theorem.

Let us investigate the conditions under which the sample functions of the process ξ(t) are, with probability 1, of bounded variation. We recall that the variation of a function x(t) given on [a, b] with range in X is defined as

var_{[a,b]} x(t) = sup Σ_{i=0}^{n−1} |x(t_i) − x(t_{i+1})|,

with the supremum being taken over all possible partitions a = t₀ < t₁ < ⋯ < t_n = b of the interval [a, b].

Theorem 3. For the sample functions of a separable stochastically continuous process ξ(t) with independent increments defined on an interval [0, T] to be of bounded variation on that interval with probability 1, it is necessary and sufficient that the characteristic function of the variable ξ(t) be given by formula (8) of Section 3, with

var_{[0,T]} a(t) < ∞,  A(t) = 0,

and the measure Π(t, A) such that

∫_{0<|x|≤1} |x| Π(t, dx) < ∞.
Proof of the Sufficiency. Since the process defined by

ξ(t, X₁) = ∫_{|x|>1} x ν(t, dx)

is with probability 1 piecewise-constant, its variation, which coincides with the sum of the absolute values of its jumps, is finite. The function a(t) is, by hypothesis, of bounded variation. Therefore, to prove the boundedness of the variation of ξ(t), it will be sufficient to prove the boundedness of the variation of the integral ∫_{0<|x|≤1} x ν(t, dx).
is with probability 1 piecewise-constant, its variation, which coincides with the sum of the absolute values of the jumps, is finite. The function a(t) is, by hypothesis, of bounded variation. Therefore to prove the boundedness of the variation of e(t), it will be sufficient to prove the boundedness of the variation of the integral xv(t, dx) 0 0, P{max w(t) > a} = o5t5T
e_(x2I2T)dx 2 VT7FT-- a
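The formula above can be checked numerically (our sketch; the step count and seed are illustrative assumptions): approximate a Brownian path on [0, T] by a scaled Gaussian random walk and compare the frequency of {max w(t) > a} with the reflection-principle value 2P{w(T) > a} = erfc(a/√(2T)).

```python
import random, math

def max_exceeds(a, T, steps, rng):
    """One discretized Brownian path; does its running maximum exceed a?"""
    dt = T / steps
    w, mx = 0.0, 0.0
    for _ in range(steps):
        w += rng.gauss(0.0, math.sqrt(dt))
        mx = max(mx, w)
    return mx > a

rng = random.Random(1)
a, T = 1.0, 1.0
trials, steps = 20000, 200
hits = sum(max_exceeds(a, T, steps, rng) for _ in range(trials))
estimate = hits / trials
exact = math.erfc(a / math.sqrt(2 * T))      # = 2 * P{w(T) > a} ≈ 0.3173
print(round(estimate, 3), round(exact, 3))
```

The discretized walk slightly undershoots the true maximum, so the estimate is biased a little low; with 200 steps the agreement is still within a few hundredths.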
This follows from Theorem 1 with (c, d) = (−∞, ∞).

Theorem 2. Suppose that a₁ < 0 < a₂ and [c, d] ⊂ [a₁, a₂]. Then

P{min_{0≤t≤T} w(t) > a₁, max_{0≤t≤T} w(t) < a₂, w(T) ∈ [c, d]}
  = (1/√(2πT)) Σ_{k=−∞}^∞ ∫_c^d [exp{−(x + 2k(a₂ − a₁))²/2T} − exp{−(x − 2a₂ + 2k(a₂ − a₁))²/2T}] dx.   (4)

Proof. Let 𝔄_k^(i) denote the event that the process w(t) defined on the interval [0, T] crosses the level a_i earlier than it does the level a_j (where j ≠ i and i, j = 1, 2) and then crosses the interval [a₁, a₂] no fewer than k times (we are assuming that the function x(t) crosses the interval [a₁, a₂] k times if the function sgn(x(t) − a₁) + sgn(x(t) − a₂)
5. PROCESSES OF BROWNIAN MOTION
changes sign k times) and w(T) ∈ [c, d]. The desired probability can be expressed as follows:

P{w(T) ∈ [c, d]} − P{𝔄₀^(1)} − P{𝔄₀^(2)}.
To calculate P{𝔄₀^(i)}, let us find the probabilities P{𝔄_k^(i)} (i = 1, 2). We have

P{𝔄_k^(i)} + P{𝔄_{k+1}^(j)} = P{𝔄_k^(i) ∪ 𝔄_{k+1}^(j)}  (j ≠ i; i, j = 1, 2).

As one can easily see, 𝔄_k^(i) ∪ 𝔄_{k+1}^(j) is the event that the process w(t) crosses the level a_i prior to the instant T (though not necessarily before it crosses the level a_j) and then crosses the interval [a₁, a₂] no fewer than k times before, at time T, falling into the interval [c, d]. Let τ₁ denote the instant of first crossing of the level a_i, let τ₂ denote the first crossing of a_j after the instant τ₁, let τ₃ denote the first instant of crossing of a_i after τ₂, etc. We set

w₁(t) = w(t) for t ≤ τ₁,  w₁(t) = 2w(τ₁) − w(t) for t > τ₁;
w₂(t) = w₁(t) for t ≤ τ₂,  w₂(t) = 2w₁(τ₂) − w₁(t) for t > τ₂;
w₃(t) = w₂(t) for t ≤ τ₃,  w₃(t) = 2w₂(τ₃) − w₂(t) for t > τ₃;

etc.
We note that the processes w_l(t) are processes of Brownian motion, since τ_l is the instant of first crossing of the level a_i + (l − 1)(a_j − a_i) by the process w_{l−1}(t). If the event 𝔄_k^(i) ∪ 𝔄_{k+1}^(j) occurs, the process w_{k+1}(t) for t ≤ T crosses successively the levels

a_i, a_i + (a_i − a_j), ⋯, a_i + k(a_i − a_j)

and at the instant T falls in the interval [c_k, d_k], where

c_k = c + (k + 1)(a_i − a_j),  d_k = d + (k + 1)(a_i − a_j)  for odd k,
c_k = 2a_i − d + k(a_i − a_j),  d_k = 2a_i − c + k(a_i − a_j)  for even k.

Conversely, if w_{k+1}(t) satisfies these conditions, the event 𝔄_k^(i) ∪ 𝔄_{k+1}^(j) occurs. Since w_{k+1}(t) is a continuous process that vanishes at t = 0, for w_{k+1}(T) to fall in the interval [c_k, d_k] it must beforehand cross the levels a_i + l(a_i − a_j) for l = 0, ⋯, k. Therefore

P{𝔄_k^(i) ∪ 𝔄_{k+1}^(j)} = P{w_{k+1}(T) ∈ [c_k, d_k]} = P{w(T) ∈ [c_k, d_k]}.

It follows from the continuity of the process w(t) that w(t) crosses, with probability 1, the interval [a₁, a₂] finitely many times, and hence
P{𝔄_k^(i)} → 0 as k → ∞. Taking the limit as n → ∞ in the equation

P{𝔄₀^(1)} + P{𝔄₀^(2)} = (−1)^{n+1}[P{𝔄_{n+1}^(1)} + P{𝔄_{n+1}^(2)}] + Σ_{k=0}^n (−1)^k (P{𝔄_k^(1) ∪ 𝔄_{k+1}^(2)} + P{𝔄_k^(2) ∪ 𝔄_{k+1}^(1)}),

and substituting the Gaussian probabilities P{w(T) ∈ [c_k, d_k]} found above, we obtain

P{𝔄₀^(1)} + P{𝔄₀^(2)} = (1/√(2πT)) [Σ_{k=−∞}^∞ ∫_{2a₂−d+2k(a₂−a₁)}^{2a₂−c+2k(a₂−a₁)} e^{−x²/2T} dx − Σ_{k≠0} ∫_{c+2k(a₂−a₁)}^{d+2k(a₂−a₁)} e^{−x²/2T} dx].
Therefore the desired probability is equal to

(1/√(2πT)) Σ_{k=−∞}^∞ [∫_{c+2k(a₂−a₁)}^{d+2k(a₂−a₁)} e^{−x²/2T} dx − ∫_{2a₂−d+2k(a₂−a₁)}^{2a₂−c+2k(a₂−a₁)} e^{−x²/2T} dx].

If we set x − 2k(a₂ − a₁) = u in the first integral and
2k(a₂ − a₁) + 2a₂ − x = u in the second, we obtain formula (4). This completes the proof of the theorem.

Corollary 1. The joint distribution of the quantities max_{0≤t≤T} w(t) and min_{0≤t≤T} w(t) for a₁ < 0 and a₂ > 0 is given by the formula for

P{min_{0≤t≤T} w(t) > a₁, max_{0≤t≤T} w(t) < a₂}

obtained from (4) by setting [c, d] = [a₁, a₂].
For a^k ≤ t ≤ a^{k+1} we have g(t) ≤ g(a^{k+1}) and, by symmetry, 2P{ξ(t) − ξ(a^k) ≥ 0} ≥ 1. Therefore

P{𝔄_k} ≤ 4P{ξ(a^k) > g(t)} P{ξ(t) − ξ(a^k) ≥ 0} ≤ 4P{ξ(t) > g(t)}.

Consequently,
P{𝔄_k} ≤ (4/(a^{k+1} − a^k)) ∫_{a^k}^{a^{k+1}} P{ξ(t) > g(t)} dt,

and

Σ_k P{𝔄_k} ≤ C ∫_1^∞ (1/t) P{ξ(t) > g(t)} dt < ∞.

It follows from the Borel–Cantelli lemma (Theorem 2, Section 3, Chapter III) that with probability 1, only finitely many of the events 𝔄_k occur; that is, for some (generally speaking, random) number k₀,
6. ON THE GROWTH OF HOMOGENEOUS PROCESSES
291
the events 𝔄_k do not occur if k ≥ k₀. This means that

P{lim_{k→∞} (1/g(a^{k−1})) sup_{a^{k−1}≤t≤a^k} ξ(t) ≤ 1} = 1.

For t ∈ [a^{k−1}, a^k] (where k ≥ k₀), ξ(t) ≤ sup_{a^{k−1}≤s≤a^k} ξ(s) and g(a^{k−1}) ≥ g(t)/k₂(a²), so that for λ > k₂(a²) the function λg(t) is an upper function for ξ(t). This completes the proof of the theorem.
Theorem 2. Let ξ(t) denote a symmetric homogeneous process with independent increments and let g(t) denote a function of regular growth such that the series

Σ_{k=1}^∞ P{ξ(a^k) > g(a^k)}

diverges for all a > 1. Then for arbitrary λ < 1, λg(t) is a lower function for the process ξ(t).

Proof. We first show that

P{lim_{t→∞} ξ(t)/(λg(t)) ≥ 1} = 1  (λ < 1).   (3)
1. Suppose that lim_{t→∞} P{ξ(t) > g(t)} ≥ 1/2. Then there exists a sequence {t_k} such that

P{ξ(t_k) > g(t_k)} ≥ 1/2 − 1/(2k²).

Hence, by symmetry, P{|ξ(t_k)| ≤ g(t_k)} ≤ 1/k², and so the series

Σ_{k=1}^∞ P{|ξ(t_k)| ≤ g(t_k)} ≤ Σ_{k=1}^∞ 1/k²

converges. It follows from the Borel–Cantelli lemma that, with probability 1, |ξ(t_k)| > g(t_k) from some k on. This means that for λ < 1,

P{lim_{k→∞} |ξ(t_k)|/(λg(t_k)) ≥ 1} = 1.

2. Suppose that lim_{t→∞} P{ξ(t) > g(t)} < 1/2. Then there exists a δ > 0 such that for sufficiently large t,

P{|ξ(t)| ≤ g(t)} ≥ δ.

Consider the independent events
θ_k = {ξ(a^{k+1}) − ξ(a^k) > g(a^{k+1}) − g(a^k)},

where a > 1. Then

P{θ_k} ≥ ∫_{−g(a^k)}^{g(a^k)} P{ξ(a^{k+1}) − z > g(a^{k+1}) − g(a^k)} P{ξ(a^k) ∈ dz}
 ≥ P{ξ(a^{k+1}) > g(a^{k+1})} P{|ξ(a^k)| ≤ g(a^k)} ≥ P{ξ(a^{k+1}) > g(a^{k+1})} δ

for sufficiently large k. Consequently the series Σ_{k=1}^∞ P{θ_k} diverges. Hence, on the basis of the Borel–Cantelli lemma, infinitely many of the events θ_k occur with probability 1. We note that the event θ_k implies one of the events

{−ξ(a^k) > g(a^k)},  {ξ(a^{k+1}) > g(a^{k+1}) − 2g(a^k)}.

Therefore, with probability 1, infinitely many of the events

{|ξ(a^k)| > g(a^k) − 2g(a^{k−1})}
occur. Given λ < 1, choose a in such a way that

g(a^k) − 2g(a^{k−1}) = g(a^k)[1 − 2g(a^{k−1})/g(a^k)] ≥ g(a^k)[1 − 2/k₁(a)] ≥ λg(a^k)

(the possibility of such a choice of a is ensured by the regularity of the growth of g(t)). We see that

P{lim_{t→∞} |ξ(t)|/(λg(t)) ≥ 1} = 1.
We now show that (3) implies that λg(t) is a lower function for ξ(t). Set

C = {lim_{t→∞} ξ(t)/(λg(t)) ≥ 1},  D = {lim_{t→∞} (−ξ(t))/(λg(t)) ≥ 1}.
Then it follows from (3) that P(C ∪ D) = 1. We conclude from the symmetry of the process ξ(t) that P(C) = P(D). Finally, from the 0-or-1 law (cf. Theorem 5, Section 3, Chapter III) it follows that P(C) and P(D) can only be zero or one. This means that P(C) = P(D) = 1, since otherwise we would have P(C) = 0 and hence P(C ∪ D) = 0, which contradicts equation (3). This completes the proof of the theorem.
Remark 1. Suppose that in these two theorems we consider
a < 1, and instead of the function g(t) we consider the function φ(t) = 1/g(1/t), where g is a function of regular growth. Without otherwise changing the proofs of the theorems, we see that the following assertions are true:

a. If ξ(t) is a symmetric separable homogeneous stochastically continuous process with independent increments and if

∫_0^1 (1/t) P{ξ(t) > φ(t)} dt < ∞,
then for λ > 1 the function λφ(t) will be a locally upper function for the process ξ(t); that is, P{lim_{t→0} ξ(t)/(λφ(t)) ≤ 1} = 1.

b. If ξ(t) is a homogeneous symmetric process with independent increments such that the series Σ_{k=1}^∞ P{ξ(a^k) > φ(a^k)} diverges for every a < 1, then for λ < 1 the function λφ(t) will be a locally lower function for ξ(t); that is, P{lim_{t→0} ξ(t)/(λφ(t)) ≥ 1} = 1.

Let us apply these results to a process of Brownian motion. Such a process is symmetric. By using the inequalities
(1/√(2π)) ∫_z^∞ e^{−u²/2} du ≤ (1/(z√(2π))) e^{−z²/2}  (z > 0),

(1/√(2π)) ∫_z^∞ e^{−u²/2} du ≥ (1/√(2π)) ∫_z^{z+1} e^{−u²/2} du ≥ (1/√(2π)) e^{−(z+1)²/2}  (z > 0),

we see that

(1/√(2π)) e^{−(z/√t + 1)²/2} ≤ P{w(t) > z} ≤ (1/√(2π)) (√t/z) e^{−z²/(2t)}.
we see that, for every ε > 0,

∫^∞ (1/t) P{w(t) > (1 + ε)√(2t ln ln t)} dt < ∞,

so that (1 + ε)√(2t ln ln t) is an upper function for w(t). On the other hand,

P{w(a^k) > (1 − ε)√(2a^k ln ln a^k)} ≥ (e^{−1/2}/√(2π)) exp{−(1 − ε²)[ln ln a^k + √(2 ln ln a^k)]} ≥ C exp{−α ln ln a^k} = C(ln a^k)^{−α}
if (1 − ε²)[1 + √2 (ln ln a^k)^{−1/2}] ≤ α < 1 (as will be the case for sufficiently large k). Consequently the series Σ_k P{w(a^k) > (1 − ε)√(2a^k ln ln a^k)} diverges. Thus we have proved:

Theorem 3. If w(t) is a separable process of Brownian motion, then

P{lim_{t→∞} w(t)/√(2t ln ln t) = 1} = 1.
By using Remark 1 we can prove:

Theorem 4. If w(t) is a separable process of Brownian motion, then

P{lim_{t→0} w(t)/√(2t ln ln(1/t)) = 1} = 1.

Theorems 3 and 4 are called the "law of the iterated logarithm." In studying upper and lower functions for |ξ(t)|, where ξ(t) is a process with independent increments, we use:

Lemma 2. Let ξ(t) denote a separable stochastically continuous process with independent increments for which there exists an α < 1 such that P{|ξ(T) − ξ(s)| > C} ≤ α for 0 ≤ s ≤ T.
Then for every x > 0,

P{sup_{0≤s≤T} |ξ(s)| > C + x} ≤ (1/(1 − α)) P{|ξ(T)| > x}.   (4)

Proof. It follows from Theorem 2, Section 4, Chapter III, that

P{sup_{1≤k≤n} |ξ(kT/n)| > x + C} ≤ (1/(1 − α)) P{|ξ(T)| > x}.

Taking the limit as n → ∞, we obtain the proof of the lemma.

Theorem 5. Let ξ(t) denote a separable homogeneous stochastically continuous process with independent increments and let g(t) denote a
function of regular growth such that for arbitrary ε > 0,

lim_{t→∞} P{|ξ(t)| > εg(t)} < 1  and  ∫_1^∞ (1/t) P{|ξ(t)| > g(t)} dt < ∞.

Then for λ > 1 the function λg(t) is an upper function for |ξ(t)|; that is,

P{lim_{t→∞} |ξ(t)|/(λg(t)) ≤ 1} = 1.
Proof. Let a > 1 and ε > 0. Let 𝔄_k denote the event

{sup_{t≤a^{k+1}} |ξ(t)| > (1 + 2ε)g(a^{k+1})}.

By hypothesis there exist c > 0 and T₀ such that P{|ξ(t)| > εg(t)} ≤ 1 − c for t ≥ T₀. Since for T₀ ≤ t ≤ a^{k+1}

P{|ξ(t)| > εg(a^{k+1})} ≤ P{|ξ(t)| > εg(t)} ≤ 1 − c,

and

lim_{k→∞} sup_{t≤T₀} P{|ξ(t)| > εg(a^{k+1})} = 0,

Lemma 2 is applicable. Using the fact that P{|ξ(t)| > εg(a^{k+1})} ≤ 1 − c for sufficiently large k, we obtain

P{𝔄_k} ≤ (2/c) P{|ξ(a^{k+1})| > (1 + ε)g(a^{k+1})} ≤ (2/c²) P{|ξ(t)| > g(t)}  for a^k ≤ t ≤ a^{k+1},

so that

P{𝔄_k} ≤ C(ln a)⁻¹ c⁻² ∫_{a^k}^{a^{k+1}} (1/t) P{|ξ(t)| > g(t)} dt

and

Σ_k P{𝔄_k} ≤ C(ln a)⁻¹ c⁻² ∫_1^∞ (1/t) P{|ξ(t)| > g(t)} dt < ∞.
This means that with probability 1, only finitely many events 𝔄_k occur. By using the reasoning of Theorem 1, we see that the function λ(1 + 2ε)k₂(a²)g(t) is, for λ > 1, an upper function for |ξ(t)|. Since k₂(a) → 1 as a → 1, and since a > 1 and ε > 0 are arbitrary, the assertion of the theorem follows.
Analyzing the proof of Theorem 2, one can easily prove:

Theorem 6. Let ξ(t) denote a homogeneous process with independent increments. If a function g(t) of regular growth is such that the series

Σ_{k=1}^∞ P{|ξ(a^k)| > g(a^k)}

diverges for every a > 1, then for every 0 < λ < 1 the function λg(t) is a lower function for |ξ(t)|; that is,

P{lim_{t→∞} |ξ(t)|/(λg(t)) ≥ 1} = 1.

The results of Theorems 5 and 6 can be reformulated for the case in which t → 0, in a manner analogous to that used in Remark 1.
VII JUMP MARKOV PROCESSES
Let X denote an arbitrary space with a fixed σ-algebra 𝔅. Let us interpret X as the phase space of some physical system Σ, and let us denote the state of Σ at the instant t by ξ(t) (∈ X). Let us suppose that the time t varies in discrete amounts (t = 0, 1, 2, ⋯). Let us suppose that the change in the system Σ from its state x at the instant t into another state at the next instant t + 1 is completely determined by the time t, the state x, and some random factor α_t that constitutes, for the different values of t, a sequence of independent random elements. Thus

ξ(t + 1) = f(t, ξ(t), α_t),   (1)

where f(t, x, α) is a function of the three variables t, x, and α, where t = 0, 1, 2, ⋯, x ∈ X, and α ∈ A. Formula (1) enables us to express the state of the system Σ at an arbitrary instant s, starting with the state ξ(t) of the system at an instant t < s:

ξ(s) = g_{t,s}(ξ(t), α_t, α_{t+1}, ⋯, α_{s−1}).   (2)
We emphasize that ξ(t) in this equation is independent of the set α_t, α_{t+1}, ⋯, α_{s−1}. Let {Ω, 𝔖, P} denote the probability space on which the random elements α_t are defined. Let us suppose that for arbitrary fixed t and s (where s > t) the function g_{t,s}(x, α_t, α_{t+1}, ⋯, α_{s−1}) is (𝔅 × 𝔖)-measurable. Then if the motion of the system Σ begins at the instant t and its initial state ξ(t) = x is known, formula (2) enables us to determine the probability that Σ will fall in an arbitrary set A ∈ 𝔅 at the instant s > t. We shall call this probability the transition probability, and we shall denote it by P(t, x, s, A). If χ_A(x) denotes the characteristic function of the set A, then

P(t, x, s, A) = Mχ_A[g_{t,s}(x, α_t, ⋯, α_{s−1})].   (3)
Let u and v denote two numbers such that t < u < v. It follows from formula (2) (cf. Theorem 7, Section 4, Chapter IV) and the
independence of the random variables α_t, ⋯, α_{u−1}, α_u, ⋯, α_{v−1} that

P(t, x, v, A) = M{Mχ_A[g_{u,v}(y, α_u, ⋯, α_{v−1})]}|_{y=ξ(u)} = MP(u, ξ(u), v, A),

which may be rewritten

P(t, x, v, A) = ∫ P(u, y, v, A) P(t, x, u, dy),  t < u < v.   (4)

Equation (4) is called the Chapman-Kolmogorov equation. It expresses an important property of the systems that we are considering, namely
the absence of aftereffects: if we know the state of a system at a certain instant u, the probabilities of transition from that state do not depend on the motion of the system at previous instants of time. Systems enjoying this property are called Markov systems. They are frequently encountered in questions of science and technology. Chapters 7 and 8 are devoted to a study of Markov processes. In this chapter, we shall consider systems whose motion can be characterized by the fact that the system Σ is immobile in phase space for some period of time and at a random instant its position changes by a jump. In the next chapter we shall consider systems whose states change continuously with time.

1.
TRANSITION PROBABILITIES
Let X denote an arbitrary space with a fixed σ-algebra of sets 𝔅 and let Z denote a set of real numbers.

Definition 1. A family of functions p(t, x, u, A), where t, u ∈ Z, t < u, x ∈ X, and A ∈ 𝔅, is called a Markov process in the broad sense in the phase space X if the functions p satisfy the following conditions:

a. p(t, x, u, A) is, for fixed t, x, and u, a probability measure on 𝔅;
b. for fixed t, u, and A, the function p(t, x, u, A), as a function of the variable x, is 𝔅-measurable; and
c. for arbitrary t, u, v, x, and A (with t < u < v) the functions p(t, x, u, A) satisfy the Chapman-Kolmogorov equation

p(t, x, v, A) = ∫_X p(t, x, u, dy) p(u, y, v, A).   (1)
According to our interpretation of transition probabilities we naturally assume that
1.
TRANSITION PROBABILITIES
299
(2)
P(t, x, t, A) = XA(x) ,
where XA(x) is the characteristic function of the set A e 0. There are two families of operators connected with transition probabilities: 1. Suppose that the distribution of the position of the system in phase space is given at the instant to and suppose that po(A) = p{e(to) e Al where A e 0. Let the distribution of the system E at the instant t > to be denoted by pt(A). Then
μ_t(A) = ∫_X μ₀(dx) p(t₀, x, t, A).   (3)

Formula (3) defines an operator T_{[t₀,t]} (for t₀ < t), which maps the probability measure μ₀(A) into a new probability measure μ_t(A). This is true because (obviously) μ_t(A) is nonnegative,

μ_t(X) = ∫_X μ₀(dx) p(t₀, x, t, X) = ∫_X μ₀(dx) = 1,

and the countable additivity of
μ_t(A) follows from the countable additivity of the integral and the transition probability p(t₀, x, t, A). Furthermore, if instead of the probability measure μ₀ in formula (3) we substitute an arbitrary finite charge W₀, the right-hand side of this formula remains meaningful and we obtain a transformation T_{[t₀,t]} in the space W of all finite charges (cf. Section 1, Chapter II):

W_t(A) = ∫_X W₀(dx) p(t₀, x, t, A).   (4)

If W₀ = W₀⁺ − W₀⁻, where W₀⁺ and W₀⁻ are the positive and negative variations, respectively, of the function W₀, then

W_t(A) = ∫_X W₀⁺(dx) p(t₀, x, t, A) − ∫_X W₀⁻(dx) p(t₀, x, t, A).   (5)
Consequently, for the positive and negative variations of the charge W_t(A) we have

W_t⁺(A) ≤ ∫_X W₀⁺(dx) p(t₀, x, t, A),  W_t⁻(A) ≤ ∫_X W₀⁻(dx) p(t₀, x, t, A).

2. The conditional expectation of a bounded measurable function f of the state at the instant s, given ξ(t) = x with t < s, is given by the expression

f_t(x) = ∫_X f(y) p(t, x, s, dy).
As t → ∞, the probabilities p_ij(t) approach the values

lim_{t→∞} p_{i0}(t) = p₀ = μ/(λ + μ),  lim_{t→∞} p_{i1}(t) = p₁ = λ/(λ + μ)  (i = 0, 1),

which are independent of i. On the other hand, the probabilities p₀ and p₁ coincide with the probabilities of the states 0 and 1 in a stationary mode. If ν(t) denotes the state of the channel in the stationary case, then m = Mν(t) = p₁, and the covariance function of the process ν(t) is calculated as follows:

R(t) = M(ν(t + τ)ν(τ)) − m²,  M(ν(t + τ)ν(τ)) = p₁ p₁₁(t),

from which we get

R(t) = λμ(λ + μ)⁻² e^{−(λ+μ)t}.
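These formulas can be verified directly (our sketch; the rate values, and the convention that lam is the 0 → 1 rate and mu the 1 → 0 rate, are assumptions): the stationary probabilities solve the balance equation λp₀ = μp₁, and the covariance reduces to p₁p₀e^{−(λ+μ)t}.

```python
import math

lam, mu = 0.7, 1.3
p0, p1 = mu / (lam + mu), lam / (lam + mu)

def p11(t):
    # transition probability 1 -> 1 for the two-state chain
    return p1 + p0 * math.exp(-(lam + mu) * t)

def R(t):
    # stationary covariance: M(nu(t+s) nu(s)) - m^2 with m = p1
    return p1 * p11(t) - p1 ** 2

balance = lam * p0 - mu * p1                       # should vanish
closed = lam * mu / (lam + mu) ** 2 * math.exp(-(lam + mu) * 0.5)
print(abs(balance), abs(R(0.5) - closed))
```

Both residuals vanish to machine precision, confirming that R(t) = λμ(λ+μ)⁻² e^{−(λ+μ)t} for this parameterization.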
2.
Processes of Pure Growth
As an example of a Markov process with countably many states, let us consider a so-called process of pure growth. The possible states in this example are the numbers 0, 1, 2, ⋯, and the sample functions do not decrease with increasing t; with each jump they increase by unity. A process of this kind can serve as a mathematical model of the registration of a phenomenon that takes place at random instants of time. For example, in successive radioactive decay, from an original radioactive substance (the parent substance) another radioactive substance (the first daughter substance) is formed; from that is formed a second daughter substance, and so forth. Let us consider one particular atom of the original substance. For a random interval of time this atom remains in the original state. Then it disintegrates, becoming an atom of the first daughter
4.
EXAMPLES
321
substance, and so forth. Here the probability of disintegration of the atom in the interval (t, s) does not depend on the time the atom has "lived" up to the instant t, and each state of the atom has a definite mean life span t_k = 1/λ_k. Examples of such chains of successive radioactive decay are the transformations of natural isotopes of uranium and thorium; these terminate with the formation of stable isotopes of lead.
Let us state mathematically our assumptions regarding the random process ξ(t) in the form of conditions on the transition probabilities. It is natural to assume that

p_{i,i+1}(t) = λ_i t + o(t),  p_{ij}(t) = o(t) for j − i > 1,  p_{ii}(t) = 1 − λ_i t + o(t).

The first system of Kolmogorov equations takes the form

p′_{ij}(t) = −λ_i p_{ij}(t) + λ_i p_{i+1,j}(t),  i ≥ 0,   (1)

and the second system of Kolmogorov equations takes the form

p′_{ij}(t) = −λ_j p_{ij}(t) + λ_{j−1} p_{i,j−1}(t),  j ≥ 1,
p′_{i0}(t) = −λ₀ p_{i0}(t).   (2)
To these equations we need to add the initial conditions p_{ij}(0) = δ_{ij}. Let us solve the second system. If i > j, then according to the definition of a process of pure growth we set p_{ij}(t) = 0, which of course is also a solution of the system (2) and satisfies the initial conditions. The system (2), for fixed i, is then recursive. First let us determine p_{ii}(t) from the equation
p′_{ii}(t) = −λ_i p_{ii}(t),  p_{ii}(0) = 1,

and then let us determine successively the functions p_{i,i+1}(t), p_{i,i+2}(t), ⋯, for each of which the system (2) is an ordinary linear differential equation. For the first step we have p_{ii}(t) = e^{−λ_i t}. Then

p_{ij}(t) = λ_{j−1} ∫_0^t exp{−λ_j(t − s)} p_{i,j−1}(s) ds.
One can easily obtain a solution in explicit form for the system (2) by the usual methods of operational calculus. To do this, consider the transforms of the functions p_{ij}(t):

φ_{ij}(z) = z ∫_0^∞ e^{−zt} p_{ij}(t) dt.

Then the transform of p′_{ij} is z(φ_{ij}(z) − δ_{ij}), and when we shift to the transforms of the functions, the system (2) takes the form

z φ_{ij}(z) = −λ_j φ_{ij}(z) + λ_{j−1} φ_{i,j−1}(z),  j > i;  z(φ_{ii}(z) − 1) = −λ_i φ_{ii}(z),

from which we get

φ_{ii}(z) = z/(z + λ_i)

and φ_{ij}(z) = (∏_{k=i}^{j−1} λ_k) z/ψ(z), where ψ(z) = ∏_{k=i}^{j} (z + λ_k). If all the numbers λ_k are distinct, then

1/ψ(z) = Σ_{k=i}^{j} 1/((z + λ_k) ψ′(−λ_k)).

By means of this formula, the expression for φ_{ij}(z) can be written in the form

φ_{ij}(z) = (∏_{k=i}^{j−1} λ_k) Σ_{k=i}^{j} z/((z + λ_k) ψ′(−λ_k)).

Since φ_{ii}(z) = z/(z + λ_i) is the transform of the function e^{−λ_i t}, we have, if j > i,

p_{ij}(t) = (∏_{k=i}^{j−1} λ_k) Σ_{k=i}^{j} e^{−λ_k t}/ψ′(−λ_k).
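The closed form can be checked numerically (our code; the rate values are an arbitrary assumed example): with distinct rates, p_{0j}(t) = (∏_{k<j} λ_k) Σ_k e^{−λ_k t}/ψ′(−λ_k), where ψ′(−λ_k) = ∏_{l≠k}(λ_l − λ_k), and it must satisfy the second Kolmogorov system p′_{0j} = −λ_j p_{0j} + λ_{j−1} p_{0,j−1}.

```python
import math

def p_closed(lams, t):
    """p_0j(t) for the pure-growth chain, rates lams = [lam_0, ..., lam_j], distinct."""
    j = len(lams) - 1
    coeff = 1.0
    for k in range(j):
        coeff *= lams[k]
    total = 0.0
    for k in range(j + 1):
        denom = 1.0                       # psi'(-lam_k) = prod_{l != k} (lam_l - lam_k)
        for l in range(j + 1):
            if l != k:
                denom *= lams[l] - lams[k]
        total += math.exp(-lams[k] * t) / denom
    return coeff * total

def ode_residual(lams, t, h=1e-5):
    """How far p_0j is from satisfying p' = -lam_j p + lam_{j-1} p_{0,j-1}."""
    deriv = (p_closed(lams, t + h) - p_closed(lams, t - h)) / (2 * h)
    rhs = -lams[-1] * p_closed(lams, t) + lams[-2] * p_closed(lams[:-1], t)
    return deriv - rhs

r = ode_residual([1.0, 2.0, 3.0], 0.7)
base = abs(p_closed([1.0], 0.5) - math.exp(-0.5))   # j = i reduces to e^{-lam t}
print(abs(r) < 1e-6, base < 1e-12)
```

The finite-difference residual of the ODE is at the discretization-error level, and the degenerate case j = i reproduces e^{−λ_i t}, as required by the derivation.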
,
Mzk
k=1
and on the basis of the corollary mentioned, the series (5) converges with probability 1 if the series (6) converges. On the other hand,
',P{Tk> 1}=I,e-1k k=o
(7)
k=O
e- k
E MZk = E e-1k + E
(8)
and if the series (6) diverges but the series (7) converges, then the series (8) diverges.
Thus in the case of divergence of the series (6), one of the series (7) or (8) diverges and consequently the series (5) diverges with probability 1. This completes the proof of the theorem. It follows from Theorem 1 that if the series (6) converges, then after a finite interval of time the system moves out to infinity with probability 1 (or disappears from the phase space). With probability 1, a process of linear growth does not go out to infinity.
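The convergent case of Theorem 1 is easy to see empirically. A sketch (Python; the quadratic rates $\lambda_k = (k+1)^2$ and all numerical parameters are assumptions of this example): the total lifetime $\sum_k \tau_k$ is then finite with probability 1, with mean $\sum_k (k+1)^{-2} = \pi^2/6$.

```python
import math
import random

random.seed(0)

# Pure growth process with lambda_k = (k+1)^2 (an assumed example): the series
# (6) converges, so by Theorem 1 the total lifetime sum_k tau_k is finite with
# probability 1; its mean is sum_k 1/(k+1)^2 = pi^2/6.
def total_lifetime(n_jumps=500):
    # sum of independent Exp(lambda_k) holding times, truncated after n_jumps
    return sum(random.expovariate((k + 1) ** 2) for k in range(n_jumps))

trials = 2000
mean_time = sum(total_lifetime() for _ in range(trials)) / trials
print(mean_time, math.pi ** 2 / 6)   # close, up to the truncated tail
```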
3. Birth and Death Processes
Birth and death processes are homogeneous Markov processes with possible states $0, 1, 2, \dots, n, \dots$, in which transitions from the state $n$ are possible only into the states $n - 1$ and $n + 1$. Accordingly we set

$$p_{i,i+1}(t) = \lambda_i t + o(t), \qquad p_{ij}(t) = o(t) \;\text{ for } |i - j| > 1,$$
$$p_{i,i-1}(t) = \mu_i t + o(t), \qquad p_{ii}(t) = 1 - (\lambda_i + \mu_i)t + o(t).$$

In the case of the first system of Kolmogorov equations, the differential equations for the transition probabilities take the form

$$p'_{ij}(t) = -(\lambda_i + \mu_i)p_{ij}(t) + \lambda_i p_{i+1,j}(t) + \mu_i p_{i-1,j}(t), \qquad i = 0, 1, 2, \dots \;(\mu_0 = 0), \qquad(9)$$

and in the case of the second system of Kolmogorov equations, they take the form

$$p'_{ij}(t) = -p_{ij}(t)(\lambda_j + \mu_j) + p_{i,j-1}(t)\lambda_{j-1} + p_{i,j+1}(t)\mu_{j+1}. \qquad(10)$$

For the unconditional probabilities $p_i(t)$ we have the system of equations

$$p'_i(t) = -p_i(t)(\lambda_i + \mu_i) + p_{i-1}(t)\lambda_{i-1} + p_{i+1}(t)\mu_{i+1}, \qquad i = 0, 1, 2, \dots, \qquad p_{-1}(t) \equiv 0. \qquad(11)$$
Let us find a stationary distribution of the probabilities, that is, a distribution of the probabilities $p_i(t)$, $i = 0, 1, 2, \dots$, that satisfies the system (11) and does not change in time: $p_i(t) = \text{const} = p_i$. For such a distribution the system of differential equations (11) degenerates into a homogeneous algebraic system:

$$-(\lambda_i + \mu_i)p_i + \lambda_{i-1}p_{i-1} + \mu_{i+1}p_{i+1} = 0, \qquad -\lambda_0 p_0 + \mu_1 p_1 = 0. \qquad(12)$$

Suppose that $\mu_k > 0$ for $k = 1, 2, \dots$. Then $p_1 = (\lambda_0/\mu_1)p_0$ and, as we easily find by induction,

$$p_k = \frac{\lambda_0\lambda_1\cdots\lambda_{k-1}}{\mu_1\mu_2\cdots\mu_k}\,p_0, \qquad(13)$$

and

$$\sum_{k=0}^{\infty}p_k = p_0\Bigl(1 + \sum_{k=1}^{\infty}\frac{\lambda_0\lambda_1\cdots\lambda_{k-1}}{\mu_1\mu_2\cdots\mu_k}\Bigr).$$

4. EXAMPLES

Theorem 2. For a stationary probability distribution to exist in a birth and death process it is necessary and sufficient that the series

$$\sum_{k=1}^{\infty}\frac{\lambda_0\lambda_1\cdots\lambda_{k-1}}{\mu_1\mu_2\cdots\mu_k} \qquad (\mu_k > 0,\; k = 1, 2, \dots) \qquad(14)$$

converge.
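When the series (14) converges, normalizing the weights (13) gives the stationary law explicitly. A sketch (Python; the constant rates $\lambda_k = \lambda$, $\mu_k = \mu$ with $\rho = \lambda/\mu < 1$ are an assumed example): the weights become $\rho^k$, so the stationary distribution is geometric, $p_k = (1-\rho)\rho^k$.

```python
lam, mu, K = 1.0, 2.0, 200   # constant rates (an assumed example); rho = 1/2
rho = lam / mu

# formula (13): p_k proportional to (lam_0...lam_{k-1})/(mu_1...mu_k) = rho^k
w = [rho ** k for k in range(K)]   # truncation at K is harmless: rho^200 is tiny
s = sum(w)
p = [x / s for x in w]

print(p[0], 1 - rho)               # normalization constant of the geometric law
# each interior equation of the system (12) is satisfied by the geometric law
print(-(lam + mu) * p[2] + lam * p[1] + mu * p[3])   # zero up to rounding
```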
It is interesting to note the connection between stationary distributions of a process and the so-called "final probabilities" $p_{ik}(\infty)$, where $p_{ik}(\infty) = \lim_{t\to\infty}p_{ik}(t)$. Let us suppose that the final probabilities $p_{ik}(\infty)$ exist. When we integrate equation (11) from $h$ to $h + T$, divide by $T$, and then take the limit as $T \to \infty$, we obtain

$$-p_{ij}(\infty)(\lambda_j + \mu_j) + p_{i,j-1}(\infty)\lambda_{j-1} + p_{i,j+1}(\infty)\mu_{j+1} = 0, \qquad j > 0;$$
$$-p_{i0}(\infty)\lambda_0 + p_{i1}(\infty)\mu_1 = 0;$$
that is, the final probabilities $p_{ij}(\infty)$ coincide with the stationary distribution (for fixed $i$), and they are independent of $i$.

In technology, physics, and natural science there are many problems that involve birth and death processes. Let us look at some of these.

1. The servicing of lathes. Suppose that $m$ lathes are serviced by a crew of $s$ repairmen. When a lathe fails to function properly, it is repaired immediately unless all of the repairmen are working on lathes that have already failed, in which case the lathe must await repair. The lathes are repaired in the order in which they fail. Let us make the following assumptions: For an individual functioning lathe, the probability of getting out of order during an interval of time $(t, t + \Delta t)$ is independent of $t$ and is equal to $\lambda\Delta t + o(\Delta t)$, independently of the "history" of its operation (the length of time that it has been in use, the number of times that it has been out of order, and the length of service) up to the instant $t$. Analogously, if a lathe is being repaired, the probability of its being put back into operation during an interval of time $(t, t + \Delta t)$ is equal to $\mu\Delta t + o(\Delta t)$ and is independent of the nature of its work and its length of service up to the instant $t$. The lathes are used, get out of order, and are repaired independently of each other.

Let $E$ denote the state of the industrial process. Let us agree to say that $E$ is in the state $E_k$ if at a given instant the number of lathes being repaired or awaiting repair (that is, the total number of lathes not in operation) is equal to $k$. Removal of one more lathe from service denotes transition to the state $E_{k+1}$, and completion of the repair of one of the lathes denotes transition into the state $E_{k-1}$. Thus we have a homogeneous Markov system with finitely many states $E_0, E_1, \dots, E_m$. It follows from our assumptions that

$$p_{k,k+1}(\Delta t) = (m - k)\lambda\Delta t + o(\Delta t), \qquad k = 0, 1, \dots, m - 1;$$
$$p_{k,k-1}(\Delta t) = k\mu\Delta t + o(\Delta t) \quad\text{for } 1 \le k \le s;$$
$$p_{k,k-1}(\Delta t) = s\mu\Delta t + o(\Delta t) \quad\text{for } s < k \le m;$$
$$p_{k,k+r}(\Delta t) = o(\Delta t), \qquad |r| \ge 2;$$

that is, we have a birth and death process with finitely many possible states. In the preceding notation,
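The stationary distribution of this machine-repair process follows from formula (13). A sketch (Python; the numbers of lathes and repairmen and the rates are assumptions of this example): when $s = m$, every failed lathe is repaired at once, and the stationary law reduces to a binomial distribution with parameter $\lambda/(\lambda+\mu)$, which gives a convenient check.

```python
from math import comb

def stationary(lams, mus):
    # formula (13): p_k proportional to (lam_0...lam_{k-1})/(mu_1...mu_k)
    w = [1.0]
    for k in range(1, len(lams) + 1):
        w.append(w[-1] * lams[k - 1] / mus[k - 1])
    s = sum(w)
    return [x / s for x in w]

m, s_crew, lam, mu = 6, 6, 0.3, 1.0                     # assumed example values
lams = [(m - k) * lam for k in range(m)]                # lambda_k = (m - k)*lam
mus = [min(k, s_crew) * mu for k in range(1, m + 1)]    # mu_k = k*mu or s*mu
p = stationary(lams, mus)

# with s_crew = m the stationary law is binomial with parameter q = lam/(lam+mu)
q = lam / (lam + mu)
print([round(x, 4) for x in p])
```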
$$\lambda_k = (m - k)\lambda, \quad k = 0, 1, \dots, m; \qquad \mu_k = k\mu \;\text{ for } k \le s, \qquad \mu_k = s\mu \;\text{ for } s < k \le m.$$

5. BRANCHING PROCESSES

Suppose that each of the particles of a branching process $\nu(t)$, independently of the others, is transformed during an interval of time $(t, t + \Delta t)$ into $j$ particles ($j = 0, 2, 3, \dots$) with probability $b_j\Delta t + o(\Delta t)$, where

$$b_1 = b_0 + \sum_{j=2}^{\infty}b_j < \infty.$$

Let $p_{ij}(t)$ denote the transition probabilities of the process, and define the generating functions

$$f_i(z, t) = \sum_{j=0}^{\infty}p_{ij}(t)z^j = [f_1(z, t)]^i. \qquad(2)$$
Then we have the first system of Kolmogorov equations

$$\frac{dp_{1j}(t)}{dt} = -b_1 p_{1j}(t) + \sum_{k=0,\,k\ne 1}^{\infty}b_k p_{kj}(t). \qquad(3)$$

Multiplying both sides of equations (3) by $z^j$ and summing with respect to $j$ from 0 to $\infty$, we obtain

$$\frac{\partial f_1(z, t)}{\partial t} = -b_1 f_1(z, t) + \sum_{k=0,\,k\ne 1}^{\infty}b_k f_k(z, t) \qquad(|z| < 1),$$

or, on the basis of (2),

$$\frac{\partial f(z, t)}{\partial t} = -b_1 f(z, t) + \sum_{k=0,\,k\ne 1}^{\infty}b_k f^k(z, t),$$

where we set $f(z, t) = f_1(z, t)$. Finally we obtain the following nonlinear differential equation:

$$\frac{\partial f}{\partial t} = u(f), \qquad(4)$$

where

$$u(z) = b_0 - b_1 z + \sum_{k=2}^{\infty}b_k z^k \qquad(|z| \le 1). \qquad(5)$$

To equation (4) we must add the initial condition

$$f(z, 0) = z. \qquad(6)$$

The solution of equation (4) with initial condition (6) can be written in the implicit form

$$\rho(f) - \rho(z) = t, \qquad \rho(z) = \int\frac{dz}{u(z)}. \qquad(7)$$
Let us suppose that the conditions under which the second system of Kolmogorov differential equations may be applied are satisfied. From the definition of branching processes we get the formulas

$$p_{kk}(t) = (1 - b_1 t)^k + o(t) = 1 - kb_1 t + o(t),$$
$$p_{k,k-1}(t) = k(1 - b_1 t)^{k-1}b_0 t + o(t) = kb_0 t + o(t),$$
$$p_{k,k-j}(t) = o(t), \quad j \ge 2,$$
$$p_{k,k+j}(t) = k(1 - b_1 t)^{k-1}b_{j+1}t + o(t) = kb_{j+1}t + o(t), \quad j \ge 1,$$
from which it follows (in the notation of Section 2) that

$$q_{jj} = jb_1, \qquad q_{j,j-1} = jb_0, \qquad q_{j,j-k} = 0 \;(k \ge 2), \qquad q_{j,j+k} = jb_{k+1} \;(k \ge 1).$$

Thus for the functions $p_{1j}(t)$ the system (16) of Section 2 takes the form

$$\frac{dp_{1j}(t)}{dt} = (j+1)b_0\,p_{1,j+1}(t) - jb_1\,p_{1j}(t) + (j-1)b_2\,p_{1,j-1}(t) + \dots + b_j\,p_{11}(t). \qquad(8)$$

Multiplying (8) by $z^j$ and summing with respect to $j$ from 0 to $\infty$, we obtain a new equation for the generating function $f(z, t)$:

$$\frac{\partial f}{\partial t} = u(z)\,\frac{\partial f}{\partial z}. \qquad(9)$$

We note the following properties of the function $u(z)$: $u(0) = b_0 \ge 0$; $u(1) = b_0 - b_1 + \sum_{k=2}^{\infty}b_k = 0$; and $u''(z) > 0$ for $z > 0$. Here we assume that not all the $b_k$ (for $k \ge 2$) are equal to 0. Thus $u(z)$ is convex downward for $z > 0$ and hence has no more than one zero in the interval $(0, 1)$. Let us turn to the definition
of the probability $a$ of degeneration of the branching process $\nu(t)$. Since the events $\{\nu(t) = 0\}$ constitute an increasing class of events, we have

$$a = P\{\lim_{t\to\infty}\nu(t) = 0\} = \lim_{t\to\infty}P\{\nu(t) = 0\} = \lim_{t\to\infty}p_{10}(t).$$

Theorem 1. The probability of degeneration of a branching process coincides with the smallest nonnegative root of the equation $u(x) = 0$. If

$$m_1 = u'(1) = -b_1 + \sum_{k=2}^{\infty}kb_k \le 0,$$

then $a = 1$; if $m_1 > 0$, then $a < 1$.

Proof. Since $p_{10}(t) = f(0, t)$, it follows from (4) that

$$\frac{dp_{10}(t)}{dt} = u(p_{10}(t)), \qquad p_{10}(0) = 0. \qquad(21)$$

If $b_0 = 0$, then $p_{10}(t) \equiv 0$ is a solution of equation (21) and the theorem is trivial. Suppose that $b_0 > 0$. We note that if $x_0$ is the smallest positive root of the equation $u(x) = 0$, then $p_{10}(t) < x_0$ for all $t > 0$, because if $p_{10}(t_0)$ were equal to $x_0$ for some $t_0 > 0$, then by virtue of the uniqueness of the solution of equation (21) we would have $p_{10}(t) \equiv x_0$, which is impossible. Furthermore, since the limit $a = \lim_{t\to\infty}p_{10}(t) \le 1$ exists, it follows from (21) that the limit $\lim_{t\to\infty}p'_{10}(t) = u(a)$ also exists. But this implies that $u(a) = 0$, because otherwise the quantity

$$p_{10}(t) = \int_{t_0}^{t}p'_{10}(s)\,ds + p_{10}(t_0)$$

would increase without bound. Thus we have shown that $a = x_0$. If $x_0 < 1$, then the function $u(x)$ is increasing at the point $x = 1$ and the derivative $u'(1) > 0$ if it exists. On the other hand, if $x_0 = 1$, then $u'(1) \le 0$. This completes the proof of the theorem.
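Theorem 1 reduces the degeneration probability to a root-finding problem, which can be illustrated concretely. A sketch (Python; the rates $b_0 = 1$, $b_2 = 2$, hence $b_1 = 3$, are an assumed example): here $u(x) = (1 - x)(1 - 2x)$, $m_1 = 1 > 0$, and the smallest nonnegative root is $a = 1/2 < 1$.

```python
# Assumed example rates: b0 = 1, b2 = 2, so b1 = 3 and
# u(x) = 1 - 3x + 2x^2 = (1 - x)(1 - 2x); m1 = u'(1) = 1 > 0.
def u(x):
    return 1.0 - 3.0 * x + 2.0 * x * x

# smallest nonnegative root of u(x) = 0 by bisection on [0, 1):
# u(0) = b0 > 0, and since u(1) = 0 with u'(1) > 0, u(1 - eps) < 0
lo, hi = 0.0, 1.0 - 1e-9
for _ in range(100):
    mid = (lo + hi) / 2
    if u(mid) > 0:
        lo = mid
    else:
        hi = mid
a = (lo + hi) / 2

# cross-check: the embedded jump chain dies with prob 1/3 and splits with
# prob 2/3, so a is the limit of the iteration x -> (1 + 2x^2)/3 from 0
x = 0.0
for _ in range(200):
    x = (1.0 + 2.0 * x * x) / 3.0
print(a, x)   # both approach 1/2
```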
Let us now investigate the asymptotic behavior of the probability $p_{10}(t)$ as $t\to\infty$ for degenerating processes ($a = 1$).

Theorem 2. If $m_1 = u'(1) \le 0$ and $m_2 = u''(1) < \infty$, then

$$1 - p_{10}(t) \sim k e^{m_1 t} \quad\text{for } m_1 < 0, \qquad 1 - p_{10}(t) \sim \frac{2}{m_2 t} \quad\text{for } m_1 = 0.$$

Proof. Let us define
$q(t) = 1 - p_{10}(t)$. The function $q(t)$ satisfies the equation

$$\frac{dq}{dt} = -u(1 - q(t)), \qquad q(0) = 1.$$

Using the formula for finite increments, we obtain

$$\frac{dq}{dt} = -u(1) + q(t)u'(\xi) = q(t)u'(\xi),$$

where $\xi$ lies between $p_{10}(t)$ and 1. Since $u'(x)$ is an increasing function and $\xi \to 1$ as $t\to\infty$, it follows that $u'(\xi) = u'(1) - \varepsilon(t) = m_1 - \varepsilon(t)$, where $\varepsilon(t) \ge 0$ and $\lim_{t\to\infty}\varepsilon(t) = 0$. Thus

$$\frac{dq}{dt} = q(t)\bigl(m_1 - \varepsilon(t)\bigr),$$

from which we get

$$q(t) = \exp\Bigl(m_1 t - \int_0^t\varepsilon(s)\,ds\Bigr).$$

We note that for $\xi \le 1$,

$$0 \le \varepsilon(t) = u'(1) - u'(\xi) = u''(\zeta)(1 - \xi) \le u''(1)\bigl(1 - p_{10}(t)\bigr) \le m_2 e^{m_1 t}.$$

Therefore the integral $\int_0^{\infty}\varepsilon(t)\,dt$ is finite. From this it follows that for $m_1 < 0$,

$$q(t) \sim k e^{m_1 t}, \qquad\text{where } k = \exp\Bigl(-\int_0^{\infty}\varepsilon(t)\,dt\Bigr).$$
Consider the case $m_1 = 0$. We have

$$\frac{dq}{dt} = -u(1 - q(t)) = -u(1) + q(t)u'(1) - \frac{q^2(t)}{2}u''(\xi) = -\frac{q^2(t)}{2}u''(\xi),$$

where $\xi$ is a number in the interval $(p_{10}(t), 1)$. Since $u''(\xi)\to u''(1)$ as $t\to\infty$, we have

$$\frac{dq}{dt} = -\frac{q^2(t)}{2}\bigl(m_2 + \varepsilon(t)\bigr),$$

where $\varepsilon(t)\to 0$ as $t\to\infty$. From this it follows that

$$q(t) = \frac{2}{m_2 t + \int_0^t\varepsilon(s)\,ds + 2} = \frac{2}{m_2 t} + o\Bigl(\frac{1}{t}\Bigr).$$

This completes the proof of the theorem.

We shall supplement Theorem 2 with a result dealing with the asymptotic behavior of the probabilities $p_{1n}(t)$ for degenerating processes.
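The critical rate $2/(m_2 t)$ can be seen directly by integrating (21). A sketch (Python; the rates $b_0 = b_2 = 1$, $b_1 = 2$ are an assumed example): then $u(x) = (1-x)^2$, $m_1 = 0$, $m_2 = 2$, and in fact $q(t) = 1/(1+t)$ exactly, so $t\,q(t) \to 2/m_2 = 1$.

```python
# Critical assumed example: b0 = b2 = 1, b1 = 2, so that
# u(x) = 1 - 2x + x^2 = (1 - x)^2, m1 = u'(1) = 0, m2 = u''(1) = 2.
# Equation (21): dp/dt = u(p), p(0) = 0; the exact solution is p(t) = t/(1+t).
def u(x):
    return (1.0 - x) ** 2

p, dt, n = 0.0, 0.01, 10000      # integrate (21) up to t = 100 by RK4
for _ in range(n):
    k1 = u(p)
    k2 = u(p + dt / 2 * k1)
    k3 = u(p + dt / 2 * k2)
    k4 = u(p + dt * k3)
    p += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

t = n * dt
q = 1.0 - p                      # survival probability q(t) = 1 - p10(t)
print(t * q)                     # approaches 2/m2 = 1, as Theorem 2 predicts
```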
Since $\lim_{t\to\infty}p_{1n}(t) = 0$ (for $n > 0$), we have $\lim_{t\to\infty}f(z, t) = 1$. We define $q(z, t) = 1 - f(z, t)$. For $z = 0$ we have

$$q(0, t) = 1 - f(0, t) = 1 - p_{10}(t) = q(t) \sim k e^{m_1 t}.$$

We may assume that the same is true of the rate of decrease of the function $q(z, t)$ for $z \ne 0$ also. In connection with this we define

$$\varphi(z, t) = \frac{q(z, t)}{q(t)} = \frac{1 - f(z, t)}{q(t)}. \qquad(22)$$

We note that the function

$$f^*(z, t) = 1 - \frac{q(z, t)}{q(t)} = \sum_{n=1}^{\infty}\frac{p_{1n}(t)}{q(t)}\,z^n \qquad(23)$$

can be regarded as the generating function of the conditional distribution of the number $\nu(t)$ of particles under the hypothesis that it is nonzero up to the instant $t$.

Theorem 3. If $m_1 = u'(1) < 0$ and $m_2 = u''(1) < \infty$, then as $t\to\infty$ the conditional distribution of the number of particles $\nu(t)$, under the hypothesis that the process has not degenerated ($\nu(t) \ne 0$) up to the instant $t$, approaches a definite limit, the generating function $f^*(z)$ of which is equal to

$$f^*(z) = 1 - \exp\Bigl(m_1\int_0^z\frac{dv}{u(v)}\Bigr). \qquad(24)$$
Proof. Let us consider the function $\varphi(z, t)$. It follows from (4) that $\varphi(z, t)$ satisfies the equation

$$\frac{\partial\varphi}{\partial t} = -\frac{u(1 - q(t)\varphi)}{q(t)} + \varphi\,\frac{u(1 - q(t))}{q(t)}. \qquad(25)$$

Expanding the right-hand member of the equation obtained in accordance with Taylor's formula, we get

$$\frac{\partial\varphi}{\partial t} = -\frac{1}{q(t)}\Bigl[u(1) - q(t)\varphi\,u'(1) + \frac{(q(t)\varphi)^2}{2}\bigl(u''(1) + \varepsilon_1\bigr)\Bigr] + \frac{\varphi}{q(t)}\Bigl[u(1) - q(t)u'(1) + \frac{q^2(t)}{2}\bigl(u''(1) + \varepsilon_2\bigr)\Bigr],$$

where $\varepsilon_1 = u''(\xi_1) - u''(1)$ and $\varepsilon_2 = u''(\xi_2) - u''(1)$, the number $\xi_1$ (resp. $\xi_2$) lying in the interval $(f(z, t), 1)$ (resp. $(f(0, t), 1)$). As $t\to\infty$ the functions $\varepsilon_i$ (for $i = 1, 2$) approach 0 uniformly in an arbitrary region $|z| \le \rho < 1$. The preceding equation can be written in the form

$$\frac{\partial\varphi}{\partial t} = -\frac{q(t)\varphi^2}{2}\bigl(m_2 + \varepsilon_1\bigr) + \frac{q(t)\varphi}{2}\bigl(m_2 + \varepsilon_2\bigr). \qquad(25')$$

Beginning with some sufficiently large $t$, we have

$$\frac{\partial\varphi}{\partial t} \le \frac{q(t)\varphi}{2}\bigl(m_2 + \varepsilon_2\bigr) \le \frac{3}{4}m_2\,q(t)\varphi,$$

so that

$$\varphi(z, t) \le \varphi(z, t_0)\exp\Bigl(\frac{3}{4}m_2\int_{t_0}^{t}q(s)\,ds\Bigr).$$

The convergence of the integral $\int^{\infty}q(s)\,ds$ implies that the function $\varphi(z, t)$ remains bounded as $t\to\infty$. Therefore equation (25') can be rewritten in the form

$$\frac{\partial\varphi}{\partial t} = \frac{q(t)\varphi}{2}\bigl[m_2(1 - \varphi) + \varepsilon\bigr], \qquad \varphi(z, 0) = 1 - z,$$

where $\varepsilon = \varepsilon_2 - \varphi\varepsilon_1 \to 0$ as $t\to\infty$. Representing the solution of the last equation in the form

$$\varphi(z, t) = (1 - z)\exp\Bigl(\int_0^t\frac{q(s)}{2}\bigl[m_2\bigl(1 - \varphi(z, s)\bigr) + \varepsilon\bigr]\,ds\Bigr),$$

we see that the limit $\lim_{t\to\infty}\varphi(z, t) = K(z)$ exists. Furthermore, it follows from (25') that $\lim_{t\to\infty}\partial\varphi/\partial t = 0$. Since $\varphi(z, t)$ is an analytic function inside the disk $|z| < 1$ and since all the limit relationships that we have used hold uniformly inside every disk $|z| \le \rho < 1$, it follows that the function $K(z)$ is also analytic inside the disk and that

$$\lim_{t\to\infty}\frac{\partial\varphi(z, t)}{\partial z} = \frac{dK(z)}{dz}$$

uniformly inside an arbitrary disk $|z| \le \rho < 1$. To determine the function $K(z)$, we may use equation (9). Setting

$$f(z, t) = 1 - q(t)\varphi(z, t)$$

in that equation, we obtain

$$u(z)\,q(t)\,\frac{\partial\varphi(z, t)}{\partial z} = q'(t)\,\varphi(z, t) + q(t)\,\frac{\partial\varphi(z, t)}{\partial t}.$$

Dividing this equation by $q(t)$, letting $t$ approach $\infty$, and remembering that $q'(t)/q(t)\to m_1$ (cf. proof of Theorem 2), we obtain

$$m_1 K(z) = u(z)\,\frac{dK(z)}{dz}.$$

Here $K(0) = \lim_{t\to\infty}\varphi(0, t) = 1$. Thus

$$K(z) = \exp\Bigl(m_1\int_0^z\frac{dv}{u(v)}\Bigr), \qquad 1 - f(z, t) \sim q(t)K(z),$$

and $f^*(z) = 1 - K(z)$, which completes the proof of the theorem.
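Formula (24) can be checked against a direct integration of equation (4). A sketch (Python; the subcritical rates $b_0 = 2$, $b_2 = 1$, $b_1 = 3$ are an assumed example): then $u(x) = (1-x)(2-x)$, $m_1 = -1$, the integral in (24) evaluates to $K(z) = 2(1-z)/(2-z)$, and in particular $K(1/2) = 2/3$.

```python
# Subcritical assumed example: b0 = 2, b2 = 1, b1 = 3, so that
# u(x) = 2 - 3x + x^2 = (1 - x)(2 - x) and m1 = u'(1) = -1.
# Formula (24) then gives K(z) = 2(1 - z)/(2 - z), f*(z) = z/(2 - z).
def u(x):
    return (1.0 - x) * (2.0 - x)

def f_at(z, T=15.0, dt=0.001):
    # integrate equation (4), df/dt = u(f), f(0) = z, by RK4
    f = z
    for _ in range(int(T / dt)):
        k1 = u(f)
        k2 = u(f + dt / 2 * k1)
        k3 = u(f + dt / 2 * k2)
        k4 = u(f + dt * k3)
        f += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return f

# phi(z, t) = (1 - f(z, t))/(1 - f(0, t)) tends to K(z) by Theorem 3
ratio = (1.0 - f_at(0.5)) / (1.0 - f_at(0.0))
print(ratio)   # close to K(1/2) = 2/3
```

The limit $f^*(z) = z/(2-z)$ expands into $\sum_{k\ge1} 2^{-k}z^k$, i.e. in this example the conditional limit law of the particle number is geometric.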
Let us now derive an expression for the mean number of particles at the instant $t$ under the hypothesis that the process has not degenerated up to that instant. We have

$$m^*(t) = M\{\nu(t)\mid\nu(t) > 0\} = \frac{M\nu(t)}{q(t)} = \frac{e^{m_1 t}}{q(t)}, \qquad(26)$$

from which (keeping Theorem 2 in mind) we get the following asymptotic relations as $t\to\infty$:

$$m^*(t)\sim\frac{1}{k}\;\text{ for } m_1 < 0; \qquad m^*(t)\sim\frac{m_2 t}{2}\;\text{ for } m_1 = 0; \qquad m^*(t)\sim\frac{e^{m_1 t}}{1 - a}\;\text{ for } m_1 > 0.$$

For $m_1 > 0$ the number of particles $\nu(t)$, under the hypothesis that $\nu(t) > 0$, increases without bound. We define

$$\nu^*(t) = \frac{\nu(t)}{m^*(t)}.$$

Then $M\{\nu^*(t)\mid\nu(t) > 0\} = 1$. Let us study the limiting behavior of the quantity $\nu^*(t)$ as $t\to\infty$. As we might expect, the limiting distribution of the quantity $\nu^*(t)$ under the hypothesis $\nu(t) > 0$ will, if it exists, be a continuous distribution on the half-line $[0, \infty)$, and therefore it is convenient for us to shift from the generating functions to the characteristic functions. For the characteristic function $g(\lambda, t)$ of the random variable $\nu^*(t)$, under the hypothesis that $\nu(t) > 0$, we have

$$g(\lambda, t) = \sum_{n=1}^{\infty}\exp\Bigl(\frac{i\lambda n}{m^*(t)}\Bigr)\frac{p_{1n}(t)}{q(t)} = \frac{f\{\exp(i\lambda/m^*(t)), t\} - f(0, t)}{q(t)},$$

or

$$g(\lambda, t) = 1 - \frac{1 - f\{\exp(i\lambda/m^*(t)), t\}}{q(t)}. \qquad(27)$$

Consider the case $m_1 = 0$.

Theorem 4. If $m_1 = 0$ and $m_2 < \infty$, then

$$\lim_{t\to\infty}P\Bigl\{\frac{2\nu(t)}{m_2 t} < x \,\Big|\, \nu(t) > 0\Bigr\} = 1 - e^{-x}. \qquad(28)$$
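The exponential limit law (28) can be illustrated by simulation before turning to the proof. A sketch (Python; the critical binary rates $b_0 = b_2 = 1$ are an assumed example): then $m_1 = 0$, $m_2 = 2$, so $2\nu(t)/(m_2 t) = \nu(t)/t$ given $\nu(t) > 0$ should be approximately $\mathrm{Exp}(1)$, whose mean is 1.

```python
import random

random.seed(1)

# Critical binary branching, an assumed example: each particle dies at rate
# b0 = 1 or splits in two at rate b2 = 1, so m1 = 0 and m2 = u''(1) = 2b2 = 2.
def population_at(T):
    n, t = 1, 0.0
    while n > 0:
        t += random.expovariate(2 * n)            # total event rate is 2n
        if t > T:
            return n
        n += 1 if random.random() < 0.5 else -1   # split or die, equally likely
    return 0

T, trials = 30.0, 20000
survivors = [n for n in (population_at(T) for _ in range(trials)) if n > 0]
mean_ratio = sum(n / T for n in survivors) / len(survivors)
print(len(survivors), mean_ratio)   # conditional mean of nu(t)/t near 1
```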
Proof. Defining $\psi(z, t) = 1 - f(z, t)$, we obtain from equation (4)

$$\frac{\partial\psi}{\partial t} = -u(1 - \psi) = -\frac{\psi^2}{2}\bigl[u''(1) + \varepsilon(t)\bigr], \qquad \psi(z, 0) = 1 - z.$$

Since the process is degenerate for $m_1 = 0$, we see that $\psi(z, t)\to 0$ uniformly in the region $|z| \le 1$ as $t\to\infty$. From this it follows that $\varepsilon(t)$ also approaches 0, uniformly with respect to $z$ for $|z| \le 1$, as $t\to\infty$. Integrating the last equation, we obtain

$$\psi(z, t) = \Bigl[\frac{1}{1 - z} + \frac{m_2 t}{2} + \frac{1}{2}\int_0^t\varepsilon(s)\,ds\Bigr]^{-1},$$

so that

$$g(\lambda, t) = 1 - \frac{1}{q(t)}\,\psi\Bigl\{\exp\Bigl(\frac{i\lambda}{m^*(t)}\Bigr), t\Bigr\}.$$

Since $q(t)\sim 2/(m_2 t)$, $m^*(t)\sim m_2 t/2$, and $1 - \exp(i\lambda/m^*(t)) \sim -i\lambda/m^*(t)$, passage to the limit gives

$$g(\lambda) = \lim_{t\to\infty}g(\lambda, t) = 1 - \frac{1}{1 - 1/(i\lambda)} = \frac{1}{1 - i\lambda}.$$

The function $g(\lambda)$ is the characteristic function of the distribution $F(x) = 1 - e^{-x}$ for $x \ge 0$, $F(x) = 0$ for $x < 0$. This completes the proof of the theorem.

In the case $m_1 > 0$, the quantity $q(t)$ approaches the nonzero limit $1 - a$. Therefore normalization with the aid of the function $q(t)$, or shifting to the conditional mathematical expectations under the hypothesis that $\nu(t) > 0$, cannot play a significant role.

Theorem 5. If $m_1 > 0$ and $m_2 < \infty$, then the quantity $\nu(t)e^{-m_1 t}$ converges in the sense of mean square as $t\to\infty$ to a random variable $\eta = \operatorname{l.i.m.}_{t\to\infty}\nu(t)e^{-m_1 t}$, whose characteristic function $g(\lambda)$ satisfies the functional equation

$$1 - g(\lambda) = -i\lambda\exp\Bigl(-\int_{g(\lambda)}^{1}\frac{u(v) - m_1(v - 1)}{u(v)(v - 1)}\,dv\Bigr). \qquad(29)$$
Proof. To prove the convergence of $\tilde\nu(t) = \nu(t)e^{-m_1 t}$ in the sense of mean square, we use Cauchy's criterion. Suppose that $t < t'$. Then

$$M\bigl(\tilde\nu(t) - \tilde\nu(t')\bigr)^2 = M\tilde\nu(t)^2 + M\tilde\nu(t')^2 - 2M\bigl(\tilde\nu(t)\tilde\nu(t')\bigr).$$

On the basis of formulas (16) and (19), $M\tilde\nu(t)^2$ converges to a finite limit as $t\to\infty$. Using the definition of a branching process and the fact that it is homogeneous, we obtain

$$M\bigl(\nu(t)\nu(t')\bigr) = M\bigl\{\nu(t)\,M[\nu(t')\mid\nu(t)]\bigr\} = M\nu^2(t)\,M\nu(t' - t),$$

from which it follows (also on the basis of formulas (16) and (19)) that

$$M\bigl(\tilde\nu(t)\tilde\nu(t')\bigr) = M\tilde\nu(t)^2.$$

Thus $M(\tilde\nu(t) - \tilde\nu(t'))^2\to 0$ as $t, t'\to\infty$, and the limit $\eta = \operatorname{l.i.m.}_{t\to\infty}\tilde\nu(t)$ exists. Writing equation (4) in the form

$$\frac{df}{f - 1} - \frac{u(f) - m_1(f - 1)}{u(f)(f - 1)}\,df = m_1\,dt$$

and integrating with respect to $t$ from 0 to $t$, we obtain

$$\ln(1 - f) - \int_z^f\frac{u(v) - m_1(v - 1)}{u(v)(v - 1)}\,dv = m_1 t + \ln(1 - z).$$

Here if we set $z = \exp(i\lambda e^{-m_1 t})$ and let $t$ approach $\infty$, we obtain the equation

$$\ln\bigl(1 - g(\lambda)\bigr) - \int_1^{g(\lambda)}\frac{u(v) - m_1(v - 1)}{u(v)(v - 1)}\,dv = \ln(-i\lambda),$$

from which (29) follows. This completes the proof of the theorem.
The theory of branching processes with particles of several types is analogous but more complicated. We pause only for the basic relationships of that theory. Just as in the case of particles of a single type, it is convenient to use the method of generating functions.

Let $M$ denote the set of all possible states of a process, that is, the set of all vectors $\alpha = (a_1, a_2, \dots, a_n)$ with nonnegative integral components. Let us agree to denote $n$-dimensional vectors by Greek letters $\alpha, \beta, \sigma, \dots$ and their components by the corresponding Roman letters.

We define the generating functions

$$F_i(t, \sigma) = F_i(t, s_1, s_2, \dots, s_n)$$

of the transition probabilities $p_{\{i\}\beta}(t)$ by the relations

$$F_i(t, \sigma) = \sum_{\beta\in M}p_{\{i\}\beta}(t)\,s_1^{b_1}s_2^{b_2}\cdots s_n^{b_n} \qquad \bigl(\sigma = (s_1, s_2, \dots, s_n),\ \beta = (b_1, b_2, \dots, b_n)\bigr). \qquad(30)$$

We recall that $\{i\}$ denotes the vector $\{i\} = (\delta_{i1}, \delta_{i2}, \dots, \delta_{in})$. The functions $F_i(t, \sigma)$ are analytic functions of the variables $s_1, s_2, \dots, s_n$ in the region $|s_i| < 1$ (for $i = 1, \dots, n$). Also,

$$|F_i(t, \sigma)| \le 1 \;\text{ for } |s_i| \le 1, \qquad F_i(t, 1, \dots, 1) = 1, \qquad F_i(0, \sigma) = s_i. \qquad(31)$$

If we define the $n$-dimensional vector-valued function

$$\Phi(t, \sigma) = \bigl(F_1(t, \sigma), \dots, F_n(t, \sigma)\bigr),$$
it follows from (31) that

$$\Phi(0, \sigma) = \sigma. \qquad(32)$$

Let us now find the equivalent of the Kolmogorov–Chapman formula for branching processes expressed in terms of the generating functions. We have

$$p_{\{i\}\beta}(t + \tau) = \sum_{\alpha\in M}p_{\{i\}\alpha}(t)\,p_{\alpha\beta}(\tau), \qquad t \ge 0,\ \tau \ge 0.$$

If we substitute into this equation the value of $p_{\alpha\beta}(\tau)$ corresponding to formula (1), we obtain

$$p_{\{i\}\beta}(t + \tau) = \sum_{\alpha\in M}p_{\{i\}\alpha}(t)\sum\prod_{k=1}^{n}\prod_{j=1}^{a_k}p_{\{k\}\delta(k,j)}(\tau),$$

where the inner sum extends over all collections of vectors $\delta(k, j)\in M$ with $\sum_{k,j}\delta(k, j) = \beta$. If we multiply both sides of this equation by $s_1^{b_1}\cdots s_n^{b_n}$ and sum over all $\beta$, we obtain the relations

$$F_i(t + \tau, \sigma) = \sum_{\alpha\in M}p_{\{i\}\alpha}(t)\prod_{k=1}^{n}\prod_{j=1}^{a_k}\Bigl(\sum_{\delta\in M}p_{\{k\}\delta}(\tau)\,s_1^{d_1}\cdots s_n^{d_n}\Bigr) = \sum_{\alpha\in M}p_{\{i\}\alpha}(t)\prod_{k=1}^{n}\bigl[F_k(\tau, \sigma)\bigr]^{a_k},$$

from which it follows that

$$F_i(t + \tau, \sigma) = F_i\bigl(t, F_1(\tau, \sigma), \dots, F_n(\tau, \sigma)\bigr), \qquad i = 1, \dots, n,$$

or

$$\Phi(t + \tau, \sigma) = \Phi\bigl(t, \Phi(\tau, \sigma)\bigr). \qquad(33)$$

Theorem 6. The system of generating functions of a branching process satisfies the system of functional equations (33) and the initial condition (32).
Let us derive for the generating functions the differential equations corresponding to the first and second systems of Kolmogorov equations for the transition probabilities. Suppose that

$$\lim_{t\to 0}\frac{p_{\{i\}\alpha}(t)}{t} = b_{i\alpha} \quad(\alpha \ne \{i\}), \qquad \lim_{t\to 0}\frac{1 - p_{\{i\}\{i\}}(t)}{t} = b_{ii},$$

and

$$b_{ii} = \sum_{\alpha\in M,\,\alpha\ne\{i\}}b_{i\alpha} < \infty, \qquad i = 1, \dots, n.$$

Then the transition probabilities $p_{\{i\}\beta}(t)$ satisfy the first system of Kolmogorov equations (cf. Section 2):

$$\frac{dp_{\{i\}\beta}(t)}{dt} = -b_{ii}\,p_{\{i\}\beta}(t) + \sum_{\alpha\in M,\,\alpha\ne\{i\}}b_{i\alpha}\,p_{\alpha\beta}(t).$$

Multiplying this equation by $s_1^{b_1}\cdots s_n^{b_n}$, summing over all $\beta$, and noting that equation (1) implies

$$\sum_{\beta\in M}p_{\alpha\beta}(t)\,s_1^{b_1}s_2^{b_2}\cdots s_n^{b_n} = \prod_{j=1}^{n}\bigl[F_j(t, \sigma)\bigr]^{a_j}$$

(this equation expresses the independence of the evolution of the particles that exist at a given instant of time), we obtain

$$\frac{\partial F_i(t, \sigma)}{\partial t} = -b_{ii}F_i(t, \sigma) + \sum_{\alpha\in M,\,\alpha\ne\{i\}}b_{i\alpha}\prod_{j=1}^{n}\bigl[F_j(t, \sigma)\bigr]^{a_j},$$

or

$$\frac{\partial F_i(t, \sigma)}{\partial t} = u_i\bigl(F_1(t, \sigma), \dots, F_n(t, \sigma)\bigr), \qquad i = 1, \dots, n, \qquad(34)$$

where

$$u_i(s_1, \dots, s_n) = -b_{ii}s_i + \sum_{\alpha\in M,\,\alpha\ne\{i\}}b_{i\alpha}\,s_1^{a_1}\cdots s_n^{a_n}, \qquad i = 1, \dots, n. \qquad(35)$$

The functions $u_i(s_1, \dots, s_n)$ are the generating functions of the systems of quantities $\{-b_{ii},\ b_{i\alpha},\ \alpha\in M,\ \alpha\ne\{i\}\}$.
To obtain the second equation, let us suppose that $|s_i| < 1$ (for $i = 1, \dots, n$); then $|F_i(t, \sigma)| < 1$ (for $i = 1, \dots, n$). Differentiating equation (33) with respect to $\tau$ and then setting $\tau = 0$, we obtain

$$\frac{\partial\Phi(t, \sigma)}{\partial t} = \sum_{k=1}^{n}u_k(\sigma)\,\frac{\partial\Phi(t, \sigma)}{\partial s_k}. \qquad(36)$$

Equation (36) is a system of equations of the same type for the generating functions $F_i(t, \sigma)$:

$$\frac{\partial F_i(t, \sigma)}{\partial t} = \sum_{k=1}^{n}u_k(\sigma)\,\frac{\partial F_i(t, \sigma)}{\partial s_k}, \qquad i = 1, \dots, n,$$

which must be solved under the initial conditions (31). Thus we have obtained:

Theorem 7. The system of generating functions $F_i(t, \sigma)$ for $|s_i| < 1$, where $i = 1, \dots, n$, satisfies the system of ordinary differential equations (34), the partial differential equations (36), and the initial conditions (31).
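The system (34) is as easy to integrate numerically as its one-type counterpart, and its fixed points give the extinction probabilities by type. A sketch (Python; the two-type rates below are assumptions of this example): type 1 dies at rate 1 or produces one particle of each type at rate 1; type 2 dies at rate 1 or splits into two type-2 particles at rate 1.5. By (35), $u_1 = 1 - 2s_1 + s_1 s_2$ and $u_2 = 1 - 2.5s_2 + 1.5s_2^2$, with extinction probabilities $q_2 = 2/3$, $q_1 = 3/4$.

```python
# Two-type branching, an assumed example; u_i built from (35):
def u(F):
    s1, s2 = F
    return (1 - 2 * s1 + s1 * s2, 1 - 2.5 * s2 + 1.5 * s2 ** 2)

# integrate the system (34) from F(0) = sigma = (0, 0); F_i(t, 0) is the
# probability of extinction by time t for a process started from one type-i
# particle, and it increases to the smallest nonnegative root of u(q) = 0
F, dt = (0.0, 0.0), 0.01
for _ in range(4000):   # up to t = 40, RK4
    k1 = u(F)
    k2 = u((F[0] + dt / 2 * k1[0], F[1] + dt / 2 * k1[1]))
    k3 = u((F[0] + dt / 2 * k2[0], F[1] + dt / 2 * k2[1]))
    k4 = u((F[0] + dt * k3[0], F[1] + dt * k3[1]))
    F = (F[0] + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
         F[1] + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))

print(F)   # approaches (3/4, 2/3)
```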
6. THE GENERAL DEFINITION OF A MARKOV PROCESS
At the basis of the concept of a Markov system (process) is the concept of a system whose future evolution depends only on the state of the system at the given instant of time (that is, its future evolution does not depend on the behavior of the system in the past). Let $\{U, \mathfrak{S}, P\}$ denote a probability space on which a random process $\xi(t)$ is defined with range in a complete metric space $X$. We shall call the space $X$ the phase space of the system, and we shall call $\xi(t)$ the state of the system at the instant $t\in Z$, where $Z$ is a finite or infinite interval of the real line. We let $\mathfrak{B}$ denote the $\sigma$-algebra of Borel subsets of $X$. The hypothesis of absence of after-effect is most easily written with the aid of conditional probabilities:

$$P\{\xi(t)\in A\mid\xi(t_1), \xi(t_2), \dots, \xi(t_n)\} = P\{\xi(t)\in A\mid\xi(t_n)\} \pmod P \qquad(1)$$

for arbitrary $A\in\mathfrak{B}$ and $t_1 < t_2 < \dots < t_n < t$. Since the conditional probability with respect to a random variable can be regarded as a function of that variable, we set ($s < t$)

$$P\{\xi(t)\in A\mid\xi(s)\} = P(s, \xi(s), t, A). \qquad(2)$$

It follows from formula (26), Section 6, Chapter III that for $t_1 < t$ ...

... $> 0$, then the process $\xi(t)$ is continuous.

Proof. Assertion (b') is a particular case of (b). Assertions (a) and (b) follow from Theorem 1 of the present section, Theorem 2 of Section 4, and Theorem 4 of Section 5, Chapter IV.
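The transition function (2) of any Markov process satisfies the Chapman–Kolmogorov equation $P(s, x, t, A) = \int P(s, x, u, dy)\,P(u, y, t, A)$ for $s < u < t$; for a chain with finitely many states this is simply matrix multiplication, $P(t + s) = P(t)P(s)$ in the homogeneous case. A sketch (Python; the two-state chain and its rates are assumptions of this example, and its transition matrix is known in closed form):

```python
import math

a, b = 1.0, 2.0   # assumed jump rates 0 -> 1 and 1 -> 0

def P(t):
    # closed-form transition matrix of the two-state chain
    r = a + b
    e = math.exp(-r * t)
    return [[(b + a * e) / r, (a - a * e) / r],
            [(b - b * e) / r, (a + b * e) / r]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

lhs = P(0.7 + 0.4)
rhs = matmul(P(0.7), P(0.4))
print(lhs)
print(rhs)   # the two matrices coincide
```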
7. THE BASIC PROPERTIES OF JUMP PROCESSES
Jump Markov processes were introduced in Section 3. Let us now look at them in greater detail. Let $\mu_t(A)$ denote the distribution of $\xi(t)$.

Lemma 1. A jump Markov process is stochastically continuous.

Proof. If $t' > t$, then on the basis of conditions a and c of Definition 1 in Section 3,

$$P\{\xi(t)\ne\xi(t')\} = \int\mu_t(dx)\,p\bigl(t, x, t', X - \{x\}\bigr) \le \int\bigl(q(t, x) + \varepsilon\bigr)(t' - t)\,\mu_t(dx) \le (k + \varepsilon)(t' - t),$$
from which the assertion follows.

Lemma 2. If $\xi(t)$ is a separable jump process, then

$$P\{\xi(z) = x \text{ for all } z\in[t, s]\mid\xi(t) = x\} = \exp\Bigl(-\int_t^s q(z, x)\,dz\Bigr). \qquad(1)$$

Proof. Let $M$ denote the set of separability of the process $\xi(t)$ on the interval $[t, s]$. It follows from the stochastic continuity of the process and Theorem 5, Section 2, Chapter IV that we may take for $M$ any countable set that is everywhere dense in $[t, s]$. It follows from the separability of the process that

$$P = P\{\xi(z) = x \text{ for all } z\in[t, s]\mid\xi(t) = x\} = P\{\xi(z) = x \text{ for all } z\in M\mid\xi(t) = x\}.$$

For the set $M$ we can take the set of points of the form $t_{nk} = t + kh/2^n$, for $k = 0, 1, \dots, 2^n$ and $h = s - t$. We note that

$$P = \lim_{n\to\infty}P\{\xi(t_{nk}) = x,\ k = 1, \dots, 2^n\mid\xi(t) = x\},$$

since the events $A_n = \{\xi(t_{nk}) = x,\ k = 1, \dots, 2^n\}$ constitute a decreasing sequence and $\bigcap_{n=1}^{\infty}A_n = \{\xi(z) = x,\ z\in M\}$. Furthermore,

$$p_n = P\{\xi(t_{nk}) = x,\ k = 1, \dots, 2^n\mid\xi(t) = x\} = \prod_{k=1}^{2^n}p(t_{n,k-1}, x, t_{nk}, \{x\}), \qquad \ln p_n = \sum_{k=1}^{2^n}\ln p(t_{n,k-1}, x, t_{nk}, \{x\}).$$

Let $f_n(z)$ denote the piecewise-constant function that on the interval $[t_{n,k-1}, t_{nk})$ is equal to $(1/\Delta t_{nk})\ln p(t_{n,k-1}, x, t_{nk}, \{x\})$, where $\Delta t_{nk} = t_{nk} - t_{n,k-1}$. It follows from the definition of a jump process (cf. Definition 1, Section 3) that $p(t_{n,k-1}, x, t_{nk}, \{x\})\to 1$ uniformly with respect to $t_{nk}$. Setting $p(t_{n,k-1}, x, t_{nk}, \{x\}) = 1 + a_{nk}$, we see that

$$\frac{\ln(1 + a_{nk})}{\Delta t_{nk}} + q(t_{nk}, x) = \frac{\ln(1 + a_{nk})}{a_{nk}}\cdot\frac{a_{nk}}{\Delta t_{nk}} + q(t_{nk}, x)\to 0$$

uniformly with respect to $t_{nk}$. Therefore $f_n(z)\to -q(z, x)$ uniformly on $[t, s]$, and

$$\ln p_n = \int_t^s f_n(z)\,dz \to -\int_t^s q(z, x)\,dz.$$
This completes the proof of the lemma.

Lemma 3. A separable jump Markov process has no discontinuities of the second kind.

... $\sigma\{\mathfrak{B}_1\times\mathfrak{F}\}$ is the product of the $\sigma$-algebras $\mathfrak{B}_1$ and $\mathfrak{F}$ (cf. Definition 1, Section 8, Chapter II); specifically, the set $\{t, \omega;\ \xi(t, \omega)\in A\}$ is the sum of countably many cylindrical Borel sets.
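Formula (1) of Lemma 2 lends itself to a quick numerical check by the standard thinning method for time-dependent rates. A sketch (Python; the sinusoidal rate $q(z, x) = 1 + 0.5\sin z$ at a fixed state $x$ and all parameters are assumptions of this example):

```python
import math
import random

random.seed(2)

q = lambda z: 1.0 + 0.5 * math.sin(z)   # assumed rate q(z, x) at a fixed x
QMAX, T = 1.5, 2.0                      # bound on q, and horizon [0, T]

# thinning: generate a Poisson stream of intensity QMAX, accept each candidate
# at time z with probability q(z)/QMAX; the accepted stream has intensity q
def no_jump():
    t = 0.0
    while True:
        t += random.expovariate(QMAX)
        if t > T:
            return True
        if random.random() < q(t) / QMAX:
            return False

trials = 200000
est = sum(no_jump() for _ in range(trials)) / trials
exact = math.exp(-(T + 0.5 * (1 - math.cos(T))))   # exp(-int_0^T q), as in (1)
print(est, exact)
```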
Let $\xi_0, \xi_1, \dots$ denote the successive states visited by the process, let $\tau_k$ denote the time spent in the state $\xi_{k-1}$, and let $\gamma_m = \tau_1 + \dots + \tau_m$ be the instant of the $m$th jump. For fixed $m$, define

$$\xi'(t) = \xi(t + \gamma_m), \qquad t \ge 0.$$

Since $\xi'(t) = \xi_{k+m}$ for $\gamma_{k+m} - \gamma_m \le t < \gamma_{k+m+1} - \gamma_m$, we have $\xi'(t) = f(t;\ \xi_m, \tau_{m+1}, \xi_{m+1}, \tau_{m+2}, \dots)$, and for arbitrary fixed $t \ge 0$, $\xi'(t)$ is a Borel function of $\xi_m, \tau_{m+1}, \dots$, that is, a random element of $X$.

Theorem 2. Suppose that

$$P'(t, x, s, A) = P\{\xi'(s)\in A\mid\xi'(t) = x,\ \gamma_m = T,\ \xi'(0) = z\}.$$

Then

$$P'(t, x, s, A) = P(t + T, x, s + T, A), \qquad t < s.$$

In other words, for fixed $\gamma_m = T$ and $\xi'(0) = \xi_m = z$, the transition probabilities of the process $\xi'(t)$ coincide with the transition probabilities of the Markov process $\xi(t + T)$, where $t \ge 0$ and $\xi(T) = z$.

Proof. The theorem follows easily from formula (9), by virtue of which

$$P\{\tau_{m+1}\le t_1,\ \xi_{m+1}\in A_1,\ \dots,\ \tau_{m+r}\le t_r,\ \xi_{m+r}\in A_r\mid\xi_m, \gamma_m\}$$
$$= \int_0^{t_1}\Pi_1(t_0 + \gamma_m, \xi_m, ds_1)\int_{A_1}\Pi(t_0 + \gamma_m + s_1, \xi_m, dx_1)\cdots\int_0^{t_r}\Pi_1(t_0 + \gamma_m + s_1 + \dots + s_{r-1}, x_{r-1}, ds_r)\,\Pi(t_0 + \gamma_m + s_1 + \dots + s_r, x_{r-1}, A_r);$$

that is, the conditional distribution of the variables $\tau_{m+1}, \dots, \tau_{m+r}$, $\xi_{m+1}, \dots, \xi_{m+r}$ for given $\xi_0, \xi_1, \dots, \xi_m$, $\tau_1, \dots, \tau_m$ depends only on $\xi_m$ and $\gamma_m$, and it coincides with the distribution that is obtained if we consider the Markov process $\xi''(t) = \xi(t + T)$, $T = \gamma_m$, with fixed initial state $\xi''(0) = \xi_m$. Since the joint distributions of the variables $\tau_{m+1}, \dots, \tau_{m+r}$, $\xi_{m+1}, \dots, \xi_{m+r}$, where $r$ is an arbitrary positive integer, uniquely determine the finite-dimensional distributions of the process $\xi'(t)$, the theorem is proved.
Let us now give an important generalization of this theorem. Let $\alpha = \alpha(u)$ denote a nonnegative $\mathfrak{S}$-measurable function that is finite and defined on some $Q\in\mathfrak{S}$. In the present section we shall call $\alpha$ a random variable, although this description is not exact, since the function $\alpha(u)$ is not defined for all (mod $P$) elementary events.

Definition 1. The function $\alpha = \alpha(u)$ is said to be a random variable independent of the future (with respect to the process $\xi(t)$) if for arbitrary $t \ge 0$

$$\{u;\ \alpha(u) \le t\}\in\sigma\{\xi(s),\ s \le t\}.$$

Thus $\alpha(u)$ is a quantity independent of the future if, in order to know whether the event $\{\alpha(u) \le t\}$ occurred or not, it suffices to observe the sample function of the process $\xi(t)$ up to the instant $t$. Let $\mathfrak{S}_\alpha$ denote the class of all events $B\in\mathfrak{S}$ such that for arbitrary $t \ge 0$

$$B\cap\{u;\ \alpha(u)\le t\}\in\sigma\{\xi(s),\ s \le t\}.$$

Let $\gamma'_1$ denote the first of the jump instants $\gamma_k$ following $\alpha$, define $\tau'_1 = \gamma'_1 - \alpha$, and define $\gamma'_m$ inductively (for $m = 2, 3, \dots$) as the instant of the $m$th jump following $\alpha$, with $\xi'_m = \xi(\gamma'_m)$. We define the events

$$D = \bigcap_{k=1}^{m}\bigl\{\xi'_{k-1}\in A'_{k-1},\ 0 < \gamma'_k - \gamma'_{k-1} \le t_k\bigr\}\cap\bigl\{\xi'_m\in A'_m\bigr\}$$

and compute, with the aid of formula (9), the joint distributions of the variables $\tau'_1, \xi'_1, \dots, \tau'_m, \xi'_m$ on the event $\{B,\ \alpha \le s\}$ with the hypothesis $\xi(\alpha) = x$.
Since these joint distributions uniquely determine the conditional finite-dimensional distributions of the process, we have proven:

Theorem 3. Let $\xi(t)$ denote a jump Markov process and let $\alpha$ denote a random variable that is independent of the future of $\xi(t)$. Suppose that $B\in\mathfrak{S}_\alpha$. Then if $0 < t_1 < \dots < t_m$,

$$P\{B,\ \alpha\le s,\ \xi(t_1 + \alpha)\in A_1, \dots, \xi(t_m + \alpha)\in A_m\} = \int_0^s\int_X P\{\xi(t_1 + t)\in A_1, \dots, \xi(t_m + t)\in A_m\mid\xi(t) = x\}\,g(t, dx)\,dF(t),$$

where

$$\int_0^s g(t, A)\,dF(t) = P\{B,\ \alpha\le s,\ \xi(\alpha)\in A\}.$$

Corollary 1. If the conditions of the preceding theorem are satisfied and the process $\xi(t)$ is homogeneous, then

$$P\{B,\ \alpha\le s,\ \xi(t_1 + \alpha)\in A_1, \dots, \xi(t_m + \alpha)\in A_m\mid\xi(\alpha) = x\} = P\{B,\ \alpha\le s\}\,P\{\xi(t_1)\in A_1, \dots, \xi(t_m)\in A_m\mid\xi(0) = x\}.$$

Corollary 2. Let $\xi(t)$ denote a homogeneous Markov jump process with countably many states, and let $\alpha$ denote the time at which the system first falls into the $s$th state. Suppose that $P(\alpha < \infty) = 1$. Then the process $\xi'(t) = \xi(\alpha + t)$ for $t \ge 0$ is a Markov process with the same transition probabilities as the process $\xi(t)$; it satisfies the initial condition $\xi'(0) = s$, and it is independent of the $\sigma$-algebra of the events $\mathfrak{S}_\alpha$.
For an example of the application of these results, consider the problem of determining the distribution function of the time of first transition from the $s$th state to the $r$th state in the case of a jump birth and death process. We recall (Example 3 of Section 4) that a birth and death process is a homogeneous Markov process with countably many states $(0, 1, 2, \dots)$ for which $q(n) = \lambda_n + \mu_n$, $q(n, \{n+1\}) = \lambda_n$, and $q(n, \{n-1\}) = \mu_n$ for $n = 0, 1, 2, \dots$, where $\mu_0 = 0$ and $\{n\}$ is the set consisting of the single element $n$. Thus in a jump birth and death process, transitions from the $n$th state are possible only into the adjacent $(n-1)$st and $(n+1)$st states, with probabilities (cf. Theorem 1)

$$\pi'_n = \Pi(n, \{n+1\}) = \frac{\lambda_n}{\lambda_n + \mu_n}, \qquad \pi''_n = \Pi(n, \{n-1\}) = \frac{\mu_n}{\lambda_n + \mu_n}.$$

Suppose that $s > r$ and that $\tau_{sr}(t)$ is the length of the interval of time to the first instant the system falls into the $r$th state if the system is in the $s$th state at the instant $t$. Then

$$\tau_{sr}(t) = \tau_{s,s-1}(t) + \tau_{s-1,r}\bigl(t + \tau_{s,s-1}(t)\bigr). \qquad(18)$$

It follows from Corollary 1 to Theorem 3 that the terms in the right-hand member of equation (18) are independent and that the second term has the same distribution as $\tau_{s-1,r}(t)$.
Let the distribution function of the variable $\tau_{sr}(t)$ be denoted by $F_{sr}(x)$; that is, $F_{sr}(x) = P\{\tau_{sr}\le x\}$. Here we do not assume in advance that $F_{sr}(+\infty) = 1$. We introduce the Laplace–Stieltjes transform of the function $F_{sr}(x)$:

$$\varphi_{sr}(z) = \int_0^{\infty}e^{-zx}\,dF_{sr}(x), \qquad \operatorname{Re}z \ge 0.$$

On the basis of what we have just said, it follows from (18) that

$$\varphi_{sr}(z) = \varphi_{s,s-1}(z)\,\varphi_{s-1,r}(z). \qquad(19)$$

Setting $\varphi_{s,s-1}(z) = \varphi_s(z)$, we obtain from (19)

$$\varphi_{sr}(z) = \varphi_s(z)\,\varphi_{s-1}(z)\cdots\varphi_{r+1}(z). \qquad(20)$$

Since the function $\varphi_{sr}(z)$ determines $F_{sr}(x)$ uniquely, the problem is reduced to determining $\varphi_s(z)$. Let $\tau_s(t)$ denote the length of the interval of time from $t$ to the instant when the system leaves the $s$th state, which it is in at the instant $t$. Then with probability $\pi''_s$ we have $\tau_{s,s-1}(t) = \tau_s(t)$, and with probability $\pi'_s$,

$$\tau_{s,s-1}(t) = \tau_s(t) + \tau_{s+1,s}\bigl(t + \tau_s(t)\bigr) + \tau_{s,s-1}\bigl[t + \tau_s(t) + \tau_{s+1,s}\bigl(t + \tau_s(t)\bigr)\bigr]. \qquad(21)$$

The Laplace–Stieltjes transform $\psi_s(z)$ of the distribution function of the variable $\tau_s(t)$ is easily found. Using Lemma 2, we obtain

$$\psi_s(z) = \int_0^{\infty}e^{-zx}\,d\bigl(1 - e^{-q(s)x}\bigr) = \frac{q(s)}{q(s) + z}.$$

It again follows from Corollary 1 of Theorem 3 that

$$\varphi_s(z) = \pi''_s\,\psi_s(z) + \pi'_s\,\psi_s(z)\,\varphi_{s+1}(z)\,\varphi_s(z),$$

or

$$\varphi_s(z) = \frac{\mu_s}{q(s) + z - \lambda_s\varphi_{s+1}(z)}. \qquad(22)$$

Successive applications of formula (22) lead to the representation of the function $\varphi_{s+1}(z)$ in the form of a continued fraction that is a rational function of the quantities $\rho_k = \mu_k/\lambda_k$, of $z$, and of $\varphi_1(z)$. Let us find the function $\varphi_1(z)$ from the following considerations.
Define

$$p_{00}(t) = P\{\xi(t) = 0\mid\xi(0) = 0\}.$$

Let $\tau^{(k)}$ denote the length of the $k$th interval of time in the course of which the system finds itself in the state 0, and let $\tau_{10}^{(k)}$ denote the time spent by the system after leaving the state 0 for the $k$th time until the next return to 0. The probability $p_{00}(t)$ is the probability of the union $\bigcup_{n=0}^{\infty}E_n$ of incompatible events $E_n$, where $E_0$ is the event $\tau^{(1)} > t$ and $E_n$ (for $n \ge 1$) is the event

$$\sum_{k=1}^{n}\bigl(\tau^{(k)} + \tau_{10}^{(k)}\bigr) \le t < \sum_{k=1}^{n}\bigl(\tau^{(k)} + \tau_{10}^{(k)}\bigr) + \tau^{(n+1)}.$$

It follows from the independence of the variables $\tau^{(k)}$ and $\tau_{10}^{(k)}$, together with $P\{\tau^{(1)} > t\} = e^{-\lambda_0 t}$, that $p_{00}(t)$, and with it the function $\varphi_1(z)$, can be expressed in terms of the distributions of the $\tau^{(k)}$ and $\tau_{10}^{(k)}$.
Let $\xi'(t)$ denote the auxiliary process obtained by restricting $\xi(t)$ to the states $0, 1, \dots, N$. The process $\xi'(t)$ is also a jump birth and death process, with the same values of $\lambda_n$ and $\mu_n$ as the process $\xi(t)$ for $n < N$. All the preceding formulas remain valid for this case. However, the function $p_{00}(t)$ now corresponds to the auxiliary process $\xi'(t)$, and we may not assume that it is given. In the present problem it is easy to find the distribution of the variable $\tau_{N-1,N-2}$:

$$P\{\tau_{N-1,N-2} < x\} = \frac{\mu_{N-1}}{\lambda_{N-1} + \mu_{N-1}}\bigl(1 - e^{-(\lambda_{N-1} + \mu_{N-1})x}\bigr),$$

from which we get

$$\varphi_{N-1}(z) = \frac{\mu_{N-1}}{q(N-1) + z}.$$

Using (22), we obtain the representation of the function $\varphi_s(z)$ in the form of a continued fraction that is a rational function of $z$:

$$\varphi_s(z) = \cfrac{\rho_s}{1 + \rho_s + \cfrac{z}{\lambda_s} - \cfrac{\rho_{s+1}}{1 + \rho_{s+1} + \cfrac{z}{\lambda_{s+1}} - \cdots - \cfrac{\rho_{N-1}}{1 + \rho_{N-1} + \cfrac{z}{\lambda_{N-1}}}}}\;. \qquad(24)$$
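Differentiating a recursion of the form (22) at $z = 0$ gives a companion recursion for the mean first-passage times. A sketch (Python; the chain below, with constant rates and a reflecting top state $N$ where $\lambda_N = 0$, is an assumption of this example and differs from the killed auxiliary process above): if all downward passages are certain, then $h_s = M\tau_{s,s-1}$ satisfies $h_N = 1/\mu_N$ and $h_s = (1 + \lambda_s h_{s+1})/\mu_s$.

```python
import random

random.seed(3)

# Assumed example: states 0..N, lambda_n = 1 for n < N, lambda_N = 0, mu_n = 2.
N, lam, mu = 20, 1.0, 2.0
h = [0.0] * (N + 1)
h[N] = 1.0 / mu
for s in range(N - 1, 0, -1):
    h[s] = (1.0 + lam * h[s + 1]) / mu    # mean-passage-time recursion

# Monte Carlo check of h_1: simulate from state 1 until state 0 is reached
def passage_time():
    n, t = 1, 0.0
    while n > 0:
        rate = mu if n == N else lam + mu
        t += random.expovariate(rate)
        if n == N or random.random() < mu / (lam + mu):
            n -= 1
        else:
            n += 1
    return t

trials = 20000
mc = sum(passage_time() for _ in range(trials)) / trials
print(h[1], mc)   # both near 1, the value 1/(mu - lam) of the infinite chain
```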
Previously, we have always started with a given jump process. Let us now suppose that the function q(t, x, A) is given. Can we construct a Markov process with transition probabilities P(t, x, s, A) connected with the function q(t, x, A) by the relation lim P(t, x, s, A) - XA(x) = q(t, x, A)? B-.c
s-t
(25)
If the answer is affirmative and the process constructed is a jump process, then the sample functions of the process can be constructed by using the preceding results. This remark forms the basis of the solution of the problem posed.

Let X denote a complete separable space and let 𝔅 denote the σ-algebra of Borel subsets of X. Suppose that the function

$$q(t, x, A) = -q(t, x)\,\chi_A(x) + \tilde q(t, x, A)$$

is defined for all t ≥ 0, x ∈ X, and A ∈ 𝔅 and that it satisfies the following conditions:

a′. For fixed t ≥ 0 and x ∈ X, the function q̃(t, x, A) is a finite measure on 𝔅 and q(t, x) = q̃(t, x, X).

b′. For fixed x ∈ X and A ∈ 𝔅, the function q̃(t, x, A) is continuous with respect to t, uniformly with respect to A, on every finite interval of variation of t.

We note that conditions a′ and b′ are more general than those that the function q(t, x, A) of a jump Markov process must satisfy (conditions a–c of Section 3). Consider the space Ω, introduced above, of the sequences (ε₀, τ₁, ε₁, τ₂, …), where ε_k ∈ X and τ_k > 0. On the algebra of cylindrical sets in Ω we introduce the measure P(C) as follows. If

$$C = \{\varepsilon_k \in A_k,\ k = 0, 1, \dots, m;\ \tau_j < t_j,\ j = 1, \dots, m\},$$
7. THE BASIC PROPERTIES OF JUMP PROCESSES
then we set

$$P(C) = \int_{A_0} F(t_1, \dots, t_m;\ A_1, \dots, A_m \mid 0, x)\,\mu_0(dx),$$

where F(t₁, …, t_m; A₁, …, A_m | 0, x) is given by formula (8) and μ₀ is an arbitrary "initial" distribution on 𝔅. In accordance with Kolmogorov's theorem (Theorem 3, Section 2, Chapter III), the measure P(C) can be extended to a complete measure {𝔖, P}, where 𝔖 is the complete σ-algebra generated by the cylindrical sets in Ω. On Ω we define the function ξ(t) = f(t, ω) by

$$f(t, \omega) = \varepsilon_k \quad \text{if } \gamma_k \le t < \gamma_{k+1}, \qquad \gamma_0 = 0,\quad \gamma_k = \sum_{j=1}^{k} \tau_j.$$
If γ_∞ = Σ_{k=1}^∞ τ_k = ∞, then f(t, ω) is defined for all t ≥ 0 for the given ω. Let N denote the set

$$N = \Big\{\omega :\ \gamma_\infty = \sum_{k=1}^{\infty}\tau_k < \infty\Big\}.$$

If P(N) = 0, then the process ξ(t) is defined for almost all ω and all t ≥ 0, and its sample functions are continuous on the right in the discrete topology on X. This will be the case, in particular, when the function q(t, x) is bounded.
To see that this is so, note that the proof of Corollary 1 of Theorem 1 depends only on formula (8) and the boundedness of the function q(t, x), so that P(N) = 0 under this restriction on the process in question.

To determine the random process ξ(t) for all t ≥ 0 for P-almost-all ω in the case in which P(N) > 0, we proceed in a different manner. The simplest method is as follows: we adjoin to X a single point "∞". The extended space is denoted by X′; that is, X′ = X ∪ {∞}. Let us assume that ξ(t) = ∞ for t ≥ γ_∞. The process thus constructed is denoted by ξ₀(t). Other extensions of the process ξ(t) can be obtained as follows. Let Ω_k (for k = 1, 2, …) denote a sequence of spaces that can be considered as distinct copies of the space Ω. On Ω_k let us consider the measure {𝔖_k, P_k}, defined in the same way that {𝔖, P} is defined on Ω, but with an initial distribution μ_k. Let N_k denote the set in Ω_k analogous to N in Ω:

$$N_k = \Big\{\omega^{(k)} :\ \omega^{(k)} \in \Omega_k,\ \sum_{j=1}^{\infty}\tau_j^{(k)} < \infty\Big\}.$$

We shall assume that the σ-algebras of the events associated with the distinct copies are independent. For t > γ_∞ we set ξ(t) = f(t − γ_∞, ω^{(1)}). If P(γ_∞^{(1)} < ∞) = 0,
then ξ(t) is now defined for all t for almost all (ω, ω^{(1)}). On the other hand, if P(γ_∞^{(1)} < ∞) > 0, we set ξ(t) = f(t − γ_∞ − γ_∞^{(1)}, ω^{(2)}) for t > γ_∞ + γ_∞^{(1)}, and so forth. We note that if the function q(t, x, A) is independent of t, then the variables γ_∞^{(k)} (for k ≥ 1) are identically distributed and independent, so that the process is defined inductively by

$$\xi(t) = f\Big(t - \gamma_\infty - \sum_{j=1}^{k}\gamma_\infty^{(j)},\ \omega^{(k+1)}\Big) \quad\text{for}\quad \gamma_\infty + \sum_{j=1}^{k}\gamma_\infty^{(j)} < t \le \gamma_\infty + \sum_{j=1}^{k+1}\gamma_\infty^{(j)}.$$

A direct computation with the finite-dimensional distributions yields the conditional distribution of the jump times and states after an instant T given the course of the process up to T, from which we get the important conclusion

$$P\big\{\gamma_m - T < t_m,\ \tau_{m+1} < t_{m+1}, \dots, \tau_{m+r} < t_{m+r},\ \varepsilon_m \in A_m, \dots, \varepsilon_{m+r} \in A_{m+r} \,\big|\, \tau_1, \dots, \tau_{m-1},\ \varepsilon_0, \dots, \varepsilon_{m-1};\ \textstyle\sum_{k=1}^{m-1}\tau_k \le T < \sum_{k=1}^{m}\tau_k\big\}$$
$$= F(t_m, t_{m+1}, \dots;\ A_m, \dots, A_{m+r} \mid T,\ \varepsilon_{m-1}). \qquad (26)$$

But this means that the conditional probabilities of the events in σ{ξ₀(t), t > T} with respect to the σ-algebra σ{ξ₀(t), t ≤ T} depend only on ξ₀(T) = ε_{m−1} (or on ξ₀(T) = ∞, which is obvious); that is, ξ₀(t) is a Markov process. One can easily show that the same conclusion holds for the process ξ(t).

Theorem 4. Let q(t, x, A) satisfy the conditions a′ and b′ (page 362).
Then:

a. ξ₀(t) is a Markov process and its sample functions are, with probability 1, continuous from the right;

b. the transition probabilities of the process ξ₀(t) are defined by the relations

$$P(t, x, v, A) = \sum_{n=0}^{\infty} P^{(n)}(t, x, v, A), \qquad v > t,\ A \in \mathfrak{B}, \qquad (27)$$

where

$$P^{(0)}(t, x, v, A) = \Psi(t, x, v)\,\chi_A(x),$$
$$P^{(n+1)}(t, x, v, A) = \int_t^v\!\!\int_X P^{(n)}(\theta, y, v, A)\,\Psi(t, x, \theta)\,\tilde q(\theta, x, dy)\,d\theta, \qquad n = 0, 1, \dots, \qquad (28)$$
$$\Psi(t, x, v) = \exp\Big(-\int_t^v q(\theta, x)\,d\theta\Big),$$

and

$$P(t, x, v, \{\infty\}) = 1 - P(t, x, v, X);$$
c. the function P(t, x, v, A) satisfies the first Kolmogorov equation

$$\frac{\partial P(t, x, v, A)}{\partial t} = q(t, x)\,P(t, x, v, A) - \int_X P(t, z, v, A)\,\tilde q(t, x, dz) \qquad (29)$$

and the boundary condition

$$\lim_{t \uparrow v} P(t, x, v, A) = \chi_A(x);$$

d. equation (25) is satisfied uniformly with respect to t (for 0 ≤ t ≤ T, where T is an arbitrary number) for fixed x and A.

Proof. a. This was proved before the statement of the theorem.

b.
Note that Theorem 2 holds for the process ξ₀(t), since its proof was based only on formula (8) and the fact that the first state ε₀ changes by a jump. Let P^{(n)}(t, x, v, A) denote the conditional probability that, under the hypothesis ξ₀(t) = x ≠ ∞, we have ξ₀(v) ∈ A and the function ξ₀(s) has exactly n jumps on the interval [t, v]. Then it follows from (26) that P^{(0)}(t, x, v, A) = Ψ(t, x, v)χ_A(x). Furthermore, if ξ₀(θ) = x for t ≤ θ < t + τ₁ and ξ₀(t + τ₁) ≠ x, then on the basis of Theorem 2,

$$P^{(n+1)}(t, x, v, A) = M\{P^{(n)}(t+\tau_1,\ \xi_0(t+\tau_1),\ v, A) \mid \xi_0(t) = x\} = \int_t^v\!\!\int_X P^{(n)}(\theta, y, v, A)\,\Psi(t, x, \theta)\,\tilde q(\theta, x, dy)\,d\theta.$$

Therefore the function P(t, x, v, A) defined by equation (27) is the probability of falling at the instant v into the set A after a finite number of jumps, having left the point x at the instant t. This completes the proof of (b).

c. It follows from (27) and (28) that
$$P(t, x, v, A) = \Psi(t, x, v)\,\chi_A(x) + \int_t^v\!\!\int_X P(\theta, y, v, A)\,\Psi(t, x, \theta)\,\tilde q(\theta, x, dy)\,d\theta. \qquad (30)$$

From this it follows that

$$\lim_{t \uparrow v} P(t, x, v, A) = \chi_A(x)$$

and that the function P(t, x, v, A) is continuous with respect to t. It follows from the boundedness and continuity of P(t, x, v, A) with respect to t, and from the continuity in t of q̃(t, x, A) uniformly with respect to A (that is, the continuity of the variation of the measure q̃(t, x, A) as a function of t), that the integral

$$\int_X P(\theta, z, v, A)\,\tilde q(\theta, x, dz)$$

is continuous with respect to θ. Consequently equation (30) can be differentiated with respect to t. Thus

$$\frac{\partial P(t, x, v, A)}{\partial t} = q(t, x)\,\Psi(t, x, v)\,\chi_A(x) - \int_X P(t, y, v, A)\,\tilde q(t, x, dy) + q(t, x)\int_t^v\!\!\int_X P(\theta, y, v, A)\,\Psi(t, x, \theta)\,\tilde q(\theta, x, dy)\,d\theta$$
$$= q(t, x)\,P(t, x, v, A) - \int_X P(t, y, v, A)\,\tilde q(t, x, dy),$$
which completes the proof of (c).

d. It follows from the continuity of the function q(t, x) with respect to t that

$$\lim_{t_1 \uparrow t,\ t_2 \downarrow t} \frac{1 - \Psi(t_1, x, t_2)}{t_2 - t_1} = q(t, x)$$

uniformly with respect to t (for 0 ≤ t ≤ T). In view of (30), it then follows that

$$\lim_{t_1 \uparrow t,\ t_2 \downarrow t} \frac{1 - P(t_1, x, t_2, \{x\})}{t_2 - t_1} = q(t, x).$$

It follows from (30) that P(t₁, x, t₂, A) is a continuous function of t₁ and t₂. Keeping in mind the continuity (mentioned above) of the inner integral on the right-hand side of formula (30), we obtain, for x ∉ A,

$$\lim \frac{P(t_1, x, t_2, A)}{t_2 - t_1} = \lim \frac{1}{t_2 - t_1}\int_{t_1}^{t_2}\!\!\int_X P(\theta, z, t_2, A)\,\Psi(t_1, x, \theta)\,\tilde q(\theta, x, dz)\,d\theta = \tilde q(t, x, A),$$

which proves (d). The process ξ₀(t) has, with probability 1, only finitely many jumps on every finite interval if and only if

$$P(t, x, v, X) = 1, \qquad v > t \ge 0,\quad x \in X. \qquad (31)$$
The question of whether or not a Markov process has infinitely many jumps in a finite interval of time is of great interest. However, equation (31) is not convenient for deciding this. We can obtain a more suitable condition by confining ourselves to the homogeneous case, that is, by assuming that the function q(t, x, A) = q(x, A) is independent of t. In the homogeneous case,

$$P^{(n)}(t, x, v, A) = P^{(n)}(v - t, x, A),$$
$$P^{(0)}(t, x, A) = e^{-t q(x)}\,\chi_A(x),$$
$$P^{(n)}(t, x, A) = \int_0^t\!\!\int_X \exp\{-\theta q(x)\}\,P^{(n-1)}(t - \theta, y, A)\,\tilde q(x, dy)\,d\theta,$$
where q(x) = q̃(x, X) and P(t, x, v, A) = P(v − t, x, A). If we set K(t, x) = 1 − P(t, x, X), then K(0, x) = 0, and it follows from equation (29) that

$$\frac{\partial K(t, x)}{\partial t} = -q(x)\,K(t, x) + \int_X K(t, y)\,\tilde q(x, dy). \qquad (32)$$

In this equation let us pass from the function K(t, x) to its Laplace transform

$$z(\lambda, x) = \lambda\int_0^\infty e^{-\lambda t}\,K(t, x)\,dt.$$

We obtain

$$(\lambda + q(x))\,z(\lambda, x) = \int_X z(\lambda, y)\,\tilde q(x, dy).$$
Let f(x) denote an arbitrary solution of the integral equation

$$(\lambda + q(x))\,f(x) = \int_X f(y)\,\tilde q(x, dy), \qquad (33)$$

and let f(x) satisfy the condition sup_x |f(x)| ≤ 1.

Lemma 4. For λ > 0,

$$-z(\lambda, x) \le f(x) \le z(\lambda, x).$$

Proof.
We have z(λ, x) = lim_{n→∞} z^{(n)}(λ, x), where

$$z^{(n)}(\lambda, x) = \lambda\int_0^\infty e^{-\lambda t}\Big(1 - \sum_{k=0}^{n-1} P^{(k)}(t, x, X)\Big)\,dt, \qquad z^{(0)}(\lambda, x) = 1.$$

Define

$$Q^{(n)}(\lambda, x) = \lambda\int_0^\infty e^{-\lambda t}\,P^{(n)}(t, x, X)\,dt.$$

Then

$$Q^{(0)}(\lambda, x) = \frac{\lambda}{\lambda + q(x)}, \qquad Q^{(k)}(\lambda, x) = \frac{\displaystyle\int_X Q^{(k-1)}(\lambda, y)\,\tilde q(x, dy)}{\lambda + q(x)},$$

$$z^{(n)}(\lambda, x) = 1 - \sum_{k=0}^{n-1} Q^{(k)}(\lambda, x) = 1 - \frac{\lambda}{\lambda + q(x)} - \frac{\displaystyle\int_X \big(1 - z^{(n-1)}(\lambda, y)\big)\,\tilde q(x, dy)}{\lambda + q(x)},$$

from which we get

$$(\lambda + q(x))\,z^{(n)}(\lambda, x) = \int_X z^{(n-1)}(\lambda, y)\,\tilde q(x, dy). \qquad (34)$$

Since −z^{(0)}(λ, x) = −1 ≤ f(x) ≤ 1 = z^{(0)}(λ, x), by combining (33) and (34) we obtain by induction

$$-z^{(n)}(\lambda, x) \le f(x) \le z^{(n)}(\lambda, x),$$
and taking the limit as n → ∞, we obtain the desired assertion.

Theorem 5. For P(t, x, X) to be equal to 1 for all t > 0 and x ∈ X, it is necessary that equation (33) have no nontrivial bounded solutions for arbitrary λ > 0, and it is sufficient that this condition be satisfied for some λ > 0.
Proof. If P(t, x, X) ≡ 1, then z(λ, x) = 0 for arbitrary λ > 0, and on the basis of Lemma 4 every bounded solution of equation (33) is identically equal to 0. On the other hand, if equation (33) has no nontrivial bounded solutions for some λ, then z(λ, x), which is a bounded solution of equation (33), is identically equal to 0, and P(t, x, X) = 1. This completes the proof of the theorem.

Example. Suppose that ξ₀(t) for t ≥ 0 is a birth and death process defined by the sequences {λ_n} and {μ_n} (for n = 0, 1, 2, …), where μ₀ = 0, λ₀ > 0, and λ_n > 0, μ_n > 0 for n ≥ 1 (cf. p. 324).
Under what conditions do the sample functions of the process remain bounded with probability 1 in the course of a finite interval of time? Obviously, this will occur only when these sample functions have finitely many jumps in a finite interval of time. Let us use the preceding theorem. Equation (33) in this case becomes an infinite system of linear algebraic equations:

$$(\lambda + \lambda_0)f(0) = \lambda_0 f(1),$$
$$\lambda_n\big(f(n+1) - f(n)\big) = \mu_n\big(f(n) - f(n-1)\big) + \lambda f(n), \qquad n \ge 1. \qquad (35)$$

If λ₀ ≠ 0, then equations (35) determine all the f(n) for n ≥ 1 up to an arbitrary factor equal to f(0). On the other hand, if λ₀ = 0,
we can set f(0) = 0. Then all the f(n) for n ≥ 2 are uniquely determined up to the factor f(1). Since f(1) > f(0) ≥ 0 (for λ > 0), we obtain by induction the result that f(n + 1) > f(n). Let us rewrite the system (35) in the form

$$f(n+1) - f(n) = \gamma_n\big(f(n) - f(n-1)\big) + \delta_n f(n), \qquad n \ge 1,$$

where γ_n = μ_n/λ_n and δ_n = λ/λ_n, and let us show that the sequence f(n), where f(0) = 1, is bounded if and only if

$$\sum_{n=1}^{\infty}\big(\delta_n + \gamma_n\delta_{n-1} + \gamma_n\gamma_{n-1}\delta_{n-2} + \cdots + \gamma_n\gamma_{n-1}\cdots\gamma_2\delta_1 + \gamma_n\gamma_{n-1}\cdots\gamma_1\big) < \infty. \qquad (36)$$
We have

$$f(n+1) - f(n) = \delta_n f(n) + \gamma_n\delta_{n-1}f(n-1) + \cdots + \gamma_n\gamma_{n-1}\cdots\gamma_2\delta_1 f(1) + \gamma_n\gamma_{n-1}\cdots\gamma_1\big(f(1) - 1\big)$$
$$\le f(n)\big(\delta_n + \gamma_n\delta_{n-1} + \cdots + \gamma_n\gamma_{n-1}\cdots\gamma_1\big) = \rho(n)\,f(n),$$

where

$$\rho(n) = \delta_n + \gamma_n\delta_{n-1} + \cdots + \gamma_n\gamma_{n-1}\cdots\gamma_1.$$

On the other hand,

$$f(n+1) - f(n) \ge \rho(n)\,\big(f(1) - 1\big).$$

Thus

$$f(n) + \rho(n)\big(f(1) - 1\big) \le f(n+1) \le f(n)\big(1 + \rho(n)\big),$$

from which it follows that

$$f(1) + \big(f(1) - 1\big)\sum_{k=1}^{n}\rho(k) \le f(n+1) \le f(1)\prod_{k=1}^{n}\big(1 + \rho(k)\big).$$
Since the series Σ_{k=1}^∞ ρ(k) and the infinite product Π_{k=1}^∞ (1 + ρ(k)) converge simultaneously, we see that condition (36) is necessary and sufficient for the sequence f(n) to be bounded.

Theorem 6. For the sample functions of the birth and death process ξ₀(t) to have, with probability 1, finitely many jumps on an arbitrary finite interval of time, it is necessary and sufficient that

$$\sum_{n=1}^{\infty}\Big(\frac{1}{\lambda_n} + \frac{\mu_n}{\lambda_n\lambda_{n-1}} + \frac{\mu_n\mu_{n-1}}{\lambda_n\lambda_{n-1}\lambda_{n-2}} + \cdots + \frac{\mu_n\cdots\mu_2}{\lambda_n\cdots\lambda_1} + \frac{\mu_n\cdots\mu_2\mu_1}{\lambda_n\cdots\lambda_1}\Big) = \infty.$$
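The divergence condition in Theorem 6 is easy to examine numerically. The sketch below computes the individual terms of the series; the rate sequences passed to it are hypothetical illustrations (for a pure birth process, μ_n ≡ 0, the terms reduce to 1/λ_n, so λ_n = (n+1)² makes the series converge and the process can make infinitely many jumps in finite time, while λ_n growing linearly keeps the series divergent).

```python
def theorem6_terms(lam, mu, N):
    """n-th term of the series in Theorem 6, for n = 1, ..., N-1:
    1/lam_n + mu_n/(lam_n lam_{n-1}) + ... + mu_n...mu_2/(lam_n...lam_1)
    + mu_n...mu_2 mu_1/(lam_n...lam_1).
    The process has a.s. finitely many jumps on finite intervals
    iff the series of these terms diverges."""
    terms = []
    for n in range(1, N):
        t = prod = 1.0 / lam[n]
        for k in range(n, 1, -1):
            prod *= mu[k] / lam[k - 1]
            t += prod
        t += prod * mu[1]   # last summand shares the denominator lam_n...lam_1
        terms.append(t)
    return terms
```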
VIII DIFFUSION PROCESSES
In this chapter we shall consider continuous Markov processes with range in m-dimensional Euclidean space R^{(m)}. Up to this point we have not completely described such processes. We shall now study an important class of these processes, the class of so-called diffusion processes, which, as the name suggests, can serve as a probabilistic model of the physical process of diffusion. In Section 5, Chapter VI, we considered a process of Brownian motion as a probabilistic model of diffusion in a homogeneous medium. Using a similar construction in the case of a nonhomogeneous medium, we arrive at the concept of a general diffusion process. Let us clarify the basic concepts of diffusion processes by giving an example of a one-dimensional process.

Let x_t denote the coordinate of a sufficiently small particle suspended in a liquid at an instant t. Neglecting the inertia of the particle, we may assume that the displacement of the particle has two components: the "average" displacement caused by the macroscopic velocity of the motion of the liquid, and the fluctuation of the displacement caused by the chaotic nature of the thermal motion of the molecules of the liquid. Suppose that the velocity of the macroscopic motion of the liquid at the point x and the instant t is equal to a(t, x). We assume that the fluctuational component of the displacement is a random variable whose distribution depends on the position x of the particle, the instant t at which the displacement is observed, and the quantity
Δt, which is the length of the interval of time during which the displacement is observed. We assume that the average value of this displacement is equal to 0 independently of t, x_t, and Δt. Thus the displacement of the particle can be written approximately in the form

$$x_{t+\Delta t} - x_t = a(t, x_t)\,\Delta t + \xi_{t, x_t, \Delta t}; \qquad (1)$$

here M ξ_{t,x_t,Δt} = 0. If a(t, x) is equal to 0 and the distribution of ξ_{t,x_t,Δt} is independent of x and t, as we assumed when we were considering Brownian motion (cf. Remark 1, Section 5, Chapter VI), then the distribution of ξ_{t,x_t,Δt} depends only on Δt. Since the properties of the medium are naturally assumed to change only slightly for small changes in t and x, the process is homogeneous in the small. Therefore, we may assume that

$$\xi_{t, x_t, \Delta t} = \sigma(t, x_t)\,\xi_{t, \Delta t},$$

where σ(t, x) characterizes the properties of the medium at the point x at the instant t, and ξ_{t,Δt} is the value of the increment that is obtained in the homogeneous case under the condition σ(t, x) = 1. Thus ξ_{t,Δt} must be distributed like the increment of a process of Brownian motion: w(t + Δt) − w(t). Consequently, for the increment x_{t+Δt} − x_t, we can write the approximate formula

$$x_{t+\Delta t} - x_t \approx a(t, x_t)\,\Delta t + \sigma(t, x_t)\,[w(t + \Delta t) - w(t)]. \qquad (2)$$

To make this formula precise, we replace the increments, as one frequently does in mathematical analysis, with differentials. When we do this we obtain the differential equation for x_t,

$$dx_t = a(t, x_t)\,dt + \sigma(t, x_t)\,dw(t), \qquad (3)$$

which we may take as our starting point in determining the diffusion process.
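The approximate increment formula (2) also gives the simplest way to simulate a solution of an equation of the form (3) on a time grid (the Euler–Maruyama scheme). A minimal sketch; the function name and the coefficient choices in the test are illustrative assumptions, not from the text:

```python
import random

def simulate_sde(a, sigma, x0, t0, T, n, seed=0):
    """Iterate x_{k+1} = x_k + a(t_k, x_k) dt + sigma(t_k, x_k) dw_k,
    with dw_k = w(t_{k+1}) - w(t_k) ~ N(0, dt), as in formula (2)."""
    rng = random.Random(seed)
    dt = (T - t0) / n
    t, x, path = t0, x0, [x0]
    for _ in range(n):
        dw = rng.gauss(0.0, dt ** 0.5)   # Brownian increment over one step
        x += a(t, x) * dt + sigma(t, x) * dw
        t += dt
        path.append(x)
    return path
```

With σ ≡ 0 the scheme degenerates to the ordinary Euler method, which gives a quick deterministic check of the drift part.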
Let x_t denote a multidimensional process with range in R^{(m)}. Then equation (1) remains meaningful if a(t, x_t) is a function with range in R^{(m)} and ξ_{t,x_t,Δt} is a random vector in R^{(m)}. In this case we assume that ξ_{t,x_t,Δt} can be represented in the form

$$\xi_{t, x_t, \Delta t} = \sum_{k=1}^{m} b_k(t, x_t)\,[w_k(t + \Delta t) - w_k(t)],$$

where the b_k(t, x_t) are functions with ranges in R^{(m)} and the w_k(t) are independent one-dimensional processes of Brownian motion. Such a representation corresponds to a nonisotropic medium: the displacements in the different directions have, in general, different distributions. The equation for the variable x_t in this case takes the form

$$dx_t = a(t, x_t)\,dt + \sum_{k=1}^{m} b_k(t, x_t)\,dw_k(t). \qquad (4)$$

We note that we cannot as yet give a precise meaning to equations (3) and (4).
The difficulty lies in the fact that the quantity

$$\frac{w(t + \Delta t) - w(t)}{\Delta t},$$

where w(t) is a process of Brownian motion, has a normal distribution with mean zero and variance 1/Δt, and hence this quantity does not have a limit in any probabilistic sense. Since w(t) does not have a derivative, the usual definition of the differential dw(t) has no meaning. We shall give a precise meaning to equations (3) and (4) when we introduce the concepts of a stochastic integral and stochastic differential in Section 2. In Section 1 we shall define a diffusion process, beginning with the properties of transition probabilities. In Sections 3 to 6, we shall study the solutions of equations (3) and (4) from the point of view of their existence and uniqueness, and of the properties that will enable us to determine the distributions of the basic characteristics of the process.
1. DIFFUSION PROCESSES IN THE BROAD SENSE
Let us first consider the one-dimensional case of diffusion processes in the broad sense. Let ξ(t) denote a Markov process in the broad sense, defined on [0, T] with range in R^{(1)}. This means that the transition probabilities P(t, x, s, A) of the process are given and satisfy conditions a–c of Section 1, Chapter VII. A process ξ(t) is called a diffusion process if the following conditions are satisfied:

a. For every x and every ε > 0,

$$\int_{|x-y| > \varepsilon} P(t, x, s, dy) = o(s - t) \qquad (1)$$

uniformly over t < s;

b. there exist functions a(t, x) and b(t, x) such that for every x and every ε > 0,

$$\int_{|x-y| \le \varepsilon} (y - x)\,P(t, x, s, dy) = a(t, x)(s - t) + o(s - t).$$

3. For every c > 0 and N > 0,

$$P\Big\{\Big|\int_a^b \varphi(t)\,dw(t)\Big| > c\Big\} \le \frac{N}{c^2} + P\Big\{\int_a^b |\varphi(t)|^2\,dt > N\Big\}. \qquad (3)$$
To see that this is true, set φ_N(t) = φ(t) for t ≤ τ_N and φ_N(t) = 0 for t > τ_N, where τ_N is the first instant at which ∫_a^{τ_N} |φ(s)|²ds = N (and τ_N = b if there is no such instant). Then ∫_a^b |φ_N(t)|²dt ≤ N, and φ_N(t) = φ(t) for all t whenever ∫_a^b |φ(t)|²dt ≤ N. Using the last two relations, we obtain

$$P\Big\{\Big|\int_a^b \varphi(t)\,dw(t)\Big| > c\Big\} = P\Big\{\Big|\int_a^b \varphi_N(t)\,dw(t) + \int_a^b [\varphi(t) - \varphi_N(t)]\,dw(t)\Big| > c\Big\}$$
$$\le P\Big\{\Big|\int_a^b \varphi_N(t)\,dw(t)\Big| > c\Big\} + P\Big\{\Big|\int_a^b [\varphi(t) - \varphi_N(t)]\,dw(t)\Big| > 0\Big\}$$
$$\le \frac{1}{c^2}\,M\Big(\int_a^b \varphi_N(t)\,dw(t)\Big)^2 + P\Big\{\int_a^b |\varphi(t)|^2\,dt > N\Big\} \le \frac{N}{c^2} + P\Big\{\int_a^b |\varphi(t)|^2\,dt > N\Big\},$$

which completes the proof.
Now let {f_n} denote a sequence of step functions such that ∫_a^b |f(t) − f_n(t)|²dt → 0 in probability. Then ∫_a^b |f_n(t) − f_m(t)|²dt also approaches 0 in probability as n and m approach ∞. Consequently, for every ε > 0,

$$\lim_{n, m \to \infty} P\Big\{\int_a^b |f_n(t) - f_m(t)|^2\,dt > \varepsilon\Big\} = 0.$$

Using property 3, we can write, for arbitrary ε > 0 and δ > 0,

$$\varlimsup_{n, m \to \infty} P\Big\{\Big|\int_a^b f_n(t)\,dw(t) - \int_a^b f_m(t)\,dw(t)\Big| > \delta\Big\} \le \frac{\varepsilon}{\delta^2} + \varlimsup_{n, m \to \infty} P\Big\{\int_a^b |f_n(t) - f_m(t)|^2\,dt > \varepsilon\Big\} = \frac{\varepsilon}{\delta^2},$$
2. ITO'S STOCHASTIC INTEGRAL
so that, because of the arbitrariness of ε > 0, we have

$$\lim_{n, m \to \infty} P\Big\{\Big|\int_a^b f_n(t)\,dw(t) - \int_a^b f_m(t)\,dw(t)\Big| > \delta\Big\} = 0$$

for every δ > 0. It follows from this equation that the sequence of random variables ∫_a^b f_n(t)dw(t) converges in probability to some limit. This limit is independent of the choice of the sequence {f_n(t)} for which ∫_a^b |f(t) − f_n(t)|²dt → 0. (If there are two such sequences {f_n(t)} and {f′_n(t)}, then by combining them into a single sequence we see that, with probability 1, the two sequences have the same limit.) Let us define

$$\int_a^b f(t)\,dw(t) = \text{P-}\lim_{n \to \infty}\int_a^b f_n(t)\,dw(t).$$

We shall call this limit Itô's stochastic integral of the function f(t). The definite integral defined in this manner is a homogeneous additive functional of the function f(t) on 𝔐₂[a, b]. Furthermore, for a ≤ c ≤ b,

$$\int_a^c f(t)\,dw(t) + \int_c^b f(t)\,dw(t) = \int_a^b f(t)\,dw(t). \qquad (4)$$

Proof of these properties is obvious for step functions, and it carries over to the general case by means of a trivial limiting operation. Applying inequality (3), obtained for step functions, to arbitrary functions in 𝔐₂[a, b], we see that property 3 is valid for all f(t) ∈ 𝔐₂[a, b].
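For a step function the integral is just the finite sum Σ_k f(t_k)[w(t_{k+1}) − w(t_k)] over the subdivision points, and the general integral is the limit of such sums. The sketch below evaluates these sums against a simulated Brownian path on a uniform grid; the grid size and seed are arbitrary illustrative choices:

```python
import random

def sample_w(T, n, seed=0):
    """Values of a Brownian path on the grid t_k = k*T/n
    (independent N(0, dt) increments)."""
    rng = random.Random(seed)
    dt = T / n
    w = [0.0]
    for _ in range(n):
        w.append(w[-1] + rng.gauss(0.0, dt ** 0.5))
    return w

def ito_sum(fvals, w, lo, hi):
    """sum_{k=lo}^{hi-1} f(t_k)(w(t_{k+1}) - w(t_k)): the integral of the
    step function equal to fvals[k] on [t_k, t_{k+1})."""
    return sum(fvals[k] * (w[k + 1] - w[k]) for k in range(lo, hi))
```

Taking fvals ≡ 1 recovers w(b) − w(a) by telescoping, and splitting the range at an interior grid point reproduces the additivity property (4) up to rounding.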
With the aid of this property we can show that:

4. If f(t) ∈ 𝔐₂[a, b], f_n(t) ∈ 𝔐₂[a, b], and ∫_a^b |f_n(t) − f(t)|²dt → 0 in probability, then

$$\text{P-}\lim_{n \to \infty}\int_a^b f_n(t)\,dw(t) = \int_a^b f(t)\,dw(t).$$

Finally, let us prove a property that generalizes properties 1 and 2.

2′. If the function f is such that

$$\int_a^b M(|f(t)|^2 \mid \mathfrak{F}_a)\,dt < \infty$$

with probability 1, then

$$M\Big(\int_a^b f(t)\,dw(t)\ \Big|\ \mathfrak{F}_a\Big) = 0 \pmod P \qquad (5)$$

and

$$M\Big(\Big[\int_a^b f(t)\,dw(t)\Big]^2\ \Big|\ \mathfrak{F}_a\Big) = \int_a^b M(|f(t)|^2 \mid \mathfrak{F}_a)\,dt \pmod P. \qquad (6)$$
To prove these properties, let us show that in the case in which ∫_a^b M(|f(t)|² | 𝔉_a)dt < ∞, there exists a sequence of step functions f̄_n such that ∫_a^b M(|f(t) − f̄_n(t)|² | 𝔉_a)dt → 0 in probability. Let {f_n(t)} denote a sequence of step functions in 𝔐₂[a, b] such that

$$\text{P-}\lim_{n \to \infty}\int_a^b |f(t) - f_n(t)|^2\,dt = 0.$$

Let us set g_N(x) = x for |x| ≤ N and g_N(x) = Nx/|x| for |x| > N. Since |g_N(x) − g_N(y)| ≤ |x − y|, we have

$$\int_a^b |g_N(f(t)) - g_N(f_n(t))|^2\,dt \le \int_a^b |f(t) - f_n(t)|^2\,dt \to 0$$

in probability. Since the quantities ∫_a^b |g_N(f(t)) − g_N(f_n(t))|²dt are bounded by the number 4N²(b − a), and since their sequence converges to 0 in probability, it follows on the basis of Lebesgue's theorem that

$$M\Big(\int_a^b |g_N(f(t)) - g_N(f_n(t))|^2\,dt\ \Big|\ \mathfrak{F}_a\Big) = \int_a^b M\big(|g_N(f(t)) - g_N(f_n(t))|^2 \mid \mathfrak{F}_a\big)\,dt \to 0$$

in probability. On the other hand,

$$\int_a^b M\big(|g_N(f(t)) - f(t)|^2 \mid \mathfrak{F}_a\big)\,dt \to 0 \qquad (N \to \infty)$$

in probability (again by Lebesgue's theorem), since |f(t) − g_N(f(t))|² ≤ |f(t)|² and ∫_a^b M(|f(t)|² | 𝔉_a)dt < ∞. Therefore we can choose a sequence {N_n} such that

$$\int_a^b M\big(|f(t) - g_{N_n}(f_n(t))|^2 \mid \mathfrak{F}_a\big)\,dt \le 2\int_a^b M\big(|f(t) - g_{N_n}(f(t))|^2 \mid \mathfrak{F}_a\big)\,dt + 2\int_a^b M\big(|g_{N_n}(f(t)) - g_{N_n}(f_n(t))|^2 \mid \mathfrak{F}_a\big)\,dt \to 0$$

in probability.
The functions g_{N_n}(f_n(t)) are step functions and can be taken for the functions f̄_n(t) whose existence we are proving. Substituting into (1) the functions f̄_n(t) that we have constructed and taking the limit as n → ∞, we obtain (5). To obtain (6), we note that if a sequence ξ_n converges to ξ in probability and M|ξ_n − ξ_m|² → 0 as n, m → ∞, then M|ξ_n − ξ|² → 0 and hence Mξ_n² → Mξ². Therefore

$$M\Big(\Big[\int_a^b f(t)\,dw(t)\Big]^2\ \Big|\ \mathfrak{F}_a\Big) = \lim_{n \to \infty} M\Big(\Big[\int_a^b \bar f_n(t)\,dw(t)\Big]^2\ \Big|\ \mathfrak{F}_a\Big) = \lim_{n \to \infty}\int_a^b M\big(|\bar f_n(t)|^2 \mid \mathfrak{F}_a\big)\,dt = \int_a^b M\big(|f(t)|^2 \mid \mathfrak{F}_a\big)\,dt.$$
Let us now consider the stochastic integral as a function of its upper limit. Let ψ_t(s) denote the function that is equal to 1 for s ≤ t and equal to 0 for s > t. If f(s) ∈ 𝔐₂[a, b], then f(s)ψ_t(s) ∈ 𝔐₂[a, b] for every t ∈ [a, b]. We define the integral for all t by

$$\int_a^t f(s)\,dw(s) = \int_a^b f(s)\,\psi_t(s)\,dw(s).$$

It follows from the definition of a stochastic integral that this integral is defined probabilistically, that is, only up to events of probability zero. Therefore, as a function of the upper limit, the integral is defined up to stochastic equivalence (cf. Section 1, Chapter IV). In what follows we shall always assume that the values of the integral as a function of the upper limit, for different values of t, are compatible in such a way that ζ(t) = ∫_a^t f(s)dw(s) is a separable process. The possibility of doing this follows from Theorem 2, Section 2, Chapter IV.

Let us note the basic properties of the function ζ(t) = ∫_a^t f(s)dw(s).
5. If ∫_a^b M(|f(s)|² | 𝔉_a)ds < ∞, then

$$P\Big\{\sup_{a \le t \le b}\Big|\int_a^t f(s)\,dw(s)\Big| > c\ \Big|\ \mathfrak{F}_a\Big\} \le \frac{1}{c^2}\int_a^b M\big(|f(s)|^2 \mid \mathfrak{F}_a\big)\,ds \qquad (7)$$

and

$$P\Big\{\sup_{a \le t \le b}\Big|\int_a^t f(s)\,dw(s)\Big| > c\Big\} \le \frac{1}{c^2}\int_a^b M\big(|f(s)|^2\big)\,ds. \qquad (8)$$
It will be sufficient to prove inequality (7). Let us choose a partition of the interval [a, b]: a = t₀ < t₁ < ⋯ < t_n = b. We define ζ_k = ∫_a^{t_k} f(s)dw(s), and we define χ_k = 1 if |ζ_i| ≤ c for i < k and |ζ_k| > c, and χ_k = 0 otherwise. Obviously Σ_{k=0}^n χ_k ≤ 1, χ_k is measurable with respect to 𝔉_{t_k}, and, since |ζ_k|²χ_k ≥ c²χ_k, we have

$$\zeta_n^2 \ge \zeta_n^2\sum_{k=0}^{n}\chi_k = \sum_{k=0}^{n}\zeta_k^2\chi_k + 2\sum_{k=0}^{n}\zeta_k(\zeta_n - \zeta_k)\chi_k + \sum_{k=0}^{n}(\zeta_n - \zeta_k)^2\chi_k \ge c^2\sum_{k=0}^{n}\chi_k + 2\sum_{k=0}^{n}\zeta_k(\zeta_n - \zeta_k)\chi_k.$$

While taking the conditional mathematical expectation of both sides with respect to 𝔉_a, we note that, on the basis of (5),

$$M(\zeta_n - \zeta_k \mid \mathfrak{F}_{t_k}) = M\Big(\int_{t_k}^b f(s)\,dw(s)\ \Big|\ \mathfrak{F}_{t_k}\Big) = 0.$$

Therefore

$$M\big(\zeta_k(\zeta_n - \zeta_k)\chi_k \mid \mathfrak{F}_a\big) = M\big(\zeta_k\chi_k\,M(\zeta_n - \zeta_k \mid \mathfrak{F}_{t_k}) \mid \mathfrak{F}_a\big) = 0,$$

and then, using equation (6), we obtain

$$M(\zeta_n^2 \mid \mathfrak{F}_a) \ge c^2\,M\Big(\sum_{k=0}^{n}\chi_k\ \Big|\ \mathfrak{F}_a\Big).$$

We note that Σ_{k=0}^n χ_k is equal to 1 if sup_{1≤k≤n} |ζ_k| > c and equal to 0 otherwise. Therefore

$$M\Big(\sum_{k=0}^{n}\chi_k\ \Big|\ \mathfrak{F}_a\Big) = P\Big\{\sup_{1 \le k \le n}|\zeta_k| > c\ \Big|\ \mathfrak{F}_a\Big\}.$$

Thus we have proved the inequality

$$P\Big\{\sup_{1 \le k \le n}|\zeta_k| > c\ \Big|\ \mathfrak{F}_a\Big\} \le \frac{1}{c^2}\int_a^b M\big(|f(s)|^2 \mid \mathfrak{F}_a\big)\,ds,$$

from which (7) follows on refining the partition and using the separability of the process ζ(t).
Passing to the limit as max(t^{(k+1)} − t^{(k)}) → 0 and max_k |w(t^{(k+1)}) − w(t^{(k)})| → 0 (the latter in view of the continuity of the process w(t)), we obtain

$$\eta(t_2) - \eta(t_1) = \lim\Big[\sum_{k=0}^{n-1} u_t\big(t^{(k)}, \xi(t^{(k)})\big)\big(t^{(k+1)} - t^{(k)}\big) + \sum_{k=0}^{n-1} u_x\big(t^{(k)}, \xi(t^{(k)})\big)\,a\,\big(t^{(k+1)} - t^{(k)}\big)$$
$$+ \sum_{k=0}^{n-1} u_x\big(t^{(k)}, \xi(t^{(k)})\big)\,b\,\big[w(t^{(k+1)}) - w(t^{(k)})\big] + \frac{b^2}{2}\sum_{k=0}^{n-1} u_{xx}\big(t^{(k)}, \xi(t^{(k)})\big)\big(t^{(k+1)} - t^{(k)}\big)$$
$$+ \frac{b^2}{2}\sum_{k=0}^{n-1} u_{xx}\big(t^{(k)}, \xi(t^{(k)})\big)\Big[\big(w(t^{(k+1)}) - w(t^{(k)})\big)^2 - \big(t^{(k+1)} - t^{(k)}\big)\Big]\Big].$$

Obviously, the limits of all the summations except the last are equal to the corresponding integrals. To prove formula (11) in the present case, it suffices to show that

$$\sum_{k=0}^{n-1} u_{xx}\big(t^{(k)}, \xi(t^{(k)})\big)\Big[\big(w(t^{(k+1)}) - w(t^{(k)})\big)^2 - \big(t^{(k+1)} - t^{(k)}\big)\Big]$$
converges to 0 in probability. We define

$$\zeta_k = \big[w(t^{(k+1)}) - w(t^{(k)})\big]^2 - \big(t^{(k+1)} - t^{(k)}\big),$$

and we let γ_k^{(N)} denote the characteristic function of the event {|ξ(t^{(i)})| ≤ N for i ≤ k}. Then

$$M\Big(\Big|\sum_{k=0}^{n-1} u_{xx}\big(t^{(k)}, \xi(t^{(k)})\big)\gamma_k^{(N)}\zeta_k\Big|^2\Big) \le \sup_{t,\,|x| \le N}|u_{xx}(t, x)|^2\sum_{k=0}^{n-1} M\zeta_k^2 \le 3\sup_{t,\,|x| \le N}|u_{xx}(t, x)|^2\sum_{k=0}^{n-1}\big(t^{(k+1)} - t^{(k)}\big)^2 \to 0,$$

and

$$P\Big\{\sum_{k=0}^{n-1} u_{xx}\big(t^{(k)}, \xi(t^{(k)})\big)\zeta_k\big(1 - \gamma_k^{(N)}\big) \ne 0\Big\} \le P\Big\{\sup_{t_1 \le t \le t_2}|\xi(t)| > N\Big\} \to 0$$

as N → ∞. In the general case we choose sequences of step functions a_n(t) and b_n(t) such that

$$\int_a^b |a_n(t) - a(t)|\,dt \to 0, \qquad \int_a^b |b_n(t) - b(t)|^2\,dt \to 0,$$

and the sequence of the processes

$$\zeta_n(t) = \zeta(a) + \int_a^t a_n(s)\,ds + \int_a^t b_n(s)\,dw(s)$$

converges uniformly, with probability 1, to ζ(t). Then the sequence of processes η_n(t) = u(t, ζ_n(t)) also converges uniformly, with probability 1, to η(t). Taking the limit as n → ∞ in the formula

$$\eta_n(t_2) - \eta_n(t_1) = \int_{t_1}^{t_2}\Big[u_t(t, \zeta_n(t)) + u_x(t, \zeta_n(t))\,a_n(t) + \tfrac12 u_{xx}(t, \zeta_n(t))\,b_n^2(t)\Big]dt + \int_{t_1}^{t_2} u_x(t, \zeta_n(t))\,b_n(t)\,dw(t),$$

we obtain the proof of formula (11) in the general case.

Let us also consider integrals

$$\int_a^b F(t)\,dw(t)$$
of functions F(t) with range in R^{(m)}. If f¹, …, f^m are the coordinates of the function F relative to some basis, then ∫_a^b F(t)dw(t) is a random variable with range in R^{(m)} whose coordinates are ∫_a^b f^i(t)dw(t) for i = 1, …, m. Properties 1 to 8 remain valid for stochastic integrals of vector-valued functions if by |F| we understand

$$|F| = \sqrt{|f^1|^2 + \cdots + |f^m|^2}.$$

(The formulas characterizing these properties were deliberately written in such a form that they remain meaningful for integrals of functions with ranges in R^{(m)}.)

Let us suppose that k mutually independent processes of Brownian motion w₁(t), …, w_k(t) are such that condition (2), regarding the connection with the σ-algebras 𝔉_t, is satisfied for each of them and that, for each h, the processes w_i(t + h) − w_i(h), i = 1, …, k, are all independent of 𝔉_h. Then we can define the integral

$$\int_a^t f(t)\,dw_i(t)$$

for every i = 1, …, k and every function f(t) ∈ 𝔐₂[a, b] with range in R^{(m)}. Analogously to what we did in the one-dimensional case, we can define the differential

$$d\zeta(t) = a(t)\,dt + \sum_{i=1}^{k} f_i(t)\,dw_i(t).$$
Let u(t, x) denote a real-valued function defined for t ∈ [a, b] and x ∈ R^{(m)}. Let x¹, …, x^m denote the coordinates of the point x relative to some basis, and let f_i¹, …, f_i^m denote the coordinates of the function f_i.

8′. If the function u(t, x) is continuous and has continuous partial derivatives

$$\frac{\partial}{\partial t}u(t, x), \qquad \frac{\partial}{\partial x^i}u(t, x), \qquad \frac{\partial^2}{\partial x^i\,\partial x^j}u(t, x)$$

for i, j = 1, …, m, and if the process ζ(t) has a differential

$$d\zeta(t) = a(t)\,dt + \sum_i f_i(t)\,dw_i(t),$$

then the process η(t) = u(t, ζ(t)) also has a differential, and

$$d\eta(t) = \Big[\frac{\partial u}{\partial t}(t, \zeta(t)) + \sum_j \frac{\partial u}{\partial x^j}(t, \zeta(t))\,a^j(t) + \frac{1}{2}\sum_{i=1}^{k}\sum_{j,l=1}^{m}\frac{\partial^2 u}{\partial x^j\,\partial x^l}(t, \zeta(t))\,f_i^j(t)\,f_i^l(t)\Big]dt + \sum_{i=1}^{k}\Big(\sum_{j=1}^{m}\frac{\partial u}{\partial x^j}(t, \zeta(t))\,f_i^j(t)\Big)dw_i(t). \qquad (12)$$

The derivation of formula (12) is analogous to that of formula (11).
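The origin of the second-derivative term in formulas (11) and (12) can be seen numerically. For u(x) = x² and ζ = w, summation by parts gives the exact discrete identity w(T)² = 2 Σ_k w(t_k)Δw_k + Σ_k (Δw_k)² on every partition, and the quadratic-variation sum Σ(Δw_k)² tends to T rather than to 0. A sketch (path resolution and seed are illustrative):

```python
import random

def check_ito_identity(T=1.0, n=100_000, seed=1):
    """Verify w(T)^2 = 2*sum w_k dw_k + sum (dw_k)^2 on one sample path,
    and return the quadratic variation sum (which is approximately T)."""
    rng = random.Random(seed)
    dt = T / n
    w = [0.0]
    for _ in range(n):
        w.append(w[-1] + rng.gauss(0.0, dt ** 0.5))
    ito = sum(w[k] * (w[k + 1] - w[k]) for k in range(n))   # left-endpoint sum
    qv = sum((w[k + 1] - w[k]) ** 2 for k in range(n))      # quadratic variation
    assert abs(w[-1] ** 2 - (2 * ito + qv)) < 1e-8          # exact algebraic identity
    return qv
```

Because qv → T, the left-endpoint sums converge to (w(T)² − T)/2, i.e. ∫₀ᵀ w dw = (w(T)² − T)/2, in agreement with the ½u_xx dt correction in (11).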
3. EXISTENCE AND UNIQUENESS OF SOLUTIONS OF STOCHASTIC DIFFERENTIAL EQUATIONS
Consider the stochastic differential equation

$$d\xi(t) = a(t, \xi(t))\,dt + \sigma(t, \xi(t))\,dw(t), \qquad (1)$$

whose solution, it is natural for us to expect, is a diffusion process with diffusion coefficient σ²(t, x) and transfer coefficient a(t, x). Let us assume that a(t, x) and σ(t, x) are Borel functions defined for x ∈ R^{(1)} and t ∈ [t₀, T]. Equation (1) is equivalent to the equation

$$\xi(t) = \xi(t_0) + \int_{t_0}^t a(s, \xi(s))\,ds + \int_{t_0}^t \sigma(s, \xi(s))\,dw(s), \qquad (2)$$

and it is solved under the condition that ξ(t₀) is given. For the integrals in (2), and hence the differentials in (1), to be meaningful, we need to introduce the σ-algebras of events 𝔉_t. In what follows, the quantity ξ(t₀) will always be assumed to be independent of the process w(t) − w(t₀), and by the σ-algebra 𝔉_t we shall understand the minimal σ-algebra with respect to which the variables ξ(t₀) and w(s) − w(t₀) for t₀ ≤ s ≤ t are measurable. We shall consider ξ(t) a solution of equation (2) if ξ(t) is 𝔉_t-measurable, if the integrals in (2) exist, and if (2) holds for every t ∈ [t₀, T] with probability 1. We note that property 3 of the preceding section implies that for stochastically equivalent processes f₁(s) and f₂(s), the stochastic integrals

$$\int_{t_0}^t f_1(s)\,dw(s), \qquad \int_{t_0}^t f_2(s)\,dw(s)$$

coincide with probability 1, since f₁(s) = f₂(s) with probability 1 for every s and hence
$$P\Big\{\int_{t_0}^t |f_1(s) - f_2(s)|^2\,ds > 0\Big\} = 0.$$

From this it follows that every process that is stochastically equivalent to a solution of equation (2) is itself a solution of the same equation. Since the right-hand member of equation (2) is stochastically equivalent to the left-hand member and with probability 1 is continuous, it follows that for every solution of (2) there exists a continuous solution stochastically equivalent to it. In what follows, we shall consider only continuous solutions of equation (2).
Theorem 1. Let a(t, x) and σ(t, x), for t ∈ [t₀, T] and x ∈ R^{(1)}, denote two Borel functions satisfying the following conditions for some K:

a. For all x and y ∈ R^{(1)},

$$|a(t, x) - a(t, y)| + |\sigma(t, x) - \sigma(t, y)| \le K\,|x - y|;$$

b. For all x,

$$|a(t, x)|^2 + |\sigma(t, x)|^2 \le K^2\big(1 + |x|^2\big).$$

Then equation (2) has a solution. If ξ₁(t) and ξ₂(t) are two continuous solutions (for fixed ξ(t₀)) of equation (2), then

$$P\Big\{\sup_{t_0 \le t \le T}|\xi_1(t) - \xi_2(t)| = 0\Big\} = 1.$$
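The existence half of Theorem 1 is proved by successive approximations, and the same iteration can be carried out numerically: fix one sample of the Brownian increments and repeatedly apply the map given by the right-hand side of (2). A sketch; the coefficient choices at the bottom are arbitrary Lipschitz illustrations, not taken from the text:

```python
import random

def picard_iterates(a, sigma, x0, t0, T, n, n_iter, seed=2):
    """Successive approximations for equation (2): starting from the constant
    function x0, apply x_{m+1}(t) = x0 + int a(s, x_m(s)) ds + int sigma(s, x_m(s)) dw(s),
    with both integrals evaluated on a grid against one fixed sample of the
    Brownian increments. Returns the sup-distances between successive iterates."""
    rng = random.Random(seed)
    dt = (T - t0) / n
    dw = [rng.gauss(0.0, dt ** 0.5) for _ in range(n)]
    x = [x0] * (n + 1)                     # zeroth approximation
    sups = []
    for _ in range(n_iter):
        new, acc = [x0], x0
        for k in range(n):
            t = t0 + k * dt
            acc += a(t, x[k]) * dt + sigma(t, x[k]) * dw[k]
            new.append(acc)
        sups.append(max(abs(u - v) for u, v in zip(new, x)))
        x = new
    return sups

# illustrative Lipschitz coefficients (condition a): a(t, x) = -x, sigma = 0.5
sups = picard_iterates(lambda t, x: -x, lambda t, x: 0.5, 1.0, 0.0, 1.0, 1000, 10)
```

The distances between iterates decay roughly factorially, mirroring the Gronwall-type estimate used in the uniqueness proof.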
Proof. Let us first prove that a continuous solution is unique. Let ξ₁(t) and ξ₂(t) denote two continuous solutions of equation (2). Let χ_N(t) denote the random variable that is equal to 1 if |ξ₁(s)| ≤ N and |ξ₂(s)| ≤ N for all s ∈ [t₀, t], and equal to 0 otherwise. Since χ_N(t)χ_N(s) = χ_N(t) for s ≤ t, we have

$$\chi_N(t)\big[\xi_1(t) - \xi_2(t)\big] = \chi_N(t)\int_{t_0}^t \chi_N(s)\big[a(s, \xi_1(s)) - a(s, \xi_2(s))\big]\,ds + \chi_N(t)\int_{t_0}^t \chi_N(s)\big[\sigma(s, \xi_1(s)) - \sigma(s, \xi_2(s))\big]\,dw(s).$$

Since

$$\chi_N(s)\big[|a(s, \xi_1(s)) - a(s, \xi_2(s))| + |\sigma(s, \xi_1(s)) - \sigma(s, \xi_2(s))|\big] \le K\chi_N(s)\,|\xi_1(s) - \xi_2(s)| \le 2KN,$$

the squares of the integrals on the right-hand side of the last equation have finite mathematical expectations. Applying the inequality (a + b)² ≤ 2(a² + b²), Cauchy's inequality, and property 2 of the preceding section, we obtain

$$M\chi_N(t)\big[\xi_1(t) - \xi_2(t)\big]^2 \le 2M\Big(\int_{t_0}^t \chi_N(s)\big[a(s, \xi_1(s)) - a(s, \xi_2(s))\big]\,ds\Big)^2 + 2M\Big(\int_{t_0}^t \chi_N(s)\big[\sigma(s, \xi_1(s)) - \sigma(s, \xi_2(s))\big]\,dw(s)\Big)^2$$
$$\le 2(T - t_0)\int_{t_0}^t M\chi_N(s)\big[a(s, \xi_1(s)) - a(s, \xi_2(s))\big]^2\,ds + 2\int_{t_0}^t M\chi_N(s)\big[\sigma(s, \xi_1(s)) - \sigma(s, \xi_2(s))\big]^2\,ds.$$

Taking into consideration condition a, we see that there exists a constant L such that

$$M\chi_N(t)\big[\xi_1(t) - \xi_2(t)\big]^2 \le L\int_{t_0}^t M\chi_N(s)\big[\xi_1(s) - \xi_2(s)\big]^2\,ds.$$

For every t > 0, the limit

$$\lim \frac{1}{t'' - t'}\,M_{t',x}\int_{t'}^{t''} f(s, \xi(s))\,v_\lambda(s, \xi(s))\,ds = f(t, x)\,v_\lambda(t, x)$$

exists. Therefore the limit
$$\lim_{t' \uparrow t,\ t'' \downarrow t} \frac{v_\lambda(t', x) - M_{t',x}\,v_\lambda(t'', \xi(t''))}{t'' - t'} = \lim\Big[\frac{v_\lambda(t', x) - v_\lambda(t'', x)}{t'' - t'} + \frac{v_\lambda(t'', x) - M_{t',x}\,v_\lambda(t'', \xi(t''))}{t'' - t'}\Big]$$

also exists. But, as shown in the proof of Theorem 1,

$$\lim \frac{M_{t',x}\,v_\lambda(t'', \xi(t'')) - v_\lambda(t'', x)}{t'' - t'} = \sum_{i=1}^{m} a^i(t, x)\,\frac{\partial v_\lambda(t, x)}{\partial x^i} + \frac{1}{2}\sum_{i,j,k=1}^{m} b_k^i(t, x)\,b_k^j(t, x)\,\frac{\partial^2 v_\lambda(t, x)}{\partial x^i\,\partial x^j}.$$

Consequently, the limit

$$\lim \frac{v_\lambda(t'', x) - v_\lambda(t', x)}{t'' - t'} = \frac{\partial v_\lambda(t, x)}{\partial t}$$

exists, and equation (3) is satisfied. This completes the proof of the theorem.

Let us look in greater detail at the case of a one-dimensional process of Brownian motion. If the function f is independent of t, then in this case the distribution of the random variable I_t can be found from equations of a simpler form and under less rigid restrictions. For a process of Brownian motion, the coefficients in equation (2) of Section 3 are a = 0, σ = 1. It is easy to see that the solution of equation (6) of Section 3 can then be written in the explicit form ξ_{t,x}(s) = x + w(s) − w(t), where w(t) is a process of Brownian motion.
It follows from Theorem 2 that the function

$$v_\lambda(t, x) = M\exp\Big\{\lambda\int_t^T f\big(x + w(s) - w(t)\big)\,ds\Big\}$$

satisfies the equation

$$\frac{\partial v_\lambda(t, x)}{\partial t} + \frac{1}{2}\frac{\partial^2 v_\lambda(t, x)}{\partial x^2} + \lambda f(x)\,v_\lambda(t, x) = 0$$

and the condition lim_{t↑T} v_λ(t, x) = 1. We note that v_λ(t, x) = u_λ(T − t, x), where

$$u_\lambda(t, x) = M\exp\Big(\lambda\int_0^t f\big(x + w(s)\big)\,ds\Big).$$

The function u_λ(t, x) satisfies the equation

$$\frac{\partial u_\lambda(t, x)}{\partial t} = \frac{1}{2}\frac{\partial^2 u_\lambda(t, x)}{\partial x^2} + \lambda f(x)\,u_\lambda(t, x) \qquad (4)$$

and the initial condition lim_{t↓0} u_λ(t, x) = 1.
5. THE METHOD OF DIFFERENTIAL EQUATIONS
Equation (4) can be solved by using the Laplace transform with respect to t. We define

$$z_{\mu,\lambda}(x) = \int_0^\infty e^{-\mu t}\,u_\lambda(t, x)\,dt.$$

Here and below, λ denotes a purely imaginary number and μ a real nonnegative number. Under these conditions z_{μ,λ}(x) is meaningful. Multiplying (4) by e^{−μt} and integrating with respect to t from 0 to ∞, we obtain

$$z''_{\mu,\lambda}(x) = 2\big[\mu - \lambda f(x)\big]\,z_{\mu,\lambda}(x) - 2. \qquad (5)$$
Equation (5) is valid when the function f is twice continuously differentiable and bounded together with its derivatives. Now let f(x) denote a piecewise-continuous bounded function. Let us choose a sequence of functions f_n(x), each of which satisfies equation (5) as applied to z^{(n)}_{μ,λ}(x), where

$$z^{(n)}_{\mu,\lambda}(x) = \int_0^\infty e^{-\mu t}\,M\exp\Big(\lambda\int_0^t f_n\big(x + w(s)\big)\,ds\Big)\,dt.$$

In addition, let us suppose that the f_n(x) are uniformly bounded and that they converge to f(x) at each point x. One can easily see that the z^{(n)}_{μ,λ}(x) are bounded by the number 1/μ and that for every μ, λ, and x the sequence {z^{(n)}_{μ,λ}(x)} converges to

$$z_{\mu,\lambda}(x) = \int_0^\infty e^{-\mu t}\,M\exp\Big(\lambda\int_0^t f\big(x + w(s)\big)\,ds\Big)\,dt. \qquad (6)$$

It follows from (5) that the ∂²z^{(n)}_{μ,λ}(x)/∂x² are bounded by the number 4 + 2|λ|C/μ, where C is a constant bounding the f_n, and that ∂²z^{(n)}_{μ,λ}(x)/∂x² converges to 2[μ − λf(x)]z_{μ,λ}(x) − 2 as n → ∞. Therefore the derivative ∂²z_{μ,λ}(x)/∂x² exists, and the sequence {∂²z^{(n)}_{μ,λ}(x)/∂x²} converges to ∂²z_{μ,λ}(x)/∂x² (at every point of continuity of the function f(x)). Consequently, at points of continuity of the function f(x), the function z_{μ,λ}(x) satisfies equation (5). Thus we have:
Theorem 3. If w(t) is a process of Brownian motion and z_{μ,λ}(x), for μ > 0 and Re λ = 0, is defined by formula (6), where f(x) is a bounded piecewise-continuous function, then z_{μ,λ}(x) is continuously differentiable, has a second derivative at all points of continuity of f(x), and satisfies equation (5).

Example.
Let us find the distribution of the quantity
It =
t
sgn w(s)ds 0
DIFFUSION PROCESSES
418
In this case, equation (5) takes the form

    z″_{λ,μ}(x) + 2(λ sgn x − μ) z_{λ,μ}(x) = −2 .

Solving this equation separately for the cases x > 0 and x < 0, we obtain

    z_{λ,μ}(x) = 1/(μ − λ) + C₁ exp(√(2μ − 2λ) x) + C₂ exp(−√(2μ − 2λ) x)   for x > 0 ,
    z_{λ,μ}(x) = 1/(μ + λ) + C₃ exp(√(2μ + 2λ) x) + C₄ exp(−√(2μ + 2λ) x)   for x < 0 .

From the assumption that z_{λ,μ}(x) is bounded as x → ±∞, we obtain C₁ = C₄ = 0. Furthermore, using the continuity of the functions z_{λ,μ}(x) and ∂z_{λ,μ}(x)/∂x at the point x = 0, we may write

    1/(μ − λ) + C₂ = 1/(μ + λ) + C₃ ,   −C₂ √(2μ − 2λ) = C₃ √(2μ + 2λ) .

From this we get

    C₃ = (2λ/(μ² − λ²)) · √(μ − λ)/(√(μ − λ) + √(μ + λ)) .
To determine the distribution of the random variable η_t, it suffices to know z_{λ,μ}(0). Substituting the value found for C₃, we obtain

    z_{λ,μ}(0) = 1/√(μ² − λ²) = (1/μ)(1 − λ²/μ²)^{−1/2} = (1/μ) Σ_{n=0}^∞ ((2n − 1)!!/(2n)!!) (λ/μ)^{2n} .

Since

    ∫₀^∞ tⁿ e^{−μt} dt = n!/μ^{n+1}

and

    ∫_{−π/2}^{π/2} sin^k φ dφ = 0   for k odd ,
    ∫_{−π/2}^{π/2} sin^k φ dφ = ((2n − 1)!!/(2n)!!) π   for k = 2n ,

we have

    M exp( λ ∫₀^t sgn w(s) ds ) = Σ_{n=0}^∞ ((2n − 1)!!/(2n)!!) λ^{2n} t^{2n}/(2n)!
        = Σ_{k=0}^∞ (λ^k t^k/k!) (1/π) ∫_{−π/2}^{π/2} sin^k φ dφ .
Consequently,

    M exp( λ ∫₀^t sgn w(s) ds ) = (1/π) ∫_{−π/2}^{π/2} e^{λt sin φ} dφ = ∫ e^{λtu} dF(u) ,

where

    F(x) = 0                          for x < −1 ,
    F(x) = (1/π)(arcsin x + π/2)      for |x| ≤ 1 ,
    F(x) = 1                          for x > 1 .
Thus

    P{ (1/t) ∫₀^t sgn w(s) ds < x } = (1/π)( arcsin x + π/2 ) .

REMARK 1. Suppose that the function φ₊(x) is defined by

    φ₊(x) = (1 + sgn x)/2 .

Then the quantity

    τ_t = ∫₀^t φ₊(w(s)) ds

represents the time passed on the positive half-axis by the process w(s) during the time t. Using the result obtained, we can find the distribution of τ_t. Specifically, since

    τ_t = t/2 + (1/2) ∫₀^t sgn w(s) ds ,

we have

    P{ τ_t < xt } = P{ (1/t) ∫₀^t sgn w(s) ds < 2x − 1 } = (1/π)( arcsin(2x − 1) + π/2 )   for 0 ≤ x ≤ 1 .
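The distribution of τ_t just obtained is easy to confirm by simulation. The sketch below (an illustration, not from the text) approximates w by a scaled random walk, estimates the empirical distribution of τ_t/t, and compares it with (2/π) arcsin √x, the equivalent "arc sine" form of the same distribution; the sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

n_steps, n_paths = 2_000, 20_000
s = np.zeros(n_paths)
pos = np.zeros(n_paths)
for _ in range(n_steps):
    s += rng.normal(size=n_paths)
    pos += (s > 0)                     # time steps spent on the positive half-axis
frac = pos / n_steps                   # tau_t / t for each path

xs = np.array([0.1, 0.5, 0.9])
emp = np.array([(frac < x).mean() for x in xs])
law = (2.0 / np.pi) * np.arcsin(np.sqrt(xs))
print(emp)
print(law)   # [0.2048..., 0.5, 0.7951...]
```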
Let us use some elementary transformations to obtain the distribution of the variable τ_t in a simpler (and more commonly used) form. If (1/2) arccos(1 − 2x) = z, then on the one hand,

    cos 2z = 1 − 2x ,   x = (1 − cos 2z)/2 = sin² z ,   z = arcsin √x ,

and on the other hand,

    arcsin(2x − 1) + π/2 = 2z = 2 arcsin √x .

Thus we have obtained what is known as the "arc sine law."
Theorem 4. If τ_t is the time a process of Brownian motion has spent, up to the instant t, on the half-line x > 0, then

    P{ τ_t < xt } = (2/π) arcsin √x   for 0 ≤ x ≤ 1 .
We set ξ*(t) = ξ(t) for t < τ and ξ*(t) = g_i(t) for t ≥ τ, where ξ(τ) = g_i(τ). By construction, this will be a process with absorption on the boundary of the region G. Let us show that ξ*(t) is a solution to equation (2). If χ(t) = 1 for t < τ but χ(t) = 0 for t ≥ τ, and if ξ(τ) = g_i(τ), then
    ξ*(t) − ξ₀ − ∫₀^t a*(s, ξ*(s)) ds − ∫₀^t σ*(s, ξ*(s)) dw(s)
        = [ ξ*(t) − ξ₀ − ∫₀^t a(s, ξ(s)) ds − ∫₀^t σ(s, ξ(s)) dw(s) ] χ(t)
        + [ ξ*(t) − ξ₀ − ∫₀^τ a(s, ξ(s)) ds − ∫_τ^t g′_i(s) ds − ∫₀^τ σ(s, ξ(s)) dw(s) ] (1 − χ(t)) = 0 .

This is true because if χ(t) = 1, then for s ≤ t,

    a*(s, ξ*(s)) = a(s, ξ(s)) ,   σ*(s, ξ*(s)) = σ(s, ξ(s)) ,   ξ*(s) = ξ(s) ,

and if χ(t) = 0,

    ∫₀^t σ*(s, ξ*(s)) dw(s) = ∫₀^τ σ(s, ξ(s)) dw(s) ,
    ∫₀^t a*(s, ξ*(s)) ds = ∫₀^τ a(s, ξ(s)) ds + ∫_τ^t g′_i(s) ds ,

and

    ξ(τ) − ξ₀ − ∫₀^τ a(s, ξ(s)) ds − ∫₀^τ σ(s, ξ(s)) dw(s) = 0 ,
    ξ*(t) − ξ*(τ) = g_i(t) − g_i(τ) = ∫_τ^t g′_i(s) ds .
Let us now prove that a solution of equation (2) that is a process with absorption on the boundary is unique. (As usual, by uniqueness we mean uniqueness up to stochastic equivalence.) Let ξ₁*(t) and ξ₂*(t) denote two such solutions. We note that a sufficient condition for them to coincide is that they coincide inside the region G, since such solutions reach a given point of the boundary at the same instant and hence, being processes with absorption on the boundary of G, coincide afterward as well. Let χ(t) denote the function that is equal to 1 if ξᵢ*(s) ∈ (g₁(s), g₂(s)) for s ∈ [0, t] and i = 1, 2, and equal to 0 otherwise. Then

    [ξ₁*(t) − ξ₂*(t)] χ(t) = χ(t) ∫₀^t χ(s)[a*(s, ξ₁*(s)) − a*(s, ξ₂*(s))] ds
        + χ(t) ∫₀^t χ(s)[σ*(s, ξ₁*(s)) − σ*(s, ξ₂*(s))] dw(s) ,

from which we get

    M(ξ₁*(t) − ξ₂*(t))² χ(t) ≤ 2M[ ∫₀^t χ(s)(a*(s, ξ₁*(s)) − a*(s, ξ₂*(s))) ds ]²
        + 2M[ ∫₀^t χ(s)(σ*(s, ξ₁*(s)) − σ*(s, ξ₂*(s))) dw(s) ]²
        ≤ L ∫₀^t M(ξ₁*(s) − ξ₂*(s))² χ(s) ds ,

where L is some constant. (Here we used the fact that a*(s, x) and σ*(s, x) satisfy a Lipschitz condition for x ∈ (g₁(s), g₂(s)).) Just as in the proof of the uniqueness in Theorem 1 of Section 3, it now follows from the last relation that M(ξ₁*(t) − ξ₂*(t))² χ(t) = 0; that is, keeping in mind the continuity of the processes ξᵢ*(t), we conclude that P{ξ₁*(t) = ξ₂*(t), t ≤ τ} = 1. This completes the proof of the theorem.
REMARK 2. A solution of equation (2) that is a process with absorption on the boundary of the region G is a Markov process whose transition probabilities P*(t, x, s, dy) coincide with the distribution of the process ξ*_{t,x}(s), with absorption on the boundary of G, that is a solution of the equation

    ξ*_{t,x}(s) = x + ∫_t^s a*(u, ξ*_{t,x}(u)) du + ∫_t^s σ*(u, ξ*_{t,x}(u)) dw(u)   (s > t) .   (4)
The process ξ*_{t,x}(s) can be obtained from the process ξ_{t,x}(s), which is a solution of equation (6) of Section 3, if in that equation we substitute those a(t, x) and σ(t, x) that were used in the proof
6.
ONE-DIMENSIONAL DIFFUSION PROCESSES WITH ABSORPTION
423
of Theorem 1. Let τ_{t,x} denote the smallest root of the equation (ξ_{t,x}(s) − g₁(s))(ξ_{t,x}(s) − g₂(s)) = 0 on the interval s ∈ [t, T], and set τ_{t,x} = T when this equation has no root. Then ξ*_{t,x}(s) = ξ_{t,x}(s) for s ≤ τ_{t,x}, but ξ*_{t,x}(s) = g_i(s) for s ≥ τ_{t,x} if ξ_{t,x}(τ_{t,x}) = g_i(τ_{t,x}). The first assertion is proved in a manner analogous to that of the proof of Theorem 2, Section 3, and the second assertion follows from Theorem 1.
Let us look at certain transformations that enable us to simplify equation (2) in the region G. In addition to the process ξ*(t), let us consider the process η*(t) = f(t, ξ*(t)), where f(t, x) is, for each t, an increasing, twice continuously differentiable function of x for x ∈ [g₁(t), g₂(t)], differentiable with respect to t. The process η*(t) is also a Markov process, since ξ*(t) is uniquely determined from η*(t). This is a process with absorption on the boundary of the region G̃ bounded by the curves

    g̃₁(t) = f(t, g₁(t)) ,   g̃₂(t) = f(t, g₂(t)) .
Using property 8 of Section 2, we see that η*(t) satisfies the equation

    dη*(t) = [ f′_t(t, ξ*(t)) + f′_x(t, ξ*(t)) a*(t, ξ*(t)) + (1/2) f″_xx(t, ξ*(t)) σ*(t, ξ*(t))² ] dt
        + f′_x(t, ξ*(t)) σ*(t, ξ*(t)) dw(t) .

Thus η*(t) is a solution of a stochastic equation of the form (2):

    dη*(t) = ã*(t, η*(t)) dt + σ̃*(t, η*(t)) dw(t) ,

where

    ã*(t, y) = f′_t(t, φ(t, y)) + f′_x(t, φ(t, y)) a*(t, φ(t, y)) + (1/2) f″_xx(t, φ(t, y)) σ*(t, φ(t, y))² ,
    σ̃*(t, y) = f′_x(t, φ(t, y)) σ*(t, φ(t, y)) ,   (5)
and the function φ(t, y) is the inverse of f(t, x) with respect to x; that is, f(t, φ(t, y)) = y and φ(t, f(t, x)) = x. If we set f′_x(t, x) = 1/σ*(t, x) for x ∈ (g₁(t), g₂(t)), then σ̃*(t, y) = 1 for y ∈ (g̃₁(t), g̃₂(t)). Suppose that

    f(t, x) = ∫_{g₁(t)}^x du/σ*(t, u) .

Then

    g̃₁(t) = 0 ,   g̃₂(t) = ∫_{g₁(t)}^{g₂(t)} du/σ*(t, u) .
Let us introduce the process η̃*(t) = η*(t)/C(t), where C(t) = g̃₂(t). This is also a Markov process, with absorption on the boundary of the rectangular region A: {0 ≤ t ≤ T, 0 ≤ x ≤ 1}, and it satisfies the equation

    dη̃*(t) = [ −(C′(t)/C(t)) η̃*(t) + ã*(t, C(t) η̃*(t))/C(t) ] dt + ( σ̃*(t, C(t) η̃*(t))/C(t) ) dw(t) .   (6)

Now let us define the process ζ(t) = ∫₀^t dw(s)/C(s). This process is, with probability 1, a continuous process with independent increments and satisfies the relations

    Mζ(t) = 0 ,   Dζ(t) = ∫₀^t ds/C²(s) = χ(t) .

Therefore ζ(t) = w₁(χ(t)), where w₁(t) is a process of Brownian motion. Setting χ(t) = s (so that t = χ⁻¹(s)), ξ̃*(s) = η̃*(χ⁻¹(s)), and

    ã₁(s, x) = C²(χ⁻¹(s)) [ −(C′(χ⁻¹(s))/C(χ⁻¹(s))) x + ã*(χ⁻¹(s), C(χ⁻¹(s)) x)/C(χ⁻¹(s)) ] ,

and letting χ_{(0,1)}(x) denote the characteristic function of the interval (0, 1), we obtain from (6) the following equation for ξ̃*(t):

    dξ̃*(t) = ã₁(t, ξ̃*(t)) dt + χ_{(0,1)}(ξ̃*(t)) dw₁(t) .   (7)
Thus an arbitrary problem associated with finding the distribution of some characteristic of the process ξ*(t) that is a solution of equation (2) can be reduced to finding the distribution of some other characteristic of the process ξ̃*(t) that is a solution of equation (7). The latter is somewhat simpler than equation (2). The transition probabilities of the process ξ*(t) can easily be obtained in terms of the transition probabilities of the process ξ̃*(t).
Consider the question of determining the transition probabilities for the process ξ̃*(t). From the remarks made above we note that the transition probabilities P̃*(t, x, s, dy) of the process ξ̃*(t) coincide with the distribution of the process ξ̃*_{t,x}(s) constructed as follows: Let ξ_{t,x}(s) denote a solution of the equation

    ξ_{t,x}(s) = x + ∫_t^s ã₁(u, ξ_{t,x}(u)) du + w₁(s) − w₁(t) ,   0 < x < 1 .
av8, Sv8 < T}
sup
St,x(s) > 1, 31), < T} - P {zt,x < 3vs}'
t8ug<sP >_
8vg<ss
sup I St,x(s') '111 p{zL', < T} 8-.0
Therefore, taking the limit as δ → 0 and remembering that ξ_{t,x}(s) is continuous with probability 1, we obtain
Pj
l rL'x 1, zt;z < T}
J
exp(_)duP
V-2-2r
o h +c h
sup
t,x(s) > 1, Zt,x < T
T}
tx' (//227r o h +c h_ exp \
(-u2
2
)du p {zt;x < T}
by taking the limit first as s --> 0 and then ash assertion of the lemma.
0, we obtain the
Let us now prove:
Theorem 2. The random variable zt,x is, with probability continuous as a function of t and x. Proof.
1,
As already noted, to prove the assertion it will be
sufficient to show that the random variables z,1', and zt°x are continuous with probability 1. Let us show, for example, that z-,11) is continuous. We note that for t, < t2 and x x2 e (0, 1),
    |ξ_{t₁,x₁}(s) − ξ_{t₂,x₂}(s)| ≤ |x₁ − x₂| + | ∫_{t₁}^{t₂} ã₁(u, ξ_{t₁,x₁}(u)) du | + |w(t₂) − w(t₁)|
        + ∫_{t₂}^s |ã₁(u, ξ_{t₁,x₁}(u)) − ã₁(u, ξ_{t₂,x₂}(u))| du
        ≤ |x₁ − x₂| + C|t₁ − t₂| + |w(t₂) − w(t₁)| + K ∫_{t₂}^s |ξ_{t₁,x₁}(u) − ξ_{t₂,x₂}(u)| du .

Therefore

    |ξ_{t₁,x₁}(s) − ξ_{t₂,x₂}(s)| ≤ ( |x₁ − x₂| + C|t₁ − t₂| + |w(t₂) − w(t₁)| ) exp{K(s − t₂)} ,

so that for some L,

    |ξ_{t₁,x₁}(s) − ξ_{t₂,x₂}(s)| ≤ L( |x₁ − x₂| + |t₁ − t₂| + |w(t₁) − w(t₂)| ) .
The event τ⁽¹⁾_{t₁,x₁} > τ⁽¹⁾_{t₂,x₂} + ε means that

    ξ_{t₁,x₁}(s) < 1   for s ∈ (τ⁽¹⁾_{t₂,x₂}, τ⁽¹⁾_{t₂,x₂} + ε) .   (9)

Since

    |ξ_{t₁,x₁}(s) − ξ_{t₂,x₂}(s)| ≤ L( |x₁ − x₂| + |t₁ − t₂| + |w(t₁) − w(t₂)| ) ,

it follows that

    ξ_{t₂,x₂}(s) < 1 + L( |x₁ − x₂| + |t₁ − t₂| + |w(t₁) − w(t₂)| )   for s ∈ (τ⁽¹⁾_{t₂,x₂}, τ⁽¹⁾_{t₂,x₂} + ε) .
We define

    η_ε = sup_{τ⁽¹⁾_{t₂,x₂} < s < τ⁽¹⁾_{t₂,x₂}+ε} ξ_{t₂,x₂}(s) − 1 .

If L(|x₁ − x₂| + |t₁ − t₂| + |w(t₁) − w(t₂)|) < η_ε, relation (9) becomes impossible. Thus τ⁽¹⁾_{t₁,x₁} ≤ τ⁽¹⁾_{t₂,x₂} + ε if

    L( |x₁ − x₂| + |t₁ − t₂| + |w(t₁) − w(t₂)| ) < η_ε .

(This inequality holds for sufficiently small |x₁ − x₂| and |t₁ − t₂| by virtue of the continuity of w(t).) Let us also define
sa = 1 -
sup S_ 0, and that u;(a) = 1. If we define u-(x)
    C = lim_{x→β} u_i(x) ,

we easily see that u(x) = lim u_i(x) is a solution of the equation

    a(x) (d/dx) u(x) + (1/2) (d²/dx²) u(x) = 0

with boundary conditions u(α) = 1 and u(β) = C. To determine C, we note that u(x) = P_x(α, β) = 1 − P_x(β, α), so that u(β) = 0, since P_β(β, α) = 1. Solving this equation, we find

    P_x(α, β) = ∫_x^β B(t) dt / ∫_α^β B(t) dt ,   (18)

where

    B(t) = exp{ −∫₀^t 2a(u) du } .
REMARK 2. We have seen that under certain conditions on the smoothness of a(x) and σ(x), we can, by a transformation of the unknown function, reduce a stochastic equation to the case in which σ(x) = 1. By means of such a transformation, we can easily show that formula (18) remains valid for arbitrary σ(x) if we set

    B(t) = exp{ −∫₀^t (2a(u)/σ²(u)) du } .   (19)
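Formula (18), with the density (19), can be checked by direct simulation. The sketch below (illustrative, not from the text) takes the arbitrary drift a(x) = −x with σ = 1, so that B(t) = exp(t²), and compares the simulated probability of leaving the interval through α before β with the ratio of integrals in (18); the interval, starting point, and step size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)

a = lambda x: -x                     # an arbitrary smooth drift; sigma = 1
alpha, beta, x0 = -1.0, 1.0, 0.3

# Formula (18): P_x{reach alpha before beta}, with B(t) = exp(-int_0^t 2a(u) du) = exp(t^2)
ts = np.linspace(alpha, beta, 4001)
B = np.exp(ts**2)
i0 = int(np.searchsorted(ts, x0))
p_formula = B[i0:].sum() / B.sum()   # Riemann sums; the common step cancels

# Simulate dX = a(X) dt + dw until X leaves (alpha, beta)
dt, n_paths = 1e-3, 20_000
x = np.full(n_paths, x0)
alive = np.ones(n_paths, bool)
hit_alpha = np.zeros(n_paths, bool)
for _ in range(500_000):
    if not alive.any():
        break
    xi = x[alive]
    x[alive] = xi + a(xi) * dt + rng.normal(0.0, np.sqrt(dt), xi.size)
    low, high = alive & (x <= alpha), alive & (x >= beta)
    hit_alpha |= low
    alive &= ~(low | high)

p_sim = hit_alpha.mean()
print(p_formula, p_sim)
```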
Sometimes it is useful to know the mathematical expectation
of the instant at which the process goes beyond the boundary of the interval [α, β]. To determine this quantity, let us differentiate equation (16) with respect to λ and then set λ = 0. We see that the function

    φ(x) = M ∫₀^{τ_x} f(ξ_x(s)) ds

satisfies the equation

    a(x) (dφ/dx) + (1/2) (d²φ/dx²) + f = 0   (20)

and the boundary conditions φ(α) = φ(β) = 0. Solving equation (20), we obtain
    φ(x) = ∫_α^x B(t) [ C − ∫_α^t (2f(u)/B(u)) du ] dt ,

where

    C = ∫_α^β B(t) ( ∫_α^t (2f(u)/B(u)) du ) dt / ∫_α^β B(t) dt .

By means of a limiting operation wherein f(x) → 1 for x ∈ (α, β) while remaining bounded, we obtain

    Mτ_x = ∫_α^x B(t) [ C − ∫_α^t (2/B(u)) du ] dt ,   C = ∫_α^β B(t) ( ∫_α^t (2/B(u)) du ) dt / ∫_α^β B(t) dt .   (21)
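For a(x) = 0 and σ = 1 (so B ≡ 1), formula (21) reduces to the classical value Mτ_x = (x − α)(β − x). The sketch below (an illustration with arbitrary parameters, not part of the text) checks this by simulating Brownian motion until it leaves [α, β].

```python
import numpy as np

rng = np.random.default_rng(3)

alpha, beta, x0 = -1.0, 2.0, 0.5
expected = (x0 - alpha) * (beta - x0)      # (x - alpha)(beta - x) = 2.25

dt, n_paths = 1e-3, 10_000
x = np.full(n_paths, x0)
t_exit = np.zeros(n_paths)
alive = np.ones(n_paths, bool)
t = 0.0
for _ in range(1_000_000):
    if not alive.any():
        break
    t += dt
    x[alive] += rng.normal(0.0, np.sqrt(dt), alive.sum())
    exited = alive & ((x <= alpha) | (x >= beta))
    t_exit[exited] = t
    alive &= ~exited

print(expected, t_exit.mean())             # both near 2.25
```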
IX. LIMIT THEOREMS FOR RANDOM PROCESSES
Throughout the remainder of this book we shall frequently encounter processes that are obtained from simpler random processes by a limiting operation. These limiting operations yield either sample
functions of the original process (for example, in the solution of stochastic differential equations in Chapter VIII, Section 3, or in the expansion of a Gaussian process in a series of eigenfunctions of the kernel corresponding to the correlation function in Chapter V, Section 2) or simply finite-dimensional distributions of the process (for example, in the construction of stationary Gaussian processes by taking the limit of trigonometric sums, Chapter I, Section 4). In the study of random processes considerable attention is given to methods of finding the distributions of different functionals of a random process, for example

    ∫_{t₁}^{t₂} f(ξ(t)) dt ,   sup_{t₁≤t≤t₂} ξ(t) ,   inf_{t₁≤t≤t₂} ξ(t) .

...there exist ε > 0 and a continuous function f such that λ(A₁ ∪ A₂) ≥ l[f] − ε and f ≤ χ_{A₁} + χ_{A₂}.
We let φ_{A_i} denote the function

    φ_{A_i}(x) = 1                       if x ∈ A_i ,
    φ_{A_i}(x) = 1 − (1/ε) ρ(x, A_i)     if 0 < ρ(x, A_i) ≤ ε ,
    φ_{A_i}(x) = 0                       if ρ(x, A_i) > ε .

Here ρ(x, A) = inf_{y∈A} ρ(x, y). If 2ε < ρ(A₁, A₂), then φ_{A₁} + φ_{A₂} ≤ 1, φ_{A₁} ≥ χ_{A₁}, and φ_{A₂} ≥ χ_{A₂}. Therefore

    λ(A₁ ∪ A₂) + ε ≥ l[φ_{A₁} f] + l[φ_{A₂} f] ≥ λ(A₁) + λ(A₂) − 2ε .
Relation (2) follows from this inequality by virtue of property (1) and the arbitrariness of ε > 0. Let B₀ denote the collection of sets A for which λ(A′) = 0, where A′ is the boundary of the set A. Obviously B₀ is an algebra of sets. Let us show that λ(A) is an additive set function on B₀. Let A₁ and A₂ denote two disjoint members of B₀. Let Γ denote the union of the boundaries of the sets A₁ and A₂. Since λ(Γ) = 0, it follows that for every ε > 0 there exists a continuous function g satisfying the condition g ≥ χ_Γ such that l[g] < ε. Define

    A_i^{(ε)} = [A_i] ∩ { x: g(x) ≤ 1/2 } ,

where [A] denotes the closure of A. The closed sets A₁^{(ε)} and A₂^{(ε)}
1. WEAK CONVERGENCE OF DISTRIBUTIONS IN A METRIC SPACE
are disjoint and hence are at a positive distance from each other, since X is compact. Define F_ε = {x: g(x) ≥ 1/2}. Then

    λ(F_ε) ≤ l[2g] < 2ε ,   λ(A_i) ≤ λ(A_i^{(ε)}) + λ(F_ε) ≤ λ(A_i^{(ε)}) + 2ε .

Using the monotonicity of λ and property (2), we obtain

    λ(A₁) + λ(A₂) ≤ λ(A₁^{(ε)}) + λ(A₂^{(ε)}) + 4ε = λ(A₁^{(ε)} ∪ A₂^{(ε)}) + 4ε ≤ λ(A₁ ∪ A₂) + 4ε .

Taking the limit as ε → 0 and keeping (1) in mind, we see that λ is additive on B₀.
Thus the conditions of Theorem 2, Section 7, Chapter II are satisfied for the function λ on B₀. Therefore λ can be extended as a measure μ defined on σ(B₀). Let us show that σ(B₀) coincides with 𝔅. Note that for every x there exist only countably many radii r such that λ(S_r′(x)) > 0, where S_r(x) is the sphere of radius r with center at x and S_r′(x) its boundary. Therefore σ(B₀) contains all spheres and hence all Borel subsets of X. Let us show that for all f ∈ C₁,

    l[f] = ∫ f(x) μ(dx) .   (2)

Suppose that f is nonnegative. Let us choose values 0 = c₀ < c₁ < ... < c_N with c_N > ||f||, in such a way that λ{x: f(x) = c_k} = 0. Then if A_k = {x: c_{k−1} ≤ f(x) < c_k}, it follows that A_k ∈ B₀ and hence λ(A_k) = μ(A_k). Let φ_k ≥ χ_{A_k} with l[φ_k] ≤ λ(A_k) + ε for k = 1, 2, ..., N. Since f ≤ Σ_{k=1}^N c_k φ_k, we have

    l[f] ≤ Nε + Σ_{k=1}^N c_k λ(A_k) = Nε + Σ_{k=1}^N c_k μ(A_k) ≤ Nε + max_k (c_k − c_{k−1}) μ(X) + ∫ f(x) μ(dx) .
It follows from this inequality that

    l[f] ≤ ∫ f(x) μ(dx)   (3)

for an arbitrary nonnegative function f ∈ C₁. Furthermore,

    l[ 1 − f/||f|| ] ≤ ∫ ( 1 − f(x)/||f|| ) μ(dx) .

Since l[1] = ∫ μ(dx) = μ(X), we have

    −l[f] ≤ −∫ f(x) μ(dx) .   (4)

Comparison of (4) with (3) yields (2) for nonnegative functions and hence for all f(x) ∈ C₁. Thus for arbitrary f(x) ∈ C₁,

    lim_k ∫ f(x) μ_{n_k}(dx) = ∫ f(x) μ(dx) .

This completes the proof of the lemma. We can now prove the theorem.
Proof of the Sufficiency. Let us choose a sequence {ε_m} that approaches 0 as m → ∞ and a sequence {K^{(m)}} of compact sets such that K^{(m)} ⊂ K^{(m+1)} and sup_n μ_n(X\K^{(m)}) < ε_m. We define

    μ_n^{(m)}(A) = μ_n(A ∩ K^{(m)}) .
Let us choose a sequence {n_k^{(1)}} such that the sequence of the measures μ^{(1)}_{n_k^{(1)}} converges weakly to some measure μ^{(1)}. Let us define sequences {n_k^{(j)}} such that {n_k^{(j)}} is a subsequence of {n_k^{(j−1)}} and the sequence μ^{(j)}_{n_k^{(j)}} converges weakly to some measure μ^{(j)}. Since μ^{(j)} and μ^{(j−1)} coincide on K^{(j−1)}, it follows that var |μ^{(j)} − μ^{(j+p)}| → 0 as j → ∞.

For every ε > 0 and δ > 0, there exists a compact set K such that sup_n μ_n(X\K^δ) < δ, where K^δ denotes the set of points x whose distance from K does not exceed δ. That b implies b′ is obvious. Conversely, let K_i^{(1)} denote a compact set such that sup_n μ_n(X\K_i^{(1)}) < ε/2^i. Then the set ∩_i K_i^{(1)}
is a compact set for which condition b is satisfied. If b′ were not valid, then there would exist an ε > 0 and a δ > 0 such that sup_n μ_n(X\K^δ) ≥ ε for every compact set K.
Let K^{(0)} denote a compact set such that μ₁(X\K^{(0)}) < ε. (The existence of such a compact set follows from Theorem 6, Section 7, Chapter II.) Since sup_n μ_n(X\(K^{(0)})^δ) ≥ ε, there exists a number n₁ such that μ_{n₁}(X\(K^{(0)})^δ) > ε; hence there exists a compact set K^{(1)} ⊂ X\(K^{(0)})^δ such that μ_{n₁}(K^{(1)}) > ε (again on the basis of Theorem 6, Section 7, Chapter II). Since sup_n μ_n(X\(K^{(0)})^δ\(K^{(1)})^δ) ≥ ε, there exist a number n₂ and a compact set K^{(2)} ⊂ X\(K^{(0)})^δ\(K^{(1)})^δ such that μ_{n₂}(K^{(2)}) > ε. Continuing this process, we choose a sequence of numbers n_j and compact sets K^{(j)} such that μ_{n_j}(K^{(j)}) > ε and

    K^{(j)} ⊂ X \ ∪_{i=0}^{j−1} (K^{(i)})^δ .

Let χ_j(x) denote a continuous nonnegative function bounded by unity, vanishing on X\(K^{(j)})^{δ/2} and equal to 1 on K^{(j)}. Since the distance between any two compact sets of the sequence K^{(j)} is at least
δ, the functions χ_i(x), for distinct values of i, cannot be nonzero simultaneously. Let us choose from the sequence {μ_{n_j}} a weakly convergent subsequence {μ̃_k} and suppose that this subsequence converges to μ̃. Since the measure μ̃ is finite and Σ_i χ_i(x) is bounded, the sum Σ_{i=1}^∞ ∫ χ_i(x) μ̃(dx) is finite. On the other hand,

    ∫ χ_p(x) μ̃_k(dx) ≥ μ̃_k(K^{(p)}) > ε   for μ̃_k = μ_{n_p} ,

as soon as k ≥ p; hence for all p,

    Σ_{i=p}^∞ ∫ χ_i(x) μ̃(dx) = lim_{k→∞} Σ_{i=p}^∞ ∫ χ_i(x) μ̃_k(dx) ≥ ε .

This contradiction proves the necessity of condition b. This completes the proof of the theorem.
REMARK 1. The completeness of the space X was used only in proving the necessity of the conditions of the theorem. The conditions of the theorem are sufficient for weak compactness of a sequence of measures in an arbitrary metric space. These conditions are also
necessary if the space X can be represented as a Borel subset of
some complete separable metric space. With the aid of Theorem 1 we can establish necessary and sufficient conditions for weak convergence of a sequence of measures in the case of a complete space X. Furthermore, these conditions are sufficient for every metric space.
Theorem 2. For a sequence of measures μ_n to converge weakly to some measure μ, it is necessary and sufficient that the sequence {μ_n} be weakly compact and that μ_n(A) → μ(A) for all A belonging to some algebra B₀ such that σ(B₀) = 𝔅.
Proof of the Necessity. Obviously, every convergent sequence is weakly compact. Let us take an arbitrary set A ∈ 𝔅. Let A^{(0)} denote its interior (that is, the set of all interior points of A), and let [A] denote its closure. If {μ_n} converges weakly to μ, then by choosing a continuous function f(x) such that f(x) ≥ 1 for x ∈ [A] and μ([A]) ≥ ∫ f(x) μ(dx) − ε, we obtain

    μ([A]) ≥ ∫ f(x) μ(dx) − ε = lim_n ∫ f(x) μ_n(dx) − ε ≥ lim sup_n μ_n(A) − ε .

This means that lim sup_n μ_n(A) ≤ μ([A]). Similarly, lim sup_n μ_n(X\A) ≤ μ([X\A]), so that lim inf_n μ_n(A) ≥ μ(A^{(0)}). Therefore

    μ(A^{(0)}) ≤ lim inf_n μ_n(A) ≤ lim sup_n μ_n(A) ≤ μ([A]) .
Thus for all sets A such that μ([A]\A^{(0)}) = μ(A′) = 0, where A′ is the boundary of the set A,

    lim_n μ_n(A) = μ(A) .

Let B₀ denote the collection of sets A such that μ(A′) = 0. Obviously B₀ is an algebra of sets. Let us show that σ(B₀) = 𝔅. Note that of all the spheres S_r with center at a given point x, for only countably many of them do we have μ(S_r′) > 0. Consequently σ(B₀) contains all spheres and hence all Borel sets, since the algebra of Borel sets is the minimal σ-algebra containing all spheres. This completes the proof of the necessity.
Proof of the Sufficiency. Let us choose an arbitrary weakly convergent subsequence {μ_{n_k}} of the sequence {μ_n}. Let μ̃ denote the limit of this subsequence and let us show that μ and μ̃ coincide. Suppose that A ∈ B₀. Then, as shown in the proof of the necessity,
    μ̃(A^{(0)}) ≤ lim inf_k μ_{n_k}(A) ≤ lim sup_k μ_{n_k}(A) ≤ μ̃([A]) .

But by hypothesis, lim_n μ_n(A) = μ(A). Therefore, for all sets A in B₀,

    μ̃(A^{(0)}) ≤ μ(A) ≤ μ̃([A]) .   (5)

Let {A_n} denote an arbitrary monotonic sequence of sets that satisfy inequality (5). Then since

    (∪ A_n)^{(0)} ⊃ ∪ A_n^{(0)} ,   (∩ A_n)^{(0)} ⊂ ∩ A_n^{(0)} ,
    [∪ A_n] ⊃ ∪ [A_n] ,   [∩ A_n] ⊂ ∩ [A_n] ,

we can see that (5) is also satisfied for the limit of the sequence of sets A_n. Thus the collection of sets B₁ that satisfy inequality (5) is a monotone class containing the algebra B₀. This means that this collection contains σ(B₀) = 𝔅 (cf. Chapter II, Section 1, Theorem 3). Hence inequality (5) is valid for all sets A ∈ 𝔅. Let B₂ denote the collection of sets A such that μ̃(A′) = 0. Then μ(A) = μ̃(A) for all A ∈ B₂, since μ̃(A^{(0)}) = μ̃([A]) = μ̃(A). Obviously, the relation μ(A) = μ̃(A) is also satisfied on the minimal σ-algebra σ(B₂) containing B₂. As shown in the proof of the necessity of the hypotheses of the theorem, σ(B₂) = 𝔅. Thus the measures μ and μ̃ coincide. We have shown that the sequence {μ_n} is a weakly compact sequence with a unique limit point, the measure μ. From this it follows that the sequence converges weakly to μ. This completes the proof of the theorem.
Corollary 1. If μ_n ⇒ μ, then μ_n(A) → μ(A) for every set A such that μ(A^{(0)}) = μ([A]). The sets A such that μ(A^{(0)}) = μ([A]) are called sets of continuity of the measure μ. This means that μ_n(A) → μ(A) for all sets of
[U A.] C U [A.], [n A.] An] = n [A.] [An] , we can see that (5) is also satisfied for the limit of the sequence of sets A,,. Thus the collection of sets 0, that satisfy inequality (5) is a monotone class containing the algebra 0>. This means that this set contains 6( 0) = 0 (cf. Chapter II, Section 1, Theorem 3). Hence inequality (5) is valid for all sets A e 0. Let 02 denote the collection of sets A such that p(A') = 0. Then u(A) = P(A) for all A e ^32 since fl(AM) = ll([A]) = p(A). Obviously, the relation p(A) = AC(A) is also satisfied on the minimal a-algebra c(32) containing 02. As shown in the proof of the necessity of the hypotheses of the theorem, c(82) = 0. Thus the measures u and ,u coincide. We have shown that the sequence is a weakly compact sequence with a unique limit point that is the measure p. From this it follows that the sequence converges weakly to p. This completes the proof of the theorem. Corollary 1. If {p.} - p, then u,, (A) p(A) for every set A such that u(A(1)) = p([A]). The sets A such that u(A(°) = p([A]) are called sets of continuity of the measure p. This means that u, (A) --. p(A) for all sets of
continuity of the measure u if u, -,u. Corollary 2. Let us assume that the measures un correspond to processes e,(t) and that the measure u corresponds to the process e(t). Then the relation Zc ° p as n -f oc implies that the sequence of distributions of { f(on(t))} converges to the distribution of f(e(t)) for all 0-measurable functionals f that are almost everywhere continuous with respect to the measure p. Proof. Let A. denote the set of points of discontinuity of f. Then u(Ao) = 0. Let G. denote the set of those x such that {f(x) < a} and let Ga denote the boundary of the set G.: Ga = [{x; f(x) < a}] n [{x; f(x) >_ a}] . The intersection of the sets G.' and G,'1 for a < a, is contained
in the intersection of the sets [{x: f(x) < a}] ∩ [{x: f(x) ≥ a₁}]. Therefore the inclusion relation x ∈ G_a′ ∩ G_{a₁}′ implies that

    lim inf_{y→x} f(y) ≤ a ,   lim sup_{y→x} f(y) ≥ a₁ ;

that is, G_a′ ∩ G_{a₁}′ ⊂ Λ₀. Consequently μ(G_a′ ∩ G_{a₁}′) = 0. Hence for an arbitrary sequence {a_k},

    μ( ∪_k G_{a_k}′ ) = Σ_k μ(G_{a_k}′) .

From this it follows that the set of numbers a such that μ(G_a′) ≠ 0 is no more than countable. Therefore, for all a except possibly countably many values, the set G_a is a set of continuity of the measure μ, so that μ_n(G_a) → μ(G_a), or

    P{ f(ξ_n(t)) < a } → P{ f(ξ(t)) < a }   as n → ∞ .

This completes the proof.
REMARK 2. As a rule, in considering random processes we assume that the σ-algebra of events of the basic probability space coincides with the minimal σ-algebra containing all events of the form {ω: ξ(t) ∈ C}, where C is a cylindrical set. This means that
weak convergence of the sequences of finite-dimensional distributions of the random processes ξ_n(t) to the finite-dimensional distributions of the process ξ(t) implies convergence of the sequence of measures μ_n to the measure μ on the algebra 𝔅₀ of all cylindrical sets of continuity of the measure μ, so that in this case it suffices to verify the conditions ensuring weak compactness of the measures μ_n.
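As a concrete illustration of these notions (not part of the original text), take μ_n to be the law of a normalized binomial variable and μ the standard normal law. Weak convergence shows up both in integrals of a bounded continuous f and, via Corollary 1, on continuity sets such as the interval (−1, 1); the test function cos x and the sample sizes below are arbitrary choices.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(4)

def integrals(n, n_samples=200_000):
    # mu_n: law of (S_n - n/2)/sqrt(n/4) with S_n ~ Binomial(n, 1/2)
    s = rng.binomial(n, 0.5, n_samples)
    x = (s - n / 2.0) / np.sqrt(n / 4.0)
    return np.cos(x).mean(), ((x > -1) & (x < 1)).mean()

for n in (4, 20, 10_000):
    f_int, p_int = integrals(n)
    print(n, f_int, p_int)

f_lim = np.exp(-0.5)        # integral of cos x against N(0,1) equals e^{-1/2}
p_lim = erf(1 / sqrt(2.0))  # N(0,1)-measure of the continuity set (-1, 1)
print(f_lim, p_lim)
```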
2. LIMIT THEOREMS FOR CONTINUOUS PROCESSES
In this section we shall assume that the processes ξ_n(t) and ξ(t) are continuous on the interval [a, b]. Their sample functions belong, with probability 1, to the complete separable metric space C[a, b] of all continuous functions x(t) on [a, b] with the metric

    ρ(x, y) = sup_{a≤t≤b} |x(t) − y(t)| .

We note that in the space C[a, b], the minimal σ-algebra of sets 𝔄 containing all cylindrical sets contains all Borel sets. To see this, it suffices to note that every sphere belongs to 𝔄, since

    { x: sup_t |x(t) − a(t)| ≤ r } = ∩_k { x: |x(t_k) − a(t_k)| ≤ r } ,
where a(t) is an arbitrary continuous function and {tk} is an arbitrary sequence everywhere dense on [a, b].
Let H denote a constant and let ω_δ denote a function defined for δ > 0 that approaches 0 from above as δ approaches 0 from above. Let K(H, ω_δ) denote the set of functions x(t) such that sup_t |x(t)| ≤ H and |x(t′) − x(t″)| ≤ ω_δ whenever |t′ − t″| ≤ δ.
In view of the arbitrariness of 77 > 0, we obtain (1). Proof of the Sufficiency. The convergence of the sequence of finite-dimensional distributions implies convergence of the sequence of measures t^n to the measure p that corresponds to the process
e(t) on the algebra 0, of all cylindrical sets of continuity of the measure p. But a(580) coincides with the minimal u-algebra containing all cylindrical sets. Hence (as shown earlier in this section) it contains all Borel subsets of the space C[a, b]. Therefore, in view of Theorem 2 of Section 1 it will be sufficient for us to
prove the weak compactness of the sequence of the measures μ_n. Let us show that for every η > 0 there exists a compact set K(H, ω_δ) such that

    sup_n P{ ξ_n(t) ∉ K(H, ω_δ) } ≤ η .
Since the sequence of the distributions of ξ_n(a) converges to the distribution of ξ(a), there exists an H such that for all n, P{ |ξ_n(a)| > H } ≤ η/2.
Let us take a sequence {ε_r} that converges to 0 from above. For every r, there exists an h_r such that h_r < h_{r−1} and

    sup_n P{ sup_{|t′−t″|≤h_r} |ξ_n(t′) − ξ_n(t″)| > ε_r } ≤ η/2^{r+1} .

Setting ω_δ = ε_r for h_{r+1} < δ ≤ h_r, we obtain

    sup_n P{ ξ_n(t) ∉ K(H, ω_δ) } ≤ P{ |ξ_n(a)| > H } + Σ_{r=1}^∞ P{ sup_{|t′−t″|≤h_r} |ξ_n(t′) − ξ_n(t″)| > ε_r }
        ≤ η/2 + Σ_{r=1}^∞ η/2^{r+1} = η .

This completes the proof of the theorem.
REMARK 1. Instead of condition (1), we may require that

    lim_{h→0} lim_n P{ sup_{|t′−t″|≤h} |ξ_n(t′) − ξ_n(t″)| > ε } = 0 ;   (2)

that is, that for every ε > 0 there
exists a δ > 0 and an N such that for n ≥ N and h ≤ δ,

    P{ sup_{|t′−t″|≤h} |ξ_n(t′) − ξ_n(t″)| > ε } ≤ ε ,

the corresponding bound tending to 0 uniformly with respect to n as h → 0. This completes the proof of the theorem.
REMARK 2. It follows from the proof of the theorem that instead of condition (4) we can require that

    M|ξ_n(t₁) − ξ_n(t₂)|^α ≤ H q(t₁ − t₂) ,

where the function q(t) is such that for some β > 0

3. CONVERGENCE OF SEQUENCES

If, for every ε > 0,

    lim_{n→∞} Σ_{i=1}^{k_n} ∫_{|u|>ε} u² dF_ni(u) = 0 ,   (1)

the random variables ξ_ni are said to satisfy Lindeberg's condition.
Theorem 1. Suppose that the random variables ξ_ni satisfy conditions 1 and 2 and Lindeberg's condition. Then the finite-dimensional distributions of the processes ξ_n(t) converge to the finite-dimensional distributions of the process w(t), and the sequence of distributions of f(ξ_n(t)) converges to the distribution of f(w(t)) for every functional f that is continuous on C[0, 1].
Proof. Let us first show that the finite-dimensional distributions of the processes ξ_n(t) converge to the finite-dimensional distributions of w(t). Let ξ̃_n(t) denote the random process defined by
sen(t) tni 0, kn
    P{ |ξ_n(t) − ξ̃_n(t)| > a } ≤ P{ sup_i |ξ_ni| > a } ≤ Σ_{i=1}^{k_n} P{ |ξ_ni| > a }
        = Σ_{i=1}^{k_n} ∫_{|u|>a} dF_ni(u) ≤ (1/a²) Σ_{i=1}^{k_n} ∫_{|u|>a} u² dF_ni(u) → 0 .
Therefore, to prove the first assertion of the theorem, it will be sufficient to show that the finite-dimensional distributions of the process ξ̃_n(t) converge to the finite-dimensional distributions of the process w(t). But since w(t) and ξ̃_n(t) are processes with independent increments and w(0) = ξ̃_n(0) = 0, it will be sufficient to prove that the distributions of ξ̃_n(t″) − ξ̃_n(t′) converge to the distribution of w(t″) − w(t′) for all 0 ≤ t′ < t″ ≤ 1 (we also note that the expression on the right can be made arbitrarily small for sufficiently large n). This proves the convergence of the finite-dimensional distributions.
To prove that the distributions of f(ξ_n(t)) converge to the distributions of f(w(t)) for all functionals f that are continuous on C[0, 1], let us show that for arbitrary ε > 0,
    lim_{h→0} lim_n P{ sup_{|t′−t″|≤h} |ξ_n(t′) − ξ_n(t″)| > ε } = 0 ,   (2)

and let us use Remark 1 of Section 2. Since

    sup_{|t′−t″|≤h} |ξ_n(t′) − ξ_n(t″)| ≤ 4 sup_k sup_{kh≤t≤(k+1)h} |ξ_n(t) − ξ_n(kh)| ,

it suffices to estimate, for each k, the probability that sup_{kh≤t≤(k+1)h} |ξ_n(t) − ξ_n(kh)| exceeds ε/4. In the limit this probability does not exceed

    (2/√(2π)) ∫_{ε/(c√h)}^∞ exp(−u²/2) du

for a suitable constant c, and, summing over the [1/h] + 1 values of k, we obtain

    lim_n P{ sup_{|t′−t″|≤h} |ξ_n(t′) − ξ_n(t″)| > ε } ≤ ((1/h) + 1)(2/√(2π)) ∫_{ε/(c√h)}^∞ exp(−u²/2) du .

Since

    (1/h) ∫_{ε/(c√h)}^∞ exp(−u²/2) du ≤ (c²/ε²) ∫_{ε/(c√h)}^∞ u² exp(−u²/2) du → 0   as h → 0 ,
it then follows that equation (2) holds. This completes the proof of the theorem.
REMARK 1. It follows from Corollary 2 to Theorem 2, Section 1 that if the conditions of Theorem 1 are satisfied, the distributions of f(ξ_n(t)) converge to the distribution of f(w(t)) for every functional
f defined on C[0, 1] and continuous (with respect to the metric of C[0, 1]) almost everywhere (with respect to the measure μ_w corresponding to a process of Brownian motion on [0, 1]).
Let ξ₁, ξ₂, ..., ξ_n, ... denote a sequence of independent identically distributed random variables such that Mξ = 0 and Dξ = 1. From Theorem 1 we immediately have:
Theorem 2. Let ξ_n(t) denote a random broken line with vertices {(k/n, (1/√n) S_k)}, where S_k = ξ₁ + ... + ξ_k. Then for every functional f defined and continuous on C[0, 1] almost everywhere with respect to the measure μ_w, the distributions of f(ξ_n(t)) converge to the distribution of f(w(t)).
Proof. It will be sufficient to show that Lindeberg's condition is satisfied for the variables ξ_nk = (1/√n) ξ_k. If we let F(x) denote the distribution function of the variable ξ_k, then the distribution function of the variables ξ_nk is F(√n x), and

    Σ_{k=1}^n ∫_{|x|>ε} x² dF_nk(x) = ∫_{|x|>ε√n} x² dF(x) → 0   as n → ∞ .
Corollary.

    P{ sup_{0≤k≤n} S_k/√n < x } → √(2/π) ∫₀^x exp(−u²/2) du .
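The corollary can be checked numerically. The sketch below (illustrative, not from the text) uses ±1 steps (so Mξ = 0, Dξ = 1) and compares the empirical distribution function of sup_k S_k/√n with √(2/π)∫₀^x e^{−u²/2} du = erf(x/√2); sample sizes are arbitrary.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(5)

n, n_paths = 1_000, 20_000
s = np.zeros(n_paths)
m = np.zeros(n_paths)
for _ in range(n):
    s += rng.integers(0, 2, n_paths) * 2.0 - 1.0   # +-1 steps
    np.maximum(m, s, out=m)                        # running maximum of S_k
m /= np.sqrt(n)

xs = np.array([0.5, 1.0, 2.0])
emp = np.array([(m < x).mean() for x in xs])
law = np.array([erf(x / sqrt(2.0)) for x in xs])
print(emp)
print(law)   # [0.3829..., 0.6826..., 0.9544...]
```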
Proof. We have sup_{0≤k≤n} S_k/√n = sup_{0≤t≤1} ξ_n(t), and the functional sup_{0≤t≤1} x(t) is continuous on C[0, 1], so the preceding theorem applies.
If x_n(t) → x(t) in C[0, 1], then φ(x_n(t)) → φ(x(t)) for almost all t, and the functions φ(x_n(t)) are bounded by a single constant, since sup_{n,t} |x_n(t)| is finite and φ(x) is bounded on every interval. We should note only that

    M ∫₀¹ χ_{Λ_φ}(w(t)) dt = ∫₀¹ dt ∫ χ_{Λ_φ}(u) (1/√(2πt)) exp(−u²/2t) du = 0 ,

since Λ_φ, the set of points of discontinuity of φ, has Lebesgue measure 0 (by virtue of the Riemann-integrability of the function φ). The quantity ∫₀¹ χ_{Λ_φ}(w(t)) dt is nonnegative. Therefore
    P{ ∫₀¹ χ_{Λ_φ}(w(t)) dt ≠ 0 } = 0 .

If we let Λ ⊂ C[0, 1] denote the set of points of discontinuity of the functional ∫₀¹ φ(x(t)) dt, then Λ ⊂ {x(t): ∫₀¹ χ_{Λ_φ}(x(s)) ds > 0}. Hence

    μ_w(Λ) ≤ P{ ∫₀¹ χ_{Λ_φ}(w(t)) dt ≠ 0 } = 0 .

If ξ_n(t) is the process introduced in Theorem 2, then on the basis of that theorem,

    lim_n P{ ∫₀¹ φ(ξ_n(t)) dt < a } = P{ ∫₀¹ φ(w(t)) dt < a } .
p(x) > g
(x)
and
- pE [,pF (x)
(x)]dx < e .
continuous function p(x),
For
(en(t))dZ -
l
n k1
k
n Sk)
n(t)) - 1 n
.)(c1-1)ln
kjn
n
k=1
(k-1)/n
sup x-y s 1va
7'
Jdt
n
( n(t)) - 7' l n (-!))
dt
P(x) - ip(Y)
xlstn
where
/n=sup
n 1)- e (n)
SUP k
ek
1/n
I Sk Sn=suk pV n
Therefore
p{ 0
1n k=1
1
n
-
c p{ran < 8} + p{Sn > C}
where δ and c are chosen in such a way that |φ(x) − φ(y)| < ε whenever |x − y| < δ and |x| ≤ c. Since η_n → 0 in probability as n → ∞ (this follows from Lindeberg's condition and was established in the proof of Theorem 1), it follows that P{η_n > δ} → 0 as n → ∞ for every δ > 0. The quantity P{ζ_n > c} can be made arbitrarily small for all n by choosing c sufficiently large (this follows from Corollary 1). Therefore

    ∫₀¹ φ(ξ_n(t)) dt − (1/n) Σ_{k=1}^n φ(S_k/√n) → 0

in probability, so that

    lim_{n→∞} P{ (1/n) Σ_{k=1}^n φ(S_k/√n) < a } = P{ ∫₀¹ φ(w(t)) dt < a }

if P{ ∫₀¹ φ(w(t)) dt = a } = 0.
Since
Lt
k=1
(p(
n
5k) < a} >
n k=1
> p{1
(5k) < a} /
)J
n
n k=t q
(1/ n Sk)
0,
by taking the limit as n
P11 " q(1/ P{\1cE (w(t))dt < a + h} > Tim 0 n-oo n k=1
n
S,) < a}
> n-lim P{L (sk) < a} n k=1 Vn
p (w(t))dt < a -
h}
.
But

    M | ∫₀¹ φ(w(t)) dt − ∫₀¹ φ_ε(w(t)) dt | ≤ ε .
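The invariance expressed by this limit — the law of (1/n)Σφ(S_k/√n) approaching that of ∫₀¹φ(w(t))dt regardless of the step distribution — can be seen numerically. In the sketch below (an illustration, not from the text), φ(x) = x² is an arbitrary continuous choice: a coarse walk with uniform steps and a fine walk with Gaussian steps yield nearly the same empirical law, with mean near M∫₀¹ w(t)² dt = 1/2.

```python
import numpy as np

rng = np.random.default_rng(6)

phi = lambda x: x * x

def functional_samples(n, n_paths, gaussian):
    s = np.zeros(n_paths)
    acc = np.zeros(n_paths)
    for _ in range(n):
        if gaussian:
            s += rng.normal(size=n_paths)
        else:
            s += rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), n_paths)  # M xi = 0, D xi = 1
        acc += phi(s / np.sqrt(n))
    return acc / n          # (1/n) sum_k phi(S_k / sqrt(n)), one value per path

a = functional_samples(100, 20_000, gaussian=False)
b = functional_samples(2_000, 20_000, gaussian=True)
print(a.mean(), b.mean())           # both near 1/2
print(np.median(a), np.median(b))   # nearly equal medians
```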
+ 16n(tnk, enk) - 6(tnk, nk)
0,
kn
Jim
E M I n,k+1 - enk n-.eo k=1
Z+d = 0
and kn
Ink) -0
P- Iim SUP E Mli %-,00 k=1 i=k
d. the functions 1/σ_n(t_nk, x) and a_n(t_nk, x)/σ_n(t_nk, x) are uniformly bounded with respect to n;
e. the limiting distribution of the random variable ξ_n0 coincides with the distribution of the random variable ξ(0).
Proof. We set

    ω_nk = ( ξ_{n,k+1} − ξ_nk − a_n(t_nk, ξ_nk) Δt_nk ) / σ_n(t_nk, ξ_nk) .

Then

    ξ_{n,k+1} = ξ_nk + a_n(t_nk, ξ_nk) Δt_nk + σ_n(t_nk, ξ_nk) ω_nk .
Let ink denote the minimal a-algebra with respect to which
4. CONVERGENCE OF A SEQUENCE OF MARKOV CHAINS
the variables ξ_{n0}, ξ_{n1}, …, ξ_{nk} are measurable. The quantity w_{nk} is measurable with respect to the σ-algebra $\mathfrak F_{n,k+1}$, and

$$M(w_{nk} \mid \mathfrak F_{nk}) = 0 , \qquad M(w_{nk}^2 \mid \mathfrak F_{nk}) = \Delta t_{nk} . \tag{2}$$

Consider the variables η_{nk} defined by the relations

$$\eta_{n0} = \xi_{n0} , \qquad \eta_{n,k+1} = \eta_{nk} + a(t_{nk}, \eta_{nk})\,\Delta t_{nk} + \sigma(t_{nk}, \eta_{nk})\, w_{nk} .$$

Obviously, the η_{nk} are also measurable with respect to the σ-algebra $\mathfrak F_{nk}$. Let us find a bound for $M(\eta_{nk} - \xi_{nk})^2$. We have

$$\eta_{n,k+1} - \xi_{n,k+1} = \eta_{nk} - \xi_{nk} + \big[a(t_{nk}, \eta_{nk}) - a(t_{nk}, \xi_{nk})\big]\,\Delta t_{nk} + \big[\sigma(t_{nk}, \eta_{nk}) - \sigma(t_{nk}, \xi_{nk})\big]\, w_{nk} + \varepsilon_{nk} ,$$

where

$$\varepsilon_{nk} = \big[a(t_{nk}, \xi_{nk}) - a_n(t_{nk}, \xi_{nk})\big]\,\Delta t_{nk} + \big[\sigma(t_{nk}, \xi_{nk}) - \sigma_n(t_{nk}, \xi_{nk})\big]\, w_{nk} .$$

Therefore, by using equations (2), the Lipschitz conditions for a and σ, and the inequality $2ab \le a^2 + b^2$, we obtain

$$M|\eta_{n,k+1} - \xi_{n,k+1}|^2 \le M|\eta_{nk} - \xi_{nk}|^2\,(1 + L\,\Delta t_{nk}) + \alpha_{nk} ,$$

where $L = 2K + 1 + 4K^2$ and

$$\alpha_{nk} = M\big[a(t_{nk}, \xi_{nk}) - a_n(t_{nk}, \xi_{nk})\big]^2 (\Delta t_{nk}^2 + 2\Delta t_{nk}) + M\big[\sigma(t_{nk}, \xi_{nk}) - \sigma_n(t_{nk}, \xi_{nk})\big]^2\, \Delta t_{nk} .$$

Since $M|\xi_{n0} - \eta_{n0}|^2 = 0$, we have successively

$$M|\xi_{n1} - \eta_{n1}|^2 \le \alpha_{n0} , \qquad M|\xi_{n2} - \eta_{n2}|^2 \le \alpha_{n0}(1 + L\,\Delta t_{n1}) + \alpha_{n1} , \qquad M|\xi_{n3} - \eta_{n3}|^2 \le \big(\alpha_{n0}(1 + L\,\Delta t_{n1}) + \alpha_{n1}\big)(1 + L\,\Delta t_{n2}) + \alpha_{n2} , \ \ldots$$
$$\le O\Big(\sqrt{\max_k\,(z_k - z_{k-1})}\Big)$$

by virtue of relations (3) to (5) and the convergence of the joint distributions of the variables ξ_n(s_i) to the joint distributions of the variables ξ(s_i). Since $\max_k (z_k - z_{k-1})$ is arbitrary, the conclusion of the theorem follows.

Theorem 2. Under the hypotheses of Theorem 1, for any continuous functional f on C[0, 1], the distributions of f(ξ_n(t)) converge to the distribution of f(ξ(t)).

Proof. Since the finite-dimensional distributions of the processes ξ_n(t) converge to the finite-dimensional distributions of the process ξ(t), by virtue of Remark 1 of Section 2 we can reduce the proof of this theorem to the proof of the equation

$$\lim_{h\to 0}\ \varlimsup_{n\to\infty}\ P\Big\{ \sup_{|t'-t''|\le h} |\xi_n(t') - \xi_n(t'')| > \varepsilon \Big\} = 0 . \tag{6}$$
Following the reasoning used to prove Theorem 2 of Section 3, we see that

$$P\Big\{ \sup_{|t'-t''|\le h} |\xi_n(t') - \xi_n(t'')| > \varepsilon \Big\} \le \sum_k P\Big\{ \sup_{kh \le t \le (k+1)h} |\xi_n(t) - \xi_n(kh)| > \frac{\varepsilon}{4} \Big\} .$$

Let s_k denote the greatest of the indices r such that $t_{nr} \le kh$. Then

$$P\Big\{ \sup_{kh \le t \le (k+1)h} |\xi_n(t) - \xi_n(kh)| > \frac{\varepsilon}{4} \Big\} \le P\Big\{ \sup_{s_k \le l \le s_{k+1}+1} |\xi_{nl} - \xi_{ns_k}| > \frac{\varepsilon}{4} \Big\} ,$$

so that on the basis of Lemma 4,

$$P\Big\{ \sup_{kh \le t \le (k+1)h} |\xi_n(t) - \xi_n(kh)| > \frac{\varepsilon}{4} \Big\} \le 2 P\Big\{ |\xi_{n,s_{k+1}+1} - \xi_{ns_k}| > \frac{\varepsilon}{16} \Big\} .$$

It follows from the convergence of the finite-dimensional distributions of ξ_n(t) to the finite-dimensional distributions of ξ(t) that

$$\lim_{n\to\infty} P\Big\{ |\xi_{n,s_{k+1}+1} - \xi_{ns_k}| > \frac{\varepsilon}{16} \Big\} = P\Big\{ |\xi(kh+h) - \xi(kh)| > \frac{\varepsilon}{16} \Big\} .$$

This means that

$$\varlimsup_{n\to\infty} P\Big\{ \sup_{|t'-t''|\le h} |\xi_n(t') - \xi_n(t'')| > \varepsilon \Big\} \le 2\sum_k P\Big\{ |\xi(kh+h) - \xi(kh)| > \frac{\varepsilon}{16} \Big\} \le 2\sum_k \frac{16^4\, M|\xi(kh+h) - \xi(kh)|^4}{\varepsilon^4} \le \frac{16^4\, L\, h}{\varepsilon^4} ,$$

where L is a constant such that $M|\xi(t+h) - \xi(t)|^4 \le L h^2$. This proves equation (6) and hence the theorem.
5. THE SPACE OF FUNCTIONS WITHOUT DISCONTINUITIES OF THE SECOND KIND

Let D[0, 1] denote the set of real functions x(t) defined on the interval [0, 1] and having right- and left-hand limits at every point. We shall treat two functions that coincide at all points of continuity as the same function. Therefore it is natural to adopt some standard definition of the values of functions x(t) at points of discontinuity. In what follows we shall assume that for all functions in D[0, 1],

$$x(t) = x(t+0), \qquad x(0) = x(+0), \qquad x(1) = x(1-0). \tag{1}$$

Study of the space D[0, 1] is useful since there are classes of random processes whose sample functions, with probability 1, have no discontinuities of the second kind (for example, processes with independent increments, and Markov processes under extremely broad conditions). In order to be able to use the results of Section 1, we need to define on D[0, 1] a metric in which D[0, 1] becomes a separable metric space with the property that the minimal σ-algebra containing all cylindrical sets coincides with the σ-algebra of Borel subsets of that space. The metric should be sufficiently "strong" (that is, there should be as few convergent sequences as possible and hence as many continuous functionals in that metric as possible).
The uniform-convergence metric

$$\rho_U(x, y) = \sup_{0 \le t \le 1} |x(t) - y(t)|$$

is not suitable for this, since D[0, 1] is not a separable space in that metric. (The set of functions $x_s(t) = [1 + \operatorname{sgn}(t - s)]/2$ for 0 < s < 1 has the cardinality of the continuum, but the distance between any two distinct elements of that set is equal to 1.) We introduce into the space D[0, 1] a metric that is somewhat weaker than the uniform-convergence metric. Let Λ denote the set of all continuous increasing real functions λ(t) on [0, 1] such that λ(0) = 0 and λ(1) = 1 (that is, λ is a continuous one-to-one mapping of [0, 1] onto itself). We note that for each λ ∈ Λ there exists an inverse function λ⁻¹, also in Λ, and that if λ₁ and λ₂ belong to Λ, then the composite function λ₁(λ₂) also belongs to Λ.

Now, for every pair x(t) and y(t) in D[0, 1], we define

$$\rho_D(x, y) = \inf_{\lambda \in \Lambda} \Big[ \sup_t |x(t) - y(\lambda(t))| + \sup_t |t - \lambda(t)| \Big] . \tag{2}$$

Let us show that ρ_D defines a metric on D[0, 1]. To do this, we need to show that the function ρ_D satisfies the three axioms of a metric: (a) ρ_D(x, y) ≥ 0, with equality holding if and only if x = y; (b) ρ_D(x, y) = ρ_D(y, x); (c) ρ_D(x, z) ≤ ρ_D(x, y) + ρ_D(y, z) for all x(t), y(t), and z(t) in D[0, 1]. Condition (a) is obvious. Condition (b) follows from the relation

$$\rho_D(y, x) = \inf_{\lambda \in \Lambda} \Big[ \sup_t |y(t) - x(\lambda(t))| + \sup_t |t - \lambda(t)| \Big] = \inf_{\lambda \in \Lambda} \Big[ \sup_t |y(\lambda^{-1}(t)) - x(t)| + \sup_t |\lambda^{-1}(t) - t| \Big] = \rho_D(x, y) .$$
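The effect of the time change λ can be seen on the step functions from the separability example above. The following numerical sketch is our illustration, not the book's: two unit jumps at 0.5 and 0.6 are at uniform distance 1, yet a single admissible λ already gives the upper bound $\rho_D \le 0.1$.

```python
import numpy as np

# Illustrative computation (not from the text): two unit steps with jumps at
# 0.5 and 0.6 are far apart uniformly, but close in the metric rho_D, because
# a time change lambda in Lambda can align the jumps.
t = np.linspace(0.0, 1.0, 1001)
x = (t >= 0.5).astype(float)          # jump at 0.5
y = (t >= 0.6).astype(float)          # jump at 0.6

def lam(s):
    # piecewise-linear time change with lam(0)=0, lam(0.5)=0.6, lam(1)=1
    return np.where(s <= 0.5, 1.2 * s, 0.6 + 0.8 * (s - 0.5))

y_lam = (lam(t) >= 0.6 - 1e-12).astype(float)   # y(lambda(t)) on the grid
rho_U = np.max(np.abs(x - y))                   # uniform distance: 1
rho_D_bound = np.max(np.abs(x - y_lam)) + np.max(np.abs(t - lam(t)))
print(rho_U, rho_D_bound)   # 1.0 and about 0.1
```

Since ρ_D is an infimum over all λ ∈ Λ, the second number is only an upper bound for ρ_D(x, y), but it already exhibits the key point: aligning jumps costs only the time distortion $\sup_t |t - \lambda(t)|$.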
Let us look at condition (c), the triangle inequality. Let x(t), y(t), and z(t) denote functions belonging to D[0, 1]. For every ε > 0 there exist functions λ₁(t) and λ₂(t) in Λ such that

$$\rho_D(x, y) > \sup_t |x(t) - y(\lambda_1(t))| + \sup_t |t - \lambda_1(t)| - \varepsilon ,$$
$$\rho_D(y, z) > \sup_t |y(t) - z(\lambda_2(t))| + \sup_t |t - \lambda_2(t)| - \varepsilon . \tag{3}$$

Then

$$\rho_D(x, z) \le \sup_t |x(t) - z(\lambda_2(\lambda_1(t)))| + \sup_t |t - \lambda_2(\lambda_1(t))| \le \sup_t |x(t) - y(\lambda_1(t))| + \sup_t |t - \lambda_1(t)| + \sup_t |y(\lambda_1(t)) - z(\lambda_2(\lambda_1(t)))| + \sup_t |\lambda_1(t) - \lambda_2(\lambda_1(t))| = \sup_t |x(t) - y(\lambda_1(t))| + \sup_t |t - \lambda_1(t)| + \sup_t |y(t) - z(\lambda_2(t))| + \sup_t |t - \lambda_2(t)| ,$$

since λ₁(t) ranges over the interval [0, 1] as t ranges over the interval [0, 1]. From inequalities (3) we obtain

$$\rho_D(x, z) \le \rho_D(x, y) + \rho_D(y, z) + 2\varepsilon ,$$

which, by virtue of the arbitrariness of ε, implies condition (c). Thus we may take ρ_D as the distance in D[0, 1]. To make further study of the properties of the metric ρ_D, we need some auxiliary propositions.

Lemma 1. For every function x(t) in D[0, 1], define
$$\Delta_c(x) = \sup_{t-c \le t' \le t \le t'' \le t+c} \min\big\{ |x(t') - x(t)| ,\ |x(t'') - x(t)| \big\} + \sup_{0 \le t \le c} |x(t) - x(0)| + \sup_{1-c \le t \le 1} |x(t) - x(1)| .$$

Then

$$\lim_{c \to 0} \Delta_c(x) = 0 .$$
Proof. The continuity of x(t) at the points 0 and 1 implies that the last two terms in the definition of Δ_c(x) tend to 0 as c → 0. If there existed an ε > 0 such that for arbitrarily small c

$$\sup_{t-c \le t' \le t \le t'' \le t+c} \min\big\{ |x(t') - x(t)| ,\ |x(t'') - x(t)| \big\} > \varepsilon ,$$

then we could find sequences $t'_n \le t_n \le t''_n$ such that $t''_n - t'_n \to 0$ and

$$|x(t'_n) - x(t_n)| > \varepsilon , \qquad |x(t''_n) - x(t_n)| > \varepsilon . \tag{4}$$

By taking subsequences if necessary, we may assume that {t_n} converges to some point t₀ in the interval [0, 1]. Then the sequences {t′_n} and {t″_n} must also converge to t₀. Therefore each of the quantities x(t′_n), x(t_n), and x(t″_n) converges to one of the numbers x(t₀ − 0) or x(t₀) = x(t₀ + 0), so that at least two of these must have the same limit. It is easy to see that the limit of the sequence {x(t_n)} must coincide with the common limit of the sequences {x(t′_n)} and {x(t″_n)} when these two sequences have the same limit. This means that at least one of the differences x(t_n) − x(t′_n) or x(t″_n) − x(t_n) approaches 0, and this contradicts inequalities (4). This completes the proof
of the lemma.

Lemma 2. Let x(t) denote a function in D[0, 1] and let [α, β] denote a subinterval of [0, 1]. If x(t) has no jumps exceeding ε in [α, β], then for t′, t″ ∈ [α, β] the inequality |t′ − t″| ≤ c implies

$$|x(t') - x(t'')| \le 2\Delta_c(x) + \varepsilon .$$

Proof. Suppose t′ < t″. Let us choose an arbitrary δ ∈ (0, ε) and a point τ in the interval [t′, t″] such that |x(t′) − x(t)| ≤ Δ_c(x) + δ for t ∈ [t′, τ) and

$$|x(t') - x(\tau)| > \Delta_c(x) + \delta .$$

If no such point exists, then |x(t′) − x(t″)| ≤ Δ_c(x) + δ, and the assertion of the lemma is satisfied. If a point τ does exist, then since

$$\min\big\{ |x(\tau) - x(t')| ,\ |x(\tau) - x(t'')| \big\} \le \Delta_c(x)$$

and |x(τ) − x(t′)| > Δ_c(x) + δ, we have

$$|x(\tau) - x(t'')| \le \Delta_c(x) .$$

Thus

$$|x(t'') - x(t')| \le |x(t'') - x(\tau)| + |x(\tau) - x(\tau - 0)| + |x(\tau - 0) - x(t')| \le \Delta_c(x) + \varepsilon + \Delta_c(x) + \delta ,$$

and since δ is arbitrary, $|x(t'') - x(t')| \le 2\Delta_c(x) + \varepsilon$.
≥ u(t) for all s ∈ (t, 1]. Equation (5) defines a unique function x̃(t) in D[0, 1]. Let us show that this function x̃(t) is the limit of the sequence {x_n(t)}. To do this, we construct auxiliary functions φ_n(t) in Λ. Suppose that τ₁, τ₂, …, τ_r are all the points of the interval [0, 1] at which x̃(t) has jumps exceeding 1/n. We let [α_i, β_i] denote the maximal interval on which λ(t) assumes the value τ_i (this interval may consist of a single point). Let γ_i denote a point in the interval [α_i, β_i] such that x*(t) = x̃(τ_i − 0) for t ∈ [α_i, γ_i) and x*(t) = x̃(τ_i) for t ∈ [γ_i, β_i]. (If α_i = γ_i, then x*(t) assumes the single value x̃(τ_i) on [α_i, β_i].) Let us choose ε_n not exceeding 1/n such that Δ_{ε_n}(x̃) < 1/n is not required but Δ_{ε_n}(x̃) is small, and let φ_n(t) denote a function in Λ satisfying the equation φ_n(γ_i) = τ_i and the inequality $\sup_t |\varphi_n(t) - t| < \varepsilon_n$. Let us find a bound for

$$\sup_t |x^*(t) - \tilde x(\varphi_n(t))| .$$

If t does not belong to any of the intervals [α_i, β_i], then

$$|x^*(t) - \tilde x(\varphi_n(t))| \le 2\Delta_{\varepsilon_n}(\tilde x) + \frac1n$$

by virtue of Lemma 2, since x̃(t) has no jumps exceeding 1/n between t and φ_n(t). If t ∈ [α_i, γ_i), then

$$|x^*(t) - \tilde x(\varphi_n(t))| \le \sup_{\tau_i - \varepsilon_n \le s < \tau_i} |\tilde x(\tau_i - 0) - \tilde x(s)| \le \Delta_{\varepsilon_n}(\tilde x) ,$$

since |x̃(τ_i − 0) − x̃(τ_i)| > 1/n. In an analogous manner we can show that $|x^*(t) - \tilde x(\varphi_n(t))| \le \Delta_{\varepsilon_n}(\tilde x)$ for t ∈ [γ_i, β_i]. Consequently,

$$\sup_t |x^*(t) - \tilde x(\varphi_n(t))| \le 2\Delta_{\varepsilon_n}(\tilde x) + \frac1n .$$
Since ξ_n(t) is constant on intervals of the form [i/n, (i + 1)/n], and since for n < 1/c the number of points of the form i/n in the interval [kc, (k + 3)c], although it varies with varying k, does not exceed the number of these points in the interval [0, 4c], we always have

$$P\{\Delta_c(\xi_n(t)) > \varepsilon\} \le 2\Big(\frac1c + 1\Big) P\Big\{ \sup_{0 \le t \le 4c} |\xi_n(t) - \xi_n(0)| > \frac{\varepsilon}{4} \Big\} .$$

To find a bound for the probability

$$P\Big\{ \sup_{0 \le t \le 4c} |\xi_n(t) - \xi_n(0)| > \frac{\varepsilon}{4} \Big\} = P\Big\{ \sup_{1 \le k \le N} \Big| \sum_{i=1}^k \zeta_{ni} \Big| > \frac{\varepsilon}{4} \Big\} ,$$

we introduce the variables $\tilde\zeta_{ni} = \zeta_{ni}$ if $|\zeta_{ni}| \le L$ and $\tilde\zeta_{ni} = 0$ if $|\zeta_{ni}| > L$, and the variables $\hat\zeta_{ni} = \zeta_{ni} - \tilde\zeta_{ni}$. Then

$$P\Big\{ \sup_{1 \le k \le N} \Big| \sum_{i=1}^k \zeta_{ni} \Big| > \frac{\varepsilon}{4} \Big\} \le P\Big\{ \sup_{1 \le k \le N} \Big| \sum_{i=1}^k \hat\zeta_{ni} \Big| > 0 \Big\} + P\Big\{ \sup_{1 \le k \le N} \Big| \sum_{i=1}^k \tilde\zeta_{ni} \Big| > \frac{\varepsilon}{4} \Big\}$$

$$\le \sum_{i=1}^N P\{|\zeta_{ni}| > L\} + P\Big\{ \sup_{1 \le k \le N} \Big| \sum_{i=1}^k (\tilde\zeta_{ni} - M\tilde\zeta_{ni}) \Big| > \frac{\varepsilon}{4} - \sum_{i=1}^N |M\tilde\zeta_{ni}| \Big\}$$

$$\le N\, P\{|\zeta_{n1}| > L\} + \frac{N\, D\tilde\zeta_{n1}}{\big(\frac{\varepsilon}{4} - N\, |M\tilde\zeta_{n1}|\big)^2} .$$

(Here we used Kolmogorov's inequality, Theorem 1, Section 4, Chapter III.)
If L and −L are points of continuity of the function G, then the following limit relations are valid:

$$\lim_{n\to\infty} n\, P\{|\zeta_{n1}| > L\} = \lim_{n\to\infty} \int_{|x|>L} \frac{1+x^2}{x^2}\, dG_n(x) = \int_{|x|>L} \frac{1+x^2}{x^2}\, dG(x) ,$$

$$\varlimsup_{n\to\infty} n\, D\tilde\zeta_{n1} \le \varlimsup_{n\to\infty} n\, M\tilde\zeta_{n1}^2 \le \lim_{n\to\infty} n\, M\, \frac{(1+L^2)\, \zeta_{n1}^2}{1+\zeta_{n1}^2} = (1+L^2) \int_{-\infty}^{\infty} dG(x) ,$$

$$\lim_{n\to\infty} n\, M\tilde\zeta_{n1} = \gamma - \int_{|x|>L} \frac1x\, dG(x) + \int_{|x|\le L} x\, dG(x) = \gamma_L .$$

Therefore, if $4c\, |\gamma_L| < \varepsilon/4$, then

$$\varlimsup_{n\to\infty} P\Big\{ \sup_{1 \le k \le N} \Big| \sum_{i=1}^k \zeta_{ni} \Big| > \frac{\varepsilon}{4} \Big\} \le 4c \int_{|x|>L} \frac{1+x^2}{x^2}\, dG(x) + \frac{4c\, (1+L^2) \int dG(x)}{\big(\frac{\varepsilon}{4} - 4c\, |\gamma_L|\big)^2} .$$
6. CONVERGENCE OF A SEQUENCE
This means that for sufficiently small c,

$$\varlimsup_{n\to\infty} P\Big\{ \sup_{0 \le t \le 4c} |\xi_n(t) - \xi_n(0)| > \frac{\varepsilon}{4} \Big\} \le Kc ,$$

where K is some constant, so that

$$\varlimsup_{n\to\infty} P\{\Delta_c(\xi_n(t)) > \varepsilon\} \le \Big(\frac1c + 1\Big)(Kc)^2 + 2Kc .$$

Equation (3) follows from this inequality. This completes the proof of the theorem. As examples of corollaries of this general theorem, let us look at a few particular limit theorems.

Theorem 3. Let a(t) and b(t) > 0 denote continuous functions and let α denote a real number. If the hypotheses of Theorem 1 are satisfied, then
$$\lim_{n\to\infty} P\Big\{ a\Big(\frac kn\Big) - \alpha b\Big(\frac kn\Big) < s_{nk} < a\Big(\frac kn\Big) + \alpha b\Big(\frac kn\Big),\ k = 1, 2, \ldots, n \Big\} = P\big\{ a(t) - \alpha b(t) < \xi(t) < a(t) + \alpha b(t),\ 0 \le t \le 1 \big\}$$

for all positive α such that $P\{a(t) - \alpha b(t) < \xi(t) < a(t) + \alpha b(t),\ 0 \le t \le 1\}$ is continuous at α as a function of α.

Proof. Consider the functional defined on D[0, 1] by

$$f(x(t)) = \sup_{0 \le t \le 1} \frac{|x(t) - a(t)|}{b(t)} .$$

For every ε > 0, there exists an m such that for all sufficiently large n,

$$M\left| \frac1m \sum_{k=1}^m \xi_n\Big(\frac km\Big) - \int_0^1 \xi_n(t)\,dt \right| \le \sup_{|t'-t''| \le 1/m} M|\xi_n(t') - \xi_n(t'')| < \varepsilon .$$
An analogous inequality holds for ξ(t), since it follows from equation (2) and Fatou's lemma (Theorem 2, Section 5, Chapter II) that

$$\lim_{h\to 0}\ \sup_{|t'-t''| \le h} M|\xi(t') - \xi(t'')| = 0 .$$
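The closeness in mean of the Riemann sums $(1/m)\sum_k \xi(k/m)$ to $\int_0^1 \xi(t)\,dt$, which this argument rests on, can be checked in closed form when ξ is the Wiener process (a numerical illustration of ours, not from the text): $\int_0^1 w(t)\,dt$ is Gaussian with variance $\int_0^1\!\int_0^1 \min(s,t)\,ds\,dt = 1/3$, and the variance of the Riemann sum converges to 1/3, so the corresponding characteristic functions $\exp(-\lambda^2 v/2)$ converge as well.

```python
import numpy as np

# Closed-form check (illustrative, not from the text): the variance of the
# Riemann sum (1/m) sum_k w(k/m) of the Wiener process tends to 1/3, the
# variance of int_0^1 w(t) dt.
m = 200
k = np.arange(1, m + 1)
cov = np.minimum.outer(k, k) / m       # Cov(w(j/m), w(k/m)) = min(j, k) / m
v = cov.sum() / m**2                   # variance of the Riemann sum
print(v)  # 0.3358..., close to 1/3
```

Exactly, $v = (m+1)(2m+1)/(6m^2)$, which exceeds 1/3 by $O(1/m)$.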
We note that the convergence of the finite-dimensional distributions of the processes ξ_n(t) to the finite-dimensional distributions of the process ξ(t) implies convergence of the distributions of $(1/m)\sum_{k=1}^m \xi_n(k/m)$ to the distribution of $(1/m)\sum_{k=1}^m \xi(k/m)$. Therefore

$$\varlimsup_{n\to\infty} \left| M \exp\Big\{ i\lambda \int_0^1 \xi_n(t)\,dt \Big\} - M \exp\Big\{ i\lambda \int_0^1 \xi(t)\,dt \Big\} \right| \le \varlimsup_{n\to\infty} \left| M \exp\Big\{ i\lambda \int_0^1 \xi_n(t)\,dt \Big\} - M \exp\Big\{ \frac{i\lambda}{m} \sum_{k=1}^m \xi_n\Big(\frac km\Big) \Big\} \right| + \left| M \exp\Big\{ i\lambda \int_0^1 \xi(t)\,dt \Big\} - M \exp\Big\{ \frac{i\lambda}{m} \sum_{k=1}^m \xi\Big(\frac km\Big) \Big\} \right|$$

$$\le |\lambda|\, \varlimsup_{n\to\infty} M\left| \int_0^1 \xi_n(t)\,dt - \frac1m \sum_{k=1}^m \xi_n\Big(\frac km\Big) \right| + |\lambda|\, M\left| \int_0^1 \xi(t)\,dt - \frac1m \sum_{k=1}^m \xi\Big(\frac km\Big) \right| \le 2|\lambda|\varepsilon .$$

This completes the proof of the theorem.

Theorem 2. Suppose that the finite-dimensional distributions of the processes ξ_n(t) converge to the finite-dimensional distributions of the process ξ(t), that for every ε > 0

$$\lim_{h\to 0}\ \varlimsup_{n\to\infty}\ \sup_{|t-s| \le h} P\{|\xi_n(t) - \xi_n(s)| > \varepsilon\} = 0 , \tag{3}$$

and that there is a nonnegative function ψ(x) such that ψ(x) ↑ ∞ as |x| → ∞ and $\sup_n \sup_t M\psi(\xi_n(t)) = c < \infty$. Then for every continuous function φ(t, x) such that

$$\lim_{N\to\infty}\ \sup_t\ \sup_{|x|>N} \frac{|\varphi(t, x)|}{\psi(x)} = 0 ,$$

the sequence of distributions of the variable $\int_0^1 \varphi(t, \xi_n(t))\,dt$ converges to the distribution of the variable $\int_0^1 \varphi(t, \xi(t))\,dt$.
Proof. It will be sufficient to show that the hypotheses of Theorem 1 are satisfied for the sequence of processes η_n(t) = φ(t, ξ_n(t)). Convergence of the finite-dimensional distributions of the processes η_n(t) to the finite-dimensional distributions of the process η(t) = φ(t, ξ(t)) follows from the convergence of the finite-dimensional distributions of the processes ξ_n(t) to the finite-dimensional distributions of the process ξ(t) and the continuity of the function φ(t, x). Since |φ(t, x)| ≤ K(1 + ψ(x)) for some K, we have

$$\sup_n \sup_t M|\eta_n(t)| \le K(1 + c) .$$
7. LIMIT THEOREMS FOR FUNCTIONALS OF INTEGRAL FORM

Finally, let us show that equation (2) is satisfied for the sequence {η_n(t)}. To do this, we define the function g_N(x) that is equal to x for |x| ≤ N and equal to N sgn x for |x| > N, and we set

$$\varepsilon_N = \sup_t\ \sup_{|x| \ge N} \frac{|\varphi(t, x)|}{\psi(x)} .$$

Then, using the inequality

$$|\varphi(t, g_N(x)) - \varphi(t, x)| \le \varepsilon_N\, \psi(x) ,$$

we obtain

$$M|\varphi(t_1, \xi_n(t_1)) - \varphi(t_2, \xi_n(t_2))| \le M|\varphi(t_1, g_N(\xi_n(t_1))) - \varphi(t_2, g_N(\xi_n(t_2)))| + \varepsilon_N\, M\psi(\xi_n(t_1)) + \varepsilon_N\, M\psi(\xi_n(t_2)) \le M|\varphi(t_1, g_N(\xi_n(t_1))) - \varphi(t_2, g_N(\xi_n(t_2)))| + 2\varepsilon_N\, c . \tag{4}$$

Since φ(t, x) is a continuous function, it follows that for every ε > 0 and L > 0 there exists a δ > 0 such that $|\varphi(t_1, x_1) - \varphi(t_2, x_2)| < \varepsilon$
Ix1-x2I 0, sup (Fn (x) - F(x)) < a} = 1 - e-2a2
-°°<x 0,
lim P{n [Fn (x) - F(x)]ZdF(x)
0, the difference (t + h) - e(t) has a Gaussian distribution and M(e(t + h) - e(t)) = 0 ,
M(e(t + h) - e(t))' = (t + h)(1 - t - h)
+ t(l -t)-2t(l -t-h)=h-h2.
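The one-sided limit $1 - e^{-2a^2}$ is easy to test by simulation (an illustrative sketch of ours, not from the text). For uniform samples, $\sup_x (F_n(x) - F(x)) = \max_i (i/n - U_{(i)})$, where $U_{(1)} \le \cdots \le U_{(n)}$ are the order statistics.

```python
import numpy as np

# Monte Carlo sketch (not from the text): for uniform samples, the one-sided
# statistic sqrt(n) sup_x (F_n(x) - F(x)) has limiting distribution function
# 1 - exp(-2 a^2).
rng = np.random.default_rng(2)
n, trials, a = 500, 2000, 1.0
u = np.sort(rng.random((trials, n)), axis=1)
i = np.arange(1, n + 1)
d_plus = np.max(i / n - u, axis=1)            # sup_x (F_n(x) - F(x))
p_hat = np.mean(np.sqrt(n) * d_plus < a)
print(p_hat, 1 - np.exp(-2 * a**2))           # both near 0.86
```

The empirical proportion agrees with $1 - e^{-2} \approx 0.8647$ up to Monte Carlo noise and a small finite-n correction.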
8. APPLICATION OF LIMIT THEOREMS TO STATISTICAL CRITERIA
Therefore $M|\xi(t+h) - \xi(t)|^4 = O(h^2)$ and the process ξ(t) is continuous by virtue of Theorem 2, Section 5, Chapter IV. The convergence of the finite-dimensional distributions of the processes ξ_n(t) to the finite-dimensional distributions of ξ(t) has been established. On the basis of Theorem 2 of Section 5, it remains only to show that relation (5) of Section 5 is satisfied. Since Δ_c(x(t)) does not exceed a multiple of $\sup_{|t'-t''|\le c} |x(t') - x(t'')|$, the lemma will be proved if we can show that

$$\lim_{c\to 0}\ \varlimsup_{n\to\infty}\ P\Big\{ \sup_{|t'-t''| \le c} |\xi_n(t') - \xi_n(t'')| > \varepsilon \Big\} = 0 \tag{1}$$
for all ε > 0. The process $\xi_n(t) + \sqrt n\, t$ increases monotonically; consequently, for $t_1 \le t_2 \le t_3 \le t_4$,

$$-\sqrt n\,(t_3 - t_2) \le \xi_n(t_3) - \xi_n(t_2) \le \xi_n(t_4) - \xi_n(t_1) + \sqrt n\,(t_4 - t_1) .$$

Therefore the supremum of $|\xi_n(t') - \xi_n(t'')|$ over $|t' - t''| \le c$ can be estimated in terms of the increments of ξ_n over the dyadic points $k/2^{m_n}$. Let m(c) be the smallest integer such that $c\, 2^{m(c)} \ge 1$. Then, for any a ∈ (0, 1),

$$P\Big\{ \sup_{|k_1 - k_2|\, 2^{-r} \le c} \Big| \xi_n\Big(\frac{k_1}{2^r}\Big) - \xi_n\Big(\frac{k_2}{2^r}\Big) \Big| > \varepsilon \Big\} \le \sum_{r \ge m(c)} \sum_k P\Big\{ \Big| \xi_n\Big(\frac{k+1}{2^r}\Big) - \xi_n\Big(\frac{k}{2^r}\Big) \Big| > \frac{\varepsilon}{2}(1-a)\, a^{\,r-m(c)} \Big\}$$

$$\le \sum_{r \ge m(c)} \sum_k \frac{2^4\, M\big| \xi_n\big(\frac{k+1}{2^r}\big) - \xi_n\big(\frac{k}{2^r}\big) \big|^4}{(1-a)^4\, \varepsilon^4\, a^{4(r-m(c))}} .$$
Let μ_n denote the number of the sample values that fall in the interval [t, t + h]. Then

$$P\{\mu_n = k\} = C_n^k\, h^k (1 - h)^{n-k}$$

and

$$\xi_n(t + h) - \xi_n(t) = \sqrt n \left( \frac{\mu_n}{n} - h \right) .$$

Calculations show (cf. Gnedenko, 1963) that

$$M\big(\xi_n(t+h) - \xi_n(t)\big)^4 \le 3h^2 + \frac hn .$$

This means that for $h \ge 1/n$ we have

$$M\big(\xi_n(t+h) - \xi_n(t)\big)^4 \le 4h^2 .$$

This inequality and the preceding one show that, for $2^{-r} \ge 1/n$,

$$P\Big\{ \sup_k \Big| \xi_n\Big(\frac{k+1}{2^r}\Big) - \xi_n\Big(\frac{k}{2^r}\Big) \Big| > \frac{\varepsilon}{2}(1-a)\, a^{\,r-m(c)} \Big\} \le \frac{2^r \cdot 4 \cdot 2^{-2r} \cdot 2^4}{(1-a)^4\, \varepsilon^4\, a^{4(r-m(c))}}$$
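The fourth-moment bound can be verified exactly from the binomial distribution (our numerical check, not the book's computation):

```python
import math

# Exact check (illustrative): with mu_n ~ Binomial(n, h) and
# xi_n(t+h) - xi_n(t) = sqrt(n) (mu_n / n - h), the fourth moment
# M (xi_n(t+h) - xi_n(t))^4 = (1/n^2) M (mu_n - n h)^4
# should not exceed 3 h^2 + h / n.
def fourth_moment(n, h):
    return sum(math.comb(n, k) * h**k * (1 - h)**(n - k) * (k - n * h)**4
               for k in range(n + 1)) / n**2

checks = [(fourth_moment(n, h), 3 * h**2 + h / n)
          for n in (20, 50, 200) for h in (0.05, 0.2, 0.5)]
print(all(mom <= bound for mom, bound in checks))  # True
```

The inequality follows from the binomial central moment $\mu_4 = nh(1-h)\big[1 + 3(n-2)h(1-h)\big]$, since $h(1-h)/n \le h/n$ and $3(n-2)h^2(1-h)^2/n \le 3h^2$.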